New Spin

Dave Hein · 2017-06-22 02:52

Why not run it all in the same cog? Just convert everything to PASM and run it using hub exec. I'm thinking of adding Spin to p2gcc. This way you could compile and link a program like this:

p2gcc main.c sub1.s sub2.spin

p2gcc would run the C compiler on main.c, the P2 assembler on sub1.s and spin2cpp on sub2.spin. The three files would then be linked together into one executable that would run in one cog. Of course, if you want to run parts of it in other cogs you would just do a cognew or coginit in your code. This way you could program a part of your project in Spin, another part in C and another part in assembly.

EDIT: I'm not sure of the value of including PropBASIC or Tachyon in the mix. PropBASIC compiles to PASM, so it probably wouldn't be too hard to interface to it. Tachyon runs in its own world, and I don't see much value in interfacing to it.

msrobots · 2017-06-22 03:35

Dave Hein wrote: »

...
I'm not sure of the value of include PropBASIC or Tachyon in the mix. PropBASIC compiles to PASM, so it probably wouldn't be too hard to interface to it. Tachyon runs in it's own world, and I don't see much value in interfacing to it.

And here I really disagree. Having 16 COGs makes it handy to have some FORTH running a G-Code interpreter that feeds a C stepper driver with data and spits out errors to a Spin main cog.

Same goes for any other language. Those who like them, like them; others do not.

But being able to combine them all in one project is a very nice thing. I, for example, like COBOL, PASM, SPIN, C# and C, in about that order.

Sure I am able to tweak some C driver to fit my needs, but I do like to program in PASM/SPIN.

You, or some other C programmer, might be able to tweak some SPIN/PASM/PropBASIC program (I leave out Tachyon here...) to fit your needs.

And - sorry if this is rude - but your idea to compile everything into one main program running on just one cog is just - hmm - wrong. I avoided words starting with s...

I do really appreciate your idea of - p2gcc main.c sub1.s sub2.spin

But this does not fit the bill. Say 4 cogs are running a Spin interpreter with OBEX drivers and 6 are running C. Mikster would like to use my project, but he swears by PropBASIC, so his main project is in BASIC. He needs to call Spin-interpreterCog2-obj3-fillparams(Z,X,Y) to provide some Z, Y, X parameters.

My goal is not to compile everything to C and then to PASM and run it on a single cog. That is - hmm - shortsighted.

Enjoy!

Mike

jmg · 2017-06-22 03:54

Dave Hein wrote: »

Why not run it all in the same cog? Just convert everything to PASM and run it using hub exec. I'm thinking of adding Spin to p2gcc. This way you could compile and link a program like this:....
p2gcc main.c sub1.s sub2.spin

Sounds useful. Can that control how many COGS the whole thing ends up running in/on ?

msrobots wrote: »

... Having 16 COG makes it handy to have some FORTH running a G-Code interpreter to feed some C-stepper driver with data and spit out errors to some Spin main Cog.
...
And - sorry if this is rude - but your idea to compile everything into one main program running on just one cog is just - hmm - wrong. I avoided words starting with s...

I can see both use cases being deployed.

COGs are completely independent but they are very code constrained, so there will be cases where code simply does not neatly fit into each COG.
Then, those overflow fragments will need to be managed, as will the whole memory map. Things are no longer nicely contained.

Next, someone may want to call 'other language' chunks as a library rather than as a 'stand-alone runtime'. In that use, Dave Hein's proposal is perfectly sensible, and not just - hmm - wrong.

msrobots · 2017-06-22 03:59

well jmg,

a cog is not code constrained anymore since we have Hub-Exec. But still independent.

Enjoy!

Mike

jmg · 2017-06-22 04:02

msrobots wrote: »

well jmg,

a cog is not code constrained anymore since we have Hub-Exec. But still independent.

Depends on your values of "But still independent" ?
Once it needs to eat space from the shared memory pool, most would call that no longer quite meeting 'still independent'.
Clearly, all 16 COGS do not get 512k each! The user will have to know, and manage, those now possibly conflicting and inter-dependent, demands.

Enjoy !

msrobots · 2017-06-22 04:17

@jmg,

exactly!

That is why I think a 'common' standard should be a goal to allow sharing the common resource HUB.

And 'common' would mean 'between Spin2 and C' to make it easy. Others may and can follow then.

But those two SHOULD work together from the beginning.

Sure, not every Cog has 512K, sadly.

But Spin and C are quite relocatable; it just needs to be done right in the first place.

Now we can run any Spin Program in 32K, same for C and Tachyon and PropBasic.

So 512K can support multiple parallel running Programs.

Enjoy!

Mike

cgracey · 2017-06-22 08:21

jmg wrote: »

Phil Pilgrim (PhiPi) wrote: »

I second the need for variable-length arg lists in Spin2. Printing to the terminal could then be so much easier.

This likely comes down to size. printf is famous for bloat effects on the smallest MCUs.

To have the least impact, and best performance, Spin 2 should ideally fit into a single COG, with no overflow/bleed effects to worry about & manage.
I'm not sure where Chip is up to on the 'features list' vs 'code size' trade offs....

Spin2 fits in one cog.

Dave Hein · 2017-06-22 11:29

msrobots, Spin, C, PropBASIC and Tachyon can all work together if you use the mailbox method. However, developing a standard mailbox method for the Prop has been quite elusive over the years. I wish you good luck in achieving that.

David Betz developed spinwrap, which uses the mailbox technique to interface Spin to C. Maybe you can expand on that idea.

msrobots, you seem to be violently against the idea of just compiling everything to PASM using the C calling convention, and just running everything on one cog. spin2cpp works, so I don't understand your objection to it. A Spin programmer can write all his code in Spin, and then compile it with spin2cpp. He never has to touch a line of C code. This results in a program that runs at a higher speed than interpreted Spin. It does require more memory than using bytecodes, but this is not an issue on the P2.

As I said, multiple cogs can be used. You just have to start them up yourself. That's no different than how Spin works today.

There are far fewer programs written in PropBASIC and Tachyon than there are in C and Spin. Integrating them with C and Spin runs into the law of diminishing returns. So I would suggest using the mailbox technique with those languages.

msrobots · 2017-06-22 19:18

All I am asking for is a way to call a Spin method running in a spin-interpreter cog from PASM.

This would solve the whole call-back issue in Spin without having to juggle with method pointers, Types, and all this in the Spin language.

It also would give a standard way to integrate a spin-object without hassle into all other languages able to run PASM stubs. Which basically all other languages for the Prop do.

And the reason I dislike compiling everything to PASM and using the C calling convention is that I think memory is still an issue on the P2. First of all, we now have 16 cogs and they have basically doubled in size, so the first 64K is already gone. Second, higher-resolution video is now possible and will eat a lot of memory. And third, in my experience you never have enough RAM.

Enjoy!

Mike

Dave Hein · 2017-06-22 19:50

Have you looked at SpinWrap? It seems to do what you want between C and Spin. The C code runs in one cog, and the Spin code runs in another cog.

Yes, I understand that any resource will be fully consumed no matter how large it is. However, P2's 512K of RAM is more than enough for almost any Propeller application that's been constructed to date. It is true that 512K is not enough to support high resolution displays. Hopefully, the P2 will be able to utilize external RAM to do that, which will leave plenty of room for code.

As far as 16 cogs requiring 64K of hub RAM, I don't think that will be the case in most situations. That assumes that each cog will fully utilize all of its cog RAM plus all of its LUT RAM, and that they all will run different code. It's more likely the cogs will only use a portion of the cog RAM, and multiple Spin cogs would use the same interpreter code. So realistically, you're probably talking about less than 16K of hub RAM required for cog images. Even if you did use 64K you would still have 448K remaining.
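The arithmetic behind those figures can be checked with a few lines of C. This is only a sketch: the per-cog sizes (512 longs of cog RAM plus 512 longs of LUT RAM) are assumptions chosen to match the 64K figure quoted in the thread, and the names `image_bytes` and `hub_bytes_left` are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

enum {
    NUM_COGS       = 16,          /* cog count discussed in the thread */
    COG_RAM_LONGS  = 512,         /* assumption: longs of cog RAM per cog */
    LUT_RAM_LONGS  = 512,         /* assumption: longs of LUT RAM per cog */
    BYTES_PER_LONG = 4,
    HUB_RAM_BYTES  = 512 * 1024   /* hub RAM size quoted in the thread */
};

/* Hub bytes needed to hold `images` distinct, completely full cog+LUT images. */
static uint32_t image_bytes(uint32_t images)
{
    assert(images <= NUM_COGS);
    return images * (COG_RAM_LONGS + LUT_RAM_LONGS) * BYTES_PER_LONG;
}

/* Hub bytes left over for code and data after reserving those images. */
static uint32_t hub_bytes_left(uint32_t images)
{
    return HUB_RAM_BYTES - image_bytes(images);
}
```

With these assumptions, `image_bytes(16)` is 65536 (64K) and `hub_bytes_left(16)` is 458752 (448K), the worst case described above. Four full images, for example when several Spin cogs share one interpreter image, come to 16K.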

Rayman · 2017-06-22 20:27

Why force Spin2 into one cog?

Well, maybe the core should fit in one cog.
But, I'd think about how to support a Spin2 floating point library.

People (like me) will run to C or C++ at the first hint they might want floating point...

idbruce · 2017-06-22 21:18

People (like me) will run to C or C++ at the first hint they might want floating point...

agreed

jmg · 2017-06-22 21:23

msrobots wrote: »

All I am asking for is a way to call a Spin method running in a spin-interpreter cog from PASM.

This is still a linker problem, and PUBLIC / EXTERN issues.
Overall, I'm not following what you hope to see here.

The Spin engine is inside a COG, and so that cannot be called into by any other COG.
The Spin bytecodes will be somewhere in HUB, but the pointers to those bytecodes & VAR RAM, are inside the COG.

You cannot "call a Spin method" from anywhere else, full stop.
You could 'request' a Spin method, via some agreed mailbox, but that is a handshake operation, not a call.
This would require the Spin be coded as a Slave, not a Master.

In theory, if you placed HLL functions, on a per function basis, outside of a COG and into HUB (this cost memory and speed, but does make them public), now you could physically call those from any COG, but extreme care is needed around any VARs those shared functions expect.
HUB vars must be separate (pointer based), otherwise they can corrupt, and COG local VARs are ok, but these must be placed in the same address, in all calling COGs.
Self modifying code is clearly forbidden.
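The "pointer based" rule for shared routines can be sketched in plain C. This is an illustration on a host compiler, not P2 code: `accum_state` and `accumulate` are hypothetical names, and each state block stands in for variables owned by one calling cog.

```c
#include <assert.h>
#include <stdint.h>

/* Per-caller state. Each calling cog would own one of these. */
typedef struct {
    uint32_t total;
    uint32_t calls;
} accum_state;

/* Shared routine: it touches nothing except the state block it is handed,
   so two callers with separate blocks cannot corrupt each other. */
static uint32_t accumulate(accum_state *st, uint32_t sample)
{
    assert(st != 0);
    st->total += sample;
    st->calls += 1;
    return st->total;
}
```

If `accumulate` kept `total` in a single global instead, every caller would be updating the same location, which is the corruption case described above.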

All that adds up to a quite complex linker, with many segments, and many 'coding rules'.....

jmg · 2017-06-22 21:51

Rayman wrote: »

Why force Spin2 into one cog?

Well, maybe the core should fit in one cog.
But, I'd think about how to support a Spin2 floating point library.

People (like me) will run to C or C++ at the first hint they might want floating point...

I see some ARM parts have a ROM section, that supports common math extensions.

Perhaps Spin2 can do something similar ? - have a carefully crafted float 'ROM' that can locate in HUB, thus be shared by all Spin2 COGS, and any other HLLs that have the same register-locations/param handling.

msrobots · 2017-06-23 23:14

@jmg,

yes I am aware of the mailbox vs. calling issue. But currently the Spin-Stack seems to be in the Hub, so should be accessible.

But after trying to follow the whole discussion about allowing Spin to do call-backs to Spin, adding templates/prototypes to Spin objects, moving around a 2(4?) LONG pointer and such, I thought it might be easier to just allow one entry method, like the C calling convention: put stuff on the stack, provide an identifier and get the result back. That identifier needs to be provided by the Spin compiler to the linker, like external C functions are provided by GCC to the linker.

Sure, this might need one Hub mailbox per interpreter, since the SPIN cog would be a slave then, but it would not clutter the whole Spin language. That mailbox could be the first stack address of Spin or such.

@David and @Dave,

after rereading my post I can see why you think I am against C. On the contrary, I have been pushing for GCC development for a couple of months now because, even though I do like SPIN, I really think that the P2 without serious C support is doomed.

And serious means that this needs to be an ongoing process, not like the P1 where, after a great start and further development, Parallax did not update SimpleIDE to the latest PropGcc, even after years!

And PropGcc itself is also behind GCC, because nobody is in charge and getting paid to handle the ongoing tasks.

@Chip is not needed (or not much) to get PropGcc running again, but the people who created PropGcc in the first place are. And somehow I feel they need to get paid to do that. I have earned my living through programming for 30+ years and expect to get paid for my effort. So should they.

And like @Chip they might discover that a slight change in one or two instructions could save a lot of work for the c-compiler and code-size/run-time for the created binaries, IF we would have known that before the final silicon was created.

But as far as I know none of them is working for Parallax or seeking full-time employment there.

After being in Rocklin and meeting some of the staff, I think it is quite a good place to work, but I would not qualify for the job, nor am I actively seeking other employment.

And the final reason is that the only COBOL compiler I can use now transpiles COBOL programs to C (or C++) and runs them through GCC/clang/MS-C/you name it. There just has to be some sort of POSIX support.

And unlike the P1, the P2 has the horsepower and RAM to run COBOL usefully.

But without PropGCC, I will never get to running COBOL on the P2.

Unless @Heater. rescues me and gets CP/M running on ZICOG on a P2. But that would be cheating, I am told.

Enjoy!

Mike

jmg · 2017-06-23 23:43

msrobots wrote: »

@jmg,
yes I am aware of the mailbox vs. calling issue. But currently the Spin-Stack seems to be in the Hub, so should be accessible.

I'm still not following; accessible is a long way short of usable.
The stack-space may be in the hub, but you have no way of knowing where in that Stack space Spin is currently working, or when it is safe to disturb the stack.

I still see this as more a linking issue - all languages that want to co-operate across COGS, need a means to declare EXTERN and PUBLIC names, & those are linker directives, that the linker uses to then overlay at the same memory.

Once you have that, you can pass information between any COGS by agreement on flags and content of those records.

DLL handling on PCs is somewhat similar, but I'm not sure P2 could manage DLLs (or would want to).

Heater. · 2017-06-23 23:51

msrobots,

...the only COBOL compiler I can use now transpiles COBOL-Programs to C (or C++) and runs them thru GCC/clang/MS-C/ you name it....

So....we can transpile COBOL to C/C++. We can then transpile that C/C++ to JavaScript and run the thing in a browser.

That's nuts of course, therefore it is something that needs to be done. Besides, it would put a frown on the faces of enthusiasts of all three languages.

Do you have a COBOL program that might be entertaining for this purpose? Something not too big.

jmg · 2017-06-23 23:58

Heater. wrote: »

.... Besides it would put a frown on the faces of enthusiasts of all three languages

That alone makes it worth doing

msrobots · 2017-06-24 00:32

OK, we are getting nearer.

I actually found an example of somebody here using the ELF linker from PropGcc to support the old overlays used in PC-XT times. And it did work with a C program on a P1; I was really surprised.

Then I looked a bit more into the ELF linker and most of it went straight over my head. But at that moment I got the idea that if SPIN and C (and others) could somehow agree on using the ELF linker, interesting things could happen, because a lot of what the linker can do could be reused for a smart loader for the P2.

For example COG images can now be loaded into a COG on a P2 and terminated, leaving the COG image in the COG. Later on you can start a COG without loading it again at a given address.

So the loader could support preloading certain sections into COGs before loading the main program. It could preload a SPIN interpreter into COGx, with no wasted HUB memory. The same goes for other COG/LUT images, which could be preloaded too.

EXTERN and PUBLIC names (in HUB) could also be used if the SPIN-compiler would (optionally?) support ELF-Linking.

Then SPIN can share named memory locations with C (and others) and can call C (and other) EXTERNAL functions from Hub Exec PASM. Nice.

But calling from C to SPIN(interpreter.obja.methodb), or SPIN(interpreterX) to SPIN(interpreterX/Y.obja.methodb) or PASM to SPIN(interpreterX.obja.methodb) has to be solved.

So if a Spin-Interpreter running in Cog X could be 'called' by some PASM running from another COG, all of this would be solved at once.

And I think if the SPIN2 interpreter could provide some way to allow that (even polling some mailbox per COG in one of the interrupts), then all of those calling/linking problems could be solved nicely.

Without much change in the SPIN language: allow 'PEX' or 'EXT' to define a method/function/procedure as externally accessible, and provide a way to call it.

I am not sure how to explain this better; for me it is a quite obvious solution.

SpinWrap does this for C, somehow, but if the Spin-Interpreter would be able to provide that by itself, we would be able to use some C-calling variant from all languages supporting PASM.

Enjoy!

Mike

Dave Hein · 2017-06-24 02:22

The SpinWrap method is pretty straight-forward. The Spin program is just running in a polling loop waiting for a command from the C program. When the Spin program gets a command it figures out which method is being called, and then calls it. The calling parameters and the return values are passed through the mail box along with the command.

The magic in SpinWrap is in the code that it generates for C and for Spin to implement this. David Betz did a good job in putting this all together.
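The polling-mailbox handshake described above can be sketched in C. To be clear, this is not SpinWrap's generated code: the `mailbox` layout, the method IDs and both function names are assumptions made up for illustration, and the server loop is reduced to a single pass so it can be run on a host.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

enum { CMD_IDLE = 0, CMD_ADD = 1, CMD_NEGATE = 2 };  /* hypothetical method IDs */

typedef struct {
    atomic_int cmd;      /* CMD_IDLE means the mailbox is free */
    int32_t    args[2];  /* calling parameters */
    int32_t    result;   /* return value */
} mailbox;

/* Caller side: write the parameters first, then publish the command. */
static void mailbox_post(mailbox *mb, int cmd, int32_t a, int32_t b)
{
    assert(atomic_load(&mb->cmd) == CMD_IDLE);
    mb->args[0] = a;
    mb->args[1] = b;
    atomic_store(&mb->cmd, cmd);
}

/* Server side: one pass of the polling loop. Returns 1 if a call was served. */
static int mailbox_poll_once(mailbox *mb)
{
    int cmd = atomic_load(&mb->cmd);
    if (cmd == CMD_IDLE)
        return 0;
    switch (cmd) {                 /* figure out which method is being called */
    case CMD_ADD:    mb->result = mb->args[0] + mb->args[1]; break;
    case CMD_NEGATE: mb->result = -mb->args[0];              break;
    default:         mb->result = 0;                         break;
    }
    atomic_store(&mb->cmd, CMD_IDLE);   /* tells the caller the result is ready */
    return 1;
}
```

In a real setup the caller would spin (or sleep) until `cmd` returns to `CMD_IDLE` and then read `result`, while the server cog calls the equivalent of `mailbox_poll_once` in an endless loop.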

msrobots · 2017-06-24 02:43

Then this basically does what I like the spin-interpreter to offer by itself.

Enjoy!

Mike

msrobots · 2017-06-24 02:49

Except that the spin interpreter runs its own Program and just checks the mailbox once in a while to 'run calls from external COGs'

Mike

Dave Hein · 2017-06-24 02:59

That seems like more than you were asking for at first. This has morphed into independent programs doing their own thing, but they can also task switch when commanded by another cog.

Phil Pilgrim (PhiPi) · 2017-06-24 05:19

What you guys are describing is nothing more than a remote procedure call (RPC), but between processors on the same chip, rather than something miles away on another server.

-Phil

msrobots · 2017-06-24 05:23

no, not task switch. Just interrupt

Just run the method and return the result while the rest of the normal program keeps running, merely delayed by the call.

The called method should be run by the called SPIN COG, in the SPIN memory space of the running Spin interpreter, returning SPIN values.

It's not about 'including' a Spin object, it's about 'cooperating'.

And that is what I have been trying to explain the whole time, but I seem to be too stupid to do so.

Most programs I have looked at on the P1 do not really work in parallel. Some serial drivers do, and Lonesock's FSRW SPI does, but usually a COG calls a mailbox AND WAITS FOR THE RESULT.

So we are single processing on multiple execution units.

With 16 COGS we can finally try to use the 'SLICES' and run real parallel programs, not pseudo parallel sequential ones.

That is why I think compiling everything together into one PASM block and running it on one core is not the right solution for a development system with 16 independent execution units.

To get the uniqueness of the P2 out we will need a different approach to linking and combining objects running on different cores.

Maybe Blockly and @Ken are on the right track there with just allowing multiple block diagrams next to each other.

just thinking,

Mike

David Betz · 2017-06-24 09:36

Phil Pilgrim (PhiPi) wrote: »

What you guys are describing is nothing more than a remote procedure call (RPC), but between processors on the same chip, rather than something miles away on another server.

-Phil

Yes, that is exactly correct. I'm not sure how else you do it on P1. Its entire purpose was to allow Spin code to run as-is rather than translating it into C/C++ or PASM.

jmg · 2017-06-26 21:00

Dave Hein wrote: »

The SpinWrap method is pretty straight-forward. The Spin program is just running in a polling loop waiting for a command from the C program. When the Spin program gets a command it figures out which method is being called, and then calls it. The calling parameters and the return values are passed through the mail box along with the command.

The magic in SpinWrap is in the code that it generates for C and for Spin to implement this. David Betz did a good job in putting this all together.

Thinking some more about cross-cog/cross-language handling, one simple and broadly general model to use would be a SoftwareFIFO - that can be used for both data flows, and parameter lists. These have an Array of agreed Size, and Rd & Wr pointers.
Manual-overlay via some absolute address does work, but a linker PUBLIC, EXTERN would be more general.
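A software FIFO of that shape (array of agreed size, plus Rd and Wr pointers) might look like the following in C. It is a minimal single-producer, single-consumer sketch run on a host: the names `fifo`, `fifo_put` and `fifo_get` and the size of 8 are assumptions, and `volatile` only hints at the shared-memory intent; real cross-cog use would depend on how hub writes are ordered.

```c
#include <assert.h>
#include <stdint.h>

#define FIFO_SIZE 8u   /* agreed size; a power of two so indices wrap with a mask */

typedef struct {
    volatile uint32_t wr;        /* advanced only by the producer */
    volatile uint32_t rd;        /* advanced only by the consumer */
    uint32_t data[FIFO_SIZE];
} fifo;

/* Producer: returns 1 on success, 0 if the FIFO is full. */
static int fifo_put(fifo *f, uint32_t value)
{
    assert(f != 0);
    if (f->wr - f->rd == FIFO_SIZE)
        return 0;
    f->data[f->wr & (FIFO_SIZE - 1u)] = value;
    f->wr = f->wr + 1u;          /* publish only after the data is in place */
    return 1;
}

/* Consumer: returns 1 and stores the oldest entry, or 0 if the FIFO is empty. */
static int fifo_get(fifo *f, uint32_t *value)
{
    assert(f != 0 && value != 0);
    if (f->wr == f->rd)
        return 0;
    *value = f->data[f->rd & (FIFO_SIZE - 1u)];
    f->rd = f->rd + 1u;
    return 1;
}
```

Because each pointer has exactly one writer, neither side needs a lock; the counters are free-running and only masked when indexing the array.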

Next obvious question is how compact can the polling loop be, and what power does it use ?

I think Chip has mentioned you can write 4 bytes atomically and read one long, which means you could poll up to 4 COGs flags in one HUB read.
Above 4, the RDFAST with 4 longs, (not avail in hubexec) would capture 16 flags in one eggbeater pass, how fast can those be tested ?
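The flag-packing idea (one byte per requesting cog, several bytes read back as one long) can be sketched in host C. Nothing here models RDFAST or hub timing; `flag_block` and the three helpers are hypothetical names, and the layout of 16 one-byte flags in 4 longs simply follows the numbers above.

```c
#include <assert.h>
#include <stdint.h>

enum { FLAG_LONGS = 4, FLAGS_PER_LONG = 4 };   /* 16 one-byte flags in total */

typedef union {
    uint32_t longs[FLAG_LONGS];
    uint8_t  bytes[FLAG_LONGS * FLAGS_PER_LONG];
} flag_block;

/* Requesting cog: a single byte write raises its own flag. */
static void raise_flag(flag_block *fb, unsigned cog)
{
    assert(cog < FLAG_LONGS * FLAGS_PER_LONG);
    fb->bytes[cog] = 1;
}

/* Poller: one long read covers the flags of four cogs. */
static int group_pending(const flag_block *fb, unsigned group)
{
    assert(group < FLAG_LONGS);
    return fb->longs[group] != 0;
}

/* Poller: scan all four longs and return a 16-bit mask of requesting cogs. */
static uint16_t pending_mask(const flag_block *fb)
{
    uint16_t mask = 0;
    for (unsigned g = 0; g < FLAG_LONGS; g++) {
        if (fb->longs[g] == 0)
            continue;                      /* four idle cogs skipped in one test */
        for (unsigned i = 0; i < FLAGS_PER_LONG; i++) {
            unsigned cog = g * FLAGS_PER_LONG + i;
            if (fb->bytes[cog])
                mask |= (uint16_t)(1u << cog);
        }
    }
    return mask;
}
```

Since each cog writes only its own byte, no read-modify-write is needed on the shared long, which is the point of using bytes rather than bits.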

A RDLONG / Test / Loop looks like it can loop inside a hub-slot window, at least in COG exec. Is Hubexec still < slot time ?

During those SysCLK cycles the polling is waiting, I'm guessing the COG power drops to WAITxx equivalent ?

I don't see a single opcode for Compare and short jump, which could save power & time here ?

David Betz · 2017-06-26 21:03

jmg wrote: »

Dave Hein wrote: »

The SpinWrap method is pretty straight-forward. The Spin program is just running in a polling loop waiting for a command from the C program. When the Spin program gets a command it figures out which method is being called, and then calls it. The calling parameters and the return values are passed through the mail box along with the command.

The magic in SpinWrap is in the code that it generates for C and for Spin to implement this. David Betz did a good job in putting this all together.

Thinking some more about cross-cog/cross-language handling, one simple and broadly general model to use would be a SoftwareFIFO - that can be used for both data flows, and parameter lists. These have an Array of agreed Size, and Rd & Wr pointers.
Manual-overlay via some absolute address does work, but a linker PUBLIC, EXTERN would be more general.

Next obvious question is how compact can the polling loop be, and what power does it use ?

I think Chip has mentioned you can write 4 bytes atomically and read one long, which means you could poll up to 4 COGs flags in one HUB read.
Above 4, the RDFAST with 4 longs, (not avail in hubexec) would capture 16 flags in one eggbeater pass, how fast can those be tested ?

A RDLONG / Test / Loop looks like it can loop inside a hub-slot window, at least in COG exec. Is Hubexec still < slot time ?

During those SysCLK cycles the polling is waiting, I'm guessing the COG power drops to WAITxx equivalent ?

I don't see a single opcode for Compare and short jump, which could save power & time here ?

Do you really need to do polling at all on P2? Can't one COG interrupt another? Why not just have the server COG just idle at low power waiting for an interrupt?

jmg · 2017-06-26 21:12

David Betz wrote: »

Why not just have the server COG just idle at low power waiting for an interrupt?

Yes, that is what I'm driving towards - a means for lowest power idles.

David Betz wrote: »

Do you really need to do polling at all on P2? Can't one COG interrupt another?

I've not followed interrupts in depth, but I see there are only 3, and it's not clear how the originator identifies itself.

I do now see a WAITATN, which sounds close, but lacks ID. Sounds like that does save power.

So maybe WAITATN and RDFAST, paired, can wake and then get info from up to 16 (15) possible requestors ?
Is 4 longs (16 bytes) the most compact atomic flag scheme ?

Yanomani · 2017-06-27 06:03

Hi jmg

Some other options to consider.

There are three instantly accessible long repository positions built into each smart pin, but we can't do atomic read-modify-write operations on them.

For each Cog that is polling the specific pin, IN is raised to flag that a new value was loaded into any one of the repos, but I didn't see any discriminator to flag which long was written by the sender Cog.

Since there are 16 Cogs and 3 x 64 = 192 repos, that can lead to a protocol where only one Cog writes, say, into the first long repository of the pin whose number coincides with its own number, or of any other predefined pin-number space (e.g. Cog[0]=>Pins[3:0], ..., Cog[15]=>Pins[63:60]).
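The pin-number space in that example is simple arithmetic, sketched here in C. The function names are made up for illustration, and the mapping is just the Cog[n]=>Pins[4n+3:4n] convention from the example, not anything the hardware enforces.

```c
#include <assert.h>

enum {
    COG_COUNT     = 16,
    PIN_COUNT     = 64,
    REPOS_PER_PIN = 3,
    PINS_PER_COG  = PIN_COUNT / COG_COUNT   /* 4 pins per cog */
};

/* Total long repositories across all smart pins. */
static unsigned total_repos(void)
{
    return PIN_COUNT * REPOS_PER_PIN;
}

/* Lowest pin of the block that `cog` alone is allowed to write. */
static unsigned first_pin_of(unsigned cog)
{
    assert(cog < COG_COUNT);
    return cog * PINS_PER_COG;
}

/* Highest pin of that block. */
static unsigned last_pin_of(unsigned cog)
{
    return first_pin_of(cog) + PINS_PER_COG - 1;
}

/* The sole writer for a given pin under this convention. */
static unsigned owner_of_pin(unsigned pin)
{
    assert(pin < PIN_COUNT);
    return pin / PINS_PER_COG;
}
```

Any reader can then tell who sent a value from the pin number alone, which makes up for the missing discriminator mentioned above.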

RQPIN can be used by any Cog, to enable some "quietly sniffing" flag/data gathering mode.

The would-be answering Cog, if any, could write its own number-related bit position into, say, the second long, then wait some minimum number of cycles before doing a RQPIN to check whether it won the bid.

If not, it can kill itself out of frustration at not being as fast as the winner, or at not receiving permission to bite a piece of the cake!

With three repos in each pin, things like Request/Grant/Grant Acknowledge can be a breeze.

In extremis, locks could be used to ensure that flags aren't lost when two or more Cogs simultaneously try to modify the repos' contents, but I hope no one would feel the need to go that far.

There are some nice trigger/helper mechanisms spread among the pins and in the hub, to the point that I can't remember or mention them all at a glance. :cool:

So as not to appear so dense: protocol details could vary as dreamed up or intended by any ingenious programmer.

Henrique
