Understanding _cogstart
I'm trying to fully understand what happens at a low level with a call to _cogstart.
The signature is int _cogstart(void (*func)(void *), void *arg, void *stack_base, uint32_t stack_size);
My questions are
1. I notice there's no cog id argument. Is there no deterministic way to start on a specific cog?
2. I've seen some examples that define a struct for arg, and an array for stack and am wondering what happens and what ends up in memory where on init. At a low level I assume that the code in the function is ending up in the new cog RAM, but what's happening with the args and stack on the new cog and the original cog?
Comments
I'm telling you Catalina's answer, but I think it would be similar for the other C compilers:
This function will probably use any available cog (and return the cog id it actually used). There is probably a similar function that also accepts a specific cog as an additional parameter. In Catalina these are _cogstart_C(func, arg, stack, size) and _cogstart_C_cog(func, arg, stack, size, cog).
The arg is type "void *", which is a typical C way of effectively saying "any type" - i.e. either a pointer or another type the size of a pointer or smaller. As long as the caller and callee agree on what the type is, this is fine. This means it can be a pointer to a struct if the definition of the struct is known to both. As for the stack, it can be an array name, or a pointer to RAM. In C these are both just pointers. Yes, the code of the C function is what ends up being executed by the new cog. Finally, you have no guarantee that the stacks of the caller and callee are at all related.
Ross.
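To make the "struct for arg, array for stack" pattern concrete, here is a minimal sketch that runs on a PC rather than on a Propeller. fake_cogstart, blink_args_t, blink_worker and start_blinker are all hypothetical names made up for this illustration; fake_cogstart only mimics the shape of the _cogstart signature and calls the function directly instead of starting a cog. Passing the stack size in bytes is also an assumption, so check your compiler's header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Parameter block: caller and callee must agree on this layout. */
typedef struct {
    int pin;            /* pin the worker would drive (unused in this host sketch) */
    int blink_count;    /* how many times to "blink" */
    int toggles;        /* written by the worker so the caller can see progress */
    volatile int done;  /* set by the worker when it has finished */
} blink_args_t;

/* The new cog's stack is just an array in ordinary (hub) RAM. */
static uint32_t worker_stack[64];

/* Worker entry point: same void (*)(void *) shape that _cogstart expects. */
static void blink_worker(void *arg)
{
    blink_args_t *p = (blink_args_t *)arg;  /* cast back to the agreed type */
    for (int i = 0; i < p->blink_count; i++) {
        p->toggles++;   /* a real worker would toggle p->pin here */
    }
    p->done = 1;
}

/*
 * HYPOTHETICAL host stand-in for _cogstart (an assumption, not a real API).
 * It runs func immediately and pretends it ran on cog 1. The real call
 * starts another cog and returns to the caller straight away.
 */
static int fake_cogstart(void (*func)(void *), void *arg,
                         void *stack_base, uint32_t stack_size)
{
    assert(func != NULL && stack_base != NULL && stack_size > 0);
    func(arg);
    return 1;
}

/* Caller side: hand over a pointer to the struct and the stack array. */
int start_blinker(blink_args_t *args)
{
    return fake_cogstart(blink_worker, args,
                         worker_stack, (uint32_t)sizeof(worker_stack));
}
```

Note that only the address of the struct is handed over, nothing is copied. On real hardware both cogs would be looking at the same memory, which is why the done flag is declared volatile.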
To further expand upon Ross's answer: most C compilers keep the stack in HUB memory, which is why it has to be specified explicitly (and has to be different from the original COG's stack). Similarly, the C code is usually kept in HUB memory too, so all the COGs can execute the same copy of it. On the P2 the processor can do this directly; on the P1 a trick called "LMM" (Large Memory Model) is used, which is basically a very fast interpreter that reads instructions from HUB and executes them in the COG.
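As a rough analogy for the LMM idea (a small loop that lives in the cog, fetches one instruction at a time from hub and executes it), here is a toy fetch-and-execute loop in portable C. The opcodes, toy_insn_t and run_from_hub are invented for this sketch. A real LMM kernel fetches native Propeller instructions and executes them in place rather than decoding made-up opcodes, so treat this only as an illustration of the fetch/advance/execute cycle.

```c
#include <stddef.h>
#include <stdint.h>

/* Made-up opcodes for the toy; NOT Propeller instructions. */
enum { OP_HALT = 0, OP_LOAD = 1, OP_ADD = 2 };

typedef struct {
    uint8_t op;
    int32_t operand;
} toy_insn_t;

/*
 * The array "hub" plays the role of hub memory holding the program.
 * This loop plays the role of the small kernel resident in the cog.
 */
int32_t run_from_hub(const toy_insn_t *hub, size_t len)
{
    int32_t acc = 0;
    size_t pc = 0;

    while (pc < len) {
        toy_insn_t insn = hub[pc++];   /* fetch from "hub", advance pc */
        switch (insn.op) {             /* execute inside the "cog" */
        case OP_LOAD: acc = insn.operand;  break;
        case OP_ADD:  acc += insn.operand; break;
        case OP_HALT: return acc;
        default:      return acc;      /* unknown opcode: stop */
        }
    }
    return acc;
}
```

Because the program stays in the shared array, any number of such loops could execute the same copy of it, which is the point made above about all COGs sharing one copy of the C code.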
Whether you can specify an explicit COG or not is probably compiler dependent. Catalina can do this. I don't think propgcc has an option for it, and FlexProp doesn't either as of 5.9.24. That's an oversight on my part, and I'll add defines to propeller2.h which match Ross's declarations above. They'll be in the next release (5.9.25), which will be coming out soon.
Thanks for the replies. I should have noticed the int return type on _cogstart.
I see the point of putting the code in HUB memory, but from a performance standpoint isn't it slower to read instructions from Hub RAM (assuming the code would fit in Cog RAM)? Is the approach usually to try it that way (in HUB memory) and see if it's fast enough?
Is the stack in HUB memory only because the Cog stack is limited to 8 levels? I see some example code that is defining the stack and passing the size to _cogstart but not referencing it again explicitly. How would one determine the size of the stack they need?
@ersmith, by soon you meant within 24 hours? Did the change to propeller2.h make it into the release? I'm up to date with origin/master and don't see it.
I had a read of the FAQ and I think it brought some clarity on the process. Please correct me if I misunderstand:
On the P2 execution from hub is almost as fast as execution from cog, except for branches (which have to re-fill the instruction fifo and so are relatively slow). Some of the C compilers can use cog memory as a cache for small loops to get around this. Putting the whole program in cog memory is impractical except for very small programs.
Yes, the built-in stack is far too small for use with a C compiler. Compilers typically use the stack to save context, and also for local variables.
The stack should only be used implicitly, by the compiler; trying to use it explicitly would be a bad idea (unless you stop the cog first, of course). Determining the size is tricky, but basically it needs to be big enough to hold all local variables in all active functions (so if A() calls B() calls C() all three of those functions are "active" at the same time), plus a bit of housekeeping room for internal purposes (roughly 32 bytes per active function, but that depends). Typically 1K of stack is enough for code that doesn't allocate any arrays on the stack.
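The sizing rule of thumb above can be turned into a quick back-of-the-envelope calculation. This is only a sketch: estimate_stack_bytes and FRAME_OVERHEAD_BYTES are hypothetical names, the 32 bytes per active function is the rough figure quoted above (it varies by compiler and function), and the local-variable sizes have to come from your own reading of the code or the listing file.

```c
#include <stddef.h>

/* Rough per-call housekeeping figure from the discussion; an estimate only. */
#define FRAME_OVERHEAD_BYTES 32u

/*
 * Estimate the stack needed for one call chain.
 * locals[i] is the number of bytes of local variables in the i-th
 * function that is active at the same time (A calls B calls C ...).
 */
size_t estimate_stack_bytes(const size_t *locals, size_t depth)
{
    size_t total = 0;
    for (size_t i = 0; i < depth; i++) {
        total += locals[i] + FRAME_OVERHEAD_BYTES;
    }
    return total;
}
```

For example, a chain A() -> B() -> C() with 16, 64 and 8 bytes of locals comes to (16 + 64 + 8) + 3 * 32 = 184 bytes, comfortably inside the 1K figure mentioned above. Run it over the deepest call chain you expect, and leave some margin.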
Most of the compilers have an option to show a listing of the code they've generated, which you may find useful if you're interested in the low level operation of them.
It is in the 5.9.26 release, yes (you are talking about flexspin/flexprop, right?). _cogstart_C and _cogstart_C_cog are actually macros in FlexC.
No, coginit/cogstart of a C function does not copy anything into cog memory, it just runs the code directly from hub. At least that's true for FlexC, Ross can answer for Catalina but I think it's the same there.
For assembly language programs (written directly in PASM) the code typically is loaded into COG memory, but for that you have to use _cogstart_PASM rather than _cogstart_C.
You can see the low level details if you look at the listing file (in flexprop it's under the File > Open listing file... menu). It's not commented, because it's generated by machine, but the initialization code takes two paths depending on the value of the ptra register. If ptra is 0, it sets everything up for the main cog program (cog 0). If ptra is nonzero then it jumps to the ill-named "spininit" function to initialize to run a high level function. In this case ptra contains the beginning of the new stack area, which the compiler has set up with a pointer to the function to run and the parameters for that function. It pops those off the stack and then does an indirect jump to the new function. When that function returns it jumps to an exit routine which shuts down the cog.
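The two startup paths described above can be modelled in plain C on a PC. This is a simulation under stated assumptions, not the generated FlexC code: start_record_t, prepare_stack, startup and bump are hypothetical names, and the exact layout of what gets placed at the start of the stack is a guess made for illustration. The ptra value is modelled as an ordinary pointer.

```c
#include <stddef.h>
#include <string.h>

typedef void (*cog_func_t)(void *);

/* What the launcher leaves at the start of the new stack (assumed layout). */
typedef struct {
    cog_func_t func;  /* function the new cog should run */
    void *arg;        /* its single parameter */
} start_record_t;

/* Launcher side: store the record at the stack base; the returned
 * pointer stands in for the value passed to the new cog in ptra. */
void *prepare_stack(void *stack_base, cog_func_t func, void *arg)
{
    start_record_t rec = { func, arg };
    memcpy(stack_base, &rec, sizeof rec);
    return stack_base;
}

/* New-cog side. Returns 0 for the main-program path, or 1 after
 * running the requested function (where a real cog would shut down). */
int startup(void *ptra)
{
    start_record_t rec;

    if (ptra == NULL) {
        return 0;                    /* ptra == 0: set up the main program */
    }
    memcpy(&rec, ptra, sizeof rec);  /* "pop" function pointer and argument */
    rec.func(rec.arg);               /* indirect jump to the function */
    return 1;                        /* exit routine would stop the cog here */
}

/* Sample worker for trying the sketch out: increments an int. */
void bump(void *arg)
{
    (*(int *)arg)++;
}
```

In this model the stack buffer does double duty: it first carries the function pointer and argument to the new cog, and once those have been read it is free to be used as the function's stack.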
Learning a lot, thanks. Of course, that just brings more questions, but I'll try to absorb this for a bit. I do have one though.
_cogstart_PASM is a macro that just calls _coginit, so that will do a copy to Cog RAM? Is a C function pointer value valid to pass as *pgm to force a copy to Cog RAM (assuming it would fit)?
No, that won't work, because C code (at least in general) expects the cog to be set up in a certain way, e.g. that ptra points to the new stack, that certain internal routines are available in cog memory already, and so on. All of that setup happens "under the hood" in _cogstart_C.
To write COG code in C, at least with FlexC, you'll have to do a 2 step process -- manually build the COG functions by running spin2cpp with the --code=cog flag, then put the generated assembly into the project with your main code (treating it as any other assembly). This is a pretty awkward process. Really, I wouldn't recommend it unless it's really vital -- HUB code is almost as fast.
Oh, it is possible to put small functions permanently into COG or LUT ram by adding attribute(cog) or attribute(lut) after them (see the general compiler documentation, in general.md). These will be copied into all COGs that get started, so they have to be really small.
Makes sense. Can you point me to general.md? I can't find it in your repos.
spin2cpp/doc/general.md
Yes, all pretty much the same for Catalina - no C code is copied into cog memory by the cogstart functions, just the stuff necessary to enable the cog to execute the C code from Hub RAM or external RAM.
Catalina doesn't have a cog execution mode for C, because the cog limitations make it too difficult to conform to the ANSI C standard, and I didn't want to have a subset of the language for something that had so little practical use.
Wow, major fail github search. Seems like it will only match content that says general.md, not file names = general.md
Interesting. I'm new to the Propeller and have only dabbled in coding this close to the hardware previously. It's good to find out about the motivations. I guess if it comes down to something that needs increased performance directly on a cog I'll learn a little PASM.
spin2cpp is a submodule, which actually lives in another repository, so you can either search here: https://github.com/totalspectrum/spin2cpp/tree/21bb446d0faf42fc521555f7f9682a5887233714
OR, you can just download the flexprop ZIP: https://github.com/totalspectrum/flexprop/archive/refs/heads/master.zip
Or, possibly if you can get to a command line (macOS, Linux, WIN?):
And you'll get all of the files needed for flexprop and will have access to all of its documentation.
dgately
On the Propeller you really do have to program some things in PASM. No compiler could ever effectively use some of the programming features that are available in the Propeller instruction set and smartpins. And you need to use them to use the Propeller to its full potential.
I seem to have the _cogstart_C_cog definitions now in propeller2.h (after updating submodules), but I get this error when I try to compile with a call to it:
error: unknown identifier __builtin_cogstart_cog used in function call
You'll also need to rebuild the compiler (flexspin/spin2cpp). __builtin_cogstart_cog is built in to the 5.9.26 compiler but is not present on earlier compilers.
Got it. Working now. Thanks.
If I had a small PASM program, how would I include it in C and get a pointer to it that could be passed to _cogstart_PASM ?
In the FlexProp "samples" directory are two examples of how to do this, led_server_asm.c (using C style expressions and comments) and led_server_pasm.c (using Spin style). I've attached the led_server_asm.c program here. It uses _cognew() rather than _cogstart_PASM(), but the difference between the two is trivial.
Perfect, thanks.
Sorry, a bit late to this one. But just for completeness ...
In Catalina you can compile the PASM as normal, then use spinc (P1) or bindump (P2) to convert the compiled version into an array of binary data, and then use _coginit (P1) or _cogstart_PASM (P2) on that array. There is an example for the P1 in catalina\demos\spinc and for the P2 in catalina\demos\p2.
An even simpler option is to just format your PASM program as a Catalina object (there are some naming and call conventions you have to conform to); then you can use your PASM function exactly as you would a C function. There is an example of this for the P1 in catalina\demos\spinc - it would be similar for the P2, as the conventions are identical even though the PASM is not. Here is the actual P1 code:
On the P1 you can also load and run spin programs. Haven't tried that on the P2, but I should do so just to make sure it works. I've never actually used Spin on a P2 - there always seemed to be too many incomplete or incompatible Spin compilers. I assume that's all sorted now.
Ross.