Shared code between LMM and COGM cogs.
TomUdale
Posts: 75
Hi again,
The journey of discovery continues. I am now trying to do some real stuff with my board.
Because we have a lot of IO, there is a data bus hanging off the propeller. I have written a c module that provides a clean API to that data bus. So for example, there are
void WriteLEDRed(int led);
void WriteLEDOff(int led);
functions that turn LEDs on and off, hiding all the details of which pins the address/data/strobe lines are on, which bits of which CPLD register the LEDs are on, and so forth. There are piles of other functions that handle other devices (relays, ADC, DAC, etc.) that are also sitting on the bus. All these functions are lock-protected so they can be called from any cog without corrupting the various write shadows, address lines, etc. that need to be twiddled during a bus cycle.
Now here comes the problem. Several of these devices on the bus are logically separated. I need to be able to have multiple cogs doing different things all talking to this bus. A few of the cogs (in particular the one touching the ADC/DAC which is doing some signal processing) need to run as fast as possible. Thus I want those cogs to be COGM.
Other cogs, for example one that is running the status LEDs, can and should run LMM. The trouble is, how do I share this C module between the two spaces?
It almost seems like I need to build it twice, once in a .cogm file and once in a regular c file with some preprocessor magic to give the two versions of each function different names.
But I am not even sure if I can put shared code in a COGM. Indeed I think I saw that COGM functions cannot call outside the module they reside in. It would not be impossible that I end up with >1 cog running COGM and >1 cog running LMM all talking to the data bus.
Is anyone dealing with this sort of problem?
Best regards,
Tom
Comments
Wouldn't the usual solution to this be a mail-box / small FIFO per message path ?
That avoids needing calls across COGS and one cog is in charge of the DataBus.
It also prevents any lock protection stalls, from affecting the COGs operation
However, it might be worth examining your code carefully to see if it really needs to be in COG mode. LMM code can run at COG speed in short loops by using FCACHE ("fast cache", I think it stands for -- Bill Henning came up with the idea and name). The compiler does this automatically for small loops. You can also compile a complete function to run in FCACHE, so long as it is small enough and does not call out to other functions. Just prefix the function with "__attribute__((fcache))". At run time when the function is called it will be loaded into COG memory and executed entirely from there, at the same speed as native COG code. The SimpleSerial driver does this to ensure timing is correct.
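A minimal sketch of what that prefix could look like on a small leaf function. The function names, the 12-bit DAC range, and the use of the __PROPELLER__ macro to guard the attribute are assumptions made for illustration; only the "__attribute__((fcache))" spelling comes from the post above. The math is deliberately kept to adds and compares so that nothing calls out to a helper routine.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: PropGCC predefines __PROPELLER__; on any other compiler the
   attribute is dropped so the sketch still builds on a host machine. */
#if defined(__PROPELLER__)
#define FCACHE_FN __attribute__((fcache))
#else
#define FCACHE_FN
#endif

#define DAC_MAX 4095   /* hypothetical 12-bit DAC range */

/* Hypothetical stand-in for BunchOfMath(): small, loop-free, no calls out. */
FCACHE_FN int32_t bunch_of_math(int32_t desired, int32_t ad0, int32_t ad1)
{
    int32_t v = desired + ad0 - ad1;
    if (v < 0) v = 0;
    if (v > DAC_MAX) v = DAC_MAX;
    return v;
}

/* Ordinary LMM caller; the assert is a host-side sanity check only. */
int32_t control_step(int32_t desired, int32_t ad0, int32_t ad1)
{
    assert(desired >= 0 && desired <= DAC_MAX);
    return bunch_of_math(desired, ad0, ad1);
}
```

Whether the load into COG memory pays for itself on a function this short is exactly the kind of thing that needs measuring on the real board.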
I'm not sure if SimpleIDE handles this or not, but COG code can certainly be linked together. Normally the way it's done is to link all the modules that are used by the COG together, but to make the output relocatable so that the resulting object file can be linked with LMM code (and can refer to variables in data memory). To avoid conflicts between function names in the COG and LMM we use objcopy --localize-text on the COG file (this changes all global function symbols in the COG code into local ones, so won't satisfy external references in the LMM code).
Eric
I did think about this (a lot in fact), but it seemed a bit tricky. The signal processing loop is something like
ad0=readADC(chan0);
ad1=readADC(chan1);
desiredDacValue=getNextDacOutValue();
actDacValue=BunchOfMath(desiredDacValue,ad0,ad1);
writeDAC(actDacValue);
I want this to go as absolutely quickly as possible.
It seemed to me that if each ADC and DAC call (which already takes some time because of the bus) required additional overhead to post to a mailbox, wait for the other cog to do the operation, and then read back from... wherever (certainly somewhere in hub memory), I might be disappointed.
Also it was not completely clear to me that the mailbox approach did not just move the lock from being around the bus access to being around the mailbox access (assuming the mailbox queue has some depth which it would need). I don't know jack about how these mailboxes are typically implemented, so maybe there is a clever lockless way to do it.
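On the lockless question, one common pattern is a single-producer, single-consumer ring buffer: each index is written by exactly one side, so no lock is needed around the queue itself. The sketch below is a generic illustration, not code from this thread; the type and function names and the queue depth are made up. It relies on the assumption that a 32-bit hub read or write is indivisible on the Propeller; on a multi-core host CPU you would want real atomics and memory barriers instead of plain volatile.

```c
#include <assert.h>
#include <stdint.h>

#define FIFO_SIZE 8u   /* hypothetical depth; must be a power of two */

typedef struct {
    volatile uint32_t head;             /* written only by the producer */
    volatile uint32_t tail;             /* written only by the consumer */
    volatile uint32_t slot[FIFO_SIZE];  /* queued requests */
} spsc_fifo_t;

static void fifo_init(spsc_fifo_t *f)
{
    f->head = 0u;
    f->tail = 0u;
}

/* Producer side: returns 1 on success, 0 if the queue is full. */
static int fifo_put(spsc_fifo_t *f, uint32_t value)
{
    uint32_t head = f->head;
    if (head - f->tail == FIFO_SIZE) return 0;
    f->slot[head & (FIFO_SIZE - 1u)] = value;
    f->head = head + 1u;   /* publish only after the slot is written */
    return 1;
}

/* Consumer side: returns 1 and stores a value, or 0 if the queue is empty. */
static int fifo_get(spsc_fifo_t *f, uint32_t *value)
{
    uint32_t tail = f->tail;
    assert(value != 0);
    if (tail == f->head) return 0;
    *value = f->slot[tail & (FIFO_SIZE - 1u)];
    f->tail = tail + 1u;   /* release the slot only after it is read */
    return 1;
}
```

This only removes the lock for one message path. With several client cogs you would need one such queue per client, and the bus-owning cog would poll them in turn, so the waiting moves into that polling loop rather than disappearing.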
Best regards,
Tom
Normally this wouldn't be a problem with devices on separate pins because separate processes can use separate mailbox buffers. Your single bus design offers some challenges though.
Fortunately, David Betz has some experience with sharing a single bus among processes without locks in the new cache methodology he developed using Chip's driver.
Maybe David can help.
If you have a shared bus, and a lot of accessors, then yes, waits/locks will have to happen somewhere.
There also has to be some sustained-flow limit.
Usually with COGs, you want each COG loop deterministic, and hard real time.
E.g., modules like DACs just read the most recently requested value, and never need to worry about latencies in getting it.
Something like an ADC often has a set sample update rate, and it updates the mailbox at that rate.
Of course, if the FIFO size and data bus activity conspire so that the FIFO does not empty fast enough, then something has to give (but this should be rare).
In that case, the ADC code can decide to skip, or wait, or flag a sticky error, depending on what matters most to you.
If hard real time matters less to you, and plain one-after-the-other access is OK, with the bus setting the average speeds, you may be fine with the cache approach mentioned above.
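The "most-recent value" mailbox and the sticky error flag described above could be sketched roughly as follows. All names here are hypothetical, and the sketch assumes one writer cog per mailbox and indivisible 32-bit hub accesses. The reader may occasionally see a value that is newer than the sequence number it sampled, which is harmless when only the latest value matters.

```c
#include <assert.h>
#include <stdint.h>

/* Shared between one writer and one reader. */
typedef struct {
    volatile uint32_t value;  /* latest sample or request */
    volatile uint32_t seq;    /* incremented once per post */
} latest_box_t;

/* Private to the reader. */
typedef struct {
    uint32_t last_seq;        /* sequence number of the last value consumed */
    int      overrun;         /* sticky: set once any post was missed */
} box_reader_t;

/* Writer side: overwrite the value, then bump the sequence number. */
void box_post(latest_box_t *b, uint32_t v)
{
    b->value = v;
    b->seq = b->seq + 1u;
}

/* Reader side: returns 1 if a new value was stored in *out, 0 otherwise. */
int box_read(const latest_box_t *b, box_reader_t *r, uint32_t *out)
{
    uint32_t seq = b->seq;
    assert(r != 0 && out != 0);
    if (seq == r->last_seq) return 0;                 /* nothing new */
    if (seq - r->last_seq > 1u) r->overrun = 1;       /* skipped at least one */
    *out = b->value;
    r->last_seq = seq;
    return 1;
}
```

Neither side ever waits here, which keeps both loops deterministic; the cost is that intermediate values can be dropped, and the sticky flag is the only record of that.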
I kind of thought so. I did not see much way around that.
I did mess around with this some. It has two problems. One is the "not call other functions" bit because of course the template for a bus cycle is
void WriteXXX(param)
{
WriteAddress(ADDR_FOR_XXX);
WriteData(param);
ToggleWriteStrobe();
}
I have the internal functions to make maintenance easier (only one thing to change if we remap pins) and to prevent code bloat since those functions are also handling the register write shadows and OUTA/DIRA.
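For illustration, here is a host-side sketch of that bus-cycle template with a write shadow, using static inline helpers instead of macros so there is still only one place to change if pins are remapped. The pin map, the register address, the WriteLED name, and the bus_out variable (a stand-in for OUTA) are all invented for the example, and locking is left out. Whether the compiler actually inlines the helpers at a given optimization level, and therefore whether this satisfies the "no calls" rule for fcache, is something to check in the generated code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pin map, not taken from the real board. */
#define ADDR_SHIFT   0u
#define ADDR_MASK    (0x0Fu << ADDR_SHIFT)
#define DATA_SHIFT   8u
#define DATA_MASK    (0xFFu << DATA_SHIFT)
#define STROBE_MASK  (1u << 16)
#define ADDR_FOR_LED 0x3u              /* hypothetical register address */

static volatile uint32_t bus_out;      /* host stand-in for OUTA */
static uint32_t out_shadow;            /* write shadow of the output pins */
static uint32_t strobe_count;          /* test aid: rising strobe edges seen */

/* Copy the shadow to the pins, counting rising edges on the strobe line. */
static inline void bus_commit(void)
{
    if ((out_shadow & STROBE_MASK) && !(bus_out & STROBE_MASK)) strobe_count++;
    bus_out = out_shadow;
}

static inline void WriteAddress(uint32_t addr)
{
    assert(addr <= 0x0Fu);
    out_shadow = (out_shadow & ~ADDR_MASK) | (addr << ADDR_SHIFT);
    bus_commit();
}

static inline void WriteData(uint32_t data)
{
    out_shadow = (out_shadow & ~DATA_MASK) | ((data << DATA_SHIFT) & DATA_MASK);
    bus_commit();
}

static inline void ToggleWriteStrobe(void)
{
    out_shadow |= STROBE_MASK;
    bus_commit();
    out_shadow &= ~STROBE_MASK;
    bus_commit();
}

/* One complete bus cycle; bits outside the bus fields are left untouched. */
void WriteLED(uint32_t pattern)
{
    WriteAddress(ADDR_FOR_LED);
    WriteData(pattern);
    ToggleWriteStrobe();
}
```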
The second issue (which I realized after turning a lot of the functions into macros to get around the "no calls" problem) is that I am not sure that it actually helps me in the end. I think the only way to be really fast for these one-off functions is to be in cog RAM all the time. The fcache works great for loops where the function execution time is much longer than the time to load the instructions into the fcache. But for these silly short things with no loops, it is not clear that fcaching will actually be faster than the straight-up LMM execution time when all is said and done.
That said, I should revisit this because in fact the biggest problem is not necessarily the function execution time (although that is important) but rather time between AcquireLock and FreeLock since that determines delays between cogs. _That_ would improve greatly with fcache even at the expense of the load time. Hmmm....
Ok so we can link cog code together. That is cool. I am not understanding quite that last bit however about the objcopy --localize-text. Is this done on a cog by cog basis? Say I have this:
databus.c->databus_cog.o
databus.c->databus_lmm.o
(i.e. two different compiles of databus.c with different compiler flags - implemented probably with a separate SIDE library or dreaded makefile).
Now I have 4 cogs all of which call databus functions:
fastcog1.cogm
fastcog2.cogm
pokey1cog.c
pokey2cog.c
The pokeycogs are clear enough, they link to the _lmm versions.
For the fastcogs...I do see the problem of trying to generate a single image for each cog - the linker starts with the cog entry point and then starts pulling in called functions to generate a single block of code that does not call out of itself. This block is loaded wholesale in cognew. I don't get why the renaming is needed.....oh....now on the fourth rewrite of this I do get it. The linker is not cog aware. It does not know that "pull in" a function means "copy the code into the block for the current cog and link to there". It just happily finds the code, tacks it on to the main segment and points the call there as if this were a uniprocessor executable. Thus the second cog that calls a shared function will end up trying to jump into an address somewhere not in its block.
I am recalling now that each cog gets its own segment. So probably objcopy --localize-text not only renames all functions, but changes their destination segment so they end up in the right block. The rename is probably needed to prevent the same function from showing up in multiple segments. Or something to that effect.
Cheers,
Tom
In SimpleIDE projects filenames *.cogc will be compiled to cog objects. This is mentioned in the User Guide, but maybe it needs more details.
I knew that *cogc would be compiled with -cogm, but I had the distinct impression (perhaps completely misplaced) that each and every cogc file denoted a new cog entry point and therefore _must_ include a void main() function. That requirement would make it a bit of a hack to use cogc files for shared code.
Tom
A minor correction, it's actually "-mcog". :-)
#include "cpx_prop_databus.c"
Very daring and avant garde I know.
Interestingly this compiles and seemingly links to all databus calls from both LMM and COGC cogs. I really thought I would need to have some #ifdef __LMM kind of code around the function names to append a _lmm/_cogm so as to avoid linker confusion. I may still.
But in any event, the only complaint thus far is:
===============
propeller-elf-gcc.exe -I . -L . -o lmm/cpx_prop_slave.elf -Os -mlmm -m32bit-doubles -fno-exceptions -std=c99 cpx_prop_slave.h lmm/cpx_prop_slave.a
c:/propgcc/bin/../lib/gcc/propeller-elf/4.6.1/short-doubles/_crtbegin.o: In function `argc_cnt':
(.init+0x3c): undefined reference to `_main'
lmm/cpx_prop_slave.a(cpx_prop_databus.cog): In function `_start':
(cpx_prop_databus.cog+0x48): undefined reference to `_main'
collect2: ld returned 1 exit status
Done. Build Failed!
Check source for bad function call or global variable name `_main'
lmm/cpx_prop_slave.a(cpx_prop_databus.cog): In function `_start':
(cpx_prop_databus.cog+0x48):
===============
This does make it look like *cogc files need a main() function. Do I just include that function but ignore it?
Cheers,
Tom
The .cogc files always need a main(). They are not intended for sharing code. That's the way it works.
You should be able to define .cogc functions to share in a common header file.
Don't be too ambitious with what you do in a .cogc file, otherwise you could experience much grief. Try to limit functions to 1 or 2 calls deep.
All .cogc functions should be _NATIVE or _NAKED. https://code.google.com/p/propgcc/wiki/COGModeExperiences
OHHH, light dawns on Marblehead. Now I get it. I just realized I have been way over thinking what the compiler/linker are capable of. This is much more like the old school PIC compilers that could really only handle a single file program. To split things up you would simply include .c files in the "main" c file that was compiled.
So in this case, if I have a shared set of code, in say databus.c/.h, I would include databus.c in my SIDE file for the LMM calls, and then for a given cogc file I could do something like:
And to contend with the realities of cogc as described in the call stack and so forth, I might need conditionally compiled versions of the functions based on the memory model.
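A rough sketch of the conditional naming idea, so that the same source can be compiled or #included under both memory models without the two copies colliding at link time. The DATABUS_FN macro, the suffixes, and the helper names are hypothetical, and the use of __PROPELLER_COG__ as the macro that identifies a COG-mode compile is an assumption to verify against your toolchain.

```c
#include <assert.h>
#include <string.h>

/* Assumption: __PROPELLER_COG__ is predefined when compiling in COG mode. */
#if defined(__PROPELLER_COG__)
#define DATABUS_SUFFIX _cog
#else
#define DATABUS_SUFFIX _lmm
#endif

/* Two-level paste so DATABUS_SUFFIX is expanded before concatenation. */
#define DB_PASTE2(a, b)  a##b
#define DB_PASTE(a, b)   DB_PASTE2(a, b)
#define DATABUS_FN(name) DB_PASTE(name, DATABUS_SUFFIX)

/* Two-level stringify, used only to show which symbol was generated. */
#define DB_STR2(x) #x
#define DB_STR(x)  DB_STR2(x)

static int last_red_led = -1;   /* stand-in for the real bus access */

/* Defined once in the shared source, emitted as WriteLEDRed_lmm or _cog. */
void DATABUS_FN(WriteLEDRed)(int led)
{
    assert(led >= 0);
    last_red_led = led;
}

/* Returns the decorated symbol name chosen for this compile. */
const char *databus_symbol_name(void)
{
    return DB_STR(DATABUS_FN(WriteLEDRed));
}
```

Callers would go through the same macro, for example DATABUS_FN(WriteLEDRed)(3), so each side automatically binds to the copy built for its own memory model.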
One thing the "Experiences" page mentioned is declaring all functions (besides main) with _NATIVE. Is that strictly necessary? Are not all functions in a cogc file _NATIVE by default?
Best regards,
Tom
Tom,
Typically PropellerGCC code is not native, i.e. HUB instructions are pseudo-interpreted by the kernel.
Therefore _NATIVE is required, because .cogc generated code is native.
Apparently the way to get around this for code sharing is ....
I think my main problem was that I was trying to figure out how to compile individual modules (i.e. c files) under the different memory models, that is, compile one file under -mcog and another under -mlmm and then link them all together. That is why I was wondering about the _NATIVE. It makes perfect sense that you need to specify _NATIVE if the file is being compiled under -mlmm since that is not the default LMM calling convention. It made less sense when you are in a cogm file where you are already "NATIVE".
But I think I get it now. I should just set my project to LMM and explicitly put _NATIVE on functions that need to be callable from cogc files.
I will probably try something like this:
That seems like it might do the trick. I will give it a whirl on Monday.
Thanks for your help.
All the best,
Tom
It is certainly possible to link code compiled with different memory models, without creating a separate .elf file for the [native] COG program first.
My Makefile compiles .cogc and .S files with -mcog, and .c files with -mlmm or -mcmm; the output of each is an object file, with a .o extension for .c files and a .cog or .ecog extension for .cogc and .S files. The .cog and .ecog files get passed through "objcopy --localize-text --rename-section .text=<module_name>.[e]cog", and then all object files get linked to produce the .elf file. No intermediate .elf files were harmed (or used) during this process.
Now, it may be argued that these are really separate programs that never see the same cog, but you know that it's ill-advised to use the phrase "it isn't possible" on this forum !
I want to make a cogc object that fills a buffer in HUB that is specified by the LMM code and the LMM can access. I realize the implication of needing locks and whatnot when two or more cogs are accessing the same hub memory.
Is it as simple as just having a global variable in the LMM code and referring to it in the cogc code and then the linker deals?
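One way to picture the arrangement being asked about is a small mailbox struct that the LMM side owns and the cog side fills. This is a host-side sketch only: the struct layout and all names are invented, and how the cog actually learns the struct's address (a shared global resolved by the linker, or a pointer handed over when the cog is started) is left open here.

```c
#include <assert.h>
#include <stdint.h>

#define BUF_LEN 4u   /* hypothetical buffer size in longs */

typedef struct {
    volatile uint32_t *buf;     /* hub buffer chosen by the LMM side */
    uint32_t           len;     /* capacity of buf in longs */
    volatile uint32_t  filled;  /* written only by the cog: valid entries */
} fill_box_t;

/* Would live in the .cogc file and be called from its main loop.
   Returns 1 if the sample was stored, 0 once the buffer is full. */
int cog_fill_one(fill_box_t *box, uint32_t sample)
{
    uint32_t n = box->filled;
    assert(box->buf != 0);
    if (n >= box->len) return 0;
    box->buf[n] = sample;
    box->filled = n + 1u;   /* publish the count only after the data */
    return 1;
}

/* LMM side: owns the storage and reads entries below box.filled. */
static uint32_t hub_buffer[BUF_LEN];
static fill_box_t box = { hub_buffer, BUF_LEN, 0u };
```

Because only the cog writes the filled count and the LMM side only reads entries below it, this particular one-way handoff needs no lock; a buffer that both sides write would.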
Nevermind for now, I found jac_goudsmit's excellent article explaining a lot of the details here: https://code.google.com/p/propgcc/wiki/PropGccInDepth
Edit: I just noticed you're using a makefile. However, my comment still stands. When you say you compile to a ".cog" file that is just a ".elf" file with a different extension. You're describing exactly the same process as I described.
Attached is another example (since I spent time on it).
Thanks for clearing that up. I also was not realizing what was happening with that objcopy trick. I thought it had to do with linking individual MCOG functions or sections together. But that clearly is not happening as of now. I will just use the #include hack for the time being. That is easy enough.
All the best,
Tom
That may be, but it remains that I don't know how to do that easily and cleanly in SimpleIDE.
What I'm trying to do is to me the holy grail (we all define this a different way, don't we ?) of prop development in GCC: device drivers running in cogs written in cogc or PASM (.S files, not inline assembly, which is an ugly kludge) and the main thread in LMM (or multiple high-level threads; this is easy and impressive in propgcc, just like in SPIN). Ideally, we could combine cogc and PASM in the same cog by linking the two object files together, but to me, PASM is so clean and nice (especially with gcc directives) that once you need to use some PASM, you might as well do the whole cog in it since it's not that big, anyway.
This is in fact very similar in concept to the original SPIN/PASM combo, but with a much faster high-level language that more people are familiar with, and with the normal gcc directives like conditional compilation, for lmm c, cogc, and PASM (this does not work properly if you use inline assembly), that works the same and cohesively for each.
Here's how I would implement it in SimpleIDE and propgcc:
In SimpleIDE, you would have a thread tree, where you would define each thread as either lmm or native. For each lmm thread, there would be a "master" function that would be the start of execution for that lmm cog on program init; for many programs, there would be only the "main" lmm thread. Of course, you could always start the lmm cogs explicitly in your code (and right-click on the thread and uncheck the "load on program init" checkbox ;-), but most people won't need to do this, and SimpleIDE (or a Makefile) could automate it for us. For each native (cogc/PASM) thread, any file dragged and dropped onto that thread would be linked with the other files in that thread, and then that binary blob would be loaded as .cog and .ecog sections are loaded now.
Behind the scenes, each file (or function in the file, or even library function) would have an attribute assigned to it so the linker knows to put it in the proper section. Anyone using Makefiles would either include the attribute on the command line or could even assign attributes directly to functions in the .cogc/.S files to make it explicit. External references during the compile of a .cogc or .S file would be given the proper attribute so the linker knows which section to put the library function in.
Here's an example to demonstrate what I mean (this is my main project right now; I don't mix .cogc and .S, but I'll fictionalize that for demonstration purposes):
In this example, the lmm cogs call many of the functions in the files that are uncategorized under REsys.side; the file listed under each lmm cog includes the thread entry point. The communications driver has two native files (buffer.cogc and RS485.S) that need to be linked together and then loaded into a cog; all other device drivers have only one .S file.
I think this would be an intuitive and powerful way to assign resources on the prop, showcasing its architecture and making it easy to learn; it would make what is happening more explicit and easier to handle by SimpleIDE; and it would reduce the gap that seems to be ever widening between SimpleIDE and us Makefile nerds.
Actually the memory models themselves, meaning the difference between LMM and COG, were clear enough from the very beginning. It is just the mechanics of it all that was tricky. It is somewhat exacerbated by the wonky messages you get from the linker. Pretty much anything that goes wrong seems to generate a "truncating" message so you are never quite sure about the actual error.
I can think of several things one could change about the mechanics to make it all a bit more seamless, but I think they would require quite a bit of work on the compiler and linker to implement. It is not clear to me how many people will actually use this mixed model (myself included in the end) so I will sit tight.
Thanks again for all your help.
Cheers,
Tom
I had not read Altosack's post carefully when I said this. He seems to be going down a similar path as I was so I may as well spill the beans. It seems to me that the compiler/linker should be able to figure all this stuff out if it knows the entry points. Thus if there was a way to do something like:
The compiler should be able to figure out what to do with that.
It is clear enough that any function called by a NATIVE entry point needs to be NATIVE, and functions called from LMM can be either (it seems).
The linker would have to do quite a bit more work. But the user would have to do quite a bit less. You could ditch the .cogc idea and everything could just be a .c file. You would not need any thread tree file either. The linker would just know that it needs to create effectively a new program for each NATIVE ENTRY function and would act accordingly.
Another nice trick for sharing code would be to have a _shared attribute. That would direct the compiler to create both LMM and COG versions of the function (obviously with some name decoration to differentiate them). The linker would then be able to find the appropriate one based on the model of the caller, and throw out versions that were not needed.
Alternately, _shared functions could be treated like C++ templates and only generated after the rest of the program has been seen, on demand.
Indeed, you could conceivably make it so that the entire project was SHARED and the user had to use attributes only on the entry points:
This would make the coupling between LMM/NATIVE, compiler switches, file names, and attributes very, very loose. The only time you would need something "special" would be on the entry points. So switching a cog from LMM to NATIVE would involve only one change to the whole program: switching from _LMM to _NATIVE.
Currently you have to rename the file, remove the old file from the SIDE, add the new file, then possibly pull some number of functions into the new COGM file.
Anyway, that would be the "holy grail" from my perspective.
Cheers,
Tom