Another problem is that we are using GCC, and we don't want to change the machine-independent part of GCC unless absolutely necessary; that makes it easier for us to move from one release of GCC to another. It also means we don't want to radically change the behavior of the compiler or linker. We could, of course, add a tool to the toolchain that manipulates COG images and produces a combined binary. Linker scripts can also be used to do that with the standard linker. Lots of options.
Yes, I had a feeling you were trying to do this rather than simply writing prop_ld from scratch, which would give you nearly infinite room for optimizations and tricks. That is one reason I was holding off on springing all my brilliant (ahem) ideas on you - I really had no idea of the implementation costs. And when you are not paying, you can really only ask for nearly free things.
Tom
Well, I guess you could offer to build this linker yourself. :-)
Seriously, the ELF file format is pretty well documented. The propeller-load program parses it to some extent but it only handles absolute files. It can't handle relocatable files. You'd have to be able to handle all of the relocation information placed in the file by the compiler/assembler.
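For a sense of what handling that relocation information involves, here is a minimal C sketch (not taken from propeller-load; error handling is omitted, and whether the Propeller port emits REL or RELA sections is an assumption, so both are accepted) that walks the relocation entries of a 32-bit relocatable ELF object:

#include <elf.h>
#include <stddef.h>
#include <stdio.h>

/* Dump the relocation entries of a 32-bit relocatable ELF object
   already read into memory at `buf`. REL and RELA entries share
   their first two fields, so one stride-aware loop covers both. */
void list_relocations(const unsigned char *buf)
{
    const Elf32_Ehdr *eh = (const Elf32_Ehdr *)buf;
    const Elf32_Shdr *sh = (const Elf32_Shdr *)(buf + eh->e_shoff);

    for (int i = 0; i < eh->e_shnum; i++) {
        if (sh[i].sh_type != SHT_REL && sh[i].sh_type != SHT_RELA)
            continue;
        size_t stride = sh[i].sh_entsize;
        size_t count = sh[i].sh_size / stride;
        for (size_t j = 0; j < count; j++) {
            const Elf32_Rel *r =
                (const Elf32_Rel *)(buf + sh[i].sh_offset + j * stride);
            printf("offset 0x%x: symbol %u, type %u\n",
                   (unsigned)r->r_offset,
                   (unsigned)ELF32_R_SYM(r->r_info),
                   (unsigned)ELF32_R_TYPE(r->r_info));
        }
    }
}

A real linker would then apply each entry against its symbol's final address; this only shows how much bookkeeping the format hands you.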
The compiler should be able to figure out what to do with that.
I don't think that's enough for the compiler/linker to figure out what to do, except in the special case that there is only one non-LMM COG and it always runs. If there are going to be multiple non-LMM COGs running different code, then there needs to be a way to tell the compiler which bits of COG code go together, and when to load them.
It is clear enough that any function called by NATIVE entry points needs to be NATIVE, and functions called from LMM can be either (it seems).
I think there's some confusion here about what "_NATIVE" means. It means that the function needs to be called by the "call" instruction and will return using the "ret" instruction. That's all it means. It does have some implications. For example, "call" requires a target in COG memory, so in LMM mode a _NATIVE function has to be placed in the .kernel section so it will be loaded into COG memory (as part of the LMM kernel). For this reason an LMM mode _NATIVE function cannot call a non-native function (it would require the LMM interpreter to be re-entrant, which it isn't).
In -mcog mode _NATIVE and "regular" functions can be freely mixed, at least as far as the compiler is concerned -- it is the programmer's responsibility to make sure that _NATIVE functions are never called recursively. Functions do *not* have to be marked _NATIVE when compiled with -mcog, and indeed should not be so marked if they are recursive.
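To make the distinction concrete, here is a minimal -mcog sketch (the function names are mine, and it assumes _NATIVE is the native-call attribute macro from propeller.h). The _NATIVE helper is a leaf and non-recursive, so the call/ret convention is safe:

#include <propeller.h>

/* Called with "call", returns with "ret"; must never recurse,
   since the return address is stored in the code itself. */
_NATIVE static unsigned invert(unsigned pins)
{
    return pins ^ 0xFF;
}

/* A regular -mcog function, free to mix with _NATIVE callees. */
void blink(unsigned pins)
{
    DIRA |= pins;
    for (;;) {
        OUTA = invert(OUTA);
        waitcnt(CNT + CLKFREQ / 4);   /* quarter-second delay */
    }
}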
Your other suggestion (creating a _COG_ENTRY attribute to mark top level functions) is a good one, but it would require a lot more intelligence than the tools presently have. It'd be great though if we had a volunteer to work on a tool that could extract this info and create linker scripts and/or makefiles from source code decorated with these attributes. :-)
Eric
Well, I guess you could offer to build this linker yourself. :-)
Ha, I knew that was coming! That is the biggest payment of all - at least with my schedule.
Seriously, the ELF file format is pretty well documented. The propeller-load program parses it to some extent but it only handles absolute files. It can't handle relocatable files. You'd have to be able to handle all of the relocation information placed in the file by the compiler/assembler.
Not that I am volunteering, but do you imagine it would be easier to start from scratch, or is there some way to shoehorn the scheme I am imagining into ld? One thing I noticed when looking at the ld manual: it was very clear about the command line and linker script directives, but it never once mentioned how it links function and data references together. So it was never clear to me whether it could, for example, handle multiple mains within the same linker pass.
Cheers,
Tom
Hi Tom, I'm also David, but it might be less confusing just to call me altosack.
The changes that I would like could all be done in SimpleIDE and in the Makefile; nothing would need to change in either the compiler or the linker, although it may be expedient to change some linker scripts. I don't know how the internals of SimpleIDE work, but I think it would be great if SimpleIDE were to create a Makefile on the fly to do the compilation and linking each time. While it will be somewhat of a challenge to implement the methods in the Makefile, the changes to SimpleIDE to create the Makefile would then be almost trivial.
In accordance with that, here are a couple of reasons I had for making a tree with a branch for each thread or cog:
1. It would show a graphical at-a-glance allocation of cog resources and program hierarchy in SimpleIDE. I don't use SimpleIDE, but many others do, and that seems to be where all the resources are being put. I might use it if it could do what I'm proposing here. Someone creating their own Makefile would replicate the tree-structured representation (see example below).
2. _COG_ENTRY _NATIVE works fine if there is only one file to be compiled for that cog driver (and this is the way I do it now, but it means that I have to write the whole driver in either cogc or PASM), but it's expecting too much of the tool chain to know that it has to link buffer.cogc and rs485.S together before passing the result to OBJCOPY unless there's a direct mechanism for telling it (i.e., some shared attribute between buffer.cogc and rs485.S that is not shared with pwmph2.S).
Here's one way to implement the SimpleIDE tree structure in a Makefile. Basically, there would be a string for each thread; I'll use the same example as before, but with structured names for the threads. However, I'll change LMM to HUB, and NATIVE to COG; HUB refers to LMM, CMM, XMM, etc., and COG could also be ECOG. There's probably a better name than "HUB", but I hope it's clear for now.
The makefile would then concatenate PRJSRC and the *HUBs to get the list of files to be compiled and linked together for the hub, and each _COG or _ECOG would be linked separately and OBJCOPY'd into binary .o's, and then everything would be linked together. I already do it mostly like this in my makefiles, but I've modified it a bit here to be compatible with the way it'd be great to have SimpleIDE do it.
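Since the earlier example isn't reproduced here, a hypothetical sketch of what those per-thread strings and the resulting rules might look like (all file and thread names are invented, and the partial-link/objcopy recipe mirrors the usual .cogc handling, so take the exact flags as an assumption):

# One variable per thread; the suffix says where it runs.
PRJSRC    = main.c shared.c
UI_HUB    = terminal.c
RS485_COG = buffer.cogc rs485.S    # linked together before OBJCOPY
PWM_COG   = pwmph2.S

# Hub-side link: the project sources plus every *_HUB thread.
HUB_SRCS = $(PRJSRC) $(UI_HUB)

# Each *_COG thread is partially linked on its own, then its .text is
# renamed and localized so the final link treats it as an opaque blob.
# (Compile rules for .cogc and .S files with -mcog are omitted.)
rs485.cog: buffer.o rs485.o
	propeller-elf-ld -r -o $@ $^
	propeller-elf-objcopy --localize-text --rename-section .text=$@ $@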
Another possibility would be to just have a list (or tree) of NATIVE threads and to tag the entry-point LMM functions directly, as you did with _COG_ENTRY, although I like the idea of putting the function names in the tree (instead of the filenames) to keep it orthogonal.
I may toy with creating a Makefile to do this. The only reason linker scripts might need to be messed with is if the entry points are to be loaded automatically, and I definitely see this as an optional item, although it would be slick. If I can create the Makefile, duplicating that functionality in SimpleIDE (i.e., creating the Makefile from the tree) definitely would be trivial.
It would show a graphical at-a-glance allocation of cog resources and program hierarchy in SimpleIDE
Perhaps I misinterpret what you mean, but I don't see how this can work.
There is not necessarily a one-to-one relation between bits of code that can run in a COG and the actual COG resources used at run time. A single piece of COG code could be loaded and run in many COGs, and the number of COGs actually used may depend on what work comes up whilst running.
I get nervous about suggestions to add features to SimpleIDE. It is supposed to be, well, simple. Every new little option and feature takes it further from that goal. At what point should we rename it ComplexIDE? :-)
First off, I finally got my shared code working between my LMM and COGC files by using the #include "databus.c" hack. I think I am indeed now seeing the difference between "regular" MCOG code and _NATIVE code: while I have several layers of function calls and many references to sp in the resulting assembly, it does not seem to be causing any trouble. I expect that if I start throwing _NATIVE around all over, my performance will go up further but my troubles will multiply.
So thank you all very much for helping to get that going.
Now turning to Eric's and altosack's comments: I confess that I did not completely understand _NATIVE. Rather, I did not understand that functions compiled under MCOG are not _NATIVE by default; I think that assumption was the main source of confusion. I gather now that the default calling convention for MCOG is something not quite _NATIVE - primarily, I guess, it uses the hub-allocated stack for function calls. The main thrust of my previous argument was that (excluding XMM and its buddies) there are two main ways to run code on a cog: interpreted via the LMM kernel, and "bare metal" via MCOG, _NATIVE, NAKED, etc. The main difference is whether or not the interpreter is present on the COG.
So let me try to restate my earlier thoughts using _INTERPRETED and _BAREMETAL as the decorators, so as to avoid any terminology confusion. My point was that, given two entry points

_COGENTRY _INTERPRETED void Entry1(void *arg)
{
}

_COGENTRY _BAREMETAL void Entry2(void *arg)
{
}
a sufficiently smart compiler/linker should be able to "do the right thing" with that information alone. It is clear that, upon seeing the _COGENTRY marker, some kind of COG image will need to be created.
In the case of _INTERPRETED, the image would contain the LMM kernel plus any _BAREMETAL/_NATIVE functions called in the function call graph rooted at the entry point.
In the case of the _BAREMETAL entry point, the COG image would have to contain the entire function call graph rooted at the entry point.
altosack, I agree that what I am talking about is probably not possible without beating on ld some. Indeed, I am just now realizing that it is incumbent on the _user_ to make the final link between each COGC image and the cognew call via this slightly voodooish _load_start_cog[] business - the linker cannot even pull that off. I am sure this is because of the "separate program" nature of the COGC implementation. All the images just get piled together, and the user has to know that a given memory block is in fact a COG image (it is all making more sense by the minute).
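For the record, the voodoo in question looks roughly like this (the driver name is hypothetical; the _load_start_*_cog symbol naming follows the usual .cogc convention):

#include <propeller.h>

/* Symbol the toolchain emits for a driver built from rs485.cogc
   (hypothetical name): the start of its COG image in hub memory. */
extern unsigned int _load_start_rs485_cog[];

static volatile unsigned mailbox;    /* shared hub-memory mailbox */

int start_rs485(void)
{
    /* The user, not the linker, ties the image to the cognew call. */
    return cognew(_load_start_rs485_cog, (void *)&mailbox);
}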
I am wondering, however, if it would not be all that bad. Consider what you would have if you just treated everything as one big program (i.e., everything in C files, no special COGC treatment). Assume also for the moment that a COG CALL instruction can access a full 32-bit address space instead of the actual 512 words (small detail). After the linker ran, you would have an image that looked like something that would run on a uniprocessor Propeller. Each COGENTRY would look like a simple function pointer passed into cogstart/cognew. The linker would have pulled all the various functions out of each of the .o files into one big honking image. Everything would in fact be more or less structurally correct, with one exception: the _BAREMETAL entry points would be calling functions scattered all over the place. And any function called by more than one COG entry point would appear in the image only once.
It seems, however, that it would not be that horrible a task to write a post-processor to run over this image and produce a multi-cog-aware image. Most of the heavy lifting - library search paths, opening multiple .o files, etc. - would already have been taken care of. One would just need to open the one image and find all the entry points, then descend each _BAREMETAL entry point's call tree and copy each function from the main image into a proper COG image, of course doing fixups along the way.
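As a sketch of that descent (every type and helper here is invented - this is just the shape of the algorithm, not a real tool):

/* Copy one _BAREMETAL entry's call tree into its own COG image,
   patching call targets as functions are moved. All names invented. */
void extract_cog_image(Image *linked, Symbol *entry, CogImage *out)
{
    Worklist pending;
    worklist_init(&pending);
    worklist_push(&pending, entry);

    while (!worklist_empty(&pending)) {
        Symbol *fn = worklist_pop(&pending);
        if (cog_image_contains(out, fn))
            continue;                 /* shared callee: copied only once */
        unsigned cog_addr = cog_image_append(out, linked, fn);
        /* Re-point each "call" site in fn at its COG copy and queue
           any callees that have not been copied yet. */
        fixup_call_sites(out, fn, cog_addr, &pending);
    }
}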
The nice thing about a post-processor is that it would be entirely decoupled from GCC. Another is that it could spit out all sorts of metrics: memory used, call stack depth, etc.
Would something like that work?
All the best,
Tom
No, there is not necessarily a one-to-one relation; however, it is the case most of the time, and certainly for many newcomers starting to program the prop. Also, the example I gave just happened to have 8 static threads; it doesn't have to be that way. You could define 11 threads, one of which is loaded onto two cogs simultaneously, and still use a graphical tree structure to represent it. Of course, if all of the threads try to get loaded at initialization, you won't get the results you expect!
I don't see this as adding features to SimpleIDE so much as changing the way the project is presented. Nearly all of what I'm talking about would be implemented with a makefile, which SimpleIDE would create on the fly based on the structure defined. As I've mentioned, it needs no changes to the compiler or linker, nor even to any linker scripts (I've started to implement it and found that they're not necessary), and the changes to SimpleIDE would be pretty easy to do. A big win for me, as someone who doesn't use SimpleIDE, would be to bridge the gap somewhat between SimpleIDE and do-it-yourself makefilers like me, and have them work the same way.
I've realized that it's very difficult to make a single makefile that will link a cogc file and a PASM file together, OBJCOPY that cog image, and link it with the rest of the program. I think the answer is a two-stage makefile, which simplifies things a great deal. All SimpleIDE would have to do is convert the graphical tree structure into the strings from my previous post, put those into an included makefile stub, and execute make. The first stage would create the second-stage makefile and then execute it.
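A hypothetical shape for the two stages (the threads.mk stub and the generator script name are both invented here):

# Stage 1: expand the tree strings (threads.mk, written by SimpleIDE
# or by hand) into a full second-stage makefile, then run it.
all: stage2.mk
	$(MAKE) -f stage2.mk

stage2.mk: threads.mk
	./gen-stage2.sh threads.mk > $@    # hypothetical generator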
Tom, if I implied that we'd need to change the linker, I was off base. Nothing needs to be changed in the compiler, linker, or scripts, even if some cogs are auto-loaded. It would be pretty easy to simply put those commands in an init section, something that already exists in gcc. Again, auto-loading wouldn't work for all situations, but it would be nice to have it as an option for the (majority of, I think) situations where it does work.
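For instance, assuming PropGCC honors GCC's usual constructor/init machinery before main (and reusing the hypothetical rs485 driver names from above), auto-loading could look like:

#include <propeller.h>

extern unsigned int _load_start_rs485_cog[];
static volatile unsigned mailbox;

/* Runs from the init section before main(), so the cog gets started
   with no compiler, linker, or script changes at all. */
__attribute__((constructor))
static void auto_start_rs485(void)
{
    cognew(_load_start_rs485_cog, (void *)&mailbox);
}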
The makefile I've been using for about a year now mostly works for me, but it needs to be massaged a bit too often for my taste when things need to be done a different way. I'm going to implement what I've talked about here; if others like it and think it would be useful for SimpleIDE too, great; if not, at the end of the day (or week, or whatever) I'll still have a more functional and orthogonal solution that will make me happy every time I use it. Who could ask for more?
Sorry to be so obsessive about naming but it's actually "COG" memory model. The -m means "machine dependent option" and is used for specifying the memory model as well as other things in PropGCC. For example, in later versions of PropGCC, "-mp2" compiles code to run on the Propeller 2.
Sorry to be so obsessive about naming but it's actually "COG" memory model. The -m means "machine dependent option" and is used for specifying the memory model as well as other things in PropGCC. For example, in later versions of PropGCC, "-mp2" compiles code to run on the Propeller 2.
That's cool, we ought to get it right. I had not appreciated that -m was the compiler switch that lets you modify the mode. I thought it stood for Memory Model.
Cheers,
Tom
And I always thought it stood for "make". As in "make large memory model" and "make propeller 2". It reads a bit easier than "machine dependent option large memory model", but I'm glad to know the real name.