Going into the future I think we need to find a way to get updates to our compiler toolchain into production rather than sticking with an old buggy version for years. As we've pointed out a number of times, there have been many bugs and updates done to PropGCC over the years that never made it into the SimpleIDE release. We need to avoid that in the future. We should at least keep up with our own branch!
The only reason I've heard from Parallax for not integrating the new compiler is that they saw some bug with the libraries, or some incompatibility (which I think you or another PropGCC dev confirmed was an ABI change?).
So, is Parallax going to be okay with ABI changes in the future? If not, can we convince them otherwise? If not, we need to do everything we can to avoid ABI changes in the future... sometimes change is inevitable, but if we know the cost of making an ABI change is Parallax never adopting it, that might change how hard we look for an alternative solution. And maybe not - but at least we'll all feel better knowing ahead of time what the outcome will be (and we all know it's all about our feelings, isn't it?).
The other thing I'd like to see created (if they exist already in PropGCC, I'd love to know where and how to use them) are tests for as many library functions as humanly possible. Some tests can be "hardware independent" and others will need a special board with peripherals hooked up to the right pins. For the hardware-dependent tests, some can be verified automatically via the P2 test code, and others will likely need a human observing and making note of failures.
With all that in place, one board should be accessible from a continuous integration server, and that CI server should be configured to execute, on every merge, all tests except those which require human confirmation. Writing all those tests will be a lot of work (and maybe that ends up being my contribution?), but I can't imagine a better way to give Parallax the confidence they need to release a new version of PropGCC to the public when it's their reputation and revenue on the line.
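To make that concrete, a "hardware independent" test could look something like the sketch below. It assumes nothing beyond standard C and a working printf over the loader's serial link; the CHECK macro and the PASS/FAIL convention are just illustrative, not an existing PropGCC harness:

    #include <stdio.h>
    #include <string.h>
    #include <stdlib.h>

    static int failures = 0;

    #define CHECK(cond) do { \
        if (!(cond)) { printf("FAIL: %s\n", #cond); failures++; } \
    } while (0)

    int main(void)
    {
        char buf[32];

        /* hardware-independent string/format checks */
        sprintf(buf, "%d-%s", 42, "ok");
        CHECK(strcmp(buf, "42-ok") == 0);

        CHECK(strtol("0x20", NULL, 16) == 32);
        CHECK(strtol("-7", NULL, 10) == -7);

        printf(failures ? "RESULT: FAIL\n" : "RESULT: PASS\n");
        return failures;
    }

A CI job could then load the binary on the test board, capture the serial output, and look for "RESULT: PASS" to decide whether the merge is good.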
Yes, all of that is a good idea. I don't think we ever figured out what caused the problem with the SimpleIDE library. No one had time to track it down. I don't think it was an ABI change. The biggest change in the more recent branch (other than moving to a newer version of GCC) was a change in the interface to cache drivers for XMM and that feature has been deprecated anyway.
David, do you know off hand if GCC has support for auto-allocating static variables to different memory segments based on how much is free?
For instance, in some cases you'd want to leverage the 2K of LUT ram for static variables, or not.
You'd want to use perhaps half of COG ram for static variables, or only use it for locals (stack variables).
Then of course you'd defer BIG allocations to hub heap space.
If there isn't anything simple like that, perhaps:
a) use volatile to signify a variable should be stored in COG RAM, then use LUT for stack
b) use volatile to signify a COG variable, LUT for user code, COG for locals
c) use volatile for COG variable, LUT for user code, hub ram for locals
The LUT memory usage for locals could be a compiler switch. Whether hub ram or COG ram is used for locals could be handled just by overflow: the locals stack would be 256 longwords, and any locals that need more would simply overflow to hub ram as needed. The code generation wouldn't need to do multi-pass analysis to support that.
The assembler or linker would have to verify that COG ram wasn't exhausted by the stack/volatile allocations -- you certainly don't want code to be assembled/linked that would allow the stack to be stepped on, or allow the stack to step on volatile variables.
Just some thoughts on how to best support the P2 memory model.
I imagine a good main model for P2 C/C++ would be HUBEXEC, with all/most of COG mem being registers, and all of LUT being used for intrinsic functions (particularly the most used and most performance-critical ones) and/or FCACHE.
Another one, maybe, that is COG/LUT only where it favors LUT for code (but also has a small stack there), leaving COG for registers/data.
...you certainly don't want code to be assembled/linked that would allow the stack to be stepped on, or allow the stack to step on volatile variables.
I have never known a C compiler that prevents the stack from growing over statically allocated variables or the heap. In general it is impossible for a C compiler to know at compile time how much stack your program will use at run time. Run-time checks are not compiled in, as that would be a performance hit.
The issue of "volatile" variables is something else. "volatile" only indicates to the compiler that it does not have sole access to a variable, and therefore prevents it from performing optimizations that would cause shared data to be used wrongly.
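A tiny illustration (the writer on the other cog is assumed rather than shown):

    volatile int data_ready;    /* set to 1 by another cog when data is available */

    void wait_for_data(void)
    {
        /* "volatile" forces the load to be repeated each time around the loop;
           without it the compiler could hoist the load and spin forever.
           It says nothing about where data_ready is stored. */
        while (data_ready == 0)
            ;
    }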
Going into the future I think we need to find a way to get updates to our compiler toolchain into production rather than sticking with an old buggy version for years.
Agreed! It's essential to keep integrating upstream changes into our GCC (and GAS and libc) fork while we work on our own version. That's also the only way to ever get the Propeller target merged into the mainstream. Which should be a goal.
These all sound like interesting ideas. Unfortunately I have no idea if GCC offers this sort of fine control over data and code placement. Certainly you can do this sort of thing with linker scripts, but I'm not sure if the compiler itself can automatically direct code to different memory sections. Eric Smith would probably know more.
Heater,
Um, most C/C++ compilers (VC, GCC, Clang) do compile in stack overflow checks, and have a fixed stack size (specified at compile time). They may offer ways to remove those checks, but I wouldn't use those options.
On many CPUs it's not a check at each use of the stack, but instead a memory page fault mechanism.
Yes, GCC can handle stack limits. I doubt you want to always put the stack in the LUT though since it's quite limited in size. Also, addressing data in the LUT may be more complicated than in hub memory. I guess I need to read through the P2 docs to refresh my memory on what instructions are available for addressing the LUT.
Yeah, David, I was only suggesting stack in LUT for the small model that is just in COG/LUT ram, so very tiny programs/drivers.
For the HUBEXEC model, stack should go in HUB (and can be read/written with the normal HUB access instructions so as to avoid conflict with HUBEXEC in the fifo).
That makes sense. The P1 compiler has a "COG" mode. I guess the P2 compiler could have a "COG+LUT" mode.
...you certainly don't want code to be assembled/linked that would allow the stack to be stepped on, or allow the stack to step on volatile variables.
I have never known a C compiler that prevents the stack from growing over statically allocated variables of the heap. In general it is impossible for a C compiler to know at compile time how much stack your program will use at run time. Run time checks are not compiled in as that would be a performance hit.
The issue of "volatile" variables is something else. "volatile" only indicates to the compiler that it does not have sole access to a variable and therefore prevents it performing optimizations that would cause shared data to be used wrongly.
I think there should be some way to do a static analysis, post-compile, to ensure your COG memory usage does not result in the stack smashing local variables, even if it's just a tool that analyzes the linker maps and ensures there aren't overlapping ranges.
I'm thinking we'll end up with memory models anyway, for the following cases:
a) Small bits of code all self-contained in COG space (COG/LUT RAM used only)
b) Typical model with hubexec, hub stack, GCC __attribute__
c) Hybrid model with hubexec, stack in LUT, GCC __attribute__ for COG variables
Looking online, it's clear that we should support explicit allocation of COG and LUT variables via __attribute__ hinting.
Perhaps we could define macros in a propeller2.h header file that define something like this:
unsigned long foo _COG;
char string[32] _LUT;
Then if you use hybrid mode, the compiler would throw an error if there are any _LUT attributes.
In small, all variables would be implied to be _COG and _LUT would be explicit.
For the "no switches" model, it would just do hubexec, hub stack, and local registers (r1-r32 perhaps) for passing.
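A minimal sketch of what that propeller2.h could contain, assuming the toolchain's linker script defines .cogram and .lutram output sections (the macro and section names are made up here for illustration, not an existing PropGCC feature):

    /* hypothetical propeller2.h fragment */
    #define _COG  __attribute__((section(".cogram")))
    #define _LUT  __attribute__((section(".lutram")))

    unsigned long foo _COG;   /* linker script places this in COG RAM */
    char string[32] _LUT;     /* placed in LUT RAM; a model that reserves LUT
                                 for other uses would reject it at link time */

The compiler itself wouldn't need anything special for this part; the attribute just routes each symbol to a named section, and the link step decides whether that section exists and fits.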
I don't believe that in general it is possible for the compiler to determine your program's stack usage by static analysis.
For example, when traversing a binary tree structure the amount of stack used will depend on the depth of the tree. The depth of the tree depends on the data in use at run time which the compiler knows nothing about.
And what about variable length arrays whose size is not known until run time?
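The binary-tree case in code, just to make the point concrete (names are illustrative):

    struct node {
        int value;
        struct node *left, *right;
    };

    int tree_sum(const struct node *n)
    {
        if (n == NULL)
            return 0;
        /* one stack frame per level; the depth is only known once the
           tree has been built at run time */
        return n->value + tree_sum(n->left) + tree_sum(n->right);
    }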
Heater,
The stack checking on modern CPUs isn't in the code itself; it's done via page faults in the MMU (like you said). I guess GCC defaults its stack checking off, but perhaps the backend targeting Windows builds still does the page fault method? It might even be a Windows executable thing? I am pretty sure I have had gcc-built executables throw stack overflow exceptions without me having to enable anything at compile time specifically. However, I rarely use gcc (except propgcc via SimpleIDE), so maybe I am misremembering my compile settings?
I know when I build stuff with VC++ that it handles stack overflow as an exception even in release builds for Windows targets. There are additional options for checking stack frames and other things that typically you only turn on for debug builds, but overflow is pretty much a given.
pedward,
Not sure which is better: having specific attribute tags for COG/LUT, or just having 256+ registers (we can just let all/most of COG mem be registers) and letting the compiler just do its thing? The perf difference between accessing HUB vs COG isn't going to be that big a deal in most cases. And the compiler will use registers for things like loop indices and locals, as well as for parameter passing. Seems like we should get the simplest thing up and running and then test to find the areas where we need to add things to fix performance.
I don't recall exactly, but certainly some kind of segmentation fault happens when the stack overflows using GCC on Linux. Generated by the MMU as you say. For embedded systems with no MMU it just runs off the end and bumps into the heap or whatever, leading to all kinds of Heisenbugs.
I very much agree, Roy. We have to walk before we can run, and the first steps are getting a compiler to work that looks a lot like the P1 PropGCC. Refinements can come later.
The basic instruction set that the compiler will want to use hasn't changed that much from P1, at least at the assembly language level. So I think the hardest part of a gcc port to P2 will be getting an assembler and linker that gcc can use. We have 3 P2 assemblers now (PNut, p2asm, and fastspin) but none of them are quite as full-featured as the GNU assembler. For linking we need the assembler to be able to generate relocatable output (ELF or COFF files, probably) with code and data separated into their own sections. I think p2asm is probably the closest to satisfying the compiler's needs, and represents the best starting point. For a linker I don't know whether it makes sense to port binutils to P2 (it would provide a lot of other tools like ar and nm for free) or to roll our own. Probably porting binutils is the best path. None of this is going to be glamorous or even particularly interesting work. It's the plumbing that is needed for most languages to work though.
If we're going to go to the trouble of porting the rest of binutils would it make sense to port gas as well?
I think there should be some way to do a static analysis, post-compile, to ensure your COG memory usage does not result in the stack smashing local variables, even if it's just a tool that analyzed the linker maps and ensures there isn't overlapping ranges.
Most linkers I use throw an error if there is a memory overlap.
It's also common for them to report the size used by each linked segment.
In a P2 linker, I guess that means 8 separate reports, one for each cog/core.
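If a linker doesn't already catch it, the post-link check suggested above could even be a small standalone tool. Here's a rough sketch, assuming the map has first been reduced to "name start size" lines (hex addresses) rather than any particular linker's map format:

    #include <stdio.h>
    #include <stdlib.h>

    struct region { char name[64]; unsigned long start, size; };

    int main(void)
    {
        struct region r[256];
        int n = 0, errors = 0;

        /* read "name start size" triples, e.g. extracted from the linker map */
        while (n < 256 &&
               scanf("%63s %lx %lx", r[n].name, &r[n].start, &r[n].size) == 3)
            n++;

        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++)
                if (r[i].start < r[j].start + r[j].size &&
                    r[j].start < r[i].start + r[i].size) {
                    printf("overlap: %s and %s\n", r[i].name, r[j].name);
                    errors++;
                }

        return errors ? 1 : 0;
    }

Fed the COG-resident sections plus the reserved stack range, it would catch exactly the stack-stepping-on-variables case.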
I'm thinking we'll end up with memory models anyway, for the following cases:
a) Small bits of code all self-contained in COG space (COG/LUT RAM used only)
b) Typical model with hubexec, hub stack, GCC __attribute__
c) Hybrid model with hubexec, stack in LUT, GCC __attribute__ for COG variables
Yes, & there seems to be a case for code going into LUT first, if LUT is free, as COG RAM is more valuable for variables.
That may mean directives along these lines for the linker:
* Must go into COG
* Must go into LUT
* Must go into HUB
* Can go into either COG or LUT (COG first)
* Can go into either LUT or COG (LUT first)
* Can go into either COG or LUT or HUB
ersmith,
Yeah, not particularly exciting work in and of itself, but what it will enable is exciting.
Also, seems like if we would like to have P2 support merged back into the mainline (which I think might be a good long term goal), then we should try to do things as properly as possible, right? I need to spend some time looking at the docs for creating GCC backends, and also the existing P1 implementation, just so I am less in the dark on this topic.
Parallax always wants to keep the GCC compiler current. What prevents us from doing this is not being able to test and document the changes that we're putting in place. We are known by our educational customers for providing exceptional stability in software and hardware, with backward compatibility and without breaking things along the way.
I think it's more a matter of communication between us and the developers to get all these pieces in place at the right time. We can plan and execute on most improvements.
Please realize that in some ways our business is indeed a game of whack-a-mole; we have to continually shift our efforts and resources to sustain the business and development of the P2. Our silence in response to requests to update the compiler may have been perceived as a lack of interest, but that's not the case.
It seems it's time to talk about C/C++ options again. Here are the ones I know of:
1) p2gcc
2) FastSpin/C (just C, no C++ planned)
3) GCC
4) LLVM
We have some experience among us with GCC but none for LLVM that I know of so GCC is probably an easier route than LLVM. Dave has said that p2gcc is just a bridge tool and that he doesn't expect it to be a long-term solution for C/C++ on the P2. However, p2asm might be a good alternative to gas if it gets the ability to generate ELF files that can be understood by the binutils linker. I'm not sure how far Eric plans to take FastSpin/C but it would be a good alternative since it provides support for not only C but also Spin and BASIC.
Yes, we had PropGCC working with an earlier P2 but that was before there was hubexec support so it still used LMM. The code is still in the Parallax propgcc repository as far as I know.
Other than GCC, are any of the other open source tools likely to still be around years from now?
I ask because so many P1 tools just died, for example, Propeller Tool because of QT.
I'm not sure I understand your question. If you're talking about things like SimpleIDE or OpenSpin or PropLoader, you'll have to ask Parallax. Any open source tool will almost by definition continue to remain around if there is anyone who still cares about it since the source is available for someone to adopt if the original author loses interest. Some of the tools that have disappeared like BST were not open source and hence died when their original authors abandoned them.
Comments
I think that's what Ken meant: "our [branch of] GCC is outdated and not maintained".
Um, Most C/C++ compilers (VC, GCC, Clang) do compile in stack overflow checks, and have a fixed stack size (specified at compile time). They may offer ways to remove those checks, but I wouldn't use those options.
On Many CPUs it's not a check at each usage of the stack, but instead a memory page fault mechanism.
Certainly on machines with MMUs the stack checking is performed in hardware.
Other than GCC, are any of the other open source tools likely to still be around years from now?
I ask because so many P1 tools just died, for example, Propeller Tool because of QT.
The command line programs that actually do the work should not go away as long as they are open source.