Using a COG as a floating-point coprocessor
ersmith
Here's a simple C file that lets you use a dedicated COG as a floating-point coprocessor. It accelerates the basic operations (add, subtract, multiply, divide, conversion from int, and conversion between the float sizes) in both 32-bit and 64-bit forms.
To use it, just include fpucog.c in your SimpleIDE project or Makefile; you don't have to do anything else. The function start_fpu_cog(), which starts the COG, is declared as a constructor, so it is called automatically when your program starts. You can shut the COG down with stop_fpu_cog(), but if you do, no floating point will be available at all until you call start_fpu_cog() again.
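A minimal host-side sketch of the constructor mechanism, assuming GCC's __attribute__((constructor)): the start function runs before main(), so user code never has to call it. The bodies here are hypothetical stand-ins that only record state; the real start_fpu_cog()/stop_fpu_cog() in fpucog.c launch and stop an actual COG.

```c
/* Hypothetical flag standing in for "the FPU COG is running". */
static volatile int fpu_cog_running = 0;

void start_fpu_cog(void);
void stop_fpu_cog(void);

/* Marked as a constructor: GCC arranges for this to run before main(). */
__attribute__((constructor))
void start_fpu_cog(void)
{
    /* Sketch only: the real version would start the COG here. */
    fpu_cog_running = 1;
}

void stop_fpu_cog(void)
{
    /* After this, floating point is unavailable until start_fpu_cog() runs again. */
    fpu_cog_running = 0;
}
```

Because the constructor has already run, fpu_cog_running is 1 on entry to main(); calling stop_fpu_cog() and then start_fpu_cog() toggles it off and back on.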
The program is also interesting because it illustrates how to include a substantial GAS assembly program inline in a C file.
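A small sketch of the general technique, assuming a GCC-compatible toolchain with an ELF target: a top-level __asm__ statement whose string literals form a GAS program, with labels that C code then sees as ordinary external symbols. The directives below (.pushsection, .equ, .long) are host-portable stand-ins and fpu_demo_image is a hypothetical name; fpucog.c itself would contain Propeller instructions instead.

```c
#include <stdint.h>

/* The assembler program lives in string literals; each line ends in "\n".
   On targets that prefix C symbols with an underscore, the global labels
   would need that prefix too. */
__asm__(
    "    .pushsection .data\n"
    "    .balign 4\n"
    "    .equ FPU_DEMO_MAGIC, 0x46505500\n"   /* assembler-level constant */
    "    .globl fpu_demo_image\n"
    "fpu_demo_image:\n"
    ".Lfpu_demo_start:\n"
    "    .long FPU_DEMO_MAGIC\n"
    "    .long FPU_DEMO_MAGIC + 1\n"
    ".Lfpu_demo_end:\n"
    "    .globl fpu_demo_image_bytes\n"
    "fpu_demo_image_bytes:\n"
    /* The assembler computes the image size from two local labels. */
    "    .long .Lfpu_demo_end - .Lfpu_demo_start\n"
    "    .popsection\n"
);

/* C declarations for the labels defined in the asm above. */
extern const uint32_t fpu_demo_image[];
extern const uint32_t fpu_demo_image_bytes;
```

With this, fpu_demo_image[0] is 0x46505500 and fpu_demo_image_bytes is 8, so C code can hand the image address and size to whatever loads it.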
Performance is OK in LMM mode (it's a bit faster than the default library, and a few K smaller). In CMM mode, and especially in the XMM modes, the improvement is much more noticeable. For example, with the default libraries and 64-bit doubles, the Solar Positioning Algorithm takes 108 seconds in XMMC mode; with fpucog.c and the same size doubles it takes 12 seconds, a 9x speedup.
Note that this code is not thread safe. Making it thread safe should be pretty easy, and is left as an exercise for the reader :-).
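One possible approach, sketched under assumptions: hold a lock for the whole write-request / wait / read-result sequence so two callers can never interleave on a single shared mailbox. Here a C11 atomic_flag stands in for a Propeller hardware lock, and the mailbox layout, command codes, and fake_cog_poll() are all hypothetical; on real hardware the COG would poll the mailbox in parallel rather than being called.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical single-slot mailbox shared with the FPU COG. */
typedef struct {
    volatile uint32_t cmd;            /* 0 = idle, nonzero = request pending */
    volatile float a, b, result;
} fpu_mailbox_t;

enum { FPU_CMD_IDLE = 0, FPU_CMD_ADD = 1, FPU_CMD_MUL = 2 };

static fpu_mailbox_t fpu_mailbox;

/* test_and_set returns the previous state, much as the Propeller's lockset does. */
static atomic_flag fpu_lock = ATOMIC_FLAG_INIT;

/* Stand-in for the COG: serves one pending request, then marks the mailbox idle. */
static void fake_cog_poll(fpu_mailbox_t *m)
{
    switch (m->cmd) {
    case FPU_CMD_ADD: m->result = m->a + m->b; break;
    case FPU_CMD_MUL: m->result = m->a * m->b; break;
    default:          m->result = 0.0f;        break;  /* unknown command */
    }
    m->cmd = FPU_CMD_IDLE;            /* signal completion */
}

/* The lock is held across the whole exchange with the mailbox. */
float fpu_call(uint32_t cmd, float a, float b)
{
    while (atomic_flag_test_and_set_explicit(&fpu_lock, memory_order_acquire))
        ;                             /* spin until the lock is free */
    fpu_mailbox.a = a;
    fpu_mailbox.b = b;
    fpu_mailbox.cmd = cmd;
    while (fpu_mailbox.cmd != FPU_CMD_IDLE)
        fake_cog_poll(&fpu_mailbox);  /* real code would just wait for the COG */
    float r = fpu_mailbox.result;
    atomic_flag_clear_explicit(&fpu_lock, memory_order_release);
    return r;
}
```

The key design point is the scope of the lock: taking it only around the command write would still let a second caller overwrite the operands or read the wrong result.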
EDIT: there were some problems with fpucog.c in CMM mode; I've updated the .zip file with a new version that works around them. Note that fpucog.c does require the compiler from the CMM preview release, because the assembly code contains some CMM directives.
EDIT: released version 4, which fixes the integer-to-float conversion and adds some #ifdefs to allow compilation with older PropGCC releases. If you use an older (pre-CMM) release you will get a warning about the unknown __attribute__((fcache)), but that warning is safe to ignore.
Version 4 also has an optional ECOG define that allows the FPU COG code to reside in EEPROM.
EDIT: and now there's version 5, which fixes a small bug in the handling of the sticky bit (which affects rounding).
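For readers wondering why a sticky bit matters, here is an illustrative sketch (not the fpucog.c code) of converting a 32-bit int to IEEE-754 single precision with round-to-nearest-even. When bits are shifted out of the mantissa, the guard bit is the highest dropped bit and the sticky bit is the OR of every dropped bit below it; only guard=1 with sticky=0 is a true tie.

```c
#include <stdint.h>

/* Returns the IEEE-754 single-precision bit pattern for x,
   rounding to nearest with ties to even. */
uint32_t int32_to_float_bits(int32_t x)
{
    if (x == 0)
        return 0;
    uint32_t sign = 0, mag = (uint32_t)x;
    if (x < 0) {
        sign = 0x80000000u;
        mag = 0u - (uint32_t)x;          /* well defined even for INT32_MIN */
    }
    int msb = 31;                        /* find the leading 1 bit */
    while (!(mag & (1u << msb)))
        msb--;
    uint32_t exp = (uint32_t)(msb + 127);
    uint32_t mant;                       /* 24 bits including the hidden 1 */
    if (msb <= 23) {
        mant = mag << (23 - msb);        /* fits exactly, no rounding */
    } else {
        int shift = msb - 23;            /* 1..8 bits are dropped */
        uint32_t dropped = mag & ((1u << shift) - 1u);
        uint32_t half = 1u << (shift - 1);
        uint32_t guard = (dropped & half) != 0;
        uint32_t sticky = (dropped & (half - 1u)) != 0;  /* OR of bits below guard */
        mant = mag >> shift;
        if (guard && (sticky || (mant & 1u))) {
            mant++;                      /* round up */
            if (mant == (1u << 24)) {    /* mantissa overflowed: renormalize */
                mant >>= 1;
                exp++;
            }
        }
    }
    return sign | (exp << 23) | (mant & 0x7FFFFFu);
}
```

For example, 33554435 (2^25 + 3) drops the two bits 11: the guard bit alone makes it look like an exact tie (which would round down to the even value 33554432), but the sticky bit shows it is above halfway, so it correctly rounds up to 33554436.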
Comments
This looks like a great idea!
Unfortunately, I'm having trouble with it. It seems like the serial I/O is getting trashed. I'll post a .zip example in a moment.
I've attached David's expr program. If I run make run on Linux with the default Makefile, everything works. If I add fpucog.c (uncomment #FPU=fpucog.c), I don't get any I/O. I see similar behavior on Linux and Windows with the .side project file.
Without fpucog.c
Really? Nothing wasted then.
I obviously didn't do enough testing in CMM mode. There are at least 3 problems:
(1) in the version I posted I forgot the .compress off/.compress default around the FPU COG code, so the COG was doing random things
(2) there's a bug in CMM FCACHE code which is being exposed by fpucog.c; disabling the __attribute__((fcache)) works around this for now
(3) there's also some kind of problem in the ITOF conversion functions which I haven't figured out yet; I've disabled those for now
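A sketch of how workaround (2) could be kept switchable rather than deleted: wrap the attribute in a macro. FPU_USE_FCACHE and fpu_wait_done() are hypothetical names; fcache is a PropGCC-specific attribute, so the macro is left undefined here and the function compiles as ordinary code on any GCC.

```c
/* Define FPU_USE_FCACHE (hypothetical) to re-enable fcache once the
   CMM FCACHE bug is fixed; leave it undefined to work around it. */
#if defined(FPU_USE_FCACHE)
#define FPU_FCACHE __attribute__((fcache))
#else
#define FPU_FCACHE /* empty: the function runs as ordinary code */
#endif

/* A tight polling loop, the kind of small function fcache is meant for.
   Returns how many spins it took before *cmd became 0 (capped at max_spins). */
FPU_FCACHE
static unsigned fpu_wait_done(volatile const unsigned *cmd, unsigned max_spins)
{
    unsigned spins = 0;
    while (*cmd != 0 && spins < max_spins)
        spins++;
    return spins;
}
```

Keeping the attribute behind one macro means re-enabling it later is a one-line change (or a -D flag) instead of an edit to every affected function.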
I'll update the original post with fpucog2.zip, containing these fixes.
Eric
This is great, and something that I've been meaning to do. I love that now the brunt of the work has been done for me!
However, I'm scratching my head wondering why the cog portion was done as inline assembly; was it for demonstration purposes, showing what is possible? Why not use a separate assembly file (fpucog.S, and call the other one fpuwrapper.c or some such), so you can get rid of all the quotes, newlines, and .equs, and have it preprocessed like a regular GCC file, with #defines and block comments (and even #ifdefs if you want)? This would also make it much easier to build as an ecog if the user wants to (and I do!).
For example, in my Makefiles I have two symbols, COGSRC and ECOGSRC; if I comment out COGSRC+=fpucog.S and put in ECOGSRC+=fpucog.S, the details are handled for me (except for changing cognew() to cognew_from_boot_EEPROM() in the main program or wrapper, but this could also be done with an #ifdef/#else). This also works with cogc code, but either way the cog code has to be in a separate file for this automation to work. To me, this sort of configurability and maintainability is what makes GCC so great, and why I could never get Catalina (or even Spin) to work for me, even though Ross did some really good work.
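The #ifdef/#else switch mentioned above might look something like this sketch. Both loaders are hypothetical host stubs that just return a pretend COG number: on PropGCC the hub path would be cognew() from <propeller.h>, and the EEPROM path the cognew_from_boot_EEPROM() helper named above, whose real signature is not shown in this thread and is assumed here.

```c
/* Stub for the normal path: image resident in hub RAM. */
static int stub_cognew(const void *image, void *par)
{
    (void)image; (void)par;
    return 1;                       /* pretend COG 1 was started */
}

/* Stub for the ECOG path: image read from the boot EEPROM. */
static int stub_cognew_from_eeprom(const void *image, void *par)
{
    (void)image; (void)par;
    return 2;                       /* pretend COG 2 was started */
}

/* The build decides which loader is used; callers never change. */
#ifdef ECOG
#define FPU_COG_START(img, par) stub_cognew_from_eeprom((img), (par))
#else
#define FPU_COG_START(img, par) stub_cognew((img), (par))
#endif

static const unsigned fpu_image_stub[4];   /* placeholder COG image */
static unsigned fpu_mailbox_stub;          /* placeholder PAR target */

int start_fpu_cog_sketch(void)
{
    return FPU_COG_START(fpu_image_stub, &fpu_mailbox_stub);
}
```

Built without -DECOG, start_fpu_cog_sketch() takes the hub path and returns 1; with -DECOG it takes the EEPROM path instead.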
Since I want this functionality, at some point I'm going to change your code this way. If you are interested, I would be quite willing to take this on now and give you the files, so you can make any other tweaks needed to get it working properly with all the memory models. What I don't want to have to do is re-port your tweaks from your code to mine.
So... should I do it ?
Thanks for your great work (and I'm looking to give something back),
David Voss
@David Voss, any contributions are welcome. I look forward to the results.
@Heater. That's right, nothing wasted. I thought you knew.
It was mainly for demonstration purposes, to show how to do inline GAS (lots of people have asked). It also makes the "packaging" simpler: users only have to add one file to their projects instead of two, and I wasn't quite sure whether SimpleIDE correctly handled .S files that need preprocessing. That said, I do agree with you that in general it would be better to have the assembly in a separate file; it would certainly make the source more readable.
It is still possible to use #ifdefs and conditional compilation (around the strings inside the asm), and the new version I've uploaded does that. In version 4 I've added an ECOG define that allows it to be built as a .ecog instead of a .cog.
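A sketch of what conditional compilation around the asm strings can look like: the preprocessor picks string fragments before they are pasted into a top-level __asm__. The section names and the fpu_build_kind symbol are hypothetical, host-safe stand-ins (ELF, GCC-compatible toolchain assumed); a real ECOG build would use whatever section names PropGCC's linker script expects, which are not shown in this thread.

```c
#include <stdint.h>

#ifdef ECOG
#define FPU_SECTION    ".data.fpu_ecog"   /* stand-in for an EEPROM-resident image */
#define FPU_BUILD_KIND 2
#else
#define FPU_SECTION    ".data.fpu_cog"    /* stand-in for a normal COG image */
#define FPU_BUILD_KIND 1
#endif

/* Two-level macro so a numeric macro expands before being stringified. */
#define FPU_STR_(x) #x
#define FPU_STR(x)  FPU_STR_(x)

/* Adjacent string literals are concatenated, so macros can be spliced in. */
__asm__(
    "    .pushsection " FPU_SECTION ", \"aw\"\n"
    "    .balign 4\n"
    "    .globl fpu_build_kind\n"
    "fpu_build_kind:\n"
    "    .long " FPU_STR(FPU_BUILD_KIND) "\n"
    "    .popsection\n"
);

extern const uint32_t fpu_build_kind;
```

Without -DECOG the data lands in the ".data.fpu_cog" stand-in section and fpu_build_kind reads back as 1.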
However, I'm probably not going to update this much more; it's mostly intended as a demo. So if you want to take it and run with it, please be my guest! I certainly don't mind forks, and I expected users to customize it for various purposes anyway. There's not a whole lot of room left in the COG, so people will have different ideas about what additional floating-point functions should go in there.
Thanks,
Eric
Indeed! ZOG is a cool project, and an inspiration to everyone developing C on the Propeller.
I would have done the communication with the cog differently, though: instead of a mailbox with one command, I would set up a stack, "push" the numbers and operations onto it, and then use a lock to start the operation. The cog would work as an RPN calculator, pulling commands and numbers off the stack and eventually leaving just one number on the stack: the result of the operation. If you've ever worked with Forth, you'll know what I mean. Incidentally, if I understand correctly, this is also how "real" FPUs work.
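A minimal sketch of that stack-based idea, with the "cog" side reduced to an ordinary C function: the caller builds a list of pushes and operations, and the evaluator runs it RPN-style, leaving one value. Opcode names, the instruction layout, and the 8-entry stack depth are all hypothetical choices, and the lock handshake is omitted.

```c
#include <stddef.h>

/* Hypothetical opcodes for the RPN program. */
enum { FOP_PUSH, FOP_ADD, FOP_SUB, FOP_MUL, FOP_DIV };

typedef struct {
    int op;        /* one of the FOP_* codes */
    float value;   /* operand, used only by FOP_PUSH */
} fpu_insn_t;

#define FPU_STACK_DEPTH 8

/* Evaluates prog; returns 0 and stores the single remaining value in *out,
   or returns -1 on stack overflow/underflow or an unknown opcode. */
int fpu_rpn_run(const fpu_insn_t *prog, size_t n, float *out)
{
    float stack[FPU_STACK_DEPTH];
    size_t sp = 0;
    for (size_t i = 0; i < n; i++) {
        if (prog[i].op == FOP_PUSH) {
            if (sp == FPU_STACK_DEPTH)
                return -1;                  /* overflow */
            stack[sp++] = prog[i].value;
            continue;
        }
        if (sp < 2)
            return -1;                      /* underflow */
        float b = stack[--sp];              /* top of stack is the right operand */
        float a = stack[--sp];
        float r;
        switch (prog[i].op) {
        case FOP_ADD: r = a + b; break;
        case FOP_SUB: r = a - b; break;
        case FOP_MUL: r = a * b; break;
        case FOP_DIV: r = a / b; break;
        default:      return -1;
        }
        stack[sp++] = r;
    }
    if (sp != 1)
        return -1;                          /* must end with exactly one result */
    *out = stack[0];
    return 0;
}
```

For example, (2 + 3) * 4 becomes push 2, push 3, add, push 4, mul, and fpu_rpn_run() leaves 20.0 as the result.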
Keep up the good work!
===Jac
fpucog.c uses the same interface, so it replaces the default floating point routines with no other changes required.
A stack-based FPU would be a great idea if it went along with changes to GCC's code generator so that it used the FPU for all the math (generating things like "fpush A; fpush B; fadd" for A+B). But that's beyond the scope of what I wanted to do in this project :-).
Thanks for the comments!
Eric
Awesome!