Just so I understand, then, the "conversion" you speak of was to take the current x86asm Spin/PASM compiler and convert it to C or C++ so that it would be natively compatible with the GCC toolchain? And what you're suggesting now, until that work is complete, is a viable workaround, correct? Is there any plan to include a linking loader that can bring together compiled object modules from disparate sources and produce a single executable binary? Or would each module's code have to end up in its own cog, as your "workaround" description suggests? IOW, will there be any universal hooks to allow a C program to call a Spin method and vice versa? Or would such a coupling be considered a crime against nature?
-Phil
Phil,
Yes, the conversion I am talking about is the x86 spin/pasm compiler conversion into C++. The code could be linked directly into the toolchain as needed.
I imagined it being something that would compile the spin file and include it in the binary along with the C/C++ code. The main problem to resolve is how to handle the spin bytecode. Do we convert it to some C LMM code, or launch it in a separate cog? I'd be inclined to convert it, but that is another significant piece of work (that may or may not be already partially done in a usable form). It would be nice to "expose" spin functions as if they were extern C functions and the "global vars" as globals in the C/C++ side too. Some thinking needs to be done on the details, but my ideal situation is taking a spin file from obex as is and "linking" it in with the C/C++ code and being able to call the spin functions and access the vars directly.
Roy
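To make that idea concrete, here is a purely hypothetical sketch of what a generated C binding for an OBEX object might look like. No such generator exists yet; the naming scheme and the FullDuplexSerial example are invented for illustration only.

    /* Hypothetical binding generated from FullDuplexSerial.spin.
       A sketch only -- names and layout are assumptions. */

    /* A Spin VAR exposed as a C global (assumed layout). */
    extern int FullDuplexSerial_rx_head;

    /* Spin PUB methods exposed as ordinary C functions. */
    extern int  FullDuplexSerial_start(int rxpin, int txpin, int mode, int baudrate);
    extern void FullDuplexSerial_tx(int c);
    extern int  FullDuplexSerial_rx(void);

    /* A C caller could then use the object directly: */
    void hello(void)
    {
        FullDuplexSerial_start(31, 30, 0, 115200);
        FullDuplexSerial_tx('A');
    }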
Including SPIN byte-codes in a VM or translation to SPIN LMM both seem to be attractive solutions to SPIN integration issues. I'm sure Ross would have an opinion on all this.
The first gives a faster path; the second gives easier same-COG interoperability (think P2 also) and better performance. It would not be too hard to create include files from the SPIN objects, but associating a C call with a SPIN method library might be more difficult for a modified SPIN byte-code VM.
It should be made clear that the current requirements do not include any C and SPIN bytecode machine interoperability. The only thing we have planned for from SPIN is a PASM blob that can be started with cognew, as is widely done today.
If Parallax wants the feature for running SPIN programs with propgcc, that must be planned. It is not in any plan at the moment. We will most likely finish the current plan first.
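For readers unfamiliar with the PASM-blob pattern mentioned above, here is a minimal sketch of how it looks from C, assuming propgcc provides the Spin-style cognew() from propeller.h and that the driver image has already been converted into a C array (the array name is invented):

    #include <propeller.h>   /* assumed to provide cognew(), as in Spin */

    /* PASM driver image, e.g. produced by running a .dat blob
       through a bin-to-header converter (hypothetical name). */
    extern unsigned int pasm_driver[];

    static volatile int mailbox;   /* shared hub variable passed via PAR */

    int start_driver(void)
    {
        mailbox = 0;
        /* Start the blob in the next free cog; PAR points at the mailbox. */
        int cog = cognew(pasm_driver, (void *)&mailbox);
        return cog;   /* negative if no cog was free */
    }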
I think Catalina has already solved this the best way possible. You are welcome to use that solution (especially as it is loosely based on your original spinc utility).
Ross.
The following thread seems to be cause for concern:
http://forums.parallax.com/showthread.php?133618-A-Parallax-C-Proposal
Is the requirements document available for review?
Areas that I think are of interest are:
- Execution Target - PASM / LMM / Other
- Stack location - COG / HUB / External Memory / Other
- COG usage, i.e. writing applications that load COGs with device drivers, etc. that are written in the new compiler, or possibly imported as binaries from other sources, or even using SPIN source.
C.W.
As far as I know, nothing about the direction has changed since the decision was made several weeks ago to focus exclusively on the current Propeller.
The post you refer to was my own independent initiative. It was not sanctioned by Parallax or anyone else on the team. The only answer to that initiative that I've heard so far from others is that I should never post anything like that again without talking to the principals. I submit to that guidance.
The requirements summary is essentially posted here: http://forums.parallax.com/showthread.php?132434-Propeller-GCC-Status&p=1024532&viewfull=1#post1024532
I will take more time to respond to your areas of interest later. I didn't want you to feel ignored.
Previously reported progress is based on models that may not necessarily be kept in the end product.
As mentioned a few times now, future status will be posted on a forum blog.
Still, I will entertain reasonable questions where I have solid answers here.
Thanks,
--Steve
Hi jazzed,
Does that mean I should not expect either you or Parallax to respond any further to the suggestions you raised in that thread? I wish you guys would sort yourselves out a bit - I'm having a hard time keeping up with all the chopping and changing.
Ross.
Ross, Parallax may have something to say about it later.
I will answer some questions, but I will not respond to any shade of provocation.
Hi Jazzed,
Ok - without meaning to be provocative, I guess I'm just having a hard time understanding how the GCC team and Parallax actually interact. I must admit I have lost track of who is actually on the GCC team - but don't some members of the team actually work for Parallax?
Ross.
Are you able to tell us who is on the 'Team' and what their primary roles are?
Thank you,
Coley
We have regular forum contributors and some GCC core contractors.
I am very proud of the team, but their participation is their business.
The contributors can volunteer that information if they like.
I'm one of the members of the GCC team. My main responsibility is the P2 simulator. However, that activity is on hold right now, and I've been writing small test programs to check out the compiler. I work as a volunteer, and do about 8 hours a week on the project.
Do you have P2 hardware prototypes to work on or is everything being done in theory?
Coley
Chip has FPGA prototypes. Masked P2 silicon does not exist. Parallax can update the schedule.
The simulator is based on instructions as of June 22. Some key instructions have changed since then.
Thanks, I was just curious as to how you were pushing on with a simulator with no hardware to speak of.
C.W.
This is already looking very good. I just pulled up some LMM benchmark results from the Compiler Benchmark thread.
prop-gcc:
fibo(26) = 121393 (01909ms) (152791280 ticks)
Catalina:
fibo (26) = 121393 (4583ms)
Zog Virtual Machine (zpu-gcc):
fibo(26) in 12288ms
Is it really so that prop-gcc is already turning in over twice the speed of Catalina for LMM code? Hope I have the latest numbers there and am not doing RossH a disservice.
Ignore the Zog result, I just threw that in there as Zog gets grumpy if I don't mention him occasionally. Well, no, just to show that the VM is doing quite well compared to "native" LMM.
How about the code size for that fibo() function?
Hi heater,
Yes, you have the right numbers for Catalina. Looks like I may have to tweak my code generator yet again ... but perhaps not just yet. It is possible to write a code generator that gives good results for fibo, but not much else. There was the well-known case of "another C compiler" that gave very good numbers in small fibo cases but crashed and burned on larger ones.
Not saying that gcc will do the same (in fact I'm confident it won't!) - I'm just saying that fibo is not a good overall guide because it doesn't exercise very much of the code generator.
Still, this looks quite promising. Catalina is pleased to finally see some competition - especially after sending Zog back to his ice cube in shame.
Ross.
I agree, I think we and others have often said that fibo() is a pretty poor benchmark of anything useful about a compiler. Especially in space constrained and real-time systems where heavily recursive functions are not the norm.
Hmm... Makes me think: are they using the same recursive fibo() that we do, or a more straightforward one that would obviously be much faster?
So yes, I would not jump to any conclusions just yet. Let's see what they can do with fft_bench for example, which is an all round better test of loop constructs and expression evaluation.
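For reference, the fibo used in these benchmark threads is, as far as I know, the naive doubly recursive version, which is exactly why it stresses call/return and stack handling rather than the code generator as a whole:

    /* The classic doubly recursive fibo -- heavy on call/return and
       stack traffic, light on everything else. */
    unsigned int fibo(unsigned int n)
    {
        if (n < 2)
            return n;
        return fibo(n - 1) + fibo(n - 2);
    }
    /* fibo(26) = 121393, matching the results quoted above. */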
Ahhh! Zog feels no shame. In fact he's quite proud of himself.
Look at the figures. If I understand correctly what we have in Zog is a byte code interpreter that is 6.4 times slower than prop-gcc and 2.6 times slower than Catalina. Please show me another virtual machine (without JIT) that compares so well to native compiled code. That's like saying Zog can interpret his byte codes in a loop of only 3 or 6 native instructions.
If anything the native code compilers should be holding their heads in shame for performing so badly compared to a byte code interpreter.
But I forget my own argument, LMM is not native code, it's a VM as well. In that light Catalina and prop-gcc are doing OK I guess:)
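To see why those ratios are respectable, consider what a byte-code interpreter has to do for every virtual instruction. A minimal sketch of the fetch/decode/dispatch overhead (an illustration only, not Zog's actual code):

    #include <stdint.h>

    enum { OP_HALT, OP_PUSH, OP_ADD };

    void run(const uint8_t *code)
    {
        int32_t stack[64];
        int sp = 0;
        const uint8_t *pc = code;

        for (;;) {
            switch (*pc++) {          /* fetch + decode, every instruction */
            case OP_PUSH: stack[sp++] = *pc++; break;
            case OP_ADD:  sp--; stack[sp - 1] += stack[sp]; break;
            case OP_HALT: return;
            }
        }
    }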
Propeller GCC produces native COG code.
The 486ms FIBO26 result was produced by running native COG code.
FIBO is a terrible benchmark, especially in certain cases.
I've posted code to the blog for your appraisal.
xBasic uses a byte-code VM and runs FIBO(26) in 6.4 seconds. Again, FIBO is a terrible benchmark.
I suspect that most users whose programs target the cog will not be doing anything so stack-centric as recursive functions. A better benchmark for the cog code generator might be something more DSP-like that uses a lot of math operations and register loads/stores, but not so much stack thrashing.
-Phil
Yep, I'm hoping prop-gcc can compile fft_bench into the COG space. Altogether a lot more realistic test.
Here's a FIBO(20) language round-up on 80MHz Propellers.
The propgcc compiler is not ready for general testing yet.
1. BigSpin is not fully functional and is only in the table for posterity.
2. There was a propgcc LMM code generator improvement this morning. FIBO(26) now takes 1767ms.
3. Based on weekend build.
Heater's fft_bench.c code is too big for a single COG. Here's the lmm result.
fft_bench v1.0
Freq. Magnitude
00000000 1FE
000000C0 1FF
00000140 1FF
00000200 1FF
1024 point bit-reversal and butterfly run time = 161 ms
How does it compare?
Wow, my PASM version of fft_bench is running somewhere between 30 and 40ms, so that's pretty damn good. Catalina turned in about 410ms, but perhaps I should let Ross speak on that.
Is it possible to compile only the butterfly function into COG space? Most of the rest of it could remain LMM and just use the butterfly cog through a mailbox/buffer interface.
Heater, I'll do a PASM butterfly version of fft_bench later.
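For anyone unfamiliar with the mailbox idea being suggested: the LMM side and the butterfly cog would share a small structure in hub RAM and hand work back and forth through it. A rough sketch, with invented field names rather than actual propgcc code:

    /* Hypothetical mailbox shared between the LMM program and a
       butterfly cog; names are illustrative assumptions. */
    typedef struct {
        volatile int  cmd;      /* 0 = idle, 1 = run butterfly pass */
        int          *data;     /* buffer of complex samples in hub RAM */
        int           npoints;  /* transform size, e.g. 1024 */
    } fft_mailbox_t;

    static fft_mailbox_t mbox;

    /* LMM side: post one butterfly pass and wait for the cog to finish. */
    void run_butterfly(int *buf, int n)
    {
        mbox.data    = buf;
        mbox.npoints = n;
        mbox.cmd     = 1;            /* the PASM cog polls this flag */
        while (mbox.cmd != 0)        /* cog clears it when the pass is done */
            ;
    }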
It is very interesting to me that Propeller fft_bench compiled by propgcc lmm runs faster than Debian Linux on my PC's VirtualBox most of the time - PCs are not very deterministic.
Results for Propeller fft_bench compiled with propgcc are always the same:
fft_bench v1.0
Freq. Magnitude
00000000 1FE
000000C0 1FF
00000140 1FF
00000200 1FF
1024 point bit-reversal and butterfly run time = 161 ms
Here are the varying results from my VirtualBox Debian tests:
fft_bench v1.0
Freq. Magnitude
00000000 1fe
000000c0 1ff
00000140 1ff
00000200 1ff
1024 point bit-reversal and butterfly run time = 285 ms
steve@debian-vm:~/gcc/propgcc/demos/fft/lmm$ ./fft
fft_bench v1.0
Freq. Magnitude
00000000 1fe
000000c0 1ff
00000140 1ff
00000200 1ff
1024 point bit-reversal and butterfly run time = 253 ms
steve@debian-vm:~/gcc/propgcc/demos/fft/lmm$ ./fft
fft_bench v1.0
Freq. Magnitude
00000000 1fe
000000c0 1ff
00000140 1ff
00000200 1ff
1024 point bit-reversal and butterfly run time = 149 ms
steve@debian-vm:~/gcc/propgcc/demos/fft/lmm$ ./fft
fft_bench v1.0
Freq. Magnitude
00000000 1fe
000000c0 1ff
00000140 1ff
00000200 1ff
1024 point bit-reversal and butterfly run time = 2972 ms
steve@debian-vm:~/gcc/propgcc/demos/fft/lmm$ ./fft
fft_bench v1.0
Freq. Magnitude
00000000 1fe
000000c0 1ff
00000140 1ff
00000200 1ff
1024 point bit-reversal and butterfly run time = 270 ms
Of course the same code running on my PC in a DOS window screams.
fft_bench v1.0
Freq. Magnitude
00000000 000001fe
000000c0 000001ff
00000140 000001ff
00000200 000001ff
1024 point bit-reversal and butterfly run time = 116 us