For the uninitiated: RSN is neither the "Realschule Schloss Neuenahr" nor the "Rostocker Studentinnen Netz", but a much-feared abbreviation for waiting until kingdom come: "Real Soon Now".
The asm/linker/LMM kernel team continues to make progress. The LMM kernel is taking baby steps. I need to decide whether to add things like divide/mul as LMM kernel code (fast, but takes up space) or as normal HUB routines (slow). We are using a lot of the COG RAM. I know Bill and Mike's ideas were to add multi-threading capability etc. to the Prop, but at this juncture, I want to see how fast we can push this puppy using as much COG RAM as possible for the C Virtual Machine. A multi-threading kernel can use HUB RAM, of course (slow again...). We will see...
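The P1 COG instruction set has no multiply or divide instructions, so either way these operations are software loops; the choice above is only about where the loop lives (COG RAM: fast; HUB RAM: slower). A minimal C sketch of what such routines compute, assuming 32-bit unsigned operands. The names `soft_mul` and `soft_divmod` are hypothetical and not part of ImageCraft's kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Shift-and-add multiply: per multiplier bit, one conditional add and
   two shifts. Result wraps modulo 2^32, like C's unsigned '*'. */
uint32_t soft_mul(uint32_t a, uint32_t b)
{
    uint32_t product = 0;
    while (b != 0) {
        if (b & 1u)        /* low multiplier bit set: add current a */
            product += a;
        a <<= 1;           /* next power-of-two multiple of a */
        b >>= 1;           /* consume one multiplier bit */
    }
    return product;
}

/* Restoring division: bring in one dividend bit at a time and subtract
   the divisor whenever it fits. Always 32 iterations. */
uint32_t soft_divmod(uint32_t n, uint32_t d, uint32_t *rem)
{
    assert(d != 0);        /* division by zero is undefined in C */
    uint64_t r = 0;        /* 64 bits so r << 1 cannot overflow */
    uint32_t q = 0;
    for (int i = 31; i >= 0; i--) {
        r = (r << 1) | ((n >> i) & 1u);   /* shift next dividend bit in */
        if (r >= d) {                      /* divisor fits: subtract it */
            r -= d;
            q |= 1u << i;                  /* and set this quotient bit */
        }
    }
    if (rem)
        *rem = (uint32_t)r;
    return q;
}
```

Both loops run once per bit, which is why keeping them in COG RAM (native speed) rather than fetching them from HUB RAM matters for arithmetic-heavy code.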
Oh, to reiterate: the beta compiler will be "free" (to use). On release, the demo will be fully functional for 45 days, and afterward it will work with an X K size limit for non-commercial use. For the Propeller, I am thinking that 4K may be a good value for X. This will allow hobbyists to test-drive it, etc.
We just raised the price of our STD compilers to $249. Prices had not changed in 7 years!!! So it's about time, with inflation and all. I understand that a lot of Propeller users are hobbyists, so I may run a "release special" for $99 or something like that for a few months. Heck, if you want to send me $99 now to keep us going (hey, pizza and ramen can get expensive), I'll take it.
ImageCraft said...
.... I need to decide whether to add things like divide/mul as LMM kernel code
Do it! I thought about this some time ago....
You will ruin any benchmark past recovery without them.
And how would you handle multi-dimensional arrays? Though that's multiplication only...
The syntax and semantics of multi-dimensional arrays are (generically) supported by the portable ANSI C front end (if it's in C86, we support it). The code generator supports generating MUL, shift, or shift/add code. As the Propeller has single-cycle multi-bit shifts, we will aggressively use them for MUL by constants.
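To make the multi-dimensional array case concrete: accessing `a[i][j]` in an `int a[ROWS][COLS]` needs `i * COLS`, and because `COLS` is a compile-time constant, that multiply can be expanded into shifts and adds. A minimal sketch; `ROWS`, `COLS` and the function names are hypothetical, and it uses the plain binary method (one shifted copy per set bit), not necessarily the selection ImageCraft's code generator makes.

```c
#include <assert.h>
#include <stddef.h>

#define ROWS 4
#define COLS 10   /* hypothetical row length: a compile-time constant */

/* i * c using only shifts and adds: one shifted copy of i for each set
   bit of c (the plain binary method). */
static size_t mul_const_shift_add(size_t i, size_t c)
{
    size_t sum = 0;
    for (unsigned bit = 0; c != 0; bit++, c >>= 1) {
        if (c & 1u)
            sum += i << bit;
    }
    return sum;
}

/* Element index of a[i][j], counted from a[0][0], for an
   int a[ROWS][COLS]: i*COLS (expanded into shifts/adds) plus j. */
size_t elem_offset(size_t i, size_t j)
{
    return mul_const_shift_add(i, COLS) + j;
}

/* Cross-check against the compiler's own address arithmetic by
   comparing byte offsets within one array object. */
void check_elem_offset(void)
{
    static int a[ROWS][COLS];
    for (size_t i = 0; i < ROWS; i++)
        for (size_t j = 0; j < COLS; j++)
            assert((size_t)((char *)&a[i][j] - (char *)a)
                   == elem_offset(i, j) * sizeof(int));
}
```

With `COLS` = 10 (binary 1010) the expansion is `(i << 3) + (i << 1)`: two shifts and one add, no MUL call.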
The Propeller has a nice barrel shifter, but the LMM overhead could counterbalance that.
Multiplication by (constant!) 10 is a much-used operation, also for psychological reasons (whenever you need a number between 8 and 12, you often choose 10).
MOV r0, x    ' save original x
SHL x, #2    ' x = 4x
ADD x, r0    ' x = 5x
SHL x, #1    ' x = 10x
This has some advantages with respect to operand flexibility.
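The four-instruction PASM sequence above can be checked in plain C: x*10 = (x*4 + x)*2. A small sketch transcribing it line by line (`mul10` and `check_mul10` are hypothetical names for illustration).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* C transcription of the PASM sequence: two shifts and one add. */
uint32_t mul10(uint32_t x)
{
    uint32_t r0 = x;   /* MOV r0, x : save original x */
    x <<= 2;           /* SHL x, #2 : x = 4x          */
    x += r0;           /* ADD x, r0 : x = 5x          */
    x <<= 1;           /* SHL x, #1 : x = 10x         */
    return x;
}

/* Spot-check against ordinary multiplication. Unsigned arithmetic wraps
   modulo 2^32 on both sides, so even overflowing inputs agree. */
void check_mul10(void)
{
    const uint32_t samples[] = { 0u, 1u, 7u, 123456u, 0xFFFFFFFFu };
    for (size_t k = 0; k < sizeof samples / sizeof samples[0]; k++)
        assert(mul10(samples[k]) == samples[k] * 10u);
}
```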
Maybe your front end is smart enough to generate the shortest code sequence... e.g. for times 1023.
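Times 1023 is a good test case because the naive approach and the smart one differ sharply. Since 1023 = 1024 - 1, x*1023 = (x << 10) - x, one shift and one subtract, whereas summing one shifted copy per set bit needs nine adds. A sketch comparing the two (function names are hypothetical).

```c
#include <assert.h>
#include <stdint.h>

/* 1023 = 1024 - 1, so x*1023 = (x << 10) - x: one shift, one subtract. */
uint32_t mul1023(uint32_t x)
{
    return (x << 10) - x;
}

/* Adds needed by the plain binary method, which sums one shifted copy
   of x per set bit of c: popcount(c) - 1 adds. For 1023 (ten set bits)
   that is 9 adds, against the single SUB above. */
unsigned binary_method_adds(uint32_t c)
{
    unsigned bits = 0;
    for (; c != 0; c &= c - 1)   /* each pass clears the lowest set bit */
        bits++;
    return bits ? bits - 1 : 0;
}

/* Spot-check; unsigned arithmetic wraps modulo 2^32 on both sides. */
void check_mul1023(void)
{
    const uint32_t samples[] = { 0u, 1u, 1000u, 0xFFFFFFFFu };
    for (unsigned k = 0; k < sizeof samples / sizeof samples[0]; k++)
        assert(mul1023(samples[k]) == samples[k] * 1023u);
}
```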
Of course I see your points, e.g. crowding the COG... but the opcodes are there, to be used by the Prop II.
But you are aware that some Spin programs will run circles around C if you omit MULT/DIV from the COG?
ImageCraft said...
... I know Bill and Mike's ideas were to add multi-threading capability etc. to the Prop, but at this juncture, I want to see how fast we can push this puppy using as much COG RAM as possible for the C Virtual Machine. A multi-threading kernel can use HUB RAM, of course (slow again...). We will see...
So you will support some form of threading, right? Just wondering if there is another approach to engaging the cores other than threading (POSIX or otherwise). Will you support pthreads or an equivalent? I'm not asking for an extra feature, but without threading, a Prop running C might be just another uC. What other library support is planned?
Make this a compile-time option, perhaps along with some others. That way, given the limited COG RAM and this first-pass release, the programmer has maximum flexibility.
If adding these in COG means surrendering some other functionality, default to HUB so everything just works. Then those who want more speed can make their own choices.
deSilva said...
The Propeller has a nice barrel shifter, but the LMM overhead could counterbalance that.
Multiplication by (constant!) 10 is a much-used operation, also for psychological reasons (whenever you need a number between 8 and 12, you often choose 10).
MOV r0, x
SHL x, #2
ADD x, r0
SHL x, #1
This has some advantages with respect to operand flexibility.
Maybe your front end is smart enough to generate the shortest code sequence... e.g. for times 1023.
Of course I see your points, e.g. crowding the COG... but the opcodes are there, to be used by the Prop II.
But you are aware that some Spin programs will run circles around C if you omit MULT/DIV from the COG?
The front end doesn't generate the sequence per se; it's up to the back end to select the best code. There are several algorithms to generate (near-)optimal sequences of shifts/adds/subs, so it should not be a major problem. As for benchmarking, it's always possible to construct tests that show certain things in the best light. For example, a strcpy implemented in unrolled C will be much faster than a compact asm single-copy loop. Anyway, the goal is to make it fast enough and eliminate inefficiency where possible.
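One well-known algorithm of this kind is canonical signed digit (non-adjacent form, NAF) recoding: rewrite the constant with digits -1, 0, +1 so that the number of nonzero digits, and hence of adds/subs, is minimal among such signed-binary representations. A sketch under the assumption of 32-bit unsigned constants; the function names are hypothetical, and this is not claimed to be what ImageCraft's back end uses.

```c
#include <assert.h>
#include <stdint.h>

#define NAF_MAX_DIGITS 33   /* a 32-bit constant needs at most 33 NAF digits */

/* Non-adjacent form recoding of c: digits[k] is -1, 0 or +1,
   c == sum of digits[k] * 2^k, and no two adjacent digits are nonzero.
   Returns the digit count, least significant digit first. */
int naf_recode(uint32_t c, int8_t digits[NAF_MAX_DIGITS])
{
    uint64_t n = c;              /* 64 bits: n can briefly reach 2^32 */
    int len = 0;
    while (n != 0) {
        int8_t z = 0;
        if (n & 1u) {
            z = ((n & 3u) == 1u) ? 1 : -1;   /* z = 2 - (n mod 4) */
            n = (z > 0) ? n - 1 : n + 1;     /* n - z is a multiple of 4 */
        }
        digits[len++] = z;
        n >>= 1;
    }
    return len;
}

/* Add/sub instructions needed: one fewer than the nonzero digits. */
int naf_ops(const int8_t *digits, int len)
{
    int nonzero = 0;
    for (int k = 0; k < len; k++)
        nonzero += (digits[k] != 0);
    return nonzero > 0 ? nonzero - 1 : 0;
}

/* x * c using only shifts, adds and subtracts on the recoded digits.
   A digit at bit 32 contributes 0 modulo 2^32, and shifting a uint32_t
   by 32 is undefined, so such digits are skipped. */
uint32_t naf_mul(uint32_t x, const int8_t *digits, int len)
{
    uint32_t acc = 0;
    for (int k = 0; k < len && k < 32; k++) {
        if (digits[k] > 0)
            acc += x << k;
        else if (digits[k] < 0)
            acc -= x << k;
    }
    return acc;
}
```

For 1023 the recoding yields 2^10 - 2^0, i.e. the single shift-and-subtract. NAF only minimizes nonzero digits; it does not consider factorizations such as 10 = 5 * 2 (the form of the PASM sequence earlier in the thread), which is why back ends often search further.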
No panic, Richard... it's mainly the proposed C compiler from Essay Software, LLC that this thread was about, but thanks for the update.