I also have a Mandelbrot set display showing the (easy) parallelisation of independent point sets. I run only 4 COGs (needing one for TV, one for the main program, and one for the mouse). This occupies the beast by 5/8.
However, Mirror, your observation (made many times for other microcontrollers before) is the same as:
We underuse the Propeller by running SPIN! We all know it COULD run 60 to 100 times faster.
But this is not necessary! You never have the chance to exploit the features of a microcontroller to the fullest. Rather, you generally design your program and THEN SELECT (or build) the controller according to the needed resources (pins, flash, clocks, ADC, CAN). This is why there exist about 5000 microcontrollers in 100 families.
Starting with a given computing hardware is NEARLY ALWAYS total overkill. It must be, otherwise you would run against limits many more times than we do!
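A minimal sketch of what "parallelisation of independent point sets" can look like, written in Python rather than Spin or PASM. The function names, the interleaved row split, and the image size are illustrative assumptions, not the poster's actual program; Python threads only illustrate the partitioning here, they do not give the true concurrency of separate COGs.

```python
from concurrent.futures import ThreadPoolExecutor

def escape_count(c, max_iter=50):
    """Iterations until |z| exceeds 2 for z -> z*z + c, capped at max_iter."""
    z = 0j
    for n in range(max_iter):
        z = z * z + c
        if abs(z) > 2.0:
            return n
    return max_iter

def render_rows(worker, n_workers, width, height):
    """Compute every n_workers-th row, starting at row `worker`."""
    rows = {}
    for y in range(worker, height, n_workers):
        im = -1.0 + 2.0 * y / (height - 1)
        rows[y] = [
            escape_count(complex(-2.0 + 3.0 * x / (width - 1), im))
            for x in range(width)
        ]
    return rows

def render(width=32, height=16, n_workers=4):
    """Split the rows across workers; no worker needs another's results."""
    image = {}
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [
            pool.submit(render_rows, w, n_workers, width, height)
            for w in range(n_workers)
        ]
        for f in futures:
            image.update(f.result())
    return [image[y] for y in range(height)]
```

Because each point depends only on its own coordinate, the result is the same for any number of workers; only the wall-clock time changes.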
This is where the concept of "time sharing" comes from: Let someone else use your processor when you do not need it.
An office PC today is used 20% of the time, and the CPU idles 95% during that time. Have a look at your Windows Task Manager!
A COG CAN BE USED as a number cruncher - we appreciate that!
It can also be used - AND MUST BE - as the processor for event handling. Everybody except some hardcore Propeller fans knows that true interrupt handling is the more advanced way to do it. When you refuse that fact you have to pay for it - by idle time.
The same holds for SPIN. When you WANT an interpretive system - for good reasons! - you have to overpower the underlying hardware by an order of magnitude (or better, two).
This all makes sense. So what you describe is simply a pattern of cause and consequence, and I would not consider it a serious criticism.
Edit:
To put it more positively: The moment of truth in real-time processing is when - at the same moment - the video shift register has to be honored, a start bit arrives at the serial port, and the mouse is clicked! THEN you need 100 MIPS! Maybe for 100 µs only, but this is the difference between a working and a breaking system.
We can count ourselves lucky to have that peak power at our hands!
This thread was for Chip to maybe get some feedback, because he asked about this a while ago. Also...
There is nothing in my posts related to feasibility; everything is rather "dependent" on probability.
Feasibility only exists when you measure repeated collapsed wave functions and compare your thoughts to what is currently accepted science; it will only make you closed-minded.
Vitruvian cat is to Schrodinger's cat as possible thought is to probable thought. Schrodinger knew that probability was more important than possibility densities. And da Vinci was more concerned with possibility, not probability. That's why da Vinci was just an artist, never got his helicopter to fly, and liked to cut up dead bodies so he could draw them and call himself a doctor. Schrodinger was a real scientist; he didn't make replicas of collapsed wave functions. He was the wave function.
An electron can be on the tip of your nose, then on the moon, then back on the tip of your nose, for no apparent reason.
It's not very PROBABLE, but it is FEASIBLE, or possible; quantum can produce strange things. Be the wave function and everything is possible.
I mean, you could always do dual-port RAM using convoluted spin pairs in split bosons: not entirely feasible, but probable.
And maybe in a few years, the things I've written about earlier in this thread will be done by someone, somehow.
The thing that I've noticed reading the Prop architecture documents that struck me as worthy of fixing is the hub design. In all the designs of high performance arbiters I've seen, there is an optimization made to not waste bus cycles. In the Prop, however, it seems that if only one core is accessing main memory, 7/8ths of the potential memory bandwidth is wasted. There are a number of solutions to this problem, some more complex than others. However, if Chip chooses to fix this problem, it's critical to not create the opposite problem: allowing one core to starve the other 7.
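To make the bandwidth argument concrete, here is a toy model in Python comparing a fixed time-slot rotation with a work-conserving round-robin arbiter. It is a sketch under simplifying assumptions (every requesting cog always has a request pending, one grant per slot, hypothetical function names) and not a description of the actual hub logic.

```python
def fixed_slots(requesting, n_cogs=8, slots=800):
    """Each cog owns every n_cogs-th slot; unused slots are simply lost."""
    grants = [0] * n_cogs
    for t in range(slots):
        cog = t % n_cogs
        if cog in requesting:
            grants[cog] += 1
    return grants

def work_conserving(requesting, n_cogs=8, slots=800):
    """Round-robin that skips idle cogs, searching from just after the last grant."""
    grants = [0] * n_cogs
    last = n_cogs - 1
    for _ in range(slots):
        for step in range(1, n_cogs + 1):
            cog = (last + step) % n_cogs
            if cog in requesting:
                grants[cog] += 1
                last = cog
                break
    return grants

print(sum(fixed_slots({0})))       # 100 of 800 slots used
print(sum(work_conserving({0})))   # 800 of 800 slots used
print(work_conserving(set(range(8))))  # 100 grants per cog, nobody starves
```

Starting each search just after the last granted cog is what keeps a busy cog from starving the others in this model. The price is that access timing is no longer fixed, which matters for deterministic code.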
Also, to weigh in on the subject of this topic, I'd take more RAM over more cogs any day (well, ok, most days at least). The main reason is buffering. It's much harder to make effective use of the processor when there isn't enough RAM to do lots of software pipelining. For reference, a dual core AMD Opteron @ 2.4 GHz and 2GB of RAM graduates about 9.6 instructions per second per byte of RAM (4 instructions per clock per core). The prop graduates 4883 instructions per second per byte.
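The two ratios can be reproduced with a few lines of Python. The Propeller inputs (8 cogs at 20 MIPS each, 32 KB of hub RAM) and the use of decimal gigabytes for the Opteron are assumptions chosen because they match the quoted figures; they are not stated in the post.

```python
def instr_per_sec_per_byte(instr_per_sec, ram_bytes):
    """Peak instruction rate divided by the amount of RAM."""
    return instr_per_sec / ram_bytes

# Opteron: 2 cores * 4 instructions/clock * 2.4 GHz, over 2e9 bytes of RAM
opteron = instr_per_sec_per_byte(2 * 4 * 2.4e9, 2e9)

# Propeller (assumed): 8 cogs * 20 MIPS, over 32 KB of hub RAM
prop = instr_per_sec_per_byte(8 * 20e6, 32 * 1024)

print(round(opteron, 1))  # 9.6
print(round(prop))        # 4883
```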
Another item (and this one is simple): Expand the "cnt" register to 64 bits (spread across two registers, presumably). I have an event loop with things that are supposed to happen in the future. Each pass through the loop, I need to see if the current time is later than the time when the timeout expires. If there's any thought of bumping the clock beyond 80MHz, this would seem like almost a must. Roughly 54 seconds (2^32 ticks at 80MHz) is a really short amount of time for the counter to roll over in. The lack of unsigned comparisons in Spin makes this limitation even more annoying.
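Until a wider counter exists, one common workaround is to compare the signed 32-bit difference between the counter and the deadline instead of comparing the two values directly. The Python sketch below simulates this; the helper names are hypothetical, and the trick only holds while the deadline is less than 2^31 ticks (about 27 seconds at an assumed 80 MHz) away.

```python
MASK32 = 0xFFFFFFFF
CLKFREQ = 80_000_000  # assumed 80 MHz system clock

def rollover_seconds(clkfreq=CLKFREQ):
    """Time for a free-running 32-bit tick counter to wrap around."""
    return 2**32 / clkfreq

def to_signed32(x):
    """Interpret the low 32 bits of x as a two's-complement value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def expired(now, deadline):
    """True once now has reached deadline, even across a counter wrap."""
    return to_signed32(now - deadline) >= 0

start = 0xFFFFFF00                   # counter value close to the wrap point
deadline = (start + 0x200) & MASK32  # deadline lands after the wrap, at 0x100

print(round(rollover_seconds(), 1))   # 53.7
print(expired(0xFFFFFF80, deadline))  # False
print(expired(0x00000180, deadline))  # True
```

A plain `now > deadline` test would get both of the last two cases wrong, since 0xFFFFFF80 is numerically larger than 0x100.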
While I can live without them, interrupts would be nice as well...
On the IDE side, I'd like to not have to run Windows. Command line tools would be perfect (and more easily portable). Also, I'd like a C compiler. A lot. A whole lot. Having no high-level compiled language is a major frustration for me.
Finally, since this post has mostly been me whining, I'd like to point out that I *do* really like the Prop. It basically "just works", which is always a nice change in the computer world!
The effective instruction execution rate per used bytes is an application-specific value. The possible instruction execution rate per available bytes is an architectural parameter, showing what kind of algorithms are possible on that specific architecture. (Notwithstanding their ABSOLUTE values.)
As interesting as those numbers are, they must be corrected by "benchmark MIPS" and the consideration of primary and secondary memory. It is also most important to distinguish between "active data space" and "program space".
Look at the parameters for an SX! Processors with only 128 bytes of RAM.
But this shows clearly that the Propeller has "too much relative processor speed". This is why you can use SPIN for a lot of applications!
You can also say: "It has too little memory!" This is why you cannot simply transfer PC-concepts to the Propeller.
For a long time I too grappled with the 'fear' of 'wasting' COGs on simple tasks that 'wasted' their available processing power. But then I thought "wait a minute, I used to work with PICs". OK, they had dedicated A/D, UART, etc., BUT compared with the Prop' they're a pain to work with as a hobbyist - I found it hard to get concurrent stuff working reliably, which is why I'm happy the Prop' doesn't have interrupts!
Now; the Prop' means I don't have to worry about selecting a part with all the required dedicated bits - if I need a UART I plug-in FDSerial, similar thing for A/D. And I no longer have to worry about missing an event, because I can just have a COG dedicated to watching for it - true concurrency, YAY!
Oh; and I get those other really nice things like TV out and VGA, keyboard and mouse input too!
Yup, I know I pay a price for that flexibility, but I'm just a hobbyist. I want to get stuff done, not spend time trying to debug a PIC.
As for RAM? Well, the good old BBC micro only had 32K and look at what could be achieved with that! Sometimes it's good to face the challenge, rather than just ask for more hardware to accept the bloat...
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Cheers,
Simon
www.norfolkhelicopterclub.co.uk
You'll always have as many take-offs as landings, the trick is to be sure you can take-off again ;-)
BTW: I type as I'm thinking, so please don't take any offense at my writing style
I've been exceptionally happy using the Prop for some high bandwidth PID-like control tasks, in which I can use integer math in fast assembly to close my control loop, and dedicate cogs to interfacing with A/D and DAC units using bit-banged SPI interfaces implemented in assembly. The prop has been a joy to work with from a standpoint of implementing deterministic control system behaviors.
Right now the prop is somewhat harder to use for high-bandwidth signal processing, though. There are two principal features I'd like to see that would make DSP algorithms infinitely easier to implement. The first is a hardware integer multiplier in each cog (or possibly only in one "math" cog). Obviously the multiply results are 64 bits and therefore would require two contiguous destination registers, but I don't think this poses a problem from an instruction set standpoint (though at this point a 64-bit MOV in one instruction cycle becomes attractive!). The second feature I'd like would be a saturation mode for registers -- in the event of an arithmetic overflow I'd like the option to have the destination register go to positive or negative max (as appropriate) rather than having a potential sign reversal; this is a pretty standard feature on various DSP chips. Yes, I know I can use the carry flags to execute conditional code to handle overflows, but this obviously slows things up and/or results in fairly clunky code to maintain determinism.
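A small Python model of the two requested behaviors, for readers who have not met them on a DSP: a saturating 32-bit add next to the usual wrapping add, and a 32x32 multiply whose 64-bit product is split across two 32-bit words. The function names are illustrative only and do not correspond to any existing or planned Propeller instruction.

```python
INT32_MAX = 2**31 - 1
INT32_MIN = -2**31

def to_signed32(x):
    """Interpret the low 32 bits of x as a two's-complement value."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x & 0x80000000 else x

def wrapping_add(a, b):
    """Ordinary 32-bit add: an overflow flips the sign."""
    return to_signed32(a + b)

def saturating_add(a, b):
    """Saturation mode: an overflow clamps to positive or negative max."""
    return max(INT32_MIN, min(INT32_MAX, a + b))

def mul_32x32(a, b):
    """Signed multiply returning the (high, low) 32-bit words of the product."""
    p = (a * b) & 0xFFFFFFFFFFFFFFFF
    return (p >> 32) & 0xFFFFFFFF, p & 0xFFFFFFFF

print(wrapping_add(INT32_MAX, 1) == INT32_MIN)    # True: sign reversal
print(saturating_add(INT32_MAX, 1) == INT32_MAX)  # True: clamped
print(mul_32x32(0x10000, 0x10000))                # (1, 0)
```

In a control loop the clamped result is usually the less harmful failure mode, since the output stays pinned at the rail instead of jumping to the opposite extreme.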
Well, at least at my current knowledge level, only 1 "spaceship monitor".
In theory, each cog can drive a separate screen (and all the other junk can easily be done by a single prop & polling).
The practical problem is the memory required to represent the screen, as each screen uses a fair chunk of memory for the raster. This has been the limiting factor. As it is I have done away with the double buffering and instead rely on updating graphics only during the vertical sync cycle.
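A rough size estimate shows why the raster is the limiting factor. The resolution and color depth below (256 x 192 pixels at 2 bits per pixel) and the 32 KB hub RAM figure are assumptions for illustration; the post does not say which mode is in use.

```python
HUB_RAM_BYTES = 32 * 1024  # assumed total hub RAM

def raster_bytes(width, height, bits_per_pixel):
    """Memory needed for one full-screen bitmap."""
    return width * height * bits_per_pixel // 8

single = raster_bytes(256, 192, 2)  # assumed mode, not taken from the post

print(single)                          # 12288
print(2 * single)                      # 24576 with double buffering
print(HUB_RAM_BYTES - 2 * single)      # 8192 bytes left for everything else
print(3 * single > HUB_RAM_BYTES)      # True: a third buffer does not fit
```

Under these assumptions, dropping the second buffer frees 12 KB, which is the trade described above: less memory, but drawing has to fit into the vertical sync period.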
When Chip realized a few months ago that you can run 2 props in lock step simultaneously, it gave me the idea to run 2 props with shared I/O pins in half-step, interleaving the 2 Propellers by using a complementary oscillator as the clock. This should result in twice the MIPS for ADC/DAC jobs that a single Propeller could do.
Comments
Post Edited (deSilva) : 1/2/2008 10:25:09 PM GMT
Do you intend to stack bits?
Or how do you intend to get, say, 2 * 10 address bits + the opcode itself into 32 bits?
Nick
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Never use force, just go for a bigger hammer!
The DIY Digital-Readout for mills, lathes etc.:
YADRO
This thread is mainly about unrealistic "what would be nice to have independent of any feasibility". I just added some to that.
And it's very possible you disagree...
V/R
Mike
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
---
Jo
That's 160 MIPS instead of 80 MIPS for the simplest 4-cog interleaved I/O routine.
I asked Paul Baker and he said it should work.
Any thoughts?