Max MIPs for propeller processor
webmasterpdx
Posts: 39
Assuming an optimized application that runs with 8 processors in parallel without any waiting for the hub, what is the maximum processing speed in MIPs?
Is it one clock per instruction???
If so, that's 640MIPs for $8 !!!
I'm assuming that the app has gone through its initialization and is running all out, again optimized without any waits for the hub.
Anyone got that number????
Thanks
-Donald
Comments
That's still not bad....160MIPs is 5 times the fastest desktop PC less than 20 years ago.
160MIPs was considered in the supercomputer domain 30-40 years ago.
Compare that to the fastest 8-bit PIC, which is 16MIPs at $4. For $4-6 you can get a high end dsPIC33 which is a 32 bit processor that runs at 40MIPs.
As long as the application can be implemented in the propeller, it has a lot to offer at those speeds.
I do have one or two more questions please....
1. Does the processor have hardware multiply or divide? I noticed a 16-bit multiply in the datasheet, but I'm unsure if that's SPIN or assembler.
2. How wide are the processor internal buses. i.e. Are the instructions 32-bit or what?
Thanks very much
-Donald
Sadly there are no multiply or divide instructions. BUT some damn fast 32, 16 and 8 bit multiplies have been coded up on this forum.
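For anyone wondering what a software multiply looks like without a hardware instruction, here is a minimal shift-and-add sketch in Python rather than PASM. It is not one of the routines posted on this forum; the name mul16 and the 16x16 -> 32-bit width are assumptions chosen for illustration.

```python
MASK32 = 0xFFFFFFFF  # results are kept to 32 bits, like a cog register


def mul16(a: int, b: int) -> int:
    """Unsigned 16x16 -> 32-bit multiply using only shifts and adds.

    Hypothetical helper for illustration, not a routine from the forum.
    """
    a &= 0xFFFF
    b &= 0xFFFF
    product = 0
    for _ in range(16):          # one pass per multiplier bit
        if b & 1:                # low bit set: add the shifted multiplicand
            product = (product + a) & MASK32
        a <<= 1                  # multiplicand moves up one bit position
        b >>= 1                  # next multiplier bit moves into bit 0
    return product


print(mul16(1234, 5678))  # 7006652
```

Each pass is a test, an optional add and two shifts, which is why a software multiply costs a few dozen instructions rather than one.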
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.
The bus is 32 bits wide, instructions are 32 bits. This also is true for accessing HUB-RAM.
Maybe you didn't get the HUB-RAM vs. COG-RAM thing absolutely right. So I repeat in other words:
The HUB-RAM is common RAM that all COGs can access. To avoid concurrent accesses, a "rotating propeller" (that's where the name 'Propeller' comes from, I guess) grants access to each COG in a time-sliced manner.
Each COG has its own limited RAM where every access is at full speed without any waits. So you could copy some data from HUB-RAM to COG-RAM if you need to do intensive computation on that. As long as that copied section fits into COG-RAM together with code, of course.
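To make the time-slicing concrete, here is a rough Python model of the hub rotation. It assumes 8 cogs with a 2-clock window each (a 16-clock rotation, which is my understanding of the current chip) and ignores the fixed cost of the hub instruction itself, so treat the numbers as a sketch.

```python
COGS = 8
CLOCKS_PER_SLOT = 2                  # assumption: 2 system clocks per cog window
ROTATION = COGS * CLOCKS_PER_SLOT    # 16 clocks for one full turn of the hub


def clocks_until_window(cog_id: int, now: int) -> int:
    """Clocks a cog waits, starting at clock `now`, until its hub window opens.

    Simplified model: cog N's window opens at clock N * CLOCKS_PER_SLOT
    within every rotation.
    """
    window_start = cog_id * CLOCKS_PER_SLOT
    return (window_start - now) % ROTATION


print(clocks_until_window(3, 0))   # 6
print(clocks_until_window(0, 1))   # 15, just missed the window
```

The worst case is almost a full rotation, which is why copying a block into COG-RAM once and working on it there pays off for tight loops.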
ASM for the Prop tends to be very compact. So if 500 words sounds incredibly small, it is only very small.
Don't forget, the SPIN-interpreter fits into one single COG. And again, the SPIN code to be interpreted is outside of the COG in HUB-RAM.
For C (ICCV7Prop), it is a bit different. There is an interpreter in a COG that is loading sections of ASM-code from HUB-RAM to COG-RAM and this is executed there. So C is actually in PASM (Propeller-ASM) that is paged in to a COG. Execution is about 7 times faster (I had the same experience) but the code is about 5 times bigger compared to SPIN. Again, SPIN is interpreted, think of a P-machine and C is ASM that is paged in.
You might do a search for "LMM" (Large Memory Model) here in the forum. There's also an XMM that reads from (EE)PROM with paging to HUB-RAM. There's also an assembler (not from Parallax) that supports LMM (& XMM?) programming. But I have no experience with those; others will have to jump in if you have questions about that.
Hope that helped to clarify some things that are a bit confusing / unclear at the beginning.
Nick
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Never use force, just go for a bigger hammer!
The DIY Digital-Readout for mills, lathes etc.:
YADRO
Which reminded me that I have long ago wondered if I could implement my little audio project on the Prop.
Last time I thought about it I did not know much about the Prop so it went out of my mind.
Basically I have an algorithm coded in C that performs as a 3 way digital crossover for use in driving multi element loud speakers and sub-woofers.
For some years now a friend of mine has been using this crossover algorithm in his PC sound set up and he loves it.
Now, apart from wondering if the Prop has the horsepower to run the algorithm at a 40-odd kHz sample rate, the hard part for me is the hardware: getting high quality 16 bit sound in and out of the thing. Ideally 16 bit A/D in and two or three 16 bit D/A out.
I'm beginning to think the horse power is there.
A Prop crossover in a small box with VGA interface for crossover point setting would be so cool.
Any ideas for A/D, D/A anyone?
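A rough instruction budget helps with the horsepower question. This Python sketch assumes an 80 MHz clock, 4 clocks per PASM instruction and a 44.1 kHz sample rate; none of those figures come from the crossover code itself.

```python
def instructions_per_sample(clock_hz: int, clocks_per_instr: int,
                            sample_rate_hz: int) -> int:
    """Whole PASM instructions one cog can execute between two samples."""
    return clock_hz // (clocks_per_instr * sample_rate_hz)


# Assumed figures: 80 MHz system clock, 4 clocks per instruction, 44.1 kHz audio.
budget = instructions_per_sample(80_000_000, 4, 44_100)
print(budget)  # 453
```

That is roughly 450 instructions per sample per cog under these assumptions; splitting the three bands across cogs would give each band its own budget. Whether it is enough depends on the filter and on how the multiplies are done in software.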
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.
Post Edited (heater) : 8/31/2009 9:04:19 AM GMT
I would be interested in the Xover project too. Years of listening to £1000+ speakers leaves you very disappointed with 99.99% of commercial speakers. I have not made any recently, as just as they were getting to sound reasonable SHE objected to size and ugliness (woodwork, I think), and so a compromise was struck and they still need tweaking. A configurable Xover may be the answer, even for old ears with tinnitus.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Style and grace : Nil point
The dsPIC is actually a 16-bit processor with a DSP engine. The PIC32 is 32-bit, and runs at 80 MIPS.
Leon
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Amateur radio callsign: G1HSM
Suzuki SV1000S motorcycle
As Spin is both interpreted and makes heavy use of external memory access, it is much slower than pasm; I believe I saw references claiming that Spin is approximately 40x slower than assembly.
In short, like any hardware, there is a difference between native mips performance and actual throughput which is often determined by coding efficiency.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Need some suggestions/advice about selection of audio quality A/D and D/A that can
a) Be bolted to the Prop easily.
c) Are easy/fast for the Prop to drive.
d) Don't cost arms and legs.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Leon
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Amateur radio callsign: G1HSM
Suzuki SV1000S motorcycle
Post Edited (Leon) : 8/31/2009 12:13:07 PM GMT
humanoido
Millions of instructions per second... but what is an instruction?
For an X86, a single instruction can do some really complex stuff.
When I was studying super-computers (many years ago now) the preferred measurement was MFLOPS. But even that is not very useful.
Better is application specific benchmarks.
You might consider the AD1871 ADC. You get 2 channels ADC on one chip.
www.analog.com/en/analog-to-digital-converters/audio-ad-converters/ad1871/products/product.html
It needs an external clock but you might be able to drive it from the prop counters. I can share code with you for reading from it in its default mode written in PASM. Cost is ~$5 in quantity one (which gets you 2 channels of 24-bit, if you believe the specs).
PM me if you are interested.
Leon
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Amateur radio callsign: G1HSM
Suzuki SV1000S motorcycle
Coders do not live by mips alone!
The prop is more powerful than the mips rating would suggest.
The relative slowness of interpreted Spin (relative to pasm) made me wince a bit at first.
But then I realized that with 8 cogs I could use one to run critical timing code in pasm
and others to run parts of the program that did not really need speed, in Spin. Every
program has code that would be running fast enough if it was coded in PICBASIC for an old 8bit PIC.
So what if a cog or two are not running the code at max speed in pasm but instead in slow Spin?
It just doesn't make any real difference....well, actually it does, you save a lot of time by
using Spin when it suffices. With 8 cogs it's wasteful to spend any more programming time
than is absolutely necessary writing asm code...wasting time wastes $ and $ is almost
always the most important consideration. Spin is inside the beast to allow us to take "the easy
way out" and still get the job done.
When you program a single processor, even if it is a fast ARM, always foremost is
the reality that any part of the program that is not optimized is slowing down the whole
thing and could end up making a mess of critical timing....so you churn out the tightest
C and asm you can manage. The code inside those interrupt routines has got to
be made faster, faster and still faster or the whole thing will go down in flames.
I do admit that at times I still cringe at the thought of that interpreter running my code.
I just can't entirely shake the feeling that I have cheated a bit. Multiple processors
have done this to me! Chip is an evil genius, he knew the prop would bring us to
this disgraceful condition.
When you are in big doodoo is when your application is so close to the limits of the prop
chip that you need to code everything in pasm and tweak, tweak, tweak to fit it all in
and make it fast enough. I have not come anywhere close to that yet...maybe the prop2 will
come along before I have to face it.
If someone told me a year ago I'd be writing code that would run on an interpreter I'd have
called them crazy... On the AVR and ARM even compiled C was a compromise, asm was how the real coders
created magic. But plodding along in Spin on a cog in a prop chip is just fine....you have more cogs!
Way back in HS we had a science fair, I was programming PICs (insert scream here) and created a project
that could unlock a door by tapping the metal doorknob with your finger in a coded pattern.
I simulated a door lock by using a homemade solenoid to move a large nail in and out of
a hole in a tiny homemade wooden door. (it was embarrassingly crude and noisy) I coded it
in PIC asm and was very proud of it.
Someone else had a project that used what I now recognize as a basic stamp. I inquired
about it and poked a bit of fun at them because it used basic and was interpreted and SLOW.
But that project beat out mine at the end of the day. The little cpu was fast enough to do
the job and it was so easy to use anyone could work with it.... The extra points I felt
were due me because I slaved over that nasty PIC asm were not at all obvious to the 2 teachers
that judged the fair. I shoulda used a stamp and spent the extra time working on that noisy
solenoid and ugly little door.
About the mips on the prop, remember that you go from 160 to 200 mips theoretical maximum by just using
one of those 6.25 MHz xtals in place of the regular xtal. That extra 40 mips might save you having to write some
pasm someday.
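The arithmetic behind those figures, as a small Python sketch. It assumes the regular crystal is 5 MHz, the PLL multiplies by 16, each instruction takes 4 clocks and all 8 cogs run PASM flat out with no hub waits.

```python
def peak_mips(xtal_mhz: float, pll: int = 16,
              clocks_per_instr: int = 4, cogs: int = 8) -> float:
    """Theoretical peak MIPS for the whole chip under the stated assumptions."""
    system_clock_mhz = xtal_mhz * pll            # e.g. 5 MHz * 16 = 80 MHz
    per_cog = system_clock_mhz / clocks_per_instr
    return per_cog * cogs


print(peak_mips(5.0))    # 160.0
print(peak_mips(6.25))   # 200.0
```

The same function with clocks_per_instr=1 gives the 640 figure from the first post, which is where the factor of four between 640 and 160 comes from.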
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
- Some mornings I wake up cranky.....but usually I just let him sleep -
1. It has a two-register instruction architecture. This means any memory location can be a data source, and any location can be a destination. There's no accumulator bottleneck, and operations are 32 bits wide.
2. Any instruction can be executed conditionally. You don't have to jump around instructions (or small groups of them) that don't need to be executed.
3. Any instruction for which zero and carry results have meaning can selectively write the zero and carry flags, or not.
4. Any non-hub instruction can selectively write its destination register, or not.
5. There's no pipeline penalty for jumps (only for conditional jumps that aren't taken).
6. Shift instructions use barrel shifters. This means any size shift can be accomplished in one instruction cycle.
7. The mux instructions make quick work of bit setting and clearing.
8. Several multiple-effect instructions (e.g. cmpsub, addabs, etc.) optimize common operations that would take more instructions on another processor.
9. Timing granularity for the wait instructions is one clock, not one instruction cycle.
In many ways, the Propeller instruction set resembles microcode. There's a lot that gets done, selectively and/or conditionally, during each instruction cycle.
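As an illustration of point 8, here is a Python emulation of a shift-and-subtract divide loop built on cmpsub and rcl. The helper names and the 32-by-16-bit layout are assumptions for the sketch; the point is only that compare, conditional subtract and carry all happen in one instruction per step.

```python
MASK32 = 0xFFFFFFFF


def cmpsub(d: int, s: int):
    """Emulate cmpsub with wc: if d >= s (unsigned), subtract and set carry."""
    if d >= s:
        return (d - s) & MASK32, 1
    return d, 0


def rcl1(d: int, carry: int) -> int:
    """Emulate rcl d,#1: shift left one bit, carry enters at bit 0."""
    return ((d << 1) | carry) & MASK32


def divide(x: int, y: int):
    """Divide x by a nonzero 16-bit y; the quotient must fit in 16 bits.

    Hypothetical sketch. Returns (quotient, remainder).
    """
    y = (y << 15) & MASK32       # line the divisor up with the top of x
    for _ in range(16):          # one quotient bit per pass
        x, c = cmpsub(x, y)      # compare and conditionally subtract
        x = rcl1(x, c)           # shift the new quotient bit into x
    return x & 0xFFFF, x >> 16   # quotient in the low half, remainder in the high half


print(divide(100, 7))  # (14, 2)
```

On a processor without a combined compare-and-subtract, each pass would need a compare, a branch and a subtract before the shift.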
-Phil
Post Edited (Phil Pilgrim (PhiPi)) : 8/31/2009 6:48:40 PM GMT
(And no I do not know how to spell)
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Style and grace : Nil point
I'm going to copy those points and stick the txt file in my prop directory....
I squirrel away everything I find useful about the prop....that way I don't have
to come here and search for it later on. It's helpful to do when you are a noob like me.
PhilPilgrimsPropPoints.txt
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
- Some mornings I wake up cranky.....but usually I just let him sleep -
I actually hate blue LEDs....they seem too bright and harsh.
I understand they were once quite rare? Now they seem to be overused.
My fav LEDs are RGB so I can select from a large range of colors.
Green I like best of standard colors.
1 Green
2 Yellow
3 Orange
4 red
5 blue
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
- Some mornings I wake up cranky.....but usually I just let him sleep -
As has been pointed out elsewhere, the jobs that we ask a microcontroller to do, are usually jobs that it doesn't make sense to throw a more complex CPU at.
e.g.
Price
Simplicity
"Everything embedded" (give or take an external flash)
I/O functionality (PIN stickiness, ability to do ADC, DAC, signal edge detection, special functions etc, etc)
Instruction predictability (watch the PROP output to VGA and tell me this isn't important :) )
As I check my 'bit' drawer, I have lots of different types of microcontrollers I constantly use. I have a few old generation X86 chips I've scavenged, but I doubt I'll ever use them. Now that said, I would not use a microcontroller to build a compute intensive supercomputer.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Post Edited (Agent420) : 8/31/2009 7:22:19 PM GMT
But there are times you want to run a very low power controller just to watch and wait for something to
happen, you can have it sleeping most of the time and wake it several times a second to see if it has happened.
And sometimes once that something does happen then the lowly controller is not capable of dealing with it
at all so you have it wake up a multi Bips machine to do the work, store the data and then turn back off
and hand the lookout chore back to the controller. Both are needed sometimes...the little $2 controller
keeps the big number cruncher from staying on all the time and killing the battery....he's the tireless little
lookout that makes it all possible sometimes.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
- Some mornings I wake up cranky.....but usually I just let him sleep -
@Agent
Nice clock!
Only thing it seems to be missing is a big red lever on the side :)
- H
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
This happens in a PC today, known as "wake on hardware". E.g. when a network packet arrives. A low power device (Microcontroller, PLA, ASIC, etc) identifies an external event and wakes up the power hungry PC in response.
I have a general saying when mentoring "right tool for the right job". Goes for both hardware and software.
Woohoo!!! Time Machine! Can you set that to 1977 and Press Enter?
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
--Steve
Propeller Tools
I built a Nixie clock in about 1973. All of a sudden Nixie tubes are as rare as hen's teeth.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.