What I want to know is when will the Prop II launch expo/conference/party be?
I need to arrange holiday time, finance, permission etc. so that I can be there. This is a big deal from Finland.
I don't want to look impatient, I don't. But the Prop 2 Prelim site says "October 22, 2010" as the last update time. Surely these specifications can be updated to some extent. If there really is no more news than what you knew in October, over a year and a half ago, then change the date on that page to show today's date, that way we know the information is still up-to-date.
ROM doesn't need to support qSPI, since it only needs to load 2KB of data before it switches to the boot loader. Once it has executed the boot loader, that code can kick the SPI into high gear, or boot from SD, or any of a myriad of devices, since the boot loader isn't part of the chip.
Last I spoke to Chip, his desire was to have FLASH programming, authentication, bootloader execution, and perhaps handover from variable pulsewidth serial comms to standard RS232 type comms. We discussed allowing the host computer to send a CLKFREQ, CLKMODE, and baud rate values to the ROM code, enabling the chip to switch to PLL and RS232 comms, so memory downloads are a lot faster. This could be achieved with a little more work by sending a bootstrap to the ROM, which then switches to PLL and RS232 mode.
What little or maximum could be implemented depends on how much he can pack into the ROM. The static footprint of the SHA-256 code was 196 longs of PASM, but should be trimmed by 12 longs or more with P2ASM instructions.
If he can use the CLUT or hub RAM for buffers, he should be able to trim it more. Even so, that still leaves him 250+ longs to write the rest of the bootloader, and I think that gives him a lot of room to tinker. I don't know if he has written the ROM code for the shuttle run; he seemed to desire that, but he may have just gone with the generic loader he originally planned for the shuttle.
I haven't heard from him in a couple of months, I assume he is hibernating and hasn't come back up for air.
I don't want to look impatient, I don't. But the Prop 2 Prelim site says "October 22, 2010" as the last update time.
Are you referring to the Propeller Wikispaces page? That's ancient and has a fair bit of hearsay and wishful thinking on it. I kinda wish it were just deleted, or made to point to the official Propeller II page.
This thread is absolutely amazing. I found it last night, and kept telling myself, "just one more screen, then I'm going to bed." After several screens of that, I gave up, and finally went to bed. This whole process is fascinating. If I were a better writer, I'd turn the whole thread/subthreads into a book documenting from end-to-end the production of the whole Prop2 board. I had some nice ideas for what to call it, but those left me when I fell asleep at the keyboard reading these posts.
I finished the thread this morning, and have just been saying "wow!" I'm absolutely amazed by the entire process, and by how much information is being provided on the forums. I've still got several other threads to wander through to see what other amazing things I can pick up about the Prop2, but this thread was an excellent primer. I'm now anxiously awaiting its release, perhaps in time for me to purchase myself a nice Christmas present (or maybe Chanukkah (real spelling, not that Americanized one most folks use) :-).
Anyway, thank you, Parallax, for giving us such a detailed glimpse into the inner workings of how things are done. Even having read it on the forums, I sure would love to see it in a book after the fact. This is just so fascinating. Anyone know of any books that document such things that I may be able to obtain in DRM-free electronic format?
This stuff is just so cool.
I'm definitely joining the queue for purchasing some of these when they come out. What to do with them, I don't know yet, but I'm sure I'll find something.
The best account of the development of a new computer is probably Tracy Kidder's The Soul of a New Machine.
That book was a great read.
I think softcon may have a point though. A writer with as much talent as Tracy Kidder could probably make the development of the propeller ( 1, 2, or both ) as good or better.
Ahh, yes, I have that book, haven't delved into it too much though, started reading it last year before our move, and I think it's in storage back in Alabama. Have to get down there sometime this summer, and get it out.
(or get me another copy) I have it in braille from the library for the blind, so loan time isn't an issue, but still, I think they're probably wondering where it is.
4 stage pipeline. What does that mean? I thought single-cycle implied no pipeline.
The processor has to do the same amount of work.
So each instruction needs several steps.
The pipeline allows those steps to be done in parallel.
So after loading the pipeline, when the first result appears - now on each clock cycle another result pops out.
So every cycle a result appears, even if processing each takes 4 cycles.
Of course branching needs some special handling, since the processor has to deal with reloading the pipeline.
But for this there exist some intelligent approaches to minimize the effects.
The Prop's special handling of conditional execution (compared to other MCUs) is one of them.
There is a description somewhere in the docs.
MJB
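To make the "one result per clock once the pipeline is full" point concrete, here is a minimal Python sketch. It assumes an idealized pipeline with no stalls or branches; the function names are made up for illustration and are not from any Parallax document.

```python
def cycles_unpipelined(n_instructions: int, stages: int = 4) -> int:
    """Each instruction finishes all of its stages before the next one starts."""
    return n_instructions * stages


def cycles_pipelined(n_instructions: int, stages: int = 4) -> int:
    """Stages overlap: pay the fill latency once, then retire one result per clock."""
    if n_instructions == 0:
        return 0
    return stages + (n_instructions - 1)


if __name__ == "__main__":
    # Compare both models for a few program lengths (idealized, no branches).
    for n in (1, 4, 100):
        print(n, cycles_unpipelined(n), cycles_pipelined(n))
```

A single instruction still takes 4 clocks either way; the gain only shows up once several instructions are in flight.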
Are there any updates on the schedule? Has the final transistor count been determined? I'm ready to collect my prize from the contest that was started 2 months ago.
Is there an update to the P2 spec that was published several months ago? Quite a few details have been discussed since the last update, and it would be nice to have the details in one document.
Notice how the add to the pointer is outside of the loop? That's because it's a djnzd instruction, meaning that the instruction after the jump is also executed. The new Prop chip will also have pointer registers, so the addition to the address during the loop won't be needed either.
The next 2 instructions after a delayed jump will be executed, so you would need a NOP in your example or some other useful instruction. The loop would look like this:
I realized we're just talking about pipeline delays and not hub access windows, but P2's hub access window repeats every 8 cycles, so in this case a non-delayed djnz would run the loop as fast as the delayed djnzd instruction. So you could also do
GCC automatically handles pipeline issues like this. It's been done before on other architectures. The GCC compiler will generate code that has this form.
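To put rough numbers on the hub-window point a couple of posts up, here is a small Python sketch. The 8-clock window is from that post; everything else is an assumption for illustration: one clock per instruction, a single hub access per iteration, and a 3-clock refill penalty for the non-delayed jump. The function name is hypothetical.

```python
HUB_WINDOW = 8  # clocks between hub access slots for a cog, per the post above


def loop_iteration_clocks(body_clocks: int, jump_penalty: int = 0,
                          window: int = HUB_WINDOW) -> int:
    """Clocks per loop iteration when the loop contains one hub access.

    The hub instruction has to wait for the cog's next window, so the loop
    period rounds up to a whole number of windows.
    """
    raw = body_clocks + jump_penalty
    windows_needed = -(-raw // window)  # integer ceiling division
    return windows_needed * window


if __name__ == "__main__":
    # Assumed 4-instruction loop: delayed jump (no penalty) vs. plain jump
    # with an assumed 3-clock pipeline refill.
    print(loop_iteration_clocks(4, jump_penalty=0))
    print(loop_iteration_clocks(4, jump_penalty=3))
```

Under these assumptions both loops land on the same 8-clock period, which is the point being made: the delayed jump only pays off once the loop body plus refill would spill past a hub window.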
so, I understand pipelining. we've been going over that in detail and designing our own forwarding tables and other stuff in the class i'm taking right now. what I don't understand is why one would specify "single cycle" and "pipelined" at the same time. doesn't pipeline imply that every instruction takes a single cycle through each stage of the pipeline? and then, the part that confuses me, doesn't "single cycle" imply an entire instruction runs - fetches & decodes & etc - in a total of one clock cycle?
Single cycle instructions means that for each cycle a new instruction is decoded and executed, but due to the pipeline, the instruction that is being decoded this cycle won't be executed until 4 cycles later.
The 4 stage pipeline that exists in many microcontrollers still exists in the P2, but each stage in the pipeline happens in parallel with the other stages. This is in contrast to the P1, in which all stages in the pipeline must be completed before the next instruction is decoded, which is why each instruction takes 4 cycles.
Microprocessors do a similar process, but their pipeline is much longer (the new Intel Ivy Bridge has 14 stages). As long as the pipeline remains full, each instruction is "single cycle" (a new instruction gets decoded every cycle), but when the pipeline gets flushed for one reason or another, it takes 14 cycles before the next instruction is completed.
"Single cycle" means a throughput of one instruction per clock tick. I don't know of any micros that can fetch an instruction and all of its operands then produce a result in one clock cycle.
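As a rough illustration of how flushes eat into that one-instruction-per-tick throughput, here is a Python sketch of a very simplified model. It assumes every flush costs `stages - 1` extra clocks (the refill time described above, minus the one clock the instruction would have taken anyway) and that flushes are independent of each other. The flush rates are made-up inputs, not measurements of any real chip.

```python
def effective_cpi(stages: int, flush_rate: float) -> float:
    """Average clocks per instruction when a fraction of instructions flush.

    Simplified model: an undisturbed instruction costs 1 clock; each flush
    adds (stages - 1) bubble clocks while the pipeline refills.
    """
    refill_penalty = stages - 1
    return 1.0 + flush_rate * refill_penalty


if __name__ == "__main__":
    # Assumed 10% of instructions cause a flush, for three pipeline depths.
    for depth in (4, 14, 31):
        print(depth, effective_cpi(depth, 0.10))
```

With no flushes every depth gives a CPI of exactly 1, which is why "single cycle" and "pipelined" are not in conflict.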
FWIW, one of the newer Intel chips I was reading about has a *31* stage pipeline!
In contrast, 4 stages isn't that bad. Imagine having to write optimized assembler for a 31-stage pipeline. I would imagine that in many circumstances the pipeline is bigger than the chunk of code you want to run, so it gets wasted with a bunch of stalls.
Chip has been actively working on things up until just Saturday. He told me about a new, crucial, addition to the counters. I also made a suggestion to him about the e-fuses, which may make it into the die, or not -- depending on the hit it would make on synthesis timeline.
Yeah, the Prescott Pentium 4s had a 31-stage pipeline. At about the same time AMD had their Athlon XPs with only 10 stages. Because optimization matters so much more on the longer pipeline, AMD had major performance gains over Intel in many less-predictable applications, and that is why AMD had a boom in market share for a few years. I was surprised that the newest chips have such short pipelines (I always thought Intel maintained the raw-clock-speed-trumps-short-pipelines mentality, but I guess they learned).
Hopefully the new 31-stager has some other very impressive features.
He told me about a new, crucial, addition to the counters.
Cool. Any more details ??
It would be really great if the Counters got just a teensy bit smarter, like hardware edge capture, and a Shift option added to Count Up / Count Down.
There are spare bits in the Counter control registers to steer these functions, so it can be backward compatible in code and memory.
Some way to cascade counters internally would be nice, too. Another nice feature would be stateful logic filtering, e.g. count on a rising edge on input A iff input B is high. More than two counters per cog would really be nice. I've had programs where I've had to start extra cogs, just because I ran out of counters, not because I needed the extra processing. Counters are the bread and butter for so much of the programming I do that any and all additional features will surely be welcome.
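For anyone unsure what the "count on a rising edge on input A iff input B is high" mode would do, here is a minimal software model in Python. It is only a sketch of the requested behaviour on sampled 0/1 values, not how the Propeller counters are implemented, and the function name is made up.

```python
def count_gated_rising_edges(a_samples, b_samples):
    """Count 0 -> 1 transitions on A that occur while B is high.

    Both inputs are equal-length sequences of 0/1 samples taken on the
    same clock. The first sample only sets the starting level of A.
    """
    if not a_samples:
        return 0
    count = 0
    prev_a = a_samples[0]
    for a, b in zip(a_samples[1:], b_samples[1:]):
        if prev_a == 0 and a == 1 and b == 1:
            count += 1
        prev_a = a
    return count


if __name__ == "__main__":
    a = [0, 1, 0, 1, 0, 1]
    b = [0, 1, 1, 0, 0, 1]
    # A rises at samples 1, 3 and 5; B is high only at samples 1 and 5.
    print(count_gated_rising_edges(a, b))
```

Doing this in software costs a cog and limits the edge rate to the sampling loop, which is the argument for having it in the counter hardware.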
Atmel's AVR claims this: "131 powerful instructions, most single clock cycle execution", and they do not mention the word pipeline.
The AVR uses a two-stage pipeline with separate fetch and execute cycles. The reason it only requires two cycles is that it uses a Harvard architecture in which the program and data spaces are separate and can be accessed simultaneously -- unlike the Prop I, which has a von Neumann architecture and single-ported RAM. The Prop II will get a 4x speed boost by using multi-ported RAM that supports simultaneous access from several "stations" in its pipeline.
Another nice feature would be stateful logic filtering, e.g. count on a rising edge on input A iff input B is high.
- and add a Common Flag/Bit enable - we have some instances where atomic Enable control of two counters Capture would be very useful.
Without the atomic bit, it can be very hard to guarantee no aperture effects.
Comments
http://ww1.microchip.com/downloads/en/DeviceDoc/22288A.pdf
I'd say unlikely, as that is a SPI master, so it does not look like a SPI memory.
A better question is whether the ROM will support boot and callable fast routines for block reads from QuadSPI or QuadDDR SPI.
No word on how the silicon is looking yet? The month of May is now behind us.
Don't go by some HTML in a corner; look at the latest file updates.
This one
http://www.parallaxsemiconductor.com/sites/default/files/parallax/Propeller2DetailedPreliminaryFeatureList-v2.0.pdf
dates from 6 March 2012.
Thanks!!! This is great news and far more detailed than the HTML page! 200 MHz and Single cycle instructions.... wow.... 8O 8O
Btw, the link I referred to in my last post is this: http://www.parallax.com/Propeller2FeatureList/tabid/898/Default.aspx
The PDF linked above says single-cycle instructions. The webpage that goes with it (http://www.parallaxsemiconductor.com/Products/propeller2specs) says 4 stage pipeline. What does that mean? I thought single-cycle implied no pipeline.
So in a delay slot instruction the pipeline is not flushed during the branch. E.g. you write code that looks like this:
Notice how the add to the pointer is outside of the loop? That's because it's a djnzd instruction, meaning that the instruction after the jump is also executed. The new Prop chip will also have pointer registers, so the addition to the address during the loop won't be needed either.
Or someone could use the REPD opcode instead ?
is this also valid ?
To me, this type of construct really shows the need for a smarter assembler.
It should be possible for a user to code what they intend, and the assembler to pipeline-shuffle as needed / as legal.
ie I would prefer to code something like
and have the assembler manage any pipeline/nop dances. ( of course, showing what it did in the LST file)
I also like High Level Assemblers, that remove some of the label-drudgery, but not the opcode access.
With a Prop 2, this gets more important.
Here it is clear what the code should do, so the assembler can consider REPD, DJNZD, or any other loop construct it likes.
For complete user control, a REPEAT could include a hint, like REPEAT_REPD, or REPEAT_DJNZD ?
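As a rough idea of what such an assembler pass might do, here is a Python sketch that expands a loop body into a DJNZD loop, moving trailing instructions into the delay slots and padding with NOPs. Everything here is hypothetical: the `expand_repeat` name, the instruction text format, the `djnzd reg, #label` spelling, and the default of 2 delay slots (taken from the figure quoted earlier in this thread). It only checks that a moved instruction does not mention the loop counter; a real tool would need proper dependency analysis.

```python
import re


def expand_repeat(count_reg, body, delay_slots=2, label="loop"):
    """Expand a counted loop into a delayed-jump loop.

    body is a list of instruction strings executed once per iteration.
    Trailing instructions that do not reference count_reg are moved after
    the delayed jump, where they still execute every iteration; any
    unfilled delay slots are padded with NOPs.
    """
    head = list(body)
    moved = []
    while head and len(moved) < delay_slots:
        tokens = re.findall(r"\w+", head[-1])
        if count_reg in tokens:
            break  # unsafe to move past the decrement, so stop filling
        moved.insert(0, head.pop())
    slots = moved + ["nop"] * (delay_slots - len(moved))
    return [label] + head + [f"djnzd {count_reg}, #{label}"] + slots


if __name__ == "__main__":
    listing = expand_repeat("count", ["rdlong x, ptr", "add sum, x", "add ptr, #4"])
    print("\n".join(listing))  # the pass should also show this in the LST file
```

A hint such as REPEAT_DJNZD could simply select which expansion function gets called.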
Thanks,
-Phil
Atmel's AVR claims this : 131 powerful instructions – most single clock cycle execution
and they do not mention the word pipeline.
The single-cycle opcodes fetch the opcode and register operands, and store the result.
Of course, they may use both edges of the clock to do this, so it becomes something of a semantics exercise.
-Phil
-Phil
Of course, but that is less likely.
A set of Three would allow 3 phase topologies from one COG.