Propeller II
dr hydra
Posts: 212
It is the start of August... Any updates on the Propeller II? Does it exist in chip form? Has batch testing started? When can I buy them? :)
Comments
no,
when they show up in the new product listing
It WILL be worth the wait!
Don't hold your breath. And if it's for me, an alternative gift will do just fine.
-Phil
:cool:
I once had more employees than Parallax... but our output was about 1/100th. I don't know how they do it, but they do. In the meantime... all kinds of opportunities left on the Prop-1.
No, it wouldn't. Not with 100% registered outputs -- at least for the kind of stuff I'd want to use it for.
-Phil
How much faster than an 80 MHz Prop-1 would a PropII be for most operations?... 2 1/2. Is there or is there not an ADC built into every pin? How much more program space? How many more pins?
Can you honestly say that if a 50 MHz PropII was available tomorrow, you wouldn't jump at it?
-Rich
I am really curious how they stack the DDR RAM on top of the ARM chip on the Raspberry Pi. It would make a great way to attach extra hub RAM on a later P2. Of course, Parallax would have to make the SRAM chip too, but I think that process is well understood anyway. My guess is that we could get well over a few MB on this chip, at full hub speed.
In the Prop I, the outputs are not registered. With a counter in PLL mode, the edges are asynchronous to the system clock. In the Prop II, all outputs are registered, including PLL outputs. A 20ns timing increment is completely inadequate for RF applications. And that's what I'm interested in. So, no: 50 MHz is not an adequate speed.
-Phil
You know I like you... and you know that I want you to have whatever you want. BUT I also know that if Chip put out a 50 MHz PropII... you would buy 4 of them and add your own asynchronous timing.
ok... so you probably wouldn't have to buy them... but you know what I mean:)
Rich
Once timing is synchronized you can't "add" asynchronous timing.
BTW, I have no idea why you like me, since we've never met, and I have a tendency to dog your posts with critical rejoinders.
-Phil
I think I said this in a post a ways back.
I love conversation and don't think of your comments as dogging at all.
With regard to the synchronization, what I thought you were referring to is the benefit of interleaving cogs to gain resolution in the time domain.
This could be done with multiple prop II's with an overall improvement.
What I wasn't thinking of is that you may very well have projects in mind that simply could not compete in the market if you had to use additional hardware over the PropII.
Regards
Rich
'Spoken like a man who would sooner invest in Procter and Gamble than Facebook -- and never split face cards!
-Phil
Is that a recent number, post-layout, or was he talking about the FPGA speed being limited to 50 MHz?
That seems a large drop from the simulated speeds if he was talking about a real device. This is not 1995; simulation should be more accurate than that in 2012.
I am currently working on getting the ROM booter code done. It loads (from half-duplex serial or SPI flash) an authenticated (SHA-256/HMAC) secondary booter, which can handle full-chip loading with faster clocking and AES-128 decryption using the keys that get passed to it. Those keys reside in special one-time-programmable fuse bits on the chip. I think these last two things (final die artwork and ROM bit pattern) will come together at about the same time, and within three weeks we'll have final GDS2 data to send off to the fab. Then, three weeks later, we'll see if it works.
That is the state of things. I'm really pleased with how things are turning out. I just hope it works on the first try. By the way, the final clock speed is 160MHz and although all digital outputs are registered, the video generator clocks the DACs at whatever frequency you program the PLL to, so you are not stuck with 6.25ns (160MHz) granularity, in all cases.
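As a rough illustration of the authentication step described for the ROM booter, here is a minimal Python sketch of checking an image against an HMAC-SHA-256 tag. The key, the image bytes, and the `authenticate` helper are all hypothetical; the real booter is PASM in ROM and its exact message layout is not given in this thread.

```python
import hashlib
import hmac


def authenticate(image: bytes, tag: bytes, key: bytes) -> bool:
    """Return True if tag is the HMAC-SHA-256 of image under key."""
    expected = hmac.new(key, image, hashlib.sha256).digest()
    # compare_digest avoids leaking how many leading bytes matched.
    return hmac.compare_digest(expected, tag)


# Hypothetical stand-ins for the fuse-bit key and the second-stage booter.
key = bytes(range(16))
image = b"second-stage booter image (placeholder bytes)"
tag = hmac.new(key, image, hashlib.sha256).digest()

print(authenticate(image, tag, key))            # genuine image
print(authenticate(image + b"\x00", tag, key))  # tampered image
```

The point of the sketch is only that the loader is accepted when the recomputed tag matches and rejected when a single byte differs.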
Is that 160MHz based on post-route physical simulations ?
Is that a margined 160MHz (ie a spec value, corner case instance ), or a 'typical' ?
What about input (pin read) granularity? Is that fixed at 6.25ns?
That's good news, Chip. The registered outputs had me concerned for the sake of RF apps -- regardless of the clock speed.
-Phil
Yes, post-route, worst-case (slow-slow process corner, 1.62V, and 125C). We could get it to go 180MHz in this fashion, but we are climbing the wall with power consumption. Better to back off a bit and cut down on the buffering needed to make timing.
Pin inputs are registered to the main clock, so 6.25ns is it at 160MHz.
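The granularity figures quoted in this thread are just the clock period, 1/f. A small Python sketch of that arithmetic (the `period_ns` helper is only for illustration):

```python
def period_ns(freq_hz: float) -> float:
    """Return the length of one clock period in nanoseconds."""
    return 1e9 / freq_hz


# Clock speeds discussed in the thread so far.
for mhz in (50, 160):
    print(f"{mhz} MHz -> {period_ns(mhz * 1e6):.2f} ns")
```

With registered pins, edges can only land on multiples of that period, which is why 50 MHz gives a 20 ns step and 160 MHz gives 6.25 ns.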
You mean 6.25ns at 160MHz, right?
Chip converted my P1 PASM SHA-256 code into P2 PASM and lost a bunch of weight
I think the instruction count dropped by 30+ instructions by just having the advantage of the new instruction set.
The count for 1 block was 4742 instructions on the Prop1; a quick static analysis puts it at 3392 instructions on a Prop2. That makes the Prop2 instruction set roughly 30% denser than the Prop1 in real world applications.
Take 20,000,000 / 4742 = 4217 blocks per second on the Prop 1 and 160,000,000 / 3392 = 47169 blocks per second on the Prop 2. That's better than 1 order of magnitude faster, for a chip that is only 8x faster through standard architectural improvements!
Since the code is run in RCFAST, that works out to 5896 blocks per second on the Prop 2 in boot mode. Given it only hashes 64 total blocks to generate the HMAC, that is about 10.9 milliseconds to authenticate the second stage bootloader.
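The throughput figures above can be reproduced with a few lines of Python. The instruction counts and the 20,000,000 and 160,000,000 instructions-per-second rates come from the post; the RCFAST rate of 20,000,000 instructions per second is an assumption inferred from the 5896 figure (5896 x 3392 is roughly 20,000,000).

```python
P1_INSTR_PER_BLOCK = 4742   # Prop 1 instructions per SHA-256 block
P2_INSTR_PER_BLOCK = 3392   # Prop 2 instructions per SHA-256 block

P1_INSTR_PER_SEC = 20_000_000       # figure used in the post for Prop 1
P2_INSTR_PER_SEC = 160_000_000      # figure used in the post for Prop 2
RCFAST_INSTR_PER_SEC = 20_000_000   # assumption, inferred from 5896 blocks/s

HMAC_BLOCKS = 64  # blocks hashed to generate the HMAC


def blocks_per_second(instr_per_sec: int, instr_per_block: int) -> int:
    """Whole SHA-256 blocks hashed per second."""
    return instr_per_sec // instr_per_block


p1_rate = blocks_per_second(P1_INSTR_PER_SEC, P1_INSTR_PER_BLOCK)
p2_rate = blocks_per_second(P2_INSTR_PER_SEC, P2_INSTR_PER_BLOCK)
boot_rate = blocks_per_second(RCFAST_INSTR_PER_SEC, P2_INSTR_PER_BLOCK)
boot_ms = 1000 * HMAC_BLOCKS * P2_INSTR_PER_BLOCK / RCFAST_INSTR_PER_SEC

print(p1_rate, p2_rate, boot_rate)
print(f"speed-up: {p2_rate / p1_rate:.1f}x, boot authentication: {boot_ms:.2f} ms")
```

The ratio comes out a little over 11x, which is where the "better than 1 order of magnitude" remark comes from.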
The main reason to overclock is to have an orthogonal clock ratio of 5ns per interval. This makes timing arithmetic trivially easy.
Based on P1, my guess is that P2 will be happy with 200MHz at least. I don't want to go to the edge, just proceed towards it.
Nice work, Chip. I would love to be able to design a chip.
On the current Prop, C compiled to LMM is a bit of a pig and soon fills up the hub space.
REPS #32,#1 'repeat 1 instruction 32 times
SETINDS buffb,buffa 'spacer instruction, INDB=@buffb, INDA=@buffa
MOV INDB++,INDA++ 'move 32 registers
You can use INDA or INDB as follows in the destination, the source, or both:
INDA
INDA++
INDA--
++INDA
There are SETINDA, SETINDB, and SETINDS (both) instructions that can set or modify the INDA and/or INDB pointers. Examples:
SETINDS -4,$100 'subtract 4 from INDB and set INDA to $100
SETINDA +20 'add 20 to INDA
SETINDB buff 'set INDB to buff
There are also FIXINDA, FIXINDB, and FIXINDS instructions that set wrapping bounds on the INDA and INDB pointers:
FIXINDS buff+15,buff 'set INDA and INDB to buff, fix limits to buff and buff+15 (circular buffer)
FIXINDA buff,buff+7 'set INDA to buff+7, fix limits to buff and buff+7 (decrementing circular buffer)
FIXINDB 31,0 'set INDB to 0, fix limits to 0 and 31
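To make the wrapping behaviour easier to picture, here is a toy Python model of a pointer with fixed limits. It is only a sketch of the semantics as described in the comments above, not a statement of how the silicon behaves; the `WrappingPointer` class and the `BUFF` address are hypothetical.

```python
class WrappingPointer:
    """Toy model of an index pointer that wraps between two limits."""

    def __init__(self, start: int, low: int, high: int) -> None:
        if not low <= start <= high:
            raise ValueError("start must lie within [low, high]")
        self.low, self.high, self.value = low, high, start

    def post_inc(self) -> int:
        """Return the current address, then step forward with wrap-around."""
        current = self.value
        self.value = self.low if current == self.high else current + 1
        return current

    def post_dec(self) -> int:
        """Return the current address, then step backward with wrap-around."""
        current = self.value
        self.value = self.high if current == self.low else current - 1
        return current


# Modelled on "FIXINDB 31,0": pointer starts at 0, limits are 0 and 31.
indb = WrappingPointer(start=0, low=0, high=31)
visited = [indb.post_inc() for _ in range(34)]
print(visited[30:])

# Modelled on "FIXINDA buff,buff+7" with a hypothetical buff address of 100.
BUFF = 100
inda = WrappingPointer(start=BUFF + 7, low=BUFF, high=BUFF + 7)
down = [inda.post_dec() for _ in range(10)]
print(down)
```

In this model the incrementing pointer steps past its upper limit back to the lower one, and the decrementing pointer does the reverse, which is the circular-buffer behaviour the FIXINDx comments describe.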
Some interesting new instructions you have there. Have to think about them for a bit.
Any chance of changing the mnemonics? I find INDA/INDB a bit clumsy and grating. Perhaps call them IX and IY and have SETIX, SETIY, SETIS and so on.
More seriously, these appear to offer code compaction and a speed-up for raw PASM running in a COG but are of little use to LMM code. Are there any new instructions to help with C compiled to LMM, for example? Has there been any consultation with compiler gurus to see what would help the compiler there?
I get the impression from looking at other architectures that they can have many weird and wonderful instructions and architectural features that seemed like a good idea to speed things up but end up not being used by compilers because it's just too hard. As an example, the Z80 triples the instruction set size over the 8080 with a lot of bit twiddling ops etc., nice for assembler programmers, but the compilers stuck to the old 8080 set.