Propeller II

dr hydra · 2012-08-06 07:24

It is the start of August... any updates on the Propeller II? Does it exist in chip form? Has batch testing started? When can I buy them? :)

mindrobots · 2012-08-06 07:27

no,
no,
when they show up in the new product listing

It WILL be worth the wait!

dr hydra · 2012-08-06 13:27

I hope it makes its debut before Christmas.

Phil Pilgrim (PhiPi) · 2012-08-06 13:33

dr hydra wrote:

I hope it makes its debut before Christmas.

Don't hold your breath. And if it's for me, an alternative gift will do just fine.

-Phil

Clock Loop · 2012-08-06 16:13

P16X64A

:cool:

rjo__ · 2012-08-06 18:21

In a webcast from the UPWest, Chip said he had everything working except that the actual performance was limited to 50MHz. I was thinking to myself, "Chip... I swear to God, 50MHz would be just fine:)"
I once had more employees than Parallax... but our output was about 1/100th. I don't know how they do it, but they do. In the meantime... all kinds of opportunities left on the Prop-1.

Phil Pilgrim (PhiPi) · 2012-08-06 18:52

rjo__ wrote:

... 50MHz would be just fine ...

No, it wouldn't. Not with 100% registered outputs -- at least for the kind of stuff I'd want to use it for.

-Phil

rjo__ · 2012-08-06 19:20

yes it would!!!!

How much faster than an 80MHz Prop-1 would a PropII be for most operations?... 2 1/2 times. Is there or is there not an ADC built into every pin? How much more program space? How many more pins?
Can you honestly say that if a 50MHz PropII was available tomorrow, you wouldn't jump at it?

-Rich

Cluso99 · 2012-08-06 19:21

Random thoughts while we wait with bated breath for any word on the P2...
I am really curious how they stack the DDR RAM on top of the ARM chip on the Raspberry Pi. It would make a great way to attach extra hub RAM on a later P2. Of course, Parallax would have to make the SRAM chip too, but I think that process is well understood anyway. My guess is that we could get well over a few MBs on this chip, and at full hub speed.

Phil Pilgrim (PhiPi) · 2012-08-06 19:29

rjo__ wrote:

yes it would!!!! How much faster than an 80MHz Prop-1 would a PropII be for most operations?

In the Prop I, the outputs are not registered. With a counter in PLL mode, the edges are asynchronous to the system clock. In the Prop II, all outputs are registered, including PLL outputs. A 20ns timing increment is completely inadequate for RF applications. And that's what I'm interested in. So, no: 50 MHz is not an adequate speed.

-Phil

rjo__ · 2012-08-06 19:30

I'm wondering if reports that the Pi doesn't have enough horsepower to host a Kinect are true... If it doesn't, I'm not interested. The Pandaboard does!!!

rjo__ · 2012-08-06 19:35

Phil,

You know I like you... and you know that I want you to have whatever you want. BUT I also know that if Chip put out a 50MHz PropII... you would buy 4 of them and add your own asynchronous timing.
ok... so you probably wouldn't have to buy them... but you know what I mean:)

Rich

Phil Pilgrim (PhiPi) · 2012-08-06 19:41

rjo__ wrote:

... you would buy 4 of them and add your own asynchronous timing.

Once timing is synchronized you can't "add" asynchronous timing.

BTW, I have no idea why you like me, since we've never met, and I have a tendency to dog your posts with critical rejoinders.

-Phil

rjo__ · 2012-08-06 19:48

I like your work... your work reflects who you are. Therefore I like you.
I think I said this in a post a ways back.

I love conversation and don't think of your comments as dogging at all.

With regard to the synchronization, what I thought you were referring to is the benefit of interleaving cogs to gain resolution in the time domain.
This could be done with multiple Prop IIs with an overall improvement.

What I wasn't thinking of is that you may very well have projects in mind that simply could not compete in the market if you had to use additional hardware over the PropII.

Regards

Rich

rod1963 · 2012-08-06 19:52

My guess: after Jan 1, 2013.

Phil Pilgrim (PhiPi) · 2012-08-06 20:01

rod1963 wrote:

My guess: after Jan 1, 2013.

'Spoken like a man who would sooner invest in Procter and Gamble than Facebook -- and never split face cards!

-Phil

jmg · 2012-08-06 21:03

rjo__ wrote:

In a webcast from the UPWest, Chip said he had everything working except that the actual performance was limited to 50MHz.

Is that a recent number, post layout, or was he talking about the FPGA speed being limited to 50MHz?
That seems a large drop from the simulated speeds, if he was talking about a real device. This is not 1995; simulation should be more accurate than that in 2012.

cgracey · 2012-08-06 21:10

Right now, the final work is being done on the main logic block which is being synthesized from the finished Verilog code. Our pad frame, into which it plops, has been done for some time now. The layout work right now is focused on getting our pad frame's and the synthesized logic block's VDD/VSS signals connected properly. There are several thousand of these connections which must be automatically made, but it's taken a lot of work for the synthesis guy to script up how this happens. Beau has been working with him for a few weeks on this.

I am currently working on getting the ROM booter code done. It loads (from half-duplex serial or SPI flash) an authenticated (SHA-256/HMAC) secondary booter. That secondary booter can handle full-chip loading with faster clocking and AES-128 decryption, using the keys which get passed to it; those keys reside in special one-time-programmable fuse bits on the chip. I think these last two things (final die artwork and ROM bit pattern) will come together at about the same time, and within three weeks we'll have final GDS2 data to send off to the fab. Then, three weeks later we'll see if it works.

That is the state of things. I'm really pleased with how things are turning out. I just hope it works on the first try. By the way, the final clock speed is 160MHz and although all digital outputs are registered, the video generator clocks the DACs at whatever frequency you program the PLL to, so you are not stuck with 6.25ns (160MHz) granularity, in all cases.

jmg · 2012-08-06 21:19

cgracey wrote:

By the way, the final clock speed is 160MHz and although all digital outputs are registered, the video generator clocks the DACs at whatever frequency you program the PLL to, so you are not stuck with 6.25ns (160MHz) granularity, in all cases.

Is that 160MHz based on post-route physical simulations?
Is that a margined 160MHz (i.e. a spec value, corner case instance), or a 'typical'?

What about input (pin read) granularity? Is that fixed at 6.25ns?

Phil Pilgrim (PhiPi) · 2012-08-06 21:38

cgracey wrote:

By the way, the final clock speed is 160MHz and although all digital outputs are registered, the video generator clocks the DACs at whatever frequency you program the PLL to, so you are not stuck with 6.25ns (160MHz) granularity, in all cases.

That's good news, Chip. The registered outputs had me concerned for the sake of RF apps -- regardless of the clock speed.

-Phil

cgracey · 2012-08-06 21:40

jmg wrote:

Is that 160MHz based on post-route physical simulations?
Is that a margined 160MHz (i.e. a spec value, corner case instance), or a 'typical'?

What about input (pin read) granularity? Is that fixed at 6.25ns?

Yes, post-route, worst-case (slow-slow process corner, 1.62V, and 125C). We could get it to go 180MHz in this fashion, but we are climbing the wall with power consumption. Better to back off a bit and cut down on the buffering needed to make timing.

Pin inputs are registered to the main clock, so 6.25ns is it at 160MHz.
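
For anyone following the granularity numbers in this thread, the timing increment of a registered pin is just one clock period. A minimal Python sketch of that arithmetic (the helper name `period_ns` is made up for illustration, not anything from Parallax):

```python
def period_ns(clock_hz: float) -> float:
    """Return the period of one clock cycle in nanoseconds."""
    return 1e9 / clock_hz


# Clock speeds discussed so far in the thread: the rumored 50MHz,
# the final 160MHz, and the 180MHz that was possible but power hungry.
for mhz in (50, 160, 180):
    print(f"{mhz} MHz -> {period_ns(mhz * 1e6):.2f} ns per clock")
```

At 50MHz this gives the 20ns increment Phil objected to, and at 160MHz the 6.25ns figure quoted above.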

Kye · 2012-08-06 22:50

200 MHz operation may be possible if the P2 can survive a 25% over-clock like the P1. For room temperature environments with steady supplies this should be fine.

pedward · 2012-08-06 23:01

cgracey wrote:

Pin inputs are registered to the main clock, so 6.25MHz is it at 160MHz.

You mean 6.25ns at 160MHz, right?

rod1963 · 2012-08-06 23:02

Why overclock the P2? It's going to be as fast as greased lightning and will take years to fully exploit as it is.

pedward · 2012-08-06 23:17

Another teaser:

Chip converted my P1 PASM SHA-256 code into P2 PASM and it lost a bunch of weight.

I think the instruction count dropped by 30+ instructions by just having the advantage of the new instruction set.

The instruction count for 1 block was 4742 on the Prop 1; a quick static analysis puts it at 3392 on a Prop 2. That is about 28% fewer instructions, which makes the Prop 2 instruction set roughly 30% denser than the Prop 1 in real world applications.

Take 20,000,000 / 4742 = 4217 blocks per second on the Prop 1 and 160,000,000 / 3392 = 47169 blocks per second on the Prop 2. That's about 11x, better than 1 order of magnitude faster, for a chip that is only 8x faster through standard architectural improvements!

Since the boot code runs in RCFAST, that works out to 5896 blocks per second on the Prop 2 in boot mode. Given it only hashes 64 total blocks to generate the HMAC, that is about 10.9 milliseconds to authenticate the second stage bootloader.
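
The arithmetic in this post can be reproduced with a few lines of Python. This is only a sketch built on the figures quoted here; the RCFAST rate of 20,000,000 instructions per second is an assumption inferred from the 5896 blocks per second figure, not a published spec, and all the names are invented for the example.

```python
# Figures quoted in the post
P1_INSTR_PER_BLOCK = 4742          # Prop 1 instructions per SHA-256 block
P2_INSTR_PER_BLOCK = 3392          # Prop 2 instructions per block (static analysis)
P1_INSTR_PER_SEC = 20_000_000      # Prop 1 rate used in the post
P2_INSTR_PER_SEC = 160_000_000     # Prop 2 rate used in the post
HMAC_BLOCKS = 64                   # blocks hashed to authenticate the booter

# ASSUMPTION: inferred from the 5896 blocks/s figure, not a documented value
RCFAST_INSTR_PER_SEC = 20_000_000


def blocks_per_second(instr_per_sec: float, instr_per_block: int) -> float:
    """Blocks hashed per second at a given instruction rate."""
    return instr_per_sec / instr_per_block


p1_rate = blocks_per_second(P1_INSTR_PER_SEC, P1_INSTR_PER_BLOCK)
p2_rate = blocks_per_second(P2_INSTR_PER_SEC, P2_INSTR_PER_BLOCK)
boot_rate = blocks_per_second(RCFAST_INSTR_PER_SEC, P2_INSTR_PER_BLOCK)

reduction = 1 - P2_INSTR_PER_BLOCK / P1_INSTR_PER_BLOCK   # fraction of instructions saved
speedup = p2_rate / p1_rate                               # overall throughput ratio
boot_ms = HMAC_BLOCKS / boot_rate * 1000                  # time to hash 64 blocks

print(f"Prop 1: {p1_rate:.0f} blocks/s, Prop 2: {p2_rate:.0f} blocks/s")
print(f"Instruction reduction: {reduction:.1%}, speedup: {speedup:.1f}x")
print(f"Boot mode: {boot_rate:.0f} blocks/s, {boot_ms:.1f} ms for {HMAC_BLOCKS} blocks")
```

The speedup splits into the 8x raw instruction rate times the 4742/3392 density gain, which is where the extra margin beyond 8x comes from.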

pedward · 2012-08-06 23:20

rod1963 wrote:

Why overclock the P2? It's going to be as fast as greased lightning and will take years to fully exploit as it is.

The main reason to overclock is to have an orthogonal clock ratio of 5ns per interval. This makes timing arithmetic trivially easy.

cgracey · 2012-08-07 00:30

pedward wrote:

You mean 6.25ns at 160MHz, right?

Yes! You're right. I just fixed it.

Cluso99 · 2012-08-07 00:52

pedward wrote:

The main reason to overclock is to have an orthogonal clock ratio of 5ns per interval. This makes timing arithmetic trivially easy.

No. The main reason I use 104MHz on the P1 is that 6.5MHz xtals are easy to get and cheap (and of course it works). I keep looking for a 6.75MHz and 7MHz xtal although I have successfully used 13.5MHz.

Based on P1, my guess is that P2 will be happy with 200MHz at least. I don't want to go to the edge, just proceed towards it.

Nice work, Chip.

I would love to be able to design a chip.

Heater. · 2012-08-07 01:07

Wow, a 30% reduction in PASM code size bodes very well for the C compilers for Prop II, both when targeting LMM and in COG code. That assumes the compilers can be coaxed into using the required instructions and optimizations, and that those things also work effectively for LMM.

On the current Prop, C compiled to LMM is a bit of a pig and soon fills up the HUB space.

cgracey · 2012-08-07 01:45

In the Prop 2, there are indirect registers INDA and INDB (at $1F6 and $1F7). Either or both can be set or modified with a single instruction. There is also a repeat-block instruction which begins repeating after one spacer instruction. Here is how you can move a block of registers, for example:

REPS #32,#1 'repeat 1 instruction 32 times
SETINDS buffb,buffa 'spacer instruction, INDB=@buffb, INDA=@buffa
MOV INDB++,INDA++ 'move 32 registers
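
To make the three-instruction block move above concrete, here is a rough Python model of what it does to the register file. This is a sketch of the behavior as described in this post, not a cycle-accurate emulation; the function name, the 512-entry list standing in for cog registers, and the example addresses are all assumptions for illustration.

```python
def block_move(regs: list, dst: int, src: int, count: int) -> None:
    """Model REPS #count,#1 / SETINDS dst,src / MOV INDB++,INDA++."""
    indb, inda = dst, src            # SETINDS: INDB = dst, INDA = src
    for _ in range(count):           # REPS: repeat the single MOV `count` times
        regs[indb] = regs[inda]      # MOV through the indirect pointers
        indb += 1                    # INDB++ (post-increment)
        inda += 1                    # INDA++ (post-increment)


# Hypothetical register file: each register initially holds its own address
regs = list(range(512))
buffa, buffb = 0x040, 0x080          # made-up buffer addresses
block_move(regs, dst=buffb, src=buffa, count=32)
print(regs[buffb:buffb + 4])         # first few registers of the destination
```

The point of the hardware version is that the loop and both pointer updates cost no extra instructions; only the MOV itself repeats.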

You can use INDA or INDB as follows in either/both destination or/and source:

INDA
INDA++
INDA--
++INDA

There are SETINDA, SETINDB, and SETINDS (both) instructions that can set or modify INDA or/and INDB pointers. Examples:

SETINDS -4,$100 'subtract 4 from INDB and set INDA to $100
SETINDA +20 'add 20 to INDA
SETINDB buff 'set INDB to buff

There are also FIXINDA, FIXINDB, and FIXINDS instructions that set wrapping bounds on the INDA and INDB pointers:

FIXINDS buff+15,buff 'set INDA and INDB to buff, fix limits to buff and buff+15 (circular buffer)
FIXINDA buff,buff+7 'set INDA to buff+7, fix limits to buff and buff+7 (decrementing circular buffer)
FIXINDB 31,0 'set INDB to 0, fix limits to 0 and 31
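
The wrapping behavior of a FIXINDx pointer can be sketched in Python as well. This model is an assumption pieced together from the comments on the three examples above: the pointer starts at the second operand, the two operands give the limits, and stepping past one limit wraps to the other. The class and method names are invented, and real hardware may differ in details.

```python
class IndPointer:
    """Toy model of an INDA/INDB pointer with FIXINDx-style wrap limits."""

    def __init__(self, first: int, second: int):
        # ASSUMPTION: limits are the two operands; pointer starts at the second
        self.lo = min(first, second)
        self.hi = max(first, second)
        self.value = second

    def post_inc(self) -> int:
        """Return the current address, then step up (like INDx++), wrapping hi -> lo."""
        current = self.value
        self.value = self.lo if self.value == self.hi else self.value + 1
        return current

    def post_dec(self) -> int:
        """Return the current address, then step down (like INDx--), wrapping lo -> hi."""
        current = self.value
        self.value = self.hi if self.value == self.lo else self.value - 1
        return current


# Like FIXINDB 31,0: an incrementing circular buffer over registers 0..31
indb = IndPointer(31, 0)
print([indb.post_inc() for _ in range(34)])

# Like FIXINDA buff,buff+7 with a made-up buff address: decrementing buffer
buff = 0x100
inda = IndPointer(buff, buff + 7)
print([inda.post_dec() - buff for _ in range(10)])
```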

Heater. · 2012-08-07 03:15

Chip,

Some interesting new instructions you have there. Have to think about them for a bit.

Any chance of changing the mnemonics? I find INDA/INDB a bit clumsy and grating. Perhaps call them IX and IY and have SETIX, SETIY, SETIS and so on.

More seriously, these appear to offer code compaction and speed-up to raw PASM running in COG but to be of little use to LMM code. Are there any new instructions to help with C compiled to LMM, for example? Has there been any consultation with compiler gurus to see what would help the compiler there?

I get the impression from looking at other architectures that they can have many weird and wonderful instructions and architectural features that seemed like a good idea to speed things up but end up not being used by compilers because it's just too hard. As an example, the Z80 triples the instruction set size over the 8080 with a lot of bit twiddling ops etc. Nice for assembler programmers, but the compilers stuck to the old 8080 set.
