The New 16-Cog, 512KB, 64 analog I/O Propeller Chip

evanh · 2016-04-16 13:40

cgracey wrote: »

They made it look simple.

Yeah, tidy; the amount of empty space is certainly inviting. The tiny size of the weld pads is a bit mind-boggling.

I note the large red features beside the I/O pin pads: smaller ones beside the top-left corner pads, and none at all beside the power pads. I'm guessing those are capacitors.

evanh · 2016-04-16 13:41

oops, forgot the piccy ...

cgracey · 2016-04-16 13:46

T Chap wrote: »

I am wondering about an interim embedded A9 or similar solution.

BeMicro makes a Cyclone V -A9 board for $149. Either Cluso or OzPropDev were planning an extender board for it.

cgracey · 2016-04-16 13:50

evanh wrote: »

cgracey wrote: »

They made it look simple.

Yeah, tidy; the amount of empty space is certainly inviting. The tiny size of the weld pads is a bit mind-boggling.

I note the large red features beside the I/O pin pads: smaller ones beside the top-left corner pads, and none at all beside the power pads. I'm guessing those are capacitors.

Those are the digital drive transistors for 0/1 output. The big one is the PMOS transistor, which pulls high, and the smaller one is the NMOS transistor, which pulls low. It takes a much larger PMOS to get the same drive strength as an NMOS. These transistors were recently almost doubled in size to get the drive strength needed for USB. They can sink/source 60mA. That's overkill for most things, but USB calls for such a low impedance.

evanh · 2016-04-16 14:27

Struth! So many details.

Cluso99 · 2016-04-16 15:04

Looks like a really clean design!

It's a shame to waste a shuttle run without more functionality.
Would it cost much more to include a single cog design for testing the instruction set or something?
What about the fuses? Are they just OTP?
Testing dual port memory?

Or even drop a P1V inside???

Cluso99 · 2016-04-16 15:20

cgracey wrote: »

...

I could support the BeMicroCV which has a Cyclone V -A2 device on it, which holds more than the DE0-Nano. I think that board is only $49. It would do one or two cogs and several smart pins, maybe 64KB hub RAM. If you wanted that working, I could start doing compiles for it, along with the others.

Yes please, Chip!

If you could have P62 and P63 as smart pins, that would be fantastic! After downloading, it would really help if that code could then send serial output through the Prop Plug, using the UART, to a PC terminal program such as PST. I need that for testing USB.

T Chap · 2016-04-16 15:21

Thanks, Chip. If I could solder the A9 BGA, is that the only real issue with embedding a real A9 chip? I see guys hand-soldering large BGAs on YouTube like there's nothing to it. A multilayer board is not such an issue to make with some study. The real question would be how to buy A9s at a cheaper rate than quantity-1 pricing.

JRetSapDoog · 2016-04-16 18:32

cgracey wrote: »

Check this out:

die outline           8.5mm x 8.5mm = 72.25 mm² (+13%)
pad frame             14.5 mm² (-33%)
interior              57.75 mm² (+37%)

16 of 8192x32 SP RAM  16 x 1.57 mm² = 25.1 mm²
16 of 512x32 DP RAM   16 x 0.292 mm² = 4.7 mm²
16 of 512x32 SP RAM   16 x 0.150 mm² = 2.4 mm²
16384x8 ROM           0.3 mm²
memories              25.1 + 4.7 + 2.4 + 0.3 = 32.5 mm²

logic area            interior 57.75 - memories 32.5 = 25.25 mm² for logic (2.38x)
...

There's not enough room to double the hub RAM to 1MB, but...
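The area budget quoted above can be re-derived in a few lines. This is a minimal Python sketch that only reuses the per-instance areas from Chip's figures; the variable names are illustrative and not from any Parallax tool. It also confirms that 16 blocks of 8192x32 bits is exactly 512KB.

```python
# Re-derive the quoted die-area budget (all areas in mm^2).
DIE_SIDE_MM = 8.5
PAD_FRAME_MM2 = 14.5
COGS = 16

memories = {
    "16 of 8192x32 SP RAM": COGS * 1.57,
    "16 of 512x32 DP RAM": COGS * 0.292,
    "16 of 512x32 SP RAM": COGS * 0.150,
    "16384x8 ROM": 0.3,
}

die_area = DIE_SIDE_MM ** 2              # 72.25
interior = die_area - PAD_FRAME_MM2      # 57.75
memory_total = sum(memories.values())    # about 32.49
logic_area = interior - memory_total     # about 25.26

# 16 blocks x 8192 words x 32 bits, converted to bytes.
big_ram_bytes = COGS * 8192 * 32 // 8    # 524288 bytes = 512KB

for name, area in memories.items():
    print(f"{name:22s} {area:6.2f} mm^2")
print(f"memories total         {memory_total:6.2f} mm^2")
print(f"logic area             {logic_area:6.2f} mm^2")
print(f"large SP RAM total     {big_ram_bytes // 1024} KB")
```

The small difference from the 25.25 mm² in the post comes from rounding the memory total to 32.5 mm² before subtracting.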

Curiosity killed the cat, but I'm still curious about the feasibility of increasing the die size to make room for 1MB. I just reread Chip's musing (however brief, and assuming he wasn't kidding) about resorting to a 17x17mm die for P2-Hot to dissipate a possible 5W of heat. Okay, that sounds big and expensive, but what about a 10x10mm die size?

Presumably, a bigger die makes for a bigger pad frame. I know nothing about chip design, but assuming the pad frame is an estimated 0.45mm wide around the periphery, I believe the pad frame of a 10x10mm die would occupy about 17.2 mm², making it about 2.7 mm² larger than the 14.5 mm² pad frame of the 8.5x8.5mm die. (Note: I updated this paragraph after originally posting, due to an error.)

Now, the 8.5x8.5mm die gave 72.25 mm² of total space (including the pad frame). So:

100 mm² - 72.25 mm² - 2.7 mm² = 25.05 mm² of additional real estate, which comes very close to the 25.1 mm² needed for another 512KB of SP RAM. (I think I overestimated the pad frame size a tad, and if any gap still exists, it could hopefully be made up by the spare space already available in the current design.)
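To make the 10x10mm estimate reproducible, the sketch below models the pad frame as a square ring of uniform width along the die edges, infers that width from the 8.5mm figures, and applies it to a 10mm die. The uniform-ring model is an assumption (real pad frames have corner cells and bond-pad geometry it ignores), so treat the result as a check of the arithmetic rather than a layout estimate.

```python
import math


def ring_area(side_mm, width_mm):
    """Area of a square ring of uniform width along the edges of a square die."""
    inner = side_mm - 2 * width_mm
    return side_mm ** 2 - inner ** 2


# Infer the ring width from the quoted 8.5mm die with 14.5 mm^2 of pad frame.
old_side = 8.5
old_interior = old_side ** 2 - 14.5                 # 57.75
width = (old_side - math.sqrt(old_interior)) / 2    # about 0.450 mm

# Apply the same (assumed) width to a 10x10mm die.
new_side = 10.0
new_frame = ring_area(new_side, width)              # about 17.20
new_interior = new_side ** 2 - new_frame            # about 82.80
extra = new_interior - old_interior                 # about 25.05
needed = 16 * 1.57                                  # 25.12 for sixteen more 8192x32 blocks

print(f"width {width:.3f} mm, frame {new_frame:.2f} mm^2, "
      f"extra {extra:.2f} mm^2, shortfall {needed - extra:.2f} mm^2")
```

With the inferred width of about 0.45mm, the extra interior comes out near 25.05 mm², roughly 0.07 mm² short of the 25.12 mm² for sixteen more 8192x32 blocks, which matches the "very close" reading above.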

Is a 10x10 mm die not standard? Would it make the chip significantly more expensive? Regarding the latter, aren't we already looking at a $10 to $15 chip at the start (Chip mentioned a possible $12 figure for a 17x17 mm P2-Hot)? If so, the added expense of the larger die would seem marginal, considering the P2 is not designed to compete with low-cost chips. I'd also presume the bigger die would spread heat out better. And a 1MB part would be quite a selling point! (Perhaps a full meg could even be considered a "reward" of sorts for waiting for the P2 goodness, but that's tugging on the heartstrings.)

Plus, the address space of the current P2 design already allows for addressing a full meg. And it seems somewhat unlikely that a third chip will come along anytime soon to use the remaining 512KB of address space (and if there is a third chip, it might very well leapfrog the P2 specs by quite a bit). So, if that other 512KB of address space doesn't get used, the situation seems similar to port B on the P1 (though that failed to materialize due to tool chain problems).

Don't get me wrong, I was quite happy to see Chip's willingness to bump the RAM up from 256KB to 512KB (thanks again, Chip). And I'd be ecstatic to have such a chip in my hand today. But I have in mind an application that really could make use of most of a full 1MB of RAM. I realize we might attach off-chip memory that blows away 512KB or 1MB of RAM in terms of size, but I'm guessing that a lot of designers would rather keep things simple with the on-chip RAM where possible, so the more, the better (discounting cost and power consumption).

I'm sure that there are lots of reasons I have no idea about that could argue against going with a larger die size with a full meg of memory. But I just thought I'd bring it up one more time before the design gets locked down with a torque wrench. In some sense, we might already be beyond that, since Treehouse has already designed a pad frame, but I think Chip just gave them the "final" dimensions, so perhaps that's not too difficult to change. Anyway, just curious. Thanks for reading.

KeithE · 2016-04-16 19:04

> I could support the BeMicroCV which has a Cyclone V -A2 device on it, which holds more than the DE0-Nano

I've got no dog in this, but for $100 you can get the DE0-Nano-SoC with a Cyclone V SE A4 on it, which not only has more fabric (A2: 25K LEs and 1,400 Kb of M10K RAM; A4: 40K LEs and 2,700 Kb of M10K RAM), but also has a dual-core ARM Cortex-A9 running at 925 MHz that can run Linux. So you could run Propeller development tools on the board. You can ssh/scp to it, which is convenient.

But you would probably want a volunteer to deal with the HPS stuff for you, as that could consume your valuable time.

jmg · 2016-04-16 19:49

Dave Hein wrote: »

I suggest freezing the design and getting the P2 out in silicon. Leave the empty space for future enhancements of the P2 after the initial chip has been in use for a while.

I'd fine-tune that comment a little: rather than wasting the space, use it for a size change to an existing feature.
Little point in everyone paying for empty silicon.

jmg · 2016-04-16 19:57

cgracey wrote: »

We are going to do a shuttle run in July for the pad frame elements, in order to test the reset, clock, and I/O pads.

Sounds like a good idea.
Will this run be able to test USB modes? And the crystal oscillator?
Does the PLL need testing, or has Treehouse signed off on that?

I'd suggest you take a quick glance at the Intel Quark D2000 datasheet.
In its register details for the crystal oscillator, there are some interesting things, like:
* Margin settings for oscillator gain/drive, i.e., you can verify that a PCB has operating margin.
* To cover 4~32MHz, they look to have 4 gain/drive settings.
* CL trim: 16 steps of crystal load-capacitance choice.
Also, most serial channels have a loop-back bit, which allows better test coverage.

cgracey wrote: »

They can sink/source 60mA. That's overkill for most things, but USB calls for such low impedance.

Does the user still have drive-setting choices?
What about slew rates?

jmg · 2016-04-16 20:02

cgracey wrote: »

I want to see about having the streamer do 4/2/1-bit operations, not just 8/16/32. This will take a day.

Sounds good; nibble-wide will be important for Quad SPI devices and bulk data.
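As a rough illustration of why 4/2/1-bit streamer modes matter for Quad SPI: a 32-bit word leaves a nibble-wide bus in 8 transfers instead of 32 on a single data line. The sketch below is a hypothetical model of splitting a word into 1/2/4/8/16/32-bit groups; the function name and the MSB-first default are assumptions, not the P2 streamer's documented behavior.

```python
def split_word(word, width, msb_first=True):
    """Split a 32-bit word into width-bit groups, as a narrow bus would carry it."""
    if width not in (1, 2, 4, 8, 16, 32):
        raise ValueError("width must be 1, 2, 4, 8, 16 or 32")
    mask = (1 << width) - 1
    # Collect groups starting from the least significant bits.
    groups = [(word >> shift) & mask for shift in range(0, 32, width)]
    return groups[::-1] if msb_first else groups


for width in (1, 4, 8):
    print(width, len(split_word(0x12345678, width)))   # transfers per 32-bit word
print([hex(n) for n in split_word(0x12345678, 4)])
```

SPI flash parts generally shift data MSB first, which is why that is the default here; an LSB-first streamer would simply use `msb_first=False`.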

Will the streamer also support simple hardware handshakes?

Many systems can burst at high rates but cannot sustain them, and use simple handshakes to keep things paced.
FIFO modes are a common example, where HS USB and GHz USB devices have FIFO interfaces.
An FPGA connection would also likely be FIFO-based.
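The burst-then-stall pattern described here is essentially FIFO backpressure. Below is a minimal, hypothetical Python model (not a P2 streamer feature) of a fast writer that only advances while a bounded FIFO reports room, with a reader draining at a slower sustained rate; the FIFO depth and drain rate are arbitrary assumptions.

```python
from collections import deque


def run_transfer(data, fifo_depth=4, drain_every=3):
    """Model a burst writer throttled by a FIFO 'not full' handshake.

    Each tick the writer offers one word and only advances when the FIFO
    has room. The reader pops one word every `drain_every` ticks.
    Returns (received words, ticks taken, writer stall count).
    """
    fifo = deque()
    received = []
    i = tick = stalls = 0
    while len(received) < len(data):
        # Writer side: push only when the handshake says there is room.
        if i < len(data):
            if len(fifo) < fifo_depth:
                fifo.append(data[i])
                i += 1
            else:
                stalls += 1
        # Reader side: slower, sustained drain.
        if tick % drain_every == 0 and fifo:
            received.append(fifo.popleft())
        tick += 1
    return received, tick, stalls


words, ticks, stalls = run_transfer(list(range(20)))
print(ticks, stalls, words == list(range(20)))
```

The writer stalls instead of dropping data, so every word arrives in order; the handshake just paces the burst down to the reader's sustained rate.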

jmg · 2016-04-16 20:47

Cluso99 wrote: »

It's a shame to waste a shuttle run without more functionality.
Would it cost much more to include a single cog design for testing the instruction set or something?
What about the fuses? Are they just OTP?
Testing dual port memory?

Good points, but there must be plans for some interior logic in order to test the pins properly.

If it were me, I'd certainly look to test anything outside of common synthesis flows.
Things like:

* Crystal Oscillator, Boot Oscillator - confirm Xtal range
* PLL - less risky, but it does couple with the crystal, and some easily variable clocks are going to be needed
* USB, including pullups etc
* DACs / Analog modes / Drive settings etc

1-2 cogs and 2-4 pin cells would seem a useful quick placement and test?
Going too simple on the interior logic could limit test coverage?

Tubular · 2016-04-16 20:47

cgracey wrote: »

T Chap wrote: »

I am wondering about an interim embedded A9 or similar solution.

BeMicro makes a Cyclone V -A9 board for $149. Either Cluso or OzPropDev were planning an extender board for it.

Actually, 'twas Peter Jakacki,
http://forums.parallax.com/discussion/163632/p2-motherboard-for-bemicrocv-a9-proposal-features-request-pcb-layout

jmg · 2016-04-16 21:02

cgracey wrote: »

Here is the full pad frame that Treehouse has pretty much finished.
The power pads have been beefed up quite a bit, since these screenshots.

Cool plots. Are those pad areas more compact than in earlier images, or is that my imagination?

cgracey wrote: »

The power pads have been beefed up quite a bit, since these screenshots.

Good, they looked a little 'light', given the space available.
Do the power pads now include local GND-to-power capacitors?
Spread decoupling helps, as do 'beefed up' traces.

Have you checked whether this can bond into a BGA package?

cgracey · 2016-04-16 21:05

The plan is to bring out all logic-side signals for two I/O pins, plus the clock pins and reset. This way, we can test it all with an FPGA, as if we had the pins internally connected.

There probably won't be occasion to do a cog and hub, as the licenses for those EDA tools won't be active. When we pay to turn those on, we'll need to do the whole chip.

jmg · 2016-04-16 21:10

cgracey wrote: »

The plan is to bring out all logic-side signals for two I/O pins, plus the clock pins and reset. This way, we can test it all with an FPGA, as if we had the pins internally connected.

There probably won't be occasion to do a cog and hub, as the licenses for those EDA tools won't be active. When we pay to turn those on, we'll need to do the whole chip.

I thought those were tiered licenses, and you needed extra $$ only for the full-size build?
Crystal + PLL would seem useful to include in the earlier test coverage, at least?

cgracey · 2016-04-16 21:16

jmg wrote: »

cgracey wrote: »

The plan is to bring out all logic-side signals for two I/O pins, plus the clock pins and reset. This way, we can test it all with an FPGA, as if we had the pins internally connected.

There probably won't be occasion to do a cog and hub, as the licenses for those EDA tools won't be active. When we pay to turn those on, we'll need to do the whole chip.

I thought those were tiered licenses, and you needed extra $$ only for the full-size build?
Crystal + PLL would seem useful to include in the earlier test coverage, at least?

Even the first tier costs lots of money.

The crystal oscillator, RC oscillator and PLL are all inside the clock pads. So, we'd cover every bit of analog just by involving the various pads. And the reset pad has a 50ns glitch filter (your idea, if I recall) and a 20ms one-shot timer. The synthesized stuff is likely to just work, as the flow is quite proven. There's still room for some mistakes on our end, though.
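One way to picture the reset pad behavior is an event-level model. The sketch below rests on assumptions (my reading, not a documented spec): low pulses on the reset pin shorter than 50ns are ignored, a longer low asserts internal reset once the filter has seen 50ns of it, and the 20ms one-shot keeps internal reset asserted for 20ms after the pin is released. The function name is hypothetical.

```python
GLITCH_NS = 50             # assumed: low pulses shorter than this are filtered out
HOLD_NS = 20_000_000       # assumed: 20ms one-shot extension after the pin releases


def internal_reset(low_pulses):
    """low_pulses: list of (start_ns, width_ns) low pulses on the reset pin.

    Returns merged (assert_ns, release_ns) intervals of the internal reset.
    """
    intervals = []
    for start, width in sorted(low_pulses):
        if width < GLITCH_NS:
            continue                          # glitch: never reaches the core
        begin = start + GLITCH_NS             # filter confirms the low after 50ns
        end = start + width + HOLD_NS         # one-shot holds reset 20ms past release
        if intervals and begin <= intervals[-1][1]:
            # Overlaps the previous reset: extend it instead of starting a new one.
            intervals[-1] = (intervals[-1][0], max(intervals[-1][1], end))
        else:
            intervals.append((begin, end))
    return intervals


print(internal_reset([(0, 30)]))                      # glitch only
print(internal_reset([(0, 1000), (5_000_000, 100)]))  # second press lands inside the hold
```

A second valid press during the 20ms hold simply extends the existing reset rather than producing a separate one.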

jmg · 2016-04-16 21:24

cgracey wrote: »

Even the first tier costs lots of money.

Bummer...

cgracey wrote: »

The crystal oscillator, RC oscillator and PLL are all inside the clock pads. So, we'd cover every bit of analog just by involving the various pads. And the reset pad has a 50ns glitch filter (your idea, if I recall) and a 20ms one-shot timer. The synthesized stuff is likely to just work, as the flow is quite proven. There's still room for some mistakes on our end, though.

That's sounding better; if you can get coverage of all that non-synthesized stuff, that's important.
Can you add a simple divider (or tap into the PLL somewhere) so you don't need to drive an FPGA line at the highest PLL speeds for testing?

Are there any docs yet on the crystal oscillator settings, PLL equations, and dividers / PFD / VCO / CL / Gm etc.?

cgracey · 2016-04-16 21:41

jmg wrote: »

cgracey wrote: »

Even the first tier costs lots of money.

Bummer...

cgracey wrote: »

The crystal oscillator, RC oscillator and PLL are all inside the clock pads. So, we'd cover every bit of analog just by involving the various pads. And the reset pad has a 50ns glitch filter (your idea, if I recall) and a 20ms one-shot timer. The synthesized stuff is likely to just work, as the flow is quite proven. There's still room for some mistakes on our end, though.

That's sounding better; if you can get coverage of all that non-synthesized stuff, that's important.
Can you add a simple divider (or tap into the PLL somewhere) so you don't need to drive an FPGA line at the highest PLL speeds for testing?

Are there any docs yet on the crystal oscillator settings, PLL equations, and dividers / PFD / VCO etc.?

Good idea about maybe dividing the PLL output. I've got a 1.5GHz scope here, though, and the I/O pads are designed to switch very quickly. It might be good to test in that way. I could bring a divided-by-16 PLL signal out on another pin.

I don't have any doc written up on the clock modes, but they look like this:

%xxxxxx00 = RC Fast (20MHz)
%xxxxxx01 = RC Slow (20KHz)
%xxxxxx10 = XI input
%xxxxxx11 = XI input + PLL

%xxxx00xx = XI = float, XO = float
%xxxx01xx = XI = input, XO = float
%xxxx10xx = XI = input, XO = !XI, 1Mohm feedback, 15pF on XI/XO
%xxxx11xx = XI = input, XO = !XI, 1Mohm feedback, 30pF on XI/XO

%0000xxxx = PLL off
%0001xxxx = PLL, XI * 2
%0010xxxx = PLL, XI * 3
%0011xxxx = PLL, XI * 4
...
%1111xxxx = PLL, XI * 16
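A small decoder makes the bit layout above concrete. This is a hypothetical Python sketch (the function names are made up, and the RC frequencies are the nominal values from the list): bits 1:0 select the clock source, bits 3:2 configure XI/XO, and bits 7:4 give a PLL multiplier of field + 1, with %0000 meaning off.

```python
RC_FAST_HZ = 20_000_000   # nominal "RC Fast (20MHz)"
RC_SLOW_HZ = 20_000       # nominal "RC Slow (20KHz)"

SOURCES = {0b00: "RC fast", 0b01: "RC slow", 0b10: "XI", 0b11: "XI + PLL"}
XTAL_CFG = {
    0b00: "XI float, XO float",
    0b01: "XI input, XO float",
    0b10: "XI input, XO = !XI, 1Mohm feedback, 15pF",
    0b11: "XI input, XO = !XI, 1Mohm feedback, 30pF",
}


def decode_clock_mode(mode):
    """Split an 8-bit mode value into (source, XI/XO config, PLL multiplier)."""
    source = SOURCES[mode & 0b11]
    xtal = XTAL_CFG[(mode >> 2) & 0b11]
    pll_field = (mode >> 4) & 0b1111
    # %0001 = x2 ... %1111 = x16; %0000 = PLL off (reported as 0).
    multiplier = pll_field + 1 if pll_field else 0
    return source, xtal, multiplier


def sysclk_hz(mode, xi_hz):
    """Resulting system clock for a given mode and XI frequency."""
    source, _, multiplier = decode_clock_mode(mode)
    if source == "RC fast":
        return RC_FAST_HZ
    if source == "RC slow":
        return RC_SLOW_HZ
    if source == "XI":
        return xi_hz
    if multiplier == 0:
        raise ValueError("XI + PLL selected, but the PLL field is off")
    return xi_hz * multiplier


mode = 0b0111_10_11   # PLL field %0111 (x8), 15pF crystal setting, source = XI + PLL
print(decode_clock_mode(mode))
print(sysclk_hz(mode, 20_000_000))   # 160000000
```

For example, with a 20MHz crystal this mode gives 160MHz, so the divided-by-16 PLL test output mentioned above would be 10MHz.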

jmg · 2016-04-16 21:56

cgracey wrote: »

I don't have any doc written up on the clock modes, but they look like this:

%xxxxxx00 = RC Fast (20MHz)
%xxxxxx01 = RC Slow (20KHz)
%xxxxxx10 = XI input
%xxxxxx11 = XI input + PLL

%xxxx00xx = XI = float, XO = float
%xxxx01xx = XI = input, XO = float
%xxxx10xx = XI = input, XO = !XI, 1Mohm feedback, 15pF on XI/XO
%xxxx11xx = XI = input, XO = !XI, 1Mohm feedback, 30pF on XI/XO

%0000xxxx = PLL off
%0001xxxx = PLL, XI * 2
%0010xxxx = PLL, XI * 3
%0011xxxx = PLL, XI * 4
...
%1111xxxx = PLL, XI * 16

OK.
Is there a CMOS clock-input mode?
What MHz range does the crystal oscillator target?

You may need lower CL values if you want 30MHz crystal support, and I'd avoid a 15pF minimum setting.

Also, a mode with
XO = !XI, 1Mohm feedback, 0pF added on XI/XO
allows clipped-sine TCXOs to be AC-coupled, and also covers any user-supplied-C cases.

Is there no field for crystal division?

With a range of possible sources, and higher-MHz crystals being smaller these days, being able to divide the crystal frequency by a modest amount before the PFD is useful. Something like a 4-5 bit crystal divider and a 6-7 bit PLL divider gives better PLL control, while still being light on logic.
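To show what a crystal divider buys, here is a brute-force search over a divider D and multiplier M for XI / D * M. The 5-bit divider, 7-bit multiplier, and 1MHz minimum PFD frequency are illustrative assumptions, VCO range limits are ignored, and `best_settings` is a hypothetical name; the point is only to compare against the multiply-only field in Chip's list.

```python
from fractions import Fraction


def best_settings(xtal_hz, target_hz, div_bits=5, mult_bits=7, min_pfd_hz=1_000_000):
    """Brute-force search for d and m so that xtal_hz / d * m is closest to target_hz.

    Assumed field widths: d in 1..2**div_bits, m in 1..2**mult_bits.
    Returns (d, m, output frequency as an exact Fraction).
    """
    best = None  # (error, d, m, output)
    for d in range(1, 2 ** div_bits + 1):
        pfd = Fraction(xtal_hz, d)            # phase-detector input frequency
        if pfd < min_pfd_hz:
            break                             # larger d only lowers the PFD further
        for m in range(1, 2 ** mult_bits + 1):
            out = pfd * m
            err = abs(out - target_hz)
            if best is None or err < best[0]:  # ties keep the smallest d found first
                best = (err, d, m, out)
    if best is None:
        raise ValueError("crystal is below the minimum PFD frequency")
    _, d, m, out = best
    return d, m, out


print(best_settings(12_000_000, 160_000_000))                           # with divider
print(best_settings(12_000_000, 160_000_000, div_bits=0, mult_bits=4))  # x1..x16 only
```

With a 12MHz crystal and a 160MHz target, the multiply-only case can reach only 156MHz (x13), while adding the divider finds 12MHz / 3 x 40 = 160MHz exactly.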

Cluso99 · 2016-04-16 23:32

Chip,
There are two guys who are extremely serious about wanting a P1V.
Perhaps they may be willing to fund the licence section to have a P1V placed inside the frame. What are the chances the frame will work in its basic mode for P1V?

For others, the P1V is...
* Prop 1 plus
* 64 I/O
* More hub ram
* Faster (160+ MHz)

Optional extras...
* LUT RAM for cog execution
* Security fuses (OTP)
* serial input on video logic
* Multiply & Divide
* Relative addressing
* Boot from Flash (instead of EEPROM)
* ??? OTP for boot code

Cluso99 · 2016-04-16 23:59

Chip,
How do you propose to access the internal (logic-side) pins of the smart pins? Sounds smart to be able to use FPGA I/O to connect to the internal smart pin pads!

But I cannot envision how you will do this. Will it just be a die with the external pads wired to a test frame, and the internal pads also wired to the test frame?

BTW I presume that since Treehouse have now done the I/O block/ring, that the chip die dimensions are now frozen?

evanh · 2016-04-17 00:04

JRetSapDoog wrote: »

cgracey wrote: »

There's not enough room to double the hub RAM to 1MB, but...

Curiosity killed the cat, but I'm still curious about the feasibility of increasing the die size to make room for 1MB. I just reread Chip's musing (however brief, and assuming he wasn't kidding) about resorting to a 17x17mm die for P2-Hot to dissipate a possible 5W of heat. Okay, that sounds big and expensive, but what about a 10x10mm die size? ...

Doh! How come no one corrected me on the not? I was babbling away happily about choosing double the HubRAM.

Regarding the die size, I'd think it's seriously constrained by the QFP100 package size. Any talk of a larger die for the Prop2-Hot was, I'd guess, when targeting a package with closer to 200 pins. The Hot design did have a full extra 32 I/O.

evanh · 2016-04-17 00:20

Cluso99 wrote: »

Chip,
There are two guys who are extremely serious about wanting a P1V.
Perhaps they may be willing to fund the licence section to have a P1V placed inside the frame. What are the chances the frame will work in its basic mode for P1V?

Eek! Stressful. The extra license money will make that very risky to do on first shuttle. I can only assume they both need features that aren't on the FPGA.

evanh · 2016-04-17 00:29

Cluso99 wrote: »

But I cannot envision how you will do this. Will it just be a die with the external pads wired to a test frame, and the internal pads also wired to the test frame?

The plan is for only two smart pins to be complete in the test run. The rest of the I/O pads can be repurposed. So presumably a few of them will form an external cog interface for wiring over to an FPGA for testing.

EDIT: Err, not even that, just the pad frame only:

cgracey wrote: »

... bring out all logic-side signals for two I/O pins, ...

jmg · 2016-04-17 01:12

Cluso99 wrote: »

Chip,
There are two guys who are extremely serious about wanting a P1V.
Perhaps they may be willing to fund the licence section to have a P1V placed inside the frame.

Why take this risk, and delay P2 as well, for a modest number of P1V chips, which could well turn out to be zero working devices?
This is a shuttle run only.

Cluso99 wrote: »

What are the chances the frame will work in its basic mode for P1V?

Without special design and test, probably quite low.

Anyone serious about P1V should be looking at FPGAs, surely?

jmg · 2016-04-17 01:18

Cluso99 wrote: »

Chip,
How do you propose to access the internal (logic-side) pins of the smart pins? Sounds smart to be able to use FPGA I/O to connect to the internal smart pin pads!

My reading is that there are no smart pins, just the custom parts of the design, though that includes the PLL and oscillators.
The smart pins are done in the FPGA, which uses the pad-ring I/O to confirm pin-coupling effects.

Cluso99 wrote: »

BTW I presume that since Treehouse have now done the I/O block/ring, that the chip die dimensions are now frozen?

Weren't the dimensions also frozen by the package?

I think you can put almost any die into BGA, so there could possibly be a BGA version down the line.
(same die, smaller total PCB area, but more PCB layers)

jmg · 2016-04-17 01:59

cgracey wrote: »

The plan is to bring out all logic-side signals for two I/O pins, plus the clock pins and reset. This way, we can test it all with an FPGA, as if we had the pins internally connected....

The crystal oscillator, RC oscillator and PLL are all inside the clock pads. So, we'd cover every bit of analog just by involving the various pads. And the reset pad has a 50ns glitch filter (your idea, if I recall) and a 20ms one-shot timer. The synthesized stuff is likely to just work, as the flow is quite proven. There's still room for some mistakes on our end, though.

Do you plan to package some, if the first probe tests are OK?

If so, it may pay to bus up the output-path connections of all the other pins, so you can do some 'all-as' tests on DACs/decoupling.
Maybe the input paths can simply be ORed (which may already be there) for all-pin input-path confirmation tests.
