2-Cog, 128KB-Hub, 32-I/O P2

cgracey · 2018-10-09 05:18

Yanomani wrote: »

cgracey wrote: »

They use their ONC5 process to build 2-layer interposer chips for this purpose. So, it could be done.

Won't a stacked die raise new concerns about heat dissipation?

I don't think so. The bottom of the P2 die sits on the exposed pad. The flash doesn't make much heat, at all.

Yanomani · 2018-10-09 05:32

Since the molding compound used in packages contains a substantial amount of fused silica to help spread the heat, I always tend to picture some eggs frying in a hot Teflon-lined pan.

Anyway, with all the behavioral data gathered/observed from the current prototypes, perhaps OnSemi's people could give a word of advice on that subject.

If the intention is to keep to Amkor's July 2018 line card, their Exposed Pad 64-lead LQFP package (10 mm x 10 mm body size, 0.5 mm lead pitch) has a nice 7.5 mm x 7.5 mm bottom thermal pad.

https://c44f5d406df450f4a66b-1b94a87d576253d9446df0a9ca62e142.ssl.cf2.rackcdn.com/2018/02/Amkor_LineCard.pdf

I'm almost sure you'll find a comfortable ground (sic) to play with there (page 20, top).

But doesn't the current 100-pin version use 6 package pins for each group of 4 Smart Pins (VDD, Pin_n, Pin_n+1, VIO(n_n+3), Pin_n+2, Pin_n+3)?

Thus, (6 x 8) + 4 = 52, leaving 12 free pins, or exactly two full extra groups of 4 Smart Pins, for a neat 40-I/O P2.
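
The pin arithmetic above can be checked with a short Python sketch. It assumes a 64-pin package, 8 base groups (32 I/O), and that the "+ 4" stands for four non-I/O pins such as TEST, RESn, XI and XO, which the post does not spell out. The constant and function names are made up for illustration.

```python
# Sketch of the pin-count arithmetic for a 64-pin package.
# Assumptions (not stated explicitly in the post): the "+ 4" covers four
# non-I/O pins (e.g. TEST, RESn, XI, XO); all names here are illustrative.

PACKAGE_PINS = 64
PINS_PER_GROUP = 6   # VDD + VIO + 4 Smart Pins
IO_PER_GROUP = 4
BASE_GROUPS = 8      # 8 groups x 4 Smart Pins = 32 I/O
OTHER_PINS = 4       # assumed non-I/O pins


def pin_budget(package_pins=PACKAGE_PINS, base_groups=BASE_GROUPS):
    """Return (used, free, extra_groups, total_io) for the package."""
    used = PINS_PER_GROUP * base_groups + OTHER_PINS
    free = package_pins - used
    extra_groups = free // PINS_PER_GROUP  # whole groups that still fit
    total_io = (base_groups + extra_groups) * IO_PER_GROUP
    return used, free, extra_groups, total_io


used, free, extra_groups, total_io = pin_budget()
print(f"used={used} free={free} extra_groups={extra_groups} total_io={total_io}")
```

With these assumptions the 12 free pins divide evenly into two more 6-pin groups, which is where the 40-I/O figure comes from.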

Perhaps it would be worth keeping the topmost bits of DIRB/OUTB (P[63:56]), which in turn would make it somewhat easier to maintain whatever was/will be developed for booting the current 100-pin version of the chip.

Also, when it comes to the STREAMER's XINIT/XZERO/XCONT instructions (D/#[31:16], %eppp field), you would gain some freedom to redefine their grouping possibilities.

Only my 0.02

Henrique

Roy Eltham · 2018-10-09 05:55

If it's gotta be the 64-pin package, then could the remaining pins be more I/Os? Having the top 8 I/Os match number-wise with the larger version would be ideal. So P0-P31, then P56-P63.
Assuming that would fit on the available die space, I see that as a pretty nice cut-down version that will have a lot of compatibility with the larger one for ROM and software that connects to the serial/flash/etc.
In fact, I think if you can't fit more I/Os, then the 32 you have should be P0-P23 and P56-P63.

cgracey · 2018-10-09 06:08

In the 64-pin package, it could look like this:

40 I/O's = P0..P31, P56..P63
10 VIO's
10 VDD's
1 TEST
1 RESn
1 XI
1 XO

64 pins

I'll have to see if they can be distributed nicely around the package.
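
As a quick sanity check, the pin list above can be tallied in Python. The counts are copied from the list. The one-VIO-and-one-VDD-per-four-I/O check follows the 6-pins-per-group scheme described earlier in the thread, and the variable names are only illustrative.

```python
# Tally of the proposed 64-pin pinout (counts taken from the list above).
pin_counts = {
    "I/O": 40,
    "VIO": 10,
    "VDD": 10,
    "TEST": 1,
    "RESn": 1,
    "XI": 1,
    "XO": 1,
}

# I/O numbering: P0..P31 plus P56..P63
io_pins = list(range(0, 32)) + list(range(56, 64))

total_pins = sum(pin_counts.values())
assert total_pins == 64
assert len(io_pins) == pin_counts["I/O"]
# One VIO and one VDD per group of 4 I/O pins
assert pin_counts["VIO"] == pin_counts["VDD"] == len(io_pins) // 4
print(total_pins, len(io_pins))
```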

Brian Fairchild · 2018-10-09 06:29

Given the un-proven demand for the current chip, isn't it a bit premature to be thinking of yet another one?

pedward · 2018-10-09 06:38

Yanomani wrote: »

cgracey wrote: »

They use their ONC5 process to build 2-layer interposer chips for this purpose. So, it could be done.

Won't a stacked die raise new concerns about heat dissipation?

The bottom is the thermal interface, so the P2 would be on bottom with a serial flash on top.

Chip, if you do consider a stacked die, I recommend this:

Break out p0-p31 to pins, then break out p58-p63 as well, with the internal SPI flash die tied to p58-61, then break out the SPI flash CS to a dedicated pin, which would loop back to p61 in typical applications.

What this achieves is SPI bus on p58-p61, tied to the piggy-back flash, but the flash's CS is external, so other SPI devices can share the bus, or you could use an external SPI flash. You get the advantage of full p0-p31 I/O pins, not shared with anything else.

Phil Pilgrim (PhiPi) · 2018-10-09 06:44

As long as developing the P2 subset isn't just avoidance activity in lieu of finishing the paperwork (i.e. documentation, dev tools, marketing strategy, etc.) for the P2 ...

I do worry about launching into a new endeavor before the P2 device becomes the P2 product. There is a difference between the two and probably another year's worth of effort to complete that transition.

When the P1 came out, all the i's were dotted and t's crossed. We had everything we needed out of the gate without having to watch the sausage being made. The P2 effort feels different to me and somehow more scattered.

-Phil

Okay, I guess I'm the wet blanket now, since Ken won't do it.

cgracey · 2018-10-09 06:51

I found out that we CAN fit 40 I/O pins into the 64-pin package. This follows the current pin-out scheme nicely:

jmg · 2018-10-09 06:53

Phil Pilgrim (PhiPi) wrote: »

I do worry about launching into a new endeavor before the P2 device becomes the P2 product. There is a difference between the two and probably another year's worth of effort to complete that transition.

Likewise, but my reading is this would not be started, until after P2 was Rev-B proven.
It's a point to raise at OnSemi, when Ken & Chip talk about plans.
Makes sense to at least sketch a road map device, and see if any large customers say 'it's a perfect fit'.

I would also suggest getting QuadSPI access stress-tested with the streamer on P2 before such a 2nd device, just to shake out any details that might be limiting SPI access speeds. With less memory, good flash access for XIP-like operation will matter more.

I did note GigaDevice have very small flash packages now, which means small die...

cgracey · 2018-10-09 06:57

The next step down from 4 cogs, in order to fit a 5x5mm die paddle would have to be 2 cogs, 64KB hub, and maybe 20+8 I/O's.

Roy Eltham · 2018-10-09 07:05

Chip,
The 64pin, 40 I/O, 4 cog, 256K hub looks like a nice setup! Hope it fits in the die too.

Also, I could see the 2 cog one being really nice for certain use cases if it has a cheap enough price.

Brian Fairchild · 2018-10-09 07:17

Phil Pilgrim (PhiPi) wrote: »

As long as developing the P2 subset isn't just avoidance activity in lieu of finishing the paperwork (i.e. documentation, dev tools, marketing strategy, etc.) for the P2 ...

.... ^^^^^^^that.

cgracey · 2018-10-09 07:20

Here is a 2-cog, 128K-hub, 32-pin version that would fit into the Amkor 7x7mm 48-pin package:

Roy Eltham · 2018-10-09 07:26

Chip,
I like that 2 cogger (thanks Ken) version even more! I hope we can end up with a nice family of P2s, the 3 you've mentioned so far sound great.

Peter Jakacki · 2018-10-09 07:51

At this stage of the game, now that we have silicon, I am not perturbed by these proposed additions to the P2 family. In fact, I am somewhat cheered that Chip is talking not about bigger and faster but about smaller and just-as-fast variations. I can see a lot of potential even for a 2-cog version that I might have dismissed as unimportant even just yesterday, but with 128K RAM and 32 smart pins in a small package it can now replace a lot of those tiny micros that I still use, in all but the smallest packages.

I say go for it Chip.

cgracey · 2018-10-09 08:11

You know that 2-cog chip would probably have an Fmax of 220MHz. It would be really inexpensive. Maybe it would be best to make that one before a 4-cog version. It would go into a 7x7mm package with a 5x5mm die pad. It would be great for distributed systems.

jmg · 2018-10-09 08:20

cgracey wrote: »

Here is a 2-cog, 128K-hub, 32-pin version that would fit into the Amkor 7x7mm 48-pin package:

Here are the newest, teensy-package 1.5x1.5mm (0.4mm) (!) flash memories.

These would demonstrate very well alongside P2, as they shrink the boot footprint way down. They make SO8 look massive.

The 512kb/1Mb are in stock, and the 8Mb looks to arrive in 10 days time.

GD25D80C USON8 1.5*1.5MM http://www.gigadevice.com/datasheet/gd25d80c/
8M-bit Serial Flash 3.3V
Fast Program/Erase Speed
-1024K-byte
-256 bytes per programmable page
-Page Program time: 0.7ms typical
-Sector Erase time: 100ms typical
-Block Erase time: 0.3/0.5s typical
-Chip Erase time: 8s typical
Standard, Dual Output
-Standard SPI: SCLK, CS#, SI, SO, WP#
-Dual Output: SCLK, CS#, IO0, O1, WP#
Clock Frequency
-100MHz for fast read with 30pF load
-Dual Output Data transfer up to 160Mbits/s

GD25D10C/05C http://www.gigadevice.com/datasheet/gd25d05c/
1M/512K-bit Serial Flash 3.3V
128K/64K-byte

For 29c/30c/46c with increasing memory.
Looks like they chose DualSPI as the no-extra-pins approach variant, and push up the MHz
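
For a rough sense of scale, here is a Python sketch that converts the listed densities to bytes and estimates an idealized read time at the quoted Dual Output rate. It assumes 1 Mbit = 1024 x 1024 bits and ignores command, address and dummy-cycle overhead, so real transfers would take longer. The 128 KB image size is borrowed from the 2-cog hub size discussed in this thread, and all names are illustrative.

```python
# Idealized flash-size and read-time arithmetic (assumptions in comments).
BITS_PER_MBIT = 1024 * 1024      # assumed binary megabit, as in "8M-bit = 1024K-byte"
DUAL_OUTPUT_BPS = 160_000_000    # "up to 160Mbits/s" from the listing


def mbit_to_kbytes(mbit):
    """Convert a flash density in Mbit to KB (1 KB = 1024 bytes)."""
    return mbit * BITS_PER_MBIT / 8 / 1024


def ideal_read_ms(num_bytes, bits_per_second=DUAL_OUTPUT_BPS):
    """Best-case time to stream num_bytes, ignoring all protocol overhead."""
    return num_bytes * 8 / bits_per_second * 1000


for density in (8, 1, 0.5):
    print(f"{density} Mbit = {mbit_to_kbytes(density):.0f} KB")

hub_bytes = 128 * 1024  # hypothetical image size: a full 128 KB hub
print(f"128 KB at 160 Mbit/s: about {ideal_read_ms(hub_bytes):.2f} ms")
```

Under these assumptions the 1 Mbit part holds exactly one 128 KB hub image, and a best-case dual-output read of that image would take on the order of a few milliseconds.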

cgracey · 2018-10-09 08:39

I think the 2-cog version is more compelling than the 4-cog version. Its selling point would be the analog, smart pins, 2 processors, 128KB RAM, and CORDIC. It would be a squirrelly little thing.

I've already got the schematics done for the 4-cog and 2-cog variants. They are just simple reductions of the 8-cog schematic. Turns out we get lots of pins, after all.

Roy Eltham · 2018-10-09 09:16

I agree, Chip, the 2 cogger is more interesting, and likely more sellable than the 4 cogger to certain markets.
I'd still like to see both smaller variants eventually, but if you have to pick just one smaller variant right now to go along with the full 8 cogger, I'd go with the 2 cogger.

Mickster · 2018-10-09 09:22

When it comes time for the P2 press release, I would think that a family would be a real attention grabber. 👍

Peter Jakacki · 2018-10-09 09:31

A 2-cog version would target low part count designs so you would have to have brown-out reset surely. Sure, you could get by without crystal or Flash but reliable reset is essential. Perhaps a watchdog is in order?

cgracey · 2018-10-09 09:37

Making the pad-ring schematics for the 4-cog and 2-cog variants went really quickly, as it's just a reductive exercise. Fortunately, the original pinout scheme reduces quite nicely for smaller parts.

I estimate that the 2-cog die would be 4.5 x 4.5mm and fit perfectly into the 7x7mm 48-pin Amkor exposed-pad package with the 5x5mm die pad. That part would be a lot of fun to play with. It would be very fast and much lower power than the current chip. Way less expensive, too. Maybe in the $3 range.

cgracey · 2018-10-09 09:39

Peter Jakacki wrote: »

A 2-cog version would target low part count designs so you would have to have brown-out reset surely. Sure, you could get by without crystal or Flash but reliable reset is essential. Perhaps a watchdog is in order?

New analog design would add months to the project. We'd have to have an external brownout detector.

ozpropdev · 2018-10-09 09:49

While these smaller P2's sound great and I certainly can think of projects I can/will use them in, I am finding it very hard turning my attention away from the REAL 8 cogger!
This BEAST is awesome!

cgracey · 2018-10-09 10:15

ozpropdev wrote: »

While these smaller P2's sound great and I certainly can think of projects I can/will use them in, I am finding it very hard turning my attention away from the REAL 8 cogger!
This BEAST is awesome!

Well, that makes me happy to hear!

I would like to make two smaller variants: a 2-cog, then a 4-cog. And then, if we are successful, we could use the ONC110 process (110nm) to make a 300MHz 16-cog 1MB-hub version. That would probably cost $400k to pursue. I'm thinking we could get a 2-cog variant for maybe $75k right now.

Here are ON Semi's process offerings. Note that their 110nm process is 69% faster than their 180nm process:

https://www.onsemi.com/PowerSolutions/content.do?id=16621

evanh · 2018-10-09 10:22

That will be cool. The 110 nm is clearly the sweet spot for speed in that chart.

Dave Hein · 2018-10-09 10:42

All this talk about 2 and 4 cog versions is sooooooo painful to watch. We were so close to the finish line with the 8-cog version, and now we've gone off the track to go look at shiny objects. There are lots of other things to do before other versions should even be thought about. Every time I see the project diverted I just wonder why we even bother with spending time with the P2. I realize this is Chip's project, but many of us have been hoping to get our hands on the P2 for years.

PLEASE FINISH THE 8-COG VERSION!!!!!!! (YES, I AM YELLING!)

Cluso99 · 2018-10-09 10:46

Wow! Been at work this afternoon/evening and all this breaks! Just had dinner and get to the forum.

Chip,
I have a question. Could the 2 Cog version fit 4 Cogs if two were severely Instruction reduced?
We'd need to be careful to keep both the effort and the risk of the carve-up minimal.

If there's merit, here are a couple of ideas for the other 2 cogs.
Could they be P1 (basic instruction set) cogs?
Could they be "blind" cogs? ie the paired cogs to the main 2 cogs, sharing LUT but no Hub access?

Rayman · 2018-10-09 11:50

I'm trying to remember... Would 2 cogs have faster hub memory access than 8 cogs?

Cluso99 · 2018-10-09 11:52

Thinking further, yesterday I couldn't see a 2 cog having any legs (ie market).

But we are seeing 300+MHz running, x2 cores, 32 smart pins with analog, and interrupts. Add 128KB Hub RAM and 4KB private cog/LUT RAM per cog. This is still a mighty powerful little beast.

The downside is two supply rails, external flash (unless SD is required), external brownout.

It's a shame we cannot feed RCFAST into the OSC PLL.
