The New 16-Cog, 512KB, 64 analog I/O Propeller Chip

Tubular · 2014-04-22 22:15

Very slick, Chip.

Bodes well for multiple cogs in a DE0, even allowing for quite a few LEs for each smart pin.

I'm curious how you divvied up the 6 ALU blocks to reduce current - by critical path? Instruction coding? Instruction frequency?

cgracey · 2014-04-22 22:28

jmg wrote: »

Ahh.. oops, when you said 'the ALU', I took the Arithmetic to mean the single shared Arithmetic Block in the Hub.

So the COG ALU does MUL, MULS and Addition and Subtraction opcodes ?

That's right.

cgracey · 2014-04-22 22:29

T Chap wrote: »

I am curious if since the ALU is slower than the cog, does the cog simply just wait for a reply from the ALU and then continue?

The cog allows two clocks for the ALU to settle. By that time, the result WILL be ready.

cgracey · 2014-04-22 22:39

Tubular wrote: »

Very slick, Chip.

Bodes well for multiple cogs in a DE0, even allowing for quite a few LEs for each smart pin.

I'm curious how you divvied up the 6 ALU blocks to reduce current - by critical path? Instruction coding? Instruction frequency?

By functional blocks:

add, inc, rot, logic, mul, mux

Each block's internals take about the same time to resolve, while each block varies in response time. They are all within 60% of each other's sizes. I arranged the instruction op codes so that the blocks were easy to decode. Their different response times make the final mux easy to arrange, with the slowest (add) at the top of the mux.

Right now, I'm going to make the 'add' block go faster by pre-decoding a few internal signals before the flops. This will get the adder off to a faster start and earlier finish time, as it is sticking WAY out, slowing everything down. In this design, the Z flag must be totally resolved by the end of the ALU cycle. On the Prop2, it was half-resolved, with other circuits getting the final result early in the next cycle. With these power-saving flop partitions, though, everything must be done at the end of the cycle, so the Z-test must complete AFTER the result mux. This tacks more time onto the cycle. By getting rid of SUMZ/SUMNZ (which were hardly used, according to the analyses Phil, Heater, and Roy did), I can pre-determine the bottom-level inputs to the 'add' block to make it go faster. This thing is sticking out like a sore thumb, causing the Z-test to stretch things out to 100MHz, when we were 120MHz before. I had gotten one compile to go 140MHz, but I haven't been able to repeat it.

Electrodude · 2014-04-22 22:50

cgracey wrote: »

This thing is sticking out like a sore thumb, causing the Z-test to stretch things out to 100MHz, when we were 120MHz before. I had gotten one compile to go 140MHz, but I haven't been able to repeat it.

100/120/140 MHz or MIPS? Didn't it use to be 200MHz?

Tubular · 2014-04-22 22:57

Thanks, Chip. Fascinating stuff.

cgracey · 2014-04-22 23:09

Electrodude wrote: »

100/120/140 MHz or MIPS? Didn't it use to be 200MHz?

100MHz for the ALU means 100MIPS, or 200MHz clock.
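To make the relationship concrete, here is a minimal sketch of the arithmetic, assuming only what was stated earlier in the thread (every ALU operation takes two clocks). The function names are made up for illustration.

```python
# Stated earlier in the thread: "The cog allows two clocks for the ALU to settle."
CLOCKS_PER_ALU_OP = 2


def alu_mips(clock_mhz: float) -> float:
    """Instruction rate in MIPS for a given system clock in MHz."""
    return clock_mhz / CLOCKS_PER_ALU_OP


def required_clock_mhz(alu_rate_mhz: float) -> float:
    """System clock in MHz needed to run the ALU at a given rate in MHz."""
    return alu_rate_mhz * CLOCKS_PER_ALU_OP


# The ALU rates mentioned in the thread and the clock each one implies.
for alu_rate in (100, 120, 140):
    clock = required_clock_mhz(alu_rate)
    print(f"ALU at {alu_rate} MHz -> {clock:.0f} MHz clock, {alu_mips(clock):.0f} MIPS")
```

So the 100/120/140 figures are ALU cycle rates, and the system clock is twice each of them.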

Electrodude · 2014-04-22 23:20

OK, thanks. I didn't realize the ALU had a separate clock.

Cluso99 · 2014-04-22 23:56

WTG Chip !!!

New Cog ALU 1800 LEs
P1 Cog complete 1850 LEs
P2 Cog complete ~30,000 LEs

Shows the complexity and hence power of the resulting P2 design.
The shakedown seems to have resulted in something extra-ordinary!!
A very efficient design plus >200MHz likely too.

Heater. · 2014-04-23 02:23

Chip,

The entire Prop1 cog was 1850 LE's, while the ALU is always the biggest chunk. The Prop2 cog was ~30,000 LE's.

Say what! And 100 MIPS. And 16 of them. Now I'm very worried. This sounds too good to be true.

Can't wait to see it in action. Just dusting off my nano board as we speak.

evanh · 2014-04-23 02:31

cgracey wrote: »

... All ALU operations take two clocks.

Not including branch instructions, right?

Heater. · 2014-04-23 02:56

Who cares? My code only goes straight ahead. It's moving too fast to make a turn

Bean · 2014-04-23 05:04

In the quest to get more instructions, I was wondering if anyone has thought about using the IF_xx bits for instructions that would make no sense if they were conditional.

For example "IF_Z NEGZ value1, value2" would not make sense, since the instruction could be assembled as just "NEG value1,value2".
Likewise "IF_NZ NEGZ value1,value2" would be assembled as just "MOV value1,value2".

So could not the IF_Z condition flag be used to indicate a totally different instruction for NEGZ and the like ?

Just a thought....

Bean

evanh · 2014-04-23 05:15

Start with NOP, or more accurately IF_NEVER. There are currently over 268 million possible variants of NOP in the instruction set.
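For anyone wondering where that number comes from, here is a quick sketch. It assumes a 32-bit instruction word with a 4-bit condition field, as on the Propeller 1; the new chip's encoding is not spelled out in this thread, so treat both constants as assumptions.

```python
INSTRUCTION_BITS = 32  # assumed instruction word size (as on the Propeller 1)
CONDITION_BITS = 4     # assumed width of the IF_xx condition field


def never_executed_encodings(instruction_bits: int = INSTRUCTION_BITS,
                             condition_bits: int = CONDITION_BITS) -> int:
    """Count the encodings whose condition field is fixed at IF_NEVER.

    With the condition field pinned to one value, every other bit is free,
    and none of those encodings ever executes.
    """
    return 2 ** (instruction_bits - condition_bits)


print(never_executed_encodings())
```

Under those assumptions that is 2^28 encodings that all behave as NOP.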

Seairth · 2014-04-23 12:19

Bean wrote: »

In the quest to get more instructions, I was wondering if anyone has thought about using the IF_xx bits for instructions that would make no sense if they were conditional.

For example "IF_Z NEGZ value1, value2" would not make sense, since the instruction could be assembled as just "NEG value1,value2".
Likewise "IF_NZ NEGZ value1,value2" would be assembled as just "MOV value1,value2".

So could not the IF_Z condition flag be used to indicate a totally different instruction for NEGZ and the like ?

No, you can't substitute like that. In the first example, if Z=0, then NEGZ doesn't execute at all and the D register (value1) does not get changed. NEG (without a predicate) always updates D, which may not be your intent.
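A small model shows the difference. This is only a sketch, assuming Propeller 1 semantics for NEG and NEGZ (NEGZ writes -S when Z is set, otherwise S) and treating the IF_Z predicate as "skip the instruction entirely when Z is clear". The Python function names are made up for illustration.

```python
MASK32 = 0xFFFFFFFF  # registers are modeled as 32-bit values


def neg(d: int, s: int, z: int) -> int:
    """NEG D,S: always writes -S to D."""
    return (-s) & MASK32


def negz(d: int, s: int, z: int) -> int:
    """NEGZ D,S: writes -S to D if Z is set, otherwise writes S."""
    return ((-s) & MASK32) if z else (s & MASK32)


def if_z(instruction, d: int, s: int, z: int) -> int:
    """IF_Z predicate: run the instruction only when Z is set, else leave D alone."""
    return instruction(d, s, z) if z else d


d_before, s = 0x1234, 5
for z in (1, 0):
    predicated = if_z(negz, d_before, s, z)
    plain = neg(d_before, s, z)
    print(f"Z={z}: IF_Z NEGZ -> D={predicated:#010x}, NEG -> D={plain:#010x}")
```

With Z=1 the two agree. With Z=0 the predicated NEGZ leaves D at its old value while the plain NEG overwrites it, which is why the encodings are not interchangeable.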

Cluso99 · 2014-04-23 13:27

Bean,
We are not short of opcodes. However, on the earlier P2, Chip used these bits to define some instructions.

T Chap · 2014-04-23 13:58

What is involved to offer a .65mm pitch version?

Roy Eltham · 2014-04-23 14:53

T Chap,
A LOT more money! (million+ instead of in the 10s-100s of thousands)
Oops, misread.

T Chap · 2014-04-23 14:56

I am talking pin spacing.

Brian Fairchild · 2014-04-23 15:19

T Chap wrote: »

What is involved to offer a .65mm pitch version?

Is there a 0.65mm 100-pin TQFP package?

Peter Jakacki · 2014-04-23 15:47

Brian Fairchild wrote: »

Is there a 0.65mm 100-pin TQFP package?

Even if there was, the 0.4mm pitch package is big enough already, having it too big would limit some applications I feel. Here's a footprint comparison between P1 and P2 (thermal pad in blue is approximate in size).

Correction, I have used a QFP100 with 0.65mm pitch before (not TQFP).

Rayman · 2014-04-23 16:52

Brian Fairchild wrote: »

Is there a 0.65mm 100-pin TQFP package?

I googled and found someone selling a probe adapter for one, so I guess it does/did exist...

I have to say that 0.65 is enormously easier than 0.5 mm for me.
Don't know if it's an option, but I'd also much prefer 0.65 mm spacing.

For the old P2, maybe it didn't matter because I guess we'd mostly buy Parallax made modules anyway...

But, the new P2 has enough RAM that I think I could do cool things without the need for SDRAM.
So, I'd like to be able to solder the chip myself.

I can do 0.5 mm, but it often needs very difficult rework, needing a loupe to inspect the leads...

BTW: Why does everybody quote everything somebody says when replying? People seem pretty quote happy here...

mindrobots · 2014-04-23 17:01

Don't some of the packaging options depend on the one Chip found with the large thermal pad? It sounded like he wanted to use that packager, so it's up to what they have to offer.

jmg · 2014-04-23 17:11

T Chap wrote: »

What is involved to offer a .65mm pitch version?

Mainly bonding mapping and deciding what not to bond, as well as NREs for test and packaging setups.

Amkor show these possible pin counts, in a Thermal Pad, 14x14 package : 52/64/80/100/120/128

and the approximate pin pitches (in mm) these pin counts imply are
12.5/(128/4) = 0.390625
12.5/(120/4) = 0.416
12.5/(100/4) = 0.5 << Package Chip has chosen
12.5/(80/4) = 0.625 (0.65mm) << Might just manage 64io ?
12.5/(64/4) = 0.78125 (0.8mm) << 64 pin 0.8mm seems common Asian option
12.5/(52/4) = 0.9615

and there is also Body Size 14mm x 20mm, Pitch 0.65mm, and also a 128 pin 0.8mm
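The pitch figures above can be reproduced with a few lines. This is a sketch of the same estimate, assuming (as the figures do) that about 12.5mm of each side of the 14x14 body is available for leads and that the pins are split evenly across four sides. The names are made up for illustration.

```python
SIDE_SPAN_MM = 12.5  # usable lead span per side assumed in the figures above
PIN_COUNTS = (128, 120, 100, 80, 64, 52)  # pin counts listed for the 14x14 package


def approx_pitch_mm(pin_count: int, side_span_mm: float = SIDE_SPAN_MM) -> float:
    """Approximate lead pitch for a four-sided package with evenly split pins."""
    pins_per_side = pin_count / 4
    return side_span_mm / pins_per_side


for count in PIN_COUNTS:
    print(f"{count:3d} pins -> ~{approx_pitch_mm(count):.4f} mm pitch")
```

Real packages round these to standard pitches (0.4, 0.5, 0.65, 0.8mm), so the result is only a guide to which standard pitch a pin count lands on.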

There is a bit of a trend for Asian uC vendors (eg Renesas) to offer both 0.5mm and 0.8mm packages, which must be production-flow/handling driven.

I've seen consumer PCBs with very long pads / solder thieves, 45° placed, on what I think was 0.8mm, that looked to be wave soldering handled. - some info here http://www.ami.ac.uk/courses/topics/0170_wsp/

Infineon also says this

"Wave soldering an exposed-pad QFP is not possible because the package has to be attached to the PCB by SMD glue. In QFPs with exposed pad, this is not possible if the exposed pad is intended to be soldered to the PCB. Furthermore, wave soldering is only possible if the products in QFPs are qualified for wave soldering (passing the solder heat test).
Generally wave soldering of QFPs is difficult because the leads have to be soldered at all four sides of the component. Using a 45° rotated layout is recommended to allow the solder to wet the leads more easily. Wave soldering of QFPs with lead pitches of 0.65 mm or smaller may lead to excessive bridging and therefore is not recommended."
i.e. ~0.8mm is the minimum pitch for wave soldering.

It looks like a common option is the 64-pin 0.8mm one. The question is how many I/Os that can give (~48?), and whether that is still useful.

T Chap · 2014-04-23 17:28

I will use a stainless stencil on the .5mm Prop, not wave, and there will be a percentage of bridge fixes. .65mm is very easy. I was more asking about .65mm as an option. I will be dropping 3 SO16 wide body parts and 1 SO8 with the implementation of the new Prop, so I would be happy with the large package.

Rayman · 2014-04-23 17:33

Think I'll have to get a stereo microscope if P2 comes out with 0.5mm pitch, as planned.
Need to figure out what X I need...

T Chap · 2014-04-23 17:36

So much space, yet 25 pins crammed together. I think I'd prefer a BGA 100 10x10 array at 1mm centers, although I understand the thermal needs. To get access to the TQFP pins you will need some tiny traces at the chip, fanned out as large as the .65mm part anyway.

jmg · 2014-04-23 17:47

T Chap wrote: »

I will use a stainless stencil on the .5mm Prop, not wave, and there will be a percentage of bridge fixes. .65mm is very easy. I was more asking about .65mm as an option.

If adding another package, it might be smarter to gain more than just slightly better yields on manual assembly.

The 0.8mm option seems to open up wave soldering, and I find there is an 88-pin 0.8mm offering in 20mm x 20mm.
(which may be a better combination than the 28mm 128-pin 0.8mm?)

88 pins should manage 64 I/O quite well, or the more common 80-pin part fits 0.8mm at 14mm x 20mm, but has fewer Vcc pins while still providing 64 I/O.

http://www.pcb-3d.com/tag/qfp

T Chap · 2014-04-23 17:58

Yes, 88pin at .8mm is great.

Rayman · 2014-04-23 18:47

Maybe they're thinking that using smaller pin spacing means that pins are closer to the die and therefore are better at dissipating heat...
