The New 16-Cog, 512KB, 64 analog I/O Propeller Chip

Phil Pilgrim (PhiPi) · 2014-04-24 17:54

I'm not sure I get the Smart Pin idea. They can't be very smart, or you end up replicating a whole bunch of transistors (64x!), most of which will never get used.

-Phil

RossH · 2014-04-24 18:03

Phil Pilgrim (PhiPi) wrote:

I'm not sure I get the Smart Pin idea. They can't be very smart, or you end up replicating a whole bunch of transistors (64x!), most of which will never get used.

I thought that was the point? The extra transistors cost effectively nothing (provided they don't suck too much power, as we discovered on the P2!) but it means you can use any pin for anything. Surely that's a very attractive feature for embedded hardware designers? **

Ross.

** It also makes life much easier for software designers, since any number of objects that communicate with the outside world can all be accommodated without worry or conflict, simply by assigning them different pins, and they can use any pins that are available!

Phil Pilgrim (PhiPi) · 2014-04-24 19:07

RossH wrote:

I thought that was the point? The extra transistors cost effectively nothing (provided they don't suck too much power, as we discovered on the P2!) but it means you can use any pin for anything.

We can do that now in the P1. What Smart pins imply is that you can use every pin for anything. But nobody's going to do that. Transistors don't cost nothing: there's silicon real estate to consider. And 64 smarty-pins don't come free. To me, it makes more sense to put the smarts in the functional blocks (i.e. cogs), since that's where the "smart" behavior takes place.

-Phil

Disclaimer: I'm not a chip designer and don't claim that my posts on the subject are authoritative, rather, merely inquisitive. (At least I'm honest about it.)

RossH · 2014-04-24 19:14

Phil Pilgrim (PhiPi) wrote:

We can do that now in the P1. What Smart pins imply is that you can use every pin for anything. But nobody's going to do that. Transistors don't cost nothing: there's silicon real estate to consider. And 64 smarty-pins don't come free. To me, it makes more sense to put the smarts in the functional blocks (i.e. cogs), since that's where the "smart" behavior takes place.

I don't think the cost in real-estate is all that significant. A few thousand extra transistors is nothing compared to the final total the P2 ended up with. However, I am also not a chip designer, so I can't be sure - but I'll bet Chip is!

Ross.

evanh · 2014-04-24 19:36

I suspect there is significant room around the pad ring that is often ignored in layout. Beau has spent most of his time working in this area, if I'm not mistaken.

Either way, I doubt silicon space has been the issue for quite some time. The removal of the global DAC bus freed a large amount of inner space. Saying yes to freeing this space for more transistors was probably the forum's biggest contribution to the Prop2 running even hotter. Without that space, Chip couldn't have added such hefty support for Hubexec, not to mention doubling the size of HubRAM from 128 kB to 256 kB.

Tubular · 2014-04-24 19:54

Phil Pilgrim (PhiPi) wrote:

But nobody's going to do that.

Oh yes they are.

I think it's important to keep it in perspective. 16 cogs is something like 32k flops. 64 flops per pin, times 64 pins, is only 4k flops, but buys a heap of flexibility that you just can't get elsewhere for a reasonable price.
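Tubular's comparison as arithmetic, using only the figures in his post and reading "32k" as 32 768 (with 32 000 the ratio comes to 12.8%):

```latex
\begin{aligned}
64\ \tfrac{\text{flops}}{\text{pin}} \times 64\ \text{pins} &= 4096 \approx 4\text{k flops} \\
\frac{4096}{32768} &= 0.125 = 12.5\%
\end{aligned}
```

On these estimates, the pin smarts would add about an eighth of the flops already spent on the 16 cogs.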

That space between the pins is probably harder to utilise for other things. I imagine you have a big metal landing pad to receive the gold wire, then you need a similarly big gap to the next landing pad. Inside the pads you'd have some busy interconnect bus. So the space in between the pads is probably relatively "cheap" for putting pin smarts into. I'm not a chip designer either, but this seems to make sense.

Also, the smarts at the pins mean we're wasting nowhere near as much interconnect bus space. Now *that's* a much worse waste of space. Remember how much space was freed up when that direct DAC routing was replaced? evanh beat me to it.

Lawson · 2014-04-24 20:01

Phil Pilgrim (PhiPi) wrote:

We can do that now in the P1. What Smart pins imply is that you can use every pin for anything. But nobody's going to do that. Transistors don't cost nothing: there's silicon real estate to consider. And 64 smarty-pins don't come free. To me, it makes more sense to put the smarts in the functional blocks (i.e. cogs), since that's where the "smart" behavior takes place.

-Phil

Disclaimer: I'm not a chip designer and don't claim that my posts on the subject are authoritative, rather, merely inquisitive. (At least I'm honest about it.)

I think it depends on how much routing can be saved by the smart pins. Remember, each pin has a DAC, ADC modulator, and a ton of modes. Routing all that to any one of 16 cogs isn't free. In this case, adding serial configuration and a counter right at the pin that re-use the IN, OUT, and DIR busses has a good chance of using the same or less area, while making the pins *really* smart.

Marty

evanh · 2014-04-24 20:16

Lawson wrote:

... Routing all that to any one of 16 cogs isn't free. In this case, adding serial configuration and a counter right at the pin that re-use the IN, OUT, and DIR busses has a good chance of using the same or less area, while making the pins *really* smart.

Well, there is an argument in favour of putting more inside the Cogs on that front. Every Cog already has a basic binary view of the pins. Smarts can be built around that inside the Cogs without adding long routes.

For each register that exists at the pins, however, there is a desire to have non-stalling access to it. The only way to achieve this generically is to have a gaggle of wide buses, one for each Cog, circumnavigating the whole IC just like the global DAC bus did. This would be very space costly and I'm quite certain Chip has no intention of revisiting it.

So, if the counters do end up as part of the Smart-pins then we will be sacrificing programmatic speed to them.

Which, in turn, means we are beholden to what the hardware is designed to do and very little more.

EDIT: What Chip has done with the DACs is another option but it does have the side effect of partitioning specific Cogs to specific pins.

Cluso99 · 2014-04-24 20:31

There are already some smarts in the pins. They have been there since the start of the P2 design. Comms go only over the existing Input/Output/Direction pins, so no additional buses are required.
So, it is just a matter of adding some additional smarts to do extra things. If there is space, nothing is lost, because if they are not used, they are not being clocked and no power is being wasted. It's another of Chip's WIN-WIN features.

evanh · 2014-04-24 20:56

Cluso99 wrote:

There are already some smarts in the pins. They have been there since the start of the P2 design. Comms go only over the existing Input/Output/Direction pins, so no additional buses are required.
So, it is just a matter of adding some additional smarts to do extra things. If there is space, nothing is lost, because if they are not used, they are not being clocked and no power is being wasted. It's another of Chip's WIN-WIN features.

That's fine for config. However, if you want to programmatically stream samples to the DACs or to/from some new counters, say for an ADC function, then the speed of loading and reading those registers becomes a concern.

Adding more to the existing Cog counters might be a better option.

Ah, oops, a correction to my previous statement about the only way: what Chip has done with the DACs is an alternative, i.e., associate a group of I/O with a particular core. So, to have fast access to certain smarts, you would be forced to put the software in the matching Cog. This is probably okay in most cases. I wouldn't say no. It still leaves bit-bashing and the config bus for pin access from the other Cogs.

cgracey · 2014-04-24 21:09

JonnyMac wrote:

I've been very busy coding several P1 projects so I haven't visited this forum.

For the most part, I'm thrilled with the Propeller as it is, and will heartily welcome more cogs and memory. What I'd really love to have is the ability to do set-and-forget PWM control using a counter: control that allows me to maintain duty cycle and frequency without having to reload the phsx register in a loop. This feature does in fact exist in the SX48, and I used it for motor control in a product built by Camera Turret Company (www.cameraturret.com). With the demise of the SX, I encouraged Lou to embrace the Propeller and now all of his products use it. What's frustrating, though, is that we have to use a cog to run motors with a specific duty cycle and frequency.

In my idyllic world, there would be two frqx registers for each counter: frx1 and frx2. The frx1 register would be used in the modes we know and use today. In set-and-forget mode, frx1 would hold the "on" ticks for a pin, and the frx2 register would hold the "off" ticks for a pin (of course, those would be reversed for a differential pin).

It's a small request, Chip... what do you say?

Better than a counter to set and forget, the actual pins will have smarts to support PWM, all on their own. I'm thinking that maybe I can have the counters perform the messaging, augmented with a shift-out function. And a shift-in function to receive messages back from pins. That would lighten the cog circuitry a little, if the CTRs could do messaging to/from pins.

EDIT: I just remembered that it may be necessary to configure multiple pins at once, so I'll keep the current arrangement, as is.

I was looking at the CTRs the other night and there are many modes that no longer make sense, like the feedback modes, because those are now handled within the pins, themselves. Besides, there will be a few-clock delay when reading and writing pins, so instantaneous feedback from the CTRs doesn't even work as you'd expect, anymore.
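JonnyMac's set-and-forget request, which Chip now plans to cover with PWM in the pins themselves, amounts to a counter that alternates between two preloaded tick counts. Below is a minimal Python model. It assumes the "on" phase comes first and that both counts are at least 1. The parameter names stand in for the proposed frx1/frx2 registers and are hypothetical; they are not a Propeller API.

```python
def pwm_waveform(on_ticks: int, off_ticks: int, total_ticks: int) -> list[int]:
    """Pin level per clock for a hypothetical set-and-forget PWM counter.

    on_ticks and off_ticks stand in for the proposed frx1 and frx2 values.
    Once loaded they are reused automatically, so no software loop has to
    reload PHS every period.
    """
    if on_ticks < 1 or off_ticks < 1:
        raise ValueError("on_ticks and off_ticks must both be at least 1")
    level = 1                 # start in the "on" phase
    remaining = on_ticks      # ticks left in the current phase
    out = []
    for _ in range(total_ticks):
        out.append(level)
        remaining -= 1
        if remaining == 0:    # phase finished: toggle and reload the other count
            level ^= 1
            remaining = on_ticks if level else off_ticks
    return out


def differential(levels: list[int]) -> list[int]:
    """The complementary pin of a differential pair is simply the inverse."""
    return [1 - v for v in levels]


print(pwm_waveform(3, 2, 10))  # [1, 1, 1, 0, 0, 1, 1, 1, 0, 0]
```

The duty cycle is on_ticks / (on_ticks + off_ticks) and the period is on_ticks + off_ticks clocks, so 3 and 2 give 60% at one fifth of the clock rate.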

evanh · 2014-04-24 21:22

cgracey wrote:

I'm thinking that maybe I can have the counters perform the messaging, augmented with a shift-out function. And a shift-in function to receive messages back from pins. That would lighten the cog circuitry a little, if the CTRs could do messaging to/from pins.

Are you talking about over the config bus? There is a common shared serial bus now, right?

jmg · 2014-04-24 21:27

cgracey wrote:

I was looking at the CTRs the other night and there are many modes that no longer make sense, like the feedback modes, because those are now handled within the pins, themselves. Besides, there will be a few-clock delay when reading and writing pins, so instantaneous feedback from the CTRs doesn't even work as you'd expect, anymore.

Could this be split into two locations, so the CTR config keeps a subset in common with P1, but the D-FF needed for ADC is done at the pin?

It will make some CTR modes redundant, but it will make code easier to port, as a CTR set up for ADC today would just need a pin setting to enable the D-FF and feed its output to the CTRs as a CE.
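For reference, here is the kind of loop jmg is describing, as an idealized first-order sigma-delta model: a D flip-flop samples the input each clock, its output enables a counter, and the same bit is fed back to close the loop. This is a sketch, not the pin circuit. The analog RC network is replaced by an exact integrator, the input is normalized to the range 0..1, and the function name is hypothetical.

```python
def sigma_delta_adc(x: float, clocks: int) -> int:
    """Idealized first-order sigma-delta ADC.

    x is the input normalized to [0, 1]. Each clock the D flip-flop samples
    the comparator (integrator >= 0); its output q is the count enable for
    the counter and is also fed back, so the integrator accumulates x - q.
    Because the integrator stays bounded, count tracks x * clocks to within 1.
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must be in [0, 1]")
    integrator = 0.0
    count = 0
    for _ in range(clocks):
        q = 1 if integrator >= 0.0 else 0   # D flip-flop at the pin
        count += q                          # q used as CE for the counter
        integrator += x - q                 # feedback (the inverted pin output in hardware)
    return count


print(sigma_delta_adc(0.25, 1000))  # 250
```

The ADC reading is count / clocks; the longer the window, the finer the resolution.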

jmg · 2014-04-24 21:32

cgracey wrote:

Better than a counter to set and forget, the actual pins will have smarts to support PWM, all on their own. I'm thinking that maybe I can have the counters perform the messaging, augmented with a shift-out function. And a shift-in function to receive messages back from pins. That would lighten the cog circuitry a little, if the CTRs could do messaging to/from pins.

I'm not quite following this. There are message opcodes now, which include the shift, and if you change to overlay that shift HW with local counter HW, then
a) can they still meet MHz with the extra modes?
b) does that mean any local CTR use is sacrificed when pin cells are used?

There are requests to expand the config of COG counters a little, so as to not use a pin for simpler timing tasks that an adder cannot quite do. There are spare config bits to support this.

pedward · 2014-04-24 21:49

cgracey wrote:

Better than a counter to set and forget, the actual pins will have smarts to support PWM, all on their own. I'm thinking that maybe I can have the counters perform the messaging, augmented with a shift-out function. And a shift-in function to receive messages back from pins. That would lighten the cog circuitry a little, if the CTRs could do messaging to/from pins.

EDIT: I just remembered that it may be necessary to configure multiple pins at once, so I'll keep the current arrangement, as is.

I was looking at the CTRs the other night and there are many modes that no longer make sense, like the feedback modes, because those are now handled within the pins, themselves. Besides, there will be a few-clock delay when reading and writing pins, so instantaneous feedback from the CTRs doesn't even work as you'd expect, anymore.

What about implementing comparators and op-amps with the feedback modes? Am I just off base?

jmg · 2014-04-24 21:51

pedward wrote:

What about implementing comparators and op-amps with the feedback modes? Am I just off base?

IIRC there is already a fast comparator in the pin, for handling ADC modes.

jmg · 2014-04-24 23:11

Seairth wrote:

Yeah, it might be possible to re-utilize the smart pins. Maybe. But the timers I speak of are internal only. While there is similarity to the CTRx and smart pins, they are not (necessarily) meant to drive pins. It seems like a misuse of the smart pins, but I'll happily reserve judgement until I see what Chip comes up with.

Looking some more into extending the modes of the COG timer a little, to cover your timer cases, it seems this can be done at low Logic cost, and any speed impact looks ok.

Two config flags expand this to NCO OR Saturate OR Reload modes. Control registers are the same as now.

Default mode is NCO, the same adder-mode as P1 uses now, so it is backward compatible.

Saturate mode acts as a monostable => SW loads and can be polled for Zero, where it stays after counting down.

Reload mode Reloads the Set Value on =Zero to repeat the fixed-divide count, and can set a flag on Reload.
(Possibly that flag could map to a virtual pin, or be read somehow by the new JP/JNP opcodes.)

The logic needed is a wide OR-MUX for the (AddV : -1) select and a Zero detect with a Reload/Saturate option, plus some unused bits in CNTx.

Lattice tools report that this runs at ~250.376MHz in their equivalent of a Cyclone IV and needs 148 LUT4s.
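jmg's three modes can be modelled behaviourally in a few lines. This is a sketch, not his Lattice design. The class and method names are hypothetical, PHS is treated as a 32-bit register, and using FRQ as the reload value in Reload mode is an assumption the post does not state.

```python
MASK32 = 0xFFFF_FFFF


class CogTimer:
    """Hypothetical model of the three timer modes proposed above.

    nco      - P1-style adder: PHS += FRQ every clock; output is PHS[31].
    saturate - PHS counts down by 1 and sticks at zero (a monostable);
               output is 1 once it has expired.
    reload   - PHS counts down by 1; on reaching zero it reloads from FRQ
               (assumed) and the output/flag is high for that one clock.
    """

    def __init__(self, mode: str, frq: int, phs: int = 0) -> None:
        if mode not in ("nco", "saturate", "reload"):
            raise ValueError(f"unknown mode: {mode!r}")
        self.mode = mode
        self.frq = frq & MASK32
        self.phs = phs & MASK32

    def tick(self) -> int:
        """Advance one clock and return the mode's output bit."""
        if self.mode == "nco":
            self.phs = (self.phs + self.frq) & MASK32   # the AddV side of the mux
            return self.phs >> 31
        if self.phs != 0:
            self.phs -= 1                 # the -1 side of the (AddV : -1) mux
        if self.phs == 0:                 # the zero detect
            if self.mode == "reload":
                self.phs = self.frq       # assumption: FRQ holds the reload value
            return 1                      # reload: one-clock flag; saturate: expired
        return 0


divider = CogTimer("reload", frq=4, phs=4)
print([divider.tick() for _ in range(8)])  # [0, 0, 0, 1, 0, 0, 0, 1]
```

In Reload mode the flag fires every FRQ clocks, which is the fixed-divide behaviour described; Saturate mode is the poll-for-zero monostable.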

Phil Pilgrim (PhiPi) · 2014-04-24 23:41

cgracey wrote:

I was looking at the CTRs the other night and there are many modes that no longer make sense, like the feedback modes,

I use feedback mode a lot for other than direct sigma-delta ADC. The counter feedback output is used in two other counters with local oscillator I and Q signals in the XOR logic mode to perform RF mixing. The FRQx and PHSx registers of the feedback counter are not used, however -- just the inverted feedback signal from the input pin, as if the feedback counter (with a 10M resistor path back to the input pin) were some sort of high-gain digital op amp.

Can this capability be replicated somehow under the new regime? If not, a lot of potential high-frequency signal-processing apps will have to find another home.

-Phil
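The mixing step Phil describes (a 1-bit stream XORed with I and Q local-oscillator signals in two counters) can be modelled by counting XOR-high clocks. To keep the sketch exact, the input is a tone already hard-limited to one bit rather than the sigma-delta feedback stream; the XOR-and-count operation is the same whatever produces the bits. The LO period, window length, and all names below are hypothetical.

```python
import math

PERIOD = 16    # LO period in clocks (hypothetical)
CYCLES = 100   # number of LO periods to integrate over


def square_lo(offset: int, n: int) -> list[int]:
    """Square-wave LO: high for half of each PERIOD, shifted by `offset` clocks."""
    return [1 if (k + offset) % PERIOD < PERIOD // 2 else 0 for k in range(n)]


def xor_mix(bits: list[int], lo_i: list[int], lo_q: list[int]) -> tuple[int, int]:
    """Model two counters in XOR mode, each adding 1 whenever input XOR LO is high.

    Mapping bits to +/-1, a count c over n clocks corresponds to a correlation
    of n - 2*c, which is returned for the I and Q channels.
    """
    n = len(bits)
    count_i = sum(b ^ lo for b, lo in zip(bits, lo_i))
    count_q = sum(b ^ lo for b, lo in zip(bits, lo_q))
    return n - 2 * count_i, n - 2 * count_q


def mix_tone(phase: float) -> tuple[int, int]:
    """Mix a hard-limited tone at the LO frequency with the given phase (radians)."""
    n = PERIOD * CYCLES
    # Hypothetical 1-bit input: the sign of a cosine sampled mid-clock.
    bits = [1 if math.cos(2 * math.pi * (k + 0.5) / PERIOD + phase) >= 0 else 0
            for k in range(n)]
    lo_i = square_lo(PERIOD // 4, n)   # in phase with the cosine
    lo_q = square_lo(0, n)             # a quarter period later
    return xor_mix(bits, lo_i, lo_q)


print(mix_tone(0.0))  # (1600, 0)
```

The phase can then be estimated from atan2(Q, I). With square-wave inputs and LOs the correlation is a triangle rather than a cosine function of phase, so that estimate is exact only at multiples of 45 degrees.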

jmg · 2014-04-24 23:54

Phil Pilgrim (PhiPi) wrote:

I use feedback mode a lot for other than direct sigma-delta ADC. The counter feedback output is used in two other counters with local oscillator I and Q signals in the XOR logic mode to perform RF mixing. The FRQx and PHSx registers of the feedback counter are not used, however -- just the inverted feedback signal from the input pin, as if the feedback counter (with a 10M resistor path back to the input pin) were some sort of high-gain digital op amp.

Can this capability be replicated somehow under the new regime? If not, a lot of potential high-frequency signal-processing apps will have to find another home.

Can you give that in eqn or circuit form ?
AFAIK, the pin cells will be able to pair pins in quadrature counting modes, so I think that makes the XOR piece relatively simple, but I'm not sure about the other logic you need.
I guess pipeline delays do not matter in RF apps?
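jmg's point is that a paired-pin quadrature mode already contains XOR logic. As a hedged illustration of why, here is the common XOR formulation of a quadrature decoder. It is a generic technique, not a description of the planned pin-cell hardware, and the function name is hypothetical.

```python
def quadrature_count(samples: list[tuple[int, int]]) -> int:
    """Accumulate a position count from successive (A, B) encoder samples.

    For a single Gray-code step, (A_prev XOR B) - (B_prev XOR A) is +1 in one
    direction and -1 in the other. It is 0 when nothing changed and also for
    an illegal double transition (both bits flipping at once).
    """
    if not samples:
        return 0
    position = 0
    a_prev, b_prev = samples[0]
    for a, b in samples[1:]:
        position += (a_prev ^ b) - (b_prev ^ a)
        a_prev, b_prev = a, b
    return position


forward = [(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]
print(quadrature_count(forward))  # 4
```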

Phil Pilgrim (PhiPi) · 2014-04-25 00:11

jmg wrote:

Can you give that in eqn or circuit form ?

See this post.

I guess pipeline delays do not matter in RF apps

That would depend upon where the pipeline delays occur. If the delay is between the counter input and the inverted output, that would definitely be a problem. If the delay is between the inverted output and the input to the mixer counter, not such a big deal.

But, truth be known, losing the feedback modes may not be the only stumbling block here. The P1 counters' PLLs, jittery though they may be, have still been a vital part of my RF designs.

-Phil

Cluso99 · 2014-04-25 01:44

Phil Pilgrim (PhiPi) wrote:

See this post.

That would depend upon where the pipeline delays occur. If the delay is between the counter input and the inverted output, that would definitely be a problem. If the delay is between the inverted output and the input to the mixer counter, not such a big deal.

But, truth be known, losing the feedback modes may not be the only stumbling block here. The P1 counters' PLLs, jittery though they may be, have still been a vital part of my RF designs.

-Phil

How many PLLs do you use? i.e., could one common shared PLL (like some of the maths Chip is proposing) work?

BTW I loved what you were able to do with the P1, so it would be a shame to lose that ability.

jmg · 2014-04-25 03:11

Phil Pilgrim (PhiPi) wrote:

But, truth be known, losing the feedback modes may not be the only stumbling block here. The P1 counters' PLLs, jittery though they may be, have still been a vital part of my RF designs.

Yes, I think feedback modes can be moved to a pin cell relatively easily (FF + XORs), but the lack of an analog PLL is more of a killer.
Perhaps a design can be made to work with an external PLL like a LV4046 ?

evanh · 2014-04-25 04:09

This is interesting. The Prop1 leakage current is pretty small: the attached page from the Prop datasheet shows it at 3.5 uA, and shows that above a 100 kHz clock rate the current is pretty much linearly proportional to the clock rate. That puts the 20 kHz RC oscillator in a prime position for getting some MIPS at minimal power.
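Taking evanh's reading of the datasheet at face value (current roughly linear in clock rate, which he states only for rates above 100 kHz), the trade-off can be written out. Here I_leak is the static leakage (3.5 uA in the quoted figure), k is an unspecified slope in amps per hertz, V is the supply voltage, and f is the clock rate:

```latex
\begin{aligned}
I(f) &\approx I_{\text{leak}} + k f \\
P(f) &= V\,I(f) \approx V\,(I_{\text{leak}} + k f) \\
E_{\text{cycle}}(f) &= \frac{P(f)}{f} \approx V\left(\frac{I_{\text{leak}}}{f} + k\right)
\end{aligned}
```

The leakage share of each cycle's energy is I_leak / (I_leak + k f), so leakage only dominates once the clock drops to around f = I_leak / k. The smaller I_leak is, the lower that crossover, which is why a tiny leakage figure makes a slow clock attractive.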

Has Chip said anything on static leakage for either of the Prop2 designs?

EDIT: I just had a look at an AVR32 datasheet with 3.3 volt / 1.8 volt supplies and it looks to be around 2 mA leakage with everything powered up.

Tubular · 2014-04-25 06:52

Chip's indicated roughly 1mA for this design. I guess it will depend a bit on total die area.

Prop1 gets down to around 1uA at 1.02v or so, running on RCSLOW at a couple of kHz.

rabaggett · 2014-04-25 15:40

Phil Pilgrim (PhiPi) wrote:

I use feedback mode a lot for other than direct sigma-delta ADC. The counter feedback output is used in two other counters with local oscillator I and Q signals in the XOR logic mode to perform RF mixing. The FRQx and PHSx registers of the feedback counter are not used, however -- just the inverted feedback signal from the input pin, as if the feedback counter (with a 10M resistor path back to the input pin) were some sort of high-gain digital op amp.

Can this capability be replicated somehow under the new regime? If not, a lot of potential high-frequency signal-processing apps will have to find another home.

-Phil

AMEN. I too, use the counters for lots of stuff that has nothing to do with counting, exactly because of these logic modes, and the ability to affect other pins with them.

I like the possibility of using the cog counter to 'gate' the pin pwm (Some kind of logic mode?) as Chip has indicated elsewhere. If we also had some logic modes for output, after the pin 'smarts' and ways to affect adjacent pins, you'd have some of the functionality of Microchip's CLC... Super cool for applications like these.

jmg · 2014-04-25 16:27

Phil Pilgrim (PhiPi) wrote:

But, truth be known, losing the feedback modes may not be the only stumbling block here. The P1 counters' PLLs, jittery though they may be, have still been a vital part of my RF designs.

Thinking some more about this, I think the PLLs are gone from the cogs, and they are certainly not in the pin cells, but there is one higher-frequency VCO/PLL used for SysCLK, so maybe it would be possible to include a second VCO/PLL?
The challenge will be managing the clock-domain crossing, but this could feed a /N divider and go directly to a pin (or pins).
One choice in the /N -> pins could include I-Q outputs on 2 pins, supporting external demodulators.
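One common way to get the I-Q pair jmg mentions from a divider is to follow a /M prescaler with a two-flop Johnson (twisted-ring) counter, which divides by 4 and yields two square waves a quarter period apart. The prescaler structure and all names below are assumptions for illustration, not part of any Propeller design.

```python
def quadrature_divider(clocks: int, prescale: int = 1) -> tuple[list[int], list[int]]:
    """Return per-clock I and Q levels from a /prescale stage feeding a
    two-flop Johnson counter; both outputs run at f_clk / (4 * prescale)."""
    if prescale < 1:
        raise ValueError("prescale must be at least 1")
    q0 = q1 = 0        # Johnson counter flops, (q0, q1): 00 -> 10 -> 11 -> 01 -> 00
    ticks = 0          # hypothetical /prescale counter
    i_out, q_out = [], []
    for _ in range(clocks):
        i_out.append(q0)
        q_out.append(q1)
        ticks += 1
        if ticks == prescale:
            ticks = 0
            q0, q1 = 1 - q1, q0    # shift, feeding back the inverted last stage
    return i_out, q_out


print(quadrature_divider(8))  # ([0, 1, 1, 0, 0, 1, 1, 0], [0, 0, 1, 1, 0, 0, 1, 1])
```

Q lags I by exactly one quarter of the output period for any prescale value, which is what an external I-Q demodulator needs.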

cgracey · 2014-04-25 23:04

A question for you all:

Would 16 bits of accuracy for X, Y, and angle be adequate for the CORDIC system?

Here is why I'm thinking about 16 bits and no more:

1) A 16-bit, 20-stage pipelined CORDIC w/K correction would take ~1320 flops, while a comparable 32-bit one would take almost 4x as much.
2) Our fast cog multipliers are 16x16 bits.
3) Our DACs will output 16-bit dithered samples.
4) Our ADCs are practically below 16-bit resolution.
5) Two 16-bit values, (sin,cos) or (ro,theta) can fit into a long.

Is there any real need for more than 16 bits of accuracy?
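Chip's figures assume a pipelined CORDIC with gain (K) correction. The sketch below is a software model of that arithmetic in rotation mode, not of the pipeline: 20 iterations (matching his 20 stages) on 16-bit inputs. The 8 guard bits, the 24-bit internal angle, and the binary angle convention (65536 = one full turn) are assumptions for illustration. Since the number of stages and the width of each stage's x, y, and z registers both grow with the target precision, the flop count grows roughly with the square of the bit width, which is consistent with his estimate of almost 4x for 32 bits.

```python
import math

ITERATIONS = 20   # one per pipeline stage in Chip's estimate
GUARD = 8         # extra fraction bits carried through the stages (assumption)
Z_BITS = 24       # internal angle resolution: 2**24 units per full turn (assumption)
FULL = 1 << Z_BITS

ATAN_TABLE = [round(math.atan(2.0 ** -i) * FULL / (2 * math.pi)) for i in range(ITERATIONS)]
K_Q30 = round((1 << 30) * math.prod(1 / math.sqrt(1 + 4.0 ** -i) for i in range(ITERATIONS)))


def cordic_rotate(x: int, y: int, angle16: int) -> tuple[int, int]:
    """Rotate the 16-bit vector (x, y) by angle16, where 65536 = one full turn."""
    x <<= GUARD
    y <<= GUARD
    z = (angle16 & 0xFFFF) << (Z_BITS - 16)
    # Fold the angle into [-90, +90] degrees, which the iterations can cover.
    if FULL // 4 < z < 3 * FULL // 4:
        x, y, z = -x, -y, z - FULL // 2
    elif z >= 3 * FULL // 4:
        z -= FULL
    for i in range(ITERATIONS):
        if z >= 0:   # rotate counter-clockwise by atan(2**-i)
            x, y, z = x - (y >> i), y + (x >> i), z - ATAN_TABLE[i]
        else:        # rotate clockwise
            x, y, z = x + (y >> i), y - (x >> i), z + ATAN_TABLE[i]
    # Undo the CORDIC gain (~1.6468) with one multiply by K, then drop the guard bits.
    shift = 30 + GUARD
    half = 1 << (shift - 1)
    return (x * K_Q30 + half) >> shift, (y * K_Q30 + half) >> shift


def pack_long(hi: int, lo: int) -> int:
    """Pack two signed 16-bit results into one 32-bit long (Chip's point 5)."""
    return ((hi & 0xFFFF) << 16) | (lo & 0xFFFF)


# cos and sin of 45 degrees at a radius of 32000 (kept below full scale so
# rounding cannot overflow a signed 16-bit result).
cos45, sin45 = cordic_rotate(32000, 0, 8192)
```

Python integers do not overflow, so the intermediate growth to about 1.65x the input is invisible here; a hardware pipeline needs headroom bits for it.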

RossH · 2014-04-25 23:21

cgracey wrote:

A question for you all:

Would 16 bits of accuracy for X, Y, and angle be adequate for the CORDIC system?

Here is why I'm thinking about 16 bits and no more:

1) A 16-bit, 20-stage pipelined CORDIC w/K correction would take ~1320 flops, while a comparable 32-bit one would take almost 4x as much.
2) Our fast cog multipliers are 16x16 bits.
3) Our DACs will output 16-bit dithered samples.
4) Our ADCs are practically below 16-bit resolution.
5) Two 16-bit values, (sin,cos) or (ro,theta) can fit into a long.

Is there any real need for more than 16 bits of accuracy?

Hi Chip - I'm not really up on cordic - can you use multiple 16 bit cordic operations to achieve higher accuracy? If the answer is yes, then I'd say we don't need any higher resolution in a single operation.

Ross.

cgracey · 2014-04-25 23:25

RossH wrote:

Hi Chip - I'm not really up on cordic - can you use multiple 16 bit cordic operations to achieve higher accuracy? If the answer is yes, then I'd say we don't need any higher resolution in a single operation.

Ross.

It's a one-shot computation that does any of the following:

rotate x,y around 0,0
convert ro,theta to x,y (polar to Cartesian)
convert x,y to ro,theta (Cartesian to polar)

I'm thinking 20 bits would certainly be sufficient, but we'd need to up our multipliers to 20x20 bits and make a new SCL 'scale' instruction to keep things within longs.
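The third operation Chip lists, Cartesian to polar, is CORDIC's vectoring mode: instead of driving the residual angle to zero, the iterations drive y to zero and accumulate the angle rotated through. Below is a software sketch with 20 iterations, 8 guard bits, a 24-bit internal angle, and 65536 = one full turn. All of these are illustrative assumptions, not details from Chip's post.

```python
import math

ITERATIONS = 20
GUARD = 8          # extra fraction bits (assumption)
Z_BITS = 24        # internal angle resolution (assumption)
FULL = 1 << Z_BITS

ATAN_TABLE = [round(math.atan(2.0 ** -i) * FULL / (2 * math.pi)) for i in range(ITERATIONS)]
K_Q30 = round((1 << 30) * math.prod(1 / math.sqrt(1 + 4.0 ** -i) for i in range(ITERATIONS)))


def cordic_vector(x: int, y: int) -> tuple[int, int]:
    """Cartesian to polar: return (magnitude, angle16), with 65536 = one full turn."""
    if x == 0 and y == 0:
        return 0, 0                      # angle is undefined; report 0
    x <<= GUARD
    y <<= GUARD
    z = 0
    if x < 0:                            # fold into the right half-plane
        x, y, z = -x, -y, FULL // 2
    for i in range(ITERATIONS):
        if y > 0:    # rotate clockwise toward the x axis, accumulating +angle
            x, y, z = x + (y >> i), y - (x >> i), z + ATAN_TABLE[i]
        else:        # rotate counter-clockwise, accumulating -angle
            x, y, z = x - (y >> i), y + (x >> i), z - ATAN_TABLE[i]
    shift = 30 + GUARD
    magnitude = (x * K_Q30 + (1 << (shift - 1))) >> shift   # remove CORDIC gain
    half_unit = 1 << (Z_BITS - 17)                           # for rounding to 16 bits
    angle16 = ((z + half_unit) >> (Z_BITS - 16)) & 0xFFFF
    return magnitude, angle16


print(cordic_vector(30000, 0))  # (30000, 0)
```

Both results fit in 16 bits, so, as with the rotation case, they could share one long.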

Cluso99 · 2014-04-25 23:43

Sorry Chip, I have no idea.
