What's stalling the P2 and P8X64A?

pedward · 2016-01-03 07:26

In a lot of ways, the P2 couldn't have come about until now. Eight years ago the toolset wasn't widely available, the mindset hadn't shifted to open-source-everything development, the features people wanted were different, and it wasn't yet possible for so many people to collaborate in an open manner to produce this chip.

cgracey · 2016-01-03 07:45

Bill Henning wrote: »

Drooling!

With smart pins, I'll dust off my DE2-115

Any idea how many cogs & smart pins will find Nano and DE2?

cgracey wrote: »

Within a few days, I will have a new FPGA image posted here which will feature the new smart pins, which can do UART, SPI, ADC, DAC, PWM, and many measurement functions. This will unburden the cogs quite a bit.

After we've refined the smart pins and done any final tuning to the rest of the design, we will proceed to make the chip.

It's true that things like Arduinos do impressive things for cheap, but they are not often useful for strict real-time control. This is what the Prop2 is all about. You code exactly what it does. It's quite a simple and direct system, in that sense.

On the DE0-Nano, I'm sure we can get two to four smart pins, at least.

I'm redoing the separate async TX and RX state machines into one. I'll do the same for the SPI shifter. The smart pin needs to lighten up a bit.

jmg · 2016-01-03 19:37

Bill Henning wrote: »

I think I2C would best be handled with an interrupt on the clock (due to clock stretching) -

You would also want to sense START (and maybe STOP), which could be 2 or 3 interrupts.
I guess one INT on SCL =\_ and one INT on SDA Change may work, but the SDA one would fire during data & need more SW overhead.

A simple pin-mode that properly sensed START and STOP could reduce interrupt thrashing.

Some MCUs manage byte shift using USI hardware, (but ACK.START.STOP are SW) and that seems minimal HW cost ?
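
To make the START/STOP point concrete, here is a minimal sketch in Python (not P2 code) of the condition such a pin mode would have to sense. It relies on the standard I2C rule that START is SDA falling while SCL is high and STOP is SDA rising while SCL is high. The function name `find_start_stop` and the sampled `trace` are made up for illustration.

```python
from typing import Iterable, List, Tuple


def find_start_stop(samples: Iterable[Tuple[int, int]]) -> List[Tuple[int, str]]:
    """Return (sample_index, "START" | "STOP") events from (scl, sda) samples."""
    events = []
    prev = None
    for i, (scl, sda) in enumerate(samples):
        if prev is not None:
            prev_scl, prev_sda = prev
            # Only SDA edges seen while SCL stays high are bus conditions.
            # SDA edges while SCL is low are ordinary data changes.
            if prev_scl == 1 and scl == 1 and prev_sda != sda:
                events.append((i, "START" if sda == 0 else "STOP"))
        prev = (scl, sda)
    return events


# Hypothetical trace: idle bus, START, SDA changing while SCL is low, then STOP.
trace = [(1, 1), (1, 0), (0, 0), (0, 1), (1, 1), (0, 1), (0, 0), (1, 0), (1, 1)]
print(find_start_stop(trace))
```

An interrupt on every SDA change would fire on the data transitions in the middle of this trace as well, which is the extra software overhead mentioned above. Gating the SDA edge with SCL high leaves only the two bus conditions.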

jmg · 2016-01-03 19:49

cgracey wrote: »

I'm redoing the separate async TX and RX state machines into one. I'll do the same for the SPI shifter. The smart pin needs to lighten up a bit.

Or, you could do a Pin-cell per pair of pins, and almost halve the logic consumed ?

Probably worth doing a spread sheet that covers the use cases :

eg
* Pin-limited Duplex UARTS = 32 channels
* Pin-limited Quad Counters = 32 channels
* Pin-limited multi-SPI = 16 channels
* Pin-limited i2c = 32 channels
* Pin-limited CAN = 32 channels
* Pin-limited LIN = 32 channels
* Pin-limited USB = 16 channels ? (COG limited?)

Then, you have asymmetric cases like :
More Rx or Tx - how many users want 63 Rx, 1Tx ?
Servo PWM - Maybe someone needs 64 chans, but with 2 PWMs per cell, that is managed @ 32 cells ?

Frequency Counting ? - 64 chans ? Rare ?
32 channels of high performance Frequency Counting seems a pretty good ceiling ?
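
The spreadsheet idea can be sketched in a few lines of Python. This is only an illustration: the 64-pin total comes from the counts discussed in this thread, while the pins-per-channel figures are assumptions chosen to match the list above, not numbers from the post. USB is left out because the post marks its ceiling as possibly cog-limited rather than pin-limited.

```python
TOTAL_PINS = 64  # the thread discusses up to 64 smart pins

# Pins consumed by one channel of each interface (assumed for illustration).
PINS_PER_CHANNEL = {
    "Duplex UART": 2,   # TX + RX
    "Quad counter": 2,  # A + B
    "Multi-SPI": 4,     # e.g. CLK, MOSI, MISO, CS
    "I2C": 2,           # SCL + SDA
    "CAN": 2,           # TX + RX to a transceiver
    "LIN": 2,           # TX + RX to a transceiver
}


def channel_ceiling(pins_per_channel: int, total_pins: int = TOTAL_PINS) -> int:
    """Largest number of channels that fit when only pins are the limit."""
    return total_pins // pins_per_channel


ceilings = {name: channel_ceiling(p) for name, p in PINS_PER_CHANNEL.items()}
for name, count in ceilings.items():
    print(f"{name}: {count} channels")
```

With these assumed pin counts, every two-pin interface tops out at 32 channels, which is why one cell per pair of pins would cover those cases.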

cgracey · 2016-01-03 20:01

jmg wrote: »

cgracey wrote: »

I'm redoing the separate async TX and RX state machines into one. I'll do the same for the SPI shifter. The smart pin needs to lighten up a bit.

Or, you could do a Pin-cell per pair of pins, and almost halve the logic consumed ?

Probably worth doing a spread sheet that covers the use cases :

eg
* Pin-limited Duplex UARTS = 32 channels
* Pin-limited Quad Counters = 32 channels
* Pin-limited multi-SPI = 16 channels
* Pin-limited i2c = 32 channels
* Pin-limited CAN = 32 channels
* Pin-limited LIN = 32 channels
* Pin-limited USB = 16 channels ? (COG limited?)

Then, you have asymmetric cases like :
More Rx or Tx - how many users want 63 Rx, 1Tx ?
Servo PWM - Maybe someone needs 64 chans, but with 2 PWMs per cell, that is managed @ 32 cells ?

Frequency Counting ? - 64 chans ? Rare ?
32 channels of high performance Frequency Counting seems a pretty good ceiling ?

Yes, that could save a lot of logic at the expense of raw versatility. Once we get this first FPGA image out, we'll have to see how it flows. Having fewer smart cells also means a huge reduction in communication conduit, which saves lots of logic.

Phil Pilgrim (PhiPi) · 2016-01-04 00:06

In the P1, smarts (e.g. counters) were allocated per cog. But this static allocation doesn't help if one cog needs, say, four counters, and another needs none. In the P2, smarts are allocated per pin. But if the latter causes a logic explosion (who needs 64 counters or 64 SPI peripherals?), there is a middle ground: a smaller collection of globally-accessible smarts available via -new, -init, and -stop instructions. For example, to allocate a counter, a ctrnew instruction would return a handle to it, from which it could be set up and given assigned pins.

Yeah, I know, it's very late in the game to suggest something like this ...

-Phil
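
As a rough illustration of the handle idea, here is a Python model of a shared counter pool. It is a sketch only: `ctrnew` is the name used in the post, `ctrinit` and `ctrstop` merely follow the suggested -init/-stop pattern, and the `CounterPool` class, the pool size of 4, and the pin numbers are all hypothetical.

```python
class CounterPool:
    """Toy model of a small set of globally accessible counters."""

    def __init__(self, size):
        self._free = list(range(size))  # handles not yet allocated
        self._pins = {}                 # handle -> assigned pins

    def ctrnew(self):
        """Return a handle to a free counter, or None if none are left."""
        if not self._free:
            return None
        handle = self._free.pop(0)
        self._pins[handle] = ()
        return handle

    def ctrinit(self, handle, *pins):
        """Assign pins to an allocated counter."""
        if handle not in self._pins:
            raise ValueError("handle is not allocated")
        self._pins[handle] = pins

    def ctrstop(self, handle):
        """Release a counter back to the pool."""
        if handle not in self._pins:
            raise ValueError("handle is not allocated")
        del self._pins[handle]
        self._free.append(handle)


pool = CounterPool(size=4)
handles = [pool.ctrnew() for _ in range(4)]  # one cog takes all four counters
exhausted = pool.ctrnew()                    # pool is empty, so this is None
pool.ctrinit(handles[0], 8, 9)               # give the first counter two pins
pool.ctrstop(handles[0])                     # release it again
reused = pool.ctrnew()                       # the freed handle is handed out
print(handles, exhausted, reused)
```

The model shows the allocation side only. It says nothing about the cost of routing any counter to any pin, which is a separate question.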

cgracey · 2016-01-04 00:50

Phil Pilgrim (PhiPi) wrote: »

In the P1, smarts (e.g. counters) were allocated per cog. But this static allocation doesn't help if one cog needs, say, four counters, and another needs none. In the P2, smarts are allocated per pin. But if the latter causes a logic explosion (who needs 64 counters or 64 SPI peripherals?), there is a middle ground: a smaller collection of globally-accessible smarts available via -new, -init, and -stop instructions. For example, to allocate a counter, a ctrnew instruction would return a handle to it, from which it could be set up and given assigned pins.

Yeah, I know, it's very late in the game to suggest something like this ...

-Phil

There'd be a lot of mux'ing in a case like that - on both ends.

Each pin has 16 digital I/O signals, so it makes sense to stick the logic onto the pins, as mux'ing all that would be too much. We only need to mux 2 bits in and 4 bits out of each pin for communication purposes now. It's a reversal of paradigm from Prop1, but I think it will really leave cogs' bandwidth available for processing, rather than babysitting I/O operations.

jmg · 2016-01-04 01:38

Phil Pilgrim (PhiPi) wrote: »

I... But if the latter causes a logic explosion (who needs 64 counters or 64 SPI peripherals?), there is a middle ground: a smaller collection ...

Yup, that is what my Paired-cell approach suggestion gives.

Simple Pin-limits mean 64 SPI is never going to happen nor is 64 UARTS, nor 64 Quad Counters.

2 Pin interfaces have a natural ceiling of 32

However, someone may want 64 counters, and a paired-pin cell could likely give that.

Bill Henning · 2016-01-04 06:28

Sounds great.

Given that a single pin can only handle one direction at one time, it would make sense to have one state machine, which runs in either TX or RX mode.

cgracey wrote: »

On the DE0-Nano, I'm sure we can get two to four smart pins, at least.

I'm redoing the separate async TX and RX state machines into one. I'll do the same for the SPI shifter. The smart pin needs to lighten up a bit.

Bill Henning · 2016-01-04 06:30

I think jmg is on to something here.

If the dual-pin blocks could still talk to their neighbours, it would allow for a six pin QSPI / SD interface nicely.

cgracey wrote: »

jmg wrote: »

cgracey wrote: »

I'm redoing the separate async TX and RX state machines into one. I'll do the same for the SPI shifter. The smart pin needs to lighten up a bit.

Or, you could do a Pin-cell per pair of pins, and almost halve the logic consumed ?

Probably worth doing a spread sheet that covers the use cases :

eg
* Pin-limited Duplex UARTS = 32 channels
* Pin-limited Quad Counters = 32 channels
* Pin-limited multi-SPI = 16 channels
* Pin-limited i2c = 32 channels
* Pin-limited CAN = 32 channels
* Pin-limited LIN = 32 channels
* Pin-limited USB = 16 channels ? (COG limited?)

Then, you have asymmetric cases like :
More Rx or Tx - how many users want 63 Rx, 1Tx ?
Servo PWM - Maybe someone needs 64 chans, but with 2 PWMs per cell, that is managed @ 32 cells ?

Frequency Counting ? - 64 chans ? Rare ?
32 channels of high performance Frequency Counting seems a pretty good ceiling ?

Yes, that could save a lot of logic at the expense of raw versatility. Once we get this first FPGA image out, we'll have to see how it flows. Having fewer smart cells also means a huge reduction in communication conduit, which saves lots of logic.

Bill Henning · 2016-01-04 06:32

I agree, however I am not sure it is worth the cost in gates - with interrupts, a P2 could easily handle 1Mbps I2C.

For faster serial interfacing we'd still have synchronous serial, SPI, and possibly QSPI.

jmg wrote: »

Bill Henning wrote: »

I think I2C would best be handled with an interrupt on the clock (due to clock stretching) -

You would also want to sense START (and maybe STOP), which could be 2 or 3 interrupts.
I guess one INT on SCL =\_ and one INT on SDA Change may work, but the SDA one would fire during data & need more SW overhead.

A simple pin-mode that properly sensed START and STOP could reduce interrupt thrashing.

Some MCUs manage byte shift using USI hardware, (but ACK.START.STOP are SW) and that seems minimal HW cost ?

jmg · 2016-01-04 19:52

Bill Henning wrote: »

I agree, however I am not sure it is worth the cost in gates - with interrupts, a P2 could easily handle 1Mbps I2C.

What cost are you talking about ?
The gate cost of sensing Start, is very low, and is a variant on edge-detect choices likely already there.
Likewise, for byte-shift (exclude ACK.START.STOP)

If you have only a SCL interrupt, which you seem to be proposing, how do you sense i2c START / STOP ?
