P2 Smart Pins

jmg · 2016-02-02 01:37

cgracey wrote: »

For what it's worth, the user-programmable mode can do any-bit-length SPI/I2C, it just can't double-buffer data like the serial mode can. The serial mode is really deluxe, in that you can run full-speed, without interruption, because of the double-buffering.

I'm not clear: does that mean you cannot run full speed (i.e. continuous data, no gaps) on SPI?

Can the UART mode manage HW handshake lines, when the double buffer is not read in time ?

Electrodude · 2016-02-02 01:39

What if you drop the shifting stuff in the smartpins and make the cogs do it themselves?

What's the difference between user-definable mode and serial mode? I was under the impression that user-defined mode could be used to do serial, by making it increment if pin A was set and shift if pin B (which could be an actual pin for synchronous or an oscillator for asynchronous?) was set.

jmg · 2016-02-02 01:45

Electrodude wrote: »

What if you drop the shifting stuff in the smartpins and make the cogs do it themselves?

You then severely limit the Serial channels per COG to one? Or two?

Electrodude wrote: »

What's the difference between user-definable mode and serial mode?

Good question - I had expected all serial modes to have similar buffering and similar bit-granularity, from sharing as much as possible.

Electrodude · 2016-02-02 01:48

jmg wrote: »

Electrodude wrote: »

What if you drop the shifting stuff in the smartpins and make the cogs do it themselves?

You then severely limit the Serial channels per COG to one? Or two?

No, I mean the final shift we were talking about earlier, in case Chip didn't see it. Leave the smart pins on the pins, of course, but have them give cogs data that's shifted wrong and have the cogs shift it into position. That would eliminate all of the mux's

jmg · 2016-02-02 01:51

Electrodude wrote: »

No, I mean the final shift we were talking about earlier, in case Chip didn't see it. Leave the smart pins on the pins, of course, but have them give cogs data that's shifted wrong and have the cogs shift it into position. That would eliminate all of the mux's

Ah, right - you mean the rotate/justify and maybe even mirror-bit-order (for MSB/LSB)?
I agree it is worth looking at which opcode support in the COG could save logic in the pins.

Electrodude · 2016-02-02 02:11

Precisely. The cogs already have tons of bit twiddling hardware, so the smartpins don't really need to also be able to do this.

The one exception I can think of is that you can't use a cog's bit twiddling instructions if you're piping data directly from the streamer to the pins, or vice versa.

evanh · 2016-02-02 02:23

The Streamers don't access the SmartPins, afaik. SmartPins can generate an event so it'll be software driven via events/IRQ.

cgracey · 2016-02-02 02:25

I just started recoding the wide mux's and on the first instance, it cut 13 ALM's from each smart pin.

Using the ternary ?/: operator makes for easy writing and reading of code, but the Altera tools don't break it down as well as the ASIC tools do. The Altera tools always compile logic while respecting the expressed priority. ASIC tools break things down to individually exclusive cases where signals can be AND'd and OR'd, instead of mux'd.

Here is what I had:

assign zd =	!reset && wx		?	wd
	  :	reset ||
		op[2:0] ==  3'b001	?	32'b0
	  :	op[2:0] ==? 3'b01?	?	$signed({op_lsb, 1'b1})
	  :	op[2:0] ==? 3'b10?	?	$signed({op_lsb, 1'b1}) + zq
	  :	op_lsb			?	{zq[30:0], sq[7] && xq[30]}
					:	{sq[7] && xq[30], zq[31:1]};

And here is what I changed it to, which is logically equivalent, but Quartus digests more gracefully:

assign zd =	{32{!reset &&  wx}}				&	wd						|
		{32{!reset && !wx && ^op[2:1]}}			&	{{31{op_lsb}}, 1'b1} + ({32{op[2]}} & zq)	|
		{32{!reset && !wx && &op[2:1] && !op_lsb}}	&	{sq[7] && xq[30], zq[31:1]}			|
		{32{!reset && !wx && &op[2:1] &&  op_lsb}}	&	{zq[30:0], sq[7] && xq[30]}			;

There are a few more instances to recode. This may yield another 30 ALM's.
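
As a rough cross-check of the two forms above, here is a small Python model of both expressions. It is only a sketch under stated assumptions: op_lsb is taken to be op[0], the $signed term is taken to mean a 32-bit +1 or -1, and bit_in stands in for sq[7] && xq[30]. The function names are made up for illustration. Under those assumptions the two forms agree for every op value except 3'b000, where the priority form falls through to the right shift and the AND/OR form yields zero. The thread does not say whether 3'b000 ever reaches this logic.

```python
from itertools import product

MASK32 = 0xFFFFFFFF

def rep32(cond):
    """Model Verilog {32{cond}}: all ones if cond is true, else zero."""
    return MASK32 if cond else 0

def shift_right(zq, bit_in):
    """Model {bit_in, zq[31:1]}."""
    return (bit_in << 31) | (zq >> 1)

def shift_left(zq, bit_in):
    """Model {zq[30:0], bit_in}."""
    return ((zq << 1) & MASK32) | bit_in

def zd_priority(reset, wx, wd, op, zq, bit_in):
    """Priority (ternary chain) form."""
    op_lsb = op & 1                      # assumption: op_lsb is op[0]
    step = MASK32 if op_lsb else 1       # assumption: -1 or +1 as 32 bits
    if not reset and wx:
        return wd & MASK32
    if reset or op == 0b001:
        return 0
    if op >> 1 == 0b01:
        return step
    if op >> 1 == 0b10:
        return (step + zq) & MASK32
    return shift_left(zq, bit_in) if op_lsb else shift_right(zq, bit_in)

def zd_and_or(reset, wx, wd, op, zq, bit_in):
    """AND/OR form: each term is gated by a mutually exclusive condition."""
    op_lsb, op1, op2 = op & 1, (op >> 1) & 1, (op >> 2) & 1
    live = (not reset) and (not wx)
    step = ((MASK32 if op_lsb else 1) + (rep32(op2) & zq)) & MASK32
    return (
        (rep32(not reset and wx) & wd)
        | (rep32(live and (op2 ^ op1)) & step)
        | (rep32(live and (op2 & op1) and not op_lsb) & shift_right(zq, bit_in))
        | (rep32(live and (op2 & op1) and op_lsb) & shift_left(zq, bit_in))
    )

def mismatching_ops():
    """Return the op values for which the two models ever disagree."""
    bad = set()
    samples = (0, 1, 0x80000000, 0xDEADBEEF)
    for reset, wx, op, zq, bit_in in product((0, 1), (0, 1), range(8), samples, (0, 1)):
        args = (reset, wx, 0x12345678, op, zq, bit_in)
        if zd_priority(*args) != zd_and_or(*args):
            bad.add(op)
    return sorted(bad)
```

Calling mismatching_ops() lists the op codes worth a second look before relying on the rewrite.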

evanh · 2016-02-02 02:32

jmg wrote: »

cgracey wrote: »

jmg wrote: »

I can see an option of 32 Pin cells, done as Pairs of pins as one way to save some silicon.
There, the Pin cell does get larger, as Chip now has Tx OR Rx in a cell, not both. Pair-pins would be duplex, so this might save 20-25% of Pin Area?

Right now, smart pins can do RX and TX. I just looked into splitting them up, so that even pins do TX and odd pins do RX, but there is about 90% overlap in logic, so it's a big price to pay for saving only a little logic.

Yes, but I was meaning they are configured as either RX or configured as Tx.

I suspect a full-duplex pair as a set will be just as costly as two separate half-duplex pins.

cgracey · 2016-02-02 02:38

jmg wrote: »

cgracey wrote: »

For what it's worth, the user-programmable mode can do any-bit-length SPI/I2C, it just can't double-buffer data like the serial mode can. The serial mode is really deluxe, in that you can run full-speed, without interruption, because of the double-buffering.

I'm not clear: does that mean you cannot run full speed (i.e. continuous data, no gaps) on SPI?

Can the UART mode manage HW handshake lines, when the double buffer is not read in time ?

The pin-clocked serial mode is like SPI. It double-buffers data, so it can run full speed. If you configure the programmable mode to do SPI, though, there's no double-buffering for data.

There is no provision for managing hardware handshaking lines in the serial mode. It just inputs and outputs data.
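
To make the double-buffering point concrete, here is a minimal timing sketch in Python. It is not a model of the actual smart pin logic. It assumes a transmitter that can accept the next word as soon as the current one has moved into the shifter (double-buffered) or only after the current one has finished shifting (single-buffered). reload_latency is a hypothetical number of bit clocks the cog needs to supply the next word.

```python
def clocks_to_send(n_words, bits_per_word, reload_latency, double_buffered):
    """Count bit clocks from the first bit of the first word to the last bit
    of the last word, under the simplified model described above."""
    t = 0
    for i in range(n_words):
        start = t
        t = start + bits_per_word          # this word is being shifted out
        if i == n_words - 1:
            break
        if double_buffered:
            # The holding register was freed when shifting started, so the
            # cog can refill it while the shifter is still busy.
            ready = start + reload_latency
        else:
            # The cog can only write once the shifter has finished.
            ready = t + reload_latency
        t = max(t, ready)                  # any wait here is a gap on the wire
    return t
```

With 4 words of 8 bits and a reload latency of 5 bit clocks, the double-buffered case takes 32 clocks (no gaps) and the single-buffered case takes 47. Once the latency exceeds the word length, gaps appear even with double-buffering.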

cgracey · 2016-02-02 02:39

Electrodude wrote: »

What if you drop the shifting stuff in the smartpins and make the cogs do it themselves?

What's the difference between user-definable mode and serial mode? I was under the impression that user-defined mode could be used to do serial, by making it increment if pin A was set and shift if pin B (which could be an actual pin for synchronous or an oscillator for asynchronous?) was set.

The hardware can push the boundaries of speed. Software is always going to take maybe 4 instructions (8 clocks) to shift a bit out. Plus, software ties up the cog.

Programmable mode can be used for synchronous shifting, but there's no double-buffering of data.
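
For scale, the software figure works out as follows. This is plain arithmetic on the "4 instructions (8 clocks)" estimate above. The 80 MHz clock is the FPGA figure quoted later in the thread, and the function name is only illustrative.

```python
def max_software_bit_rate(sysclk_hz, clocks_per_bit=8):
    """Upper bound on a bit-banged serial rate if every bit costs
    clocks_per_bit system clocks."""
    return sysclk_hz / clocks_per_bit

def word_time_us(sysclk_hz, bits=32, clocks_per_bit=8):
    """Time in microseconds the cog stays busy shifting one word."""
    return bits * clocks_per_bit / sysclk_hz * 1e6
```

At 80 MHz that is a ceiling of 10 Mbit/s, with the cog tied up for 3.2 microseconds per 32-bit word.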

cgracey · 2016-02-02 02:42

jmg wrote: »

Electrodude wrote: »

No, I mean the final shift we were talking about earlier, in case Chip didn't see it. Leave the smart pins on the pins, of course, but have them give cogs data that's shifted wrong and have the cogs shift it into position. That would eliminate all of the mux's

Ah, right - you mean the rotate/justify and maybe even mirror-bit-order (for MSB/LSB)?
I agree it is worth looking at which opcode support in the COG could save logic in the pins.

That could be handled in the cog, if need be.
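
A sketch of what "handled in the cog" might amount to, written in Python rather than cog instructions. It assumes, purely for illustration, that the pin hands over an n-bit value left in the top bits of a 32-bit word, and that a bit-order flip is wanted for MSB-first versus LSB-first links. Both helper names are hypothetical.

```python
def right_justify(raw, nbits, width=32):
    """Move an nbits-wide field from the top of a width-bit word to the bottom."""
    return (raw >> (width - nbits)) & ((1 << nbits) - 1)

def mirror_bits(value, width=32):
    """Reverse the bit order of the low `width` bits of value."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result
```

For example, right_justify(0xA5000000, 8) gives 0xA5, and mirror_bits(0b1011, 4) gives 0b1101. In a cog each of these would ideally be one or two instructions, which is the trade being discussed.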

jmg · 2016-02-02 02:42

cgracey wrote: »

There are a few more instances to recode. This may yield another 30 ALM's.

You may be able to squeeze a little more by reducing the MUX select terms to a single bit. (ie add nodes)
A little slower, but then one decode is used across all 32 registers per line.

Peter Jakacki · 2016-02-02 02:57

cgracey wrote: »

Regarding the current 8- and 32-bit limitations on serial:

I suspect a 9-bit mode might be handy in the dedicated serial circuit, to accommodate parity. Personally, I like to handle errors at a much higher level, rather than having to react from the trenches.

Yet it's not only parity: many microcontroller UARTs support a 9-bit address mode. Right at the moment I'm working on one of my TFT displays that uses an SSD2119, which supports 4- or 3-wire serial, but to get it down to 3 wires you need 9-bit serial. So 9-bit is another magic number, although most serial is in 8-bit multiples. BTW, does the SPI support toggling the chip-select?
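
To show why 3-wire operation needs 9 bits, here is a minimal Python sketch of the framing. It assumes the common convention for this kind of display controller, where the data/command flag is sent as an extra leading bit ahead of the 8 data bits, MSB first. Check the SSD2119 datasheet before relying on the exact bit order. The function names are illustrative.

```python
def frame9(is_data, byte):
    """Build a 9-bit frame: data/command flag in bit 8, payload in bits 7..0."""
    return ((1 if is_data else 0) << 8) | (byte & 0xFF)

def to_bits_msb_first(frame, nbits=9):
    """Expand a frame into the order the bits would appear on the wire."""
    return [(frame >> i) & 1 for i in range(nbits - 1, -1, -1)]
```

A command byte 0x22 becomes 0x022 and a data byte 0xFF becomes 0x1FF, so a serializer limited to 8-bit units cannot send either in one transfer.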

Electrodude · 2016-02-02 03:18

cgracey wrote: »

Electrodude wrote: »

What if you drop the shifting stuff in the smartpins and make the cogs do it themselves?

What's the difference between user-definable mode and serial mode? I was under the impression that user-defined mode could be used to do serial, by making it increment if pin A was set and shift if pin B (which could be an actual pin for synchronous or an oscillator for asynchronous?) was set.

The hardware can push the boundaries of speed. Software is always going to take maybe 4 instructions (8 clocks) to shift a bit out. Plus, software ties up the cog.

Programmable mode can be used for synchronous shifting, but there's no double-buffering of data.

By the shifting stuff, I meant the final shift that could be done in the cogs that would allow you to eliminate the muxes in the smart pins that you have now that are making 1..32 bit serial hard. I certainly don't think any of the heavy lifting should be moved back into the cogs.

Also, why can't the programmable mode be made to do both synchronous and asynchronous serial, by sourcing the B pin from either the adjacent pin for synchronous or the pin's counter for asynchronous? Then you could eliminate the separate serial hardware. I'm under the impression that you still have smartpin opcode space left, due to only the top bit of the 1xxxx instruction meaning anything. You could add four more opcodes for shift-and-increment, with one bit to specify left or right shift and another to specify increment or decrement. You could then also make the double-buffering configurable somehow, perhaps through two separate "signal" opcodes - one that double-buffers and one that doesn't.

jmg · 2016-02-02 03:32

cgracey wrote: »

There is no provision for managing hardware handshaking lines in the serial mode. It just inputs and outputs data.

This could use adjacent cells, and the cross-feed feature ?
On Receive handshake Out, it is a copy of the Buffer_ready signal, that toggles if a shifted byte finds it cannot load into the buffer.
Tx handshake In acts as a clock-enable on the Tx counter/state during the idle times.

jmg · 2016-02-02 03:42

Peter Jakacki wrote: »

I'm working on one of my TFT displays that uses an SSD2119, which supports 4- or 3-wire serial, but to get it down to 3 wires you need 9-bit serial.

To confirm, that is 9 bit SPI you need ?

Electrodude · 2016-02-02 03:52

Electrodude wrote: »

cgracey wrote: »

Electrodude wrote: »

What if you drop the shifting stuff in the smartpins and make the cogs do it themselves?

What's the difference between user-definable mode and serial mode? I was under the impression that user-defined mode could be used to do serial, by making it increment if pin A was set and shift if pin B (which could be an actual pin for synchronous or an oscillator for asynchronous?) was set.

The hardware can push the boundaries of speed. Software is always going to take maybe 4 instructions (8 clocks) to shift a bit out. Plus, software ties up the cog.

Programmable mode can be used for synchronous shifting, but there's no double-buffering of data.

By the shifting stuff, I meant the final shift that could be done in the cogs that would allow you to eliminate the muxes in the smart pins that you have now that are making 1..32 bit serial hard. I certainly don't think any of the heavy lifting should be moved back into the cogs.

Also, why can't the programmable mode be made to do both synchronous and asynchronous serial, by sourcing the B pin from either the adjacent pin for synchronous or the pin's counter for asynchronous? Then you could eliminate the separate serial hardware. I'm under the impression that you still have smartpin opcode space left, due to only the top bit of the 1xxxx instruction meaning anything. You could add four more opcodes for shift-and-increment, with one bit to specify left or right shift and another to specify increment or decrement. You could then also make the double-buffering configurable somehow, perhaps through two separate "signal" opcodes - one that double-buffers and one that doesn't.

I completely misunderstood the streamer opcodes until now. It already can shift and input a bit at the same time. So why can't the programmable part do asynchronous serial?

Seairth · 2016-02-02 04:01

I get the desire to support as many variations as possible, but I would much rather have something that covers the most common use cases really well and otherwise keep it simple (80/20 rule!). Assuming I'm understanding the current design (which I have to admit I'm still a bit fuzzy on), how about this:

* Have a configuration (bcount) that indicates the number of bits (1-32) to receive before indicating that the buffer is "full".
* Tap 8, 16, or 32 bits, according to whichever one best fits the configured bits (smallest tap >= bcount).
* When the configured length is less than the next-largest tap length, you will get additional "garbage" bits that it is up to the user code to ignore. For example, if bcount is 4, the 8-bit tap will be used and 4 of the bits will be ignored in code.

This way, the most common use cases (8, 16, and 32 bits) will be the most efficient use of the hardware. A bcount between 1 and 7 bits will result in 8 bits being streamed, a bcount between 9 and 15 bits will result in 16 bits being streamed, and a bcount between 17 and 31 bits will result in 32 bits being streamed.

So, if you want to do 8-bit serial with parity, you will be dealing with a 16-bit message. Yes, it will reduce your maximum bit rate, but it's also a less common use case. It's not perfect, but it doesn't look like we have room for perfect.
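
The selection rule in the list above (smallest tap >= bcount) can be sketched in a few lines of Python. The names select_tap and strip_garbage are made up for illustration. The masking step assumes the valid bits end up in the low end of the tapped word, which the proposal does not actually specify. The tap set is a parameter so other widths can be tried.

```python
DEFAULT_TAPS = (8, 16, 32)

def select_tap(bcount, taps=DEFAULT_TAPS):
    """Return the smallest tap width that can hold bcount bits."""
    if not 1 <= bcount <= max(taps):
        raise ValueError("bcount out of range for the available taps")
    return min(t for t in taps if t >= bcount)

def strip_garbage(raw, bcount):
    """User-code side: keep only the low bcount bits of a tapped word.
    Assumes the valid bits are right-justified."""
    return raw & ((1 << bcount) - 1)
```

So select_tap(4) is 8, select_tap(9) is 16, and a 9-bit frame (8 data bits plus parity) travels as a 16-bit message, as described above.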

jmg · 2016-02-02 04:27

Seairth wrote: »

* Have a configuration (bcount) that indicates the number of bits (1-32) to receive before indicating that the buffer is "full".
* Tap 8, 16, or 32 bits, according to whichever one best fits the configured bits (smallest tap >= bcount).

The broad idea seems good, I think it needs 24b option for i2s use, giving 4 taps set by 2 config bits

This would need opcode support for the other align/justify cases, and I do not see a Mirror-Bits opcode - Could that save total device logic, by moving some work cog-side ?

Seairth wrote: »

This way, the most common use cases (8, 16, and 32 bits) will be the most efficient use of the hardware. A bcount between 1 and 7 bits will result in 8 bits being streamed, a bcount between 9 and 15 bits will result in 16 bits being streamed, and a bcount between 17 and 31 bits will result in 32 bits being streamed.

IIRC the payload between COG and Pin cell is always of fixed size, but the pin payload of course varies.
The shifter can clear on transfer, to make 'extra' bits always 0 for simpler handling.

cgracey · 2016-02-02 10:29

The full implementation is compiling okay now.

It just finished. It used 111,992 ALMs out of 113,560 in the -A9. Fmax is ~69MHz, so it will run for us at 80MHz. The interconnect delays are slowing it way down, now that the whole FPGA is being used.

Cogs are taking ~4050 ALMs each and smart pins are taking ~440 ALMs each. This means smart pins are taking ~25% of overall logic. The CORDIC takes ~8400 ALMs and the hub memory (egg beater) takes ~5060 ALMs - a lot of mux's.
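
The figures above can be sanity-checked with a little arithmetic. The sketch below uses only the numbers quoted in this post. The pin count is not stated here, so it is derived from the ~25% share rather than assumed, and the helper names are illustrative.

```python
TOTAL_USED_ALMS = 111_992
DEVICE_ALMS = 113_560
ALMS_PER_PIN = 440

def implied_pin_count(total_alms, pin_fraction, alms_per_pin):
    """Number of smart pins implied by a given share of total logic."""
    return total_alms * pin_fraction / alms_per_pin

def share(count, alms_each, total_alms):
    """Fraction of total logic taken by `count` blocks of `alms_each`."""
    return count * alms_each / total_alms
```

implied_pin_count(TOTAL_USED_ALMS, 0.25, ALMS_PER_PIN) comes out at about 63.6, which is consistent with 64 smart pins, and TOTAL_USED_ALMS / DEVICE_ALMS shows the device is roughly 98.6% full.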

Tomorrow I'll give it a thorough test and get this up so you guys can use it. Sorry this has been taking forever.

We'll get the 1..32-bit serial stuff working a little later. It would be good to start testing these pins soon.

Tubular · 2016-02-02 11:16

Nice going Chip. 80 MHz is great at this stage

ozpropdev · 2016-02-02 11:17

Fabulous work Chip!

Looking forward to giving the pins a thorough workout!

Cluso99 · 2016-02-02 11:52

Fantastic news Chip!

evanh · 2016-02-02 12:46

/me is chuffed at guessing the 25%.

evanh · 2016-02-02 12:52

Chip, you obviously did a lot more hand optimising - knocking off a good 60 ALMs per SmartPin!

KMyers · 2016-02-02 18:19

Fantastic work, but I have a headache trying to understand the finer inner workings. Need to re-read several more times!!

cgracey · 2016-02-02 18:47

KMyers wrote: »

Fantastic work, but I have a headache trying to understand the finer inner workings. Need to re-read several more times!!

Documentation is needed and I'll be on it soon.

Cluso99 · 2016-02-02 20:22

Chip,
Anything we can help with the Smart Pin documentation?
For a block diagram what drawing package would you prefer? ExpressSchematic???

cgracey · 2016-02-02 20:36

Cluso99 wrote: »

Chip,
Anything we can help with the Smart Pin documentation?
For a block diagram what drawing package would you prefer? ExpressSchematic???

I don't know. What I'll have to do first is describe it in text. A diagram would just make it click in people's minds much faster.
