JMG, if I'm reading your thinking correctly, you are saying that the shift/buffer registers are managed as 32-bit in terms of parallel data then only the shifting is controlled by the length parameter?
Yup. Chip's fancy pre-conditioners decide when to shift, ( CE, Din, Start etc) and the counter TC transfers wherever the shifter is up to, to the holding register & reloads with the length setting.
I'd guess hardware handshake can be done with the cross-coupling feature, by sensing when the buffer has not yet been read.
... the counter TC transfers wherever the shifter is up to, to the holding register & reloads with the length setting.
What I meant is the shift length would only be applied to the direct pin bit-shifts. The parallel transfers/buffering would all become 32-bit in nature. Software would have to mask off or ignore the excess bits.
With this arrangement the parallel mux'ing for variable bit length goes away.
Regarding the current 8- and 32-bit limitations on serial:
We could do ANY word length, but it adds a lot of mux's, which inflate the gate count. It would be better to pick what several word lengths are likely to be needed, and implement maybe four or eight different ones.
I don't see a lot of muxes, as they are shifters, and a counter defines when to stop shifting and load ?
That means a 5b loadable unidirectional counter for length, which is quite compact in logic.
There is likely already a bit counter for 8 & 32, so it is adding 4 config bits.
I would expect 8 preset values to use more muxes/roms.
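To make the counter idea above concrete, here is a rough behavioural sketch in Python (not hardware, and not Chip's actual logic). The name `RxCell` and its fields are invented for illustration; it assumes LSB-first data entering at bit 31 and a loadable length of 1..32.

```python
class RxCell:
    """Hypothetical model: shifter + length counter + holding register."""

    def __init__(self, length):
        if not 1 <= length <= 32:
            raise ValueError("length must be 1..32")
        self.length = length      # the 'length setting' (5 config bits in HW)
        self.count = length       # down-counter, reloaded at terminal count
        self.shifter = 0          # 32-bit shift register
        self.holding = None       # buffer read by software

    def clock(self, bit):
        """Shift one bit in at the MSB; return True when a word was transferred."""
        self.shifter = ((self.shifter >> 1) | (bit << 31)) & 0xFFFFFFFF
        self.count -= 1
        if self.count == 0:
            # Terminal count: copy wherever the shifter is up to, then reload.
            self.holding = self.shifter
            self.shifter = 0
            self.count = self.length
            return True
        return False


cell = RxCell(5)
flags = [cell.clock(b) for b in [0, 1, 1, 0, 1]]   # 0b10110 sent LSB first
print(flags, bin(cell.holding >> 27))
```

In this model the only length-dependent part is the counter reload value; the parallel transfer is always 32 bits wide, and the result is left MSB-justified.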
By using mux's and masking, I get around needing what would become a barrel shifter, to get the LSB-justified result, if we had full 1..32-bit variability.
If you always shift into the MSB, say in the case of asynchronous serial receive, you will need to shift the final result down some number of bits, in case you are doing less than 32. Instead of needing a final-result shifter, I just insert the bit at the appropriate tap using a mux, on each input shift. This one-mux-per-word-length saves us from needing to do a final shift of some arbitrary number of bits.
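The two approaches being compared here can be sketched side by side. This is only a Python illustration of the idea, assuming LSB-first receive into a right-shifting 32-bit register; the function names are made up.

```python
MASK32 = 0xFFFFFFFF

def receive_msb_entry(bits):
    """Always insert the new bit at bit 31; result ends up MSB-justified."""
    reg = 0
    for b in bits:
        reg = ((reg >> 1) | (b << 31)) & MASK32
    return reg

def justify(reg, n):
    """The 'final shift' that is needed after MSB entry (n = word length)."""
    return reg >> (32 - n)

def receive_tap_entry(bits):
    """Insert the new bit at tap n-1 instead; result is already LSB-justified."""
    n = len(bits)
    reg = 0
    for b in bits:
        reg = (reg >> 1) | (b << (n - 1))
    return reg


word = 0x1A5                                   # a 9-bit example value
bits = [(word >> i) & 1 for i in range(9)]     # LSB first on the wire
print(hex(justify(receive_msb_entry(bits), 9)), hex(receive_tap_entry(bits)))
```

Both paths give the same number; the difference is whether the variable shift happens once at the end (in a barrel shifter or in software) or is avoided by choosing the insertion tap per word length.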
I would think it would be simpler to leave the data not LSB-justified and make the cog (which has a barrel shifter) do it.
Indeed it would, hardware-wise, but minus that 'how cool is it that I just get a number I can use' kind of simplicity for the end user that is the hallmark here.
If you always shift into the MSB, say in the case of asynchronous serial receive, you will need to shift the final result down some number of bits, in case you are doing less than 32. Instead of needing a final-result shifter, I just insert the bit at the appropriate tap using a mux, on each input shift. This one-mux-per-word-length saves us from needing to do a final shift of some arbitrary number of bits.
Understood, but 1..32 is more flexible than a fixed choice of 1 of 8, and once you can define 8, it is only 2 more config bits to define 1..32.
This is Infineon's XMC1xx USIC capability, as an example of a shipping flexible Serial Cell design.
• UART (ASC, asynchronous serial channel)
– Module capability: receiver/transmitter with max. baud rate fPERIPH / 4
– Wide baud rate range down to single-digit baud rates
– Number of data bits per data frame: 1 to 63
– MSB or LSB first
• LIN Support by hardware (Local Interconnect Network)
– Data transfers based on ASC protocol
– Baud rate detection possible by built-in capture event of baud rate generator
– Checksum generation under software control for higher flexibility
• SSC/SPI (synchronous serial channel with or without slave select lines)
– Standard, Dual and Quad SPI format supported
– Module capability: maximum baud rate fPERIPH / 2, limited by loop delay
– Number of data bits per data frame 1 to 63, more with explicit stop condition
– Parity bit generation supported
– MSB or LSB first
• IIC (Inter-IC Bus)
– Application baud rate 100 kbit/s to 400 kbit/s
– 7-bit and 10-bit addressing supported
– Full master and slave device capability
• IIS (infotainment audio bus)
– Module capability: maximum baud rate fPERIPH / 2
Plus they have a split-able 64 word FIFO.
( I think they have Parity bit in the wrong section ? )
That's highly flexible, and P2 should be able to get close to that, (maybe not to 63 bits*) but it is something of a large block.
However, P2 can do the byte/word level stuff in SW, and do the bit-level stuff in the pins.
Some muxes will always be needed if you want to support SPI slave as an N-Bit shift register model, but I'm not sure that is mandatory.
Keeping to the KISS + SW-for-higher-level P2 approach, the simplest and most flexible Pin cell would have SW manage the justify (which P2 can do to any count, in 1 opcode).
Q:Does the P2 still have a Mirror-bit-order opcode ?
That could be used to further remove logic, by doing MSB/LSB in SW ?
With 64 of these Pin cells, at some stage 'small' has to trump 'nice', but flexible should never be lost.
*Maybe 2 can be chained to get to the 63 bits Infineon offer already ?
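As a sketch of what "doing MSB/LSB in SW" would amount to, here is the bit-mirror plus justify step modelled in Python. The helper names are invented for this example and do not correspond to any P2 opcode; the point is only that one reverse and one shift cover any word length.

```python
def reverse32(value):
    """Mirror all 32 bits (bit 0 <-> bit 31, bit 1 <-> bit 30, ...)."""
    return int(f"{value & 0xFFFFFFFF:032b}"[::-1], 2)

def mirror_low_bits(value, n):
    """Mirror only the low n bits (1..32) and return them LSB-justified."""
    mask = (1 << n) - 1
    return reverse32(value & mask) >> (32 - n)


# A 5-bit word captured in the 'wrong' bit order, fixed up in software.
print(bin(mirror_low_bits(0b00110, 5)))
```

With this done in the cog, the pin cell would only need to shift in one direction.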
What I meant is the shift length would only be applied to the direct pin bit-shifts. The parallel transfers/buffering would all become 32-bit in nature.
Yes, I was maybe unclear but the buffer path has no shifting.
With this arrangement the parallel mux'ing for variable bit length goes away.
The buffer path muxes go, but shifter length muxes may still be needed, depending on how you manage SPI.
Some SPI parts model HC595 type shifters, ie support chaining, with data pass thru over many clocks.
Others just Rd/Wr every 8 clocks, but still draw their SPI as an 8b shifter.
Chaining makes more sense on smaller MCUs, which may replace HC595, but would it be missed on P2 ?
What is needed on SPI, is a continual stream mode.
eg 4 bytes sent in 8 Bit with 32 clocks, no gaps.
P2 may impose some upper limits on that ?
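For the gap-free stream case, a small Python sketch may help show what the wire should look like. It assumes MSB-first SPI, and the function name is invented for the example.

```python
def stream_bits(words, word_len=8):
    """Flatten words into one continuous MSB-first bit stream, no idle clocks."""
    bits = []
    for word in words:
        for i in range(word_len - 1, -1, -1):
            bits.append((word >> i) & 1)
    return bits


bits = stream_bits([0xDE, 0xAD, 0xBE, 0xEF])
print(len(bits), hex(int("".join(map(str, bits)), 2)))
```

Under these assumptions, four 8-bit words sent back to back are indistinguishable on the wire from one 32-bit word, which is why double-buffering (reloading the shifter on the final clock of each word) is what matters for a continual stream.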
Maybe the cog could automagically shift the value when it receives it from the smartpins?
Specifying the shift value via SETQ would be just as many instructions but might somehow feel simpler than an explicit shift?
Thinking some more on this, a READJUSTIFY (or Read-Shift) type opcode could keep code small, and save logic in the pins.
READJUSTIFY either has params, or a previous config done when the user configured the Pin Cell.
Previous config is simpler, but needs care around mixing multiple ports, which a user could easily do.
There are a LOT of serial ports now, so users could mix any number in final use.
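A rough Python model of the "previous config" flavour of READJUSTIFY, purely to illustrate the bookkeeping. `PIN_LENGTH`, `configure_pin` and `read_justify` are hypothetical names, and it assumes the pin hands back MSB-justified 32-bit data.

```python
PIN_LENGTH = {}   # hypothetical: word length remembered per pin at config time

def configure_pin(pin, length):
    """Record the word length (1..32) when the user configures the Pin Cell."""
    if not 1 <= length <= 32:
        raise ValueError("length must be 1..32")
    PIN_LENGTH[pin] = length

def read_justify(pin, raw):
    """Read-and-shift: LSB-justify raw 32-bit pin data using that pin's length."""
    return (raw & 0xFFFFFFFF) >> (32 - PIN_LENGTH[pin])


configure_pin(3, 9)     # e.g. a 9-bit UART on pin 3
configure_pin(7, 24)    # e.g. 24-bit i2s on pin 7
print(hex(read_justify(3, 0x1A5 << 23)), hex(read_justify(7, 0xABCDEF << 8)))
```

Keeping the length keyed by pin, rather than as one shared setting, is what avoids the mixed-port problem mentioned above.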
But if we can identify what word lengths are needed, it's only one single-bit mux per. That seems like the simplest solution. We have 8- and 32-bit word lengths, but what others are likely needed? Nobody's going to want 21, or 17, right?
Yes, fixed config is the simplest, but far from the most flexible.
It gets rather hard to call them 'smart pins' when they are less flexible than existing designs out there, on chips under $1.
Worse, it means you cannot talk to those low cost chips as slaves in all their modes.
Ideal, would be some means to chain 2 cells to cover the full 1..63 supported, in HW.
JTAG uses a lot of variable lengths, if you wanted to target that in hardware. (needs dual-SPI mode)
Standard UARTS offer 5,6,7,8,9, and you certainly also need 16,24,32, then if you want Parity and 1-2 stop fields in SW, you get to 10,11,17,18,25,26, so are already at 14 choices - now just 1 config bit short of 1..32.
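The config-bit arithmetic in that last point can be checked quickly. This Python snippet just counts the lengths listed above; the variable names are mine.

```python
uart = [5, 6, 7, 8, 9]
wide = [16, 24, 32]
framed = [10, 11, 17, 18, 25, 26]     # with parity / stop fields handled in SW

lengths = sorted(set(uart + wide + framed))

# Bits needed to select one of N choices = bit length of (N - 1).
select_bits = (len(lengths) - 1).bit_length()
full_bits = (32 - 1).bit_length()      # selecting any length 1..32

print(len(lengths), select_bits, full_bits)
```

So a preset table of these lengths already needs 4 select bits, against 5 for a free 1..32 setting.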
Might this be possible, and might it be useful???
34 used for start+32+stop and when read returns 32 bits with start in Z and stop in C
I presume we can set the number of bits, or can this be added easily? If so, we can just read the 32 bits and mask/shift in software to deliver the required bits. This way, 8 & 32 should be fine - cover the usual but help the unusual.
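A sketch of what that 34-bit read could look like from the software side, in Python. The frame layout is an assumption (start bit in bit 0, 32 data bits above it, stop bit in bit 33), and `split_frame` is a made-up name; the returned `start` and `stop` stand in for what would land in Z and C.

```python
def split_frame(frame34):
    """Split a 34-bit start+32+stop frame into (data, start, stop)."""
    start = frame34 & 1                   # would be returned in Z
    data = (frame34 >> 1) & 0xFFFFFFFF    # the 32 bits software actually wants
    stop = (frame34 >> 33) & 1            # would be returned in C
    return data, start, stop


frame = (1 << 33) | (0xCAFEF00D << 1) | 0   # stop=1, data, start=0
data, start, stop = split_frame(frame)
print(hex(data), start, stop)
```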
I presume we can set the number of bits, or can this be added easily?
That is what is being discussed.
As spec'd above, there are right now just 2 choices 8 or 32b set by mode.
16b & 24b i2s is not possible, nor is 5,6,7,9b UART (+Parity).
My suggestion is to save 2 modes for other uses, and allow user specify number of bits as 1..32
Nine-bit words are used in some RS485/422 networks, where the MSB is used to designate that it's an address.
Yup, many small MCUs have 9-bit Address based interrupts, and EXAR have good USB-UARTS that properly support 9b modes.
Getting 9b from a generic USB-UART is a royal pain, as the force-parity paths are USB-frame paced.
We have also designed 'twisted ring' chained UART networks, using that 9th bit as edge-signal, that have very small code overhead , and run at high Baud rates.
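For anyone who has not used 9-bit mode, the receive-side handling is tiny once the pin delivers a 9-bit word. A minimal Python sketch, assuming the word is already LSB-justified and bit 8 is the address flag (the function name is invented):

```python
def split_9bit(word):
    """Return (is_address, byte) from a 9-bit RS485-style word."""
    return bool(word & 0x100), word & 0xFF


print(split_9bit(0x142), split_9bit(0x042))
```

A slave would typically ignore data bytes until it sees an address word that matches its own.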
What if the Smart Pins were moved back into the cogs?
We don't get a 4 fold reduction because we would require at least two sets. But there would be some common logic that could be saved. Now, maybe with the ability to cut the 2x 32-bit register into 4x 16-bit registers, we could do 16-bit quad spi.
Of course this method would add some muxes to get the correct input & output pins. Would this trade in silicon to what we save?
Can any of this be integrated with the streamer to combine logic and perhaps save logic?
Could we use this logic/mux to benefit the INA/INB OUTA/OUTB access instructions by automatically masking the bits?
Can we then use these counters in place of the cogs counters that have been put back in?
What if the Smart Pins were moved back into the cogs?
That's a very large change at this stage.
You still need 16:64 pin mapping muxes, and you lose the ability to have one COG control 32 UARTS, or 16 PWMS etc.
A nice feature of Pin cells, is you remove the constraint of 2 counters per COG.
I can see an option of 32 Pin cells, done as Pairs of pins as one way to save some silicon.
There, the Pin cell does get larger, as Chip now has Tx OR Rx in a cell, not both. Pair-pins would be duplex, so this might save 20-25% of Pin Area?
Ugh! I just did a full-chip compile and got this message:
Error (170012): Fitter requires 11401 LABs to implement the design, but the device contains only 11356 LABs
The way out of this is to reduce the logic in the smart pins. We need to cut 7 ALM's from each smart pin. Better goal would be ~30 ALM's. A LAB is 10 ALM's.
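The per-pin figure follows from the fitter message. A quick Python check, assuming 64 smart pins and fully packed 10-ALM LABs (the variable names are mine):

```python
required_labs = 11401
available_labs = 11356
alms_per_lab = 10
smart_pins = 64

excess_labs = required_labs - available_labs   # LABs over budget
excess_alms = excess_labs * alms_per_lab       # same overshoot in ALMs
per_pin = excess_alms / smart_pins             # ALMs to trim from each pin

print(excess_labs, excess_alms, per_pin)
```

That works out to just over 7 ALMs per smart pin, so a cut of 7 each is right at the edge of fitting, which fits with wanting a larger reduction as the better goal.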
I just checked and the serial mode uses ALL state bits. That's 8 plus 32 * 3 flops. To do 1..32 bit length, we're going to have to add some flops and expand the configuration protocol.
For what it's worth, the user-programmable mode can do any-bit-length SPI/I2C, it just can't double-buffer data like the serial mode can. The serial mode is really deluxe, in that you can run full-speed, without interruption, because of the double-buffering.
I see there's still room for compressing the smart pin logic, as one of the 32-bit registers is used by all modes to either shift left/right or count up/down. If that logic were to be consolidated, we'd free up ~80 ALMs per smart pin, which would get us fitting again.
This is going to delay the FPGA image by another day.
Right now, smart pins can do RX and TX. I just looked into splitting them up, so that even pins do TX and odd pins do RX, but there is about 90% overlap in logic, so it's a big price to pay for saving only a little logic.
Yes, but I was meaning they are configured as either RX or configured as Tx.
The Pin Cell can configure either way, but it is 'one use'.
Moving to 32 Pair Cells would need 1 Rx and 1 Tx configured at the same time (hence the saving is not 50%).
Comments
Here - http://forums.parallax.com/discussion/comment/1362145/#Comment_1362145
Yup. Chip's fancy pre-conditioners decide when to shift, ( CE, Din, Start etc) and the counter TC transfers wherever the shifter is up to, to the holding register & reloads with the length setting.
I'd guess hardware handshake can be done with the cross-coupling feature, by sensing when the buffer has not yet been read.
What I meant is the shift length would only be applied to the direct pin bit-shifts. The parallel transfers/buffering would all become 32-bit in nature. Software would have to mask off or ignore the excess bits.
With this arrangement the parallel mux'ing for variable bit length goes away.
By using mux's and masking, I get around needing what would become a barrel shifter, to get the LSB-justified result, if we had full 1..32-bit variability.
If you always shift into the MSB, say in the case of asynchronous serial receive, you will need to shift the final result down some number of bits, in case you are doing less than 32. Instead of needing a final-result shifter, I just insert the bit at the appropriate tap using a mux, on each input shift. This one-mux-per-word-length saves us from needing to do a final shift of some arbitrary number of bits.
PS: I like the idea of leaving it to software to format the data.
Indeed it would, hardware-wise, but minus that 'how cool is it that I just get a number I can use' kind of simplicity for the end user that is the hallmark here.
Specifying the shift value via SETQ would be just as many instructions but might somehow feel simpler than an explicit shift?
Interesting idea, as there are only 16 COGs vs 64 Pin cells.
I've read Chip's post twice, but still don't get it...
Guess I need to read more carefully maybe...
When we see a bit of sample code, I suspect it will become a lot more clear.
Exciting times!
This is Infineon's XMC1xx USIC capability, as an example of a shipping flexible Serial Cell design.
Plus they have a split-able 64 word FIFO.
( I think they have Parity bit in the wrong section ? )
That's highly flexible, and P2 should be able to get close to that, (maybe not to 63 bits*) but it is something of a large block.
However, P2 can do the byte/word level stuff in SW, and do the bit-level stuff in the pins.
Some muxes will always be needed if you want to support SPI slave as an N-Bit shift register model, but I'm not sure that is mandatory.
Keeping to the KISS + SW-for-higher-level P2 approach, the simplest and most flexible Pin cell would have SW manage the justify (which P2 can do to any count, in 1 opcode).
Q:Does the P2 still have a Mirror-bit-order opcode ?
That could be used to further remove logic, by doing MSB/LSB in SW ?
With 64 of these Pin cells, at some stage 'small' has to trump 'nice', but flexible should never be lost.
*Maybe 2 can be chained to get to the 63 bits Infineon offer already ?
Agreed, to keep smallest possible Pin-cell, some byte/word level operations in SW can help.
Yes, I was maybe unclear but the buffer path has no shifting.
It is relatively low cost to clear the shifter on transfer, which saves any masking step, but reads are always 32b for simplicity.
The buffer path muxes go, but shifter length muxes may still be needed, depending on how you manage SPI.
Some SPI parts model HC595 type shifters, ie support chaining, with data pass thru over many clocks.
Others just Rd/Wr every 8 clocks, but still draw their SPI as an 8b shifter.
Chaining makes more sense on smaller MCUs, which may replace HC595, but would it be missed on P2 ?
What is needed on SPI, is a continual stream mode.
eg 4 bytes sent in 8 Bit with 32 clocks, no gaps.
P2 may impose some upper limits on that ?
Thinking some more on this, a READJUSTIFY (or Read-Shift) type opcode could keep code small, and save logic in the pins.
READJUSTIFY either has params, or a previous config done when the user configured the Pin Cell.
Previous config is simpler, but needs care around mixing multiple ports, which a user could easily do.
There are a LOT of serial ports now, so users could mix any number in final use.
But if we can identify what word lengths are needed, it's only one single-bit mux per. That seems like the simplest solution. We have 8- and 32-bit word lengths, but what others are likely needed? Nobody's going to want 21, or 17, right?
Yes, fixed config is the simplest, but far from the most flexible.
It gets rather hard to call them 'smart pins' when they are less flexible than existing designs out there, on chips under $1.
Worse, it means you cannot talk to those low cost chips as slaves in all their modes.
Ideal, would be some means to chain 2 cells to cover the full 1..63 supported, in HW.
JTAG uses a lot of variable lengths, if you wanted to target that in hardware. (needs dual-SPI mode)
Standard UARTS offer 5,6,7,8,9, and you certainly also need 16,24,32, then if you want Parity and 1-2 stop fields in SW, you get to 10,11,17,18,25,26, so are already at 14 choices - now just 1 config bit short of 1..32.
The question comes down to how many taps, and can some taps be saved with a READJUSTIFY type opcode ?
IMHO UART sizes of 8, 10, 32 seem reasonable.
8 covers 8, 7+stop, start+6+stop
10 covers 9+stop and start+8+stop
32 covers 32, 31+stop, start+30+stop
Might this be possible, and might it be useful???
34 used for start+32+stop and when read returns 32 bits with start in Z and stop in C
I presume we can set the number of bits, or can this be added easily? If so, we can just read the 32 bits and mask/shift in software to deliver the required bits. This way, 8 & 32 should be fine - cover the usual but help the unusual.
Sure, that is 'reasonable', but it is also a subset of what ships now, in UARTS.
There are also SPI/i2s lengths to support, of 16b & 24b
Sounds a good idea, not sure if Chip supports that.
That is what is being discussed.
As spec'd above, there are right now just 2 choices 8 or 32b set by mode.
16b & 24b i2s is not possible, nor is 5,6,7,9b UART (+Parity).
My suggestion is to save 2 modes for other uses, and allow user specify number of bits as 1..32
-Phil
Getting 9b from a generic USB-UART is a royal pain, as the force-parity paths are USB-frame paced.
We have also designed 'twisted ring' chained UART networks, using that 9th bit as edge-signal, that have very small code overhead , and run at high Baud rates.
What if the Smart Pins were moved back into the cogs?
We don't get a 4 fold reduction because we would require at least two sets. But there would be some common logic that could be saved. Now, maybe with the ability to cut the 2x 32-bit register into 4x 16-bit registers, we could do 16-bit quad spi.
Of course this method would add some muxes to get the correct input & output pins. Would this trade in silicon to what we save?
Can any of this be integrated with the streamer to combine logic and perhaps save logic?
Could we use this logic/mux to benefit the INA/INB OUTA/OUTB access instructions by automatically masking the bits?
Can we then use these counters in place of the cogs counters that have been put back in?
You still need 16:64 pin mapping muxes, and you lose the ability to have one COG control 32 UARTS, or 16 PWMS etc.
A nice feature of Pin cells, is you remove the constraint of 2 counters per COG.
I can see an option of 32 Pin cells, done as Pairs of pins as one way to save some silicon.
There, the Pin cell does get larger, as Chip now has Tx OR Rx in a cell, not both. Pair-pins would be duplex, so this might save 20-25% of Pin Area?
Error (170012): Fitter requires 11401 LABs to implement the design, but the device contains only 11356 LABs
The way out of this is to reduce the logic in the smart pins. We need to cut 7 ALM's from each smart pin. Better goal would be ~30 ALM's. A LAB is 10 ALM's.
I just checked and the serial mode uses ALL state bits. That's 8 plus 32 * 3 flops. To do 1..32 bit length, we're going to have to add some flops and expand the configuration protocol.
For what it's worth, the user-programmable mode can do any-bit-length SPI/I2C, it just can't double-buffer data like the serial mode can. The serial mode is really deluxe, in that you can run full-speed, without interruption, because of the double-buffering.
I see there's still room for compressing the smart pin logic, as one of the 32-bit registers is used by all modes to either shift left/right or count up/down. If that logic were to be consolidated, we'd free up ~80 ALMs per smart pin, which would get us fitting again.
This is going to delay the FPGA image by another day.
Right now, smart pins can do RX and TX. I just looked into splitting them up, so that even pins do TX and odd pins do RX, but there is about 90% overlap in logic, so it's a big price to pay for saving only a little logic.
But, I guess if you see a way to trim it down that's better...
The Pin Cell can configure either way, but it is 'one use'.
Moving to 32 Pair Cells would need 1 Rx and 1 Tx configured at the same time (hence the saving is not 50%).