SmartPin USB (was: SmartPin state machine modes)

Seairth · 2016-02-12 16:20

I'm starting a separate thread, as I suspect there will be quite a bit of discussion on this particular feature.

To start it off, I wanted to expand upon one of the functions Chip has described in the docs (link below), mostly because it took me a little while to realize what was going on. And I'm still not quite sure I've got it.

In the 2-bit, 1-pattern variant, it turns out that six of the opcodes are deliberately paired such that their behavior is complementary (Z=1/-1, Z=inc/dec, Z=shr/shl), so toggling the LSB effectively selects between the two complementary behaviors. What I'm a little confused about is how the LSB is toggled. First, I understand that opcode[3] must be set to 1. From there, the actual toggling behavior is controlled by X[26], Y[26], and Y[31:27]. These basically control when to toggle the LSB state.

Where I'm getting stuck is understanding what happens next. My current guess is that the state machine has two states, 0 (initial) and 1. Each time an opcode triggers a "next state" transition, the effective behavior of the opcodes switches to the complementary behavior. In other words, the document states that %xx100 increments Z and %xx101 decrements Z. If input selects Y[25:21] (which, let's say, I had set to %xx100), the actual behavior would be:

{Y[25:22], Y[21] ^ state}

In other words, in State 0, Y[25:21] would be treated as an INC Z, while in State 1, it would be treated as DEC Z. If Y[25:21] had been initially set to %xx101, then it would be DEC Z and INC Z, respectively.
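A minimal Python sketch of that guess, for anyone who wants to poke at it. It only models the INC/DEC pair named above, and the function names (`effective_opcode`, `step_z`) are made up for illustration; nothing here comes from the actual hardware description.

```python
# Sketch of the guessed behaviour: the 1-bit state is XORed into the
# LSB of the selected 5-bit opcode, i.e. {Y[25:22], Y[21] ^ state}.
# Names are hypothetical; only the INC/DEC pair from the post is modelled.

INC_Z = 0b100  # low three bits of %xx100
DEC_Z = 0b101  # low three bits of %xx101


def effective_opcode(opcode: int, state: int) -> int:
    """Return the 5-bit opcode with its LSB XORed with the 1-bit state."""
    return (opcode & 0b11110) | ((opcode ^ state) & 1)


def step_z(z: int, opcode: int, state: int) -> int:
    """Apply the effective opcode to Z for the INC/DEC pair only."""
    low = effective_opcode(opcode, state) & 0b111
    if low == INC_Z:
        return z + 1
    if low == DEC_Z:
        return z - 1
    raise ValueError("only the INC/DEC pair is modelled in this sketch")
```

With `opcode = 0b00100`, `step_z(10, opcode, 0)` counts up and `step_z(10, opcode, 1)` counts down, which is the State 0 / State 1 behaviour described above.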

Is this correct? Or am I totally misunderstanding how states are used for the 2-bit, 1-pattern variant?

https://docs.google.com/document/d/10qQn_-B7avY2ce0N1MDDdzOF1lACPNWUJkjHFyzITiY/edit#bookmark=kix.13lo3hd6iafi

cgracey · 2016-02-12 17:18

That's correct. This state business was really intended for the one-bit, two-pattern mode. I was just trying to find a way to use it in the two-bit, one-pattern mode.

jmg · 2016-02-12 19:16

Maybe some examples of what the State feature can do, and just as important, cannot do, are needed.
Quadrature encode has been mentioned
What about Quadrature encode, with index clear ?
Or external Pin A edge Count, with Pin B Edge Capture
Manchester/BiPhase/USB encode/decode
Bit stuff or destuff ?

etc

cgracey · 2016-02-12 19:27

It's designed for counting and shifting operations. It won't do anything like bit stuffing.

After I get the serial modes documented, I'll make some state machine examples.

Seairth · 2016-02-12 20:22

When you use one of the output modes (%10101 or %10111), which pin is the output to?

ozpropdev · 2016-02-13 01:55

Based on the other output modes the smartpin number is the IO pin number.

cgracey · 2016-02-13 01:58

ozpropdev wrote: »

Based on the other output modes the smartpin number is the IO pin number.

That's right.

Seairth · 2016-02-13 02:59

cgracey wrote: »

ozpropdev wrote: »

Based on the other output modes the smartpin number is the IO pin number.

That's right.

Ugh. I keep forgetting that both A and B could be set to other pins. So, it would be possible to configure pin 2 as:

PINSETM ##%1_00_10111_0111_0110_00_0_0000000000000, #2

mode : %10111 (Custom state machine, 1-bit, 2-pattern, with output)
Input A : Pin 0
Input B : Pin 1
Output Z : Pin 2
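As a sanity check on the literal above, here is a small Python sketch that splits it at the underscores and packs it into a 32-bit value. Only the mode group (%10111) is identified in this post; the sketch deliberately does not assign meanings to the other groups, and the helper names are hypothetical.

```python
# Split the PINSETM literal at its underscores and pack it into one word.
# Only the third group is known from the post to be the mode (%10111);
# the other groups are reported by position, not by meaning.

LITERAL = "1_00_10111_0111_0110_00_0_0000000000000"


def split_groups(literal: str) -> list[tuple[int, int]]:
    """Return (width_in_bits, value) for each underscore-separated group."""
    return [(len(group), int(group, 2)) for group in literal.split("_")]


def pack(literal: str) -> int:
    """Return the whole literal as a single integer."""
    return int(literal.replace("_", ""), 2)


groups = split_groups(LITERAL)
total_bits = sum(width for width, _ in groups)
value = pack(LITERAL)
```

The group widths add up to 32 bits, and the packed value is `0x97760000`.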

Ariba · 2016-02-13 05:26

To understand the Custom State Machine modes, I tried to draw a schematic for the whole logic for both modes.
It's a bit simplified to not make it look too complicated ;-)

Andy

cgracey · 2016-02-13 06:08

Ariba wrote: »

To understand the Custom State Machine modes, I tried to draw a schematic for the whole logic for both modes.
It's a bit simplified to not make it look too complicated ;-)

Andy

You nailed it, Andy!

I don't know if this design is the ideal of what could be accomplished, given the resource expenditure, but it's good for simple counting and shifting.

jmg · 2016-02-13 06:35

cgracey wrote: »

I don't know if this design is the ideal of what could be accomplished, given the resource expenditure, but it's good for simple counting and shifting.

What's the added logic cost of having this State Machine Support in every pin cell ?

Rayman · 2016-02-13 12:43

Guess it's going to take me some time to figure this state machine stuff out...
First impression is that it's very complicated and yet can't do very much.
I'm sure I'm wrong about that though.
I guess, like jmg, I could use an example of what this could be used for...

cgracey · 2016-02-13 14:20

Rayman wrote: »

Guess it's going to take me some time to figure this state machine stuff out...
First impression is that it's very complicated and yet can't do very much.
I'm sure I'm wrong about that though.
I guess, like jmg, I could use an example of what this could be used for...

Once you understand it, you'll see it's not that complicated, at all. It's just something new.

jmg · 2016-02-13 19:25

cgracey wrote: »

Once you understand it, you'll see it's not that complicated, at all. It's just something new.

Where does QuadSPI fit onto the P2 resource ?
Is that a Streamer feature, or can the State engine manage that ?

evanh · 2016-02-13 22:23

The Streamer could maybe be used as a master, which is what's wanted I believe. When outputting, set only 4 pins as output but pad the DMA buffer as 8-bit data. Likewise for reading in, only 4 pins have valid data, so the other four pins that the Streamer reads into HubRAM will need to be masked off in software. Quite a lot of shuffling to format the data, I know. Think of it like pre-embedding the clock in the data stream.

The trick will be in setting up a SmartPin for generating the master clock that is nicely aligned with the Streamer's data phases.
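A rough Python sketch of the padding and masking described above, to show how much shuffling is involved. It assumes the four data pins are the low four bits of each 8-bit sample and that the high nibble goes out first; both are assumptions for illustration, not something the Streamer is known to require.

```python
# Software pre/post-formatting for 4-bit transfers through an 8-bit path.
# Assumptions (not from the thread): data pins are the low four bits of
# each sample, and the high nibble of each byte is sent first.


def pad_nibbles(data: bytes) -> bytes:
    """Expand each byte into two 8-bit samples, one nibble per sample."""
    out = bytearray()
    for byte in data:
        out.append(byte >> 4)    # high nibble first
        out.append(byte & 0x0F)  # then low nibble
    return bytes(out)


def unpad_nibbles(samples: bytes) -> bytes:
    """Mask off the four unused pins and rebuild bytes from sample pairs."""
    if len(samples) % 2:
        raise ValueError("need an even number of samples")
    out = bytearray()
    for i in range(0, len(samples), 2):
        high = samples[i] & 0x0F      # upper four pins carry no data
        low = samples[i + 1] & 0x0F
        out.append((high << 4) | low)
    return bytes(out)
```

The output buffer is twice the size of the payload, and `unpad_nibbles(pad_nibbles(x)) == x` for any byte string `x`.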

evanh · 2016-02-13 22:43

cgracey wrote: »

You nailed it, Andy!

Yes, Andy, thank you for the diagram.

jmg · 2016-02-13 23:05

evanh wrote: »

The Streamer could maybe be used as a master, which is what's wanted I believe. When outputting, set only 4 pins as output but pad the DMA buffer as 8-bit data. Likewise for reading in, only 4 pins have valid data, so the other four pins that the Streamer reads into HubRAM will need to be masked off in software. Quite a lot of shuffling to format the data, I know.

Fingers crossed QuadSPI is not that bad. SW shuffling would kill performance.
Hopefully, the Streamer has a native 4 bit mode.

There could be a place for Byte -> QuadSPI in DDR modes, where the streamer thinks in bytes, but the Pin-cells Mux 8:4 with the SPI CLK, to allow Double Data Rate SPI.

evanh · 2016-02-14 00:58

Software pre/post-formatting the data won't hurt performance at all. Just throw a Cog at it, gives them something to actually do.

jmg · 2016-02-14 01:07

evanh wrote: »

Software pre/post-formatting the data won't hurt performance at all. Just throw a Cog at it, gives them something to actually do.

Is that a serious post ?
Good luck marketing that line, on a device with claimed Smart Pins.

If 'performance' measures exclude/ignore mA/Bus MHz, and ignore extra layers of software to debug, via COG Ping-Pong, then you might be right to loftily claim "won't hurt performance at all"

potatohead · 2016-02-14 01:16

That is what there are 16 of them for.

We did this before on the hot chip. The expectation that everything happen in one COG, or even just the background of that COG isn't realistic.

If it's fast, one or two COGS won't matter. And the Smart Pins are very useful regardless of how this case works out. On a single CPU, it does kill performance, but on a 16 CPU system, it won't.

That matters.

As for debug, we got really great features added for the express purpose of making that task easier, single COG or not.

Finally, streamer output has 4 bit LUT modes. Seems to me, the LUT can pre shuffle data to align with pins. Input does not offer that, but the LUT could also be used to avoid some of the instructions.
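The LUT pre-shuffle idea can be sketched in Python as a 16-entry table that maps each 4-bit value to a pin pattern. The pin positions in `PIN_FOR_BIT` are invented for the example, and this says nothing about how the real streamer LUT is addressed or loaded.

```python
# Build a 16-entry lookup table that moves each bit of a nibble to an
# arbitrary pin position within an 8-pin group.
# PIN_FOR_BIT is a made-up wiring: nibble bit i drives pin PIN_FOR_BIT[i].

PIN_FOR_BIT = (0, 2, 5, 7)


def build_lut(pin_for_bit: tuple[int, ...]) -> list[int]:
    """Return a table indexed by nibble value giving the 8-bit pin pattern."""
    lut = []
    for nibble in range(16):
        pattern = 0
        for bit, pin in enumerate(pin_for_bit):
            if nibble & (1 << bit):
                pattern |= 1 << pin
        lut.append(pattern)
    return lut


lut = build_lut(PIN_FOR_BIT)
```

With a table like this, the shuffle costs one lookup per nibble instead of a run of shift and mask instructions.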

Maybe it makes sense to attempt some of these things rather than continue to press for dedicated hardware. There are a lot of features on this chip.

ozpropdev · 2016-02-14 01:42

jmg · 2016-02-14 02:52

potatohead wrote: »

The expectation that everything happen in one COG, or even just the background of that COG isn't realistic.

The expectation is that bit level stuff will be managed in hardware.

There are UARTS and SPI ports already, in hardware. As well as PWM and Capture, in hardware.
There is a Streamer, in hardware.
All of that gives an expected performance level, and users really do not expect gaps, where they suddenly have to change to software band-aids, especially on mainstream serial modes like Quad SPI.

potatohead · 2016-02-14 03:39

Okie Dokie.

There are a lot of users and it's pretentious to speak for them all.

Edit: I'm not speaking to the validity of the desired feature here as much as I am the idea of "the market" and "what will sell" being way too easy to drop out there as a qualification or justification.

If we applied that across this design, it would not exist!

jmg · 2016-02-14 03:57

cgracey wrote: »

Once you understand it, you'll see it's not that complicated, at all. It's just something new.

Getting back onto the State Machine topic...

Still trying to grasp examples where this can fill in a gap.

Let's look at another use case, of 2 bit SPI and JTAG.
Unlike mainstream QuadSPI, which really does need to run FAST, DualSPI may be fringe enough to consider software support, (if the streamer cannot support in native form).

Can the State engine help JTAG write ?
Could the State engine manage JTAG bypass, & JTAG read ?
Or, is that just a little too complex for the State Engine ?

evanh · 2016-02-14 04:53

2-bit meaning TDO+TDI, correct? And bypass probably means as a slave SPI device. So, as a slave device, you're asking if a single SmartPin shifter can have its two ends tied to a pin each. One pin as the incoming shift in from TDI and the other pin as outgoing shift out to TDO.

JTAG read/write meaning as SPI master? Those should be easy.

jmg · 2016-02-14 05:30

evanh wrote: »

2-bit meaning TDO+TDI, correct?

Not quite, JTAG modulates 2 outputs, TDO and TMS during the State-sequence part, then usually holds TMS, during longer data phases.
JTAG also requires variable bit lengths.
TDI is the one-bit-wide reply, and that is Duplex.

evanh wrote: »

And bypass probably means as a slave SPI device.

Not quite, JTAG bypass is a special State sequence on TDO and TMS. Once in that state, JTAG slave device looks like a single D-FF

evanh wrote: »

So, as a slave device, you're asking if a single SmartPin shifter can have its two ends tied to a pin each. One pin as the incoming shift in from TDI and the other pin as outgoing shift out to TDO.

JTAG read/write meaning as SPI master? Those should be easy.

Again, not quite. JTAG write is probably the easier one, as that can tolerate speed changes with SW.
JTAG read would require the State Engine be able to follow a JTAG sequence, which again I suspect may be beyond the State capability.
That's why this is a good test case, to help define what can, and cannot be done.
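To make the bypass point concrete, here is a small Python model of a device in bypass behaving as a single D flip-flop: each bit shifted in comes out one clock later. The class and function names are hypothetical, and the model ignores TMS and the rest of the JTAG state sequence entirely.

```python
# Model a JTAG device in bypass as one D flip-flop between its serial
# input and output. Names are hypothetical; TMS sequencing is not modelled.


class BypassRegister:
    """One-bit register: output is the bit captured on the previous clock."""

    def __init__(self) -> None:
        self.ff = 0  # assumed to start at 0 for this sketch

    def clock(self, data_in: int) -> int:
        data_out = self.ff
        self.ff = data_in & 1
        return data_out


def shift_through(bits: list[int], devices: int = 1) -> list[int]:
    """Shift bits through a daisy chain of devices that are all in bypass."""
    chain = [BypassRegister() for _ in range(devices)]
    out = []
    for bit in bits:
        for device in chain:
            bit = device.clock(bit)
        out.append(bit)
    return out
```

Shifting `[1, 0, 1, 1]` through one device gives `[0, 1, 0, 1]`, and through two devices gives `[0, 0, 1, 0]`, so each bypassed device adds exactly one clock of delay to the chain.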

evanh · 2016-02-14 06:38

jmg wrote: »

evanh wrote: »

2-bit meaning TDO+TDI, correct?

Not quite, JTAG modulates 2 outputs, TDO and TMS during the State-sequence part, then usually holds TMS, during longer data phases.
JTAG also requires variable bit lengths.
TDI is the one-bit-wide reply, and that is Duplex.

Ordinary SPI slaves are nominally duplex and handle varied bit timing. Anything less is not full SPI.

evanh wrote: »

And bypass probably means as a slave SPI device.

Not quite, JTAG bypass is a special State sequence on TDO and TMS. Once in that state, JTAG slave device looks like a single D-FF

I've managed to find this now you've described it, not something I've noticed before but I see why it exists given JTAG can heavily use SPI's daisy chaining abilities.

Again, JMG, you should be asking for the enabling features, with reference to products maybe, rather than asking for the products themselves.

... That's why this is a good test case, to help define what can, and cannot be done.

Agreed.

potatohead · 2016-02-14 07:17

To be fair here, it may be those enabling features, in this design context, are not obvious. What Chip translated our chatter into sure wasn't obvious, I'll bet for most all of us.

This is part of why I mentioned it being worth it to attempt some things. What will fall out of that is much closer to those enabling features, and that is high value.

jmg · 2016-02-14 08:20

evanh wrote: »

Again, JMG, you should be asking for the enabling features, with reference to products maybe, rather than asking for the products themselves.

As mentioned above, with so little known about just what the limits are, the best way to find those is to mention real product usage cases, and find what 'fits'.

'Enabling features' is too vague, for multiple people to correctly understand.
Real applications are far easier for everyone to follow.

evanh · 2016-02-14 09:52

Yet you correctly listed such features once I asked the right questions.

Brian Fairchild · 2016-02-14 11:32

jmg wrote: »

'Enabling features' is too vague...

And won't sell chips.
