SmartPin USB (was: SmartPin state machine modes)
Seairth
Posts: 2,474
I'm starting a separate thread, as I suspect there will be quite a bit of discussion on this particular feature.
To start it off, I wanted to expand upon one of the functions Chip has described in the docs (link below), mostly because it took me a little while to realize what was going on. And I'm still not quite sure I've got it.
In the 2-bit, 1-pattern variant, it turns out that six of the opcodes are deliberately paired such that their behavior is complementary (Z=1/-1, Z=inc/dec, Z=shr/shl), so toggling the LSB effectively selects between complementary behaviors. What I'm a little confused about is how the LSB is toggled. First, I understand that opcode[3] must be set to 1. From there, the actual toggling behavior is controlled by X[26], Y[26], and Y[31:27]. These basically control when to toggle the LSB state.
Where I'm getting stuck is understanding what happens next. My current guess is that the state machine has two states, 0 (initial) and 1. Each time an opcode triggers a "next state" transition, the effective behavior of the opcodes switches to the complementary behavior. In other words, the document states that %xx100 increments Z and %xx101 decrements Z. If the input selects Y[25:21] (which, let's say, I had set to %xx100), the actual behavior would be:
{Y[25:22], Y[21] ^ state}
In other words, in State 0, Y[25:21] would be treated as an INC Z, while in State 1, it would be treated as DEC Z. If Y[25:21] had been initially set to %xx101, then it would be DEC Z and INC Z, respectively.
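To make the guess above concrete, here is a minimal Python sketch of the `{Y[25:22], Y[21] ^ state}` expression. It only models the XOR of the opcode's LSB with a 1-bit state; it does not model when the state toggles (the X[26], Y[26], Y[31:27] controls), and the function and constant names are made up for illustration, not taken from the docs.

```python
def effective_opcode(opcode: int, state: int) -> int:
    """Model of {opcode[4:1], opcode[0] ^ state} for a 5-bit opcode.

    Hypothetical helper: the upper four bits pass through unchanged and
    only the LSB is XORed with the 1-bit state.
    """
    return (opcode & 0b11110) | ((opcode ^ state) & 1)


# %xx100 / %xx101 from the post, with the "xx" bits arbitrarily set to 0.
INC_Z = 0b00100
DEC_Z = 0b00101

for state in (0, 1):
    for name, opcode in (("INC_Z", INC_Z), ("DEC_Z", DEC_Z)):
        eff = effective_opcode(opcode, state)
        print(f"state={state} programmed={name} effective=%{eff:05b}")
```

Under this model, a field programmed as %xx100 acts as INC Z in state 0 and DEC Z in state 1, and a field programmed as %xx101 does the opposite.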
Is this correct? Or am I totally misunderstanding how states are used for the 2-bit, 1-pattern variant?
https://docs.google.com/document/d/10qQn_-B7avY2ce0N1MDDdzOF1lACPNWUJkjHFyzITiY/edit#bookmark=kix.13lo3hd6iafi
Comments
Quadrature encode has been mentioned.
What about quadrature encode with index clear?
Or external Pin A edge count, with Pin B edge capture?
Manchester/BiPhase/USB encode/decode?
Bit stuff or destuff?
etc.
After I get the serial modes documented, I'll make some state machine examples.
That's right.
Ugh. I keep forgetting that both A and B could be set to other pins. So, it would be possible to configure pin 2 as:
PINSETM ##%1_00_10111_0111_0110_00_0_0000000000000, #2
mode : %10111 (Custom state machine, 1-bit, 2-pattern, with output)
Input A : Pin 0
Input B : Pin 1
Output Z : Pin 2
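As a quick sanity check on the literal above, the following Python sketch splits it at its underscores and reassembles the 32-bit value. Only the %10111 mode group is labeled in the post; the meaning of the other groups is not assumed here, and `split_fields` is a hypothetical helper, not part of any P2 tool.

```python
LITERAL = "%1_00_10111_0111_0110_00_0_0000000000000"


def split_fields(literal: str):
    """Return (value, [(width, field_value), ...]) for a %-prefixed binary literal.

    Fields are the underscore-separated groups, most significant first.
    """
    groups = literal.lstrip("%").split("_")
    bits = "".join(groups)
    assert len(bits) == 32, "expected a 32-bit immediate"
    return int(bits, 2), [(len(g), int(g, 2)) for g in groups]


value, fields = split_fields(LITERAL)
print(hex(value))  # 0x97760000
for width, field in fields:
    print(f"{width:2d} bits: %{field:0{width}b}")
```

The third group is the 5-bit mode, %10111, which lands in bits 28..24 of the assembled value.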
It's a bit simplified to not make it look too complicated ;-)
Andy
You nailed it, Andy!
I don't know if this design is the ideal of what could be accomplished, given the resource expenditure, but it's good for simple counting and shifting.
First impression is that it's very complicated and yet can't do very much.
I'm sure I'm wrong about that though.
I guess, like jmg, I could use an example of what this could be used for...
Once you understand it, you'll see it's not that complicated, at all. It's just something new.
Where does QuadSPI fit into the P2's resources?
Is that a Streamer feature, or can the State engine manage it?
The trick will be in setting up a SmartPin for generating the master clock that is nicely aligned with the Streamer's data phases.
Yes, Andy, thank you for the diagram.
Fingers crossed QuadSPI is not that bad. SW shuffling would kill performance.
Hopefully, the Streamer has a native 4 bit mode.
There could be a place for Byte -> QuadSPI in DDR modes, where the streamer thinks in bytes, but the Pin-cells Mux 8:4 with the SPI CLK, to allow Double Data Rate SPI.
Good luck marketing that line, on a device with claimed Smart Pins.
If 'performance' measures exclude mA/Bus MHz, and ignore the extra layers of software to debug via COG Ping-Pong, then you might be right to loftily claim "won't hurt performance at all".
We did this before on the hot chip. The expectation that everything happens in one COG, or even just in the background of that COG, isn't realistic.
If it's fast, one or two COGs won't matter. And the Smart Pins are very useful regardless of how this case works out. On a single CPU, it does kill performance, but on a 16-CPU system, it won't.
That matters.
As for debug, we got really great features added for the express purpose of making that task easier, single COG or not.
Finally, streamer output has 4-bit LUT modes. It seems to me the LUT can pre-shuffle data to align with pins. Input does not offer that, but the LUT could also be used to avoid some of the instructions.
Maybe it makes sense to attempt some of these things rather than continue to press for dedicated hardware. There are a lot of features on this chip.
The expectation is that bit level stuff will be managed in hardware.
There are UARTS and SPI ports already, in hardware. As well as PWM and Capture, in hardware.
There is a Streamer, in hardware.
All of that gives an expected performance level, and users really do not expect gaps, where they suddenly have to change to software band-aids, especially on mainstream serial modes like Quad SPI.
There are a lot of users and it's pretentious to speak for them all.
Edit: I'm not speaking to the validity of the desired feature here as much as I am the idea of "the market" and "what will sell" being way too easy to drop out there as a qualification or justification.
If we applied that across this design, it would not exist!
Getting back onto the State Machine topic...
Still trying to grasp examples where this can fill in a gap.
Let's look at another use case, of 2 bit SPI and JTAG.
Unlike mainstream QuadSPI, which really does need to run FAST, DualSPI may be fringe enough to consider software support (if the streamer cannot support it in native form).
Can the State engine help with JTAG write?
Could the State engine manage JTAG bypass and JTAG read?
Or is that just a little too complex for the State engine?
JTAG read/write meaning as SPI master? Those should be easy.
JTAG also requires variable bit lengths.
TDO is the one-bit-wide reply, and that is duplex.
Not quite. JTAG bypass is a special state sequence on TDI and TMS. Once in that state, the JTAG slave device looks like a single D-FF.
Again, not quite. JTAG write is probably the easier one, as that can tolerate speed changes with SW.
JTAG read would require the State engine to be able to follow a JTAG sequence, which I suspect may be beyond the State engine's capability.
That's why this is a good test case, to help define what can, and cannot be done.
I've managed to find this now that you've described it. It's not something I'd noticed before, but I see why it exists, given that JTAG can make heavy use of SPI's daisy-chaining abilities.
Again, JMG, you should be asking for the enabling features, with reference to products maybe, rather than asking for the products themselves.
Agreed.
This is part of why I mentioned it being worth it to attempt some things. What will fall out of that is much closer to those enabling features, and that is high value.
'Enabling features' is too vague, for multiple people to correctly understand.
Real applications are far easier for everyone to follow.
And won't sell chips.