SmartPin USB (was: SmartPin state machine modes)

cgracey · 2016-02-17 20:09

jmg wrote: »

cgracey wrote: »

Jmg was asking about logic usage for different modes.

Here is a table from Quartus of usage for two smart pins. The first column is 'Logic Cells' and the second column is "Dedicated Logic Registers".

Thanks for the table.
The key question around the state logic is: how much does that shrink if you comment out the state mode?

With all the muxes, is the pin cell still comfortably faster than the critical path? i.e. does it clock easily at >> sysclk in NCO modes?

That's a good way to determine the real size of the programmable mode.

The pin cell actually is not far from the critical path because of the programmable state mode. There are many layers of muxes in that block.

cgracey · 2016-02-18 07:21

.

cgracey · 2016-02-18 07:25

.

cgracey · 2016-02-18 07:32

I recompiled without the programmable state mode to see what the difference was.

Here is the Cyclone IV with the programmable state mode:

Here it is without:

Looks like a net difference of ~210 LEs, or ~20%. So, the programmable state mode takes about 1/5th of the smart pin logic.

Ariba · 2016-02-18 09:00

cgracey wrote: »

Andy, do you feel these custom modes are arranged in a useful manner? Could we get better functions out of this much logic? I just figured counting and shifting were likely useful, but maybe a little twist could make this a lot better.

I study some use cases like Manchester decoding for Ethernet, or receiving the output of a MEMS microphone or an external SigmaDelta ADC. Also 12MHz USB receive would be cool, but I think that needs a lot of additions.

The question is: Does it make sense to do that with Smartpins or will USB and Ethernet anyway be done with external chips (Wiznet, FTDI)?
Should we add some specialized logic to support for example USB, or must all be generic usable?

What do others expect of the custom Smartpin modes?

Andy

Bob Lawrence (VE1RLL) · 2016-02-18 09:17

Andy,

Excellent diagrams and example. Thanks for both!

I had a quick look at the example and got stuck on this line:

"' ST=1 Bp Bc Ap Ac Posedge detector on B inp"

I was trying to understand where it is done in the code example, but I don't see it yet. I have to figure out what Bp, Bc, Ap and Ac refer to. I don't see any reference to Posedge in the code, so it must be in the mode setup. It's too late tonight, but tomorrow I'll check the docs, then print the example off and look again. After my morning coffee, maybe a light bulb will light up LOL.

cgracey · 2016-02-18 09:48

Bob Lawrence (VE1RLL) wrote: »

Andy,

Excellent diagrams and example. Thanks for both!

I had a quick look at the example and got stuck on this line:

"' ST=1 Bp Bc Ap Ac Posedge detector on B inp"

I was trying to understand where it is done in the code example, but I don't see it yet. I have to figure out what Bp, Bc, Ap and Ac refer to. I don't see any reference to Posedge in the code, so it must be in the mode setup. It's too late tonight, but tomorrow I'll check the docs, then print the example off and look again. After my morning coffee, maybe a light bulb will light up LOL.

Bp - B previous
Bc - B current
Ap - A previous
Ac - A current

cgracey · 2016-02-18 09:50

Ariba wrote: »

cgracey wrote: »

Andy, do you feel these custom modes are arranged in a useful manner? Could we get better functions out of this much logic? I just figured counting and shifting were likely useful, but maybe a little twist could make this a lot better.

I study some use cases like Manchester decoding for Ethernet, or receiving the output of a MEMS microphone or an external SigmaDelta ADC. Also 12MHz USB receive would be cool, but I think that needs a lot of additions.

The question is: Does it make sense to do that with Smartpins or will USB and Ethernet anyway be done with external chips (Wiznet, FTDI)?
Should we add some specialized logic to support for example USB, or must all be generic usable?

What do others expect of the custom Smartpin modes?

Andy

It would be neat to get USB and Ethernet receive working. I feel like each of these would take a few weeks, though, and I'm already late.

Right now, I'm trying to figure out how I can steal 3 bits to get that low-pass filtering implemented. Also, how will I add 5 bits for async serial bit count?

Ariba · 2016-02-18 09:50

Bob

Ac,Ap,Bc,Bp = A-current, A-previous, B-current, B-previous
These are the 4 inputs into the LUT.
Previous means that the input is delayed by one clock cycle (the FF in the diagram).

To make a pos-edge detector on B: if B-current is high and B-previous is still low, we are on the first clock after a rising edge (later they are both high, and before they are both low).

So the LUT must be programmed with that pattern:

      Bp Bc Ap Ac | Out   Bit in
input 3  2  1  0  |       X[15..0]
---------------------------------
      0  0  0  0     0      0
      0  0  0  1     0      1
      0  0  1  0     0      2
      0  0  1  1     0      3
      0  1  0  0     1      4
      0  1  0  1     1      5
      0  1  1  0     1      6
      0  1  1  1     1      7
      1  0  0  0     0      8
      1  0  0  1     0      9
      1  0  1  0     0      10
      1  0  1  1     0      11
      1  1  0  0     0      12
      1  1  0  1     0      13
      1  1  1  0     0      14
      1  1  1  1     0      15

In the source code this is heavily abbreviated ;-) You can see that Bp=0 and Bc=1 happens on input combinations 4..7.
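As a rough illustration, the 16-bit LUT word for a table like this can be computed in a few lines of Python. This is only a sketch: the helper name `lut_from_predicate` is made up here and is not part of any Propeller tool. It assumes, as in the table, that input bit 3 is Bp, bit 2 is Bc, bit 1 is Ap, bit 0 is Ac, and that output bit i of X[15..0] belongs to input combination i.

```python
def lut_from_predicate(pred):
    """Build a 16-bit LUT word: bit i holds the output for input combination i.

    Input index layout (assumed from the table): bit3=Bp, bit2=Bc, bit1=Ap, bit0=Ac.
    """
    word = 0
    for i in range(16):
        bp, bc, ap, ac = (i >> 3) & 1, (i >> 2) & 1, (i >> 1) & 1, i & 1
        if pred(bp, bc, ap, ac):
            word |= 1 << i
    return word

# Positive edge on B: previous sample low, current sample high (A is ignored).
posedge_b = lut_from_predicate(lambda bp, bc, ap, ac: bp == 0 and bc == 1)
print(hex(posedge_b))  # 0xf0, i.e. bits 4..7 set
```

The same helper gives other detectors by changing the predicate, for example `bp == 1 and bc == 0` for a negative edge on B.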

Every pos-edge generates a NextState command. After N such commands State changes. If State=1 the edge INCrements Z, which toggles Z[0] and on next edge State goes back to 0. Z[0] is the output to the pin.

Hope this helps.

Andy

rogloh · 2016-02-18 10:57

I keep looking at Andy's nice diagram showing Chip's design and wondering if there might be some clever way to use Smart pins for USB bit unstuff purposes somehow. But I think some minor changes would certainly be needed with the N counter.

E.g. if we normally operate from X and keep reloading/restarting the N counter each time we see a differential logic 0 and only allow N to hit zero and switch the ST after seeing five differential 1's at which point we temporarily switch our instructions to use Y for one bit time where we then change behaviour and can decide to strip the bit if required (depending on next differential bit value), then unstuffing could be achieved using a NOP (add 0 to Z) before switching back to normal X instruction logic patterns and continuing. But there is still the issue of deciding when to push the data up to the COG. Maybe we really need to preserve N for that too, or in a USB mode we could somehow partition N into two separate counters (one part for unstuff purposes and one 3 bit part for the bit counter of the byte being reassembled). Gets slightly tricky but something may be doable without a lot more state required if you can repurpose the N counter bits slightly in some type of special "USB counter mode". But where could we fit such a USB mode state bit if everything is already allocated in X,Y?

For transmit, I guess USB bit stuffing could be done ahead of time by software using LUTs perhaps in the COG (eg. from the stack RAM) which can identify patterns of 5 1's and translate them to a stuffed version perhaps. Though for receive it would be quite nice to do it on the fly if we could leverage what Chip has got done already in the Smart Pins. This frees up the receiving COG to do other parsing operations that a USB receive process typically would require as compared to transmit.
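For reference, here is a minimal software sketch of bit stuffing and unstuffing on a list of data bits (before NRZI encoding on transmit, after NRZI decoding on receive). It is not tied to the Smart Pin hardware; the function names and the `run` parameter are made up for illustration. The USB specification inserts a 0 after six consecutive 1s, so `run` defaults to 6.

```python
def stuff(bits, run=6):
    """Insert a 0 after every `run` consecutive 1 bits."""
    out, ones = [], 0
    for b in bits:
        out.append(b)
        ones = ones + 1 if b else 0
        if ones == run:
            out.append(0)   # the stuffed bit
            ones = 0
    return out

def unstuff(bits, run=6):
    """Drop the bit that follows every `run` consecutive 1 bits."""
    out, ones, skip = [], 0, False
    for b in bits:
        if skip:
            if b != 0:
                # A 1 here means the sender did not stuff: treat as an error.
                raise ValueError("bit-stuff violation")
            skip, ones = False, 0
            continue
        out.append(b)
        ones = ones + 1 if b else 0
        if ones == run:
            skip = True     # next bit is a stuffed 0
    return out

data = [1] * 7 + [0, 1]
print(stuff(data))                    # [1, 1, 1, 1, 1, 1, 0, 1, 0, 1]
print(unstuff(stuff(data)) == data)   # True
```

A table-driven version, as suggested above for transmit, would precompute `stuff` for each input byte plus the carried-over count of trailing 1s.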

Rayman · 2016-02-18 12:48

Can USB interfacing be tested with a 80 MHz clock?

I thought it had to be multiple of 12 MHz...

Ramon · 2016-02-18 13:58

Ariba wrote: »

The question is: Does it make sense to do that with Smartpins or will USB and Ethernet anyway be done with external chips (Wiznet, FTDI)?

It would be great to bypass a USB or Ethernet external IC, but I am not sure this could be possible.

The problem with USB is not only the signal complexity (SE0) but also the protocol complexity (defining endpoints, etc.). Just take a look at some USB protocol analyzer captures to see what is transmitted. Also, HW protocol analyzers are not cheap.

Ethernet looks easier, but has some issues too (baseline wandering on RJ45, scrambling/descrambling, 8b/10b, and FCS calculation).

Optical Ethernet doesn't have the baseline wandering problem, so it may be easier.

BTW, I wonder if anyone is testing the SYNC mode on smart pins. Is SYNC mode documented?
Two days ago neither ASYNC nor SYNC was documented.

Seairth · 2016-02-18 15:22

Rayman wrote: »

Can USB interfacing be tested with a 80 MHz clock?

I thought it had to be multiple of 12 MHz...

Since it looks like full speed (12Mb/s) and low speed (1.5Mb/s) are mostly distinguished by which data line is pulled high, shouldn't we be able to prove out the feasibility of USB at low speed?

Rayman · 2016-02-18 15:54

It would be great if P2 could connect to USB mouse and keyboard directly.
People were able to hack this a bit with P1, but had to use a 6 MHz clock...
I remember somebody did a lot of code to get P1 to act like USB keyboard this way...

Bob Lawrence (VE1RLL) · 2016-02-18 17:37

@Rayman

re:I remember somebody did a lot of code to get P1 to act like USB keyboard this way...

Working full-speed (12 Mb/s) bit-banging USB Host controller
http://forums.parallax.com/discussion/121321

IT’S ALIVE: BIT-BANGING FULL-SPEED USB HOST FOR THE PROPELLER
http://scanlime.org/2010/04/its-alive-bit-banging-full-speed-usb-host-for-the-propeller/

Rayman · 2016-02-18 18:03

That's the one... See now scanlime got host mode working somewhat too.
But, in the fine print, you'd see they had to use 6 MHz crystal.

If that can be done with P1, surely it can be done better with P2.
The internal pullup resistors should make it even easier, I think...

potatohead · 2016-02-18 18:12

Honestly, doing that makes sense. I know Chip is a little late, but maximizing the Smart Pin seems worth it.

I'm not sure I could complete the task in a reasonable time, but if someone does, and we all work with it some, we may find the existing functionality is good, or we may find something else, maybe a trade off to make a big difference.

This chip runs at a few multiples of a P1 at the same clock, and it's got a lot more bit bashing type instructions.

Software USB at the higher speed may be possible in a COG now, and it would seem for sure at real hardware clock rates, right?

Given the bigger RAM and double COGS, is it really that important to have this in hardware as opposed to a potentially nice hardware assist, or even full on software?

Once it's done, people won't have a lot of trouble using it. That is the strength of the COG model, unchanged and enhanced over P1.

Given the logic cost and x64 multiplier, it sure seems like it is worth a little time to attempt a few things to understand the real benefits.

My .02 on it.

jmg · 2016-02-18 18:43

Ariba wrote: »

.... Also 12MHz USB receive would be cool, but I think that needs a lot of additions.

The question is: Does it make sense to do that with Smartpins or will USB and Ethernet anyway be done with external chips (Wiznet, FTDI)?
Should we add some specialized logic to support for example USB, or must all be generic usable?

What do others expect of the custom Smartpin modes?

Based on examples of use so far, the State-option seems to give a poor return on Logic-Cost.
That's a lot of LEs that might do more useful work ?

Looking at USB, starting from the simplest Logic-Helpers

SE0 - Single AND gate ?
Differential IO - I think this may already be there ?
Comparator RX - Already there ?
NRZI - bit level encode/decode - Single FF + XOR, with Config Bit to Enable
Bit Stuff/Destuff - Small counter/state engine/ShiftEnable gate
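
A small software model of the two simplest helpers in this list, as a sketch only: in USB's NRZI coding a data 0 is sent as a change of line state and a data 1 as no change, and SE0 is both lines low. The function names are made up for illustration, and line state 1 is assumed to mean J (idle) and 0 to mean K.

```python
def nrzi_encode(bits, idle=1):
    """Data bits -> line states: a 0 toggles the line, a 1 leaves it unchanged."""
    out, level = [], idle
    for b in bits:
        if b == 0:
            level ^= 1
        out.append(level)
    return out

def nrzi_decode(levels, idle=1):
    """Line states -> data bits: one stored previous state plus an XNOR."""
    out, prev = [], idle
    for lv in levels:
        out.append(1 if lv == prev else 0)
        prev = lv
    return out

def is_se0(dp, dm):
    """Single-ended zero: both D+ and D- are low."""
    return dp == 0 and dm == 0

sync = [0, 0, 0, 0, 0, 0, 0, 1]          # SYNC field as data bits
print(nrzi_encode(sync))                  # [0, 1, 0, 1, 0, 1, 0, 0] -> K J K J K J K K
print(nrzi_decode(nrzi_encode(sync)))     # [0, 0, 0, 0, 0, 0, 0, 1]
```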

USB also needs to Lock a Sample point to the Edges - DPLL. Mostly this is 48MHz state engine.

Many of the simpler SW kludges lack DPLL, and also have many other caveats that limit them from Commercial use.

Once that (minimal?) HW is in place, from here up, SW seems natural. CRC ?

Would USB use TWO Pin Cells, one for Rx and one for Tx - attached to same 2 pins ?
eg the BitStuff done on Tx and Destuff on Rx
Based on present SPI and UARTs that seems the natural mapping, and may allow less Logic per-cell ?

Would a Dual-path system pass a blanking signal between cells, so Own-Tx does not fill Own-Rx buffer ?
Some cross-cell handshake seems likely.

Q: Is the USB target Hobbyist only, or should Commercial use be a focus ?

kwinn · 2016-02-18 18:56

potatohead wrote: »

Honestly, doing that makes sense. I know Chip is a little late, but maximizing the Smart Pin seems worth it.

I'm not sure I could complete the task in a reasonable time, but if someone does, and we all work with it some, we may find the existing functionality is good, or we may find something else, maybe a trade off to make a big difference.

This chip runs at a few multiples of a P1 at the same clock, and it's got a lot more bit bashing type instructions.

Software USB at the higher speed may be possible in a COG now, and it would seem for sure at real hardware clock rates, right?

Given the bigger RAM and double COGS, is it really that important to have this in hardware as opposed to a potentially nice hardware assist, or even full on software?

Once it's done, people won't have a lot of trouble using it. That is the strength of the COG model, unchanged and enhanced over P1.

Given the logic cost and x64 multiplier, it sure seems like it is worth a little time to attempt a few things to understand the real benefits.

My .02 on it.

I agree that on chip usb would be a worthwhile addition, and cannot help wondering if this functionality could be shared between the pins and the cogs. After all a cog and some software is needed so having all the hardware on every pin is overkill.

IOW would it be possible to take advantage of the pin logic by adding a bit more logic to the cogs for hardware assisted full speed usb?

jmg · 2016-02-18 19:28

Rayman wrote: »

Can USB interfacing be tested with a 80 MHz clock?

I thought it had to be multiple of 12 MHz...

Multiples of 12 make life easier, and most use 48MHz as a min of 4 samples per bit.
Choices above 48MHz and close to 80MHz are then 72MHz or 84MHz, but an 80MHz clock could be used with /7 /7 /6 (needs 5 registers)
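
The /7 /7 /6 pattern follows from the arithmetic: 80MHz / 12MHz = 20/3 clocks per bit, so three bit times take exactly 20 clocks, and 7 + 7 + 6 = 20. A short Python sketch of that calculation (the function name `divider_pattern` is made up for illustration):

```python
from fractions import Fraction

def divider_pattern(sysclk_hz, bit_hz):
    """Whole-clock divider sequence whose sum matches the exact clocks-per-bit ratio."""
    ratio = Fraction(sysclk_hz, bit_hz)      # exact clocks per bit, e.g. 20/3
    period = ratio.denominator               # the pattern repeats after this many bits
    # Bit boundary k falls at ceil(k * ratio) clocks.
    edges = [-((-k * ratio.numerator) // ratio.denominator) for k in range(period + 1)]
    return [edges[k + 1] - edges[k] for k in range(period)]

print(divider_pattern(80_000_000, 12_000_000))  # [7, 7, 6]
print(divider_pattern(48_000_000, 12_000_000))  # [4]
```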

Bob Lawrence (VE1RLL) · 2016-02-18 19:37

@ Ariba

Thanks! Andy. That sure helps

Bob
====================================

Bob

Ac,Ap,Bc,Bp = A-current, A-previous, B-current, B-previous
These are the 4 inputs into the LUT.
Previous means that the input is delayed by one clock cycle (the FF in the diagram).

To make a pos-edge detector on B: if B-current is high and B-previous is still low, we are on the first clock after a rising edge (later they are both high, and before they are both low).

So the LUT must be programmed with that pattern:

      Bp Bc Ap Ac | Out   Bit in
input 3  2  1  0  |       X[15..0]
---------------------------------
      0  0  0  0     0      0
      0  0  0  1     0      1
      0  0  1  0     0      2
      0  0  1  1     0      3
      0  1  0  0     1      4
      0  1  0  1     1      5
      0  1  1  0     1      6
      0  1  1  1     1      7
      1  0  0  0     0      8
      1  0  0  1     0      9
      1  0  1  0     0      10
      1  0  1  1     0      11
      1  1  0  0     0      12
      1  1  0  1     0      13
      1  1  1  0     0      14
      1  1  1  1     0      15

In the source code this is heavily abbreviated ;-) You can see that Bp=0 and Bc=1 happens on input combinations 4..7.

Every pos-edge generates a NextState command. After N such commands State changes. If State=1 the edge INCrements Z, which toggles Z[0] and on next edge State goes back to 0. Z[0] is the output to the pin.

Hope this helps.

Andy

jmg · 2016-02-18 19:45

kwinn wrote: »

I agree that on chip usb would be a worthwhile addition, and cannot help wondering if this functionality could be shared between the pins and the cogs. After all a cog and some software is needed so having all the hardware on every pin is overkill.

Certainly the pins need just enough hardware to have this work at 12MHz, and no more.
eg CRC is one item that may push up into a COG.
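
To give a feel for how small the CRC part is in software, here is a bit-serial sketch of the USB token CRC5 in Python. It follows the scheme in the USB specification as I understand it: generator x^5 + x^2 + 1, register preset to all ones, data fed in wire order (LSB first per field), remainder inverted and sent MSB first, and a receiver residual of 01100. The function names are made up for illustration.

```python
def crc5_remainder(bits):
    """Bit-serial CRC5, generator x^5 + x^2 + 1, register preset to all ones."""
    reg = 0x1F
    for b in bits:
        feedback = ((reg >> 4) & 1) ^ b
        reg = (reg << 1) & 0x1F
        if feedback:
            reg ^= 0x05
    return reg

def crc5_bits(bits):
    """CRC as sent on the wire: inverted remainder, MSB first."""
    inv = crc5_remainder(bits) ^ 0x1F
    return [(inv >> i) & 1 for i in range(4, -1, -1)]

def crc5_ok(bits_with_crc):
    """Receiver check: running the CRC over data plus CRC leaves 01100."""
    return crc5_remainder(bits_with_crc) == 0x0C

def lsb_first(value, width):
    return [(value >> i) & 1 for i in range(width)]

# Token payload: 7-bit address 0x15 and 4-bit endpoint 0xE (example values).
token = lsb_first(0x15, 7) + lsb_first(0xE, 4)
frame = token + crc5_bits(token)
print(crc5_bits(token))   # [1, 0, 1, 1, 1]
print(crc5_ok(frame))     # True
```

The CRC16 used for data packets has the same structure with a 16-bit register and a different generator, so a table-driven version in a COG looks feasible.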

I cannot yet find a number on the USB tolerated turn-around time ?

eg after a Host sends a block, (finishing in CRC5 or CRC16 ) and expects an ACK from slave, how long can that take ?
There is significant latency between COG and PINs with the Pin Cells, and any intra-message SW checks will only slow that further.

http://www.usbmadesimple.co.uk/ums_3.htm

I see mention of idle state during turnaround, so this seems to be multi-cycle (12MHz) tolerant, but I'd guess the 1ms SOF Packet should never be crossed. Delays up to single digit microseconds might be ok ?

jmg · 2016-02-18 19:58

Ramon wrote: »

It would be great to bypass a USB or Ethernet external IC, but I am not sure this could be possible.

As a sanity check here, USB MCUs have decreased in price while the P2 has been in gestation, and an 8k Flash USB FS Device MCU is 86c 1 off or 64c / 1500+
- that covers Device, but Host is not quite as cheap.
USB OTG as a filter bumps MCU base price to ~$1.57/1500

Ariba · 2016-02-18 20:33

USB lowspeed can be done in software, this was also possible on P1.
USB highspeed is out of question.
USB fullspeed (12MBit/s) needs some Smartpin support if we want to do it efficiently.

It may be possible to receive the raw USB bitstream with the streamer and analyse it later per software, while we send NAKs back to the Host to gain some time for the software. But that is not efficient.

If the Smartpins can do the low level stuff that rogloh and jmg described, and the cog gets a full decoded byte, then CRC and protocol handling should be doable in software.
If I understand it right, jmg proposes a specialized peripheral that does bit level handling. But will that be on every pin? This may add more logic than the custom smartpin modes and is not flexibly usable.

I think with some little tweaks here and there it should be possible to do it with the existing resources - but we will use more than 2 pins, some pins will only output some intermediate decoding results which then get further processed by another pincell.

rogloh already has done a proposal, I have another idea - need to draw a diagram of it...

Andy

tonyp12 · 2016-02-18 20:37

P2 smart pins should have been designed for USB 2.0 in the first place, similar to what an SMSC ULPI Physical Layer Transceiver (PHY) does,
and then every function made an option with settings, so that it is generic enough to do SPI/PWM etc.

Not the other way around, as is being done now: trying to make USB out of SPI/PWM hardware.

You can get Silabs ARM with USB2.0 (comes with factory installed usb boot loader) for $1 each

jmg · 2016-02-18 21:14

Ariba wrote: »

If I understand it right, jmg proposes a specialized peripheral that does bit level handling. But will that be on every pin? This may add more logic than the custom smartpin modes and is not flexibly usable.

I see broadly 4 Pin-support items, two are very low cost in Logic, and the next two are modest. (<12 FF + Config)
It should be much less than the 200+ LE that State-Mode consumes currently.

I think 2 USB Pins can map to two Pin Cells, so some logic may be able to be saved, and some useful queueing done
(eg for your NACK where SW is still busy)

or perhaps one Pin cell does half-duplex Tx-Rx in a ping-pong manner and the other pin cell does CRC-shifter + DPLL ?

Ariba wrote: »

I think with some little tweaks here and there it should be possible to do it with the existing resources - but we will use more than 2 pins, some pins will only output some intermediate decoding results which then get further processed by another pincell.

I'm not sure more than 2 pins is necessary ?

cgracey · 2016-02-18 23:00

Two pins should be able to handle it, especially if odd and even pins had different USB smarts.

Something I've always wanted to see, but have never found anywhere, is a diagram of USB protocol from the wire level up. If there was something definitive to look at, this could be easy. Diving deep into the huge USB specification to try to construct such information by making lots of inferences has put me off. If I wanted someone to know how something worked, I would explain it in very direct terms. For whatever reason, these protocol standards are never written like that.

Ariba · 2016-02-18 23:10

jmg wrote: »

...
I'm not sure more than 2 pins is necessary ?

With the existing modes you cannot do it with 2 smartpins. With a dedicated USB mode it surely needs only 2 pins, unless the following idea is worth a third pin:

An NCO should be able to generate the bitclock for USB with a little tweak. The NCO Frequency mode must get a variant that allows resetting the phase register on edges of input A or B. There are 16 bits left in X to configure such a mode.
The nice thing is that USB will work with every crystal/sysclock frequency, maybe even with the internal oscillator, because the bitclock gets synchronized on every edge in the USB datastream.
If there is no edge, the NCO runs free with the programmed frequency. This happens for max. 6 bits, because then follows the stuffbit, which always generates an edge (that's the purpose of the stuffbit).

Here is a timing diagram:

If there is no dedicated USB smartmode, a second synchronized NCO with 1/6 the frequency can be used to detect a stuffbit (the green part in the diagram).

The synchronous RX mode then needs a variant that does NRZI->NRZ translation (an Xor + FlipFlop) and disables shift-in on a stuffbit.
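
The edge-synchronized NCO idea can be tried out in software first. The following Python sketch is only a model under my own assumptions, not the Smart Pin logic: a 32-bit phase accumulator is reset to 0 on every edge of the sampled line, and the line is sampled when the phase crosses the half-way point (mid-bit). The names `expand` and `nco_sample` are made up for illustration.

```python
def expand(levels, durations):
    """Oversample per-bit line levels; `durations` (sysclk ticks per bit) is cycled."""
    line = []
    for i, lv in enumerate(levels):
        line.extend([lv] * durations[i % len(durations)])
    return line

def nco_sample(line, sysclk_hz, bit_hz, width=32):
    """Recover one sample per bit from a line sampled once per sysclk tick."""
    mask = (1 << width) - 1
    half = 1 << (width - 1)
    inc = ((bit_hz << width) + sysclk_hz // 2) // sysclk_hz   # rounded NCO increment
    phase, prev, bits = 0, line[0], []
    for lv in line:
        if lv != prev:
            phase = 0                     # edge seen: re-align to the bit boundary
        else:
            new = (phase + inc) & mask
            if phase < half <= new:       # phase crossed mid-bit: take a sample
                bits.append(lv)
            phase = new
        prev = lv
    return bits

# Line levels with a run of six identical levels (no edges), at 80MHz sysclk.
levels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 1]
line = expand(levels, [7, 7, 6])          # 20 clocks per 3 bits = 12MHz at 80MHz
print(nco_sample(line, 80_000_000, 12_000_000) == levels)   # True
```

The recovered levels would then go through the NRZI decode and stuffbit removal described above.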

Andy

cgracey · 2016-02-18 23:35

Thanks, Ariba. That's all understandable. Do the input and output modes follow the same protocol?

I know that on top of that, there's some CRC stuff going on. This may only take 50 LE's to implement. All I need to know is what is needed, exactly, with consideration of what would be done in hardware and what would be done in software. I just don't know enough, yet.

Is there anything one can read to get the whole full-speed (12MHz) USB picture? There are probably just several things we'd need to implement.

Rayman · 2016-02-19 00:03

I'd like to see such a thing too... I've read about USB stuff at various times, but still don't really understand it...

I have to imagine USB stuff would be a lot easier just using a 6 or 12 MHz crystal though.
Is this a problem?
