P2 and full speed USB slave requirements/ideas

jmg · 2014-03-09 14:57

jmg wrote: »

Another approach could be to code USB for 1.5MHz, and focus on compact block 'macros' that do the Dual-Bit and DeStuff operations, which Chip could then turn into opcodes that allow a jump to 12MHz?

Because a code picture is worth 1000 words (applying this rule backwards), here is some (untested) Verilog for the suggested DeStuff-Jump opcode:

// DS_Shift_JEQ8 Reg,Adr   Opcode  => 3 Fields mapped into active Register, does Skip and Count, and exits on 8 valid bits
// Din = NRZI-decoded Rx bit, CLK = bit-sample clock, Q = the active register (declared elsewhere)
reg [9:0] ShiftB;  // can be as small as 7b, but 10b is full raw copy, might be useful ?
reg [7:0] DataBY;
reg [2:0] BitCtr;
reg       DoSwallow, JExit, Z_FLag_Err;

assign  Q[31:22]  = ShiftB[9:0];  // Field 3 = copy Shifter, No skip
assign  Q[18:16]  = BitCtr[2:0];  // Field 2 = Bit Counter, skips,  exit on last-Rx-bit
assign  Q[7:0]    = DataBY[7:0];  // Field 1 = RxByte bits, skips

always  @(ShiftB) // the USB transmitter inserts a 0 after six sequential 1's; detect it here
begin
  DoSwallow = (ShiftB[5:0] == 6'b111111); // next bit is a stuff bit, so skip it
end

always  @(BitCtr or DoSwallow)
begin
  JExit  =  (BitCtr == 3'b111) & !DoSwallow; // this clock edge is LAST shift/inc, so exit too
end

always  @(posedge CLK)
begin
  Z_FLag_Err <= (ShiftB[6:0] == 7'b1111111);  // Store overflow (7 ones = stuff error), optional.
  ShiftB <= {ShiftB[8:0],Din};   // live raw bit pattern, no skip
  if ( DoSwallow ) begin         // .CE Hold Ctr, No Shift == Skip this Stuff-bit
    BitCtr <= BitCtr;
    DataBY <= DataBY;
  end
  else begin                     // VALID bit, so INC and Do Shift
    BitCtr <= BitCtr + 1;
    DataBY <= {Din,DataBY[7:1]}; // USB sends LSB first, so shift in from the top
  end
end

Can be used rolled, or unrolled, but needs inversion on JUMP,
or it could be patched into SerDes, with PairSample, for USB_ByteRX

' Pseudo ASM code, 2 USB helper opcodes.
	Destuff_d = 0 ' Init all 3 fields and Z
'Start Loop:
	PairSampleOpcode  ' or can include the Jump ?
	JumpIf_SE0
	DS_Shift_JEQ8   Destuff_d, ByteDone   'updates 3 fields, and jumps if Field_N_Bits = 8 bits, Sets Z on Error.
	JNZ Destuff_ERR ' check if last DeStuff had an error
	PairSampleOpcode
	JumpIf_SE0
	DS_Shift_JEQ8   Destuff_d, ByteDone   'updates 3 fields, and jumps if Field_N_Bits = 8 bits, Sets Z on Error.
	JNZ Destuff_ERR ' check if last DeStuff had an error
	' .. repeat unrolled for 10? (PairSampleOpcode ...)
ByteDone: '8 pin samples with no destuff, 9 or 10 pin samples with 1 or 2 Skips
	WrByte

After DS_Shift_JEQ8 jumps, the register holds 3 fields: lower 8 bits = valid USB data, 3 mid bits as counter (000 on exit) and 10 upper bits as USB raw copy.
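A small software model can help check the skip/count behaviour before anyone commits it to an opcode. The sketch below is Python, not Verilog; `ds_shift_jeq8` is a hypothetical name, it assumes the input bits have already been NRZI-decoded, and it assembles bytes LSB first (the order USB sends them). It swallows the bit after six 1's and flags a seventh 1 as an error, in the spirit of DoSwallow and Z_FLag_Err.

```python
def ds_shift_jeq8(bits):
    """Model of the proposed destuff/shift/count behaviour (hypothetical).

    bits: NRZI-decoded USB bits (0/1) in wire order.
    Returns (list_of_bytes, error_flag).
    """
    out = []
    ones = 0      # run of consecutive 1s (what ShiftB[5:0] detects)
    data = 0      # DataBY
    count = 0     # BitCtr
    for b in bits:
        if ones == 6:
            # Stuff bit: must be 0, and is not data (DoSwallow).
            if b != 0:
                return out, True          # seven 1s in a row: stuff error
            ones = 0
            continue
        ones = ones + 1 if b else 0
        data |= b << count                # USB sends LSB first
        count += 1
        if count == 8:                    # JExit: eighth valid bit
            out.append(data)
            data, count = 0, 0
    return out, False
```

For example, $FF followed by $00 arrives on the wire as six 1's, a stuffed 0, two more 1's and eight 0's, and decodes back to `[0xFF, 0x00]`.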

Cluso99 · 2014-03-09 15:19

Ariba wrote: »

Every Pin has a 1.5k, a 10k and a 100k resistor which can be configured as Pullup or Pulldown. Further, every pin has a comparator that can compare the levels of two pins and output the difference. Some of these were implemented especially for USB years ago when Chip asked what pin hardware is needed for USB.
I don't know what the comparator outputs when both pins are Low. Can we detect this state reliably? We need that to detect the SE0 (end of packet) state. But I have never seen a software USB solution detect an SE1 state as an error case inside the bit receive loop.

For differential output on two pins it is as simple as:
XOR OUTx,MaskDmp
where MaskDmp is the pinmask for D- and D+ pins.

Andy

Thanks Andy. That 1k5 pullup is what is required on the D+ pin for FS (1k5 pullup on D- for LS). That saves a pin.
I think you are correct that SE1 is usually not checked at each bit time. SE1, IIRC, persists for some time, so it's not a real problem.

I wasn't worried about the tx side because it is so much easier to do than rx. Basically if we can do rx then we can do tx. The real P2 will run at least 2x the fpga speed so we will be in a much better position when the real silicon is ready. However, for now if it takes 2 cogs that's fine by me. At least we can get something working enough to prove no further instructions/logic is required. I am fairly certain that the 2 instructions I asked for will be of sufficient help.

SERDES should be able to tx anyway - all we need to do is be able to set the no of bits to be sent and pre-do any bitstuffing into the output buffers.
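For the tx side, pre-doing the bitstuffing amounts to something like the following Python sketch (`usb_tx_prepare` is a hypothetical helper, not a P2 feature). It assumes line levels are written as 1 = J and 0 = K, starts from the idle J state, stuffs a 0 after six 1's (the run carries across byte boundaries) and NRZI-encodes the result; on real pins each 0 bit corresponds to toggling both D+ and D- with the XOR OUTx,MaskDmp trick quoted above.

```python
def usb_tx_prepare(data, level=1):
    """Bit-stuff and NRZI-encode bytes for a tx output buffer (sketch).

    data: bytes to send, each transmitted LSB first.
    level: line state before the first bit, 1 = J (idle), 0 = K.
    Returns the list of line levels (1 = J, 0 = K) to shift out.
    """
    out = []
    ones = 0                      # consecutive 1s, across byte boundaries

    def emit(bit):
        nonlocal level
        if bit == 0:
            level ^= 1            # NRZI: a 0 toggles the line, a 1 holds it
        out.append(level)

    for byte in data:
        for i in range(8):
            bit = (byte >> i) & 1
            emit(bit)
            ones = ones + 1 if bit else 0
            if ones == 6:
                emit(0)           # stuffed bit after six 1s
                ones = 0
    return out
```

As a sanity check, the SYNC byte $80 comes out as KJKJKJKK, i.e. `[0, 1, 0, 1, 0, 1, 0, 0]`, and the count of levels produced is exactly the "no of bits to be sent" a SERDES would need.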

We can always resort to a lookup table for CRC16, but being able to calculate it for each bit as it is read/written pretty much solves this issue.

Sure we may be able to make serdes help, but first I want to understand the precise instructions required to satisfactorily perform the rx by sw bit reading. Then I can look at the top level protocol for endpoints etc. This is the part I don't yet understand although I have seen example code.

BTW 10K pulldowns will most likely work for USB Master. 10K pullups will work for PS2, I2C and lots of other cases. So these internal pullups/pulldowns are going to be a great help to minimise hw.

jmg · 2014-03-09 15:37

On the topic of CRC, the code above for 3 fields, could (just) pack to 4, to include CRC16. (not sure about CRC-5-USB ? - operand bit ?)
DataByte:8 CRC:16 BitCtr:3, leaves 5 for LiveBits, ok if register DoSwallow

There may be some alignment that allows init of BitCtr and LiveBits, without clobbering CRC, and still give Byte read-off.
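To check that the four fields listed above fill exactly 32 bits, here is a Python sketch of one possible layout. The bit positions are an assumption (the post only gives the widths), as are the helper names; `clear_fields` shows the kind of masked init that could reset BitCtr and LiveBits while leaving the running CRC intact.

```python
# Assumed layout (only the widths come from the post): bit offset, width
FIELDS = {
    "data": (0, 8),      # DataByte
    "crc": (8, 16),      # CRC16
    "bitctr": (24, 3),   # BitCtr
    "live": (27, 5),     # LiveBits
}

def pack(**values):
    """Build a 32-bit register value from named fields (missing = 0)."""
    reg = 0
    for name, (shift, width) in FIELDS.items():
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        reg |= value << shift
    return reg

def get_field(reg, name):
    shift, width = FIELDS[name]
    return (reg >> shift) & ((1 << width) - 1)

def clear_fields(reg, *names):
    """Zero the named fields without touching the others."""
    for name in names:
        shift, width = FIELDS[name]
        reg &= ~(((1 << width) - 1) << shift)
    return reg
```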

Cluso99 · 2014-03-09 17:19

Here is the link to my older posts re FS USB and the requested instructions
http://forums.parallax.com/showthread.php/151821-P2-Possible-additional-Instructions?p=1221492&viewfull=1#post1221492
I have reproduced this post here, although there have been later updates to it (need to check what precisely).

Cluso99 wrote: »
Perhaps the above op should be called

GETXP [#]D [WZ],[WC] 'pin into !Z via WZ, xor pin into C via WC (similar to GETP & GETNP)

Just a bit more info: the bit-banging USB FS RX sequence for each bit currently is..
              waitcnt   time, bittime           ' wait for next mid-bit sample time 
              test      K, ina          wz      ' read usb pin
              muxz      bits, bitmask   wc      ' b30 (mux mask for rx inbound xor register)
              shl       bits, #1                ' shift new xor'd in bit to b31 (to prev bit)
              test      JK, ina         wz      ' SE0 ? (ie EOP ?)
        if_z  jmp       #waitforend             ' y: wait for end
              rcl       data, #1                ' accumulate bit into data byte
              rcl       stuff, #6       wz      ' accumulate 6 bit blocks. If zero we need to unstuff next bit
              ' There is no time to accumulate the crc16 here. A special 1bit crc instruction as suggested in the first post would help here.
        if_z  call      #unstuff 
If the special instruction did the following...

GETUSB [#]D WZ,WC
where
D = pin no (0..127)
C = C XOR PINx
Z = ! ( PINx OR PINy ) 'ie ZERO if both PINx and PINy are ZERO; PINy = PINx XOR #1
Note1: PINx and PINy are a pair of pins. If PINx is even then PINy := PINx + 1 else if PINx is odd then PINy := PINx - 1
- The allowance for the PINx/PINy pair to be reversed is for USB LS vs FS/HS, where J/K are effectively swapped between D-/D+.
Note2: WZ & WC could be permanently set on if required.
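To pin down the proposed flag semantics, here is a Python model of GETUSB as described (the function name and the `pins` list are illustrative only, not a real API). Note that the pair rule in Note1 is simply PINy = PINx XOR 1.

```python
def getusb(pins, pinx, c):
    """Model of the proposed GETUSB [#]D WZ,WC (hypothetical semantics).

    pins: indexable pin levels (0/1); pinx: the D operand; c: current C flag.
    Returns (C, Z) after the instruction.
    """
    piny = pinx ^ 1                                   # even -> +1, odd -> -1
    new_c = c ^ pins[pinx]                            # C = C XOR PINx
    new_z = 0 if (pins[pinx] or pins[piny]) else 1    # Z = !(PINx OR PINy): SE0
    return new_c, new_z
```

For example `getusb([0, 0], 0, 1)` returns `(1, 1)`: both pins low, so Z flags SE0.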

This instruction would permit the above bit-banging code sequence to be reduced to (replaces 4 instructions)...
              waitcnt   time, bittime           ' wait for next mid-bit sample time 
              getusb    K               wz,wc   ' C has prev bit; C = C XOR PIN; Z = !(PIN OR PIN+/-1) = both pin pairs are zero
        if_z  jmp       #waitforend             ' y: wait for end
              rcl       data, #1                ' accumulate bit into data byte
              rcl       stuff, #6       wz      ' accumulate 6 bit blocks. If zero (6 zero bits) we need to unstuff next bit.
              ' There is no time to accumulate the crc16 here. A special 1bit crc instruction as suggested in the first post would help here.
        if_z  call      #unstuff 
As you can see, a new single bit 1 clock CRC instruction would help immensely too.

Here is a working USB CRC5 generation for reference...
'initialisation first
              mov       data, xxxxx             ' get the 5bit data
              and       data, #$1F              ' just in case
              mov       count, #5               ' 5 bits
              mov       crc5, xxxxx             ' preset crc5 register
' calculate CRC5
:loop         mov       temp, data              ' get copy of data bits left to process
              xor       temp, crc5              ' lsb of data xor crc5 required below
              shr       temp, #1        wc      ' result of data[lsb] xor crc5[lsb] from above
              shr       data, #1                ' shift input data
              shr       crc5, #1                ' shift crc5
        if_c  xor       crc5, #$14              ' crc5 polynomial $14 = %10100 (x^5+x^2+1, bit-reversed)
              djnz      count, #:loop              
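For anyone checking the loop, here is a line-by-line Python port (a sketch; `crc5_loop` is just an illustrative name). For USB tokens the 11 address/endpoint bits are fed in, the register is preset to $1F and the remainder is sent inverted; feeding that inverted CRC back through the same loop should leave a fixed residual, %00110 in this right-shifting form (%01100 when written the other way round, as the USB spec does).

```python
def crc5_loop(data, count, crc5):
    """Python port of the PASM CRC5 loop (LSB first, reflected poly $14)."""
    for _ in range(count):
        temp = data ^ crc5        # mov temp,data / xor temp,crc5
        carry = temp & 1          # shr temp,#1 wc -> C = data[lsb] XOR crc5[lsb]
        data >>= 1                # shr data,#1
        crc5 >>= 1                # shr crc5,#1
        if carry:
            crc5 ^= 0x14          # if_c xor crc5,#$14
    return crc5
```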
Analysing the CRC breakdown for a single bit is (can someone please verify this is correct)...
' C has the single data bit to be accumulated into the CRC5 register
' POLY stores the polynomial
' COUNT stores the number of CRC bits in the CRC algorithm
              muxc      temp, #1                ' put C into bit 0 (other bits are masked off below)
              xor       temp, crc5              ' xor the lowest bit of crc5
              and       temp, #1        wz      ' and put result in Z
              shr       crc5, #1                ' CRC5 >> 1
        if_nz xor       crc5, poly              ' if BIT XOR CRC5[0] = 1 then CRC5 XOR POLY
Provided the above is correct then a new special instruction could do the following...
(This is slightly different to my proposal for the instruction in the earlier post)

WARNING: There is at least something wrong with the CRC generation below as it does not conform with the block diagram above. Maybe it is just reversed LSB/MSB but I am not sure yet. Can anyone help get this right???

CRCBIT D
where
D = CRCn cog register
C = C has the input bit
and two internal registers POLY and COUNT (set by special instructions, or else ACCA & ACCB could be used) are
POLY = The polynomial (up to 32 bits, unused bits zero) (could be ACCA)
COUNT = The number of bits in the CRC generation (or a mask???) (could be ACCB)
the instruction would perform the following (can someone please check)...

if (C XOR D[0] ) == 1 then
D >> 1
D XOR POLY
else
D>>1
endif

I cannot see the use for the COUNT (number of bits in the CRC) other than at the end of the whole CRC calculation where an AND mask would extract the relevant bits. If this is correct, then COUNT would not be required. What am I missing?
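A Python model of the CRCBIT steps supports the reasoning that COUNT is not needed: in this right-shifting form, provided the preset value fits in the CRC width, the shift and the XOR with a polynomial narrower than that width can never set a higher bit, so the width is implied by POLY. The test runs CRC-16/USB ($8005, bit-reversed to $A001) over the usual "123456789" check string, one bit at a time. `crcbit` is a hypothetical name.

```python
def crcbit(d, c, poly):
    """Model of the proposed CRCBIT D (C = data bit, poly = ACCA/POLY)."""
    x = c ^ (d & 1)      # (1) X := C XOR D[0]
    d >>= 1              # (2) D := D >> 1
    if x:
        d ^= poly        # (3) if X == 1 then D := D XOR POLY
    return d

def crc_over_bytes(data, crc, poly):
    """Feed bytes LSB first through crcbit, as bit-banged code would."""
    for byte in data:
        for i in range(8):
            crc = crcbit(crc, (byte >> i) & 1, poly)
    return crc
```

With a $FFFF preset and a final inversion this gives the CRC-16/USB check value $B4C8; with a zero preset and no inversion it gives $BB3D (the plain IBM/ARC form of the same polynomial).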

Now the resulting code would become...
'Note: The internal register(s) POLY and COUNT would be previously set, as would the user's CRCn Register
              waitcnt   time, bittime           ' wait for next mid-bit sample time 
              getusb    K               wz,wc   ' C has prev bit; C = C XOR PIN; Z = !(PIN OR PIN+/-1) = both pin pairs are zero
        if_z  jmp       #waitforend             ' y: wait for end
              rcl       data, #1                ' accumulate bit into data byte
              rcl       stuff, #6       wz      ' accumulate 6 bit blocks. If zero (6 zero bits) we need to unstuff next bit.
              crcbit    CRC                     ' C has data bit; POLY has polynomial; COUNT (if reqd) has no.of.bits/mask; accumulate the CRC
        if_z  call      #unstuff 
So the new CRCBIT instruction would replace at least 4 instructions.

Cluso99 · 2014-03-09 17:25

And here is the CRCBIT instruction/discussion
http://forums.parallax.com/showthread.php/151992-CRC-generation?p=1222728&viewfull=1#post1222728
and copied below...

Cluso99 wrote: »
Attached is a simple spin program for the P1 to calculate any CRC.
There are various polynomials, number of bits, lsb/msb first, preset crc initial value, xor final value, send LSB/MSB crc byte first.
But a general purpose CRC is better.

Would some of you please test/modify this program and check it works?

What I would like to do is ask Chip for a single-bit CRC instruction for the P2. IMHO the best format for this would be that the data-bit would be in the C flag. Because we only have P2 instructions available with a single operand [#]D style, I thought that the polynomial could be written to the ACCA (perhaps or ACCB?) and that D would point to the CRC register in cog memory.

This is the CRC calculation in spin for a byte...
    d := DATA & $FF
    repeat i from 0 to 7
      c := (d ^ crc) & $01                              ' data bit 0 XOR crc bit 0
      d := d >> 1                                       ' data >> 1
      crc := crc >> 1                                   ' crc  >> 1
      if c
        crc := crc ^ poly                               ' if c==1: crc xor poly
This is a possible P2 CRC bit accumulate instruction format...
  CRCBIT  D
where D = CRC Register, C = current data bit, ACCA = polynomial
The CRCBIT instruction performs the following...
(1) X := C XOR D[0]
(2) D := D >> 1
(3) if X == 1 then D := D XOR ACCA
The idea is that for bit-banging, the CRCBIT instruction would be called for each bit sent/received, and the bit would already be in C.
I expect CRCBIT should be capable of being a 1 clock instruction.

So, to accumulate an 8 bit byte (disregarding any reversals and initialisation) the following could be used...
This would take 2+16 clocks per byte, or for 4 bytes in a passed long 2+64 clocks.
        REPS    #2,#8           '\\ 2 instructions x 8 loops
        NOP                     '|| spacer
        SHR     DATA, #1  WC    '\\ C:=DATA[0]
        CRCBIT  CRC             '// accumulate 1bit into crc
This method does not require the CRCBIT instruction to know the number of bits in the algorithm.


and Chip's reply

cgracey wrote: »

Cluso99,

Implementing an atomic CRC instruction would be easy to do and a good use of resources. Let's do it, along with the special pin instructions to facilitate USB. These are really good ideas that result in almost no silicon growth, but will cut bit-period processing requirements in half for many protocols.

Cluso99 · 2014-03-09 17:33

CRC

I suggested a possible version of the CRCBIT instruction (haven't found the thread/posts yet) where we could use the instruction to calculate any polynomial.
However, when Chip looked at the complexity, it would take too much silicon. Therefore I suggest we look at the possibility of just 2 polynomial options, those being the two common CRC16s: the IBM/USB and CCITT. The xmodem variant of the CCITT is easily done on the initial and final CRC value by sw. As I have said, I don't think we need CRC5 for USB as we can precalculate most of the CRC5 including our USB address, so it's quite simple. It may be worthwhile to see just what gates are involved.

As I have said previously here, I am quite happy to just get the 2 instructions and start to work with them while Chip moves on to SERDES. Because anything I discover that would help would be quite simple I don't mind suggesting it while Chip is doing SERDES.

There is nothing better than to run code to find the weaknesses. Much better than theory.

rogloh · 2014-03-09 17:43

I take it this would be for the CRC16. It may not be needed to have any CRC5 hardware support. As already mentioned, once you know your address and endpoint, the CRC5 value is static for most packet types and can therefore be precalculated. The only time where it is dynamic is for the SOF (start of frame) token packets which contain an 11bit incrementing frame counter and are sent once per millisecond. For a slave, unless your application wants to know the frame number at all times, you probably can ignore the CRC5 in this packet type as you won't need to care too much if there is a bit error in the frame counter and the value is occasionally wrong.

Slaves never have to generate the CRC-5, only check it which is easy. But if a P2 implements a USB host we would need to be able to generate it, and in the worst case we could always use an 11 bit indexed lookup table for an exact match if required. It will just burn 2kB of hub RAM for that approach, or we could do some type of 8 bit LUT implementation using stack RAM perhaps.
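If a host does need to generate CRC5, the 11-bit lookup table is easy to build offline. A Python sketch, assuming the usual USB token layout (7-bit address in the low bits, 4-bit endpoint above it, sent LSB first) and the $1F preset with the remainder sent inverted; all names are illustrative. The same table would serve the 11-bit SOF frame number.

```python
def crc5_reg(value, nbits, crc):
    """Right-shifting CRC5 register, reflected poly $14 (x^5 + x^2 + 1)."""
    for _ in range(nbits):
        if (value ^ crc) & 1:
            crc = (crc >> 1) ^ 0x14
        else:
            crc >>= 1
        value >>= 1
    return crc

def token_field(addr, endp):
    """Assumed token layout: ADDR in bits 0-6, ENDP in bits 7-10."""
    return (addr & 0x7F) | ((endp & 0x0F) << 7)

def crc5_token(field11):
    """CRC5 as transmitted: preset $1F, remainder inverted."""
    return crc5_reg(field11, 11, 0x1F) ^ 0x1F

# 2048 entries of 5 bits, stored one per byte: the 2kB mentioned above.
CRC5_TABLE = bytes(crc5_token(f) for f in range(2048))
```

A slave would only ever need `CRC5_TABLE[token_field(my_addr, ep)]` for its own address/endpoint pairs, which is why precalculating is enough there.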

Update: Wrote this before your previous reply Cluso99, just saw you agree CRC5 is probably not needed too.

Sapieha · 2014-03-09 17:55

Hi Cluso.

I think it was in Propeller II update - BLOG

I posted link to page that generate Verilog code for both 5 and 16 Bit CRC for USB


rogloh · 2014-03-09 18:03

Cluso99 wrote: »
And here is the CRCBIT instruction/discussion

So, to accumulate an 8 bit byte (disregarding any reversals and initialisation) the following could be used...
This would take 2+16 clocks per byte, or for 4 bytes in a passed long 2+64 clocks.
REPS    #2,#8           '\\ 2 instructions x 8 loops
        NOP                     '|| spacer
        SHR     DATA, #1  WC    '\\ C:=DATA[0]
        CRCBIT  CRC             '// accumulate 1bit into crc

One interesting thing I noticed about the proposed CRCBIT instruction is that at best it takes a minimum of one clock per bit if you already have your bit in C and unrolled everything etc. If you have to rotate to C from another register it will take 16 clocks per byte (2 per bit). There is also some CRC initial setup overhead required but I will ignore that for now.

This means this will take 8 clock cycles per byte to complete at best. A LUT implementation will only take 5 instructions per byte. That means if you have the Stack RAM to spare, it will be significantly faster and free more cycles so interestingly the HW is not necessarily adding as much value as we would like in this case. In fact it is lowering performance which is a little counter-intuitive. Just wanted to point that out. It would however let you interleave it within the bit processing workload of the COG which could be useful if there is already a free cycle there for doing it.
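For comparison, the byte-at-a-time table approach looks like this in Python (a sketch; names are illustrative). The 256-entry table for CRC-16/USB is 512 bytes of 16-bit words, and each byte then costs an XOR, a mask, a lookup, a shift and another XOR, which is consistent with the roughly 5 instructions per byte estimated above. Each table entry is just the bit-serial update applied to one possible byte, so both routines must agree.

```python
POLY_REFLECTED = 0xA001    # $8005 bit-reversed, for LSB-first processing

def crc16_bitwise(data, crc=0xFFFF):
    """Bit-serial CRC-16/USB register update (no final inversion)."""
    for byte in data:
        for i in range(8):
            if (crc ^ (byte >> i)) & 1:
                crc = (crc >> 1) ^ POLY_REFLECTED
            else:
                crc >>= 1
    return crc

# TABLE[n] = effect of 8 bit-steps when the low byte of (crc ^ data) is n
TABLE = [crc16_bitwise([n], crc=0) for n in range(256)]

def crc16_table(data, crc=0xFFFF):
    """Same result as crc16_bitwise, one lookup per byte."""
    for byte in data:
        crc = (crc >> 8) ^ TABLE[(crc ^ byte) & 0xFF]
    return crc
```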

Cluso99 · 2014-03-09 18:08

Sapieha, that link is to the first post. Did you mean to post a link to the actual post# ?

jmg · 2014-03-09 18:27

Sapieha wrote: »

I posted link to page that generate Verilog code for both 5 and 16 Bit CRC for USB

CRC could be included on-the-fly in the suggested DS_Shift_JEQ8 opcode, it will fit, but I see some fish hooks.

* The examples show init CRC to all 1's, not a large issue, but can take more code.
* There is no Length element in USB, so the EOP signals when to finish-and-check, problem is, in simplest designs by the time EOP arrives, you have just done a CRC on itself. Hmm...

I think the CRC applies only to the preceding data

Maybe there is enough time to roll-back those last 16 bits of CRC ? (anyone seen CRC roll-back code ?)

Other option would be a 16bit delay line feeding CRC, so the CRC is that from 16 bits back in time.
That needs some init of that delay line - to what content ?

Sapieha · 2014-03-09 18:31

Sorry Cluso.

Don't mind what post it was

Cluso99 wrote: »

Sapieha, that link is to the first post. Did you mean to post a link to the actual post# ?

Sapieha · 2014-03-09 18:35

Hi Cluso.

Link to that Site.

http://www.easics.be/webtools/crctool

rogloh · 2014-03-09 18:37

jmg wrote: »

CRC could be included on-the-fly in the suggested DS_Shift_JEQ8 opcode, it will fit, but I see some fish hooks.

* The examples show init CRC to all 1's, not a large issue, but can take more code.
* There is no Length element in USB, so the EOP signals when to finish-and-check, problem is, in simplest designs by the time EOP arrives, you have just done a CRC on itself. Hmm...

I think the CRC applies only to the preceeding data

Maybe there is enough time to roll-back those last 16 bits of CRC ? (anyone seen CRC roll-back code ?)

Other option would be a 16bit delay line feeding CRC, so the CRC is that from 16 bits back in time.
That needs some init of that delay line - to what content ?

You may not have to worry about rollback. I'm not a CRC expert but from memory if you include the CRC itself in the CRC accumulation you may end up with a zero or some known constant to check.

jmg · 2014-03-09 18:52

rogloh wrote: »

You may not have to worry about rollback. I'm not a CRC expert but from memory if you include the CRC itself in the CRC accumulation you may end up with a zero or some known constant to check.

I think that is true only for checksums.

I think there is a post-rx check, which relies on pre-load of CRC, which comes naturally with the Opcode-on-register design,

Works like this
Pass1: have full packet, all data and RxCRC and the Calculated CRC, which is 'overcooked' by having run on CRC too.
Save ocCRC for later.

What is needed is an equality comparison. If we re-prime the CRC register with the RxCRC, and now play through the RxCRC for 16 'clocks', we have a copy of RxCRC (+) Last 16 bits(=CRC) = ocCRC, and this is compared with the saved ocCRC. We do not need to reverse the CRC, we just need to duplicate the CRC-append, and check that.

This would cost preloads + 16x(RRC+DS_Shift_JEQ8), at the end of a packet. Is that too slow ?

jmg · 2014-03-09 18:54

Sapieha wrote: »

Hi Cluso.
Link to that Site.
http://www.easics.be/webtools/crctool

There is also
http://outputlogic.com/?page_id=321

rogloh · 2014-03-09 19:24

jmg wrote: »

I think that is true only for checksums.

Maybe you are right (I don't know either way) but this is what Wikipedia had to say: http://en.wikipedia.org/wiki/Computation_of_cyclic_redundancy_checks

"One-pass checking

When appending a CRC to a message, it is possible to detach the transmitted CRC, recompute it, and verify the recomputed value against the transmitted one. However, a simpler technique is commonly used in hardware.

When the CRC is transmitted with the correct bit order (most significant terms first), a receiver can compute an overall CRC, over the message and the CRC, and if the CRC is correct, the result will be zero. This possibility is the reason that most network protocols that include a CRC do so before the ending delimiter; it is not necessary to know whether the end of the packet is imminent to check the CRC."

jmg · 2014-03-09 19:54


Cool, I just assumed it was too complex to do that. If that is correct, then life does get a lot simpler.
Just Prime CRC field with the needed 1's and check for 0000 at the EOP - no post RX footwork needed at all.
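One caveat for USB specifically: because the CRC16 is preset to all 1's and sent inverted, running the check over the data plus the received CRC does not end at 0000 but at a fixed residual, which the USB 2.0 spec gives as 1000000000001101b; in a right-shifting LSB-first register that shows up as $B001. The check is still a single compare at EOP, just against that constant. A Python sketch (helper names are hypothetical; CRC bytes assumed to arrive low byte first):

```python
def crc16_usb_update(crc, data):
    """Right-shifting CRC-16/USB register update, LSB first."""
    for byte in data:
        for i in range(8):
            if (crc ^ (byte >> i)) & 1:
                crc = (crc >> 1) ^ 0xA001
            else:
                crc >>= 1
    return crc

USB_CRC16_RESIDUAL = 0xB001   # 0x800D bit-reversed (register form used here)

def append_crc16(payload):
    """Transmitter side: inverted remainder, low byte first."""
    crc = crc16_usb_update(0xFFFF, payload) ^ 0xFFFF
    return bytes(payload) + bytes([crc & 0xFF, crc >> 8])

def crc_ok(received):
    """Receiver side: one pass over data + CRC, one compare at EOP."""
    return crc16_usb_update(0xFFFF, received) == USB_CRC16_RESIDUAL
```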

The Verilog above does USB DeStuff; perhaps an opcode param or two could select HDLC DeStuff or no DeStuff, which would allow the CRC engine in the opcode to be used for Txmit?

Cluso99 · 2014-03-09 20:25

It is simple enough to just push the CRC at the end of each byte onto the task's 4-deep stack. At the end, the CRC will be two pops down. QED.
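A Python sketch of that stack idea for CRC-16/USB (function names are hypothetical; a deque with maxlen=4 stands in for the 4-deep task stack). The stack is seeded with the preset so that a zero-length data packet still works; at EOP the entry two below the top is the CRC over the data alone, which is then inverted and compared with the two received CRC bytes.

```python
from collections import deque

def crc16_usb_byte(crc, byte):
    """One byte of CRC-16/USB, LSB first, reflected polynomial 0xA001."""
    for i in range(8):
        if (crc ^ (byte >> i)) & 1:
            crc = (crc >> 1) ^ 0xA001
        else:
            crc >>= 1
    return crc

def check_packet_with_history(packet):
    """Push the running CRC after every byte; at EOP look two entries down."""
    history = deque([0xFFFF], maxlen=4)   # seeded with the preset value
    crc = 0xFFFF
    for byte in packet:
        crc = crc16_usb_byte(crc, byte)
        history.append(crc)
    if len(packet) < 2:
        return False                      # too short to hold a CRC16
    data_crc = history[-3]                # skip the two CRC-byte entries
    received = packet[-2] | (packet[-1] << 8)   # CRC arrives low byte first
    return (data_crc ^ 0xFFFF) == received
```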

jmg · 2014-03-09 20:36

Cluso99 wrote: »

It is simple enough to just push the CRC at the end of each byte onto the tasks 4 deep stack. At the end, the CRC will be two pops down. QED.

With the 4 Field DS_Shift_JEQ8 proposed, there is not even the need to do that.

The CRC is available in the upper bits of the register, and preserves across bytes.
If the total packet CRC sums over itself to Zero, then you just check that field for 0000 at the EOP

A switch would be needed to make the CRC field accessible for transmit, tho I suppose it could call on every Physical TxBit, in which case the de-stuff does not need disable ?

That allows one opcode to be used both ways, (but it does not stuff-on-tx)

Cluso99 · 2014-03-09 21:29

Here is a possible single bit CRC Verilog that should do CRC5usb, CRC16usb and CRC16ccitt...

You will note where the resultant bits are different for the 3 crc polynomials, I have just included 3 lines, first for CRC5usb, then CRC16usb and last CRC16ccitt. These 3 statements need to have some if then or similar decoding depending upon which crc polynomial is chosen.
For the n/a case of crc5, anything can be chosen.

////////////////////////////////////////////////////////////////////////////////
// Copyright (C) 1999-2008 Easics NV.
// This source file may be used and distributed without restriction
// provided that this copyright statement is not removed from the file
// and that any derivative work contains the original copyright notice
// and the associated disclaimer.
//
// THIS SOURCE FILE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS
// OR IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED
// WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR A PARTICULAR PURPOSE.
//
// Purpose : synthesizable CRC function
//
// Info : tools@easics.be
//        http://www.easics.com
//
// RR20130310 modified for CRC5, CRC16usb & CRC16ccitt
////////////////////////////////////////////////////////////////////////////////
module CRC;
  // polynomial: CRC5usb=(0 2 5), CRC16usb=(0 2 15 16), CRC16=(0 5 12 16) 
  // data width: 1
  // convention: the first serial bit is D[0]
  function [15:0] nextCRC16_D1;
    input Data;
    input [15:0] crc;
    reg [0:0] d;
    reg [15:0] c;
    reg [15:0] newcrc;
  begin
    d[0] = Data;
    c = crc;

    newcrc[0] = d[0] ^ c[4];
    newcrc[0] = d[0] ^ c[15];
    newcrc[0] = d[0] ^ c[15];

    newcrc[1] = c[0];

    newcrc[2] = d[0] ^ c[1] ^ c[4];
    newcrc[2] = d[0] ^ c[1] ^ c[15];
    newcrc[2] = c[1];

    newcrc[3] = c[2];
    newcrc[4] = c[3];
    
    //n/a
    newcrc[5] = c[4];
    newcrc[5] = d[0] ^ c[4] ^ c[15];

    newcrc[6] = c[5];
    newcrc[7] = c[6];
    newcrc[8] = c[7];
    newcrc[9] = c[8];
    newcrc[10] = c[9];
    newcrc[11] = c[10];
    
    //n/a
    newcrc[12] = c[11];
    newcrc[12] = d[0] ^ c[11] ^ c[15];

    newcrc[13] = c[12];
    newcrc[14] = c[13];

    //n/a
    newcrc[15] = d[0] ^ c[14] ^ c[15];
    newcrc[15] = c[14];

    nextCRC16_D1 = newcrc;
  end
  endfunction
endmodule

Thanks for the easics link as I used this to see what Verilog was generated for each polynomial.

jmg · 2014-03-09 21:42

When you have 2 or more conflicting lines, as you do in that Verilog, my Verilog compiler seems to simply apply the last-line and ignore the others.

Cluso99 · 2014-03-09 22:08

jmg: Of course they are going to fail - didn't you read my comments?

I am not sure how you specify the inputs to select the 3 possible polynomials. Presuming
00 = crc5
10 = crc16 usb
11 = crc16 ccitt

then how would you write the following selectively for

if crc5 then
    newcrc[0] = d[0] ^ c[4];
else if crc16usb then
    newcrc[0] = d[0] ^ c[15];
else
    newcrc[0] = d[0] ^ c[15];
endif;

And yes this one can be simplified.
Or would a complex statement something like

    newcrc[0] = d[0] ^ (enable5 & c[4]) ^ (enable16 & c[15]) ;

be better.
Best get out my Verilog intro.

Cluso99 · 2014-03-09 22:55

Here is an update that I hope decodes the 3 supported polynomials

module CRC;
  // polynomial: CRC5usb=(0 2 5), CRC16usb=(0 2 15 16), CRC16ccitt=(0 5 12 16) 
  // data width: 1
  // convention: the first serial bit is D[0]
  function [15:0] nextCRC16_D1;
    input Data;                 // the Carry flag
    input [15:0] crc;
    input [1:0] poly;           // 00=crc5usb, 01=illegal, 10=crc16usb, 11=crc16ccitt 
    reg [0:0] d;
    reg [15:0] c;
    reg [1:0] p;
    reg [15:0] newcrc;
  begin
    d[0] = Data;
    c = crc;
    p = poly;
    newcrc[0] = d[0] ^ (!p[1] & c[4]) ^ (p[1] & c[15]);
    newcrc[1] = c[0];
    newcrc[2] = (!p[0] & d[0]) ^ c[1] ^ (!p[1] & c[4]) ^ (p[1] & !p[0] & c[15]);
    newcrc[3] = c[2];
    newcrc[4] = c[3];
    newcrc[5] = (p[0] & d[0]) ^ c[4] ^ (p[0] & c[15]);
    newcrc[6] = c[5];
    newcrc[7] = c[6];
    newcrc[8] = c[7];
    newcrc[9] = c[8];
    newcrc[10] = c[9];
    newcrc[11] = c[10];
    newcrc[12] = (p[0] & d[0]) ^ c[11] ^ (p[0] & c[15]);
    newcrc[13] = c[12];
    newcrc[14] = c[13];
    newcrc[15] = (!p[0] & d[0]) ^ c[14] ^ (!p[0] & c[15]);
    nextCRC16_D1 = newcrc;
  end
  endfunction
endmodule
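One way to scan-check mux equations like these is to compare them, bit by bit, against a plain shift-register CRC for each polynomial. A Python sketch of that check (names are hypothetical): the mux model gates newcrc[2] as (!p[0] & d) ^ c[1] ^ (!p[1] & c[4]) ^ (p[1] & !p[0] & c[15]), which is what the three separate polynomials in the Easics listing require, and in CRC5 mode only the low 5 bits are compared because the upper bits are don't-care.

```python
def next_crc_mux(d, c, p):
    """Muxed 1-bit CRC step. p: 0b00 crc5usb, 0b10 crc16usb, 0b11 crc16ccitt."""
    def bit(n):
        return (c >> n) & 1
    p0, p1 = p & 1, (p >> 1) & 1
    np0, np1 = 1 - p0, 1 - p1                 # the Verilog !p[0], !p[1]
    n = [bit(k - 1) if k else 0 for k in range(16)]   # default: plain shift
    n[0] = d ^ (np1 & bit(4)) ^ (p1 & bit(15))
    n[2] = (np0 & d) ^ bit(1) ^ (np1 & bit(4)) ^ (p1 & np0 & bit(15))
    n[5] = (p0 & d) ^ bit(4) ^ (p0 & bit(15))
    n[12] = (p0 & d) ^ bit(11) ^ (p0 & bit(15))
    n[15] = (np0 & d) ^ bit(14) ^ (np0 & bit(15))
    return sum(v << k for k, v in enumerate(n))

def next_crc_generic(d, c, poly, width):
    """Reference MSB-feedback LFSR step; poly excludes the x^width term."""
    fb = d ^ ((c >> (width - 1)) & 1)
    c = (c << 1) & ((1 << width) - 1)
    return c ^ poly if fb else c
```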

jmg · 2014-03-09 23:06

I would tend to code it so you can quickly scan-check it, with a single mode declaration point

// 00=crc5usb, 01=illegal, 10=crc16usb, 11=crc16ccitt
	always @(poly)  begin   
	  crc5usb    = (poly == 2'b00); 
	  crc16usb   = (poly == 2'b10); 
	  crc16ccitt = (poly == 2'b11); 
	end

msrobots · 2014-03-10 11:53

Hi all.

I just want to remind you that SD cards also need CRC. In this case we also need CRC7.
The polynomial for CRC7 is 0x89; the polynomial for CRC16 is 0x1021 which is based upon a standard called CRC-CCITT.

Thanks

Mike
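For reference, the SD CRC7 (0x89 = x^7 + x^3 + 1) is computed MSB first with a zero preset. A Python sketch (function names are illustrative); the two command frames in the test are the well-known CMD0 and CMD8 (arg $1AA) frames, whose last bytes are $95 and $87.

```python
def crc7(data):
    """SD/MMC CRC7: poly x^7 + x^3 + 1 (taps 0x09), MSB first, preset 0."""
    crc = 0
    for byte in data:
        for i in range(7, -1, -1):
            fb = ((byte >> i) & 1) ^ ((crc >> 6) & 1)
            crc = (crc << 1) & 0x7F
            if fb:
                crc ^= 0x09
    return crc

def sd_command(index, arg):
    """6-byte SD command frame: start/index, 32-bit arg, CRC7 + end bit."""
    frame = bytes([0x40 | index]) + arg.to_bytes(4, "big")
    return frame + bytes([(crc7(frame) << 1) | 1])
```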

Bill Henning · 2014-03-10 11:58

SD card 4 bit mode is painful, the CRC is evaluated independently for each of the four bits in parallel.

Cluso99 · 2014-03-10 16:15

msrobots wrote: »

Hi all.

I just want to remind you that SD card also need CRC. In this case we also need CRC7.
The polynomial for CRC7 is 0x89; the polynomial for CRC16 is 0x1021 which is based upon a standard called CRC-CCITT.

Thanks

Mike

Mike,
The poly for CRC16 USB is $8005. $1021 is for CRC16 CCITT. Confusing, isn't it? IBM created the original CRC16 (as now used by USB) for use in sync comms back in the 80's or earlier. But, as Europe usually does, they had to use a different poly.
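To make the $8005 vs $1021 difference concrete, here is a bitwise CRC-16 in Python, parameterised the way CRC catalogues usually describe these variants (poly, preset, bit order, final XOR). This is a sketch, not a reference implementation; the check values in the test are the commonly published results for the ASCII string "123456789". It also shows that XMODEM and the preset-$FFFF CCITT form differ only in the initial value.

```python
def crc16(data, poly, init, refin, xorout):
    """Bitwise CRC-16.

    refin=True: each byte LSB first, right shift with the bit-reversed poly.
    refin=False: each byte MSB first, left shift with the poly as given.
    (For these variants the output reflection matches the input reflection.)
    """
    crc = init
    rpoly = int(f"{poly:016b}"[::-1], 2)   # e.g. $8005 -> $A001, $1021 -> $8408
    for byte in data:
        if refin:
            crc ^= byte
            for _ in range(8):
                crc = (crc >> 1) ^ rpoly if crc & 1 else crc >> 1
        else:
            crc ^= byte << 8
            for _ in range(8):
                crc = ((crc << 1) ^ poly) & 0xFFFF if crc & 0x8000 else (crc << 1) & 0xFFFF
    return crc ^ xorout
```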

rogloh · 2014-03-10 19:41

@Cluso99,

There is one thing still confusing me about the proposed GETXP instruction. After calling such an instruction you would get carry flag C result being the XOR of the original C flag value and one of the USB data pins. So if C comes back as 1 that means it was different to the sampled pin value, and if it comes back 0 it was the same value as the sampled pin value. This is fine and it detects logical 0/1 NRZI bitstream nicely.

However, unless I am missing something else, it appears you would then want to reuse C again for the next iteration. The problem is that this time around C is not the last pin value; it indicates whether there was a difference between the previous C value and the previous pin value. So some other operation to reset C back to the previous data pin value appears to be required before the next time it gets called, or some trick is required. Are you doing this as well somewhere in your code? I didn't see that mentioned anywhere. Won't this require an additional clock cycle?
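The point is easy to see in a model. A Python sketch, assuming the line is already sampled once per bit with 1 = J and 0 = K, and idle J before the first sample (`nrzi_decode` is an illustrative name): whether the previous level lives in C, in another register, or inside the opcode, it has to be the raw pin level, not the decoded bit.

```python
def nrzi_decode(levels, prev=1):
    """Decode USB NRZI line levels (1 = J, 0 = K) into data bits.

    A 0 is sent as a change of level and a 1 as no change, so each
    decision needs the previous *pin level*; the XOR result alone does
    not carry enough information to decode the following bit.
    """
    bits = []
    for level in levels:
        bits.append(1 if level == prev else 0)
        prev = level          # keep the raw level, not the decoded bit
    return bits
```

For example, the SYNC pattern KJKJKJKK decodes to seven 0's followed by a 1.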

jmg · 2014-03-10 19:56

rogloh wrote: »

So some other operation to reset C back to the previous data pin value appears to be required before the next time it gets called, or some trick is required. Are you doing this as well somewhere in your code? I didn't see that mentioned anywhere. Won't this require an additional clock cycle to do?

I think this last value work is done in the background, as part of the opcode.
It means the very first opcode C will be discarded, as the previous value is ?? but the SE0 will be valid from first clock.

Once an edge is sensed, the SW will phase adjust to try to sample in bit-centre.
The USB bit stream allows for edge-resync but that may be harder to achieve at 12MHz, so some limits on Xtal tolerance and data-length may be imposed.
A 1.5MHz P2 probably could manage edge-resync, and 1.5MHz is fine for a lot of tasks.

It may be that the P2 Counters have a capture mode that can help with edge-resync ?
