P2 and full speed USB slave requirements/ideas - Page 2 — Parallax Forums

P2 and full speed USB slave requirements/ideas


  • jmgjmg Posts: 15,173
    edited 2014-03-09 14:57
    jmg wrote: »
    Another approach could be to code USB for 1.5MHz, and focus on compact block 'macros' that do the Dual-Bit, and DeStuff operations, that Chip can then turn into opcodes that can allow a jump to 12MHz ?

    Because a code picture is worth 1000 words (applying the rule backwards), here is some untested Verilog
    for the suggested DeStuff-Jump opcode:
    // DS_Shift_JEQ8 Reg,Adr   Opcode  => 3 Fields mapped into active Register, does Skip and Count, and exits on 8 valid bits
    reg [9:0] ShiftB;  // can be as small as 7b, but 10b is full raw copy, might be useful ?
    reg [7:0] DataBY;
    reg [2:0] BitCtr;
    reg       DoSwallow, JExit, Z_FLag_Err;  // combinational flags and the stored error flag
    
    assign  Q[31:22]  = ShiftB[9:0];  // Field 3 = copy Shifter, No skip
    assign  Q[18:16]  = BitCtr[2:0];  // Field 2 = Bit Counter, skips,  exit on last-Rx-bit
    assign  Q[7:0]    = DataBY[7:0];  // Field 1 = RxByte bits, skips 
    
    always  @(*) // the transmitter inserts a 0 after six (USB) sequential 1's; the receiver drops it
    begin
     DoSwallow = (ShiftB[5:0] == 6'b111111); // next bit is skipped
    end
    
    always  @(*) // depends on both BitCtr and DoSwallow
    begin
     JExit  =  (BitCtr == 3'b111) & !DoSwallow; // this clock edge is LAST shift/inc, so exit too
    end
    
    always  @(posedge CLK)
    begin
            Z_FLag_Err <= (ShiftB[6:0] == 7'b1111111);  // Store overflow (seven 1's in a row), optional.
            ShiftB <= {ShiftB[8:0],Din};   // live raw bit pattern, no skip
            if ( DoSwallow ) begin         // .CE Hold Ctr, No Shift == Skip this Stuff-bit
              BitCtr <= BitCtr;
              DataBY <= DataBY;
            end 
            else begin                     // VALID bit, so INC and Do Shift 
              BitCtr <= BitCtr + 1;
              DataBY <= {DataBY[6:0],Din}; 
            end
    end
    

    Can be used rolled, or unrolled, but needs inversion on JUMP,
    or it could be patched into SerDes, with PairSample, for USB_ByteRX
    ' Pseudo ASM code, 2 USB helper opcodes.
    	Destuff_d = 0 ' Init all 3 fields and Z
    'Start Loop:
    	PairSampleOpcode  ' or can include the Jump ?
    	JumpIf_SE0
    	DS_Shift_JEQ8   Destuff_d, ByteDone   'updates 3 fields, and jumps if Field_N_Bits = 8 bits, Sets Z on Error.
    	JNZ Destuff_ERR ' check if last DeStuff had an error
    	PairSampleOpcode
    	JumpIf_SE0
    	DS_Shift_JEQ8   Destuff_d, ByteDone   'updates 3 fields, and jumps if Field_N_Bits = 8 bits, Sets Z on Error.
    	JNZ Destuff_ERR ' check if last DeStuff had an error
    .. repeat unrolled for 10? PairSampleOpcode
    ByteDone: '8 pin samples with no-destuff, 9 or 10 pin samples with 1..2 Skips
    	WrByte
    

    After DS_Shift_JEQ8 jumps, it has 3 fields in register : lower 8 bits = valid USB data, 3 mid bits as counter (000 on exit) and 10 upper bits as USB raw copy.
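    As a software cross-check of the destuff rule the Verilog is aiming at (drop the bit that follows six consecutive 1's, treat a seventh 1 as an error), here is a minimal Python sketch. The function names `stuff` and `destuff` are made up for illustration, the bits are assumed to be already NRZI-decoded, and it does not model the proposed opcode's register fields.

```python
def stuff(bits):
    """Transmit side: insert a 0 after every run of six 1's."""
    out, ones = [], 0
    for b in bits:
        out.append(b)
        ones = ones + 1 if b else 0
        if ones == 6:
            out.append(0)   # stuffed bit
            ones = 0
    return out


def destuff(bits):
    """Receive side: drop the bit after six 1's; a seventh 1 is an error."""
    out, ones = [], 0
    for b in bits:
        if ones == 6:
            if b:
                raise ValueError("bit-stuff error: seven 1's in a row")
            ones = 0        # swallow the stuffed 0
            continue
        out.append(b)
        ones = ones + 1 if b else 0
    return out


payload = [1] * 8                      # one byte of all 1's
wire = stuff(payload)
print(wire)                            # 9 wire bits carry the 8 data bits
print(destuff(wire) == payload)
```

    A byte of all 1's needs nine pin samples, which is the "9 or 10 pin samples" case mentioned for ByteDone.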
  • Cluso99Cluso99 Posts: 18,069
    edited 2014-03-09 15:19
    Ariba wrote: »
    Every pin has a 1.5k, a 10k and a 100k resistor, each of which can be configured as pullup or pulldown. Further, every pin has a comparator that can compare the levels of two pins and output the difference. Some of this was implemented especially for USB years ago, when Chip asked what pin hardware is needed for USB.
    I don't know what the comparator outputs when both pins are low. Can we detect this state reliably? We need that to detect the SE0 (end of packet) state. But I have never seen a software USB solution that detects an SE1 state as an error case inside the bit receive loop.

    For differential output on two pins it is as simple as:
    XOR OUTx,MaskDmp
    where MaskDmp is the pinmask for D- and D+ pins.

    Andy
    Thanks Andy. That 1k5 pullup is what is required on the D+ pin for FS (1k5 pullup on D- for LS). That saves a pin.
    I think you are correct that SE1 is usually not checked at each bit time. SE1 IIRC continues for some time, so it's not a real problem.

    I wasn't worried about the tx side because it is so much easier to do than rx. Basically if we can do rx then we can do tx. The real P2 will run at least 2x the fpga speed so we will be in a much better position when the real silicon is ready. However, for now if it takes 2 cogs that's fine by me. At least we can get something working enough to prove no further instructions/logic is required. I am fairly certain that the 2 instructions I asked for will be of sufficient help.

    SERDES should be able to tx anyway - all we need to do is be able to set the no of bits to be sent and pre-do any bitstuffing into the output buffers.

    We can always resort to a lookup table for CRC16 but by being able to calculate it for each bit as it is read/written pretty much solves this issue easily.

    Sure we may be able to make serdes help, but first I want to understand the precise instructions required to satisfactorily perform the rx by sw bit reading. Then I can look at the top level protocol for endpoints etc. This is the part I don't yet understand although I have seen example code.

    BTW 10K pulldowns will most likely work for USB Master. 10K pullups will work for PS2, I2C and lots of other cases. So these internal pullups/pulldowns are going to be a great help to minimise hw.
  • jmgjmg Posts: 15,173
    edited 2014-03-09 15:37
    On the topic of CRC, the code above for 3 fields, could (just) pack to 4, to include CRC16. (not sure about CRC-5-USB ? - operand bit ?)
    DataByte:8 CRC:16 BitCtr:3, leaves 5 for LiveBits, ok if register DoSwallow

    There may be some alignment that allows init of BitCtr and LiveBits, without clobbering CRC, and still give Byte read-off.
  • Cluso99Cluso99 Posts: 18,069
    edited 2014-03-09 17:19
    Here is the link to my older posts re FS USB and the requested instructions
    http://forums.parallax.com/showthread.php/151821-P2-Possible-additional-Instructions?p=1221492&viewfull=1#post1221492
    I have reproduced this post here although there has been later updates to this (need to check what precisely)
    Cluso99 wrote: »
    Perhaps the above op should be called

    GETXP [#]D [WZ],[WC] 'pin into !Z via WZ, xor pin into C via WC
    (similar to GETP & GETNP)

    Just a bit more info for the bit-banging USB FS RX sequence for each bit currently is..
                  waitcnt   time, bittime           ' wait for next mid-bit sample time 
                  test      K, ina          wz      ' read usb pin
                  muxz      bits, bitmask   wc      ' b30 (mux mask for rx inbound xor register)
                  shl       bits, #1                ' shift new xor'd in bit to b31 (to prev bit)
                  test      JK, ina         wz      ' SE0 ? (ie EOP ?)
            if_z  jmp       #waitforend             ' y: wait for end
                  rcl       data, #1                ' accumulate bit into data byte
                  rcl       stuff, #6       wz      ' accumulate 6 bit blocks. If zero we need to unstuff next bit
                  'There is no time to accumulate the crc16 here. A special 1bit crc instruction as suggested in the first post would help here.
            if_z  call      #unstuff 
    
    If the special instruction did the following...

    GETUSB [#]D WZ,WC

    where
    D = pin no (0..127)
    C = C XOR PINx
    Z = ! ( PINx OR PINy ) 'ie ZERO if both PINx and PINy are ZERO; PINy = PINx XOR #1
    Note1: PINx and PINy are a pair of pins. If PINx is even then PINy := PINx + 1 else if PINx is odd then PINy := PINx - 1
    - The allowance for the PINx/PINy pair to be reversed is for USB LS vs FS, where J/K are effectively swapped between D-/D+.
    Note2: WZ & WC could be permanently set on if required.
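    A small Python model of the proposed flag behaviour may make the definition easier to check. `getusb` and the `pins` integer (bit n holds the state of pin n) are illustrative names only, not part of any P2 tool, and the model follows nothing more than the C and Z rules listed above.

```python
def getusb(pins, d, c):
    """Model of the proposed GETUSB flags.

    pins: integer whose bit n is the level of pin n (assumed representation)
    d:    pin number of PINx; its partner PINy is d XOR 1
    c:    carry flag going in (0 or 1)
    Returns (c_out, z_out).
    """
    pin_x = (pins >> d) & 1
    pin_y = (pins >> (d ^ 1)) & 1
    c_out = c ^ pin_x                   # C = C XOR PINx
    z_out = int(not (pin_x | pin_y))    # Z set when both pins are low (SE0)
    return c_out, z_out


# Pins 2 and 3 form the pair here (an arbitrary choice for the example)
print(getusb(0b0100, 2, 0))   # pin 2 high, pin 3 low, addressed from the even pin
print(getusb(0b0000, 2, 1))   # both low: SE0
print(getusb(0b0100, 3, 0))   # same levels, addressed from the odd pin
```

    Addressing the pair from the odd pin swaps which line feeds C, which is the reversal Note1 allows for.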


    This instruction would permit the above bit-banging code sequence to be reduced to (replaces 4 instructions)...
                  waitcnt   time, bittime           ' wait for next mid-bit sample time 
                  getusb    K               wz,wc   ' C has prev bit; C = C XOR PIN; Z = !(PIN OR PIN+/-1) = both pin pairs are zero
            if_z  jmp       #waitforend             ' y: wait for end
                  rcl       data, #1                ' accumulate bit into data byte
                  rcl       stuff, #6       wz      ' accumulate 6 bit blocks. If zero (6 zero bits) we need to unstuff next bit.
                  'There is no time to accumulate the crc16 here. A special 1bit crc instruction as suggested in the first post would help here.
            if_z  call      #unstuff 
    

    As you can see, a new single bit 1 clock CRC instruction would help immensely too.

    Here is a working USB CRC5 generation for reference...
    'initialisation first
                  mov       data, xxxxx             ' get the 5bit data
                  and       data, #$1F              ' just in case
                  mov       count, #5               ' 5 bits
                  mov       crc5, xxxxx             ' preset crc5 register
    ' calculate CRC5
    :loop         mov       temp, data              ' get copy of data bits left to process
                  xor       temp, crc5              ' lsb of data xor crc5 required below
                  shr       temp, #1        wc      ' result of data[lsb] xor crc5[lsb] from above
                  shr       data, #1                ' shift input data
                  shr       crc5, #1                ' shift crc5
            if_c  xor       crc5, #$14              ' crc5 polynomial = $14 = %10100
                  djnz      count, #:loop              
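
    For comparison, the same shift-right CRC5 loop can be written in Python and checked against a known packet. This is a sketch, not a translation of the PASM above: the names `crc5_raw` and `crc5_usb`, the 11-bit field width, the all-ones preset and the final inversion are assumptions taken from the USB token format rather than from this post.

```python
POLY5 = 0x14  # x^5 + x^2 + 1, bit-reversed for LSB-first shifting


def crc5_raw(value, nbits, crc=0x1F):
    """Shift nbits of value (LSB first) through the CRC5 register."""
    for _ in range(nbits):
        x = (value ^ crc) & 1      # data bit XOR crc bit 0
        value >>= 1
        crc >>= 1
        if x:
            crc ^= POLY5
    return crc


def crc5_usb(field11):
    """CRC5 of an 11-bit token field, inverted as it is sent on the wire."""
    return crc5_raw(field11, 11) ^ 0x1F


# Address 0, endpoint 0: the token bytes after the PID are 00 10, i.e. CRC5 = %00010
print(bin(crc5_usb(0)))

# Receiver check: run the 5 received CRC bits through the same register.
# A good token always leaves the same residual.
field = (0x3 << 7) | 0x15            # endpoint 3, address $15 (arbitrary)
residual = crc5_raw(crc5_usb(field), 5, crc5_raw(field, 11))
print(bin(residual))
```

    In this right-shifting form the residual reads %00110, which is the spec's 01100 with the bit order reversed.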
    
    Analysing the CRC breakdown for a single bit is (can someone please verify this is correct)...
    ' C has the single data bit to be accumulated into the CRC5 register
    ' POLY stores the polynomial
    ' COUNT stores the number of CRC bits in the CRC algorithm
                  rcl       temp, #1                ' put C into bit 0
                  xor       temp, crc5              ' xor the lowest bit of crc5
                  and       temp, #1        wz      ' and put result in Z
                  shr       crc5, #1                ' CRC5 >> 1
            if_nz xor       crc5, poly              ' if BIT XOR CRC5[0] = 1 then CRC5 XOR POLY
    
    Provided the above is correct then a new special instruction could do the following...
    (This is slightly different to my proposal for the instruction in the earlier post)

    WARNING: There is at least something wrong with the CRC generation below as it does not conform with the block diagram above. Maybe it is just reversed LSB/MSB but I am not sure yet. Can anyone help get this right???

    CRCBIT D
    where
    D = CRCn cog register
    C = C has the input bit
    and two internal registers POLY and COUNT (set by special instructions, or else ACCA & ACCB could be used) are
    POLY = The polynomial (up to 32 bits, unused bits zero) (could be ACCA)
    COUNT = The number of bits in the CRC generation (or a mask???) (could be ACCB)
    the instruction would perform the following (can someone please check)...

    if (C XOR D[0] ) == 1 then
    D >> 1
    D XOR POLY
    else
    D>>1
    endif

    I cannot see the use for the COUNT (number of bits in the CRC) other than at the end of the whole CRC calculation where an AND mask would extract the relevant bits. If this is correct, then COUNT would not be required. What am I missing?

    Now the resulting code would become...
    'Note: The internal register(s) POLY and COUNT would be previously set as would the users CRCn Register
                  waitcnt   time, bittime           ' wait for next mid-bit sample time 
                  getusb    K               wz,wc   ' C has prev bit; C = C XOR PIN; Z = !(PIN OR PIN+/-1) = both pin pairs are zero
            if_z  jmp       #waitforend             ' y: wait for end
                  rcl       data, #1                ' accumulate bit into data byte
                  rcl       stuff, #6       wz      ' accumulate 6 bit blocks. If zero (6 zero bits) we need to unstuff next bit.
                  crcbit    CRC                     ' C has data bit; POLY has polynomial; COUNT (if reqd) has no.of.bits/mask; accumulate the CRC
            if_z  call      #unstuff 
    

    So the new CRCBIT instruction would replace at least 4 instructions.
  • Cluso99Cluso99 Posts: 18,069
    edited 2014-03-09 17:25
    And here is the CRCBIT instruction/discussion
    http://forums.parallax.com/showthread.php/151992-CRC-generation?p=1222728&viewfull=1#post1222728
    and copied below...
    Cluso99 wrote: »
    Attached is a simple spin program for the P1 to calculate any CRC.
    There are various polynomials, number of bits, lsb/msb first, preset crc initial value, xor final value, send LSB/MSB crc byte first.
    But a general purpose CRC is better.

    Would some of you please test/modify this program and check it works?

    What I would like to do is ask Chip for a single-bit CRC instruction for the P2. IMHO the best format for this would be that the data-bit would be in the C flag. Because we only have P2 instructions available with a single operand [#]D style, I thought that the polynomial could be written to the ACCA (perhaps or ACCB?) and that D would point to the CRC register in cog memory.

    This is the CRC calculation in spin for a byte...
        d := DATA & $FF
        repeat i from 0 to 7
          c := (d ^ crc) & $01                              ' data bit 0 XOR crc bit 0
          d := d >> 1                                       ' data >> 1
          crc := crc >> 1                                   ' crc  >> 1
          if c
            crc := crc ^ poly                               ' if c==1: crc xor poly
    
    This is a possible P2 CRC bit accumulate instruction format...
      CRCBIT  D
    where D = CRC Register, C = current data bit, ACCA = polynomial
    The CRCBIT instruction performs the following...
    (1) X := C XOR D[0]
    (2) D := D >> 1
    (3) if X == 1 then D := D XOR ACCA
    
    The idea is that for bit-banging, the CRCBIT instruction would be called for each bit sent/received, and the bit would already be in C.
    I expect CRCBIT should be capable of being a 1 clock instruction.

    So, to accumulate an 8 bit byte (disregarding any reversals and initialisation) the following could be used...
    This would take 2+16 clocks per byte, or for 4 bytes in a passed long 2+64 clocks.
            REPS    #2,#8           '\\ 2 instructions x 8 loops
            NOP                     '|| spacer
            SHR     DATA, #1  WC    '\\ C:=DATA[0]
            CRCBIT  CRC             '// accumulate 1bit into crc
    
    This method does not require the CRCBIT instruction to know the number of bits in the algorithm.
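
    The three CRCBIT steps are easy to model in software, which also gives a way to test them against published check values. The sketch below is only a model: `crcbit` and `crc_bytes` are invented names, ACCA is passed as a plain argument, and the bit-reversed constant $A001 (for the $8005 CRC16) is assumed because the register shifts right.

```python
def crcbit(d, c, acca):
    """One CRCBIT step: d = CRC register, c = data bit, acca = polynomial."""
    x = c ^ (d & 1)        # (1) X := C XOR D[0]
    d >>= 1                # (2) D := D >> 1
    if x:                  # (3) if X == 1 then D := D XOR ACCA
        d ^= acca
    return d


def crc_bytes(data, acca, crc=0):
    """Feed bytes LSB first, like the SHR ... WC / CRCBIT pair."""
    for byte in data:
        for _ in range(8):
            crc = crcbit(crc, byte & 1, acca)
            byte >>= 1
    return crc


msg = b"123456789"                          # conventional CRC test string
print(hex(crc_bytes(msg, 0xA001)))          # preset 0
print(hex(crc_bytes(msg, 0xA001, 0xFFFF)))  # preset all 1's, as USB uses
```

    Nothing in `crcbit` depends on the CRC width, which matches the point that the instruction does not need to know the number of bits.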

    and Chip's reply
    cgracey wrote: »
    Cluso99,

    Implementing an atomic CRC instruction would be easy to do and a good use of resources. Let's do it, along with the special pin instructions to facilitate USB. These are really good ideas that result in almost no silicon growth, but will cut bit-period processing requirements in half for many protocols.
  • Cluso99Cluso99 Posts: 18,069
    edited 2014-03-09 17:33
    CRC

    I suggested a possible version of the CRCBIT instruction (haven't found the thread/posts yet) where we could use the instruction to calculate any polynomial.
    However, when Chip looked at the complexity it would take too much silicon. Therefore I suggest we look at the possibility of just 2 polynomial options, those being the two common CRC16 - the IBM/USB and CCITT. The xmodem variant of the CCITT is easily done on the initial and final CRC value by sw. As I have said, I don't think we need CRC5 for USB as we can precalculate most of the CRC5 including our USB address, so it's quite simple. It may be worthwhile to see just what gates are involved.

    As I have said previously here, I am quite happy to just get the 2 instructions and start to work with them while Chip moves on to SERDES. Because anything I discover that would help would be quite simple I don't mind suggesting it while Chip is doing SERDES.

    There is nothing better than to run code to find the weaknesses. Much better than theory.
  • roglohrogloh Posts: 5,786
    edited 2014-03-09 17:43
    I take it this would be for the CRC16. It may not be needed to have any CRC5 hardware support. As already mentioned, once you know your address and endpoint, the CRC5 value is static for most packet types and can therefore be precalculated. The only time where it is dynamic is for the SOF (start of frame) token packets which contain an 11bit incrementing frame counter and are sent once per millisecond. For a slave, unless your application wants to know the frame number at all times, you probably can ignore the CRC5 in this packet type as you won't need to care too much if there is a bit error in the frame counter and the value is occasionally wrong.

    Slaves never have to generate the CRC-5, only check it which is easy. But if a P2 implements a USB host we would need to be able to generate it, and in the worst case we could always use an 11 bit indexed lookup table for an exact match if required. It will just burn 2kB of hub RAM for that approach, or we could do some type of 8 bit LUT implementation using stack RAM perhaps.
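
    Both ideas here, a precalculated CRC5 for a fixed address and endpoint and an 11-bit lookup table for a host, can be sketched in a few lines of Python. The helper names are hypothetical, the CRC parameters (preset all 1's, result inverted, LSB first) are assumed from the USB token format, and the table is built on a PC here rather than in hub RAM.

```python
def crc5_usb(field11):
    """Bitwise USB CRC5 over an 11-bit field (LSB first), result inverted."""
    crc = 0x1F
    for i in range(11):
        x = ((field11 >> i) ^ crc) & 1
        crc >>= 1
        if x:
            crc ^= 0x14            # x^5 + x^2 + 1, bit-reversed
    return crc ^ 0x1F


def token_bytes(addr, endp):
    """The two bytes that follow the PID in a token packet."""
    field = (addr & 0x7F) | ((endp & 0x0F) << 7)
    word = field | (crc5_usb(field) << 11)
    return bytes([word & 0xFF, word >> 8])


# Slave side: address and endpoint are fixed, so the expected bytes are constant.
print(token_bytes(0, 0).hex())

# Host side: one byte per possible 11-bit value, 2048 entries = 2 kB.
CRC5_TABLE = bytes(crc5_usb(v) for v in range(2048))
print(len(CRC5_TABLE))
```

    A slave could simply compare the two received bytes against the constant pair for its own address and endpoint.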

    Update: Wrote this before your previous reply Cluso99, just saw you agree CRC5 is probably not needed too.
  • SapiehaSapieha Posts: 2,964
    edited 2014-03-09 17:55
    Hi Cluso.

    I think it was in Propeller II update - BLOG

    I posted link to page that generate Verilog code for both 5 and 16 Bit CRC for USB


    Cluso99 wrote: »
    CRC

    I suggested a possible version of the CRCBIT instruction (haven't found the thread/posts yet) where we could use the instruction to calculate any polynomial.
    However, when Chip looked at the complexity it would take too much silicon. Therefore I suggest we look at the possibility of just 2 polynomial options, those being the two common CRC16 - the IBM/USB and CCITT. The xmodem variant of the CCITT is easily done on the initial and final CRC value by sw. As I have said, I don't think we need CRC5 for USB as we can precalculate most of the CRC5 including our USB address, so it's quite simple. It may be worthwhile to see just what gates are involved.

    As I have said previously here, I am quite happy to just get the 2 instructions and start to work with them while Chip moves on to SERDES. Because anything I discover that would help would be quite simple I don't mind suggesting it while Chip is doing SERDES.

    There is nothing better than to run code to find the weaknesses. Much better than theory.
  • roglohrogloh Posts: 5,786
    edited 2014-03-09 18:03
    Cluso99 wrote: »
    And here is the CRCBIT instruction/discussion

    So, to accumulate an 8 bit byte (disregarding any reversals and initialisation) the following could be used...
    This would take 2+16 clocks per byte, or for 4 bytes in a passed long 2+64 clocks.
            REPS    #2,#8           '\\ 2 instructions x 8 loops
            NOP                     '|| spacer
            SHR     DATA, #1  WC    '\\ C:=DATA[0]
            CRCBIT  CRC             '// accumulate 1bit into crc
    

    One interesting thing I noticed about the proposed CRCBIT instruction is that at best it takes a minimum of one clock per bit if you already have your bit in C and unrolled everything etc. If you have to rotate to C from another register it will take 16 clocks per byte (2 per bit). There is also some CRC initial setup overhead required but I will ignore that for now.

    This means this will take 8 clock cycles per byte to complete at best. A LUT implementation will only take 5 instructions per byte. That means if you have the Stack RAM to spare, it will be significantly faster and free more cycles so interestingly the HW is not necessarily adding as much value as we would like in this case. In fact it is lowering performance which is a little counter-intuitive. Just wanted to point that out. It would however let you interleave it within the bit processing workload of the COG which could be useful if there is already a free cycle there for doing it.
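
    To make the comparison concrete, here is a rough Python sketch of the table-driven approach next to the one-bit-at-a-time loop. It assumes the right-shifting form of the $8005 CRC16 (constant $A001) and a 256-entry table; the names are invented, and nothing here says how many P2 instructions either version would really take.

```python
POLY = 0xA001  # $8005 bit-reversed, for a right-shifting register


def crc16_bitwise(data, crc=0xFFFF):
    """Eight shift/XOR steps per byte."""
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ POLY if crc & 1 else crc >> 1
    return crc


# 256-entry table: the CRC of each possible byte value with a zero preset
TABLE = [crc16_bitwise(bytes([b]), 0) for b in range(256)]


def crc16_table(data, crc=0xFFFF):
    """One table lookup per byte."""
    for byte in data:
        crc = (crc >> 8) ^ TABLE[(crc ^ byte) & 0xFF]
    return crc


msg = bytes(range(64))
print(crc16_bitwise(msg) == crc16_table(msg))
```

    The table costs 256 words of storage, which is the Stack RAM trade-off described above.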
  • Cluso99Cluso99 Posts: 18,069
    edited 2014-03-09 18:08
    Sapieha, that link is to the first post. Did you mean to post a link to the actual post# ?
  • jmgjmg Posts: 15,173
    edited 2014-03-09 18:27
    Sapieha wrote: »
    I posted link to page that generate Verilog code for both 5 and 16 Bit CRC for USB

    CRC could be included on-the-fly in the suggested DS_Shift_JEQ8 opcode, it will fit, but I see some fish hooks.

    * The examples show init CRC to all 1's, not a large issue, but can take more code.
    * There is no Length element in USB, so the EOP signals when to finish-and-check, problem is, in simplest designs by the time EOP arrives, you have just done a CRC on itself. Hmm...

    I think the CRC applies only to the preceding data

    Maybe there is enough time to roll-back those last 16 bits of CRC ? (anyone seen CRC roll-back code ?)

    Other option would be a 16bit delay line feeding CRC, so the CRC is that from 16 bits back in time.
    That needs some init of that delay line - to what content ?
  • SapiehaSapieha Posts: 2,964
    edited 2014-03-09 18:31
    Sorry Cluso.

    Don't mind what post it was


    Cluso99 wrote: »
    Sapieha, that link is to the first post. Did you mean to post a link to the actual post# ?
  • SapiehaSapieha Posts: 2,964
    edited 2014-03-09 18:35
    Hi Cluso.

    Link to that Site.

    http://www.easics.be/webtools/crctool
  • roglohrogloh Posts: 5,786
    edited 2014-03-09 18:37
    jmg wrote: »
    CRC could be included on-the-fly in the suggested DS_Shift_JEQ8 opcode, it will fit, but I see some fish hooks.

    * The examples show init CRC to all 1's, not a large issue, but can take more code.
    * There is no Length element in USB, so the EOP signals when to finish-and-check, problem is, in simplest designs by the time EOP arrives, you have just done a CRC on itself. Hmm...

    I think the CRC applies only to the preceding data

    Maybe there is enough time to roll-back those last 16 bits of CRC ? (anyone seen CRC roll-back code ?)

    Other option would be a 16bit delay line feeding CRC, so the CRC is that from 16 bits back in time.
    That needs some init of that delay line - to what content ?

    You may not have to worry about rollback. I'm not a CRC expert but from memory if you include the CRC itself in the CRC accumulation you may end up with a zero or some known constant to check.
  • jmgjmg Posts: 15,173
    edited 2014-03-09 18:52
    rogloh wrote: »
    You may not have to worry about rollback. I'm not a CRC expert but from memory if you include the CRC itself in the CRC accumulation you may end up with a zero or some known constant to check.

    I think that is true only for checksums.

    I think there is a post-rx check, which relies on pre-load of CRC, which comes naturally with the Opcode-on-register design,

    Works like this
    Pass1: have full packet, all data and RxCRC and the Calculated CRC, which is 'overcooked' by having run on CRC too.
    Save ocCRC for later.

    What is needed is an equality comparison, so if we re-prime the CRC register with the RxCRC and now play through the RxCRC for 16 'clocks', we have a copy of RxCRC (+) Last 16 bits(=CRC) = ocCRC. This is compared with the saved ocCRC, so we do not need to reverse the CRC; we just need to duplicate the CRC-append and check that.

    This would cost preloads + 16x(RRC+DS_Shift_JEQ8), at the end of a packet. Is that too slow ?
  • jmgjmg Posts: 15,173
    edited 2014-03-09 18:54
  • roglohrogloh Posts: 5,786
    edited 2014-03-09 19:24
    jmg wrote: »
    I think that is true only for checksums.

    Maybe you are right (I don't know either way) but this is what Wikipedia had to say: http://en.wikipedia.org/wiki/Computation_of_cyclic_redundancy_checks

    "One-pass checking

    When appending a CRC to a message, it is possible to detach the transmitted CRC, recompute it, and verify the recomputed value against the transmitted one. However, a simpler technique is commonly used in hardware.

    When the CRC is transmitted with the correct bit order (most significant terms first), a receiver can compute an overall CRC, over the message and the CRC, and if the CRC is correct, the result will be zero. This possibility is the reason that most network protocols that include a CRC do so before the ending delimiter; it is not necessary to know whether the end of the packet is imminent to check the CRC."
  • jmgjmg Posts: 15,173
    edited 2014-03-09 19:54
    rogloh wrote: »
    Maybe you are right (I don't know either way) but this is what Wikipedia had to say: http://en.wikipedia.org/wiki/Computation_of_cyclic_redundancy_checks

    "One-pass checking

    When appending a CRC to a message, it is possible to detach the transmitted CRC, recompute it, and verify the recomputed value against the transmitted one. However, a simpler technique is commonly used in hardware.

    When the CRC is transmitted with the correct bit order (most significant terms first), a receiver can compute an overall CRC, over the message and the CRC, and if the CRC is correct, the result will be zero. This possibility is the reason that most network protocols that include a CRC do so before the ending delimiter; it is not necessary to know whether the end of the packet is imminent to check the CRC."

    Cool , I just assumed it was too complex to do that, if that is correct, then life does get a lot simpler.
    Just Prime CRC field with the needed 1's and check for 0000 at the EOP - no post RX footwork needed at all.
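
    One caveat is worth checking in software: USB sends the CRC inverted, so running the register over data plus CRC leaves a fixed non-zero residual rather than 0000 (the USB 2.0 spec lists 1000000000001101 for the data CRC16). The Python sketch below shows both cases. `crc16_raw` and `lo_hi` are made-up helpers using the right-shifting form of the $8005 polynomial, in which that residual reads bit-reversed.

```python
def crc16_raw(data, crc):
    """Right-shifting CRC16 ($8005 bit-reversed = $A001), no final inversion."""
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
    return crc


def lo_hi(value):
    """16-bit value as two bytes, low byte first (the order the bits are sent)."""
    return bytes([value & 0xFF, value >> 8])


results = []
for payload in (b"", b"\x00", b"hello", bytes(range(200))):
    crc = crc16_raw(payload, 0xFFFF)
    plain = crc16_raw(payload + lo_hi(crc), 0xFFFF)         # CRC appended as-is
    usb = crc16_raw(payload + lo_hi(crc ^ 0xFFFF), 0xFFFF)  # CRC inverted first
    results.append((plain, usb))
    print(len(payload), hex(plain), hex(usb))
```

    Either way the check at EOP is a compare against one constant, so no post-RX footwork is needed; only the constant differs.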

    The Verilog above does USB DeStuff, it may be an opcode param or two can select HDLC DeStuff or no Destuff, which would allow the CRC engine in the opcode to be used for Txmit ?
  • Cluso99Cluso99 Posts: 18,069
    edited 2014-03-09 20:25
    It is simple enough to just push the CRC at the end of each byte onto the tasks 4 deep stack. At the end, the CRC will be two pops down. QED.
  • jmgjmg Posts: 15,173
    edited 2014-03-09 20:36
    Cluso99 wrote: »
    It is simple enough to just push the CRC at the end of each byte onto the tasks 4 deep stack. At the end, the CRC will be two pops down. QED.

    With the 4 Field DS_Shift_JEQ8 proposed, there is not even the need to do that.

    The CRC is available in the upper bits of the register, and preserves across bytes.
    If the total packet CRC sums over itself to Zero, then you just check that field for 0000 at the EOP

    A switch would be needed to make the CRC field accessible for transmit, tho I suppose it could call on every Physical TxBit, in which case the de-stuff does not need disable ?

    That allows one opcode to be used both ways, (but it does not stuff-on-tx)
  • Cluso99Cluso99 Posts: 18,069
    edited 2014-03-09 21:29
    Here is a possible single bit CRC Verilog that should do CRC5usb, CRC16usb and CRC16ccitt...

    You will note where the resultant bits are different for the 3 crc polynomials, I have just included 3 lines, first for CRC5usb, then CRC16usb and last CRC16ccitt. These 3 statements need to have some if then or similar decoding depending upon which crc polynomial is chosen.
    For the n/a case of crc5, anything can be chosen.
    ////////////////////////////////////////////////////////////////////////////////
    // Copyright (C) 1999-2008 Easics NV.
    // This source file may be used and distributed without restriction
    // provided that this copyright statement is not removed from the file
    // and that any derivative work contains the original copyright notice
    // and the associated disclaimer.
    //
    // THIS SOURCE FILE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS
    // OR IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED
    // WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR A PARTICULAR PURPOSE.
    //
    // Purpose : synthesizable CRC function
    //
    // Info : tools@easics.be
    //        http://www.easics.com
    //
    // RR20130310 modified for CRC5, CRC16usb & CRC16ccitt
    ////////////////////////////////////////////////////////////////////////////////
    module CRC;
      // polynomial: CRC5usb=(0 2 5), CRC16usb=(0 2 15 16), CRC16ccitt=(0 5 12 16) 
      // data width: 1
      // convention: the first serial bit is D[0]
      function [15:0] nextCRC16_D1;
        input Data;
        input [15:0] crc;
        reg [0:0] d;
        reg [15:0] c;
        reg [15:0] newcrc;
      begin
        d[0] = Data;
        c = crc;
    
        newcrc[0] = d[0] ^ c[4];
        newcrc[0] = d[0] ^ c[15];
        newcrc[0] = d[0] ^ c[15];
    
        newcrc[1] = c[0];
    
        newcrc[2] = d[0] ^ c[1] ^ c[4];
        newcrc[2] = d[0] ^ c[1] ^ c[15];
        newcrc[2] = c[1];
    
        newcrc[3] = c[2];
        newcrc[4] = c[3];
        
        //n/a
        newcrc[5] = c[4];
        newcrc[5] = d[0] ^ c[4] ^ c[15];
    
        newcrc[6] = c[5];
        newcrc[7] = c[6];
        newcrc[8] = c[7];
        newcrc[9] = c[8];
        newcrc[10] = c[9];
        newcrc[11] = c[10];
        
        //n/a
        newcrc[12] = c[11];
        newcrc[12] = d[0] ^ c[11] ^ c[15];
    
        newcrc[13] = c[12];
        newcrc[14] = c[13];
    
        //n/a
        newcrc[15] = d[0] ^ c[14] ^ c[15];
        newcrc[15] = c[14];
    
        nextCRC16_D1 = newcrc;
      end
      endfunction
    endmodule
    
    
    Thanks for the easics link as I used this to see what Verilog was generated for each polynomial.
  • jmgjmg Posts: 15,173
    edited 2014-03-09 21:42
    When you have 2 or more conflicting lines, as you do in that Verilog, my Verilog compiler seems to simply apply the last-line and ignore the others.
  • Cluso99Cluso99 Posts: 18,069
    edited 2014-03-09 22:08
    jmg: Of course they are going to fail - didn't you read my comments?

    I am not sure how you specify the inputs to select the 3 possible polynomials. Presuming
    00 = crc5
    10 = crc16 usb
    11 = crc16 ccitt

    then how would you write the following selectively for
    if crc5 then
        newcrc[0] = d[0] ^ c[4];
    else if crc16usb then
        newcrc[0] = d[0] ^ c[15];
    else
        newcrc[0] = d[0] ^ c[15];
    endif;
    
    And yes this one can be simplified.
    Or would a complex statement something like
        newcrc[0] = d[0] ^ (enable5 & c[4]) ^ (enable16 & c[15]) ;
    
    be better.
    Best get out my Verilog intro.
  • Cluso99Cluso99 Posts: 18,069
    edited 2014-03-09 22:55
    Here is an update that I hope decodes the 3 supported polynomials
    module CRC;
      // polynomial: CRC5usb=(0 2 5), CRC16usb=(0 2 15 16), CRC16ccitt=(0 5 12 16) 
      // data width: 1
      // convention: the first serial bit is D[0]
      function [15:0] nextCRC16_D1;
        input Data;                 // the Carry flag
        input [15:0] crc;
        input [1:0] poly;           // 00=crc5usb, 01=illegal, 10=crc16usb, 11=crc16ccitt 
        reg [0:0] d;
        reg [15:0] c;
        reg [1:0] p;               
        reg [15:0] newcrc;
      begin
        d[0] = Data;
        c = crc;
        p = poly;
        newcrc[0] = d[0] ^ (!p[1] & c[4]) ^ (p[1] & c[15]);
        newcrc[1] = c[0];
        newcrc[2] = (!p[0] & d[0]) ^ c[1] ^ (!p[1] & c[4]) ^ (p[1] & !p[0] & c[15]);
        newcrc[3] = c[2];
        newcrc[4] = c[3];
        newcrc[5] = (p[0] & d[0]) ^ c[4] ^ (p[0] & c[15]);
        newcrc[6] = c[5];
        newcrc[7] = c[6];
        newcrc[8] = c[7];
        newcrc[9] = c[8];
        newcrc[10] = c[9];
        newcrc[11] = c[10];
        newcrc[12] = (p[0] & d[0]) ^ c[11] ^ (p[0] & c[15]);
        newcrc[13] = c[12];
        newcrc[14] = c[13];
        newcrc[15] = (!p[0] & d[0]) ^ c[14] ^ (!p[0] & c[15]);
        nextCRC16_D1 = newcrc;
      end
      endfunction
    endmodule
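
    One way to gain confidence in select-gated equations of this kind is to compare them, for every register value, with the plain shift-and-XOR form of each polynomial. The Python sketch below does that. The gating is written out from the tap lists (0 2 5), (0 2 15 16) and (0 5 12 16) using the 00/10/11 encoding, so it is an independent derivation to compare against, not a transcription of any particular Verilog; all names are invented.

```python
def generic(c, d, width, taps):
    """Left-shifting CRC step: shift, then XOR the feedback into each tap."""
    fb = d ^ ((c >> (width - 1)) & 1)
    c = (c << 1) & ((1 << width) - 1)
    for t in taps:
        c ^= fb << t
    return c


def gated(c, d, p1, p0):
    """Per-bit equations, polynomial selected by p1/p0.

    00 = crc5usb, 10 = crc16usb, 11 = crc16ccitt (01 unused).
    """
    b = [(c >> i) & 1 for i in range(16)]
    n1, n0 = 1 - p1, 1 - p0                   # inverted select bits
    new = [0] + b[:15]                        # default: plain shift
    new[0] = d ^ (n1 & b[4]) ^ (p1 & b[15])
    new[2] = (n0 & d) ^ b[1] ^ (n1 & b[4]) ^ (p1 & n0 & b[15])
    new[5] = (p0 & d) ^ b[4] ^ (p0 & b[15])
    new[12] = (p0 & d) ^ b[11] ^ (p0 & b[15])
    new[15] = (n0 & d) ^ b[14] ^ (n0 & b[15])
    return sum(bit << i for i, bit in enumerate(new))


CASES = {
    "crc5usb": (0, 0, 5, (0, 2)),
    "crc16usb": (1, 0, 16, (0, 2, 15)),
    "crc16ccitt": (1, 1, 16, (0, 5, 12)),
}
results = {}
for name, (p1, p0, width, taps) in CASES.items():
    mask = (1 << width) - 1                   # ignore the unused upper bits for crc5
    results[name] = all(
        gated(c, d, p1, p0) & mask == generic(c, d, width, taps)
        for c in range(1 << width) for d in (0, 1)
    )
    print(name, results[name])
```

    Any term that carries the wrong select bit shows up as a False for the affected polynomial.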
    
    
  • jmgjmg Posts: 15,173
    edited 2014-03-09 23:06
    I would tend to code it so you can quickly scan-check it, with a single mode declaration point
    // 00=crc5usb, 01=illegal, 10=crc16usb, 11=crc16ccitt
    	always @(poly)  begin   
    	  crc5usb    = (poly == 2'b00); 
    	  crc16usb   = (poly == 2'b10); 
    	  crc16ccitt = (poly == 2'b11); 
    	end
    
  • msrobotsmsrobots Posts: 3,709
    edited 2014-03-10 11:53
    Hi all.

    I just want to remind you that SD cards also need CRC. In this case we also need CRC7.
    The polynomial for CRC7 is 0x89; the polynomial for CRC16 is 0x1021 which is based upon a standard called CRC-CCITT.

    Thanks

    Mike
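
    For reference, a bit-serial CRC7 is only a few lines in Python. This sketch assumes the usual SD command convention (7-bit register, polynomial x^7 + x^3 + 1, which is $89 with the top term written out or $09 without it, preset 0, MSB first); the function name is invented.

```python
def crc7(data):
    """CRC7 for SD commands: x^7 + x^3 + 1, preset 0, bytes fed MSB first."""
    crc = 0
    for byte in data:
        for i in range(7, -1, -1):
            fb = ((byte >> i) & 1) ^ (crc >> 6)   # input bit XOR top register bit
            crc = (crc << 1) & 0x7F
            if fb:
                crc ^= 0x09
    return crc


cmd0 = bytes([0x40, 0x00, 0x00, 0x00, 0x00])      # CMD0, argument 0
last_byte = (crc7(cmd0) << 1) | 1                 # CRC7 followed by the end bit
print(hex(last_byte))
```

    This shifts left and takes the data MSB first, the opposite direction to the USB CRCs discussed earlier in the thread.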
  • Bill HenningBill Henning Posts: 6,445
    edited 2014-03-10 11:58
    SD card 4 bit mode is painful, the CRC is evaluated independently for each of the four bits in parallel.
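
    A Python sketch of what "independently for each of the four bits" means in practice. The bit-to-line mapping is an assumption made for the example (each byte sent as two nibbles, upper nibble first, with DATn carrying bit n of the nibble), as are the helper names; the CRC is the $1021 CRC16 with preset 0.

```python
def crc16_ccitt_bits(bits, crc=0):
    """Left-shifting CRC16, polynomial $1021, fed one bit at a time."""
    for bit in bits:
        fb = bit ^ (crc >> 15)
        crc = (crc << 1) & 0xFFFF
        if fb:
            crc ^= 0x1021
    return crc


def sd_line_crcs(block):
    """Return the four per-line CRCs [DAT0, DAT1, DAT2, DAT3] for a data block."""
    lines = [[], [], [], []]
    for byte in block:
        for nibble in (byte >> 4, byte & 0x0F):     # upper nibble first (assumed)
            for n in range(4):
                lines[n].append((nibble >> n) & 1)  # DATn carries nibble bit n (assumed)
    return [crc16_ccitt_bits(bits) for bits in lines]


block = bytes([0x11] * 512)                         # only DAT0 ever goes high
print([hex(c) for c in sd_line_crcs(block)])
```

    Each line sees only a quarter of the bits, so four separate 16-bit registers have to be updated on every clock, which is where the pain comes from.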
  • Cluso99Cluso99 Posts: 18,069
    edited 2014-03-10 16:15
    msrobots wrote: »
    Hi all.

    I just want to remind you that SD cards also need CRC. In this case we also need CRC7.
    The polynomial for CRC7 is 0x89; the polynomial for CRC16 is 0x1021 which is based upon a standard called CRC-CCITT.

    Thanks

    Mike
    Mike,
    The poly for CRC16 USB is $8005. $1021 is for CRC16 CCITT. Confusing isn't it. IBM created the original CRC16 (as now used by USB) for use in sync comms back in the 80's or earlier. But as Europe usually does, they had to use a different poly ;)
  • roglohrogloh Posts: 5,786
    edited 2014-03-10 19:41
    @Cluso99,

    There is one thing still confusing me about the proposed GETXP instruction. After calling such an instruction you would get carry flag C result being the XOR of the original C flag value and one of the USB data pins. So if C comes back as 1 that means it was different to the sampled pin value, and if it comes back 0 it was the same value as the sampled pin value. This is fine and it detects logical 0/1 NRZI bitstream nicely.

    However, unless I am missing something else it appears you would then want to reuse C again for the next iteration. The problem is that this time around C is not the last pin value; it indicates whether there was a difference between the previous C value and the previous pin value. So some other operation to reset C back to the previous data pin value appears to be required before the next time it gets called, or some trick is required. Are you doing this as well somewhere in your code? I didn't see that mentioned anywhere. Won't this require an additional clock cycle to do?
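
    The concern can be illustrated with a few lines of Python. Taking the instruction literally (C = C XOR PINx, with C carried over from the previous call), C becomes a running parity of the samples, whereas NRZI decoding needs each sample XORed with the previous sample. The sample values are made up, and this says nothing about how the real opcode would be implemented.

```python
samples = [1, 0, 0, 1, 1, 1, 0]   # made-up pin samples, one per bit time

# Literal reading: C is whatever the previous GETXP left behind.
c = 0
chained = []
for pin in samples:
    c ^= pin                      # C = C XOR PINx
    chained.append(c)

# What NRZI decoding needs: compare each sample with the previous sample.
prev = 0
changed = []
for pin in samples:
    changed.append(pin ^ prev)    # 1 = level changed (a USB '0' bit)
    prev = pin

print(chained)
print(changed)
```

    The two lists diverge from the third sample on, so the previous pin level has to be kept somewhere other than the XOR result.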
  • jmgjmg Posts: 15,173
    edited 2014-03-10 19:56
    rogloh wrote: »
    So some other operation to reset C back to the previous data pin value appears to be required before the next time it gets called, or some trick is required. Are you doing this as well somewhere in your code? I didn't see that mentioned anywhere. Won't this require an additional clock cycle to do?

    I think this last value work is done in the background, as part of the opcode.
    It means the very first opcode C will be discarded, as the previous value is ?? but the SE0 will be valid from first clock.

    Once an edge is sensed, the SW will phase adjust to try to sample in bit-centre.
    The USB bit stream allows for edge-resync but that may be harder to achieve at 12MHz, so some limits on Xtal tolerance and data-length may be imposed.
    A 1.5MHz P2 probably could manage edge-resync, and 1.5MHz is fine for a lot of tasks.

    It may be that the P2 Counters have a capture mode that can help with edge-resync ?