P2 and full speed USB slave requirements/ideas - Page 4 — Parallax Forums

  • Cluso99 Posts: 18,069
    edited 2014-03-11 19:04
    jmg wrote: »
    I'm not following, byte handling allows much lower clocks, even in one task.
    I think lowish (FPGA region) clocks and threads should be a practical goal.
    If there is any possibility of doing it in 1 cog, then threads or interleaved code will be required. If we have a couple of free instructions per bit, then that permits getting things ready for the reply, as well as decoding as it comes in.
    Ideally, but TX is less 'drop dead' as it can take some time to assemble/organize things I think.
    True of course.
    Of course, I think coding a "verilog clone" in SW for 1.5 MHz USB testing should be possible.
    If that also allows timer-paced sampling, it is a small step to use counters and a per-byte jump.

    The shift to timer-paced operation uses almost identical Verilog, and a data buffer for read is small.
    It may also avoid this somewhat complex opcode, pushing down fMAX if it works on register-space.
    (timer paced code decouples things a little from register critical paths)
    The reason I am using FS rather than LS is that I can parallel the FTDI transactions, so I can snoop the USB FS. I don't have any LS USB devices (that I know of) that I can snoop the same way.
    What this means is that I can decode all the incoming frames to the FTDI Chip (connected to a P1). I can also snoop the replies.
    Remember, while I have read the USB Spec summaries, and looked at code doing the protocol, I have never actually done it. However, I have written lots of sync software over many years including SDLC and BiSync, and built ASCII to EBCDIC sync converters. But this was before TCPIP etc.
    I think maybe the CRC does not need a buffered read, as it is checked on EOP ?
    As long as you save after each byte, and you keep at least 3 levels, you will have the CRC available. I am unsure if the CRC can be used to verify the CRC (if you know what I mean). I will need to do some simple testing of this.
    BTW I did write a simple P1 program to calculate CRC5 & CRC16 for USB. I just have not got around to looking at it.
    If there are spare virtual Pins, the USB RxRDY flags could hook into some of those ?
    Not sure what you mean here?
    Chip would likely need to modify the counters slightly to allow /N reloadable counting, and edge resync.
    I'm not sure if those modes are already in the Counters.
    I don't know.

    I am still after the KISS way, at least for now. If it turns out that it's not too complex to set off a simple instruction to run in the background like the mult/cordic instructions, and they don't take huge blocks of silicon, then it may be worthwhile. ATM I am trying to walk not run.
  • Cluso99 Posts: 18,069
    edited 2014-03-11 19:10
    jmg wrote: »
    That's starting to sound like a lot of crossed fingers...?
    Chip may already have edge reset modes in the counters, and I think the SW WAIT can then work, with a Counter.

    To test at 1.5MHz, and a simple Reload timer, the FPGA needs to clock at either 78MHz or 81MHz , with reload values of 52 or 54, and use SW wait values of 50% of those for mid-bit sampling.
    No, it's not crossed fingers. I can reliably resync to new packets at 80MHz FPGA. It is just coded sequentially atm to read the incoming data bytes.
    At 80MHz a full-speed bit is 6 2/3 clocks, so the waits run +1/3, +1/3, -2/3 of a clock against the ideal (i.e. you wait 7, 7, then 6 clocks). That is how they got USB to work originally.
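
The +1/3, +1/3, -2/3 pacing can be checked with a little arithmetic. What follows is a minimal sketch in Python, not code from this thread: the helper name `wait_pattern` and the choice to round every sample point up to the next whole clock are assumptions made for illustration.

```python
from fractions import Fraction
from math import ceil

def wait_pattern(clk_hz, bit_hz, nbits):
    """Whole-clock waits between successive bit samples (hypothetical helper)."""
    clocks_per_bit = Fraction(clk_hz, bit_hz)   # exact ratio, e.g. 20/3
    waits, elapsed = [], 0
    for n in range(1, nbits + 1):
        target = ceil(clocks_per_bit * n)       # round each sample point up to a whole clock
        waits.append(target - elapsed)
        elapsed = target
    return waits

print(wait_pattern(80_000_000, 12_000_000, 6))  # full speed (12MHz) on an 80MHz clock
print(wait_pattern(78_000_000, 1_500_000, 3))   # 1.5MHz on a 78MHz clock
```

On an 80MHz clock the waits repeat as 7, 7, 6, so the accumulated error returns to zero every third bit. 78MHz and 81MHz divide evenly by 1.5MHz, which is where the reload values of 52 and 54 come from.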
  • jmg Posts: 15,173
    edited 2014-03-11 19:13
    Cluso99 wrote: »
    I am unsure if the CRC can be used to verify the CRC (if you know what I mean). I will need to do some simple testing of this.

    See the discussion further up.
    Apparently, if you include the CRC in the stream, ie read to the end tag, then the CRC should read 0000
    That would make life simpler.
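
The "read through the CRC and check for a constant" idea can be tried in software. The sketch below is Python written for illustration, not the P1 program mentioned in the thread, and all function names are hypothetical. It assumes the usual USB conventions: register preset to all ones, data bytes sent LSB first, CRC sent MSB first. One caveat worth stating: a residual of 0000 only appears if the CRC is sent as-is. USB inverts the CRC before sending, and the USB 2.0 specification gives the resulting residuals as 1000000000001101 (0x800D) for CRC16 and 01100 for CRC5.

```python
def crc_bits(bits, poly, width, init):
    """Bit-serial CRC: shift left, XOR in poly when input bit != register MSB."""
    mask = (1 << width) - 1
    crc = init
    for b in bits:
        msb = (crc >> (width - 1)) & 1
        crc = (crc << 1) & mask
        if b ^ msb:
            crc ^= poly
    return crc

def lsb_first(data):
    """USB sends each data byte least significant bit first."""
    return [(byte >> i) & 1 for byte in data for i in range(8)]

def crc_field(crc, width, invert):
    """CRC bits as sent: register MSB first, optionally inverted as USB does."""
    if invert:
        crc ^= (1 << width) - 1
    return [(crc >> i) & 1 for i in reversed(range(width))]

payload = lsb_first(bytes([0x00, 0x01, 0x02, 0x03]))   # arbitrary example data
crc16 = crc_bits(payload, 0x8005, 16, 0xFFFF)

plain = crc_bits(payload + crc_field(crc16, 16, invert=False), 0x8005, 16, 0xFFFF)
inverted = crc_bits(payload + crc_field(crc16, 16, invert=True), 0x8005, 16, 0xFFFF)
print(hex(plain), hex(inverted))
```

Either way the receiver only has to compare the register against one constant at EOP, so no buffered copy of the received CRC is needed.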
  • Cluso99 Posts: 18,069
    edited 2014-03-11 19:13
    jmg wrote: »
    The problem with this, is if the Verilog needs a lot of changes( as this does), it quickly becomes too clumsy to have someone else applying fix-ups. Also in the form you code, checking is harder as it is not so self contained.

    As always, it is better to code in small pieces, get 'working' equations, and look at the .eq0 & .rpt files to confirm you have counters / clock enables / MUXes as expected, and no logic blow-outs.

    Below is the code, edited/modified so Lattice Verilog at least compiles it (with some warnings).
    ////////////////////////////////////////////////////////////////////////////////
    // Acknowledgements: Verilog code for CRC's   http://www.easics.com
    // RR20140310 start
    // RR20140311,12 continued
    ////////////////////////////////////////////////////////////////////////////////
    // polynomial: CRC5usb=(0 2 5), CRC16usb=(0 2 15 16), CRC16ccitt=(0 5 12 16) 
    // data width: 1, LSB first
    //
    // inputs:  D, S, PINS
    // outputs: D, Z, C
    module          RxUSB
    (
    input   CLK,                 // 
    input   Load_d,              // 
    input   jI,              // 
    input   kI,              // 
    input   WZ,              // 
    input   WC,              // 
    input   [31:0]  s,              // S operand
    input   [31:0]  d,              // D operand
    input   [127:0] p,              // input pins
    output reg   [31:0]  r,              // D result
    output reg        zz,              // Z flag
    output reg        cy               // Carry flag    
    
    );
    
    reg     [15:0]  crc;            // original CRC (accumulated)
    reg     [2:0]   bitcnt;         // data bit counter 3 bits
    reg             k;              // K new pin value
    reg             j;              // J new pin value
    reg     [2:0]   stuffcnt;       // stuff counter 3 bits
    reg     [7:0]   data;           // data byte (accumulated)
    reg     [8:7]   poly;           // 00=crc05usb, 01=crc16usb, 10=crc16ccitt, 11=undefined (matches the decode below)
    reg     [6:0]   pinno;          // pin pair numbers 0-127
    reg     [15:0]  newcrc;         // new crc
    reg             t;              // 1 if k toggles (ie 1 bit)
    reg             kP;              // K old pin value
    reg             jP;              // J old pin value
    //reg     r,z,c;              // D result
    
    
    reg     crc05usb;
    reg     crc16usb;
    reg     crc16itt;
    reg     crc16ndef;
    reg     SkipStuff;
    reg     InvalidPM;
    
    ///////////////////////////////////////////////////////////////////////////////
    
    // 00=crc05usb, 01=crc16usb, 10=crc16ccitt, 11=undefined
        always @(poly)  begin   
            crc05usb  = (poly == 2'b00);    // CRC5usb   =(0 2 5)
            crc16usb  = (poly == 2'b01);    // CRC16usb  =(0 2      15 16)
            crc16itt  = (poly == 2'b10);    // CRC16ccitt=(0   5 12    16)
            crc16ndef = (poly == 2'b11);    // undefined - alias to one above 
        end
    
    // check for a "1" bit toggle
        always @(kI or jI or kP or stuffcnt)  begin   
          t = kI ^ kP;                 // new pin value ^ previous pin value; 1=toggled
          SkipStuff = (!t & (stuffcnt == 3'b110));  // !t needed for ccitt ?
          InvalidPM = (kI==jI);    // Signaling states are non-diff
        end
    
    
    
    always  @(posedge CLK) begin
      if (Load_d) begin  // WRITE to register - Value INIT
    //    crc      = d[31:16];        // original crc value (accum) moved below
        kP       = d[15];           // previous K
        jP       = d[14];           // previous J
        stuffcnt = d[13:11];        // original stuff counter value
        bitcnt   = d[10:8];         // original bit   counter value
        data     = d[7:0];          // original data value (accum)
    
        poly     = s[8:7];          // 00=crc05usb, 01=crc16usb, 10=crc16ccitt, 11=undefined
    //?   kpin     = value(s[6:0]);   // K pin no.
    //?   jpin     = value(s[6:0]) ^1 // J pin no.
        k        = kI;      // new pin value
        j        = jI;      // new pin value
      end // Load_d
      else begin // !Load_d  = normal RUN , compiler wants in one block..
        k        = kI;      // new pin value
        j        = jI;      // new pin value
        kP       = k;       // previous K
        jP       = j;       // previous J
    
    // check for bit unstuff
        if (SkipStuff) begin
            // unstuff
            stuffcnt = 3'b000;
    //      bitcnt   = bitcnt;    //  implicit, but makes hold action clear 
        end
        else if (!InvalidPM) begin
            // inc bit count & accum data bit into byte
            bitcnt++;
            stuffcnt++;           // will be reset at result if input bit toggles
        end  
      end // Load_d 
    end // (posedge CLK)     
    
    
    reg     kr0;
    reg     kr2;
    reg     kr5;
    reg     kr12;
    reg     kr15;
    
    always @(*) begin 
    // calculate the new crc... - decoded values, so no overlaps in if
        if (crc05usb) begin
            kr0  = t ^ crc[4];
            kr2  = t ^ crc[4];
            kr5  = 1'b0;
            kr12 = 1'b0;
            kr15 = 1'b0; 
        end
        if (crc16usb) begin
            kr0  = t ^ crc[15];
            kr2  = t ^ crc[15];
            kr5  = 1'b0;
            kr12 = 1'b0;
            kr15 = t ^ crc[15]; 
        end
        if (crc16itt) begin
            kr0  = t ^ crc[15];
            kr2  = 1'b0;
            kr5  = t ^ crc[15];
            kr12 = t ^ crc[15];
            kr15 = 1'b0; 
        end    
        if (crc16ndef) begin  // alias crc16itt, so cover ALL decodes.
            kr0  = t ^ crc[15];
            kr2  = 1'b0;
            kr5  = t ^ crc[15];
            kr12 = t ^ crc[15];
            kr15 = 1'b0; 
        end    
    end // always @(*)
    
    always  @(posedge CLK) begin
      if (Load_d) begin  // WRITE to register - Value INIT
        crc      = d[31:16];        // original crc value (accum)
      end 
      else if (!InvalidPM & !SkipStuff) begin
        crc[0]  = kr0;
        crc[1]  = crc[0];
        crc[2]  = crc[1] ^ kr2;
        crc[3]  = crc[2];
        crc[4]  = crc[3];
        crc[5]  = crc[4] ^ kr5;
        crc[6]  = crc[5];
        crc[7]  = crc[6];
        crc[8]  = crc[7];
        crc[9]  = crc[8];
        crc[10] = crc[9];
        crc[11] = crc[10];
        crc[12] = crc[11] ^ kr12;
        crc[13] = crc[12];
        crc[14] = crc[13];
        crc[15] = crc[14] ^ kr15;
      end // valid 
    end // (posedge CLK)     
        
    // set results
    always @(*) begin 
        r[31:16] = crc;
        r[15]    = k;
        r[14]    = j;
    end // always @(*)
    
    always  @(*) begin   // non register here ? - this is a bit mangled, data needs fixing 
        if (t)   begin                // toggled bit?
            r[13:11] = 3'b000;       // reset stuff counter - moved to above
        end
        else begin   
            r[13:11] = stuffcnt;
        end    
        r[10:8]  = bitcnt;
        if (SkipStuff) begin
            r[7:0] = data;
        end
        else begin
            r[7:1] = data[6:0];
            r[0]   = t;             // add new data bit
        end        
    end // @(*)     
    
    always  @(posedge CLK) begin
        if (WZ) begin
            if (  !SkipStuff & (bitcnt == 3'b000)) begin
                zz = 1'b1;           // byte ready
            end
            else begin
                zz = 1'b0;           // byte not ready
            end    
        end
    
        if (WC) begin          
            cy = k ^ j;              // c = SE0/SE1
        end           
    end // (posedge CLK)   
    endmodule  // RxUSB
    
    
    Thanks heaps.
    One thing I didn't know was that crc = crc ^ x was possible.
    I also don't know which ways are best; these are all things I don't understand.
    So for me, it's better that I ultimately put the things to be done within if blocks and let Chip (or you) sort that part out for me.
  • jmg Posts: 15,173
    edited 2014-03-11 19:15
    Cluso99 wrote: »
    No, it's not crossed fingers. I can reliably resync to new packets at 80MHz FPGA. It is just coded sequentially atm to read the incoming data bytes.

    The issue is not resync at the beginning, it is sampling creep during long packets.
  • Cluso99 Posts: 18,069
    edited 2014-03-11 19:15
    jmg wrote: »
    See the discussion further up.
    Apparently, if you include the CRC in the stream, ie read to the end tag, then the CRC should read 0000
    That would make life simpler.
    Yes, this is what I was wondering. Seems it should be possible because that is the way the old hw would have likely worked. But then again, it does not sound correct.
    It is easily solved by running my P1 program with some input parameters and seeing. I just haven't got around to it yet.
  • jmg Posts: 15,173
    edited 2014-03-11 20:14
    Cluso99 wrote: »
    Thanks heaps.

    I've updated the code, as checking the eqns showed it dropped the ball on some CRC nodes.
    I tend to always use <= for clocked and = for combinational logic; your use of = in clocked blocks sometimes works, but can get confused on more complex forms...
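
One plausible reading of why = in the clocked CRC block went wrong (an assumption, not something confirmed in the thread) is that blocking assignments written in ascending bit order each see the value the previous line has just written. The Python sketch below models only the shift part of the register, with the XOR taps left out; both function names are hypothetical.

```python
def shift_blocking(crc, kr0):
    """Model of '=' in source order: each line sees the lines before it."""
    crc = list(crc)
    crc[0] = kr0
    for i in range(1, len(crc)):
        crc[i] = crc[i - 1]          # already overwritten, so kr0 ripples through
    return crc

def shift_nonblocking(crc, kr0):
    """Model of '<=': every right-hand side is read before any register updates."""
    old = list(crc)
    return [kr0] + old[:-1]

state = [1, 0, 1, 1]                 # index 0 is crc[0]
print(shift_blocking(state, 0))      # [0, 0, 0, 0]
print(shift_nonblocking(state, 0))   # [0, 1, 0, 1]
```

In the blocking model the new bit floods the whole register in a single clock; in the non-blocking model the register shifts by one position, which is the behaviour a bit-serial CRC needs.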
  • Cluso99 Posts: 18,069
    edited 2014-03-11 20:31
    jmg wrote: »
    Below is the code, edited/modified so Lattice Verilog at least compiles it (with some warnings).
    I have reproduced snippets for related questions and my understanding of them...
    // 00=crc05usb, 01=crc16usb, 10=crc16ccitt, 11=undefined
        always @(poly)  begin   
            crc05usb  = (poly == 2'b00);    // CRC5usb   =(0 2 5)
            crc16usb  = (poly == 2'b01);    // CRC16usb  =(0 2      15 16)
            crc16itt  = (poly == 2'b10);    // CRC16ccitt=(0   5 12    16)
            crc16ndef = (poly == 2'b11);    // undefined - alias to one above 
        end
    
    Always decodes this, so it is a logic block, not a clocked register.
    // check for a "1" bit toggle
        always @(kI or jI or kP or stuffcnt)  begin   
          t = kI ^ kP;                 // new pin value ^ previous pin value; 1=toggled
          SkipStuff = (!t & (stuffcnt == 3'b110));  // !t needed for ccitt ?
          InvalidPM = (kI==jI);    // Signaling states are non-diff
        end
    
    Always decodes this on any change in the inputs kI, jI, kP, or stuffcnt. Again, a logic block.
    always  @(posedge CLK) begin
      if (Load_d) begin  // WRITE to register - Value INIT
    //    crc      = d[31:16];        // original crc value (accum) moved below
        kP       = d[15];           // previous K
        jP       = d[14];           // previous J
        stuffcnt = d[13:11];        // original stuff counter value
        bitcnt   = d[10:8];         // original bit   counter value
        data     = d[7:0];          // original data value (accum)
    
        poly     = s[8:7];          // 00=crc05usb, 01=crc16usb, 10=crc16ccitt, 11=undefined
    //?   kpin     = value(s[6:0]);   // K pin no.
    //?   jpin     = value(s[6:0]) ^1 // J pin no.
        k        = kI;      // new pin value
        j        = jI;      // new pin value
      end // Load_d
    
    Performed once at the start of each instruction (loads initial values), clocked by CLK.
      else begin // !Load_d  = normal RUN , compiler wants in one block..
        k        = kI;      // new pin value
        j        = jI;      // new pin value
        kP       = k;       // previous K
        jP       = j;       // previous J
    // check for bit unstuff
        if (SkipStuff) begin
            // unstuff
            stuffcnt = 3'b000;
    //      bitcnt   = bitcnt;    //  implicit, but makes hold action clear 
        end
        else if (!InvalidPM) begin
            // inc bit count & accum data bit into byte
            bitcnt++;
            stuffcnt++;           // will be reset at result if input bit toggles
        end  
      end // Load_d 
    end // (posedge CLK)     
    
    Completes the remaining initialisation, clocked by CLK.
    reg     kr0;
    reg     kr2;
    reg     kr5;
    reg     kr12;
    reg     kr15;
    
    always @(*) begin 
    // calculate the new crc... - decoded values, so no overlaps in if
        if (crc05usb) begin
            kr0  = t ^ crc[4];
            kr2  = t ^ crc[4];
            kr5  = 1'b0;
            kr12 = 1'b0;
            kr15 = 1'b0; 
        end
        if (crc16usb) begin
            kr0  = t ^ crc[15];
            kr2  = t ^ crc[15];
            kr5  = 1'b0;
            kr12 = 1'b0;
            kr15 = t ^ crc[15]; 
        end
        if (crc16itt) begin
            kr0  = t ^ crc[15];
            kr2  = 1'b0;
            kr5  = t ^ crc[15];
            kr12 = t ^ crc[15];
            kr15 = 1'b0; 
        end    
        if (crc16ndef) begin  // alias crc16itt, so cover ALL decodes.
            kr0  = t ^ crc[15];
            kr2  = 1'b0;
            kr5  = t ^ crc[15];
            kr12 = t ^ crc[15];
            kr15 = 1'b0; 
        end    
    end // always @(*)
    
    always  @(posedge CLK) begin
      if (Load_d) begin  // WRITE to register - Value INIT
        crc      = d[31:16];        // original crc value (accum)
      end 
      else if (!InvalidPM & !SkipStuff) begin
        crc[0]  = kr0;
        crc[1]  = crc[0];
        crc[2]  = crc[1] ^ kr2;
        crc[3]  = crc[2];
        crc[4]  = crc[3];
        crc[5]  = crc[4] ^ kr5;
        crc[6]  = crc[5];
        crc[7]  = crc[6];
        crc[8]  = crc[7];
        crc[9]  = crc[8];
        crc[10] = crc[9];
        crc[11] = crc[10];
        crc[12] = crc[11] ^ kr12;
        crc[13] = crc[12];
        crc[14] = crc[13];
        crc[15] = crc[14] ^ kr15;
      end // valid 
    end // (posedge CLK)     
    
    Calculates the new CRC, if required else keep the same, clocked by CLK. (needs a tidy up)
    // set results
    always @(*) begin 
        r[31:16] = crc;
        r[15]    = k;
        r[14]    = j;
    end // always @(*)
    
    always  @(*) begin   // non register here ? - this is a bit mangled, data needs fixing 
        if (t)   begin                // toggled bit?
            r[13:11] = 3'b000;       // reset stuff counter - moved to above
        end
        else begin   
            r[13:11] = stuffcnt;
        end    
        r[10:8]  = bitcnt;
        if (SkipStuff) begin
            r[7:0] = data;
        end
        else begin
            r[7:1] = data[6:0];
            r[0]   = t;             // add new data bit
        end        
    end // @(*)     
    
    Accumulates the new Data, if required else keep the same, clocked by CLK.
    Increment the bit counter, if required.
    Increments the stuff counter, or resets it, as required.
    always  @(posedge CLK) begin
        if (WZ) begin
            if (  !SkipStuff & (bitcnt == 3'b000)) begin
                zz = 1'b1;           // byte ready
            end
            else begin
                zz = 1'b0;           // byte not ready
            end    
        end
    
        if (WC) begin          
            cy = k ^ j;              // c = SE0/SE1
        end           
    end // (posedge CLK)   
    endmodule  // RxUSB
    
    Sets the Z & C flags, clocked by CLK.
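
For software that has to prepare or inspect the D operand, the field positions used in the Load_d branch can be written out as a pack/unpack pair. This is a Python sketch for illustration only: the function names are hypothetical, and the layout is taken from the bit slices in the listing (crc in d[31:16], previous K in d[15], previous J in d[14], stuff counter in d[13:11], bit counter in d[10:8], data in d[7:0]).

```python
def unpack_d(d):
    """Split the 32-bit D operand into the fields read in the Load_d branch."""
    return {
        "crc":      (d >> 16) & 0xFFFF,  # d[31:16]
        "kP":       (d >> 15) & 1,       # d[15]
        "jP":       (d >> 14) & 1,       # d[14]
        "stuffcnt": (d >> 11) & 0x7,     # d[13:11]
        "bitcnt":   (d >> 8) & 0x7,      # d[10:8]
        "data":     d & 0xFF,            # d[7:0]
    }

def pack_d(f):
    """Inverse of unpack_d: rebuild the 32-bit word from its fields."""
    return ((f["crc"] & 0xFFFF) << 16 | (f["kP"] & 1) << 15 | (f["jP"] & 1) << 14
            | (f["stuffcnt"] & 0x7) << 11 | (f["bitcnt"] & 0x7) << 8 | (f["data"] & 0xFF))

fields = unpack_d(0xFFFF9DA5)
print(fields)
print(hex(pack_d(fields)))
```

A start-of-packet value would typically preset crc to all ones and clear both counters, though the thread does not settle the exact initial word.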
  • jmg Posts: 15,173
    edited 2014-03-11 21:01
    Always decodes this, so it is a logic block, not a clocked register.
    Always decodes this on any change in the inputs kI, jI, kP, or stuffcnt. Again, a logic block.

    Yes, these are to make later code easier to read. The compiler/fit will likely optimize some of these names away.
    To keep them, they can be moved to the module header, where they become pins.
    Performed once at the start of each instruction (loads initial values), clocked by CLK.
    Completes the remaining initialisation, clocked by CLK.

    Only sort of. That's where it gets tricky - the best code is stand alone, that has one register and some muxes on that.
    That makes eqn-scan, and general testing easier.

    If you want to code this like it is a Read/Write path on a register, then you do not have the register as well and so it does not reduce down to useful equations.
    Calculates the new CRC, if required else keep the same, clocked by CLK. (needs a tidy up)
    See the amended code, <= is better than =, strangely = is almost right, and seems ok in very simple cases, and gives no errors.
    Accumulates the new Data, if required else keep the same, clocked by CLK.
    Increment the bit counter, if required.
    Increments the stuff counter, or resets it, as required.

    Again not quite, the counters are further up, and the data should really be registered
    Sets the Z & C flags, clocked by CLK.

    Yes.

    Rather than trying the double gymnastics of [start of each instruction] and [end of each instruction], I think it is best in the early stages to KISS, and focus on simplest most readable verilog, that is then used as a template for software.
    I treat each CLK as a data sample point on the USB waveform.

    I'm sure Chip will be able to re-warp it around registers, if the pathways allow it, or he may choose to use separate registers.
    At some point the extra muxes to merge all this into the opcode tree, will bite into the MHz values.
    Local routing is smaller and faster.

    The only benefit of a full merge into the multiport register stack, is you can run multiple copies in multiple registers, but I don't think anyone is expecting to run TWO USBs in one COG ?! Just one USB with some spare MIPS would be fine for most.
    There are 8 COGS here.
  • jmg Posts: 15,173
    edited 2014-03-11 21:31
    Cluso99 wrote: »
    The reason I am using FS rather than LS is that I can parallel the FTDI transactions, so I can snoop the USB FS. I don't have any LS USB devices (that I know of) that I can snoop the same way.
    What this means is that I can decode all the incoming frames to the FTDI Chip (connected to a P1). I can also snoop the replies.
    Remember, while I have read the USB Spec summaries, and looked at code doing the protocol, I have never actually done it. However, I have written lots of sync software over many years including SDLC and BiSync, and built ASCII to EBCDIC sync converters. But this was before TCPIP etc.

    I recalled other discussions on USB & port debug, and the suggested tool was PortMon.
    http://technet.microsoft.com/en-us/sysinternals/bb896644.aspx

    A P1 might even be able to edge-capture to 12.5ns at 1.5MHz speeds, as an edge-based logic analyser ?
    A complete frame can never need more than 1500 max stamps.
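
The 1500-stamp bound and the 12.5ns figure both follow from simple arithmetic, sketched here in Python. The helper names are hypothetical, and the sketch assumes a 1 ms USB frame and at most one edge per bit time.

```python
FRAME_US = 1000                      # one USB frame is 1 ms

def max_edges_per_frame(bit_rate_hz):
    """Upper bound on timestamps: at most one edge per bit time in a frame."""
    return bit_rate_hz * FRAME_US // 1_000_000

def tick_ns(clk_hz):
    """Timestamp resolution for a capture clocked at clk_hz."""
    return 1_000_000_000 / clk_hz

print(max_edges_per_frame(1_500_000))    # low speed
print(max_edges_per_frame(12_000_000))   # full speed
print(tick_ns(80_000_000), tick_ns(100_000_000))
```

At low speed the bound is 1500 edges per frame; at full speed it grows to 12000, which is one reason timestamp capture gets harder there.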
  • Sapieha Posts: 2,964
    edited 2014-03-11 21:37
    Hi jmg.

    It compiles in Quartus for me.

    Maybe from the attached schematic you can see whether all the logic is as desired.


    jmg wrote: »
    The problem with this, is if the Verilog needs a lot of changes( as this does), it quickly becomes too clumsy to have someone else applying fix-ups. Also in the form you code, checking is harder as it is not so self contained.

    As always, it is better to code in small pieces, get 'working' equations, and look at the .eq0 & .rpt files to confirm you have counters / clock enables / MUXes as expected, and no logic blow-outs.

    Below is the code, edited/modified so Lattice Verilog at least compiles it (with some warnings).
    ////////////////////////////////////////////////////////////////////////////////
    // Acknowledgements: Verilog code for CRC's   http://www.easics.com
    // RR20140310 start
    // RR20140311,12 continued
    ////////////////////////////////////////////////////////////////////////////////
    // polynomial: CRC5usb=(0 2 5), CRC16usb=(0 2 15 16), CRC16ccitt=(0 5 12 16) 
    // data width: 1, LSB first
    //
    // inputs:  D, S, PINS
    // outputs: D, Z, C
    module          RxUSB
    (
    input   CLK,                 // 
    input   Load_d,              // 
    input   jI,              // 
    input   kI,              // 
    input   WZ,              // 
    input   WC,              // 
    input   [31:0]  s,              // S operand
    input   [31:0]  d,              // D operand
    input   [127:0] p,              // input pins
    output reg   [31:0]  r,              // D result
    output reg        zz,              // Z flag
    output reg        cy,               // Carry flag    
    
    output reg     SkipStuff,   // move so can see in EQNs better
    output reg     InvalidPM
    
    );
    
    reg     [15:0]  crc;            // original CRC (accumulated)
    reg     [2:0]   bitcnt;         // data bit counter 3 bits
    reg             k;              // K new pin value
    reg             j;              // J new pin value
    reg     [2:0]   stuffcnt;       // stuff counter 3 bits
    reg     [7:0]   data;           // data byte (accumulated)
    reg     [8:7]   poly;           // 00=crc05usb, 01=crc16usb, 10=crc16ccitt, 11=undefined (matches the decode below)
    reg     [6:0]   pinno;          // pin pair numbers 0-127
    reg     [15:0]  newcrc;         // new crc
    reg             t;              // 1 if k toggles (ie 1 bit)
    reg             kP;              // K old pin value
    reg             jP;              // J old pin value
    //reg     r,z,c;              // D result
    
    
    reg     crc05usb;
    reg     crc16usb;
    reg     crc16itt;
    reg     crc16ndef;
    
    ///////////////////////////////////////////////////////////////////////////////
    
    // 00=crc05usb, 01=crc16usb, 10=crc16ccitt, 11=undefined
        always @(poly)  begin   
            crc05usb  = (poly == 2'b00);    // CRC5usb   =(0 2 5)
            crc16usb  = (poly == 2'b01);    // CRC16usb  =(0 2      15 16)
            crc16itt  = (poly == 2'b10);    // CRC16ccitt=(0   5 12    16)
            crc16ndef = (poly == 2'b11);    // undefined - alias to one above 
        end
    
    // check for a "1" bit toggle
        always @(kI or jI or kP or stuffcnt)  begin   
          t = kI ^ kP;                 // new pin value ^ previous pin value; 1=toggled
          SkipStuff = (!t & (stuffcnt == 3'b110));  // !t needed for ccitt ?
          InvalidPM = (kI==jI);    // Signaling states are non-diff
        end
    
    
    
    always  @(posedge CLK) begin
      if (Load_d) begin  // WRITE to register - Value INIT
    //    crc      = d[31:16];        // original crc value (accum) moved below
        kP       = d[15];           // previous K
        jP       = d[14];           // previous J
        stuffcnt = d[13:11];        // original stuff counter value
        bitcnt   = d[10:8];         // original bit   counter value
        data     = d[7:0];          // original data value (accum)
    
        poly     = s[8:7];          // 00=crc05usb, 01=crc16usb, 10=crc16ccitt, 11=undefined
    //?   kpin     = value(s[6:0]);   // K pin no.
    //?   jpin     = value(s[6:0]) ^1 // J pin no.
        k        = kI;      // new pin value
        j        = jI;      // new pin value
      end // Load_d
      else begin // !Load_d  = normal RUN , compiler wants in one block..
        k        = kI;      // new pin value
        j        = jI;      // new pin value
        kP       = k;       // previous K
        jP       = j;       // previous J
    
    // check for bit unstuff
        if (SkipStuff) begin
            // unstuff
            stuffcnt <= 3'b000;
    //      bitcnt   = bitcnt;    //  implicit, but makes hold action clear 
        end
        else if (!InvalidPM) begin
            // inc bit count & accum data bit into byte
            bitcnt++;
            if (t) 
               stuffcnt <= 3'b000; // reset if input bit toggles
            else
               stuffcnt++;          
        end  
      end // Load_d 
    end // (posedge CLK)     
    
    
    reg     kr0;
    reg     kr2;
    reg     kr5;
    reg     kr12;
    reg     kr15;
    reg     HoldCRC;
    
    always @(*) begin 
    // calculate the new crc... - decoded values, so no overlaps in if
        if (crc05usb) begin
            kr0  = t ^ crc[4];
            kr2  = t ^ crc[4];
            kr5  = 1'b0;
            kr12 = 1'b0;
            kr15 = 1'b0; 
        end
        if (crc16usb) begin
            kr0  = t ^ crc[15];
            kr2  = t ^ crc[15];
            kr5  = 1'b0;
            kr12 = 1'b0;
            kr15 = t ^ crc[15]; 
        end
        if (crc16itt) begin
            kr0  = t ^ crc[15];
            kr2  = 1'b0;
            kr5  = t ^ crc[15];
            kr12 = t ^ crc[15];
            kr15 = 1'b0; 
        end    
        if (crc16ndef) begin  // alias crc16itt, so cover ALL decodes.
            kr0  = t ^ crc[15];
            kr2  = 1'b0;
            kr5  = t ^ crc[15];
            kr12 = t ^ crc[15];
            kr15 = 1'b0; 
        end  
        HoldCRC = InvalidPM | SkipStuff;
    end // always @(*)
    
    always  @(posedge CLK) begin
      if (Load_d) begin  // WRITE to register - Value INIT
        crc     <= d[31:16];        // original crc value (accum)
      end 
      else if (!HoldCRC) begin  // shift only on a valid, non-stuffed bit (was: HoldCRC, which inverted the original condition)
        crc[0]  <= kr0;              //16
        crc[1]  <= crc[0];           //17
        crc[2]  <= crc[1] ^ kr2;     //18
        crc[3]  <= crc[2];           //19
        crc[4]  <= crc[3];           //20
        crc[5]  <= crc[4] ^ kr5;     //21
        crc[6]  <= crc[5];           //22
        crc[7]  <= crc[6];           //23
        crc[8]  <= crc[7];           //24
        crc[9]  <= crc[8];           //25
        crc[10] <= crc[9];           //26
        crc[11] <= crc[10];          //27 - bad eqns??, needed <= 
        crc[12] <= crc[11] ^ kr12;   //28
        crc[13] <= crc[12];          //29
        crc[14] <= crc[13];          //30
        crc[15] <= crc[14] ^ kr15;   //31
      end // valid 
    end // (posedge CLK)     
        
    // set results
    always @(*) begin 
        r[31:16] = crc;
        r[15]    = k;
        r[14]    = j;
    end // always @(*)
    
    always  @(*) begin   // non register here ? - this is a bit mangled, data needs fixing 
        if (t)   begin                // toggled bit?
            r[13:11] = 3'b000;       // reset stuff counter
        end
        else begin   
            r[13:11] = stuffcnt;
        end    
        r[10:8]  = bitcnt;
        if (SkipStuff) begin
            r[7:0] = data;
        end
        else begin
            r[7:1] = data[6:0];
            r[0]   = t;             // add new data bit
        end        
    end // @(*)     
    
    always  @(posedge CLK) begin
        if (WZ) begin
            if (  !SkipStuff & (bitcnt == 3'b000)) begin
                zz <= 1'b1;           // byte ready
            end
            else begin
                zz <= 1'b0;           // byte not ready
            end    
        end
    
        if (WC) begin          
            cy <= k ^ j;              // c = SE0/SE1
        end           
    end // (posedge CLK)   
    
    
    endmodule
    
    

    Updated code, better CRC eqns
    [Attachment: RxUSB.jpg, 1024 x 367, 49K]
  • Bill Henning Posts: 6,445
    edited 2014-03-11 21:37
    P1: 10ns @ 100MHz :-)

    Btw,

    http://www.seeedstudio.com/depot/Open-Workbench-Logic-Sniffer-p-612.html?cPath=63

    is a great little tool!

    I have a Hantek 500Msps logic analyzer, but I tend to use the Logic Sniffer's a lot more.
  • jmg Posts: 15,173
    edited 2014-03-11 22:11

    Looks nice, says
    16 channels with 8K sample depth
    8 channels with 16K sample depth

    which is just a little light for 1 USB frame.

    My personal preference is Logic Analysers that capture & store timestamps, as they have much better dynamic range.
    A P1 might make 1.5MHz that way ?
    > 3MHz (say 4MHz) would allow capture of the USB edges and the mid-point sample-tags, but that may be asking too much.
    I guess multiple COGS could give more, and a Logic capture unit does not care if it uses 7 COGS for captures.
  • SapiehaSapieha Posts: 2,964
    edited 2014-03-11 22:25
    Hi Cluso.

    I have a question regarding the USB packet:

    1. Does every BYTE have a header with bit stuffing, or is that only at the start of the packet?

    2. Does every byte have a Start-Stop condition, or only the entire packet?

    Sorry for the questions, but I can't find this on the Internet.
  • Bob Lawrence (VE1RLL)Bob Lawrence (VE1RLL) Posts: 1,720
    edited 2014-03-11 22:31
    Re: Seed Studio:
    I tend to use the Logic Sniffers a lot more.

    I use their Logic Sniffer (Ver 4). The Bus Pirate is cool as well.
  • Cluso99Cluso99 Posts: 18,069
    edited 2014-03-11 22:40
    I think this is getting close now. Thanks again jmg.
    ////////////////////////////////////////////////////////////////////////////////
    // RR20140310-12 P2 RxUSB instruction
    ////////////////////////////////////////////////////////////////////////////////
    /*---------------------------------------------------------------------------------------------------------------------
                  RxUSB   D, S/#          WZ,WC             ' Receive single NRZI bit pair, accum CRC and byte, unstuff bits
    where
      S/# is the PinPair# and Poly bits
        S[31..9]  = unused
        S[8..7]   = 00= CRC5  USB    (0 2 5)  
                    01= CRC16 USB    (0 2 15 16)
                    10= CRC16 CCITT  (0 5 12 16)
                    11= undefined
        S[6..0]   = D-/D+ Pin Pair #0..127
                    The pin pair is always a pair of pins mod 2. ie nnnnnnx where x=0 and x=1 for the pair.
                    If the pin pair is even (S[0]=0) then J is the lowest pin and K is the higher pin of the consecutive pair
                    If the pin pair is odd  (S[0]=1) then K is the lowest pin and J is the higher pin of the consecutive pair.
                    This arrangement allows for simple LS and FS by making the pin pair even or odd.                              
      D is the cog register storing a 32 bit field...
        D[31..16] = crc16
        D[15]     = K new pin value
        D[14]     = J new pin value
        D[13..11] = unstuff counter 3 bits
        D[10..8]  = bit counter 3 bits
        D[7..0]   = data byte accumulation
      Z = data byte ready (8 bits)
      C = SE0/SE1
    It would be acceptable for D to be at a fixed location eg $1F0.
    ---------------------------------------------------------------------------------------------------------------------*/
    // inputs:  D, S, PINS
    // outputs: D, Z, C
    ////////////////////////////////////////////////////////////////////////////////
    module          RxUSB
    (
    input           CLK,
    input           Load_d,
    input           jI,             // new J value
    input           kI,             // new K value
    input   [31:0]  s,              // S operand
    input   [31:0]  d,              // D operand
    input           wz,             // WZ operand
    input           wc,             // WC operand
    input   [127:0] p,              // input pins
    output reg [31:0] r,            // D result (reg: assigned inside an always block)
    output reg      zz,             // Z flag
    output reg      cy              // C flag
    );
    reg     [15:0]  crc;            // original CRC (accumulated)
    reg     [2:0]   bitcnt;         // data bit counter 3 bits
    reg             k;              // K new pin value
    reg             j;              // J new pin value
    reg     [2:0]   stuffcnt;       // stuff counter 3 bits
    reg     [7:0]   data;           // data byte (accumulated)
    reg     [1:0]   poly;           // crc05usb/crc16usb/crc16ccitt/undef polynomial selection
    reg     [6:0]   pinno;          // pin pair numbers 0-127
    reg             kP;             // K previous pin value
    reg             jP;             // J previous pin value
    // flags/conditions...
    reg             crc05usb;       // 00= CRC5  USB    
    reg             crc16usb;       // 01= CRC16 USB   
    reg             crc16itt;       // 10= CRC16 CCITT 
    reg             crc16ndef;      // 11= undefined   
    reg             toggle;         // data bit 0 or 1
    reg             BitStuff;       // unstuff this bit
    reg             SE0_SE1;        // SE0/SE1 condition
    ///////////////////////////////////////////////////////////////////////////////
    // set crc option
        always @(poly)  begin   
            crc05usb  = (poly == 2'b00);                    // CRC5usb   =(0 2 5)
            crc16usb  = (poly == 2'b01);                    // CRC16usb  =(0 2      15 16)
            crc16itt  = (poly == 2'b10);                    // CRC16ccitt=(0   5 12    16)
            crc16ndef = (poly == 2'b11);                    // undefined
        end
    // check for a "1" bit toggle, and SE0/SE1 conditions, and BitStuff condition
        always @(kI or jI or kP or stuffcnt)  begin   
            toggle    = kI ^ kP;                            // data bit (toggle) = new pin value ^ previous pin value
            SE0_SE1   = (kI == jI);                         // detect SE0/SE1 (j==k)
            BitStuff  = (!toggle & (stuffcnt == 3'b110) & (crc05usb | crc16usb));  // unstuff this bit
        end    
    ///////////////////////////////////////////////////////////////////////////////
    // Set Initial conditions
        always @(posedge CLK) begin
            if (Load_d) begin                               // write initial values to registers
                kP       = d[15];                           // previous K
                jP       = d[14];                           // previous J
                stuffcnt = d[13:11];                        // original stuff counter value
                bitcnt   = d[10:8];                         // original bit   counter value
                data     = d[7:0];                          // original data value (accum)
                poly     = s[8:7];                          // 00=crc05usb, 01=crc16usb, 10=crc16ccitt, 11=undefined
    // ???      kpin     = value(s[6:0]);                   // K pin no.
    // ???      jpin     = value(s[6:0]) ^ 1;               // J pin no.
    // ???      k        = pins[kpin];                      // new pin value
    // ???      j        = pins[jpin];                      // new pin value
                k        = kI;                              // new pin value
                j        = jI;                              // new pin value
            end
            else begin                                      // !Load_d = normal RUN (compiler wants in one block)
    // ??? is this correct way around etc ???
                k        = kI;                              // new pin value
                j        = jI;                              // new pin value
                kP       = kI;                              // previous pin value
                jP       = jI;                              // previous pin value
    // check for bit unstuff
                if (BitStuff) begin
                    // unstuff...
                    stuffcnt = 3'b000;                      // reset unstuff counter
    //              bitcnt   = bitcnt;                      // implicit but makes hold action clear
                end
                else if (!SE0_SE1) begin
                    // valid data bit
                    bitcnt   = bitcnt + 1;                  // count this data bit
                    stuffcnt = stuffcnt + 1;                // will be reset at result if input bit toggles
                end
            end
        end                                                                          
    ///////////////////////////////////////////////////////////////////////////////
    // CRC routine
    reg             kr0;
    reg             kr2;
    reg             kr5;
    reg             kr12;
    reg             kr15;
    
    // calculate the new crc... (decoded values so no overlaps in if)
        always @(*) begin
            if (crc05usb) begin
                kr0  = toggle ^ crc[4];
                kr2  = toggle ^ crc[4];
                kr5  = 1'b0;
                kr12 = 1'b0;
                kr15 = 1'b0;
            end
            if (crc16usb) begin
                kr0  = toggle ^ crc[15];
                kr2  = toggle ^ crc[15];
                kr5  = 1'b0;
                kr12 = 1'b0;
                kr15 = toggle ^ crc[15]; 
            end
            if (crc16itt) begin
                kr0  = toggle ^ crc[15];
                kr2  = 1'b0;
                kr5  = toggle ^ crc[15];
                kr12 = toggle ^ crc[15];
                kr15 = 1'b0; 
            end
            if (crc16ndef) begin
                kr0  = 1'b0;
                kr2  = 1'b0;
                kr5  = 1'b0;
                kr12 = 1'b0;
                kr15 = 1'b0; 
            end
        end        
        always @(posedge CLK) begin
            if (Load_d) begin                               // write to reg initial value
                crc = d[31:16];                             // original crc value (accum)
            end
            else if (!SE0_SE1 & !BitStuff) begin
                crc[0]  = kr0;
                crc[1]  = crc[0];
                crc[2]  = crc[1] ^ kr2;
                crc[3]  = crc[2];
                crc[4]  = crc[3];
                crc[5]  = crc[4] ^ kr5;
                crc[6]  = crc[5];
                crc[7]  = crc[6];
                crc[8]  = crc[7];
                crc[9]  = crc[8];
                crc[10] = crc[9];
                crc[11] = crc[10];
                crc[12] = crc[11] ^ kr12;
                crc[13] = crc[12];
                crc[14] = crc[13];
                crc[15] = crc[14] ^ kr15;
            end
        end    
            
    ///////////////////////////////////////////////////////////////////////////////
        
    // set D results
        always @(*)  begin                                  // ??? or @(posedge CLK)
            r[31:16] = crc;
            r[15]    = k;
            r[14]    = j;
            r[13:11] = stuffcnt;
            r[10:8]  = bitcnt;
            if (BitStuff) begin
                r[7:0]   = data;                            // unstuff so no change
            end
            else begin
                r[6:0] = data[7:1];                         // LSB first - shift and...
                r[7]   = toggle;                            // ...add new data bit
            end       
        end    
        
    // set Z and C flags
        always @(*)  begin                                  // ??? or @(posedge CLK)
            if (wz) begin
                if (!BitStuff & (bitcnt == 3'b000)) begin
                    zz = 1'b1;                              // byte ready
                end
                else begin    
                    zz = 1'b0;                              // byte not ready
                end
            end
            if (wc) begin
                cy = SE0_SE1;                               // c = SE0/SE1
            end           
        end
    endmodule
    ///////////////////////////////////////////////////////////////////////////////
    
    
  • Cluso99Cluso99 Posts: 18,069
    edited 2014-03-11 22:44
    jmg wrote: »
    Yes, these are to make later code easier to read. The compiler/fit will likely optimize some of these names away.
    To keep them, they can be moved to the module header, where they become pins.



    Only sort of. That's where it gets tricky - the best code is stand alone, that has one register and some muxes on that.
    That makes eqn-scan, and general testing easier.

    If you want to code this like it is a Read/Write path on a register, then you do not have the register as well and so it does not reduce down to useful equations.


    See the amended code, <= is better than =, strangely = is almost right, and seems ok in very simple cases, and gives no errors.



    Again not quite, the counters are further up, and the data should really be registered



    Yes.

    Rather than trying the double gymnastics of [start of each instruction] and [end of each instruction], I think it is best in the early stages to KISS, and focus on simplest most readable verilog, that is then used as a template for software.
    I treat each CLK as a data sample point on the USB waveform.

    I'm sure Chip will be able to re-warp it around registers, if the pathways allow it, or he may choose to use separate registers.
    At some point the extra muxes to merge all this into the opcode tree, will bite into the MHz values.
    Local routing is smaller and faster.

    The only benefit of a full merge into the multiport register stack, is you can run multiple copies in multiple registers, but I don't think anyone is expecting to run TWO USBs in one COG ?! Just one USB with some spare MIPS would be fine for most.
    There are 8 COGS here.
    Thanks. I need to relook at the changes you made.

    Yes, I am sure Chip will know the best way to do it.

    No, I am not expecting to do multiple USBs in a single cog. Urhg :(
  • Cluso99Cluso99 Posts: 18,069
    edited 2014-03-11 22:51
    Sapieha wrote: »
    Hi Cluso.

    I have a question regarding the USB packet:

    1. Does every BYTE have a header with bit stuffing, or is that only at the start of the packet?

    2. Does every byte have a Start-Stop condition, or only the entire packet?

    Sorry for the questions, but I can't find this on the Internet.
    1. Any group of bits can have a bit stuff. If there are six bits in a row without a transition, a bit change is inserted. So it applies right from the start. However, because the header data is special, I don't think it can occur at the beginning. But I will take care of it anyway because that's the easiest thing to do.

    2. No, it's NRZI synchronous. No start or stop bits ever. There are sync bits at the start, and the SE0 (both J & K low) at the end.
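
    The two answers above can be sketched as a small software model. This is a minimal Python sketch, not the RxUSB logic itself: it assumes the usual USB convention that no transition decodes as 1 and a transition as 0, that a 0 is stuffed after six consecutive 1s, and that bytes are assembled LSB first. The draft Verilog in this thread treats a toggle as the data bit, so the polarity there may need checking. All function names are hypothetical.

```python
def nrzi_decode(levels, idle=1):
    """Decode line levels (1 = J, 0 = K), one per bit time, into data bits."""
    bits = []
    prev = idle
    for level in levels:
        bits.append(1 if level == prev else 0)  # no change = 1, change = 0
        prev = level
    return bits


def unstuff(bits):
    """Drop the 0 that follows every run of six consecutive 1s."""
    out = []
    ones = 0
    expect_stuffed = False
    for b in bits:
        if expect_stuffed:
            if b != 0:
                raise ValueError("bit-stuff error: seventh consecutive 1")
            expect_stuffed = False
            ones = 0
            continue  # stuffed bit carries no data
        out.append(b)
        if b == 1:
            ones += 1
            expect_stuffed = ones == 6
        else:
            ones = 0
    return out


def pack_lsb_first(bits):
    """Assemble whole bytes from bits, first bit received = bit 0."""
    usable = len(bits) - len(bits) % 8
    return bytes(
        sum(bits[i + k] << k for k in range(8)) for i in range(0, usable, 8)
    )


# The sync field KJKJKJKK, received from idle J, decodes to 0x80.
sync_levels = [0, 1, 0, 1, 0, 1, 0, 0]
print(pack_lsb_first(unstuff(nrzi_decode(sync_levels))).hex())
```

    There are no per-byte start or stop bits anywhere in this model; byte boundaries come only from counting eight unstuffed bits after the sync field.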


    BTW, thanks for the logic, but I am so far removed from that now that it's not much help to me at the moment.
  • jmgjmg Posts: 15,173
    edited 2014-03-11 22:54
    Cluso99 wrote: »
    I think this is getting close now. Thanks again jmg.

    I see you combine CRC into BitStuff - Here, it may pay to expand that slightly?

    See http://en.wikipedia.org/wiki/Cyclic_redundancy_check

    crc05usb -> Does this change BitStuff ?
    crc16usb -> Use USB bit-stuff rules
    crc16itt -> Use SDLC bitstuff rules
    crc16ndef -> disable BitStuff, for more general CRC use ? - Pick one ?
    I think you can also attach the CRC to a USB sending Pins (includes stuff, which HW removes), and (quickly) grab the CRC for use in TX append ?
  • Cluso99Cluso99 Posts: 18,069
    edited 2014-03-11 23:00
    I don't have a logic analyser. I am just going to try this realtime to snoop what is happening on a real FS USB (FTDI to P1). I will just treat it as though I am receiving the data, and debug info out the P2s serial. I can check it out quite simply as I am used to this type of thing.

    A DE2 could do this in two cogs and the Propplug.

    Earlier it was asked about syncing to SE0 and waiting. It is a simple matter while waiting for the next valid frame to start, to look for the SE0 or SE1 condition. Two successive pin reads will validate an SE0 condition. Remember, the USB line is not oscillating (else the whole thing is U/S), so the unfortunate read during a transition will be resolved by two consecutive reads. The frame resync mechanism is not hard and I am doing that now (well 3+ months ago).
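
    A minimal Python sketch of the two-read SE0 check described above. It assumes each sample is a (D+, D-) pair of pin reads taken on successive instructions; the function name `confirmed_se0` is hypothetical.

```python
def confirmed_se0(samples):
    """Return the index of the second of two consecutive SE0 reads.

    Each sample is a (dp, dm) pair of pin values. A single SE0 read is
    ignored because it may have been taken during a transition.
    Returns None if no confirmed SE0 is found.
    """
    prev_se0 = False
    for i, (dp, dm) in enumerate(samples):
        se0 = dp == 0 and dm == 0
        if se0 and prev_se0:
            return i
        prev_se0 = se0
    return None


# One stray (0, 0) read mid-transition, then a real SE0.
reads = [(1, 0), (0, 0), (0, 1), (0, 0), (0, 0)]
print(confirmed_se0(reads))
```

    The same loop shape would apply to SE1 by testing for both pins high.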
  • Cluso99Cluso99 Posts: 18,069
    edited 2014-03-11 23:10
    jmg wrote: »
    I see you combine CRC into BitStuff - Here, it may pay to expand that slightly?

    See http://en.wikipedia.org/wiki/Cyclic_redundancy_check

    crc05usb -> Does this change BitStuff ?
    crc16usb -> Use USB bit-stuff rules
    crc16itt -> Use SDLC bitstuff rules
    crc16ndef -> disable BitStuff, for more general CRC use ? - Pick one ?
    I think you can also attach the CRC to a USB sending Pins (includes stuff, which HW removes), and (quickly) grab the CRC for use in TX append ?
    If memory serves me correctly, I don't think the CRC5 frames can even generate a bit stuff because of the bit formatting.

    Just reread the CRC algorithms on the wiki. It's as I thought: by just passing the received CRC through the CRC generator, the final CRC after this will be a fixed value ($8005 IIRC). This is easy. It's been so long since I calculated CRC16s on IBM sync comms using micros.

    The value depends on the start value ($0000 or $FFFF) and endian (LSB or MSB first). Once it is working I can check the endian issue.

    You may have noted that the last post also fixed the endian of the data byte; I had it the wrong way around :(

    And yes, I am sure I can grab the CRC calculated from this during the last data bit for sending out the CRC.
  • SapiehaSapieha Posts: 2,964
    edited 2014-03-11 23:47
    Hi Cluso.

    On this page there is a link to a PDF.

    http://forums.parallax.com/showthread.php/125543-Propeller-II-update-BLOG?p=1250045&viewfull=1#post1250045

    It shows the CRC5 taking 11 bits in, which says to me that it is calculated after all the bits following the PID have been received.

    BUT the CRC16 is calculated bitwise.



    Cluso99 wrote: »
    If memory serves me correctly, I don't think the CRC5 frames can even generate a bit stuff because of the bit formatting.

    Just reread the CRC algorithms on the wiki. It's as I thought: by just passing the received CRC through the CRC generator, the final CRC after this will be a fixed value ($8005 IIRC). This is easy. It's been so long since I calculated CRC16s on IBM sync comms using micros.

    The value depends on the start value ($0000 or $FFFF) and endian (LSB or MSB first). Once it is working I can check the endian issue.

    You may have noted that the last post also fixed the endian of the data byte; I had it the wrong way around :(

    And yes, I am sure I can grab the CRC calculated from this during the last data bit for sending out the CRC.
  • roglohrogloh Posts: 5,786
    edited 2014-03-12 03:51
    Here is some brief CRC info from the USB guys... looks like they invert the CRC before transmitting it at the end, and on reception there will be a known constant residual after doing the CRC on the entire packet including its CRC (it will be non zero because of this). Something to bear in mind.

    http://www.usb.org/developers/whitepapers/crcdes.pdf
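
    The constant-residual behaviour can be checked with a bit-serial model. This is a Python sketch under stated assumptions: a 16-bit shift register preset to $FFFF, generator taps (0 2 15 16) written as $8005, data bits fed LSB first, and the inverted register sent MSB first as the CRC. With those assumptions the receiver ends on $800D for any payload, which matches the CRC16 residual as I read the USB 2.0 spec (1000000000001101); $8005 is the generator polynomial, not the residual. The function names are hypothetical.

```python
CRC16_POLY = 0x8005  # x^16 + x^15 + x^2 + 1, x^16 term implicit


def crc_shift(reg, bit, poly=CRC16_POLY, width=16):
    """Clock one data bit into the CRC shift register."""
    feedback = ((reg >> (width - 1)) & 1) ^ bit
    reg = (reg << 1) & ((1 << width) - 1)
    if feedback:
        reg ^= poly
    return reg


def bits_lsb_first(data):
    """Expand bytes into the order the bits appear on the wire."""
    return [(byte >> k) & 1 for byte in data for k in range(8)]


def crc16_field(payload_bits):
    """Return the 16 CRC bits as transmitted: inverted, MSB first."""
    reg = 0xFFFF
    for b in payload_bits:
        reg = crc_shift(reg, b)
    return [((reg >> k) & 1) ^ 1 for k in range(15, -1, -1)]


def receiver_residual(payload):
    """Run payload plus its CRC field through the receiver's register."""
    bits = bits_lsb_first(payload)
    bits = bits + crc16_field(bits)
    reg = 0xFFFF
    for b in bits:
        reg = crc_shift(reg, b)
    return reg


for payload in (b"", b"\x00\x01\x02\x03", bytes(range(32))):
    print(hex(receiver_residual(payload)))
```

    Because the residual is fixed, the receiver does not need to buffer the CRC bytes separately; it only compares the register against the constant at EOP.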

    EDIT: Another document which describes SE0 detection problems... sounds a bit scary if there is asynchronous SE0 generation and bit dribble going on (see Pages 7-8)...! http://www.usb.org/developers/whitepapers/siewp.pdf
  • Bill HenningBill Henning Posts: 6,445
    edited 2014-03-12 06:49
    Actually, the buffers are a bit bigger than that - from the web page:
    216K Block RAM supports following memory configurations*

    8 channels with 24K sample depth
    16 channels with 12K sample depth
    32 channels with 6K sample depth

    What I like about them is how inexpensive they are, after I got the first one, I picked up two more so I could test more gear at the same time.

    My 500Msps unit only has a 4K buffer, which is a real pain. It is supposed to have a compressed mode, but with the firmware I have installed, that does not work. Reminds me to update its firmware...

    What I really want is a 1Gsps or 2Gsps unit with a large buffer...

    Hanno's ViewPort will sample to clkfreq using four cogs, and has an approximately 1500 sample buffer. I used it to debug Morpheus a few years ago.
    jmg wrote: »
    Looks nice, says
    16 channels with 8K sample depth
    8 channels with 16K sample depth

    which is just a little light for 1 USB frame.

    My personal preference is Logic Analysers that capture & store timestamps, as they have much better dynamic range.
    A P1 might make 1.5MHz that way ?
    > 3MHz (say 4MHz) would allow capture of the USB edges and the mid-point sample-tags, but that may be asking too much.
    I guess multiple COGS could give more, and a Logic capture unit does not care if it uses 7 COGS for captures.
  • cgraceycgracey Posts: 14,151
    edited 2014-03-12 08:38
    Cluso99,

    I'm about ready to release the new FPGA image. I just need to finish the docs.

    Do you still want me to make a USB pin instruction for this release, or are things too up in the air now?
  • jmgjmg Posts: 15,173
    edited 2014-03-12 11:57
    cgracey wrote: »
    I'm about ready to release the new FPGA image. I just need to finish the docs.

    Do you still want me to make a USB pin instruction for this release, or are things too up in the air now?

    I would say here, that any code that defines and selects the pin-pair (with reverse feature), and does SE0 and Toggle decode will still be common to any solution. (ie not be wasted at all)
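
    For reference, the common part can be modelled in a few lines. This Python sketch follows the S operand layout given in the RxUSB header comment earlier in the thread (S[8..7] = polynomial select, S[6..0] = pin pair, with even/odd choosing which pin of the pair is J). The names `decode_s` and `line_flags` are hypothetical, and the toggle definition mirrors the draft Verilog (new K xor previous K).

```python
POLY_NAMES = {0: "CRC5 USB", 1: "CRC16 USB", 2: "CRC16 CCITT", 3: "undefined"}


def decode_s(s):
    """Split the S operand into (polynomial name, J pin, K pin)."""
    poly = (s >> 7) & 0b11
    pair = s & 0x7F
    low = pair & ~1          # lower pin of the consecutive pair
    high = low | 1           # upper pin of the consecutive pair
    if pair & 1 == 0:
        j_pin, k_pin = low, high   # even: J is the lower pin
    else:
        j_pin, k_pin = high, low   # odd: K is the lower pin
    return POLY_NAMES[poly], j_pin, k_pin


def line_flags(j, k, k_prev):
    """Return (se0_se1, toggle) for one sample of the pin pair."""
    se0_se1 = j == k          # both pins equal means SE0 or SE1
    toggle = k ^ k_prev       # line changed state since last sample
    return se0_se1, toggle


print(decode_s((1 << 7) | 4))   # CRC16 USB on pins 4/5, J low
print(decode_s(5))              # CRC5 USB on pins 4/5, reversed
```

    Swapping between the even and odd pin number is what gives the reverse feature for LS versus FS wiring.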

    It would also allow more testing in a FPGA, as the present USB code is not quite enough entirely in SW.

    That said, a release now would be used by everyone, and if all that is added is USB_SET on a .1 release, only a few would need to download the .1, so to most it would not be a dual release.
  • jmgjmg Posts: 15,173
    edited 2014-03-12 13:13
    Pasted from other thread, as it is USB detail
    cgracey wrote: »
    The counters can count the frequency of edges and the durations of states, but they don't have a reload mode like you are asking about.

    A special circuit can be made for the USB handler, though. In many instances, it's not the guts of a circuit that take lots of space, but all the conduit to make it breathe. If we encapsulated it, it might be the best way to go.

    Here is some Verilog for a Sync'able Baud counter, that should work from

    /4 ie 48MHz CLK on 12M USB
    to
    > /133 ie > 200MHz CLK on 1.5M USB
    //           0   1   2   3   0   1   2   3   0   1   2   3   0   1   2   3   0   1   2   3   0   1   2   3
    // CLK  ==\_/=\_/=\_/=\_/=\_/=\_/=\_/=\_/=\_/=\_/=\_/=\_/=\_/=\_/=\_/=\_/=\_/=\_/=\_/=\_/=\_/=\_/=\_/=\_/=\_/=\_/=
    // Di  ==============\\\____________///==============================================\\\_____________///============
    // Di'  ================\______________/================================================\_______________/============
    // Di'' ====================\_______s______/========s===============s===============s=======\_______s_______/============
    // ED    _______________/===\__________/===\____________________________________________/===\___________/===\________
    // RST                      ^->0           ^->0                                             ^->0           ^->0 
    // TSW  xxxxxxxxxxxxxxxxxxxx____/====\__________/====\__________/====\__________/====\__________/====\__________
    //                                  ^-               ^-              ^-              ^-              ^- get s
    //  RL    Comp       CTR    TSW
    // 0011   SHR  001    M4    1-2
    // 0111   SHR  011    M8    3-4
    // 1000   SHR  100    M9    4-5
    // 1001   SHR  100    M10   4-5
    //                            ^-- Grabs Di'' on this edge 
    //  ED is Edge Detect,  Di' XOR Di'',  and TSW is Sample Enable Window for Di'', one clk wide.
    // TSW as CE samples just before falling edge  
    
    
    reg [7:0] RL_Ctr;    // RL is 8b reload field, ED(i), TSW(o) are one bit
      always @(*) begin  // combin codes == is 16 wide, >= is many more.
        RL_FS = (RL_Ctr == RL);              // common compare, flyback to 00, change constraints to keep this.
        TSW   = (RL_Ctr == {1'b0,RL[7:1]});  // Divide by 2 compare/slice 1 clk wide , keeps clear of flyback chatter.
        RL_FZ = RL_FS | ED | WrRL;  // force to zero on Either FullScale (free run) OR USB Edge Detect 
        // Optional WriteRL signal, can reset on Baud change, to allow timed SW start, and safe lowering of Value.
      end  
     always  @(posedge CLK) 
     begin
         if (RL_FZ) begin 
            RL_Ctr <= 8'b00000000; // Sync Flyback on TC 
         end
         else begin 
            RL_Ctr <= RL_Ctr+1;    // Up counter 
         end
     end
    
     // * changed to Up counter, due to rounding nature of SHR
     //   Adds a compare, but counters are simpler, Sync CLR or INC. 
     // * Added Force Zero term, to give optimizer less options & shrink counters further.
     // * Added Optional WrRL, to force reset on Update of BaudValue(RL)
     //   Allows Sw control of timing, and safe decrease in BaudV 
    


    TSW is the Sample window, which can then enable the Byte-level WAITUSB style code block discussed above.

    ie this snippet allows BYTE level rather than BIT level handling, and re-syncs sample point on USB data, to allow longer stream tolerance.

    On /4 the phase of TSW matters, but I think above is right, for samples taken from D'' (2nd sampler FF)

    This supports odd divides too, for more clock flexibility. Takes an 8 bit RL value to set Baud speed.
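
    A behavioural Python model of the counter above may help when checking sample phase. It is only a sketch: ED is supplied as a set of clock indices on which the edge detect is high, WrRL is ignored, and the names `simulate_baud_counter` and `edge_clocks` are hypothetical.

```python
def simulate_baud_counter(rl, edge_clocks, n_clocks):
    """Return the clock indices on which TSW (sample window) is high.

    rl          -- 8-bit reload value (clocks per bit minus one)
    edge_clocks -- set of clock indices where ED is asserted
    n_clocks    -- number of clocks to simulate
    """
    ctr = 0
    tsw_clocks = []
    for clk in range(n_clocks):
        # Combinational terms, evaluated from the current counter value.
        full_scale = ctr == rl
        tsw = ctr == (rl >> 1)          # mid-bit compare, one clock wide
        force_zero = full_scale or clk in edge_clocks
        if tsw:
            tsw_clocks.append(clk)
        # Clocked update: sync clear or increment.
        ctr = 0 if force_zero else (ctr + 1) & 0xFF
    return tsw_clocks


# /4 operation (RL = 3), free running, then with a USB edge seen on clock 6.
print(simulate_baud_counter(3, set(), 12))
print(simulate_baud_counter(3, {6}, 12))
```

    In the second run the edge pulls the following sample point one clock earlier, which is the resync behaviour described above.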
  • Cluso99Cluso99 Posts: 18,069
    edited 2014-03-12 16:19
    cgracey wrote: »
    Cluso99,

    I'm about ready to release the new FPGA image. I just need to finish the docs.

    Do you still want me to make a USB pin instruction for this release, or are things too up in the air now?

    Chip,
    If it is easy to convert this Verilog then it would be nice to have this to be able to test it. I am not sure it is totally correct, but it is a place to start.
    BTW Where I use xxx = 3'b000 or similar, jmg has suggested it be xxx <= 3'b000 (ie replace = with <=). I have not had time to check what this means.

    My code is in post #107.
    Thanks heaps.

    Postedit: Should have said, if it is quicker/easier for you to put out a release without it, and follow up with a release with the above shortly after, that is fine by me.

    BTW IMHO I think a number of us would appreciate the fpga code + pnut before you complete the docs. We have a lot of things to change since the last release even without the docs.
  • Cluso99Cluso99 Posts: 18,069
    edited 2014-03-12 16:29
    rogloh wrote: »
    Here is some brief CRC info from the USB guys... looks like they invert the CRC before transmitting it at the end, and on reception there will be a known constant residual after doing the CRC on the entire packet including its CRC (it will be non zero because of this). Something to bear in mind.

    http://www.usb.org/developers/whitepapers/crcdes.pdf

    EDIT: Another document which describes SE0 detection problems... sounds a bit scary if there is asynchronous SE0 generation and bit dribble going on (see Pages 7-8)...! http://www.usb.org/developers/whitepapers/siewp.pdf
    I haven't looked yet. There are various ways the CRCs can be calculated, and hence the reverse/invert/start values can all be different. The ultimate result is the same though. FWIW I am hoping that I have it the correct way around. I realised yesterday I had the data byte being assembled in reverse (MSB first instead of LSB) but I fixed that yesterday. The CRC16 I have used requires that the CRC be preset to $FFFF and will result at the end with IIRC $8005. I can check this out when Chip implements the Verilog. If it's wrong, then temporarily I can correct it in sw and modify the Verilog for the next release.

    I am happy to be able to detect SE0 and also to resync for new frames. I have seen dribble detection but I don't have it covered yet.

    Thanks for the links. There is a much older P2 USB thread that I started where I listed some of the docs I use. I am quite happy I have the bit stream covered, but I don't have a good understanding of the upper sw protocol levels. But I do have some info to guide me.
  • jmgjmg Posts: 15,173
    edited 2014-03-12 16:31
    Cluso99 wrote: »
    Chip,
    If it is easy to convert this Verilog then it would be nice to have this to be able to test it. I am not sure it is totally correct, but it is a place to start.
    My code is in post #107.
    Thanks heaps.

    I think Chip meant the earlier, simpler code to allocate Pins and manage SE0 and T into the flags?

    The code in #107 is not quite 'mission-ready', and Pin mapping and the couple of FF's & XORs to do SE0_SE1 and T should be common to any extended code.
    BTW Where I use xxx = 3'b000 or similar, jmg has suggested it be xxx <= 3'b000 (ie replace = with <=). I have not had time to check what this means.
    <= assign is verilog that ensures you do get a clocked result. ( ie usually a D-FF )
    = within a clocked block seems to sometimes give a clocked result, but not always. Best to be careful.
    ( another reason I suggested you run something like Lattice ISPlever)
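
    The difference can be illustrated with a small model. This Python sketch only mimics simulation semantics, under the assumption that every `<=` in a clocked block reads the pre-edge values, while each `=` takes effect before the next statement runs. The register names and functions are hypothetical.

```python
def clock_nonblocking(state):
    """Model of:  a <= b;  b <= a;   inside always @(posedge CLK)."""
    old = dict(state)            # all right-hand sides see pre-edge values
    return {"a": old["b"], "b": old["a"]}


def clock_blocking(state):
    """Model of:  a = b;  b = a;   inside always @(posedge CLK)."""
    new = dict(state)
    new["a"] = new["b"]          # takes effect immediately
    new["b"] = new["a"]          # reads the value just written
    return new


start = {"a": 0, "b": 1}
print(clock_nonblocking(start))  # registers swap
print(clock_blocking(start))     # both end up holding the old b
```

    In the draft RxUSB code the same effect would apply wherever a value written with `=` is read again later in the same clocked block.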