P2 and full speed USB slave requirements/ideas

Cluso99 · 2014-03-11 19:04

jmg wrote: »

I'm not following, byte handling allows much lower clocks, even in one task.
I think lowish (FPGA region) clocks and threads should be a practical goal.

If there is any possibility of doing it in 1 cog, then threads or interleaved code will be required. If we have a couple of free instructions per bit, then that permits getting things ready for the reply, as well as decoding as it comes in.

Ideally, but TX is less 'drop dead' as it can take some time to assemble/organize things I think.

True of course.

Of course, I think coding a "verilog clone" in SW for 1.5 MHz USB testing should be possible.
If that also allows timer-paced sampling, it is a small step to use counters and a per-byte jump.

The shift to timer-paced operation uses almost identical Verilog, and a data buffer for read is small.
It may also avoid this somewhat complex opcode, pushing down fMAX if it works on register-space.
(timer paced code decouples things a little from register critical paths)

The reason I am using FS rather than LS is that I can parallel the FTDI transactions, so I can snoop the USB FS. I don't have any LS USB devices (that I know of) that I can snoop the same way.
What this means is that I can decode all the incoming frames to the FTDI Chip (connected to a P1). I can also snoop the replies.
Remember, while I have read the USB Spec summaries, and looked at code doing the protocol, I have never actually done it. However, I have written lots of sync software over many years including SDLC and BiSync, and built ASCII to EBCDIC sync converters. But this was before TCP/IP etc.

I think maybe the CRC does not need a buffered read, as it is checked on EOP ?

As long as you save after each byte, and you keep at least 3 levels, you will have the CRC available. I am unsure if the CRC can be used to verify the CRC (if you know what I mean). I will need to do some simple testing of this.
BTW I did write a simple P1 program to calculate CRC5 & CRC16 for USB. I just have not got around to looking at it.

If there are spare virtual Pins, the USB RxRDY flags could hook into some of those ?

Not sure what you mean here?

Chip would likely need to modify the counters slightly to allow /N reloadable counting, and edge resync.
I'm not sure if those modes are already in the Counters.

I don't know.

I am still after the KISS way, at least for now. If it turns out that it's not too complex to set off a simple instruction to run in the background like the mult/cordic instructions, and they don't take huge blocks of silicon, then it may be worthwhile. ATM I am trying to walk not run.

Cluso99 · 2014-03-11 19:10

jmg wrote: »

That's starting to sound like a lot of crossed fingers...?
Chip may already have edge reset modes in the counters, and I think the SW WAIT can then work, with a Counter.

To test at 1.5MHz, and a simple Reload timer, the FPGA needs to clock at either 78MHz or 81MHz , with reload values of 52 or 54, and use SW wait values of 50% of those for mid-bit sampling.

No, it's not crossed fingers. I can reliably resync to new packets at 80MHz FPGA. It is just coded sequentially atm to read the incoming data bytes.
For 80MHz you wait +1/3 +1/3 -2/3 (i.e. over each group of three bits the waits are adjusted by +1 clock, +1 clock, -2 clocks, so the error cancels). That is how they got USB to work originally.
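
The thirds pattern above can be reproduced with a little arithmetic. This is only a sketch, and it assumes the figures are for full speed (12MHz) at an 80MHz clock, where one bit is 6 2/3 clocks; the function name is made up for illustration.

```python
from fractions import Fraction
from math import ceil

def wait_pattern(clk_hz, bit_hz, nbits):
    """Whole-clock waits per bit whose running total never drifts a full clock."""
    ideal = Fraction(clk_hz, bit_hz)      # exact clocks per bit, e.g. 20/3
    waits, elapsed = [], 0
    for n in range(1, nbits + 1):
        target = ceil(ideal * n)          # first whole clock at or after bit edge n
        waits.append(target - elapsed)
        elapsed = target
    return waits

# 80MHz / 12MHz -> waits of 7, 7, 6 clocks: errors of +1/3, +1/3, -2/3 per bit
print(wait_pattern(80_000_000, 12_000_000, 6))
# 80MHz / 1.5MHz -> 54, 53, 53; 78MHz / 1.5MHz -> a constant 52
print(wait_pattern(80_000_000, 1_500_000, 3), wait_pattern(78_000_000, 1_500_000, 3))
```

Because the total is recomputed from the ideal bit time each step, the error is bounded within one clock however long the packet is.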

jmg · 2014-03-11 19:13

Cluso99 wrote: »

I am unsure if the CRC can be used to verify the CRC (if you know what I mean). I will need to do some simple testing of this.

See the discussion further up.
Apparently, if you include the CRC in the stream, ie read to the end tag, then the CRC should read 0000
That would make life simpler.
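
A quick software check of this idea, as a sketch only. It assumes the usual USB parameters (CRC register preset to all ones, CRC sent inverted, bits LSB first) and uses the right-shifting form with bit-reversed polynomials; the function names are made up. With the inversion, running the CRC over data plus received CRC leaves a fixed non-zero residual rather than 0000; without the inversion it does come out as zero.

```python
def crc_bits(bits, poly, init):
    """Bit-serial CRC, LSB first, right-shifting form (poly is bit-reversed)."""
    reg = init
    for b in bits:
        if (reg ^ b) & 1:
            reg = (reg >> 1) ^ poly
        else:
            reg >>= 1
    return reg

def residual(bits, poly, width, init, invert):
    """Append the CRC to the data, run the CRC over the lot, return what is left."""
    crc = crc_bits(bits, poly, init)
    if invert:
        crc ^= (1 << width) - 1           # USB transmits the complemented CRC
    sent = bits + [(crc >> i) & 1 for i in range(width)]
    return crc_bits(sent, poly, init)

def bytes_to_bits(data):
    return [(byte >> i) & 1 for byte in data for i in range(8)]

CRC16 = dict(poly=0xA001, width=16, init=0xFFFF)   # x^16 + x^15 + x^2 + 1
CRC5 = dict(poly=0x14, width=5, init=0x1F)         # x^5 + x^2 + 1

payload = bytes_to_bits(b"123456789")
print(hex(residual(payload, invert=True, **CRC16)))    # same constant for any payload
print(hex(residual(payload, invert=False, **CRC16)))   # zero when the CRC is not inverted
```

So a receiver can still check at EOP by comparing against one constant per polynomial.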

Cluso99 · 2014-03-11 19:13

jmg wrote: »

The problem with this, is if the Verilog needs a lot of changes( as this does), it quickly becomes too clumsy to have someone else applying fix-ups. Also in the form you code, checking is harder as it is not so self contained.

As always, it is better to code in small pieces, get 'working' equations, and look at the .eq0 & .rpt files to confirm you have counters / clock enables / MUXes as expected, and no logic blow-outs.

Below is the code, edited/modified so Lattice Verilog at least compiles it (with some warnings).

////////////////////////////////////////////////////////////////////////////////
// Acknowledgements: Verilog code for CRC's   http://www.easics.com
// RR20140310 start
// RR20140311,12 continued
////////////////////////////////////////////////////////////////////////////////
// polynomial: CRC5usb=(0 2 5), CRC16usb=(0 2 15 16), CRC16ccitt=(0 5 12 16) 
// data width: 1, LSB first
//
// inputs:  D, S, PINS
// outputs: D, Z, C
module          RxUSB
(
input   CLK,                 // 
input   Load_d,              // 
input   jI,              // 
input   kI,              // 
input   WZ,              // 
input   WC,              // 
input   [31:0]  s,              // S operand
input   [31:0]  d,              // D operand
input   [127:0] p,              // input pins
output reg   [31:0]  r,              // D result
output reg        zz,              // Z flag
output reg        cy               // Carry flag    

);

reg     [15:0]  crc;            // original CRC (accumulated)
reg     [2:0]   bitcnt;         // data bit counter 3 bits
reg             k;              // K new pin value
reg             j;              // J new pin value
reg     [2:0]   stuffcnt;       // stuff counter 3 bits
reg     [7:0]   data;           // data byte (accumulated)
reg     [8:7]   poly;           // 00=crc05usb, 01=crc16usb, 10=crc16ccitt, 11=undefined
reg     [6:0]   pinno;          // pin pair numbers 0-127
reg     [15:0]  newcrc;         // new crc
reg             t;              // 1 if k toggles (ie 1 bit)
reg             kP;              // K old pin value
reg             jP;              // J old pin value
//reg     r,z,c;              // D result


reg     crc05usb;
reg     crc16usb;
reg     crc16itt;
reg     crc16ndef;
reg     SkipStuff;
reg     InvalidPM;

///////////////////////////////////////////////////////////////////////////////

// 00=crc05usb, 01=crc16usb, 10=crc16ccitt, 11=undefined
    always @(poly)  begin   
        crc05usb  = (poly == 2'b00);    // CRC5usb   =(0 2 5)
        crc16usb  = (poly == 2'b01);    // CRC16usb  =(0 2      15 16)
        crc16itt  = (poly == 2'b10);    // CRC16ccitt=(0   5 12    16)
        crc16ndef = (poly == 2'b11);    // undefined - alias to one above 
    end

// check for a "1" bit toggle
    always @(kI or jI or kP or stuffcnt)  begin   
      t = kI ^ kP;                 // new pin value ^ previous pin value; 1=toggled
      SkipStuff = (!t & (stuffcnt == 3'b110));  // !t needed for ccitt ?
      InvalidPM = (kI==jI);    // Signaling states are non-diff
    end



always  @(posedge CLK) begin
  if (Load_d) begin  // WRITE to register - Value INIT
//    crc      = d[31:16];        // original crc value (accum) moved below
    kP       = d[15];           // previous K
    jP       = d[14];           // previous J
    stuffcnt = d[13:11];        // original stuff counter value
    bitcnt   = d[10:8];         // original bit   counter value
    data     = d[7:0];          // original data value (accum)

    poly     = s[8:7];          // 00=crc05usb, 01=crc16usb, 10=crc16ccitt, 11=undefined
//?   kpin     = value(s[6:0]);   // K pin no.
//?   jpin     = value(s[6:0]) ^1 // J pin no.
    k        = kI;      // new pin value
    j        = jI;      // new pin value
  end // Load_d
  else begin // !Load_d  = normal RUN , compiler wants in one block..
    k        = kI;      // new pin value
    j        = jI;      // new pin value
    kP       = k;       // previous K
    jP       = j;       // previous J

// check for bit unstuff
    if (SkipStuff) begin
        // unstuff
        stuffcnt = 3'b000;
//      bitcnt   = bitcnt;    //  implicit, but makes hold action clear 
    end
    else if (!InvalidPM) begin
        // inc bit count & accum data bit into byte
        bitcnt++;
        stuffcnt++;           // will be reset at result if input bit toggles
    end  
  end // Load_d 
end // (posedge CLK)     


reg     kr0;
reg     kr2;
reg     kr5;
reg     kr12;
reg     kr15;

always @(*) begin 
// calculate the new crc... - decoded values, so no overlaps in if
    if (crc05usb) begin
        kr0  = t ^ crc[4];
        kr2  = t ^ crc[4];
        kr5  = 1'b0;
        kr12 = 1'b0;
        kr15 = 1'b0; 
    end
    if (crc16usb) begin
        kr0  = t ^ crc[15];
        kr2  = t ^ crc[15];
        kr5  = 1'b0;
        kr12 = 1'b0;
        kr15 = t ^ crc[15]; 
    end
    if (crc16itt) begin
        kr0  = t ^ crc[15];
        kr2  = 1'b0;
        kr5  = t ^ crc[15];
        kr12 = t ^ crc[15];
        kr15 = 1'b0; 
    end    
    if (crc16ndef) begin  // alias crc16itt, so cover ALL decodes.
        kr0  = t ^ crc[15];
        kr2  = 1'b0;
        kr5  = t ^ crc[15];
        kr12 = t ^ crc[15];
        kr15 = 1'b0; 
    end    
end // always @(*)

always  @(posedge CLK) begin
  if (Load_d) begin  // WRITE to register - Value INIT
    crc      = d[31:16];        // original crc value (accum)
  end 
  else if (!InvalidPM & !SkipStuff) begin
    crc[0]  = kr0;
    crc[1]  = crc[0];
    crc[2]  = crc[1] ^ kr2;
    crc[3]  = crc[2];
    crc[4]  = crc[3];
    crc[5]  = crc[4] ^ kr5;
    crc[6]  = crc[5];
    crc[7]  = crc[6];
    crc[8]  = crc[7];
    crc[9]  = crc[8];
    crc[10] = crc[9];
    crc[11] = crc[10];
    crc[12] = crc[11] ^ kr12;
    crc[13] = crc[12];
    crc[14] = crc[13];
    crc[15] = crc[14] ^ kr15;
  end // valid 
end // (posedge CLK)     
    
// set results
always @(*) begin 
    r[31:16] = crc;
    r[15]    = k;
    r[14]    = j;
end // always @(*)

always  @(*) begin   // non register here ? - this is a bit mangled, data needs fixing 
    if (t)   begin                // toggled bit?
        r[13:11] = 3'b000;       // reset stuff counter - moved to above
    end
    else begin   
        r[13:11] = stuffcnt;
    end    
    r[10:8]  = bitcnt;
    if (SkipStuff) begin
        r[7:0] = data;
    end
    else begin
        r[7:1] = data[6:0];
        r[0]   = t;             // add new data bit
    end        
end // @(*)     

always  @(posedge CLK) begin
    if (WZ) begin
        if (  !SkipStuff & (bitcnt == 3'b000)) begin
            zz = 1'b1;           // byte ready
        end
        else begin
            zz = 1'b0;           // byte not ready
        end    
    end

    if (WC) begin          
        cy = k ^ j;              // c = SE0/SE1
    end           
end // (posedge CLK)   
endmodule  // RxUSB

Thanks heaps.
One thing I didn't know was that crc = crc ^ x was possible.
I also don't know which ways are best. These are all things I don't understand.
So for me, it's better that I ultimately put the things to be done within if blocks and let Chip (or you) sort that part out for me.

jmg · 2014-03-11 19:15

Cluso99 wrote: »

No, it's not crossed fingers. I can reliably resync to new packets at 80MHz FPGA. It is just coded sequentially atm to read the incoming data bytes.

The issue is not resync at the beginning, it is sampling creep during long packets.

Cluso99 · 2014-03-11 19:15

jmg wrote: »

See the discussion further up.
Apparently, if you include the CRC in the stream, ie read to the end tag, then the CRC should read 0000
That would make life simpler.

Yes, this is what I was wondering. It seems it should be possible, because that is the way the old hw would likely have worked. But then again, it does not sound correct.
It is easily solved by running my P1 program with some input parameters to see. I just haven't got around to it yet.

jmg · 2014-03-11 20:14

Cluso99 wrote: »

Thanks heaps.

I've updated the code, as checking the eqns showed it dropped the ball on some CRC nodes.
I tend to always use <= for clocked and = for combinatorial logic. Your use of = in clocked blocks sometimes works, but it can get confused on more complex forms...
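
To illustrate why = can go wrong in a clocked shift chain such as the CRC register: the sketch below is a small Python model of Verilog's assignment semantics (it is not tool output, and the function names are invented). With blocking assignments each line sees the result of the line before it, so the new bit ripples through every stage in one clock; with non-blocking assignments every right-hand side is sampled first.

```python
def shift_blocking(reg, new_bit):
    """Model of '=' in a clocked block: crc[0] = in; crc[1] = crc[0]; ..."""
    reg = list(reg)
    reg[0] = new_bit
    for i in range(1, len(reg)):
        reg[i] = reg[i - 1]      # reg[i-1] was already overwritten this clock
    return reg

def shift_nonblocking(reg, new_bit):
    """Model of '<=': all right-hand sides use the values from before the clock."""
    old = list(reg)
    return [new_bit] + old[:-1]

start = [1, 1, 1, 1]
print(shift_blocking(start, 0))      # [0, 0, 0, 0]  the whole chain collapses
print(shift_nonblocking(start, 0))   # [0, 1, 1, 1]  a true one-bit shift
```

Writing the blocking assignments from the top bit down would also behave as a shift, which may be why = appears to work in simple cases.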

Cluso99 · 2014-03-11 20:31

jmg wrote: »

Below is the code, edited/modified so Lattice Verilog at least compiles it (with some warnings).

I have reproduced snippets for related questions and my understanding of them...

// 00=crc05usb, 01=crc16usb, 10=crc16ccitt, 11=undefined
    always @(poly)  begin   
        crc05usb  = (poly == 2'b00);    // CRC5usb   =(0 2 5)
        crc16usb  = (poly == 2'b01);    // CRC16usb  =(0 2      15 16)
        crc16itt  = (poly == 2'b10);    // CRC16ccitt=(0   5 12    16)
        crc16ndef = (poly == 2'b11);    // undefined - alias to one above 
    end

Always decodes this, so it is a logic block, not a clocked register.

// check for a "1" bit toggle
    always @(kI or jI or kP or stuffcnt)  begin   
      t = kI ^ kP;                 // new pin value ^ previous pin value; 1=toggled
      SkipStuff = (!t & (stuffcnt == 3'b110));  // !t needed for ccitt ?
      InvalidPM = (kI==jI);    // Signaling states are non-diff
    end

Always decodes this on any change in the inputs kI, jI, kP, or stuffcnt. Again, a logic block.

always  @(posedge CLK) begin
  if (Load_d) begin  // WRITE to register - Value INIT
//    crc      = d[31:16];        // original crc value (accum) moved below
    kP       = d[15];           // previous K
    jP       = d[14];           // previous J
    stuffcnt = d[13:11];        // original stuff counter value
    bitcnt   = d[10:8];         // original bit   counter value
    data     = d[7:0];          // original data value (accum)

    poly     = s[8:7];          // 00=crc05usb, 01=crc16usb, 10=crc16ccitt, 11=undefined
//?   kpin     = value(s[6:0]);   // K pin no.
//?   jpin     = value(s[6:0]) ^1 // J pin no.
    k        = kI;      // new pin value
    j        = jI;      // new pin value
  end // Load_d

Performed once at the start of each instruction (loads initial values), clocked by CLK.

  else begin // !Load_d  = normal RUN , compiler wants in one block..
    k        = kI;      // new pin value
    j        = jI;      // new pin value
    kP       = k;       // previous K
    jP       = j;       // previous J
// check for bit unstuff
    if (SkipStuff) begin
        // unstuff
        stuffcnt = 3'b000;
//      bitcnt   = bitcnt;    //  implicit, but makes hold action clear 
    end
    else if (!InvalidPM) begin
        // inc bit count & accum data bit into byte
        bitcnt++;
        stuffcnt++;           // will be reset at result if input bit toggles
    end  
  end // Load_d 
end // (posedge CLK)

Completes the remaining initialisation, clocked by CLK.

reg     kr0;
reg     kr2;
reg     kr5;
reg     kr12;
reg     kr15;

always @(*) begin 
// calculate the new crc... - decoded values, so no overlaps in if
    if (crc05usb) begin
        kr0  = t ^ crc[4];
        kr2  = t ^ crc[4];
        kr5  = 1'b0;
        kr12 = 1'b0;
        kr15 = 1'b0; 
    end
    if (crc16usb) begin
        kr0  = t ^ crc[15];
        kr2  = t ^ crc[15];
        kr5  = 1'b0;
        kr12 = 1'b0;
        kr15 = t ^ crc[15]; 
    end
    if (crc16itt) begin
        kr0  = t ^ crc[15];
        kr2  = 1'b0;
        kr5  = t ^ crc[15];
        kr12 = t ^ crc[15];
        kr15 = 1'b0; 
    end    
    if (crc16ndef) begin  // alias crc16itt, so cover ALL decodes.
        kr0  = t ^ crc[15];
        kr2  = 1'b0;
        kr5  = t ^ crc[15];
        kr12 = t ^ crc[15];
        kr15 = 1'b0; 
    end    
end // always @(*)

always  @(posedge CLK) begin
  if (Load_d) begin  // WRITE to register - Value INIT
    crc      = d[31:16];        // original crc value (accum)
  end 
  else if (!InvalidPM & !SkipStuff) begin
    crc[0]  = kr0;
    crc[1]  = crc[0];
    crc[2]  = crc[1] ^ kr2;
    crc[3]  = crc[2];
    crc[4]  = crc[3];
    crc[5]  = crc[4] ^ kr5;
    crc[6]  = crc[5];
    crc[7]  = crc[6];
    crc[8]  = crc[7];
    crc[9]  = crc[8];
    crc[10] = crc[9];
    crc[11] = crc[10];
    crc[12] = crc[11] ^ kr12;
    crc[13] = crc[12];
    crc[14] = crc[13];
    crc[15] = crc[14] ^ kr15;
  end // valid 
end // (posedge CLK)

Calculates the new CRC, if required else keep the same, clocked by CLK. (needs a tidy up)

// set results
always @(*) begin 
    r[31:16] = crc;
    r[15]    = k;
    r[14]    = j;
end // always @(*)

always  @(*) begin   // non register here ? - this is a bit mangled, data needs fixing 
    if (t)   begin                // toggled bit?
        r[13:11] = 3'b000;       // reset stuff counter - moved to above
    end
    else begin   
        r[13:11] = stuffcnt;
    end    
    r[10:8]  = bitcnt;
    if (SkipStuff) begin
        r[7:0] = data;
    end
    else begin
        r[7:1] = data[6:0];
        r[0]   = t;             // add new data bit
    end        
end // @(*)

Accumulates the new Data, if required else keep the same, clocked by CLK.
Increment the bit counter, if required.
Increments the stuff counter, or resets it, as required.

always  @(posedge CLK) begin
    if (WZ) begin
        if (  !SkipStuff & (bitcnt == 3'b000)) begin
            zz = 1'b1;           // byte ready
        end
        else begin
            zz = 1'b0;           // byte not ready
        end    
    end

    if (WC) begin          
        cy = k ^ j;              // c = SE0/SE1
    end           
end // (posedge CLK)   
endmodule  // RxUSB

Sets the Z & C flags, clocked by CLK.

jmg · 2014-03-11 21:01

Always decodes this, so it is a logic block, not a clocked register.
Always decodes this on any change in the inputs kI, jI, kP, or stuffcnt. Again, a logic block.

Yes, these are to make later code easier to read. The compiler/fit will likely optimize some of these names away.
To keep them, they can be moved to the module header, where they become pins.

Performed once at the start of each instruction (loads initial values), clocked by CLK.
Completes the remaining initialisation, clocked by CLK.

Only sort of. That's where it gets tricky - the best code is stand alone, that has one register and some muxes on that.
That makes eqn-scan, and general testing easier.

If you want to code this like it is a Read/Write path on a register, then you do not have the register as well and so it does not reduce down to useful equations.

Calculates the new CRC, if required else keep the same, clocked by CLK. (needs a tidy up)

See the amended code, <= is better than =, strangely = is almost right, and seems ok in very simple cases, and gives no errors.

Accumulates the new Data, if required else keep the same, clocked by CLK.
Increment the bit counter, if required.
Increments the stuff counter, or resets it, as required.

Again not quite: the counters are further up, and the data should really be registered.

Sets the Z & C flags, clocked by CLK.

Yes.

Rather than trying the double gymnastics of [start of each instruction] and [end of each instruction], I think it is best in the early stages to KISS, and focus on simplest most readable verilog, that is then used as a template for software.
I treat each CLK as a data sample point on the USB waveform.
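
As a rough software template for that one-sample-per-bit view, here is a sketch of NRZI decoding with bit unstuffing. It assumes the USB convention that a level change encodes a 0 and no change encodes a 1 (note this is the opposite sense to the toggle comments in the Verilog), that a stuffed 0 follows six consecutive 1s, and that bytes arrive LSB first. Names and the idle level are illustrative only.

```python
def nrzi_unstuff(levels, idle_level=1):
    """Turn one line sample per bit time into data bits, dropping stuffed bits."""
    bits, ones, prev = [], 0, idle_level
    for level in levels:
        bit = 1 if level == prev else 0   # USB NRZI: a transition encodes 0
        prev = level
        if ones == 6:
            ones = 0                      # bit after six 1s is stuffing: discard it
            continue                      # (a 1 here would be a bit-stuff error)
        bits.append(bit)
        ones = ones + 1 if bit else 0
    return bits

def pack_lsb_first(bits):
    """Pack complete groups of 8 bits into bytes, first bit received = bit 0."""
    return bytes(sum(b << i for i, b in enumerate(bits[n:n + 8]))
                 for n in range(0, len(bits) - 7, 8))

sync = [0, 1, 0, 1, 0, 1, 0, 0]           # KJKJKJKK after idle J, with J=1 and K=0
print(pack_lsb_first(nrzi_unstuff(sync)).hex())
```

Stuffing applies to the whole bit stream, not per byte, so the byte counter only advances on bits that survive the unstuff step.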

I'm sure Chip will be able to re-warp it around registers, if the pathways allow it, or he may choose to use separate registers.
At some point the extra muxes to merge all this into the opcode tree, will bite into the MHz values.
Local routing is smaller and faster.

The only benefit of a full merge into the multiport register stack, is you can run multiple copies in multiple registers, but I don't think anyone is expecting to run TWO USBs in one COG ?! Just one USB with some spare MIPS would be fine for most.
There are 8 COGS here.

jmg · 2014-03-11 21:31

Cluso99 wrote: »

The reason I am using FS rather than LS is that I can parallel the FTDI transactions, so I can snoop the USB FS. I don't have any LS USB devices (that I know of) that I can snoop the same way.
What this means is that I can decode all the incoming frames to the FTDI Chip (connected to a P1). I can also snoop the replies.
Remember, while I have read the USB Spec summaries, and looked at code doing the protocol, I have never actually done it. However, I have written lots of sync software over many years including SDLC and BiSync, and built ASCII to EBCDIC sync converters. But this was before TCP/IP etc.

I recalled other discussions on USB & port debug, and the suggested tool was PortMon.
http://technet.microsoft.com/en-us/sysinternals/bb896644.aspx

A P1 might even be able to edge-capture to 12.5ns at 1.5MHz speeds, as an edge-based logic analyser ?
A complete frame can never need more than 1500 max stamps.

Sapieha · 2014-03-11 21:37

Hi jmg.

It compiles in Quartus for me.

Maybe from the attached schematic you can see whether all the logic is as desired.

jmg wrote: »

The problem with this, is if the Verilog needs a lot of changes( as this does), it quickly becomes too clumsy to have someone else applying fix-ups. Also in the form you code, checking is harder as it is not so self contained.

As always, it is better to code in small pieces, get 'working' equations, and look at the .eq0 & .rpt files to confirm you have counters / clock enables / MUXes as expected, and no logic blow-outs.

Below is the code, edited/modified so Lattice Verilog at least compiles it (with some warnings).

////////////////////////////////////////////////////////////////////////////////
// Acknowledgements: Verilog code for CRC's   http://www.easics.com
// RR20140310 start
// RR20140311,12 continued
////////////////////////////////////////////////////////////////////////////////
// polynomial: CRC5usb=(0 2 5), CRC16usb=(0 2 15 16), CRC16ccitt=(0 5 12 16) 
// data width: 1, LSB first
//
// inputs:  D, S, PINS
// outputs: D, Z, C
module          RxUSB
(
input   CLK,                 // 
input   Load_d,              // 
input   jI,              // 
input   kI,              // 
input   WZ,              // 
input   WC,              // 
input   [31:0]  s,              // S operand
input   [31:0]  d,              // D operand
input   [127:0] p,              // input pins
output reg   [31:0]  r,              // D result
output reg        zz,              // Z flag
output reg        cy,               // Carry flag    

output reg     SkipStuff,   // move so can see in EQNs better
output reg     InvalidPM

);

reg     [15:0]  crc;            // original CRC (accumulated)
reg     [2:0]   bitcnt;         // data bit counter 3 bits
reg             k;              // K new pin value
reg             j;              // J new pin value
reg     [2:0]   stuffcnt;       // stuff counter 3 bits
reg     [7:0]   data;           // data byte (accumulated)
reg     [8:7]   poly;           // 00=crc05usb, 01=crc16usb, 10=crc16ccitt, 11=undefined
reg     [6:0]   pinno;          // pin pair numbers 0-127
reg     [15:0]  newcrc;         // new crc
reg             t;              // 1 if k toggles (ie 1 bit)
reg             kP;              // K old pin value
reg             jP;              // J old pin value
//reg     r,z,c;              // D result


reg     crc05usb;
reg     crc16usb;
reg     crc16itt;
reg     crc16ndef;

///////////////////////////////////////////////////////////////////////////////

// 00=crc05usb, 01=crc16usb, 10=crc16ccitt, 11=undefined
    always @(poly)  begin   
        crc05usb  = (poly == 2'b00);    // CRC5usb   =(0 2 5)
        crc16usb  = (poly == 2'b01);    // CRC16usb  =(0 2      15 16)
        crc16itt  = (poly == 2'b10);    // CRC16ccitt=(0   5 12    16)
        crc16ndef = (poly == 2'b11);    // undefined - alias to one above 
    end

// check for a "1" bit toggle
    always @(kI or jI or kP or stuffcnt)  begin   
      t = kI ^ kP;                 // new pin value ^ previous pin value; 1=toggled
      SkipStuff = (!t & (stuffcnt == 3'b110));  // !t needed for ccitt ?
      InvalidPM = (kI==jI);    // Signaling states are non-diff
    end



always  @(posedge CLK) begin
  if (Load_d) begin  // WRITE to register - Value INIT
//    crc      = d[31:16];        // original crc value (accum) moved below
    kP       = d[15];           // previous K
    jP       = d[14];           // previous J
    stuffcnt = d[13:11];        // original stuff counter value
    bitcnt   = d[10:8];         // original bit   counter value
    data     = d[7:0];          // original data value (accum)

    poly     = s[8:7];          // 00=crc05usb, 01=crc16usb, 10=crc16ccitt, 11=undefined
//?   kpin     = value(s[6:0]);   // K pin no.
//?   jpin     = value(s[6:0]) ^1 // J pin no.
    k        = kI;      // new pin value
    j        = jI;      // new pin value
  end // Load_d
  else begin // !Load_d  = normal RUN , compiler wants in one block..
    k        = kI;      // new pin value
    j        = jI;      // new pin value
    kP       = k;       // previous K
    jP       = j;       // previous J

// check for bit unstuff
    if (SkipStuff) begin
        // unstuff
        stuffcnt <= 3'b000;
//      bitcnt   = bitcnt;    //  implicit, but makes hold action clear 
    end
    else if (!InvalidPM) begin
        // inc bit count & accum data bit into byte
        bitcnt++;
        if (t) 
           stuffcnt <= 3'b000; // reset if input bit toggles
        else
           stuffcnt++;          
    end  
  end // Load_d 
end // (posedge CLK)     


reg     kr0;
reg     kr2;
reg     kr5;
reg     kr12;
reg     kr15;
reg     HoldCRC;

always @(*) begin 
// calculate the new crc... - decoded values, so no overlaps in if
    if (crc05usb) begin
        kr0  = t ^ crc[4];
        kr2  = t ^ crc[4];
        kr5  = 1'b0;
        kr12 = 1'b0;
        kr15 = 1'b0; 
    end
    if (crc16usb) begin
        kr0  = t ^ crc[15];
        kr2  = t ^ crc[15];
        kr5  = 1'b0;
        kr12 = 1'b0;
        kr15 = t ^ crc[15]; 
    end
    if (crc16itt) begin
        kr0  = t ^ crc[15];
        kr2  = 1'b0;
        kr5  = t ^ crc[15];
        kr12 = t ^ crc[15];
        kr15 = 1'b0; 
    end    
    if (crc16ndef) begin  // alias crc16itt, so cover ALL decodes.
        kr0  = t ^ crc[15];
        kr2  = 1'b0;
        kr5  = t ^ crc[15];
        kr12 = t ^ crc[15];
        kr15 = 1'b0; 
    end  
    HoldCRC = InvalidPM | SkipStuff;
end // always @(*)

always  @(posedge CLK) begin
  if (Load_d) begin  // WRITE to register - Value INIT
    crc     <= d[31:16];        // original crc value (accum)
  end 
  else if (!HoldCRC) begin      // shift only when not holding (valid, unstuffed bit)
    crc[0]  <= kr0;              //16
    crc[1]  <= crc[0];           //17
    crc[2]  <= crc[1] ^ kr2;     //18
    crc[3]  <= crc[2];           //19
    crc[4]  <= crc[3];           //20
    crc[5]  <= crc[4] ^ kr5;     //21
    crc[6]  <= crc[5];           //22
    crc[7]  <= crc[6];           //23
    crc[8]  <= crc[7];           //24
    crc[9]  <= crc[8];           //25
    crc[10] <= crc[9];           //26
    crc[11] <= crc[10];          //27 - bad eqns??, needed <= 
    crc[12] <= crc[11] ^ kr12;   //28
    crc[13] <= crc[12];          //29
    crc[14] <= crc[13];          //30
    crc[15] <= crc[14] ^ kr15;   //31
  end // valid 
end // (posedge CLK)     
    
// set results
always @(*) begin 
    r[31:16] = crc;
    r[15]    = k;
    r[14]    = j;
end // always @(*)

always  @(*) begin   // non register here ? - this is a bit mangled, data needs fixing 
    if (t)   begin                // toggled bit?
        r[13:11] = 3'b000;       // reset stuff counter
    end
    else begin   
        r[13:11] = stuffcnt;
    end    
    r[10:8]  = bitcnt;
    if (SkipStuff) begin
        r[7:0] = data;
    end
    else begin
        r[7:1] = data[6:0];
        r[0]   = t;             // add new data bit
    end        
end // @(*)     

always  @(posedge CLK) begin
    if (WZ) begin
        if (  !SkipStuff & (bitcnt == 3'b000)) begin
            zz <= 1'b1;           // byte ready
        end
        else begin
            zz <= 1'b0;           // byte not ready
        end    
    end

    if (WC) begin          
        cy <= k ^ j;              // c = SE0/SE1
    end           
end // (posedge CLK)   


endmodule

Updated code, better CRC eqns

Bill Henning · 2014-03-11 21:37

P1: 10ns @ 100MHz :-)

Btw,

http://www.seeedstudio.com/depot/Open-Workbench-Logic-Sniffer-p-612.html?cPath=63

is a great little tool!

I have a Hantek 500Msps logic analyzer, but I tend to use the Logic Sniffer a lot more.

jmg · 2014-03-11 22:11

Bill Henning wrote: »

P1: 10ns @ 100MHz :-)

Btw,

http://www.seeedstudio.com/depot/Open-Workbench-Logic-Sniffer-p-612.html?cPath=63

is a great little tool!

Looks nice, says
16 channels with 8K sample depth
8 channels with 16K sample depth

which is just a little light for 1 USB frame.

My personal preference is Logic Analysers that capture & store timestamps, as they have much better dynamic range.
A P1 might make 1.5MHz that way ?
> 3MHz (say 4MHz) would allow capture of the USB edges and the mid-point sample-tags, but that may be asking too much.
I guess multiple COGS could give more, and a Logic capture unit does not care if it uses 7 COGS for captures.
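
The sample-depth point can be put in rough numbers. This is only an illustration: it assumes a 1 ms frame, a worst case of one edge per bit time for timestamp capture, and a hypothetical 4x oversampling ratio for a fixed-rate analyser.

```python
FRAME_US = 1000  # one low/full speed USB frame is 1 ms

def max_edges_per_frame(bit_rate_hz):
    """Timestamp capture: worst case is the line toggling every bit time."""
    return bit_rate_hz * FRAME_US // 1_000_000

def samples_per_frame(bit_rate_hz, oversample=4):
    """Fixed-rate capture stores every sample, whether or not the line changed."""
    return bit_rate_hz * oversample * FRAME_US // 1_000_000

for rate in (1_500_000, 12_000_000):
    print(rate, max_edges_per_frame(rate), samples_per_frame(rate))
```

At 1.5MHz that is at most 1500 stamps per frame, while a 4x fixed-rate capture of a full speed frame would need 48000 samples, well beyond a 16K buffer.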

Sapieha · 2014-03-11 22:25

Hi Cluso.

I have a question regarding the USB packet:

1. Does every BYTE have a header with bit stuffing, or is that only at the start of the packet?

2. Does every byte have a Start-Stop condition, or only the entire packet?

Sorry for these questions, but I can't find that on the Internet.

Bob Lawrence (VE1RLL) · 2014-03-11 22:31

Re: Seed Studio:

I tend to use the Logic Sniffer a lot more.

I use their Logic Sniffer (Ver 4). The Bus Pirate is cool as well.

Cluso99 · 2014-03-11 22:40

I think this is getting close now. Thanks again jmg.

////////////////////////////////////////////////////////////////////////////////
// RR20140310-12 P2 RxUSB instruction
////////////////////////////////////////////////////////////////////////////////
/*---------------------------------------------------------------------------------------------------------------------
              RxUSB   D, S/#          WZ,WC             ' Receive single NRZI bit pair, accum CRC and byte, unstuff bits
where
  S/# is the PinPair# and Poly bits
    S[31..9]  = unused
    S[8..7]   = 00= CRC5  USB    (0 2 5)  
                01= CRC16 USB    (0 2 15 16)
                10= CRC16 CCITT  (0 5 12 16)
                11= undefined
    S[6..0]   = D-/D+ Pin Pair #0..127
                The pin pair is always a pair of pins mod 2. ie nnnnnnx where x=0 and x=1 for the pair.
                If the pin pair is even (S[0]=0) then J is the lowest pin and K is the higher pin of the consecutive pair
                If the pin pair is odd  (S[0]=1) then K is the lowest pin and J is the higher pin of the consecutive pair.
                This arrangement allows for simple LS and FS by making the pin pair even or odd.                              
  D is the cog register storing a 32 bit field...
    D[31..16] = crc16
    D[15]     = K new pin value
    D[14]     = J new pin value
    D[13..11] = unstuff counter 3 bits
    D[10..8]  = bit counter 3 bits
    D[7..0]   = data byte accumulation
  Z = data byte ready (8 bits)
  C = SE0/SE1
It would be acceptable for D to be at a fixed location eg $1F0.
---------------------------------------------------------------------------------------------------------------------*/
// inputs:  D, S, PINS
// outputs: D, Z, C
////////////////////////////////////////////////////////////////////////////////
module          RxUSB
(
input           CLK,
input           Load_d,
input           jI,             // new J value
input           kI,             // new K value
input   [31:0]  s,              // S operand
input   [31:0]  d,              // D operand
input           wz,             // WZ operand
input           wc,             // WC operand
input   [127:0] p,              // input pins
output  [31:0]  r,              // D result
output          zz,             // Z flag
output          cy              // C flag    
);
reg     [15:0]  crc;            // original CRC (accumulated)
reg     [2:0]   bitcnt;         // data bit counter 3 bits
reg             k;              // K new pin value
reg             j;              // J new pin value
reg     [2:0]   stuffcnt;       // stuff counter 3 bits
reg     [7:0]   data;           // data byte (accumulated)
reg     [1:0]   poly;           // crc05usb/crc16usb/crc16ccitt/undef polynomial selection
reg     [6:0]   pinno;          // pin pair numbers 0-127
reg             kP;             // K previous pin value
reg             jP;             // J previous pin value
// flags/conditions...
reg             crc05usb;       // 00= CRC5  USB    
reg             crc16usb;       // 01= CRC16 USB   
reg             crc16itt;       // 10= CRC16 CCITT 
reg             crc16ndef;      // 11= undefined   
reg             toggle;         // data bit 0 or 1
reg             BitStuff;       // unstuff this bit
reg             SE0_SE1;        // SE0/SE1 condition
///////////////////////////////////////////////////////////////////////////////
// set crc option
    always @(poly)  begin   
        crc05usb  = (poly == 2'b00);                    // CRC5usb   =(0 2 5)
        crc16usb  = (poly == 2'b01);                    // CRC16usb  =(0 2      15 16)
        crc16itt  = (poly == 2'b10);                    // CRC16ccitt=(0   5 12    16)
        crc16ndef = (poly == 2'b11);                    // undefined
    end
// check for a "1" bit toggle, and SE0/SE1 conditions, and BitStuff condition
    always @(kI or jI or kP or stuffcnt or crc05usb or crc16usb)  begin   
        toggle    = kI ^ kP;                            // data bit (toggle) = new pin value ^ previous pin value
        SE0_SE1   = (kI == jI);                         // detect SE0/SE1 (j==k)
        BitStuff  = (!toggle & (stuffcnt == 3'b110) & (crc05usb | crc16usb));  // unstuff this bit
    end    
///////////////////////////////////////////////////////////////////////////////
// Set Initial conditions
    always @(posedge CLK) begin
        if (Load_d) begin                               // write initial values to registers
            kP       = d[15];                           // previous K
            jP       = d[14];                           // previous J
            stuffcnt = d[13:11];                        // original stuff counter value
            bitcnt   = d[10:8];                         // original bit   counter value
            data     = d[7:0];                          // original data value (accum)
            poly     = s[8:7];                          // 00=crc05usb, 01=crc16usb, 10=crc16ccitt, 11=undefined
// ???      kpin     = value(s[6:0]);                   // K pin no.
// ???      jpin     = value(s[6:0]) ^ 1;               // J pin no.
// ???      k        = pins[kpin];                      // new pin value
// ???      j        = pins[jpin];                      // new pin value
            k        = kI;                              // new pin value
            j        = jI;                              // new pin value
        end
        else begin                                      // !Load_d = normal RUN (compiler wants in one block)
// ??? is this correct way around etc ???
            k        = kI;                              // new pin value
            j        = jI;                              // new pin value
            kP       = kI;                              // previous pin value
            jP       = jI;                              // previous pin value
// check for bit unstuff
            if (BitStuff) begin
                // unstuff...
                stuffcnt = 3'b000;                      // reset unstuff counter
//              bitcnt   = bitcnt;                      // implicit but makes hold action clear
            end
            else if (!SE0_SE1) begin
                // valid data bit
                bitcnt   = bitcnt + 1'b1;               // count this data bit (++ is not plain Verilog)
                stuffcnt = stuffcnt + 1'b1;             // will be reset at result if input bit toggles
            end
        end
    end                                                                          
///////////////////////////////////////////////////////////////////////////////
// CRC routine
reg             kr0;
reg             kr2;
reg             kr5;
reg             kr12;
reg             kr15;

// calculate the new crc... (decoded values so no overlaps in if)
    always @(*) begin
        if (crc05usb) begin
            kr0  = toggle ^ crc[4];
            kr2  = toggle ^ crc[4];
            kr5  = 1'b0;
            kr12 = 1'b0;
            kr15 = 1'b0;
        end
        if (crc16usb) begin
            kr0  = toggle ^ crc[15];
            kr2  = toggle ^ crc[15];
            kr5  = 1'b0;
            kr12 = 1'b0;
            kr15 = toggle ^ crc[15]; 
        end
        if (crc16itt) begin
            kr0  = toggle ^ crc[15];
            kr2  = 1'b0;
            kr5  = toggle ^ crc[15];
            kr12 = toggle ^ crc[15];
            kr15 = 1'b0; 
        end
        if (crc16ndef) begin
            kr0  = 1'b0;
            kr2  = 1'b0;
            kr5  = 1'b0;
            kr12 = 1'b0;
            kr15 = 1'b0; 
        end
    end        
    always @(posedge CLK) begin
        if (Load_d) begin                               // write to reg initial value
            crc <= d[31:16];                            // original crc value (accum)
        end
        else if (!SE0_SE1 & !BitStuff) begin            // non-blocking (<=) so each stage takes its neighbour's OLD value
            crc[0]  <= kr0;
            crc[1]  <= crc[0];
            crc[2]  <= crc[1] ^ kr2;
            crc[3]  <= crc[2];
            crc[4]  <= crc[3];
            crc[5]  <= crc[4] ^ kr5;
            crc[6]  <= crc[5];
            crc[7]  <= crc[6];
            crc[8]  <= crc[7];
            crc[9]  <= crc[8];
            crc[10] <= crc[9];
            crc[11] <= crc[10];
            crc[12] <= crc[11] ^ kr12;
            crc[13] <= crc[12];
            crc[14] <= crc[13];
            crc[15] <= crc[14] ^ kr15;
        end
    end    
        
///////////////////////////////////////////////////////////////////////////////
    
// set D results
    always @(*)  begin                                  // ??? or @(posedge CLK)
        r[31:16] = crc;
        r[15]    = k;
        r[14]    = j;
        r[13:11] = stuffcnt;
        r[10:8]  = bitcnt;
        if (BitStuff) begin
            r[7:0]   = data;                            // unstuff so no change
        end
        else begin
            r[6:0] = data[7:1];                         // LSB first - shift and...
            r[7]   = toggle;                            // ...add new data bit
        end       
    end    
    
// set Z and C flags
    always @(*)  begin                                  // ??? or @(posedge CLK)
        if (wz) begin
            if (!BitStuff & (bitcnt == 3'b000)) begin
                zz = 1'b1;                              // byte ready
            end
            else begin    
                zz = 1'b0;                              // byte not ready
            end
        end
        if (wc) begin          
            cy = SE0_SE1;                               // c = SE0/SE1
        end           
    end
endmodule
///////////////////////////////////////////////////////////////////////////////
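
The tap positions used in the module above can be cross-checked in software. The Python sketch below is only a model under stated assumptions, not part of the posted design: `crc_step_taps` is a hypothetical helper that mirrors the kr0/kr2/kr5/kr12/kr15 feedback terms, and `crc_step_poly` is the conventional shift-and-XOR form using the masks $05, $8005 and $1021. Both assume the register shifts towards the high bit and that the feedback is the input bit XOR the top CRC bit.

```python
# Software model of ONE CRC shift step (assumed conventions, see text above).
# mode: 0 = CRC5 USB (0 2 5), 1 = CRC16 USB (0 2 15 16), 2 = CRC16 CCITT (0 5 12 16)
WIDTH = {0: 5, 1: 16, 2: 16}
TAPS = {0: (0, 2), 1: (0, 2, 15), 2: (0, 5, 12)}   # register bits that receive the feedback
POLY = {0: 0x05, 1: 0x8005, 2: 0x1021}             # same taps written as a bit mask


def crc_step_taps(crc, bit, mode):
    """Bit-by-bit form, written like the crc[n] = crc[n-1] ^ krN chain."""
    width = WIDTH[mode]
    fb = bit ^ ((crc >> (width - 1)) & 1)          # feedback = input ^ top CRC bit
    new = 0
    for i in range(width):
        prev = ((crc >> (i - 1)) & 1) if i > 0 else 0   # OLD value of the lower neighbour
        if i in TAPS[mode]:
            prev ^= fb
        new |= prev << i
    return new


def crc_step_poly(crc, bit, mode):
    """Conventional form: shift left, XOR the polynomial mask when feedback is 1."""
    width = WIDTH[mode]
    mask = (1 << width) - 1
    fb = bit ^ ((crc >> (width - 1)) & 1)
    crc = (crc << 1) & mask
    return crc ^ POLY[mode] if fb else crc
```

Note that the bit-by-bit form always reads the old value of the neighbouring bit, which is the behaviour a hardware shift register needs.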

Cluso99 · 2014-03-11 22:44

jmg wrote: »

Yes, these are to make later code easier to read. The compiler/fit will likely optimize some of these names away.
To keep them, they can be moved to the module header, where they become pins.

Only sort of. That's where it gets tricky - the best code is stand alone, that has one register and some muxes on that.
That makes eqn-scan, and general testing easier.

If you want to code this like it is a Read/Write path on a register, then you do not have the register as well and so it does not reduce down to useful equations.

See the amended code: <= is better than =. Strangely, = is almost right; it seems OK in very simple cases and gives no errors.

Again not quite, the counters are further up, and the data should really be registered

Yes.

Rather than trying the double gymnastics of [start of each instruction] and [end of each instruction], I think it is best in the early stages to KISS, and focus on simplest most readable verilog, that is then used as a template for software.
I treat each CLK as a data sample point on the USB waveform.

I'm sure Chip will be able to re-warp it around registers, if the pathways allow it, or he may choose to use separate registers.
At some point the extra muxes to merge all this into the opcode tree, will bite into the MHz values.
Local routing is smaller and faster.

The only benefit of a full merge into the multiport register stack, is you can run multiple copies in multiple registers, but I don't think anyone is expecting to run TWO USBs in one COG ?! Just one USB with some spare MIPS would be fine for most.
There are 8 COGS here.

Thanks. I need to relook at the changes you made.

Yes, I am sure Chip will know the best way to do it.

No, I am not expecting to do multiple USBs in a single cog. Urgh.

Cluso99 · 2014-03-11 22:51

Sapieha wrote: »

Hi Cluso.

I have a question regarding the USB packet --->

1. Does every byte have its own bit stuffing, or is it only at the start of the packet?

2. Does every byte have a start/stop condition, or only the entire packet?

Sorry for these questions, but I can't find this on the Internet.

1. Any group of bits can contain a stuffed bit. After six consecutive bit times without a transition, a bit change is inserted. So it applies right from the start. However, because the header data is special, I don't think it can occur at the beginning. But I will take care of it anyway because that's the easiest to do.

2. No, it's NRZI synchronous. No start or stop bits ever. There are sync bits at the start, and the SE0 (both J & K low) at the end.
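
The two answers above can be sketched in software. This is a minimal Python model, assuming the USB NRZI convention (a 0 is sent as a level change, a 1 as no change) and a stuffed 0 after every six consecutive 1s. The function names are hypothetical and the code is not the logic of the posted module.

```python
def stuff_and_encode(bits, level=1):
    """Bit-stuff a list of data bits, then NRZI-encode it into line levels."""
    out, ones = [], 0
    for b in bits:
        if b:
            ones += 1              # a 1 keeps the line level
        else:
            level ^= 1             # a 0 toggles the line level
            ones = 0
        out.append(level)
        if ones == 6:              # six 1s in a row: insert a stuffed 0 (forced transition)
            level ^= 1
            out.append(level)
            ones = 0
    return out


def decode_and_unstuff(levels, prev=1):
    """NRZI-decode line levels and drop the stuffed bits."""
    bits, ones = [], 0
    for lv in levels:
        bit = 1 if lv == prev else 0
        prev = lv
        if ones == 6:              # this position must hold a stuffed 0
            if bit:
                raise ValueError("bit-stuff error")
            ones = 0
            continue               # discard it, it is not data
        bits.append(bit)
        ones = ones + 1 if bit else 0
    return bits


data = [1] * 8 + [0, 1, 0]
line = stuff_and_encode(data)
print(len(data), len(line), decode_and_unstuff(line) == data)
```

The encoded stream is one bit time longer than the data because a single stuffed bit was inserted, and no start or stop bits appear anywhere.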

BTW, thanks for the logic, but I am so far removed from that now that it's not much help to me at present.

jmg · 2014-03-11 22:54

Cluso99 wrote: »

I think this is getting close now. Thanks again jmg.

I see you combine CRC into BitStuff - Here, it may pay to expand that slightly?

See http://en.wikipedia.org/wiki/Cyclic_redundancy_check

crc05usb -> Does this change BitStuff ?
crc16usb -> Use USB bit-stuff rules
crc16itt -> Use SDLC bitstuff rules
crc16ndef -> disable BitStuff, for more general CRC use ? - Pick one ?
I think you can also attach the CRC to a USB sending Pins (includes stuff, which HW removes), and (quickly) grab the CRC for use in TX append ?

Cluso99 · 2014-03-11 23:00

I don't have a logic analyser. I am just going to try this in real time to snoop what is happening on a real FS USB (FTDI to P1). I will just treat it as though I am receiving the data, and send debug info out of the P2's serial port. I can check it out quite simply as I am used to this type of thing.

A DE2 could do this in two cogs and the Propplug.

Earlier it was asked about syncing to SE0 and waiting. It is a simple matter while waiting for the next valid frame to start, to look for the SE0 or SE1 condition. Two successive pin reads will validate an SE0 condition. Remember, the USB line is not oscillating (else the whole thing is U/S), so the unfortunate read during a transition will be resolved by two consecutive reads. The frame resync mechanism is not hard and I am doing that now (well 3+ months ago).
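
The two-consecutive-reads idea can be sketched as follows. This is a hedged Python illustration only: `first_valid_se0` is a hypothetical name, and the samples are assumed to be (D+, D-) pin reads taken one sample period apart.

```python
def first_valid_se0(samples):
    """Return the index of the second of two consecutive SE0 reads, or None.

    samples: iterable of (dp, dm) pin values; SE0 means both pins read low.
    A single SE0 read may just be a sample taken during a transition,
    so it is only accepted when the next read is SE0 as well.
    """
    prev_se0 = False
    for i, (dp, dm) in enumerate(samples):
        se0 = (dp == 0 and dm == 0)
        if se0 and prev_se0:
            return i
        prev_se0 = se0
    return None


reads = [(1, 0), (0, 1), (0, 0), (1, 0), (0, 0), (0, 0), (1, 0)]
print(first_valid_se0(reads))
```

In this example the lone SE0 at index 2 is ignored and the SE0 is accepted at index 5.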

Cluso99 · 2014-03-11 23:10

jmg wrote: »

I see you combine CRC into BitStuff - Here, it may pay to expand that slightly?

See http://en.wikipedia.org/wiki/Cyclic_redundancy_check

crc05usb -> Does this change BitStuff ?
crc16usb -> Use USB bit-stuff rules
crc16itt -> Use SDLC bitstuff rules
crc16ndef -> disable BitStuff, for more general CRC use ? - Pick one ?
I think you can also attach the CRC to a USB sending Pins (includes stuff, which HW removes), and (quickly) grab the CRC for use in TX append ?

If memory serves me correctly I don't think the CRC5 frames can even generate a bit stuff because of the bit formatting.

Just reread the CRC algorithms on the wiki. It's as I thought: by just passing the received CRC through the CRC generator, the final CRC after this will be a fixed value ($8005 IIRC). This is easy. It's so long since I calculated CRC16s on IBM sync comms using micros.

The value depends on start ($0000 or $FFFF) and endian (LSB or MSB first). Once it is working I can check the endian issue.

You may have noted that the last post also fixed the endianness of the data byte; I had it the wrong way around.

And yes, I am sure I can grab the CRC calculated from this during the last data bit for sending out the CRC.

Sapieha · 2014-03-11 23:47

Hi Cluso.

This page has a link to a PDF.

http://forums.parallax.com/showthread.php/125543-Propeller-II-update-BLOG?p=1250045&viewfull=1#post1250045

It shows the CRC5 as taking 11 bits in, which says to me that it is done after all the bits following the PID have been received,

but the CRC16 is calculated bitwise.

Cluso99 wrote: »

If memory serves me correctly I don't think the CRC5 frames can even generate a bit stuff because of the bit formatting.

Just reread the CRC algorithms on the wiki. It's as I thought: by just passing the received CRC through the CRC generator, the final CRC after this will be a fixed value ($8005 IIRC). This is easy. It's so long since I calculated CRC16s on IBM sync comms using micros.

The value depends on start ($0000 or $FFFF) and endian (LSB or MSB first). Once it is working I can check the endian issue.

You may have noted that the last post also fixed the endianness of the data byte; I had it the wrong way around.

And yes, I am sure I can grab the CRC calculated from this during the last data bit for sending out the CRC.

rogloh · 2014-03-12 03:51

Here is some brief CRC info from the USB guys... looks like they invert the CRC before transmitting it at the end, and on reception there will be a known constant residual after doing the CRC on the entire packet including its CRC (it will be non zero because of this). Something to bear in mind.

http://www.usb.org/developers/whitepapers/crcdes.pdf
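
The constant residual can be checked with a small software model. This Python sketch rests on assumptions: the register is preset to all ones, bits are shifted in wire order through an MSB-out register, and the inverted CRC is appended top register bit first. The function names are hypothetical. Under these conventions the model leaves $800D for CRC16 and %01100 ($0C) for CRC5 ($8005 is the generator polynomial itself). A bit-reversed register would show the mirrored values.

```python
def crc_bits(bits, width, poly, init):
    """Shift bits (in wire order) through an MSB-out CRC register."""
    mask = (1 << width) - 1
    top = 1 << (width - 1)
    crc = init
    for b in bits:
        fb = b ^ (1 if crc & top else 0)   # feedback = input ^ top CRC bit
        crc = (crc << 1) & mask
        if fb:
            crc ^= poly
    return crc


def append_crc(bits, width, poly):
    """Return bits followed by the inverted CRC, top register bit first."""
    ones = (1 << width) - 1
    crc = crc_bits(bits, width, poly, ones) ^ ones
    return list(bits) + [(crc >> i) & 1 for i in range(width - 1, -1, -1)]


def residual(bits, width, poly):
    """Register contents after running data plus CRC through the generator."""
    return crc_bits(bits, width, poly, (1 << width) - 1)


payload = [1, 0, 1, 1, 0, 0, 1, 0] * 3          # arbitrary test bits
print(hex(residual(append_crc(payload, 16, 0x8005), 16, 0x8005)))
print(hex(residual(append_crc(payload[:11], 5, 0x05), 5, 0x05)))
```

The residual does not depend on the payload, so a receiver only needs to compare the register against one constant at EOP.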

EDIT: Another document which describes SE0 detection problems... sounds a bit scary if there is asynchronous SE0 generation and bit dribble going on (see Pages 7-8)...! http://www.usb.org/developers/whitepapers/siewp.pdf

Bill Henning · 2014-03-12 06:49

Actually, the buffers are a bit bigger than that - from the web page:

216K Block RAM supports following memory configurations*

8 channels with 24K sample depth
16 channels with 12K sample depth
32 channels with 6K sample depth

What I like about them is how inexpensive they are, after I got the first one, I picked up two more so I could test more gear at the same time.

My 500Msps unit only has a 4K buffer, which is a real pain. It is supposed to have a compressed mode, but with the firmware I have installed, that does not work. Reminds me to update its firmware...

What I really want is a 1Gsps or 2Gsps unit with a large buffer...

Hanno's ViewPort will sample to clkfreq using four cogs, and has an approximately 1500 sample buffer. I used it to debug Morpheus a few years ago.

jmg wrote: »

Looks nice, says
16 channels with 8K sample depth
8 channels with 16K sample depth

which is just a little light for 1 USB frame.

My personal preference is Logic Analysers that capture & store timestamps, as they have much better dynamic range.
A P1 might make 1.5MHz that way ?
> 3MHz (say 4MHz) would allow capture of the USB edges and the mid-point sample-tags, but that may be asking too much.
I guess multiple COGS could give more, and a Logic capture unit does not care if it uses 7 COGS for captures.

cgracey · 2014-03-12 08:38

Cluso99,

I'm about ready to release the new FPGA image. I just need to finish the docs.

Do you still want me to make a USB pin instruction for this release, or are things too up in the air now?

jmg · 2014-03-12 11:57

cgracey wrote: »

I'm about ready to release the new FPGA image. I just need to finish the docs.

Do you still want me to make a USB pin instruction for this release, or are things too up in the air now?

I would say here, that any code that defines and selects the pin-pair (with reverse feature), and does SE0 and Toggle decode will still be common to any solution. (ie not be wasted at all)

It would also allow more testing in a FPGA, as the present USB code is not quite enough entirely in SW.

That said, a release now would be used by everyone, and if all that is added is USB_SET on a .1 release, only a few would need to download the .1, so to most it would not be a dual release.

jmg · 2014-03-12 13:13

Pasted from other thread, as it is USB detail

cgracey wrote: »

The counters can count the frequency of edges and the durations of states, but they don't have a reload mode like you are asking about.

A special circuit can be made for the USB handler, though. In many instances, it's not the guts of a circuit that take lots of space, but all the conduit to make it breathe. If we encapsulated it, it might be the best way to go.

Here is some Verilog for a Sync'able Baud counter, that should work from

/4 ie 48MHz CLK on 12M USB
to
> /133 ie > 200MHz CLK on 1.5M USB

//           0   1   2   3   0   1   2   3   0   1   2   3   0   1   2   3   0   1   2   3   0   1   2   3
// CLK  ==\_/=\_/=\_/=\_/=\_/=\_/=\_/=\_/=\_/=\_/=\_/=\_/=\_/=\_/=\_/=\_/=\_/=\_/=\_/=\_/=\_/=\_/=\_/=\_/=\_/=\_/=
// Di  ==============\\\____________///==============================================\\\_____________///============
// Di'  ================\______________/================================================\_______________/============
// Di'' ====================\_______s______/========s===============s===============s=======\_______s_______/============
// ED    _______________/===\__________/===\____________________________________________/===\___________/===\________
// RST                      ^->0           ^->0                                             ^->0           ^->0 
// TSW  xxxxxxxxxxxxxxxxxxxx____/====\__________/====\__________/====\__________/====\__________/====\__________
//                                  ^-               ^-              ^-              ^-              ^- get s
//  RL    Comp       CTR    TSW
// 0011   SHR  001    M4    1-2
// 0111   SHR  011    M8    3-4
// 1000   SHR  100    M9    4-5
// 1001   SHR  100    M10   4-5
//                            ^-- Grabs Di'' on this edge 
//  ED is Edge Detect,  Di' XOR Di'',  and TSW is Sample Enable Window for Di'', one clk wide.
// TSW as CE samples just before falling edge  


reg [7:0] RL_Ctr;    // RL is 8b reload field, ED(i), TSW(o) are one bit
  always @(*) begin  // combin codes == is 16 wide, >= is many more.
    RL_FS = (RL_Ctr == RL);              // common compare, flyback to 00, change constraints to keep this.
    TSW   = (RL_Ctr == {1'b0,RL[7:1]});  // Divide by 2 compare/slice 1 clk wide , keeps clear of flyback chatter.
    RL_FZ = RL_FS | ED | WrRL;  // force to zero on Either FullScale (free run) OR USB Edge Detect 
    // Optional WriteRL signal, can reset on Baud change, to allow timed SW start, and safe lowering of Value.
  end  
 always  @(posedge CLK) 
 begin
     if (RL_FZ) begin 
        RL_Ctr <= 8'b00000000;  // Sync Flyback on TC (8 bits, to match the RL_Ctr width)
     end
     else begin 
        RL_Ctr <= RL_Ctr+1;    // Up counter 
     end
 end

 // * changed to Up counter, due to rounding nature of SHR
 //   Adds a compare, but counters are simpler, Sync CLR or INC. 
 // * Added Force Zero term, to give optimizer less options & shrink counters further.
 // * Added Optional WrRL, to force reset on Update of BaudValue(RL)
 //   Allows Sw control of timing, and safe decrease in BaudV

TSW is the Sample window, which can then enable the Byte-level WAITUSB style code block discussed above.

ie this snippet allows BYTE level rather than BIT level handling, and re-syncs sample point on USB data, to allow longer stream tolerance.

On /4 the phase of TSW matters, but I think above is right, for samples taken from D'' (2nd sampler FF)

This supports odd divides too, for more clock flexibility. Takes an 8 bit RL value to set Baud speed.
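
The behaviour of the counter can be tried in software before committing to an FPGA build. The Python model below is a simplified sketch, not a cycle-accurate copy of the Verilog: it ignores the two synchroniser flip-flops and the WrRL term, and treats a change between successive samples as the edge detect. `sample_bits` is a hypothetical name.

```python
def sample_bits(line, rl):
    """Model of the sync'able baud counter.

    line: list of line levels, one entry per CLK.
    rl:   reload value, so the divide ratio is rl + 1 (rl = 3 means /4).
    Returns the levels captured when the counter equals rl >> 1 (the TSW window).
    """
    ctr = 0
    prev = line[0]
    samples = []
    for level in line:
        edge = level != prev               # ED: edge detect on the incoming data
        if ctr == (rl >> 1):               # TSW: one-clock sample window
            samples.append(level)
        if edge or ctr == rl:              # RL_FZ: flyback on full scale OR on an edge
            ctr = 0
        else:
            ctr += 1
        prev = level
    return samples


bits = [1, 0, 0, 1, 1, 1, 0]
line = [b for b in bits for _ in range(4)]   # 4 clocks per bit, i.e. /4
print(sample_bits(line, 3))
```

Each edge restarts the count, so the sample point is re-centred on every transition and free-runs in between.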

Cluso99 · 2014-03-12 16:19

cgracey wrote: »

Cluso99,

I'm about ready to release the new FPGA image. I just need to finish the docs.

Do you still want me to make a USB pin instruction for this release, or are things too up in the air now?

Chip,
If it is easy to convert this Verilog then it would be nice to have this to be able to test it. I am not sure it is totally correct, but it is a place to start.
BTW Where I use xxx = 3'b000 or similar, jmg has suggested it be xxx <= 3'b000 (ie replace = with <=). I have not had time to check what this means.

My code is in post #107.
Thanks heaps.

Postedit: Should have said, if it is quicker/easier for you to put out a release without it, and follow up with a release with the above shortly after, that is fine by me.

BTW IMHO I think a number of us would appreciate the fpga code + pnut before you complete the docs. We have a lot of things to change since the last release even without the docs.

Cluso99 · 2014-03-12 16:29

rogloh wrote: »

Here is some brief CRC info from the USB guys... looks like they invert the CRC before transmitting it at the end, and on reception there will be a known constant residual after doing the CRC on the entire packet including its CRC (it will be non zero because of this). Something to bear in mind.

http://www.usb.org/developers/whitepapers/crcdes.pdf

EDIT: Another document which describes SE0 detection problems... sounds a bit scary if there is asynchronous SE0 generation and bit dribble going on (see Pages 7-8)...! http://www.usb.org/developers/whitepapers/siewp.pdf

I haven't looked yet. There are various ways the CRCs can be calculated, and hence the reverse/invert/start values can all be different. The ultimate result is the same though. FWIW I am hoping that I have it the correct way around. I realised yesterday I had the data byte being assembled in reverse (MSB first instead of LSB first), but I fixed that. The CRC16 I have used requires that the CRC be preset to $FFFF and will result at the end with, IIRC, $8005. I can check this out when Chip implements the Verilog. If it's wrong, then temporarily I can correct it in sw and modify the Verilog for the next release.

I am happy to be able to detect SE0 and also to resync for new frames. I have seen dribble detection but I don't have it covered yet.

Thanks for the links. There is a much older P2 USB thread that I started where I listed some of the docs I use. I am quite happy I have the bit stream covered, but I don't have a good understanding of the upper sw protocol levels. But I do have some info to guide me.

jmg · 2014-03-12 16:31

Cluso99 wrote: »

Chip,
If it is easy to convert this Verilog then it would be nice to have this to be able to test it. I am not sure it is totally correct, but it is a place to start.
My code is in post #107.
Thanks heaps.

I think Chip meant the earlier, simpler code to allocate Pins and manage SE0 and T into the flags ?

The code in #107 is not quite 'mission-ready', and Pin mapping and the couple of FF's & XORs to do SE0_SE1 and T should be common to any extended code.

BTW Where I use xxx = 3'b000 or similar, jmg has suggested it be xxx <= 3'b000 (ie replace = with <=). I have not had time to check what this means.

<= assign is verilog that ensures you do get a clocked result. ( ie usually a D-FF )
= within a clocked block seems to sometimes give a clocked result, but not always. Best to be careful.
( another reason I suggested you run something like Lattice ISPlever)
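
The difference matters most in a shift register written as a chain of assignments. The following Python sketch models only the ordering semantics inside one clocked block: blocking assignments take effect immediately in source order, while non-blocking assignments sample all right-hand sides first. The function names are hypothetical.

```python
def shift_blocking(reg, new_bit):
    """Model of '=': each assignment takes effect immediately, in source order."""
    reg = list(reg)
    reg[0] = new_bit
    for i in range(1, len(reg)):
        reg[i] = reg[i - 1]        # reads the value that was JUST written
    return reg


def shift_nonblocking(reg, new_bit):
    """Model of '<=': all right-hand sides are sampled, then all writes happen."""
    old = list(reg)                # snapshot taken at the clock edge
    return [new_bit] + old[:-1]


print(shift_blocking([0, 0, 0, 0], 1))
print(shift_nonblocking([0, 0, 0, 0], 1))
```

With the blocking model the new bit ripples through every stage in one clock, while the non-blocking model shifts by exactly one position, which is what a CRC or data shift register needs.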
