I'm not following, byte handling allows much lower clocks, even in one task.
I think lowish (FPGA region) clocks and threads should be a practical goal.
If there is any possibility of doing it in 1 cog, then threads or interleaved code will be required. If we have a couple of free instructions per bit, then that permits getting things ready for the reply, as well as decoding as it comes in.
Ideally, but TX is less 'drop dead' as it can take some time to assemble/organize things I think.
True of course.
Of course, I think coding a "verilog clone" in SW for 1.5 MHz USB testing should be possible.
If that also allows timer-paced sampling, it is a small step to use counters and a per-byte jump.
The shift to timer-paced operation uses almost identical Verilog, and a data buffer for read is small.
It may also avoid this somewhat complex opcode, pushing down fMAX if it works on register-space.
(timer paced code decouples things a little from register critical paths)
The reason I am using FS rather than LS is that I can parallel the FTDI transactions, so I can snoop the USB FS. I don't have any LS USB devices (that I know of) that I can snoop the same way.
What this means is that I can decode all the incoming frames to the FTDI Chip (connected to a P1). I can also snoop the replies.
Remember, while I have read the USB Spec summaries, and looked at code doing the protocol, I have never actually done it. However, I have written lots of sync software over many years including SDLC and BiSync, and built ASCII to EBCDIC sync converters. But this was before TCP/IP etc.
I think maybe the CRC does not need a buffered read, as it is checked on EOP ?
As long as you save after each byte, and you keep at least 3 levels, you will have the CRC available. I am unsure if the CRC can be used to verify the CRC (if you know what I mean). I will need to do some simple testing of this.
BTW I did write a simple P1 program to calculate CRC5 & CRC16 for USB. I just have not got around to looking at it.
If there are spare virtual Pins, the USB RxRDY flags could hook into some of those ?
Not sure what you mean here?
Chip would likely need to modify the counters slightly to allow /N reloadable counting, and edge resync.
I'm not sure if those modes are already in the Counters.
I don't know.
I am still after the KISS way, at least for now. If it turns out that it's not too complex to set off a simple instruction to run in the background like the mult/cordic instructions, and they don't take huge blocks of silicon, then it may be worthwhile. ATM I am trying to walk not run.
That's starting to sound like a lot of crossed fingers...?
Chip may already have edge reset modes in the counters, and I think the SW WAIT can then work, with a Counter.
To test at 1.5MHz with a simple reload timer, the FPGA needs to clock at either 78MHz or 81MHz, with reload values of 52 or 54, and use SW wait values of 50% of those for mid-bit sampling.
No, it's not crossed fingers. I can reliably resync to new packets at 80MHz FPGA. It is just coded sequentially atm to read the incoming data bytes.
For 80MHz you wait +1/3 +1/3 -2/3 (ie you wait an extra +1 clock +1 clock, -2 clocks). That is how they got USB to work originally.
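The clock arithmetic above can be checked with a short sketch. This is a Python model rather than P2 or FPGA code, the function names are made up for illustration, and the 53/53/54 schedule it prints for 80MHz is only one possible whole-clock pattern that averages to the non-integer bit period; it is not necessarily the exact pattern used in the code being discussed.

```python
from fractions import Fraction
from math import floor

USB_LS_BIT_RATE = 1_500_000  # low-speed USB bit rate, bits per second


def clocks_per_bit(sys_clk_hz):
    """Exact number of system clocks in one 1.5 Mbit/s bit time."""
    return Fraction(sys_clk_hz, USB_LS_BIT_RATE)


def bit_periods(sys_clk_hz, n_bits):
    """Whole-clock waits per bit; the running total stays within one clock of ideal."""
    ideal = clocks_per_bit(sys_clk_hz)
    periods, elapsed = [], 0
    for i in range(1, n_bits + 1):
        target = floor(ideal * i)      # clock count at the end of bit i
        periods.append(target - elapsed)
        elapsed = target
    return periods


for mhz in (78, 80, 81):
    hz = mhz * 1_000_000
    print(mhz, "MHz:", clocks_per_bit(hz), "clocks/bit, waits", bit_periods(hz, 6))
```

At 78MHz and 81MHz the reload value is a whole number (52 and 54), so a plain reload timer works. At 80MHz the bit time is 160/3 clocks, so the waits have to alternate to stay in step.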
I am unsure if the CRC can be used to verify the CRC (if you know what I mean). I will need to do some simple testing of this.
See the discussion further up.
Apparently, if you include the CRC in the stream, ie read to the end tag, then the CRC should read 0000
That would make life simpler.
The problem with this is that if the Verilog needs a lot of changes (as this does), it quickly becomes too clumsy to have someone else applying fix-ups. Also, in the form you code, checking is harder as it is not so self-contained.
As always, it is better to code in small pieces, get 'working' equations, and look at the .eq0 & .rpt files to confirm you have counters / clock enables / MUXes as expected, and no logic blow-outs.
Below is the code, edited/modified so Lattice Verilog at least compiles it (with some warnings).
////////////////////////////////////////////////////////////////////////////////
// Acknowledgements: Verilog code for CRC's http://www.easics.com
// RR20140310 start
// RR20140311,12 continued
////////////////////////////////////////////////////////////////////////////////
// polynomial: CRC5usb=(0 2 5), CRC16usb=(0 2 15 16), CRC16ccitt=(0 5 12 16)
// data width: 1, LSB first
//
// inputs: D, S, PINS
// outputs: D, Z, C
module RxUSB
(
input CLK, //
input Load_d, //
input jI, //
input kI, //
input WZ, //
input WC, //
input [31:0] s, // S operand
input [31:0] d, // D operand
input [127:0] p, // input pins
output reg [31:0] r, // D result
output reg zz, // Z flag
output reg cy // Carry flag
);
reg [15:0] crc; // original CRC (accumulated)
reg [2:0] bitcnt; // data bit counter 3 bits
reg k; // K new pin value
reg j; // J new pin value
reg [2:0] stuffcnt; // stuff counter 3 bits
reg [7:0] data; // data byte (accumulated)
reg [8:7] poly; // 00=crc05usb, 01=crc16usb, 10=crc16ccitt, 11=undefined
reg [6:0] pinno; // pin pair numbers 0-127
reg [15:0] newcrc; // new crc
reg t; // 1 if k toggles (ie 1 bit)
reg kP; // K old pin value
reg jP; // J old pin value
//reg r,z,c; // D result
reg crc05usb;
reg crc16usb;
reg crc16itt;
reg crc16ndef;
reg SkipStuff;
reg InvalidPM;
///////////////////////////////////////////////////////////////////////////////
// 00=crc05usb, 01=crc16usb, 10=crc16ccitt, 11=undefined
always @(poly) begin
crc05usb = (poly == 2'b00); // CRC5usb =(0 2 5)
crc16usb = (poly == 2'b01); // CRC16usb =(0 2 15 16)
crc16itt = (poly == 2'b10); // CRC16ccitt=(0 5 12 16)
crc16ndef = (poly == 2'b11); // undefined - alias to one above
end
// check for a "1" bit toggle
always @(kI or jI or kP or stuffcnt) begin
t = kI ^ kP; // new pin value ^ previous pin value; 1=toggled
SkipStuff = (!t & (stuffcnt == 3'b110)); // !t needed for ccitt ?
InvalidPM = (kI==jI); // Signaling states are non-diff
end
always @(posedge CLK) begin
if (Load_d) begin // WRITE to register - Value INIT
// crc = d[31:16]; // original crc value (accum) moved below
kP = d[15]; // previous K
jP = d[14]; // previous J
stuffcnt = d[13:11]; // original stuff counter value
bitcnt = d[10:8]; // original bit counter value
data = d[7:0]; // original data value (accum)
poly = s[8:7]; // 00=crc05usb, 01=crc16usb, 10=crc16ccitt, 11=undefined
//? kpin = value(s[6:0]); // K pin no.
//? jpin = value(s[6:0]) ^1 // J pin no.
k = kI; // new pin value
j = jI; // new pin value
end // Load_d
else begin // !Load_d = normal RUN , compiler wants in one block..
k = kI; // new pin value
j = jI; // new pin value
kP = k; // previous K
jP = j; // previous J
// check for bit unstuff
if (SkipStuff) begin
// unstuff
stuffcnt = 3'b000;
// bitcnt = bitcnt; // implicit, but makes hold action clear
end
else if (!InvalidPM) begin
// inc bit count & accum data bit into byte
bitcnt++;
stuffcnt++; // will be reset at result if input bit toggles
end
end // Load_d
end // (posedge CLK)
reg kr0;
reg kr2;
reg kr5;
reg kr12;
reg kr15;
always @(*) begin
// calculate the new crc... - decoded values, so no overlaps in if
if (crc05usb) begin
kr0 = t ^ crc[4];
kr2 = t ^ crc[4];
kr5 = 1'b0;
kr12 = 1'b0;
kr15 = 1'b0;
end
if (crc16usb) begin
kr0 = t ^ crc[15];
kr2 = t ^ crc[15];
kr5 = 1'b0;
kr12 = 1'b0;
kr15 = t ^ crc[15];
end
if (crc16itt) begin
kr0 = t ^ crc[15];
kr2 = 1'b0;
kr5 = t ^ crc[15];
kr12 = t ^ crc[15];
kr15 = 1'b0;
end
if (crc16ndef) begin // alias crc16itt, so cover ALL decodes.
kr0 = t ^ crc[15];
kr2 = 1'b0;
kr5 = t ^ crc[15];
kr12 = t ^ crc[15];
kr15 = 1'b0;
end
end // always @(*)
always @(posedge CLK) begin
if (Load_d) begin // WRITE to register - Value INIT
crc = d[31:16]; // original crc value (accum)
end
else if (!InvalidPM & !SkipStuff) begin
crc[0] = kr0;
crc[1] = crc[0];
crc[2] = crc[1] ^ kr2;
crc[3] = crc[2];
crc[4] = crc[3];
crc[5] = crc[4] ^ kr5;
crc[6] = crc[5];
crc[7] = crc[6];
crc[8] = crc[7];
crc[9] = crc[8];
crc[10] = crc[9];
crc[11] = crc[10];
crc[12] = crc[11] ^ kr12;
crc[13] = crc[12];
crc[14] = crc[13];
crc[15] = crc[14] ^ kr15;
end // valid
end // (posedge CLK)
// set results
always @(*) begin
r[31:16] = crc;
r[15] = k;
r[14] = j;
end // always @(*)
always @(*) begin // non register here ? - this is a bit mangled, data needs fixing
if (t) begin // toggled bit?
r[13:11] = 3'b000; // reset stuff counter - moved to above
end
else begin
r[13:11] = stuffcnt;
end
r[10:8] = bitcnt;
if (SkipStuff) begin
r[7:0] = data;
end
else begin
r[7:1] = data[6:0];
r[0] = t; // add new data bit
end
end // @(*)
always @(posedge CLK) begin
if (WZ) begin
if ( !SkipStuff & (bitcnt == 3'b000)) begin
zz = 1'b1; // byte ready
end
else begin
zz = 1'b0; // byte not ready
end
end
if (WC) begin
cy = k ^ j; // c = SE0/SE1
end
end // (posedge CLK)
endmodule // RxUSB
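As a cross-check of the kr0/kr2/kr5/kr12/kr15 tap decode in the module above, here is a small software model. It is a sketch in Python, not a translation of the whole module: the function and table names are invented for illustration, the input bit is just called `bit` (the Verilog feeds in `t`), and the CRC5 register is masked to 5 bits here, whereas the Verilog keeps shifting the full 16-bit register.

```python
# Feedback taps per polynomial, mirroring the kr* decode.
# Each entry: (register width, bit positions that receive the feedback bit)
TAPS = {
    "crc05usb": (5, (0, 2)),        # x^5 + x^2 + 1
    "crc16usb": (16, (0, 2, 15)),   # x^16 + x^15 + x^2 + 1
    "crc16itt": (16, (0, 5, 12)),   # x^16 + x^12 + x^5 + 1
}


def crc_step_taps(crc, bit, poly):
    """One shift of the register, written tap by tap like the Verilog."""
    width, taps = TAPS[poly]
    fb = bit ^ ((crc >> (width - 1)) & 1)          # kr = t ^ crc[msb]
    new = 0
    for i in range(width):
        prev = (crc >> (i - 1)) & 1 if i else 0    # crc[i] takes crc[i-1]
        if i in taps:
            prev ^= fb                             # XOR feedback at tap positions
        new |= prev << i
    return new


def crc_step_poly(crc, bit, poly):
    """The same shift written as shift-and-XOR with a polynomial mask."""
    width, taps = TAPS[poly]
    mask = sum(1 << t for t in taps)               # 0x05, 0x8005, 0x1021
    fb = bit ^ ((crc >> (width - 1)) & 1)
    crc = (crc << 1) & ((1 << width) - 1)
    return crc ^ mask if fb else crc
```

Both forms should give the same next state for any register value and input bit, which is a quick way to confirm that the tap positions match the intended polynomial.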
Thanks heaps.
One thing I didn't know was that crc = crc ^ x was possible.
Also which ways are the best. These are all things I don't understand.
So for me, it's better that I ultimately put the things to be done within if blocks and let Chip (or you) sort that part out for me.
See the discussion further up.
Apparently, if you include the CRC in the stream, ie read to the end tag, then the CRC should read 0000
That would make life simpler.
Yes, this is what I was wondering. Seems it should be possible because that is the way the old hw would have likely worked. But then again, it does not sound correct.
It is easily solved by running my P1 program with some input parameters and seeing what comes out. I just haven't got around to it yet.
I've updated the code, as checking the eqns showed it dropped the ball on some CRC nodes.
I tend to always use <= for clocked and = combin, and it seems your use of = in clocked sometimes works, but can get confused on more complex forms...
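The difference between = and <= inside a clocked block can be illustrated with the k/kP pair. The sketch below is a Python model of the two assignment styles, not Verilog, and the helper names are hypothetical. It relies only on the general rule that blocking assignments take effect in statement order, while non-blocking assignments all read the values from before the clock edge.

```python
def clock_blocking(state, kI):
    """Models: k = kI; kP = k;  (each statement sees the result of the one before)."""
    state = dict(state)
    state["k"] = kI
    state["kP"] = state["k"]        # already the new k
    return state


def clock_nonblocking(state, kI):
    """Models: k <= kI; kP <= k;  (right-hand sides use the pre-clock values)."""
    old = dict(state)
    return {"k": kI, "kP": old["k"]}


s_b = s_n = {"k": 0, "kP": 0}
for sample in (1, 0, 0, 1):
    s_b = clock_blocking(s_b, sample)
    s_n = clock_nonblocking(s_n, sample)
    print(sample, s_b, s_n)
```

With the blocking form kP ends up equal to the current sample; with the non-blocking form it lags by one more clock. So a mechanical swap of = for <= changes which sample the toggle test compares against, unless kP is loaded from kI directly.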
Always decodes this, so it is a logic block, not a clocked register.
// check for a "1" bit toggle
always @(kI or jI or kP or stuffcnt) begin
t = kI ^ kP; // new pin value ^ previous pin value; 1=toggled
SkipStuff = (!t & (stuffcnt == 3'b110)); // !t needed for ccitt ?
InvalidPM = (kI==jI); // Signaling states are non-diff
end
Always decodes this on any change in the inputs kI, jI, kP, or stuffcnt. Again, a logic block.
always @(posedge CLK) begin
if (Load_d) begin // WRITE to register - Value INIT
// crc = d[31:16]; // original crc value (accum) moved below
kP = d[15]; // previous K
jP = d[14]; // previous J
stuffcnt = d[13:11]; // original stuff counter value
bitcnt = d[10:8]; // original bit counter value
data = d[7:0]; // original data value (accum)
poly = s[8:7]; // 00=crc05usb, 01=crc16usb, 10=crc16ccitt, 11=undefined
//? kpin = value(s[6:0]); // K pin no.
//? jpin = value(s[6:0]) ^1 // J pin no.
k = kI; // new pin value
j = jI; // new pin value
end // Load_d
Performed once at the start of each instruction (loads initial values), clocked by CLK.
else begin // !Load_d = normal RUN , compiler wants in one block..
k = kI; // new pin value
j = jI; // new pin value
kP = k; // previous K
jP = j; // previous J
// check for bit unstuff
if (SkipStuff) begin
// unstuff
stuffcnt = 3'b000;
// bitcnt = bitcnt; // implicit, but makes hold action clear
end
else if (!InvalidPM) begin
// inc bit count & accum data bit into byte
bitcnt++;
stuffcnt++; // will be reset at result if input bit toggles
end
end // Load_d
end // (posedge CLK)
Completes the remaining initialisation, clocked by CLK.
reg kr0;
reg kr2;
reg kr5;
reg kr12;
reg kr15;
always @(*) begin
// calculate the new crc... - decoded values, so no overlaps in if
if (crc05usb) begin
kr0 = t ^ crc[4];
kr2 = t ^ crc[4];
kr5 = 1'b0;
kr12 = 1'b0;
kr15 = 1'b0;
end
if (crc16usb) begin
kr0 = t ^ crc[15];
kr2 = t ^ crc[15];
kr5 = 1'b0;
kr12 = 1'b0;
kr15 = t ^ crc[15];
end
if (crc16itt) begin
kr0 = t ^ crc[15];
kr2 = 1'b0;
kr5 = t ^ crc[15];
kr12 = t ^ crc[15];
kr15 = 1'b0;
end
if (crc16ndef) begin // alias crc16itt, so cover ALL decodes.
kr0 = t ^ crc[15];
kr2 = 1'b0;
kr5 = t ^ crc[15];
kr12 = t ^ crc[15];
kr15 = 1'b0;
end
end // always @(*)
always @(posedge CLK) begin
if (Load_d) begin // WRITE to register - Value INIT
crc = d[31:16]; // original crc value (accum)
end
else if (!InvalidPM & !SkipStuff) begin
crc[0] = kr0;
crc[1] = crc[0];
crc[2] = crc[1] ^ kr2;
crc[3] = crc[2];
crc[4] = crc[3];
crc[5] = crc[4] ^ kr5;
crc[6] = crc[5];
crc[7] = crc[6];
crc[8] = crc[7];
crc[9] = crc[8];
crc[10] = crc[9];
crc[11] = crc[10];
crc[12] = crc[11] ^ kr12;
crc[13] = crc[12];
crc[14] = crc[13];
crc[15] = crc[14] ^ kr15;
end // valid
end // (posedge CLK)
Calculates the new CRC, if required else keep the same, clocked by CLK. (needs a tidy up)
// set results
always @(*) begin
r[31:16] = crc;
r[15] = k;
r[14] = j;
end // always @(*)
always @(*) begin // non register here ? - this is a bit mangled, data needs fixing
if (t) begin // toggled bit?
r[13:11] = 3'b000; // reset stuff counter - moved to above
end
else begin
r[13:11] = stuffcnt;
end
r[10:8] = bitcnt;
if (SkipStuff) begin
r[7:0] = data;
end
else begin
r[7:1] = data[6:0];
r[0] = t; // add new data bit
end
end // @(*)
Accumulates the new Data, if required else keep the same, clocked by CLK.
Increment the bit counter, if required.
Increments the stuff counter, or resets it, as required.
always @(posedge CLK) begin
if (WZ) begin
if ( !SkipStuff & (bitcnt == 3'b000)) begin
zz = 1'b1; // byte ready
end
else begin
zz = 1'b0; // byte not ready
end
end
if (WC) begin
cy = k ^ j; // c = SE0/SE1
end
end // (posedge CLK)
endmodule // RxUSB
Always decodes this, so it is a logic block, not a clocked register.
Always decodes this on any change in the inputs kI, jI, kP, or stuffcnt. Again, a logic block.
Yes, these are to make later code easier to read. The compiler/fit will likely optimize some of these names away.
To keep them, they can be moved to the module header, where they become pins.
Performed once at the start of each instruction (loads initial values), clocked by CLK.
Completes the remaining initialisation, clocked by CLK.
Only sort of. That's where it gets tricky - the best code is stand alone, that has one register and some muxes on that.
That makes eqn-scan, and general testing easier.
If you want to code this like it is a Read/Write path on a register, then you do not have the register as well and so it does not reduce down to useful equations.
Calculates the new CRC, if required else keep the same, clocked by CLK. (needs a tidy up)
See the amended code, <= is better than =, strangely = is almost right, and seems ok in very simple cases, and gives no errors.
Accumulates the new Data, if required else keep the same, clocked by CLK.
Increment the bit counter, if required.
Increments the stuff counter, or resets it, as required.
Again not quite, the counters are further up, and the data should really be registered
Sets the Z & C flags, clocked by CLK.
Yes.
Rather than trying the double gymnastics of [start of each instruction] and [end of each instruction], I think it is best in the early stages to KISS, and focus on simplest most readable verilog, that is then used as a template for software.
I treat each CLK as a data sample point on the USB waveform.
I'm sure Chip will be able to re-wrap it around registers, if the pathways allow it, or he may choose to use separate registers.
At some point the extra muxes to merge all this into the opcode tree, will bite into the MHz values.
Local routing is smaller and faster.
The only benefit of a full merge into the multiport register stack, is you can run multiple copies in multiple registers, but I don't think anyone is expecting to run TWO USBs in one COG ?! Just one USB with some spare MIPS would be fine for most.
There are 8 COGS here.
A P1 might even be able to edge-capture to 12.5ns at 1.5MHz speeds, as an edge-based logic analyser ?
A complete frame can never need more than 1500 max stamps.
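The 1500-stamp bound and the timestamp idea can be sketched as follows. This is an illustrative Python model only: it assumes an 80MHz capture clock (which gives the 12.5ns resolution mentioned above), a 1ms frame, at most one edge per bit time, and made-up function names.

```python
from fractions import Fraction

CAPTURE_CLK_HZ = 80_000_000   # assumed capture clock: 12.5 ns per tick
BIT_RATE_HZ = 1_500_000       # low-speed USB
FRAME_HZ = 1_000              # one frame per millisecond


def max_stamps_per_frame():
    """At most one edge per bit time, so one stamp per bit in a frame."""
    return BIT_RATE_HZ // FRAME_HZ


def run_lengths(stamps):
    """Turn edge timestamps (in capture ticks) into whole bit times between edges."""
    ticks_per_bit = Fraction(CAPTURE_CLK_HZ, BIT_RATE_HZ)
    return [round((b - a) / ticks_per_bit) for a, b in zip(stamps, stamps[1:])]


print(max_stamps_per_frame())
print(run_lengths([0, 53, 160, 213, 533]))
```

Storing only the edge times keeps the buffer small however long the line sits idle, which is the dynamic-range advantage over fixed-rate sampling.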
Below is the code, edited/modified so Lattice Verilog at least compiles it (with some warnings).
////////////////////////////////////////////////////////////////////////////////
// Acknowledgements: Verilog code for CRC's http://www.easics.com
// RR20140310 start
// RR20140311,12 continued
////////////////////////////////////////////////////////////////////////////////
// polynomial: CRC5usb=(0 2 5), CRC16usb=(0 2 15 16), CRC16ccitt=(0 5 12 16)
// data width: 1, LSB first
//
// inputs: D, S, PINS
// outputs: D, Z, C
module RxUSB
(
input CLK, //
input Load_d, //
input jI, //
input kI, //
input WZ, //
input WC, //
input [31:0] s, // S operand
input [31:0] d, // D operand
input [127:0] p, // input pins
output reg [31:0] r, // D result
output reg zz, // Z flag
output reg cy, // Carry flag
output reg SkipStuff, // move so can see in EQNs better
output reg InvalidPM
);
reg [15:0] crc; // original CRC (accumulated)
reg [2:0] bitcnt; // data bit counter 3 bits
reg k; // K new pin value
reg j; // J new pin value
reg [2:0] stuffcnt; // stuff counter 3 bits
reg [7:0] data; // data byte (accumulated)
reg [8:7] poly; // 00=crc05usb, 01=crc16usb, 10=crc16ccitt, 11=undefined
reg [6:0] pinno; // pin pair numbers 0-127
reg [15:0] newcrc; // new crc
reg t; // 1 if k toggles (ie 1 bit)
reg kP; // K old pin value
reg jP; // J old pin value
//reg r,z,c; // D result
reg crc05usb;
reg crc16usb;
reg crc16itt;
reg crc16ndef;
///////////////////////////////////////////////////////////////////////////////
// 00=crc05usb, 01=crc16usb, 10=crc16ccitt, 11=undefined
always @(poly) begin
crc05usb = (poly == 2'b00); // CRC5usb =(0 2 5)
crc16usb = (poly == 2'b01); // CRC16usb =(0 2 15 16)
crc16itt = (poly == 2'b10); // CRC16ccitt=(0 5 12 16)
crc16ndef = (poly == 2'b11); // undefined - alias to one above
end
// check for a "1" bit toggle
always @(kI or jI or kP or stuffcnt) begin
t = kI ^ kP; // new pin value ^ previous pin value; 1=toggled
SkipStuff = (!t & (stuffcnt == 3'b110)); // !t needed for ccitt ?
InvalidPM = (kI==jI); // Signaling states are non-diff
end
always @(posedge CLK) begin
if (Load_d) begin // WRITE to register - Value INIT
// crc = d[31:16]; // original crc value (accum) moved below
kP <= d[15]; // previous K
jP <= d[14]; // previous J
stuffcnt <= d[13:11]; // original stuff counter value
bitcnt <= d[10:8]; // original bit counter value
data <= d[7:0]; // original data value (accum)
poly <= s[8:7]; // 00=crc05usb, 01=crc16usb, 10=crc16ccitt, 11=undefined
//? kpin = value(s[6:0]); // K pin no.
//? jpin = value(s[6:0]) ^1 // J pin no.
k <= kI; // new pin value
j <= jI; // new pin value
end // Load_d
else begin // !Load_d = normal RUN , compiler wants in one block..
k <= kI; // new pin value
j <= jI; // new pin value
kP <= kI; // previous K for the next sample (same effect as the blocking k = kI; kP = k)
jP <= jI; // previous J for the next sample
// check for bit unstuff
if (SkipStuff) begin
// unstuff
stuffcnt <= 3'b000;
// bitcnt <= bitcnt; // implicit, but makes hold action clear
end
else if (!InvalidPM) begin
// inc bit count & accum data bit into byte
bitcnt <= bitcnt + 3'b001;
if (t)
stuffcnt <= 3'b000; // reset if input bit toggles
else
stuffcnt <= stuffcnt + 3'b001;
end
end // Load_d
end // (posedge CLK)
reg kr0;
reg kr2;
reg kr5;
reg kr12;
reg kr15;
reg HoldCRC;
always @(*) begin
// calculate the new crc... - decoded values, so no overlaps in if
if (crc05usb) begin
kr0 = t ^ crc[4];
kr2 = t ^ crc[4];
kr5 = 1'b0;
kr12 = 1'b0;
kr15 = 1'b0;
end
if (crc16usb) begin
kr0 = t ^ crc[15];
kr2 = t ^ crc[15];
kr5 = 1'b0;
kr12 = 1'b0;
kr15 = t ^ crc[15];
end
if (crc16itt) begin
kr0 = t ^ crc[15];
kr2 = 1'b0;
kr5 = t ^ crc[15];
kr12 = t ^ crc[15];
kr15 = 1'b0;
end
if (crc16ndef) begin // alias crc16itt, so cover ALL decodes.
kr0 = t ^ crc[15];
kr2 = 1'b0;
kr5 = t ^ crc[15];
kr12 = t ^ crc[15];
kr15 = 1'b0;
end
HoldCRC = InvalidPM | SkipStuff;
end // always @(*)
always @(posedge CLK) begin
if (Load_d) begin // WRITE to register - Value INIT
crc <= d[31:16]; // original crc value (accum)
end
else if (!HoldCRC) begin // shift only on a valid, non-stuffed bit
crc[0] <= kr0; //16
crc[1] <= crc[0]; //17
crc[2] <= crc[1] ^ kr2; //18
crc[3] <= crc[2]; //19
crc[4] <= crc[3]; //20
crc[5] <= crc[4] ^ kr5; //21
crc[6] <= crc[5]; //22
crc[7] <= crc[6]; //23
crc[8] <= crc[7]; //24
crc[9] <= crc[8]; //25
crc[10] <= crc[9]; //26
crc[11] <= crc[10]; //27 - bad eqns??, needed <=
crc[12] <= crc[11] ^ kr12; //28
crc[13] <= crc[12]; //29
crc[14] <= crc[13]; //30
crc[15] <= crc[14] ^ kr15; //31
end // valid
end // (posedge CLK)
// set results
always @(*) begin
r[31:16] = crc;
r[15] = k;
r[14] = j;
end // always @(*)
always @(*) begin // non register here ? - this is a bit mangled, data needs fixing
if (t) begin // toggled bit?
r[13:11] = 3'b000; // reset stuff counter
end
else begin
r[13:11] = stuffcnt;
end
r[10:8] = bitcnt;
if (SkipStuff) begin
r[7:0] = data;
end
else begin
r[7:1] = data[6:0];
r[0] = t; // add new data bit
end
end // @(*)
always @(posedge CLK) begin
if (WZ) begin
if ( !SkipStuff & (bitcnt == 3'b000)) begin
zz <= 1'b1; // byte ready
end
else begin
zz <= 1'b0; // byte not ready
end
end
if (WC) begin
cy <= k ^ j; // c = SE0/SE1
end
end // (posedge CLK)
endmodule
Looks nice, says
16 channels with 8K sample depth
8 channels with 16K sample depth
which is just a little light for 1 USB frame.
My personal preference is Logic Analysers that capture & store timestamps, as they have much better dynamic range.
A P1 might make 1.5MHz that way ?
> 3MHz (say 4MHz) would allow capture of the USB edges and the mid-point sample-tags, but that may be asking too much.
I guess multiple COGS could give more, and a Logic capture unit does not care if it uses 7 COGS for captures.
I think this is getting close now. Thanks again jmg.
////////////////////////////////////////////////////////////////////////////////
// RR20140310-12 P2 RxUSB instruction
////////////////////////////////////////////////////////////////////////////////
/*---------------------------------------------------------------------------------------------------------------------
RxUSB D, S/# WZ,WC ' Receive single NRZI bit pair, accum CRC and byte, unstuff bits
where
S/# is the PinPair# and Poly bits
S[31..9] = unused
S[8..7] = 00= CRC5 USB (0 2 5)
01= CRC16 USB (0 2 15 16)
10= CRC16 CCITT (0 5 12 16)
11= undefined
S[6..0] = D-/D+ Pin Pair #0..127
The pin pair is always a pair of pins mod 2. ie nnnnnnx where x=0 and x=1 for the pair.
If the pin pair is even (S[0]=0) then J is the lowest pin and K is the higher pin of the consecutive pair
If the pin pair is odd (S[0]=1) then K is the lowest pin and J is the higher pin of the consecutive pair.
This arrangement allows for simple LS and FS by making the pin pair even or odd.
D is the cog register storing a 32 bit field...
D[31..16] = crc16
D[15] = K new pin value
D[14] = J new pin value
D[13..11] = unstuff counter 3 bits
D[10..8] = bit counter 3 bits
D[7..0] = data byte accumulation
Z = data byte ready (8 bits)
C = SE0/SE1
It would be acceptable for D to be at a fixed location eg $1F0.
---------------------------------------------------------------------------------------------------------------------*/
// inputs: D, S, PINS
// outputs: D, Z, C
////////////////////////////////////////////////////////////////////////////////
module RxUSB
(
input CLK,
input Load_d,
input jI, // new J value
input kI, // new K value
input [31:0] s, // S operand
input [31:0] d, // D operand
input wz, // WZ operand
input wc, // WC operand
input [127:0] p, // input pins
output reg [31:0] r, // D result (reg, as it is assigned inside always blocks)
output reg zz, // Z flag
output reg cy // C flag
);
reg [15:0] crc; // original CRC (accumulated)
reg [2:0] bitcnt; // data bit counter 3 bits
reg k; // K new pin value
reg j; // J new pin value
reg [2:0] stuffcnt; // stuff counter 3 bits
reg [7:0] data; // data byte (accumulated)
reg [1:0] poly; // crc05usb/crc16usb/crc16ccitt/undef polynomial selection
reg [6:0] pinno; // pin pair numbers 0-127
reg kP; // K previous pin value
reg jP; // J previous pin value
// flags/conditions...
reg crc05usb; // 00= CRC5 USB
reg crc16usb; // 01= CRC16 USB
reg crc16itt; // 10= CRC16 CCITT
reg crc16ndef; // 11= undefined
reg toggle; // data bit 0 or 1
reg BitStuff; // unstuff this bit
reg SE0_SE1; // SE0/SE1 condition
///////////////////////////////////////////////////////////////////////////////
// set crc option
always @(poly) begin
crc05usb = (poly == 2'b00); // CRC5usb =(0 2 5)
crc16usb = (poly == 2'b01); // CRC16usb =(0 2 15 16)
crc16itt = (poly == 2'b10); // CRC16ccitt=(0 5 12 16)
crc16ndef = (poly == 2'b11); // undefined
end
// check for a "1" bit toggle, and SE0/SE1 conditions, and BitStuff condition
always @(*) begin // also depends on crc05usb/crc16usb, so use @(*)
toggle = kI ^ kP; // data bit (toggle) = new pin value ^ previous pin value
SE0_SE1 = (kI == jI); // detect SE0/SE1 (j==k)
BitStuff = (!toggle & (stuffcnt == 3'b110) & (crc05usb | crc16usb)); // unstuff this bit
end
///////////////////////////////////////////////////////////////////////////////
// Set Initial conditions
always @(posedge CLK) begin
if (Load_d) begin // write initial values to registers
kP = d[15]; // previous K
jP = d[14]; // previous J
stuffcnt = d[13:11]; // original stuff counter value
bitcnt = d[10:8]; // original bit counter value
data = d[7:0]; // original data value (accum)
poly = s[8:7]; // 00=crc05usb, 01=crc16usb, 10=crc16ccitt, 11=undefined
// ??? kpin = value(s[6:0]); // K pin no.
// ??? jpin = value(s[6:0]) ^ 1; // J pin no.
// ??? k = pins[kpin]; // new pin value
// ??? j = pins[jpin]; // new pin value
k = kI; // new pin value
j = jI; // new pin value
end
else begin // !Load_d = normal RUN (compiler wants in one block)
// ??? is this correct way around etc ???
k = kI; // new pin value
j = jI; // new pin value
kP = kI; // previous pin value
jP = jI; // previous pin value
// check for bit unstuff
if (BitStuff) begin
// unstuff...
stuffcnt = 3'b000; // reset unstuff counter
// bitcnt = bitcnt; // implicit but makes hold action clear
end
else if (!SE0_SE1) begin
// valid data bit
bitcnt++; //
stuffcnt++; // will be reset at result if input bit toggles
end
end
end
///////////////////////////////////////////////////////////////////////////////
// CRC routine
reg kr0;
reg kr2;
reg kr5;
reg kr12;
reg kr15;
// calculate the new crc... (decoded values so no overlaps in if)
always @(*) begin
if (crc05usb) begin
kr0 = toggle ^ crc[4];
kr2 = toggle ^ crc[4];
kr5 = 1'b0;
kr12 = 1'b0;
kr15 = 1'b0;
end
if (crc16usb) begin
kr0 = toggle ^ crc[15];
kr2 = toggle ^ crc[15];
kr5 = 1'b0;
kr12 = 1'b0;
kr15 = toggle ^ crc[15];
end
if (crc16itt) begin
kr0 = toggle ^ crc[15];
kr2 = 1'b0;
kr5 = toggle ^ crc[15];
kr12 = toggle ^ crc[15];
kr15 = 1'b0;
end
if (crc16ndef) begin
kr0 = 1'b0;
kr2 = 1'b0;
kr5 = 1'b0;
kr12 = 1'b0;
kr15 = 1'b0;
end
end
always @(posedge CLK) begin
if (Load_d) begin // write to reg initial value
crc = d[31:16]; // original crc value (accum)
end
else if (!SE0_SE1 & !BitStuff) begin
crc[0] = kr0;
crc[1] = crc[0];
crc[2] = crc[1] ^ kr2;
crc[3] = crc[2];
crc[4] = crc[3];
crc[5] = crc[4] ^ kr5;
crc[6] = crc[5];
crc[7] = crc[6];
crc[8] = crc[7];
crc[9] = crc[8];
crc[10] = crc[9];
crc[11] = crc[10];
crc[12] = crc[11] ^ kr12;
crc[13] = crc[12];
crc[14] = crc[13];
crc[15] = crc[14] ^ kr15;
end
end
///////////////////////////////////////////////////////////////////////////////
// set D results
always @(*) begin // ??? or @(posedge CLK)
r[31:16] = crc;
r[15] = k;
r[14] = j;
r[13:11] = stuffcnt;
r[10:8] = bitcnt;
if (BitStuff) begin
r[7:0] = data; // unstuff so no change
end
else begin
r[6:0] = data[7:1]; // LSB first - shift and...
r[7] = toggle; // ...add new data bit
end
end
// set Z and C flags
always @(*) begin // ??? or @(posedge CLK)
if (wz) begin
if (!BitStuff & (bitcnt == 3'b000)) begin
zz = 1'b1; // byte ready
end
else begin
zz = 1'b0; // byte not ready
end
end
if (wc) begin
cy = SE0_SE1; // c = SE0/SE1
end
end
endmodule
///////////////////////////////////////////////////////////////////////////////
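The D register layout given in the header comment of the module above can be modelled in software, which may help when checking the field positions by hand. This is a Python sketch with invented helper names; the field table is copied from the D[31..16] / D[15] / D[14] / D[13..11] / D[10..8] / D[7..0] description.

```python
FIELDS = (                 # (name, lsb position, width in bits)
    ("crc", 16, 16),
    ("k", 15, 1),
    ("j", 14, 1),
    ("stuffcnt", 11, 3),
    ("bitcnt", 8, 3),
    ("data", 0, 8),
)


def pack_d(**values):
    """Build the 32-bit D value; fields that are not given default to 0."""
    d = 0
    for name, lsb, width in FIELDS:
        d |= (values.get(name, 0) & ((1 << width) - 1)) << lsb
    return d


def unpack_d(d):
    """Split a 32-bit D value back into its named fields."""
    return {name: (d >> lsb) & ((1 << width) - 1) for name, lsb, width in FIELDS}


d = pack_d(crc=0xFFFF, k=1)
print(hex(d), unpack_d(d))
```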
Thanks. I need to relook at the changes you made.
Yes, I am sure Chip will know the best way to do it.
No, I am not expecting to do multiple USBs in a single cog. Urhg
1. Does every byte have that header with bit stuffing, or is it only at the start of the packet?
2. Does every byte have a start/stop condition, or only the entire packet?
Sorry for these questions, but I can't find this on the Internet.
1. All groups of bits can have a bit stuff. If there are 6 bits in a row without a transition, a bit change is inserted. So it starts right from the start. However, because the header data is special, I don't think it can occur at the beginning. But I will take care of it anyway because that's the easiest to do.
2. No, it's NRZI synchronous. No start or stop bits ever. There are sync bits at the start, and the SE0 (both J & K low) at the end.
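The two answers above can be sketched as a small software model of the line coding. This is Python for illustration only, with made-up function names. It follows the USB 2.0 convention as I understand it: a transition encodes a 0, no transition encodes a 1, and the transmitter inserts a 0 after six consecutive 1s. Note that this is the opposite polarity to the "toggle = 1 bit" comment in the Verilog drafts, so the mapping is worth checking against real traffic.

```python
J, K = 1, 0   # numeric values for the two line states; the choice is arbitrary here


def bit_stuff(bits):
    """Transmit side: insert a 0 after every run of six 1s."""
    out, ones = [], 0
    for b in bits:
        out.append(b)
        ones = ones + 1 if b else 0
        if ones == 6:
            out.append(0)
            ones = 0
    return out


def bit_unstuff(bits):
    """Receive side: drop the bit that follows six 1s; it must be a 0."""
    out, ones = [], 0
    for b in bits:
        if ones == 6:
            if b:
                raise ValueError("bit-stuff error: seventh 1 in a row")
            ones = 0
            continue
        out.append(b)
        ones = ones + 1 if b else 0
    return out


def nrzi_encode(bits, level=J):
    """A 0 toggles the line, a 1 leaves it unchanged; the line idles at J."""
    out = []
    for b in bits:
        if b == 0:
            level ^= 1
        out.append(level)
    return out


def nrzi_decode(levels, prev=J):
    """Recover bits by comparing each sample with the previous one."""
    out = []
    for level in levels:
        out.append(1 if level == prev else 0)
        prev = level
    return out


sync = nrzi_encode([0, 0, 0, 0, 0, 0, 0, 1])
print(["J" if s == J else "K" for s in sync])
```

There are no per-byte start or stop bits anywhere in this model: the receiver only has the sync pattern at the front and the end-of-packet condition at the back to frame the data.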
BTW, thanks for the logic, but I am so far removed from that now that it's not much help to me for now.
crc05usb -> Does this change BitStuff ?
crc16usb -> Use USB bit-stuff rules
crc16itt -> Use SDLC bitstuff rules
crc16ndef -> disable BitStuff, for more general CRC use ? - Pick one ?
I think you can also attach the CRC to a USB sending Pins (includes stuff, which HW removes), and (quickly) grab the CRC for use in TX append ?
I don't have a logic analyser. I am just going to try this realtime to snoop what is happening on a real FS USB (FTDI to P1). I will just treat it as though I am receiving the data, and debug info out the P2s serial. I can check it out quite simply as I am used to this type of thing.
A DE2 could do this in two cogs and the Propplug.
Earlier it was asked about syncing to SE0 and waiting. It is a simple matter while waiting for the next valid frame to start, to look for the SE0 or SE1 condition. Two successive pin reads will validate an SE0 condition. Remember, the USB line is not oscillating (else the whole thing is U/S), so the unfortunate read during a transition will be resolved by two consecutive reads. The frame resync mechanism is not hard and I am doing that now (well 3+ months ago).
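The two-reads rule for SE0 can be written down as a tiny model. This is a Python sketch with a hypothetical function name. It assumes each sample is a (D+, D-) pair of pin levels and that SE0 means both are low; the `needed=2` default reflects the "two successive pin reads" described above.

```python
def se0_confirmed(samples, needed=2):
    """Return the index where `needed` consecutive SE0 reads complete, else None."""
    run = 0
    for i, (dp, dm) in enumerate(samples):
        run = run + 1 if (dp == 0 and dm == 0) else 0   # any non-SE0 read restarts the count
        if run == needed:
            return i
    return None


# One stray SE0-looking read during a transition, then a real SE0.
samples = [(1, 0), (0, 0), (0, 1), (0, 0), (0, 0), (1, 0)]
print(se0_confirmed(samples))
```

A single read that happens to land on a transition is ignored, because the following read resets the count.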
If memory serves me correctly, I don't think the CRC5 frames can even generate a bit stuff because of the bit formatting.
Just reread the CRC algorithms on the wiki. It's as I thought: by just passing the received CRC thru the CRC generator, the final CRC after this will be a fixed value ($8005 IIRC, although $8005 is also the CRC16 polynomial, and the USB spec gives the CRC16 residual as $800D). This is easy. It's so long ago since I calculated CRC16s on IBM sync comms using micros.
The value depends on start ($0000 or $FFFF) and endian (LSB or MSB first). Once it is working I can check the endian issue.
You may have noted that the last post also fixed the endian of the data byte; I had it the wrong way around.
And yes, I am sure I can grab the CRC calculated from this during the last data bit for sending out the CRC.
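The fixed-residual idea can be checked in software. Below is a minimal sketch of the USB CRC16 in its bit-reflected (LSB-first) form, assuming a preset of $FFFF and an inverted CRC on the wire; the function names are illustrative. In this reflected form the constant comes out as $B001, which is the spec's $800D read in the opposite bit order, so the endian caveat above does matter.

```python
def crc16_usb_update(crc, data):
    """Run bytes through the CRC16 register, LSB first (polynomial $8005 reflected = $A001)."""
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
    return crc


def crc16_usb(data):
    """CRC as transmitted: register preset to $FFFF, result inverted."""
    return crc16_usb_update(0xFFFF, data) ^ 0xFFFF


payload = b"123456789"
crc = crc16_usb(payload)

# Receiver side: run payload plus the received CRC (low byte first) through
# the same register. A good packet always leaves the same constant behind.
packet = payload + crc.to_bytes(2, "little")
residual = crc16_usb_update(0xFFFF, packet)
print(hex(crc), hex(residual))
```

So the receiver never has to compare against the received CRC explicitly: it just checks the register against the constant at EOP.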
Here is some brief CRC info from the USB guys... looks like they invert the CRC before transmitting it at the end, and on reception there will be a known constant residual after doing the CRC on the entire packet including its CRC (it will be non zero because of this). Something to bear in mind.
EDIT: Another document which describes SE0 detection problems... sounds a bit scary if there is asynchronous SE0 generation and bit dribble going on (see Pages 7-8)...! http://www.usb.org/developers/whitepapers/siewp.pdf
Actually, the buffers are a bit bigger than that - from the web page:
216K Block RAM supports following memory configurations*
8 channels with 24K sample depth
16 channels with 12K sample depth
32 channels with 6K sample depth
What I like about them is how inexpensive they are, after I got the first one, I picked up two more so I could test more gear at the same time.
My 500Msps unit only has a 4K buffer, which is a real pain. It is supposed to have a compressed mode, but with the firmware I have installed, that does not work. Reminds me to update its firmware...
What I really want is a 1Gsps or 2Gsps unit with a large buffer...
Hanno's ViewPort will sample to clkfreq using four cogs, and has an approximately 1500 sample buffer. I used it to debug Morpheus a few years ago.
Looks nice, says
16 channels with 8K sample depth
8 channels with 16K sample depth
which is just a little light for 1 USB frame.
My personal preference is Logic Analysers that capture & store timestamps, as they have much better dynamic range.
A P1 might make 1.5MHz that way ?
> 3MHz (say 4MHz) would allow capture of the USB edges and the mid-point sample-tags, but that may be asking too much.
I guess multiple COGS could give more, and a Logic capture unit does not care if it uses 7 COGS for captures.
I'm about ready to release the new FPGA image. I just need to finish the docs.
Do you still want me to make a USB pin instruction for this release, or are things too up in the air now?
I would say here, that any code that defines and selects the pin-pair (with reverse feature), and does SE0 and Toggle decode will still be common to any solution. (ie not be wasted at all)
It would also allow more testing in a FPGA, as the present USB code is not quite enough entirely in SW.
That said, a release now would be used by everyone, and if all that is added is USB_SET on a .1 release, only a few would need to download the .1, so to most it would not be a dual release.
The counters can count the frequency of edges and the durations of states, but they don't have a reload mode like you are asking about.
A special circuit can be made for the USB handler, though. In many instances, it's not the guts of a circuit that take lots of space, but all the conduit to make it breathe. If we encapsulated it, it might be the best way to go.
Here is some Verilog for a Sync'able Baud counter, that should work from
/4 ie 48MHz CLK on 12M USB
to
> /133 ie > 200MHz CLK on 1.5M USB
// 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3 0 1 2 3
// CLK ==\_/=\_/=\_/=\_/=\_/=\_/=\_/=\_/=\_/=\_/=\_/=\_/=\_/=\_/=\_/=\_/=\_/=\_/=\_/=\_/=\_/=\_/=\_/=\_/=\_/=\_/=
// Di ==============\\\____________///==============================================\\\_____________///============
// Di' ================\______________/================================================\_______________/============
// Di'' ====================\_______s______/========s===============s===============s=======\_______s_______/============
// ED _______________/===\__________/===\____________________________________________/===\___________/===\________
// RST ^->0 ^->0 ^->0 ^->0
// TSW xxxxxxxxxxxxxxxxxxxx____/====\__________/====\__________/====\__________/====\__________/====\__________
// ^- ^- ^- ^- ^- get s
// RL Comp CTR TSW
// 0011 SHR 001 M4 1-2
// 0111 SHR 011 M8 3-4
// 1000 SHR 100 M9 4-5
// 1001 SHR 100 M10 4-5
// ^-- Grabs Di'' on this edge
// ED is Edge Detect, Di' XOR Di'', and TSW is Sample Enable Window for Di'', one clk wide.
// TSW as CE samples just before falling edge
reg [7:0] RL_Ctr;        // RL is the 8b reload field; ED(i), TSW(o) are one bit
reg RL_FS, RL_FZ;        // internal combin nodes (TSW is assumed to be declared as an output reg in the module header)
always @(*) begin        // combin codes: == is 16 wide, >= is many more.
  RL_FS = (RL_Ctr == RL);             // common compare, flyback to 00, change constraints to keep this.
  TSW   = (RL_Ctr == {1'b0,RL[7:1]}); // Divide by 2 compare/slice, 1 clk wide, keeps clear of flyback chatter.
  RL_FZ = RL_FS | ED | WrRL;          // force to zero on either FullScale (free run) OR USB Edge Detect
  // Optional WriteRL signal, can reset on Baud change, to allow timed SW start, and safe lowering of Value.
end
always @(posedge CLK)
begin
  if (RL_FZ) begin
    RL_Ctr <= 8'b00000000;   // Sync Flyback on TC
  end
  else begin
    RL_Ctr <= RL_Ctr + 1'b1; // Up counter
  end
end
// * changed to Up counter, due to rounding nature of SHR
// Adds a compare, but counters are simpler, Sync CLR or INC.
// * Added Force Zero term, to give optimizer less options & shrink counters further.
// * Added Optional WrRL, to force reset on Update of BaudValue(RL)
// Allows Sw control of timing, and safe decrease in BaudV
TSW is the Sample window, which can then enable the Byte-level WAITUSB style code block discussed above.
ie this snippet allows BYTE level rather than BIT level handling, and re-syncs sample point on USB data, to allow longer stream tolerance.
On /4 the phase of TSW matters, but I think above is right, for samples taken from D'' (2nd sampler FF)
This supports odd divides too, for more clock flexibility. Takes an 8 bit RL value to set Baud speed.
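For anyone who wants to try the counter behaviour before it is in an FPGA image, here is a rough cycle-by-cycle software model of the Verilog above. It assumes ED is supplied as one boolean per CLK and ignores WrRL; the function name is made up for this sketch.

```python
def baud_counter(edges, rl):
    """Model the sync'able baud counter.

    edges: list of booleans, one per CLK, True where ED (edge detect) is high.
    rl:    8-bit reload value; the divide ratio is rl + 1.
    Returns the CLK indices where TSW (the sample window) is high.
    """
    ctr = 0
    tsw_clks = []
    for t, ed in enumerate(edges):
        # Combinational terms, evaluated from the current register value.
        if ctr == (rl >> 1):           # TSW: mid-bit compare, one clk wide
            tsw_clks.append(t)
        force_zero = (ctr == rl) or ed  # RL_FZ: full scale OR USB edge
        # Clocked update, as in the always @(posedge CLK) block.
        ctr = 0 if force_zero else (ctr + 1) & 0xFF
    return tsw_clks


free_run = baud_counter([False] * 12, rl=3)        # /4, no edges seen
with_edge = [False] * 12
with_edge[6] = True                                 # a USB edge arrives mid-count
resynced = baud_counter(with_edge, rl=3)
print(free_run, resynced)
```

With no edges the sample window simply repeats every rl + 1 clocks; an edge pulls the following windows back into line with the data, which is the re-sync on USB data mentioned above.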
Chip,
If it is easy to convert this Verilog then it would be nice to have this to be able to test it. I am not sure it is totally correct, but it is a place to start.
BTW Where I use xxx = 3'b000 or similar, jmg has suggested it be xxx <= 3'b000 (ie replace = with <=). I have not had time to check what this means.
My code is in post #107.
Thanks heaps.
Postedit: Should have said, if it is quicker/easier for you to put out a release without it, and follow up with a release with the above shortly after, that is fine by me.
BTW IMHO I think a number of us would appreciate the fpga code + pnut before you complete the docs. We have a lot of things to change since the last release even without the docs.
I haven't looked yet. There are various ways the CRCs can be calculated, and hence the reverse/invert/start values can all be different. The ultimate result is the same though. FWIW I am hoping that I have it the correct way around. I realised yesterday I had the data byte being assembled in reverse (MSB first instead of LSB first), and I fixed that. The CRC16 I have used requires that the CRC be preset to $FFFF and will result at the end with, IIRC, $8005 (to be confirmed, since $8005 is also the polynomial and the USB spec gives $800D as the CRC16 residual). I can check this out when Chip implements the Verilog. If it's wrong, then temporarily I can correct it in sw and modify the Verilog for the next release.
I am happy to be able to detect SE0 and also to resync for new frames. I have seen dribble detection but I don't have it covered yet.
Thanks for the links. There is a much older P2 USB thread that I started where I listed some of the docs I use. I am quite happy I have the bit stream covered, but I don't have a good understanding of the upper sw protocol levels. But I do have some info to guide me.
I think chip was meaning the earlier, simpler code to allocate Pins and manage SE0 and T into the flags ?
The code in #107 is not quite 'mission-ready', and Pin mapping and the couple of FF's & XORs to do SE0_SE1 and T should be common to any extended code.
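<= is the Verilog non-blocking assignment, which ensures you do get a clocked result ( ie usually a D-FF ): every right-hand side is sampled at the clock edge before any register changes.
= is the blocking assignment. Within a clocked block it seems to sometimes give a clocked result, but not always, because later statements in the block see the new value immediately. Best to be careful.
( another reason I suggested you run something like Lattice ispLEVER )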
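The difference is easiest to see on a two-stage shift register. This is only a software analogy, assuming two registers q1 and q2 clocked together; it models the assignment semantics, not what any particular synthesis tool will emit.

```python
def clock_nonblocking(q1, q2, d):
    """Like 'q1 <= d; q2 <= q1;': sample every right-hand side, then update."""
    next_q1 = d
    next_q2 = q1        # sees the OLD q1, so this is a real second stage
    return next_q1, next_q2


def clock_blocking(q1, q2, d):
    """Like 'q1 = d; q2 = q1;': each assignment takes effect immediately."""
    q1 = d
    q2 = q1             # sees the NEW q1, so the second stage collapses
    return q1, q2


nb = clock_nonblocking(0, 0, 1)   # first clock with d = 1
bl = clock_blocking(0, 0, 1)
print(nb, bl)
```

With the non-blocking form the 1 takes two clocks to reach q2; with the blocking form, in that statement order, it gets there in one. That is why simple cases can look fine with = while more complex ones go wrong.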
What this means is that I can decode all the incoming frames to the FTDI Chip (connected to a P1). I can also snoop the replies.
Remember, while I have read the USB Spec summaries, and looked at code doing the protocol, I have never actually done it. However, I have written lots of sync software over many years including SDLC and BiSync, and built ASCII to EBCDIC sync converters. But this was before TCPIP etc. As long as you save after each byte, and you keep at least 3 levels, you will have the CRC available. I am unsure if the CRC can be used to verify the CRC (if you know what I mean). I will need to do some simple testing of this.
BTW I did write a simple P1 program to calculate CRC5 & CRC16 for USB. I just have not got around to looking at it. Not sure what you mean here? I don't know.
I am still after the KISS way, at least for now. If it turns out that it's not too complex to set off a simple instruction to run in the background like the mult/cordic instructions, and they don't take huge blocks of silicon, then it may be worthwhile. ATM I am trying to walk not run.
For 80MHz you wait +1/3 +1/3 -2/3 (ie you wait an extra +1 clock +1 clock, -2 clocks). That is how they got USB to work originally.
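A small sketch of how that wait pattern falls out, assuming the post is about 12 Mbit/s full speed on an 80 MHz clock (6 2/3 clocks per bit). The function name is made up, and rounding each bit boundary up to the next whole clock is my assumption.

```python
from fractions import Fraction
import math


def bit_waits(clk_hz, bit_hz, nbits):
    """Whole-clock waits per bit that track a fractional clocks-per-bit ratio.

    Each bit boundary is rounded up to the next whole clock, so the
    accumulated error never reaches one clock.
    """
    per_bit = Fraction(clk_hz, bit_hz)       # exact, e.g. 20/3
    waits, elapsed = [], 0
    for n in range(1, nbits + 1):
        target = math.ceil(per_bit * n)      # clock count at the end of bit n
        waits.append(target - elapsed)
        elapsed = target
    return waits


pattern = bit_waits(80_000_000, 12_000_000, 6)
print(pattern)
```

Waits of 7, 7 and 6 clocks are +1/3, +1/3 and -2/3 of a clock away from the ideal 6 2/3, and every third bit the total is exact again (20 clocks).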
See the discussion further up.
Apparently, if you include the CRC in the stream, ie read to the end tag, then the CRC should read 0000
That would make life simpler.
One thing I didn't know was that crc = crc ^ x was possible.
Also which ways are the best. These are all things I don't understand.
So for me, it's better that I ultimately put the things to be done within if blocks and let Chip (or you) sort that part out for me.
The issue is not resync at the beginning, it is sampling creep during long packets.
It is easily solved by running my P1 program with some input parameters and see. just haven't got around to it yet.
I've updated the code, as checking the eqns showed it dropped the ball on some CRC nodes.
I tend to always use <= for clocked and = combin, and it seems your use of = in clocked sometimes works, but can get confused on more complex forms...
Increment the bit counter, if required.
Increments the stuff counter, or resets it, as required. Sets the Z & C flags, clocked by CLK.
Yes, these are to make later code easier to read. The compiler/fit will likely optimize some of these names away.
To keep them, they can be moved to the module header, where they become pins.
Only sort of. That's where it gets tricky - the best code is stand alone, that has one register and some muxes on that.
That makes eqn-scan, and general testing easier.
If you want to code this like it is a Read/Write path on a register, then you do not have the register as well and so it does not reduce down to useful equations.
See the amended code, <= is better than =, strangely = is almost right, and seems ok in very simple cases, and gives no errors.
Again not quite, the counters are further up, and the data should really be registered
Yes.
Rather than trying the double gymnastics of [start of each instruction] and [end of each instruction], I think it is best in the early stages to KISS, and focus on simplest most readable verilog, that is then used as a template for software.
I treat each CLK as a data sample point on the USB waveform.
I'm sure Chip will be able to re-warp it around registers, if the pathways allow it, or he may choose to use separate registers.
At some point the extra muxes to merge all this into the opcode tree, will bite into the MHz values.
Local routing is smaller and faster.
The only benefit of a full merge into the multiport register stack, is you can run multiple copies in multiple registers, but I don't think anyone is expecting to run TWO USBs in one COG ?! Just one USB with some spare MIPS would be fine for most.
There are 8 COGS here.
I recalled other discussions on USB & port debug, and the suggested tool was PortMon.
http://technet.microsoft.com/en-us/sysinternals/bb896644.aspx
A P1 might even be able to edge-capture to 12.5ns at 1.5MHz speeds, as an edge-based logic analyser ?
A complete frame can never need more than 1500 max stamps.
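A minimal sketch of the edge-stamp idea, assuming a 1 ms frame and a capture that stores one (time, new level) pair per transition. The names are illustrative; 12.5 ns is one clock at 80 MHz.

```python
def edge_stamps(samples, sample_ns):
    """Turn a regularly sampled pin into (time_ns, new_level) stamps, one per edge.

    Assumes samples is non-empty; the first sample sets the starting level.
    """
    stamps = []
    previous = samples[0]
    for i, level in enumerate(samples[1:], start=1):
        if level != previous:
            stamps.append((i * sample_ns, level))
            previous = level
    return stamps


def max_stamps_per_frame(bit_rate_hz, frames_per_second=1000):
    """Upper bound on edges in one frame: at most one edge per bit time."""
    return bit_rate_hz // frames_per_second


stamps = edge_stamps([1, 1, 0, 0, 0, 1], sample_ns=12.5)
low_speed_bound = max_stamps_per_frame(1_500_000)
print(stamps, low_speed_bound)
```

Storing stamps instead of raw samples is what gives the better dynamic range: an idle bus costs nothing, and the buffer only has to cover the edge count rather than the sample count.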
It compiles in Quartus for me.
Maybe from the attached schematic you can see whether all the logic is as desired.
Btw,
http://www.seeedstudio.com/depot/Open-Workbench-Logic-Sniffer-p-612.html?cPath=63
is a great little tool!
I have a Hantek 500Msps logic analyzer, but I tend to use the Logic Sniffers a lot more.
I use their Logic Sniffer (Ver 4). The Bus Pirate is cool as well.
I see you combine CRC into BitStuff - Here, it may pay to expand that slightly?
See http://en.wikipedia.org/wiki/Cyclic_redundancy_check
crc05usb -> Does this change BitStuff ?
crc16usb -> Use USB bit-stuff rules
crc16itt -> Use SDLC bitstuff rules
crc16ndef -> disable BitStuff, for more general CRC use ? - Pick one ?
I think you can also attach the CRC to a USB sending Pins (includes stuff, which HW removes), and (quickly) grab the CRC for use in TX append ?
This page has a link to a PDF:
http://forums.parallax.com/showthread.php/125543-Propeller-II-update-BLOG?p=1250045&viewfull=1#post1250045
That shows CRC5 as 11 bits in, which says to me it is calculated after all the bits of the PID have been received,
BUT CRC16 is calculated bitwise.
http://www.usb.org/developers/whitepapers/crcdes.pdf
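For comparison, CRC5 can be done bitwise in exactly the same way as CRC16, just over the 11 token bits that follow the PID. A minimal sketch, assuming the reflected (LSB-first) form with polynomial $05 reflected to $14, preset $1F and inverted output. The function names and the field packing helper are illustrative.

```python
def crc5_usb_update(crc, value, nbits):
    """Shift nbits of value, LSB first, through the 5-bit CRC register."""
    for i in range(nbits):
        bit = (value >> i) & 1
        if (crc ^ bit) & 1:
            crc = (crc >> 1) ^ 0x14
        else:
            crc >>= 1
    return crc


def crc5_usb(fields, nbits=11):
    """CRC5 as transmitted: register preset to $1F, result inverted."""
    return crc5_usb_update(0x1F, fields, nbits) ^ 0x1F


def token_word(addr, endp):
    """Pack the 16 bits after the PID: 7-bit address, 4-bit endpoint, then CRC5."""
    fields = (addr & 0x7F) | ((endp & 0x0F) << 7)
    return fields | (crc5_usb(fields) << 11)


word = token_word(0, 0)                        # token to address 0, endpoint 0
residual = crc5_usb_update(0x1F, word, 16)     # receiver runs all 16 bits
print(hex(word), hex(residual))
```

As with CRC16, a good token leaves a fixed residual in the register, so the same check-at-EOP approach works for both.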