P2 and full speed USB slave requirements/ideas - Page 5 — Parallax Forums



Comments

  • Cluso99 Posts: 18,069
    edited 2014-03-12 17:13
    jmg wrote: »
    I think Chip meant the earlier, simpler code to allocate pins and manage SE0 and T into the flags?

    The code in #107 is not quite 'mission-ready', and the pin mapping and the couple of FFs & XORs to do SE0_SE1 and T should be common to any extended code.


    <= (non-blocking assignment) in a clocked block ensures you get a clocked result (usually a D-FF).
    = (blocking assignment) within a clocked block sometimes gives a clocked result, but not always: if the value is only read later in the same block, synthesis can reduce it to plain logic. Best to be careful.
    (Another reason I suggested you run something like Lattice ispLEVER.)
    There is now no point in producing code for the previous GETXP examples.
    I am certain Chip understands what I am trying to do regarding extracting the correct pin pair. If not, a fixed P0 & P1 would work for now.
    I think the code is close enough for testing; the question is how long it will take Chip to do this with the appropriate fixes and fit it into his Verilog regime.

    As for testing, I initially propose to just output various conditions on those pin pairs, calling the new RxUSB instruction each time (whatever we call it; we don't need PNut support as I can code it as a long). This way, I can control what is output and therefore test what the instruction receives, so I can verify the instruction is working as designed in a controlled environment.

    Once I have that running, I can snoop a real FS USB and ensure I can read the tokens and packets, and verify the crc5 and crc16usb, the SE0 at the EOP and of course the initial sync sequence.
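For checking snooped tokens and packets on the PC side, here is a minimal Python sketch of the two USB CRCs in their usual byte-oriented, reflected form. It assumes the standard USB parameters (preset to all ones, result inverted, CRC5 over the 11 ADDR/ENDP bits, CRC16 over the data payload only). `token_bytes` is a hypothetical helper for building expected token bytes, not part of any Parallax tool.

```python
def crc5_usb(bits):
    """CRC5 over a token's 11 ADDR/ENDP bits given in wire (LSB-first) order.
    Reflected form of x^5 + x^2 + 1 (0x05 -> 0x14), preset 0x1F, result inverted."""
    crc = 0x1F
    for b in bits:
        crc = (crc >> 1) ^ 0x14 if (crc ^ b) & 1 else crc >> 1
    return crc ^ 0x1F


def crc16_usb(payload):
    """CRC16 over a DATA packet payload. Reflected x^16 + x^15 + x^2 + 1
    (0x8005 -> 0xA001), preset 0xFFFF, result inverted, sent low byte first."""
    crc = 0xFFFF
    for byte in payload:
        for i in range(8):                        # LSB first, as on the wire
            bit = (byte >> i) & 1
            crc = (crc >> 1) ^ 0xA001 if (crc ^ bit) & 1 else crc >> 1
    return crc ^ 0xFFFF


def token_bytes(pid, addr, endp):
    """Hypothetical helper: PID byte, then ADDR (7 bits), ENDP (4 bits), CRC5 (5 bits)."""
    field = (addr & 0x7F) | ((endp & 0xF) << 7)   # 11 bits, LSB first on the wire
    crc = crc5_usb([(field >> i) & 1 for i in range(11)])
    word = field | (crc << 11)                    # CRC5 fills bits 11..15
    return bytes([pid, word & 0xFF, word >> 8])


if __name__ == "__main__":
    print(token_bytes(0x2D, 0, 0).hex())          # SETUP to address 0, endpoint 0 -> 2d0010
    print(hex(crc16_usb(b"123456789")))           # standard check input -> 0xb4c8
```

A SETUP to address 0, endpoint 0 should come out as `2d 00 10`, and a zero-length DATA0/DATA1 packet carries a CRC16 of `0x0000`.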
  • jmg Posts: 15,144
    edited 2014-03-12 17:32
    Cluso99 wrote: »
    As for testing, I initially propose to just output various conditions on those pin pairs, calling the new RxUSB instruction each time (whatever we call it; we don't need PNut support as I can code it as a long). This way, I can control what is output and therefore test what the instruction receives, so I can verify the instruction is working as designed in a controlled environment.

    That testing approach sounds like a good idea. Chip then just needs to make as much of the state SW-readable as is practical, i.e. 32 bits each way.
    It probably does not need to mesh into the register array, just as long as it can be R/W in SW (i.e. like the counter setups).
  • rogloh Posts: 5,158
    edited 2014-03-12 17:44
    I've been looking at Cluso99's proposed RXUSB instruction.

    I think for the final P2 (not the FPGA) there is scope to be able to use this simply with a byte processing loop in another COG task if the P2 clock is a multiple of 12MHz and >=96MHz.

    This could be the bitloop.
    LOOP:           SYNCTRA
                    RXUSB   data, setup WC WZ
    if_nz_and_nc    JMPD    #LOOP
    if_c            JMP     #SE_ERROR
    if_z            MOV     INDA++, data
    if_z            ADD     fifo_counter, #1
    

    EDIT: Sorry, I accidentally hit Tab + Space while typing, which clicked Submit, so this posted too soon. I'm still formulating and thinking about this idea. I want to come up with a 6 (or 7?) instruction bit loop that uses SYNCTRA, which will wait until the right time to sample without stopping the other hardware task from running. I think I will need to subtract from PHSA somewhere as well, making this 7 instructions. We would still use the 1:8 task allocation to give the byte processing task its time for the packet.
  • jmg Posts: 15,144
    edited 2014-03-12 18:02
    Thinking about the 4 phase Digital PLL equivalent I posted in #118.

    This can help snoop on 1.5MHz (low-speed) USB, and so open up that testing domain once this gets to real data flows.

    Taking an 80MHz FPGA clock, we can get to 1.5MHz on average, with modest jitter:
    80/1.5 = 53.3333333333333333 clocks per bit
    2^32/(80/1.5) = 80530636.8, so FRQ = round(2^32/(80/1.5)) = 80530637
    2^32/80530637 = 53.3333332008785675
    80MHz/(2^32/80530637) = 1500000.0037252903 Hz
    1/(1500000.0037252903 - 1.5M) = 268.435456 s of numeric beat error (one whole bit of slip every ~268 s)
    So we add 80530637 every clock (1.5/80 of 2^32).
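The arithmetic above is easy to re-check for other clocks. A minimal Python sketch, assuming a 32-bit accumulator that wraps once per bit time (PHSx/FRQx are the post's names; the helper functions are made up for illustration):

```python
CLK_HZ = 80_000_000      # FPGA system clock from the post
BAUD_HZ = 1_500_000      # USB low-speed bit rate


def nco_increment(clk_hz, out_hz, bits=32):
    """Value added to a bits-wide phase accumulator every clock (FRQx for PHSx)."""
    return round((1 << bits) * out_hz / clk_hz)


def nco_average_hz(clk_hz, frq, bits=32):
    """Average wrap rate of the accumulator; one wrap = one bit time."""
    return clk_hz * frq / (1 << bits)


def beat_seconds(f_out, f_target):
    """Time for the rounding error to add up to one whole bit (inf if exact)."""
    err = abs(f_out - f_target)
    return float("inf") if err == 0 else 1 / err


frq = nco_increment(CLK_HZ, BAUD_HZ)
f_out = nco_average_hz(CLK_HZ, frq)
beat_s = beat_seconds(f_out, BAUD_HZ)
thread_cycles = (CLK_HZ // 2) // BAUD_HZ   # 2 threads at 50%: whole cycles per bit

if __name__ == "__main__":
    print(frq, f_out, beat_s, thread_cycles)
```

At 48MHz (and 96MHz) the increment comes out as an exact power of two (2^27 and 2^26), so there is no beat error at all; 80MHz needs the rounded value and lives with the ~268 s slip.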

    The upper 2 bits of PHSx then give the four quadrants (the fastest /4 case), and the DPLL rule from #118 is along these lines:

    If an edge occurs with the top bits = 3 -> no change (add 1.5/80 as usual)
    If an edge occurs with the top bits = 2 -> advance 1 quadrant: add 2^32/4 ONCE (or 2^32/8 twice)
    If an edge occurs with the top bits = 0 -> retard 1 quadrant: add 2^32*3/4 ONCE (or 2^32*3/8 twice)
    An edge with the top bits = 1 should never happen during a data stream.
    That case could be flagged, and it can add 2^32*2/4 ONCE (or 2^32*4/8 twice) to match the Verilog action.
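To check that the quadrant rule actually holds lock, here is a small Python simulation. It is a sketch under several assumptions: the correction is a single add on the clock where the edge is seen (the "ONCE" variant), the sample point is the entry into quadrant 1, and the test stream is a made-up run-length-limited level pattern rather than real USB traffic. It records which transmitted bit each sample lands in, so "locked" means every bit is sampled exactly once.

```python
import random

MASK = (1 << 32) - 1
QUARTER = 1 << 30
# Extra phase added when an edge is seen, indexed by the accumulator's top two bits:
# Q3 = on time, Q2 = advance 1/4, Q0 = retard 1/4 (add 3/4, wrapping),
# Q1 = should not happen once locked (add 2/4).
CORRECTION = {3: 0, 2: QUARTER, 1: 2 * QUARTER, 0: 3 * QUARTER}


def line_levels(nbits, seed=1, max_run=6):
    """Made-up test stream: line levels in runs of 1..max_run bit times, a stand-in
    for bit-stuffed NRZI, which never goes 7 bit times without an edge."""
    rng = random.Random(seed)
    levels, lvl = [], 0
    while len(levels) < nbits:
        levels.extend([lvl] * rng.randint(1, max_run))
        lvl ^= 1
    return levels[:nbits]


def sample_indices(tx_baud, use_dpll, nbits=400, clk=80e6, nominal=1.5e6):
    """Step the accumulator once per system clock; each Q0 -> Q1 crossing is a sample
    point. Returns the index of the transmitted bit on the line at each sample."""
    levels = line_levels(nbits)
    frq = round((1 << 32) * nominal / clk)
    phase, prev, hits, n = 0, levels[0], [], 0
    while True:
        idx = int(n / clk * tx_baud)               # TX bit currently on the line
        if idx >= nbits:
            return hits
        level = levels[idx]
        old_q = phase >> 30
        phase = (phase + frq) & MASK
        if old_q == 0 and phase >> 30 == 1:        # entered Q1: sample here
            hits.append(idx)
        if use_dpll and level != prev:             # edge seen: steer it into Q3
            phase = (phase + CORRECTION[phase >> 30]) & MASK
        prev = level
        n += 1


def locked(hits):
    """True if every transmitted bit was sampled exactly once, in order."""
    return hits == list(range(len(hits)))


if __name__ == "__main__":
    for scale in (0.99, 1.0, 1.01):                # TX clock -1 %, exact, +1 %
        baud = 1.5e6 * scale
        print(scale, locked(sample_indices(baud, True)), locked(sample_indices(baud, False)))
```

With a 1% transmit clock error the free-running accumulator slips or repeats bits within the 400-bit test, while the corrected one samples every bit once; at the nominal rate both work. Steering edges into Q3 keeps the sample point roughly a quarter to half a bit after each edge.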

    If a COG is set for 2 threads at 50% each, we have 2 x 40MHz flows to manage 1.5MBd data.

    The code then does
      FRQx  = Default_2e32_1p5d80

    and the advance adjustment is two lines
      FRQx  = Advance_S2_2e32d8
      FRQx  = Default_2e32_1p5d80

    and the retard adjustment is two lines
      FRQx  = retard_S0_2e32_3d8
      FRQx  = Default_2e32_1p5d80
    With a 50% slot, I think each FRQx value will apply for two clocks before being restored to the default locked value.
    
    

    Alternatively, there may be time to read PHSx for every edge not in quadrant 3 and calculate a (double) add so that quadrant 0 comes next.
    Just one adjust code block is then needed.

    If quadrant 3 is where edges land, then quadrant 1 is the sample point (and Q0, Q2 are the guard bands).

    Each thread has 26 thread cycles per bit (40MHz/1.5MHz = 26.7).
    PHSx.MSB can map to a pin, and be used as a sample and scope trigger.
  • jmg Posts: 15,144
    edited 2014-03-12 18:09
    rogloh wrote: »
    I think for the final P2 (not the FPGA) there is scope to be able to use this simply with a byte processing loop in another COG task if the P2 clock is a multiple of 12MHz and >=96MHz.

    The problem with >=96MHz is that you cannot fully test this on the FPGA, which is close to drop-dead.

    Better, I think, to include 48MHz (& 60MHz & 72MHz & maybe 84MHz) in the clock targets.
    The Auto-pacing (DPLL) code for BYTE level handler, is in #118, and is not large.
  • rogloh Posts: 5,158
    edited 2014-03-12 18:25
    jmg wrote: »
    The problem with >=96MHz is that you cannot fully test this on the FPGA, which is close to drop-dead.

    Better, I think, to include 48MHz (& 60MHz & 72MHz & maybe 84MHz) in the clock targets.
    The Auto-pacing (DPLL) code for BYTE level handler, is in #118, and is not large.

    Yeah, I know 96MHz is just too fast for the FPGA. Assuming we were to stick to bit processing loops in software, I see a lot of merit in the 1:8 approach with a byte processing task running as well, as it simplifies the software design and decouples the timing-critical bit work from the other (slower) byte-oriented protocol processing work. The problem is that transferring the data and doing the error checking takes time, and I don't see a way to get it down much more given what Cluso has proposed. Now if we come up with more USB extensions beyond RXUSB that provide bytes for us and deal with timing, that could work out nicely as well. I don't think we are there yet, but hopefully we are heading in that direction... it's worth continuing that discussion too.
  • jmg Posts: 15,144
    edited 2014-03-12 18:34
    rogloh wrote: »
    ....Now if we come up with more USB extensions beyond RXUSB that provides bytes for us and deals with timing, that could work out nicely as well. I don't think we are there yet but hopefully we are heading in that direction...

    See Chip's comment in #118 - we are closer than you think. Once you have a bit counter inside the Verilog, you just need to 'fire' the per-bit Verilog from a DPLL timer (#118) and buffer the DataTX.

    I'm not sure if the CRC needs buffering, or just care to preserve it over an SE0 event.
  • Sapieha Posts: 2,964
    edited 2014-03-12 18:57
    Hi jmg.

    I posted the schematic for that code on the Prop II blog thread, but nobody even commented on it.

    jmg wrote: »
    I think Chip meant the earlier, simpler code to allocate pins and manage SE0 and T into the flags?

    The code in #107 is not quite 'mission-ready', and the pin mapping and the couple of FFs & XORs to do SE0_SE1 and T should be common to any extended code.


    <= (non-blocking assignment) in a clocked block ensures you get a clocked result (usually a D-FF).
    = (blocking assignment) within a clocked block sometimes gives a clocked result, but not always: if the value is only read later in the same block, synthesis can reduce it to plain logic. Best to be careful.
    (Another reason I suggested you run something like Lattice ispLEVER.)
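The `<=` vs `=` point can be illustrated with a small Python model (a sketch, not a Verilog simulator; the function names are made up) of one clock edge of a two-register block written both ways. With non-blocking assignments every right-hand side reads the values from before the edge, so `b` is one edge behind `a`; with blocking assignments `b = a` already sees the new `a`, so the `a`->`b` stage disappears and synthesis is free to collapse it.

```python
# Python model of one clock edge of:
#   always @(posedge clk) begin a <= din; b <= a; end   (non-blocking)
#   always @(posedge clk) begin a  = din; b  = a; end   (blocking)


def edge_nonblocking(state, din):
    """Every right-hand side is read before any register updates."""
    a_old, _b_old = state
    return (din, a_old)          # b gets the OLD a: a two-stage pipeline


def edge_blocking(state, din):
    """Each statement updates immediately, so 'b = a' already sees the new a."""
    a = din
    b = a                        # b is just din again: the a->b stage is gone
    return (a, b)


def run(edge, inputs):
    """Apply one clock edge per input value and record b after each edge."""
    state, seen = (0, 0), []
    for din in inputs:
        state = edge(state, din)
        seen.append(state[1])
    return seen


if __name__ == "__main__":
    stimulus = [1, 0, 1, 1, 0]
    print(run(edge_nonblocking, stimulus))  # [0, 1, 0, 1, 1]: b is one edge behind a
    print(run(edge_blocking, stimulus))     # [1, 0, 1, 1, 0]: b tracks din directly
```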
  • jmg Posts: 15,144
    edited 2014-03-12 19:08
    Sapieha wrote: »
    Hi jmg.
    I posted the schematic for that code on the Prop II blog thread, but nobody even commented on it.

    It was too low-resolution for me to see clearly, and besides, I can see the fitter equations, which are easier to follow...
    It does pay with Verilog (like with most high level languages) to check you got what you expected, and not something else, or logic-bloat.
  • Cluso99 Posts: 18,069
    edited 2014-03-12 19:45
    jmg wrote: »
    That testing approach sounds like a good idea. Chip then just needs to make as much of the state SW-readable as is practical, i.e. 32 bits each way.
    It probably does not need to mesh into the register array, just as long as it can be R/W in SW (i.e. like the counter setups).
    ??? The instruction uses the D register, so we can preset it, and we can read it.
    Chip said there will be instruction space, so no need for a fixed D address.
  • jmg Posts: 15,144
    edited 2014-03-12 19:58
    Cluso99 wrote: »
    ??? The instruction uses the D register, so we can preset it, and we can read it.
    Chip said there will be instruction space, so no need for a fixed D address.

    The exact implementation is up to Chip, I'm just observing that this is more like a Counter or SerDes in operation, than a memory/register.
    The counters use SETxx and GETxx opcodes, which also have a D address.
  • rogloh Posts: 5,158
    edited 2014-03-12 20:43
    jmg wrote: »
    Thinking about the 4 phase Digital PLL equivalent I posted in #118.

    Each thread has 26 thread cycles per bit.
    PHSx.MSB can map to a pin, and be used as a sample and scope trigger.

    I do quite like the sound of this adaptive clocking. I need to get my head around it more. What HW changes or other further instructions would be required to support it? How much is done in HW vs software? I see we would use one of the counters, does it need modifications or do all the clocking tweaks adjusting FRQA happen in software?

    I think you meant 26 thread cycles per byte, not per bit above. But that is still nice and already gives us at least 3 hub cycles per byte @80MHz in the byte processing task. At 48MHz this drops down to 16 clocks per thread or 2 hub cycles which I think should still be fairly generous.

    So what crystal frequency limitations would this overall approach entail? Anything >=48MHz, or do you still need discrete 12MHz multiples above this? I imagine for receiving, if we get aggressive, we could potentially adjust timing after every byte, which probably means we don't want to slip any more than, say, 1/4 bit per 8 bits, right? That is ~3% tolerance. But the transmit side adds its own complexity, and I expect we want to be able to transmit accurately at the right bit rate, which then means a 12MHz multiple. How does your design deal with that?
  • jmg Posts: 15,144
    edited 2014-03-12 21:45
    rogloh wrote: »
    I do quite like the sound of this adaptive clocking. I need to get my head around it more. What HW changes or other further instructions would be required to support it? How much is done in HW vs software? I see we would use one of the counters, does it need modifications or do all the clocking tweaks adjusting FRQA happen in software?

    The counter form above is just a skeleton of SW-emulation ideas for the Verilog code in #118, and a means to smart-sample a USB stream at 1.5MHz.
    The SW emulation is a way to test/emulate Verilog ideas in SW on the FPGA, but at modest USB speeds. Luckily there is the low-speed mode :)

    In #118 you can see Chip's comment suggesting he may find it easier to fit a custom baud controller than to add modes to a timer.

    Either way is fine; it's whatever is easiest and smallest to include. A separate 8-bit baud divider frees the 32-bit timers for other tasks.

    The code in #118 is pretty much all you need, just a few lines of Verilog -> silicon.

    There are not many changes once the USB code block includes a BitCtr and is called once per bit; it's just a matter of whether you call it from SW, or pace it with a DPLL as in #118.
    rogloh wrote: »
    I think you meant 26 thread cycles per byte, not per bit above.

    No, that is per-bit, - but notice that is for a test version, running at 1.5MHz LO-speed USB, where things will be easier to probe, and hook-into.

    At 1.5MHz I think FPGA-P2 can fit one DPLL + Diagnostics thread sampling and checking, and one Thread running the USB-Verilog tests.

    Once that looks good, the Verilog DPLL would be added to pace the USB engine, instead of SW calls.
    In this form, it's not quite as easy to probe or test, so a mixed SW version (aka Verilog emulation) gives a way to bring this up, and get higher level code working.
    rogloh wrote: »
    So what crystal frequency limitations would this overall approach entail? Anything >=48Mhz or do you still need discrete 12MHz multiples above this?

    Yes, the baud DPLL assumes N x 12MHz with N >= 4, and is OK up to 200MHz; it can do low-speed USB at >200MHz SysClk.

    The code in #118 auto-syncs, so tolerance is not so critical, but you would need a crystal or resonator for timing
    (i.e. RC oscillators are probably off the table).
    Chip's Xtal PLL is now any-integer, so that gives a few choices of how to get to 12MHz x N.
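A quick way to list the N x 12MHz candidates and the divide ratios they imply, as a Python sketch. The 8-bit limit in `fits_divider` is only an assumption taken from the "separate 8-bit baud divider" idea above; whether a real divider tops out at 255 or 256 depends on how it would be built.

```python
USB_FS_BPS = 12_000_000   # full-speed bit rate
USB_LS_BPS = 1_500_000    # low-speed bit rate


def sysclk_options(min_n=4, max_hz=200_000_000):
    """(sysclk, clocks per FS bit, clocks per LS bit) for each N x 12MHz,
    N >= min_n, up to max_hz."""
    options = []
    n = min_n
    while n * USB_FS_BPS <= max_hz:
        f = n * USB_FS_BPS
        options.append((f, f // USB_FS_BPS, f // USB_LS_BPS))
        n += 1
    return options


def fits_divider(clocks_per_bit, width=8):
    """Assumed rule: the ratio fits if it is 1 .. 2**width - 1."""
    return 1 <= clocks_per_bit < (1 << width)


if __name__ == "__main__":
    for f, fs, ls in sysclk_options():
        print(f"{f // 1_000_000:3d}MHz: {fs:2d} clocks/FS bit, {ls:3d} clocks/LS bit, "
              f"LS fits 8 bits: {fits_divider(ls)}")
```

At 192MHz (N = 16) that gives 16 clocks per FS bit and 128 per LS bit, so the LS ratio still fits in 8 bits, which is consistent with low speed remaining reachable beyond 200MHz.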
  • Sapieha Posts: 2,964
    edited 2014-03-13 03:23
    Hi jmg.

    I think this is all the hardware needed for bit-banged send/receive.

    See the attachment.
    I think the only other signal needed is a MODE 0/1 input that inverts TXD/RXD.

    I even have a version that includes NRZI in/out.
  • Cluso99 Posts: 18,069
    edited 2014-03-13 03:46
    Sapieha wrote: »
    Hi jmg.

    I think this is all the hardware needed for bit-banged send/receive.

    See the attachment.
    I think the only other signal needed is a MODE 0/1 input that inverts TXD/RXD.

    I even have a version that includes NRZI in/out.
    Sapieha,
    Sorry, I don't understand what you are saying/showing.
  • Sapieha Posts: 2,964
    edited 2014-03-13 03:53
    Hi Cluso

    This is the hardware between the 2 pins to read/send bit-banged USB.
    Most of it is needed even if other functions are connected to the USB differential pins.

    You can't omit this part of the hardware in any type of USB communication.
    Cluso99 wrote: »
    Sapieha,
    Sorry, I don't understand what you are saying/showing.
  • Cluso99 Posts: 18,069
    edited 2014-03-13 04:55
    Sapieha wrote: »
    Hi Cluso

    This is the hardware between the 2 pins to read/send bit-banged USB.
    Most of it is needed even if other functions are connected to the USB differential pins.

    You can't omit this part of the hardware in any type of USB communication.
    Yes. My USB instruction uses a pair of pins.
    But your circuit did not show both inputs.
  • Sapieha Posts: 2,964
    edited 2014-03-13 05:01
    Hi Cluso.

    D_n, D_p are the lines that input/output to D-, D+.

    nOEi selects input or output on the USB lines.
    TXD, RXD are the NRZI output/input to this circuitry.

    Cluso99 wrote: »
    Yes. My usb instruction uses a pair of pins.
    But your circuit did not show both inputs.
  • Sapieha Posts: 2,964
    edited 2014-03-13 16:02
    Hi

    Here is the circuitry with built-in NRZI.

    D_n, D_p are the lines that input/output to D-, D+.

    nOEi selects input or output on the USB lines.
    TXD_di, RXD_do are the real (NRZI-decoded) bits output/input to this circuitry.

    It gives real bits directly on receive and takes real bits on send.

    It needs only one instruction: field D controls the special signals,
    some of them read-only and some read/write (nOEi, SOE, SEI, SUSPEND, reset_i),
    field S is the port number,
    and the bit value is sent/received via the C flag (maybe shifted directly in/out of a register specified by a RESD instruction).
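A software picture of what the receive side of such a circuit does with D+/D-, as a Python sketch: classify each sample as J, K or SE0, then turn the J/K sequence into "real" bits (same state = 1, change = 0). The `low_speed` flag plays the role of the MODE 0/1 inversion suggested above (low speed idles with D- high, so J and K swap), which is also what swapping the order of the pin pair achieves. Function names are made up, and bit unstuffing is left out.

```python
def line_state(dp, dm, low_speed=False):
    """Classify one (D+, D-) sample. Full speed idles in J with D+ high;
    low speed idles in J with D- high, so J and K swap meaning."""
    if dp == dm:
        if dp:
            raise ValueError("SE1 is not a legal USB line state")
        return "SE0"
    return "J" if bool(dp) != low_speed else "K"


def nrzi_bits(samples, low_speed=False):
    """Recover real bits the way an NRZI receiver does: same state as the
    previous bit time -> 1, change (J<->K) -> 0. Starts from idle J and
    stops at the first SE0 (end of packet). No bit unstuffing here."""
    bits, prev = [], "J"
    for dp, dm in samples:
        state = line_state(dp, dm, low_speed)
        if state == "SE0":
            break
        bits.append(1 if state == prev else 0)
        prev = state
    return bits


if __name__ == "__main__":
    # Full-speed SYNC (KJKJKJKK) followed by SE0; K at full speed is D+ low, D- high.
    sync = [(0, 1), (1, 0)] * 3 + [(0, 1), (0, 1), (0, 0)]
    print(nrzi_bits(sync))
```

For a full-speed SYNC this yields seven 0s and a final 1, i.e. the sync byte 0x80 received LSB first.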
  • jmg Posts: 15,144
    edited 2014-03-13 16:59
    Verilog from above, after edits/fixes to get it to compile, and some cleanups of the stuff-counter and data logic.
    I put all the logic in ONE place (KISS) and made the stuff counter more self-contained.
    For simplicity, CLK here is treated as the USB sample point.

    Code merged with the DPLL baud generator further up would add TSW as a clock-enable (CE) gate, to align that sample point correctly.
    ////////////////////////////////////////////////////////////////////////////////
    // RR20140310-12 P2 RxUSB instruction
    ////////////////////////////////////////////////////////////////////////////////
    /*---------------------------------------------------------------------------------------------------------------------
                  RxUSB   D, S/#          WZ,WC             ' Receive single NRZI bit pair, accum CRC and byte, unstuff bits
    where
      S/# is the PinPair# and Poly bits
        S[31..9]  = unused
        S[8..7]   = 00= CRC5  USB    (0 2 5)  
                    01= CRC16 USB    (0 2 15 16)
                    10= CRC16 CCITT  (0 5 12 16)
                    11= undefined
        S[6..0]   = D-/D+ Pin Pair #0..127
                    The pin pair is always a pair of pins mod 2. ie nnnnnnx where x=0 and x=1 for the pair.
                    If the pin pair is even (S[0]=0) then J is the lowest pin and K is the higher pin of the consecutive pair
                    If the pin pair is odd  (S[0]=1) then K is the lowest pin and J is the higher pin of the consecutive pair.
                    This arrangement allows for simple LS and FS by making the pin pair even or odd.                              
      D is the cog register storing a 32 bit field...
        D[31..16] = crc16
        D[15]     = K new pin value
        D[14]     = J new pin value
        D[13..11] = unstuff counter 3 bits
        D[10..8]  = bit counter 3 bits
        D[7..0]   = data byte accumulation
      Z = data byte ready (8 bits)
      C = SE0/SE1
    It would be acceptable for D to be at a fixed location eg $1F0.
    ---------------------------------------------------------------------------------------------------------------------*/
    // inputs:  D, S, PINS
    // outputs: D, Z, C
    ////////////////////////////////////////////////////////////////////////////////
    module          RxUSB
    (
    input           CLK,
    input           Load_d,
    input           jI,             // new J value
    input           kI,             // new K value
    input   [31:0]  s,              // S operand
    input   [31:0]  d,              // D operand
    input           wz,             // WZ operand
    input           wc,             // WC operand
    input   [127:0] p,              // input pins
    output reg [31:0]  r,              // D result
    output reg      zz,             // Z flag
    output reg      cy              // C flag    
    );
    reg     [15:0]  crc;            // original CRC (accumulated)
    reg     [2:0]   bitcnt;         // data bit counter 3 bits
    reg             k;              // K new pin value
    reg             j;              // J new pin value
    reg     [2:0]   stuffcnt;       // stuff counter 3 bits
    reg     [7:0]   data;           // data byte (accumulated)
    reg     [1:0]   poly;           // crc05usb/crc16usb/crc16ccitt/undef polynomial selection
    //reg     [6:0]   pinno;          // pin pair numbers 0-127
    reg             kP;             // K previous pin value
    reg             jP;             // J previous pin value
    // flags/conditions...
    reg             crc05usb;       // 00= CRC5  USB    
    reg             crc16usb;       // 01= CRC16 USB   
    reg             crc16itt;       // 10= CRC16 CCITT 
    reg             crc16ndef;      // 11= undefined   
    reg             toggle;         // data bit 0 or 1
    reg             BitStuff;       // unstuff this bit
    reg             SE0_SE1;        // SE0/SE1 condition
    ///////////////////////////////////////////////////////////////////////////////
    // set crc options
        always @(poly)  begin   
            crc05usb  = (poly == 2'b00);                    // CRC5usb   =(0 2 5)
            crc16usb  = (poly == 2'b01);                    // CRC16usb  =(0 2      15 16)
            crc16itt  = (poly == 2'b10);                    // CRC16ccitt=(0   5 12    16)
            crc16ndef = (poly == 2'b11);                    // undefined
        end
    // decode the NRZI data bit, and check SE0/SE1 and BitStuff conditions
        always @(*)  begin
            // USB NRZI sends a 1 as "no transition" and a 0 as a transition, so "toggle" holds the
            // decoded data bit ~(new ^ previous), not the raw transition
            toggle    = ~(kI ^ kP);                         // 1 = data bit 1 (line unchanged)
            SE0_SE1   = (kI == jI);                         // detect SE0/SE1 (j==k)
            BitStuff  = (!toggle & (stuffcnt == 3'b110) & (crc05usb | crc16usb));  // stuffed 0 after six 1s - USB only
    //        BitStuff  = ( (stuffcnt == 3'b110) & (crc05usb | crc16usb));  // counter-only alternative
        end    // The counter alone is enough: after six 1s the next bit must be a stuffed 0 (a seventh 1 is a USB bit-stuff error)
    ///////////////////////////////////////////////////////////////////////////////
    // Set Initial conditions
        always @(posedge CLK) begin
            if (Load_d) begin                               // write initial values to registers
                kP       <= d[15];                           // previous K
                jP       <= d[14];                           // previous J
                stuffcnt <= d[13:11];                        // original stuff counter value
                bitcnt   <= d[10:8];                         // original bit   counter value
                data     <= d[7:0];                          // original data value (accum)
            poly     <= s[8:7];                          // 00=crc05usb, 01=crc16usb, 10=crc16ccitt, 11=undefined
                k        <= kI;                              // new pin value
                j        <= jI;                              // new pin value
            end
            else begin                                      // !Load_d = normal RUN (compiler wants in one block)
    // k/j and kP/jP both capture this sample: kP/jP are compared with the next sample, k/j feed D[15:14]
                k       <= kI;                              // new pin value
                j       <= jI;                              // new pin value
                kP      <= kI;                              // previous pin value
                jP      <= jI;                              // previous pin value
    // check for bit unstuff
                 if (!BitStuff & !SE0_SE1) begin    // Collect only valid data bits
                    bitcnt    <= bitcnt+1;                      
                    data[6:0] <= data[7:1];         // LSB first - shift right
                    data[7]   <= toggle;            
                 end       
                 if (!toggle | (stuffcnt == 3'b110)) begin  // reset on a 0 data bit, or when the (USB) threshold is reached
                    stuffcnt<= 3'b000;        
                 end          
                 else begin
                    stuffcnt <= stuffcnt+1;
                 end
    
            end // Load_d
        end                                                                          
    ///////////////////////////////////////////////////////////////////////////////
    // CRC routine
    reg             kr0;
    reg             kr2;
    reg             kr5;
    reg             kr12;
    reg             kr15;
    
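    // calculate the new crc... (decoded values so no overlaps in if)
        always @(*) begin
            kr0 = 1'b0; kr2 = 1'b0; kr5 = 1'b0; kr12 = 1'b0; kr15 = 1'b0;  // defaults so no latches are inferred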
            if (crc05usb) begin
                kr0  = toggle ^ crc[4];
                kr2  = toggle ^ crc[4];
                kr5  = 1'b0;
                kr12 = 1'b0;
                kr15 = 1'b0;
            end
            if (crc16usb) begin
                kr0  = toggle ^ crc[15];
                kr2  = toggle ^ crc[15];
                kr5  = 1'b0;
                kr12 = 1'b0;
                kr15 = toggle ^ crc[15]; 
            end
            if (crc16itt) begin
                kr0  = toggle ^ crc[15];
                kr2  = 1'b0;
                kr5  = toggle ^ crc[15];
                kr12 = toggle ^ crc[15];
                kr15 = 1'b0; 
            end
            if (crc16ndef) begin
                kr0  = 1'b0;
                kr2  = 1'b0;
                kr5  = 1'b0;
                kr12 = 1'b0;
                kr15 = 1'b0; 
            end
        end        
        always @(posedge CLK) begin
            if (Load_d) begin                     // write to reg initial value
                crc <= d[31:16];                  // original crc value (accum)
            end
            else if (!SE0_SE1 & !BitStuff) begin  // Only valid data 
                crc[0]  <= kr0;
                crc[1]  <= crc[0];
                crc[2]  <= crc[1] ^ kr2;
                crc[3]  <= crc[2];
                crc[4]  <= crc[3];
                crc[5]  <= crc[4] ^ kr5;
                crc[6]  <= crc[5];
                crc[7]  <= crc[6];
                crc[8]  <= crc[7];
                crc[9]  <= crc[8];
                crc[10] <= crc[9];
                crc[11] <= crc[10];
                crc[12] <= crc[11] ^ kr12;
                crc[13] <= crc[12];
                crc[14] <= crc[13];
                crc[15] <= crc[14] ^ kr15;
            end
        end    
            
    ///////////////////////////////////////////////////////////////////////////////
        
    // set D results - optional 32 bit pick-off.
        always @(*)  begin             //                     ??? or @(posedge CLK)
            r[31:16] = crc;
            r[15]    = k;
            r[14]    = j;
            r[13:11] = stuffcnt;
            r[10:8]  = bitcnt;
            r[7:0]   = data;               
        end    
        
    // set Z and C flags
        always  @(posedge CLK) begin
            if (wz)  begin
                if (!BitStuff & !SE0_SE1 & (bitcnt == 3'b111)) begin    // about to load the last data bit, so
                    zz <= 1'b1;                              // byte ready
                end
                else begin    
                    zz <= 1'b0;                              // byte not ready
                end
            end
            if (wc) begin          
                cy <= SE0_SE1;          // c = SE0/SE1
            end           
        end
    endmodule
    // bitcnt before / after increment / bits loaded
    // 000    001   1
    // 001    010   2
    // 010    011   3
    // 011    100   4
    // 100    101   5
    // 101    110   6
    // 110    111   7
    // 111    000   8
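Along the lines of Cluso's test plan (drive known pin states, then check D, Z and C), a bit-level reference model can supply the expected values. This Python sketch mirrors the module above under some assumptions: one call is one sample clock with the Load_d cycle folded in, the previous K value is read back from D[15], J is j=1/k=0, and the data bit uses USB NRZI polarity (no transition = 1). The CRC register is modelled only `width` bits wide; in the Verilog the unused upper CRC bits simply shift along. `wire_bits`, `line_for` and `receive` are hypothetical test helpers.

```python
CRC5_USB, CRC16_USB, CRC16_CCITT = 0, 1, 2        # S[8..7] polynomial select
POLY = {CRC5_USB: (5, 0x05), CRC16_USB: (16, 0x8005), CRC16_CCITT: (16, 0x1021)}


def crc_step(crc, bit, poly):
    """One clock of the MSB-feedback LFSR in the Verilog: feedback = bit ^ crc[MSB]."""
    width, taps = POLY[poly]
    fb = bit ^ ((crc >> (width - 1)) & 1)
    crc = (crc << 1) & ((1 << width) - 1)        # model keeps only 'width' bits
    return crc ^ taps if fb else crc


def rxusb(d, poly, j_new, k_new):
    """One RxUSB execution on the packed D register. Returns (D, Z, C)."""
    crc, k_prev = d >> 16, (d >> 15) & 1
    stuff, bitcnt, data = (d >> 11) & 7, (d >> 8) & 7, d & 0xFF
    se = j_new == k_new                            # SE0/SE1 -> C
    bit = 1 - (k_new ^ k_prev)                     # NRZI: no transition = 1
    bitstuff = bit == 0 and stuff == 6 and poly in (CRC5_USB, CRC16_USB)
    z = not bitstuff and not se and bitcnt == 7    # this bit completes a byte
    if not bitstuff and not se:
        bitcnt = (bitcnt + 1) & 7
        data = (data >> 1) | (bit << 7)           # LSB first
        crc = crc_step(crc, bit, poly)
    stuff = 0 if (bit == 0 or stuff == 6) else stuff + 1
    d = (crc << 16) | (k_new << 15) | (j_new << 14) | (stuff << 11) | (bitcnt << 8) | data
    return d, z, se


def wire_bits(data):
    """Test helper: bytes -> LSB-first bits with a 0 stuffed after six 1s."""
    out, ones = [], 0
    for byte in data:
        for i in range(8):
            b = (byte >> i) & 1
            out.append(b)
            ones = ones + 1 if b else 0
            if ones == 6:
                out.append(0)
                ones = 0
    return out


def line_for(bits, k=0):
    """Test driver: NRZI-encode bits into (j, k) pin samples, starting from idle J."""
    pins = []
    for b in bits:
        if b == 0:
            k ^= 1                                 # a 0 toggles J <-> K
        pins.append((1 - k, k))
    return pins


def receive(pins, poly=CRC16_USB, crc_init=0xFFFF):
    """Execute rxusb once per sample; collect a byte whenever Z is set."""
    d, got = crc_init << 16, []                    # D[15] = 0: previous state was J
    for j, k in pins:
        d, z, _ = rxusb(d, poly, j, k)
        if z:
            got.append(d & 0xFF)
    return got, d >> 16


if __name__ == "__main__":
    print(receive(line_for(wire_bits([0x80, 0xD2]))))     # SYNC + ACK PID
```

With the CRC preset to all ones, clocking in a DATA packet's payload plus its CRC16 should leave the non-reflected residual 0x800D in D[31..16] (0x0C for CRC5 over a token's ADDR/ENDP and CRC5 bits), which is a convenient pass/fail check for snooped packets.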
    
    
  • Cluso99 Posts: 18,069
    edited 2014-03-13 17:36
    Sapieha wrote: »
    Hi

    Here is the circuitry with built-in NRZI.

    Thanks Sapieha. Now I understand what you mean.
    I am writing the instruction using Verilog now. It does show the circuitry required.
  • Sapieha Posts: 2,964
    edited 2014-03-13 17:54
    Hi jmg.

    Nice code; it compiles fine in my Quartus.

    BUT it is only the receive part. It still needs the send part and hardware drivers for (j, k) in/out.
    jmg wrote: »
    Verilog from above, after edits/fixes to get it to compile, and some cleanups of the stuff-counter and data logic.
    I put all the logic in ONE place (KISS) and made the stuff counter more self-contained.
    For simplicity, CLK here is treated as the USB sample point.

    Code merged with the DPLL baud generator further up would add TSW as a clock-enable (CE) gate, to align that sample point correctly.
    ////////////////////////////////////////////////////////////////////////////////
    // RR20140310-12 P2 RxUSB instruction
    ////////////////////////////////////////////////////////////////////////////////
    /*---------------------------------------------------------------------------------------------------------------------
                  RxUSB   D, S/#          WZ,WC             ' Receive single NRZI bit pair, accum CRC and byte, unstuff bits
    where
      S/# is the PinPair# and Poly bits
        S[31..9]  = unused
        S[8..7]   = 00= CRC5  USB    (0 2 5)  
                    01= CRC16 USB    (0 2 15 16)
                    10= CRC16 CCITT  (0 5 12 16)
                    11= undefined
        S[6..0]   = D-/D+ Pin Pair #0..127
                    The pin pair is always a pair of pins mod 2. ie nnnnnnx where x=0 and x=1 for the pair.
                    If the pin pair is even (S[0]=0) then J is the lowest pin and K is the higher pin of the consecutive pair
                    If the pin pair is odd  (S[0]=1) then K is the lowest pin and J is the higher pin of the consecutive pair.
                    This arrangement allows for simple LS and FS by making the pin pair even or odd.                              
      D is the cog register storing a 32 bit field...
        D[31..16] = crc16
        D[15]     = K new pin value
        D[14]     = J new pin value
        D[13..11] = unstuff counter 3 bits
        D[10..8]  = bit counter 3 bits
        D[7..0]   = data byte accumulation
      Z = data byte ready (8 bits)
      C = SE0/SE1
    It would be acceptable for D to be at a fixed location eg $1F0.
    ---------------------------------------------------------------------------------------------------------------------*/
    // inputs:  D, S, PINS
    // outputs: D, Z, C
    ////////////////////////////////////////////////////////////////////////////////
    module          RxUSB
    (
    input           CLK,
    input           Load_d,
    input           jI,             // new J value
    input           kI,             // new K value
    input   [31:0]  s,              // S operand
    input   [31:0]  d,              // D operand
    input           wz,             // WZ operand
    input           wc,             // WC operand
    input   [127:0] p,              // input pins
    output reg [31:0]  r,              // D result
    output reg      zz,             // Z flag
    output reg      cy              // C flag    
    );
    reg     [15:0]  crc;            // original CRC (accumulated)
    reg     [2:0]   bitcnt;         // data bit counter 3 bits
    reg             k;              // K new pin value
    reg             j;              // J new pin value
    reg     [2:0]   stuffcnt;       // stuff counter 3 bits
    reg     [7:0]   data;           // data byte (accumulated)
    reg     [1:0]   poly;           // crc05usb/crc16usb/crc16ccitt/undef polynomial selection
    //reg     [6:0]   pinno;          // pin pair numbers 0-127
    reg             kP;             // K previous pin value
    reg             jP;             // J previous pin value
    // flags/conditions...
    reg             crc05usb;       // 00= CRC5  USB    
    reg             crc16usb;       // 01= CRC16 USB   
    reg             crc16itt;       // 10= CRC16 CCITT 
    reg             crc16ndef;      // 11= undefined   
    reg             toggle;         // data bit 0 or 1
    reg             BitStuff;       // unstuff this bit
    reg             SE0_SE1;        // SE0/SE1 condition
    ///////////////////////////////////////////////////////////////////////////////
    // set crc options
        always @(poly)  begin   
            crc05usb  = (poly == 2'b00);                    // CRC5usb   =(0 2 5)
            crc16usb  = (poly == 2'b01);                    // CRC16usb  =(0 2      15 16)
            crc16itt  = (poly == 2'b10);                    // CRC16ccitt=(0   5 12    16)
            crc16ndef = (poly == 2'b11);                    // undefined
        end
    // decode the NRZI data bit (USB: no transition = 1, transition = 0), and check SE0/SE1 and BitStuff conditions
        always @(*)  begin   
            toggle    = ~(kI ^ kP);                         // data bit = NOT(new pin value ^ previous pin value)
            SE0_SE1   = (kI == jI);                         // detect SE0/SE1 (j==k)
            BitStuff  = (!toggle & (stuffcnt == 3'b110) & (crc05usb | crc16usb));  // a 0 after six 1s is a stuffed bit - USB only
    //        BitStuff  = ( (stuffcnt == 3'b110) & (crc05usb | crc16usb));  // counter-only alternative
        end    // Counter alone is enough: after six 1s a valid USB stream always has a stuffed 0; !toggle only matters to catch 1111111 (a stuff error)
    ///////////////////////////////////////////////////////////////////////////////
    // Set Initial conditions
        always @(posedge CLK) begin
            if (Load_d) begin                               // write initial values to registers
                kP       <= d[15];                           // previous K
                jP       <= d[14];                           // previous J
                stuffcnt <= d[13:11];                        // original stuff counter value
                bitcnt   <= d[10:8];                         // original bit   counter value
                data     <= d[7:0];                          // original data value (accum)
                poly     <= s[8:7];                          // 00=crc05usb, 01=crc16usb, 10=crc16ccitt, 11=undefined
                k        <= kI;                              // new pin value
                j        <= jI;                              // new pin value
            end
            else begin                                      // !Load_d = normal RUN (compiler wants in one block)
    // k/j are written back to D; kP/jP capture this sample so the next clock compares against it
                k       <= kI;                              // new pin value
                j       <= jI;                              // new pin value
                kP      <= kI;                              // previous pin value (for the next sample)
                jP      <= jI;                              // previous pin value (for the next sample)
    // collect only valid data bits (skip stuffed bits and SE0/SE1)
                if (!BitStuff & !SE0_SE1) begin
                    bitcnt    <= bitcnt+1;
                    data[6:0] <= data[7:1];         // LSB first - shift right
                    data[7]   <= toggle;            // new data bit enters at the MSB
                end
    // stuff counter: count consecutive 1s
                if (!toggle | (stuffcnt == 3'b110)) begin   // reset on a 0 data bit, OR on reaching the (USB) threshold
                    stuffcnt <= 3'b000;
                end
                else begin
                    stuffcnt <= stuffcnt+1;
                end

            end // Load_d
        end                                                                          
    ///////////////////////////////////////////////////////////////////////////////
    // CRC routine
    reg             kr0;
    reg             kr2;
    reg             kr5;
    reg             kr12;
    reg             kr15;
    
    // calculate the feedback taps for the selected polynomial
    // (defaults first, so synthesis never infers latches)
        always @(*) begin
            kr0  = 1'b0;
            kr2  = 1'b0;
            kr5  = 1'b0;
            kr12 = 1'b0;
            kr15 = 1'b0;
            case (poly)
                2'b00: begin                    // CRC5 USB    x^5+x^2+1
                    kr0  = toggle ^ crc[4];
                    kr2  = toggle ^ crc[4];
                end
                2'b01: begin                    // CRC16 USB   x^16+x^15+x^2+1
                    kr0  = toggle ^ crc[15];
                    kr2  = toggle ^ crc[15];
                    kr15 = toggle ^ crc[15];
                end
                2'b10: begin                    // CRC16 CCITT x^16+x^12+x^5+1
                    kr0  = toggle ^ crc[15];
                    kr5  = toggle ^ crc[15];
                    kr12 = toggle ^ crc[15];
                end
                default: ;                      // 11 = undefined, all taps 0 (register just shifts)
            endcase
        end
        always @(posedge CLK) begin
            if (Load_d) begin                     // write to reg initial value
                crc <= d[31:16];                  // original crc value (accum)
            end
            else if (!SE0_SE1 & !BitStuff) begin  // Only valid data 
                crc[0]  <= kr0;
                crc[1]  <= crc[0];
                crc[2]  <= crc[1] ^ kr2;
                crc[3]  <= crc[2];
                crc[4]  <= crc[3];
                crc[5]  <= crc[4] ^ kr5;
                crc[6]  <= crc[5];
                crc[7]  <= crc[6];
                crc[8]  <= crc[7];
                crc[9]  <= crc[8];
                crc[10] <= crc[9];
                crc[11] <= crc[10];
                crc[12] <= crc[11] ^ kr12;
                crc[13] <= crc[12];
                crc[14] <= crc[13];
                crc[15] <= crc[14] ^ kr15;
            end
        end    
            
    ///////////////////////////////////////////////////////////////////////////////
        
    // set D results - optional 32 bit pick-off of the registered state.
        always @(*)  begin             // combinational; @(posedge CLK) here would add a clock of latency
            r[31:16] = crc;
            r[15]    = k;
            r[14]    = j;
            r[13:11] = stuffcnt;
            r[10:8]  = bitcnt;
            r[7:0]   = data;               
        end    
        
    // set Z and C flags
        always  @(posedge CLK) begin
            if (wz)  begin
                if (!BitStuff & !SE0_SE1 & (bitcnt == 3'b111)) begin    // loading the 8th valid bit, so
                    zz <= 1'b1;                              // byte ready
                end
                else begin
                    zz <= 1'b0;                              // byte not ready
                end
            end
            if (wc) begin
                cy <= SE0_SE1;          // c = SE0/SE1
            end
        end
    endmodule
    // bitcnt before / after loading a bit / bits loaded
    // 000    001   1
    // 001    010   2
    // 010    011   3
    // 011    100   4
    // 100    101   5
    // 101    110   6
    // 110    111   7
    // 111    000   8
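    To test the instruction against known patterns driven onto the pin pair, a software reference for the receive path is handy. The following is a minimal Python sketch, not the P2 implementation. It assumes one sample per bit at the bit centre (as the Verilog assumes CLK is the sample point) and uses the D+ level as the sample (J = 1 at FS). The function names nrzi_decode, unstuff and bits_to_bytes are hypothetical.

```python
def nrzi_decode(levels, prev=1):
    """USB NRZI: no transition = 1, transition = 0.
    levels: one sample per bit (assumed D+ level, so J = 1 at FS); prev: idle level."""
    bits = []
    for level in levels:
        bits.append(1 if level == prev else 0)
        prev = level
    return bits


def unstuff(bits, threshold=6):
    """Drop the stuffed 0 that follows `threshold` consecutive 1s (six for USB)."""
    out, ones, expect_stuff = [], 0, False
    for b in bits:
        if expect_stuff:
            if b != 0:                  # seven 1s in a row: bit-stuff error
                raise ValueError("bit-stuff error")
            expect_stuff, ones = False, 0
            continue                    # discard the stuffed bit
        out.append(b)
        ones = ones + 1 if b else 0
        if ones == threshold:
            expect_stuff = True
    return out


def bits_to_bytes(bits):
    """Assemble whole bytes LSB first (like data[7] <= bit; data[6:0] <= data[7:1])."""
    whole = len(bits) - len(bits) % 8
    return [sum(b << n for n, b in enumerate(bits[i:i + 8])) for i in range(0, whole, 8)]
```

    For example, the FS SYNC pattern KJKJKJKK after an idle J (levels [0,1,0,1,0,1,0,0]) decodes to bits 00000001. Those bits assemble, LSB first, into the byte 0x80.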
    
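    The CRC shift register can be checked the same way. This Python sketch mirrors the structure of the Verilog crc[] register: the feedback is the data bit XOR the MSB, and the taps are XORed in on the shift. It assumes the USB conventions of presetting the register to all ones and sending the complemented CRC MSB first. The helper names crc_step, usb_crc and crc_field_bits are hypothetical.

```python
CRC5_USB = (5, 0x05)       # x^5 + x^2 + 1
CRC16_USB = (16, 0x8005)   # x^16 + x^15 + x^2 + 1


def crc_step(crc, bit, width, taps):
    """One clock of the serial CRC: feedback = data bit XOR register MSB,
    shift left, XOR the taps when feedback is 1 (same shape as crc[] above)."""
    fb = bit ^ ((crc >> (width - 1)) & 1)
    crc = (crc << 1) & ((1 << width) - 1)
    return crc ^ taps if fb else crc


def usb_crc(bits, width, taps):
    """CRC over bits in send order (LSB of each field first), preset to all ones."""
    crc = (1 << width) - 1
    for b in bits:
        crc = crc_step(crc, b, width, taps)
    return crc


def crc_field_bits(crc, width):
    """Bits of the CRC field as sent: complement of the register, MSB first."""
    inv = ~crc & ((1 << width) - 1)
    return [(inv >> i) & 1 for i in range(width - 1, -1, -1)]
```

    A SETUP token to address 0, endpoint 0 (11 zero bits) gives a register value of 0x17, so the CRC5 field sent is 01000 and the second token byte is 0x10. Running the data plus the received CRC back through the register leaves the fixed USB residuals: 01100 for CRC5 and 0x800D for CRC16. That makes a cheap end-of-packet check for the instruction.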
    
  • Sapieha Posts: 2,964
    edited 2014-03-13 17:56
    Hi Cluso.

    For the part shown in my schematic, I already have Verilog code.

    Cluso99 wrote: »
    Thanks Sapieha. Now I understand what you mean.
    I am writing the instruction using Verilog now. It does show the circuitry required.
  • jmg Posts: 15,144
    edited 2014-03-13 19:55
    Sapieha wrote: »
    Hi jmg.

    Nice code; it compiles nicely in my Quartus.

    BUT it is only the receive part. It still needs the send part and hardware drivers for (j, k) IN/OUT.

    Correct, no Tx yet. I think P2 has differential output support now, and the SerDes may (should?) support packed sends.
    If the CRC above can be shared (it could snoop on a Tx stream?), that just leaves bit-stuffing to do in SW before starting to
    send a block.
    Receive is the tougher nut to crack, so the focus was on that.
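    The SW side of Tx (bit-stuffing a block, then NRZI encoding it) could be modelled like this. It is a minimal Python sketch, assuming the block is already a list of bits in send order. The names stuff, nrzi_encode and byte_bits are hypothetical, not existing P2 routines.

```python
def byte_bits(data):
    """Bytes to a bit list, LSB first as USB sends them."""
    return [(byte >> n) & 1 for byte in data for n in range(8)]


def stuff(bits, threshold=6):
    """Insert a 0 after every run of `threshold` consecutive 1s (six for USB)."""
    out, ones = [], 0
    for b in bits:
        out.append(b)
        ones = ones + 1 if b else 0
        if ones == threshold:
            out.append(0)               # stuffed bit; it also resets the run
            ones = 0
    return out


def nrzi_encode(bits, level=1):
    """USB NRZI: a 0 toggles the line, a 1 holds it; level starts at idle (J)."""
    out = []
    for b in bits:
        if b == 0:
            level ^= 1
        out.append(level)
    return out
```

    As a quick check, byte 0x80 (the SYNC field) encodes from idle J to the levels 0,1,0,1,0,1,0,0, i.e. KJKJKJKK.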

    Even doing TxStuff in Verilog is not many gates (similar to the Rx side).
    Roughly (a fragment; CLK, StuffCtr, BitCtr, DataBY, TxT and Din are assumed declared elsewhere):
        always @(posedge CLK) begin
    // ~~~~~~~~~~~ Stuff counter: INC when sending ones, else clear ~~~~~~~~~~~~~~~~~~~
            if (!DataBY[0] | (StuffCtr == 3'b110)) begin    // reset when sending a 0, OR on reaching the threshold
                StuffCtr <= 3'b000;
            end
            else begin
                StuffCtr <= StuffCtr+1;
            end

    // ~~~~~~~~~~~ Insert 0, or send/shift data ~~~~~~~~~~~~~~~~~~~
            if (StuffCtr == 3'b110) begin
                TxT  <= !TxT;               // toggle = insert a stuffed 0; skip BitCtr, don't shift DataBY
            end
            else begin                      // no insert: normal data send, so INC and shift
                if (!DataBY[0]) begin       // send 0 = toggle, send 1 = hold value on TxT
                    TxT  <= !TxT;
                end
                BitCtr <= BitCtr + 1;
                DataBY <= {Din, DataBY[7:1]};   // LSB first: shift right, Din enters at bit 7
            end
        end
    
    
  • Cluso99 Posts: 18,069
    edited 2014-03-13 20:41
    Sapieha,
    As jmg said, we are concentrating on the harder part, the receive end, first. But we can also use the same instruction to do the CRC calcs after outputting each bit, so the instruction is quite powerful as it is.
    Currently it can also do CRC16, but it needs a couple of fixes because BiSync/SDLC uses a single line rather than complementary pairs, and it can be NRZ or NRZI.
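    Here is a sketch of how the same serial step covers the CCITT polynomial. SDLC/HDLC frame check sequences commonly use the CRC-16/X-25 variant: preset all ones, bits LSB first, and the result reflected and complemented. The Python below assumes that variant; which one a given BiSync/SDLC link needs is not settled here, and x25_fcs is a hypothetical name. Note also that HDLC/SDLC zero-bit insertion happens after five 1s, not six as in USB, so the stuff threshold would need to be selectable too.

```python
def crc_step(crc, bit, width=16, taps=0x1021):
    """Serial CRC step shaped like the Verilog crc[] register (CCITT taps 0, 5, 12)."""
    fb = bit ^ ((crc >> (width - 1)) & 1)
    crc = (crc << 1) & ((1 << width) - 1)
    return crc ^ taps if fb else crc


def reflect16(x):
    """Reverse the bit order of a 16-bit value."""
    return int(f"{x:016b}"[::-1], 2)


def x25_fcs(data):
    """Assumed CRC-16/X-25 conventions: preset 0xFFFF, bits LSB first,
    result bit-reversed and complemented."""
    crc = 0xFFFF
    for byte in data:
        for n in range(8):              # LSB first, as SDLC/HDLC sends bits
            crc = crc_step(crc, (byte >> n) & 1)
    return reflect16(crc) ^ 0xFFFF
```

    The standard check string b"123456789" gives 0x906E for this variant.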
  • jmg Posts: 15,144
    edited 2014-03-25 16:33
    I'll add some test results here from another discussion, as they give a performance reference point for existing USB devices,
    and also show some issues in the details of settings and sustained speeds, for when testing USB flows on P2.

    Testing on more PCs and USB ports shows some subtle differences between the test PCs:
    * USB3 ports (blue) seem to sustain higher baud traffic than a 'standard' port (even though both run at 12MHz; maybe larger buffers?)
    * Windows device settings defaulted to 16ms, and changing to 1ms did help 2MBd on a standard USB port.
    * 3MBd on USB3 hardware was very close to managing reliable streamed duplex.
    * Adding a 2nd stop bit seemed to help a little.
    * Some failure modes looked a little brutal: at 3MBd, giving errors, sometimes the USB VCP vanished from Win8, and it did fully restore on unplug/replug. Moving to another PC and then back seemed to clear things.
    (i.e. maybe more than just dropped data was going on here)
    * TX seemed to never drop; the receive side seemed to have the issues.

    As another reference point, the SiLabs CP2130 specifies 3.9 and 2.6MBps on read/write, so that does look to be about the duplex limit.
    They also give 5.8MBd (W) and 6.6MBd (R) as one-way limits.


    Loopback streaming tests, 100000 blocks, with a frequency counter and a character-counting terminal.
    (This terminal has been crafted to have low overhead and quiet modes, so the PC SW side does not set the ceiling.)
    Propeller Project Board tests, FT231X (20p) loopback, file of [U......U]
                                                        Shift-Ctrl-V                           Right-click Paste
    Block Size  Baud    Set    TxSend   RxBack    FreqAv                                 FreqAv
    100000      3MBd    n,8,1  100000    99128!*  1.49989MHz Qm, overrun errors          1.49985MHz, overrun errors
    100000      2.4MBd  n,8,1  100000   100000    1.00001MHz Qm                          1000.018kHz solid  << 2MBd alias
    100000      2MBd    n,8,1  100000   100000    1.00001MHz Qm                          1000.018kHz solid
    100000      1.5MBd  n,8,1  100000   100000    750kHz quiet mode, less in hex         750.007kHz solid
    100000      1MBd    n,8,1  100000   100000    380~500kHz variable (hex)              500.0062kHz solid
    100000      500kBd  n,8,1  100000   100000    243kHz, sometimes 250kHz (hex)         250.0045kHz solid

    100000      3MBd    n,8,2  100000    99577!*  fewer overrun errors                   1.49985MHz, overrun errors
    100000      3MBd    m,8,2  100000    99949!*  better Rx yield, still < 100%

    * In the 3MBd case, the external edge count on TX is correct, so it is the RX side that is dropping characters.
    
    Added:
    Same tests, SiLabs CP2105 (ENH) channel, duplex, Shift-Ctrl-V quiet mode:
    (FreqAv values under 0.5*Baud mean added stop bits)
    Block Size  Baud    Set    TxSend   RxBack   FreqAv
    100000      1.2MBd  n,8,1  100000   100000   441.194kHz
    100000      2MBd    n,8,1  100000   100000   525.516kHz
    100000      3MBd    n,8,1  100000   100000   624.674kHz
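    The FreqAv columns can be turned into characters per second. With a back-to-back stream of 'U' (0x55) characters, each frame (start 0, data 1,0,1,0,1,0,1,0 sent LSB first, stop 1) has 5 rising edges. That gives FreqAv = 5 * Baud / (bit times per character). Here is a small Python sketch of that arithmetic; the function names are hypothetical.

```python
# 'U' = 0x55 sent LSB first is 1,0,1,0,1,0,1,0; with the start (0) and stop (1)
# bits, every character contributes 5 rising edges, i.e. 5 cycles on the counter.
CYCLES_PER_U = 5


def u_stream_freq(baud, frame_bits):
    """Average line frequency for 'U' characters, each taking frame_bits bit times."""
    return CYCLES_PER_U * baud / frame_bits


def bit_times_per_char(baud, freq_avg):
    """Inverse: how many bit times each character really took (gaps included)."""
    return CYCLES_PER_U * baud / freq_avg
```

    By this arithmetic, the CP2105's 624.674kHz at 3MBd works out to about 24 bit times per character (roughly 125k characters/s). A back-to-back 8N1 stream would take 10.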
    
    

    It seems the FT231X can sustain 2MBd duplex (with good PC SW), and at 3MBd it can send with no added stop bits, but it stutters a little on 3MBd duplex, on the receive side.

    Expanding to 2 stop bits and using mark parity both help, but are not quite enough to make duplex work without overruns.
    (The SW works well above this on an FT232H, but that uses a different frame speed and drivers.)

    I think FTDI have somewhat mangled the baud formula in my data sheet; tests show the more correct version is:

    FT231X virtual baud clock of 24MHz, with legal divisors of 8, 12, 16, 17, 18, 19, 20, 21...

    i.e. from 16 up, steps of one are supported; below 16 only 8 and 12 are allowed.
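    Under that reading, the achievable rates are 24MHz divided by a legal divisor. Here is a Python sketch. It assumes the driver simply picks the divisor whose rate is nearest the request (the real FTDI rounding rule is not confirmed here), and it assumes an upper divisor limit of 16383.

```python
CLOCK_HZ = 24_000_000   # FT231X virtual baud clock, per the tests above
MAX_DIVISOR = 16383     # assumed upper limit, not taken from the data sheet


def legal_divisors(max_div=MAX_DIVISOR):
    """Divisors reported above: 8 and 12, then every integer from 16 up."""
    return [8, 12] + list(range(16, max_div + 1))


def actual_baud(requested, clock=CLOCK_HZ):
    """Assumed rounding rule: use the legal divisor whose rate is nearest the request."""
    best = min(legal_divisors(), key=lambda d: abs(clock / d - requested))
    return clock / best
```

    With nearest-rate rounding, a 2.4MBd request lands on divisor 12, i.e. 2MBd. That matches the "2MBd alias" seen in the FT231X table.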