P2-ES Ethernet Dev Board Add-on - Page 2 — Parallax Forums

P2-ES Ethernet Dev Board Add-on


Comments

  • RamonRamon Posts: 484
    edited 2021-04-10 02:39

    Thank you, I will take a look at that CRCBIT instruction later. I guess the instruction allows any polynomial to be defined, so it can be used.

    I have updated the MDIO_Interactive program. I have created the MDIO_WRITE_REG function.
    It's buggy, sorry, but it sometimes works!

    Write (option 1) is done with a predefined value, $7F, and only for PHY 1 and REG 0.
    Option 2 (RESET) is just a write to PHY 1 REG 0 with bit 15 set. (It works sometimes.)

    [2021/04/10]  Edit: I have found the bug. It was related to the reg variable not being updated correctly.
    I changed reg to mdio_reg, as 'reg' is a reserved keyword in Spin2.
    (It worked before because the file extension was '.spin' instead of '.spin2'.)
    
  • RamonRamon Posts: 484
    edited 2021-04-10 10:21

    (I have fixed and updated the MDIO Interactive v0.3 code.)

    Please, can someone check the following CRC code?

    The correct result should contain the following bytes : $5B $00 $6C $87 (but I don't know in which order)

    Edited: fixed code loop and other errors (but still wrong FCS, see a few posts later)

    CON
      'poly    = $04C11DB7
      'poly    = $EDB88320
      'poly    = $04C11DAB
      poly    = $FB3EE254
    
      rx_pin   = 63
      tx_pin   = 62
      baud     = 921_600
      DOWNLOAD_BAUD = 921_600
      DEBUG_DELAY   = 100
      DEBUG_BAUD    = 921_600
    
    VAR
      long  crc   
    
    OBJ
      ser: "spin/SmartSerial"
    
    PUB main() | data, i, mycrc
    
      ser.start(rx_pin, tx_pin, 0, baud)
    
      repeat i FROM 0 to 63
        data := byte[@eth_frame[i]]
        ser.printf("data: %x   ", data)
    
        mycrc := mycrc + crc32(data, poly)
        ser.printf("eth_frame[%d]: %x Polynomial: %x  CRC: %x \n", i, data, poly, mycrc)
    
    PUB crc32 (data, poly): crc     ' data byte in, crc word out
    
      org
           SHL         data,#32-8
           SETQ        data
           CRCNIB      crc,poly
           CRCNIB      crc,poly
      end
    
    DAT
    
    ' Captured ethernet frame :
    '
    ' ff ff ff ff ff ff 11 11 11 11 11 11 81 00 00 64
    ' 81 00 00 c8 08 06 00 01 08 00 06 04 00 01 11 11
    ' 11 11 11 11 c0 a8 02 01 00 00 00 00 00 00 01 01
    ' 01 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    ' 5B 00 6C 87  
    '
    ' (last four bytes are the FCS/CRC)
    '
    eth_frame  byte  $ff, $ff, $ff, $ff, $ff, $ff, $11, $11, $11, $11, $11, $11, $81, $00, $00, $64
               byte  $81, $00, $00, $c8, $08, $06, $00, $01, $08, $00, $06, $04, $00, $01, $11, $11
               byte  $11, $11, $11, $11, $c0, $a8, $02, $01, $00, $00, $00, $00, $00, $00, $01, $01
               byte  $01, $01, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00
    
    FCS_long   long
    FCS        byte  $5B, $00, $6C, $87
    
  • Your CRC code was never going to work the way it was accumulating its CRC. :( Here's a version using CRCNIB that matches your Ethernet CRC. You may want to test it on more packets but if it worked here, it's probably correct.

    CON
      poly    = $edb88320
    
    OBJ
      ser: "SmartSerial"
      f  : "ers_fmt"
    
    PUB main() | i, crc
      ser.start(115200)
      send:=@ser.tx
    
      crc:=-1 ' initial crc value is all ones
      repeat i from 0 to 63
        crc := crc32(byte[@eth_frame][i], poly, crc)
    
      send("computed ethernet CRC : ", f.hex(crc ^ -1), 13,10) ' need to xor with all ones for final value
    
      send("captured ethernet CRC : ", f.hex(long[@FCS]), 13,10)
    
    
    PUB crc32 (data, polynomial, priorcrc) : newcrc     ' data byte in & crc, polynomial longs in,  crc long out
      newcrc := priorcrc  ' start with prior crc
      org
           rev data ' crcnib uses bits 31 to 28 of Q
           setq data ' setup data in Q
           crcnib newcrc, polynomial
           crcnib newcrc, polynomial
      end
    
    DAT
    
    ' Captured ethernet frame :
    '
    ' ff ff ff ff ff ff 11 11 11 11 11 11 81 00 00 64
    ' 81 00 00 c8 08 06 00 01 08 00 06 04 00 01 11 11
    ' 11 11 11 11 c0 a8 02 01 00 00 00 00 00 00 01 01
    ' 01 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    ' 5B 00 6C 87  
    '
    ' (last four bytes are the FCS/CRC)
    '
    eth_frame  byte  $ff, $ff, $ff, $ff, $ff, $ff, $11, $11, $11, $11, $11, $11, $81, $00, $00, $64
               byte  $81, $00, $00, $c8, $08, $06, $00, $01, $08, $00, $06, $04, $00, $01, $11, $11
               byte  $11, $11, $11, $11, $c0, $a8, $02, $01, $00, $00, $00, $00, $00, $00, $01, $01
               byte  $01, $01, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00
    
    FCS        byte  $5B, $00, $6C, $87
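
For anyone who wants to cross-check the P2 result on a PC, here is a minimal Python sketch of the same reflected CRC-32 (polynomial $EDB88320, initial value all ones, final XOR with all ones), consuming each byte as two 4-bit steps with the low nibble first. It is a plain software model, not a cycle-accurate model of CRCNIB, and the function name `crc32_nibblewise` is made up for this illustration. The 64 frame bytes are copied from the post above.

```python
import zlib

POLY = 0xEDB88320  # reflected CRC-32 polynomial, as used in the post above


def crc32_nibblewise(data, crc=0xFFFFFFFF, poly=POLY):
    """Bit-serial CRC-32 that feeds each byte in as two nibbles, low nibble first."""
    for byte in data:
        for nibble in (byte & 0x0F, byte >> 4):
            crc ^= nibble                      # fold 4 data bits into the low end
            for _ in range(4):                 # then clock the register 4 times
                crc = (crc >> 1) ^ (poly if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF                    # final XOR with all ones


# The 64 captured bytes from the post (FCS not included).
frame = bytes.fromhex(
    "ff ff ff ff ff ff 11 11 11 11 11 11 81 00 00 64 "
    "81 00 00 c8 08 06 00 01 08 00 06 04 00 01 11 11 "
    "11 11 11 11 c0 a8 02 01 00 00 00 00 00 00 01 01 "
    "01 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00"
)

crc = crc32_nibblewise(frame)
# The FCS goes on the wire least significant byte first.
print("FCS bytes on the wire:", crc.to_bytes(4, "little").hex(" "))
print("matches zlib.crc32   :", crc == zlib.crc32(frame))
```

The printed bytes can be compared by eye with the FCS listed in the DAT section; the little-endian conversion also answers the "in which order" question from the earlier post.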
    
    
  • roglohrogloh Posts: 5,837
    edited 2021-04-10 11:16

    This shows that the Ethernet CRC can be computed at a rate of 20 clocks for 8 nibbles (when working on 32 bits at a time). This is 2.5 clocks per nibble. At 25M nibbles / sec transmit or receive timing, we can probably do CRC on the fly if the clock speed is sufficient.

    For transmit, while the packet is streaming, it also could get read into LUTRAM by the transmitting COG and a CRC could then be computed in parallel and in time before the final byte is sent. You can read 1 long per clock into LUT, then 3 clocks for a RDLUT data, ptr++ so 4 clocks for 8 nibbles, or 0.5 clocks per nibble. This plus the 2.5 clocks per nibble for CRC means you could operate CRC on the fly if the clock rate is at least 75MHz. You'd want to go higher for overheads, say 100MHz.

    On receive there is more to do, but if the streamer is working you have some time to do things. However, if you have to byte bang (or more precisely nibble bang), you'd need 2 clocks per nibble for that (rolnib data, ina/inb, #xx), plus you'd want to accumulate your data into a long to write into the LUTRAM, then 1 clock for 32 bits out of LUTRAM to HUB. The trick is going to be delimiting packets correctly and handling the various end cases. It's likely going to need more than 3-4 P2 clocks per nibble for this, especially if it is two bit wide RMII at 50MHz instead of 4 bit wide MII or RGMII at 25MHz. Hopefully something like 150-200MHz would be enough for 25MHz x 4 bit Ethernet. For 2 bit RMII you may also like the RCZL and RCZR instructions if the RMII buses are nibble aligned, or the SPLIT/MERGE stuff that evanh used with Dual SPI and 2 smart pins in synch receiver mode. RX is going to be a lot trickier to get correct vs TX. The smart pin sync mode is nice because it probably lets you operate using a range of P2 frequencies, though we probably want the P2 clock to be a multiple of 25 or 50MHz anyway for the TX side if the streamer is used.
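
The transmit budget above is easy to re-run with other numbers. This is only a back-of-the-envelope sketch in Python: the per-nibble costs are the figures quoted in this post, and the helper name `min_sysclk_hz` is made up for the illustration.

```python
NIBBLE_RATE_HZ = 25_000_000        # 100 Mbit/s as 4-bit nibbles at 25 MHz

CRC_CLOCKS_PER_NIBBLE = 20 / 8     # 20 clocks of CRC work per 32-bit long
LUT_CLOCKS_PER_NIBBLE = 4 / 8      # 1 clock into LUT + 3 clocks for RDLUT per long


def min_sysclk_hz(clocks_per_nibble, nibble_rate_hz=NIBBLE_RATE_HZ):
    """Lowest system clock that keeps up with the given per-nibble cost."""
    return clocks_per_nibble * nibble_rate_hz


tx_cost = CRC_CLOCKS_PER_NIBBLE + LUT_CLOCKS_PER_NIBBLE   # 3.0 clocks per nibble
print(f"TX: {tx_cost} clocks/nibble -> {min_sysclk_hz(tx_cost) / 1e6:.0f} MHz minimum")
```

Any loop or branch overhead adds to the per-nibble cost, which is why a margin above the computed minimum is suggested.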

  • Rogloh, Thank you so much!

    I would never have figured out how to calculate the CRC with the CRCNIB instruction if you hadn't posted the code.
    So many things can go wrong (wrong initialized value, wrong polynomial, wrong direction SHL or REV?, wrong nibble order, not doing final XOR, etc ...)

    I have tested with several frames, and it's PERFECT !! Thanks a lot !

    ' From Rogloh Ethernet CRC example (2021/04/10 15:17)
    '
    '    https://forums.parallax.com/discussion/comment/1521854/#Comment_1521854
    '
    CON
    
      rx_pin   = 63
      tx_pin   = 62
      baud     = 115_200
      DOWNLOAD_BAUD = 115_200
      DEBUG_DELAY   = 100
      DEBUG_BAUD    = 115_200
    
    OBJ
      ser: "spin/SmartSerial"
    
    PUB main() | i, mycrc, mypoly
    
      ser.start(rx_pin, tx_pin, 0, baud)
    
      ser.printf("\nRogloh Ethernet CRC function: \n\n")
    
      mypoly := $edb88320
      mycrc:=-1
      repeat i FROM 0 to 63
        mycrc  := crc32(byte[@eth_frame1][i], mypoly, mycrc)
      ser.printf("FCS = %x (Expected) Polynomial: %x  CRC: %x CRC_xor: %x (Calculated)\n", long[@FCS1], mypoly, mycrc, mycrc ^ -1)
    
      mypoly := $edb88320
      mycrc:=-1
      repeat i FROM 0 to 73
        mycrc  := crc32(byte[@eth_frame2][i], mypoly, mycrc)
      ser.printf("FCS = %x (Expected) Polynomial: %x  CRC: %x CRC_xor: %x (Calculated)\n", long[@FCS2], mypoly, mycrc, mycrc ^ -1)
    
      mypoly := $edb88320
      mycrc:=-1
      repeat i FROM 0 to 59
        mycrc  := crc32(byte[@eth_frame3][i], mypoly, mycrc)
      ser.printf("FCS = %x (Expected) Polynomial: %x  CRC: %x CRC_xor: %x (Calculated)\n", long[@FCS3], mypoly, mycrc, mycrc ^ -1)
    
      mypoly := $edb88320
      mycrc:=-1
      repeat i FROM 0 to 59
        mycrc  := crc32(byte[@eth_frame4][i], mypoly, mycrc)
      ser.printf("FCS = %x (Expected) Polynomial: %x  CRC: %x CRC_xor: %x (Calculated)\n", long[@FCS4], mypoly, mycrc, mycrc ^ -1)
    
      repeat
        org
          nop
        end
    
    PUB crc32 (data, polynomial, priorcrc) : newcrc     ' data byte in & crc, polynomial longs in,  crc long out
      newcrc := priorcrc  ' start with prior crc
      org
           rev data ' crcnib uses bits 31 to 28 of Q
           setq data ' setup data in Q
           crcnib newcrc, polynomial
           crcnib newcrc, polynomial
      end
    
    DAT
    
    ' Captured ethernet frame :
    '
    ' ff ff ff ff ff ff 11 11 11 11 11 11 81 00 00 64
    ' 81 00 00 c8 08 06 00 01 08 00 06 04 00 01 11 11
    ' 11 11 11 11 c0 a8 02 01 00 00 00 00 00 00 01 01
    ' 01 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    ' 5B 00 6C 87  
    '
    ' (last four bytes are the FCS/CRC)
    '
    
    eth_frame1 byte  $ff, $ff, $ff, $ff, $ff, $ff, $11, $11, $11, $11, $11, $11, $81, $00, $00, $64
               byte  $81, $00, $00, $c8, $08, $06, $00, $01, $08, $00, $06, $04, $00, $01, $11, $11
               byte  $11, $11, $11, $11, $c0, $a8, $02, $01, $00, $00, $00, $00, $00, $00, $01, $01
               byte  $01, $01, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00
    
    FCS1_long  long
    FCS1       byte  $5B, $00, $6C, $87
    
    ' http://wiki.hevs.ch/uit/index.php5/Standards/Ethernet
    '
    eth_frame2 byte  $F0, $4D, $A2, $33, $B7, $EF, $00, $25, $64, $C2, $6F, $6A, $08, $00, $45, $00
               byte  $00, $3C, $31, $98, $00, $00, $80, $01, $CA, $DA, $99, $6D, $05, $E0, $99, $6D
               byte  $05, $94, $08, $00, $D5, $5B, $04, $00, $74, $00, $61, $62, $63, $64, $65, $66
               byte  $67, $68, $69, $6A, $6B, $6C, $6D, $6E, $6F, $70, $71, $72, $73, $74, $75, $76
               byte  $77, $61, $62, $63, $64, $65, $66, $67, $68, $69
    
    FCS2_long  long
    FCS2       byte  $63, $A7, $EA, $82
    
    ' https://web.archive.org/web/20081023184031/http://www.fpga-faq.com/archives/91050.html#91062
    '
    eth_frame3 byte  $0d, $0d, $0d, $0d, $0d, $0d, $0c, $0c, $0c, $0c, $0c, $0c, $88, $08, $00, $01
               byte  $77, $77, $77, $77, $77, $77, $77, $77, $77, $77, $77, $77, $77, $77, $77, $77
               byte  $77, $77, $77, $77, $77, $77, $77, $77, $77, $77, $77, $77, $77, $77, $77, $77
               byte  $77, $77, $77, $77, $77, $77, $77, $77, $77, $77, $77, $77
    
    FCS3_long  long
    FCS3       byte  $e5, $5c, $0b, $f8
    
    ' http://stackoverflow.com/questions/9286631/ethernet-crc32-calculation-software-vs-algorithmic-result
    '
    eth_frame4 byte  $ff, $ff, $ff, $ff, $ff, $ff, $00, $00, $00, $00, $00, $00, $08, $00, $45, $00
               byte  $00, $1C, $00, $00, $00, $00, $FF, $01, $62, $F9, $AC, $64, $00, $01, $AC, $64
               byte  $00, $1E, $08, $00, $F7, $BC, $00, $42, $00, $01, $00, $00, $00, $00, $00, $00
               byte  $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00, $00
    
    FCS4_long  long
    FCS4       byte  $9E, $89, $24, $C6
    
  • No problem. I want to see Ethernet on P2 as much as you do.

  • SimoniusSimonius Posts: 94
    edited 2021-04-10 22:42

    I would recommend using MII at 25Mhz, 4 bits at a time, both clocks are generated in the PHY ( 2 pins Management + 2 x 6 pins for TX/RX which fits on a double extension board)
    on a positive clock edge the data should be valid.
    when transmitting, sync to the negative edge to make sure data is in place on posedge
    so can the smartpins be used?
    the documentation is not clear here: can a smart pin output data using an external clock ?
    besides, a pin that determines the packet start is also sampled with the clock, another smartpin used? let it count the pulses since RX_DV (packet start) went up and know exact ethernet frame start
    would be nice if a) PLL is not used for ethernet b) only 2-3 cogs used for full duplex ethernet c) calculate CRC on the fly for realtime ethernet
    at 25 Mhz could a simple wait-for-pin / interrupt scheme be used to drive the signals?
    for crc a lookup table could be used to process 8 bit at once

  • roglohrogloh Posts: 5,837
    edited 2021-04-11 07:22

    @Simonius said:
    I would recommend using MII at 25Mhz, 4 bits at a time, both clocks are generated in the PHY ( 2 pins Management + 2 x 6 pins for TX/RX which fits on a double extension board)
    on a positive clock edge the data should be valid.
    when transmitting, sync to the negative edge to make sure data is in place on posedge
    so can the smartpins be used?
    the documentation is not clear here: can a smart pin output data using an external clock ?

    It should be possible based on this information:

    %11100 = synchronous serial transmit
    
    This mode overrides OUT to control the pin output state.
    
    Words of 1 to 32 bits are shifted out on the pin, LSB first, with each new bit being output two internal clock cycles after registering a positive edge on the B input. For negative-edge clocking, the B input may be inverted by setting B[3] in WRPIN's D value.
    

    @Simonius said:
    besides, a pin that determines the packet start is also sampled with the clock, another smartpin used? let it count the pulses since RX_DV (packet start) went up and know exact ethernet frame start
    would be nice if a) PLL is not used for ethernet b) only 2-3 cogs used for full duplex ethernet c) calculate CRC on the fly for realtime ethernet

    Yes, it would be nicer if their clocks could be fully decoupled. For MII it might be feasible. Possibly not for RMII. At least two COGs are a given for full duplex Ethernet. You might be able to hold off the TX transfers if its COG is needed for some MAC functions. Hopefully a 3rd COG, if needed, is just the client COG itself and not an additional one.

    @Simonius said:
    at 25 Mhz could a simple wait-for-pin / interrupt scheme be used to drive the signals?

    I would hope so.

    @Simonius said:
    for crc a lookup table could be used to process 8 bit at once

    It could, but I'm not sure in this particular case whether a table approach is faster than the 4 instructions per byte of the P2 HW approach (REV+SETQ+CRCNIB+CRCNIB). From memory, the table version might be 5 - 6 instructions; it has been a while since I last did this. The RDLUT takes 3 clocks too.

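    #include <cstddef>
    #include <cstdint>

    // Crc32Lookup is the usual precomputed 256-entry table for the reflected
    // polynomial 0xEDB88320; it is assumed to be defined elsewhere.
    extern const uint32_t Crc32Lookup[256];

    uint32_t crc32_1byte(const void* data, size_t length, uint32_t previousCrc32 = 0)
    {
       uint32_t crc = ~previousCrc32;
       const unsigned char* current = static_cast<const unsigned char*>(data);
       while (length--)
         crc = (crc >> 8) ^ Crc32Lookup[(crc & 0xFF) ^ *current++];
       return ~crc;
    }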
    

    Yeah, its inner loop in PASM becomes 11 clocks vs 8, so a table approach isn't worth it in this case.

    getbyte xx, crc, #0
    xor     xx, newdata
    rdlut   xx, xx
    shr     crc, #8
    xor     crc, xx
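
For comparison on a PC, the table approach discussed above can be sketched in Python. This is a hedged illustration only: the table is generated at import time from the reflected polynomial $EDB88320, and the names `make_crc32_table`, `CRC32_TABLE` and `crc32_table` are made up here. The per-byte step is the same as in the C++ snippet.

```python
import zlib


def make_crc32_table(poly=0xEDB88320):
    """Build the 256-entry lookup table for the reflected CRC-32 polynomial."""
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ (poly if c & 1 else 0)
        table.append(c)
    return table


CRC32_TABLE = make_crc32_table()


def crc32_table(data, previous=0):
    """Byte-at-a-time CRC-32; pass the previous result to continue a running CRC."""
    crc = previous ^ 0xFFFFFFFF
    for byte in data:
        crc = (crc >> 8) ^ CRC32_TABLE[(crc & 0xFF) ^ byte]
    return crc ^ 0xFFFFFFFF


message = b"123456789"
print(hex(crc32_table(message)), crc32_table(message) == zlib.crc32(message))
```

On the P2 the table would occupy 256 longs of LUT RAM, and by the clock counts above it still loses to two CRCNIBs per byte.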
    
  • The data must be collected in 32 bit words as early as possible... using 4 smart pins @ 25 MHz, MII // or 2 smart pins @ 50 MHz, RMII
    i suspect that using 2 pins is more efficient than dealing with 4 pins
    the overhead of starting/stopping the 2 vs 4 smart pins and gathering/interleaving their data differs because in one case we can clock in 16 bits (x2) in a row and in the other only 8 bits (x4)
    After it's inside an 32 bit register we are dealing with mere 3,125,000 words/second... suddenly it sounds quite doable
    doing a 32 bit crc should be REV+SET?+8 times CRCNIB ... 10 instructions
    i think about how we can employ the PIN COMPARE interrupt to start the smart pins once the "start frame delimiter" is received (i.e. the first "11" bits on the line that come together with the RX_CLOCK high, which comes right after the 010101...preamble)
    once the header of the packet is received, the packet length is known and the smart pins can be stopped at the exact frame end
    after running through the CRC the data could be sent off into the HUB memory or be kept inside the cog to process it there while switching to another cog to receive the next frame
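
The start-of-frame idea above can be tried out in software before committing it to smart pins. A minimal Python sketch, assuming the standard MII nibble order (low nibble of each byte first) and the usual 7 x $55 preamble followed by the $D5 start frame delimiter, so the frame begins right after the first $D nibble that follows a $5 nibble. All function names are made up for the illustration.

```python
def bytes_to_mii_nibbles(data):
    """Split bytes into 4-bit MII symbols, low nibble first."""
    nibbles = []
    for byte in data:
        nibbles += [byte & 0x0F, byte >> 4]
    return nibbles


def find_frame_start(nibbles):
    """Index of the first frame nibble after the SFD, or None if no SFD is seen."""
    for i in range(1, len(nibbles)):
        if nibbles[i - 1] == 0x5 and nibbles[i] == 0xD:
            return i + 1
    return None


def mii_nibbles_to_bytes(nibbles):
    """Reassemble bytes from low-nibble-first symbols (a trailing odd nibble is dropped)."""
    return bytes(lo | (hi << 4) for lo, hi in zip(nibbles[0::2], nibbles[1::2]))


preamble_and_sfd = bytes([0x55] * 7 + [0xD5])
payload = bytes([0xFF, 0x11, 0x81, 0x00])

line = bytes_to_mii_nibbles(preamble_and_sfd + payload)
start = find_frame_start(line)
recovered = mii_nibbles_to_bytes(line[start:])
print(start, recovered.hex(" "))
```

Searching for the $5 to $D transition rather than counting preamble nibbles means the detection still works if the receiver misses the first few preamble nibbles.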

  • roglohrogloh Posts: 5,837
    edited 2021-04-14 03:14

    @Simonius said:
    The data must be collected in 32 bit words as early as possible... using 4 smart pins @ 25 MHz, MII // or 2 smart pins @ 50 MHz, RMII
    i suspect that using 2 pins is more efficient than dealing with 4 pins

    It might be, just needs to be coded to see which is easier.

    @Simonius said:
    the overhead of starting/stopping the 2 vs 4 smart pins and gathering/interleaving their data differs because in one case we can clock in 16 bits (x2) in a row and in the other only 8 bits (x4)
    After it's inside an 32 bit register we are dealing with mere 3,125,000 words/second... suddenly it sounds quite doable
    doing a 32 bit crc should be REV+SET?+8 times CRCNIB ... 10 instructions
    i think about how we can employ the PIN COMPARE interrupt to start the smart pins once the "start frame delimiter" is received (i.e. the first "11" bits on the line that come together with the RX_CLOCK high, which comes right after the 010101...preamble)

    Yes it is critical to get this part right to find the start of the frame. I think it is potentially the hardest part of the whole thing and needs to work or nothing will work. If we can't delimit at wire speed then we'll have a problem keeping up with the data.

    @Simonius said:
    once the header of the packet is received, the packet length is known and the smart pins can be stopped at the exact frame end

    Only if you interpret what is in the packet, like the IP header contents (and you don't want to do that). Otherwise you probably don't really know the length until the packet ends or it exceeds some upper limit. This might mean the CRC code needs to keep a pipeline of the previous 4 values and select the one from four bytes ago for comparison only after the packet ends.

    @Simonius said:
    after running through the CRC the data could be sent off into the HUB memory or be kept inside the cog to process it there while switching to another cog to receive the next frame

    I'd hope it would be possible to do the CRC and use the FIFO to write the Ethernet data bits to HUB RAM at the same time. This way there is less dead time and the COG would be ready to receive the next frame sooner, ie. right after the 12 byte IPG.
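
The "pipeline of previous 4 values" idea above can be sketched in Python. This is only an illustration under stated assumptions: `zlib.crc32` stands in for the running hardware CRC, the receiver is assumed not to know the frame length in advance, and the function name `fcs_ok_streaming` is made up. The CRC after every byte is kept in a 5-entry history, so when the frame ends the oldest entry is the CRC of everything except the 4 trailing FCS bytes.

```python
import zlib
from collections import deque


def fcs_ok_streaming(received):
    """Check the FCS of a frame seen one byte at a time, with length unknown up front."""
    crc = 0
    crc_history = deque([crc], maxlen=5)   # CRC after n-4, n-3, n-2, n-1 and n bytes
    tail = deque(maxlen=4)                 # the last four bytes seen (candidate FCS)
    for byte in received:
        crc = zlib.crc32(bytes([byte]), crc)
        crc_history.append(crc)
        tail.append(byte)
    if len(tail) < 4:
        return False                       # too short to contain an FCS
    # The FCS is sent least significant byte first.
    return crc_history[0] == int.from_bytes(bytes(tail), "little")


payload = bytes(range(60))
good = payload + zlib.crc32(payload).to_bytes(4, "little")
bad = bytes([good[0] ^ 0x01]) + good[1:]
print(fcs_ok_streaming(good), fcs_ok_streaming(bad))
```

An alternative that avoids the history is to run the CRC over the FCS bytes as well and compare against the fixed CRC-32 residue, at the cost of only knowing the verdict after the last byte has been processed.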

  • Cluso99Cluso99 Posts: 18,069

    There is a possibility of using a cog pair with shared LUT. This is precisely the situation I had in mind for the shared LUT.

  • In what way do you think it should be shared in this application? To have an RX coordinate with TX COG somehow, or for something else? I know for full duplex operation we'd want independent RX and TX COGs because RX packets could arrive at any time while sending packets.

  • RamonRamon Posts: 484
    edited 2021-04-21 13:57

    I am playing with a saleae-like analyzer (using Sigrok Pulseview).

    I have found that Pulseview has a decoder for MDIO. Here is the source code:

    https://sigrok.org/gitweb/?p=libsigrokdecode.git;a=tree;f=decoders/mdio

    I was unsure whether my code was working correctly or not (it matched the TI MSP430 USB-2-MDIO tool, but sometimes the output is not stable).

    I found a good register (register 0x17, 23 decimal) in the DP83848 to experiment with. This register has some bits that are RW and some others that are read-only and always return 0.

    So I tested writing to and reading from this register. For this test, I implemented a few new menu options to select the register and do writes (this is version 0.4).

    I found that this decoder (in Pulseview) is not working correctly for reads (it is impossible to read 67h from register 17h because of the read-only bits).

    My code reads and writes correctly most of the time (so there could still be some bugs that I was not able to find). I wonder if it could be something related to the turnaround bit timing. I will upload this new code v0.4 to the first page (first post).
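
Since the turnaround bits are under suspicion, it may help to have the expected bit pattern written down. A minimal Python sketch of the IEEE 802.3 Clause 22 management frame layout, sent MSB first after a preamble of 32 ones: start (01), opcode (01 = write, 10 = read), 5-bit PHY address, 5-bit register address, 2 turnaround bits, 16 data bits. The function names are made up for the illustration. On a write the host drives the turnaround as 10; on a read the host releases MDIO after the register address, and the PHY drives a 0 in the second turnaround bit before its 16 data bits.

```python
def mdio_write_frame(phy, reg, data):
    """32-bit Clause 22 write frame (shifted out MSB first, after the preamble)."""
    if not (0 <= phy < 32 and 0 <= reg < 32 and 0 <= data < 0x10000):
        raise ValueError("phy/reg are 5-bit fields, data is 16 bits")
    start, opcode, turnaround = 0b01, 0b01, 0b10
    return (start << 30) | (opcode << 28) | (phy << 23) | (reg << 18) | (turnaround << 16) | data


def mdio_read_header(phy, reg):
    """First 14 bits of a read frame; the host must release MDIO after these."""
    if not (0 <= phy < 32 and 0 <= reg < 32):
        raise ValueError("phy/reg are 5-bit fields")
    start, opcode = 0b01, 0b10
    return (start << 12) | (opcode << 10) | (phy << 5) | reg


# The RESET option from the earlier post: PHY 1, register 0, bit 15 set.
reset_frame = mdio_write_frame(phy=1, reg=0, data=0x8000)
print(f"{reset_frame:032b}")

# Header for reading register 0x17 on PHY 1.
print(f"{mdio_read_header(phy=1, reg=0x17):014b}")
```

Comparing these bit strings against the analyzer capture, clock by clock, should show whether the host is still driving the line during the read turnaround.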
