Clarification needed on wxpin behaviour with running (a)sync serial RX

M1k3y · 2024-10-09 11:10

I'm currently developing my own RMII driver for the P2 and recently found the code from @ManAtWork on the OBEX: https://obex.parallax.com/obex/ethernet-rmii-driver/

There I discovered a pretty genius way of handling the async DV (data valid) signal on the RMII receiver. To synchronize the receiver with the bit-stream, the word length in the X register gets written while the pin is active. This does not conform to the documentation. The question that arises from this for the serial modes (mainly RX, but might also concern TX) is:

What exactly happens in the smart pin when the X register is written while it is an active serial receiver? Is there a time frame or delay before the new configuration is applied? What happens to the bits that were received before the wxpin instruction? Does a wxpin instruction effectively reset the serial module, or does it just change its "data received" trigger condition?

Another related question: what happens to the previous data when the pin is configured for a message length of less than 32 bits? I assume the Z register gets fully overwritten by the pin's internal shift register. But what about previous data? Does the internal shift register retain the previous bits, or does it get cleared when the word length is reached and the data is copied to the Z register? The documentation (the requirement to mask MSB-first words shorter than 32 bits) leads me to assume that previous bits are retained.

Context/Primer for RMII RX (here for 100 Mbit/s Ethernet):
RMII RX works a lot like SPI. There is a 50 MHz clock (always active), two data pins (RX0 and RX1, one for even bits, one for odd bits) and the DV/CRS signal (data valid / carrier sense). DV/CRS works like a chip select, but goes high at a random, unknown time before the actual data frame begins. The beginning of the Ethernet frame must be detected from the preamble and start frame delimiter, which is a known sequence of 8 bytes (7 times 0x55 followed by 0xD5). On RMII this means that RX0 is high for all 32 clock cycles of these 8 bytes, while RX1 stays low and only goes high on the very last clock.

evanh · 2024-10-09 12:45

I haven't tried to work out exactly what happens but, for sure, the bits get mangled momentarily. I had a shot at using this trick for handling the start-bit framing of the SD interface mode of SD cards. It didn't work because the bits all have to be accounted for from the moment the start-bit arrives. Whereas with Ethernet, it doesn't matter if some of the preamble bits are lost. You only have to predict their termination.

ManAtWork · 2024-10-09 13:30

@M1k3y said:
There I discovered a pretty genius way of handling the async DV (data valid) signal on the RMII receiver. To synchronize the receiver with the bit-stream, the word length in the X register gets written while the pin is active. This does not conform to the documentation.

Why? The docs neither allow nor forbid writing the X register while the pin is active. The documentation is actually very sparse; you have to guess or try out how the hardware is built and how it behaves. As I understand it, the shift register size does not change; it's always 32 bits. All the X register does is influence when the Z register gets updated and IN (ready/buffer-full flag) is raised. WXPIN itself doesn't trigger any action.

The question that arises from this for the serial modes (mainly RX, but might also concern TX) is:

What exactly happens in the smart pin when the X register is written while it is an active serial receiver? Is there a time frame or delay before the new configuration is applied? What happens to the bits that were received before the wxpin instruction? Does a wxpin instruction effectively reset the serial module, or does it just change its "data received" trigger condition?

The exact timing doesn't matter as long as X is written a few clocks before the trigger should happen. For example, if the number of bits is changed from 16 to 18 and X is written before bit #10 is received, I assume it's safe.

Another related question: what happens to the previous data when the pin is configured for a message length of less than 32 bits? I assume the Z register gets fully overwritten by the pin's internal shift register. But what about previous data? Does the internal shift register retain the previous bits, or does it get cleared when the word length is reached and the data is copied to the Z register? The documentation (the requirement to mask MSB-first words shorter than 32 bits) leads me to assume that previous bits are retained.

The shift register is never changed or cleared by anything other than the shift clock. I think it even retains older data from before the last smart pin reset (DIR low); only the shift clock is stopped.

ManAtWork · 2024-10-09 13:35

@evanh said:
I haven't tried to work out exactly what happens but, for sure, the bits get mangled momentarily. I had a shot at using this trick for handling the start-bit framing of the SD interface mode of SD cards. It didn't work because the bits all have to be accounted for from the moment the start-bit arrives. Whereas with Ethernet, it doesn't matter if some of the preamble bits are lost. You only have to predict their termination.

I think timing is quite critical with the SD card interface. The Ethernet preamble is long, so you have much more time to react.

BTW, @M1k3y, what is the motivation for developing a new RMII driver? Do you need some functionality my driver doesn't offer? I ask because some forum members asked for an IP stack for my driver, but unfortunately I never had the time (nor the requirement) to implement it. If you plan to do IP communication, we could probably merge our efforts.

M1k3y · 2024-10-09 13:38

@ManAtWork said:

Why? The docs neither allow nor forbid writing the X register while the pin is active. The documentation is actually very sparse; you have to guess or try out how the hardware is built and how it behaves. As I understand it, the shift register size does not change; it's always 32 bits. All the X register does is influence when the Z register gets updated and IN (ready/buffer-full flag) is raised. WXPIN itself doesn't trigger any action.

Hm. My reading of the documentation was that (in the case of the serial smart modes) the X register is part of the configuration, and smart mode configuration should be done while the pin is disabled (DIR low).

What exactly happens in the smart pin when the X register is written while it is an active serial receiver? Is there a time frame or delay before the new configuration is applied? What happens to the bits that were received before the wxpin instruction? Does a wxpin instruction effectively reset the serial module, or does it just change its "data received" trigger condition?

The exact timing doesn't matter as long as X is written a few clocks before the trigger should happen. For example, if the number of bits is changed from 16 to 18 and X is written before bit #10 is received, I assume it's safe.

I guess I'll run some tests to verify this behavior.

M1k3y · 2024-10-09 13:48

@ManAtWork said:

BTW, @M1k3y, what is the motivation for developing a new RMII driver? Do you need some functionality my driver doesn't offer? I ask because some forum members asked for an IP stack for my driver, but unfortunately I never had the time (nor the requirement) to implement it. If you plan to do IP communication, we could probably merge our efforts.

Mainly the fact that I started working on it before looking at the OBEX. It's also a good opportunity for me to get a better understanding of how the P2 works (this is my first project with it, and Ethernet is both a critical and the most demanding requirement). I will add UDP/IP to it, but am currently not planning anything beyond receiving and sending UDP messages and ARP. So no DHCP, TCP or other functionality is planned.

I'll be happy to share my work once it is actually done (so far I am receiving and decoding the Ethernet and IP headers), but it might have limited portability. I started in C with the p2llvm project and am currently migrating it to Zig (thanks to a good friend of mine who just added the P2 as a target to the language) and testing various C compilers for the P2. Zig does not (yet) support the P2 at the hardware level; it compiles down to C, which is then built by another compiler.

evanh · 2024-10-09 13:52

With smartpins in general, the X, Y and Z registers stay intact when mode adjustments are made, even when switching to completely different modes. Not surprisingly, without the reset step, this instant switchover will corrupt the state machine. Weird side effects do occur, including a lockup (crash) of the smartpin.

So, in line with that, the docs state that a DIRL/DIRH reset sequence is required as part of changing modes. It is just basic hardware; it doesn't stop you from misconfiguring or abusing it.

M1k3y · 2024-10-09 14:00

@evanh With the possibility of undefined/unexpected behaviour in mind, I might have an idea for a different solution that could achieve the same result without reconfiguring the pin mid transmission. As you seem to have a lot of knowledge around the P2 I have another question I hope you could provide some insights to.

My idea revolves around using the neighbour pin and the input logic to AND the clock with the RX0 line. As RX0 goes high with the first real bit and stays high for the entire preamble, this would effectively disable the clock until the frame begins, and reconfiguring it mid-transmission should not cause any side effects. However, I'm unclear on where exactly a pin takes its neighbour's pin state from.

For the sync serial modes, the (data) pin takes its clock from the B input. Does this refer to the actual electrical state at the pad, or to the "IN" state of the respective pin after the input logic is applied?

evanh · 2024-10-09 14:24

Yeah, the labels A and B aren't clear at all. I've started using the labels smartA and smartB instead, because there is also a separate pinA and pinB (odd/even pins) at each pin of the low-level pad ring.

So, smartA and smartB sit between the eight-input pin selector/logic/debounce block and the smartpin block. SmartA and smartB each have a 1-of-8 selector and debounce filter, but only smartA has the logic function. To mask the clock that way, the clock would have to be on smartA and be ANDed with data on smartB. But then the data would still be clocking the smartpin.

evanh · 2024-10-09 14:35

The logic function really only works when smartB is not used.

M1k3y · 2024-10-09 14:37

Ok, I have some trouble following you.

I'm aware of the even/odd pin neighbour signals. I think they are referred to as "OTHER" in the documentation. I'm not referring to those.

If I understand you correctly, there is a different behaviour for the input selection (fields A, B and F of the wrpin configuration) for "dumb" and "smart" modes?

What I had in mind:
Pin0 - DV ("chip select")
Pin1 - CLK
Pin2 - RX0
Pin3 - RX1

RX0 and RX1 are configured as sync serial RX with configuration field B referencing the CLK Pin.
The CLK Pin gets configured to use itself as input A and RX0 as input B.
While idle, the CLK pin has its input logic set to A AND B.
Once a frame has started and the first bytes are received, reconfigure the CLK input B source to the DV Pin until the frame has finished.

That way the clock is gated until the actual frame starts.

evanh · 2024-10-09 14:46

"Other" is an output feature. The block diagram shows both pinA/B and smartA/B - https://forums.parallax.com/discussion/download/137264/Slide1.PNG

M1k3y · 2024-10-09 14:53

Oh, now I get it. So my idea would not actually work, as the input selection and filtering operate on the actual state at the pin's front end.

However it might be possible to abuse the comparator to achieve the filtering I was looking for. I'll have to think about this a bit.

evanh · 2024-10-09 14:55

[you got it before I posted]

ManAtWork · 2024-10-09 15:21

@M1k3y said:
My idea revolves around using the neighbour pin and the input logic to AND the clock with the RX0 line. As RX0 goes high with the first real bit and stays high for the entire preamble, this would effectively disable the clock until the frame begins, and reconfiguring it mid-transmission should not cause any side effects. However, I'm unclear on where exactly a pin takes its neighbour's pin state from.

If I remember correctly, the state of the RXD pins before the preamble starts is undefined. Also the exact timing of the CRS_DV signal is undefined. So there is no way of detecting the start of the preamble. All you can do is to detect its end. I've made a lot of experiments to get this right.

' CRS_DV goes high asynchronously when the J/K delimiter symbols are recognized (before the preamble)
' The first received longword looks something like $55555500 or $55555400 (LSB = received first).
' So this is a variable number of '0' bits followed by the preamble ($55 means RXD0=1 and RXD1=0).
' First, we read a fixed number of 32 bits = 16 bit pairs.

              waitse2                           ' wait for first 16 bit pairs
              rdpin   rxd0,pinRxd0              ' MSW 16 bits, last received in bit 31
              rev     rxd0                      ' -> LSW, first bit received in bit 15
              getword rxd0,rxd0,#0              ' clear MSW
              encod   rxd1,rxd0                 ' find position of first '1'
              subr    rxd1,#30                  ' 30 - pos = 15 + number of '0' bits (X = bit count - 1)
              wxpin   rxd1,pinRxdAll3           ' length = 16 data bit pairs + extra bits

' To synchronize the packet data we now need to shift more than 16 bit pairs in to get the full
' preamble + SFD. The bits of the last 16 bit pairs remain in the lower bits of the shift registers.
' The extra bits are at bit #15 down and push the leading '0' bits out at the right. The fresh
' bits are in the MSW and are now the synchronized last 32 bits of the preamble + SFD = $D5_555555.

M1k3y · 2024-10-09 15:58

@ManAtWork said:

@M1k3y said:
My idea revolves around using the neighbour pin and the input logic to AND the clock with the RX0 line. As RX0 goes high with the first real bit and stays high for the entire preamble, this would effectively disable the clock until the frame begins, and reconfiguring it mid-transmission should not cause any side effects. However, I'm unclear on where exactly a pin takes its neighbour's pin state from.

If I remember correctly, the state of the RXD pins before the preamble starts is undefined. Also the exact timing of the CRS_DV signal is undefined. So there is no way of detecting the start of the preamble. All you can do is to detect its end. I've made a lot of experiments to get this right.

I just checked the RMII specification. It is a bit convoluted, as CRS_DV effectively carries two different signals, but it is quite clear about the behavior of the RX signals.

"[..] The data on RXD[1:0] is considered valid once CRS_DV is asserted. However, since the assertion of CRS_DV is asynchronous relative to REF_CLK, the data on RXD[1:0] shall be “00” until proper receive signal decoding takes place (see definition of RXD[1:0] behavior). [...] Upon assertion of CRS_DV, the PHY shall ensure that RXD[1:0]=“00” until proper receive decoding takes place [...]" (See section 5.2 and 5.3 http://ebook.pldworld.com/_eBook/-Telecommunications,Networks-/TCPIP/RMII/rmii_rev12.pdf ).

To be clear, I'm not criticizing your implementation. I think the method you used is pretty genius (my current implementation just searches for the preamble, calculates the offset and shifts all following data to match the offset, which is currently not working in real-time as I'm struggling a bit with the p2llvm compiler not recognizing some instructions in the inline asm).

The only part I'm "not liking" is the possible undefined behavior that might arise from changing the smart pin configuration in a way that is not specifically intended. But this comes purely from the requirements I have set for this personal project.
For some context: this project of mine uses the P2 as a data interface to connect a large number of microcontrollers to Ethernet. There will be over a dozen P2s in the system, each handling between 2 and 8 Ethernet frames every millisecond. The whole project requires both high reliability and very low latency in its data processing. That is why I'm currently looking for a solution that stays fully within the intended procedures.

Simonius · 2024-10-30 07:02

We had that discussion before. I'm not sure it can be done any other way; I dare say RMII almost didn't happen, and it took me a lot of effort to make the loop fast enough to actually catch the frame start. It would have been nice to use the same technique (changing the FIFO size mid-flight) for the TX cog, but the way ManAtWork did it is fully functional, so that chapter is closed for me.

Also note that some PHYs toggle the DV line because they mux two signals onto it, and the code in its current form does not account for that (since we use a chip that doesn't do muxing).

The old thread: https://forums.parallax.com/discussion/174351/rmii-ethernet-interface-driver-software/p1

Simonius · 2024-10-30 07:11

I have an idea where the packet inspector (the ARP/IP/UDP/whatnot handler) runs first and repeatedly calls the RMII receive routine (CALL opcode or similar), then follows the flow of the packet based on its headers and so forth.
The receive procedure works the same way it does now, but I will add a _RET after each cycle that hands control back to the packet inspector. That way we would lose almost no time for the protocol stack on the RX side.
If I had more time I would have done it already, but for now it's just a suggestion.

evanh · 2024-10-30 08:05

There is a task slicing technique that Chip promoted with the Prop1 that allows progressive stepping through multiple mini tasks in cogRAM. The basis of it was the JMPRET instruction that exchanged jump addresses at designated points.

The Prop2 has the more familiar link register type instruction, CALLD, to do the same. It's how the Prop2 handles ISR branching too. In fact there is a specific alias, RESIx, just for slicing mini tasks inside an ISR.
