Bit-Banged Ethernet

hippy · 2007-10-11 13:15

Is it possible to bit-bang Ethernet UDP ( transmission only ) with a Propeller Chip ?

It would need banging at 20MHz so usual bit-banging techniques are out of the question, but could the video hardware / WAITVID be utilised in some way to read a pre-built bit-stream 'packet' from Hub and churn them out ?

Obviously an ENC28J60 or similar solution would be much more useful but just UDP on a one-to-one link isn't without its uses.

crgwbr · 2007-10-11 13:16

This should answer your question

hippy · 2007-10-11 13:39

@ crgwbr : Thanks for the link, but "not really". Something without needing a NIC ( without magnetics would be nice too ) is what I'm after, everything done on the Propeller Chip alone except for any external hardware needed to produce the bi-phase signal.

Bergamot · 2007-10-11 13:47

As I understand it, you have to implement not only UDP (pretty simple) but also a large chunk of IPv4:

en.wikipedia.org/wiki/IPv4#Packet_structure

Though, I know someone on here had a bare-bones TCP/IP implementation that you could probably use parts of.

hippy · 2007-10-11 15:05

@ Bergamot : Embedding the UDP packet within IP isn't really too much of a problem. As it's transmit only, the source and destination IP Addresses and Ports would have to be pre-defined ( or updated by some non-TCP/IP mechanism ). Calculating the header checksum isn't hard and may only need to be done once, or when something changes, if fixed size UDP packets are used. That should be easy to do in Spin. On top of that it's necessary to add the Ethernet packet preamble, framing and checksum, but once done, you have a simple bit-array which needs to be churned out.
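
A minimal sketch of the IPv4 header checksum mentioned above, written in Python rather than Spin purely for illustration. The function names, the fixed 20-byte header with no options, and the default TTL are assumptions made for the example, not anything taken from the thread's code.

```python
import struct


def ipv4_header_checksum(header: bytes) -> int:
    """One's-complement sum of the header's 16-bit big-endian words.

    The checksum field must be zero when calculating a new checksum.
    Running this over a header that already holds a valid checksum gives 0.
    """
    if len(header) % 2:
        raise ValueError("header length must be a multiple of 2 bytes")
    total = 0
    for (word,) in struct.iter_unpack("!H", header):
        total += word
    # Fold the carries back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_ipv4_udp_header(src, dst, udp_len, ident=0, ttl=64) -> bytes:
    """Build a 20-byte IPv4 header (no options) carrying udp_len bytes of UDP.

    src and dst are 4-element sequences of ints; protocol 17 is UDP.
    """
    header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45,            # version 4, header length 5 x 32-bit words
        0,               # type of service
        20 + udp_len,    # total length
        ident,           # identification
        0,               # flags / fragment offset
        ttl,
        17,              # protocol: UDP
        0,               # checksum placeholder
        bytes(src),
        bytes(dst),
    )
    checksum = ipv4_header_checksum(header)
    return header[:10] + struct.pack("!H", checksum) + header[12:]
```

With fixed addresses and a fixed packet size, the result never changes, which is why it only needs to be computed once.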

I don't see a problem in building up the bit-stream to send, just in getting them sent out fast enough. It can be done with AVR and any micro / FPGA fast enough ...

http://www.cesko.host.sk/IgorPlugUDP/IgorPlug-UDP (AVR)_eng.htm
http://www.fpga4fun.com/10BASE-T2.html

I did think of doing it with a Cog sequence of just "MOV OUTA,#%10 / MOV OUTA,#%01" but that only gives 494 bits. The FPGA implementation shows 55 bytes for a single byte of UDP data but with Manchester encoding that's 880 bits.
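
The 55-byte and 880-bit figures can be checked with a few lines of arithmetic. This is only a sketch: the constant names are made up for the example, the 494 figure is simply the one quoted above, and no padding to the 802.3 minimum frame length is modelled.

```python
# Field sizes in bytes for a UDP datagram as it appears on the wire.
PREAMBLE_AND_SFD = 8   # 7 preamble bytes plus the start-of-frame delimiter
ETHERNET_HEADER = 14   # destination MAC, source MAC, EtherType
IPV4_HEADER = 20       # no options
UDP_HEADER = 8
FCS = 4                # Ethernet CRC-32

COG_INSTRUCTION_SLOTS = 494  # figure quoted in the post above


def wire_bytes(payload_len: int) -> int:
    """Total bytes sent for a UDP payload of payload_len bytes."""
    return (PREAMBLE_AND_SFD + ETHERNET_HEADER + IPV4_HEADER
            + UDP_HEADER + payload_len + FCS)


def manchester_half_bits(payload_len: int) -> int:
    """Each data bit becomes two half-bit cells after Manchester encoding."""
    return wire_bytes(payload_len) * 8 * 2


def fits_in_cog(payload_len: int) -> bool:
    """True if one MOV per half-bit cell would fit in the quoted slot count."""
    return manchester_half_bits(payload_len) <= COG_INSTRUCTION_SLOTS
```

For a one-byte payload, `wire_bytes(1)` is 55 and `manchester_half_bits(1)` is 880, so a straight run of MOV instructions does not fit.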

Getting NLP/FLP pulses using "MOV OUTA,#" is probably the first place to test anything.

rokicki · 2007-10-11 17:03

One trick I use here a lot is to use PHSA with the data to send, FRQA set to 0, and then just ROL PHSA for each bit.
To start a new 32-bit chunk, you can just move from some other register. So if you can afford one instruction
per bit, that will get you 20MHz for 400+ bits or about 50 bytes.

If you're careful you can use two cogs to alternate sending a 32-bit chunk; that will give each cog plenty of time to
load the next word. You just need to time things carefully.

Since you are doing Manchester encoding, each bit will transition, so you could perhaps set FRQA to 0x8000_0000
to get the in-bit transitions, and then use the ROL PHSA to get the actual data transitions, and then you have
every other instruction to do other things like load up the next data value to ship.

Ariba · 2007-10-11 18:50

I think the use of the Video Shifter is the easiest approach. I don't have much experience with the Video Shifter, but it should be possible to use the VGA 1-bit (2 color) mode. The Shift Frequency must be set to 20 MHz and the ColorRegister contains two colors with the bit patterns %xxxxxx10 and %xxxxxx01 for the inverted and non-inverted Manchester output. The Mask must be set to output only 2 bits of the Color-Bytes.
With that, the shifter outputs %10 for every 0-Bit in a long, or %01 for every 1-Bit in a long, that is written to the shifter with WAITVID.
The pre-built bit stream can stay in HubRAM and must be read by the cog at a rate of 20 MHz / 32 = 625 kHz, this is no problem !
In this Bit-Stream you have 2 Bits for every Data-Bit to send (one for every state of the Manchester coding). So for 55 Byte packets you need 55 words in HubMemory (28 longs).
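
The expansion of each byte into a 16-bit word can be sketched as below. Several things here are assumptions rather than facts from the thread: that the shifter sends the least significant bit of each long first, that Ethernet sends each byte least significant bit first, and the IEEE 802.3 polarity in which a 1 is low-then-high and a 0 is high-then-low. The function names and the zero padding of an odd final word are placeholders.

```python
def manchester_word(byte: int) -> int:
    """Expand one data byte into 16 half-bit cells, earliest cell in bit 0."""
    word = 0
    for i in range(8):                      # data bit 0 goes out first
        bit = (byte >> i) & 1
        # Assumed 802.3 polarity: 1 = low then high, 0 = high then low.
        first_half, second_half = (0, 1) if bit else (1, 0)
        word |= first_half << (2 * i)
        word |= second_half << (2 * i + 1)
    return word


def pack_longs(frame: bytes) -> list[int]:
    """Pack encoded words two per 32-bit long, first word in the low half."""
    words = [manchester_word(b) for b in frame]
    if len(words) % 2:
        words.append(0)                     # placeholder padding for the last long
    return [words[i] | (words[i + 1] << 16) for i in range(0, len(words), 2)]
```

A 55-byte packet comes out as 28 longs, and a preamble byte of $55 becomes $6666, which is a 5 MHz square wave when shifted out at 20 MHz.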

Andy

hippy · 2007-10-11 23:11

@ Ariba : That was my line of thinking; with the Video Shifter handling the bits out that gives the time necessary to get the next long from the 'bit array' already built in Hub. An additional advantage is that the Cog will be checking for when the array is ready to send and can send the NLPs every 16 ms as required to fill in when it has sent nothing.

Unfortunately I'm not very clear on using the Video Shifter and don't have a scope so cannot just throw things at it and see what happens; it's going to have to be done 'theoretically' and by trial and error.

Is there any decent, detailed explanation as to how the Video Shifter, associated registers and commands work ? I know there's the Hydra book but I don't have that, and I don't really want to buy third-party books to get what other manufacturers would include in their datasheets.

Ariba · 2007-10-12 00:31

The Datasheet has the necessary information in Chapter 4.9:
www.parallax.com/dl/docs/prod/prop/PropellerDSv0.3.pdf

What I would try:
Set CntrA in PLL Mode to 20 MHz
VCFG Register:
- VideoMode to %01 (VGA)
- VGroup to the desired PinGroup
- VPins = %00000011 to enable only the lowest 2 Pins of the Group
VSCL Register:
- PixelClocks = 1
- FrameClocks = 32
WAITVID:
- The ColorPart (destination) fixed to %xxxxxxxx_xxxxxxxx_xxxxxx10_xxxxxx01
- the PixelPart (source) holds the Data that is shifted out: a 0 Bit outputs 'Color 0' %xxxxxx01, a 1 Bit 'Color 1' %xxxxxx10
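
The settings listed above can be turned into register values with a little bit-packing. This is a sketch only: the field positions (VMode in bits 30..29, CMode in bit 28, VGroup in bits 11..9, VPins in bits 7..0, PixelClocks in bits 19..12, FrameClocks in bits 11..0) are as I read the datasheet and should be checked against it, and the helper names are invented for the example.

```python
VMODE_VGA = 0b01     # video mode from the list above
CMODE_TWO_COLOR = 0  # assumed: 0 selects the two-colour (1 bit per pixel) mode


def vcfg(vmode: int, vgroup: int, vpins: int, cmode: int = CMODE_TWO_COLOR) -> int:
    """Pack the VCFG fields used here; all other fields are left at zero."""
    if not (0 <= vmode <= 3 and cmode in (0, 1)
            and 0 <= vgroup <= 7 and 0 <= vpins <= 0xFF):
        raise ValueError("field out of range")
    return (vmode << 29) | (cmode << 28) | (vgroup << 9) | vpins


def vscl(pixel_clocks: int, frame_clocks: int) -> int:
    """Pack VSCL: PLL clocks per pixel and PLL clocks per WAITVID frame."""
    if not (0 <= pixel_clocks <= 0xFF and 0 <= frame_clocks <= 0xFFF):
        raise ValueError("field out of range")
    return (pixel_clocks << 12) | frame_clocks


def colors(color0: int, color1: int) -> int:
    """Colour long for two-colour mode: colour 0 in byte 0, colour 1 in byte 1."""
    return ((color1 & 0xFF) << 8) | (color0 & 0xFF)
```

For example, `vcfg(VMODE_VGA, 2, 0b11)` selects the lowest two pins of pin group 2, `vscl(1, 32)` shifts one pixel per PLL clock with 32 pixels per WAITVID, and `colors(0b01, 0b10)` gives the complementary TX+/TX- pair.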

The harder Part is to generate the pre-built Bitstring

Andy

hippy · 2007-10-12 02:47

@ Andy : Thanks for the pointer ( D'Oh ! Why didn't I look in the actual datasheet ! ) and the accompanying notes. I should be able to tie the two together and start to make sense of it. I guess it's not unlike hardware driven SPI, but multiple outputs in parallel, no clock output.

"The harder Part is to generate the pre-built Bitstring" -- I'm glad you say that, because I think that's the easy part ( famous last words ! ), so it looks like this could be quite straightforward in total.

Lawson · 2007-10-12 17:53

hippy said...
don't have a scope

Do you have a spare Propeller board? Viewport might provide a solution for you then. It's not an oscilloscope, but it is able to do logic analysis at high speeds.

hope this helps,
Marty

deSilva · 2007-10-12 17:56

DeSilva has a COMPLETE chapter about this in his tutorial, exactly explaining how to use the video logic for this kind of purpose... but no-one reads it .. sigh..

deSilva · 2007-10-12 18:34

rokicki said...
One trick I use here a lot is to use PHSA with the data to send, FRQA set to 0, and then just ROL PHSA for each bit.

This is a fine application, I think mostly unknown, as the use of the counters is not very popular - or so it seems.

When I was looking for fast input some time ago I - obviously - tried counter mode 50, FRQA set to 1. This would add to PHSA every quarter of an instruction, so shifting PHSA 3 places should lead to a 20 MHz raw-input. This was when I discovered the "shadow registers"

So I finally came up with the conservative

TEST theBit, INA WC
RCL value,#1

@ 10 MHz

hippy · 2007-10-13 03:44

Spin code written, debugged, tested and seems to be producing what I expect, so now have an array of longs full of bits to stream out and a wrapper object which makes it extremely easy to use.

Now to get the Assembler Cog doing something useful with the 'bit array' it gets given. Then on to working out how to do the hardware interfacing. Moving out of my comfort zone now so progress may be a little slower than for the Spin side of things. When I have the code tidied up I'll publish what I currently have.

@ Marty : Viewport looks like it will be very useful. Thanks for the reminder.

@ deSilva : Your tutorial makes a lot more sense now than when I read it before, so I guess I'm starting to get to grips with it. Still slowly edging into the Assembler side of things, and I have to admit that I don't really understand the CTR, FRQ and PHS register usage yet. The problem with ROL PHSA for each bit looks to be that it will still be too tight to be able to fetch the data from Hub without disrupting the bit-stream timing. Off-loading 32-bits at a time for waitvid to churn out still looks the best bet.

deSilva · 2007-10-13 07:41

This is the charming effect of WAITVID: It gives you the time to do other things. You can automatically output 4 arbitrary(!) bit patterns per cycle. When "uploading" it from the HUB @ 16 ticks you theoretically have 20 MHz

But the loop will not allow incrementing

WAITVID
RDLONG
DJNZ

You can either unroll it (for 512 bits will need 128 loops x 3 instructions = 384 instructions)

WAITVID
RDLONG
ADD hubpointer , #4
...

As you output 8 bits in parallel you can also think of a PISO shiftregister at the output, shifting 8 @ 20MHz = 160 MHz!!! The clock for the shift register can be generated by the second counter...

@deSilva's Tutorial: I was concentrating on an introduction for the Machine Code; I assumed at that time that everybody was acquainted with Video, Counters, COG operations etc. I have learned in the meantime that this was a total misjudgment. But don't fear the counters, AN001 is excellent, e.g. this sketch gives you 99% of the information you need. The 1% missing is the table of the 64 timer/counter modes

Post Edited (deSilva) : 10/13/2007 2:11:30 PM GMT

hippy · 2007-10-13 13:49

@ deSilva : I certainly will not criticise the tutorial and effort you have put in or the help you give in the forum. Like everything, people have to get up to speed and with a new processor there's often a lot that has to be learned. There are always those initial hurdles, but once 'the light comes on', it is plain sailing after that.

I have found the Propeller easier to understand than other devices but the devil is always in the detail. For example, I know I can set a 20MHz clock for CTRA to drive the video shifter, I know that's done by setting PLLDIV in CTRA, but I have to determine what value PLLDIV to use. I know it is to create a division of "VCO", but nowhere in the Propeller Manual can I see anything which says what frequency "VCO" runs at or what "VCO" actually is ( although I know what a generic VCO is ). Reference to "VCO" only appears within the CTRA/CTRB description, same in the Data Sheet. Figure 7 of the Counter App Note shows VCO driven by a "Clock In", but it's not clear where that comes from or what frequency it is either.

It may be very obvious to others and will be to myself once I have done it ( and if I had a scope I could no doubt set PLLDIV, experiment and see what happens to fill in the missing gap of knowledge, but I don't have that option ). Those are the hurdles I find myself coming up against.

Like "How can one eat an entire elephant ?" ... one bite at a time, I'm slowly but surely getting there.

deSilva · 2007-10-13 14:20

@hippy: I think most misunderstandings come from the idea that a microcontroller - especially one designed by Chip - must be something extremely intelligent... On the contrary: The Propeller is the most simple thing imaginable. If ever in doubt how something can work, think of the simplest solution possible

So with the counters: It is BIT 31 of PHSA driving the PLL nothing else -have a look at the sketch above.

This "BIT 31 trick" is also used for the PWM mode, which needs some more attention from the programmer but realizes one (!) PWM pulse just in the same way.

But back to the PLL. When you load FRQA with the value to be added each system clock (80 MHz), then you will find that the sign bit 31 of PHSA will not toggle as regularly as you would need. It is only regular when FRQA consists of just one bit. This is due to "remainders", as adding to PHSA is equivalent to finding out "how often" FRQA goes into 2^32. A remainder will cause a "jitter" of bit 31.
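
The jitter can be seen by simulating the accumulator. A minimal sketch, assuming a plain 32-bit add of FRQA into PHSA once per system clock; the function names and the example FRQA values are chosen only for illustration.

```python
def bit31_rising_edges(frqa: int, clocks: int) -> list[int]:
    """Clock numbers at which bit 31 of the 32-bit accumulator goes 0 -> 1."""
    phsa = 0
    previous = 0
    edges = []
    for tick in range(1, clocks + 1):
        phsa = (phsa + frqa) & 0xFFFFFFFF   # 32-bit wrap-around add
        bit31 = phsa >> 31
        if bit31 and not previous:
            edges.append(tick)
        previous = bit31
    return edges


def edge_spacings(frqa: int, clocks: int = 320) -> list[int]:
    """Distinct spacings, in system clocks, between successive rising edges."""
    edges = bit31_rising_edges(frqa, clocks)
    return sorted({b - a for a, b in zip(edges, edges[1:])})
```

A single-bit value such as `0x1000_0000` gives one spacing, `[16]`. A value with a remainder such as `0x1800_0000` gives `[10, 11]`: the average period is right but individual edges wander by a clock.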

The PLL is used (and has to be! Especially when you have large values for FRQA) to equalize those irregularities. So after the PLL has stabilized you will get a very smooth frequency out of the timer.

Note also that PLL = 1 does not mean: "no PLL", but that the PLL output is divided by 16, after being multiplied by 16.

Post Edited (deSilva) : 10/13/2007 2:26:04 PM GMT

hippy · 2007-10-13 15:19

It wasn't really imagining it was more intelligent than it is but that something ( "VCO" ) was referenced but didn't appear in any diagrams

With FRQA/PHSA + PLLx16 as the "VCO" that makes more sense, although why "VCO" is used instead of referring to PLLx16 output I don't know; maybe I'm not intuitive enough or just don't like guessing what unclear things may mean.

So ... to get my 50 ns bit times ( 20MHz clocking frequency for waitvid ), with 5MHz+PLL16x system clock (80MHz), I set FRQA=$8000_0000>>4 (5MHz), after the PLLx16 that is 80MHz, so I use PLLDIV=%101 ( divide by 4 ) and that gives me the 20MHz I'm after.

Have I got that right ?
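
The arithmetic can be checked with a short sketch, assuming the usual counter relation from the Counter App Note (PLL input = system clock x FRQA / 2^32, multiplied by 16, then divided by 2^(7 - PLLDIV)). The function names are made up for the example. By this reckoning the chain 5 MHz -> 80 MHz -> 20 MHz with PLLDIV=%101 is sound, but $8000_0000>>4 works out to 2.5 MHz rather than 5 MHz; $8000_0000>>3 ( $1000_0000 ) would give the intended 5 MHz. Worth confirming against the datasheet.

```python
CLKFREQ = 80_000_000  # 5 MHz crystal with the x16 system PLL


def pll_input_hz(frqa: int, clkfreq: int = CLKFREQ) -> float:
    """Frequency of PHSA bit 31, which feeds the counter PLL."""
    return clkfreq * frqa / 2**32


def pll_output_hz(frqa: int, plldiv: int, clkfreq: int = CLKFREQ) -> float:
    """Counter PLL output: input x16, divided by 2^(7 - PLLDIV)."""
    if not 0 <= plldiv <= 7:
        raise ValueError("PLLDIV is a 3-bit field")
    return pll_input_hz(frqa, clkfreq) * 16 / (1 << (7 - plldiv))


def frqa_for(hz: float, clkfreq: int = CLKFREQ) -> int:
    """FRQA value that puts bit 31 of PHSA at the requested frequency."""
    return round(hz * 2**32 / clkfreq)
```

`frqa_for(5_000_000)` returns `0x1000_0000`, and `pll_output_hz(0x1000_0000, 0b101)` is 20 MHz, i.e. 50 ns per half-bit cell.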

hippy · 2007-10-13 16:30

For anyone interested, here's the code I've got so far. No idea what bugs it has, and the hardware and actual operation of Ethernet out is largely untested.

Added : I got brave. Connected direct to the hub ( 2 x 68R, no magnetics ) the Hub Link Led comes on, the Rx Led flashes once a second so something is getting through. Collision Led also comes on so it's not quite right. Now it's time for poke 'n' hope. In the meantime, if anyone with a scope wants to take a look at what's coming out it would be very helpful.

Post Edited (hippy) : 10/13/2007 5:14:55 PM GMT

hippy · 2007-10-13 18:18

Another D'Oh moment !

We're sending bytes as bits, and the "waitvid colours,#bits" gives a mapping of one byte per one Cog instruction, so potentially allowing a 496-byte packet less the overhead in setting things up. It's even better than deSilva's un-rolling of the Hub Fetch Loop: just copy/convert the packet bytes at leisure into a sequence of waitvids and then let them run. That's rokicki's idea but using immediates, not registers.

deSilva · 2007-10-13 19:00

Hmmm, when you only want to output a single bit stream, you can let the WAITVID shift out all 32 bits, which will give you 32 instructions' time @ 20 MHz shift-clock. My former calculation was based on outputting 4 full bytes, which however is not the case in your application...

BTW: This is exactly what Ariba suggested...

Post Edited (deSilva) : 10/13/2007 7:05:02 PM GMT

Phil Pilgrim (PhiPi) · 2007-10-13 19:39

In Ariba's scheme, the two Manchester bits are output in parallel. You would need a separate shifter to serialize them. Another approach would be to use the chroma clock to provide the two Manchester phases. Use one bit per pixel, and pick two "colors" that give opposite chroma outputs.

-Phil

deSilva · 2007-10-13 20:16

Wow!

Ariba · 2007-10-13 21:14

My suggestion was to do the Manchester coding directly in the prepared BitStream, that's why I said you need 1 word per output-Byte (2 Bits in the Stream per Databit to output).
The two parallel Bits are the TX+ and TX- that must be inverted relative to each other. For that the Color0 is %01 and Color1 %10.

Andy

Post Edited (Ariba) : 10/14/2007 1:59:46 AM GMT

hippy · 2007-10-13 23:26

@ Ariba : I'm with you on that, and your info helped a lot.

I expand each bit of the raw bit-stream to two bits Manchester encoded ( 16 original bits in a long / 1 word per output byte as you say ), stream them out ( as 32 sequential bits ) and the two colours 0/1 automatically give the bi-phase TX+/TX- lines.

I have to admit that when not thinking straight it is easy to confuse the Manchester encoding with the bi-phase signal production.

Phil Pilgrim (PhiPi) · 2007-10-14 00:01

Ariba said...
... and the ColorRegister contains two colors with the bit pattern %xxxxxx10 and %xxxxxx01 for the inverted and not inverted Manchester Output.

Yup, I completely misunderstood the intent of those two bits. It didn't occur to me that it was for the biphase lines. When I saw the word "inverted", I was thinking phase instead of polarity!

-Phil

Fred Hawkins · 2007-10-14 01:03

A citation of it in this original naval meaning comes from Adam Martindale in 1683, in A Collection of Letters for Improvement of Husbandry & Trade:

"A Token for Ship-Boys, or Plain-Sailing, made more plain."

The first use of it in a figurative sense, meaning simple 'easy and uncomplicated' comes in Fanny Burney's Camilla, 1796:

"The rudiments, which would no sooner be run over, than the rest would become plain sailing."

From http://www.phrases.org.uk/meanings/plain-sailing.html

Ariba · 2007-10-14 01:43

Yes, I know it was confusing, but I couldn't find the right words in English.....

I am also thinking about receiving Ethernet with a Propeller (with bitbanging). The question is: How precise is the 10MHz timing of a sending device, is there a norm for that? If it is precise enough it should be possible to sample the RX+ signal at only 10MHz, exactly in the middle of the second half of each Manchester coded bit. The Cog has to wait for the Preamble and then samples a fixed number of bits into the Cog-RAM. After the whole packet is received, it can be decoded and transferred to the Hub-RAM.

Perhaps somebody knows the answers to these questions:
1) Is the source of the Manchester modulation timing always a crystal?
2) What is the minimal Vpp level on the RX+,RX- signals?
3) What is the maximal size of a packet for tcp/ip?
4) How long is the minimal time between 2 packets?

Andy

Sapieha · 2007-10-14 01:55

Hi Ariba.

All modulation timing is always crystal controlled.
Vpp ??
Packets size and time is variable (sender from prop can control it)

Sapieha

Harrison. · 2007-10-14 02:21

The following documentation gives the specs for full duplex ethernet: tp.cs.princeton.edu/techreports/2002/645.pdf . The specs include packet size, packets per second, etc.

Sapieha · 2007-10-14 02:53

Hi hippy.

Maybe it helps

Sapieha
