The 3-Bit Protocol — Parallax Forums

The 3-Bit Protocol

deSilvadeSilva Posts: 2,967
edited 2007-10-09 09:53 in Propeller 1
Funny how the "3-bit protocol" of the boot loader almost automatically leads to head scratching :)
I prepared a short explanation some time ago, and may as well post it here, as I think I didn't do so before...:
deSilva said...
... remember that there is no such thing as a "stop bit". "High" is the idle state of the asynchronous line, and it falls back to it whenever a byte has been sent. To indicate the start of a new byte, the line has to drop (or change polarity). Clearly this cannot contain any information; the real information is contained in (and retrieved from) the next 5 to 8 "bit cells", sampled according to a fixed timing pattern after the falling edge "t" of this so-called "start bit": around t+1.5*d, t+2.5*d, ..., where "d" is the length of the "bit clock", e.g. 1/9600 seconds. "Around" means that there is generally more than one sample per bit clock, to improve reliability.
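The sampling schedule just quoted can be sketched in a few lines. This is an illustrative sketch only; the function name and parameters are mine, not taken from any particular UART implementation:

```python
def sample_times(d, nbits=8):
    """Nominal sampling instants for one frame, measured from the falling
    edge of the start bit at t = 0: the middle of each data bit cell sits
    at (n + 1.5) * d for n = 0 .. nbits-1."""
    return [(n + 1.5) * d for n in range(nbits)]

# At 9600 baud (d = 1/9600 s) the first data bit is sampled
# 1.5/9600 s = 156.25 microseconds after the start edge.
print(sample_times(1 / 9600)[0])
```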

You immediately notice from this awkward sequence (and the redundancy of the "start bit") that it would have been aesthetically much more satisfying to make start bits half the length of the bit clock. I absolutely agree, but they didn't, mostly for lack of intelligence in the early teletype terminals, which themselves were compatible offspring of the even earlier telex tickers.

To summarize: for asynchronous communication we need a drop of the line (called the start bit), and a short time between the last information bit sent and the next drop of the line, to give the receiver a chance to:
- synchronize small deviations between the bit clocks
- process the assembled byte, if the reception of bits could not be done by a parallel unit ("UART")

For mostly historical reasons (and a little paranoia from worst-case considerations in signal theory) the start bit has the same length as the bit clock, and the idle state after all bits have been sent lasts at least that long (or at least 1.5 times, or even twice as much).

The "weak spot" is d, the bit clock agreed upon between transmitter and receiver. It is re-synchronized at the falling edge of each start bit, but has to hold out until the last bit, which in most cases is the 8th. So the following condition must hold under all circumstances:
8.5 * |d_rcv - d_xmit| < 0.5 * d
which means the clocks must not differ by more than 1/17 ≈ 6%
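As a small sanity check of this bound (the function name is mine, purely for illustration):

```python
def max_clock_mismatch(last_sample=8.5):
    """Maximum tolerable fractional mismatch between receiver and
    transmitter bit clocks: the last sample, taken last_sample bit clocks
    after the start edge, must have drifted less than half a bit cell."""
    return 0.5 / last_sample

# 0.5 / 8.5 = 1/17, roughly 5.9 percent
assert abs(max_clock_mismatch() - 1 / 17) < 1e-12
```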

What if they do? Or to put it more interestingly: What if you do not know the bit rate someone is transmitting at?
You can make this part of the protocol: "In the beginning, please send some "P"s and I will try to guess your baud rate..." This indeed works, and is generally advertised as "auto baud rate detection"....

But you can make it even trickier. Does anyone remember the old days when you associated "SPIN" with magnetic tape reels? NRZ? Things like that?

Right, we need a clock! Lacking a separate line for it, we just mix the clock into the data signal, which is then - obviously! - called "self-clocking".
So, here you have your clock: 0_1 0_1 0_1 ....
Just wait for the falling edges! They need not even be particularly regular! Each one marks a bit cell.

"Oops - but is this not just what we have discussed already? A start bit! And an idle condition?
And - maybe, maybe - ONE BIT of information in between: see, I did notice those underscores you put there :) "

Nearly! The difference is this: the transmitter makes the length of the idle condition definite: exactly the length of the start bit, not a teeny weeny bit longer or shorter. Both can have arbitrary length, but each the same!

Now we carry on to the "underscore". Until now there was no "information" in the clock signal. But we shall now "merge" it with our data: each underscore is replaced by a 0 or a 1 (of the same length as the leading 0 and the trailing 1).

Note that this does not at all influence the recognition logic for the clock; that still depends on the falling edge. The difference is that the rising edge is now sometimes "early" and sometimes "late". Determining this is a piece of cake and works up to very high frequencies.
Say you see a drop at time t=0, the rise at t=R, and the next drop at t=D; then R/D should be 1/3 or 2/3 - easy to distinguish. BTW: the clever Chinese call those fractions the "lesser half" and the "greater half" :)
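A minimal decoder for this scheme might look as follows. This is a sketch under the assumptions above; the threshold of 1/2 is simply the midpoint between the two legal ratios:

```python
def decode_pulse(R, D):
    """Classify one self-clocking bit cell: falling edge at t=0, rise at
    t=R, next falling edge at t=D.  A ratio R/D near 1/3 (short low part)
    reads as a 1; near 2/3 (long low part) reads as a 0."""
    return 1 if R / D < 0.5 else 0

assert decode_pulse(1, 3) == 1   # 1T low, 2T high -> bit 1
assert decode_pulse(2, 3) == 0   # 2T low, 1T high -> bit 0
```

Because only the ratio matters, the cells need not all be the same length, which is exactly why the clock may wobble.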

When analysing this from the point of view of signal theory, it is a most simple phase-encoding protocol; but I hope you also enjoyed my more redundant explanation.

Now to the last part! This string of (synchronous) self-clocking data can be arbitrarily long; there is no need for any clock synchronization. But our PCs have their problems sending this, let alone receiving it...
So the Propeller wraps it into pieces of 7 bits, transmitted asynchronously (at whatever bit rate). Look here:
1 .. 1    0        X         1         0X1    0X     1
idle      async    data      pad       2nd    3rd    idle line
line      start    0 or 1    to next   bit    bit    for ONE bit clock!
          bit
          = clock



Note that we will have problems determining the size of the 3rd item if the idle condition is upheld for longer than one bit clock. This should normally not be the case, and the situation can be identified by comparing against the lengths of the first two items; there can simply be no difference in the bit clock.
Post Edited (deSilva) : 10/7/2007 9:46:52 PM GMT

Comments

  • bambinobambino Posts: 789
    edited 2007-10-07 11:25
    Thank you deSilva,

    This is quite a bit to chew on. I'll have to shut up and read a while!
  • bambinobambino Posts: 789
    edited 2007-10-07 11:43
    Ok!

    I want to understand this because, whenever I get the time to develop it, I would like to take a PIC and create an interpreter between the Prop and a uSD using pins 29 and 28.

    I don't fully grasp this yet due to the inversion of the signal by an ordinary 232 transceiver, but my main confusion is this!

    The Prop deals with this bit packing from the UART, but it doesn't spit that confusion out to the eeprom, unless I have totally eaten someone else's breakfast here! When the Prop boots from pins 28 & 29 it is expecting normal bytes on an I2C interface, correct? If so, then, as interesting as it may be, there is no need to deal with the bit packing!
  • deSilvadeSilva Posts: 2,967
    edited 2007-10-07 12:21
    You are absolutely right! The EEPROM contains absolutely clean data, and the communication with it is purest I2C :)
  • hippyhippy Posts: 1,981
    edited 2007-10-07 13:31
    Fundamentally, all communication between the PC and Propeller during download is done by sending a pulse of width 1T to indicate a 1 bit and a pulse of width 2T to indicate a 0 bit. These are negative going pulses.

    If the PC could bit-bang with reliable timing it would be possible to churn these pulses out as required. Unfortunately most PCs cannot, but when using a UART accurately timed pulses can be generated by using the start bit and data bits of the serial data.

    The serial line normally idles high and a start bit is a negative going pulse. Thus by sending a serial byte with its data such that the line is kept high, a single 1T negative pulse ( the start bit ) is sent. This is equivalent to sending a 1 bit encoded as a 1T pulse.

    If the next bit in the transmitted byte after the start bit is set such that it is also low, as the start bit is, a single 2T negative pulse is sent. This is equivalent to sending a 0 bit encoded as a 2T pulse.

    We could stop there, and we would have implemented the PC-to-Propeller interface, needing one byte for every bit we want to pass over. That's not very efficient, and there is no reason we cannot alter other bits within the byte to produce more negative going pulses.

    Doing this we can encode at least three bits to send within a single byte which is sent.

    There's no reason to have to send three bits packed as three pulses within every byte. We could convey one, two, three or more bits. It is only the width of the negative going pulse which is important, the highs between pulses can be of any length.

    In the Delphi code which Chip posted, bits to be sent are always packed as 3T entities; a 1 bit is a 1T low followed by 2T of high, a 0 bit is a 2T low followed by a 1T high; hence 3T for every bit.
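The 3T framing hippy describes can be worked out into the actual byte value to hand to the UART. Note that the constant below is my own derivation from that framing (start bit plus 8 data bits, lsb sent first), not a quote from Chip's Delphi source:

```python
def pack_three_bits(b0, b1, b2):
    """Frame three download bits into one 8-N-1 UART byte (lsb sent first).
    On the wire the 9 slots (start bit + 8 data bits) form three 3T cells:
    a 1 bit is 1T low + 2T high, a 0 bit is 2T low + 1T high.  The start
    bit supplies the first (low) slot of b0's cell."""
    # Fixed slots: d1=1, d2=0, d4=1, d5=0, d7=1 -> the constant 0b10010010.
    return 0x92 | (b0 << 0) | (b1 << 3) | (b2 << 6)

assert pack_three_bits(0, 0, 0) == 0x92
assert pack_three_bits(1, 1, 1) == 0xDB
```

Every cell begins with a low slot, which is why the always-low start bit can double as the first slot of the first cell.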

    By limiting the high after the negative going pulse to a period of just 1T we can send a 1 bit in a 2T period ( 1T low, 1T high ) and a 0 bit in a 3T period ( 2T low, 1T high ). These bits can be better packed into single bytes ( start bit plus eight data bits ) to give a higher throughput when 1 bits are being sent.
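The cost difference between the two packings can be counted in bit-time slots. A rough sketch, ignoring byte-boundary effects (the function name is mine):

```python
def wire_slots(bits, dense=False):
    """Bit-time slots needed on the wire for a stream of download bits.
    Fixed 3T framing: every bit costs 3T.
    Dense framing: a 1 costs only 2T (1T low + 1T high); a 0 still costs
    3T (2T low + 1T high)."""
    if not dense:
        return 3 * len(bits)
    return sum(2 if b else 3 for b in bits)

ones = [1] * 8
assert wire_slots(ones) == 24
assert wire_slots(ones, dense=True) == 16   # only 1 bits gain anything
```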

    Remember that the Propeller Chip itself is not at all interested in how we are byte framing the data sent, that's just a convenience for us to allow a UART to be used to achieve accurately timed bit-banging, the Propeller is only interested in negative going pulses of width 1T or 2T.

    During downloading of the Eeprom image these 1T and 2T bits are thrown at the Propeller Chip as fast as we choose to, so packing multiple bits into bytes makes sense to minimise the transfer times. At the start of the download process and at the end single 1T and 2T pulses are sent which elicit 1T and 2T responses from the Propeller Chip. At this time it makes sense to send these bits packed one bit to a byte as first described. Likewise, the 1T and 2T bit pulses sent back from the propeller, when received using a UART, appear as bytes, the initial 1T part of the negative pulse being the start bit of a received byte, the bit after the start bit indicating if a 1T or 2T negative pulse was sent. Because of bit ordering, that indicator appears in the lsb of the byte received.
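Reading the Propeller's replies through a UART then reduces to inspecting the lsb of each received byte. A sketch of the scheme hippy describes (the function name is mine):

```python
def decode_response(rx_byte):
    """Recover the bit behind a reply pulse received as a UART byte.
    A 1T reply pulse is swallowed entirely by the start bit, so the next
    bit time is high (lsb = 1 -> bit 1); a 2T pulse holds the line low
    for one extra bit time (lsb = 0 -> bit 0)."""
    return rx_byte & 1

assert decode_response(0xFF) == 1   # line high right after the start bit
assert decode_response(0xFE) == 0   # line low one extra bit time
```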

    This 'byte framing' using a UART on a PC is simply because that's the only way to generate accurately timed 1T and 2T pulses. If sending from a microcontroller, the 1T and 2T pulses can be genuinely bit-banged, and that is how the Propeller downloading to another Propeller Spin code which Chip also posted works. It is different to the Delphi code because bits do not have to be byte framed. A micro with a UART could either use a UART and byte framing as a PC does or it can choose to do traditional bit-banging.

    There is one further thing which helps in trying to understand the Delphi code. We have already seen how the serial data is simply used to convey a stream of 1T and 2T pulses, and that a variety of packing densities can be used. The Delphi code always packs the bits sent so they occupy a 3T time frame, and this is what gives rise to the 'three bits per byte' description. The code also sends in long word ( 32-bit ) lumps to make things simpler to code.

    Sending 32 bits requires 32x3T time frames; 96T time frames in total. With 9T time frames available per byte ( each carrying 3x3T time frames ) we can send those 32-bits in 11 bytes, 99T time frames in total. The extra 3T time frame which is not required is simply set so the serial line is left high, neither a 1T nor 2T pulse is sent. It would be possible to bring forward a bit from the next long to be sent but this is not done to keep the code simple.
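The byte count above is simple arithmetic, which can be checked directly:

```python
bits_per_long = 32
slots_per_bit = 3         # every bit framed as a 3T entity
slots_per_byte = 9        # start bit + 8 data bits

payload_slots = bits_per_long * slots_per_bit         # 96 T in total
bytes_per_long = -(-payload_slots // slots_per_byte)  # ceiling division

assert bytes_per_long == 11
assert bytes_per_long * slots_per_byte == 99   # one spare 3T frame left high
```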

    When comparing the Propeller-to-Propeller Spin code and the Delphi download code, the Spin code uses bit banging, a continuous stream of 1T low and 2T low pulses, while the Delphi code does byte framing. They achieve the same tasks but in different ways.

    The Delphi code ( without any implied criticism here ) uses a fixed byte framing regime of three 3T time periods per byte sent and 11 bytes sent for every long word transferred. The complete byte stream ( 1T and 2T pulses ) could be better packed and sent using fewer bytes. As said, it is not how the bits are turned into pulses and packed into bytes which is important, but the stream of 1T and 2T negative pulses alone which the Propeller is interested in during receiving.

    A final footnote : One might expect a denser 'byte framing' packing algorithm ( 1T low/1T high and 2T low/1T high ) to dramatically reduce the download time on average. In reality that is often not the case. Because the unused Eeprom image is filled with 0 bits, a 3T time frame has to be used for those bits, so the gains are a lot less than they would be were the unused Eeprom filled with 1 bits. Sending 32KB of zero bytes with the densest packing is around 6% faster than the packing used in the Delphi program. 32KB of $FF bytes would be 30% faster.
  • Mike GreenMike Green Posts: 23,101
    edited 2007-10-07 14:45
    Part of the advantage of denser packing is lost since the Propeller Tool does not send the whole 32K image most of the time. There's a length long word provided in the download protocol and (as far as I know) the Propeller Tool only sends the part of the binary image up to where the stack begins. The RAM is cleared by the loader either initially or starting from the end of the download to the end of memory.
  • deSilvadeSilva Posts: 2,967
    edited 2007-10-07 17:53
    Mike Green said...
    ... and (as far as I know) the Propeller Tool only sends the part of the binary image up to where the stack begins.
    Even better. It only sends up to the beginning of the VAR section :) This is why there can be no preset values in VAR...


    Edited a stupid typo found by hippy - thanks!

    Post Edited (deSilva) : 10/7/2007 7:06:31 PM GMT
  • hippyhippy Posts: 1,981
    edited 2007-10-07 18:48
    @ deSilva : DAT has got to be sent or Assembly code and pre-defined variables would be lost :)

    I'm sure you meant to write "up to the beginning of VAR"
  • Ken PetersonKen Peterson Posts: 806
    edited 2007-10-07 19:32
    @deSilva: I beg to differ. There IS such a thing as a stop bit. Ask yourself what would happen if it didn't exist: how would you identify the start bit? For implementation's sake it is at least one bit long, so yes, there is such a thing as a stop bit. A well-designed protocol will test for the 1 value of the stop bit to help verify accurate transmission, so the value is significant. It has no value other than as a delimiter, however.

    I just felt like arguing :)

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔


    The more I know, the more I know I don't know. Is this what they call Wisdom?

    Post Edited (Ken Peterson) : 10/7/2007 7:38:53 PM GMT
  • deSilvadeSilva Posts: 2,967
    edited 2007-10-07 19:51
    Ken Peterson said...
    I just felt like arguing :)
    I know that mode :)
    But if you re-read my posting, you will notice that I have been absolutely clean in my reasoning...

    BTW: it was the second part of a paper about serial communication, which I added as a nice example. So there may be some missing context...
  • mparkmpark Posts: 1,305
    edited 2007-10-09 06:49
    I'm trying to catch up here...
    hippy said...
    The serial line normally idles high and a start bit is a negative going pulse. Thus by sending a serial byte with its data such that the line is kept high, a single 1T negative pulse ( the start bit ) is sent. This is equivalent to sending a 1 bit encoded as a 1T pulse.

    If the Prop sees just one negative pulse, how does it know if it's 1T or 2T? I mean, it doesn't know what T is yet, right?
  • Tracy AllenTracy Allen Posts: 6,662
    edited 2007-10-09 09:53
    The protocol starts by resetting the Prop, then immediately expects a calibration sequence consisting of a 1T pulse followed by a 2T pulse. I don't know if the Prop does further tracking on the fly as it processes bits.
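One plausible way a receiver could use that calibration pair is to derive a classification threshold from the two measured widths. This is purely illustrative; as Tracy notes, the Propeller ROM's actual logic (and any on-the-fly tracking) is not documented here:

```python
def make_classifier(t1, t2):
    """Given the measured widths of the initial 1T and 2T calibration
    pulses, return a classifier for subsequent low-pulse widths: anything
    below the midpoint reads as a 1T pulse (bit 1), else 2T (bit 0)."""
    threshold = (t1 + t2) / 2.0
    return lambda width: 1 if width < threshold else 0

classify = make_classifier(10.0, 20.0)   # hypothetical tick counts
assert classify(11.0) == 1   # near 1T -> bit 1
assert classify(19.0) == 0   # near 2T -> bit 0
```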

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Tracy Allen
    www.emesystems.com