Four Port Serial with 152 Longs* (with lots of caveats)

Duane Degn · 2015-02-18 20:55

The "152 longs" figure (149 longs before the 2/26/15 update) comes from what is required by the child object. This does not include any tx or rx buffers. It also doesn't include the PASM section of the code.

This version of Tracy Allen's 4 port serial driver is a continuation of my effort to free up program space in my hexapod. Earlier I posted my version of F32 which runs after the PASM section of the code has been retrieved from upper EEPROM. This is a variation on Dr_Acula's idea of "cogjects".

I found separating Tracy Allen's 4 port serial driver into Spin and PASM sections much harder than any of the other objects I've attempted to separate this way.

There were a lot of values "poked" into the PASM code prior to the PASM section being launched in a new cog.

As was the case with Tim Moore's original program and the variations I had made of his program, Tracy's object preserves the shortcut of using the same variable in both PASM and Spin.

Since I wanted to separate the PASM from the Spin, I ended up creating a second copy of many of the variables. I added the extension "_cog" to indicate the variable to be used in the PASM section of the code.

I soon realized many of the variables were only used in PASM, but the initial values of these variables were generated in the Spin section of the code. Rather than creating Spin versions of these variables, I loaded the initial values into the tx and rx buffers. Once these initial values are read into the cog, the RAM is reused for the buffers.

In hopes of making it obvious how to access the PASM section in order to store it to either upper EEPROM or an SD card, I moved the PASM section to the parent object.

The parent object is required to pass the address of the rx and tx buffers as well as the location of the PASM code.

The sizes of the buffers are defined as part of the "AddPort" method. This allows the buffer sizes to be changed without the need to update the PASM code stored in EEPROM.

As pointed out in the comments of the "Init" method:

PUB Init(bufferAddress)
'' Always call init before adding ports
'' The buffer at location "bufferAddress" should be
'' long aligned. 96 bytes of the buffer will temporarily
'' be used to pass values to PASM. The parent object needs
'' to make sure there are 96 bytes available at the location
'' of "bufferAddress". This will unlikely be a problem since
'' the combined buffers of the serial driver will likely
'' exceed 96 bytes.

  bufferPtr := bufferAddress

There needs to be at least 96 bytes at the location "bufferAddress" in order to provide room for the initialized variables. This may need to be 99 bytes depending on the alignment of the buffer. The initialization variables will be long aligned and if the "bufferAddress" isn't long aligned, the space used to initialize the variables could be up to three bytes more than the "96" figure.

The buffer does NOT need to be long aligned. I forgot to correct the comment before archiving it. I've tested the code with several different buffer alignments.
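
To see where the 96 and 99 byte figures come from, here is a minimal Spin sketch of the arithmetic. It assumes the driver rounds "bufferAddress" up to the next long boundary before storing the initialization values; the method names and the INIT_BYTES constant are made up for illustration and are not part of the archived object.

```spin
CON
  INIT_BYTES = 96                       ' bytes of initialization values, per the Init comment

PUB AlignedStart(bufferAddress)
  '' Round bufferAddress up to the next long (4-byte) boundary.
  return (bufferAddress + 3) & !3

PUB BytesNeeded(bufferAddress)
  '' Padding bytes (0..3) plus the 96 bytes of initialization values.
  return AlignedStart(bufferAddress) - bufferAddress + INIT_BYTES
```

With a long-aligned address BytesNeeded returns 96; with an address one byte past a long boundary it returns 99.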

There doesn't appear to be any issue if unused buffers are set to a size of zero.

If anyone needs help loading the PASM to EEPROM, let me know. I can also provide an example of how to retrieve the PASM code from EEPROM if requested.

BTW, all the variables are in the DAT section so this object could be used from more than one object if desired. The version I'm using includes locks. I'm willing to post my version with locks if anyone is interested in seeing it.

Edit (2/26/15): Removed earlier version of code and replaced it with a version which will allow the rx and tx buffers to use the same space as the PASM code. See post #9 for more information about the new version of code.

mhm21 · 2015-02-26 06:57

This is hugely interesting! I am running into space issues with my current project as well. Can you point me to an outline of how to load the PASM to eeprom?
Once the driver resides in eeprom, at what point do you read it back in? And where does the pasm code get written in main memory?

If you can't tell, I am on step 0.00001. Thanks for any guidance you can give!

Duane Degn · 2015-02-26 08:51

mhm21 wrote: »

This is hugely interesting! I am running into space issues with my current project as well. Can you point me to an outline of how to load the PASM to eeprom?
Once the driver resides in eeprom, at what point do you read it back in? And where does the pasm code get written in main memory?

If you can't tell, I am on step 0.00001. Thanks for any guidance you can give!

I was editing my demo code in preparation to attach it to a forum post when I realized I have a significant problem with my current strategy. I use the tx and rx buffers to hold initializing data which is read from PASM. I'd also like the option of locating the rx and tx buffers in the same section of RAM as used by the PASM image. In order to do this, I need to change the code a bit.

I'll upload some demo code once I get this kink straightened out.

tonyp12 · 2015-02-26 10:06

With a Pasm cog, using fewer than 496 longs only saves hub space for the load image, which with some expertise can be reclaimed anyway.
Use it as the RX/TX buffer for example.
So I say max out the Pasm to do most of the work.

With ring buffers, using a fixed length of 128, 256 or 512 is the best way to go.
Using a base address for the buffer location is needed to avoid using double the space just to get it aligned.

mov     t2, TxBase
add     t2, TxPnt
rdbyte  mybyte, t2
add     TxPnt, #1          'prepare to get next byte
and     TxPnt, #TxSize-1   'auto wrap-around it.

Tracy Allen · 2015-02-26 10:22

Hi Duane,
I was away for a while and missed this when you first posted it. It sounds like this effort is part of a larger context you are working on. Do you have related links handy?

I wondered first off about what you were going to do about the rx/tx buffer overlay. One way or another the program needs its rx/tx buffers. The pasm footprint is substantial, about 1600 bytes above the part that can normally be recycled for rx/tx buffers.

It has always bothered me (and others too, judging from PMs) that the buffer sizes are CONstants and cannot be set up at run time. I've made an alternate version to do that. The buffers still default to CONstants. The rx or tx pin numbers are passed as usual, and -1 if disabled. The extension is that ORing in a value in the high word of the pin number parameter serves as the pointer to the external rx or tx buffer passed in at run time. As I set it up, the buffer must contain a valid zString that determines its size.

Tracy Allen · 2015-02-26 10:35

Tony,

The 4port code uses CMPSUB to implement any buffer size, nothing special about powers of two.

    add     rx_head,#1
    cmpsub  rx_head,rxsize   ' (TTA) allows non-binary buffer size

I recall that Duane in earlier threads has reclaimed almost all of the pasm code footprint for use as rx/tx buffers once the pasm cog was loaded. But that means that it cannot be restarted without reloading from EEPROM. That is okay in some circumstances.

The 4-port object does reuse as buffers the hub locations that are mirrored as data into the cog but no longer used by the spin code. Restartable without reloading from EEPROM. As it is, the cog is nearly full. Not much room for additional initialization in pasm.
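
For comparison, the two wrap-around styles discussed above can be written in Spin as well. This is only an illustrative sketch with hypothetical method names; it mirrors the PASM "add / cmpsub" and "add / and" pairs and is not taken from either driver.

```spin
PUB WrapAnySize(index, size)
  '' Advance and wrap like PASM "add / cmpsub": works for any size > 0.
  index++
  if index => size
    index -= size
  return index

PUB WrapPowerOfTwo(index, size)
  '' Advance and wrap like PASM "add / and": size must be a power of two.
  return (index + 1) & (size - 1)
```

The first form costs a compare but leaves the buffer size free; the second is shorter but only correct when the size is a power of two.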

Duane Degn · 2015-02-26 10:38

tonyp12 wrote: »

With ring buffers, using a fixed length of 128, 256 or 512 is the best way to go.

Tracy uses cmpsub (IIRC) instead of and to wrap the buffer. This allows buffers to be any size.

And yes, I'm aware the only space which can be reclaimed is the PASM image. I'm launching three different PASM images from the same buffer. Once the PASM code has been launched, the buffer is reused for stack space and rx and tx buffers.

Duane Degn · 2015-02-26 10:47

Tracy Allen wrote: »

Hi Duane,
I was away for a while and missed this when you first posted it. It sounds like this effort is part of a larger context you are working on. Do you have related links handy?

I wondered first off about what you were going to do about the rx/tx buffer overlay. One way or another the program needs its rx/tx buffers. The pasm footprint is substantial, about 1600 bytes above the part that can normally be recycled for rx/tx buffers.

It has always bothered me (and others too, judging from PMs) that the buffer sizes are CONstants and cannot be set up at run time. I've made an alternate version to do that. The buffers still default to CONstants. The rx or tx pin numbers are passed as usual, and -1 if disabled. The extension is that ORing in a value in the high word of the pin number parameter serves as the pointer to the external rx or tx buffer passed in at run time. As I set it up, the buffer must contain a valid zString that determines its size.

This is part of my hexapod project. I've done similar things with the servo driver and F32. There are links to these threads in post #1.

The buffer location is set with the "Init" method. The buffer sizes are now set with the "AddPort" method. The program uses 96 (I think) bytes of the rx and tx buffers to load data required by the PASM code but not Spin. The same EEPROM image can be used for any combination of buffer sizes. The PASM section does not have to be reloaded to EEPROM if changes are made to the number of ports or if any of the "AddPort" parameters are changed.

The problem my current version has is, if the buffer is designated as the same location used to store the PASM image, the "AddPort" method ends up overwriting the PASM code before it gets launched. Since I want the option of using the same buffer for both the PASM and tx and rx buffers I'm modifying the code a bit to launch the PASM code and then wait until all the initialization variables are loaded before starting execution of the PASM section. Unfortunately these changes increase (though not much) the size of the program.

Since my hexapod code uses the first section of the PASM buffer as stack space, the problem with the "AddPort" method wasn't noticed when I ran my hexapod code.

Edit: As you suggested, the PASM code would need to be read from upper EEPROM again in order to restart the serial driver. I don't plan to restart the serial driver so I've removed the "Stop" method to save RAM.

Duane Degn · 2015-02-26 16:23

I updated the code attached to post #1 to a new version which can use the same memory location for the PASM image and the rx and tx buffers. The new version of the child object is 152 longs.

The program uses the rx and tx buffers to hold values which will be loaded into PASM variables (the cog will read these values, which previously had been poked into place). Since the rx and tx buffers may be set to the same location as the PASM image, the PASM image needs to be loaded to the cog prior to writing to the rx and tx buffers. In order to get the PASM code safely in the cog before it gets overwritten, I now launch the cog from the "Init" method instead of the "Start" method.

PUB Init(pasmAddress, bufferAddress)
'' Always call init before adding ports
'' The buffer at location "bufferAddress" should be
'' long aligned. 96 bytes of the buffer will temporarily
'' be used to pass values to PASM. The parent object needs
'' to make sure there are 96 bytes available at the location
'' of "bufferAddress". This will unlikely be a problem since
'' the combined buffers of the serial driver will likely
'' exceed 96 bytes.

  bufferPtr := bufferAddress

  result := cognew(pasmAddress, @startFlag) + 1
  waitcnt(80_000 + cnt) ' give time for PASM to load completely
  longfill(bufferPtr, 0, LONGS_TO_CLEAR) ' unused port's values will be set to zero

The longfill command keeps unwanted values out of the variables reserved for unused ports. The longfill command is required if all four ports aren't being used.

The "Start" method still sets up the pointers to the various buffers. These pointers are used by both the Spin code and PASM code but now the values are loaded to the PASM variables from within the cog. The PASM code waits in a tight loop while all these values are set. Once all the variables have been set to their proper values the PASM code is allowed to start reading them in.

The variable "startFlag" is used to indicate when the cog running the PASM code can start reading in all these values and commence executing the serial driver.

Here's the last line of the "Start" method.

startFlag := 1

The variable "startFlag" is a word but it is long aligned. When "@startFlag" is passed to par, par is correctly set. Generally it's best to pass the address of a long when launching a cog.

The PASM code which has been waiting for the "startFlag" to be set is this:

                        mov     t1, par
waitToStart             rdword  rxsize_cog, t1 wz ' read startFlag
              if_z      jmp     #waitToStart

I like the nice short two line loop. I forget which object I learned this trick from but I think it's a useful one.

As with the earlier code I posted, this version can use the same PASM image no matter what values are used with the "AddPort" method and the same image should work with any combination of buffer sizes.

I'll add examples of loading the PASM image to EEPROM and reading the image from EEPROM soon.

There isn't really a need to store the PASM code in EEPROM unless more than one PASM image is to be stored this way. If only the serial driver were being used in its "cogject" form, there isn't a need to load it to EEPROM. The space taken up by the PASM code could be reused for stack space, rx/tx buffers and any other buffer space one may need.

Even when the PASM is stored in EEPROM, there needs to be a section of memory large enough to hold the PASM code in RAM. The code won't take any less space if the PASM is stored in EEPROM unless there are multiple PASM sections which will be launched this same way.

Other objects which use PASM sections of code can be treated this way. Some objects are easier than others to separate into Spin and PASM sections. The servo object was pretty easy to separate but the F32 object was difficult. Separating the Spin from the PASM code of this serial object was more difficult than separating the Spin and the PASM of other objects.

ksltd · 2015-02-26 17:16

FWIW,

I decided some time ago that the bit-banging approach to serial just makes little sense for low-to-medium volume hardware products. The cost of a full featured UART is pretty small and the throughput increase, code size decrease and reliability improvements are remarkable. While the bit banging is cute, it just doesn't seem to be the right tradeoff.

I have a driver for 8x SC17IS740s that is 102 longs and gets me about 230K bytes per second aggregate throughput at baud rates up to 5Mbps, the hardware's limitation. And all that for only 7 IO pins.

On a related front, I've worked from Tracy's bit-banging quad driver for some time and exchanged quite a few notes with him. I've recently found several opportunities to improve the performance of both the Spin and the assembly language portions.

In the assembly language, there are really three things to do. First, the two lines of code between start bit detection and capturing cnt need to be moved out of that path. Capturing cnt turns out to be critical to high speed performance. Second, don't receive the stop bit in the receiver loop. Instead, decrease that loop count to only 8 bits and then, after storing the byte, have a new loop that only awaits the data line returning high. Finally, there are several places in the transmit code where the code path between jmpret instructions is longer than required. There is no reason for any code path to be longer than the shortest path possible, which is the transmit loop itself. Adding some additional jmpret instructions adds a bit of code size but has a remarkable impact on the performance of the receivers. Remember that the worst case receiver latency is the sum of 3x worst case receive latency plus 4x transmit latency. Maximum throughput really increases dramatically when those seemingly insignificant tweaks are made.

The spin changes are of an entirely different nature. There are two different areas. First, because the Spin compiler generates lousy code, source level optimization becomes critical. There are lots of common subexpressions that, if factored, cause substantial throughput increases. Second, since most of my code deals with protocol engines, my paths through put and get are dominated by block writes and block reads with timeouts. In both cases, I've implemented transactions into the buffers that are at most two bytemoves. No byte-by-byte copies into the buffers, ever.
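
To make the "at most two bytemoves" idea concrete, here is a rough Spin sketch of a block write into a ring buffer. This is not ksltd's code; the variable names (txBase, txHead, txSize) and the PutBlock method are hypothetical, and it assumes the caller has already confirmed that "count" bytes are free and that "count" does not exceed the buffer size.

```spin
VAR
  long txBase                   ' hypothetical: hub address of the ring buffer
  long txHead                   ' hypothetical: index of the next free byte
  long txSize                   ' hypothetical: ring buffer size in bytes

PUB PutBlock(srcAddress, count) | first
  '' Copy count bytes into the ring buffer with at most two bytemoves.
  '' Assumes the caller has already checked that count bytes are free.
  first := (txSize - txHead) <# count                     ' bytes that fit before the end
  bytemove(txBase + txHead, srcAddress, first)
  if count > first
    bytemove(txBase, srcAddress + first, count - first)   ' wrapped remainder
  txHead := (txHead + count) // txSize                    ' advance head with wrap-around
```

The first bytemove fills up to the end of the buffer and the second, when needed, continues from the start, so no byte-by-byte loop is required.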

I'm happy to post the code, I've re-written both the Spin and assembly language portions entirely because they were so damned unreadable. All of the DTR/CTS active high/active low and other glop is gone. It will take a little effort to post, but if there's interest let me know.

Duane Degn · 2015-02-26 17:25

ksltd wrote: »

I'm happy to post the code, I've re-written both the Spin and assembly language portions entirely because they were so damned unreadable. All of the DTR/CTS active high/active low and other glop is gone. It will take a little effort to post, but if there's interest let me know.

I'm very interested in your code. I wondered about removing the handshaking code since I rarely need it and it looks like it could slow the driver down (and use memory). I haven't tried removing it since I'm not really sure how to do it. I don't understand all the parts of the code which use handshaking. I know for some the handshaking features are very important.

ksltd · 2015-02-26 17:34

The flow control code does add a pile-o-code distributed through the code for each port.

My own evaluation of reducing code size if you really need the quad port driver is that you can reclaim all of the assembly language storage provided you don't need to reinitialize once you're running. That's a much cleaner solution than moving the thing externally and doing the EEPROM load.

Duane Degn · 2015-02-26 17:44

Here are the programs to load the PASM section of the serial driver to EEPROM and to read the PASM code from EEPROM.

The "LoadToEeprom" program doesn't write to EEPROM in its current state. Line #39 needs to be changed:

writeFlag := 0 ' change this to one to write to EEPROM

Change the zero to a one (or any other non-zero number) to enable writing to EEPROM.

The program will display the size of the PASM image as well as the memory location of the code. If one were to use this program on their own code, they would want to write down the size of the PASM image to use with the program reading the image from EEPROM.

I should have mentioned earlier how to use the demo program. I originally wrote this demo to test my data logging program. I used the data logging program to log data from two balances and a spectrometer. This is the reason I named the serial ports the following:

' Com enumeration
  #0, DEBUG_COM, ANALYTICAL_BALANCE_COM, LARGE_BALANCE_COM, SPECTROMETER_COM

Presently the "LARGE_BALANCE_COM" is disabled so I could test the driver with unused ports and zero sized buffers. The active ports may be linked together by connecting the tx line of one port to the rx line of a different port. The debug line will identify the various ports both as they transmit and as they receive.

Sorry, I forgot to set the baud of the two programs to the same value. The "Load" program uses a baud of 115,200 and the "FromEeprom" program uses 57,600. I've had trouble using the four port driver at 115,200bps when more than one port is active. Hopefully ksltd's changes will allow faster communication speeds.

Duane Degn · 2015-02-26 17:54

ksltd wrote: »

The flow control code does add a pile-o-code distributed through the code for each port.

My own evaluation of reducing code size if you really need the quad port driver is that you can reclaim all of the assembly language storage provided you don't need to reinitialize once you're running. That's a much cleaner solution than moving the thing externally and doing the EEPROM load.

As I mentioned in an earlier post, there isn't any benefit to moving the PASM to EEPROM if the serial driver is the only PASM being moved. If you have several PASM sections stored in EEPROM then there is a benefit from the saved RAM.

ksltd · 2015-02-26 18:06

You can save a pile of code by never using cognew. It's a dumb idea, just like locknew. If those things ever fail, your program is toast. Instead, statically assign all your cogs and pass them into the various start/init/stop routines. You'd be shocked what that saves ...
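
As a small illustration of the static assignment suggested above, a cog can be launched by number with coginit instead of cognew. The cog number and method name below are arbitrary choices for the sketch, not something from the driver.

```spin
CON
  SERIAL_COG = 7                ' hypothetical fixed cog number chosen by the application

PUB StartFixed(pasmAddress, parAddress)
  '' Launch the PASM image in a fixed cog instead of asking cognew for a free one.
  coginit(SERIAL_COG, pasmAddress, parAddress)
```

Note that coginit will restart the named cog even if it is already running something, so the application has to keep its own cog assignments straight.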

Tracy Allen · 2015-02-26 18:28

Tim's original code did not support handshaking, but he added it by popular demand, requests here on the forum. I left it in, but I must say I could count on one hand the times that I've actually used it in an application. A larger buffer usually seems to be a better solution. I have used it with XBees. My feeling is that all four ports would not need to support it, but who knows. This is one of the few FDS objects that supports flow control.

The handshake code does not entail a performance hit when it is not enabled. The pasm initialization patches the code so that it completely short circuits unused co-routines. On the other hand, when handshaking is enabled, there is obviously a speed hit. RTS flow control has especially high latency. After a byte is received, there are two hub accesses to write the byte received and to write the head pointer, and those writes are optimally spaced to hit the sweet spot. Then a co-routine switch. But with RTS flow control enabled there is a third hub access to read the tail pointer, albeit on the sweet spot, and a bunch of other stuff to control the RTS pin before it gets to the co-routine switch. Duane, it is pretty easy to jettison the flow control code. I can help with that.

Wayne, it could be mutually beneficial if you see fit to post your optimizations here, and/or to the OBEX. One size definitely does not fit all.

For low speed data I'd rather keep the uarts on the Propeller than provide real estate for external uart chips, but I can certainly see why you would want external hdwr for your higher speed protocol engines.

ksltd · 2015-02-26 19:22

Yeah, that's not an unreasonable approach - but it does chew up considerable memory.

Here's my 1x serial object/driver/thing. It's 200 longs all up, can be stopped and restarted and runs at baud rates to 460800. There's a bit of upside remaining.

Serial_1X.spin

I'll try to get to the 4x version soon ... ish ...

ksltd · 2015-02-26 19:23

One other thing I neglected to mention is that in the calculation of Ticks-Per-Bit and in the Shift-Offset, rounding is critical for correct operation at high baud rates.

All of the optimizations I mentioned are included in the 1X thing posted above.
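
A quick sketch of what rounding means for the ticks-per-bit calculation, using a hypothetical helper method. Adding half the divisor before dividing rounds to the nearest tick instead of truncating. The shift-offset calculation is not shown since the thread doesn't give its formula.

```spin
PUB TicksPerBit(baud)
  '' Clock ticks per bit, rounded to nearest rather than truncated.
  return (clkfreq + baud / 2) / baud
```

For example, with an 80 MHz clock at 460800 baud the exact value is about 173.6 ticks; plain division gives 173 while the rounded form gives 174.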

xgaProp8 · 2016-02-08 02:50

Hello, I guess I'm "bumping" this thread a little. I recently started a Propeller project using the amazing single-cog XGA video driver by @kuroneko, the four-port RS-232 driver in this thread, and some glue logic. But I hit a problem tonight, and it appears that the quad-port RS-232 driver is at the very least somehow messing with memory it's not supposed to.

Basically, I have one cog running a 5-minute "live timeout" on the VGA driver (I.E. not using WaitCnt), while scanning one of the serial ports for data. (Upon receipt of data, it'll turn the display back on.) But the problem occurs with the otherwise excellent (customizable and small) four-port RS-232 library in this thread: incoming serial data causes my display timeout to immediately decrement to zero. If I drop in the (ginormous) "fullDuplexSerial4port" object, no amount of input data causes any problem...except for the memory usage! I have no clue what is going on.

At the moment, I am only initializing one port in this driver. I also am not providing a transmit buffer, but it doesn't change anything if I give it a transmit buffer. (I am not trying to send anything.)
serial.initialize
serial.Define_Port(3, 9600, serial#No_Pin, 0, 0, 7, serTouchBuffer, 16) 'NO transmit buffer given.
serial.Start

Here's a code snippet of my delay timer. It doesn't matter if I read the data from the serial port driver (with serial.Get) or not.
long SecChaser, DisplayTime

vga.start ' start the video driver.
DisplayTime := VGATimeout ' specify that the display has just been turned on
SecChaser := cnt

repeat
  'Display timeout
  if (cnt - SecChaser) > clkfreq
    SecChaser := cnt
    if DisplayTime > -1
      DisplayTime-- ' TODO: Optimize. There's probably a better way, "Decrement if not negative"

    if DisplayTime == 0
      vga.str(String("Display Off"))

  vga.SetPos(5, 16)
  PrintDec(DisplayTime)

Basically, send some serial data to the Propeller, and "DisplayTime" decrements to zero in a jiffy. Does anyone know what's going wrong here?

xgaProp8 · 2016-02-08 12:55

My bad. Perhaps 10 minutes after writing the post, I realized that I'd forgotten the "@" when passing "serTouchBuffer" to the routine! Fix that to "@serTouchBuffer", and it works swell.
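
For anyone hitting the same symptom, the difference between the two forms is sketched below. Without the "@", Spin passes the value stored in the variable, so a driver expecting a pointer treats that value as a hub address and writes received bytes there, which would explain unrelated variables being clobbered. The declaration of "serTouchBuffer" is assumed here since the thread doesn't show it, and the method is purely illustrative.

```spin
VAR
  byte serTouchBuffer[16]             ' assumed declaration; the thread does not show it

PUB ShowDifference | valuePassed, addressPassed
  '' Illustrates what each form hands to a method expecting a buffer pointer.
  valuePassed   := serTouchBuffer     ' contents of serTouchBuffer[0] (a byte value)
  addressPassed := @serTouchBuffer    ' hub address where the buffer lives
```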
