FYI: Potential Gotcha with multi-core processing

The original code was from a friend of mine asking for programming help.
The idea of the code is to sync two or more Propellers to an event initiated by a master Propeller. As it stands, the Propellers should already be able to communicate with one another over asynchronous serial.
So, the flow should go something like this ...
1) Master sends a broadcast command via asynchronous serial to all other Propellers that a synchronized event is about to happen.
2) This cues the Clients to 'look' for the sync pulse, initiating the event.
3) Over the same transmission line, the master Propeller sends out a 'sync pulse' which is NOT asynchronous serial.
4) The Clients see the pulse and execute code accordingly.
This seems pretty straightforward in the approach, but it won't work as described. Why?
Answer)
Since the FullDuplexSerial object, like other serial objects, runs in its own cog, a call that sends serial data can return to the calling cog BEFORE all of the data has actually been transmitted. This is especially true if your baud rate is slow. In this scenario the Master could send the request and then the sync pulse before FullDuplexSerial had finished transmitting, thus corrupting the serial data. As a result of the corrupted data, the Clients would never be able to determine that there was an upcoming sync event at all.
To visually see this, look at the attached image with reference to the commented corrected code.
Below is the original code sent to me trying to accomplish this task.
CON
  _clkmode = xtal1 + pll16x
  _xinfreq = 5_000_000
OBJ
  Ser : "FullDuplexSerial" 'Used for Debug output
CON
  RX = 31
  TX = 30
  SyncBit = 37
Pub Start
  Ser.start(RX,TX, 0, 9600)
  repeat
    Ser.str(string("Serial Enabled"))
    Ser.tx(13)
    Ser.tx(SyncBit)
    Ser.stop
    ' waitcnt(cnt+clkfreq/5000)
    'DirA[TX]~~ 'Set the direction bit to write out
    'OutA[TX]~~ 'Set the pin to High to signal all other devices that sync has occured
    ' waitcnt(cnt+clkfreq/5000)
    'DirA[TX]~ 'give up control of the pin
    Ser.Start(RX,TX,0,9600) 'restart serial
Below is virtually the same code with comments and corrections.
CON
  _clkmode = xtal1 + pll16x
  _xinfreq = 5_000_000
OBJ
  Ser : "FullDuplexSerial" 'Used for Debug output
CON
  RX = 31
  'TX = 30
  TX = 7 '<- for scope debugging only
  SyncBit = 37
Pub Start
  'Ser.start(RX,TX, 0, 9600) '<- can't just run inverted mode, need to run open-drain
  Ser.start(RX,TX, %0100 {<-mode open drain}, 9600)
  '' mode bit 0 = invert rx
  '' mode bit 1 = invert tx
  '' mode bit 2 = open-drain/source tx
  '' mode bit 3 = ignore tx echo on rx
  OutA[TX]~ 'preset TX Low ; this way all we need to do is control direction
            ' and it allows for open drain style control
  repeat
    Ser.str(string("Serial Enabled"))
    Ser.tx(13)
    Ser.tx(SyncBit)
    ' Ser.stop '<- don't do this! too much overhead in stopping/starting a new serial driver
    ' waitcnt(clkfreq/5000 + cnt) '<- You're not thinking three-dimensionally here. The FullDuplexSerial is running
    ' in its own cog, and especially since the baud is relatively low, the three
    ' 'Ser' commands above have executed and loaded the TX buffer within the
    ' FullDuplexSerial before all of the data has been sent. So.... you must
    ' wait a longer time interval so the data has a chance to be sent.
    waitcnt(clkfreq/40 + cnt)
    DirA[TX]~~ 'Set the direction bit to write out
    'OutA[TX]~~ 'Set the pin to High to signal all other devices that sync has occurred
    '<- See OutA[TX]~ above ; remember the serial data is INVERTED, so this would
    ' need to go LOW instead.
    waitcnt(clkfreq/5000 + cnt) '<- This delay is ok because we are out of the serial stream ; this delay
                                ' represents the width of the pulse you want to send
    DirA[TX]~ 'give up control of the pin
    waitcnt(clkfreq/40 + cnt) '<- This delay can be much shorter but is set to the same interval as the
                              ' entry delay above so that you can see in the attached scope image what
                              ' can happen with multi-processing.
    ' Ser.Start(RX,TX,0,9600) 'restart serial ; don't do that! related reason to the Ser.stop note above.
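As a rough check on the clkfreq/40 entry delay (25 ms), the time the driver needs to drain the message can be estimated from the byte count and the baud rate. This is only a sketch under a few assumptions: an 80 MHz clock, 11 bit times per byte (a conservative figure; plain 8N1 framing needs 10), and a 16-byte message ("Serial Enabled" plus CR plus SyncBit). The constant and method names are made up for illustration and are not part of FullDuplexSerial.

```spin
CON
  _clkmode = xtal1 + pll16x
  _xinfreq = 5_000_000

  BAUD          = 9600
  BITS_PER_BYTE = 11      ' assumption: conservative; plain 8N1 framing is 10
  MSG_BYTES     = 16      ' "Serial Enabled" (14) + CR (1) + SyncBit (1)

PUB DrainTicks : ticks
  '' Estimated upper bound, in clock ticks, for MSG_BYTES to finish leaving the TX pin.
  ticks := (clkfreq / BAUD) * BITS_PER_BYTE * MSG_BYTES
```

With these numbers the estimate comes out at roughly 18 ms, so the original clkfreq/5000 wait (0.2 ms) is far too short, while clkfreq/40 leaves some margin. In the loop, waitcnt(DrainTicks + cnt) could stand in for the fixed delay.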
Comments
or, use a separate "sync" pin, and then the slaves can waitpeq or waitpne on it after receiving the serial command
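A minimal sketch of the slave side of that idea. It assumes a hypothetical dedicated input, SYNC_PIN (pin 8, chosen arbitrarily), which the master drives high for the pulse, and it reuses SyncBit from the code above as the "get ready" command byte.

```spin
CON
  _clkmode = xtal1 + pll16x
  _xinfreq = 5_000_000

  RX       = 31
  TX       = 30
  SYNC_PIN = 8                  ' assumption: dedicated sync input, active high
  SyncBit  = 37

OBJ
  Ser : "FullDuplexSerial"

PUB Start
  Ser.start(RX, TX, 0, 9600)
  dira[SYNC_PIN]~                           ' make sure the sync pin is an input
  repeat
    repeat until Ser.rx == SyncBit          ' Ser.rx blocks until a byte arrives
    waitpeq(|< SYNC_PIN, |< SYNC_PIN, 0)    ' sleep until the sync pin goes high
    ' ... synchronized event goes here ...
    waitpne(|< SYNC_PIN, |< SYNC_PIN, 0)    ' wait for the pulse to end before re-arming
```

Because the pulse no longer shares the serial line, the master does not have to care whether the serial cog has finished transmitting before it raises the sync pin, as long as the slaves have had time to receive the command.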
"Since you are running at 9600bps, bit-bang the output (even Spin should be able to do it) - problem solved" - Right, that's what I would do, but this wasn't my project. Just helping out a friend.
This still doesn't mean that the problem completely goes away at higher baud rates, so it's something we all should be aware of.
cognew(....
do_something
But do_something depended on a variable being set by the code running in the new cog. A short delay between cognew and do_something solved the problem, but it took a lot of time to finally realize what the problem was.
John Abshier
or perhaps specify a LOCK or hub address the caller can wait on?
-Phil
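For the cognew case, waiting on a hub address might look something like the sketch below. All of the names (ready, value, Worker, DoSomething) are invented for the example; the only point is that the caller polls a flag the new cog sets once its setup is done, instead of relying on a fixed delay.

```spin
CON
  _clkmode = xtal1 + pll16x
  _xinfreq = 5_000_000

VAR
  long stack[32]        ' stack space for the Spin cog
  long ready            ' hub flag: set by the new cog when its setup is finished
  long value            ' the variable the caller depends on

PUB Main
  ready := FALSE
  cognew(Worker, @stack)
  repeat until ready    ' wait on the hub flag rather than guessing at a delay
  DoSomething

PRI Worker
  value := 42           ' whatever initialisation the caller is waiting for
  ready := TRUE         ' signal the caller last, after value is valid
  repeat                ' keep the cog alive

PRI DoSomething : v
  v := value            ' safe to use value here
```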
"Why would some one use serial to do this? Send a pulse, wait a short time and send the sync pulse." - The mechanism for intercommunication between Propellers is necessary to pass other information. Using the existing wires was an easy way to approach syncing the Propellers all together at once.
PUB txflush
'' Flush transmit buffer, then wait for last byte
'' to be sent.
  repeat until tx_tail == tx_head
  waitcnt(11*bit_ticks + cnt)
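To show where a flush like that would slot into the master loop, here is a minimal sketch. It assumes the txflush method has been pasted into a local copy of FullDuplexSerial, saved under the hypothetical name "FullDuplexSerialFlush"; as far as I know the stock object does not include it.

```spin
CON
  _clkmode = xtal1 + pll16x
  _xinfreq = 5_000_000

  RX      = 31
  TX      = 30
  SyncBit = 37

OBJ
  Ser : "FullDuplexSerialFlush"     ' hypothetical local copy with txflush added

PUB Start
  Ser.start(RX, TX, %0100, 9600)    ' open-drain TX, as in the corrected code
  outa[TX]~                         ' preset low; the pulse is made with dira only
  repeat
    Ser.str(string("Serial Enabled"))
    Ser.tx(13)
    Ser.tx(SyncBit)
    Ser.txflush                     ' returns only after the last byte has been sent
    dira[TX]~~                      ' start of sync pulse (line pulled low)
    waitcnt(clkfreq/5000 + cnt)     ' pulse width, about 0.2 ms
    dira[TX]~                       ' release the line
    waitcnt(clkfreq/1000 + cnt)     ' short gap before the next message
```

The advantage over a fixed waitcnt is that the wait tracks the actual message length and baud rate, so it does not need retuning when either changes.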
Phew, I thought you were about to announce the discovery of a horrible bug in the Propeller chip.
No, this is just a common or garden race condition that could just as easily happen with a single-core chip and some external hardware performing an action whose completion you have forgotten to wait for (or check for).
Perhaps the title of this thread should be changed.