Locks with PASM

JohnR2010 · 2017-09-25 14:07

I sheepishly stand in your office doorway, looking at the floor, knowing full well I should know the answer to this but I’m just not 100% sure.
Say I have two PASM apps sharing the same long in main memory. Do I need to use locks to make sure they have exclusive access to the long as each cog independently reads and writes to it with RDLONG and WRLONG? I know if this was all SPIN code I wouldn’t need locks for access to a single byte, word, or long. Just not sure if the same is true for PASM.
I have some weird behavior and am just trying to track it down. I have tried with and without locks and no change. After rereading the lock section of the manual for the 3rd time I’m pretty sure I don’t need them if I’m accessing a single long, both in PASM and SPIN.

Thanks

kwinn · 2017-09-25 14:44

Since each cog accesses hub ram on a different clock cycle there is no physical need for locks. If multiple cogs are communicating with each other then locks may be required to avoid one cog reading or writing over data from another cog.

For instance, if cog A writes data from cog B and cog C to an LCD, locks could be used so that only one of those cogs at a time has access to the variable that cog A writes to the LCD.

Phil Pilgrim (PhiPi) · 2017-09-25 15:19

If the two cogs are doing read-modify-write operations on the same hub element then, yes, you do need to use locks: set the lock ahead of the read; clear it after the write. This is true of both Spin and PASM.

-Phil

JohnR2010 · 2017-09-25 15:42

I feel a little better for asking now. I'm not hearing the same answer from both of you. I understand the need for locks when it comes to accessing a block of memory (several longs). But I have always understood that if you're accessing one long in main memory the hub will guarantee exclusive access to that long. Phil, see this sentence from page 122 of the Propeller Manual v1.2:

The Hub prevents such collisions from occurring on elemental data (such as a byte, word or long) at every moment in time, but it cannot prevent “logical” collisions on blocks of multiple elements (such as a block of bytes, words, longs or any combination of these).

I have found this to be the case when working with SPIN, I just wasn’t sure if it was also true for PASM. I’m pretty sure it is the same. The rule I have always used for my SPIN code is to use locks when two or more cogs are accessing the same block of memory. But if they are accessing only one long that is not necessary as the hub will only allow one cog to write to that long at a time.

Heater. · 2017-09-25 15:42

JohnR2010,

What are you doing standing there boy? What, cat got your tongue? Come in. And close the door behind you.

Don't you know it's bad manners to disturb the staff during their afternoon nap... er...tea break. Bend over and take three strokes of my new cane.

Now, what seems to be the problem? Stop sniveling boy, speak up.

What's that? You're having problems with atomic operations and locks in the Propeller? You were sleeping in class again weren't you boy? You have been warned about that. Bend over and take another three strokes.

It's simple, pay attention boy, reading and writing bytes, words and longs to hub memory are each atomic operations and don't need any locks. Of course you had better be sure the reader and writer are exchanging the same type else you will be in trouble and it's detention again for you.

That's good enough if you are only communicating simple values between two COGs. But what if you want them synchronized, to be sure the reader sees every change the writer makes? Then you had better have a flag that is set TRUE by the writer when it writes and set FALSE by the reader when it reads. The writer never writes if the flag is TRUE, the reader never reads if the flag is FALSE. That way they stay in sync. Get that wrong boy and it's another three strokes of the cane and possibly expulsion.
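A minimal sketch of that flag handshake, written in Python as a single-threaded model rather than real cog code. The names `Mailbox`, `try_write`, `try_read` and `exchange` are made up for illustration; each call stands in for one cog's turn at the hub, and the `schedule` string simply decides whose turn comes next.

```python
class Mailbox:
    """One shared value plus a 'full' flag; one writer, one reader."""

    def __init__(self):
        self.full = False   # TRUE: a value is waiting to be read
        self.value = 0

    def try_write(self, v):
        """Writer side: only writes while the flag is FALSE."""
        if self.full:
            return False
        self.value = v      # store the data first...
        self.full = True    # ...then publish it by setting the flag
        return True

    def try_read(self):
        """Reader side: only reads while the flag is TRUE."""
        if not self.full:
            return None
        v = self.value      # fetch the data first...
        self.full = False   # ...then hand the slot back to the writer
        return v


def exchange(items, schedule):
    """Run writer ('W') and reader ('R') turns in the given order."""
    box, pending, seen = Mailbox(), list(items), []
    for who in schedule:
        if who == "W" and pending and box.try_write(pending[0]):
            pending.pop(0)
        elif who == "R":
            got = box.try_read()
            if got is not None:
                seen.append(got)
    return seen
```

For example, `exchange([1, 2, 3], "WWRRWRWWRR")` hands over every item exactly once, in order, even though the writer and the reader each take several turns in which they can do nothing. No lock appears anywhere; the flag itself says who owns the slot.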

Of course that means that the reader and writer block on every exchange. That can be eased by using a FIFO between reader and writer. That's probably a bit complex for you boy, but if you are interested Master Chip has a fine example in his FullDuplexSerial module. Go to the OBEX and look it up.

Of course if you have such a flag or FIFO in place for exchanging messages between two COGs you don't need any locks.

What? Speak up boy, stop mumbling. You have more than one reader and/or one writer and you are exchanging complex data structures between them? Unheard of impertinence. You are getting too big for your boots boy. Bend over, take another three strokes.

Now, get out of here and leave us to our nap...er...tea break. Before I reach for my cane again. And close the door behind you!

....

Seriously though, one rarely needs locks in Propeller projects if there is only one reader and one writer for any particular piece of data. I wager there is not one example in OBEX.

Oh, and that is pretty much how our teachers used to interact with us when I was a kid.

JohnR2010 · 2017-09-25 16:00

Big Smile!! Why are songs from The Wall all of a sudden playing in my head? I knew that was coming as soon as I hit Post! It has now been properly beat into my thick head! Glad you used a cane instead of your bare hand. That would have just been weird!
I've been banging my head on the wall all weekend. Found the problem just after I hit post. Had nothing to do with locks. I often find after I take the time to clearly communicate the problem to you guys, the answer comes on its own.
Thanks Mr. Teacher!

Mike Green · 2017-09-25 16:18

Heater's answer is cute. More simply put ... If you have a variable or variables shared by two or more cogs, you need to lock access before accessing the variable(s) and release it afterwards. The only exception is when only one cog changes a variable that other cogs may read. The oft-used example of this is a pointer into a buffer with one cog writing into the buffer and another cog reading from it using a second pointer.

Heater. · 2017-09-25 16:34

JohnR2010,

Sorry, could not resist it. Glad you appreciate it. Looks like you have the idea down now.

As is often the case, by the time you can articulate the problem your brain has figured out the answer in the background!

Interestingly, locks are so rarely used in Propeller projects that Chip was asking a while back if he could leave them out of the P2 design. As Mike says, they are only needed when there is more than one reader or writer to a data area that is supposed to be consistent.

Luckily the locks stayed in the P2, else my multi-core FFT would never work.

Phil Pilgrim (PhiPi) · 2017-09-25 16:46

Here's a concrete example:

Suppose you have a byte in the hub that tracks the state of multiple processes. Each bit in the byte is a flag of some sort. If process A wants to update the byte by changing one or more bits, it has to read the byte, modify the bits, and write it back. Now, suppose, between reading the byte and writing it back, process B needs to do the same. So it reads the byte, then process A writes its new value while process B is modifying its bits in cog memory prior to writing it back. But what gets written back by process B completely ignores the changes that A made. That's why each process has to set a lock before reading the byte and keep it set until after writing the byte.
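The interleaving just described can be replayed step by step. This is only a sketch in Python, not cog code: the function `run`, the `schedule` string and the two-bit flag layout are assumptions made for illustration. Each letter in the schedule gives one process a single step (read, modify, or write back).

```python
def run(schedule, use_lock):
    """Replay read/modify/write steps of processes A and B in order."""
    hub = {"flags": 0b00, "lock": False}
    # Each process sets its own bit: A sets bit 0, B sets bit 1.
    procs = {"A": {"bit": 0b01, "pc": 0, "tmp": 0},
             "B": {"bit": 0b10, "pc": 0, "tmp": 0}}
    for name in schedule:
        p = procs[name]
        if p["pc"] == 0:                  # step 0: (take lock,) read the byte
            if use_lock:
                if hub["lock"]:
                    continue              # lock busy: retry on the next turn
                hub["lock"] = True
            p["tmp"] = hub["flags"]
            p["pc"] = 1
        elif p["pc"] == 1:                # step 1: modify the private copy
            p["tmp"] |= p["bit"]
            p["pc"] = 2
        elif p["pc"] == 2:                # step 2: write back (, release lock)
            hub["flags"] = p["tmp"]
            if use_lock:
                hub["lock"] = False
            p["pc"] = 3
    return hub["flags"]
```

With the schedule `"ABABABBB"` and no lock, both processes read the byte before either writes, so the result is `0b10` and A's bit is gone. With the lock, B cannot read until A has written, and the result is `0b11`.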

There are cases where two cogs write the same hub data where locks are not necessary. A common one is this:

A Spin cog does a remote procedure call to a PASM cog by writing parameters to the hub, then sets a byte telling the PASM cog what to do with the data. It then waits for the command byte to be cleared before accessing the answer from another location. The PASM cog sits in a loop, waiting for the command byte to be set. Once it sees a non-zero command, it reads the parameters, does its calculations, then writes the answer back to the hub. After doing so, it clears the command byte, telling the "calling" process that it has finished.

Now, there's even an exception to the above example. If the PASM cog services more than one other Spin cog, the Spin cogs will have to use a lock to keep from trashing the common command/parameter/result hub area.

-Phil

Tracy Allen · 2017-09-25 17:47

Another example of read/modify/write. I have a PASM cog that counts transitions on pins. When it detects a transition it executes the following code, cog A:

  if_c  rdlong  t1, hubadrs     ' read the existing count value
  if_c  add     t1, #1          ' increment the count
  if_c  wrlong  t1, hubadrs     ' store it back to the hub array

At some point cog B will reset the count back to zero. Without a lock there is a chance that cog B hits it exactly between the read and the write, so cog A writes back its stale count plus one and the reset is lost. There are workarounds, but the lock provides a simple mechanism.
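To make the timing window concrete, here is a small Python replay of the three hub events involved. It is a sketch under assumptions, not the actual driver: the event names, the starting count of 5, and the function `final_count` are invented for the example, and with `use_lock=True` cog A is assumed to take the lock just before its read and drop it right after its write.

```python
def final_count(events, use_lock, start=5):
    """Replay hub events in order and return the resulting hub count."""
    count = start
    t1 = 0                   # cog A's private copy (register t1)
    a_holds_lock = False
    reset_waiting = False    # cog B spinning on the lock
    for ev in events:
        if ev == "A_read":
            a_holds_lock = use_lock
            t1 = count
        elif ev == "A_write":
            count = t1 + 1
            a_holds_lock = False
            if reset_waiting:            # B finally gets the lock
                count, reset_waiting = 0, False
        elif ev == "B_reset":
            if a_holds_lock:
                reset_waiting = True     # must wait for A to finish
            else:
                count = 0
    return count
```

Replaying `["A_read", "B_reset", "A_write"]` without the lock leaves the count at 6, so the reset never took effect. With the lock, the reset is deferred until A has written and the count ends at 0.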

Locks also provide a good mechanism for handling things like an I2C bus that needs to be hammered from different sides of the wall.

JohnR2010 · 2017-09-25 20:59

Mike, Phil, Tracy, I completely see and understand what you're telling me.

What fostered this question is, I read someplace that the PASM command WRLONG takes 8 to 23 clock cycles to complete. I got off on a tangent thinking each clock cycle wrote only one byte of the long as main memory is addressable at the byte level. If two cogs were writing to the long at the same time there could be a collision. But instead what I guess is happening, is when the hub gives a cog access to main memory it must allow it to transfer a full long before the next COG is given access??

I had a single long in memory I was using by all my cogs as a debug register. The value was getting corrupted and for the life of me I couldn’t figure out why. So, I started down this road of needing a lock to keep two cogs from writing to it at the same time. Turns out the long wasn’t getting corrupted it was just being clocked out faster than my logging terminal could display it. Found the bug in my logging terminal and she is smoking now.
I think I have a good grasp on the need for locks. What I don’t have my head around is how data is moved in and out of a cog when it has access to hub memory. Thanks.

Phil Pilgrim (PhiPi) · 2017-09-25 21:15

Bytes, words, and longs, are all read-from/written-to the hub in the same amount of time when the cog's turn comes around. IOW, the "atomic window" to the hub can be as wide as 32 bits or as small as 8 bits -- at least as far as what a programmer needs to know. How that's actually implemented in the hardware, though, I can't begin to guess.

-Phil

Phil Pilgrim (PhiPi) · 2017-09-25 21:57

Just as a side note: the Propeller provides eight hardware locks. In point of fact, only one hardware lock is necessary, since it could act as a master lock for accessing a whole bunch of software slave locks. But that could lead to inefficiencies if everybody was pounding the master lock to access the slaves. Chip did it right by providing eight, even though it was not strictly necessary.

But, anyway, if you ever need more than eight, the master/slave approach is still available.
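A rough model of the master/slave idea in Python, for illustration only. The class `SoftLocks` and its methods are hypothetical names; `master` stands in for the single hardware lock and `slaves` for a table of ordinary flags in hub memory. The model is single-threaded, so the spin loops never actually wait here.

```python
class SoftLocks:
    """Many software locks guarded by one (modelled) hardware lock."""

    def __init__(self, count):
        self.master = False             # stands in for the one hardware lock
        self.slaves = [False] * count   # plain flags in shared memory

    def _lockset(self):
        """Test-and-set on the master: returns the previous state."""
        was, self.master = self.master, True
        return was

    def try_acquire(self, n):
        """Try to take software lock n; True means the caller now owns it."""
        if self._lockset():
            return False                # master busy, caller retries later
        got = not self.slaves[n]        # read/modify/write under the master
        self.slaves[n] = True
        self.master = False             # release the master straight away
        return got

    def release(self, n):
        """Give software lock n back, again under the master."""
        while self._lockset():
            pass
        self.slaves[n] = False
        self.master = False
```

The master is held only for the few steps needed to test and update one slave flag, which is what keeps the contention on it tolerable. Releasing also goes through the master so that it cannot land in the middle of another cog's test-and-update of the same flag.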

-Phil

Mark_T · 2017-09-26 00:30

There are a few scenarios for read/modify/write where a lock isn't needed because the rules used
determine who is in control of the variable at each stage, and the control is passed on explicitly.

For instance a "go" flag for a cog (call it A) might be only allowed to be set by the client cog (call it B)
when the flag is zero. And only A is allowed to clear the flag when it is non-zero. In other words
the flag being zero means only B can write it, and the flag being non-zero means only A can write
it (and only with zero).

This sort of handshake is quite limited, and you have to be careful to stick to the rules to avoid the
need for a lock.

For an interesting generalization of this sort of communication by a flag see the 100 prisoners/light bulb
puzzle https://cut-the-knot.org/Probability/LightBulbs.shtml

JohnR2010 · 2017-09-26 12:28

Phil Pilgrim (PhiPi) wrote: »

Bytes, words, and longs, are all read-from/written-to the hub in the same amount of time when the cog's turn comes around. IOW, the "atomic window" to the hub can be as wide as 32 bits or as small as 8 bits -- at least as far as what a programmer needs to know. How that's actually implemented in the hardware, though, I can't begin to guess.

-Phil

That makes sense. From what I have observed I thought it could transfer a full long before going on to the next COG. Thanks.

Mark_T · 2017-09-26 12:50

Once you have executed a hub instruction your cog is in a known state w.r.t. future cog instructions, so typically
you can rely on code like:

        lockclr lock   ' takes 8 to 23 cycles
        nop
        nop
        lockset lock  ' takes exactly 8, since we are synchronized by previous lockclr

wmosscrop · 2017-09-26 14:44

I recently ran into an issue with locking involving more than 2 cogs sharing the same lock.

The examples I've found say to do this:

getlock    lockset lock WC   ' Try to obtain lock, return current lock status in C
if_c       jmp      #getlock ' Someone else has the lock, try again
...
           lockclr lock      ' Release lock

The problem that I found was that it appeared - and I use that word deliberately - that one of the cogs was able to "hog" the lock while it did its processing. In other words, one of the cogs managed to be the only cog to actually obtain the lock; the other cogs couldn't obtain the lock because each time their "slice" came around the other cog already obtained the lock.

I suspect it's because of the round-robin hub access and a matter of timing. The issue was extremely sporadic but I was able to fix the issue by having each cog wait a different amount of time before reattempting to obtain the lock:

getlock    lockset lock WC   ' Try to obtain lock, return current lock status in C
if_nc      jmp      #gotlock ' We have the lock
           (waitcnt for different periods per cog here)
           jmp      #getlock ' Try again
...
gotlock    (do something useful)
...
           lockclr lock      ' Release lock

Comments? I'm probably all wet on this, I'm sure someone will point out a basic flaw in my premise.

Walter

Mike Green · 2017-09-26 15:02

You've got it right. It's possible sometimes for one cog (process) to hog a lock (resource). In this case it's related to the round-robin hub access and the short time the lock is grabbed. In other systems, the requests for a resource are queued up so each process gets a chance (takes a turn). You can also establish a minimum time from the release of a lock to the next attempt to request it with the minimum greater than the round-robin time.

I think that's right.

Mark_T · 2017-09-26 16:18

So long as after releasing the lock you don't immediately try to re-acquire it this allows a second cog in.

However with 3 or more cogs competing you have the issue of the sequencing of hub ops being round-robin
like this - with more than one cog waiting how to guarantee fair access...

Often the way fair-access gets implemented is to use a spinlock (like the Propeller locks) to control access to a more
complicated queue/lock structure. You ensure that the time you hold the spinlock for is small. The main queue/lock
controls access to the resource, the spinlock controls operations on the main lock.
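One possible shape for such a structure is a ticket queue, sketched here in Python. This is an assumption about what the "main lock" could look like, not something taken from the thread: `TicketLock`, `take_ticket`, `my_turn` and `release` are invented names, and the model is single-threaded, so the spin loop never really waits.

```python
class TicketLock:
    """First-come, first-served lock built on a briefly held spinlock."""

    def __init__(self):
        self.spin = False       # stands in for a hardware lock, held briefly
        self.next_ticket = 0    # guarded by self.spin
        self.now_serving = 0    # written only by the current holder

    def take_ticket(self):
        """Join the queue; the spinlock covers just this read and increment."""
        while self.spin:        # on hardware this test-and-set is one LOCKSET
            pass
        self.spin = True
        ticket = self.next_ticket
        self.next_ticket += 1
        self.spin = False
        return ticket

    def my_turn(self, ticket):
        """A single read of shared state, so no lock is needed to poll."""
        return self.now_serving == ticket

    def release(self):
        """Pass the resource to whoever holds the next ticket."""
        self.now_serving += 1
```

Each cog takes a ticket once, then polls `my_turn` until its number comes up. The resource is handed over in ticket order, so a cog that releases and immediately asks again goes to the back of the queue instead of winning the race.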

Heater. · 2017-09-26 17:24

wmosscrop,

...that one of the cogs was able to "hog" the lock while it did its processing.

A cog should not hog the lock whilst it does its processing.

Locks should be acquired and released as quickly as possible. Just long enough to read whatever is coming in and/or write whatever is going out. The processing part should be done when not holding the lock.

As noted above, if there is only one reader and one writer of a shared item locks are not required at all so the problem cannot arise.

Is there even an object in OBEX that uses locks?

Dave Hein · 2017-09-26 17:40

Yes, I use it in the CLIB object to allow multiple cogs to write to serial out. I also use it in spinix so that multiple cogs can do SD file I/O.

EDIT: I also use a lock in the threaded chess program. One cog writes a set of chess moves onto a queue, and then each cog pops moves off the queue until it is empty.

EDIT2: PropGCC allocates a single lock when a program first starts up. The lock is used to implement mutex locks and other thread-safe functions.

EDIT3: Here's a list of 29 OBEX objects that I found by grep'ping for lockset. Grep only works on 8-bit ASCII files, so this list does not include any 16-bit UNICODE files.

1Mbaud FullDuplexSerial (Fixed baud-rate)/asm_write_ex.spin: repeat while (lockset(lockId))
3-Axis CNC Control Package/SD-MMC_FATEngine.spin: repeat while(lockset(cardLockID - 1))
640 x 480 VGA Tile Map Driver w_ Mouse Cursor/VGA64_TMPEngine.spin: repeat while(lockset(lockNumber - 1))
74C922 Keypad Driver/74c922buffer.spin:repeat until not lockset(SemID) ' Lets lock the memory
74C92X Keypad Buffer and Driver/74c92Xbuffer.spin:repeat until not lockset(SemID) ' Lock it
Combo PS2 Keyboard and Mouse Driver/PS2_HIDEngine.spin: repeat while(lockset(keyboardLockNumber - 1))
DS1307 RTC Driver/DS1307_RTCEngine.spin: repeat while(lockset(lockNumber - 1))
FAT16_32 Full File System Driver/Full File System Driver with DS1302 RTC/DS1302_SD-MMC_FATEngine.spin: repeat while(lockset(cardLockID - 1))
Full Duplex Serial Port Driver/Full-Duplex_COMEngine.spin: while(lockset(lockNumber - 1))
Generic I2C EEPROM Driver/I2C_ROMEngine.spin: repeat while(lockset(lockNumber - 1))
ILI9325 320x240 TFT driver/touchSPI.spin:select lockset lock wc ' try to claim spi lock
IR Kit/ir_reader_nec.spin: repeat while lockset(lock)
KISS WAV Player Driver/SD-MMC_FATEngine.spin: repeat while(lockset(cardLockID - 1))
KISS WAV Recorder Driver/SD-MMC_FATEngine.spin: repeat while(lockset(cardLockID - 1))
Lock-Bit Demo/simple_multicore_demo3d.spin: repeat until lockset(LockID) == False 'Wait in this loop for lock to open
Nordic nRF2401 Rf Tranceiver Handler/TRF24G.spin: repeat until not lockset( rf_sem )
Octal Button Debouncer/D8C_BUTEngine.spin: repeat while(lockset(lockNumber - 1))
PROPSHELL/PROPSHELL-master/Full-Duplex_COMEngine.spin: while(lockset(lockNumber - 1))
Propeller Backpack TV Overlay/prop_backpack_tv_overlay2.spin: repeat while lockset(timelock)
Pulsadis detector/pulsadis dual processor for Obex/Full-Duplex_COMEngine.spin: while(lockset(lockNumber - 1))
SYSLOG - Multicog debug_log to SD_Serial/DS1307_RTCEngine.spin: repeat while(lockset(lockNumber - 1))
Servos and Encoders Calibration/SD-MMC_FATEngine.spin: repeat while(lockset(cardLockID - 1))
Settings/driver_socket.spin: repeat while NOT lockset(SocketLockid)
Switch Debounce/DebounceCog.c: lockset(SemID);
Trending Barometer/I2C_ROMEngine.spin: repeat while(lockset(lockNumber - 1))
Wheel_Controller/Wheel_Controller.spin: repeat until not lockset(mutex_id)
Wiimote IR blob tracking camera/wiicamera.spin: add clockset,clkpin_ 'Set up clock
mdb_RealTimeClock/MDB_RealTimeClock.spin: repeat until not LockSet(gLockId)
uSDPropLoader/DS1307_RTCEngine.spin: repeat while(lockset(lockNumber - 1))

wmosscrop · 2017-09-26 18:01

Heater. wrote: »

wmosscrop,

...that one of the cogs was able to "hog" the lock while it did its processing.

A cog should not hog the lock whilst it does its processing.

Locks should be acquired and released as quickly as possible. Just long enough to read whatever is coming in and/or write whatever is going out. The processing part should be done when not holding the lock.

Yes, agreed. And I was holding the lock for the shortest time possible in each of the cogs. And there were different processing times between the LOCKCLR and the next LOCKSET in each cog. Which I thought would prevent this from occurring.

But apparently there was some pattern that the 3 cogs fell into that caused the cog (or cogs) to hog the lock. I think at least part of the issue is that each cog spun on the LOCKSET instruction until the lock was acquired.

Heater. · 2017-09-26 18:40

@wmosscrop,

...apparently there was some pattern that the 3 cogs fell into that caused the cog (or cogs) to hog the lock.

That is the part I'm trying to get my head around.

Given that locks are a HUB resource and given that HUB resources are accessed in a round the roundabout fashion, I would expect that when any COG releases a lock the next COG around the roundabout can always acquire it.

@Dave Hein

That is quite some list. Good job Chip was talked out of removing locks for the P2!

wmosscrop · 2017-09-27 14:29

When a LOCKSET (successful) or LOCKCLR is executed, at what clock tick during the execution of these instructions is the actual hub lock bit altered?

The Propeller documentation states that only one cog at a time can execute a LOCKxxx instruction.

Does this imply that they take 2 or fewer ticks to execute (which allows for the next cog to execute their LOCKxxx), or is there an internal locking mechanism that prevents the execution of future LOCKxxx instructions until the current instruction has completed?

Mike Green · 2017-09-27 15:43

The LOCKxxx instructions are hub instructions. Each cog gets a time slot for either a hub memory access or LOCKxxx or other hub operation. This guarantees that only one cog can execute a LOCKxxx at a time.

wmosscrop · 2017-09-27 16:14

Correct, but my question is when is the bit actually affected?

I ask this because the manual shows the execution of the hub instruction spanning a total of 8 ticks (4 cog access times). I'm referring to Figure 1-3: Cog-Hub Interaction - Best Case Scenario.

It takes 8 ticks (from the point that the hub is sync'd) to execute a hub instruction.

This implies that there is a certain amount of decode/setup/etc. time before the hub instruction affects anything.

So, is the bit set on the first, second, ..., or eighth tick?

Heater. · 2017-09-27 16:50

Does it matter which clock exactly? It should not. When it comes to HUB ops the only thing that matters is what the other COGS can see.

I like to think that LOCKS have the same atomic nature as any WRLONG. If one COG gets or releases a LOCK the next COG around the HUB cycle sees that as such.

Otherwise we have chaos.

Dave Hein · 2017-09-27 16:57

LOCKSET and LOCKCLR are read/modify/write instructions. The next access of the lock will read the result of the previous LOCKSET or LOCKCLR. So if cog 1 executes a LOCKSET, cog 2 will read the lock as a one.
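In other words, each instruction changes the bit and reports what it was beforehand, as one indivisible step. A tiny Python model of that behaviour, with the class name `LockBit` made up for the example and the return value playing the role of the C flag:

```python
class LockBit:
    """Model of one lock bit with read/modify/write semantics."""

    def __init__(self):
        self.state = 0

    def lockset(self):
        """Set the bit; return its previous state (what lands in C)."""
        previous, self.state = self.state, 1
        return previous

    def lockclr(self):
        """Clear the bit; return its previous state."""
        previous, self.state = self.state, 0
        return previous
```

A return of 0 from `lockset` means the caller just took a free lock; a return of 1 means someone else already held it and nothing changed. That is why the usual acquire loop retries while C is set.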

Heater. · 2017-09-27 17:09

So the next question...

Let's say one COG acquires a lock, spends a long, long time doing whatever. Then releases and reacquires the LOCK as fast as it possibly can.

Then is it possible for that activity to block out another COG that is doing the normal, fast as possible, spin until LOCK, do minimal update to shared data and then release?

Somebody around here recently said they had to put random "back off" times in the acquisition of locks to prevent that happening. I just don't see how it can happen in the first place.

Phil Pilgrim (PhiPi) · 2017-09-27 18:06

It may be possible if the other cog's turn at the hub keeps occurring during the jump back to the lockset instruction. An atomic lockwait instruction would have prevented this.

-Phil

Heater. · 2017-09-27 20:19

That sounds like a challenge to all the PASM gurus out there.

Can anyone produce a simple program where:

a) One or more COGS repeatedly acquire and release a lock.

b) They manage to block the progress of another COG trying to acquire the same lock.

c) Each COG flashes an LED to indicate its progress.

d) All COGS running PASM of course.