If only two cogs are contending for a lock, then only a short time is needed between a LOCKCLR and the next LOCKSET. If the other cog wants the lock, it will be sitting in a tight loop doing a LOCKSET, so the cog that is clearing the lock just needs to wait slightly longer than that loop time. Of course, the loop time in Spin will be substantially longer than a loop in PASM.
It gets more complicated when three or more cogs are actively contending for the lock. In that case a random wait time should fix the problem. Or lock requests could be queued up in a FIFO. The write to the FIFO would need to be protected with a lock. However, it seems like some wait time would still be needed before a cog could add a new request to the FIFO.
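For concreteness, the random-wait part of that might look something like this in Spin. This is only a sketch -- the method name, seed handling, and delay range are mine, and later posts in this thread argue that a tight retry actually works better:

VAR
  long seed

PUB lockWithBackoff(id)
  seed := cnt                                   'seed the ? pseudo-random operator
  repeat while lockset(id)                      'TRUE means another cog still holds the lock
    waitcnt(2_000 + (?seed & $1FFF) + cnt)      'back off roughly 2_000..10_000 clocks, then retry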
Surely if only two COGs are in the game, then when one does a LOCKCLR it has to wait a whole HUB cycle before it can do a LOCKSET. Isn't that enough time for the other COG to always be able to get in?
The shortest PASM loop to wait for a lock is one Hub-op and one conditional jmp.
So a PASM COG can get hold of the lock at every HUB cycle it can access. If more than one PASM COG is waiting, no deadlock is possible: after a lock is released, every other waiting PASM COG gets at least one chance at it before the releasing COG comes around again.
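As a sketch, that shortest wait loop is just the following (lockid is an assumed register holding the lock number):

acquire       lockset   lockid wc       'hub op: C = 1 if the lock was already set
        if_c  jmp       #acquire        'still held -- retry on this cog's next hub slot
              '...critical section...
              lockclr   lockid          'hand the lock back when done

lockid        long      0               'using lock 0 for the sketch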
Not so with SPIN COGs. A 'repeat until lockset' in SPIN will not be able to catch every HUB-cycle.
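In Spin the wait is the usual one-line repeat (a sketch; protectedAccess and semID are assumed names, semID being whatever locknew returned):

PUB protectedAccess(semID)
  repeat while lockset(semID)           'spin until the lock was found clear
  '...touch the shared hub data here...
  lockclr(semID)                        'give the lock back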
It's certainly possible to create a PASM program that would prevent a Spin cog from getting the lock. The PASM cog would just need a set/clr cycle that matched the number of cycles in the Spin 'repeat until lockset' loop. Would this happen normally? Probably not, but it is possible. I think in general multiple cogs will not get locked out. Some cogs may require more attempts to acquire the lock, but it is unlikely that they would never get it.
The issue I had was with a cog running Spin that would never obtain the lock. There were three other cogs, all running PASM, accessing the same lock. It was very, very sporadic, maybe happening in 1 out of 10 executions of 10-minute runs.
But happen it did. And it was always the cog running Spin that would hang. I do believe that there was an inadvertent timing synchronization between the Spin routine and at least one of the PASM routines.
Adding the differing wait times to each cog has fixed the problem (it has never recurred).
Finally got a cog to hog a lock, effectively preventing three others from obtaining it. Here's code that does not hog the lock:
CON
  _clkmode = xtal1 + pll16x
  _xinfreq = 5_000_000

PUB start
  cognew(@locktest, 4)                  'Scope channel 1.
  cognew(@locktest, 8)                  'Scope channel 2.
  cognew(@locktest, 16)                 'Scope channel 3.
  cognew(@locktest, 32)                 'Scope channel 4.

DAT
              org       0

locktest      mov       dira,par        'PAR carries this cog's pin mask.
lockget       lockset   zero wc         'Try lock 0: C = 1 if it was already set.
              nop                       'Padding that sets the loop's hub phase.
              nop
              nop
'             nop                       'Uncomment this one to produce the lockout.
        if_c  jmp       #lockget        'Missed -- try again.
              or        outa,par        'Got it: pulse the pin for the scope...
              andn      outa,par        '...and drop it again.
              lockclr   zero            'Release the lock.
              jmp       #lockget        'Contend for it again.

zero          long      0
Here's the scope output:
If I uncomment the nop, this is what I get:
What seems to be happening is that in channels 2 through 4, the jmp is synchronized to the hub cycle, so they miss every opportunity to obtain a lock when their turn comes around. The code is admittedly pathological, but it does demonstrate the possibility of a cog not being able to obtain a lock.
I haven't yet been able to duplicate this behavior without instructions between the lockset and the jmp. Also, it takes four cogs with the same code for any one of them to get locked out. My mistake: it locks out even with two.
It occurs to me that if all cogs that use a lock immediately follow the lockset with the if_c jmp back to it, all cogs will get an equal shot at obtaining the lock. The reason is that each cog will be able to access the hub again on its very next turn, as Heater rightly pointed out. In my examples above, I put a delay between the lockset and the jmp, and this is what caused the problem. What this tells me is that any kind of random hold-off will create the problem rather than fix it.
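So relative to the listing above, the fix is just to keep the attempt and the conditional jump back-to-back and do any per-pass work only after the lock is won (a sketch, reusing the zero lock register from the listing above):

lockget       lockset   zero wc         'try the lock
        if_c  jmp       #lockget        'miss -- jump straight back, nothing in between
              nop                       'any extra work goes here, after acquisition...
              nop
              nop
              lockclr   zero            '...and the lock is released afterwards
              jmp       #lockget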
In Spin, the situation is different because, as was pointed out, you don't get to do a lockset on every hub cycle, so a Spin cog might still get locked out. Given more instruction space in the Spin interpreter, a lockwait function interpreted as the tight PASM loop would have taken care of the problem.
Thanks, Phil, for your time and effort researching this issue.
One thing that your test code doesn't do is keep the lock for any significant time. The issue I had was with locking messages that required decoding. So each cog has to obtain the lock, retrieve the message, decode it, determine if it is the intended recipient, and potentially process it - all before releasing the lock.
I believe that the issue in my case was that one of the PASM cogs spent just enough time doing the above - on a message that wasn't intended for it - to keep the other cogs from obtaining the lock. This, of course, meant that the intended recipient was never able to retrieve, process, and reset its message.
Especially since the intended recipient was the Spin cog - the cog that always had the issue.
It would have been nice to use a separate lock for each recipient. Lack of available cog space precludes this.
I think that my "random waits" code fixes the issue because each cog waits a different amount of time before retrying.
Of course, it's almost impossible to prove that it will always work.
Based upon my experiments, you should always retry as soon as possible, IOW in a tight loop. Waiting only increases the chance that you will not obtain the lock at all.
Here's an example:
CON
  _clkmode = xtal1 + pll16x
  _xinfreq = 5_000_000

PUB start
  cognew(@locktest, 4)
  cognew(@locktest, 8)
  cognew(@locktest, 16)
  cognew(@locktest, 32)

DAT
              org       0

locktest      mov       dira,par        'PAR carries this cog's pin mask.
lockget       lockset   zero wc         'Try lock 0: C = 1 if it was already set.
        if_nc jmp       #:doit          'Got it -- go hold it for a while.
              nop                       'Missed: hold off for ten nops...
              nop
              nop
              nop
              nop
              nop
              nop
              nop
              nop
              nop
              jmp       #lockget        '...then try again.
:doit         or        outa,par        'Raise the pin while the lock is held.
              mov       t0,cnt          'Note the starting count...
:waitlp       mov       t1,cnt
              sub       t1,t0           '...and spin until "keep" clocks have elapsed.
              sub       t1,keep wc      'C is set while elapsed < keep.
        if_c  jmp       #:waitlp
              andn      outa,par        'Drop the pin,
              lockclr   zero            'release the lock,
              jmp       #lockget        'and contend for it again.

zero          long      0
keep          long      100000
t0            res       1
t1            res       1
In this example, each cog keeps the lock for 100,000 clocks. The first cog gets the lock once, and the fourth one hogs it thereafter. If I remove the nops, every cog gets its turn.
Interesting insights Phil. I wonder if lockwait was ever under consideration.
Have I got this right in the current logic? The combination of lockset wc and if_c jmp #$-1 will always hit every hub rotation, and even adding one nop or one other 4-cycle instruction will be okay. But two or more instructions between the lockset and the jmp #$-1 will mean it misses one or more hub rotations. You can design, or inadvertently cause, another cog to sync in hub phase so that it will always release its lock and also pick it up again during that missed rotation - a hog-cog. Definitely a reason to keep that loop tight.
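For what it's worth, the arithmetic behind that works out as follows, assuming the usual P1 numbers (a hub slot for each cog every 16 clocks, a synchronized hub op at 8 clocks, 4 clocks per ordinary instruction):

  lockset + jmp                    8 + 4          = 12 clocks   'catches every hub rotation
  lockset + nop + jmp              8 + 4 + 4      = 16 clocks   'still catches every rotation
  lockset + nop + nop + jmp        8 + 4 + 4 + 4  = 20 clocks   'overshoots -- waits out an extra rotation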
One could use a djnz value,#$-1 instead of a straight jmp in order to make a failsafe for a locked-up situation, whatever the cause, but I haven't ever before thought that something like that might be necessary. It seems an unlikely bump in the night. Code that has random hub accesses or decision trees will tend to escape (?!) quickly, eventually.
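A sketch of that djnz variant, with assumed names (retries, budget, giveup) and an arbitrary recovery action:

lockget       lockset   zero wc          'try the lock
        if_nc jmp       #gotit           'C clear: the lock is ours
              djnz      retries,#lockget 'missed: count down and retry...
              jmp       #giveup          '...until the budget runs out

gotit         nop                        'critical section would go here
              lockclr   zero             'release the lock
              mov       retries,budget   're-arm the retry budget
              jmp       #lockget

giveup        cogid     temp             'example recovery: just stop this cog
              cogstop   temp

zero          long      0
retries       long      1_000
budget        long      1_000
temp          res       1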
Comments
Trying to reproduce the lockout, but so far unsuccessful -- even with four cogs vying normally, plus one "lock hog."
-Phil
Mind you, it's the same challenge, seeing as Spin is PASM under the hood.
Amazing that after all these years there are things to be discovered about the Prop.
It's the "lock bomb" !
Lock, read, unlock, process..., then lock, write, unlock, and repeat.
-Phil
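That pattern might look like this in Spin (a sketch; handleMessage, semID, and msgAddr are assumed names for the handler, the lock ID, and the hub address of the shared message):

PUB handleMessage(semID, msgAddr) | msg
  repeat while lockset(semID)           'take the lock just long enough to copy the message out
  msg := long[msgAddr]
  lockclr(semID)

  '...decode and act on the local copy with the lock released...

  repeat while lockset(semID)           'take it again only to write the result back
  long[msgAddr] := 0                    'e.g. mark the slot as consumed
  lockclr(semID)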