Preventing deadlock when using waitcnt

godzich · 2024-02-07 14:17

Hi,

I need to pick your PASM-nerdy brains with the following.

My routine stops every 5 ms to synchronize:

waitcnt tick, BurstFreq

BurstFreq is 400_000 (x 12.5 ns), so I get a nice and precise 5 ms repetition rate for my routine.

Sometimes CNT has already passed the value in tick, and I need to prevent the dreaded 53-second deadlock. I know how to do this, but my problem is that I have only 5 opcodes free (after a lot of optimizing). I also need to cover the situation where CNT has advanced past the value in tick by up to 20 seconds, and CNT might also have wrapped around. I want the routine to just continue and rearm for a new 5 ms delay. Can this be done with 5 or fewer additional opcodes (also counting any additional variable that may be needed)?
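For reference, the numbers in this post can be checked with a few lines of arithmetic. This is only a sketch in Python rather than PASM, and it assumes an 80 MHz system clock, which is what the 12.5 ns tick implies. The names CLKFREQ_HZ, wrap_period_s and so on are made up for the illustration.

```python
# Timing arithmetic for the figures quoted in the post.
# Assumption: 80 MHz system clock, i.e. one CNT tick = 12.5 ns.
CLKFREQ_HZ = 80_000_000
BURST_FREQ = 400_000          # ticks between bursts, from the post
CNT_MODULUS = 1 << 32         # CNT is a free-running 32-bit counter

tick_ns = 1e9 / CLKFREQ_HZ                       # length of one tick in ns
burst_period_ms = BURST_FREQ / CLKFREQ_HZ * 1e3  # repetition period in ms
wrap_period_s = CNT_MODULUS / CLKFREQ_HZ         # time for CNT to wrap once
late_20s_ticks = 20 * CLKFREQ_HZ                 # 20 s of lateness in ticks

print(f"tick = {tick_ns} ns, burst period = {burst_period_ms} ms")
print(f"CNT wraps every {wrap_period_s:.2f} s")
print(f"20 s late = {late_20s_ticks} ticks, half range = {CNT_MODULUS // 2} ticks")
```

A missed waitcnt target is only matched again after CNT has gone all the way around, which is where the roughly 53-second stall comes from. Note also that 20 seconds of lateness is less than half of the 32-bit range.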

My approach is to store a copy of CNT immediately after the waitcnt opcode. Just before the next waitcnt I subtract this stored snapshot from the current CNT (like virtually zeroing CNT, which cannot actually be done) and then check whether this virtual reference is equal to or bigger than BurstFreq. If so, I know that the waitcnt instruction will deadlock, and to prevent it I add a small number to tick before executing waitcnt. This works but requires 8 registers in total (code and variable)...
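The snapshot idea described above can be modelled roughly like this. It is a Python sketch of the arithmetic only, not the actual PASM; elapsed_ticks and would_miss_target are hypothetical names. Masking to 32 bits stands in for the wraparound that the cog's unsigned subtraction does for free.

```python
MASK32 = 0xFFFF_FFFF  # CNT and all cog registers are 32 bits wide

def elapsed_ticks(cnt_now, snapshot):
    """Ticks since the snapshot, correct across a single CNT wraparound."""
    return (cnt_now - snapshot) & MASK32

def would_miss_target(cnt_now, snapshot, burst_freq):
    """True if a full burst period (or more) has passed since the snapshot,
    meaning the next waitcnt target already lies in the past."""
    return elapsed_ticks(cnt_now, snapshot) >= burst_freq

# CNT wrapped between the snapshot and now; only 0x200 ticks went by.
print(elapsed_ticks(0x0000_0100, 0xFFFF_FF00))
# More than one burst period went by, so the target has been missed.
print(would_miss_target(400_001, 0, 400_000))
```

Because the difference is treated as unsigned, this check stays unambiguous for any lateness shorter than one full wrap of CNT.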

I'm sure you propellerheads must have a more elegant and compressed solution.

Comments welcome,

Cheers, Christian

Wuerfel_21 · 2024-02-07 18:34

It'd be handy to know what you're actually doing in those 5ms - where does the variance in timing come from? Ideally post the whole code, there's probably a few cycle saves that can be performed.

What I'd do to avoid waitcnt blocking:

            mov temp, tick
            sub temp, cnt
            cmps temp, #8 wc ' Smaller values may work, too; I forget what the right one is
      if_ae waitcnt tick, BurstFreq
      if_b  add tick, BurstFreq

This will cause it to try to catch up with the original interval, which may not be what you want. In such a case, recalculate tick from CNT+BurstFreq in that if_b case.
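To see what the five instructions above do with the numbers, here is a rough Python model of the same logic. It is a sketch, not a cycle-accurate simulation: guarded_wait and to_signed32 are made-up names, and the waitcnt itself is reduced to its end result (tick advanced by BurstFreq once the target is reached).

```python
MASK32 = 0xFFFF_FFFF

def to_signed32(x):
    """Interpret a 32-bit value as two's complement, as cmps does."""
    x &= MASK32
    return x - (1 << 32) if x & 0x8000_0000 else x

def guarded_wait(tick, cnt, burst_freq, slack=8):
    """Model of the guarded sequence. Returns (new_tick, waited)."""
    temp = to_signed32(tick - cnt)      # mov temp, tick / sub temp, cnt
    if temp >= slack:                   # cmps temp, #8 wc -> C clear (if_ae)
        # waitcnt tick, BurstFreq: blocks until CNT == tick,
        # then writes tick + BurstFreq back into tick.
        return (tick + burst_freq) & MASK32, True
    # if_b: target already passed (or too close), so skip the wait
    # and just step tick forward to the next slot.
    return (tick + burst_freq) & MASK32, False

print(guarded_wait(1000, 500, 400_000))            # target ahead: waits
print(guarded_wait(500, 1000, 400_000))            # target missed: no wait
print(guarded_wait(0xFFFF_FFF0, 0x10, 400_000))    # missed across a CNT wrap
```

In both branches tick advances by exactly one BurstFreq, which is why this version tries to catch up with the original schedule instead of restarting it.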

Also: Here's a reminder that you can use I/O registers to free up some memory if you're short:

INB, OUTB and DIRB are not implemented and can be utilized freely as normal RAM cells
PHSA, PHSB and VSCL can be used if the counters/video aren't used
INA, PAR and CNT have "shadow registers" that are accessed when the register is used as the left operand (or jumped to). I often use CNT's shadow register for WAITCNT timing (like your tick variable), though this wouldn't work with the code above, since it needs to read tick in the right slot once. You could put temp there though, if you don't have any other free variable.

evanh · 2024-02-07 19:59

Last line becomes two for a restart

            mov temp, tick
            sub temp, cnt
            cmps temp, #8 wc ' Smaller values may work, too; I forget what the right one is
      if_ae waitcnt tick, BurstFreq
      if_b  mov tick, cnt
      if_b  add tick, BurstFreq

EDIT: Although, given your description of the software behaviour, the check for expiry isn't foolproof. There is a real chance of landing in the portion where 20 seconds still remain.

It might be better to deal with the timing restart where it occurs in your software rather than trying to compensate at the timer wait.
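The limits of the signed check can be sketched with the same kind of arithmetic. This is a Python model under the assumption of an 80 MHz clock, with hypothetical helper names, and it only shows how the comparison behaves; it is not necessarily the exact case evanh has in mind. A signed 32-bit difference can only tell "past" from "future" within half the counter range, about 26.8 s at 80 MHz.

```python
MASK32 = 0xFFFF_FFFF
CLKFREQ_HZ = 80_000_000   # assumption: 80 MHz, 12.5 ns per tick

def to_signed32(x):
    """Interpret a 32-bit value as two's complement, as cmps does."""
    x &= MASK32
    return x - (1 << 32) if x & 0x8000_0000 else x

def skips_wait(tick, cnt, slack=8):
    """True if the guard would take the if_b path and skip waitcnt."""
    return to_signed32(tick - cnt) < slack

def restart_tick(cnt, burst_freq):
    """The two-instruction restart: mov tick, cnt / add tick, BurstFreq."""
    return (cnt + burst_freq) & MASK32

def cnt_after_lateness(tick, seconds_late):
    """CNT value when the code arrives seconds_late after the target."""
    return (tick + int(seconds_late * CLKFREQ_HZ)) & MASK32

tick = 123_456
print(skips_wait(tick, cnt_after_lateness(tick, 20)))   # 20 s late: caught
print(skips_wait(tick, cnt_after_lateness(tick, 30)))   # 30 s late: not caught
print(restart_tick(0xFFFF_FFFF, 400_000))               # restart wraps cleanly
```

With these assumptions, 20 seconds of lateness is still inside the detectable window, while 30 seconds is not: the target then looks like it is in the future and waitcnt would block for the rest of the wrap.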

godzich · 2024-02-07 22:51

Hi Wuerfel and Evanh,

Thanks for chiming in. I can reassure you that I have plowed through the code for some months now and reduced opcodes to a bare minimum by optimizing after optimizing then ripping some hair and further optimizing. I have also used ALL possible SFRs that could be used for variable storage to free normal registers for opcodes. Even CTRA and CTRB, but used only the lower 16 bits.
Thanks to you both for your contribution; the latter version did the trick. However, this required 6 opcodes, but I was able to remove the initial "add tick, BurstFreq" from the initialization, since this code snippet also guarantees a proper start. So I was able to save one opcode and the total is 5 :) Here is what the final code looks like. Works like a charm:

Thank you again for your great support!

Christian

PS: I noted that the cmps #8 slack was not always working, so I increased it to #20 without further thought, and that gave completely stable and repeatable operation.

Wuerfel_21 · 2024-02-08 01:08

@godzich said:
I can reassure you that I have plowed through the code for some months now and reduced opcodes to a bare minimum by optimizing after optimizing then ripping some hair and further optimizing.

Thought of the memory alignment? (always put 2+4n regular instructions between hub instructions to minimize waitstates)
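The 2+4n rule can be checked with a simplified timing model. This Python sketch assumes the usual P1 figures: a hub access window for each cog every 16 clocks, 8 clocks for a hub instruction once it is in sync, and 4 clocks for an ordinary instruction. The function name is made up, and real code with jumps or other multi-cycle instructions will differ.

```python
HUB_WINDOW = 16   # clocks between one cog's hub access slots
HUB_EXEC = 8      # clocks a hub instruction takes once synchronized
REGULAR = 4       # clocks for an ordinary (non-hub) instruction

def hub_wait_clocks(regular_between):
    """Clocks the second of two hub instructions stalls waiting for its
    window, assuming the first one was already aligned with the hub."""
    elapsed = HUB_EXEC + regular_between * REGULAR
    return (-elapsed) % HUB_WINDOW

for n in range(8):
    print(n, "regular instructions ->", hub_wait_clocks(n), "wait clocks")
```

In this model the stall is zero for 2, 6, 10, ... regular instructions between hub instructions, which matches the 2+4n pattern.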

macca · 2024-02-08 08:12

@godzich said:

Thanks for chiming in. I can reassure you that I have plowed through the code for some months now and reduced opcodes to a bare minimum by optimizing after optimizing then ripping some hair and further optimizing. I have also used ALL possible SFRs that could be used for variable storage to free normal registers for opcodes. Even CTRA and CTRB, but used only the lower 16 bits.

There are techniques to allow larger code in COG RAM, depending on what your code does.
It is a bit complicated, but basically you can load different chunks of code from HUB as needed. For example, if a good portion of your code is for initialization, you can load that code at startup, then overwrite the locations with the code that runs in a loop. You can also load sub-chunks for code that runs occasionally on certain conditions. It adds some overhead to load the code, so it depends on how fast the loop must be.
I have used this technique in a couple of projects, but I don't know if there are simple examples to look at.

msrobots · 2024-02-08 18:59

What I do quite often is to reuse the startup code space as variables later, no reloading of chunks, just putting label names in front of the code.

Mike

Mickster · 2024-04-12 08:32

@Wuerfel_21 said:

INB, OUTB and DIRB are not implemented and can be utilized freely as normal RAM cells

PMJI but holy moly!

So we can transfer data between cogs without having to wait for the hub? (I'd test but I don't have a P1 in front of me)

Craig

Wuerfel_21 · 2024-04-12 08:48

No. They're completely unimplemented and act as normal cog RAM locations. No I/O function at all.

Mickster · 2024-04-13 12:59

@Wuerfel_21 said:
No. They're completely unimplemented and act as normal cog RAM locations. No I/O function at all.

My thought was: write a long to the B port in one cog and then read the long back in another cog(?)

Craig

Wuerfel_21 · 2024-04-13 13:03

That's obviously not what that does. It just behaves like a normal cog RAM cell.

msrobots · 2024-04-13 20:48

But quite useful for bit access to longs in PASM1.

evanh · 2024-04-13 23:48

@Mickster said:

@Wuerfel_21 said:
No. They're completely unimplemented and act as normal cog RAM locations. No I/O function at all.

My thought was: write a long to the B port in one cog and then read the long back in another cog(?)

Ada is saying there is no portB hardware in any way, not even as special purpose registers in the cogs. The designated portB names are just reserved documentation, and maybe exist as symbols in the PASM assembler.