Hub access/Waitpeq questions

Harley · 2010-06-19 00:22

1. I have a PASM loop which needs to be made more efficient. It is taking too much time.

Presently it is organized like this fragment:

                rdlong  Cmd, PAR wz             ' get value @PAR from Prop#1
        if_nz   wrlong  zed, PAR                ' if non-zero, then clear value @PAR
        if_nz   mov     CntlFF, Cmd             ' set bits for Control command
                shr     CntlReg, #1             ' right shift (immediate #1) to read next ls bit

Wouldn't that lose 9 clocks once Hub is sync'd to do the 'wrlong'?

It seems that if it were reorganized like this, it would only lose 1 clock:

                rdlong  Cmd, PAR wz             ' get value @PAR from Prop#1
        if_nz   mov     CntlFF, Cmd             ' set bits for Control command
                shr     CntlReg, #1             ' right shift (immediate #1) to read next ls bit
        if_nz   wrlong  zed, PAR                ' if non-zero, then clear value @PAR

I don't have a decent way to measure this at the moment.

2. Once a WAITPEQ instruction has begun, does the 'compare' occur every clock (12.5 ns at 80 MHz)? I can't find any details on the instruction's exact timing.

And what is the value of the 'wz' result? It seems Z would be 0 after WAITPEQ exits in all cases, and if it never exits, the cog hangs.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Harley Shanko

kuroneko · 2010-06-19 00:35

Re: hub access: yes, you'd lose however many cycles it takes to reach your next hub window, so putting something in between is always a good idea. As for measuring, why not sample cnt around those code fragments?

Re: waitpxx: yes, sampling takes place every cycle (once primed). As for the Z flag, I'll have to get back to you on that.

Post Edited (kuroneko) : 6/19/2010 12:52:46 AM GMT

kuroneko · 2010-06-19 00:50

At a quick glance, the Z flag is based on the wr behaviour of waitpxx (which makes sense in a way). For waitpeq, Z follows value + mask; for waitpne, it's value + mask + 1 (obviously only once the instruction is released).

Post Edited (kuroneko) : 6/19/2010 12:58:28 AM GMT

Harley · 2010-06-19 20:37

@kuroneko, Thanks for your responses.

I hadn't needed to minimize hub access time before. I think I now better understand what the Prop manual describes: the 'window' when a cog has access to hub reads/writes. So there is a small loss in clock cycles, with the 9-clock gap vs. the 4*n clocks of most instructions.

I still don't understand the value or use of the 'wz' effect. Unless I'm missing the point, wouldn't INx ANDed with Mask be non-zero, except when using a State value of zero?

You mentioned 'cnt sampling'. Would this mean subtracting the cnt value taken before one or more instructions from the value taken after? I don't know how one would display such values from a PASM cog when debugging with PASD, which doesn't run in real time.

kuroneko · 2010-06-20 01:44

Harley said...
So there is a little loss in clock cycles with the 9 clock times vs. 4*n clock times for most instructions.

If you believe the data sheet, that is. Ever since I couldn't fit a nop (4 cycles) and a minimal waitpxx (5+ at the time) between two consecutive hub ops, I've lost faith in its validity. As far as I'm concerned there is no loss, as the gap is actually 8 cycles (hub ops taking 8..23), which can be filled nicely with two normal instructions.
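To make the window arithmetic concrete, here is a small Python model. It is a simplified sketch, not a cycle-accurate emulator, and it assumes kuroneko's figures: a hub window every 16 clocks, a hub op costing 8 clocks when it hits its window and up to 23 when it just misses it, and ordinary instructions costing 4. It also assumes Cmd is non-zero, so the conditional wrlong actually executes. The function `run()` and its parameters are hypothetical names for illustration.

```python
HUB_PERIOD = 16  # a cog's hub window comes around every 16 system clocks
HUB_OP_MIN = 8   # assumed cost of a hub op that hits its window (kuroneko: 8..23)
NORMAL_OP = 4    # ordinary PASM instruction such as mov or shr


def run(sequence, start=0, window_phase=0):
    """Return (total_cycles, per_instruction_costs) for a list of "hub"/"op" items.

    Simplified model (an assumption): a hub op issued at clock t stalls until
    the next clock congruent to window_phase modulo HUB_PERIOD, then takes
    HUB_OP_MIN clocks.
    """
    t = start
    costs = []
    for kind in sequence:
        if kind == "hub":
            wait = (window_phase - t) % HUB_PERIOD
            cost = wait + HUB_OP_MIN
        else:
            cost = NORMAL_OP
        costs.append(cost)
        t += cost
    return t - start, costs


# Harley's two fragments, starting in sync with the hub (rdlong hits its window).
original = ["hub", "hub", "op", "op"]     # rdlong, wrlong, mov, shr
reordered = ["hub", "op", "op", "hub"]    # rdlong, mov, shr, wrlong

print(run(original))    # (32, [8, 16, 4, 4])
print(run(reordered))   # (24, [8, 4, 4, 8])

# Sweep every possible starting phase for a single hub op.
single = [run(["hub"], start=s)[0] for s in range(HUB_PERIOD)]
print(min(single), max(single))   # 8 23
```

Under these assumptions, a hub op issued right after another one waits 8 clocks, and two 4-clock instructions fill that gap exactly, so the reordered fragment saves 8 clocks.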

Harley said...
I still don't understand the value/use for the 'wz' effect. Unless I'm missing the point, wouldn't the result of INx ANDed with Mask be non-zero value except if one is using a zeros State value?

Yes, that would be the case IF the flags depended on the exit condition. However, waitpxx and waitcnt both perform an add as their main function (the wait part is just a delay). While waitcnt has wr implied, waitpxx runs with nr by default. Either way, the flags are set by this add. Consider the following two fragments, where pin 1 is tied to ground by another cog. The AND phase yields zero, which never equals -2 or -3, so waitpne falls through. Z is only set in the first case (-3 + 2 + 1 = 0), not in the second (-2 + 2 + 1 = 1). Note that the extra +1 is an oddity of waitpne.

        waitpne minus3, #%10 wz     ' sets Z
        ...
minus3  long    -3

        waitpne minus2, #%10 wz     ' clears Z
        ...
minus2  long    -2

Looking at waitpeq, you'll find that Z isn't set by the AND phase either (again, pin 1 is 0). The first call falls through (0 == %10 & 0) but doesn't set Z (0 + 2 = 2). Getting Z set is a bit more involved (waitpeq zero, zero wz doesn't prove anything, since both the AND and the add give zero there). Anyway, my demoboard reads $EF00_0000 from ina, which is good enough: the second example falls through ($8000_0000 == $8000_0000 & $EF00_0000) and sets Z ($8000_0000 + $8000_0000 = 0, as the carry out of bit 31 is dropped).

        waitpeq zero, #%10 wz       ' clears Z (if released)
        ...
zero    long    0

        waitpeq minint, minint wz   ' sets Z (if released)
        ...
minint  long    NEGX

The ALU most likely generates different (intermediate) Z results while the inputs are sampled and evaluated; the final operation, however, is the add.
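The add hypothesis can be written down as a small Python model. This is a sketch of kuroneko's observation, not documented Parallax behaviour (the data sheet marks Z as undefined here): release follows the masked compare, while Z, meaningful only once released, comes from the 32-bit sum value + mask for waitpeq and value + mask + 1 for waitpne. The function names are hypothetical.

```python
MASK32 = 0xFFFF_FFFF  # Propeller registers are 32 bits wide


def waitpeq(value, mask, ina):
    """Return (released, z) for 'waitpeq value, mask wz' under the add hypothesis."""
    released = (ina & mask) == (value & MASK32)
    z = ((value + mask) & MASK32) == 0        # carry out of bit 31 is dropped
    return released, z


def waitpne(value, mask, ina):
    """Return (released, z) for 'waitpne value, mask wz'; note the extra +1."""
    released = (ina & mask) != (value & MASK32)
    z = ((value + mask + 1) & MASK32) == 0
    return released, z


NEGX = 0x8000_0000       # Spin's NEGX constant, $8000_0000
PIN1_LOW = 0             # ina with pin 1 held low (only bit 1 matters for mask %10)
DEMOBOARD = 0xEF00_0000  # the ina value kuroneko's demoboard reads

print(waitpne(-3, 0b10, PIN1_LOW))       # (True, True)   -3 + 2 + 1 = 0
print(waitpne(-2, 0b10, PIN1_LOW))       # (True, False)  -2 + 2 + 1 = 1
print(waitpeq(0, 0b10, PIN1_LOW))        # (True, False)   0 + 2 = 2
print(waitpeq(NEGX, NEGX, DEMOBOARD))    # (True, True)   $8000_0000 + $8000_0000 = 0
```

All four cases release, so if Z tracked the exit condition they would all agree; under the add model they split exactly as the four PASM fragments describe.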

Harley said...
You mentioned 'cnt sampling'. Would this be like subtracting cnt value before from cnt value after one or more instructions? Don't know how one would display such values from a PASM cog and debugging with PASD. Not running in real time in PASD.

Yes, it's simply before/after. Don't forget to subtract 4 to account for the sampling instruction itself: two back-to-back cnt samples already differ by 4.

        mov     eins, cnt
        <instruction under test>
        mov     zwei, cnt
        sub     zwei, eins      ' will be 4 + instruction under test
        sub     zwei, #4        ' time consumed by instruction under test
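One detail worth adding: cnt is a free-running 32-bit counter (it wraps roughly every 53.7 s at 80 MHz), so the sub above works modulo 2^32 and still gives the right difference if the counter wraps between the two samples. A minimal Python model of that arithmetic; the name `elapsed` is hypothetical:

```python
MASK32 = 0xFFFF_FFFF      # cnt is a 32-bit register
SAMPLE_OVERHEAD = 4       # back-to-back 'mov x, cnt' reads differ by 4 clocks


def elapsed(eins, zwei):
    """Clocks spent in the instruction(s) between the two cnt samples."""
    delta = (zwei - eins) & MASK32               # what 'sub zwei, eins' leaves in zwei
    return (delta - SAMPLE_OVERHEAD) & MASK32    # what 'sub zwei, #4' leaves in zwei


print(elapsed(1000, 1012))        # 8
print(elapsed(0xFFFF_FFFC, 8))    # 8, same result across a counter wrap
```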

As for displaying, it's just a numeric value; last time I checked, PASD could inspect cog locations :) If I'm mistaken (I've never used it), it should at least be possible to write the result to hub (wrlong zwei, par) and inspect it there.

Post Edited (kuroneko) : 6/20/2010 7:18:21 AM GMT

Harley · 2010-06-20 16:27

@kuroneko, again thank you for explaining things in more detail.

I still don't understand what use the wz effect can be put to. Who cares if it 'adds' something? Has that been documented or validated by Parallax?

And if using PASD, since it doesn't run at full speed (it steps manually from one instruction to the next, or runs to a breakpoint), the 'cnt' value would be greatly exaggerated, as PASD emulates the Prop instruction set. At least, I wouldn't think one could get valid timings within PASD. I haven't tried it, and don't have the time to do so.

kuroneko · 2010-06-21 00:16

Harley said...
I still don't understand what use the wz effect can be put to. Like who cares if it 'adds' something. Has that been documented/validated by Parallax

Officially, the Z flag is marked undefined/not important for all waitxxx instructions (see the data sheet), so I was surprised you asked in the first place. But since you did, I told you. Personally, I do care about the wr effect; it has its uses.

As for getting stuff documented, you're welcome to try. I've talked to 4 people, but nothing has happened so far.

Harley said...
And, if using PASD, since it doesn't run at full speed (manually steps from one instruction to another, or runs to a breakpoint) the 'cnt' value would be highly exaggerated, as PASD emulates the Prop instruction set. At least I wouldn't think one could get valid timings within PASD.

Upload the attached program, set a breakpoint at cog address $019 (cogid cnt), run the code, then inspect cog memory at :min/:max. Because this runs to the breakpoint, you get real timings. Single-stepping this stuff obviously doesn't work too well.

Harley · 2010-06-21 00:57

@kuroneko,

I will look into this tomorrow. I just had a problem with either the Prop Tool or my EEPROM; it gets almost all the way through programming, then fails with a 'lost Prop' message!

But I did print it out so I can 'study' what your short program is doing. I didn't realize PASD ran at full speed when running to a breakpoint, and I'm not sure how it does that while still trapping the breakpoint address. I haven't played with the innards of the Prop; usually that would be my area of interest, but unfortunately 'work' comes first.

Thanks for the details. Love it.

Harley · 2010-06-21 18:47

@kuroneko, I found my problem; Windows is so flaky. I'm on an iMac with Parallels + Windows XP, but the Windows side doesn't run for long before 'something' happens and things stop working right. I had to quit my apps and Windows and relaunch. Then success.

I scratched my head over your little program, trying to figure out why it works. ':diff' wasn't RES'd, so I wondered how it gives proper results. But I did see the 8 min / $17 (23 decimal) max values when run. I also moved the breakpoint to 'sync' and watched each loop's results. I found some things I didn't know before, and some I still don't understand.

I didn't know one could increment an instruction by incrementing its ':delay' label, nor that 'djnz :offset,#sync' could decrement a value (:offset) and have it still be correct after a stop, or what 'cogid cnt' actually does. I looked at 'cnt' but couldn't tell in PASD how it holds a 0-7 value. I need more time on this short program. I had several Prop Protoboards in various configurations to try your program on without clobbering any previously stored program in EEPROM. Thank you for the valuable study material.

kuroneko · 2010-06-22 01:33

Harley said...
Your little program I scratched my head over trying to figure out why it runs. The ':diff' wasn't RES'd so wonders how it gives proper results. But did see the 8 min/17 (23 decimal) max values when run.

If you compile in the Propeller Tool and move the cursor over the label, you'll notice that :diff is just an alias for :after (check the cog offset). One minor issue, though: the alignment has to be correct if you want to use it with PASM.

        wrlong   zwei, par
        ...
zero    long    0
eins    byte    0
zwei
drei    long    0

Here zwei isn't long-aligned, and you'll get a compiler error when you reference the alias. Adding, e.g., 3 more bytes corrects the alignment.

        wrlong   zwei, par
        ...
zero    long    0
eins    byte    0, 1, 2, 3
zwei
drei    long    0

In this case zwei is long-aligned again and has the same offset as drei. You'll get the same effect with res, except that res itself always counts as a long (though you could still misalign the address before it). Usually only longs are used in PASM, so this shouldn't be much of an issue.
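The alignment rule can be checked with a toy layout model in Python. It is an assumption-laden sketch of the behaviour described above, not the Propeller Tool's actual algorithm: each byte/word/long entry is padded to its own size, a bare label (like zwei) takes the current byte offset without padding, and a label is usable in PASM only if that offset is a multiple of 4 (cog address = offset / 4, assuming the DAT block starts at org 0). `layout` and `cog_address` are hypothetical names.

```python
SIZES = {"byte": 1, "word": 2, "long": 4}


def layout(entries):
    """Map each label to its byte offset; entries are (label, size or None, count)."""
    offsets = {}
    pc = 0
    for label, size, count in entries:
        if size is not None:
            width = SIZES[size]
            pc = -(-pc // width) * width      # pad up to the entry's own alignment
        if label:
            offsets[label] = pc               # a bare label just takes the current offset
        if size is not None:
            pc += SIZES[size] * count
    return offsets


def cog_address(offsets, label):
    """Cog register index for a label, or an error if it is not long-aligned."""
    offset = offsets[label]
    if offset % 4:
        raise ValueError(f"{label} is not long-aligned (byte offset {offset})")
    return offset // 4


misaligned = layout([("zero", "long", 1), ("eins", "byte", 1),
                     ("zwei", None, 0), ("drei", "long", 1)])
fixed = layout([("zero", "long", 1), ("eins", "byte", 4),
                ("zwei", None, 0), ("drei", "long", 1)])

print(misaligned)   # {'zero': 0, 'eins': 4, 'zwei': 5, 'drei': 8}
print(cog_address(fixed, "zwei"), cog_address(fixed, "drei"))   # 2 2
```

Calling `cog_address(misaligned, "zwei")` raises, mirroring the compiler error for the first layout.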

Harley said...
I didn't know one could increment an instruction by incrementing a ':delay' label. Nor that 'djnz :offset,#sync' could decrement a value (:offset) and it still be proper after a Stop. Or what 'cogid cnt' actually is doing. I looked at 'cnt' but couldn't tell in PASD how this is a 0-7 value.

:delay is just a register; its content is an instruction and/or something else. It entirely depends on how you use it: if you jump to it, it'll always be treated as an instruction. In this case the instruction at this location adds an immediate value (#9+2) to another register. We know it's not going to overflow when manipulated, and there is at least one instruction between the increment and the add (pipelining), so everything is fine.

I don't quite understand your djnz reference. :offset gets decremented 16 times; once it reaches zero the jump is no longer taken, the instruction falls through, and :offset shows as 0 in the cog RAM viewer. Note that the destination slot can't hold immediate values, only addresses (so the instruction itself isn't affected).

cnt is a read-only register, meaning that when it's used in the destination slot of an instruction you actually access its shadow location (normal RAM, not the system counter). The first two instances in the test program are just for syncing to the hub and executing a hub op; the results are not important (but should be non-destructive). The last instance is to shut down the cog. cogid and cogstop only use the destination as an argument (see data sheet), which seems an ideal use for shadow registers. Note that you don't have to do it like this; reserving an ordinary long in your program works just as well. I just don't see why I should go through the trouble when I get this for free. For example, I usually define a constant zero as $1FF, which is the index of the vscl register. As it's pre-cleared and not every code fragment uses video, this seems like a nice shortcut.
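Why incrementing :delay works: in a Propeller 1 instruction long the 9-bit source field occupies the lowest bits (layout iiiiii zcri cccc ddddddddd sssssssss), so adding 1 to the whole long bumps the immediate #9+2 by one, as long as the field doesn't overflow past 511 into the destination field. A Python sketch of that bit arithmetic; `encode`/`fields` are hypothetical helpers, the ADD opcode and flag bits are my reading of the P1 encoding, used only to build a concrete word, and the 15 increments are illustrative.

```python
MASK32 = 0xFFFF_FFFF


def encode(opcode, zcri, cond, dest, src):
    """Pack a Propeller 1 instruction: iiiiii zcri cccc ddddddddd sssssssss."""
    return (opcode << 26) | (zcri << 22) | (cond << 18) | (dest << 9) | src


def fields(word):
    """Split an instruction long into its (dest, src) fields."""
    return (word >> 9) & 0x1FF, word & 0x1FF


CNT = 0x1F1
# 'add cnt, #9+2': ADD opcode %100000, flags z=0 c=0 r=1 i=1, condition always.
delay = encode(0b100000, 0b0011, 0b1111, CNT, 9 + 2)

for _ in range(15):                  # illustrative: offsets 0..15 -> #11..#26
    delay = (delay + 1) & MASK32

print(fields(delay))                 # (497, 26): dest still cnt ($1F1)

# At the top of the 9-bit field the carry spills into dest (the overflow to avoid).
edge = encode(0b100000, 0b0011, 0b1111, CNT, 0x1FF) + 1
print(fields(edge))                  # (498, 0): dest has become $1F2 (ina)
```

The model ignores timing; on the chip the modified instruction also must not be fetched immediately after the increment, which is why at least one instruction sits in between.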

One final thing: I'm surprised you didn't complain or comment about this (cnt being read-only, etc.):

                mov     cnt, cnt                ' dest = cnt shadow, source = live counter
:delay          add     cnt, #9+2               ' shadow += 11 (source field bumped each pass)
                waitcnt cnt, #0                 ' delay 16..31 cycles (0..15)
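Why these three lines work despite cnt being read-only: cnt in the source slot reads the live system counter, while cnt in the destination slot reads and writes a plain RAM shadow at the same address. A toy Python model of that split (class and method names are hypothetical; it models data flow only, not cycle timing):

```python
MASK32 = 0xFFFF_FFFF
CNT = 0x1F1  # cog address of the read-only cnt register


class Cog:
    """Toy register file: register $1F1 has a live counter plus a RAM shadow."""

    def __init__(self, counter):
        self.counter = counter      # the system counter, advanced by the caller
        self.ram = [0] * 512        # cog RAM, including the shadow at $1F1

    def src(self, addr):
        # Source slot: cnt returns the live counter, everything else reads RAM.
        return self.counter if addr == CNT else self.ram[addr]

    def dst(self, addr):
        # Destination slot: always RAM, so for cnt this is the shadow.
        return self.ram[addr]

    def write(self, addr, value):
        self.ram[addr] = value & MASK32


cog = Cog(counter=1000)
cog.write(CNT, cog.src(CNT))            # mov cnt, cnt          -> shadow = 1000
cog.counter += 4                        # one instruction later
cog.write(CNT, cog.dst(CNT) + 9 + 2)    # :delay add cnt, #9+2  -> shadow = 1011
print(cog.dst(CNT), cog.src(CNT))       # 1011 1004
```

waitcnt cnt, #0 then uses the shadow (1011) as its target and stalls until the live counter reaches it.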

Post Edited (kuroneko) : 6/22/2010 4:59:51 AM GMT

Harley · 2010-06-22 16:08

After a long 'think' on the matter, I realized I keep forgetting that even though a program might reside in EEPROM, when the Prop runs it runs out of RAM, and the registers are also RAM. I'm sort of 'brainwashed' from the old days of working with ROM/EPROM-based designs.

Yes, now I understand how instructions can be modified while the program runs, and why :diff works: it's just another label for the following variable. I wasn't being flexible enough in my thinking to appreciate all that the Prop provides.

This forum is such an educational site. Many thanks to everyone who posts here. And, thank you again kuroneko.
