Hub access/Waitpeq questions
Harley
Posts: 997
1. I have a PASM loop which needs to be made more efficient. It is taking too much time.
Presently it is organized like this fragment:
        rdlong  Cmd,PAR wz          ' get value @PAR from Prop#1
 if_nz  wrlong  zed,PAR             ' if non-zero, then clear value @ PAR
 if_nz  mov     CntlFF,Cmd          ' set bits for Control command
        shr     CntlReg,#1          ' right shift to read next ls bit
Wouldn't that lose 9 clocks once Hub is sync'd to do the 'wrlong'?
Seems maybe if it was reorganized like this, it then would only lose 1 clock:
        rdlong  Cmd,PAR wz          ' get value @PAR from Prop#1
 if_nz  mov     CntlFF,Cmd          ' set bits for Control command
        shr     CntlReg,#1          ' right shift to read next ls bit
 if_nz  wrlong  zed,PAR             ' if non-zero, then clear value @ PAR
I don't have a decent way to measure this at the moment.
2. Once a WAITPEQ instruction has begun, does the 'compare' occur every clock (12.5 ns, running at 80 MHz)? I can't find any details on how this instruction works as far as the detailed timing is concerned.
And, what is the value of the 'wz result'? Seems it would be z = 0 after the WAITPEQ is exited in all cases. And if no exit, a hang up occurs.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Harley Shanko
Comments
re: waitpxx, yes sampling takes place every cycle (once primed). As for the Z flag, I have to get back to you on this one.
Post Edited (kuroneko) : 6/19/2010 12:52:46 AM GMT
Post Edited (kuroneko) : 6/19/2010 12:58:28 AM GMT
I hadn't needed to minimize the time around Hub accesses before. I think I now better understand what the Prop manual was describing: that 'window' when a cog has access to hub reads/writes. So there is a little loss in clock cycles with the 9 clock times vs. 4*n clock times for most instructions.
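To make the hub window concrete, here is a minimal sketch of the reordered loop with per-instruction clock counts. It assumes the 8 to 23 clock range for hub instructions (the range measured later in this thread) and a hub window every 16 clocks for each cog. The `mailbox` long, the `jmp` and the surrounding Spin are illustrative additions; the register names are taken from the fragment in the first post.

```spin
CON
  _clkmode = xtal1 + pll16x         ' assumption: 5 MHz crystal, 80 MHz system clock
  _xinfreq = 5_000_000

VAR
  long  mailbox                     ' hypothetical command long, passed to the cog via PAR

PUB main
  cognew(@entry, @mailbox)
  repeat

DAT
        org     0
entry   rdlong  Cmd, par wz         ' hub op: 8..23 clocks, ends 8 clocks before the next window
 if_nz  mov     CntlFF, Cmd         ' 4 clocks
        shr     CntlReg, #1         ' 4 clocks
 if_nz  wrlong  zed, par            ' hub op: window opens just now, so 8 clocks when executed
        jmp     #entry              ' 4 clocks

zed     long    0
CntlReg long    0
Cmd     res     1
CntlFF  res     1
```

Under the same assumptions, a wrlong placed directly after the rdlong would have to wait 8 clocks for the next window and then take its own 8, so 16 clocks in total. Moving the two 4-clock instructions in between uses that waiting time for real work.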
I still don't understand the value/use of the 'wz' effect. Unless I'm missing the point, wouldn't the result of INx ANDed with Mask be a non-zero value unless one is using an all-zeros State value?
You mentioned 'cnt sampling'. Would this be like subtracting cnt value before from cnt value after one or more instructions? Don't know how one would display such values from a PASM cog and debugging with PASD. Not running in real time in PASD.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Harley Shanko
Yes that would be the case IF the flags were dependent on the exit condition. However, waitpxx as well as waitcnt perform an add as their main function (the wait bit is just a delay). While waitcnt has wr implied, waitpxx runs with nr by default. Regardless, the flags are set by this add. Consider the following two fragments. Pin 1 is tied to ground by another cog. The AND phase will result in zero which will never be -2 or -3, so the waitpne will fall through. The Z will only be set in the first case -3 + 2 + 1 = 0, not in the second -2 + 2 + 1 = 1. Note, the extra +1 is an oddity of waitpne.
Looking at waitpeq you'll find that the Z isn't set by the AND phase either, again pin 1 is 0. The first call will fall through (0 == 2 & 0). But doesn't set the Z flag (0 + 2 = 2). Getting Z set is a bit more involved (waitpeq zero, zero wz doesn't prove anything). Anyway, my demoboard reads $EF00_0000 from ina which is good enough. So example 2 will fall through ($8000_0000 == $8000_0000 & $EF00_0000) and set Z ($8000_0000 + $8000_0000 = 0).
The ALU will most likely generate different (intermediate) Z results while the inputs are sampled and evaluated, the final operation however is the add.
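The four cases described above can be collected into one small test cog. This is a sketch reconstructed from the description, not the original fragments from the post; the names `minus3`, `minus2`, `zero`, `msb`, `flags` and `result` are made up for illustration. It assumes pin 1 reads 0 and pin 31 reads 1. If either assumption fails, one of the waitpeq instructions never exits.

```spin
CON
  _clkmode = xtal1 + pll16x         ' assumption: 5 MHz crystal, 80 MHz system clock
  _xinfreq = 5_000_000

VAR
  long  result                      ' hypothetical hub long that receives the collected Z flags

PUB main
  cognew(@entry, @result)
  repeat

DAT
        org     0
entry   waitpne minus3, #2 wz       ' 0 <> -3, falls through; -3 + 2 + 1 = 0, Z expected set
        muxz    flags, #%0001       ' record Z in bit 0
        waitpne minus2, #2 wz       ' 0 <> -2, falls through; -2 + 2 + 1 = 1, Z expected clear
        muxz    flags, #%0010       ' record Z in bit 1
        waitpeq zero, #2 wz         ' 0 == 2 & 0, falls through; 0 + 2 = 2, Z expected clear
        muxz    flags, #%0100       ' record Z in bit 2
        waitpeq msb, msb wz         ' needs ina[31] = 1; $8000_0000 + $8000_0000 wraps to 0
        muxz    flags, #%1000       ' record Z in bit 3
        wrlong  flags, par          ' publish the four recorded flags to hub
        cogid   flags               ' reuse flags to hold this cog's id
        cogstop flags               ' stop this cog

minus3  long    -3
minus2  long    -2
zero    long    0
msb     long    $8000_0000
flags   long    0
```

If the add-based explanation holds, `result` should end up as %1001: Z set by the first waitpne and the last waitpeq only.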
Yes, it's simply before/after. Don't forget to adjust by 4 to account for the difference introduced by the sample instructions.
As for displaying, it's just a numeric value, last time I checked PASD could inspect cog locations :) If I'm mistaken (I never used it) then it should at least be possible to write the result to hub (wrlong zwei, par) and inspect it there.
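The before/after sampling can be sketched as follows. This is only an illustration under assumptions: the register names `t0`, `t1`, `val` and the hub long `elapsed` are hypothetical, and the rdlong merely stands in for whatever code is being timed.

```spin
CON
  _clkmode = xtal1 + pll16x         ' assumption: 5 MHz crystal, 80 MHz system clock
  _xinfreq = 5_000_000

VAR
  long  elapsed                     ' hypothetical hub long that receives the measurement

PUB main
  cognew(@entry, @elapsed)
  repeat

DAT
        org     0
entry   mov     t0, cnt             ' sample the system counter before
        rdlong  val, par            ' code under test (here a single hub read)
        mov     t1, cnt             ' sample the system counter after
        sub     t1, t0              ' elapsed clocks, including one sampling mov
        sub     t1, #4              ' adjust by 4 for the sampling instruction
        wrlong  t1, par             ' publish to hub so it can be inspected there
        cogid   t0                  ' reuse t0 to hold this cog's id
        cogstop t0                  ' stop this cog

t0      res     1
t1      res     1
val     res     1
```

Because the hub read starts at an arbitrary point relative to the hub window, a single run gives one value somewhere in the possible range. Repeating the measurement at different offsets is what produces a minimum and a maximum.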
Post Edited (kuroneko) : 6/20/2010 7:18:21 AM GMT
I still don't understand what use the wz effect can be put to. Like who cares if it 'adds' something. Has that been documented/validated by Parallax?
And, if using PASD, since it doesn't run at full speed (manually steps from one instruction to another, or runs to a breakpoint) the 'cnt' value would be highly exaggerated, as PASD emulates the Prop instruction set. At least I wouldn't think one could get valid timings within PASD. Haven't tried that, and don't have the time to try to do so.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Harley Shanko
As for getting stuff documented, you're welcome to try. I talked to 4 people but nothing happened so far.
Upload the attached program, set a breakpoint at cog address $019 (cogid cnt), run the code then inspect cog memory at :min/:max. As this is run to breakpoint you get the real timings. Single stepping this stuff obviously doesn't work too well.
I will look into this tomorrow. I just had some problem with Prop Tool or my EEPROM; gets almost all the way through programming then fails with 'lost Prop' message!!
But I did print it out so as to be able to 'study' what is happening with your short program. I didn't realize PASD ran at full speed when running to a breakpoint. Not sure how that happens while still being able to trap the breakpoint address. I've not played with the innards of the Prop; usually that would be the area of interest, but 'work' comes first unfortunately.
Thanks for the details. Love it.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Harley Shanko
I scratched my head over your little program, trying to figure out why it runs. The ':diff' wasn't RES'd, so I wonder how it gives proper results. But I did see the 8 min/$17 (23 decimal) max values when run. I also moved the breakpoint to 'sync' and watched each loop's results. Found some things I didn't know before, and some I still don't understand.
I didn't know one could increment an instruction by incrementing a ':delay' label. Nor that 'djnz :offset,#sync' could decrement a value (:offset) and it still be proper after a Stop. Or what 'cogid cnt' actually is doing. I looked at 'cnt' but couldn't tell in PASD how this is a 0-7 value. Need more time on this short program. I had several Prop Protoboards to choose from in various configurations to try your program on, without clobbering any previous program in EEPROM. Thank you for the valuable study.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Harley Shanko
Here zwei isn't long aligned and you'll get a compiler error when you reference the alias. Adding e.g. 3 more bytes will correct the alignment.
In this case zwei is back to long alignment and represents the same offset as drei. You'll get the same effect with res with the exception that res itself always counts as long (but you could still misalign the address before that). Usually only longs are used in PASM so it shouldn't be much of an issue.
- :delay is just a register, its content is an instruction and/or something else. It entirely depends on how you use it, e.g. if you jump to it it'll always be seen as an instruction. In this case the instruction at this location adds an immediate value (#9+2) to another register. We know it's not going to overflow when manipulated and we have at least one instruction between the increment and the add (pipelining) so everything is fine.
- I don't quite understand your djnz reference. :offset gets decremented 16 times, then (nz) the instruction falls through and :offset shows as 0 in the cog RAM viewer. Note that the destination slot can't hold immediate values, only addresses (so the instruction itself wouldn't be affected).
- cnt is a read-only register. Meaning when it's used in the destination slot of an instruction you actually access its shadow location (normal RAM, not the system counter). The first two instances in the test program are just for sync'ing to hub and executing a hub op, results are not important (but should be non-destructive). The last instance is to shut down the cog. cogid and cogstop only use the destination as an argument (see data sheet), which seems an ideal usage for shadow registers. Note that you don't have to do it like this. Reserving an ordinary long in your program works just as well. I just don't see why I should go through all the trouble when I get this for free, e.g. usually I define a constant zero as $1FF which is the index of the vscl register. As it's pre-cleared and not every code fragment uses video this seems like a nice shortcut.
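The first and last points can be illustrated with a short sketch. It is not the attached test program (which is not reproduced in this thread), only a minimal example under assumptions: the names `count` and `target` are hypothetical, and the cogid/cogstop behaviour on cnt's shadow location is taken from the description above rather than from official documentation.

```spin
CON
  _clkmode = xtal1 + pll16x         ' assumption: 5 MHz crystal, 80 MHz system clock
  _xinfreq = 5_000_000

PUB main
  cognew(@entry, 0)
  repeat

DAT
        org     0
entry   mov     count, #16          ' number of passes
:loop   mov     target, cnt         ' cnt in the source slot: reads the system counter
:delay  add     target, #9+2        ' the immediate lives in the low 9 bits of this long
        waitcnt target, #0          ' wait until the system counter reaches target
        add     :delay, #1          ' patch the instruction: the next pass waits one clock longer
        djnz    count, #:loop       ' sits between the patch and the next use of :delay
        cogid   cnt                 ' cnt in the destination slot: id goes to its shadow RAM
        cogstop cnt                 ' destination slot again: reads the id back, stops this cog

count   res     1
target  res     1
```

The immediate starts at 11 and is incremented 16 times, so it stays far below the 9-bit limit of 511 and never spills into the other fields of the instruction.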
One final thing, I'm surprised you didn't complain/comment about this (cnt being read-only etc):
Post Edited (kuroneko) : 6/22/2010 4:59:51 AM GMT
Yes, now I understand how instructions can be modified while the program runs. And why :diff could work; just another label for a following variable. I wasn't being flexible enough in my thinking to appreciate all that the Prop provides.
This forum is such an educational site. Many thanks to everyone who posts here. And, thank you again kuroneko.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Harley Shanko