App note AN009 could use some updating
frank freedman
This may be a bit controversial coming on the heels of another thread suggesting document updates, but I was just looking for information on Prop timing and Google naturally popped up AN009, Code Execution Time on the P8X32A. While it mentions counting up cycle times, it says nothing about the cycles used for accessing the hub, and gives no information on the best case and worst case or how to determine them. Yes, it is in the main documents, but the point of this app note is to simplify and unify the information in a more accessible format rather than having to sift it out of the pages of the manuals, since it may be asked for more often than other material. So, I guess my lazy @$$ needs to go through the forums and docs to get this information. Oh, well, gripe, gripe, gripe...
Comments
Technique 1 tells you the difficult way, which is to understand all the internals of the processor and manually add up the assembly instruction times. That is labor-intensive.
Technique 2 tells you that there is a system counter that you can use to find out how many ticks went by between two points.
Technique 3 tells you that you can always toggle an output pin and use an external oscilloscope.
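For what it's worth, the arithmetic behind Technique 2 is just a subtraction of two counter readings. Here is a rough Python sketch of that arithmetic only, not code from the app note. It assumes the system counter is a free-running 32-bit value that wraps around, and it uses an 80 MHz clock purely as an example figure. The names `elapsed_ticks`, `ticks_to_us`, and the `overhead` parameter are made up for illustration.

```python
MASK32 = 0xFFFFFFFF  # the system counter is treated as a wrapping 32-bit value


def elapsed_ticks(start, end, overhead=0):
    """Ticks between two counter readings, correct across one wrap-around.

    overhead is a hypothetical correction for the cost of taking the
    readings themselves; measure it by timing an empty section first.
    """
    return ((end - start) & MASK32) - overhead


def ticks_to_us(ticks, clkfreq=80_000_000):
    """Convert ticks to microseconds; 80 MHz is only an assumed clock."""
    return ticks * 1_000_000 / clkfreq


# Example: the counter wrapped between the two readings.
print(elapsed_ticks(0xFFFFFFF0, 0x00000010))
print(ticks_to_us(80))
```

Masking the difference to 32 bits is what keeps the result right when the counter rolls over between the two readings, as long as the measured section is shorter than one full counter period.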
I disagree that this application note needs to go into the nitty-gritty detail of how the processor works to try and manually add up the instruction times with all of the gotchas and waits. The point it was making is that yes, you can add up assembly instruction times to max out performance, but no, Parallax is not going to give exact execution times for Spin instructions.
The Propeller V1.2 manual on pages 24-25 answers your questions well enough that it need not be explained again in some other document.
If I find myself accessing the same thing over and over again, I either print to PDF a shortened page range or print a hardcopy to write handwritten notes on.
Then to paraphrase:
The window will come around every 16 clock cycles. Since the hub instruction took 8 clock cycles to execute after it waited the 0 to 15 clock cycles for the access window, there are now 8 clock cycles remaining before the next window. The number of normal instructions sandwiched between two hub instructions then determines how long the next wait is.
Another hub instruction immediately after the first could be considered wasteful if performance is needed, because the cog will sit there and wait (8 clock cycles) for the access window to come around again. Since the other instructions are a minimum of 4 clocks, you can fit at most two of them between hub instructions and still catch the very next access window.
If you skip one hub access window, the hub-to-hub period is 16 clock cycles * 2 = 32 clock cycles. 32 - 8 for the hub instruction = 24 remaining. Then 24 clock cycles / 4 clocks per instruction = room for 6 normal non-hub instructions. So from 3 to 6 of the 4-clock instructions between hub access instructions will land you on that second window.
By the same arithmetic, 7 to 10 of the 4-clock instructions land you on the third window (48 clock cycles).
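The window arithmetic above can be checked with a few lines of Python. This is only a sketch of the simplified model in this post: a window every 16 clocks, a hub instruction costing 8 clocks once its window arrives, and every other instruction costing exactly 4 clocks. The function names `hub_wait` and `hub_to_hub` are made up here, and the model ignores the conditional 8-clock and wait instructions.

```python
HUB_WINDOW = 16  # clocks between hub access windows for one cog
HUB_EXEC = 8     # clocks a hub instruction takes once its window arrives
NORMAL = 4       # clocks for an ordinary (non-hub) instruction in this model


def hub_wait(n_normal):
    """Clocks the next hub instruction stalls after n_normal normal instructions."""
    since_window = HUB_EXEC + n_normal * NORMAL  # clocks since the last window opened
    return (-since_window) % HUB_WINDOW


def hub_to_hub(n_normal):
    """Total clocks from one hub window to the one the next hub instruction catches."""
    return HUB_EXEC + n_normal * NORMAL + hub_wait(n_normal)


# Tabulate stall and hub-to-hub period for 0..10 normal instructions.
for n in range(11):
    print(n, hub_wait(n), hub_to_hub(n))
```

In this model 0 to 2 normal instructions give a 16-clock period, 3 to 6 give 32, and 7 to 10 give 48, which matches the ranges worked out by hand above.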
But then there are more gotchas, because some instructions will conditionally take 8 cycles instead of 4. And other instructions are indeterminate because they are wait instructions. To know all of that and get a perfect count requires a complete understanding of the processor. But even a perfect, ideal understanding would be wrong, because people like kuroneko have been eking out timing differences depending on which physical cog is running the code, in relation to the wait instructions and timers. But oh darn, at most it's only a few clock cycles different.
To me it doesn't seem appropriate to go to this degree in an application note. Using techniques 2 and 3 of the app note would tell you how long a particular chunk of code would take to execute, if it mattered and had to be precisely measured.