Hub Exec vs Cog Exec Timings... some figures
78rpm
Posts: 264
I modified the 32 bit by 32 bit division I posted recently to record minimum and maximum execution times within the code. I then timed the function when it was executing in Hub and in a Cog. The timings are now in. The function accesses Cog registers only except for the stack pushes in the Hub, but they were outside the timing control.
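For anyone who missed the earlier post, the routine under test is in the family of classic restoring shift-subtract division. Here is a C sketch of that algorithm (this is NOT the posted PASM2 code, just an illustration of the technique; the function name is made up):

```c
#include <stdint.h>

/* Restoring shift-subtract division: produces one quotient bit per
   iteration, 32 iterations for a 32-bit numerator. */
uint32_t divide32(uint32_t numerator, uint32_t divisor, uint32_t *remainder)
{
    uint64_t rem = 0;          /* 64-bit so the left shift cannot overflow */
    uint32_t quotient = 0;

    for (int i = 31; i >= 0; i--) {
        rem = (rem << 1) | ((numerator >> i) & 1);  /* bring in next bit */
        if (rem >= divisor) {                       /* does divisor fit? */
            rem -= divisor;                         /* subtract it out   */
            quotient |= 1u << i;                    /* set quotient bit  */
        }
    }
    if (remainder)
        *remainder = (uint32_t)rem;
    return quotient;
}
```

Because the loop always runs 32 times, the cycle count varies only with branch behaviour, not with the number of iterations.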
The timings are the clock count returned by subtracting two getcnt instructions, which I think are system clocks, or are they system clock / 2? If the former, at 50MHz the clock period is 20ns.
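The subtract-two-counter-reads method works in plain unsigned arithmetic even if the free-running counter wraps between the two reads. A small C sketch of that bookkeeping (function names are illustrative, not from the posted code):

```c
#include <stdint.h>

/* Elapsed ticks between two reads of a free-running 32-bit counter.
   Unsigned subtraction is modulo 2^32, so the delta is still correct
   if the counter wrapped between the reads. */
static uint32_t elapsed_ticks(uint32_t start, uint32_t end)
{
    return end - start;
}

/* Convert a tick count to microseconds; at a 50MHz system clock
   one tick is 20ns, so 1573 ticks is 31.46us. */
static double ticks_to_us(uint32_t ticks, double clock_hz)
{
    return (double)ticks * 1e6 / clock_hz;
}
```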
Firstly we have the timings for Hub Exec:
Minimum time for division = 1573
Minimum time for division numerator = 1000000000
Minimum time for division divisor = 50000000
Maximum time for division = 1589
Maximum time for division numerator = 2328
Maximum time for division divisor = 1000

Minimum time = 1573 clocks * 20ns = 31.460us
Maximum time = 1589 clocks * 20ns = 31.780us
This gives a min case of approx 31,786 divisions / second
This gives a max case of approx 31,466 divisions / second
Secondly we have the timings for Cog Exec:
Minimum time for division = 821
Minimum time for division numerator = 2328
Minimum time for division divisor = 1000
Maximum time for division = 1573
Maximum time for division numerator = 1000000000
Maximum time for division divisor = 50000000

Minimum time = 821 clocks * 20ns = 16.420us
Maximum time = 1573 clocks * 20ns = 31.460us
This gives a max case of approx 31,786 divisions / second
This gives a min case of approx 60,901 divisions / second
Now, you may have noticed that the min and max values are for the same pair of numerator / divisor, but they are swapped between Hub and Cog Exec. I think it likely that Hub Exec is heavily influenced by refilling the instruction FIFO / stream.
As the code only accesses Cog registers, I think we can say as a guideline in this instance that the % speed of Hub wrt Cog is 821 / 1580ish = 51.96%, so roughly 50% or half speed.
Comments
Enjoy!
Mike
Also, I don't understand the numbers. What do they all mean? Why are some large round numbers? And what's with the 2:1 variance in min to max of CogExec?
Roughly 4:1 performance difference.
Are those seemingly strange numeric values proven to be the minimum time & the operands of the minimum-time division operation?
The large round numbers are an illusion; in binary they are not. It's the method I used for converting from binary to ASCII decimal: dividing down from the largest power of ten for the number, which is checked beforehand, and then it's a simple index table lookup from there. So divide by 10_000_000, say, to get the quotient in the range 1 to 9, then multiply that quotient by the divisor again and subtract it from the value of the number you are converting.
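The divide-down conversion described above can be sketched in C like this (a simplified illustration, using `'0' + digit` in place of the table lookup mentioned; the function name is made up):

```c
#include <stdint.h>

/* Convert a uint32 to ASCII decimal by dividing down from the largest
   power of ten: each step yields one digit 0..9, whose contribution is
   then subtracted before moving to the next power of ten. */
int u32_to_decimal(uint32_t value, char *out)
{
    static const uint32_t pow10[10] = {
        1000000000u, 100000000u, 10000000u, 1000000u,
        100000u, 10000u, 1000u, 100u, 10u, 1u
    };
    int len = 0, started = 0;

    for (int i = 0; i < 10; i++) {
        uint32_t digit = value / pow10[i];   /* quotient in 0..9        */
        value -= digit * pow10[i];           /* remove its contribution */
        if (digit || started || i == 9) {    /* skip leading zeros      */
            out[len++] = '0' + (char)digit;
            started = 1;
        }
    }
    out[len] = '\0';
    return len;
}
```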
The meaning of the numbers is the minimum elapsed time for a division. As some numbers have different values in numerator and divisor, the branch will occur at different rates, thus the FIFO will refill again. When an elapsed time is lower than the currently stored minimum, the new minimum we've just calculated is stored, along with the numerator and divisor for that minimum event.
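That min/max bookkeeping amounts to something like the following C sketch (names are illustrative, not from the posted code):

```c
#include <stdint.h>

/* Running min/max tracker: when a new elapsed time beats the stored
   minimum (or maximum), store the time together with the numerator
   and divisor that produced it. */
typedef struct {
    uint32_t min_time, min_num, min_div;
    uint32_t max_time, max_num, max_div;
} div_stats;

static void stats_init(div_stats *s)
{
    s->min_time = UINT32_MAX;   /* any real time will beat this */
    s->max_time = 0;
    s->min_num = s->min_div = s->max_num = s->max_div = 0;
}

static void stats_record(div_stats *s, uint32_t t, uint32_t num, uint32_t div)
{
    if (t < s->min_time) { s->min_time = t; s->min_num = num; s->min_div = div; }
    if (t > s->max_time) { s->max_time = t; s->max_num = num; s->max_div = div; }
}
```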
I'm not too sure why there is variation between min and max in Cog Exec, except perhaps a pipeline flush on the branches, as all the variables are in Cog RAM. Interrupts are disabled, and so is debug, I hope. I'll have to think on that a while, or perhaps someone else will have some thoughts.
Hope this helps.
No, they are the min and max values which the divide function has processed. They happen to be two large numeric values in max case, and two in the min case. I think the true maximum time may be all bits / all bits.
I note you capture the timing inside the PUSHA's but outside the POPA's. That pair of POPA's will be the culprit for variance in the CogExec results.
The code needs a bit more work to sort out the numerator and divisor as -ve numbers. I'll do that next. First, here's the new code:

Now those strange numbers:

Constant cycle count of 983 * 20ns = 19.66us
Or 50,864 divides per second
The pusha and popa are to Hub RAM. However, ... are all in Cog RAM.
So the non-deterministic part would be synchronising to the Hub for the first pusha and the first popa at entry and exit.
A speed-up could be achieved by using a small stack in Cog RAM. I do not recall the depth of the small Cog hardware stack, whether 4, 8 or 16 deep.
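The idea of a small local stack avoiding the Hub round-trip of PUSHA/POPA could be sketched in C like this (the depth of 8 here is an assumption for illustration, not a statement of the actual hardware stack depth):

```c
#include <stdint.h>

#define STACK_DEPTH 8   /* assumed depth for illustration only */

/* A tiny fixed-depth stack kept in fast local (Cog-like) storage,
   avoiding a round-trip to slower shared (Hub-like) memory. */
typedef struct {
    uint32_t slot[STACK_DEPTH];
    int top;                     /* index of next free slot */
} cog_stack;

static int push(cog_stack *s, uint32_t v)
{
    if (s->top >= STACK_DEPTH)
        return -1;               /* overflow */
    s->slot[s->top++] = v;
    return 0;
}

static int pop(cog_stack *s, uint32_t *v)
{
    if (s->top == 0)
        return -1;               /* underflow */
    *v = s->slot[--s->top];
    return 0;
}
```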