Instruction Clocks / Cycles, Timers and Hub waiting
78rpm
in Propeller 2
@anyone
When I've timed PASM code using getcnt at start and finish, I sometimes get an odd timing figure, for example:
Minimum time for division = 983
My code runs mainly in hub-exec, uses a stack in hub RAM via ptra, and all calls, rets, pushes and pops are of the 'A' variety. Calls into cog-exec also use 'A'. I am sure that instructions generally take 2 cycles, but I want to verify whether the timers/counters increment on the system clock or on the instruction cycle. If, as I think, it is the system clock, could a hub wait produce the odd (as in uneven) counter value of 983, and should the time I really report therefore be 983 / 2 = 491.5? I can display that true figure; it is not a problem, and I am equally happy to round up or down.
I am sure LUT reads/writes take 3 cycles, so that .5 would be valid in some cases, though I do not have any LUT access in my code. So why the 0.5? I can only think of waiting, and of where in the instruction the wait occurs: at S, which I suspect as that is fetched first, or at D, though that could also be valid as the instruction is still not complete at that point. Questions, questions.
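As a minimal sketch of the arithmetic in the question (assuming, as the post does, that most instructions take 2 system clocks), the hypothetical helpers below express a tick count as 2-clock instruction slots and show how a single odd-length stall makes the whole total odd. The stall value passed in is a placeholder, not a measured P2 figure.

```python
# Assumption taken from the post: most instructions take 2 system clocks.
CLOCKS_PER_INSTRUCTION = 2


def total_ticks(two_clock_instructions: int, extra_stalls: list[int]) -> int:
    """Hypothetical cost model: 2 clocks per instruction plus any extra
    stall clocks (hub waits, etc.). One odd stall makes the total odd."""
    return CLOCKS_PER_INSTRUCTION * two_clock_instructions + sum(extra_stalls)


def ticks_to_slots(ticks: int) -> float:
    """Express elapsed system clock ticks as 2-clock instruction slots."""
    return ticks / CLOCKS_PER_INSTRUCTION


def describe(ticks: int) -> str:
    # One decimal place is enough: with 2 clocks per slot the fraction is .0 or .5.
    return f"{ticks} ticks = {ticks_to_slots(ticks):.1f} instruction slots"


print(describe(983))         # 983 ticks = 491.5 instruction slots
print(total_ticks(10, [3]))  # 23: an odd total from one 3-clock stall
```

The tick figure itself is exact; the half slot only says that something in the measured code took an odd number of clocks.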
Comments
And, yes, the system counter (GETCNT, now GETCT I think) is counting system clock ticks.
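Because the counter counts clocks, elapsed time is simply end minus start. A minimal sketch, assuming each reading is the 32-bit value returned by GETCT: doing the subtraction modulo 2**32 (what an unsigned 32-bit SUB does) keeps the result correct even if the counter wraps between the two reads. The `overhead` parameter is a hypothetical calibration value, for example the difference between two back-to-back reads, subtracted so the harness does not time itself.

```python
MASK32 = 0xFFFF_FFFF  # readings assumed to be 32-bit counter values


def elapsed_ticks(start: int, end: int, overhead: int = 0) -> int:
    """Clock ticks between two 32-bit counter readings, wrap-safe.

    overhead: ticks measured for an empty start/stop pair (hypothetical
    calibration step), removed so the harness does not time itself.
    """
    return ((end - start) & MASK32) - overhead


print(elapsed_ticks(1000, 1983))                # 983
print(elapsed_ticks(0xFFFF_FF00, 0x0000_0010))  # 272, counter wrapped
```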
In this case yes, though I think both figures are worth reporting. This is just my test harness, and it's one of those things I think is helpful to know. Plus, I find time in human terms easier to relate to, even if I can't perceive such short periods. I seem to recall a figure of about 20 Hz being enough to give the eye the illusion of a constant, smooth scene.
I think a definitive answer to that question will have to come from Chip. Intermingled with the hub-exec code are calls to functions in cog RAM, and I write the results back out to hub RAM once back in hub-exec.
I need to read through Chip's updated P2 document at the top of the FPGA Files thread, which I think mentions that the hub-exec FIFO gets priority over RDLONG/WRLONG etc. As the FIFO makes full use of the egg-beater rotation, that would mean the FIFO, if it were depleted, would be full again by the time the RDLONG/WRLONG is actioned. I will have a close read of the docs a little later.
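To see why a hub read can cost a variable, and sometimes odd, number of clocks, here is a deliberately simplified model of the egg-beater. Everything in it is an assumption for illustration: 8 hub slices selected by long-address bits, a cog's reachable slice advancing by one per clock, and a base cost of 9 clocks so the total spans 9 to 16 clocks, the range commonly quoted for a cog-exec RDLONG. It ignores the hub-exec FIFO priority discussed above, and the helper names are hypothetical.

```python
HUB_SLICES = 8    # assumption: one slice per cog on an 8-cog P2
BASE_CLOCKS = 9   # assumption: cost when the wanted slice is reachable at once


def slice_of(addr: int) -> int:
    """Hub slice holding a byte address: long-address bits modulo the slice count."""
    return (addr >> 2) % HUB_SLICES


def hub_read_clocks(cog_id: int, clock: int, addr: int) -> int:
    """Modelled clocks for one hub read issued on a given system clock."""
    reachable = (clock - cog_id) % HUB_SLICES         # slice this cog can reach now (assumed rotation)
    wait = (slice_of(addr) - reachable) % HUB_SLICES  # clocks until the wanted slice comes round
    return BASE_CLOCKS + wait


# The same read costs anywhere from 9 to 16 clocks depending on when it is issued.
print(sorted({hub_read_clocks(0, t, 0x1000) for t in range(HUB_SLICES)}))
# [9, 10, 11, 12, 13, 14, 15, 16]
```

In this model consecutive longs fall in consecutive slices, which is why a sequential stream such as the FIFO can keep pace with the rotation while an isolated read usually has to wait.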
Yes. I have just realised that it's the system counter value I am converting to nanoseconds using the clock period. What that really means is that the timing is correct, even when the value is odd, but the true number of instructions cannot be calculated if hub RAM (or, to a lesser extent, LUT) is accessed.
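Converting ticks to time only needs the clock frequency, and integer arithmetic avoids any float rounding on the target. A minimal sketch: `clk_hz` must be whatever the board is actually running at, and the 200 MHz below is only an example value. The time figure is exact however many hub or LUT stalls occurred; only the step back to an instruction count stops being reliable.

```python
NS_PER_SECOND = 1_000_000_000


def ticks_to_ns(ticks: int, clk_hz: int) -> int:
    """Elapsed ticks -> nanoseconds, rounded to nearest, integer-only."""
    return (ticks * NS_PER_SECOND + clk_hz // 2) // clk_hz


print(ticks_to_ns(983, 200_000_000))  # 4915 (5 ns clock period, example only)
```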
I think it would be interesting at some point to know how much time is spent waiting for the hub. I think we may need quite wide counters, or several daisy-chained, over even a short period of hub-exec.
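One way to get a wide counter out of 32-bit words is to daisy-chain two of them through the carry, much as an ADD with WC followed by ADDX would in PASM2. A minimal sketch of the idea for totalling hub-wait ticks; the class and method names are hypothetical, and each delta added is assumed to be a single 32-bit measurement.

```python
MASK32 = 0xFFFF_FFFF


class WideCounter:
    """Two 32-bit words chained by carry: a 64-bit accumulator (hypothetical)."""

    def __init__(self) -> None:
        self.lo = 0
        self.hi = 0

    def add(self, delta: int) -> None:
        total = self.lo + (delta & MASK32)            # low-word add, may overflow 32 bits
        self.lo = total & MASK32
        self.hi = (self.hi + (total >> 32)) & MASK32  # carry ripples into the high word

    def value(self) -> int:
        return (self.hi << 32) | self.lo


acc = WideCounter()
acc.add(0xFFFF_FFFF)
acc.add(2)
print(hex(acc.value()))  # 0x100000001
```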
If it is in cog RAM without any hub or LUT access, then it should be fairly straightforward, except that a jump flushes the pipeline. However, I think the important figure is generally how long a piece of code takes to run. If you want an instruction count, it is far easier to use the new single-step debug interrupt and increment a software counter. Monitoring the PC will tell you when a branch has occurred, unless the branch is to the next instruction. If you want static code size, then a listing is your answer, or Ctrl-M in PNut.
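The single-step approach can be sketched as post-processing a list of PC values, one captured per debug interrupt (how they are captured is left out, and the function name is hypothetical). It assumes a fixed sequential PC step: 1 for cog/LUT execution and 4 for hub-exec, where the PC is a byte address. As noted above, a branch to the next instruction looks exactly like sequential execution and is not counted.

```python
from typing import Iterable, Tuple


def analyse_trace(pcs: Iterable[int], step: int) -> Tuple[int, int]:
    """Count executed instructions and detected branches in a PC trace.

    step: PC increment for straight-line code (assumed 1 in cog/LUT exec,
    4 in hub-exec). Any other change of PC is counted as a branch.
    """
    instructions = 0
    branches = 0
    prev = None
    for pc in pcs:
        instructions += 1  # assumption: one debug interrupt per instruction
        if prev is not None and pc != prev + step:
            branches += 1
        prev = pc
    return instructions, branches


# Hub-exec example: a call from 0x400 to 0x800 and a return to 0x404.
print(analyse_trace([0x400, 0x800, 0x804, 0x404], step=4))  # (4, 2)
```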
Agreed. I think it is also reasonable to maintain a min and max time, and perhaps an average for a function, simply kept as summed count / number of iterations. They all serve a purpose.
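A minimal sketch of such a per-function record, keeping only a running minimum, maximum, tick total and iteration count so nothing needs storing per call. The class is hypothetical, the sample tick values are made up apart from the 983 from the original post, and on a 32-bit target the total may need to be wider than 32 bits for long runs.

```python
from typing import Optional


class TimingStats:
    """Running min / max / mean of measured tick counts for one function."""

    def __init__(self) -> None:
        self.count = 0
        self.total = 0
        self.min: Optional[int] = None
        self.max: Optional[int] = None

    def record(self, ticks: int) -> None:
        self.count += 1
        self.total += ticks
        self.min = ticks if self.min is None else min(self.min, ticks)
        self.max = ticks if self.max is None else max(self.max, ticks)

    def mean(self) -> Optional[float]:
        # Average = summed ticks / number of iterations.
        return self.total / self.count if self.count else None


stats = TimingStats()
for t in (983, 984, 1001):
    stats.record(t)
print(stats.min, stats.max, stats.count)  # 983 1001 3
```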