Instruction Clocks / Cycles, Timers and Hub waiting

78rpm · 2015-10-29 06:20

@anyone

When I've timed PASM code using GETCNT at start and finish, I sometimes get an odd timing figure:

Minimum time for division           = 983

My code runs mainly in hub-exec, uses a stack in hub RAM via ptra, and all calls, rets, pushes and pops are of the 'A' variety. Calls into cog-exec also use "A". I am sure that instructions generally count as 2 cycles, but I want to verify whether the timer / counters increment with the system clock or on the instruction cycle. If it is on the system clock, as I think, could the hub wait result in the odd (as in uneven) counter value of 983, and should the time I really report therefore be 983 / 2 = 491.5? I can display that true figure, it is not a problem, and I am equally happy to round up or down. I am sure LUT reads / writes are 3, so that .5 would be valid in some cases, though I do not have any LUT access in my code. So why the 0.5? I can only think of waiting, and of where in the instruction the wait occurs: whether at S, which I suspect as that is fetched first, or at D, though that could also be valid as the instruction is still not complete at that point. Questions, questions.

jmg · 2015-10-29 06:29

Opcodes are not locked to 2cy, so measuring in Sys Clock Ticks is more sensible.

Electrodude · 2015-10-29 06:31

Since you're doing hubexec, is the odd delay from the FIFO stalling after jumps?

evanh · 2015-10-29 06:36

Any HubRAM access has the possibility of inserting odd numbers of clock delays.

And, yes, the system counter (GETCNT, now GETCT I think) is counting system clock ticks.

78rpm · 2015-10-29 07:13

jmg wrote: »

Opcodes are not locked to 2cy, so measuring in Sys Clock Ticks is more sensible.

In this case yes, though I think both figures would be appropriate. This is just my test harness and it's one of those things that I think is helpful to know. Plus I find human-scale time easier to handle, even if I'm not aware of such short periods. I seem to recall a figure of about 20 Hz being enough to give the eye the illusion of a constant, smooth scene.

78rpm · 2015-10-29 07:19

Electrodude wrote: »

Since you're doing hubexec, is the odd delay from the FIFO stalling after jumps?

I think a definitive answer for that question will come from Chip. Intermingled with the hub-exec are calls to functions in cog RAM, plus I write the results back out to hub RAM when back in hub-exec.

I need to read through Chip's updated P2 document at the top of the FPGA Files thread, which I think mentions that the hub-exec FIFO gets priority over the rd/wr long etc. As the FIFO makes full use of the egg-beater rotation, it would be full again, if it had been depleted, by the time the rd/wr long is actioned. I will have a close read of the docs a little later.

78rpm · 2015-10-29 07:30

evanh wrote: »

Any HubRAM access has the possibility of inserting odd numbers of clock delays.

And, yes, the system counter (GETCNT, now GETCT I think) is counting system clock ticks.

Yes. I have just realised, it's the system counter value I am converting to nanoseconds using the clock period. So what that really means is the timing is correct, even though it is an odd value, but the true number of instructions cannot be calculated if hub RAM (or, to a lesser extent, LUT) is accessed.
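As a rough sketch of that conversion (not the actual test harness): the Python below turns two counter reads into elapsed ticks, then into nanoseconds, and shows why halving the tick count only gives a nominal instruction figure. The 80 MHz clock, the 32-bit counter width and all function names are assumptions made for illustration; none of them come from this thread.

```python
# Assumed values for illustration only; neither is stated in the thread.
SYSCLK_HZ = 80_000_000   # hypothetical system clock frequency
COUNTER_BITS = 32        # assumed width of the value read back by GETCT


def elapsed_ticks(start, end, bits=COUNTER_BITS):
    """System clock ticks between two counter reads, tolerating one wrap."""
    return (end - start) & ((1 << bits) - 1)


def ticks_to_ns(ticks, clk_hz=SYSCLK_HZ):
    """Convert system clock ticks to nanoseconds."""
    return ticks * 1e9 / clk_hz


def nominal_instructions(ticks, clocks_per_instr=2):
    """Instruction count the ticks would imply if nothing ever stalled."""
    return ticks / clocks_per_instr


if __name__ == "__main__":
    # Start just before the counter wraps, finish 983 ticks later.
    start = 0xFFFFFF00
    end = (start + 983) & 0xFFFFFFFF
    ticks = elapsed_ticks(start, end)
    print(ticks)                        # 983
    print(ticks_to_ns(ticks))           # 12287.5 at the assumed 80 MHz
    print(nominal_instructions(ticks))  # 491.5, not a real instruction count
```

The elapsed time is valid whether the tick count is odd or even; only the last figure is suspect, because hub waits add clocks that are not multiples of the nominal 2.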

I think it would be interesting to know at some point how much time is spent waiting for the hub. I think we may need quite wide counters, or several daisy-chained, over even a short period of hub-exec.

evanh · 2015-10-29 08:42

Yep, create a small logging buffer in CogRAM and drop a GETCT between every instruction. Not too sure if that is easy to extend to larger code blocks but certainly can extract the precise timings in a piecemeal fashion that way.
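A possible way to post-process such a log, sketched in Python: take the captured counter values and turn them into per-instruction clock counts by subtracting the cost of the capture itself. The 2-clock GETCT cost, the 32-bit width, the function name and the sample log are all assumptions for illustration, not measured values.

```python
GETCT_CLOCKS = 2  # assumed cost of each GETCT capture, in system clocks


def per_step_clocks(log, getct_clocks=GETCT_CLOCKS, bits=32):
    """Clocks taken by the code between each pair of consecutive captures.

    Each raw delta covers one instruction under test plus one GETCT,
    so the assumed GETCT cost is subtracted from every delta.
    """
    mask = (1 << bits) - 1
    return [((b - a) & mask) - getct_clocks for a, b in zip(log, log[1:])]


if __name__ == "__main__":
    # Made-up captures: two 2-clock instructions, then one that waited.
    log = [1000, 1004, 1008, 1021]
    print(per_step_clocks(log))  # [2, 2, 11]
```

Steps that come out larger than their nominal cost are the candidates for hub waiting or a FIFO reload.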

evanh · 2015-10-29 09:12

Instructions that rely on hub alignment for access will change the timings with size of code, so adding the GETCT instructions will change the results, but analysing that can be a second setup where the captures are more strategic.

78rpm · 2015-10-29 14:23

evanh wrote: »

Yep, create a small logging buffer in CogRAM and drop a GETCT between every instruction. Not too sure if that is easy to extend to larger code blocks but certainly can extract the precise timings in a piecemeal fashion that way.

If it is in cog RAM without any hub or LUT access, then it should be fairly straightforward, except that a jump would flush the pipeline. However, I think the important figure generally is how long a piece of code takes to run. If you want an instruction count, it is far easier to use the new single-step debug interrupt and increment a software counter. Monitoring the PC will inform you when a branch has occurred, unless one branches to the next instruction. If you want a static code size then a listing is your answer, or ctrl-m in PNut.
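Combining the two measurements gives a rough estimate of the time spent waiting. The Python sketch below assumes you already have the elapsed tick count and a nominal clock cost for each executed instruction (for example, from a single-step count); the function name and the 480-instruction figure are hypothetical.

```python
def extra_clocks(elapsed_ticks, nominal_clocks):
    """Clocks not explained by the nominal cost of the executed instructions."""
    return elapsed_ticks - sum(nominal_clocks)


if __name__ == "__main__":
    # Hypothetical run: 480 executed instructions at a nominal 2 clocks each.
    executed = [2] * 480
    print(extra_clocks(983, executed))  # 23
```

The result lumps together hub waits, FIFO reloads and pipeline flushes, so it is an upper bound on hub waiting rather than an exact figure.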

78rpm · 2015-10-29 14:27

evanh wrote: »

Instructions that rely on hub alignment for access will change the timings with size of code, so adding the GETCT instructions will change the results, but analysing that can be a second setup where the captures are more strategic.

Agreed. I think it is also reasonable to maintain a min and max time and perhaps an average for a function, simply maintained as a summed count / num iterations. They all serve a purpose.
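A minimal sketch of that bookkeeping in Python, keeping only a count, a running sum, a minimum and a maximum per function. The class name and the sample tick values are made up for illustration.

```python
class TimingStats:
    """Running min / max / average of elapsed tick counts for one function."""

    def __init__(self):
        self.count = 0
        self.total = 0
        self.min = None
        self.max = None

    def add(self, ticks):
        """Record one measured run."""
        self.count += 1
        self.total += ticks
        self.min = ticks if self.min is None else min(self.min, ticks)
        self.max = ticks if self.max is None else max(self.max, ticks)

    @property
    def average(self):
        """Summed ticks divided by the number of runs (0.0 if no runs yet)."""
        return self.total / self.count if self.count else 0.0


if __name__ == "__main__":
    stats = TimingStats()
    for ticks in (983, 990, 985):  # made-up measurements
        stats.add(ticks)
    print(stats.min, stats.max, stats.average)  # 983 990 986.0
```

Only four words of state are needed per function, so the same scheme should carry over to a small record in cog or hub RAM.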
