Across the table, it iterates the hub address (decrementing) of the left hand instruction; down the table, it iterates the hub address (decrementing) of the right hand instruction.
The line across the top, "28 24 20 16 12 8 4 0", is showing the hubRAM addresses of the across iterations. That applies for the left hand instruction. It should be labelled the same down for the right hand instruction, but I never got the axis labelling sorted out.
So, bottom right corner of each table is both instructions accessing hubRAM 0.
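For anyone reading the tables, here is a small Python sketch of how the axes map to hub addresses and slices. It assumes the row labels mirror the column labels, as described above, and uses slice = hub address[4:2]. The names ADDRESSES, hub_slice and cell are made up for illustration.

```python
# Column header of each table, copied from the thread: hub addresses, decrementing.
ADDRESSES = [28, 24, 20, 16, 12, 8, 4, 0]


def hub_slice(addr):
    """Slice = hub address bits [4:2], giving 0-7."""
    return (addr >> 2) & 7


def cell(row, col):
    """Return ((left_addr, left_slice), (right_addr, right_slice)) for one cell.

    Assumption: columns index the left hand instruction and rows index the
    right hand instruction, both using the same decrementing address list.
    """
    left, right = ADDRESSES[col], ADDRESSES[row]
    return (left, hub_slice(left)), (right, hub_slice(right))


print([hub_slice(a) for a in ADDRESSES])  # slices across the top
print(cell(7, 7))                         # bottom right corner: both at hubRAM 0
```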
Actually, I'm not sure it is answering your questions yet, since these times are all effects on the right hand instruction only. The left hand instruction is still just a backstop timing reference.
Anyway, bedtime for me. Working Saturdays at the moment and alarm goes off in 5 hours.
The first instruction syncs to the egg beater and we have the timings for the second instruction only, but any combination of reads and writes is known now. To get the time of a third instruction, treat the second one as the first and the third as the second.
RDLONG0 WRLONG0 that I thought would have the fastest write of 3 cycles actually has the slowest write of 10 cycles. For a read-modify-write, there are seven cycles available for the modify at no extra time cost, enough for three instructions.
A block copy such as RDLONG0 WRLONG0, RDLONG1 WRLONG1, etc., takes 25 cycles per long, whereas RDLONG0 WRLONG1, RDLONG1 WRLONG2, etc., takes only 17 cycles per long.
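The figures above can be reproduced with a toy timing model in Python. It is only a sketch under assumptions pieced together from the patterns discussed elsewhere in this thread, not a statement of how the hardware is built: hub slices advance by one per clock; RDLONG is 3 cycles, then 1 to 8 cycles until its slice comes round, then 5 cycles; WRLONG is at least 3 cycles and ends on the cycle just before its slice; there is no loop overhead. All function names are hypothetical.

```python
def rdlong(start, hub_slice):
    """Return the clock at which a RDLONG beginning at `start` finishes."""
    t = start + 3                 # 3 cycles before the first possible read
    while t % 8 != hub_slice:     # 1..8 cycles until the slice comes round
        t += 1
    return t + 1 + 5              # the read cycle itself, then 5 more cycles


def wrlong(start, hub_slice):
    """Return the clock at which a WRLONG beginning at `start` finishes."""
    end = start + 3               # never shorter than 3 cycles
    while end % 8 != hub_slice:   # the next instruction starts on the slice
        end += 1
    return end


def rmw_tail(gap):
    """Cycles from the end of RDLONG0 to the end of WRLONG0, with `gap`
    cycles of other instructions in between (same slice for both)."""
    read_end = rdlong(0, 0)
    return wrlong(read_end + gap, 0) - read_end


def copy_cycles_per_long(write_offset, longs=16):
    """Steady-state cycles per long for RDLONG(i) WRLONG(i + write_offset)."""
    t, starts = 0, []
    for i in range(longs):
        starts.append(t)
        t = rdlong(t, i % 8)
        t = wrlong(t, (i + write_offset) % 8)
    return starts[-1] - starts[-2]


print(rmw_tail(0), rmw_tail(6), rmw_tail(8))               # 10 10 18
print(copy_cycles_per_long(0), copy_cycles_per_long(1))    # 25 17
```

In this model the same-slice write costs 10 cycles whether or not the 7 spare cycles are used, which is where the room for three 2-cycle instructions comes from.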
...
RDLONG0 WRLONG0 that I thought would have the fastest write of 3 cycles actually has the slowest write of 10 cycles. For a read-modify-write, there are seven cycles available for the modify at no extra time cost, enough for three instructions.
That one is a good point. It's a particularly practical case.
Good that the tables are useful in the end. My work has me a little stretched at the moment, I'm not getting much down time.
I think WRxxxx must end just before the slice of interest, e.g. the last cycle of WRLONG0 is slice 7 and the first cycle of the next instruction is slice 0. This is how I conjectured the writes to be originally but I didn't know about the 1-cycle acknowledgement then.
N.B.
There are differences between your timings and those in the spreadsheet for RDFAST and WRFAST.
...
There are differences between your timings and those in the spreadsheet for RDFAST and WRFAST.
All seems to match up, afaics. WRFAST is normally 3 clocks according to the spreadsheet; it mostly doesn't need to wait for any slice/slot. The FIFO deals with that after a WFLONG. And RDFAST is 10 to 17 clocks under the same conditions.
The exception, in both cases, is if there is any pending WRFAST data in the FIFO still to flush.
The latest spreadsheet I have says:
RDFAST = 2 or WRFAST finish + 10...17 cycles, and
WRFAST = 2 or WRFAST finish + 3 cycles
Yep, always been screwy for WRFAST. I had removed it from the list for a while. It's because WRFAST is only 3 clocks but the compensation for the measuring is 8 by default. I hadn't bothered to fix that.
Actually, now that I've added programmable compensation, I should have a go at making the left hand WRFAST numbers correct ...
EDIT: Oh, it looks like I also had a bug in the parameters for that as well ...
EDIT2: Oh, that's right, it doesn't time well. Because WRFAST is effectively non-blocking in terms of slice alignment, only the right hand instruction aligns to its slice, therefore WRFAST as left hand doesn't give any meaningful measurement.
I'll remove it as a left hand instruction ...
EDIT3: BTW: Out of all that there is one notable WRFAST measurement. Back-to-back WRFASTs take 5 clocks for the second (right hand) one. Everywhere else, where the measurement isn't a mess, it is 3 clocks. This isn't any surprise since the prior (left hand) WRFAST will be completing FIFO setup in the background and therefore has to be waited on.
Thanks, Evan. Reversing the axes makes more sense.
A slice difference of +1 gives the fastest RDLONG RDLONG of 9 cycles but the slowest RDLONG RDFAST of 17 cycles, otherwise the two pairs of instructions are identical. Not what I predicted.
What you are seeing there is a static alignment to the hubRAM slicing. The extra tick is on the end of the RDFAST execution time. Which will show up when RDFAST is placed as the left hand instruction, eg: RDFAST RDLONG.
Less intuitively RDFAST RDFAST doesn't fit. This combo is same as RDLONG RDFAST timing instead.
What you are seeing there is a static alignment to the hubRAM slicing. The extra tick is on the end of the RDFAST execution time. Which will show up when RDFAST is placed as the left hand instruction, eg: RDFAST RDLONG.
For RDFASTs+1 to be 17 cycles, the extra cycle must happen before the read, so that the earlier slice 1 readable by RDLONGs+1 is missed. Thus:
Less intuitively RDFAST RDFAST doesn't fit. This combo is same as RDLONG RDFAST timing instead.
The extra RDLONG times after RDFAST could be due to RDLONG having to defer to the FIFO reads caused by RDFAST, whereas the second instruction in RDFAST RDFAST could cancel the first and stop the FIFO reads.
Cycle timing calculations agree with the above. FIFO depth is 19 and the first slice available for a random read/write after a RDFAST with 19 consecutive reads is +3, which therefore should have the shortest RDLONG/WRLONG and in fact does. That leaves room for up to five intermediate instructions with no time penalty.
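One way to read the +3 figure, sketched in Python: if the FIFO load takes 19 consecutive slices starting at the RDFAST start slice, the next slice not claimed by the FIFO is 19 mod 8 = 3 further on. This is a guess at the reasoning rather than something spelled out in the thread, and first_free_slice is a made-up name.

```python
FIFO_DEPTH = 19  # depth quoted in the post above


def first_free_slice(start_slice, reads=FIFO_DEPTH):
    """Slice that follows `reads` consecutive FIFO reads from `start_slice`.

    Assumption: the FIFO load uses one slice per clock with no gaps.
    """
    return (start_slice + reads) % 8


print(first_free_slice(0))  # 3, i.e. start slice + 3
```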
[I have copied the post below from another thread.]
Below is the code I use for FIFO testing that involves timing. It is hopeless trying to count cycles if you do not know the exact hub RAM slice at the start. Slice = hub address[4:2] = 0-7.
    rdbyte  $1ff,#0         'sync to egg beater slice 0
    getct   t0              'this does not delay rdfast
    rdfast  #0,#0           'load FIFO starting at addr/slice 0
    waitx   #16-2           'ensure FIFO full before end of wait
    ' ... other instructions
    getct   t
    'rdfast ends 16 cycles after rdbyte ends,
    'however FIFO still loading after rdfast and
    'need to wait >= 10 cycles before hub read/write
    'other instructions take t - t0 - 32 cycles
    'rdfast post-read of 5 cycles use slices 1-5,
    'therefore first slice after rdfast is 6
Using the code above it is possible to test various things:
Example 1
One can confirm the FIFO depth by filling the first 32 longs of hub RAM with zeroes, then doing RDFAST and WAITX as shown above, then filling the first 32 longs with 1,2,3,...,32, then doing RFLONGs until a non-zero value is read. That value is the FIFO depth + 1.
Example 2
Same as example 1, but insert one RFLONG immediately after the WAITX. What should the non-zero value be?
Example 3
Same as example 2, but insert two to six RFLONGs.
Example 4
Same as example 2, but insert seven RFLONGs.
Cycle timing is not important in any of these cases.
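Example 1 can be mimicked in Python to show why the first non-zero value is the FIFO depth + 1. This is a sketch only: the depth of 19 is the figure quoted earlier in the thread, the refill is simplified to one long per RFLONG, and first_nonzero_rflong is a hypothetical name. Examples 2 to 4 are deliberately not modelled, because their answers depend on when the real FIFO refills, which is the point of the exercise.

```python
from collections import deque

FIFO_DEPTH = 19  # depth quoted earlier in the thread


def first_nonzero_rflong(depth=FIFO_DEPTH, longs=32):
    """Return (number of RFLONGs, value) for the first non-zero long read."""
    hub = [0] * longs
    fifo = deque(hub[:depth])          # RDFAST + WAITX: FIFO fills with zeroes
    next_addr = depth                  # next hub long the FIFO would fetch
    hub = list(range(1, longs + 1))    # hub rewritten with 1,2,3,...,32
    reads = 0
    while fifo:
        value = fifo.popleft()         # one RFLONG
        reads += 1
        if next_addr < longs:          # simplified refill from rewritten hub
            fifo.append(hub[next_addr])
            next_addr += 1
        if value != 0:
            return reads, value
    return reads, 0


print(first_nonzero_rflong())  # (20, 20): depth + 1
```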
Comments
He he, my thinking is not as good as it was, I blame my Meniere's.
How about WRFAST WRFAST = 0 cycles?
WRFAST RDLONG
WRFAST WRLONG
WRFAST RDFAST
WRFAST WRFAST
RDLONG = aaarbbbbb .. aaarrrrrrrrbbbbb = 9..16 cycles
RDFAST = aaaarbbbbb .. aaaarrrrrrrrbbbbb = 10..17 cycles
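The two patterns above can be generated from the slice difference with a short Python sketch. It reads the letters as a = cycles before the first possible read, r = cycles until the slice has been read, b = cycles afterwards, and it assumes the left hand RDLONG spends 5 cycles after its own read and that slices advance by one per clock. Those readings are assumptions, and second_read is a made-up name.

```python
def second_read(pre, slice_diff):
    """Pattern for the right hand read that follows a left hand RDLONG.

    pre        : 3 for RDLONG, 4 for RDFAST (the 'a' runs in the patterns)
    slice_diff : right hand slice minus left hand slice, 0-7
    """
    first_slot = 5 + pre + 1                  # clocks after the left hand read
    wait = (slice_diff - first_slot) % 8 + 1  # 1..8 'r' cycles
    return "a" * pre + "r" * wait + "b" * 5


for diff in range(8):
    rdlong, rdfast = second_read(3, diff), second_read(4, diff)
    print(diff, len(rdlong), len(rdfast))
```

With these assumptions a slice difference of +1 gives 9 cycles for RDLONG and 17 for RDFAST, and every other difference gives the same count for both, which matches the measurements reported in this thread.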
Thanks for moving the comments. There is one more to do:
http://forums.parallax.com/discussion/comment/1512542/#Comment_1512542
Makes no sense where it is now.