Hub RAM FIFO read timing - Page 4 — Parallax Forums

Hub RAM FIFO read timing



  • evanh Posts: 15,126
    edited 2020-12-18 11:03
    evanh wrote: »
    It iterates, across the table, the hub address (decrementing) of the left hand instruction. And iterates, down the table, the hub address (decrementing) of the right hand instruction.

    The line across the top, "28 24 20 16 12 8 4 0", is showing the hubRAM addresses of the across iterations. That applies for the left hand instruction. It should be labelled the same down for the right hand instruction, but I never got the axis labelling sorted out.

    So, bottom right corner of each table is both instructions accessing hubRAM 0.

  • Is the following correct for the first column of RDLONG RDLONG ?
     Hub Addr x = 28 | Hub Addr y |  Cycles  
    -----------------|------------|--------
    RDLONGx  RDLONGy |     28     |    16
                     |     24     |    15
                     |     20     |    14
                     |     16     |    13
                     |     12     |    12
                     |      8     |    11
                     |      4     |    10
                     |      0     |     9
    
  • evanh Posts: 15,126
    yep
  • TonyB_ Posts: 2,108
    edited 2020-12-18 12:02
    Thanks Evan. Here are your results in a form I can understand better:
    ----------------------------
    slice x  slice y    cycles y
    ----------------------------
    RDLONG0  RDLONG0    16
    RDLONG0  RDLONG1     9
    RDLONG0  RDLONG2    10
    RDLONG0  RDLONG3    11
    RDLONG0  RDLONG4    12
    RDLONG0  RDLONG5    13
    RDLONG0  RDLONG6    14
    RDLONG0  RDLONG7    15
    ----------------------------
    RDLONG0  WRLONG0    10
    RDLONG0  WRLONG1     3
    RDLONG0  WRLONG2     4
    RDLONG0  WRLONG3     5
    RDLONG0  WRLONG4     6
    RDLONG0  WRLONG5     7
    RDLONG0  WRLONG6     8
    RDLONG0  WRLONG7     9
    ----------------------------
    WRLONG0  RDLONG0    14
    WRLONG0  RDLONG1    15
    WRLONG0  RDLONG2    16
    WRLONG0  RDLONG3     9
    WRLONG0  RDLONG4    10
    WRLONG0  RDLONG5    11
    WRLONG0  RDLONG6    12
    WRLONG0  RDLONG7    13
    ----------------------------
    WRLONG0  WRLONG0     8
    WRLONG0  WRLONG1     9
    WRLONG0  WRLONG2    10
    WRLONG0  WRLONG3     3
    WRLONG0  WRLONG4     4
    WRLONG0  WRLONG5     5
    WRLONG0  WRLONG6     6
    WRLONG0  WRLONG7     7
    ----------------------------
    slice y hub address = slice 0 hub address + y*4 bytes
    cycles y = time taken by slice y instruction only
    all hub addresses are long aligned
    
  • My predicted RDLONG RDLONG and WRLONG WRLONG are correct, but my RDLONG WRLONG and WRLONG RDLONG are wrong (off by one slice).
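
The four measured tables above share one regular pattern: for each instruction pair, the second instruction is fastest at one slice offset and gets one cycle slower for every further slice, wrapping modulo 8. Here is a minimal Python sketch of that pattern. It is fitted only to the numbers in the tables, the names `FIT` and `second_instr_cycles` are made up for illustration, and it describes the data rather than the hardware.

```python
# (minimum cycles, slice offset n where the minimum occurs), read off the
# measured tables above. Hypothetical names; a fit to data, not a hardware model.
FIT = {
    ("RDLONG", "RDLONG"): (9, 1),
    ("RDLONG", "WRLONG"): (3, 1),
    ("WRLONG", "RDLONG"): (9, 3),
    ("WRLONG", "WRLONG"): (3, 3),
}

def second_instr_cycles(first, second, n):
    """Cycles taken by the second instruction when its hub slice is n slices
    (n*4 bytes, long aligned) after the first instruction's slice."""
    fastest_cycles, fastest_n = FIT[(first, second)]
    return fastest_cycles + (n - fastest_n) % 8

if __name__ == "__main__":
    for pair in FIT:
        print(pair, [second_instr_cycles(*pair, n) for n in range(8)])
```

Each printed row should match the corresponding "cycles y" column in the tables.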
  • evanh Posts: 15,126
    edited 2020-12-18 11:50
    Actually, I'm not sure it answers your questions yet, since these times are all effects on the right hand instruction only. The left hand instruction is still just a backstop timing reference.

    Anyway, bedtime for me. Working Saturdays at the moment and alarm goes off in 5 hours.

  • The first instruction syncs to the egg beater and we have the timings for the second instruction only, but any combination of reads and writes is known now. To get the time of a third instruction, treat the second one as the first and the third as the second.

    RDLONG0 WRLONG0 that I thought would have the fastest write of 3 cycles actually has the slowest write of 10 cycles. For a read-modify-write, there are seven cycles available for the modify at no extra time cost, enough for three instructions.

    A block copy such as RDLONG0 WRLONG0, RDLONG1 WRLONG1, etc., takes 25 cycles per long, whereas RDLONG0 WRLONG1, RDLONG1 WRLONG2, etc., takes only 17 cycles per long.
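
The read-modify-write slack and the block-copy figures in this post are sums and differences of entries in the measured tables. A small Python sketch of that arithmetic, assuming a loop that contains nothing but the two hub instructions (as in the post); `RD_WR`, `WR_RD` and both function names are hypothetical:

```python
# Measured cycles for the second instruction, indexed by slice offset n (0..7),
# copied from the RDLONG->WRLONG and WRLONG->RDLONG tables earlier in the thread.
RD_WR = [10, 3, 4, 5, 6, 7, 8, 9]
WR_RD = [14, 15, 16, 9, 10, 11, 12, 13]

def copy_cycles_per_long(k):
    """Steady-state cycles per long for RDLONG(i) WRLONG(i+k), RDLONG(i+1), ...
    Assumes only the two hub instructions are in the loop (illustrative helper)."""
    write = RD_WR[k % 8]        # the write lands k slices after the read
    read = WR_RD[(1 - k) % 8]   # the next read is 1 - k slices after the write
    return write + read

def free_modify_cycles(actual_write_cycles, fastest_write_cycles=3):
    """Cycles that can be spent between the read and the write at no extra cost."""
    return actual_write_cycles - fastest_write_cycles
```

With these tables, `copy_cycles_per_long(0)` is 10 + 15 = 25 and `copy_cycles_per_long(1)` is 3 + 14 = 17, and `free_modify_cycles(RD_WR[0])` is 7, which is room for three 2-cycle instructions.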
  • evanh Posts: 15,126
    TonyB_ wrote: »
    ...
    RDLONG0 WRLONG0 that I thought would have the fastest write of 3 cycles actually has the slowest write of 10 cycles. For a read-modify-write, there are seven cycles available for the modify at no extra time cost, enough for three instructions.
    That one is a good point. It's a particularly practical case.

    Good that the tables are useful in the end. My work has me a little stretched at the moment, I'm not getting much down time.

  • evanh wrote: »
    TonyB_ wrote: »
    ...
    RDLONG0 WRLONG0 that I thought would have the fastest write of 3 cycles actually has the slowest write of 10 cycles. For a read-modify-write, there are seven cycles available for the modify at no extra time cost, enough for three instructions.
    That one is a good point. It's a particularly practical case.

    Good that the tables are useful in the end. My work has me a little stretched at the moment, I'm not getting much down time.

    I think WRxxxx must end just before the slice of interest, e.g. the last cycle of WRLONG0 is slice 7 and the first cycle of the next instruction is slice 0. This is how I conjectured the writes to be originally but I didn't know about the 1-cycle acknowledgement then.

    N.B.
    There are differences between your timings and those in the spreadsheet for RDFAST and WRFAST.
  • evanh Posts: 15,126
    edited 2020-12-19 22:40
    TonyB_ wrote: »
    ...
    There are differences between your timings and those in the spreadsheet for RDFAST and WRFAST.
    All seems to match up, afaics. WRFAST is normally always 3 clocks according to the spreadsheet, it mostly doesn't need to wait for any slice/slot. The FIFO deals with that after a WFLONG. And RDFAST is 10 to 17 clocks under same conditions.

    The exception, in both cases, is if there is any pending WRFAST data in the FIFO still to flush.

  • TonyB_ Posts: 2,108
    edited 2020-12-20 10:49
    evanh wrote: »
    TonyB_ wrote: »
    ...
    There are differences between your timings and those in the spreadsheet for RDFAST and WRFAST.
    All seems to match up, afaics. WRFAST is normally always 3 clocks according to the spreadsheet, it mostly doesn't need to wait for any slice/slot. The FIFO deals with that after a WFLONG. And RDFAST is 10 to 17 clocks under same conditions.

    The exception, in both cases, is if there is any pending WRFAST data in the FIFO still to flush.

    The latest spreadsheet I have says:

    RDFAST = 2 or WRFAST finish + 10...17 cycles, and
    WRFAST = 2 or WRFAST finish + 3 cycles
  • evanh Posts: 15,126
    edited 2020-12-20 01:58
    It's an either/or. Either it's 2 clocks with the non-blocking mode bit, D[31], set, or it's WRFAST finish + ... that was one of your added features. :)

  • TonyB_ Posts: 2,108
    edited 2020-12-20 10:43
    evanh wrote: »
    It's an either/or. Either it's 2 clocks with the non-blocking mode bit, D[31], set, or it's WRFAST finish + ... that was one of your added features. :)

    He he, my thinking is not as good as it was, I blame my Meniere's.

    How about WRFAST WRFAST = 0 cycles?
  • Evan, there are four dubious instruction pairs in your results:

    WRFAST RDLONG
    WRFAST WRLONG
    WRFAST RDFAST
    WRFAST WRFAST
  • evanh Posts: 15,126
    edited 2020-12-20 20:23
    Yep, always been screwy for WRFAST. I had removed it from the list for a while. It's because WRFAST is only 3 clocks but the compensation for the measuring is 8 by default. I hadn't bothered to fix that.

    Actually, now that I've added programmable compensation, I should have a go at making the left hand WRFAST numbers correct ...

    EDIT: Oh, it looks like I also had a bug in the parameters for that as well ...
    EDIT2: Oh, that's right, it doesn't time well. WRFAST is effectively non-blocking in terms of slice alignment, so only the right hand instruction aligns to its slice. WRFAST as the left hand instruction therefore doesn't give any meaningful measurement.

    I'll remove it as a left hand instruction ...

    EDIT3: BTW: Out of all that there is one notable WRFAST measurement. Back-to-back WRFASTs take 5 clocks for the second (right hand) one. Everywhere else, where the measurement isn't a mess, it is 3 clocks. This isn't any surprise since the prior (left hand) WRFAST will be completing FIFO setup in the background and therefore has to be waited on.

  • evanh Posts: 15,126
    edited 2020-12-20 21:30
    Tidied up. I've also reversed both axes.
  • TonyB_ Posts: 2,108
    edited 2020-12-21 00:19
    Thanks, Evan. Reversing the axes makes more sense.

    A slice difference of +1 gives the fastest RDLONG RDLONG of 9 cycles but the slowest RDLONG RDFAST of 17 cycles; otherwise the two instruction pairs have identical timings. Not what I predicted.
    -----------------------------
    INSTR1s  INSTR2s+n  cycles for
                        INSTR2s+n
    -----------------------------
    RDLONGs  RDLONGs+0  16
    RDLONGs  RDLONGs+1   9
    RDLONGs  RDLONGs+2  10
    RDLONGs  RDLONGs+3  11
    RDLONGs  RDLONGs+4  12
    RDLONGs  RDLONGs+5  13
    RDLONGs  RDLONGs+6  14
    RDLONGs  RDLONGs+7  15
    -----------------------------
    RDLONGs  RDFASTs+0  16
    RDLONGs  RDFASTs+1  17
    RDLONGs  RDFASTs+2  10
    RDLONGs  RDFASTs+3  11
    RDLONGs  RDFASTs+4  12
    RDLONGs  RDFASTs+5  13
    RDLONGs  RDFASTs+6  14
    RDLONGs  RDFASTs+7  15
    -----------------------------
    hub address s+n = hub address s + n*4 bytes
    all addresses are long aligned
    
  • evanh Posts: 15,126
    What you are seeing there is a static alignment to the hubRAM slicing. The extra tick is on the end of the RDFAST execution time. Which will show up when RDFAST is placed as the left hand instruction, eg: RDFAST RDLONG.

    Less intuitively, RDFAST RDFAST doesn't fit. This combo has the same timing as RDLONG RDFAST instead.

  • TonyB_ Posts: 2,108
    edited 2020-12-21 11:31
    evanh wrote: »
    What you are seeing there is a static alignment to the hubRAM slicing. The extra tick is on the end of the RDFAST execution time. Which will show up when RDFAST is placed as the left hand instruction, eg: RDFAST RDLONG.
    For RDFASTs+1 to be 17 cycles, the extra cycle must happen before the read, so that the earlier slice 1 readable by RDLONGs+1 is missed. Thus:

    RDLONG = aaarbbbbb .. aaarrrrrrrrbbbbb = 9..16 cycles
    RDFAST = aaaarbbbbb .. aaaarrrrrrrrbbbbb = 10..17 cycles
    evanh wrote: »
    Less intuitively, RDFAST RDFAST doesn't fit. This combo has the same timing as RDLONG RDFAST instead.
    The extra RDLONG times after RDFAST could be due to RDLONG having to defer to the FIFO reads caused by RDFAST, whereas the second instruction in RDFAST RDFAST could cancel the first and stop the FIFO reads.
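
The `aaar...bbbbb` patterns above can be turned into a small calculation. This is a minimal Python sketch, assuming the previous instruction was a RDLONG that read exactly in its own slice and had five post-read cycles. The function name and parameters are made up for illustration, and it covers only the RDLONG RDLONG and RDLONG RDFAST pairs from the table earlier in the thread.

```python
def read_cycles(n, pre, post=5, prev_post=5):
    """Cycles for a hub read issued straight after a previous hub read.

    n          slice offset of this read relative to the previous read (0..7)
    pre        cycles before the first possible read (3 for RDLONG 'aaa',
               4 for RDFAST 'aaaa', per the patterns above)
    post       cycles after the read (the 'bbbbb' above)
    prev_post  post-read cycles of the previous instruction
    """
    # The slice window advances one slice per clock. The previous read was at
    # offset 0, so the first clock on which this instruction can read sees:
    first_slot = (prev_post + pre + 1) % 8
    wait = (n - first_slot) % 8 + 1   # the 'r' clocks, 1..8
    return pre + wait + post
```

With `pre=3` the function reproduces the measured RDLONG RDLONG column (16, 9, 10, ..., 15), and with `pre=4` it reproduces the RDLONG RDFAST column (16, 17, 10, ..., 15), which fits the extra cycle coming before the read.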
  • TonyB_ Posts: 2,108
    edited 2020-12-22 10:36
    TonyB_ wrote: »
    The extra RDLONG times after RDFAST could be due to RDLONG having to defer to the FIFO reads caused by RDFAST, whereas the second instruction in RDFAST RDFAST could cancel the first and stop the FIFO reads.
    Cycle timing calculations agree with the above. The FIFO depth is 19, so after a RDFAST with 19 consecutive reads the first slice available for a random read/write is +3. That slice should therefore give the shortest RDLONG/WRLONG, and in fact it does. Up to five intermediate two-cycle instructions fit with no time penalty.
    RDFAST RDLONG fastest
    
    slice   012345670123456701234567012345670
                    !!!!!!!!!!!!!!!!!!!.      cycles
    rf0  aaaarrrrrrrrbbbbb     !       .         ?	
    rd3             !     aaarrrrrrrrrrrbbbbb   19
                    !          !       .
    ins  rf0______________i1i2i3i4i5rd3______
    
    
    RDFAST WRLONG fastest
    
    slice   0123456701234567012345670123
                    !!!!!!!!!!!!!!!!!!!: cycles
    rf0  aaaarrrrrrrrbbbbb     !       :    ?
    wr3             !     ccwwwwwwwwwwd:   13
                    !          !       :
    ins  rf0______________i1i2i3i4i5wr3
    
    
    ! = RDFAST read
    . = RDLONG read
    : = WRLONG write
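
The numbers in the two diagrams can be checked with a few lines of arithmetic. This is a rough Python sketch using only figures from this thread: FIFO depth 19, eight slices, the stalled times of 19 and 13 cycles from the diagrams, and the fastest RDLONG/WRLONG times of 9 and 3 cycles from the earlier tables. The helper names are hypothetical.

```python
SLICES = 8
FIFO_DEPTH = 19   # longs, as stated in the post above

def first_free_slice(start_slice, reads=FIFO_DEPTH):
    """Slice reached on the clock after `reads` consecutive FIFO reads
    that began at start_slice."""
    return (start_slice + reads) % SLICES

def free_instructions(stalled_cycles, fastest_cycles, instr_cycles=2):
    """How many instr_cycles-long instructions fit into the stall for free."""
    return (stalled_cycles - fastest_cycles) // instr_cycles
```

`first_free_slice(0)` is 3, and both `free_instructions(19, 9)` for RDLONG and `free_instructions(13, 3)` for WRLONG come to 5, matching i1 to i5 in the diagrams.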
    
  • System Posts: 42
    edited 2020-12-23 15:17
    This discussion was merged with comments split from: PNut/Spin2 Latest Version (v35b - DEBUG MIDI display updated).
  • System wrote: »
    This discussion was merged with comments split from: PNut/Spin2 Latest Version (v35b - DEBUG MIDI display updated).

    Thanks for moving the comments. There is one more to do:
    http://forums.parallax.com/discussion/comment/1512542/#Comment_1512542
    Makes no sense where it is now.
  • TonyB_ Posts: 2,108
    edited 2022-08-28 10:21

    [I have copied the post below from another thread.]

    Below is the code I use for FIFO testing that involves timing. It is hopeless trying to count cycles if you do not know the exact hub RAM slice at the start. Slice = hub address[4:2] = 0-7.

            rdbyte  $1ff,#0     'sync to egg beater slice 0
            getct   t0          'this does not delay rdfast
            rdfast  #0,#0       'load FIFO starting at addr/slice 0
            waitx   #16-2       'ensure FIFO full before end of wait
    '       ...                  other instructions
            getct   t
    
    'rdfast ends 16 cycles after rdbyte ends,
    'however FIFO still loading after rdfast and
    'need to wait >= 10 cycles before hub read/write
    'other instructions take t - t0 - 32 cycles
    'rdfast post-read of 5 cycles use slices 1-5,
    'therefore first slice after rdfast is 6
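
For post-processing the two GETCT samples off-chip, the bookkeeping in the comments above can be written as two tiny helpers. This is a Python sketch with hypothetical names. It assumes `t0` and `t` are the raw 32-bit values returned by GETCT, so the subtraction is done modulo 2**32 in case the counter wraps between the samples.

```python
MASK32 = 0xFFFFFFFF

def hub_slice(addr):
    """Hub RAM slice for a byte address: address bits [4:2], giving 0..7."""
    return (addr >> 2) & 7

def other_cycles(t0, t, overhead=32):
    """Cycles spent in the 'other instructions', i.e. t - t0 - 32, computed
    modulo 2**32 so a counter wrap between the two samples is harmless."""
    return (t - t0 - overhead) & MASK32
```

For example, long-aligned addresses 0, 4, ..., 28 map to slices 0 to 7, and address 32 maps back to slice 0.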
    

    Using the code above it is possible to test various things:

    Example 1
    One can confirm the FIFO depth as follows: fill the first 32 longs of hub RAM with zeroes, do RDFAST and WAITX as shown above, fill the first 32 longs with 1,2,3,...,32, then do RFLONGs until a non-zero value is read. That value is the FIFO depth + 1.

    Example 2
    Same as example 1, but insert one RFLONG immediately after the WAITX. What should the non-zero value be?

    Example 3
    Same as example 2, but insert two to six RFLONGs.

    Example 4
    Same as example 2, but insert seven RFLONGs.

    Cycle timing is not important in any of these cases.
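
Example 1 can be illustrated with a toy simulation. This is a minimal Python sketch under a strong simplifying assumption: the FIFO is completely full when WAITX ends and only fetches further longs once the RFLONGs start draining it. The function name is made up, and the sketch does not try to answer Examples 2 to 4, whose results depend on exactly when the real FIFO refills.

```python
from collections import deque

def first_nonzero_rflong(depth, hub_longs=32):
    """Simulate Example 1 and return the first non-zero value read by RFLONG.

    Simplified model: the FIFO holds `depth` longs, is full before the hub is
    rewritten, and fetches one new long for every long that is read out."""
    hub = [0] * hub_longs                  # step 1: zero the first 32 longs
    fifo = deque(hub[:depth])              # step 2: RDFAST + WAITX fills the FIFO
    next_fetch = depth                     # next hub long the FIFO will load
    hub = list(range(1, hub_longs + 1))    # step 3: hub now holds 1..32
    for _ in range(hub_longs):             # step 4: RFLONG until non-zero
        value = fifo.popleft()
        if next_fetch < hub_longs:         # FIFO tops up from the new hub data
            fifo.append(hub[next_fetch])
            next_fetch += 1
        if value != 0:
            return value
    return None
```

In this model `first_nonzero_rflong(19)` returns 20, which is the depth + 1 result described in Example 1 for the FIFO depth of 19 quoted earlier in the thread.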

