Hub RAM FIFO read timing - Page 2 — Parallax Forums

Hub RAM FIFO read timing

  • Someone suggested using a separate register to hold the top of the hardware stack to speed things up a bit. (I don't know whether that was done, because I think the stack is a big shift register with no stack pointer.) Similarly, could a separate register hold the top of the FIFO, or does one already?
  • cgraceycgracey Posts: 14,208
    The 9 clocks break down like this:

    1) 3 clocks to get the read command to a particular RAM
    2) 1 clock for the RAM to read out the data
    3) 4 more clocks to route the data to the cog that requested it
    4) 1 last clock to get it into the FIFO

    That's it.
  • roglohrogloh Posts: 5,837
    edited 2020-11-25 22:28
    Thanks for the detailed breakdown, Chip. I was looking at faster/smaller C prologue/epilogue ideas in LUT/COG RAM that could speed up the register-saving steps in function calls instead of fighting the FIFO for access to hub memory. But those extra 13-20 clock branches back into HUB-exec are going to hurt, so it might just be better to remain in HUB-exec mode for the entire function call and live with the extra code size overhead.

    Update: I imagine that running instruction sequences such as "setq + rdlong" to write/read register data to/from a stack frame in hub memory while in HUB-exec mode would also stall the FIFO, potentially by the full egg-beater read cycle each time they are issued. So weighing the pros and cons of having prologues/epilogues in COG/LUT RAM is not trivial.
  • cgracey wrote: »
    The 9 clocks break down like this:

    1) 3 clocks to get the read command to a particular RAM
    2) 1 clock for the RAM to read out the data
    3) 4 more clocks to route the data to the cog that requested it
    4) 1 last clock to get it into the FIFO

    That's it.

    Thanks from me as well, Chip.
    Are 1) and 3) due mainly to physical distance between cog and hub RAM?
  • cgraceycgracey Posts: 14,208
    TonyB_ wrote: »
    cgracey wrote: »
    The 9 clocks break down like this:

    1) 3 clocks to get the read command to a particular RAM
    2) 1 clock for the RAM to read out the data
    3) 4 more clocks to route the data to the cog that requested it
    4) 1 last clock to get it into the FIFO

    That's it.

    Thanks from me as well, Chip.
    Are 1) and 3) due mainly to physical distance between cog and hub RAM?

    Yes, I had to add registers to buy clock cycles to overcome the routing delay.
  • TonyB_TonyB_ Posts: 2,196
    edited 2020-12-01 17:54
    TonyB_ wrote: »
    cgracey wrote: »
    Branches to hub take 13..20 clocks. It takes up to 7 clocks to reach the hub window of interest, so that a read command can be issued, then it takes 13 clocks to get to the next instruction. I'm looking at the Verilog and it signals 'go' as soon as the first instruction is entered into the FIFO. So, it doesn't wait for some number of longs, just the first one. Why this takes 13 clocks, I'm not sure right now. I know it takes a few clocks to get the read command to the actual RAM of interest, then it takes a few clocks for the data to come back through sets of registers to the cog that requested it, then, there's a clock in the FIFO. Not sure how it all adds up to thirteen, at the moment. It seems like a long time.

    As Cog/LUT exec branches take 4 clocks, could it be thought of as 4 + 9..16 clocks for hub exec? Or 4 + 9..9+cogs-1, where cogs = 8 only currently. The question is, could the 9 be reduced in future?
    cgracey wrote: »
    The 9 clocks break down like this:

    1) 3 clocks to get the read command to a particular RAM
    2) 1 clock for the RAM to read out the data
    3) 4 more clocks to route the data to the cog that requested it
    4) 1 last clock to get it into the FIFO

    That's it.

    Just wondering why RDFAST takes 10..17 cycles to load FIFO, instead of 9..16 cycles taken by branches to hub RAM.
  • evanhevanh Posts: 16,039
    edited 2020-12-01 04:59
    It'll be pipeline related. Same as there is +4 for branching in hubexec, incurring the combined hubRAM fetching and instruction pipeline reloading. Whereas RDLONG only has to align the hubRAM fetch with operand input of the ALU, which is much further along the pipeline than instruction fetching.
  • cgraceycgracey Posts: 14,208
    edited 2020-12-01 06:19
    I believe RDFAST waits for two longs to be in the FIFO, not one, in case there is a long crossing.
  • TonyB_TonyB_ Posts: 2,196
    edited 2020-12-02 11:01
    TonyB_ wrote: »
    Just wondering why RDFAST takes 10..17 cycles to load FIFO, instead of 9..16 cycles taken by branches to hub RAM.
    cgracey wrote: »
    I believe RDFAST waits for two longs to be in the FIFO, not one, in case there is a long crossing.

    Thanks for the info, Chip. I was hoping there's a case where RDFAST loads the FIFO in 16 cycles maximum and I think this is it: a no-wait RDFAST followed by a RFBYTE, e.g. for XBYTE, can always get the byte from the first long in the FIFO.

    If there are four instructions before a RDFAST that could be moved after it, or only three before a _RET_ RDFAST, then a no-wait RDFAST (assuming a register is used for D with D[31] = 1) will always be faster than a normal one, saving 2 .. 9 cycles. This holds despite needing an extra instruction, WAITX, to ensure a delay of 16 cycles after the no-wait RDFAST (unless instructions of this duration can be moved).

    If there are three instructions before a RDFAST that could be moved after it, or only two before a _RET_ RDFAST, then a no-wait RDFAST will never be slower and usually faster than a normal one. Time saving is 0 .. 7 cycles.

    Edit:
    Provided there is no long crossing, the above also applies to RFWORD & RFLONG.
  • evanhevanh Posts: 16,039
    edited 2020-12-02 03:31
    Tony,
    I gather you're commenting on when the hubRAM->FIFO data becomes valid with D[31] = 1, rather than actual instruction speed.

  • TonyB_TonyB_ Posts: 2,196
    edited 2020-12-03 23:21
    evanh wrote: »
    Tony,
    I gather you're commenting on when the hubRAM->FIFO data becomes valid with D[31] = 1, rather than actual instruction speed.

    Evan, the code below compares the two versions of RDFAST. The main reason for using no-wait RDFAST is to save time but it also makes timing deterministic.
    'Original code, normal RDFAST:
    				'cycles
    	<instr1>		'2
    	...			
    	<instrN>		'2
    _ret_	rdfast	#0,pb		'2+10..17+2 = 14..21
    				' _ret_ starts new XBYTE
    
    'Reordered code, no-wait RDFAST:
    				'cycles
    	rdfast	nowait,pb	'2
    	waitx	#X		'2+X
    	<instr1>		'2
    	...			
    _ret_	<instrN>		'2+2
    				' _ret_ starts new XBYTE
    nowait	long	$8000_0000
    
    'Must be at least 16 cycles between waitx and _ret_ incl.
    'Assuming 2 cycles for all <instr1> ... <instrN>,
    'then 2+X+2*N+2 = 16, or X = 12-2*N
    '
    'If <instrN> requires prefix other than _ret_,
    'then add separate ret and subtract 2 from X
    
    'Table of N and X with cycle counts:
    '
    'N	X	 Total cycles
    '		using RDFAST with
    '		wait	no-wait
    '
    '0	12	14..21	   18
    '1	10	16..23	   18
    '2	8	18..25	   18
    '3	6	20..27	   18
    '4	4	22..29	   18
    '5	2	24..31	   18
    '6	0	26..33	   18
    '7	-	28..35	   18
    '
    'waitx not needed if N > 6
    
    Note that, for this XBYTE example, moving just one instruction from before RDFAST to after it saves cycles compared to the average for the waiting RDFAST.

    EDIT1:
    Added N = 0, where no-wait is only 0.5 cycles slower than the wait average.

    EDIT2:
    The fixed 18 cycles for no-wait could be a variable 11..18 cycles with the following future logic change: RFxxxx waits for the first FIFO long, if necessary, after a no-wait RDFAST. WAITX would then not be needed, unless deterministic timing is wanted.
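
The N and X table above can be reproduced with a few lines of arithmetic. This is only a sketch of the cycle counting in the post, assuming every <instr> takes 2 cycles; the function names are made up for illustration and nothing here has been checked on hardware.

```python
# Cycle counts quoted in the post above (not verified on silicon).
RDFAST_LOAD = (10, 17)   # extra cycles a normal RDFAST spends loading the FIFO
FIFO_DELAY = 16          # minimum cycles from WAITX to _ret_ inclusive

def waitx_operand(n):
    """X for WAITX #X, or None when N instructions already cover the delay."""
    x = FIFO_DELAY - 2 - 2 * n - 2      # from 2 + X + 2*N + 2 = 16
    return x if x >= 0 else None

def wait_total(n):
    """(min, max) cycles: N instructions, then _ret_ rdfast = 2 + 10..17 + 2."""
    return tuple(2 * n + 2 + load + 2 for load in RDFAST_LOAD)

def nowait_total(n):
    """Cycles for: no-wait rdfast, optional waitx, N instructions, _ret_."""
    x = waitx_operand(n)
    waitx_cycles = 0 if x is None else 2 + x
    return 2 + waitx_cycles + 2 * n + 2

for n in range(8):
    x = waitx_operand(n)
    lo, hi = wait_total(n)
    print(n, "-" if x is None else x, f"{lo}..{hi}", nowait_total(n))
```

For N = 0..7 the no-wait column stays fixed while the wait column grows by 2 cycles per instruction.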
  • I think we could have done no-wait RDFAST better, see EDIT2 above.
  • TonyB_TonyB_ Posts: 2,196
    edited 2020-12-05 09:51
    And before I forget to mention it again, no-wait RDBYTE/WORD/LONG seem possible in a future P2+/P3.
  • evanhevanh Posts: 16,039
    TonyB_ wrote: »
    And before I forget to mention it again, no-wait RDBYTE/WORD/LONG seem possible.
    I presume you're thinking about alternative design possibilities for the prop2.

  • TonyB_TonyB_ Posts: 2,196
    edited 2020-12-05 11:31
    evanh wrote: »
    TonyB_ wrote: »
    And before I forget to mention it again, no-wait RDBYTE/WORD/LONG seem possible.
    I presume you're thinking about alternative design possibilities for the prop2.

    Yes, previous post edited.

    Due to the delays reading from hub, it would be good if you could start the reading process at a certain time but read the actual data later, with other instructions in between to reduce the total cycles. This is precisely what no-wait RDFAST allows on the P2 now, as described earlier here:
    http://forums.parallax.com/discussion/comment/1511479/#Comment_1511479
  • cgraceycgracey Posts: 14,208
    Yes, TonyB_, that would work. It wouldn't have been hard to make it do that. I find that a lot of transfers I code, though, are multi-long transfers. They are pretty efficient.
  • Cluso99Cluso99 Posts: 18,069
    Setq is amazing 😃
  • TonyB_TonyB_ Posts: 2,196
    edited 2020-12-10 11:12
    TonyB_ wrote: »
    cgracey wrote: »
    The 9 clocks break down like this:

    1) 3 clocks to get the read command to a particular RAM
    2) 1 clock for the RAM to read out the data
    3) 4 more clocks to route the data to the cog that requested it
    4) 1 last clock to get it into the FIFO

    That's it.

    Chip, based on the above, is the following an accurate summary for generalised hub RAM read timing?

    a) 3 clocks to get the read command to a particular RAM
    b) 1..8 clocks for the RAM to read out the data, depending on egg beater
    c) 4 clocks to route the data to the cog that requested it
    d) 1 clock to get the data into the FIFO or D register

    Total 9..16 clocks for hub read, +1 if long crossing or RDFAST when D[31]=0

    * * * * * * * * * *

    And is this correct for hub RAM writes?

    e) 3 clocks to get the write command to a particular RAM
    f) 0..7 clocks to start write of data to RAM, depending on egg beater

    Total 3..10 clocks for hub write, +1 if long crossing

    Write instruction ends at start of egg beater slice of interest
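
The stage sums proposed in this post can be written out as a small calculation. It is a sketch of the summary as posted, not a confirmed formula; the constant and function names are hypothetical, and the RDFAST and hub-exec branch figures simply reuse the +1 and +4 mentioned earlier in the thread.

```python
# Stage costs in clocks, as proposed in the post above (awaiting confirmation).
READ_CMD = 3     # a) read command travels to the RAM
READ_ROUTE = 4   # c) data routed back to the requesting cog
READ_LATCH = 1   # d) data latched into the FIFO or D register
WRITE_CMD = 3    # e) write command travels to the RAM
COG_BRANCH = 4   # Cog/LUT exec branch cost quoted earlier in the thread

def hub_read_clocks(ram_clocks, extra=0):
    """ram_clocks is stage b), 1..8; extra=1 models a long crossing."""
    if not 1 <= ram_clocks <= 8:
        raise ValueError("stage b) must be 1..8 clocks")
    return READ_CMD + ram_clocks + READ_ROUTE + READ_LATCH + extra

def hub_write_clocks(wait_clocks, extra=0):
    """wait_clocks is stage f), 0..7; extra=1 models a long crossing."""
    if not 0 <= wait_clocks <= 7:
        raise ValueError("stage f) must be 0..7 clocks")
    return WRITE_CMD + wait_clocks + extra

read_range = (hub_read_clocks(1), hub_read_clocks(8))
write_range = (hub_write_clocks(0), hub_write_clocks(7))
rdfast_range = (hub_read_clocks(1, extra=1), hub_read_clocks(8, extra=1))
branch_range = tuple(COG_BRANCH + clocks for clocks in read_range)

print("read", read_range, "write", write_range)
print("RDFAST (D[31]=0)", rdfast_range, "hub-exec branch", branch_range)
```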
  • bump for post above.

    With this read and write info, hub RAM timings could be calculated precisely.
  • cgraceycgracey Posts: 14,208
    TonyB_ wrote: »
    bump for post above.

    With this read and write info, hub RAM timings could be calculated precisely.

    Yes, they could be. I need to come up with the exact formula and add it into the silicon doc.
  • evanhevanh Posts: 16,039
    edited 2020-12-13 23:51
    I've sort of resigned myself to those sorts of details not really mattering. I had a go at trying to get it exact with the instruction pipelining sequence, but even there we had confusion about depicting synchronous clocking in the diagrams.

    Felt I was beating a dead horse from then on.

    EDIT: Maybe the only clear way is to use a logic simulator package. I found Logisim easy to use - https://sourceforge.net/projects/circuit/

  • cgracey wrote: »
    TonyB_ wrote: »
    bump for post above.

    With this read and write info, hub RAM timings could be calculated precisely.

    Yes, they could be. I need to come up with the exact formula and add it into the silicon doc.

    Chip, could you please check the following post?
    http://forums.parallax.com/discussion/comment/1511907/#Comment_1511907
    If what I said is right, then that's all the info we need to calculate hub RAM timings.

    You've explained why there are 9 cycles at least for reads, but not the 3 cycles minimum for writes. I assumed that read and write commands take the same time to get from cog to hub, i.e. 3 cycles.
  • evanhevanh Posts: 16,039
    Two of those three is basic instruction execution time. So it's just a question of why the +1.

  • TonyB_TonyB_ Posts: 2,196
    edited 2020-12-16 14:09
    Hub RAM read & write timing (version 2, December 2020)

    A hub RAM read involves three separate stages: a read command is sent from
    cog to hub, the hub RAM is read and the read data are sent from hub to cog.
    There is a fixed pre-read time of 3 cycles, a variable read time due to the
    egg beater of 1-8 cycles and a fixed post-read time of 5 cycles. These are
    denoted by the letters a, r and b, respectively, in the tables below where
    each letter represents one cycle. The shortest read of 9 cycles is given by
    aaarbbbbb and the longest of 16 cycles by aaarrrrrrrrbbbbb.

    A hub RAM write also has three stages: write data are sent from cog to hub,
    a wait for the hub slot and an acknowledgement sent from hub to cog. There
    is a fixed pre-wait time of 2 cycles, a variable wait of 0-7 cycles and a
    fixed post-wait of 1 cycle, denoted by the letters c, w and d, respectively.
    The actual write takes place a few clocks later but does not delay the cog.
    The shortest write of 3 cycles is given by ccd and the longest of 10 cycles
    by ccwwwwwwwd.

    If a hub RAM read or write is followed by another and the phase or egg beater
    slice difference between the two is known, then in most cases it is possible
    to use the time that would be wasted on egg beater delays to run one or more
    instructions between the hub accesses.

    This is easy to see in the tables, in which these intermediate instructions
    are denoted by i1, i2, etc., * indicating that one can be 3 cycles long. The
    initial read or write, shown with worst-case timing but of unknown duration,
    synchronizes subsequent reads or writes to the egg beater. Cycle lengths are
    given from the second hub access onwards and assume no long crossing that
    would add an extra cycle.
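
The letter notation described above is easy to generate mechanically. The helper names below are hypothetical and the stage lengths are exactly the ones stated in this post (3 + 1..8 + 5 for reads, 2 + 0..7 + 1 for writes), so treat this as a restatement of the text rather than a hardware model.

```python
def read_pattern(read_cycles):
    """a = pre-read (3), r = egg beater read (1..8), b = post-read (5)."""
    if not 1 <= read_cycles <= 8:
        raise ValueError("read time must be 1..8 cycles")
    return "a" * 3 + "r" * read_cycles + "b" * 5

def write_pattern(wait_cycles):
    """c = pre-wait (2), w = hub slot wait (0..7), d = post-wait (1)."""
    if not 0 <= wait_cycles <= 7:
        raise ValueError("wait must be 0..7 cycles")
    return "c" * 2 + "w" * wait_cycles + "d"

for pattern in (read_pattern(1), read_pattern(8),
                write_pattern(0), write_pattern(7)):
    print(f"{pattern:<16} {len(pattern):>2} cycles")
```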


    The above text and the tables of hub RAM timings are in the attached file.


    Timings are also available in three separate forum posts:

    read - write - read - write
    http://forums.parallax.com/discussion/comment/1512306/#Comment_1512306

    read - read - read - read
    http://forums.parallax.com/discussion/comment/1512310/#Comment_1512310

    write - write - write - write
    http://forums.parallax.com/discussion/comment/1512311/#Comment_1512311

  • TonyB_TonyB_ Posts: 2,196
    edited 2020-12-15 18:30
    'hub RAM timings for
    'read - write - read - write
    '
    '
    'slice   012345670123456701234567012345670
    '                .       :       .       : cycles
    'rd0   aaarrrrrrrrbbbbb  :       .       :    ?
    'wr0             .     ccd       .       :    3
    'rd0             .       :aaarrrrrbbbbb  :   13
    'wr0             .       :       .     ccd    3
    '                .       :       .       :
    'ins   rd0_____________wr0i1i2rd0______wr0
    '
    '
    'slice   0123456701234567012345670123456701
    '                .       :        .       : cycles
    'rd0   aaarrrrrrrrbbbbb  :        .       :    ?
    'wr0             .     ccd        .       :    3
    'rd1             .       :aaarrrrrrbbbbb  :   14
    'wr1             .       :        .     ccd    3
    '                .       :        .       :
    'ins   rd0_____________wr0i1i2*rd1______wr1
    '
    '
    'slice   01234567012345670123456701234567012
    '                .        :       .        : cycles
    'rd0   aaarrrrrrrrbbbbb   :       .        :    ?
    'wr1             .     ccwd       .        :    4
    'rd1             .        :aaarrrrrbbbbb   :   13
    'wr2             .        :       .     ccwd    4
    '                .        :       .        :
    'ins   rd0_____________wr1_i1i2rd1______wr2
    '
    '
    'slice   012345670123456701234567012345670123
    '                .         :      .         : cycles
    'rd0   aaarrrrrrrrbbbbb    :      .         :    ?
    'wr2             .     ccwwd      .         :    5
    'rd1             .         :aaarrrrbbbbb    :   12
    'wr3             .         :      .     ccwwd    5
    '                .         :      .         :
    'ins   rd0_____________i1wr2i2*rd1______i1wr3
    '
    '
    'slice   0123456701234567012345670123456701234
    '                .          :     .          : cycles
    'rd0   aaarrrrrrrrbbbbb     :     .          :    ?
    'wr3             .     ccwwwd     .          :    6
    'rd1             .          :aaarrrbbbbb     :   11
    'wr4             .          :     .     ccwwwd    6
    '                .          :     .          :
    'ins   rd0_____________i1*wr3i2rd1______i1*wr4
    '
    '
    'slice   01234567012345670123456701234567012345
    '                .           :    .           : cycles
    'rd0   aaarrrrrrrrbbbbb      :    .           :    ?
    'wr4             .     ccwwwwd    .           :    7
    'rd1             .           :aaarrbbbbb      :   10
    'wr5             .           :    .     ccwwwwd    7
    '                .           :    .           :
    'ins   rd0_____________i1i2wr4rd1_______i1i2wr5
    '
    '
    'slice   012345670123456701234567012345670123456
    '                .            :   .            : cycles
    'rd0   aaarrrrrrrrbbbbb       :   .            :    ?
    'wr5             .     ccwwwwwd   .            :    8
    'rd1             .            :aaarbbbbb       :    9
    'wr6             .            :   .     ccwwwwwd    8
    '                .            :   .            :
    'ins   rd0_____________i1i2*wr5rd1______i1i2*wr6
    '
    '
    'slice   012345670123456701234567012345670123456701234567
    '                .             :          .             : cycles
    'rd0   aaarrrrrrrrbbbbb        :          .             :    ?
    'wr6             .     ccwwwwwwd          .             :    9
    'rd1             .             :aaarrrrrrrrbbbbb        :   16
    'wr7             .             :          .     ccwwwwwwd    9
    '                .             :          .             :
    'ins   rd0_____________i1i2i3wr6i4i5i6*rd1______i1i2i3wr7
    '
    '
    'slice   0123456701234567012345670123456701234567012345670
    '                .              :         .              : cycles
    'rd0   aaarrrrrrrrbbbbb         :         .              :    ?
    'wr7             .     ccwwwwwwwd         .              :   10
    'rd1             .              :aaarrrrrrrbbbbb         :   15
    'wr0             .              :         .     ccwwwwwwwd   10
    '                .              :         .              :
    'ins   rd0_____________i1i2i3*wr7i4i5i6rd1______i1i2i3*wr0
    
  • TonyB_TonyB_ Posts: 2,196
    edited 2020-12-15 01:16
    Timings above and below have not been verified by testing.
  • TonyB_TonyB_ Posts: 2,196
    edited 2020-12-15 18:31
    'hub RAM timings for
    'read - read - read - read
    '
    '
    'slice   01234567012345670123456701234567012345670123456701234567012345
    '                .               .               .               .      cycles
    'rd0   aaarrrrrrrrbbbbb          .               .               .         ?
    'rd0             .     aaarrrrrrrrbbbbb          .               .        16
    'rd0             .               .     aaarrrrrrrrbbbbb          .        16
    'rd0             .               .               .     aaarrrrrrrrbbbbb   16
    '                .               .               .               .
    'ins   rd0_____________i1i2i3*rd0______i1i2i3*rd0______i1i2i3*rd0______
    '
    '
    'slice   01234567012345670123456701234567012345670
    '                .        .        .        .      cycles
    'rd0   aaarrrrrrrrbbbbb   .        .        .         ?
    'rd1             .     aaarbbbbb   .        .         9
    'rd2             .        .     aaarbbbbb   .         9
    'rd3             .        .        .     aaarbbbbb    9
    '                .        .        .        .
    'ins   rd0_____________rd1______rd2______rd3______
    '
    '
    'slice   01234567012345670123456701234567012345670123
    '                .         .         .         .      cycles
    'rd0   aaarrrrrrrrbbbbb    .         .         .         ?
    'rd2             .     aaarrbbbbb    .         .        10
    'rd4             .         .     aaarrbbbbb    .        10
    'rd6             .         .         .     aaarrbbbbb   10
    '                .         .         .         .
    'ins   rd0_____________rd2_______rd4_______rd6_______
    '
    '
    'slice   01234567012345670123456701234567012345670123456
    '                .          .          .          .      cycles
    'rd0   aaarrrrrrrrbbbbb     .          .          .         ?
    'rd3             .     aaarrrbbbbb     .          .        11
    'rd6             .          .     aaarrrbbbbb     .        11
    'rd1             .          .          .     aaarrrbbbbb   11
    '                .          .          .          .
    'ins   rd0_____________i1rd3______i1rd6______i1rd1______
    '
    '
    'slice   01234567012345670123456701234567012345670123456701
    '                .           .           .           .      cycles
    'rd0   aaarrrrrrrrbbbbb      .           .           .         ?
    'rd4             .     aaarrrrbbbbb      .           .        12
    'rd0             .           .     aaarrrrbbbbb      .        12
    'rd4             .           .           .     aaarrrrbbbbb   12
    '                .           .           .           .
    'ins   rd0_____________i1*rd4______i1*rd0______i1*rd4______
    '
    '
    'slice   01234567012345670123456701234567012345670123456701234
    '                .            .            .            .      cycles
    'rd0   aaarrrrrrrrbbbbb       .            .            .         ?
    'rd5             .     aaarrrrrbbbbb       .            .        13
    'rd2             .            .     aaarrrrrbbbbb       .        13
    'rd7             .            .            .     aaarrrrrbbbbb   13
    '                .            .            .            .
    'ins   rd0_____________i1i2rd5______i1i2rd2______i1i2rd7______
    '
    '
    'slice   01234567012345670123456701234567012345670123456701234567
    '                .             .             .             .      cycles
    'rd0   aaarrrrrrrrbbbbb        .             .             .         ?
    'rd6             .     aaarrrrrrbbbbb        .             .        14
    'rd4             .             .     aaarrrrrrbbbbb        .        14
    'rd2             .             .             .     aaarrrrrrbbbbb   14
    '                .             .             .             .
    'ins   rd0_____________i1i2*rd6______i1i2*rd4______i1i2*rd2______
    '
    '
    'slice   01234567012345670123456701234567012345670123456701234567012
    '                .              .              .              .      cycles
    'rd0   aaarrrrrrrrbbbbb         .              .              .         ?
    'rd7             .     aaarrrrrrrbbbbb         .              .        15
    'rd6             .              .     aaarrrrrrrbbbbb         .        15
    'rd5             .              .              .     aaarrrrrrrbbbbb   15
    '                .              .              .              .
    'ins   rd0_____________i1i2i3rd7______i1i2i3rd6______i1i2i3rd5______
    
  • TonyB_TonyB_ Posts: 2,196
    edited 2020-12-15 18:33
    'hub RAM timings for
    'write - write - write - write
    '
    '
    'slice   012345670123456701234567012345670
    '                :       :       :       : cycles
    'wr0    ccwwwwwwwd       :       :       :    ?
    'wr0             :ccwwwwwd       :       :    8
    'wr0             :       :ccwwwwwd       :    8
    'wr0             :       :       :ccwwwwwd    8
    '                :       :       :       :
    'ins    wr0_______i1i2*wr0i1i2*wr0i1i2*wr0
    '
    '
    'slice   012345670123456701234567012345670123
    '                :        :        :        : cycles
    'wr0    ccwwwwwwwd        :        :        :    ?
    'wr1             :ccwwwwwwd        :        :    9
    'wr2             :        :ccwwwwwwd        :    9
    'wr3             :        :        :ccwwwwwwd    9
    '                :        :        :        :
    'ins    wr0_______i1i2i3wr1i1i2i3wr2i1i2i3wr3
    '
    '
    'slice   012345670123456701234567012345670123456
    '                :         :         :         : cycles
    'wr0    ccwwwwwwwd         :         :         :    ?
    'wr2             :ccwwwwwwwd         :         :   10
    'wr4             :         :ccwwwwwwwd         :   10
    'wr6             :         :         :ccwwwwwwwd   10
    '                :         :         :         :
    'ins    wr0_______i1i2i3*wr2i1i2i3*wr4i1i2i3*wr6
    '
    '
    'slice   012345670123456701
    '                :  :  :  : cycles
    'wr0    ccwwwwwwwd  :  :  :    ?
    'wr3             :ccd  :  :    3
    'wr6             :  :ccd  :    3
    'wr1             :  :  :ccd    3
    '                :  :  :  :
    'ins    wr0_______wr3wr6wr1
    '
    '
    'slice   012345670123456701234
    '                :   :   :   : cycles
    'wr0    ccwwwwwwwd   :   :   :    ?
    'wr4             :ccwd   :   :    4
    'wr0             :   :ccwd   :    4
    'wr4             :   :   :ccwd    4
    '
    'ins    wr0_______wr4_wr0_wr4_
    '
    '
    'slice   012345670123456701234567
    '                :    :    :    : cycles
    'wr0    ccwwwwwwwd    :    :    :    ?
    'wr5             :ccwwd    :    :    5
    'wr2             :    :ccwwd    :    5
    'wr7             :    :    :ccwwd    5
    '
    'ins    wr0_______i1wr5i1wr2i1wr7
    '
    '
    'slice   012345670123456701234567012
    '                :     :     :     : cycles
    'wr0    ccwwwwwwwd     :     :     :    ?
    'wr6             :ccwwwd     :     :    6
    'wr4             :     :ccwwwd     :    6
    'wr2             :     :     :ccwwwd    6
    '
    'ins    wr0_______i1*wr6i1*wr4i1*wr2
    '
    '
    'slice   012345670123456701234567012345
    '                :      :      :      : cycles
    'wr0    ccwwwwwwwd      :      :      :    ?
    'wr7             :ccwwwwd      :      :    7
    'wr6             :      :ccwwwwd      :    7
    'wr5             :      :      :ccwwwwd    7
    '
    'ins    wr0_______i1i2wr7i1i2wr6i1i2wr5
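
The "ins" rows in the tables follow a simple rule: whatever a hub access costs beyond its minimum (9 cycles for a read, 3 for a write) can be spent on 2-cycle instructions instead, and an odd cycle left over lets one of them be 3 cycles long (the * in the tables). A sketch of that rule, with a hypothetical function name and with the same caveat that none of it has been verified by testing:

```python
MIN_READ = 9    # shortest hub read, aaarbbbbb
MIN_WRITE = 3   # shortest hub write, ccd

def filler_slots(actual_cycles, minimum_cycles):
    """Return (instruction count, whether one instruction may take 3 cycles)."""
    spare = actual_cycles - minimum_cycles
    if spare < 0:
        raise ValueError("access cannot be shorter than its minimum")
    count = spare // 2
    one_may_be_three = count > 0 and spare % 2 == 1
    return count, one_may_be_three

print(filler_slots(16, MIN_READ))   # 16-cycle read, shown as i1i2i3* in the tables
print(filler_slots(8, MIN_WRITE))   # 8-cycle write, shown as i1i2* in the tables
```

A single spare cycle fits no instruction at all, which matches the lone underscore after 10-cycle reads and 4-cycle writes in the tables.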
    
  • roglohrogloh Posts: 5,837
    edited 2020-12-15 01:06
    So if the shortest RDLONG is taking 9 clock cycles, once the first hub address is synced up is the fastest (non setq based) back-to-back RDLONG sequence achieved by reading long addresses n, n+1, n+2, OR is it n, n-1, n-2 (modulo 8)? i.e. Which way does the egg-beater increment its addresses, forwards or backwards?

    Writes look like they would be fastest when writing to long addresses n, n+3, n+6, n+9, all modulo 8 (or perhaps n, n-3, n-6, etc. if the egg beater addresses cycle in the other direction).

    It could be useful in some cases to know this optimal access pattern if you need to optimize some data structure layouts etc based on the order in which they get accessed. Of course that's for basic COG-exec PASM2, any hub-exec code would obviously throw its own spanner in the works and mess things up further.
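
One way to explore access orderings is to turn TonyB_'s tables into a lookup. The sketch below assumes the slice numbers in those tables count upward in the direction the egg beater rotates, which is exactly the open question in this post, and it only covers accesses issued back to back with no long crossing. The base cycles and best slice steps are read off the unverified tables; the names are hypothetical.

```python
SLICES = 8  # egg beater slices on the current 8-cog P2

# (minimum cycles, slice step that achieves it) for previous -> next access,
# read off the tables earlier in the thread.
MODEL = {
    ("rd", "rd"): (9, 1),
    ("wr", "wr"): (3, 3),
    ("rd", "wr"): (3, 0),
    ("wr", "rd"): (9, 4),
}

def next_access_cycles(prev_kind, next_kind, prev_slice, next_slice):
    """Cycles for the second of two back-to-back hub accesses."""
    base, best_step = MODEL[(prev_kind, next_kind)]
    step = (next_slice - prev_slice) % SLICES
    return base + (step - best_step) % SLICES

# Reads stepping by +1 slice and writes stepping by +3 slices hit the minimum.
print([next_access_cycles("rd", "rd", n, n + 1) for n in range(3)])
print([next_access_cycles("wr", "wr", n, n + 3) for n in (0, 3, 6)])
# Repeating the same slice is the worst case for reads.
print(next_access_cycles("rd", "rd", 0, 0))
```

If the egg beater turns out to rotate the other way, the same model should apply with the sign of the slice step flipped.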
  • I can't attach a .txt or .zip file, tried on two computers, so had to post tables above separately.