HUB RAM interface question - Page 5 — Parallax Forums

Comments

  • evanhevanh Posts: 16,040

    Yup. I believe this is something Chip now needs to add to his list of improvements if there are any silicon respins.

  • TonyB_TonyB_ Posts: 2,196
    edited 2022-02-20 15:35

    deleted

  • evanhevanh Posts: 16,040

    I got that fitting after the bug fix:

     BYTE    3   55555556    78640    78648    98304    78640    98296    98304    78640    98296    78640    78648    0
    SHORT    3   55555556   589776    98320   589792    98304   589816    98312   589808    98312   589776    98320    0
     LONG    3   55555556   196624   196640   196616   196632   196632   196624   196624   196624   196624   196640    0
    
    C (number of cogs) = 8
    D (effective divisor) = 6 (BYTE), 3 (SHORT), 1.5 (LONG)
    Tu (Unimpeded Ticks) = 65536
    Tm (Measured Ticks) = 78640, 98304, 196624, 589792
    
    Te (Extra ticks) = Tm - Tu  =>  78640 - 65536 = 13104,  98304 - 65536 = 32768,  196624 - 65536 = 131088
    S (Stalls) = Te / C         =>  13104 / 8 = 1638     ,  32768 / 8 = 4096     ,  131088 / 8 = 16386
    
    Lb (FIFO burst length in longwords) = (C / D) * Tm / Te  =>  Lb = (C / D) * Tm / (Tm - Tu)  =>  Lb = C / (D * (1 - Tu/Tm))
    Lbb = 8 / (6 * (1 - 65536 / 78640)) = 8
    Lbs = 8 / (3 * (1 - 65536 / 98304)) = 8
    Lbl = 8 / (1.5 * (1 - 65536 / 196624)) = 8
    
    Lbb = 8 / (6 * (1 - 65536 / 98304)) = 4
    Lbs = 8 / (3 * (1 - 65536 / 589792)) = 3
    Lbl = 8 / (1.5 * (1 - 65536 / 589824)) = 6
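The arithmetic above can be checked with a short script. This is only a sketch of the formulas as written in the post; the function and parameter names are made up for illustration, and the default of 65536 unimpeded ticks is the Tu value quoted above.

```python
def stalls(cogs, measured_ticks, unimpeded_ticks=65536):
    """S = Te / C, where Te = Tm - Tu (extra ticks beyond the unimpeded time)."""
    return (measured_ticks - unimpeded_ticks) / cogs


def burst_length(cogs, divisor, measured_ticks, unimpeded_ticks=65536):
    """Lb = C / (D * (1 - Tu/Tm)), the FIFO burst length in longwords."""
    return cogs / (divisor * (1 - unimpeded_ticks / measured_ticks))


if __name__ == "__main__":
    # (label, effective divisor D, measured ticks Tm) taken from the report above
    cases = [
        ("BYTE", 6, 78640),
        ("SHORT", 3, 98304),
        ("LONG", 1.5, 196624),
        ("LONG, slow run", 1.5, 589824),
    ]
    for label, divisor, tm in cases:
        print(f"{label:15s} S = {stalls(8, tm):8.0f}  Lb = {burst_length(8, divisor, tm):.3f}")
```

The results are not exact integers for every row (the BYTE and LONG rows land within a fraction of a percent of 8), which is why the post rounds them.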
    
  • TonyB_TonyB_ Posts: 2,196
    edited 2022-02-20 15:35

    deleted

  • evanhevanh Posts: 16,040
    edited 2022-02-20 13:59

    It gives right answers. I think that's because every multiple of eight is counted as a stall no matter how many contiguous rotations in a burst.

  • TonyB_TonyB_ Posts: 2,196
    edited 2022-02-20 15:34

    deleted

  • evanhevanh Posts: 16,040
    edited 2022-02-20 14:51

    Mine is 8 right there. Well, 8 for FIFO acting normal and 6 for FIFO misbehaving.
    And we know it isn't always behaving well.

  • TonyB_TonyB_ Posts: 2,196
    edited 2022-02-20 15:00

    deleted

  • evanhevanh Posts: 16,040

    @TonyB_ said:
    Streamer freq = sysclk/1.5 !!!

    Correct. It can handle it.

  • @evanh said:

    @TonyB_ said:
    Streamer freq = sysclk/1.5 !!!

    Correct. It can handle it.

    I need a rethink.

  • TonyB_TonyB_ Posts: 2,196
    edited 2022-02-20 15:12

    Evan, your table here is confusing me:
    https://forums.parallax.com/discussion/comment/1535803/#Comment_1535803
    If numbers in second column are D as in sysclk/D, then they should be halved. Do you agree?

  • evanhevanh Posts: 16,040
    edited 2022-02-20 15:26

    You can derive the effective divisor from it easily enough but it's not directly the divisor. It's an index that is used for calculating the NCO value that goes to SETXFRQ. It doesn't change for the three lines because neither does the XFRQ change.

    EDIT: I should point out this is all about FIFO/hubRAM bandwidth, not streamer rate. So the "effective divisor" is with respect to the cog's hubRAM bandwidth.
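A rough sketch of how the effective divisors in the reports relate to the XFRQ column. It assumes that an NCO value of $8000_0000 corresponds to one streamer sample per sysclk tick, which is consistent with $5555_5556 giving sysclk/1.5 as discussed in this thread, and that each hubRAM fetch is one 4-byte longword. The names below are illustrative only.

```python
NCO_UNITY = 0x8000_0000  # assumption: XFRQ value for one streamer sample per sysclk tick
BYTES_PER_LONG = 4


def streamer_divisor(xfrq):
    """Streamer sample rate expressed as sysclk / divisor."""
    return NCO_UNITY / xfrq


def hub_divisor(xfrq, sample_bytes):
    """Sysclk ticks per longword drawn from hubRAM (the 'effective divisor' D)."""
    return streamer_divisor(xfrq) * BYTES_PER_LONG / sample_bytes


if __name__ == "__main__":
    for xfrq in (0x5555_5556, 0x0E38_E38E):
        for name, size in (("BYTE", 1), ("SHORT", 2), ("LONG", 4)):
            print(f"XFRQ ${xfrq:08x} {name:5s} D = {hub_divisor(xfrq, size):.3f}")
```

With those assumptions, $5555_5556 gives 6, 3 and 1.5 for BYTE, SHORT and LONG, and $0e38_e38e gives 36, 18 and 9, matching the D values listed in the reports.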

  • @evanh said:
    LONG 3 55555556 196624 196640 196616 196632 196632 196624 196624 196624 196624 196640 0

    Is $5555_5556 the actual D value you use for SETXFRQ D for long streaming?

  • evanhevanh Posts: 16,040
    edited 2022-02-20 15:27

    Yes. Each test line in the report has its own.

  • @evanh said:
    Yes. Each test line in the report has its own.

    Thank you. Concurrent fast move and FIFO performance is amazing, except for the 'glitches'.

  • evanhevanh Posts: 16,040
    edited 2022-02-20 15:44

    It's cool alright. The advantages of buffering and bursting. FIFO is doing two jobs in effect. Shows off the enormous bandwidth of hubRAM too. Not too surprising it can impact power consumption.

  • TonyB_TonyB_ Posts: 2,196
    edited 2022-08-29 22:15

    @evanh said:
    It's cool alright. The advantages of buffering and bursting. FIFO is doing two jobs in effect. Shows off the enormous bandwidth of hubRAM too. Not too surprising it can impact power consumption.

    Apologies if I said you were wrong when you weren't. I think that examining sysclk/1.5 for longs could tell us quite a lot about how FIFO works.

  • evanhevanh Posts: 16,040

    No worries, I was staying open to a proof. Things changed a lot at my end along the way. I had been fixing lots of other parts before that final bug with the duplicated streamer mode. So, up until then I was still struggling to fit it all together.

  • The excessive timings could be explained mathematically by either increasing revs per stall or decreasing burst length. The latter seems more likely in reality.

  • evanhevanh Posts: 16,040
    edited 2022-02-20 16:11

    Here's a case where SHORT won't drop to 1 longword per burst. The divider means only LONG triggers it.

     BYTE   18   0e38e38e    68064    68056    68056    68056    68064    68056    68064    68056    68064    68056    0
    SHORT   18   0e38e38e    70784    70776    70784    70776    70784    70776    70776    70776    70784    70776    0
     LONG   18   0e38e38e    76928    76936    76936   589680   589752   589824   589824   589536    76928    76936    0
    
    C (number of cogs) = 8
    D (effective divisor) = 36 (BYTE), 18 (SHORT), 9 (LONG)
    Tu (Unimpeded Ticks) = 65536
    Tm (Measured Ticks) = 68056, 70776, 76936, 589824
    
    Te (Extra ticks) = Tm - Tu  =>  68056 - 65536 = 2520,  70776 - 65536 = 5240,  76936 - 65536 = 11400
    S (Stalls) = Te / C         =>  2520 / 8 = 315      ,  5240 / 8 = 655      ,  11400 / 8 = 1425
    
    Lb (FIFO burst length in longwords) = (C / D) * Tm / Te  =>  Lb = (C / D) * Tm / (Tm - Tu)  =>  Lb = C / (D * (1 - Tu/Tm))
    Lbb = 8 / (36 * (1 - 65536 / 68056)) = 6
    Lbs = 8 / (18 * (1 - 65536 / 70776)) = 6
    Lbl = 8 / (9 * (1 - 65536 / 76936)) = 6
    
    Lbl = 8 / (9 * (1 - 65536 / 589824)) = 1
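The same equation can be turned around to predict the measured ticks for an assumed burst length: solving Lb = C / (D * (1 - Tu/Tm)) for Tm gives Tm = Tu / (1 - C / (D * Lb)). The sketch below only rearranges the formula from the post; the function name and the error check are illustrative additions.

```python
def predicted_ticks(cogs, divisor, burst_longs, unimpeded_ticks=65536):
    """Tm = Tu / (1 - C / (D * Lb)), the inverse of the burst-length equation."""
    stall_fraction = cogs / (divisor * burst_longs)
    if stall_fraction >= 1:
        # The equation has no finite solution: the stalls would consume all the time.
        raise ValueError("C / (D * Lb) must be below 1")
    return unimpeded_ticks / (1 - stall_fraction)


if __name__ == "__main__":
    # LONG at D = 9: compare against the measured 76936 (fast) and 589824 (slow)
    for burst in (6, 1):
        print(f"Lb = {burst}: Tm = {predicted_ticks(8, 9, burst):.1f}")
```

For D = 9 this comes out within a few ticks of 76936 for a burst of 6 and at 589824 for a burst of 1, in line with the two groups of LONG timings in the report.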
    
  • TonyB_TonyB_ Posts: 2,196
    edited 2022-02-20 16:47

    Good work Evan, showing that burst length can be as low as one during bus hogging.

    I think this is the way sysclk/1.5 for longs works:

    < Rev0 >< Rev1 >< Rev2 >    egg beater Revolutions
    012345670123456701234567    hub RAM slices
    WWWWWWWW................    fast block Write
    ........IIIIIIIIIIIIIIII    FIFO In
    OO.OO.OO.OO.OO.OO.OO.OO.    FIFO Out
    
    122344565554443332221110    FIFO unfilled longs
    

    Yikes, unwanted bolding makes it hard to read.
    FIFO has maximum of 6 unfilled longs.

  • evanhevanh Posts: 16,040

    @TonyB_ said:
    I think this is the way sysclk/1.5 for longs works:

    Yeah, looks good. And that's the non-hogging case. Okay, the hogging case must look atrocious.

  • TonyB_TonyB_ Posts: 2,196
    edited 2022-02-20 16:47

    @evanh said:

    @TonyB_ said:
    I think this is the way sysclk/1.5 for longs works:

    Yeah, looks good. And that's the non-hogging case. Okay, the hogging case must look atrocious.

    Is there any hogging in this instance?

  • @TonyB_ said:
    Good work Evan, showing that burst length can be as low as one during bus hogging.

    I think this is the way sysclk/1.5 for longs works:

    < Rev0 >< Rev1 >< Rev2 >  egg beater Revolutions
    012345670123456701234567  hub RAM slices
    WWWWWWWW................  fast block Write
    ........IIIIIIIIIIIIIIII  FIFO In
    OO.OO.OO.OO.OO.OO.OO.OO.  FIFO Out
    
    122344565554443332221110  FIFO unfilled longs
    

    Yikes, unwanted bolding makes it hard to read.
    FIFO has maximum of 6 unfilled longs.

    LOL. I started on an almost identical diagram yesterday before abandoning it. This is what we need to have for the good vs bad case. I wanted to see the gaps where neither "W" nor "I" would be present in the output, showing wasted HUB bandwidth. I wonder if a simple simulation stepping through the different addresses and modeling the FIFO depth triggering the "I" cycles while tracking the slice # could generate these.
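A very small toy model along those lines might look like the sketch below. It is not a description of the real hardware: it ignores slice addressing and any refill threshold, and simply assumes that the fast block write owns the hub for its first `write_ticks` ticks and that the FIFO loads one long on every other tick while it has an unfilled slot. The streamer consumption is given as a repeating pattern such as "OO." for sysclk/1.5 longs. All names are hypothetical.

```python
def simulate(ticks, write_ticks, out_pattern):
    """Return the W / I / O / unfilled rows of a timing diagram as strings."""
    unfilled = 0
    write_row, in_row, out_row, unfilled_row = [], [], [], []
    for t in range(ticks):
        writing = t < write_ticks                      # assumed: block write has priority
        streamer_out = out_pattern[t % len(out_pattern)] == "O"
        if streamer_out:
            unfilled += 1                              # streamer takes a long from the FIFO
        fifo_in = (not writing) and unfilled > 0       # assumed: refill whenever possible
        if fifo_in:
            unfilled -= 1
        write_row.append("W" if writing else ".")
        in_row.append("I" if fifo_in else ".")
        out_row.append("O" if streamer_out else ".")
        unfilled_row.append(str(unfilled))             # single digits for the cases shown
    return {
        "write": "".join(write_row),
        "fifo_in": "".join(in_row),
        "fifo_out": "".join(out_row),
        "unfilled": "".join(unfilled_row),
    }


if __name__ == "__main__":
    rows = simulate(24, 8, "OO.")
    print("".join(str(t % 8) for t in range(24)), "   hub RAM slices")
    print(rows["write"], "   fast block Write")
    print(rows["fifo_in"], "   FIFO In")
    print(rows["fifo_out"], "   FIFO Out")
    print(rows["unfilled"], "   FIFO unfilled longs")
```

With `simulate(24, 8, "OO.")` the rows match the quoted sysclk/1.5 diagram. The gaps being asked about would be ticks where both the write and FIFO In rows show "."; reproducing the slow, hogging case would need a more detailed model of slice timing than this one has.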

  • TonyB_TonyB_ Posts: 2,196
    edited 2022-02-20 22:55

    Here is my first attempt at sysclk/9 for longs, fast and slow. In reality the fast one must use FIFO burst = 6 but I've shown it using burst = 8 because it was easier. The slow timing matches the equation with burst = 1 and that is what I've shown. The bottom line maybe should have 1's and 0's instead of 8's and 7's to indicate that the FIFO is being refilled when there is only one long free, for some reason. Comments welcome.

    < Rev0 >< Rev1 >< Rev2 >< Rev3 >< Rev4 >< Rev5 >< Rev6 >< Rev7 >< Rev8 >    egg beater Revolutions
    012345670123456701234567012345670123456701234567012345670123456701234567    hub RAM slices
    WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW........    fast block Write
    ................................................................IIIIIIII    FIFO In
    O........O........O........O........O........O........O........O........    FIFO Out
    
    111111111222222222333333333444444444555555555666666666777777777876543210    FIFO unfilled longs
    
    
    < Rev0 >< Rev1 >< Rev2 >< Rev3 >< Rev4 >< Rev5 >< Rev6 >< Rev7 >< Rev8 >    egg beater Revolutions
    012345670123456701234567012345670123456701234567012345670123456701234567    hub RAM slices
    ........................................................WWWWWWWW........    fast block Write
    .I........I........I........I........I........I........I........I.......    FIFO In
    O........O........O........O........O........O........O........O........    FIFO Out
    
    877777777877777777877777777877777777877777777877777777877777777877777777    FIFO unfilled longs
    
    
  • evanhevanh Posts: 16,040

    @TonyB_ said:
    Is there any hogging in this instance?

    Sure is, Lbl = 8 / (1.5 * (1 - 65536 / 589824)) = 6

    https://forums.parallax.com/discussion/comment/1535824/#Comment_1535824

  • evanhevanh Posts: 16,040
    edited 2022-02-20 22:48

    @TonyB_ said:

    < Rev0 >< Rev1 >< Rev2 >< Rev3 >< Rev4 >< Rev5 >< Rev6 >< Rev7 >< Rev8 >  egg beater Revolutions
    012345670123456701234567012345670123456701234567012345670123456701234567  hub RAM slices
    ........................................................WWWWWWWW........  fast block Write
    .I........I........I........I........I........I........I........I.......  FIFO In
    O........O........O........O........O........O........O........O........  FIFO Out
    
    877777777877777777877777777877777777877777777877777777877777777877777777  FIFO unfilled longs
    < Rev0 >< Rev1 >< Rev2 >< Rev3 >< Rev4 >< Rev5 >< Rev6 >< Rev7 >< Rev8 >  egg beater Revolutions
    012345670123456701234567012345670123456701234567012345670123456701234567  hub RAM slices
    

    Ah, cool, hubRAM fetches are spaced at the same 9-tick interval as the streamer cycles. EDIT: Okay, not a cool outcome, it knocks the cog back every time.

  • @evanh said:

    @TonyB_ said:
    Is there any hogging in this instance?

    Sure is, Lbl = 8 / (1.5 * (1 - 65536 / 589824)) = 6

    https://forums.parallax.com/discussion/comment/1535824/#Comment_1535824

    Evan, I think that is sysclk/3 for longs and burst = 3. The line with 1966xx is consistent and applies to sysclk/1.5 for longs.

  • evanhevanh Posts: 16,040
    edited 2022-02-20 22:54

    Actually, what it will be is:

    < Rev0 >< Rev1 >< Rev2 >< Rev3 >< Rev4 >< Rev5 >< Rev6 >< Rev7 >< Rev8 >    egg beater Revolutions
    012345670123456701234567012345670123456701234567012345670123456701234567    hub RAM slices
    W........W........W........W........W........W........W........W........    fast block Write
    .I........I........I........I........I........I........I........I.......    FIFO In
    O........O........O........O........O........O........O........O........    FIFO Out
    
    877777777877777777877777777877777777877777777877777777877777777877777777    FIFO unfilled longs
    012345670123456701234567012345670123456701234567012345670123456701234567    hub RAM slices
    < Rev0 >< Rev1 >< Rev2 >< Rev3 >< Rev4 >< Rev5 >< Rev6 >< Rev7 >< Rev8 >    egg beater Revolutions
    

    Visualising really helps!

  • evanhevanh Posts: 16,040

    @TonyB_ said:
    Evan, I think that is sysclk/3 for longs and burst = 3. The line with 1966xx is consistent and applies to sysclk/1.5 for longs.

    Oh, I see that data, 589824, is actually missing in newest runs. It has existed in previous runs of same config. I'm not yet sure how best to trigger worst case conditions.
