Exmem_mini.spin2 testing - Page 3 — Parallax Forums

  • rogloh Posts: 6,061

    @Wuerfel_21 said:
    I think that one just reads one long (4 bytes), but that's a question for @rogloh

    Yeah READ_BURST reads an arbitrary number of bytes, READ_LONG reads in a single long.

    A burst read will be broken up into fragments based on a couple of settings: the settable per-COG burst limit, which is there for fairness and lets a higher priority video COG get a go if its request is pending, and the device's own burst limit. E.g. HyperRAM transfers need to stop after 4us or so for the internal refresh to operate correctly (I could be wrong on the exact number). It also auto-fragments on a page boundary, but that is more to deal with internal device limitations than fairness to other COGs.
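
    The splitting rules described above can be sketched roughly as follows. This is an illustration only, not the driver's actual code: PAGE_SIZE and nextFragment are made-up names, the page size value is a placeholder, and the burst limit is assumed to be expressed in bytes and to already be the smaller of the per-COG and device limits.

    ```spin2
    CON
      PAGE_SIZE = 1024          ' ASSUMED page size in bytes (power of two), illustration only

    PUB nextFragment(addr, remaining, burst) : n
      '' Size in bytes of the next fragment of a request (hypothetical helper).
      '' addr      = current external memory address
      '' remaining = bytes still to be transferred
      '' burst     = burst limit, assumed here to be in bytes
      n := PAGE_SIZE - (addr & (PAGE_SIZE - 1))   ' bytes left before the next page boundary
      n := n <# burst <# remaining                ' clamp to the burst limit and to what is left
    ```

    Under these assumptions, a 640-byte request with a burst limit of 320 that starts 100 bytes before a page boundary would come out as fragments of 100, 320 and 220 bytes.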

  • evanh Posts: 16,778
    edited 2025-10-06 02:10

    @evanh said:
    Oh, huh, the Spin2 libc support doesn't have ioctl(). I'll ask Eric about it ...

    Spin version of my file speed tester using ioctl() now works with latest flexspin. Eric added it - https://forums.parallax.com/discussion/comment/1569473/#Comment_1569473

    ...
     BLOCK_READ_CRC 0 0  SETDIV  SD clock-divider set to sysclock/2 (100.0 MHz)
    ...
     Buffer = 2 kB,  Written 512 kB at 15515 kB/s,  Verified,  Read 512 kB at 32000 kB/s
     Buffer = 2 kB,  Written 512 kB at 17066 kB/s,  Verified,  Read 512 kB at 32000 kB/s
     Buffer = 2 kB,  Written 512 kB at 13473 kB/s,  Verified,  Read 512 kB at 32000 kB/s
    
     Buffer = 8 kB,  Written 2048 kB at 19883 kB/s,  Verified,  Read 2048 kB at 39384 kB/s
     Buffer = 8 kB,  Written 2048 kB at 19504 kB/s,  Verified,  Read 2048 kB at 39384 kB/s
     Buffer = 8 kB,  Written 2048 kB at 20897 kB/s,  Verified,  Read 2048 kB at 39384 kB/s
    
     Buffer = 32 kB,  Written 4096 kB at 23676 kB/s,  Verified,  Read 4096 kB at 41373 kB/s
     Buffer = 32 kB,  Written 4096 kB at 22755 kB/s,  Verified,  Read 4096 kB at 41373 kB/s
     Buffer = 32 kB,  Written 4096 kB at 22882 kB/s,  Verified,  Read 4096 kB at 41795 kB/s
    
  • evanh Posts: 16,778

    For comparison, here are three dividers with read CRC enabled:

     SD clock-divider set to sysclock/4 (50.0 MHz)
     Buffer = 8 kB,  Written 2048 kB at 18285 kB/s,  Verified,  Read 2048 kB at 20277 kB/s
    
     SETDIV  SD clock-divider set to sysclock/3 (66.6 MHz)
     Buffer = 8 kB,  Written 2048 kB at 23011 kB/s,  Verified,  Read 2048 kB at 25600 kB/s
    
     SETDIV  SD clock-divider set to sysclock/2 (100.0 MHz)
     Buffer = 8 kB,  Written 2048 kB at 22755 kB/s,  Verified,  Read 2048 kB at 25924 kB/s
    

    Now same three dividers but with read CRC disabled:

     SD clock-divider set to sysclock/4 (50.0 MHz)
     BLOCK_READ_CRC 0 0
     Buffer = 8 kB,  Written 2048 kB at 18123 kB/s,  Verified,  Read 2048 kB at 21787 kB/s
    
     BLOCK_READ_CRC 0 0  SETDIV  SD clock-divider set to sysclock/3 (66.6 MHz)
     Buffer = 8 kB,  Written 2048 kB at 22260 kB/s,  Verified,  Read 2048 kB at 28444 kB/s
    
     BLOCK_READ_CRC 0 0  SETDIV  SD clock-divider set to sysclock/2 (10
     Buffer = 8 kB,  Written 2048 kB at 22021 kB/s,  Verified,  Read 2048 kB at 40156 kB/s
    

    Of note: the final line should match my previous posting but comes in a bit faster, particularly on the writes. I guess that's related to repeated operations. The SD card changes power setting or something.

  • evanh Posts: 16,778

    A comment on the sysclock/4 comparison between CRC on and off for block reading: there is some extra overhead with read CRC enabled. This is not the time required to process the CRC, but rather is due to the fast copy of the data block into cogRAM, for subsequent CRC processing, having to be conducted serially against streamer ops. Namely, the FIFO must be stopped to allow the fast copy to happen smoothly.

    It was discovered during development that FIFO writes, in particular, clash badly against direct hubRAM accesses. The FIFO forces a lot of cog stalls!

  • Rayman Posts: 15,721

    @rogloh What exactly does this "maxburst" argument do if it is not -1 ?

      'Give video cog priority access to PSRAM
      exmem.setQoS(cog,-1,15,false,false)   '(cogn, maxburst, priority, locked, attention)
    

    Assuming -1 turns it off. Does this truncate requests? Or, maybe just yield periodically to other cogs at specified burst length and then come back and finish?

  • There's a global maxburst limit that is used instead, and that the per-cog limit is clamped to. This is set somewhere in the startup code.

  • Rayman Posts: 15,721

    What if this is set to 320 and then asked for a burst of 640?

  • gets split

  • Rayman Posts: 15,721

    And it probably pays attention to other cogs after each split?

  • rogloh Posts: 6,061

    Yes the burst size is set to the minimum of the device and per COG setting. Nothing will be lost from the request due to a burst being fragmented, the request will stay pending, but the code will yield to polling again at the split point, unless the COG's QoS flags indicate LOCKED. Also you can set LOCKED for a high priority COG so it would fragment at the burst size but not yield back to the poller, and the next fragment for the COG's memory request will then continue back to back - to reduce service latency. Note: it would not make sense to set LOCKED for a lower priority COG but I think you can still do so if you wanted to for any reason.
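
    As a concrete illustration of the two behaviours, here is how the setQoS call quoted earlier in the thread might look with the hypothetical limit of 320 from the question. This is a fragment, not a complete program: exmem and cog are assumed to be the object and cog ID from that earlier call, the argument order follows its comment (cogn, maxburst, priority, locked, attention), and the units of maxburst are not stated in the thread.

    ```spin2
      ' ASSUMPTION: exmem and cog come from the program quoted earlier; 320 is the
      ' hypothetical limit from the question, and its units are not confirmed here.

      ' Not locked: a larger request fragments at the limit and the driver
      ' yields to polling other cogs between fragments.
      exmem.setQoS(cog, 320, 15, false, false)

      ' LOCKED: still fragments at the limit, but the next fragment follows
      ' back to back without yielding to the poller.
      exmem.setQoS(cog, 320, 15, true, false)
    ```

    Only one of the two calls would be used in practice. In either case the effective limit is the smaller of this value and the device's own burst limit, so a request of 640 would be served in at least two fragments.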

  • Rayman Posts: 15,721

    Seem to get corrupted data if I don't do a waitx after telling it to copy a line of video...

    Shouldn't the wait loop after the waitx be enough?
    Seems it isn't...

                    wrlong  y,ptra
                    add     exoff,##2560*2
                    waitx   ##8000
    
    exwait
                    rdlong  x,ptra
                    cmp     x,#0  wz
            if_nz   jmp     #exwait
    
                    djnz    mx0,#Backloop
    
  • Rayman Posts: 15,721
    edited 2025-10-09 21:54

    Also, in a video driver need to break up a copy loop into two segments or it doesn't work right...
    Not exactly sure why...

    Guess I'll play with that maxburst setting...
    Maybe set to -1 isn't a good idea here...

             'Copy offscreen buffer to PSRAM for next frame
            wrlong  ##320*240,ptra[2] '#bytes to write or read
            wrlong  pOffBuf,ptra[1]
            wrlong  exwrite,ptra
    '}
    
                    callpa  #9,#blank              'bottom blanks
    
                    drvnot  vsync         'vsync on
    
                    callpa  #2,#blank               'vertical sync blanks
    
                    drvnot  vsync         'vsync off
    
    
             wrlong  ##320*240,ptra[2] '#bytes to write or read
             wrlong  pOffBuf2,ptra[1]
             wrlong  exwrite2,ptra
    
    
                    jmp     #field2                  'loop
    
  • evanh Posts: 16,778

    RDLONG can set Z flag on a zero value itself.

    exwait
                    rdlong  x,ptra   wz
            if_nz   jmp     #exwait
    
  • evanh Posts: 16,778

    Regarding needing the WAITX, it might pay to ensure the mailbox value is at zero before setting it to non-zero. If it hasn't finished the prior operation then you'll be messing up.
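
    A minimal sketch of that check, reusing the register names from the snippet above (x, y, ptra). It assumes, as the existing wait loop implies, that the driver writes zero back to the mailbox once a request has completed. The label exidle is made up for illustration.

    ```spin2
    exidle
                    rdlong  x,ptra  wz      'read mailbox; Z is set when it reads zero
            if_nz   jmp     #exidle         'non-zero: the previous request is still pending

                    wrlong  y,ptra          'mailbox is idle, so post the next request
    ```

    Polling before the write means a new request cannot overwrite one that is still in progress, which a fixed waitx delay can only approximate.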

  • Rayman Posts: 15,721

    Hmm… that might be it…

  • rogloh Posts: 6,061
    edited 2025-10-10 00:47

    @Rayman said:
    Also, in a video driver need to break up a copy loop into two segments or it doesn't work right...
    Not exactly sure why...

    Guess I'll play with that maxburst setting...
    Maybe set to -1 isn't a good idea here...

             'Copy offscreen buffer to PSRAM for next frame
            wrlong  ##320*240,ptra[2] '#bytes to write or read
            wrlong  pOffBuf,ptra[1]
            wrlong  exwrite,ptra
    '}
    
                    callpa  #9,#blank              'bottom blanks
    
                    drvnot  vsync         'vsync on
    
                    callpa  #2,#blank               'vertical sync blanks
    
                    drvnot  vsync         'vsync off
    
    
             wrlong  ##320*240,ptra[2] '#bytes to write or read
             wrlong  pOffBuf2,ptra[1]
             wrlong  exwrite2,ptra
    
    
                    jmp     #field2                  'loop
    

    Have you set up the QoS service classes to give your video cog priority over the writer cog? If the video cog is also the writer cog then you are on your own with respect to service priority, and you'll have to split your workload into smaller chunks so that the video reader is not interrupted at critical times, causing dropouts on the video line. Remember there is only a single mailbox per cog, so only one operation is active at a time from a single cog.

    Update: if you want reliable video you really should do a write operation in a different cog that can be slowed down by the video reader cog getting priority.

  • Rayman Posts: 15,721

    @Rayman said:
    Hmm… that might be it…

    Actually, that can't be because it works if the transfer amount is halved...

    @rogloh Well, I'm not getting it...
    The video cog is retrieving scanlines from PSRAM.
    Then, at the end, is copying working buffer in hub to display buffer in PSRAM.

    This is set at the end of visible lines...
    Are you saying the last step should be done by some other cog?

  • Rayman Posts: 15,721

    Breaking up the copy of working buffer in HUB to display buffer in PSRAM into two operations seems to work, so will attempt to stick to that.
    Don't know why it needs to be split though...

  • rogloh Posts: 6,061
    edited 2025-10-10 03:10

    Well as long as each operation completes before the next one needs to be ready it would work out okay. You just need to make sure that all the workload requested can complete in time before the video read data is required to be ready.

    You need to be mindful of the bandwidth needed for your different memory operations, i.e. what video resolution and depth, and how many scan lines' worth of time you have. You also need to be confident that the workload requested will complete in time before any video data is required to be valid. This depends on latency as well as transfer duration. Breaking up into fragments will also always extend the time slightly, due to setup/polling overheads etc.

    Do you believe there is time available in the blanking period to write an entire screen's worth of pixel data to HUB?
    EDIT: changed read to write above.
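
    One rough way to answer that question is to work out the sustained rate the first request would need. The sketch below takes the request size and the 9 + 2 blank lines between the two requests from the code quoted above, and assumes each `blank` call count is one scan line. The line period is an ASSUMPTION (standard 640x480 at 60 Hz VGA timing) since the actual video mode is not stated in the thread, and all constant names are made up for illustration.

    ```spin2
    CON
      COPY_BYTES = 320 * 240        ' bytes per request, as in the quoted code
      GAP_LINES  = 9 + 2            ' blank + vsync lines between the two requests
      LINE_NS    = 31_778           ' ASSUMED scan line period in ns; substitute your mode's value
      WINDOW_US  = GAP_LINES * LINE_NS / 1000   ' time available for the first request, in us
      NEED_MBPS  = COPY_BYTES / WINDOW_US       ' bytes per us, i.e. roughly MB/s required
    ```

    With these assumed numbers the first request would need on the order of 220 MB/s sustained, before allowing for request latency and per-fragment overheads. Since there is only one mailbox per cog, the first request has to be finished before the second one is posted.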

    UPDATE: If you have access to your signals and a scope you might be able to probe a chip select line of the PSRAM as well as VSYNC & HSYNC to see what is happening with your read/write accesses and whether they properly complete in time. That's how I made sure my drivers interworked together in the early days of debugging things when working with video and not knowing what was working and what wasn't etc. That can help find the limits of what is achievable too when you really push it.
