Exmem_mini.spin2 testing - Page 3 — Parallax Forums

Comments

  • rogloh Posts: 6,073

    @Wuerfel_21 said:
    I think that one just reads one long (4 bytes), but that's a question for @rogloh

    Yeah READ_BURST reads an arbitrary number of bytes, READ_LONG reads in a single long.

    A burst read will be broken up into fragments based on a couple of settings: the settable per-COG burst limit, which is there for fairness and lets a higher priority video COG get a go if its request is pending, and the device's own burst limit. E.g. HyperRAM transfers need to stop after 4us or so for the internal refresh to operate correctly (I could be wrong on the exact number). It also auto-fragments on a page boundary, but that is more to deal with internal device limitations than fairness to other COGs.
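
As a rough illustration of the fragmenting described above, here is a minimal Python model (not the driver's PASM2 code). The function name, the byte-based device limit and the 1024-byte page size in the example are assumptions made for the sketch; the real limits and page size depend on the device and driver configuration.

```python
def fragment_burst(addr, length, cog_limit, device_limit, page_size):
    """Split a transfer into (address, size) fragments.

    Each fragment is capped by the smaller of the per-COG and device
    burst limits and never crosses a page boundary.
    All sizes are in bytes (an assumption of this model).
    """
    max_burst = min(cog_limit, device_limit)
    if max_burst <= 0 or page_size <= 0:
        raise ValueError("burst limits and page size must be positive")
    fragments = []
    while length > 0:
        to_page_end = page_size - (addr % page_size)
        size = min(length, max_burst, to_page_end)
        fragments.append((addr, size))
        addr += size
        length -= size
    return fragments


# A 64-byte read starting 16 bytes before a (hypothetical) 1024-byte page end
# is split at the page boundary even though it is below the burst limit.
print(fragment_burst(0x3F0, 64, cog_limit=320, device_limit=1024, page_size=1024))
```

Nothing is dropped by the split: the fragment sizes always add up to the requested length.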

  • evanh Posts: 16,793
    edited 2025-10-06 02:10

    @evanh said:
    Oh, huh, the Spin2 libc support doesn't have ioctl(). I'll ask Eric about it ...

    Spin version of my file speed tester using ioctl() now works with latest flexspin. Eric added it - https://forums.parallax.com/discussion/comment/1569473/#Comment_1569473

    ...
     BLOCK_READ_CRC 0 0  SETDIV  SD clock-divider set to sysclock/2 (100.0 MHz)
    ...
     Buffer = 2 kB,  Written 512 kB at 15515 kB/s,  Verified,  Read 512 kB at 32000 kB/s
     Buffer = 2 kB,  Written 512 kB at 17066 kB/s,  Verified,  Read 512 kB at 32000 kB/s
     Buffer = 2 kB,  Written 512 kB at 13473 kB/s,  Verified,  Read 512 kB at 32000 kB/s
    
     Buffer = 8 kB,  Written 2048 kB at 19883 kB/s,  Verified,  Read 2048 kB at 39384 kB/s
     Buffer = 8 kB,  Written 2048 kB at 19504 kB/s,  Verified,  Read 2048 kB at 39384 kB/s
     Buffer = 8 kB,  Written 2048 kB at 20897 kB/s,  Verified,  Read 2048 kB at 39384 kB/s
    
     Buffer = 32 kB,  Written 4096 kB at 23676 kB/s,  Verified,  Read 4096 kB at 41373 kB/s
     Buffer = 32 kB,  Written 4096 kB at 22755 kB/s,  Verified,  Read 4096 kB at 41373 kB/s
     Buffer = 32 kB,  Written 4096 kB at 22882 kB/s,  Verified,  Read 4096 kB at 41795 kB/s
    
  • evanh Posts: 16,793

    For comparison, here are three dividers with read CRC enabled:

     SD clock-divider set to sysclock/4 (50.0 MHz)
     Buffer = 8 kB,  Written 2048 kB at 18285 kB/s,  Verified,  Read 2048 kB at 20277 kB/s
    
     SETDIV  SD clock-divider set to sysclock/3 (66.6 MHz)
     Buffer = 8 kB,  Written 2048 kB at 23011 kB/s,  Verified,  Read 2048 kB at 25600 kB/s
    
     SETDIV  SD clock-divider set to sysclock/2 (100.0 MHz)
     Buffer = 8 kB,  Written 2048 kB at 22755 kB/s,  Verified,  Read 2048 kB at 25924 kB/s
    

    Now same three dividers but with read CRC disabled:

     SD clock-divider set to sysclock/4 (50.0 MHz)
     BLOCK_READ_CRC 0 0
     Buffer = 8 kB,  Written 2048 kB at 18123 kB/s,  Verified,  Read 2048 kB at 21787 kB/s
    
     BLOCK_READ_CRC 0 0  SETDIV  SD clock-divider set to sysclock/3 (66.6 MHz)
     Buffer = 8 kB,  Written 2048 kB at 22260 kB/s,  Verified,  Read 2048 kB at 28444 kB/s
    
     BLOCK_READ_CRC 0 0  SETDIV  SD clock-divider set to sysclock/2 (10
     Buffer = 8 kB,  Written 2048 kB at 22021 kB/s,  Verified,  Read 2048 kB at 40156 kB/s
    

    Of note: the final line should match my previous posting but comes in a bit faster, particularly on the writes. I guess that's related to repeated operations. The SD card changes power setting or something.

  • evanh Posts: 16,793

    A comment on the sysclock/4 comparison between CRC on and off for block reading - There is some extra overhead with read CRC enabled. This is not the time required to process the CRC but rather due to the data block fast copy into cogRAM, for subsequent CRC processing, having to be conducted serially against streamer ops. Namely the FIFO must be stopped to allow the fast copy to happen smoothly.

    It was discovered during development that FIFO writes, in particular, clash badly with direct hubRAM accesses. The FIFO forces a lot of cog stalls!

  • Rayman Posts: 15,748

    @rogloh What exactly does this "maxburst" argument do if it is not -1 ?

      'Give video cog priority access to PSRAM
      exmem.setQoS(cog,-1,15,false,false)   '(cogn, maxburst, priority, locked, attention)
    

    Assuming -1 turns it off. Does this truncate requests? Or, maybe just yield periodically to other cogs at specified burst length and then come back and finish?

  • There's a global maxburst limit that's used instead, and that the per-cog limit is clamped to. This is set somewhere in the startup code.

  • Rayman Posts: 15,748

    What if this is set to 320 and then asked for a burst of 640?

  • gets split

  • Rayman Posts: 15,748

    And it probably pays attention to other cogs after each split?

  • rogloh Posts: 6,073

    Yes the burst size is set to the minimum of the device and per COG setting. Nothing will be lost from the request due to a burst being fragmented, the request will stay pending, but the code will yield to polling again at the split point, unless the COG's QoS flags indicate LOCKED. Also you can set LOCKED for a high priority COG so it would fragment at the burst size but not yield back to the poller, and the next fragment for the COG's memory request will then continue back to back - to reduce service latency. Note: it would not make sense to set LOCKED for a lower priority COG but I think you can still do so if you wanted to for any reason.
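
To make the yield-versus-LOCKED behaviour above concrete, here is a small Python model of a poller. It is only a sketch: the dictionary fields, the `arrives_after` counter (number of fragments served before a request shows up) and the tie-break on COG id are all assumptions for illustration, not the driver's actual data structures or polling order.

```python
def service_order(requests, device_limit):
    """Return the order in which (cog, fragment_size) pieces get serviced.

    requests: dicts with keys cog, priority, length, burst, locked,
    arrives_after. Higher priority wins when the poller runs.
    """
    waiting = [dict(r) for r in requests]   # copy so callers' dicts are untouched
    trace = []
    served = 0                              # fragments served so far (time proxy)
    while waiting:
        ready = [r for r in waiting if r["arrives_after"] <= served]
        if not ready:
            # Nothing pending yet: idle until the next request arrives.
            served = min(r["arrives_after"] for r in waiting)
            continue
        # Poll: highest priority first, lowest COG id breaks ties (assumed).
        req = max(ready, key=lambda r: (r["priority"], -r["cog"]))
        while True:
            size = min(req["length"], req["burst"], device_limit)
            if size <= 0:
                raise ValueError("length and burst limits must be positive")
            trace.append((req["cog"], size))
            served += 1
            req["length"] -= size
            if req["length"] == 0:
                waiting = [r for r in waiting if r is not req]
                break
            if not req["locked"]:
                break                       # yield to the poller at the split point


    return trace


writer = dict(cog=1, priority=0, length=640, burst=320, locked=False, arrives_after=0)
video = dict(cog=0, priority=7, length=320, burst=320, locked=False, arrives_after=1)
print(service_order([writer, video], device_limit=1024))
print(service_order([dict(writer, locked=True), video], device_limit=1024))
```

In this model the unlocked writer lets the video request in between its two 320-byte fragments, while a LOCKED low priority writer makes the video COG wait for the whole 640 bytes, which matches the note above that locking a lower priority COG rarely makes sense.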

  • Rayman Posts: 15,748

    Seem to get corrupted data if don't do a waitx after telling it to copy a line of video...

    Shouldn't the wait loop after the waitx be enough?
    Seems it isn't...

                    wrlong  y,ptra
                    add     exoff,##2560*2
                    waitx   ##8000
    
    exwait
                    rdlong  x,ptra
                    cmp     x,#0  wz
            if_nz   jmp     #exwait
    
                    djnz    mx0,#Backloop
    
  • Rayman Posts: 15,748
    edited 2025-10-09 21:54

    Also, in a video driver need to break up a copy loop into two segments or it doesn't work right...
    Not exactly sure why...

    Guess I'll play with that maxburst setting...
    Maybe set to -1 isn't a good idea here...

             'Copy offscreen buffer to PSRAM for next frame
            wrlong  ##320*240,ptra[2] '#bytes to write or read
            wrlong  pOffBuf,ptra[1]
            wrlong  exwrite,ptra
    '}
    
                    callpa  #9,#blank              'bottom blanks
    
                    drvnot  vsync         'vsync on
    
                    callpa  #2,#blank               'vertical sync blanks
    
                    drvnot  vsync         'vsync off
    
    
             wrlong  ##320*240,ptra[2] '#bytes to write or read
             wrlong  pOffBuf2,ptra[1]
             wrlong  exwrite2,ptra
    
    
                    jmp     #field2                  'loop
    
  • evanh Posts: 16,793

    RDLONG can set Z flag on a zero value itself.

    exwait
                    rdlong  x,ptra   wz
            if_nz   jmp     #exwait
    
  • evanh Posts: 16,793

    Regarding needing the WAITX, it might pay to ensure the mailbox value is at zero before setting it to non-zero. If it hasn't finished the prior operation then you'll be messing up.

  • Rayman Posts: 15,748

    Hmm… that might be it…

  • rogloh Posts: 6,073
    edited 2025-10-10 00:47

    @Rayman said:
    Also, in a video driver need to break up a copy loop into two segments or it doesn't work right...
    Not exactly sure why...

    Guess I'll play with that maxburst setting...
    Maybe set to -1 isn't a good idea here...

             'Copy offscreen buffer to PSRAM for next frame
            wrlong  ##320*240,ptra[2] '#bytes to write or read
            wrlong  pOffBuf,ptra[1]
            wrlong  exwrite,ptra
    '}
    
                    callpa  #9,#blank              'bottom blanks
    
                    drvnot  vsync         'vsync on
    
                    callpa  #2,#blank               'vertical sync blanks
    
                    drvnot  vsync         'vsync off
    
    
             wrlong  ##320*240,ptra[2] '#bytes to write or read
             wrlong  pOffBuf2,ptra[1]
             wrlong  exwrite2,ptra
    
    
                    jmp     #field2                  'loop
    

    Have you set up the QoS service classes to give your video cog priority over the writer cog? Now if the video cog is also the writer cog then you are on your own with respect to service priority, and you'll have to ensure you can split your workload up into smaller chunks such that the video reader is not interrupted at critical times, causing dropouts on the video line. Remember there is only a single mailbox per cog, so only one operation is active at a time from a single cog.

    Update: if you want reliable video you really should do a write operation in a different cog that can be slowed down by the video reader cog getting priority.

  • Rayman Posts: 15,748

    @Rayman said:
    Hmm… that might be it…

    Actually, that can't be because it works if the transfer amount is halved...

    @rogloh Well, I'm not getting it...
    The video cog is retrieving scanlines from PSRAM.
    Then, at the end, is copying working buffer in hub to display buffer in PSRAM.

    This is set at the end of visible lines...
    Are you saying the last step should be done by some other cog?

  • Rayman Posts: 15,748

    Breaking up the copy of working buffer in HUB to display buffer in PSRAM into two operations seems to work, so will attempt to stick to that.
    Don't know why needs to be split though...

  • rogloh Posts: 6,073
    edited 2025-10-10 09:36

    Well as long as each operation completes before the next one needs to be ready it would work out okay. You just need to make sure that all the workload requested can complete in time before the video read data is required to be ready.

    You need to be mindful of the bandwidth needed for your different memory operations, i.e. what video resolution and depth and how many scan lines' worth of time you have. Also you need to be confident that the workload requested will complete in time before any video data is required to be valid. This depends on latency as well as transfer duration. Also, breaking up into fragments would always extend the time slightly, due to setup/polling overheads etc.

    Do you believe there is time available in the blanking period to write an entire screen's worth of pixel data to PSRAM?
    EDIT: changed read to write above.

    UPDATE: If you have access to your signals and a scope you might be able to probe a chip select line of the PSRAM as well as VSYNC & HSYNC to see what is happening with your read/write accesses and whether they properly complete in time. That's how I made sure my drivers interworked together in the early days of debugging things when working with video and not knowing what was working and what wasn't etc. That can help find the limits of what is achievable too when you really push it.
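
The bandwidth question above can be checked with some back-of-envelope arithmetic. The Python sketch below is only a model: the 320*240-byte transfer size and the 9 + 2 blank lines come from the code posted earlier in the thread, but the 100 MB/s bandwidth, 32 us line time and 0.5 us per-fragment overhead are made-up placeholder numbers, so substitute measured values before drawing conclusions.

```python
import math


def transfer_time_us(nbytes, bandwidth_mb_s, burst_bytes, overhead_us_per_fragment):
    """Estimated time for a fragmented transfer, in microseconds.

    bytes / (MB/s) gives microseconds when 1 MB = 1,000,000 bytes.
    Each fragment adds a fixed setup/polling overhead.
    """
    fragments = math.ceil(nbytes / burst_bytes)
    return nbytes / bandwidth_mb_s + fragments * overhead_us_per_fragment


def fits_in_blanking(nbytes, blank_lines, line_time_us, **transfer_params):
    """True if the transfer estimate fits within the given blanking lines."""
    return transfer_time_us(nbytes, **transfer_params) <= blank_lines * line_time_us


screen_bytes = 320 * 240          # size used in the thread's wrlong ##320*240
params = dict(bandwidth_mb_s=100.0, burst_bytes=320, overhead_us_per_fragment=0.5)  # placeholders
print(transfer_time_us(screen_bytes, **params))
print(fits_in_blanking(screen_bytes, blank_lines=9 + 2, line_time_us=32.0, **params))
```

With these placeholder numbers the full-screen write does not fit in the blanking lines, which is the kind of result a scope measurement would confirm or refute for the real hardware.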

  • Rayman Posts: 15,748

    Think it wasn't really working right... Seems need 16 bit bus for this, well at least for the raycast thing where needs to do other things with PSRAM as well.

  • Rayman Posts: 15,748

    Not seeing QoS having any real effect...

    Is "locked" supposed to prevent other cogs from accessing PSRAM?

  • Rayman Posts: 15,748

    Think maybe see a slight improvement if SetQoS() is changed like this:

    long[mb][0] := drv16.R_CONFIG + cogn 'cogid()

    Any chance that is right?

  • Rayman Posts: 15,748

    Also, would one be correct in assuming that cogs are normally set to lowest priority, 1, to start?

  • rogloh Posts: 6,073
    edited 2025-10-11 02:14

    @Rayman said:
    Not seeing QoS having any real effect...

    Is "locked" supposed to prevent other cogs from accessing PSRAM?

    Locked means that the driver won't repoll for the highest priority service between the fragments of that COG's request. That is, it will just continue on with the next fragment of the current request.

    @Rayman said:
    Also, would one be correct in assuming that cogs are normally set to lowest priority, 1, to start?

    At startup my PSRAM driver sets the QoS class & flag parameters to 0 and per COG limit to 0xFFFF (meaning device limit applies). Haven't looked at what exmini does on top of that if anything as that's Ada's code. The lowest class is 0 in this scheme, but you can raise the priority and LOCK flag it as well for a video driver for example.

        ' setup some default bank and QoS parameter values
    
        longfill(@deviceData, (burst << 16) | (delay << 12) | (ADDRSIZE-1), 2)
        longfill(@qosData, $FFFF0000, 8)
    

    @Rayman said:
    Think maybe see a slight improvement if SetQoS() is changed like this:

    long[mb][0] := drv16.R_CONFIG + cogn 'cogid()

    Any chance that is right?

    Yes. To set up the QoS, the mailbox parameters passed are shown below from my driver.

    {{
    setQos(cog, qos)
    
    This API lets you adjust the request servicing policy per COG in the driver.
    It sets up a COG's maximum burst size (also still limited by device's max burst setting), 
    and the optional priority & flags.
        cog - cog ID to change from 0-7
        qos - qos parameters for the cog (set to 0 to remove COG from polling)
    Use this 32 bit format for qos data
      Bit
      31-16: maximum burst size allowed for this COG in bytes before fragmenting (bursts also limited by device burst size)
      15   : 1 = COG has a polling priority assigned, 0 = round robin polled after prioritized COGs get serviced first
      14-12: 3 bit priority COGs polling order when bit15 = 1, %111 = highest, %000 = lowest
      11   : 1 = additional ATN notification to COG after request is serviced, 0 = mailbox notification only
      10   : 1 = Locked transfer completes even after burst size fragmentation, 0 = COGs are repolled
      9-0  : reserved (0)
    }}
    PUB setQos(cog, qos) : result | mailbox
        if drivercog == -1 ' driver must be running
            return ERR_INACTIVE
        if cog < 0 or cog > 7 ' enforce cog id range
            return ERR_INVALID
        long[@qosData][cog] := qos & !$1ff
        mailbox := @mailboxes + drivercog*12
        repeat until LOCKTRY(driverlock)
        long[mailbox] := driver.R_CONFIG + cogid()
        repeat while long[mailbox] < 0
        LOCKREL(driverlock)
    

    By the way this should be covered in the MemoryDriverDocumentation pdf file under control requests.
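
For anyone building the `qos` value by hand, the bit layout in the comment block above can be expressed as a small packer. This is a Python sketch for checking values on a PC, not part of the driver; the function and field names are made up for illustration, and only the bit positions are taken from the documented format.

```python
def pack_qos(max_burst, priority=None, attention=False, locked=False):
    """Build the 32-bit qos word from the documented fields.

    max_burst: bits 31-16, burst size in bytes before fragmenting
    priority:  None for round-robin (bit 15 = 0), else 0..7 in bits 14-12
    attention: bit 11, extra ATN notification after servicing
    locked:    bit 10, no repoll between fragments
    """
    if not 0 <= max_burst <= 0xFFFF:
        raise ValueError("max_burst must fit in 16 bits")
    qos = max_burst << 16
    if priority is not None:
        if not 0 <= priority <= 7:
            raise ValueError("priority must be 0..7")
        qos |= (1 << 15) | (priority << 12)
    if attention:
        qos |= 1 << 11
    if locked:
        qos |= 1 << 10
    return qos


def unpack_qos(qos):
    """Decode a qos word back into its fields."""
    return {
        "max_burst": (qos >> 16) & 0xFFFF,
        "prioritized": bool(qos & (1 << 15)),
        "priority": (qos >> 12) & 7,
        "attention": bool(qos & (1 << 11)),
        "locked": bool(qos & (1 << 10)),
    }


print(hex(pack_qos(0xFFFF)))                         # same value as the $FFFF0000 startup default
print(hex(pack_qos(320, priority=7, locked=True)))   # a video-style setting
```

The first value reproduces the `$FFFF0000` default that the startup `longfill` writes, which is a quick sanity check that the field positions were read correctly.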
