@Wuerfel_21 said:
I think that one just reads one long (4 bytes), but that's a question for @rogloh
Yeah READ_BURST reads an arbitrary number of bytes, READ_LONG reads in a single long.
A burst read will be broken up into fragments based on a couple of settings: the settable per-COG burst limit, which is there to allow fairness and let a higher priority video COG get a go if its request is pending, and the device's own burst limit. E.g. HyperRAM transfers need to stop after 4us or so for the internal refresh to operate correctly (I could be wrong on the exact number). It also auto-fragments on a page boundary - but that is more to deal with internal device limitations than fairness to other COGs.
Spin version of my file speed tester using ioctl() now works with latest flexspin. Eric added it - https://forums.parallax.com/discussion/comment/1569473/#Comment_1569473
For comparison, here are three dividers with read CRC enabled:
SD clock-divider set to sysclock/4 (50.0 MHz)
Buffer = 8 kB, Written 2048 kB at 18285 kB/s, Verified, Read 2048 kB at 20277 kB/s
SETDIV SD clock-divider set to sysclock/3 (66.6 MHz)
Buffer = 8 kB, Written 2048 kB at 23011 kB/s, Verified, Read 2048 kB at 25600 kB/s
SETDIV SD clock-divider set to sysclock/2 (100.0 MHz)
Buffer = 8 kB, Written 2048 kB at 22755 kB/s, Verified, Read 2048 kB at 25924 kB/s
Now same three dividers but with read CRC disabled:
SD clock-divider set to sysclock/4 (50.0 MHz)
BLOCK_READ_CRC 0 0
Buffer = 8 kB, Written 2048 kB at 18123 kB/s, Verified, Read 2048 kB at 21787 kB/s
BLOCK_READ_CRC 0 0 SETDIV SD clock-divider set to sysclock/3 (66.6 MHz)
Buffer = 8 kB, Written 2048 kB at 22260 kB/s, Verified, Read 2048 kB at 28444 kB/s
BLOCK_READ_CRC 0 0 SETDIV SD clock-divider set to sysclock/2 (100.0 MHz)
Buffer = 8 kB, Written 2048 kB at 22021 kB/s, Verified, Read 2048 kB at 40156 kB/s
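As a rough sanity check on that last figure (my own arithmetic, assuming a 4-bit SD bus): at sysclock/2 the bus clocks 100 MHz x 4 bits = 400 Mbit/s, i.e. 50,000 kB/s of raw wire rate, so the 40,156 kB/s CRC-off read is achieving roughly 80% of what the bus can physically carry, with the rest presumably going to command/response turnaround and buffer handling.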
Of note: the final line should match my previous posting but comes in a bit faster, particularly on the writes. I guess that's related to repeated operations - the SD card changes its power setting or something.
A comment on the sysclock/4 comparison between CRC on and off for block reading: there is some extra overhead with read CRC enabled. It's not the time required to process the CRC itself, but rather that the fast copy of the data block into cogRAM, for subsequent CRC processing, has to be conducted serially against the streamer ops - namely, the FIFO must be stopped to allow the fast copy to happen smoothly.
It was discovered during development that FIFO writes, in particular, clash badly with direct hubRAM accesses. The FIFO forces a lot of cog stalls!
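For context on what that fast copy looks like, here's a minimal PASM2 sketch (my own illustration, not the actual driver code; sectorbuf is assumed to be a 128-long cogRAM buffer and hubaddr a register holding the hub address of the received 512-byte block):
        setq    #128-1                  'queue a block transfer of 128 longs (512 bytes)
        rdlong  sectorbuf, hubaddr      'fast copy hubRAM -> cogRAM starting at sectorbuf
A SETQ block move like this wants uninterrupted hub access, which is why the FIFO has to be out of the way first - as noted above, an active FIFO (especially one being written) clashes with these direct hubRAM accesses and stalls the cog.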
@rogloh What exactly does this "maxburst" argument do if it is not -1 ?
'Give video cog priority access to PSRAM
exmem.setQoS(cog,-1,15,false,false) '(cogn, maxburst, priority, locked, attention)
Assuming -1 turns it off. Does this truncate requests? Or does it maybe just yield periodically to other cogs at the specified burst length and then come back and finish?
There's a global maxburst limit that's used instead / the per-cog limit is clamped to. This is set somewhere in the startup code.
What if this is set to 320 and then asked for a burst of 640?
Gets split.
And it probably pays attention to other cogs after each split?
Yes, the burst size is set to the minimum of the device limit and the per-COG setting. Nothing is lost from the request when a burst is fragmented - the request stays pending - but the code will yield back to polling at the split point unless the COG's QoS flags indicate LOCKED. You can set LOCKED for a high priority COG so it still fragments at the burst size but doesn't yield back to the poller; the next fragment of that COG's memory request then continues back to back, which reduces service latency. Note: it wouldn't make sense to set LOCKED for a lower priority COG, but I think you can still do so if you want to for any reason.
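To make the clamping concrete, here's a rough PASM2 sketch of the idea (my own illustration of the rule described above, not code lifted from the driver; all register names are made up and the page size is assumed to be a power of two):
        mov     fragment, remain        'bytes still outstanding in this request
        fle     fragment, devburst      'clamp to the device's own burst limit
        fle     fragment, cogburst      'clamp to the requesting COG's QoS burst limit
        mov     topage, addr
        and     topage, pagemask        'pagemask = pagesize-1, gives offset within page
        subr    topage, pagesize        'bytes remaining before the page boundary
        fle     fragment, topage        'never let a fragment cross the page boundary
        sub     remain, fragment        'request stays pending until remain reaches 0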
Seem to get corrupted data if I don't do a WAITX after telling it to copy a line of video...
Shouldn't the wait loop after the WAITX be enough?
Seems it isn't...
RDLONG can set the Z flag on a zero value itself.
Regarding needing the WAITX, it might pay to ensure the mailbox value is at zero before setting it to non-zero. If it hasn't finished the prior operation then you'll be messing things up.
Hmm… that might be it…
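A minimal sketch of that check, reusing the mailbox writes from the video code quoted below (my own illustration based on this thread, not driver-supplied code - it assumes the driver clears the first mailbox long back to zero when the request completes):
.wait   rdlong  temp, ptra   wz         'read mailbox long 0, the request/command long
 if_nz  jmp     #.wait                  'non-zero means the previous request is still busy
        wrlong  ##320*240, ptra[2]      'now safe to queue the next one: byte count
        wrlong  pOffBuf, ptra[1]        'hub address
        wrlong  exwrite, ptra           'command + external address, which kicks it off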
@Rayman said:
Also, in a video driver I need to break up a copy loop into two segments or it doesn't work right...
Not exactly sure why...
Guess I'll play with that maxburst setting...
Maybe setting it to -1 isn't a good idea here...
'Copy offscreen buffer to PSRAM for next frame
        wrlong  ##320*240,ptra[2]       '#bytes to write or read
        wrlong  pOffBuf,ptra[1]         'hub address of the working buffer chunk
        wrlong  exwrite,ptra            'write command + PSRAM address - kicks off the request
'}
        callpa  #9,#blank               'bottom blanks
        drvnot  vsync                   'vsync on
        callpa  #2,#blank               'vertical sync blanks
        drvnot  vsync                   'vsync off
        wrlong  ##320*240,ptra[2]       '#bytes to write or read
        wrlong  pOffBuf2,ptra[1]        'hub address of the second chunk
        wrlong  exwrite2,ptra           'write command + PSRAM address for the second chunk
        jmp     #field2                 'loop
Have you set up the QoS service classes to give your video cog priority over the writer cog? If the video cog is also the writer cog then you are on your own with respect to service priority, and you'll have to split your workload into smaller chunks so that the video reader is not interrupted at critical times, causing dropouts on the video line. Remember there is only a single mailbox per cog, so only one operation can be active at a time from a single cog.
Update: if you want reliable video you really should do the write operation in a different cog - one that can be slowed down by the video reader cog getting priority.
Actually, that can't be it, because it works if the transfer amount is halved...
@rogloh Well, I'm not getting it...
The video cog is retrieving scanlines from PSRAM.
Then, at the end, it copies the working buffer in hub to the display buffer in PSRAM.
This is set up at the end of the visible lines...
Are you saying the last step should be done by some other cog?
Breaking the copy of the working buffer in HUB to the display buffer in PSRAM into two operations seems to work, so I'll attempt to stick with that.
Don't know why it needs to be split though...
Well, as long as each operation completes before the next one needs to be issued, it should work out okay. You just need to make sure that all the requested work can complete before the video read data is required to be ready.
You need to be mindful of the bandwidth needed for your different memory operations - i.e. what video resolution and depth, and how many scan lines' worth of time you have. You also need to be confident that the requested workload will complete before any video data is required to be valid; that depends on latency as well as transfer duration. Breaking a request up into fragments also always extends the time slightly, due to setup/polling overheads etc.
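As a rough worked example (my own numbers and assumptions, not figures from this thread): a 320x240 8bpp working buffer is 76,800 bytes. In a 640x480@60Hz timing the vertical blanking interval is about 45 lines x ~31.8 us, roughly 1.4 ms, so fitting the whole copy into blanking needs an effective PSRAM write rate of at least 76,800 bytes / 1.4 ms ≈ 55 MB/s - before allowing for request latency, fragmentation and polling overhead. If the sums don't leave a comfortable margin, that's the case for splitting the copy or moving it to a lower-priority writer cog.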
Do you believe there is time available in the blanking period to write an entire screen's worth of pixel data to PSRAM?
EDIT: changed read to write above.
UPDATE: If you have access to your signals and a scope, you might be able to probe a chip select line of the PSRAM as well as VSYNC & HSYNC to see what is happening with your read/write accesses and whether they properly complete in time. That's how I made sure my drivers interworked together in the early days of debugging, when working with video and not knowing what was working and what wasn't. That can also help find the limits of what is achievable when you really push it.