Comments
What PSRAM driver? Where?
Chip's version is part of the code in the first post here: https://forums.parallax.com/discussion/175725/anti-aliased-24-bits-per-pixel-hdmi/p1
Adapted it for Platform board as part of the attached.
But, can swap out PSRAM driver and should work with Edge 32MB too...
See now that redid some things that @rogloh already did... Guess missed that...
Oh, wow, I have that. Thanks for the pointer.
Never got it up running I don't think. Was too distracted at the time.
Could use anti-aliased circles...
Found this Wu algorithm that can do anti-aliased lines and circles, but only very thin ones...
Some people suggest drawing an aliased circle and then anti-aliasing just the edge.
Think will cheat for now and use series of bitmaps for various size circles, but may try the above.
Got it to load a 24bpp bitmap file for the background.
Next up is something of a diversion, but want to create several frames of a video, save to uSD and then see how fast can play video from uSD.
This will be double buffered in PSRAM.
Might be terribly slow, but maybe with the new 4-bit uSD driver will be interesting...
Getting 6 fps video with 4-bit uSD card.
Could be good for some things...
This is double-buffered 24-bit video, encoded as 32 bpp, at 640x480.
So that's ~1.2 MB/frame, which means reading at ~7 MB/sec.
Think 4-bit uSD can actually do ~20 MB/sec though, so further optimization might get to 20 fps or so...
In real life, might want to do this with paletted 8bpp so can just load to hub RAM.
That should be capable of 60 fps...
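The frame-budget arithmetic above is easy to sanity-check. A throwaway pair of helpers (names are mine, not from the project):

```c
/* Bytes per frame at a given resolution and pixel depth. */
static long frame_bytes(int w, int h, int bytes_per_px) {
    return (long)w * h * bytes_per_px;
}

/* Frame rate a given read throughput (decimal MB/sec) can sustain. */
static double max_fps(double mb_per_sec, long bytes_per_frame) {
    return mb_per_sec * 1000000.0 / (double)bytes_per_frame;
}
```

640x480 at 32 bpp is 1,228,800 bytes per frame, so ~7 MB/sec lands around the observed 6 fps; at 8 bpp the same ~20 MB/sec card budget allows well over 60 fps.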
Also contemplating switching the whole thing to 16-bpp. Not there yet though. Think stay 24 bpp until need to change...
Tried increasing uSD buffer sizes and it does make it faster, but the screen starts flickering.
This is a puzzle...
Think interfering with video driver trying to read from PSRAM. Guess can't hog it too much...
Uncompressed 24-bit video is insane. Stop it, stop it now!
If you want to push the SD card a little harder then disable the block read CRC checking and use CLK_DIV = 2 for faster SD clock.
To disable read CRC use -D NOREADBLOCKCRC on the compile line.
Ok, seems can only write one line at a time to PSRAM without messing up the video driver...
Still got framerate up to ~16 fps with bigger uSD buffers...
This is enough for basic video... Think 15 fps is the lower limit for pain...
No it isn't - we gotta push the envelope and find where the P2 breaks. 😜
Hmm, sysclock/2 (CLK_DIV = 2) calibration needs some work. I might need to bring in the sub-sysclock delay line tricks to gain more fine grained adjustment. Sysclock/3 is definitely a whole lot more reliable.
PS: It'll be a good exercise to demonstrate auto tuning for the PSRAMs too.
Maybe one limitation is that fread is blocking… maybe need to start psram transfer first, but with a delay…
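One way around blocking fread is classic ping-pong buffering: while chunk N is handed to the PSRAM driver, chunk N+1 is already being read from the card. A toy C sketch of the idea; here `read_chunk` and `psram_write` are memcpy stand-ins on arrays, not this project's uSD or PSRAM APIs, and a real `psram_write` would return immediately and run in the background so the two transfers actually overlap:

```c
#include <string.h>

#define CHUNK 256
#define FRAME (4 * CHUNK)

static unsigned char file_data[FRAME];   /* stands in for the uSD file */
static unsigned char psram[FRAME];       /* stands in for PSRAM */
static long file_pos;

/* Stand-in for a blocking fread() from uSD. */
static long read_chunk(unsigned char *dst, long n) {
    long left = FRAME - file_pos;
    if (n > left) n = left;
    memcpy(dst, file_data + file_pos, n);
    file_pos += n;
    return n;
}

/* Stand-in for kicking off a PSRAM transfer; a real driver call would
   be non-blocking and complete while the next read is in flight. */
static void psram_write(long addr, const unsigned char *src, long n) {
    memcpy(psram + addr, src, n);
}

/* Ping-pong loading: hand off the buffer just filled, then immediately
   start filling the other one. */
static void load_frame(void) {
    unsigned char buf[2][CHUNK];
    int cur = 0;
    long addr = 0;
    long got = read_chunk(buf[cur], CHUNK);
    while (got > 0) {
        psram_write(addr, buf[cur], got);  /* would be non-blocking */
        addr += got;
        cur ^= 1;                          /* flip to the other buffer */
        got = read_chunk(buf[cur], CHUNK);
    }
}
```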
I once again need to remark that I had ok-ish Cinepak-format video working, I think that did up to ~24 fps at 1024x768 (I think that was with 16bpp framebuffers - mostly limited by PSRAM bandwidth IIRC). Streaming uncompressed video is a bit too much brrbrr. Though maybe we can come up with a format that works better than Cinepak and can still be decoded efficiently.
For reference, Cinepak works like this, roughly:
Screen is split into some number of slices
Each slice gets two codebooks ("V1" and "V4"), containing 256 2x2 pixel patterns (in subsampled YUV - 4 Y values, one U, one V)
Each 4x4 macro-block in the slice is encoded in one of 3 ways:
SKIP -> pixels from previous frame are kept
V1 -> one V1 pattern is used (scaled up to 4x4)
V4 -> four V4 patterns are used
At maximum quality this takes ~3 bits per pixel.
The codebook is essentially the same as a 256 color palette, but with 6 dimensions instead of 3. Keeping unchanged parts from the previous frame is obvious.
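The macro-block scheme described above is simple enough to sketch as a decoder for a single 4x4 block. To keep it short, codebook entries here are plain 8-bit values instead of subsampled YUV, and all names are mine; this is a toy, not actual Cinepak code:

```c
/* One codebook entry: a 2x2 pixel pattern (real Cinepak stores 4 Y
   values plus one U and one V per entry). */
typedef struct { unsigned char px[2][2]; } Entry;

enum { MB_SKIP, MB_V1, MB_V4 };

/* Decode one 4x4 macro-block at (bx*4, by*4) into an 8x8 framebuffer.
   SKIP keeps the previous frame's pixels (the framebuffer is reused
   between frames), V1 scales a single 2x2 entry up to 4x4, and V4 uses
   one 2x2 entry per quadrant. */
static void decode_block(unsigned char fb[8][8], int bx, int by,
                         int mode, const Entry *v1, const Entry *v4[4]) {
    int x0 = bx * 4, y0 = by * 4;
    if (mode == MB_SKIP) return;
    for (int y = 0; y < 4; y++)
        for (int x = 0; x < 4; x++) {
            if (mode == MB_V1) {
                /* each V1 texel covers a 2x2 pixel area */
                fb[y0 + y][x0 + x] = v1->px[y / 2][x / 2];
            } else {
                /* V4: pick the entry for this 2x2 quadrant */
                const Entry *e = v4[(y / 2) * 2 + (x / 2)];
                fb[y0 + y][x0 + x] = e->px[y % 2][x % 2];
            }
        }
}
```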
@Wuerfel_21 that sounds like a better way to do video.
Sounds like VGA resolution should be easy then…
I've incorporated using registered CMD/DAT pins into the rxlag calibration. It is enough for reliable sysclock/2 ops. The Schmitt Trigger option is not as consistent across differing SD cards, so can't be used as part of a predetermined delay line slope. However, it would still be a requirement for managing sysclock/1 auto-tuning. Therefore a more sophisticated sequence is needed when dealing with a DDR interface.
On that note, as Ada already pointed out, PSRAM doesn't offer any easy solution for recalibrating on the fly. If sysclock/1 was ever attempted it would also need to have a periodic break in normal operations to recheck the calibration, not unlike performing garbage collecting. Sysclock/2 isn't immune to slipping out of calibration either though, so this might be on the cards in the future anyway.
@evanh said:
If sysclock/1 was ever attempted it would also need to have a periodic break in normal operations to recheck the calibration, not unlike performing garbage collecting. Sysclock/2 isn't immune to slipping out of calibration either though, so this might be on the cards in the future anyway.
It certainly can. Anecdotally I can say that I was running the 96MB board outside on a hot summer day and eventually one of the banks crapped out. Though that's a fairly janky board with many banks on long traces, I feel like EC32MB handles it better.
I shall also refer to the old hairdryer video:
That was on the EC32MB. Note that the sound crashes first, despite not relying on code from PSRAM. So that's the P2 hub RAM giving out. The corruption also doesn't really look like PSRAM bit errors (note how it doesn't align to a 16px grid)
For reads to continue unimpeded while the PSRAM driver self-adapts to timing changes, support for that needs to be built into the driver itself, or some other housekeeping COG needs to access the RAM at a lower priority than the main client COGs and test whether different timing delay values are better at the current frequency and temperature. I built this idea into my driver with the possibility to configure extra dummy test banks, which can be set up to operate using different timing from the banks used by the regular client COGs. It just needs some manager COG to occasionally run a test to see which is the better timing to use. I have not written the test part though; I'm not sure how best to do that, or how to decide which value is better in cases where there are only two working values, one of which might be marginal and ready to fail at any time.
There can be more than two possible fits even at sysclock/1. But since the combinations of clock polarity and Schmitt Trigger are not well ordered like data registration is, it would need some initial runtime learning to figure out a good delay-line slope to get those extra fits. It would likely involve a clock ramp-up to do the learning.
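For the "which value is better" question, a common approach is to sweep every delay tap, record pass/fail for each, and pick the midpoint of the widest passing window; the window width then doubles as a margin figure, so a fit that is one tap away from failing is easy to reject. A sketch under those assumptions, not code from either driver:

```c
/* Given pass/fail results from sweeping every rx delay tap, pick the
   midpoint of the widest passing window.  margin_out reports the
   window width so the caller can tell a solid fit from a marginal one.
   Returns -1 if no tap passed. */
static int best_delay(const int pass[], int ntaps, int *margin_out) {
    int best_start = -1, best_len = 0;
    int start = -1;
    for (int i = 0; i <= ntaps; i++) {
        if (i < ntaps && pass[i]) {
            if (start < 0) start = i;          /* window opens */
        } else if (start >= 0) {               /* window closes */
            int len = i - start;
            if (len > best_len) { best_len = len; best_start = start; }
            start = -1;
        }
    }
    if (margin_out) *margin_out = best_len;
    return best_len ? best_start + best_len / 2 : -1;
}
```

A manager COG could run this against the dummy test banks periodically and only touch the live timing when the winning window actually moves.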
Switched it all over to 16bpp
Can now do uncompressed 16bpp VGA video at 33 fps!
Big improvement...
Found some C code to do anti-aliased circles.
Looks pretty good.
Code came from here: https://github.com/Versa-Design/Antialiased_Circle
The "optimized" version has some kind of bug though...