PSRAM based Anti-Aliased GUI --> Now 16bpp — Parallax Forums

Rayman Posts: 14,744
edited 2024-11-29 18:54 in Propeller 2

Just merged Chip's PSRAM driver and Anti-Aliased line drawing code with McuFont in Flexprop.
Think have the beginning of a 24bpp PSRAM based GUI.

Currently outputting over HDMI.
Here's a font sample with an anti-aliased line drawn through it:

[Image: 2016 x 1512, 893K]

Comments

  • evanh Posts: 16,023

    What PSRAM driver? Where?

  • Rayman Posts: 14,744
    edited 2024-11-16 23:08

    Chip's version is part of this in first post: https://forums.parallax.com/discussion/175725/anti-aliased-24-bits-per-pixel-hdmi/p1

    Adapted it for Platform board as part of the attached.
    But, can swap out PSRAM driver and should work with Edge 32MB too...

    See now that redid some things that @rogloh already did... Guess missed that...

  • evanh Posts: 16,023

    Oh, wow, I have that. Thanks for the pointer.
    Never got it up and running, I don't think. Was too distracted at the time.

  • Rayman Posts: 14,744

    Could use anti-aliased circles...

    Found this Wu algorithm that can do anti-aliased lines and circles, but only very thin ones...
    Some people suggest drawing aliased circle and then anti-aliased at the edge.

    Think will cheat for now and use series of bitmaps for various size circles, but may try the above.
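The "aliased circle, then anti-alias the edge" suggestion can be sketched as a per-pixel coverage test: solid inside, empty outside, and a blend across a one-pixel band at the rim. This is only an illustrative sketch, not the Wu algorithm and not code from the attachment; the function names are made up, and the ramp is taken in squared distance to avoid a square root, which is an approximation.

```c
#include <assert.h>
#include <stdint.h>

/* Coverage (0..255) of pixel (x, y) for a filled circle centred at (cx, cy).
 * Pixels at least half a pixel inside the rim are solid, pixels at least
 * half a pixel outside are empty, and the band in between is ramped.
 * The ramp is linear in squared distance (an approximation, no sqrt). */
uint8_t circle_coverage(int x, int y, float cx, float cy, float radius)
{
    assert(radius > 0.5f);
    float dx = (float)x - cx;
    float dy = (float)y - cy;
    float d2 = dx * dx + dy * dy;
    float inner2 = (radius - 0.5f) * (radius - 0.5f);
    float outer2 = (radius + 0.5f) * (radius + 0.5f);

    if (d2 <= inner2) return 255;           /* solid interior */
    if (d2 >= outer2) return 0;             /* outside the circle */
    float c = (outer2 - d2) / (outer2 - inner2);
    return (uint8_t)(c * 255.0f + 0.5f);    /* edge band */
}

/* Blend one 8-bit colour channel of the circle over the background. */
uint8_t blend_channel(uint8_t bg, uint8_t fg, uint8_t coverage)
{
    return (uint8_t)((fg * coverage + bg * (255 - coverage) + 127) / 255);
}
```

Only pixels in the edge band need the blend, so the interior can still be filled with plain writes.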

  • Rayman Posts: 14,744

    Got it to load a 24bpp bitmap file for the background.
    Next up is something of a diversion, but want to create several frames of a video, save to uSD and then see how fast can play video from uSD.

    This will be double buffered in PSRAM.
    Might be terribly slow, but maybe with the new 4-bit uSD driver will be interesting...

    [Image: 2016 x 1512, 653K]
  • Rayman Posts: 14,744
    edited 2024-11-19 22:54

    Getting 6 fps video with 4-bit uSD card.

    Could be good for some things...
    This is double-buffered 24-bit video encoded as 32 bpp at 640x480.
    That's ~1.2 MB/frame, so it's reading at ~7 MB/sec.

    Think 4-bit uSD can actually do ~20 MB/sec though, so there could be further optimization that might get to 20 fps or so...

    In real life, might want to do this with palettized 8bpp so can just load to hub RAM.
    That should be capable of 60 fps...

    Also contemplating switching the whole thing to 16-bpp. Not there yet though. Think stay 24 bpp until need to change...
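The bandwidth figures in the post above follow from simple arithmetic. A minimal sketch in C, with function names made up for illustration and 1 MB taken as 10^6 bytes:

```c
#include <assert.h>
#include <stdint.h>

/* Bytes in one uncompressed frame. */
uint32_t frame_bytes(uint32_t width, uint32_t height, uint32_t bytes_per_pixel)
{
    return width * height * bytes_per_pixel;
}

/* Sustained read rate needed for a given frame rate, in bytes per second. */
uint32_t stream_rate(uint32_t bytes_per_frame, uint32_t fps)
{
    return bytes_per_frame * fps;
}

/* Whole frames per second that a given read rate can sustain. */
uint32_t max_fps(uint32_t bytes_per_second, uint32_t bytes_per_frame)
{
    assert(bytes_per_frame != 0);
    return bytes_per_second / bytes_per_frame;
}
```

For 640x480 at 4 bytes per pixel this gives 1,228,800 bytes per frame and about 7.4 MB/sec at 6 fps. A card delivering 20 MB/sec would sustain about 16 whole frames per second at that size, or about 65 at 1 byte per pixel.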

  • Rayman Posts: 14,744
    edited 2024-11-19 23:22

    Tried increasing uSD buffer sizes and it does make it faster, but the screen starts flickering.
    This is a puzzle...

    Think interfering with video driver trying to read from PSRAM. Guess can't hog it too much...

  • evanh Posts: 16,023
    edited 2024-11-19 23:31

    Uncompressed 24-bit video is insane. Stop it, stop it now! :#

    If you want to push the SD card a little harder then disable the block read CRC checking and use CLK_DIV = 2 for faster SD clock.
    To disable read CRC use -D NOREADBLOCKCRC on the compile line.

  • Rayman Posts: 14,744

    Ok, seems can only write one line at a time to PSRAM without messing up the video driver...

    Still got framerate up to ~16 fps with bigger uSD buffers...
    This is enough for basic video... Think 15 fps is the lower limit for pain...
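Writing "one line at a time" amounts to splitting a frame copy into one burst per scanline, so the video driver can get at the PSRAM between bursts. A rough sketch of that loop follows. The `psram_write_fn` signature is hypothetical (the real driver call is not shown in the thread), and `fake_psram_write` is a host-side stand-in so the sketch can run on a PC.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical driver entry point: copy len bytes from hub RAM to PSRAM. */
typedef void (*psram_write_fn)(uint32_t psram_addr, const uint8_t *src, size_t len);

/* Host-side stand-in for PSRAM, for testing only. */
uint8_t fake_psram[4096];

void fake_psram_write(uint32_t psram_addr, const uint8_t *src, size_t len)
{
    assert((size_t)psram_addr + len <= sizeof fake_psram);
    memcpy(&fake_psram[psram_addr], src, len);
}

/* Copy a frame to PSRAM one scanline per burst. Returns the burst count.
 * Each call to write() is a separate, short transaction, which leaves gaps
 * in which another client (such as a video driver) could be serviced. */
size_t write_frame_by_lines(psram_write_fn write, uint32_t base,
                            const uint8_t *frame, size_t line_bytes, size_t lines)
{
    assert(write != NULL && frame != NULL);
    size_t bursts = 0;
    for (size_t y = 0; y < lines; y++) {
        write(base + (uint32_t)(y * line_bytes), frame + y * line_bytes, line_bytes);
        bursts++;
    }
    return bursts;
}
```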

  • rogloh Posts: 5,837

    @evanh said:
    Uncompressed 24-bit video is insane. Stop it, stop it now! :#

    No it isn't - we gotta push the envelope and find where the P2 breaks. 😜

  • evanh Posts: 16,023
    edited 2024-11-20 00:13

    Hmm, sysclock/2 (CLK_DIV = 2) calibration needs some work. I might need to bring in the sub-sysclock delay line tricks to gain more fine grained adjustment. Sysclock/3 is definitely a whole lot more reliable.

    PS: It'll be a good exercise to demonstrate auto tuning for the PSRAMs too.

  • Rayman Posts: 14,744

    Maybe one limitation is that fread is blocking… maybe need to start psram transfer first, but with a delay…

  • I once again need to remark that I had ok-ish Cinepak-format video working, I think that did up to ~24 fps at 1024x768 (I think that was with 16bpp framebuffers - mostly limited by PSRAM bandwidth IIRC). Streaming uncompressed video is a bit too much brrbrr. Though maybe we can come up with a format that works better than Cinepak and can be decoded efficiently still.

    For reference, Cinepak works like this, roughly:

    • Screen is split into some number of slices
    • Each slice gets two codebooks ("V1" and "V4"), containing 256 2x2 pixel patterns (in subsampled YUV - 4 Y values, one U, one V)
    • Each 4x4 macro-block in the slice is encoded in one of 3 ways:
      • SKIP -> pixels from previous frame are kept
      • V1 -> one V1 pattern is used (scaled up to 4x4)
      • V4 -> four V4 patterns are used

    At maximum quality this takes ~3 bits per pixel.
    The codebook is essentially the same as a 256 color palette, but with 6 dimensions instead of 3. Keeping unchanged parts from the previous frame is obvious.
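The three macro-block modes described above can be sketched as a small decoder for one 4x4 block. This is only an illustration of the scheme as outlined in the post: the type and function names are made up, the Y ordering inside a pattern (top-left, top-right, bottom-left, bottom-right) is an assumption, and the real bitstream parsing and YUV-to-RGB conversion are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One codebook entry: a 2x2 pattern with four Y values sharing one U and V.
 * Assumed Y order: top-left, top-right, bottom-left, bottom-right. */
typedef struct { uint8_t y[4]; uint8_t u, v; } cb_entry_t;
typedef struct { uint8_t y, u, v; } yuv_t;
typedef enum { BLOCK_SKIP, BLOCK_V1, BLOCK_V4 } block_mode_t;

/* Decode one 4x4 macro-block into out[row][col].
 * SKIP: out is left untouched (pixels from the previous frame are kept).
 * V1:   idx[0] selects one V1 pattern, each Y value scaled up to 2x2 pixels.
 * V4:   idx[0..3] select four V4 patterns, one per 2x2 quadrant. */
void decode_block(block_mode_t mode, const cb_entry_t *v1, const cb_entry_t *v4,
                  const uint8_t idx[4], yuv_t out[4][4])
{
    assert(out != NULL);
    if (mode == BLOCK_SKIP)
        return;

    for (int row = 0; row < 4; row++) {
        for (int col = 0; col < 4; col++) {
            int quad = (row / 2) * 2 + (col / 2);   /* which 2x2 quadrant */
            const cb_entry_t *e;
            int sub;
            if (mode == BLOCK_V1) {
                e = &v1[idx[0]];
                sub = quad;                         /* one Y per quadrant */
            } else {
                e = &v4[idx[quad]];
                sub = (row % 2) * 2 + (col % 2);    /* pixel within quadrant */
            }
            out[row][col].y = e->y[sub];
            out[row][col].u = e->u;
            out[row][col].v = e->v;
        }
    }
}
```

In this sketch a V1 block costs one index byte and a V4 block four, while a SKIP block carries no pixel data at all.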

  • Rayman Posts: 14,744

    @Wuerfel_21 that sounds like a better way to do video.

    Sounds like vga resolution should be easy then…

  • evanh Posts: 16,023

    I've incorporated using registered CMD/DAT pins into the rxlag calibration. It is enough for reliable sysclock/2 ops. The Schmitt Trigger option is not as consistent across differing SD cards, so can't be used as part of a predetermined delay line slope. However, it would still be a requirement for managing sysclock/1 auto-tuning. Therefore a more sophisticated sequence is needed when dealing with a DDR interface.

    On that note, as Ada already pointed out, PSRAM doesn't offer any easy solution for recalibrating on the fly. If sysclock/1 was ever attempted it would also need to have a periodic break in normal operations to recheck the calibration, not unlike performing garbage collecting. Sysclock/2 isn't immune to slipping out of calibration either though, so this might be on the cards in the future anyway.

  • @evanh said:
    If sysclock/1 was ever attempted it would also need to have a periodic break in normal operations to recheck the calibration, not unlike performing garbage collecting. Sysclock/2 isn't immune to slipping out of calibration either though, so this might be on the cards in the future anyway.

    It certainly can. Anecdotally I can say that I was running the 96MB board outside on a hot summer day and eventually one of the banks crapped out. Though that's a fairly janky board with many banks on long traces, I feel like EC32MB handles it better.

    I shall also refer to the old hairdryer video:
    That was on the EC32MB. Note that the sound crashes first, despite not relying on code from PSRAM. So that's the P2 hub RAM giving out. The corruption also doesn't really look like PSRAM bit errors (note how it doesn't align to a 16px grid).

  • rogloh Posts: 5,837

    @evanh said:

    On that note, as Ada already pointed out, PSRAM doesn't offer any easy solution for recalibrating on the fly. If sysclock/1 was ever attempted it would also need to have a periodic break in normal operations to recheck the calibration, not unlike performing garbage collecting. Sysclock/2 isn't immune to slipping out of calibration either though, so this might be on the cards in the future anyway.

    For reads to continue unimpeded while the PSRAM driver self-adapts to timing changes, support for that needs to be built into the driver itself, or some other housekeeping COG needs to access the RAM at a lower priority than the main client COGs and test whether different timing delay values are better at the current frequency and temperature. I built this idea into my driver with the possibility to configure extra dummy test banks, which can be set up to operate using different timing from the banks used by the regular client COGs. It just needs some manager COG to occasionally run a test to see which is the better timing to use. I have not written the test part though; I'm not sure how best to do that, or how to decide which value is better in cases where there are only two working values, one of which might be marginal and ready to fail at any time.
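One possible shape for the unwritten "test part" is to sweep the delay settings on a test bank, record pass or fail for each, and pick the middle of the widest passing window. The sketch below is hypothetical and is not part of any existing driver; `pick_delay` and its arguments are made-up names. It also shows the limit raised in the post: with only two passing values the sweep alone cannot tell which one is marginal, so the choice between them here is arbitrary (the lower one).

```c
#include <assert.h>
#include <stddef.h>

/* Given pass/fail results for delay settings 0..count-1 (nonzero = pass),
 * return the setting in the middle of the longest run of passes,
 * or -1 if nothing passes. For an even-length run the lower of the two
 * middle settings is returned. */
int pick_delay(const unsigned char *pass, size_t count)
{
    assert(pass != NULL);
    size_t best_start = 0, best_len = 0;
    size_t run_start = 0, run_len = 0;

    for (size_t i = 0; i < count; i++) {
        if (pass[i]) {
            if (run_len == 0)
                run_start = i;          /* a new passing window begins */
            run_len++;
            if (run_len > best_len) {
                best_len = run_len;
                best_start = run_start;
            }
        } else {
            run_len = 0;                /* window ended */
        }
    }
    if (best_len == 0)
        return -1;
    return (int)(best_start + (best_len - 1) / 2);
}
```

A manager COG could repeat such a sweep periodically and only switch the live banks when the chosen setting moves.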

  • evanh Posts: 16,023
    edited 2024-11-20 15:20

    There can be more than two possible fits even at sysclock/1. But since the combinations with clock polarity and Schmitt trigger are not well ordered like data registration is, it would need some initial runtime learning to figure out a good delay-line slope to get those extra fits. It would likely involve a clock ramp-up to do the learning.

  • Rayman Posts: 14,744

    Switched it all over to 16bpp

    Can now do uncompressed 16bpp VGA video at 33 fps!
    Big improvement...
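Assuming VGA here means 640x480, a 16bpp frame is 640 x 480 x 2 = 614,400 bytes, so 33 fps works out to roughly 20 MB/sec, in line with the ~20 MB/sec figure quoted for 4-bit uSD earlier in the thread. The switch from 24-bit colour can be sketched as a per-scanline repack into RGB565. The function names and the 0x00RRGGBB source layout are assumptions for illustration, not taken from the attached code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack 8-bit R, G, B into RGB565 by dropping the low bits of each channel. */
uint16_t rgb888_to_rgb565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Convert one scanline of 32-bit pixels (assumed 0x00RRGGBB) to RGB565. */
void convert_line_32_to_16(const uint32_t *src, uint16_t *dst, size_t pixels)
{
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < pixels; i++) {
        uint32_t p = src[i];
        dst[i] = rgb888_to_rgb565((uint8_t)(p >> 16), (uint8_t)(p >> 8), (uint8_t)p);
    }
}
```

Halving the bytes per pixel halves both the uSD read and the PSRAM write for every frame.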

  • Rayman Posts: 14,744

    Found some C code to do anti-aliased circles.
    Looks pretty good.

    Code came from here: https://github.com/Versa-Design/Antialiased_Circle

    The "optimized" version has some kind of bug though...

    [Image: 640 x 480, 235K]