JPEG decoding with picojpeg and FlexC+Spin2

Rayman Posts: 11,247
edited 2020-09-16 - 22:00:08 in Propeller 2
This took some work, but I figured out how to decode and show a 640x300 .jpg image using picojpeg.

The 16 bpp VGA driver is in Spin2, but the main code is in Fastspin's version of C.

Here's a screenshot.
[Image attachment: 4032 x 3024, 3M]

Comments

  • Rayman Posts: 11,247
    edited 2020-09-16 - 22:00:49
    The RGBSQZ PASM2 instruction turned out to be very helpful...

    Here's the jpg file that was decoded.
    [Image attachment: 640 x 300, 61K]
  • Wow this is quite a step forward.
    It's probably not optimized yet, but how long does it take to render the image?
  • It's less than a second, but yes, it needs a lot of optimization. Maybe I could get another cog involved for video.
  • OK, that's mighty useful. I guess you could double buffer, or somehow just 'switch' the image on rather than have it render lines/blocks at a time, to give the illusion of speed.
  • Very nice. I wonder if there is scope for a motion JPEG type of video decoder on the P2 if the decode rate can be boosted to ~24 Hz or so. Is it anywhere in that ballpark if you throw a few cogs at it?
  • Dave Hein Posts: 6,127
    edited 2020-09-17 - 02:00:12
    Rayman, can you post your code? I'd be interested in looking at it. I've written a few JPEG decoders in the past. I looked at the picojpeg code on GitHub, and I'm wondering how you implemented the inverse DCT. picojpeg's IDCT multiplies 16-bit numbers, so the P2's hardware multiplier could be used. I don't know whether FlexGUI's C compiler uses the hardware multiplier or the CORDIC multiplier for short ints. The assembly code would show which multiplier is being used.
  • Sure, I've attached the code.

    Note: This is at the "Hey, I just got this working!" stage, not the polished final stage. It may never get polished.

    This code originally decoded the image into a giant array of 24-bit color pixels on the heap.
    But, the P2 doesn't have enough RAM to do that and show a 16-bit image from HUB RAM at the same time.

    So, I looked into it and found that picojpeg decodes in small chunks it calls MCUs. My test image uses MCUs that consist of four 8x8 pixel blocks. That may be the only format that works at the moment.
    What the code does now is copy each MCU to the display buffer after it is decoded, converting from 24bpp to 16bpp along the way.
    This way, we only need heap storage for a single MCU.

    I think this could also enable decent video, especially when used with eMMC, once optimized for speed...
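
Editor's sketch of the per-MCU copy described above. This is not the attached code: it is a minimal, portable illustration that assumes a 16x16 MCU (four 8x8 blocks arranged 2x2) and three separate row-major colour planes. The names `rgb888_to_rgb565` and `blit_mcu` are hypothetical, and picojpeg's real buffer layout may differ. On the P2, the RGBSQZ instruction would take the place of the shift-and-mask packing.

```c
#include <stdint.h>

#define MCU_W 16 /* assumed: four 8x8 blocks arranged 2x2 */
#define MCU_H 16

/* Pack 8-bit R, G, B into 5:6:5 by keeping the top bits of each channel.
   Portable stand-in for what RGBSQZ does in a single instruction. */
uint16_t rgb888_to_rgb565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r & 0xF8) << 8) | ((g & 0xFC) << 3) | (b >> 3));
}

/* Copy one decoded MCU into a 16 bpp frame buffer.
   r, g, b  : separate MCU_W * MCU_H planes, row-major (assumed layout)
   fb       : frame buffer with fb_width pixels per row
   mcu_x/y  : MCU column and row index within the image */
void blit_mcu(uint16_t *fb, int fb_width, int fb_height,
              int mcu_x, int mcu_y,
              const uint8_t *r, const uint8_t *g, const uint8_t *b)
{
    int x0 = mcu_x * MCU_W;
    int y0 = mcu_y * MCU_H;

    for (int y = 0; y < MCU_H; y++) {
        if (y0 + y >= fb_height)
            break; /* clip MCUs that overhang the bottom edge */
        uint16_t *dst = fb + (y0 + y) * fb_width + x0;
        for (int x = 0; x < MCU_W; x++) {
            if (x0 + x >= fb_width)
                break; /* clip MCUs that overhang the right edge */
            int i = y * MCU_W + x;
            dst[x] = rgb888_to_rgb565(r[i], g[i], b[i]);
        }
    }
}
```

The clipping matters for an image like the 640x300 test file, since 300 is not a multiple of 16 and the last row of MCUs extends past the bottom of the picture.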

  • rogloh Posts: 2,614
    edited 2020-09-17 - 14:34:02
    If the writes to external memory can be accumulated to take advantage of something like scan line write bursts, then HyperRAM should be usable for the frame buffer. There would also be no need to convert down to 16 bpp to save room, unless that is the colour mode in use.
  • Converting to 16bpp is fairly efficient due to the RGBSQZ assembly instruction. I think that allows for higher resolution than you can get with 24-bit color.
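
Editor's sketch of the accumulate-then-burst idea. Everything here is an assumption for illustration: the 640x300 image size, the 16-line band, and the `sim_burst_write` function, which stands in for a real external memory driver and simply copies into an array on the host. Pixels are held as 0x00RRGGBB so no 16 bpp conversion is needed.

```c
#include <stddef.h>
#include <stdint.h>

#define BAND_LINES 16  /* assumed MCU height in scan lines */
#define IMG_WIDTH  640 /* assumed image width in pixels */
#define IMG_HEIGHT 300 /* assumed image height in pixels */

/* Simulated external memory and burst counter (stand-ins for a real driver). */
uint32_t sim_mem[IMG_WIDTH * IMG_HEIGHT];
int sim_burst_count = 0;

/* Hypothetical burst write: copy `count` pixels to pixel offset `offset`. */
void sim_burst_write(size_t offset, const uint32_t *pixels, size_t count)
{
    for (size_t i = 0; i < count; i++)
        sim_mem[offset + i] = pixels[i];
    sim_burst_count++;
}

/* One MCU row's worth of scan lines, held in fast local memory. */
uint32_t band[BAND_LINES * IMG_WIDTH];

/* Store one decoded pixel (0x00RRGGBB) at column x of band line `line`. */
void band_put(int x, int line, uint32_t rgb)
{
    band[line * IMG_WIDTH + x] = rgb;
}

/* After a whole row of MCUs is decoded, write each scan line as one burst. */
void band_flush(int mcu_row)
{
    for (int line = 0; line < BAND_LINES; line++) {
        int y = mcu_row * BAND_LINES + line;
        if (y >= IMG_HEIGHT)
            break; /* the last MCU row may extend past the image */
        sim_burst_write((size_t)y * IMG_WIDTH, &band[line * IMG_WIDTH], IMG_WIDTH);
    }
}
```

With these numbers the band buffer is 16 x 640 x 4 bytes, about 40 KB, and each MCU row costs at most 16 long writes instead of many short ones.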

    Also, it seems the red, blue, and green bytes are kept in separate buffers that you have to read from.

    Still, 24bpp would be nice too. Maybe for QVGA resolution that would be the way to go...
  • You could also try doing some simple ordered dithering during the 24->16 conversion. That'd probably increase the quality a good bit.
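
Editor's sketch of ordered dithering during the 24->16 conversion, as suggested in the last comment. It assumes a standard 4x4 Bayer threshold matrix and a hypothetical function name, `rgb565_dither`; nothing like this is in the posted code. A position-dependent offset, scaled to the number of bits each channel is about to lose, is added before truncation.

```c
#include <stdint.h>

/* Standard 4x4 Bayer threshold matrix, values 0..15. */
static const uint8_t bayer4[4][4] = {
    { 0,  8,  2, 10},
    {12,  4, 14,  6},
    { 3, 11,  1,  9},
    {15,  7, 13,  5},
};

/* Add a dither offset to an 8-bit channel, saturating at 255. */
static int add_clamped(uint8_t v, int offset)
{
    int s = v + offset;
    return s > 255 ? 255 : s;
}

/* Convert one 24-bit pixel at screen position (x, y) to RGB565.
   Red and blue lose 3 bits, so their offset spans 0..7.
   Green loses 2 bits, so its offset spans 0..3. */
uint16_t rgb565_dither(uint8_t r, uint8_t g, uint8_t b, int x, int y)
{
    int t  = bayer4[y & 3][x & 3];
    int r5 = add_clamped(r, t >> 1) >> 3;
    int g6 = add_clamped(g, t >> 2) >> 2;
    int b5 = add_clamped(b, t >> 1) >> 3;
    return (uint16_t)((r5 << 11) | (g6 << 5) | b5);
}
```

For example, a flat red value of 4 sits halfway between two 5-bit levels. Plain truncation maps every pixel to 0, while this version sets half the pixels in each 4x4 tile to level 1, so the average brightness is preserved.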