Color Cell Compression
Rayman
Looking for a way to compress images and video for P2, I came across CCC:
https://en.wikipedia.org/wiki/Color_Cell_Compression
Using CCC, one can compress images to 2bpp and still look OK.
See attached for an original 24bpp image and one that is CCC compressed to 2bpp (although the file itself is 8bpp).
The raw data for this image is also attached, so you can see the size of it.
It appears as though we can fit a 1080p image that is CCC encoded into HUB RAM.
Now, the question is: Can we decode this in real time?
I'm hopefully optimistic...
Update: Adding here what our usual VGA test file (bitmap2.bmp) looks like when CCC compressed (the .png file).
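For anyone who wants to check the "fits into HUB RAM" claim, here is a rough size calculation. It is only a sketch: the function name and parameters are made up for illustration, it assumes 8-bit palette indices for the two colors of each cell, and it ignores the 256-entry palette and any code that also has to live in the 512 KB of hub RAM.

```python
HUB_RAM_BYTES = 512 * 1024  # P2 hub RAM size


def ccc_frame_bytes(width, height, cell_w=4, cell_h=4,
                    bits_per_pixel=1, colors_per_cell=2, bytes_per_color=1):
    """Bytes for the per-pixel index plane plus the per-cell color table.

    Hypothetical helper: assumes width and height are multiples of the
    cell size and that each cell color is a palette index.
    """
    cells = (width // cell_w) * (height // cell_h)
    index_bytes = width * height * bits_per_pixel // 8
    color_bytes = cells * colors_per_cell * bytes_per_color
    return index_bytes + color_bytes


for w, h in [(640, 480), (1920, 1080)]:
    size = ccc_frame_bytes(w, h)
    print(f"{w}x{h}: {size} bytes, fits in hub RAM: {size <= HUB_RAM_BYTES}")
```

With 4x4 cells the color table is as large as the 1bpp bitmap, so the total comes out at 2 bits per pixel.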
zip
420K
Comments
Note: I'm taking a slightly different approach than Wikipedia shows, starting from an 8bpp image instead of 24bpp. This saves me from having to pick the best 256 colors for the image.
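To make the idea concrete, here is a minimal sketch of encoding one 4x4 cell when the input is already an 8bpp palettized image. This is not the actual encoder from the attachment. It assumes the usual CCC recipe (split the cell at its mean luminance, then use one color per group), and the function names, the integer luma weights and the "nearest palette entry" step are my own assumptions.

```python
def luma(rgb):
    """Integer luminance approximation (weights sum to 256)."""
    r, g, b = rgb
    return 77 * r + 150 * g + 29 * b


def nearest_index(rgb, palette):
    """Index of the palette entry closest to rgb (squared RGB distance)."""
    return min(range(len(palette)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(rgb, palette[i])))


def mean_color(colors):
    n = len(colors)
    return tuple(sum(c[k] for c in colors) / n for k in range(3))


def encode_cell(indices, palette):
    """indices: 16 palette indices, row-major 4x4.

    Returns (mask, idx0, idx1): bit n of mask selects idx1 (bright group)
    or idx0 (dark group) for pixel n.
    """
    colors = [palette[i] for i in indices]
    lumas = [luma(c) for c in colors]
    total, n = sum(lumas), len(lumas)
    mask = 0
    dark, bright = [], []
    for bit, (c, y) in enumerate(zip(colors, lumas)):
        if y * n > total:          # above the cell's mean luminance
            mask |= 1 << bit
            bright.append(c)
        else:
            dark.append(c)
    idx0 = nearest_index(mean_color(dark), palette)  # dark is never empty
    idx1 = nearest_index(mean_color(bright), palette) if bright else idx0
    return mask, idx0, idx1
```

Because the two cell colors are palette indices, each cell costs 2 bytes of bitmap plus 2 bytes of color, and the original image's palette is reused as-is.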
It can also display the CCC image from the .ccc file in the zip file above.
Note: Updated to support VGA and other resolution ccc files (just open up a bmp of same resolution first!)
Some time ago I was wondering if an OV7670 with a FIFO RAM buffer could be used with P2 or even with P1. I think it should be possible to do color tracking without having the picture in RAM. It is perhaps also possible, and sufficient for some tasks, to load only a part (subframe) of the picture into RAM at a time.
Regards Christof
http://robotjackie.github.io/portfolio/projects/04_Arduino_cam/
Yeah, it might be hard to do... Might take a few cogs to do 1080p...
You'd want to break up the color information and the bitmap data, I think. That is, if I'm reading this right, each 4x4 block is represented by 16 bits and 2 colors. You'd store the whole image's bits as a regular W x H x 1bpp bitmap, which you'd beam-chase. Separately, you'd store the color data as a (W/4) x (H/4) x 2 set of bytes.
The color data you could load during horizontal blanking, and would be repeated for 4 lines. The bitmap data you'd beam chase. For every 8 pixels you'd set up 4 LUT entries (for the 2 colors of each 4 pixel group). This might be easier if the colors were stored as 16bpp rather than 8; for 8bpp you'd have to do a manual lookup from a palette in COG memory.
Still not sure how high a resolution would be feasible in one cog. 640x480 should be no problem at all. 1280x720 would be tricky but I think it might be do-able.
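A small software model of the split layout described above, just to show the addressing. It is a sketch, not driver code: the function name, the bit order (LSB is the leftmost pixel) and the use of 8-bit palette indices for the colors are assumptions.

```python
def decode_scanline(bitmap, colors, y, width):
    """Decode scanline y into a list of palette indices.

    bitmap: width*height 1bpp plane, row-major, LSB = leftmost pixel (assumed)
    colors: (width//4) * (height//4) cells, two palette indices per cell
    """
    cells_per_row = width // 4
    color_row = (y // 4) * cells_per_row * 2   # same color row serves 4 scanlines
    bitmap_row = y * (width // 8)
    out = []
    for x in range(width):
        bit = (bitmap[bitmap_row + x // 8] >> (x % 8)) & 1
        cell = x // 4
        out.append(colors[color_row + cell * 2 + bit])
    return out
```

The color row only changes every fourth line, which is why it could be fetched during horizontal blanking while the bitmap plane is streamed.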
Here's what the long used "bitmap2.bmp" looks like when CCC encoded, along with the original. (encoded version saved as .png so shows up here better).
Also, attaching the 76 kB CCC data file for this image (compressed down from 301 kB).
I think it looks pretty good, right?
We could do things like store many images in hub ram...
Also could probably play VGA resolution video from uSD card...
To open the above CCC file (in the zip file), just open up an 8bpp BMP of the same resolution first... Then, load the .ccc file...
Was wondering today if could use the 2bpp tile driver with something like this...
Seems could fit a 720p image in HUB RAM this way.
The images above were with 4x4 pixel tiles and the usual 1bpp within each tile.
If try this at 16x16 pixel tiles, doesn't look all that great (see attached).
But, maybe can use alternate version of CCC that uses 2bpp within each tile instead, result would be better.
Then, could use directly with existing 2bpp tile driver.
Alternate approach would be to try modifying the 2bpp tile driver to instead be 1bpp and with 4x4 pixel tiles.
That gives pretty good results, but I'm not sure if the colors could be swapped fast enough...
Maybe a simpler approach would be to modify the tile driver to 16x4 pixels.
This looks OK, even with just 1bpp data.
Also, I'm remembering that one feature of my code is that it starts with a 256 color image.
That would make it easy to use with the tile driver and also reduce the color data per tile from 6 bytes to 2 bytes.
The base address bits for the streamer can select 16 different offsets in the LUT ram. So 16 blocks could have their colors preloaded into the LUT. Too bad there's only 4 bits for the offset. Otherwise we could load all 120 blocks into the LUT at once. (16 * 120 = 1920)
Assuming 1080p and 300MHz clocks, we have 1 instruction per pixel. Or 16 instructions per block.
For 1bpp, using 16 bit color only adds .25 bpp extra but may simplify the update process with the RGBEXP instruction.
There is just enough time to do 2 bpp at 1080p, but we are short on RAM. A 1080p image at 2bpp is 518400 bytes, which leaves only 5888 bytes of the 512 KB hub RAM for colors and code.
Of course, these would need to be unrolled into 2 blocks to do ping-pong updates of the LUT. Then during horizontal sync, we would need an uninterrupted 120-160 clocks to block read the new color data for the next blocks.
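The numbers above can be reproduced with a few lines of arithmetic. This is a sketch under stated assumptions: a 148.5 MHz pixel clock (standard 1080p60 timing, not stated in the post) and 2 clocks per instruction on the P2.

```python
SYS_CLOCK_HZ = 300_000_000
CLOCKS_PER_INSTRUCTION = 2          # typical P2 instruction timing
PIXEL_CLOCK_HZ = 148_500_000        # assumption: standard 1080p60 pixel clock
HUB_RAM_BYTES = 512 * 1024

instr_per_pixel = SYS_CLOCK_HZ / CLOCKS_PER_INSTRUCTION / PIXEL_CLOCK_HZ
instr_per_block = instr_per_pixel * 16      # 16-pixel-wide blocks

blocks_per_line = 1920 // 16
frame_bytes_2bpp = 1920 * 1080 * 2 // 8
bytes_left = HUB_RAM_BYTES - frame_bytes_2bpp

print(f"instructions per pixel: {instr_per_pixel:.2f}")
print(f"instructions per block: {instr_per_block:.1f}")
print(f"blocks per line:        {blocks_per_line}")
print(f"2bpp frame:             {frame_bytes_2bpp} bytes, {bytes_left} left")
```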
Got it going with 2bpp tiled VGA driver.
This is P2 VGA output onto monitor showing a 720p image on a 1080p tiled display.
All image data stored in hub ram.
It's actually not as horrible as you might think.
There are 2 of 4 colors not being used at the moment.
Thinking adding those in as options might make it decent...
Wow, the 16x16 cell CCC with four colors per cell is pretty amazing.
Here is BMP of the result and a screenshot using the 2bpp VGA tile driver.
I'm not sure anyone could tell the difference between this and the original...
The 8bpp original is 902 kB. 24bpp version would be 3x bigger.
This is 241 kB. A compression factor of roughly 3.7 compared to the 8bpp version.
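How the four colors per cell are picked isn't spelled out here. One possible way, purely as an assumption and not necessarily what the attached code does, is to split the cell at its mean luminance and then split each half again at its own mean, giving four groups with one averaged color each. The sketch below works on RGB tuples; a real encoder for the tile driver would map the four colors back to palette indices.

```python
def luma(rgb):
    """Integer luminance approximation (weights sum to 256)."""
    r, g, b = rgb
    return 77 * r + 150 * g + 29 * b


def mean(values):
    return sum(values) / len(values)


def quantize_cell4(pixels):
    """pixels: list of (r, g, b) for one cell (e.g. 256 entries for 16x16).

    Returns (levels, colors): a 2-bit level per pixel and 4 cell colors.
    """
    lumas = [luma(p) for p in pixels]
    t_mid = mean(lumas)
    low = [y for y in lumas if y <= t_mid]
    high = [y for y in lumas if y > t_mid]
    t_lo = mean(low) if low else t_mid
    t_hi = mean(high) if high else t_mid
    levels = [(y > t_lo) + (y > t_mid) + (y > t_hi) for y in lumas]

    # Empty groups fall back to the average color of the whole cell.
    fallback = tuple(round(mean([p[k] for p in pixels])) for k in range(3))
    colors = []
    for level in range(4):
        members = [p for p, lv in zip(pixels, levels) if lv == level]
        if members:
            colors.append(tuple(round(mean([p[k] for p in members]))
                                for k in range(3)))
        else:
            colors.append(fallback)
    return levels, colors
```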
Here's the VGA code, if you want to see for yourself.
I'm doing this with PropTool and SimpleP2++ board, but should also work with Eval board type setup with A/V adapter on basepin 8.
(and usb mouse on basepin 16, although not really useful here...)
Here's a test of graphics. Maybe an easy test, but it works.
Have to try something like a PowerPoint slide...
Looks like can fit two frames of 720x544 CCC4 compressed video in HUB ram.
Would need the eMMC to play at 30 fps though...
Looks like FSRW can do 2.4 MB/s with uSD.
Attached are example 720x544 frames from Sintel that take 182 kB of HUB RAM per frame.
So, that'd be around 13 fps with no audio...
Would be interesting to see how this compression would affect video quality...
Have you seen the cinepak player I wrote? That sort of did 30 FPS on 640x480 I think. I never worked much further on that, but if instead of decompressing directly into a usable RGB framebuffer (that ends up being massive for 32 bit mode and eats up all the bandwidth to external memory), one modified the video driver to handle YUV 4:2:0 (which is only 12bpp) directly, I think that would make it a lot better. Though you'd still need an external buffer, can't fit a full frame of that into hub memory.
I actually tried my hand at writing a custom cinepak encoder (because the one in ffmpeg is mildly ass and makes everything green-tinted and blocky, but I think has better VQ), didn't properly finish it much, but I did run Sintel through it once, though only at 1024x432 and with reduced quality level 4. The entire video for that is 1150809374 bytes (as AVI without audio or index). I think there's on the order of 22000 frames (due to a confluence of dumb issues I can't easily check it right now), so around 52k per frame. That sounds correct, anyways.
Here's a similar frame pulled from that:
Whether the image quality here is better than the color cells is surprisingly questionable. I guess the color is better but the detail is worse. Though it might be more fair at the same resolution.
@Wuerfel_21 That might be better, I don't know. Just kind of experimenting here...
One very nice feature here is that no decoding is required. Just load the image into the buffers and you're done.
I'm now trying some more challenging images from here:
https://imagecompression.info/test_images/
The first one is attached before and after compression.
Not too bad, but there are a couple things if you look close enough...
Can just squeeze a 1920x800 CC4 image into HUB RAM.
Looks fairly good...
Seems to be doing pretty well in these test images. Might be done with that..
What I was getting at is that the color cell method seems to work really well for what it is. Maybe it gets bad with motion though.
Though the decoding for cinepak is really just a bunch of copying pixels around:
I guess you could do same "keep from previous frame" compression, but that doesn't always do much. Could probably do VQ on 4x4 sub-cells of indices and rice-code difference of adjacent cell colors or something.
@Wuerfel_21 Found your cinepak thread: https://forums.parallax.com/discussion/175112/cinepak-video-player-proof-of-concept
It might be interesting to compare what this can do compared to that...
I'm not actually sure this won't have some issues with video though.
Right now, you can't really notice the 16x16 cells. But, that might change if video... Or, maybe not...
I'm thinking the cell size could be reduced to 8x8 or 8x4 or so, if needed...
The need not to decompress is a real advantage though.
It's basically just the uSD doing the work.
And, don't think you need PSRAM to do 640x480.
The 2bpp image data is another huge benefit...
At 640x480, one needs a constant size buffer of 81kB to hold the image data.
If uSD is doing 2.4 MB/s, can do 29.6 fps. Maybe I'd drop that to 24 fps and add audio...
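The frame-rate estimates are just throughput divided by frame size. A quick sketch, assuming 1 MB = 10^6 bytes and 1 kB = 1000 bytes (the posts don't say which units are meant) and using hypothetical helper names:

```python
def max_fps(bytes_per_second, frame_bytes):
    """Upper bound on frame rate when every frame is read in full."""
    return bytes_per_second / frame_bytes


def spare_bandwidth(bytes_per_second, frame_bytes, fps):
    """Bytes per second left over (e.g. for audio) at a chosen frame rate."""
    return bytes_per_second - frame_bytes * fps


SD_RATE = 2.4e6  # uSD read rate quoted above for FSRW

print(round(max_fps(SD_RATE, 81e3), 1))    # 640x480, 81 kB per frame
print(round(max_fps(SD_RATE, 182e3), 1))   # 720x544, 182 kB per frame
print(spare_bandwidth(SD_RATE, 81e3, 24))  # left for audio at 24 fps
```

Dropping 640x480 to 24 fps would leave roughly 456 kB/s of the 2.4 MB/s for audio, under the same unit assumptions.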
Also, the encoder is dirt simple. Starting from and using the 256 color image palette helps a lot for both speed and size.
Do need a good program to convert video frames to 256 color though. Irfanview is not bad but Photoshop is better.
But one real issue is that the 1080p tile driver code is crazy complex and I'm not sure I can dial it down to 720p or 480p anymore.
I tried for about an hour to make it do 720p, but it's not working at all...
@SaucySoliton In the 1080p tile driver, the "ColorDriver" cog does what you describe. It shares LUT with the VGA driver...
Ah, that's why the dithering goes so hard. Generating a color palette is basically the same thing as the cinepak codebook generation, i.e. an NP-hard problem. The difference is 3 dimensions (R, G, B) vs 6 dimensions (4x Y, U, V). What I used for the latter is "fast pairwise nearest neighbor" (mildly complex, attached a PDF on that) combined with some iterations of the more common LBG algorithm (set all palette entries to the average of all pixels using them). There are probably smarter ways of doing it though. The big trouble with video is that it will look bad if you change the palette every frame, because everything is constantly shifting colors. With cinepak this problem (very noticeably) only happens when you force a key frame, because it will rather leave static areas as-is than constantly re-write them with new blocks.
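For reference, one LBG iteration on an RGB palette looks roughly like this. It is a bare-bones sketch in plain Python with made-up function names, not the encoder's actual code, and it leaves out the pairwise-nearest-neighbor initialization entirely.

```python
def nearest(color, palette):
    """Index of the palette entry closest to color (squared RGB distance)."""
    return min(range(len(palette)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(color, palette[i])))


def lbg_step(pixels, palette):
    """One LBG iteration: assign each pixel to its nearest palette entry,
    then move every entry to the average of the pixels assigned to it."""
    sums = [[0, 0, 0] for _ in palette]
    counts = [0] * len(palette)
    for p in pixels:
        i = nearest(p, palette)
        counts[i] += 1
        for k in range(3):
            sums[i][k] += p[k]
    new_palette = []
    for i, entry in enumerate(palette):
        if counts[i]:
            new_palette.append(tuple(s / counts[i] for s in sums[i]))
        else:
            new_palette.append(entry)  # unused entry stays where it was
    return new_palette
```

Repeating `lbg_step` until the palette stops moving gives a local optimum only, which is why a good starting palette matters.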
It could be that showing large, static images from hub ram is the best use of this…
Maybe yes, maybe no. It will introduce a temporal dithering, something like the effect in lft's P1 "Turbulence" demo. An experiment needed.
@Wuerfel_21 to dither or not to dither is interesting question. I may have been letting photoshop and irfanview dither when reducing color depth to 256 colors. Might help, might hurt, not sure…
The quality of the result is so good that I'm wondering if there is some underlying proportionality...
Going from 4x4 cell with 1 bpp seems to be nearly identical to 16x16 cell with 2 bpp...