Video Player (now with eMMC and 480p)
Rayman
I previously posted about a video player from uSD using FSRW: https://forums.parallax.com/discussion/171570
Just hacked that to use the new FSRW for eMMC. eMMC has an 8-bit data bus, so it's much faster.
Can now do 480p widescreen video at 30 fps and 16bpp. This looks way, way better than that QVGA in 8-bit indexed bitmap.
Also demonstrates using a single buffer for video. Don't have a choice, as each image is 342 kB and two of them won't fit in the 512 kB of hub RAM. It can actually go at 60 fps (did that by accident).
Video has to be widescreen though because a full frame image won't fit in RAM. (Or, maybe there's a way around that?)
File size is ginormous. I could only put half the movie into one file because 5 minutes of this hits the 4 GB file size limit of FAT32.
Anyway, here's a video of it working. Somehow I managed to get the audio out of sync. I'm not sure how that is even possible though... Have to look into that sometime...
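The RAM and file-size limits above come down to simple arithmetic. This is a minimal sketch, assuming the frames are 640 pixels wide (so a 342 kB frame at 16bpp is about 267 rows) and using the P2's 512 kB of hub RAM. `frame_bytes` and `seconds_per_file` are hypothetical helpers, and the real duration per file also depends on the audio format and any per-frame overhead, neither of which is given in the post.

```python
# Storage arithmetic for uncompressed 16bpp frames.
# Assumption (not stated in the post): frames are 640 pixels wide.
FAT32_MAX_FILE = 2**32 - 1     # largest file FAT32 can hold, in bytes
P2_HUB_RAM = 512 * 1024        # P2 hub RAM, in bytes

def frame_bytes(width, height, bpp):
    """Size of one uncompressed frame in bytes."""
    return width * height * bpp // 8

def seconds_per_file(frame_size, fps, audio_bytes_per_s=0):
    """Seconds of video (plus optional audio) that fit in one FAT32 file."""
    return FAT32_MAX_FILE / (frame_size * fps + audio_bytes_per_s)

full = frame_bytes(640, 480, 16)   # full 640x480 frame
wide = frame_bytes(640, 267, 16)   # letterboxed widescreen frame, ~342 kB

print(full > P2_HUB_RAM, wide < P2_HUB_RAM)   # True True
print(seconds_per_file(wide, 30), "seconds per file, video only")
```

Passing an `audio_bytes_per_s` value for the actual soundtrack format shortens the duration that fits under the 4 GB limit.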
Comments
Input reads 6 bytes (4 Y, 1 U, 1 V) per four pixels, presumably in this sequence:
V Y0 Y1 U Y2 Y3
Output is 32 bits x 4 pixels, something like this:
Y0:U:V:0 Y1:U:V:0 Y2:U:V:0 Y3:U:V:0
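A minimal Python model of that expansion could serve as a reference when checking a PASM2 version. It assumes the V Y0 Y1 U Y2 Y3 byte order guessed above and places Y in bits 31..24, U in 23..16 and V in 15..8 of each output long. Both the bit layout and the names `unpack_group`/`unpack_line` are assumptions for illustration.

```python
def unpack_group(group):
    """Expand one 6-byte group (V Y0 Y1 U Y2 Y3) into four Y:U:V:0 longs.

    Assumed layout: Y in bits 31..24, U in 23..16, V in 15..8, low byte zero.
    """
    if len(group) != 6:
        raise ValueError("expected 6 bytes per 4-pixel group")
    v, y0, y1, u, y2, y3 = group
    # All four pixels share the group's single U and V sample.
    return [(y << 24) | (u << 16) | (v << 8) for y in (y0, y1, y2, y3)]

def unpack_line(data):
    """Expand a whole scan line; len(data) must be a multiple of 6."""
    out = []
    for i in range(0, len(data), 6):
        out.extend(unpack_group(data[i:i + 6]))
    return out

pixels = unpack_group(bytes([0x30, 0x10, 0x20, 0x40, 0x50, 0x60]))
print([hex(p) for p in pixels])
# ['0x10403000', '0x20403000', '0x50403000', '0x60403000']
```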
Here's a PASM2 snippet that may do this work: 28 clocks for 4 pixels including a REP, plus 4 write clocks for a setq2 burst later, making 8 clocks per pixel. That is ~21us per 640-pixel-wide scan line at 250MHz, so one COG could do this decompression.
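The timing claim can be checked with a few lines of arithmetic. This sketch only restates the figures given (28 + 4 clocks per 4 pixels, 640 pixels, 250MHz) and adds an assumed worst case of decoding all 480 lines within one 60Hz frame period; it says nothing about the snippet itself.

```python
CLOCKS_PER_4_PIXELS = 28 + 4   # decode incl. REP, plus setq2 burst write
SYSCLK_HZ = 250e6
WIDTH, HEIGHT = 640, 480

clocks_per_pixel = CLOCKS_PER_4_PIXELS / 4
line_us = WIDTH * clocks_per_pixel / SYSCLK_HZ * 1e6
frame_ms = line_us * HEIGHT / 1000   # decode every line of a full VGA frame
budget_ms = 1000 / 60                # assumed worst case: one 60Hz frame period

print(clocks_per_pixel, round(line_us, 2), frame_ms < budget_ms)   # 8.0 20.48 True
```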
Yes, but some byte shuffling is needed to get 4 U/V values to interpolate into a long.
But then it's easy to interpolate any ratio you want.
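Once the U/V values sit in separate byte lanes of a long, interpolating at any ratio is a per-lane weighted average. A sketch, assuming an 8-bit weight out of 256 and the Y:U:V:0 lane layout (with Y left at zero); `uv_long` and `blend_lanes` are hypothetical names and are not meant to mirror any particular P2 instruction.

```python
def uv_long(u, v):
    """Put U and V in the byte lanes they use in a Y:U:V:0 pixel long (Y = 0)."""
    return (u << 16) | (v << 8)

def blend_lanes(a, b, weight):
    """Blend each byte lane of 32-bit longs a and b.

    weight runs from 0 (all a) to 256 (all b), so any ratio in 1/256 steps works.
    """
    result = 0
    for shift in (24, 16, 8, 0):
        la = (a >> shift) & 0xFF
        lb = (b >> shift) & 0xFF
        result |= ((la * (256 - weight) + lb * weight) >> 8) << shift
    return result

cur, nxt = uv_long(0x40, 0x80), uv_long(0x80, 0x40)
print(hex(blend_lanes(cur, nxt, 64)))   # 0x507000, a quarter of the way to nxt
```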
Maybe a simpler single interpolation operation could be done to get it to fit within two COGs (or even one?): just 2 replicated pixels and one interpolated pixel, instead of 3 replicated pixels every four pixels or the more intensive 3 interpolated pixels. That might still give a reasonable effect.
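The three chroma options (replicate, fully interpolate, or the cheaper mix) can be compared in a small model. This uses one plausible reading of "2 replicated and one interpolated": the first three pixels of a group reuse its U/V and the last one takes the midpoint towards the next group. `chroma_for_group` and its mode names are hypothetical.

```python
def chroma_for_group(cur, nxt, mode):
    """Return the (U, V) pair used for each of the 4 pixels in one group.

    cur and nxt are the (U, V) samples of this group and of the next group.
    """
    def mix(k):
        # k/4 of the way from cur towards nxt; the integer offset is floored
        return tuple(c + (n - c) * k // 4 for c, n in zip(cur, nxt))

    if mode == "replicate":      # 3 copies: cheapest, blockiest
        return [cur] * 4
    if mode == "interpolate":    # 3 blends: smoothest, most work
        return [mix(k) for k in range(4)]
    if mode == "mixed":          # 2 copies + 1 blend at the group boundary
        return [cur, cur, cur, mix(2)]
    raise ValueError("unknown mode: " + mode)

for mode in ("replicate", "interpolate", "mixed"):
    print(mode, chroma_for_group((100, 60), (60, 100), mode))
```

The mixed mode costs one blend per group instead of three, while still softening the step in chroma at each group boundary.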
Rough pseudo-asm cycle count for 8 pixels:
Read compressed data: 4*2 = 8 cycles
Assemble pixels: 8*3*2 = 48 cycles
UV interpolation: (4+7)*3 = 33 cycles
writeout (assuming worst-case waitstates): 2*(10+3+2) = 30 cycles
Total: 8 +48 +33 + 30 = 119 cycles for 8 pixels = 14.875 cycles per pixel
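The same tally can be written out so the stage costs are easy to tweak. The stage names and cycle counts are copied from the estimate above; only the dictionary layout is new.

```python
# Cycle estimate for one pass that produces 8 pixels (figures from the post).
stages = {
    "read compressed data": 4 * 2,
    "assemble pixels": 8 * 3 * 2,
    "UV interpolation": (4 + 7) * 3,
    "write-out, worst-case wait states": 2 * (10 + 3 + 2),
}
PIXELS_PER_PASS = 8

total = sum(stages.values())
print(total, total / PIXELS_PER_PASS)   # 119 14.875
```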
Or am I missing something important?
If this blend method works, ~15 cycles per pixel will require 2 COGs for VGA resolution at 250MHz with 60fps source data. Replaying at 30Hz, perhaps it could be done in one COG, but that would need a suitable frame buffer in external memory if the colour depth is increased to 24 bpp (stored as 32-bit longs) and each frame is output twice at 60Hz. HyperRAM may suit this if we can get the entire frame written in time, which requires 640*360*4 * 30 bytes/s, or ~27MB/s. That should likely be doable even with sysclk/2 writes and a 252MHz P2.
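The COG count and HyperRAM figures follow from a couple of formulas. A minimal sketch, assuming the per-pixel work divides evenly between COGs with no synchronization overhead and (an assumption, not stated in the thread) that sysclk/2 writes move one byte per two clocks on an 8-bit HyperRAM bus; `cogs_needed` and `write_rate` are hypothetical helpers.

```python
import math

def cogs_needed(width, height, fps, cycles_per_pixel, sysclk_hz):
    """Whole COGs needed, assuming the work splits evenly with no sync overhead."""
    return math.ceil(width * height * fps * cycles_per_pixel / sysclk_hz)

def write_rate(width, height, bytes_per_pixel, fps):
    """Bytes per second needed to fill an external frame buffer."""
    return width * height * bytes_per_pixel * fps

# Assumption: sysclk/2 writes move one byte every two clocks on the 8-bit bus.
HYPERRAM_PEAK = 252e6 / 2

print(cogs_needed(640, 480, 60, 14.875, 250e6))   # 2 (60fps source)
print(cogs_needed(640, 480, 30, 14.875, 250e6))   # 1 (30fps replay)
rate = write_rate(640, 360, 4, 30)                 # 24bpp stored as 4-byte longs
print(rate / 1e6, "MB/s", rate < HYPERRAM_PEAK)    # 27.648 MB/s True
```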