Video Player (now with eMMC and 480p)
I previously posted about a video player from uSD using FSRW: https://forums.parallax.com/discussion/171570
Just hacked that to use the new FSRW for eMMC. eMMC has an 8-bit data bus, so it is much faster.
Can now do 480p widescreen video at 30 fps and 16bpp. This looks way, way better than that QVGA in 8-bit indexed bitmap.
Also demonstrates using a single buffer for video. Don't have a choice as each image is 342 kB. It can actually go at 60 fps (did that by accident).
Video has to be widescreen though because a full frame image won't fit in RAM. (Or, maybe there's a way around that?)
File size is ginormous. I could only put half the movie into one file because 5 minutes of this hits the 4 GB file size limit of FAT32.
Anyway, here's a video of it working. Somehow I managed to get the audio out of sync. I'm not sure how that is even possible though... Have to look into that sometime...
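The RAM constraint mentioned above can be checked with quick arithmetic. This is only a rough sketch in Python, assuming the P2's 512 KB of hub RAM, a 640x480 full frame, and treating the quoted 342 kB as 342,000 bytes (the exact frame dimensions aren't stated in the post):

```python
HUB_RAM_BYTES = 512 * 1024        # P2 hub RAM
BYTES_PER_PIXEL = 2               # 16bpp

full_frame = 640 * 480 * BYTES_PER_PIXEL   # assumed full-frame 480p image
wide_frame = 342_000                       # quoted per-image size (approximate)

print(full_frame, full_frame <= HUB_RAM_BYTES)          # full frame does not fit
print(wide_frame, wide_frame <= HUB_RAM_BYTES)          # widescreen frame fits
print(2 * wide_frame, 2 * wide_frame <= HUB_RAM_BYTES)  # no room for a second buffer
```

Under these assumptions a full frame needs 614,400 bytes, more than the hub holds, and two widescreen buffers would need 684,000 bytes, which is why only a single buffer is possible.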

Comments
Input reads 6 bytes (4 Y, U, V) per four pixels presumably in this sequence:
V Y0 Y1 U Y2 Y3
Outputs 32 bits x 4 pixels something like this:
Y0:U:V:0 Y1:U:V:0 Y2:U:V:0 Y3:U:V:0
Here's a PASM2 snippet that may do this work: 28 clocks for 4 pixels including a REP, plus 4 write clocks with a setq2 burst later, making 8 clocks per pixel. This is ~21µs per 640 pixel wide scan line at 250MHz, so one COG could do this decompression.
        rflong  pixel               ' reads U:Y1:Y0:V
        rfword  y2_3                ' reads 0:0:Y3:Y2
        movbyts pixel, #%%1302      ' reorder to Y0:U:V:Y1
        getbyte y1, pixel, #0       ' extract y1 before we lose it
        setbyte pixel, #0, #0       ' clear low byte, now in Y:U:V:0 format
        wrlut   pixel, ptra++       ' save pixel1
        setbyte pixel, y1, #3
        wrlut   pixel, ptra++       ' save pixel2
        setbyte pixel, y2_3, #3
        wrlut   pixel, ptra++       ' save pixel3
        shr     y2_3, #8
        setbyte pixel, y2_3, #3
        wrlut   pixel, ptra++       ' save pixel4
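For checking the byte shuffling off-target, here is a small Python reference model of the same unpacking. It is only a sketch under the assumptions stated in the comment above (input order V Y0 Y1 U Y2 Y3, output longs laid out as Y:U:V:0); the function name `unpack_group` and the sample byte values are made up for illustration.

```python
def unpack_group(six):
    """Expand one 6-byte group (V Y0 Y1 U Y2 Y3) into four Y:U:V:0 longs."""
    v, y0, y1, u, y2, y3 = six
    chroma = (u << 16) | (v << 8)          # shared U and V, low byte left at 0
    return [(y << 24) | chroma for y in (y0, y1, y2, y3)]

group = bytes([0x30, 0x10, 0x11, 0x20, 0x12, 0x13])   # V Y0 Y1 U Y2 Y3
for px in unpack_group(group):
    print(f"{px:08X}")

# Timing estimate from the comment: (28 + 4) clocks per 4 pixels at 250 MHz
clocks_per_pixel = (28 + 4) / 4
line_us = 640 * clocks_per_pixel / 250     # 250 clocks per microsecond
print(clocks_per_pixel, line_us)
```

The timing lines simply redo the arithmetic quoted above: 8 clocks per pixel and about 20.5 µs per 640-pixel line.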
Yes, but some byte shuffling is needed to get 4 U/V values to interpolate into a long.
But then it's easy to interpolate any ratio you want.
Maybe a simpler single interpolation operation could be done to get it to fit within two COGs (or even one?). So just 2 replicated pixels and one interpolated pixel every four pixels, instead of 3 replicated pixels or the more intensive 3 interpolated pixels. That might still give a reasonable effect.
Rough pseudo-asm
' decompress into 32bit UVYx
' assume FIFO is pointed at interleaved compressed data
' assume PTRA and PTRB are pointed to two scanline buffers
' should use REP, but IDK how that works RN
        rflong  rightuv             ' load first UV pair
        mov     iter,#VIDEO_WIDTH/4
:loop
        mov     leftuv,rightuv
        rflong  rightuv             ' even line's UVs are in bottom word
        rflong  eveny
        rflong  oddy
        ' even-line pixel 0
        rolword evenpx0,leftuv,#0
        rolbyte evenpx0,eveny,#0
        rolbyte evenpx0,#0,#0
        ' odd-line pixel 0
        rolword oddpx0,leftuv,#1
        rolbyte oddpx0,oddy,#0
        rolbyte oddpx0,#0,#0
        ' UVs for column 1
        mov     tempuv,leftuv
        setpiv  #64
        blnpix  tempuv,rightuv
        ' even-line pixel 1
        rolword evenpx1,tempuv,#0
        rolbyte evenpx1,eveny,#1
        rolbyte evenpx1,#0,#0
        ' odd-line pixel 1
        rolword oddpx1,tempuv,#1
        rolbyte oddpx1,oddy,#1
        rolbyte oddpx1,#0,#0
        ' UVs for column 2
        mov     tempuv,leftuv
        setpiv  #128
        blnpix  tempuv,rightuv
        ' even-line pixel 2
        rolword evenpx2,tempuv,#0
        rolbyte evenpx2,eveny,#2
        rolbyte evenpx2,#0,#0
        ' odd-line pixel 2
        rolword oddpx2,tempuv,#1
        rolbyte oddpx2,oddy,#2
        rolbyte oddpx2,#0,#0
        ' UVs for column 3
        mov     tempuv,leftuv
        setpiv  #192
        blnpix  tempuv,rightuv
        ' even-line pixel 3
        rolword evenpx3,tempuv,#0
        rolbyte evenpx3,eveny,#3
        rolbyte evenpx3,#0,#0
        ' odd-line pixel 3
        rolword oddpx3,tempuv,#1
        rolbyte oddpx3,oddy,#3
        rolbyte oddpx3,#0,#0
        ' writeout
        setq    #3
        wrlong  evenpx0,ptra++
        setq    #3
        wrlong  oddpx0,ptrb++
        djnz    iter,#:loop
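The intended chroma result for columns 0 to 3 (weights 0, 64, 128, 192 between the left and right UV pairs) can be modelled in Python. This sketch assumes a plain linear blend with the weight taken out of 256; the exact rounding of the real BLNPIX instruction may differ, so treat it as an illustration of the target values rather than a bit-exact model. The helper names and sample values are hypothetical.

```python
def blend_byte(left, right, weight):
    """Linear blend of two 8-bit values; weight 0 gives left, 256 gives right."""
    return (left * (256 - weight) + right * weight) >> 8

def chroma_columns(left_uv, right_uv):
    """Return (U, V) for the four columns between two chroma samples."""
    return [tuple(blend_byte(l, r, w) for l, r in zip(left_uv, right_uv))
            for w in (0, 64, 128, 192)]

# Hypothetical sample: left chroma (U=100, V=200), right chroma (U=140, V=120)
print(chroma_columns((100, 200), (140, 120)))
# [(100, 200), (110, 180), (120, 160), (130, 140)]
```

Column 0 reuses the left sample unchanged, matching the pseudo-asm, and the other three columns step a quarter of the way toward the right sample each time.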
Read compressed data: 4*2 = 8 cycles
Assemble pixels: 8*3*2 = 48 cycles
UV interpolation: (4+7)*3 = 33 cycles
writeout (assuming worst-case waitstates): 2*(10+3+2) = 30 cycles
Total: 8 + 48 + 33 + 30 = 119 cycles for 8 pixels = 14.875 cycles per pixel
Or am I missing something important?
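The tally above is easy to re-add in Python. The per-stage figures are taken directly from the list; nothing here verifies the individual instruction timings, only the sum.

```python
# Cycle estimates per loop iteration (8 pixels: 4 on the even line, 4 on the odd line)
stages = {
    "read compressed data": 4 * 2,
    "assemble pixels": 8 * 3 * 2,
    "UV interpolation": (4 + 7) * 3,
    "writeout (worst-case waitstates)": 2 * (10 + 3 + 2),
}

total = sum(stages.values())
for name, cycles in stages.items():
    print(f"{name}: {cycles}")
print(total, total / 8)   # 119 14.875
```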
If this blend method works, ~15 cycles per pixel will require 2 COGs for VGA resolution at 250MHz with 60fps source data. Replaying 30fps material might be doable in one COG, but that would need a suitable frame buffer in external memory if the colour depth is increased to 24 bpp and each frame is output twice at 60Hz. HyperRAM may suit this if we can get the entire data written in time, which requires a 640*360*4*30 or ~27.6MB/s write rate. That should be doable even with sysclk/2 writes and a 252MHz P2.
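A back-of-envelope version of that estimate in Python. It assumes "VGA resolution" means 640x480, rounds the per-pixel cost to 15 cycles, and ignores every other overhead (FIFO setup, blanking, memory arbitration), so it only reproduces the rough reasoning above.

```python
import math

SYSCLK_HZ = 250_000_000
CYCLES_PER_PIXEL = 15            # rounded from 14.875

def cogs_needed(width, height, fps):
    """COGs required if each can spend SYSCLK_HZ cycles per second on pixels."""
    cycles_per_second = width * height * fps * CYCLES_PER_PIXEL
    return math.ceil(cycles_per_second / SYSCLK_HZ)

print(cogs_needed(640, 480, 60))   # 2
print(cogs_needed(640, 480, 30))   # 1

# External frame buffer write rate: 640x360 frame, 4 bytes per pixel, 30 frames per second
write_rate = 640 * 360 * 4 * 30
print(write_rate / 1e6)            # 27.648 (MB/s)
```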