Video Player

Rayman · 2020-05-09 01:01

It works!

Source code attached.

More info, and the program to create the video files, is here: http://www.rayslogic.com/Propeller2/P2Video/P2Video.html

rogloh · 2020-05-09 01:04

Very cool. What resolution & frame rate are you targeting?

Rayman · 2020-05-09 01:05

Very low on both, but maybe just OK.

Peter Jakacki · 2020-05-09 03:44

Hi Rayman, how have you got that video encoded?

When I did my video player in TAQOZ, I found it easier to grab frames from the original video (maybe initially as PNGs) and then convert them on the PC to 320x240 BMPs, which results in a 76kB frame. Then I joined those frames together into one file with a .BMV extension on the SD card.
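As a rough check on that 76kB figure, here is a sketch (in C, not TAQOZ) of the size of an uncompressed 8-bit BMP frame and where frame n would start in a BMV file. It assumes a 40-byte BITMAPINFOHEADER and that a BMV file is nothing more than equal-sized BMP files concatenated back to back; neither detail is spelled out in the post, and some tools write larger headers (the MCQUEEN.BMP dump further down uses a 108-byte one). The function names are made up for illustration.

```c
#include <stdint.h>

/* Bytes per row of an 8bpp BMP: one byte per pixel, padded to a multiple of 4. */
static uint32_t bmp8_row_bytes(uint32_t width)
{
    return (width + 3u) & ~3u;
}

/* Whole-file size of an uncompressed 8bpp BMP: 14-byte file header,
   40-byte BITMAPINFOHEADER (assumed), 256 x 4-byte palette, then the rows. */
static uint32_t bmp8_file_size(uint32_t width, uint32_t height)
{
    return 14u + 40u + 256u * 4u + bmp8_row_bytes(width) * height;
}

/* Byte offset of frame n, ASSUMING a .BMV is just equal-sized BMP files back to back. */
static uint64_t bmv_frame_offset(uint32_t n, uint32_t width, uint32_t height)
{
    return (uint64_t)n * bmp8_file_size(width, height);
}
```

For 320x240 the pixel data is 76,800 bytes, and the headers plus palette add 1,078, giving 77,878 bytes per frame.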

That way I can read them in fast enough into a separate buffer, flip them, and even double up the pixels and lines for full 640x480 at 30fps. There was even enough time to show both the doubled image and the original, with the original 320x240 window positioned anywhere over the larger 640x480 screen. BTW, my SD routines can read at around 3MB/s at 300MHz.
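The pixel and line doubling can be sketched in C as below. This only illustrates nearest-neighbour 2x scaling and is not Peter's TAQOZ code; it assumes an 8bpp top-down source frame and a screen buffer `scr_w` bytes wide, and `double_pixels` is a hypothetical name. Each source line is expanded once and then copied to produce the second line, which is cheaper than expanding it twice.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Nearest-neighbour 2x upscale of an 8bpp top-down frame (w x h) into an
   8bpp screen buffer scr_w bytes wide, starting at its top-left corner. */
static void double_pixels(uint8_t *scr, size_t scr_w,
                          const uint8_t *src, size_t w, size_t h)
{
    for (size_t y = 0; y < h; y++) {
        uint8_t *line = scr + 2 * y * scr_w;
        for (size_t x = 0; x < w; x++) {
            uint8_t px = src[y * w + x];
            line[2 * x] = px;       /* double each pixel horizontally */
            line[2 * x + 1] = px;
        }
        memcpy(line + scr_w, line, 2 * w);   /* then repeat the whole line */
    }
}
```

At 30fps a 320x240x8 source needs 76,800 x 30 = 2.3MB/s, which fits within the ~3MB/s SD read rate mentioned in the post.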

Rayman · 2020-05-09 10:34

3MB/s? Wow, that’s 3x the best I’ve got... I’d like to see how you do that.

Peter Jakacki · 2020-05-09 11:33

I was more curious as to what's "under the hood" with your stuff.

However, I just had a quick dabble in TAQOZ and measured some speeds.
First off, I run just the .SPEEDS part of the full .DISK report.

TAQOZ# .SPEEDS --- 

                 *** SPEEDS *** 
    LATENCY......................... 503us,250us,250us,268us,250us,307us,312us,296us,
    SECTOR.......................... 647us,407us,405us,423us,406us,463us,463us,451us,
    BLOCKS.......................... 3,148kB/s @300MHz ok
TAQOZ#

But is that really over 3MB/s? So I find a BMP I can open and check its header first:

TAQOZ# DIRW --- 
ROOTDIR      EFM8UB3 .ROM P2D2    .BIN P2D2A   .BIN P2USB   .BIN UB200108.BIN 
UB200309.BIN USB1912A.BIN BEACH2  .BMP BUZZ    .BMP DRAGON  .BMP TIGER   .GIF MARIO   .BMP BEACH   .BMP 
TIGER   .PNG TIGER1  .BMP TIGER   .JPG BIRD    .BMP FACE    .BMP Am.a.r.i..o. MARIO   .PNG TIGER   .BMP 
FIRE    .BMP SPIDEY  .PNG MCQUEEN .BMP P2D2A   .BMP EYEGOD  .BMP LMMS    .BMP SPIDEY  .GIF SUNSET  .BMP 
SPIDEY  .BMP EFMUSB  .ROM TAQOZ   .ROM _BOOT_P2.BIX   ok
TAQOZ# FOPEN MCQUEEN.BMP Opened @57750 ---  ok
TAQOZ# 0 $80 SD DUMP --- 
00000: 42 4D 7A B4  04 00 00 00  00 00 7A 04  00 00 6C 00     'BMz.......z...l.'
00010: 00 00 80 02  00 00 E0 01  00 00 01 00  08 00 00 00     '................'
00020: 00 00 00 B0  04 00 13 0B  00 00 13 0B  00 00 00 01     '................'
00030: 00 00 00 01  00 00 42 47  52 73 00 00  00 00 00 00     '......BGRs......'
00040: 00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00     '................'
00050: 00 00 00 00  00 00 00 00  00 00 00 00  00 00 00 00     '................'
00060: 00 00 00 00  00 00 00 00  00 00 02 00  00 00 00 00     '................'
00070: 00 00 00 00  00 00 00 00  00 00 03 04  0D 00 04 08     '................' ok
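Decoding that dump by hand: the fields are little-endian, so `7A B4 04 00` at offset 2 is a file size of 308,346 bytes, the pixel data starts at offset 1,146 (`7A 04`), the DIB header is 108 bytes (a BITMAPV4HEADER, hence the `BGRs` colour-space tag at 0x36), and the image is 640x480 at 8 bits per pixel with 256 palette entries. The palette therefore sits at 14 + 108 = 122 and runs for 1,024 bytes up to 1,146. The positive height means the rows are stored bottom-up, which is why a vertical flip is needed later. A minimal C sketch of reading those fields (`bmp_info` and `bmp_parse` are illustrative names, not TAQOZ words):

```c
#include <stdint.h>
#include <stddef.h>

/* BMP headers are little-endian. */
static uint16_t le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

typedef struct {
    uint32_t file_size, data_offset, dib_size;
    int32_t  width, height;    /* height > 0 means rows are stored bottom-up */
    uint16_t bpp;
    uint32_t compression, colors_used;
} bmp_info;

/* Parse the fields a viewer needs; returns 0 on success, -1 if not a BMP. */
static int bmp_parse(const uint8_t *h, size_t len, bmp_info *out)
{
    if (len < 54 || h[0] != 'B' || h[1] != 'M')
        return -1;
    out->file_size   = le32(h + 2);
    out->data_offset = le32(h + 10);
    out->dib_size    = le32(h + 14);
    out->width       = (int32_t)le32(h + 18);
    out->height      = (int32_t)le32(h + 22);
    out->bpp         = le16(h + 28);
    out->compression = le32(h + 30);
    out->colors_used = le32(h + 46);
    return 0;
}
```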

Yep, that certainly looks like a bitmap file. Now I'll time opening that file. Normally I would just type VIEW MCQUEEN, but to time it I break it down this way:

TAQOZ# LAP " MCQUEEN.BMP" VIEW$ LAP .LAP --- 36,415,485 cycles= 121,384,950ns @300MHz ok

That's great, what was the file size?

TAQOZ# FSIZE@ . --- 308346  ok

I'll use Forth to calculate the average transfer rate over the time it took:

TAQOZ# 308346 1000000 121384 */ . --- 2540252  ok
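Forth's `*/` multiplies into a double-width intermediate before dividing, so `308346 1000000 121384 */` scales bytes per microsecond up to bytes per second without overflowing. The same arithmetic in C, using 64-bit integers for the intermediate, plus a variant that works straight from a LAP cycle count at a given system clock (both helper names are hypothetical):

```c
#include <stdint.h>

/* Scaled multiply-divide like Forth's: (bytes * 1,000,000) / micros,
   with a 64-bit intermediate so the product cannot overflow. */
static uint64_t bytes_per_sec(uint64_t bytes, uint64_t micros)
{
    return bytes * 1000000u / micros;
}

/* Same figure from a LAP cycle count and the system clock frequency. */
static uint64_t bytes_per_sec_cycles(uint64_t bytes, uint64_t cycles, uint64_t clk_hz)
{
    return bytes * clk_hz / cycles;
}
```

Fed the SDRDS timing below (30,001,393 cycles at 300MHz), the cycle-based version gives roughly 3.08MB/s.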

Only 2.5MB/s, but that included finding and opening the file, etc. Once a video file is open I only need to read in one frame at a time, so I'll time just that part using the full 640x480x8 BMP file. The SDRDS routine needs the starting sector (here the open file's starting sector, which @FILE returns), a destination in memory (I chose the BMP area, which has room for the header followed by the palette and bitmap data), and of course the number of bytes. Let's time that operation and calculate the effective read speed.

TAQOZ# @FILE BMP 308346 LAP SDRDS LAP .LAP --- 30,001,393 cycles= 100,004,643ns @300MHz ok
TAQOZ# 308346 10 * . --- 3083460  ok

So that lines up nicely with what .SPEEDS reports. Viewing a BMP actually involves reading just the header first to determine the offset to the palette and bitmap data, and then reading the data into the BMP memory at an offset, so that the start of the palette in the file ends up at the fixed start of the palette in memory. After that I do a quick vertical flip, which takes about 10ms. (BMV files don't need this alignment step.)

TAQOZ# LAP VFLIP LAP .LAP --- 2,958,553 cycles= 9,861,843ns @300MHz ok
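The alignment step can be expressed as a small calculation. Since SDRDS reads the file from its first sector, the whole file is loaded at an address chosen so that the palette, which starts 14 + DIB-header-size bytes into the file, lands on the fixed palette address. A C sketch, where `palette_addr` stands in for wherever the TAQOZ BMP area keeps its palette (the value 0x10000 in the check is made up):

```c
#include <stdint.h>

/* In an uncompressed 8bpp BMP the palette follows the 14-byte file header and
   the DIB header, i.e. it starts at file offset 14 + dib_size. Loading the
   whole file at this address puts the palette at palette_addr. The caller must
   leave 14 + dib_size bytes of room below palette_addr. */
static uint32_t bmp_load_address(uint32_t palette_addr, uint32_t dib_size)
{
    return palette_addr - (14u + dib_size);
}

/* The pixel data then starts data_offset bytes after the load address. */
static uint32_t bmp_pixels_address(uint32_t load_addr, uint32_t data_offset)
{
    return load_addr + data_offset;
}
```

With the MCQUEEN.BMP header (108-byte DIB header, pixel data at 1,146), the pixels then start exactly 1,024 bytes after the palette, straight after the 256 four-byte entries. This is why the BMP area needs room for the header in front of the palette.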

The BMV video just uses lots of sequential 320x240 BMP frames.

So, how do you do it?

Rayman · 2020-05-09 12:34

@"Peter Jakacki" Is your SD card reading code posted somewhere?
I’d like to see how you do that...

Rayman · 2020-05-09 12:36

I’m also doing BMP files, BTW...

3MB/s would make it much nicer...

Peter Jakacki · 2020-05-09 12:48

Essentially, the higher-level TAQOZ code makes use of this SPI block-read loop in cog memory.

' Read bytes in from SPI to memory
' SPI>BUF ( dst cnt -- sum ) (46,121 cycles)
SPIRX           wrfast  #0,b
                mov     b,#0
.L0             rep     #5,#8                   ' 8 bits
                outnot  sck                     ' clock (low high or low high)
                rcl     r1,#1                   ' shift in msb first (first dummy zero)
                outnot  sck
                nop
                testp   miso wc                 ' read data from card

                rcl     r1,#1                   ' last bit
                wfbyte  r1
                djnz    a,#.L0
                jmp     #DROP

Just retesting that speed.

TAQOZ# BUFFERS 512 LAP SPIRX LAP .LAP --- 46,121 cycles= 153,736ns @300MHz ok
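For anyone porting this to C, the loop above boils down to: toggle SCK twice per bit, sample MISO after the second toggle, and shift each bit in MSB first, eight bits per byte. The retest works out to 512 bytes in about 153.7µs (roughly 3.3MB/s), or about 90 clocks per byte. The sketch below mirrors that structure with hypothetical pin hooks (`spi_pins`) and a host-side mock card so it can be run off-target; it is not a P2 driver and ignores timing entirely.

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical pin hooks: on real hardware these would drive SCK and sample MISO. */
typedef struct {
    void (*sck_toggle)(void *ctx);
    int  (*miso_read)(void *ctx);
    void *ctx;
} spi_pins;

/* Read 'count' bytes MSB first: two SCK toggles per bit, then sample MISO,
   then shift the bit into the accumulator (as the PASM RCL does). */
static void spi_read_block(const spi_pins *p, uint8_t *dst, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        uint32_t acc = 0;
        for (int bit = 0; bit < 8; bit++) {
            p->sck_toggle(p->ctx);
            p->sck_toggle(p->ctx);
            acc = (acc << 1) | (uint32_t)(p->miso_read(p->ctx) & 1);
        }
        dst[i] = (uint8_t)acc;   /* like WFBYTE: keep the low 8 bits */
    }
}

/* Host-side mock card that shifts out 'data' MSB first on MISO. */
typedef struct {
    const uint8_t *data;
    size_t bitpos;
    int sck;
} mock_card;

static void mock_toggle(void *ctx)
{
    mock_card *m = ctx;
    m->sck ^= 1;
}

static int mock_miso(void *ctx)
{
    mock_card *m = ctx;
    int bit = (m->data[m->bitpos / 8] >> (7 - m->bitpos % 8)) & 1;
    m->bitpos++;
    return bit;
}
```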

Rayman · 2020-05-09 13:50

Thanks. I’ll try that.

Rayman · 2020-05-09 14:14

OK, I’d have to use another cog to do the WFBYTE... I guess it’s worth it...

Wuerfel_21 · 2020-05-09 15:39

Peter Jakacki wrote: »

I do a quick vertical flip which takes about 10ms.
The BMV video just uses lots of sequential 320x240 bmp frames.

If you're using a sort-of-custom file format anyway, why not just flip the frames before or while encoding? 10ms seems pretty long to me.

Rayman · 2020-05-09 16:44

I used IrfanView to batch-process the images.
It can also do the vertical flip...

Peter Jakacki · 2020-05-10 00:17

Wuerfel_21 wrote: »

Peter Jakacki wrote: »

I do a quick vertical flip which takes about 10ms.
The BMV video just uses lots of sequential 320x240 bmp frames.

If you're using a sort-of-custom file format anyway, why not just flip the frames before or while encoding? 10ms seems pretty long to me.

That 10ms is for viewing any random 640x480 BMP image, but the BMV file doesn't need it, and besides, the BMV frames are much smaller at 320x240.
This is the one line that copies the BMV frame into screen memory bottom line first, after the frame is loaded and the palette is set up (it takes about 1ms):

0 DO DUP SCR I vwin W@ + cols W* + hwin W@ + 80 LMOVE XRES - LOOP DROP

Since the frame is not the same size as the display, it can't block-move the whole frame in one go anyway, so there is no advantage in having it flipped beforehand.
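Read as C, that Forth line does roughly the following. This is an interpretation: it assumes `vwin`/`hwin` are the window's row and column on screen, `cols` is the screen width in bytes, `80 LMOVE` moves 80 longs (320 bytes, one frame line), and the frame buffer holds a bottom-up BMP, so the copy starts from its last stored row. Indexing from the buffer start, rather than stepping a pointer back past it, keeps the C version free of out-of-range pointer arithmetic.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* C reading of the Forth loop: 'frame' is the start of a bottom-up 8bpp
   frame buffer (xres bytes per row), so screen line i of the window takes
   frame row (lines - 1 - i). */
static void blit_frame_flipped(uint8_t *scr, size_t cols,
                               size_t vwin, size_t hwin,
                               const uint8_t *frame, size_t xres, size_t lines)
{
    for (size_t i = 0; i < lines; i++) {
        uint8_t *dst = scr + (vwin + i) * cols + hwin;
        const uint8_t *src = frame + (lines - 1 - i) * xres;
        memcpy(dst, src, xres);   /* 80 LMOVE: 80 longs = 320 bytes when xres is 320 */
    }
}
```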

Rayman · 2020-05-14 20:55

It just occurred to me that while full VGA video is probably out of reach without help, a 640x480x8bpp slide show with audio should be easily doable.

You could, for example, learn about doing things with your P2 using your P2...

It could probably extend to 1080p using HyperRAM...

rogloh · 2020-05-15 00:38

I think full VGA video might be feasible as long as you can read the data from the source at over 7.2MB/s for, say, 640x480 at 24fps with an 8-bit palette per frame. We will likely need to use full SD mode instead of SPI mode to stream at that speed off a fast SD card. We could probably do with an SD-mode driver for the P2 at some point...
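The 7.2MB/s figure can be checked with a little arithmetic: a 640x480 8bpp frame is 307,200 bytes, a 256-entry palette at 4 bytes per entry adds 1,024, and 24 frames per second brings it to about 7.4MB/s. As a C helper (hypothetical, and ignoring audio and any per-frame padding to 512-byte sectors):

```c
#include <stdint.h>

/* Bytes per second for uncompressed 8bpp video with a fresh 256-entry,
   4-byte-per-entry palette every frame (no audio, no sector padding). */
static uint64_t video_bytes_per_sec(uint32_t width, uint32_t height, uint32_t fps)
{
    const uint64_t palette_bytes = 256u * 4u;
    return ((uint64_t)width * height + palette_bytes) * fps;
}
```

At 30fps the same frames need 9,246,720 bytes per second.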

Rayman · 2020-05-15 00:54

I’m thinking the eMMC board I just bought will dramatically improve input speed.

rogloh · 2020-05-15 00:59

Yeah, it should. A 16GB eMMC module could also store over 30 minutes of video at the above rate. The 32GB HyperFlash should store an hour, and would probably take many more hours to write, LOL.
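Checking the storage estimate: taking 640x480 at 24fps with a 1,024-byte palette per frame (308,224 bytes per frame, about 7.4MB/s) and decimal gigabytes as marketed, 16GB lasts about 36 minutes and 32GB about 72. A trivial C helper, with the rate as an assumed constant:

```c
#include <stdint.h>

/* Assumed stream rate: (640x480 + 1,024-byte palette) per frame at 24fps. */
#define STREAM_BYTES_PER_SEC 7397376ull

/* Whole minutes of video that fit in 'capacity_bytes' at 'bytes_per_sec'. */
static uint64_t minutes_of_video(uint64_t capacity_bytes, uint64_t bytes_per_sec)
{
    return capacity_bytes / bytes_per_sec / 60u;
}
```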

Peter Jakacki · 2020-05-15 02:29

I can show a slideshow of 640x480x8 bitmaps at 10 frames per second. If I could get to 25 or 30 frames per second, that would be full-resolution video. I could shave a little off by preprocessing the BMP files so they don't need flipping, etc., but SD in SPI mode isn't fast enough, even at the full 50MHz SPI clock. At present I scale 320x240 up to 640x480, so maybe I could try an in-between resolution that scales easily. 640x360 has a 16:9 aspect ratio, so that is already a start. It's easy to buffer 320x240 and then write it out, but we don't really have enough memory to do this at higher resolutions.

Here's the challenge - FULL SCREEN VIDEO (from SD) IS IMPOSSIBLE!
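Those numbers follow from a simple ratio of sustained read rate to frame size. Taking the 3,083,460 B/s figure from the SDRDS timing earlier and ignoring palette, seek and flip overhead (so these are upper bounds), a hypothetical C helper gives 10fps for 640x480x8, 13fps for 640x360 and 40fps for 320x240. A full 30fps at 640x480 would need about 9.2MB/s.

```c
#include <stdint.h>

/* Upper bound on frames per second for 8bpp frames of width x height when the
   card sustains bytes_per_sec; palette, seek and flip time are ignored. */
static uint32_t max_fps(uint64_t bytes_per_sec, uint32_t width, uint32_t height)
{
    return (uint32_t)(bytes_per_sec / ((uint64_t)width * height));
}
```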

rogloh · 2020-05-15 02:59

Peter Jakacki wrote: »

Here's the challenge - FULL SCREEN VIDEO (from SD) IS IMPOSSIBLE!

I expect that will be possible at standard VGA resolution with SD-mode access to SD cards. There may also be some very lightweight compression techniques that could be used (with multiple COGs) to improve the resolution when reading from SD cards in SPI mode, if the data is preprocessed accordingly beforehand. For example, perhaps 4:2:2 expansion could be done on the fly with multiple COGs if the input format is already suited to it?
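To illustrate what 4:2:2 expansion involves: each pair of pixels shares one U and one V sample, so the stream carries 2 bytes per pixel instead of 3 for 24-bit RGB, and the decoder rebuilds two RGB pixels from every 4-byte group. A minimal C sketch assuming a YUYV byte order and full-range BT.601 coefficients scaled by 256 (both are choices made here, not something the thread specifies):

```c
#include <stdint.h>

static uint8_t clamp_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v);
}

/* Expand one 4:2:2 group (Y0 U Y1 V) into two RGB pixels (R G B R G B).
   Coefficients 1.402, 0.344, 0.714, 1.772 are scaled by 256. */
static void yuyv_to_rgb2(const uint8_t in[4], uint8_t out[6])
{
    int u = in[1] - 128, v = in[3] - 128;
    int dr = (359 * v) / 256;              /* shared chroma terms */
    int dg = (88 * u + 183 * v) / 256;
    int db = (454 * u) / 256;
    for (int i = 0; i < 2; i++) {
        int y = in[i * 2];                 /* Y0, then Y1 */
        out[i * 3 + 0] = clamp_u8(y + dr);
        out[i * 3 + 1] = clamp_u8(y - dg);
        out[i * 3 + 2] = clamp_u8(y + db);
    }
}
```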

rogloh · 2020-05-15 03:05

By the way, also related to video players, I was recently thinking about this video flipping operation and frame buffers in external memory. If your video playback code ever uses HyperRAM-based frame buffers, you can get the flip done for you automatically by using the graphics copy operation with a negative pitch setting. Nothing needs to be reversed in hub memory (unless it is a horizontal flip): the driver's graphics copy can do this step for you as it transfers the data, read from the SD card into hub RAM, over to the HyperRAM frame buffer. That may help somewhat.

Playback from external memory with double buffers may help eliminate tearing as well, and you could sync the buffer swap to a frame boundary. At VGA resolution you can construct one frame while displaying another, which you can't do in hub RAM.

Peter Jakacki · 2020-05-15 03:11

So the vertical flip entails copying from the bottom line in the file buffer to the top line on the screen, line by line, so the pixels in each line stay in the same order. Is that what the "graphics copy operation" can do?
If there isn't a separate buffer, it involves exchanging the top and bottom lines and moving inwards to the center (or vice versa).
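The in-place case (swap the top and bottom lines, then move inwards) looks like this in C. A one-line scratch buffer is used for the swap; it could also be done byte by byte with no scratch at all, just more slowly. `vflip_in_place` is an illustrative name, not the TAQOZ VFLIP word.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* In-place vertical flip: swap rows top/bottom and work towards the middle.
   'tmp' must hold one row (row_bytes). The middle row of an odd-height image
   stays where it is. */
static void vflip_in_place(uint8_t *img, size_t row_bytes, size_t rows, uint8_t *tmp)
{
    for (size_t top = 0, bot = rows ? rows - 1 : 0; top < bot; top++, bot--) {
        uint8_t *a = img + top * row_bytes;
        uint8_t *b = img + bot * row_bytes;
        memcpy(tmp, a, row_bytes);
        memcpy(a, b, row_bytes);
        memcpy(b, tmp, row_bytes);
    }
}
```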

While Secure Digital mode is licensed, there is a 4-bit SD mode that could be made use of, but do we have enough information on that for us to write a driver? I'd like to but that also means I have to have another SD card socket connected in 4-bit mode which I could do on my P2LAB.

rogloh · 2020-05-15 03:24

Yes, that is what is possible during a graphics copy. You can independently set the source pitch and destination pitch to positive or negative amounts, so you can copy scanlines forwards or backwards from your source data, though each scanline's data is still copied in increasing byte order. The pitch also means graphics memory doesn't have to be contiguous from one scanline to the next, which is useful for other things like (finite) scrolling.
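Those semantics can be modelled in a few lines of C. Each pitch is the signed distance from one scanline to the next, so giving the destination a negative pitch (and pointing it at the last line) writes the scanlines in reverse order while each line's bytes still go out in increasing order. This only models the behaviour described; `copy_rect` is not the HyperRAM driver's actual interface.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Copy 'rows' scanlines of 'width' bytes. 'dst' and 'src' point at the first
   scanline visited; each pitch is the signed step to the next scanline. A
   pitch larger than the width leaves gaps between lines. */
static void copy_rect(uint8_t *dst, ptrdiff_t dst_pitch,
                      const uint8_t *src, ptrdiff_t src_pitch,
                      size_t width, size_t rows)
{
    for (size_t r = 0; r < rows; r++)
        memcpy(dst + (ptrdiff_t)r * dst_pitch,
               src + (ptrdiff_t)r * src_pitch, width);
}
```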

rogloh · 2020-05-15 03:32

Peter Jakacki wrote: »

While Secure Digital mode is licensed, there is a 4-bit SD mode that could be made use of, but do we have enough information on that for us to write a driver? I'd like to but that also means I have to have another SD card socket connected in 4-bit mode which I could do on my P2LAB.

I was able to achieve SD-mode reads in my own dedicated FPGA engine implementation on the P1V using snippets of data sheets, code samples, scoped timing info, etc. that I dug up online, but it is hard to find all the required information in one place unless you sign up to become a member of the SD Association or whatever you need to do. You can get the basic command overview from the publicly available PDFs online. How easily it translates to P2 instructions and streamer commands, and how many COGs you may need, I'm not sure. It would still be useful to have some hardware around for people to experiment with, with all 4 DAT lines connected.

Peter Jakacki · 2020-05-15 03:36

rogloh wrote: »

I was able to achieve SD-mode reads in my own dedicated FPGA engine implementation on the P1V using snippets of data sheets, code samples, scoped timing info, etc. that I dug up online, but it is hard to find all the required information in one place unless you sign up to become a member of the SD Association or whatever you need to do. You can get the basic command overview from the publicly available PDFs online. How easily it translates to P2 instructions and streamer commands, and how many COGs you may need, I'm not sure. It would still be useful to have some hardware around for people to experiment with, with all 4 DAT lines connected.

Well, it's a reminder, while I finalize some new artwork, to have the extra full-size SD card slot connected using all its pins so we have the option of 4-bit mode. Of course I would arrange the 4 data bits in order, but probably from P48 up, since P32 has the HyperRAM.

rogloh · 2020-05-15 03:48

Not to derail the video player discussion here, Peter, but you may also wish to consider fitting a 22pF capacitor option on the HyperRAM clock line to support sysclk/1 writes. See (or continue) the current discussion in the HyperRAM driver thread if required.

Peter Jakacki · 2020-05-15 03:54

rogloh wrote: »

Not to derail the video player discussion here, Peter, but you may also wish to consider fitting a 22pF capacitor option on the HyperRAM clock line to support sysclk/1 writes. See (or continue) the current discussion in the HyperRAM driver thread if required.

OK, I won't say anything more about it here; I'll take a look at it on the other thread. I also happen to have my clock generator available, so I can generate the same clock with a controlled phase.

Rayman · 2020-05-15 17:08

I'm putting some info about the video into the first 512-byte sector...
Looks like this:

typedef struct P2V_HEADER {   // Some of this is just for future use. This plus the WAV and BMP headers will be dumped into the first 512-byte sector
    char           PIFF[4];               // PIFF header: magic identifier
    unsigned long  p2_videotype;          // Type of video (1=regular, 2=presentation)
    unsigned long  p2_framerate;          // Frame rate
    unsigned long  p2_numFrames;          // # of frames in video
    unsigned long  p2_width;              // Width of video
    unsigned long  p2_height;             // Height of video
    unsigned long  p2_bitsperpixel;       // bpp of video (has to be 8?)
    unsigned long  p2_NumOfChan;          // Number of channels: 1=mono, 2=stereo
    unsigned long  p2_SamplesPerSec;      // Sampling frequency in Hz
    unsigned long  p2_SamplesPerFrame;    // # audio samples in each frame; should be (samples/sec)/(frames/sec)
    unsigned long  p2_VideoBlocks;        // # of 512-byte sectors in each image
    unsigned long  p2_AudioBlocks;        // # of 512-byte sectors of audio between images
    unsigned long  p2_WavHeaderOffset;    // Offset to WAV header in this block
    unsigned long  p2_BmpHeaderOffset;    // Offset to BMP header in this block (just the first 54 bytes)
    unsigned long  p2_TitleOffset;        // Offset to title string
    unsigned long  p2_DescriptionOffset;  // Offset to description string
} p2v_hdr;
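Given that header, a player has to turn a frame number into a starting sector. The on-disk order isn't spelled out, so the C sketch below assumes sector 0 holds this header and each frame then occupies `p2_VideoBlocks` image sectors followed by `p2_AudioBlocks` audio sectors; `p2v_frame_sector` and `sectors_for` are hypothetical helpers. Rounding each part up to whole 512-byte sectors keeps every image aligned for sector reads.

```c
#include <stdint.h>

/* Whole 512-byte sectors needed to hold 'bytes' bytes (rounded up). */
static uint32_t sectors_for(uint32_t bytes)
{
    return (bytes + 511u) / 512u;
}

/* ASSUMED layout: sector 0 = header, then per frame p2_VideoBlocks image
   sectors followed by p2_AudioBlocks audio sectors. */
static uint32_t p2v_frame_sector(uint32_t frame, uint32_t video_blocks, uint32_t audio_blocks)
{
    return 1u + frame * (video_blocks + audio_blocks);
}
```

For example, a 320x240x8 image is exactly 150 sectors, and 2,205 stereo 16-bit samples per frame (22,050Hz at 10fps) is 8,820 bytes, which rounds up to 18 sectors.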

Rayman · 2020-05-15 17:23

Made an app for creating the video files...
Looks like this (see attached):

Wuerfel_21 · 2020-05-15 17:40

Regarding SD speed: 50 MHz is already out of spec for SPI mode, but it works because most cards support 50 MHz in SD mode and share the interface hardware between the SD bus and SPI. Most cards these days support UHS-I, which is just the SD bus at up to 208 MHz, but at 1.8V. Has anyone here tried 100 or 200 MHz on 3.3V SPI?
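For a sense of scale, the raw data rate is just the clock times the number of data lines divided by 8: one line in SPI mode, four in SD bus mode. A hypothetical C helper, ignoring command, CRC and block-gap overhead, so real throughput is lower:

```c
#include <stdint.h>

/* Raw bus rate in bytes per second: clock x data lines / 8 bits per byte.
   Command, CRC and inter-block overhead are ignored. */
static uint64_t raw_bytes_per_sec(uint64_t clock_hz, unsigned data_lines)
{
    return clock_hz * data_lines / 8u;
}
```

So 50MHz SPI tops out at 6.25MB/s, below the 7.2MB/s-plus figure mentioned earlier for 640x480 at 24fps, while 4-bit SD mode at the same 50MHz gives 25MB/s, and UHS-I at 208MHz gives 104MB/s.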

Regarding video compression: I think I've brought it up before, but back in the 90s, Cinepak video compression was used to play low-res, low-FPS videos off single-speed CD-ROMs on the double-digit-MHz CPUs of the day. I wonder if that scales to the P2 with high-speed SD.

Rayman · 2020-05-15 23:15

I found some code for JPEG decoding.
I wonder if the P2 can do that in real time...

Meanwhile, I’m having trouble getting the audio perfect... I think I need bigger buffers...
