WAITVID limitations?

pedward · 2013-04-04 22:43

I was playing with WAITVID, and the new instruction seems to work nothing like the instruction of the same name on the P1.

What I found in experimenting is that if you use the CLUT for pixel data, a whole scanline has to fit in the CLUT. That is, if the number of pixel clocks is greater than the amount of pixel data in the CLUT, WAITVID rolls over to the start of the CLUT and redisplays its contents.
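The wrap-around described above can be modelled with a few lines of Python. This is only a sketch of the observed behaviour, not of the hardware: the 256-long CLUT size is inferred from the numbers later in this thread (512 pixels at 2 pixels per entry), and the function name is made up for illustration.

```python
CLUT_LONGS = 256  # assumed CLUT size in longs, inferred from the thread


def clut_indices(start, count, size=CLUT_LONGS):
    """Return the CLUT index used for each of `count` consecutive fetches.

    Models the observation that the read pointer rolls over to the start
    of the CLUT once it runs past the end.
    """
    return [(start + i) % size for i in range(count)]


# Asking for 260 longs from a 256-long CLUT re-reads the first 4 entries.
indices = clut_indices(0, 260)
print(indices[254:260])
```

The last four fetches land on entries 0 to 3 again, which is the repeated image content seen on screen.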

I experimented with Bill's 256x192 code and made a 1280x1024 display, but the CLUT contents wrapped on every line.

Is it true that in the modes where the image data is fetched as R:G:B data from the CLUT, you only have enough memory for 1 scanline?

I'm using mode:

%1101 = STR15_RGB15 - 15-bit pixels are streamed from stack RAM starting at %AAAAAAAA
plus S[7:0], with S[31] selecting the starting word. The pixels
are used as 5:5:5 colors.

In that mode, each CLUT entry (one long) contains two 5:5:5 pixels, one per 16-bit word, with the high bit of each word being a don't-care (x):

x_RRRRR_GGGGG_BBBBB_x_RRRRR_GGGGG_BBBBB

With 256 CLUT entries at 2 pixels each, that allows you 512 pixels per scanline in the CLUT. I don't see how you can use this mode to display more than 512 pixels per line.
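To make the arithmetic concrete, here is a small Python sketch of the packing and the capacity calculation. The helper names are hypothetical, the 256-long CLUT size is inferred from the 512-pixel figure, and which 16-bit word is displayed first is an assumption (the mode description says S[31] selects the starting word).

```python
CLUT_LONGS = 256       # assumed CLUT size in longs
PIXELS_PER_LONG = 2    # two 5:5:5 pixels per long in this mode


def pack_rgb15(r, g, b):
    """Pack 5-bit R, G, B values into one 15-bit x_RRRRR_GGGGG_BBBBB word."""
    for c in (r, g, b):
        if not 0 <= c <= 31:
            raise ValueError("each colour component must be 0..31")
    return (r << 10) | (g << 5) | b


def pack_long(high_word, low_word):
    """Combine two 15-bit pixels into one 32-bit long (word order assumed)."""
    return ((high_word & 0x7FFF) << 16) | (low_word & 0x7FFF)


def pixels_per_clut():
    """Pixels that fit in one full CLUT in this mode."""
    return CLUT_LONGS * PIXELS_PER_LONG


def fills_needed(line_pixels):
    """Full or partial CLUT loads needed to cover one scanline."""
    return -(-line_pixels // pixels_per_clut())  # ceiling division


red, blue = pack_rgb15(31, 0, 0), pack_rgb15(0, 0, 31)
print(hex(pack_long(red, blue)), pixels_per_clut(), fills_needed(1280))
```

Under these assumptions a 1280-pixel line needs the CLUT contents replaced more than twice per scanline, which is why a single preloaded CLUT cannot cover it.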

Am I seeing this right, am I grokking it?

On another note, the new video hardware makes programming so simple and easy to do! It's so cool to be able to change 3 lines and go from 1x1 pixels to 16x16 pixels!

potatohead · 2013-04-05 00:39

You can race the beam and refill the buffer with new pixels once the old ones have been displayed. Use more than one waitvid for a high resolution line and reuse the pixel data space. Once it's seen, it's seen.

Yes, very nice. I really like that I can write for a higher resolution display, then pack it down into composite or component SD horizontally with few issues other than some detail loss on higher pixel clocks. Vertical is just limited, but still it's an excellent change.

Waitvids are double buffered too. That's nice, because parameters and such can all be changed with no worries about precise timing: just get one running, then kick off the next one. In moderate resolution cases, just fill the scan line buffer while the border waitvid is running! The active display one will pick up and display those pixels. If you want, you can break the scanline into a couple or a few waitvids to easily latch this, or generate mixed mode displays very easily. (That was very hard on P1.)

Oh, and we are at a fraction clock too. Honestly, it will blast pixels so fast at the full clock speed that simple bitmaps will populate their waitvid, or a whole scan line during the border time, maybe just a little more. Tons of time left in the video driver for other things, unlike P1 which basically could only manage a scan line buffer at any serious color depth or pixel clock.

pedward · 2013-04-05 00:56

So you're saying:

Do Sync
Load CLUT
Call WAITVID
Load CLUT
Call WAITVID
Loop

Is that right?

Roy Eltham · 2013-04-05 01:35

or
Do Sync
Load CLUT
Call WAITVID (for full line)
Load CLUT
Load CLUT
Load CLUT
Loop

Ariba · 2013-04-05 07:54

pedward wrote: »

So you're saying:

Do Sync
Load CLUT
Call WAITVID
Load CLUT
Call WAITVID
Loop

Is that right?

Yes, but you must make sure that you don't overwrite a long in the CLUT before WAITVID has picked it up.
I was able to make a video driver which uses only 1 long in the CLUT, by waiting long enough before I filled this same long with the next long from hub RAM. I implemented the delay with NOPs in the fill code and found the number of NOPs by experiment. This is a bit ugly, but I think you can also do it with POLVID.

Andy

potatohead · 2013-04-05 07:58

I haven't tried POLVID yet, but yes those kinds of things are exactly what I mean.

Ariba · 2013-04-05 08:15

potatohead wrote: »

...
Oh, and we are at a fraction clock too. Honestly, it will blast pixels so fast at the full clock speed that simple bitmaps will populate their waitvid, or a whole scan line during the border time, maybe just a little more. Tons of time left in the video driver for other things, unlike P1 which basically could only manage a scan line buffer at any serious color depth or pixel clock.

Even at only 60 MHz I can run the NTSC / PAL text driver, you know, in a 25% task. For example, I have tried code with 3 tasks: 25% video driver, 25% serial receive and 50% LMM execution loop. Now I have changed it to 50% video and 50% LMM and do serial in LMM, which is easier to handle.
But I think you will only do such things if you have just 1 cog, as on my DE0-Nano ;-)

Andy

pedward · 2013-04-05 12:03

Since a single WAITVID call will issue the number of clocks specified in the D parameter, and we have no sync point for knowing when the CLUT should be refilled, it seems the right answer is to call WAITVID with only the number of clocks the CLUT will satisfy, then reload the CLUT and call WAITVID again, then finally do the sync.

sync
repeat
Load lower half of CLUT
WAITVID lower half
Load upper half of CLUT
WAITVID upper half

That way you split the CLUT into 2 buffers, and you double buffer the WAITVID call by only issuing 128 CLUT entries in each call. This should allow you to buffer data from HUB fast enough to avoid stalling the video circuitry.
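The split-buffer loop above can be sketched as a schedule generator in Python. This only models the bookkeeping (which half is used for which pixels), not real timing or any actual P2 instruction; the 128-long half size comes from the post, while the 2 pixels per long and the function name are assumptions for the 15-bit mode discussed earlier.

```python
HALF_LONGS = 128      # CLUT entries issued per WAITVID, as in the post above
PIXELS_PER_LONG = 2   # assumption: the 15-bit 5:5:5 mode from earlier


def pingpong_schedule(line_pixels):
    """Return (half, first_pixel, pixel_count) steps for one scanline.

    half is 0 for the lower CLUT half and 1 for the upper half. Each step
    stands for "load that half, then issue WAITVID for that half".
    """
    chunk = HALF_LONGS * PIXELS_PER_LONG
    steps = []
    offset = 0
    half = 0
    while offset < line_pixels:
        count = min(chunk, line_pixels - offset)
        steps.append((half, offset, count))
        offset += count
        half ^= 1  # alternate between the lower and upper half
    return steps


for half, first, count in pingpong_schedule(1280):
    name = "lower" if half == 0 else "upper"
    print(f"load {name} half, WAITVID pixels {first}..{first + count - 1}")
```

Because consecutive steps always use different halves, the half being loaded is never the one currently being displayed, provided each load finishes within one half's display time.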

Of course, if there were some sync point, Roy's method of just stacking CLUT fills would work. That also works if you empirically determine the timing requirements.

potatohead · 2013-04-05 12:40

That is about what I did to latch things nice and simple. Not required though! Lots of tricks possible.

cgracey · 2013-04-05 23:29

To output more pixels than the CLUT holds, you must ping-pong between two buffer areas, filling one while displaying the other.

potatohead · 2013-04-06 10:09

Yes. Really, what I meant by tricks was all about how that is done. A "ping-pong fill" is simple, clean, etc... However, one can also choose to display one big buffer and just render into the part that's not being displayed, and that depends on empirical timing. Well, that can be calculated; it's just often easy to scale it once or twice to see where things play out, then go from there.
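The "can be calculated" part can be illustrated with a rough Python model. It assumes the video hardware reads one long every read_period time units and the fill code writes one long every write_period units starting at write_start; all of these names and the uniform-rate model itself are illustrative assumptions, not measured P2 timings.

```python
def race_is_safe(total_longs, size, read_period, write_start, write_period):
    """Check a race-the-beam fill of a `size`-long circular buffer.

    Longs 0..size-1 are assumed preloaded. Long k (k >= size) reuses the
    slot of long k - size, so it must be written after the old long has
    been read and before the new long is needed.
    """
    for k in range(size, total_longs):
        write_time = write_start + (k - size) * write_period
        old_read_time = (k - size) * read_period
        new_read_time = k * read_period
        if not (old_read_time < write_time < new_read_time):
            return False
    return True


# 1280 pixels at 2 pixels per long is 640 longs through a 256-long buffer.
print(race_is_safe(640, 256, read_period=4, write_start=1, write_period=4))
print(race_is_safe(640, 256, read_period=4, write_start=1, write_period=2))
```

In this model a writer that runs faster than the reader overtakes the beam and clobbers undisplayed data, while one that runs too slowly is eventually lapped. Setting size to 1 corresponds to the single-long driver described earlier in the thread, where the delay has to match the read rate closely.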

When doing simple bitmaps, there will be a lot of time between waitvids. Even at 60 MHz, there is a ton of time now compared to P1, particularly at high bit depth per pixel. Seems to me, then, that latching this buffer fill could also be done with some other event that consumes that time in discrete chunks...
