1080p tile driver (P1 style)

Rayman · 2020-01-27 19:40

Was thinking about how to do 1080p video and it occurred to me that the P1 tile driver might be good for that.

Tested the idea out today, it seems to work (see screenshot).

Started out with 2 cogs, one for the video output and one to build the video line data.
If your P2 clock is below 250 MHz, another cog would be needed to help build the video line data.

Update: New color scheme seems to work. Now able to replicate the P1 "VGA_demo" (see new screenshot below).
Driver is using the actual P1 ROM font to draw the characters and buttons.
Update2: Added "Graphics" from P1 and can run "Graphics_Demo"
Update3: Code now attached here.

Driver Details:
Using 3 cogs:
One is the VGA driver that outputs 1080p in 2bpp LUT mode from a 1920x16 pixel buffer (one row of tiles). The LUT offset (%bbbb) cycles from 0 through 7.
Second cog populates the 1920x16 pixel buffer based on the 120x68 tile pointer array and ATN signals from the VGA driver.
Third cog dynamically updates the tile colors at LUT offsets (%bbbb) 0 through 7, guided by ATN signals from the VGA driver, based on the palette, which is reloaded every frame.

There are 5 arrays needed by this driver, requiring a total of 88.5 kB (out of 512 kB available in P2):
One is the pixel output buffer (7.5 kB):

TileBuffer1 'pixel output buffer.  1920 of 2bpp tile data (16 lines)   
            long   $FFFFFF00[1920*16/16]

Another is the tile pointer array (32 kB):

Tiles       '1920x1080 in 16x16 tiles '20 bits for tile address, upper 12 bits free
            long    $0[120*68]

Another is the tile color pointer array (32 kB):

TileColors  'One long for each tile, picks four colors from 256 color palette.  Note:  bytes are read little endian
            long    $01_00_01_00[120*68]

Then, there's the 256 color, 24-bit palette (1 kB):

palette     '256 element array of 24-bit colors (Note:  Lower 8-bits are (must be?) zero).
            long    0[256]

Finally, the P1 ROM Font (16 kB):

P1RomFont     file "p1_font.dat"
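
As a rough cross-check of the sizes quoted above, the array declarations can be recomputed in a few lines of Python. This is only a sketch: the 16 kB font size assumes p1_font.dat is a straight dump of the 16 kB P1 ROM font area, and the dictionary is just for illustration.

```python
# Recompute the hub RAM budget from the array declarations above.
TILE = 16                      # tiles are 16x16 pixels
BPP = 2                        # 2 bits per pixel
COLS = 1920 // TILE            # 120 tile columns
ROWS = -(-1080 // TILE)        # 68 tile rows (1080/16 = 67.5, rounded up)

sizes = {
    "TileBuffer1": 1920 * TILE * BPP // 8,   # one row of tiles, 2bpp
    "Tiles":       COLS * ROWS * 4,          # one long per tile
    "TileColors":  COLS * ROWS * 4,          # one long per tile
    "palette":     256 * 4,                  # one long per color
    "P1RomFont":   16 * 1024,                # assumed size of p1_font.dat
}

for name, size in sizes.items():
    print(f"{name:12s} {size:6d} bytes  {size / 1024:6.2f} kB")

total = sum(sizes.values())
print(f"{'total':12s} {total:6d} bytes  {total / 1024:6.2f} kB")
```

The two tile arrays come out slightly under 32 kB each (32,640 bytes), so the exact total is a little below the rounded 88.5 kB figure.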

JRoark · 2020-01-27 20:36

@Rayman Can you clarify this for me: Are you trying to output 1080p video (ala pictures/movies), or text at 1080p resolution? (I know the screenshot shows just text, but it is also conceivable you used a 1080p text source as the video input.)

Rayman · 2020-01-27 21:28

This is more or less the same as the P1 examples like "VGA_Demo" and "Graphics_Demo".

The screen contents are described by an array of pointers to 16x16x2bpp tiles.
Each tile also has four colors assigned to it (2 bpp).

So, it can be text, if you point to the ROM_Font, for example. Or, it can also show 2 bpp graphics.
You can also combine these...

Rayman · 2020-01-28 15:27

I had to go back and look at the regular P1 2-bit graphics...

Each tile was defined by a word. The lower 10 bits defined the base address of the tile data, after being left shifted by 6. A left shift of 6 corresponds to 64 bytes, the number of bytes in a 16x16x2bpp tile.

The upper 6 bits of the tile data word are an index to a set of 4 colors for the tile.
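
To make the word layout concrete, here is a small Python sketch of the decoding just described. The function name is made up for illustration and this is not taken from the P1 driver source.

```python
def decode_p1_tile_word(word):
    """Split a 16-bit P1 screen word into (tile byte address, color set index)."""
    word &= 0xFFFF
    address = (word & 0x3FF) << 6    # lower 10 bits, in units of 64-byte tiles
    color_index = word >> 10         # upper 6 bits pick one of 64 color sets
    return address, color_index


# Example: a tile stored at $8000 drawn with color set 5
word = (5 << 10) | (0x8000 >> 6)
print(hex(word), decode_p1_tile_word(word))
```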

The P1 ROM font is 16x32, so is 1 tile wide and 2 tiles tall.
The font tile data combines two 1bpp characters in the same 2bpp tile.
You need to format the color set in a special way to select which character to display.
For example, the characters "B" and "C" are combined into the same tile data in the P1 ROM font and so would have the same lower 10 bits of tile data.
The upper 6 bits of color data is used to pick which is displayed.

Each color set is one long with 1 byte for each of the 4 colors. The usual usage was 2 bits each for red, green, blue, in this order, with the lower two bits being unused.
(Actually, when applied to the pins, the lower two bits are where H and V sync signals go).
So, to display character "B", you would pick a color set with the format:
Foreground, background, foreground, background
To pick "C", you use a color set with the format:
foreground, foreground, background, background
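
Here is a Python sketch of how those two color sets fall out. It assumes the even character of a pair ("B") lives in bit 0 of each 2-bit pixel and the odd one ("C") in bit 1, and that color 0 is the lowest byte of the long. Both assumptions are consistent with the formats listed above, which are written most significant byte first. The helper names are made up.

```python
def color_set(fg, bg, odd):
    """Four 8-bit colors indexed by 2-bit pixel value, showing one character of a merged pair."""
    bit = 1 if odd else 0            # which bit of the pixel belongs to this character
    return [fg if (value >> bit) & 1 else bg for value in range(4)]


def pack_long(colors):
    """Pack four bytes into a long, color 0 in the lowest byte."""
    return sum(c << (8 * i) for i, c in enumerate(colors))


FG, BG = 0xFC, 0x00                  # example color bytes only
print(f"B: ${pack_long(color_set(FG, BG, odd=False)):08X}")   # fg, bg, fg, bg
print(f"C: ${pack_long(color_set(FG, BG, odd=True)):08X}")    # fg, fg, bg, bg
```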

Some of the tiles in the ROM font are intended to be used as 2bpp data (instead of merged 1bpp characters), such as edges of buttons.
As Chip mentions below...

cgracey · 2020-01-28 15:36

And some of those characters were actually meant to be displayed in two-bit color. I recall there were some beveled edges and other things.

Rayman · 2020-01-28 15:43

With the P2, there is a lot more flexibility. You can strike your own balance between cogs and memory usage.
The P1 could do 1600x1200 VGA in the way described above, but used up 6 cogs and about half of RAM.
This was not very practical for some applications.

Here, the plan is to do 1920x1080 using 3 cogs and ~64 kB of RAM (out of 512 kB).
This leaves a lot of RAM and cogs for other things...

The driver currently uses one long for every tile. I think I'm going to use the lower 20 bits as pointer to tile address in hub and the upper 12 bits for the color.
There are actually a lot of different ways to do the color.
One way is to use the 12-bits as an index to up to 4096 sets of four, 24-bit colors.
That'd be 64kB of colors if you actually needed all 4096, but I think much less than that is needed in practice.
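
A quick Python sketch of that packing idea, just to pin down the bit layout and the table size. The names are illustrative only, not driver code.

```python
ADDR_BITS = 20
ADDR_MASK = (1 << ADDR_BITS) - 1     # $FFFFF, the 20-bit hub address range


def pack_tile(address, color_index):
    """Put a 12-bit color set index above a 20-bit hub address in one long."""
    return ((color_index & 0xFFF) << ADDR_BITS) | (address & ADDR_MASK)


def unpack_tile(entry):
    """Return (hub address, color set index) from one packed long."""
    return entry & ADDR_MASK, entry >> ADDR_BITS


# 4096 sets x 4 colors x one long per 24-bit color
color_table_bytes = 4096 * 4 * 4
print(hex(pack_tile(0x12345, 0xABC)), color_table_bytes // 1024, "kB")
```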

Wuerfel_21 · 2020-01-28 15:48

Rayman wrote: »

Each tile was defined by a word. The lower 10 bits defined the base address of the tile data, after being left shifted by 6. A left shift of 6 corresponds to 64 bytes, the number of bytes in a 16x16x2bpp tile.

The upper 6 bits of the tile data word are an index to a set of 4 colors for the tile.

I always found this strange. I think it makes much more sense for the top bits to contain the tile address, so one can just mask off all the other stuff. (Indeed, the text modes in JET Engine work like that, both with the tile address and the color index: there are only 16 colors, but tile deinterleaving is handled by the driver, with bit 0 selecting whether you want the odd or even tile.) In the end, there isn't any advantage to either method in terms of driver performance - on P1, that is. On P2, RDLONG doesn't mask off the high 16 bits (which would contain the color bits after shifting the address into place), so "my" way is probably a bit more efficient.

Rayman · 2020-01-28 16:04

Hmm... What does RDLONG do if the upper 12-bits are not zero?
I vaguely remember a discussion about this...

But, don't see anything in documentation about it...
Actually, I guess this passage implies that only the lower 20-bits are used:

In the case of the 'S/#/PTRx' operand used by RDBYTE, RDWORD, RDLONG, WRBYTE, WRWORD, WRLONG, and WMLONG, there are five ways to express a hub address:

    $000..$1FF        - register whose 20 LSBs will be used as the hub address

BTW: I agree that the P1 driver could use ANDs instead of shifts and maybe be clearer...

cgracey · 2020-01-28 16:33

The upper 12 bits of hub addresses, out of 32 bits, are ignored.

Rayman · 2020-01-28 22:28

The above mentioned scheme for storing the color info in the upper 12 bits of tile data didn't pan out...

So, instead, I'm using a new array of one long per tile. Each byte in this long selects one of 256 24-bit color values stored in the COG RAM of the color cog.
This is a bit more limiting, but still pretty good and vastly superior to the P1 way.

Anyway, screen is now defined by two 32kB arrays, one for tile pointers and one for color info.
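
In Python terms, the color lookup for one tile works roughly like this. It is a sketch of the data layout only (the real work happens in the color cog), with bytes read little endian so the lowest byte is color 0, and with palette entries kept in the upper 24 bits of each long.

```python
def tile_colors(color_long, palette):
    """Return the four palette entries selected by one TileColors long."""
    return [palette[(color_long >> (8 * i)) & 0xFF] for i in range(4)]


palette = [0] * 256
palette[1] = 0xFFFFFF00              # example entry: white, low 8 bits zero

# Default TileColors value: palette indices 0, 1, 0, 1 for colors 0..3
colors = tile_colors(0x01_00_01_00, palette)
print([f"${c:08X}" for c in colors])
```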

Rayman · 2020-01-29 14:28

I just borrowed another trick from the P1 driver...

Now, it reloads the whole 256 color, 24-bit palette during vertical blanking.

The P1 could only load 15 out of 64, 8-bit colors...

Rayman · 2020-01-30 15:43

Driver can now perfectly replicate the original "VGA_Demo" for P1 in a small corner of the 1080p screen.
So far, looks like this approach will work out nicely...

If you want to try it, just run the attached binary with A/V board connected to P0 header on eval board.

BTW: It's interesting that it looks steady with P2 clock set to 250 MHz and fpix = 148_500_000.0
I thought I would need P2 clock set to fpix*2 to get a steady image...

cgracey · 2020-01-30 19:07

Neat, Rayman! You are displaying some of those two-bit-color bevel characters. That character set has a lot of useful stuff in it. I don't think it's even been really exploited yet.

Wuerfel_21 · 2020-01-30 20:04

cgracey wrote: »

I don't think it's even been really exploited yet.

Yeah, with the 480-line modes one can get on P1 with a single cog, the font is just waaaay huge, even if you squeeze it down to 16 lines... An 8x8 font would have probably been more useful to have in ROM...

cheezus · 2020-01-31 05:29

cgracey wrote: »

Neat, Rayman! You are displaying some of those two-bit-color bevel characters. That character set has a lot of useful stuff in it. I don't think it's even been really exploited yet.

I love the P1 font. I was actually quite sad about not having it in ROM. I've been meaning to dump it to SD and experiment on the P2 but haven't got around to it. So many irons and so little fire these days.

Rayman · 2020-01-31 14:00

The ROM Font was way too big for VGA resolution and even a bit too big for XGA.

But, at 1080P, it's just about right for something like a control panel.

There might be a case for making an 8x16 tile driver instead.
But, I'm going to see where this one takes me...

Rayman · 2020-01-31 21:38

I've almost got the famous "Graphics_Demo" ported over to here...

It's recognizable, but still needs work:

Tubular · 2020-01-31 21:40

ha, amazing work Rayman

cgracey · 2020-01-31 22:01

Man, are you translating the graphics.spin object? That is really useful because it has vector fonts that can be shown at any scale. More useful now than ever, since we have bigger display memory.

Rayman · 2020-01-31 23:27

I tried to make a list of things that needed changing:
'Byte jump table moved into hubexec as long jump table as didn't see how to fix it...
'MINS-->FGES
'MAXS-->FLES
'MOVS-->SETS
':-->. (many places)
'SETD-->SETD2 'SETD is an instruction in P2
'cmps wc,wr --> subs wc
'command := cmd << 16 + argptr --> command := cmd << 24 + argptr
'Using QROTATE to fix missing sin table
'Used MUL to speed multiply

Maybe the main thing now is that it is using a sin table that doesn't actually exist...
Think I can change it to QROTATE though...

Rayman · 2020-01-31 23:58

It works!

Here it is with no added delay in the loop.
Much faster than P1:

BTW: 1080p seems to work perfectly on any recent monitor, assuming it has VGA input.
A lot of the ones I have (like in this video) are old though with 4:3 layout and 1080p doesn't really work... It cuts off the part of the screen outside the 4:3 format.
Some others work better, but skip some of the 1920 vertical lines. This one does this too...

cgracey · 2020-02-01 00:44

Great, Rayman! Are you running at 250MHz?

Rayman · 2020-02-01 00:51

Yes, this is 250 MHz.
Seems fine, especially on newer monitors.

potatohead · 2020-02-01 02:32

Zooom!

Rayman · 2020-02-03 00:26

I just bought this $17 VGA --> HDMI adapter, so I can test with my second monitor (that doesn't have VGA input):
https://www.amazon.com/gp/product/B0823DDQ91

It actually works!

And now, it's clear that there is an issue with 250 MHz clock.
Although the image is stable, I can see that some vertical lines appear to be missing. It's probably doubling some vertical lines and dropping others.
At 297 MHz, it looks perfect.
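
A simplified Python model shows why the clock ratio matters. It assumes the pixel timing comes from a phase accumulator that advances once per system clock and starts a new pixel on overflow. That is a rough stand-in for the streamer NCO, not a cycle-accurate description of it.

```python
def sysclks_per_pixel(sysclk_hz, fpix_hz, pixels=1000):
    """Width of each pixel in system clocks under a simple 31-bit NCO model."""
    modulus = 1 << 31
    step = fpix_hz * modulus // sysclk_hz    # frequency word, fpix/sysclk scaled to 2^31
    acc, clocks, widths = 0, 0, []
    while len(widths) < pixels:
        acc += step
        clocks += 1
        if acc >= modulus:                   # overflow: the next pixel starts
            acc -= modulus
            widths.append(clocks)
            clocks = 0
    return widths


for mhz in (250, 297):
    widths = sysclks_per_pixel(mhz * 1_000_000, 148_500_000)
    print(mhz, "MHz -> pixel widths in sysclks:", sorted(set(widths)))
```

In this model, at 250 MHz the pixels come out as a mix of 1 and 2 system clocks wide, while at 297 MHz every pixel is exactly 2 clocks wide. Uneven pixel widths like that would be one plausible source of the doubled and missing columns once something resamples the signal.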

Wuerfel_21 · 2020-02-03 13:27

I'd rather blame the adapter here. Digitizing VGA without any sort of pixel clock adjustment is bound to lead to problems.

Rayman · 2020-02-03 13:49

That's true, but it's not just the adapter, my old monitor was doing the same thing.
I'll test some more, but I think we'll just have to run at 297 or 297/2....

I wonder how safe it is to run at 297... I guess it's fine at room temperature.

potatohead · 2020-02-03 18:48

Have you tried an XZERO at the end of frame? We know that won't work for NTSC color phase errors, because we cannot reset the phase of the color engine, but for a simple pixel error like this, it should work just fine.

Rayman · 2020-02-03 18:55

What do you mean by "end of frame"?

I currently do the visible line like this, with an xzero at the start:

            'Want to start off with xzero to decrease jitter
            cogatn  #(1<<Color_Cog)     'signal the color cog
            xzero   m_rf,#0'%100        'visible line
            xcont   m_rf,#1
            xcont   m_rf,#2
            xcont   m_rf,#3
            xcont   m_rf,#4
            xcont   m_rf,#5
            xcont   m_rf,#6
            xcont   m_rf,#7
            rep     @.end,#(120-8)/4/2  'repeat 14 times: 8 + 14*8 = 120 streamer commands per line
            cogatn  #(1<<Color_Cog)     'signal the color cog again at the start of each group of 8
            xcont   m_rf,#0
            xcont   m_rf,#1
            xcont   m_rf,#2
            xcont   m_rf,#3
            xcont   m_rf,#4
            xcont   m_rf,#5
            xcont   m_rf,#6
            xcont   m_rf,#7
 .end
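
The loop count can be sanity-checked with a short Python sketch. It assumes each xzero/xcont here streams one 16-pixel tile, which fits 120 commands covering 1920 pixels but is not stated outright in the listing above.

```python
TILES_PER_LINE = 120


def visible_line_schedule():
    """Return (LUT offset used for each tile, number of COGATN signals) for one line."""
    offsets = list(range(8))                     # xzero plus seven xcont before the rep block
    atn_count = 1
    reps = (TILES_PER_LINE - 8) // 4 // 2        # same expression as the rep count
    for _ in range(reps):
        offsets.extend(range(8))                 # eight xcont per pass
        atn_count += 1                           # one cogatn per pass
    return offsets, atn_count


offsets, atn_count = visible_line_schedule()
print(len(offsets), "tiles,", atn_count, "ATN signals per line")
```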

jmg · 2020-02-03 19:10

Rayman wrote: »

BTW: It's interesting that it looks steady with P2 clock set to 250 MHz and fpix = 148_500_000.0
I thought I would need P2 clock set to fpix*2 to get a steady image...

I think that depends on what 'steady ' means.
If the P2 NCO generator resyncs every line, the timing columns in each line will line up within 1 sysclk, but they will vary across the line, as they will not be pixel-perfect, but will vary by half a sysclk from the ideal placement (due to NCO jitter)

Rayman wrote: »

And now, it's clear that there is an issue with 250 MHz clock.
Although the image is stable, I can see that some vertical lines appear to be missing. It's probably doubling some vertical lines and dropping others.
At 297 MHz, it looks perfect.

That's expected, I think, as you will have sampling issues at non-standard clock multiples, so something has to give.
For someone using larger fonts, this may be tolerable.

Rayman wrote: »

That's true, but it's not just the adapter, my old monitor was doing the same thing.
I'll test some more, but I think we'll just have to run at 297 or 297/2....
I wonder how safe it is to run at 297... I guess it's fine at room temperature.

Overclocking to 297MHz is likely to be optimistic for volume production, as we still do not have a handle on how much wafer lots will vary with P2.
The next chips will give a second sample, so may have some indication.

Rayman · 2020-02-03 21:02

I dug up some old code where I digitized the NTSC TV colors so can reproduce colors of Graphics_Demo better:

See here where it looks pretty close to original:

I added a delay to slow it down to somewhere close to the original.

This monitor is interesting... It's old and doesn't know 1080p. I think all newer monitors will recognize 1080p...
It only has 1680 horizontal pixels and yet looks good with 1920x1080 with P2 clock at 297 MHz.

BTW: I have to thank rogloh for showing me how to do 297 MHz. I didn't realize you could set xdiv to 20 and then xmul to 297...
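
For reference, the arithmetic behind those settings, as a small Python sketch. The 20 MHz crystal is an assumption (it is what I would expect on the eval board), and the function and its post divider argument are illustrative names, not a real API.

```python
def pll_clock_hz(xtal_hz, xdiv, xmul, post_div=1):
    """Crystal divided by xdiv, multiplied by xmul, then an optional post divider."""
    return xtal_hz * xmul // (xdiv * post_div)


XTAL_HZ = 20_000_000                 # assumed crystal frequency
FPIX_HZ = 148_500_000                # 1080p pixel clock used in this thread

sysclk = pll_clock_hz(XTAL_HZ, xdiv=20, xmul=297)
print(sysclk, "Hz, sysclk / fpix =", sysclk / FPIX_HZ)
```

Dividing down to 1 MHz first is what makes the odd 297 multiple reachable, and it gives exactly two system clocks per pixel.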
