Anti-aliased 24-bits-per-pixel HDMI

cgracey · 2024-02-14 08:53

I've been working on graphics for the P2, using the P2-EC32MB Edge module.

The PSRAM buffers 960x540 screens at 24bpp for a really nice picture over HDMI. The resolution isn't super high, but it looks surprisingly good with anti-aliasing.

I took the anti-aliased line-draw routine I made for the PC that the DEBUG displays use and got it running on the P2.

To try this code, you'll need a P2-EC32MB module and the DIGITAL VIDEO OUT board which connects 8 pins to an HDMI connector. I will talk about this on the live Propeller Forum tomorrow.

https://drive.google.com/file/d/1UVdZ3K8Q_14O703ysN0Moq1a7pkLBiCz/view?usp=sharing

Next, I want to make a triangle renderer with a Z buffer for 3D graphics.

With the "qHD" mode, or quarter-HD, we'll be able to show really nice anti-aliased fonts and graphics at the same time.

cgracey · 2024-02-14 09:08

Here are some colored lines. These are 1.5 pixels wide. The anti-aliased line draw has 8 sub-bits for each X, Y, and diameter. So, lines can be placed in X and Y at offsets of 256ths of a pixel. Line diameter is similar, but gets halved to make a radius in 256ths of a pixel. The minimum diameter is $100, or 1 whole pixel.
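Here is a minimal Python sketch of what 8 sub-bits means in practice. It is not Chip's P2 routine, which isn't reproduced in this thread. It assumes a simple distance-to-segment coverage model with a one-pixel linear falloff at the line's edge. The names `to_fixed` and `coverage` are hypothetical; only the $100 = 1 pixel scale, the $100 minimum diameter and the diameter-halved-to-radius step come from the post above.

```python
import math

SUB = 256  # 8 sub-bits: one pixel = 256 fixed-point units ($100)

def to_fixed(px):
    """Convert a pixel measure to 1/256-pixel fixed point."""
    return round(px * SUB)

def coverage(px, py, x0, y0, x1, y1, diameter):
    """Approximate 0..256 coverage of pixel (px, py) by a round-capped line.

    Endpoints and diameter are in 1/256-pixel units. The distance-based
    falloff is an assumed technique for illustration, not Chip's routine.
    """
    radius = max(diameter, SUB) // 2           # clamp to $100 minimum, then halve
    cx = px * SUB + SUB // 2                   # pixel centre in fixed point
    cy = py * SUB + SUB // 2
    dx, dy = x1 - x0, y1 - y0
    len2 = dx * dx + dy * dy
    t = 0.0 if len2 == 0 else ((cx - x0) * dx + (cy - y0) * dy) / len2
    t = max(0.0, min(1.0, t))                  # clamp projection onto the segment
    dist = math.hypot(cx - (x0 + t * dx), cy - (y0 + t * dy))
    # fully covered within radius - 1/2 px, fading to 0 at radius + 1/2 px
    return int(max(0.0, min(SUB, radius + SUB // 2 - dist)))
```

For a 1.5-pixel line (diameter $180), a pixel centred on the line gets full coverage (256), the pixel one row away gets 64, and two rows away gets 0. The pixel would then be blended as roughly `(line * cov + background * (256 - cov)) >> 8`.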

rogloh · 2024-02-14 10:41

Is there a simple way to get it to build with flexspin or does it need to be PNut only? I was hoping to run this demo on my Mac however I immediately encountered two problems when I tried to build with flexspin...

1. "repeat x with y" needed patching to the old "repeat y from 0 to x-1" - was simple to fix
2. setregs doesn't appear to be implemented - I'll need to go check the latest version's release docs etc as I'm still on 6.2 beta but it's probably related to management of variables held in COGRAM which I suspect still differs significantly between flexspin and PNut.

Propeller Spin/PASM Compiler 'FlexSpin' (c) 2011-2023 Total Spectrum Software Inc. and contributors
Version 6.2.0-beta-v6.1.7-2-g588815b8 Compiled on: Jun 15 2023
LineDrawAntiAlias.spin2
|-PSRAM_driver.spin2
|-HDMI_960x540_24bpp.spin2
LineDrawAntiAlias.spin2:131: error: syntax error, unexpected '#'
LineDrawAntiAlias.spin2:137: error: syntax error, unexpected '#'

Maybe I'll just have to wait to get some form of Windows running again... I'm planning a dual-boot MacBook Pro setup in time, with a newer, larger SSD fitted once I get around to opening the thing and installing it.

evanh · 2024-02-14 11:45

@rogloh said:
2. setregs doesn't appear to be implemented - I'll need to go check the latest version's release docs etc as I'm still on 6.2 beta but it's probably related to management of variables held in COGRAM which I suspect still differs significantly between flexspin and PNut.

Looking at LineDrawAntiAlias.spin2, I see a large DAT section that starts with an ORG block containing nothing but RES declarations, after which everything else is ORGH. I'm not sure what Flexspin will make of a DAT section like that, but if it compiles that section then it shouldn't be hard to write a SETREGS-like function to match.

Maybe I'll just have to wait to get some form of Windows running again...I am planning a dual boot MacBook Pro setup in time with a newer larger SSD fitted once I get to opening the thing and installing it.

PNut, unlike Propeller Tool, runs fine under Wine. It even supports the full DEBUG features.

evanh · 2024-02-14 12:48

Chip,
Your hblanking is way too short! I've found 80 to be about the minimum.

PS: I've got it working via Wine. The lack of a picture had me scratching my head for a while, though. I had to double-check each part of the setup before discovering you had the horizontal blanking at only 16!

Wuerfel_21 · 2024-02-14 12:59

@rogloh said:
1. "repeat x with y" needed patching to the old "repeat y from 0 to x-1" - was simple to fix

I'm pretty sure that's in there, just in the current version.

@cgracey said:
To try this code, you'll need a P2-EC32MB module and the DIGITAL VIDEO OUT board which connects 8 pins to an HDMI connector.

I'll need to make a VGA patch... Though the monitor I've been using is kinda dying. I got a fancy new capture card that could do HDMI, but ran into some moderate driver issues. So my monitor situation is bad at the moment.

@cgracey said:
Next, I want to make a triangle renderer with a Z buffer for 3D graphics.

My research on the topic has actually led me to believe that drawing convex N-gons with more than three sides directly can be faster than drawing only triangles. The setup is somewhat more complex, but that gets made up for when you draw a quadrilateral in one go (instead of two triangles drawing adjacent spans). That also makes clipping easier: when the tip of a triangle pokes outside the screen, clipping actually turns it into a quadrilateral. I think the worst case is a hexagon, when all three tips are chopped off by different clip planes. Of course, clipping a quad can turn it into an octagon, but there really isn't a difference between rasterizing quads and octagons; you just need to iterate through more vertices. Interpolating values across the face is somewhat more complicated, though (you need to recalculate the scale factor per scanline), but it would generally be nicer than an affine transform. I think perspective-correct interpolation needs per-scanline work anyway, so maybe it doesn't matter there. I really haven't fully worked it out, either.
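The clipping claim is easy to check with a standard Sutherland-Hodgman polygon clipper. This is a minimal Python sketch and not code from this thread: it assumes an axis-aligned screen rectangle from (0, 0) to (width, height) and float vertex coordinates, and the function names are hypothetical.

```python
def clip_half_plane(poly, axis, limit, keep_below):
    """One Sutherland-Hodgman pass against the line coord[axis] == limit."""
    def inside(p):
        return p[axis] <= limit if keep_below else p[axis] >= limit

    out = []
    for i, cur in enumerate(poly):
        prev = poly[i - 1]                     # wraps to the last vertex when i == 0
        if inside(cur) != inside(prev):        # this edge crosses the clip line
            t = (limit - prev[axis]) / (cur[axis] - prev[axis])
            out.append((prev[0] + t * (cur[0] - prev[0]),
                        prev[1] + t * (cur[1] - prev[1])))
        if inside(cur):
            out.append(cur)
    return out

def clip_to_screen(poly, width, height):
    """Clip a convex polygon to the rectangle 0..width by 0..height."""
    for axis, limit, keep_below in ((0, 0, False), (0, width, True),
                                    (1, 0, False), (1, height, True)):
        poly = clip_half_plane(poly, axis, limit, keep_below)
    return poly

# One tip off the left edge: the triangle becomes a quadrilateral.
print(len(clip_to_screen([(-2, 5), (4, 1), (4, 9)], 10, 10)))
# Three tips off three different edges: the triangle becomes a hexagon.
print(len(clip_to_screen([(5, -3), (13, 9), (-3, 9)], 10, 10)))
```

Because each pass emits a convex polygon, the output can go straight to an N-gon rasterizer without being re-triangulated.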

VonSzarvas · 2024-02-14 16:22

@cgracey said:

Is there a message in there Chip?
Stare long enough, and it seems to suggest... "don't touch the lonely red line"

cgracey · 2024-02-14 19:02

@evanh said:
Chip,
Your hblanking is way too short! I've found 80 to be about the minimum.

PS: I've got it working via Wine. The lack of a picture had me scratching my head for a while, though. I had to double-check each part of the setup before discovering you had the horizontal blanking at only 16!

Yeah, I found that my TV tolerated minimal blanking, so I set it that way in order to get a 60Hz refresh.

All this timing was carried over from the analog era. It seems that most of it can be squeezed out with HDMI. Sorry it was too short for your TV. I don't know what the minimum really is. This resolution was standard on many cell phones 12 years ago, but has since been eclipsed by higher resolutions.

cgracey · 2024-02-14 19:09

@Wuerfel_21 said:

@cgracey said:
Next, I want to make a triangle renderer with a Z buffer for 3D graphics.

My research on the topic has actually led me to believe that drawing convex N-gons with more than three sides directly can be faster than drawing only triangles. The setup is somewhat more complex, but that gets made up for when you draw a quadrilateral in one go (instead of two triangles drawing adjacent spans). That also makes clipping easier: when the tip of a triangle pokes outside the screen, clipping actually turns it into a quadrilateral. I think the worst case is a hexagon, when all three tips are chopped off by different clip planes. Of course, clipping a quad can turn it into an octagon, but there really isn't a difference between rasterizing quads and octagons; you just need to iterate through more vertices. Interpolating values across the face is somewhat more complicated, though (you need to recalculate the scale factor per scanline), but it would generally be nicer than an affine transform. I think perspective-correct interpolation needs per-scanline work anyway, so maybe it doesn't matter there. I really haven't fully worked it out, either.

Yeah, it seems quadrilaterals would be fine. Even triangles typically get broken into TWO triangles at rendering, so that each begins and ends on a common Y. A section identical to the screen memory can be maintained in the PSRAM to act as a per-pixel Z buffer. Only nearer pixels get written to the screen memory and the corresponding location in the Z buffer is updated with the new distance. By alpha-blending the polygons onto the screen, I think it would look pretty good.
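The per-pixel Z test described above fits in a few lines. This minimal Python sketch uses plain lists in place of the PSRAM screen and Z buffer. It assumes that smaller z means nearer and that the Z buffer starts out "infinitely far". The names `make_buffers` and `plot` are hypothetical.

```python
def make_buffers(width, height, background=0):
    """Colour buffer plus a same-sized depth buffer, initialised to 'infinitely far'."""
    frame = [[background] * width for _ in range(height)]
    zbuf = [[float("inf")] * width for _ in range(height)]
    return frame, zbuf

def plot(frame, zbuf, x, y, color, z):
    """Depth-tested write: only a nearer (smaller z) pixel replaces what is there."""
    if z < zbuf[y][x]:
        zbuf[y][x] = z          # remember the new nearest distance
        frame[y][x] = color     # and write the visible colour
        return True
    return False                # farther than what's already drawn: rejected
```

On the P2 the same test costs an extra PSRAM read, and on a pass also a write, for every pixel drawn. That is the price of not having to sort the geometry.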

Rayman · 2024-02-14 19:16

Neat stuff. FTDI’s EVE series does subpixel stuff like that. This could be something I’d use with 7” HDMI TFTs.

3D would be neat for accelerometer and/or IMU display…

Wuerfel_21 · 2024-02-14 20:15

@cgracey said:

Yeah, it seems quadrilaterals would be fine. Even triangles typically get broken into TWO triangles at rendering, so that each begins and ends on a common Y.

One can do it like that, but then you end up splitting the long edge. It's better to think about "which side will need a new vertex next", then grab the next one up/down (depending on whether you're loading a right or a left vertex) and recalculate only that edge. This all needs a bit of thought, since you can go through multiple vertices without crossing an integer scanline boundary where you'd actually get to draw anything.

A section identical to the screen memory can be maintained in the PSRAM to act as a per-pixel Z buffer. Only nearer pixels get written to the screen memory and the corresponding location in the Z buffer is updated with the new distance. By alpha-blending the polygons onto the screen, I think it would look pretty good.

It's one or the other. Blending and Z-buffering don't mix, because you can't meaningfully render behind something that's already been drawn with semi-transparency. So depending on how you do it, you'll get either weird occlusion effects (when blended geometry updates the Z buffer) or a weird layering issue where farther-away objects seem to be on top of near objects (when blended geometry only reads the Z buffer). This has always been the case.

If you want to blend any pixels with the existing buffer (including edge AA), you need to draw in back-to-front order (or, with Z-buffering, draw all opaque geometry first and then the blended geometry in order). This is slightly relaxed if the blend mode is cumulative, i.e. you're doing add/sub or XOR blending. In such cases the blended geometry doesn't need to be sorted with itself, only with respect to the opaque stuff (no dice if you want to have both additive and alpha blending in the same scene, though).
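The order dependence is easy to demonstrate. In this small Python sketch, colours are RGB floats in 0..1. `over` is the ordinary alpha blend and `add_sat` is a saturating additive blend. Both names are hypothetical and chosen only for illustration.

```python
def over(dst, src, alpha):
    """Ordinary alpha blend: the result depends on drawing order."""
    return tuple(alpha * s + (1 - alpha) * d for s, d in zip(src, dst))

def add_sat(dst, src):
    """Saturating additive blend: commutative, so drawing order does not matter."""
    return tuple(min(1.0, s + d) for s, d in zip(src, dst))

bg, red, blue = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)

red_then_blue = over(over(bg, red, 0.5), blue, 0.5)
blue_then_red = over(over(bg, blue, 0.5), red, 0.5)
print(red_then_blue, blue_then_red)      # different colours: order matters

additive_same = add_sat(add_sat(bg, red), blue) == add_sat(add_sat(bg, blue), red)
print(additive_same)                     # additive result is order-independent
```

The same asymmetry applies to an anti-aliased edge: its partial coverage is blended with whatever happened to be drawn underneath first.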

Of course, if you can sort the entire scene, you can just toss the Z-buffer entirely. There are loads of ways to do this, all slightly fiddly (BSP trees say hello!). The most general is to have a load of buckets (between 256 and 1024 or so?) that you sort each primitive into based on the Z values of its vertices. Generally you use the average of them all, but for something like the walls and floors of a room that objects may directly rub up against, you'd rather use the deepest vertex's Z to avoid sorting issues. This is how loads of CPU-only and early hardware 3D (PlayStation 1, SEGA Saturn, etc.) did it. PS1 documentation calls this method/data structure an "ordering table". This requires that geometry processing and pixel drawing are done as two phases. But it's always somewhat imperfect, of course. You can very easily create a construct that defies any attempt at sorting.

A BSP tree would split up such a construct and provides perfect ordering (not necessarily perfect depth sorting! Things that don't overlap can be in any order...) in all cases, but is only good for fully static geometry. Of course, since the order in which things get added to a single ordering-table bucket is itself relevant, one could combine BSP and OT to get perfect sorting within each static object and approximate sorting between objects.
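As a rough illustration of the ordering-table idea, here is a minimal Python sketch. It is not PS1 or Saturn code. It assumes primitives given as `(name, vertex_zs, use_deepest)` tuples, with larger z meaning farther away, and a linear mapping of depth to 256 buckets. `build_ordering_table` and `draw_order` are hypothetical names.

```python
def build_ordering_table(prims, z_near, z_far, buckets=256):
    """Bucket primitives by depth; within a bucket, insertion order is kept.

    prims: (name, vertex_zs, use_deepest) tuples; larger z = farther away.
    use_deepest picks the farthest vertex instead of the average (walls, floors).
    """
    table = [[] for _ in range(buckets)]
    for name, zs, use_deepest in prims:
        z = max(zs) if use_deepest else sum(zs) / len(zs)
        idx = int((z - z_near) / (z_far - z_near) * (buckets - 1))
        table[max(0, min(buckets - 1, idx))].append(name)   # clamp out-of-range z
    return table

def draw_order(table):
    """Painter's order: farthest bucket first."""
    return [name for bucket in reversed(table) for name in bucket]

wall_avg = ("wall", [2, 9], False)
wall_deep = ("wall", [2, 9], True)
crate = ("crate", [6, 6, 6], False)

print(draw_order(build_ordering_table([wall_avg, crate], 0, 10)))
print(draw_order(build_ordering_table([wall_deep, crate], 0, 10)))
```

With averaging, the long wall (average z 5.5) sorts in front of a crate at z 6 and is drawn over it. Using the wall's deepest vertex puts it behind the crate, which is the walls-and-floors case mentioned above.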

tl;dr: approaches to 3D graphics are infinite in number and infinitely interesting. I haven't even talked about the Quake edge-sorting algorithm (contrary to what many believe, Quake does not use BSPs for ordering).

cgracey · 2024-02-14 20:37

Wuerfel_21, I didn't say what I meant quite right. I know that alpha-blending whole polygons is impossible without Z-ordering per pixel. I meant to say that I would blend the edges, as in anti-alias them. I don't think this would have any detrimental effect. All polygons would be considered opaque, but the edges might as well get blended to reduce jaggies.

Wuerfel_21 · 2024-02-14 21:09

@cgracey said:
Wuerfel_21, I didn't say what I meant quite right. I know that alpha-blending whole polygons is impossible without Z-ordering per pixel. I meant to say that I would blend the edges, as in anti-alias them. I don't think this would have any detrimental effect. All polygons would be considered opaque, but the edges might as well get blended to reduce jaggies.

There's no difference though - the weird effect would be isolated to the edges, but you always get artifacts if you do something that reads the color underneath at all.

evanh · 2024-02-14 21:36

@cgracey said:

Yeah, I found that my TV tolerated minimal blanking, so I set it that way in order to get a 60Hz refresh.

All this timing was carried over from the analog era. It seems that most of it can be squeezed out with HDMI. Sorry it was too short for your TV. I don't know what the minimum really is. This resolution was standard on many cell phones 12 years ago, but has since been eclipsed by higher resolutions.

Yeah, I don't know what is a safe generalised minimum either.

As for the resolution, DVI/HDMI places no restrictions on the choice other than the horizontal resolution being a multiple of 8, and obviously there is a maximum supported resolution.

You could choose a resolution from the desired dotclock and refresh rate. Start with 32 MHz and 60 Hz: 32e6 / 60 = 533e3 total dot area; sqrt = 730; x 1.333 (= sqrt(16/9), for a 16:9 shape) = 974 htot; - 80 hblank = 894; round to a multiple of 8 = 896 hres; / 1.78 = 504 vres.

Interestingly, tweaking these, I find my TV is good down to 60 hblanking here. I'm not sure how I figured 80 as the minimum to be honest.

EDIT: So, redoing it at hblank = 60: ... 974 htot, - 60 hblank = 914, round = 912 hres, / 1.78 = 513 vres.
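The recipe above can be written as a small function. This is a Python sketch of the arithmetic only. It assumes a 16:9 target shape, so the 1.333 factor is sqrt(16/9) and the 1.78 divisor is 16/9, and it rounds hres to the nearest multiple of 8. `pick_mode` is a hypothetical name.

```python
import math

def pick_mode(dotclock_hz, refresh_hz, hblank, aspect=16 / 9):
    """Pick (htot, hres, vres) for a dotclock and refresh rate, per the recipe above."""
    total = dotclock_hz / refresh_hz                       # dots per frame, incl. blanking
    htot = round(math.sqrt(total) * math.sqrt(aspect))     # x 1.333 for 16:9
    hres = round((htot - hblank) / 8) * 8                  # DVI/HDMI: multiple of 8
    vres = round(hres / aspect)                            # / 1.78 for 16:9
    return htot, hres, vres

print(pick_mode(32e6, 60, 80))
print(pick_mode(32e6, 60, 60))
```

This reproduces both worked examples: 974/896/504 with an hblank of 80 and 974/912/513 with an hblank of 60. Whatever lines are left over between vres and the implied vertical total become vertical blanking.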

Of course, 960x540 works fine at 56 Hz refresh too.

evanh · 2024-02-14 22:29

Huh, never expected that. My TV is also fussy about the vertical back porch (top blanking lines). It needs a minimum of 9 lines there. I had been unsure about where to place the vsync, and it looks like there's more leeway when it's at the beginning of the blanking.

Roger,
Need more allocated bits for this in your timings structure!

evanh · 2024-02-14 22:58

Here's an example using the tightest DVI/HDMI blanking timings for my TV:

 Sysclock freq = 320 MHz   Dotclock freq = 32.0 MHz
 Hres=1280  hfp=4 hsync=52 hbp=4  Htot=1340   Hfreq = 23881 Hz
 Vres=640  vfp=1 vsync=2 vbp=9  Vtot=652   Vfreq = 36.6 Hz

EDIT: It also seems to be happy to accept up to 75 Hz refresh rate but I know other monitors I have top out at 60 Hz refresh.

 Sysclock freq = 172 MHz   Dotclock freq = 17.2 MHz
 Hres=640  hfp=4 hsync=52 hbp=4  Htot=700   Hfreq = 24571 Hz
 Vres=320  vfp=1 vsync=2 vbp=9  Vtot=332   Vfreq = 74.0 Hz
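The figures in these two reports follow from the porch and sync values. This Python sketch recomputes them. It assumes, as the reports imply, that the dotclock is sysclock / 10, because the P2 shifts out each 10-bit TMDS symbol at one bit per sysclock. `timing_report` is a hypothetical helper, not part of either driver.

```python
def timing_report(sysclk_hz, hres, hfp, hsync, hbp, vres, vfp, vsync, vbp,
                  clocks_per_pixel=10):
    """Return (Htot, Vtot, Hfreq in Hz, Vfreq in Hz) for a DVI/HDMI timing set."""
    dotclk = sysclk_hz / clocks_per_pixel      # 10 sysclocks per TMDS pixel
    htot = hres + hfp + hsync + hbp            # active + total horizontal blanking
    vtot = vres + vfp + vsync + vbp            # active + total vertical blanking
    hfreq = dotclk / htot                      # line rate
    vfreq = hfreq / vtot                       # frame rate
    return htot, vtot, round(hfreq), round(vfreq, 1)

print(timing_report(320e6, 1280, 4, 52, 4, 640, 1, 2, 9))
print(timing_report(172e6, 640, 4, 52, 4, 320, 1, 2, 9))
```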

EDIT2: Uh-oh, so the vblanking has more complexity here. It can go lower when the vres is lower, which means it probably also requires more when the vres is higher ... or not. The 640 vres didn't need any more blanking ... and 640x800 is fine too ... 640x1080 also good. That's the max vertical.

rogloh · 2024-02-15 00:02

@Wuerfel_21 said:

@rogloh said:
1. "repeat x with y" needed patching to the old "repeat y from 0 to x-1" - was simple to fix

That I'm pretty sure is in there, just the current version.

Yeah I'm outta date yet again. Need to upgrade.

@evanh said:

@rogloh said:
2. setregs doesn't appear to be implemented - I'll need to go check the latest version's release docs etc as I'm still on 6.2 beta but it's probably related to management of variables held in COGRAM which I suspect still differs significantly between flexspin and PNut.

Looking at LineDrawAntiAlias.spin2 I see a large DAT section starting with ORG with all RES declares then everything else is ORGH. I'm not sure what Flexspin will make of a DAT section like that but if it compiles that section then it shouldn't be hard to then make a SETREGS like function to match.

I've started working down that path. I just need to call the smoothline function from Spin somehow - or make it inline.

@evanh said:
Huh, never expected that. My TV is also fussy about the vertical back porch (top blanking lines). It needs a minimum of 9 lines there. I had been unsure about where to place the vsync, so.it looks like there's more leeway when it's at the beginning of the blanking.

Roger,
Need more allocated bits for this in your timings structure!

LOL, your favourite bugbear. How many bits do you need for it?

Speaking of timings I temporarily commented out the DAT PASM section in Chip's demo code and hacked in my own smoothline function for flexspin to use without the anti alias stuff and found my resurrected Dell2405FPW did accept the timings. They are tight! 16 pixels of horizontal and 9 lines of vertical blanking with a 960x540 active area at 60Hz. Cool.

Amazing you can get something with this little blanking going. No way it works with VGA at this rate, it needs more blanking for that. My own video driver certainly wouldn't be able to do this little horizontal blanking due to its other housekeeping code required in this interval, like issuing external memory reads and loading in palettes to LUTRAM etc. Also, I found it didn't sync at all on another 17 inch TFT I have though (Samsung B1740). So not all monitors are going to like this signal.

evanh · 2024-02-15 00:48

@rogloh said:
LOL, your favourite bugbear. How many bits do you need for it?

I'll get back to you on that. I want to allow space for VRR in the vertical allocations.

... They are tight! 16 pixels of horizontal and 9 lines of vertical blanking with a 960x540 active area at 60Hz. Cool.

I think Chip might have it at just 8 lines of blanking. Isn't the single first line also the sync? i.e. front porch = 0.

And that 8 works for me. Just had to up the hblanking to 60.

... My own video driver certainly wouldn't be able to do this little horizontal blanking due to its other housekeeping code required in this interval, like issuing external memory reads and loading in palettes to LUTRAM etc. ...

Oh, I was using your driver in my testing above... I still need the hblanking of 60, but I can reduce the vblanking further using Chip's driver. So blanking of 60x8, instead of 60x12, works now. I'm not sure about other resolutions; Chip's program needs 960x540 specifically.

rogloh · 2024-02-15 02:13

@evanh said:
I think Chip might have it at just 8 lines blanking. Isn't the single first line also the sync? ie front porch = 0.

Just double checked the code, yes you are correct. It's just 8 vblank lines total including the sync.

cgracey · 2024-02-15 06:06

@rogloh said:

@evanh said:
I think Chip might have it at just 8 lines blanking. Isn't the single first line also the sync? ie front porch = 0.

Just double checked the code, yes you are correct. It's just 8 vblank lines total including the sync.

Yeah, it's one vsync line and seven blanks. I need to know how tight this can be safely pushed. Ada said today that we need something like 34 total horizontal blank pixel periods to accommodate data packets for sound.

It would be good to know exact numbers.

evanh · 2024-02-15 10:33

An old Dell U2412M DVI monitor (my first LCD monitor) wants a minimum hblank of 68. But it's a lot fussier about resolution options too, more like how VGA inputs work. Ah, uh-oh, the tight timings only seem to work for modes that aren't a VGA-type mode. Basically, it's rubbish at adjusting, even though adjusting would be easier than the fixed mode list it has been programmed with.

So it looks like fully flexible resolution detection is actually a newish (last ten years or so) ability of monitors and TVs.

evanh · 2024-02-15 10:56

@evanh said:
So it looks like fully flexible resolution detection is actually a newish (last ten years or so) ability of monitors and TVs.

Maybe that came along with firmwares that supported HDMI. Dunno.

pik33 · 2024-02-15 17:48

Here are my timings for 1024x600@50 Hz:

'                      bf.hs, hs,  bf.vis  visible, up p., vsync, down p.,  cpl, total lines, clock,       hubset                                scanlines  ud bord mode reserved
timings         long   8,     60,  8,       1024,   7,     4,     1,        128, 600,         340500000,   %1_100111__10_1010_1000__1111_1011,   600,        0,     192, 0, 0

76 pixel horizontal, 12 lines vertical.

It worked on what I managed to connect to the P2...

TonyB_ · 2024-02-15 21:23

How low could sysclk go for 960x540 @ 50Hz? (I don't have the Digital Video Out board.)

pik33 · 2024-02-15 22:32

How low could sysclk go for 960x540 @ 50Hz? (I don't have the Digital Video Out board.)

If similar to my 1024x600 sync timings are used, something about 290 MHz.

evanh · 2024-02-16 00:04

Chip's timings: 50 x (540 + 8) x (960 + 16) x 10 = 267.424 MHz
Evan's timings: 50 x (540 + 8) x (960 + 60) x 10 = 279.48 MHz
Pik's timings: 50 x (540 + 12) x (960 + 76) x 10 = 285.936 MHz
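The same arithmetic as a Python sketch, which makes it easy to try other blanking values. The factor of 10 is there because the P2 serializes each pixel as a 10-bit TMDS symbol at one bit per sysclock. `sysclk_hz` is a hypothetical helper name.

```python
def sysclk_hz(refresh_hz, hres, vres, hblank, vblank, clocks_per_pixel=10):
    """Minimum P2 sysclock for a given refresh rate, resolution and total blanking."""
    return refresh_hz * (vres + vblank) * (hres + hblank) * clocks_per_pixel

for who, hb, vb in (("Chip", 16, 8), ("Evan", 60, 8), ("Pik", 76, 12)):
    print(who, sysclk_hz(50, 960, 540, hb, vb) / 1e6, "MHz")
```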

PS: Pik's timings will provide the most universal coverage. That'll suit my old Dell because it's not a recognised VGA mode and therefore it'll accept reduced blanking.

rogloh · 2024-02-16 01:50

I was able to modify Chip's code slightly to run on flexspin to see this demo on my Mac. It just runs a separate COG and waits for a command to arrive via a cmd mailbox. Quick hack for now.

If anyone else wants to run it without using PNut/PropTool, give this version a go by replacing LineDrawAntiAlias.spin2 with this attached file (and keep the other two source files as they are from Chip's code).

Tested on flexspin v6.2.0-beta.

Not sure why the top frame buffer scan line is not being cleared fully at the start (driver cog not ready?), but I didn't dig very far into it. Also, there is a warning about no flags being set, and if you try to fix it by adding a "wz", it messes up the anti-aliased graphics. Maybe these are flexspin vs PNut differences or some other bugs. In any case, you still see the demo.

LineDrawAntiAlias.spin2:660: warning: instruction cmp used without flags being set

EDIT: yes if I add a startup delay of 1sec before gfx are drawn the uncleared scan line disappears. Seems to be COG startup timing related.

evanh · 2024-02-16 03:08

Replacing the REPEAT with the WAITMS worked for me.

  coginit(NEWCOG, @gfxcog, @cmdbuf)
'  repeat while cmdbuf[0]
  waitms(1)

Which suggests the REPEAT isn't working.

rogloh · 2024-02-16 03:30

@evanh said:
Replacing the REPEAT with the WAITMS worked for me.
  coginit(NEWCOG, @gfxcog, @cmdbuf)
'  repeat while cmdbuf[0]
  waitms(1)
Which suggests the REPEAT isn't working.

I found it happens earlier on: it needs a 1ms wait after the HDMI COG is spawned before the first PSRAM clearing write can occur. I'm still not entirely sure why. EDIT: One possible theory is that it's priority-related. If the HDMI COG is spawned while a large PSRAM write is underway, the video COG's initial PSRAM reads could be delayed and get out of sync with the scan line being rendered. Unlike my driver, Chip's PSRAM driver has no priority for video COGs and doesn't fragment requests from non-realtime COGs, which could cause problems like this.

PUB start()

  psram.start()

  hdmi.start(0, psram.pointer(), 0)
  waitms(1)  '  <<<  adding this fixes graphics issue with first scan line

  psram_ptr := psram.pointer() + cogid() * 12

  mapbase := 0
  pixeltype := @smooth_pixel1

pik33 · 2024-02-16 07:24

Chip's timings: 50 x (540 + 8) x (960 + 16) x 10 = 267.424 MHz

16 horizontal? I had problems even at 60 with several monitors.

evanh · 2024-02-16 07:37

@pik33 said:
16 horizontal? I had problems even at 60 with several monitors.

Were those all using DVI/HDMI links?
