
Anti-aliased 24-bits-per-pixel HDMI


Comments

  • I was looking into the Parallax bitmap font and made an observation that could reduce the storage requirements of rendered fonts. Background: In order to display a high definition picture through the RF input of a TV, the image will need to be MPEG encoded. The P2 doesn't have the power to encode a 1080i MPEG signal, so maybe we could speed things up with pre-encoded data. The Parallax bitmap font is 16x32, so it fits nicely into 2 macroblocks. A macroblock is made up of 4 8x8 blocks that are mostly processed separately. The idea was to do the computationally intensive DCT, quantization, and entropy coding once and save the encoded block data. The slightly annoying thing is that when MPEG encoded, the font images used 4-6x more memory than they did in the original bitmapped format. The reason for that is the font is stored as 1 bit per pixel. When rendered to an 8x8 MPEG block it becomes 8 bits per pixel.

    Looking at the font, there seemed to be a lot of redundancy in the characters. For example, the 3, 5, 6, 8, and 9 all have a round loop on the bottom. So I made a little program that splits the font into 8x8 blocks and sorts them to see how many are unique. It turns out this can reduce the memory required to store the font by at least half (a rough sketch of the dedup pass follows the table). The same approach should work at other sizes, and even if the font is anti-aliased.

    Characters  8x8 blocks   Unique  Storage   Font Contents
     32 - 127       768        340      44%    Printable ASCII
     32 - 255      1792        434      24%     + Accents
      0 - 127      1024        464      45%     + Window decorations
      0 - 255      2048        553      27%     All
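
    A minimal sketch of that dedup pass in C, assuming the Parallax font is stored 1 bit per pixel, 16x32 per glyph (2 bytes per row), so each 8x8 block is 8 bytes. The names here (Block8x8, intern_block, dedup_font) are illustrative, and a linear search stands in for the sorting used in the original little program:

        #include <stdint.h>
        #include <string.h>
        #include <stdio.h>

        // One 8x8 block of a 1bpp font is 8 bytes (one byte per row).
        typedef struct { uint8_t rows[8]; } Block8x8;

        // Return the index of blk in table[], adding it if not seen before.
        static int intern_block(Block8x8 *table, int *count, const Block8x8 *blk)
        {
            for (int i = 0; i < *count; i++)
                if (memcmp(&table[i], blk, sizeof *blk) == 0)
                    return i;                      // already known: reuse it
            table[*count] = *blk;                  // new unique block
            return (*count)++;
        }

        // font[]: 1bpp glyph data, 16x32 per character => 2 bytes/row, 32 rows.
        // For each character emit 8 block indices (2 wide x 4 tall).
        void dedup_font(const uint8_t *font, int nchars,
                        Block8x8 *table, int *nunique, uint16_t *indices)
        {
            *nunique = 0;
            for (int c = 0; c < nchars; c++)
                for (int by = 0; by < 4; by++)         // 4 block rows
                    for (int bx = 0; bx < 2; bx++) {   // 2 block columns
                        Block8x8 b;
                        for (int y = 0; y < 8; y++)
                            b.rows[y] = font[c*64 + (by*8 + y)*2 + bx];
                        indices[c*8 + by*2 + bx] =
                            (uint16_t)intern_block(table, nunique, &b);
                    }
            printf("%d blocks, %d unique\n", nchars * 8, *nunique);
        }

    The indices array is the per-character map back into the unique-block table, which is what would let the pre-encoded MPEG block data be shared the same way.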
    
  • evanh Posts: 15,209

    @cgracey said:

    This font's pitch is 9x16 pixels and it's anti-aliased. It blends onto the background quite nicely. Here is some code with color sweeps.

    I just realised that's a 2 MB screen buffer! No wonder it needs the external RAM chips. The antialiasing does look real sweet on the fonts.

  • rogloh Posts: 5,184
    edited 2024-02-20 01:56

    Can the nice background stay put while the text screen scrolls above it? That effect would look real nice if it was fast enough. You'd essentially have to bulk copy from some background buffer into a fresh working buffer via the HUB where you overlay the updated text. It's a lot of PSRAM reading/writing per frame so it might be fairly slow, but if you could do it at the frame rate it'd be silky smooth and very nice to see. Even at 30Hz it might still look decent.

    This triples the PSRAM bandwidth over what the video itself needs. So that's at least (960 * 4 * (548 * 60)) * 3 bytes/sec for 60Hz 960x540 32bpp video data, even before overheads. That comes to 379MB/s - nope! So it would have to be done at 30Hz to have a chance: two frames to render each new frame. Even then, with all the pixel operations needed, it is not going to be feasible at that rate. Bitmapped text gets slow once you work with individual pixels or need an alpha blending operation. It would likely crawl, but it's still worth a try to see what is possible. Perhaps with multiple COGs it might have a better chance.

    EDIT: thinking more, it might be doable if a video driver rendered text over the background bitmap read from PSRAM on the fly, just before it streams out to the pins, much like doing sprites, because then you don't have to write the rendered text back to PSRAM. 960 pixel operations per scan line will still take a while, and it depends on how tight the per-pixel loop can be made. You only have around 10 P2 clocks per pixel at 320MHz at this scan rate if you consume the full scan line, and some of that is needed for the data transfers (rough numbers below). You also have the limits of COG/LUT RAM storage to deal with. If you want fancy blending operations per pixel I expect it would still need multiple COGs.
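
    Just to make that arithmetic explicit, here is a throwaway calculation with the figures quoted above (960x540 at 32bpp, 548 total lines per frame including blanking assumed, 60Hz, 320MHz sysclk); nothing here is P2-specific code, it only reproduces the numbers:

        #include <stdio.h>

        int main(void)
        {
            const double xres = 960, bytespp = 4;   // 32bpp active line
            const double lines = 548, fps = 60;     // total lines incl. blanking (assumed)
            const double sysclk = 320e6;            // P2 clock

            double video_bw   = xres * bytespp * lines * fps;  // display reads only
            double compose_bw = video_bw * 3;   // + background read + composited write
            double clks_per_pixel = sysclk / (lines * fps) / xres;

            printf("video alone : %.0f MB/s\n", video_bw / 1e6);    // ~126 MB/s
            printf("with compose: %.0f MB/s\n", compose_bw / 1e6);  // ~379 MB/s
            printf("clocks/pixel: %.1f\n", clks_per_pixel);         // ~10.1
            return 0;
        }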

  • evanh Posts: 15,209

    Could probably do it as an overlay of the two buffers merged on the fly. Only needs double bandwidth then.

  • rogloh Posts: 5,184
    edited 2024-02-20 08:40

    Here's the font stuff put into Chip's demo with the quadratic Bezier curves added. I only tested with flexspin but hopefully it might still run with PNut the way I coded it. TBD.

    Also, it might be slower to process the data in PNut vs flexspin as much of the work is coded in SPIN2.

  • cgracey Posts: 14,133
    edited 2024-02-20 15:16

    @rogloh said:
    Here's the font stuff put into Chip's demo with the quadratic Bezier curves added. I only tested with flexspin but hopefully it might still run with PNut the way I coded it. TBD.

    Also, it might be slower to process the data in PNut vs flexspin as much of the work is coded in SPIN2.

    So, what you are showing is that TTF's are points with Bezier curves, mainly. Is that right? I remember they allow some kerning rules and fixed typefaces for lower-res fonts, as well.

    In your estimation, does the TTF format look pretty efficient or is it kind of bloated? It's really tempting to use TTF's if they can be small enough.

    I will try to run your code soon to see what it does.

  • Wuerfel_21 Posts: 4,541
    edited 2024-02-20 16:23

    @cgracey said:
    In your estimation, does the TTF format look pretty efficient or is it kind of bloated? It's really tempting to use TTF's if they can be small enough.

    If it's too bloated, one could just convert it to a custom format. Some preprocessing is necessary anyway if you want to use just any font, since many halfway competent fonts will have tens of MBs of various Unicode characters (mostly CJK).

  • rogloh Posts: 5,184
    edited 2024-02-20 23:52

    @cgracey said:

    So, what you are showing is that TTF's are points with Bezier curves, mainly. Is that right? I remember they allow some kerning rules and fixed typefaces for lower-res fonts, as well.

    Yeah, quadratic Bezier curves. Some other font formats may use cubic ones. There are additional rules/instructions for those low-res cases which I've not looked into yet.

    In your estimation, does the TTF format look pretty efficient or is it kind of bloated? It's really tempting to use TTF's if they can be small enough.

    TTF files can get big if there are a large number of characters in the set, but the actual raw contour data per glyph in the TTF file is very efficient, and not bad if you just want a smaller set such as an ASCII/alphanumeric range. It uses compression and deltas instead of absolute values. There is one flag byte per X,Y delta, with bits to indicate whether the point is on the curve or is a control point, and whether the flags repeat (to save more storage). The X/Y movements are stored as 8 bit deltas with +/- flag bits to give the full +/-255 range (or an optional signed 16 bit offset if needed), and zero deltas are not stored at all, which is common for vertical/horizontal line segments. Plus, some intermediate on-curve points between control points aren't stored either. So I'd say it's packed very well, at the expense of a bit more logic to figure it out (but that's fast).

    You can certainly pre-process as Wuerfel_21 mentioned, to save re-parsing all the contour lists each time a glyph is drawn if you are using vectors/outlines only, and that could also indicate whether to draw a curve or a line (which otherwise needs extra lookups to figure out). Bitmapped versions need to be more heavily processed if you want them to look really good, but that could possibly happen offline for the sizes you are interested in.
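
    For reference, a decoding sketch of that per-point flag/delta scheme, using the flag bit values from the 'glyf' table in the TrueType/OpenType spec; the function name, buffers and surrounding table parsing are illustrative only:

        #include <stdint.h>
        #include <stddef.h>

        // Simple-glyph flag bits from the TrueType 'glyf' table
        #define ON_CURVE   0x01
        #define X_SHORT    0x02   // x delta stored as one unsigned byte
        #define Y_SHORT    0x04
        #define REPEAT     0x08   // next byte is a repeat count for this flag
        #define X_SAME_POS 0x10   // x: byte delta is positive, or (if !X_SHORT) dx == 0
        #define Y_SAME_POS 0x20

        static int16_t read_be16(const uint8_t *p) { return (int16_t)((p[0] << 8) | p[1]); }

        // Expand the packed flag/delta arrays of one simple glyph into absolute
        // coordinates. 'p' points at the flags (just past the instructions) and
        // npts comes from endPtsOfContours. Returns the number of bytes consumed.
        size_t decode_points(const uint8_t *p, int npts,
                             int16_t *x, int16_t *y, uint8_t *flags)
        {
            const uint8_t *start = p;

            // 1. Flags, run-length expanded via REPEAT
            for (int i = 0; i < npts; ) {
                uint8_t f = *p++;
                int n = 1;
                if (f & REPEAT)
                    n += *p++;
                while (n-- && i < npts)
                    flags[i++] = f;
            }
            // 2. X deltas, accumulated into absolute coordinates
            int16_t v = 0;
            for (int i = 0; i < npts; i++) {
                if (flags[i] & X_SHORT)
                    v += (flags[i] & X_SAME_POS) ? *p++ : -(int16_t)*p++;
                else if (!(flags[i] & X_SAME_POS))
                    { v += read_be16(p); p += 2; }    // else dx == 0, nothing stored
                x[i] = v;
            }
            // 3. Y deltas, same scheme
            v = 0;
            for (int i = 0; i < npts; i++) {
                if (flags[i] & Y_SHORT)
                    v += (flags[i] & Y_SAME_POS) ? *p++ : -(int16_t)*p++;
                else if (!(flags[i] & Y_SAME_POS))
                    { v += read_be16(p); p += 2; }
                y[i] = v;
            }
            return (size_t)(p - start);
        }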

    I'll try to look into the flood fill rasterization for a large font in the next few days. That gets more complicated and probably needs sorting/interpolation per scan line as you work your way around the contour data, and I expect it looks worse at smaller font sizes unless more processing is done. I imagine it is more efficient to do all scan lines in one go, extracting all the contour crossings per scan line so you know where spans start and stop, but that could require quite a lot of temporary storage if there are lots of contours crossing the scan lines and the glyph is tall. If you instead worked on one scan line at a time, with enough storage just for that, you would have to parse the contours multiple times to see where they cross each scan line, and that's going to be slower if the glyph is several tens of scan lines tall. Will have to play around with this (the simpler one-scan-line-at-a-time variant is sketched after the next paragraph).

    Looking at the Parallax font data in the pictures above, it's probably common to have only one or two spans to fill per glyph per scan line, so that's just 4 span co-ordinates per scan line, but this could go higher, to maybe 8-10 co-ordinates per scan line for more complex glyphs such as Asian characters like the one below. For a large font 100 pixels tall on screen, if you want to process all contours once for all scan lines, you could need up to 40 bytes x 100 scan lines = ~4kB to store it fully (if each co-ordinate fits in a long), or you might assume not all scan lines need that much space and just consume what you need from a (smaller) shared pool of memory. There are always different approaches that trade speed for space etc.
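
    Here's what that one-scan-line-at-a-time fill might look like in C, assuming the Bezier contours have already been flattened to straight edges; Edge, fill_glyph and the fixed MAX_CROSS limit are illustrative, and it uses a simple even-odd rule rather than TrueType's non-zero winding rule:

        #include <stdint.h>
        #include <stdlib.h>
        #include <string.h>

        #define MAX_CROSS 10   // matches the ~8-10 crossings guessed at above

        typedef struct { float x0, y0, x1, y1; } Edge;  // contour already flattened

        static int cmpf(const void *a, const void *b)
        { float d = *(const float*)a - *(const float*)b; return (d > 0) - (d < 0); }

        // Rasterize a glyph given as straight edges into a 1-byte-per-pixel mask.
        void fill_glyph(const Edge *e, int nedges, uint8_t *mask, int w, int h)
        {
            memset(mask, 0, (size_t)w * h);
            for (int y = 0; y < h; y++) {
                float xs[MAX_CROSS];
                int n = 0;
                float yc = y + 0.5f;                        // sample at pixel centres
                for (int i = 0; i < nedges; i++) {          // re-parse contours per line
                    float y0 = e[i].y0, y1 = e[i].y1;
                    if ((yc >= y0) == (yc >= y1)) continue; // no crossing on this line
                    float t = (yc - y0) / (y1 - y0);
                    if (n < MAX_CROSS)
                        xs[n++] = e[i].x0 + t * (e[i].x1 - e[i].x0);
                }
                qsort(xs, n, sizeof(float), cmpf);          // sort crossings left->right
                for (int i = 0; i + 1 < n; i += 2)          // fill between pairs
                    for (int x = (int)(xs[i] + 0.5f); x < (int)(xs[i+1] + 0.5f); x++)
                        if (x >= 0 && x < w) mask[y*w + x] = 0xFF;
            }
        }

    The all-scan-lines-in-one-go variant just hoists the inner edge loop out and appends each crossing to a per-scan-line list instead, which is where the ~40 bytes per scan line of storage comes from.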

    I will try to run your code soon to see what it does.

    Yeah, take a look when you can. Remember though, this was just a proof of concept to get something going quickly; any real code could/should be optimized, and there are plenty of things missing or buggy. Hopefully it still works on PNut - if not, you'll know how to resolve it. Large fonts can look nice for titles etc. in graphics displays.

  • Tubular Posts: 4,622

    Amazing to see TTF fonts rendered by the P2

    I've been meaning to hook a P2 up to the big laser and do something real/live with gcode generation. A 'laser typewriter' could be a good starting point

  • rogloh Posts: 5,184

    Yeah Tubular I was thinking about your laser and gcode. Or maybe even your robot arm. Add a paintbrush to the end of it and make a real "Paint" program. :wink:

  • rogloh Posts: 5,184
    edited 2024-02-23 04:24

    Worked on a flood fill operation for TT outline fonts using a coordinate sorting method today. It's not 100% correct for some curves and I'm still figuring out why (probably due to some quantization errors or duplicates exceeding the storage limit) but it is now generating some output.




    The outline fill operation starts to get fairly slow when the font size gets bigger on screen and there is a lot of sorting needed. I think it would need to be written in inline PASM2 code for best results. It's also somewhat of a memory pig: up to 10 longs per scan line over 540 scan lines, or ~21kB, just to track 5 flood spans per scan line per glyph in my current code. Maybe drawing the outline of the glyph into the PSRAM buffer and then reading it back to HUB RAM for pixel testing along the line during the fill would be faster; you'd then write back to PSRAM over the top of where you want to fill (possibly with your transparency effect applied). Another idea is to use the empty bits in the 32 bit long beyond the 24bpp colour data to mark where flood fill spans start/stop when you draw the outline initially; you then read back into HUB RAM, search for those bits, fill between them, and clear the special bits out. That's probably worth testing next and wouldn't need much storage (one 960 pixel scan line is less than 4kB at 32bpp), plus there is less processing variation between characters and no real sorting needed. Fill time then becomes more a function of character width and height and less a function of the amount of contour information in the glyph.
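
    A hub-RAM sketch of that last marker-bit idea, assuming one spare bit in the upper byte of each 32bpp pixel is free to flag span edges when the outline is drawn (the bit value, names and the plain-colour fill are illustrative; a blend could go in its place):

        #include <stdint.h>

        #define SPAN_MARK 0x01000000u   // spare bit above the 24-bit colour (assumed free)

        // Second pass over one 32bpp scan line read back from PSRAM into HUB RAM:
        // fill between pairs of marked pixels, then strip the marker bits out.
        void fill_marked_spans(uint32_t *line, int width, uint32_t colour)
        {
            int inside = 0;
            for (int x = 0; x < width; x++) {
                if (line[x] & SPAN_MARK) {
                    inside ^= 1;                 // toggle at every span edge
                    line[x] &= 0x00FFFFFFu;      // clear the special bit, keep the colour
                }
                if (inside)
                    line[x] = colour;            // or alpha-blend over line[x] instead
            }
        }

    The scan line then gets written back to PSRAM over the top of the glyph, so fill time scales with character width and height as described, independent of contour complexity.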
