All PASM2 gurus - help optimizing a text driver over DVI?

rogloh · 2019-10-18 22:15

potatohead wrote: »

One stupid easy way to do clipping is to simply allocate undisplayed buffer RAM on either side of the scan line.

Then, just write the mouse data normally without clipping, just a normal bounds check.

Yes potatohead, I've used that technique before myself in other sprite drivers. In this case it is a bit tricky right now, because I am taking advantage of the automatic FIFO block read wraparound to keep drawing the next scanline continually without reprogramming the FIFO. Once I figure out how to start and stop the FIFO on demand, with enough advance notice to let it spool up in time and flush it properly (instead of set and forget), I might be able to do that too. It could simplify clipping and let me reduce the line buffer from 4 scanlines to a minimum of two (for ping/pong alternating buffers). Right now I need to keep at least 4 scanlines because the 1bpp mode with 640 pixels takes only 20 longs, or 80 bytes, per scanline. Four times this is 320 bytes, which is the first multiple of 80 that is also a multiple of 64, the smallest FIFO block transfer size that supports wraparound. For the smaller colour bit depths this extra space is no big deal, but keeping 4 true colour scanlines burns 10kB instead of a potential 5kB.
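A quick way to see why 1bpp forces a 4-scanline buffer at 640 pixels is to compute, for each bit depth, the smallest number of lines whose total size is a multiple of the 64-byte FIFO block. This is only a Python sketch of the arithmetic: the 64-byte block and the two-line ping/pong floor come from the post above, while the function names are made up for illustration.

```python
from math import gcd

FIFO_BLOCK_BYTES = 64          # smallest FIFO block size that wraps cleanly, per the post
DEPTHS = [1, 2, 4, 8, 16, 32]  # bits per pixel supported by the driver


def line_bytes(width_px: int, bpp: int) -> int:
    """Bytes needed to hold one scanline of width_px pixels at bpp bits/pixel."""
    return width_px * bpp // 8


def min_ring_lines(width_px: int, bpp: int, floor: int = 2) -> int:
    """Fewest scanlines whose combined size is a multiple of FIFO_BLOCK_BYTES,
    but never fewer than `floor` (two buffers for ping/pong)."""
    lines = FIFO_BLOCK_BYTES // gcd(line_bytes(width_px, bpp), FIFO_BLOCK_BYTES)
    return max(lines, floor)


if __name__ == "__main__":
    for bpp in DEPTHS:
        print(f"{bpp:2} bpp: {line_bytes(640, bpp):5} bytes/line, "
              f"ring needs {min_ring_lines(640, bpp)} lines")
    # A single ring size has to suit every mode, so the worst case (1 bpp) wins.
    shared = max(min_ring_lines(640, bpp) for bpp in DEPTHS)
    truecolour = line_bytes(640, 32)
    print(f"shared ring: {shared} lines")
    print(f"truecolour cost: {shared * truecolour} bytes vs {2 * truecolour} bytes")
```

For 640 pixels this reports 4 lines for 1bpp and 2 lines for every other depth, so a shared 4-line ring costs 10240 bytes in truecolour versus 5120 bytes for a plain ping/pong pair, matching the 10kB/5kB figures.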

By the way, there is one other benefit of keeping more than 2 scanlines in the line buffer: in theory it allows you to post-process a scanline after my COG renders it but before it is displayed. So you could build another COG to apply special pixel effects to the line before final output, using BLNPIX and MIXPIX types of operations. With my DVI driver model, the display COG could put text or graphics bitmaps on the scanlines for you, and other COGs could put sprites on some scanlines as well, either before the data gets into my COG or even after the line is created, on top of background graphics bitmaps or text. Having region lists of scanlines would probably dovetail nicely with that overlaying ability too. It could be quite versatile and allow some cool things to be done, which is my long-term plan...

rogloh · 2019-10-19 04:08

AJL wrote: »
If my figuring is correct this should work for all bit depths:
  altd ptra, #$EF
  setq2 0-0
  wrlong $110, save
  ret wcz
Edit: forgot the need to subtract one from the setq2 value. Code updated.

This looks handy too AJL. I will have to check it out carefully, though in my case I think I could just subtract directly from ptra and use it without the altd. I don't need its value preserved after this point. One instruction is a lot better than my first way of four!

AJL · 2019-10-19 04:41

rogloh wrote: »
This looks handy too AJL. I will have to check it out carefully, though in my case I think I could just subtract directly from ptra and use it without the altd. I don't need its value preserved after this point. One instruction is a lot better than my first way of four!

Yes, it was fairly late when I was looking through this and wasn't too sure what needed to be preserved, so the altd magic looked like a good way to go. A simple sub would make for clearer code.
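The `#$EF` constant works because ALTD forms the next instruction's D field as (D + S) & $1FF, and $EF is -$111 modulo $200, so adding it subtracts $110 (the LUT start address) plus the extra one SETQ2 needs. The Python sketch below models only that 9-bit field arithmetic, assuming the low 9 bits of ptra hold the LUT address one past the last long written starting at $110 (our reading of the snippet; the thread doesn't spell it out). It does not model instruction encoding or SETQ2 behaviour, and the function names are illustrative.

```python
FIELD_MASK = 0x1FF   # ALTD yields a 9-bit D field: (D + S) & $1FF
LUT_BASE = 0x110     # first LUT long written by "wrlong $110, save"


def altd_field(reg_value: int, s: int) -> int:
    """D field that ALTD substitutes into the next instruction."""
    return (reg_value + s) & FIELD_MASK


def count_via_altd(end_addr: int) -> int:
    """AJL's trick: add $EF, which is -$111 modulo $200."""
    return altd_field(end_addr, 0xEF)


def count_via_sub(end_addr: int) -> int:
    """rogloh's alternative: subtract LUT_BASE + 1 from the pointer directly."""
    return (end_addr - (LUT_BASE + 1)) & FIELD_MASK


if __name__ == "__main__":
    for n in (1, 20, 80, 160):   # longs written into LUT starting at $110
        end = LUT_BASE + n
        print(n, hex(end), count_via_altd(end), count_via_sub(end))
```

Both forms give n - 1 for n longs, which is why a plain subtract is an equivalent and clearer choice when the pointer doesn't need to be preserved.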

rogloh · 2019-10-21 01:22

I've got most of this DVI driver going now. Just tidying up and testing a few things now and looking into HyperRAM. I have about 33 longs left in COGRAM right now. Just enough for HyperRAM, the sync update, and maybe I can squeeze in region lists if I get creative. It's close now. I do still have some LUT RAM free too.

Current features:

40/80 column text, independent foreground and background colour per character from 16 colour palette selected from 16M colours, flashing text or high intensity background colours, dual independent text cursors which can be set to flash/solid, block/underline and to different colours.

16x16 pixel transparent mouse sprite, working in all resolutions and bit depths, including in text mode.

Split horizontal vertical screen, buffer wraparound modes. Fine text scrolling. Horizontal panning with skew.

Scan line doubling.

Font height fully programmable.

320/640 pixel wide graphics modes in all supported P2 modes.

'   Color Mode: colour mode to use for the frame in graphics mode (4 bits)
'          
'    Dec Hex Binary    Colour mode    Bpp    Colours             Pixels/Long
'    --- --- ------    -----------    ---    -------             -----------
'     0  $0  %0000  -  LUT palette     1       2/16M                   32
'     1  $1  %0001  -  LUT palette     2       4/16M                   16
'     2  $2  %0010  -  LUT palette     4      16/16M                    8
'     3  $3  %0011  -  LUT palette     8     256/16M                    4
'     4  $4  %0100  -  RGBI 3rgb+5I    8     8 colours x 32 levels      4
'     5  $5  %0101  -  RGB (3:3:2)     8     256 colours                4    
'     6  $6  %0110  -  RGB (5:6:5)    16     64k "Hi-color"             2
'     7  $7  %0111  -  RGB (8:8:8:0)  32     16M "Truecolor"            1
'     8  $8  %1000  -  LUMA orange     8     256 levels                 4
'     9  $9  %1001  -  LUMA blue       8     256 levels                 4
'    10  $A  %1010  -  LUMA green      8     256 levels                 4    
'    11  $B  %1011  -  LUMA cyan       8     256 levels                 4
'    12  $C  %1100  -  LUMA red        8     256 levels                 4
'    13  $D  %1101  -  LUMA magenta    8     256 levels                 4
'    14  $E  %1110  -  LUMA yellow     8     256 levels                 4
'    15  $F  %1111  -  LUMA white      8     256 levels (greyscale)     4


msrobots · 2019-10-21 01:42

Just WOW.

This is a lot packed into one COG. I am a bit confused about the DVI part: is this supposed to use the P2 Eval HDMI adapter and connect to a flat-screen TV?

Does it work with Rev 1 silicon?

curious,

Mike

rogloh · 2019-10-21 01:46

Right now it does not work with Rev1 silicon. It uses the DVI capabilities of the RevB silicon. However, a future port of it to VGA should be quite possible in time. It is designed for the HDMI adapter and can be hooked to monitors with HDMI inputs, or to DVI monitors via HDMI-to-DVI adapters.

Actually I don't think it would take all that much to adapt it to pure analog VGA, but the clocking potentially could be different there. I've not played with the analog outputs yet.

Peter Jakacki · 2019-10-21 02:09

msrobots wrote: »

Just WOW.

+1
Very good work Roger.

TonyB_ · 2019-10-21 02:22

rogloh wrote: »

I've got most of this DVI driver going now. Just tidying up and testing a few things now and looking into HyperRAM. I have about 33 longs left in COGRAM right now. Just enough for HyperRAM, the sync update, and maybe I can squeeze in region lists if I get creative. It's close now. I do still have some LUT RAM free too.

Impressive work.

Have you thought about supporting 720 and 800 active pixels/line displays? (858 and 1056 total pixels/line.)

rogloh · 2019-10-21 02:32

TonyB_ wrote: »

Impressive work.

Have you thought about supporting 720 and 800 active pixels/line displays? (858 and 1056 total pixels/line.)

Thanks TonyB_ , Peter and Mike.

Not yet, but I'm trying not to preclude additional resolutions someday. I think the overall CPU timing budget will tell us whether it is doable; it should be in DVI mode. I think the horizontal resolution will want to be a multiple of 8 pixels, and ideally of 32. 720 halved is 360 pixels, which at 1bpp is not a multiple of 32 bits, so things can potentially get trickier there with buffer sizes. Haven't gone there yet...
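The long-alignment concern can be checked mechanically: a scanline's source data is a whole number of 32-bit longs only if (source pixels x bpp) is divisible by 32, and pixel doubling halves the source width. A small Python sketch, with illustrative names:

```python
def source_bits(width_px: int, bpp: int, doubled: bool) -> int:
    """Bits of source data per scanline; pixel doubling halves the source width."""
    src_px = width_px // 2 if doubled else width_px
    return src_px * bpp


def fits_whole_longs(width_px: int, bpp: int, doubled: bool) -> bool:
    """True when a scanline is an exact number of 32-bit longs."""
    return source_bits(width_px, bpp, doubled) % 32 == 0


if __name__ == "__main__":
    for width in (640, 720):
        for doubled in (False, True):
            for bpp in (1, 2, 4):
                bits = source_bits(width, bpp, doubled)
                flag = "ok" if fits_whole_longs(width, bpp, doubled) else "partial long"
                print(f"{width} doubled={doubled!s:5} {bpp}bpp: {bits / 32:6.2f} longs  {flag}")
```

Every 640 case comes out as whole longs. At 720, the doubled 1bpp case (360 bits, 11.25 longs) is flagged, as are full-width 1bpp (22.5 longs) and doubled 2bpp (22.5 longs).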

Cluso99 · 2019-10-21 03:16

Nice work Roger

Having this available so early in the P2 release should help it gain good traction and standardisation.

rogloh · 2019-10-21 07:41

TonyB_ wrote: »

Have you thought about supporting 720 and 800 active pixels/line displays? (858 and 1056 total pixels/line.)

Thinking more about this... from a graphics resolution point of view I think it would be okay to go wider, as I break up the transfers into smaller portions that fit in the memory available, so wider screens would just need more transfer portions overall. There is certainly enough time left in the scanline, even with pixel doubling, to do 800 pixels in pure DVI mode. The more immediate issue may be on the text driver side: I am using 40 longs in COGRAM to hold the 80 characters, and boosting this up to 50 might start to be a stretch once I'm done. It's probably achievable one way or another in DVI mode by shuffling things about and using some LUT RAM for executable code, which I am trying to avoid for other reasons...
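The text row storage scales linearly with the column count. The Python sketch below assumes 2 bytes per character cell (inferred from 80 characters fitting in 40 longs) and 8-pixel-wide glyphs (inferred from 80 columns on a 640-pixel line); both are assumptions, not confirmed driver details.

```python
BYTES_PER_CELL = 2   # inferred from 80 characters fitting in 40 longs
FONT_WIDTH_PX = 8    # assumed: 80 columns x 8 pixels = 640 pixels


def row_longs(columns: int) -> int:
    """COG RAM longs needed to hold one row of character cells."""
    return (columns * BYTES_PER_CELL + 3) // 4   # round up to whole longs


if __name__ == "__main__":
    for cols in (80, 100, 128):
        print(f"{cols} columns -> {cols * FONT_WIDTH_PX} px wide, {row_longs(cols)} longs")
```

Under these assumptions 80 columns need 40 longs, 100 columns (800 pixels) need 50, and 128 columns (1024 pixels) need 64.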

Are there lots of displays that do 800 pixel widescreen VGA using DVI/HDMI directly? I know there are some LVDS and parallel bus panels at this resolution, but I wasn't so sure about the popularity of DVI versions that don't need extra translator/adapter boards to drive them, somewhat like the one the RasPi uses, though that one is MIPI based. Unless they can refresh slower than 60Hz, they'd probably need to operate with reduced blanking, and that may affect my ability to support a mouse sprite in external memory modes, but probably not in the Hub memory modes that I can render directly myself.

The other option is to offer a parallel bus output version of the driver which could drive the cheap WVGA panels digitally at other non-VGA pixel rates if the CLK and DE signals can be generated by smartpins etc, though that burns lots of P2 IO pins. This DVI design has to be clocked at 10x the pixel rate to work with TMDS. A parallel or analog variant of this driver is probably somewhat more flexible there with the clocking requirements.

evanh · 2019-10-21 09:07

The minimum horizontal scan rate of 30 kHz is usually the dictating limit. When increasing the resolution, the dot clock has to be raised to keep this above 29 kHz.
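The horizontal scan rate is simply the dot clock divided by the total pixels per line, visible plus blanking, so a wider line needs a proportionally faster dot clock to stay above the limit. A minimal Python sketch, using the standard 640x480 timing from this thread (800 total pixels at 25.2 MHz); the names are illustrative:

```python
MIN_LINE_RATE_HZ = 30_000   # typical lower limit quoted above


def line_rate_hz(pixel_clock_hz: int, total_px_per_line: int) -> float:
    """Horizontal scan rate = pixel clock / total pixels per line (incl. blanking)."""
    return pixel_clock_hz / total_px_per_line


def min_pixel_clock_hz(total_px_per_line: int, min_rate_hz: int = MIN_LINE_RATE_HZ) -> int:
    """Lowest dot clock that keeps the line rate at or above min_rate_hz."""
    return total_px_per_line * min_rate_hz


if __name__ == "__main__":
    print(line_rate_hz(25_200_000, 800))   # standard 640x480: 800 total pixels
    print(min_pixel_clock_hz(1056))        # 800 visible with 1056 total pixels
```

Standard 640x480 runs at 31.5 kHz, while an 800-visible line with 1056 total pixels needs at least a 31.68 MHz dot clock to reach 30 kHz.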

EDIT: Regarding 16:9 compatible modes, 848x480 is well defined. Monitors and TVs will both scale it as a supported mode. This also easily fits an overclocked prop2.

rogloh · 2019-10-21 12:24

HyperRAM support for graphics modes is now coded into the driver based on my proposed interface, I now just need a HyperRAM driver to suit.

I couldn't fit pixel doubling with HyperRAM into this first version, as it doesn't fit in with the current pipeline. You'd need to issue the external memory request one scanline early, while working on a different scanline, so the data is ready to double when it arrives one scanline later and you don't waste your processing opportunity. In theory it might be possible at some point, but for now I will live without it; I'm more interested in pursuing display lists, which make the issue even more complex and harder to solve.

If you are using external HyperRAM memory you probably have enough memory for displaying 640 pixels wide instead of 320 anyway. That's one of the key points of using the external memory in the first place. But it would be nice to find a long term solution if possible. It's like you need to look ahead at what is coming before you've even figured out what you are doing next.

320 pixel wide modes will still be supported from HUB RAM however, that part didn't change.
The new code is shown below. It's only 8 extra instructions in this part, two elsewhere.

' generate next graphics scan line
gen_gfx
            push    ptrb
            mov     save, ptra              'preserve initial source pointer

            testb   screenaddr, #31 wz      'check for external memory usage
    if_z    setbyte ptra, #EXTMEMREQ, #3    'add memory read request to address
    if_z    setbyte ptrb, #80, #3           'transfer 80 "units" of memory data
    if_z    setnib  ptrb, bppidx, #5        '...multiplied by bpp into HUB RAM
    if_z    wrlong  ptrb, mailbox2          'setup memory request information
    if_z    wrlong  ptra, mailbox1          'initiate memory request transfer
    if_z    add     ptra, linebufsize       'increase ptra by this amount
    if_z    jmp     #copy_done              'no need to do any copy this time
            
            testb   modedata, #5 wc         'check for pixel width doubling
    if_nc   setd    writeback, #$140        'no doubling, copied from same addr
    if_c    setd    writeback, #$100        'data copied from different place
            mov     c, transfers            'setup number of read burst loops
transferloop
            setq2   #0-0                    'block copy from HUB source to LUT
            rdlong  $140, ptra++
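To make the bit-packing in the `if_z` branch easier to follow, here is a Python model of how the two mailbox longs appear to be built. It mirrors SETBYTE/SETNIB (replace one byte or nibble of D) and TESTB writing the tested bit itself into Z, so `if_z` runs when bit 31 of screenaddr is set. EXTMEMREQ's value, the example addresses and all function names are hypothetical; the real mailbox format is defined by rogloh's proposed interface, which isn't reproduced here.

```python
EXTMEMREQ = 0xB   # hypothetical request code; the real value isn't shown in the thread
MASK32 = 0xFFFF_FFFF


def setbyte(value: int, byte: int, n: int) -> int:
    """Model of PASM2 SETBYTE: replace byte n (0..3) of a 32-bit value."""
    shift = 8 * n
    return ((value & ~(0xFF << shift)) | ((byte & 0xFF) << shift)) & MASK32


def setnib(value: int, nib: int, n: int) -> int:
    """Model of PASM2 SETNIB: replace nibble n (0..7) of a 32-bit value."""
    shift = 4 * n
    return ((value & ~(0xF << shift)) | ((nib & 0xF) << shift)) & MASK32


def build_request(ext_addr: int, hub_dest: int, bppidx: int, units: int = 80):
    """Pack the two mailbox longs the way the if_z branch does."""
    mailbox1 = setbyte(ext_addr, EXTMEMREQ, 3)    # request code in bits 31..24
    mailbox2 = setbyte(hub_dest, units, 3)        # unit count in bits 31..24
    mailbox2 = setnib(mailbox2, bppidx, 5)        # bpp index in bits 23..20
    return mailbox1, mailbox2


def uses_external_memory(screenaddr: int) -> bool:
    """testb screenaddr, #31 wz copies bit 31 into Z, so if_z runs when it is 1."""
    return bool((screenaddr >> 31) & 1)


if __name__ == "__main__":
    m1, m2 = build_request(ext_addr=0x12_3456, hub_dest=0x7_F000, bppidx=3)
    print(hex(m1), hex(m2))   # the driver writes mailbox2 first, mailbox1 last
```

Writing mailbox2 before mailbox1, as the PASM2 does, presumably ensures the memory driver sees complete transfer parameters before the request long that triggers it.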

TonyB_ · 2019-10-21 12:42

rogloh wrote: »

Are there lots of displays that do 800 pixel widescreen VGA using DVI/HDMI directly?

There are cheap 5" 800x480 touchscreen displays for the Raspberry Pi that can operate as HDMI monitors without a Pi. Horizontal and vertical frequencies are same as 640x480, with pixel clock increased from 25.2 to 33.264 MHz, hence P2 sysclk ~333 MHz.
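The 333 MHz figure follows from two steps: the dot clock is total pixels per line x total lines x refresh rate, and DVI on the P2 needs a system clock of 10x the dot clock (10 TMDS bits per pixel). A Python sketch, assuming 525 total lines (the same vertical timing as 640x480, as described above):

```python
def pixel_clock_hz(h_total: int, v_total: int, refresh_hz: int) -> int:
    """Dot clock needed for a given total raster (visible + blanking) and refresh."""
    return h_total * v_total * refresh_hz


def dvi_sysclk_hz(pixel_clock: int) -> int:
    """TMDS sends 10 bits per pixel per lane, so the P2 runs at 10x the dot clock."""
    return 10 * pixel_clock


if __name__ == "__main__":
    for name, h_total in (("640x480", 800), ("800x480", 1056)):
        pclk = pixel_clock_hz(h_total, 525, 60)
        print(f"{name}: {pclk / 1e6:.3f} MHz dot clock, {dvi_sysclk_hz(pclk) / 1e6:.2f} MHz sysclk")
```

This gives 25.2 MHz / 252 MHz for 640x480 and 33.264 MHz / 332.64 MHz for 800x480.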

rogloh · 2019-10-21 12:48

They would be good to use though the P2 is running pretty overclocked then.

TonyB_ · 2019-10-21 12:56

rogloh wrote: »

They would be good to use though the P2 is running pretty overclocked then.

Reduced blanking might be possible to lower sysclk a bit. I am sure Evan and others could say how practicable 333 MHz is.

evanh · 2019-10-21 13:45

RevB can handle that but, like you say, reduced blanking is absolutely an option for tweaking down to lower clocking. 800 visible @ 30 kHz line rate will likely work below 300 MHz sysclock.
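One way to see what "below 300 MHz" implies is to ask how much horizontal blanking is left for an 800-pixel line at a 30 kHz line rate for a given P2 clock, assuming DVI output with the dot clock at sysclk / 10. A Python sketch with illustrative names:

```python
def blanking_budget_px(sysclk_hz: int, visible_px: int, line_rate_hz: int = 30_000) -> int:
    """Pixels left for front porch + sync + back porch at a given line rate,
    assuming DVI output where the dot clock is sysclk / 10."""
    pixel_clock = sysclk_hz // 10
    total_px = pixel_clock // line_rate_hz   # whole pixels per line
    return total_px - visible_px


if __name__ == "__main__":
    for mhz in (252, 270, 290, 300):
        print(mhz, "MHz ->", blanking_budget_px(mhz * 1_000_000, 800), "px of blanking")
```

At 300 MHz there are 200 pixels of blanking and at 252 MHz only 40. For comparison, the 1056-pixel total quoted above leaves 256 pixels of blanking, which at 30 kHz would need a 31.68 MHz dot clock, about 317 MHz sysclk.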

EDIT: Well, that's true for 30 MHz dot clock on analogue VGA. It may not always work with the scan converters in modern LCDs. They have to guess the visible resolution from the pixel rate and scan rates. So while they'll sync to that off spec mode no problem, they may also choose the wrong resolution to scale from. I don't think there is any sure way to detect blanking for example. This means they have a range in which each mode is valid before switching to the next. And those ranges will vary depending on how close the next defined mode is.

evanh · 2019-10-21 14:00

I wonder if the latest extensions to HDMI might provide better support for specifying the desired resolution rather than letting the monitor just guess.

rogloh · 2019-10-22 11:44

With some test hacks, I was able to get my driver outputting text at 800 pixels wide (100 columns) over DVI with 480 lines @ 54Hz with some reduced blanking and it was recognized by my Dell 24 inch LCD. It is problematic for graphics though because the setq burst transfers which are all long oriented do not fit well with 800 pixels at all bit depths, specifically 1bpp and pixel doubled modes have issues. Increasing the P2 PLL clock rate helped boost the performance, and can increase the frame rate higher than this (as could dropping the number of blanking lines).

I'm also having some problems setting up the FIFO at the right time if I try to restart it per scanline, which I need to do if the line size is not divisible by 64 (this is the case for 800 pixel wide screens, unlike 640 wide ones). When I restart it, I see data being split between two scanlines etc., so I am still fiddling with the best point on the scanline to start the FIFO. I am still unsure exactly how to stop the FIFO on the spot and get it working with fresh data so the streamer isn't using stale data. That's what I think might be going on here.

The way I currently have the FIFO/streamer running right now for 640 pixel wide screens, is that once started, it continually runs for the entire frame, just wrapping on a 4 scanline boundary which is conveniently always a multiple of a 64 byte block size. I then restart it each frame the same way at the same address. This works well for outputting 640 pixels in all bit depths, but it is specific to that width and I don't have a nice way to deal with other widths if I can't get the FIFO reloaded and resuming on a new scanline buffer address per scanline.
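Why the set-and-forget wrap is specific to 640 pixels can be checked directly: the FIFO can only wrap cleanly if the whole 4-scanline ring is a multiple of the 64-byte block. A Python sketch of that arithmetic only, assuming the 64-byte block and 4-line ring described above:

```python
FIFO_BLOCK_BYTES = 64
RING_LINES = 4   # the 4-scanline wrap used for 640-pixel modes


def ring_bytes(width_px: int, bpp: int, lines: int = RING_LINES) -> int:
    return lines * width_px * bpp // 8


def wraps_cleanly(width_px: int, bpp: int, lines: int = RING_LINES) -> bool:
    """True if the ring is a whole number of FIFO blocks, so the FIFO can wrap
    at the end of the last line without being reprogrammed."""
    return ring_bytes(width_px, bpp, lines) % FIFO_BLOCK_BYTES == 0


if __name__ == "__main__":
    for width in (640, 800):
        bad = [bpp for bpp in (1, 2, 4, 8, 16, 32) if not wraps_cleanly(width, bpp)]
        print(width, "px: depths that break the 4-line wrap:", bad or "none")
```

At 640 pixels every depth wraps cleanly, but at 800 pixels the 1bpp (400-byte) and 2bpp (800-byte) rings do not, which is why other widths need the FIFO restarted per scanline.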

How do you flush the FIFO so the streamer doesn't use stale data?

Update: Seem to have fixed the FIFO issue by waiting until after the hsync pulse occurs before I update the FIFO for the next line. The problem was I was finishing the prior scanline early and updating it too soon, so it wasn't actually stale data.

TonyB_ · 2019-10-22 13:06

rogloh wrote: »

With some test hacks, I was able to get my driver outputting text at 800 pixels wide (100 columns) over DVI with 480 lines @ 54Hz with some reduced blanking and it was recognized by my Dell 24 inch LCD. It is problematic for graphics though because the setq burst transfers which are all long oriented do not fit well with 800 pixels at all bit depths, specifically 1bpp and pixel doubled modes have issues. Increasing the P2 PLL clock rate helped boost the performance, and can increase the frame rate higher than this (as could dropping the number of blanking lines).

Maybe use only 768 active pixels, as a temporary measure?

Small 800x480 displays appear to support only that resolution and cannot do any scaling. However, having only one mode seems to allow a wide variation in horizontal timing. I have data for a 7" display with the following pixels/line, pixel clock:

min. 862, 26.4 MHz
typ. 1056, 33.3 MHz
max. 1200, 46.8 MHz
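Dividing each dot clock by its line length gives the panel's horizontal rate range, and multiplying the dot clock by 10 gives the P2 clock DVI would need. A Python sketch over the three data points above:

```python
PANEL = {            # TonyB_'s figures for a 7" 800x480 panel
    "min": (862, 26_400_000),
    "typ": (1056, 33_300_000),
    "max": (1200, 46_800_000),
}


def line_rate_khz(total_px: int, pixel_clock_hz: int) -> float:
    return pixel_clock_hz / total_px / 1000


if __name__ == "__main__":
    for name, (total_px, pclk) in PANEL.items():
        sysclk_mhz = 10 * pclk / 1e6   # DVI: P2 clock is 10x the dot clock
        print(f"{name}: {line_rate_khz(total_px, pclk):.2f} kHz line rate, "
              f"{sysclk_mhz:.0f} MHz P2 clock")
```

The minimum corner works out to about 30.63 kHz with a 264 MHz P2 clock, versus 31.53 kHz / 333 MHz for the typical timing.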

rogloh · 2019-10-22 13:53

If the DVI pixel clock is boosted past 25.2MHz, it can easily output a higher number of pixels per line, so it should be possible to drive these types of LCDs. I'm sure I can make the driver work with 768 pixels, no drama, but I'd prefer to get it running at the native resolution if the panel can't scale.

I have added a custom timing option to my startup, where if you setup a pointer to these values it will take the new values for sync, front and back porches etc, and patch the code at COG init time. If you leave this pointer at 0 it will default to the pre-compiled option such as VGA 640x480 etc. The default is also statically configurable. This way is nice because it lets us have multiple instances of the driver each on different pins, each operating at potentially different timings, and respawnable if the timing changes.
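The init-time selection described above (a custom timing pointer that falls back to a compiled default when it is zero) can be sketched in Python as below. This models the idea only, not the driver's PASM2/Spin code: the record layout and names are hypothetical, `None` stands in for a zero pointer, and patching of COG code is not modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Timing:
    """Hypothetical timing record; field names follow the constants in this thread."""
    h_visible: int
    h_fp: int
    h_sync: int
    h_bp: int
    v_visible: int
    v_fp: int
    v_sync: int
    v_bp: int

    @property
    def h_total(self) -> int:
        return self.h_visible + self.h_fp + self.h_sync + self.h_bp

    @property
    def v_total(self) -> int:
        return self.v_visible + self.v_fp + self.v_sync + self.v_bp


# Compiled-in default: standard 640x480 VGA timing (the "was" values in this thread)
VGA_640x480 = Timing(640, 16, 96, 48, 480, 10, 2, 33)


def select_timing(custom: Optional[Timing]) -> Timing:
    """A null pointer (None here) means use the pre-compiled default."""
    return custom if custom is not None else VGA_640x480


if __name__ == "__main__":
    # Two hypothetical driver instances: one default, one with reduced blanking.
    reduced = Timing(768, 16, 32, 24, 480, 10, 2, 33)
    for t in (select_timing(None), select_timing(reduced)):
        print(t.h_visible, "x", t.v_visible, "total", t.h_total, "x", t.v_total)
```

Keeping the timing per instance is what allows several driver copies on different pins to run different modes at once.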

Update: Was able to get 768 going at 25.2MHz with reduced blanking.

        V_VISIBLE       = 480
        V_FP            = 10
        V_SYNC          = 2
        V_BP            = 33
        H_FP            = 16
        H_SYNC          = 32 'was 96
        H_BP            = 24 'was 48
        H_VISIBLE       = 768 
        HZ              = 57 'was 60
        V_SYNC_POLARITY = SYNC_NEG
        H_SYNC_POLARITY = SYNC_NEG
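The HZ value can be cross-checked from the other constants: refresh = dot clock / (total pixels per line x total lines). A Python sketch using the values above at a 25.2 MHz dot clock:

```python
PIXEL_CLOCK_HZ = 25_200_000

V_VISIBLE, V_FP, V_SYNC, V_BP = 480, 10, 2, 33
H_FP, H_SYNC, H_BP, H_VISIBLE = 16, 32, 24, 768


def refresh_hz(h_total: int, v_total: int, pclk: int = PIXEL_CLOCK_HZ) -> float:
    return pclk / (h_total * v_total)


if __name__ == "__main__":
    v_total = V_VISIBLE + V_FP + V_SYNC + V_BP     # 525 lines
    reduced = H_VISIBLE + H_FP + H_SYNC + H_BP     # 840 pixels
    standard = H_VISIBLE + H_FP + 96 + 48          # the "was" sync and back porch
    print(f"reduced blanking:  {refresh_hz(reduced, v_total):.2f} Hz")
    print(f"standard blanking: {refresh_hz(standard, v_total):.2f} Hz")
```

The reduced-blanking figure of about 57.1 Hz is consistent with HZ = 57; with the original 96/48 sync and back porch, the same 768-pixel line would only refresh at about 51.7 Hz.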

potatohead · 2019-10-22 14:17

msrobots wrote: »

Just WOW.

This is a lot packed into one COG. I am a bit confused about the DVI part: is this supposed to use the P2 Eval HDMI adapter and connect to a flat-screen TV?

Does it work with Rev 1 silicon?

curious,

Mike

My sentiments exactly. I am getting ramped up again. This is top notch work.

Props will sing in both digital and analog spaces! Last time I read the specs, they seemed to imply the DVI data can be carried via the HDMI connector. If not, you are an adapter away.

jmg · 2019-10-22 19:27

rogloh wrote: »

Update: Was able to get 768 going at 25.2MHz with reduced blanking.

Curious if you found a lower limit on this reduced blanking timing? (for your test monitor)

Wuerfel_21 · 2019-10-22 20:11

potatohead wrote: »

Props will sing in both digital and analog spaces! Last time I read the specs, they seemed to imply the DVI data can be carried via the HDMI connector. If not, you are an adapter away.

Electrically, HDMI is just single-link DVI-D with a different (and arguably worse) connector.
(There was a dual-link capable version of HDMI at one point, but the connector isn't compatible with the regular one so no one ever used it)

rogloh · 2019-10-22 22:58

jmg wrote: »

Curious if you found a lower limit on this reduced blanking timing? (for your test monitor)

Haven't paid it full attention yet, fully busy on other aspects. But there will be a lower limit. I did see a couple of times that it doesn't like scan rates below about 30kHz and I know from other work it can't accept much below 50Hz vertical refresh either. It will take higher than 60Hz though which is nice and that's good for things like EGA with fewer scan lines.

Wuerfel_21 · 2019-10-22 23:35

In my experience, the VSync range for usual LCD monitors is 50Hz to 70 or 75Hz. High-end LCD monitors seem to top out at 240 Hz, new-ish TVs often support 24Hz and 48Hz modes, and a typical SVGA CRT will take 50Hz to 160Hz (although 50Hz is quite headache inducing....)

rogloh · 2019-10-22 23:39

For this DVI-only version I think I'm going to bite the bullet and try to add support for higher resolutions on LCDs that are flexible about their scan rates, if only so others can test that possibility. I think I'll move the 50-long text buffer back into LUT RAM, which will open me up to 1024 pixel wide displays and 128 column text, e.g. for 1024x600 displays (perhaps not 1024x768, unless they support really low vertical refresh rates like 30Hz). I've been loath to do that, but it is clear now that any potential HDMI version of this will have to be cut back anyway and be a slightly different animal without every single bell and whistle (or until I figure out a way).
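A rough upper bound on refresh for 1024-wide panels follows from refresh = dot clock / (total pixels x total lines). The Python sketch below assumes very aggressive blanking of 24 pixels per line and 6 lines per frame (roughly the minimum the Dell monitor accepted in tests further down this thread) and dot clocks of 25.2 MHz and 30 MHz (about 252 and 300 MHz P2 clocks); these are assumptions, not measured results.

```python
H_BLANK_PX = 24     # assumed: 8/8/8 porch/sync/porch
V_BLANK_LINES = 6   # assumed: 2/2/2 porch/sync/porch


def max_refresh_hz(h_visible: int, v_visible: int, pixel_clock_hz: int) -> float:
    total = (h_visible + H_BLANK_PX) * (v_visible + V_BLANK_LINES)
    return pixel_clock_hz / total


if __name__ == "__main__":
    for pclk in (25_200_000, 30_000_000):   # ~252 MHz and ~300 MHz P2 clocks
        for h, v in ((1024, 600), (1024, 768)):
            print(f"{h}x{v} @ {pclk / 1e6:.1f} MHz: {max_refresh_hz(h, v, pclk):.1f} Hz")
```

Even with minimal blanking, 1024x768 only reaches about 31 Hz at 25.2 MHz and 37 Hz at 30 MHz, while 1024x600 gets about 40 and 47 Hz.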

A benefit of this is that it frees up 50 more instructions in COG RAM, and I can begin to use the entire timing budget on the scanline to experiment with higher graphics resolutions. Plus, if an external memory driver is used, it should allow some decent colour depths at higher resolutions; probably not truecolour unless the refresh rate is low, but perhaps others. You could still fit a pixel-doubled 512x384 display at 16bpp in hub RAM, though I haven't confirmed the CPU timing budget there. There could be hope for 800x600 too, with non-standard timings, reduced blanking, an overclocked P2 at ~300MHz etc. I want to try it.
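Whether a frame buffer fits in hub RAM is a simple size check against the P2's 512 KB. A Python sketch; note it ignores the code and other buffers that must also live in hub RAM, so "fits" is optimistic:

```python
HUB_RAM_BYTES = 512 * 1024   # P2 hub RAM


def framebuffer_bytes(width: int, height: int, bpp: int) -> int:
    return width * height * bpp // 8


if __name__ == "__main__":
    for w, h, bpp in ((512, 384, 16), (640, 480, 8), (640, 480, 16), (800, 600, 8)):
        size = framebuffer_bytes(w, h, bpp)
        verdict = "fits" if size <= HUB_RAM_BYTES else "too big"
        print(f"{w}x{h} @ {bpp}bpp: {size // 1024} KiB, {verdict}")
```

512x384 at 16bpp needs 384 KiB and fits, while 640x480 at 16bpp needs 600 KiB and does not, which is where external memory becomes necessary.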

rogloh · 2019-10-22 23:54

@jmg, I just tried out a quick monitor test. I was able to get reduced blanking down to 8 pixels front porch, 8 hsync and 8 back porch, so only 640 + 24 pixels on the line. Man, this Dell 2405FPW takes just about anything!

TonyB_ · 2019-10-22 23:58

How widely supported is DVI/HDMI 640x480 50Hz?

rogloh · 2019-10-23 00:12

Well, this seems to be this monitor's limit at a 25.2MHz pixel rate. I found that if I reduce any blanking parameter further, or increase the visible pixels or lines any more, I see corruption, though some of this may be due to hitting software limits in setting up the streamer on each line etc. The monitor reports 49Hz.

        V_VISIBLE       = 568 ' was 480
        V_FP            = 2 'was 10
        V_SYNC          = 2
        V_BP            = 2 'was 33
        H_FP            = 1 'was 16
        H_SYNC          = 4 'was 96
        H_BP            = 8 'was 48
        H_VISIBLE       = 880 'was 640

Update: I boosted the P2 clock PLL by a factor of 1.25 up to 315MHz and can now get 800x600 timing going at ~64Hz with this driver so there is headroom to drop the P2 clock down a bit there too for that resolution. 800x600 also works at 25.2MHz pixel rates at 51Hz, when I patched 800 and 600 into the relevant timing numbers above.
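The reported refresh rates can be reproduced from the timing constants above, using the same reduced blanking (1/4/8 horizontal, 2/2/2 vertical) and a 31.5 MHz dot clock for the 315 MHz P2 clock. A Python sketch:

```python
def refresh_hz(pclk_hz, h_vis, h_fp, h_sync, h_bp, v_vis, v_fp, v_sync, v_bp):
    h_total = h_vis + h_fp + h_sync + h_bp
    v_total = v_vis + v_fp + v_sync + v_bp
    return pclk_hz / (h_total * v_total)


# Blanking values from the reduced-blanking constants above
BLANK_H = (1, 4, 8)   # H_FP, H_SYNC, H_BP
BLANK_V = (2, 2, 2)   # V_FP, V_SYNC, V_BP

if __name__ == "__main__":
    cases = (
        ("880x568 @ 25.2 MHz", 25_200_000, 880, 568),
        ("800x600 @ 25.2 MHz", 25_200_000, 800, 600),
        ("800x600 @ 31.5 MHz", 31_500_000, 800, 600),   # 315 MHz sysclk / 10
    )
    for label, pclk, h, v in cases:
        print(f"{label}: {refresh_hz(pclk, h, *BLANK_H, v, *BLANK_V):.2f} Hz")
```

These come out to about 49.2, 51.1 and 63.9 Hz, consistent with the 49Hz, 51Hz and ~64Hz figures reported.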
