All PASM2 gurus - help optimizing a text driver over DVI?

rogloh · 2019-11-07 00:04

ersmith wrote: »

Sounds pretty interesting, Roger!

I've updated my VGA ANSI driver to support 2 bytes/char (using the standard VGA format that you also use) and 1 byte/char (7 bit ASCII plus 1 bit blinking effect for the cursor). It should be pretty straightforward to adapt the ANSI character output code in vga_text_routines.spinh to your driver.

The source code is in github at:
https://github.com/totalspectrum/p2_vga_text

Great. I'll have to take a look at interfacing with your character driver too when I get a chance, hopefully it will be easy. I don't have any high level wrapper code done yet, but will need some to be able to control/initialize things, and I was thinking of checking out your flexgui soon so I can wrap some of this in SPIN2 and/or seeing what Chip releases officially.

Cluso99 wrote: »

Great news Roger. Your options are really impressive!

Thanks Cluso. I want to try to make good use of it soon myself for various things, but have been flat out developing/debugging it in my spare time. Am hoping to release it soon, as people will start to get more RevB's and could use it. When I get things back into a decent shape I should release it as a working beta first, as there are still a few untested things. I will try to see if my github account still works and see if I can upload it there (first I need to learn how to create my own repo online, lol), and/or post a zip file containing it.

TonyB_ wrote: »

What about all of us with RGB TTL monitors?

I sort of guessed you might be only joking but it's been nagging me so I had another think about this....and maybe it is really possible depending on how hsync can be timed. If I set up the output DACs to 3.3V levels and repurpose the normal sync output pin that the streamer uses as the CGA monitor's "I" intensity channel, and manage hsync separately on another pin, like I already do for 5 pin VGA's vsync, I might just be able to generate the 16 TTL colours a CGA monitor uses with the 16 entry LUT. So if I manually toggle a separate hsync pin after several back to back streamer commands on a scanline that resyncs the P2 CPU to the streamer it might just be synchronized enough to not add any noticeable pixel jitter, without using the streamer to drive sync. I'm thinking it could actually work in the LUTRAM modes. The main problem is I don't have a CGA monitor to test it out, but I can try to provide this feature if someone wanted to try it out at some point - a little later though. The good thing is now with my sync code overlay it would be possible to have a different hsync mode that works like this. Perhaps the P2 output pins in such a mode would have to be constrained to be something like IBGRHV (in increasing pin order starting from a 4 pin group), as I don't have all the configuration bits for total independent sync pin control in this special mode right now, dunno maybe I can extend it somehow later. But that could still be okay for those desperately needing it....

Edit: Just don't ask me for supporting old EGA monitors with their 6pin TTL colour. That one I can't do with this driver, unless I move to 8 pin parallel outputs...hmm.

evanh · 2019-11-07 06:37

Noooo! Colour TTLs were a joke all their existence.

rogloh · 2019-11-07 07:17

Don't worry I'm not really spending time working on it right now...

I've coded up the palette fixes, added interlaced field source selection and a proper field/frame counter plus fixed a couple of extra issues I found, and added a sweet special bonus feature...

I have the sprite pass through mode to add and possibly a common mouse co-ordinate feature left to go if there is room for that too. I'm now down to my last 8 longs, but if necessary I think I can steal about 5 more if I get desperate by rearranging some data structure items. I always like to leave some up my sleeve, somehow reminds me of Adrian Edmondson's "Emergency Bitter!".

cgracey · 2019-11-07 07:56

In my experience, more can always be squeezed in, no matter how tight you think it already is. As someone said on the forum here, it's a sickness. Might have been PhiPi that said that.

rogloh · 2019-11-07 12:01

Deleted

rogloh · 2019-11-09 03:35

I've added in code for a transparent pass through mode to this driver which should also allow 1:1 P2/pixel clock operation for basic frame buffer output from internal/external memory, without text/mouse support. Am still testing and it is also designed to work with wrapping, but of course no pixel doubling. Only took 2 longs thankfully by sharing an existing code pathway.

I also have added in a sprite mode capability (2 more longs), which uses the same pass through mode with source wrap around to send out a series of source scanlines repeatedly throughout the sprite display region, instead of wrapping just once. This means external COG(s) running tiles+sprites generation code can render into these scanline buffers and my driver will just output it automatically for any given region. The mouse may or may not be supported in that region's mode, I am not sure yet, but any sprite COG could always do a mouse sprite anyway.

After this was put in I found some further simplification/sharing in the code and found about 8 longs! Plus if I use the interrupt longs at $1F0, I can get 6 more COGRAM registers. I'm not using interrupts in this code so I think I can make use of this space with any luck. Luxury!

I still want to add the global/local region mouse stuff and may like to add a way to have the bottom border forced during screen display after a given scanline so that vertical wipe transitions can be done really easily. This transition is still achievable in another way by dynamically adjusting region sizes etc, but this requires extra client side calculations based on region sizes in the entire display list. If I can do it in just 3 longs I might.

That should be just about it for features I hope.

The only other issue I know about is the mouse generation. In graphics/text modes I can easily generate it at the end of the computed scanline where there is time before hsync. However for the external mailbox/sprite modes, it is best to render it as late as possible (i.e. ideally during/after hsync), to give the most amount of time for the external data to be ready after it is requested. The problem in doing it late is that sometimes there might not be enough clocks to perform the mouse calculations when the back porch is small, and we also need to load the 16 entry palette at this point too. It's a fine balancing act.

Update: Full VGA framebuffers with P2 clocks of 25.2MHz are working! So much lower power operation should be possible now with this video driver (obviously any text mode would need additional COG(s) to populate the line buffer). In theory this feature will just work at all higher resolutions too as clocks and pixels scale up. I haven't tried NTSC/PAL at 13.5MHz equivalent yet, but it should work the same as long as there are enough clocks per scanline to do all housekeeping needed. Should be.

Update2: Sprite mode looks to be working as well. Even though I don't have a sprite driver coded and hooked into it yet I see my 3 setup source scanlines (configurable from 1-n) repeating over and over in its display region which is basically what is needed along with the per scanline status counter update to hub memory that I had already coded previously. External sprite COGs should be able to hook right into this architecture.

Update3: I added the bottom border stuff and it is working too. Only the mouse change left now.

msrobots · 2019-11-11 00:33

@rogloh,

I hadn't had time to test your driver, but your features are impressive!

Thanks for doing this, I not MAY but WILL use this for sure...

Enjoy!

Mike

evanh · 2019-11-11 00:53

Mike,
You might have some trouble in doing that, Roger hasn't posted the source yet.

potatohead · 2019-11-11 01:15

Hope he does soon.

I have time coming up and am itching to play.

Impressive for sure. Ideally, this one can be a general driver like TV.spin and graphics.spin were.

And that's the next fun task. Get primitives working in the various modes, so that a multi zone display can make sense. And this time, graphics.spin is callable from HUB, no need to dedicate a COG.

Did tiles make it into the feature set?

rogloh · 2019-11-11 02:42

I am now feature complete in the PASM and it all fits nicely, I am just integrating my dynamic sync code for SDTV/HDTV into the build instead of patching it in manually for testing. That's the last coding to be done/tested today. Then it could be released certainly as a high quality beta if not the final driver code, which is imminent. I just want a quick tidy up and update its documentation where required for some re-arranging of data I've done to optimize things.

potatohead wrote: »

Did tiles make it into the feature set?

Tiles are compatible with my driver yes, it will just need secondary COG(s) to render its output into a scanline. The model is that my driver COG is the display COG and other COGs can access it as required for display. They can use their own region(s) or control the entire screen. It's very flexible.

I will probably try to work on an example sprite COG after this driver COG is out, but others will be able to do them too.

Newer features recently added..

-interlaced source support
-svideo/composite/component video
-global mouse/local mouse select per region
-mouse display/hide bit
-dynamic colorspace loading per frame for fades/effects in analog output modes
-optional programmable width side borders (ideal for SDTV 720 pixel modes for 80 columns etc)
-absolute top/bottom border control also overriding region sizes
-now enables (external) tiles+sprite mode capability using continuous scanline group wrap around
-transparent pass through mode for low power 1:1 P2 clock / pixel clock ratios

Tubular · 2019-11-11 03:32

This is going to be great

Dare I ask how many longs were left over?

rogloh · 2019-11-11 03:44

Down to 4 at the moment (almost deserves another feature!) but my tidy up might save some.

JRoark · 2019-11-11 03:47

As a suggestion, put the string “UROK” in there, because seriously, Rogloh, you do! Epic opus there.

rogloh · 2019-11-11 03:52

LOL. Well the feature I am thinking about is interlaced font support. Because right now it supports independent source data for graphics when interlaced (or if page flipping is enabled), but right now the text is simply duplicated between fields. However ideally each text region in a field could be allowed to send every second scanline of a font, that way text is not doubling in size and you get the full 1080 line resolution in text mode if the TV builds a frame from two fields. I think this makes sense and I think I'd like to try to put it in if it fits.
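The interlaced font idea above can be sketched roughly as follows. This is only a model: the function names are hypothetical, a 16 scanline font is assumed for the example, and which field carries the even or odd font scanlines is an assumption rather than something taken from the driver.

```python
def font_scanlines_for_field(font_height, field):
    """Font scanlines one field would show for a text row when the font is
    interlaced. Assumption: field 0 carries even lines, field 1 odd lines."""
    return list(range(field, font_height, 2))


def text_rows_per_field(field_lines, font_height, interlaced_font):
    """Text rows that fit in one field.

    Without interlaced fonts every field repeats the whole glyph, so a row
    uses font_height field lines. With them, a row only uses half of that.
    """
    lines_per_row = font_height // 2 if interlaced_font else font_height
    return field_lines // lines_per_row


even = font_scanlines_for_field(16, 0)
odd = font_scanlines_for_field(16, 1)
woven = sorted(even + odd)  # what a TV weaving both fields would rebuild

print("field 0 font lines:", even)
print("field 1 font lines:", odd)
print("rows per 540 line field, duplicated text:", text_rows_per_field(540, 16, False))
print("rows per 540 line field, interlaced font:", text_rows_per_field(540, 16, True))
```

In this model the two fields together cover every font scanline exactly once, which is why the text stops doubling in height.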

potatohead · 2019-11-11 04:57

Very nice!! You wrapped up a ton of work into one driver.

Re: Tiles. Got it. Render whatever we want into one or more scan line buffers and go. Easy peasy.

IMHO, if it fits, interlaced font support may get used.

We may be surprised at what VGA capable LCD panels intended as monitors will do. Same for some HDTV sets.

Baggers and I did a sprite driver on P1 that employed interlaced VGA 640x480. We wanted the very lowest horizontal sweep to remain compatible, and get 320 pixels on a line on P1 at 80MHz.

To my complete surprise, both my plasma TV (Samsung 120Hz 3D smart TV) and some random VGA LCD just deinterlaced it nicely. On the TV, I did find that capability buried in the technical service manual for the set. Game mode turns it off, but otherwise, it's on as one of several "reduce noise" features. Works wonders with SD material on DVD, etc... Hard to tell it was ever interlaced.

The LCD is some 90's era backlit LCD and it just transparently does the work. No options, just happens. I suspect it was there for older PC's, which would sometimes require interlace to do 1024x768, which is that LCD's native resolution.

Did I read correctly that some of the interlaced HDMI modes, or hacks of modes potentially including interlaced, get us some extra resolution at the pixel clocks P2 can do? If so, yeah. Interlaced fonts would make sense on that basis too.

I personally may use it with older, high persistence phosphor CRT displays I want to drive. A nice 80x50 on those looks amazing, but that's a one-off.

Can't wait to tinker with the HDMI stuff. My TV supports that 24p cinema mode explicitly. I've only seen it triggered, and displayed as such in the little, optional notification window, with the one 24p Blu-Ray I have. It's a copy of "How The West Was Won" and the difference is obvious the first time a camera dollies into a complex, moving scene, like downtown. There may be more there to explore.

Anyone have recommendations on a small 1080i capable display? I think I am going to buy one after the holidays.

And zones! You make me think of my Atari 8 bit and its display lists, and the Amiga. Fun times.

BTW, a couple years back when we were still on the FPGA, I did a horizontal zone, switching from 2bpp to multiple bpp on the same scanline. That was buggy and Chip fixed it. I am pretty sure a cycle or two was getting lost, and that it no longer happens that way due to the whole thing being double buffer commands to the streamer.

Someday, we may make use of zones in both directions. (maybe demoscene type stuff)

Thanks in advance. You did us all a lot of hard work. Appreciated.

rogloh · 2019-11-11 10:59

I just added the interlaced text feature. So that's it now, the COGRAM is totally full now including the $1f0-$1f5 space too. Now I can do the tidyup, docs and release it.

I ran into one weird thing today though while tweaking things, and I'm still trying to figure it out. When the LUTRAM has a particular combination of code in it, I can add a nop in LUTRAM or take it away and video output appears to hang without the extra nop in place, and run with it. The hsync output looks like total random data on the scope in this bad case. Changing the size of the code in the LUT affects this too. I'm hoping it might be related to getting sync patterns in the palette (as I'm not yet cleaning the LUT on initialisation after running code from it - I will soon to rule that one out) because if it is not that, I don't know what is causing this issue. The other things might be call/relative jump distances but they seem okay and the p2asm tool warns you about that too I think.

Update: Solved the source of the annoying/unstable bug I hit today. I originally had setq2 #255 then rdlong into LUT RAM, but I just started tripping over the $100 long code size when things got moved around. No drama, I have more LUT space since I relocated things lower, so I can change it up to setq2 #511 now. Glad that's resolved!
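The failure mode in that update can be illustrated with a small model. It assumes only what the listing's own comments show elsewhere in this thread (`setq2 #120-1` reads 120 longs), namely that a block transfer moves one more long than the setq2 operand. The helper names and the example code sizes are hypothetical.

```python
def longs_transferred(setq2_operand):
    """A setq2 #N block read is modelled as moving N + 1 longs."""
    return setq2_operand + 1


def load_covers_code(setq2_operand, code_longs):
    """True if the block read brings in the whole LUT resident code image."""
    return longs_transferred(setq2_operand) >= code_longs


# Hypothetical code sizes around the $100 long boundary.
for size in (0x0FF, 0x100, 0x101):
    print(hex(size), "longs, setq2 #255 ok:", load_covers_code(255, size),
          "| setq2 #511 ok:", load_covers_code(511, size))
```

Once the code grows past $100 longs, the tail simply never arrives in LUT RAM with `setq2 #255`, which would be consistent with behaviour changing when a single nop is added or removed.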

rogloh · 2019-11-12 04:01

I've added in the ability to clock the pixels out at fractional rates instead of purely synchronously to the P2 clock. This probably only makes sense in VGA output modes where you might have the P2 running at say 180MHz but want the dot clock output to be 25.2MHz. In this case it is not a perfect multiple then but actually ~7.143.
Previously you would have to simply set it to be 7 and perhaps adjust sync widths etc to try to get your 60Hz output if you wanted that.
Now if you wish you can set the divider to be 7*256 + 0.143*256 and it will average out to be close to 25.2MHz, though some pixels will be sent in 7 clocks and some in 8 clocks. I am looking at my Dell LCD and can't even see these wider pixels, though this monitor's scaler is really good and can probably hide it well; others might not look as nice. I'm making good use of the P2 CORDIC engine to do the 64 bit division now instead of my original look up table approach with limited range.
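The divider arithmetic can be worked through in a few lines. This is a sketch under stated assumptions: the 8 fractional bits come from the 7*256 figure above, but the accumulator loop is only a hypothetical model of how 7 and 8 clock pixels could be interleaved, not a description of how the P2 hardware or this driver actually does it.

```python
P2_CLOCK_HZ = 180_000_000
DOT_CLOCK_HZ = 25_200_000
FRACTION_BITS = 8  # divider is expressed in units of 1/256 of a P2 clock


def fixed_point_divider(sysclk_hz, dotclk_hz, frac_bits=FRACTION_BITS):
    """P2 clocks per pixel, scaled by 2**frac_bits and rounded."""
    return round(sysclk_hz * (1 << frac_bits) / dotclk_hz)


def pixel_lengths(divider, count, frac_bits=FRACTION_BITS):
    """Hypothetical accumulator model: whole clocks spent on each pixel."""
    lengths = []
    acc = 0
    for _ in range(count):
        acc += divider
        lengths.append(acc >> frac_bits)   # whole clocks for this pixel
        acc &= (1 << frac_bits) - 1        # carry the fraction forward
    return lengths


divider = fixed_point_divider(P2_CLOCK_HZ, DOT_CLOCK_HZ)
lengths = pixel_lengths(divider, 640)
effective_hz = P2_CLOCK_HZ * len(lengths) / sum(lengths)

print("divider:", divider, "=", divider >> 8, "* 256 +", divider & 0xFF)
print("pixel widths used:", sorted(set(lengths)))
print("effective dot clock: %.3f MHz" % (effective_hz / 1e6))
```

With these numbers the divider comes out as 1829, i.e. 7*256 + 37, and every pixel in the model lasts either 7 or 8 P2 clocks.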

One thing I've found while porting my NTSC/PAL video code over into the main driver is that the text in NTSC/PAL video modes doesn't look nearly as nice as I thought it did during my experiments. It is flickery and I'm seeing more chroma crawl, so I might have lost something when bringing over the sync code or now have bad colour space params etc - still checking it to see if it can be improved (hope so).

Update: looking at the video output on the scope it appears that I have somehow lost the small 1.6us quiet back porch portion right after the colour burst compared to the stable image I see from Chip's demo. I know I compute this interval dynamically in my new code so I'll find the problem.

rogloh · 2019-11-12 06:28

Here's an example of the issue I have... this is S-Video (Y+C) NTSC interlaced output from my driver and I just took a photo of my LCD monitor. Some text looks great, crisp and clear, but other portions seem to be smeared/fading out (see the blue on green text). On the screen these parts are slowly shimmering/phasing and those parts of the text are unreadable. Maybe this is a normal part of certain TV colour combinations or it has something to do with the chroma frequency not being properly locked. Not sure yet.

The composite video output version looks pretty nasty compared to it, very shaky/flickery in these unreadable portions. I'll have to feed these signals into my plasma TV to see how it compares, perhaps some of it is related to the LCD monitor's capabilities too.

Update: Both S-video and composite look slightly better on the plasma but still not 100%. There's probably still something wrong with the clock I think. It's not locked properly to the line rate or something.

potatohead · 2019-11-12 07:12

Could be your color burst reference, relative to the porch, is drifting.

To check this, get a single pixel, vertical, white line where the pixel is half a color burst cycle. In resolution terms, that is 320 pixels or greater. Put it against a black background.

Crank the color saturation up and that line should artifact, and on composite that is normal. On Y C it should not, or won't very much, depending on your set.

The hue of that artifact will be constant when the color burst timing relative to that little porch stays constant. Typically, on P2 with the NCO, it drifts some and will cycle through all hues over some period of time.

The artifact on composite will be a color tint added to the line as its higher frequency component is seen as color info. Looks a bit like a shimmer when the timing error results in a high difference frequency. (In the Hz range, typically)

You can do an xzero at end of frame to correct this. For interlaced, you may have to do it on one odd and one even line, near end of frame to preserve proper phase changes on each field. I did not actually do that for an interlaced field.

Secondly, you can tweak the timing just a shade to make it some even divisor to eliminate creep, due to the small fractional error. Maybe couple that with xzero.

My guess anyway.

Some gross phase differences just do not resolve well too, depending on the display.

Really, even with Y C, there are only 160 (320 interlaced horizontally when phase change is happening properly) color cycles per active line. A blue text on red or green background at higher resolution exceeds the information carrying capacity of the signal. Composite is the worst, with Y C being somewhat better.

Imagine each of those as a rainbow repeating over and over and the spatial precision of color info is quite coarse. 320 pixels is about the max, again with color phase change happening properly. Without it, or a constant phase, that goes down to 160 pixels in the safe region, maybe 256 with some fringing.

It is hard to tell from a static image, but that signal is not bad. You appear not far from peak composite / S-video there.

rogloh · 2019-11-12 08:21

potatohead wrote: »

Could be your color burst reference, relative to the porch, is drifting.

It wouldn't surprise me. When I scoped it and triggered on the rising sync edge, I saw the colour burst rolling along on the scanline, so it's probably not locked and slipping somewhere.

It is hard to tell from a static image, but that signal is not bad. You appear not far from peak composite / S-video there.

Yeah for images like the bird one it actually looks reasonably good from a normal distance, unless there is static/random noise displayed, then the colours start to really phase in and out and don't stay stable. The text is probably pushing the resolution colour limit as you mentioned. S-Video looks a whole lot better than the composite output.

I think I'm not going to spend too much time hunting it down at this point. It would be better to release this code sooner, so others can try it and perhaps help us resolve some of the TV timing details if things can be improved further.

I just want to get the last component SD/HD sync setup stuff into the driver as that looks rock solid on my plasma in Chip's and my own experimental code. That's really the final bit of integration work. The output side is all complete (the COG is full), it's just the initialization code in LUTRAM I'm touching now.

By the way the interlaced text mode I added looks nice (not shown above), and it shrinks the text back down to a normal size.

rogloh · 2019-11-13 05:53

Was changing some S-video parameter setup code and I have absolutely no idea what I changed without going back to older versions but now my S-video images+text seem really nice compared to what I had yesterday, and pretty much back to how I remembered it looking the first time. In comparison though composite still seems rather ripply. Something's still not quite right there perhaps.

In other updates, I have the progressive component HD and progressive component SD integrated into the driver now and it's working and looks rock solid at 50 & 60Hz with 480p/576p/720p. Just have the interlaced SD component and the tricky interlaced 1080i component sync variants left to put in. I'm wondering what to do about SD progressive s-video and progressive composite outputs if ever selected. I haven't thought about those two combinations yet. Dealing with all the different sync choices is actually quite challenging and the dynamically patched code is getting difficult to decipher.

Update: 1080i tri-level sync interlaced component is in and working now on my LCD and plasma.

potatohead · 2019-11-13 07:20

About the only time 240p will come up is old computer and game consoles. I would skip it, if code is tight. Few will want it. Those that do (me maybe) can just do it anyway.

Omit the half scan line and just repeat first field and that will work for a lot of cases. 30 Hz frame consisting of two fields becomes one 60 Hz frame, one field.

Older, more hacks than official signals, like Apple 2 or Atari, can either be approximated (and potentially look better), or those will get done as a special case, maybe even skipping the color system entirely. The P1 video system methods could be used with better results on P2, IMHO. Better DACS.

For that composite shimmer, an xzero at end of field may cure the shimmer you are seeing. That is a small timing error due to nco error accumulation. The timing between the color burst and the little porch leading into active pixels will vary a little. This has been present on P2 since early days. Little work went into resolving it, beyond using xzero to stabilize it.

P1 used a PLL that did not have that issue. (Well, it did, but was stable, only changing on cold start) A precision tweak will be needed to make it stable on P2, or a hack like xzero in the right place will keep resetting things before they accumulate and become visible.

Or, we don't care and just use video, component or RGB at the low, 15kHz sweeps.

That is my take on it anyway.

Just got my rev B board! Fun weekend coming up.

Post this thing already! Asking for a friend

rogloh · 2019-11-14 09:00

potatohead wrote: »

Post this thing already! Asking for a friend

LOL. All good things come to those that wait...

Thanks for the suggestions on the potential interlaced TV video fixes.

I visited Tubular and Ozpropdev today and gave them a demo of this driver and all different outputs and some of the features it supports. Here's a photo of what can be achieved with the P2 and this video driver at full power. It is 1080i over HD component with interlaced text at 8 scanlines high (a CGA font) and 240 columns wide in text mode. The P2 is running overclocked at ~350MHz for this with the single COG doing everything. You can see the mouse sprites in the two regions.

In double wide mode, which is still 120 columns, you can generate this with (only) a 297MHz P2.

In theory you could do this using pass through down at 74.25MHz but you'd need other COG(s) to render the text into scanline buffers instead of having the video driver do it for you. How many COGs you'd need exactly for hitting that I'm not sure but it could well be more than one. Not shown but it can even work with a 6 scanline font which gives 240x180 text resolution (just mad). This is on the Dell LCD, an HDTV may not have the same fine colour resolution.
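The text resolutions quoted above follow from simple division. The sketch below assumes a 1920x1080 active area for 1080i and 8 pixel wide glyphs (both consistent with 240 columns, but assumptions all the same). The function name is hypothetical.

```python
H_PIXELS = 1920          # assumed active width for 1080i
V_LINES = 1080           # assumed active height over both fields
GLYPH_WIDTH = 8          # assumed glyph width in pixels
PIXEL_CLOCK_MHZ = 74.25  # pass through clock mentioned above


def text_grid(h_pixels, v_lines, glyph_w, glyph_h):
    """Columns and rows of text that fit in the active area."""
    return h_pixels // glyph_w, v_lines // glyph_h


print("8 line font:      ", text_grid(H_PIXELS, V_LINES, GLYPH_WIDTH, 8))
print("6 line font:      ", text_grid(H_PIXELS, V_LINES, GLYPH_WIDTH, 6))
print("double wide glyph:", text_grid(H_PIXELS, V_LINES, GLYPH_WIDTH * 2, 8))

# P2 clocks available per pixel at the two clock rates mentioned above.
for p2_mhz in (297.0, 350.0):
    print(p2_mhz, "MHz ->", round(p2_mhz / PIXEL_CLOCK_MHZ, 2), "clocks per pixel")
```

Under these assumptions 297MHz works out to exactly 4 P2 clocks per pixel, and the ~350MHz overclock to a little under 5.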

Tubular · 2019-11-14 09:44

Yeah that was simply amazing to see all that. The Dell monitor has 5 inputs and Roger drove each of those 5 interfaces in turn with just a few tweaks of the code.

Roger was also excited because he had discovered a spare long earlier in the day lol...

Sorry I had to head off early. At the talk this evening this founder of a $200m company was joking about "who codes in machine code anymore"... you could have proudly answered in the affirmative

ozpropdev · 2019-11-14 10:15

It was very impressive Roger!
Now, what to do with that pesky spare long!

rogloh · 2019-11-14 11:25

ozpropdev wrote: »

It was very impressive Roger!
Now, what to do with that pesky spare long!

Saving it for any more critical bug fixes!

Here's the code for text generation. I am hoping to shave off another long: 11 lines from the end there is an instruction that only needs to clear the carry flag (the c register doesn't get used after this point). Can anyone see a way to do this, or see other optimizations in the cursor code or elsewhere? I could clear the carry in the increment instruction two lines earlier but it gets used on the next line. Perhaps Z (set up with AND/OR/XORs) can be used on its own instead in the wrlong, which would allow this.

I reckon there is scope there for saving an instruction by better making use of the flags.

' Code to generate the next text scan line and cursor(s)
gen_text    
            mov     b, rowscan              'build font table base address 
            shl     b, #8                   'for this font and row's scanline 
            add     b, fontaddr      
            setq    #64-1                   '64 longs holds 256 bytes of font
            rdlong  font, b                 'read in font data for scanline

            testb   modedata, #7 wz         'flashing / full colour background?
    if_z    setr    testflash, #$83         'use text flashing code test
    if_nz   setr    testflash, #$EA         'change into helpful zerox c,#15 wc

            testb   modedata, #5 wz         'pixel double test
            setq2   #120-1                  'read maximum of 120 longs from HUB
            rdlong  $110, ptra              'to get next 240 chars with colours
p9          mov     pb, #$10f+COLS/2        'setup LUT read pointer at end
p10 if_z    sub     pb, #COLS/4             '...of where character data is

            mov     save, ptrb              'save pointer register
            mov     ptrb, #$1ff             'setup write location in LUT RAM     

p1  if_z    sets    adv, #COLS              'increase by half normal columns
p2  if_nz   sets    adv, #COLS*2            'increase by normal columns

            mov     a, #%11000 wc           'reset initial skipf pattern
p3  if_z    rep     @.endwide-2, #COLS/2    '2100 clocks for double wide mode
p4  if_nz   rep     @.endnormal-2, #COLS    '2760 clocks for normal wide mode
            skipf   a                       'skip 2 of the next 5 instructions
            xor     a, #%11110              'flip skip sequence for next time
            rdlut   d, pb                   'read pair of characters/colours
            getword c, d, #0                'select first word in long (skipf)
            getword c, d, #1                'select second word in long (skipf)
            sub     pb, #1                  'decrement LUT read index (skipf)
            getbyte b, c, #0                'extract font offset for char 
            altgb   b, #font                'determine font lookup address
            getbyte pixels, 0-0, #0         'get font for character's scanline
testflash   bitl    c, #15 wcz              'test (and clear) flashing bit
flash if_c  and     pixels, #$ff            'make it all background if flashing
            movbyts c, #%01010101           'colours becomes BF_BF_BF_BF
            mov     b, c                    'grab a copy for muxing step next
            rol     b, #4                   'b becomes FB_FB_FB_FB
            setq    ##$F0FF000F             'mux mask adjusts fg and bg colours
            muxq    c, b                    'c becomes FF_FB_BF_BB
            testb   modedata, #5 wz         'repeat columns test, z was trashed
    if_nz   movbyts c, pixels               'select pixel colours for char
    if_nz   wrlut   c, ptrb--               'write coloured pixel data into LUT 
.endnormal
            setword pixels, pixels, #1      'replicate low words in long
            mergew  pixels                  '...to then double pixels
            mov     b, c                    'save a copy before we lose colours
            movbyts c, pixels               'compute 4 lower colours of char
            ror     pixels, #8              'get upper 8 pixels
            movbyts b, pixels               'compute 4 higher colours of char
    if_z    wrlut   b, ptrb--               'save it to LUT RAM
    if_z    wrlut   c, ptrb--               'save it to LUT RAM
.endwide
            mov     ptrb, save              'restore ptrb
p5          setq2   #COLS-1                 'write all column pixels to HUB RAM
p11         wrlong  $200-COLS, ptrb         'from LUT storage

            'do both cursors here
            bitz    increment, #2           'setup whether cursor is doubled
            bitz    scale, #0               'and multiply offset accordingly

            rep     @.endcursor, #2         'repeat twice for two cursors
            mov     c, cursor1+0            'get cursor data
            xor     $-1, #1                 'alternate cursors
            getbyte a, c, #1                'get cursor's x position (col)
            getword b, c, #1                'get cursor's y position (row)
scale       shl     a, #2+0                 'transform x to long address offset
            add     a, ptrb                 'add offset to start of line buffer
cursflash   testbn  fieldcount, #4-0 wz     'get flashing bit from field count
            testb   c, #4 xorz              'select the blink phase to apply
            testbn  c, #6 orz               'if blink enabled, apply a flash    
    if_z    cmp     b, row wz               'check if cursor is on this row
            testb   c, #7 andz              'check if cursor is enabled
            testb   c, #5 wc                'check solid/underline type cursor
            mov     b, rowheight            'get last scanline of character
            sub     b, #2                   'subtract 2 scanline high cursor
    if_nc   cmpr    rowscan, b wc           'compare this scanline count
            setnib  c, c, #1                'replicate colour nibble in byte 0
            movbyts c, #0                   'replicate colour over all 8 pixels
 if_c_and_z wrlong  c, a                    'write cursor color to line buffer
increment   add     a, #4-0                 'advance to next long if wide text
 if_c_and_z wrlong  c, a                    'repeat to get double wide cursor
.endcursor
            mov     c, 0 wc                 'any way to eliminate this instrn?
            testbn  modedata, #6 wz         'z=1 if line doubling off
            testbn  scanline, #0 orz        'z=1 if second line
if_z        incmod  rowscan, rowheight wc   'z=1 & c=1 if wrapped

            testb   modedata, #10 andz
if_z_and_c  add     rowscan, #1
if_z_and_nc incmod  rowscan, rowheight wc

    if_c    add     row, #1                 'advance to next row
adv if_c    add     ptra, #0-0              'advance by half or full columns
    if_c    add     ptra, skew              'allows windowing into wider screen
            jmp     #selectbuf              'select next buffer to write to
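The colour muxing steps in the middle of that loop (movbyts, rol, setq/muxq, then movbyts with the font bits) can be followed with a small Python model. This is a sketch only: the instruction behaviour is modelled from my reading of the P2 instruction descriptions (MOVBYTS picks a source byte per 2 bit selector, MUXQ copies S bits into D where the Q mask is 1), the attribute layout (background nibble high, foreground nibble low) follows the listing's own comments, and the helper names and example character are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def movbyts(d, s):
    """Model of MOVBYTS: result byte i = d byte selected by s bits [2i+1:2i]."""
    src = [(d >> (8 * i)) & 0xFF for i in range(4)]
    out = 0
    for i in range(4):
        out |= src[(s >> (2 * i)) & 3] << (8 * i)
    return out


def rol(d, n):
    """Model of ROL on a 32 bit value."""
    n &= 31
    return ((d << n) | (d >> (32 - n))) & MASK32


def muxq(d, s, q):
    """Model of SETQ + MUXQ: take s where q has 1 bits, keep d elsewhere."""
    return (d & ~q & MASK32) | (s & q)


def colour_table(char_word):
    """Build the FF_FB_BF_BB long from a 16 bit character/attribute word."""
    c = char_word & 0x7FFF             # bitl c, #15 (flash bit cleared)
    c = movbyts(c, 0b01010101)         # attribute byte in all four bytes
    b = rol(c, 4)                      # nibbles swapped: FB_FB_FB_FB
    return muxq(c, b, 0xF0FF000F)      # mix into FF_FB_BF_BB


def colour_pixels(table, font_bits):
    """Each pair of font bits selects one byte (two 4 bit pixels)."""
    return movbyts(table, font_bits & 0xFF)


word = 0x1741                          # hypothetical: bg=1, fg=7, char $41
table = colour_table(word)
print(hex(table))
print(hex(colour_pixels(table, 0b00011011)))
```

The point of the FF_FB_BF_BB arrangement is that a 2 bit font pattern 00, 01, 10 or 11 then directly indexes the byte holding the matching pair of 4 bit pixel colours, so one movbyts colours all 8 pixels of a character scanline.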

ersmith · 2019-11-14 11:58

Could you re-write "testbn modedata, #6 wz" to "test modedata, #1<<6 wcz"? Then if Z is set (so the bit was 0) C will be 0; otherwise C will be set by the following incmod instruction (as it is now). I may have misread this though; I find the testbX instructions with Z a bit hard to wrap my head around sometimes.

EDIT: Oops, something similar would have to be done to the next testbn as well, to handle the case where Z=0 and C=1 after the first test. Perhaps something like:

        test   modedata, #(1<<6) wcz
  if_nz test   scanline, #(1<<0) wcz
  if_z  incmod rowscan, rowheight wc
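One way to sanity check a flag rewrite like this is to enumerate the input bits and compare the Z and C values arriving at the `if_z incmod` line. The model below is only a sketch. It assumes that `mov c, 0 wc` leaves C clear (as the original post intends), that TESTBN with wz/orz sets or ORs Z with the inverted bit, and that TEST with wcz sets Z when the AND result is zero and C to the parity of the result. The function names are hypothetical.

```python
def parity(value):
    """1 if value has an odd number of set bits, else 0."""
    return bin(value).count("1") & 1


def original(mode_bit6, scan_bit0):
    """Flags before 'if_z incmod' in the posted listing."""
    c = 0                          # mov c, 0 wc (assumed to clear C)
    z = int(not mode_bit6)         # testbn modedata, #6 wz
    z |= int(not scan_bit0)        # testbn scanline, #0 orz
    return z, c


def proposed(mode_bit6, scan_bit0):
    """Flags before 'if_z incmod' with the two TEST instructions."""
    result = mode_bit6 << 6        # test modedata, #(1<<6) wcz
    z, c = int(result == 0), parity(result)
    if not z:                      # if_nz test scanline, #(1<<0) wcz
        result = scan_bit0
        z, c = int(result == 0), parity(result)
    return z, c


cases = [(m, s) for m in (0, 1) for s in (0, 1)]
mismatches = [case for case in cases if original(*case) != proposed(*case)]

for m, s in cases:
    print("bit6=%d scan0=%d original Z,C=%s proposed Z,C=%s"
          % (m, s, original(m, s), proposed(m, s)))
print("cases that differ:", mismatches)
```

In this model the two sequences agree except when modedata bit 6 and scanline bit 0 are both set: there Z is clear, so the incmod is skipped, but C is left set. Whether that matters depends on the later `if_c` instructions in the listing, so that case is worth checking on real hardware.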

Cluso99 · 2019-11-14 12:17

Can you use the add instruction 2 lines above unless overflow is possible?

increment add a, #4-0 wc

ersmith · 2019-11-14 12:26

Cluso99 wrote: »

Can you use the add instruction 2 lines above unless overflow is possible?

increment add a, #4-0 wc

No, because the old value of C is needed in the test for the instruction after that add.

rogloh · 2019-11-14 14:19

ersmith wrote: »

. Perhaps something like:

        test   modedata, #(1<<6) wcz
  if_nz test   scanline, #(1<<0) wcz
  if_z  incmod rowscan, rowheight wc

Thanks Eric, I'll think about this - at first glance it seems it might work, but I need to check it carefully tomorrow. I've been burned before with logical cases that slip through and break after you think you've cracked it. You are right about the logic tests with Z flag - it's a hard one to reconcile sometimes. The worst is testbn with Z, it's like double reversal and it sometimes does my head in. I've also been making mistakes with habitually typing in "test" instead of "testb" and bad things obviously happen when you do that too.

Cluso99 wrote: »

Can you use the add instruction 2 lines above unless overflow is possible?

increment add a, #4-0 wc

Cluso, that's exactly the approach I am very interested in too. It won't overflow because it's a 512kB hub address, but as Eric mentioned I need to use the C flag on the following line. That's why I was looking at re-ordering cursor test code above to free up the C flag but perhaps Eric found me another long to keep a different way.

If I have to do a special xzero for NTSC phase error fix per frame as potatohead suggested that will cost at least another instruction somewhere, so the more spare I find the better things can be, to fix any last lingering errors. Thankfully I don't need any more features now but I think keeping a handful of longs is better than none at all.
