All PASM2 gurus - help optimizing a text driver over DVI?


Comments

  • rogloh Posts: 1,446
    edited 2019-11-07 - 06:04:13
    ersmith wrote: »
    Sounds pretty interesting, Roger!

    I've updated my VGA ANSI driver to support 2 bytes/char (using the standard VGA format that you also use) and 1 byte/char (7 bit ASCII plus 1 bit blinking effect for the cursor). It should be pretty straightforward to adapt the ANSI character output code in vga_text_routines.spinh to your driver.

    The source code is in github at:
    https://github.com/totalspectrum/p2_vga_text

    Great. I'll have to take a look at interfacing with your character driver too when I get a chance, hopefully it will be easy. I don't have any high level wrapper code done yet, but will need some to be able to control/initialize things, and I was thinking of checking out your flexgui soon so I can wrap some of this in SPIN2 and/or seeing what Chip releases officially.
    Cluso99 wrote: »
    Great news Roger. Your options are really impressive!

    Thanks Cluso. I want to try to make good use of it soon myself for various things, but have been flat out developing/debugging it in my spare time. Am hoping to release it soon, as people will start to get more RevB's and could use it. When I get things back into a decent shape I should release it as a working beta first, as there are still a few untested things. I will try to see if my github account still works and see if I can upload it there (first I need to learn how to create my own repo online, lol), and/or post a zip file containing it.
    TonyB_ wrote: »
    What about all of us with RGB TTL monitors?
    I sort of guessed you might be only joking but it's been nagging me so I had another think about this....and maybe it is really possible depending on how hsync can be timed. If I set up the output DACs to 3.3V levels and repurpose the normal sync output pin that the streamer uses as the CGA monitor's "I" intensity channel, and manage hsync separately on another pin, like I already do for 5 pin VGA's vsync, I might just be able to generate the 16 TTL colours a CGA monitor uses with the 16 entry LUT. So if I manually toggle a separate hsync pin after several back to back streamer commands on a scanline that resync the P2 CPU to the streamer, it might just be synchronized enough to not add any noticeable pixel jitter, without using the streamer to drive sync. I'm thinking it could actually work in the LUTRAM modes. The main problem is I don't have a CGA monitor to test it out, but I can try to provide this feature if someone wanted to try it out at some point - a little later though. The good thing is now with my sync code overlay it would be possible to have a different hsync mode that works like this. Perhaps the P2 output pins in such a mode would have to be constrained to be something like IBGRHV (in increasing pin order starting from a 4 pin group), as I don't have all the configuration bits for total independent sync pin control in this special mode right now, dunno maybe I can extend it somehow later. But that could still be okay for those desperately needing it....

    Edit: Just don't ask me for supporting old EGA monitors with their 6pin TTL colour. That one I can't do with this driver, unless I move to 8 pin parallel outputs...hmm.
  • Noooo! Colour TTLs were a joke all their existence.

    "We suspect that ALMA will allow us to observe this rare form of CO in many other discs.
    By doing that, we can more accurately measure their mass, and determine whether
    scientists have systematically been underestimating how much matter they contain."
  • rogloh Posts: 1,446
    edited 2019-11-07 - 07:19:46
    Don't worry I'm not really spending time working on it right now... :smile:

    I've coded up the palette fixes, added interlaced field source selection and a proper field/frame counter plus fixed a couple of extra issues I found, and added a sweet special bonus feature...

    I have the sprite pass through mode to add and possibly a common mouse co-ordinate feature left to go if there is room for that too. I'm now down to my last 8 longs, but if necessary I think I can steal about 5 more if I get desperate by rearranging some data structure items. I always like to leave some up my sleeve, somehow reminds me of Adrian Edmondson's "Emergency Bitter!".
  • In my experience, more can always be squeezed in, no matter how tight you think it already is. As someone said on the forum here, it's a sickness. Might have been PhiPi that said that.
  • rogloh Posts: 1,446
    edited 2019-11-09 - 06:17:10
    I've added in code for a transparent pass through mode to this driver which should also allow 1:1 P2/pixel clock operation for basic frame buffer output from internal/external memory, without text/mouse support. Am still testing and it is also designed to work with wrapping, but of course no pixel doubling. Only took 2 longs thankfully by sharing an existing code pathway.

    I also have added in a sprite mode capability (2 more longs), which uses the same pass through mode with source wrap around to send out a series of source scanlines repeatedly throughout the sprite display region, instead of wrapping just once. This means external COG(s) running tiles+sprites generation code can render into these scanline buffers and my driver will just output it automatically for any given region. The mouse may or may not be supported in that region's mode, I am not sure yet, but any sprite COG could always do a mouse sprite anyway.

    After this was put in I found some further simplification/sharing in the code and found about 8 longs! Plus if I use the interrupt longs at $1F0, I can get 6 more COGRAM registers. I'm not using interrupts in this code so I think I can make use of this space with any luck. Luxury!

    I still want to add the global/local region mouse stuff and may like to add a way to have the bottom border forced during screen display after a given scanline so that vertical wipe transitions can be done really easily. This transition is still achievable in another way by dynamically adjusting region sizes etc, but this requires extra client side calculations based on region sizes in the entire display list. If I can do it in just 3 longs I might.

    That should be just about it for features I hope.

    The only other issue I know about is the mouse generation. In graphics/text modes I can easily generate it at the end of the computed scanline where there is time before hsync. However for the external mailbox/sprite modes, it is best to render it as late as possible (i.e. ideally during/after hsync), to give the most amount of time for the external data to be ready after it is requested. The problem in doing it late is that sometimes there might not be enough clocks to perform the mouse calculations when the back porch is small, and we also need to load the 16-entry palette at this point too. It's a fine balancing act.

    Update: Full VGA framebuffers with P2 clocks of 25.2MHz are working! So much lower power operation should be possible now with this video driver (obviously any text mode would need additional COG(s) to populate the line buffer). In theory this feature will just work at all higher resolutions too as clocks and pixels scale up. I haven't tried NTSC/PAL at 13.5MHz equivalent yet, but it should work the same as long as there are enough clocks per scanline to do all housekeeping needed. Should be.
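
    A quick sanity check of the 25.2MHz figure, as a Python sketch. It assumes the usual 800x525 total frame for 640x480 VGA (these totals are an assumption for illustration, not taken from the driver source). With a 1:1 P2 clock to pixel clock ratio, the P2 clock is simply the pixel clock:

```python
# Assumed standard 640x480 VGA totals (active + blanking); not from the driver.
H_TOTAL = 800   # 640 active pixels + 160 blanking pixels per line
V_TOTAL = 525   # 480 active lines + 45 blanking lines per frame


def pixel_clock_hz(h_total, v_total, refresh_hz):
    """Pixel clock needed to scan h_total x v_total at refresh_hz frames/s."""
    return h_total * v_total * refresh_hz


vga_clock = pixel_clock_hz(H_TOTAL, V_TOTAL, 60)

# In pass-through mode one P2 clock emits one pixel, so each scanline
# gives the COG exactly H_TOTAL clocks for all of its housekeeping.
clocks_per_scanline = H_TOTAL

print(vga_clock, clocks_per_scanline)
```

    The same formula gives the clock budget for any other mode once its total line and frame sizes are known.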

    Update2: Sprite mode looks to be working as well. Even though I don't have a sprite driver coded and hooked into it yet I see my 3 setup source scanlines (configurable from 1-n) repeating over and over in its display region which is basically what is needed along with the per scanline status counter update to hub memory that I had already coded previously. External sprite COGs should be able to hook right into this architecture.

    Update3: I added the bottom border stuff and it is working too. Only the mouse change left now.
  • @rogloh,

    I haven't had time to test your driver, but your features are impressive!

    Thanks for doing this, I not MAY but WILL use this for sure...

    Enjoy!

    Mike
    I am just another Code Monkey.
    A determined coder can write COBOL programs in any language. -- Author unknown.
    Press any key to continue, any other key to quit

    The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this post are to be interpreted as described in RFC 2119.
  • Mike,
    You might have some trouble in doing that, Roger hasn't posted the source yet. ;)

    "We suspect that ALMA will allow us to observe this rare form of CO in many other discs.
    By doing that, we can more accurately measure their mass, and determine whether
    scientists have systematically been underestimating how much matter they contain."
  • Hope he does soon. :D I have time coming up and am itching to play.

    Impressive for sure. Ideally, this one can be a general driver like TV.spin and graphics.spin were.

    And that's the next fun task. Get primitives working in the various modes, so that a multi zone display can make sense. And this time, graphics.spin is callable from HUB, no need to dedicate a COG.

    Did tiles make it into the feature set?

    Do not taunt Happy Fun Ball! @opengeekorg ---> Be Excellent To One Another SKYPE = acuity_doug
    Parallax colors simplified: https://forums.parallax.com/discussion/123709/commented-graphics-demo-spin
  • I am now feature complete in the PASM and it all fits nicely, I am just integrating my dynamic sync code for SDTV/HDTV into the build instead of patching it in manually for testing. That's the last coding to be done/tested today. Then it could be released certainly as a high quality beta if not the final driver code, which is imminent. I just want a quick tidy up and update its documentation where required for some re-arranging of data I've done to optimize things.
    potatohead wrote: »
    Did tiles make it into the feature set?

    Tiles are compatible with my driver yes, it will just need secondary COG(s) to render its output into a scanline. The model is that my driver COG is the display COG and other COGs can access it as required for display. They can use their own region(s) or control the entire screen. It's very flexible.

    I will probably try to work on an example sprite COG after this driver COG is out, but others will be able to do them too.

    Newer features recently added:

    -interlaced source support
    -svideo/composite/component video
    -global mouse/local mouse select per region
    -mouse display/hide bit
    -dynamic colorspace loading per frame for fades/effects in analog output modes
    -optional programmable width side borders (ideal for SDTV 720 pixel modes for 80 columns etc)
    -absolute top/bottom border control also overriding region sizes
    -now enables (external) tiles+sprite mode capability using continuous scanline group wrap around
    -transparent pass through mode for low power 1:1 P2 clock / pixel clock ratios
  • This is going to be great

    Dare I ask how many longs were left over?
  • Down to 4 at the moment (almost deserves another feature!) but my tidy up might save some.
  • As a suggestion, put the string “UROK” in there, because seriously, Rogloh, you do! Epic opus there.
  • rogloh Posts: 1,446
    edited 2019-11-11 - 04:18:43
    LOL. Well the feature I am thinking about is interlaced font support. Right now the driver supports independent source data for graphics when interlaced (or if page flipping is enabled), but the text is simply duplicated between fields. Ideally each text region in a field could send every second scanline of a font; that way text is not doubled in size and you get the full 1080 line resolution in text mode if the TV builds a frame from two fields. I think this makes sense and I'd like to try to put it in if it fits.
  • Very nice!! You wrapped up a ton of work into one driver.

    Re: Tiles. Got it. Render whatever we want into one or more scan line buffers and go. Easy peasy.

    IMHO, if it fits, interlaced font support may get used.

    We may be surprised at what VGA capable LCD panels intended as monitors will do. Same for some HDTV sets.

    Baggers and I did a sprite driver on P1 that employed interlaced VGA 640x480. We wanted the very lowest horizontal sweep to remain compatible, and get 320 pixels on a line on P1 at 80MHz.

    To my complete surprise, both my plasma TV (Samsung 120Hz 3D smart TV) and some random VGA LCD just deinterlaced it nicely. On the TV, I did find that capability buried in the technical service manual for the set. Game mode turns it off, but otherwise, it's on as one of several "reduce noise" features. Works wonders with SD material on DVD, etc... Hard to tell it was ever interlaced.

    The LCD is some 90's era backlit LCD and it just transparently does the work. No options, just happens. I suspect it was there for older PCs, which would sometimes require interlace to do 1024x768, which is that LCD's native resolution.

    Did I read correctly that some of the interlaced HDMI modes, or hacks of modes potentially including interlaced, get us some extra resolution at the pixel clocks P2 can do? If so, yeah. Interlaced fonts would make sense on that basis too.

    I personally may use it with older, high persistence phosphor CRT displays I want to drive. A nice 80x50 on those looks amazing, but that's a one-off.

    Can't wait to tinker with the HDMI stuff. My TV supports that 24p cinema mode explicitly. I've only seen it triggered, and displayed as such in the little, optional notification window, with the one 24p Blu-Ray I have. It's a copy of "How The West Was Won" and the difference is obvious the first time a camera dollies into a complex, moving scene, like downtown. There may be more there to explore.

    Anyone have recommendations on a small 1080i capable display? I think I am going to buy one after the holidays.

    And zones! You make me think of my Atari 8 bit and its display lists, and the Amiga. Fun times.

    BTW, a couple years back when we were still on the FPGA, I did a horizontal zone, switching from 2bpp to multiple bpp on the same scanline. That was buggy and Chip fixed it. I am pretty sure a cycle or two was getting lost, and that it no longer happens that way due to the whole thing being double buffer commands to the streamer.

    Someday, we may make use of zones in both directions. (maybe demoscene type stuff)

    Thanks in advance. You did us all a lot of hard work. Appreciated.
  • rogloh Posts: 1,446
    edited 2019-11-11 - 12:45:15
    I just added the interlaced text feature. So that's it, the COGRAM is totally full now, including the $1f0-$1f5 space too. Now I can do the tidy up, docs and release it.

    I ran into one weird thing today though while tweaking things, and I'm still trying to figure it out. When the LUTRAM has a particular combination of code in it, adding or removing a single nop in LUTRAM makes the difference: with the extra nop in place the video output runs, and without it the output appears to hang. The hsync output looks like totally random data on the scope in this bad case. Changing the size of the code in the LUT affects this too. I'm hoping it might be related to getting sync patterns in the palette (as I'm not yet cleaning the LUT on initialisation after running code from it - I will soon to rule that one out) because if it is not that, I don't know what is causing this issue. The other things might be call/relative jump distances but they seem okay and the p2asm tool warns you about that too I think.

    Update: Solved the source of the annoying/unstable bug I hit today. I originally had setq2 #255 then rdlong into LUT RAM, but I just started tripping over the $100 long code size when things got moved around. No drama, I have more LUT space since I relocated things lower, so I can change it up to setq2 #511 now. Glad that's resolved!
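
    For anyone who hits the same thing: the block transfer count is "N-1", so setq2 #255 moves exactly 256 ($100) longs and anything past that silently never arrives. A small Python sketch of the arithmetic (the $105 code size below is a made-up example, not the driver's real size):

```python
LUT_LONGS = 512  # LUT RAM size in longs


def setq2_operand(num_longs):
    """Operand for a block transfer of num_longs longs (count is N-1)."""
    if not 1 <= num_longs <= LUT_LONGS:
        raise ValueError("block must be 1..512 longs to fit in LUT RAM")
    return num_longs - 1


def longs_missing(code_longs, operand):
    """How many longs of the image are NOT loaded with this operand."""
    return max(0, code_longs - (operand + 1))


# Hypothetical image that has grown to $105 longs:
code_size = 0x105
print(longs_missing(code_size, 255))  # tail of the code is cut off
print(longs_missing(code_size, 511))  # whole image loaded
print(setq2_operand(code_size))       # tightest operand that still works
```

    Computing the operand from the assembled code size, rather than hard-coding it, would avoid the problem coming back when things move around again.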
  • rogloh Posts: 1,446
    edited 2019-11-12 - 05:11:43
    I've added in the ability to clock the pixels out at fractional rates instead of purely synchronously to the P2 clock. This probably only makes sense in VGA output modes where you might have the P2 running at say 180MHz but want the dot clock output to be 25.2MHz. In this case the ratio is not a whole number but actually ~7.143.
    Previously you would have to simply set it to be 7 and perhaps adjust sync widths etc to try to get your 60Hz output if you wanted that.
    Now if you wish you can set the divider to be 7*256 + 0.143*256, it will average out to be close to 25.2MHz, though some pixels will be sent in 7 clocks and some in 8 clocks. I am looking at my Dell LCD and can't even see these wider pixels, though this monitor's scaler is really good and can probably hide it well, others might not look as nice. I'm making good use of the P2 cordic engine to do the 64 bit division now instead of my original look up table approach with limited range.
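
    To make the 7/8 clock mix concrete, here is a rough Python model of a fractional divider. It assumes an 8.8 fixed-point value (as the 7*256 + 0.143*256 form suggests) and a simple accumulator; this is only an illustration of the averaging, not how the driver or the streamer actually implements it:

```python
def fractional_divider(p2_hz, dot_hz, frac_bits=8):
    """P2 clocks per pixel as a fixed-point integer (assumed 8.8 format)."""
    return round(p2_hz / dot_hz * (1 << frac_bits))


def pixel_widths(divider, n_pixels, frac_bits=8):
    """Clocks spent on each pixel when the fraction is carried forward."""
    mask = (1 << frac_bits) - 1
    acc = 0
    widths = []
    for _ in range(n_pixels):
        acc += divider
        widths.append(acc >> frac_bits)  # whole clocks for this pixel
        acc &= mask                      # keep only the fractional part
    return widths


divider = fractional_divider(180_000_000, 25_200_000)
widths = pixel_widths(divider, 256)
effective_hz = 180_000_000 * 256 / divider

print(divider, widths.count(7), widths.count(8), round(effective_hz))
```

    Over any run of 256 pixels the total clock count equals the divider value, so the average dot clock lands within a few kHz of 25.2MHz even though no single pixel is exactly 7.143 clocks wide.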

    One thing I've found while porting my NTSC/PAL video code into the main driver is that the text in NTSC/PAL video modes doesn't look nearly as nice as it did during my experiments. It is flickery and I'm seeing more chroma crawl, so I might have lost something when bringing over the sync code, or I now have bad colour space params etc. I'm still checking to see if it can be improved (hope so).

    Update: looking at the video output on the scope it appears that I have somehow lost the small 1.6us quiet back porch portion right after the colour burst compared to the stable image I see from Chip's demo. I know I compute this interval dynamically in my new code so I'll find the problem.
  • rogloh Posts: 1,446
    edited 2019-11-12 - 06:52:51
    Here's an example of the issue I have... this is S-Video (Y+C) NTSC interlaced output from my driver and I just took a photo of my LCD monitor. Some text looks great, crisp and clear, but other portions seem to be smeared/fading out (see the blue on green text). On the screen these parts are slowly shimmering/phasing and those parts of the text are unreadable. Maybe this is a normal part of certain TV colour combinations or it has something to do with the chroma frequency not being properly locked. Not sure yet.

    The composite video output version looks pretty nasty compared to it, very shaky/flickery in these unreadable portions. I'll have to feed these signals into my plasma TV to see how it compares, perhaps some of it is related to the LCD monitor's capabilities too.

    Update: Both S-video and composite look slightly better on the plasma but still not 100%. There's probably still something wrong with the clock I think. It's not locked properly to the line rate or something.

    IMG_2566.jpg
    1280 x 960 - 224K
  • potatohead Posts: 9,817
    edited 2019-11-12 - 07:22:46
    Could be your color burst reference, relative to the porch, is drifting.

    To check this, get a single pixel, vertical, white line where the pixel is half a color burst cycle. In resolution terms, that is 320 pixels or greater. Put it against a black background.

    Crank the color saturation up and that line should artifact, and on composite that is normal. On Y C it should not, or won't very much, depending on your set.

    The hue of that artifact will be constant when the color burst timing relative to that little porch stays constant. Typically, on P2 with the NCO, it drifts some and will cycle through all hues over some period of time.

    The artifact on composite will be a color tint added to the line as its higher frequency component is seen as color info. Looks a bit like a shimmer when the timing error results in a high difference frequency. (In the Hz range, typically)

    You can do an xzero at end of frame to correct this. For interlaced, you may have to do it on one odd and one even line, near end of frame to preserve proper phase changes on each field. I did not actually do that for an interlaced field.

    Secondly, you can tweak the timing just a shade to make it some even divisor to eliminate creep, due to the small fractional error. Maybe couple that with xzero.
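
    As a rough illustration of where a small fractional error can come from, here is a generic phase-accumulator model in Python. The 32-bit width and the 180MHz system clock are assumptions for the sketch, and it does not model the P2's actual colour modulator registers:

```python
from fractions import Fraction

NTSC_FSC_HZ = Fraction(315_000_000, 88)  # NTSC colour subcarrier, ~3.579545MHz


def nco_word(target_hz, sys_hz, bits=32):
    """Nearest integer tuning word for a generic phase accumulator."""
    return round(Fraction(target_hz) / sys_hz * (1 << bits))


def nco_actual_hz(word, sys_hz, bits=32):
    """Exact frequency that the rounded tuning word really produces."""
    return Fraction(word * sys_hz, 1 << bits)


sys_hz = 180_000_000                      # assumed system clock
word = nco_word(NTSC_FSC_HZ, sys_hz)
error_hz = nco_actual_hz(word, sys_hz) - NTSC_FSC_HZ

# A frequency error of e Hz slips one full colour cycle (all hues) every
# 1/|e| seconds if nothing ever resets the phase.
if error_hz != 0:
    print(float(error_hz), float(1 / abs(error_hz)))
```

    In this model, picking a system clock for which the tuning word comes out exact, or resetting the phase once per frame, both stop the slow slip from accumulating.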

    My guess anyway.

    Some gross phase differences just do not resolve well too, depending on the display.

    Really, even with Y C, there are only 160 (320 interlaced horizontally when phase change is happening properly) color cycles per active line. A blue text on red or green background at higher resolution exceeds the information carrying capacity of the signal. Composite is the worst, with Y C being somewhat better.

    Imagine each of those as a rainbow repeating over and over and the spatial precision of color info is quite coarse. 320 pixels is about the max, again with color phase change happening properly. Without it, or a constant phase, that goes down to 160 pixels in the safe region, maybe 256 with some fringing.

    It is hard to tell from a static image, but that signal is not bad. You appear not far from peak composite / S-video there.

  • rogloh Posts: 1,446
    edited 2019-11-12 - 08:22:04
    potatohead wrote: »
    Could be your color burst reference, relative to the porch, is drifting.
    It wouldn't surprise me. When I scoped it and triggered on the rising sync edge, I saw the colour burst rolling along on the scanline, so it's probably not locked and slipping somewhere.
    It is hard to tell from a static image, but that signal is not bad. You appear not far from peak composite / S-video there.

    Yeah for images like the bird one it actually looks reasonably good from a normal distance, unless there is static/random noise displayed, then the colours start to really phase in and out and don't stay stable. The text is probably pushing the resolution colour limit as you mentioned. S-Video looks a whole lot better than the composite output.

    I think I'm not going to spend too much time hunting it down at this point. It would be better to release this code sooner, so others can try it and perhaps help us resolve some of the TV timing details if things can be improved further.

    I just want to get the last component SD/HD sync setup stuff into the driver as that looks rock solid on my plasma in Chip's and my own experimental code. That's really the final bit of integration work. The output side is all complete (the COG is full), it's just the initialization code in LUTRAM I'm touching now.

    By the way the interlaced text mode I added looks nice (not shown above), and it shrinks the text back down to a normal size.
  • rogloh Posts: 1,446
    edited 2019-11-13 - 10:28:55
    Was changing some S-video parameter setup code and I have absolutely no idea what I changed without going back to older versions but now my S-video images+text seem really nice compared to what I had yesterday, and pretty much back to how I remembered it looking the first time. In comparison though composite still seems rather ripply. Something's still not quite right there perhaps.

    In other updates, I have the progressive component HD and progressive component SD integrated into the driver now and it's working and looks rock solid at 50 & 60Hz with 480p/576p/720p. Just have the interlaced SD component and the tricky interlaced 1080i component sync variants left to put in. I'm wondering what to do about SD progressive s-video and progressive composite outputs if ever selected. I haven't thought about those two combinations yet. Dealing with all the different sync choices is actually quite challenging and the dynamically patched code is getting difficult to decipher.

    Update: 1080i tri-level sync interlaced component is in and working now on my LCD and plasma.
  • potatohead Posts: 9,817
    edited 2019-11-13 - 07:27:34
    About the only time 240p will come up is old computer and game consoles. I would skip it, if code is tight. Few will want it. Those that do (me maybe) can just do it anyway.

    Omit the half scan line and just repeat first field and that will work for a lot of cases. 30 Hz frame consisting of two fields becomes one 60 Hz frame, one field.

    Older, more hacks than official signals, like Apple 2 or Atari, can either be approximated (and potentially look better), or those will get done as a special case, maybe even skipping the color system entirely. The P1 video system methods could be used with better results on P2, IMHO. Better DACs.

    For that composite shimmer, an xzero at end of field may cure the shimmer you are seeing. That is a small timing error due to NCO error accumulation. The timing between the color burst and the little porch leading into active pixels will vary a little. This has been present on P2 since early days. Little work went into resolving it, beyond using xzero to stabilize it.

    P1 used a PLL that did not have that issue. (Well, it did, but was stable, only changing on cold start) A precision tweak will be needed to make it stable on P2, or a hack like xzero in the right place will keep resetting things before they accumulate and become visible.

    Or, we don't care and just use video, component or rgb at the low, 15kHz sweeps.

    That is my take on it anyway.

    Just got my rev B board! Fun weekend coming up.

    Post this thing already! Asking for a friend :D
  • potatohead wrote: »
    Post this thing already! Asking for a friend :D
    LOL. All good things come to those that wait...

    Thanks for the suggestions on the potential interlaced TV video fixes.

    I visited Tubular and Ozpropdev today and gave them a demo of this driver and all different outputs and some of the features it supports. Here's a photo of what can be achieved with the P2 and this video driver at full power. It is 1080i over HD component with interlaced text at 8 scanlines high (a CGA font) and 240 columns wide in text mode. The P2 is running overclocked at ~350MHz for this with the single COG doing everything. You can see the mouse sprites in the two regions.

    In double wide mode, which is still 120 columns, you can generate this with (only) a 297MHz P2. :smile:

    In theory you could do this using pass through down at 74.25MHz but you'd need other COG(s) to render the text into scanline buffers instead of having the video driver do it for you. How many COGs you'd need exactly for hitting that I'm not sure but it could well be more than one. Not shown but it can even work with a 6 scanline font which gives 240x180 text resolution (just mad). This is on the Dell LCD, an HDTV may not have the same fine colour resolution.
    4000 x 3000 - 4M
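
    The text grid sizes quoted above follow directly from the raster and font sizes. A small Python sketch of the arithmetic, assuming a 1920x1080 active raster, an 8 pixel wide font, and the standard 74.25MHz 1080i pixel clock (the function names are just for illustration):

```python
def text_grid(h_pixels, v_lines, char_w, char_h):
    """Columns and rows of text that fit in the active raster."""
    return h_pixels // char_w, v_lines // char_h


def clocks_per_pixel(p2_mhz, pixel_mhz=74.25):
    """P2 clocks available per output pixel at a given system clock."""
    return p2_mhz / pixel_mhz


normal = text_grid(1920, 1080, 8, 8)   # 8 scanline CGA font
wide = text_grid(1920, 1080, 16, 8)    # double wide: each char is 16 pixels
tiny = text_grid(1920, 1080, 8, 6)     # 6 scanline font

print(normal, wide, tiny)
print(clocks_per_pixel(297), clocks_per_pixel(350))
```

    At 297MHz the P2 has a whole 4 clocks per pixel, while the ~350MHz overclock gives about 4.7, and pass through at 74.25MHz is the 1:1 case.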
  • Yeah that was simply amazing to see all that. The Dell monitor has 5 inputs and Roger drove each of those 5 interfaces in turn with just a few tweaks of the code.

    Roger was also excited because he had discovered a spare long earlier in the day lol...

    Sorry I had to head off early. At the talk this evening this founder of a $200m company was joking about "who codes in machine code anymore"... you could have proudly answered in the affirmative
  • It was very impressive Roger!
    Now, what to do with that pesky spare long! :lol:
    Melbourne, Australia
  • rogloh Posts: 1,446
    edited 2019-11-14 - 11:40:55
    ozpropdev wrote: »
    It was very impressive Roger!
    Now, what to do with that pesky spare long! :lol:

    Saving it for any more critical bug fixes! :lol:

    Here's the code for text generation. I am hoping to shave off another long 11 lines from the end where it just needs to clear the carry flag (the c register doesn't get used after this point), can anyone see a way to do this or see other optimizations in the cursor code or elsewhere? I could clear the carry in the increment instruction two lines earlier but it gets used on the next line. Perhaps Z (setup with AND/OR/XORs) can be used on its own instead in the wrlong which would allow this.

    I reckon there is scope there for saving an instruction by better making use of the flags.
    ' Code to generate the next text scan line and cursor(s)
    gen_text    
                mov     b, rowscan              'build font table base address 
                shl     b, #8                   'for this font and row's scanline 
                add     b, fontaddr      
                setq    #64-1                   '64 longs holds 256 bytes of font
                rdlong  font, b                 'read in font data for scanline
    
                testb   modedata, #7 wz         'flashing / full colour background?
        if_z    setr    testflash, #$83         'use text flashing code test
        if_nz   setr    testflash, #$EA         'change into helpful zerox c,#15 wc
    
                testb   modedata, #5 wz         'pixel double test
                setq2   #120-1                  'read maximum of 120 longs from HUB
                rdlong  $110, ptra              'to get next 240 chars with colours
    p9          mov     pb, #$10f+COLS/2        'setup LUT read pointer at end
    p10 if_z    sub     pb, #COLS/4             '...of where character data is
    
                mov     save, ptrb              'save pointer register
                mov     ptrb, #$1ff             'setup write location in LUT RAM     
    
    p1  if_z    sets    adv, #COLS              'increase by half normal columns
    p2  if_nz   sets    adv, #COLS*2            'increase by normal columns
    
                mov     a, #%11000 wc           'reset initial skipf pattern
    p3  if_z    rep     @.endwide-2, #COLS/2    '2100 clocks for double wide mode
    p4  if_nz   rep     @.endnormal-2, #COLS    '2760 clocks for normal wide mode
                skipf   a                       'skip 2 of the next 5 instructions
                xor     a, #%11110              'flip skip sequence for next time
                rdlut   d, pb                   'read pair of characters/colours
                getword c, d, #0                'select first word in long (skipf)
                getword c, d, #1                'select second word in long (skipf)
                sub     pb, #1                  'decrement LUT read index (skipf)
                getbyte b, c, #0                'extract font offset for char 
                altgb   b, #font                'determine font lookup address
                getbyte pixels, 0-0, #0         'get font for character's scanline
    testflash   bitl    c, #15 wcz              'test (and clear) flashing bit
    flash if_c  and     pixels, #$ff            'make it all background if flashing
                movbyts c, #%01010101           'colours becomes BF_BF_BF_BF
                mov     b, c                    'grab a copy for muxing step next
                rol     b, #4                   'b becomes FB_FB_FB_FB
                setq    ##$F0FF000F             'mux mask adjusts fg and bg colours
                muxq    c, b                    'c becomes FF_FB_BF_BB
                testb   modedata, #5 wz         'repeat columns test, z was trashed
        if_nz   movbyts c, pixels               'select pixel colours for char
        if_nz   wrlut   c, ptrb--               'write coloured pixel data into LUT 
    .endnormal
                setword pixels, pixels, #1      'replicate low words in long
                mergew  pixels                  '...to then double pixels
                mov     b, c                    'save a copy before we lose colours
                movbyts c, pixels               'compute 4 lower colours of char
                ror     pixels, #8              'get upper 8 pixels
                movbyts b, pixels               'compute 4 higher colours of char
        if_z    wrlut   b, ptrb--               'save it to LUT RAM
        if_z    wrlut   c, ptrb--               'save it to LUT RAM
    .endwide
                mov     ptrb, save              'restore ptrb
    p5          setq2   #COLS-1                 'write all column pixels to HUB RAM
    p11         wrlong  $200-COLS, ptrb         'from LUT storage
    
                'do both cursors here
                bitz    increment, #2           'setup whether cursor is doubled
                bitz    scale, #0               'and multiply offset accordingly
    
                rep     @.endcursor, #2         'repeat twice for two cursors
                mov     c, cursor1+0            'get cursor data
                xor     $-1, #1                 'alternate cursors
                getbyte a, c, #1                'get cursor's x position (col)
                getword b, c, #1                'get cursor's y position (row)
    scale       shl     a, #2+0                 'transform x to long address offset
                add     a, ptrb                 'add offset to start of line buffer
    cursflash   testbn  fieldcount, #4-0 wz     'get flashing bit from field count
                testb   c, #4 xorz              'select the blink phase to apply
                testbn  c, #6 orz               'if blink enabled, apply a flash    
        if_z    cmp     b, row wz               'check if cursor is on this row
                testb   c, #7 andz              'check if cursor is enabled
                testb   c, #5 wc                'check solid/underline type cursor
                mov     b, rowheight            'get last scanline of character
                sub     b, #2                   'subtract 2 scanline high cursor
        if_nc   cmpr    rowscan, b wc           'compare this scanline count
                setnib  c, c, #1                'replicate colour nibble in byte 0
                movbyts c, #0                   'replicate colour over all 8 pixels
     if_c_and_z wrlong  c, a                    'write cursor color to line buffer
    increment   add     a, #4-0                 'advance to next long if wide text
     if_c_and_z wrlong  c, a                    'repeat to get double wide cursor
    .endcursor
                mov     c, 0 wc                 'any way to eliminate this instrn?
                testbn  modedata, #6 wz         'z=1 if line doubling off
                testbn  scanline, #0 orz        'z=1 if second line
    if_z        incmod  rowscan, rowheight wc   'z=1 & c=1 if wrapped
    
                testb   modedata, #10 andz
    if_z_and_c  add     rowscan, #1
    if_z_and_nc incmod  rowscan, rowheight wc
    
        if_c    add     row, #1                 'advance to next row
    adv if_c    add     ptra, #0-0              'advance by half or full columns
        if_c    add     ptra, skew              'allows windowing into wider screen
                jmp     #selectbuf              'select next buffer to write to
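
    The colour expansion in the listing (movbyts / rol / setq+muxq, then movbyts c, pixels) can be modelled off-target. What follows is a minimal Python sketch, not the driver itself. It assumes MOVBYTS builds each result byte from the source byte chosen by a 2-bit selector, and that MUXQ computes (D AND NOT Q) OR (S AND Q). The helper names colour_table and colour_pixels are hypothetical, and only the normal-width path is covered.

```python
MASK32 = 0xFFFFFFFF

def movbyts(d, s):
    """Model of MOVBYTS D,S: 2-bit field i of S[7:0] picks which byte of D becomes result byte i."""
    out = 0
    for i in range(4):
        sel = (s >> (2 * i)) & 3
        out |= ((d >> (8 * sel)) & 0xFF) << (8 * i)
    return out

def rol(d, n):
    """Model of ROL D,#n on a 32-bit value."""
    n &= 31
    return ((d << n) | (d >> (32 - n))) & MASK32

def muxq(d, s, q):
    """Model of MUXQ D,S after SETQ q: take bits from S where q is 1, keep D elsewhere."""
    return (d & ~q & MASK32) | (s & q)

def colour_table(attr):
    """Hypothetical helper: build FF_FB_BF_BB from a colour byte (background high nibble, foreground low)."""
    c = attr << 8                    # colour byte sits in byte 1 of the character word
    c = movbyts(c, 0b01010101)       # BF_BF_BF_BF
    b = rol(c, 4)                    # FB_FB_FB_FB
    return muxq(c, b, 0xF0FF000F)    # FF_FB_BF_BB

def colour_pixels(attr, font_byte):
    """Hypothetical helper: turn 8 font bits into eight 4-bit pixels (movbyts c, pixels)."""
    return movbyts(colour_table(attr), font_byte)

if __name__ == "__main__":
    print(hex(colour_table(0x1E)))          # background 1, foreground E
    print(hex(colour_pixels(0x1E, 0x0F)))
```

    In this model each font bit k selects nibble k of the output long, so one movbyts colours all eight pixels of a character scanline at once.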
    
    
  • ersmith Posts: 3,523
    edited 2019-11-14 - 12:25:33
    Could you re-write "testbn modedata, #6 wz" to "test modedata, #1<<6 wcz"? Then if Z is set (so the bit was 0) C will be 0; otherwise C will be set by the following incmod instruction (as it is now). I may have misread this though, I find the testbX instructions with Z a bit hard to wrap my head around sometimes :)

    EDIT: Oops, something similar would have to be done to the next testbn as well, to handle the case where Z=0 and C=1 after the first test. Perhaps something like:
            test   modedata, #(1<<6) wcz
      if_nz test   scanline, #(1<<0) wcz
      if_z  incmod  rowscan, rowheight wc
    
  • Can you use the add instruction 2 lines above unless overflow is possible?

    increment add a, #4-0 wc
  • Cluso99 wrote: »
    Can you use the add instruction 2 lines above unless overflow is possible?

    increment add a, #4-0 wc

    No, because the old value of C is needed in the test for the instruction after that add.
  • ersmith wrote: »
    . Perhaps something like:
            test   modedata, #(1<<6) wcz
      if_nz test   scanline, #(1<<0) wcz
      if_z  incmod  rowscan, rowheight wc
    

    Thanks Eric, I'll think about this. At first glance it seems it might work, but I need to check it carefully tomorrow. I've been burned before by logical cases that slip through and break things after you think you've cracked it. You are right about the logic tests with the Z flag; they can be hard to reconcile. The worst is testbn with Z: it's like a double reversal and it sometimes does my head in. I've also been making mistakes by habitually typing "test" instead of "testb", and bad things obviously happen when you do that too.
    Cluso99 wrote: »
    Can you use the add instruction 2 lines above unless overflow is possible?

    increment add a, #4-0 wc

    Cluso, that's exactly the approach I am very interested in too. It won't overflow because it's a 512kB hub address, but as Eric mentioned I need to use the C flag on the following line. That's why I was looking at re-ordering the cursor test code above to free up the C flag, but perhaps Eric has found me another way to save a long.

    If I have to do a special xzero per frame to fix the NTSC phase error, as potatohead suggested, that will cost at least one more instruction somewhere, so the more spare longs I find, the better placed I am to fix any last lingering errors. Thankfully I don't need any more features now, but I think keeping a handful of longs free is better than having none at all.
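
    The no-overflow argument for "add a, #4 wc" comes down to simple arithmetic. Here is a minimal Python sketch, assuming ADD with WC reports the unsigned carry out of bit 31 and that a always holds a hub address below 512 KB. The helper name add32_carry is hypothetical.

```python
HUB_BYTES = 512 * 1024          # hub RAM size quoted in the post

def add32_carry(d, s):
    """Model of ADD D,S WC on 32-bit values: returns (result, carry out of bit 31)."""
    total = d + s
    return total & 0xFFFFFFFF, total >> 32

if __name__ == "__main__":
    worst = HUB_BYTES - 1       # highest possible hub address
    print(add32_carry(worst, 4))
```

    Even the highest hub address plus 4 is far below 2^32, so under these assumptions the add would always leave C clear. As noted above, the catch is that the old C value is still needed by the instruction that follows the add.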