All PASM2 gurus - help optimizing a text driver over DVI?


Comments

  • rogloh Posts: 2,341
    edited 2019-12-20 - 00:56:03
    @TonyB_ I just tried out your optimization to save an instruction. With it, I appear to be able to get 40 column text working with flashing text at 1x clocks (no mouse). 80 columns seems tantalizingly close. I think it would be best not to force all text down to 40 columns when at 2x clocks, though, just to enable flashing text (which would be one way to do it). Better to disable the flashing capability altogether at this lower clock rate, and enable it at 3x/4x.
    How about bitz monoflash1,#20?
    I'll have to decipher what flipping bit 20 does to the testb instruction. It might work if it turns it into a bitc and this could save a few setup instructions if they can both be merged into the original 3 instruction block
  • rogloh wrote: »
    How about bitz monoflash1,#20?
    I'll have to decipher what flipping bit 20 does to the testb instruction. It might work if it turns it into a bitc and this could save a few setup instructions if they can both be merged into the original 3 instruction block
    I edited my post, monoflash1 is now the rdlut instruction with wc.
  • TonyB_ Posts: 1,487
    edited 2019-12-20 - 01:08:10
    TonyB_ wrote: »
    rogloh wrote: »
    How about bitz monoflash1,#20?
    I'll have to decipher what flipping bit 20 does to the testb instruction. It might work if it turns it into a bitc and this could save a few setup instructions if they can both be merged into the original 3 instruction block
    I edited my post, monoflash1 is now the rdlut instruction with wc.
    We'll get there in the end...

    It might be bitc monoflash1,#20, not bitz, depending on which flag you're using. The idea is to copy the global flash enable to the opcode write C bit.
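The bit number in that suggestion follows from the P2 instruction layout (EEEE OOOOOOO CZI DDDDDDDDD SSSSSSSSS), where bit 20 is the C-effect (WC) bit. As a rough illustration only, the Python model below mimics what a BITC on bit 20 would do to an instruction long held in a register. The instruction value is a hypothetical placeholder, not a real assembled RDLUT, and the helper names are made up for this sketch.

```python
# Model of "bitc monoflash1,#20": copy the C flag into bit 20 (the WC bit)
# of an instruction long. All names and the placeholder value are illustrative.
WC_BIT = 20
MASK32 = 0xFFFFFFFF

def bitc(dest: int, bit: int, c_flag: bool) -> int:
    """Return dest with the chosen bit forced to the value of the C flag."""
    mask = 1 << bit
    if c_flag:
        return (dest | mask) & MASK32
    return dest & ~mask & MASK32

def has_wc(instr: int) -> bool:
    """True if the instruction long has its C-effect bit set."""
    return bool((instr >> WC_BIT) & 1)

# Hypothetical instruction long with bit 20 initially clear.
PLACEHOLDER_INSTR = 0xF0001234

flash_enabled = bitc(PLACEHOLDER_INSTR, WC_BIT, True)    # global flash on
flash_disabled = bitc(PLACEHOLDER_INSTR, WC_BIT, False)  # global flash off
print(has_wc(flash_enabled), has_wc(flash_disabled))
```

With the global flash enable in C, the patched instruction would write C only when flashing is enabled, which is the effect described above. Whether this is usable in the real loop depends on what else touches C, as the following replies discuss.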
  • It might work with some other changes. monoflash2 is still setting C so that would have to be stopped as well.
  • In fact the next instruction after the second monoflash test is also setting C based on whether the ptra is odd/even. This is the real problem, as we can't stop that one from using C.
  • I can't see why my suggestion won't work but I can't see the current source either!
  • rogloh Posts: 2,341
    edited 2019-12-20 - 01:31:07
    Here's the current source I'm playing around with TonyB_ ... and it's in a state of flux with these different ideas being tried.
    '..................................................................................................
    ' Code to generate the next text scan line and cursor(s)
    
    gen_text    
                                mov     b, rowscan              'build font table base address 
                                shl     b, #8                   'for this font and row's scanline 
                                add     b, fontaddr      
                                setq    #64-1                   '64 longs holds 256 bytes of font
                                rdlong  font, b                 'read in font data for scanline
    
                                testb   modedata, #8 wc         'test global flash enable
    {
                if_c            setd    monoflash1, #d          'use text flashing code test
                if_nc           setd    monoflash1, #monoflash  '
    }
    
                                testb   modedata, #10 wz        'pixel double test
                                setq2   #40-1                  'read maximum of 120 longs from HUB
                                rdlong  $110, ptra              'to get next 240 chars with colours
    p9                          mov     pb, #$10f+COLS/2        'setup LUT read pointer at end
    p10         if_z            sub     pb, #COLS/4             '...of where character data is
    
                                mov     save, ptrb              'save pointer register
                                mov     ptrb, #$1ff             'setup write location in LUT RAM     
    
    p1          if_z            sets    adv, #COLS              'increase by half normal columns
    p2          if_nz           sets    adv, #COLS*2            'increase by normal columns
    
    '--- patched code for mono flashing text starts from here down
      
                if_c            setd    monoflash2, #d          'use text flashing code test
                if_nc           setd    monoflash2, #monoflash  '
     
     
                                push    ptra
                                mov     ptra, pb
    
    p3          if_z            rep     @endwide, #COLS/4       'double wide mode
    p4          if_nz           rep     @endnormal, #COLS/2     'single wide mode
    
                                rdlut   d, ptra-- wc            'read 2 characters
    
                                getbyte b, d, #2                'get MS char
                                altgb   b, #font                'lookup font
                                rolbyte c, 0-0, #0              'get pixels
    ' monoflash1                  testb   d, #31 wc               'test flashing attribute
                if_c            andn    c, monoflash                'if flashing set to background (0)
    
                                getbyte b, d, #0                'get LS char
                                altgb   b, #font                'lookup font
                                rolbyte c, 0-0, #0              'get pixels
    monoflash2                  testb   d, #15 wc               'test flashing attribute
                if_c            andn    c, monoflash                'if flashing set to background (0)
    
                                testb   ptra, #0 wc
                if_nz_and_c     wrlut   c, ptrb--               'store normal wide pixels every second iteration
    
    endnormal                   setword c, c, #1                'setup MSW
                                mergew  c                       'double pixels in wide mode
                if_z            wrlut   c, ptrb--               'store double wide pixels
    endwide
                                pop     ptra wc
                                jmp     #continue
    monoflash                   long    $000000ff               'toggles between 0 and ff
                                nop
                                nop
                                nop
                                nop
                                nop
    continue
    
  • rogloh Posts: 2,341
    edited 2019-12-20 - 01:33:03
    It's a pity I can't skip the second "getbyte b, d, #0" instruction and just use "altgb d, #font" in the following line directly on the LSByte of d. Tried it but the 9th bit (bit8) kills it if set in the attributes.
  • rogloh Posts: 2,341
    edited 2019-12-20 - 03:52:09
    By the way, there was this little used feature of monochrome VGA cards where I think if the attribute was blue then underline would be turned on. At 3x/4x clocks per pixel if we have enough remaining longs to squeeze this in and the extra performance in the inner loop that capability could potentially be added.

    Update: I now have some operational test code that supports mono underline text plus flashing attributes in VGA resolution and have it working at 3x clocks per pixel (no mouse), or 4x clocks per pixel with the mouse. It just fits into the 31 long space allocated. I am wondering what should enable it. I'll probably make it such that when the flashing attribute is enabled in the region, the underline attribute is also enabled, rather than invent a separate and independent global control bit. So if you want flashing text without underline you'd need to clear bit0 of the attribute byte and vice versa. The only thing I don't like is that it kills the mouse for 3x clocks/pixel. Looks like we are over budget there. Starting to become a few too many combinations perhaps.
  • TonyB_ Posts: 1,487
    edited 2019-12-22 - 01:36:23
    rogloh wrote: »
    It's a pity I can't skip the second "getbyte b, d, #0" instruction and just use "altgb d, #font" in the following line directly on the LSByte of d. Tried it but the 9th bit (bit8) kills it if set in the attributes.
    rogloh wrote: »
    By the way, there was this little used feature of monochrome VGA cards where I think if the attribute was blue then underline would be turned on. At 3x/4x clocks per pixel if we have enough remaining longs to squeeze this in and the extra performance in the inner loop that capability could potentially be added.

    In the Monochrome Display Adapter, bit 0 of the attribute (foreground blue) = 1 for underline, but only if the other five RGB bits = 0. The MDA is not being emulated fully here anyway, e.g. no reverse video, so why not stipulate that bit 0 = 0 and avoid the second getbyte? I don't know whether any switching between monochrome and colour with the same character/attribute data is intended. If so, foreground must not contain blue. Maybe the intensity bit could be used for underline or reverse instead? Presumably there is some global register to set the mono colour?

    Does the rdlut with wc not save an instruction after all?
  • rogloh Posts: 2,341
    edited 2019-12-23 - 04:32:17
    Yes, RDLUT with WC does save an instruction, but only if bit31 is tested first when there are two attributes in use (i.e. it needs to be used for the underline test, which has to happen before the flashing attribute bit test optionally zeroes the font data depending on the blink state). I guess I could reorder the attributes my own way, however I was sort of hoping to keep things the same between colour and mono for compatibility, so you don't have to re-code your application format and could share it for various clock speeds etc. This means the flashing bit should really be bit31/bit15 in the long being read from LUT, as this is how colour also does it. Unfortunately underline sort of only works with mono anyway, as it's not possible to do with colour (no room for it in the COG). As a result it's not really compatible with colour, so I may not end up doing any underline, we'll see. Or I could perhaps put it in as bit3 in the attributes byte instead and mandate bit0 being zero to save a cycle. But this is somewhat of a hack. I really don't like it when the application has to do weird things to deal with driver restrictions like that in order to prevent garbage coming out.

    The actual mono text colour is done via the 2 colour palette which selects foreground(1) and background(0) and is settable per field/frame.

    Update: I think what I might end up doing is making attribute bit7 = flashing, and attribute bit11 = underline if I support underline for monochrome text output at the higher p2clk:pixel clock ratios. This is probably the most compatible with colour usage, and means a highlighted foreground colour basically means to underline in mono (for a standard type of 16 colour VGA palette) whenever both attributes are globally enabled in the region. I probably won't try to use or restrict bit0. Turns out these text attributes are real clock cycle hogs with 4 clocks needed for the two instructions per attribute (one for the test, one to modify the data), done on every 8 pixels. This is 12.5MIPS getting used per attribute just in VGA resolution, which feels excessive to me.
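The cost figure above can be reproduced with a little arithmetic. The sketch below assumes a nominal 25 MHz VGA pixel clock (the standard 640x480 dot clock is 25.175 MHz) and reads "12.5MIPS" as millions of clock cycles per second; since the post counts 4 clocks for two instructions, the instruction rate is half that number. All names are made up for the illustration.

```python
from fractions import Fraction

# Assumptions for this sketch (not taken from the driver source):
PIXEL_CLOCK_HZ = 25_000_000   # nominal VGA dot clock; the exact value is 25.175 MHz
PIXELS_PER_CHAR = 8           # one character cell is 8 pixels wide
CLOCKS_PER_ATTRIBUTE = 4      # two instructions at 2 clocks each, per the post

def attribute_clocks_per_second(pixel_clock_hz: int) -> int:
    """Clock cycles per second spent handling one attribute on every character."""
    chars_per_second = pixel_clock_hz // PIXELS_PER_CHAR
    return chars_per_second * CLOCKS_PER_ATTRIBUTE

def attribute_budget_share(clocks_per_pixel: int) -> Fraction:
    """Fraction of the per-character clock budget used by one attribute."""
    return Fraction(CLOCKS_PER_ATTRIBUTE, PIXELS_PER_CHAR * clocks_per_pixel)

print(attribute_clocks_per_second(PIXEL_CLOCK_HZ))   # cycles per second
for ratio in (1, 2, 3, 4):
    print(ratio, attribute_budget_share(ratio))
```

Under these assumptions one attribute uses half of the cycles available at 1x clocks per pixel and an eighth at 4x, which fits the observation that attributes only become affordable at the higher clock ratios.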
  • jmg Posts: 14,369
    I have data from Winbond on these new HyperRAM parts:
    Part Number    VCC/VCCQ   I/O Width   Package               Interface   MHz, Temp
    W957D8MFYA5I   1.8V       8           24 balls TFBGA, DDP   HyperBus    200MHz, -40°C~85°C
    W957A8MFYA5I   3.0V       8           24 balls TFBGA, DDP   HyperBus    200MHz, -40°C~85°C
    W957A8MFYA6I   3.0V       8           24 balls TFBGA, DDP   HyperBus    166MHz, -40°C~85°C
    Dual-Die-Package (DDP): two 64Mbit chips sealed in one 24-ball TFBGA package
    
    Nice to see 200MHz specs at 3.0V - not sure of delivery
  • Yes that is good to see. Even their slower 166MHz parts should come in handy. Not everyone will need to run the P2 > 333MHz, but I guess if we find we can clock HyperRAM reliably at 1x P2 clock ratio the 200MHz part should be good too for when the P2 is operated in a 166-200MHz range.

    I like to run a P2 @252MHz but operating the existing 100MHz rated 3V HyperRAM within its spec at only 63MHz is not quite so nice for performance.
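The 63MHz figure is what you get from 252MHz divided by 4. The sketch below shows that arithmetic, assuming the HyperRAM clock can only be the P2 clock divided by 1, 2 or 4. That divisor set is an assumption chosen because it reproduces the number in the post, not something stated in the thread, and the function name is hypothetical.

```python
from typing import Optional, Tuple

def best_hyperram_clock(sysclk_mhz: float, rated_mhz: float,
                        divisors=(1, 2, 4)) -> Optional[Tuple[int, float]]:
    """Pick the smallest assumed divisor that keeps the RAM clock within its rating.

    Returns (divisor, ram_clock_mhz), or None if no divisor is slow enough.
    """
    for divisor in sorted(divisors):
        ram_clock = sysclk_mhz / divisor
        if ram_clock <= rated_mhz:
            return divisor, ram_clock
    return None

# P2 at 252MHz with parts rated at 100, 166 and 200MHz
for rating in (100, 166, 200):
    print(rating, best_hyperram_clock(252, rating))
```

Under that assumption, a 166MHz or 200MHz part would let the same 252MHz P2 clock the RAM at 126MHz within spec, twice the rate possible with the 100MHz part.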
  • evanh Posts: 9,623
    edited 2020-01-09 - 10:26:42
    Those won't make reliable operation any better on the prop2 eval boards. Sysclock/1 is always going to require careful tuning even as a borderline case.

    Having the two chips closely placed physically with minimal equal impedances on all signal tracks is needed for opening the reliable frequency window further. In other words it'll need a specially designed board with a single integrated hyperRAM part.

    PS: Someone did post a partly done board layout a few months back that included one such hyperRAM.
  • cgracey wrote: »
    I updated the v33 documentation to cover Q issues:

    SETQ CONSIDERATIONS

    The SETQ and SETQ2 instructions write to the Q register and are intended to precede a companion instruction. The value written to the Q register by SETQ/SETQ2 will persist until any of these events occur:

    * XORO32 executes - Q is set to the XORO32 result.
    * RDLUT executes - Q is set to the data read from the lookup RAM.
    * GETXACC executes - Q is set to the Goertzel sine accumulator value.
    * CRCNIB executes - Q gets shifted left by four bits.
    * COGINIT/QDIV/QFRAC/QROTATE executes without a preceding SETQ instruction - Q is set to zero.
    ...
    Chip,
    It seems to me that the SCA instruction should also be in that list. It appears to act the same way, feeding data directly to the ALU rather than modifying the fields of the following instruction as other prefixing instructions (e.g. ALTx) do.

    If SCA is correctly not in the list, why is it different?

  • jmg Posts: 14,369
    evanh wrote: »
    Those won't make reliable operation any better on the prop2 eval boards. Sysclock/1 is always going to require careful tuning even as a borderline case.

    True enough. It's not clear yet what MHz is going to be process and temperature tolerant for trouble free volume production.
    The good thing about 200MHz 3V parts is they avoid over-clock question marks, and they should have tighter timing spreads, part to part, (at least on paper), which has to be better for volume production yields.

