@TonyB_ I just tried out your optimization to save an instruction. Using it, it appears I can get 40-column text with flashing working at 1x clocks (no mouse). 80 columns seems tantalizingly close. However, I think it would be best not to force all text down to 40 columns at 2x clocks just to enable flashing text (which would be one way to do it). Better to disable the flashing capability altogether at this lower clock rate and enable it at 3x/4x.
How about bitz monoflash1,#20?
I'll have to decipher what flipping bit 20 does to the testb instruction. It might work if it turns it into a bitc, and this could save a few setup instructions if both can be merged into the original 3-instruction block.
I edited my post, monoflash1 is now the rdlut instruction with wc.
We'll get there in the end...
It might be bitc monoflash1,#20, not bitz, depending on which flag you're using. The idea is to copy the global flash enable into the opcode's write-C (WC) bit.
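To make the bit-20 idea concrete, here is a small Python model (a sketch, not driver code). It assumes the standard P2 instruction layout, where bits 31..28 are the condition, 27..21 the opcode, 20/19/18 the C/Z/I flags, 17..9 D and 8..0 S, so bit 20 is the WC effect. BITC D,#20 copies the C flag into that bit, turning the write-C effect of the patched instruction on or off. The instruction word below is a hypothetical placeholder, not a real RDLUT encoding.

```python
# Sketch: modelling "bitc monoflash1,#20" as self-modifying code on a 32-bit word.
# Assumed P2 layout: EEEE OOOOOOO C Z I DDDDDDDDD SSSSSSSSS (bit 20 = C/WC).
WC_BIT = 20
MASK32 = 0xFFFFFFFF


def bitc(d: int, bit: int, c: bool) -> int:
    """Model of BITC D,#bit: write the C flag into bit `bit` of D."""
    m = 1 << bit
    return (d | m) & MASK32 if c else d & ~m & MASK32


def writes_c(instr: int) -> bool:
    """True if the instruction word has its WC (write C) bit set."""
    return bool((instr >> WC_BIT) & 1)


# Hypothetical instruction word with WC initially clear (not a real RDLUT encoding).
monoflash1 = 0x0000_0000

flash_enabled = True  # global flash enable, assumed already tested into C
monoflash1 = bitc(monoflash1, WC_BIT, flash_enabled)
print(hex(monoflash1), writes_c(monoflash1))
```

With the flag clear, the same BITC would strip WC again, so the patched instruction leaves C untouched.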
In fact, the next instruction after the second monoflash test also sets C based on whether ptra is odd or even. This is the real problem, as we can't stop that one from using C.
It's a pity I can't skip the second "getbyte b, d, #0" instruction and just use "altgb d, #font" in the following line directly on the LSByte of d. I tried it, but the 9th bit (bit8) kills it if it's set in the attributes.
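The failure follows from how ALTGB forms its address. As I understand the P2 instruction documentation, ALTGB D,S makes the next GETBYTE read register (D[10:2] + S) & $1FF, byte N = D[1:0]. The Python model below (the font base is a hypothetical value) shows that with the character in d[7:0] and attributes in d[15:8], attribute bit 8 adds 64 longs to the font address; bits 9 and 10 would add 128 and 256 in the same way.

```python
# Sketch: effective GETBYTE source after "altgb d, #font".
# Assumed ALTGB semantics: register = (D[10:2] + S) & $1FF, byte index N = D[1:0].


def altgb_target(d: int, s: int) -> tuple[int, int]:
    """Return (cog register, byte index) the following GETBYTE would read."""
    reg = (((d >> 2) & 0x1FF) + s) & 0x1FF
    n = d & 0x3
    return reg, n


FONT = 0x100  # hypothetical font base register

# Character 'A' (0x41), no attribute bits: reads font long 0x110, byte 1.
print(altgb_target(0x0041, FONT))
# Same character with attribute bit 8 set: the address jumps by 64 longs.
print(altgb_target(0x0141, FONT))
# Reducing d to its low byte first (as GETBYTE b, d, #0 does) restores the address.
print(altgb_target(0x0141 & 0xFF, FONT))
```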
By the way, there was a little-used feature of monochrome VGA cards where, I think, an attribute of blue turned underline on. At 3x/4x clocks per pixel, if we have enough remaining longs to squeeze this in and enough performance left in the inner loop, that capability could potentially be added.
Update: I now have some operational test code that supports mono underline text plus flashing attributes in VGA resolution, and it works at 3x clocks per pixel (no mouse) or 4x clocks per pixel with the mouse. It just fits into the 31-long space allocated. I am wondering what should enable it. I'll probably make it so that when the flashing attribute is enabled in the region, the underline attribute is also enabled, rather than inventing a separate, independent global control bit. So if you want flashing text without underline, you'd need to clear bit0 of the attribute byte, and vice versa. The only thing I don't like is that it kills the mouse at 3x clocks/pixel; it looks like we are over budget there. Perhaps this is starting to become a few too many combinations.
In the Monochrome Display Adapter, bit 0 of the attribute (foreground blue) = 1 means underline, but only if the other five RGB bits = 0. The MDA is not being fully emulated here anyway (e.g. there is no reverse video), so why not stipulate that bit 0 = 0 and avoid the second getbyte? I don't know whether switching between monochrome and colour with the same character/attribute data is intended. If so, the foreground must not contain blue. Maybe the intensity bit could be used for underline, or perhaps reverse video, instead? Presumably there is some global register to set the mono colour?
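The MDA underline rule described above reduces to one mask-and-compare on the attribute byte. This Python sketch assumes the usual PC text attribute layout (bits 0-2 foreground RGB, bit 3 intensity, bits 4-6 background RGB, bit 7 blink) and deliberately ignores the MDA's other special cases such as reverse video.

```python
# Sketch of the MDA-style underline test described above.
# Assumed layout: bits 0-2 fg B/G/R, bit 3 intensity, bits 4-6 bg B/G/R, bit 7 blink.
RGB_BITS = 0x77  # foreground RGB (0x07) plus background RGB (0x70)
FG_BLUE = 0x01


def mda_underline(attr: int) -> bool:
    """Underline iff foreground blue is set and the other five RGB bits are clear.
    Intensity (bit 3) and blink (bit 7) do not affect the decision."""
    return (attr & RGB_BITS) == FG_BLUE


for attr in (0x01, 0x09, 0x81, 0x07, 0x11):
    print(f"{attr:#04x} underline={mda_underline(attr)}")
```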
Does the rdlut with wc not save an instruction after all?
Yes, RDLUT with WC does save an instruction, but only if bit31 is the bit tested first when two attributes are in use (i.e. it would need to be used for the underline test, which has to happen before the flashing-attribute test optionally zeroes the font data depending on the blink state). I guess I could reorder the attributes my own way; however, I was hoping to keep things the same between colour and mono for compatibility, so you don't have to re-code your application's data format and could share it across various clock speeds etc. This means the flashing bit should really be bit31/bit15 in the long read from the LUT, as this is how colour does it too. Unfortunately, underline only really works with mono anyway, as it isn't possible with colour (there's no room for it in the COG). As a result it's not really compatible with colour, so I may not end up doing underline at all (we'll see). Alternatively, I could put it in bit3 of the attributes byte instead and mandate that bit0 is zero to save a cycle. But this is somewhat of a hack. I really don't like it when the application has to do weird things to work around driver restrictions like that just to prevent garbage coming out.
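The ordering constraint above can be sketched in Python as follows. This is a model, not the PASM driver; the bit positions (flash in bit 15 of the character/attribute word as in colour mode, underline in bit 11) are assumptions, and `underline_row` is a hypothetical parameter. Applying underline before the flash test means the blink "off" phase blanks the underline as well; doing it the other way round would leave the underline visible while the character is blanked.

```python
# Sketch: per-scanline font data for one character cell, underline applied before flash.
FLASH_BIT = 15      # assumed: attribute bit 7 of a 16-bit char/attr word (as in colour mode)
UNDERLINE_BIT = 11  # assumed: attribute bit 3 (foreground intensity)


def cell_pixels(word: int, font_byte: int, scanline: int,
                underline_row: int, blink_on: bool) -> int:
    """Return the 8 foreground/background pixel bits for one cell on one scanline."""
    data = font_byte & 0xFF
    # 1) Underline: on the chosen scanline, turn every pixel of the cell on.
    if (word >> UNDERLINE_BIT) & 1 and scanline == underline_row:
        data = 0xFF
    # 2) Flash: during the blink "off" phase, zero the font data (underline included).
    if (word >> FLASH_BIT) & 1 and not blink_on:
        data = 0x00
    return data


word = 0x8800 | ord("A")  # flash + underline set, character 'A'
print(cell_pixels(word, 0x18, scanline=14, underline_row=14, blink_on=True))
print(cell_pixels(word, 0x18, scanline=14, underline_row=14, blink_on=False))
print(cell_pixels(word, 0x18, scanline=3, underline_row=14, blink_on=True))
```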
The actual mono text colour is set via the 2-colour palette, which selects foreground (1) and background (0) and can be changed per field/frame.
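In other words, each font bit simply indexes a two-entry palette. A minimal Python sketch follows; the colour values are arbitrary examples, and the LSB-first pixel order is an assumption for illustration (the real order depends on how the streamer is configured).

```python
# Sketch: expand one font byte through a 2-colour palette (index 0 = background, 1 = foreground).
from typing import List, Sequence


def expand_mono(font_byte: int, palette: Sequence[int], lsb_first: bool = True) -> List[int]:
    """Return 8 pixel colours; the palette itself can be swapped per field/frame."""
    bits = [(font_byte >> i) & 1 for i in range(8)]
    if not lsb_first:
        bits.reverse()
    return [palette[b] for b in bits]


green_on_black = (0x000000, 0x00FF00)  # example palette: background, foreground
print([hex(c) for c in expand_mono(0b0000_0101, green_on_black)])
```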
Update: I think what I might end up doing is making attribute bit7 (bit15 of the character/attribute word) = flashing, and bit11 of the word (attribute bit3, the foreground intensity bit) = underline, if I support underline for monochrome text output at the higher p2clk:pixel clock ratios. This is probably the most compatible with colour usage, and means a highlighted foreground colour basically means underline in mono (for a standard 16-colour VGA palette) whenever both attributes are globally enabled in the region. I probably won't try to use or restrict bit0. It turns out these text attributes are real clock-cycle hogs, with 4 clocks needed for the two instructions per attribute (one for the test, one to modify the data), done every 8 pixels. At VGA resolution that's about 12.5M clocks per second (roughly 6.25 MIPS) used per attribute, which feels excessive to me.
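The cost estimate works out as follows, assuming a nominal 25MHz VGA pixel clock (standard 640x480 timing is 25.175MHz) and 4 clocks (two 2-clock instructions) per attribute per 8-pixel character. Since most P2 instructions take 2 clocks, 12.5M clocks per second is about 6.25 MIPS. The sketch also shows what share of the per-character clock budget one attribute consumes at each sysclk:pixel ratio.

```python
# Sketch: clock cost of one text attribute per 8-pixel character cell.
from fractions import Fraction


def attribute_cost(pixel_clock_hz: int, clocks_per_pixel: int,
                   clocks_per_attr: int = 4, pixels_per_char: int = 8):
    """Return (P2 clocks per second spent on the attribute,
    fraction of the per-character clock budget it uses)."""
    chars_per_second = Fraction(pixel_clock_hz, pixels_per_char)
    clocks_per_second = chars_per_second * clocks_per_attr
    budget_per_char = clocks_per_pixel * pixels_per_char
    return clocks_per_second, Fraction(clocks_per_attr, budget_per_char)


for ratio in (1, 2, 3, 4):
    cps, share = attribute_cost(25_000_000, ratio)
    print(f"{ratio}x: {float(cps) / 1e6:.1f}M clocks/s, {float(share):.1%} of the budget")
```

At 3x clocks per pixel, each attribute eats one sixth of the cycles available per character, which is why two attributes plus the mouse no longer fit there.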
Yes, that is good to see. Even their slower 166MHz parts should come in handy. Not everyone will need to run the P2 above 333MHz, but I guess if we find we can clock HyperRAM reliably at a 1x P2 clock ratio, the 200MHz part should also be good when the P2 is operated in the 166-200MHz range.
I like to run a P2 @252MHz, but operating the existing 100MHz-rated 3V HyperRAM within its spec at only 63MHz is not quite so nice for performance.
Those won't make reliable operation any better on the prop2 eval boards. Sysclock/1 is always going to require careful tuning even as a borderline case.
Placing the two chips physically close together, with minimal and equal impedances on all signal tracks, is needed to open up the reliable frequency window further. In other words, it'll need a specially designed board with a single integrated HyperRAM part.
PS: Someone did post a partly done board layout a few months back that included one such HyperRAM.
I updated the v33 documentation to cover Q issues:
SETQ CONSIDERATIONS
The SETQ and SETQ2 instructions write to the Q register and are intended to precede a companion instruction. The value written to the Q register by SETQ/SETQ2 will persist until any of these events occur:
* XORO32 executes - Q is set to the XORO32 result.
* RDLUT executes - Q is set to the data read from the lookup RAM.
* GETXACC executes - Q is set to the Goertzel sine accumulator value.
* CRCNIB executes - Q gets shifted left by four bits.
* COGINIT/QDIV/QFRAC/QROTATE executes without a preceding SETQ instruction - Q is set to zero.
...
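The rules in that list can be read as a small state machine. The Python class below is a hypothetical model written from the list alone (not an official simulator, and "preceding" is taken to mean the immediately preceding instruction); each method stands for one instruction executing.

```python
# Sketch: hypothetical model of the Q-register rules from the SETQ CONSIDERATIONS list.
class QRegister:
    MASK = 0xFFFFFFFF

    def __init__(self) -> None:
        self.q = 0
        self._after_setq = False  # True only for the instruction right after SETQ/SETQ2

    def _step(self, after_setq: bool = False) -> None:
        self._after_setq = after_setq

    def setq(self, value: int) -> None:  # SETQ / SETQ2 write Q
        self.q = value & self.MASK
        self._step(after_setq=True)

    def xoro32(self, result: int) -> None:  # Q = XORO32 result
        self.q = result & self.MASK
        self._step()

    def rdlut(self, data: int) -> None:  # Q = data read from lookup RAM
        self.q = data & self.MASK
        self._step()

    def getxacc(self, sine_acc: int) -> None:  # Q = Goertzel sine accumulator
        self.q = sine_acc & self.MASK
        self._step()

    def crcnib(self) -> None:  # Q shifted left by four bits
        self.q = (self.q << 4) & self.MASK
        self._step()

    def coginit_or_cordic(self) -> None:  # COGINIT/QDIV/QFRAC/QROTATE
        if not self._after_setq:
            self.q = 0  # no SETQ immediately before: Q is cleared
        self._step()

    def other(self) -> None:  # any instruction that leaves Q alone
        self._step()


q = QRegister()
q.setq(0x1234_5678)
q.crcnib()
print(hex(q.q))
q.setq(5)
q.coginit_or_cordic()  # SETQ immediately before: Q kept
q.other()
q.coginit_or_cordic()  # no SETQ immediately before: Q cleared
print(q.q)
```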
Chip,
It seems to me that the SCA instruction should also be in that list. It appears to act the same way, feeding data directly to the ALU rather than modifying the following instruction's fields as the other prefix instructions (e.g. ALTx) do.
If SCA is correctly not in the list, why is it different?
Those won't make reliable operation any better on the prop2 eval boards. Sysclock/1 is always going to require careful tuning even as a borderline case.
True enough. It's not clear yet what MHz is going to be process and temperature tolerant for trouble free volume production.
The good thing about 200MHz 3V parts is that they avoid over-clocking question marks, and they should have tighter part-to-part timing spreads (at least on paper), which has to be better for volume production yields.
Nice to see 200MHz specs at 3.0V, though I'm not sure about delivery.