No urgency - now that it's looking glitch-free, there's not so much need for capture. But being able to adjust would be a great addition.
By the way I see there is now a Qt for microcontrollers. They suggest access to 256 MB RAM, and clock speeds of 500 MHz and up. I think the biggest HyperRAMs are 32 MB and may have boundary issues if they are stacked die. They are demoing Qt on STM32 microcontroller devices. While they suggest 256 MB, many of the demos are ~10MB (this may just be the media files?). Anyway, something else to keep an eye on.
256 MB RAM for a GUI toolkit? What are they smoking?
(for reference, Windows 98, a complete multitasking OS with nice-looking, customizable, optionally HW accelerated, partially HTML-based GUI, requires 16MB RAM and a 66MHz 486 CPU, according to Microsoft (realistically, you want 64MB and a Pentium or K6, but whatever))
A decent GUI toolkit for P2 should not need more than 32k of code for basic components (buttons, checkboxes, text fields, etc) and dynamic layout (MigLayout, anyone?), plus what is needed for framebuffer and bitmapped graphics.
Yeah 256MB and 500MHz, well that is hardly micro-controller territory IMO, but more embedded SoC microprocessor stuff like ARM etc.
By the way, it looks like the way the existing P2 USB stack is currently coded means we can't just change the P2 clock rate on the fly and reload the USB COGs whenever we switch to a video resolution that needs a different P2 clock (for dynamic mode testing etc). In the USB code the P2 frequency is used statically at compile time to determine various delay constants. At some point it could make sense to patch the USB stack code at startup time depending on the operational clock frequency, a bit like how I patch my video driver with the appropriate sync code dynamically depending on startup parameters. This would also help with USB port A vs port B use: if you want a keyboard and a mouse on different ports, it looks like the USB driver code is loaded in twice with different compile time parameters depending on the port desired, which doubles the footprint in HUB RAM. Ideally these port parameters could also get patched at run time. They are basically just different pin constants in a few places in the code, so it should be even simpler than changing all the clock dependent delays (there are quite a few of those).
Well, I've found some sort of anomaly with the hyperRAM when reading across any 2 MByte boundary. It requires large, >200kB, continuous reads though. Smaller blocks that blindly cross don't have a problem.
It's looking suspiciously like RWDS is briefly activating. I guess it's time to get the scope out again ...
EDIT: Oh, suck, RWDS is toggling the whole way through! That's what it does on reads. That's going to be painful to identify a stretched cycle.
EDIT2: Lol, I suppose the answer is don't do stupidly large single blocks.
Oops, scratch all that. The anomaly is there but it's with the hyperRAM writes, not the reads. Same answer, don't do large transfers as one large block.
Looks like HyperRAM cells can hold some of their contents for several hours after being powered down. I wrote something to the memory earlier yesterday and it was powered off overnight. Ran it again this morning with no image data and reading it back still shows some evidence of the original data. It obviously now has lots of errors, which show up as noise, but that gives it a pleasing dithered effect. It's interesting that the internal DRAM can sustain a charge for that long when normally you need to refresh it every 64ms. We are probably talking over 16 hours now since that was loaded into DRAM! LOL.
Hehe, careful, you'll have the spooks paranoid about their top secret info !! Power removal is not 100% information removal !
I've just tested a monochrome text mode for P2s clocked at 2x-4x the pixel rate, with some modifications to my driver. Previously, if you tried to do any (colour) text at these clock speeds it would not work: the output would slow to the point where it was not usable and it would lose sync.
This output photo I took shows two regions in 1280x1024 mode, with the P2 running at only a 2x pixel clock.
The top region is mono text (in blue) and bottom is HyperRAM 16bpp graphics that I can drag around with the mouse and am writing into on the fly (lots of circles being drawn). Text was double wide in this shot, but single wide text works too.
I'm going to modify the init code for the driver so at startup time when it knows you are operating below a 5x pixel clock it will patch the text output code to do monochrome (2 colour palette mode) instead of 16 colours. The source text data format will still remain 16 bit VGA words to maintain application compatibility.
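For anyone following along, here is a rough Python model of what "16 bit VGA words" means for the text source data. It is only a sketch: the field layout is the standard VGA text attribute layout (character in the low byte, attribute in the high byte, flashing in the top bit), which I am assuming the driver follows, and the names `decode_text_cell` and `unpack_long` are made up for illustration. In mono mode the colour fields would simply be ignored.

```python
def decode_text_cell(word: int) -> dict:
    """Split one 16-bit VGA-style text word into its fields (assumed layout)."""
    if not 0 <= word <= 0xFFFF:
        raise ValueError("text cell must fit in 16 bits")
    attr = word >> 8
    return {
        "char": word & 0xFF,         # character code, low byte
        "fg": attr & 0x0F,           # foreground colour (ignored in mono mode)
        "bg": (attr >> 4) & 0x07,    # background colour (ignored in mono mode)
        "flash": bool(attr & 0x80),  # flashing attribute, bit 15 of the word
    }


def unpack_long(long_value: int) -> tuple:
    """Two text cells are packed per 32-bit long: (low word, high word)."""
    low = decode_text_cell(long_value & 0xFFFF)
    high = decode_text_cell((long_value >> 16) & 0xFFFF)
    return low, high
```

Keeping the 16-bit format means an application written for colour text does not need to repack its screen buffer when the driver falls back to mono.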
I was also looking for a way to get the flashing text attribute bit working with mono output, but unfortunately I don't think there are sufficient clock cycles and instruction space to patch that in. I only have 1280 clocks for VGA at 2x and this has to load the row's input data, load the 64 long font for the scanline, compute 80 or 40 characters worth of pixels, and write the pixel data back to HUB, all while the streamer is active displaying the previous line. If I can find a way I will add it though. Even mono text is still rather tight with everything else when operating at a 2x pixel clock. Doing any 1x text has no chance unless perhaps I tried to invent a 20 column mode. LOL. But given only the transparent graphics mode is to be done at 1x, this is not really going to be an issue.
Here's the new inner loop code for mono text that I just patch over the existing colour text code. I have 5 spare nops in the footprint allowed. If anyone knows a way to do flashing text that fits, let me know. There are 3 other instructions elsewhere that can be used to check for flashing text on/off as well, and another one that can already toggle something at 2Hz. They could also be patched. But that is it, and the cycle budget is critical in the single wide mode. I'm at 38 clocks per 32 pixels right now in my inner loop. This leaves 1280-20*38 = 520 cycles in VGA mode at a 2x pixel clock, which still has to load 40 longs of characters, read 64 longs of font, write out 640 mono pixels (20 longs), plus the other per scan line operations like the cursor and setup overhead, plus probably leave room for some region changes at the end of the last scan line in the region (final budget not determined).
p3          if_z          rep     @endwide, #COLS/8       'double wide mode
p4          if_nz         rep     @endnormal, #COLS/4     'single wide mode
                          rdlut   d, pb                   'read 2 characters
                          sub     pb, #1                  'decrement LUT lookup pos
                          getbyte b, d, #2                'get MS char
                          altgb   b, #font                'lookup font
                          rolbyte c, 0-0, #0              'get pixels
                          getbyte b, d, #0                'get LS char
                          altgb   b, #font                'lookup font
                          rolbyte c, 0-0, #0              'get pixels
            if_z          mov     a, c                    'preserve for double wide
                          rdlut   d, pb                   'read next 2 chars
                          sub     pb, #1                  'decrement LUT lookup pos
                          getbyte b, d, #2                'get MS char
                          altgb   b, #font                'lookup font
                          rolbyte c, 0-0, #0              'get pixels
                          getbyte b, d, #0                'get LS char
                          altgb   b, #font                'lookup font
                          rolbyte c, 0-0, #0              'get pixels
            if_nz         wrlut   c, ptrb--               'store normal wide pixels
endnormal                 setword c, c, #1                'setup MSW
                          setword a, a, #1                'setup MSW
                          mergew  a                       'double pixels in wide mode
                          mergew  c                       'double pixels in wide mode
            if_z          wrlut   a, ptrb--               'store double wide pixels
            if_z          wrlut   c, ptrb--               'store double wide pixels
endwide
                          nop
                          nop
                          nop
                          nop
                          nop
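The budget arithmetic quoted above (1280-20*38 = 520) can be checked with a few lines of Python. This is just a sketch of the sums in the post, not anything taken from the driver; the function name `mono_loop_budget` and its parameters are made up for illustration.

```python
def mono_loop_budget(active_pixels=640, clock_ratio=2,
                     clocks_per_iter=38, pixels_per_iter=32):
    """Return (scan line budget, inner loop clocks, clocks left over).

    The budget is taken as the active part of the scan line only,
    i.e. active pixels times the P2-to-pixel clock ratio.
    """
    budget = active_pixels * clock_ratio
    iterations = active_pixels // pixels_per_iter   # 20 for VGA
    loop_clocks = iterations * clocks_per_iter      # 20 * 38
    return budget, loop_clocks, budget - loop_clocks


budget, loop_clocks, spare = mono_loop_budget()
print(budget, loop_clocks, spare)   # 1280 760 520
```

Everything else on the list (character and font loads, the pixel writes, cursor and region housekeeping) has to come out of that leftover figure.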
Update: Tried this code out at VGA resolution instead of SXGA and it has some timing budget issues: it isn't working nicely with 40 column text, and 80 columns is not syncing. I will have to check at which point it falls over - I think I have an idea. The higher resolutions might be providing it with more time as it saves up more spare cycles due to having more total characters. Running VGA at 3x does seem to fix the issue, but I'd quite like to be able to run mono text on it at 2x if I can. But perhaps that might not be possible...
Update2: VGA mono text at 2x pixel clock for the P2 clock seems to have enough cycles to work at 40 or 80 columns without a mouse enabled in the text region so this is still going to be very useful at lower clock speeds with a single video COG.
Now I can do text at 2x clocks (in some modes at least), here's about the limit of what can be achieved on my driver with text...
This is some 1920x1200 mono text. It's actually a screen of 240x200 characters with a 6 scan line font. Text is my driver spin code. Looks amazingly hires on my Dell at this native resolution. Open the second image to see it full size.
Looks like HyperRAM cells can hold some of their contents for several hours being powered down. I wrote something to the memory earlier yesterday and it was powered off overnight. Ran it again this morning with no image data and reading it back still shows some evidence of the original data. It obviously now has lots of errors though which shows up as noise, but it gives it a pleasing dithered effect. But it's interesting that the internal DRAM can sustain a charge for that long when normally you need to refresh it every 64ms. We are probably talking over 16 hours now since that was loaded into DRAM! LOL.
That's amazing! I guess the sense amplifiers are always differential, and presumably the tiniest bias can be restored.
I've just tested a monochrome text mode for P2's clocked from 2x-4x the pixel rate with some modifications to my driver. Previously if you tried to do any (colour) text at these clock speeds it would not work and the output would slow to the point where it was not usable and it would lose sync.
That's looking good.
Can you add a version summary to post #1, to track where things are at ?
I think this now applies a CS MAX rule, but that rule may be 16us ?
In trawling other data, that 16us option setting is not seen on other vendors' parts, they all seem to use 4us, and some newer parts claim to have a thermally tracking refresh timer, which could make testing a real pain ! One other vendor has a 'go slower' bit, they say self checks is below some tolerance temperature.
The temp auto-variable timer model specs this Maximum Standby Current:
o 600μA @ 105°C
o 400μA @ 85°C
o 200μA @ 25°C
so that suggests a 3:1 oscillator range over that temperature span.
I'll try to update the other driver thread soon with a new beta that includes the HyperRAM stuff which allows more fun and lists the updated capabilities in the release notes. But "soon" means when I get around to it. I was hoping to put together some examples demonstrating the extent of my driver's capabilities but have been sidetracked to date, plus XMAS is fast approaching now and I'm getting distractions and doing other things. But it's still on my list.
In the current code I am testing with, the HyperRAM driver is being configured with ISSI's 16us setting for max CS low time. I still want to break up the bursts so they can be less than 4us and optimize it further... as any extra overhead can have an impact on the resolutions+bit depths possible and bandwidth remaining for other COGs.
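To make the burst-splitting idea concrete, here is a small Python sketch of how a large transfer might be chopped into bursts that each fit under a maximum CS low time. None of this is from the driver: the bus clock, the per-burst overhead and the function names are illustrative assumptions, and the only fixed input is HyperBus moving 2 bytes per clock (8-bit DDR).

```python
def max_burst_bytes(cs_max_us, bus_mhz, overhead_clocks, bytes_per_clock=2):
    """Largest payload that keeps CS low for no longer than cs_max_us.

    overhead_clocks covers command/address and latency clocks per burst
    (an assumed figure, the real one depends on the driver and device).
    """
    clocks = int(cs_max_us * bus_mhz) - overhead_clocks
    return max(clocks, 0) * bytes_per_clock


def split_transfer(start, length, max_bytes):
    """Yield (address, byte_count) pairs, each no longer than max_bytes."""
    if max_bytes <= 0:
        raise ValueError("max_bytes must be positive")
    offset = 0
    while offset < length:
        count = min(max_bytes, length - offset)
        yield start + offset, count
        offset += count


# Example with made-up numbers: 4us limit, 100 MHz bus, 20 clocks overhead.
limit = max_burst_bytes(4, 100, 20)            # 760 bytes
bursts = list(split_transfer(0, 2000, limit))  # three bursts
```

The trade-off is the one mentioned above: every extra burst pays the overhead clocks again, so shorter bursts cost bandwidth that could otherwise go to higher resolutions or other COGs.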
This is precisely 6x CGA 320x200 resolution, in each direction.
So with a 6x6 font you can put an entire character where we once had a single pixel
Great work well done
I had this background windows desktop pic a while ago of all these old PC games using a mosaic of a whole bunch of 320x200 screenshots on my 2560x1600 monitor, so it probably used 64 screenshots. I'll have to dig it up and make it into a 1920x1200 version. It looked great.
@Tubular Exactly. Would have looked really amazing if each sub-window was live gameplay and the whole mosaic turned into a video (or even some type of repeating animated gif etc) ... but that would have taken a lot more work and time I probably didn't have.
Hmm, I wonder what image data rate a P2 could sustain streaming off the fastest SD cards without any compression and writing into HUB/HyperRAM? It will likely require SD 4 bit mode support first. I think you can get max transfer numbers in the 50-104MB/s range using UHS-I cards if the IO voltage is 1.8V. But that requires external voltage level translation, because the interface is meant to be operating at 1.8V. The 1.8V writes could probably use the fast BitDAC mode, but reads could be a problem if the P2 comparator turns out to be too slow. Same issues as HyperRAM basically, and in addition I think you may need to switch the supply voltage to the card dynamically as well.
I found this part that does this SD voltage translation but it requires extra direction control signals that would complicate things slightly. Still could be useful though...
That's a neat level translator, good find. I don't think the direction signals would be much of a burden in the scheme of things.
Yes we really need to get some figures on this input comparator hooked up beside the slow DAC. Should be easy enough to get the P2 to characterise it, itself. In theory
@jmg It does look nice, and rather new. I guess it would then bring the P2 IO pin count for SD cards up to 12 from 4 (in SPI mode) if all pins needed to be wired for the interface to operate fully. Plus two more optional ones for WP/CD if they are needed unless they can be combined with some control pins using resistors, as can be done in SPI mode on the P1 etc. Still should be fairly doable if a higher performing SD card on the P2 is ever required. Will have to add it to my never ending project list...
@Tubular, yeah the direction control pins would likely only need to be flipped at the start and end of transactions so the streamer could potentially still operate in between on the bulk data. I think in SD mode you can overlap control and data however which could possibly complicate things but they have independent direction controls anyway. We could possibly bit bang one and use the streamer for the nibble transfers at the same time.
If the cycle budget is critical in the single wide mode, could you auto-decrement rdlut addresses and manually decrement wrlut one? That could give you an extra long for blinking, or maybe use it to toggle c and add nc or c to the if_nz and if_z prefixes so that you don't need to duplicate almost half the code?
@TonyB_
I might be able to save ptra and restore it from the stack (2 instructions outside the loop) and then use that register instead of pb + decrement. I'd need c for testing each character's flashing state bit so that's four more (or possibly two instructions more if the two parts could be combined). I think the global flash on/off test outside the loop could be removed in mono mode so the character's attribute always defines the flash state. I will need to keep z free for wide/normal testing so both text loops can be shared.
Even if flashing text can't be done on P2s operating at a 2x pixel clock due to the timing budget, it might be doable at a 3-4x clock, so it is worth considering if it can be made to fit the COGRAM footprint somehow. At 3x pixel clocks, only the COGRAM budget becomes the issue, not the scan line cycle time budget. So I can have two different mono text codebases that get patched in depending on the pixel clock to P2 clock ratio. This allows scaling of features dynamically with system performance, which is good.
Eg. It could become something like this (needs testing to confirm):
1x clock: transparent mode only, streaming raw image data from existing memory frame/line buffers, any text rendered via other sprite COG(s), no mouse overlay unless provided by the other COG(s)
2x clock: mono text mode + gfx (except 32bpp mode from int hub memory, ext memory 32bpp ok), mouse only in non-text regions for VGA (mouse allowed in text regions for > 800x600? TBD)
3x clock: mono flashing text mode + gfx (incl 32 bpp mode), mouse in any region, some pixel doubling
4x clock: (ditto)
5x clock: colour flashing text mode, pixel doubling (perhaps not in 32bpp mode, TBD)
6x+ clock: all features possible, pixel doubling in all modes
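The tier list above amounts to a lookup keyed on the P2-to-pixel clock ratio. A minimal Python sketch of that lookup follows; the table text is condensed from the list, it is still subject to the same "needs testing to confirm" caveat, and the names `FEATURE_TIERS` and `tier_for_ratio` are made up for illustration.

```python
# (minimum clock ratio, feature summary). 4x is "ditto" of 3x, so no own row.
FEATURE_TIERS = [
    (1, "transparent mode only; text and mouse come from other COG(s)"),
    (2, "mono text + gfx; mouse only in non-text regions for VGA"),
    (3, "mono flashing text + gfx incl. 32bpp; mouse in any region"),
    (5, "colour flashing text; pixel doubling (perhaps not in 32bpp)"),
    (6, "all features; pixel doubling in all modes"),
]


def tier_for_ratio(ratio):
    """Return the feature summary for the highest tier the ratio reaches."""
    if ratio < 1:
        raise ValueError("P2 clock must be at least 1x the pixel clock")
    chosen = None
    for min_ratio, summary in FEATURE_TIERS:
        if ratio >= min_ratio:
            chosen = summary
    return chosen
```

An init routine could pick the code variant to patch in the same way, from the ratio it already knows at startup.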
@TonyB_. Thinking more about what you said....I think there is scope for it.
For the flashing monochrome text variant first I can patch this original code from the coloured text version:
                          add     fieldcount, #1          'increase the field counter
                          test    fieldcount, #15 wz      'check for 16 fields elapsed
            if_z          xor     flash, #$ff             'flash text ~2Hz
....
                          testb   modedata, #8 wz         'flashing / full colour background?
            if_z          setr    testflash, #$83         'use text flashing code test
            if_nz         setr    testflash, #$EA         'change into helpful zerox c,#15 wc
changing it into this version:
                          add     fieldcount, #1          'increase the field counter
                          test    fieldcount, #15 wz      'check for 16 fields elapsed
            if_z          xor     monoflash, #$ff         'flash text ~2Hz
...
                          testb   modedata, #8 wz         'test global flash enable
            if_z          setd    monoflash1, #d          'use text flashing code test
            if_nz         setd    monoflash1, #zeroval    'change into test code that doesn't flash
Then I'd patch in the new mono+flash text code block below. It all fits with 4 longs to spare! I think for standard width text it does the inner loop in 27 clocks (now done per 16 pixels), instead of taking 38 clocks for 32 pixels. This flashing code consumes another 320 clocks from the 1280 clock scan line budget for VGA, so it is definitely not going to fit with 2x pixel clock operation, but I expect it would certainly fit at the higher rates. I'll code it up soon and try it out at 3x pixel clock rates. With any luck it should work out fine.
                          testb   modedata, #8 wc         'test global flash enable
            if_c          setd    monoflash2, #d          'use text flashing code test
            if_nc         setd    monoflash2, #zeroval    'change into test code that doesn't flash
                          push    ptra
                          mov     ptra, pb
p3          if_z          rep     @endwide, #COLS/4       'double wide mode
p4          if_nz         rep     @endnormal, #COLS/2     'single wide mode
                          rdlut   d, ptra--               'read 2 characters
                          getbyte b, d, #2                'get MS char
                          altgb   b, #font                'lookup font
                          rolbyte c, 0-0, #0              'get pixels
monoflash1                testb   d, #31 wc               'test flashing attribute
            if_c          and     c, monoflash            'if flashing set to background (0)
                          getbyte b, d, #0                'get LS char
                          altgb   b, #font                'lookup font
                          rolbyte c, 0-0, #0              'get pixels
monoflash2                testb   d, #15 wc               'test flashing attribute
            if_c          and     c, monoflash            'if flashing set to background (0)
                          testb   ptra, #0 wc
            if_nz_and_nc  wrlut   c, ptrb--               'store normal wide pixels every second iteration
endnormal                 setword c, c, #1                'setup MSW
                          mergew  c                       'double pixels in wide mode
            if_z          wrlut   c, ptrb--               'store double wide pixels
endwide
                          pop     ptra
                          jmp     #continue
monoflash                 long    $ffffffff               'LSByte toggles between 00 and ff
zeroval                   long    0
                          nop
                          nop
                          nop
                          nop
continue
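The "another 320 clocks" figure follows from the two loop costs quoted in the post (38 clocks per 32 pixels versus 27 clocks per 16 pixels). A quick Python check, using only those numbers and the active-line budget of 640 pixels times the clock ratio; the helper names are made up for illustration.

```python
ACTIVE_PIXELS = 640  # VGA active width


def loop_clocks(clocks_per_iter, pixels_per_iter, active_pixels=ACTIVE_PIXELS):
    """Total inner loop clocks for one scan line of single wide text."""
    return (active_pixels // pixels_per_iter) * clocks_per_iter


def headroom(clock_ratio, clocks_per_iter, pixels_per_iter):
    """Clocks left in the active line after the inner loop has run."""
    return ACTIVE_PIXELS * clock_ratio - loop_clocks(clocks_per_iter,
                                                     pixels_per_iter)


mono = loop_clocks(38, 32)    # 20 iterations * 38 = 760
flash = loop_clocks(27, 16)   # 40 iterations * 27 = 1080
print(flash - mono)           # 320
print(headroom(2, 27, 16))    # 200
print(headroom(3, 27, 16))    # 840
```

At 2x only 200 clocks would remain for all the other per-line work, compared with the 520 that were already tight for plain mono text, while 3x leaves 840.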
The first flashing attribute could be read for free with rdlut d, ptra-- wc.
Isn't the "1280 clock scan line budget for VGA" actually 1600 including blanking?
Yes it is more than 1280 clocks per total scan line but given the way the architecture of this driver works we can't make use of the entire scanline for computing the next one, because there is some per line fifo/streamer/mode housekeeping and the mouse sprite, sync and other region stuff to be done. To make things work out I basically need to compute the next scanline during the active portion of the existing one being streamed out, in this case at 2x the pixel clock that is 2x640=1280 clocks for VGA.
Good news - I tested the code above and managed to fix it to work for VGA at 3x with the mouse and flashing text.
So it looks like this for VGA resolution (this could differ for other resolutions, but VGA is more demanding with fewer clocks than SVGA, XGA etc):
Mono text needs a 2x pixel clock with no mouse in text regions, e.g. P2 clk ~ 50MHz
Flashing mono text needs a 3x or 4x pixel clock including the mouse, e.g. 75MHz < P2 clk < 100MHz
Colour/flashing text needs a 5x pixel clock including the mouse, e.g. P2 clk > 125MHz
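For reference, the rounded P2 clock figures above come from multiplying the VGA pixel clock by the ratio. A small Python sketch follows, assuming the standard 640x480@60Hz pixel clock of 25.175 MHz (a driver may well run a rounded value such as 25 MHz instead); the names `MIN_RATIO` and `min_p2_clock_hz` are made up for illustration.

```python
VGA_PIXEL_CLOCK_HZ = 25_175_000  # standard 640x480@60Hz pixel clock

# Minimum P2-to-pixel clock ratio per text mode, from the VGA test results.
MIN_RATIO = {
    "mono": 2,           # no mouse in text regions
    "mono_flash": 3,     # mouse allowed
    "colour_flash": 5,   # mouse allowed
}


def min_p2_clock_hz(mode, pixel_clock_hz=VGA_PIXEL_CLOCK_HZ):
    """Lowest P2 clock that reaches the ratio needed for the given mode."""
    return MIN_RATIO[mode] * pixel_clock_hz


for mode in MIN_RATIO:
    print(mode, min_p2_clock_hz(mode) / 1e6, "MHz")
```

Other resolutions would plug in their own pixel clock, though as noted the required ratios may differ there.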
The first flashing attribute could be read for free with rdlut d, ptra-- wc.
Yes this is a good idea too. However I'd have to change the way I patch the monoflash1 instruction so I could get rid of it. Instead, outside the loop, I could patch the "and" instruction that follows it to work on a different register (like "a" instead of "c") when flashing is disabled. It's handy to still keep the global flashing on/off control bit that already exists in the region config.
How about bitz monoflash1,#20? EDIT monoflash1 label moved to rdlut d, ptra-- wc
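As I understand the PASM2 instruction encoding, bit 20 of an instruction long is the C-effect (WC) flag, so bitz on bit 20 would switch the wc on rdlut on or off according to Z. Treat that bit position as an assumption to check against the instruction set documentation. A tiny Python model of the patch, with a made-up helper name and a placeholder instruction word rather than a real rdlut encoding:

```python
WC_BIT = 20  # assumed position of the C-effect flag in a PASM2 instruction


def bitz(value, bit, z):
    """Model of bitz: write the Z flag into one bit of a 32-bit long."""
    mask = 1 << bit
    if z:
        return (value | mask) & 0xFFFFFFFF
    return value & ~mask & 0xFFFFFFFF


instr = 0x0000_0000                      # placeholder, not a real opcode
flashing = bitz(instr, WC_BIT, True)     # wc enabled
steady = bitz(flashing, WC_BIT, False)   # wc cleared again
```

With Z set by the global flash enable test, one such patch outside the loop would decide whether the read updates C at all.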
Thanks for the clock budget clarification. Your 6x8 font at 1920x1200 should look fabulous in amber, e.g. RGB (255,191,0). I have a Zenith amber monitor in the attic that I always found easy on the eye.
That data retention is crazy long. As we gain the ability to make increasingly perfect structures, maybe refresh times can be extended in the future.
Can you ad a version summary to post #1, to track where things are at ?
I think this now applies a CS MAX rule, but that rule may be 16us ?
In trawling other data, that 16us option setting is not seen on other vendor's, they all seem to use 4us, and some newer parts claim to have a thermally tracking refresh timer, which could make testing a real pain ! One other vendor has a 'go slower' bit, they say self checks is below some tolerance temperature.
The temp auto-variable timer model specs this
Maximum Standby Current
o 600μA @ 105°C
o 400μA @ 85°C
o 200μA @ 25°C
so that suggests a 3:1 oscillator range over that temperature span.
I'll try to update the other driver thread soon with a new beta that includes the HyperRAM stuff which allows more fun and lists the updated capabilities in the release notes. But "soon" means when I get around to it. I was hoping to put together some examples demonstrating the extent of my driver's capabilities but have been sidetracked to date, plus XMAS is fast approaching now and I'm getting distractions and doing other things. But it's still on my list.
In the current code I am testing with, the HyperRAM driver is being configured with ISSI's 16us setting for max CS low time. I still want to break up the bursts so they can be less than 4us and optimize it further... as any extra overhead can have an impact on the resolutions+bit depths possible and bandwidth remaining for other COGs.
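As a rough way to see what the CS low limit means for burst sizes, here is a minimal sketch. HyperRAM moves 2 bytes per clock (8-bit DDR bus); everything else here is an assumption for illustration: the 100 MHz bus clock, the 16 clocks of command/latency overhead, and the function name max_burst_bytes. Real figures depend on the configured latency and the driver's own setup time.

```python
def max_burst_bytes(cs_max_us, bus_clk_mhz, overhead_clks):
    """Largest data burst (bytes) that fits inside the CS low limit.

    cs_max_us     : maximum CS low time in microseconds (e.g. 4 or 16)
    bus_clk_mhz   : HyperRAM bus clock in MHz (assumed value)
    overhead_clks : clocks spent on command/address and latency (assumed)
    """
    total_clks = int(cs_max_us * bus_clk_mhz)  # clocks available while CS is low
    data_clks = total_clks - overhead_clks
    if data_clks <= 0:
        raise ValueError("overhead alone exceeds the CS low limit")
    return data_clks * 2                       # 8-bit DDR bus: 2 bytes per clock


print(max_burst_bytes(4, 100, 16))    # 768
print(max_burst_bytes(16, 100, 16))   # 3168
```

Under those assumed numbers a 4us limit still allows bursts of several hundred bytes, so the cost of splitting is mostly the repeated per-burst overhead rather than the burst length itself.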
I had this background windows desktop pic a while ago of all these old PC games using a mosaic of a whole bunch of 320x200 screenshots on my 2560x1600 monitor, so it probably used 64 screenshots. I'll have to dig it up and make it into a 1920x1200 version. It looked great.
Hmm, I wonder what image data rate a P2 could sustain streaming off the fastest SD cards without any compression and writing into HUB/HyperRAM? It will likely require SD 4 bit mode support first. I think you can get max transfer numbers in the 50-104MB/s range using UHS-I cards if the IO voltage is 1.8V. But that requires external voltage level translation because the card is meant to be operating at 1.8V. The 1.8V writes could probably use the fast BitDAC mode, but reads have the issue that the P2 comparator may be too slow. Same issues as HyperRAM basically, and in addition I think you may need to switch the supply voltage to the card dynamically as well.
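To put the 50-104MB/s figures in context, here is a small sketch of the peak 4-bit bus arithmetic and what it would mean in uncompressed frames per second. These are theoretical bus rates that ignore all protocol overhead, and whether a P2 can actually clock the bus that fast is exactly the open question. The function names are made up for illustration.

```python
def sd_bus_mb_per_s(clk_mhz, bus_bits=4, ddr=False):
    """Peak SD bus throughput in MB/s, ignoring command/CRC overhead."""
    edges = 2 if ddr else 1
    return clk_mhz * bus_bits * edges / 8


def raw_frames_per_s(mb_per_s, width, height, bytes_per_pixel):
    """Uncompressed frames per second at a given sustained data rate."""
    return mb_per_s * 1_000_000 / (width * height * bytes_per_pixel)


print(sd_bus_mb_per_s(100))            # 50.0  (UHS-I SDR50)
print(sd_bus_mb_per_s(50, ddr=True))   # 50.0  (UHS-I DDR50)
print(sd_bus_mb_per_s(208))            # 104.0 (UHS-I SDR104)

# Frame rate for 1920x1200 at 8bpp if the peak rate could be sustained
print(raw_frames_per_s(sd_bus_mb_per_s(208), 1920, 1200, 1))
```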
I found this part that does this SD voltage translation but it requires extra direction control signals that would complicate things slightly. Still could be useful though...
https://assets.nexperia.com/documents/data-sheet/IP4856CX25_C.pdf
Nice looking part - includes ESD protection, and a quite fast interface.
Yes we really need to get some figures on this input comparator hooked up beside the slow DAC. Should be easy enough to get the P2 to characterise it itself, in theory.
@Tubular, yeah the direction control pins would likely only need to be flipped at the start and end of transactions so the streamer could potentially still operate in between on the bulk data. I think in SD mode you can overlap control and data however which could possibly complicate things but they have independent direction controls anyway. We could possibly bit bang one and use the streamer for the nibble transfers at the same time.
If the cycle budget is critical in the single wide mode, could you auto-decrement rdlut addresses and manually decrement wrlut one? That could give you an extra long for blinking, or maybe use it to toggle c and add nc or c to the if_nz and if_z prefixes so that you don't need to duplicate almost half the code?
I might be able to save ptra and restore it from the stack (2 instructions outside the loop) and then use that register instead of pb + decrement. I'd need c for testing each character's flashing state bit so that's four more (or possibly two instructions more if the two parts could be combined). I think the global flash on/off test outside the loop could be removed in mono mode so the character's attribute always defines the flash state. I will need to keep z free for wide/normal testing so both text loops can be shared.
Even if flashing text can't be done on a P2 operating at 2x the pixel clock due to the timing budget, it might be doable at 3-4x, so it is worth considering whether it can be made to fit the COGRAM footprint somehow. At 3x the pixel clock, only the COGRAM budget becomes the issue, not the scan line cycle time budget. So I can have two different mono text codebases that get patched in depending on the P2 clock to pixel clock ratio. This allows features to scale dynamically with system performance, which is good.
Eg. It could become something like this (needs testing to confirm):
1x clock: transparent mode only, streaming raw image data from existing memory frame/line buffers, any text rendered via other sprite COG(s), no mouse overlay unless provided by the other COG(s)
2x clock: mono text mode + gfx (except 32bpp mode from int hub memory, ext memory 32bpp ok), mouse only in non-text regions for VGA (mouse allowed in text regions for > 800x600? TBD)
3x clock: mono flashing text mode + gfx (incl 32 bpp mode), mouse in any region, some pixel doubling
4x clock: (ditto)
5x clock: colour flashing text mode, pixel doubling (perhaps not in 32bpp mode, TBD)
6x+ clock: all features possible, pixel doubling in all modes
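The tier list above could be expressed as a simple lookup, sketched here. This is only an illustration of the idea of choosing a feature set from the P2 clock to pixel clock ratio; the dictionary and the function name tier_for_ratio are hypothetical, and the tiers themselves still need testing as noted.

```python
# Feature tiers keyed by the lowest clock ratio at which they apply.
# 4x is "ditto" 3x, so it has no entry of its own.
FEATURE_TIERS = {
    1: "transparent mode only, no text or mouse from this COG",
    2: "mono text + gfx (no 32bpp from hub), mouse only outside text for VGA",
    3: "mono flashing text + gfx incl 32bpp, mouse anywhere, some pixel doubling",
    5: "colour flashing text, pixel doubling (32bpp doubling TBD)",
    6: "all features, pixel doubling in all modes",
}


def tier_for_ratio(ratio):
    """Return the feature description for a whole-number clock ratio."""
    if ratio < 1:
        raise ValueError("P2 clock must be at least 1x the pixel clock")
    key = max(k for k in FEATURE_TIERS if k <= ratio)
    return FEATURE_TIERS[key]


for r in (1, 2, 3, 4, 5, 6, 8):
    print(f"{r}x: {tier_for_ratio(r)}")
```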
For the flashing monochrome text variant, first I can patch this original code from the coloured text version: changing it into this version: Then I'd patch the new mono+flash text code block as this code below. It all fits with 4 longs to spare! I think for standard width text it does the inner loop in 27 clocks (now done per 16 pixels), instead of taking 38 clocks for 32 pixels. This flashing code consumes another 320 clocks from the 1280 clock scan line budget for VGA, so it is definitely not going to fit with 2x pixel clock operation, but I expect it would certainly fit at the higher rates. I'll code it up soon and try it out at 3x pixel clock rates. I think it should work out fine with any luck.
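A quick check of that arithmetic, as a sketch: it assumes the standard 640 active and 800 total pixels per VGA line, and that the loop counts quoted above apply across the whole active line. Read that way, the figures reproduce the 320 extra clocks. The function names are made up for illustration.

```python
VGA_ACTIVE_PIXELS = 640   # visible pixels per scan line
VGA_TOTAL_PIXELS = 800    # visible + blanking, standard 640x480 timing


def loop_clocks(active_pixels, clocks_per_pass, pixels_per_pass):
    """Clocks spent in the inner text loop over one scan line."""
    passes = -(-active_pixels // pixels_per_pass)   # ceiling division
    return passes * clocks_per_pass


def line_budget(pixels, clock_ratio):
    """P2 clocks available for a span of pixels at a given clock ratio."""
    return pixels * clock_ratio


mono = loop_clocks(VGA_ACTIVE_PIXELS, 38, 32)    # 20 passes * 38 = 760
flash = loop_clocks(VGA_ACTIVE_PIXELS, 27, 16)   # 40 passes * 27 = 1080
print(mono, flash, flash - mono)                 # 760 1080 320

for ratio in (2, 3):
    print(ratio,
          line_budget(VGA_ACTIVE_PIXELS, ratio),   # 1280 at 2x, 1920 at 3x
          line_budget(VGA_TOTAL_PIXELS, ratio))    # 1600 at 2x, 2400 at 3x
```

At 2x the flashing loop alone would take 1080 of the 1280 active-line clocks, which leaves very little for everything else, while at 3x there are 1920.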
Isn't the "1280 clock scan line budget for VGA" actually 1600 including blanking?
EDIT:
Removed nonsense.
Good news - I tested the code above and managed to fix it to work for VGA at 3x with the mouse and flashing text.
So it looks like this for VGA resolution (this could differ for other resolutions, but VGA is more demanding with fewer clocks than SVGA, XGA etc):
Mono text needs 2x pixel clock with no mouse in text regions, e.g. P2 clk ~ 50MHz
Flashing mono text needs 3x or 4x pixel clock including the mouse, e.g. 75MHz <= P2 clk <= 100MHz
Colour/Flashing text needs 5x pixel clock including the mouse, e.g. P2 clk >= 125MHz
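Those frequencies are just the clock ratio times a nominal 25MHz VGA pixel clock (the standard figure is 25.175MHz). A minimal sketch, with hypothetical mode names, for working out the minimum P2 clock at other pixel clocks:

```python
# Minimum P2 clock to pixel clock ratio per text mode, from the VGA results above.
MIN_RATIO = {
    "mono": 2,           # no mouse in text regions
    "mono_flash": 3,     # mouse allowed
    "colour_flash": 5,   # mouse allowed
}


def min_p2_clk_mhz(mode, pixel_clk_mhz=25.0):
    """Minimum P2 clock in MHz for a text mode at the given pixel clock."""
    return MIN_RATIO[mode] * pixel_clk_mhz


for mode in MIN_RATIO:
    print(mode, min_p2_clk_mhz(mode))   # 50.0, 75.0, 125.0
```

Other resolutions may need different ratios, so the table would have to be measured per mode rather than assumed.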
Yes this is a good idea too. However I'd have to change the way I patch the monoflash1 instruction so I could get rid of it. Instead I could modify the following "and" instruction to work on a different register (like "a" instead of "c") when flashing is disabled outside the loop. It's handy to still keep the global flashing on/off control bit that already exists in the region config.
How about bitz monoflash1,#20?
EDIT: monoflash1 label moved to rdlut d, ptra-- wc
Thanks for the clock budget clarification. Your 6x8 font at 1920x1200 should look fabulous in amber, e.g. RGB (255,191,0). I have a Zenith amber monitor in the attic that I always found easy on the eye.