50 column VGA text driver in one cog - 60/50/40/30/25/20...rows! (RELEASED)
Bill Henning
This is the second driver in my high resolution, single cog flexible VGA text driver series.
MikronautsSVGA50 - 50 column driver, requires 5MHz crystal for 80MHz prop clock speed
As far as a monitor is concerned, it is VESA 800x600 60Hz timing.
I added VGA_Text-like methods, so it should be a drop-in replacement (except for palette handling).
The archive contains five fonts, and the number of times each scan line is displayed can be set in the driver, so this one driver provides the following text modes:
50x75, 50x60, 50x50, 50x40, 50x30, 50x25, 50x20
50x15, 50x12, 50x10, 50x8, 50x6, 50x5
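Every row count in that list divides the 600 visible lines of the 800x600 timing evenly, so each mode works out to a whole number of scan lines per text row (font height times the scan-line repeat). A quick Python check of that arithmetic; the function name is only for illustration, and how each total splits between font height and repeat count is not worked out here:

```python
VISIBLE_LINES = 600  # visible scan lines in 800x600 timing

# Row counts taken from the mode list above (all are 50 columns wide).
ROWS = [75, 60, 50, 40, 30, 25, 20, 15, 12, 10, 8, 6, 5]


def scan_lines_per_row(rows, visible=VISIBLE_LINES):
    """Scan lines used by one text row (font height x repeat count)."""
    lines, remainder = divmod(visible, rows)
    if remainder:
        raise ValueError(f"{rows} rows do not fit {visible} lines evenly")
    return lines


for rows in ROWS:
    print(f"50x{rows}: {scan_lines_per_row(rows)} scan lines per text row")
```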
It is a regular "text mode" driver, and as such, uses one byte per character in the screen buffer.
The first 128 characters contain 32 graphic elements, then 96 ASCII characters, and the second 128 characters normally contain the inverse of the first 128 characters. The font is interleaved - this was the only way I could get the driver to work fast enough to get 50 columns!
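A small Python sketch of that character layout and the buffer size it implies. It assumes the 32 graphic elements sit at codes 0-31, the ASCII characters at their usual codes 32-127, and that bit 7 selects the inverse copy; the function names are hypothetical and not part of the driver:

```python
COLUMNS = 50  # characters per text row in this driver


def buffer_bytes(rows, columns=COLUMNS):
    """Screen buffer size at one byte per character."""
    return rows * columns


def classify(code):
    """Return (kind, inverse) for a screen byte.

    Assumed layout: 0-31 graphics, 32-127 ASCII, 128-255 the inverse
    of the first 128 codes.
    """
    if not 0 <= code <= 255:
        raise ValueError("screen codes are single bytes")
    inverse = code >= 128
    base = code & 0x7F  # strip the assumed inverse bit
    kind = "graphic" if base < 32 else "ascii"
    return kind, inverse


print(buffer_bytes(75))          # bytes needed for the 50x75 mode
print(classify(ord("A")))        # normal 'A'
print(classify(ord("A") | 0x80)) # inverse 'A'
```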
I am MIT licensing the driver. Please don't change the file names, so at least I get some exposure out of it.
Thanks to kuroneko's hub/vid forced sync patch, this driver is now released as v1.00
kuroneko sent in an improvement to his sync patch, so I updated the driver to v1.01
I'll fix the 8x8 font problem kuroneko found tomorrow.
Comments
Try the following changes to the CON block:
hf = 20
hs = 64
hb = 44
This puts the timing back to bog standard VESA 800x600@60Hz with 40MHz dot clock
(actually the above numbers work with the 20MHz dot clock I am using to make the 50 column / 400 pixel "half" SVGA mode)
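To see why halving works, here is a rough Python check. The full-rate porch and sync widths (40/128/88 pixels at 40MHz) are the published VESA 800x600@60Hz figures rather than anything from the driver source, and the helper name is made up for the example:

```python
def line_rate_hz(dot_clock_hz, active, front, sync, back):
    """Horizontal line rate for a given dot clock and line layout (in pixels)."""
    return dot_clock_hz / (active + front + sync + back)


# Standard VESA 800x600@60Hz line: 800 active + 40 + 128 + 88 = 1056 pixels.
vesa = line_rate_hz(40_000_000, 800, 40, 128, 88)

# Half-rate version: 400 active pixels with hf = 20, hs = 64, hb = 44.
half = line_rate_hz(20_000_000, 400, 20, 64, 44)

print(f"VESA line rate: {vesa:.1f} Hz")
print(f"half line rate: {half:.1f} Hz")
```

Both layouts give the same line rate, so the monitor sees the same horizontal timing even though only 400 pixels are clocked out per line.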
You may need to play with those values a bit. I had different values in the CON block, ones suited to my test monitor.
I'll try a couple of other monitors and throw a scope on it later today - I appreciate your reports, it is a big help with debugging the beta!
I am concerned with the noise you found on the sync signals; I have an "or" to the color word in the driver that should have prevented that.
I've been giving it some thought - perhaps I need to let the video PLL settle some more... I had a similar issue with my Morpheus drivers. I have to test some boards before I can debug it some more.
Perhaps I should use the counters in a different mode... not the video pll mode... hmmm NCO/single ended without an output pin should work if I divide the system clock by four to generate the 20MHz clock this mode needs
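For reference, the divider arithmetic for that idea, sketched in Python. In NCO mode the counter adds FRQ to PHS on every system clock and PHS bit 31 is the output, so the output frequency is sysclk * FRQ / 2^32. The helper name is hypothetical, and this only shows the FRQ value; it says nothing about whether the video hardware can actually be clocked this way:

```python
def nco_frq(target_hz, sys_clock_hz):
    """FRQ value that makes PHS bit 31 toggle at target_hz in NCO mode."""
    return round(target_hz * 2**32 / sys_clock_hz)


frq = nco_frq(20_000_000, 80_000_000)  # divide the 80MHz system clock by four
print(hex(frq))
```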
The LCD panel I'm using is picky, but not to that extent... at the moment I cannot test with another one.
If it is of any help, I noticed that on the occasions when it does sync (it may take as many as half a dozen resets, and it is completely unpredictable), the demo board shows a slightly different, less uniform light pattern on the LEDs, which seems a bit strange to me ?:-/
(But this driver has the same problem)
After lunch I did a bit of testing with ViewPort ... and I verified the WEIRD output on hsync/vsync on the VGA64 driver - and I bet ViewPort will show it on the SVGA50 as well.
It is extremely weird as:
- I verified that I OR hsync/vsync with $0303 to make sure HSYNC/VSYNC is high during regular pixel display
- I verified it's not the 'color' long getting clobbered by adding a 'fixedcolor' long
Good thing I only released the drivers as 'BETA' versions
ViewPort verifies that it is NOT noise, but output on the pins, so my main suspect right now is something wrong with pipelining the WAITVID's the way I am doing it, trying to pseudo-sync them to hub cycles.
All of my new drivers depend on keeping the visible line display synced to the hub; I was counting on waitvid happening in the right window (the 9 clock cycles available after I read the pixel values). Waitvid is supposed to take 5+ clocks, and kuroneko's tests showed it taking 4-6 cycles... so I thought I was safe.
One thing I can try is to shave some time off the hsync routine by in-lining it, perhaps I am throwing the first hub read out of whack... the other is to use NCO mode to divide down the CPU clock, which *should* result in better synchronization.
I could also try "priming" the pipeline by reading the first 8 pixels to display during the horizontal back porch, after reading the color entry, and have the first waitvid happen right after.
Waitvids are 6 cycles essentially.
At 5 cycles, the D&S data will be unstable. I believe Linus was the first to run into this with his 800x600 display post, where the waitvids would display the data, even with some of them commented out. They were just looping, picking up what was on the D&S busses. Some of you might remember the humorous, "look ma! No hands!" he had left in the comments.
(that post is now gone too, so I can't link it)
Nicely done, if I'm reading that right
Also, you've got basically this:
You deffo need to prime the pump to start this off right, IMHO. Look at the potatotext TV COG, in the "real_pixels" subroutine for a great loop that Eric did that basically does a similar thing, using the waitvid in parallel with the HUB OP. The loop sets up one "character" for the first read to get things running, and that character is just blank pixels, set to the border color. A calculation happens where the sync and border is just adjusted for active lines, where this read will occur, and it all kind of works.
I'm a little concerned about the single waitvid being packed between the two hubops. The waitvid being 6 cycles to really get its S&D data may not work well there, and neither will another instruction... Have you put a nop in there, just for grins? Would be curious to see whether or not that's tolerated.
What you are wanting to do here may be possible though.
Have you tried fetching a long of character data at a time?
It seems to me, with the way you are storing your font data, you might be able to just shift bytes out, and shift pixels into the "pixel" long, for a net gain in throughput. Worked out in my driver, but the sweep timing is slower, and the operation profile different.
Expanding it to a RDLONG would gain you all those almost aligned cycles, but you've got a shift, mask, or, add set of instructions to deal with. Might not be a gain. Still, worth stepping through... Doesn't look promising at first glance. No way to hit the two instruction window with the shift, add, mask, or that's gotta happen.
You could reorder your loop...
RDBYTE
add
waitvid
RDBYTE
add
Use the "prime the pump" operation with blank pixels to set this up, so that the waitvid is operating off the data from the earlier loop.
You are reading it right :-) ... and thanks!
I thought the back porch pixels being output would overlap the rdlong to fetch the colors and the first character's worth of the unrolled loop above; that may still work if I inline waitvids for hf/hs/hb instead of calling Chip's clever "do blank / do hsync / do most of vsync" routine.
I spent a lot of time with paper, pencil, calculator.. then spreadsheet... doing timing analysis before trying this.
It *SHOULD* work because I am deliberately using a 20MHz clock for the 64/50 drivers (pll4x), and 25MHz (pll4x) with a 6.25MHz xtal for the 80 column mode
Why?
Because then each pixel ends up being four processor cycles... thus I had 32 cycles to fetch/decode the tile, and by unrolling, leaving a 9 cycle window for waitvid to play with.
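The cycle budget can be checked with a few lines of Python. The 80MHz/20MHz case is straight from the posts above; for the 80 column mode the sketch assumes the 6.25MHz crystal is multiplied by 16 to give a 100MHz system clock, which is not stated in the thread. Names are illustrative only:

```python
PIXELS_PER_CHAR = 8  # 400 pixels across 50 columns


def clocks_per_pixel(sys_clock_hz, dot_clock_hz):
    """System clocks per pixel; raises if the ratio is not a whole number."""
    clocks, remainder = divmod(sys_clock_hz, dot_clock_hz)
    if remainder:
        raise ValueError("dot clock does not divide the system clock evenly")
    return clocks


# 64/50 column drivers: 5MHz crystal -> 80MHz system clock, 20MHz dot clock.
per_pixel = clocks_per_pixel(80_000_000, 20_000_000)
per_char = per_pixel * PIXELS_PER_CHAR

# 80 column mode: ASSUMED 6.25MHz x 16 = 100MHz system clock, 25MHz dot clock.
per_pixel_80col = clocks_per_pixel(100_000_000, 25_000_000)

print(per_pixel, per_char, per_pixel_80col)
```

With four clocks per pixel, one 8-pixel character lasts 32 clocks, which is the fetch/decode budget described above.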
Priming + in-line hsync + NCO mode instead of PLL *should* fix it, I should have some time tonight and tomorrow to try that.
What's funny is that it works almost all the time with my test monitor!
Nope, I deliberately wanted to give Waitvid some slop, but I can try a NOP.
The initial rolled up version that was counting on 5-cycle waitvids was a disaster (it had a djnz after the waitvid)
The math, and spreadsheets say it should work... probably just needs priming, inline hsync, and maybe NCO.
The re-order idea is a good one...
The read long for character data cannot work as I'd be fetching the next row of the character.
I do have a design for an 80x25 for 80MHz that does that; however, it requires two blank rows above each character to do prefetches. I have not finished that one yet because, due to the blank rows, it cannot do contiguous vertical graphics characters.
all:
I've just spent some more quality time with ViewPort - no joy yet.
I tried a quick-and-dirty "priming", and it did not help.
Looking at the logic analyzer output, it sure looks like it is outputting something different than is supplied to WAITVID; so I guess I have more experiments to run.
By priming, re-ordering to put waitvid first, and NCO I still think it should work... or at least I hope so, because it usually syncs on my monitor, and looks great...
I like Bill's trick of storing the font in the other orientation. Might have to do that on potatotext when it goes to VGA. Hate storing fonts reversed and shuffled, but... hate not getting the higher character densities too.
@Bill. I think this is kind of instructive. I really wish I had archived the post that Linus did. He did a similar thing, where each pixel was a perfect multiple of a cycle, ending up with an unrolled loop very similar to yours. There is something about doing that, and the D&S latch on the waitvid, that contradicts the 5 cycles we've come up with. The thing isn't stable at 5 in my adventures, but works well at 6. Problem with that is it latches the data when it latches it... Not entirely sure when that is. For almost every use case, that bit of data isn't all that important. It sure looks like your case should work.
Probably, several events are coming together at once. Hopefully, the options you've got open to you shuffle things around enough to break whatever it is that's allowing the waitvid to catch what I would call "bus noise", whatever is on there at that indeterminate time.
I might have a copy of the code Linus did though. I'm looking for it. There are a coupla areas about the waitvid I don't understand well yet, and this is one of them. The other being the impact of vcfg mid frame. Ran into trouble on that, switching from 2 to 4 color mode a time or two. Different subject, but it's why I've not written a driver that can do mixed modes in a scan line yet.
Maybe you will shake out a new fact or two
Thanks! It was the only way I could get the timing right... on paper anyway.
Yep, I wish you had archived it too. I figured that a 9 cycle window would be plenty!!!
I just tried several other instruction orders... no joy.
Thanks for looking for Linus' code! It might help.
As far as I can tell, the code *should* work, and in fact it does most of the time with my little 5" monitor!
Let me know what you find out about mid-frame vcfg changes
Argh this is the problem with pushing the envelope - you run into boundary conditions!
I can do single cog hirez text drivers by using a blank scan line to hide half the hub accesses, but that is (a) cheating and (b) does not allow vertical graphics by stacking tiles...
Can't spend more time on it tonight, I have a wife
(ideas for next prop session for you obviously)
How big of a deal is interlaced display? You could cut your vertical refresh to 1/2, then just draw each line twice for a considerably more relaxed timing. Many newer displays don't even show the interlace, though a CRT will. I have one of each on the desk, and the LCD appears to just dim the pixels a little. No flicker. Drawing both scans looks great. Something to think about.
Another option is the 640 pixel timing. It doesn't have to be 640 pixels...
Finally, since it appears to work some of the time, perhaps the latching issue comes down to what cycle that COG is operating on? Could try a waitcnt to shift it to a known one, and test all those cases.
---> and be good to her. She's a keeper.
I would prefer to avoid interlaced as I suspect a lot of monitors would barf on it, but I will keep it in mind...
If you look at VGA64... it scales down the clock to 20MHz, and outputs only 80% of the pixels that VGA uses - i.e. 512 active + 128 hf/hs/hb @ 20MHz pixel clock versus 25MHz VESA at 640 active + 160 hf/hs/hb
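The same scaling can be verified numerically. A minimal Python sketch using only the figures quoted above (the helper name is made up for the example):

```python
def line_rate_hz(dot_clock_hz, active, blanking):
    """Horizontal line rate given active and total blanking pixels per line."""
    return dot_clock_hz / (active + blanking)


vga = line_rate_hz(25_000_000, 640, 160)    # 25MHz, 640 active + 160 hf/hs/hb
vga64 = line_rate_hz(20_000_000, 512, 128)  # 20MHz, 512 active + 128 hf/hs/hb

print(vga, vga64)  # both line rates in Hz
print(512 / 640)   # fraction of the VGA pixels that VGA64 outputs
```

Scaling the clock, the active pixels and the blanking by the same 80% leaves the line rate unchanged, which is why the monitor still accepts it.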
Good call on what cog cycle... also which cog it runs on may matter. I will do many experiments.
I just tried to run it on NCO mode, instead of PLL, and I don't get any output from waitvid at all! Weird, I must be missing something.
potatohead suggested reading the screen buffer four characters at a time.
Initially I thought that the cycle count would be the same, but in fact I was able to shave one cycle, and use 32 pixel waitvids.
This leaves an 8 cycle window for waitvid.
Unfortunately that does not work all the time either.
I've come up with a couple of other ideas which should work:
(famous last words - my original version should work too!)
1) come up with cute code to synchronize PLL generated clock with processor clock
I actually think I have thought of a way to do this, but I will need:
- two pins to output clocks to while synchronizing
- five cogs (vga cog, four temporary monitor cogs until I sync it)
- theoretically possible without the temporary monitor cogs, but trickier, need to do more research
2) reduce the work in the scan line display so that waitvid/rdbyte have a lot more elbow room to sync
- I think I figured out a way, but it is ...umm... not easy, and I am not 100% sure I can make it fit into a single cog
3) give up on graphics blocks that can touch seamlessly - I hate this idea
Linus has some great code examples for synchronization here:
http://www.linusakesson.net/programming/propeller/pllsync.php
This is intended to sync up multiple COGS doing their own video, but...
8 cycles is enough time. I'm thinking the pixel clock being exactly a cycle is at issue somehow...
Thanks for that link! I will study it carefully!
EDIT: I just finished reading it - what a brilliant analysis!
No worries on the who said what. Just more or less wanted you to know I wasn't suggesting a long fetch on the character pixel data (because I know better)
hey, we've got it cleared up now
Edit: Now I know it doesn't work like this.
The video generator really doesn't operate in lock step with the HUB like that. It's all COG data. The HUB data is fetched, but by the time waitvid gets it, the data resides in the COG.
What I was thinking was that when the pixel is exactly an instruction cycle, there might be a boundary condition in play, bug maybe, where the data latch doesn't happen properly, corrupting that frame to follow. Almost like the prop can't reliably move the data for an instruction, and perform the latch for the waitvid.
Bill, can you change your porch length to offset the pixel timing just a little, or run PLLA double speed, and ask for a frame, plus half a pixel? Or frame, minus half a pixel, whatever it takes to get the pixel disassociated with the instruction clock? (clearly have to account for that with the porches and there will be a gap)