50 column VGA text driver in one cog - 60/50/40/30/25/20...rows! (RELEASED) — Parallax Forums

Bill Henning Posts: 6,445
edited 2013-08-24 16:37 in Propeller 1
This is the second driver in my high resolution, single cog flexible VGA text driver series.

MikronautsSVGA50 - 50 column driver, requires 5MHz crystal for 80MHz prop clock speed

As far as a monitor is concerned, it is VESA 800x600 60Hz timing.

I added VGA_Text-like methods, so it should be a drop-in replacement (except for palette handling).

The archive contains five fonts, and the number of times each scan line is displayed can be set in the driver, so this one driver provides the following text modes:

50x75, 50x60, 50x50, 50x40, 50x30, 50x25, 50x20
50x15, 50x12, 50x10, 50x8, 50x6, 50x5
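
The mode list follows from dividing the 600 visible lines by the glyph height times the scan-line repeat count. A minimal sketch of that arithmetic, assuming font heights of 8, 10, 12, 15 and 20 lines (inferred from the row counts above, not taken from the archive; `rows_for` and `all_row_counts` are hypothetical helpers):

```python
VISIBLE_LINES = 600  # visible scan lines in 800x600 timing

# Assumed glyph heights in scan lines; inferred from the listed row counts.
FONT_HEIGHTS = (8, 10, 12, 15, 20)


def rows_for(font_height, repeat):
    """Return the text row count, or None if the cell height does not divide 600."""
    cell = font_height * repeat
    if VISIBLE_LINES % cell:
        return None
    return VISIBLE_LINES // cell


def all_row_counts(max_repeat=10):
    """Collect every whole row count reachable with the assumed fonts."""
    counts = set()
    for height in FONT_HEIGHTS:
        for repeat in range(1, max_repeat + 1):
            rows = rows_for(height, repeat)
            if rows is not None:
                counts.add(rows)
    return sorted(counts, reverse=True)


if __name__ == "__main__":
    print(all_row_counts())
```

Under these assumptions every mode in the list is reachable, for example 50x25 from a 12-line font with each scan line shown twice.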

It is a regular "text mode" driver, and as such, uses one byte per character in the screen buffer.

The first 128 characters contain 32 graphic elements, then 96 ASCII characters, and the second 128 characters normally contain the inverse of the first 128 characters. The font is interleaved - this was the only way I could get the driver to work fast enough to get 50 columns!
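
One way to read "interleaved" is that the font is stored one scan row at a time, so the pixel byte for a character is a single add away from that row's base address. A sketch under that assumption; the 256-glyph stride and the helper names are illustrative and may not match the driver's actual layout:

```python
GLYPHS = 256  # one byte per character in the screen buffer gives 256 codes


def interleave(font, height):
    """Convert glyph-major data (all rows of glyph 0, then glyph 1, ...) to row-major."""
    assert len(font) == GLYPHS * height
    out = bytearray(GLYPHS * height)
    for ch in range(GLYPHS):
        for row in range(height):
            out[row * GLYPHS + ch] = font[ch * height + row]
    return bytes(out)


def pixel_byte(interleaved, row, ch):
    """Look up one glyph row: row base plus character code, no multiply or shift."""
    row_base = row * GLYPHS  # assumed to be set up once per scan line
    return interleaved[row_base + ch]
```

With a glyph-major font the inner loop would need `ch * height + row`, which costs a shift and an extra add per character; the row-major layout moves that work out of the per-character path.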

I am MIT licensing the driver. Please don't change the file names, so that I at least get some exposure out of it.

Thanks to kuroneko's hub/vid forced sync patch, this driver is now released as v1.00.

kuroneko sent in an improvement to his sync patch, so I updated the driver to v1.01

I'll fix the 8x8 font problem kuroneko found tomorrow.

Comments

  • localroger Posts: 3,452
    edited 2011-02-21 05:54
    Something seems to be wrong. I've tried this driver on 3 different monitors with both a SVT1+VGA daughterboard and a stock demoboard. The demoboard LEDs light up as if it's sending VGA, but none of the monitors recognize the signal, and all of them recognize other VGA drivers. (One of them won't sync my own experimental 800x600 driver, but it sensibly says "mode not supported" and correctly reports the scan frequency I used.)
  • Bill Henning Posts: 6,445
    edited 2011-02-21 06:40
    Hi,

    Try the following changes to the CON block:

    hf = 20
    hs = 64
    hb = 44

    This puts the timing back to bog-standard VESA 800x600@60Hz with a 40MHz dot clock.

    (actually the above numbers work with the 20MHz dot clock I am using to make the 50 column / 400 pixel "half" SVGA mode)

    You may need to play with those values a bit. I had different values in the CON block, ones suited to my test monitor.
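
    For reference, the three values are the standard VESA 800x600@60Hz horizontal front porch, sync and back porch widths (40/128/88 pixels at 40MHz) divided by two. A small sketch of that scaling; the function and dictionary names are illustrative, not from the driver:

```python
# Standard VESA 800x600@60Hz horizontal timing, in pixels at a 40 MHz dot clock.
VESA_800x600 = {"active": 800, "hf": 40, "hs": 128, "hb": 88}
VESA_DOT_CLOCK_HZ = 40_000_000


def scale_timing(timing, divisor):
    """Divide every horizontal field by `divisor`; all fields must divide evenly."""
    scaled = {}
    for name, pixels in timing.items():
        if pixels % divisor:
            raise ValueError(f"{name}={pixels} is not divisible by {divisor}")
        scaled[name] = pixels // divisor
    return scaled


def line_rate_hz(timing, dot_clock_hz):
    """Horizontal scan rate: dot clock divided by total pixels per line."""
    return dot_clock_hz / sum(timing.values())


half = scale_timing(VESA_800x600, 2)
print(half)
print(line_rate_hz(VESA_800x600, VESA_DOT_CLOCK_HZ),
      line_rate_hz(half, VESA_DOT_CLOCK_HZ // 2))
```

    Halving both the pixel counts and the dot clock leaves the line rate unchanged, which is why a monitor should still treat the signal as 800x600@60Hz.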
  • localroger Posts: 3,452
    edited 2011-02-21 08:51
    I tried the alt numbers but it's still not working. My two LCD monitors don't even try and the CRT goes crazy for a few seconds before displaying "sync out of range." On a scope meter, both sync signals appear to be swamped with noise. Everything works and the scope meter looks normal with a different VGA demo.
  • Bill Henning Posts: 6,445
    edited 2011-02-21 08:55
    Interesting.

    I'll try a couple of other monitors and throw a scope on it later today - I appreciate your reports, it is a big help with debugging the beta!

    I am concerned with the noise you found on the sync signals; I have an "or" to the color word in the driver that should have prevented that.
  • Martin Hodge Posts: 1,246
    edited 2011-02-21 09:43
    This is very strange. With a scope on the sync signals I can reset the prop many times and, seemingly at random, get clear or garbled sync signals.
  • Bill Henning Posts: 6,445
    edited 2011-02-21 09:54
    Argh.

    I've been giving it some thought - perhaps I need to let the video PLL settle some more... I had a similar issue with my Morpheus drivers. I have to test some boards before I can debug it some more.

    Perhaps I should use the counters in a different mode... not the video pll mode... hmmm NCO/single ended without an output pin should work if I divide the system clock by four to generate the 20MHz clock this mode needs
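
    The divide-by-four is plain NCO phase-accumulator arithmetic: bit 31 of PHSA toggles at system clock × FRQA / 2^32. A sketch of that calculation only; it says nothing about whether the video generator will accept a counter in this mode, and `nco_frq` and `nco_output_hz` are hypothetical helpers:

```python
def nco_frq(sys_clock_hz, target_hz):
    """Return the FRQx value for an NCO output of target_hz: FRQ = target * 2**32 / clock."""
    frq, remainder = divmod(target_hz << 32, sys_clock_hz)
    if remainder:
        raise ValueError("target is not an exact binary fraction of the system clock")
    return frq


def nco_output_hz(sys_clock_hz, frq):
    """Frequency of PHSx bit 31 when FRQx is added to PHSx every system clock."""
    return sys_clock_hz * frq / 2**32


frq = nco_frq(80_000_000, 20_000_000)
print(hex(frq))
```

    Because 20MHz is exactly a quarter of 80MHz, the FRQA value is a single bit (2^30) and the accumulator wraps with no jitter.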
  • AntoineDoinel Posts: 312
    edited 2011-02-21 11:16
    Bill, I'm having similar issues (with this and the 64 column driver too).
    The LCD panel I'm using is picky, but not to that extent... at the moment I cannot test with another one.

    If it is of any help, I noticed that on the resets where it is going to sync (it may take as many as half a dozen resets, and it is completely unpredictable), the demo board shows a slightly different, less uniform light pattern on the LEDs, which seems a bit strange to me ?:-/
  • Martin Hodge Posts: 1,246
    edited 2011-02-21 12:53
    Oops, I was testing the 64 column driver. Wrong thread, sorry.

    (But this driver has the same problem)
  • Bill Henning Posts: 6,445
    edited 2011-02-21 13:19
    Martin & Antoine,

    After lunch I did a bit of testing with ViewPort ... and I verified the WEIRD output on hsync/vsync on the VGA64 driver - and I bet ViewPort will show it on the SVGA50 as well.

    It is extremely weird as:

    - I verified that I OR hsync/vsync with $0303 to make sure HSYNC/VSYNC is high during regular pixel display
    - I verified it's not the 'color' long getting clobbered by adding a 'fixedcolor' long

    Good thing I only released the drivers as 'BETA' versions :)

    ViewPort verifies that it is NOT noise, but output on the pins, so my main suspect right now is something wrong with pipelining the WAITVID's the way I am doing it, trying to pseudo-sync them to hub cycles.

    All of my new drivers depend on keeping the visible line display synced to the hub. I was counting on waitvid happening in the right window (the 9 clock cycles available after I read the pixel values). Waitvid is supposed to take 5+ clocks, and kuroneko's tests showed it taking 4-6 cycles... so I thought I was safe.

    One thing I can try is to shave some time off the hsync routine by in-lining it, perhaps I am throwing the first hub read out of whack... the other is to use NCO mode to divide down the CPU clock, which *should* result in better synchronization.

    I could also try "priming" the pipeline by reading the first 8 pixels to display during the horizontal back porch, after reading the color entry, and have the first waitvid happen right after.
  • potatohead Posts: 10,261
    edited 2011-02-21 13:30
    The priming stands a good shot at helping, but...

    Waitvids are 6 cycles essentially.

    At 5 cycles, the D&S data will be unstable. I believe Linus was the first to run into this with his 800x600 display post, where the waitvids would display the data, even with some of them commented out. They were just looping, picking up what was on the D&S busses. Some of you might remember the humorous, "look ma! No hands!" he had left in the comments.

    (that post is now gone too, so I can't link it)
  • potatohead Posts: 10,261
    edited 2011-02-21 13:38
    Are you storing the font data per row to eliminate the add and shift needed to otherwise obtain the pixel data address for a character?

    Nicely done, if I'm reading that right :)

    Also, you've got basically this:
                           '------------------------------------- 00
                            rdbyte  tile,screen_ptr
                            add     screen_ptr,#1
                            add     tile,font_ptr
                            rdbyte  pixel,tile
                            waitvid color,pixel
                     
    
    You deffo need to prime the pump to start this off right, IMHO. Look at the potatotext TV COG, in the "real_pixels" subroutine for a great loop that Eric did that basically does a similar thing, using the waitvid in parallel with the HUB OP. The loop sets up one "character" for the first read to get things running, and that character is just blank pixels, set to the border color. A calculation happens where the sync and border is just adjusted for active lines, where this read will occur, and it all kind of works.

    I'm a little concerned about the single waitvid being packed between the two hubops. The waitvid being 6 cycles to really get its S&D data may not work well there, and neither will another instruction... Have you put a nop in there, just for grins? Would be curious to see whether or not that's tolerated.

    What you are wanting to do here may be possible though.

    Have you tried fetching a long of character data at a time?

    It seems to me, with the way you are storing your font data, you might be able to just shift bytes out, and shift pixels into the "pixel" long, for a net gain in throughput. Worked out in my driver, but the sweep timing is slower, and the operation profile different.

    Expanding it to a RDLONG would gain you all those almost aligned cycles, but you've got a shift, mask, or, add set of instructions to deal with. Might not be a gain. Still, worth stepping through... Doesn't look promising at first glance. No way to hit the two instruction window with the shift, add, mask, or that's gotta happen. :(

    You could reorder your loop...

    RDBYTE
    add
    waitvid
    RDBYTE
    add

    Use the "prime the pump" operation with blank pixels to set this up, so that the waitvid is operating off the data from the earlier loop.
  • Bill Henning Posts: 6,445
    edited 2011-02-21 14:07
    potatohead wrote: »
    Are you storing the font data per row to eliminate the add and shift needed to otherwise obtain the pixel data address for a character?

    Nicely done, if I'm reading that right :)

    You are reading it right :-) ... and thanks!
    potatohead wrote: »
    Also, you've got basically this:
                           '------------------------------------- 00
                            rdbyte  tile,screen_ptr
                            add     screen_ptr,#1
                            add     tile,font_ptr
                            rdbyte  pixel,tile
                            waitvid color,pixel
                     
    

    You deffo need to prime the pump to start this off right. Look at the potatotext TV COG, in the "real_pixels" subroutine for a great loop that Eric did that basically does a similar thing, using the waitvid in parallel with the HUB OP. The loop sets up one "character" for the first read to get things running, and that character is just blank pixels, set to the border color. A calculation happens where the sync and border is just adjusted for active lines, where this read will occur, and it all kind of works.

    I thought the back porch pixels being output would overlap the rdlong to fetch the colors and the first character's worth of the unrolled loop above; that may still work if I inline waitvids for hf/hs/hb instead of calling Chip's clever "do blank / do hsync / do most of vsync" routine.

    I spent a lot of time with paper, pencil, calculator.. then spreadsheet... doing timing analysis before trying this.

    It *SHOULD* work because I am deliberately using a 20MHz clock for the 64/50 drivers (pll4x), and 25MHz (pll4x) with a 6.25MHz xtal for the 80 column mode.

    Why?

    Because then each pixel ends up being four processor cycles... thus I had 32 cycles to fetch/decode the tile, and by unrolling, leaving a 9 cycle window for waitvid to play with.
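
    The arithmetic behind that choice, as a sketch. The x16 PLL multiplier on the crystal and the 16-clock hub rotation come from the Propeller 1's documented behaviour rather than from this post, and `clocks_per_pixel` is a hypothetical helper:

```python
HUB_WINDOW_CLOCKS = 16  # each cog gets a hub slot once every 16 system clocks
PIXELS_PER_CHAR = 8


def clocks_per_pixel(xtal_hz, pixel_clock_hz, pll_mult=16):
    """System clocks per pixel; raises if the pixel clock does not divide the system clock."""
    sys_clock_hz = xtal_hz * pll_mult
    clocks, remainder = divmod(sys_clock_hz, pixel_clock_hz)
    if remainder:
        raise ValueError("pixel clock is not an integer fraction of the system clock")
    return clocks


for name, xtal, pixel_clock in (
    ("50/64 column", 5_000_000, 20_000_000),
    ("80 column", 6_250_000, 25_000_000),
):
    per_pixel = clocks_per_pixel(xtal, pixel_clock)
    per_char = per_pixel * PIXELS_PER_CHAR
    print(name, per_pixel, per_char, per_char // HUB_WINDOW_CLOCKS)
```

    In both cases a character spans exactly two hub windows, which matches the two rdbyte instructions per character in the unrolled loop.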

    Priming + in-line hsync + NCO mode instead of PLL *should* fix it, I should have some time tonight and tomorrow to try that.

    What's funny is that it works almost all the time with my test monitor!
    potatohead wrote: »
    I'm a little concerned about the single waitvid being packed between the two hubops. The waitvid being 6 cycles to really get its S&D data may not work well there, and neither will another instruction... Have you put a nop in there, just for grins?

    Nope, I deliberately wanted to give Waitvid some slop, but I can try a NOP.

    The initial rolled up version that was counting on 5-cycle waitvids was a disaster (it had a djnz after the waitvid)
    potatohead wrote: »
    What you are wanting to do here may be possible though.

    Have you tried fetching a long of character data at a time?

    It seems to me, with the way you are storing your font data, you might be able to just shift bytes out, and shift pixels into the "pixel" long, for a net gain in throughput.

    Expanding it to a RDLONG would gain you all those almost aligned cycles, but you've got a shift, mask, or, add set of instructions to deal with. Might not be a gain. Still, worth stepping through... Doesn't look promising at first glance. No way to hit the two instruction window with the shift, add, mask, or that's gotta happen. :(

    You could reorder your loop...

    RDBYTE
    add
    waitvid
    RDBYTE
    add

    Use the "prime the pump" operation with blank pixels to set this up.

    The math, and spreadsheets say it should work... probably just needs priming, inline hsync, and maybe NCO.

    The re-order idea is a good one...

    The read long for character data cannot work as I'd be fetching the next row of the character.

    I do have a design for an 80x25 for 80MHz that does that, however it requires two blank rows above each character to do prefetches. I have not finished that one yet, as due to the blank rows it cannot do contiguous vertical graphics characters.
  • localroger Posts: 3,452
    edited 2011-02-21 15:27
    By the way Bill, I have to thank you for the idea of unrolling the inner loop. After I finish my 3-cog 40 by 18 ROM font display I think I can achieve 40x24 with 1024x768 signaling by unrolling the waitvid loop this way.
  • Bill Henning Posts: 6,445
    edited 2011-02-21 18:24
    You are most welcome Roger!

    all:

    I've just spent some more quality time with ViewPort - no joy yet.

    I tried a quick-and-dirty "priming", and it did not help.

    Looking at the logic analyzer output, it sure looks like it is outputting something different than is supplied to WAITVID; so I guess I have more experiments to run.

    By priming, re-ordering to put waitvid first, and NCO I still think it should work... or at least I hope so, because it usually syncs on my monitor, and looks great...
  • potatohead Posts: 10,261
    edited 2011-02-21 19:10
    @localroger, yeah. Unrolling can do a lot! Baggers has it in his VGA tile and sprite driver we worked on a while back too.

    I like Bill's trick of storing the font in the other orientation. Might have to do that on potatotext when it goes to VGA. Hate storing fonts reversed and shuffled, but... hate not getting the higher character densities too.

    @Bill. I think this is kind of instructive. I really wish I had archived the post that Linus did. He did a similar thing, where each pixel was a perfect multiple of a cycle, ending up with an unrolled loop very similar to yours. There is something about doing that, and the D&S latch on the waitvid, that contradicts the 5 cycles we've come up with. The thing isn't stable at 5 in my adventures, but works well at 6. Problem with that is it latches the data, when it latches it... Not entirely sure when that is. For almost every use case, that bit of data isn't all that important. It sure looks like your case should work.

    Probably, several events are coming together at once. Hopefully, the options you've got open to you shuffle things around enough to break whatever it is that's allowing the waitvid to catch what I would call "bus noise", whatever is on there at that indeterminate time.

    I might have a copy of the code Linus did though. I'm looking for it. There are a coupla areas about the waitvid I don't understand well yet, and this is one of them. The other being the impact of vcfg mid frame. Ran into trouble on that, switching from 2 to 4 color mode a time or two. Different subject, but it's why I've not written a driver that can do mixed modes in a scan line yet.

    Maybe you will shake out a new fact or two :)
  • Bill Henning Posts: 6,445
    edited 2011-02-21 20:23
    potatohead wrote: »
    I like Bill's trick of storing the font in the other orientation. Might have to do that on potatotext when it goes to VGA. Hate storing fonts reversed and shuffled, but... hate not getting the higher character densities too.

    Thanks! It was the only way I could get the timing right... on paper anyway.
    potatohead wrote: »
    @Bill. I think this is kind of instructive. I really wish I had archived the post that Linus did. He did a similar thing, where each pixel was a perfect multiple of a cycle, ending up with an unrolled loop very similar to yours. There is something about doing that, and the D&S latch on the waitvid, that contradicts the 5 cycles we've come up with. The thing isn't stable at 5 in my adventures, but works well at 6. Problem with that is it latches the data, when it latches it... Not entirely sure when that is. For almost every use case, that bit of data isn't all that important. It sure looks like your case should work.

    Yep, I wish you had archived it too. I figured that a 9 cycle window would be plenty!!!
    potatohead wrote: »
    Probably, several events are coming together at once. Hopefully, the options you've got open to you shuffle things around enough to break whatever it is that's allowing the waitvid to catch what I would call "bus noise", whatever is on there at that indeterminate time.

    I just tried several other instruction orders... no joy.
    potatohead wrote: »
    I might have a copy of the code Linus did though. I'm looking for it. There are a coupla areas about the waitvid I don't understand well yet, and this is one of them. The other being the impact of vcfg mid frame. Ran into trouble on that, switching from 2 to 4 color mode a time or two. Different subject, but it's why I've not written a driver that can do mixed modes in a scan line yet.

    Maybe you will shake out a new fact or two :)

    Thanks for looking for Linus' code! It might help.

    As far as I can tell, the code *should* work, and in fact it does most of the time with my little 5" monitor!

    Let me know what you find out about mid-frame vcfg changes :)

    Argh this is the problem with pushing the envelope - you run into boundary conditions!

    I can do single cog hirez text drivers by using a blank scan line to hide half the hub accesses, but that is (a) cheating and (b) does not allow vertical graphics by stacking tiles...

    Can't spend more time on it tonight, I have a wife :)
  • potatohead Posts: 10,261
    edited 2011-02-21 20:31
    Can you bend the timing a little bit, by changing the porches?

    (ideas for next prop session for you obviously)

    How big of a deal is interlaced display? You could cut your vertical refresh to 1/2. then just draw each line twice for a considerably more relaxed timing. Many newer displays don't even show the interlace, though a CRT will. I have one of each on the desk, and the LCD appears to just dim the pixels a little. No flicker, Drawing both scans looks great. Something to think about.

    Another option is the 640 pixel timing. It doesn't have to be 640 pixels...

    Finally, since it appears to work some of the time, perhaps the latching issue comes down to what cycle that COG is operating on? Could try a waitcnt to shift it to a known one, and test all those cases.


    ---> and be good to her. She's a keeper.
  • Bill Henning Posts: 6,445
    edited 2011-02-22 06:33
    potatohead wrote: »
    Can you bend the timing a little bit, by changing the porches?

    (ideas for next prop session for you obviously)

    How big of a deal is interlaced display? You could cut your vertical refresh to 1/2. then just draw each line twice for a considerably more relaxed timing. Many newer displays don't even show the interlace, though a CRT will. I have one of each on the desk, and the LCD appears to just dim the pixels a little. No flicker, Drawing both scans looks great. Something to think about.

    Another option is the 640 pixel timing. It doesn't have to be 640 pixels...

    Finally, since it appears to work some of the time, perhaps the latching issue comes down to what cycle that COG is operating on? Could try a waitcnt to shift it to a known one, and test all those cases.

    ---> and be good to her. She's a keeper.

    Wifey is DEFINITELY a keeper!

    I would prefer to avoid interlaced as I suspect a lot of monitors would barf on it, but I will keep it in mind...

    If you look at VGA64... it scales down the clock to 20MHz, and outputs only 80% of the pixels that VGA uses - ie 512 active 128 hf/hs/hb @ 20MHz pixel clock versus 25MHz VESA at 640 active 160 hf/hs/hb
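
    Put another way, both line layouts last the same time, so the monitor still sees VGA-like horizontal timing. A quick check using the round numbers from this post (real VGA uses a 25.175MHz dot clock, so this is approximate; `line_period_us` is a hypothetical helper):

```python
def line_period_us(active, blanking, pixel_clock_hz):
    """Duration of one scan line in microseconds."""
    return (active + blanking) * 1_000_000 / pixel_clock_hz


vga64 = line_period_us(512, 128, 20_000_000)  # VGA64: 80% of the pixels at 80% of the clock
vga = line_period_us(640, 160, 25_000_000)    # nominal 640-pixel timing
print(vga64, vga)
```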

    Good call on what cog cycle... also which cog it runs on may matter. I will do many experiments.

    I just tried to run it on NCO mode, instead of PLL, and I don't get any output from waitvid at all! Weird, I must be missing something.
  • Bill Henning Posts: 6,445
    edited 2011-02-24 09:51
    Update:

    potatohead suggested reading the screen buffer four characters at a time.

    Initially I thought that the cycle count would be the same, but in fact I was able to shave one cycle, and use 32 pixel waitvids.

    This leaves an 8 cycle window for waitvid.
            '------------------------------------- 00
            rdlong          tile4,screen_ptr
            mov             tile1,tile4
            mov             tile2,tile4
    
            mov             tile3,tile4
            shr             tile2,#8
            shr             tile3,#16
            shr             tile4,#24               ' no need for and
    
            rdbyte          pix1,tile1
            and             tile1,#$ff
            add             tile1,font_base
    
            rdbyte          pix2,tile2
            and             tile2,#$ff
            add             tile2,font_base
    
            rdbyte          pix3,tile3
            and             tile3,#$ff
            add             tile3,font_base
    
            add             tile4,font_base
            shl             pix1,#8
            or              pix1,pix2
            shl             pix1,#8
    
            rdbyte          pix4,tile4
            or              pix1,pix3
            shl             pix1,#8
    
            or              pix1,pix4
            add             screen_ptr,#4
            waitvid         color,pix1
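
    Counting the listing above gives the same 8-cycle figure. A sketch of the tally, assuming 4 clocks per ordinary instruction and 8 clocks per hub instruction that hits its window without waiting (neither cost is stated in the post):

```python
CLOCKS_PER_PIXEL = 4     # 80 MHz system clock / 20 MHz pixel clock
PIXELS_PER_WAITVID = 32
ORDINARY_CLOCKS = 4      # assumed cost of mov/shr/and/add/shl/or
HUB_CLOCKS = 8           # assumed cost of rdlong/rdbyte when already hub-aligned

# Instruction counts taken from the listing above (waitvid itself excluded).
hub_ops = 1 + 4                              # one rdlong, four rdbyte
ordinary_ops = 3 + 3 + 3 + 4 + 3 + 3 + 1     # mov, shr, and, add font_base, shl, or, add screen_ptr

budget = CLOCKS_PER_PIXEL * PIXELS_PER_WAITVID
used = hub_ops * HUB_CLOCKS + ordinary_ops * ORDINARY_CLOCKS
waitvid_window = budget - used
print(budget, used, waitvid_window)
```

    The tally only holds while every hub instruction stays aligned to its window; a single missed window costs another 16 clocks and breaks the budget.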
    

    Unfortunately that does not work all the time either.

    I've come up with a couple of other ideas which should work:

    (famous last words - my original version should work too!)

    1) come up with cute code to synchronize PLL generated clock with processor clock

    I actually think I have thought of a way to do this, but I will need:

    - two pins to output clocks to while synchronizing
    - five cogs (vga cog, four temporary monitor cogs until I sync it)
    - theoretically possible without the temporary monitor cogs, but trickier, need to do more research

    2) reduce the work in the scan line display so that waitvid/rdbyte have a lot more elbow room to sync

    - I think I figured out a way, but it is ...umm... not easy, and I am not 100% sure I can make it fit into a single cog

    3) give up on graphics blocks that can touch seamlessly - I hate this idea
  • potatohead Posts: 10,261
    edited 2011-02-24 09:56
    that actually was my suggestion. "character data" should have been screen data.

    Linus has some great code examples for synchronization here:

    http://www.linusakesson.net/programming/propeller/pllsync.php

    This is intended to sync up multiple COGS doing their own video, but...

    8 cycles is enough time. I'm thinking the pixel clock being exactly a cycle is at issue somehow...
  • Bill Henning Posts: 6,445
    edited 2011-02-24 10:10
    I misunderstood your original suggestion - but I fixed my post to reflect what you meant :)

    Thanks for that link! I will study it carefully!

    EDIT: I just finished reading it - what a brilliant analysis!
    potatohead wrote: »
    that actually was my suggestion. "character data" should have been screen data.

    Linus has some great code examples for synchronization here:

    http://www.linusakesson.net/programming/propeller/pllsync.php

    This is intended to sync up multiple COGS doing their own video, but...

    8 cycles is enough time. I'm thinking the pixel clock being exactly a cycle is at issue somehow...
  • potatohead Posts: 10,261
    edited 2011-02-24 11:13
    Seriously great. That's why I bookmarked it. I've not yet gone completely through it, but plan to. Want to make NTSC overlay drivers. Wanted to do that since I got a prop, and that detail is the path to making it work.

    No worries on the who said what. Just more or less wanted you to know I wasn't suggesting a long fetch on the character pixel data :) (because I know better)
  • JasonDorie Posts: 1,930
    edited 2011-02-24 13:07
    That's funny - I was just reading through this thread, thinking to myself, "I wonder if he's read Linus' description of syncing cogs for video output..." and trying to dig it out. Then I hit the last post. :) Most of what you're talking about here is over my head, but I'm interested to try writing a video driver at some point, so I've been reading a lot hoping to osmose some of it.
  • kuroneko Posts: 3,623
    edited 2011-02-24 15:45
    1) come up with cute code to synchronize PLL generated clock with processor clock
    Just thinking aloud here, wouldn't it be more beneficial to sync the waitvid hand-off point to the hub window? I mean having enough cycles is one thing but doesn't get you anywhere if the video h/w ignores you during that time.
  • Bill Henning Posts: 6,445
    edited 2011-02-24 15:48
    potatohead wrote: »
    Seriously great. That's why I bookmarked it. I've not yet gone completely through it, but plan to. Want to make NTSC overlay drivers. Wanted to do that since I got a prop, and that detail is the path to making it work.

    No worries on the who said what. Just more or less wanted you to know I wasn't suggesting a long fetch on the character pixel data :) (because I know better)

    hey, we've got it cleared up now :)
  • Bill Henning Posts: 6,445
    edited 2011-02-24 15:48
    Writing video drivers is very interesting - if frustrating at times (at least when pushing the envelope)
  • Bill Henning Posts: 6,445
    edited 2011-02-24 15:49
    That is exactly what I was trying to accomplish... unsuccessfully so far :(
    kuroneko wrote: »
    Just thinking aloud here, wouldn't it be more beneficial to sync the waitvid hand-off point to the hub window? I mean having enough cycles is one thing but doesn't get you anywhere if the video h/w ignores you during that time.
  • kuroneko Posts: 3,623
    edited 2011-02-24 16:02
    That is exactly what I was trying to accomplish... unsuccessfully so far :(
    Check this (old) thread, "waitvid minimum timing" (thread 121521). While untested, I'm convinced that the fact that the frame count can be pushed to n+1 lets you adjust the hand-off point. Selective VSCL breeding may get you there as well, I don't know right now.

    Edit: Now I know it doesn't work like this.
  • potatohead Posts: 10,261
    edited 2011-02-24 16:27
    Well, I had a different picture in mind.

    The video generator really doesn't operate in lock step with the HUB like that. It's all COG data. The HUB data is fetched, but by the time waitvid gets it, the data resides in the COG.

    What I was thinking was that when the pixel is exactly an instruction cycle, there might be a boundary condition in play, a bug maybe, where the data latch doesn't happen properly, corrupting the frame that follows. Almost like the prop can't reliably move the data for an instruction and perform the latch for the waitvid.

    Bill, can you change your porch length to offset the pixel timing just a little, or run PLLA double speed, and ask for a frame, plus half a pixel? Or frame, minus half a pixel, whatever it takes to get the pixel disassociated with the instruction clock? (clearly have to account for that with the porches and there will be a gap)
  • kuroneko Posts: 3,623
    edited 2011-02-24 16:38
    potatohead wrote: »
    The video generator really doesn't operate in lock step with the HUB like that. It's all COG data. The HUB data is fetched, but by the time waitvid gets it, the data resides in the COG.
    It doesn't by design but if you pick your PLL in relation to the system clock (e.g. 1:1) I'd assume that it stays in sync. Odd ratios obviously void that.