Creating a multi-cog VGA driver

escher Posts: 57
edited November 13 in Propeller 1
How's it spinning everyone...

I have finished an initial version of a 640x480 @ 60 Hz tile-based VGA driver. It is parameterized to support both 8x8 and 16x16 pixel tiles, as well as configurable screen dimensions in tiles, provided the horizontal and vertical pixel dimensions are evenly divisible by the tile width and height, giving theoretical maximum screen dimensions of 40x30 tiles with 16x16 tiles and 80x60 tiles with 8x8 tiles. The code is located here. The graphics.spin object is the driver wrapper and vga.spin is the driver itself. The game and input objects contain the game logic (which also defines the tile maps, tile palettes, and color palettes) and the code to interface with 74HC165 shift registers to receive control inputs.
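
For context, the parameterization boils down to a handful of constants along these lines (illustrative names only, not the exact constants in graphics.spin):

  CON
    ' Illustrative parameters, not the actual constants in graphics.spin
    TILE_SIZE = 16                     ' tile width/height in pixels (8 or 16)
    H_TILES   = 40                     ' tiles across: 640 / TILE_SIZE
    V_TILES   = 30                     ' tiles down:   480 / TILE_SIZE
    H_PIXELS  = H_TILES * TILE_SIZE    ' visible width in pixels (up to 640)
    V_PIXELS  = V_TILES * TILE_SIZE    ' visible height in pixels (up to 480)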

The problem is that I'm hitting actual dimension ceilings well below those maximums, because the driver currently pulls tile map, tile palette, and color palette data from hub RAM between each waitvid, so it is literally fetching in real time and losing a race against the pixel clock. And that's before even attempting to incorporate a sprite engine into the driver.
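
To make the bottleneck concrete, the per-tile work in the scanline loop currently amounts to something like this (a heavily simplified sketch with made-up register names, not the literal driver code):

  ' One tile's worth of work between waitvids; the address math that turns
  ' the map entry into tile and palette pointers is omitted here
  :tile   rdword  entry, mapptr        ' tile map entry from hub RAM
          add     mapptr, #2
          rdlong  pixels, tileptr      ' one line of the tile (2 bpp)
          rdlong  colors, palptr       ' the tile's 4-color palette
          waitvid colors, pixels       ' pixels go out; all 3 hub reads must beat this
          djnz    tiles, #:tile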

The solution of course is a multi-cog driver, and I've found plenty of good information on this site concerning the implementation of such a system:

- http://forums.parallax.com/discussion/131278/wading-into-video
- http://forums.parallax.com/discussion/106036/multi-cog-tile-driver-tutorial/p1
- http://forums.parallax.com/discussion/123709/commented-graphics-demo-spin/p1

However, I still have some brain-picking to do from the people who have the most experience with this...

From what I've seen in Chip's demo multi-cog video drivers, it appears that he runs a single PASM routine in multiple cogs that take turns rendering scanlines. So while one cog is building the next scanline(s) it will output, the other is outputting the scanline(s) it just built (extensible to however many cogs). His code is admittedly beyond my ability to fully understand at this point, but I believe I understand the methodology he's employing. I also understand that in this scenario, "synchronizing" the cogs is vital so that their independent video output is interleaved seamlessly.
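
As far as I can tell, the synchronization boils down to handing every cog the same reference count so they all wake on the same clock tick, roughly like this (my reading of the idea with placeholder names, not Chip's actual code):

  VAR
    long  params[4]                    ' two cogs x {shared sync count, first scanline}

  PUB start | i, sync
    sync := cnt + clkfreq / 100        ' a start time comfortably in the future
    repeat i from 0 to 1
      params[i*2]     := sync          ' every cog waits on the same count...
      params[i*2 + 1] := i             ' ...then takes scanlines i, i+2, i+4, ...
      cognew(@entry, @params[i*2])

  DAT
          org     0
  entry   rdlong  synccnt, par         ' shared sync count from hub
          waitcnt synccnt, #0          ' all cogs wake on the same tick, in lockstep
          ' ... scanline loop, offset by this cog's starting line ...
  synccnt res     1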

The other paradigm I'm seeing is where you have a single "video" cog (which outputs the actual VGA signal) and several "render" cogs which build alternating scanlines and copy them to a buffer that the video cog can grab and display. In this scenario, cog synchronization isn't necessary (unless you combine it with Chip's method to have multiple "video" cogs as well as multiple "render" cogs); however, the number of required cogs increases.
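
In that scheme the handoff can be as simple as a scanline counter in hub RAM that the render cogs poll, along these lines (rough sketch, made-up names):

  ' Render-cog side: the video cog publishes the number of the scanline it is
  ' currently displaying, and a render cog holds off on reusing a buffer slot
  ' until the display has moved past it
  :wait   rdlong  shown, shownptr      ' scanline the video cog is on right now
          cmps    shown, safeline  wc  ' has it reached the slot we want to rebuild?
  if_b    jmp     #:wait               ' not yet - keep polling
          ' ... render tiles (and later sprites) into the hub scanline buffer ...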

Personally I prefer Chip's method due to its reduced resource footprint, but I'm concerned about its scalability: when I add a sprite rendering component to the VGA driver, will there still be enough time to generate data for the video generator? With the video+render cog method, the render cogs don't have to wait for video generation to complete before generating the next scanline(s), as Chip's method requires; once they've copied their generated data to the scanline buffer they can immediately start on the next scanline(s), without waiting for a vertical or horizontal blanking period.

So out of all this, the questions here are:

1. Is this analysis of pros and cons of each accurate?
2. Either way, which method (or another one) is better for both constructing and pushing tile-mapped and sprite-augmented video at VGA speeds?
3. When I get to development of an NTSC driver, will one of these methods be preferential to the other?

Thanks for any help!

Comments

  • potatohead Posts: 8,956
    edited November 13
    Chip's method results in higher resolutions, but it leaves little time for dynamic display methods, the primary one being sprites. It does use a tile map, though, and that can do a lot more than people think.

    It also does not do any-color-any-pixel: you get 4 colors per tile. It works very well with software sprites done in a pseudo-sprite COG that is timed to VBLANK.

    The other method can do any pixel any color, and that works by running waitvid backwards. "WAITVID pixels, #%%3210"

    The pixel values are fixed and reversed, each assigned one of the 4 palette entries. Palette entries become pixel colors, one byte per pixel.

    This requires a lot more WAITVID instructions per line.
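
    Concretely, a byte-per-pixel scanline loop ends up as a long run of these (rough sketch, register names made up):

        ' 4 pixels per waitvid: each long from the line buffer holds 4 color
        ' bytes, and the fixed #%%3210 pattern just selects those bytes in order
        :quad   rdlong  pixels, bufptr       ' next 4 pixel colors, one byte each
                add     bufptr, #4
                waitvid pixels, #%%3210      ' emit the 4 bytes as 4 pixels
                djnz    quads, #:quad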

    What happens is the max pixel clock gets constrained by each waitvid only carrying 4 pixels; the horizontal sweep rate and Propeller clock then determine the max number of pixels. At VGA sweeps, this is 256 pixels. At NTSC, one can get 320, maybe a bit more.

    In addition, there is less racing the beam as the render COGS can work up to several scan lines ahead, doing tiles or a bitmap first, then masking in sprites. The more render COGS you have, the more sprites per line are possible.

    The slower sweeps on TV mean more graphics are possible per line. This favors the render-COG method if full color freedom is needed; otherwise both work well, and Chip's method does a lot more with fewer COGs required.

    Optimal sprites are 4 pixels wide by whatever number of pixels you want in the vertical direction. Run them in pairs or groups in the sprite position list to make larger objects.

    This is all the classic speed / color depth tradeoff in the propeller software video system.

    If you don't need super high resolution, nor all the colors in any pixel, Chip's method is superior. What a lot of people miss about his method is that each tile can be pointed at a region of HUB RAM. This can be used to partially buffer a screen and make effective use of higher resolutions where not enough RAM exists to represent the whole screen. Software sprites can be drawn to a small buffer, then displayed, repeating in vertical sections.

    Chip's method works at one or two bits per pixel, with one of 64 possible palettes per tile.

    The other method, which several of us implemented, always works in units of 4 pixels, but any pixel is any color.

    At any kind of resolution, say 160 pixels horizontally or more, with one scanline per pixel, there isn't enough HUB RAM to represent the entire screen. Dynamic display methods, or tiles at a minimum, are needed to draw all the pixels in byte-per-pixel operation.



  • potatohead Posts: 8,956
    edited November 13
    There is one more great trick. Google WHOP. WAITVID hand off point.

    First discovered by Linus Akesson by accident, the idea is that WAITVID auto-repeats: if one times a register load just right, sequential WAITVID frames are possible without additional WAITVID instructions.

    A user here worked the timing out. I can't remember their name... kurenko or similar.

  • Kuroneko. (Black Cat, so easy to remember (for me at least :))
  • Thank you.
  • potatohead wrote: »
    Chip's method results in higher resolutions, but it leaves little time for dynamic display methods, the primary one being sprites. [...]

    Thanks for the great reply potato!

    I'm aware of the WHOP method, and I've been talking with kuroneko about it on GitHub. Once I've got my head wrapped around it I'll investigate that path, but for now I'd like to focus on the "conventional" approaches as a starting point.

    I was aware of the bitmapped and tile based approaches but not the reversed waitvid one; that's pretty snazzy!

    But because my project is attempting to emulate retro arcade games, the tile engine I've implemented is my definite way forward. 4-color palettes are fine, and 6-bit RRGGBB color for 64 possible colors is sufficient for retro graphics.

    So long story short, since my system will support VGA as an option alongside "CGA" (15 kHz) and possibly NTSC, I'm choosing speed over color depth. I'm outputting either a 16-pixel or 8-pixel tile palette line per waitvid.

    With this in mind, I'm thinking of the following setup...

    For a 640x480 @ 60 Hz, tile- and sprite-based, 6-bit color video driver (rough cog launch sketched after the list):
    - One video cog, which pulls the currently displaying scanline from main RAM
    - Two render cogs, which generate their respective alternating scanlines and write them to main RAM
    - One sprite cog, which augments each scanline in main RAM after a render cog writes it but before the video cog grabs it
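
    In Spin, launching that arrangement might look roughly like this (placeholder names, just to show the cog layout rather than working driver code):

      PUB launch | i
        cognew(@video_entry, @video_params)          ' 1 video cog: drives the VGA pins
        repeat i from 0 to 1
          cognew(@render_entry, @render_params[i])   ' 2 render cogs: alternating scanlines
        cognew(@sprite_entry, @sprite_params)        ' 1 sprite cog: patches sprites into built lines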

    The methodology of the sprite system is still up in the air, so that's subject to change.

    But based on my specs, is this paradigm valid and representative of "best practices" to an objective or subjective extent?
  • potatohead Posts: 8,956
    edited November 13
    What does augment mean? Add sprite data?

    Tiles are quicker than sprites. You might want three buffered scan lines, 4 total.

    All render COGS do tiles, then sprites.

    Sprites are slower. At two bits per pixel you need shift plus mask (read, mask, write sprite data) plus a write back to line RAM. More scan lines means more time.
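
    In PASM that read-mask-write step is roughly this (sketch, made-up register names):

        ' Masked sprite write into a 2 bpp scanline buffer (read, mask, write back)
                rdlong  bg, lineptr          ' existing background pixels
                andn    bg, mask             ' clear the pixels the sprite covers
                or      bg, sprite           ' merge in the pre-shifted sprite pixels
                wrlong  bg, lineptr          ' write the long back to line RAM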

    Sprites are much quicker byte per pixel, BTW. No masks. If you start your video display one long in on the left and right, no clipping needed either. That's true for 2 and 4bpp too. Just don't display the clipped region.
  • Yep, that was what I meant by augment. I suppose I'll have to experiment with the architecture a bit, but I'll definitely start out having both tile AND sprite rendering in the same cog(s), and then restructure if that's not fast enough. I'm not sure what I would consider an "acceptable" number of sprites per scanline yet, so that will play a large role as well.

    Thanks for the help!
  • What I do is make it work. Then make it work faster.
  • potatohead Posts: 8,956
    edited November 14
    The best thing you can do is make it work however it works. Then write a little demo program showing all the sprites and tiles, and exercise it some.

    Once you have that working, crank up the numbers on the demo program till the driver fails. Then you know what kind of failure you've got. Then you can recode around it.
  • Just noticed this comment:
    potatohead wrote:
    If you start your video display one long in on the left and right, no clipping needed either. That's true for 2 and 4bpp too. Just don't display the clipped region.

    That's a nifty trick, thanks for that too!

  • Thank Baggers. :D
  • pik33 Posts: 777
    edited November 17
    Several years ago I wrote a "nostalgic VGA driver": a 640x480 text driver (80x30 characters of 8x16 px) with a border, signalling 800x600 to the monitor. It uses 2 cogs. One of them decodes characters from the font definitions into pixels, putting the results into a (circular) buffer in main memory, while the second cog displays pixels from that buffer. The cogs are synchronized via vblank/hblank signals. As the display cog's task is simple, there was some room left in that cog's memory, so I used it for the color buffer.

    Here is the topic: http://forums.parallax.com/discussion/139960/nostalgic-80x30-vga-text-driver-with-border-now-beta/p1
  • That's pretty awesome! Very similar to the direction I'm going... I dig the Atari font too!

    My final driver is going to be an almost direct copy of the NES graphics system, with 4-color tiles and sprites of user defined size but with support for a wide variety of resolutions in addition to VGA, RGBS (basically CGA), and probably NTSC.

    Ambitious, but because I'm targeting those relatively low resolution retro graphics the modern Propeller is more than up to the task.

    So far I've gotten VGA working with tiles, and I'm now implementing the multicog method in a refactor that will allow higher tile dimensions as well as a sprite system.
  • potatohead wrote: »
    At VGA sweeps, this is 256 pixels.

    Sorry to revive an old post, but potato, I was wondering whether this limit is for the entire scan line or just the visible video area (not taking into account the horizontal sync area).