Tile-rendering graphics system?
Hello there,
I have been using double-buffering for animation on VGA displays with microcontrollers, but there is also "tile-based" rendering, which is common in old video game consoles and some portable graphics systems.
I know they used tiles, or pieces of a picture, to render onto the screen, but what if the tile/sprite is not at a fixed location, for example between two other tiles? Or when it starts translating? :nerd:
Comments
Tiles save memory, but I guess sooner or later you run up against the hardware limitations of the Propeller. I've moved over to touchscreens as a result, as they offer full color at 240x320 and you don't have to worry about all these compromises.
Text mode displays only allow positioning per character. That's tiles.
"Translating" can mean a lot of stuff! On older systems, it generally means manipulating the tiles in real time to circumvent screen display limits, "racing the beam" on the display. On the Prop, it can mean that stuff, but more typically means the choice between using the tile and the sprite, depending on the display requirement.
Plus, I'm not entirely sure how to render tiles during the active video line (25.17us on VGA). Should I split the tiles into thin strips on each active video line? I saw Uzebox's example of that, but the code is very confusing. I hope there's pseudo-code for a tile-driving mechanism somewhere on the net.
Yes, thin strips! That's exactly it. The devil is in the details. Always is. What follows is some general conceptual info that might help you understand the drivers out there.
In general, the better approach has been to divide the tasks this way:
There is a signal COG. It actually writes the image out to the VGA display. This COG offers up the signal timing options and communicates the display state to graphics COGS, and it reads from one or more scan line buffers. If there are sprites, there generally needs to be more than one scan line buffer.
There are one or more graphics COGS. A graphics COG fetches image pixel data, and sometimes color data depending on how the screen is represented, processes that data by masking and/or shifting and sometimes sorting, and writes it to the scan line buffers for display by the signal COG.
The key to this is to have the graphics COG(s) doing their work to fill the buffers before the signal COG has to display them. The other key is to break the task down into chunks that can be done in parallel. One of the simplest ways to do that is to have one scan line buffer per graphics COG. A graphics COG will get what it can get done in one scan line's worth of time. That's at least the tiles and a couple of sprites.
If more data is to be displayed per line than a single graphics COG can process, add another buffer, and then add another graphics COG. Now the graphics COG has two scan lines worth of time to fill its buffer! That generally means more sprites per line. Need more? Add another, and another...
It's best to select screen representations that are powers of two and that fit Propeller data sizes. A great example of this is 8x8 vs 4x8 pixel sprites. Having sprites that are 8 pixels wide makes a lot of sense from a design perspective, because that pixel size is familiar and useful without too much handling of sprites. Larger sprites, say 16x8 or 16x16, are even better, but full-color displays on a Propeller work in 4-pixel chunks. That's due to how the waitvid works. It turns out that processing 4x8 sprites is a lot faster than larger sprite sizes, and that's the critical path. A program manipulating those sprites can very easily work with them in "clumps" to move larger images around the screen.
Best results generally come from having tile data organized sequentially in the HUB RAM, long aligned so that a graphics COG can just update a pointer, do the math to account for the tile number reference and fetch the pixels from the HUB. The same is true for sprites.
That means having a tile pixel buffer pointer and a sprite pixel buffer pointer at a minimum.
It may be a bit faster in the graphics COG to distribute the tile and sprite pixels in some non-linear fashion to avoid computation in the critical buffer fill loop, but that's going to be painful to create art for, and probably not too much faster. Graphics COGS tend to really add up. The first one can deliver tiles, plus a few sprites per line. A second one really adds a lot of time, yielding a much larger number of sprites per line, etc... Unless the screen object movement requirements are really stiff, it's likely two or maybe three graphics COGS will get it done. Spreading out the data could potentially reduce that by one COG, but then again, so could reconsidering how things happen on the screen.
From there you need a "screen". That's a sequential HUB memory block that is byte-, word-, or long-aligned, depending on how tile addressing is done.
There are some basic methods for addressing tiles.
1. Chip's way, which is a great way but has proven confusing to a lot of people. Chip packed a tile address and a color palette reference into one long. This forces tiles to be 64-byte aligned, but allows for a large number of tiles and good color flexibility. All Propeller colors can appear on screen, and tiles can appear in multiple places using different colors without having to store each color/image combination in the HUB.
With this method, there need to be sets of colors stored in the HUB with a pointer to them, plus a list of tile addresses and color palette references with a pointer to those in the HUB. Graphics_Demo.spin does this. I've got a commented version out there which can help with this. Sprites can be done, but they suffer from color clash, as there is no means to color tiles and sprites independently. On the Hydra CD, there is the HEL driver, which employs sprites.
2. Numeric indexes. These are typically byte or word. Bytes yield 256 tiles, words 65536 tiles. With this method, the tiles typically are "full color" tiles. That means each pixel = one byte of HUB memory. This is by far the fastest method when it comes down to just throwing things around on screen. But no color redirection is possible, so each color image needs to be stored in the HUB with that color, or some computation must be done in the graphics COG to realize different colors.
A simple shift can perform the multiplication needed to get the tile base address. Logical AND the scan line counter to get the tile scan line offset address, finally adding that to the tile base address pointer and you've got the address of the pixels for that tile on that scan line. A very similar scheme works for sprites.
Sprites are managed via a sprite table, which is a sequential set of parameters for the sprites. The list of sprites includes the number of items in the list, or some "done" delimiter. Either is fine. For each sprite entry, a long works well to hold the sprite number, x position, y position. If sprites are not full color, then maybe two longs are needed to store all of that, plus a color palette entry, or put the color palette entry in the sprite table.
3. Hybrids! Tiles can be directly referenced by address or by a tile number, and colors are associated with tiles either on a per scan line or per tile basis. Tile addresses are stored sequentially to form a screen, where the number of addresses is simply screen X tiles * screen Y tiles.
Sprites can either share the tile colors or have their own. Usually, there is a table of some sort to detail where the sprites are, and the table is sequential, which determines overlay order. An example might be tiles drawn first, then sprites. Sprites occlude tiles, and lower-numbered sprites are occluded by higher-numbered ones.
Sprites can include transparency, or not, too. If they do, one color must be considered "see-through", and that requires processing in the graphics COG to get done, which means fewer sprites per line per COG, but more flexibility in using them.
Basically, storing addresses instead of indexes can speed some things up, but the trade-off is always space or speed elsewhere, or flexibility. One example would be encoding sprites with their color information. This can make the sprite table shorter and easier to process, but then it's harder to put the same sprite in two places with different colors! More HUB memory would be needed to store the same sprite twice with the different color definitions. On the other hand, having a separate color definition means reusing the same sprite image a lot of times, trading for more compute time being needed to fetch, mask and place the sprite pixels in the scan line buffers...
If I were you, I would take a look at the driver in my signature as well as the ones from a few games. JT Cook built a nice one for his recent game that runs on the C3. If you are wanting to build your own, tearing down a few that are out there is a great start! Once you see how it all comes together and what the timing / data / multi-cog requirements are, you can either use one of the engines that are out there, or build one up for just the task you are wanting to do.
Rolling your own is time expensive, but the trade off is being able to really optimize the HUB memory for the desired display.
In general, it's best to do as little as possible in the video COGS. The higher level program can initialize tables, sort sprite positions, update pointers and such. The video COG has some free time during blanking periods on the display too. Some tasks can go there, depending. Graphics COGS should be simple number movers, maskers and crunchers. All they do is wait for a state flag and go. The signal COG generally updates buffer pointers, which is a great signal for the graphics COGS to operate on. They see the buffer change, and that's their signal to go fill a new buffer. They can be passive mostly, just responding to state changes dictated by the signal COG.
Thanks for the explanation. I'll go digest it one by one. Plus, it's hard to coax the people in my place about the beauty of the Propeller. So, I ended up having to demonstrate the tile-driver scheme using a PIC32, which is easily accessible.
I do not know if I can write the tile driver for single cores, but does my pseudocode demonstrate it (for example, one empty screen with an 8x8 pixel tile at coordinate 0,0)?
I admit the pseudocode may look very naive, so suggestions are welcome.
From what I've read on the TMS9918 (an old picture-generator chip developed by TI), it has a layer exclusively for tiles, plus layers for sprite-1, sprite-2, up to sprite-N, plus layers for fixed characters. So am I using this idea correctly?
Here's an 8x8, one-bit-per-pixel tile code segment:
On this one, the simple tile addressing is used. One byte in the screen buffer = one tile numbered by that byte = 256 possible tiles. The screen is just a sequential string of bytes equal to (x_tiles * y_tiles), and the tile code reads through this string over and over to know which tile it's working on. Say there are 20 tiles / line, and that the tiles are 8 pixels high. That means the first 20 bytes of that string will be read over and over for the first 8 active display scan lines.
fonttab is just the list of tiles, a part of which I included. 256 of those, tops. This is the base tile address. Fontsum is the addition to the base address to account for the offset needed for each scan line. Fontline is the scan line counter modulo 8; an AND with 7 does this quick and easy, but you just need a recurring counter for each row of tiles.
The rest of the loop shows how tiles are assembled onto the screen.
Basically, that is get a tile number from the screen, use it to calculate where the pixel data is, then get that, then put it into the waitvid instructions, add colors, issue waitvid to draw it all, then repeat until all the tiles on all the rows are done.
Doing just tiles is easy. One video loop is needed, and it can go right into the signal COG, drawing the tiles in real time, one by one.
There isn't time to do much else, though. That loop is constrained by how long it takes to do one waitvid. If the loop takes too long, the Prop will display the wrong data, leaving sparkles or worse on the screen, potentially losing sync with the display. That code is kind of slow, but simple to read. It can do maybe 20 tiles, maybe 30, on a low-frequency VGA screen. More sophisticated methods that hit the HUB timing window and operate with longs, or other tricks, are needed to get more tiles per line on higher-frequency displays.
In this example, it's a simple text driver. The same methodology applies though. If you want bigger tiles, use larger constants and more shifts. If you want more colors, then you read more pixel data, etc...
What this can't really do is sprites. The tiles are fetched and drawn just before the beam gets there. Essentially, the buffer is just the waitvid. No buffer, means no time to collapse all the layers for sprites into something that waitvid can process.
I'll stop there. Does any of that make sense?
Where I'm heading with this is to understand tiles and how their pixels get put onto the screen. Then, that can be extended to a scan line buffer, and more than one COG, so there is time for sprites. Finally, sprites are just more tiles on top of tiles, and they are shifted into place before writing them into the buffer, where the background tiles are just written to the buffer as is, because they stack one next to the other.
The PIC32 has an instruction cache, which can make timing very unpredictable, meaning I can't count the number of cycles in the code. I might be using an FPGA for this one too.
I'll try to read up on your sample codes if I have the time.
Really, the big limitations are:
1. RAM. There isn't anywhere near enough to really exploit what the video system + PASM + COGS can do.
2. Resolution + color. It's fairly easy to get one or the other up and going quickly. Assuming RAM isn't a contributing factor, doing both tends to compromise both, due to throughput limitations and the limit on how much data a waitvid can process.
3. Color redirection is generally expensive for most tile + sprite schemes. It just takes a long time to pull up pixel data, do color lookups, mask, then buffer. It's much easier to just pull the pixels, shift and buffer, which is why so few drivers offer color redirection: the ability to assign the same pixel data different colors without having to store the pixels and colors once for each color set desired. Limited palette capability, in other words. Chip's tile scheme is actually damn good in this respect, but the trade-off is that sprites can't be handled independently on screen without using some external circuit and a second video generator. We've discussed that, but nobody ever really did it.
Get past those and the Prop can sling a mad number of things around on the screen, and do nifty things with tiles and display signals that are very, very difficult on other devices. Some elements of this process are down right trivial on a Prop.
Before the discussion gets too far, what are the display requirements? Perhaps this can be qualified out quickly. Truth is, if your display fits what the Prop I can do, we've got the drivers done. There are lots of methods and code for each. Would be much simpler to combine the elements needed and write the connect / interface / functions for it than it would be to roll another one from scratch. Don't get me wrong though. Always fun to talk about.
Re: Tough to render tiles.
Yep. Resources were scarce. Fun to see all the old school graphics techniques appear here on the Prop. I've quite enjoyed it, but it's not as easy as a bitmap is.
Too bad we don't have more RAM. Just for fun, I did a full, single buffer bitmap display for the NyanCat animation. Resolution requirements were in the bitmap sweet spot, so I went ahead with it. Truth is, if we had more HUB RAM, Props would flat out rock on that kind of display, and it would be really easy too. Rather than double buffer, I just chose to dynamically draw each frame onto the bitmap. Managed to get the cat drawn and animated using a single graphics cog, plotting pixels in PASM to the bitmap canvas. Worked a whole lot faster than I thought it would. The beauty of that scheme is it is much easier for people to just fire up multiple graphics COGS, each writing to parts of the screen. Doing that means a ton of moving things using very simple image definitions and display portability as only the pixel plot function needs porting to a different display configuration... Always wanted to explore that a little, and it's a sweet way of doing things. We just can't pack the pixels and or colors in there at any density that makes sense.
Oh well. Prop 2
That one is going to be capable of a lot. Of course, mobile phone chipsets are going to punch well above what a Prop 2 will, but the differences and scope of usability will be very significantly improved over what the Prop I can currently do on its own.