Help with 1280x1024 tile driver
Dr_Acula
Posts: 5,484
I spent some time searching the Obex for display tile drivers and found this quite amazing one by Chip Gracey. It is 1280x1024 and has a mouse as well.
Having seen the extraordinary worldwide joint effort on the Full Color Tile driver thread, I am wondering if it is possible to increase the color depth on this one?
It appears there are 4 colors per tile. Clearly increasing the color depth to, say, 64 colors, is going to use up more memory. As a rough ballpark figure, 160x120 is 19200 bytes and with 640x480 that means that about 1/16th of the screen can be unique tiles.
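The ballpark figure above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes one byte per pixel, which is what the 19200-byte figure for 160x120 implies; the function name is made up for illustration.

```python
# Rough hub-memory estimate, assuming one byte per pixel (an assumption
# taken from the 160x120 = 19200 bytes figure in the post).
BYTES_PER_PIXEL = 1

def framebuffer_bytes(width, height, bytes_per_pixel=BYTES_PER_PIXEL):
    """Bytes needed to store every pixel of a width x height screen."""
    return width * height * bytes_per_pixel

small = framebuffer_bytes(160, 120)   # memory the low-res bitmap already uses
large = framebuffer_bytes(640, 480)   # memory a full 640x480 bitmap would need
unique_fraction = small / large       # share of the 640x480 screen that could be unique

print(small, large, unique_fraction)
```

So the same 19200 bytes would cover 1/16th of a 640x480 screen with unique tile data, and the rest would have to reuse tiles.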
To my simplistic way of thinking I see two barriers to increasing the color depth from 4 to 64:
1) Propeller hub memory and
2) Timing considerations while building the scanlines prior to display
I think memory should not be such an issue if one is drawing Windows style pictures with mostly gray background.
This text is from the driver. Is there a fundamental timing problem with using more colors or could it be possible to build high color tiles, even if you had a lot less of them?
'' Start driver - starts two or three cogs
'' returns false if cogs not available
''
'' base_pin = First of eight VGA pins, must be a multiple of eight (0, 8, 16, 24, etc):
''
''       240Ω                240Ω                  240Ω                 240Ω
''  +7 ───────┳─ Red    +5 ───────┳─ Green    +3 ───────┳─ Blue    +1 ─────── H
''       470Ω │              470Ω │                470Ω │               240Ω
''  +6 ───────┘         +4 ───────┘           +2 ───────┘          +0 ─────── V
''
'' array_ptr = Pointer to 5,120 long-aligned words, organized as 80 across by 64 down,
'' which will serve as the tile array. Each word specifies a tile bitmap and
'' a color palette for its tile area. The bottom 10 bits of each word hold
'' the base address of a 16-long tile bitmap, while the top 6 bits select a
'' color palette for the bitmap. For example, $B2E5 would specify the tile
'' bitmap spanning $B940..$B97F ($2E5<<6) and color palette $2C ($B2E5>>10).
''
'' color_ptr = Pointer to 64 longs which will define the 64 color palettes. The RGB data
'' in each long is arranged as %%RGBx_RGBx_RGBx_RGBx with the sub-bytes 3..0
'' providing the color data for pixel values %11..%00, respectively:
''
'' %%3330_0110_0020_3300: %11=white, %10=dark cyan, %01=blue, %00=gold
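To make the tile word and palette layout in the driver comment concrete, here is a small Python sketch that decodes the two examples it gives ($B2E5 and %%3330_0110_0020_3300). The function names are hypothetical and are not part of the driver; only the bit layout comes from the comment.

```python
def decode_tile_word(word):
    """Split a 16-bit tile word into (bitmap_address, palette_index)."""
    bitmap_address = (word & 0x3FF) << 6   # bottom 10 bits, in units of 16 longs (64 bytes)
    palette_index = word >> 10             # top 6 bits select one of 64 palettes
    return bitmap_address, palette_index

def quaternary(digits):
    """Parse a Spin-style %% (base-4) literal such as '3330_0110_0020_3300'."""
    value = 0
    for d in digits.replace("_", ""):
        value = (value << 2) | int(d)
    return value

def palette_colors(palette_long):
    """Return the four RGBx colour bytes, indexed by 2-bit pixel value 0..3."""
    return [(palette_long >> (8 * i)) & 0xFF for i in range(4)]

addr, pal = decode_tile_word(0xB2E5)
colors = palette_colors(quaternary("3330_0110_0020_3300"))

print(hex(addr), hex(addr + 16 * 4 - 1), hex(pal))
print([hex(c) for c in colors])
```

The tile bitmap is 16 longs, so it spans 64 bytes from the decoded address. The colour list is ordered by pixel value, so index 0 is the gold byte and index 3 is the white byte from the example.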
Comments
It may be that dithering could be used to get a reasonable GUI display, though a lot of work would be required to keep tiles common.
And/or, is there a programming technique called "unrolling"?
I think Jim used a 640x200 timing for that one, which is fairly slow.
Really, what we need is the 320 pixel, TV style timing to work on VGA. That one would deliver 640 pixels or so, because it's very doable on TV. Full color at 640 pixels is probably doable at 90MHz. Cut the sweep down, and the number of pixels goes up! Problem with that is many of the VGA monitors out now won't sync that slowly anymore. I've not tried mine. It is a CRT, so maybe... those were the ones that did it. Basically CGA timings are what is needed. Those map to the TV sweeps and pixel counts rather nicely. The big deal with those is the poor vertical resolution at 200 lines or so. Maybe 240. I think that's why they are no longer supported. Would look crappy on modern displays with nice, thin scan lines. Older ones had a much fatter scan line, where that resolution made sense. Maybe a non-standard kind of mode, a hybrid, would work, but there would be a significant risk of not being displayable universally.
Unrolling is taking something like:
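A generic illustration of loop unrolling, sketched in Python rather than Propeller assembly. The function names are hypothetical and this is not the code from the original post; it only shows the idea of doing several copies per loop test instead of one.

```python
def copy_rolled(src):
    """Plain loop: one loop test and branch for every element copied."""
    out = []
    for i in range(len(src)):
        out.append(src[i])
    return out

def copy_unrolled_by_4(src):
    """Unrolled loop: one loop test for every four elements copied.
    Assumes len(src) is a multiple of 4 (an assumption for this sketch)."""
    out = []
    for i in range(0, len(src), 4):
        out.append(src[i])
        out.append(src[i + 1])
        out.append(src[i + 2])
        out.append(src[i + 3])
    return out

data = list(range(16))
print(copy_rolled(data) == copy_unrolled_by_4(data))
```

In a cog the gain would come from spending fewer instructions on loop overhead between hub reads, at the cost of more cog RAM for the repeated instructions.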
So, doing it the straightforward way is probably off the table. It may be that the code in the 1280 driver, that pixel locks all the waitvids, could be used to run a few cogs together each responsible for one of red, blue, green, with one of them doing sync, for a 3 cog raster. Each of them could run in 4 color mode, all pixels combined. That's maybe something that could happen at 640 pixels or so, with the longer waitvid frames we have now.
IMHO, the tile addressing would have to change though. Instead of packing all the colors into one byte, what would need to happen is there would be three arrays of pixels, each like a simple bitmap, each driving one color, all combined to render the pixels. That makes tiles kind of ugly. That's called bitplanes. I don't even know how tiles would be done... Probably have to just make three 4 color tiles, the product of which would then equal the full color tile.
One advantage of that is fewer wasted bits. The screen display pixels would all be contributing to colors then, for some savings in the hub.
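A rough sketch of the bitplane idea, assuming the RGBx byte layout used by the VGA pins (two bits each of red, green and blue, with the low two bits for sync). The function names are made up for illustration; each plane is what one cog would feed to its own 4 color waitvid.

```python
def split_planes(pixels):
    """Split RGBx bytes (RRGGBBxx) into three lists of 2-bit values."""
    red = [(p >> 6) & 3 for p in pixels]
    green = [(p >> 4) & 3 for p in pixels]
    blue = [(p >> 2) & 3 for p in pixels]
    return red, green, blue

def combine_planes(red, green, blue):
    """Recombine the three planes, as the pins would see them summed together."""
    return [(r << 6) | (g << 4) | (b << 2) for r, g, b in zip(red, green, blue)]

def pack_plane(values):
    """Pack sixteen 2-bit values into one long, first pixel in the lowest bits."""
    long_value = 0
    for i, v in enumerate(values):
        long_value |= (v & 3) << (2 * i)
    return long_value

pixels = [0xFC, 0x14, 0x08, 0xF0] * 4          # sixteen full color pixels
red, green, blue = split_planes(pixels)
plane_longs = [pack_plane(p) for p in (red, green, blue)]

bytes_as_bytemap = len(pixels)                 # one byte per pixel
bytes_as_planes = 4 * len(plane_longs)         # three longs for sixteen pixels
print(bytes_as_bytemap, bytes_as_planes)
```

Sixteen pixels take 16 bytes as a byte-per-pixel tile row but 12 bytes as three 2-bit planes, which is the "fewer wasted bits" saving.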
Let's see what some of the others have to say. I'll think on it for a while.
I'll see what I can come up with, using 2.
Re resolution, 640x480 would be fine.
I need to study the code in more detail, but to my simplistic way of thinking, the timing problem is not in the displaying, it is in the building of the lines, right? And looking at the code above, I see a tile format that includes palette information and the data bits. Somewhere along the line this gets converted to 8 pins on the propeller going up and down at x MHz. To my very simplistic understanding, I would have thought that the maths involved in taking a tile with palette and bits encoded into it might actually be more complex than reading a full color tile that has already been precoded with byte data that is essentially the state of 8 propeller pins. Just point the code at the start of the tile, then read n longs into the scanline buffer, where n is a fixed value that is the size of the tile.
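The "point at the tile and copy n longs" idea can be sketched like this. The tile size and names here are assumptions for illustration only, and the sketch shows the data flow, not the timing; on the real chip every copy is a hub read that has to fit inside the scanline time.

```python
TILE_W = 16   # hypothetical tile width in pixels (one byte per pixel)
TILE_H = 16   # hypothetical tile height in rows

def build_scanline(tile_row, tiles, y_in_tile):
    """Build one scanline by copying a row of bytes out of each tile.

    tile_row  : list of tile indices across the screen
    tiles     : dict mapping tile index -> bytes of length TILE_W * TILE_H,
                where each byte is already the state of the 8 VGA pins
    y_in_tile : which row (0..TILE_H-1) of the tiles this scanline shows
    """
    line = bytearray()
    start = y_in_tile * TILE_W
    for index in tile_row:
        line += tiles[index][start:start + TILE_W]   # no palette lookup needed
    return bytes(line)

tiles = {
    0: bytes(TILE_W * TILE_H),             # all-black tile
    1: bytes([0xFC] * (TILE_W * TILE_H)),  # all-white tile
}
line = build_scanline([0, 1, 0], tiles, 5)
print(len(line))
```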
But maybe it is not that simple?
That one was done at 320x200(maybe 192) in two colors. 320 pixels is possible with all the colors. Maybe that's enough??
Still thinking about 640, and I'm still thinking it's gotta be multi-cog, bit-plane style.
There are some colored bits. That computer was capable of running sprites on top of things, and was also capable of changing colors per scan line. You will also see it drop to character mode for some things. That's both a limitation of that machine, and the need to incorporate its DOS, which was character based. Ignore that, I suppose. Really, what I was linking to is the GUI possibilities at low resolution.
Another one to look at, that I don't have video links for is GEM / GEOS. Both ran at 320 / 640 pixels, and GEOS was actually somewhat effective on the C64.
Edit: Here's a GEOS one to look at: http://www.youtube.com/watch?v=j1Mnvead8Tc
GEOS is assembly language. The Atari one linked was written in Turbo BASIC, which compiles to machine code. For reference, SPIN runs at a similar instruction speed on a Prop, just for some idea of the resources required to draw GUI at that resolution.
In theory, if you can do 320 in 2 cogs, can you do 640 in 4 cogs?
GUI needs code space and video space. I think you could get a rudimentary GUI working using Kye's 160x120 driver but there would not be much space left for the user's Spin code. Once you go above 160x120 I think tiles are the only option. Move to the 256x240 tile driver and I think you have to start thinking about rebuilding tiles from within code. You will need tiles for each ASCII character and tiles for corners of boxes and lines and pictureboxes. That takes code space and then it is a fight between code space for code vs space for more tiles. This is why I am coding in Big C (XMM) because it frees up all the hub ram. If Big Spin existed I'm sure that would work too.
I don't quite understand what Baggers says about waitvid - I need to take another look at the code.
Some recoding will need to be done.
If you want 640 you're not going to get it on a 5MHz prop.
First, this is the explanation from the manual:
I'm not clear about what is meant by "4 color mode". I thought it meant you could only display 4 colors in any waitvid. But then there is a description of "four 8-bit values" so now I'm wondering if it means 4 colors per 2 pins, i.e. on the red pins, 00, 01, 10 or 11.
But then, if you could only display four colors per waitvid, how are all the full color drivers working? So I'm even more confused.
I took a look at Kye's single cog 160x120 driver. I think this is the relevant code:
%%3210 is the same as %11100100 ?
I'm more confused by that because I see the pixels as being a constant and the colors changing.
Also I see an OR with the synccolors. Could you leave that line out by pre-building the line as a series of bytes that already includes these?
All the above suggests that it is simply impossible to go much faster than something displaying 160 little colored squares per vga line.
Yet there is this vga 1280 driver that is producing fantastically smooth fonts with pixel sizes almost too small for me to see. How is the waitvid working in that code? Is there some tradeoff between color depth and speed? Or is the secret of the 1280 driver the clever synchronisation between the four cogs?
The video generator is basically a shift register which outputs (and shifts) 1 or 2 bits from the WAITVID Pixel (source) register every PixelClocks (from the last VSCL value when FrameClocks expired) PLLA clocks. The 2 bits (or 1 bit with msb=0) are then used to select one of the four bytes from the WAITVID Color (destination) register. So for most video drivers this means each tile is limited to 4 colors, since each tile is displayed via a single WAITVID.
But what if you instead only have 4 pixels per WAITVID? Then each pixel's color can be loaded into the Color register and displayed with a %%3210 value for Pixel. Although this method removes the 4 color limitation, it significantly reduces the possible resolution because only 4 pixels may be displayed per WAITVID.
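The two cases described above can be modelled in a few lines of Python. This is a simplified sketch under stated assumptions: it ignores VSCL, frame clocks and PLL timing entirely and only shows which colour byte each pixel selects, with pixel data shifted out lowest bits first. The function name is made up for illustration.

```python
def waitvid(colors, pixels, pixel_count, four_color=True):
    """Model one WAITVID: return the bytes driven onto the 8 pins, in order.

    colors      : 32-bit long holding four colour bytes (byte 0 is the low byte)
    pixels      : 32-bit long of pixel data, consumed from the low bits upward
    pixel_count : how many pixels this WAITVID displays
    four_color  : True = 2 bits per pixel, False = 1 bit per pixel
    """
    bits = 2 if four_color else 1
    mask = (1 << bits) - 1
    out = []
    for _ in range(pixel_count):
        select = pixels & mask                      # pick colour byte 0..3 (or 0..1)
        out.append((colors >> (8 * select)) & 0xFF)
        pixels >>= bits                             # shift to the next pixel
    return out

colors = 0xFC1408F0

# Tile style: 16 pixels per WAITVID, but only four distinct colours possible.
tile_row = waitvid(colors, 0x1B1B1B1B, 16)

# Full color style: %%3210 as the pixel value, so the four colour bytes
# are simply played out in order, four pixels per WAITVID.
full_color = waitvid(colors, 0b11100100, 4)

print(len(set(tile_row)), [hex(c) for c in full_color])
```

Note that %%3210 in base 4 is the same bit pattern as %11100100, so in the second case the pixel register stays constant and only the colour register changes from one WAITVID to the next.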
Is it possible to split up the colors into cogs? Say you have one cog running 16 pixels per waitvid and it only is driving the red gun. Then another cog doing green? Can they be synchronised with that code above? 6 cogs even, two per color?
Hmm, hi res full color video seems so tantalisingly close.
Back to more practical things then, I've got a little bit stuck understanding this piece of code. In spin:
and in pasm:
How does the 'par' work? My understanding is that you pass something with par and the pasm code processes this, and indeed this is how a driver from Kye seems to work:
loop mov displayCounter, par ' Set/Reset tiles fill counter.
But the code above seems to have the first instance of par as something overwriting par, and the instance further down where I would have expected dira to be retrieved from par instead is a line with no code and just a comment.
I have a feeling the pin parameters are being passed as a contiguous group of longs as the VAR section has this:
in which case how does the cog code get the pointer to dira_?
The first par (in the destination slot) is the shadow register. This can be (ab)used any way you want. The line mov regs, par accesses the parameter passed in from SPIN. So for further usage I'd look for regs access (which now points to the 9 long hub array).
A good target would be the 640x480 driver. It can be broken into three COGs, each doing one color. The only funky thing about that is the addressing. There will need to be basically three screens, one red one, with sequentially addressed pixels, and two more for green and blue.
In terms of games, even a 6 cog vga solution is useful. 1 for the C driver (or spin), 1 for either keyboard or mouse input, and 6 for video. Shut down the screen temporarily between levels while you load the video cogs with sd card cog drivers. Do you think we can hack it to just do 2 bits in, say, 2 cogs, doing just red?
Coupla things. The data movement is essentially 4X, which really changes things. Dr_A, one thing Baggers is pointing out is video drivers really need to operate on data sizes that fit on multiple levels. Just pushing the pixels is part of the problem. Pulling the data from sane data sizes is another part, unless all that is desired is a bitmap, then none of it matters. If there are to be tiles or sprites, just getting pixels is only part of the story. Just doing 1 bit per pixel would mean a 32 pixel tile.
And everything multiplies too. If there are bitplanes, then there are three cog buffers to fill, one per color.
Can't just do a bitmap, because that's several times the HUB memory, so it's gotta be character (tiles) based. Still stuff to be worked through.
I can't help but wonder whether a 100MHz prop won't do 512, or 480 pixels in the usual way...
I wonder if this single cog demo from Kye might be a starting point http://obex.parallax.com/objects/655/
The demo goes for several minutes if you want to watch it all.
I'm a tile newbie, but I see these lines in the code:
So ok, cog one only does red, and the color map is either 00 01 10 or 11 for red?
Is it possible to synchronise the cogs using Chip's code?
Triple the ram might not be such a problem. Consider a screen where you have a black tile, and most of the screen is black but there are small high resolution bitmap tiles as icons, like on an iPad. The icons will take triple the ram, but the black background is repeated for most of the tiles?
Not many icons I know. I think you would only get two 64x64 icons. Or maybe 8 32x32 ones. Or maybe one 80x80 pixel picture.
Is your purpose to just get the icon on the screen, with some nice text? What about a driver that was mostly black, but with the center area of the screen capable of tiles, or active graphics of some kind, maybe even just a bitmap. Imagine the Kye driver, or the current High Color Tile driver running as a little square on a high-resolution screen. No graphics would be possible in the corners, just the active area, like one big 160x192 tile. Just to be clear about it, picture a 160x192 bitmap screen, sitting in the center of, say 640x480, or maybe 800x600 screen...
Could something like that work?
The only other slow sweep option is TV component video. Originally, I was thinking of it in terms of three video circuits, consuming 9 pins, driven by three cogs, one for luma, and the other two for red and green difference signals. One very nice thing about component video is the luma can be run at one, presumably higher, resolution, with the color difference signals running at a lower one. Could be more efficient in terms of the HUB ram cost, trading cogs for HUB ram.
A display might be 640 pixels of resolution for luma, and maybe 320 or 160 for color. Nice HUB ram savings...
Seems to me, a different video circuit would allow one cog to drive component with just 8 pins, a lot like VGA does. The only real advantage there is a lower sweep, and there still is the "find a display" problem...
Hardware = a 19 bit binary counter (some 74xx chips), three fast 512k memory chips, eg http://au.mouser.com/Search/Refine.aspx?Keyword=870IS61LV5128AL10TLI, and some R/2R D to A plus buffer (or use proper D to As).
The propeller handles the timing by generating vsync, hsync and the front and back porch timing. It also generates a clock signal, which it ought to be able to do easily. The clock signal is precisely 640 pulses, then it pauses for the front and back porch. During that time, a much slower latch based circuit could take over the ram chips and you could write in a few new pixels.
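A quick check of why a 19 bit counter and 512k ram chips fit this scheme, as a back-of-envelope sketch. It assumes one ram location per pixel per chip, which the post does not state explicitly; the variable names are made up.

```python
VISIBLE_W = 640
VISIBLE_H = 480

pixels_per_frame = VISIBLE_W * VISIBLE_H
# Smallest counter width that can address every pixel (addresses 0 .. pixels-1).
address_bits = (pixels_per_frame - 1).bit_length()
# Locations in a 512k chip, which has 19 address lines.
ram_locations = 512 * 1024

print(pixels_per_frame, address_bits, ram_locations)
```

The frame needs 307200 addresses, which is more than an 18 bit counter can reach and comfortably inside the 524288 locations of a 512k chip.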
I also suspect it would be MUCH simpler to add a Prop board to a mini-itx atom 230, with 24 bit graphics.
Or, of course, you could get a Morpheus.
That is an interesting teaser? I see Morpheus has a whole lot of video options. I think you pushed the video to 256x192 a couple of years ago - which is similar to the tile driver 256x240. Have you pushed it further with full color? (The Morpheus site is a treasure trove of info. I may be away for some time!)
I tried a few hacks on Kye's driver - it says 4 color but I'm not sure what that means. There are octuplets with values of 0,1,2,3 and I tried changing one to a different value but it seems to be 2 color, not 4. Maybe I changed the wrong value. Is there a speed difference between drivers that do 4 colors per tile vs 2? (actually, just to confirm, is there such a thing as a 4 color per tile driver?)
If that works, I'd like to try to synch three cogs, one for red, one for green and one for blue.
VRAM looks interesting. Lots of ram around at 50-70ns but once you go below that, the price goes up and it seems a project in TTL might end up costing more than an entire apad computer ($115).
I'll check out the atom.
Meanwhile, work continues porting the 1280 driver into C. I'm spending most of the time on the IDE rather than on the C. Sometimes the old school programming techniques work the best. Deleting and adding lines in a richtext box is slow with more than 100 lines. Write the richtextbox to a binary file, read it back using old fashioned mbasic type commands (which are still there in vb.net), edit, then write back as a binary file and then read back into a richtext box, and this has given about a 200 fold increase in speed.
Hopefully I'll have a demo in the next week swapping between several VGA modes from within code.