Help with 1280x1024 tile driver

Dr_Acula · 2011-01-02 17:54

I spent some time searching the Obex for display tile drivers and found this quite amazing one by Chip Gracey. It is 1280x1024 and has a mouse as well.

Having seen the extraordinary worldwide joint effort on the Full Color Tile driver thread, I am wondering if it is possible to increase the color depth on this one?

It appears there are 4 colors per tile. Clearly, increasing the color depth to, say, 64 colors is going to use up more memory. As a rough ballpark figure, a 160x120 screen at one byte per pixel is 19200 bytes, and 640x480 has 16 times as many pixels, so that much memory would let only about 1/16th of the screen be unique tiles.
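The ballpark figure can be checked with a few lines of C, the language the driver is being ported to later in the thread. This is a sketch that assumes one byte per pixel in the %RRGGBBxx format used by full color drivers such as Kye's; the function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hub bytes for a full-color bitmap at one byte per pixel
   (assumed %RRGGBBxx, the VGA byte the video generator outputs). */
static uint32_t bitmap_bytes(uint32_t width, uint32_t height)
{
    return width * height;
}

/* A pool of pool_bytes pixel bytes covers 1/N of the screen; return N.
   Assumes the screen size divides evenly by the pool size. */
static uint32_t screen_fraction_denominator(uint32_t pool_bytes,
                                            uint32_t screen_w,
                                            uint32_t screen_h)
{
    uint32_t screen = bitmap_bytes(screen_w, screen_h);
    assert(pool_bytes != 0 && screen % pool_bytes == 0);
    return screen / pool_bytes;
}
```

For example, `bitmap_bytes(160, 120)` is 19200, `bitmap_bytes(640, 480)` is 307200, and `screen_fraction_denominator(19200, 640, 480)` is 16.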

To my simplistic way of thinking I see two barriers to increasing the color depth from 4 to 64:
1) Propeller hub memory and
2) Timing considerations while building the scanlines prior to display

I think memory should not be such an issue if one is drawing Windows-style pictures with a mostly gray background, since most of the tiles would be the same.

This text is from the driver. Is there a fundamental timing problem with using more colors or could it be possible to build high color tiles, even if you had a lot less of them?

'' Start driver - starts two or three cogs
'' returns false if cogs not available
''
'' base_pin = First of eight VGA pins, must be a multiple of eight (0, 8, 16, 24, etc):
''
''  240Ω                240Ω                240Ω                240Ω
'' +7 ───┳─ Red        +5 ───┳─ Green      +3 ───┳─ Blue       +1 ── H
''  470Ω │              470Ω │              470Ω │              240Ω
'' +6 ───┘             +4 ───┘             +2 ───┘             +0 ── V
''
'' array_ptr = Pointer to 5,120 long-aligned words, organized as 80 across by 64 down,
'' which will serve as the tile array. Each word specifies a tile bitmap and
'' a color palette for its tile area. The bottom 10 bits of each word hold
'' the base address of a 16-long tile bitmap, while the top 6 bits select a
'' color palette for the bitmap. For example, $B2E5 would specify the tile
'' bitmap spanning $B940..$B97F ($2E5<<6) and color palette $2C ($B2E5>>10).
''
'' color_ptr = Pointer to 64 longs which will define the 64 color palettes. The RGB data
'' in each long is arranged as %%RGBx_RGBx_RGBx_RGBx with the sub-bytes 3..0
'' providing the color data for pixel values %11..%00, respectively:
''
'' %%3330_0110_0020_3300: %11=white, %10=dark cyan, %01=blue, %00=gold
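The tile word and palette formats described in those comments decode with plain shifts and masks. A minimal C sketch, checked against the $B2E5 and %%3330_0110_0020_3300 examples from the comments; the helper names are hypothetical and not part of the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Tile-array word: low 10 bits = tile bitmap base in 64-byte units
   (16 longs per tile), high 6 bits = palette index. */
static uint32_t tile_bitmap_addr(uint16_t word)
{
    return (uint32_t)(word & 0x3FF) << 6;
}

static unsigned tile_palette(uint16_t word)
{
    return word >> 10;                     /* 0..63 */
}

/* Palette long: byte n holds the %RRGGBBxx color for pixel value n. */
static uint8_t palette_color(uint32_t palette, unsigned pixel_value)
{
    assert(pixel_value < 4);
    return (uint8_t)(palette >> (8 * pixel_value));
}

/* Split a %RRGGBBxx color byte into its 2-bit components. */
static void color_rgb(uint8_t c, unsigned *r, unsigned *g, unsigned *b)
{
    *r = (c >> 6) & 3;
    *g = (c >> 4) & 3;
    *b = (c >> 2) & 3;
}
```

With these, `tile_bitmap_addr(0xB2E5)` is 0xB940 (so the bitmap spans $B940..$B97F) and `tile_palette(0xB2E5)` is 0x2C, matching the comment. The example palette long is 0xFC1408F0 in hex, and its byte for %10 (0x14) splits into R=0, G=1, B=1, the dark cyan from the comment.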

potatohead · 2011-01-02 18:08

Timing is going to be trouble on that one. It takes 3 cogs just for the raster. The core limitation in play is the waitvid frame length. At these frequencies, it needs to be as long as possible to allow the propeller to build pixel and color values. A shorter waitvid frame would require even more cogs to build the display. I would be surprised to learn a Propeller could feed the display that fast.
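A rough way to see the problem is to count how many system clocks fit into one WAITVID frame. This C sketch assumes an 80MHz system clock and standard VESA pixel clocks (108MHz for 1280x1024 at 60Hz, 25.175MHz for 640x480 at 60Hz); the timing Chip's driver actually uses may differ. Most cog instructions take 4 clocks, and hub reads such as RDLONG take longer, so the instruction count is an optimistic upper bound.

```c
#include <assert.h>

/* System clocks available per WAITVID frame. */
static double clocks_per_frame(double sys_hz, double pixel_hz, int pixels_per_frame)
{
    assert(pixels_per_frame > 0 && pixel_hz > 0.0);
    return sys_hz * pixels_per_frame / pixel_hz;
}

/* Rough instruction budget, assuming 4 clocks per instruction. */
static double instructions_per_frame(double sys_hz, double pixel_hz, int pixels_per_frame)
{
    return clocks_per_frame(sys_hz, pixel_hz, pixels_per_frame) / 4.0;
}
```

Under those assumptions, a 16-pixel frame at 1280x1024 leaves about 11.9 clocks (roughly 3 instructions, including the WAITVID itself), while a 4-pixel full-color frame leaves under one instruction. At 640x480, 4-pixel frames leave about 3.2 instructions.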

It may be that dithering could be used to get a reasonable GUI display, though a lot of work would be required to keep tiles common.

Dr_Acula · 2011-01-02 18:39

Ok. What if the timing was changed to 640x480?

And/or, is there a programming technique called "unrolling"?

potatohead · 2011-01-02 18:55

Well, the current driver, at 256 pixels, is unrolled. It could probably do 320 above 80MHz if the safety "or" instruction were removed. That one is in there to keep sync out of the active display. It isn't strictly needed, but it's typically included just because. Many of the TV drivers go without it, and people have some trouble with sync too.

I think Jim used a 640x200 timing for that one, which is fairly slow.

Really, what we need is the 320-pixel, TV-style timing to work on VGA. That one would deliver 640 pixels or so, because it's very doable on TV. Full color at 640 pixels is probably doable at 90MHz. Cut the sweep down, and the number of pixels goes up! The problem with that is that many of the VGA monitors out now won't sync that slowly anymore. I've not tried mine. It is a CRT, so maybe... those were the ones that did it. Basically, CGA timings are what is needed. Those map to the TV sweeps and pixel counts rather nicely. The big deal with those is the poor vertical resolution, at 200 lines or so, maybe 240. I think that's why they are no longer supported: they would look crappy on modern displays with nice, thin scan lines. Older ones had a much fatter scan line, where that resolution made sense. Maybe a non-standard hybrid mode would work, but there would be a significant risk of it not being displayable universally.

Unrolling is taking something like:

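loop    rdlong  pixels, ptr        ' fetch the next 16 pixels (one long) from hub
        add     ptr, #4            ' advance the hub pointer by one long
        waitvid colors, pixels     ' hand the pixels to the video generator
        djnz    count, #loop       ' repeat for every long in the line

and simply repeating the first three instructions count times, with no djnz at all!

rdlong
add
waitvid
rdlong
add
waitvid
.
.
.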

etc.
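The same idea in C, as an illustration only: the rolled version pays for a loop test and branch on every long, while the unrolled one (here by a factor of 4) pays it once per four longs. A fully unrolled PASM line removes the branch entirely, at the cost of cog RAM for every copy. The function names are hypothetical, and the plain copy stands in for the rdlong/waitvid pair.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Rolled: one test-and-branch per long. */
static void copy_line_rolled(uint32_t *out, const uint32_t *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = src[i];
}

/* Unrolled by 4: one test-and-branch per four longs.
   Assumes n is a multiple of 4, as a fixed line width would be. */
static void copy_line_unrolled4(uint32_t *out, const uint32_t *src, size_t n)
{
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4) {
        out[i + 0] = src[i + 0];
        out[i + 1] = src[i + 1];
        out[i + 2] = src[i + 2];
        out[i + 3] = src[i + 3];
    }
}
```

Both produce the same line; only the loop overhead differs. An optimizing C compiler may unroll the first version on its own.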

So, doing it the straightforward way is probably off the table. It may be that the code in the 1280 driver, that pixel locks all the waitvids, could be used to run a few cogs together each responsible for one of red, blue, green, with one of them doing sync, for a 3 cog raster. Each of them could run in 4 color mode, all pixels combined. That's maybe something that could happen at 640 pixels or so, with the longer waitvid frames we have now.

IMHO, the tile addressing would have to change, though. Instead of packing all the colors into one byte, there would be three arrays of pixels, each like a simple bitmap and each driving one color, all combined to render the pixels. That's called bitplanes, and it makes tiles kind of ugly. I don't even know how tiles would be done... Probably you'd have to just make three 4-color tiles, which together would make up the full color tile.
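A rough sketch of the bitplane idea in C, assuming 16 pixels per long at 2 bits per pixel per plane, pixel 0 in the lowest bits (the order the video generator shifts them out), and the %RRGGBBxx byte layout. The function names are hypothetical. On the real chip the "combine" step would happen on the pins themselves, since the outputs of all cogs are ORed together.

```c
#include <assert.h>
#include <stdint.h>

enum { PIXELS_PER_LONG = 16 };

/* Split 16 %RRGGBBxx pixels into red, green and blue planes: one long per
   plane, 2 bits per pixel, pixel 0 in bits 1..0. */
static void split_planes(const uint8_t px[PIXELS_PER_LONG],
                         uint32_t *red, uint32_t *green, uint32_t *blue)
{
    *red = *green = *blue = 0;
    for (int i = 0; i < PIXELS_PER_LONG; i++) {
        *red   |= (uint32_t)((px[i] >> 6) & 3) << (2 * i);
        *green |= (uint32_t)((px[i] >> 4) & 3) << (2 * i);
        *blue  |= (uint32_t)((px[i] >> 2) & 3) << (2 * i);
    }
}

/* Rebuild pixel i from the three planes, as the monitor would see it. */
static uint8_t combine_pixel(uint32_t red, uint32_t green, uint32_t blue, int i)
{
    assert(i >= 0 && i < PIXELS_PER_LONG);
    unsigned r = (red   >> (2 * i)) & 3;
    unsigned g = (green >> (2 * i)) & 3;
    unsigned b = (blue  >> (2 * i)) & 3;
    return (uint8_t)((r << 6) | (g << 4) | (b << 2));
}
```

Each plane long is exactly what one cog would feed to a 16-pixel, 4-color WAITVID, which is why the three planes together cost no more bits than the packed bytes.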

One advantage of that is fewer wasted bits. The screen display pixels would all be contributing to colors then, for some savings in the hub.

Let's see what some of the others have to say. I'll think on it for a while.

Baggers · 2011-01-03 03:23

Dr_A, you do realise that that driver uses 4 cogs?

I'll see what I can come up with, using 2.

Baggers · 2011-01-03 05:27

What res are you actually wanting, btw?

Dr_Acula · 2011-01-03 06:02

Yes 4 cogs is ok. I've got a C program working that can load and reload cogs on the fly from within code, so the 'cost' of cogs has gone down. Plus virtually all the hub is free to use as a tile buffer, and I see the code above is using well under half the hub ram so there is room to fit plenty more tiles in. It would be just fantastic if those tiles were full color!

Re resolution, 640x480 would be fine.

I need to study the code in more detail, but to my simplistic way of thinking, the timing problem is not in the displaying, it is in the building of the lines, right? And looking at the code above, I see a tile format that includes palette information and the data bits. Somewhere along the line this gets converted to 8 pins on the propeller going up and down at x MHz. To my very simplistic understanding, I would have thought that the maths involved in taking a tile with palette and bits encoded into it might actually be more complex than reading a full color tile that has already been pre-coded with byte data that is essentially the state of 8 propeller pins. Just point the code at the start of the tile, then read n longs into the scanline buffer, where n is a fixed value that is the size of the tile.

But maybe it is not that simple?

Baggers · 2011-01-03 07:38

Your first issue is that full color only gives you 4 pixels per waitvid command. The waitvids in that code are inline, one after the other, and each one displays 16 pixels.

potatohead · 2011-01-03 08:22

http://www.youtube.com/user/atarixle

That one was done at 320x200 (maybe 192) in two colors. 320 pixels is possible with all the colors. Maybe that's enough??

Still thinking about 640, and I'm still thinking it's gotta be multi-cog, bit-plane style.

There are some colored bits. That computer was capable of running sprites on top of things, and was also capable of changing colors per scan line. You will also see it drop to character mode for some things. That's both a limitation of that machine and the need to incorporate its DOS, which was character based. Ignore that, I suppose. Really, what I was linking to is the GUI possibilities at low resolution.

Another one to look at, that I don't have video links for is GEM / GEOS. Both ran at 320 / 640 pixels, and GEOS was actually somewhat effective on the C64.

Edit: Here's a GEOS one to look at: http://www.youtube.com/watch?v=j1Mnvead8Tc

GEOS is assembly language. The Atari one linked was written in Turbo BASIC, which compiles to machine code. For reference, SPIN runs at a similar instruction speed on a Prop, just for some idea of the resources required to draw GUI at that resolution.

Dr_Acula · 2011-01-03 14:24

In C I'm testing switching between text and graphics modes within a program. At the very least it makes debugging easier. Ultimately I'd like to head to a GUI.

In theory, if you can do 320 in 2 cogs, can you do 640 in 4 cogs?

GUI needs code space and video space. I think you could get a rudimentary GUI working using Kye's 160x120 driver but there would not be much space left for the user's spin code. Once you go above 160x120 I think Tiles are the only option. Move to the 256x240 tile driver and I think you have to start thinking about rebuilding tiles from within code. You will need tiles for each ascii character and tiles for corners of boxes and lines and pictureboxes. That takes code space and then it is a fight between code space for code vs space for more tiles. This is why I am coding in Big C (XMM) because it frees up all the hub ram. If Big Spin existed I'm sure that would work too.

I don't quite understand what Baggers says about waitvid - I need to take another look at the code.

potatohead · 2011-01-03 14:26

He said the waitvids were back to back, meaning the video was operating at max speed, and that the frame size was 16 pixels. Full color requires a 4 pixel frame, which is a 4X speed increase on code that's already fast.

Some recoding will need to be done.

Baggers · 2011-01-03 14:36

I can do 320 in 2 cogs; I have it working, primarily at 5MHz clock.
If you want 640, you're not going to get it on a 5MHz Prop.

Dr_Acula · 2011-01-03 14:59

Sorry I'm being so thick about video.

First, this is the explanation from the manual:

WAITVID (Colors, Pixels )
• Colors is a long containing four byte-sized color values, each describing the four
possible colors of the pixel patterns in Pixels.
• Pixels is the next 16-pixel by 2-bit (or 32-pixel by 1-bit) pixel pattern to display.
...
The Colors parameter is a 32-bit value containing either four 8-bit color values (for 4-color
mode) or two 8-bit color values in the lower 16 bits (for 2-color mode). For VGA, each color
value’s upper 6-bits is the 2-bit red, 2-bit green, and 2-bit blue color components describing
the desired color; the lower 2-bits are “don’t care” bits. Each of the color values corresponds
to one of the four possible colors per 2-bit pixel (when Pixels is used as a 16x2 bit pixel
pattern) or as one of the two possible colors per 1-bit pixel (when Pixels is used at a 32x1 bit
pixel pattern).

I'm not clear about what is meant by "4 color mode". I thought it meant you could only display 4 colors in any waitvid. But then there is a description of "four 8bit values" so now I'm wondering if it means 4 colors per 2 pins, ie on the red pin, 00, 01, 10 or 11.

But then, if you could only display four colors per waitvid, how are all the full color drivers working? So I'm even more confused.

I took a look at Kye's single cog 160x120 driver. I think this is the relevant code:

' //////////////////////Visible Video//////////////////////////////////////////////////////////////////////////////////////////

videoLoop               rdlong  buffer,         displayCounter             ' Download new pixels.
                        add     displayCounter, #4                         '

                        or      buffer,         HVSyncColors               ' Update display scanline.
                        waitvid buffer,         #%%3210                    '

                        djnz    counter,        #videoLoop                 ' Repeat.

%%3210 is the same as %11100100 ?

I'm more confused by that because I see the pixels as being a constant and the colors changing.

Also I see an OR with the synccolors. Could you leave that line out by pre-building the line as a series of bytes that already includes these?

All the above suggests that it is simply impossible to go much faster than something displaying 160 little colored squares per vga line.

Yet there is this vga 1280 driver that is producing fantastically smooth fonts with pixel sizes almost too small for me to see. How is the waitvid working in that code? Is there some tradeoff between color depth and speed? Or is the secret of the 1280 driver the clever synchronisation between the four cogs?

' Synchronize all cogs' video circuits so that waitvid's will be pixel-locked
                                                                                              
                        movi    frqa,#(pr / 5) << 1     'set pixel rate (VCO runs at 1x)                     
                        mov     vscl,#1                 'set video shifter to reload on every pixel
                        waitcnt cnt,d8_d4               'wait for sync count, add ~3ms - cogs locked!
                        movi    ctra,#%00001_111        'enable PLLs now - NCOs locked!
                        waitcnt cnt,#0                  'wait ~3ms for PLLs to stabilize - PLLs locked!
                        mov     vscl,#100               'subsequent WAITVIDs will now be pixel-locked!

ericball · 2011-01-03 17:35

Dr_Acula wrote: »

I'm not clear about what is meant by "4 color mode". I thought it meant you could only display 4 colors in any waitvid. But then there is a description of "four 8bit values" so now I'm wondering if it means 4 colors per 2 pins, ie on the red pin, 00, 01, 10 or 11. But then, if you could only display four colors per waitvid, how are all the full color drivers working? So I'm even more confused.

The video generator is basically a shift register which outputs (and shifts) 1 or 2 bits from the WAITVID Pixels (source) register every PixelClocks PLLA clocks, where PixelClocks comes from the VSCL value that was current when the previous frame's FrameClocks expired. The 2 bits (or 1 bit with msb=0) are then used to select one of the four bytes from the WAITVID Colors (destination) register. For most video drivers this means each tile is limited to 4 colors, since each tile is displayed via a single WAITVID.

But what if you instead only have 4 pixels per WAITVID? Then each pixel's color can be loaded into the Colors register and displayed with a %%3210 value for Pixels. Although this method removes the 4-color limitation, it significantly reduces the possible resolution, because only 4 pixels may be displayed per WAITVID.
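A simplified C model of the 4-color mode described above, as a sketch only: it ignores VSCL timing, the 1-bit mode and the sync bits, and the helper names are made up. It also answers the earlier question: %%3210 is %11100100 ($E4), so with 4-pixel frames the four color bytes are simply output in order.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified 2-bit (4-color) mode: each 2-bit field of `pixels`, least
   significant first, selects one byte of `colors`. Writes `count` bytes. */
static void waitvid_4color(uint32_t colors, uint32_t pixels, int count, uint8_t *out)
{
    assert(count >= 1 && count <= 16);
    for (int i = 0; i < count; i++) {
        unsigned sel = (pixels >> (2 * i)) & 3;
        out[i] = (uint8_t)(colors >> (8 * sel));
    }
}

/* Parse a Spin-style quaternary literal such as "3210" (%%3210). */
static uint32_t quaternary(const char *digits)
{
    uint32_t v = 0;
    for (; *digits; digits++) {
        if (*digits == '_')
            continue;                      /* Spin allows _ separators */
        assert(*digits >= '0' && *digits <= '3');
        v = v * 4 + (uint32_t)(*digits - '0');
    }
    return v;
}
```

With `colors` holding four separate pixel bytes (Kye's trick) and `pixels` = %%3210, the four output bytes are bytes 0, 1, 2, 3 of `colors` in that order. With a tile palette in `colors`, 16 pixels can only ever show those same four bytes.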

Dr_Acula · 2011-01-03 18:22

Ah, bother.

Is it possible to split up the colors into cogs. Say you have one cog running 16 pixels per waitvid and it only is driving the red gun. Then another cog doing green? Can they be synchronised with that code above? 6 cogs even, two per color?

Hmm, hi res full color video seems so tantalisingly close.

Back to more practical things then, I've got a little bit stuck understanding this piece of code. In spin:

  dira_ := i & j   
...
   ifnot cog[i] := cognew(@entry, @dira_) + 1

and in pasm:

                        org

' Move field loop into position

entry                   mov     $1EF,$1EF - field + field_begin                                    
                        sub     entry,d0s0_             '(reverse move to avoid overwrite)             
                        djnz    regs,#entry

' Build line display code

                        mov     par,#xtiles
:wv                     mov     linecode+0,linecode_wv
                        add     :wv,d1
                        add     linecode_wv,#1
                        cmp     par,#1          wz
:sc     if_nz           mov     linecode+1,linecode_sc
        if_nz           add     :sc,d1
        if_nz           add     linecode_sc,d1
                        djnz    par,#:wv

' Acquire settings
                                                        'dira_        ─→  dira
                        mov     regs,par                'dirb_        ─→  dirb
:next                   movd    :read,sprs              'vcfg_        ─→  vcfg
                        or      :read,d8_d4             'cnt_         ─→  cnt

How does the 'par' work? My understanding is that you pass something with par and the pasm code processes this, and indeed this is how a driver from Kye seems to work:

loop                    mov     displayCounter, par     ' Set/Reset tiles fill counter.

But the code above seems to have the first instance of par as something overwriting par, and the instance further down where I would have expected dira to be retrieved from par instead is a line with no code and just a comment.

I have a feeling the pin parameters are being passed as a contiguous group of longs as the VAR section has this:

VAR

  long cog[4]
  
  long dira_                    '9 contiguous longs
  long dirb_
  long vcfg_
  long cnt_
  long array_ptr_
  long color_ptr_
  long cursor_ptr_
  long sync_ptr_
  long mode_

in which case, how does the cog code get the pointer to dira_?

kuroneko · 2011-01-03 19:08

Dr_Acula wrote: »

How does the 'par' work? My understanding is that you pass something with par and the pasm code processes this ...

But the code above seems to have the first instance of par as something overwriting par, and the instance further down where I would have expected dira to be retrieved from par instead is a line with no code and just a comment.

The first par (in the destination slot) is the shadow register. This can be (ab)used any way you want. The line mov regs, par accesses the parameter passed in from SPIN. So for further usage I'd look for regs access (which now points to the 9 long hub array).

Dr_Acula · 2011-01-03 19:34

Ah, thank you kuroneko. That makes a lot more sense now. In the translation to C, I think I just need to set up the equivalent of the 9 long hub array and pass the location of that. Back to coding!
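On the C side, the 9 contiguous longs could be mirrored with a struct whose field order matches the VAR block, with the struct's address passed as the cog parameter. A sketch under that assumption; the type and helper names are hypothetical, and the struct must be long-aligned because PAR only carries a long address.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mirror of the driver's VAR block. Field order must match,
   because the cog reads consecutive longs starting at the PAR address. */
typedef struct {
    uint32_t dira_;
    uint32_t dirb_;
    uint32_t vcfg_;
    uint32_t cnt_;
    uint32_t array_ptr_;
    uint32_t color_ptr_;
    uint32_t cursor_ptr_;
    uint32_t sync_ptr_;
    uint32_t mode_;
} vga_params_t;

_Static_assert(sizeof(vga_params_t) == 9 * sizeof(uint32_t),
               "no padding: the cog expects 9 back-to-back longs");

/* Illustration of what `mov regs,par` then indexing does: treat par as
   the address of the first long and read long n. Not cog code. */
static uint32_t read_param(const void *par, int n)
{
    assert(n >= 0 && n < 9);
    return ((const uint32_t *)par)[n];
}
```

Reading long 0 through `read_param(&params, 0)` gives `dira_`, long 4 gives `array_ptr_`, and long 8 gives `mode_`, the same walk the PASM does through `regs`.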

potatohead · 2011-01-03 20:51

IMHO, it's possible to run a three COG display, dividing up the colors. I hinted at that early on in the thread. I don't think the short waitvid frame can be used at the higher resolutions, would have to be a longer one.

A good target would be the 640x480 driver. It can be broken into three COGs, each doing one color. The only funky thing about that is the addressing. There will need to be basically three screens, one red one, with sequentially addressed pixels, and two more for green and blue.

Dr_Acula · 2011-01-04 00:55

Now that would be sweet! I didn't think you could synchronise cogs till I saw Chip's code. What is the difference between 1 bit per pixel and 2 bits per pixel in terms of the way waitvid works?

In terms of games, even a 6 cog vga solution is useful. 1 for the C driver (or spin), 1 for either keyboard or mouse input, and 6 for video. Shut down the screen temporarily between levels while you load the video cogs with sd card cog drivers. Do you think we can hack it to just do 2 bits in, say, 2 cogs, doing just red?

Baggers · 2011-01-04 01:17

potatohead, doing a three cog renderer will still be tricky. You could make the waitvid do 8 pixels at a time then, which is good, but chars will then be 64 bytes each.

But that will still be like 4 pixels at 320 timing, so you won't have much time at all to process the chars, but at least you can get the resolution using that method.

Dr_Acula · 2011-01-04 02:49

Will you need much time to process the characters? I'd be thinking that a tile is pre-processed as much as possible. Even to the point of having separate data files for red green and blue so you don't have to do any ORs or masking. Just point it at the start of the tile in hub ram and read in a group of bytes?

potatohead · 2011-01-04 07:19

That's where I'm thinking still. Target resolution would have to be 640, if anything. Maybe 512??

Coupla things. The data movement is essentially 4X, which really changes things. Dr_A, one thing Baggers is pointing out is that video drivers really need to operate on data sizes that fit on multiple levels. Just pushing the pixels is part of the problem. Pulling the data from sane data sizes is another part, unless all that is desired is a bitmap, in which case none of it matters. If there are to be tiles or sprites, just getting pixels is only part of the story. Just doing 1 bit per pixel would mean a 32 pixel tile.

And everything multiplies too. If there are bitplanes, then there are three cog buffers to fill, one per color.

Can't just do a bitmap, because that's several times the HUB memory, so it's gotta be character (tiles) based. Still stuff to be worked through.

I can't help but wonder whether a 100Mhz prop won't do 512, or 480 pixels in the usual way...

ericball · 2011-01-04 09:16

Bitplane could be done by having some kind of triple-sized tile map. The standard vga.spin tile is a WORD containing a 10-bit tile bitmap pointer and a 6-bit palette index. So either expand that out to a LONG with three 10-bit tile bitmap pointers (one each for red, green and blue planes) or stick with the WORD and calculate the three addresses from the single 10-bit address. It should be possible to detect a ROM address and use the palette index instead. Still 3 cogs and still triple the HUB RAM required for the tile bitmaps.
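A sketch of the first option in C, packing three 10-bit pointers into bits 29..0 of a long and treating each one, like the low 10 bits of the vga.spin word, as a bitmap address in 64-byte units. The bit layout and names are assumptions for illustration; a real driver would decode this in PASM.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bitplane tile-map entry: red pointer in bits 9..0,
   green in 19..10, blue in 29..20, each in 64-byte units. */
static uint32_t pack_rgb_tile(unsigned r, unsigned g, unsigned b)
{
    assert(r < 1024 && g < 1024 && b < 1024);
    return (uint32_t)r | ((uint32_t)g << 10) | ((uint32_t)b << 20);
}

/* Hub address of the bitmap for one plane (0 = red, 1 = green, 2 = blue). */
static uint32_t plane_addr(uint32_t entry, int plane)
{
    assert(plane >= 0 && plane <= 2);
    return ((entry >> (10 * plane)) & 0x3FF) << 6;
}
```

For example, an entry built with `pack_rgb_tile(0x2E5, 0x001, 0x3FF)` points the red plane at $B940, green at $0040 and blue at $FFC0. The two spare bits at the top could carry a flag, but that is not something the thread specifies.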

Dr_Acula · 2011-01-04 17:36

Ok, 3 cogs and triple the ram.

I wonder if this single cog demo from Kye might be a starting point http://obex.parallax.com/objects/655/

The demo goes for several minutes if you want to watch it all.

I'm a tile newbie, but I see these lines in the code:

'' // Each tile has has 16 longs and each long has 16 pixels. Each pixel has a value of 0 - 3 using quaternary encoding.
'' //
'' // A pixel of 0 maps to nothing and a pixel of 1, 2, or 3 maps to the color byte (%RR_GG_BB_xx).
'' //

So ok, cog one only does red, and the color map is either 00 01 10 or 11 for red?

Is it possible to synchronise the cogs using Chip's code?

Triple the ram might not be such a problem. Consider a screen where you have a black tile, and most of the screen is black but there are small high resolution bitmap tiles as icons, like on an iPad. The icons will take triple the ram, but the black background is repeated for most of the tiles?

Not many icons I know. I think you would only get two 64x64 icons. Or maybe 8 32x32 ones. Or maybe one 80x80 pixel picture.

potatohead · 2011-01-04 18:14

Well, I just had a thought.

Is your purpose to just get the icon on the screen, with some nice text? What about a driver that was mostly black, but with the center area of the screen capable of tiles, or active graphics of some kind, maybe even just a bitmap. Imagine the Kye driver, or the current High Color Tile driver running as a little square on a high-resolution screen. No graphics would be possible in the corners, just the active area, like one big 160x192 tile. Just to be clear about it, picture a 160x192 bitmap screen, sitting in the center of, say 640x480, or maybe 800x600 screen...

Could something like that work?

Dr_Acula · 2011-01-04 18:35

Yes that would be great but I thought that was not possible? Not because you can't build the scanline fast enough (more than half the scanline will be black) but because you can't output the pixels fast enough at full color.

potatohead · 2011-01-05 08:44

Really, I was exploring boundaries of what might work. I don't know that the much longer blank time gets anything. Need to think on that some. Was just wondering whether that was really the intent.

The only other slow sweep option is TV component video. Originally, I was thinking of it in terms of three video circuits, consuming 9 pins, driven by three cogs, one for luma, and the other two for the blue and red color-difference signals. One very nice thing about component video is the luma can be run at one, presumably higher, resolution, with the color difference signals running at a lower one. Could be more efficient in terms of the HUB ram cost, trading cogs for HUB ram.

A display might be 640 pixels of resolution for luma, and maybe 320 or 160 for color. Nice HUB ram savings...
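To put numbers on those savings, a C sketch of hub bytes per scan line for one luma and two color-difference channels. The 2 bits per sample used in the example is purely an illustrative assumption, not something settled in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hub bytes per scan line: one luma channel plus two color-difference
   channels, every sample stored at `bits` bits. */
static uint32_t component_line_bytes(uint32_t luma_px, uint32_t chroma_px, uint32_t bits)
{
    uint32_t total_bits = (luma_px + 2 * chroma_px) * bits;
    assert(total_bits % 8 == 0);
    return total_bits / 8;
}
```

At 2 bits per sample with 640 luma pixels, full-rate chroma needs 480 bytes per line, 320-pixel chroma 320 bytes, and 160-pixel chroma 240 bytes, so quarter-rate chroma halves the line cost.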

Seems to me, a different video circuit would allow one cog to drive component with just 8 pins, a lot like VGA does. The only real advantage there is a lower sweep, and there still is the "find a display" problem...

Dr_Acula · 2011-01-05 17:53

Here is one out of left field. How about going for proper 24 bit color using external ram, but under propeller control.

Hardware = a 19 bit binary counter (some 74xx chips), three fast 512k memory chips, eg http://au.mouser.com/Search/Refine.aspx?Keyword=870IS61LV5128AL10TLI, and some R/2R D to A plus buffer (or use proper D to As).

The propeller handles the timing, generating vsync, hsync and the front and back porch timing. It also generates a clock signal, which it ought to be able to do easily. The clock signal is precisely 640 pulses, then it pauses for the front and back porch. During that time, a much slower latch-based circuit could take over the ram chips and you could write in a few new pixels.
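Two quick checks on that design, sketched in C: a 19-bit counter covers 640x480 = 307,200 locations (18 bits, 262,144, would not), and at the standard 640x480 at 60Hz pixel clock of 25.175MHz each pixel lasts about 39.7ns, which each RAM access plus the counter and DAC delays would have to fit inside. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest number of address bits whose range covers `count` locations. */
static unsigned address_bits(uint32_t count)
{
    unsigned bits = 0;
    while (bits < 32 && ((uint32_t)1 << bits) < count)
        bits++;
    return bits;
}

/* Pixel period in nanoseconds for a pixel clock given in MHz. */
static double pixel_period_ns(double pixel_mhz)
{
    assert(pixel_mhz > 0.0);
    return 1000.0 / pixel_mhz;
}
```

`address_bits(640 * 480)` is 19, which also matches the 512K (2^19) chips, and `pixel_period_ns(25.175)` is roughly 39.7.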

ericball · 2011-01-05 18:21

Dr_Acula wrote: »

Here is one out of left field. How about going for proper 24 bit color using external ram, but under propeller control.

Sounds like a job for VRAM.

Bill Henning · 2011-01-05 19:47

It needs more than that - there has to be a way to write pixels to the memory, with random access to it.

I also suspect it would be MUCH simpler to add a Prop board to a Mini-ITX Atom 230, with 24-bit graphics.

Or, of course, you could get a Morpheus.

Dr_Acula wrote: »

Here is one out of left field. How about going for proper 24 bit color using external ram, but under propeller control.

Hardware = a 19 bit binary counter (some 74xx chips), three fast 512k memory chips, eg http://au.mouser.com/Search/Refine.aspx?Keyword=870IS61LV5128AL10TLI, and some R/2R D to A plus buffer (or use proper D to As).

The propeller handles the timing, generating vsync, hsync and the front and back porch timing. It also generates a clock signal, which it ought to be able to do easily. The clock signal is precisely 640 pulses, then it pauses for the front and back porch. During that time, a much slower latch-based circuit could take over the ram chips and you could write in a few new pixels.

Dr_Acula · 2011-01-05 21:31

Bill Henning wrote: »

Or, of course, you could get a Morpheus.

That is an interesting teaser? I see Morpheus has a whole lot of video options. I think you pushed the video to 256x192 a couple of years ago - which is similar to the tile driver 256x240. Have you pushed it further with full color? (The Morpheus site is a treasure trove of info. I may be away for some time!)

I tried a few hacks on Kye's driver - it says 4 color but I'm not sure what that means. There are octuplets with values of 0,1,2,3 and I tried changing one to a different value but it seems to be 2 color, not 4. Maybe I changed the wrong value. Is there a speed difference between drivers that do 4 colors per tile vs 2? (actually, just to confirm, is there such a thing as a 4 color per tile driver?)

If that works, I'd like to try to synch three cogs, one for red, one for green and one for blue.

VRAM looks interesting. There's lots of RAM around at 50-70ns, but once you go below that the price goes up, and it seems a project in TTL might end up costing more than an entire aPad computer ($115).

I'll check out the atom.

Meanwhile, work continues porting the 1280 driver into C. I'm spending most of the time on the IDE rather than on the C. Sometimes the old-school programming techniques work best. Deleting and adding lines in a RichTextBox is slow with more than 100 lines. Writing the RichTextBox out to a binary file, reading it back using old-fashioned MBASIC-type commands (which are still there in VB.NET), editing, then writing it back as a binary file and reading it back into the RichTextBox has given about a 200-fold increase in speed.

Hopefully I'll have a demo in the next week swapping between several VGA modes from within code.
