TFT Driver - Optimizing for speed

Vega256 · 2015-12-26 02:29

Hey guys,

I'm writing a TFT driver for a 320x240xRGB QVGA display module; my code is attached. I'm trying to get the Prop to send RGB pixel data fast enough to get a solid 60fps, but to no avail. Could someone possibly give me some hints, tips, and/or insight on my methods? I would post the code right here, but the editor won't take my code in its entirety.

My basic theory of operation goes as follows. There is one 'line buffer' that holds the pixel data for one horizontal line. This buffer is 80 longs in size (16bpp x (320 pixels / 2) / 32) and resides in main RAM. A single cog reads the data from this buffer and sends it to the display via 16 GPIO pins.
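
As a quick check of the buffer size quoted above, here is a minimal Python sketch of the arithmetic. Python is used only as a calculator; the constant names are illustrative and do not come from the attached driver.

```python
# Line-buffer size check. All names are hypothetical, chosen for illustration.
BITS_PER_PIXEL = 16          # 16bpp RGB data
PHYSICAL_WIDTH = 320         # display pixels per line
DISPLAY_PIXELS_PER_ENTRY = 2 # the post divides the width by 2
BITS_PER_LONG = 32           # a Propeller long is 32 bits

stored_pixels = PHYSICAL_WIDTH // DISPLAY_PIXELS_PER_ENTRY
line_buffer_longs = BITS_PER_PIXEL * stored_pixels // BITS_PER_LONG
line_buffer_bytes = line_buffer_longs * 4

print(stored_pixels, line_buffer_longs, line_buffer_bytes)
```

Under these numbers one line occupies 80 longs, i.e. 320 bytes of hub RAM.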

msrobots · 2015-12-26 03:16

OK, I'll give it a shot.

In user_graphics_lines you can save one instruction:

        mov     word_counter, #SCREEN_LENGTH
        shr     word_counter, #1

is the same as

        mov     word_counter, #SCREEN_LENGTH/2

Let the compiler do it.

All those one-line subs like 'cs_active':

cs_active
        and     outa, cs_mask
cs_active_ret
        ret

Don't do that. Think about the overhead: you execute three instructions (call, and, ret) instead of one.

Do not

        call     #cs_active

Do

        and     outa, cs_mask ' cs_active

And you will gain speed and also some cog memory.

It's basically the same with those three-liners, but there you only gain two instruction cycles and lose two longs of cog memory for each call.

Start with a list of the calls that occur least often in the source, and compare it against a mental list of the calls that are executed most often while running.

Inline the calls until you run out of cog memory...

Enjoy!

Mike

Ariba · 2015-12-26 04:16

Here is a long thread with a lot of TFT drivers. They all use the video generator to output the pixels fast. This works only up to 8 bits with one cog.
forums.parallax.com/discussion/115518/new-4-3-touchscreen-lcd-for-propeller-used-screens-almost-free-w-purchase/p1

How will you fit 320x240 pixels with 16bits per pixel into the RAM of the Propeller?

Andy

kwinn · 2015-12-26 04:16

I hate to put a damper on this project, but 320 x 240 at 60 frames a second cannot be done without using the on-chip video circuitry.

60 frames a second requires one frame every 16.67 ms
240 lines per frame is one line every 69.44 µs
320 pixels per line is one pixel every 217.01 ns

This is without subtracting the time taken for vertical and horizontal sync, which will make the time per pixel even less.

A read from hub RAM takes 8 to 23 clock cycles, and most other instructions take 4 cycles. At 100 MHz each cycle takes 10 ns. A cog can access the hub once every 16 system clock cycles, so in the best case there is only time for one hub access (8 x 10 ns = 80 ns) and two 4-cycle instructions (2 x 4 x 10 ns = 80 ns) per pixel. Any more instructions would mean missing the next hub access window.
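
The budget above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic in this post: it uses the 100 MHz clock and the 16-cycle hub window mentioned here, ignores sync and blanking time, and the variable names are illustrative.

```python
# Per-pixel time budget for 320 x 240 at 60 fps, ignoring sync/blanking.
FPS = 60
LINES_PER_FRAME = 240
PIXELS_PER_LINE = 320
CLOCK_HZ = 100_000_000   # the 100 MHz figure used in the post
HUB_WINDOW_CYCLES = 16   # a cog gets one hub slot every 16 clocks

frame_s = 1 / FPS
line_s = frame_s / LINES_PER_FRAME
pixel_s = line_s / PIXELS_PER_LINE
cycles_per_pixel = pixel_s * CLOCK_HZ
hub_slots_per_pixel = int(cycles_per_pixel // HUB_WINDOW_CYCLES)

print(f"frame: {frame_s * 1e3:.2f} ms")
print(f"line:  {line_s * 1e6:.2f} us")
print(f"pixel: {pixel_s * 1e9:.2f} ns ({cycles_per_pixel:.1f} clocks)")
print(f"whole hub slots per pixel: {hub_slots_per_pixel}")
```

With these inputs a pixel gets roughly 21.7 clocks, which is more than one 16-clock hub window but less than two. That is why only one hub access per pixel fits.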

macca · 2015-12-26 09:25

Vega256 wrote: »

I'm writing a TFT driver for a 320x240xRGB QVGA display module; my code is attached. I'm trying to get the Prop to send RGB pixel data fast enough to get a solid 60fps, but to no avail. Could someone possibly give me some hints, tips, and/or insight on my methods? I would post the code right here, but the editor won't take my code in its entirety.

If I read the source correctly, the display has its own controller and you are not driving it directly, right?

If so, take a look at the attached source code. I wrote a driver for a display using the SSD1289 controller. It is capable of 60fps (well, nearly: 59.9fps) using 4 scanline buffers and some hardware tricks. The R/S line is inverted so that 16 bits of data can be loaded directly from hub to OUTA (otherwise you have to set the R/S bit each time). This also has the side effect of driving the WR line low, so you just have to drive it back high to complete the write cycle. It doesn't use the CS line; the display is always selected. A bit of loop unrolling is also necessary because of the hub access window.

Hope this helps.

Vega256 · 2015-12-26 12:58

Thanks for the fast replies, guys. Just getting back to everyone:

msrobots wrote: »

OK, I'll give it a shot...

Good catch on the subroutines. Every call costs basically 8 cycles of overhead, since every call is also followed by a ret. Every nanosecond counts.

Ariba wrote: »

Here is a long thread with a lot of TFT drivers. They all use the video generator to output the pixels fast. This works only up to 8 bits with one cog.
forums.parallax.com/discussion/115518/new-4-3-touchscreen-lcd-for-propeller-used-screens-almost-free-w-purchase/p1

How will you fit 320x240 pixels with 16bits per pixel into the RAM of the Propeller?

Andy

Thanks for the list of resources. Although the display does indeed have a resolution of 320x240, I'm not doing bitmapped graphics here (maybe I should have stated that before...).

Also, every two pixels on the display are being treated as one, making the effective resolution 160x120. I only need roughly 2.5KB to do 300 8x8 tiles.

kwinn wrote: »

I hate to put a damper on this project, but 320 x 240 at 60 frames a second cannot be done without using the on-chip video circuitry.

60 frames a second requires one frame every 16.67 ms
240 lines per frame is one line every 69.44 µs
320 pixels per line is one pixel every 217.01 ns

This is without subtracting the time taken for vertical and horizontal sync, which will make the time per pixel even less.

A read from hub RAM takes 8 to 23 clock cycles, and most other instructions take 4 cycles. At 100 MHz each cycle takes 10 ns. A cog can access the hub once every 16 system clock cycles, so in the best case there is only time for one hub access (8 x 10 ns = 80 ns) and two 4-cycle instructions (2 x 4 x 10 ns = 80 ns) per pixel. Any more instructions would mean missing the next hub access window.

Thanks for mentioning the on-chip video generator; you bring up a good point. I have a follow-up question: I assume the video generator and the cogs are driven by the same clock (external osc. + PLL). How is it that the generator can do this job any faster than the cog? Is it because the generator is dedicated, whereas the cog is doing much more than outputting the video stream?

macca wrote: »

Vega256 wrote: »

I'm writing a TFT driver for a 320x240xRGB QVGA display module; my code is attached. I'm trying to get the Prop to send RGB pixel data fast enough to get a solid 60fps, but to no avail. Could someone possibly give me some hints, tips, and/or insight on my methods? I would post the code right here, but the editor won't take my code in its entirety.

If I read the source correctly, the display has its own controller and you are not driving it directly, right?

If so, take a look at the attached source code. I wrote a driver for a display using the SSD1289 controller. It is capable of 60fps (well, nearly: 59.9fps) using 4 scanline buffers and some hardware tricks. The R/S line is inverted so that 16 bits of data can be loaded directly from hub to OUTA (otherwise you have to set the R/S bit each time). This also has the side effect of driving the WR line low, so you just have to drive it back high to complete the write cycle. It doesn't use the CS line; the display is always selected. A bit of loop unrolling is also necessary because of the hub access window.

Hope this helps.

Thanks a bunch for the code; I hope I can adapt it to my display driver. Loading 16 bits from hub to outa? That's pretty impressive.

kwinn · 2015-12-26 15:27

The video generator, along with one of the counters in a cog, provides a hardware boost for generating video. The counter and its PLL can provide up to a 128 MHz clock for the video circuit to shift data out to one or two pins. Download the application notes AN001 – AN019 for more information. AN001 covers the counters, and AN004 and AN005 provide some video info. All of them are worth looking at. The Propeller Manual also provides some information, although it is scattered between the descriptions of the various instructions and registers.

Vega256 · 2015-12-26 17:37

kwinn wrote: »

The video generator, along with one of the counters in a cog, provides a hardware boost for generating video. The counter and its PLL can provide up to a 128 MHz clock for the video circuit to shift data out to one or two pins. Download the application notes AN001 – AN019 for more information. AN001 covers the counters, and AN004 and AN005 provide some video info. All of them are worth looking at. The Propeller Manual also provides some information, although it is scattered between the descriptions of the various instructions and registers.

Thanks for the resources. Just as a clarification: if I go the non-video-generator route, is it mainly waiting for hub access that slows down pixel transfer? If this is the case, maybe the Prop isn't the best tool for this; I really need the 16-bit color depth. On average, how does the bandwidth of PICs compare to the Prop (obviously, each PIC model is different)?

Rayman · 2015-12-26 22:34

There are a couple of chips you could use in between the Prop and the LCD to improve quality...

One is SSD1928 and another is FT800.

Vega256 · 2015-12-27 04:56

Just an update, guys:

I can't believe what I'm seeing...I'm not questioning it either.

By unrolling the pixel transfer loop and replacing all of my calls with inlines, I managed to shave off enough cog time to cleanly do 40fps. There's gotta be something else I can do...

I'm already overclocked at 96MHz. I wonder if that extra 4MHz is enough to get it over the hump; I don't have a 6.25MHz crystal on hand, though.

I'm so close to 60, but if all else fails, I'll settle for 50 since this is the PAL update rate.
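
For a rough feel of what the extra 4MHz would buy, here is a small Python sketch. It rests on two assumptions that are not stated in the thread: that the frame rate scales linearly with the system clock (the driver spends a fixed number of clocks per frame), and that the 6.25MHz crystal would be used with the usual 16x PLL setting. Names are illustrative.

```python
# Projected frame rate if only the clock changes.
# Assumption: fps scales linearly with clock (fixed clocks per frame).
CURRENT_CLOCK_MHZ = 96
CURRENT_FPS = 40
CRYSTAL_MHZ = 6.25
PLL_MULTIPLIER = 16      # assumption: 16x PLL

target_clock_mhz = CRYSTAL_MHZ * PLL_MULTIPLIER
projected_fps = CURRENT_FPS * target_clock_mhz / CURRENT_CLOCK_MHZ

# Clock that the same code would need for 50 and 60 fps.
clock_for_50_mhz = CURRENT_CLOCK_MHZ * 50 / CURRENT_FPS
clock_for_60_mhz = CURRENT_CLOCK_MHZ * 60 / CURRENT_FPS

print(f"{target_clock_mhz:.0f} MHz -> about {projected_fps:.1f} fps")
print(f"50 fps needs about {clock_for_50_mhz:.0f} MHz")
print(f"60 fps needs about {clock_for_60_mhz:.0f} MHz")
```

If the linear-scaling assumption holds, 100MHz gives only about 41.7fps, so the remaining gain has to come from fewer clocks per pixel rather than from the crystal.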

potatohead · 2015-12-27 05:19

Try using two cogs doing interleaved writes. Each one does 30 fps.

I suppose the other question is: can you organize your screen data differently to help improve the transfer?
