Propeller C - Driving 8 bit parallel screen slowly?
Wallaby
Posts: 10
in Propeller 1
Hello,
I managed to get an ILI9341 320x240 screen working. I used an example here to drive SPI and found it too slow. There wasn't an 8-bit parallel example, so I wrote my own in C. Running at 80 MHz, I had absolutely no doubt it could drive that easily at 60 Hz. But it turns out it's nowhere near 60 Hz. I expected to update every pixel 60 times a second with little issue.
So, while I'm proud of my achievement updating pixels on this display (first hardware project) - I'm a little disappointed PASM SPI was faster.
I'm not accessing memory or anything, simply setting the pixel color with a constant. It can't be any faster, as far as I know. Was I barking up the wrong tree with C for this?
Thanks!
Comments
If that is still not fast enough, I wrote a C library of SPI functions that uses the PASM SPI driver from the PropTool examples. The thread below details how that was developed and has the library.
Tom
https://forums.parallax.com/discussion/157441/can-spi-in-simple-libraries-be-speeded-up
Here is the code I'm using to write to the display:
It should be able to write 320x240 easily, shouldn't it? If each command took ~4 clock cycles, that's 4x7x2x320x240 = 8.6Mhz. Not even 10% of the cog's power?
Using outa was the fastest in straight C. In CMM it took 145 clocks and in LMM it took 17 clocks. For comparison it took 6 clocks in PASM.
Tom
Depends what 'update' means here ?
There seems to be an extra x2 in there, but this does show your problem.
80M/(4*7*320*240) = 37.20238 Hz, (assumes each source line is 1 PASM line of 4 sysclks). That rate is a screen-fill speed, not write of useful information.
However, your code does not loop, so you need to add the call overhead on top of that.
If your call overhead is also (say) 7 lines of PASM equivalent, you have an update rate of 18.6 Hz; (say) 14 lines of PASM equivalent gets you down to a 12.4 Hz refresh speed.
I don't understand your math at all, and I think you're missing some key parts. 80 MHz is the clock rate of the Propeller. It takes 4 clocks to execute a single instruction, which means it can only execute 20M instructions per second, or 20 MIPS. Each line of C in your above function is multiple op codes, but when compiled with LMM it actually takes four PASM instructions for each op code because it has to fetch the op code from HUB RAM. That's why you're not seeing anything remotely close to 8.6 MHz with your above function.
To get the best speed, you're going to need to use FCache. Take a look at how I implemented an SPI block write function with FCache and inline assembly here: https://github.com/parallaxinc/PropWare/blob/develop/PropWare/serial/spi/spi.h#L270. Note that there are three preprocessor definitions used by that function which are defined here: https://github.com/parallaxinc/PropWare/blob/develop/PropWare/PropWare.h#L69
Ah, I see. I thought that C was just compiled down to PASM.
I found the memory management in SimpleIDE but it won't resize properly so it was hard for me to navigate. I was able to change it from CMM to LMM and it was a noticeable improvement - still too slow - but getting there! COG RAM won't compile with simpletools.h so maybe I will write my own routines. It's only setting pins right now so it should fit into COG RAM without simpletools.
Yes, my math was probably wrong. I was just trying to get an estimate. I feel that the Prop should be more than fast enough to drive this display but I'm very new to hardware. Maybe it can't.
The extra x2 was because it's a 16-bit display and requires two writes to fill one pixel. I'd use 8-bit colors if I could, but the display doesn't support it.
Is it possible to divide up the work between multiple cogs? Let's say in a perfect world one COG could write 12.4Hz, could I interlace the screen x6? It seems to me that the output pins would get in the way of each other?
I'll try it. Thanks!
https://forums.parallax.com/discussion/154703/read-bmp-image-from-sd-to-display-ili9341-done-in-spin-but-very-slow
Your innermost loops will need to be assembler, but you can run different languages in different COGs.
There is also a cog-mode in PropC, which is for small-but-fast stuff.
See this thread & generated PASM code
https://forums.parallax.com/discussion/comment/1325462/#Comment_1325462
and same thread compares PropC and PropBASIC code
https://forums.parallax.com/discussion/comment/1325549/#Comment_1325549
Is there a way to use the built-in VGA generation on the COG to drive this display? I felt the VGA implementation was too restrictive, but maybe if it could generate a 320x240 screen instead of a 640x480 screen it might be worth testing. Still, I think with an 8-bit data bus, it's never going to be fast enough.
Even toggling the write as fast as possible isn't enough.
The display has a RGB mode where it can use VSYNC and HSYNC but it's not broken out on the board. Even with that, I'd be limited to 2 or 4 color modes.
I'll see what else I can do with the Prop.
I have interfaced to a 320x240 24-bit TFT that has, IIRC, an SSD1963/SSD2119 controller with internal display memory, via SPI, and I don't have a problem with writing to it. Of course I'm using Tachyon Forth, where I have very fast SPI instructions, so both that and the execution speed certainly help. BTW, I would draw variable sized fonts and lines and rectangles etc. The slowest operation will always be clearing or filling that display memory, but drawing should not be such a problem.
If you search for PSM, you might find my old 320x240 driver for 8-bit interface with some kind of ILI chip.
It is Spin and PASM, but you can probably modify it for your display...
You could also look at this thread :
https://forums.parallax.com/discussion/168553/newhaven-matrix-orbital-display-with-on-board-ftdi-ft81x-embedded-video-engine/p1
The problem is not just the bus, it's also the shuffling of pixels needed, as that 320x240 is going to push the Prop RAM - e.g. just a 4bpp image plane needs 38.4 kBytes.
You can improve the bandwidth with external help.
Working up the price curve, looking for x16 memory (to use as palette LUT) finds:
(You need to pre-program that LUT, either externally, or using something like PCA6416A 16b i2c io, 62c/1k, or the N76E003AT20 can do init loading.)
* Flash : SST39LF402C-55-4C-EKE 88c/25+ - quite cheap at 4Mb, these seem to now be the lowest price part of the curve.
  - you only use a small portion of that, and with a palette pair you can then send just pixels from the P1
* SDRAM : IS42S16100H-7TLI-TR $1.64/1 or W9816G6JH-6 $1.09/1 - I think those can do 1024 x 16 x 2 LUT, using column address only. As above, P1 selects LUT and then sends pixels to select FG/BG
* SRAM : IS62WV12816EBLL-45TLI $1.32/100+ 2Mb, this part can either be simple LUT, or at 128k pixels, it is large enough to swallow the whole 76800 pixel display
* SRAM : IS61WV25616EDBLL-10TLI $2.35/100, 4Mb, 10ns
A palette lookup table is a good idea. I understand that would greatly lower the memory cost to store a frame in memory. And I could use that external RAM to store a frame buffer. How do I get the pixel data to the display fast enough though? I want to target 60 Hz refresh. I could try a 16-bit parallel bus or a lower resolution screen, I guess. Something like a 128x128 possibly. That's 1989 GameBoy resolution though and I'm not sure if it's enough.
I figured I could use the Prop for a low cost portable game console. I'm a game developer by trade, but love the idea of hardware and only just learning. My initial idea was to see how many pixels I could drive with the Prop and design a portable around the screen. Because I'm new to hardware, I thought I'd have tons of horsepower to drive the screen and could design the rest of the system with the extra cores.
However we are expecting the P2 to sample later this year which has 512kB of RAM (just for starters) and is a whole lot faster and can stream data efficiently to the display. Perhaps you would like to hold off and be one of the first to use this powerful new Propeller chip for a portable game console?
See my sig for links.
Take a look at this:
https://dev.maccasoft.com/propgame/
There is a portable version prototype that uses one of those 320x240 LCD displays with SD card and touch panel, with SSD1289 or ILI9341 drivers. The PASM code can update the screen at 60 Hz using a 16-bit data bus and a small hardware trick (inverting the data/command line logic) to minimize pin toggles.
Source code is here:
https://dev.maccasoft.com/propgame/browser/trunk/libraries/lcd/scanline_driver.s
Updating only part of the screen is a fair compromise, but I'm worried the logic behind it would take a lot of work. For example, with something like a 3D object.
I'll try simulating a 2D sprite and see how the refresh is.
Interesting. So it can actually get there with a 16 bit data bus and PASM.
How do you invert the command / data bit? After the signal leaves the Prop? I can see that inverting that bit makes a lot of sense, because you're sending many more data writes than commands, and not having to set it to 1 on every write would save some time.
Any idea what the maximum resolution is for 60 Hz?
There is a 7404 inverter between the Propeller and the LCD. The full schematic is here:
https://dev.maccasoft.com/propgame/wiki/Doc/PortableConsoleSchematic
320x240 is the maximum resolution; there isn't much space to improve things, and as you can see I had to unroll most of the loop to keep the hub reads synchronized and achieve the maximum speed. The code has some delays used to make it behave like a CRT screen that can be removed; the refresh may be increased to 65 Hz or so (I don't remember exactly what the limits are) but that doesn't leave much for additional resolution. Lowering the refresh to 50 Hz may give some gains, depending on what you want to do.
The cartridge ROM / RAM is generous too.
Is it feasible to use a bluetooth controller instead of building your own?
Hub reads are synchronized to minimize the wasted clock cycles. rdword reads directly to OUTA, saving some clocks (that's why I need to invert the data / command bit: rdword clears bits 16-31, so the lines must be 0 in data mode), and the other two instructions use the hub wait time so the next rdword is synchronized.
I think you are now referring to the "classic" console. Bluetooth usually needs a USB receiver, there is a USB host stack for the Propeller 1 with bluetooth support here:
https://github.com/SaucySoliton/propeller-usb-host
But I've never used it for that; maybe it works.