All PASM2 gurus - help optimizing a text driver over DVI?

rogloh · 2019-12-08 20:22

Wuerfel_21 wrote: »

Some (most?) of those 18 bit LCDs seem to also support a 6 bit mode where the RGB components are sent one after the other (at 3x the usual dot clock). That seems useful for saving pins, but driving it off anything that isn't already formatted as 24bit RGB seems tricky.

I believe LVDS uses a clock rate of 7x the pixel clock and uses 3-4 differential signal pin pairs plus a clock pair. The output signals the P2 natively sends for video are either TMDS or parallel RGB digital, and/or via the analog DACs. This driver's future parallel RGB output up to 24 bits would nicely match up with something like this part for example, to enable LVDS LCD support:

http://www.ti.com/lit/ds/symlink/sn65lvds93a.pdf

Trying to do the differential conversion to LVDS output format yourself in the P2 might someday be possible if someone wanted to attempt it but it would need a dedicated driver just for that alone, and unless your image data in memory was already pre-formatted in the right way for streaming LVDS directly, all the pixel shuffling processing overhead could be quite significant and could burn COGs.

In theory my driver in a future 8 bit parallel output mode could send LVDS out to 18 bit panels using 3 data lanes if the data is already in the correct format. However, each pixel in memory would take 7 nibbles using the 4 bit LUT mode, and you would have to run the P2 at a 7x pixel clock rate. Not sure it's practical except for outputting static images perhaps, but I guess it would be possible if you wanted to save pins. You would also need to encode the syncs and DE into the image data.
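To make the "7 nibbles per pixel" figure concrete, here is a rough Python sketch of how one 18 bit pixel plus HS/VS/DE (21 bits in total) could be pre-formatted as 7 nibbles, one nibble per LVDS bit time. The function name `pack_pixel`, the bit-to-lane ordering and the `CLOCK_PATTERN` shape are assumptions for illustration only; a real panel datasheet defines the actual mapping, and this says nothing about the differential signalling itself.

```python
# Hypothetical layout: each nibble is one LVDS bit time.
#   bits 0..2 = three data lanes, bit 3 = clock lane.
# The slot ordering and clock shape below are assumed, not from a datasheet.
CLOCK_PATTERN = [1, 1, 0, 0, 0, 1, 1]  # assumed clock level in each of 7 slots

def pack_pixel(r6, g6, b6, hs, vs, de):
    """Pack 6+6+6 colour bits and 3 control bits into 7 nibbles."""
    bits = [(r6 >> i) & 1 for i in range(6)]
    bits += [(g6 >> i) & 1 for i in range(6)]
    bits += [(b6 >> i) & 1 for i in range(6)]
    bits += [hs & 1, vs & 1, de & 1]
    # 21 payload bits split evenly: 7 bits for each of the 3 data lanes
    lanes = [bits[0:7], bits[7:14], bits[14:21]]
    nibbles = []
    for slot in range(7):
        n = lanes[0][slot] | (lanes[1][slot] << 1) | (lanes[2][slot] << 2)
        n |= CLOCK_PATTERN[slot] << 3
        nibbles.append(n)
    return nibbles
```

Seven nibbles is 3.5 bytes per pixel, so under these assumptions a 640x480 frame would need about 1.05 MB, which is part of why this looks more suited to static images.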

potatohead · 2019-12-11 03:59

Yes I think CGA and EGA can

I just wanted to drop this idea here so it is not forgotten. On those TTL displays, I'm really wondering what can be done with a very fast, like not expected type fast, pixel clock. Might actually get a range of colors and shades. Just a, "get round to it" comment for the future.

evanh · 2019-12-11 04:32

rogloh wrote: »

I believe LVDS uses a clock rate of 7x the pixel clock and uses 3-4 differential signal pin pairs plus a clock pair. The output signals the P2 natively sends for video are either TMDS or parallel RGB digital, and/or via the analog DACs. This driver's future parallel RGB output up to 24 bits would nicely match up with something like this part for example, to enable LVDS LCD support:

Yep, that all rings familiar bells from me wrapping machine days.

Needs another CMOD option really. I wouldn't bother trying to make the data fit. Job for a future enhanced prop.

rogloh · 2019-12-11 04:37

potatohead wrote: »

Yes I think CGA and EGA can

I just wanted to drop this idea here so it is not forgotten. On those TTL displays, I'm really wondering what can be done with a very fast, like not expected type fast, pixel clock. Might actually get a range of colors and shades. Just a, "get round to it" comment for the future.

I expect it would be possible to overclock old TTL modes faster. My driver allows fully programmable clock/resolutions so instead of 320x200 etc you might be able to go 6x faster up to 1920 res and dither colour etc, but your images in memory will be larger too.

Speaking of which, right now I am looking at a 640x480x16bpp VGA screen driven by a HyperRAM frame buffer being updated dynamically by a 2nd COG. Just got it going 10mins ago.

cgracey · 2019-12-11 04:52

Great, Rogloh! Can you write to it while it displays video?

rogloh · 2019-12-11 04:57

Yes I can write to it dynamically from other COG(s) as it displays because I've integrated it with my arbiter controller. One (video) COG has priority access, others get round robin polled. Poll cycle latency when video is not running should be less than a microsecond, I think it's about 80 clocks or so.

I do see a little scan line corruption, potentially refresh related given I am bursting huge amounts of data which breaks the refresh constraints. Although it is following a bit of a pattern so it might be some other issue. It's very early days, only just got some life out of it, I'm still playing and it likely needs further tuning...

There will be some tuning required for higher performance. Non-video COGs can get better transfer rates using bursts, but the video COG needs these bursts to remain limited (probably in the 128 byte range).
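The access policy described above (one priority requester, everyone else polled round robin) can be sketched as a toy model in Python. This is not the arbiter's actual code; the class name `Arbiter` and its methods are hypothetical and only illustrate the grant order.

```python
class Arbiter:
    """Toy model: one priority client, the rest served round robin."""

    def __init__(self, n_clients, priority_id):
        self.n = n_clients
        self.priority_id = priority_id
        self.pending = [False] * n_clients
        self.next_rr = 0  # where the round robin poll resumes

    def request(self, client_id):
        self.pending[client_id] = True

    def grant(self):
        """Return the id of the client served next, or None if idle."""
        # The priority (video) client always wins when it has a request.
        if self.pending[self.priority_id]:
            self.pending[self.priority_id] = False
            return self.priority_id
        # Otherwise poll the remaining clients, resuming after the last one served.
        for i in range(self.n):
            cid = (self.next_rr + i) % self.n
            if cid != self.priority_id and self.pending[cid]:
                self.pending[cid] = False
                self.next_rr = (cid + 1) % self.n
                return cid
        return None
```

In a model like this the video client's worst-case wait is one in-flight transfer from another client, which is why the length of non-video bursts matters for the scanline deadline.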

Tubular · 2019-12-11 05:52

Oh this is exciting, congrats Roger

rogloh · 2019-12-11 06:33

Well ozprop's HyperRAM sample code was a very useful start. Basically I've just fixed? a couple of issues and attached it to my basic arbiter and added it to some video driver test code, plus written some SPIN access methods for other COGs. The external memory scanline request code was there already, I'd added it some time back to the driver but it was just not exercised fully until now. Very glad I put it in before getting close to the COG RAM limit. I think it should become very handy if this RAM proves reliable.

Somewhat pleased now I have at least something sort of working after a couple of days with HyperRAM integration and mucking about on the logic analyzer with timing. I do see some rapid flicker down the screen in a fixed group of pixel columns from this external memory region so something is still going awry - might be address overlap somewhere. It appears to go away in LUT8 mode which is rather weird, perhaps streamer or scanline timing budget related. Also every now and again partial scanlines get corrupted and display garbage (statically), even if it is just doing read-only operation. Maybe I'm wrapping past some boundary while a refresh is in progress and the data gets delayed there? It's strange. Looks like the corrupted portions are regularly spaced on the screen and skewed due to the 640 pixel width, so they might be on 512 or 256 byte boundaries and related to internal DRAM rows/column sizes.

ozpropdev · 2019-12-11 06:52

Cool Roger!

Is that burst reads at sysclk speed?

jmg · 2019-12-11 07:12

rogloh wrote: »

... Also every now and again partial scanlines get corrupted and display garbage (statically), even if it is just doing read-only operation. Maybe I'm wrapping past some boundary while a refresh is in progress and the data gets delayed there? It's strange. Looks like the corrupted portions are regularly spaced on the screen and skewed due to the 640 pixel width, so they might be on 512 or 256 byte boundaries and related to internal DRAM rows/column sizes.

Could be more timing than refresh related, as my understanding is read can act as self-refresh, with no other action, provided you only expect the read-swept RAM to refresh. It is dual image plane type uses that could have more refresh issues.
Does it change with temperature ?

rogloh · 2019-12-11 08:39

ozpropdev wrote: »

Cool Roger!
Is that burst reads at sysclk speed?

Not yet, just the original sysclk/2 .

rogloh · 2019-12-11 08:45

jmg wrote: »

Could be more timing than refresh related, as my understanding is read can act as self-refresh, with no other action, provided you only expect the read-swept RAM to refresh. It is dual image plane type uses that could have more refresh issues.
Does it change with temperature ?

Not examined yet with temp. By the looks of it, I have two issues to sort out, and they could well be software/timing related somewhere.

1) flicker down one "column" of multiple pixels on each row. Seems to be some type of jitter/overlap or problem in arrival time of the data. Goes away in LUT modes which is strange.

2) Corruption of contiguous portions of video scanline data. Seems to appear shortly after startup in occasional bursts and corrupts down the screen in multiple places at the same time. Looks like it might be a pattern but I need to try to measure its width/separation. Once it is corrupt it stays that way which is weird as it does it also in a read-only state. Then over time more sections get the same thing happening. Refresh should be happening on this video data just by reading it every frame.

Update: Problem 2) looks to be groups of 256 bytes being read from the wrong location in HyperRAM. The incorrect scan line strips that periodically appear look like they are from other portions of HyperRAM memory - I see my own test patterns from different screen regions in them. They usually appear (and also vanish sometimes later) in multiple places on the screen at once, so it may be some address failure or wrap on a 256 byte boundary perhaps inside the HyperRAM. Right now I am doing linear burst transfers for the entire scanline of up to 2560 bytes in truecolour modes for VGA resolution once every 32us (I know it exceeds the tCSM time, but would that cause this corruption?) and no other COG writes at this testing stage.

Update2: I'm also seeing some invalid 128 byte scanline strip portions too if I wait long enough, it's not just 256 bytes. This is with P2 clk = 252MHz and HyperRAM memory clock at 126MHz (which I think should be within its rating).

evanh · 2019-12-11 15:33

Be careful of DDR clock terminology. The hyperRAM data rate is 126 MT/s but its clock is only 63 MHz.

If the refresh is being starved then an increased DRAM temperature will hasten the corruption rate.

jmg · 2019-12-11 19:27

rogloh wrote: »

Update: Problem 2) looks to be groups of 256 bytes being read from the wrong location in HyperRAM. The incorrect scan line strips that periodically appear look like they are from other portions of HyperRAM memory - I see my own test patterns from different screen regions in them. They usually appear (and also vanish sometimes later) in multiple places on the screen at once, so it may be some address failure or wrap on a 256 byte boundary perhaps inside the HyperRAM. Right now I am doing linear burst transfers for the entire scanline of up to 2560 bytes in truecolour modes for VGA resolution once every 32us (I know it exceeds the tCSM time, but would that cause this corruption?) and no other COG writes at this testing stage.

Update2: I'm also seeing some invalid 128 byte scanline strip portions too if I wait long enough, it's not just 256 bytes. This is with P2 clk = 252MHz and HyperRAM memory clock at 126MHz (which I think should be within its rating).

So those 256 bytes are some random one of the 10% slices of the 2560 burst ?
Do the pixels that come after a fault appear correct ?
Do those flicker (one-off), or read as stable error zones, across multiple video frames ?

This "They usually appear (and also vanish sometimes later) in multiple places on the screen at once" suggests persistence ?
How can this 'know' to fail at the same place next frame, and how can it later self fix ?
Are these on many-address-bit change boundaries, or are they on a burst-into-next-page location ?
Can you change the first-read address, to be mod page size, rather than a packed mod 2560, and see if the faults then align vertically ?

Burst roll over into next page has to be a bit tricky, as it needs to read-next somehow ahead of time, but those issues you would expect to be page-sized.
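The alignment test suggested above can be worked through numerically. The Python sketch below lists where page boundaries fall inside each scanline for a packed stride versus a stride rounded up to the page size. The 256 byte page size and the helper names are assumptions for illustration; the thread has not established the device's real internal row/page size.

```python
def boundary_offsets(start, length, page):
    """Byte offsets within a line (0 < offset < length) that land on a page boundary."""
    first = (-start) % page
    if first == 0:
        first = page  # line starts on a boundary; next crossing is one page in
    return list(range(first, length, page))

def aligned_stride(line_bytes, page):
    """Round the line length up to a whole number of pages."""
    return -(-line_bytes // page) * page

def crossings(n_lines, line_bytes, stride, page):
    """Boundary offsets for each of the first n_lines scanlines."""
    return [boundary_offsets(line * stride, line_bytes, page)
            for line in range(n_lines)]
```

For a 640 byte line and an assumed 256 byte page, `crossings(4, 640, 640, 256)` alternates between two sets of offsets from line to line (the skew seen on screen), while `crossings(4, 640, aligned_stride(640, 256), 256)` gives the same offsets on every line, so page-related faults would stack into vertical columns.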

rogloh wrote: »

Update2:... This is with P2 clk = 252MHz and HyperRAM memory clock at 126MHz (which I think should be within its rating).

That's overclocking a little:
– 100-MHz clock rate (200 MB/s) at 3.0V VCC
– Sequential burst transactions
– Configurable Burst Characteristics
  – Wrapped burst lengths:
    – 16 bytes (8 clocks)
    – 32 bytes (16 clocks)
    – 64 bytes (32 clocks)
    – 128 bytes (64 clocks)
  – Linear burst
  – Hybrid option - one wrapped burst followed by linear burst
  – Wrapped or linear burst type selected in each transaction
– Configurable output drive strength

Perhaps incorrect wraps ? Tho I think if it wrongly took a wrapped command, the whole 2560 line would be corrupted.

If you try change of drive strength, does that change anything ?

rogloh · 2019-12-11 21:31

evanh wrote: »

Be careful of DDR clock terminology. The hyperRAM data rate is 126 MT/s but its clock is only 63 MHz.

You are correct, the actual HyperRAM clock is only 63MHz in this case, and still within the rating for 3V operation.

jmg wrote: »

So those 256 bytes are some random one of the 10% slices of the 2560 burst ?

By the looks of what I saw yesterday, yes. But they repeat down the screen, in different places. I'll try to take a photo of it when I retest so you can see what I mean. I think a LUMA mode with a wide gradient fill would probably show it up best.

Do the pixels that come after a fault appear correct ?

Yes.

Do those flicker (one-off), or read as stable error zones, across multiple video frames ?

Stable over multiple video frames but can go away perhaps a minute or more later.

This "They usually appear (and also vanish sometimes later) in multiple places on the screen at once" suggests persistence ?
How can this 'know' to fail at the same place next frame, and how can it later self fix ?

I know. This is the weird part. The only way I think this could happen is if the RAM is sending values from a different row/page/column etc in the middle of a burst. I don't store this scanline data anywhere else in hub, I just reuse the line buffer for the next line. The information my line buffer holds is only transient, and is re-filled by the next scanline read. So the only place it can come is from HyperRAM. However the P2 fifo is used to get the data into memory. It could hold 8-16 longs? Maybe that is related to it? But how could part of the frame buffer so far away get retained in the FIFO while it is in use. Doesn't make sense.

Are these on many-address-bit change boundaries, or are they on a burst-into-next-page location ?

In 640 pixel modes they appeared to be on 128/256 pixel offsets, but I need to re-test and collect the data in different colour depths as I was playing around with the colour as well and can't recall what depth it was in when I saw this.

Can you change the first-read address, to be mod page size, rather than a packed mod 2560, and see if the faults then align vertically ?

Yes I should try this. I can use the skew feature to do that.

Burst roll over into next page has to be a bit tricky, as it needs to read-next somehow ahead of time, but those issues you would expect to be page-sized.

I agree. This one is weird. The more I think about it, the more I am wondering if it is a HW issue.

Perhaps incorrect wraps ? Tho I think if it wrongly took a wrapped command, the whole 2560 line would be corrupted.

I thought about whether it might be related to when the address is sent and if it got corrupted. But then the whole line should be bad, not a smaller portion of it.

If you try change of drive strength, does that change anything ?

I haven't got the register writes implemented yet and am still running on the default. But if/when that happens I can try.

jmg · 2019-12-11 21:51

rogloh wrote: »

Stable over multiple video frames but can go away perhaps a minute or more later.

This "They usually appear (and also vanish sometimes later) in multiple places on the screen at once" suggests persistence ?
How can this 'know' to fail at the same place next frame, and how can it later self fix ?

I know. This is the weird part. The only way I think this could happen is if the RAM is sending values from a different row/page/column etc in the middle of a burst. I don't store this scanline data anywhere else in hub, I just reuse the line buffer for the next line. The information my line buffer holds is only transient, and is re-filled by the next scanline read. So the only place it can come is from HyperRAM. However the P2 fifo is used to get the data into memory. It could hold 8-16 longs? Maybe that is related to it? But how could part of the frame buffer so far away get retained in the FIFO while it is in use. Doesn't make sense.

Wow, stable for minutes, & then self-healing is truly strange.

What was the room temperature refresh failure you? found - do I remember right that was of the same order (retained for dozens of seconds) ?

Perhaps if you change CS to be within the spec'd MAX, (even if that means pixel-pauses on the screen, for the test ) that can check if it is some strange HW refresh interact artifact ?

rogloh · 2019-12-11 22:02

What was the room temperature refresh failure you? found - do I remember right that was of the same order (retained for dozens of seconds) ?

I don't think that was me. Maybe ozpropdev reported something there.

Perhaps if you change CS to be within the spec'd MAX, (even if that means pixel-pauses on the screen, for the test ) that can check if it is some strange HW refresh interact artifact ?

In time I might get to do this anyway. I was thinking of breaking up long video bursts into smaller chunks in order to allow refresh cycles to continue. Or perhaps take over refresh completely in the arbiter COG if these smaller video transfer chunks increase latency too much for the video. When a video COG first identifies itself to the arbiter as the high priority requestor, I also might want to pass in to the arbiter COG the video COG's status address, so it can see when the blanking is happening and also know how many scan lines / clock cycles will be free there, giving it an opportunity to do a bunch of manual refresh cycles. That could possibly impact other COGs too much though. Haven't run the numbers.

rogloh · 2019-12-11 22:10

Actually @jmg, I don't think I am exceeding CS low max by very much right now unless I used the 16/24 bit colour modes.

For 640 pixel wide modes at 126MB/s, I am only holding CS low for around 5us, then leaving over 25us free. I don't think that would push refresh out by much. I could try dropping this to 4bpp, which would halve this again and be within the rated 4us max CS low time.
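The CS low estimate above is simple arithmetic, sketched here in Python. The 126 MB/s rate and the 4us limit are the figures quoted in this thread; the function name `cs_low_us` is made up for illustration, and the calculation ignores command/address and initial latency clocks, so real CS low times would be slightly longer.

```python
T_CS_MAX_US = 4.0  # max CS low time as quoted in the thread

def cs_low_us(pixels, bits_per_pixel, rate_mb_per_s):
    """Approximate CS low time for one scanline burst, in microseconds."""
    line_bytes = pixels * bits_per_pixel / 8
    # bytes divided by (megabytes per second) gives microseconds
    return line_bytes / rate_mb_per_s

if __name__ == "__main__":
    # 32 bits here stands for the 24 bit mode, which reads 4 bytes per pixel
    for bpp in (4, 8, 16, 32):
        t = cs_low_us(640, bpp, 126)
        verdict = "within" if t <= T_CS_MAX_US else "exceeds"
        print(f"{bpp:2d} bpp: {t:5.2f} us ({verdict} the {T_CS_MAX_US} us limit)")
```

By this estimate only the 4bpp case fits a whole 640 pixel line inside the limit in a single burst at this transfer rate.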

jmg · 2019-12-11 22:38

rogloh wrote: »

Actually @jmg, I don't think I am exceeding CS low max by very much right now unless I used the 16/24 bit colour modes.

For 640 pixel wide modes at 126MB/s, I am only holding CS low for around 5us, then leaving over 25us free. I don't think that would push refresh out by much. I could try dropping this to 4bpp, which would halve this again and be within the rated 4us max CS low time.

You'd certainly expect that to be fine, ie within room temperature margins. (5us vs 4us)
Do the failures seem independent of the CS low time, in other cases ?

What part codes & how many have you tested ?
I see ISSI now have 64Mb and 128Mb parts in stock at Digikey.

Tubular · 2019-12-11 22:41

Do you think using a capture card would grab the fault Roger? Did you want to send me the code and I can see if it replicates here?

rogloh · 2019-12-11 23:09

What part codes & how many have you tested ?

I only have one board. It is an ISSI IS66WVH16M8BLL-100B1LI. I'll check if I can visit Tubular and OzProp today to see if they have a board with the same issue or if it is mine alone.

whicker · 2019-12-11 23:45

The wrapped bursts don't apply, so don't get sidetracked by it.

From my experiments 6 months ago, the burst transfer reads can basically die if the voltage supply to the chip drops, or there is noise on the chip select line.

You wouldn't think that, but any rising edge induced on CS causes it to cancel the transaction. If you keep clocking and reading, well then usually it'll be an invalid command and address and it'll show up onscreen as a mostly solid color.

A picture is worth a thousand words.

The other issue is the P2's long input latency. You might need to experiment with waitx #3 etc. Changing its value up or down a bit to make sure the data is being read with enough setup and hold time. There are obvious problems with reading on the same clock that the data lines are changing.

rogloh · 2019-12-11 23:52

I've had this code in the LUMA mode for about 20mins at VGA resolution with a gradient displayed over 512 pixels (3 pixels per level) and have not seen any form of corruption. I'll have to change the pattern a bit; perhaps columns only get replaced by identical columns from elsewhere, making the corruption invisible.

Also it's probably a few degrees cooler in this room this morning. I also tried heating the HyperRAM with hot air flow and it didn't cause anything. I stopped when the chip felt a bit hot to touch as I don't have a temp gauge on it, don't want to cook it.

rogloh · 2019-12-11 23:59

whicker wrote: »

The wrapped bursts don't apply, so don't get sidetracked by it.

Good to know. I wasn't sure as the ISSI data sheet wasn't clear on it. It says some parts can introduce delay with RWDS when bursts wrap on certain boundaries, but refer to the specific part's data. Problem is the specific part is the data sheet I am reading! I am not taking into account any possibility of RWDS stopping. However if that were the case it would likely not show what I was seeing.

From my experiments 6 months ago, the burst transfer reads can basically die if the voltage supply to the chip drops, or there is noise on the chip select line.

Ok, I should take a look at the IO voltage when it starts showing issues. It is powered by the P2-eval board's IO power.

You wouldn't think that, but any rising edge induced on CS causes it to cancel the transaction. If you keep clocking and reading, well then usually it'll be an invalid command and address and it'll show up onscreen as a mostly solid color.

A picture is worth a thousand words.

Yes, when it happens I'll take a pic.

The other issue is the P2's long input latency. You might need to experiment with waitx #3 etc. Changing its value up or down a bit to make sure the data is being read with enough setup and hold time. There are obvious problems with reading on the same clock that the data lines are changing.

Yep, I know about these things and have been tweaking them. If they are off, usually it is just a byte different (skew by a byte) in a burst transfer.

jmg · 2019-12-12 00:47

rogloh wrote: »

I am not taking into account any possibility of RWDS stopping. However if that were the case it would likely not show what I was seeing.

Yes, if there were issues going across page boundaries, you would only get a quite small phase shift of however many bytes were paused.
That would also be expected on all page boundaries, as a static fail, not as a fail that self-cures, or applies sparsely, with some longevity, to 128 or 256 bytes..

rogloh · 2019-12-12 00:53

Luma mode didn't show issues as all columns were the same.

Here's a quick pic I just grabbed in RGB 24bpp mode, showing the issue once I changed modes.

The top region is a fill pattern left over from my LUMA testing. It is a gradient, repeating on each scanline because it is now reading 4 bytes per pixel, and the fill was originally 3 bytes of the same colour in 512 bytes of the 640 on the line. More corruption has appeared since I took these photos when I glanced back to the monitor just now while typing this.

The middle portion is just whatever garbage is in HyperRAM at boot time (unwritten as yet). The lowest portion is a region coming from HUB ram, which is unaffected by these corruptions.

Update: going to head to Tubular's site soon to see if their HyperRAM board does the same so I'll likely be offline for a little while.

rogloh · 2019-12-12 01:10

Looking at the pic above, there seems to be an offset here in this mode too. The first gradient "box" seems to be narrower than the others. Maybe this indicates something too. I didn't notice this in the LUMA mode. I need to draw some outline co-ordinates or something to make sure I am displaying from the right place in the frame buffer (I just read from address 0).

evanh · 2019-12-12 03:41

JMG,
It was me doing the temperature testing the day I got my hyperRAM board. I haven't touched it since then. It wasn't a very exhaustive set of tests. Another thing that showed up, with intentionally long CS low: The larger the data set, the more errors that became noticeable. It seemed most likely to be single bit errors though, I never got significant percentages from memory.

I was more trying to get sysclock/1 to work but that seemed less and less reliable the more I tried. Maybe that's something to get back on to ...

rogloh · 2019-12-12 05:24

Ozpropdev and I played around with this today and believe we found the root cause of the corruption of portions of scanlines. It appears to be related to holding CS low for too long and this interacts with the refresh sequence forming a "beat" pattern of HyperRAM rows that don't get refreshed often enough. Over time these unrefreshed portions appear on screen as a pattern.

When we ran at sysclock/1 the problem only appears on 24bpp modes, instead of both 16 and 24bpp modes as before at sysclock/2. Sysclock/1 holds CS low for less time.

To get around this in the longer term it may be required to break up the HyperRAM burst transfers into smaller portions or fully manage the refresh ourselves in the arbiter COG.
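As a rough illustration of the first option, the Python sketch below splits one long scanline read into chunks short enough to keep CS low within a limit. The helper names are hypothetical, the 4us and 126 MB/s figures are the ones quoted earlier in the thread, and `overhead_us` is an assumed allowance for command/address and latency time that would need measuring.

```python
def max_chunk_bytes(cs_max_us, rate_mb_per_s, overhead_us=0.0):
    """Largest burst that keeps CS low within cs_max_us (overhead is an assumption)."""
    return int((cs_max_us - overhead_us) * rate_mb_per_s)

def split_burst(start, length, max_bytes):
    """Split one long read into (address, byte_count) chunks of at most max_bytes."""
    chunks = []
    addr = start
    end = start + length
    while addr < end:
        n = min(max_bytes, end - addr)
        chunks.append((addr, n))
        addr += n
    return chunks
```

With these numbers `max_chunk_bytes(4.0, 126)` is 504 bytes, so a 2560 byte truecolour line would become six transfers, with CS raised between them so the device can fit its own refresh in. Rounding the chunk down to a power of two such as 256 may be simpler in practice.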

whicker · 2019-12-12 05:33

If you set a lower latency number of clocks, there isn't as much of a penalty for breaking up the bursts into smaller lengths.

Just be careful setting the config register as certain undefined bits must be set to 1, or the whole chip acts super weird. Just another one of those poorly documented gotchas.
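The trade-off between chunk size and latency setting can be estimated with a small Python sketch. It assumes each read costs a fixed number of command/address clocks (`ca_clocks`, defaulted to 3 here as an assumption) plus the configured initial latency before data flows at two bytes per clock; the function name and the example latency values are illustrative only, so check the part's data sheet for the real figures.

```python
def burst_efficiency(chunk_bytes, latency_clocks, ca_clocks=3):
    """Fraction of a read transaction's clocks that actually carry data."""
    data_clocks = chunk_bytes / 2  # DDR: two bytes per clock
    return data_clocks / (ca_clocks + latency_clocks + data_clocks)

if __name__ == "__main__":
    for latency in (3, 6, 12):
        for chunk in (64, 128, 256, 512):
            eff = burst_efficiency(chunk, latency)
            print(f"latency {latency:2d} clocks, {chunk:3d} byte chunk: {eff:.1%}")
```

Under these assumptions a 128 byte chunk spends 64 of 73 clocks on data at a latency of 6 clocks and 64 of 70 at a latency of 3, so shorter latency recovers a few percent per chunk, and the gain grows as chunks get smaller.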
