HyperRam/Flash as VGA screen buffer (Now XGA, 720p &1080p) &Rev.B

jmg · 2019-03-21 18:43

Rayman wrote: »

..
It sounds like it refreshes the entire row when part of it is accessed. I'm thinking power must be marginal... This might explain what's happening...
..
Update2: Ugh... Lines just showed up after ~ 4 hours...

Do those stripes that eventually appear line up with a whole row of pixels in memory? (It's hard to count the lines in an image.)
When they do appear, is it a sudden flip to bad, or does it 'rust away'?

Addit: some of those bad lines look to be the same? Could it be misreading the address, and getting a line of valid pixels, but in the wrong place?
You could swap out 16 bits of pixels with a debug line-counter that is read and checked?
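A minimal sketch of that line-counter check, written in Python as a model rather than as P2 code. The frame size, the 16-bit-per-word layout, and the names `make_line` and `check_line` are assumptions made only for illustration: the first word of every stored line is replaced by its line number, and the reader compares that tag against the line it asked for.

```python
WORDS_PER_LINE = 640   # hypothetical: one 16-bit word per pixel
LINES = 480            # hypothetical frame height

def make_line(line_no, words_per_line=WORDS_PER_LINE):
    """Build one scan line whose first 16-bit word is the line-number tag."""
    words = [0x1234] * words_per_line   # placeholder pixel data
    words[0] = line_no & 0xFFFF         # debug tag replaces the first pixel
    return words

def check_line(expected_line_no, words):
    """Return None if the tag matches, else the line number actually read."""
    tag = words[0]
    return None if tag == (expected_line_no & 0xFFFF) else tag

frame = [make_line(n) for n in range(LINES)]

# Simulate an address misread: asking for line 100 returns line 37's data.
frame_read = list(frame)
frame_read[100] = frame[37]

bad = {}
for n, row in enumerate(frame_read):
    got = check_line(n, row)
    if got is not None:
        bad[n] = got

print(bad)  # {100: 37}
```

A mismatch that reports another valid line number would point to an addressing problem, while a garbage tag would point to corrupted data.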

Rayman · 2019-03-21 18:49

It's a whole row, and they suddenly appear. And they mostly look the same, oddly...

Ok, breaking out the scope was a good idea!

Think I found the problem... CSn was not being raised at the end of read bursts.
My code may have been OK when using lower pins, but this was bad and now fixed:

'or        outa,mCSn        'old: wrote to port A, but CSn is now on the port B side
outh      #Pin_CSn          'new: raise CSn by pin number

Applied this fix to my earlier version that read in a whole 640 pixels at a time and the lines don't show up there now, where they used to show up immediately. But, there are still some black pixels around the beak that shouldn't be black, strange...
But the CSn low time is 6 us, much more than the 4 us limit...

Rayman · 2019-03-21 19:04

Here's what CSn looks like in 10 bursts of 64 reads.
Second pic is an individual burst's CSn. Looks like there's some pickup from the clock signal...

Before the above fix, CSn was almost always low (not so good).
I think I see why I needed a "waitx" in the main loop now... That was the only time CSn was high...

Third image is with the clock.

Rayman · 2019-03-22 00:05

Trying to optimize things now, but don't understand something...

HR can fill the VGA line buffer faster than the VGA output drains it.
But, it's not seeming that way when looking at HR CSn and VGA HSync signals on scope.

I'm thinking that the streamer is somehow buffering its output so that I need to fill the line buffer before HSync...

Yanomani · 2019-03-22 02:16

Hi Rayman

Does the sum (Front Porch + Sync + Back Porch) amount to approximately 160 pixel clocks (~6.4 us)?

P.S. With HSync itself lasting 96 pixel clocks (~3.84 uS)?

Rayman · 2019-03-22 12:10

This is the horizontal sync code, the numbers after the "+" are the pixel clocks for each:

m_bs        long    $CF000000+16        'before sync
m_sn        long    $CF000000+96        'sync
m_bv        long    $CF000000+48        'before visible
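As a quick cross-check of those numbers against the question above, here is a small Python calculation. The 25 MHz pixel clock is an assumption; it is the value implied by the ~6.4 us and ~3.84 us figures, not one stated in the driver code.

```python
PIXEL_CLOCK_HZ = 25_000_000   # assumed; implied by the ~6.4 us / ~3.84 us figures

# Pixel-clock counts taken from m_bs, m_sn and m_bv
before_sync, sync, before_visible = 16, 96, 48

def clocks_to_us(clocks, pixel_clock_hz=PIXEL_CLOCK_HZ):
    """Convert a number of pixel clocks to microseconds."""
    return clocks * 1e6 / pixel_clock_hz

blanking_clocks = before_sync + sync + before_visible
blanking_us = clocks_to_us(blanking_clocks)
sync_us = clocks_to_us(sync)

print(blanking_clocks, blanking_us, sync_us)
```

Under that assumption the horizontal blanking is 160 pixel clocks, about 6.4 us, of which the sync pulse is about 3.84 us.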

Rayman · 2019-03-22 12:15

Think I can declare victory (for real this time) now!
Ran overnight and no bad lines this morning.

I briefly tested using just 2 bursts of 320 pixels and that appears to be stable also.
This is two, 3-us bursts, which should be OK.
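A rough way to estimate how many bursts a line needs, sketched in Python. It scales the measured 6 us for a full 640-pixel read and uses the 4 us CSn-low limit mentioned earlier in the thread; it deliberately ignores per-burst command, address and latency overhead, so treat the result as a lower bound. The function name `min_bursts` is hypothetical.

```python
import math

T_CSM_US = 4.0       # max CSn low time quoted in the thread
LINE_PIXELS = 640
LINE_READ_US = 6.0   # measured CSn low time for one full 640-pixel read

def min_bursts(line_pixels, line_read_us, limit_us):
    """Return (bursts needed, max pixels per burst), ignoring per-burst overhead."""
    max_pixels = math.floor(limit_us * line_pixels / line_read_us)
    return math.ceil(line_pixels / max_pixels), max_pixels

bursts, max_pixels = min_bursts(LINE_PIXELS, LINE_READ_US, T_CSM_US)
burst_us = LINE_READ_US / bursts   # duration of each burst with an even split

print(bursts, max_pixels, burst_us)
```

With these inputs the estimate is two bursts of at most 426 pixels each, which agrees with the 2 x 320 pixel split at about 3 us per burst.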

Need to work on syncing that with the VGA driver though.
I can put in a big wait after HSync is output, but I'd rather get ahead of the pixel output by starting at the very end of hsync.

This looks like it should work, but doesn't. The streamer must have some kind of output buffer...

Cluso99 · 2019-03-22 14:39

IIRC there is a double buffer for the xcont outputs so you can have the next one queued while the current one is executing. Perhaps that is what you’re seeing.

Rayman · 2019-03-22 15:10

I remember there being an xcont buffer (triple?).

But, I'm looking at the HSync and CSn signals on a scope and seeing the line buffer nearly filled before VGA pixels start coming out...

Yet, on the screen, I'm seeing the line buffer being filled too late...

Maybe it won't be an issue if I can figure out how to sync to the VGA driver directly instead of my measuring the HSync signal on a different pin...

potatohead · 2019-03-22 16:44

Rayman wrote: »

Trying to optimize things now, but don't understand something...

HR can fill the VGA line buffer faster than the VGA output drains it.
But, it's not seeming that way when looking at HR CSn and VGA HSync signals on scope.

I'm thinking that the streamer is somehow buffering its output so that I need to fill the line buffer before HSync...

The streamer is double buffered.

Rayman · 2019-03-22 18:23

Ok, the streamer commands are double buffered, but is the data (to be streamed out) buffered?

potatohead · 2019-03-22 18:29

No. Just the commands.

Rayman · 2019-03-22 18:51

It appears to be acting as if some of the data is buffered. But, maybe I'm looking at it wrong...

jmg · 2019-03-22 19:32

Rayman wrote: »

Think I can declare victory (for real this time) now!
Ran overnight and no bad lines this morning.

What was the final fix ? Just the CS idle hi ?
The datasheet is vague on whether refresh can occur with CS=H and no clock (which would require some internal clock), or if refresh only occurs after CS =\_ using the clock edges that load Command.Address.Latency
The CSH min spec is quite short, which implies the latter.

Rayman wrote: »

I briefly tested using just 2 bursts of 320 pixels and that appears to be stable also.
This is two, 3-us bursts, which should be OK.
..

Is 1 x 640 easy to test ? That could check margins ?

whicker · 2019-03-22 20:03

jmg,

The datasheet says you can stop clocking with CS# high. So yes, there is an internal clock.
The added latency clocks between command address and reading/writing the data has to do with finishing up the current row of auto refresh. The default is to always assume it's finishing up the refresh, but there is the "variable latency" option which will signal early if it wasn't actually busy.

I still posit, as we have seen, that internal refresh does not occur with CS low, hence the 4 us max burst time in the datasheet. I saw that, per array, the refresh needs to happen every 6.4 (edit: 64) milliseconds. The reality is that you can mostly get away with 2 to 4 orders of magnitude more time between refreshes, which is why it was still mostly working even though it was given barely any time for refresh.

Another phenomenon could be that a given row is "checked out" and is only written back when crossing into another row, or with CS# going high. And that checked out buffer could be made of DRAM cells as well. So if the buffer waits too long before writeback, the data fades?

jmg · 2019-03-22 20:20

whicker wrote: »

... I saw that per array, that the refresh needs to happen every 6.4 milliseconds. ..

Did you mean 64ms ? The data gives Array Refresh Interval (ms) ( <85°C) < 64 ms
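To put the 64 ms figure in perspective, here is a back-of-envelope Python calculation of the average time available per row refresh. The 8 MB (64 Mbit) device size is purely an assumption for illustration and is not stated in this thread; the 1024-byte row size is the figure given later in the discussion.

```python
REFRESH_INTERVAL_MS = 64           # array refresh interval from the datasheet quote
ROW_BYTES = 1024                   # row size as given later in this thread
DEVICE_BYTES = 8 * 1024 * 1024     # ASSUMED 64 Mbit part; adjust for the real device

rows = DEVICE_BYTES // ROW_BYTES
us_per_row = REFRESH_INTERVAL_MS * 1000 / rows

print(rows, us_per_row)
```

Under those assumptions there are 8192 rows, so on average one row has to be refreshed roughly every 7.8 us. That is the same order of magnitude as the 4 us CSn-low limit discussed above.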

Yanomani · 2019-03-22 21:15

Throwing my own bits into the subject of HyperRam refresh...

There are two possible scenarios, when it comes to the operation of HyperRam's internal self-refresh circuitry:

- CSn is inactive (High) and steady: the internal timing unit will schedule refresh operations as needed, advancing the internal refresh counter after each full-row (1024 bytes) read-then-write operation it completes. The parts of that circuitry that are still unknown (IP-protected) are the ones related to whether it has any embedded "intelligence", in the sense that it could skip refreshing any row that was recently accessed by a main data array read/write operation. How recently, and the maximum/minimum pace of the internally commanded self-refresh operations, are further parameters TBD. But since it is expected to sustain the main memory array contents as long as power is not removed and no control signal is changed in a way that disturbs its operation, we can rely on it to keep the memory contents at our disposal. Whether it has an internal temperature sensor, in order to adapt its pace to changing thermal conditions, is also TBD (but foreseeable).

- CSn control is being actively exercised by the HyperRam controller (P2, in our case): provided that all applicable limits are respected, as stated in the access parameters listed in the datasheets and HyperBus specifications, there are at least two opportunities for a hidden refresh operation to take place during any valid read/write access cycle:
  - The first one occurs just after CSn goes high, at the end of any main data memory array access cycle, and is denoted in the datasheets as tRWR (Read Write Recovery time). HyperRams rely on that time to finish a full-row (1024 bytes) refresh process of any previously accessed address space that didn't yet have time to be refreshed during the previous cycle.
    Note that there are some caveats here, because there is a situation where the former operation could need TWO full-row refresh operations in a chain. That situation arises when the HyperRam controller (P2, in our case) accesses only, e.g., the last word of a row and the first word of the next row, ending the access just after reading (or writing) the last byte of the first word of the second row.
    That is one of the so-called limit situations; the ones where RWDS-High output must be delayed by the HyperRam when passing from the initial row to the next one, because it needs to ensure the termination of the read/write processing (including the needed refreshing) of the first row that was accessed (the one that contains the first word that was read/written at the beginning of the operation). (Since RWDS-High back-control-request isn't available during WRITE accesses, more intelligence is needed in the internal management of the secondary full-row buffer, to ensure the proper number of main memory array operations can be finished in time.)
    That is the reason I always recommend that accesses last at least 8 words (per row) for Hyp_CK frequencies up to 100 MHz, and 16 words (per row) for higher frequencies, up to HR's limiting one (166 MHz at present, but interface signaling needs to be done at 1.8 V, so not really our case yet...). Use the datasheet-listed latency table as a guide: the higher the frequency, the more clocks it needs to ensure enough time for the HR internal controller to forward/retrieve the information you need.
    The more "intelligence" the HR's embedded controller has, the faster it can resolve all the main memory array read/write operations it needs to do in any given access situation, thus shortening the time required to complete them.
  - The second "hidden" refresh opportunity appears at the FIRST latency period (Latency Count 1). When Fixed Two Times Latency Count is programmed, it is always present and available for the internal controller to use; whether it is needed or not, it is there. It is used to execute any pending refresh operation, scheduled but not yet executed, to maintain the main data memory array contents. When Variable Latency Count is selected, the internal controller will request an extended Latency Count period as needed, by raising RWDS at the beginning of any access cycle.
    If the external HyperBus device controller is fast enough to cope with such stringent signaling requirements, it can enjoy all the speed HyperBus provides with such a modest number of control/data lines. If not, one simply needs to select Two Times Fixed Latency Count, and accept the waste of 3-to-6 clocks during each access round.

Hope it helps a bit

Henrique

P.S. some last minute additions, in bold...

jmg · 2019-03-22 21:26

Yanomani wrote: »

- CSn is inactive (High) and steady: the internal timing unit will schedule refresh operations as needed, advancing the internal refresh counter after each full-row (1024 bytes) read-then-write operation it completes. The parts of that circuitry that are still unknown (IP-protected) are the ones related to whether it has any embedded "intelligence", in the sense that it could skip refreshing any row that was recently accessed by a main data array read/write operation.

I'm not sure it needs to be super-clever, as it does not matter if a row is refreshed twice inside the 64ms repeat limit (once by user read, and again by refresh)
ie a simple refresh counter should be enough, that increments on any refresh-done action, of CS=H or CS =\_ available window.

Rayman · 2019-03-22 21:27

The problem was that CS was being held low way too long.
It was supposed to be raised high at the end of a read, but wasn't because the command was "OUTA" while I had moved the device over to the "OUTB" side.

All better now, been running almost 24 hours...

I'll try 640 pixel reads.

Yanomani · 2019-03-22 21:31

jmg wrote: »

I'm not sure it needs to be super-clever, as it does not matter if a row is refreshed twice inside the 64ms repeat limit (once by user read, and again by refresh)
ie a simple refresh counter should be enough, that increments on any refresh-done action, of CS=H or CS =\_ available window.

I agree with you, but the cleverer it is, the less current it will use.

Yanomani · 2019-03-22 21:33

Rayman wrote: »

I'll try 640 pixel reads.

Good luck to you and your excellent efforts.

Waiting to see what comes from this..

jmg · 2019-03-22 21:42

Rayman wrote: »

The problem was that CS was being held low way too long.
It was supposed to be raised high at the end of a read, but wasn't because the command was "OUTA" while I had moved the device over to the "OUTB" side.

Good to hear.
I see the data says this too
"Active Clock Stop
The Active Clock Stop mode reduces device interface energy consumption to the ICC6 level during the data transfer portion of a read or write operation.
The device automatically enables this mode when clock remains stable for tACC + 30 ns.
While in Active Clock Stop mode, read data is latched and always driven onto the data bus. ICC6 shown in Section 9.4, DC Characteristics on page 32.
Active Clock Stop mode helps reduce current consumption when the host system clock has stopped to pause the data transfer.
Even though CS# may be Low throughout these extended data transfer cycles, the memory device host interface will go into the Active Clock Stop current level at tACC + 30 ns.
This allows the device to transition into a lower current mode if the data transfer is stalled.
Active read or write current will resume once the data transfer is restarted with a toggling clock.
The Active Clock Stop mode must not be used in violation of the tCSM limit. CS# must go high before tCSM is violated."

The last line suggests that holding CS# low for long times is not supported. Maybe it's a case of 'surprising it worked as well as it did'?

Active Clock Stop Icc is quite high, at typ 12mA vs 20mA for full read.

ICC4 VCC Standby Current for Industrial CS#= Vcc, is typ 270uA, maybe that's enough current to manage refresh ?

Rayman · 2019-03-22 21:51

640 pixel read appears to work. At least, it doesn't fail after ~5 minutes or so.

Back to my sync issue...
Look at this scope trace, where the HR CSn is in yellow and the VGA HSync pin is in green.
It's pretty clear that the line is more than half read in before HSync is done and VGA line output starts.
But look at the screenshot: there is a shift at about 15% into the line where it changes from old data to new data.

Somehow (and I don't know how this can be), it appears that the VGA pixel output has a big buffer.

Yanomani · 2019-03-22 22:03

Hi Rayman

Since you are always retrieving HyperRam contents well ahead of the end of each horizontal line, perhaps you could start reading just after VSync begins, or better, keep your own counters and predict when each full screen ends its visible pixel scan period; then you can be sure you are starting at the right place in the run.

evanh · 2019-03-23 00:27

Rayman,
I'm no wiz on the VGA/streamer stuff, but looking at the v1i source code, it looks like you've placed a RDFAST in the wrong place. I think line 753 needs to be moved out of the display loop, presumably replacing line 719.

evanh · 2019-03-23 00:46

Oh, I see. You're wanting to restart the scanline at an arbitrary length, whereas the FIFO normally is a multiple of 64 bytes.

Okay, maybe changing the RDFAST to be non-blocking and shifting it to inside the hsync routine, eg:

hsync       xcont   m_bs,#0                     'horizontal sync
            rdfast  ##1<<31,##@HyperBuffer      'non-blocking FIFO restart
            xcont   m_sn,#1
    _ret_   xcont   m_bv,#0

Rayman · 2019-03-23 01:18

That did change things... Interesting...

There's no more tear, but horizontally shifted about 4 pixels..

evanh · 2019-03-23 01:30

I don't know if that example is any good either. I'm not sure when the streamer will be using the FIFO. I've got no experience.

evanh · 2019-03-23 06:36

Been doing some experimenting and found that the code position of RDFAST was fine. I'm slightly surprised. Maybe there is enough time before the first pixel is needed. The tearing seems to be more about what you were first investigating.

I've eliminated the tearing by changing the RDFAST to use a buffer pointer and doing buffer flipping.

Here's an example using bitmap2.bmp from FPGA files: (You can replace the "img_ptr" handling and the RD/WRLONG buffer copying with whatever you've got for HyperRAM. The feature is having two buffers to flip between.)
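The attached P2 source is not reproduced here. As a stand-in, the following is only a language-neutral sketch of the buffer-flipping idea in Python, not evanh's code: the class `LineBuffers` and the function `scan_frame` are hypothetical names. The point it illustrates is that the next line is always written into the buffer that is not being displayed, so the displayed line can never be a mix of old and new data.

```python
class LineBuffers:
    """Two line buffers: one being displayed, one being filled."""

    def __init__(self, line_len):
        self.bufs = [[0] * line_len, [0] * line_len]
        self.display = 0                  # index of the buffer being scanned out

    def display_buffer(self):
        return self.bufs[self.display]

    def fill_buffer(self):
        return self.bufs[self.display ^ 1]

    def flip(self):
        self.display ^= 1                 # swap roles at the end of each line


def scan_frame(source_lines):
    """Return the lines as they would be displayed, one per scan line."""
    lb = LineBuffers(len(source_lines[0]))
    lb.fill_buffer()[:] = source_lines[0]      # prefetch the first line
    lb.flip()
    shown = []
    for n in range(len(source_lines)):
        if n + 1 < len(source_lines):
            # Fetch the NEXT line into the buffer that is not on screen.
            lb.fill_buffer()[:] = source_lines[n + 1]
        shown.append(list(lb.display_buffer()))
        lb.flip()
    return shown


source = [[n] * 4 for n in range(5)]           # five tiny test "scan lines"
print(scan_frame(source) == source)            # True
```

In the real driver the fill step would be the HyperRAM read, and the flip would select which buffer address the RDFAST is pointed at for the next line.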

evanh · 2019-03-23 10:00

Here's another version using multiples of 64 bytes per scan line. Doesn't require the repeated RDFAST's.
