Ozpropdev and I played around with this today and believe we found the root cause of the corruption of portions of scanlines. It appears to be related to holding CS low for too long and this interacts with the refresh sequence forming a "beat" pattern of HyperRAM rows that don't get refreshed often enough. Over time these unrefreshed portions appear on screen as a pattern.
When we ran at sysclock/1 the problem appeared only in 24bpp modes, instead of in both 16 and 24bpp modes as before at sysclock/2. Sysclock/1 holds CS low for less time.
To get around this in the longer term it may be required to break up the HyperRAM burst transfers into smaller portions or fully manage the refresh ourselves in the arbiter COG.
So what's the CS low threshold? Is that 4us looking like quite a hard number?
Well, given I was running at ~5us CS low and occasionally seeing issues, 4us might need to be adhered to. 10us triggers it a lot.
The thing is this seems to be related to how frequently you hold it off as well as the holdoff length. I think what is happening is the internal refresh counter is incrementing all the time and if you have CS low for more than a full cycle, it doesn't get an opportunity to refresh that row and moves on to the next one. Normally an occasional miss like this would be okay and it would catch up next time, but because video is synchronously requesting all the time at some scan line frequency, if a given row's refresh opportunity gets hit over and over it will miss its 64ms refresh cycle too often and lose its data. Only portions of some rows are affected on the screen and these seem smaller than the column size of 1kB. Also weirdly, just reading through all the data rows every frame is still not enough to keep it fully refreshed. Not sure why that is.
Last night I also noticed that when a row is not refreshed and loses its state, it can apparently obtain the state of another row, not pure random data. I saw a case where some other pattern from another part of the HyperRAM screen clobbered a portion of a row that was not being refreshed frequently enough. This is the really weird thing, and explains why it can "come good" later, as the pattern that writes it back was already obtained from another part of the screen that matched what it held originally. That case appears to be far rarer though. Most of the time the corrupted scanline portion persists and you need a simple pattern from elsewhere that can match it, like a single coloured stripe.
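To make the suspected "beat" effect concrete, here is a toy Python model of that hypothesis. It is only a sketch under loud assumptions: the chip's real refresh logic is not documented anywhere in this thread, and the 4us slot, the two line periods and the 10us CS low time are illustrative values, not measurements. The function name and its parameters are made up for the example.

```python
def never_refreshed_rows(slot_ns, line_ns, cs_low_ns, rows, sweeps):
    """Count rows that miss every refresh slot in a toy model.

    Assumed model (hypothetical): refresh slot k lasts slot_ns, targets
    row k % rows, and is skipped if CS is low for the whole slot. CS goes
    low for cs_low_ns at the start of every scan line (every line_ns).
    """
    if cs_low_ns >= line_ns:
        raise ValueError("model assumes CS returns high before the next line")
    refreshed = [False] * rows
    for k in range(rows * sweeps):
        slot_start = k * slot_ns
        slot_end = slot_start + slot_ns
        # Latest CS-low window that began at or before this slot.
        window_start = (slot_start // line_ns) * line_ns
        blocked = slot_end <= window_start + cs_low_ns
        if not blocked:
            refreshed[k % rows] = True
    return refreshed.count(False)


# Line period locked to a multiple of the slot time: same rows always lose.
print(never_refreshed_rows(4000, 16000, 10000, rows=8192, sweeps=4))
# Line period that drifts against the slots: the misses get spread around.
print(never_refreshed_rows(4000, 15000, 10000, rows=8192, sweeps=4))
```

In this model the locked case starves the same half of the rows on every sweep, while the drifting case leaves no row permanently unrefreshed. That is at least consistent with the corruption depending on how regularly the holdoff recurs and not only on its length.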
You probably are, but how long are you leaving CS high in between transactions? It's probably better to do your command address conversion and unpack into the six bytes, and then put CS low.
I'm still under the thought process that auto refresh is completely paused during CS low for that die. (Some of these chips are dual stack). It is going to need enough time to start at least one refresh.
For the SDRAM driver I made for P2-Hot, I issued 'refresh' commands periodically within the management code. Is that something that could be done with HyperRAM to make it deterministic?
These are very new parts, and likely to have hidden errata. Data is frustratingly vague in many areas. It's always going to be tricky to run some internal monostable timer, and external user access, and these results sound like all interaction cases of that are not handled well.
I wonder if the Cypress parts are a whole new design, or just a badge job ?
There are OctaRAM parts too, but not easy to source.
.. Also weirdly, just by reading through all the data rows every frame, this is still not enough to keep it fully refreshed. Not sure why that is...
Maybe they increment the refresh counter, on CS _/=, so you would need a certain number of CS pulses to scan in time ?
(If that were true, a simple pulsing of CS in quiet times, (probably with CLKs too?) could buy refresh times)
A chip design that incremented refresh ctr on CS _/= and used address clocks to do refresh, could manage without any other clock ?
Did you try read in 1kB chunks ?
You probably are, but how long are you leaving CS high in between transactions? It's probably better to do your command address conversion and unpack into the six bytes, and then put CS low.
I agree there and I have taken some initial steps to reduce the CS low time in the code I started with from ozpropdev. To shrink it even further I would like to try to use the streamer there too - that address setup part is just byte banged right now. I want to try some ideas later today.
For the SDRAM driver I made for P2-Hot, I issued 'refresh' commands periodically within the management code. Is that something that could be done with HyperRAM to make it deterministic?
There is no separate refresh command like there is on regular SDRAMs, so this concept can only work if reads also trigger a refresh. From what I saw yesterday with the screen corruption I'm not convinced that simply reading continuously throughout the chip is going to be enough, at least for this ISSI chip on the P2-EVAL HyperRAM breakout. We might just have to let the chip have its chance every 4us interval (or I could try to increase this interval in the registers slightly, haven't done register writes yet).
It would be easy enough to break up the burst transfers into sub-4us chunks if we know the frequency. A single COG is controlling all of this on behalf of all requestors so it can co-ordinate these bursts. I'll be trying that idea out too.
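As a rough illustration of splitting a burst into sub-4us chunks, here is a minimal Python sketch (not P2 code). The function name, the `overhead_us` allowance for command/address setup and latency, and the choice to keep chunks an even number of bytes are all assumptions made for the example. The 220 bytes/us rate is the sysclk/1 figure mentioned elsewhere in the thread.

```python
def split_burst(total_bytes, bytes_per_us, max_cs_low_us=4.0, overhead_us=0.5):
    """Split one transfer into chunks whose CS-low time stays within a limit.

    overhead_us is a hypothetical allowance for address setup and latency
    that also counts against the CS-low budget.
    """
    budget_us = max_cs_low_us - overhead_us
    if budget_us <= 0:
        raise ValueError("overhead leaves no time for data")
    max_chunk = int(budget_us * bytes_per_us)
    max_chunk -= max_chunk % 2  # assumption: keep chunks 16-bit word aligned
    if max_chunk <= 0:
        raise ValueError("limit too small for even one word")
    chunks = []
    remaining = total_bytes
    while remaining > 0:
        n = min(remaining, max_chunk)
        chunks.append(n)
        remaining -= n
    return chunks


# One 1280-pixel, 16bpp scan line (2560 bytes) at 220 bytes/us.
print(split_burst(2560, 220))
```

Each extra chunk pays the setup overhead again, so the real cost of this approach is the lost bandwidth, which is why the transfer frequency needs to be known up front.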
These are very new parts, and likely to have hidden errata. Data is frustratingly vague in many areas.
Maybe they increment the refresh counter, on CS _/=, so you would need a certain number of CS pulses to scan in time ?
(If that were true, a simple pulsing of CS in quiet times, (probably with CLKs too?) could buy refresh times)
A chip design that incremented refresh ctr on CS _/= and used address clocks to do refresh, could manage without any other clock ?
Without knowing the intricacies of their refresh design it's only speculation. I could try some periodic /CS low, but I don't imagine that approach could work out in a completely idle system.
Did you try read in 1kB chunks ?
Until now from the Video COG I have only read in chunks that are multiples of the scanline. I've tried 640x480 at all bit depths (so 80,160,320,640,1280,2560 byte transfers) plus 800x600 in some 8/16/32 bit modes IIRC. I have also capped the other COGs' burst accesses to 256 bytes (for now), and so I guess I have also tried 256 bytes.
Update: Just want to point out how poorly some of my recent posts read with bad typos/grammar/apostrophes missing or duplicated words etc. I type in too fast and I really need to proof read carefully and slowly before clicking send. I sort of use the edit window as a word processor collecting my thoughts and often drag words and sentences around with the mouse etc as I reformulate ideas in my head, which causes some of this, but I think I am too quick to press send and only skim read it first in a cursory way. Really I need to slow things down as I appear to miss a lot of these errors until I read the post later, that's why I need to edit many of my posts. My English teachers would be ashamed after all those essays I wrote for them. Funny thing is these days I'm not sure I care, that's the problem. LOL.
Had a play with timing and I am looking at a 1280x1024x16bpp (SXGA) screen output from a P2 over a VGA interface sourced from HyperRAM with the sysclk/1 code ozpropdev gave me yesterday. Wow. Going up to 24bpp is too much for it though at SXGA. I was clocking the P2 at 216MHz and this is overclocking the HyperRAM by 8% at 3.3V. I also got XGA at 24bpp going too (with a 220MHz P2, 10% memory overclock, 55MHz pixel clock and 51Hz vertical refresh, didn't try to tweak its timing to hit 60Hz).
If I overclock the HyperRAM by 20% and run the P2 at 240MHz you start to see some pixel "fuzz", this is indicating some bit transfer errors and/or other setup timing problems on read. I haven't pinpointed precisely yet where it begins.
Test results like this show a single HyperRAM module can potentially support 16bpp SXGA or 24bpp XGA with appropriate timing. Hopefully after breaking up the bursts to solve the refresh thing it will still have the bandwidth to do these resolutions, and maybe higher if other RAMs overclock more or we can tweak the setup timing further.
I now really need a good way to get proper large test images into the system, as the HUB RAM is limited and doesn't let me load much from the P2 image itself, that's why that bird image is just replicated. I guess I will have to find or code up a good SD reader COG or add in some more serial download stuff. Ideally at some point we'll probably want loadp2 to be able to load data into HyperRAM too. I think that could be done as long as the hyperRAM contents are sustained between boot time and run time (they should be).
It looks like 133MHz at 3.3V is possible, on other x8 Serial memories that spec 200MHz at 1.8V
Some part codes here : http://www.wridy.com/list-354-1.html
LY68L6408 64M 3.3V 266MHz 8 200uA -40~85°C 20 QFN
Lyontek already offer this
LY68L6400 64M 3.3V 100MHz 1 or 4 200uA -40~85°C SOP8 QSPI
The issue is more about clock/data timing relationships. At sysclock/1, there are only narrow bands of frequencies where those alignments are suitable. And temperature will likely have a bearing on the exact position of the band too.
If you move up in sysclock rate a decent amount further and adjust for an extra clock of lag then you'll get your much faster data rate you want.
Yeah that narrow relationship could make it tough to reliably be able to use the RAM at high clock speeds, especially if it varies from part to part as well as temperature.
Weirdly at one point I think I found that keeping the HyperRAM clock within spec (sysclock/2) but increasing the P2 rate to over 300MHz I was getting some fuzz errors again. At the time I wondered if the higher PLL frequency caused more jitter on the P2 IO? But I need to take another look as I was just playing about with lots of things then and may have changed something else.
Yeah I've seen the overlapping bands that OzPropDev has produced for the /1 case
The good news is at least there is some overlap
We have a few tools yet to try, such as using the bit dac mode asymmetrically around the threshold of the 1v8 hyperram, in order to have some phase control. It would be good to close the loop on testing and cycle temperature as well as frequency to see how the bands and reliability move.
I think it would be a good time about now to get going on the 1v8 Hyperram and see how it behaves, while the 3v3 Hyperram experiments are fresh
Weirdly at one point I think I found that keeping the HyperRAM clock within spec (sysclock/2) but increasing the P2 rate to over 300MHz I was getting some fuzz errors again.
I now really need a good way to get proper large test images into the system, as the HUB RAM is limited and doesn't let me load much from the P2 image itself, that's why that bird image is just replicated. I guess I will have to find or code up a good SD reader COG or add in some more serial download stuff. Ideally at some point we'll probably want loadp2 to be able to load data into HyperRAM too. I think that could be done as long as the hyperRAM contents are sustained between boot time and run time (they should be).
No one has a JPEG decoder written in PASM2 yet do they? That could work. Could the P2 CORDIC accelerate the IDCT in any way? Or maybe the fast multiply is good enough.
I think it would be a good time about now to get going on the 1v8 Hyperram and see how it behaves, while the 3v3 Hyperram experiments are fresh
The P2 logic paths slow down at 1V8 VIO, so this may need logic translators on the data paths.
xx8T245 parts have Dual Vcc, and spec 380/420 Mbps & maybe Si53307 or similar on the CLK/CLK# ? (or lower cost 74AUP2G3404 or 74LVC2G86 or 74AUP2G86 or 74AUP1G19)
Maybe using a DAC to power lower VCCB could give some fine tune of delays ?
Attached is a simple binary if anyone wants to see some scrolling video coming from their HyperRAM board out to VGA with the P2-EVAL. Doesn't do much but exercise the buses and scrolls the screen. If you have a scope or analyser you can see the sysclk/1 data transfers on the HyperRAM interface. It still displays that scan line corruption to be fixed which you can see for yourself.
Needs
- VGA breakout board fitted on pins 8-15
- HyperRAM module fitted on pins 32-47
P2 will be clocked at 220MHz and HyperRAM at 110MHz (10% overclock). SXGA output.
Update: Also added a non-scrolling version.
Update2: these demos also contained that other bug near the start of the scan lines I'd reported earlier. I wasn't ping/ponging correctly in the external memory cases. This was due to repurposing the line buffer data pointer to create the external mailbox request and this was clobbering where I was keeping track of the ping/pong state, resulting in the same scan line buffer being used for all external memory scan lines. It's now fixed, but the fix used up one of my last 3 longs so I want to look for another optimization to pay for it! I think I can find one or two more.
Here's another binary with the two fixes.
1) single scan line buffer fix
2) slow refresh (16us max CS low time), seems to fix the bad scan lines. I've waited about 5mins and not seen any corruption yet. Normally it happened very soon after starting the demo.
The scrolling is very slowly crawling upwards - presumably one pixel line per rotation. I only noticed because eventually continuous garbage appears below the bottom images. I remember that was something I came across in the same situation during my experiments. I managed to correct for it, because I wanted to, but it wasn't a particularly elegant solution from memory.
I've put the hair dryer on the HyperRAM for a minute or so ... no indication of data corruption.
We really need to have a go at the 1v8 Hyperrams using the bit_dac smartpin mode. These are rated to 333 MT/s instead of 200 MT/s
The problem will be reading the data back from the RAM, since the DAC comparator may not be fast enough to receive the transitions.
Yes that would hurt. It's certainly the Hyperram->P2 direction I'm worried about.
In that case here's what I'm thinking would be worth trying
- Run P2's VIO from 3.0v so we still get fast i/o
- This will make the P2's switching threshold around 1.5v
- Run the 1v8 Hyperram from 1.95v regulator, so its output high is comfortably above the 1.5v threshold of the P2
- This will make the 1v8 hyperram signal threshold around 1.0v
- For p2->Hyperram, use the 75 ohm 2v DAC and bit-dac mode
- (nb you still get fast drive because P2's VIO is close to 3v3, not 1v8)
- Manipulate the bit_dac levels around 1.0v to alter phase shift a little bit. eg you might want Vlow=0.8v and Vhigh = 1.8v, or the other extreme Vlow=0.0v Vhigh=1.0v.
- (There will be phase shift because the capacitance will roll off the clock signal at a few hundred MHz, so the clock is actually more sine than square wave. The effect of changing the bit_dac levels is to move the sine wave up and down with respect to the 1.0v hyperram threshold)
- for Hyperram->P2, see whether its natural Vhigh of 1.95v is sufficiently clear of the P2's 1.5V threshold. This is where it might all come unstuck. We'll see
Here's another binary with the two fixes.
1) single scan line buffer fix
2) slow refresh (16us max CS low time), seems to fix the bad scan lines. I've waited about 5mins and not seen any corruption yet. Normally it happened very soon after starting the demo.
Seems clean for now.
What was the max CS low time, before that change ? Is this indicating much longer CS low times are to be avoided, but up to 16us may be ok ?
Another possible artifact I can imagine, is if the first read page does refresh (during address slack time), but roll-over reads do not (no refresh as no spare time).
That could explain your report of a full-scan seeming to fail to refresh ?
The scrolling is very slowly crawling upwards - presumably one pixel line per rotation. I only noticed because eventually continuous garbage appears below the bottom images. I remember that was something I came across in the same situation during my experiments. I managed to correct for it, because I wanted to, but it wasn't a particularly elegant solution from memory.
I've put the hair dryer on the HyperRAM for a minute or so ... no indication of data corruption.
Yeah the horizontal scroll is not infinite and wasn't coded up to be. It still takes it quite a while to fully run out of data. A vertical scroll can be infinite and I can do that to test my wrap in external memory mode (should just work the same as my internal memory mode). I could also make another static version if you want the screen to persist to wait for possible long term corruption.
What was the max CS low time, before that change ? Is this indicating much longer CS low times are to be avoided, but up to 16us may be ok ?
Max CS low before the change was the default of 4us, so I increased it by 4x to get 16us. The total data transfer per scan line in this 1280 wide 16bpp mode is 2560 bytes. At 220MB/s it takes 11.63us plus some extra setup overhead, which is still within the 16us max CS low. It looks like the corruption appears once you exceed the limit. There are two other refresh interval settings I can try, at 6us and 8us. I am highly confident both of these should have refresh issues in this mode, but I should verify.
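The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function names are made up for the example, setup overhead is left as a parameter because it isn't quantified in the thread, and the list of interval settings is just the four values mentioned here (4, 6, 8 and 16us).

```python
def burst_time_us(num_bytes, mbytes_per_s, overhead_us=0.0):
    """CS-low time for one burst; 1 MB/s equals 1 byte per microsecond."""
    return num_bytes / mbytes_per_s + overhead_us


def intervals_that_fit(num_bytes, mbytes_per_s, intervals_us=(4, 6, 8, 16)):
    """Return the refresh interval settings the burst stays within."""
    t = burst_time_us(num_bytes, mbytes_per_s)
    return [limit for limit in intervals_us if t <= limit]


# One 1280 wide 16bpp scan line at sysclk/1 with a 220MHz P2.
print(round(burst_time_us(2560, 220), 2))
print(intervals_that_fit(2560, 220))
```

With no overhead added, only the 16us setting covers the full scan line burst, which matches the expectation that 6us and 8us would still show refresh issues in this mode.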
Another possible artifact I can imagine, is if the first read page does refresh (during address slack time), but roll-over reads do not (no refresh as no spare time).
That could explain your report of a full-scan seeming to fail to refresh ?
Do you mean that in a burst we are simply starving it of refresh opportunities as it switches from one row to the other? Maybe with the back to back accesses it doesn't have any time to recharge the cells properly after read. Perhaps it would then be the last row accessed that is being refreshed once CS finally goes high and gets time to "come up for air".
In that case here's what I'm thinking would be worth trying
- Run P2's VIO from 3.0v so we still get fast i/o
- This will make the P2's switching threshold around 1.5v
- Run the 1v8 Hyperram from 1.95v regulator, so its output high is comfortably above the 1.5v threshold of the P2
- This will make the 1v8 hyperram signal threshold around 1.0v
- For p2->Hyperram, use the 75 ohm 2v DAC and bit-dac mode
- (nb you still get fast drive because P2's VIO is close to 3v3, not 1v8)
- Manipulate the bit_dac levels around 1.0v to alter phase shift a little bit. eg you might want Vlow=0.8v and Vhigh = 1.8v, or the other extreme Vlow=0.0v Vhigh=1.0v.
- (There will be phase shift because the capacitance will roll off the clock signal at a few hundred MHz, so the clock is actually more sine than square wave. The effect of changing the bit_dac levels is to move the sine wave up and down with respect to the 1.0v hyperram threshold)
- for Hyperram->P2, see whether its natural Vhigh of 1.95v is sufficiently clear of the P2's 1.5V threshold. This is where it might all come unstuck. We'll see
There is still the risk that errant COGs could set the IO on the HyperRAM pins to non-BIT DAC modes and drive out 3V to the 1.95V RAM and possibly damage it. 1V higher than VccQ inputs is the latchup specification for the 1.8V ISSI device in their data sheet.
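The voltage numbers in this proposal can be sanity-checked with a small Python sketch. It assumes, as the posts above do, that the P2 input threshold sits at roughly half of VIO and that the RAM's output high reaches its supply rail. The function name and dictionary keys are invented for the example, and values are in millivolts to keep the arithmetic exact.

```python
def drive_margins_mv(p2_vio_mv, ram_vcc_mv, latchup_over_vccq_mv=1000):
    """Margins for a P2 at p2_vio_mv talking to a RAM powered at ram_vcc_mv.

    Assumptions (from the thread, not from a data sheet lookup here):
    P2 threshold is about VIO/2, RAM output high is about its supply, and
    the RAM tolerates inputs up to VccQ + latchup_over_vccq_mv.
    """
    p2_threshold_mv = p2_vio_mv // 2
    latchup_limit_mv = ram_vcc_mv + latchup_over_vccq_mv
    return {
        "ram_high_above_p2_threshold": ram_vcc_mv - p2_threshold_mv,
        "p2_full_swing_over_latchup_limit": p2_vio_mv - latchup_limit_mv,
    }


print(drive_margins_mv(3000, 1950))
```

A positive second value means a pin accidentally driven to full VIO would sit above the VccQ + 1V figure, which is the risk being described.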
Interesting, so a manual change of the Distributed Refresh Interval eases the problem ? That likely shifts a simple monostable, but maybe at the expense of high temperature refresh ?
The data does say this cryptic comment : "The distributed refresh method requires that the host does not do burst transactions that are so long as to prevent the memory from doing the distributed refreshes when they are needed."
That does sound like burst access can (somehow) interfere with / trump refresh, and 'prevent' sounds more blunt than the 'delay' or 'defer' you might expect from a smarter system.
There is dead time in the address portion of read where the memory could do a refresh, but I was wondering if a roll-over lacks that time, and so does not refresh the rolled-into row ?
I thought of it like this.
One chip is 8 MB = 8388608 bytes.
Rows are 1024 bytes.
Divide and get 8192 rows.
Spec says keep CS low time limited to 4 us.
8192 rows × .000004 sec = 32.768 ms
That's similar to the time a given DRAM cell needs refreshing. Something like 64 ms.
So this is why I keep thinking refresh is paused when CS is low. Given the auto row refresh cadence is probably every 8 us, and it's probably a dumb incrementing counter, having CS low for too long can cause the next row(s) to fail to get refreshed.
What's more, I believe the 1024 × 8 bit shift register after the sense amps is also composed of DRAM cells. So if it doesn't get clocked out, the data eventually fades.
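The row arithmetic in that post is easy to reproduce. A minimal Python sketch, using only the figures quoted there (8 MB die, 1024-byte rows, 4 us CS low limit, ~64 ms retention); the constant names are just for the example:

```python
CHIP_BYTES = 8 * 1024 * 1024   # one 8 MB die
ROW_BYTES = 1024               # bytes per row
CS_LOW_LIMIT_US = 4            # spec'd max CS low time
RETENTION_US = 64_000          # the ~64 ms refresh figure quoted above

rows = CHIP_BYTES // ROW_BYTES
sweep_ms = rows * CS_LOW_LIMIT_US / 1000   # one refresh slot per 4 us
per_row_us = RETENTION_US / rows           # slot spacing to cover 64 ms

print(rows, sweep_ms, per_row_us)
```

So one refresh per 4 us walks the whole array in about half the 64 ms figure, and a cadence near 8 us would just cover it, which is where the "probably every 8 us" guess comes from.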
I like your numbers but what makes you say that? I would expect straight up flops. There is significant logic, address generation and state sequencing inside a hyperRAM chip.
..
So this is why I keep thinking refresh is paused when CS is low. Given the auto row refresh cadence is probably every 8 us, and it's probably a dumb incrementing counter, having CS low for too long can cause the next row(s) to fail to get refreshed.
Refresh failure is highly temperature sensitive, and retention is much longer than ~64ms at room temp.
rogloh's results suggest a monostable/timer in there, that you can register-extend up to 16us from 4us, which is going to be ok if you do not read the whole device, or if you keep the temperature more modest.
Maybe 16us is also ok, if you ensure you do read the whole device.
That timer seems to be more precise, maybe 20% or similar ?
Still unclear is if a roll-over read into the next row, is enough to count as a refresh for that next row ?
One possible way to manage 2 image planes could be to read half a row for each. The whole row is refreshed, for free, and the display plane is picked by which half is read.
What's more, I believe the 1024 × 8 bit shift register after the sense amps is also composed of DRAM cells. So if it doesn't get clocked out, the data eventually fades.
Possibly, but I'm not sure of a use case where that would be an issue ?
The refresh decay times are much longer than any video line flyback time. Even longer than frame times.
Almost an understatement.
It looks like 133MHz at 3.3V is possible, on other x8 Serial memories that spec 200MHz at 1.8V
Some part codes here : http://www.wridy.com/list-354-1.html
LY68L6408 64M 3.3V 266MHz 8 200uA -40~85°C 20 QFN
Lyontek already offer this
LY68L6400 64M 3.3V 100MHz 1 or 4 200uA -40~85°C SOP8 QSPI
JSC:
JSC28SSU8AGDY-75I 128Mb 3.0V 133MHz 24B BGA 6x8mm
JSC64SSU8AGDY-75I 64Mb 3.0V 133MHz 24B BGA 6x8mm
APMemory
APS3208x-3xx 32Mb Octal SPI PSRAM x8 Octal SPI 266MHz 3V
APS6408x-3xx 64Mb Octal SPI PSRAM x8 Octal SPI 266MHz 3V
APS12808x-3xx 128Mb Octal SPI PSRAM x8 Octal SPI 266MHz 3V
but not much data on these x8 parts....
If you move up in sysclock rate a decent amount further and adjust for an extra clock of lag then you'll get your much faster data rate you want.
Weirdly at one point I think I found that keeping the HyperRAM clock within spec (sysclock/2) but increasing the P2 rate to over 300MHz I was getting some fuzz errors again. At the time I wondered if the higher PLL frequency caused more jitter on the P2 IO? But I need to take another look as I was just playing about with lots of things then and may have changed something else.
The good news is at least there is some overlap
We have a few tools yet to try, such as using the bit DAC mode asymmetrically around the threshold of the 1v8 HyperRAM, in order to have some phase control. It would be good to close the loop on testing and cycle temperature as well as frequency to see how the bands and reliability move.
I think it would be a good time about now to get going on the 1v8 Hyperram and see how it behaves, while the 3v3 Hyperram experiments are fresh
He's wangled some overlap there by interleaving registered and unregistered configs with the various lag calibrations.
xx8T245 parts have Dual Vcc, and spec 380/420 Mbps & maybe Si53307 or similar on the CLK/CLK# ? (or lower cost 74AUP2G3404 or 74LVC2G86 or 74AUP2G86 or 74AUP1G19)
Maybe using a DAC to power lower VCCB could give some fine tune of delays ?
Needs
- VGA breakout board fitted on pins 8-15
- HyperRAM module fitted on pins 32-47
P2 will be clocked at 220MHz and HyperRAM at 110MHz (10% overclock). SXGA output.
Update: Also added a non-scrolling version.
Update2: these demos also contained that other bug near the start of the scan lines I'd reported earlier. I wasn't ping/ponging correctly in the external memory cases. This was due to repurposing the line buffer data pointer to create the external mailbox request and this was clobbering where I was keeping track of the ping/pong state, resulting in the same scan line buffer being used for all external memory scan lines. It's now fixed, but the fix used up one of my last 3 longs so I want to look for another optimization to pay for it! I think I can find one or two more.
The problem will be reading the data back from the RAM, since the DAC comparator may not be fast enough to receive the transitions.
1) single scan line buffer fix
2) slow refresh (16us max CS low time), seems to fix the bad scan lines. I've waited about 5mins and not seen any corruption yet. Normally it happened very soon after starting the demo.
Seems clean for now.
I've put the hair dryer on the HyperRAM for a minute or so ... no indication of data corruption.
Yes that would hurt. It's certainly the Hyperram->P2 direction I'm worried about.
In that case here's what I'm thinking would be worth trying
- Run P2's VIO from 3.0v so we still get fast i/o
- This will make the P2's switching threshold around 1.5v
- Run the 1v8 Hyperram from 1.95v regulator, so its output high is comfortably above the 1.5v threshold of the P2
- This will make the 1v8 hyperram signal threshold around 1.0v
- For p2->Hyperram, use the 75 ohm 2v DAC and bit-dac mode
- (nb you still get fast drive because P2's VIO is close to 3v3, not 1v8)
- Manipulate the bit_dac levels around 1.0v to alter phase shift a little bit. eg you might want Vlow=0.8v and Vhigh = 1.8v, or the other extreme Vlow=0.0v Vhigh=1.0v.
- (There will be phase shift because the capacitance will roll off the clock signal at a few hundred MHz, so the clock is actually more sine than square wave. The effect of changing the bit_dac levels is to move the sine wave up and down with respect to the 1.0v hyperram threshold)
- for Hyperram->P2, see whether its natural Vhigh of 1.95v is sufficiently clear of the P2's 1.5V threshold. This is where it might all come unstuck. We'll see
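The numbers in the list above can be checked with a small sketch. It assumes each input threshold sits at about half its supply rail, which is where the "around 1.5v" and "around 1.0v" figures come from. The function names are just for illustration, and real thresholds will vary with process and temperature.

```python
# Rough logic-level margins for a 3.0v P2 talking to a 1.95v HyperRAM.
# ASSUMPTION: input threshold is roughly half of the supply rail.

def threshold(vcc):
    """Approximate CMOS input switching threshold for a given rail."""
    return vcc / 2.0

def high_margin(driver_vhigh, receiver_vcc):
    """How far the driver's high level clears the receiver's threshold."""
    return driver_vhigh - threshold(receiver_vcc)

def swing_offset(vlow, vhigh, receiver_threshold):
    """Centre of the bit_dac swing relative to the receiver threshold.
    Positive means the waveform sits high, negative means it sits low."""
    return (vlow + vhigh) / 2.0 - receiver_threshold

if __name__ == "__main__":
    p2_vio, ram_vcc = 3.0, 1.95
    print(f"P2 threshold       ~{threshold(p2_vio):.2f} V")
    print(f"HyperRAM threshold ~{threshold(ram_vcc):.2f} V")
    print(f"HyperRAM->P2 high margin {high_margin(ram_vcc, p2_vio):.2f} V")
    # The two bit_dac extremes suggested above, against a ~1.0v threshold
    for vlow, vhigh in ((0.8, 1.8), (0.0, 1.0)):
        off = swing_offset(vlow, vhigh, 1.0)
        print(f"bit_dac {vlow:.1f}..{vhigh:.1f} V: centre offset {off:+.2f} V")
```

The HyperRAM->P2 margin comes out at under half a volt, which is why that direction is the worrying one.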
Another possible artifact I can imagine, is if the first read page does refresh (during address slack time), but roll-over reads do not (no refresh as no spare time).
That could explain your report of a full-scan seeming to fail to refresh ?
Yeah the horizontal scroll is not infinite and wasn't coded up to be. It still takes it quite a while to fully run out of data. A vertical scroll can be infinite and I can do that to test my wrap in external memory mode (should just work the same as my internal memory mode). I could also make another static version if you want the screen to persist to wait for possible long term corruption.
Max CS low before the change was the default of 4us. So I increased by 4x to get 16us. The total data transfer per scan line in this 1280 wide 16bpp mode is 2560 bytes. @220MB/s it takes 11.63us plus some extra setup overhead, which is still within the 16us max CS low. It looks like the corruption appears once you exceed the limit. There are two other refresh interval settings I can try at 6us and 8us. I am highly confident both of these should have refresh issues in this mode, but I should verify.
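For anyone wanting to repeat that calculation for other modes, here is a minimal sketch. It assumes one byte per HyperRAM clock edge (so 110MHz gives 220MB/s) and only counts the data phase. The setup overhead mentioned above is left out, so treat a result close to the limit as a fail.

```python
# Data-phase burst duration vs. the max CS low setting.
# ASSUMPTIONS: DDR on an 8-bit bus (bytes/s = 2 * clock); command/address
# and latency overhead are NOT included.

def burst_time_us(num_bytes, hyperram_clock_mhz):
    """Time CS stays low for the data phase only, in microseconds."""
    bytes_per_us = 2 * hyperram_clock_mhz
    return num_bytes / bytes_per_us

def fits(num_bytes, hyperram_clock_mhz, max_cs_low_us):
    """True if the data phase alone fits inside the CS low limit."""
    return burst_time_us(num_bytes, hyperram_clock_mhz) <= max_cs_low_us

if __name__ == "__main__":
    line = 1280 * 2  # 1280 pixels at 16bpp = 2560 bytes
    print(f"{line} bytes at 110MHz: {burst_time_us(line, 110):.2f} us")
    for limit_us in (4, 6, 8, 16):
        verdict = "ok" if fits(line, 110, limit_us) else "exceeded"
        print(f"  max CS low {limit_us:2d} us: {verdict}")
```

With these numbers only the 16us setting covers a full 2560 byte line in one burst, which matches what I see on screen.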
Do you mean that in a burst we are simply starving it of refresh opportunities as it switches from one row to the other? Maybe with the back to back accesses it doesn't have any time to recharge the cells properly after read. Perhaps it would then be the last row accessed that is being refreshed once CS finally goes high and gets time to "come up for air".
There is still the risk that errant COGs could set the IO on the HyperRAM pins to non-BIT DAC modes and drive out 3V to the 1.95V RAM and possibly damage it. 1V higher than VccQ inputs is the latchup specification for the 1.8V ISSI device in their data sheet.
Another approach (ugly) could be to lift the whole hyperram above ground using a schottky diode and bleed resistor
Interesting, so a manual change of the Distributed Refresh Interval eases the problem ? That likely shifts a simple monostable, but maybe at the expense of high temperature refresh ?
The data sheet does make this cryptic comment :
"The distributed refresh method requires that the host does not do burst transactions that are so long as to prevent the memory from doing the distributed refreshes when they are needed."
That does sound like burst access can (somehow) interfere / trump refresh, and 'prevent' sounds more blunt than saying delay or defer you might expect from a smarter system.
There is dead time in the address portion of read where the memory could do a refresh, but I was wondering if a roll-over lacks that time, and so does not refresh the rolled-into row ?
One chip is 8 MB = 8388608 bytes.
Rows are 1024 bytes.
Divide and get 8192 rows.
Spec says keep CS low time limited to 4 us.
8192 rows × .000004 sec = 32.768 ms
That's similar to the time a given DRAM cell needs refreshing. Something like 64 ms.
So this is why I keep thinking refresh is paused when CS is low. Given the auto row refresh cadence is probably every 8 us, and it's probably a dumb incrementing counter, holding CS low for too long can cause the next row(s) to fail to get refreshed.
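The same arithmetic as a small sketch, in case anyone wants to plug in other chip sizes or intervals. The 64 ms retention figure is the "something like" number above, not a data sheet value, and the one-row-per-slot counter is only my guess at how the part works.

```python
# Row count and refresh sweep times for one HyperRAM chip.
# ASSUMPTIONS: ~64 ms retention (typical DRAM figure, not confirmed for
# this part) and one row refreshed per slot by a simple counter.

CHIP_BYTES = 8 * 1024 * 1024   # 8 MB
ROW_BYTES = 1024
RETENTION_MS = 64              # assumed

def row_count(chip_bytes=CHIP_BYTES, row_bytes=ROW_BYTES):
    return chip_bytes // row_bytes

def sweep_time_ms(slot_us, rows):
    """Time to visit every row once at one row per slot."""
    return rows * slot_us / 1000.0

def slot_for_retention_us(rows, retention_ms=RETENTION_MS):
    """Slot length that visits every row exactly once per retention period."""
    return retention_ms * 1000.0 / rows

if __name__ == "__main__":
    rows = row_count()
    print(f"rows: {rows}")
    print(f"sweep at 4 us per row: {sweep_time_ms(4, rows):.3f} ms")
    print(f"slot needed for {RETENTION_MS} ms: {slot_for_retention_us(rows):.4f} us")
```

The slot needed to cover all rows in 64 ms lands just under 8 us, which fits the guess at the cadence above.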
What's more, I believe the 1024 × 8 bit shift register after the sense amps is also composed of DRAM cells. So if it doesn't get clocked out, the data eventually fades.
Spread it across all the potential "rows" just as a test? Or does reading not refresh at all?
rogloh's results suggest there is a monostable/timer in there that you can extend via a register from 4us up to 16us, which is going to be ok if you do not read the whole device, or if you keep the temperature more modest.
Maybe 16us is also ok, if you ensure you do read the whole device.
That timer seems to be more precise, maybe 20% or similar ?
Still unclear is if a roll-over read into the next row, is enough to count as a refresh for that next row ?
One possible way to manage 2 image planes could be to read half a row for each. The whole row is refreshed, for free, and the display plane is picked by which half is read.
Possibly, but I'm not sure of a use case where that would be an issue ?
The refresh decay times are much longer than any video line flyback time. Even longer than frame times.