evanh, I moved your rdfast up one line and it's all better.
Tear is gone and now perfectly aligned.
I have no idea why this works though... Must be something to do with the fifo…
Still using a pin to sync to Hsync. Need to figure out how to sync without that... Maybe I'll just use that smartpin trick described earlier...
What’s your takeaway on HR with 4.3/5.0" screens? Obviously, one thing is you can park a lot of bitmaps for fast transfer to the LCD (much faster than any other storage?). Do you know how fast you can read screen data and update the LCD? Is the screen update near-instantaneous, or is there some visible scrolling? Nice job getting that up and running, btw!
How would you manage the P2 memory to allow for a background image with various buttons and button effects on touch etc.
Here's the latest version with 640 byte reads and better timing after moving rdfast into hsync.
CSn is low for 6.2 us (yellow trace on scope) and starts reading at beginning of hsync (green trace on scope).
This may violate the maximum CSn low time spec, but I think it will work anyway.
The video buffer part of the HRam should be fine as it's continuously being read and it refreshes after being read. It's the rest of HRam that may or may not be OK. I bet it's OK though...
Yeah, the FIFO will be the reason for sure. It'll be because it is always prefetching something like 8 to 16 longwords ahead. RDFAST reloads the FIFO, so it wipes any prefetches.
I just realised my second double scanline buffer method is working not because of flip timing but because it operates ahead of the prefetching.
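Here is a toy Python model of the effect evanh describes. A reader that prefetches a few words ahead returns stale data if the buffer is rewritten after the prefetch. It keeps doing so until it is restarted, which is the RDFAST analogue. The class, its method names and the depth are simplified assumptions for illustration, not the P2's real FIFO logic.

```python
class ToyPrefetchFifo:
    """Toy model only: a reader that copies up to `depth` words ahead of use.

    Not the P2's actual FIFO; it just shows why data written to the buffer
    after it was prefetched is missed until the reader is restarted.
    """

    def __init__(self, hub, depth=16):
        self.hub = hub      # shared list standing in for hub RAM
        self.depth = depth  # how many words the model prefetches
        self.rdfast(0)

    def rdfast(self, addr):
        """Restart at addr, discarding anything already prefetched."""
        self.addr = addr
        self.queue = []
        self._refill()

    def _refill(self):
        while len(self.queue) < self.depth and self.addr < len(self.hub):
            self.queue.append(self.hub[self.addr])
            self.addr += 1

    def rflong(self):
        """Pop the oldest prefetched word, then top the queue back up."""
        value = self.queue.pop(0)
        self._refill()
        return value


hub = [0] * 64              # old scanline data
fifo = ToyPrefetchFifo(hub, depth=8)
hub[0:8] = [1] * 8          # new scanline written after the prefetch happened
print(fifo.rflong())        # stale word from the prefetch
fifo.rdfast(0)
print(fifo.rflong())        # restart picks up the new data
```

In this model, moving the restart earlier, so that it happens after the new data lands, is what makes the reader see the fresh line.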
More bird fun at 16bpp VGA.
I think there's bandwidth for higher resolution...
See scope trace with CSn (yellow) and HSync (green). We're loading about 1/4 of the pixels during Hsync and are fully loaded about 1/3 of the way into the visible line.
XGA at 16bpp looks to be very tight... Might work at 250 MHz though.
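To put numbers on "very tight", here is the average fetch rate each mode needs at 16bpp. The sketch assumes the standard VESA 640x480@60 timing (25.175 MHz, 800 total pixels per line) and 1024x768@60 timing (65 MHz, 1344 total). These may not be exactly what the demo uses. It also ignores HyperRAM command and latency overhead. `line_fetch_rate` is just an illustrative helper.

```python
# Standard VESA timings (assumed; the thread doesn't state the exact modes used).
MODES = {
    # name: (active pixels per line, total pixels per line incl. blanking, pixel clock Hz)
    "VGA 640x480@60": (640, 800, 25.175e6),
    "XGA 1024x768@60": (1024, 1344, 65.0e6),
}


def line_fetch_rate(active_px, total_px, pixel_clock_hz, bytes_per_pixel):
    """Average bytes/s needed to fetch one scanline of pixel data per line period."""
    line_period_s = total_px / pixel_clock_hz
    return active_px * bytes_per_pixel / line_period_s


for name, (active, total, pclk) in MODES.items():
    rate = line_fetch_rate(active, total, pclk, bytes_per_pixel=2)  # 16bpp
    print(f"{name}: {active * 2} bytes/line, {rate / 1e6:.1f} MB/s")
```

That works out to roughly 40 MB/s for VGA versus 99 MB/s for XGA, about 2.5 times as much, before any burst overhead.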
Got a couple issues.
First, I broke the image up into 2 parts, but that's not enough to fit, need to break into 3 parts.
Second, it's not liking me doing this as two 1024 byte reads. The bad horizontal lines are back...
Think I have XGA @ 16bpp going now. Had to up P2 clock to 300 MHz.
Doing 4 reads of 512 bytes for each line.
See scope traces with CSn (yellow) and HSync (green).
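Here is a sketch of the per-line split Rayman describes: one 2048-byte XGA 16bpp line fetched as four 512-byte bursts, which keeps each CSn-low period short. The linear row layout (row n starting at n times the row stride) and the `plan_line_reads` helper are assumptions for illustration, not the actual driver code.

```python
import math


def plan_line_reads(line, bytes_per_line, chunk_bytes, base_addr=0, row_stride=None):
    """Split one scanline into fixed-size burst reads.

    Returns a list of (hyperram_address, byte_count) tuples. row_stride
    defaults to bytes_per_line; a different stride is a hypothetical layout
    choice, not something the thread specifies.
    """
    stride = bytes_per_line if row_stride is None else row_stride
    start = base_addr + line * stride
    n_reads = math.ceil(bytes_per_line / chunk_bytes)
    reads = []
    for k in range(n_reads):
        offset = k * chunk_bytes
        # The last chunk may be shorter if the line isn't a multiple of chunk_bytes.
        reads.append((start + offset, min(chunk_bytes, bytes_per_line - offset)))
    return reads


# XGA at 16bpp: 1024 pixels * 2 bytes = 2048 bytes per line, read as 4 x 512.
print(plan_line_reads(line=3, bytes_per_line=1024 * 2, chunk_bytes=512))
```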
Wow! The evolution is... (please excuse the pun)... visible!!!!
Been trying to follow every step of your progress, but it's like running a marathon for someone sick-footed like me!
Impressive.
300MHz is likely to be rather high for deployment.
Is there scope to nudge the HyperRAM CLK up a little, from 62.5MHz toward the MAX 100MHz, which might give more spare time and allow SysCLK to come down?
For refresh typicals, I make it:
512/62.5M = 8.192us => looks OK at room temp
1024/62.5M = 16.384us => fails tCSM? at room temp
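Here is jmg's arithmetic as code, so other burst sizes are easy to try. It follows his division, so it treats 62.5M as bytes per second. HyperBus transfers data on both clock edges. If 62.5 MHz is the HR clock frequency, the data phase would therefore be roughly half these figures. Either way, the command and initial latency come on top. `burst_time_us` is an illustrative name.

```python
def burst_time_us(n_bytes, bytes_per_second):
    """Data-phase time of a burst, ignoring command and initial-latency overhead."""
    return n_bytes / bytes_per_second * 1e6


# jmg's figures, treating 62.5M as the byte rate as his division does.
for n in (512, 1024):
    print(f"{n} bytes: {burst_time_us(n, 62.5e6):.3f} us")
```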
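With P2 Sysclk @ 300 MHz, the maximum HyperClk rate will be 75 MHz, because you need to provide a clock with at least a 45/55% duty cycle to comply with the HyperBus Specification (Cypress Document Number 001-99253 Rev. *H, revised February 6, 2019). Dividing 300 MHz by 2 gives 150 MHz, above the 100 MHz maximum; dividing by 3 gives 100 MHz, but with a 1:2 split of high and low sysclocks (33/67%); dividing by 4 gives a clean 75 MHz.
P.S. 45/55% refers to either clock half period: each of the high and low phases must be between 45% and 55% of the clock period. It's not meant to be strictly interpreted as exactly 45% high and 55% low, or the reverse.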
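Here is a quick way to check that figure for other sysclocks. The sketch assumes the HR clock pin can only change state on sysclock edges. One HR clock period is then a whole number N of sysclocks, split as evenly as possible into high and low halves. It uses the 100 MHz maximum and the 45% minimum half period from the posts above. `best_hr_clock` is a hypothetical helper.

```python
def best_hr_clock(sysclk_hz, max_hr_hz=100e6, min_duty=0.45):
    """Fastest HR clock from dividing sysclk into whole-cycle half periods.

    Assumes each period is N sysclk cycles with floor(N/2) high and
    ceil(N/2) low. Returns (N, freq_hz).
    """
    n = 2
    while True:
        freq = sysclk_hz / n
        duty = (n // 2) / n  # shorter half period as a fraction of the whole
        if freq <= max_hr_hz and duty >= min_duty:
            return n, freq
        n += 1


for sysclk in (200e6, 250e6, 260e6, 300e6):
    n, f = best_hr_clock(sysclk)
    print(f"sysclk {sysclk / 1e6:.0f} MHz -> /{n} = {f / 1e6:.2f} MHz")
```

Under those assumptions, 200 MHz divides cleanly by 2 to 100 MHz. 250 and 260 MHz fall back to /4 (62.5 and 65 MHz), because /3 can't meet the 45% limit.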
"300MHz is likely to be rather high for deployment."
Ok, I dialed it down to 260 MHz and it still works.
See scope traces with CSn (yellow) and HSync (green).
Cool.
I think there is some motivation to try to hit a spec point of 250MHz, at some Temperature and Vdd limits, in order to be able to use the HDMI features.
In practical terms, that may mean a 4-layer board and/or heatsinks.
FWIR, I think OnSemi simulates to a TJ of 150 °C.
Great to see more reference points.
Can you maybe add a table to the first post summarizing all the working examples, with SysCLK, HyCLK, bpp, fH, fV, ImageSize, etc.?
I think you use PFD = 10MHz in the PLL; does that appear free of any VCO jitter effects in your tests?
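In theory, the streamer could manage a block transfer at the sysclock rate, which is double the rate of the current method:
rep #1,##512   ' repeat the next instruction 512 times
wfbyte inb     ' write the low byte of INB into the hub FIFO, one byte per instruction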
So sysclock could be reduced to 200 MHz and have the smartpin generate the HR clock at 100 MHz.
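Here are rough numbers for evanh's suggestion. The sketch assumes the REP/WFBYTE loop moves one byte per 2 sysclocks, which is what "double" implies, and the streamer one byte per sysclock. Setup, command and latency time are left out. `read_time_us` is an illustrative name.

```python
def read_time_us(n_bytes, sysclk_hz, clocks_per_byte):
    """Data-phase time for n_bytes, ignoring command, latency and setup."""
    return n_bytes * clocks_per_byte / sysclk_hz * 1e6


for sysclk in (200e6, 260e6, 300e6):
    loop = read_time_us(512, sysclk, clocks_per_byte=2)      # REP + WFBYTE INB (assumed 2 clocks/byte)
    streamer = read_time_us(512, sysclk, clocks_per_byte=1)  # streamer at 1 byte per sysclock
    print(f"{sysclk / 1e6:.0f} MHz: WFBYTE loop {loop:.2f} us, streamer {streamer:.2f} us")
```

On these assumptions, a streamer at 200 MHz (2.56 us per 512 bytes) would beat the WFBYTE loop at 300 MHz (about 3.41 us).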
Interesting idea. I wonder if that would require more careful clock-edge delay design?
There is mention of a DDR Center Aligned Read Strobe (DCARS) feature in the HR docs, which seems to allow fine-tuning of the RWDS edges; however, that assumes those edges are the sampling masters.
The P2 does not quite work like that, as SysCLK is always used to sample the pins, so the only means of delay adjustment would be an external clock buffer.
The key question is whether a sampling eye can be made large enough to cope with device warming/cooling, as the P2's round-trip async pin delays can be quite long.
Using pin register mode may reduce the variation in that timing?
(CPLDs and FPGAs have pin-registers, intended to tighten sample/hold windows at highest speeds, and avoid routing delay effects)
There's obviously some spare time from HR burst read command to actually receiving the data. After the command is issued, first the smartpin based HR clock is set up using 7 instructions, then FIFO functions are set up for the RFBYTE using another 7 instructions, one of which is a WRFAST that may block, then finally the program settles down to waiting for the RWDS pin to go high.
As for clock-data sampling alignment, there is no indication Rayman was trying to adapt beyond waiting for RWDS.
Given our existing experience with pin speeds, and given the Prop is the clock master, the clock-data timing will likely be very dependent on both sysclock and board impedance characteristics. Those adapter boards Rayman is using will be a factor.
Maybe have a tuning program to empirically map out the good and bad sysclock rates. Each board layout will produce different results. Keeping the 8 bits of the HyperBus evenly matched in impedance will be important.
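The bookkeeping half of such a tuning program could look like the sketch below. It sweeps sysclock, records pass/fail, and reports the contiguous passing windows. The per-frequency test (set the PLL, write a pattern to HyperRAM, read it back) is hardware work. Here it is represented by a hypothetical `passes` callback, and the fake results below are made up purely to exercise the function.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def find_good_windows(freqs_mhz: Iterable[int],
                      passes: Callable[[int], bool]) -> List[Tuple[int, int]]:
    """Group consecutive passing sysclock frequencies into (low, high) windows."""
    windows: List[Tuple[int, int]] = []
    start: Optional[int] = None
    prev: Optional[int] = None
    for f in freqs_mhz:
        if passes(f):
            if start is None:
                start = f  # a new passing window begins
            prev = f
        elif start is not None:
            windows.append((start, prev))  # a failure closes the current window
            start = None
    if start is not None:
        windows.append((start, prev))
    return windows


# Stand-in for a real on-board test (hypothetical results, not measurements).
fake = lambda mhz: 240 <= mhz <= 270 or 290 <= mhz <= 300
print(find_good_windows(range(200, 301, 10), fake))
```

Picking a sysclock near the middle of the widest window would leave the most margin for temperature drift.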
One Tsu/Th check that may be simple to include would be a 1-bit streamer read of RWDS.
If sampling ever becomes marginal, it would no longer read back as a stable 0xAAAA (or 0x5555, depending on the start phase).
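Here is a host-side check for those RWDS captures, for example after dumping the streamer's hub buffer. It counts any word that isn't one of the two clean alternating patterns. The word width and the `clean_patterns`/`count_bad_rwds_words` names are assumptions. A 1-bit streamer capture may well be packed into 32-bit longs, so width is a parameter.

```python
def clean_patterns(width_bits=16):
    """The two alternating patterns for a given word width (0xAAAA/0x5555 for 16 bits)."""
    mask = (1 << width_bits) - 1
    a = int("10" * (width_bits // 2), 2)  # 0b1010...
    return {a, a ^ mask}                  # ...and its complement, 0b0101...


def count_bad_rwds_words(words, width_bits=16):
    """Count captured RWDS words that are not a clean alternating pattern."""
    good = clean_patterns(width_bits)
    mask = (1 << width_bits) - 1
    return sum(1 for w in words if (w & mask) not in good)


print(count_bad_rwds_words([0xAAAA, 0xAAAA, 0xAA2A]))  # one word has a flipped bit
```

A nonzero count at a given sysclock would be an early hint that the sampling point is drifting toward an edge.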
One is steady and looks in place.
The second is slowly scrolling downward...
HyperFlash though is a game changer. I have one on a board already. Going to play with that next.
There's a lot of code work still to do, but I think that P2 combined with HyperRam and HyperFlash is going to be awesome. Should be better than EVE2.
I would have gotten HyperFlash+HyperRam if Digikey stocked it... Mouser has it...
Imagine if we can directly transfer between HyperFlash and HyperRam…
Now, don't need to use a pin to do it...
Cleaned up the code a lot too.
Next up is to use smartpin to toggle HR clock instead of a helper cog.
They both scroll down. I didn't disable the HR code; maybe that's interfering for you.
Storing only 640 bytes on each row to make things easy.
Here's the same image in 16bpp.
Had to load one half at a time... (need to get uSD going).
Perseverance is paying off!
This is with jumper wires.
Looks like the burst read dies partway, and from there solid-color lines continue to the end of the row, where the data lines are just floating.
I think this chip also has a different default latency clock count, so that might explain the line down the center?