HyperRam/Flash as VGA screen buffer (Now XGA, 720p & 1080p)


Comments

  • These are both giving me a double image...
    One is steady and looks in place.
    The second is slowly scrolling downward...
    Prop Info and Apps: http://www.rayslogic.com/
  • Rayman Posts: 9,474
    edited 2019-03-23 - 12:56:41
    evanh, I moved your rdfast up one line and it's all better.
    Tear is gone and now perfectly aligned.
    I have no idea why this works though... Must be something to do with the fifo…

    Still using a pin to sync to Hsync. Need to figure out how to sync without that... Maybe I'll just use that smartpin trick described earlier...
    hsync       rdfast  ##1<<31,##@HyperBuffer
                xcont   m_bs,#0         'horizontal sync            
                xcont   m_sn,#1            
        _ret_   xcont   m_bv,#0
    
  • T Chap Posts: 3,990
    edited 2019-03-23 - 13:09:45
    What’s your takeaway on HR with 4.3/5.0 screens? Obviously one thing is you can park a lot of bitmaps for fast transfer to the LCD (much faster than any other storage?). Do you know how fast you can read a screen's worth of data and update the LCD? Is the screen update near instantaneous, or is there some scrolling visible? Nice job, btw, getting that up and running!

    How would you manage the P2 memory to allow for a background image with various buttons, button effects on touch, etc.?
  • Rayman Posts: 9,474
    edited 2019-03-23 - 15:58:54
    Might be overkill for 4.3" screens, maybe not... depends...

    HyperFlash though is a game changer. I have one on a board already. Going to play with that next.

    There's a lot of code work still to do, but I think that P2 combined with HyperRam and HyperFlash is going to be awesome. Should be better than EVE2.

    I would have gotten HyperFlash+HyperRam if Digikey stocked it... Mouser has it...
    Imagine if we can directly transfer between HyperFlash and HyperRam…
  • Here's the latest version with 640 byte reads and better timing after moving rdfast into hsync.
    CSn is low for 6.2 us (yellow trace on scope) and starts reading at beginning of hsync (green trace on scope).

    This may violate the maximum CSn low time spec, but I think it will work anyway.
    The video buffer part of the HRam should be fine as it's continuously being read and it refreshes after being read. It's the rest of HRam that may or may not be OK. I bet it's OK though...
  • Rayman Posts: 9,474
    edited 2019-03-23 - 20:17:39
    Ok, finally figured out how to make it sync to HSync using COGATN.
    Now, don't need to use a pin to do it...

    Cleaned up the code a lot too.

    Next up is to use smartpin to toggle HR clock instead of a helper cog.
  • Rayman Posts: 9,474
    edited 2019-03-23 - 21:37:29
    Clock now runs via SmartPin; helper cog no longer needed. Cleaned up a bit more with ver.1N.
  • Rayman wrote: »
    These are both giving me a double image...
    One is steady and looks in place.
    The second is slowly scrolling downward...

    They both scroll down. I didn't disable the HR code, maybe that's interfering for you.
    "... peers into the actual workings of a quantum jump for the first time. The results
    reveal a surprising finding that contradicts Danish physicist Niels Bohr's established view
    —the jumps are neither abrupt nor as random as previously thought."
  • Rayman wrote: »
    evanh, I moved your rdfast up one line and it's all better.
    Tear is gone and now perfectly aligned.
    I have no idea why this works though... Must be something to do with the fifo…
    Yeah, the FIFO will be the reason for sure. It'll be because it is always prefetching something like 8 to 16 longwords ahead. The RDFAST reloads the FIFO so wipes any prefetches.

    I just realised my second double scanline buffer method is working not because of flip timing but because it operates ahead of the prefetching.

  • Got it going at 16bpp. Using two 640 pixel reads for each line.
    Storing only 640 bytes on each row to make things easy.
  • No more dithering on the bird!
    Here's the same image in 16bpp.
    Had to load one half at a time... (need to get uSD going).
  • Rayman Posts: 9,474
    edited 2019-03-24 - 16:30:38
    More bird fun at 16bpp VGA.
    I think there's bandwidth for higher resolution...
    See scope trace with CSn (yellow) and HSync (green). We're loading about 1/4 of the pixels during HSync, and they're fully loaded about 1/3 of the way into the visible line.
  • Rayman Posts: 9,474
    edited 2019-03-26 - 18:02:10
    XGA at 16bpp looks to be very tight... Might work at 250 MHz though.

    Got a couple issues.
    First, I broke the image up into 2 parts, but that's not enough to fit, need to break into 3 parts.
    Second, it's not liking me doing this as two 1024 byte reads. The bad horizontal lines are back...
  • Looking gooood!
  • Thanks!

    Think I have XGA @ 16bpp going now. Had to up P2 clock to 300 MHz.
    Doing 4 reads of 512 bytes for each line.
    See scope traces with CSn (yellow) and HSync (green).
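
The read counts quoted in this thread follow from the size of one scanline in bytes. A minimal Python sketch of that arithmetic; the function names are hypothetical, and it assumes each scanline is fetched as equal-sized bursts with no padding:

```python
import math

def line_bytes(width_px, bpp):
    """Bytes needed for one visible scanline."""
    return width_px * bpp // 8

def bursts_per_line(width_px, bpp, burst_bytes):
    """Number of equal-sized burst reads needed to fetch one scanline."""
    return math.ceil(line_bytes(width_px, bpp) / burst_bytes)

# XGA at 16bpp: 2048 bytes per line, i.e. four 512-byte reads
print(line_bytes(1024, 16), bursts_per_line(1024, 16, 512))
# The same line as 1024-byte reads would need two of them
print(bursts_per_line(1024, 16, 1024))
# VGA at 16bpp: 1280 bytes per line
print(line_bytes(640, 16))
```

Shorter bursts keep CSn low for less time per read, at the cost of repeating the command and latency phase for every burst.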
  • Hi Rayman

    Wow! The evolution is... (please, excuse the pun)... visible!!!!
    Been trying to follow every step of your progress, but it feels like trying to run a marathon for someone as sick-footed as me! :smile:
  • jmg Posts: 13,535
    Rayman wrote: »
    Second, it's not liking me doing this as two 1024 byte reads. The bad horizontal lines are back...
    Think I have XGA @ 16bpp going now. Had to up P2 clock to 300 MHz.
    Doing 4 reads of 512 bytes for each line.
    See scope traces with CSn (yellow) and HSync (green).

    Impressive.
    300MHz is likely to be rather high for deployment.

    Is there scope to nudge the HyperRAM CLK up a little, from 62.5MHz toward the MAX 100MHz, which might give more spare time, and allow SysCLK to come down ?

    For refresh typicals, I make it
    512/62.5M = 8.192us => Looks ok at room temp
    1024/62.5M = 16.384us => fails tCSM? at room temp
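
The two figures above are burst length divided by clock rate. A small Python sketch of that arithmetic follows; the function name and the `bytes_per_clk` parameter are hypothetical, not from the thread. By default it treats 62.5 MHz as a byte rate, which reproduces the numbers quoted. HyperBus transfers data on both clock edges, so if 62.5 MHz is the bus clock, the data phase would take half as long. Command and latency overhead is not included.

```python
def burst_time_us(n_bytes, clk_hz, bytes_per_clk=1):
    """Duration of the data phase of one burst, in microseconds."""
    return n_bytes / (bytes_per_clk * clk_hz) * 1e6

# Figures as quoted (one byte per 62.5 MHz clock)
print(round(burst_time_us(512, 62.5e6), 3))   # 8.192
print(round(burst_time_us(1024, 62.5e6), 3))  # 16.384

# Same bursts if two bytes move per clock (DDR assumption)
print(round(burst_time_us(512, 62.5e6, bytes_per_clk=2), 3))   # 4.096
print(round(burst_time_us(1024, 62.5e6, bytes_per_clk=2), 3))  # 8.192
```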

  • Yanomani Posts: 810
    edited 2019-03-24 - 19:10:48
    Hi jmg

    With P2 Sysclk @ 300 MHz, the maximum HyperClk rate will hit 75 MHz, because you need to provide a clock with at least a 45/55% duty cycle in order to comply with the HyperBus Specification (Cypress Document Number 001-99253 Rev. *H, revised February 06, 2019).

    P.S. The 45/55% limit applies to either half period of the clock. It is not meant to be read strictly as 45% high and 55% low, or the reverse.
  • Looking good :smiley:

    Perseverance is paying off!
    My Prop boards: P8XBlade2, RamBlade, CpuBlade, TriBlade
    Prop OS (also see Sphinx, PropDos, PropCmd, Spinix)
    Website: www.clusos.com
    Prop Tools (Index) , Emulators (Index) , ZiCog (Z80)
  • "300MHz is likely to be rather high for deployment."

    Ok, I dialed it down to 260 MHz and still works.
    See scope traces with CSn (yellow) and HSync (green).
  • jmg Posts: 13,535
    Rayman wrote: »
    "300MHz is likely to be rather high for deployment."

    Ok, I dialed it down to 260 MHz and still works.
    See scope traces with CSn (yellow) and HSync (green).

    Cool.
    I think there is some motivation to try to hit a spec point of 250MHz, at some Temperature and Vdd limits, in order to be able to use the HDMI features.
    In practical terms, that may mean a 4-layer board and/or heatsinks.
    FWIR, I think OnSemi sims to a TJ of 150°C
  • Ok, back to 250 MHz with 720p @8bpp.

  • jmg Posts: 13,535
    Rayman wrote: »
    Ok, back to 250 MHz with 720p @8bpp.

    Great to see more reference points.
    Can you maybe add a table to the first post, to summarize all the working examples, with SysCLK, HyCLK, bpp, fH, fV, ImageSize etc ?

    I think you use PFD = 10MHz in the PLL, does that appear free of any VCO jitter effects in your tests ?

  • In theory, the streamer could manage a block transfer at sysclock rate. Which is double the current method of
                  rep   #1,##512
                  wfbyte    inb
    

    So sysclock could be reduced to 200 MHz and have the smartpin generate the HR clock at 100 MHz.
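
A rough Python sketch of the clock relationships implied here. It assumes (not confirmed in the thread) that the WFBYTE loop costs 2 sysclocks per byte and the streamer 1, and it relies on HyperBus being DDR, i.e. two bytes per HR clock. The function name is hypothetical.

```python
def hyperbus_rates(sysclk_hz, sysclks_per_byte):
    """Return (byte rate in bytes/s, HR clock in Hz) for a given transfer method."""
    byte_rate = sysclk_hz / sysclks_per_byte
    hr_clk = byte_rate / 2  # DDR: one byte on each clock edge
    return byte_rate, hr_clk

# WFBYTE loop (assumed 2 sysclocks per byte)
print(hyperbus_rates(250e6, 2))  # 125 MB/s, 62.5 MHz HR clock
print(hyperbus_rates(300e6, 2))  # 150 MB/s, 75 MHz HR clock

# Streamer at sysclock rate (assumed 1 sysclock per byte)
print(hyperbus_rates(200e6, 1))  # 200 MB/s, 100 MHz HR clock
```

Under these assumptions a 200 MHz sysclock with the streamer would move more data per second than a 300 MHz sysclock with the byte loop.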

  • jmg Posts: 13,535
    evanh wrote: »
    In theory, the streamer could manage a block transfer at sysclock rate. Which is double the current method of
                  rep   #1,##512
                  wfbyte    inb
    

    So sysclock could be reduced to 200 MHz and have the smartpin generate the HR clock at 100 MHz.

    Interesting idea, I wonder if that would require more careful clock edge delay design ?
    There is mention of a DDR Center Aligned Read Strobe (DCARS) feature in the HR docs, that seems to allow fine-tune of the RWDS edges, however that assumes those edges are sampling masters.
    The P2 does not quite work like that as SysCLK is always used to sample the pins, so the only means to delay adjust would be an external clk buffer
    Key question is, can a sampling eye be made large enough, to cope with device warming/cooling, as the P2 round trip async pin delays can be quite long.
    Using pin register mode may reduce the variation in that timing ?
    (CPLDs and FPGAs have pin-registers, intended to tighten sample/hold windows at highest speeds, and avoid routing delay effects)
  • There's obviously some spare time from HR burst read command to actually receiving the data. After the command is issued, first the smartpin based HR clock is set up using 7 instructions, then FIFO functions are set up for the WFBYTE using another 7 instructions, one of which is a WRFAST that may block, then finally the program settles down to waiting for the RWDS pin to go high.
  • As for clock-data sampling alignment, there is no indication Rayman was trying to adapt beyond waiting for RWDS.

    Given our existing experience with pin speeds, and given the Prop is the clock master, the clock-data timing will likely be very dependent on both sysclock and board impedance characteristics. Those adapter boards Rayman is using will be a factor.
  • Maybe have a tuning program to empirically map out the good and bad sysclock rates. Each board layout will produce different results. Keeping the 8-bits of the hyperbus evenly impeded will be important.
  • jmg Posts: 13,535
    evanh wrote: »
    As for clock-data sampling alignment, there is no indication Rayman was trying to adapt beyond waiting for RWDS.
    One Tsu/Th check that may be simple to include, would be a 1bit streamer read of RWDS.
    If the sampling ever becomes marginal, the result would change from a stable, constant 0xAAAA (or 0x5555, depending on the start phase).
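
A host-side Python sketch of that check, assuming the 1-bit RWDS samples have already been packed into an integer (the bit order and the 16-bit width are assumptions). The function name is hypothetical.

```python
def rwds_pattern_ok(sample, bits=16):
    """True if the low `bits` bits of `sample` strictly alternate (0xAAAA or 0x5555 style)."""
    mask = (1 << bits) - 1
    alt = sum(1 << i for i in range(1, bits, 2))  # 0xAAAA when bits == 16
    return (sample & mask) in (alt, alt >> 1)

print(rwds_pattern_ok(0xAAAA))  # True
print(rwds_pattern_ok(0x5555))  # True
print(rwds_pattern_ok(0xAAAB))  # False: one sample slipped
```

Any doubled or missing edge shows up as two equal neighbouring bits, so a single comparison per captured word is enough to flag a marginal sample point.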

  • This is my first attempt at running the HyperVGA_640_x_480_16bpp_1d.spin2 code.

    This is with jumper wires.
    Looks like the burst read dies at the point where the solid color lines start; they continue to the end of the line, where the data lines are just floating.

    I think this chip also has a different default latency clock, so that might explain the line down the center?