HyperRam/Flash as VGA screen buffer (Now XGA, 720p & 1080p)

Comments

  • XIP would explain the focus on the wrapped burst mode.
    Prop Info and Apps: http://www.rayslogic.com/
  • Rayman Posts: 9,480
    edited 2019-03-26 - 23:39:41
    This is funny. I'm in the middle of trying 1080p. All I did was change the video driver and have a stable image...
    Did have to change P2 clock to 300 MHz to get a stable image.

    Thing is, at 300 MHz, might be able to pull off 8 bpp. It's close. Thought I'd be stuck at 4bpp for sure...

    I'm going to need a bigger bird :)

    Darn it... Can't get both 8bpp and steady image... And, colored birds don't look good in 4 bpp.
  • Have you made any adjustments to both the Hsync and the Vsync timing tables?

    What are the exact numbers you are using to control the following parameters:

    Intended screen refresh rate (Hz):
    Pixel clock (MHz):
    Number of displayed pixels at each horizontal scan line:
    Horizontal front porch, sync and back porch; in number of pixel clocks:
    Number of displayed lines at each frame:
    Vertical front porch, sync and back porch; in number of horizontal lines:
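The parameters in the list above fix the refresh rate: it is the pixel clock divided by the total pixels per line times the total lines per frame. A minimal sketch follows, assuming the standard CEA-861 1080p60 porch and sync values (the thread does not give the driver's actual numbers, so `CEA_1080P` is only a reference point). It shows why a 150 MHz pixel clock lands slightly above 60 Hz.

```python
def refresh_hz(pixel_clock_hz, h_active, h_fp, h_sync, h_bp,
               v_active, v_fp, v_sync, v_bp):
    """Refresh rate = pixel clock / (total pixels per line * total lines per frame)."""
    h_total = h_active + h_fp + h_sync + h_bp
    v_total = v_active + v_fp + v_sync + v_bp
    return pixel_clock_hz / (h_total * v_total)

# Assumed reference timing (CEA-861 1080p60), not the values used in the thread.
CEA_1080P = dict(h_active=1920, h_fp=88, h_sync=44, h_bp=148,
                 v_active=1080, v_fp=4, v_sync=5, v_bp=36)

if __name__ == "__main__":
    for clk in (148.5e6, 150e6):
        print(f"{clk / 1e6:6.1f} MHz -> {refresh_hz(clk, **CEA_1080P):.3f} Hz")
```

With these assumed totals (2200 x 1125), 148.5 MHz gives exactly 60 Hz and 150 MHz gives about 60.6 Hz, which some monitors may reject.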
  • Nice work! Try a 16-color dithered version.

    For a nice trick, you can leave the bird at the lower resolution, say 720p or a little less, and then let the graphics program dither at 1080p.

    Scale the bird up to fit. Done that way, the dither has more to work with.
    Do not taunt Happy Fun Ball! @opengeekorg ---> Be Excellent To One Another SKYPE = acuity_doug
    Parallax colors simplified: https://forums.parallax.com/discussion/123709/commented-graphics-demo-spin
  • Too late today, try again tomorrow...
  • evanh Posts: 7,138
    edited 2019-03-27 - 03:53:45
    I've just been looking at the track spacing guidelines, from Henrique's links, which I've never paid much attention to in the past. They've spec'd, 1.5H and 2H, in terms of insulation thickness from ground layer. I guess this will eliminate both voltage overshoot and crosstalk effects.

    If wanting to keep the tracks packed close then the PCB thickness needs to be reduced, ie: 4-layer board. But this'll also increase overall impedance and therefore needs to accommodate more lag on the clocking.

    The software will need to be adaptable for different boards. I might have a go at doing an accessory board too.
    "... peers into the actual workings of a quantum jump for the first time. The results
    reveal a surprising finding that contradicts Danish physicist Niels Bohr's established view
    —the jumps are neither abrupt nor as random as previously thought."
  • Yanomani Posts: 820
    edited 2019-03-27 - 04:17:37
    Hi evanh

    Uneven insulating/metal layer stacks are available, where the layers at the center can be thicker than the top and bottom ones and, on a six-layer board, also thicker than the ones just under the surface. That makes it feasible to tailor the environment surrounding the most sensitive signal lanes and fine-tune them, even when using an auto-router.

    Do a simple Google for Würth Elektronik signal integrity (quotation marks dismissed)

    You'll be amazed by what your eyes will see.

    Food for thought.

    Henrique
  • Sticking with a simple double-sided PCB, thickness is about 2.5mm, so tracks should be about 5mm apart! That's not even close to practical. :(

    I guess then, the shorter the better.
  • Oh, wow, the balls under the HyperRAM chip are dense. The old 100nF leaded caps I've always used as my small components are enormous beside that. Rayman, I can see why you were pleased to have routed between them.
  • Hi evanh

    IMHO, there is something strange in the numbers you found...

    Weren't you using dimensions in mm in a calculation which expects units to be in mils?

    I know that those web calculators are not the best way to find exact values, but anyway, I found the mismatch between results way bigger than acceptable.

    Henrique

    Microstrip_Impedance_Calculator.png

    Edge_Coupled_Microstrip_Impedance_Calculator_01.png
  • S = 2 x H1, if I've read the guidelines correctly.

  • Hi evanh

    Eh eh eh, thanks for pointing me to the source of all messes... that value is clearly stated at the previously linked Cypress' document: "HyperFlash™ and HyperRAM™ Layout Guide", paragraph 4.2.3 Signal Spacing Constraints from Other Signals.

    Yes, you are right: one of the possible interpretations of the text was the main culprit of our apparent dissension here.

    And again, you are right from the standpoint of the information as presented in that section, BUT the text doesn't really reflect the information it was intended to convey to its audience.

    When you read the "signal spacing constraints from other signals" text, what the original spec writer intended to convey was that you need to ensure that separation in order to keep interference from "those other" signals from coupling into HyperRam's sensitive signal lanes.

    That text is the fruit of multiple successive translations between several languages. The intention of the original writer (or translator), supposing it was a person with enough knowledge of the subject writing in his mother tongue, was to ensure one would not pass, say, a track carrying a high-power RF signal, or the return current from a heavy-amperage motor driver, too close to those sensitive lanes.

    It's like a gossip: each one that passes it ahead, enhances the good, or the bad aspects of the ever unknown original content, at his own personal preference.

    If the text really intended to mean what it apparently says, the first victim would be their own "Package Breakout Recommendations", on page three. It's completely impossible to follow that rule strictly and connect a HyperSomething to any other integrated circuit, to the extent of my humble knowledge.

    Sure, there are other rules presented at that document, that are far from easy to follow, e. g., the ones at paragraph 4.1: Microstrip versus Stripline versus Co-Planar Signal Routing. In fact, without using a four-layer stackup, even using a finished pcb thickness of 1.6 mm, none of the calculators I was able to find could give me that 50 Ohm value for trace impedance; the closest I was able to reach was 75 Ohm for single-ended, 150 Ohm for differential impedance (nominal values), using the edge coupled microstrip model with practical values for the needed parameters.

    If we strive to insert a VSS (GND) lane between any two traces (but not under the chips; it would be a real mess trying to do it), up to the vicinity of both the HR and P2 chips or any intervening connectors, and also ensure that the limiting capacitance values for each lane are respected, we can consider ourselves very lucky.

    The maximum operating frequency will show whether we attained a victory or were completely busted, despite all our efforts.

    Hope it helps

    Henrique
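The difficulty of reaching 50 Ohm on a two-layer board, described in the post above, can be illustrated with the common IPC-2141 closed-form microstrip approximation. This is only a sketch: the dielectric constant (4.5), copper thickness (35 um) and the 0.2 mm dielectric height used for a four-layer board are assumed values, not numbers from the thread, and the formula is only a rough estimate, usually quoted for width-to-height ratios of roughly 0.1 to 2.

```python
import math

def microstrip_z0(w_mm, h_mm, t_mm=0.035, er=4.5):
    """IPC-2141 approximation for surface microstrip impedance (Ohm).
    w = trace width, h = height above the ground plane, t = copper thickness."""
    return 87.0 / math.sqrt(er + 1.41) * math.log(5.98 * h_mm / (0.8 * w_mm + t_mm))

def width_for_z0(z0, h_mm, t_mm=0.035, er=4.5):
    """Invert the same approximation: trace width (mm) for a target impedance."""
    x = math.exp(z0 * math.sqrt(er + 1.41) / 87.0)
    return (5.98 * h_mm / x - t_mm) / 0.8

if __name__ == "__main__":
    # Assumed stackups: 1.6 mm two-layer board vs. 0.2 mm prepreg on a four-layer board.
    for h in (1.6, 0.2):
        print(f"h = {h} mm: 50 Ohm needs a trace about {width_for_z0(50, h):.2f} mm wide")
    print(f"0.2 mm trace over 1.6 mm: about {microstrip_z0(0.2, 1.6):.0f} Ohm")
```

Under these assumptions a 50 Ohm trace on a 1.6 mm two-layer board comes out close to 3 mm wide, far too wide to break out of a BGA, while a thin dielectric on a four-layer board brings it down to a routable width.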



  • Chemandy Electronics also has a bunch of assorted online calculators that could help estimating the values we are looking for, before deciding what kind of PCB stackup and the better thickness we need to use, to attain our purposes.

    chemandy.com/default.htm
  • Rayman Posts: 9,480
    edited 2019-03-28 - 00:10:19
    Got 1080p @ 8bpp going.

    But, can't do all 1920 in a row, so doing 640+640+320 read for each line.
    My monitor doesn't like the timing. Probably because it's 150 MHz instead of 148.5, but the latter isn't stable.
    This is stable, but the first column is shifted over about 6 pixels or so.
    I had to play with H. porch numbers to get this, otherwise horizontal position was way off...

    Also, I had to insert a NOP before the read loop. Not sure if I have a residual test in my code, or if maybe at 300 MHz the timing (as JMG described earlier) has changed...
  • RJSM wrote: »
    Here's some P2 HyperRAM code I've been using successfully for some time now - maybe this will be of some help. I've been playing around with HR for a couple of years now but really haven't looked at it closely in quite a while. The attached word doc gives some background info

    Richard

    Hey Richard,
    I note you are already using streamer copying. But are restricting yourself to same two sysclocks per transfer cycle as Rayman has. Have you tried configuring for 1:1 ratio instead? You wouldn't need to run so hot to get the speed.

  • jmg Posts: 13,554
    evanh wrote: »
    ... But are restricting yourself to same two sysclocks per transfer cycle as Rayman has. Have you tried configuring for 1:1 ratio instead? You wouldn't need to run so hot to get the speed.
    I was wondering the same.
    Streamer allows SysCLK/1 rates, so the 1080p @ 8bpp might be able to run at 75MHz HR_CLK (DDR, as now) but at lower 150MHz SysCLK, which is much more viable/practical commercially than 300MHz.
    That assumes there is still enough processor time to complete, at the lower SysCLK.

    IIRC the HDMI tests ran the streamer at /1 for 250MHz.
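To see how tight 1080p @ 8bpp is at 75 MHz HR_CLK, here is a rough per-scanline budget. It assumes an 8-bit DDR bus (two bytes per HR_CLK cycle) and a 2200-pixel total line length (the CEA-861 figure, which may differ from the driver's adjusted porches). The function name and parameters are illustrative only.

```python
def line_budget(pixel_clock_hz, h_total, h_active, bpp, hr_clk_hz):
    """Return (bytes needed per scanline, bytes the HyperRAM bus can move per scanline).
    Assumes an 8-bit DDR bus: 2 bytes transferred per HR_CLK cycle."""
    line_time_s = h_total / pixel_clock_hz
    peak_bytes_per_s = 2 * hr_clk_hz
    capacity = peak_bytes_per_s * line_time_s
    needed = h_active * bpp / 8
    return needed, capacity

if __name__ == "__main__":
    for bpp in (4, 8):
        needed, capacity = line_budget(150e6, 2200, 1920, bpp, 75e6)
        print(f"{bpp} bpp: need {needed:.0f} bytes, bus moves {capacity:.0f} bytes per line")
```

Under these assumptions 8 bpp needs 1920 of roughly 2200 available byte slots per line, leaving only about 280 byte-times for command/address phases, initial latency and burst restarts. The budget depends on HR_CLK, not on SysCLK, which is why a 1:1 streamer ratio at 150 MHz SysCLK would move the same data.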

  • Yanomani Posts: 820
    edited 2019-03-28 - 04:01:52
    HDMI can leverage from that kind of time relationship, because everything that is related to it happens internally to P2.

    In the case of HyperBus-enabled devices, there is a tight relationship between each HYP_CK rising/falling edge and the moment it occurs: it needs to hit the HYP_DATA(A) or HYP_DATA(B) valid window, as near to its center as feasible.

    The HYP_CK edges are allowed to occur up to 270º (3/4 of the corresponding semi-period) after the DATA-valid window has started (eventually a little after, in fact), but they are not allowed to occur at the same time.

    Since P2 has no provisions to change any timing-related signal in sync with the falling edge of SysClk, there is no way to get the right HYP_DATA-to-HYP_CLK relationship, for the HyperSomething to work at HYP_CK = SysClk/2.

    The following image, drawn for P2 SysClk @ 160 MHz, depicts the writing of the first 3 words (CA cycle; fixed two-times latency count, thus RWDS starts High, automagically, 12-13 ns after CSn is brought Low by the controller (P2)).

    P2_Data_to_HypR_WR_00.png

    I'll try to remake it, now targeting SysClk @ 250 MHz and also 300 MHz, to help others to understand the involved timing relationships.

    Hope it helps.

    Henrique

    P.S. In the above figure, I assumed that HYP_D(7:0) will appear at the P2 bare-metal pins two SysClks + ~1 ns after the corresponding Streamer NCO_ROLL transition has occurred inside the P2 Streamer control logic.

    If someone (Chip???) could confirm those numbers are right, I'll be grateful.

    P.S.-2 Another assumption I've made was that HYP_CK will appear at the P2 bare-metal pins ~1 ns after the corresponding SysClk positive transition, but the Smart Pin NCO internal condition that triggered it could, in fact, be active a full SysClk period before that moment.
  • Yanomani wrote: »
    Since P2 has no provisions to change any timing-related signal in sync with the falling edge of SysClk, there is no way to get the right HYP_DATA-to-HYP_CLK relationship, for the HyperSomething to work at HYP_CK = SysClk/2.
    There is a high likelihood of certain clock rates being fine while others will be broken. I make this claim based on experience with measuring the echo speed of OUT to IN.

  • jmg Posts: 13,554
    Yanomani wrote: »
    ....

    P.S. In the above figure, I assumed that HYP_D(7:0) will appear at the P2 bare-metal pins two SysClks + ~1 ns after the corresponding Streamer NCO_ROLL transition has occurred inside the P2 Streamer control logic.

    If someone (Chip???) could confirm those numbers are right, I'll be grateful.

    P.S.-2 Another assumption I've made was that HYP_CK will appear at the P2 bare-metal pins ~1 ns after the corresponding SysClk positive transition, but the Smart Pin NCO internal condition that triggered it could, in fact, be active a full SysClk period before that moment.

    Yes, right now, the exact delays, and how they vary with PVT, are undefined. We have an 'it works' reference point.

    The WAITSE1 (_/=) for RWDS will release with some TBD phase delay on the following opcodes:
                  setse1    #$80+Pin_RWDS   ' arm event SE1 on the RWDS pin edge
                  waitse1                   ' edge sync - HR ready
                  nop                       ' added for 300MHz
                  rep       #1,pa           ' repeat the next instruction PA times
                  wfbyte    inb             ' read in bytes, 75MHz HR_CLK, 150MB/s data block

    HR delay is spec'd quite broad, but will likely not be near the limit values.

    CK transition to DQ Valid tCKD 1~5.5 ns
    That window alone, is a >= whole SysCLK at >= 222MHz
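The 222 MHz figure follows directly from the tCKD spread quoted above: the uncertainty window is 5.5 - 1 = 4.5 ns, and a SysCLK period shrinks to 4.5 ns at about 222 MHz. A small sketch of that arithmetic (the constant and function names are illustrative):

```python
T_CKD_MIN_NS = 1.0   # CK transition to DQ valid, minimum (from the quoted spec line)
T_CKD_MAX_NS = 5.5   # CK transition to DQ valid, maximum

def window_in_sysclks(sysclk_mhz):
    """How many SysCLK periods the tCKD min-to-max spread covers."""
    window_ns = T_CKD_MAX_NS - T_CKD_MIN_NS
    period_ns = 1000.0 / sysclk_mhz
    return window_ns / period_ns

# SysCLK frequency at which the spread equals exactly one clock period.
THRESHOLD_MHZ = 1000.0 / (T_CKD_MAX_NS - T_CKD_MIN_NS)

if __name__ == "__main__":
    print(f"spread equals one SysCLK at {THRESHOLD_MHZ:.1f} MHz")
    for f in (160, 250, 300):
        print(f"{f} MHz: spread = {window_in_sysclks(f):.2f} SysCLK periods")
```

At 300 MHz the datasheet spread alone covers about 1.35 SysCLK periods, so a fixed sample point cannot be guaranteed across the full spec range, even if real parts sit well inside it.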

    From a CLK out pin, to sampled-in pin, P2 itself will have some similar Min-Max time window, which could include 1 SysCLK + tMin~tMax form

    That may also vary with user choice of Pin mode, and will vary from pin to pin.
    Pin A Logic
    Pin A Schmitt
    Pin A Live, or Clocked (expect less spread in Clocked, but likely more absolute delay)

    Thinking of ways to test the sample points, I see you can buy delay lines, nice looking parts, 3.2ns ~ 14.8ns in ~10ps steps.

    SY89295UTG Microchip IC DELAY LN 1024TAP PROG 32TQFP stk 614 $10.28/1 dT ~ 10ps (nominal) 3.2ns ~ 14.8ns 2.375V ~ 3.6V -40°C ~ 85°C Surface Mount 32-TQFP
    and that would need something like this to do LVPECL to LVCMOS
    1 x PI6C49X0201WIE IC TRNSLTR UNIDIRECTIONAL 8SOIC $2.83/1 Input Signal : HCSL, LVDS, LVHSTL, LVPECL, SSTL Output Signal LVCMOS, LVTTL
    (LVCMOS to DiffIn can be resistor divider)

    or, maybe FPGAs allow run time access to their pin-delay elements ?
  • Yanomani Posts: 820
    edited 2019-03-28 - 05:15:27
    evanh, it's fantastic! :smile:

    I never saw something like this documented before.

    If you can share the numbers (SysClk frequency and Out-versus-In relationship, in number of clock cycles), perhaps I can craft a more accurate timing diagram, showing those relationships in a way anyone can use to advantage.
  • Did you do it by jumpering Out pins to In pins externally, or were you using a HyperRAM, timing from the Hyp_Ck change to Hyp_Data or Hyp_RWDS being available at the P2 bare-metal pins?
  • jmg Posts: 13,554
    evanh wrote: »
    There is a high likelihood of certain clock rates being fine while others will be broken. I make this claim based on experience with measuring the echo speed of OUT to IN.
    Yes, good point.
    I'm wondering if there is enough resource within P2, to do some of the testing ?
    eg you could sweep the SysCLK over a wide range, checking all 8 bits of a known RAM alternating bit test pattern, and the aperture where the pin-change matches the sample points would give unstable results.
    (Write could be done at some low frequency, to give a known value & load of the 48 bit R/W/Reg/Addr would need to be done at a safe, lower MHz )
    Not all 8 bits would fail at the exact same limits, so some spread across pins could be derived, and that MHz-band would also vary with heating, so some tempco could be extracted.

    At 200MHz a 2MHz step in VCO, is ~ 49.5ps of time movement.
    With a WAIT for RWDS included, it might be that provides enough of a 'snap' to broaden tolerance, and avoid 'bad' alignments ?
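The ~49.5 ps figure is the change in SysCLK period when the frequency moves from 200 MHz to 202 MHz. A short sketch of that calculation, which also shows how the time resolution of such a sweep gets finer at higher frequencies (the function name is illustrative):

```python
def period_shift_ps(f_mhz, step_mhz):
    """Change in clock period (picoseconds) when the frequency rises by step_mhz."""
    return 1e6 / f_mhz - 1e6 / (f_mhz + step_mhz)

if __name__ == "__main__":
    for f in (100, 200, 300):
        print(f"{f} MHz + 2 MHz step: period moves {period_shift_ps(f, 2):.1f} ps")
```

Because a sample point sits one or more SysCLK periods after the launching edge, the actual movement of the sample point per step is this value multiplied by the number of periods of lag.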

  • evanh Posts: 7,138
    edited 2019-03-28 - 05:59:58
    Just plain open Prop2 pins on P2ES-Eval board. Set OUT high and scan IN for low->high. Using different sysclock rates yields different number of clocks for low->high echo return.

    Eg: (Add 2 to each "w" line for real lag in clocks)
    Total smartpins = 64   1111111111111111111111111111111111111111111111111111111111111111
    System clock frequency set to 80 MHz
    
      OUT fed back
    w   60 . . . .50 . . . .40 . . . .30 . . . .20 . . . .10 . . . . 0
    0   00000000000000000000000000000000000000000000000000000000000000
    1   00000000000000000000000000000000000000000000000000000000000000
    2   11111111111111111111111111111111111111111111111111111111111111
    3   11111111111111111111111111111111111111111111111111111111111111
    4   11111111111111111111111111111111111111111111111111111111111111
    5   11111111111111111111111111111111111111111111111111111111111111
    6   11111111111111111111111111111111111111111111111111111111111111
    7   11111111111111111111111111111111111111111111111111111111111111
    
      Unregistered
    w   60 . . . .50 . . . .40 . . . .30 . . . .20 . . . .10 . . . . 0
    0   00000000000000000000000000000000000000000000000000000000000000
    1   00000000000000000000000000000000000000000000000000000000000000
    2   00000000000000000000000000000000000000000000000000000000000000
    3   11111111111111111111111111111111111111111111111111111111111111
    4   11111111111111111111111111111111111111111111111111111111111111
    5   11111111111111111111111111111111111111111111111111111111111111
    6   11111111111111111111111111111111111111111111111111111111111111
    7   11111111111111111111111111111111111111111111111111111111111111
    
      Registered
    w   60 . . . .50 . . . .40 . . . .30 . . . .20 . . . .10 . . . . 0
    0   00000000000000000000000000000000000000000000000000000000000000
    1   00000000000000000000000000000000000000000000000000000000000000
    2   00000000000000000000000000000000000000000000000000000000000000
    3   00000000000000000000000000000000000000000000000000000000000000
    4   00000000000000000000000000000000000000000000000000000000000000
    5   11111111111111111111111111111111111111111111111111111111111111
    6   11111111111111111111111111111111111111111111111111111111111111
    7   11111111111111111111111111111111111111111111111111111111111111
    
    Total smartpins = 64   1111111111111111111111111111111111111111111111111111111111111111
    System clock frequency set to 160 MHz
    
      OUT fed back
    w   60 . . . .50 . . . .40 . . . .30 . . . .20 . . . .10 . . . . 0
    0   00000000000000000000000000000000000000000000000000000000000000
    1   00000000000000000000000000000000000000000000000000000000000000
    2   11111111111111111111111111111111111111111111111111111111111111
    3   11111111111111111111111111111111111111111111111111111111111111
    4   11111111111111111111111111111111111111111111111111111111111111
    5   11111111111111111111111111111111111111111111111111111111111111
    6   11111111111111111111111111111111111111111111111111111111111111
    7   11111111111111111111111111111111111111111111111111111111111111
    
      Unregistered
    w   60 . . . .50 . . . .40 . . . .30 . . . .20 . . . .10 . . . . 0
    0   00000000000000000000000000000000000000000000000000000000000000
    1   00000000000000000000000000000000000000000000000000000000000000
    2   00000000000000000000000000000000000000000000000000000000000000
    3   00000000000000000000000000000000000000000000000000000000000000
    4   11111111111111111111111111111111111111111111111111111111111111
    5   11111111111111111111111111111111111111111111111111111111111111
    6   11111111111111111111111111111111111111111111111111111111111111
    7   11111111111111111111111111111111111111111111111111111111111111
    
      Registered
    w   60 . . . .50 . . . .40 . . . .30 . . . .20 . . . .10 . . . . 0
    0   00000000000000000000000000000000000000000000000000000000000000
    1   00000000000000000000000000000000000000000000000000000000000000
    2   00000000000000000000000000000000000000000000000000000000000000
    3   00000000000000000000000000000000000000000000000000000000000000
    4   00000000000000000000000000000000000000000000000000000000000000
    5   01111111111111111111111111111111111111111111111111111111111111
    6   11111111111111111111111111111111111111111111111111111111111111
    7   11111111111111111111111111111111111111111111111111111111111111
    
    Total smartpins = 64   1111111111111111111111111111111111111111111111111111111111111111
    System clock frequency set to 320 MHz
    
      OUT fed back
    w   60 . . . .50 . . . .40 . . . .30 . . . .20 . . . .10 . . . . 0
    0   00000000000000000000000000000000000000000000000000000000000000
    1   00000000000000000000000000000000000000000000000000000000000000
    2   11111111111111111111111111111111111111111111111111111111111111
    3   11111111111111111111111111111111111111111111111111111111111111
    4   11111111111111111111111111111111111111111111111111111111111111
    5   11111111111111111111111111111111111111111111111111111111111111
    6   11111111111111111111111111111111111111111111111111111111111111
    7   11111111111111111111111111111111111111111111111111111111111111
    
      Unregistered
    w   60 . . . .50 . . . .40 . . . .30 . . . .20 . . . .10 . . . . 0
    0   00000000000000000000000000000000000000000000000000000000000000
    1   00000000000000000000000000000000000000000000000000000000000000
    2   00000000000000000000000000000000000000000000000000000000000000
    3   00000000000000000000000000000000000000000000000000000000000000
    4   00000000000000000000000000000000000000000000000000000000000000
    5   01111111111111111111111111111111111111111111111111111111111111
    6   11111111111111111111111111111111111111111111111111111111111111
    7   11111111111111111111111111111111111111111111111111111111111111
    
      Registered
    w   60 . . . .50 . . . .40 . . . .30 . . . .20 . . . .10 . . . . 0
    0   00000000000000000000000000000000000000000000000000000000000000
    1   00000000000000000000000000000000000000000000000000000000000000
    2   00000000000000000000000000000000000000000000000000000000000000
    3   00000000000000000000000000000000000000000000000000000000000000
    4   00000000000000000000000000000000000000000000000000000000000000
    5   00000000000000000000000000000000000000000000000000000000000000
    6   01111111111111111111111111111111111111111111111111111111111111
    7   11111111111111111111111111111111111111111111111111111111111111
    

    EDIT: Added "Registered" graphs
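The tables above can be reduced to one number per pin: the first "w" row in which that pin reads 1, plus the 2 clocks mentioned at the top of the post. The sketch below does that for the 160 MHz "Registered" table. It assumes each row has 62 columns running from pin 61 on the left to pin 0 on the right; that layout is inferred from the column header and is not stated explicitly in the post.

```python
def echo_lag_clocks(rows, offset=2):
    """rows[w] is the bit string sampled at wait count w, leftmost char = highest pin.
    Returns {pin: lag in clocks} using the first row where the pin reads '1'."""
    width = len(rows[0])
    lags = {}
    for col in range(width):
        pin = width - 1 - col
        for w, row in enumerate(rows):
            if row[col] == "1":
                lags[pin] = w + offset   # "+2" correction taken from the post
                break
    return lags

# 160 MHz "Registered" table from the post, rebuilt programmatically:
# rows 0-4 all zeros, row 5 has only the leftmost pin still low, rows 6-7 all ones.
REGISTERED_160 = ["0" * 62] * 5 + ["0" + "1" * 61] + ["1" * 62] * 2

if __name__ == "__main__":
    lags = echo_lag_clocks(REGISTERED_160)
    slow = [pin for pin, lag in lags.items() if lag > min(lags.values())]
    print("typical lag:", min(lags.values()), "clocks; slower pins:", slow)
```

Run against the other tables, the same reduction makes it easy to compare how the lag steps with SysCLK frequency and which pins cross a clock boundary first.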
  • I'm musing here, before going to bed (2:47 AM, must wake-up at 06:00)...

    In order to get the most realistic results, we simply need to use two wires and connect, say, Hyper_CK and Hyper_RWDS to two available bits in any group of four sharing the same Vio, preferably D[1:0] of either of the two 32-bit pin groups.

    Then start a free Cog and fire its Streamer in a WRFBYTE, 2-bit submode, synchronizing the start of this action with the video Hsync pulse.

    After some full horizontal lines have been captured, we can show the resulting observations for analysis...

    Only a fading-down-neuron-mass thought, before my snooze-mode begins....

    :smile:
  • evanh Posts: 7,138
    edited 2019-03-28 - 06:02:23
    Interesting, I hadn't paid enough attention to "Registered" results (Added above). They typically add two clocks to any lag measurement ... but I just realised that isn't always true. Sometimes it only looks to be one clock.

    So, clocked I/O can indeed be used for adjusting the timing. I guess that's where Chip was leading yesterday.
  • Note pin 61 is showing signs of being slower than the others. That pin will have a longer track attached than the others on the P2ES-Eval board.
  • Yanomani wrote: »
    ... going to bed (2:47 AM, must wake-up at 06:00)...
    Ouch. Three hours never works well.

  • I knew I could get it, even with how silly it looks.
    Added Bypass Caps
    10K Pullup on CSn
    10K Pulldown on CLK and RWDS
    Treat CLK as a transmission line (the gray and white wires).

    Screen photo is running HyperVGA_640_x_480_8bpp_1n.spin2
    After telling monitor to auto-adjust, the green border of TestPat2.bmp is fully visible.

    Higher rates probably won't work here, but never say never :wink:
  • Way to go Whicker! It's amazing what can work.
  • jmg Posts: 13,554
    evanh wrote: »
    Interesting, I hadn't paid enough attention to "Registered" results (Added above). They typically add two clocks to any lag measurement ... but I just realised that isn't always true. Sometimes it only looks to be one clock.

    So, clocked I/O can indeed be used for adjusting the timing. I guess that's where Chip was leading yesterday.

    Clocked IO should also get lower spread, which should be visible by sweeping the MHz more slowly, from the 4 at 160MHz to the 5 at 320MHz.
