HyperRam/Flash as VGA screen buffer (now XGA, 720p & 1080p)

Comments

  • Yeah, working on doing a reliable autosweep right now.
    "We suspect that ALMA will allow us to observe this rare form of CO in many other discs.
    By doing that, we can more accurately measure their mass, and determine whether
    scientists have systematically been underestimating how much matter they contain."
  • Here we go.
  • Cooled it down with some ice packs and got another 30 MHz at the top.
  • jmg Posts: 13,942
    evanh wrote: »
    Cooled it down with some ice packs and got another 30 MHz at the top.

    Great scans :)

    These are the waterfalls I derived from that, and the timing apertures they imply (I dropped the upper 4 pins, as they have unusual PCB loading).

    1/357M - 1/379M = 0.162 ns timing aperture spread to cover 60 pins, register mode
    1/287M - 1/327M = 0.426 ns timing aperture spread to cover 60 pins, non-register mode
    TmNon - TmReg = 0.5399 ns shift in mean aperture sample point, relative register to non-register (+2 sysclks)

    As expected, registered pin mode has a much tighter spread: 162 ps vs 426 ps for non-registered mode.

    That registered spread is quite good, about one sixth of a nanosecond across all pins.

    There is also an absolute shift in the mean sampling point of just over half a nanosecond (0.5399 ns + 2 sysclks) between the Reg and NoReg choices, with no skirt overlap. That might be useful in some testing or design cases.
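jmg's aperture arithmetic can be reproduced in a few lines. This is a minimal sketch; the function names are hypothetical, and the "mean sample point" follows jmg's definition (the period at the midpoint frequency) rather than the mean of the two periods.

```python
def period_ns(mhz: float) -> float:
    """Period in nanoseconds of a clock given in MHz."""
    return 1e3 / mhz


def aperture_spread_ns(f_fast_mhz: float, f_slow_mhz: float) -> float:
    """Aperture width between the first pin and the last pin flipping."""
    return period_ns(f_slow_mhz) - period_ns(f_fast_mhz)


def mean_sample_point_ns(f_fast_mhz: float, f_slow_mhz: float) -> float:
    """jmg's TmReg/TmNon: the period at the midpoint of the two frequencies."""
    return period_ns((f_fast_mhz + f_slow_mhz) / 2)


reg_spread = aperture_spread_ns(379, 357)     # registered, ~0.1626 ns
nonreg_spread = aperture_spread_ns(327, 287)  # non-registered, ~0.4262 ns
shift = mean_sample_point_ns(327, 287) - mean_sample_point_ns(379, 357)  # ~0.5399 ns

print(f"registered spread     {reg_spread:.4f} ns")
print(f"non-registered spread {nonreg_spread:.4f} ns")
print(f"mean-point shift      {shift:.4f} ns")
```

The registered figure comes out at 0.1626 ns, which jmg quotes as 0.162 ns.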

    Derived from   pin_echo2.txt
     Registered
    380 MHz 6   00000000000000000000000000000000000000000000000000000000000100
    379 MHz 6   00000000000000000000000000000000000000000000000000000000000100
    378 MHz 6   00000000000000000000000000000000000000000000010000000000000100
    377 MHz 6   00000000001100000010000000000000000000000000010000000000000100
    376 MHz 6   00000000001110000010000000000000000000000000010000000000000100
    375 MHz 6   00000000001110000110000000000000000000000000010000000000000100
    374 MHz 6   00000000001110000010000000000000000000000000010010000000000100
    373 MHz 6   00000000001110001110000000010000000000000000110010000000000100
    372 MHz 6   00000000001111111110000000010000000000000000111010000110000100
    371 MHz 6   00000000001111111110010000111100000000000000111010010110001100
    370 MHz 6   00010100011111111110011001111100010011000000111111110110011101
    369 MHz 6   00010100011111111110011011111110110011000000111111111110011101
    368 MHz 6   00010100011111111110011111111110111011000010111111111110011101
    367 MHz 6   00011100011111111110111111111111111011000011111111111110011101
    366 MHz 6   00011100011111111110111111111111111011000011111111111110011111
    365 MHz 6   00011100011111111111111111111111111111000011111111111110011111
    364 MHz 6   00011101111111111111111111111111111111010011111111111110011111
    363 MHz 6   00011101111111111111111111111111111111010011111111111110011111
    362 MHz 6   00011111111111111111111111111111111111110011111111111110111111
    361 MHz 6   00011111111111111111111111111111111111110011111111111110111111
    360 MHz 6   00011111111111111111111111111111111111110011111111111110111111
    359 MHz 6   00011111111111111111111111111111111111110111111111111111111111
    358 MHz 6   00011111111111111111111111111111111111110111111111111111111111
    357 MHz 6   00011111111111111111111111111111111111111111111111111111111111 << ignore upper pins
    356 MHz 6   00011111111111111111111111111111111111111111111111111111111111
    355 MHz 6   00111111111111111111111111111111111111111111111111111111111111
    354 MHz 6   01111111111111111111111111111111111111111111111111111111111111
    
    
    Unregistered
    328 MHz 4   00000000000000000000000000000000000000000000000000000000000000
    327 MHz 4   00000000000000000000010001000000000000000000000000000000000000
    ...
    297 MHz 4   00000011011111111111111111111110111111111111101111101111101100
    296 MHz 4   00001011011111111111111111111111111111111111111111111111111100
    295 MHz 4   00001011011111111111111111111111111111111111111111111111111100
    294 MHz 4   00001011011111111111111111111111111111111111111111111111111100
    293 MHz 4   00001011111111111111111111111111111111111111111111111111111100
    292 MHz 4   00001011111111111111111111111111111111111111111111111111111100
    291 MHz 4   00001011111111111111111111111111111111111111111111111111111100
    290 MHz 4   00001011111111111111111111111111111111111111111111111111111101
    289 MHz 4   00001011111111111111111111111111111111111111111111111111111111
    288 MHz 4   00001011111111111111111111111111111111111111111111111111111111
    287 MHz 4   00001111111111111111111111111111111111111111111111111111111111
    
    1/357M - 1/379M = 0.162 ns timing aperture spread to cover all pins, register mode
    1/287M - 1/327M = 0.426 ns timing aperture spread to cover all pins, non-register mode
    TmReg = 1/((379M+357M)/2) = 2.7174 ns, TmNon = 1/((327M+287M)/2) = 3.2573 ns
    TmNon - TmReg = 0.5399 ns shift in mean aperture sample point, relative register to non-register (+2 sysclks)
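The edge frequencies used above can be read off the waterfall rows mechanically. A sketch, assuming the leftmost bit of each row is the highest-numbered pin (so "dropping the upper 4 pins" means dropping the four leftmost columns); the helper names are hypothetical. On these rows the 380 MHz line already shows the same single flipped pin as 379 MHz, so the sketch reports 380 MHz where jmg's arithmetic uses 379 MHz.

```python
from typing import List, Optional, Tuple


def parse_row(line: str) -> Tuple[int, int, str]:
    """Split '357 MHz 6   0001...' into (MHz, sysclk delay, bit string).
    Anything after the bit string, e.g. '<< ignore upper pins', is ignored."""
    fields = line.split()
    return int(fields[0]), int(fields[2]), fields[3]


def aperture_edges(rows: List[str], drop_upper: int = 4) -> Tuple[Optional[int], Optional[int]]:
    """Scan rows from high to low MHz and return
    (first MHz where any kept pin reads 1, first MHz where every kept pin reads 1)."""
    first_flip: Optional[int] = None
    all_flipped: Optional[int] = None
    for line in sorted(rows, key=lambda r: parse_row(r)[0], reverse=True):
        mhz, _delay, bits = parse_row(line)
        kept = bits[drop_upper:]  # drop the leftmost (upper) pins
        if first_flip is None and "1" in kept:
            first_flip = mhz
        if all_flipped is None and set(kept) == {"1"}:
            all_flipped = mhz
    return first_flip, all_flipped


# A few registered rows copied from the table above (pin_echo2.txt).
registered = [
    "380 MHz 6   00000000000000000000000000000000000000000000000000000000000100",
    "379 MHz 6   00000000000000000000000000000000000000000000000000000000000100",
    "358 MHz 6   00011111111111111111111111111111111111110111111111111111111111",
    "357 MHz 6   00011111111111111111111111111111111111111111111111111111111111 << ignore upper pins",
    "356 MHz 6   00011111111111111111111111111111111111111111111111111111111111",
]
print(aperture_edges(registered))  # (380, 357)
```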
    
  • Some of that 162 ps could even be due to XDIV = 20. I had to make it a larger value to achieve 1 MHz steps.
  • jmg Posts: 13,942
    evanh wrote: »
    Some of that 162 ps could even be due XDIV = 20. I had to make it a larger value to achieve 1 MHz steps.
    True, but the VCO only needs to be stable over a handful of sysclk cycles here, and the larger XDIV mainly increases lower-frequency noise, i.e. the variation over a few thousand sysclks.
    It is useful to include any XDIV variation anyway.

    If we take ~3 sysclks, about 10 ns, as a litmus test, a (poor) 0.1% jitter on that maps to ~10 ps.
    That 162 ps aperture is also going to 'walk' with temperature (PVT).
    It is nice to know what the pin-spread component is, and how registered mode helps reduce the spread.

    Your two plots above are at different cooling, right?
    Do you have a guess or estimate of the temperature difference between those 2 captures?
    Did any code change between them? It may be possible to extract a ps/°C tempco indicator from the two tables.
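The litmus test above is a one-line calculation. A sketch, assuming a ~300 MHz sysclk so that 3 sysclks come to ~10 ns (the thread does not pin down the exact clock); the function name is hypothetical.

```python
def jitter_ps(n_sysclks: int, sysclk_mhz: float, jitter_fraction: float) -> float:
    """Timing error, in ps, from a fractional jitter applied to an n-sysclk window."""
    window_ns = n_sysclks * 1e3 / sysclk_mhz
    return window_ns * jitter_fraction * 1e3


# ~3 sysclks at an assumed 300 MHz is ~10 ns; 0.1 % of that is ~10 ps.
print(f"{jitter_ps(3, 300, 0.001):.1f} ps")
```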
  • Thanks evanh and jmg for crafting the tables, they will help me a lot.

    I especially enjoyed the waterfalls, because they ease the study of when and where things start to change.

  • jmg Posts: 13,942
    Here is a comparison of the two plots, warmer and cold.
    There seems to be about 16 ps of variance (VCO jitter?) in the waterfall patterns: the same pin is last in both plots, but the exact ripples vary by a couple of lines across the pins.

    The shift due to test temperature looks to be ~143 ps; turning that into a tempco needs the temperature change.
    Ice packs may imply a 25 °C or 30 °C change?
    That's a ballpark indication of 5 ps/°C (i.e. a 28.6 °C change), or 250 ps over a 50 °C operating range, or 375 ps over 75 °C.

    pin_echo2.txt   Registered  COLD 
    363 MHz 6   00011101111111111111111111111111111111010011111111111110011111 <<
    362 MHz 6   00011111111111111111111111111111111111110011111111111110111111
    361 MHz 6   00011111111111111111111111111111111111110011111111111110111111
    360 MHz 6   00011111111111111111111111111111111111110011111111111110111111
    359 MHz 6   00011111111111111111111111111111111111110111111111111111111111
    358 MHz 6   00011111111111111111111111111111111111110111111111111111111111
    357 MHz 6   00011111111111111111111111111111111111111111111111111111111111
                                                        ^- last pin 
    
    pin_echo.txt  Registered WARMER
    347 MHz 6   00011101111111111110111111111111111111010011111111111110011111
    346 MHz 6   00011101111111111110111111111111111111010011111111111110111111
    345 MHz 6   00011111111111111111111111111111111111010011111111111110111111 <<
    344 MHz 6   00011111111111111111111111111111111111110011111111111111111111
    343 MHz 6   00111111111111111111111111111111111111110111111111111111111111
    342 MHz 6   01111111111111111111111111111111111111110111111111111111111111
    341 MHz 6   01111111111111111111111111111111111111110111111111111111111111
    340 MHz 6   01111111111111111111111111111111111111111111111111111111111111
                                                        ^- last pin 
    
    General wobble in pins
     1/345M - 1/347M = 16.706 ps

    dAperture/dT (ΔT = ? °C):   1/345M - 1/363M = 143 ps
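jmg's wobble and tempco figures can be checked directly. In this sketch the 28.6 °C value is the temperature change implied by assuming 5 ps/°C, not a measurement (evanh's later estimate is ~30 °C); the helper name is hypothetical.

```python
def period_delta_ps(f1_mhz: float, f2_mhz: float) -> float:
    """Difference between two clock periods, in picoseconds."""
    return abs(1e6 / f1_mhz - 1e6 / f2_mhz)


wobble_ps = period_delta_ps(347, 345)  # ~16.7 ps general wobble
shift_ps = period_delta_ps(363, 345)   # ~143.7 ps cold-vs-warm shift

delta_t_c = 28.6                       # assumed temperature change, degC
tempco = shift_ps / delta_t_c          # ~5.0 ps/degC

for span_c in (50, 75):
    print(f"over {span_c} degC: ~{tempco * span_c:.0f} ps")
```

Rounding the tempco to 5 ps/°C gives jmg's 250 ps and 375 ps; the unrounded value gives a few ps more.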
    

    I wonder if this MHz sweep technique can give a margin indicator to show which operating points to avoid?

    So I then scanned the file for ~50% of bits flipped, as a better-quality indicator of the middle of the aperture (the highest slope of change).
    pin_echo2.txt   Registered  COLD
    372 MHz  -> 6 : 17 1's
    371 MHz  -> 6 : 23 1's
    
    317 MHz  -> 5 : 15 1's
    315 MHz  -> 5 : 25 1's
    316 MHz  -> 5 : 29 1's
    - remains at 5 with all 1's in the bottom 58 pins, down to the 80 MHz end of the test.
     1/317M - 1/372M = 0.466 ns ??
    
    312MHz Unregistered  -> 4 : 17 1's 
    158MHz Unregistered  -> 3 : 14 1's 
    
     1/158M-1/315M = 3.1545ns - more expected spacing
    
    
    The Unregistered sweep to the next aperture gives a healthy 3.15 ns between the 4 and 3 change points, which nicely ~equals the period at 312 MHz, i.e. an 'expected' result.

    The puzzle here is that the Registered 6-sysclk and 5-sysclk aperture change points seem to be only 466 ps apart?!
    Below ~316 MHz, Registered remains stable (only pin 63 changes late, at ~150 MHz, due to high loading on that pin, and the top 6 pins are clearly different).
    I'm not sure how to explain how it can pick up another sysclk for a period change of just 466 ps.
    This is pushing past the top spec and is well overclocked, so maybe other effects come into play?

    Below ~300 MHz, the Registered delay is well behaved.

    Just above 300 MHz is where both Registered and Unregistered have an aperture point, so that is probably an operating point best avoided.
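The "~50% bits flipped" scan can be sketched as picking, within a set of rows, the one whose count of 1's in the bottom 58 pins is nearest half. This assumes the rightmost bit is pin 0 (so the bottom pins are the rightmost columns) and the row layout of the tables above; the function names are hypothetical.

```python
from typing import List, Optional, Tuple


def ones_in_bottom_pins(line: str, n_pins: int = 58) -> Tuple[int, int, int]:
    """Return (MHz, sysclk delay, count of 1's) over the lowest n_pins pins of a row."""
    fields = line.split()
    mhz, delay, bits = int(fields[0]), int(fields[2]), fields[3]
    return mhz, delay, bits[-n_pins:].count("1")


def closest_to_half(rows: List[str], n_pins: int = 58) -> Optional[Tuple[int, int, int]]:
    """Pick the row whose 1's count is nearest n_pins/2, i.e. the steepest
    part of the transition, as a proxy for the middle of the aperture."""
    best = None
    best_score = None
    for line in rows:
        mhz, delay, ones = ones_in_bottom_pins(line, n_pins)
        score = abs(ones - n_pins / 2)
        if best_score is None or score < best_score:
            best, best_score = (mhz, delay, ones), score
    return best
```

Running it separately over the rows of each delay band (6, 5, 4, 3) would give one mid-aperture frequency per band, which is what the 1/f differences above compare.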
  • Rayman Posts: 9,718
    edited 2019-03-29 - 12:27:11
    Here's 1080p @ 4bpp. Can't do a pretty bird in 4bpp...

    This is full screen, whereas I couldn't quite get to full screen at 8bpp.

    I need 300 MHz to get my monitor to like it.
    I still needed that NOP in the read burst routine.
    Had to reorder nibbles in the video loop. Glad that option was there!
    Had to add to the Hsync front porch to make it right.

    Prop Info and Apps: http://www.rayslogic.com/
  • rogloh Posts: 1,303
    edited 2019-03-29 - 01:31:49
    Impressive, @Rayman . How much memory bandwidth was left for writes in this 4bpp 1080p mode?
  • See scope traces.
    Maybe half of each line.
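A rough check of "maybe half of each line": compare the time to stream one 4bpp line from HyperRAM with the line period. This is a back-of-envelope sketch under stated assumptions: standard 1080p60 timing (1920 active / 2200 total pixels, 148.5 MHz pixel clock), an 8-bit DDR HyperRAM bus moving 2 bytes per HyperRAM clock, and a 75 MHz HyperRAM clock (SysCLK/4 at 300 MHz, as in the read code quoted in this thread). Rayman's exact timing is not given (he adjusted the front porch), and burst command/latency overhead is ignored.

```python
def read_fraction_of_line(h_active: int, h_total: int, pixel_clock_hz: float,
                          bpp: int, hr_clock_hz: float) -> float:
    """Fraction of each scan line spent streaming pixel data from HyperRAM.
    Assumes an 8-bit DDR bus (2 bytes per HyperRAM clock) and ignores the
    command/address and latency overhead of each burst."""
    bytes_per_line = h_active * bpp // 8
    line_time_s = h_total / pixel_clock_hz
    burst_time_s = bytes_per_line / (2 * hr_clock_hz)
    return burst_time_s / line_time_s


# Assumed: standard 1080p60 timing and a 75 MHz HyperRAM clock.
frac = read_fraction_of_line(1920, 2200, 148.5e6, 4, 75e6)
print(f"{frac:.0%} of each line reading, before burst overhead")
```

Under these assumptions reading takes a bit over 40% of each line, so "about half" is plausible once burst overhead is added.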
  • I was looking at your code rather in depth today.
    I noticed this line in most of your different display resolution sources:
                  'Wait for Pin_RWDS to go high
                  setse1    #$80+Pin_RWDS
                  waitse1  
    

    When I look at the P2 documentation v32, I see that setse1 has a pattern like this:
    %001_PPPPPP = INA/INB bit of pin %PPPPPP rises
    %010_PPPPPP = INA/INB bit of pin %PPPPPP falls
    %011_PPPPPP = INA/INB bit of pin %PPPPPP changes
    
    %10x_PPPPPP = INA/INB bit of pin %PPPPPP is low
    %11x_PPPPPP = INA/INB bit of pin %PPPPPP is high
    

    So the line with $80 is basically:
                  setse1    #%1000_0000+Pin_RWDS
                  'same as:
                  setse1    #%010_000000+Pin_RWDS
    

    So despite the comment, it's waiting for a falling edge of RWDS?
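Decoding the 9-bit %MMM_PPPPPP value according to the table whicker quotes confirms the reading: $80 puts %010 (falls) in the mode bits. A sketch; the pin number 40 is hypothetical, and mode %000 is left undecoded because the table does not cover it.

```python
from typing import Tuple

EDGE_MODES = {0b001: "rises", 0b010: "falls", 0b011: "changes"}


def decode_se(config: int) -> Tuple[str, int]:
    """Decode a 9-bit %MMM_PPPPPP pin-event value per the table above."""
    mode = (config >> 6) & 0b111  # top three bits select the condition
    pin = config & 0b11_1111      # low six bits select the pin
    if mode & 0b100:              # %10x = low, %11x = high
        return ("is high" if mode & 0b010 else "is low"), pin
    if mode in EDGE_MODES:
        return EDGE_MODES[mode], pin
    return "not covered by the table", pin


PIN_RWDS = 40  # hypothetical pin number
print(decode_se(0x80 + PIN_RWDS))   # ('falls', 40)
print(decode_se(0x40 + PIN_RWDS))   # ('rises', 40)
print(decode_se(0x180 + PIN_RWDS))  # ('is high', 40)
```

By the same table, $40+pin would wait for a rising edge and $180+pin for a high level. Level events, however, retrigger while the condition holds.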
  • jmg wrote: »
    Icepacks may imply 25°C or 30°C change ?
    Yep, I'd think 30 °C. I have a little PTC-based weather thermometer resting on top of the chip, and the ice packs were on top and bottom, with the bottom one touching the eval board. I left it that way for a while before the cold run. The thermometer read about 2.0 °C, and got a little lower by the end.

    For the first run, at room temperature, the thermometer was probably in the low 30s.
  • I've been planning to buy some fine thermocouple wire, which I'll solder to the centre of the board's underside. This'll give a much more direct reading. I must get it ordered ...
  • evanh Posts: 7,947
    edited 2019-03-29 - 05:17:15
    jmg wrote: »
    Did any code change between them ?
    The only change was the start and end MHz of the runs. I tried 400 MHz too, but that locked up.

    BTW: there are 62 pins in the graphs, numbered 61..0.
  • Rayman Posts: 9,718
    edited 2019-03-29 - 10:02:45
    Whicker: good catch. I’ll have to fix the text and/or code.

    It would be correct if it said:
    'Wait for Pin_RWDS to go high and then low
    

    Maybe I should try to skew the timing somehow... On the other hand, it seems to work...
  • There is a difference in how those two event types get triggered. The edge detect version, %010, does as you are saying.

    The level detect version, %100, however, can catch you out, as it will retrigger on a held level if that level condition is still present after checking (clearing the trigger). If not managed, this will create excess unexpected triggers, which can produce duplicate or leading-type effects where the program seems to proceed instantly when it shouldn't.
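A toy model of the difference evanh describes: one event source checked once per sample, where the edge event fires only on a transition but the level event fires again on every check while the level is held. This is an illustrative simplification, not a model of the actual P2 event hardware; the sample list is hypothetical.

```python
from typing import List


def count_triggers(samples: List[int], mode: str) -> int:
    """Toy model of one event source checked once per sample.

    'fall': fires only on a 1 -> 0 transition (edge event).
    'low':  fires on every check while the pin is held low (level event),
            because clearing the flag does not clear the condition.
    """
    triggers = 0
    for i, level in enumerate(samples):
        if mode == "fall":
            if i > 0 and samples[i - 1] == 1 and level == 0:
                triggers += 1
        elif mode == "low":
            if level == 0:
                triggers += 1
        else:
            raise ValueError(f"unknown mode {mode!r}")
    return triggers


rwds_samples = [1, 1, 0, 0, 0, 1, 0]  # hypothetical sampled pin levels
print(count_triggers(rwds_samples, "fall"), count_triggers(rwds_samples, "low"))  # 2 4
```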
  • Rayman wrote: »
    Whicker: good catch. I’ll have to fix the text and/or code.

    It would be correct if said:
    'Wait for Pin_RWDS to go high and then low
    

    Maybe I should try to skew the timing somehow... On the other hand, it seems to work...

    I think it works either because the data for the first pixel comes after skipping at least one byte, or, unfortunately, because it is testing the 8 ns data-line hold time right at the margin. I think it's the former, because in WriteRamBurstSub we see the following lines:
                  'Latency Clocks
                    nop
                    mov       i2,#27'24  'need to check that this is right...
    LoopLat1
                    outnot    #Pin_CK
                    djnz      i2,#LoopLat1
    

    I think the default latency is 5 clocks, and 2x latency?
    I haven't looked at the ISSI datasheet.

    But the latency clocking starts early, before all of the command/address has arrived: one rise for CA15:8 and one fall for CA7:0.
    This is already clocked manually in the first part of the code.

    So with the assumed default settings, 5-1=4 clock cycles for the first part of the latency. 4*2 = 8 transitions.
    5 more clock cycles for the second part of the latency. 5*2 = 10 transitions.

    So that's only 8+10=18 clock transitions.


    If the default latency were 6, then it's 6-1=5, 5*2=10 transitions, plus 6*2=12, for 22 transitions to get to the first data byte.
    As shown in the attached image, with a latency of 4 it's a total of 14 transitions; then the data is put on the bus, then comes the rising edge of the clock.

    Somewhere there's something interesting going on is all I'm saying.
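whicker's transition count follows a simple pattern: (latency - 1) clocks overlap the end of the command/address phase, and with 2x latency another full latency of clocks follows, at two transitions per clock. A sketch of that counting under his stated assumptions (he notes he has not checked the ISSI datasheet); the function name is hypothetical.

```python
def latency_transitions(latency: int, double_latency: bool = True) -> int:
    """Clock transitions still needed after the CA15:8 / CA7:0 edges before
    the first data byte, following whicker's count (two transitions per clock)."""
    first_part = 2 * (latency - 1)
    second_part = 2 * latency if double_latency else 0
    return first_part + second_part


for lat in (4, 5, 6):
    print(lat, latency_transitions(lat))
# 4 14
# 5 18
# 6 22
```

All three counts fall short of the 27 `outnot` toggles issued by the `mov i2,#27` / `djnz` loop, which fits whicker's suspicion that the first byte or so is being skipped.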
    [Attached image]
  • I suppose I could do a slow read and see what's actually stored in the first few bytes of each row...
  • jmg Posts: 13,942
    edited 2019-03-29 - 19:27:57
    whicker wrote: »
    ...

    so the line with $80 is basically:
                  setse1    #%1000_0000+Pin_RWDS
                  'same as:
                  setse1    #%010_000000+Pin_RWDS
    

    So despite the comment, it's waiting for a falling edge of RWDS?
    ...
    Somewhere there's something interesting going on is all I'm saying.
    Well spotted.
    The clock also changes gear, from software toggling for the address/command part to a faster SmartPin-generated clock for the latency 'spare clocks' region and the following data.
                  setbyte   outb,#0,#0      'CA7..0  'lower 3 bits are A2..A0
                  outnot    #Pin_CK
                  
                  setbyte   dirb,#$00,#0 'release control of buffer             
                
                  'configure smartpin to run HR clock
                  dirl      #Pin_CK
                  wrpin     #%1_00110_0,#Pin_CK
                  wxpin     #1,#Pin_CK  'add on every clock
                  mov       pa,#1
                  shl       pa,#30      'SysCLK/4 = 62.5 or 75MHz
                  wypin     pa,#Pin_CK
                  dirh      #Pin_CK
                           
                  'prepare to load buffer using fifo              
                  loc       ptra,#@HyperBuffer
                  mov       pa,pb
                  mul       pa,##480  'pb is section of buffer, 0..2
                  add       ptra,pa
                  wrfast    #0,ptra
                  
                   
                   'calc # bytes to read
                  mov       pa,##480           
                   
                  'Wait for Pin_RWDS to go high
                  setse1    #$80+Pin_RWDS
                  waitse1  
                  
                  nop  'need this at 300 MHz?
                  
                  'read in bytes
                  rep   #1,pa 
                  wfbyte    inb            
    .reploop2   
    
    The current code uses SysCLK/4, but that pushes the PLL up to 300 MHz, which is OK for testing but not really shippable...
    It may be possible to use SysCLK/2 and shuffle the code sections about a little, e.g. so that 'prepare to load buffer using fifo' goes before 'configure smartpin to run HR clock'.

    I notice OctaRAM floats DQS slightly longer (it goes low later) than HyperFlash RWDS, so maybe a pull-down mode on the pin would make it more tolerant of different RAM types.
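The clock setup in the quoted code (`wxpin #1`, `wypin` of 1<<30) can be checked against its "62.5 or 75MHz" comment. A sketch, assuming `%00110` is the NCO-frequency smart-pin mode, where Y is added to a 32-bit accumulator every X sysclks and the pin follows accumulator bit 31; the function name is hypothetical.

```python
def nco_output_hz(sysclk_hz: float, y_increment: int, x_period: int = 1) -> float:
    """NCO-frequency smart pin: f = sysclk * Y / (X * 2^32)."""
    return sysclk_hz * y_increment / (x_period * 2**32)


for sysclk in (250e6, 300e6):
    quarter = nco_output_hz(sysclk, 1 << 30)  # wypin of 1<<30, as in the code
    half = nco_output_hz(sysclk, 1 << 31)     # the SysCLK/2 alternative
    print(f"{sysclk / 1e6:.0f} MHz sysclk: /4 -> {quarter / 1e6:.1f} MHz, /2 -> {half / 1e6:.1f} MHz")
```

With Y = 1<<31 the same 75 MHz HyperRAM clock would need only a 150 MHz sysclk, which is the attraction of the SysCLK/2 option.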
  • jmg Posts: 13,942
    edited 2019-03-29 - 23:37:58
    jmg wrote: »
    ...
    The Unregistered sweep to next aperture, gives a healthy 3.15ns between 4 & 3 changes, which nicely ~equals the period at 312MHz - ie an 'expected' result.

    The puzzle here, is the Registered 6 sysclk & 5 sysclk apertures change points, seem to be only 466ps apart ?!
    Below ~ 316MHz, Registered remains stable, (just pin 63 is late change at ~150MHz due to high loading effects on that pin, & the top 6 pins are clearly different )
    Not sure how to explain how it can pick up another sysclk in just 466 ps period change ?
    This is pushing up to the top-spec and well overclocked, so maybe other effects come into play ?
    Looking some more into the 466 ps apparent clock width, I did find one mechanism that appeared while doing PCB trace-delay SPICE runs, and that is clock ringing.
    If the internal clock on the pad ring has some ringing, as per the SPICE runs, that could explain the 466 ps separation measurement result?
    I'm not sure what this could mean for things like edge-detection circuits, but I think that's not done in the pad ring?

    [Attached image]
  • jmg wrote: »
    ...
    Looking some more into the 466 ps apparent clock width, I did find one mechanism that appeared while doing PCB trace-delay SPICE runs, and that is clock ringing.
    If the internal clock on the pad ring has some ringing, as per the SPICE runs, that could explain the 466 ps separation measurement result?
    I'm not sure what this could mean for things like edge-detection circuits, but I think that's not done in the pad ring?

    It seems maybe it missed the first clock, but was registered on the second clock, instead?
  • jmg Posts: 13,942
    cgracey wrote: »
    It seems maybe it missed the first clock, but was registered on the second clock, instead?
    If that were true, the change would not be 466 ps, would it?
    The non-registered path does seem to have next-clock step size, which is more expected.

  • jmg wrote: »
    cgracey wrote: »
    It seems maybe it missed the first clock, but was registered on the second clock, instead?
    If that was true, the change would not be 466ps ?
    The non-registered path does seem to have next-clock step size, which is more expected.

    Maybe there is some delay between the clocking that occurs in the actual I/O pad and the internal core flops? Or in the data paths subject to the two clocking contexts?
  • Rayman Posts: 9,718
    edited 2019-03-30 - 21:20:21
    I just did a test where I made the first two pixels in the first row different, did the fast row read, and then output the result to the serial port.
    Looks good. We are getting the data read in where it is supposed to be.

    Actually, it's 4 pixels, two bytes...
  • I was just wondering why the 640x480x16bpp bird photo looks so good on my 1080p monitor...
    Detail around the beak is much smoother than it should be at this resolution.

    Turns out the monitor actually has upscaling built into it.

    It's a Samsung S24E450. Not even my best monitor...

    This is really good for a P2 with a VGA/HDMI focus...
    I think I'm going to target 16bpp VGA for games, and 1080p for sure for professional stuff.
    [Attached image]
  • Here's what it looks like without upscaling:
    [Attached image]
  • I just tried the HyperRam code on HyperFlash. It looks to me like it has exactly the same read and write protocol.

    But I got no response at all from the HyperFlash…
    I went back to the datasheet, and now I see that the CSn ball is different for HyperFlash :(
    It looks like all the other needed pins are the same, except this one.
  • Hi Rayman

    They made it that way to enable multi-chip package options, where both can be accommodated, one HyperFlash and one HyperRam, with their silicon dice stacked one on top of the other in a single FBGA package footprint.

    Be aware that some multi-chip packaged devices can also carry a timing penalty of up to 1 ns on some or all of the signals and their interrelationships.
  • OK, I have to order some new boards with the fix...

    I'm also looking at dual HyperRam+HyperFlash along with dual eMMC (see attached).
    [Attached image]