HyperRam/Flash as VGA screen buffer (Now XGA, 720p & 1080p)

Rayman Posts: 9,482
edited 2019-04-15 - 18:41:59 in Propeller 2
The attached file outputs a VGA image stored in HyperRam.
Code is a lot cleaner now with Ver.1N.

The read rate is 125 MBPS, much faster than the VGA pixel rate of 25 MHz.
There were some bugs initially, caused by an OUTA that should have been OUTH, but all fixed now.
Ver.1P is cleaned up and uses smartpin for HR clock and uses COGATN to synchronize HR with VGA.
Also, switched from waitse1 to waitx to get data starting from the first byte in (instead of the fifth).

Update: See ~page 4 for 16bpp code and images
Update2: See ~page 6 for some 1080p tests
Update3: Now works with HyperFlash too! See ~page 9.

Prop Info and Apps: http://www.rayslogic.com/

Comments

  • jmg Posts: 13,614
    Rayman wrote: »
    Thought I had it, and then these corrupt horizontal lines started showing up...

    Must be doing something wrong, not sure what yet...
    They start showing up after about 10 seconds
    ....
    Interesting effect.
    You could try cooling P2 or HyperRAM chip with a can of freeze and see if it clears?
    What is the time between artifacts here?
  • Rayman Posts: 9,482
    edited 2019-03-16 - 22:40:23
    Think I fixed that...
    Seems it doesn't like me reading 640 bytes at a time.
    Switched to 10x reads of 64 bytes and no more lines...
    I don't see anything in the datasheet that says I can't do 640 bytes though...

    Next minor issue is sync between vga driver and hyperram, needs some tweaking...
  • Think I see what happened..
    There's a 4 us limit for transactions (tCSM) and I was over 5 us.
    Guess I can do 2 reads of 320 pixels...
  • Impressive progress!
  • jmg Posts: 13,614
    Rayman wrote: »
    Think I fixed that...
    Seems it doesn't like me reading 640 bytes at a time.
    Switched to 10x reads of 64 bytes and no more lines...
    I don't see anything in the datasheet that says I can't do 640 bytes though...
    ...
    Think I see what happened..
    There's a 4 us limit for transactions (tCSM) and I was over 5 us.
    Guess I can do 2 reads of 320 pixels...
    Good progress. The datasheet is vague on tCSM: one example divides the refresh time by the number of rows, and it is unclear whether that is a hard ceiling on each transaction or whether it is OK to meet it on average and so satisfy the overall refresh requirement.
    Your tests indicate that it is more of a per-transaction ceiling.
    That would mean it is the RAM warming up that shifts the refresh timing just enough to start to have an effect?
    That said, if they spec 4 us, being over 5 us on a bench test is not a huge violation?
    If you do 320 pixel reads, and heat the RAM, is it still ok ? (is the whole RAM refreshing ok ?)
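A rough check of the numbers in this exchange, as a minimal Python sketch. It assumes the 125 MBPS figure from the first post means 125 megabytes per second, and it counts only the data phase: command/address and initial latency time are ignored, so real transactions run somewhat longer. The names `burst_time_us` and `T_CSM_US` are made up for illustration.

```python
T_CSM_US = 4.0  # CS# low-time limit quoted in the thread (tCSM)


def burst_time_us(num_bytes, rate_mbytes_per_s=125.0):
    """Data-phase time in microseconds for a burst of num_bytes.

    Bytes divided by MB/s gives microseconds directly.
    Command/address and latency cycles are NOT included (assumption).
    """
    return num_bytes / rate_mbytes_per_s


if __name__ == "__main__":
    for n in (640, 320, 64):
        t = burst_time_us(n)
        verdict = "within" if t <= T_CSM_US else "exceeds"
        print(f"{n:3d} bytes: {t:.3f} us ({verdict} tCSM)")
```

Under these assumptions a 640-byte burst takes 5.12 us, which matches the "over 5 us" observation, while a 320-byte burst takes 2.56 us and leaves margin under the 4 us limit.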
  • It's not even warm, I think the chip is fine.

    I'm having trouble figuring out a way to coordinate with HSYNC. Looks like INA doesn't work because it's in smartpin mode...
  • I have a hunch that refreshing pauses when CS# is Low and it has received the upper 4 bytes (of 6) of the Command-Address with that Address being in memory space, until CS# goes High again.

    Plenty of time to figure this out, though.
  • I have something that works now. See top post.
    It's stable and positioned correctly.

    Also, there were some black pixels on the beak that are gone now. That was strange...

    Not completely happy with it, but at least a proof of concept.
  • Is the dithering in the original image, or is that a limitation of the software/hardware?
  • rogloh Posts: 1,106
    edited 2019-03-17 - 22:56:33
    Nice work Rayman, perseverance pays off.

    It will be interesting to see how people resolve the write accesses if a HyperRAM is used as a graphics buffer and shared between several COGs. We'd need guaranteed read bandwidth for the display and some remaining bandwidth for intermittent write accesses from other COG(s), ideally keeping write latency as low as practical. I wonder if writes could be deferred to the end of each scan line being read in, or if you'd want finer-grained interleaving between reads and writes. For block transfers that approach might be okay, but for lots of small accesses, such as rendering text pixels, I guess it might not be as good and would probably start to reduce the number of write transfers possible per frame. There'll probably be some sweet spot in between for different applications.
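To put a rough number on the "writes at the end of each scan line" idea, here is a small Python sketch of the per-line time budget. The 125 MB/s read rate and 25 MHz pixel rate come from the first post; the 800 pixel clocks per line is an assumption based on standard 640x480 timing, and the driver in the attachment may use different porches. Command/address overhead and the split into multiple shorter reads are ignored.

```python
READ_RATE_MB_S = 125.0        # HyperRAM read rate, from the first post
PIXEL_RATE_MHZ = 25.0         # VGA pixel rate, from the first post
VISIBLE_BYTES = 640           # 640 visible pixels at 8 bpp
TOTAL_PIXELS_PER_LINE = 800   # ASSUMPTION: standard 640x480 line length


def line_budget_us():
    """Return (line_time, read_time, spare_time) in microseconds."""
    line_time = TOTAL_PIXELS_PER_LINE / PIXEL_RATE_MHZ  # whole scan line
    read_time = VISIBLE_BYTES / READ_RATE_MB_S          # fetch one line
    return line_time, read_time, line_time - read_time


if __name__ == "__main__":
    line_time, read_time, spare = line_budget_us()
    print(f"line {line_time:.2f} us, read {read_time:.2f} us, "
          f"spare {spare:.2f} us ({100 * spare / line_time:.0f}% of line)")
```

With these figures the display fetch occupies about 5 us of a 32 us line, so most of each line would be available for writes, minus per-transaction overhead and any refresh-related limits on transaction length.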
  • The image is the same 8-bpp, 256 color palette that is used with the examples from Chip.
    There probably is some dithering.

    The read rate is 125 MBPS while the VGA output rate is 25 MBPS.
    That's a decent ratio. Should be time to do updates.
  • Would like to have a better way to sync with VGA HSync.

    Right now, I have a jumper to P5 that I use to monitor HSync.
  • You could probably tack a COGATN or similar into the VGA hsync subroutine.
    "... peers into the actual workings of a quantum jump for the first time. The results
    reveal a surprising finding that contradicts Danish physicist Niels Bohr's established view
    —the jumps are neither abrupt nor as random as previously thought."
  • Rayman,
    Not sure if this is what you are asking...
    In my VGA 1920x1080 driver, I changed the colors between successive groups of horizontal lines. This required syncing with the horizontal lines. Take a look at that thread for my latest code.
    My Prop boards: P8XBlade2, RamBlade, CpuBlade, TriBlade
    Prop OS (also see Sphinx, PropDos, PropCmd, Spinix)
    Website: www.clusos.com
    Prop Tools (Index) , Emulators (Index) , ZiCog (Z80)
  • Rayman wrote: »
    Would like to have a better way to sync with VGA HSync.

    Right now, I have a jumper to P5 that I use to monitor HSync.

    If I read your code and the documentation correctly you should be able to use Smartpin 61 with the input offset selector set to x011 (+3) to read pin 0.

    Setting SE2 to match INB 29 rising, and using mode %10010 AND !Y[2] = Time X A-input highs/rises/edges, you can probably replace the TESTP loops with WAITSE2s, with an AKPIN to prepare for the next event should you not need the timing results.

    BlanksLoop can include WXPIN with the total number (65?) of edges to wait, while HyperWait could WXPIN with two edges if you are confident that you won't start waiting in the wrong state.


    This also goes some way toward freeing you from using the TESTP instruction ;-) With 3 SE sources you could also use one for VSync and be completely free :-)
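The pin arithmetic behind the "smartpin 61 with +3 offset reads pin 0" suggestion can be sketched in Python. This assumes relative input selection covers offsets of -3..+3 and wraps around the 64 pins; the wrap is inferred from the example in the post (61 + 3 gives pin 0) and has not been checked on hardware. `selected_pin` is a hypothetical helper, not a P2 instruction.

```python
NUM_PINS = 64  # P2 pins are numbered 0..63


def selected_pin(smartpin, offset):
    """Pin a smart pin would observe through a relative input offset.

    ASSUMPTION: offsets span -3..+3 and wrap modulo 64.
    """
    if not 0 <= smartpin < NUM_PINS:
        raise ValueError("smartpin must be 0..63")
    if not -3 <= offset <= 3:
        raise ValueError("offset must be -3..+3")
    return (smartpin + offset) % NUM_PINS


if __name__ == "__main__":
    print(selected_pin(61, 3))   # the case suggested in the post
```

The same arithmetic shows which spare smart pins could watch a given sync pin: any pin within three positions of it, counting across the 63/0 boundary if the wrap assumption holds.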
  • @Rayman
    Here is what I do to get access to the HS. Note I swap palettes between some HS rows.
    It may give you some ideas.
    '+-------[ Setup Streamer ]----------------------------------------------------+ 
    vid
                     loc     ptra,#\@palette_pairs          ' set 1st color palette   
                     setq2   #2-1                           ' (only need 2)
                     rdlong  lut_start,ptra                 ' set next palette color
                     mov     ctr1,#135                      ' & set row ctr
    
                    setxfrq ##fset                  'set transfer frequency to xxxMHz
                    dirh    #vsync                  'make vsync pin output
    
                    'the next 4 lines may be commented out to bypass level scaling
                    setcy   ##intensity << 24       '\ r      set colorspace for rgb
                    setci   ##intensity << 16       '| g
                    setcq   ##intensity << 08       '| b
                    setcmod #%01_0_000_0            '/ enable colorspace conversion
    
                    wrpin   dacmode_s,#0            '\ enable dac modes in pins 0..3
                    wrpin   dacmode_c,#1            '|
                    wrpin   dacmode_c,#2            '|
                    wrpin   dacmode_c,#3            '|
                    setnib  dira,#$f,#0             '/ & enable output
    
    
                    rdfast  ##w*h*bpp/32/16,##@video_buffer ' wraps ~253KB bitmap (longs/16) 
    
    '+-------[ Display Screen ]----------------------------------------------------+ 
    ' Field loop
    field           outnot  #vsync                  '\ vsync on
                    mov     x,#sync_blanks          '|
                    call    #blank                  '|
                    outnot  #vsync                  '/ vsync off
    '-----------------------------------------------
                    mov     x,#top_blanks
                    call    #blank
    '-----------------------------------------------
    ' display visible screen 1bpp
                    mov     x,##h                   'set visible lines
                     loc     ptra,#\@palette_pairs           ' set 1st color palette   
                     mov     ctr1,#1                         ' & force new load below
    .visible        xcont   m_bs,#0                 'horizontal sync (before sync)
                    xcont   m_sn,#1                 'horizontal sync (sync)
                     djnz    ctr1,#.skip                     ' set new palette?
                     setq2   #2-1                            ' (only need 2)
                     rdlong  lut_start,ptra                  ' set next palette color
                     add     ptra,#2*4                       ' next color pair
                     mov     ctr1,#135                       ' & set row ctr
    .skip           xcont   m_bv,#0                 'horizontal sync (before visible)
                    xcont   m_rf,#0                 'visible line                   
                    djnz    x,#.visible             'another line?
    '-----------------------------------------------
                    mov     x,##bottom_blanks
                    call    #blank
    '-----------------------------------------------
                    jmp     #field                  'loop
    '-----------------------------------------------
    
    ' display blank lines
    blank           xcont   m_bs,#0                 'horizontal sync (before sync)
                    xcont   m_sn,#1                 'horizontal sync (sync)
                    xcont   m_bv,#0                 'horizontal sync (before visible)
                    xcont   m_vi,#0                 'blank visible
            _RET_   djnz    x,#blank
    '+-----------------------------------------------------------------------------+
    
    ' Initialized data
    dacmode_s       long    %0000_0000_000_1011000000000_01_00000_0         'hsync is 123-ohm, 3.3V
    dacmode_c       long    %0000_0000_000_1011100000000_01_00000_0         'R/G/B are 75-ohm, 2.0V
    
    m_bs            long    $CF000000+before_sync
    m_sn            long    $CF000000+sync
    m_bv            long    $CF000000+before_visible
    m_vi            long    $CF000000+w             'visible
    'm_rf            long    $6F000000+w             '4bit RFLONG LUT
    'm_rf            long    $5F000000+w             '2bit RFLONG LUT
    m_rf            long    $4F000000+w             '1bit RFLONG LUT
    
  • Rayman Posts: 9,482
    edited 2019-03-18 - 13:01:36
    Thanks. I think my problem is the xcont buffering messing up timing...
    Have to look into inserting fake or broken up xcont lines to clear that buffer...
    Or, maybe I was just doing something wrong.
    Now that I have it working, I'll try again...
  • jmg Posts: 13,614
    Rayman wrote: »
    I'm having trouble figuring out a way to coordinate with HSYNC. looks like INA doesn't work because it's in smartpin mode...
    Would like to have a better way to sync with VGA HSync.
    Right now, I have a jumper to P5 that I use to monitor HSync.

    Hmm.. It would certainly be generally useful and expected in P2, to be able to trigger from any pin edge, without needing a jumper, or even a pin-remap.


    I think you are saying the INA pin-in-attach fails (due to smart pin mode?), so this part of the docs cannot apply ?

    ‘SETSE1 D/#’, for which D/# selects the event:

    %001_PPPPPP = INA/INB bit of pin %PPPPPP rises
    %010_PPPPPP = INA/INB bit of pin %PPPPPP falls
    %011_PPPPPP = INA/INB bit of pin %PPPPPP changes


    It's not clear what INA/INB means, as there seems no bit to select A/B in this setup ?

  • Well, just reading INA doesn't work in this smartpin mode.
    But, maybe SETSE1 would?
    I have a feeling it won't, but worth trying...
    AJL's smartpin offset idea is interesting. Not sure if that works either...

    I have a feeling that actual pin status may not be available in smart pin mode...
  • jmg wrote: »
    Rayman wrote: »
    I'm having trouble figuring out a way to coordinate with HSYNC. looks like INA doesn't work because it's in smartpin mode...
    Would like to have a better way to sync with VGA HSync.
    Right now, I have a jumper to P5 that I use to monitor HSync.

    Hmm.. It would certainly be generally useful and expected in P2, to be able to trigger from any pin edge, without needing a jumper, or even a pin-remap.


    I think you are saying the INA pin-in-attach fails (due to smart pin mode?), so this part of the docs cannot apply ?

    ‘SETSE1 D/#’, for which D/# selects the event:

    %001_PPPPPP = INA/INB bit of pin %PPPPPP rises
    %010_PPPPPP = INA/INB bit of pin %PPPPPP falls
    %011_PPPPPP = INA/INB bit of pin %PPPPPP changes


    It's not clear what INA/INB means, as there seems no bit to select A/B in this setup ?

    With 6 bits in the %PPPPPP field the A/B select occurs from the uppermost bit.

    For these events it's probably less helpful to think of INA/INB 0..31, and more helpful to think of IN 0..63.

    When a pin cell is placed in smartpin mode the IN signal for that cell is subverted for the purposes of the smartpin function, which for the DAC modes indicates end of sample period. That's where the remap to another smartpin can provide a means to read the pin state.
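The selector layout quoted above (%EEE_PPPPPP, with the top bit of the pin field choosing INA or INB) can be illustrated with a short Python sketch. The event codes are the three listed in the quoted documentation; the helper names `se_selector` and `port_and_bit` are invented for this example.

```python
EVENT_RISE = 0b001    # INA/INB bit of pin rises
EVENT_FALL = 0b010    # INA/INB bit of pin falls
EVENT_CHANGE = 0b011  # INA/INB bit of pin changes


def se_selector(event, pin):
    """Build the 9-bit %EEE_PPPPPP value for a SETSEx operand."""
    if not 0 <= pin <= 63:
        raise ValueError("pin must be 0..63")
    if not 0 <= event <= 0b111:
        raise ValueError("event must fit in 3 bits")
    return (event << 6) | pin


def port_and_bit(pin):
    """Split a 6-bit pin number into (port name, bit within port)."""
    port = "INB" if pin & 0b100000 else "INA"
    return port, pin & 0b011111


if __name__ == "__main__":
    sel = se_selector(EVENT_RISE, 61)
    print(f"%{sel >> 6:03b}_{sel & 63:06b}", port_and_bit(61))
```

For pin 61 this gives port INB, bit 29, which is the "INB 29 rising" event mentioned earlier in the thread.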
  • Rayman Posts: 9,482
    edited 2019-03-18 - 21:46:28
    Is there anywhere where Chip says this works?
    I know it can read the IN state, but not sure what that means in smartpin mode...
  • jmg Posts: 13,614
    AJL wrote: »

    With 6 bits in the %PPPPPP field the A/B select occurs from the uppermost bit.

    For these events it's probably less helpful to think of INA/INB 0..31, and more helpful to think of IN 0..63.
    Ah, of course.... I was thinking INB was some alternate pathway...
    AJL wrote: »
    When a pin cell is placed in smartpin mode the IN signal for that cell is subverted for the purposes of the smartpin function, which for the DAC modes indicates end of sample period. That's where the remap to another smartpin can provide a means to read the pin state.
    That makes sense, but given P2's performance, I had not expected to also lose the ability to simply sense any pin edge.
  • Rayman wrote: »
    Is there anywhere where Chip says this works?
    I know it can read the IN state, but not sure what that means in smartpin mode...

    This is another case where multiple signals have the same name, at least in part because of history.

    On P1, IN simply meant the input state of the pin.

    On P2, the state of the pin can be read by smartpins or can bypass the smartpin cell for direct reading by the COGs.
    So, if you'll forgive some nomenclature creation on my part:
    You have IN (PIN) and IN (COG), which in normal operation are the same signal. If my understanding is correct, IN (COG) is where the INA/INB is relevant.
    In smartpin mode, the path is interrupted with IN (PIN) being supplied to the input selector circuits, and IN (COG) being an output of the smartpin cell fed to the COGs.
    So the SE events are operating on the raw pin input in normal operation or the smartpin outputs in smartpin mode.
    This would allow code to setup a smartpin cell, enable it, do any further preparatory processing and then a WAITSEx configured for IN rising edge to wait for the smartpin cell to complete its operation.

    By using the input pin selector logic, it allows a smartpin to take its A and B inputs from any pin in the ±3 range, raw or inverted as necessary, and perform the smartpin function on that. While a smartpin cell can be a repository, or perform many automated logical functions, there's no simple read of the pin state.
    In the OP's situation there appear to be only two approaches that can work:

    Build the sync tracking into the frame generation routine, so you know what the sync pin state is and don't need to read it; or
    Assign the sync tracking to a smartpin cell that is in the range and not being used for other purposes.

    I don't have a P2 to test this with, so I'll need to rely on others to test the theory.
  • Rayman wrote: »
    Is there anywhere where Chip says this works?
    I know it can read the IN state, but not sure what that means in smartpin mode...

    Chip uses smartpins to sample the autobaud in the ROM. He uses P0 & P1 to do this for P63/62.
  • Are p63/62 in smartpin mode then?
  • Shoot. After running overnight the bad lines are back...

    Need to read the datasheet again and see what I'm doing wrong...
  • evanh Posts: 7,295
    edited 2019-03-19 - 10:51:55
    Rayman wrote: »
    Are p63/62 in smartpin mode then?
    Yes. All four smartpins are set up in quick succession:
    reset_serial	andn	dira,#%11		'disable timing measurements for autobaud
    
    		setint1	#0			'disable int1
    
    		mov	head,#0			'reset serial buffer pointers
    		mov	tail,#0
    
    		dirl	#rx_pin			'disable receiver
    		wrpin	#%00_11111_0,#rx_pin	'configure rx_pin for asynchronous receive, always input
    
    		wrpin	#%01_11110_0,#tx_pin	'configure tx_pin for asynchronous transmit, always output
    		dirh	#tx_pin			'enable transmitter
    
    		wrpin	mths,#rx_ths		'configure rx_ths for timing high states
    
    		wrpin	mtne,#rx_tne		'configure rx_tne for timing negative edges
    		wxpin	#1,#rx_tne		'report each cycle
    		wypin	#0,#rx_tne		'measure fall to fall
    
    		setse1	#%110<<6+rx_tne		'set se1 to trigger on rx_tne high
    
    		mov	ijmp1,#autobaud_isr	'set int1 jump vector to autobaud ISR
    
    		setint1	#4			'set int1 to trigger on se1 (rx_tne high)
    
    		or	dira,#%11		'enable timing measurements for autobaud
    
    "... peers into the actual workings of a quantum jump for the first time. The results
    reveal a surprising finding that contradicts Danish physicist Niels Bohr's established view
    —the jumps are neither abrupt nor as random as previously thought."
  • Rayman wrote: »
    Are p63/62 in smartpin mode then?
    Yes, they are also in smartpin mode to function as uarts. P0&1 both connect to P63 I think to time the receive high and low times while autobauding. See the ROM code.
  • Hi Rayman

    Based on what I found in the Spin2 code available in the first post, there are some instances where #Pin_CSn polarity is changed BEFORE #Pin_CK is driven LOW.

    If you look carefully at the available datasheets (Cypress, ISSI) and also the HyperBus specs, #Pin_CK MUST be LOW (inactive) for some time around ANY #Pin_CSn transition (tCSH (0 ns) and tCSS (3 ns) in the Cypress datasheets).

    Also, since you are using a helper COG to toggle #Pin_CK during the data transfer period, you must ensure that the last transition of a Data Memory Read operation leaves #Pin_CK in a LOW (inactive) state, during and after the last byte read.

    The same applies to any Data Memory Write operation: the last #Pin_CK transition, the one that latches the last byte into the HyperRam, MUST be a low-going transition.

    In both cases, #Pin_CK must be kept LOW until #Pin_CSn has fully transitioned from LOW to HIGH (tCSH, tCSS).

    If needed (for some other intended application), #Pin_CK can be toggled freely while #Pin_CSn is actively driven HIGH, but the timing constraints must be observed before any transition to its active state.

    It's also strongly recommended that #Pin_RSTn, #Pin_CSn and #Pin_CK are ALWAYS actively driven, keeping them either LOW or HIGH as needed by the control routines. This ensures that noise or other interference has no opportunity to disturb the HyperBus interface.

    Hope it helps a bit.

    Henrique
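The rule described above (clock low whenever CS# changes state) can be expressed as a simple check over a captured trace. This Python sketch is only a logical model: it takes a list of (ck, csn) samples in time order and flags any CS# edge where the clock is high on either side. It does not model the actual tCSS/tCSH nanosecond margins, and the sample format and the name `ck_violations` are assumptions made for this example.

```python
def ck_violations(samples):
    """Return indices of samples where CS# changed while CK was high.

    samples: list of (ck, csn) tuples, each value 0 or 1, in time order.
    A CS# edge is flagged if CK is high just before or just after it.
    """
    bad = []
    for i in range(1, len(samples)):
        prev_ck, prev_csn = samples[i - 1]
        ck, csn = samples[i]
        if csn != prev_csn and (prev_ck or ck):
            bad.append(i)
    return bad


if __name__ == "__main__":
    # CS# falls with CK low, one clock pulse, CK returns low, CS# rises.
    good = [(0, 1), (0, 0), (1, 0), (0, 0), (0, 1)]
    # CS# rises while CK is still high (clock left running at deselect).
    bad = [(0, 1), (0, 0), (1, 0), (1, 1)]
    print(ck_violations(good), ck_violations(bad))
```

A logic-analyzer capture of the clock and chip-select pins, sampled fast enough to resolve each clock edge, could be fed through a check like this to see whether the clock is still running at deselect.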
  • Thanks. That might be the problem (I hope so.)
    I have a feeling the HR clock is still running when I deselect the chip...