SRAM as VGA Buffer - Page 2 — Parallax Forums

SRAM as VGA Buffer

Comments

  • @evanh said:
    I'm full steam ahead using your REP'd ADD for incremental address generation. It's perfect for sysclock/2. :)

    Yes it is perfect. An SRAM will perform quite well for mid range video resolutions, not quite as fast as HyperRAM and PSRAM configurations, but still reasonable.

  • Read bursts are something like this

    read_common
                                setword xrecvdata, c, #0        'adjust the byte transfer clocks needed in streamer
                                setxfrq xfreq2                  'setup streamer frequency (sysclk/2)
                                setq    mask                    
    patchport0                  muxq    outa, addr1             'get start address of transfer
                                wrpin   regdatabus, datapins    'setup if pins registered or not
    
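                                drvl    cspin                   'assert chip select (active low)
                                drvl    oepin                   'assert output enable (active low)
                                xinit   xrecvdata, #0           'start the streamer capturing data
                                waitx   delay                   'tuned delay before address stepping starts
                                rep     #1, c                   'repeat the next instruction c times
    patchport1                  add     outa, #1                'advance to the next address
                                call    resume
                                drvh    oepin                   'de-assert output enable
                      _ret_     drvh    cspin                   'de-assert chip select and return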
    
  • evanh Posts: 15,171

    Doh! /me a dummy. Total red-herring above. I've got the address and data pins registered but not the control pins.

  • I think the timing can be figured out fully without an SRAM board to play with. Writes are easy, but reads just need another COG to drive patterns onto the data bus, which can then be scoped to learn what comes back into the driver as the delay is varied.

  • evanh Posts: 15,171
    edited 2021-04-11 16:34

    I've got linear writes looking good. Worked out how to add a controlled delay to the start of the smartpin for WE pulses.
    Here's the routine:

    sram_wrbytes
            rdfast  fifo_nb, hub_addr       'start FIFO
            setword stm_rf8, length, #0     'count of streamer transfer cycles
            mov     outa, sr_addr           'preset address bus
            outh    #CEPIN                  'enable SRAM
    
            dirl    #WEPIN
            wxpin   sp_slow, #WEPIN         'tuned compensation delay on the NCO
            dirh    #WEPIN                  'restart the smartpin's NCO
            wxpin   sp_fast, #WEPIN         'go fast on next NCO rollover
            wypin   length, #WEPIN          'start the WE smartpin pulses on next NCO rollover
    
            xinit   stm_rf8, #0             'start the data, conveniently occurs just 2 ticks ahead of the REP'd ADD
    
            rep     @.rloop, length         'cycle through the addresses
            add     outa, srcmd_add         'sr_addr + 1
    .rloop
        _ret_   outl    #CEPIN              'disable SRAM
    

    EDIT: Updated some comments in the source code

    And scope screenshot of prop2 running at 4 MHz sysclock:

    • Orange trace is CE
    • Blue trace is WE (with 5 write pulses) from the smartpin
    • Pink trace is A0 (starting high and toggling 5 more times, one more than needed)
    • Green trace is D0 (%10101) from the streamer
  • rogloh Posts: 5,151
    edited 2021-04-11 15:47

    Very good, you are probably a step ahead of me at this point evanh :)

    I like the idea of including the SRAM as an option into this growing suite of memory drivers, it'll be handy for people who'd like to play with traditional RAM. We could probably do a 16 bit wide variant too to double the rate but that's really burning up the pins. Byte wide memory is so much easier with streaming directly and not dealing with the various alignment issues that crop up otherwise.

  • evanh Posts: 15,171

    Would be nice if there's a way to trim that smartpin trick down to fewer than five instructions.

  • Yeah, using smartpins can sometimes take a few instructions to set up when aligning the clock. Can't we just have the WR pin ready to go and trigger the clock transitions with WYPIN and have the WE pin output inverted and put the delay in between/after the streamer start and/or the rep loop? Does the write clock come out too soon or too late if you do that? Another way to do it is to issue a dummy streamer command that doesn't stream into/out of HUB but sets up a needed delay, but that is going to be at the sysclk/4 rate.

  • evanh Posts: 15,171

    What I've done is rock solid consistent. The timing is not affected by pre-existing states like hub slot alignment or NCO phases. Both the XINIT and the DIRH/L are important to bring the three processing units (cog/streamer/smartpin) in phase with each other.

    I think those five instructions have to stay.

    That said, I can wangle it without the DIRH/L pair but it depends on the cog staying on a 2-tick/instruction regular beat. If the cog execution gets shifted to the alternate phase it throws off instruction timing with respect to the smartpin's NCO phase.

  • rogloh Posts: 5,151
    edited 2021-04-11 16:18

    It's a crying shame about the P28-P31 clock stuff on P2-EVAL. We could otherwise make a really neat SRAM breakout that fits within less than half of the P2-EVAL outline (instead of hanging outside of it) and consumes the port A connectors on P24-P31 for the SRAM Data bus then P32-P55 for Address and Control on Port B. I guess it can still be done with Port A alone, but will have to be a larger board for that. Pity.

  • evanh Posts: 15,171

    The streamer commands are much better built for this than the smartpin handling is.

  • @evanh said:
    What I've done is rock solid consistent. The timing is not affected by pre-existing states like hub slot alignment or NCO phases. Both the XINIT and the DIRH/L are important to bring the three processing units (cog/streamer/smartpin) in phase with each other.

    I think those five instructions have to stay.

    That said, I can wangle it without the DIRH/L pair but it depends on the cog staying on a 2-tick/instruction regular beat. If the cog execution gets shifted to the alternate phase it throws off instruction timing with respect to the smartpin's NCO phase.

    Yeah I've been there before with the HyperRAM and the various options like sysclk/1 vs sysclk/2 and registered/unregistered clocks. It's not an easy thing to solve sometimes.

  • evanh Posts: 15,171

    Actually, you know what? Trimming more off would be bad anyway, because the RDFAST needs that time to fill the FIFO. I've got it set for non-blocking execution.

  • @evanh said:
    Actually, you know what? Trimming more off would be bad anyway, because the RDFAST needs that time to fill the FIFO. I've got it set for non-blocking execution.

    Yes the FIFO needs time to fill. I tend to put that RDFAST instruction somewhere early in my code for that reason.
    Getting late here, must be almost dawn in NZ. Wrapping up.

  • evanh Posts: 15,171

    :) Yep, got up at 2:30 PM. I should go to bed too.

  • Hi

    Scratching my head trying to remember, but when I used static RAM like the 6116 many years ago, I don't think strobing CE (CS) and OE was necessary for block reads: just set OE and CS, then change the address and read (after a slight delay). Writes have to be done by strobing the write line, but not reads.

    Of course memories (human) are fallible...

    Dave

  • If only we had more RAM inside the P2. That would nullify the need for external memory for video out.

  • @tritonium said:
    Hi

    Scratching my head trying to remember, but when I used static RAM like the 6116 many years ago, I don't think strobing CE (CS) and OE was necessary for block reads: just set OE and CS, then change the address and read (after a slight delay). Writes have to be done by strobing the write line, but not reads.

    Of course memories (human) are fallible...

    Dave

    You're still right. Most standard SRAMs can just tie OE and CS low throughout a transfer and just change the address to read new data. They are asynchronous. I use this fact in the code above to get the fastest read rates, while writes need the pulsed WE pin to clock in the new data.
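
The read/write asymmetry described here can be sketched with a small behavioural model. This is only an illustration in Python, not driver code: the class `AsyncSram` and the helper names are invented for this sketch, the control lines are assumed to be active low, and no real timing (access time, setup, hold) is modelled.

```python
class AsyncSram:
    """Toy model of an asynchronous SRAM with active-low CE, OE and WE (assumed polarity)."""

    def __init__(self, size):
        self.mem = bytearray(size)
        self.ce = 1   # 1 = de-asserted (inactive)
        self.oe = 1
        self.we = 1
        self.addr = 0

    def data_out(self):
        """Data bus value while reading, or None when the outputs are disabled."""
        if self.ce == 0 and self.oe == 0 and self.we == 1:
            return self.mem[self.addr]
        return None

    def pulse_we(self, data):
        """One WE low pulse: the byte is stored at the current address."""
        if self.ce == 0:
            self.we = 0
            self.mem[self.addr] = data & 0xFF
            self.we = 1


def write_burst(ram, start, payload):
    ram.ce = 0
    ram.oe = 1                       # outputs off during writes
    for offset, byte in enumerate(payload):
        ram.addr = start + offset
        ram.pulse_we(byte)           # every byte needs its own WE strobe
    ram.ce = 1


def read_burst(ram, start, count):
    ram.ce = 0
    ram.oe = 0                       # CE and OE stay low for the whole burst
    out = []
    for offset in range(count):
        ram.addr = start + offset    # only the address changes between reads
        out.append(ram.data_out())
    ram.oe = 1
    ram.ce = 1
    return out


ram = AsyncSram(256)
write_burst(ram, 0x10, b"\x55\xAA\x0F")
print(read_burst(ram, 0x10, 3))
```

In the model, `read_burst` never touches a strobe inside its loop, while `write_burst` has to call `pulse_we` once per byte, which mirrors why the read path can step the address alone.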

  • @Surac said:
    If only we had more RAM inside the P2. That would nullify the need for external memory for video out.

    For full screen framebuffers/GUIs etc at high resolution, yeah the 512k can be rather limiting at times, but for other applications in text modes or with sprite drivers that race the beam etc, it's not a big deal and you don't have to have the external memory. It's great to have when you need it though.

  • @rogloh said:

    @Surac said:
    If only we had more RAM inside the P2. That would nullify the need for external memory for video out.

    For full screen framebuffers/GUIs etc at high resolution, yeah the 512k can be rather limiting at times, but for other applications in text modes or with sprite drivers that race the beam etc, it's not a big deal and you don't have to have the external memory. It's great to have when you need it though.

    Even for lowres, external memory is pretty useful, since you can use the external memory as a backbuffer and still have plenty of bandwidth left for other stuff.

  • evanh Posts: 15,171

    A smartpin begins cycling one sysclock (or tick) later than the DIRH instruction.
    A streamer begins cycling two sysclocks (or ticks) later than the XINIT instruction.
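
As a rough illustration of what those two start-up latencies imply, the Python sketch below places DIRH and XINIT on a tick timeline. It assumes every instruction takes 2 ticks and that DIRH is followed by WXPIN, WYPIN and then XINIT, as in the write routine earlier in the thread; the reference point (instruction completion) is also an assumption, although the resulting difference is the same for any reference point applied to both instructions.

```python
TICKS_PER_INSTRUCTION = 2   # assumed: each instruction here takes 2 sysclock ticks
SMARTPIN_START_LAG = 1      # from the post: smartpin cycles 1 tick after DIRH
STREAMER_START_LAG = 2      # from the post: streamer cycles 2 ticks after XINIT


def completion_ticks(sequence):
    """Map each mnemonic to the tick at which it completes (assumed reference point)."""
    return {name: (i + 1) * TICKS_PER_INSTRUCTION for i, name in enumerate(sequence)}


seq = ["dirh", "wxpin", "wypin", "xinit"]
done = completion_ticks(seq)
smartpin_start = done["dirh"] + SMARTPIN_START_LAG
streamer_start = done["xinit"] + STREAMER_START_LAG

# Number of ticks by which the smartpin's timer leads the streamer's first cycle
print(streamer_start - smartpin_start)
```

Under these assumptions the smartpin's timer is already running several ticks before the streamer starts, which is the kind of offset a stretched first smartpin period would have to absorb.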

  • How about after a WYPIN in clock transition mode?

  • evanh Posts: 15,171
    edited 2021-04-12 01:17

    Depends on the phase of the NCO cycle inside the smartpin. That only stops (reset) when DIR is low. Just like the PWM modes, the next action is buffered until the next NCO rollover.
    ie: There is no timing distinction between zero pulses and non-zero.
    Managing this in the streamer is why XINIT and XZERO exist. So you can think of the smartpin modes as all being XCONTs by default with a DIRL/DIRH pair serving as an XINIT.

  • evanh Posts: 15,171

    One of the side effects of that in those two modes, TRANSITION and PULSE, is the very first "base period" (NCO cycle) can never be utilised. It always cycles as a zero in Y, because Y is cleared while DIR is low. That detail can cause confusion when you are expecting an immediate start to the pulses/steps at the subsequent WYPIN.

  • evanh Posts: 15,171
    edited 2021-04-12 02:35

    Hmm, err, calling it an NCO in the smartpins is wrong. I've borrowed that from the streamer docs - which does use an NCO. The smartpins use the simpler countdown timer method for their "period"s. EDIT: So there is no XZERO equivalent needed.
    EDIT: Here's updated source comments:

            dirl    #WEPIN
            wxpin   sp_wrbytes, #WEPIN  'tuned compensation delay, stretches the first cycle
            dirh    #WEPIN          'restart the smartpin's cycle timer
            wxpin   sp_fast, #WEPIN     'go fast on next cycle
            wypin   #4, #WEPIN      'start the WE smartpin pulses on next cycle
    
  • Looking at my HyperRAM write code (simplest register case) I see this. Looks like I only needed a waitx with clkdelay set to 1 for unregistered clock output pins, 0 for registered pin. I was using a sysclk/4 output clock though with sysclk/2 write transfers (DDR). This should apply for 8ns SRAM at 250MHz P2 with 62.5MB/s writes. I guess your code will be violating timing with 4ns write pulses. I'd prefer to stick to 6.5ns to be sure the writes are solid.

                                drvl    cspin                   'active chip select
                                drvl    datapins                'enable the DATA bus
                                fltl    clkpin                  'disable Smartpin clock output mode
                                wxpin   #2, clkpin              'configure for 2 clocks between transitions
                                drvh    clkpin                  'enable Smartpin   
                                setxfrq xfreq2                  'setup streamer frequency (sysclk/2)
                                waitx   clkdelay                'odd delay shifts clock phase from data
                                xinit   ximm4, addrhi           'send 4 bytes of addrhi data
                                wypin   count, clkpin           'start memory clock output 
                                xcont   ximm, addrlo            'send 2 or 4 bytes of addrlo + data
                if_z            xcont   xhub, hubdata           'optionally stream burst data from hub
                                waitxfi                         'wait for streamer to end
                                fltl    datapins                'tri-state DATA bus
                                drvh    cspin                   'de-assert chip select
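
The numbers in this post can be checked with a little arithmetic. The Python sketch below is only a back-of-envelope aid: it assumes one byte per write cycle and a 50% duty WE pulse, and it takes the 6.5 ns figure from the post as the minimum pulse width (the real limit should come from the SRAM datasheet). The function name `write_timing` is made up for this sketch.

```python
def write_timing(sysclk_hz, ticks_per_byte):
    """Return (bytes per second, WE low time in ns), assuming a 50% duty WE pulse."""
    tick_ns = 1e9 / sysclk_hz
    cycle_ns = tick_ns * ticks_per_byte
    rate = sysclk_hz / ticks_per_byte
    return rate, cycle_ns / 2


MIN_WE_PULSE_NS = 6.5   # figure quoted in the post; check the actual SRAM datasheet

for ticks in (2, 4):
    rate, we_low_ns = write_timing(250_000_000, ticks)
    verdict = "ok" if we_low_ns >= MIN_WE_PULSE_NS else "too short"
    print(f"sysclk/{ticks}: {rate / 1e6:.1f} MB/s, WE low {we_low_ns:.1f} ns ({verdict})")
```

With a 250 MHz sysclock this gives 4 ns pulses at sysclk/2 and 8 ns pulses at sysclk/4, matching the 62.5 MB/s figure for the slower rate.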
    
  • Scroungre Posts: 161
    edited 2021-04-12 02:37

    Hm! Interesting thread, because I want to use external SRAM on my P2 as well!! Thanks!

    However, I've a thought that could conserve a lot of pins - use an external counter chip for the low* address pins.

    For example, if in my application (not video) I need to read and write always a block of 256 bytes then I can use an external eight-bit counter chip for eight address pins. This (presuming it wraps appropriately) needs only one pin for eight addresses - a 'counter increment' pulse. Bigger blocks just use a larger counter and save another pin each step up. The blocks do have to be of the size 2^x** bytes.

    For slightly more flexibility, another counter control pin, 'reset', might come in handy, as well as a method of very quickly toggling the 'counter increment' in case you need a quick way to get 'back where you came from'.

    This also defeats a large part of the 'Random' in 'Static Random Access Memory', but if you don't need it, you can save a lot of pins. 'ta! S.

    ETA: Another useful trick might be what they used to call 'paging', wherein the external counter chip counts which page you are on, the high SRAM bits, and the P2 controlling the low SRAM addresses allows free and random access to that page - hit the counter increment pulse to advance to the next page.

    * As pointed out before, most SRAMs don't care if you scramble the address (or data) pin arrays.
    ** Wherein X is an integer. So there, pedants. ;-P
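
The pin arithmetic behind the counter and paging ideas can be sketched as follows. This is an illustrative Python snippet only: the function names are invented, it counts a single 'increment' pin and ignores the optional 'reset' pin, and it assumes the counter wraps at the block size.

```python
def pins_saved(block_bytes):
    """Address pins freed when a 2**x byte block is stepped by one 'increment' pin."""
    if block_bytes < 2 or block_bytes & (block_bytes - 1):
        raise ValueError("block size must be a power of two, at least 2")
    counter_bits = block_bytes.bit_length() - 1   # x in 2**x
    return counter_bits - 1                       # x address pins become 1 increment pin


def sram_address(page, offset, offset_bits):
    """Paging variant: the external counter supplies the high bits, the P2 the low bits."""
    if not 0 <= offset < (1 << offset_bits):
        raise ValueError("offset does not fit in the low address bits")
    return (page << offset_bits) | offset


print(pins_saved(256))                  # 8-bit counter replaces 8 address pins with 1
print(hex(sram_address(3, 0x2A, 8)))    # page 3, byte 0x2A within a 256-byte page
```

Each doubling of the block size adds one counter bit and therefore frees one more pin, which is the "save another pin each step up" point in the post.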
  • evanh Posts: 15,171
    edited 2021-04-12 02:52

    I'd use a preprogrammed PAL/CPLD chip as the external counter. Then it can be placed on the 8-bit databus as well. With this arrangement the entire address would be loaded into it a byte at a time. It does give you much lower access latency and avoids the refresh complications of PS(D)RAMs.

    Could get creative with features like single byte sized address updates packed into the CPLD.

  • evanh Posts: 15,171
    edited 2021-04-12 04:43

    [Buggy example removed] not a flaw after all. :)

  • evanh Posts: 15,171
    edited 2021-04-12 04:42

    Ah... and it's another DOH! That one isn't any help to others, time for a delete. :(
