SRAM as VGA Buffer

rogloh · 2021-04-11 15:08

@evanh said:
I'm full steam ahead using your REP'd ADD for incremental address generation. It's perfect for sysclock/2.

Yes it is perfect. An SRAM will perform quite well for mid range video resolutions, not quite as fast as HyperRAM and PSRAM configurations, but still reasonable.
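
As a rough check on the "mid range video" remark, here is a minimal Python sketch comparing the byte rate a display mode needs during active video against the peak rate of a byte-wide bus stepping at sysclk/2. The 250 MHz sysclk and the function name are assumptions for illustration only; the pixel clocks are the usual 60 Hz VESA values. It ignores write traffic and any per-burst setup overhead.

```python
def bus_fraction(pixel_clock_hz, bytes_per_pixel, sysclk_hz=250_000_000, divisor=2):
    """Fraction of the peak byte-wide bus rate consumed by video fetches."""
    peak_bytes_per_s = sysclk_hz / divisor            # one byte per streamer cycle
    needed_bytes_per_s = pixel_clock_hz * bytes_per_pixel
    return needed_bytes_per_s / peak_bytes_per_s

# Standard 60 Hz pixel clocks in Hz
MODES = {
    "640x480@60": 25_175_000,
    "800x600@60": 40_000_000,
    "1024x768@60": 65_000_000,
}

for name, pclk in MODES.items():
    for bytes_per_pixel in (1, 2):
        share = bus_fraction(pclk, bytes_per_pixel)
        print(f"{name} {bytes_per_pixel * 8}bpp: {share:.0%} of peak")
```

Anything that comes out near or above 100% leaves no room for writes, which is roughly where "mid range" stops under these assumptions.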

rogloh · 2021-04-11 15:21

Read bursts are something like this:

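read_common
                            setword xrecvdata, c, #0        'adjust the byte transfer clocks needed in streamer
                            setxfrq xfreq2                  'setup streamer frequency (sysclk/2)
                            setq    mask                    'mask selects which OUTA bits MUXQ may change
patchport0                  muxq    outa, addr1             'get start address of transfer
                            wrpin   regdatabus, datapins    'setup if pins registered or not

                            drvl    cspin                   'drive CS low for the burst
                            drvl    oepin                   'drive OE low so the SRAM drives the data bus
                            xinit   xrecvdata, #0           'start the streamer
                            waitx   delay                   'tuned delay before the address starts stepping
                            rep     #1, c                   'repeat the next instruction c times
patchport1                  add     outa, #1                'step to the next address
                            call    resume
                            drvh    oepin                   'OE back high
                  _ret_     drvh    cspin                   'CS back high and return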

evanh · 2021-04-11 15:23

Doh! /me a dummy. Total red-herring above. I've got the address and data pins registered but not the control pins.

rogloh · 2021-04-11 15:29

I think the timing can be figured out fully without an SRAM board to play with. Writes are easy, but reads just need another COG to drive patterns onto the data bus, which can then be scoped to learn what comes back into the driver as the delay is varied.

evanh · 2021-04-11 15:37

I've got linear writes looking good. Worked out how to add a controlled delay to the start of the smartpin for WE pulses.
Here's the routine:

sram_wrbytes
        rdfast  fifo_nb, hub_addr   'start FIFO
        setword stm_rf8, length, #0 'count of streamer transfer cycles
        mov     outa, sr_addr       'preset address bus
        outh    #CEPIN              'enable SRAM

        dirl    #WEPIN
        wxpin   sp_slow, #WEPIN     'tuned compensation delay on the NCO
        dirh    #WEPIN              'restart the smartpin's NCO
        wxpin   sp_fast, #WEPIN     'go fast on next NCO rollover
        wypin   length, #WEPIN      'start the WE smartpin pulses on next NCO rollover

        xinit   stm_rf8, #0         'start the data, conveniently occurs just 2 ticks ahead of the REP'd ADD

        rep     @.rloop, length     'cycle through the addresses
        add     outa, srcmd_add     'sr_addr + 1
.rloop
_ret_   outl    #CEPIN              'disable SRAM

EDIT: Updated some comments in the source code

And scope screenshot of prop2 running at 4 MHz sysclock:

Orange trace is CE
Blue trace is WE (with 5 write pulses) from the smartpin
Pink trace is A0 (starting high and toggling 5 more times, one more than needed)
Green trace is D0 (%10101) from the streamer

rogloh · 2021-04-11 15:44

Very good, you are probably a step ahead of me at this point evanh

I like the idea of including the SRAM as an option into this growing suite of memory drivers, it'll be handy for people who'd like to play with traditional RAM. We could probably do a 16 bit wide variant too to double the rate but that's really burning up the pins. Byte wide memory is so much easier with streaming directly and not dealing with the various alignment issues that crop up otherwise.

evanh · 2021-04-11 15:47

Would be nice if there's a way to trim that smartpin trick down to fewer than five instructions.

rogloh · 2021-04-11 15:54

Yeah, using smartpins can sometimes take a few instructions to set up when aligning the clock. Can't we just have the WE pin ready to go, trigger the clock transitions with WYPIN, have the WE pin output inverted, and put the delay between/after the streamer start and/or the rep loop? Does the write clock come out too soon or too late if you do that? Another way to do it is to issue a dummy streamer command that doesn't stream into/out of HUB but sets up the needed delay, but that is going to be at the sysclk/4 rate.

evanh · 2021-04-11 16:13

What I've done is rock solid consistent. The timing is not affected by pre-existing states like hub slot alignment or NCO phases. Both the XINIT and the DIRH/L are important to bring the three processing units (cog/streamer/smartpin) in phase with each other.

I think those five instructions have to stay.

That said, I can wangle it without the DIRH/L pair but it depends on the cog staying on a 2-tick/instruction regular beat. If the cog execution gets shifted to the alternate phase it throws off instruction timing with respect to the smartpin's NCO phase.

rogloh · 2021-04-11 16:16

It's a crying shame about the P28-P31 clock stuff on P2-EVAL. We could otherwise make a really neat SRAM breakout that fits within less than half of the P2-EVAL outline (instead of hanging outside of it) and consumes the port A connectors on P24-P31 for the SRAM Data bus then P32-P55 for Address and Control on Port B. I guess it can still be done with Port A alone, but will have to be a larger board for that. Pity.

evanh · 2021-04-11 16:17

The streamer commands are much better built for this than the smartpin handling is.

rogloh · 2021-04-11 16:25

@evanh said:
What I've done is rock solid consistent. The timing is not affected by pre-existing states like hub slot alignment or NCO phases. Both the XINIT and the DIRH/L are important to bring the three processing units (cog/streamer/smartpin) in phase with each other.

I think those five instructions have to stay.

That said, I can wangle it without the DIRH/L pair but it depends on the cog staying on a 2-tick/instruction regular beat. If the cog execution gets shifted to the alternate phase it throws off instruction timing with respect to the smartpin's NCO phase.

Yeah, I've been there before with the HyperRAM and the various options like sysclk/1 vs sysclk/2 and registered/unregistered clocks. It's not an easy thing to solve sometimes.

evanh · 2021-04-11 16:28

Actually, you know what? Trimming more off would be bad anyway, because the RDFAST needs that time to fill the FIFO. I've got it set for non-blocking execution.

rogloh · 2021-04-11 16:31

@evanh said:
Actually, you know what? Trimming more off would be bad anyway, because the RDFAST needs that time to fill the FIFO. I've got it set for non-blocking execution.

Yes the FIFO needs time to fill. I tend to put that RDFAST instruction somewhere early in my code for that reason.
Getting late here, must be almost dawn in NZ. Wrapping up.

evanh · 2021-04-11 16:36

Yep, got up at 2:30 PM. I should go to bed too.

tritonium · 2021-04-11 17:48

Hi

Scratching my head, trying to remember, but using static RAM like the 6116 many years ago, I don't think strobing CE (CS) and OE was necessary for block reads: just set OE and CS, then change the address and read (after a slight delay). Writes have to be done by strobing the write line, but not reads.

Of course memories (human) are fallible...

Dave

Surac · 2021-04-11 20:29

If only we had more RAM inside the P2. That would nullify the need for external memory for video out.

rogloh · 2021-04-11 23:23

@tritonium said:
Hi

Scratching my head, trying to remember, but using static RAM like the 6116 many years ago, I don't think strobing CE (CS) and OE was necessary for block reads: just set OE and CS, then change the address and read (after a slight delay). Writes have to be done by strobing the write line, but not reads.

Of course memories (human) are fallible...

Dave

You're still right. Most standard SRAMs can just tie OE and CS low throughout a transfer and change only the address to read new data. They are asynchronous. I use this fact in the code above to get the fastest read rates, while writes need the pulsed WE pin to clock in the new data.

rogloh · 2021-04-11 23:27

@Surac said:
If only we had more RAM inside the P2. That would nullify the need for external memory for video out.

For full screen framebuffers/GUIs etc at high resolution, yeah the 512k can be rather limiting at times, but for other applications in text modes or with sprite drivers that race the beam etc, it's not a big deal and you don't have to have the external memory. It's great to have when you need it though.

Wuerfel_21 · 2021-04-11 23:45

@rogloh said:

@Surac said:
If only we had more RAM inside the P2. That would nullify the need for external memory for video out.

For full screen framebuffers/GUIs etc at high resolution, yeah the 512k can be rather limiting at times, but for other applications in text modes or with sprite drivers that race the beam etc, it's not a big deal and you don't have to have the external memory. It's great to have when you need it though.

Even for low-res, external memory is pretty useful, since you can use it as a backbuffer and still have plenty of bandwidth left for other stuff.

evanh · 2021-04-12 00:25

A smartpin begins cycling one sysclock (or tick) later than the DIRH instruction.
A streamer begins cycling two sysclocks (or ticks) later than the XINIT instruction.
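
To make the two start latencies concrete, here is a small Python bookkeeping sketch. It assumes the cog stays on a regular 2-tick-per-instruction beat and that both latencies are measured from the same reference point within their instruction; the function name is made up for illustration, and nothing here is measured on hardware.

```python
TICKS_PER_INSTRUCTION = 2   # assumption: cog on a regular 2-tick beat
SMARTPIN_START_LAG = 1      # ticks after DIRH, as stated in the post
STREAMER_START_LAG = 2      # ticks after XINIT, as stated in the post

def start_offset(instructions_after_dirh):
    """Ticks from the smartpin starting to the streamer starting.

    instructions_after_dirh: how many instructions after DIRH the XINIT
    sits (1 means XINIT immediately follows DIRH).
    """
    xinit_issue_tick = instructions_after_dirh * TICKS_PER_INSTRUCTION
    streamer_start = xinit_issue_tick + STREAMER_START_LAG
    smartpin_start = SMARTPIN_START_LAG
    return streamer_start - smartpin_start

# In sram_wrbytes the order is DIRH, WXPIN, WYPIN, XINIT,
# so XINIT is the third instruction after DIRH.
print(start_offset(3))   # 7
```

Under those assumptions the smartpin is already 7 ticks into its first cycle when the streamer starts, which is the kind of gap the stretched first cycle has to absorb.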

rogloh · 2021-04-12 00:40

How about after a WYPIN in clock transition mode?

evanh · 2021-04-12 01:11

Depends on the phase of the NCO cycle inside the smartpin. That only stops (reset) when DIR is low. Just like the PWM modes, the next action is buffered until the next NCO rollover.
i.e. there is no timing distinction between zero pulses and non-zero.
Managing this in the streamer is why XINIT and XZERO exist. So you can think of the smartpin modes as all being XCONTs by default with a DIRL/DIRH pair serving as an XINIT.

evanh · 2021-04-12 01:54

One of the side effects of that in those two modes, TRANSITION and PULSE, is the very first "base period" (NCO cycle) can never be utilised. It always cycles as a zero in Y, because Y is cleared while DIR is low. That detail can cause confusion when you are expecting an immediate start to the pulses/steps at the subsequent WYPIN.

evanh · 2021-04-12 01:59

Hmm, err, calling it an NCO in the smartpins is wrong. I've borrowed that from the streamer docs, and the streamer does use an NCO. The smartpins use the simpler countdown timer method for their "period"s. EDIT: So there is no XZERO equivalent needed.
EDIT: Here's updated source comments:

        dirl    #WEPIN
        wxpin   sp_wrbytes, #WEPIN  'tuned compensation delay, stretches the first cycle
        dirh    #WEPIN              'restart the smartpin's cycle timer
        wxpin   sp_fast, #WEPIN     'go fast on next cycle
        wypin   #4, #WEPIN          'start the WE smartpin pulses on next cycle

rogloh · 2021-04-12 02:13

Looking at my HyperRAM write code (simplest register case) I see this. It looks like I only needed a waitx with clkdelay set to 1 for unregistered clock output pins and 0 for registered pins. I was using a sysclk/4 output clock though, with sysclk/2 write transfers (DDR). This should apply for 8 ns SRAM on a 250 MHz P2 with 62.5 MB/s writes. I guess your code will be violating timing with 4 ns write pulses. I'd prefer to stick to 6.5 ns to be sure the writes are solid.

                            drvl    cspin                   'active chip select
                            drvl    datapins                'enable the DATA bus
                            fltl    clkpin                  'disable Smartpin clock output mode
                            wxpin   #2, clkpin              'configure for 2 clocks between transitions
                            drvh    clkpin                  'enable Smartpin   
                            setxfrq xfreq2                  'setup streamer frequency (sysclk/2)
                            waitx   clkdelay                'odd delay shifts clock phase from data
                            xinit   ximm4, addrhi           'send 4 bytes of addrhi data
                            wypin   count, clkpin           'start memory clock output 
                            xcont   ximm, addrlo            'send 2 or 4 bytes of addrlo + data
            if_z            xcont   xhub, hubdata           'optionally stream burst data from hub
                            waitxfi                         'wait for streamer to end
                            fltl    datapins                'tri-state DATA bus
                            drvh    cspin                   'de-assert chip select
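
The numbers in that post can be reproduced with a few lines of Python. This is only an arithmetic sketch: it assumes one byte and one WE pulse per bus cycle, a WE pulse that is low for half the cycle, and it treats the 6.5 ns figure as the shortest write pulse one is willing to use (check the datasheet of the actual SRAM). The function and constant names are made up for illustration.

```python
def write_timing(sysclk_hz, divisor, we_low_fraction=0.5):
    """Byte rate and WE low time for a byte-wide bus clocked at sysclk/divisor."""
    tick_ns = 1e9 / sysclk_hz
    cycle_ns = tick_ns * divisor
    return {
        "bytes_per_s": sysclk_hz / divisor,
        "we_pulse_ns": cycle_ns * we_low_fraction,
    }

MIN_WE_PULSE_NS = 6.5   # figure quoted in the post, not taken from a datasheet

for divisor in (2, 4):
    t = write_timing(250_000_000, divisor)
    ok = t["we_pulse_ns"] >= MIN_WE_PULSE_NS
    print(f"sysclk/{divisor}: {t['bytes_per_s'] / 1e6:.1f} MB/s, "
          f"WE low {t['we_pulse_ns']:.1f} ns, meets {MIN_WE_PULSE_NS} ns: {ok}")
```

At 250 MHz this gives a 4 ns pulse at sysclk/2 and an 8 ns pulse with 62.5 MB/s at sysclk/4, matching the figures quoted above.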

Scroungre · 2021-04-12 02:19

Hm! Interesting thread, because I want to use external SRAM on my P2 as well!! Thanks!

However, I've a thought that could conserve a lot of pins - use an external counter chip for the low* address pins.

For example, if in my application (not video) I need to read and write always a block of 256 bytes then I can use an external eight-bit counter chip for eight address pins. This (presuming it wraps appropriately) needs only one pin for eight addresses - a 'counter increment' pulse. Bigger blocks just use a larger counter and save another pin each step up. The blocks do have to be of the size 2^x** bytes.

For slightly more flexibility, another counter control pin, 'reset', might come in handy, as well as a method of very quickly toggling the 'counter increment' in case you need a quick way to get 'back where you came from'.

This also defeats a large part of the 'Random' in 'Static Random Access Memory', but if you don't need it, you can save a lot of pins. 'ta! S.

ETA: Another useful trick might be what they used to call 'paging', wherein the external counter chip counts which page you are on (the high SRAM address bits), and the P2, controlling the low SRAM addresses, has free and random access to that page. Hit the counter increment pulse to advance to the next page.

* As pointed out before, most SRAMs don't care if you scramble the address (or data) pin arrays.
** Wherein X is an integer. So there, pedants. ;-P
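
The pin saving from the external counter idea is simple to tally. The Python sketch below is illustrative only: the 19-bit address width (a 512K x 8 part) and the function name are assumptions, not something taken from the thread.

```python
def p2_address_pins(addr_bits, counter_bits, with_reset=False):
    """P2 pins needed for addressing when an external counter drives the low bits."""
    if not 0 <= counter_bits <= addr_bits:
        raise ValueError("counter_bits must be between 0 and addr_bits")
    direct = addr_bits - counter_bits            # address lines still driven by the P2
    control = 0
    if counter_bits:
        control += 1                             # 'counter increment' pulse
        if with_reset:
            control += 1                         # optional 'reset' line
    return direct + control

ADDR_BITS = 19                                   # hypothetical 512K x 8 SRAM
for counter_bits in (0, 8, 12):
    pins = p2_address_pins(ADDR_BITS, counter_bits, with_reset=True)
    print(f"{counter_bits}-bit counter: block of {2 ** counter_bits} bytes, {pins} address pins")
```

Each extra counter bit doubles the block size and removes one more direct address pin, at the cost of sequential-only access inside the block.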

evanh · 2021-04-12 02:47

I'd use a preprogrammed PAL/CPLD chip as the external counter. Then it can be placed on the 8-bit databus as well. With this arrangement the entire address would be loaded into it a byte at a time. It does give you much lower access latency and avoids the refresh complications of PS(D)RAMs.

Could get creative with features like single byte sized address updates packed into the CPLD.

evanh · 2021-04-12 04:06

[Buggy example removed] not a flaw after all.

evanh · 2021-04-12 04:20

Ah... and it's another DOH! That one isn't any help to others, time for a delete.
