Memory drivers for P2 - PSRAM/SRAM/HyperRAM (was HyperRAM driver for P2)

evanh · 2021-08-02 05:20

@Scroungre said:
Just in the hardware. It seemed mostly okay, off and on, up to about 150MHz, sysclk/2, registered clk pin, but after that it was just hopeless. Sysclk/1 was worse. This is completely unacceptable for my application - looks like I'm going back to SRAM and a kajillion pins again.

To go above 150 MHz sysclock/2 the delay compensation value has to be increased by one. And increased again at even higher frequencies.

The layout I suspect is not good - I'm using a P2 Edge on a Jon McPhalen "breadboard" adapter, and the module is plugged into 'base' pin 0 (closest to the chip!). I have not stuck a 22pF capacitor on it.

The added capacitor is only to allow writes at sysclock/1. Adding it actually makes reads less reliable. Writes are rock solid at sysclock/2, across all frequencies, without the capacitor.

rogloh · 2021-08-02 05:23

Yeah, that is interesting @Scroungre. The JonnyMac board potentially has much longer trace lengths than the P2-EVAL, which is where all the HyperRAM driver development and testing has taken place and which also determined the default timing profile.

The memory timing might be tweakable if you run the included delay test program to find the delay breakpoints in your setup and adjust the driver code accordingly in the area of code shown below in memory.spin2. I typically try to have them transition in the center of the overlapping bands which is why I like to see the percentage of transfers that fail tapering off rather than a simple binary good/bad result in the output.

If that doesn't help at all in the different plug in locations I think what will be needed for the JonnyMac will be to use the upcoming PSRAM based Edge board, and not the HyperRAM breakout.

With HyperRAM 1.0 devices running at 3.3V, clocking them above 100MHz (which is a P2 at 200MHz with sysclk/1 DDR operation) is actually overclocking, but at room temperature I have typically found we can push them to about 150MHz (300MHz P2) before they completely fail (on the P2-EVAL board).

{
Below are the P2 frequency bands with the input timing delay & registered data pins correction 
factors to get the memory reads working on the P2.  This timing is potentially also affected by 
temperature/process/voltage etc, and may need further tweaking in a given setup or whenever non 
room temp operating conditions are experienced - YMMV.

Thanks go to @ozpropdev and @evanh for experimentally determining operating ranges that helped
figure some of this out below!

Default delay profiles used for HyperFlash and HyperRAM on P2-EVAL HyperRAM breakout board 
operating at room temp. This can be tweaked or others added for different temperatures.
These delay profiles can be assigned to each configured device at address mapping time.
The actual operating input delay can also be adjusted on the fly per bank if the variation 
of delay with temperature is already determined and the temperature is known/measurable.
}
'sysclk/1
HyperRamDelays1         long 6,92_000000,135_000000,188_000000,234_000000,280_000000,308_000000,0
HyperRamDelaysUnreg1    long 6,88_000000,120_000000,180_000000,225_000000,270_000000,305_000000,0
'sysclk/2
HyperRamDelays2         long 7,92_000000,135_000000,188_000000,232_000000,280_000000,307_000000,0
HyperRamDelaysUnreg2    long 7,89_000000,135_000000,180_000000,222_000000,266_000000,303_000000,0

HyperFlashDelays1       long 5,70_000000,110_000000,160_000000,225_000000,277_000000,320_000000,0
HyperFlashDelaysUnreg1  long 5,70_000000,105_000000,150_000000,210_000000,260_000000,315_000000,0

HyperFlashDelays2       long 6,70_000000,110_000000,160_000000,225_000000,277_000000,320_000000,0
HyperFlashDelaysUnreg2  long 6,70_000000,105_000000,150_000000,210_000000,260_000000,315_000000,0

PSRamDelays             long 7,92_000000,150_000000,206_000000,258_000000,310_000000,333_000000,0

' SRAM delay as yet untested
SRamDelays              long 4,26_000000,92_000000,150_000000,206_000000,258_000000,310_000000,333_000000,0
{
The profile format begins with the initial delay value, followed by the frequencies at which the
delay is incremented by one.  The delay keeps increasing until the operating frequency falls below
the next listed frequency, or the list terminates with a zero.  Frequencies must be stored in
increasing order.

e.g. using HyperRam data above for sysclk/1 unregistered clock:
   if            0 <= freq <  88000000 Hz, the delay compensation value is 6,
   if     88000000 <= freq < 120000000 Hz, the delay compensation value is 7,
   if    120000000 <= freq < 180000000 Hz, the delay compensation value is 8,
                   ...etc...
}
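The lookup rule described in that comment can be sketched in Python as follows. This is only an illustration, assuming the profile is scanned left to right; the helper name `delay_for_frequency` is hypothetical, and the real driver does this in its own Spin2/PASM2 code, possibly differently. Only the table values are taken from the listing.

```python
# Profile copied from HyperRamDelaysUnreg1 in the listing above:
# initial delay, then ascending frequency thresholds, then a 0 terminator.
HYPERRAM_DELAYS_UNREG1 = [6, 88_000_000, 120_000_000, 180_000_000,
                          225_000_000, 270_000_000, 305_000_000, 0]


def delay_for_frequency(profile, freq_hz):
    """Return the delay compensation value for freq_hz (hypothetical helper)."""
    delay = profile[0]
    for threshold in profile[1:]:
        # Stop at the terminator or at the first threshold above freq_hz.
        if threshold == 0 or freq_hz < threshold:
            break
        delay += 1
    return delay


if __name__ == "__main__":
    for mhz in (50, 88, 150, 200, 300):
        hz = mhz * 1_000_000
        print(mhz, "MHz ->", delay_for_frequency(HYPERRAM_DELAYS_UNREG1, hz))
```

Running the same helper against a profile adjusted from your own delay test output is a quick way to check that the breakpoints land where you expect before editing the driver.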

evanh · 2021-08-02 05:28

Oh, oops, right, the JonnyMac may be a bad choice. The signal is probably rounding off and fading away (attenuating) too much. Maybe 75 MHz is as good as that can do.

evanh · 2021-08-02 05:31

@rogloh said:
If that doesn't help at all in the different plug in locations I think what will be needed for the JonnyMac will be to use the upcoming PSRAM based Edge board, and not the HyperRAM breakout.

I haven't seen a schematic for the RAM&Edge board. The Prop2 pins used for the RAMs better not also be routed to the edge connector.

rogloh · 2021-08-02 05:33

No they don't route to the edge connector - I also have checked that with Parallax...

Scroungre · 2021-08-02 06:57

Using rogloh's test software the 'delay' value runs up to thirteen, but it doesn't help. I poked around a little looking at some of the default values, and got confused. I'll poke around a bit more. S.

rogloh · 2021-08-02 06:58

Can you post the delay test results @Scroungre ?

rogloh · 2021-08-02 07:59

Here's what I see when I run the delay test on a JonnyMac (at base pin 0) vs P2-EVAL (at base pin 16). Exact same test image and HyperRAM board, but I see problems on the JonnyMac. Looks like poorer signal integrity on the JonnyMac board. First result in the file below is the JonnyMac, the second is the P2-EVAL.

evanh · 2021-08-02 08:02

Ouch! That's horrible. Need to look below 50 MHz too.

rogloh · 2021-08-02 08:36

Well the traces from the Edge connector to the P0-P15 pins are already up to 3-4 inches long and skew might be a problem.

If you need high speed with parallel busses it might be better to use the P2-EVAL. Not sure the JonnyMac was designed with high speed signal integrity in mind.

Scroungre · 2021-08-02 13:36

@rogloh said:
Here's what I see ...

That's almost exactly what I see too. I can post my results if you still want me to, but there's not much point, in my opinion. S.

eta: PS, you're probably right, that it's better to use an Eval for this - or even better, a P2 Edge with PsRAM on-board!! (hint hint? ;-) S.

rogloh · 2021-08-02 14:43

@Scroungre said:

@rogloh said:
Here's what I see ...

That's almost exactly what I see too. I can post my results if you still want me to, but there's not much point, in my opinion. S.

eta: PS, you're probably right, that it's better to use an Eval for this - or even better, a P2 Edge with PsRAM on-board!! (hint hint? ;-) S.

Yeah don't worry about it if it looks about the same as mine. Will very likely be the same issue.

A PSRAM Edge will likely be the way to go for the JonnyMac breadboard users if they want lots of fast external memory, once it becomes available. It should work pretty well with video in particular.

Also if you have a P2 EVAL, building your own PSRAM based breakout can be pretty inexpensive too and easy to solder as I have discovered. Those memories are less than $2 USD from Adafruit or Mouser etc.

Scroungre · 2021-08-02 15:55

I was wondering about fiddling with your 'mailbox' format.

In the application I have in mind each (of four or five) "calculation" cog will want to collect three or four longs from external RAM on a fairly regular basis - and it would be nice to have one custom request that asks for several (possibly non-contiguous) longs from the external RAM.

Writing can be done one long at a time, slowly, offline.

Perhaps (up to a limit: there's actually quite a lot of Hub RAM) there would be a way for a cog to specify several external RAM addresses at once, and expect back data from all of them once the cog doing the transfer is done.

I was also going to put ring buffers in the "calculation" cogs to help keep up with the "eggbeater". This isn't video - that's your department ;-) Just systems control.

So adjustable 'mailbox' sizes would be nice. Anyhow - I guess if I want it, I should get on with writing it, huh? ;-) (I can save some space because I don't care about bytes and words - I'm doing everything in 32 bits). S.

rogloh · 2021-08-03 00:43

@Scroungre said:
I was wondering about fiddling with your 'mailbox' format.

In the application I have in mind each (of four or five) "calculation" cog will want to collect three or four longs from external RAM on a fairly regular basis - and it would be nice to have one custom request that asks for several (possibly non-contiguous) longs from the external RAM.

Writing can be done one long at a time, slowly, offline.

Perhaps (up to a limit: there's actually quite a lot of Hub RAM) there would be a way for a cog to specify several external RAM addresses at once, and expect back data from all of them once the cog doing the transfer is done.

Yeah, that type of operation is supported in the driver. Instead of a single memory request you can construct a linked list of n independent requests ahead of time and pass the address of the head of this request list to the memory driver. It will work its way through the list in sequence and notify you at the end, instead of after each request. You can either block on the overall list completion, or run the list in a non-blocking mode where the COG that triggers the request can go off to do other work in the meantime and optionally check the result later whenever it is convenient, sometime before the next request list needs to be issued.

Note: When running in a list I think you need to set up burst/block transfers instead of single-item reads. But you can still set up burst transfer sizes for single bytes/words/longs in your burst request lengths, so there is no restriction on transferring these standard item sizes into your arbitrary hub addresses.
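To make the request-list idea concrete, here is a small Python model of it. It is a sketch only: the `Request` fields and `run_request_list` are hypothetical names, not the driver's actual mailbox layout or API, and the "external RAM" and "hub RAM" are plain byte arrays. It shows one list gathering three non-contiguous longs, each set up as a 4-byte burst.

```python
import struct
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    """One burst read in a linked list of requests (hypothetical layout)."""
    ext_addr: int                      # source byte address in external RAM
    hub_addr: int                      # destination byte address in hub RAM
    length: int                        # burst length in bytes
    next: Optional["Request"] = None   # following request, or None at the end


def run_request_list(head, ext_ram, hub_ram):
    """Walk the list, copy each burst, and return how many requests ran."""
    count = 0
    req = head
    while req is not None:
        data = ext_ram[req.ext_addr:req.ext_addr + req.length]
        hub_ram[req.hub_addr:req.hub_addr + req.length] = data
        count += 1
        req = req.next
    return count


if __name__ == "__main__":
    ext_ram = bytearray(64)
    for addr, value in ((0, 0x11111111), (20, 0x22222222), (48, 0x33333333)):
        struct.pack_into("<I", ext_ram, addr, value)
    hub_ram = bytearray(12)
    # Build the list back to front so each request can point at the next one.
    third = Request(ext_addr=48, hub_addr=8, length=4)
    second = Request(ext_addr=20, hub_addr=4, length=4, next=third)
    first = Request(ext_addr=0, hub_addr=0, length=4, next=second)
    done = run_request_list(first, ext_ram, hub_ram)
    print(done, [hex(v) for v in struct.unpack("<3I", hub_ram)])
```

In the real driver the caller would only be notified once, after the last request in the list, which is the part this model represents with the single return value.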

Scroungre · 2021-08-03 11:33

Hm. Dunno if that seemed useful for just three longs, but I guess that counts as 12 bytes, so... I'll tinker with it for awhile. It will be definitely useful for loading up the RAM in the first place. Thanks! S.

Scroungre · 2021-09-18 11:15

Waking up an older thread here, but here goes:

I bought a KISS0000 board and built a little adapter for it so I could use the Parallax P2 HyperFlash / HyperRam card I had, and ran some more memory tests on it.

The results were wonderful! Far, far, better than on the Jon McPhalen breadboard. 100% up to 300MHz!! (for some reason, my KISS board doesn't like anything over 300MHz. 350 seems 'right out'. Dunno why - will ask on the correct threads).

But yeah, here's my results up to 300MHz - and they're 100% at the top! Yay! S.

rogloh · 2021-09-18 14:10

That's a decent result, even via your adapter. The shorter traces on the KISS0000 board obviously help a lot. With its long path lengths the JonnyMac board is not really designed for high speed operation. You should try the sysclk/1 option as well to see if that can be achieved at any frequencies, but it pushes the timing much harder so it might not work as well in your setup. Its performance might also vary with module position on the KISS0000 board if there are differences in data bus path lengths in different positions.

Pandemic lockdown demotivation (~231 days and counting since March last year) has slowed things down and I've been messing about quite a bit with my P2ME2 and VOYAGER boards in my spare time lately but I know I want to get back and release my video driver update that includes the official external HyperRAM and PSRAM capability that it already supports. This driver will also help when the updated P2Edge board comes along with PSRAM and make it easy to use that system with external video frame buffers, or other applications.

evanh · 2021-09-18 14:16

And after those tests, try shifting the adaptor to the other end so that the hyper data bus is shorter than the hyper clock.

Scroungre · 2021-09-20 02:07

Both interesting thoughts, but if it's perfect as it is (which I'm not sure - Instead of "percentage of successful read/write cycles" I'd rather see "Errors in [say] 10,000 cycles") there's not much point in trying to make it better. Unless I can coax the KISS to 350 and find errors thereby.

I'll try your suggestions anyhow - even if it makes it worse, that's useful information. Oddly enough, unlike the Edge breadboard, the headers on the KISS are equally spaced, so I can move down eight data pins at a time (if I stick in some more headers). I don't think the code will like that, though. Maybe I can tweak it a little - more news to come later. Hafta go do carpentry now.

So far, so good, S.

evanh · 2021-09-20 04:40

You're right. I didn't even look at your supplied report file. The fails you are getting above 300 MHz are sounding like a chip reset rather than hyper timing limits. Your report, when compared to the Eval Board, shows there is more room for tight integration. The new Edge Board with RAM should be highly reliable at all clock rates.

Scroungre · 2021-09-20 06:46

It does reek a bit of a chip reset. The RAM test walks up through the frequencies, and if you go over 300MHz it clears the serial terminal window! Rude, that! Can't cut'n'paste at all, let alone logfile it (at least, not with the Propeller Serial Terminal). It also seems a really 'hard' limit - just a bit over, nothing; at 300 exactly, everything's working fine.

If it were weird power or bum caps or something, I'd expect more "flakiness" - more of a 'fuzzy' limit, where below it you get occasional errors and above it you get occasional success.

Looking forward to the Edge with RAM, although the KISS is a more convenient form factor for me. That's just me... S.

rogloh · 2021-09-20 07:08

I don't see what would be intentionally fitted in place to cause a hard P2 reset above 300MHz. If it is thermal or some current limit on the KISS board then it should probably also be load dependent, and idling a single COG at 300MHz shouldn't hit that limit. So you could write something that just clocks it at over 300MHz and doesn't do much else, and see if that works without failure. You could also monitor the reset pin when you encounter this. My own P2ME2 board can reset the P2 if the regulator power good signal is driven low, indicating the 1.8V voltage is out of range, or for a UVLO condition. Maybe your KISS board does something similar.

I think the next gen EDGE should help resolve your problems when it arrives assuming it's all designed well timing wise (should be given its small size) and its power design is solid. The P2-EVAL also works reasonably well (apart from use of P28-P31 for high speed stuff).

evanh · 2021-09-20 07:31

@Scroungre said:
... and if you go over 300MHz it clears the serial terminal window! Rude, that! Can't cut'n'paste at all, let alone logfile it (at least, not with the Propeller Serial Terminal). It also seems a really 'hard' limit - just a bit over, nothing; at 300 exactly, everything's working fine.

That's even odder than I was thinking. Time to verify the clock frequencies. Here's a quick test I've thrown together. Compiled with Flexspin. It requires an interactive terminal. It starts at 300 MHz and each key-press thereafter will bump the frequency by 1 MHz.

Monitor accuracy with an oscilloscope (or frequency counter) on pin P56. Pulse rate is sysclock/1000 so 300 MHz should read 300 kHz.

evanh · 2021-09-20 19:31

Oops, now rebuilt for a 25 MHz crystal:

Scroungre · 2021-09-21 09:06

Clock frequencies verified! I ran your 25MHz code and looked at it on a 'scope, and it does indeed run quite well at up to nearly 400! (yeah, by then it gets weird, but that's to be expected.) It's fine and does indeed display 350kHz when the serial port claims it's running at 350MHz.

Weird. With a fair amount of hackwork I managed to get your code to compile under the Propeller Spin Tool (the remaining difference!) instead of the FlexSpin compiler, and it worked okay there too.

Running Rogloh's driver code through Flexspin got me this note at the end:

299000000        (12)   0       0       0       0       0       0       0       0       100     100     100
300000000        (12)   0       0       0       0       0       0       0       0       100     100     100
Frequency 301000000 is unattainable, stopping
Exiting

because it then asks for an 'enter to continue', which it doesn't in the Propeller serial window.

This is quite confusing. I have tried going back to some other code, and it seems to be behaving now? Very strange. S.

rogloh · 2021-09-21 09:21

Aha, in theory that is because it is trying to go in steps of 1MHz but couldn't find a valid set of values using a 25MHz crystal. But it should be able to if it divides by 25 then multiplies by 300. I'll need to look at that code.

rogloh · 2021-09-21 09:26

Here's the code I use... for a given desired output frequency it should compute the clock mode value for the PLL setup, or fail with a zero returned if it couldn't find a match.

CON
' clock source used below
    #0, CLKSRC_XTAL, CLKSRC_XIN

' setup one of these based on your P2 HW input clock, 
' this will only be used if the PLL settings get automatically computed (see code below)
    'CLKIN_HZ = _xtalfreq ' also only enable CLKSRC_XTAL below as CLKSRC
    'CLKIN_HZ = _xinfreq  ' also only enable CLKSRC_XIN below as CLKSRC
    CLKIN_HZ = 20000000 ' assume 20MHz crystal by default

    CLKSRC = CLKSRC_XTAL ' enable this for crystal clock source (default)
    'CLKSRC = CLKSRC_XIN ' enable this for direct input clock source on XI (no crystal)

' parameters used when automatically determining PLL settings
    TOLERANCE_HZ = 500000    ' pixel clock accuracy will be constrained by this when no exact ratios are found
    MAXVCO_HZ    = 350000000 ' for safety, but you could try to overclock even higher at your own risk
    MINVCO_HZ    = 100000000
    MINPLLIN_HZ  = 500000    ' setting lower can find more PLL ratios but may begin to introduce more PLL jitter


PRI computeClockMode(desiredHz) : mode | vco, finput, fval, p, div, m, error, bestError
    bestError := -1
    repeat p from 0 to 30 step 2
        ' compute the ideal VCO frequency fval at this value of P
        if p <> 0
            if desiredHz > MAXVCO_HZ/p ' test it like this to not overflow
                quit
            fval := desiredHz * p
        else
            fval := desiredHz
            if fval > MAXVCO_HZ
                quit
        ' scan through D values, and find best M, retain best case
        repeat div from 1 to 64
            'compute the PLL input frequency from the crystal through the divider
            finput := CLKIN_HZ/div
            if finput < MINPLLIN_HZ ' input getting too low, and only gets lower so quit now
                quit

            ' determine M value needed for this ideal VCO frequency and input frequency
            m := fval / finput

            ' check for the out of divider range case
            if m +> 1024
                quit

            ' zero is special and gets a second chance
            if m == 0
                m++

            ' compute the actual VCO frequency at this particular M, D setting
            vco := finput * m
            if vco +< MINVCO_HZ
                quit
            if vco +> MAXVCO_HZ
                next

            ' compute the error and check next higher M value if possible, it may be closer
            error := abs(fval - vco)
            if m < 1024 and (vco + finput) +< MAXVCO_HZ
                if error > abs(fval - (vco + finput))
                    error := abs(fval - (vco + finput))
                    m++

            ' retain best allowed frequency error and divider bits found so far
            if error +< bestError and error +< TOLERANCE_HZ+1
                bestError := error
                mode := ((div-1) << 18) + ((m-1) << 8) + (((p/2 - 1) & $f) << 4)

            ' quit whenever perfect match found
            if bestError == 0
                quit

        if bestError == 0
            quit

    ' final clock mode format is this #%0000_000E_DDDD_DDMM_MMMM_MMMM_PPPP_CCSS
    if mode
        ' also set 15 or 30pF capacitor loading based on input crystal frequency
        mode |= (1<<24) ' enable PLL
        if (CLKSRC == CLKSRC_XTAL) ' enable oscillator and caps for crystal
            mode |= (CLKIN_HZ < 16000000) ? %1111 : %1011
        else
            mode |= %0111 ' don't enable oscillator
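For readers who want to check a mode value by hand, the bit layout in the comment above (`E_DDDD_DDMM_MMMM_MMMM_PPPP_CCSS`) can be unpacked with a few shifts. The Python sketch below is an illustration rather than part of the driver: `decode_clock_mode` is a hypothetical helper, and the post-divider rule (PPPP of %1111 meaning divide by 1, otherwise divide by (PPPP+1)*2) is inferred from the encoding expression `((p/2 - 1) & $f)` in the listing.

```python
def decode_clock_mode(mode, clkin_hz):
    """Unpack a P2 clock mode value into its PLL fields (hypothetical helper)."""
    pll_enabled = bool((mode >> 24) & 1)       # E bit
    div = ((mode >> 18) & 0x3F) + 1            # DDDDDD field holds div - 1
    mult = ((mode >> 8) & 0x3FF) + 1           # MMMMMMMMMM field holds m - 1
    pppp = (mode >> 4) & 0xF
    post_div = 1 if pppp == 0xF else (pppp + 1) * 2
    # Exact arithmetic here; the Spin2 listing truncates CLKIN_HZ/div first.
    sysclk_hz = clkin_hz * mult // (div * post_div)
    return {
        "pll_enabled": pll_enabled,
        "div": div,
        "mult": mult,
        "post_div": post_div,
        "osc_bits": mode & 0xF,                # CCSS field
        "sysclk_hz": sysclk_hz,
    }


if __name__ == "__main__":
    # $1000EFB is one of the mode values reported for a 20MHz crystal
    # later in this thread.
    print(decode_clock_mode(0x1000EFB, 20_000_000))
```

Decoding a value this way makes it easy to see which divider and multiplier the search settled on, for example a 1MHz PLL input (divide by 20 or 25) for the odd-MHz targets.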

rogloh · 2021-09-21 09:41

Just tried a quick test program calling the method above with settings for both 20MHz and 25MHz crystals. The code seems to work in both cases and does not return zero when it hits 301MHz. I'm not sure what is happening in your setup. I am using an older 5.3.1 version of FlexSpin in case that makes a difference. If you can run this same test code in your setup it would be worth comparing the results @Scroungre.

OBJ
    uart : "SmartSerial"
    f    : "ers_fmt"

PUB main() | fr, v
    'setup serial port output
    uart.start(BAUD)
    send := @uart.tx

    f.nl()
    send("PLL test program using crystal operating at ",f.dec(CLKIN_HZ),"Hz",13,10)
    repeat fr from 295000000 to 310000000 step 1000000
        v := computeClockMode(fr)
        f.dec(fr)
        send(" - ")
        f.hex(v)
        f.nl()
    repeat

Results:

PLL test program using crystal operating at 20000000Hz
295000000 - 10C3AFB
296000000 - 11049FB
297000000 - 14D28FB
298000000 - 12494FB
299000000 - 14D2AFB
300000000 - 1000EFB
301000000 - 14D2CFB
302000000 - 12496FB
303000000 - 14D2EFB
304000000 - 1104BFB
305000000 - 10C3CFB
306000000 - 12498FB
307000000 - 14D32FB
308000000 - 1104CFB
309000000 - 14D34FB
310000000 - 1041EFB

and

PLL test program using crystal operating at 25000000Hz
295000000 - 1103AFB
296000000 - 16127FB
297000000 - 16128FB
298000000 - 16129FB
299000000 - 1612AFB
300000000 - 1000BFB
301000000 - 1612CFB
302000000 - 1612DFB
303000000 - 1612EFB
304000000 - 1612FFB
305000000 - 1103CFB
306000000 - 16131FB
307000000 - 16132FB
308000000 - 16133FB
309000000 - 16134FB
310000000 - 1103DFB
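If you want to cross-check these tables without P2 hardware, the search can be transliterated into Python. The sketch below is a line-for-line port of the `computeClockMode` listing posted above, not an independent algorithm; the function name `compute_clock_mode` and its parameters are my own, the crystal frequency is passed in instead of coming from `CLKIN_HZ`, and Spin2's unsigned comparisons and the `bestError := -1` start value are replaced by ordinary integer comparisons and `None`.

```python
# Constants copied from the CON block in the listing above.
TOLERANCE_HZ = 500_000
MAXVCO_HZ = 350_000_000
MINVCO_HZ = 100_000_000
MINPLLIN_HZ = 500_000


def compute_clock_mode(desired_hz, clkin_hz, crystal=True):
    """Python port of computeClockMode; returns 0 if no setting is found."""
    best_error = None
    mode = 0
    for p in range(0, 31, 2):            # p == 0 stands for VCO/1
        if p != 0:
            if desired_hz > MAXVCO_HZ // p:
                break
            fval = desired_hz * p
        else:
            fval = desired_hz
            if fval > MAXVCO_HZ:
                break
        for div in range(1, 65):
            finput = clkin_hz // div     # PLL input after the divider
            if finput < MINPLLIN_HZ:
                break
            m = fval // finput
            if m > 1024:
                break
            if m == 0:
                m = 1
            vco = finput * m
            if vco < MINVCO_HZ:
                break
            if vco > MAXVCO_HZ:
                continue
            error = abs(fval - vco)
            # Try the next higher multiplier in case it lands closer.
            if m < 1024 and vco + finput < MAXVCO_HZ:
                next_error = abs(fval - (vco + finput))
                if error > next_error:
                    error = next_error
                    m += 1
            if (best_error is None or error < best_error) and error <= TOLERANCE_HZ:
                best_error = error
                mode = ((div - 1) << 18) + ((m - 1) << 8) + (((p // 2 - 1) & 0xF) << 4)
            if best_error == 0:
                break
        if best_error == 0:
            break
    if mode:
        mode |= 1 << 24                  # enable PLL
        if crystal:
            mode |= 0b1111 if clkin_hz < 16_000_000 else 0b1011
        else:
            mode |= 0b0111
    return mode


if __name__ == "__main__":
    for clkin in (20_000_000, 25_000_000):
        for hz in range(295_000_000, 311_000_000, 1_000_000):
            print(clkin, hz, format(compute_clock_mode(hz, clkin), "X"))
```

For the rows I traced by hand (295, 300, 301 and 310MHz at 25MHz; 296, 300, 301 and 310MHz at 20MHz) the port gives the same mode values as the tables above.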

Scroungre · 2021-09-21 11:50

Better! But not quite perfect... I cut and pasted your "PRI computeClockMode(desiredHz)" code and replaced the old method with that, and it ran happily up to about 348MHz before complaining about a 'set RAM Delay failed'. See attached text.

I ran all this through FlexProp 5.2. (No Optimization). It seems to behave the same with the Propeller Tool 2.5.3, except that it clears the screen when it exits, so I can't cut and paste the results.

My 'CON' block values are:

CON
   _xtlfreq = 25_000_000

    _clkfreq = 100000000

    RAM_START   = $00_000000
    FLASH_START = $02_000000
    FLASH_SIZE  = 32*1024*1024

    BAUD = 115200
    BUFSIZE = 128

' clock source used below
    #0, CLKSRC_XTAL, CLKSRC_XIN

' setup one of these based on your P2 HW input clock,
' this will only be used if the PLL settings get automatically computed (see code below)
    'CLKIN_HZ = _xtalfreq ' also only enable CLKSRC_XTAL below as CLKSRC
    'CLKIN_HZ = _xinfreq  ' also only enable CLKSRC_XIN below as CLKSRC
  '    CLKIN_HZ = 20000000 ' assume 20MHz crystal by default
    CLKIN_HZ = 25000000 ' 25MHz crystal

    CLKSRC = CLKSRC_XTAL ' enable this for crystal clock source (default)
    'CLKSRC = CLKSRC_XIN ' enable this for direct input clock source on XI (no crystal)

' parameters used when automatically determining PLL settings
    TOLERANCE_HZ = 500000    ' pixel clock accuracy will be constrained by this when no exact ratios are found
    MAXVCO_HZ    = 350000000 ' for safety, but you could try to overclock even higher at your own risk
    MINVCO_HZ    = 100000000
    MINPLLIN_HZ  = 500000    ' setting lower can find more PLL ratios but may begin to introduce more PLL jitter

PS - Using 'fast' sysclk/1 is a dead loss. Not even good up to 300 on my gizmo. S.

Scroungre · 2021-09-21 12:01

The PLL code (with the CON block and the computeClockMode method) gave me exactly the same results as yours (at 25MHz xtal). S.

PS - For grinsies, I tested out my other KISS (I had bought two) and lightly tweaked your PLL calculation numbers to run up to 360MHz. It did. See attached...
