Where to find a P2-EC32MB driver for RAM — Parallax Forums


What's the best generic driver for the 32MB on the P2-EC32MB?

Comments

  • Rayman Posts: 15,634

    Well, there's the @rogloh driver. It's very full featured, but a bit tricky to figure out.

    Then, there's the @cgracey one. It's bare bones, but easy to understand. It doesn't work with HyperRAM, though, and only works over a limited but practical frequency range.

    @Wuerfel_21 posted what looks like a barebones version of the @rogloh driver recently.

  • Rayman Posts: 15,634

    The @cgracey one is used in this PSRAM-based VGA driver here:
    https://forums.parallax.com/discussion/175725/anti-aliased-24-bits-per-pixel-hdmi/p3

  • Rayman Posts: 15,634

    The @Wuerfel_21 one is part of the teapot demo:
    https://forums.parallax.com/discussion/176083/3d-teapot-demo#latest

    file is exmem_mini.spin2

  • evanh Posts: 16,719

    A stripped-down, non-mailboxed, bare-bones driver would also be possible if that's all you want. You'd make your own messaging then.

  • The exmem_mini stuff is basically Roger's PASM drivers, but with a stripped down and opinionated Spin wrapper. The big thing is that the 16bit/8bit/4bit PSRAM and HyperRAM drivers are supported roughly equivalently, with no change needed in the code. I find this useful because I have boards requiring all of the 4 driver variants (though I don't have a dedicated 4-bit board, only the 24MB EVAL Accessory - but the idea that someone might make one in the future isn't too absurd).
    This code is actually not that recent, but the version from the latest teapot demo post (3_2 I think?) is what you want, because I tweaked settings for improved performance.

  • Rayman Posts: 15,634
    edited 2025-09-16 22:09

    Do like the ability to support all the various memory types. But, also like the simplicity of the @cgracey code. So, that's my current dilemma...

    Should I write code that doesn't support my own boards? Suppose not, but tempted to anyway...

  • Extending Chip's driver is probably not so bad. Most of the complexity in Roger's drivers is because they're full of features that are dubiously useful to begin with (e.g. trying to support byte-granular read/write DMA when the underlying hardware doesn't)
    Generally (i.e. I'm not looking at the code right now) you'd need to first make timing configurable and add multi-bank support (even if it's just forcing the extra select pins high), then it'd work with e.g. the P2Platform and the MXX thing. To adapt to the narrow PSRAM types, you really just need to change some values and possibly shift the address and transfer lengths. Starting from the 16-bit variant is advantageous there, as the access granularity is largest there (32 bit) and thus can be emulated on all narrower memories. You can see how the different PSRAM widths are handled in my RAM test program, it's just some value changes.
    Speaking of, that's probably also a decent base to work from. (unless I did something weird with the HyperRAM - it's sometimes advantageous to restrict addressing to pages, I might have done that (because the field for sub-page addresses is in a weird place))

  • (Unrelatedly, I don't think any of the drivers have been tested with those newer higher-capacity HyperRAMs. A 64Mx8 part has been available for a while. If only I had a Hyper accessory board populated with two of those, it'd be almost as janky as the 96MB board, except now at 128MB, which I think would be a new record)

  • ke4pjw Posts: 1,223

    Dumb question, without knowing anything about any of these drivers or looking at source: Do they not use locks? For the life of me, for something as simple as reading and writing to a memory location, why use a mailbox and a dedicated COG, when you can just use a lock?

  • Rayman Posts: 15,634

    @ke4pjw Interesting idea. Maybe for low bandwidth things you don't need a dedicated cog...

  • evanh Posts: 16,719
    edited 2025-09-17 00:12

    A small piece of "inline" pasm2 should be able to do that well. Since it dynamically loads on a per cog basis it'll happily share between multiple cogs. It's up to the program to manage sharing.

    PS: This is in effect how Flexspin does its block drivers. Well, the ones optimised with Pasm2 at least. The unoptimised drivers just bit-bash using C or Spin.

  • I took a look at Chip's driver and kind of understand it. Not sure what all the stuff is about the 8 cogs. Was looking for something that would expose read and write methods in SPIN2 along the lines of

    PSRAM.readblock(xnumbytes, fromExtRAMAddr, toCogRAMAddr)
    PSRAM.writeblock(xnumbytes, fromCogRAMAddr, toExtRAMAddr)

    That would make it accessible for things beyond video drivers. Not great at performance, but useful. I get they are all in PASM, as the killer app for it is video. I guess I just need to buckle down and build what I want :)

  • @ke4pjw said:
    Dumb question, without knowing anything about any of these drivers or looking at source: Do they not use locks? For the life of me, for something as simple as reading and writing to a memory location, why use a mailbox and a dedicated COG, when you can just use a lock?

    Locks are fine in some situations. They do limit the ability to control QoS, though, if the lock taker is doing large transfers and won't relinquish the lock fast enough, and locks are granted on a first-come, first-served basis. My drivers operate differently and give full control to the mailbox poller COG, which can then choose to fragment and prioritize memory requests to accommodate regular accesses by real-time COGs, such as a video COG that needs priority to sustain the video pixel data without interruptions.
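
The fragment-and-prioritize idea described above can be sketched as a small simulation. This is only an illustrative model, not code from any of the drivers discussed: the `Request` class, the `run_poller` function, the "lower number is more urgent" priority convention and the 256-byte burst limit are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Request:
    client: str
    priority: int   # assumed convention: lower number = more urgent
    remaining: int  # bytes still to transfer


def run_poller(arrivals, max_burst):
    """Serve requests in fragments of at most max_burst bytes.

    arrivals is a list of (step, Request). After every fragment the poller
    re-selects the most urgent pending request, so a large transfer cannot
    monopolize the memory the way a long-held lock would.
    """
    arrivals = sorted(arrivals, key=lambda a: a[0])
    pending, log = [], []
    step, i = 0, 0
    while pending or i < len(arrivals):
        # Admit every request that has arrived by this step.
        while i < len(arrivals) and arrivals[i][0] <= step:
            pending.append(arrivals[i][1])
            i += 1
        if not pending:
            step = arrivals[i][0]  # idle until the next arrival
            continue
        req = min(pending, key=lambda r: r.priority)
        chunk = min(req.remaining, max_burst)
        req.remaining -= chunk
        log.append((req.client, chunk))
        if req.remaining == 0:
            pending.remove(req)
        step += 1
    return log


# A 1024-byte bulk copy starts first; a video fetch arrives two steps later.
log = run_poller(
    [(0, Request("bulk", 1, 1024)), (2, Request("video", 0, 256))],
    max_burst=256,
)
print(log)
```

In this model the video request is served as soon as the current bulk fragment finishes, rather than after the whole 1024-byte copy.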

  • evanh Posts: 16,719

    @ke4pjw said:
    I get they are all in PASM, as the killer app for it is video. I guess I just need to buckle down and build what I want :)

    It's the cogRAM residence that's important, more than the Pasm2. It allows using the FIFO for streamer ops. An "inline" Pasm2 subroutine also gets loaded into cogRAM, thereby providing the same ability on a temporary basis.

    PS: I put quotes around inline because it's only inline in the source code. The actual execution is done as a subroutine.

  • rogloh Posts: 6,000
    edited 2025-09-17 07:14

    @Wuerfel_21 said:
    Extending Chip's driver is probably not so bad. Most of the complexity in Roger's drivers is because they're full of features that are dubiously useful to begin with (e.g. trying to support byte-granular read/write DMA when the underlying hardware doesn't)

    This is useful if you don't want to support only the lowest-common-denominator access type across different memory types (such as reading/writing longs only) and want software to stay portable across those memory types. When an access is narrower than the underlying memory's native width, we sometimes need to do a read-modify-write so the data can be written back without corrupting adjacent byte(s); the same applies at the start and end of bursts if the addresses are not aligned to the native memory width. This read-modify-write is also quite useful if you need to do pixel operations on 8- or 16-bit data. In some cases it will slow things down, however (though not usually compared with the client doing it), so for highest performance a fully custom driver with fewer features may be warranted. I would say I chose the driver feature set to allow the most versatile use of my drivers over pure maximum performance, though they are not typically slow for medium to large bursts where transfer duration dominates. For individual memory accesses that must be as fast as possible from a single COG, the access latency can certainly be reduced with other drivers, as you've already encountered in your own coding.
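
As a rough illustration of the read-modify-write mentioned above, here is a minimal model of a single-byte write to a memory that can only be accessed in whole 32-bit units. It is a sketch under stated assumptions, not the driver's actual PASM: the function name `write_byte_rmw`, the list-of-words memory model and the little-endian byte order are all chosen for the example.

```python
def write_byte_rmw(mem_words, byte_addr, value):
    """Write one byte into a memory that only supports 32-bit accesses.

    mem_words is a list of 32-bit integers standing in for the external RAM
    (an assumption for this sketch). Byte 0 of each word is the least
    significant byte (little-endian, also an assumption).
    """
    word_index = byte_addr // 4
    shift = (byte_addr % 4) * 8
    word = mem_words[word_index]                 # read the whole native unit
    word &= ~(0xFF << shift) & 0xFFFFFFFF        # clear only the target byte
    word |= (value & 0xFF) << shift              # merge in the new byte
    mem_words[word_index] = word                 # write the whole unit back


mem = [0x11223344, 0xAABBCCDD]
write_byte_rmw(mem, 5, 0xEE)   # byte 1 of the second word
print([hex(w) for w in mem])
```

The extra read before every narrow write is the cost being debated in this thread: one byte written costs a full read plus a full write of the native unit.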

  • pik33 Posts: 2,415

    support byte-granular read/write

    That's an essential feature for a graphics driver (to get/set a pixel)

  • @ke4pjw said:
    I took a look at Chip's driver and kind of understand it. Not sure what all the stuff is about the 8 cogs. Was looking for something that would expose read and write methods in SPIN2 along the lines of

    PSRAM.readblock(xnumbytes, fromExtRAMAddr, toCogRAMAddr)
    PSRAM.writeblock(xnumbytes, fromCogRAMAddr, toExtRAMAddr)

    That would make it accessible for things beyond video drivers. Not great at performance, but useful. I get they are all in PASM, as the killer app for it is video. I guess I just need to buckle down and build what I want :)

    The exmem_mini thing basically does that and nothing else (though there's a 4th parameter that controls whether the function returns immediately or waits for the transfer to complete).
    If the block is properly aligned and doesn't hit the nasty cases (like the afore/below mentioned unaligned write RMW) it's not bad performance at all.


    @rogloh said:

    @Wuerfel_21 said:
    Extending Chip's driver is probably not so bad. Most of the complexity in Roger's drivers is because they're full of features that are dubiously useful to begin with (e.g. trying to support byte-granular read/write DMA when the underlying hardware doesn't)

    This is useful if you don't want to support only the lowest-common-denominator access type across different memory types (such as reading/writing longs only) and want software to stay portable across those memory types.

    The point is that writing a single byte is, for me, dubiously useful: it's inherently complex, slow (except on 4-bit, which is just slow, full stop), and not really what the hardware wants to do, and checking for that case slows all the actually good operations down a little. (Consider writing an unaligned 16-bit word straddling a page boundary: that's at least 4 commands and a bunch of software voodoo to make it happen with the given 2-byte client buffer.)

    FYI: The best case is always self-aligned power-of-2-sized blocks (so if it's 64 bytes it should be on a 64 byte boundary) made as large as possible for the application. Self-alignment guarantees the block never crosses a row boundary (unless the block is bigger than a whole row).
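
The self-alignment rule above can be checked with a few lines of arithmetic. This is a sketch only: the helper names `is_self_aligned` and `crosses_row` are made up for the example, and the 2048-byte row size is an assumption borrowed from the figure used later in this post. The guarantee also relies on the row size itself being a power of two.

```python
def is_self_aligned(addr, size):
    """True if size is a power of two and addr is a multiple of size."""
    return size > 0 and (size & (size - 1)) == 0 and addr % size == 0


def crosses_row(addr, size, row_bytes):
    """True if the block [addr, addr + size) spans more than one row."""
    return addr // row_bytes != (addr + size - 1) // row_bytes


ROW_BYTES = 2048  # assumed row size for this sketch

# Every self-aligned block no larger than a row stays inside one row.
violations = [
    (addr, size)
    for size in (1 << n for n in range(12))          # 1, 2, 4, ..., 2048
    for addr in range(0, 4 * ROW_BYTES, size)        # all self-aligned addresses
    if crosses_row(addr, size, ROW_BYTES)
]
print(violations)

# A 64-byte block that is not self-aligned can straddle a row boundary.
print(is_self_aligned(2040, 64), crosses_row(2040, 64, ROW_BYTES))
```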

    (Speaking of video, I've been thinking that packing framebuffer lines into PSRAM rows would increase performance somewhat... i.e. if page size is 2048 bytes and each line is 640 bytes, you'd pack 3 lines together (-> 1920 bytes) and waste/use-for-something-else the remaining 128 byte block)
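
The packing arithmetic in the parenthetical above can be written out as follows. It is a sketch using only the numbers given there (2048-byte pages, 640-byte lines); the function names `row_layout` and `line_address` are hypothetical and not taken from any existing driver.

```python
def row_layout(line_bytes=640, row_bytes=2048):
    """Return (whole lines that fit in one row, spare bytes left in the row)."""
    lines_per_row = row_bytes // line_bytes
    return lines_per_row, row_bytes - lines_per_row * line_bytes


def line_address(line, line_bytes=640, row_bytes=2048):
    """Start address of a framebuffer line when lines never straddle a row."""
    lines_per_row = row_bytes // line_bytes
    row, slot = divmod(line, lines_per_row)
    return row * row_bytes + slot * line_bytes


print(row_layout())                           # lines per row, spare bytes
print([line_address(n) for n in range(5)])    # first five line start addresses
```

With plain linear packing, line 3 would start at byte 1920 and run across the row boundary at 2048; in the packed layout it starts at 2048 instead, at the cost of 128 unused bytes per row.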

    @pik33 said:

    support byte-granular read/write

    That's an essential feature for a graphics driver (to get/set a pixel)

    Generally those end up being the function everyone tells you not to use because they're so slow :P
    Seen that a few times (and persists into modern times with glReadPixels and friends)
