64 MB PSRAM module using 16 pins? --> 96 MB w/16 pins or 24 MB w/8 pins

Rayman · 2022-06-05 21:59

Thinking about a memory module, mostly for the @Wuerfel_21 Sega Genesis emulator at this time.
Would like it to connect to the side of a P2 Eval board.

Problem is that side of P2 only has 16 pins and to do it like the Parallax P2 module takes 18 pins.

So, wondering if can share an 8-bit bus between four different 16 MB pairs of chips.
Was looking at the @rogloh driver and he seems to suggest bus sharing is possible.
So, would use 8 pins for the bus and 8-pins for CS and CLK for each of four pairs.

I think it would work, but not sure if would be fast enough for Sega Genesis emulation...

Update: After input from the forum, going to a 96 MB setup with six banks and two separate clocks, one for the lower nibble and one for the upper nibble. The driver is going to be complex. But also doing a simple 24 MB setup with just 8 pins and 3 chips. This can use the existing driver and support Sega Genesis emulation.

Wuerfel_21 · 2022-06-05 22:38

@Rayman said:
Thinking about a memory module, mostly for the @Wuerfel_21 Sega Genesis emulator at this time.
Would like it to connect to the side of a P2 Eval board.

Problem is that side of P2 only has 16 pins and to do it like the Parallax P2 module takes 18 pins.

So, wondering if can share an 8-bit bus between four different 16 MB pairs of chips.
Was looking at the @rogloh driver and he seems to suggest bus sharing is possible.
So, would use 8 pins for the bus and 8-pins for CS and CLK for each of four pairs.

I think it would work, but not sure if would be fast enough for Sega Genesis emulation...

For Genesis/MegaDrive just a single 4bit chip is already enough.

If you mean the new NeoGeo emulator, I think an 8bit bus could be sufficient, since the majority of the time is spent on command/wait. Command needs 8 memclk, wait needs 10+2 memclk, and a sprite read needs 4 memclk on 16 bit, which would be 8 memclk on 8 bit. So 24 vs 28 clocks per access, not too bad. But that doesn't factor in the couple of extra instructions to select the appropriate CS pin (should be 4 of them: mov+shr+and+altd). I guess it'd be fine, the memory access isn't a terrible bottleneck.
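
As a rough check of the 24 vs 28 figure, the counts above can be added up in a few lines of Python. The 64 bits of data per sprite read is inferred from "4 memclk on 16 bit" and is an assumption, as are all the names used here.

```python
# Per-access memory-clock budget, using the numbers quoted in the post.
COMMAND_CLKS = 8        # command/address phase
WAIT_CLKS = 10 + 2      # wait clocks as quoted (10+2)
SPRITE_BITS = 64        # assumption: 4 memclk x 16 bits of data per sprite read


def clocks_per_access(bus_bits: int) -> int:
    """Total memclk for one sprite read on a bus of the given width."""
    data_clks = SPRITE_BITS // bus_bits
    return COMMAND_CLKS + WAIT_CLKS + data_clks


for width in (16, 8):
    print(f"{width}-bit bus: {clocks_per_access(width)} memclk per access")
```

This leaves out the extra instructions needed to pick the CS pin, so the real gap is somewhat larger than 4 clocks.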

Separate CLK is a bit pointless, I'd share that and just up it to 6 banks (96MB) instead.

Rayman · 2022-06-05 22:54

Thanks for your thoughts on this @Wuerfel_21 !

Guess I need to revisit the console emulation thread to see what's going on. I thought it was all Genesis, but sounds like it is also about NeoGeo (whatever that is)..

The idea with the separate clocks was so could send the same address to all chips and then read them out, one bank at a time.
But, maybe that doesn't make any sense... Thought it would save clocks, but need to think some more on this...

Wuerfel_21 · 2022-06-05 23:05

@Rayman said:
Thanks for your thoughts on this @Wuerfel_21 !

Guess I need to revisit the console emulation thread to see what's going on. I thought it was all Genesis, but sounds like it is also about NeoGeo (whatever that is)..

Yeah, just reread the entire 700 post thread. (I actually love reading decade old long forum threads, so I guess I have contributed my part for future generations to discover)

What is NeoGeo?

Oh nothing, just the most expensive game console ever conceived. Mostly chose it to work on because it is architecturally similar to Genesis (68000 + Z80 + Yamaha sound) so I was able to re-use lots of code.

The idea with the separate clocks was so could send the same address to all chips and then read them out, one bank at a time.
But, maybe that doesn't make any sense... Thought it would save clocks, but need to think some more on this...

Not sure if these PSRAMs like it when you clock them too slow. But I don't think there's really a major benefit to reading from multiple banks sequentially vs. just the same amount of data from one bank. Upping to ludicrous sizes (I guess the theoretical max is 7 banks for 112MByte) seems more useful, but idk about the routing and bus load for that. I don't think the squiggly routing on your design as posted is terribly good at withstanding 170MHz signalling, but what do i know about routing.
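
The 7 bank / 112 MB ceiling follows from the pin budget. A minimal sketch, assuming 16 pins in total, an 8-bit shared data bus, one chip-select pin per bank and 16 MB per bank (the constant and function names are made up for illustration):

```python
PINS_AVAILABLE = 16     # one side of the P2 Eval board, per the thread
DATA_PINS = 8           # shared 8-bit bus (two 4-bit chips per bank)
MB_PER_BANK = 16        # two 8 MB chips per bank


def max_banks(clock_pins: int) -> int:
    """Banks that fit when each bank needs exactly one chip-select pin."""
    return PINS_AVAILABLE - DATA_PINS - clock_pins


for clocks in (1, 2):
    banks = max_banks(clocks)
    print(f"{clocks} clock pin(s): {banks} banks, {banks * MB_PER_BANK} MB")
```

With one shared clock that gives 7 banks (112 MB); spending a second pin on another clock leaves 6 banks (96 MB).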

rogloh · 2022-06-06 02:24

An 8 bit PSRAM bus is not ideal from addressing perspective in a general P2 system.

The 4 bit PSRAM bus is addressed in native bytes, which is okay and turned out to be the easier driver, as there is no need for RMW and you can write on any byte boundary. The 4 bit PSRAM is meant to be used in byte wide setups as each PSRAM address accessed will return 2 nibbles. It's the lowest performing option but is relatively simple to use, being byte granular. There's no concept of unaligned accesses.

The 16 bit PSRAM option was quite a bit trickier to put together initially but it's the highest performance option for bandwidth and at least gets you one P2 long matching the native memory storage size, being 2x16bits read = one P2 long. In this case Read-Modify-Write cycles are needed for all byte and word writes, or for unaligned longs which complicates the writes.

8 bits is sort of stuck in between (worst of both worlds). It doesn't get you a full long on each access, you will have to mess with address translations differently due to the word size addressed, AND you still have to do RMW. Byte writes have to do read-modify-write cycles on 16 bits. Any unaligned 16 bit or 32 bit accesses are the same. Also, as of right now there is no 8 bit PSRAM support in my driver.
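
To illustrate what the read-modify-write cycle costs, here is a minimal Python model of a byte write into memory that can only be written 16 bits at a time. It is a sketch, not driver code: the list-of-words memory, the function name and the little-endian byte order are all assumptions.

```python
def write_byte_rmw(mem, byte_addr, value):
    """Write one byte into a memory whose smallest writable unit is 16 bits."""
    word_index = byte_addr >> 1              # which 16-bit word holds the byte
    shift = (byte_addr & 1) * 8              # assumption: little-endian byte lanes
    word = mem[word_index]                   # read the whole word
    word = (word & ~(0xFF << shift)) | ((value & 0xFF) << shift)  # modify one byte
    mem[word_index] = word & 0xFFFF          # write the whole word back


mem = [0x1122, 0x3344]
write_byte_rmw(mem, 3, 0xAB)                 # byte 3 is the high byte of word 1
print([hex(w) for w in mem])
```

Every byte write turns into one read transaction plus one write transaction, which is the overhead the byte-granular 4 bit setup avoids.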

Rayman · 2022-06-06 10:28

Suppose could have two clocks, one for upper nibble and one for lower nibble. Could avoid need for RMW?

rogloh · 2022-06-06 11:15

You could split your 8 bit bus into two 4 bit halves and either access one bus at a time (in a dual 4 bit mode) based on the address range/bank, OR you could potentially run two different driver COGs accessing each 4 bit bus independently and simultaneously. What you can't do (at this time) is access PSRAM configs using 8 bit data buses with my driver. Has to be either 4 or 16 right now.

In my PSRAM driver each different bank can setup its own CS and Clock pin or it can share a clock with other banks. Also each bank can drive out a contiguous range of pins (not just one) so you could drive multiple CS or CLK IO pins feeding into different devices from the same bank if their control pins are consecutively assigned to the memories. This allows quite a bit of flexibility.
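
Selecting a bank by address range could look roughly like the Python sketch below. The pin numbers, the table layout and the function name are hypothetical and are not the driver's actual configuration format; only the 16 MB bank size and the shared clock come from the thread.

```python
BANK_SIZE = 16 * 1024 * 1024   # 16 MB per bank (two 8 MB chips)

# Hypothetical pin assignment: six banks, one CS pin each, one shared clock.
BANK_PINS = [{"cs": 8 + n, "clk": 14} for n in range(6)]


def decode(addr):
    """Split a flat address into (bank, offset within bank, pins for that bank)."""
    bank = addr // BANK_SIZE
    if addr < 0 or bank >= len(BANK_PINS):
        raise ValueError("address outside installed memory")
    return bank, addr % BANK_SIZE, BANK_PINS[bank]


print(decode(0x01000010))
```

Because the bank size is a power of two, the division and modulo reduce to a shift and a mask, which is why the CS lookup only takes a handful of instructions.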

evanh · 2022-06-06 11:40

The proper way to do that is have a cog and streamer for each bus. Then they can operate concurrently as dual-channel RAM.

Rayman · 2022-06-06 13:09

Ok, had an extra pin anyway, so will use as second clock. This way, could use existing @rogloh driver in 4-bit mode. Actually, should be able to use two of these drivers at same time as mentioned above. It could be that two chips on same nibble bus could be enabled at the same time, but only one of them would be clocked, so should be OK.

Or, a new driver could work in 8-bit addressable mode.

But, best way would probably be a new driver in 16-bit addressable mode, which would be good for 16-bit graphics and such.

So, there are 6 banks of memory, each with its own CE. Each bank has two chips, with different clock signals on each. Total memory is 96 MB.
This works out great for using all 16 pins of the interface.

rogloh · 2022-06-06 13:34

Unfortunately with this configuration I think in dual mode only half the chips can be used. If CLK0 is used in one 4 bit driver and CLK1 in another 4 bit driver, then the chip selects preclude using the second chip in the assigned bank.
i.e. This means driver one could use half of the chips in banks 0, 2, 4, and driver two could use half of the chips in banks 1, 3, 5. So half the memory is inaccessible.
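
A small Python model of that argument, assuming each chip enable can be owned by only one driver instance and that driver one takes the even banks. The ownership rule and all names are illustrative and not taken from the driver.

```python
BANKS = 6
MB_PER_CHIP = 8

# Each bank has one chip enable shared by two chips: one on CLK0, one on CLK1.
chips = [(bank, clk) for bank in range(BANKS) for clk in (0, 1)]

# Assumption: a chip-enable pin can be owned by only one driver instance.
# Driver 0 (on CLK0) takes the even banks, driver 1 (on CLK1) the odd banks.
ce_owner = {bank: bank % 2 for bank in range(BANKS)}

# A chip is usable only if the driver that owns its CE also owns its clock.
reachable = [(bank, clk) for bank, clk in chips if ce_owner[bank] == clk]

print(reachable)
print(len(reachable) * MB_PER_CHIP, "MB of", len(chips) * MB_PER_CHIP, "MB")
```

Under those assumptions 6 of the 12 chips are usable, i.e. 48 MB of the 96 MB fitted.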

Wuerfel_21 · 2022-06-06 13:37

@rogloh said:
Unfortunately with this configuration I think in dual mode only half the chips can be used. If CLK0 is used in one 4 bit driver and clock 2 in another 4 bit driver, then the chip selects preclude using the second chip in the assigned bank.
i.e. This means driver one could use half of the chips in banks 0, 2, 4, and driver two could use half of the chips in banks 1, 3, 5. So half the memory is inaccessible.

That's still 48MB though.

rogloh · 2022-06-06 13:38

Yeah but you are paying for 96MB.

Rayman · 2022-06-06 15:03

I suppose the other way to do this is to just use one 8-pin header and work in nibble mode. I'll think about that...

Meanwhile, here's the 96 MB board, now with two clocks:

Wuerfel_21 · 2022-06-06 15:07

Aren't you missing the cutout between the pin headers?

Rayman · 2022-06-06 15:13

Do you mean to clear the ground posts on the Eval board? The headers bring the board over those posts with ~5mm clearance...

Here's a possible nibble mode board schematic with 24 MB using 3 chips:

Rayman · 2022-06-06 15:49

Got the nibble version done:

pik33 · 2022-06-06 17:11

@Rayman said:
Got the nibble version done:

This! Two of these can be attached to 2 banks and give 48 MB, either by using one bank at a time or by using 2 cogs to access both at the same time. The board is simple enough to make a prototype without sending gerber files to China.

Rayman · 2022-06-06 18:02

Just sent gerbers to China

Using SeeedStudio Fusion, getting 7 different board design prototypes for $115.
First time trying outline that is not square on some of them, we'll see if that works...

Rayman · 2022-06-07 00:05

@Wuerfel_21 said:
Aren't you missing the cutout between the pin headers?

Guess you're thinking I'd be using the bottom entry headers, like Parallax does.
Maybe I should be...

Just looked them up and they seem to cost 3X as much as the regular ones...
https://www.mouser.com/ProductDetail/Harwin/M20-7810645?qs=k41KVqW3ympKz0jQ%2Bxdz7A==

Have to think about this...

rogloh · 2022-06-07 00:36

Did you find any available PSRAM chips Rayman? Last time I was building boards they were somewhat scarce.

Rayman · 2022-06-07 01:01

If I'm seeing it right, Mouser has 40,000 in stock:
https://www.mouser.com/ProductDetail/878-APS6404L-3SQR-SN

Rayman · 2022-06-07 01:07

Hmm... Wonder if APS6404L-3SQN is better than APS6404L-3SQR...
Datasheet first page suggests that linear burst speed might be faster...

Ok, looks like APS6404L-3SQR will let you cross page boundaries at the cost of lower speed.
But, I think I remember @rogloh saying that his driver doesn't cross page boundaries. So, not clear which is better...

rogloh · 2022-06-07 01:21

@Rayman said:
If I'm seeing it right, Mouser has 40,000 in stock:
https://www.mouser.com/ProductDetail/878-APS6404L-3SQR-SN

Ok. Larger amounts of stock now it seems. That's better to see for once.

@Rayman said:
Hmm... Wonder if APS6404L-3SQN is better than APS6404L-3SQR...
Datasheet first page suggests that linear burst speed might be faster...

Ok, looks like APS6404L-3SQR will let you cross page boundaries at the cost of lower speed.
But, I think I remember @rogloh saying that his driver doesn't cross page boundaries. So, not clear which is better...

Yes no need to worry in my driver about the boundaries. Bursts crossing this boundary will be split up and a new chip selected transaction started. If you wanted your own code it might make a difference. Running the PSRAM slower at below 84MHz just so you can stream across the boundary was not used as it then limits the P2 speed to only 168MHz which is a little restrictive. The video stuff I use always wants more bandwidth and higher P2 clocks anyway. Restarting a transaction is not too onerous and you can always use a small hub buffer to smooth things out if you really want to stream from it.
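
Splitting a burst at page boundaries can be sketched in Python as follows. The 1024 byte page size is an assumption to be checked against the datasheet for the actual part, and the function is an illustration of the idea, not the driver's code.

```python
PAGE_SIZE = 1024  # assumption: page size in bytes; check the datasheet


def split_burst(addr, length, page_size=PAGE_SIZE):
    """Break (addr, length) into pieces that never cross a page boundary."""
    pieces = []
    while length > 0:
        room = page_size - (addr % page_size)   # bytes left in the current page
        chunk = min(room, length)
        pieces.append((addr, chunk))            # one chip-select transaction
        addr += chunk
        length -= chunk
    return pieces


print(split_burst(1000, 100))
```

Each tuple is a separate transaction with its own command and wait phase, so long transfers that straddle many pages pay that overhead once per page.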

evanh · 2022-06-07 09:07

I wish there were DDR gull-wing parts.

Rayman · 2022-06-07 17:22

I didn't appreciate that there were DDR parts that look like HyperRam...

Things like this: APS6408L-3OBM might allow the streamer to collect a byte every clock:
https://www.mouser.com/ProductDetail/AP-Memory/APS6408L-3OBM-BA?qs=IS%2B4QmGtzzowqQ04s/oaHA==

I started out looking for "HyperRam", but couldn't find any in stock. This looks to be even better.

Rayman · 2022-06-07 17:49

This one here looks like a winner:
https://www.mouser.com/ProductDetail/AP-Memory/APS12808L-3OBM-BA?qs=IS%2B4QmGtzzrXcGkbuYahqw==

Two 8 MB die inside for 16 MB total. Price is just a bit more than the 8 MB part.

I don't know why it's not "HyperRam", but looks the same to me, except that speed is a bit higher than it was. 133 MHz vs 100 MHz @ 3.3 V

Wuerfel_21 · 2022-06-07 18:03

HyperRAM has different (very stupid) command format. These seem to use a saner format. Interestingly only word-addressable. Seems to have less latency than the 4bit parts. IIRC @rogloh said that actually latching the data correctly when it's clocking every cycle is significantly tricky. Though if you run it slower it seems you can reduce the latency clocks, which is pretty good ig.

Rayman · 2022-06-07 18:32

Think Hyperram was word addressable too. But, you can use the DM pin to write just a single byte.

Rayman · 2022-06-07 18:40

Datasheet claims it to be byte addressable:
"Octal DDR PSRAM device is byte-addressable. Memory accesses are required to start on even addresses (A[0]=’0)."

So, you just have to start read or write at an even address, but again, you can use DM pin to stop writes to any bytes.
Guess you can stop read or write whenever you want...
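
A minimal Python sketch of how a write at an arbitrary byte address could be turned into an even-aligned transfer plus a mask. Padding the tail to a whole 16-bit transfer is an assumption, and the function name and return layout are made up for illustration.

```python
def masked_write_plan(byte_addr, data):
    """Return (start_addr, payload, mask) for a write that must start on an even address.

    mask[i] is True where the byte is padding and should be masked off with DM.
    """
    start = byte_addr & ~1                   # round down to an even address
    lead = byte_addr - start                 # 0 or 1 padding bytes in front
    tail = (lead + len(data)) & 1            # assumption: pad to whole 16-bit transfers
    payload = bytes(lead) + data + bytes(tail)
    mask = [True] * lead + [False] * len(data) + [True] * tail
    return start, payload, mask


print(masked_write_plan(5, b"\xAB"))
print(masked_write_plan(4, b"\xAB"))
```

So a single-byte write at an odd address still starts at the even address below it, with the first byte masked.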

Rayman · 2022-06-07 19:00

Pinout is the same as hyperram... Should be able to copy and paste layout inside Eagle...

Rayman · 2022-06-07 20:49

Just drew this up, 32 MB using two of these chips and a uSD card...

This should be able to run NeoYume I think...
