Memory Breakout Poll

jmg · 2019-06-19 23:52

Wingineer19 wrote: »

..
Maybe it's time for a "Working Group" blessed by Parallax to come up with standard memory arrangements for the Prop2 that will be supported by the Compilers?

Maybe have a standardized layout for parallel memory and another for serial SPI/SQI?

Imagine if each Compiler only had to account for one parallel memory design and one serial memory design, with variants as to which pins can be used for each respective design, but the memory architecture for each memory type is fixed.

There has already been a "Working Group", and the standard pinout is what you see on p2Eval and P2D2x

That supports 1-bit SPI from two sources, mapped to 4 Prop pins.

Currently physically that is Micro-SD-Card and SO8 FLASH.

That already gives some choices :

eg Peter has mentioned he will boot-from-SD, and install PSRAM into the SO8.
That will be very interesting to see. There are reports of ESP8266 users 'swapping-in' that PSRAM memory for Flash, and managing ok.

Or, someone could make a Micro-SD memory breakout board, (1mm thick?) which wires larger Static SRAM or PSRAM onto the 4 allocated SD pins, and so boots-from-flash.
Currently, PCB designs route only 4 wires to Micro-SD.

i.e. there are choices there now for 1-bit SPI, on existing hardware.
Addit: actually, there are choices for Dual-SPI on the FLASH memory, and the streamer looks to support 2-bit modes, so it may be possible to cache from Flash at dual-spi speeds.
For XIP code designs, maybe that Dual-SPI is a better sweet spot, as it uses standard reference hardware.

I see the Flash on P2-Eval, the W25Q128JV, does support DTR, for dual-edge clocking. It shows:
DTR Fast Read Dual I/O Command: 0BDh

For QuadSPI, that's a broader chestnut, as IIRC there were pin-align issues with streamer and Quad modes, and it was deemed the electrically smarter pin alignments, did `not look as good`.
I think the present pinout does not 'add-3' to get to QuadSPI, so a new pin map would be needed, and that may also be affected by the headers on P2-Eval

It may be P56~P63 group on one header, can support QuadSPI ?

Peter Jakacki · 2019-06-20 01:38

Wingineer19 wrote: »

Hey, it seems there was some guy on the Prop1 forum who was wanting to add memory to a Prop1 board and was taken to task over it. I guess the desire for additional memory isn't limited to the Prop1 platform

On a more serious note, ...........

Yes, I've always said it is madness but mostly in terms of I/O resources really. Some of the external memory solutions consume too many precious I/O that you start to wonder if the P1 would be used for anything serious other than running introverted software. I've never had to use any external memory with Tachyon since I was always able to cram everything in including VGA, FAT32, and FTP/HTTP servers into the existing memory and still have room and speed to run application software.

The P2 has plenty of I/O and the reason for "external" memory is more to do with video graphics rather than trying to cram more software in.

Cluso99 · 2019-06-20 05:18

Peter, I have in fact used XMM on P1. I needed speed so the SRAM has 8 I/O, 20 Address, CS, WE, OE connected to P1. In addition, an SD socket is connected, plus the serial I/O on P30/31.

This was a commercial product that had large databases on the SD card (files with lists of stars, nebula, etc). Catalina C was used in XMM mode to sample/sort the databases and then send the info to a second P1 which was driving a telescope to automatically point to the chosen star/nebula/etc, and then maintain a lock.

This would not have been possible without the SRAM, XMM and Catalina. This was pre PropGCC and Tachyon days

Anyway, it was great to work with Ross to get Catalina running

BTW there was a 3rd P1 too! It was in a remote box with a red 4x20 line LCD and a custom keypad. This could have been done with a different micro, but hey, it was already on the BOM, and we love props too

Peter Jakacki · 2019-06-20 05:30

Cluso99 wrote: »

Peter, I have in fact used XMM on P1. I needed speed so the SRAM has 8 I/O, 20 Address, CS, WE, OE connected to P1. In addition, an SD socket is connected, plus the serial I/O on P30/31.

This was a commercial product that had large databases on the SD card (files with lists of stars, nebula, etc). Catalina C was used in XMM mode to sample/sort the databases and then send the info to a second P1 which was driving a telescope to automatically point to the chosen star/nebula/etc, and then maintain a lock.

This would not have been possible without the SRAM, XMM and Catalina. This was pre PropGCC and Tachyon days
Anyway, it was great to work with Ross to get Catalina running

BTW there was a 3rd P1 too! It was in a remote box with a red 4x20 line LCD and a custom keypad. This could have been done with a different micro, but hey, it was already on the BOM, and we love props too

You mean after the RAM was connected you had a grand total of one I/O pin left?
As I said - madness

Cluso99 · 2019-06-20 05:41

No, all pins were used!
8 io, 19 addr (mistake above), 2 serial, leaving 3 for the shared CS(sram)-CE(sd), WE and OE which were also shared with the EEPROM SDA and SCL.
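
As a quick check of the pin budget described above, here is a minimal sketch that adds up the groups. The counts come from the post; the 32-pin total assumes the P1's P0..P31, and the group names are only labels chosen for illustration.

```python
# Pin budget for the P1 XMM board described above (counts taken from the post).
P1_IO_PINS = 32  # assumption: the Propeller 1 exposes P0..P31

pin_groups = {
    "sram_data": 8,
    "sram_address": 19,
    "serial_p30_p31": 2,
    "shared_cs_we_oe": 3,  # per the post, shared with SD CE and EEPROM SDA/SCL
}

used = sum(pin_groups.values())
spare = P1_IO_PINS - used
print(f"used={used} spare={spare}")
```

With 19 address lines instead of 20, the groups add up to exactly the 32 available pins.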

VonSzarvas · 2019-06-21 09:30

Given the parts mentioned above, seems we could do:

HyperRAM
Dual header breakout: 4x HyperRAM chips for total of 16 I/O pins (32 MByte total)

SRAM
Single header breakout: 3x SRAM for total of 8 I/O pins (24 MByte total)
Dual header breakout: 11x SRAM for total of 16 I/O pins (88 MByte total)

Dual header, Dual bus breakout: 6x SRAM for total of 16 I/O pins (48 MByte total) ** Although might as well have the singles, and use either 1 or 2 of them !

Which of these would be the most useful configuration for your projects ?

evanh · 2019-06-21 10:10

I'd like to experiment with possible HyperRAM burst rate of 250 MB/s. It'll be hard to beat that with anything else. Having those BGAs pre-mounted will certainly make my life easier.

dMajo · 2019-06-21 10:30

VonSzarvas wrote: »

Given the parts mentioned above, seems we could do:

HyperRAM
Dual header breakout: 4x HyperRAM chips for total of 16 I/O pins (32 MByte total)

SRAM
Single header breakout: 3x SRAM for total of 8 I/O pins (24 MByte total)
Dual header breakout: 11x SRAM for total of 16 I/O pins (88 MByte total)

Dual header, Dual bus breakout: 6x SRAM for total of 16 I/O pins (48 MByte total) ** Although might as well have the singles, and use either 1 or 2 of them !

Which of these would be the most useful configuration for your projects ?

If you do the SRAM module, and if the SRAMs have a low enough standby current, please also consider a secondary power supply source (battery, super-cap, header, ...) to allow the SRAM contents to survive power-cycling.

VonSzarvas · 2019-06-21 12:20

evanh wrote: »

I'd like to experiment with possible HyperRAM burst rate of 250 MB/s. It'll be hard to beat that with anything else. Having those BGAs pre-mounted will certainly make my life easier.

Those burst rates are interesting to compare.

HyperRAM at 1.8V
4 qty HyperRAM (dual module, sharing data pins)
Potential throughput: 333 MB/s single channel

SRAM at 3.3V
Potential throughput: 168 MB/s single channel linear burst
Potential throughput: 288 MB/s single channel 32 byte burst, within boundary

Potential throughput: 336 MB/s dual channel concurrent linear burst
Potential throughput: 576 MB/s dual channel concurrent 32 byte burst, within boundary

Sure, SRAM needs a solid software driver to manage the concurrent writes for each channel, but the potential looks very close to HyperRAM for a fraction of the cost, plus level-shifting not required.
ie. compare 8MByte memory, HyperRAM = $3, SRAM = $0.50 !

Maybe I'm missing something?
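
The dual-channel and cost figures above can be reproduced with a few lines of arithmetic. This is only a sketch using the numbers quoted in the post (nothing is measured), and it assumes that two independent channels simply double the single-channel rate.

```python
# Figures quoted in the post above; nothing here is measured.
sram_single_channel = {"linear burst": 168, "32-byte burst": 288}  # MB/s

# Assumption: two independent channels running concurrently double the rate.
sram_dual_channel = {mode: 2 * rate for mode, rate in sram_single_channel.items()}

price_usd = {"HyperRAM": 3.00, "SRAM": 0.50}  # per 8 MByte device
capacity_mbyte = 8

cost_per_mbyte = {part: price / capacity_mbyte for part, price in price_usd.items()}
price_ratio = price_usd["HyperRAM"] / price_usd["SRAM"]

print(sram_dual_channel)
print(cost_per_mbyte)
print(f"HyperRAM costs {price_ratio:.0f}x as much per MByte")
```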

Rayman · 2019-06-21 14:12

I personally don't see the point of having more than 1 HyperRAM on these dual header boards.
HyperRam + HyperFlash would be nice. There is chip with both, but I think it costs more than separate...

I'd rather have two of these in the setup so, for example, one could be used for displaying while the other one is used for rendering the next frame... Or, one used for Flash and the other for RAM, etc.

VonSzarvas · 2019-06-21 14:19

What RAM capacity would realistically be the maximum useful ?

I agree there's no point adding more and more memory if applications cannot make much use of it.

VonSzarvas · 2019-06-21 17:10

Considering that HyperRAM takes so many pins (13 IOs for one, 14 for two sharing the data lines), would it not be more useful to have two SRAMs with individual address pins (6 per SRAM)?

Two SRAMs on their own IOs would get 168 MB/s per ram chip on 12 pins.

This assumes RAM in pairs is required (which makes sense, as @Rayman explained), and that a pair of 8 MByte parts would suffice for most applications.

So then the next question might be: if qty 2 of the 8 MByte RAMs is plenty, would 168 MB/s access speed suffice, or would dual channel HyperRAMs be required, across 26 pins (13 IOs each), to get closer to 333 MB/s?

Caveat: The HyperRAM top speed assumes that the smartpin logic level will be 1.8V, either by internal configuration, or with some sort of voltage level shifting. Otherwise at 3.3V, top HyperRAM speed of 180MB/s is likely. And in that case, SRAM certainly seems to win the day?

So all this pondering points to the critical questions...

How fast is really needed? ( Where does ??? MB/s become faster than P2 can make use of )
How many RAM banks are really needed (those which can be addressed concurrently). Could be two ?
How large is plenty for each RAM bank? 8 MByte ?

Caveat 2: Flash is another topic. this is all about RAM for now. Once this is figured, then I think looking at including flash could be a good idea, especially if pins allow.

cgracey · 2019-06-21 17:17

The pins need to operate at 3.3V. You can use the internal DAC comparator and bit DAC output to interface at 1.8V, but those DACs would take a lot of power and the whole thing probably wouldn't work at over 30MHz. So, whatever is going to happen quickly is going to have to happen at 3.3V logic levels.

VonSzarvas · 2019-06-21 17:18

And if VIO on that group of pins was at 1.8V (instead of 3.3V), that wouldn't work either, right? As VDD would also need to be reduced a bit ?

cgracey · 2019-06-21 17:22

VonSzarvas wrote: »

And if VIO on that group of pins was at 1.8V (instead of 3.3V), that wouldn't work either, right? As you'd also need to reduce VDD a bit ?

Yes. Using a low voltage for VIO significantly unbiases the level translators, needing VDD to be lowered as well. Just run the I/O at 3.3V. It works and it's fast.

VonSzarvas · 2019-06-21 17:30

So it seems to me, that SRAM at 3.3V (168 MB/s) could be a better solution than HyperRAM at 3.3V (~180MB/s), because SRAM only requires 6 pins and so both RAM chips would be addressable concurrently at 168 MB/s. Whereas HyperRAM shares the same databus (assuming max 16 pins per accessory module), and so the effective read/write rate might be as low as 90MB/s when both memories are being accessed concurrently.

Have I got that right ?
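
The comparison above can be sketched as follows. It assumes ideal time-slicing on the shared HyperRAM bus, with no turnaround or command overhead, and uses the 3.3 V figures quoted in the post; the function name is just illustrative.

```python
def per_device_rate(bus_rate_mb_s, active_devices_on_bus):
    """Ideal share of one bus when several devices take turns on it.

    Assumes perfect time-slicing with no turnaround or command overhead.
    """
    return bus_rate_mb_s / active_devices_on_bus


# Two HyperRAMs sharing one 8-bit data bus at 3.3 V (figure from the post).
hyperram_each = per_device_rate(180, 2)
hyperram_total = 180

# Two SRAMs, each on its own 6-pin bus, so neither waits for the other.
sram_each = per_device_rate(168, 1)
sram_total = 2 * 168

print(f"HyperRAM: {hyperram_each} MB/s each, {hyperram_total} MB/s total")
print(f"SRAM:     {sram_each} MB/s each, {sram_total} MB/s total")
```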

Yanomani · 2019-06-21 18:24

There could be some misunderstanding going on about the number of pins that HyperRAMs at 3.3 V (3.0 V in fact, as specced in the datasheets) would use.

Both Cypress and ISSI parts, up to 128 Mb (16 MB) current maximum capacity, would need only twelve pins each (RESET#, CS#, CK, D[7:0] and RWDS).

Add just one CS#, for a total of thirteen pins, and anyone would be able to select between two 128 Mb packages, sharing data buses, RWDSs, CKs and RESET#.

Add one more clock signal (to have, e.g., CK0 and CK1, each one reaching one of the two independent packages) and another RWDS (to have, e.g., RWDS0 and RWDS1, again, each one reaching one of the two independent packages), now for a total of fifteen pins, and anyone would be able to copy data from one of the packages to the other, or from P2 to both simultaneously, targeting different address spaces or not, with or without P2 intervention (or even grabbing data or doing some sniffing at its contents), other than properly (and timely) generating CK0, CK1, CS0#, CS1#, and generating (write) or receiving (read) RWDS0 and RWDS1, as the proper execution of either command could demand.
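
The three pin counts above (twelve, thirteen, fifteen) can be tallied like this. The signal names follow the post; the variable names are only labels for this sketch.

```python
# Eight data lines, D0..D7, common to every configuration.
DATA = [f"D{i}" for i in range(8)]

# One HyperRAM package.
single_device = ["RESET#", "CS#", "CK", "RWDS"] + DATA

# Two packages sharing data bus, RWDS, CK and RESET#, with one extra chip select.
shared_bus_pair = ["RESET#", "CS0#", "CS1#", "CK", "RWDS"] + DATA

# Two packages with their own clock and RWDS, still sharing the data bus.
independent_clock_pair = (
    ["RESET#", "CS0#", "CS1#", "CK0", "CK1", "RWDS0", "RWDS1"] + DATA
)

for name, pins in [
    ("single device", single_device),
    ("shared-bus pair", shared_bus_pair),
    ("independent-clock pair", independent_clock_pair),
]:
    print(f"{name}: {len(pins)} pins")
```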

The only thing that would be totally forbidden is reading from BOTH devices at the same time (or writing data from P2 while simultaneously reading data from one, or both, devices), due to the (more than likely) possibility of electrical conflict on the data bus lines, with some (many) fried data bus drivers as the penalty.

Even there, some (series resistor) tricks can be devised by the very creative community members, so the above warning could become moot in the near future. As has ever been, impossible is a word that incites (and excites) Parallax's forum members, thus, perhaps...

If properly paced (not full speed, due to limits imposed by the P2's maximum sysclk), one P2 could selectively copy the contents of one package to the other, either partially or in full, controlling (to be read as "avoiding changes to") the main data memory contents of the destination package (the one that would receive a Write command), solely by controlling the RWDS signal that is sent to the package receiving data (the one with an active write command). This includes some kind of simulation of the behaviour of the WMLONG instruction on a byte-by-byte basis, but not limited to the exclusion of $00 as the prospective data to be written (thus not overwriting the data value at the destination of the copy, whichever value would be selected, from $00 to $FF, existing at the source of the data package).
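
A minimal software model of that masked copy might look like the sketch below. It assumes RWDS acts as a per-byte write mask on the destination package, so a masked byte leaves the destination unchanged; `masked_copy` and `skip_value` are hypothetical names, and no real bus timing is modelled.

```python
def masked_copy(source, destination, skip_value):
    """Copy source over destination, leaving bytes alone where the mask is set.

    A source byte equal to skip_value models RWDS being driven as a write
    mask for that byte, so the destination keeps its old contents there.
    """
    if len(source) > len(destination):
        raise ValueError("source does not fit in destination")
    result = bytearray(destination)
    for index, value in enumerate(source):
        masked = value == skip_value  # stands in for the RWDS mask on this byte
        if not masked:
            result[index] = value
    return result


old = bytes([0xAA, 0xAA, 0xAA, 0xAA])
new = bytes([0x11, 0x00, 0x33, 0x00])
print(masked_copy(new, old, skip_value=0x00).hex())  # WMLONG-like: skip $00
print(masked_copy(new, old, skip_value=0x33).hex())  # any other value works too
```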

Please remember, I'm not a specific advocate of HyperAnything, but IMHO they are extremely flexible in what they can do, with that modest number of interface lines.

The rest is only having to pay attention to some timing concerns/limits, and a bit of software...

Hope it helps a bit

Henrique

VonSzarvas · 2019-06-21 18:49

@Yanomani

Thank you for the thorough summary - Very much appreciated.

However, now I'm even more convinced that two independent SRAMS (6 pins each) and some 4 pin flash, which would nicely fill a dual breakout with 16 pins, would represent a practical and simple solution that suits P2's working speed, and the ability for P2 code to run with true concurrent access.

Including the reason that the SRAMS will be read/writeable in any combination, without all the caveats you've described, when a pair of HyperRAMs are paired on a bus.

And for applications where faster ram storage is required (and not two separate ram areas), then both SRAMs could be read or written with interleaved memory addressing, to effectively double the rate of SRAM and reach HyperRAM speeds.

Just seems very flexible to me.
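
One way to picture the interleaved addressing idea is the sketch below. It assumes byte-granular interleaving (even addresses on one chip, odd on the other), which is only one possible scheme; the function names are hypothetical and no driver timing is modelled.

```python
def locate(address):
    """Map a linear address to (chip index, address within that chip)."""
    return address & 1, address >> 1


def split_for_write(data):
    """Even-addressed bytes go to chip 0, odd-addressed bytes to chip 1."""
    return bytes(data[0::2]), bytes(data[1::2])


def merge_after_read(chip0, chip1):
    """Rebuild the linear byte stream from the two per-chip streams."""
    merged = bytearray(len(chip0) + len(chip1))
    merged[0::2] = chip0
    merged[1::2] = chip1
    return bytes(merged)


payload = bytes(range(10))
half0, half1 = split_for_write(payload)
print(half0.hex(), half1.hex())
print(merge_after_read(half0, half1) == payload)
```

Each chip only carries half the bytes, which is where the doubling of the effective rate would come from when both are driven at once.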

Yanomani · 2019-06-21 19:13

As for the Lyontek LY68L6400 psram, in order to use linear bursts, the clock speed would be limited to 84 MHz, at QPI, thus the data transfer rate would be 42 MB/s (two 4-bit QPI transactions per byte).

Keeping each transaction within the limits of a single page (1024 bytes), would raise the maximum interface clock to 144 MHz, resulting in a 72 MB/s transfer rate.

The good part of it is that this throughput can be reached at 3.3 V, without any concern for voltage translation, which in fact is a plus and, IMHO, the best argument one can use to justify its use.

Each device pair, using twelve signals at the interface, would enable the independent or simultaneous use of both devices, but the copy from one device to another should pass the data items through the controlling processor (P2).

Constantly paired devices could enable 144 MB/s data transfers per pair, at the net cost of only ten interface signals, which is another plus.
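
The rates above follow from clock rate times bus width. A minimal sketch, assuming SDR clocking and ignoring command and address overhead; the pin breakdown for the paired case (8 data lines plus shared CS and clock) is an assumption made here to arrive at the ten signals mentioned.

```python
def qpi_rate_mb_s(clock_mhz, data_bits_per_chip=4, chips=1):
    """Peak SDR transfer rate in MB/s, ignoring command/address overhead."""
    return clock_mhz * data_bits_per_chip * chips / 8


linear_burst = qpi_rate_mb_s(84)            # page-crossing bursts
within_page = qpi_rate_mb_s(144)            # bursts kept inside one page
paired_within_page = qpi_rate_mb_s(144, chips=2)

# Assumed pin breakdown: 4 data + CS + CLK per chip when independent,
# or 8 data + one shared CS + one shared CLK when permanently paired.
independent_pair_pins = 2 * (4 + 1 + 1)
paired_pins = 8 + 1 + 1

print(linear_burst, within_page, paired_within_page)
print(independent_pair_pins, paired_pins)
```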

Note: I was composing the above post (slowly, as usual) when I saw yours.

Henrique

Rayman · 2019-06-21 22:53

VonSzarvas wrote: »

So it seems to me, that SRAM at 3.3V (168 MB/s) could be a better solution than HyperRAM at 3.3V (~180MB/s), because SRAM only requires 6 pins and so both RAM chips would be addressable concurrently at 168 MB/s. Whereas HyperRAM shares the same databus (assuming max 16 pins per accessory module), and so the effective read/write rate might be as low as 90MB/s when both memories are being accessed concurrently.

Have I got that right ?

Can you give an example part number for this SRAM?

jmg · 2019-06-21 23:25

VonSzarvas wrote: »

However, now I'm even more convinced that two independent SRAMS (6 pins each) and some 4 pin flash, which would nicely fill a dual breakout with 16 pins, would represent a practical and simple solution that suits P2's working speed, and the ability for P2 code to run with true concurrent access.

Including the reason that the SRAMS will be read/writeable in any combination, without all the caveats you've described, when a pair of HyperRAMs are paired on a bus.

It may be better to look at footprints too.
What you call SRAM here , are actually Pseudo Static DRAMs, so they have refresh rules.
The true-SRAMS are the ISSI ones, that come to 45MHz, and it could be that someone may want to trial a mix of Static SRAM and PS-SRAM.
That points to a PCB with 4 SO8 footprints to permit 8b wide access, into two types of RAM.
Someone might want to also have FLASH, as you say, which is then 6 SO-8 footprints.

HyperRAM looks to come in larger sizes, so there is a case for that too. Is 6 x SO8 + BGA getting too much loading ?

pilot0315 · 2019-06-22 02:33

@evanh

Question: how do you put the quotes in your profile?

evanh · 2019-06-22 02:45

I'm not sure of what you are asking, I edit my signature setting: https://forums.parallax.com/profile/signature

Rayman · 2019-06-22 03:22

I'd like an SRAM part number to evaluate efficacy vs. hyperram...

samuell · 2019-06-22 03:27

Hi,

I would go for the SRAM, due to reasons of cost and possibly documentation. However, I don't oppose HyperRAM. It would be interesting to see both.

Kind regards, Samuel Lourenço

evanh · 2019-06-22 03:27

Rayman wrote: »

I'd like an SRAM part number to evaluate efficacy vs. hyperram...

So a combo of both on one board?

jmg · 2019-06-22 03:38

Rayman wrote: »

I'd like an SRAM part number to evaluate efficacy vs. hyperram...

I think VonSzarvas means this PSRAM one - for this, I do not see double-edge specs, so if you chase raw speed HyperRAM's both-edges wins.

LY68L6400 64 Mbit Serial Pseudo-SRAM with SPI and QPI
Package: 8-pin 150 mil SOP
Single Supply Voltage: VCC = 2.7 to 3.6 V
Interface: SPI/QPI with SDR mode
Performance: Clock rate up to 144 MHz (non-page boundary crossing), 84 MHz (page boundary crossing)
Organization: 64 Mb, 8M x 8 bits
Addressable bit range: A[22:0]
Page size: 1024 bytes
Refresh: Self-managed
Operating temperature range: TC = -40°C to +85°C
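
Since the higher clock rate only applies when a burst stays inside one 1024-byte page, a driver would probably want to split transfers at page boundaries. A minimal sketch of that splitting, using the page size and address range listed above; `split_at_page_boundaries` is a hypothetical helper, not part of any existing driver.

```python
PAGE_SIZE = 1024                    # bytes, from the listing above
ADDRESS_BITS = 23                   # A[22:0]
CAPACITY_BYTES = 1 << ADDRESS_BITS  # 8M x 8 bits = 64 Mbit


def split_at_page_boundaries(start, length):
    """Break a transfer into (address, size) chunks that never cross a page."""
    if start < 0 or length < 0 or start + length > CAPACITY_BYTES:
        raise ValueError("transfer outside the device address range")
    chunks = []
    address, remaining = start, length
    while remaining > 0:
        room_in_page = PAGE_SIZE - (address % PAGE_SIZE)
        size = min(room_in_page, remaining)
        chunks.append((address, size))
        address += size
        remaining -= size
    return chunks


print(split_at_page_boundaries(1000, 100))
print(len(split_at_page_boundaries(0, CAPACITY_BYTES)), "pages in the device")
```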

Yanomani · 2019-06-22 14:50

There is also Espressif Systems ESP-PSRAM64H.

Except for the maximum clock rate, which is specced at 133 MHz (non-page boundary crossing) and 84 MHz (page boundary crossing), the rest is almost the same as the Lyontek LY68L6400.

Though I was able to find the datasheets, I didn't find any Espressif devices available for sale anywhere. Perhaps they are reserving their herd to serve their own processor ecosystem.

Also about the LY68L6400: must check the real speed limits (144 MHz) of those (2.7 - (3.3) - 3.6 V) parts with Lyontek. Hope there is no confusion/translation error with some prospective (1.62 - (1.8) - 1.98 V)-technology devices they never launched.

Rayman · 2019-06-22 18:30

So, sounds like hyperram can be 2 (or maybe 4) times faster for large bursts (like VGA buffering), and only need one chip for 8-bit interface.

Also, I'm not sure I actually use RWDS anymore, but would need to confirm that...

VonSzarvas · 2019-06-22 19:38

For VGA buffering, what is the maximum RAM you'd expect to need per buffer?
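
As a rough way to size that, the sketch below computes bytes per frame buffer and the active-pixel read rate for a couple of example modes. The resolutions, colour depths and 60 Hz refresh are assumptions picked purely for illustration, and blanking intervals are ignored.

```python
def framebuffer_bytes(width, height, bits_per_pixel):
    """Bytes needed to hold one full frame."""
    return width * height * bits_per_pixel // 8


def refresh_mb_per_s(width, height, bits_per_pixel, frames_per_second=60):
    """Read rate needed to scan out active pixels only (blanking ignored)."""
    return framebuffer_bytes(width, height, bits_per_pixel) * frames_per_second / 1e6


# Example modes only; these are not requirements stated anywhere in the thread.
EXAMPLE_MODES = {
    "640x480 @ 8 bpp": (640, 480, 8),
    "1024x768 @ 16 bpp": (1024, 768, 16),
}

BANK_BYTES = 8 * 1024 * 1024  # one 8 MByte RAM bank, as discussed above

for name, (w, h, bpp) in EXAMPLE_MODES.items():
    size = framebuffer_bytes(w, h, bpp)
    rate = refresh_mb_per_s(w, h, bpp)
    fits_double = 2 * size <= BANK_BYTES
    print(f"{name}: {size} bytes, {rate:.1f} MB/s, double-buffer fits: {fits_double}")
```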
