The Quest for RAM

Retrobits · 2011-08-02 13:11

Hi all,

I am on a Quest for RAM. I know this issue has been covered in the forum at some length in the past, but I'm curious what the current thinking is. Here's my plight:

I have a Propeller Professional Development Board, and a breadboard SD card unit. To this, I'd like to add some RAM that is compatible with the Catalina C compiler. I've looked at some of the devices that Catalina supports, and there seem to be various choices for RAM and interface designs. It would be awesome if the RAM choice had a commercially-available breakout board for use in a solderless breadboard, or if it was itself DIP compatible. Failing that, I'll figure out how to wire it up.

128K would be neat, 512K would be awesome, and more would be even better.

Thanks in advance for any ideas!

Oh, and P.S. - my goal is to use this RAM for a virtual disk drive unit I'm designing for a retro laptop computer called the Epson PX-8. I've been toying with this project for many years, and recently had some prototype success, so I'm very excited to take the next steps.

There is an open-source implementation of such a virtual drive that runs under Linux. It's written in C, and thus it would be cool if I could use Catalina and port as much of the code as possible with minimal rewrite. I saw that Catalina can use the SD card as a standard file system, but only if enough RAM is present - hence, my quest for more memory.

Thanks again, and take care,

- Earl

Duane Degn · 2011-08-02 13:31

Earl,

Here's my attempt at an 8-bit data bus SRAM. It's a stack of 256Kbit DIP chips to make a 256KByte module. It's still waiting for an application.

There are links to other memory modules in the above thread.

I don't think Cluso's RAMBlade is listed anywhere in the thread. That would be another module to look at.

I haven't tried Catalina myself yet. I don't know which modules work best with it.

Duane

Cluso99 · 2011-08-02 17:52

RamBlade has 512KB of SRAM and a microSD slot, runs an overclocked Prop, and communicates with another Prop (or micro) via high-speed serial (2 pins). It is otherwise self-contained and uses no latches, which gives fast SRAM access. It is a small PCB (see link in my signature). I have other solutions coming; see http://forums.parallax.com/showthread.php?132769-Cluso-s-miniature-stackable-and-pluggable-boxable-propeller-boards-(1-quot-x1-quot-etc)

Jazzed has solutions using Quad SPI srams. Bill has solutions too (see Mikronauts, or search for Bill Henning). Dr_Acula has a number of DracBlades.

Dr_Acula · 2011-08-02 20:27

I have 10 different boards for the dracblade, all with external ram, but I still don't think they are the perfect answer. Like Cluso, I am working on some even better (ie faster) designs.

This schematic is version 5 http://www.smarthome.jigsy.com/files/documents/Propeller_v5.pdf

Essentially it is 3 latches and a 512k ram chip and uses 12 propeller pins.

The 'better' design I am working on uses one latch and 24 propeller pins and ought to get close to Cluso's design in terms of speed (the one that uses virtually all the propeller pins connected to the ram chip). I am porting all my designs over to the Gadget Gangster format, as this has now been endorsed by Parallax as an officially supported platform.

The great thing about Catalina is that it supports a variety of ram chips. So you can start with one hardware design, write all your code, and then very easily change to a different hardware design down the track, and all your code will still work. For example, if you run out of ram even with 512k, you can then go to jazzed's 32mb design.

A virtual drive in C sounds great. Please keep us posted with progress on the project.

jazzed · 2011-08-02 21:15

Cluso99 wrote: »

Jazzed has solutions using Quad SPI srams.

Actually that's 4MB QuadSPI Flash (2x for an 8 bit bus with 10 pins total).
I do have an 8x 32KB SPI SRAM board but it's a little slower than flash.
Samples of 64KB SPI SRAM are on the way now.

The really great thing about using SPI anything is its synchronous nature.
For example, read/write data for a stock SRAM requires changing the address
for every byte. With 2x QuadSPI you just send a clock strobe, so it's possible
to read as fast as the Propeller HUB will let you (5MB/s at 80MHz). Getting
the initial address takes longer, but on average with a cache the throughput
is much higher. I just haven't tested the drivers with Catalina yet.

There is another language where I can switch between HUB and 2x QuadSPI
flash for certain tests and the performance impact is less than a 1.33 slow down.
More complicated programs like FFT run less than 2x slower (actual 1.47:1).
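The 5MB/s figure above can be reproduced with a little arithmetic. This is a rough sketch, assuming the Propeller 1 hub gives each cog one access window every 16 system clocks and that each window moves a single byte; the function name is made up for illustration.

```python
# Back-of-envelope hub bandwidth for one Propeller 1 cog.
HUB_WINDOW_CLOCKS = 16  # one hub access window per cog every 16 system clocks


def hub_bytes_per_second(clock_hz, bytes_per_access=1):
    """Peak bytes/s if every hub window moves bytes_per_access bytes."""
    return clock_hz // HUB_WINDOW_CLOCKS * bytes_per_access


byte_rate = hub_bytes_per_second(80_000_000)      # byte-wide hub writes
long_rate = hub_bytes_per_second(80_000_000, 4)   # 32-bit hub writes
print(byte_rate, long_rate)
```

These are ceilings only; real drivers spend extra cycles on addressing and loop overhead.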

RossH · 2011-08-02 21:16

All,

I've offered this service before, but it seems appropriate to restate it here:

If anyone wants Catalina support for their external RAM solution, all they have to do is send me a working board (and preferably some clue - e.g. some Spin code - about how it is supposed to work!) and I'll write the necessary XMM API for inclusion in the next release of Catalina.

Or alternatively you can do this yourself and just send me the appropriate platform-specific target files (see the Catalina target directory for examples).

Ross.

Ale · 2011-08-02 23:42

The Propeller may not be the best solution. If you can live with the slow execution speed, it may suit your needs; if you want more memory and speed, there are other alternatives.

Gadgetman · 2011-08-03 02:02

Interesting project you have...
(I have the HX-20 and a PX-4. Not certain if I have a PX-8... my collection is not exactly organised... )

Are you aiming for an emulation of the optional floppy drive?

As for speed issues; I doubt that will be much of a problem.
(The HX-20 which is the origin of these models used 2 x 6301 CPUs running at 0.6MHz. The others are single-CPU, I believe. A bit faster, though. )

Rayman · 2011-08-03 06:48

RossH wrote: »

If anyone wants Catalina support for their external RAM solution, all they have to do is send me a working board (and preferably some clue - e.g. some Spin code - about how it is supposed to work!) and I'll write the necessary XMM API for inclusion in the next release of Catalina.

Ross,

I didn't know about this offer! Thanks. I'll definitely send you a Flashpoint Rampage in hopes you can make it work with Catalina.
I'm kinda focussing on using it for video memory right now...

prof_braino · 2011-08-03 06:57

Retrobits wrote: »

Quest for RAM. ... breadboard SD card unit. ... more would be even better. Thanks in advance for any ideas!
... use this RAM for a virtual disk drive unit

Hi Earl

This may be on the outside edge of what you want, so here's food for thought. There is a method to use SD instead of RAM, which is a little slower than actual RAM, but much cheaper in terms of hardware and I/O resources.

On my project, we treat (one) SD as internal to the prop system, that is, organized and accessed to be optimized for speed. No FAT, no PC compatibility. A single block on the SD is accessed at any time, and the last block accessed is the current block. Areas of the disk are allocated into "files", and once defined, a file definition is permanent until the SD is reformatted. The files become a "memory map" of the SD; they can be read and the contents altered, but the size is fixed. The idea is that these restrictions reduce overhead and keep it fast.

The result is we can pretend the prop with an 8Gig SD card has 8Gig of RAM. Since the SD is persistent, the application(s) use the SD as both RAM and storage. I think usage might be similar to how they used to store programs in core memory, if I understand core memory. With some imagination, one can pretend that the COG memory is used as L1 cache and registers, HUB memory is used as 32k of L2 cache, and the SD is used as RAM and disk storage.

The trick becomes dividing the target applications and data into less-than-32K chunks, working on one disk block at a time, and keeping the SD channel saturated. If one can do this, one can have programs and data up to the size of the SD card. The techniques to use this are necessarily a little different from those on a PC workstation, but might be similar to dealing with the 64K boundary on old DOS systems minus the segment registers. The transfer speed is limited to the transfer speed of the SD card, so it is not the solution for all applications. You can have a very large number of small (assembler) functions that run at full prop speed, and a very large number of sub-32k data chunks.
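The scheme described above (fixed-size regions, no FAT, one current block) can be modelled in a few lines. This is only a toy sketch, not the project's actual code: the card is simulated with a `bytearray`, the 512-byte block size is an assumption, and `BlockStore` and its method names are hypothetical.

```python
BLOCK_SIZE = 512  # assumed SD block size in bytes


class BlockStore:
    """Toy stand-in for a raw SD card: fixed-size regions, one cached block."""

    def __init__(self, total_blocks):
        self.card = bytearray(total_blocks * BLOCK_SIZE)  # simulated card
        self.regions = {}       # name -> (first_block, block_count)
        self.next_free = 0      # first unallocated block
        self.current = None     # number of the block held in the buffer
        self.buffer = bytearray(BLOCK_SIZE)
        self.dirty = False

    def define(self, name, block_count):
        """Allocate a permanent, fixed-size region (a "file")."""
        if self.next_free + block_count > len(self.card) // BLOCK_SIZE:
            raise ValueError("card full")
        self.regions[name] = (self.next_free, block_count)
        self.next_free += block_count

    def flush(self):
        """Write the current block back to the card if it was modified."""
        if self.dirty and self.current is not None:
            start = self.current * BLOCK_SIZE
            self.card[start:start + BLOCK_SIZE] = self.buffer
        self.dirty = False

    def _select(self, block):
        """Make `block` the current block, flushing the old one first."""
        if block == self.current:
            return
        self.flush()
        start = block * BLOCK_SIZE
        self.buffer[:] = self.card[start:start + BLOCK_SIZE]
        self.current = block

    def _locate(self, name, offset):
        first, count = self.regions[name]
        if not 0 <= offset < count * BLOCK_SIZE:
            raise IndexError("offset outside fixed-size region")
        return first + offset // BLOCK_SIZE, offset % BLOCK_SIZE

    def read(self, name, offset):
        block, pos = self._locate(name, offset)
        self._select(block)
        return self.buffer[pos]

    def write(self, name, offset, value):
        block, pos = self._locate(name, offset)
        self._select(block)
        self.buffer[pos] = value
        self.dirty = True


store = BlockStore(total_blocks=8)
store.define("data", 4)
store.write("data", 600, 0x5A)   # lands in the region's second block
store.read("data", 0)            # switching blocks flushes the write
print(hex(store.read("data", 600)))
```

Keeping a single current block means an access only costs a card transfer when it crosses a block boundary, which is why the post stresses working on one block at a time.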

Cluso99 · 2011-08-03 16:08

Prof_Braino: In ZiCog I use the CPM disks directly as you say, except they are separate fixed contiguous files located on the SD card under FAT16. ZiCog searches for the filenames and locates the first sector, which is then used as an offset to the data. This has the advantage of being able to support FAT files as well as pure data.
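Because each disk image is a contiguous file, its first sector works as a plain base offset. A minimal sketch of that address calculation, assuming 512-byte SD sectors and 128-byte CP/M records; `locate_record` and the example start sector are illustrative, not taken from ZiCog.

```python
SECTOR_SIZE = 512       # SD sector size in bytes
CPM_RECORD_SIZE = 128   # CP/M logical record size in bytes


def locate_record(first_sector, record_number):
    """Map a CP/M record in a contiguous image to (absolute sector, byte offset)."""
    byte_offset = record_number * CPM_RECORD_SIZE
    return first_sector + byte_offset // SECTOR_SIZE, byte_offset % SECTOR_SIZE


# 2048 is a made-up start sector, as if found by the FAT16 filename search.
print(locate_record(2048, 5))
```

Once the start sector is known, no FAT chain has to be followed for any later access.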

prof_braino · 2011-08-03 17:50

Cluso99 wrote: »

except they are separate fixed contiguous files located on the SD card under FAT16. This has the advantage of being able to support FAT files as well as pure data.

Yes that is a good way to do it. Sal felt the overhead of FAT support would more than double the SD support, and significantly impact performance, so he took the other option. The SD can still be accessed through the prop, and all file things that FAT does still get done.

The plan is to have a second SD slot that includes the full FAT support. This way, the "internal" SD is as fast as possible, and only the "external" SD is impacted. The thinking is that SD is cheap when an SD adapter is used instead of an actual slot, and only needs a couple of pins; also, the FAT support can be loaded when needed and eliminated when complete.

Retrobits · 2011-08-05 11:33

Hi all,

Thanks for the responses! Of the various choices for RAM, I think I'll be going with Cluso's RAMBlade. Having a full Propeller "co-processor" with extra RAM, micro SD card support and Catalina support would be a great solution for this. The primary Propeller would handle I/O and user interface, and the RAMBlade can do the storage and heavy lifting. On a side note, I'm also looking forward to playing with the CP/M implementation for RAMBlade - that should be a lot of fun!

Speed is not really an issue, compared to some applications (e.g., real-time hi-res video). I only need to support a 38400 baud connection - that's the speed at which the Epson PX-8 laptop talks to the disk drive.
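To put the 38400 baud figure in perspective, here is a quick estimate. It assumes 10 bits on the wire per byte (one start bit, eight data bits, one stop bit), which is an assumption about the framing and not something confirmed above, and an 80MHz Propeller clock.

```python
def serial_bytes_per_second(baud, bits_per_frame=10):
    """Payload bytes/s; 10 bits per frame assumes 8N1 framing (an assumption)."""
    return baud / bits_per_frame


link = serial_bytes_per_second(38400)
clocks_per_byte = 80_000_000 / link   # system clocks available per serial byte
print(link, round(clocks_per_byte))
```

With tens of thousands of system clocks available per byte, even a slow external memory path leaves a lot of headroom.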

@Gadgetman: Yes, the hope is to emulate the Epson PF-10 portable floppy drive, which connects to the PX-8 computer over a 38400 baud serial connection. I believe it would also work for the PX-4. The Linux implementation of a virtual floppy (vfloppy) emulates four of those PF-10 drives simultaneously, and that would be my intent here as well. Disk image files serve as the virtual floppy media, similar to the approach used in emulators. In this case, those files would be on the SD card.
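Serving sectors out of a disk image file comes down to an offset calculation per drive. The sketch below shows the idea only: the geometry numbers are placeholders, not confirmed PF-10 values, and the image file names, the sector numbering convention and the `locate` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Geometry:
    tracks: int
    sectors_per_track: int
    sector_size: int

    def image_size(self):
        return self.tracks * self.sectors_per_track * self.sector_size

    def offset(self, track, sector):
        """Byte offset of a sector; tracks count from 0, sectors from 1 (assumed)."""
        if not (0 <= track < self.tracks and 1 <= sector <= self.sectors_per_track):
            raise ValueError("track/sector out of range")
        return (track * self.sectors_per_track + (sector - 1)) * self.sector_size


# Placeholder numbers for illustration only, not confirmed PF-10 geometry.
EXAMPLE = Geometry(tracks=40, sectors_per_track=16, sector_size=256)

# Hypothetical image names: one file on the SD card per emulated drive.
IMAGES = {drive: f"drive{drive}.img" for drive in range(4)}


def locate(drive, track, sector, geometry=EXAMPLE):
    """Return (image file name, byte offset) for a sector request."""
    return IMAGES[drive], geometry.offset(track, sector)


print(locate(2, 1, 1))
```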

I've sent an e-mail to Cluso asking about purchasing an assembled RAMBlade. I'll keep the forum up to date on my progress - once I'm a bit further on a working prototype, I'll make a post and include some pictures.

Thanks again!

- Earl

Dr_Acula · 2011-08-06 01:48

Retrobits wrote: »

I've sent an e-mail to Cluso asking about purchasing an assembled RAMBlade.

I believe Cluso's design is the fastest one out there. Where I think it really shines is with Catalina in XMM mode, because once you have the hardware sorted, you can start coding and never worry about running out of space. And Ross is very close to getting Catalina to something that is extremely easy to install and use.

But I am in awe of prof_braino's concept of caches. This is a brand new way of looking at things, and it opens up the possibility of huge programs. In particular, if you are smart and code your program so that things that are read/write often are the bits that end up in cog/hub ram, and the 'mostly read only' bits are in the sd cache, then the program could be optimised to run much more efficiently.

I wonder if the concept could be extended to considering a "sram" cache? I think it sits somewhere between "hub cache" and "sd cache", a bit faster than sd, doesn't wear out, but of course, data is lost on power down and it is not as big as the sd cache (though bigger than the hub cache).

Once you abstract the hardware in such a way, you can optimise code and then hardware can be changed down the track without changing the code.

As an aside, a while back cluso designed the "triblade" with three propellers, some with sram. I don't know if it is still available, but it was the first propeller board I owned and I don't think I ever fully explored all the possibilities. It used high speed links between the propeller chips as well as autoloading one propeller to the next. And it had a micro SD and an external ram chip.

I'm thinking of taking this further, with a propeller plus sram dedicated to the highest resolution display possible within the timing constraints, coupled with a propeller plus sram dedicated to running big Catalina C programs as fast as the hardware will allow (24 pins devoted to external ram on each board). It is all very, very experimental, but thanks to this amazing forum and the generous time and help from a number of individuals, both parts of the project are going ahead in leaps and bounds. Like many projects, it will probably end up being a piece of code with multiple acknowledgments.

@retrobits, a cached drive could well end up being very useful to a number of software projects. For a start, if you use more pins than the 4 that connect to an sd card, it ought to fundamentally be faster than an sd card.

LoopyByteloose · 2011-08-06 10:10

Regarding Catalina's offer, I have a Hydra RAM extension card "Xtreme 512k" that Andre LeMothe created. So there is something already available.

Nonetheless, my impression is that all external RAM can only run as fast as Hub RAM because the i/o is on that same bus. Am I wrong?

Heater. · 2011-08-06 10:38

Given that external RAM is not on any Propeller bus but is driven by a bus of your own creation using IO pins, it has to be a lot slower than using HUB RAM.
I don't have any performance figures to hand but I'm sure they have been posted on the various ext RAM threads.

jazzed · 2011-08-06 11:18

Heater. wrote: »

Given that external RAM is not on any Propeller bus but is driven by a bus of your own creation using IO pins, it has to be a lot slower than using HUB RAM.
I don't have any performance figures to hand but I'm sure they have been posted on the various ext RAM threads.

Heater, just for hardware comparison sake I implemented heater_fft 2.0 in a language (David Betz xBasic) that supports HUB only and 2xQuadSPI Flash (10 pin interface).

The HUB only version finished heater_fft in about 2.574 seconds and the Flash version finished in 3.802 seconds. That's only a 47.7% slowdown.

I'd like to see the performance difference on one of the "faster" solutions. Flash is not RAM, but a similar SRAM solution is on my desk ... I have to port the cache driver before I'll know the performance difference for that.

HUB only at 96MHz

$ xbcom -b hub96 -p 6 xbfft.bas -r -t
Propeller Version 1 on COM6
Writing 3892 bytes to Propeller RAM.
Verifying ... Upload OK!
Loading VM
1916 bytes sent
Loading image
13740 bytes sent
Entering terminal mode. Exit with ESC.

heater_fft v2.0 - ported to xbasic by  jazzed July 2011.

 Test time about 3 seconds on 80MHz Propeller.
 Starting test ....

Freq. Magnitude
0       396
1       0
...
191     0
192     1023
193     0
...
319     0
320     1023
321     0
...
511     0
512     400
1024 point FFT plus magnitude calculation run time = 2574 ms

SpinSocket-Flash (4MB code space cached in HUB at 96MHz)

$ xbcom -b ssf -p 6 xbfft.bas -r -t
Propeller Version 1 on COM6
Writing 3892 bytes to Propeller RAM.
Verifying ... Upload OK!
Loading cache driver
1484 bytes sent
Loading VM
1916 bytes sent
Loading image
13752 bytes sent
Entering terminal mode. Exit with ESC.

heater_fft v2.0 - ported to xbasic by  jazzed July 2011.

 Test time about 3 seconds on 80MHz Propeller.
 Starting test ....

Freq. Magnitude
0       396
1       0
...
191     0
192     1023
193     0
...
319     0
320     1023
321     0
...
511     0
512     400
1024 point FFT plus magnitude calculation run time = 3802 ms
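The slowdown quoted from these two runs can be checked directly from the reported run times; a short sketch of the arithmetic, using only the 2574 ms and 3802 ms figures from the logs above:

```python
hub_ms = 2574     # HUB-only run time from the first log
flash_ms = 3802   # 2x QuadSPI flash run time from the second log

ratio = flash_ms / hub_ms
slowdown_percent = (ratio - 1) * 100
print(f"{ratio:.3f}:1, {slowdown_percent:.1f}% slower")
```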

Heater. · 2011-08-06 13:04

Meanwhile I found some old zog results for Dhrystone and Ackermann's function. There is a slowdown of about a factor of four going from HUB RAM to EXT RAM.

Those heater_fft results don't look right to me.

jazzed · 2011-08-06 14:55

Heater. wrote: »

Meanwhile I found some old zog results for Dhrystone and Ackermann's function. There is a slowdown of about a factor of four going from HUB RAM to EXT RAM.

Ouch. What was the impact on fft?

Heater. wrote: »

Those heater_fft results don't look right to me.

Maybe it was 1.0 instead of 2.0. All I know is the results were identical to the SPIN version with #define SPINBUTTERFLIES except for performance. I didn't bother with PASMBUTTERFLIES since it's not exercising the "business part" of the language.

LoopyByteloose · 2011-08-06 18:54

I suspect that factor of four is about dead on. After all, Hub ram can be directly accessed at 32 bits wide, while external ram is accessed at 8 bits wide. Both are 'picked off' the Hub's bus at the same rate, but 8 bits is one-fourth the data. I imagine a big boost here would be to have 16-bit wide external ram.
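The width argument can be illustrated by counting bus transfers. This sketch assembles one little-endian 32-bit value from four byte-wide reads; the `read_byte` callback and the in-memory "bus" are stand-ins for a real driver, and the count ignores address setup and driver overhead, so it shows the ratio of transfers, not measured speed.

```python
def read_long_over_8bit_bus(read_byte, address):
    """Assemble a little-endian 32-bit value from four byte-wide bus reads."""
    value = 0
    for i in range(4):
        value |= read_byte(address + i) << (8 * i)
    return value


memory = bytes([0x78, 0x56, 0x34, 0x12])   # simulated external RAM contents
transfers = []                             # records every bus read


def read_byte(addr):
    transfers.append(addr)
    return memory[addr]


value = read_long_over_8bit_bus(read_byte, 0)
print(hex(value), len(transfers))
```

A 16-bit wide memory would need two transfers per long in the same model.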

I am now rereading Andre's documentation for the Xtreme 512K ram as it is an interesting 8bit wide device that was intended to be well suited for video support.

Heater. · 2011-08-06 20:55

The xtreme RAM card may well be good for streaming data from RAM to video but that makes it very slow for random access as when running code from ext RAM.

RossH · 2011-08-06 20:59

Heater. wrote: »

The xtreme RAM card may well be good for streaming data from RAM to video but that makes it very slow for random access as when running code from ext RAM.

Actually, it's not too bad. It's certainly not the slowest XMM Ram platform I have. The "auto increment" function for addresses means that as long as you are reading successive bytes it is quite quick. This was intended for streaming data, but it works for streaming instruction fetches as well.

You only need to slow down on the corners (so to speak)!
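One way to picture the "corners" is to count how often a new address has to be sent for a given access pattern. This is a simplified model, assuming the card only needs an explicit address whenever an access does not follow directly on from the previous byte; `count_address_setups` is a hypothetical helper, not part of any Catalina API.

```python
def count_address_setups(addresses):
    """Count accesses that cannot rely on auto-increment from the previous one."""
    setups = 0
    expected = None
    for addr in addresses:
        if addr != expected:
            setups += 1          # a "corner": the address must be sent again
        expected = addr + 1      # auto-increment predicts the next address
    return setups


straight_line = list(range(100, 116))              # sequential fetches
with_branch = list(range(0, 8)) + list(range(64, 72))
scattered = [5, 3, 9]
print(count_address_setups(straight_line),
      count_address_setups(with_branch),
      count_address_setups(scattered))
```

Straight-line code pays for one address setup per run, while scattered data accesses pay for one on nearly every byte.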

Ross.

jazzed · 2011-08-06 22:20

RossH wrote: »

Actually, it's not too bad. It's certainly not the slowest XMM Ram platform I have. The "auto increment" function for addresses means that as long as you are reading successive bytes it is quite quick. This was intended for streaming data, but it works for streaming instruction fetches as well.

You only need to slow down on the corners (so to speak)!

Ross.

If the data bits were on P0..7, it could burst at full HUB byte-wide data rates.

The 10 pin SPI RAM and Flash solutions I'm producing are twice the performance for less than half the price. Big QuadSPI Flash chips are even cheaper. I received the 64K x8 SPI RAM samples (512KB/board) and will be testing them soon. QuadSPI Flash addressing is much faster than for SPI SRAM.
