Cache volatile memory options
Dr_Acula
Posts: 5,484
Caching seems to be increasing in popularity, and I was wondering if we could brainstorm some memory options?
I was surprised recently when testing caching for Catalina to find that the final speed of a large program was hardly affected by the actual memory speed. I guess that stands to reason if most memory accesses are a cache "hit".
There are a number of memory options that use lots of Propeller pins, and in general, the more pins, the faster the memory. But caching changes that rule. So I thought it could be useful to explore large external memory solutions using as *few* pins as possible.
I started with this Propeller Application Note http://huarobotics.com/appnotes/AN012-SRAM-v1.0_0.pdf
It uses Spin and a single 23K256 (32k) chip connected to a C3. This is a fantastic document packed with useful information.
But - what if we want to add many of these 32k chips?
Well, you could use an I/O expander chip. Scroll just under halfway down this page http://futurlec.com/ICSFMicrochip.shtml for some expander chips by Microchip. There are also chips by Philips and others.
But an SPI expander chip needs another 4 pins of its own, so together with the RAM's 4 we have used 8 pins, and at 12 pins one may as well use an SRAM and latches.
Can we use fewer than 8 pins?
Well, the first step would be to share the data and clock lines between the RAM chips and the I/O expander. That uses 5 lines, and with a 16-output expander you could have 16 23K256 chips (or even more).
Next, you could move the expander onto the Propeller's existing EEPROM I2C bus by using an I2C expander rather than an SPI one (MCP23017 instead of MCP23S17). That means you only use 3 pins (DO, DI and clock), and you can have an almost unlimited number of SPI RAM chips, since several MCP23017s can share the bus.
But with more than 6 SPI RAM chips, the cost starts to add up, and so does the board real estate. As a rough guide, Futurlec sell the 23K256 for $1.35 for 32k, so you would need 16 of them (about $21.60) to get 512k. By comparison, a 512k SRAM chip from Future Electronics (and others) is around $3.25.
So how about we drive a 512k SRAM chip with a couple of MCP23017 16-port I/O expander chips and run that off the I2C EEPROM bus?
Most designs use an EEPROM anyway, so this is a *Zero Pin 512k RAM expansion*.
Would it work? Well, this website describes doing this for the Arduino with 8k RAM chips (http://davidn.org/wp/?cat=3&paged=2), so the idea ought to scale up. The 512k SRAM has 32 pins, 2 of which are power, leaving 30 signals. Two 16-port I/O expanders give 32 lines, so if the address, data, /OE and /WE lines are shared and only /CE is separate for each chip, you could even drive three 512k SRAM chips.
So, some questions for the sage and wise experts on this forum:
First, for the caching PASM drivers, is there room for an I2C driver in the cog?
And second, are there other ram chips that could be used that are better/cheaper/larger?
Comments
I am not sure if large serial RAM chips exist; I couldn't find anything over 32k. So using an I2C expander and a real SRAM seemed like an answer. And yes, it will be slow, but no slower than other serial options, and with caching in hub the speed of the external memory is not so important.
I'm trying to come up with a design for the propeller that has keyboard, mouse, display, sd card and still has some pins free for other things.
Someone here mentioned some 64kB serial SRAM chips becoming available. It helps a bit. And there were some large serial flash chips with a decent data rate too.
Also
Cluso a while back had what I thought was a really nice solution. He had a Prop board, an unlatched RAM chip board, and an optional sandwich board that provided some latching and freed up pins at the same time. So for speed, just use the 2 boards with no latching, or to free up pins, just add the latching board in between. With his permission you could extend this idea to a Propeller Platform board format.
EEPROM XMMC is a little slow at about 0.18 DMIPS. For comparison LMM runs at 4 DMIPS. 2x QuadSPI is at 0.71 DMIPS with the slow version of the driver.
XMMC puts code in "read-only" memory with data, stack, and select bits of code in HUB memory. This offers great advantages. I've never talked about leaving select bits of program code in HUB memory on the forums before - that's a GCC super power.
On the volatile front, I have an 8x SPI SRAM module (10 pin solution), but haven't had time to integrate the driver. I'll look at that this week. I have samples of the 64KB SRAMs for a 512KB module - not sure if the performance improvement will justify the additional cost over straight SRAM or not.
Is this some sort of caching algorithm? I can certainly see the advantage of keeping the program in some external memory and having data and stack in hub. Then the "program" is only ever read-only and as such, doesn't wear out flash memory.
64KB sram samples? Sounds great. The problem with using hub for screen buffer, cache, stack and data is you run out of hub memory, but 64KB may well be enough extra memory for some working variables.
I found Dave Hein's PASM I2C driver, which is a starting point for experimenting with I2C expander chips (I need to bypass Spin and start with PASM if this is going to end up in the cache driver for C etc). I picked up some MCP23017 chips at a great price from element14, so hopefully I can get something breadboarded this week with the crazy "no pin" memory solution. I'm thinking two MCP23017 chips on the EEPROM bus and one 512k SRAM.
There are several such GCC super powers. Be careful though. With such powers come great responsibility.
For source that must also compile with vanilla C, such empowered code should be carefully wrapped in #if defined(__PROPELLER_GCC__) ... #endif
That's part of the story. The other part is that since it's read-only, "write-back" or "write-through" swapping is not used, so the cache and code execution is faster.
Just adapt the driver from propgcc
I've added some Alpha Test Release notes for using eeprom or c3 here: http://code.google.com/p/propgcc/wiki/PropGccAlphaRelease
You are right, thinking about this more you could have two types of cache - one for data which is read/write, and another type for programs which is read only and would be faster (and use less code too, I presume). Smart thinking.
So GCC would let you put a function that is used a lot in hub, and a function that is rarely used in external memory? I can see some immediate uses for that.
Yes, we use this now with XMM/LMM for 115200 baud console output, etc... to save a COG.