External RAM

parsko · 2006-06-21 10:03

Hi all,

There seems to be a lot of talk about adding external RAM. I, too, have a need. I have found

DS2016 >> http://www.maxim-ic.com/quick_view2.cfm/qv_pk/2835

to be something that might work well for us Proppers. It consumes quite a few pins, but with 32 to spare, I think 21 (max) would be okay to give up.

Frankly, I wouldn't quite know where to begin with something like this. I assume that once you hook up the pins, you need to define an address to write to, a byte to write, then flick the "write enable" pin, and it sucks it all up.

Also, it only adds 2048 bytes of location space (even though the datasheet says 2048 words), but that is a big lookup table (in my circumstance).

Any thoughts?

I might be able to write a spin object for this, but an assy object is out of my hands...

-Parsko

Ym2413a · 2006-06-21 15:42

I really think the answer would be some kind of cheap serial RAM.

Parallel SRAM is nice and simple to interface to, but it would take up too many pins for any large storage design that also needs a lot of I/O pins.
8 data pins and a WR would only leave... 23 pins afterwards.
Don't get me wrong, that's enough address space for 8 megs. But once you start using pins for other I/O such as DAC, VGA, NTSC, buttons and other misc stuff,
you'll soon find yourself short of pins.

32k of parallel SRAM would need 24 pins alone and would only leave 8 pins for other things.
Something that would work for some designs, but not all.
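
A rough sketch of the pin arithmetic above (not from the original post). It assumes a directly wired parallel SRAM needs one pin per address bit, 8 data pins and a single WR pin, and that 32 I/O pins are available; the function names are made up for illustration.

```python
PROP_IO_PINS = 32  # I/O pin count quoted in the thread

def sram_pins(size_bytes, data_bits=8, control_pins=1):
    """Pins for a directly wired parallel SRAM: address + data + control."""
    address_bits = (size_bytes - 1).bit_length()
    return address_bits + data_bits + control_pins

def pins_left(size_bytes):
    """Pins remaining for everything else (video, buttons, ...)."""
    return PROP_IO_PINS - sram_pins(size_bytes)

def max_addressable(data_bits=8, control_pins=1):
    """Bytes addressable if every remaining pin becomes an address line."""
    return 2 ** (PROP_IO_PINS - data_bits - control_pins)

print(sram_pins(32 * 1024), pins_left(32 * 1024))  # pins used and left for 32K x 8
print(max_addressable())                           # address space with 23 address pins
```

With these assumptions, a 32K x 8 part uses 24 pins and leaves 8, and 23 address pins cover 8 MB, matching the figures in the post.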

I wonder what's cheaper per bit these days. Serial SRAM or Parallel. (hmm)

--Andrew Arsenault.

Paul Baker · 2006-06-21 15:43

You would have to write code which adheres to the timing diagrams for the chip. As a first step you can do this in Spin, then worry later about converting it to assembly. BTW there are better chips out there for less money, such as the CY7C1399B-15VC, which is a SRAM 32KX8 3.3V ASYNC 28-SOJ. SOJ is a surface mount device that has DIP spacing, so they are very easy to solder. If you don't need all that memory, you can always tie the unused address pins to ground; it's still cheaper.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Life is one giant teacup ride.

Paul Baker · 2006-06-21 15:50

Andrew, finding serial SRAM is next to impossible. The closest you can find to the unlimited writability of SRAM is FRAM manufactured by Ramtron, which is non-volatile like EEPROM. It is decidedly more expensive than SRAM: the part I referred to is available for $1.89 (Digikey), compared to $6.74 for the same sized FRAM (NewarkInOne, which has them back ordered for 32 days).

Mike Green · 2006-06-21 16:16

For a lookup table where you are writing only rarely, but mostly reading, a serial EEPROM would be ideal (look at Atmel's website). They come in sizes up to 128K x 8. I2C is slower, but only uses 2 pins. SPI is faster, but takes more pins. If you need write speed, you can always substitute the Ramtron chips. They're supposed to be a direct replacement for standard serial EEPROM.

bambino · 2006-06-21 16:29

I'm not sure what the latency would be, but using a serial-to-parallel I/O expander and another chip to read it back to serial again might be doable with 4 pins, provided they were each daisy chained!

Kaos Kidd · 2006-06-21 16:36

I just ordered a 128K x 8 SRAM from Maxim as well, and their "high speed port expander". My intention is to use the 28 ports on the port expander to access the SRAM, via a separate cog and some buffering. The chips should be here shortly.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Just tossing my two bits worth into the bit bucket

KK

Ym2413a · 2006-06-21 16:59

Paul Baker said...
Andrew, finding serial SRAM is next to impossible. The closest you can find to the unlimited writability of SRAM is FRAM manufactured by Ramtron, which is non-volatile like EEPROM. It is decidedly more expensive than SRAM: the part I referred to is available for $1.89 (Digikey), compared to $6.74 for the same sized FRAM (NewarkInOne, which has them back ordered for 32 days).

The other problem would be access time. You might be stuck with using the pins for parallel SRAM if you need fast access to your storage.

I've used the Cypress CY62148BLL-70SXC before on some of my other designs.
It's a 512 kilobyte, 70 ns SRAM that you can buy from Digikey.

bambino · 2006-06-21 17:41

Kaos Kidd said,

I just ordered a 128K x 8 SRAM from Maxim as well, and their "high speed port expander". My intention is to use the 28 ports on the port expander to access the SRAM, via a separate cog and some buffering. The chips should be here shortly.

______________________________________________________________________________________________
Does that port expander enable you to read and write to the RAM?

Phil Pilgrim (PhiPi) · 2006-06-21 17:45

Guys,

With a couple cheap 8-bit latches ('373, '573, etc.), you could multiplex 16 address lines and 8 data lines on 8 pins, plus a few extra for strobes and enables. Use the latches to latch the address lines, then read or write the data directly.
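
A minimal sketch of the latch idea above, as a toy software model rather than real hardware code (not from the original post). It assumes two 8-bit address latches in front of a 64K x 8 SRAM, all sharing one 8-bit bus; the class and method names are hypothetical, and each method stands for one strobe on a control pin.

```python
class LatchedSram:
    """Toy model: two 8-bit address latches feeding a 64K x 8 SRAM."""

    def __init__(self):
        self.mem = bytearray(64 * 1024)
        self.addr_lo = 0
        self.addr_hi = 0

    def strobe_lo(self, bus):
        """Latch-enable pulse: capture the low address byte from the bus."""
        self.addr_lo = bus & 0xFF

    def strobe_hi(self, bus):
        """Latch-enable pulse: capture the high address byte from the bus."""
        self.addr_hi = bus & 0xFF

    def write(self, bus):
        """WE pulse: store the byte currently on the bus."""
        self.mem[(self.addr_hi << 8) | self.addr_lo] = bus & 0xFF

    def read(self):
        """OE asserted: the SRAM drives the bus with the addressed byte."""
        return self.mem[(self.addr_hi << 8) | self.addr_lo]

def poke(ram, address, value):
    ram.strobe_lo(address & 0xFF)   # bus cycle 1: low address byte
    ram.strobe_hi(address >> 8)     # bus cycle 2: high address byte
    ram.write(value)                # bus cycle 3: data

def peek(ram, address):
    ram.strobe_lo(address & 0xFF)
    ram.strobe_hi(address >> 8)
    return ram.read()

ram = LatchedSram()
poke(ram, 0x1234, 0xAB)
print(hex(peek(ram, 0x1234)))
```

Each access costs three bus cycles instead of one, which is the price paid for using 8 bus pins plus a few strobes instead of 24.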

-Phil

Kaos Kidd · 2006-06-21 18:11

bambino,
It's an I2C port expander...
My intention is to use three lines: SCL, SDA and CE. I should get fast response via high-speed I2C.
Basic operation would be something like:
Set the address port pins to the desired address and the data port pins to input or output via I2C, toggle the CE line, read or write the data via I2C, then toggle CE again.
If it works out, I'll add more chips (up to 16) and a 4->16 decoder to drive the CE lines (total line count of 6 pins on the Propeller) for 2 megs of SRAM.
The Spin object would handle which chip I need to access, and if all is great, and the need is there, I can always add additional SCL/SDA lines and add "bank select"
to the Spin object. In all, the final Spin object would present a sequential memory map and deal with which bank / chip is needed.
All this is off in the future at this point, as I'm only coding the first stage: getting it to read / write to 1 chip.
Phil, you are correct as well. I was just looking at using the port expander as the multiplexer for the addressing, and using the extra ports for the data, but thinking about it, I could remove two I2C transfers by reading the data directly... Still, there are so many ways of doing things... :)
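
The sequential memory map described above could be sketched like this (an illustration only, not the object from the post). It assumes 128K x 8 chips, 16 chips per 4->16 decoder, and one "bank" per extra SCL/SDA pair; `locate` is a hypothetical helper name.

```python
CHIP_SIZE = 128 * 1024   # one 128K x 8 SRAM, as in the post
CHIPS_PER_BANK = 16      # one 4->16 decoder driving the CE lines

def locate(linear_address):
    """Map a sequential address to (bank, chip, offset within chip)."""
    chip_index, offset = divmod(linear_address, CHIP_SIZE)
    bank, chip = divmod(chip_index, CHIPS_PER_BANK)
    return bank, chip, offset

print(CHIP_SIZE * CHIPS_PER_BANK)               # bytes per bank
print(locate(0))                                # first byte of the first chip
print(locate(CHIP_SIZE))                        # first byte of the second chip
print(locate(CHIP_SIZE * CHIPS_PER_BANK + 5))   # spills into the next bank
```

The chip number is what would go to the decoder inputs, and the offset is what would be written to the expander's address port.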

KK

Paul Baker · 2006-06-21 18:20

I've been contemplating the merits of using an SX48 as an intelligent latch. It would use 2 ports as a dedicated address bus, another port as an 8 bit input (sniffing the Propeller/SRAM data bus), and some lines for SRAM control and Prop interface. It could be designed to operate as an ordinary latch, do single step auto-increment/decrement, burst memory operations and page mode non-sequential addressing (lower 8 bits of address supplied, data comes back on the same 8 bits), depending on a function value provided by the Propeller. While it wouldn't be as fast as using the Propeller alone, you could probably do it while occupying only 12 pins (or fewer) and without sacrificing very much speed.

Post Edited (Paul Baker) : 6/21/2006 6:23:48 PM GMT

Ym2413a · 2006-06-21 19:17

Phil Pilgrim (PhiPi) said...
Guys,

With a couple cheap 8-bit latches ('373, '573, etc.), you could multiplex 16 address lines and 8 data lines on 8 pins, plus a few extra for strobes and enables. Use the latches to latch the address lines, then read or write the data directly.

-Phil

I like Phil's Bank Switching type idea the most.
If writing the address into latches first isn't going to kill anyone, then it would more than likely be the fastest and cheapest way of saving pins while adding large SRAM chips.

The SX48 would make a cool memory controller.
Only it would add to the cost of a design and would be overkill for most.

Still, a "smart" memory controller would be a cool project in itself.

Tracy Allen · 2006-06-21 21:15

Terry Hitt (well known here as Bean) has already followed through on the SX to memory peripheral idea. I've seen his neat product, soon to be released, I believe. It includes 32k of RAM and an SD memory card interface using FAT16. It uses a serial interface at 19200 baud, so it won't be as fast as a parallel implementation, but it is really cool for what it is meant to do.
www.sddatalogger.com/

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Tracy Allen
www.emesystems.com

parsko · 2006-06-22 06:47

Hi guys,

Thanks for all your comments. I have been researching this because I have a need for quick RAM access for lookup tables (ones that may be too large for onboard RAM). I also have the need to write/update lookup values at the same time as I need to read them. A summary of how fast SPI works:
2400 bit/s = 300 byte/sec = 150 word/sec
9600 bit/s = 1200 byte/sec = 600 word/sec
57600 bit/s = 7200 byte/sec = 3600 word/sec
115.2 kbit/s = 14400 byte/sec = 7200 word/sec

Thus far, I need to write words, and would like to read/write faster than 1000Hz, preferably 2500Hz. At this rate, with an EEPROM capable of 100,000 writes, I would run out in 40 secs (at 2500Hz). Granted, I would not be accessing the same value, but have to design for the worst case.
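
The arithmetic behind these figures, as a small sketch (not from the original post). It treats the listed rates as 2400, 9600, 57600 and 115200 bit/s, assumes 8 bits per byte with no start/stop or protocol overhead, and takes the worst case where every write hits the same EEPROM cell.

```python
def bytes_per_sec(bits_per_sec, bits_per_byte=8):
    """Raw byte rate, ignoring framing and protocol overhead."""
    return bits_per_sec / bits_per_byte

def seconds_to_wear_out(endurance_writes, writes_per_sec):
    """Worst case: every write lands on the same cell."""
    return endurance_writes / writes_per_sec

for rate in (2400, 9600, 57600, 115200):
    b = bytes_per_sec(rate)
    print(rate, b, b / 2)   # bit/s, byte/s, 16-bit word/s

print(seconds_to_wear_out(100_000, 2500))   # seconds until 100,000 writes are used up
```

Real serial links carry framing bits as well, so the usable rates would be somewhat lower than this upper bound.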

I agree the FRAM seems to be quite a desirable thing. It seems to be cheaper than the RAM chip I suggested, and it doesn't need a battery for non-volatility.

What does 20 MHz SPI equate to in bit/s? The highest SPI setting for the Prop is 115.2 kbit/s, correct?

BUT, if one takes the number of pins used with the DS2016, you could use 4 SPI connections, actually, 3+N=20 --> 17 SPI connections. Again, one full_duplex object can only run at 115.2 kbit/s, right? So, two objects can write 14 words at 1000Hz to two SD cards/SPI EEPROM/SPI FRAM and consume 8 pins. I guess that is quite acceptable.

Grumble Grumble, decisions can be tough to come by sometimes...

-Parsko

Paul Baker · 2006-06-22 13:07

I don't think it is a proper analogy to use the full duplex object to assess what's possible with SPI. In master configuration, with the normal method of writing (in assembly), you should be able to approach a throughput of a bit every 5 instructions, or 4 Mbit/s, or 200k words/second. If you were to unroll the communication loop and have a counter generate the clock signal and sync to that clock, you could achieve in the vicinity of a bit every 2 instructions, or 10 Mbit/s, or 500k words/second.

These figures are just raw throughput; remember that every byte written non-sequentially requires 3 bytes of preamble.
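
A quick check of the bit rates above (a sketch, not from the original post). It assumes an 80 MHz system clock and 4 clocks per instruction, both figures taken from elsewhere in this thread, and models the preamble as 3 extra bytes per non-sequential access; the function names are hypothetical.

```python
CLOCK_HZ = 80_000_000        # assumed system clock
CLOCKS_PER_INSTRUCTION = 4   # assumed time for a typical cog instruction

def spi_bits_per_sec(instructions_per_bit):
    """Raw bit rate of a software SPI loop."""
    return CLOCK_HZ // (CLOCKS_PER_INSTRUCTION * instructions_per_bit)

def payload_fraction(payload_bytes, preamble_bytes=3):
    """Share of the transferred bytes that is actual data."""
    return payload_bytes / (payload_bytes + preamble_bytes)

print(spi_bits_per_sec(5))    # plain bit-banged loop
print(spi_bits_per_sec(2))    # unrolled loop with a counter-generated clock
print(payload_fraction(1))    # one byte per non-sequential access
print(payload_fraction(64))   # a 64-byte sequential burst
```

Under these assumptions a single random byte uses only a quarter of the raw rate for data, so sequential bursts are what get close to the headline numbers.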

Post Edited (Paul Baker) : 6/22/2006 1:26:05 PM GMT

parsko · 2006-06-27 17:46

It just occurred to me, after reading another post regarding COG stuff, that one could use a COG as RAM. What got me thinking this was

Chip said...
Each COG has 512 longs (32-bits), 496 of which are general-purpose RAM registers which hold the program and data for that COG. So, the maximum COG program size is 496 longs.

For every COG, there are 496 longs, or 1984 bytes, available. You would need code to download data to the COG once, and code to handle the transactions with other COGs. Depending on the resulting size of this COG's code, you could use the remaining memory which, I'm guessing completely, could be 1000+ bytes, 500 words, 250 longs.
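
The capacity estimate can be written out as a tiny sketch (not from the original post). It assumes 496 general-purpose longs of 4 bytes each per COG, as quoted above, and a hypothetical resident program of `program_longs` longs.

```python
BYTES_PER_LONG = 4
GENERAL_PURPOSE_LONGS = 496   # from the quoted description of a COG

def free_bytes(program_longs):
    """Bytes left for data once the resident program is loaded."""
    return (GENERAL_PURPOSE_LONGS - program_longs) * BYTES_PER_LONG

print(free_bytes(0))       # the whole COG, with no program at all
print(free_bytes(246))     # largest program that still leaves 1000 bytes
print(8 * free_bytes(0))   # all eight COGs, ignoring program space
```

So the "1000+ bytes" guess holds as long as the handler code stays at or below 246 longs.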

If you used more COGs, then you would have more memory.

Does this make sense?

-Parsko

NOTE: The reason I keep asking this is because I have a need for a lookup table. There will need to be around 1000 values (it's actually 2 values on the same size table). I'm trying to figure out if it could be a "word" lookup table, versus a "byte" table, which would be half the size. It all boils down to what would work best for my specific application. In the end I could go both ways (snicker), but curiosity has gotten the best of me.

Kaos Kidd · 2006-06-27 18:05

Parsko:
Earlier I posed the same question in a different thread, slightly differently... and it's very doable.
A small assembly object launched into a cog, with two methods:
Write(Cog,Address,@Var,Count) ... Used to write into the asm cog's unused RAM... @Var is a pointer to a Spin var or array, Count is the number
Read(Cog,Address,@Var,Count) ... Used to read into a Spin var or array from the cog's unused RAM.
(The only reason I'm including the Cog param is for the possibility of running more than 1 copy of the object in different cogs)
The max address size will be the entire unused portion of the cog running the asm code, addressed as a continuous space starting at 0.
As things stand right now, I'm working on this object as my first attempt at asm.
My goal is to keep the asm short, giving more space for storage.
It's going slow, but it's going. My biggest issue is passing the params into the running asm application, but I've ideas about that as well..

KK

parsko · 2006-06-27 22:26

Interesting, Kaos, you got me thinking. Would it be possible to have cog1 load data from an external EEPROM into cog2, and further minimize the code needed on cog2 to handle moving the data to and from whatever cog could want it (a.k.a. COGRAM)? In my case, a large lookup table, I would only need to read it once at the start, and write it at the end... What is the max COGRAM space that can be achieved?

-Parsko

Cliff L. Biffle · 2006-06-28 04:59

In theory, there's 16K of space for use in the COGs' local storage, minus a couple hundred bytes for the interface program.

Keep in mind, however, that unless you do something spectacularly clever, you'll have a latency of at least 48 cycles per read, and 32 per write. That's 3x and 2x slower than shared RAM (respectively), and 12x and 8x slower than local storage.
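
The ratios above follow from two access times, shown here as a sketch (not from the original post). The 16-cycle shared-RAM access and 4-cycle local access are assumptions chosen to be consistent with the quoted 3x/2x and 12x/8x figures.

```python
HUB_ACCESS_CYCLES = 16    # assumed: one shared-RAM access
LOCAL_ACCESS_CYCLES = 4   # assumed: one cog instruction touching local storage

def slowdown(latency_cycles):
    """Return (vs shared RAM, vs local storage) slowdown factors."""
    return (latency_cycles / HUB_ACCESS_CYCLES,
            latency_cycles / LOCAL_ACCESS_CYCLES)

print(slowdown(48))   # read through another COG
print(slowdown(32))   # write through another COG
```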

parsko · 2006-06-28 06:44

Cliff,

When you say shared RAM, do you mean the global Prop RAM? If so, the problem with that is, if one has a HUGE program that consumes all the memory, then you could not use the shared RAM, right? That is why I suggest loading a mini program in another cog to handle data that gets written to it from an external EEPROM (or similar).

Also, what do you mean by local storage?

-Parsko

Cliff L. Biffle · 2006-06-28 16:11

Each COG has 2k (512x32) local storage, which can be accessed single-cycle (by a four-cycle instruction). This is the memory that (I believe) Kaos Kidd is proposing using as a sort of storage space, to augment the shared 32K of RAM. (KK, if I've misread you, please correct me.) This also sounds like your "mini program" to buffer data from an EEPROM.

As for a "huge program," you're probably referring to SPIN programs, which live in shared RAM; the actual COG-level code is stored in this 2k of local storage, along with any COG data. (For SPIN programs, this is the interpreter.) My comments were aimed squarely at assembly language programming, and my timings will be incorrect for SPIN.

So, back to COG-resident storage. You can't simply call a function in another COG; you'll need some way of communicating between COGs.

(Chip, if you're reading this, seriously consider a semaphore/exchanger mechanism between COGs for a future rev. The Acroname Brainstem actually serves as a good example here, for once. I can point you at docs if you want.)

So, the most practical approach is to reserve some shared memory in the 32K shared RAM. Without going to the level of actual instruction mnemonics, the protocol would look like this:

READ:
1. Master COG writes an address to shared RAM. (7-22 cycles, requires HUB sync)
2. Slave COG reads address from shared RAM. (7-22 cycles, requires HUB sync)
3. Slave COG writes data to shared RAM. (At least 16 cycles after #2.)
4. Master COG reads data from shared RAM. (7-22 cycles)

You could carefully sync up the COGs to avoid any need for "data-ready" flags or locks (which I've been discussing in another thread), but you're still looking at significant latency.

Writes are cheaper.

WRITE:
1. Master COG writes address and data to shared RAM. If you treat the Slave as a 16-bit-wide memory, you could pack these into a single word.
2. Slave COG reads address and data.
3. Slave COG modifies internal storage.

If you don't mind the slave COG running full-out at all times (with the accompanying current consumption), and as long as you have a single master COG, you won't need locking or other communication; writes and reads are idempotent in this protocol.
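
A toy model of the READ/WRITE protocol above, written as plain Python rather than cog assembly (not from the original post). The layout is an assumption for illustration: one 32-bit command long in shared RAM holding a hypothetical write flag in bit 31, a 9-bit address in bits 16-24 and 16 bits of data in bits 0-15, plus a separate reply slot. All names are made up.

```python
ADDR_MASK = 0x1FF        # 512 longs of cog storage
DATA_MASK = 0xFFFF       # slave treated as a 16-bit-wide memory
WRITE_FLAG = 1 << 31     # hypothetical command bit, not from the post

def pack_write(address, data):
    """Address and data packed into a single 32-bit command."""
    return WRITE_FLAG | ((address & ADDR_MASK) << 16) | (data & DATA_MASK)

def pack_read(address):
    return (address & ADDR_MASK) << 16

def unpack(command):
    is_write = bool(command & WRITE_FLAG)
    return is_write, (command >> 16) & ADDR_MASK, command & DATA_MASK

class SlaveCog:
    """Stands in for the slave's polling loop; 512 is an upper bound on storage."""

    def __init__(self, size_words=512):
        self.store = [0] * size_words

    def service(self, mailbox):
        """One pass of the loop: read the command long and act on it."""
        is_write, address, data = unpack(mailbox["command"])
        if is_write:
            self.store[address] = data
        else:
            mailbox["reply"] = self.store[address]

mailbox = {"command": pack_read(0), "reply": 0}   # stands in for shared RAM
slave = SlaveCog()

mailbox["command"] = pack_write(10, 0xBEEF)   # master: write address + data
slave.service(mailbox)
slave.service(mailbox)                        # a repeated pass changes nothing

mailbox["command"] = pack_read(10)            # master: write address only
slave.service(mailbox)                        # slave: post the data
print(hex(mailbox["reply"]))                  # master: read the data back
```

Because servicing the same command twice leaves the same state, the slave can spin on the mailbox without a lock, which is the idempotence property the post relies on for a single master.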

Kaos Kidd · 2006-06-28 17:30

Cliff:
Yes, you got it right. My intention is to create a "COG RAM DRIVE" (aka "CRD"), using a total software approach. As it stands right now, the assembly program (the actual part to run in the cog) is only 17 longs long. I'm still working on getting the "data" exchange to work correctly, but you're right. From the beginning I have intended to use the locks to manage the "run" of the app. The lock's function is to communicate "GO" from Spin -> assembly and "DONE" from assembly -> Spin. I just intend to monitor the lock's status, having the code toggle its state as needed. If I code this aspect correctly, one could have more than one CRD running.
You stated the latency of the read/writes in X's, could you express in bytes per second?

Parsko:
If you include the I2C object, you can write to the CRD after each read from the I2C object. The whole idea is to increase runtime var storage. You really don't want to bulk out the asm code to do extra stuff.

KK

parsko · 2006-06-29 07:22

Okay, I'm pretty sure that this may be a repeat, but...

I'm wicked confused now regarding the distribution of memory. Please correct the following if I am wrong:

Global RAM = 32k
Global ROM = 32k
COG RAM = 2k (x8=16k)
Total Memory on the Prop = 80k

Global ROM = stores log table, character map, etc... NOT usable by user other than to READ
Global RAM = stores compiled spin/asm programs (where Smile goes when you hit "F10")
COG RAM = Stores "initialized" COG programs. aka if I COGNEW, this is where the program is stored

If your spin/asm program compiled is 32k, one would not have room left over to store a large lookup table.
If your spin/asm program compiled is 16k, one would have 16k left over to store a large lookup table.
Processor (COG) ram is ONLY accessible by the COG (with the exception of Phils callback routine?)
Processor (COG) ram is where an ASM program is stored when called via a COGNEW command

Slightly tangent: a compiled program starts in COG(0). One could start COG(1), run code that could possibly run until electricity no longer exists on the supply pin, while the code in COG(0) stops. Thus leaving COG(1) as the only COG running. Correct?

Thanks so far guys!

-Parsko

Cliff L. Biffle · 2006-06-30 14:38

First KK, then parsko:

Kaos Kidd said...

You stated the latency of the read/writes in X's, could you express in bytes per second?

Assuming you were able to read/write 16 bits per operation (which would let you pack both data and address into a single word), that'd come out to rooooughly 46 cycles per 16 bits read. (I'm making some inappropriate assumptions about COG sync.) At 80MHz, then, you could do a little over 1.7m reads per second if you were doing absolutely nothing else, for a data rate of roughly 3.48MBps.

By contrast, a single COG can shuttle data into or out of shared RAM at 20MBps, or to/from local COG storage at 80MBps.

If you use a lock, add at least 16 cycles to each read. If you write in SPIN, divide these numbers by 200.
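
The throughput figures above can be reproduced with a short sketch (not from the original post). It assumes an 80 MHz clock, 2 bytes per 46-cycle read through the slave COG, 4 bytes per 16-cycle shared-RAM access and 4 bytes per 4-cycle local access; the last two are assumptions chosen to match the quoted 20 and 80 MBps.

```python
CLOCK_HZ = 80_000_000

def transfers_per_sec(cycles_per_transfer):
    return CLOCK_HZ / cycles_per_transfer

def bytes_per_sec(cycles_per_transfer, bytes_per_transfer):
    return transfers_per_sec(cycles_per_transfer) * bytes_per_transfer

print(transfers_per_sec(46))     # reads per second through the slave COG
print(bytes_per_sec(46, 2))      # 16 bits per read
print(bytes_per_sec(46 + 16, 2)) # same, with at least 16 lock cycles added
print(bytes_per_sec(16, 4))      # one COG to/from shared RAM
print(bytes_per_sec(4, 4))       # local COG storage
```

With a lock, the same model drops to roughly 2.6 MB/s, which is the cost the post warns about.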

Now:

parsko said...

Global RAM = stores compiled spin/asm programs (where Smile goes when you hit "F10")
COG RAM = Stores "initialized" COG programs. aka if I COGNEW, this is where the program is stored

Correct, but I'll clarify this a bit; anything you load onto the Propeller using the normal process goes into shared RAM (global RAM) at boot. Little chunks of it may get pulled into COG RAM and run as assembly programs on the COGs.

Note I said assembly programs. If you write in SPIN, what actually gets pulled into the COG is the SPIN interpreter.

Any leftover space in the COG after the program's loaded can be used (by an assembly program) as data space. The SPIN interpreter also does this, but I don't know that it's user-accessible.

So:
parsko said...
If your spin/asm program compiled is 32k, one would not have room left over to store a large lookup table.
If your spin/asm program compiled is 16k, one would have 16k left over to store a large lookup table.

These statements are correct for SPIN, and mostly correct for assembler -- but as I noted above, assembler programs have to be pulled into local COG ram, and thus must be broken into chunks of about 2k. So, if your assembler program is 32k, you have other problems and have to start being very clever.

parsko said...
Processor (COG) ram is ONLY accessible by the COG (with the exception of Phils callback routine?)

COG RAM is only accessible to the one COG, period. I'm not sure I've seen the callback routine you're referring to, but this restriction is in hardware; the most one could do is circumvent it in software like KK is planning.

parsko said...
Processor (COG) ram is where an ASM program is stored when called via a COGNEW command

Yes, that's correct.

parsko said...
Slightly tangent: a compiled program starts in COG(0). One could start COG(1), run code that could possibly run until electricity no longer exists on the supply pin, while the code in COG(0) stops. Thus leaving COG(1) as the only COG running. Correct?

Correct.

Kaos Kidd · 2006-06-30 16:28

Maybe this might be a better way of viewing it.
There's 32K of space. Within that 32K of space you must fit all of your Spin and assembly code
(or have a means within the code to get new code from an off-chip source!).

A single cog can run either an assembly or a Spin program. If it's assembly, the max size is 2K and it must fit entirely within a cog.
For Spin, the interpreter is loaded into the "target" cog and run.
When the interpreter is running, the Spin code is fetched (byte code by byte code) from global RAM and executed.
The compiled Spin code still resides in the global 32K of RAM. All of the vars are in that same GR space as well.
Remember the old BASIC days? Same concept here. NOTE: GR := Global RAM
You can make one 32k Spin application with 1 method. It's stored in GR.
You can make one 32k Spin application with 512 {I don't know the max count on methods, just a random number} methods. It's stored in GR.
You can make 8 Spin programs with ??? number of methods each; the sum of all must be <=32k, all of which is stored and left in GR.
The Spin code never leaves GR. Only as it's executed are parts retrieved by the cog that's executing that chunk of code.
The sweetness of it all is this: any one method can be run by any cog. This can happen up to 8 times.
Assembly is different. It occupies GR and a cog.
The assembled code for an assembly program is stored in GR.
When you cognew it into a cog, it is copied entirely into the cog, then executed.

Now, if there was a simple way to keep assembly code from using GR, or to recycle the GR for other things :)

When working with some younger scouts (11~15), I translated the verbiage into automotive terms they understood:
Each cog is a complete machine. We have 8 machines we can use and play with. Each of the 8 machines can do anything we tell it to do.
If we want, we can tell all 8 to do the same thing, or each one something different.
Any machine can be a regular machine (runs Spin) or a performance machine (runs assembly).
All 8 machines use the same gas tank (the gas tank is the global RAM).
If I want one of the machines to be a performance machine, I have to put that special gas (assembly code) into the gas tank, then tell one of the regular machines to use that gas. The ECU (automotive Electronic Control Unit; for the Propeller it's the compiler) knows where in the gas tank the special gas is, and makes the regular machine fill its carbs (cog RAM) with the special gas, then it starts the performance machine. If you made the gas (program) right, the machines will never run out of gas.

This verbiage hit home for all but the youngest one.
Not only did they understand it and demonstrate that they understood it, some are getting into the hobby.
Well, I hope this helps.

KK

Kaos Kidd · 2006-06-30 16:29

Cliff:
Thanks. I didn't know how big a performance hit it would be...

KK
