External RAM
parsko
Posts: 501
Hi all,
Seems to be a lot of talk about more external RAM. I, too, have a need. I have found
DS2016 >> http://www.maxim-ic.com/quick_view2.cfm/qv_pk/2835
to be something that might work well for us Proppers. It consumes quite a few pins, but with 32 to spare, I think 21 (max) would be okay to give up.
Frankly, I wouldn't quite know where to begin with something like this. I assume that once you hook up the pins, you need to define an address to write to and a byte to write, then flick the "write enable" pin, and it sucks it all up.
Also, it only adds 2048 bytes of location space (even though the datasheet says 2048 words), but that is a big lookup table (in my circumstance).
Any thoughts?
I might be able to write a spin object for this, but an assembly object is out of my hands...
-Parsko
Comments
Parallel SRAM is nice and simple to interface to, but it would take up too many pins for any large-storage design that also needs a lot of I/O pins.
8 data pins and a WR would leave only... 23 pins afterwards.
Don't get me wrong, that's enough address space for 8 megs. But then you start using pins for other I/O things such as DAC, VGA, NTSC, buttons and other misc stuff.
You'll soon find yourself with a lack of pins.
32k of parallel SRAM would need 24 pins alone and would leave only 8 pins for other things.
Something that would work for some designs, but not all.
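To make the pin arithmetic above easy to re-run for other chip sizes, here is a rough sketch in Python (not Spin). It assumes a byte-wide SRAM with a single control line (WR), which is how the 24-pin figure for 32k works out; `pins_used` and `pins_left` are made-up helper names, and a real chip may also need CE/OE pins.

```python
TOTAL_IO_PINS = 32  # Propeller I/O pins, as stated in the thread


def address_pins(size_bytes: int) -> int:
    """Address lines needed to reach every byte of a byte-wide SRAM."""
    return (size_bytes - 1).bit_length()


def pins_used(size_bytes: int, data_pins: int = 8, control_pins: int = 1) -> int:
    """Address + data + control pins (control_pins=1 assumes only WR is driven)."""
    return address_pins(size_bytes) + data_pins + control_pins


def pins_left(size_bytes: int, data_pins: int = 8, control_pins: int = 1) -> int:
    """Propeller pins still free after wiring the SRAM directly."""
    return TOTAL_IO_PINS - pins_used(size_bytes, data_pins, control_pins)


if __name__ == "__main__":
    print(pins_used(32 * 1024), pins_left(32 * 1024))   # 32k SRAM
    print(2 ** (TOTAL_IO_PINS - 8 - 1))                  # bytes addressable with 23 address pins
```

With 8 data pins and one WR pin taken, the remaining 23 pins address 2^23 bytes, which is the "8 megs" mentioned above.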
I wonder what's cheaper per bit these days. Serial SRAM or Parallel. (hmm)
--Andrew Arsenault.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Life is one giant teacup ride.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Just tossing my two bits worth into the bit bucket
KK
·
The other problem would be access time. You might be stuck with using the pins for parallel SRAM if you need fast access to your storage.
I've used the Cypress CY62148BLL-70SXC before on some of my other designs.
It's a 512-kilobyte, 70 ns SRAM that you can buy from Digi-Key.
I just ordered a 128K x 8 SRAM from Maxim as well, along with their "high speed port expander". My intention is to use the 28 ports on the port expander to access the SRAM, via a separate cog and some buffering. The chips should be here shortly.
______________________________________________________________________________________________
Does that port expander enable you to read and write to the RAM?
With a couple cheap 8-bit latches ('373, '573, etc.), you could multiplex 16 address lines and 8 data lines on 8 pins, plus a few extra for strobes and enables. Use the latches to latch the address lines, then read or write the data directly.
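A toy model of the latch idea, in Python rather than on real hardware, may help show the sequence: two 8-bit latches hold the low and high address bytes, and the same 8 pins then carry the data. The class name, the 64K size, and the bus-cycle counter are all assumptions for illustration; strobe and enable timing is ignored entirely.

```python
class LatchedSram:
    """Toy model: 64K x 8 SRAM addressed through two 8-bit latches on a shared 8-bit bus."""

    def __init__(self, size: int = 1 << 16):
        self.mem = bytearray(size)
        self.addr_lo = 0      # contents of the low-address latch
        self.addr_hi = 0      # contents of the high-address latch
        self.bus_cycles = 0   # how many times the 8 shared pins were used

    def _bus(self, value: int) -> int:
        """One transfer over the 8 shared pins."""
        self.bus_cycles += 1
        return value & 0xFF

    def latch_address(self, address: int) -> None:
        self.addr_lo = self._bus(address)        # strobe latch 1
        self.addr_hi = self._bus(address >> 8)   # strobe latch 2

    def _address(self) -> int:
        return (self.addr_hi << 8) | self.addr_lo

    def write(self, address: int, value: int) -> None:
        self.latch_address(address)
        self.mem[self._address()] = self._bus(value)   # data phase, then pulse WR

    def read(self, address: int) -> int:
        self.latch_address(address)
        return self._bus(self.mem[self._address()])    # data phase with pins as inputs


if __name__ == "__main__":
    ram = LatchedSram()
    ram.write(0x1234, 0xAB)
    print(hex(ram.read(0x1234)), ram.bus_cycles)
```

Each random access costs three bus transfers instead of one, which is the price paid for saving the 16 address pins.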
-Phil
It's an i2c port expander...
My intention is to use three lines: SCL, SDA and CE. I should get fast response via high-speed I2C.
Basic operation would be something like:
Set the address port pins to the desired address and the data port pins to input or output via I2C, toggle the CE line, read or write the data via I2C, then toggle CE again.
If it works out, I'll add more chips (up to 16) and a 4->16 decoder to drive the CE lines (total line count of 6 pins on the Propeller) for 2 megs of SRAM.
The spin object would handle which chip I need to access, and if all is great, and the need is there, I can always add additional SCL/SDA lines, and add "bank select"
into the spin object. In all, the final spin object would present a sequential memory map; the spin object will deal with which bank / chip is needed.
All this is off in the future at this point, as I'm only coding the first stage: getting it to read / write to 1 chip.
Phil, you are correct as well. I was just looking at using the port expander as the multiplexer for the addressing, and using the extra ports for the data, but thinking about it, I could remove two I2C transfers by reading the data directly... Still, there are so many ways of doing things... :)
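The "sequential memory map" part of that plan boils down to splitting one linear address into a chip number and an offset. A minimal Python sketch, assuming 16 chips of 128K x 8 behind a 4->16 decoder; `locate` and `decoder_inputs` are hypothetical names, not part of any existing object:

```python
CHIP_SIZE = 128 * 1024   # bytes per SRAM chip (128K x 8)
CHIP_COUNT = 16          # chips selectable through a 4->16 decoder


def locate(linear_address: int) -> tuple[int, int]:
    """Map a linear address to (chip index, offset within that chip)."""
    if not 0 <= linear_address < CHIP_SIZE * CHIP_COUNT:
        raise ValueError("address outside the 2 MB map")
    return divmod(linear_address, CHIP_SIZE)


def decoder_inputs(chip: int) -> tuple[int, int, int, int]:
    """The 4 decoder select bits for a chip, least significant bit first."""
    return tuple((chip >> bit) & 1 for bit in range(4))


if __name__ == "__main__":
    chip, offset = locate(700_000)
    print(chip, offset, decoder_inputs(chip))
```

Adding a "bank select" later would just mean one more divmod step in front of this one.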
Post Edited (Paul Baker) : 6/21/2006 6:23:48 PM GMT
I like Phil's Bank Switching type idea the most.
If writing the address into latches first isn't going to kill anyone, then it would more than likely be the fastest and cheapest way of saving pins while adding large SRAM chips.
The SX48 would make a cool memory controller.
Only it would add to the cost of a design and would be overkill for most.
Still, a "smart" memory controller would be a cool project in itself.
www.sddatalogger.com/
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Tracy Allen
www.emesystems.com
Thanks for all your comments. I have been researching this because I have a need for quick RAM access for lookup tables (ones that may be too large for onboard RAM). I also need to write/update lookup values at the same time as I read them. A summary of how fast SPI works:
2400 bit/s = 300 byte/sec = 150 word/sec
9600 bit/s = 1200 byte/sec = 600 word/sec
57600 bit/s = 7200 byte/sec = 3600 word/sec
115.2 kbit/s = 14400 byte/sec = 7200 word/sec
Thus far, I need to write words, and would like to read/write faster than 1000Hz, preferably 2500Hz. At this rate, with an EEPROM capable of 100,000 writes, I would run out in 40 secs (at 2500Hz). Granted, I would not be accessing the same value, but have to design for the worst case.
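For anyone checking the numbers, the arithmetic behind the table and the 40-second worst case can be sketched like this in Python. It only divides raw bit rates by 8 and 16 and ignores any framing or command overhead; the function names are made up for the example.

```python
def bytes_per_second(bits_per_second: float) -> float:
    """Raw byte rate, ignoring start/stop bits and any command overhead."""
    return bits_per_second / 8


def words_per_second(bits_per_second: float) -> float:
    """Raw 16-bit word rate."""
    return bits_per_second / 16


def seconds_until_worn_out(endurance_writes: int, writes_per_second: float) -> float:
    """Worst case: every write lands on the same EEPROM location."""
    return endurance_writes / writes_per_second


if __name__ == "__main__":
    for rate in (2400, 9600, 57600, 115200):
        print(rate, bytes_per_second(rate), words_per_second(rate))
    print(seconds_until_worn_out(100_000, 2500))
```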
I agree the FRAM seems to be quite a desirable thing. It seems to be cheaper than the RAM chip I suggested, and it doesn't need a battery for non-volatility.
What does 20 MHz SPI equate to in bit/s? The highest SPI setting for the Prop is 115.2 kbit/s, correct?
BUT, if one takes the number of pins used with the DS2016, you could use 4 SPI connections, actually, 3+N=20 --> 17 SPI connections. Again, one full_duplex object can only run at 115.2 kbit/s, right? So, two objects can write 14 words at 1000 Hz to two SD cards/SPI EEPROM/SPI FRAM and consume 8 pins. I guess that is quite acceptable.
Grumble Grumble, decisions can be tough to come by sometimes...
-Parsko
These figures are just the raw rates; remember that every byte written non-sequentially requires 3 bytes of preamble.
Post Edited (Paul Baker) : 6/22/2006 1:26:05 PM GMT
For every COG, there are 496 longs, or 1984 bytes available. You would need code to download data to the COG once, and code to handle the transactions to other COG's. Dependent upon the resultant size of this COG's code, you could use the remaining memory which, I'm guessing completely, could be 1000+ bytes, 500 words, 250 longs.
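The guess above is easy to turn into a small calculation. This Python sketch takes the 496-long figure from the post; the code size passed in is purely a hypothetical input, since nobody has measured the helper program yet.

```python
COG_LONGS = 496  # usable longs per cog, as stated above


def free_storage(code_longs: int) -> dict[str, int]:
    """Space left in one cog after the helper code, in longs, words and bytes."""
    if not 0 <= code_longs <= COG_LONGS:
        raise ValueError("code does not fit in a cog")
    free_longs = COG_LONGS - code_longs
    return {"longs": free_longs, "words": free_longs * 2, "bytes": free_longs * 4}


if __name__ == "__main__":
    print(free_storage(0))     # an empty cog
    print(free_storage(246))   # hypothetical code size
```

A helper of 246 longs would leave exactly the 250 longs / 500 words / 1000 bytes guessed at above; a smaller helper leaves more.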
If you used more COGs, then you would have more memory.
Does this make sense?
-Parsko
NOTE: The reason I keep asking this is because I have a need for a lookup table. There will need to be around 1000 values (it's actually 2 values on the same size table). I'm trying to figure out if it could be a "word" lookup table, versus a "byte" table, which would be half the size. It all boils down to what would work best for my specific application. In the end I could go both ways (snicker), but curiosity has gotten the best of me.
Earlier I posed the same question in a different thread, slightly different... and very doable:
A small assembly object launched into a cog, which has two methods:
Write(Cog,Address,@Var,Count) ... Used to write into the asm cog's unused ram... @var is a pointer to a spin var or array, count is the number
Read(Cog,Address,@Var,Count) ... Used to read into a spin var or array from the cog's unused ram.
(The only reason I'm including the Cog param is for the possibility of running more than 1 copy of the object in different cogs.)
The max address size will be the entire unused portion of the cog running the asm code, addressed as a continuous space starting at 0.
As things stand right now, I'm working on this object as my first attempt at asm.
My goals are to keep the asm short, giving more space for storage.
It's going slow, but it's going. My biggest issue is passing the params into the running asm application, but I've ideas about that as well...
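To pin down the behaviour being described, here is a Python stand-in for the Write/Read pair. It is only a model of the addressing rules (contiguous space starting at 0, bounded by the cog's unused RAM), not the asm object itself; the class name, the default code size of 32 longs, and the use of Python lists in place of @Var pointers are all assumptions.

```python
class CogRamDrive:
    """Model of a cog whose unused longs are exposed as storage starting at address 0."""

    def __init__(self, code_longs: int = 32, cog_longs: int = 496):
        # code_longs is a hypothetical size for the asm helper
        self.storage = [0] * (cog_longs - code_longs)

    def _check(self, address: int, count: int) -> None:
        if address < 0 or count < 0 or address + count > len(self.storage):
            raise IndexError("access outside the cog's unused RAM")

    def write(self, address: int, values: list[int]) -> None:
        """Copy values into the cog, starting at address."""
        self._check(address, len(values))
        for i, value in enumerate(values):
            self.storage[address + i] = value & 0xFFFFFFFF  # one long per slot

    def read(self, address: int, count: int) -> list[int]:
        """Copy count longs back out of the cog, starting at address."""
        self._check(address, count)
        return self.storage[address:address + count]


if __name__ == "__main__":
    drive = CogRamDrive()
    drive.write(0, [1, 2, 3])
    print(drive.read(1, 2), len(drive.storage))
```

Running more than one copy would simply mean one such object per cog, which is what the Cog parameter is for.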
-Parsko
Keep in mind, however, that unless you do something spectacularly clever, you'll have a latency of at least 48 cycles per read, and 32 per write. That's 3x and 2x slower than shared RAM (respectively), and 12x and 8x slower than local storage.
When you say shared RAM, do you mean the global Prop RAM? If so, the problem with that is, if one has a HUGE program that consumes all the memory, then you could not use the shared RAM, right? That is why I suggest loading a mini program in another cog to handle data that gets written to it from an external EEPROM (or similar).
Also, what do you mean by local storage?
-Parsko
As for a "huge program," you're probably referring to SPIN programs, which live in shared RAM; the actual COG-level code is stored in this 2k of local storage, along with any COG data. (For SPIN programs, this is the interpreter.) My comments were aimed squarely at assembly language programming, and my timings will be incorrect for SPIN.
So, back to COG-resident storage. You can't simply call a function in another COG; you'll need some way of communicating between COGs.
(Chip, if you're reading this, seriously consider a semaphore/exchanger mechanism between COGs for a future rev. The Acroname Brainstem actually serves as a good example here, for once. I can point you at docs if you want.)
So, the most practical approach is to reserve some shared memory in the 32K shared RAM. Without going to the level of actual instruction mnemonics, the protocol would look like this:
READ:
1. Master COG writes an address to shared RAM. (7-22 cycles, requires HUB sync)
2. Slave COG reads address from shared RAM. (7-22 cycles, requires HUB sync)
3. Slave COG writes data to shared RAM. (At least 16 cycles after #2.)
4. Master COG reads data from shared RAM. (7-22 cycles)
You could carefully sync up the COGs to avoid any need for "data-ready" flags or locks (which I've been discussing in another thread), but you're still looking at significant latency.
Writes are cheaper.
WRITE:
1. Master COG writes address and data to shared RAM. If you treat the Slave as a 16-bit-wide memory, you could pack these into a single word.
2. Slave COG reads address and data.
3. Slave COG modifies internal storage.
If you don't mind the slave COG running full-out at all times (with the accompanying current consumption), and as long as you have a single master COG, you won't need locking or other communication; writes and reads are idempotent in this protocol.
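The READ and WRITE sequences above can be modelled in a few lines of Python to show why no lock is needed with a single master. This is a sketch only: it assumes the address and 16-bit data are packed into one 32-bit mailbox value with a 9-bit address field, and `pack`, `unpack` and `SlaveCog` are invented names. Timing and HUB sync are not modelled at all.

```python
DATA_BITS = 16   # assumed width of each stored value
ADDR_BITS = 9    # assumed: enough to index up to 512 cog locations
DATA_MASK = (1 << DATA_BITS) - 1


def pack(address: int, data: int = 0) -> int:
    """Combine address (high bits) and data (low bits) into one mailbox value."""
    if not 0 <= address < (1 << ADDR_BITS):
        raise ValueError("address out of range")
    return (address << DATA_BITS) | (data & DATA_MASK)


def unpack(mailbox: int) -> tuple[int, int]:
    """Split a mailbox value back into (address, data)."""
    return mailbox >> DATA_BITS, mailbox & DATA_MASK


class SlaveCog:
    """Stand-in for the cog that owns the storage and polls the shared mailbox."""

    def __init__(self, size: int = 1 << ADDR_BITS):
        self.storage = [0] * size

    def service_write(self, mailbox: int) -> None:
        address, data = unpack(mailbox)
        self.storage[address] = data

    def service_read(self, mailbox: int) -> int:
        address, _ = unpack(mailbox)
        return self.storage[address]


if __name__ == "__main__":
    slave = SlaveCog()
    request = pack(5, 0xBEEF)
    slave.service_write(request)
    slave.service_write(request)   # servicing the same request twice changes nothing
    print(hex(slave.service_read(pack(5))))
```

Because a repeated write of the same mailbox value leaves the storage unchanged, a free-running slave can keep re-reading the mailbox without any "data-ready" flag.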
Yes, you got it right. My intention is to create a "COG RAM DRIVE" (aka "CRD"), using a total software approach. As it stands right now, the assembly program (the actual part to run in the cog) is only 17 longs long. I'm still working on getting the "data" exchange to work correctly, but you're right. From the beginning I have intended to use the locks to manage the "run" of the app. The lock's function is to communicate "GO" from spin -> assembly and "DONE" from assembly -> spin. I just intend to monitor the lock's status, having the code toggle its state as needed. If I code this aspect correctly, one could have more than one CRD running.
You stated the latency of the read/writes in X's, could you express in bytes per second?
Parsko:
If you include the i2c object, you can write to the CRD after each read from the i2c object. The whole idea is to increase runtime var storage. You really don't want to bulk out the asm code to do extra stuff.
I'm wicked confused now regarding the distribution of memory. Please correct the following if I am wrong:
Global RAM = 32k
Global ROM = 32k
COG RAM = 2k (x8=16k)
Total Memory on the Prop = 80k
Global ROM = stores log table, character map, etc... NOT usable by user other than to READ
Global RAM = stores compiled spin/asm programs (where everything goes when you hit "F10")
COG RAM = Stores "initialized" COG programs. aka if I COGNEW, this is where the program is stored
If your spin/asm program compiled is 32k, one would not have room left over to store a large lookup table.
If your spin/asm program compiled is 16k, one would have 16k left over to store a large lookup table.
Processor (COG) ram is ONLY accessible by the COG (with the exception of Phils callback routine?)
Processor (COG) ram is where an ASM program is stored when called via a COGNEW command
Slightly tangent: a compiled program starts in COG(0). One could start COG(1), run code that could possibly run until electricity no longer exists on the supply pin, while the code in COG(0) stops. Thus leaving COG(1) as the only COG running. Correct?
Thanks so far guys!
-Parsko
Assuming you were able to read/write 16 bits per operation (which would let you pack both data and address into a single word), that'd come out to rooooughly 46 cycles per 16 bits read. (I'm making some inappropriate assumptions about COG sync.) At 80 MHz, then, you could do a little over 1.7 million reads per second if you were doing absolutely nothing else, for a data rate of roughly 3.48 MB/s.
By contrast, a single COG can shuttle data into or out of shared RAM at 20MBps, or to/from local COG storage at 80MBps.
If you use a lock, add at least 16 cycles to each read. If you write in SPIN, divide these numbers by 200.
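Those figures all come from the same division, so here is a short Python sketch for plugging in other cycle counts. The 46-cycle read, the 16-cycle lock penalty and the divide-by-200 for SPIN are taken from the post; the 16-cycle shared-RAM access and 4-cycle local access moving 4 bytes each are my assumptions, chosen because they reproduce the 20 MBps and 80 MBps numbers.

```python
CLOCK_HZ = 80_000_000  # 80 MHz system clock


def reads_per_second(cycles_per_read: float) -> float:
    return CLOCK_HZ / cycles_per_read


def megabytes_per_second(cycles_per_read: float, bytes_per_read: int = 2) -> float:
    """Data rate in MB/s (1 MB = 1,000,000 bytes) if the cog does nothing else."""
    return reads_per_second(cycles_per_read) * bytes_per_read / 1_000_000


if __name__ == "__main__":
    print(reads_per_second(46), megabytes_per_second(46))   # cog-to-cog read
    print(megabytes_per_second(46 + 16))                     # same, with a lock
    print(megabytes_per_second(46) / 200)                    # rough SPIN estimate
    print(megabytes_per_second(16, 4), megabytes_per_second(4, 4))  # shared RAM, local
```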
Now:
Correct, but I'll clarify this a bit; anything you load onto the Propeller using the normal process goes into shared RAM (global RAM) at boot. Little chunks of it may get pulled into COG RAM and run as assembly programs on the COGs.
Note I said assembly programs. If you write in SPIN, what actually gets pulled into the COG is the SPIN interpreter.
Any leftover space in the COG after the program's loaded can be used (by an assembly program) as data space. The SPIN interpreter also does this, but I don't know that it's user-accessible.
So:
Quote:
"If your spin/asm program compiled is 32k, one would not have room left over to store a large lookup table."
"If your spin/asm program compiled is 16k, one would have 16k left over to store a large lookup table."
These statements are correct for SPIN, and mostly correct for assembler -- but as I noted above, assembler programs have to be pulled into local COG ram, and thus must be broken into chunks of about 2k. So, if your assembler program is 32k, you have other problems and have to start being very clever.
Quote:
"Processor (COG) ram is ONLY accessible by the COG (with the exception of Phils callback routine?)"
COG RAM is only accessible to the one COG, period. I'm not sure I've seen the callback routine you're referring to, but this restriction is in hardware; the most one could do is circumvent it in software like KK is planning.
Quote:
"Processor (COG) ram is where an ASM program is stored when called via a COGNEW command"
Yes, that's correct.
Quote:
"Slightly tangent: a compiled program starts in COG(0). One could start COG(1), run code that could possibly run until electricity no longer exists on the supply pin, while the code in COG(0) stops. Thus leaving COG(1) as the only COG running. Correct?"
Correct.
There's 32K of space. Within that 32K of space you must fit all of your spin and assembly code
(or have a means within the code to get new code from an off-chip source!).
A single cog can run either an assembly or spin program. If it's assembly, the max size is 2K and it must fit entirely within a cog.
For spin, the interpreter is loaded into the "target" cog, and run.
When the interpreter is running, the spin code is fetched (byte code by byte code) from global RAM and executed.
The compiled spin code still resides in the global 32K of RAM. All of the vars are in that same GR space as well.
Remember the old basic days? Same concept here. NOTE: GR := Global RAM
You can make 1 32k spin application, with 1 method. It's stored in the GR.
You can make 1 32k spin application with 512 {I don't know the max count on methods, just a random number} methods. It's stored in GR.
You can make 8 spin programs with ??? number of methods each; the sum of all must be <=32k, all of which is stored and left in GR.
The spin code never leaves the GR. Only as it's executed are parts retrieved by the cog that's executing that chunk of code.
The sweetness of it all is this: Any one method can be run by any cog. This can happen up to 8 times.
Assembly is different. It occupies GR and a cog.
The assembled code for an assembly program is stored in GR.
When you cognew it into a cog, it is copied entirely into the cog, then executed.
Now, if there was a simple way to keep the assembly image from using GR, or to recycle the GR for other things :)
When working with some younger scouts (11~15), I translated the verbiage into automotive terms they understood:
Each cog is a complete machine. We have 8 machines we can use and play with. Each of the 8 machines can do anything we tell it to do.
If we want, we can tell all 8 to do the same thing, or each one something different.
Any machine can be a regular machine (runs spin) or a performance machine (runs assembly).
All 8 machines use the same gas tank (the gas tank is the global RAM).
If I want one of the machines to be a performance machine, I have to put that special gas (assembly code) into the gas tank, then tell one of the regular machines to use that gas. The ECU (automotive Electronic Control Unit; for the Propeller it's the compiler) knows where in the gas tank the special gas is, and makes the regular machine fill its carbs (cog RAM) with the special gas, then it starts the performance machine. If you made the gas (program) right, the machines will never run out of gas.
This verbiage hit home for all but the youngest one.
Not only did they understand it and demonstrate that they understood it, some are getting into the hobby.
Well, I hope this helps.
Thanks. I didn't know how big a performance hit it would be...