Propeller as RAM (PRAM?)
heater
Idea: What about using Propellers as external RAMs for, well, another Propeller? Crazy or not....
My application requires 64K of RAM for data space. Any more than that is overkill. It only has 20K free in HUB. It also requires a whole bunch of pins.
Adding 64K of SRAM connected with parallel address/data lines eats my pins. Using serially connected RAMs is going to be slooowww.
But what if I used Props as external RAMs? Two of them would easily give the required 44K, and all the pins used in getting a reasonably fast parallel transfer would be recovered. Plus I'd gain a whole bunch of COGs to use as peripherals, and they have enough HUB left to be useful. All in all, a better 3-chip solution than 1 Prop + 2 * 32K SRAM, if a little slower.
Question: What is the fastest way to do this?
I'm thinking of a "master" Prop just pushing out address high, address low, data, using one pin as a clock. Another pin selects read/write, and another pin may select an I/O or memory operation. The two "slave" Props respond accordingly, sucking in data or blowing it out on the third clock. Provided the COGs keep in sync with that three-clock sequence, which they should, all is well.
An improvement would be to skip the address cycles for sequential reads/writes, which would take another pin I think.
There we have it, 24 Cogs, 96K RAM and 50 or so I/O pins free.
Any thoughts on this?
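A back-of-the-envelope count of what skipping the address cycles could buy. This assumes one clock per byte on the bus, three clocks per random access (address high, address low, data), and a sequential burst, signalled by the extra pin, that sends the address once and then moves one byte per clock. The function names are illustrative only:

```python
# Hypothetical clock-count model for the three-clock bus sketched above.
# Assumption: each byte of address or data costs exactly one clock.

def random_clocks(n_bytes: int) -> int:
    """Clocks to move n_bytes when every byte carries its own 16-bit address."""
    return 3 * n_bytes


def burst_clocks(n_bytes: int) -> int:
    """Clocks for one sequential burst: 2 address clocks, then 1 per byte."""
    if n_bytes == 0:
        return 0
    return 2 + n_bytes


for n in (1, 16, 256):
    print(n, random_clocks(n), burst_clocks(n))
```

For a 256-byte block this is 768 clocks versus 258, so the extra pin roughly triples throughput for long sequential runs, while single random accesses cost the same either way.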
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.
Comments
This is not a direct answer to your question, but a PSD gives you dual flash (1MB + 64KB) and 32KB SRAM. And you will get back all the I/O (and more) that interfacing external RAM to the Prop would waste. It has a built-in CPLD and DPLD: both are very easy to program, and the software is free (PSDsoft Express).
I have not yet used the PSD, but I usually use the uPSD3454EVB, which is more or less the same with a built-in 8032 10 MIPS core (free samples available).
I could swear that this was discussed a few years back on this forum. Maybe Mike's memory is better than mine, and he'd remember/confirm? Try a search.
-Parsko
[edit - I did a search and could not find it. I seem to recall that the big disadvantage was that the amount of additional RAM an extra Prop adds is still below what most people would want it for]
Post Edited (parsko) : 1/28/2009 5:46:56 PM GMT
There is an 8 bit transfer bus for addresses and data.
There are 3 control lines.
There is 1 clock line.
This still leaves 48 pins free which is enough for me.
The three control lines are coded to indicate what the operation is:
0 = Read memory byte at given address (Random access), 3 cycles.
1 = Write memory byte to given address. (Random access), 3 cycles.
2 = Read I/O port at given 8 bit port number. 2 cycles.
3 = Write I/O port at given 8 bit port number. 2 cycles.
4 = Read a byte from internal program counter register and increment the register (Next instruction fetch), 1 cycle.
5 = Read a byte from internal stack pointer register address and increment the register (Pop), 1 cycle.
6 = Decrement internal stack pointer and write a byte there (Push), 1 cycle.
7 = Read memory byte from given address and store that address + 1 in the internal register (Jump), 3 cycles.
We also need to set the internal stack pointer, but we have no code for that, so do it by writing to 2 reserved I/O ports. Setting the stack pointer is infrequent, so it can be a bit slower here (4 cycles).
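To check that the eight commands above hang together, here is a rough Python model of the slave ("PRAM") side. It is not Propeller code: each command is one method call rather than a clock-by-clock exchange, and the stack-pointer ports 0xFE/0xFF are an assumption, since the post only says "2 reserved I/O ports". The class and method names are hypothetical:

```python
# Hypothetical model of the slave side of the 8-command protocol.
# One "cycle" = one byte of data or address exchanged on the bus.
CYCLES = {0: 3, 1: 3, 2: 2, 3: 2, 4: 1, 5: 1, 6: 1, 7: 3}

# Assumption: the two reserved I/O ports used to load the stack pointer.
SP_LO_PORT, SP_HI_PORT = 0xFE, 0xFF


class PramSlave:
    def __init__(self):
        self.mem = bytearray(0x10000)   # 64K data space
        self.ports = bytearray(0x100)   # 8-bit I/O port space
        self.pc = 0                     # internal program counter register
        self.sp = 0                     # internal stack pointer register
        self.cycles = 0                 # bus cycles consumed so far

    def command(self, op, addr=0, port=0, data=0):
        """Run one command; return the byte sent back to the master, if any."""
        self.cycles += CYCLES[op]
        addr &= 0xFFFF
        port &= 0xFF
        data &= 0xFF
        if op == 0:                     # read memory (random access)
            return self.mem[addr]
        if op == 1:                     # write memory (random access)
            self.mem[addr] = data
        elif op == 2:                   # read I/O port
            return self.ports[port]
        elif op == 3:                   # write I/O port (two ports load SP)
            if port == SP_LO_PORT:
                self.sp = (self.sp & 0xFF00) | data
            elif port == SP_HI_PORT:
                self.sp = (self.sp & 0x00FF) | (data << 8)
            else:
                self.ports[port] = data
        elif op == 4:                   # instruction fetch: read at PC, PC += 1
            value = self.mem[self.pc]
            self.pc = (self.pc + 1) & 0xFFFF
            return value
        elif op == 5:                   # pop: read at SP, then SP += 1
            value = self.mem[self.sp]
            self.sp = (self.sp + 1) & 0xFFFF
            return value
        elif op == 6:                   # push: SP -= 1, then write at SP
            self.sp = (self.sp - 1) & 0xFFFF
            self.mem[self.sp] = data
        elif op == 7:                   # jump: read at addr, PC = addr + 1
            self.pc = (addr + 1) & 0xFFFF
            return self.mem[addr]
        return None


ram = PramSlave()
ram.command(3, port=SP_HI_PORT, data=0x80)   # SP = 0x8000: two port writes, 4 cycles
ram.command(3, port=SP_LO_PORT, data=0x00)
ram.command(6, data=0x12)                    # push 0x12
ram.command(6, data=0x34)                    # push 0x34
print(hex(ram.command(5)), hex(ram.command(5)), hex(ram.sp))
```

The pops come back as 0x34 then 0x12 and SP returns to 0x8000. Note how the jump (7) primes the program counter so that following instruction fetches (4) need only one cycle each.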
"Cycle" above means exchange of blobs of 8 bit data or address.
I want to use both edges of the clock!!! That is, any change in the clock means either a) there is now some data on the bus to write somewhere, or b) give me some data to read ASAP.
So a random write is:
1. Output control and low address
2. Toggle clock
3. Output control and high address
4. Toggle clock
5. Output control and data
6. Toggle clock
A random read is:
1. Output control and low address
2. Toggle clock
3. Output control and high address
4. Toggle clock
5. Release the data bus (make those pins inputs) and toggle clock
6. Wait for sufficient time and read data.
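Here is a lockstep Python model of the two-edge scheme, under the post's assumption that master and slave stay in step simply by counting clock changes from power-up. A write sends low address, high address, then data; a read sends the two address bytes and the slave drives the data after the third edge. The Bus, Slave and Master classes are hypothetical, the slave reacts instantly (the "wait for sufficient time" is not modelled), and on a real Prop the slave cog would wait for the clock pin to change with something like WAITPNE:

```python
# Hypothetical lockstep model: the master toggles one clock pin and the
# slave acts on every edge, rising or falling.
READ, WRITE = 0, 1  # control-line codes from the command table


class Bus:
    """The shared pins: 3 control lines, 1 clock line, 8 data lines."""
    def __init__(self):
        self.control = 0
        self.clock = 0
        self.data = 0


class Slave:
    def __init__(self, bus):
        self.bus = bus
        self.mem = bytearray(0x10000)
        self.seen_clock = bus.clock   # last clock level acted upon
        self.phase = 0                # 0 = low address, 1 = high address, 2 = data
        self.addr = 0

    def poll(self):
        """Do one step per clock change; no change means nothing to do."""
        if self.bus.clock == self.seen_clock:
            return
        self.seen_clock = self.bus.clock
        if self.phase == 0:
            self.addr = self.bus.data
        elif self.phase == 1:
            self.addr |= self.bus.data << 8
        elif self.bus.control == WRITE:
            self.mem[self.addr] = self.bus.data      # latch the master's byte
        else:
            self.bus.data = self.mem[self.addr]      # slave drives the bus
        self.phase = (self.phase + 1) % 3


class Master:
    def __init__(self, bus, slave):
        self.bus = bus
        self.slave = slave  # stands in for the other chip running in parallel

    def _toggle(self):
        self.bus.clock ^= 1
        self.slave.poll()

    def write(self, addr, value):
        for byte in (addr & 0xFF, (addr >> 8) & 0xFF, value & 0xFF):
            self.bus.control = WRITE
            self.bus.data = byte
            self._toggle()

    def read(self, addr):
        for byte in (addr & 0xFF, (addr >> 8) & 0xFF):
            self.bus.control = READ
            self.bus.data = byte
            self._toggle()
        # A real master would make its data pins inputs before this edge.
        self._toggle()
        return self.bus.data


bus = Bus()
master = Master(bus, Slave(bus))
master.write(0x1234, 0xAB)
print(hex(master.read(0x1234)))
```

Because every transaction is exactly three edges, a single missed or extra edge would shift the slave's phase counter for ever after, which is exactly the "stay in step" risk raised below.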
Isn't this the way to minimize the amount of PASM required? If we get the timing right, this should just fly along.
Given the intelligent auto increment/decrement this could be faster than any static RAM.
Note that there is no way to "initialize" a transfer or otherwise reset everything to a known state. I'm trusting that Props are deterministic and the two will stay in step forever after the first clock edge. Is that too optimistic?
@Parsko: Perhaps I have a rare requirement, must have RAM but 64K is just fine, must have I/O. This just seems to fit perfectly.
Check the Propeller Supercomputer discussions a few years ago. Chip discussed clocking and multiple Prop chips.
Also see http://forums.parallax.com/showthread.php?p=582511
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
JMH
Post Edited (Quantum) : 1/28/2009 8:23:55 PM GMT
I thought of the same thing. I say go for it. If you need 64K, it seems to me like the simplest solution, because you already know how the Propeller behaves, so you don't have to learn a new chip like the ones offered above.
Those seem to fit the bill, but they have many little pins, and you would have to learn the timing.
I would not shy away from PASM, though. Transfers will be slow if you try to do them in Spin; also, Spin uses main memory, so it would be eating into your 64KB in very inconvenient places.
Has anybody loaded an assembly program and then overwritten all of the contents of hub memory? I think that would be necessary unless you want to kludge here and there to work around the hub RAM used by Spin.
Just my 10 bits input
doug
@kwinn: I guess it depends on the nature of your problem and how it can be partitioned. I'm not sure how I would do it for the problem I have, 8-bit CPU emulation. The emulator itself does not need the RAM, since everything fits in COGs, but the CPU it is emulating does!
code in this direction. IIRC, it wasn't a complete solution, but it was a start in this direction? Time to search the archives.
Personally I love this idea.
OBC
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
New to the Propeller?
Check out: Protoboard Introduction , Propeller Cookbook 1.4 & Software Index
Updates to the Cookbook are now posted to: Propeller.warrantyvoid.us
Got an SD card connected? - PropDOS
Having slept on it, I've decided that Mike's idea with 10 pins and a control byte to start is better than my multi-pin nightmare. Wouldn't you know, it uses fewer pins and is more flexible: for example, if we want to simulate those index registers in a Z80, or have 16-bit port addresses or 32-bit data... Also, the code to drive it may be simpler.
Used in an emulator, where one uses a bunch of PASM instructions to emulate each CPU instruction, interspersed with accesses to the external RAM, this could be quite speedy. The calculation of where to put a byte in RAM and actually writing it are done in parallel with the emulation code, not to mention all the index auto-increment and pre-fetching of the bytes/words they point to.
It occurs to me that in a PRAM Prop all its COGs are sitting on this bus of pins. So drivers like FullDuplexSerial or an SD card SPI interface or whatever could be modified to talk to the PRAM bus rather than through HUB RAM. In this way we get speedy access to peripherals, and ALL the 32K HUB RAM could be served up by the PRAM COG. The PRAM bus becomes a sort of HUB for external COGs (PUB?). Of course, local COGs could sit on the PRAM bus as well...