Propeller as RAM (PRAM?) — Parallax Forums

heaterheater Posts: 3,370
edited 2009-01-30 08:07 in Propeller 1
Idea: What about using Propellers as external RAMs for, well, another Propeller? Crazy or not....

My application requires 64K of RAM for data space. Any more than that is overkill. It only has 20K free in HUB. It also requires a whole bunch of pins.

Adding 64K of SRAM connected with parallel address/data lines eats my pins. Using serially connected RAMs is going to be slooowww.

But what if I used Props as external RAMs? Two of them would easily give the required 44K, and all the pins used in getting a reasonably fast parallel transfer would be won back. Plus I'd gain a whole bunch of COGs to use as peripherals, and they have enough HUB left to be useful. All in all a better 3 chip solution than 1 Prop + 2 * 32K SRAM, if a little slower.

Question: What is the fastest way to do this?
I'm thinking of a "master" prop just pushing out address high, address low, data, using one pin as a clock. Another pin selects read/write, another pin may select IO or Memory operation. The two "slave" Props respond accordingly, sucking in data or blowing out on the third clock. Provided the COGS keep in synch with that three clock sequence, which they should, all is well.

An improvement would be to skip the address cycles for sequential reads/writes, which would take another pin I think.

There we have it, 24 Cogs, 96K RAM and 50 or so I/O pins free.

Any thoughts on this?

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
For me, the past is not over yet.

Comments

  • Mike GreenMike Green Posts: 23,101
    edited 2009-01-28 17:13
    It's relatively pricey, but very simple and straightforward. It could easily be done with a minimum of 10 I/O pins leaving plenty of others for other uses. I would suggest an SPI-like protocol with a "select" line to initialize the transfer, a clock, and 8 bits for address or data. A transfer would begin with an 8-bit operation code, optionally followed by one or two bytes of address, followed by a read or write transfer. The operation code would indicate whether the transfer is a read or write and whether an address follows. There would be several address registers internally in the memory Prop that could be used repeatedly with auto-increment or auto-decrement with the number of the address register in the operation code. This would allow things like keeping a program counter and stack pointer in the "memory chips". If done properly, multiple memory chips could be addressable and several could be selected at the same time so that the internal address registers of all of them could be kept in sync, yet read and write transfers would only involve one of them.
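
A minimal sketch of the operation-code scheme described above, modeled in Python rather than PASM so the behavior is easy to follow. The bit layout of the opcode, the choice of four address registers, and the names `MemoryProp` and `transfer` are all assumptions made for illustration; the post only says that the opcode selects read or write, whether an address follows, and which auto-incrementing or auto-decrementing address register to use.

```python
# Hypothetical opcode layout (an assumption, not specified in the thread):
#   bit 7    : 1 = write, 0 = read
#   bit 6    : 1 = an address accompanies this transfer
#   bits 5-4 : step after the access: 0 = none, 1 = increment, 2 = decrement
#   bits 1-0 : which internal address register to use
WRITE, HAS_ADDR = 0x80, 0x40
STEP_NONE, STEP_INC, STEP_DEC = 0x00, 0x10, 0x20


class MemoryProp:
    """Toy model of a Prop acting as RAM with four internal address registers."""

    def __init__(self, size=32 * 1024):
        self.ram = bytearray(size)
        self.regs = [0, 0, 0, 0]

    def transfer(self, opcode, addr=None, data=None):
        """Perform one transfer; returns the byte read, or None for a write."""
        reg = opcode & 0x03
        if opcode & HAS_ADDR:
            # Load the selected register with the supplied address.
            self.regs[reg] = addr % len(self.ram)
        where = self.regs[reg]
        result = None
        if opcode & WRITE:
            self.ram[where] = data & 0xFF
        else:
            result = self.ram[where]
        # Apply the post-access step requested in the opcode.
        step = opcode & 0x30
        if step == STEP_INC:
            self.regs[reg] = (where + 1) % len(self.ram)
        elif step == STEP_DEC:
            self.regs[reg] = (where - 1) % len(self.ram)
        return result


# Usage: write two bytes sequentially, then read them back the same way.
mem = MemoryProp()
mem.transfer(WRITE | HAS_ADDR | STEP_INC, addr=0x100, data=0x3E)
mem.transfer(WRITE | STEP_INC, data=0x42)      # no address bytes needed
first = mem.transfer(HAS_ADDR | STEP_INC, addr=0x100)
second = mem.transfer(STEP_INC)
```

Only the first transfer of each run carries an address; the follow-on transfers reuse the register, which is where the time saving for program counter and stack pointer style accesses would come from.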
  • dMajodMajo Posts: 855
    edited 2009-01-28 17:16
    Take a look at this.

    This is not a direct answer to your question, but you get dual flash (1MB + 64KB) and 32KB SRAM. And ... you will get back all the I/O (and more) lost to interfacing it with the Prop. It has a built-in CPLD and DPLD: both are very easy to program and the software is free (PSDsoft Express).

    I have not yet used the PSD, but I usually use the uPSD3454EVB, which is more or less the same with a built-in 8032 10 MIPS core (free samples available).
  • parskoparsko Posts: 501
    edited 2009-01-28 17:25
    Heater,

    I could swear that this was discussed a few years back on this forum. Maybe Mike's memory is better than mine, and he'd remember/confirm? Try a search.

    -Parsko

    [edit - I did a search and could not find it. I seem to recall that the big disadvantage was that the amount of additional RAM an extra Prop adds is still below what most would want it for.]

    Post Edited (parsko) : 1/28/2009 5:46:56 PM GMT
  • heaterheater Posts: 3,370
    edited 2009-01-28 19:51
    Mike, price is not such an issue as I want at least two Props anyway. The third Prop is a bonus for extra I/O which will be used. I was hoping to get away without that operation code cycle, that takes time. How about this:

    There is an 8 bit transfer bus for addresses and data.
    There are 3 control lines.
    There is 1 clock line.

    This still leaves 48 pins free which is enough for me.

    The three control lines are coded to indicate what the operation is:

    0 = Read memory byte at given address (Random access), 3 cycles.
    1 = Write memory byte to given address. (Random access), 3 cycles.
    2 = Read I/O port at given 8 bit port number. 2 cycles.
    3 = Write I/O port at given 8 bit port number. 2 cycles.
    4 = Read a byte from internal program counter register and increment the register (Next instruction fetch), 1 cycle.
    5 = Read a byte from internal stack pointer register address and increment the register (Pop), 1 cycle.
    6 = Decrement internal stack pointer and write a byte there (Push), 1 cycle.
    7 = Read memory byte from given address and store that address + 1 in the internal register (Jump), 3 cycles.

    We need also to set the internal stack pointer but we have no code for that. So do it by writing to 2 reserved I/O ports. Setting stack is infrequent so it can be a bit slower here (4 cycles)
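
The control-code table can be written down as data to see what a given access pattern costs on the bus. This is only a sketch: the names `Op`, `CYCLES` and `bus_cycles` are made up for illustration, and the cycle counts are copied from the list above.

```python
from enum import IntEnum


class Op(IntEnum):
    """The eight operations selected by the three control lines."""
    READ_MEM = 0   # random read
    WRITE_MEM = 1  # random write
    READ_IO = 2
    WRITE_IO = 3
    FETCH = 4      # read via program counter, then increment
    POP = 5        # read via stack pointer, then increment
    PUSH = 6       # decrement stack pointer, then write
    JUMP = 7       # random read that also reloads the program counter


# Bus cycles per operation, as listed in the post.
CYCLES = {
    Op.READ_MEM: 3, Op.WRITE_MEM: 3,
    Op.READ_IO: 2, Op.WRITE_IO: 2,
    Op.FETCH: 1, Op.POP: 1, Op.PUSH: 1,
    Op.JUMP: 3,
}


def bus_cycles(ops):
    """Total 8-bit bus exchanges needed for a sequence of operations."""
    return sum(CYCLES[Op(op)] for op in ops)


# Setting the stack pointer through two reserved I/O ports: 2 + 2 cycles.
set_sp = bus_cycles([Op.WRITE_IO, Op.WRITE_IO])

# A hypothetical emulated subroutine call: fetch opcode and two operand
# bytes, push a two-byte return address, then jump.
call_cost = bus_cycles([Op.FETCH] * 3 + [Op.PUSH] * 2 + [Op.JUMP])

# The same call using only random accesses, for comparison.
naive_cost = bus_cycles([Op.READ_MEM] * 3 + [Op.WRITE_MEM] * 2 + [Op.READ_MEM])
```

Under these counts the call sequence takes 8 bus cycles with the auto-increment operations against 18 with random accesses alone, which illustrates why the single-cycle codes matter for an emulator.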

    "Cycle" above means exchange of blobs of 8 bit data or address.

    I want to use both edges of the clock !!! Such that any change in the clock means a) There is now some data on the bus to write somewhere or b) give me some data to read ASAP.

    So a random write is:
    1. Output control and low address
    2. Toggle clock
    3. Output control and high address
    4. Toggle clock
    5. Output control and data
    6. Toggle clock

    Random read is:
    1. Output control and low address
    2. Toggle clock
    3. Output control and high address
    4. Toggle clock
    5. Toggle clock
    6. Wait for sufficient time and read data.
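
The edge-driven handshake can be modeled in a few lines to check that the two sides stay in step. This is a behavioral sketch in Python, not PASM, and it assumes things the post does not specify: the class names, that the slave simply counts edges in groups of three, and that the master stops driving the bus before the third edge of a read. A write sends three bytes out (low address, high address, data); a read sends two address bytes and then lets the slave drive the data.

```python
READ_MEM, WRITE_MEM = 0, 1   # control codes 0 and 1 from the list above


class Bus:
    """Shared wires: 3 control lines, 8 data lines, 1 clock."""

    def __init__(self):
        self.control = 0
        self.data = 0
        self.clock = 0


class SlaveProp:
    """Counts clock edges; the third edge of a sequence completes the access."""

    def __init__(self, size=32 * 1024):
        self.ram = bytearray(size)
        self.phase = 0
        self.addr = 0

    def on_edge(self, bus):
        if self.phase == 0:
            self.addr = bus.data                 # low address byte
        elif self.phase == 1:
            self.addr |= bus.data << 8           # high address byte
        elif bus.control == WRITE_MEM:
            self.ram[self.addr % len(self.ram)] = bus.data
        else:
            bus.data = self.ram[self.addr % len(self.ram)]
        self.phase = (self.phase + 1) % 3


class MasterProp:
    def __init__(self, bus, slave):
        self.bus, self.slave = bus, slave

    def _clock(self, control, data=None):
        self.bus.control = control
        if data is not None:
            self.bus.data = data & 0xFF
        self.bus.clock ^= 1                      # rising or falling, both count
        self.slave.on_edge(self.bus)

    def write(self, addr, value):
        self._clock(WRITE_MEM, addr & 0xFF)
        self._clock(WRITE_MEM, addr >> 8)
        self._clock(WRITE_MEM, value)

    def read(self, addr):
        self._clock(READ_MEM, addr & 0xFF)
        self._clock(READ_MEM, addr >> 8)
        self._clock(READ_MEM)                    # slave drives the data lines
        return self.bus.data


bus = Bus()
master = MasterProp(bus, SlaveProp())
master.write(0x1234, 0xAB)
value = master.read(0x1234)
```

Note that the slave's `phase` counter is the only thing tying the two sides together. If either side ever miscounts an edge, every later byte is misinterpreted, and nothing in this scheme puts the bus back into a known state.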

    Isn't this the way to minimize the amount of PASM required? If we get the timing right this should just fly along.

    Given the intelligent auto increment/decrement this could be faster than any static RAM.

    Note that there is no way to "initialize" a transfer or otherwise reset everything to a known state. I'm trusting that Props are deterministic and the two will stay in step for ever after the first clock edge. Is that too optimistic ?


    @Parsko: Perhaps I have a rare requirement, must have RAM but 64K is just fine, must have I/O. This just seems to fit perfectly.

  • Mike HuseltonMike Huselton Posts: 746
    edited 2009-01-28 20:06
    Heater, parsko, Mike Green:

    Check the Propeller Supercomputer discussions a few years ago. Chip discussed clocking and multiple Prop chips.

    Also see http://forums.parallax.com/showthread.php?p=582511

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    JMH

    Post Edited (Quantum) : 1/28/2009 8:23:55 PM GMT
  • hinvhinv Posts: 1,255
    edited 2009-01-29 15:25
    heater,

    I thought of the same thing. I say go for it. For me, if you need 64K, it seems like the simplest solution because you already know how the Propeller behaves, so you don't have to learn a new chip like the ones offered above.
    Those seem to fit the bill, but have many little pins, and you would have to learn the timing.
    I would not shy away from PASM though. Transfers will be slow if you try to do it in spin, also, spin uses main memory, so it would be eating into your 64KB in very inconvenient places.
    Has anybody loaded an assembly, and then overwritten all of the contents of hub memory? I think that would be necessary unless you want to kludge here and there to work around the hub ram used by spin.

    Just my 10 bits input

    doug
  • kwinnkwinn Posts: 8,697
    edited 2009-01-29 19:26
    If the program needs so much ram that you want to add a second prop to provide it have you considered ways to split the program between the two props so that you can take advantage of the extra processing power? It would probably speed things up as well if you only need to pass commands and results back and forth instead of the raw data. 16 cogs, 64K of hub ram, hmmm should speed things up some.
  • heaterheater Posts: 3,370
    edited 2009-01-29 22:14
    @hinv: Me shy away from PASM? No, I would not dream of doing it any other way, and I've tried to specify the interface to use as few instructions as possible. That idea about reusing HUB memory after loading everything into COG has been discussed here quite a lot and would be very useful in my case. As it is now with the Propeller tool chain it seems quite tricky to pull off, but others here have done work on this.

    @kwinn: I guess it depends on the nature of your problem and how it can be partitioned. Not sure how I would do it for the problem I have, 8 bit CPU emulation. The emulator itself does not need the RAM, everything fits in COGs, but the CPU it is emulating does!

  • Oldbitcollector (Jeff)Oldbitcollector (Jeff) Posts: 8,091
    edited 2009-01-30 01:10
    I could also swear we've had this conversation before, and tpw may have done a little
    code in this direction. IIRC, it wasn't a complete solution, but it was a start in this
    direction? Time to search the archives.

    Personally I love this idea.

    OBC

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    New to the Propeller?

    Check out: Protoboard Introduction , Propeller Cookbook 1.4 & Software Index
    Updates to the Cookbook are now posted to: Propeller.warrantyvoid.us
    Got an SD card connected? - PropDOS
  • heaterheater Posts: 3,370
    edited 2009-01-30 08:07
    OBC: I have this sneaking suspicion that you are correct. Don't remember a thing about it. But now the idea has its own thread and name, PRAM, so we might remember next time :)

    Having slept on it, I've decided that Mike's idea with 10 pins and a control byte to start is better than my multi-pin nightmare. Wouldn't you know, it uses fewer pins and is more flexible, for example if we want to simulate those index registers in a Z80 or have 16 bit port addresses or 32 bit data... Also the code to drive it may be simpler.

    Used in an emulator, where one is using a bunch of PASM instructions to emulate a CPU instruction interspersed with accesses to the external RAM, this could be quite speedy. The calculation of where to put a byte in RAM and the actual write are done in parallel with the emulation code. Not to mention all the index auto-increment and pre-fetching of the bytes/words they point to.

    It occurs to me that in a PRAM Prop all its COGs are sitting on this bus of pins. So drivers like FullDuplexSerial or an SD card SPI interface or whatever could be modified to talk to the PRAM bus and not through HUB RAM. In this way we get speedy access to peripherals, and ALL the 32K HUB RAM could be served up by the PRAM COG. The PRAM bus becomes a sort of HUB for external COGs (PUB?). Of course local COGs could sit on the PRAM bus as well...
