
Propeller RAM

BADHABITBADHABIT Posts: 138
edited 2009-04-25 21:55 in Propeller 1
I am curious about the RAM and memory structure in the Prop. I have seen that there are SD card add-ons for it and would like to know how this changes the memory structure of the system.
From what I've read so far, the EEPROM and RAM are equal in size, and program size is limited to what the RAM can store and that's it. That has to be untrue, because there would be no need for the SD cards, unless the card holds extra objects and the RAM is for the top object file, or something else that hasn't come to mind yet. Or is it just for data collected from external apparatus (camera or sensors)?
Is it possible to use external RAM chips? If it were, then the question about 16 cogs or more RAM for the P2 would be easily solved.


Comments

  • Mike GreenMike Green Posts: 23,101
    edited 2009-04-24 00:06
    There is 32K of shared RAM and 32K of shared masked ROM that contains the display font, some math tables, and the Spin interpreter, which is loaded from ROM into each cog's 2K of RAM. There's provision for an external 32K of EEPROM on pins 28/29 which is used for loading the initial program (via a bootloader in ROM) from the 1st 32K of the EEPROM. There's no built-in provision for any other I/O.

    There are I/O routines available that can be incorporated into your programs (or built into programs written by others) that can read and write EEPROM, whether the boot EEPROM, parts of the boot EEPROM beyond the 1st 32K, or other EEPROMs attached to the Prop. There are I/O routines for reading and writing PC-compatible files on an SD card. Neither the EEPROMs nor the SD card changes the memory structure; they're just I/O devices to the Propeller. The shared RAM and ROM in fact look like an I/O device to the native (assembly) instruction set, in that they're accessed only by special instructions and are completely separate from the cogs' own RAM.
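
    For illustration, a minimal PASM sketch of those special hub-access instructions (PAR holds whatever hub address was passed to cognew; everything else operates only on the cog's own registers):

        DAT
                    org     0
        entry       rdlong  value, par        ' hub instruction: read a long from shared RAM
                    add     value, #1         ' ordinary instructions touch only cog-local RAM
                    wrlong  value, par        ' hub instruction: write the long back to shared RAM
        done        jmp     #done             ' idle; nothing here addresses hub RAM directly
        value       long    0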
  • BADHABITBADHABIT Posts: 138
    edited 2009-04-24 00:16
    So external RAM could be used for larger programs?
  • mctriviamctrivia Posts: 3,772
    edited 2009-04-24 01:06
    Definitely. See Cluso's TriBlade.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Need to make your prop design easier or secure? Get a PropMod: it has a crystal, EEPROM, and programming header in a 40-pin DIP 0.7" pitch module, with uSD reader and RTC options.
  • Mike GreenMike Green Posts: 23,101
    edited 2009-04-24 01:46
    BADHABIT,
    Not the way you're thinking about it. There is no way to make external memory an extension of the internal memory.

    On the other hand, there is a Z80 emulator written for the Prop and used to run CP/M. There's a version of it that uses external memory to store the Z80 program. There's a performance penalty for this, but the emulator runs faster than the original Z80 systems, so it's very usable.

    There's another interpreter for a special form of the Prop's instruction set which is called the Large Memory Model (LMM). It's partially interpreted and only a few times slower than the native instruction set. Several people have proposed running an LMM interpreter with the LMM code in an external RAM. Like the Z80 emulator/interpreter, there would be a performance hit for accessing the external RAM, but it may be reasonable for real-world programs, particularly when you can load small pieces of the program into the cog's memory for full speed execution (like small tight loops).
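
    A minimal sketch of what the core of such an LMM interpreter looks like (this is the well-known fetch-execute loop; real kernels unroll it for speed):

        lmm_loop    rdlong  instr, pc         ' fetch the next native instruction from hub RAM
                    add     pc, #4            ' advance the LMM program counter
        instr       nop                       ' the fetched instruction executes in place here
                    jmp     #lmm_loop
        pc          long    0                 ' hub address of the LMM code

    Jumps inside the LMM code can't be ordinary cog jumps; they work by loading a new value into pc, which is part of the overhead compared to native code.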
  • jazzedjazzed Posts: 11,803
    edited 2009-04-24 04:15
    I'm running LMM code straight from external memory. Two problems exist: speed and density.

    If the code can be cached (and there is a mechanism for doing that), the performance would not be too terrible. As it stands today, uncached with a byte-wide memory, I'm getting about 800K LMM instructions per second with fairly optimized fetch code and hardware. Hardware optimized for speed can, however, do little else except execute instructions and communicate via the serial port.

    To get a more I/O-friendly external design using one Propeller, one must sacrifice fetch speed by using address latch mechanisms. Having to constantly switch the pin directions costs at least 8 clock cycles (100ns at 80MHz). Unfortunately the story does not end with that delay, since addresses must also be moved and latched. This requires masking and shifting and bit setting, yada yada yada (a rough sketch of that bus turnaround is at the end of this post).

    Using two+ Propellers for the problem helps, but then you have latency associated with peripherals being attached to the non-code-executing Propeller. A design where a parallel bus is used for communication between Props is possible, but it limits memory from 2MB to say 1.5MB. I have such a design documented and somewhat in progress ... a hardware prototype with limited features based on the PropRPM board has been finished since February, and XMM code has been running for over a month ... I call it "iiProp".

    I've looked at using 2 Propellers working in tandem to solve the bus interface requirements, but someone smarter than me will have to figure that out. The problem is how to split up the task. Obviously one Propeller must execute the instructions, so if the other Propeller magically knew the address to set for the executor to fetch, I'd be in business. One could say with 8 bits what relative address to set, or with 4 transactions what absolute address to set, but this is really no better than just using memory attached to one Propeller.

    My latest study is back to focusing on DRAM (specifically SDRAM) like I did last year. This time, though, I'll be using some optimizations to reduce access times. DRAM densities and required pin count are too good to ignore. Hopefully I'll get back to the cache business soon so that access speed is less critical. Meanwhile ... so much to do, so little time.
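
    A rough sketch of the bus turnaround mentioned above (pin assignments are invented for the example: P0..P7 as a shared address/data bus, P8 as the address latch enable):

                    mov     temp, addr        ' take the low 8 address bits ...
                    and     temp, #$FF        ' ... mask them ...
                    andn    outa, #$FF        ' ... clear the bus pins ...
                    or      outa, temp        ' ... and merge the address onto them
                    or      dira, #$FF        ' drive P0..P7 as outputs
                    or      outa, latch_en    ' pulse the address latch enable on P8
                    andn    outa, latch_en
                    andn    dira, #$FF        ' turn the bus around: P0..P7 back to inputs
                    mov     data, ina         ' read the data byte
                    and     data, #$FF
        latch_en    long    |< 8              ' P8 mask
        addr        long    0
        temp        long    0
        data        long    0

    At 4 clocks per instruction that is already around half a microsecond per byte at 80MHz, before any read strobe or handshaking.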

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    --Steve


    Propalyzer: Propeller PC Logic Analyzer
    http://forums.parallax.com/showthread.php?p=788230
  • BADHABITBADHABIT Posts: 138
    edited 2009-04-24 07:03
    With multiple Props, would it be possible to use one chip at a time for processing, then have the other(s) load their RAM and wait in a queue to be the next processor when the first chip finished?
  • StefanL38StefanL38 Posts: 2,292
    edited 2009-04-24 12:38
    hello,

    It is not really clear what you are trying to do.

    Distributing single method calls would slow down the system, because you have overhead for all the communication between the Props.

    Maybe you could write what you want to do IN THE END.

    It's ALWAYS the same: as soon as people describe their REAL project in DETAIL, new and elegant solutions can be found.

    Otherwise you are leading the forum members in your "three-circles-around-from-the-back-through-the-eye-to-the-heart" direction, which might be much more complicated than other solutions.

    best regards

    Stefan
  • Mike GreenMike Green Posts: 23,101
    edited 2009-04-24 14:02
    It's possible to do all kinds of things. Whether it's practical, affordable, reliable, or reasonable is another question entirely. Whether it does what you really wanted but didn't ask for is unlikely.
  • BADHABITBADHABIT Posts: 138
    edited 2009-04-25 05:38
    This is purely hypothetical at this point, just to help me get a better idea about how extended memory works and the possibilities for size and speed, to use as a future reference for systems that may need that sort of thing.
  • Rick PriceRick Price Posts: 36
    edited 2009-04-25 20:20
    Have you considered doing some sort of pipelining where once you latch the address you are committed to loading a fixed number of bytes using a clock signal? You would be able to use fewer address lines, and would not have to change the direction of the bits as frequently.
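
    For illustration, the Propeller-side inner loop of such a committed burst might look roughly like this (hypothetical setup: address already latched, P0..P7 carry data, P8 clocks the external address counter, burst length fixed at 16 bytes):

                    mov     count, #16        ' burst length committed when the address was latched
        burst       or      outa, clk_pin     ' clock the external address counter
                    andn    outa, clk_pin
                    mov     bdata, ina        ' grab the byte presented on P0..P7
                    and     bdata, #$FF
                    wrbyte  bdata, hubptr     ' stash it in hub RAM
                    add     hubptr, #1
                    djnz    count, #burst     ' next byte, no new address phase needed
        clk_pin     long    |< 8
        count       long    0
        bdata       long    0
        hubptr      long    0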

    Rick


  • jazzedjazzed Posts: 11,803
    edited 2009-04-25 21:55
    Rick Price said...

    Have you considered doing some sort of pipelining where once you latch the address you are committed to loading a fixed number of bytes using a clock signal? You would be able to use fewer address lines, and would not have to change the direction of the bits as frequently.

    Funny you mention that, Rick :) It's supposed to be a secret though ... shhhh.

    I've spent the last couple of days looking at implementing a synchronous design that would be more Propeller-friendly than stock SDRAM. I plan to use an "Access Packet" format to communicate the transaction to a CPLD which will act as a memory controller, and I have written a PASM module along that line (a purely hypothetical packet layout is sketched at the end of this post).

    My initial test bed will be the HX512K board, but I intend to have a higher performance, higher density custom platform board.

    The problem with the HX512K now is that it focuses more on block transfers ... but even that is geologically slow. Of course, having the target platform put the data bits in a weird place that requires at least 4 extra instructions doesn't help ... an old floppy drive cable could fix that :) I'm looking at rewriting the CPLD ABEL-HDL code ... which I've actually used before :) ... to support the faster synchronous random-access paradigm. Using a CPLD also fixes byte-only transactions on a 16-bit bus.

    Having someone else code up the CPLD would speed things up. I have to go back and refresh a bit even with ABEL.

    My projected numbers are as follows:

    Modified HX512K CPLD and standard 8 bit I/O ... Other Propeller pins in Hydra config:
    • Write32 ... 700ns .... 1.4MB/s ... 4 bytes
    • Read32 ... 1200ns ... 800KB/s ... 4 bytes
    • Write8 ... 1000ns ... 1.0MB/s ... 1 byte
    • Write8 ... 4600ns ... 2.1MB/s ... 10 bytes
    • Read8 ... 1200ns ... 800KB/s ... 1 byte
    • Read8 ... 5450ns ... 1.8MB/s ... 10 bytes
    Custom 16 bit I/O FAB ... 11 free Propeller pins for TV, KB, Mouse, SDIO, 4MB+ SRAM.
    (SDRAM times would be different and require different CPLD logic of course):
    • Write32 ... 500ns .... 2.0MB/s ... 4 bytes
    • Read32 ... 450ns ... 2.2MB/s ... 4 bytes
    • Write8 ... 700ns ... 1.4MB/s ... 1 byte
    • Write8 ... 2050ns ... 4.8MB/s ... 10 bytes
    • Read8 ... 1075ns ... 930KB/s ... 1 byte
    • Read8 ... 4900ns ... 2.0MB/s ... 10 bytes
    Just for comparison, a 4 byte "copy" from SRAM with HX512K today takes about 9us as far as I can tell.
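
    To make the "Access Packet" idea above concrete, one purely hypothetical layout (not the actual HX512K or iiProp format) would pack the command, burst length, and address into a single long that the PASM driver shifts out to the CPLD:

        ' hypothetical packet: bits 31..30 = command (00 read8, 01 write8, 10 read32, 11 write32),
        '                      bits 29..24 = burst length - 1, bits 23..0 = byte address
                    mov     packet, xaddr     ' 24-bit external address
                    and     packet, addr_msk
                    or      packet, cmd_len   ' merge in the command and burst length fields
        addr_msk    long    $00FF_FFFF
        cmd_len     long    %01_000000 << 24  ' e.g. write8, burst length 1
        packet      long    0
        xaddr       long    0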

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    --Steve


    Propalyzer: Propeller PC Logic Analyzer
    http://forums.parallax.com/showthread.php?p=788230