Propeller External Memory Hardware Survey

Dr_Acula · 2011-10-02 17:11

Yes I know there are other solutions that involve an SD card but I'd rather have a mechanism that doesn't require moving the SD card from the PC to the RamBlade and back for every code recompile.

That is a problem I have come up against too with Catalina and it is going to be the same with GCC - once you have a program that is megabytes in size, downloading via a 115200 serial link gets very tedious.
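
To put a rough number on "tedious", here is a minimal C sketch of the arithmetic. It assumes plain 8N1 framing (10 bits on the wire per byte) and ignores any loader protocol overhead; the function name is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Rough lower bound on transfer time over an async serial link.
 * Assumption: 8N1 framing, i.e. 1 start + 8 data + 1 stop = 10 bits per byte.
 * Loader handshaking is ignored, so a real download takes longer. */
uint32_t serial_download_seconds(uint32_t bytes, uint32_t baud)
{
    assert(baud >= 10);
    uint32_t bytes_per_second = baud / 10;   /* 115200 baud -> 11520 bytes/s */
    /* Round up so a partial second still counts. */
    return (bytes + bytes_per_second - 1) / bytes_per_second;
}
```

Under those assumptions a 1 MB image needs about a minute and a half per download at 115200 baud, before any protocol overhead.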

I have some boards on the way that use a USB-to-SD adaptor. I got a whole lot of these cheap on eBay; the case pops off easily, so you can solder the board onto your own PCB. A 4PDT switch then lets you switch the SD card between the Prop and the adaptor. Downloads to an SD card via USB are much faster than via serial.

Maybe there are other answers too.

RossH · 2011-10-02 17:37

David Betz wrote: »

Yes I know there are other solutions that involve an SD card but I'd rather have a mechanism that doesn't require moving the SD card from the PC to the RamBlade and back for every code recompile.

Hi David,

If you are in a load/test/fix/compile cycle (which I spend a lot of time doing!) then using an SD card is by far the fastest and easiest way to do it. It's much faster even than using the Parallax Spin tool to compile and serially load ordinary Spin programs - and you don't need to worry about messing about with cables or serial ports.

Ross.

David Betz · 2011-10-02 17:48

Well, that may be true but I don't think it's going to be good news to the commercial developers we're hoping to attract. It's a much more cumbersome way of developing code than what is used on other systems.

Luis Digital · 2011-10-02 18:18

Hi,

The Atmel AT45DB*** flash memories are well priced and very common. They use an SPI bus and are easy to use.

Cluso99 · 2011-10-02 19:30

RamBlade2 moves the serial pins back to P30/31 for standard loading. It is 1 cycle slower. Currently this is built using my 1" sq stackable boards and has an RTC too. Latches could be interposed if required.

RamBlade3 is designed on a commercial pcb. I have not yet produced it in a standalone pcb. This is the same as RamBlade2 with the RTC/battery plus the ability to connect to my BaseBlade series.

TriBlade was too expensive for the pcbs. However, I am reconsidering building a new pcb for this which should cost less by removing some of the parts that were never used.

My TriBlade and RamBlades were/are designed to be multiple prop solutions. The Tri/RamBlade was designed to be the fast workhorse and the second prop to be the I/O controller. These can all be downloaded at runtime from another prop. Downloading code via the serial pins to the microSD is only a fairly simple program compared with what has been done with these boards - it is just that it has not been done.

There is of course an issue that all these solutions raise, and stated by a few above. We are making the prop act like other processors. But, the point is that the prop does this nicely and we can keep the simplicity of the prop and the multicore solutions with these solutions. So they are not wasted designs.

Dr_Acula · 2011-10-03 03:09

Schematic and code for a single latch external memory solution. Hopefully I included all the relevant bits of code. Many thanks in advance for considering this hardware option.

'Read a "len" byte block given by "sram_address" into hub at "hubaddr"  - Cluso99's ram driver
' preserves sram_address, destroys len and hubaddr
' uses t1,t2
rdblock               
                        mov             t1,sram_address         ' store sram address
                        shl             t1,#8                   ' shift left so in the right position
                        or              t1,oerd                 ' %00000000_01010000_00000000_00000000 ' /oe and /rd low 
                        ' outa pre-filled with the address shifted over by 8 and the /oe and /rd low
                        mov             outa,t1                 ' send it out
                        nop                                     ' this delay is important otherwise misses first bytes   
rdloop                  mov             t2, ina                 ' read byte from SRAM \ ignores upper bits
                        wrbyte          t2, hubaddr             ' copy byte to hub    /
                        add             hubaddr, #1             ' inc hub pointer
                        add             outa, #(1 << 8)         ' inc sram address
                        djnz            len, #rdloop            ' loop for xxx bytes
                        mov             outa,oewrrd             ' oe,wr and rd and latch all high
rdblock_ret             ret


' Change latch. Pass latchcounter, uses t1

selectlatch             mov              dira,blockmask ' enable different pins to reading, ie P0-7 are outputs now
                        mov              t1,latchcounter           ' get the latch counter
                        and              t1,#%01111111              ' mask to 7 bits: 512K / 4096-byte blocks = 128 blocks
                        mov              outa,t1                    ' output blockcount byte
                        and              outa,latchlow              ' set latch low
                        nop                                         ' delay not needed but added it anyway
                        nop
                        or               outa,latchhigh             ' set latch high
                        mov              dira,dira_pins             ' enable these pins for output  %00000000_11111111_11111111_00000000
                        mov              outa,oewrrd                ' set oe,wr,rd and latch high
selectlatch_ret         ret
                      
' write a "len" byte block from hub at hubaddr to sram_address
' destroys sram_address,hubaddr and len. Uses t1 and t2

wrblock
                        mov             dira,pin0to23            ' dira pins %00000000_11111111_11111111_11111111                   
wrloop                  mov             t1,sram_address          ' store sram address in t1
                        shl             t1,#8                    ' shift left so in the right position
                        rdbyte          t2,hubaddr               ' get the byte
                        or              t1,oewr                  ' or with %00000000_00110000_00000000_00000000
                        or              t1,t2                    ' or in the data byte read from hub
                        mov             outa,t1                  ' send it out
                        nop                                      ' wait a little
                        nop
                        or              outa,oewrrd              ' oe,wr and rd all high
                        add             hubaddr,#1                    ' add 1 to hub address
                        add             sram_address,#1          ' add 1 to sram address
                        djnz            len, #wrloop             ' loop for xxx bytes
                        mov             dira,dira_pins          ' enable these pins for output  %00000000_11111111_11111111_00000000 
                        mov             outa,oewrrd              ' oe,wr and rd and latch all high
wrblock_ret             ret

' Initialised data
'address_mask    long %00000000_00000000_00011111_11111111                     ' 13 bits max 32 lines otherwise need to latch
oewrrd          long %00000000_11110000_00000000_00000000                     ' 23 = oe, 22=wr, 21=rd, latch high
oerd            long %00000000_01010000_00000000_00000000                     ' oe and rd low
oewr            long %00000000_00110000_00000000_00000000                     ' oe and wr low
dira_pins       long %00000000_11111111_11111111_00000000                     ' direction pins to be enabled                         
pin0to23        long %00000000_11111111_11111111_11111111                     ' for testing
zero            long %00000000_00000000_00000000_00000000                     ' for testing
'smallblock      long 1024
fivetwelve      long %00000000_00000000_00000010_00000000                       ' 512 for sram increment
blockmask       long %00000000_00010000_00000000_11111111                       ' mask for block select
latchlow        long %11111111_11101111_11111111_11111111                       ' mask for P20 latch low
latchhigh       long %00000000_00010000_00000000_00000000                       ' mask for P20 latch high
latchcounter    long 0  
t1              long 0        ' local variable
t2              long 0        ' local variable
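
For readers more at home in C than PASM, the bus words that the listing builds can be modelled on a PC as below. This is only a sketch: the pin layout (P0-7 data, P8-19 address, P20 latch, P21 /rd, P22 /wr, P23 /oe) is inferred from the masks and comments in the listing, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Control masks copied from the "Initialised data" section of the listing. */
#define OEWRRD 0x00F00000u  /* /oe, /wr, /rd and latch all high (idle)  */
#define OERD   0x00500000u  /* /oe and /rd low, /wr and latch high      */
#define OEWR   0x00300000u  /* /oe and /wr low, /rd and latch high      */

/* OUTA value for reading one byte: address on P8-19, read strobes low. */
uint32_t read_bus_word(uint32_t addr_in_block)
{
    assert(addr_in_block < 4096);        /* 12 address pins, by assumption */
    return (addr_in_block << 8) | OERD;
}

/* OUTA value for writing one byte: address on P8-19, data on P0-7. */
uint32_t write_bus_word(uint32_t addr_in_block, uint8_t data)
{
    assert(addr_in_block < 4096);
    return (addr_in_block << 8) | OEWR | data;
}

/* Split a flat 512 KB address into the 7-bit latch value and the
 * 12-bit address inside the selected 4096-byte block. */
void split_address(uint32_t flat, uint32_t *latch, uint32_t *addr_in_block)
{
    *latch = (flat >> 12) & 0x7Fu;
    *addr_in_block = flat & 0xFFFu;
}
```

If the pin assumption holds, a block transfer must not run past a 4096-byte boundary, because incrementing the address beyond P19 would carry into the latch pin.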

David Betz · 2011-10-03 06:17

Cluso99 wrote: »

RamBlade2 moves the serial pins back to P30/31 for standard loading. It is 1 cycle slower. Currently this is built using my 1" sq stackable boards and has an RTC too. Latches could be interposed if required.

Are you saying that RamBlade2 is no longer a single module but a stack of your 1" modules? How many modules are required to make the equivalent of a RamBlade1?

David Betz · 2011-10-03 07:04

Cluso99: could you post the URL to the web site that describes your new modules? The URL in your signature just talks about old boards like the TriBlade and RamBlade1.

Cluso99 · 2011-10-03 13:40

David: www.clusos.com/home is my new web address. However, RamBlade2 and 3 are not really mentioned there in the strictest sense.

RamBlade2 is made with either 2 or 3 1"sq pcbs (or a BaseBlade1 plus 1 or 2 1"sq pcbs). They are: CpuBlade (Prop/xtal/eeprom/3v3reg/resetcct); MemBlade (SRAM/microSD/3v3reg); ClockBlade (RTC/battery). So, the equivalent RamBlade (still available) is 2 modules (Cpu + Mem). And of course, they will stack onto a BaseBlade1 to give an extra Prop chip with expansion.

David Betz · 2011-10-03 13:54

Thanks for the updated link. You might want to change your signature to include it as well. Once I get my RamBlade1 up and running I may send in an order for the RamBlade2 stack. Only two boards to make up the same functionality as the RamBlade1 seems reasonable.

prof_braino · 2011-10-04 08:51

jazzed wrote: »

Please help determine which external memory hardware solutions should be added to the initial Propeller GCC release. Hardware developers can easily add their own solution, but we need a minimum set for testing. Please just mention the hardware you have in a reply.

Don't know if this is of interest, but in PropForth we use the SD card as external RAM. Is this possible in GCC or would it be something to consider including? Granted the serial transfer is not the fastest, but it's still pretty fast and can be handy to have in some applications. It can be quite snappy with a minimal driver.

David Betz · 2011-10-04 09:01

prof_braino wrote: »

Don't know if this is of interest, but in PropForth we use the SD card as external RAM. Is this possible in GCC or would it be something to consider including? Granted the serial transfer is not the fastest, but it's still pretty fast and can be handy to have in some applications. It can be quite snappy with a minimal driver.

We'd like to try that. The main problem I see is that we would have to maintain a cluster map of the executable image so that the XMM cache driver could find all of the sectors. Either that or we'd have to require that the executable files be contiguous on the SD card. How do you handle that with PropForth?
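
The two options can be sketched in C as follows. This is an illustration only, not the Propeller GCC cache driver: the function names, the 512-byte sector size and the shape of the cluster table are all assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512u

/* Option 1: the image file is contiguous, so one addition is enough. */
uint32_t sector_contiguous(uint32_t first_sector, uint32_t image_offset)
{
    return first_sector + image_offset / SECTOR_SIZE;
}

/* Option 2: the image may be fragmented. The loader builds a table holding
 * the first sector of every cluster of the file, in file order, and the
 * driver indexes into it on each cache miss. */
uint32_t sector_from_cluster_map(const uint32_t *cluster_first_sector,
                                 uint32_t cluster_count,
                                 uint32_t sectors_per_cluster,
                                 uint32_t image_offset)
{
    assert(sectors_per_cluster > 0);
    uint32_t sector_index  = image_offset / SECTOR_SIZE;
    uint32_t cluster_index = sector_index / sectors_per_cluster;
    assert(cluster_index < cluster_count);
    return cluster_first_sector[cluster_index]
         + sector_index % sectors_per_cluster;
}
```

The cluster map costs one table entry per cluster of the image, which is the memory overhead the contiguous-file requirement avoids.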

jazzed · 2011-10-04 20:48

Dr_Acula wrote: »

Schematic and code for a single latch external memory solution. Hopefully I included all the relevant bits of code. Many thanks in advance for considering this hardware option.

Have you considered using CTRA + WAITVID to generate the address? That could be used in conjunction with CTRB edge detect mode to increment the hub pointer. Doing this would allow your SRAM to be as fast as any of the synchronous solutions - I.E. 5MB/s at 80MHz.
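
The 5MB/s figure follows from the Propeller's hub timing: each cog gets a hub access slot once every 16 system clocks, so moving one byte per slot at 80MHz gives 80,000,000 / 16 bytes per second. A small C sketch of that arithmetic (the function name is hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Propeller 1: each cog gets a hub access slot every 16 system clocks. */
#define HUB_WINDOW_CLOCKS 16u

/* Bytes per second when one byte reaches hub every `windows_per_byte` slots. */
uint32_t hub_bytes_per_second(uint32_t clock_hz, uint32_t windows_per_byte)
{
    assert(windows_per_byte > 0);
    return clock_hz / (HUB_WINDOW_CLOCKS * windows_per_byte);
}
```

With one byte per slot this gives 5,000,000 bytes/s at 80MHz. A loop that is too long to catch every slot, which is probably the case for the five-instruction rdloop posted earlier, would land on every second slot and halve that.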

I'm hoping I can put together some instructions on using an external memory COG with Propeller GCC tomorrow.

prof_braino · 2011-10-05 04:47

David Betz wrote: »

.. maintain a cluster map .... Either that or we'd have to require that the executable files be contiguous on the SD card. How do you handle that with PropForth?

The mechanisms for both EEPROM and the SD are similar. Both are a single long list of "pages". By default, the only thing maintained is a list of "file name" and the length of each in "pages". In the simplest form, there is no support for delete or resize; this keeps the driver simple and fast. To delete or resize, we reinitialize or reformat the device. The idea is that the memory is internal to the kernel. While this is a bit different from what we'd likely see on a workstation, it can make sense on a micro.

The EEPROM does allow for something like "delete" files, but it is "drop": it just drops the name and length from the list; the storage is not erased or recovered until reformat.

More complex operations like file delete, resize, and FAT compatibility can be added as extensions, but these have been intentionally left out in the interest of speed and small memory footprint.

The rule of thumb is smallest, minimum functionality to get the basic job done; in this case the job is store and retrieve. No extras, those are for later if needed.
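
A flat "name plus length in pages" list of this kind could look roughly like the C sketch below. This is not PropForth's actual on-media format; the struct layout, the 16-byte name field and the function name are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define NAME_LEN 16

/* One directory entry: a name and a length in pages, nothing else. */
struct dir_entry {
    char     name[NAME_LEN];   /* NUL-terminated file name        */
    uint32_t pages;            /* length of the file in pages     */
};

/* Files are stored back to back, so a file's first page is the sum of
 * the lengths of all entries before it. Returns 1 if found, else 0. */
int find_file(const struct dir_entry *dir, uint32_t count,
              const char *name, uint32_t *first_page, uint32_t *pages)
{
    assert(dir && name && first_page && pages);
    uint32_t next_page = 0;
    for (uint32_t i = 0; i < count; i++) {
        if (strncmp(dir[i].name, name, NAME_LEN) == 0) {
            *first_page = next_page;
            *pages = dir[i].pages;
            return 1;
        }
        next_page += dir[i].pages;   /* skipped entries still occupy pages */
    }
    return 0;
}
```

With no allocation table to walk, a lookup is a single pass over a short list, which is where the speed and small footprint come from.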

David Betz · 2011-10-05 06:46

I see. You're not using a standard FAT filesystem on the SD card. Do you have a utility that runs on a PC that can read/write this format?

prof_braino · 2011-10-06 08:53

David Betz wrote: »

I see. You're not using a standard FAT filesystem on the SD card. Do you have a utility that runs on a PC that can read/write this format?

No, the idea is that the SD is used as RAM, it is internal only, and not designed to be accessed by the PC. Much like the way flash on an iPod is internal only, it can only be accessed via the iPod provided interface. This is how it stays fast and small.

Of course an extension can be written to support FAT, and a PC utility can be written to access it, but this has not been needed so far.

If the design requires FAT/PC compatibility, that presents conflicts with the RAM/speed requirements for the prop implementation. The raw SD speed is reasonable in comparison to the prop for certain applications if the overhead is kept to a minimum. But this can quickly get bogged down if the design calls for the prop to behave as a workstation. Just something to consider if you want to pursue this option. It's simple and cheap and big, but much slower than DRAM, so not a general replacement, just an option.

jazzed · 2011-10-06 09:02

prof_braino wrote: »

No, the idea is that the SD is used as RAM ...

I consider this an "apparent solution" ... that is it apparently works and there are no guarantees.

How long will it work?
How do the blocks get accessed?
How big are the memory chunks that are being used?
How big are the programs that are being stored?

It would be far better to use the device as it is meant to be used to avoid grief down the road.

Phil Pilgrim (PhiPi) · 2011-10-06 09:28

jazzed wrote: »

It would be far better to use the device as it is meant to be used to avoid grief down the road.

Beg pardon, but I'm not sure prof_braino's use of it is in conflict with how it was "meant" to be used -- only with how Microsoft meant it to be used, since the FAT filesystem is their invention. An SD card is just a piece of hardware with many different -- and perfectly legitimate -- ways to use it. Prof_braino has created a minimalist filesystem which joins the ranks of hundreds of other filesystems out there. Just because it won't mount under Windows is no reason to dismiss it. Apple's HFS won't mount under Windows, either, without special drivers.

-Phil

jazzed · 2011-10-06 09:39

Phil Pilgrim (PhiPi) wrote: »

Beg pardon, but I'm not sure prof_braino's use of it is in conflict with how it was "meant" to be used -- only with how Microsoft meant it to be used, since the FAT filesystem is their invention. An SD card is just a piece of hardware with many different -- and perfectly legitimate -- ways to use it. Prof_braino has created a minimalist filesystem which joins the ranks of hundreds of other filesystems out there. Just because it won't mount under Windows is no reason to dismiss it. Apple's HFS won't mount under Windows, either, without special drivers.

-Phil

Personally, I would rather use ext3 but that's a little heavy.

What I meant by "meant to be used" is that it should be used in the context of a file system. You're asking for trouble otherwise.

Cluso99 · 2011-10-06 12:04

Basically SD memory is just SPI Flash with a little processor chip taking care of wear levelling. So really, it is no different to using a normal SPI Flash.

You can reserve a chunk of memory on the SD for using as RAM and access it directly by sector number. Provided it is contiguous, it may even be a file under FAT. This is how we emulate the 8MB disks on CPM.

So, with respect, it is a sufficiently valid use.

jazzed · 2011-10-06 12:20

Cluso99 wrote: »

So, with respect, it is a sufficiently valid use.

That is very interesting. Do you have any MTBF numbers with this model?
How many RAM read/writes does it take before the product stops working?
Would you expect a product from institutional professionals to support this?

I can see this solution being offered by/for small organizations (or hobbyists) that don't mind risk.

Cluso99 · 2011-10-06 22:04

jazzed: The SD cards all use flash memory! In fact, it is designed to be read/written perhaps more so than the SPI Flash which is usually just for boot loading. If you require mtbf go to the reputable SD card suppliers. I do not know what flash chips they are using, and it is possible they are even parallel, not serial. Not that it matters.

Tor · 2011-10-07 01:07

We can already tell that an SD card can survive a _lot_ of read/writes (due to its built-in wear-levelling handling). This is precisely because they're normally used as FAT filesystems.. FAT is incredibly inefficient in its directory update handling. According to an article I read on Wikipedia some years ago a simple update of a few files can easily result in a hundred thousand file accesses to the filesystem. That article (which I can't find in its original version right now - it's been rewritten since) was a discussion which explained why FAT wouldn't be possible on flash memory without wear levelling.

-Tor

Heater. · 2011-10-07 02:20

This debate about the longevity of SD cards when used for a lot of writes makes me crazy. It is always stated that:
1) SD is based on FLASH memory and therefore is subject to failure when blocks have exceeded their permitted number of write cycles.
2) SD cards have some processor on board performing some magical "wear leveling" such that if you think you are writing to the same block repeatedly you will actually be writing to different physical blocks, perhaps using some hidden blocks or shuffling used blocks for unused blocks. The implication being that an SD card can take a lot more write cycles than the naked physical flash blocks of which it is built.

Problem is that after searching around on a number of occasions I can't pin down a definitive statement about:
1) How many times a physical block can be rewritten.
2) How the wear leveling works and how well does it save you from premature failure.

Then there is talk about the wear leveling swapping to unused blocks to save wear on heavily used blocks. Well how on earth can that processor on the SD know that any block is now unused by my file system? It can't unless I tell it which we generally don't. I believe Linux can do this now.

What this boils down to is that if you are using SD as a file store you have no way of predicting its lifetime when subject to write activity.

When using SD as RAM, which implies to me a lot of continuous write activity, you are living on the edge.

Does anyone have any figures on this or done any experiments?

What happens if I repeatedly rewrite block zero on an SD card as fast as possible from a Prop? Does it last a day, a week, a month? Who knows.

Time for an experiment...

Dr_Acula · 2011-10-07 02:46

jazzed wrote: »

Have you considered using CTRA + WAITVID to generate the address?

No I hadn't thought of that - it looks an interesting solution - I'll check it out.

Re the memory driver, I'm racing off to my breadboard shortly to try a slightly different design so no hurry or rush adding that design I posted earlier as it may change in the next week or two. Though in general terms it will still be a SRAM and latch(es).

Tor · 2011-10-07 03:40

Heater. wrote: »

1) How many times a physical block can be rewritten.

I've seen 100,000 rewrites stated. Sometimes the number is (much) higher, and (particularly in the past) lower.

2) How the wear leveling works and how well does it save you from premature failure.

Then there is talk about the wear leveling swapping to unused blocks to save wear on heavily used blocks. Well how on earth can that processor on the SD know that any block is now unused by my file system? It can't unless I tell it which we generally don't. I believe Linux can do this now.

afaik it's much simpler than that. It simply keeps track of which blocks have been written to, then when the next write occurs it writes to the next least written-to block. A kind of round-robin scheme. It doesn't have to know anything about the filesystem used (or not used, as it may be). In addition to that there is a small pool of spare blocks, so that if a write to a block fails it can replace that one with a spare block.
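
A toy model of that "least written-to block" idea, in C. Real SD controllers are proprietary and certainly more involved; the block counts, struct and function names here are invented for the sketch.

```c
#include <assert.h>
#include <stdint.h>

#define PHYS_BLOCKS    8      /* physical flash blocks in the toy device  */
#define LOGICAL_BLOCKS 4      /* blocks the host can address              */
#define UNMAPPED       0xFFu

struct flash_model {
    uint32_t writes[PHYS_BLOCKS];   /* write count per physical block     */
    uint8_t  in_use[PHYS_BLOCKS];   /* 1 if it holds a live logical block */
    uint8_t  map[LOGICAL_BLOCKS];   /* logical -> physical mapping        */
};

void flash_init(struct flash_model *f)
{
    for (int p = 0; p < PHYS_BLOCKS; p++) { f->writes[p] = 0; f->in_use[p] = 0; }
    for (int l = 0; l < LOGICAL_BLOCKS; l++) f->map[l] = UNMAPPED;
}

/* Write a logical block: take the free physical block with the fewest
 * writes, remap the logical block to it, and release the old copy.
 * Returns the physical block chosen. */
int flash_write(struct flash_model *f, int logical)
{
    assert(logical >= 0 && logical < LOGICAL_BLOCKS);
    int best = -1;
    for (int p = 0; p < PHYS_BLOCKS; p++) {
        if (!f->in_use[p] && (best < 0 || f->writes[p] < f->writes[best]))
            best = p;
    }
    assert(best >= 0);   /* holds because PHYS_BLOCKS > LOGICAL_BLOCKS */
    if (f->map[logical] != UNMAPPED)
        f->in_use[f->map[logical]] = 0;
    f->map[logical] = (uint8_t)best;
    f->in_use[best] = 1;
    f->writes[best]++;
    return best;
}
```

In this model, rewriting one logical block 800 times puts 100 writes on each of the 8 physical blocks rather than 800 on one. It also shows the limit raised above: blocks holding data that never changes stay in use, so only the remaining free blocks share the wear.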

What this boils down to is that if you are using SD as a file store you have no way of predicting its lifetime when subject to write activity.

Probably true, but still, it's very rare to hear about an SD card breaking down due to worn-out blocks. It's nearly unheard of. The vast majority of SD card write failures are due to other issues. Many new high-capacity cards, for example, used to fail in some devices because they use the control bits (signal pins) in a subtly different way. When the Nokia N900 came out a few years ago there were, after a firmware upgrade, suddenly some reports about some cards failing while others worked fine. Only due to a lot of reports was the cause finally nailed down: it only happened if a write was done shortly before the N900 went into ARM power save mode, i.e. it dropped the voltage from 3v to 1.8v (or off - possibly off, when I think about it). It turned out this was not safe to do until a certain bit on the SD card had changed, although that was not how it's described in the SD technical documentation (found at the SD org website) at all. A bit like turning off the disk power while there's still data in its internal RAM cache.

I'm not sure if I've ever heard about a worn-out SD card except for lab test setups where you write to the card constantly until it fails. There have been warnings about not using SD cards for swap space, for example, or log file systems, due to all the writes. But in practice it works fine.

-Tor

Cluso99 · 2011-10-07 03:58

From what I understand, the comments are in respect to using SD as external RAM as in LMM type situations. Is this the correct interpretation? Because if so, SD usage should be fine. If however, it was used as a screen buffer with game programs (i.e. continually being changed) then that would be a different thing altogether.

So perhaps some clarification is in order???

Heater. · 2011-10-07 06:39

Yep depends what you mean by "RAM".

For example, currently if you put ZPU code out in external memory for Zog to run then the stack and variables will be there too. Oops, you are then giving your FLASH a serious thrashing. David Betz on the other hand has an implementation that puts the code out in external RAM and the stack and variables stay in HUB. OK, no FLASH beating anymore.

AntoineDoinel · 2011-10-07 07:29

What about this scenario...

suppose you want 1MB XMEM (2048 x 512B pages => 11bit sector address):

- reserve some 1GB out of a 2GB card (1GB = 21bits sector address)
- every time the V-RAM is restarted (system reset or new app loading) a 21bit TRUE random number is generated
- on every block access request, read or write, the 11bit sector number is xor-ed with the 21bit random mask, and then added to a base offset.

this should spread the beating evenly in the long term, over the whole 1GB area (with a 1024:1 ratio).

Caching is required, because of the block access to the media, and to shield the flash from high frequency access bursts (i.e. a stack machine CPU working).

South of the base offset, a FAT partition can be placed. Downside is the need to partition the card.
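
The sector mapping in this scenario can be sketched in C as below. The constants follow the numbers in the post (512-byte sectors, 1MB of XMEM, a 1GB reserved region); the function and parameter names are hypothetical, and generating the true random mask is left out.

```c
#include <assert.h>
#include <stdint.h>

#define XMEM_SECTORS   2048u        /* 1 MB / 512 B  -> 11-bit sector number */
#define REGION_SECTORS (1u << 21)   /* 1 GB / 512 B  -> 21-bit sector number */

/* Map an XMEM sector to a card sector: XOR with the per-boot random mask,
 * then add the base offset of the reserved region. */
uint32_t xmem_to_card_sector(uint32_t xmem_sector,
                             uint32_t random_mask,
                             uint32_t base_sector)
{
    assert(xmem_sector < XMEM_SECTORS);
    uint32_t mask = random_mask & (REGION_SECTORS - 1u);  /* keep 21 bits */
    return base_sector + (xmem_sector ^ mask);
}
```

Because XOR with a fixed mask is one-to-one, no two XMEM sectors collide within a run. Each run occupies one aligned 2048-sector window of the region, selected by the top 10 bits of the mask, which is where the 1024:1 ratio comes from.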

jazzed · 2011-10-07 10:58

Flash is a perfectly reasonable solution for saving code to execute as read-only. If SD card can be made to be used like readonly Flash, that's fine too.

People can use SD card any way they like. Fortunately some analysis is already done for us and is available here: http://www.sandisk.com/media/65675/LDE_White_Paper.pdf. Appendix A is very interesting.

It would be nice to know practical limits of using SD card as RAM from our own community. I look forward to Heater's or anyone else's results.

AntoineDoinel wrote: »

South of the base offset, a FAT partition can be placed. Downside is the need to partition the card.

This seems reasonable. That is: The first partition can be for a user's readonly program and the next partition(s) for the user's file system. The actual implementation on the first partition is up to the developer.

A cache driver can use any media whatsoever for external memory hardware. It encapsulates the gory details. People can write their own drivers and use that hardware any way they like.
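
As an illustration of that encapsulation, here is a minimal read-only, direct-mapped cache in C with a pluggable backend. It is not the Propeller GCC cache driver interface; the line size, line count and every name below are assumptions, and demo_backend merely stands in for real SRAM, flash or SD code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LINE_SIZE  512u
#define LINE_COUNT 8u

/* Supplied by the hardware author: fill buf with LINE_SIZE bytes
 * starting at the line-aligned external address addr. */
typedef void (*read_line_fn)(uint32_t addr, uint8_t *buf);

struct cache {
    read_line_fn read_line;
    uint32_t tag[LINE_COUNT];              /* external address of each line */
    uint8_t  valid[LINE_COUNT];
    uint8_t  data[LINE_COUNT][LINE_SIZE];
    uint32_t misses;
};

void cache_init(struct cache *c, read_line_fn backend)
{
    assert(backend != 0);
    memset(c, 0, sizeof *c);
    c->read_line = backend;
}

/* Read one byte through the cache; only a miss touches the hardware. */
uint8_t cache_read_byte(struct cache *c, uint32_t addr)
{
    uint32_t line_addr = addr & ~(LINE_SIZE - 1u);
    uint32_t slot = (addr / LINE_SIZE) % LINE_COUNT;
    if (!c->valid[slot] || c->tag[slot] != line_addr) {
        c->read_line(line_addr, c->data[slot]);
        c->tag[slot] = line_addr;
        c->valid[slot] = 1;
        c->misses++;
    }
    return c->data[slot][addr % LINE_SIZE];
}

/* Stand-in backend for experiments: byte i of a line holds (addr + i) & 0xFF. */
void demo_backend(uint32_t addr, uint8_t *buf)
{
    for (uint32_t i = 0; i < LINE_SIZE; i++)
        buf[i] = (uint8_t)(addr + i);
}
```

Swapping demo_backend for an SRAM, SPI flash or SD sector reader changes nothing in the code that calls cache_read_byte.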
