Memory expansion for P1
thej
Posts: 232
in Propeller 1
I've been digging through the forum for memory expansion options for the P1 but have not been finding much that is still available.
What P1 boards with extra RAM or RAM boards that can expand an existing P1 board are available?
Thanks!
J
Comments
http://mghdesigns.com/propeller/dna.html
You can plug a SPI SRAM into one of the sockets.
Since it is "Arduino compatible", I guess I should be able to use an Arduino RAM shield.
Any reasons why that might not work?
I had found the 32MB expansion board made by Jazzed but the site linked in the forum no longer works.
http://forums.parallax.com/discussion/127044
However, I will need a lot more than 32K for my various "planned" hijinks ...er... projects.
I'm currently looking at this Arduino board.
https://www.tindie.com/products/FemtoCow/spi-ram-shield-for-arduino/
It has up to 512KB on board.
Peter, how would accessing this from Tachyon work?
It is a SPI interface to the RAM board but does it need to use LMM or XMM ??
Is that in Tachyon? (still reading through the docs)
I use this as the base in one of my projects that runs a large database and big C (Catalina C) program.
Not sure what mileage you will get from an Arduino RAM add-on board. There are hardware considerations when mixing Arduino and Propeller, as the Prop runs on 3V3. SPI RAM has speed restrictions too.
BTW, Tachyon does have an LMM instruction, but that is mainly to allow it to run PASM code from hub. The only reason I need that is for hub operations such as coginit and clkset, which have to be executed from PASM but aren't time-critical enough to need to live in cog memory.
What does 'a lot more' mean in numbers ?
What speed do you need, and what memory types ?
Microchip and OnSemi have 128k Byte SPI Static SRAM parts.
There are HyperRAM and OctaRAM parts, if you can work with SDRAM refresh rules.
Then, FRAM can cover both RAM and FLASH, tho at a price premium. I see 4Mbit FRAMs are shipping.
Or, there is ReRAM, (MB85AS4MT) - cheaper than FRAM, 1.2M cycles, with a 256 byte page write, write time seems bit-changed proportional ? Not quad SPI & modest MHz, but may work well with P1 ?
Or MRAM, also showing 4MBits, is close to FRAM in having 'instant' writes.
Serial FLASH is the lowest cost per-bit, but it can have larger erase times, so it's less 'RAM replacement' useful.
What is the maximum number of random byte reads per second that one can reasonably achieve with external RAM?
I'm just getting started with some parallel SRAM and I'm getting 14,500 random byte reads or writes per second and a little over 20,000 sequential byte reads or writes per second, but it's just a 70ns chip and I've only been working in Spin so far.
My goal with parallel memory interface would be to explore sequencing other cogs to read the data at the same time as one cog writes it to RAM by adding a few id pins too. At this point it is academic and I can't get a P2 cheaply. I recognize that there are faster ways and your speeds are impressive.
You can certainly get much faster than that in ASM. The P1 has a 20MHz opcode rate, but can also generate a clock to 40MHz using the timers.
In ASM, the generic 4 opcodes of shift.clk.data.clk give a peak SPI clock of 5MHz
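As a rough check of that 5MHz figure, here is a minimal sketch in Python (plain arithmetic, not Propeller code). It assumes an 80MHz system clock and 4 clocks per opcode, which is where the 20MHz opcode rate comes from; the function name is made up for illustration.

```python
def bitbang_spi_clock_hz(sysclk_hz=80_000_000, clocks_per_opcode=4, opcodes_per_bit=4):
    """Peak SPI clock when every bit costs a fixed number of opcodes."""
    opcode_rate_hz = sysclk_hz / clocks_per_opcode  # 20 MHz at the assumed defaults
    return opcode_rate_hz / opcodes_per_bit


if __name__ == "__main__":
    print(bitbang_spi_clock_hz() / 1e6, "MHz with shift.clk.data.clk")
    print(bitbang_spi_clock_hz(opcodes_per_bit=2) / 1e6, "MHz if a bit costs only 2 opcodes")
```

The second call shows how the peak scales if a counter generates the clock and the loop only has to handle data.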
Depends on what you have already, and how much memory you need, and how you want to read it ?
some examples :
Simplest are QuadSPI SRAMs like 23LCV1024, N01S830 (above) - these have a sticky quad mode, so you can load CMD.ADDRESS with a 10 spi-clock overhead to first nibble out.
They are low cost, and easy to use - but will not quite hit video playback speeds. Their 20MHz spec is for 2.5V, so 3v3 can likely go higher.
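To see what the 10 spi-clock overhead costs for different burst lengths, here is a small Python sketch. It only models the bus side: it assumes the quoted 20MHz clock, two clocks per byte in quad mode (one per nibble), and ignores whatever time the cog needs per nibble. The function name is hypothetical.

```python
def quad_read_bytes_per_sec(burst_bytes, spi_clk_hz=20_000_000, overhead_clocks=10):
    """Effective read rate for one burst: fixed CMD.ADDRESS overhead plus data."""
    # Quad mode moves one nibble per SPI clock, so each byte costs two clocks.
    total_clocks = overhead_clocks + 2 * burst_bytes
    return burst_bytes * spi_clk_hz / total_clocks


if __name__ == "__main__":
    for n in (1, 4, 32, 512):
        print(n, "byte burst:", round(quad_read_bytes_per_sec(n)), "bytes/s")
```

Random single-byte reads pay the overhead every time, while long bursts approach the 10 MB/s bus limit at that clock.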
Also simple to understand, are parallel SRAMs and helper logic.
You need to set address, so latches, counters, PLDs, FPGAs, or even 'use another P1' are all possible ways to wrap generic SRAM, to improve pin counts.
A cpld could let you latch address and auto-inc on R/W, for best burst bandwidth.
More compact and modern, but harder to start with, is a RAM intensive FPGA like Lattice ICE40UP5K-SG48
This would make a nifty P1 companion, & in a compact QFN48 you get 1Mbit SRAM and some useful 5280 Logic elements, roughly the same price as a P1
This can clock faster than a 23LCV1024, and give many more choices on ports, and widths.
That is my intention as I'd like to explore transmitting the same data to other cogs at the same time. I translated my Spin code to C last night and it came up to a respectable 160kB/s from my CY62256N. I've attached the C code if you can recommend improvements. If I can use that same transaction time to copy that data to 4 other cogs then I can eliminate many slower hub ram operations. I have several different options to play with on the way. I intended to start with the EFM8UB3. This is just to study the potential. If I get good results I'd like to move on to a 16bit, 10ns module and a wider 16bit controller aiming for the 4 to 8MB range.
Now, a question that I have not asked yet. Can multiple cogs receive the same data from SPI RAM simultaneously? I got a deal on some 32Mbit SPI chips to play with (N25Q032A), but I think they cap out at about 108MHz.
@thej hope all this helps your decision process
Sure, receive can be by many cogs, as they have no idea who else may be snooping.
However, SPI read is not totally read-only, you need to write the command and address, before the turnaround to data out.
That means only one COG can be in charge of command and address, & SPI_CLK, but many COGs could look at the data, if you wanted to.
Those slave parts would need to carefully sync to the SPI_CLK, or some frame pin.
Very cool, sounds like both options are worth pursuing. I'll put the results in the Shrinking Standards thread. Just waiting for several slow boats from China.
You should be able to set any PLL mode to drive the video generator, including outputting the clock on an I/O pin. That way you don't have to output the clock using 2-bit mode and should have the real clock signal. Hub reads and writes won't be an insurmountable problem if you're using an external shift register, but you will have to slow things down a little. You may or may not need to transfer to hub RAM, depending on the application.
Let's assume an 80MHz SPI clock. It takes 8 cycles to transfer data to hub RAM once you're synced. Send the read command by using a waitvid. Wait for the command to complete, then disable the video generator output but leave the clock running. Then wait for your data to be ready. It probably makes sense to use a 16-bit shift register. Read the word: that's 4 system clocks. Then write it to hub RAM: that's 8 more (if we got everything synced up before we started). Do a 4-cycle instruction, and we are at 16 clocks. Then read the new word, write it to hub RAM, execute another 4-cycle instruction, and repeat. Every 8 clocks, if your device supports that mode, the next byte is clocked out. You can use the extra instructions to count loop iterations, do the loop itself, and end the loop when you have all your data. The timing should work.
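The cycle budget described above can be written down as a quick check. This is a sketch in Python, not P1 code; it assumes an 80MHz system clock and takes the 4 + 8 + 4 clock costs straight from the description. The function name and parameters are made up for illustration.

```python
def loop_keeps_up(sysclk_hz=80_000_000, spi_clk_hz=80_000_000, word_bits=16,
                  loop_clocks=(4, 8, 4)):
    """True if one loop pass (read word, hub write, spare opcode) fits in the
    time the external shift register needs to collect the next word."""
    clocks_per_word_in = word_bits * sysclk_hz / spi_clk_hz
    return sum(loop_clocks) <= clocks_per_word_in


if __name__ == "__main__":
    print("16-bit shift register:", loop_keeps_up())
    print("8-bit shift register: ", loop_keeps_up(word_bits=8))
```

With an 8-bit shift register the same loop would need to finish in 8 clocks, which is why the 16-bit register is the more comfortable choice here.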
An interesting statistic: if you can receive a block of data at 125MHz, that's 3.9 million dwords/sec, 256ns/dword (not counting overhead), or about 130us to fill up a cog's RAM. Since it takes 100us to start a cog up and load its memory, it would seem that this could be quite a useful way to reload and repurpose a cog in the middle of a program. A small routine in a cog's memory could load its new program about as fast as the cog could be reinitialized.
What if the video generator transmits at 20Mbps and we receive with a sequence of MOVs? The sustained transfer rate would certainly be less than 10Mbps, but we could transfer the data and hand the bus off to another cog.
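Those numbers can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: it assumes 32-bit dwords, a 512-long cog RAM, an 80MHz system clock, and that a cog start loads one long per 16-clock hub window (an assumption used here to arrive at the roughly 100us figure). The function name is hypothetical.

```python
def fill_stats(bit_rate_hz=125_000_000, long_bits=32, cog_longs=512,
               sysclk_hz=80_000_000, hub_clocks_per_long=16):
    """Return (dwords/sec, ns per dword, us to fill cog RAM, us for a cog start)."""
    longs_per_sec = bit_rate_hz / long_bits
    ns_per_long = 1e9 / longs_per_sec
    fill_us = cog_longs * ns_per_long / 1e3
    # Assumed cost of a normal cog load: one long per hub window.
    coginit_us = cog_longs * hub_clocks_per_long / sysclk_hz * 1e6
    return longs_per_sec, ns_per_long, fill_us, coginit_us


if __name__ == "__main__":
    rate, ns, fill, start = fill_stats()
    print(f"{rate / 1e6:.2f} M dwords/s, {ns:.0f} ns each")
    print(f"fill cog RAM: {fill:.1f} us, normal cog start: {start:.1f} us")
```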
Related:
http://forums.parallax.com/discussion/164171/what-is-the-maximum-possible-throughput-of-spi-on-the-prop
http://forums.parallax.com/discussion/166850/boost-your-propeller-s-memory-256x-from-32kb-8mb
Sending is done by setting the CTR to NCO mode with a FRQ of zero, then just using shift instructions to shift all the bits through. (The other CTR can be used to generate a clock.)
Receiving is a little bit more complex. I think you need to set the CTR up to count edges of the clock signal (which must be CLKFREQ/4 and be generated from the same clock source) where the data is high. Then, by shifting the PHS register up every 4 cycles, the bits end up where they are supposed to be.
At least that's what I remember; I may be wrong.
EDIT:
The maximum speed at which data can be transferred in/out of the prop is 80 Megabit per second, by using P0 through P15 as a 16 bit parallel port and doing something like this:
(Moderators, how do you make code blocks again?)
add pointer,#2
djnz length,#loop
A single cog can do about 10 Megabit using the 20 MHz shift tricks. If you manage to sync two up, the full bandwidth of 20 Megabit may be usable.
Note that an SD card has a limit of 25 MHz in SPI mode.
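The 80 Megabit figure for the 16-bit parallel port can be sanity-checked with a line of arithmetic. This Python sketch assumes an 80MHz system clock and that each pass of the capture loop takes exactly one 16-clock hub window; the fragment quoted above does not by itself confirm that loop length, so treat it as an assumption. The function name is made up.

```python
def parallel_capture_mbit(bus_bits=16, loop_clocks=16, sysclk_hz=80_000_000):
    """Throughput in Mbit/s when one bus-wide sample is stored per loop pass."""
    return bus_bits * sysclk_hz / loop_clocks / 1e6


if __name__ == "__main__":
    print(parallel_capture_mbit(), "Mbit/s with a 16-clock loop")
    print(parallel_capture_mbit(loop_clocks=32), "Mbit/s if the loop slips to 32 clocks")
```

If the loop misses its hub window and takes 32 clocks per pass, the rate halves.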
Suppose your data rate is 10MHz. Set up the video generator to run at 80MHz. Send 00010000:00111100:00011000:00010000 out the video generator. You set it up so that the counter counts up on the A pin AND the B pin. The data line is the B pin, and the output from the video generator is the A pin. Notice that the period of the data bits is 8 clocks, the same time it takes to shift a byte out of the video generator.
The first bit is masked by the video generator most of the time, but right in the middle, the video generator line goes high. If the data line is high as well, and if the timing is just right, you can catch that on the counter, so you get +1 added to the counter value. Similarly for the second bit: if the data line is high and the timing is just right, you can catch it on the counter. But the output from the video generator is wider, twice as wide actually, so you get two counts. So if the second bit is high, you get +2 added to the counter. The third bit works the same way, except that it's twice as wide again, so it gets four counts if the data line is high.
That's three bits with just one instruction to send data to the video generator. You then shift left three positions and send more data to the video generator, so you need two instructions every three bits to receive data, and you have four instructions left to play with. I didn't try to go 8 clocks wide, which would allow me to get 8 counts per bit and thus 4 bits at a time, because there needs to be enough slop in the system to receive data that isn't exactly timed right. It might work if you were communicating synchronously, where the data you're receiving is clocked by a timing signal based on the same source the Propeller uses (for example, if you set up another of the Propeller's counters to provide it).
You CAN get 10 or even 20MHz by just banging out instructions, but my goal was to actually have enough extra time to do something useful. For SPI you just need to send and receive as fast as you can, so that trick isn't as useful.
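The weighting trick can be illustrated with a small simulation. This is a Python sketch of the idea only, not P1 code: it assumes ideal timing, one 8-clock mask byte per data bit, and masks ordered 1, 2 and 4 clocks wide as in the prose (which is not the same order as the literal byte string quoted above). All names are made up for illustration.

```python
# One mask byte per data bit; each character is one 80MHz clock.
MASKS = ["00010000", "00011000", "00111100"]  # 1, 2 and 4 clocks wide


def count_for_bits(bits):
    """Simulate a counter that increments on every clock where
    A (video generator mask) AND B (data line) are both high."""
    count = 0
    for data_bit, mask in zip(bits, MASKS):
        for m in mask:
            if m == "1" and data_bit:
                count += 1
    return count


def decode(count):
    """Recover the three data bits from the accumulated count (weights 1, 2, 4)."""
    return [(count >> i) & 1 for i in range(3)]


if __name__ == "__main__":
    for value in range(8):
        bits = [(value >> i) & 1 for i in range(3)]
        print(bits, "->", count_for_bits(bits), "->", decode(count_for_bits(bits)))
```

Because the pulse widths are powers of two, the count is simply the binary value of the three bits, which is why a single counter read recovers all three.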
In retrospect, we're missing some badly needed counter modes. One mode that would be nice is one where the counter adds the phase register to itself on the rising edge of the A line and additionally sets the most significant bit to 1 if the B line is high. (Adding a register to itself is, of course, the same as multiplying it by 2, which is the same as shifting it left.) That plus the video generator and you have hardware serial I/O. Alternatively, they could perhaps have put a data-in line on the video generator and modified the "waitvid" instruction. The modified waitvid instruction would not only feed the video generator but also leave the value that was shifted in at the cog RAM location specified in the destination register. You could use it for high-speed serial communications, most especially SPI up to 125MHz. But the point is moot, as the Propeller does NOT have that functionality.
That should work, but I'd try receiving in the counter with shift instructions. If your data line is used to mask the clock line, then when the data line is high you cannot see the low-to-high transition and you won't count, but when the data line is low you will see the transition. Your data will need to be inverted after it's received.