LY68L6400 8MB 8-pin RAM and SPI
Peter Jakacki
I decided to order some of these LY68L6400 64Mbit RAM chips on Friday and had them turn up on my doorstep on Monday! Even though I don't have a special PCB for them, I reasoned that I could test them in standard single SPI mode by replacing the Flash chip on my P2D2, which I did. Using the standard SF commands that I have for serial Flash I could read the ID, write, read, and dump from it just like Flash, although I will write some specific commands for this device. While the memory will be useful for all kinds of stuff, I really want to be able to run the SPI bus up to the 84MHz continuous sequential read speed possible, whereas now the SPI is bit-bashed and runs at about 1/10 sysclk, so 25MHz for a 250MHz P2 clock.
I'd like to experiment with an 8 color VGA mode first with this basic arrangement since I should be able to read 3-bits for every pixel. If I could run it in QSPI mode then full 640x480x8 is possible and although it would involve an extra cog working full-time to buffer a scan-line, it would mean that most of fast hub RAM would be available for other things.
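As a rough sanity check of the bits-per-pixel reasoning above, here is a minimal Python sketch. It assumes the standard 25.175MHz pixel clock for 640x480 at 60Hz (not stated in the post) and ignores command/address overhead and blanking time, so treat it as an estimate only. The function name is made up for illustration.

```python
# Rough bandwidth check for 640x480 VGA fed from a serial RAM.
PIXEL_CLOCK_HZ = 25_175_000  # assumption: standard 640x480@60Hz pixel clock
SPI_CLOCK_HZ = 84_000_000    # continuous sequential read speed quoted in the post


def bits_per_pixel(spi_clock_hz, data_lines, pixel_clock_hz=PIXEL_CLOCK_HZ):
    """Whole bits that can be fetched during one pixel period."""
    return (spi_clock_hz * data_lines) // pixel_clock_hz


single_spi_bpp = bits_per_pixel(SPI_CLOCK_HZ, 1)  # one data line
quad_spi_bpp = bits_per_pixel(SPI_CLOCK_HZ, 4)    # four data lines (QSPI)

print("single SPI bits/pixel:", single_spi_bpp)
print("quad SPI bits/pixel:  ", quad_spi_bpp)
```

Under these assumptions single SPI leaves room for 3 bits per pixel (8 colors) and QSPI has headroom beyond 8 bits per pixel.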
Here's the thing, the Smartpin SPI modes I've seen seem to be awkward but I may be mistaken. Is there a good example of using the smartpins in SPI mode that might be useful?
Here's a terminal session interacting with the chip using TAQOZ SF commands.
TAQOZ# SFJID .L --- $FFFF_FC0D ok
TAQOZ# $8.0000 $40 SF DUMP ---
0008_0000: AA CA AA AB AA CA EA BA AA AB AF CB D2 A8 BE AE '................'
0008_0010: AA AA 29 CF AE AA BA FA 4E AE AB FF AA EA EF BD '..).....N.......'
0008_0020: 7A EF AE 2A B2 3C AB AA 3B AB AA AE AE B0 A2 FA 'z..*.<..;.......'
0008_0030: AA 89 BE DE AA FA F8 AA A8 AB CD AF EA B0 AA BC '................'
TAQOZ# $8.0000 $40 DUMP ---
0008_0000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '................'
0008_0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '................'
0008_0020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '................'
0008_0030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '................'
and copying the TAQOZ dictionary into it as well (easy to see the ASCII)
TAQOZ# WE --- ok
TAQOZ# @WORDS $8.0000 $4000 SFWRS --- \
TAQOZ# $8.0000 $40 DUMP ---
0008_0000: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '................'
0008_0010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '................'
0008_0020: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '................'
0008_0030: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 '................'
TAQOZ# $8.0000 $40 SF DUMP ---
0008_0000: 06 52 47 42 53 51 5A 1E 53 04 4D 55 58 51 14 53 '.RGBSQZ.S.MUXQ.S'
0008_0010: 04 53 45 54 51 0A 53 04 4D 55 58 51 00 53 07 4D '.SETQ.S.MUXQ.S.M'
0008_0020: 4F 56 42 59 54 53 F6 52 06 53 45 55 53 53 52 EC 'OVBYTS.R.SEUSSR.'
0008_0030: 52 06 53 45 55 53 53 46 E2 52 06 4D 45 52 47 45 'R.SEUSSF.R.MERGE'
Checking upper 4MB and 8MB for mirroring or random.
TAQOZ# $1.8000 $40 SF DUMP ---
0001_8000: AA 8A AA AB 8A CA AA B2 AA AB AF 8B 42 A8 AA AE '............B...'
0001_8010: AA 2A 29 CF AE AA BA BA 4E AE AB EA AA AA EE BD '.*).....N.......'
0001_8020: 7A 5F AE 2E B2 1C AA 28 3A 2B AA AE AA B0 A2 FA 'z_.....(:+......'
0001_8030: AA 89 BA DE AA FA F8 8A A8 AA CC 8F EA 30 A2 A0 '.............0..' ok
TAQOZ# 4 MB .L --- $0040_0000 ok
TAQOZ# $40.8000 $40 SF DUMP ---
0040_8000: AA AA AA AA AA A2 DA AA AA AA 8A AA BA AA A8 A2 '................'
0040_8010: AA AA 2E AA AA AA 1A AA A8 AA EA AA AA A8 A9 B2 '................'
0040_8020: AA AA AA 2E E2 AA A6 9A AB 2A 2A AA AA AA AA AA '.........**.....'
0040_8030: AA AA AB AE AB 28 EA BA AA AA 2A CA AA AA 8A AA '.....(....*.....' ok
TAQOZ# $7F.8000 $40 SF DUMP ---
007F_8000: 57 57 DD 55 55 5D 55 41 55 55 55 55 55 55 55 55 'WW.UU]UAUUUUUUUU'
007F_8010: 55 05 1D 57 D5 55 55 51 55 55 55 95 DF 55 55 54 'U..W.UUQUUU..UUT'
007F_8020: 5F 57 55 55 5D D5 15 55 55 55 55 55 55 55 D5 D5 '_WUU]..UUUUUUU..'
007F_8030: 55 55 D7 45 55 55 5D 55 75 57 55 55 55 55 55 55 'UU.EUU]UuWUUUUUU' ok
TAQOZ#
Comments
A streamer can do better because it DMAs the hub data directly. There is a lot more setup for the streamers but, for block transfers, it would be worth the effort in the end.
The most useful thing I can see for SPI HW, is in buying time inside the short CSL window the PSRAMs have. ie transfer more bits per address+data-block.
I think P2 QSPI in SPI HW did not make the cut, but the streamer can do nibbles, so that may be a means to QSPI ?
One piece of info that's annoyingly hard to glean from these PSRAMs, is their tolerance on CS duty cycles.
eg Some spec refresh times of 16~64ms region, and CSL of 4us, but are less clear on just when the refresh counter advances ?
eg If it advances on a CS edge, (no internal clocks) and ignores user address, then you need 8192 pulses inside (say) 64ms to keep refresh.
If a separate clock runs inside (when CS=H), it just needs a certain % of HI time on CS, and the CSL time can be stretched.
Maybe TAQOZ can do some retention checks, and vary CS to see when the memory fades, and what duty CS needs ?
I'd expect Write, then CS=L for 6 seconds, then read, to fail, but write,CS=H.32ms, then CS=L.32ms,read, may be ok ?
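To put numbers on the CS-edge case mentioned above (refresh counter advancing once per CS pulse), here is a small Python sketch of the arithmetic. The 8192 pulse count and the 16~64ms window are the figures from this comment, not confirmed datasheet values, and the function name is hypothetical.

```python
# Hypothetical model: one refresh step per CS pulse, all rows refreshed per window.
REFRESH_PULSES = 8192  # pulse count assumed in the comment above
MAX_CS_LOW_US = 4.0    # CSL limit quoted in the comment above


def max_pulse_spacing_us(window_ms, pulses=REFRESH_PULSES):
    """Longest average spacing between CS pulses that still covers every row."""
    return window_ms * 1000.0 / pulses


spacing_64ms = max_pulse_spacing_us(64)  # relaxed end of the quoted range
spacing_16ms = max_pulse_spacing_us(16)  # tight end of the quoted range

print(f"64ms window: one CS pulse every {spacing_64ms:.4f} us")
print(f"16ms window: one CS pulse every {spacing_16ms:.4f} us")
print("16ms spacing shorter than CSL limit:", spacing_16ms < MAX_CS_LOW_US)
```

If this model held, the 16ms end of the range would require CS pulses closer together than the 4us CSL limit allows for a full-length burst, which is one reason the retention experiment suggested above would be informative.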
Please can you post transfer speed (MB/s) numbers for LY68L6400 SPI and raw SD card sectors (without FAT32) for comparison?
Is there any other (high capacity) storage available for P2 with high transfer rates?
Thank you!
http://www.avalanche-technology.com/products/discrete-mram/p-sram-gen-2/
1Mbit – 16Mbit SPI MRAM
Memory for evanh
Not sure where to buy it yet .... and endurance may need to be watched.
The site has a link that, as far as I understand, claims infinite write endurance (as any MRAM-based device should have?)
As an exercise I might even output 1bpp VGA from a cog and see how that goes. If it works well then that means a dedicated chip in QPI mode can do 4bpp VGA, or with dual chips 8bpp VGA. I'm not sure how well modern monitors handle flicker, but with this 1-bit SPI arrangement in place of the Flash I could output 2 bits for every pixel by using alternate 1-bit frames, and so achieve 4-color VGA.
@Ramon
The SPI bus is bit-bashed, runs at the same speed as SD etc., and is about 1/10 of the sysclk. With a 240MHz sysclk that translates to a 24MHz clock, and we need to use smartpin SPI effectively to really push the limits.
Timing a SPI RAM multi-block WRITE reveals we are transferring a byte every 377ns, or 2.65MB/s at 240MHz sysclk.
Read timing is very similar too with a byte read every 383ns (reading 100kB)
These figures aren't much different from SD reads which suffer from setup latency but power along with multi-block reads. The advantage of the RAM is that it needs only about 3us to setup the address.
So in TAQOZ, reading a single random byte anywhere in the 8MB takes about 4us.
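For anyone wanting to convert the per-byte timings above into throughput, here is a minimal Python sketch. The 377ns and 383ns figures and the 24MHz clock are from this post; the helper name is made up, and MB is taken as 10^6 bytes.

```python
# Convert measured per-byte times into throughput and compare with the clock limit.
SPI_CLOCK_HZ = 24_000_000  # bit-bashed SPI clock at 240MHz sysclk (from the post)


def mb_per_second(ns_per_byte):
    """Throughput in MB/s (10^6 bytes) for a given time per byte in ns."""
    return 1000.0 / ns_per_byte


write_mb_s = mb_per_second(377)            # multi-block write timing
read_mb_s = mb_per_second(383)             # 100kB read timing
clock_limit_ns = 8 / SPI_CLOCK_HZ * 1e9    # 8 clocks per byte, no overhead
write_overhead_ns = 377 - clock_limit_ns   # per-byte loop overhead on writes

print(f"write: {write_mb_s:.2f} MB/s, read: {read_mb_s:.2f} MB/s")
print(f"clock-limited byte time: {clock_limit_ns:.1f} ns")
print(f"write overhead per byte: {write_overhead_ns:.1f} ns")
```

The measured byte times sit only about 45ns above the 333ns that eight 24MHz clocks take, so the bit-bashed clock rate itself is the main limit rather than per-byte overhead.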
On the Teensy Audio interface board they had two options on layout: SD Card and 8-SOIC for FLASH or RAM (W25Q128/23LC1024).
They did that with the purpose of having a low latency storage for wavetable/LUT or recording. Someone reported exact numbers at different block sizes (512 bytes, 50 bytes, 32K) and I remember that SD card can have several tens of us of latency while Flash and RAM can decrease latency to <5 us. I don't have the links at hand but they can be searched on their forums (not sure if they were posted on Teensy or Arduino forums).
(There is research suggesting A2 rated cards are actually slower, since they only need to reach their 4000 IOPS goal when cheating using advanced protocol features)
Yes, I can never understand why some insist on using their old worn out card when a brand new Sandisk Ultra 16GB A1 card can be had for ten bucks or less, and a 32GB for only a couple of dollars more.
Here's the speed and latency test for an "old" 8GB Ultra from 2016. The sector speeds are spread over the drive and include the latency.
This is the full disk report:
All you'd actually need is a memory driver that matches my proposed memory driver spec in this thread linked below and it would just work, assuming you give the video COG some priority and limit or fragment the requests from the non video COGs accordingly to meet latency. If the entire line is back within 25us or so, the mouse sprite can be drawn over it too.
http://forums.parallax.com/discussion/170645/proposed-external-hyperram-memory-interface-suitable-for-video-drivers-with-other-cogs#latest
If simpler address-once bursts are needed, there is
ISSI's IS62WVS5128GBLL, which is 45MHz QSPI 4MBit SRAM, (stocked at Mouser)
and the new
AS3016204 is 54MHz(DDR)/108MHz QSPI 16Mbit MRAM (also comes in 1Mb, 4Mb, 8Mb)
Constants in use
And config code
EDIT: Oh! Now I remember, the SPI clock is still the messy part. Getting to sysclock/2 needs the streamer to generate the SPI clock. So the above code is a notably better solution.
PS: Synchronous serial receive smartpin mode is, funnily, a lot easier to handle because it aligns nicely with whatever the external SPI clock is - like regular SPI hardware. With the prop2 as master, it can happily go to sysclock/2. And with it configured for 32-bit word size it'll be quite manageable for bursting a block without needing the streamer for anything.
Okay, here's QSPI transmit anyway. The heart is the same, just tweaked cycles and streamer width.
Config code
Using smartpins for rx works pretty well up to Dual SPI at least. The smartpin rx mode has a big advantage over the tx mode: clock and data are actually related, the SPI device has responded to the propeller-produced clock, and both flow as external inputs in unison to the rx smartpins. See https://forums.parallax.com/discussion/comment/1480866/#Comment_1480866 However, there are a couple of limits if trying to go faster. Processing of Quad SPI or QPI will struggle to compete with that. Dual is close to maxing out the cog. And the clock pin will need to be physically in the middle of the data pins too for the B-input to function as SPI clock in the smartpins.
Using a streamer eliminates both the processing overhead and the clock pin selection limit. But this still needs sorted. It won't be as easy as the tx streamer methods.
Tx at those speeds always needs the streamer involved. Either as demo'd above or as the SPI clock generator instead. An advantage of going with the clock generator method is it frees up the cog's hubram FIFO.
EDIT: PS: There is a third way to use the streamer for tx. It would involve encoding the SPI clock as one of the parallel data bits within the data block to be DMA'd from hubram. This eliminates the phase compensation guesswork but obviously incurs the encoding overhead.
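To illustrate what that third method's encoding step might look like, here is a minimal Python model. It is only a sketch: the pin layout (data on bit 0, clock on bit 1), SPI mode 0 timing, and the function name are all assumptions for illustration, and it does not model how the streamer actually packs symbols in hubram.

```python
# Model of embedding the SPI clock in the parallel output stream.
# Assumed (hypothetical) symbol layout: bit 0 = serial data, bit 1 = SPI clock.
def encode_with_clock(data, data_bit=0, clock_bit=1):
    """Return two output symbols per data bit, MSB first.

    The first symbol sets up the data with clock low, the second raises the
    clock while holding the data, so the device samples on the rising edge.
    """
    symbols = []
    for byte in data:
        for shift in range(7, -1, -1):
            bit = (byte >> shift) & 1
            symbols.append(bit << data_bit)                       # clock low
            symbols.append((bit << data_bit) | (1 << clock_bit))  # clock high
    return symbols


encoded = encode_with_clock(b"\xA5")
print(encoded)
```

Each data bit becomes two symbols, which is the encoding overhead mentioned above: the block to be streamed grows, and the output rate per data bit is half the symbol rate.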
I guess sysclock/3 is also an option. Not sure.