As noted on these forums on several previous occasions, HyperRAM (HR) is attractive for a number of reasons: low cost, high density and relatively simple interface requirements.
Mouser shows the ISSI 8 MB part (IS66WVH8M8BLL-100B1LI) at $4.47 at qty 1, and dual-die stacked 16 MB parts (IS66WVH16M8BLL-100B1LI) are now also available for $6.58. There is also an equivalent 8 MB Cypress part, the S27KL0641DABHI020, for only $2.86 at qty 1.
I’ve used HR previously in several XMOS-based instrument designs, and you’ll find some useful background information about these here -
Recently, I’ve been working on interfacing HyperRAM to the P2, using a smart pin to generate the clock signal and the streamer to manage the data bus transactions. Some careful management of the RWDS signal is also required.
I connected a small HR breakout PCB to a P2 emulation running on a DE2-115 FPGA and then wrote some code to configure the HR registers and to read and write 128-byte buffers from and to HR.
I’ve got these latter routines running in ~2.6 us, affording a data transfer rate between HUB RAM and HyperRAM of around 50 MB/s (128 bytes in 2.6 us works out to about 49 MB/s). The key step in getting this working was to correctly synchronize the smart pin and streamer so as to obtain reliable DDR transfers.
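As a quick sanity check on the figures above, the throughput follows directly from the buffer size and the transfer time. This is only a sketch of the arithmetic: the function name is hypothetical, and 1 MB is taken to mean 10^6 bytes.

```python
def transfer_rate_mb_per_s(buffer_bytes: int, transfer_time_us: float) -> float:
    """Return throughput in MB/s, taking 1 MB as 10**6 bytes."""
    # Bytes per microsecond is numerically the same as MB per second.
    return buffer_bytes / transfer_time_us


# Figures quoted in the post: a 128-byte buffer moved in about 2.6 us.
rate = transfer_rate_mb_per_s(128, 2.6)
print(f"{rate:.1f} MB/s")  # prints "49.2 MB/s"
```

Since the 2.6 us figure is approximate, a transfer time of 2.56 us or less would put the rate at 50 MB/s or above.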
To test the interface I’ve developed a LabVIEW front end that allows a selection of different 128-byte waveforms to be downloaded into HyperRAM and then read back for comparison.
I’ve attached some screen grabs showing typical results. Here, reading/writing HyperRAM is done by reference to a block number; an 8 MB HR chip holds 65536 of these 128-byte blocks. The two graph windows show the dataset written to HyperRAM (left) and read back (right), allowing for a quick visual validation of the data transfer.
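The block numbering described above can be sketched as a simple mapping from block number to memory offset. The names here are hypothetical and this is not the code used in the test setup; the word-address helper additionally assumes the device is addressed in 16-bit words, as the HyperBus specification describes.

```python
BLOCK_SIZE = 128                       # bytes per block, as in the post
CHIP_BYTES = 8 * 1024 * 1024           # 8 MB HyperRAM part
NUM_BLOCKS = CHIP_BYTES // BLOCK_SIZE  # 65536 blocks


def block_to_byte_offset(block: int) -> int:
    """Byte offset of the first byte of a block."""
    if not 0 <= block < NUM_BLOCKS:
        raise ValueError(f"block must be in 0..{NUM_BLOCKS - 1}")
    return block * BLOCK_SIZE


def block_to_word_address(block: int) -> int:
    """Assumption: the device is addressed in 16-bit words."""
    return block_to_byte_offset(block) // 2


print(NUM_BLOCKS)                         # 65536
print(hex(block_to_byte_offset(65535)))   # 0x7fff80
print(hex(block_to_word_address(65535)))  # 0x3fffc0
```

A 16-bit block number is therefore enough to reach every block of the 8 MB part.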
At present, I’m seeing an occasional glitch affecting a few bytes of the data transfer in a couple of the waveforms I’ve tested. The problem completely disappears if I touch the CLK line running from the DE2-115 board to the corresponding pin on the HR breakout with my finger (a capacitance/drive issue?), and I’m currently waiting on a new PCB that I’m confident will fix it.
I’d like to thank ozpropdev for a P2 v19 version of his logic analyzer code for the DE2-115 that helped me sort out a few issues I had along the way…
After getting this far, I decided to attempt a proof-of-concept for the P1. I started out by making a small P1-based PCB with an HR chip mounted on a small castellated breakout PCB (see the attached photo). The pin labelled GND in the lower left of my HyperRAM breakout is actually a Vcc pin, and some surgery was needed to fix this (hence the red wire, which appears to short Vcc and GND!).
On the P1 the main issue is that hub accesses can only be guaranteed every 16 clock cycles (200 ns at 80 MHz), and this is only after you’ve synchronized your memory transfers. A synchronized hub instruction takes 8 of those clocks and an ordinary instruction takes 4, so in between each successive rdbyte/wrbyte instruction you’ve got just 2 intermediate instructions to work with.
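The timing budget above can be worked through in a few lines. This is a sketch under stated assumptions: an 80 MHz system clock (implied by 16 clocks = 200 ns), 8 clocks for a synchronized rdbyte/wrbyte, and 4 clocks per ordinary instruction. All names are hypothetical.

```python
CLOCK_HZ = 80_000_000    # assumption: 80 MHz, since 16 clocks = 200 ns
HUB_WINDOW_CLOCKS = 16   # a cog's hub access slot recurs every 16 clocks
HUB_OP_CLOCKS = 8        # assumption: synchronized rdbyte/wrbyte cost
INSTR_CLOCKS = 4         # assumption: ordinary P1 instruction cost

# Length of one hub window in nanoseconds.
hub_window_ns = HUB_WINDOW_CLOCKS * 1_000_000_000 // CLOCK_HZ

# Instructions that fit between two back-to-back hub operations.
spare_instructions = (HUB_WINDOW_CLOCKS - HUB_OP_CLOCKS) // INSTR_CLOCKS

# Ceiling on byte-wide hub traffic from one cog: one byte per window.
peak_mb_per_s = CLOCK_HZ / HUB_WINDOW_CLOCKS / 1e6

print(hub_window_ns, spare_instructions, peak_mb_per_s)  # 200 2 5.0
```

On these assumptions, one byte per hub window caps a single cog at 5 MB/s before any HyperRAM protocol overhead is counted.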
Given the self-refresh requirements of the HR chip (a single data buffer transfer is typically restricted to a maximum of 4 us, although this can be stretched out to 8 us), it seemed the buffer size on the P1 would inevitably be restricted to a modest 16 bytes; at one byte per 200 ns hub window, 16 bytes already take 3.2 us. For that buffer size it turns out there’s a roughly 100% overhead to provide a memory address plus additional latency clocks.
Even so, it still seemed that a target of slightly more than 2 MB/s would be feasible on the P1 (that’s the rate within a buffer transfer, though).
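One way to arrive at a figure of this size is sketched below. It assumes one byte per 200 ns hub window and treats the address and latency phase as a flat 100% overhead on the data time; the function and parameter names are hypothetical, and the original estimate may have been made differently.

```python
def p1_burst_estimate(buffer_bytes: int = 16,
                      hub_window_ns: int = 200,
                      overhead_fraction: float = 1.0):
    """Return (data_us, total_us, rate_mb_per_s) for one buffer transfer."""
    data_ns = buffer_bytes * hub_window_ns          # time spent moving data
    total_ns = data_ns * (1.0 + overhead_fraction)  # plus address/latency
    rate_mb_per_s = buffer_bytes * 1000 / total_ns  # bytes per us == MB/s
    return data_ns / 1000, total_ns / 1000, rate_mb_per_s


data_us, total_us, rate = p1_burst_estimate()
print(f"data {data_us:.1f} us, total {total_us:.1f} us, {rate:.1f} MB/s")
# prints "data 3.2 us, total 6.4 us, 2.5 MB/s"
```

On these assumptions a 16-byte transfer takes about 6.4 us in total, which lies between the 4 us and 8 us limits quoted above.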
To cut a long story short, after drawing some initial inspiration from some neat OBEX code posted by Tracy Allen, I’ve succeeded in achieving a viable P1 HR interface solution using 3 COGs.
I’ll post P1 and P2 versions of my code after I’ve added some liberal comments to aid understanding. It will be after the coming weekend before I can get back on to this again but I felt forum members would be interested to learn of progress made thus far.
Given that I’m no code guru, I’m fairly confident that others will see ways for further improvement. Comparing my own P2 vs. P1 programming experiences after getting to this point, I think we’ll be truly empowered when we’ve got some real Si to play with; the P2 instruction set/functionality really does provide us with some awesome capabilities.
Thus far, I’ve been able to document a ~25-fold performance increase on the P2 version using some very compact code (smart pin/streamer +++ !!), and even that factor can very likely be improved upon. I’m sure there are numerous other applications out there that will really benefit from a P2…