Actually, those extra numbers you added, evanh, look like they are just the P2 clock rates. What speed did you get to for reliable reads with sysclk/1? I wasn't able to manage more than about 110-120% HyperRAM overclock when clocking the RAM over 100MHz IIRC, but maybe that falls into one of the dead bands in your list, e.g. just above 233 MHz with unregistered pins, which is probably what I used initially.
... Register endianness threw me a bit at first because it has 8 bit MSB first then 8 bit LSB, ...
Ah, yeah, Chip has done a bang up job of endian handling in the streamers for chopped 8-bit hubRAM accesses but seems to have left it out for chopping up the 16-bit and 32-bit hubRAM accesses.
Adjusting the latency setting seems to me to be just extra overhead. I've not tried to analyse it but it doesn't seem worth the effort at first glance. Just having to check RWDS is an extra overhead that isn't needed if the latency is left at default.
My plan for a dedicated tightly coupled HyperRAM chip would be to not even have the RWDS signal in the board layout. I wouldn't have RESET controlled either. It'll just be CS, CLK, and D0-D7.
Yeah it adds several extra instruction cycles or so. It's a very marginal gain if any for HyperRAM, though it is sort of desirable when you work with HyperFlash which has a much larger range of values.
Note that I am mainly talking about the ability to change the latency not so much the dynamic latency polling of RWDS. Here's the overall cost incurred for polling that (2 instructions).
        getbyte latency, pinconfig, #3  'get latency clock edges for this bank
        testp   rwdspin wz              'check RWDS pin
  if_z  shl     latency, #1             'double latency edges if RWDS is high
My plan for a dedicated tightly coupled HyperRAM chip would be to not even have the RWDS signal in the board layout. I wouldn't have RESET controlled either. It'll just be CS, CLK, and D0-D7.
When I am done with this driver (soon), I think it may also be worth stripping out and making a single bank version with some constant values for higher performance. But I want the generic one first as it is far easier to start with that and optimise than go the other way from single bank to multi-bank.
HyperFlash isn't bootable on the prop2 so doesn't register in my field of view.
I think it may still be useful for a large amount of application storage and it has a high transfer rate compared to SPI flash etc. Not sure how well it might work as a flash filesystem of some kind. I sort of like the combo idea of a single footprint part with both HyperRAM and HyperFlash supported (though it is rather expensive from what I heard).
Yeah it's probably more suitable for typically read-only use and occasional upgrades. I wonder how fast it will be. I think it is critical to get a 256-512 byte burst write going with it to have any chance of a semi-decent transfer rate. The write overhead will kill it otherwise. I haven't checked its erase time overhead either.
I've been testing large ranges of the HyperRAM chip here - while totally ignoring RWDS - and not hitting any data errors ... unless I push the refresh starvation too far that is. Then I notice a bad spot, when crossing the 2 MB boundary I think it was, that starts dropping data earlier than the rest.
... I wonder how fast it will be. I think it is critical to get a 256-512 byte burst write going with it to have any chance of a semi-decent transfer rate. The write overhead will kill it otherwise. I haven't checked its erase time overhead either.
A Cypress datasheet I've read says estimated 1 MB/s when written as 512 byte blocks. And only a few kB/s when done as 16 byte blocks. I'm assuming that doesn't include erases either.
Wow, that seems really slow. Some back of the envelope calculations I had done for an upper performance bound of write performance in my driver (which may not be realistic in practice if there are other limitations inside the HyperFlash):
There is at least ~1us per transaction for turn-arounds, so even if you are not competing with video, on a 250MHz P2 you are talking something like 4 setup transactions per burst write, which is at least 4us, plus the actual burst time of say 512 bytes at 250MB/s (at best, if you can do sysclk/1 writes), making another ~2us, for a total of ~6us to transfer just 512 bytes. This gives you about 85MB/s, ignoring any idle time or erase time, which will drop it lower.
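For anyone wanting to replay that back-of-the-envelope bound with different burst sizes or clock rates, here is a rough Python sketch. The function name and parameters are made up for illustration; the inputs (4 setup transactions, ~1us each, one byte per sysclk at 250MHz) are just the figures from the post, and the model ignores erase time and any limits inside the HyperFlash itself. With those figures the arithmetic lands near 85MB/s.

```python
def burst_write_rate(burst_bytes, sysclk_hz, setup_transactions, turnaround_s,
                     bytes_per_clock=1.0):
    """Upper-bound write rate for one burst (hypothetical helper).

    Assumes every setup transaction costs one fixed turn-around and that
    the burst itself moves bytes_per_clock bytes on each sysclk.
    Returns (total_seconds, bytes_per_second).
    """
    overhead_s = setup_transactions * turnaround_s
    transfer_s = burst_bytes / (sysclk_hz * bytes_per_clock)
    total_s = overhead_s + transfer_s
    return total_s, burst_bytes / total_s


# Figures from the post: 512-byte burst, 250 MHz P2, 4 setups at ~1 us each.
total_s, rate = burst_write_rate(512, 250e6, 4, 1e-6)
print(f"{total_s * 1e6:.2f} us per burst, {rate / 1e6:.1f} MB/s")

# Smaller bursts are dominated by the fixed setup cost.
for size in (16, 64, 256, 512):
    _, r = burst_write_rate(size, 250e6, 4, 1e-6)
    print(f"{size:4d} bytes -> {r / 1e6:6.1f} MB/s")
```

The fixed ~4us of setup is why short bursts fare so badly in this model: the transfer time shrinks but the overhead does not.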
I guess the erase time must be included to get down to 1MB/s. Not sure what else is slowing it down that much but I haven't tried it yet. Perhaps if the chip is already erased the writes will be faster?
Update: I'm assuming the writes can continue on, but there may be a long delay stopping this if only one write can be in progress and you need to wait for completion before you move onto the next 512 bytes etc. Perhaps that is another reason for such a dismal rate.
Holy moly this explains it.... I looked at this and thought the units must be a mistake. It says it takes 2.9 seconds(!) to erase 256kB and 231 seconds to erase the whole 32MB chip. So S L O W !!! This is basically read-only memory.
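To put those erase times in the same units as the write rates, a quick sketch follows. It only uses the two figures quoted above (2.9 s per 256kB and 231 s for 32MB) and assumes kB and MB mean 1024-based units, which may not match the datasheet's convention.

```python
KIB = 1024
MIB = 1024 * KIB


def erase_rate(num_bytes, seconds):
    """Effective erase throughput in bytes per second."""
    return num_bytes / seconds


# Figures as quoted in the post (assumed to be per-sector and whole-chip times).
sector_rate = erase_rate(256 * KIB, 2.9)
chip_rate = erase_rate(32 * MIB, 231.0)

# How long the whole chip would take if erased one 256 kB sector at a time.
sectors = (32 * MIB) // (256 * KIB)
sector_by_sector_s = sectors * 2.9

print(f"sector erase: {sector_rate / KIB:.1f} kB/s")
print(f"chip erase:   {chip_rate / KIB:.1f} kB/s")
print(f"{sectors} sectors one by one: {sector_by_sector_s:.0f} s")
```

Either way the erase rate is on the order of 100 kB/s, far below any of the write-rate estimates, so erasing dominates whenever it has to be included.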
Using the 75 ohm DAC into 22pF, the rolloff is around 100 MHz, so it's mostly the fundamental acting to affect the phase shift of the HyperRAM clock.
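A quick check of that rolloff figure, treating the 75 ohm DAC and the 22pF cap as a simple first-order RC low-pass. That is an idealisation: it ignores pin, trace and probe capacitance, and the helper names are just for illustration.

```python
import math


def rc_corner_hz(r_ohms, c_farads):
    """-3 dB corner of a first-order RC low-pass."""
    return 1.0 / (2.0 * math.pi * r_ohms * c_farads)


def lowpass_response(f_hz, fc_hz):
    """Gain (linear) and phase (degrees) of a first-order low-pass at f_hz."""
    ratio = f_hz / fc_hz
    gain = 1.0 / math.sqrt(1.0 + ratio * ratio)
    phase_deg = -math.degrees(math.atan(ratio))
    return gain, phase_deg


fc = rc_corner_hz(75.0, 22e-12)
print(f"corner: {fc / 1e6:.1f} MHz")

# Fundamental and odd harmonics of a 100 MHz square wave.
for f in (100e6, 300e6, 500e6):
    gain, phase = lowpass_response(f, fc)
    print(f"{f / 1e6:5.0f} MHz: gain {gain:.2f}, phase {phase:.0f} deg")
```

The corner comes out a little under 100 MHz, and the harmonics are attenuated noticeably more than the fundamental, which fits the point that the fundamental does most of the work.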
You might need to explain that more for me. Here's what I get with three 100 MHz square wave outputs, pushing my 200 MHz scope to its analogue limit, probed at empty accessory headers:
- Pink trace is logic drive (3V3) on P32
- Orange trace is 124 ohm (3V3) BITDAC on P0
- Blue trace is 75 ohm (2V0) BITDAC on P8
I moved the Blue ground position up a little to centre it wrt the other two.
So leave your orange trace P0 alone as a reference to compare the shifts of other waveforms to
Connect your 22pF cap between P8 and P32.
You should now see the P32 (output clock) waveform move its phase depending on the bit dac amplitude you change on P8
What we're doing is like the third line of table two on this page, adding two sines of same frequency but different phase. https://www.dsprelated.com/showarticle/635.php
The amount of shift depends on the amplitudes A and B
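As a rough illustration of that sum, here is a small phasor sketch: A at zero phase plus B at phase alpha gives one sinusoid of the same frequency with a new amplitude and phase. It treats both signals as pure sines, so for the square-ish clocks on the pins it only describes the fundamental, and the function name is made up.

```python
import cmath
import math


def sum_sinusoids(a, b, alpha_deg):
    """Combine A*sin(wt) + B*sin(wt + alpha) into C*sin(wt + phi).

    Returns (C, phi_degrees), computed by adding the two phasors.
    """
    total = cmath.rect(a, 0.0) + cmath.rect(b, math.radians(alpha_deg))
    return abs(total), math.degrees(cmath.phase(total))


# Sweep the second amplitude with a fixed 90 degree offset (assumed value).
for b in (0.0, 0.25, 0.5, 1.0):
    c, phi = sum_sinusoids(1.0, b, 90.0)
    print(f"B={b:.2f}: amplitude {c:.3f}, phase shift {phi:.1f} deg")
```

Note the resultant amplitude grows along with the phase shift, and the shift shrinks toward zero as alpha does, so a second source with little phase offset of its own can't move the clock much.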
Oh, doing some shifting around I realised that where I lay the probe cables makes a big difference to amplitude. I've now laid them side by side as much as possible. Bed time for now.
I wonder if the increased amplitude created by adding this second sinusoid will exceed the maximum rating on the HyperRam clock pin in order for it to generate enough phase shift...we may need to be a little careful there.
From the ISSI HyperRAM data sheet for the IS66WVH16M8BLL under Absolute Maximum Ratings:
"During voltage transitions, inputs or I/Os may overshoot VSS to -1.0V or overshoot to VDD +1.0V, for periods up to 20 ns."
Now the duration will be under 20ns which is only a 50MHz period but will we be above 1V over? If so this is entering the latchup level.
Only half the oscillation period is available per swing/overshoot. So highest frequency that could fit is 25 MHz. 20 ns overshoot is hugely long time for digital edges.
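The half-period reasoning is easy to tabulate. This sketch assumes a symmetric oscillation where each overshoot excursion lasts at most half a period; the function names are illustrative only.

```python
def half_period_ns(freq_hz):
    """Longest a single swing can last for a symmetric oscillation, in ns."""
    return 0.5e9 / freq_hz


def max_freq_for_window_hz(window_ns):
    """Highest frequency whose half period still fills the given window."""
    return 0.5e9 / window_ns


# The data sheet allowance quoted above is 20 ns.
print(f"20 ns window <-> {max_freq_for_window_hz(20.0) / 1e6:.0f} MHz")
for f in (25e6, 50e6, 100e6):
    print(f"{f / 1e6:4.0f} MHz clock: {half_period_ns(f):.0f} ns per swing")
```

So at a 100 MHz clock each swing is about a quarter of the allowed duration, leaving the overshoot magnitude as the thing to watch.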
Amusingly I'm seeing up to 6 volts pk-pk here with the 100 MHz. There's no sign of clamping so I doubt it's reaching the chips though, it'll just be the dangly connections to the scope.
Measurement issues aside, once terminated by the load, hopefully the natural signal attenuation due to voltage division at the clock input pin may help us out a little here. But it probably still needs to be figured out, to keep the overshoot below 1V over VDD. Best to start with low DAC values and work up slowly. The problem with measuring this clock signal with the scope probes evanh is using (make sure to use 1:10 ratio and compensate the probe BTW) is that the capacitance of the probe is already of the same order as the clock pin load, which is 6-9pF according to the data sheet.
First tests don't show any voltage boost. Actually not much effect at all the way I'm looking at it. That's why I decided to move things around to see if other pins acted any different. In hindsight I guess the 22 pF is only a fraction of a nanosecond effect so I probably wasn't looking at it right.
I need to drop back to the slower clock I had for the first 22 pF measuring I did. For visualisation, the 100 MHz is just getting in the way.
EDIT: Here's two 10 MHz square waves. Orange on P18, Blue on P24, HyperRAM accessory removed.
I get the feeling I'm doing it wrong. It seems the lower the DAC amplitude I set the bigger the effect is. Basically, if I clamp it right down to no signal it has the biggest lag effect, ie: like it was when the capacitor was going to ground before.
I wonder if the second output being summed needs a bigger phase shift for anything to be noticeable. This is the "alpha" in the formula that Tubular referenced. Maybe a different capacitance is needed, or something between that second source pin and ground to try to change the phase even more.
Inherently it wouldn't but if there is a RC network between this second pin and the clock pin that differs from the direct connection to the main driver pin, I hope it could add some phase shift. Right now that is just a series 22pF cap but perhaps more is needed there and to possibly also remove the direct connection to the main pin and add a small resistor there...not sure.
What I had before was pretty good I feel. Fortuitously, the 22 pF capacitor to GND was actually making the HR clock signal look closer to the HR data signals in shape.
Maybe something fancier would be in order for closely placed HyperRAM part in a faster board layout.
I made up an LTspice model to see how far the Hyperram clock could be nudged, in theory, without the various real world measurement artifacts that make it hard to observe
To be honest I suspect we're better off just driving the clock signal in 120 ohm bitdac mode, and adjusting the full amplitude (logic high voltage) to change where the clock signal intersects the threshold of the hyperram clock input. This is the red signal in the graph below.
My original suggestion wasn't as effective as I'd hoped, because you're manipulating the clock signal where it's rising slowly (lower gradient) to get the maximum range, and while that's fine on the model, in the real world, noise on that signal is going to wreak havoc with timing.
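To get a feel for the amplitude-adjustment idea and the noise concern, here is a sketch that models the clock edge as a single RC exponential and finds where it crosses the input threshold. All the numbers are assumptions for illustration: 124 ohm source (the BITDAC figure used earlier in the thread), 30pF total load, and a threshold at 1.65V. None of it comes from the LTspice model.

```python
import math


def crossing_time_s(r_ohms, c_farads, v_high, v_th):
    """Time for v(t) = v_high * (1 - exp(-t/RC)) to reach v_th."""
    if v_high <= v_th:
        raise ValueError("edge never reaches the threshold")
    return -r_ohms * c_farads * math.log(1.0 - v_th / v_high)


def slope_at_crossing(r_ohms, c_farads, v_high, v_th):
    """dv/dt at the threshold crossing, in volts per second."""
    return (v_high - v_th) / (r_ohms * c_farads)


R, C, VTH = 124.0, 30e-12, 1.65  # assumed values, not measured
for v_high in (3.3, 2.8, 2.4, 2.0):
    t = crossing_time_s(R, C, v_high, VTH)
    dvdt = slope_at_crossing(R, C, v_high, VTH)
    # Timing error caused by 50 mV of noise at the crossing point.
    jitter_ps = 0.05 / dvdt * 1e12
    print(f"Vhigh {v_high:.1f} V: crosses at {t * 1e9:.2f} ns, "
          f"~{jitter_ps:.0f} ps shift per 50 mV of noise")
```

In this simple model, lowering the logic-high level delays the crossing by a few nanoseconds, but the edge is also shallower where it crosses, so the same noise voltage turns into more timing error.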
However, if we can output a 90 degree phase shifted signal on a second pin, we could do more. I'm not sure whether that is possible, though. Perhaps the transition smartpin mode may help here.
Do they spec the capacitance of the CRO probes?
And here's the scope - https://cdn.tmi.yokogawa.com/IM701610-01E.pdf