If it is true that I don't have to drive RWDS low first before high after the address phase..
If that means after the float time, you could always use a modest pull down, so the pin idles low, if that buys some sysclks ?
Possibly helpful if the existing approach doesn't work out for some reason, though the Parallax board does not have this resistor - I guess it could be soldered on.
Yanomani has suggested using the pin mode to select an internal pulldown, that should avoid PCB changes ?
The only thing I can see that might be of concern is where the P2 enables its output beginning halfway between clocks 3 and 4, as you can see where we come out of tri-state in the waveform below (yellow=RWDS, cyan=CLK). I can't easily delay this output further. If the HyperRAM has not shut off by then there will be contention.
The OctalRAM spec gives this, from the 6th clock edge: tDQSCK (clock transition to DQSM valid) 2~7 ns,
so it looks like, on a typical basis, you are just ok. Any contention will be timing-skew short, single digit ns ?
From the main memory array access time standpoint, I believe it'll be ok doing some overclocking, but from a DQ[7:0] and RWDS (if used as a write/no-write discriminator) standpoint I believe some concerns will arise.
First, VccQ should be maintained at 3.0 V, as spec'd, and even taking that precaution, noise and crosstalk on the interface lines will start showing all their deleterious potential, and one would face the problem of having to switch to stripline routing, abandoning the microstrip model. More GND planes, more layers, and cost.
Can be done, but hardly at justifiable cost. Better to increase the bus width to 16, 24, or even 32 bits, with sound results in throughput, at manageable cost.
P.S. With a 12-layered pcb, almost every P2 dream comes true, but the costs... Simply thrilling
A cascade of FET switches (or level translators used as switches) can be used as a bucket brigade, by feeding them 3.3 V on both sides when they have dual power supplies.
Each stage will introduce <300 ps of delay, but the numbers are not completely deterministic, though stable within their operational range.
A simple 8-bit latch can bring a thermometer-like control to the enable pins of the FET gates, so the amount of delay can be dynamically managed and exercised.
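To make the thermometer idea concrete, here is a minimal Python sketch of the latch values and the resulting upper bound on delay. It assumes one latch bit per FET stage and uses the <300 ps per-stage figure quoted above as a ceiling; the function names are made up for illustration and nothing here models a particular switch part.

```python
def thermometer_code(stages_enabled: int, width: int = 8) -> int:
    """Latch value with the lowest `stages_enabled` bits set (thermometer code)."""
    if not 0 <= stages_enabled <= width:
        raise ValueError("stages_enabled must be between 0 and width")
    return (1 << stages_enabled) - 1


def max_delay_ps(stages_enabled: int, per_stage_max_ps: float = 300.0) -> float:
    """Upper bound on added delay, assuming each enabled stage adds < per_stage_max_ps."""
    return stages_enabled * per_stage_max_ps


if __name__ == "__main__":
    # Walk the 8-bit latch from no stages to all eight stages enabled.
    for n in range(9):
        print(f"{n} stages: latch={thermometer_code(n):08b}, "
              f"delay < {max_delay_ps(n):.0f} ps")
```

Since the per-stage delay is not fully deterministic, the driver would still have to sweep the latch value and test, rather than trust the computed bound.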
Following that line of thinking I decided to have a look for fast gates. The TinyLogic range fits. NC7SV34 looks good - https://www.onsemi.com/products/standard-logic/buffers/nc7sv34
There are faster gates but I'm thinking 1.5 ns is right for HyperBus v1.0.
Put that on the accessory board for HR clock only. Then run all prop2 pins as registered.
PPS: I guess a small capacitor is about as good. I did try a 33 pF ceramic that was lying around, and it worked, although I have no idea the best value to use at this stage.
Truly better, including the fact that it tops out at 2.8 ns, over the same temperature and voltage range as the other one.
The low cost, tiny size and minimum delay of 0.5 ns make me wonder if it's not advisable to have at least 2 or even 3 of them, in a cascade.
A solder jumper can be used to select how much delay is needed, to adjust for Hyper parts variations.
Also, one could leverage the use of an inverted CK, under software control at the smart pin level.
Being able to adjust the phase of CK between 90° and 270° with respect to the start of the data bits can be a sure recipe to succeed in that kind of application.
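As a rough illustration of how a buffer cascade plus an optional CK inversion maps to clock phase, the Python sketch below converts a delay in ns to degrees of the CK period. The 0.5 ns and 2.8 ns per-gate limits are the figures mentioned above; the 100 MHz CK frequency and the function name are assumptions picked only for the example.

```python
def ck_phase_deg(delay_ns: float, ck_mhz: float, inverted: bool = False) -> float:
    """Phase lag of CK in degrees of one CK period, for a given delay.

    An inverted CK is treated as an extra 180 degrees of lag.
    """
    phase = delay_ns * ck_mhz * 1e-3 * 360.0  # ns * MHz * 1e-3 = fraction of a period
    if inverted:
        phase += 180.0
    return phase % 360.0


if __name__ == "__main__":
    CK_MHZ = 100.0                      # hypothetical HyperRAM clock frequency
    GATE_MIN_NS, GATE_MAX_NS = 0.5, 2.8  # per-gate delay limits quoted in the thread
    for gates in (1, 2, 3):
        lo = ck_phase_deg(gates * GATE_MIN_NS, CK_MHZ)
        hi = ck_phase_deg(gates * GATE_MAX_NS, CK_MHZ)
        print(f"{gates} gate(s): {lo:.1f} to {hi:.1f} degrees at {CK_MHZ:.0f} MHz CK")
```

The wide spread between the minimum and maximum figures is the reason a jumper or software selection of the number of stages looks attractive.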
Good call. I keep forgetting that the P2 can do internal pullups/pulldowns etc.
There is a 10ohm resistor on the Parallax HyperRAM board that can help if there is transient contention. Also if we apply the same initial level based on the latency at the point when the signal comes on from the P2, we may avoid some of the actual contention as we will be driving close to the same output level as the HyperRAM did before it switches off, though this is a transient event too. Maybe we can use pullup/pulldown for high/low pin levels instead of full GPIO I/O power.
Both CK and RWDS park at LOW level nicely, so FLTL solves them. DQs can be whichever you choose. More options, more simplicity.
P.S. On second thought, to avoid any possibility of undesired (though minimum) current flow at the Hypers DQ/RWDS I/O structure, IMHO, both should be parked at the same logic levels.
Yes the idle case for RWDS is good being low. Also if we use any pullups/pulldowns instead of full strength P2 pin outputs the transitions may not be fast enough for driving at the highest P2 clock rates unless they are rather low resistances and input capacitances. We don't want to introduce extra skew for the RWDS timing.
Any solution that can bring those 9 signals in unison, forward and backward, between the P2 and the Hypers can be considered a good one. P.S. (DQs + RWDS)...
A rattle drum, knocking at anyone's door (ears) is the last thing a good (aka sound) interface needs to have.
Just done a few more tests of sysclock/1 using a couple more small capacitors. Results are pointing to some real gains to be made by designing a circuit board around a HyperRAM + Prop2 with a low attenuation layout, i.e. tightly arranged together.
I'm able to, right now, at 3.3 Volts, consistently burst write data into the hyperram at up to 350 MT/s. That's exceeding even the 1.8 V rating!
EDIT: I've reduced the capacitor value from 33 pF to 22 pF to get this fast. At 33 pF the sysclock limit was 230 MHz instead. But for 22 pF to work I also had to use an unregistered pin for the HR clock. Otherwise it wasn't enough phase shift on the clock signal.
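For anyone wanting to relate these figures, here is a small Python sketch of the arithmetic. It assumes that sysclock/1 means one data transfer per sysclock (so 350 MT/s corresponds to a 350 MHz sysclock), that the HyperRAM clock runs at half the transfer rate because the bus is DDR, and that each transfer moves one byte per 8 bus bits. The function names are illustrative only.

```python
def transfer_rate_mts(sysclock_mhz: float, divider: int = 1) -> float:
    """Transfers per microsecond when one transfer happens every `divider` sysclocks."""
    return sysclock_mhz / divider


def hyper_clock_mhz(rate_mts: float) -> float:
    """DDR bus: two transfers per clock period, so CK runs at half the transfer rate."""
    return rate_mts / 2.0


def throughput_mbytes_per_s(rate_mts: float, bus_bits: int = 8) -> float:
    """Peak burst throughput in MB/s for a given bus width."""
    return rate_mts * bus_bits / 8.0


if __name__ == "__main__":
    for sysclk, divider in ((350.0, 1), (230.0, 1), (350.0, 2)):
        rate = transfer_rate_mts(sysclk, divider)
        print(f"sysclock {sysclk:.0f} MHz, sysclock/{divider}: {rate:.0f} MT/s, "
              f"CK {hyper_clock_mhz(rate):.1f} MHz, "
              f"{throughput_mbytes_per_s(rate):.0f} MB/s on 8 bits")
```

These are peak burst numbers only; command, latency and refresh overheads are not included.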
One question on my mind is the use of the capacitor for lagging the clock signal. When designing a circuit board, I don't have the confidence to say it's the best option - compared to one of those fast buffers say.
There's no way I can measure the timings sadly. It would need an active probe and a faster scope than I have. Anyone else feel they've got the skills to characterise it further? I can help by cleaning up and documenting/explaining this testing code I've cobbled together. Err, scratch that, I don't have to be measuring it at 350 MT/s to see the lagging delay.
Wow that's pretty amazing. I am only doing sysclk/2 for writes in my driver right now and the clock is nicely centered in the middle of the data. Perhaps it could be done higher with a capacitor, though it does seem a bit board dependent. Something to think about later...
Yes, aiming for it to have clear guidelines for board layout.
It'll be the only reliable way. Notably, more so for reading data at speed. At the moment, reads at sysclock/1 have that crazy lagging slope that comes and goes at different frequencies.
Ha, it's notable that the data trace has a natural slope shallower than the clock trace. It's particularly true once the accessory board is attached. And adding the capacitor brings the clock slope parallel to the data slope.
Since my 1990-ish, 40 MHz oscilloscope perished long ago, flooded by Rio's moist and salty atmosphere, every interface signal waveform I can lay my eyes on acts like refreshing eye drops.
In the earlier images, are you showing us data bit#1 (at the accessory header, P17) and HyperCK waveforms for a first-byte write, within some generic word write cycle?
Am I understanding it right, or not?
P.S. Also, in case of writes, the more one can bring HyperCK leading and trailing edges centered with respect to databits, the better.
... Perhaps it could be done higher with a capacitor, though it does seem a bit board dependent. Something to think about later...
The new ISSI OctalRAMs have a pattern-read mode (ROM), which looks like it is intended to be used for PCB tuning, or driver-delay adjustments - a nice idea.
For repeating signal (so the scope can reconstruct waveform), I ended up writing a custom program that simply drives the two pins toggling every sysclock in unison. It is a bit of a cut'n'paste.
Wasn't really my intent to make a capacitor the ultimate solution.
Why not ?
It may have less variation than other solutions, which have large max-min spreads.
A cap ( and maybe some modest Series R) uses the P2 driver and PCB layout effects to move the edge.
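As a very rough first-order feel for how a cap plus series R moves the edge, the Python sketch below treats the clock pin as a simple RC low-pass and computes the time to reach a switching threshold. The 50 ohm source resistance is purely a placeholder assumption (the real P2 driver impedance, pin capacitance and trace effects are not modelled), so treat the numbers as order-of-magnitude only.

```python
import math


def rc_delay_ns(r_ohms: float, c_pf: float, threshold: float = 0.5) -> float:
    """Time in ns for an RC-filtered step to reach `threshold` of its final value.

    t = -R * C * ln(1 - threshold); with threshold = 0.5 this is R * C * ln(2).
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must be between 0 and 1 (exclusive)")
    tau_ns = r_ohms * c_pf * 1e-3  # ohms * pF = 1e-12 s = 1e-3 ns
    return -tau_ns * math.log(1.0 - threshold)


if __name__ == "__main__":
    R_OHMS = 50.0  # hypothetical driver + series resistance, not a measured value
    for c_pf in (22.0, 33.0):
        print(f"{c_pf:.0f} pF with {R_OHMS:.0f} ohm: "
              f"about {rc_delay_ns(R_OHMS, c_pf):.2f} ns to the 50% point")
```

A scope measurement of the actual lag is still the only way to know what a given board does.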
Ah, ok! Now I understand why these old eyes of mine complained; seasoned brain too...
Even if the Hyper had a super-hyper PLL (which it doesn't) closely tracking HyperCK, it would be almost impossible for it to deliver DQ just in time during Read cycles, as suggested by the images. So I've ruled out Read cycles.
Else, if it was RWDS, paired with any DQ, well... Excellent layout and decoupling could perhaps explain why they appear to track that closely. But you stated that they were CK and one of the DQs.
Finally, even if it was a Write cycle, HyperCK would be wrong, due to the absence of phase lag.
@evanh, I noticed you had other pin drive options like 1.5k ohm, bitDAC etc in your test code. So did you find any benefits with enabling other output options on the shape/timing of the delayed clock waveform etc or are you still looking into it? It would be great to find something that is sort of "universally" useful to apply here, if there is such an option.
From months back. Those all were terrible outcomes but I've just left them there as reminders.
Those charts Brian made were very useful. It appears there are a couple of frequency ranges where the timing differences between registered vs non-registered IO behaviour don't overlap very much. Around 88-92 MHz and 228-232 MHz look tight and could be frequency bands to avoid with HyperRAM if there is a choice. How constant these charts are over temp/voltage/layout etc I don't know. But the good thing is the range 228-280 MHz will cover the 252 MHz/270 MHz TMDS operation and the next range up from 268-308 MHz will cover the handy frequency of 297 MHz, which are all useful for video. Another sweet spot for video at 200 MHz looks good too.
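A driver could guard against the tight spots with a simple lookup. The Python sketch below checks a candidate sysclock against the two bands named above; the band edges are taken straight from this post, they are not characterised over temperature, voltage or layout, and the names are made up for the example.

```python
# Bands (in MHz) where registered vs non-registered timing looked tight in the charts.
TIGHT_BANDS_MHZ = [(88.0, 92.0), (228.0, 232.0)]


def in_tight_band(sysclock_mhz: float) -> bool:
    """True if the sysclock falls inside one of the bands suggested to avoid."""
    return any(lo <= sysclock_mhz <= hi for lo, hi in TIGHT_BANDS_MHZ)


if __name__ == "__main__":
    # Video-friendly frequencies mentioned in the thread, plus one in a tight band.
    for f in (200.0, 230.0, 252.0, 270.0, 297.0):
        verdict = "avoid if possible" if in_tight_band(f) else "looks ok"
        print(f"{f:.0f} MHz: {verdict}")
```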
Comments
PS: NC7SV125 is rated for 1.0 ns, if you think that's preferable - https://www.onsemi.com/products/standard-logic/buffers/nc7sv125
- Orange traces are hyper clock pin at the accessory header, P24
- Blue traces are data bit#1 at the accessory header, P17
Data and Clock pins both registered, no accessory board:
Data and Clock pins both registered, unmodified accessory board:
Data and Clock pins both registered, 22 pF capacitor attached to hyper clock at accessory header, P24:
Clock pin unregistered, no accessory board:
Clock pin unregistered, unmodified accessory board:
Clock pin unregistered, 22 pF capacitor attached to hyper clock at accessory header, P24:
This last one has sufficient clock lag to provide the required data setup timing.
How stable is that over temperature ?
This is with both pins registered:
And this is with hyper clock pin unregistered:
A misunderstanding on my part generated the question.
Now, the exercise is clear to me. Thanks a lot!
I'm hoping that'll improve a lot with close layout between the Prop2 and HyperRAM on one specialised circuit board.