Excellent, thanks Von. You have to remember 200 MT/s is the rated speed for these parts at 3v3. And even at 1v8 the rated speed is still only 333 MT/s. So clearing 350 MT/s, without even optimising the board for it, I'd say we're running into severe attenuation with the >360 MT/s failures rather than any timing issue.
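To put those figures side by side, here is a rough sketch of the overclock margin. It uses only the rated speeds and the 350 MT/s result quoted above; the function name is made up for illustration.

```python
# Rated transfer rates quoted above (MT/s) and the rate cleared in testing.
RATED_MT_S = {"3v3": 200, "1v8": 333}
ACHIEVED_MT_S = 350

def overclock_percent(achieved, rated):
    """Percentage by which the achieved rate exceeds the rated one."""
    return (achieved / rated - 1.0) * 100.0

for supply, rated in RATED_MT_S.items():
    print(f"{supply}: {overclock_percent(ACHIEVED_MT_S, rated):.0f}% over rated")
```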
Interesting there was one single error observed at 256MHz on P32 even with the capacitor. Maybe some brief random noise spike?
It'll be related to the poor matching on the Eval Board for that pin group. Reflection interference or something. I get a long string of single/double digit error counts above 300 MHz. The difference could be down to me using a leaded capacitor vs Von's surface mount. Maybe 27 pF would be a safer choice.
Looking to summarise, each test hit the first error at...:
P32, no cap = 30
P32, 22pF = 256 (possible glitch, with the next error coming at 353)
P16, no cap = 89
P16, 22pF = 359
P0, no cap = 110
P0, 22pF = 360
P0, 10pF = 363
P0, 1pF = 207
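For anyone wanting to compare the groups quickly, the list above can be tabulated like this. It is a minimal sketch: the numbers are copied from the summary, and the helper name is only illustrative.

```python
# First-error sysclock (MHz) per pin group, copied from the summary above.
first_error = {
    "P32": {"no cap": 30, "22pF": 256},   # 256 may be a glitch; next error came at 353
    "P16": {"no cap": 89, "22pF": 359},
    "P0":  {"no cap": 110, "22pF": 360, "10pF": 363, "1pF": 207},
}

def gain_over_no_cap(results):
    """MHz gained by each capacitor value relative to the same group with no cap."""
    base = results["no cap"]
    return {cap: mhz - base for cap, mhz in results.items() if cap != "no cap"}

for group, results in first_error.items():
    print(group, gain_over_no_cap(results))
```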
It would be fair to add that without caps, P16 had significantly more errors than P0. However, they both hit the first error at similar points both with and without caps. Also P16 had fairly close matching by accident in the layout, whereas P32 was quite a bit off (which is reflected in the results).
Thus it seems the improvement achieved by impedance matching of traces is not by itself the key to higher speeds, but it helps significantly.
I'm thinking it would be worth trace-matching at least P16 to P31 on the next Eval board, if not P32 to P47 also (if possible). P0 to P15 are already matched for customers who got a RevB board with date-code 1952 or higher, or want to grab one whilst there's only 16 left!
For the HyperRAM accessory I'll make a note to look at the layout around the data and clock traces and see if we can improve anything for a future rev, or maybe add a footprint for a cap on each clock line at least. I'm fairly sure we included some series R on the clock and data? traces... perhaps those values could be adjusted for higher speed gains too. hmm.. better open the layout...
Edit-- ok, series 10R's on the clock and RWDS signals! Yeah, a cap will fit quite nicely next to each of the clock series R's without needing to change anything else that might have other unintended consequences.
Gosh, there's some teeny tiny trimmer cap and res packages available that would fit nicely for experimentation! I'll have a hunt around for a spare module to adjust and scope out.
Does that test code @evanh have an option to run continuously at a certain fixed rate ? (Sorry- I should look really!)
Yep, I didn't make a #define for it but it's as simple as uncommenting line 513 and setting the starting XMUL to what you want.
incmod comp, #compref+9 wc 'number of columns
if_nc jmp #comploop
call #putnl
' jmp #logloop 'uncomment this line to cycle forever at one frequency
cmp xtalmul, #370 wcz
if_b ijnz xtalmul, #logloop 'loop back for next sysclock setting, keep going up until crash
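As a reading aid, the control flow of that fragment can be modelled roughly as below. This is a Python sketch of my reading of the loop, not the test program itself: `sweep`, `hold` and `max_passes` are invented names, and `max_passes` exists only so the hold case terminates in the model.

```python
def sweep(start_xmul, limit=370, hold=False, max_passes=None):
    """Yield the xtalmul setting used for each logging pass.

    hold=False: step up by one per pass until the limit (the default code path).
    hold=True:  stay on one setting, as if the 'jmp #logloop' line were uncommented.
    """
    xmul = start_xmul
    passes = 0
    while max_passes is None or passes < max_passes:
        yield xmul
        passes += 1
        if hold:
            continue           # cycle forever at one frequency
        if xmul >= limit:      # cmp xtalmul, #370: not below, so fall out of the loop
            break
        xmul += 1              # ijnz increments xtalmul and jumps back to logloop

print(list(sweep(368)))
print(list(sweep(300, hold=True, max_passes=3)))
```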
Right now my driver doesn't support sysclk/1 writes (only sysclk/2), though it could be added in time if this capacitor change and signal trace matching works out well. It may add a slight penalty of a couple of instructions to dynamically select it in the write path but I'd think it should be able to fit in the code space.
and about sysclk/1, is that something that the test-code simulates (or could it?)
Not quite sure what you are asking for there. I think evanh's standalone test code is actually using sysclk/1 operation for writes, so no need to simulate, just use it for real.
Yep, with HR_DIV (dmadiv) = 1 and HR_READS not defined it'll burst write at sysclock/1. If you define HR_READS then it'll burst read at sysclock/1 instead.
EDIT: And, yes, as Roger says, it really is writing lots of data to the HyperRAM chip. The high bit error counts reported, ~400,000, is roughly 50% of the data written for each case.
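The 50% figure is what you would expect when a transfer fails completely: unrelated data matches the written data on about half of its bits. A minimal sketch follows, assuming random test data and a 100,000 byte buffer (inferred from ~400,000 being about half the bits written); the real test pattern and size are not stated here.

```python
import random

def bit_errors(written: bytes, read_back: bytes) -> int:
    """Count differing bits between two equal-length buffers."""
    return sum(bin(a ^ b).count("1") for a, b in zip(written, read_back))

rng = random.Random(1)
n_bytes = 100_000   # assumed size, so that 50% of the bits is about 400,000
written = bytes(rng.getrandbits(8) for _ in range(n_bytes))
garbage = bytes(rng.getrandbits(8) for _ in range(n_bytes))  # stand-in for a totally failed transfer

errors = bit_errors(written, garbage)
print(errors, errors / (n_bytes * 8))
```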
Only one column will likely match good results without error. There is one software timing delay value that lines up with getting the correct result (I think that is what evanh is calling "compensations" in the table but I might be wrong...)
For sysclock/1, only a single column can be valid. Doesn't matter which column, I have so many columns just so that I don't have to guess the right compensation as much. For sysclock/2 potentially up to two columns could be valid. Sysclock/3 can achieve three columns, and so on ...
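A toy model of why the divider sets the number of good columns: if each transfer stays valid for `div` sysclock ticks and each column is one tick of compensation, then at most `div` columns can land on valid data. The names and the `first_good` offset are hypothetical, the ten columns are assumed from the `#compref+9` in the snippet, and real signal margins can only reduce the count.

```python
def valid_columns(div, first_good, n_columns=10):
    """Compensation columns (in sysclock ticks) that sample valid data.

    Toy model: data is valid for `div` ticks starting at the hypothetical
    offset `first_good`; real hardware margins may shrink this window.
    """
    return [c for c in range(n_columns) if first_good <= c < first_good + div]

for div in (1, 2, 3):
    print(f"sysclock/{div}:", valid_columns(div, first_good=4))
```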
EDIT: The constant is actually called "dmadiv". It's a legacy of much earlier code that was copying straight between two streamers long before I had my hands on any HyperRAM parts.
Lol, you asked! A custom layout with dedicated prop2 pins for a single HR chip. No long routes beyond the HR so that best speed can be achieved. Particularly interested in how that would improve usable frequency bands of reading data out of the HyperRAM chip.
And obviously it would make sense to test out a board fitted with new 3V v2.0 HyperRAMs as soon as those chips become available - must be soon. We could see how well they perform given they are not going to be overclocked when running a P2 up to 333MHz.
Oh, and certainly have no use for the /RESET pin on a HyperRAM. Rescue an I/O pin by tying /RESET high. If the RAM cells are corrupt on power up, it's no biggie. DRAM isn't expected to be coherent after a power down. EDIT: Just attach it to a small capacitor for an automatic power-up reset feature. EDIT2: Err, had it right the first time. There is already a built-in power-on-reset.
RWDS I guess should be left connected for those single byte writes ... just seems overkill throwing a whole I/O pin at it is all. I'd try leaving it out if I did my own layout.
It'd be cool to merge CLK and /CS to rescue another I/O pin. It'd be tricky though, CLK is required to idle low and /CS needs to return high while CLK is idling.
@evanh Are there any speed gains to be had though ? If your test shows 0 errors up to 350 MHz already, then it seems about as fast as P2 limits already?
@rogloh Not seeing them just yet, and I suppose they'd need a dedicated LDO for 3V- not a simple swap out on the existing PCB. (Which is a shame, as I'd gladly swap out a couple HR chips). Poop!
Read data is not so friendly as writes. There are a lot of narrow bands that work and don't work. And the 22 pF capacitor makes them a little narrower!
Some band overlap is achieved with combinations of registered and unregistered pins but it's pretty hairy, especially if taking temperature effect into account.
Noted on RESET, good point.
I suppose one issue with pin sharing those others, is that the traces/circuits to achieve that will add the very crud one seeks to avoid. RWDS might be tempting to leave off if multi-writes are fast enough, but we need minimum 10 pins anyway, so already into a second bank of 8. (Although I get what you are saying for a custom project).
I think the acceptable range of the VCC supply was up to 3.6V on the data sheet for 2.0 HyperRAM here:
S70KL1282/S70KS1282
https://www.cypress.com/file/501841/download
Yeah, 11 pins is the sensible answer for first shot at this.
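The pin arithmetic behind the 10 and 11 figures, as a quick sketch. The signal breakdown is my reading of the discussion (8 data pins plus CLK and /CS as the minimum, RWDS and /RESET optional), and the helper names are illustrative only.

```python
import math

# Assumed breakdown: 8 data pins plus one pin each for the control signals.
SIGNALS = {"DQ0-DQ7": 8, "CLK": 1, "/CS": 1, "RWDS": 1, "/RESET": 1}

def pins_needed(drop=()):
    """Total I/O pins after leaving out the named signals."""
    return sum(n for name, n in SIGNALS.items() if name not in drop)

def banks_of_8(pins):
    """How many 8-pin banks that many pins occupy."""
    return math.ceil(pins / 8)

for drop in ((), ("/RESET",), ("/RESET", "RWDS")):
    pins = pins_needed(drop)
    print(f"dropping {drop or 'nothing'}: {pins} pins, {banks_of_8(pins)} banks")
```

Even the 10-pin minimum spills into a second bank of 8, which is the point made above about RWDS costing little on the accessory board.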
My current driver supporting byte granular memory writes requires and uses the RWDS pin. I think you'd want to keep that routed to the memory chips on the plug in boards for P2-EVAL use. Other pin constrained setups could potentially drop it and sacrifice byte writes and try to create their own custom drivers I guess. But I'd route it. Reset is probably less important and my driver will optionally pulse it if specified, though it doesn't require it. In a dedicated setup you'd probably just connect it into the system master reset signal.
For the accessory boards we would keep both. They are all about evaluation and experimenting with all the features. There may have only been a choice to make if dropping RESET would allow a single edge-header accessory, but that's not going to happen
Note that when it comes to using RWDS as the write mask at high speed the signal performance of RWDS has to be identical to the data pins, ie: It's effectively a 9th data bit then.
EDIT: That said, on the prop2, I doubt it'll get used for more than a single byte at a time. So pointless needing it to electrically perform when it's faster to just bit-bash the one byte and leave the performance for complete bursts without RWDS.
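To illustrate what the write mask buys you, here is a small behavioural model of a masked write. It assumes, as I understand the HyperBus spec, that RWDS high during write data means the byte is masked and left unchanged; the function and variable names are invented for the example and have nothing to do with the actual driver.

```python
def masked_write(memory: bytearray, addr: int, data: bytes, rwds_high: list) -> None:
    """Write `data` at `addr`, skipping every byte whose RWDS flag is high (masked)."""
    for i, (value, masked) in enumerate(zip(data, rwds_high)):
        if not masked:
            memory[addr + i] = value

# Single-byte write: a two-byte word is sent, with the second byte masked off.
mem = bytearray(b"\x11\x22\x33\x44")
masked_write(mem, 2, b"\xAA\xBB", [False, True])
print(mem.hex())
```

Without RWDS every byte in the burst is written, so a single-byte update would need a read-modify-write of the whole word instead.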
Ah, ok that would be interesting to run them at 3.3V and see where they top-out. Those parts are available.... hmm, I'd better not distract myself on that until the Evals are in production, but I've just added S70KL1282 to a samples order. (I'll check later if that's the best available choice)
Edit: Excitement got the better of me! Not actually available, I was reading the "minimum qty" column !
line 513 got it!
Is changing HR_DIV to 2 or 3 the right way to test at sysclk/2 or 3 ?
I'm expecting trimmer deliveries on Tuesday, and will post results soon after.
If there's anything else you think of which you'd like testing just shout!
I think the acceptable range of the VCC supply was up to 3.6V on the data sheet for 2.0 HyperRAM here:
S70KL1282/S70KS1282
https://www.cypress.com/file/501841/download