@dgately They are all valid test options, so you could test both your Flash and HyperRAM at both the sysclk/1 and sysclk/2 read speeds using the full delay range, and check that the default driver delay (in parentheses) works with 100% success over the frequency range from 50-350MHz. The unregistered clock option is really just experimental, to see if the range is extended/reduced; the (current) default is registered unless overridden at startup.
So good test options would be:
50-350MHz
1, 0, 0, 0 - for fast sysclk/1 + RAM (with a registered clk pin)
50-350MHz
1, 0, 0, 1 - for fast sysclk/1 + Flash (with a registered clk pin)
50-350MHz
0, 0, 0, 0 - for sysclk/2 + RAM (with a registered clk pin)
50-350MHz
0, 0, 0, 1 - for sysclk/2 + Flash (with a registered clk pin)
You could then do the same with an unregistered clock pin if you felt like it to see if choosing that improves/degrades the range and the overlaps. From memory I think it typically helps increase overlap slightly but reduces the top end range.
@evanh, if you get a chance and the inclination, would you like to run this test on your HyperRAM/HyperFlash setup too, to see if the default delay values work out for your HW? Be aware that selecting the HyperFlash test erases the last sector, in case you have some valid data in there that you need to keep.
Here's a simple demo binary to show a COG drawing dynamically into a HyperRAM frame buffer while the video COG reads it out at the same time, changing the display resolutions / P2 frequencies on the fly with the HyperRAM driver adjusting its timing automatically to match the correct delay.
Cycles through VGA, SVGA, XGA, SXGA, FULLHD video in the different colour depths (except 16 & 24bpp to avoid overloading with the higher resolutions - I should try to add it back for the lower resolutions that support that).
You need to fit a HyperRAM expansion module on pins 32-47 and your VGA board on pins 0-7 of the P2-EVAL.
This demo will operate the P2 at clocks ranging from 195MHz - 297MHz.
Ah, your randomiser isn't correct. You're overwriting the state variable with the generator's output. It'll be random still but likely have reduced state space and probably not distributed evenly either. Here's the fixed version:
PUB randomizeBuf(seed) : r | i
  repeat i from 0 to BUFSIZE-1
    long[@writebuffer][i] := ??seed
  return seed
It's similar to how "z := x++" updates x as well as z.
Yeah, did that but the results suggest you're doing two passes with registered/unregistered pins for each line. Sysclock/1 shouldn't be able to ever produce 100% score for two compensations.
EDIT: Actually, it looks like the code uses the same timing parameters for both the read and write passes for the same test. That's probably not ideal for separating read limitations from write limitations.
My 4 bit "delay" value's LSB is actually the data pins being registered or not. The actual clock delay cycles in the waitx code ultimately use the value without the LSB (i.e. delay>>1, shift operation not shown). So you are sort of correct about the table values. Don't worry that the number presented doesn't match your own interpretation of delay.
    wypin   clks, clkpin            'setup number of transfer clocks
    wrpin   regdatabus, datapins    'setup data bus inputs as registered or not
    waitx   delay                   'tuning delay for input data reading
    xinit   xrecv, #0               'start data transfer and then jump to setup code
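To make the split of the 4-bit value concrete, here is a minimal host-side sketch in Python (not driver code). The function name `split_delay` is hypothetical, and the post does not say which LSB state means "registered", so the bit is returned raw rather than interpreted.

```python
def split_delay(delay):
    """Split a 4-bit delay value into (waitx_cycles, data_pin_flag).

    Assumption from the post: the LSB selects registered/unregistered data
    pins and the remaining bits (delay >> 1) are the waitx delay cycles.
    The polarity of the LSB is not stated in the thread, so it is returned
    as a raw 0/1 flag.
    """
    if not 0 <= delay <= 15:
        raise ValueError("delay must be a 4-bit value (0..15)")
    waitx_cycles = delay >> 1   # value handed to waitx, LSB dropped
    data_pin_flag = delay & 1   # registered/unregistered selector (polarity unknown)
    return waitx_cycles, data_pin_flag
```

So two adjacent table values such as 6 and 7 share the same waitx count and differ only in the data pin setting.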
Yeah I guess in theory I should do the RAM write just once like I do with flash. Then read with different delays. Write doesn't use the clock delay, but it could vary slightly with the registered/unregistered bus settings. It is always done at sysclk/2 however and the clock is centered in the middle of the data (I've checked this one on the scope - so it should remain reliable).
Update: scratch that. Only reads ever change registered/unregistered data pins and they restore the setting at the end of the read. Writes keep this setting static.
Yeah, that's reliable. Writes at sysclock/2 are a very stable config. They don't have that stretching effect on the latency that the reads have. Attenuation is the only limit for writes.
Somewhat interested to know how fast your HyperFlash can read out and whether the automatic clock delay breakpoints I have put in work out (at least for room temp). @dgately 's RAM results seemed good. My own flash seems to still read back ok at 350MHz with the sysclk/1 rate. It'll probably go higher but I don't want to test that.
Here's a couple of runs with same options but with different hardware. One is my RevB globtop with the HR board that has a 22 pF capacitor added on the HR clock pin of the accessory header. The other is the RevB finished package without any capacitor added.
Thanks for that. Yeah, your 22pF mod has shifted things about, so you would need a tweak to the timing profile if you wanted to use that HW. The setup without the cap looks okay except around the 233MHz crossing (that seems to be a problematic region in general).
Just realised something cool about my driver. Because I have up to 16 banks per bus, and it is unlikely that we will have over 8 physical devices on the same bus, for systems with less than 128MB fitted I can actually use a second bank to map to the same memory as the normal bank does, but give it different delays (delays are stored on a per bank basis). This means a background process in theory could be reading/writing to a spare area of HyperRAM, testing out which of the two neighbouring delay values is better to use. This can happen without affecting the current setting on the normally read bank. Once you know the best new value to use you can change the normal bank too. This might let us track changes with temperature perhaps...
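As a rough illustration of that background-tracking idea, here is a host-side Python sketch, not driver code. Everything in it is hypothetical: `pick_better_delay` and the `count_errors` callback (which would stand for a test read/write pass through the spare bank at a given delay) are made-up names, and the 0..15 limits assume the 4-bit delay value.

```python
def pick_better_delay(current, count_errors, lowest=0, highest=15):
    """Return whichever of current-1, current, current+1 gives the fewest errors.

    count_errors(delay) is a hypothetical callback that would run a test
    transfer through the spare (aliased) bank using that delay and return
    the number of mismatches seen. Ties are resolved in favour of the
    current delay so the setting does not wander needlessly.
    """
    candidates = [d for d in (current - 1, current, current + 1)
                  if lowest <= d <= highest]
    errors = {d: count_errors(d) for d in candidates}
    # Sort by error count first, then prefer the current value on a tie.
    return min(candidates, key=lambda d: (errors[d], d != current))
```

The result would only be applied to the normally used bank once the spare bank shows a neighbouring delay is clearly better.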
Yes if the temperature varies we already know things can change. I'm sort of happy that with three different boards there isn't a huge amount of timing difference seen so far for room temp operation, though for other non P2-EVAL setups and other HyperRAM devices it could easily vary more of course. Probably we can hope to avoid the frequency regions where the timing overlaps. HyperFlash seems like it has a lot more overlap in general.
Attached: zipped file of tests for registered and unregistered clock pins... I think, nothing unexpected here!
Thanks dgately. Looks to me like the registered clock setting is the way to go, so keeping that as the default is what I will do. It increases the top end range too to cover the 297MHz operation.
In your setup with registered clocks, 88-95MHz, 230-235MHz and 275-280MHz are probably good frequency ranges to try to avoid if you can for fastest sysclk/1 HyperRAM performance (at least at the temperature you tested), or you could tweak the timing profile further to improve it in your setup. HyperFlash looks great over the full range at sysclk/1 speeds.
It will be easy to tweak. The attached driver code snippet below shows what I did to set up the default profiles, and you can still create your own and apply them after driver startup. I may end up creating another variant for sysclk/2 vs sysclk/1 operation too.
In the future I originally thought it may be possible to interpolate between different profiles based on a measured temperature, though this is problematic because it relies on some extra HW ability to monitor temperature and have some background thread/COG doing it (or periodically delay requests if the driver does it - not ideal). An alternative would be something that periodically reads/writes into a small reserved portion of RAM that has been mapped using a secondary bank and this operation tracks the read errors with both neighbouring delay values and finds the best delay automatically for the given operating frequency (no temp sensor needed). All TBD but I think for now we keep things as is. The unstable frequency ranges are thankfully still pretty small and could be possible to avoid anyway.
'Default delay profiles used for HyperFlash and HyperRAM on P2-EVAL HyperRAM breakout board
'operating at room temp. This can be tweaked or others added for different temperatures.
'These delay profiles can be assigned to each configured device at address mapping time.
'The actual operating input delay can also be adjusted on the fly per bank if the variation
'of delay with temperature is already determined and the temperature is known/measurable.
HyperRamDelays long 6,92_000000,135_000000,188_000000,234_000000,280_000000,0
HyperRamDelaysUnreg long 6,88_000000,120_000000,180_000000,225_000000,270_000000,0
HyperFlashDelays long 5,70_000000,110_000000,160_000000,225_000000,277_000000,320_000000,0
HyperFlashDelaysUnreg long 5,70_000000,105_000000,150_000000,210_000000,260_000000,315_000000,0
'The profile format begins with the initial delay value, followed by frequencies at which the
'delay is sequentially increased until either it falls below the next frequency, or the list
'terminates with a zero. Frequencies must be stored in increasing order.
' e.g. using HyperRam data above for unregistered clock option
' if 0 <= freq < 88000000 Hz, the delay compensation value is 6,
' if 88000000 <= freq < 120000000 Hz, the delay compensation value is 7,
' if 120000000 <= freq < 180000000 Hz, the delay compensation value is 8,
' ...etc...
' if 270000000 <= freq , the delay compensation value is 11
'
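The lookup described in those comments can be sketched on a PC in Python; this is an illustration of the profile format only, not the driver's own implementation. The function name `delay_for_frequency` and the Python constant name are made up for the example; the numbers are copied from the HyperRamDelaysUnreg line above.

```python
def delay_for_frequency(profile, freq_hz):
    """Return the delay compensation value for freq_hz.

    profile is [initial_delay, f1, f2, ..., 0]: the frequencies are in
    increasing order and the list is terminated by a zero. The delay starts
    at initial_delay and is bumped by one for every listed frequency that
    freq_hz has reached.
    """
    delay = profile[0]
    for threshold in profile[1:]:
        if threshold == 0 or freq_hz < threshold:
            break               # end of list, or still below the next breakpoint
        delay += 1
    return delay


# Values copied from the HyperRamDelaysUnreg profile in the post.
HYPER_RAM_DELAYS_UNREG = [6, 88_000_000, 120_000_000, 180_000_000,
                          225_000_000, 270_000_000, 0]
```

For example, `delay_for_frequency(HYPER_RAM_DELAYS_UNREG, 200_000_000)` walks past the 88, 120 and 180 MHz breakpoints and stops at 225 MHz, which matches the 180-225 MHz step in the worked example.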
Huh, looking back at my old log files I see I started investigating using a 10 pF cap in place of the 22 pF. It looked like it was okay. I should do some more testing of that ...
EDIT: Oh, I also discovered that P32 as the base pin on the RevB Eval Board sucks. Even with the 22 pF cap, writes at sysclock/1 still can fail at certain bands, namely around 120 MHz and 240 MHz. Best not to use P32 unless you have the RevC Eval Board.
Here are the sysclock/1 burst write runs for P0, P16 and P32 as the base pin without any clock capacitor. Compensation 4 shows a nice curve of worse and better frequencies. Lower is better. P0 and P16 are pretty similar. P32 is much worse overall.
I've included the run for P32 with 22 pF clock cap also. At 256 MHz it has a single bit error. If I remember correctly, all attempts of that config had at least one error around there.
PS: I think the B2 naming means I used the Eval Board with the finished packaged chip for these tests. For most prior testing I used the Eval Board with the globtop chip.
Ok, the wiring to P32 may be less than ideal for sysclk/1 writes which I don't support in the driver at this point. I think it mostly works okay for sysclk/2 writes and sysclk/1 reads at least.
Supporting sysclk/1 writes in the future will need to somehow free some LUTRAM space to fit it in. Recently I shuffled things about and I think I now have 7 LUTRAM longs free, and 2 COGRAM longs free so hopefully that may help.
It may still need a sysclk/2 transfer portion at the start for the address phase so I can still share it with my existing instruction sequence. Any sysclk/1 operation also has to be disabled for fills, because I need the rep loop below to fill HyperRAM with arbitrary byte/word/long patterns and this takes 2 clocks per xcont transferring a byte. This means that any future sysclk/1 support could only help the burst writes and copies in my driver, not individual writes or fills.
rep #1, pb 'repeat transfers if filling
xcont xsend, hubdata 'stream the immediate/hub fifo data
I haven't tried a lot of sysclock/2 testing. I should see how bullet proof it really is at burst writes. On that note, I just realised the new mini toaster oven I got for de-soldering is pretty good for cheaply raising the ambient temperature when testing all this gear.
LOL, all the interesting uses of a toaster oven these days.
Maybe for cooling you could put the thing in a fridge to cool it down to a few degrees C and have some USB wires coming out for logging the result and current temperature if you have a thermocouple/sensor attached.
Ouch, my glob-top is failing even more. I can't use P0 as the base pin on it any longer. EDIT: It's probably the first time I've tried to use P0 for this since I roasted that board. It could be a de-soldered pin in that group.
I have used chiller ice packs to go as low as -10 °C. Of course I have to be organised to pack and time them right for best outcomes. For example, to go below -5 °C I have to also pre-chill the board in the freezer before transporting everything to the project room. I have a small soft chiller bag to pack them in tight.
Your HyperFlash run looks great too.