HyperRAM driver for P2


Comments

  • evanh Posts: 9,827
    edited 2020-04-06 - 08:38:12
    That was all found through trial and error with random data transferred in large blocks with every byte verified. Only after all that did I attach a scope to it and see what the HR write phasing looked like coming out of the prop2:
    - Orange trace is hyper clock pin at the accessory header, P24
    - Blue trace is data bit#1 at the accessory header, P17
    dat-reg_clk-unreg_22pF.PNG


    EDIT: So I've loaded about 1 ns lag on the HR clock signal with the combined difference from unregistered HR clock and registered HR data plus the 22 pF capacitor. This relies on registered HR data pins, so they'd have to be switched back and forth when reads want to be unregistered data.

    EDIT2: The added lag is actually a little greater than 1 ns, because the HR accessory board loads the HR data pins more than the HR clock pin for some reason. Further reading - https://forums.parallax.com/discussion/comment/1490296/#Comment_1490296
  • Am sort of hoping we can find a reasonable set of values that overlap across the frequency range with the Parallax HyperRAM board and no capacitor fitted, at room temperature, which will be the default for people who don't rework the board. I also hope the variation between pin groups such as P0-P15, P16-P31 and P32-P47 won't affect things too much either. In any case these delay/registered pin settings will remain programmable, so people can always tweak things to their own requirements if it turns out there is more variation across different implementations. At this point I don't plan to make the clock output dynamically registerable though.
  • evanh Posts: 9,827
    edited 2020-04-06 - 08:42:08
    The best answer for widening the usable frequency bands of HR read data is to have a specific project board that includes an HR chip positioned close to the prop2. All projects that want sysclock/1 speed would then expect that board, or one with an equivalent layout for the HR chip.

    This need mostly fades at sysclock/2. Certainly HR writes are a piece of cake at sysclock/2 without any capacitor.

  • Yeah all my driver's writes at this point are only going to be done at sysclk/2, and this speed also includes the address phase for reads as well. Optionally only the data phase of the reads can be set to transfer at sysclk/1 for systems that can support this.

    Later I might look at the code for supporting sysclk/1 writes if you fit a delay element like a capacitor, but I think this will be a challenge to get the software code timing correct without extra gaps once you include RWDS and odd/even byte address handling etc, certainly the way it is currently being done anyway. Perhaps the address phase will always have to stay at sysclk/2 rates like I have for reads. This is not such a problem for larger burst transfers.
  • evanh wrote: »
    EDIT2: The added lag is actually a little greater than 1 ns, because the HR accessory board loads the HR data pins more than the HR clock pin for some reason. Further reading - https://forums.parallax.com/discussion/comment/1490296/#Comment_1490296

    After a while (many severe lung diseases in a row, from 2019-10 through 2020-03, perhaps SARS-related, perhaps not; doctors are still unsure; 25 kg of my body mass vanished; slowly recovering from this too), I'm trying to wrap my head around Hypers again...

    It's absolutely normal that Hypers present a higher (capacitive) load on the DQ lines than on their clock lines: being input-only, CK and CK# are each internally connected to a single receiver (buffer/reshaper), while the DQs, due to their bi-directional nature, are connected to a similar receiver AND to a tri-stateable driver, on a per-pin basis.

    Also, the tri-stateable drivers at the DQ pins, whose topology is also shared by RWDS, have programmable drive strengths, whose design (my guess) could also affect the total capacitance numbers, as seen by the "external world" (P2), for each pin.

  • That makes sense. Thanks Yanomani.

  • You're always welcome, evanh.

    Since there are no available schematics describing the internal design of the DQs and RWDS, I need to rely on paragraphs like the following, extracted from page 21 of the most recent copy of the S27KL0642/S27KS0642 datasheet I have on hand (Document Number: 002-24692 Rev. *F, Revised November 25, 2019):

    "Drive Strength DQ and RWDS signal line loading, length, and impedance vary depending on each system design. Configuration register bits CR0[14:12] provide a means to adjust the DQ[7:0] and RWDS signal output impedance to customize the DQ and RWDS signal impedance to the system conditions to minimize high speed signal behaviors such as overshoot, undershoot, and ringing. The default POR or reset configuration value is 000b to select the mid point of the available output impedance options. The impedance values shown are typical for both pull-up and pull-down drivers at typical silicon process conditions, nominal operating voltage (1.8 V or 3.0 V) and 50°C. The impedance values may vary from the typical values depending on the Process, Voltage, and Temperature (PVT) conditions. Impedance will increase with slower process, lower voltage, or higher temperature. Impedance will decrease with faster process, higher voltage, or lower temperature. Each system design should evaluate the data signal integrity across the operating voltage and temperature ranges to select the best drive strength settings for the operating conditions. "

    https://cypress.com/file/498611/download
  • Oh, and the data pins are paralleled between the two Hyper chips as well. Whereas the hyper clocks are independent. That'll add to the data pins loading.

    Here's a photo with the 22 pF capacitor for the HyperRAM testing. It shows how simple the soldering is. Conveniently there is sun shining into the room right now for a good photo.
    HR-cap.JPG
  • Neat job with the 22 pF ceramic. We have these at work in case anyone local needs them.

    I noticed some stock appeared with 1v8 "rev2" hyperrams at digikey, arrow and mouser, S27KS0642*

    The 3v3 "rev2"s, S27KL0642* don't show yet, but will keep watching
  • rogloh Posts: 2,640
    edited 2020-04-08 - 01:04:07
    Neat mod evanh. That gives me an idea. I am already testing the HyperRAM board using some 6x2 jumper extenders (think of 2 Arduino shield connectors bound together), done that way so I can access the extended pins in-situ with the logic analyzer. I might be able to take a small 1x2 PC jumper, cut its top link and solder a small 22 pF cap there, then just plug it in and out as required for some easy non-permanent testing of 1x write operation once I get to trying that too. It just won't be quite as close to the chip, if that somehow matters, but I think it will still help achieve the delay.
  • Yanomani Posts: 1,037
    edited 2020-04-08 - 02:10:52
    Hi rogloh

    In your case (extra/extended connections, for easy "probe" access) consider buying some 16/18 pF, low-voltage ceramic disc capacitors (and "extra" jumpers too) as a possible option to attain the same effect as the 22 pF ones others are using.

    Any "extra" metal/plastic paralleled connections, as presented by the extended pins/posts and jumper hardware, would increase capacitive loading a little bit by themselves.
  • Ok good call Yanomani. What is handy is that I can make a few different values to try and then plug and pray.
  • Plug'n-play; timing tweaks... :lol:

    Sounds good, even to the "ears"...
  • That called up some good memories of a long-deceased (and missed) friend/tutor from the 80's; he used to carry a good bunch of pre-assembled 50 Ohm stubs in his toolbag.

    Useful for tweaking undesired reflections on the many ham radio-to-antenna links he used to service, every day!!! :lol:
  • Yanomani Posts: 1,037
    edited 2020-04-08 - 03:01:43
    While we are musing about that subject (and in order to avoid forgetting it as time passes by), we must always remember the fact that Hypers have a single pair of power pins (VccQ/VssQ) feeding all the "outer world" signal connections/conditioners (DQs, RWDS, CK and CK#, CS#, RESET#, etc.), AKA its own "pad ring".

    For its part, the P2's physical topology tends to use at least three connections to a Vio source (or, even worse, sources) when used as a HyperBus controller.

    The physical distance between these individual connections (whether provided by a plane or by tracks, including any interposed vias) imposes some level of inductance, both between them and to any 3.3 V power source.

    Although it can be minimized somewhat, this inductance imposes slight dynamic imbalances (currents) between the three (or more) groups of pins, as they are not loaded steadily but are switching at "crazy" rates.

    And all that noise is being "coupled" to a single padframe inside the Hypers, particularly to the DQs/RWDS group, which is well known to terminate in a single logic block, also inside the Hypers.

    The above considerations make me think about the many "gaps" seen in the "frequency range" measurements taken while varying P2 sysclk...

    Perhaps there are some as yet unnoticed connections between specific ranges of those gaps and the few hardware samples (still limited physical topologies) we have on hand when taking these measurements.

    Time (and diversity) (P2D2???) could tell new tales, I hope...
  • evanh Posts: 9,827
    edited 2020-04-08 - 04:41:26
    There is some similarity when just copying from streamer to streamer via the prop2 I/O pins. The first band upper limit of sysclock/1 was around 280 MHz sysclock at room temperature (compared to about 90 MHz with the HR accessory), and this dropped to as low as 250 MHz up around 95 °C. Chip believes it is the slew limit of the prop2 output drive. I did a large amount of testing of this way back ...

  • Yanomani Posts: 1,037
    edited 2020-04-08 - 03:36:19
    Thanks evanh for bringing back that information.

    Yes, differences between pin slew rates can explain it, but, IIRC, they were tested with chips soldered onto P2-Evals. Is this assumption correct?
  • evanh Posts: 9,827
    edited 2020-04-08 - 03:44:32
    Yes, revA chip in a revA board I believe. Found it - https://forums.parallax.com/discussion/comment/1472679/#Comment_1472679

    PS: I think that was using the lower 32 pins (P0-P31) to perform the data copy.
    PPS: I never did solve why it made a difference to the graph shape which task (send vs receive) was doing the waiting. I don't think it mattered which cog the task was running in.
  • jmg Posts: 14,474
    Has anyone checked the delay bands on a Rev C die yet ?
    That was another batch run, and might indicate how much batch-to-batch variation could be expected.

    Some newer Octal memory parts have ROM read patterns, which may be useful for read tuning ?

    With the apparent lack of overlap in the 'good zones' I wonder if a programmable 3v3 supply (eg +/- 20%) would be enough to tune/margin this ?
  • Yanomani Posts: 1,037
    edited 2020-04-08 - 04:40:45
    evanh wrote: »
    Yes, revA chip in a revA board I believe. Found it - https://forums.parallax.com/discussion/comment/1472679/#Comment_1472679

    PS: I think that was using the lower 32 pins (P0-P31) to perform the data copy.
    PPS: I never did solve why it made a difference to the graph shape which task (send vs receive) was doing the waiting. I don't think it mattered which cog the task was running in.

    Were there any tests limited to specific groups of FOUR pins at a time (grouped by their Vio)?

    Elaborating a bit more: over the past months (and due to the miserable condition that the never-confirmed "SARS" left my body in) I've been studying high-frequency digital design, and learned a lot about it while trying to keep my brain's integrity.

    Even the best differential design (LVDS, LVPECL, or any other, in fact) can be "killed", or at least severely affected/limited, when frequencies are pushed above 160~200 MHz. Single-ended (as on the P2, except for diff-USB) can be worse.

    One tenth of an inch of imbalance, protruding as an outward track, and "boom"; above 200 MHz, the mess is almost certain:

    Any impedance mismatches, or stubs, along the signal lanes, and one can almost "hear" the sound barrier being broken, and no Tom Cat at the sky...

    - reflections that can't be properly absorbed at any evenly terminated group of connections;
    - "rattle"-like noise, due to the uneven time-of-arrival of signals within a group, meant to be sampled within a specific window aperture (cat-eye);
    - jitter;
    - jittery;
    - did I say anything about jitter???

    When you put those 22 pF ceramic caps into play, you are "panning" the sample point a little bit, eventually hitting the next "clean" cat eye, but as frequency shifts up, a new (and lower) cap value would be required, since the cleanest eye aperture would have shrunk a little. It's dynamic, and can also vary as the data patterns vary, due to crosstalk-induced effects. Eventually, PCB construction/routing (or pad/solder-induced mismatch) would dominate, and no suitable cap value could be found anymore.

    As an exaggerated (but as real as your toe hitting a solid rock) example, google for "potato semiconductor" (yes, there are potato chips in the semiconductor industry, however funny (and bad) that may be). :lol:


  • jmg wrote: »
    Has anyone checked the delay bands on a Rev C die yet ?
    That was another batch run, and might indicate how much batch-to-batch variation could be expected.

    Some newer Octal memory parts have ROM read patterns, which may be useful for read tuning ?

    With the apparent lack of overlap in the 'good zones' I wonder if a programmable 3v3 supply (eg +/- 20%) would be enough to tune/margin this ?

    Good point, jmg!

    We have been using Hypers that are specc'd for 3.0 V operation; 3.3 V, though permissible, could contribute to muddling the limits even more.
  • evanh Posts: 9,827
    edited 2020-04-08 - 05:02:44
    The testing for suitable choice of capacitor was done without any electrical probing. Just used software - random data with verify - and a soldering iron. Vons had already advised to use only P16-P31 I/O on the Eval Board as they were the most evenly laid out. The scope snapshot above was done after the fact.

    I started with 33 pF, moved up to 68 pF then back down to 22 pF ... and left it there.

  • I was talking with Rogloh about the possibility of switching the cap GND leg on/off using one of the upper unused pins (call it aux).

    If we do that, we could also output a variable amplitude sine wave from that aux smartpin, same frequency as the clock pin, to affect the overall resultant phase of the clock signal.

    Reactance of a 22pF cap is nominally 72 ohms at 100 MHz. Gnd switch is 19 ohms or thereabouts at DC, but would need to check the parasitics.
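
    As a quick check of the 72 ohm figure, here is a minimal sketch of the standard capacitive reactance formula, Xc = 1 / (2*pi*f*C). It treats the capacitor as ideal and ignores the pin and board parasitics mentioned above; the function name is just illustrative.

```python
import math

def capacitive_reactance(freq_hz: float, cap_farads: float) -> float:
    """Magnitude of an ideal capacitor's reactance in ohms: 1 / (2*pi*f*C)."""
    return 1.0 / (2.0 * math.pi * freq_hz * cap_farads)

# 22 pF at a 100 MHz HyperRAM clock, as in the post
xc = capacitive_reactance(100e6, 22e-12)
print(f"Xc of 22 pF at 100 MHz: {xc:.1f} ohms")
```

    The reactance scales inversely with frequency, so the same capacitor looks like roughly half that impedance at 200 MHz.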
  • jmg Posts: 14,474
    Tubular wrote: »
    If we do that, we could also output a variable amplitude sine wave from that aux smartpin, same frequency as the clock pin, to affect the overall resultant phase of the clock signal
    How would you do that 'variable amplitude sine wave' at 100MHz rates exactly ?
    It may be possible to change the pin from Floating to GND to !CLK (doubles the cap) to lighter-drive choices (some portion of cap) ?


  • Use the bit DAC mode to control the amplitude.

    With the 75 ohm DAC driving 22 pF, the rolloff is around 100 MHz, so it's mostly the fundamental acting to effect the phase shift of the HyperRAM clock.
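
    The "around 100 MHz" rolloff can be sanity-checked by modelling the 75 ohm DAC output and the 22 pF capacitor as a simple first-order RC low-pass. That model is an assumption made here for illustration; real pin and board parasitics will shift the numbers.

```python
import math

def rc_corner_hz(r_ohms: float, c_farads: float) -> float:
    """-3 dB corner frequency of a first-order RC low-pass: 1 / (2*pi*R*C)."""
    return 1.0 / (2.0 * math.pi * r_ohms * c_farads)

def rc_phase_lag_deg(freq_hz: float, r_ohms: float, c_farads: float) -> float:
    """Phase lag of the same RC low-pass at freq_hz, in degrees."""
    return math.degrees(math.atan(2.0 * math.pi * freq_hz * r_ohms * c_farads))

R_DAC = 75.0      # ohms, DAC output impedance quoted in the post
C_CAP = 22e-12    # farads, capacitor value quoted in the post

print(f"corner frequency: {rc_corner_hz(R_DAC, C_CAP) / 1e6:.1f} MHz")
print(f"phase lag at 100 MHz: {rc_phase_lag_deg(100e6, R_DAC, C_CAP):.0f} degrees")
```

    Under this model the corner lands a little below 100 MHz, so harmonics of a 100 MHz square wave are attenuated much more than the fundamental, which fits the "mostly the fundamental" remark.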
  • evanh Posts: 9,827
    edited 2020-04-08 - 06:47:47
    Just to give an idea how well the 22 pF capacitor actually worked, here's a copy of the dump from the final HyperRAM write test I did about six weeks back. There are 400,000 bits (50,000 bytes) per data burst tested. The error rate is roughly 50% when timing is unaligned.

    I'd hate to think how poor the signal looks up at 350 MHz sysclock (350 MB/s) but the 3 Volt ver1 part still works at that rate! HUBSET was using XDIV=20 and XDIVP=1.
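
    As a rough illustration of the HUBSET settings mentioned above, the sketch below works out sysclock from the PLL fields. It assumes the usual P2 PLL relation, sysclock = crystal * XMUL / (XDIV * XDIVP), and a 20 MHz crystal as on the P2 Eval board. Neither the crystal frequency nor any XMUL value is stated in the post, so both are assumptions here, and the function name is hypothetical.

```python
def pll_sysclock_hz(xtal_hz: int, xdiv: int, xmul: int, xdivp: int) -> float:
    """Sysclock from the assumed P2 PLL relation: xtal / XDIV * XMUL / XDIVP."""
    return xtal_hz * xmul / (xdiv * xdivp)

XTAL_HZ = 20_000_000   # assumption: 20 MHz crystal, not stated in the post
XDIV, XDIVP = 20, 1    # values quoted in the post

# XMUL values are hypothetical; chosen to show the 1 MHz-per-step behaviour
for xmul in (100, 200, 350):
    mhz = pll_sysclock_hz(XTAL_HZ, XDIV, xmul, XDIVP) / 1e6
    print(f"XMUL={xmul:3d} -> sysclock {mhz:.0f} MHz")
```

    Under those assumptions XDIV=20 gives a 1 MHz PLL reference, so XMUL is simply the sysclock in MHz, which suits sweeping the frequency in 1 MHz steps.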
  • evanh Posts: 9,827
    edited 2020-04-10 - 04:34:56
    Here's the source code. I've tweaked it some more today to allow more config options using #define. This obviously requires using Fastspin to build.

    EDIT: Oops, corrected a bug with HRclock pin registering.
  • rogloh Posts: 2,640
    edited 2020-04-08 - 23:12:08
    Thanks evanh.
    evanh wrote: »
    I'd hate to think how poor the signal looks up at 350 MHz sysclock (350 MB/s) but the 3 Volt ver1 part still works at that rate!
    I think I recall you originally mentioned this was for writes, but do you remember what was your fastest successful read rate achieved?
  • evanh Posts: 9,827
    edited 2020-04-10 - 04:19:20
    Same as Brian's results. Although the capacitor degrades the bands a little more. And higher temperature will degrade them further. Running the sources above ...

    Frequency bands for HR Read Data, room temperature, data pins P16-P23, clock pin P24:
    1-96 MHz, 112-193 MHz, 232-288 MHz: All registered pins, no capacitor.
    1-87 MHz, 107-174 MHz, 217-266 MHz: All registered pins, 22 pF capacitor on clock.

    1-94 MHz, 107-188 MHz, 221-277 MHz: Registered data pins, no capacitor.
    1-84 MHz, 103-168 MHz, 208-256 MHz: Registered data pins, 22 pF capacitor on clock.

    1-80 MHz, 90-162 MHz, 180-241 MHz, 276-317 MHz: Registered clock pin, no capacitor.
    1-73 MHz, 86-145 MHz, 171-221 MHz, 259-295 MHz: Registered clock pin, 22 pF capacitor on clock.

    1-78 MHz, 87-156 MHz, 173-233 MHz, 264-308 MHz: All unregistered pins, no capacitor.
    1-71 MHz, 83-140 MHz, 165-214 MHz, 249-286 MHz: All unregistered pins, 22 pF capacitor on clock.

    EDIT: Oops, corrected a bug with HRclock pin registering.
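
    To make the band list easier to query, here is a small sketch that encodes the figures above and reports which pin configurations work at a given sysclock. The band limits are copied from the list; the dictionary layout and function names are illustrative only, and the bands themselves apply to this one board at room temperature.

```python
# Working sysclock bands in MHz (inclusive), copied from the list above.
# Key: (pin registration, capacitor on clock)
BANDS = {
    ("all registered", "no cap"):   [(1, 96), (112, 193), (232, 288)],
    ("all registered", "22 pF"):    [(1, 87), (107, 174), (217, 266)],
    ("registered data", "no cap"):  [(1, 94), (107, 188), (221, 277)],
    ("registered data", "22 pF"):   [(1, 84), (103, 168), (208, 256)],
    ("registered clock", "no cap"): [(1, 80), (90, 162), (180, 241), (276, 317)],
    ("registered clock", "22 pF"):  [(1, 73), (86, 145), (171, 221), (259, 295)],
    ("all unregistered", "no cap"): [(1, 78), (87, 156), (173, 233), (264, 308)],
    ("all unregistered", "22 pF"):  [(1, 71), (83, 140), (165, 214), (249, 286)],
}

def in_band(freq_mhz, bands):
    """True if freq_mhz falls inside any (low, high) band."""
    return any(low <= freq_mhz <= high for low, high in bands)

def working_configs(freq_mhz, cap=None):
    """Configurations whose read bands include freq_mhz, optionally for one cap setting."""
    return [key for key, bands in BANDS.items()
            if (cap is None or key[1] == cap) and in_band(freq_mhz, bands)]

def uncovered(cap, low_mhz, high_mhz):
    """Integer MHz values in the range that no configuration with this cap setting covers."""
    return [f for f in range(low_mhz, high_mhz + 1) if not working_configs(f, cap)]

for name, cap in working_configs(200):
    print(f"200 MHz works with: {name}, {cap}")
print("no-cap gaps from 1 to 317 MHz:", uncovered("no cap", 1, 317))
```

    With these numbers the four no-capacitor configurations between them cover every integer MHz from 1 to 317, but only if the pin registration is chosen per frequency, which is why keeping those settings programmable matters.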
  • rogloh Posts: 2,640
    edited 2020-04-09 - 04:24:35
    evanh wrote: »
    Same as Brian's results. Although the capacitor degrades the bands a little more. And higher temperature will degrade them further. Running the sources above ...
    OK I sort of thought as much.

    Just now I tested out my register write and read back on the real HW. Register endianness threw me a bit at first because it has 8 bit MSB first then 8 bit LSB, unlike the typical memory read/write data handling in my driver (which can actually be either way around, but with the streamer in use it really wants it to be little endian).

    With it the correct way around I can now set up the HyperRAM latency to 4 clocks (from its reset default of 6) and read it back. I can also see that this latency setting is actually working on the scope/analyzer and in the data pattern read back.
    ( Entering terminal mode.  Press Ctrl-] to exit. )
    Starting
    COG 1 spawned
    delay = $00000002 clocked=0 result = $00000000
    delay = $00000001 clocked=0 result = $00000000
    Reading IR0 Reg die0 result = $0D83
    Reading IR0 Reg die1 result = $0D83
    Reading IR1 Reg die0 result = $0000
    Reading IR1 Reg die1 result = $0000
    Reading CR0 Reg die0 result = $0F1F
    Reading CR0 Reg die1 result = $0F1F
    Reading CR1 Reg die0 result = $0002
    Reading CR1 Reg die1 result = $0002
    Writing Reg result = $00000000 <---- setting latency to 4 on both dies in MCP
    Writing Reg result = $00000000
    Reading IR0 Reg die0 result = $0D83
    Reading IR0 Reg die1 result = $0D83
    Reading IR1 Reg die0 result = $0000
    Reading IR1 Reg die1 result = $0000
    Reading CR0 Reg die0 result = $0FFF
    Reading CR0 Reg die1 result = $0FFF
    Reading CR1 Reg die0 result = $0002
    Reading CR1 Reg die1 result = $0002
    Setting faster 200000000 Hz clock
    delay = $00000002 clocked=0 result = $00000000
    delay = $00000003 clocked=0 result = $00000000
    Setting up burst pattern
    Printing burst buffer contents
    00000000 : 03020100 07060504 0B0A0908 0F0E0D0C 13121110 17161514 1B1A1918 1F1E1D1C 
    00000020 : 23222120 27262524 2B2A2928 2F2E2D2C 33323130 37363534 3B3A3938 3F3E3D3C 
    00000040 : 43424140 47464544 4B4A4948 4F4E4D4C 53525150 57565554 5B5A5958 5F5E5D5C 
    00000060 : 63626160 67666564 6B6A6968 6F6E6D6C 73727170 77767574 7B7A7978 7F7E7D7C 
    Burst writing 10 data bytes to address $00000000
    result = $0000000A
    Clearing out burst buffer
    Printing burst buffer contents
    00000000 : 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
    00000020 : 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
    00000040 : 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
    00000060 : 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
    Reading a 10 byte burst from $00000000 to buffer
    result = $0000000A
    Printing burst buffer contents
    00000000 : 03020100 07060504 00000908 00000000 00000000 00000000 00000000 00000000 
    00000020 : 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
    00000040 : 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
    00000060 : 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
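
    The register values and burst data in the log above can be decoded with a short sketch. The drive strength field position, CR0[14:12], comes from the datasheet text quoted earlier in the thread. The CR0[7:4] latency encoding in the table below is an assumption recalled from the HyperRAM datasheet rather than something stated in this thread, although it agrees with the 6-clock default and 4-clock setting seen in the log. All function names are illustrative.

```python
import struct

# Assumed CR0[7:4] initial latency encoding (not stated in the thread).
LATENCY_CLOCKS = {0b0000: 5, 0b0001: 6, 0b1110: 3, 0b1111: 4}

def decode_cr0(cr0: int) -> dict:
    """Pull the drive strength and initial latency fields out of a 16-bit CR0 value."""
    latency_code = (cr0 >> 4) & 0b1111
    return {
        "drive_strength_code": (cr0 >> 12) & 0b111,        # CR0[14:12]
        "latency_code": latency_code,                      # CR0[7:4]
        "latency_clocks": LATENCY_CLOCKS.get(latency_code),  # None if unknown
    }

def swap16(value: int) -> int:
    """Swap the two bytes of a 16-bit value (MSB-first register vs little-endian data)."""
    return ((value & 0xFF) << 8) | ((value >> 8) & 0xFF)

def bytes_to_le_longs(data: bytes) -> list:
    """Pack bytes into little-endian 32-bit longs, zero-padding the tail."""
    padded = data + bytes(-len(data) % 4)
    return list(struct.unpack(f"<{len(padded) // 4}I", padded))

for cr0 in (0x0F1F, 0x0FFF):   # CR0 before and after the register write in the log
    print(f"CR0=${cr0:04X} ->", decode_cr0(cr0))

# The 10-byte burst 00..09 as it appears in the little-endian buffer dump
print([f"{v:08X}" for v in bytes_to_le_longs(bytes(range(10)))])
```

    The byte swap shows why register accesses need separate handling: a 16-bit register sent MSB first comes out byte-reversed if it is treated like the little-endian memory data the streamer prefers.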
    
    
    