Memory drivers for P2 - PSRAM/SRAM/HyperRAM (was HyperRAM driver for P2) - Page 29 — Parallax Forums

Comments

  • evanh Posts: 16,027
    edited 2020-10-31 06:43
    Huh, the input inversion in the custom pad ring doesn't seem very effective; it doesn't apply to this feedback path. There is a better input inversion option in the F-block, and the I config bit could be better used for something else, imho.

    Output inversion works as expected but has no effect on timing. That'll be because the XOR inverting gate is always in circuit. See attached.

    EDIT: Clarified that output inversion works.
  • rogloh Posts: 5,837
    edited 2020-10-31 06:02
    Ok, so yes there are other places we can invert - we could do it at the output of the intermediate pin or in the FFF block (using A XOR B), then selecting PinA Schmitt or Logic as the input and driving OUT to 1 whenever we wish to invert. I can see it getting complex though if this has to be dynamic... it may be okay if this was set up just once for some fixed operating conditions. Still good to know this type of thing may be doable if tweaking is needed.
  • One thing Brian checked out the other day is that the AAAA input bit inverter seems separate from the CIO bit inverter, so you can apply both to end up with a noninverted signal (hopefully more delayed, though).
  • evanh Posts: 16,027
    edited 2020-10-31 06:28
    I've made use of the "INPUT" feedback to the output completely within the low level custom pad ring circuits. It wouldn't make sense to venture beyond.

    Using the F-block for this introduces the hidden clocked stages between the hub and pad ring. The propagation would be multiples of the sysclock for starters. That test code above is operating at just 10 MHz.

    PS: Not to mention there is no other hardware feedback path anyway.

  • evanh Posts: 16,027
    Here's without and with inversion, respectively.
    pin_lat0065.PNG
    pin_lat0066.PNG
  • A clock delay of about 3.5 ns from data to clock is probably good for sysclk/1 operation up to 270 MHz or so if the hold time is zero ns. So a 252 MHz VGA / HDMI frequency could still benefit from this amount of delay, and it could potentially double the write speed vs what we have today at sysclk/2. But the capacitor solution might still be beneficial and IIRC @evanh, you expect it might even be possible to achieve without HW if just a single device is present on the bus and the data bus pins become less capacitively loaded.
  • At the upper end, a P2 at 333 MHz with a HyperRAM v2 clock at 166 MHz has a sysclk period (one data bit time) of 3 ns. To be able to go this fast using sysclk/1 writes, I imagine we'd like a clock output delay somewhere around 2-2.5 ns, which probably gives us some good margin for the different data pin skew too.
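The two posts above relate a clock delay to a usable sysclk rate. Here is a minimal Python sketch of that arithmetic, assuming a simplified model that is not taken from the driver: each data bit stays on the bus for exactly one sysclk period, and the delayed clock edge plus any required hold time must land inside that period. The function names are made up for illustration.

```python
def sysclk_period_ns(sysclk_mhz: float) -> float:
    """Length of one sysclock tick in nanoseconds."""
    return 1000.0 / sysclk_mhz


def max_sysclk_mhz(clock_delay_ns: float, hold_ns: float = 0.0) -> float:
    """Highest sysclk at which a data bit is still on the bus when the
    delayed clock edge (plus any required hold time) arrives.

    Assumed model: one data bit lasts one sysclk period, so we need
    period >= clock_delay_ns + hold_ns.
    """
    return 1000.0 / (clock_delay_ns + hold_ns)


if __name__ == "__main__":
    print(f"333 MHz sysclk period: {sysclk_period_ns(333):.2f} ns")
    for delay_ns in (2.0, 2.5, 3.5):
        limit = max_sysclk_mhz(delay_ns)
        print(f"clock delay {delay_ns} ns -> sysclk/1 limit about {limit:.0f} MHz")
```

Under this model a 3.5 ns delay with zero hold time tops out at roughly 286 MHz, which fits the "270 MHz or so" estimate once a little margin is left, and it shows why a 3.5 ns delay would not fit inside the 3 ns bit time at 333 MHz.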
  • Am I seeing slew rates of 1 V/ns on those signals, or are there scaling effects in play that are escaping my view?
  • evanh Posts: 16,027
    rogloh wrote: »
    ... you expect it might even be possible to achieve without HW if just a single device is present on the bus and the data bus pins become less capacitively loaded.
    Correct. I still think that. I'll be getting an Edge w/HR as soon as they're in the Parallax shop.

    Here's the same two pins, P24 (orange) and P25 (blue), but driven with two smartpins (NCO frequency mode) instead of bit-bashed. This allows toggling on every sysclock tick. The first screenshot is with identical config, running in unison, both with unregistered outputs. In the second one I've registered the output for P25, which delays the transition by one sysclock, and inverted it. The inversion counters the effect of registering when toggling every sysclock like this.

    pin_lat0069.PNG
    The first screenshot shows how closely the two pins match each other.

    pin_lat0068.PNG
    The second screenshot shows the additional propagation an unregistered (orange) pin has when compared to a registered pin. Looks to be about 0.6 ns.

    This is sufficient for hyperRAM data setup time when the write data pins are registered and the clock pin is unregistered. And for V2 hyperbus it's even in spec.

  • evanh Posts: 16,027
    Yanomani wrote: »
    Am I seeing slew rates of 1 V/ns on those signals, or are there scaling effects in play that are escaping my view?
    Heh, I wouldn't be reading too much into the slopes. The scope is using passive probes and an analogue frontend of only 200 MHz.

  • evanh Posts: 16,027
    Ah, and here's both pins registered only. Compare with both unregistered, the first one, above. They look the same, which is really cool.
    pin_lat0070.PNG
  • evanh Posts: 16,027
    edited 2020-10-31 09:21
    Hmm, I'm not getting what I expected from P16 and P17. They match up, unregistered, very well too. There is a slight movement but we're talking only 0.05 ns, not the 0.45 ns listed in OnSemi's timing sheet.

    PS: This is revB silicon. I'd hate to think revC is that different though.

  • Just woke up, from within my vamp coffin, to take a look.

    Your oscilloscope just looks fantastic to me, compared to my old and unbranded pal (now frozen into a huge NaCl crystal); even in its best days, before it turned into a sculpture (like Lot's wife), it never dared go above 25/30 MHz without dimming the traces well below my then-reasonable sight capabilities (1995). :lol:
  • evanh Posts: 16,027
    It cost me a packet, in 2002 I think it was, but it was the bottom model that Yokogawa sold at the time. It's only 4 bits per pixel I think. I couldn't find anything cheaper that had deep storage and four channels.

    They had something like four or five ranges of scopes. There were 8-channel all-in-one units with 1 GHz frontends but there was also modular rack-mounted gear above that. I presume they were directly competing against HP at the time.

  • Out of curiosity: could you please try P52 versus P54, or do they need to be a matched even/odd pair?
  • evanh wrote: »
    It cost me a packet, in 2002 I think it was, but it was the bottom model that Yokogawa sold at the time. It's only 4 bits per pixel I think. I couldn't find anything cheaper that had deep storage and four channels.

    They had something like four or five ranges of scopes. There were 8-channel all-in-one units with 1 GHz frontends but there was also modular rack-mounted gear above that. I presume they were directly competing against HP at the time.

    Mine was darn cheaper; traded for some consulting services, done at a then-upcoming startup that never left the ground at all, due to lack of funding.

    It seemed kind of a comfortable Bactrian camel to me at the time; it turned out to be a dromedary with an inflated, fake extra hump in the end.
  • evanh Posts: 16,027
    edited 2020-10-31 10:50
    The registered vs unregistered can be any pins. Only the input to output propagation has to be a pair.

    I'm getting the same unexpected outcome with P52 vs P54. About 0.3 ns between the two pins for both registered and unregistered outputs. I was expecting registered to not have that difference.

    What it suggests is the difference is all in the clock tree, not the I/O routes.

    PS: It also implies each final hidden output stage is placed up against the pad ring next to the pin cell it is associated with. And that stage will use the same clock tree branch that goes into that pin cell. Each custom pin cell in the pad ring has its own sysclock input.

  • Your observations are making perfect sense of the situation, at least to me.
  • evanh Posts: 16,027
    edited 2020-11-02 00:38
    I guess that poses a problem for my hopes, given there is up to 0.8 ns of difference in output timings because of the clock tree differences. Either we're going to have to be very choosy about which Prop2 pins get used for talking to the hyperRAM, or we need to add more than 1.0 ns to the clock signal. I feel that the 3.0 ns option is just too much. A small capacitor might still be needed.

  • The 74ALVC125 buffer I'm using lists a typical propagation time of 1.8 ns. Since it's already hooked onto the clock input (with a cap to ground in case we want to load/delay the clock), perhaps the output side of the buffer is useful?

    To date the output side of the buffer was just for driving a CRO without disturbing the input signals, but it doesn't have to be this way. The buffer is already there and connected to clock and data.

  • evanh wrote: »
    I guess that poses a problem for my hopes, given there is up to 0.8 ns of difference in output timings because of the clock tree differences. Either we're going to have to be very choosy about which Prop2 pins get used for talking to the hyperRAM, or we need to add more than 1.0 ns to the clock signal. I feel that the 3.0 ns option is just too much. A small capacitor might still be needed.

    Yep, I'd expect 2-2.5ns delay is probably the sweet spot area for clock skew relative to the fastest data bit IMO. That still gives enough margin for fast-slow pin deviation and a sufficient setup time for HyperRAM (extra hold time is not needed).

    For anyone wondering, all this is mainly useful for attempting sysclk/1 writes. Sysclk/2 writes are already fine as is.
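The trade-off discussed above (pin-to-pin skew versus how much delay to add to the clock) can be sketched as a simple window calculation. This is only an illustration under stated assumptions: the delay is measured relative to the fastest data bit, the slowest data bit lags it by the pin skew, and the `setup_ns` value of 0.5 ns used in the example is a placeholder, not a HyperRAM datasheet figure.

```python
def clock_delay_window_ns(sysclk_mhz: float, pin_skew_ns: float,
                          setup_ns: float = 0.0, hold_ns: float = 0.0):
    """Return (earliest, latest) usable clock delay in ns for sysclk/1 writes.

    earliest: the slowest data pin (pin_skew_ns late) must have settled
              for setup_ns before the clock edge.
    latest:   the fastest data pin changes again one sysclk period later,
              and the data must be held for hold_ns after the clock edge.
    """
    period_ns = 1000.0 / sysclk_mhz
    earliest = pin_skew_ns + setup_ns
    latest = period_ns - hold_ns
    return earliest, latest


if __name__ == "__main__":
    # 0.8 ns skew is the figure quoted in the thread; setup_ns is assumed.
    lo, hi = clock_delay_window_ns(333, pin_skew_ns=0.8, setup_ns=0.5)
    print(f"usable clock delay at 333 MHz: {lo:.2f} ns to {hi:.2f} ns")
    for delay_ns in (1.0, 2.0, 2.5, 3.0):
        slack = min(delay_ns - lo, hi - delay_ns)
        print(f"  delay {delay_ns} ns: worst-case slack {slack:+.2f} ns")
```

With these inputs a 1.0 ns delay falls short of the window and a 3.0 ns delay sits almost exactly on its far edge, while 2 to 2.5 ns leaves slack on both sides, which is the reasoning behind the "sweet spot" estimate.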
  • rogloh Posts: 5,837
    edited 2020-11-02 01:40
    Tubular wrote: »
    The 74ALVC125 buffer I'm using lists a typical propagation time of 1.8 ns. Since it's already hooked onto the clock input (with a cap to ground in case we want to load/delay the clock), perhaps the output side of the buffer is useful?

    To date the output side of the buffer was just for driving a CRO without disturbing the input signals, but it doesn't have to be this way. The buffer is already there and connected to clock and data.

    Yeah 1.8ns could be useful if it can toggle fast enough without attenuating the 166MHz clock signal too much. If you have spare outputs in the package you could route one with a solder pad jumper as a clock option perhaps and another for probing.

    One problem is that gate delay becomes problematic if it varies much. One TI data sheet I just looked at for this part mentioned min-max propagation delays from 1.1-2.8ns at 3.3V +/- 10% (so mainly temp range variation). That's where it's nice if the delay could be inside the P2 so we can minimise further part variations.
  • evanh Posts: 16,027
    rogloh wrote: »
    For anyone wondering, all this is mainly useful for attempting sysclk/1 writes. Sysclk/2 writes are already fine as is.
    Yes, totally. Also, it simplifies control of the clock pin, since it no longer has to switch data rates. We can then ditch the pre-data-phase clock pausing.
  • evanh Posts: 16,027
    edited 2020-11-02 02:23
    rogloh wrote: »
    One problem is that gate delay becomes problematic if it varies much. One TI data sheet I just looked at for this part mentioned min-max propagation delays from 1.1-2.8ns at 3.3V +/- 10% (so mainly temp range variation).
    That'll affect read timing rather than writes. Reads are already frequency and temperature dependent. I'm thinking a run-time auto-calibrate-on-init routine will sort out each board. And monitoring of temperature can be added as well.

    EDIT: Possibly have a pass/fail test for usability of sysclock/1.
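One way an auto-calibrate-on-init routine could work is to sweep the available delay settings, run a pass/fail memory test at each, and pick the middle of the longest passing run. The sketch below shows only that selection step in Python; `pick_delay` and the `passes` callback are hypothetical names, and a real driver would supply its own test and settings.

```python
from typing import Callable, Optional, Sequence


def pick_delay(settings: Sequence[int],
               passes: Callable[[int], bool]) -> Optional[int]:
    """Return the setting in the middle of the longest run of consecutive
    passing settings, or None if nothing passes (e.g. sysclk/1 unusable)."""
    best_start, best_len = 0, 0
    run_start, run_len = 0, 0
    for i, setting in enumerate(settings):
        if passes(setting):
            if run_len == 0:
                run_start = i          # a new passing run begins here
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0                # run broken by a failing setting
    if best_len == 0:
        return None
    return settings[best_start + (best_len - 1) // 2]


if __name__ == "__main__":
    # Stand-in for a real read/write test: pretend delays 3..5 work.
    simulated_board = lambda delay: 3 <= delay <= 5
    print("chosen delay:", pick_delay(range(8), simulated_board))
```

Choosing the centre of the window rather than the first passing value leaves room for drift with temperature, and a `None` result doubles as the pass/fail usability check for sysclock/1 mentioned in the edit.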

  • I've mentioned to Roger, but I'm not sure about evanh, that I'm going to butt a 47k thermistor up against each HyperRAM to get some idea of local temperature.

    Each thermistor also performs the CS pullup function, so if you want to read its value you put the P2 pin into 150 kohm pulldown mode. The voltage stays above the 1.65 V mid threshold, so you can take a reading and then resume normal operation.
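The divider arithmetic behind that scheme can be checked with a few lines of Python. This is a sketch under assumptions not stated in the post: VIO is taken as 3.3 V (implied by the 1.65 V mid threshold), the pulldown is treated as an ideal 150 kohm resistor, and the thermistor sits between VIO and the CS pin. Converting resistance to temperature would need the thermistor's curve, which the thread does not give.

```python
VIO = 3.3               # assumed I/O supply in volts
R_PULLDOWN = 150_000.0  # nominal P2 pulldown in ohms, treated as ideal


def pin_voltage(r_thermistor: float) -> float:
    """Voltage at the CS pin with the thermistor pulling up to VIO
    and the internal pulldown enabled."""
    return VIO * R_PULLDOWN / (R_PULLDOWN + r_thermistor)


def thermistor_ohms(v_pin: float) -> float:
    """Invert the divider: thermistor resistance from a measured pin voltage."""
    return R_PULLDOWN * (VIO - v_pin) / v_pin


if __name__ == "__main__":
    v = pin_voltage(47_000.0)
    print(f"pin voltage with 47k thermistor: {v:.2f} V")
    print(f"recovered resistance: {thermistor_ohms(v):.0f} ohms")
```

At the nominal 47k the pin sits at about 2.51 V, comfortably above 1.65 V. The pin would only drop to the mid threshold if the thermistor rose to equal the pulldown value.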
  • evanh Posts: 16,027
    edited 2020-11-02 02:31
    Roger mentioned it. I've been thinking an internal ADC (eg: The drift in VIO reading) would work but something external will be more accurate.
  • __deets__ Posts: 203
    edited 2021-06-26 17:11

    Not sure if this is the right thread, but I'm trying to get the version 0.8b running with flexspin 5.5.1. The latter crashes and I'll tell Eric about it, but one thing it warns about that looks legit: in programFlash it warns about origCount being used uninitialized, and that seems to be correct. From the looks of it, maybe it should be initialized to byteCount (it is later, but byteCount is not assigned anything earlier).

    Maybe there's a lingering issue here.

    Edit: I was stupid and named different things the same way. That caused the crash.

  • rogloh Posts: 5,837
    edited 2021-06-27 01:31

    Yeah @deets , I think you've located a minor bug there.

    In this case if you choose to erase HyperFlash first using 256kB sectors prior to programming, then any optional callback progress notification during this erase step would not correctly report the original number of bytes to be programmed. Any calculation using it could be off; if it was zero and the callback code tried to divide by it, for example, that would be a divide by 0. I'll look into fixing it, perhaps by making it include the number of erased sectors somehow so you know how long the erase will take prior to reprogramming large chunks of flash. In the meantime a simple fix is to just change its first use to byteCount instead, as below.

    I think the version of flex I compiled with back then must not have had this handy warning in it or I likely would have noticed this uninitialized variable.

    ' erase as needed
    flags &= (ERASE_SECTOR_256K | ERASE_ENTIRE_FLASH | ERASE_SHOW_PROGRESS)
    if flags & ERASE_SECTOR_256K
        eraseAddr := addr
        repeat
            if (r := eraseFlash(eraseAddr, flags))
                return r
            eraseAddr += ERASE_SECTOR_256K
            if callback <> 0
                callback(0, byteCount, @stop)  '<<<<< change this line
                if stop
                    return ERR_CANCELLED ' we can still cancel erase if done by sectors
        while eraseAddr < addr + byteCount
    elseif flags & ERASE_ENTIRE_FLASH
        if (r := eraseFlash(eraseAddr, flags))
            return r
    
  • I had to turn the warnings explicitly on, maybe that’s a setting you also need to make.

    Thanks for the prompt fix.

  • I tried out rogloh's HyperRam (and HyperFlash) card driver and sample software (0.8b, and I put deets's fix in), and, to be blunt, was miserably disappointed. Not in the driver, No No!

    Just in the hardware. It seemed mostly okay, off and on, up to about 150MHz, sysclk/2, registered clk pin, but after that it was just hopeless. Sysclk/1 was worse. This is completely unacceptable for my application - looks like I'm going back to SRAM and a kajillion pins again.

    The layout I suspect is not good - I'm using a P2 Edge on a Jon McPhalen "breadboard" adapter, and the module is plugged into 'base' pin 0 (closest to the chip!). I have not stuck a 22pF capacitor on it.

    A suggestion I'd make for the RAM tests is like the display evanh was using. "Number of (Zero!! Everything else is failure) bit errors" is, to me, much more interesting than a percentage. I'd also be curious about a sysclk/4 option - To me, memory interface speed isn't as important as core speed. Many thanks for writing it all in the first place!

    I have no particular interest in the Flash, so didn't test it beyond seeing that it worked once.

    Anyhow, looks like that card in this situation is good for 75MHz (150/2) - and that's it. Hope this is useful information. Thanks! S.
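The suggestion above to report an absolute bit error count rather than a percentage is easy to illustrate. The Python sketch below is not part of the driver's test code; `count_bit_errors` is a hypothetical helper that compares the data written with the data read back and counts differing bits.

```python
def count_bit_errors(expected: bytes, actual: bytes) -> int:
    """Number of bits that differ between two equal-length buffers."""
    if len(expected) != len(actual):
        raise ValueError("buffers must be the same length")
    # XOR leaves a 1 in every bit position that differs; count those 1s.
    return sum(bin(e ^ a).count("1") for e, a in zip(expected, actual))


if __name__ == "__main__":
    written = bytes(range(256)) * 16           # 4096 bytes of test pattern
    read_back = bytearray(written)
    read_back[100] ^= 0x01                     # inject a single bit flip
    errors = count_bit_errors(written, bytes(read_back))
    total_bits = len(written) * 8
    print(f"bit errors: {errors} of {total_bits} bits")
    print(f"as a percentage: {100.0 * errors / total_bits:.4f}%")
```

A single flipped bit in a 4 KB buffer is a tiny percentage that is easy to overlook in a report, whereas a non-zero error count makes the failure obvious.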
