Huh, the input inversion in the custom pad ring doesn't seem very effective: it doesn't apply to this feedback path. There is a better input inversion option in the F-block, and the I config bit could be better used for something else, imho.
Output inversion works as expected but has no effect on timing. That'll be because the XOR inverting gate is always in circuit. See attached.
Ok, so yes there are other places we can invert - we could do it at the output of the intermediate pin or in the FFF block (using A XOR B ), then selecting PinA Schmitt or Logic as the input and driving OUT to 1 whenever we wish to invert. I can see it getting complex though if this has to be dynamic...may be okay if this was setup just once for some fixed operating conditions. Still good to know this type of thing may be doable if tweaking is needed.
One thing Brian checked out the other day is that the AAAA input bit inverter seems separate to the CIO bit inverter, so you can apply both to end up with a noninverted signal (hopefully more delayed though)
I've made use of the "INPUT" feedback to the output completely within the low level custom pad ring circuits. It wouldn't make sense to venture beyond.
Using the F-block for this introduces the hidden clocked stages between the hub and pad ring. The propagation would be multiples of the sysclock for starters. That test code above is operating at just 10 MHz.
PS: Not to mention there is no other hardware feedback path anyway.
A 3.5ns clock delay from data to clock, or thereabouts, is probably good for sysclk/1 operation up to 270MHz or so if the hold time is zero ns. So a 252MHz VGA / HDMI frequency could still benefit from this amount of delay, and it could potentially double the write speed vs what we have today at sysclk/2. But the capacitor solution might still be beneficial and IIRC @evanh, you expect it might even be possible to achieve without HW if just a single device is present on the bus and the data bus pins become less capacitively loaded.
At the upper end, a P2 at 333MHz driving a HyperRAM v2 clock at 166MHz has a sysclock period of 3ns. To be able to go this fast using sysclk/1 writes, I imagine we'd like a clock output delay somewhere around 2-2.5 ns, which probably gives us some good margin for the different data pin skew too.
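To make the arithmetic above easier to play with, here is a minimal Python sketch. It assumes a deliberately simplified model that is not taken from any datasheet: the delayed clock edge plus any hold time must fit inside one sysclock period. Under that model a 3.5 ns delay tops out near 286 MHz, so the "270MHz or so" figure already leaves a little margin. The function names are made up for illustration.

```python
def sysclk_period_ns(sysclk_mhz):
    """Length of one sysclock tick in nanoseconds."""
    return 1000.0 / sysclk_mhz


def max_sysclk_mhz(clock_delay_ns, hold_ns=0.0):
    """Highest sysclock for sysclk/1 writes under the simplified model
    clock_delay + hold <= one sysclock period (an assumption, not a spec)."""
    return 1000.0 / (clock_delay_ns + hold_ns)


def slack_ns(sysclk_mhz, clock_delay_ns, hold_ns=0.0):
    """Time left in the data window after the delayed clock edge and hold time.
    A negative result means the clock edge lands after the data has changed."""
    return sysclk_period_ns(sysclk_mhz) - clock_delay_ns - hold_ns


print(f"3.5 ns delay, zero hold: up to about {max_sysclk_mhz(3.5):.0f} MHz")
for delay in (2.0, 2.5, 3.5):
    print(f"333 MHz sysclk, {delay} ns delay: slack {slack_ns(333.0, delay):+.2f} ns")
```

At 333 MHz the 2-2.5 ns delays keep a positive slack, while 3.5 ns goes negative, which matches the reasoning in the post.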
... you expect it might even be possible to achieve without HW if just a single device is present on the bus and the data bus pins become less capacitively loaded.
Correct. I still think that. I'll be getting an Edge w/HR as soon as they're in Parallax shop.
Here's the same two pins, P24 (orange) and P25 (blue), but driven with two smartpins (NCO frequency mode) instead of bit-bashed. This allows toggling on every sysclock tick. The first screenshot is an identical config, running in unison; both have unregistered outputs. In the second one I've registered the output for P25, which delays the transition by one sysclock, and inverted it. The inversion counters the effect of registering when toggling every sysclock like this.
The first screenshot shows how closely the two pins match each other.
The second screenshot shows the additional propagation an unregistered (orange) pin has when compared to a registered pin. Looks to be about 0.6 ns.
This is sufficient for HyperRAM data setup time when the write data pins are registered and the clock pin is unregistered. And for V2 HyperBus it's even in spec.
Hmm, I'm not getting what I expected from P16 and P17. They match up, unregistered, very well too. There is a slight movement but we're talking only 0.05 ns, not the 0.45 ns listed in OnSemi's timing sheet.
PS: This is revB silicon. I'd hate to think revC is that different though.
Just woke up, from within my vamp coffin, to take a look.
Your oscilloscope just looks fantastic to me, as compared to my old and unbranded pal (now frozen into a huge NaCl crystal); even in its best days, before it turned into a sculpture (like Lot's wife), it never dared go above 25/30 MHz without dimming the traces, well under my then-reasonable sight capabilities (1995).
It cost me a packet, in 2002 I think it was, but it was the bottom model that Yokogawa sold at the time. It's only 4 bits per pixel I think. I couldn't find anything cheaper that had deep storage and four channels.
They had something like four or five ranges of scopes. There were 8-channel all-in-ones with 1 GHz frontends, but there was also modular rack-mounted gear above that. I presume they were directly competing against HP at the time.
Mine was darn cheaper; traded for some consulting services, carried out at a then-upcoming startup that never left the ground at all, due to lack of funding.
It seemed kind of a comfortable Bactrian camel to me at the time; it turned out to be a dromedary, with an inflated and fake extra hump, in the end.
The registered vs unregistered can be any pins. Only the input to output propagation has to be a pair.
I'm getting the same unexpected outcome with P52 vs P54. About 0.3 ns between the two pins for both registered and unregistered outputs. I was expecting registered to not have that difference.
What it suggests is the difference is all in the clock tree, not the I/O routes.
PS: It also implies each final hidden output stage is placed up against the pad ring next to the pin cell it is associated with. And that stage will use the same clock tree branch that goes into that pin cell. Each custom pin cell in the pad ring has its own sysclock input.
I guess that poses a problem for my hopes, given there is up to 0.8 ns of difference in output timings because of the clock tree differences. Either we're going to have to be very choosy about which Prop2 pins get used for talking to the HyperRAM, or we need to add more than 1.0 ns to the clock signal. I feel that the 3.0 ns option is just too much. A small capacitor might still be needed.
The 74ALVC125 buffer I'm using lists a typical propagation time of 1.8 ns. Since it's already hooked onto the clock input (with a cap to ground in case we want to load/delay the clock), perhaps the output side of the buffer is useful?
To date the output side of the buffer was just for driving a CRO without disturbing the input signals, but it doesn't have to be this way. The buffer is already there and connected into clock and data.
Yep, I'd expect 2-2.5ns delay is probably the sweet spot area for clock skew relative to the fastest data bit IMO. That still gives enough margin for fast-slow pin deviation and a sufficient setup time for HyperRAM (extra hold time is not needed).
For anyone wondering, all this is mainly useful for attempting sysclk/1 writes. Sysclk/2 writes are already fine as is.
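As a rough illustration of why 2-2.5 ns looks like the sweet spot, the sketch below splits one data period into the setup time seen by the slowest data pin and the hold time seen by the fastest one. It assumes the clock delay is measured from the fastest data pin, that the worst pin-to-pin skew is the 0.8 ns figure mentioned earlier in the thread, and that data changes once per 3.0 ns sysclock (333 MHz, sysclk/1). The function name is hypothetical, and no HyperRAM datasheet limits are assumed.

```python
def write_margins_ns(period_ns, clock_delay_ns, pin_skew_ns):
    """Return (setup, hold) margins in ns for one data period.

    clock_delay_ns is measured from the fastest data pin's transition.
    The slowest pin changes pin_skew_ns later, so it gets the least setup time.
    The fastest pin changes again one period later, so it limits the hold time.
    """
    setup = clock_delay_ns - pin_skew_ns
    hold = period_ns - clock_delay_ns
    return setup, hold


PERIOD_NS = 3.0   # 333 MHz sysclock, sysclk/1 writes (assumed)
SKEW_NS = 0.8     # worst-case pin-to-pin difference quoted in the thread

for delay in (1.0, 2.0, 2.5, 3.0):
    setup, hold = write_margins_ns(PERIOD_NS, delay, SKEW_NS)
    print(f"delay {delay:.1f} ns -> setup {setup:.2f} ns, hold {hold:.2f} ns")
```

With these numbers a 1.0 ns delay leaves almost no setup time on the slowest pin, and a 3.0 ns delay leaves no hold time at all, while 2-2.5 ns keeps both margins positive.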
Yeah 1.8ns could be useful if it can toggle fast enough without attenuating the 166MHz clock signal too much. If you have spare outputs in the package you could route one with a solder pad jumper as a clock option perhaps and another for probing.
One problem is that gate delay becomes problematic if it varies much. One TI data sheet I just looked at for this part mentioned min-max propagation delays from 1.1-2.8ns at 3.3V +/- 10% (so mainly temp range variation). That's where it's nice if the delay could be inside the P2 so we can minimise further part variations.
For anyone wondering, all this is mainly useful for attempting sysclk/1 writes. Sysclk/2 writes are already fine as is.
Yes, totally. Also, it simplifies control of the clock pin, since it no longer has to switch data rates. We can then ditch the pre-data-phase clock pausing.
One problem is that gate delay becomes problematic if it varies much. One TI data sheet I just looked at for this part mentioned min-max propagation delays from 1.1-2.8ns at 3.3V +/- 10% (so mainly temp range variation).
That'll affect read timing rather than writes. Reads are already frequency and temperature dependent. I'm thinking a run-time auto-calibrate-on-init routine will sort out each board. And monitoring of temperature can be added as well.
EDIT: Possibly have a pass/fail test for usability of sysclock/1.
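One common way to structure such an auto-calibrate-on-init step is to sweep the available delay settings, run a pass/fail test at each one, and pick the middle of the widest passing window. The Python sketch below shows only that selection logic. It is not the driver's actual routine: `passes` is a hypothetical stand-in for a real write/read-back test, and `pick_delay` is a made-up name.

```python
from typing import Callable, Optional, Sequence


def pick_delay(settings: Sequence[int],
               passes: Callable[[int], bool]) -> Optional[int]:
    """Return the middle setting of the longest run of consecutive passing
    settings, or None if no setting passes (e.g. sysclk/1 is not usable)."""
    best_start, best_len = 0, 0
    run_start, run_len = 0, 0
    for i, setting in enumerate(settings):
        if passes(setting):
            if run_len == 0:
                run_start = i          # a new passing run begins here
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0                # a failure ends the current run
    if best_len == 0:
        return None
    return settings[best_start + (best_len - 1) // 2]


# Example with a fake test where only settings 2..5 pass.
chosen = pick_delay(range(8), lambda s: 2 <= s <= 5)
print("chosen delay setting:", chosen)
```

Returning `None` doubles as the pass/fail usability check: if no setting passes, the caller can fall back to sysclk/2. Re-running the sweep when the temperature drifts would be one way to fold in the monitoring idea.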
I've mentioned this to Roger, but I'm not sure about evanh: I'm going to butt a 47k thermistor up against each HyperRAM to get some idea of local temperature.
Each thermistor also performs the CS pullup function, so if you want to read its value you put the P2 pin into 150 kohm pulldown mode. The voltage stays above the 1.65 V mid threshold, so you can take a reading and then resume normal operation.
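The divider arithmetic behind that claim can be sketched as follows. This assumes a 3.3 V supply, the thermistor wired between CS and 3.3 V, and an exact 150 kohm pulldown, none of which is confirmed beyond what the post says. Converting resistance to temperature needs the thermistor's datasheet parameters, which aren't given here, so that step is left out. The function names are hypothetical.

```python
VDD = 3.3  # assumed supply voltage in volts


def divider_voltage(r_thermistor_ohm, r_pulldown_ohm=150_000.0, vdd=VDD):
    """Voltage at the CS pin with the thermistor pulling up to vdd and the
    pin's internal pulldown pulling to ground."""
    return vdd * r_pulldown_ohm / (r_thermistor_ohm + r_pulldown_ohm)


def thermistor_resistance(v_pin, r_pulldown_ohm=150_000.0, vdd=VDD):
    """Invert the divider: recover the thermistor resistance from a reading."""
    return r_pulldown_ohm * (vdd - v_pin) / v_pin


v_nominal = divider_voltage(47_000.0)
print(f"47k thermistor, 150k pulldown: {v_nominal:.2f} V at the pin")
print(f"back-calculated resistance: {thermistor_resistance(v_nominal):.0f} ohm")
```

With the nominal 47k value the pin sits at about 2.5 V, comfortably above 1.65 V. In this model the pin only reaches the mid threshold once the thermistor resistance rises to equal the 150k pulldown.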
Not sure if this is the right thread, but I'm trying to get the version 0.8b running with flexspin 5.5.1. The latter crashes and I'll tell Eric about it, but one thing it warns about that looks legit: in programFlash it warns about origCount being used uninitialized, and that seems to be correct. From the looks of it, maybe it should be initialized to byteCount (it is later, but byteCount is not assigned anything earlier).
Maybe there's a lingering issue here.
Edit: I was stupid and named different things the same way. That caused the crash.
Yeah @deets, I think you've located a minor bug there.
In this case, if you choose to erase HyperFlash using 256kB sectors prior to programming, any optional progress callback made during this erase step would not correctly report the original number of bytes to be programmed. Any calculation using it could be off: if the value was zero and the callback code tried to divide by it, for example, you'd get a divide by 0. I'll look into fixing it, perhaps by making it include the number of erased sectors somehow so you know how long the erase will take prior to reprogramming large chunks of flash. In the meantime, a simple fix is to just change its first use to byteCount instead, as below.
I think the version of flex I compiled with back then must not have had this handy warning in it or I likely would have noticed this uninitialized variable.
    ' erase as needed
    flags &= (ERASE_SECTOR_256K | ERASE_ENTIRE_FLASH | ERASE_SHOW_PROGRESS)
    if flags & ERASE_SECTOR_256K
        eraseAddr := addr
        repeat
            if (r := eraseFlash(eraseAddr, flags))
                return r
            eraseAddr += ERASE_SECTOR_256K
            if callback <> 0
                callback(0, byteCount, @stop) '<<<<< change this line
                if stop
                    return ERR_CANCELLED ' we can still cancel erase if done by sectors
        while eraseAddr < addr + byteCount
    elseif flags & ERASE_ENTIRE_FLASH
        if (r := eraseFlash(eraseAddr, flags))
            return r
I tried out rogloh's HyperRam (and HyperFlash) card driver and sample software (0.8b, and I put deets's fix in), and, to be blunt, was miserably disappointed. Not in the driver, No No!
Just in the hardware. It seemed mostly okay, off and on, up to about 150MHz, sysclk/2, registered clk pin, but after that it was just hopeless. Sysclk/1 was worse. This is completely unacceptable for my application - looks like I'm going back to SRAM and a kajillion pins again.
I suspect the layout is not good - I'm using a P2 Edge on a Jon McPhalen "breadboard" adapter, and the module is plugged into 'base' pin 0 (closest to the chip!). I have not stuck a 22pF capacitor on it.
A suggestion I'd make for the RAM tests is like the display evanh was using. "Number of (Zero!! Everything else is failure) bit errors" is, to me, much more interesting than a percentage. I'd also be curious about a sysclk/4 option - To me, memory interface speed isn't as important as core speed. Many thanks for writing it all in the first place!
I have no particular interest in the Flash, so didn't test it beyond seeing that it worked once.
Anyhow, looks like that card in this situation is good for 75MHz (150/2) - and that's it. Hope this is useful information. Thanks! S.
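For reference, the "number of bit errors, where zero is the only pass" metric suggested above is simple to compute: XOR the data written against the data read back and count the set bits. The Python sketch below is only an illustration of that metric, not the driver's or evanh's test code, and `count_bit_errors` is a made-up name.

```python
def count_bit_errors(expected: bytes, actual: bytes) -> int:
    """Count differing bits between the written and read-back buffers."""
    if len(expected) != len(actual):
        raise ValueError("buffers must be the same length")
    # XOR leaves a 1 in every bit position that differs; count those bits.
    return sum(bin(e ^ a).count("1") for e, a in zip(expected, actual))


written = bytes([0x00, 0xFF, 0xA5, 0x5A])
read_back = bytes([0x01, 0xFF, 0xA5, 0x58])
errors = count_bit_errors(written, read_back)
print("bit errors:", errors, "-> PASS" if errors == 0 else "-> FAIL")
```

Reporting the raw count makes a single flipped bit in a large test buffer obvious, where a rounded percentage could hide it.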
I had to turn the flexspin warnings on explicitly; maybe that’s a setting you need to enable too.
Thanks for the prompt fix.