Something rather strange in Pulse/Cycle Smartpin mode output

evanh · 2020-02-19 09:06

[deleted]

Yanomani · 2020-02-19 09:10

Hi rogloh

Good morning from the sunny Rio de Janeiro (05:31A.M. here...)

I have just woken up and re-read all the recent comments...

First things first;

- Because the embedded self-refresh circuitry is somewhat "cursed" by concerns about the maximum tCSM, at least when dealing with HyperRAMs (I haven't studied HyperFlash that hard yet), one can't use an arbitrarily low-speed clock to drive their CK (or CK, CK#) pin(s).

Soon after detecting CK = LOW (a mere 30 ns), they'll enter Active Clock Stop mode. That, along with the requirement for a 50/50 duty-cycle signal at the CK pin (or CK/CK# pin pair), and the self-refresh concerns too, would impose a lower limit of 33.33 MHz on the clock frequency.

P.S. ERROR, the limit would be 16.67 MHz (I didn't entirely wake up; at least my brain calculator is still taking some naps) :LOL:
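As a quick check of the corrected figure, here is a minimal sketch of the arithmetic, assuming the only constraint is that the LOW half of a 50/50 clock must stay at or below 30 ns. The function name is made up for illustration.

```python
# Sketch: lowest CK frequency that keeps each LOW phase within a given limit,
# assuming a 50/50 duty cycle (LOW time = half the clock period).

def min_clock_mhz(max_low_ns: float) -> float:
    """Return the minimum CK frequency in MHz for a 50% duty-cycle clock."""
    period_ns = 2.0 * max_low_ns   # LOW and HIGH phases are equal
    return 1000.0 / period_ns      # 1000 / (period in ns) gives MHz

print(round(min_clock_mhz(30.0), 2))  # 16.67
```

The 33.33 MHz figure comes out if the 30 ns is treated as the whole period rather than half of it.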

Now, about RWDS (to be continued soon) (low-speed typing concerns)...

evanh · 2020-02-19 09:15

Yanomani wrote: »

Soon after detecting CK = LOW (a mere 30 ns), they'll enter Active Clock Stop mode. That, along with the requirement for a 50/50 duty-cycle signal at the CK pin (or CK/CK# pin pair), and the self-refresh concerns too, would impose a lower limit of 33.33 MHz on the clock frequency.

I don't remember any slow clock issues.

EDIT: Hmm, maybe I didn't test much below 15 MHz HR clock (originally wrote 30 MHz) ...

EDIT2: Just tested writing at 5 MHz HR clock and seems to be working.
EDIT3: Tested writing at 2.5 and 1.0 MHz HR clock and seems to be working.
EDIT4: Tested both reading and writing, separately, at 0.25 MHz HR clock - All good.

EDIT5: Note the testing routine is streamer + smartpin. The opposing direction of the test is then bit-bashed.

Yanomani · 2020-02-19 09:31

You can only take full control over RWDS during write cycles, as you depicted in your post, and there is no need to worry about note 5, as whicker pointed out earlier; it's just about ensuring a meaningful LOW time as a preamble, if you intend to mask out the write of the very first data byte.

Yanomani · 2020-02-19 09:42

Hi evanh

For short-length data cycles, a little less than 16.67 MHz would not prevent the self-refresh circuitry from keeping the RAM contents.

Every time the 30 ns LOW at CK is detected, Active Clock Stop mode will be entered, and the power consumption will begin to fall. If you keep repeatedly entering (and leaving) Active Clock Stop mode, you are walking on the razor's edge.

Sure, one can go further and exercise the true limits of data retention, as per the real die temperature, and it's a good thing to do that kind of exercise, since we are all learning a bit each day.

evanh · 2020-02-19 09:55

Ah, so only data retention is the concern. I was thinking it might create a timing issue.

Test has burst length of 50_000 bytes for each block. So burst time is 200 ms at 0.25 MHz clock. Data is procedurally random and non-repeating. All bits are verified, any mismatch is counted in error bits.

I do vaguely remember temperature made a huge difference to the HR retention times.

Yanomani · 2020-02-19 10:45

You are totally right; only data retention would be affected.

And re-reading about Active Clock Stop mode: besides CK being internally sampled for the initial 30 ns LOW level, the mode will be entered after tACC + 30 ns, which gives (keeping a 50/50 CK duty cycle) roughly 76 ns + 76 ns = 152 ns for the clock period (100 MHz, 3.0 V parts), or a ~6.6 MHz lower limit.

Pending all the calculations involved, after subtracting the minimum tCSHI (between transactions), the CA phase and the fixed latency periods (each latency period = 3 CK periods for CK frequencies lower than 80 MHz), roughly 16 words could be transferred without quite using up the 4 µs maximum available time for CS# = LOW.
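The estimate above can be roughed out as follows. This is only a sketch: it assumes one 16-bit word per CK period (DDR), a 3-clock CA phase, and two 3-clock latency periods, and it ignores CS# setup and hold margins. The function and parameter names are hypothetical.

```python
# Sketch: how many 16-bit words fit in one CS#-low window.
# Values come from the post above or are assumed for illustration.

def words_per_transaction(period_ns: int, tcsm_ns: int = 4000,
                          ca_clocks: int = 3, latency_clocks: int = 3,
                          latency_count: int = 2) -> int:
    """Clocks left for data after the CA and latency phases (one word per clock)."""
    total_clocks = tcsm_ns // period_ns                 # whole clocks inside tCSM
    overhead = ca_clocks + latency_clocks * latency_count
    return max(total_clocks - overhead, 0)

period_ns = 2 * 76                        # 50/50 duty, 76 ns per half period
print(round(1000 / period_ns, 2))         # 6.58 (MHz)
print(words_per_transaction(period_ns))   # 17
```

With these assumptions the result is 17 clocks of data, which is in the same range as the "roughly 16 words" figure once some margin is left at each end of the transaction.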

rogloh · 2020-02-19 13:23

Hi Yanomani, I was just testing this at a slow clock speed so I can capture on my logic analyser which can only sample up to 16MHz and also make clock edges somewhat cleaner for my crappy 50MHz scope. There is no HyperRAM fitted on my P2 board right now. When I am confident the PASM timing is spot on I will plug it back in and test this new code for real again at the higher speed. Once I am happy with this code I don't think I will bother to test it at low speed unless I see some problems at high speed.

If it is true that I don't have to drive RWDS low first before high after the address phase and it is safe to drive out this signal half way between third and fourth clocks (in the middle of the clock low interval) I've found a sequence I can use to get the min latency = 3 working and run the clock output fully back-to-back at sysclk/2 once it starts up and that's as tight as I can get it. Data timing all looks fine now, and I've been tweaking the RWDS one-shot counter computation to line it up for all the different cases. This is all for single accesses, I still need to test the hub fifo burst case, hopefully the fifo won't introduce streamer delay that throws this off once I issue the first RDFAST early enough to be sure some data is loaded in time.

jmg · 2020-02-19 18:43

rogloh wrote: »

If it is true that I don't have to drive RWDS low first before high after the address phase and it is safe to drive out this signal half way between third and fourth clocks (in the middle of the clock low interval) I've found a sequence I can use to get the min latency = 3 working and run the clock output fully back-to-back at sysclk/2 once it starts up and that's as tight as I can get it. Data timing all looks fine now, and I've been tweaking the RWDS one-shot counter computation to line it up for all the different cases..

I'm not following this ? - my reading has RWDS an output during the whole of the address phase, and it is only an input during write data, as a write data mask.
ie the chip tells the user, via RWDS, right after CS# goes low, if it is going to need the slow or fast preambles.

rogloh wrote: »

If it is true that I don't have to drive RWDS low first before high after the address phase..

If that means after the float time, you could always use a modest pull down, so the pin idles low, if that buys some sysclks ?

rogloh wrote: »

.. and it is safe to drive out this signal half way between third and fourth clocks (in the middle of the clock low interval)..

The OctalRAM part specs a tDMV of 0 ns, relative to the clock LOW edge, i.e. it looks like that should be defined on the falling edge, but it also gives tIS and tIH on the following active clock edges.
If you are not using masking, a simple pull down would seem to give a valid DQSM for the whole of a write cycle ?

Cluso99 · 2020-02-19 20:02

Not really following all of this, but could the pins invert mode help with the start low problem?

Yanomani · 2020-02-19 22:18

By executing an early FLTL instruction targeting the pin connected to RWDS, wouldn't it be enough to expose a 1.5 kOhm pull-down, ensuring the HyperRAM will "see" a valid LOW level during the entire write period, commencing as soon as it "floats" its own internal RWDS driving circuitry?

In their datasheets, both Cypress and ISSI omit a lot when it comes to comprehensive information, both in their Write Timing Parameters tables and in their Write Timing Diagrams, although one thing can be extracted from all that pile of digital documents:

- During write cycles: HyperRAMs don't need to drive their RWDS pins outside of the CA phase, where it's solely used to indicate whether or not the bus controller (P2) must account for a second latency period (pending self-refresh operations needing enough time to complete, or stacked-die device limitations, whichever the reason is).

- Also during write cycles: HyperRAMs are absolutely precluded from driving their RWDS pins shortly before, during, and soon after the main memory array data-input period, until being deselected by CS# = High, so as not to affect any write-mask flagging.

Yanomani · 2020-02-19 23:00

A tip (rogloh's privilege, by its turn):

Don't mind if you need to lengthen HyperCK = LOW periods to extend the end of any HyperRAM controller interface phase, or to delay the beginning of another one.

Passing from the CS# = High (unselected) to the CS# = Low (selected), then to the CA phase, then to the Latency count phase (single or double), then to the data-transfer phase (either Read or Write and including any "preamble" time, before defining a possible write-mask (RWDS = High), to be applied to the first byte to be written), then to the CS# = High (unselected) can (and should) all be made with HyperCK = Low (CK idle state), irrespective if it needs to be made in a way that "seems" to violate the 50/50 duty cycle specc'd for CK, or not.

In fact, one exception to the above rule does exist, for sure: passing a write mask for the second byte of any word that is to be masked out from being written needs to occur while CK is High, so that it is sampled when CK falls to a Low level.

The main thing to keep in mind is: if possible, try not to violate the maximum CS# = Low time, so as not to step into the UFC-like self-refresh arena.

The rest is your playground. Enjoy it...

rogloh · 2020-02-20 00:07

jmg wrote: »

I'm not following this ? - my reading has RWDS an output during the whole of the address phase, and it is only an input during write data, as a write data mask.
ie the chip tells the user, via RWDS, right after CS# goes low, if it is going to need the slow or fast preambles.

Yes it is an output during the address phase, but is it also an output during some of the latency phase after CA2 is sent but before the data is sent I wonder? That is when I am enabling the output of this signal. It is very convenient to output then to keep the clocks back to back. I cannot delay this RWDS Smartpin signal output any further without introducing a larger delay or gap clocks between address/data phases in the code due to the output pipelining. The latest I can do this with the tightest latency clocks = 3 is at the timing point I mentioned between clock 3 and 4. I really want to avoid gapping the clock if possible, it slows the transfer down and adds more code to start the streamer and clock smartpin up again.

jmg wrote:

rogloh wrote: »

If it is true that I don't have to drive RWDS low first before high after the address phase..

If that means after the float time, you could always use a modest pull down, so the pin idles low, if that buys some sysclks ?

Possibly helpful if the existing approach doesn't work out for some reason, though the Parallax board does not have this resistor - I guess it could be soldered on.

jmg wrote:

The OctalRAM part specs a tDMV of 0 ns, relative to the clock LOW edge, i.e. it looks like that should be defined on the falling edge, but it also gives tIS and tIH on the following active clock edges.
If you are not using masking, a simple pull down would seem to give a valid DQSM for the whole of a write cycle ?

I just looked at a more recent ISSI HyperRAM device data sheet released last December which mentions this tDMV as well (the old data sheet didn't have it) but doesn't specify its value anywhere else in the pdf(!). At least OctaRAM mentions it is 0 ns so it might be the same for HyperRAM. Waveform attached below. Looks like they want to clock in a low for RWDS at the falling edge of the clock at the end of the latency period and it is mentioned as being a "preamble" to the mask. In some cases I am not sending this initial low right now so this could be a problem. When I drive it out it will either be all LOW if not masked for even address transfers (which should be okay), or HIGH from the midpoint of clock cycles 3+4 all the way to when it drops low for the second byte transferred (for an odd byte address write, even byte skipped), which means it won't supply this preamble. I don't think it will be happy with that.

Cluso99 wrote: »

Not really following all of this, but could the pins invert mode help with the start low problem?

Well it might help but I sort of ran into issues with this Smartpin mode timing that prevented me from getting the one-shot behaviour I wanted at the right cycle (hence this original post). I've found a workaround that lets me drive RWDS active high for some time, after which it goes low, and I can control this precisely to the clock cycle I need based on the latency, but it doesn't do the preamble. If I can find another solution to that it might solve this start low issue. What I really want is a way to start out low, wait some programmed time I have computed, then drive a single high pulse for two P2 clocks then go low again, plus if I don't issue the command at all (based on a C flag) it just stays low the entire time the Smartpin is enabled (default state).
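To make the wished-for behaviour concrete, here is a small model of it in Python, not P2 code. It is only a sketch of the waveform described above: one list entry per P2 clock, with hypothetical parameter names, and a `fire` flag standing in for the C-flag decision.

```python
# Sketch: per-sysclk model of the desired RWDS one-shot (not Smartpin code).

def rwds_one_shot(delay: int, total: int, pulse: int = 2,
                  fire: bool = True) -> list[int]:
    """RWDS level per P2 clock: low for `delay`, high for `pulse`, then low.

    If `fire` is False the command is never issued and the pin stays low.
    """
    if not fire:
        return [0] * total
    return [1 if delay <= t < delay + pulse else 0 for t in range(total)]

print(rwds_one_shot(delay=3, total=8))              # [0, 0, 0, 1, 1, 0, 0, 0]
print(rwds_one_shot(delay=3, total=8, fire=False))  # [0, 0, 0, 0, 0, 0, 0, 0]
```

The initial run of zeros is the "preamble" low time; `delay` would be computed from the latency count.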

rogloh · 2020-02-20 00:10

Yanomani wrote:

The main thing to keep in mind is: if possible, try not to violate the maximum CS# = Low time, so as not to step into the UFC-like self-refresh arena.

Yes I am not doing that anymore. It is less than 4us now and larger bursts are broken up. Seemed to stop the graphics issues I saw initially.
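For illustration only, the break-up of larger bursts could be modelled like this. It is a sketch in Python rather than the actual driver code, and it assumes two bytes per HR clock (DDR) plus a fixed per-transaction overhead of 9 clocks (3 for CA and 6 for double latency); both the names and the overhead value are assumptions.

```python
# Sketch: split a transfer so each CS#-low period stays within tCSM.

def split_burst(total_bytes: int, hr_clock_mhz: float,
                tcsm_us: float = 4.0, overhead_clocks: int = 9) -> list[int]:
    """Return chunk sizes in bytes, each fitting inside one CS#-low window.

    overhead_clocks is an assumed CA + latency cost per transaction.
    """
    data_clocks = int(tcsm_us * hr_clock_mhz) - overhead_clocks
    if data_clocks <= 0:
        raise ValueError("clock too slow for any data within tCSM")
    max_bytes = 2 * data_clocks          # DDR: two bytes per clock
    chunks = []
    remaining = total_bytes
    while remaining > 0:
        n = min(remaining, max_bytes)
        chunks.append(n)
        remaining -= n
    return chunks

print(split_burst(1000, hr_clock_mhz=50.0))  # [382, 382, 236]
```

Each chunk then needs its own CA and latency phases, which is the cost of staying under the 4 µs limit.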

Yanomani · 2020-02-20 01:01

Here's what Cypress has to say about tDMV:

Cypress_Write_tDMV_Shown_01.pdf

"Document Number: 001-97964 Rev. *M Revised July 19, 2019"

evanh · 2020-02-20 01:12

It does seem somewhat at odds to say RWDS must be driven low prior to mask/data, then say the minimum duration is 0 ns. To me, even the early drive requirement seems extraneous, let alone requiring it to be low first.

RWDS has the exact same setup and hold requirements as the data pins do.

Yanomani · 2020-02-20 01:21

I believe that thing of "needing to have" some preamble is being a little overestimated...

The way I see it: the "normal" situation is that every data byte is accepted and written to the main memory array, and because the HyperRAM device has NOT been driving the RWDS line since the end of the CA phase (it was tri-stated by the HyperRAM in order to become an input), the HyperBus controller (P2) must start driving it to a meaningful level before the very first data byte to be written (or not).

The sooner the bus controller starts to drive RWDS to a meaningful logic level, the less opportunity some unintended noise will have to take it over and mess with an otherwise floating input, which every one of us has always tried to avoid.

The decision (to write or not to write) will be taken after both the data byte AND the mask bit have been internally sampled (into the device), thus the bus controller must provide meaningful data bytes AND mask bits at the same time, as specc'd (and expected).

Yanomani · 2020-02-20 01:29

evanh wrote: »

It does seem somewhat at odds to say RWDS must be driven low prior to mask/data, then say the minimum duration is 0 ns. To me, even the early drive requirement seems extraneous, let alone requiring it to be low first.

RWDS has the exact same setup and hold requirements as the data pins do.

+1...

rogloh · 2020-02-20 01:38

Yanomani wrote: »

Here's what Cypress has to say a

Thanks for that, rather more useful than ISSI's omission.

evanh wrote: »

It does seem somewhat at odds to say RWDS must be driven low prior to mask/data, then say the minimum duration is 0 ns. To me, even the early drive requirement seems extraneous, let alone requiring it to be low first.

RWDS has the exact same setup and hold requirements as the data pins do.

I know, it is somewhat weird. The fact that they call it a "preamble" implies to me that it might be somewhat like a start bit, and it may possibly be needed to detect a change or clear itself from the idle or a floating high state. It might be needed in cases where you leave the RWDS pin totally disconnected or pulled high with a resistor if that is even supported. I can see the HyperRAM might be usable without an RWDS pin if you don't ever mask and keep the latency fixed and just use the main clock at the correct time for latching read data without needing this pin as a data valid strobe (like we ignore it in the P2). So maybe this low detection is useful for that case.

rogloh · 2020-02-20 01:39

I'm playing with the one-shot pulse mode again to see if I can get something to work that will drive RWDS low at the right "preamble" time.

Yanomani · 2020-02-20 01:51

Depending on the arrangement of the interconnections between the P2 and the HyperRAM, and of course the rate of HyperCK, it's easier to prepare a write/no-write table in RAM (whichever RAM; we have plenty to choose from) and use a nearby smart pin (freely available, within the -3 to +3 reach) to generate a 2 x HyperCK frequency and clock the RWDS pin as a synchronous serial transmitter.

With a little tweaking (our day-by-day bread and butter), one can have an easy way to comply with each and every timing, as specced into that bunch of pdfs.

Graphics controller, anyone?

Yanomani · 2020-02-20 02:03

rogloh wrote: »

I know, it is somewhat weird. The fact that they call it a "preamble" implies to me that it might be somewhat like a start bit, and it may possibly be needed to detect a change or clear itself from the idle or a floating high state. It might be needed in cases where you leave the RWDS pin totally disconnected or pulled high with a resistor if that is even supported. I can see the HyperRAM might be usable without an RWDS pin if you don't ever mask and keep the latency fixed and just use the main clock at the correct time for latching read data without needing this pin as a data valid strobe (like we ignore it in the P2). So maybe this low detection is useful for that case.

By tying RWDS to a LOW level (through a pull-down, as jmg mentioned earlier) you've got a Write EVER RAM. Tie it to a HIGH level, and a Write NEVER RAM is at your disposal.

This reminds me of some history about Write ONLY roms, from the past...

rogloh · 2020-02-20 02:06

Yanomani wrote:

...and clock the RWDS pin as a synchronous serial interface transmitter.

Yes if the one-shot pulse mode way doesn't work, I'll look into the serial transmitter method you've mentioned. The P2 can send out up to 32 bits serially which should cover a lot of the latency range. It would need to be inverted compared to regular serial.
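A rough sketch of how such a serial word might be prepared, in Python rather than PASM. It assumes bit n of the word is the RWDS level at serial tick n (i.e. the word is shifted out LSB first) and that an optional inversion is applied to the whole word; the tick numbers in the example are hypothetical and would come from the latency calculation.

```python
# Sketch: pack RWDS levels into a word for a synchronous serial transmitter.
# Assumes LSB-first shifting, so bit n is the level at serial tick n.

def rwds_serial_word(high_ticks, width: int = 32, invert: bool = False) -> int:
    """Return a `width`-bit word with ones at the ticks where RWDS is high."""
    word = 0
    for t in high_ticks:
        if not 0 <= t < width:
            raise ValueError(f"tick {t} does not fit in {width} bits")
        word |= 1 << t
    if invert:
        word ^= (1 << width) - 1     # flip every bit within the word
    return word

word = rwds_serial_word([12, 13])            # high for two serial ticks
print(f"${word:08X}")                        # $00003000
print(f"${rwds_serial_word([12, 13], invert=True):08X}")  # $FFFFCFFF
```

All ticks before the first set bit are low, which would also provide the "preamble" low time.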

Yanomani · 2020-02-20 02:17

Yeah...

I have always thought about a way to avoid spending another streamer just to have the RWDS write/no-write bitstream hitting the pin in absolute sync with the byte stream.

For the topmost CK frequencies, another streamer is unavoidable; sync serial is unusable (too slow), but, when the interface is run at a lower-than-maximum rate...

Yanomani · 2020-02-20 02:24

I still recall some memories of MATRIX, and the falling characters on the screen...

Pity the same arrangement of bits that the streamer can control can't be used with sync serial, due to the -3 to +3 reach limitations.

It would be a software-fest, having to compare the many ways one can use both the streamer and the smarts...

evanh · 2020-02-20 02:30

In hindsight, Chip probably could have specified that unregistered I/O have a fixed latency of between 1.0 ns and 1.5 ns rather than the less than 1.0 ns that he did.

Config the HR clock to unregistered and the data pins to registered and this should allow a reliable HR data rate equal to sysclock.

rogloh · 2020-02-20 02:46

Just found a way with the pulse mode to get RWDS working with back to back sysclk/2 writes...

- tested latencies from 3 to 12, seems to follow the pattern as I wanted
- tested bytes/words/long transfers
- tested odd/even start addresses

The only thing I can see that might be of concern is where the P2 enables its output beginning half way between clocks 3 and 4, as you can see where we come out of tri-state in the waveform below (yellow=RWDS, cyan=CLK). I can't easily delay this output further. If the HyperRAM has not shut off by then there will be contention.

This example shows a long write (4 bytes in 4 clock edges) to an odd byte address where a RWDS pulse needs to be applied for the first clock after the latency period ends (in this case latency was 3). The logic capture also shows the data bit0. The test data value written was $FF0000FF. 48 bit address pattern sent was $FF000000FFFF.

Yanomani · 2020-02-20 02:50

A cascade of FET switches (or level translators used as switches) can be used as a bucket brigade, by feeding them with 3.3 V on both sides when they have dual power supplies.

Each stage will introduce <300 ps of delay, but the numbers are not completely deterministic, though stable within their operational range.

Yanomani · 2020-02-20 03:00

A cascade of FET switches (or level translators used as switches) can be used as a bucket brigade, by feeding them with 3.3 V on both sides when they have dual power supplies.

Each stage will introduce <300 ps of delay, but the numbers are not completely deterministic, though stable within their operational range.

A simple 8-bit latch can bring thermometer-like control to the enable pins of the FET gates, so the amount of delay can be dynamically managed and exercised.

evanh · 2020-02-20 03:28

I note the Hyperbus v2 Cypress parts are rated to 400 MT/s for both 1.8 V and 3.0 V. I suspect v1 3.0 V parts have no problem running at higher clock than spec'd.
