If you don't have any digital storage-enabled instrument (oscilloscope or logic analyzer), you still have the option to spend a little more time and craft your own, using a free Streamer and a bunch of jumper wires.
Since HyperRAM transactions involve only a few signals, as little as 1 to 8 kB of Hub space could be reserved to serve as your next bench-top friend.
IIRC, there are at least two examples, crafted by Cluso99 and Ozpropdev, buried somewhere within the many P2 threads. I'm going by my own memory here, so some failure is to be expected (I'm listening to Joe Cocker singing "With a Little Help from My Friends" while typing... Good Karma!)
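To get a feel for how far 1 to 8 kB of capture buffer goes, here is a rough back-of-the-envelope sketch in Python. It assumes (my assumption, not something stated in the post) that the Streamer stores one byte per sample, i.e. up to 8 pins, and samples at a 250 MHz system clock; `capture_window_us` is a hypothetical helper name, and both numbers should be adjusted to your setup.

```python
def capture_window_us(buffer_bytes, sample_rate_hz=250e6, bytes_per_sample=1):
    """Return how many microseconds of bus activity fit in the capture buffer.

    sample_rate_hz and bytes_per_sample are assumptions for illustration only.
    """
    samples = buffer_bytes // bytes_per_sample
    return samples / sample_rate_hz * 1e6


if __name__ == "__main__":
    for kb in (1, 2, 4, 8):
        window = capture_window_us(kb * 1024)
        print(f"{kb} kB buffer -> {window:.3f} us of captured signal")
```

Under those assumptions even the smallest buffer holds a few microseconds of activity, which is on the order of a single short HyperRAM burst.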
Since you intend to use Hyper_CK = 62.5 MHz, you can program the HyperRAM with the following parameters:
- Fixed latency (two times the latency count);
- A latency count of three Hyp_CK pulses (good up to 80 MHz);
- Linear burst always enabled (you are already using it);
- Limit each data memory read/write operation to a maximum of 240 words (480 bytes).
With the above constraints, you'll spend a maximum of 10 Hyp_CK periods on all the rest of the interface timing (the two-times latency count included).
This leads to 240 + 10 = 250 Hyp_CK periods to transfer 480 bytes, which at 62.5 MHz takes exactly 4 µs. The net effective data transfer throughput will be 480 bytes / 4 µs = 120 MB/s.
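The arithmetic above can be checked with a few lines of Python. This is only a sketch of the numbers quoted in the post (62.5 MHz clock, 240-word bursts, 10 clocks of overhead); the function and constant names are made up for illustration.

```python
HYP_CK_HZ = 62.5e6       # HyperBus clock frequency from the post
WORDS_PER_BURST = 240    # one 16-bit word is transferred per Hyp_CK period
OVERHEAD_CLOCKS = 10     # the post's estimate for everything except data


def burst_time_s(words=WORDS_PER_BURST, overhead=OVERHEAD_CLOCKS, ck_hz=HYP_CK_HZ):
    """Duration of one complete transaction, in seconds."""
    return (words + overhead) / ck_hz


def throughput_bytes_per_s(words=WORDS_PER_BURST, overhead=OVERHEAD_CLOCKS, ck_hz=HYP_CK_HZ):
    """Net data rate: payload bytes divided by total transaction time."""
    return (words * 2) / burst_time_s(words, overhead, ck_hz)


if __name__ == "__main__":
    print(f"burst time : {burst_time_s() * 1e6:.2f} us")
    print(f"throughput : {throughput_bytes_per_s() / 1e6:.1f} MB/s")
```

Shorter bursts pay the same 10-clock overhead for less payload, so the net rate drops quickly as the burst length shrinks.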
If you are able to time the initial and ending Hyp_CK x Hyp_CSn sequences correctly (do follow the datasheet timings/waveforms as closely as possible; they are your map to the treasure), you never need to worry about the state of RWDS during a read operation; it is fully deterministic. Provided your data memory access reads more than a few words (3 to 4, i.e. 6 to 8 bytes, IIRC) before reaching the end address of the current row, the next row (1024 bytes) is automatically accessed in the background, providing seamless read access across row boundaries.
Data memory write cycles have no option but to rely on the internal dual row buffer mechanism for seamless write accesses across row boundaries, because RWDS is unavailable for signaling during write cycles (other than enabling/disabling overwrite of the current byte position).
Scheduling your use of the HR address space well, so as to avoid reading/writing fewer than, say, 8 words (16 bytes) each time, is a sure path to success. It also relieves the HR internal logic, giving it ample time to manage the automated refresh operations carried out on the row(s) actually being accessed, while the two-times fixed latency count gives the HR's internal row refresh counter another opportunity to request a hidden refresh operation during each read/write cycle executed by the HR controller (actually, the P2).
Hope it helps (while I do a refresh operation on my own memories).
...
Can you expand some more on the seamless read access across row boundaries?
Do those few words need to reach into the next page, or does the HyperRAM have a look-ahead scheme where it readies the row following the one it is outputting?
Re. the delay for RWDS, it is not clear to me that it is fixed. Maybe it is.
If I do get out the scope, I could check...
My reading was that RWDS compensates for round-trip delays, in pin drivers, PCB traces etc.
I think it can skip if it needs to, on crossing a row, but it should otherwise be 'fixed' in the sense that it is a round-trip path delay.
PVT will vary the RWDS delay, but at lower clock speeds, a couple of ns are less critical.
With Linear Burst enabled, whenever a read/write access to main data memory reaches any address within the last 8 words of a row (it could be fewer; I personally doubt it is much less than that number), the HR internal state machine (processor??) is scheduled to read the next row, preparing for the eventual need of seamless access across row boundaries (note that the memory address range is not automatically crossed between dice on dual stacked-die devices).
If the controller ensures at least that number of contiguous accesses before reaching the end address of each row, the internal logic has enough time to fill the secondary buffer with the next row, so that it is ready to be used seamlessly.
Otherwise, in a data memory read operation, RWDS would need to be held at its last data-valid state (low, since the last valid data was output after the RWDS falling edge) until the next row had been read into the secondary buffer.
Data memory writes have no option to wait, since RWDS can't be used for that during their operation; thus the secondary buffer needs to be, and in fact is, always operational.
Remember, every main data memory operation always refreshes the row being accessed, hence the compulsory need for a secondary row buffer, fully interchangeable with the primary to the point that they are indistinguishable, the two being swapped one for the other as needed.
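As a way to apply the rule of thumb above when scheduling accesses, here is a small Python sketch. It takes the figures from the posts at face value (1024-byte rows, at least 8 contiguous words before a row end); those are recollections rather than datasheet values, and the helper names are hypothetical.

```python
ROW_BYTES = 1024             # row size as recalled in the post
ROW_WORDS = ROW_BYTES // 2   # 16-bit words per row
MIN_RUN_WORDS = 8            # contiguous words wanted before a row boundary


def words_before_row_end(start_word):
    """Number of words from start_word up to and including the last word of its row."""
    return ROW_WORDS - (start_word % ROW_WORDS)


def crosses_row(start_word, length_words):
    """True if the burst continues past the end of the row it starts in."""
    return length_words > words_before_row_end(start_word)


def burst_is_comfortable(start_word, length_words):
    """True if the burst stays in one row, or gives the device at least
    MIN_RUN_WORDS contiguous accesses before the boundary it crosses."""
    if not crosses_row(start_word, length_words):
        return True
    return words_before_row_end(start_word) >= MIN_RUN_WORDS


if __name__ == "__main__":
    for start in (0, 500, 508):
        print(start, words_before_row_end(start), burst_is_comfortable(start, 240))
```

A controller could use such a check to shift or split a burst that would otherwise start only a few words ahead of a row boundary.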
Looks like you have to input RWDS to see how much latency there will be:
"During the CA transfer portion of a read or write transaction, RWDS acts as an output from a HyperRAM device to indicate whether additional initial access latency is needed in the transaction."
My reading is that this is only needed/applicable for variable latency, so a (longer) fixed latency always has a slot for the refresh to complete.
docs:
Configuration Register 0 ... Variable Latency
– Whether an array read or write transaction will use fixed or variable latency.
If fixed latency is selected the memory will always indicate a refresh latency and delay the read data transfer accordingly.
If variable latency is selected, latency for a refresh is only added when a refresh is required at the same time a new transaction is starting.
It could be useful to connect RWDS to a SmartPin counter, to check that you get the expected number of ACK edges during debug.
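The fixed versus variable latency behaviour quoted from the docs can be summarised in a tiny Python model. This is a sketch of my reading of that excerpt, not controller code; `initial_latency_clocks` and its parameters are names invented for the example, and it assumes RWDS high during CA is how the device signals the additional latency.

```python
def initial_latency_clocks(latency_count, fixed_latency, rwds_high_during_ca=False):
    """Clocks of initial latency the controller must wait before data.

    fixed_latency        -- True if CR0 selects fixed latency
    rwds_high_during_ca  -- RWDS level sampled during the CA phase
                            (only matters in variable-latency mode)
    """
    if fixed_latency:
        # The device always indicates a refresh latency: two latency counts.
        return 2 * latency_count
    # Variable latency: the extra count is added only when RWDS asks for it.
    return 2 * latency_count if rwds_high_during_ca else latency_count


if __name__ == "__main__":
    print(initial_latency_clocks(3, fixed_latency=True))
    print(initial_latency_clocks(3, fixed_latency=False, rwds_high_during_ca=False))
    print(initial_latency_clocks(3, fixed_latency=False, rwds_high_during_ca=True))
```

With fixed latency the wait is the same on every transaction, which is why the controller can ignore RWDS during CA in that mode.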
Provided the driving routine is still the same one you posted at the beginning of the thread, I believe I've found at least a clue.
In the sToggleClock helper routine, you are using dirh #Pin_CK to enable the clock pin to actively drive the HR just before the clock counting/generating rep loop, and dirl #Pin_CK just after the end of the loop.
Well, as I stated in an earlier post, it's strongly advisable to keep the HR CK pin actively driven, either LOW (the inactive state, as described in the datasheets) or HIGH; either way, it should never be left floating.
IMHO, this point surely needs attention: always keep the CK pin actively driven, LOW when not being used, toggling to HIGH and then back to LOW when needed.
The rest position needs to be the inactive state (actively driven LOW).
If you had a scope (ideally 4-channel), you could measure all the pulse widths and get familiar with how things look while there are no lines. Then, when the lines show up, check whether you can see a difference in the signals that indicates something is changing on the P2. Maybe that would help point in a direction.
I always use my scope and/or logic analyzer to know, for sure, that things are signalling like I think they are. At least, I use them initially to confirm that I don't have any drive/float issues that would haunt my efforts going forward.
Here's some P2 HyperRAM code I've been using successfully for some time now; maybe it will be of some help. I've been playing around with HR for a couple of years now but really haven't looked at it closely in quite a while. The attached Word doc gives some background info.
Some time ago I did some overnight runs writing 128-byte blocks of random numbers, filling the full HR data space, and didn't see errors. I'm not at my PC right now, but I think there's a test routine in there somewhere that I used. I do remember messing around with drive strength at the time too.
Have you tested doing a lot of consecutive reads for hours at a time?
Does your code just read forever, once the image is first written ?
With a refresh time of ~ 64ms, usually one would expect any serious problems should appear within a few seconds.
Maybe something is marginal around the refresh or timing side, to the point that temperature drift matters?
Perhaps changing from an image to a known test pattern, like triangle modulation, would mean you could 'check read for expected value' on each scan (assuming there is enough time).
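A known test pattern makes every read self-checking. The Python sketch below shows the idea with a simple triangle wave; the peak value, the one-value-per-word layout and the function names are all assumptions chosen for illustration, and a real P2 implementation would do the comparison inline while scanning.

```python
def triangle_word(index, peak=255):
    """Expected value at a given word index: ramps 0..peak and back down to 0."""
    period = 2 * peak
    phase = index % period
    return phase if phase <= peak else period - phase


def first_mismatch(read_words, start_index=0, peak=255):
    """Return the index of the first word that differs from the expected
    triangle pattern, or None if the whole block matches."""
    for offset, value in enumerate(read_words):
        if value != triangle_word(start_index + offset, peak):
            return start_index + offset
    return None


if __name__ == "__main__":
    block = [triangle_word(i) for i in range(1000)]
    print(first_mismatch(block))   # a clean block reports no mismatch
    block[700] ^= 1                # simulate a single corrupted bit
    print(first_mismatch(block))
```

Logging the first failing index and the time it appeared would show whether the corruption sits at fixed addresses or wanders around.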
Refresh minimum times are applicable to the highest specified temperature, where leakage current can be many orders of magnitude greater than at ambient. At room temperature I wouldn't rely on the contents fading as a reliable diagnostic.
Just thinking that the nearest 3.3V decoupling capacitor is ~1 inch away. Also, I'm using the switching regulator instead of LDO... This might be a power issue.
It sounds like it refreshes the entire row when part of it is accessed. I'm thinking power must be marginal... This might explain what's happening...
Ok, switched to LDO power and dropped in a 10uF Ta cap nearby. Hope this fixes it...
If you are still using the control software from the first post of this thread, please check the following excerpt I took from it:
'Latency Clocks
        nop
        'setbyte dira,#$00,#0
        mov     i2,#27  '24 'need to check that this is right...
LoopLat1
        outnot  #Pin_CK
        djnz    i2,#LoopLat1
Since the clock count value is ODD, it will leave #Pin_CK = High, passing this state on to the following routines.
Please note that each stage of the HyperBus interface starts and ends with CK = Low (inactive), so you need to check that this is ensured in each and every part of your routines.
IIRC, the P2 uses the same wired-OR convention when multiple Cogs access a single pin. Leveraging this fact, and also because #Pin_CK can be left actively driven LOW at any time (limited to not exceeding the longest possible CSn = Low time, as defined by the maximum refresh timing), you can get rid of any pin float control on #Pin_CK and ensure it is Low by design, even when passing control to/from any existing helper Cog routines.
Also make sure the control software changes the state of #Pin_CSn only when #Pin_CK = Low (inactive).
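The parity argument is easy to check on paper, or with the short Python model below. It only models a pin that is inverted once per loop pass, as outnot does in the excerpt; `ck_after_toggles` is a hypothetical name, and the starting level is assumed to be Low unless stated otherwise.

```python
def ck_after_toggles(toggles, start_high=False):
    """Level of CK (True = High) after inverting it `toggles` times."""
    return start_high ^ (toggles % 2 == 1)


def ends_idle_low(toggles, start_high=False):
    """True if the routine hands CK back in its inactive (Low) state."""
    return not ck_after_toggles(toggles, start_high)


if __name__ == "__main__":
    for count in (24, 27, 28):
        level = "High" if ck_after_toggles(count) else "Low"
        print(f"{count} toggles from Low -> CK ends {level}")
```

Any routine that starts from Low and uses an even toggle count ends Low again, so it can be chained with the next stage without leaving CK in the wrong state.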
Still using what's in the first post. But I just added an "outl #Pin_CK" before raising CS in that write subroutine.
The writing only gets done once at startup though.
Hard to imagine it would make a difference many minutes later...
I just added a second USB cable. I think that should give it extra power, in case it was low.
Going to let it run for a while... Then, I'll go get the scope...
Update: it's been 3 hours and it's still good. I'll let it go a little while longer and then actually measure the rails with 1 and 2 USB cables for power. Could be 500 mA is not enough for 250 MHz and 3 cogs...
Update2: Ugh... Lines just showed up after ~ 4 hours...
Comments
Henrique
I'll just reduce the number of clocks until pixels start going missing and then I'll know when to stop...
Henrique
Henrique
It's a registered contract between you and the HyperRAM device. Do your part with the interface signaling.
It'll do its job. Be happy, you are the master (of the bus); it's your slave (HyperRam)!
"During the CA transfer portion of a read or write transaction, RWDS acts as an output from a HyperRAM device to indicate whether additional initial access latency is needed in the transaction."
With fixed latency count enabled, RWDS will always be high ~12 ns after CSn goes Low.
So its high state can be used to confirm the presence of an HR device, ready on the bus, but checking it is not mandatory.
Take a look at the timing diagrams in the datasheets. It's enlightening!
May have changed something after posting that messed it up...
been almost 2 hours and still good.
Would be very strange to go so long and then start having problems...
Ok, shoot, 2 lines just showed up.
Hope we can figure this out...
Hope it helps
Henrique
In this case, the clock helper cog overrides it during the read...
Richard
It's a known technique for grabbing DRAM contents: cool the chips, power down, maybe move the RAM into a test jig, and power up with the contents basically intact.
https://en.wikipedia.org/wiki/Cold_boot_attack#Technical_details
Nope, lines back after just ~20 minutes...
Are you still using the same version of the control software, that's available at the first post of this thread?
Hope it helps
Henrique