@evanh said:
PS: That rxlag lsb comment is accurate. Roger does the same in his driver. The lsb of his lag compensation "delay" is used to enable/disable the data pin registration (pin sync, he calls it). This provides a small but effective delay-line effect, shifting the phase timing minutely (something like 0.5 nanoseconds), which gives more sampling options for finely adjusting the receive of incoming data.
But registering imposes a whole sysclock tick of latency on the data pin on top of the phase shift. So that has to be accounted for as it gets set/unset.
Yep, it's complicated, and in an ideal world it'd be great to hide much of this complexity where possible. I ended up settling on a single delay value which incorporates the half step (registered data pins) as the LS bit, to hide much of this from the user, and I deal with the extra latency internally in the driver. But it's all unavoidable when writing your own code unfortunately; it'll come back to rear its ugly head again for you.
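As a minimal sketch of that kind of packing (the exact bit layout here is assumed for illustration, not taken from Roger's driver): the LS bit selects registered data-pin input, the remaining bits give whole sysclock ticks, and the registration's extra tick is handled inside the decode.

```spin2
PUB decode_delay(delay) : clocks, registered
  ' Hypothetical decode of a combined read-delay value:
  '   bit 0      -> registered (P_SYNC_IO style) data-pin input, the half-step phase shift
  '   bits 31..1 -> whole sysclock ticks of read latency
  registered := delay & 1
  clocks := delay >> 1
  if registered
    clocks++                      ' registration adds one full sysclock tick on top of the phase shift
```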
Writes are not typically as complicated, as the clock and data can go out together (on the same port), so the phase relationship can remain locked. But reads involve a board-dependent (and temperature-variable) round-trip delay from the time the clock edge is output by the P2 to the sampling of the returned data at the P2, which the P2 can only sample some integer number of clocks later. Thankfully the P2's other pin settings, such as Schmitt trigger and registered input or output, can tweak things even finer than the integer clock count; otherwise we'd be limited to slower-clocked read operations where the sampling point can be more safely centred between data transitions.
What I was talking about is modeling the system to the extent that the correct latency cycles / pin modes for a particular board at a given clock frequency can be determined without having to trial-and-error it (and of course figure out how to translate that to configuration variables of different drivers -> this is the easy part). If the relevant measurement series can be compressed into a few hundred ms and a few K of code, I think it would even be viable to do this on application startup. Just testing different settings at the current clockfreq doesn't work IME: you can hit marginal settings that almost work and will not error out within a reasonable number of test transfers. You need tests at adjacent frequencies to characterize it properly. Tests where only 1 cog is active are similarly invalid; the whole thing becomes more precarious under load.
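To make a quick calibration pass see something like real load, one rough option (purely a sketch; the cog count, buffer size and access pattern are invented for illustration) is to launch a couple of dummy cogs that just hammer the hub while the test transfers run:

```spin2
VAR
  long loadstack[128]             ' two 64-long stacks for the dummy load cogs
  long scratch[256]               ' hub buffer the load cogs churn through

PUB start_load_cogs() | i
  ' launch two cogs that do nothing but hub accesses, so the calibration
  ' transfers see some realistic hub contention instead of an idle system
  repeat i from 0 to 1
    cogspin(NEWCOG, hub_churn(), @loadstack + i*64*4)

PRI hub_churn() | i, x
  repeat                          ' run until the cog is stopped externally
    repeat i from 0 to 255
      x := long[@scratch][i]      ' read a hub long...
      long[@scratch][i] := x + 1  ' ...and write it back
```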
Previous experiments with using the Schmitt mode on PSRAM didn't go very well at all for me.
Yeah, it's an ugly problem to resolve perfectly - if that is even possible. Writing special code to measure it all under load and auto-configure everything is not fun either, but it's probably the best way to do it.
Everything can be predetermined except the rxlag value. And that can be quickly calibrated at init. The SD mode driver deals with all this already by reading back a known pattern repeatedly, adjusting rxlag into the timing centre spot.
PS: The posted psram_qpi.spin2 object doesn't attempt this. It wasn't needed for the testing since the testing was only ever running a scan of frequencies and lag settings. I'll add it ...
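For reference, the read-back-a-known-pattern idea might look roughly like this (setRxLag() and read_test() below are placeholders standing in for whatever the driver exposes, not the actual psram_qpi.spin2 interface, and MAX_LAG is an invented bound):

```spin2
CON
  MAX_LAG = 15                                  ' invented upper bound for the lag scan

PUB calibrate_rxlag() : best | lag, first, last
  ' scan rxlag while reading back a known pattern, then settle on the
  ' centre of the passing window (assumes the passing lags are contiguous)
  first := -1
  last := -1
  repeat lag from 0 to MAX_LAG
    setRxLag(lag)
    if read_test()                              ' did the known pattern read back correctly?
      if first == -1
        first := lag
      last := lag
  if first == -1
    abort                                       ' no working lag at this sysclock
  best := (first + last) / 2                    ' middle of the good window
  setRxLag(best)

PRI setRxLag(lag)
  ' placeholder: would program the driver's read-delay / pin-registration setting here

PRI read_test() : ok
  ' placeholder: would write a known pattern once, read it back repeatedly,
  ' and return true only if every read matched
  ok := true
```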
@evanh said:
Everything can be predetermined except the rxlag value. And that can be quickly calibrated at init. The SD mode driver deals with all this already by reading back a known pattern repeatedly, adjusting rxlag into the timing centre spot.
It works for the SD driver because that's backed up by the CRC check. Read again what I just wrote:
@Wuerfel_21 said:
Just testing different settings at the current clockfreq doesn't work IME: you can hit marginal settings that almost work and will not error out within a reasonable number of test transfers. You need tests at adjacent frequencies to characterize it properly. Tests where only 1 cog is active are similarly invalid; the whole thing becomes more precarious under load.
@Wuerfel_21 said:
It works for the SD driver because that's backed up by the CRC check. Read again what I just wrote:
Auto-calibrated won't be any worse than the existing empirical preset value. Just removes the need to guess.
PS: If the memory board is too rickety to handle a sysclock/2 divider then throw the board out.
PPS: If it fails on the Edge EC-32MB then we need to adapt. That can be our benchmark.
@evanh said:
Auto-calibrated won't be any worse than the existing empirical preset value. Just removes the need to guess.
No, because with manual settings you can run the burn-in test to let it cook for ~10 hours to be sure that one really works. And usually that translates to identical copies of the same board.
PS: If the memory board is too rickety to handle a sysclock/2 divider then throw the board out.
PPS: If it fails on the Edge EC-32MB then we need to adapt.
I first ran into that fun issue of settings that almost work, but then with more load they don't quite, and crash your thing after a few minutes, ON THE EC-32MB. It just is like that.
@Wuerfel_21 said:
I first ran into that fun issue of settings that almost work, but then with more load they don't quite, and crash your thing after a few minutes, ON THE EC-32MB. It just is like that.
Yeah, that can happen. A reboot will fix it. How robust are we wanting?
PS: Auto-calibrated still won't be any worse than empirical though. It can happily choose the longer edge of the measured good timings.
@Wuerfel_21 said:
I first ran into that fun issue of settings that almost work, but then with more load they don't quite, and crash your thing after a few minutes, ON THE EC-32MB. It just is like that.
Thermal effect?
Maybe? The particular incident I'm referring to is when I started adding audio to NeoYume (before even hooking up the ADPCM streaming!) and those 2 extra cogs pushed the PSRAM timing over the edge, where I needed to deal with the whole P_SYNC_IO thing to stop it from randomly crashing. This pre-dates any 3rd-party PSRAM boards (at least in my possession). Though IME there isn't a huge cold-start / warm-start difference as far as running boards in my basement goes (heating up the PSRAM chips, e.g. by exposing them to the sun, can cause errors, at least on that rickety-to-begin-with 96MB board; they don't usually get hot on their own though).
There's a related problem where the P2 cores can crash, and sometimes it's difficult to tell this apart from memory corruption. The burn-in tester I wrote is good at that. I figured out a way to waste a lot of power, and sub-standard boards (like a whole bunch of SimpleP2 board revisions I got from @Rayman) just instantly suffer a core crash on that program. No weird cyan video glitches, just instantly dead. Meanwhile memory errors just show up as a "FAIL".
I should also remind us of that one time I pointed a hair dryer at the EC-32MB (this was after the whole SYNC_IO thing got fixed and ADPCM got added):
Notice how the audio just stops? In hindsight I think that's actually the OPNB cog suffering a core crash. Notice how that happens before the screen stops updating (not sure if a 68000 ROM read got corrupted or the whole cog crashed; difficult to tell, see above). Those white bars are also very interesting; I don't know what causes them (beyond "the blitter is getting garbage sprite data").
@evanh said:
Yeah, that can happen. A reboot will fix it. How robust are we wanting?
PS: Auto-calibrated still won't be any worse than empirical though. It can happily choose the longer edge of the measured good timings.
I think you can do a good auto-calibration, but not without switching the clock speed around a bunch to feel out which of e.g. 2 candidate timings is better.
EDIT:
though this itself will need severe reliability testing to make sure it doesn't have flukes where it picks the bad timing at random.
So IMO it'd still be better to characterize the board once and then keep that info around.
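Something along those lines, as a very rough sketch of one-time characterization (the 4 MHz step, the 0..15 lag range, and the set_sysclock() / window_passes() helpers are all invented placeholders, not any existing driver's API): scan lag values at the target sysclock and at one frequency either side, and only accept a lag that passes at all three. The result could then be stored with the board's configuration rather than re-measured every boot.

```spin2
CON
  STEP_HZ = 4_000_000             ' example offset for the adjacent-frequency checks

PUB characterize(target_hz) : best | f, lag, ok
  ' look for a lag value that passes not only at the target sysclock but
  ' also just below and just above it, so a marginal "almost works" setting
  ' at the exact frequency gets rejected
  best := -1                      ' -1 = nothing survived all three frequencies
  repeat lag from 0 to 15
    ok := true
    repeat f from target_hz - STEP_HZ to target_hz + STEP_HZ step STEP_HZ
      set_sysclock(f)             ' placeholder for the actual clock switch
      if not window_passes(lag)   ' placeholder for repeated known-pattern reads under load
        ok := false
        quit
    if ok
      best := lag                 ' keep the largest surviving lag (the "longer edge" idea from above)
  set_sysclock(target_hz)         ' restore the intended operating frequency

PRI set_sysclock(freq)
  ' placeholder: would recompute the PLL mode and call clkset() here

PRI window_passes(lag) : good
  ' placeholder: would set the driver's lag value and run the test reads
  good := true
```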