@evanh said:
PS: That rxlag lsb comment is accurate. Roger does the same in his driver. The lsb of his lag compensation "delay" is used to enable/disable the data pin registration (pin sync, he calls it). This provides a small but effective delay-line effect, shifting the phase timing minutely (something like 0.5 nanoseconds), which gives more sampling options for finely adjusting the receive of incoming data.
But registering imposes a whole sysclock tick of latency on the data pin on top of the phase shift. So that has to be accounted for as it gets set/unset.
Yep, it's complicated, and in an ideal world it'd be great to hide much of this complexity where possible. I ended up settling on a single delay value which incorporates the half step (registered data pins) as the LS bit, to hide much of this from the user, and I deal with the extra latency internally in the driver. But it's all unavoidable when writing your own code unfortunately; it'll come back to rear its ugly head again for you.
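As a minimal sketch of that kind of packing (the exact bit layout here is assumed for illustration, not taken from Roger's driver): the LS bit selects registered data-pin input, the remaining bits give whole sysclock ticks, and the registration's extra tick is handled inside the decode.

```spin2
PUB decode_delay(delay) : clocks, registered
  ' Hypothetical decode of a combined read-delay value:
  '   bit 0      -> registered (P_SYNC_IO style) data-pin input, the half-step phase shift
  '   bits 31..1 -> whole sysclock ticks of read latency
  registered := delay & 1
  clocks := delay >> 1
  if registered
    clocks++                      ' registration adds one full sysclock tick on top of the phase shift
```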
Writes are not typically as complicated, as the clock and data can go out together (on the same port), so the phase relationship can remain locked. But reads involve a board-dependent (and temperature-variable) round-trip delay from the time the clock edge is output by the P2 to the sampling of the returned data at the P2, which the P2 can only sample some integer number of clocks later. Thankfully the P2's other pin settings, such as Schmitt trigger and registered input or output, can tweak things even finer than the integer clock count; otherwise we'd be limited to slower-clocked read operations where the sampling point can be more safely centred between data transitions.
What I was talking about is modeling the system to the extent that the correct latency cycles / pin modes for a particular board at a given clock frequency can be determined without having to trial-and-error it (and of course figure out how to translate that to configuration variables of different drivers -> this is the easy part). If the relevant measurement series can be compressed into a few hundred ms and a few K of code, I think it would even be viable to do this on application startup. Just testing different settings at the current clockfreq doesn't work IME: you can hit marginal settings that almost work and will not error out within a reasonable number of test transfers. You need tests at adjacent frequencies to characterize it properly. Tests where only 1 cog is active are similarly invalid; the whole thing becomes more precarious under load.
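To make a quick calibration pass see something like real load, one rough option (purely a sketch; the cog count, buffer size and access pattern are invented for illustration) is to launch a couple of dummy cogs that just hammer the hub while the test transfers run:

```spin2
VAR
  long loadstack[128]             ' two 64-long stacks for the dummy load cogs
  long scratch[256]               ' hub buffer the load cogs churn through

PUB start_load_cogs() | i
  ' launch two cogs that do nothing but hub accesses, so the calibration
  ' transfers see some realistic hub contention instead of an idle system
  repeat i from 0 to 1
    cogspin(NEWCOG, hub_churn(), @loadstack + i*64*4)

PRI hub_churn() | i, x
  repeat                          ' run until the cog is stopped externally
    repeat i from 0 to 255
      x := long[@scratch][i]      ' read a hub long...
      long[@scratch][i] := x + 1  ' ...and write it back
```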
Previous experiments with using the Schmitt mode on PSRAM didn't go very well at all for me.
Yeah, it's an ugly problem to resolve perfectly - if that is even possible. Writing special code to measure it all under load and auto-configure everything is not fun either, but it's probably the best way to do it.
Everything can be predetermined except the rxlag value. And that can be quickly calibrated at init. The SD mode driver deals with all this already by reading back a known pattern repeatedly, adjusting rxlag into the timing centre spot.
PS: The posted psram_qpi.spin2 object doesn't attempt this. It wasn't needed for the testing since the testing was only ever running a scan of frequencies and lag settings. I'll add it ...
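For reference, the read-back-a-known-pattern idea might look roughly like this (setRxLag() and read_test() below are placeholders standing in for whatever the driver exposes, not the actual psram_qpi.spin2 interface, and MAX_LAG is an invented bound):

```spin2
CON
  MAX_LAG = 15                                  ' invented upper bound for the lag scan

PUB calibrate_rxlag() : best | lag, first, last
  ' scan rxlag while reading back a known pattern, then settle on the
  ' centre of the passing window (assumes the passing lags are contiguous)
  first := -1
  last := -1
  repeat lag from 0 to MAX_LAG
    setRxLag(lag)
    if read_test()                              ' did the known pattern read back correctly?
      if first == -1
        first := lag
      last := lag
  if first == -1
    abort                                       ' no working lag at this sysclock
  best := (first + last) / 2                    ' middle of the good window
  setRxLag(best)

PRI setRxLag(lag)
  ' placeholder: would program the driver's read-delay / pin-registration setting here

PRI read_test() : ok
  ' placeholder: would write a known pattern once, read it back repeatedly,
  ' and return true only if every read matched
  ok := true
```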
@evanh said:
Everything can be predetermined except the rxlag value. And that can be quickly calibrated at init. The SD mode driver deals with all this already by reading back a known pattern repeatedly, adjusting rxlag into the timing centre spot.
It works for the SD driver because that's backed up by the CRC check. Read again what I just wrote:
@Wuerfel_21 said:
Just testing different settings at the current clockfreq doesn't work IME: you can hit marginal settings that almost work and will not error out within a reasonable number of test transfers. You need tests at adjacent frequencies to characterize it properly. Tests where only 1 cog is active are similarly invalid; the whole thing becomes more precarious under load.
@Wuerfel_21 said:
It works for the SD driver because that's backed up by the CRC check. Read again what I just wrote:
Auto-calibrated won't be any worse than the existing empirical preset value. Just removes the need to guess.
PS: If the memory board is too rickety to handle a sysclock/2 divider then throw the board out.
PPS: If it fails on the Edge EC-32MB then we need to adapt. That can be our benchmark.
@evanh said:
Auto-calibrated won't be any worse than the existing empirical preset value. Just removes the need to guess.
No, because with manual settings you can run the burn-in test to let it cook for ~10 hours to be sure that one really works. And usually that translates to identical copies of the same board.
PS: If the memory board is too rickety to handle a sysclock/2 divider then throw the board out.
PPS: If it fails on the Edge EC-32MB then we need to adapt.
I first ran into that fun issue of settings that almost work, but then with more load they don't quite, and crash your thing after a few minutes, ON THE EC-32MB. It just is like that.
@Wuerfel_21 said:
I first ran into that fun issue of settings that almost work, but then with more load they don't quite, and crash your thing after a few minutes, ON THE EC-32MB. It just is like that.
Yeah, that can happen. A reboot will fix it. How robust are we wanting?
PS: Auto-calibrated still won't be any worse than empirical though. It can happily choose the longer edge of the measured good timings.
@Wuerfel_21 said:
I first ran into that fun issue of settings that almost work, but then with more load they don't quite, and crash your thing after a few minutes, ON THE EC-32MB. It just is like that.
Thermal effect?
Maybe? The particular incident I'm referring to is when I started adding audio to NeoYume (before even hooking up the ADPCM streaming!) and those 2 extra cogs pushed the PSRAM timing over the edge, where I needed to deal with the whole P_SYNC_IO thing to stop it from randomly crashing. This pre-dates any 3rd-party PSRAM boards (at least in my possession). Though IME there isn't a huge cold-start / warm-start difference as far as running boards in my basement goes (heating up the PSRAM chips, e.g. by exposing them to the sun, can cause errors, at least on that rickety-to-begin-with 96MB board; they don't usually get hot on their own though).
There's a related problem where the P2 cores can crash, and sometimes it's difficult to tell this apart from memory corruption. The burn-in tester I wrote is good at that. I figured out a way to waste a lot of power, and sub-standard boards (like a whole bunch of SimpleP2 board revisions I got from @Rayman) just instantly suffer a core crash on that program. No weird cyan video glitches, just instantly dead. Meanwhile memory errors just show up as a "FAIL".
I should also remind us of that one time I pointed a hair dryer at the EC-32MB (this was after the whole SYNC_IO thing got fixed and ADPCM got added):
Notice how the audio just stops? In hindsight I think that's actually the OPNB cog suffering a core crash. Notice how that happens before the screen stops updating (not sure if a 68000 ROM read got corrupted or the whole cog crashed; difficult to tell, see above). Those white bars are also very interesting; I don't know what causes them (beyond "the blitter is getting garbage sprite data").
@evanh said:
Yeah, that can happen. A reboot will fix it. How robust are we wanting?
PS: Auto-calibrated still won't be any worse than empirical though. It can happily choose the longer edge of the measured good timings.
I think you can do a good auto-calibration, but not without switching the clock speed around a bunch to feel out which of e.g. 2 candidate timings is better.
EDIT:
though this itself will need severe reliability testing to make sure it doesn't have flukes where it picks the bad timing at random.
So IMO it'd still be better to characterize the board once and then keep that info around.
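Something along those lines, as a very rough sketch of one-time characterization (the 4 MHz step, the 0..15 lag range, and the set_sysclock() / window_passes() helpers are all invented placeholders, not any existing driver's API): scan lag values at the target sysclock and at one frequency either side, and only accept a lag that passes at all three. The result could then be stored with the board's configuration rather than re-measured every boot.

```spin2
CON
  STEP_HZ = 4_000_000             ' example offset for the adjacent-frequency checks

PUB characterize(target_hz) : best | f, lag, ok
  ' look for a lag value that passes not only at the target sysclock but
  ' also just below and just above it, so a marginal "almost works" setting
  ' at the exact frequency gets rejected
  best := -1                      ' -1 = nothing survived all three frequencies
  repeat lag from 0 to 15
    ok := true
    repeat f from target_hz - STEP_HZ to target_hz + STEP_HZ step STEP_HZ
      set_sysclock(f)             ' placeholder for the actual clock switch
      if not window_passes(lag)   ' placeholder for repeated known-pattern reads under load
        ok := false
        quit
    if ok
      best := lag                 ' keep the largest surviving lag (the "longer edge" idea from above)
  set_sysclock(target_hz)         ' restore the intended operating frequency

PRI set_sysclock(freq)
  ' placeholder: would recompute the PLL mode and call clkset() here

PRI window_passes(lag) : good
  ' placeholder: would set the driver's lag value and run the test reads
  good := true
```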