[ Moved from emulation topic - https://forums.parallax.com/discussion/comment/1539903/#Comment_1539903 ]
Maybe optimistic there, Rayman. The inner loop runs at sysclock/14:
drvl pinClk '2
waitx #2 '8
drvh pinClk '10
wfbyte inb '14
28 MBytes/s, without overheads, would need a sysclock of 411 MHz.
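As a rough check of that figure, here is a minimal Python sketch of the arithmetic. It assumes the loop moves one byte per pass (14 sysclocks per byte) and that "28 MBytes/s" is meant in binary megabytes (28 × 2^20 bytes/s), since that is the reading that reproduces 411 MHz. The function name is made up for illustration.

```python
MIB = 1 << 20  # binary megabyte; assumed unit behind the 411 MHz figure

def required_sysclock_hz(bytes_per_second, sysclocks_per_byte):
    """Sysclock needed to sustain a byte rate, ignoring all overheads."""
    return bytes_per_second * sysclocks_per_byte

# 28 MBytes/s through a sysclock/14 loop, both unit interpretations
binary_mhz = required_sysclock_hz(28 * MIB, 14) / 1e6
decimal_mhz = required_sysclock_hz(28_000_000, 14) / 1e6

print(f"binary megabytes:  {binary_mhz:.1f} MHz")   # 411.0 MHz
print(f"decimal megabytes: {decimal_mhz:.1f} MHz")  # 392.0 MHz
```

Read as decimal megabytes, the same loop would need about 392 MHz, so the conclusion is the same either way: well beyond a 300 MHz sysclock.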
BTW: It wouldn't take much to tighten that loop to sysclock/8. Something like:
drvl pinClk '2
drvh pinClk '6
drvl pinClk '10
drvh pinClk '14
wfbyte inb '16
Hmm... Been a while, not sure how I messed that up...
Maybe I never posted the fast version...
Just looked and the inner assembly loop is like this:
Ah, that's writing the flash, not reading. Err, no. But that won't work as is: there's not enough lag compensation. Or it might just work, at the slowest sysclocks.
Oh, I see what's going on: you're clocking 513 bytes, which likely doesn't hurt. It starts with this:
testp pinBase wc
if_nc jmp #DoReadBlock3
That samples the rx pin while it still holds the old data byte. The new data byte appears after the loop exits, and is in turn picked up by your posted BlockByteLoop3 loop. That loop continues picking up the old/prior bytes, same as my suggestion above, but it clocks one too many as a result.
A comment in the code says:
' 04JUN20: Multi-block read speed up to 28 MB/s with 300 MHz clock
Yep, sysclock/10 is close to best. You should be able to do this:
drvl pinClk ' 8, low going clocks out from flash
nop '2 10
drvh pinClk '4 12
wfbyte inb '6 14
Doing any better than sysclock/8 requires frequency calibration, like what gets done with the PSRAMs.
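To put the dividers mentioned in this thread into numbers, here is a small Python sketch of the throughput each would give at a 300 MHz sysclock. It assumes one byte per loop pass and no overheads. The function name is hypothetical, and the MiB column is there only because the quoted "28 MB/s" may be in binary megabytes.

```python
MIB = 1 << 20  # binary megabyte, for comparison with the quoted 28 MB/s

def throughput_bytes_per_s(sysclock_hz, sysclocks_per_byte):
    """Raw byte rate of a loop that moves one byte every N sysclocks."""
    return sysclock_hz / sysclocks_per_byte

SYSCLOCK_HZ = 300_000_000  # the 300 MHz clock from the quoted code comment

for divider in (14, 10, 8):
    rate = throughput_bytes_per_s(SYSCLOCK_HZ, divider)
    print(f"sysclock/{divider}: {rate / 1e6:.1f} MB/s ({rate / MIB:.1f} MiB/s)")
```

Under those assumptions, sysclock/10 at 300 MHz comes out at 30 MB/s, or roughly 28.6 MiB/s, which lines up with the quoted code comment.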