Macca,
Just give that alternate code a whirl; it's in the readSector() function. Adjusting align_delay to suit would be a good idea too. Earlier in readSector(), change it to align_delay := spi_period + 3.
That'll hopefully fix the MBR reading and tell me I'm targeting the right part.
@evanh, careful, new driver coming with dynamic calculation of align_delay to improve our margin. Fixed a number of other things you pointed out as well... stand by... we're an hour or two away from having broad fixes, fully regression-tested, with improved margins for older devices.
Stephen, I got another one for you. Since your minimum divider is 8, this gives plenty of timing leeway. align_delay isn't even needed for block writes:
' Streamer TX with deterministic clock alignment:
ORG
RDFAST #0, p_buf ' Setup RDFAST from hub buffer
SETXFRQ xfrq ' Set streamer bit rate
FLTL _sck ' Reset SCK: counter stops, output LOW, Y=0
XINIT stream_mode, #0 ' Start streamer; NCO ramps from zero
DIRH _sck ' Re-enable: DIR=1 restarts base period counter fresh
WYPIN clk_count, _sck ' Start clock - deterministic phase from dirh
WAITXFI ' Wait for streamer to complete
END
@evanh said:
Macca,
Just give that alternate code a whirl; it's in the readSector() function. Adjusting align_delay to suit would be a good idea too. Earlier in readSector(), change it to align_delay := spi_period + 3.
That'll hopefully fix the MBR reading and tell me I'm targeting the right part.
Tried that, same result. As shown in the debug output, the problem is "[readSector] TIMEOUT waiting for start token" which, AFAIK, is outside the streamer block.
If I comment out the speed adjustment in initCard, it can read the sector but then can't write; or rather, the SD card receives garbage: the card is apparently mounted, but the MBR is wiped and it can't be mounted anymore (error -22 = E_NOT_FAT32). I also tried moving the RDFAST to the top of the instruction block, with no change.
Since I don't care about performance, I would be happy if it works at the default initial speed (whatever it is).
I'll wait for Stephen's update.
@macca -- v1.5.1 should give your 1GB SDSC much better timing margin at low system clocks. Here is what we improved and the two tests we'd like you to run.
What we improved
Better SCK alignment in bulk-sector reads. We restructured the inline-PASM block in readSector so the first SCK pulse now lands on schedule for every system clock from 200 MHz up. The streamer's sampling now aligns precisely with the card's actual data output. This is the change most likely to fix your sector-0 reads at a sysclk of 250 MHz.
Adaptive MISO sampling in byte-by-byte transfers. The driver now adapts its MISO sampling based on the system-clock-to-SCK ratio. At higher system clocks (≥255 MHz, 25 MHz SPI target), it samples on the edge as before. At lower system clocks, it samples slightly before the edge, giving slower cards more trailing-edge margin. For your card at sysclk=250 MHz, this shifts the sample point by roughly 30% of the bit cell, providing meaningful headroom. Fully internal, no API changes.
Init-sequence pin-setup hygiene. Small reorder to reset the MISO pin before its mode is configured. Eliminates an edge case on re-mount paths.
The write streamer block was not changed in v1.5.1; the analogous reorder is queued for v1.5.2.
Tests to run (build each with _CLKFREQ = 250_000_000)
regression-tests/SD_RT_mount_tests.spin2 -- proves the read-path improvements work on your card. Exercises the mount command sequence, reads every sector, and remount cycles. Read-only.
regression-tests/SD_RT_raw_sector_tests.spin2 -- proves the streamer write path works on your card. This test determines whether you need v1.5.2's write-side reorder or if v1.5.1 is sufficient.
UTILS/SD_card_characterize.spin2 -- full register dump from your card. Send us the output, especially the raw CSD bytes and the [USED] TAAC / [USED] NSAC lines, so we can verify our timing model against your specific card.
What the results tell us
Both tests pass → v1.5.1 is sufficient. Send us the register dump, and you're done.
Mount test passes, raw sector test fails → v1.5.2 will include the streamer-write reorder. Send us the failing output.
Mount test fails → reach out; we'll send a small diagnostic tool separately to find the right tuning for your card.
@"Stephen Moraco" said: regression-tests/SD_RT_mount_tests.spin2 -- proves the read-path improvements work on your card. Exercises the mount command sequence, reads every sector, and remount cycles. Read-only.
Cog0 INIT $0000_0000 $0000_0000 load
Cog0 INIT $0000_0F64 $0000_7C58 jump
Cog0
Cog0 ==============================================
Cog0 SD Card Driver - Mount/Unmount Tests
Cog0 ==============================================
Cog0 * init VAR
Cog0
Cog0
Cog0 ------------------------------------------------------------
Cog0 * Test Group: Operations before mount should fail gracefully
Cog0
Cog0 * Test #1: openFileRead() before mount
Cog0 openFileRead(): 0
Cog0 -> pass
Cog0
Cog0 * Test #2: createFileNew() before mount
Cog0 createFileNew(): 0
Cog0 -> pass
Cog0
Cog0 * Test #3: freeSpace() before mount
Cog0 freeSpace(): 0
Cog0 -> pass
Cog0
Cog0
Cog0 ------------------------------------------------------------
Cog0 * Test Group: Pre-Mount Error Validation
Cog0
Cog0 * Test #4: openFileRead() before mount returns E_NOT_MOUNTED
Cog0 openFileRead(): -20
Cog0 -> pass
Cog0
Cog0 * Test #5: createFileNew() before mount returns E_NOT_MOUNTED
Cog0 createFileNew(): -20
Cog0 -> pass
Cog0
Cog0 * Test #6: readSectorRaw() before mount returns error
Cog0 readSectorRaw(): -20
Cog0 -> pass
Cog0
Cog0 * Test #7: changeDirectory() before mount returns error
Cog0 changeDirectory(): -20
Cog0 -> pass
Cog0
Cog0
Cog0 ------------------------------------------------------------
Cog0 * Test Group: Pin offset validation (B-input ±3 limit)
Cog0
Cog0 * Test #8: mount() with SCK too far from MOSI (offset=+5)
Cog0 mount(bad pins): -9
Cog0 -> pass
Cog0
Cog0 * Test #9: mount() with SCK too far from MISO (offset=-4)
Cog0 mount(bad pins): -9
Cog0 -> pass
Cog0
Cog0
Cog0 ------------------------------------------------------------
Cog0 * Test Group: Mount SD card
Cog0
Cog0 * Test #10: mount() with valid pins
Cog1 INIT $0000_0F64 $0000_29F6 jump
Cog0 mount(): -7 (expected 0)
Cog0 -> FAIL
Cog0
Cog0 DIAG: error()=-7
Cog0 DIAG: lastCMD13=$$8 lastCMD13Error=$$8
Cog0 DIAG: CRC match=0 mismatch=0 retry=0
Cog0 DIAG: recvCRC=$$0 calcCRC=$$0
Cog0 DIAG: readSectorRaw(0) FAILED
Cog0
Cog0 * Test #11: Verify card is accessible - get volume label
Cog0 volumeLabel() pointer: 13_323
Cog0 -> pass
Cog0
Cog0 * Test #12: Verify free space is reasonable
Cog0 freeSpace(): 0 (expected 1 to 2_147_483_647)
Cog0 -> FAIL
Cog0
Cog0 Card Info:
Cog0 Volume Label:
Cog0 Free Space: 0 sectors (0 MB)
Cog0
Cog0
Cog0 ------------------------------------------------------------
Cog0 * Test Group: Unmount SD card
Cog0
Cog0 * Test #13: unmount() card
Cog0 unmount(): 0
Cog0 -> pass
Cog0
Cog0 * Test #14: Operations after unmount should fail
Cog0 freeSpace() after unmount: 0
Cog0 -> pass
Cog0
Cog0
Cog0 ------------------------------------------------------------
Cog0 * Test Group: Remount SD card
Cog0
Cog0 * Test #15: mount() again
Cog0 mount() second time: -8 (expected 0)
Cog0 -> FAIL
Cog0 DIAG2: error()=-8
Cog0 DIAG2: lastCMD13=$$0 lastCMD13Error=$$8
Cog0 DIAG2: CRC match=0 mismatch=0 retry=0
Cog0
Cog0 * Test #16: Card still accessible after remount
Cog0 freeSpace() after remount: 0 (expected 1 to 2_147_483_647)
Cog0 -> FAIL
Cog0
Cog0
Cog0 ------------------------------------------------------------
Cog0 * Test Group: Multiple mount/unmount cycles
Cog0
Cog0 * Test #17: unmount()
Cog0 unmount(): 0
Cog0 -> pass
Cog0
Cog0 * Test #18: mount() cycle 1
Cog0 mount(): -8 (expected 0)
Cog0 -> FAIL
Cog0
Cog0 * Test #19: unmount()
Cog0 unmount(): 0
Cog0 -> pass
Cog0
Cog0 * Test #20: mount() cycle 2
Cog0 mount(): -8 (expected 0)
Cog0 -> FAIL
Cog0
Cog0 * Test #21: unmount()
Cog0 unmount(): 0
Cog0 -> pass
Cog0
Cog0 * Test #22: mount() cycle 3
Cog0 mount(): -8 (expected 0)
Cog0 -> FAIL
Cog0
Cog0 * Test #23: Card still works after 3 mount/unmount cycles
Cog0 freeSpace(): 0 (expected 1 to 2_147_483_647)
Cog0 -> FAIL
Cog0
Cog0
Cog0 ------------------------------------------------------------
Cog0 * Test Group: Post-Unmount State
Cog0
Cog0 * Test #24: openFileRead() after unmount returns E_NOT_MOUNTED
Cog0 openFileRead(): -20
Cog0 -> pass
Cog0
Cog0 * Test #25: createFileNew() after unmount returns E_NOT_MOUNTED
Cog0 createFileNew(): -20
Cog0 -> pass
Cog0
Cog0 * Test #26: changeDirectory() after unmount returns E_NOT_MOUNTED
Cog0 changeDirectory(): -20
Cog0 -> pass
Cog0
Cog0 * Test #27: Remount after unmount succeeds
Cog0 mount() after unmount: -8 (expected 0)
Cog0 -> FAIL
Cog0
Cog0 * Test #28: freeSpace() works after remount
Cog0 freeSpace() after remount: 0 (expected 1 to 2_147_483_647)
Cog0 -> FAIL
Cog0
Cog0
Cog0 ------------------------------------------------------------
Cog0 * Test Group: Double-Mount Behavior
Cog0
Cog0 * Test #29: Double mount returns SUCCESS
Cog0 Sub-Test: mount() again
Cog0 Value: -8 (expected 0)
Cog0 -> Sub-FAIL
Cog0 Sub-Test: freeSpace() valid
Cog0 Value: 0 (expected 4_294_967_295)
Cog0 -> Sub-FAIL
Cog0 Sub-Test Results: count=1, Pass: 0, Fail: 1
Cog0 -> FAIL
Cog0
Cog0 * Test #30: Double mount preserves open file handle
Cog0
Cog0 * Test #31: Double mount preserves working directory
Cog0 Sub-Test: mount() in subdir
Cog0 Value: -8 (expected 0)
Cog0 -> Sub-FAIL
Cog0 Sub-Test: file in subdir
Cog0 Value: 0 (expected 4_294_967_295)
Cog0 -> Sub-FAIL
Cog0 Sub-Test Results: count=1, Pass: 0, Fail: 1
Cog0 -> FAIL
Cog0
Cog0 * Worker cog stack: 108 of 160 longs used
Cog0
Cog0 ============================================================
Cog0 * 31 Tests - Pass: 18, Fail: 12
Cog0 * BAD TEST COUNTS: 31 <> 30 (missing 1 tests)
Cog0 ============================================================
Cog0
Cog0 * Mount/Unmount Tests Complete
Cog0 END_SESSION
regression-tests/SD_RT_raw_sector_tests.spin2 -- proves the streamer write path works on your card. This test determines whether you need v1.5.2's write-side reorder or if v1.5.1 is sufficient.
UTILS/SD_card_characterize.spin2 -- full register dump from your card. Send us the output, especially the raw CSD bytes and the [USED] TAAC / [USED] NSAC lines, so we can verify our timing model against your specific card.
@macca said:
Tried that, same result. As shown in the debug output, the problem is "[readSector] TIMEOUT waiting for start token" which, AFAIK, is outside the streamer block.
@evanh said:
Macca,
Have you tested that card using Flexspin's FAT filesystem and drivers? Do you have a 4-bit SD slot?
No and no (using the P2 Edge built-in slot).
I have other implementations, but I think they all use bit-banged SPI (e.g. fsrw with sdspi_asm_mb2), and as far as I can tell they have no problems at any clock frequency.
Just tested sd_dir.bas from FlexProp and it works (I don't know which SPI implementation it uses or the P2 clock setting; the release is 7.6.2).
@macca said:
Just tested sd_dir.bas from FlexProp and it works (I don't know which SPI implementation it uses or the P2 clock setting; the release is 7.6.2).
The driver will be sdmm.cc. It uses smartpins on the Prop2 and pre-calculates the divider at init. There are no streamer ops in that driver. However, it is obviously handling the read-block start bit without issue.
Here's the relevant function:
static
int rcvr_datablock (    /* 1:OK, 0:Failed */
    BYTE *buff,         /* Data buffer to store received data */
    UINT btr            /* Byte count */
)
{
    BYTE *d = __builtin_alloca(2);
    UINT tmr, tmout;
    tmr = _cnt();
    tmout = _clockfreq() >> 3;  // 125 ms timeout
    for(;;) {
        rcvr_mmc( d, 1 );
        if( d[0] != 0xFF ) break;
        if( _cnt() - tmr >= tmout ) break;
    }
    if (d[0] != 0xFE) return 0;     /* If not valid data token, return with error */
    rcvr_mmc(buff, btr);            /* Receive the data block into buffer */
    rcvr_mmc(d, 2);                 /* Discard CRC */
    return 1;                       /* Return with success */
}
Ah, I just noticed your card is SDSC rather than SDHC. I've got one such card sitting in my digital camera ... and surprise, Stephen's code fails to read that MBR too, but not at the start bit.
EDIT2: Right, okay, it works now that I've reformatted to FAT32. But, boy, it's slow writing one block at a time (CMD24): 14 kB/s.
EDIT3: It definitely goes a lot faster using long bursts of CMD25. The 4-bit driver gets 1200 kB/s writing to FAT32 and about 8000 kB/s writing to FAT16.
@evanh said:
send_cmd() is a little complicated ...
Back to the issue: there is a notable difference between sdmm.cc's command handling and how Stephen's code works. In sdmm.cc, Chip Select (CS) management is done entirely at the beginning of the routine. Namely, it always deselects before reselecting the card, with one exception: if the command is CMD12, it skips the CS management. Stephen's code, by contrast, attempts to manage CS in a more start-to-finish approach.
Just as a guess, there is a reasonable chance that CS has stayed selected (Low) when it should have toggled High then Low before the CMD17 block read. This would mean the SD card has not seen the command. I have no idea how this would occur for only one card though.
I can't really fault the choice of divider values either. I used a wider range of values in sdmm.cc than Stephen's driver uses, but at the higher frequencies, from sysclock 200 MHz and above, both are pretty similar. It's about as fast as Parallax's resistor-limited boot slot can run.
@macca, thank you for the logs. These are very useful results. I'm seeing a number of things of interest. This is the first SDSC card we've seen; all the others I tested are SDHC/SDXC. Yours is failing at 25 MHz SPI, which is the stated maximum for your card, so derating for SDSC cards seems prudent. We don't have evidence yet, but I'm preparing quick tests for you to run, which will give us more insight across many areas, not just SPI speed. v1.5.2 is coming along with a new test for you to run. (Couple of hours yet...)
Comments
Don't know if it helps, but I seem to recall a delay in FSRW that depends on clock speed...
Yeah, testing indicates the big write-up I had planned isn't needed. At least not for these larger dividers anyway.
@evanh I'm testing this next. Good find. After passing regression tests, it should ship with v1.5.2.
Doesn't look very good...
Oh, good point. Hmm ...
It looks functionally the same as Stephen's code.
The only thing of significance prior to that is:
cmd = count > 1 ? CMD18 : CMD17;    /* READ_MULTIPLE_BLOCK : READ_SINGLE_BLOCK */
if (send_cmd(cmd, sect) == 0) {
send_cmd() is a little complicated ...
EDIT: Oh, Doh! It's FAT16, not FAT32.
@macca evaluating your results... more soon
@"Stephen Moraco" this is the problem of supporting a new uSD driver…
The 4-bit driver isn't limited in that way.