New SD mode P2 accessory board - Page 34 — Parallax Forums


  • evanhevanh Posts: 16,813
    edited 2025-05-01 01:11

    @rogloh said:
    Same driver, different SD cards on different IO cmd/clk/data pins, right? Not actually sharing the same SD data bus (which I believe is theoretically possible in SD transfer mode); as that case seems like too much complexity.

    Yes, two separate SD pin sets for two separate cards. That's how it is right now too. I have your Eval add-on uSD board at basepin 16 and a hand-wired full-sized SD slot I made at basepin 40.

    The driver has no support for sharing the SD bus. It never deselects the card. EDIT: Err, it has to deselect to perform a CMD10 (SEND_CID) when it runs a rxlag calibration cycle.

    On that note, I'd very much like to come up with a solution for using block reads instead of CMD10 to do the calibration with. The problem is there's no way to be sure the data blocks being read aren't just all zeros or all ones. I think I'd need to write data to the card storage.
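
The concern about block reads can be made concrete with a small check. This is only a sketch: `block_is_useful_pattern()` is a hypothetical helper that is not part of sdsd.cc, and the one-eighth thresholds are arbitrary assumptions. It rejects a block that is nearly all zeros or all ones, or that has too few bit transitions to reveal a sampling error.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of sdsd.cc: decides whether a data block
 * has enough bit variety to be worth using as a calibration pattern.
 * The 1/8 thresholds are illustrative assumptions only. */
int block_is_useful_pattern(const uint8_t *blk, size_t len)
{
    size_t ones = 0, transitions = 0;
    size_t total_bits = len * 8;
    int prev = -1;

    if (len == 0) return 0;

    for (size_t i = 0; i < len; i++) {
        for (int b = 7; b >= 0; b--) {          /* walk each byte MSB first */
            int bit = (blk[i] >> b) & 1;
            ones += (size_t)bit;
            if (prev >= 0 && bit != prev) transitions++;
            prev = bit;
        }
    }
    /* Reject blocks that are nearly all zeros or nearly all ones ... */
    if (ones < total_bits / 8 || ones > total_bits - total_bits / 8) return 0;
    /* ... and blocks with too few edges to expose a sampling-phase error. */
    return transitions >= total_bits / 8;
}
```

An erased or never-written block would fail this check, which is why writing known data to the card first looks necessary.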

  • evanhevanh Posts: 16,813
    edited 2025-05-01 01:55

    @evanh said:
    On that note, I'd very much like to come up with a solution for using block reads instead of CMD10 to do the calibration with. The problem is there's no way to be sure the data blocks being read aren't just all zeros or all ones. I think I'd need to write data to the card storage.

    Oh, that's right, I did have an idea to attempt engaging 1.8 Volt UHS interface and see how the card handles the Prop2 staying 3.3 Volts ... if that somehow works then I can use UHS's dedicated CMD19 (SEND_TUNING_BLOCK) which uses the DAT pins.

    EDIT: Nah, it'll be a bust. The Vdd supply stays at 3.3 V after UHS switchover. Which means, at the very least, the card's signalling will be too low for the Prop2 inputs at speed.

  • roglohrogloh Posts: 6,090

    Yeah pity. That tuning block command looked useful otherwise. From my memory of old discussions, the pin comparator is slower than the streamer so it may not work at high speed. Still might be worth a quick look though to help measure latency somehow. There may still be some residual correlation between optimal read delay and the response time of the comparator to a known tuning pattern even if its initial delayed response is large due to its own bandwidth limitations. Perhaps try it anyway?

  • evanhevanh Posts: 16,813

    That'd be a dog's breakfast. It can't operate at full speed because of the comparator's speed limit, so it's just projecting from something slower, and it then requires power cycling and reiniting after any calibration cycle is done. UHS mode, like SPI mode, can't be switched out of without a power cycle.

  • evanhevanh Posts: 16,813
    edited 2025-05-02 03:45

    And I doubt the reliability of any projected method anyway.

    PS: UHS would require bitDAC pin config for outputs as well. Otherwise the Prop2's 3.3 V outputs will likely lift the card's 1.8 V regulator voltage and cause a fault there.

    All those differences between calibrating and full speed operation would need a lot of careful behaviour mapping to make a projection from. The death knell is that different boards with different track lengths will redefine the mappings. And differences between SD cards will possibly impact it too.

  • roglohrogloh Posts: 6,090

    Ok. Sounds dubious at best then. Is there no fixed/known JEDEC-like structure somewhere on the SD card that can be read via block transfers?

  • evanhevanh Posts: 16,813
    edited 2025-05-02 05:35

    There are a couple but they're both basically empty structures. CMD10's CID structure was the best I found, with a decent mix of 1's and 0's.

    PS: The calibrator routine performs 12 x CMD10 for each dot, with 80 dots per rxlag setting, and 24 rxlag settings are tested. So a possible 24 x 80 x 12 = 23040 issuings of CMD10. In reality a lot less. Each per-dot group of 12 is checked for errors. When one occurs, that whole rxlag setting is aborted and flagged unsuitable, and the routine moves on to the next setting.

    PPS: CMD10 has a 136-bit (17-byte) response, including the CRC and framing.
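
The loop structure described above can be sketched as follows. This is not the driver's code: `calibrate_sketch()`, the `cmd10_check_fn` callback and the two `sim_*` stand-ins are hypothetical names, the rxlag range starting at 0 is an assumption, and picking the middle of the passing window is an assumption that merely agrees with the "rxlag=13 selected Lowest=11 Highest=15" line printed elsewhere in this thread.

```c
#include <stdint.h>

#define RXLAG_SETTINGS 24   /* rxlag settings tested (from the post) */
#define DOTS_PER_LAG   80   /* dots per rxlag setting */
#define CMDS_PER_DOT   12   /* CMD10 issuings per dot */

/* Hypothetical stand-in for "issue CMD10 and check the CID response". */
typedef int (*cmd10_check_fn)(int rxlag);

typedef struct {
    int lowest, highest, selected;   /* -1 when no setting passed */
    uint32_t cmd10_issued;           /* total CMD10 commands sent */
} cal_result;

/* Simulated cards, for illustration only. */
int sim_always_ok(int rxlag)    { (void)rxlag; return 1; }
int sim_window_11_15(int rxlag) { return rxlag >= 11 && rxlag <= 15; }

cal_result calibrate_sketch(cmd10_check_fn cmd10_ok)
{
    cal_result r = { -1, -1, -1, 0 };

    for (int lag = 0; lag < RXLAG_SETTINGS; lag++) {
        int good = 1;
        for (int dot = 0; dot < DOTS_PER_LAG && good; dot++) {
            int errors = 0;
            for (int n = 0; n < CMDS_PER_DOT; n++) {
                r.cmd10_issued++;
                if (!cmd10_ok(lag)) errors++;
            }
            if (errors) good = 0;    /* abort this rxlag, move to the next */
        }
        if (good) {                  /* assumes the passing settings are contiguous */
            if (r.lowest < 0) r.lowest = lag;
            r.highest = lag;
        }
    }
    if (r.lowest >= 0)
        r.selected = (r.lowest + r.highest) / 2;   /* assumed: middle of the window */
    return r;
}
```

With a simulated card that only passes at rxlag 11 to 15, the 19 failing settings cost 12 commands each and the 5 passing settings cost 960 each, 5028 in total, well under the 23040 worst case.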

  • evanhevanh Posts: 16,813
    edited 2025-05-03 14:12

    I've made another tester program, this time for file create and delete testing ...

    Using v1.4 of sdsd.cc

     Delete 200 files ... Duration 1050 ms, 190.47 files/s
       File size = 512 bytes
     Create 200 files ... Duration 1802 ms, 110.98 files/s
     Verify 200 files ... Duration 326 ms, 613.49 files/s
    

    Using v1.2 of sdsd.cc

     Delete 200 files ... Duration 1228 ms, 162.86 files/s
       File size = 512 bytes
     Create 200 files ... Duration 2393 ms, 83.57 files/s
     Verify 200 files ... Duration 502 ms, 398.40 files/s
    

    Using plug-in version of sdmm.cc

     Delete 200 files ... Duration 1568 ms, 127.55 files/s
       File size = 512 bytes
     Create 200 files ... Duration 3164 ms, 63.21 files/s
     Verify 200 files ... Duration 721 ms, 277.39 files/s
    

    And its sister sdmm_bashed.cc

     Delete 200 files ... Duration 1595 ms, 125.39 files/s
       File size = 512 bytes
     Create 200 files ... Duration 3237 ms, 61.78 files/s
     Verify 200 files ... Duration 741 ms, 269.90 files/s
    

    All testing was at 200 MHz sysclock and with a Sandisk Extreme 64 GB card

     CID decode:  ManID=03   OEMID=SD  Name=SN64G
      Ver=8.0   Serial=8ab989e1   Date=2021-2
     Speed Class = C10  UHS Grade = U3  Video Class = V30  App Class = A2
     Card User Capacity = 60906 MiB
    
  • evanhevanh Posts: 16,813

    Samsung EVO 128 GB card

     CID decode:  ManID=1b   OEMID=SM  Name=ED2S5
      Ver=3.0   Serial=49c16906   Date=2023-2
     Speed Class = C10  UHS Grade = U3  Video Class = V30  App Class = A2
     Card User Capacity = 122240 MiB
    

    Using v1.4 of sdsd.cc

     Delete 200 files ... Duration 3058 ms, 65.4 files/s
       File size = 512 bytes
     Create 200 files ... Duration 4797 ms, 41.6 files/s
     Verify 200 files ... Duration 335 ms, 597.0 files/s
    

    Using v1.2 of sdsd.cc

     Delete 200 files ... Duration 3242 ms, 61.6 files/s
       File size = 512 bytes
     Create 200 files ... Duration 5641 ms, 35.4 files/s
     Verify 200 files ... Duration 557 ms, 359.0 files/s
    
  • evanhevanh Posts: 16,813
    edited 2025-05-03 14:42

    Adata Orange 64 GB card, using v1.4 of sdsd.cc

     CID decode:  ManID=1d   OEMID=AD  Name=USD
      Ver=2.0   Serial=000003a0   Date=2024-1
     Speed Class = C10  UHS Grade = U1  Video Class = V10  App Class = A1
     Card User Capacity = 59638 MiB
    
     Delete 200 files ... Duration 1374 ms, 145.5 files/s
       File size = 512 bytes
     Create 200 files ... Duration 2362 ms, 84.6 files/s
     Verify 200 files ... Duration 304 ms, 657.8 files/s
    

    Kingston Select Plus 64 GB card, using v1.4 of sdsd.cc

     CID decode:  ManID=9f   OEMID=TI  Name=SD64G
      Ver=6.1   Serial=58480246   Date=2021-12
     Speed Class = C10  UHS Grade = U1  Video Class = V10  App Class = A1
     Card User Capacity = 59638 MiB
    
     Delete 200 files ... Duration 1773 ms, 112.8 files/s
       File size = 512 bytes
     Create 200 files ... Duration 2724 ms, 73.4 files/s
     Verify 200 files ... Duration 297 ms, 673.4 files/s
    

    Apacer 16 GB card, using v1.4 of sdsd.cc

     CID decode:  ManID=9c   OEMID=SO  Name=USD00
      Ver=0.2   Serial=39a8fbfb   Date=2018-1
     Speed Class = C10  UHS Grade = U1  Video Class = V10  App Class = A0
     Card User Capacity = 15103 MiB
    
     Delete 200 files ... Duration 2839 ms, 70.4 files/s
       File size = 512 bytes
     Create 200 files ... Duration 4732 ms, 42.2 files/s
     Verify 200 files ... Duration 434 ms, 460.8 files/s
    

    Adata 2 GB (Camera) card, using v1.4 of sdsd.cc

     CID decode:  ManID=1D   OEMID=AD  Name=SD   
      Ver=1.0   Serial=A15002BA   Date=2007-5
     Speed Class = C0  UHS Grade = U0  Video Class = V0  App Class = A0
     Card User Capacity = 1962 MiB
    
     Delete 200 files ... Duration 21705 ms, 9.2 files/s
       File size = 512 bytes
     Create 200 files ... Duration 44480 ms, 4.4 files/s
     Verify 200 files ... Duration 262 ms, 763.3 files/s
    
  • evanhevanh Posts: 16,813

    Comparing different numbers of files shows up a growing overhead in handling the FAT filesystem as the directory fills. The verify stage is most affected.
    200 MHz sysclock, Sandisk Extreme 64 GB card, sdsd.cc v1.4 driver, 512 bytes file size

     Create 500 files ... Duration 6208 ms, 80.54 files/s
     Verify 500 files ... Duration 1432 ms, 349.1 files/s
     Delete 500 files ... Duration 3065 ms, 163.1 files/s
    
     Create 200 files ... Duration 2266 ms, 88.27 files/s
     Verify 200 files ... Duration 320 ms, 625.2 files/s
     Delete 200 files ... Duration 1051 ms, 190.4 files/s
    
     Create 50 files ... Duration 530 ms, 94.29 files/s
     Verify 50 files ... Duration 45 ms, 1122 files/s
     Delete 50 files ... Duration 240 ms, 208.4 files/s
    
  • roglohrogloh Posts: 6,090

    Yeah that's interesting. The more sectors the FAT filesystem extends itself into with file allocations, the more FAT sectors need to be navigated through for file verification purposes. It's certainly noticeable in those results.

    The other operations are slowed down too, but not as much, probably because they are more individual operations instead of being repeatedly and alternately accessed due to the different locations on the disk of the two files being verified.
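
To put a number on the growing overhead, the logged durations can be turned into an average cost per file. A minimal sketch with hypothetical helper names; results computed from whole milliseconds will differ slightly from the logged files/s figures, which appear to be derived from a finer-grained timer.

```c
#include <stdint.h>

/* Files per second from a file count and a duration in milliseconds. */
double files_per_sec(uint32_t files, uint32_t ms)
{
    return ms ? (files * 1000.0) / ms : 0.0;
}

/* Average cost of one file in microseconds. */
uint32_t us_per_file(uint32_t files, uint32_t ms)
{
    return files ? (uint32_t)(((uint64_t)ms * 1000u) / files) : 0;
}
```

For the verify stage above this gives 900 us per file at 50 files, 1600 us at 200 files and 2864 us at 500 files, so the per-file cost roughly triples as the directory grows tenfold.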

  • evanhevanh Posts: 16,813
    edited 2025-05-17 10:14

    I'm looking into "discard" erase mode - Hoping it might help with consistency of write performance. Discarding would be the best way to implement block TRIMming. TRIM being the name used in SATA drives to inform a Flash based block device of what can be erased in the background at its leisure, and is an optionally supported feature in Flexspin's filesystem handler.

    The first surprise I got is that discard is simply an uncommon feature in SD cards. Or at least in my collection. Only one card indicates support for discard - the Sandisk Extreme 64 GB card, made in 2021. The even newer cards in my collection don't support discarding. This includes the Samsung Evo, which otherwise has a lot of modern SD features.

  • evanhevanh Posts: 16,813
    edited 2025-05-17 12:51

    I think it makes a difference. Consistency appeared to improve after first performing a full FULE erase then setting up a fresh partition table and FAT32 volume on the SD card. Of note is Flexspin's filesystem handler is issuing one TRIM for each file overwrite.

       clkfreq = 330000000   clkmode = 0x10420fb
      Compiled with FlexC v7.2.1
     Speed Class = C10  UHS Grade = U3  Video Class = V30  App Class = A2
      TRIM = 1  FULE = 1
     Card User Capacity = 60906 MiB
     CID decode:  ManID=03   OEMID=SD  Name=SN64G
      Ver=8.0   Serial=8ab989e1   Date=2021-2
     SD clock-divider set to sysclock/3 (110.0 MHz)
      rxlag=13 selected  Lowest=11 Highest=15
     cluster size = 32768 bytes
    

    It's not conclusive. It was previously the same as below but with quite a lot of 21 MB/s and 24 MB/s writing rates popping up as well. Multiple runs are now pretty close to this every run:

     TRIM 7f80..8f7f  Buffer = 2 kB,  Written 2048 kB at 27868 kB/s,  Verified,  Read 2048 kB at 31688 kB/s
     TRIM 7f80..8f7f  Buffer = 2 kB,  Written 2048 kB at 27520 kB/s,  Verified,  Read 2048 kB at 31688 kB/s
     TRIM 7f80..8f7f  Buffer = 2 kB,  Written 2048 kB at 29822 kB/s,  Verified,  Read 2048 kB at 31683 kB/s
    
     TRIM 8f80..9f7f  Buffer = 4 kB,  Written 2048 kB at 28691 kB/s,  Verified,  Read 2048 kB at 37202 kB/s
     TRIM 8f80..9f7f  Buffer = 4 kB,  Written 2048 kB at 29923 kB/s,  Verified,  Read 2048 kB at 37182 kB/s
     TRIM 8f80..9f7f  Buffer = 4 kB,  Written 2048 kB at 25504 kB/s,  Verified,  Read 2048 kB at 36628 kB/s
    
     TRIM 9f80..bf7f  Buffer = 8 kB,  Written 4096 kB at 33469 kB/s,  Verified,  Read 4096 kB at 41727 kB/s
     TRIM 9f80..bf7f  Buffer = 8 kB,  Written 4096 kB at 31246 kB/s,  Verified,  Read 4096 kB at 41720 kB/s
     TRIM 9f80..bf7f  Buffer = 8 kB,  Written 4096 kB at 32071 kB/s,  Verified,  Read 4096 kB at 41727 kB/s
    
     TRIM bf80..df7f  Buffer = 16 kB,  Written 4096 kB at 31549 kB/s,  Verified,  Read 4096 kB at 44086 kB/s
     TRIM bf80..df7f  Buffer = 16 kB,  Written 4096 kB at 29758 kB/s,  Verified,  Read 4096 kB at 44108 kB/s
     TRIM bf80..df7f  Buffer = 16 kB,  Written 4096 kB at 29624 kB/s,  Verified,  Read 4096 kB at 44119 kB/s
    
     TRIM df80..11f7f  Buffer = 32 kB,  Written 8192 kB at 35723 kB/s,  Verified,  Read 8192 kB at 45609 kB/s
     TRIM df80..11f7f  Buffer = 32 kB,  Written 8192 kB at 36177 kB/s,  Verified,  Read 8192 kB at 45589 kB/s
     TRIM df80..11f7f  Buffer = 32 kB,  Written 8192 kB at 35787 kB/s,  Verified,  Read 8192 kB at 45611 kB/s
    
     TRIM 11f80..15f7f  Buffer = 64 kB,  Written 8192 kB at 34712 kB/s,  Verified,  Read 8192 kB at 45727 kB/s
     TRIM 11f80..15f7f  Buffer = 64 kB,  Written 8192 kB at 35837 kB/s,  Verified,  Read 8192 kB at 45726 kB/s
     TRIM 11f80..15f7f  Buffer = 64 kB,  Written 8192 kB at 35803 kB/s,  Verified,  Read 8192 kB at 45746 kB/s
    
     TRIM 15f80..19f7f  Buffer = 128 kB,  Written 8192 kB at 36617 kB/s,  Verified,  Read 8192 kB at 45738 kB/s
     TRIM 15f80..19f7f  Buffer = 128 kB,  Written 8192 kB at 34870 kB/s,  Verified,  Read 8192 kB at 45702 kB/s
     TRIM 15f80..19f7f  Buffer = 128 kB,  Written 8192 kB at 34535 kB/s,  Verified,  Read 8192 kB at 45734 kB/s
    
  • evanhevanh Posts: 16,813
    edited 2025-05-17 13:56

    Hmm, okay, nope, no difference after all. It was the FULE erase that made a small difference. Disabling/enabling TRIM makes no measurable difference, at least not in the short term.

    The speed tester sequence has the filesystem handler always overwriting the same blocks for the same files, so there's no fragmentation occurring. That makes it easy for the card to know what can be erased without any assistive prompting.

    The additional code adds 384 bytes to the binary size. I'll leave it in the driver source but it won't be compiled in without the right #define switch - Which is in Flexspin's include/filesys/fatfs/ffconf.h

  • evanhevanh Posts: 16,813
    edited 2025-05-25 12:10

    Doing very little with it of late. I better post what I've got. Everyone try it out with all your SD cards. Use the speed tester with -D SD_DEBUG defined so card info gets printed.

  • evanhevanh Posts: 16,813

    v1.6 release - Typo fix - a lucky non-bug, and some improved comments.
    The typo - which compiled to unconditional execute:

        if__nz  rolbyte pa, crc0, #1
    

    is now corrected to:

        if_nz   rolbyte pa, crc0, #1
    

    Nice to get it as intended but the unintended PA operation didn't actually matter because PA isn't used for anything else there.

  • Maybe we should warn on labels that begin with if_...

  • roglohrogloh Posts: 6,090

    That's a very good idea. Without syntax highlighting enabled or nearby code to compare underscore lengths this is the sort of thing you could easily miss when browsing through code that is not working. I recall once I had a bug in some C code where a comma was inadvertently added to the end of a very long mostly blanked line that was off the screen essentially (probably caused by leaning down the space bar or something) and the code looked fine and compiled without errors, it was just that my editor settings were not wrapping lines and I assumed (incorrectly) that everything was neatly justified to all fit 80 columns. That took me a couple of days to locate and was probably my most annoying bug to locate in my youth as I still remember it to this day. These days I'd have other tricks to track it down faster but back then it just killed me.
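
The suggested warning could look something like the sketch below. It assumes the typo was taken as a label because `if__nz` is not a condition keyword. `suspicious_if_label()` is a hypothetical function, not anything in Flexspin, and the condition list is deliberately incomplete.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* A few PASM2 condition prefixes; deliberately incomplete, for illustration. */
static const char *const known_conds[] = {
    "if_z", "if_nz", "if_c", "if_nc",
    "if_c_and_z", "if_nc_and_nz", "if_c_or_z"
};

/* Returns 1 if the first token of a source line starts with "if_"
 * (case-insensitive) but is not in the known condition list, meaning it
 * would be treated as an ordinary label. Returns 0 otherwise. */
int suspicious_if_label(const char *line)
{
    char tok[32];
    size_t n = 0;

    while (*line == ' ' || *line == '\t') line++;
    while (*line && !isspace((unsigned char)*line) && n < sizeof tok - 1)
        tok[n++] = (char)tolower((unsigned char)*line++);
    tok[n] = '\0';

    if (strncmp(tok, "if_", 3) != 0) return 0;
    for (size_t i = 0; i < sizeof known_conds / sizeof known_conds[0]; i++)
        if (strcmp(tok, known_conds[i]) == 0) return 0;
    return 1;
}
```

Fed the two lines from the v1.6 fix, it flags the `if__nz` version and passes the `if_nz` one.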

  • TonyB_TonyB_ Posts: 2,220

    @evanh said:
    v1.6 release - Typo fix - a lucky non-bug, and some improved comments.
    The typo - which compiled to unconditional execute:

      if__nz  rolbyte pa, crc0, #1
    

    is now corrected to:

      if_nz   rolbyte pa, crc0, #1
    

    Nice to get it as intended but the unintended PA operation didn't actually matter because PA isn't used for anything else there.

    Extra underscores shouldn't matter to the compiler.

  • evanhevanh Posts: 16,813

    I've made a small optimisation. By merging the command-response phases into one assembly block it has eliminated one small piece of overhead. But really the best part of the optimisation is the code size. I've knocked off a full 100 bytes! Yay.

  • evanhevanh Posts: 16,813

    Okay, I'm satisfied for the moment. Here it is. I tried to merge the three send_cmd() wrappers too but it didn't save space and only added overhead, so that's reverted.

  • evanhevanh Posts: 16,813
    edited 2025-10-15 04:43

    I am also tending to think I can default to sysclock/3 now. Sticking with sysclock/4's symmetrical clock doesn't appear that important. On the other hand, sysclock/3 is a pretty severe overclock above HighSpeed's 50 MHz SD clock frequency. It works because most cards expect to operate in UHS 1.8V at up to 200 MHz SD clock frequency. I'm just pushing 3.3 Volt to do the same. The processing side inside an SD card doesn't make a distinction I don't think. It's just a case of the pin drive operating out of spec is all.

    Roger, you got an opinion?
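
For a sense of scale, the divider arithmetic can be sketched as below. The function names are hypothetical. The 50 MHz HighSpeed limit is taken from the post, and the symmetry test assumes the clock's high and low times are each a whole number of sysclock ticks.

```c
#include <stdint.h>

#define HIGHSPEED_MAX_HZ 50000000u   /* HighSpeed SD clock limit, per the post */

/* SD clock frequency for a given sysclock and integer divider. */
uint32_t sd_clock_hz(uint32_t sysclock_hz, uint32_t clkdiv)
{
    return clkdiv ? sysclock_hz / clkdiv : 0;
}

/* SD clock relative to 50 MHz, in percent (100 = exactly at the limit). */
uint32_t overclock_percent(uint32_t sysclock_hz, uint32_t clkdiv)
{
    return (uint32_t)(((uint64_t)sd_clock_hz(sysclock_hz, clkdiv) * 100u)
                      / HIGHSPEED_MAX_HZ);
}

/* An even divider can give equal high and low times; an odd one cannot,
 * assuming both are whole sysclock ticks. */
int clock_is_symmetrical(uint32_t clkdiv)
{
    return clkdiv >= 2 && (clkdiv % 2) == 0;
}
```

At a 200 MHz sysclock, sysclock/4 lands exactly on 50 MHz, while sysclock/3 is about 133 % of it. At 330 MHz, sysclock/3 is 110 MHz, or 220 % of the HighSpeed limit.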

  • roglohrogloh Posts: 6,090

    @evanh said:
    I am also tending to think I can default to sysclock/3 now. Sticking with sysclock/4's symmetrical clock doesn't appear that important. On the other hand, sysclock/3 is a pretty severe overclock above HighSpeed's 50 MHz SD clock frequency. It works because most cards expect to operate in UHS 1.8V at up to 200 MHz SD clock frequency. I'm just pushing 3.3 Volt to do the same. The processing side inside an SD card doesn't make a distinction I don't think. It's just a case of the pin drive operating out of spec is all.

    Roger, you got an opinion?

    Is there any way we could have it as a choice? So if it didn't work well at sysclk/3 or for faster P2 setups you could at least still get it working at sysclk/4. I realize that would likely increase the code space again.

  • evanhevanh Posts: 16,813
    edited 2025-10-15 08:55

    It's all in place already. Use ioctl() to change the setting at runtime. The default is 4, the minimum is 2, and it can be set to any 16-bit value at any time after the _sdsd_open() init call.

    It took me a long time to notice that libc support in Spin was missing ioctl(). Eric has added it in the new Flexspin 7.5.1 - https://forums.parallax.com/discussion/comment/1569473/#Comment_1569473

    Here's the mounting routine from the updated Spin2 tester program. It uses ioctl() to disable read CRC and to set the sysclock/2 clock divider.

    OBJ
        c: "libc.spin2"
        sddrv: "blkdrvr/sdsd.cc"
    '    mmdrv: "blkdrvr/sdmm_bashed.cc"
    '    mmdrv: "blkdrvr/sdmm.cc"
    
    
    PRI  mountsd() | handle, clkdiv
    
    '    c.printf(string(" Driver = sdmm",13,10))
    '    c.printf(string(" Driver = sdmm_bashed",13,10))
    '    handle := c._sdmm_open(CLK_EVAL, CS_EVAL, MOSI_EVAL, MISO_EVAL)
    '    handle := c._sdmm_open(CLK_RL, CS_RL, MOSI_RL, MISO_RL)
    '    handle := mmdrv._sdmm_open(CLK_EVAL, CS_EVAL, MOSI_EVAL, MISO_EVAL)
    '    handle := mmdrv._sdmm_open(CLK_RL, CS_RL, MOSI_RL, MISO_RL)
    '    handle := mmdrv._sdmm_open(CLK_EH, CS_EH, MOSI_EH, MISO_EH)
    
        c.printf(string(" Driver = sdsd",13,10))
        handle := sddrv._sdsd_open(CLK_RL, CMD_RL, DAT0_RL, PWR_RL, LED_RL)
    '    handle := sddrv._sdsd_open(CLK_EH, CMD_EH, DAT0_EH, -1, -1)
    
        if not handle
            c.printf(string(" device open failed!   errno = %d: %s",13,10), errno, c.strerror(errno))
            abort 1
    
        clkdiv := 0
        c._ioctl(handle, 70, @clkdiv)    ' disable read-block CRC processing
        clkdiv := 2
        c._ioctl(handle, 72, @clkdiv)    ' set CLK_DIV value
    
        if c.mount(string("/sd"), c._vfs_open_fat_handle(handle))
            abort 1
    
  • evanhevanh Posts: 16,813

    @rogloh said:
    ... I realize that would likely increase the code space again.

    The general handling of ioctl() is required in the driver anyway. This is all there is in terms of extra code in the driver.

            case CTRL_SET_CLKDIV :    // Set the active clock-divider
    #ifdef SD_DEBUG
        __builtin_printf(" SETDIV ");
    #endif
                new_clkdiv(*(int *)buff);
                break;
    
  • roglohrogloh Posts: 6,090

    Ok good that looks like a fairly minimal overhead.

  • evanhevanh Posts: 16,813
    edited 2025-10-15 23:51

    Hmm, I'm hiding a little there. The bounds check within new_clkdiv() wouldn't be needed if user setting wasn't exposed. That's still only a couple of lines itself though.

    static int  new_clkdiv(
        uint32_t clkdiv )    // minimum value of 2
    {
        if( clkdiv < 2 )  clkdiv = 2;    // bounds check
        if( clkdiv > 0xffff )  clkdiv = 0xffff;    // bounds check
        clkdivider = clkdiv;    // store setting
    #ifdef SD_DEBUG
        uint32_t  tmr = _clockfreq() / (clkdiv * 100_000UL);
        __builtin_printf(" SD clock-divider set to sysclock/%d (%d.%d MHz)\n",
                         clkdiv, tmr/10, tmr%10);
    #endif
    
        return calibrate_rxlag();    // perform rx calibration, also selects card and sets full speed clock
    }
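
The clamp and the debug print's fixed-point arithmetic can be tried on their own. This is a standalone sketch with hypothetical names, not the driver source. It computes tenths of a MHz as sysclock / (clkdiv * 100000), then splits that with /10 and %10 as the debug line does. It uses a 64-bit product, which is a change from the original, so that very large dividers cannot overflow a 32-bit multiply.

```c
#include <stdint.h>
#include <stdio.h>

/* Same bounds as the post: minimum 2, maximum 16-bit. */
uint32_t clamp_clkdiv(uint32_t clkdiv)
{
    if (clkdiv < 2) clkdiv = 2;
    if (clkdiv > 0xffff) clkdiv = 0xffff;
    return clkdiv;
}

/* Writes the SD clock as "<MHz>.<tenths>" into out, truncating (not rounding)
 * to one decimal place. Returns the snprintf() result. */
int format_sd_mhz(char *out, size_t outlen, uint32_t sysclock_hz, uint32_t clkdiv)
{
    uint32_t d = clamp_clkdiv(clkdiv);
    uint32_t tenths = (uint32_t)(sysclock_hz / ((uint64_t)d * 100000u));
    return snprintf(out, outlen, "%lu.%lu",
                    (unsigned long)(tenths / 10), (unsigned long)(tenths % 10));
}
```

With a 330 MHz sysclock and a divider of 3 this produces "110.0", the same figure as the "SD clock-divider set to sysclock/3 (110.0 MHz)" line in the earlier log.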
    
  • evanhevanh Posts: 16,813

    Another small shrink of 56 bytes. Found some space saving in the rxlag calibrate routine.
