flexspin compiler for P2: Assembly, Spin, BASIC, and C in one compiler - Page 92 — Parallax Forums



Comments

  • evanh Posts: 16,032
    edited 2022-03-28 00:13

    @ersmith said:

    @evanh said:
    Eric,
    In FlexC, is there a graceful way to handle SD card removal after mount()ing? I don't see any umount().

    There's now a umount() call which is hooked up to fatfs (but not to the host fs yet). It seemed to work for me with simple testing, but I haven't tested a lot of different cards.

    Nice. I'll give it a test with the EDID extractor program ...

    EDIT: Not yet in the master branch, maybe?

    Propeller Spin/PASM Compiler 'FlexSpin' (c) 2011-2022 Total Spectrum Software Inc.
    Version 5.9.10-beta-v5.9.9-67-g5bffba90 Compiled on: Mar 28 2022
    ...
    /home/evanh/hoard/coding/prop2/testing/edid-extract.c:284: error: unknown identifier umount used in function call
    /home/evanh/hoard/coding/prop2/testing/edid-extract.c:284: error: Unknown symbol umount
    
  • evanh Posts: 16,032
    edited 2022-03-28 04:00

    @ersmith said:
    @evanh : I'm afraid there's not enough context to know why FCACHE is disabled in your program. The most common reason is disabling optimizations (compiling with -O0). Otherwise, could you post a compilable example that shows the warning? When I tried inserting your code into the sdmm.cc file it compiled without any problems, although it didn't seem to work on my SD card.

    I'll update to your latest sources and try with that ...

    EDIT: Done, no change, still getting the warning. Here's the full transcript, my test program, and my whole modified sdmm.cc file:

    $ filename=sdfat-speedtest;   echo; flexspin -I include -l -2 ${filename}.c; echo; loadp2 -p /dev/serial/by-id/usb-Parallax_Inc_Propeller_P2-ES_EVAL_P23YOO42-if00-port0 ${filename}.binary -t -b 230400
    
    Propeller Spin/PASM Compiler 'FlexSpin' (c) 2011-2022 Total Spectrum Software Inc.
    Version 5.9.10-beta-v5.9.9-67-g5bffba90 Compiled on: Mar 28 2022
    sdfat-speedtest.c
    |-stdlib.spin2
    fopen.c
    fwrite.c
    mount.c
    fmt.c
    posixio.c
    isatty.c
    fputs.c
    fflush.c
    bufio.c
    errno.c
    posixio.c
    fatfs_vfs.c
    |-ff.cc
    bufio.c
    strncpy.c
    strncmp.c
    ioctl.c
    fatfs_vfs.c
    |-ff.cc
    vfs.c
    strcpy.c
    strncat.c
    memset.c
    sdmm.cc
    stat.c
    malloc.c
    strcpy.c
    memset.c
    /home/evanh/hoard/coding/prop2/testing/include/filesys/fatfs/sdmm.cc:147: warning: FCACHE is disabled, asm will be in HUB
    /home/evanh/hoard/coding/prop2/testing/include/filesys/fatfs/sdmm.cc:237: warning: FCACHE is disabled, asm will be in HUB
    sdfat-speedtest.p2asm
    Done.
    Program size is 54396 bytes
    
    ( Entering terminal mode.  Press Ctrl-] or Ctrl-Z to exit. )
     clkfreq = 100000000   clkmode = 0x100090b
     Randfill ticks = 450070
     Mounting:  Writing 200000 bytes at 294118 bytes/s
     Reading 200000 bytes at 275103 bytes/s
     Matches!  :)
    

    EDIT2: Oops, missed out attaching my spin2 library for the muldiv65()
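
The attached muldiv65() library is not reproduced here. Bytes-per-second figures like the ones in the transcript are presumably computed as bytes × clkfreq ÷ ticks, which overflows 32 bits long before the division. A minimal plain-C sketch of that calculation, assuming a 64-bit intermediate is acceptable (the name `bytes_per_second` is hypothetical and not part of flexspin):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not a flexspin API: throughput in bytes per second
 * from a byte count and the elapsed system-clock ticks, rounded to nearest. */
uint32_t bytes_per_second(uint32_t bytes, uint32_t clkfreq, uint32_t ticks)
{
    assert(ticks != 0u);

    /* Widen before multiplying: 200000 * 100000000 does not fit in 32 bits. */
    uint64_t scaled = (uint64_t)bytes * clkfreq;

    /* Add half the divisor so the integer division rounds to nearest. */
    return (uint32_t)((scaled + ticks / 2u) / ticks);
}
```

As an illustration only (the tick count is made up, not taken from the log), `bytes_per_second(200000, 100000000, 68000000)` yields 294118.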

  • I now have 9 open Pull Requests on Github.

    Place your bets now, will I get another stupid idea first, getting into the double digits, or will Eric deal with some of them first?

  • evanh Posts: 16,032

    What's a pull request when compared to git pull that I use for updating my copy of the master?

  • @evanh said:
    What's a pull request when compared to git pull that I use for updating my copy of the master?

    A pull request is a suggested change to the repository. Accepting it is equivalent to pulling and merging the PR source branch, except the web frontend does it for you.

  • evanh Posts: 16,032

    There's so much to understanding the commands ... just reading now:

    In its default mode, git pull is shorthand for git fetch followed by git merge FETCH_HEAD.

  • @evanh said:

    @ersmith said:

    @evanh said:
    Eric,
    In FlexC, is there a graceful way to handle SD card removal after mount()ing? I don't see any umount().

    There's now a umount() call which is hooked up to fatfs (but not to the host fs yet). It seemed to work for me with simple testing, but I haven't tested a lot of different cards.

    Nice. I'll give it a test with the EDID extractor program ...

    EDIT: Not yet in the master branch, maybe?

    There, but called "_umount" (with an underscore). I've added an alias now so that plain "umount" works too.

    Calling mount() with a NULL pointer for the file system should also work as an unmount. For that matter, I think that just re-calling mount() with _vfs_open_sdcard() should probably do the unmount/mount, but I haven't tested that path.

  • @evanh : the FCACHE issue was due to the way system library functions (like the SD card stuff) were handled; they were being compiled before FCACHE was set up. This should be fixed now. Although honestly, it's probably better not to use __asm volatile in the libraries (just plain __asm should work fine, and the loops will be FCACHEd automatically).

  • @Wuerfel_21 said:
    I now have 9 open Pull Requests on Github.

    It's hard to keep up with you Ada! Thank you for all of your improvements, you've really made flexspin much better in many ways.

  • evanh Posts: 16,032

    @ersmith said:
    @evanh : the FCACHE issue was due to the way system library functions (like the SD card stuff) were handled; they were being compiled before FCACHE was set up. This should be fixed now. Although honestly, it's probably better not to use __asm volatile in the libraries (just plain __asm should work fine, and the loops will be FCACHEd automatically).

    Thank you, thank you, thank you. Got a 5x speed increase already. :) Bedtime now though.

  • evanh Posts: 16,032
    edited 2022-03-30 06:14

    Well, sticking to the straight byte loop hand coding is all I've done now. There's definitely some notable overheads higher up, particularly with SD writes. And I've been confounded at making any further easy improvements around word sizes and parameter passing. I can see where Chip is coming from with his frustrations with strong typing. Type punning, https://en.wikipedia.org/wiki/Type_punning, is common in assembly and he explicitly retained that ability in Spin.

    In all cases below, the inner loop bit rate is locked at 8 ticks per bit.

    At 10 MHz sysclock it's 118 kB/s read speed, that's ~10 ticks per bit. Good result.
    At 200 MHz sysclock it's 1526 kB/s read speed, that's ~16 ticks per bit. Not so good, and presumably is mostly due to set delays.

    At 10 MHz sysclock it's 104 kB/s write speed, that's ~11 ticks per bit.
    At 200 MHz sysclock it's 718 kB/s write speed, that's ~34 ticks per bit. Those delays are obviously even worse here.

    PS: Eric, I tried without volatile on multiple occasions. It very much is needed, as is the FCACHE. If either aren't functioning then the code fails outright.
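
The ticks-per-bit figures in this post can be reproduced with a short calculation. The sketch below assumes that "kB" means 1024 bytes, which is an assumption on my part; it is the reading that matches the quoted ~16 and ~34 most closely. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Effective system-clock ticks spent per transferred bit.
 * Assumption: 1 kB = 1024 bytes. */
double ticks_per_bit(uint32_t sysclock_hz, uint32_t kb_per_s)
{
    assert(kb_per_s != 0u);

    double bits_per_s = (double)kb_per_s * 1024.0 * 8.0;
    return (double)sysclock_hz / bits_per_s;
}
```

With the numbers above this gives roughly 10.3 and 16.0 ticks per bit for reads at 10 MHz and 200 MHz, and roughly 11.7 and 34.0 for writes. Anything beyond the 8 ticks of the inner loop is overhead outside it.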

  • I haven't looked into it, but I assume it only uses single-block read/write commands, right? Because ye, single block writes are just slow because the card will signal busy until it has actually written the data (which it can only start doing when the full block has been received).

  • evanh Posts: 16,032
    edited 2022-03-30 06:51

    Oh? I guess that's a good next thing ...

    EDIT: Looks like Eric is handling that case already:

        if (disk_status(drv) & STA_NOINIT) return RES_NOTRDY;
        if (!(CardType & CT_BLOCK)) sect *= 512;    /* Convert LBA to byte address if needed */
    
        if (count == 1) {   /* Single block write */
            if ((send_cmd(CMD24, sect) == 0)    /* WRITE_BLOCK */
                && xmit_datablock(buff, 0xFE))
                count = 0;
        }
        else {              /* Multiple block write */
            if (CardType & CT_SDC) send_cmd(ACMD23, count);
            if (send_cmd(CMD25, sect) == 0) {   /* WRITE_MULTIPLE_BLOCK */
                do {
                    if (!xmit_datablock(buff, 0xFC)) break;
                    buff += 512;
                } while (--count);
                if (!xmit_datablock(0, 0xFD))   /* STOP_TRAN token */
                    count = 1;
            }
        }
        deselect();
    
  • But does the FS layer ever actually write more than one block at a time? With that code it'd only be able to do it if the fwrite call is block-aligned (since otherwise it'd have to combine a block from a buffered partial block and the fwrite data) and I'd suspect it might just not check for that case.

    Haven't looked at it at all yet, am still in bed lmao.
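
To make the alignment concern concrete: a write that starts mid-sector has to finish that sector through a buffer first, and only the whole sectors after it can go out as one multi-block transfer. The sketch below is a hypothetical illustration of that split, not the FatFs implementation; `write_split_t` and `split_write` are made-up names.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_SIZE 512u

/* Hypothetical illustration, not FatFs code. */
typedef struct {
    uint32_t head_bytes;    /* bytes that complete a partially filled sector */
    uint32_t full_sectors;  /* whole sectors that could use one multi-block write */
    uint32_t tail_bytes;    /* leftover bytes that begin a new partial sector */
} write_split_t;

write_split_t split_write(uint32_t file_offset, uint32_t length)
{
    write_split_t s = {0u, 0u, 0u};
    uint32_t remaining = length;
    uint32_t used = file_offset % SECTOR_SIZE;  /* bytes already in the current sector */

    if (used != 0u) {
        uint32_t room = SECTOR_SIZE - used;
        s.head_bytes = (remaining < room) ? remaining : room;
        remaining -= s.head_bytes;
    }
    s.full_sectors = remaining / SECTOR_SIZE;
    s.tail_bytes = remaining % SECTOR_SIZE;

    /* The three parts must account for every byte of the request. */
    assert(s.head_bytes + s.full_sectors * SECTOR_SIZE + s.tail_bytes == length);
    return s;
}
```

For a 200000-byte write at offset 0, as in the speed test, this gives 390 full sectors plus a 320-byte tail, so nearly all of the data is eligible for a multi-block write if the FS layer takes that path.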

  • evanh Posts: 16,032
    edited 2022-03-30 07:36

    I've confirmed that CMD25 is the path taken with the SD card I'm using. That CMD25 sets up a loop doing one block at a time but without further SD commands, xmit_datablock() does these 0xFC "tokens" in front of each block, so I'm guessing that's correctly a multi-block transfer.

  • evanh Posts: 16,032
    edited 2022-03-30 07:48

    Ah-ha! Changed to using a Sandisk SD card and writes at 200 MHz sysclock are now 1380 kB/s, instead of 718 kB/s.

    At 360 MHz: Reads are 2200 kB/s, Writes are 1800 kB/s.

  • evanh Posts: 16,032
    edited 2022-03-31 03:25

    Oh, Eric,
    A regression of some sort. I'm getting duplicate labels in the .p2asm file. Doesn't happen if I use the older includes:

    Propeller Spin/PASM Compiler 'FlexSpin' (c) 2011-2022 Total Spectrum Software Inc.
    Version 5.9.10-beta-v5.9.9-93-ge37a63f5 Compiled on: Mar 30 2022
    edid-extract.c
    |-jm_i2c_modified.spin2
    posixio.c
    fputs.c
    fprintf.c
    fopen.c
    fwrite.c
    bufio.c
    mount.c
    fmt.c
    ieee32.c
    ioctl.c
    isatty.c
    fflush.c
    fatfs_vfs.c
    |-ff.cc
    memset.c
    dofmt.c
    vfs.c
    errno.c
    posixio.c
    strncpy.c
    strncat.c
    strncmp.c
    strcpy.c
    memset.c
    sdmm.cc
    stat.c
    malloc.c
    fprintf.c
    strcpy.c
    memset.c
    dofmt.c
    edid-extract.p2asm
    edid-extract.p2asm:17102: error: Changing hub value for symbol __struct___fmtfile_putchar
    edid-extract.p2asm:17119: error: Changing hub value for symbol __struct___fmtfile_putchar_ret
    Done.
    
  • evanh Posts: 16,032
    edited 2022-03-30 20:36

    Even weirder ... just noticed the duplication only occurs when I #define _DEBUG for disk_initialize() within the include/filesys/fatfs/sdmm.cc file. I'd forgotten I'd turned that on. EDIT: Ah, which probably explains why it vanishes when using an alternate set of includes ... EDIT2: Yep, that's it. So same errors from duplicate symbols when enabling the debug prints.

    EDIT3: Ah, but also needs something in that EDID extract program. If I use the speed test program instead, then the duplication error doesn't happen with or without the _DEBUG.

  • evanh Posts: 16,032
    edited 2022-03-31 03:53

    Found that removing the source line mount( "/sd", _vfs_open_sdcard() ); eliminates the duplication error. Dunno why. I use that exact same line in the speed tester. It's the latter occurrence of __struct___fmtfile_putchar, right at the end of the .p2asm listing, that comes and goes.

  • evanh Posts: 16,032

    Eric,
    Here's a version of sdmm.cc with more alterations to remove the excessive delays() at beginning of block read/write.

  • @evanh said:
    Eric,
    Here's a version of sdmm.cc with more alterations to remove the excessive delays() at beginning of block read/write.

    You replaced it with _getms calls though, which is a rather slow function call still. I'd just rely on the byte receive function's inherent delay.

  • evanh Posts: 16,032
    edited 2022-04-01 18:02

    And ditch the timeout?
    BTW, the results are just as good as the 1 us delay got. I'm happy with the outcome as is.

    EDIT: Removing the timeout is gaining another 10% again. I'll leave that one for Eric to ponder. Actually, Eric might not be too keen on the "needs cogexec" requirement and throw it all out.

  • Wuerfel_21 Posts: 5,106
    edited 2022-04-01 18:11

    @evanh said:
    And ditch the timeout?
    BTW, the results are just as good as the 1 us delay got. I'm happy with the outcome as is.

    _getms probably takes the better part of a µs to execute (one QDIV + function call overheads + one internal branch).
    You just time out after a certain number of non-start bytes received. That means the timeout goes up as clock speed goes down, but since it's an error condition, that's likely fine. If you want to keep an accurate time, compute the target CT value first (getct + clkfreq/reciprocal-of-timeout) and check against that in the loop.
    But ye, it doesn't matter that much.
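
The "compute the target CT value first" suggestion can be sketched in portable C. The names below are hypothetical, and the tick values are passed in as plain arguments; on the P2 they would come from whatever reads the free-running 32-bit system counter. The comparison uses the unsigned difference so it stays correct when the counter wraps, assuming timeouts shorter than 2^31 ticks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers, not flexspin APIs. */

/* Counter value at which a timeout of 1/per_second seconds expires.
 * The addition is allowed to wrap modulo 2^32. */
uint32_t deadline_after(uint32_t now_ticks, uint32_t clkfreq, uint32_t per_second)
{
    assert(per_second != 0u);
    return now_ticks + clkfreq / per_second;
}

/* True once now_ticks has reached or passed the deadline.
 * Valid as long as the two values are less than 2^31 ticks apart. */
bool deadline_passed(uint32_t now_ticks, uint32_t deadline)
{
    return (uint32_t)(now_ticks - deadline) < 0x80000000u;
}
```

The division happens once, before the loop; inside the loop each check costs only a counter read, a subtract, and a compare.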

  • Wuerfel_21 Posts: 5,106
    edited 2022-04-03 00:31

    ⑨ time again.

    I believe procrastination is the strongest force in the known universe.

  • @evanh : I've been busy on other things, so I've kind of lost track of where the sdmm driver stands. Is the current version the one in the "improving SD performance" thread?

  • evanh Posts: 16,032
    edited 2022-04-06 22:39

    I'm not recommending that one any more. It is a simple bit-bashed Pasm code replacement of the original bit-bashed C code for the tx/rx functions. I've worked out that running the SPI clock at sysclock/8, which it is crafted for, is risky above 200 MHz sysclock. It works beautifully on the Eval Board but the Edge Board, surprisingly, seems to be a bigger load on the SD cards. Only my Sandisk Extreme can push above 200 MHz, and seems to with ease, on the Edge Board.

    So ...
    I've spent the day fine tuning my latest smartpin based replacement. It's feeling good. The smartpin version is a lot of hacking around so is full of commented out testing code but I'm ready for others to try it anyway.

    PS: It's still using Fcache for the moment. I was going to try eliminating that need but needed to build confidence in the rest of the changes first. Timing problems needed minimised as much as possible.

    BTW: This code is not in any way intended as submit-able. It's based off master branch from 30 Mar 2022.

    EDIT: Eric, don't use it. I've got the timings too tight there for the rx phase in use. I should have enabled the alternative timings set. I'll work on adjusting the rx phase ...

  • evanh Posts: 16,032

    Eric,
    Barring any failed tests by others, it's ready - https://forums.parallax.com/discussion/comment/1537865/#Comment_1537865

    The source incorporates both optimised bit-bashed and fine-tuned smartpins. Build-time selectable with the #define _smartpins_mode_eh

  • pik33 Posts: 2,388

    Prop2Play has been using the updated driver for some time now. It fetches the wave file chunks more than 5x faster than the original one and plays them without acrobatics and huge RAM buffers.

  • I've also been using it (some version of it, anyways) for building megayume and that works fine. In fact, I haven't tested the SRAM save feature with the stock driver at all (uhhhh...)

  • @evanh said:
    Eric,
    Barring any failed tests by others, it's ready - https://forums.parallax.com/discussion/comment/1537865/#Comment_1537865

    The source incorporates both optimised bit-bashed and fine-tuned smartpins. Build-time selectable with the #define _smartpins_mode_eh

    Great, thanks @evanh ! I'm sorry I haven't had much time to work on flexspin, but I hope to look at your new code this weekend and get it into the official libraries. Thank you for your work, the SD card code definitely needs a speed up.
