Hehe, first success playing with __pasm {} code sections. Converted just the data block read routine - Performance sucks! Which I'm going to put down to my blind fast copy firing on every call. Presumably Flexspin's Fcache is smarter than that.
Buffer = 2 kB, Written 512 kB at 13617 kB/s, Verified, Read 512 kB at 774 kB/s
Buffer = 2 kB, Written 512 kB at 13607 kB/s, Verified, Read 512 kB at 774 kB/s
Buffer = 2 kB, Written 512 kB at 13609 kB/s, Verified, Read 512 kB at 774 kB/s
Buffer = 8 kB, Written 2048 kB at 18108 kB/s, Verified, Read 2048 kB at 2807 kB/s
Buffer = 8 kB, Written 2048 kB at 17733 kB/s, Verified, Read 2048 kB at 2807 kB/s
Buffer = 8 kB, Written 2048 kB at 18129 kB/s, Verified, Read 2048 kB at 2807 kB/s
Buffer = 32 kB, Written 4096 kB at 18312 kB/s, Verified, Read 4096 kB at 8095 kB/s
Buffer = 32 kB, Written 4096 kB at 18299 kB/s, Verified, Read 4096 kB at 8094 kB/s
Buffer = 32 kB, Written 4096 kB at 18102 kB/s, Verified, Read 4096 kB at 8094 kB/s
Compared to:
Buffer = 2 kB, Written 512 kB at 15457 kB/s, Verified, Read 512 kB at 16113 kB/s
Buffer = 2 kB, Written 512 kB at 15460 kB/s, Verified, Read 512 kB at 16111 kB/s
Buffer = 2 kB, Written 512 kB at 15452 kB/s, Verified, Read 512 kB at 16111 kB/s
Buffer = 8 kB, Written 2048 kB at 18834 kB/s, Verified, Read 2048 kB at 20529 kB/s
Buffer = 8 kB, Written 2048 kB at 18832 kB/s, Verified, Read 2048 kB at 20529 kB/s
Buffer = 8 kB, Written 2048 kB at 18812 kB/s, Verified, Read 2048 kB at 20529 kB/s
Buffer = 32 kB, Written 4096 kB at 19234 kB/s, Verified, Read 4096 kB at 21925 kB/s
Buffer = 32 kB, Written 4096 kB at 19002 kB/s, Verified, Read 4096 kB at 21948 kB/s
Buffer = 32 kB, Written 4096 kB at 19158 kB/s, Verified, Read 4096 kB at 21941 kB/s
Comments
Did get it going though. Just have to change the RL basepin at the top and then comment/uncomment the correct lines here for sdsd mode:
//static struct __using("blkdrvr/sdmm.cc") DRV;  // compiles to 54804 bytes
//static struct __using("blkdrvr/sdmm_bashed.cc") DRV;  // compiles to 56904 bytes (w/retries)
static struct __using("blkdrvr/sdsd.cc") DRV;  // compiles to 60596 bytes

static FILE * mountsd( void )
{
    FILE *handle;
    int rc, clkdiv;
    uint32_t part;

    umount("/sd");
    _seterror(0);
//    handle = _sdmm_open(CLK_RL, CS_RL, MOSI_RL, MISO_RL);
//    handle = _sdmm_open(CLK_EVAL, CS_EVAL, MOSI_EVAL, MISO_EVAL);
//    handle = DRV._sdmm_open(CLK_RL, CS_RL, MOSI_RL, MISO_RL);
//    handle = DRV._sdmm_open(CLK_EH, CS_EH, MOSI_EH, MISO_EH);
//    handle = DRV._sdmm_open(CLK_EVAL, CS_EVAL, MOSI_EVAL, MISO_EVAL);
    handle = DRV._sdsd_open(CLK_RL, CMD_RL, DAT0_RL, PWR_RL, LED_RL);
//    handle = DRV._sdsd_open(CLK_EH, CMD_EH, DAT0_EH, -1, -1);

Guess this new way of just adding a new folder is easier than the old way. Just have to find the install instructions...
Yeah, I decided it's best to have the example tester programs use the boot pins and built-in driver by default so you can see it working before testing new hardware.
Looks like you've found all the settings.
Driver doesn't have to be in a subdirectory. That was mainly because of multiple drivers.
The struct __using() and _sdsd_open() are the install instructions.
PS: You're welcome to work from the Spin2 edition of the tester. There's an updated version with ioctl() added - https://forums.parallax.com/discussion/comment/1569697/#Comment_1569697
Instructions are missing in the latest version zip…
The example source code is the instructions. It's just struct __using() and _sdsd_open(). The rest is normal C file ops, including the use of ioctl().
Or did you just mean the v1.12 zip I posted back a few posts doesn't have the example tester sources included?
@evanh it might be handy to include a small readme.txt file in your zip archive stating a few things to help new users out (you might feel this is redundant if you look at that code but for new users it's important to know what to do to get started).
For example, the API is not yet built into the tools, so to use it in your own code you'll need to create a subfolder called "blkdrvr" under your own application folder and copy the low-level driver files there. Then you can use the same API from C (or SPIN2) in flexspin that the demo uses. Given it's all C based, you could mention how to enable the C API calls from SPIN2 by including the libc.a and mmdrv objects, as some SPIN2 users may not yet know much about that.
Also that you may need to customize the pins (and where you'd modify them). Plus mention any IO restrictions like having the 4 bit DAT pins start on 4 bit pin boundary (although I do see that is listed in the source code), and the difference between the 3 driver file variants in how they operate.
A brief description of the functionality and any IOCTL options available, maybe a short list of the basic APIs and one line description of what it does would be good too even if this is taken from the source.
It's just far nicer for new users to have something to initially start from rather than have to reverse engineer all from the raw code directly, as it will give people more confidence than starting out blind so to speak.
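As a starting point for the kind of pin note a readme could carry, here is a minimal sketch of the DAT pin restriction mentioned above. It assumes "4 bit pin boundary" means the DAT0 pin number is a multiple of 4, with DAT1..DAT3 on the next three pins. The helper name `sd_dat_base_ok` is hypothetical and not part of the driver; the driver source remains the authority on the actual rule.

```c
#include <stdbool.h>

/* Hypothetical helper, not part of the driver.
 * Assumption: the four DAT pins are consecutive and DAT0 must sit on a
 * pin number that is a multiple of 4. The P2 has pins 0..63, so the
 * highest usable base pin under that assumption is 60. */
static bool sd_dat_base_ok(int dat0_pin)
{
    if (dat0_pin < 0 || dat0_pin > 60)
        return false;               /* DAT0..DAT3 would not fit on the chip */
    return (dat0_pin % 4) == 0;     /* must start on a 4-pin boundary */
}
```

Calling this on the DAT0 pin constant before opening the driver would catch a mis-wired pin group before any card access is attempted.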
Think the 1.0 has a helpful doc that 1.2 does not..
Oh, that tiny .txt in the driver directory, it was a cut'n'paste from Eric's original plugin code for the built-in driver. And I'm not even using that arrangement in the example testers. I've forgotten why Eric did that arrangement, there was some small reason.
The Spin2 example is the best one to read if you're not already familiar with C. It shows how to use the specific objects in Spin2. I suspect it'll align better with Basic too.
In the end though, learning libc is kind of a prerequisite since you're going to be using all the file access functionality from it at the very least. Then using stuff like printf() becomes a no-brainer.
That's the main reason I have a driver subdirectory. So then the demo speed tester programs are the only sources present at first glance. Any driver otherwise can be right alongside the top level source.
Rayman,
Fire off specific questions in here about the Spin2 speed tester and I'll put more comments in the source code so that people reading it in the future will find it easier.
Roger,
I'll make a quick txt document listing the driver's settings and include the Obex description - which itself links to your excellent schematic for wiring up a suitable SD card slot in the opening post of this topic.
First run at the readme.
The other text file has good usage notes…
Make a folder under flexprop and copy the three files there…
The "adding a driver ..." text file has no usage at all. It has an incomplete init/mount routine.
Cool. It's a good way to help describe what this is for new users. I noticed a few typos/grammar errors (is/are) are still in there to clean up when you read closely through it. Although who am I to judge that with all of my later post tidy ups that I end up doing. LOL.
By the way the schematic shown here is what I used recently for FireAnt which probably includes a better p-FET for power on/off control vs the original one.
https://forums.parallax.com/discussion/comment/1570122/#Comment_1570122
A tomorrow job. I'm always fixing dyslexic slips, grammar and general readability. Most forum posts are re-edited just for this.
The original is tidier but I'll add this one too.
Now recovered from a cold of some sort, very chesty. I haven't done much of anything in the last week, watched a bit of telly was my limit.
Updated and added a few markdown's. You'll need to rename it to readme.md since the forum software doesn't accept .md as a suffix.
PS: I've changed the ioctl() features. Reduced it to just two user control numbers, the same two (70 and 72) as used in the demo speed testers. So the demos don't need altered. The other two aren't functionally gone as such, they're just combined with the remaining two. Now, instead of separately getting and setting a value, it returns the prior value while also setting the new value.
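To illustrate the "return the prior value while setting the new one" behaviour described above, here is a minimal sketch in plain C. It is not the driver's code: the state struct, the function `drv_control` and the names `CTL_A`/`CTL_B` are hypothetical, and only the numbers 70 and 72 come from the post. What each number actually controls is not stated here, so the two settings are left generic.

```c
#include <stdint.h>

/* Hypothetical names; the thread only gives the control numbers. */
enum { CTL_A = 70, CTL_B = 72 };

/* Stand-in for whatever per-drive settings the real driver keeps. */
typedef struct {
    uint32_t setting_a;
    uint32_t setting_b;
} drv_state_t;

/* Exchange-style control: the caller passes the new value in *arg and
 * gets the previous value back in the same variable.
 * Returns 0 on success, -1 for an unknown control number. */
static int drv_control(drv_state_t *st, int cmd, uint32_t *arg)
{
    uint32_t *slot;

    switch (cmd) {
    case CTL_A: slot = &st->setting_a; break;
    case CTL_B: slot = &st->setting_b; break;
    default:    return -1;
    }

    uint32_t prior = *slot;   /* remember the old setting */
    *slot = *arg;             /* apply the new one */
    *arg  = prior;            /* hand the old one back */
    return 0;
}
```

One consequence of this shape is that temporarily changing a setting takes two calls with the same variable: the first installs the new value and leaves the old one in the variable, and the second puts the old one back.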
Latest edit:
Hehe, first success playing with __pasm {} code sections. Converted just the data block read routine - Performance sucks! Which I'm going to put down to my blind fast copy firing on every call. Presumably Flexspin's Fcache is smarter than that.
Compared to:
No, it really isn't.
Big issue with that idea I see is that you can no longer instantiate 2 drives at the same time. (The pasm blocks are equivalent to DAT blocks in Spin, you only get one instance for a class)
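The single-instance problem can be pictured with an analogy in plain C. This is only a sketch and not Flexspin-specific: a file-scope static stands in for data living in a shared DAT or pasm block, and the struct member stands in for true per-instance state. The names `drive_t`, `drive_open` and `shared_clk_pin` are made up for the illustration.

```c
/* Per-instance state: each drive gets its own copy. */
typedef struct {
    int clk_pin;
} drive_t;

/* Stand-in for data placed in a shared block: one copy for every drive. */
static int shared_clk_pin;

/* Opening a drive records the pin both per-instance and in the shared copy. */
static void drive_open(drive_t *d, int clk_pin)
{
    d->clk_pin = clk_pin;      /* safe: belongs to this drive only */
    shared_clk_pin = clk_pin;  /* unsafe: overwrites the other drive's preset */
}
```

After opening a first drive on pin 10 and a second on pin 20, each struct still holds its own pin, but the shared copy holds only 20, so any code reading the shared copy on behalf of the first drive would use the wrong pin.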
Hmm, yeah, that one did slip. Funnily it's the presets that I'm trying to merge with the code that'll be where this problem comes up.
Not pursuable at any angle now.
Oh well.
I've been chatting with Stephen about creating a removable driver from his recent FAT32 FS efforts - https://forums.parallax.com/discussion/178033/p2-usd-card-driver-fat32-filesystem-spin2-pasm2/p1
Listing what might be useful to support. In the process I realised I'd never bothered to measure how much bigger the binary becomes when device TRIMming is enabled with the 4-bit SD mode driver.
Turns out it's pretty small. Only 340 bytes (85 longwords). Of that 85 longwords, the driver accounts for 41.
PS: For anyone wanting to try it out, enabling requires editing of the file include/filesys/fatfs/ffconf.h changing the FF_USE_TRIM define from 0 to 1.
Current interface suggestion is:
With option to support other pertinent Flexspin hooks like:
cardrelease() for when unmounting the card.
ioctl() is for many optional query/control of driver/card features. Buffer flushing, card capacity, TRIMming, everything except block data transfer.
Don't forget the IOCTL to disable CRC read validation. Wasn't that something needed for highest transfer speed operation, like streaming video where a retry after CRC error is not desired?
Yep, that switch is in there too. As is setting of the clock divider. Although disabling read-CRC is less about avoiding retries and more about the parallel executing CRC computation not keeping up with data rate. Best it can do is sysclock/3. Which, btw, is not the default. Sysclock/4 is the default. So more speed can be coaxed out of the CRC enabled mode.
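For a rough feel of what the divider means for speed, here is a small sketch that converts a sysclock and divider into a raw 4-bit bus data rate. It assumes four data bits per SD clock (one byte every two clocks) and ignores command, CRC and busy overheads, so it is an upper bound rather than a predicted benchmark figure. The function name `sd_raw_bytes_per_sec` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: raw payload rate of a 4-bit SD bus.
 * SD clock = sysclock / divider, and each SD clock moves 4 bits,
 * so two clocks are needed per byte. Overheads are ignored. */
static uint32_t sd_raw_bytes_per_sec(uint32_t sysclock_hz, uint32_t divider)
{
    if (divider == 0)
        return 0;                          /* avoid division by zero */
    return (sysclock_hz / divider) / 2;    /* 4 bits per clock */
}
```

Taking a 200 MHz sysclock purely as an example value, sysclock/4 gives 25,000,000 bytes/s and sysclock/3 gives 33,333,333 bytes/s, which is the headroom referred to above for the CRC-enabled mode.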
Minor update to the driver for optional reducing of binary size:
Good timing. I'm currently in the midst of porting some of your SD driver code to MicroPython. Don't need all of it, just the core P2 pin & card init parts, plus CRC and the data/cmd transfer stuff and will hook up the sector reads/writes into MP VFS.
I believe I have found a way to get Fcache functioning with LLVM using your (slightly modified) code. For example, here's the crc7sd function altered to suit. It should be able to read a block of PASM2 code into COGRAM, execute it at speed and return. COG registers from $0-$1D0 are currently free for use. I just have to mess with the format to ensure that the correct C variables are used in the registers allocated to the code. I've also added a number of PASM2 instructions that were not part of Nikita's original ISA, including CRC, MODCZ, LOC, WAITnnnn, etc. Hopefully I have all the needed ones now and your code will finally build. Eventually, once I know this works, I should make this Fcache handler sit in LUTRAM so it doesn't have to continue in hubexec after block loading before branching back into COGRAM to execute the block - that requires a FIFO refill and is wasteful to prepend each time.
static uint32_t crc7sd( // SD spec 4.5
    uint8_t *buf,
    size_t len )
{
    uint32_t crc = 0;
    asm volatile ( // const enforces XIP, volatile enforces Fcache
        // Reference code courtesy of Ariba
        "loc pb, #\\crc_end\n"
        "loc pa, #\\crc_start\n"
        "sub pb, pa\n" // compute length in bytes - 4
        "shr pb, #2\n" // convert to longs - 1
        "setq pb\n" // copy transfer length in longs - 1
        "rdlong $0, pa\n" // burst read from HUB to COGRAM
        "loc pa, #\\crc_end+4\n" // get return address
        "push pa\n" // save return address on internal stack
        "jmp #0\n" // run block of fcache code and return to address after fcache code block
        "crc_start:\n"
        "rdfast #0, %[buf]\n" //buf
        "rep #5, %[len]\n" //len
        "rfbyte pa ' comment \n"
        "movbyts pa, #0b00011011\n" // byte swap within longwords
        "setq pa\n"
        "crcnib %[crc], #0x48\n" // CRC-7-ITU reversed (x7 + x3 + x0: 0x09, odd parity)
        "crcnib %[crc], #0x48\n"
        "rev %[crc]\n" // correct the bit order to match standard
        "shr %[crc], #24\n" // 7-bit CRC in bits 7..1
        "or %[crc], #1\n" // and the SD response end-bit in bit0
        "crc_end: ret\n"
        : [crc] "+r" (crc) : [len] "r" (len), [buf] "r" (buf) :);
    return crc;
}

Cool. Yeah, the main block read/write routines seriously use a lot of cogRAM, and to good effect.
PS: Here's an XIP version of crc7sd(), from v1.2 driver, that is almost as fast as that Fcache'd version.
static uint32_t crc7sd( // SD spec 4.5
    uint8_t *buf,
    size_t len )
{
    uint32_t crc = 0;
    uint32_t val;

    __asm const { // "const" enforces XIP, "volatile" enforces Fcache
        // Reference code courtesy of Ariba
crc7lp
        rdlong  val, buf
        add     buf, #4
        movbyts val, #0b00_01_10_11 // byte swap within longwords
        setq    val
        crcnib  crc, #0x48 // CRC-7-ITU reversed (x7 + x3 + x0: 0x09, odd parity)
        crcnib  crc, #0x48
        djz     len, #crc7done
        crcnib  crc, #0x48
        crcnib  crc, #0x48
        djz     len, #crc7done
        crcnib  crc, #0x48
        crcnib  crc, #0x48
        djz     len, #crc7done
        crcnib  crc, #0x48
        crcnib  crc, #0x48
        djnz    len, #crc7lp
crc7done
        rev     crc // correct the bit order to match standard
        shr     crc, #24
        or      crc, #1 // add the SD response end-bit as 8th bit
    }
    return crc;
}

Cool, yeah unrolling in hubexec can still be fast without as much branching. Pity we cannot use the streamer though and that demands an Fcache solution.
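For anyone porting or checking the PASM2 versions above, a portable bit-at-a-time reference can be handy for comparing results on a PC. The sketch below is not taken from the driver: the name `crc7sd_ref` is made up, and it simply implements the same polynomial (x7 + x3 + x0, 0x09) one bit at a time, MSB first, then appends the end bit as the code comments above describe.

```c
#include <stddef.h>
#include <stdint.h>

/* Portable reference CRC-7 for SD command frames (hypothetical helper).
 * Polynomial x^7 + x^3 + 1 (0x09), MSB-first, initial value 0.
 * Returns the CRC in bits 7..1 with the end bit (1) in bit 0,
 * i.e. the final byte of a 6-byte SD command. */
static uint8_t crc7sd_ref(const uint8_t *buf, size_t len)
{
    uint8_t crc = 0;                      /* 7-bit register in bits 6..0 */

    for (size_t i = 0; i < len; i++) {
        uint8_t data = buf[i];
        for (int bit = 0; bit < 8; bit++) {
            /* feedback = register MSB xor next data bit */
            uint8_t feedback = ((crc >> 6) ^ (data >> 7)) & 1;
            crc = (uint8_t)((crc << 1) & 0x7F);
            if (feedback)
                crc ^= 0x09;
            data = (uint8_t)(data << 1);
        }
    }
    return (uint8_t)((crc << 1) | 1);     /* shift up and add the end bit */
}
```

Feeding it the first five bytes of a command gives the sixth: the CMD0 frame 40 00 00 00 00 yields 0x95 and the CMD8 frame 48 00 00 01 AA yields 0x87, which are the values commonly hard-coded in SPI-mode init code.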
PPS: Regarding those block read/write routines, they partly depend on registers being loaded along with the code. If Fcache is shifted to LUTRAM you'd have to move such data into locals instead.
Or they could be included with the presets.