dgately, please try running the two test programs that are in the attached zipfile. test1.c is a minimal program that just prints "Hello". test2.c checks an array of 5000 ints to see if there is any data corruption.
Those run correctly!
$ p2gcc -v -k -o a.bin test1.c
propeller-elf-gcc -mcog -Os -m32bit-doubles -S test1.c
s2pasm -g -p/opt/parallax/lib/prefix.spin2 test1
p2asm -c -o test1.spin2
p2link /opt/parallax/lib/prefix.o -v -o a.bin test1.o /opt/parallax/lib/stdio.a /opt/parallax/lib/stdlib.a /opt/parallax/lib/string.a
Found offset of 12 for symbol ___files of type W at location 568
$ loadp2 -p /dev/cu.usbserial-AE00BU4H -b 115200 -t -v /Users/myUser/source/p2gcc/a.bin
Loading /Users/myUser/source/p2gcc/a.bin - 1284 bytes
/Users/myUser/source/p2gcc/a.bin loaded
[ Entering terminal mode. Press ESC to exit. ]
Hello
$ p2gcc -v -k -o a.bin test2.c
propeller-elf-gcc -mcog -Os -m32bit-doubles -S test2.c
s2pasm -g -p/opt/parallax/lib/prefix.spin2 test2
p2asm -c -o test2.spin2
p2link /opt/parallax/lib/prefix.o -v -o a.bin test2.o /opt/parallax/lib/stdio.a /opt/parallax/lib/stdlib.a /opt/parallax/lib/string.a
Found offset of 6 for symbol _LC7 of type R at location 548
Found offset of 12 for symbol ___files of type W at location 54ac
$ loadp2 -p /dev/cu.usbserial-AE00BU4H -b 115200 -t -v /Users/myUser/source/p2gcc/a.bin
Loading /Users/myUser/source/p2gcc/a.bin - 21576 bytes
/Users/myUser/source/p2gcc/a.bin loaded
[ Entering terminal mode. Press ESC to exit. ]
Done
I believe the problem is that the FPGA images for your boards do not support the cordic instructions. p2gcc uses qmul and qdiv, and when getqx and getqy are executed they must be returning values of zero. I think the only code that uses qmul and qdiv is in prefix.spin2, so it's possible to create a version of prefix.spin2 that does multiplication and division like the P1 does in a loop.
I'll look into this tomorrow, and I'll post a version of prefix.spin2 that should work with your boards.
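For readers unfamiliar with "multiplication and division in a loop", here is a rough C sketch of the idea: a shift-and-add multiply and a restoring divide. The names soft_mul and soft_divmod are made up for this illustration. The real routines in prefix.spin2 are written in P2 assembly and may well be structured differently.

```c
#include <stdint.h>

/* Hypothetical helper: 32-bit multiply using shift-and-add.
   The result is truncated to 32 bits, like a normal C unsigned multiply. */
uint32_t soft_mul(uint32_t a, uint32_t b)
{
    uint32_t result = 0;
    while (b != 0) {
        if (b & 1u)
            result += a;   /* add the shifted multiplicand when this bit of b is set */
        a <<= 1;
        b >>= 1;
    }
    return result;
}

/* Hypothetical helper: unsigned restoring division, one quotient bit per pass.
   Returns the quotient and stores the remainder in *rem.
   The caller must make sure d is not zero. */
uint32_t soft_divmod(uint32_t n, uint32_t d, uint32_t *rem)
{
    uint32_t q = 0, r = 0;
    for (int i = 31; i >= 0; i--) {
        r = (r << 1) | ((n >> i) & 1u);   /* bring down the next bit of n */
        if (r >= d) {
            r -= d;
            q |= (1u << i);
        }
    }
    *rem = r;
    return q;
}
```

Calling soft_divmod(100, 7, &r) gives a quotient of 14 with r set to 2. Both loops cost up to 32 passes, which is why a hardware multiplier is preferred when one is available.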
Thanks Dave! I'll test them...
I also notice that the BeMicro-A2 has 128k of RAM, so prefix.spin2 could keep the larger hub RAM size setting for that board vs the DE0-Nano.
BeMicro-A2 | 1 7 128k 80MHz No BeMicro_A2_Prop2_v32b.jic * - READY
DE0-Nano | 1 8 32k 80MHz No DE0_Nano_Prop2_v32b.jic - READY
DE0-Nano Bare | 1 8 32k 80MHz No DE0_Nano_Bare_Prop2_v32b.jic - READY
* These images always map SD card pins {CSn,CLK,DO,DI} into P[61:58].
It would be good to know which examples would run on these smaller, 1-cog boards. Should the SD-based examples run on the BeMicro-A2, with its on-board SD? Or, do they also require the cordic?
What Dave is saying there is that any C source that uses the * or / operators requires the CORDIC, for the moment.
So, any multiply or divide operations are currently not supported for the single-cog boards... I get it!
But... That does not explain why filetest.c gets an error calling sd_mount(). I could just assume that it's a cordic issue, but should I? There's only one not-currently-executed multiply operation in filetest.c (inside getdec()). I'm not sure that this is a cordic issue (yet).
The limitation also applies to any pre-compiled C built with the same tools, for example library code that does a multiply. In this case sd_mount() calls mount_explicit(), which multiplies by 2 a couple of times. That could easily be changed to a left shift by 1.
Looking at the generated assembly, fsrw.c does generate a multiply by -2. I don't know why this isn't optimized to a shift and a subtract, but the multiply will cause the mount to fail.
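As a small illustration of the workaround being discussed, a multiply by 2 or by -2 can be written with a shift so that no multiply routine is needed. This is only a sketch: times2 and times_neg2 are hypothetical names, not functions from fsrw.c, and whether the compiler would emit a multiply for the original expressions depends on the surrounding code and optimization flags.

```c
#include <stdint.h>

/* x * 2 without a multiply. The shift is done on an unsigned copy to avoid
   the undefined behaviour of left-shifting a negative signed value. */
int32_t times2(int32_t x)
{
    return (int32_t)((uint32_t)x << 1);
}

/* x * -2 as a shift followed by a subtract from zero (a negate). */
int32_t times_neg2(int32_t x)
{
    return (int32_t)(0u - ((uint32_t)x << 1));
}
```

For example, times2(21) is 42 and times_neg2(5) is -10. The conversion back to int32_t assumes the usual two's-complement wraparound, which is what GCC does.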
I created a new prefix.spin2 that doesn't use qmul and qdiv. This is contained in the attached zip file. You will also need to update p2link.c and build it. p2link puts a pointer at 0x13C that overwrites the new multiply and divide code I added. The new p2link.c moves the pointer further away from the code.
Here's a new version of prefix.spin2 that automatically detects if the P2 supports CORDIC or not, and uses the best code for multiply and divide. It still requires the change to p2link.c that I attached to the previous post. I also included a program named cordic.c that prints out whether CORDIC is available or not.
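The post does not say how the detection works. One plausible approach, given the earlier observation that the results come back as zero on images without CORDIC, is to multiply two known values once at startup and fall back to the loop version if the answer is wrong. The C sketch below shows that idea with a stub standing in for the hardware. All names here (hw_mul_stub, loop_mul, select_mul) are hypothetical, and the real prefix.spin2 may detect CORDIC in a different way.

```c
#include <stdint.h>

typedef uint32_t (*mul_fn)(uint32_t, uint32_t);

/* Stand-in for the hardware multiply (qmul followed by getqx on a real P2).
   This stub simulates an FPGA image without CORDIC by always returning 0. */
uint32_t hw_mul_stub(uint32_t a, uint32_t b)
{
    (void)a;
    (void)b;
    return 0;
}

/* Software fallback: shift-and-add multiply. */
uint32_t loop_mul(uint32_t a, uint32_t b)
{
    uint32_t result = 0;
    while (b != 0) {
        if (b & 1u)
            result += a;
        a <<= 1;
        b >>= 1;
    }
    return result;
}

/* Probe the candidate hardware routine with known nonzero operands and
   use it only if it produces the right product. */
mul_fn select_mul(mul_fn hw)
{
    return (hw(3u, 5u) == 15u) ? hw : loop_mul;
}
```

With the stub above, select_mul(hw_mul_stub) returns loop_mul, so a later call through the selected pointer with 6 and 7 gives 42.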
This improves the situation, allowing most of the samples to at least execute. filetest.c now gets further, with just one additional modification to the sdspi.c library code (see below). filetest still does not complete, but the SD is mounted and several commands are written to the card with responses (until timing out).
#1 issue: After making the changes to p2link.c and switching to the newer prefix.spin2, I rebuilt everything. I noticed that when building the libraries, sdspi.c doesn't actually compile. The function definition "int getcnt(void)" produces an error because a macro named getcnt already exists:
sdspi.c:54:16: error: macro "getcnt" passed 1 arguments, but takes just 0
sdspi.c:55:1: error: expected '=', ',', ';', 'asm' or '__attribute__' before '{' token
Renaming getcnt() to mygetcnt() and replacing all occurrences of that name in sdspi.c, fixes the error.
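The error text suggests that getcnt is defined as a function-like macro taking no parameters, so the preprocessor reads "getcnt(void)" as a macro call with one argument. The sketch below reproduces that situation under this assumption. The macro body and the fake_counter variable are invented for the example and are not the real header contents.

```c
#include <stdint.h>

/* Stand-in for the system counter, invented for this example. */
uint32_t fake_counter = 1234;

/* Assumed shape of the conflicting macro: function-like, zero parameters. */
#define getcnt() (fake_counter)

/* int getcnt(void) { ... }
   would not compile here: "getcnt(void)" is treated as a macro invocation
   with one argument, which matches the error reported above. */

/* Renamed function, as in the fix described above: no collision. */
int mygetcnt(void)
{
    return (int)getcnt();
}
```

Another standard C option is to write the definition as int (getcnt)(void), because a function-like macro is only expanded when its name is followed directly by an opening parenthesis. Renaming is the simpler and more readable choice.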
#2 filetest.c execution responds with an eventual timeout after trying to initialize communication with the SD card. I instrumented sdspi.c with printfs to dump-out the commands, parameters and results of the initialization calls:
Loading /Users/myUserName/source/p2gcc/a.bin - 28916 bytes
/Users/myUserName/source/p2gcc/a.bin loaded
[ Entering terminal mode. Press ESC to exit. ]
cmd: 0 parm: 0x00000000
start_exp cmd 0:0 result: 128
cmd: 8 parm: 0x000001AA
start_exp cmd 8:0x1aa result: 128
cmd: 55 parm: 0x00000000
cmd: 41 parm: 0x40000000
start_exp cmd 41:0x40000000 result: 130
cmd: 55 parm: 0x00000000
cmd: 41 parm: 0x40000000
start_exp cmd 41:0x40000000 result: 128
... the above result 58 times!
cmd: 55 parm: 0x00000000
errorexit: -41 <-- this is a timeout during read in void checktime(void) (in sdspi.c)
My Samsung SDHC 4GB SD card formatting:
Formatting disk3s1 as MS-DOS (FAT) with name UNTITLED
512 bytes per physical sector
/dev/rdisk3s1: 7700608 sectors in 120322 FAT32 clusters (32768 bytes/cluster)
bps=512 spc=64 res=32 nft=2 mid=0xf8 spt=32 hds=255 hid=8192 drv=0x80 bsec=7702528 bspf=941 rdcl=2 infs=1 bkbs=6
fsrwtest.c & shell.c get the same results as they use the same initialization as filetest.c
dgately, sorry about the getcnt. You reported that almost a year ago, and I forgot to fix it. I just checked in a fix for it in GitHub today.
The code should loop a few times sending commands 55 and 41 until a zero response is received. My initialization code has problems with SDHC cards. It usually requires many loops before it gets a zero. In your case it never gets a zero response. Maybe it's a timing issue. I haven't figured out the cause yet. I have noticed that once I've successfully mounted an SDHC card, subsequent mounts only take a few loops.
Do you have an old 2G or smaller SD card? If so, give that a try. The code works better with SD than it does with SDHC.
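The polling loop described here can be sketched in C against a simulated card. This is an illustration only: fake_card, send_acmd41 and wait_for_ready are hypothetical names, the card is faked, and the real sdspi.c bounds the loop with a cycle-count timeout in checktime() rather than a poll count.

```c
#include <stdint.h>

/* Simulated card: answers "still initialising" (nonzero) for a number of
   polls, then answers 0 once it is ready. */
typedef struct {
    int polls_until_ready;
} fake_card;

/* Hypothetical stand-in for sending CMD55 followed by command 41 and
   returning the card's response byte. */
int send_acmd41(fake_card *card)
{
    if (card->polls_until_ready > 0) {
        card->polls_until_ready--;
        return 1;   /* card still busy with its internal initialisation */
    }
    return 0;       /* card ready */
}

/* Poll until the card answers 0 or max_polls attempts have been used.
   Returns the number of polls taken, or -1 on timeout. */
int wait_for_ready(fake_card *card, int max_polls)
{
    for (int n = 1; n <= max_polls; n++) {
        if (send_acmd41(card) == 0)
            return n;
    }
    return -1;
}
```

A card that needs 5 busy polls returns 6 from wait_for_ready, while a card that never becomes ready within the limit returns -1, which corresponds to the timeout seen in the log above.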
I had forgotten about that bug :-)...
I'll look for an old 2GB card (most of the older SD I have are of the larger size). Or, I may increase the timeout just to see if it can eventually get a zero response.
Found an old 2GB SD (non-SDHC) card, formatted it as MS-DOS FAT, but getting the same results... Even an increase in the timeout value in checktime() didn't seem to help. Probably a timing problem at a lower level.
I've attached Cluso's SD test program. Could you try running that? Just type "loadp2 -T SD2_test_121a.obj". I've included the binary file produced by PNut because p2asm doesn't produce an exact match to it. I'll have to look at p2asm to see what it doesn't like about the spin2 source file.
The problem isn't related to the formatting of the card. It's not getting past the initialization code. I suspect the problem is caused by the clock speed I'm using during the initialization phase. Other drivers use a slower clock when initializing, and then use a faster clock after the card has been initialized. I've ignored that, but my SD and SDHC cards mount OK without changing the clock speed.
I modified my code to throttle the clock speed at the beginning. I'll post it after I've had a chance to test it.
If you aren't getting past the initialisation phase, check out my code. I've not tried to be fast, just thorough and following the specs. Some cards are quicker to initialise than others. One card, a Dane, gave an invalid response, which I also cater for.
Initialisation time is dominated by how long the card itself takes, not by how fast you bit-bang the pins!
BTW, I have thought about making the SD code callable from the user's code. It will only take a few RET instructions to achieve it. The code will require 32 longs in cog at $1C0-$1DF.
There would be an SD Init routine, a routine to search for a filename in the root directory, a routine to read a file given the file's length, and of course a read sector routine. Currently, there is no write sector routine as I removed it a long time ago. Not sure if there's enough space for it. There are better ways to run faster for reading and writing sectors anyway. My code needed to be bulletproof.
IIRC file searching is now only for FAT32. ie I don't support FAT16, nor exFAT.
I tried the slower clock, but that didn't make any difference. I think the slower clock is only needed for very old cards. My SD/SDHC cards always initialize, but the first time takes many loops, as many as 50. Subsequent mounts usually take 4 to 6 loops. I don't know why dgately's cards don't initialize. I'll continue to look at it.
Cluso, I don't think you need to make your code callable. User code can just use their own drivers.
If there is room, and it is easy to do, that would be a good idea.
It is always useful to have proven code, in a known place. This then becomes a little BIOS like.
Other vendors have callable ROMs with common routines - eg Divide for cores lacking that.
Will the pins be locked, or could the code access multiple SD cards?
Sounds like "easy use" requires a linker that can locate/reserve memory areas, but that will be software that can come later.
Cluso, when I say it loops I'm referring to the ACMD41 loop. I added a loop count just after the .again55 label in your code, and I see that it loops many times. In fact, it appears to be more loops than I typically get with my code. There's a chance I didn't count or print the loop count correctly. Do you know how many times your ACMD41 loop repeats?
The loop is needed because the card is performing its internal initialisation following the reset command CMD0. The response to CMD0 from the card allows the user (i.e. not the card) software to progress.
The card even responds to the CMD8 (voltage check), again allowing the user software to progress.
However, once we get to the nitty-gritty CMD55/ACMD41 pair, this is where the "busy" signal is returned while we wait for the card to complete its internal initialisation. I noted in the past that the time spent in this loop can be quite significant. It varies dramatically between cards. Older versions of my code used to trace the loop count, IIRC. Some of the user logs will probably show this.
You will note there is a timeout running in the background from the beginning of CMD0 (starttime/duration/delay1s) that runs up to 1s timeout - yes 1 second!!!
Starttime is also reset in readblock.
Postedit:
I had to patch the CMD0 reply to allow for $00 so that the Dane SD card would work. The valid reply should be $01=idle.
Cluso, thanks for the explanation. That might explain why it takes longer to mount the SD card the first time after a power up. I'm currently using a timeout value of 50 million cycles, which is 625 msec. I checked the driver in FSRW, and it uses a timeout value of 4 seconds.
dgately, please try changing the timeout value from 50000000 to 160000000 in the checktime routine in sdspi.c. That will change it from 0.625 seconds to 2 seconds. Rebuild the stdio library by running buildstdiolib in the lib directory. Then try running fsrwtest.c to see if it works. BTW, I don't think filetest.c will run correctly because a stack at 30K doesn't give it enough memory.
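The timeout figures follow directly from the 80 MHz clock listed for these FPGA images: cycles divided by clock frequency gives seconds. A small host-side C sketch of the conversion is shown below. CLOCK_HZ, cycles_to_ms and ms_to_cycles are names chosen for this example and do not appear in sdspi.c.

```c
#include <stdint.h>

/* 80 MHz system clock, as listed for the FPGA images in this thread. */
#define CLOCK_HZ 80000000u

/* Convert a cycle count to milliseconds (64-bit intermediate avoids overflow). */
uint32_t cycles_to_ms(uint32_t cycles)
{
    return (uint32_t)(((uint64_t)cycles * 1000u) / CLOCK_HZ);
}

/* Convert milliseconds to a cycle count at CLOCK_HZ. */
uint32_t ms_to_cycles(uint32_t ms)
{
    return (uint32_t)(((uint64_t)ms * CLOCK_HZ) / 1000u);
}
```

So 50000000 cycles is 625 ms, 160000000 cycles is 2000 ms, and the 4 second timeout used by the FSRW driver would be 320000000 cycles at this clock.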
I found that part of SD init is down to the card itself. You can loop fast or slow, and you can clock fast or slow, but it makes no difference: the card is performing internal checks on the integrity of the flash itself. I have some timing diagrams somewhere, which I may attach when I find them.
If you want to know a little bit about SD card internals and what makes them tick then this teardown is an interesting read.
Give it a try and see if it works for you.
The other examples work!
Sounds like the processor is booting itself from the flash - an easy way to save some money.