@rokicki & lonesock: I'm late to the party I know, but MANY thanks for the work you've put into this.
For what it's worth, here are the figures I get for a 128MB integral card:
Clock: 80000000 ClusterSize: 2048 ClusterCount: 61223
Raw write 3968 kB in 3590 ms at 1105 kB/s
Raw read 3968 kB in 2385 ms at 1663 kB/s
fsrw pwrite 480 kB in 2820 ms at 170 kB/s
fsrw pread 480 kB in 422 ms at 1135 kB/s
FSRW pputc 63 kB in 2276 ms at 27 kB/s
FSRW pgetc 63 kB in 1843 ms at 34 kB/s
That's a small card. If you reformat it to use a larger cluster size, you'll get better speed out of it.
Especially for the pwrite; your pwrite numbers are being *destroyed* by the tiny cluster size.
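A rough way to see why cluster size matters for pwrite: the Python sketch below counts how many clusters the 480 kB pwrite test spans at several cluster sizes, taking kB as 1024 bytes. It is illustrative only, and `clusters_needed` and `volume_bytes` are hypothetical helpers, not FSRW functions. It also rests on an assumption the thread does not measure: that each newly allocated cluster costs FSRW a trip to the FAT, so fewer, larger clusters mean fewer interruptions of the raw data stream.

```python
def clusters_needed(file_bytes: int, cluster_bytes: int) -> int:
    """Clusters occupied by a file of file_bytes (ceiling division)."""
    return -(-file_bytes // cluster_bytes)


def volume_bytes(cluster_count: int, cluster_bytes: int) -> int:
    """Size of the data area implied by a cluster count and cluster size."""
    return cluster_count * cluster_bytes


# Figures from the benchmark header: ClusterSize 2048, ClusterCount 61223.
print(volume_bytes(61223, 2048))  # 125384704 bytes, roughly 120 MiB of data area

file_size = 480 * 1024  # the 480 kB pwrite test, assuming 1 kB = 1024 bytes
for cluster in (2048, 4096, 16384, 32768):
    # Assumption: each new cluster means one more FAT lookup/update.
    print(cluster, clusters_needed(file_size, cluster))
# 2048 240
# 4096 120
# 16384 30
# 32768 15
```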
LOL - I hear ya! That card's been sitting in my bits box for ages waiting for its time to shine. I have a 2GB card, but my wife stole it for her camera - time for repossession, methinks!
Great work! This is a great addition to the Propeller.
Hanno
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Download a free trial of ViewPort - the premier visual debugger for the Propeller
Includes full debugger, simulated instruments, fuzzy logic, and OpenCV for computer vision. Now a Parallax Product!
First, this software is fantastic. The recent speed bump cut my requirements from two SD cards to one for a project I'm working on!
Second, I'm curious how, exactly, the SPI routines work (I've looked through several FSRW threads and didn't find the answer). I think I've figured it out, but I'm not sure I have the counter details exactly right. (Those with a weak stomach for odd hardware kludges may wish to skip the rest of this paragraph :) ) The reason I ask is that I need to slow down the SPI read side. Why on earth would I want to do that, you ask? Well... I have a test board with an interesting "design feature" (translation: I made my life miserable). I'm trying to share the SD card between a Prop and a USB-enabled PIC (the end goal is to have the card appear as a USB mass storage device when you plug in the cable from the computer). To do that, I put a bus mux on the lines between the microcontrollers and the SD card so that exactly one microcontroller can control the lines at a time (yes, I could do this in software, but I'd prefer not to be able to fry stuff by writing bad software). Had I put more thought into this, I would've realized that the bus mux adds too much latency to the clock line, but...
Ok, so here's my summary of how I think the SPI reads work. Let me know where I screwed up:
Clock:
- Counter B provides the clock
- Set FRQB to $4000_0000 and PHSB to $c000_0000 using movi. This causes us to have a 20MHz clock which starts one 80MHz clock cycle after the write to FRQB takes effect.
- This is necessary so that we can execute exactly one cog instruction per CTRB output clock cycle (i.e., four 80MHz clocks per CTRB clock)
Read:
- CTRA is used to read the data by setting it to be in "Logic A AND B" mode
- We read the bits in from most to least significant
- We read the bytes in from least to most significant
- We start with bit 7 of the final 32-bit word. The counter will count once in this mode if both DATA_IN and CLOCK are high. It doesn't count twice because PinA is delayed by one clock, so there's only a single clock where they can both be high.
- By shifting around which single bit is set in FRQA, we can read each bit in turn.
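The read-side summary above can be modelled arithmetically. The Python sketch below is an illustration under stated assumptions, not the driver: it ignores cycle timing and simply assumes, as described above, that CTRA adds FRQA to PHSA on exactly one system clock per bit whenever DATA_IN is high. Walking the single set bit in FRQA from bit 7 downward, then on to the next byte, assembles the word with the first byte received in the least significant byte. The function names are hypothetical.

```python
def bits_msb_first(data: bytes) -> list[int]:
    """DATA_IN levels in the order the card shifts them out: each byte MSB first."""
    return [(byte >> (7 - i)) & 1 for byte in data for i in range(8)]


def counter_capture(levels: list[int]) -> int:
    """Model of the CTRA 'LOGIC A AND B' read trick (arithmetic only, no timing).

    For each bit, FRQA holds a single set bit. Assumed behaviour: the counter
    adds FRQA to PHSA on the one clock where DATA_IN and CLOCK are both high,
    so a 1 on the data line sets that bit of PHSA and a 0 leaves it alone.
    """
    phsa = 0
    for i, level in enumerate(levels):
        byte_index, bit_index = divmod(i, 8)
        # Start at bit 7 of the word; bytes fill from least to most significant.
        frqa = 1 << (8 * byte_index + (7 - bit_index))
        if level:  # A AND B true for exactly one clock in this bit period
            phsa = (phsa + frqa) & 0xFFFF_FFFF
    return phsa


data = bytes([0x12, 0x34, 0x56, 0x78])
print(hex(counter_capture(bits_msb_first(data))))  # 0x78563412 (little-endian)
```

Because each FRQA value has a different single bit set, the additions never carry into each other, which is why plain addition in PHSA behaves like setting individual bits.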
I haven't quite worked out yet how to slow this down short of dropping the Prop speed to 40 MHz (which does actually work on my board, but I need the rest of the Prop running at full speed!), but I'm sure I can work something out. The loop above (which runs a 10MHz clock using two cog instructions per bit) won't work because it samples at the beginning of the clock period, not the end, but a variant of it will likely get me what I want.
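For the slow-down question, the clock side reduces to NCO arithmetic: in NCO mode PHSB gains FRQB every system clock and the pin follows PHSB[31], so the SPI clock is 80MHz x FRQB / 2^32. The Python sketch below decodes the `movi` immediates from the in32_swizzled routine posted further down, shows the resulting waveform, and lists what each halving of FRQB would give, assuming 4 system clocks per cog instruction. The helper names are hypothetical, and the sketch says nothing about where the read samples fall, which is the part Forest notes still needs reworking.

```python
SYSCLK_HZ = 80_000_000  # Prop system clock used in the thread


def movi(dest: int, imm9: int) -> int:
    """Propeller 1 MOVI: copy a 9-bit immediate into bits 31..23 of dest."""
    return (dest & 0x007F_FFFF) | ((imm9 & 0x1FF) << 23)


def nco_frequency(frq: int, sysclk: int = SYSCLK_HZ) -> float:
    """NCO mode adds FRQ to PHS every system clock; the pin follows PHS[31]."""
    return sysclk * frq / 2**32


def nco_levels(phs: int, frq: int, clocks: int) -> list[int]:
    """Clock-pin level (PHS[31]) after each of the next `clocks` system clocks."""
    levels = []
    for _ in range(clocks):
        phs = (phs + frq) & 0xFFFF_FFFF
        levels.append(phs >> 31)
    return levels


phsb = movi(0, 0b011_000000)  # 0x6000_0000
frqb = movi(0, 0b001_000000)  # 0x2000_0000
print(hex(phsb), hex(frqb), nco_frequency(frqb))  # 10 MHz
print(nco_levels(phsb, frqb, 8))  # [1, 1, 1, 1, 0, 0, 0, 0]: high one clock after FRQB is set

# Each halving of FRQB halves the SPI clock and doubles the instructions per bit
# (assuming the usual 4 system clocks per cog instruction).
for frq in (0x4000_0000, 0x2000_0000, 0x1000_0000):
    sysclocks_per_bit = 2**32 // frq
    print(f"FRQB=${frq:08X}: {nco_frequency(frq) / 1e6:.0f} MHz, "
          f"{sysclocks_per_bit // 4} instruction(s) per bit")
```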
Thanks! Yep, you got the idea. Here's a slightly earlier version of the code, with the read as a separate subroutine. The clock rate is 1/2 the current one, and there's a bit of extra overhead looping through 4 bytes. You may need to clean up the entry/exit a bit, but this was working code [8^). Let me know if you have any problems with it. You will probably need to use the same technique for the "in8" subroutine.
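in32_swizzled
        ' set up my input variable and my clock
        'mov    readback,#0             ' not needed: 4 x "rcl #8" shifts out every old bit
        mov     tmp1,#4                 ' 4 bytes per 32-bit word
        movi    phsb,#%011_000000       ' PHSB := $6000_0000; clock pin (PHSB[31]) starts low
:in_byte
        ' Start my clock
        movi    frqb,#%001_000000       ' FRQB := $2000_0000: 8 sysclocks per SPI clock (10MHz @ 80MHz)
        ' keep reading in my value, BACKWARDS! (Brilliant idea by Tom Rokicki!)
        test    maskDO,ina wc           ' C := DO pin (first bit is the MSB)
        rcl     readback,#8             ' shift left 8, filling bits 0..7 with C
        test    maskDO,ina wc
        muxc    readback,#2             ' second bit -> bit 1
        test    maskDO,ina wc
        muxc    readback,#4
        test    maskDO,ina wc
        muxc    readback,#8
        test    maskDO,ina wc
        muxc    readback,#16
        test    maskDO,ina wc
        muxc    readback,#32
        test    maskDO,ina wc
        muxc    readback,#64
        test    maskDO,ina wc
        mov     frqb,#0                 ' stop the clock
        muxc    readback,#128           ' eighth bit (LSB) -> bit 7
        ' go back for more
        djnz    tmp1,#:in_byte
        ' make it...NOT backwards [8^)
        rev     readback,#0             ' reverse all 32 bits: first byte ends up in bits 7..0
in32_swizzled_ret
        ret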
Jonathan
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
lonesock
Piranha are people too.
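To check the bit-twiddling in in32_swizzled without hardware, the Python sketch below replays only its register operations (RCL, MUXC and REV, as described in the Propeller 1 manual) on a list of sampled DO levels. The counters, the timing and the `maskDO` pin test are replaced by that list, and the function names are stand-ins. It also shows why the `mov readback,#0` can stay commented out: four `rcl #8` steps push all 32 stale bits out of the register.

```python
MASK32 = 0xFFFF_FFFF


def rcl(value: int, bits: int, c: int) -> int:
    """P1 RCL: shift left `bits` times, filling each vacated LSB with C."""
    for _ in range(bits):
        value = ((value << 1) | c) & MASK32
    return value


def muxc(value: int, mask: int, c: int) -> int:
    """P1 MUXC: set the mask bits if C = 1, clear them if C = 0."""
    return (value | mask) if c else (value & ~mask & MASK32)


def rev(value: int, s: int) -> int:
    """P1 REV: reverse the low 32 - s bits; the upper s bits become zero."""
    width = 32 - s
    out = 0
    for i in range(width):
        if (value >> i) & 1:
            out |= 1 << (width - 1 - i)
    return out


def in32_swizzled(levels: list[int]) -> int:
    """Replay the register operations of in32_swizzled on 32 sampled DO levels."""
    samples = iter(levels)
    readback = 0xDEAD_BEEF  # stale contents: 'mov readback,#0' is commented out
    for _ in range(4):  # the djnz tmp1 loop, one pass per byte
        # Fill bits 0..7 with the first bit; bits 1..7 are overwritten below.
        readback = rcl(readback, 8, next(samples))
        for mask in (2, 4, 8, 16, 32, 64, 128):
            readback = muxc(readback, mask, next(samples))
    return rev(readback, 0)  # un-reverse all 32 bits


data = bytes([0x12, 0x34, 0x56, 0x78])
levels = [(b >> (7 - i)) & 1 for b in data for i in range(8)]  # MSB first
print(hex(in32_swizzled(levels)))  # 0x78563412
```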
I am trying to test a Motorola 32MB micro-SD card.
First and second mount go ok, but I get -100 when reading block 0 with test.spin.
Any suggestions?
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Please use mikronauts _at_ gmail _dot_ com to contact me off-forum, my PM is almost totally full
Morpheus & Mem+dual Prop SBC w/ 512KB kit $119.95, 2MB memory IO board kit $89.95, both kits $189.95
www.mikronauts.com - my site 6.250MHz custom Crystals for running Propellers at 100MHz
Las - Large model assembler for the Propeller Largos - a feature full nano operating system for the Propeller
-100 is the error code for "ERR_ASM_NO_READ_TOKEN". I have never actually seen this error happen...it means the driver started a multi-block read request, then waited 500ms for the start-of-block token, which never came. Is it possible this card is either _really_ old or slow? (500ms is already way longer than the spec states.)
Jonathan
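The timeout lonesock describes amounts to a bounded poll for the start-block token (0xFE in the SD specification). The Python sketch below is illustrative only: `read_byte` is a hypothetical callable standing in for clocking one byte out of the card, the real driver does this in PASM, and the 500 ms default simply mirrors the figure quoted above.

```python
import time

ERR_ASM_NO_READ_TOKEN = -100  # error code value quoted in the thread
START_BLOCK_TOKEN = 0xFE      # SD "start block" token that precedes read data


def wait_for_start_token(read_byte, timeout_s=0.5, clock=time.monotonic):
    """Poll the card until it sends the start-block token or the timeout expires.

    read_byte is a hypothetical callable that clocks out 0xFF and returns the
    byte the card sent back; a card that is still busy answers 0xFF.
    """
    deadline = clock() + timeout_s
    while clock() < deadline:
        if read_byte() == START_BLOCK_TOKEN:
            return 0
    return ERR_ASM_NO_READ_TOKEN


# A responsive card: a few busy bytes, then the token.
replies = iter([0xFF, 0xFF, 0xFF, START_BLOCK_TOKEN])
print(wait_for_start_token(lambda: next(replies)))  # 0
```

A card that never produces the token (or a slow card whose token arrives after the deadline) falls through to -100, which is the symptom reported above.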
I would try the other low-level drivers (lonesock, can you remind me of their names?) and see if they work.
I believe the current drivers work on all the dozens of cards we have tried.
Lonesock, don't we have all this in the README (if something fails, try this block driver, then this one)? If not, it's something we should consider before release.
Yes to both - old and probably slow. I'll go buy some 2GB cards :)
lonesock said...
-100 is the error code for "ERR_ASM_NO_READ_TOKEN". I have never actually seen this error happen...it means the driver started a multi-block read request, then waited 500ms for the start-of-block token, which never came. Is it possible this card is either _really_ old or slow? (500ms is already way longer than the spec states.)
Jonathan
Forest: I use the routine (it presently has a bug and I haven't had time to find the exact problem). I share the bus with SRAM on the TriBlade for ZiCog, which may help you with your shared bus. I have a separate -CS pin, but the DI, DO & CLK pins are shared. The problem I found previously is that you have to force the SD card to release the DO pin; we achieved this with an older driver, and lonesock has built this into his new driver as well.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔ Links to other interesting threads:
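Cluso99's point about forcing the card off DO matches a quirk commonly documented for SD/MMC cards in SPI mode: after CS goes high, the card keeps driving DO until it sees a few more clocks, so drivers on a shared bus typically send one dummy byte with CS deasserted. The thread does not say exactly how FSRW does it; the Python sketch below just records the pin sequence on a hypothetical `RecordingBus` to show the ordering.

```python
class RecordingBus:
    """Hypothetical stand-in for the shared DI/DO/CLK pins plus the SD card's CS."""

    def __init__(self):
        self.log = []

    def set_cs(self, level: int) -> None:
        self.log.append(("CS", level))

    def transfer(self, byte: int) -> int:
        self.log.append(("SPI", byte))
        return 0xFF


def sd_transaction(bus, body) -> None:
    """Run `body` with the card selected, then make the card let go of DO."""
    bus.set_cs(0)           # select the SD card (CS is active low)
    try:
        body(bus)
    finally:
        bus.set_cs(1)       # deselect...
        bus.transfer(0xFF)  # ...then 8 more clocks so the card releases DO


bus = RecordingBus()
sd_transaction(bus, lambda b: b.transfer(0x55))  # 0x55 is an arbitrary byte
print(bus.log)  # [('CS', 0), ('SPI', 85), ('CS', 1), ('SPI', 255)]
```

Only after that trailing byte is it safe to hand DI/DO/CLK to the other device (SRAM in Cluso99's case).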
So, do the really long unrolled loops make that big a difference in speed once the Spin overhead is added? I'd like to know, as I may wish to remove them.
Kye said...
So, do the really long unrolled loops make that big a difference in speed once the Spin overhead is added? I'd like to know, as I may wish to remove them.
Well, as the code is currently running the SPI clock at 20MHz (which is the max for MMC cards; 25MHz is the max for SD and SDHC cards in SPI mode) with one bit per instruction, the next step down would be two instructions per bit, i.e. a 10MHz clock. At 20MHz, assuming no overhead, you can clock data in or out at ~2.38MB/s. Including the overhead, the current code can transfer data at right about 2MB/s for raw access. Because of the read-ahead and write-behind code, the actual data transfer _can_ happen in parallel with the Spin overhead of FSRW, but not always. One slowdown is that you can't get fully consecutive reads (or writes) when using the FAT16/32 layer, as every once in a while you have to go read the actual FAT entries, then get back to reading your data. In those cases the data transfer becomes serial instead of parallel, so any slowdown of the block layer results in a slowdown in overall performance.
Oops, I'm digressing.
But the short answer is yes: dropping the SPI clock speed to 10MHz does significantly hurt transfer performance. For a project I'm currently working on, even 2MB/s is pretty close to my desired minimum speed. [8^) But feel free to change the code out; it certainly doesn't hurt my feelings!
Jonathan
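The throughput figures above follow from simple arithmetic. This Python sketch (illustrative, with hypothetical helper names) gives the no-overhead ceiling at 20MHz and 10MHz; the ~2.38MB/s figure is 2.5 million bytes per second expressed in binary megabytes (MiB). Real FSRW numbers will be lower because of command overhead and FAT accesses, as described above.

```python
def raw_spi_bytes_per_second(sclk_hz: float) -> float:
    """Upper bound on SPI-mode throughput: one bit per clock, no overhead."""
    return sclk_hz / 8


def sector_time_us(sclk_hz: float, sector_bytes: int = 512) -> float:
    """Time to shift one sector at the given SPI clock, ignoring overhead."""
    return sector_bytes * 8 / sclk_hz * 1e6


for sclk in (20e6, 10e6):
    bps = raw_spi_bytes_per_second(sclk)
    print(f"{sclk / 1e6:.0f} MHz: {bps / 2**20:.2f} MiB/s, "
          f"{sector_time_us(sclk):.1f} us per sector")
# 20 MHz: 2.38 MiB/s, 204.8 us per sector
# 10 MHz: 1.19 MiB/s, 409.6 us per sector
```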
The lines are all driven by the prop during use (except the data in line, which is driven by the SD card)...the pull-ups are for when the prop is not driving the SD card (or if you are releasing the lines from the FSRW driver, and using the same lines for some other application), so their values should not affect FSRW speed.
Jonathan
lonesock said...
The lines are all driven by the prop during use (except the data in line, which is driven by the SD card)...the pull-ups are for when the prop is not driving the SD card (or if you are releasing the lines from the FSRW driver, and using the same lines for some other application), so their values should not affect FSRW speed.
Jonathan
Does the card drive the line actively, or does it just pull it down? I ask because of rise time. If the card doesn't drive it actively, the rise time of the line could affect the communication speed (data-in speed, anyhow). I figure the guys (including you) working on this already know... but it's not a bad idea to double check.
I did look for specifics on the pull-ups for SPI, but I didn't find any that were specific enough for this application.
The only case the value of the pullups may affect speed is if they are too low. We are driving the lines at 20MHz.
As long as you stay with our recommended values of 10K or thereabouts you should be fine.
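Some back-of-the-envelope numbers behind rokicki's answer, as a Python sketch. The 20 pF line capacitance is an assumed, typical-order value rather than a measurement, and the 2.2 x R x C rise time is the standard 10% to 90% figure for an RC edge. The point it illustrates: a 10K pull-up alone would be far too slow for a 20MHz clock, which is consistent with the lines being actively driven, and the pull-up only starts to matter once it is strong enough to load the driver.

```python
VDD = 3.3  # volts; Propeller I/O supply


def pullup_sink_current_ma(r_ohms: float, vdd: float = VDD) -> float:
    """Current a driver must sink to hold a line low against the pull-up."""
    return vdd / r_ohms * 1e3


def rc_rise_time_ns(r_ohms: float, c_farads: float) -> float:
    """10%-90% rise time of a line pulled up only through R: about 2.2 * R * C."""
    return 2.2 * r_ohms * c_farads * 1e9


half_period_ns = 1e9 / 20e6 / 2  # high (or low) time of a 20 MHz clock: 25 ns
line_c = 20e-12                  # ASSUMED trace + pin + card capacitance

for r in (10_000, 1_000, 100):
    print(f"{r:>6} ohm: sink {pullup_sink_current_ma(r):6.2f} mA, "
          f"passive rise {rc_rise_time_ns(r, line_c):7.1f} ns "
          f"(vs {half_period_ns:.0f} ns half period)")
```

With 10K the driver only has to sink about 0.33 mA, while a 100 ohm pull-up would demand tens of milliamps, which is the "too low" case rokicki warns about.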
rokicki said...
The only case the value of the pullups may affect speed is if they are too low. We are driving the lines at 20MHz.
As long as you stay with our recommended values of 10K or thereabouts you should be fine.
Rokicki,
That was the answer I was looking for. I am designing with 10K, so I should be well within the bandwidth of what you guys are working with.
I figure that if you go too low, the pull-down time will suffer... so the achievable frequency would be lower.
Is there a dependency between the value of the pull-up resistors and speed? Do they need to be a specific size/value for the best speed?
James L
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
James L
Partner/Designer
Lil Brother SMT Assembly Services
Are you addicted to technology or Micro-controllers..... then checkout the forums at Savage Circuits. Learn to build your own Gizmos!
Thanks for the answers guys, I do appreciate it.
James L