fsrw 2.6 speed tests
pgbpsu
I'm working on a project that needs pretty high speed access (both read and write) to an SD card. To test that I can access the card sufficiently fast, I wrote the attached code. It's a basic block level read and write. The thing that surprises me is how slow the read is compared to the write. In my code it's about 10x slower. In the test program included with fsrw 2.6, reads run at about 1/2 the speed of writes, which is consistent with the comments about the nature of read vs write in the block level spin file.
My write speeds are on par with the fsrw tests. My reads are 4x slower than the fsrw tests and 10x slower than my writes. Why do they differ so much from the fsrw 2.6 test program? Am I doing something wrong? Have others seen the same thing? My results and the fsrw 2.6 test program results are included below.
My goal is to find a solution that will allow me to write an output buffer to the SD card at some location and read an input buffer from another location on the SD card before my output buffer fills again. My output buffer fills at a rate of 512-bytes every 3 milliseconds. My first attempt used 512-bytes for both the read and the write, but that didn't work. I doubled them both hoping that there was a large setup penalty for reads and I could improve my throughput by reading 1024 bytes and have 6ms to accomplish my goals. This hasn't helped enough and I'm no longer convinced there is a penalty. But I really don't understand how my tests are so different from the fsrw tests results (included below).
THE ATTACHED CODE WILL TRASH DATA ON YOUR SD CARD! IN FACT IT MAY REQUIRE A REFORMAT TO BE USED AFTER RUNNING THIS CODE.
I've attached the code for folks to look at as well as test if there are any brave souls out there. It's set up to run on the Gadget Gangster Prop Platform USB.
Thanks for looking.
Peter
Main Cog: 0
Serial ports running in cog: 1
Mount Command Returned: 3
Blocks to write: 10000
uSeconds to init inputBuffer and outputBuffer: 227
Running read and write speed tests.
block 10000 offset 1240 readPtr 8934 time(us) 26316057
sumWrite(us) 3158027 maxWrite(us) 638 minWrite(us) 406
sumRead(us) 21071387 maxRead(us) 20707 minRead(us): 2663
Finished. Blocks processed: 10000
Mount tests first
First mount.
Succeeded; stopping cog.
Second mount.
Succeeded.
Reading block 0 (should be a boot block)
Read finished; checking for boot block signature
Boot block checks out; unmounting
Third mount.
Succeeded.
Reading block 0 again (should still be a boot block)
Read finished; checking for boot block signature
Boot block checks out; writing it back
Write finished; unmounting
Fourth mount.
Succeeded.
Reading block 0 again (should still be a boot block)
Read finished; checking for boot block signature
Block layer seems to check out
Now speed tests
How fast can we write, sequentially?
Raw write 3968 kB in 2191 ms at 1810 kB/s
Do a single non-sequential write...Done
How fast can we read, sequentially?
Raw read 1920 kB in 2179 ms at 880 kB/s
Now the filesystem tests
Trying to mount
Mounted.
How fast can we write using pwrite?
fsrw pwrite 4064 kB in 3766 ms at 1079 kB/s
How fast can we read using pread?
fsrw pread 4064 kB in 4626 ms at 878 kB/s
How fast can we write using pputc?
FSRW pputc 63 kB in 2097 ms at 30 kB/s
How fast can we read using pgetc?
FSRW pgetc 63 kB in 1838 ms at 34 kB/s
Repeating all the speed results:
Clock: 80000000 ClusterSize: 32768 ClusterCount: 120979
Raw write 3968 kB in 2191 ms at 1810 kB/s
Raw read 1920 kB in 2179 ms at 880 kB/s
fsrw pwrite 4064 kB in 3766 ms at 1079 kB/s
fsrw pread 4064 kB in 4626 ms at 878 kB/s
FSRW pputc 63 kB in 2097 ms at 30 kB/s
FSRW pgetc 63 kB in 1838 ms at 34 kB/s
All done!
Waiting for key press to start tests
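For comparison with the fsrw figures, the sums in the first log (sumWrite, sumRead) can be converted into average throughput. This is a minimal sketch in Python rather than Spin and is not part of the attached code; the 1024-byte kB is an assumption about the unit the fsrw test prints.

```python
BLOCK_BYTES = 512  # block size used by the block-level test


def throughput_kb_per_s(blocks, total_us, bytes_per_kb=1024):
    """Average throughput over `blocks` 512-byte blocks that took `total_us` microseconds.

    bytes_per_kb=1024 is an assumption; pass 1000 for decimal kilobytes.
    """
    seconds = total_us / 1_000_000
    return blocks * BLOCK_BYTES / seconds / bytes_per_kb


blocks = 10_000
write_kbs = throughput_kb_per_s(blocks, 3_158_027)   # sumWrite(us) from the first log
read_kbs = throughput_kb_per_s(blocks, 21_071_387)   # sumRead(us) from the first log
ratio = write_kbs / read_kbs

print(f"write {write_kbs:.0f} kB/s, read {read_kbs:.0f} kB/s, ratio {ratio:.1f}x")
```

By this arithmetic the interleaved test averages roughly 1580 kB/s for writes and about 240 kB/s for reads, so the measured gap is closer to 6.7x than 10x, and the reads are a bit under 4x slower than the fsrw raw read figure of 880 kB/s.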
Comments
I've never used "readblock" before, so maybe there's some trick to using that...
I think FSRW 2.6 does read ahead buffering, so maybe that's getting messed up somehow...
BTW: Last I heard FSRW is many times faster than Kye's version.
What I think is happening, and you can verify this by separating the write benchmark from the read benchmark, is:
You write a block to the card.
The card stores this block in a single buffer.
You read from the card, so the card first writes the buffer to its destination location so it can reuse the buffer for the read operation.
The card reads and returns the block you requested.
The problem is that your benchmark isn't flushing the buffer before taking the write time measurement, so this buffer write overhead is being accounted for in the read time.
Rewrite the benchmark to write a 1MB block of data, then read a 1MB block of data, then you will see the real write and read speeds.
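The hypothesis above can be illustrated with a toy timing model. This is a sketch in Python, not a model of any real SD card or of the FSRW driver: the `ToyCard` class and all three timing constants are hypothetical, chosen only to show how deferred write work can end up inside the read measurement when the two are interleaved, and how a flush before stopping the write timer accounts for it.

```python
class ToyCard:
    """Toy device that finishes each write in the background (hypothetical timings)."""

    def __init__(self, ack_us=200, commit_us=100, read_us=500):
        self.ack_us = ack_us        # time to hand one block to the device
        self.commit_us = commit_us  # background time to finish that write
        self.read_us = read_us      # time to read one block
        self.now = 0                # simulated clock in microseconds
        self.busy_until = 0         # when the background write completes

    def _wait_idle(self):
        # Any new operation must wait for the pending background write.
        self.now = max(self.now, self.busy_until)

    def write(self):
        start = self.now
        self._wait_idle()
        self.now += self.ack_us
        self.busy_until = self.now + self.commit_us
        return self.now - start

    def read(self):
        start = self.now
        self._wait_idle()
        self.now += self.read_us
        return self.now - start

    def flush(self):
        start = self.now
        self._wait_idle()
        return self.now - start


def interleaved(n):
    """Write one block, read one block, n times; returns (avg write us, avg read us)."""
    card = ToyCard()
    w = r = 0
    for _ in range(n):
        w += card.write()
        r += card.read()
    return w / n, r / n


def separated(n):
    """n writes, then a flush charged to the writes, then n reads."""
    card = ToyCard()
    w = sum(card.write() for _ in range(n)) + card.flush()
    r = sum(card.read() for _ in range(n))
    return w / n, r / n


print("interleaved:", interleaved(1000))
print("separated:  ", separated(1000))
```

With these made-up constants the interleaved run reports 200 us per write and 600 us per read, while the separated run reports 300 us and 500 us: the 100 us of deferred write work moves from the read column back to the write column. Whether a real card or driver behaves this way is exactly what the separated benchmark is meant to test.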
1 - quick SPI code
2 - multi-block mode: SDHC cards hate you when you use single-block mode, which is the default for most (all?) other microcontroller SD interface routines. Multi-block is much faster, but it assumes that you will read a bunch of (consecutive) blocks in a row, and ditto when writing.
3a - read-ahead (I just read in block N, and while you're busy with that info, I'm just going to grab block N+1 and have it ready for you, since that's what you are most likely to request next)
3b - write behind (you told me to write to block N...I secretly copied your data into a cog buffer, and will now stream it out, while telling you "it's OK, carry on with what you're doing...I have a copy of the data now". A flush or close or switch from write to read makes sure the card is in sync)
So, as you suspected, the best way to abuse FSRW is to read a block, write a block [8^). Even using 2 blocks won't help very much. You basically want to use as large a buffer as you can afford. I wrote some code for a fast large circular buffer on SDHC, which sounds similar to what you're doing, but I think I was using an 8KB buffer (so 16 blocks of 512-bytes each).
Jonathan
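To see why alternating read and write defeats multi-block mode, one can count how often a sequential run has to be restarted for different access patterns. The sketch below is in Python and is only an illustration: the rule that a run continues only when the next operation has the same direction and the next consecutive block number is an assumption about how a multi-block driver behaves, not a description of FSRW's internals, and `count_restarts` and `pattern` are hypothetical helpers.

```python
def count_restarts(ops):
    """Count how many times a multi-block run must be (re)started.

    ops is a list of (kind, block) tuples with kind "W" or "R".
    Assumption: a run continues only if the kind is unchanged and the
    block number is exactly one more than the previous block.
    """
    restarts = 0
    prev = None
    for kind, block in ops:
        continues = prev is not None and prev[0] == kind and block == prev[1] + 1
        if not continues:
            restarts += 1
        prev = (kind, block)
    return restarts


def pattern(writes_per_read, total_writes, write_base=0, read_base=100_000):
    """Sequential writes with one sequential read inserted after every `writes_per_read` writes."""
    ops = []
    w, r = write_base, read_base
    for i in range(total_writes):
        ops.append(("W", w))
        w += 1
        if (i + 1) % writes_per_read == 0:
            ops.append(("R", r))
            r += 1
    return ops


for ratio in (1, 4, 16):
    ops = pattern(ratio, 16)
    print(f"{ratio} write(s) per read: {len(ops)} ops, {count_restarts(ops)} restarts")
```

Under this assumption, a strict write-one, read-one pattern restarts on every single operation, while batching writes between reads cuts the number of restarts sharply. That is the motivation for using as large a buffer as can be afforded.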
I'll admit that I don't know the details of the workings of flash memory cards. But my read tests (and those of fsrw) all show reads from the card taking longer than writes to the card. In that sense my tests agree with fsrw. But they differ in the ratio.
I'll have a look at doing large blocks of write then read just to see what happens. Unfortunately, in my application, high speeds in this test aren't very helpful. I won't have 1MB to read/write. I'm doing the buffering in the prop and I'm concerned about setting aside more than the 2048 bytes I already have.
Rayman, you are right that this version of fsrw does some kind of read ahead. I'll look more carefully at that. But what still confuses me is why my read is so much slower than the fsrw test's. It's the same card, same platform. I don't even eject the card between tests. F10 my program; F10 fsrw_test.
I'll try what pedward suggested to see if the problem is the interleaving of the operations and flushing but I built this following fsrw test. I'd suggest that my timing code is incorrect, but I've put in some pin toggles that allow me to track things with a scope. The scope agrees with the spin calculated times.
Peter
Thanks for responding. It was never my intention to abuse FSRW. It's such a durable and sturdy piece of code I figured it would be more than up to the task. I'm in the testing phase so I'm interleaving reads and writes. In the final application it's likely that they won't be one:one. More likely several writes in a row followed by a single read, then several more writes. What I'm trying to find is the buffer size that will guarantee that I can write an output buffer to the card and read some data back from the card to an input buffer (these 2 buffers need not be the same size) before my output buffer fills again. It's being filled by another process.
Since I'm just testing I can grow the buffers to fill the remaining space, then shrink them stepwise to find the min size that works. I'm afraid if they get much larger than 1KByte each I'll have buffered myself right out of the memory required to do the rest of what I need to. Since that code isn't written yet it's hard to say.
Should I look into disabling read ahead/write behind? The write speeds are great so maybe I can just get rid of read ahead. Do you have any idea how much this read ahead/write behind speed things up when used appropriately? I can probably switch this to circular buffers which would then give me a bit of extra time, but I'm very concerned about the worst case scenario.
Thanks,
Peter
It would probably be relatively simple to add a flag that specifies "don't read-ahead" to the block layer. Are you going to be using the FAT code in FSRW, or just the block driver to get a big flash scratchpad? If you are only writing a single block at a time right now, then you might be getting a false speed result due to the write-behind code (try timing a writeblock and release, though release has a little bit of extra overhead). Sorry, I haven't looked at your code yet.
Jonathan
At the moment I'm planning on using only the block layer code. I don't think I have enough RAM or speed to use the FAT layer.
What I'm trying to do is record incoming data (30 seconds of 40 kHz 32-bit data followed by several minutes of no incoming data) to the SD card for long-term storage. I also want to send the data back over a wireless link which I'm sure won't always be able to keep up. Rather than add a >5Mb buffer chip I was hoping to feed the wireless link with data I'd already written to the SD card. I can't buffer all 30 seconds in the prop and I'd rather not add a buffer chip (too many pins; besides I'm already storing the data on the SD card). I was hoping the incoming data could get streamed out to the SD card. I think this part works. I'd then simultaneously, albeit at a different rate, send the collected data out over wireless. So I need to write to the card at one speed and read from it at a different rate, but those things have to happen before the buffer collecting data bound for the SD card fills. The data for the wireless link are lower priority than the data bound for long-term storage. But I need to be sure that if I start with an empty outgoing buffer, I can read the wireless data from the SD card before the output buffer fills and needs to be written.
Make sense? Sound possible?
Peter
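The recording rate described above fixes the time budget per block. A short worked calculation in Python, using only the figures given in the post (40 kHz, 32-bit samples, 30-second bursts) plus the 512-byte block size; the variable names are illustrative:

```python
SAMPLE_RATE_HZ = 40_000   # 40 kHz sampling
BYTES_PER_SAMPLE = 4      # 32-bit samples
BURST_SECONDS = 30        # length of one recording burst
BLOCK_BYTES = 512         # SD block size

bytes_per_second = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE      # 160_000 bytes/s
block_fill_ms = BLOCK_BYTES / bytes_per_second * 1000      # about 3.2 ms per block
burst_bytes = bytes_per_second * BURST_SECONDS             # 4_800_000 bytes per burst
burst_blocks = burst_bytes // BLOCK_BYTES                  # 9375 blocks per burst

print(f"{bytes_per_second} B/s, one block every {block_fill_ms:.1f} ms")
print(f"{burst_bytes} bytes ({burst_blocks} blocks) per {BURST_SECONDS} s burst")
```

So every write, plus any read squeezed in between, has to fit into roughly 3 ms per 512-byte block unless more buffering is available, and one burst is about 4.8 MB, far more than the prop can hold in RAM.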
- If the streaming back out is at a slower rate, I'm not sure why it has to be simultaneous, as it's going to be out of sync eventually...you could just start the stream after recording 30s
- If the slower data is just a check value, could you maybe store a low res, low sampling rate version instead? (like 8bit, 5ksps), or even a cheap compression like ADPCM.
- Does the slower data need to be streamed back at a constant rate? If spotty is OK, you can just have a single 512-byte buffer for the read-back, and all the rest of your free space would be in a circular buffer for the recording cog to fill (note: whether the buffer is circular or straight, the overrun will happen at the same point...once the buffer capacity is full). Then, read, start transmit, switch back to write mode can all happen while the buffer is filled. Once you're done recording, you can simply skip the writing, and do a continuous read till all data is transmitted.
Jonathan
I agree that you should get the datasheet for the brand of card you are using and find out if there is a command to disable all caching. A badly synchronized write/read combo will cause the caching algorithms to thrash the buffer and not work the way you want them to.
If you could also explain a little more about what you are trying to achieve, it would help. Right now I understand you want to write out 1 sector every 3ms, but what is the read immediately after that for?
Another thing you should possibly consider is the high speed mode that the SD cards implement. If you go look at the Parallax microSD card adapter, there is a 4-bit parallel interface instead of an SPI interface. This is most likely how they read/write 20MB/s on these cards.
Jonathan
I appreciate your comments. A couple posts back I tried to explain my goal. But the basic version is I want to capture data coming into the prop at 160Kbytes/second. This happens for 30 seconds and then stops for several minutes. One of my requirements is to get those data written to the SD card.
A second requirement is to pass those same data out via a wireless link. The stability and throughput of that connection cannot be guaranteed. As Jonathan pointed out, one solution would be to stream all the data to the SD card and when the incoming data are finished, return to the beginning of the file and start sending the pre-recorded data out wirelessly. It's a simple solution but not desirable. Why not use those 30 seconds to get some (maybe all) the data out over the wireless?
That's all I want to accomplish; anything else I might add to this description is HOW I'm trying to do it not WHAT I'm trying to do. So without prejudicing you further with my attempts, I'm open to suggestions.
With regard to your throughput suggestions, I'm not interested in the best-case scenarios because what happens to my system when the worst case shows up? Although the 1000 blocks of continuous read/write will give me a better handle on what the SD card can do, it doesn't help my particular situation. There's no way I can buffer 512_000 bytes in the prop. But I think my approach and what FSRW attempts to do for me are at odds. And that explains why my results are so different from the FSRW test routines.
Jonathan knows the FSRW code (he's one of the authors you know) so when he says 4-bit mode gives only a modest improvement I believe him. I might be right on the edge where that will matter, but I think there's a way to work with what's already available. Besides, I thought the 4-bit mode was proprietary. I'd love to know for sure. Do you know what its status is?
I did my read and write test separately:
This gives an average write of about 1900Kbytes/sec and read of 1100Kbytes/sec. These are in line with the fsrw test. Clearly I'm using the system very differently than it was intended/written in FSRW.
p
I ran my code again, this time calling block.release after I'd read or written 1024 bytes (2x512 blocks). Doing 10000 512-byte blocks (5.12Mbytes) reads, I get 481Kbytes/sec. The writes are about the same at 400Kbytes/sec. These were run separately (all the reads then all the writes).
p
Overall it resulted in an average write speed somewhere around 1-2 milliseconds per sector, if I remember correctly, but that 32nd sector delay could be a problem if you need fast consistent response but aren't using some kind of hub RAM buffering.
And I have no free time...
Can you clarify what you mean by sector? I assume you are not referring to the 512-byte blocks that fsrw writes but to the sectors as formatted? I'll have to run further tests on just the write side of things to be sure I don't have a problem there.
Kye-
I have no doubt you could write something that would solve this, although as you point out, it wouldn't leave enough memory to do anything else ; (
I still believe FSRW can do what I want if I size the buffers correctly and remove the read ahead stuff. That's my story and I'm sticking to it.... I've been working on other things all morning with no chance to dig into the FSRW code. Hopefully a couple hours from now I'll finally have time to do that. I'll post what I find.
p
Jonathan
I made the changes you recommended. My results, using the original code in the first post, are below. It improved (slightly) the total read time, but the max read time went up considerably. The operations are:
Results
I ran this several times and the results don't change much; in fact the total read time rose slightly in subsequent tests. I even switched it back and things don't seem to change much. So maybe I'm not doing it correctly (new code attached).
Peter
strippedVersion - Archive [Date 2011.12.21 Time 13.58].zip
In this scenario, I would probably change the code to have 2 calls: readblock, and readblock_noreadahead. You could then call the appropriate version. This will be a bit more complex, I'll try to get you something tonight.
Jonathan
These tests were for a logger that didn't use any FAT functionality at all; they just passed the SD block address and pointer to a hub block buffer and the command to "write" to the SD write functionality, then waited for the write to complete.
I thought the four-bit parallel SD mode required a checksum for each data pin. Has anyone actually done this with a Propeller?
Do you see that as the worst possible way to interact with the SD card? If that's the case, it seems removing both the read ahead and the write behind would be in order. Sure, sometimes I'll actually write a few blocks to the SD card without interruption, and when all 30 seconds of data have been captured I'll be doing nothing but reading, but if the worst case can be solved, it will work for all others. I understand it could be slower overall but I'm worried about the max time to write/read one block.
If you have time to look at this great, but I'm not sure I see a pressing need for 2 read routines. I'm going to hunt around to see if I can remove the write behind code; maybe as someone suggested earlier, my reads look like the culprits because they are charged with time required to get cleaned up after a write. I just flipped my read/write order and there's still no change in the times.
I'm packing up the family tonight and will be on the road all day tomorrow. I'm planning to take my prop stuff with me. I won't have a scope, but I've been doing most of my measurements with the system counter anyway.
Jonathan
I don't know anything about the 4-bit parallel SD mode checksum requirements. Sorry.
Yes, in fact I think that's the safest. I promise to write, read, write. If I stick with 512 bytes, my buffer fills in 3 milliseconds.
write 4 blocks
read 1,
write 4,
etc.
Matching the input and output ratio would at least let the writing benefit from staying in write mode longer, even though the read mode will still suffer from having to change to read every single block (at least until all writing is done).
With a circular input buffer, you can have it be larger than the read size (say 2x larger). The read buffer only ever needs to be a single block.
Jonathan
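How large the circular buffer has to be depends mostly on the slowest single read. The sketch below, in Python, is a rough estimate under stated assumptions rather than a measurement: the producer adds one 512-byte block every 3000 us, every write takes the worst-case 638 us and every read the worst-case 20707 us from the first log, and the consumer writes whenever a block is waiting and reads only when the buffer is empty. The `peak_backlog` helper is hypothetical.

```python
def peak_backlog(fill_us, write_us, read_us, duration_us):
    """Peak number of blocks waiting in the buffer over a simulated run.

    Policy (an assumption): write one block whenever any are waiting,
    otherwise do one read. All times are integer microseconds.
    """
    now = 0
    written = 0   # blocks already written to the card
    peak = 0
    while now < duration_us:
        backlog = now // fill_us - written
        now += write_us if backlog > 0 else read_us
        # Occupancy just before the operation completes; a block being
        # written still occupies the buffer until the write is done.
        peak = max(peak, now // fill_us - written)
        if backlog > 0:
            written += 1
    return peak


blocks = peak_backlog(fill_us=3000, write_us=638, read_us=20707, duration_us=1_000_000)
print(f"peak backlog: {blocks} blocks ({blocks * 512} bytes)")
```

Under these pessimistic assumptions the backlog peaks at roughly `read_us / fill_us` blocks, about 7 with these figures, which is more than the 2048 bytes currently set aside. Shortening the worst-case read, or being able to abandon a read, would shrink the requirement accordingly.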
Another possibility would be a non-blocking read. If I can watch the elapsed time and I get close to the time it takes to fill a buffer, I simply abandon ship and quit. I have no idea what that does to the SD card and I can see that slowing the read down even further, but the amount of data I get through the wireless link is secondary to getting all the ADC data safely to the SD card.
I think you're right about the circular input buffer. If it's large enough to handle the longest possible read, I should be fine. I simply won't read until I've written everything in my output buffer. I'll have to do more tinkering to find out how long the longest possible single block read is. I assume the best way to handle this is to keep the read ahead stuff commented out.
pb
To put it another way, does the following code get the actual time it takes to complete one read and be ready to do something other than another read?
Jonathan