@Rayman said:
Surprised you're still working on this. Seems totally stable and blazing fast already...
Reads are good all round for the most part. But I want to get the writes with small buffer also performing better. This'll allow simple rapid data store from large external PSRAM without consuming all of hubRAM in the process.
EDIT: An alternative approach to caching the single writes would be to make the filesystem layers, and the driver, aware of PSRAM address space and handle concurrent access with the PSRAM driver so that a small buffer in hubRAM can be used to transfer large bursts to/from a block device without returning to the application level code. This would then eliminate many of the single block writes that are occurring.
@evanh said:
EDIT: I guess that's technically a cache, within the driver, of those single blocks. It'll need to identify matching block numbers, coming from the filesystem layers, for readback and overwrite.
That cache should really just happen in the FS layer (as an option). Maybe it already exists, FatFS has lots of options.
@Rayman said:
Surprised you're still working on this. Seems totally stable and blazing fast already...
Reads are good all round for the most part. But I want to get the writes with small buffer also performing better. This'll allow simple rapid data store from large external PSRAM without consuming all of hubRAM in the process.
EDIT: An alternative approach to caching the single writes would be to make the filesystem layers, and the driver, aware of PSRAM address space and handle concurrent access with the PSRAM driver so that a small buffer in hubRAM can be used to transfer large bursts to/from a block device without returning to the application level code. This would then eliminate many of the single block writes that are occurring.
Not 100% sure of that idea. It's probably simpler and more flexible to have the application control where the streaming data is coming from and arrange transfers in/out of PSRAM on its own rather than 100% rely on the filesystem to be aware of all this. If external memory is used, PSRAM access requests can always be overlapped with application work which is a great way to keep COGs active and working in parallel. It doesn't have to block so you could have the application COG transferring something from SD while the PSRAM driver is saving/retrieving the next block for example. Now putting in some hooks that allow preparation of next sectors using ioctls and/or callbacks from the filesystem might be useful however to keep data moving. Will be interesting to see how things evolve there.
Roger, none of that will help. The existing way that FATFS works is it performs a single block write in between each buffer passed to it for writing. If the buffer size is small, for writing out a large file, then there is a ton of single writes that clog up the SD card with lots of busy delays. Now I think about it, an ioctl() sync is performed too. So that would mess with my plans for hiding a cache in the driver part.
Ada's got a point. I've not looked for compile options of this type that might already exist in the other FATFS files.
There is no problem for writing out from hubRAM only. The buffer size is the whole content to be written. Not to mention speed is not needed with less than a mere 500 kB. But for gigabytes of write data it would be a big plus to be able to dump it as fast as the SD card can go. I fully expect added PSRAM to be integral to making this happen.
@evanh said:
Roger, none of that will help. The existing way that FATFS works is it performs a single block write in between each buffer passed to it for writing. If the buffer size is small, for writing out a large file, then there is a ton of single writes that clog up the SD card with lots of busy delays. Now I think about it, an ioctl() sync is performed too. So that would mess with my plans for hiding a cache in the driver part.
Ok if the only way that it can do large multi-sector bursts is if you pass it an enormous buffer, then there would need to be a way to abstract source reads from special locations to make it think it's reading from a larger space, while something underneath is wrapping and filling the memory behind the scenes - hence your PSRAM suggestion. Now I see what you are getting at.
@Wuerfel_21 said:
That cache should really just happen in the FS layer (as an option). Maybe it already exists, FatFS has lots of options.
Looking in ffconf.h, I'm not seeing anything of this feature type listed.
I just checked myself. There's no config, rather it should always buffer writes to the FAT. If there's a full sync coming through from the VFS layer though, that will also flush the FAT buffer to disk (see sync_window and move_window /their call sites in ff.c for the handling of the FAT buffer).
The driver is in limbo for the moment, the CQ+Cache approach was fun until it failed. I'm sort of waiting on Eric's changes now. I guess I should have a look at the latest master to see how big the changes are. Ada reported it as broken so I didn't even look.
Or I could circle back to porting my driver to Spin2 for FSRW, I've not looked at that yet either ...
PS: Not to mention urgency is still rather low. You're only the third person with the 4-bit configured hardware. Although I suppose that could change quickly if Parallax suddenly put an accessory for it in their shop.
I kept getting surprise after surprise in having to find alternative solutions. Something as simple as the MOVBYTS instruction is available as a built-in function in FlexC but is non-existent in Spin as far as I can see.
Finding a solution for inline assembly with both locals and pre-settable parameters had me making test programs. I ended up with the equivalent of an earlier version of what I'd done in C. This resulted in a revisit of a cloud of warnings from Flexspin. But I think now that those warnings are actually a bug in the compiler. Waiting on Eric for confirmation.
MOVBYTS is the endian swapper instruction. Gets used a lot when two devices are opposing endianness. So, no, bytemove() is just a memory copy function.
Okay, yeah, I wasn't in any rush to try that myself. It's a complete duplication of files, with a couple patched for loading my driver, so not entirely surprising there is conflict.
PS: I know very little about the fatfs layers that load my driver.
@evanh said:
Finding a solution for inline assembly with both locals and pre-settable parameters had me making test programs. I ended up with the equivalent of an earlier version of what I'd done in C. This resulted in a revisit of a cloud of warnings from Flexspin. But I think now that those warnings are actually a bug in the compiler. Waiting on Eric for confirmation.
Is this related to the test_setqrdlong2.spin2 program you posted in the flexspin thread? If so the warnings are just that, warnings, and not a problem (and they should be gone now in the most recent github sources).
Good reason to leave that debug enabled. That'll say why it failed.
PS: I did occasionally get a badly seated card. It hasn't happened in a while, but then I'm not swapping cards at anything like the rate I was. That's what drove me to ensure CRC got fully implemented. The DAT pins operate relatively independently of the CMD.
Comments
@evanh Surprised you're still working on this. Seems totally stable and blazing fast already...
Guess you're exploring all the possibilities.
The giant file size is nice. Could be useful for movies...
Reads are good all round for the most part. But I want to get the writes with small buffer also performing better. This'll allow simple rapid data store from large external PSRAM without consuming all of hubRAM in the process.
EDIT: An alternative approach to caching the single writes would be to make the filesystem layers, and the driver, aware of PSRAM address space and handle concurrent access with the PSRAM driver so that a small buffer in hubRAM can be used to transfer large bursts to/from a block device without returning to the application level code. This would then eliminate many of the single block writes that are occurring.
That cache should really just happen in the FS layer (as an option). Maybe it already exists, FatFS has lots of options.
Can one just assume sequential blocks for writing like one can for reading on freshly formatted card?
Not 100% sure of that idea. It's probably simpler and more flexible to have the application control where the streaming data is coming from and arrange transfers in/out of PSRAM on its own rather than 100% rely on the filesystem to be aware of all this. If external memory is used, PSRAM access requests can always be overlapped with application work which is a great way to keep COGs active and working in parallel. It doesn't have to block so you could have the application COG transferring something from SD while the PSRAM driver is saving/retrieving the next block for example. Now putting in some hooks that allow preparation of next sectors using ioctls and/or callbacks from the filesystem might be useful however to keep data moving. Will be interesting to see how things evolve there.
Roger, none of that will help. The existing way that FATFS works is it performs a single block write in between each buffer passed to it for writing. If the buffer size is small, for writing out a large file, then there is a ton of single writes that clog up the SD card with lots of busy delays. Now I think about it, an ioctl() sync is performed too. So that would mess with my plans for hiding a cache in the driver part.
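To put rough numbers on the point above, here is a minimal sketch in C that counts card commands under the behaviour described in this thread: one extra single-sector write after every buffer handed to the filesystem. It is a model only, not FatFs code. The names `write_counts` and `model_writes` are made up for illustration, and the chunk size is assumed to be a whole number of 512-byte sectors.

```c
#include <assert.h>
#include <stdint.h>

#define SECTOR_BYTES 512u

/* Hypothetical tally of SD write commands, for illustration only. */
typedef struct {
    uint64_t multi_block_bursts;   /* data transfers covering more than one sector */
    uint64_t single_block_writes;  /* one-sector writes, data or bookkeeping */
} write_counts;

/* Count commands for writing total_bytes in chunks of chunk_bytes.
 * Assumption taken from the thread: every chunk is followed by one
 * single-sector write. A short final chunk is counted like a full one. */
write_counts model_writes(uint64_t total_bytes, uint32_t chunk_bytes)
{
    assert(chunk_bytes >= SECTOR_BYTES && chunk_bytes % SECTOR_BYTES == 0);

    write_counts c = {0, 0};
    uint64_t chunks = (total_bytes + chunk_bytes - 1) / chunk_bytes;

    if (chunk_bytes > SECTOR_BYTES)
        c.multi_block_bursts = chunks;   /* each chunk is one multi-sector burst */
    else
        c.single_block_writes = chunks;  /* each chunk is itself a single write */

    c.single_block_writes += chunks;     /* the extra write after every chunk */
    return c;
}
```

In this model a 1 MiB file written in 4 KiB chunks costs 256 bursts plus 256 single writes, while 64 KiB chunks cost 16 of each. The single writes scale with the number of buffers, not with the file size, which is why a small buffer hurts so much.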
Ada's got a point. I've not looked for compile options of this type that might already exist in the other FATFS files.
There is no problem for writing out from hubRAM only. The buffer size is the whole content to be written. Not to mention speed is not needed with less than a mere 500 kB. But for gigabytes of write data it would be a big plus to be able to dump it as fast as the SD card can go. I fully expect added PSRAM to be integral to making this happen.
Ok if the only way that it can do large multi-sector bursts is if you pass it an enormous buffer, then there would need to be a way to abstract source reads from special locations to make it think it's reading from a larger space, while something underneath is wrapping and filling the memory behind the scenes - hence your PSRAM suggestion. Now I see what you are getting at.
Looking in ffconf.h, I'm not seeing anything of this feature type listed.
I just checked myself. There's no config, rather it should always buffer writes to the FAT. If there's a full sync coming through from the VFS layer though, that will also flush the FAT buffer to disk (see sync_window and move_window / their call sites in ff.c for the handling of the FAT buffer).
The superfluous flushing probably has to do with that libc-level buffer logic... (that, as previously stated, belongs in a burning trash compactor).
Also, there is a config mildly related to this: if TINY mode is enabled, that same window buffer is re-used for partial file sectors.
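For anyone not familiar with the window buffer idea, a rough sketch in C of a one-sector write-back window follows. It is not the ff.c implementation. The `window` type, the function names and the tiny in-memory `disk` array are all hypothetical, and only the general technique is shown: edits collect in one buffered sector, and a disk write happens only when the window moves to another sector or a sync is forced.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SECTOR_BYTES 512u
#define DISK_SECTORS 8u

/* Stand-in for the block device, plus a counter of sector writes. */
static uint8_t  disk[DISK_SECTORS][SECTOR_BYTES];
static unsigned disk_writes;

/* Hypothetical one-sector window, loosely modelled on the idea in the thread. */
typedef struct {
    uint8_t  buf[SECTOR_BYTES];
    uint32_t sector;  /* which sector buf currently holds */
    int      valid;   /* buf holds a real sector */
    int      dirty;   /* buf has changes not yet on disk */
} window;

/* Write the window back only if it has pending changes. */
void window_sync(window *w)
{
    if (w->valid && w->dirty) {
        memcpy(disk[w->sector], w->buf, SECTOR_BYTES);
        disk_writes++;
        w->dirty = 0;
    }
}

/* Point the window at another sector, flushing the old one first. */
void window_move(window *w, uint32_t sector)
{
    assert(sector < DISK_SECTORS);
    if (w->valid && w->sector == sector)
        return;                       /* already there, no disk traffic */
    window_sync(w);
    memcpy(w->buf, disk[sector], SECTOR_BYTES);
    w->sector = sector;
    w->valid = 1;
}

/* Change one byte through the window; nothing reaches the disk yet. */
void window_poke(window *w, uint32_t sector, uint32_t offset, uint8_t value)
{
    assert(offset < SECTOR_BYTES);
    window_move(w, sector);
    w->buf[offset] = value;
    w->dirty = 1;
}

unsigned window_disk_writes(void) { return disk_writes; }
```

Several pokes into the same sector cost no disk writes until the window moves or `window_sync` is called. A sync arriving after every application buffer defeats that, which matches the flushing seen above.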
Is the driver part of official flexprop yet?
New "Platform" board works in 4-bit mode:
At first didn't work, but reread this thread and remembered that need to add --fcache=256 to the command line.
The driver is in limbo for the moment, the CQ+Cache approach was fun until it failed. I'm sort of waiting on Eric's changes now. I guess I should have a look at the latest master to see how big the changes are. Ada reported it as broken so I didn't even look.
Or I could circle back to porting my driver to Spin2 for FSRW, I've not looked at that yet either ...
PS: Not to mention urgency is still rather low. You're only the third person with the 4-bit configured hardware. Although I suppose that could change quickly if Parallax suddenly put an accessory for it in their shop.
Ah, okay, so FSRW requires only a few functions from the "sdspi..." driver file. Looks a relatively easy interface ...
Think just needs to read and write blocks…
It's ported! But not tested in any way.
I kept getting surprise after surprise in having to find alternative solutions. Something as simple as the MOVBYTS instruction is available as a built-in function in FlexC but is non-existent in Spin as far as I can see.
Finding a solution for inline assembly with both locals and pre-settable parameters had me making test programs. I ended up with the equivalent of an earlier version of what I'd done in C. This resulted in a revisit of a cloud of warnings from Flexspin. But I think now that those warnings are actually a bug in the compiler. Waiting on Eric for confirmation.
Bytemove()?
MOVBYTS is the endian swapper instruction. Gets used a lot when two devices are opposing endianness. So, no, bytemove() is just a memory copy function.
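To show the difference, here is a portable C sketch of what an endian swap does, as opposed to a plain copy. It does not use MOVBYTS or any FlexC built-in, and the helper names `swap_endian32` and `load_be32` are made up for illustration. The second helper is the kind of thing needed when a device delivers a 32-bit field most significant byte first.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse the byte order of a 32-bit value: 0x11223344 becomes 0x44332211. */
uint32_t swap_endian32(uint32_t x)
{
    return (x >> 24)
         | ((x >> 8) & 0x0000FF00u)
         | ((x << 8) & 0x00FF0000u)
         | (x << 24);
}

/* Assemble a 32-bit value from four bytes stored most significant first.
 * Works the same on any host, whatever its native byte order. */
uint32_t load_be32(const uint8_t *p)
{
    assert(p != NULL);
    return ((uint32_t)p[0] << 24)
         | ((uint32_t)p[1] << 16)
         | ((uint32_t)p[2] << 8)
         |  (uint32_t)p[3];
}
```

A memory copy such as bytemove() keeps the bytes in their original order, so it can't stand in for either helper.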
Just spent too much time figuring out that if your code includes both "_vfs_open_sdcardx" and "_vfs_open_sdsdcard", things go horribly wrong...
Okay, yeah, I wasn't in any rush to try that myself. It's a complete duplication of files, with a couple patched for loading my driver, so not entirely surprising there is conflict.
PS: I know very little about the fatfs layers that load my driver.
Also, is there a switch to turn off all the diagnostic output? Seems might not want that at some point...
The debug #defines under "Compiler options" at the top of the driver file. Comment them out with // in front.
Is this related to the test_setqrdlong2.spin2 program you posted in the flexspin thread? If so the warnings are just that, warnings, and not a problem (and they should be gone now in the most recent github sources).
All cool, warnings are all gone now. Yeah, I'd jumped to conclusions thinking the debug oddity was related to the warnings.
I’ve seen a couple times where it fails to mount.
@evanh you ever see this?
Maybe it should try power cycling more than once?
Good reason to leave that debug enabled. That'll say why it failed.
PS: I did occasionally get a badly seated card. It hasn't happened in a while, but then I'm not swapping cards at anything like the rate I was. That's what drove me to ensure CRC got fully implemented. The DAT pins operate relatively independently of the CMD.
Ok I’ll try to document if see it again..
Got some boards in. Should work as copied directly from latest P2 board.
Silkscreen is a bit unfortunate though...
When get time, will probably put some up on Ebay.
Right, LOL, the Rev 1A.