Flash file driver for P2
Mike Green
Attached is a P2 version of the flash filesystem driver for the P1 previously posted.
This is a fairly simple file system. There are no subdirectories. There is currently no support for executing programs. There's no update in place or append, but you can name files, create them, delete them, write to them, read from them. There's some documentation in the form of comments in the source file.
Comments
Thanks Mike! Caught you on Zoom.
The main thing to be careful of is that, if you are booting from flash memory, the program itself resides in flash memory. The program's loader / flasher is stored at the very beginning of the flash memory with the program itself following that. Everything beyond the end of your program normally is managed by this driver. P2ES_flashloader.bin takes about 32K. TestFlash.binary takes about 90K. The flash filesystem then begins at 128*1024, set by calling flash.setBase(131072). If it's possible that the rest of flash memory may have data in it, it can be erased by calling flash.eraseData for all multiples of 4096 from 131072 to the end of the flash (usually 16MB).
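As a rough check of the layout described above, here is a minimal Python sketch (not the driver itself) that adds up the loader and program sizes and lists the sector addresses an erase pass would visit. The 16 MB flash size and the names `FLASH_SIZE`, `FS_BASE`, `SECTOR` and `sectors_to_erase` are assumptions made for illustration; on the P2 each address would be passed to flash.eraseData.

```python
# Sketch only: sizes are the approximate figures quoted in the post.
LOADER_SIZE = 32 * 1024          # P2ES_flashloader.bin, about 32K
PROGRAM_SIZE = 90 * 1024         # TestFlash.binary, about 90K
FS_BASE = 128 * 1024             # value passed to flash.setBase (131072)
SECTOR = 4096                    # erase granularity used in the post
FLASH_SIZE = 16 * 1024 * 1024    # assumed 16 MB part

def sectors_to_erase(base=FS_BASE, end=FLASH_SIZE, step=SECTOR):
    """Return every sector-aligned address from base up to (not including) end."""
    if base % step or end % step:
        raise ValueError("base and end must be multiples of the sector size")
    return range(base, end, step)

# The loader plus the program must fit below the filesystem base.
fits = LOADER_SIZE + PROGRAM_SIZE <= FS_BASE

addrs = sectors_to_erase()
print(fits, len(addrs), hex(addrs[0]), hex(addrs[-1]))
```

With these figures the boot image uses about 122K of the 128K reserved, and a full erase of the filesystem area touches 4064 sectors.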
This might be something handy to map onto my memory driver for providing simple read/write temporary filesystems using external RAM (Hyper/PSRAM/SRAM), plus it should get pretty decent transfer performance from those devices too vs SPI flash. Applications like audio streaming code, or those that require temporarily generated files such as those on-board build tools might create could then benefit from faster read performance out of this filesystem...
I'll take a look when I can to see what might be needed to do this. It should be possible to map onto real HyperFlash too, by calling the appropriate erase functions instead of going directly over SPI and assuming Winbond. Ideally there would be a volume handle of some sort so that multiple filesystems could exist in parallel from the same driver. Maybe that could be done using an array index to multiple instances of this OBJ from the callers...
I just watched the recent live forum video showing @ersmith 's latest FlexProp toolchain and noticed the questions about the mount command and vfs support in FlexProp, which is already supporting mounting SD and host file systems. I also wonder how I can get my driver code called from this layer to enable more virtual file systems on the P2?
I know @Wuerfel_21 has recently built an external tool to load up PSRAM directly but it would be nice to have proper support built into the P2 software environment to read/write data and files this way with RAM or other flash file systems using that shell utility Eric showed or similar methods. I have a number of memory types now supported (Hyper, PSRAM, and SRAM) and it would be great to support these to enable RAM based filesystems (for either temporary runtime use or for applications such as those needing larger media or game files, sound files etc), and also HyperFlash for persistent storage.
Unlike what Mike Green has above for the SPI flash I do not have any filesystem layer in my own memory driver code, it just has the primitive read/write/erase (HyperFlash) type functions, and also needs its own DMA driver COG which uses a mailbox interface accessible from my driver's SPIN2 APIs. If we had some shared filesystem layer that sits on top of it then we could overlay FAT/FAT32 or some other filesystem like Mike uses on top of it and make use of the different underlying hardware. Now that the P2 Edge with 32 MB RAM is around it would be great to try to support it somehow and open the door for future file systems/storage devices as well.
What would we need to do for something like this Eric? Is there some (hopefully) simple interface that could get defined that we could then support which would then "just work" with your code and we could mount other storage devices just as easily as your existing SD and host filesystems? How could it be layered so we don't need to duplicate the same filesystem code in different drivers in order to keep it simple and small if that filesystem type is already present and working elsewhere.
Mike,
Coincidentally I've just put in some time to understand SPI clocking options and how to get the Prop2 to perform accordingly, I figured I could poke my head in here and offer some help on fine tuning low-level SPI workings .... having a squiz ... looks like everything relevant sits in the sendRecv() method. Tidy.
One thing that has immediately stood out is CSpin is being floated upon completion. I would think it should be driven high to ensure the SPI chip stays disabled while not in use. I'd also leave CLKpin driven too.
Bed time for me now.
Evanh,
This P2 driver is essentially a translation of the P1 version. I believe the reason for floating all 4 I/O pins was that they were shared with other devices. On the P2, the same pins are used for the SD card although the CS and Clk pins are swapped. The SD card driver used for the P1 also floated all 4 pins for the same reason.
In that case, because each is dependent on who goes low first, both CLK and CS should definitely always idle high and be driven. The drivers can easily handle such a hand-over condition.
And with my recent knowledge on clocking arrangements, that implies either SPI mode 2 or mode 3. I noted, as per the Winbond documentation, modes 0 and 3 are generally interchangeable and will be the two most likely used modes. So, mode 3 it is.
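For reference, a small Python sketch of the usual SPI mode numbering (mode = CPOL*2 + CPHA). It shows which modes idle the clock high and why modes 0 and 3 can look the same to the flash chip: both sample data on the rising clock edge. The names `SPI_MODES`, `idles_high` and `samples_on_rising_edge` are made up for illustration.

```python
# Conventional SPI mode numbering: mode -> (CPOL, CPHA)
SPI_MODES = {0: (0, 0), 1: (0, 1), 2: (1, 0), 3: (1, 1)}

def idles_high(mode):
    """Clock rests high between transfers when CPOL is 1."""
    cpol, _ = SPI_MODES[mode]
    return cpol == 1

def samples_on_rising_edge(mode):
    """Data is sampled on the leading edge when CPHA is 0, else the trailing edge.
    The leading edge is rising when CPOL is 0, so sampling is on a rising
    edge exactly when CPOL equals CPHA."""
    cpol, cpha = SPI_MODES[mode]
    return cpol == cpha

idle_high_modes = [m for m in SPI_MODES if idles_high(m)]
rising_modes = [m for m in SPI_MODES if samples_on_rising_edge(m)]
print(idle_high_modes, rising_modes)
```

Of the two rising-edge modes, only mode 3 leaves CLK high while idle, which fits the choice made above.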
Updated CSpin and CLKpin handling:
Thanks @evanh. I'll give that a try.
Is there a ready to use test program I could use the driver with?
I ask because I've crafted another sendRecv() with inline assembly. It's pretty much 100% replaced the spin code, like-for-like hopefully. So a good idea I actually test it on my Eval Board before posting the source code.
EDIT: Got a question too. Or maybe more a this-could-need-fixed type query ... At the moment, the way that result gets filled, when rA equals zero, might not be as intended. It uses a pointer to fill the incremental addresses of result in hubRAM as the bytes arrive. These few SPI bytes will presumably form a single big-endian value. Eg: Coming from a status register. But result is 32-bit little-endian. Which, as is, probably means lots of endian swapping higher up.
I have a version of FemtoBasic modified for the P2 to use as a test platform. It's not quite ready to use to test the flash driver. I'll post it here. It includes floating point and integer arithmetic, single dimension arrays, USING formatting ... most of which works. I've been working on the interface to the flash file system ... which did work earlier, but broke for a variety of reasons.
The issue with rA equals zero is one mostly of documentation. It's intended for I/O of single bytes where there's no endianess involved, but could be used for simple copying of a few bytes without actually looking at what's contained there.
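To make the endianness point concrete, here is a short Python sketch. It assumes three bytes arrive from the device in the order $EF, $70, $18 (an example JEDEC-style ID, chosen for illustration) and are stored at incrementing addresses. Reading that memory back as a little-endian long gives a different number than treating the same bytes as one big-endian value.

```python
# Assumed arrival order of the bytes on the SPI bus (illustrative values).
received = bytes([0xEF, 0x70, 0x18])

# Bytes stored at incrementing addresses, then read back as a little-endian long:
as_stored = int.from_bytes(received, "little")

# The same bytes treated as a single big-endian (first byte most significant) value:
as_big_endian = int.from_bytes(received, "big")

def swap_bytes(value, nbytes):
    """Reverse the byte order of an nbytes-wide value."""
    return int.from_bytes(value.to_bytes(nbytes, "little"), "big")

print(hex(as_stored), hex(as_big_endian), hex(swap_bytes(as_stored, 3)))
```

For single bytes both readings agree, which is why the byte-at-a-time use described above is unaffected.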
Sooo .... Femto is written in Spin? EDIT: Ah, maybe it can import spin objects like Eric's compiler does?
I'd recommend changing behaviour, and documenting it, because it's much more likely to get used for reading of SPI device registers than the bulk memory.
@evanh,
I tried your modified "sendRecv". Although your analysis makes sense, the modified code doesn't work. The initialization code tries to read the JEDEC type code for the device so it can determine the size, and the value read does not match any of the JEDEC codes that it knows. I'll poke around to see if I can figure out what's wrong.
For the use it's put to ... an internal I/O routine in an object supplying a higher level structure (simple named file system without update in place and only simple wear levelling) ... I can't see a need to change the behavior.
Yes, FemtoBasic is written in Spin. It stores Basic programs with the keywords replaced by single byte tokens for space and ease in parsing. Variables are limited to single character names. It's an interpreter, so it can't import other languages. It can read and interpret text files as if they were typed in directly, so it can merge two programs or merge a group of subroutines into a main program. It's really intended for trying out ideas or controlling I/O devices for testing or experimenting purposes where speed is not an issue.
I need to do some testing myself.
The only change is when reading more than 8 bits into result. 8-bit values don't change behaviour.
Okay, looking at start() I see it does a JEDEC query so I've hacked it a little to report the result ... and it works!
Cog0 device = $18_70EF
Update: Oops, the test() method name clashes in Pnut. I use Flexspin normally. Changed the name to sendrecv_test() now.
Update: Scope measurements were incorrect, see two posts further down
Oooh, wow ... found an unexpected surprise with how slow spin is, even when compiled to native. With the above code, in Pnut, it takes about 2630 sysclock ticks (CS low to CS high on the scope) for the JEDEC register read ... but snapshotting time on either side of the sendRecv(JEDEC,3,0) nets me 22112 ticks! Flexspin is faster at 910 and 7654 respectively but still a dramatic ratio between routine run time and call time.
But when using the new inline assembly version instead, the second number drops far more dramatically!
Pnut is 330 ticks and 1136 ticks
Flexspin is 330 ticks and 542 ticks
PS: Binary size from Pnut, for inline assembly version, is 44 bytes larger. The original is larger from Flexspin.
I have an idea why the overhead is so large in the pure Spin code. It'll be due to that one memory reference to result. I found, when I first tried to compile in the assembly code, that none of the locals were in cogRAM. So all those locals which would normally be cogRAM accesses had become RDLONG/WRLONG accesses instead.
To bring cogRAM into play, I had to change the way result got filled. Namely always reference result as a local.
I'd kind of forgotten about it funnily.
Oh, what?, weird! The scope measurements look nothing like previously. Both Pnut and Flexspin. What was 2630 and 910 ticks for the CS low pulse on the scope, respectively, are now 20960 and 7310 ticks instead. The huge ratio is gone in both cases. I must have previously been seeing another sequence somehow. Looks much more sensible now anyway.
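A quick bit of arithmetic on the corrected numbers, as a Python sketch. It assumes the call-time snapshots quoted earlier (22112 ticks for Pnut, 7654 for Flexspin) still apply to the pure Spin version after the scope readings were corrected; the thread does not restate them, so treat that pairing as an assumption.

```python
# (CS-low time on the scope, time measured around the call), in sysclock ticks.
# The second value of each pair is assumed to carry over from the earlier post.
measurements = {
    "Pnut": (20960, 22112),
    "Flexspin": (7310, 7654),
}

for name, (cs_low, call) in measurements.items():
    overhead = call - cs_low      # ticks spent outside the CS-low window
    ratio = call / cs_low         # call time relative to bus time
    print(f"{name}: overhead {overhead} ticks, ratio {ratio:.2f}")

overheads = {name: call - cs_low for name, (cs_low, call) in measurements.items()}
```

Under that assumption the call overhead is on the order of a thousand ticks or less in both compilers, rather than the roughly eightfold ratio the first readings suggested.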
Of note is it's only Flexspin that benefits from the locals optimisation.
Okay, some basic block read/write tests with the inline assembly seem good. Here's the whole thing, including my testing.
To optimise sendRecv() a little better I also packed all the SPI pin numbers into four consecutive bytes, forming one longword. Since only sendRecv() uses them, I figured that liberty was safe enough.
EDIT: Added my newly minted support library. Needed for the testing code.
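The packing idea can be sketched in Python like this. The byte order (CS in the lowest byte, then CLK, DI, DO) and the example pin numbers 61, 60, 59, 58 are assumptions for illustration, not taken from the posted source; `pack_pins` and `unpack_pins` are hypothetical helper names.

```python
def pack_pins(cs, clk, di, do):
    """Pack four pin numbers into one 32-bit value, one pin per byte.
    The byte order here (CS lowest) is an assumption."""
    for pin in (cs, clk, di, do):
        if not 0 <= pin <= 63:
            raise ValueError("P2 pin numbers run from 0 to 63")
    return cs | (clk << 8) | (di << 16) | (do << 24)

def unpack_pins(packed):
    """Recover (cs, clk, di, do) from the packed 32-bit value."""
    return tuple((packed >> shift) & 0xFF for shift in (0, 8, 16, 24))

packed = pack_pins(61, 60, 59, 58)   # assumed example pin assignment
print(hex(packed), unpack_pins(packed))
```

Keeping the four pins in one long means a single read fetches all of them, and each can then be pulled out by byte position.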
Compiling for Ada's nu-code in Flexspin makes Pnut look good.
You got that wrong. NU code is Eric's. Ada did the P1 interpreter/compiler.
Two different animals
Mike
Okay, apologies.
EDIT: Speaking of Prop1, I get an unexplained error from Flexspin when trying to compile the original code here for the Prop1.
I can make it do SPI clock mode 0 now. Takes extra CS toggles though. I'm inclined to split it off and place in start() method.
sendRecv() now starts with this:
Compared to SPI mode 3:
EDIT: I guess splitting it off isn't wise. The idea will be to cooperatively share with SD ops together. Mode 3 is fine. Just throwing in the options.
Attached is the current version of P2 FemtoBASIC including support for a flash filesystem and floating point. I'm in the process of testing it. I can't copy it to flash using FlexProp, but using loadp2 with P2ES_flashloader.bin seems to work. Flow of control seems to work (IF, GOTO, FOR, GOSUB, RETURN). Expressions and their operators seem to work. A floating point constant must have a decimal point "." and an optional exponent. If there are just digits (with maybe a leading sign), it's an integer constant. A binary operator with only integer operands produces an integer result (except for "**" and "//" which are more complicated). Unary operators generally produce the same kind of result as their operand. Obvious exceptions include SIN, COS, ATAN, SQRT, LOG, EXP that always produce a floating point result. Storing a value in a variable also stores the type of value. Dimensioned variables are all of the same type, set by the first value stored in it. Values are converted to the proper type when stored. Statements that expect a value of a particular type force the appropriate conversion.
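The constant and operator typing rules above can be sketched roughly as follows. This is a Python illustration of the rules as described, not FemtoBASIC's actual parser: the regular expressions, the function names, and the choice to return float for mixed operands are all assumptions, and "**" and "//" are deliberately left unmodelled.

```python
import re

# Just digits with an optional leading sign: integer constant.
INT_RE = re.compile(r"[+-]?\d+")
# Must contain a decimal point; the exponent is optional (assumed syntax).
FLOAT_RE = re.compile(r"[+-]?(\d+\.\d*|\.\d+)([eE][+-]?\d+)?")

def classify_constant(text):
    """Return 'integer', 'float', or None if the text fits neither rule."""
    if INT_RE.fullmatch(text):
        return "integer"
    if FLOAT_RE.fullmatch(text):
        return "float"
    return None

def binary_result_type(op, left, right):
    """Result type of a binary operator given its operand types."""
    if op in ("**", "//"):
        return None  # described as more complicated; not modelled here
    if left == "integer" and right == "integer":
        return "integer"
    return "float"   # assumption: any float operand gives a float result

print(classify_constant("42"), classify_constant("3.14"), classify_constant("1.5E3"))
```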
OPEN, CLOSE, SAVE, LOAD, DELETE, FILES seem to work. READ and WRITE are partially tested, but should work given how much code is in common with INPUT and PRINT. The flash I/O includes "evanh"s changes.
Please report bugs here.
Okay that's a bruh moment right there. The file that should be loaded as _platform_ doesn't even have 23 lines.
Wait, no, that's an error from the ASM backend. The relevant file still doesn't have a line 23 or any sign-extends.
You can ditch the initial clkset() and the pile of old associated assembly constants. Spin environment presets the clock mode based on _clkfreq ... which assumes a 20 MHz crystal is fitted, optionals _xtlfreq or _xinfreq can change that assumption.
If you want runtime adjustable clock setting then have a look at pllset() in that small stdlib.spin2 I posted.
Okay, FlexSpin doesn't recognise SETPAT instruction within inlined assembly - which SmartSerial.spin2 is using for its auto-baud detection. I'm guessing Flex don't like anything to do with events. It might be a recently imposed restriction ... nope, gone back to 2020 (Fastspin 4.3.2) and it still don't like it.
@evanh,
I'm using the most recent release of FlexProp / FlexSpin (5.9.8) and it seems to be just fine with FCACHE use of SETPAT instructions. The auto-Baud detection in SmartSerial does work once I can get the program loaded into flash.
Next time I post a newer version, I'll strip out the unnecessary constant definitions.