Flash file driver for P2

Mike Green · 2022-02-02 22:28

Attached is a P2 version of the flash filesystem driver for the P1 previously posted.

This is a fairly simple file system. There are no subdirectories. There is currently no support for executing programs. There's no update in place or append, but you can name files, create them, delete them, write to them, read from them. There's some documentation in the form of comments in the source file.

Publison · 2022-02-02 22:54

Thanks Mike! Caught you on Zoom.

Mike Green · 2022-02-03 00:18

The main thing to be careful of is that, if you are booting from flash memory, the program itself resides in flash memory. The program's loader / flasher is stored at the very beginning of the flash memory with the program itself following that. Everything beyond the end of your program normally is managed by this driver. P2ES_flashloader.bin takes about 32K. TestFlash.binary takes about 90K. The flash filesystem then begins at 128*1024, set by calling flash.setBase(131072). If it's possible that the rest of flash memory may have data in it, it can be erased by calling flash.eraseData for all multiples of 4096 from 131072 to the end of the flash (usually 16MB).
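As a rough sketch of the erase sweep described above, the Python below only enumerates the sector addresses and touches no hardware. The 131072 base, the 4096-byte step and the 16 MB size come from the post; the name `erase_addresses` is made up for illustration. On the P2, each address would be passed to flash.eraseData.

```python
FLASH_BASE = 128 * 1024          # 131072, the value passed to flash.setBase
FLASH_SIZE = 16 * 1024 * 1024    # assumed 16 MB part, as mentioned in the post
SECTOR = 4096                    # erase step used in the post

def erase_addresses(base=FLASH_BASE, size=FLASH_SIZE, step=SECTOR):
    """Return every sector-aligned address from base up to the end of flash."""
    return list(range(base, size, step))

addrs = erase_addresses()
print(len(addrs), hex(addrs[0]), hex(addrs[-1]))
```

With these numbers the sweep covers 4064 sectors, starting at 0x20000 and ending at 0xFFF000.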

rogloh · 2022-02-03 02:12

This might be something handy to map onto my memory driver for providing simple read/write temporary filesystems using external RAM (Hyper/PSRAM/SRAM), plus it should get pretty decent transfer performance from those devices too vs SPI flash. Applications like audio streaming code, or those that require temporarily generated files such as those on-board build tools might create could then benefit from faster read performance out of this filesystem...

I'll take a look when I can to see what might be needed to do this. It should be possible to map onto real HyperFlash too, by calling the appropriate erase functions instead of going directly over SPI and assuming Winbond. Ideally there would be a volume handle of some sort so that multiple filesystems could exist in parallel from the same driver. Maybe that could be done using an array index to multiple instances of this OBJ from the callers...

rogloh · 2022-02-04 05:05

I just watched the recent live forum video showing @ersmith's latest FlexProp toolchain and noticed the questions about the mount command and vfs support in FlexProp, which already supports mounting SD and host file systems. I also wonder how I can get my driver code called from this layer to enable more virtual file systems on the P2?

I know @Wuerfel_21 has recently built an external tool to load up PSRAM directly but it would be nice to have proper support built into the P2 software environment to read/write data and files this way with RAM or other flash file systems using that shell utility Eric showed or similar methods. I have a number of memory types now supported (Hyper, PSRAM, and SRAM) and it would be great to support these to enable RAM based filesystems (for either temporary runtime use or for applications such as those needing larger media or game files, sound files etc), and also HyperFlash for persistent storage.

Unlike what Mike Green has above for the SPI flash, I do not have any filesystem layer in my own memory driver code. It just has the primitive read/write/erase (HyperFlash) type functions, and also needs its own DMA driver COG which uses a mailbox interface accessible from my driver's SPIN2 APIs. If we had some shared filesystem layer that sits on top of it then we could overlay FAT/FAT32 or some other filesystem like Mike uses on top of it and make use of the different underlying hardware. Now that the P2 Edge with 32 MB RAM is around it would be great to try to support it somehow and open the door for future file systems/storage devices as well.

What would we need to do for something like this, Eric? Is there some (hopefully) simple interface that could get defined that we could then support, which would then "just work" with your code so we could mount other storage devices just as easily as your existing SD and host filesystems? How could it be layered so we don't need to duplicate the same filesystem code in different drivers, keeping it simple and small if that filesystem type is already present and working elsewhere?

evanh · 2022-02-04 13:04

Mike,
Coincidentally, I've just put in some time to understand SPI clocking options and how to get the Prop2 to perform accordingly, so I figured I could poke my head in here and offer some help on fine tuning low-level SPI workings .... having a squiz ... looks like everything relevant sits in the sendRecv() method. Tidy.

One thing that has immediately stood out is CSpin is being floated upon completion. I would think it should be driven high to ensure the SPI chip stays disabled while not in use. I'd also leave CLKpin driven too.

Bed time for me now.

Mike Green · 2022-02-04 18:53

Evanh,
This P2 driver is essentially a translation of the P1 version. I believe the reason for floating all 4 I/O pins was that they were shared with other devices. On the P2, the same pins are used for the SD card although the CS and Clk pins are swapped. The SD card driver used for the P1 also floated all 4 pins for the same reason.

evanh · 2022-02-04 23:31

In that case, because each device depends on which line goes low first, both CLK and CS should definitely always idle high and be driven. The drivers can easily handle such a hand-over condition.

And with my recent knowledge of clocking arrangements, that implies either SPI mode 1 or mode 3. I noted, as per the Winbond documentation, modes 0 and 3 are generally interchangeable and will be the two most likely used modes. So, mode 3 it is.

evanh · 2022-02-05 01:51

Updated CSpin and CLKpin handling:

PUB sendRecv(sD,rCt,rA):result | i,m   ' Send and possibly receive
'  sD  = Value to be sent MSB first, right justified in parameter
'  rCt = Number of bytes to be transferred (>0 received, <0 sent)
'  rA  = 0 or address of buffer area.  If 0, @result is used
   if rA == 0                          ' Use result if rA is zero
      rA := @result
   pinh(CSpin)                         ' Disable flash if present
   pinh(CLKpin)
   pinl(DIOpin)                        ' Set up the various I/O pins
   pinl(CSpin)

'*** Transmit 1-4 Bytes From sD (SPI Command-Address) ***
   m := $FF000000
   repeat until m == 0 or sD & m <> 0  ' Set up sending mask
      m >>= 8
   m &= $80808080                      ' Start with most significant bit
   repeat until m == 0
      pinw(DIOpin,((sD & m) <> 0) & 1) ' Output bit to be sent
      pinl(CLKpin)                     ' Toggle clock pulse
      pinh(CLKpin)
      m >>= 1                          ' Advance mask towards LSB

'*** Transmit From HubRAM (SPI Data Write) ***
   if rCt < 0
      repeat i from 0 to -rCt-1
         m := $80                      ' Set up sending mask
         sD := byte[rA+i]              '  and get data byte
         repeat until m == 0
            pinw(DIOpin,((sD & m) <> 0) & 1) ' Output bit to be sent
            pinl(CLKpin)               ' Toggle clock pulse
            pinh(CLKpin)
            m >>= 1                    ' Advance mask towards LSB

   pinf(DIOpin)                       ' Don't need DIO pin now

'*** Receive To HubRAM (SPI Data Read) ***
   if rCt > 0
      repeat i from 0 to rCt-1
         m := $80                      ' Set up receiving mask
         sD~                           ' Make sure byte is zeroed
         repeat until m == 0
            pinl(CLKpin)               ' Toggle clock pulse
            pinh(CLKpin)
            if pinr(DOpin) <> 0        ' Input bit to be received
               sD |= m
            m >>= 1                    ' Advance mask towards LSB
         byte[rA+i] := sD
   pinh(CSpin)
   pinh(CLKpin)
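The sending-mask setup at the top of sendRecv() can be hard to follow at a glance, so here is a small Python model of just that part. It is a sketch of the mask arithmetic only, with no pin I/O, and the names `start_mask` and `bits_sent` are invented for illustration. The value 0x9F is used purely as an example one-byte command.

```python
def start_mask(sd):
    """Mirror the mask setup: find the top non-zero byte, keep only its MSB."""
    m = 0xFF000000
    while m != 0 and (sd & m) == 0:   # repeat until m == 0 or sD & m <> 0
        m >>= 8
    return m & 0x80808080             # m &= $80808080

def bits_sent(sd):
    """List the bits that would be clocked out of sD, most significant first."""
    m = start_mask(sd)
    bits = []
    while m != 0:                     # repeat until m == 0
        bits.append(1 if sd & m else 0)
        m >>= 1                       # advance mask towards LSB
    return bits

print(bits_sent(0x9F))                # one command byte: 8 bits
print(len(bits_sent(0x03001000)))     # command plus 24-bit address: 32 bits
print(bits_sent(0))                   # zero: nothing is clocked out
```

In this model, leading zero bytes are skipped, so a one-byte command goes out as 8 clocks and a command with a 24-bit address as 32. A side effect, if the model is faithful, is that an sD of zero sends no bits at all.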

Mike Green · 2022-02-05 04:18

Thanks @evanh. I'll give that a try.

evanh · 2022-02-05 09:43

Is there a ready to use test program I could use the driver with?

I ask because I've crafted another sendRecv() with inline assembly. It's pretty much 100% replaced the spin code, like-for-like hopefully. So it's a good idea that I actually test it on my Eval Board before posting the source code.

EDIT: Got a question too. Or maybe more a this-could-need-fixed type query ... At the moment, the way that result gets filled, when rA equals zero, might not be as intended. It uses a pointer to fill the incremental addresses of result in hubRAM as the bytes arrive. These few SPI bytes will presumably form a single big-endian value. Eg: Coming from a status register. But result is 32-bit little-endian. Which, as is, probably means lots of endian swapping higher up.
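To make the endian concern concrete, here is a minimal Python sketch of the two interpretations. The three bytes are example values chosen for illustration, listed in the order they would arrive over SPI; nothing here talks to a device.

```python
arrival = bytes([0xEF, 0x70, 0x18])   # example bytes, in arrival order

# Filling successive byte addresses of a 32-bit long, as happens with @result
# when rA is zero, is equivalent to reading the bytes little-endian.
as_stored = int.from_bytes(arrival, "little")

# Treating the first byte received as the most significant one gives the
# big-endian value a register read would normally be expected to mean.
as_big_endian = int.from_bytes(arrival, "big")

print(hex(as_stored), hex(as_big_endian))
```

The same three bytes read as 0x1870ef when stored in arrival order and as 0xef7018 when taken big-endian, which is the swap a caller would otherwise have to undo. A single byte is identical either way.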

Mike Green · 2022-02-05 15:47

I have a version of FemtoBasic modified for the P2 to use as a test platform. It's not quite ready to use to test the flash driver. I'll post it here. It includes floating point and integer arithmetic, single dimension arrays, USING formatting ... most of which works. I've been working on the interface to the flash file system ... which did work earlier, but broke for a variety of reasons.

The issue with rA equals zero is one mostly of documentation. It's intended for I/O of single bytes where there's no endianness involved, but could be used for simple copying of a few bytes without actually looking at what's contained there.

evanh · 2022-02-05 22:51

Sooo .... Femto is written in Spin? EDIT: Ah, maybe it can import spin objects like Eric's compiler does?

evanh · 2022-02-05 23:32

@"Mike Green" said:
The issue with rA equals zero is one mostly of documentation. It's intended for I/O of single bytes where there's no endianness involved, but could be used for simple copying of a few bytes without actually looking at what's contained there.

I'd recommend changing behaviour, and documenting it, because it's much more likely to get used for reading of SPI device registers than the bulk memory.

Mike Green · 2022-02-06 01:11

@evanh,
I tried your modified "sendRecv". Although your analysis makes sense, the modified code doesn't work. The initialization code tries to read the JEDEC type code for the device so it can determine the size, and the value read does not match any of the JEDEC codes that it knows. I'll poke around to see if I can figure out what's wrong.

For the use it's put to ... an internal I/O routine in an object supplying a higher level structure (simple named file system without update in place and only simple wear levelling) ... I can't see a need to change the behavior.

Yes, FemtoBasic is written in Spin. It stores Basic programs with the keywords replaced by single byte tokens for space and ease in parsing. Variables are limited to single character names. It's an interpreter, so it can't import other languages. It can read and interpret text files as if they were typed in directly, so it can merge two programs or merge a group of subroutines into a main program. It's really intended for trying out ideas or controlling I/O devices for testing or experimenting purposes where speed is not an issue.

evanh · 2022-02-06 01:23

@"Mike Green" said:
@evanh,
I tried your modified "sendRecv". Although your analysis makes sense, the modified code doesn't work. The initialization code tries to read the JEDEC type code for the device so it can determine the size, and the value read does not match any of the JEDEC codes that it knows. I'll poke around to see if I can figure out what's wrong.

I need to do some testing myself.

@"Mike Green" said:
For the use it's put to ... an internal I/O routine in an object supplying a higher level structure (simple named file system without update in place and only simple wear levelling) ... I can't see a need to change the behavior.

The only change is when reading more than 8 bits into result. 8-bit values don't change behaviour.

evanh · 2022-02-06 02:30

@evanh said:

@"Mike Green" said:
I tried your modified "sendRecv". Although your analysis makes sense, the modified code doesn't work. The initialization code tries to read the JEDEC type code for the device so it can determine the size, and the value read does not match any of the JEDEC codes that it knows. I'll poke around to see if I can figure out what's wrong.

I need to do some testing myself.

Okay, looking at start() I see it does a JEDEC query so I've hacked it a little to report the result ... and it works! Cog0 device = $18_70EF.

Update: Oops, the test() method name clashes in Pnut. I use Flexspin normally. Changed the name to sendrecv_test() now.

evanh · 2022-02-06 05:15

Update: Scope measurements were incorrect, see two posts further down
Oooh, wow ... found an unexpected surprise with how slow spin is, even when compiled to native. With the above code, in Pnut, it takes about 2630 sysclock ticks (CS low to CS high on the scope) for the JEDEC register read ... but snapshotting time on either side of the sendRecv(JEDEC,3,0) nets me 22112 ticks! Flexspin is faster at 910 and 7654 respectively but still a dramatic ratio between routine run time and call time.

But when using the new inline assembly version instead, the second number drops far more dramatically!
Pnut is 330 ticks and 1136 ticks
Flexspin is 330 ticks and 542 ticks

PS: Binary size from Pnut, for inline assembly version, is 44 bytes larger. The original is larger from Flexspin.

evanh · 2022-02-06 12:03

I have an idea why the overhead is so large in the pure Spin code. It'll be due to that one memory reference to result. I found, when I first tried to compile in the assembly code, that none of the locals were in cogRAM. So all those locals which would normally be cogRAM accesses had become RDLONG/WRLONG accesses instead.

To bring cogRAM into play, I had to change the way result got filled. Namely always reference result as a local.

I'd kind of forgotten about it funnily.

evanh · 2022-02-06 13:03

Oh, what?, weird! The scope measurements look nothing like previously. Both Pnut and Flexspin. What was 2630 and 910 ticks for the CS low pulse on the scope, respectively, are now 20960 and 7310 ticks instead. The huge ratio is gone in both cases. I must have previously been seeing another sequence somehow. Looks much more sensible now anyway.

Pnut:
CS pulse   Call ticks   Method
 20960       21856      BYTE[result]
 20860       21608      result |=
   330        1104      inline assembly

Flexspin:
CS pulse   Call ticks   Method
  7310        7550      BYTE[result]
  3440        3734      result |=
   330         542      inline assembly

Of note is it's only Flexspin that benefits from the locals optimisation.
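For anyone wanting to compare the figures, the short Python below subtracts the CS pulse width from the call ticks to estimate the time spent outside the CS-low window. The numbers are copied from the measurements above; the interpretation of the difference as "call overhead" is an assumption, not something measured separately.

```python
# (CS pulse ticks, call ticks) copied from the measurements above
pnut = {
    "BYTE[result]": (20960, 21856),
    "result |=": (20860, 21608),
    "inline assembly": (330, 1104),
}
flexspin = {
    "BYTE[result]": (7310, 7550),
    "result |=": (3440, 3734),
    "inline assembly": (330, 542),
}

def overhead(table):
    """Ticks spent in the call but outside the CS-low window."""
    return {name: call - cs for name, (cs, call) in table.items()}

print("Pnut    ", overhead(pnut))
print("Flexspin", overhead(flexspin))
```

On these figures the difference stays in a narrow band per compiler (748 to 896 ticks for Pnut, 212 to 294 for Flexspin), so nearly all of the gain from inline assembly comes from the bit loop inside the CS window.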

evanh · 2022-02-06 16:34

Okay, some basic block read/write tests with the inline assembly seem good. Here's the whole thing, including my testing.

To optimise sendRecv() a little better I also packed all the SPI pin numbers into four consecutive bytes, forming one longword. Since only sendRecv() uses them, I figured that liberty was safe enough.

EDIT: Added my newly minted support library. Needed for the testing code.

evanh · 2022-02-06 23:46

Compiling for Ada's nu-code in Flexspin makes Pnut look good.

Nu-code:
CS pulse   Call ticks   Method
 51670       53368      BYTE[result]
 51630       53104      result |=
   330        1872      inline assembly

msrobots · 2022-02-07 01:21

You got that wrong: Nu-code is Eric's. Ada did the P1 interpreter/compiler.

Two different animals

Mike

evanh · 2022-02-07 01:26

Okay, apologies.

EDIT: Speaking of Prop1, I get an unexplained error from Flexspin when trying to compile the original code here for the Prop1.

Propeller Spin/PASM Compiler 'FlexSpin' (c) 2011-2022 Total Spectrum Software Inc.
Version 5.9.9-beta-v5.9.7-7-g94d82603 Compiled on: Feb  7 2022
Winbond_Driver-ebh.spin2
fmt.c
_platform_:23: error: Error: only constant values are supported for sign/zero extension

evanh · 2022-02-07 04:33

I can make it do SPI clock mode 0 now. It takes extra CS toggles though. I'm inclined to split that off and place it in the start() method. sendRecv() now starts with this:

   pinh(CSpin)                         ' Disable flash
   pinh(CLKpin)                        '   and SD if present
   pinl(CSpin)                         ' Must precede CLK low so SD isn't woken
   pinl(CLKpin)                        ' CLK idle low (edge makes CPHA=1)
   pinh(CSpin)                         ' Negate edge effect (now SPI mode 0)
   pinl(DIOpin)                        ' Set up the various I/O pins
   pinl(CSpin)                         ' Enable flash

Compared to SPI mode 3:

   pinh(CSpin)                         ' Disable flash if present
   pinh(CLKpin)                        ' CLK idle high (SPI mode 3)
   pinl(DIOpin)                        ' Set up the various I/O pins
   pinl(CSpin)                         ' Enable flash

EDIT: I guess splitting it off isn't wise. The idea will be to cooperatively share with SD ops together. Mode 3 is fine. Just throwing in the options.
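The difference between the two start-up sequences can be checked by replaying them on paper. The Python sketch below does that: it applies the pin writes in the order listed above and reports the CLK level at the moment the flash is finally enabled, plus how many CS falling edges occurred. The `replay` helper and the tuple encoding are hypothetical, and pins are assumed to start out floating.

```python
# Pin writes in the order given above: 1 = pinh, 0 = pinl
MODE0 = [("CS", 1), ("CLK", 1), ("CS", 0), ("CLK", 0),
         ("CS", 1), ("DIO", 0), ("CS", 0)]
MODE3 = [("CS", 1), ("CLK", 1), ("DIO", 0), ("CS", 0)]

def replay(seq):
    """Apply pin writes in order; return (final CLK level, CS falling edges)."""
    pins = {"CS": None, "CLK": None, "DIO": None}   # None = floating / unknown
    falls = 0
    for pin, level in seq:
        if pin == "CS" and pins["CS"] == 1 and level == 0:
            falls += 1                               # count high-to-low on CS
        pins[pin] = level
    return pins["CLK"], falls

print("mode 0:", replay(MODE0))
print("mode 3:", replay(MODE3))
```

The mode 0 sequence ends with CLK low and needs two CS falling edges, while mode 3 ends with CLK high after a single one, which matches the "extra CS toggles" remark.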

Mike Green · 2022-02-07 23:49

Attached is the current version of P2 FemtoBASIC including support for a flash filesystem and floating point. I'm in the process of testing it. I can't copy it to flash using FlexProp, but using loadp2 with P2ES_flashloader.bin seems to work. Flow of control seems to work (IF, GOTO, FOR, GOSUB, RETURN). Expressions and their operators seem to work. A floating point constant must have a decimal point "." and an optional exponent. If there are just digits (with maybe a leading sign), it's an integer constant. A binary operator with only integer operands produces an integer result (except for "**" and "//" which are more complicated). Unary operators generally produce the same kind of result as their operand. Obvious exceptions include SIN, COS, ATAN, SQRT, LOG, EXP, which always produce a floating point result. Storing a value in a variable also stores the type of value. Dimensioned variables are all of the same type, set by the first value stored in them. Values are converted to the proper type when stored. Statements that expect a value of a particular type force the appropriate conversion.

OPEN, CLOSE, SAVE, LOAD, DELETE, FILES seem to work. READ and WRITE are partially tested, but should work given how much code they have in common with INPUT and PRINT. The flash I/O includes evanh's changes.

Please report bugs here.

Wuerfel_21 · 2022-02-08 00:11

@evanh said:
Okay, apologies.

EDIT: Speaking of Prop1, I get an unexplained error from Flexspin when trying to compile the original code here for the Prop1.
Propeller Spin/PASM Compiler 'FlexSpin' (c) 2011-2022 Total Spectrum Software Inc.
Version 5.9.9-beta-v5.9.7-7-g94d82603 Compiled on: Feb  7 2022
Winbond_Driver-ebh.spin2
fmt.c
_platform_:23: error: Error: only constant values are supported for sign/zero extension

Okay that's a bruh moment right there. The file that should be loaded as _platform_ doesn't even have 23 lines.

Wuerfel_21 · 2022-02-08 00:19

Wait, no, that's an error from the ASM backend. The relevant file still doesn't have a line 23 or any sign-extends.

evanh · 2022-02-08 01:36

@"Mike Green" said:
Please report bugs here.

You can ditch the initial clkset() and the pile of old associated assembly constants. The Spin environment presets the clock mode based on _clkfreq ... which assumes a 20 MHz crystal is fitted; the optional _xtlfreq or _xinfreq can change that assumption.

If you want runtime adjustable clock setting then have a look at pllset() in that small stdlib.spin2 I posted.

evanh · 2022-02-08 01:54

@"Mike Green" said:
Attached is the current version of P2 FemtoBASIC including support for a flash filesystem and floating point. I'm in the process of testing it. I can't copy it to flash using FlexProp, but using loadp2 with P2ES_flashloader.bin seems to work.
...
Please report bugs here.

Okay, FlexSpin doesn't recognise SETPAT instruction within inlined assembly - which SmartSerial.spin2 is using for its auto-baud detection. I'm guessing Flex don't like anything to do with events. It might be a recently imposed restriction ... nope, gone back to 2020 (Fastspin 4.3.2) and it still don't like it.

Mike Green · 2022-02-08 03:24

@evanh,

Okay, FlexSpin doesn't recognise SETPAT instruction within inlined assembly - which SmartSerial.spin2 is using for its auto-baud detection. I'm guessing Flex don't like anything to do with events. It might be a recently imposed restriction ... nope, gone back to 2020 (Fastspin 4.3.2) and it still don't like it.

I'm using the most recent release of FlexProp / FlexSpin (5.9.8) and it seems to be just fine with FCACHE use of SETPAT instructions. The auto-Baud detection in SmartSerial does work once I can get the program loaded into flash.

Next time I post a newer version, I'll strip out the unnecessary constant definitions.
