P1V booting from FleaFPGA Ohm flash with custom boot ROM tools

rogloh · 2018-03-03 15:52

Hi all,

Today I managed to get the FleaFPGA Ohm board running some custom P1V boot code to read images from its onboard SPI flash chip and also allowing the normal flashing of persistent images serially via standard Propeller tools (actually PropellerIDE was tested).

This is good because this particular FPGA board does not have any I2C EEPROM but it does have a 1MB SPI flash chip that holds the FPGA configuration data. This 1MB chip is not fully used by the ~600kB configuration data so it can also be shared after FPGA initialization by dedicating the top 32kB block of the chip for storage of a typical P1 EEPROM image.
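As a rough illustration of the layout described above, the sketch below works out where the top 32kB block of a 1MB flash starts and builds the bytes for a standard SPI read (command 0x03 followed by a 24-bit address, as used by the W25Q80 family). The constant and function names are made up for this example, and the exact image offset used by the boot ROM is an assumption; the attached sources have the real values.

```python
# Sketch only: names are hypothetical, not taken from the boot ROM sources.
FLASH_SIZE = 1024 * 1024               # 1MB W25Q80
IMAGE_SIZE = 32 * 1024                 # typical P1 EEPROM image size
IMAGE_BASE = FLASH_SIZE - IMAGE_SIZE   # assumed: image occupies the top 32kB

READ_DATA = 0x03                       # standard SPI "Read Data" command


def read_command(address):
    """Return the 4 bytes sent on MOSI to start a read at `address`."""
    if not 0 <= address < FLASH_SIZE:
        raise ValueError("address outside flash")
    return bytes([
        READ_DATA,
        (address >> 16) & 0xFF,        # address is sent high byte first
        (address >> 8) & 0xFF,
        address & 0xFF,
    ])


print(hex(IMAGE_BASE))
print(read_command(IMAGE_BASE).hex())
```

After these four bytes the flash streams data out on MISO for as long as CS stays low, so a single command is enough to read the whole 32kB image.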

Little bit of mucking about but I finally got it going:
- had to learn the W25Q80 FLASH chip SPI command sequences for read/write/erase etc
- had to modify the boot ROM to use flash ROM instead of I2C EEPROM and recreate 4 ROM image byte lane files for my P1V Diamond project - I've created a handy Makefile to assist this
- had to modify the Lattice project preference files to enable SPI port on ECP5 FPGA in user mode
- had to map FPGA IO pins to this SPI port and make them accessible to the P1V code

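Thankfully there was only one slightly tricky bug in my SPI flash mods to track down. I had inadvertently used "test ina, spimaskmiso wc" instead of "test spimaskmiso, ina wc, nr" and was wondering why all my SPI data read was zeroes. With ina in the destination field the instruction reads the shadow register rather than the pins, so ina has to be the source operand. DOH!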

If anyone is interested I have attached a zip file with this ROM code and a Makefile for users on Mac / Linux systems to generate either the original 4k boot ROM image or my custom flash-enabled boot ROM image for this board. It could probably be further customized as required for other FPGA systems that use similar SPI flash.

You can build the custom ROMs using

make

or

make flea

or rebuild the original ROMs using

make orig

Enjoy,
Roger.

customrom.zip

Ariba · 2018-03-03 17:07

Great news Roger

I will try to include your new booter and interpreter Spin files in my ROM hex files. Maybe you can post the 4 hex data files in addition to the Spin sources? I have already made a tool to merge them with the ROM tables.

rogloh wrote: »

- had to modify the Lattice project preference files to enable SPI port on ECP5 FPGA in user mode

Can you show what the settings need to be? I have had no luck accessing the flash with my custom processor so far. I can output the SPI clock with the available primitive, but I could not read data because the MISO pin was always tied to ground.

Andy

Cluso99 · 2018-03-03 18:24

WTG Roger

jac_goudsmit · 2018-03-03 18:51

That's great, Roger!

I'm currently busy with other projects, but I'm going to integrate this into my P1V repo. I know many (if not all) FPGA boards have no I2C EEPROM on board (or it's much too small), so this is a welcome addition!

Thanks!

===Jac

jmg · 2018-03-03 19:14

rogloh wrote: »

Hi all,

Today I managed to get the FleaFPGA Ohm board running some custom P1V boot code to read images from its onboard SPI flash chip and also allowing the normal flashing of persistent images serially via standard Propeller tools (actually PropellerIDE was tested).

Good progress.
Can you add a short summary of the P1V that runs on this Board - ie COGs/RAM/MHz/Pins ? Boot-times ?

Rayman · 2018-03-03 19:18

Does the fpga load in sqi mode?
Or, is it all just regular spi?

jmg · 2018-03-03 19:26

Rayman wrote: »

Does the fpga load in sqi mode?
Or, is it all just regular spi?

Do you mean the FPGA bitstream-image load step, or the 32k P1 code booting step ?

rogloh wrote: »

This 1MB chip is not fully used by the ~600kB configuration data so it can also be shared after FPGA initialization by dedicating the top 32kB block of the chip for storage of a typical P1 EEPROM image.

Is there any merit in making that the top 64k, as many P1 boards ship with 64k EEPROMs?
Or even allowing more than 64k, as I guess that FPGA ceiling is quite well defined, and that would allow other P1V variants to have more Flash.

XIP is always possible, or a mini flash storage scheme ?

Ariba · 2018-03-03 19:58

Rayman wrote: »

Does the fpga load in sqi mode?
Or, is it all just regular spi?

The Flash connection on the FleaOhm board is not made for QSPI, only 2 data lines are connected.
The FPGA could do also Quad mode for configuration if connected right.

jmg wrote: »

...
Is there any merit in making that the top 64k, as many P1 boards ship with 64k EEPROMs?
Or even allowing more than 64k, as I guess that FPGA ceiling is quite well defined, and that would allow other P1V variants to have more Flash.

XIP is always possible, or a mini flash storage scheme ?

There are 384 kByte free in the Flash on the Ohm board.
In any case, you cannot use the existing EEPROM drivers to write the top 32k of a 64k EEPROM, so reserving 64kB at the top does not make much sense. Special SPI flash drivers can provide access to any sector/block in the flash.

Andy

jmg · 2018-03-03 20:07

Ariba wrote: »

There are 384 kByte free in the Flash on the Ohm board.
In any case, you cannot use the existing EEPROM drivers to write the top 32k of a 64k EEPROM, so reserving 64kB at the top does not make much sense. Special SPI flash drivers can provide access to any sector/block in the flash.

Of course, but I was thinking of future proofing the tool flows, so the P1V image loads from the bottom, as it does for EEPROM.
Users can then use the same memory image files.

Ariba · 2018-03-03 20:31

jmg wrote: »

Ariba wrote: »

There are 384 kByte free in the Flash on the Ohm board.
In any case, you cannot use the existing EEPROM drivers to write the top 32k of a 64k EEPROM, so reserving 64kB at the top does not make much sense. Special SPI flash drivers can provide access to any sector/block in the flash.

Of course, but I was thinking of future proofing the tool flows, so the P1V image loads from the bottom, as it does for EEPROM.
Users can then use the same memory image files.

Ah, okay. We can have up to 96 kB of hub RAM on an ECP5-25 and even more on the bigger variants, so it can make sense to start the hub RAM image directly after the configuration data to get a contiguous memory area.

ozpropdev · 2018-03-04 00:32

Nice work Roger! :cool:
I haven't fired up my flea board yet. Hopefully this week I can squeeze some time in.

rogloh · 2018-03-04 01:11

Ariba wrote: »

Great news Roger

I will try to include your new booter and interpreter spin file into my rom-hex files. Maybe you can post the 4 hex data files in addition to the spin sources, I already have made a tool to merge them with the rom tables.

Andy

I have attached my set of custom ROM hex files below in another zip file. However, these are built for my own current system and assume the SPI signals are mapped to certain pins on P1V port A. Because the boot code includes a timeout it also needs to know the clock frequency, which in my setup is 12.5MHz during boot (fastRC clock emulation). With a different setup it won't work without pin changes and a rebuild, but in the meantime you can still try it out if you map the IO pins the same way. You can customize all of these in the flea-booter.spin file, but you then need to rebuild:

  SPI_MISO_PIN = 8      ' SPI FLASH pins on port A
  SPI_CLK_PIN  = 9
  SPI_MOSI_PIN = 10
  SPI_CS_PIN   = 11

  BOOTCLK       = 12_500_000 ' 12.5MHz
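To make the role of these constants a little more concrete, here is a small sketch of how pin numbers turn into bit masks and how a clock-dependent timeout scales with BOOTCLK. The helper names are hypothetical, and the 150ms figure is the serial timeout mentioned later in the thread; the boot ROM source may compute these differently.

```python
# Values copied from the constants above; helper names are illustrative only.
SPI_MISO_PIN = 8
SPI_CLK_PIN = 9
SPI_MOSI_PIN = 10
SPI_CS_PIN = 11
BOOTCLK = 12_500_000   # Hz


def pin_mask(pin):
    """Return the 32-bit port mask with only bit `pin` set."""
    if not 0 <= pin <= 31:
        raise ValueError("pin must be 0..31")
    return 1 << pin


def timeout_ticks(clock_hz, milliseconds):
    """Number of system clock ticks that elapse in `milliseconds`."""
    return clock_hz * milliseconds // 1000


# Pins the boot code has to drive (everything except MISO).
spi_output_mask = pin_mask(SPI_CLK_PIN) | pin_mask(SPI_MOSI_PIN) | pin_mask(SPI_CS_PIN)

print(hex(spi_output_mask))
print(timeout_ticks(BOOTCLK, 150))
```

Changing BOOTCLK without rescaling the tick count would stretch or shrink the timeout, which is why the frequency has to be known at build time.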

Ariba wrote: »

Can you show what the settings need to be? I have had no luck accessing the flash with my custom processor so far. I can output the SPI clock with the available primitive, but I could not read data because the MISO pin was always tied to ground.

I had the same issue with IO pins not working, but the trick I used is to have PULLMODE=DOWN and OPENDRAIN=OFF for both the CS and MOSI output pins, and just PULLMODE=DOWN for the MISO input pin. For some reason the default Spreadsheet View had some other settings that would not let it work. Also critical to this was setting MASTER_SPI_PORT=DISABLE, SLAVE_SPI_PORT=DISABLE and CONFIG_MODE=JTAG in the SYSCONFIG line. You can set this up in a tab of Spreadsheet View or edit the .lpf file directly. This opens up the SPI port normally reserved for configuration so that it can be controlled in user mode.

However, I found there is a slight downside to these settings: the SPI port is no longer accessible for programming the flash in the background with Valentin's tools. To re-flash the ROM you first need to download into FPGA RAM a project that re-enables the SPI port for background flash reconfiguration in user mode (this only takes 15s, so it is not a big deal). Don't panic if at first you think you've bricked it and can't reflash again; it still works and can be recovered this way.

rogloh · 2018-03-04 01:32

jmg wrote: »

Rayman wrote: »

Does the fpga load in sqi mode?
Or, is it all just regular spi?

Do you mean the FPGA bitstream-image load step, or the 32k P1 code booting step ?

rogloh wrote: »

This 1MB chip is not fully used by the ~600kB configuration data so it can also be shared after FPGA initialization by dedicating the top 32kB block of the chip for storage of a typical P1 EEPROM image.

Is there any merit in making that the top 64k, as many P1 boards ship with 64k EEPROMs?
Or even allowing more than 64k, as I guess that FPGA ceiling is quite well defined, and that would allow other P1V variants to have more Flash.

XIP is always possible, or a mini flash storage scheme ?

The flash on this board has two data pins so dual mode reads are technically possible, but not quad. I just used standard SPI in my boot code. It reads bytes from flash into HUB RAM at a rate of 1 every 15 hub cycles. So at 12.5MHz during boot that is 32kB in ~0.63s. Not fast but pretty similar to I2C. A dedicated SPI peripheral and/or instruction that sent bytes to HUB RAM could boost this much higher as the chip is rated to 50MHz clocks for reading, and you could use two pins. Also if the serial transfer timing used during boot was adjusted accordingly the clock rate could be boosted to 80MHz from 12.5MHz during boot to give a 6-7x speedup which probably makes sense. I didn't want to play with that serial stuff initially so I stuck with the RCfast clock rate.
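The timing figures in this paragraph can be checked with a short calculation. The sketch assumes the usual P1 arrangement of one hub access window every 16 system clocks; the function name is just for illustration.

```python
CLOCKS_PER_HUB_CYCLE = 16   # a P1 cog gets a hub window every 16 clocks


def flash_read_seconds(num_bytes, hub_cycles_per_byte, clock_hz):
    """Time to copy `num_bytes` into hub RAM at the given per-byte cost."""
    total_clocks = num_bytes * hub_cycles_per_byte * CLOCKS_PER_HUB_CYCLE
    return total_clocks / clock_hz


IMAGE_BYTES = 32 * 1024

slow = flash_read_seconds(IMAGE_BYTES, 15, 12_500_000)   # boot at 12.5MHz
fast = flash_read_seconds(IMAGE_BYTES, 15, 80_000_000)   # same loop at 80MHz

print(f"12.5MHz: {slow:.3f}s")
print(f"80MHz:   {fast:.3f}s")
print(f"speedup: {slow / fast:.1f}x")
```

The 12.5MHz case comes out at roughly 0.63s, and the ratio between the two clocks is 6.4, which is where the "6-7x" estimate comes from.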

I was also wondering if it would make a handy in built small Flash file system after bootup. The risk is that without protection you could corrupt the FPGA config data and then temporarily brick it. I was initially thinking of just turning off the CS pin in hardware after the first clkset instruction (and probably also free these IO pins back to the GPIO header) to ensure this corruption can't happen. But another way is to setup the flash chip to protect the lower 640kB or so from any tampering. This can be done using its protection registers (in a volatile way) to keep a portion of the flash chip protected until the next power cycle or when issuing the special setting to unlock it. You could just set that flash protection up in the boot code before launching the main application.

rogloh · 2018-03-04 01:52

jmg wrote: »

Good progress.
Can you add a short summary of the P1V that runs on this Board - ie COGs/RAM/MHz/Pins ? Boot-times ?

Setup is 8 COGs, boots 32kB from SPI flash in 0.63s after FPGA config done (haven't measured total time from cold start). I am running stock 80MHz. There is 60kB of HUB RAM, 4kB HUB ROM, and 32kB RAM still free for future use. 4 IO pins are mapped to on board flash. Remaining pins are mapped to some GPIO header pins, the on board microSD card reader and the USB enabled UART on the FT230X chip. HDMI lane data is mapped independently and driven in parallel to the usual video IO pins output by video generator. On board SDRAM is not yet used but I already have some prior Verilog code to try it when I get a chance.

There is a small problem with mapping all these IO peripherals and GPIO pins on the FleaFPGA board as there is more total IO than would fit a single 32 bit P1V port. The RasPi compatible header has (up to) 28 user controllable IO pins. There is also the on-board UART, a 4 (or 6) pin SD card, a 4 pin SPI flash plus two USB/PS/2 ports with a common pullup/pulldown IO control pin. There is also a single IO pin independent from the RasPi header which is ideal for a reset switch (or any other use). A single LED output pin is also available - I think it makes sense to use it to show SD card activity by paralleling the LED with SD card CS. It could also be driven by another pin if required.

Some of these IO pins can be incorporated into P1V port A. Some could be mapped to a fully enabled port B. However, there may be some COG software around that assumes port B is not connected and reuses its registers when things get tight in the COG. If anything like that writes a COG's dirb register, some hardware clash could theoretically happen if a port B was exposed to the outside world via GPIO pins. To get around that, I was thinking it could be made safer with some global port B enable bit (e.g. to use port B you first enable it by setting dira[31] = 1, which otherwise would drive the typical Rx pin and clash with the FTDI UART, and so is unlikely to be set to 1 in most software). This could then protect both the UART and the port B stuff. If there aren't many peripherals on port B, then the limited number of drivers required for those peripherals can be modified to use outb instead of outa, inb instead of ina etc., and at the same time gain the one-line change to enable dira[31]. E.g. you could have USB or PS/2 KBM driver COGs use port B. You could also put the boot flash there too.
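The port B enable idea can be sketched as a simple gate on the direction register. This is only a behavioural model of the proposal, not existing P1V Verilog; the function name and the choice to gate the whole of dirb are assumptions.

```python
PORT_B_ENABLE_BIT = 31   # proposed: dira[31] doubles as the port B enable
MASK32 = 0xFFFFFFFF


def effective_dirb(dira, dirb):
    """Return the dirb value that actually reaches the pins.

    Port B outputs are only enabled once dira[31] is set, so stray
    writes to dirb by software that treats it as spare RAM are ignored.
    """
    if (dira >> PORT_B_ENABLE_BIT) & 1:
        return dirb & MASK32
    return 0


# Legacy code scribbling on dirb without the enable bit drives nothing.
print(hex(effective_dirb(0x0000_0000, 0xDEAD_BEEF)))

# A port B aware driver sets dira[31] first, then its pins take effect.
print(hex(effective_dirb(1 << 31, 0x0000_00FF)))
```

The same gate would also need to stop dira[31] itself from driving the Rx pin, since that bit is being repurposed rather than used as a real direction bit.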

I don't think it makes sense to split the RasPi compatible GPIO port pins over two P1V ports. It probably makes sense to keep them all on port A or all on port B, so you can still read or toggle any combination of IO pins simultaneously. I also like the idea of numbering the mapped IO pins using the same GPIO numbers they use on RasPi for convenience, e.g. GPIO2 goes to outa or outb port bit 2 etc. That makes things easy to remember, with no need for another translation level.

Maybe all the onboard peripherals could use port B and the GPIO pins could use port A - or even vice versa, still mulling it over for this board. In any case you'd still want the FT230X to use pins 30 and 31 of port A for the most compatibility. I can also see an argument for putting the SD card on port A as well because lots of tools assume that, including GCC capable proploader stuff that can already handily transfer files directly to the SD card from a host PC. Changing things around to make use of a port B requires some additional tool and/or COG driver changes. It's a bit agonizing deciding either way unfortunately.

Roger

rogloh · 2018-03-04 10:14

rogloh wrote: »

The flash on this board has two data pins so dual mode reads are technically possible, but not quad. I just used standard SPI in my boot code. It reads bytes from flash into HUB RAM at a rate of 1 every 15 hub cycles. So at 12.5MHz during boot that is 32kB in ~0.63s. Not fast but pretty similar to I2C. A dedicated SPI peripheral and/or instruction that sent bytes to HUB RAM could boost this much higher as the chip is rated to 50MHz clocks for reading, and you could use two pins. Also if the serial transfer timing used during boot was adjusted accordingly the clock rate could be boosted to 80MHz from 12.5MHz during boot to give a 6-7x speedup which probably makes sense. I didn't want to play with that serial stuff initially so I stuck with the RCfast clock rate.

Just tried to improve things further and was able to get the clock pushed up to 80MHz during booting from flash. This buys a 6.4x SPI transfer speedup. I should also recompute the RX pin polling timeout duration to use the BOOTCLK constant instead of assuming 20MHz. This will help reduce it back to the intended 150ms for the serial timeout interval, given the RX pin driven by the FTDI would idle high. Looking at the best possible startup time on this board from the rising edge of reset (not from cold power up), I think it is going to be about 150ms plus 7 hub cycles per byte read in a fully unrolled byte transfer loop using dual pin SPI transfers. For 32kB @ 80MHz (5M hub cycles per second) that turns out to be 7*32768/5MHz = 46ms for the flash read transfer, plus the initial 150ms serial timeout, making about 200ms or so if the 20ms PLL stabilization delay is also removed from the ROM. If you really wanted, you could use another pin state to enable serial transfers at reset time, and then you could lose that initial 150ms and boot in 46ms. That's getting rather snappy.
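The boot-time budget above can be laid out as a small calculation. The sketch assumes 16 clocks per hub cycle and uses the figures quoted in the post (7 hub cycles per byte, 150ms serial timeout, 20ms PLL delay); the function and parameter names are made up for illustration.

```python
def boot_time_ms(num_bytes, hub_cycles_per_byte, clock_hz,
                 serial_timeout_ms, pll_delay_ms=0.0):
    """Estimate reset-to-run time: serial timeout + PLL delay + flash read."""
    hub_cycles_per_second = clock_hz / 16
    read_ms = 1000 * num_bytes * hub_cycles_per_byte / hub_cycles_per_second
    return serial_timeout_ms + pll_delay_ms + read_ms


IMAGE_BYTES = 32 * 1024
CLOCK_HZ = 80_000_000

with_timeout = boot_time_ms(IMAGE_BYTES, 7, CLOCK_HZ, serial_timeout_ms=150)
no_timeout = boot_time_ms(IMAGE_BYTES, 7, CLOCK_HZ, serial_timeout_ms=0)
with_pll = boot_time_ms(IMAGE_BYTES, 7, CLOCK_HZ, serial_timeout_ms=150,
                        pll_delay_ms=20)

print(f"timeout + read:       {with_timeout:.1f}ms")
print(f"read only:            {no_timeout:.1f}ms")
print(f"timeout + PLL + read: {with_pll:.1f}ms")
```

Laid out this way it is clear that the serial timeout, not the flash read, dominates once the clock is at 80MHz.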

jmg · 2018-03-04 19:58

rogloh wrote: »

.... If you really wanted you could use another pin state to enable serial transfers at reset time and you could lose that initial 150ms and boot in 46ms. That's getting rather snappy.

P2 uses a pin to skip Serial-wait, maybe P1V can use the same idea/pin ?

The GreenWaves GAP8 claims this for boot... (but no mention of just how much code is loaded in that 0.5ms)
- 0.5 ms cold boot time
- 10us to start or stop cluster
- Up to 250 MHz internal clock

If some small Verilog was added to assist Dual(Quad)-SPI transport, it seems the PCB HW floor limit for transport time becomes (32k*8/2)/80M = 1.6 ms ? (or perhaps (32k*8/2)/40M = 3.2 ms ?)
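The transport floor quoted above is just bits divided by data lines and clock rate. A minimal sketch, taking "32k" to mean 32768 bytes and using a made-up function name:

```python
def transport_floor_ms(num_bytes, data_lines, spi_clock_hz):
    """Minimum time to move `num_bytes` over SPI, ignoring command overhead."""
    clocks = num_bytes * 8 / data_lines   # each clock moves one bit per line
    return 1000 * clocks / spi_clock_hz


IMAGE_BYTES = 32 * 1024

dual_80 = transport_floor_ms(IMAGE_BYTES, 2, 80_000_000)
dual_40 = transport_floor_ms(IMAGE_BYTES, 2, 40_000_000)

print(f"dual SPI @ 80MHz: {dual_80:.2f}ms")
print(f"dual SPI @ 40MHz: {dual_40:.2f}ms")
```

The command and address bytes sent before the data add a few dozen clocks on top, which is negligible next to the 131072 clocks needed for the payload in dual mode.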

rogloh · 2018-03-05 07:36

Yes I can see a way on this FleaFPGA Ohm board to get it down to 1.6ms if you create a special new HUB op and dedicated SPI HW and use dual SPI transfers and clock 16x2 bits in at 80MHz, then write those 32 bits per hub cycle directly into SRAM assuming there was an inbuilt transfer counter that completed after 8192 transfers or something so you avoided extra DJNZ loop overhead etc which burns another hub cycle per iteration.

Fastest possible on this board would be 104MHz fast reads directly into dual ported hub SRAM but there is not really much gain over 80MHz to do all that.

1.6ms is getting crazy fast for a P1V reboot. I can only imagine what applications would need anything like that. Automotive/Aeronautics control or real-time trading stuff I guess. Anything that costs lots of money or becomes life-threatening during any downtime I would expect.

Some of the fast boot is probably moot anyway given the load time of the FPGA (order of 100ms on this board), but that is usually only encountered once at cold start. Although there could potentially be dynamic field upgrades supported too where an FPGA reconfig is required. Not sure if this ECP5 supports user triggered reconfig, certainly didn't see an external way to do it on the board but maybe there is some internal way.

Roger.

p.s. Hey I just got the USB host code going too on this FleaFPGA Ohm board by the way. A USB keyboard and mouse are now reporting codes in their boot protocol formats. Happy days.

jmg · 2018-03-05 07:59

rogloh wrote: »

p.s. Hey I just got the USB host code going too on this FleaFPGA Ohm board by the way. A USB keyboard and mouse are now reporting codes in their boot protocol formats. Happy days.

Nice milestone...

rogloh wrote: »

Yes I can see a way on this FleaFPGA Ohm board to get it down to 1.6ms if you create a special new HUB op and dedicated SPI HW and use dual SPI transfers and clock 16x2 bits in at 80MHz, then write those 32 bits per hub cycle directly into SRAM assuming there was an inbuilt transfer counter that completed after 8192 transfers or something so you avoided extra DJNZ loop overhead etc which burns another hub cycle per iteration.

I wondered about 2 opcodes, one RD32SPI++ and one WR32SPI++, that could be generally useful for code access too.
Makes the small verilog more useful.
That would need a DJNZ, but could fit in 2 lines of code.

rogloh wrote: »

1.6ms is getting crazy fast for a P1V reboot. I can only imagine what applications would need anything like that. Automotive/Aeronautics control or real-time trading stuff I guess. Anything that costs lots of money or becomes life-threatening during any downtime I would expect.

Anything with a watchdog, can benefit from a fast reboot, as you never know when the watchdog might kick off.
RAM-based systems are often passed over at design time because of load-time concerns, so it is good to get the time down to around what the hardware limit is.

SaucySoliton · 2018-03-07 06:38

Nice work, Roger!

rogloh wrote: »

Some of the fast boot is probably moot anyway given the load time of the FPGA (order of 100ms on this board), but that is usually only encountered once at cold start. Although there could potentially be dynamic field upgrades supported too where an FPGA reconfig is required. Not sure if this ECP5 supports user triggered reconfig, certainly didn't see an external way to do it on the board but maybe there is some internal way.

On the iCE40 this can be done with the WARMBOOT module. I saw Dual Boot and Multiple Boot for ECP5. I didn't research enough to see how to use it. The FPGA config time is likely significant, but it should be possible to have the RAM initialized by the bitstream. I haven't tried it on real hardware yet, but it really helped the startup time when running in Verilator. The problem is that resetting the RAM likely requires reconfiguration. I thought that the config time might be too long for the usual serial programming process to work.

rogloh wrote: »

p.s. Hey I just got the USB host code going too on this FleaFPGA Ohm board by the way. A USB keyboard and mouse are now reporting codes in their boot protocol formats. Happy days.

rogloh · 2018-03-08 02:39

SaucySoliton wrote: »

On the iCE40 this can be done with the WARMBOOT module. I saw Dual Boot and Multiple Boot for ECP5. I didn't research enough to see how to use it. The FPGA config time is likely significant, but it should be possible to have the RAM initialized by the bitstream. I haven't tried it on real hardware yet, but it really helped the startup time when running in Verilator. The problem is that resetting the RAM likely requires reconfiguration. I though that the config time mig

Was hunting through FPGA library reference stuff for ECP5 and didn't see any actual user peripheral for triggering FPGA re-configuration. I also noticed that the other Lattice FPGA families ECP/ECP3/LatticeXP/LatticeSC/M/MachXO etc have various JTAG interface blocks (JTAGA-JTAGF) in case they might have been somehow useful to trigger re-config. Not exactly sure how they would normally get used but there doesn't appear to be one that supports the ECP5 anyway.

Then when I read the sysConfig Usage Guide I noticed this which seems to confirm my suspicion that this can't be readily done by the FPGA itself.

"Refresh– The process of re-triggering a bitstream write operation.
It is activated by toggling of the PROGRAMN pin or issuing a REFRESH command, which emulates the PROGRAMN pin toggling. Only the JTAG port and the Slave SPI port support the REFRESH command."

Rizwan87 · 2019-11-17 11:46

Hi,
I have a query: I want to design a system on an ECP5 12k FPGA. The reconfiguration document says it can be reconfigured via the SPI port too.

Does this mean that I can load the configuration file from a microcontroller into the FPGA configuration memory, so that it won't require any additional hardware programmer?

Regards
Rizwan

Ariba · 2019-11-20 10:32

Rizwan87 wrote: »

Hi,
I have a query: I want to design a system on an ECP5 12k FPGA. The reconfiguration document says it can be reconfigured via the SPI port too.

Does this mean that I can load the configuration file from a microcontroller into the FPGA configuration memory, so that it won't require any additional hardware programmer?

Regards
Rizwan

Yes, that is possible.
The simplest is the "Slave Serial" mode, where you only need one clock and one data line. You just send the *.bit file generated by Diamond serially to the FPGA, either with the microcontroller's SPI peripheral or bit-banged.
For Slave Serial you need to wire the CFGMDn pins to '101'. (see TN1260 document)

Andy
