Loading Binaries to SPI Flash (with loadp2?)
n_ermosh
in Propeller 2
I can't seem to find any documentation on how to load compiled binaries to the SPI flash on the ES boards. I typically use loadp2 to load binaries after compiling them with my LLVM port, but want to figure out how to be able to boot these programs from SPI Flash as well. Does loadp2 support this? Or another app?
Comments
There is a hint in the options list when typing "loadp2" but it's not obvious. Burning the flash uses the multi-load option. So you attach your binary to the flash loader from the board/ subdirectory.
Eg:
loadp2 @0=P2ES_flashloader.bin,@8000+yourprogram.bin
EDIT: There are examples in the readme and the source code in the board/ subdirectory.
```
loadp2 -b 230400 "@0=P2ES_flashloader.bin,@8000+myprogram.binary" -t -k
```
The flashloader is located in loadp2/board/P2ES_flashloader.bin
This is approximately how flexgui does it.
I don't know if there is a flashloader that can program more than about 500kB (more data than fits in hub ram).
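For anyone scripting this, here is a minimal sketch that builds the same argument list as the command quoted above. The helper names `flash_load_args` and `fits_in_hub` are made up for illustration, and the 512 KiB hub RAM figure used for the size check is my assumption about where the "about 500kB" limit comes from. Only the flags and the `@0=...,@8000+...` syntax are taken from the posts above.

```python
import shlex

HUB_RAM_BYTES = 512 * 1024  # assumption: P2 hub RAM size, the likely source of the ~500kB limit


def flash_load_args(loader, program, baud=230400, program_addr=0x8000):
    """Build a loadp2 argument list mirroring the command quoted in this thread.

    Hypothetical helper: it only formats the multi-load string, it does not run loadp2.
    """
    spec = f"@0={loader},@{program_addr:x}+{program}"
    return ["loadp2", "-b", str(baud), spec, "-t", "-k"]


def fits_in_hub(program_size, program_addr=0x8000, hub_bytes=HUB_RAM_BYTES):
    """Rough check that the image still fits in hub RAM once placed at program_addr."""
    return program_addr + program_size <= hub_bytes


if __name__ == "__main__":
    args = flash_load_args("P2ES_flashloader.bin", "myprogram.binary")
    print(shlex.join(args))  # copy-paste friendly command line
```

Passing the list straight to `subprocess.run` avoids the shell quoting question altogether, since the multi-load string stays one argument.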
I modified the flashloader to work better with other flash chips here:
https://forums.parallax.com/discussion/comment/1525737#Comment_1525737
I hope the forum doesn't change my standard quotes to right and left directional quotes.
Thanks for the link James. I was the one who went headlong into optimising P2ES_flashloader.spin2. I did get a tad enthusiastic going with dualSPI operating mode when there are so many variations around.
It's probably time to port the code to Spin2. The split nature of the code, burning and booting, definitely needs some clean up. The low-level SPI routines are written in two completely different ways.
That was easy! Thanks!
Hmm, gonna have to think about this. I'd forgotten that an all-smartpin solution also has its gotchas - https://forums.parallax.com/discussion/comment/1488948/#Comment_1488948
Might be smartpin for rx and bit-bash for tx.
The other approach is to use a streamer for both. That's what Chip did with his version. I had avoided using the streamer early on because all the mode bits got shifted around, and new streamer modes got added, between revA and revB silicon. Err, scratch that. Streamer operation suffers the same as bit-bashing. Chip got away with it because his code operated only at RCFAST sysclock frequency.
I have a similar obsession with efficiency and high speed. But I made the decision to run my stage2 loader at RCFAST.
Yep and it's way simpler not to have to touch the clock config for sure. Chip's solution also was very small, cleanly written and performant but needed a slow sysclock ... as well as not working on revA silicon.
I'm thinking somewhat more general at the moment - https://forums.parallax.com/discussion/173892/simple-spiflash-how-to-use-higher-clock-rates-with-p2-edge/p1 Maybe I should take the pondering over there ...
Somewhat related question with loadp2. What does the "-FIFO" flag do, and how do you determine the proper size? I set it arbitrarily to 1024, since loading always fails with default. However, it still sometimes fails for the same image with that or larger numbers, and I can't figure out how to reliably load images.
Looks like it is meant to be set to match the actual hardware. It's for calculating pause times to allow the hardware buffer (FIFO) to empty out. Presumably the FT chip on the Eval boards has the default 512 byte FIFO.
Interesting--with the eval board, the 512 default doesn't work at all, so likely also OS/host machine dependent. I'm using a Macbook Pro.
Try a slower baud maybe? Any idea what speed you're using now?
EDIT: Also, that -FIFO option is very new. It wasn't there previously. It's possible it hasn't been tested on Macs.
It was tested on my MacBook Pro. I only added in this option in case it needed to be varied from the default 512 byte transmit buffer in the serial device (which is the FT231X on the PropPlugs and P2-EVAL board IIRC). I wonder if any new version of FTDI drivers or MacOS has changed something else in the buffering path affecting the total transmit timing. I still need to run OS X 10.10 here which is pretty old now. If different OS/driver setups need different delays it will be difficult to get right across all of them.
Further to this...I hope that the wait_drain() call still works on newer OS X systems. This calls tcdrain(hSerial) underneath and is used to wait until the OS thinks it has drained characters out of the serial device buffer.
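To make the write-then-drain pattern concrete, here is a minimal Python sketch of the same idea. It is not loadp2's code: `write_and_drain` is a hypothetical helper, and the only real API it relies on is `termios.tcdrain`, which wraps the same POSIX call. The drain function is injectable so the helper can be exercised on a pipe, where a real tcdrain would fail because a pipe is not a tty.

```python
import os
import termios


def write_and_drain(fd, data, drain=termios.tcdrain):
    """Write all of data to fd, then block until the OS reports the output as sent.

    Hypothetical helper for illustration. On a real serial port, `drain` is
    termios.tcdrain, which returns once the OS believes its output queue is empty.
    It says nothing about bytes still sitting in an external USB-serial chip.
    """
    view = memoryview(data)
    while view:
        written = os.write(fd, view)  # may be a partial write
        view = view[written:]
    drain(fd)
```

Whether the drain call really waits for the USB transfer to finish is exactly the open question on newer OS X versions, so treat this as a sketch of the intent only.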
The FIFO argument (or its default) is only used to account for the extra delay in any attached USB->serial device outside the computer itself. In this case it is the buffer in the FT231X chip that in the worst case is assumed to be full after a large burst sent to it from the computer, and needs to drain out to the serial device at the configured loader baud rate, before we should start our timeout waiting for a reply from the P2.
If there is another delay between the serial device buffer and the bytes actually exiting the computer over USB (perhaps even OS scheduling or other driver buffering delays), that would need to be included in the timeout delay interval as well, and is currently not part of the calculation. If there was a definitive API in OS X that told us when the serial data has really been sent we could use that.
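As a rough illustration of the worst-case drain time described above, here is a small sketch. It assumes 8N1 framing (10 bits on the wire per byte) and a completely full FIFO. I haven't checked it against the formula loadp2 actually uses, and `fifo_drain_seconds` is a made-up name.

```python
def fifo_drain_seconds(fifo_bytes, baud, bits_per_byte=10):
    """Worst-case time for a full USB-serial FIFO to empty at the given baud rate.

    Assumption: 8N1 framing, so each byte costs 10 bit times (start + 8 data + stop).
    """
    return fifo_bytes * bits_per_byte / baud


if __name__ == "__main__":
    for fifo in (512, 1024):
        for baud in (230400, 2_000_000):
            ms = fifo_drain_seconds(fifo, baud) * 1000
            print(f"FIFO {fifo} bytes at {baud} baud: {ms:.2f} ms")
```

With the default 512 bytes at 230400 baud this comes to about 22 ms, against about 2.6 ms at 2 Mbps, so any extra host-side delay that is not in the calculation matters more at the slower rates.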
Found a related conversation here...
https://www.linuxquestions.org/questions/programming-9/serial-port-write-wait-for-transmission-complete-with-the-function-tcdrain-4175502564/
Update: By the way @n_ermosh , which of these errors are you seeing? They happen to be in different places...
or
or
I always get a timeout with checksum -1 error.
And for loading baud rate, I’m using 230400
Unfortunately it is difficult to resolve the problem on a different system to my own. If you have an analyser and can capture the serial transactions for TX/RX after reset and download with loadp2 that shows the delays between the different transfers, it might be possible to see what is happening causing the timeouts. I know it took me quite a while to try to pin this stuff down in my own setup here by checking the serial download failures with what I saw on the logic analyzer. If some other delays are being introduced by different driver buffering or differences in OS versions, that could potentially explain this.
I will at least try out the 230400 rate you used specifically to see if it can happen on my own setup here with OS X 10.10. I tend to use 2Mbps but I believe I did try this over the range of baud rates.
You might also try the -SINGLE option to see if it gives you any more luck.
There are a number of timeout errors possible:
ERROR: timeout waiting for initial checksum: got %d
ERROR: timeout waiting for checksum at end: got %d
ERROR: timeout waiting for string [%s]
And some more cryptic shorter pieces.
What is the exact error message?
PS: Yes, trying -SINGLE is a good idea.
Yes I had asked about that too and n_ermosh indicated it is a checksum type of timeout error (so either the first or second one). In my digging I also found this information linked below which sort of implies that the tcdrain call does actually call down into the lower level HW driver queues to find out when the transfer is done. But how far this really goes with USB on Mac and whether the FTDI or Apple's serial driver implements it correctly/consistently I'm not sure.
http://embeddedguruji.blogspot.com/2019/02/serial-port-programming-part-4-tcdrain.html
Also @n_ermosh one thing I forgot to add before: double check your DIP settings are correct and whether you have a micro SD card fitted etc. I've had issues in the past with serial download if they were not set up correctly when I'd left a card in the P2-EVAL.
Sorry, I was away from my computer when I last responded--here's the exact error I get, with P2 Eval board and my Macbook Pro
trying /dev/cu.usbserial-DN43U5CA...
P2 version G found on serial port /dev/cu.usbserial-DN43U5CA
Setting load mode to CHIP
Setting clock_mode to 12427f8
Loading fast loader for chip...
ERROR: timeout waiting for initial checksum: got -1
Try increasing the FIFO setting if not large enough for your setup
DIP settings are correct (all switches off for loading from serial)
You could try to download and build a slightly older version of loadp2 prior to September 5 2021 that didn't have the FIFO timing fix in it to see if it helps resolve your issue. Here's where loadp2 can be obtained directly
https://github.com/totalspectrum/loadp2
Try -SINGLE option. It doesn't call flush_input() after the msleep(). And use -v option too.
I reverted my build of loadp2 to git commit c072514f74b06413254e4c34ebec587f0489ffb0 and it seems to be consistently working now. To make it work with my slow baud rate loading (necessary for the specific thing I'm working on), the -SINGLE option was necessary. What does -SINGLE actually do, what is "single stage loading"?
The -SINGLE option only ever uses the "Prop_Hex" serial command and its data download format to load your code in one single step to the P2.
The -CHIP option instead uses a two stage downloader, where a small first stage helper program is initially loaded into the P2 and run using the Prop_Hex command and this is then followed by your own program sent serially in raw binary format at high speeds. The initially loaded helper program takes over from the ROM downloader and performs its own autobauding and can handle the higher density serial data that is subsequently sent to it by loadp2.
It's also separately coded on the PC side. The two methods have very little in common other than basic comport access, so a bug in the programming of one method may be unique to that method.
What is common is the detection of the prop2 chip. I noted that detection worked fine but the next part, loading the loader, wasn't working.
"ERROR: timeout waiting for initial checksum" occurs when the loader fails to load. It's basically saying there is no response from the prop2. I suspect it is loading but the flush_input() call is erasing the checksum due to a timing problem.
Yeah, the loadp2 code in the -CHIP mode never seemed to check for the successful download of the loader itself, while the -SINGLE version now does (after I just re-enabled the use_checksum flag by default). I don't know why that should need to be the case with -CHIP, but that is just how it was coded. It would be far better to know for sure that the first stage download succeeded before attempting to further communicate with it in -CHIP mode. Maybe the "." character the P2 returns if you sent a "?" instead of "~" was messing up the receive timeouts or something, before we had code that also accounted for the FIFO drain time. I'm not sure.
This loadp2 code has some dependencies on round trip timing, and it relies on wait_drain() to do the right thing for the transactions to work correctly. If wait_drain() doesn't behave as expected, this code won't work. Sprinkling extra sleep times in different places to try to compensate for this only covers up the issue and doesn't really solve it 100%.