Python P2 Loader
ozpropdev
Here's a starting point for a possible Python P2 tool chain.
It converts an object file to the P2 base64 download protocol and loads at 3 Mbaud.
It's bare bones at present, but things like multi-P2 support can be added as we move forward.
Give it a try!
Sample output
Python P2 loader V0.1 Ozpropdev 2018
Prop_Chk = Prop_Ver E
HUBSET Ok, max clock speed enabled
Loading.......complete
Update: 14th Jan 2019 - Version 1.2 with P2 auto detect and better command line usage.
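As a rough illustration of what "converts an object file to the P2 base64 download protocol" involves, here is a minimal sketch of building the text stream. It is not the loader's actual code: `build_download_stream` is a hypothetical helper, the `> Prop_Txt 0 0 0 0 ` preamble is an assumption about the ROM's text-load command, and the ` ~` terminator is the one mentioned further down the thread. Opening the port and sending are left out.

```python
import base64


def build_download_stream(image: bytes) -> bytes:
    """Wrap a binary image in the (assumed) P2 text-download framing."""
    preamble = b"> Prop_Txt 0 0 0 0 "   # assumed ROM command, not taken from the post
    body = base64.b64encode(image).rstrip(b"=")  # trailing '=' padding stripped
    postamble = b" ~"                   # terminator discussed later in the thread
    return preamble + body + postamble


if __name__ == "__main__":
    # Three bytes that encode to 'UUUU', the pattern used in the tests below.
    print(build_download_stream(bytes([0x51, 0x45, 0x14])))
```

The resulting bytes would then be written to the serial port in one go, for example with pySerial's `Serial.write()`.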
Comments
Can I suggest adding this line
That then gives a useful message (Unexpected RX= b'> Prop_Chk 0 0 0 0 ') when RX and TX are looped back, as a com-link test.
Useful for troubleshooting.
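The suggested line itself is not reproduced here, but the idea can be sketched as a small classifier for whatever comes back after sending the check command. `classify_reply` is a hypothetical helper, and the assumption that a real P2 answers with text containing `Prop_Ver` is taken from the sample output above.

```python
def classify_reply(rx: bytes) -> str:
    """Guess what is on the other end of the link from the bytes received."""
    if rx.startswith(b"> Prop_Chk"):
        # Our own command came straight back: RX and TX are looped together.
        return "loopback"
    if b"Prop_Ver" in rx:
        # Looks like a version reply from the chip (assumed reply format).
        return "p2"
    return "unknown"


if __name__ == "__main__":
    for sample in (b"> Prop_Chk 0 0 0 0 ", b"\r\nProp_Ver E\r\n", b""):
        print(sample, "->", classify_reply(sample))
```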
I think there was some ROM command to change sysclk? Is it possible to use that to bump the baud to 12 MBd?
I will check it again next time I fire it up.
Ahh, yes. How does that react in a genuine P2 ?
If it is FTDI, probably will not go > 3Mbd.
Do you have a CP2102N board? (Or an FT2232H or FT232H.) Note: it needs to be a CP2102N, not the older CP2102.
I was idly testing your code on a CP2102N and whacked in 12 MBd, expecting an error report... none came.
So I assumed it had rounded to the nearest legal baud rate (4 MBd?)... nope, the scope says it is sending at 12 MBd (but with some gaps, of course, as the USB cannot sustain that).
I have tested the CP2102N in the past, but usually in full-duplex loopback, which is a tougher test than downloader use. There, 4 MBd duplex was the upper limit, needing HW handshake to never drop chars.
So I tried some other test points in the Python code.
This confirms the CP2102N has a virtual baud clock of 24MHz/N, and it can even make a decent fist of sending at 24 MBd (but has no hope of RX at that speed).
(It could maybe be used to confirm P2 serial Rx at 24MHz, or to generate short test pulses in ~42ns steps, for smart pin testing.)
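The 24MHz/N observation can be turned into a quick table of achievable rates. This is only a sketch of the arithmetic: the 24 MHz figure comes from the scope measurements above, while the idea that the chip rounds to the nearest integer divider is an assumption, and `nearest_divided_baud` is a hypothetical name.

```python
CP2102N_BAUD_CLOCK_HZ = 24_000_000  # virtual baud clock reported above


def nearest_divided_baud(requested):
    """Return (N, actual_baud) for the integer divider closest to the request."""
    n = max(1, round(CP2102N_BAUD_CLOCK_HZ / requested))  # assumed rounding rule
    return n, CP2102N_BAUD_CLOCK_HZ / n


if __name__ == "__main__":
    for req in (3e6, 4e6, 4.8e6, 6e6, 8e6, 12e6, 24e6):
        n, actual = nearest_divided_baud(req)
        print(f"requested {req / 1e6:5.1f} MBd -> N={n:2d}, actual {actual / 1e6:.3f} MBd")
    # One bit time at 24 MBd is the shortest pulse step, about 41.7 ns.
    print(f"bit time at 24 MBd: {1e9 / CP2102N_BAUD_CLOCK_HZ:.1f} ns")
```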
At 12 MBd, Rx also struggles, but at 8 MBd and below, short RX packets look to be OK, i.e. 60 chars at 6 MBd looks fine (loopback).
My earlier tests showed a limit under 4 MBd for full-duplex loopback, but P2 download is not duplex, or even close, so things can be pushed much faster on a one-way stream with short acks.
This sends a single, very large string (via sp1.write(b'U'*2000000)), but it shows Python is no slouch, and that the CP2102N appears not to drop any chars when transmitting a 2M block, even at 6/8/12/24 MBd.
edit: added tests for 3M, 4M, 4.8MBd, as there may be some cases where gap-less send (no added stop bits) from PC to MCU is useful.
These tests show 4.8M almost manages gap-less sending, with a 0.28% slow-down; 4.8M with 2 stop bits predicts a 4.3636363 MBd average, but comes up ~300ppm slow.
2 stop bits plus PARITY_MARK is OK: it gives the exact same byte rate as 4 MBd, but does buy added stop-bit margin.
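The predicted averages follow from the frame length: a plain 8N1 character is 10 bit times, and every extra stop or parity bit stretches it. A small sketch of that arithmetic (`effective_baud` is a hypothetical helper, not part of the loader):

```python
def effective_baud(baud, data_bits=8, parity_bits=0, stop_bits=1):
    """Equivalent 8N1 baud rate that would give the same byte rate."""
    frame_bits = 1 + data_bits + parity_bits + stop_bits  # 1 start bit
    return baud * 10 / frame_bits


if __name__ == "__main__":
    # 4.8 MBd with 2 stop bits: 11-bit frames
    print(effective_baud(4.8e6, stop_bits=2))
    # 4.8 MBd with mark parity and 2 stop bits: 12-bit frames
    print(effective_baud(4.8e6, parity_bits=1, stop_bits=2))
```

With 11-bit frames the 4.8 MBd setting works out to about 4.3636 MBd, and with 12-bit frames to 4.0 MBd, which matches the figures quoted above.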
3M and 4M look to be gap-less, at ~+200ppm on the requested baud (the CP2102N locks to the USB frame, so it does not give the low-ppm baud errors that crystal-based UARTs do).
For P2 loader use (half duplex), CP2102N settings of 4.8M, 6M and 8M certainly look worth trying; they should get close to ~1 sec for a full P2 image in Base64?
Requires python-serial package in Ubuntu - may be named differently in other distros.
Did you test this on a real-code download, and did it run OK?
I've been playing about speeding the code up, and found some puzzles ... I have it ~2x faster, and think ~4x faster is possible..
Puzzles:
Your code seems to send one less char than predicted?
It also does not quite base64 encode how I anticipated.
e.g. I find 0x14 maps onto U, so for my tests I want to send 'UUUU', which I make to be
0x14+(0x14<<6)+(0x14<<12)+(0x14<<18) = 0x00514514 dropped into 24 bits, should send 'UUUU'
- but your code needs this fill pattern to send 'UUUUUUUUUUUUUUU' (note 15 U, where 16 U were expected?)
buff1 = [0x51,0x45,0x14]*4
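The 'UUUU' arithmetic can be cross-checked against Python's standard `base64` module. This is just a verification sketch of the packing described above, not the loader's encoder:

```python
import base64

sextet = 0x14  # index 20 in the standard base64 alphabet, i.e. 'U'
word = sextet | (sextet << 6) | (sextet << 12) | (sextet << 18)
packed = word.to_bytes(3, "big")           # the 24-bit group as three bytes

one_group = base64.b64encode(packed)       # one 3-byte group -> 4 chars
four_groups = base64.b64encode(packed * 4)  # same as [0x51,0x45,0x14]*4

print(hex(word), packed.hex(), one_group, len(four_groups))
```

Twelve input bytes are exactly four 24-bit groups, so a standard encoder produces 16 characters with no padding, which is why 15 looked one short.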
Downloaded code runs Ok.
Just downloaded a 450K image with no issues.
None of the code I tested is affected by the dropped bits at the end of the stream.
I need to add a check at the end to empty the bit buffer.
Good pick up jmg.
Ah, thanks, I went back to the example Chip gave, reshuffled my packing, and now I have my faster code below sending
buff1 = [0xFB,0xF7,0x23,0xF6,0xFD,0xFB,0x23,0xF6,0x25,0x26,0x80,0xFF,0x28,0x80,0x66,0xFD,0xF0,0xFF,0x9F,0xFD]
# Want : +/cj9v37I/YlJoD/KIBm/fD/n/0
# Get : SendStr +/cj9v37I/YlJoD/KIBm/fD/n/0 Len() 27 = OK, takes 2 remainder branch
Your original code does this
# Get : SendStr +/cj9v37I/YlJoD/KIBm/fD/n/ Len() 26 - Missing '0' ?
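The missing final character is what a bit-buffer encoder produces if it never flushes the leftover bits. The sketch below is not the original loader code; `encode_unpadded` is a hypothetical hand-rolled encoder whose `flush` flag reproduces both results for the 20-byte test buffer.

```python
B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def encode_unpadded(data: bytes, flush: bool = True) -> str:
    """Base64-encode without '=' padding, using a simple bit accumulator."""
    out = []
    acc = 0     # bit accumulator
    nbits = 0   # number of valid bits currently held in acc
    for byte in data:
        acc = (acc << 8) | byte
        nbits += 8
        while nbits >= 6:
            nbits -= 6
            out.append(B64[(acc >> nbits) & 0x3F])
        acc &= (1 << nbits) - 1  # keep only the leftover bits
    if flush and nbits:
        # Left-align the 2 or 4 leftover bits in a final 6-bit symbol.
        out.append(B64[(acc << (6 - nbits)) & 0x3F])
    return "".join(out)


buff1 = bytes([0xFB, 0xF7, 0x23, 0xF6, 0xFD, 0xFB, 0x23, 0xF6, 0x25, 0x26,
               0x80, 0xFF, 0x28, 0x80, 0x66, 0xFD, 0xF0, 0xFF, 0x9F, 0xFD])

if __name__ == "__main__":
    print(encode_unpadded(buff1), len(encode_unpadded(buff1)))
    print(encode_unpadded(buff1, flush=False), len(encode_unpadded(buff1, flush=False)))
```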
Speed tests - the original was a little slow in the Base64 prep area...
# Len: 699051 Time to construct s: 1.088480
# Len: 699051 Time to construct s: 0.999540
# Len: 699051 Time to construct s: 0.988162
The improved code below is now ~4x faster:
# Fixed packing of build_txt_R3W4()
# Len: 699056 Time to construct s: 0.252640
# Len: 699056 Time to construct s: 0.248172
# Len: 699056 Time to construct s: 0.256134
Expected USB-UART transport times (add to the prep time above):
CP2102N 6~8MBd appx 699056/(5.5M/10) = 1.271s
FT232H 12Mbd appx 699056/(12M/10) = 0.582s
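These estimates are simply characters divided by characters per second, at 10 bit times per character. A sketch of the same arithmetic, where `transfer_seconds` and `unpadded_b64_len` are hypothetical helper names and the 5.5M sustained figure for the CP2102N is the estimate used above:

```python
def unpadded_b64_len(n_bytes: int) -> int:
    """Number of base64 characters for n_bytes, without '=' padding."""
    return -(-n_bytes * 8 // 6)  # ceiling division


def transfer_seconds(n_chars: int, sustained_baud: float, bits_per_char: int = 10) -> float:
    """Time on the wire for n_chars at a given sustained baud rate."""
    return n_chars / (sustained_baud / bits_per_char)


if __name__ == "__main__":
    print(unpadded_b64_len(512 * 1024))  # full 512 KiB image
    print(f"CP2102N: {transfer_seconds(699056, 5.5e6):.3f} s")
    print(f"FT232H:  {transfer_seconds(699056, 12e6):.3f} s")
```

The stream length of 699056 is taken as reported in the timing runs above.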
Can you confirm that works on some real downloads ?
Just had to add pre/postamble to stream.
Good to hear
With modest tuning, this Python is looking quite useful. A final device should manage the 12MBd download, fingers crossed ?
Just checked the code with FT232H,
If someone was doing production programming, there could be a case for allowing send of a pre-prepared Base64 file, to save (most of) that prep time.
In your code above you have:
When I do this in JavaScript I get something slightly different:
Outputs:
https://forums.parallax.com/discussion/comment/1384233/#Comment_1384233
and stopped when I had equality.
As a cross check, 20*8/6 = 26.666, so it should 'fit into' 27 chars.
This padding may or may not be required depending on the variant of the base64 standard one is using.
See here: https://en.wikipedia.org/wiki/Base64#Variants_summary_table
The padding is mandatory for Base64 transfer encoding for MIME (RFC 2045). As used for encoding binary data, as we do here.
This sounds like a bug.
The Python base64 library agrees:
Outputs:
Though if it does not actually conform to the base64 standard we should not call it that.
I'm betting the Prop ROM does not care about that padding and that it will accept output from JavaScript or Python base64 encoders or others.
Have you actually tried using Python's base64 module?
Actually I was wondering why you hand coded that encoder. I would have expected the built in one to be much faster.
But, I suppose the " ~" at the end makes it so you don't need it.
I searched and found how to do this in Visual Studio:
The ROM seems to exit on first not=base64 char, and I guess extra '=' are tolerated.
Easy enough to try, but it also looks easy enough (see below) to strip any trailing '=' at the host side.
I think it gives either none, or '=' or '==' as appended packer ?
No, I was partly assuming the hand coding I started from was required because the P2 was different enough.
Checking, this seems to remove any trailing '=' packers, and yes, the built-in one is significantly faster: it slashes prep time from ~230ms to ~10ms.
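For reference, a minimal sketch of the built-in approach described here, using the standard library's `base64.b64encode` and stripping the padding on the host side. `p2_base64` is a hypothetical wrapper name; the test buffer is the 20-byte example from earlier in the thread.

```python
import base64


def p2_base64(data: bytes) -> bytes:
    """Standard base64 with any trailing '=' padding removed."""
    return base64.b64encode(data).rstrip(b"=")


buff1 = bytes([0xFB, 0xF7, 0x23, 0xF6, 0xFD, 0xFB, 0x23, 0xF6, 0x25, 0x26,
               0x80, 0xFF, 0x28, 0x80, 0x66, 0xFD, 0xF0, 0xFF, 0x9F, 0xFD])

if __name__ == "__main__":
    print(base64.b64encode(buff1))  # padded form, ends with '='
    print(p2_base64(buff1), len(p2_base64(buff1)))
```

Stripping is safe for the decoder side here because '=' is not part of the 64-character data alphabet, so it can only ever appear as trailing padding.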
https://forums.parallax.com/discussion/comment/1391790/#Comment_1391790
Not sure if this was implemented or not, but that might be an issue.
Mike
'Standard' base64 looks to use a 65 char-set, with the '=' as a packer. The P2 ROM, I think, exits base64 decode if it sees '=', but you can also use .rstrip(b'='), as above, on the host side.
The latest P2 Autobaud revisions of '> ' use an advanced char edge ratio test, (tFF,tH) via the smart pins, which are almost as precise as 'U', but can be trapped in a mid-stream capture.
ie if you send a continually repeating
"> Prop_Chk 0 0 0 0 "+"> Prop_Chk 0 0 0 0 "+"> Prop_Chk 0 0 0 0 "+"> Prop_Chk 0 0 0 0 "...
that can Exit-reset and Autobaud fine, no matter what phase the reset release has.
Other MCUs cannot quite manage that ....
“CP210x devices can generally provide a higher throughput than the CP2110. The CP2110 is limited to 1 x 64 byte packet per 1 millisecond, while the CP210x will generally have multiple 64 byte packets per 1 millisecond.”
Maybe this is an issue
Speed is always an issue
64 bytes/ms is ~64k Bytes/s, whilst the CP2102N here is hitting ~550k Bytes/s with Python (FT232H hits 1.2MB/s)
- but you are right, HID is driver-less and could appeal to some, even at the lower speeds, so a HID variant certainly sounds possible/useful.
Google finds some Python HID hits - has anyone done HID with Python?
Addit: Found time to do some Rx tests, with an FT2232H 'standing in' for a P2 and sending high-speed UART data to a CP2102N. TX was covered above, and looks fine for 2M block sends at ~550k bytes/sec stream speeds.
RX is tougher, and here it looks like 2 stop bits allow moderate packets from the P2, and some simple pause code should save needing HW handshake lines.
Can you test/merge this change, from above ?
If that works ok, try one with the .rstrip(b'=') removed - that then leaves the 0/1/2 packer '=' appended.
P2 seems to ignore the '=' character.
Eezy peezy
That brings total download times for a full 512k image to the P2 down to this ballpark:
CP2102N 6~8MBd appx 699056/(5.5M/10) = 1.271s + 10ms
FT232H 12Mbd appx 699056/(12M/10) = 0.582s + 10ms
(TBD is the actual baud a P2 chip can support when switched to a faster sysclk.)
FWIR ~2 MBd was spec'd at 20MHz min, so 80M (FPGA) should manage ~8 MBd and > 120MHz should support 12 MBd downloads.
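The scaling behind those numbers is a fixed number of system clocks per bit. A sketch under that assumption (the 2 MBd at 20 MHz figure is recalled from memory above, and linear scaling with sysclk is itself an assumption, not a measured result; `max_baud` is a hypothetical name):

```python
CLOCKS_PER_BIT = 20e6 / 2e6  # 10 sysclks per bit, from "~2 MBd at 20MHz"


def max_baud(sysclk_hz: float) -> float:
    """Estimated top download baud if the limit scales linearly with sysclk."""
    return sysclk_hz / CLOCKS_PER_BIT


if __name__ == "__main__":
    for mhz in (20, 80, 120):
        print(f"{mhz} MHz sysclk -> ~{max_baud(mhz * 1e6) / 1e6:.0f} MBd")
```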
Do you have an FT232H/FT2232H there, so you can test at 6M/8M/12M on the 80M FPGA setup?
And a reference comparison, spotted over in AVR forums, is this comment from someone testing their
MplabX 5.05, and the ATMEL X-plained Mini 328p
"..was disappointed to see write speeds for about a minute for a 20k plus file size. That compares to about 50 seconds for Atmel Studio 7"
hmm... their 50~60s for 20k, is going to make P2 download look very snappy indeed, to Atmel users
Has anyone had a chance to try this on silicon (assuming you've received it)?
I tried it last night for myself with no luck, with and without the incorporated changes above.
It seems to detect the P2 just fine, but the loading stage fails "successfully," i.e., it ends with "Loading.......complete" but the code doesn't actually seem to run.
Though I'm not very familiar with Python, so I had to guess at how some of the changes were supposed to be incorporated.
Attached was the last I tried, which was to use the "built-in" base64 libs.
Also attached was the code I tried on the P2 - slightly modified copy of the all_cogs_blink.spin2. This works fine loaded from PNut under wine. Just note it's been transformed with unix2dos - otherwise PNut showed all of the code on one long line.
Cheers,
Jesse
This version loads the P2 eval board Ok.
In my own P2 code I send my own "ACK" back to my loader to prove load success.
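On the host side, waiting for such an acknowledgement could look something like the sketch below. Everything here is an assumption for illustration: `wait_for_ack` is a hypothetical helper, the `b"ACK"` token is whatever the downloaded program chooses to send, and `port` is any object with a `read(n)` method that returns bytes (a pySerial port opened with a short timeout would fit; `io.BytesIO` stands in for it in the demo).

```python
import io
import time


def wait_for_ack(port, token=b"ACK", timeout_s=1.0):
    """Return True if token shows up in the incoming bytes before the timeout."""
    deadline = time.monotonic() + timeout_s
    window = b""  # sliding window of the most recent len(token) bytes
    while time.monotonic() < deadline:
        chunk = port.read(1)
        if not chunk:
            time.sleep(0.001)  # nothing available yet, avoid a hard spin
            continue
        window = (window + chunk)[-len(token):]
        if window == token:
            return True
    return False


if __name__ == "__main__":
    print(wait_for_ack(io.BytesIO(b"boot noise...ACK")))
    print(wait_for_ack(io.BytesIO(b"no reply here"), timeout_s=0.05))
```

A check like this would catch the case reported above, where loading ends with "Loading.......complete" but the code never actually runs.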