Prop2 FPGA files!!! - Updated 2 June 2018 - Final Version 32i

evanh · 2017-10-19 01:45

ozpropdev wrote: »

Problem is as soon as I connect the scope the code load issue disappears.

Yeah, that doesn't surprise me, given the sensitivity you're experiencing. It won't be directly related to what I'm dealing with.

evanh · 2017-10-19 01:51

jmg wrote: »

evanh wrote: »

In the PNut sequence, there is about 270 ms from the normal reset#2 to the problematic reset#3. So, a blanking filter would need to be large.

If pnut does an ID query, why does it even need to generate a 3rd reset ?

The third reset is not intentional. It's a compatibility difference between Windoze and Wine.

There is less lead-in delay after 3rd reset, but still should be enough ?

The reset will be cancelling the sequence too. So, it likely needs to be eliminated entirely.

jmg · 2017-10-19 03:26

evanh wrote: »

jmg wrote: »

evanh wrote: »

In the PNut sequence, there is about 270 ms from the normal reset#2 to the problematic reset#3. So, a blanking filter would need to be large.

If pnut does an ID query, why does it even need to generate a 3rd reset ?

The third reset is not intentional. It's a compatibility difference between Windoze and Wine.

Is pnut doing a close and reopen at that point ?
I recall mention of bonus/bogus handshake line activity when a port is (re) opened, but once open, it should 'behave itself' ?

cgracey · 2017-10-19 05:40

That 10nF cap is there so that a rising edge on DTR creates a RESn pulse. It shouldn't matter how long DTR stays high. It's the positive-edge event that creates the pulse. The falling edge does nothing.

jmg · 2017-10-19 05:43

cgracey wrote: »

That 10nF cap is there so that a rising edge on DTR creates a RESn pulse. It shouldn't matter how long DTR stays high. It's the positive-edge event that creates the pulse. The falling edge does nothing.

For FPGA testing, it should be possible to swap the 10nF with a 47K, and use the falling DTR edge to give a (much) cleaner reset rise signal.
Provided pnut generates a narrow DTR, either circuit should work.

cgracey · 2017-10-19 06:04

Here is a new BeMicro-A9 file that outputs the RX input, the TX output, and the RESn input to pins 31..29:

https://drive.google.com/file/d/0B9NbgkdrupkHRnZTYVhwajM1TFk/view?usp=sharing

This will let you see on P31..P29 what the FPGA is seeing internally throughout the loading sequence.

ErNa · 2017-10-19 07:16

I know two mechanisms that create false signal transitions: "ringing", where the input signal oscillates up and down across the switching level, and, much less often, slow signal slopes, where a non-Schmitt-triggered input "sees" a nearly constant voltage overlaid with noise and so produces a stream of highs and lows.
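To illustrate the second mechanism, here is a minimal sketch that runs a slow ramp with a small ripple through a single-threshold input and through an input with hysteresis, then counts the output edges. The ramp length, ripple amplitude, and threshold values are illustrative assumptions, not measurements from any board in this thread, and the ripple is a deterministic alternating offset standing in for real noise.

```python
def count_transitions(samples, rise_threshold, fall_threshold):
    """Count output edges of a comparator.

    Equal thresholds model a plain input; rise > fall models a
    Schmitt-trigger input with hysteresis.
    """
    state = False
    edges = 0
    for v in samples:
        if not state and v > rise_threshold:
            state = True
            edges += 1
        elif state and v < fall_threshold:
            state = False
            edges += 1
    return edges


# Slow ramp from 0 to 1 (normalised volts) with an alternating ripple
# standing in for noise. All numbers here are assumed for illustration.
N = 1000
RIPPLE = 0.05
samples = [i / (N - 1) + (RIPPLE if i % 2 == 0 else -RIPPLE) for i in range(N)]

plain_edges = count_transitions(samples, 0.5, 0.5)    # single threshold
schmitt_edges = count_transitions(samples, 0.6, 0.4)  # 0.2 of hysteresis

print("plain input edges:  ", plain_edges)
print("schmitt input edges:", schmitt_edges)
```

The plain input toggles on every sample while the ramp sits within one ripple amplitude of the threshold, whereas the hysteresis input switches once, because the ripple is smaller than the gap between its two thresholds.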

Rayman · 2017-10-19 14:30

Upside to only 8 cogs?

Some latencies must be reduced going from 16 to 8 cogs, right?
Was there a list of what timings have changed?

cgracey · 2017-10-19 14:45

All the hub timings have changed, including the CORDIC. I need to document them..

jmg · 2017-10-19 18:40

cgracey wrote: »

That 10nF cap is there so that a rising edge on DTR creates a RESn pulse. It shouldn't matter how long DTR stays high. It's the positive-edge event that creates the pulse. The falling edge does nothing.

How wide are the DTR pulses that pnut generates deliberately ?
Anyone know how wide the unwanted DTR pulses are ?

Tubular · 2017-10-19 19:15

Pnut aims for 25ms DTR pulse.

What I remember from observations on the scope was more like 18ms, but I'm going to go back and confirm this in coming days on a few PCs

cgracey · 2017-10-21 15:20

I've removed the fuses from the Prop2 chip, along with the ROM code to deal with them.

What's left is a leaner ROM and loading protocol that has no SHA-256/HMAC delays.

Here is the future v25 documentation, for which I've updated the 'BOOT PROCESS' section:

https://docs.google.com/document/d/1hLLw4pU1uwOo1mHtF17oB91KNTsZv4zbC6nGSYbJi-o/edit?usp=sharing

Download time for short programs is now in the 50ms range. That's from hitting F11 to seeing your program execute.

For those of you who make custom loaders, I figured you might want a heads-up on the loading-protocol changes, which are minimal, really.

I'm going to tidy up the new ROM code and test it thoroughly, then make a new v25 FPGA-image release over the next few days.

jmg · 2017-10-21 19:10

cgracey wrote: »

Download time for short programs is now in the 50ms range. That's from hitting F11 to seeing your program execute.

What is the time from reset release, to program run ? - let's say one whole COG image, tho a Base ms + us/k formula is better to put in the docs.
ie shortest times, timeouts skipped via pin-pulls, for SPI and Serial loaders.

evanh · 2017-10-21 23:37

The Prop123 FPGA sees about a 150us reset pulse from the onboard reset circuit.

cgracey · 2017-10-22 10:15

I simplified the serial loading protocol...

There are three basic commands which the sender can issue:

1) Request Propeller type:

Prop_Chk <INAmask> <INAdata> <INBmask> <INBdata>

2) Load and execute hex data, with and without sum checking:

Prop_Hex <INAmask> <INAdata> <INBmask> <INBdata> <hexdatabytes> ?
Prop_Hex <INAmask> <INAdata> <INBmask> <INBdata> <hexdatabytes> ~

3) Load and execute Base64 data, with and without sum checking:

Prop_Txt <INAmask> <INAdata> <INBmask> <INBdata> <base64chrs> ?
Prop_Txt <INAmask> <INAdata> <INBmask> <INBdata> <base64chrs> ~

In the case of Prop_Hex and Prop_Txt, ending the data with a "?" will cause one of two things to happen:
1) If the embedded checksum was correct, a "." character is sent back and the program executes.
2) If the checksum was incorrect, a "!" is sent back and a new command is waited for.
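
For anyone writing a custom loader, the command layout above could be assembled roughly as follows. This is only a sketch: the thread does not spell out how the mask and data fields are written, whether Base64 padding is kept, or how the checksum is embedded, so hex fields, standard padded Base64, and the "> " autobaud prefix are all assumptions here, and the function names are hypothetical. No checksum is computed; a caller using "?" would have to embed it in the image itself.

```python
import base64


def build_prop_txt(image, check=False, masks=(0, 0, 0, 0), autobaud="> "):
    """Format a Prop_Txt command line (layout assumed, see lead-in).

    masks holds INAmask, INAdata, INBmask, INBdata, written here as
    plain hex, which is an assumption and not confirmed by the thread.
    """
    fields = " ".join(f"{m:X}" for m in masks)
    payload = base64.b64encode(image).decode("ascii")
    terminator = "?" if check else "~"  # "?" asks for the sum check reply
    return f"{autobaud}Prop_Txt {fields} {payload} {terminator}".encode("ascii")


def interpret_status(reply):
    """Map the one-byte reply to a "?" command onto True/False."""
    if reply == b".":
        return True   # checksum correct, program executes
    if reply == b"!":
        return False  # checksum incorrect, loader waits for a new command
    raise ValueError(f"unexpected loader reply: {reply!r}")


print(build_prop_txt(b"\x00\x01\x02"))
```

A loader would send the returned bytes and, for the "?" form, pass the single reply byte to `interpret_status` before deciding whether to retry.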

ozpropdev · 2017-10-22 10:22

I like the tweaked protocol. Nice and easy!

cgracey · 2017-10-22 12:52

jmg wrote: »

cgracey wrote: »

Download time for short programs is now in the 50ms range. That's from hitting F11 to seeing your program execute.

What is the time from reset release, to program run ? - let's say one whole COG image, tho a Base ms + us/k formula is better to put in the docs.
ie shortest times, timeouts skipped via pin-pulls, for SPI and Serial loaders.

I've got things really flying now.

The flash boot time, from RESn high to load and start executing a 256-instruction cog program, is 11ms. That's with a pull-up on flash CLK to tell the loader to skip serial, which it would have afforded 100ms.

The time to contact, download, and execute a short program from PNut is now 20ms. Sometimes it takes ~30ms because of the time it takes Windows to put up the status message box. That could be decoupled from the comm's, I suppose. Anyway, it's fast. Oh, I got rid of that intermediary mainloader.spin program. Now, I just use Prop_Txt (base64) and run everything at 2M baud. It's way simpler.

Dave Hein · 2017-10-22 15:06

Without mainloader.spin the user programs will now need to set up the clock frequency. So all our programs will need to add a CLKSET if we don't want to run with RCFAST. That's fine, but it does require a minor change to user programs.

Also, for large programs isn't it faster to use mainloader.spin to load the program? Base64 requires sending 33% more bytes.

And one more question on the changes to the loading protocol. Did you only add the "?" commands, or did the other commands change also?

evanh · 2017-10-22 16:26

Dave Hein wrote: »

Also, for large programs isn't it faster to use mainloader.spin to load the program? Base64 requires sending 33% more bytes.

At 2 Mbit/s that's 150 kByte/s instead of 200 kByte/s. That's no biggie. Given the savings in overheads it'll take a decent sized program before that becomes a noticeable factor.

Actually, this gives another impetus for larger systems to implement another method of store for bulk data, eg: QuadSPI.

Dave Hein · 2017-10-22 17:05

A 512K program will take 3.5 seconds to load with base64 versus 2.6 seconds with raw binary, but like you said it's no biggie. Anyhow, serial download is normally used just during development, and the program would be burned into flash for production.
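
Those figures can be reproduced with a quick calculation. The sketch below assumes 8N1 framing (10 bits on the wire per byte) and ignores any protocol overhead beyond the 4/3 expansion that Base64 imposes; the function name is just for illustration.

```python
BITS_PER_BYTE_ON_WIRE = 10  # start + 8 data + stop, assuming 8N1 framing


def load_seconds(program_bytes, baud, encoding_overhead=1.0):
    """Time to push program_bytes through a UART at the given baud rate."""
    return program_bytes * encoding_overhead * BITS_PER_BYTE_ON_WIRE / baud


size = 512 * 1024
raw = load_seconds(size, 2_000_000)            # raw binary
b64 = load_seconds(size, 2_000_000, 4 / 3)     # Base64 sends 4 chars per 3 bytes

print(f"raw binary: {raw:.1f} s, base64: {b64:.1f} s")
```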

Using mainloader.spin really isn't that complicated. I'll probably keep it in loadp2. However, I'll have to change loadp2 to not set up the user clock frequency to make it work the same as PNut, and then I'll have to change my test programs so they all do a setclk at the beginning instead of relying on mainloader.spin to do it.

evanh · 2017-10-22 17:09

Oh, yeah, doh! I was thinking that was also the 1-bit SPI speed limit for some reason.

jmg · 2017-10-22 19:11

cgracey wrote: »

Oh, I got rid of that intermediary mainloader.spin program. Now, I just use Prop_Txt (base64) and run everything at 2M baud. It's way simpler.

The Autobaud is still there tho ? Was “>” character ($3E) ?

cgracey wrote: »

I simplified the serial loading protocol...
There are three basic commands which the sender can issue:

1) Request Propeller type:
Prop_Chk <INAmask> <INAdata> <INBmask> <INBdata>

For small MCU cases, can there be a simplified version of this 'Is anyone there?' ?

With multiple reset-exits going on in P2+MCU cases, some means of fast polling is needed to know soonest time when both parts are 'ready'

It could poll a repeating eg '>Prop_Chk', or even '>P?' - the shorter the better, with no masks, & reply echoes the Prop type.
Systems with many P2 will be relatively rare, and smaller kernel code in the MCU means more room for P2 code image.

ozpropdev · 2017-10-23 00:05

Maybe a "Prop_Max >" command could be added to switch the serial loader to "CLKSET #$FF"
Then with autobaud "> " we can use faster bridges like FT2232H @ 12Mbits if more speed is required.

jmg · 2017-10-23 00:14

ozpropdev wrote: »

Maybe a "Prop_Max >" command could be added to switch the serial loader to "CLKSET #$FF"
Then with autobaud "> " we can use faster bridges like FT2232H @ 12Mbits if more speed is required.

Interesting idea, but I think the P2 design cannot currently link the VCO to the RC-FAST as a reference - if it could do that, higher Boot-ROM speeds might be possible ?

Perhaps the test chip(s) can be used to check that PLL from RC-FAST is workable ?

Anything in the loader, should work in the final silicon, and I'd be wary of any command that was FPGA-host specific.

ozpropdev · 2017-10-23 00:37

jmg wrote: »

Interesting idea, but I think the P2 design cannot currently link the VCO to the RC-FAST as a reference - if it could do that, higher Boot-ROM speeds might be possible ?

Perhaps the test chip(s) can be used to check that PLL from RC-FAST is workable ?

Anything in the loader, should work in the final silicon, and I'd be wary of any command that was FPGA-host specific.

The "Prop_Max >" scheme would only apply to P2 systems with xtal based clock source.
Like the P1 you wouldn't switch to a xtal/pll mode if the xtal wasn't there.

jmg · 2017-10-23 00:44

ozpropdev wrote: »

The "Prop_Max >" scheme would only apply to P2 systems with xtal based clock source.
Like the P1 you wouldn't switch to a xtal/pll mode if the xtal wasn't there.

ok, that could be useful, but then you would need enough following parameters to load into the PLL registers. A wide range of Xin values are valid on P2.

IIRC the P2 PLL/VCO is now like most, with a SysCLK divider, and a VCO_FB_Divider, and Xtal_FB_Divider to the common PFD frequency.
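
As a rough sketch of the divider arrangement described above (a generic PLL relationship, not something confirmed for the P2 in this thread): the crystal is divided down to the PFD frequency, the VCO feedback divider multiplies that up, and the SysCLK divider scales the result. The function name, argument order, and the example values are all hypothetical.

```python
def pll_sysclk_hz(xin_hz, sys_div, vco_div, xtal_div):
    """System clock for a generic PLL with three integer dividers.

    PFD frequency = Xin / Xtal_Div
    VCO frequency = PFD * VCO_Div
    SysCLK        = VCO / Sys_Div
    """
    pfd_hz = xin_hz / xtal_div
    vco_hz = pfd_hz * vco_div
    return vco_hz / sys_div


# Hypothetical example: 20 MHz crystal, divided by 2, multiplied by 16.
print(pll_sysclk_hz(20_000_000, sys_div=1, vco_div=16, xtal_div=2))
```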

Command then looks something like
">Prop_PLL Sys_Div VCO_Div Xtal_Div" + some pause for PLL lock, and host Baud-redefine, and then '>' at the new higher Baud rate.

Addit: Using this, a simple means to boost boot from a fast-UART part like EFM8UB3 becomes available

With the available ~ 32kBytes of P2 code storage in the UB3, that's 5.4~4ms loading times, at 6~8MBaud that part should be capable of.
(plus other hard-wired delays inside P2, hopefully, those are not too great...)

cgracey wrote: »

The flash boot time, from RESn high to load and start executing a 256-instruction cog program, is 11ms. That's with a pull-up on flash CLK to tell the loader to skip serial, which it would have afforded 100ms.

What are the new boot-delays, for a small MCU as the loader - from reset release to first char read ?

cgracey · 2017-10-23 05:08

Engaging a crystal and the PLL is maybe too much for the serial loader, as the FPGA and real silicon have different circuitry. The serial loader needs to work from the common base.

cgracey · 2017-10-23 05:11

For a small MCU as loader, serial commands can begin 15ms after RESn release. A small download can complete in 2ms after that.
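
Putting the timings from this thread together, a host-side or MCU-side loader might sequence things roughly like this. It is a sketch under stated assumptions: the 25 ms DTR pulse is the figure Tubular quoted for PNut, the 15 ms wait is the figure above, the command text format is assumed, and `port` is any object with a `dtr` attribute and a `write()` method (a hypothetical interface; which logical DTR state gives the rising edge depends on the adapter). `RecordingPort` is a hardware-free stand-in so the example runs anywhere.

```python
import time

DTR_PULSE_S = 0.025        # PNut aims for a 25 ms DTR pulse (per Tubular)
POST_RESET_WAIT_S = 0.015  # commands can begin 15 ms after RESn release


def reset_and_send(port, command, sleep=time.sleep):
    """Pulse DTR, wait out the post-reset delay, then send one command."""
    port.dtr = True            # the DTR edge is what produces the RESn pulse
    sleep(DTR_PULSE_S)
    port.dtr = False
    sleep(POST_RESET_WAIT_S)   # conservative: measured from the end of the DTR pulse
    port.write(command)


class RecordingPort:
    """Stand-in port that logs what a real serial port would be asked to do."""

    def __init__(self):
        self.log = []
        self._dtr = False

    @property
    def dtr(self):
        return self._dtr

    @dtr.setter
    def dtr(self, value):
        self._dtr = value
        self.log.append(("dtr", value))

    def write(self, data):
        self.log.append(("write", bytes(data)))


demo = RecordingPort()
reset_and_send(demo, b"> Prop_Chk 0 0 0 0",
               sleep=lambda s: demo.log.append(("sleep", s)))
print(demo.log)
```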

jmg · 2017-10-23 05:26

cgracey wrote: »

For a small MCU as loader, serial commands can begin 15ms after RESn release. A small download can complete in 2ms after that.

That 15ms seems quite long, given it can load from flash & run in 11ms ?
What causes the 15ms, and can that be shortened ? RCFAST should have a very short start time.

cgracey wrote: »

Engaging a crystal and the PLL is maybe too much for the serial loader, as the FPGA and real silicon have different circuitry. The serial loader needs to work from the common base.

Of course, it would not be the default.
It does provide an appealing way to simplify/speed-up MCU loading, as they can 'change-gears' without needing to administer and load some faster stub first.
Since P2 lacks any Xtal fail detect, this also gives a nice way the MCU can check P2 Xtal, at boot-up.
The MCU would know what Xtal was used, and what was system-legal.

With external loading mandated on P2, anything that can speed that Reset->Load up, makes it more tolerable to users.

ozpropdev · 2017-10-23 05:37

Hi Chip

I've been looking into the weird Pnut loading issue which appeared on the BeMicro-A9 V24 image.

The issue seems to be caused by a combination of a routing difference with Quartus and the serial comms routines in ROM.

My initial focus was on the reset line but nothing really stood out as strange there.

I was able to capture pin activity on reset and Rx(#63) Ok but as soon as I tried to probe the Tx(#62) pin the issue would disappear.

With a probe connected to Tx and a successful load from Pnut I detected a glitch and slow rise at the end of each data packet.

This glitch can be seen at the end of the "Prop_Ver" string's last stop bit.
The same glitch can be seen when the status "." or "!" byte is transmitted too.

It appears that the glitch is caused by the pin being floated at the end of data packets.

This seems to explain why the identify hardware step works: Pnut is looking for a "Prop_Ver" string, so the last character "lf" is ignored anyway.

Because the status response is a single byte "." or "!" and it's being corrupted the "lost" error is triggered.

I assume this might be linked to the half duplex serial mode?
This also seems to explain why my own loader and Dave's loader work Ok.

Edit: Fixed typo pin numbers
