Where is Spud?

potatohead · 2016-07-16 17:24

Hey guys! I've been reading along. Itching to jump back into P2 land, but I've been in P1 land for the last couple months.

...at times, wishing it was a P2, but that will come, and I can wait, and P1 is plenty. Easy too. We all know that, and it's awful fun to have those, "but you can't do that on...." moments. Then they see a little block of code doing spiffy stuff and ask for the datasheet.

Total win.

Finally, after a long time doing this, I ended up with a work related project I can tackle. All that stuff we've done here together really pays off! Thanks guys. I know stuff. I think, for this project, I know enough stuff too. Should that not be true, you know I'll ask.

But, I lied. These last 4 weeks, I've not read anything. Where are we at? Is the FPGA image sorted? Are we in full test mode?

Tomorrow, I will seek to catch up and update my FPGA. See if my stuff still runs.

Any thread pointers to "stuff Spud should know, because Spud was gone for a while...." info will be appreciated!

Hope all of you are happy, healthy, and having a lot of fun!

cgracey · 2016-07-16 17:58

I'm wrapping up the ROM code now. It just needs a little optimization.

The ROM first checks for a serial connection, then it checks for an image in SPI flash. It loads whatever is available from serial or flash, and then authenticates the loaded $1F8-long program with its 8-long HMAC signature using the first 128 fuses as a key. If the loader authenticates, it is executed. If not, the clock is slowed to 20KHz and cog0 stops, in order to go comatose until another reset.
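A minimal sketch of the authentication step described above, written in Python for illustration only. The post does not name the hash, so HMAC-SHA256 is an assumption here (its 32-byte output happens to match the 8-long signature). The layout (signature appended directly after the program) and the names `authenticate` and `sign` are also hypothetical. Note that $1F8 + 8 longs comes to $200 longs in total.

```python
import hashlib
import hmac

PROGRAM_LONGS = 0x1F8    # loader size in 32-bit longs, from the post
SIGNATURE_LONGS = 8      # 8 longs = 32 bytes


def sign(program: bytes, key: bytes) -> bytes:
    """Append an HMAC signature to the program (SHA-256 is an assumption)."""
    return program + hmac.new(key, program, hashlib.sha256).digest()


def authenticate(image: bytes, key: bytes) -> bool:
    """Return True if the trailing 32-byte signature matches the program."""
    program_len = PROGRAM_LONGS * 4
    if len(image) != program_len + SIGNATURE_LONGS * 4:
        return False
    program, signature = image[:program_len], image[program_len:]
    expected = hmac.new(key, program, hashlib.sha256).digest()
    # compare_digest avoids leaking how many leading bytes matched
    return hmac.compare_digest(expected, signature)
```

Here `key` stands in for the 128 fuse bits packed into 16 bytes; how the real ROM reads and orders them is not stated in the thread.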

I'll make loaders which can load RAM (as we currently have) or SPI flash.

I decided NOT to put a monitor program into ROM. The reason is that the ROM is now only practically accessible on boot-up, and not mapped into hub memory, so a monitor wouldn't be usable at any time other than start-up. So, it wouldn't be otherwise available as an instantiable asset. Of course, a monitor could be used to load code from a generic terminal program, but I wonder if there'd be much interest in making tools that don't use the quick and tidy loader protocol. Well, what do you guys think about this? Maybe I'm a little punch drunk and the idea of ONE WAY seems a respite.

potatohead · 2016-07-16 18:15

Without a RAM protect, a resident monitor does not make sense.

That said, I would use the "squirt it in" mode we had in the Hot design. I liked that one could bootstrap onto the chip from anything. But, what is worth what?

If we set that aside, a nice set of tools could be the first thing someone loads.

At some point, I would love to see a loadable assembler, editor, monitor, maybe.... small SPIN compiler. Just the core keywords, no special things, use in-line assembly for those.

That mini-environment can be "finished" and used very long term, just like P1 SPIN + PASM is. Call it the baseline, "works no matter what" target environment.

Everything else is bigger, changes, whatever.

my .02

cgracey · 2016-07-16 19:33

Yes, I think maybe if authentication fails and no fuses are set, we could have a squirt-in loader. From the perspective of the PC/host:

1) must be >1 second after Prop2 reset
2) send space (Prop2 recalibrates serial timing from each space/$20 received)
3) send non-hex/whitespace chr to reset loader (white space = space/tab/cr/lf)
4) send hex value to set load length (ie "FFFC0")
5) send data bytes (ie "12 34 56 78 9A B C D E F 0 a b cc dd ee ff...")
6) send 16-bit checksum from data bytes (ie "abcd")
7) if checksum okay, Prop2 does "coginit #0,#0", else it waits for more characters

This would allow the Prop2 to be loaded from any machine with a serial port. The only problem with having this is that the chip would never go to low-power mode when it didn't have a program, as it would always be waiting for serial data. With 1mA of quiescent current, anyway, that may not matter, as this program might take only 1mA at 20MHz and 1 cog.
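The host side of this squirt-in sequence could be sketched roughly as follows, in Python. Several details are assumptions, not something the post specifies: the checksum is taken as the sum of the data bytes modulo 65536, `>` is picked as the non-hex, non-whitespace reset character, and fields are separated by single spaces. `build_squirt_in` is a hypothetical helper name.

```python
def build_squirt_in(data: bytes, reset_chr: str = ">") -> str:
    """Build the text a host would send: space, reset chr, length, data, checksum."""
    checksum = sum(data) & 0xFFFF            # assumed checksum definition
    parts = [
        " ",                                 # space lets the Prop2 calibrate serial timing
        reset_chr,                           # non-hex, non-whitespace chr resets the loader
        f" {len(data):X}",                   # load length in hex
        " " + " ".join(f"{b:02X}" for b in data),   # data bytes in hex
        f" {checksum:04X}",                  # 16-bit checksum in hex
    ]
    return "".join(parts)


print(build_squirt_in(bytes([0x12, 0x34])))
```

Because the result is plain text, it could be pasted into any terminal program or sent from a script once the one-second wait after reset has passed.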

potatohead · 2016-07-16 20:30

Honestly, that's a nice option.

It could time out, say after one minute too, if a power compromise is warranted. I'm not sure it is either.

Doing it this way, as an option, allows for unknown integration and development platforms. And it can remain OS / tool / computer independent. Someone may home brew something and want to put a Prop 2 to use. This way, they can with basically zero dependencies. Their tools, their machine, etc...

cgracey · 2016-07-16 21:05

Timeout... That's what it needs. Good idea.

cgracey · 2016-07-17 09:03

Rethinking what the text-based loader would do...

a) Prop2 recalibrates serial timing from each space/$20 received
b) white space = space/tab/cr/lf
c) unexpected characters cause the text loader to start over
d) text loader times out after one minute and chip enters coma

From the perspective of the PC/host:

1) must be > 1 second and < 1 minute after Prop2 reset
2) send special string to signify start of data (ie " Propeller 2" - note the leading space)
3) send hex data bytes separated by white spaces to load into hub 0, 1, 2, 3...
4) send "G", then cog does 'COGINIT #0,#0'

That's really simple and does away with the checksum. The checksum could be implemented by the program at $00000. This way, a simple, editable text file just needs to be transmitted to the chip for it to load.
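To make rules (a) through (d) and the host steps concrete, here is a rough Python model of what the chip-side text loader might do with an incoming character stream. It is only a sketch under assumptions: autobaud and the one-minute timeout are not modeled, each byte is taken to be one or two hex digits, and `text_load` is a hypothetical name.

```python
SIGNATURE = " Propeller 2"          # note the leading space
WHITESPACE = " \t\r\n"
HEX_DIGITS = "0123456789abcdefABCDEF"


def text_load(stream: str):
    """Return the bytes loaded into hub 0, 1, 2... once 'G' arrives, else None."""
    pos = 0
    while True:
        start = stream.find(SIGNATURE, pos)
        if start < 0:
            return None                      # no signature: loader keeps waiting
        pos = start + len(SIGNATURE)
        hub = bytearray()
        token = ""
        while pos < len(stream):
            ch = stream[pos]
            if ch in HEX_DIGITS and len(token) < 2:
                token += ch                  # collect up to two hex digits
            elif ch in WHITESPACE or ch == "G":
                if token:
                    hub.append(int(token, 16))
                    token = ""
                if ch == "G":
                    return bytes(hub)        # the cog would do COGINIT #0,#0 here
            else:
                break                        # unexpected character: start over
            pos += 1
        else:
            return None                      # input ended before 'G'
        pos -= 1    # rescan from the character before the unexpected one


print(text_load(" Propeller 2 12 34 a B G"))
```

A transmitted file is then just the signature, the hex bytes, and a final `G`, which is what keeps it editable by hand.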

ozpropdev · 2016-07-17 10:28

Elegant and simple.

Seairth · 2016-07-17 17:43

That looks good. No frills. From there, you can easily provide canned HEX "binaries" to load such as the monitor, Language REPL (Spin, Forth, JavaScript, etc.), etc.

potatohead · 2016-07-18 02:06

Yes. This is very good.

Cluso99 · 2016-07-18 03:16

cgracey wrote: »

Rethinking what the text-based loader would do...

a) Prop2 recalibrates serial timing from each space/$20 received
b) white space = space/tab/cr/lf
c) unexpected characters cause the text loader to start over
d) text loader times out after one minute and chip enters coma

From the perspective of the PC/host:

1) must be > 1 second and < 1 minute after Prop2 reset
2) send special string to signify start of data (ie " Propeller 2" - note the leading space)
3) send hex data bytes separated by white spaces to load into hub 0, 1, 2, 3...
4) send "G", then cog does 'COGINIT #0,#0'

That's really simple and does away with the checksum. The checksum could be implemented by the program at $00000. This way, a simple, editable text file just needs to be transmitted to the chip for it to load.

This would be excellent because it does not require any special interface.

Ramon · 2016-07-19 13:10

Will the text-based loader reply with a special string after a reset? or we don't need that?
Just to be sure we are talking to the correct device, it is alive and communication is OK.

Peter Jakacki · 2016-07-19 13:33

Ramon wrote: »

Will the text-based loader reply with a special string after a reset? or we don't need that?
Just to be sure we are talking to the correct device, it is alive and communication is OK.

Yes, it would be good if we got some kind of prompt so that we also know when to send, perhaps even just echoing everything back.

Seairth · 2016-07-19 13:33

Ramon wrote: »

Will the text-based loader reply with a special string after a reset? or we don't need that?
Just to be sure we are talking to the correct device, it is alive and communication is OK.

I thought about that too, though I think it should be sent between steps 2 and 3. This way, the baud rate has been established, and the P2 doesn't output anything unless it recognizes the signature in step 2.

cgracey · 2016-07-19 17:26

I've been thinking about the same things as mentioned in the last three posts.

I'm thinking the loader should be redone to make it always text-based, with simple queries and responses. When the Prop2 resets, it will wait for 100ms for a serial query to come in from a host. If nothing comes, it will try to load and verify from SPI flash. If that fails, it goes back to waiting for a serial query.

I've been thinking about upshifting the baud rate after initial connection, but that needs to be made optional, so that a dumb transmitter doesn't have to do anything special, while the Propeller chip just keeps auto-bauding without interruption. That way, the host can start and finish without any adjustments - it just sends the data file.

Also, by making the HMAC signature envelope the whole application, not just the loader, we get around data integrity problems.

Here is what I have, so far:

To initiate contact / set clock mode....

Host:		"  Propeller ? <amask> <bmask> <adata> <bdata> <clkcode> <baudcode>"

		amask/bmask/adata/bdata	= masks and data to select particular chip
					  based on pin input states (0 0 0 0 = any chip)

		clkcode			= CLKSET value (0=default/20MHz-RC/autobaud)

		baudcode		= CNT/NCO value for async serial smart pin
					  (0 = autobaud, 9600 to 115.2k baud)


Propeller:	"Propeller <apins> <bpins> <asmart> <bsmart> <hubsize> <cogs> <cordic>"

		apins/bpins		= masks of physical pins

		asmart/bsmart		= masks of smart pins

		hubsize			= size of hub RAM in bytes

		cogs			= number of cogs

		cordic			= CORDIC present (1 = yes, 0 = no)



To initiate contact / load code....

Host:		"  Propeller ! <amask> <bmask> <adata> <bdata> <bytecount> <bytes> <hmac>"

		amask/bmask/adata/bdata	= masks and data to select particular chip
					  based on pin input states (0 0 0 0 = any chip)

		bytecount		= number of data bytes that will be sent

		bytes			= data bytes

		hmac			= hmac signature (32 bytes)


Propeller:	"PASS" or "FAIL"

In the simplest usage, the host just sends " Propeller ! 0 0 0 0 <bytecount> <bytes> <hmac>" at any baud rate from 9600 to 115.2k. The Propeller responds with "PASS" or "FAIL". If PASS, it does a 'COGINIT #0,#0'. If fail, it attempts to load from SPI flash, and then returns to waiting for a serial command while running from the 20MHz RC oscillator (same as power-up).

To step up the speed before a download, the host does the " Propeller ?..." command with a clkcode and baudcode, followed by the " Propeller !..." command to download.

If the host just wants to know if any Propeller is there, it can send " Propeller ? 0 0 0 0 0 0" at any baud rate from 9600 to 115.2k. The Propeller responds with "Propeller..." which details the pins, the hub size, number of cogs, and whether or not the chip contains a CORDIC unit. Those metrics cover every possible family member.
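For illustration, a host-side sketch in Python of how these two commands and the reply might be assembled and parsed. The field order follows the templates above; everything else is an assumption: hex encoding of every field, single-space separators, one leading space, and the helper names `query_command`, `load_command` and `parse_response`.

```python
RESPONSE_FIELDS = ("apins", "bpins", "asmart", "bsmart", "hubsize", "cogs", "cordic")


def query_command(amask=0, bmask=0, adata=0, bdata=0, clkcode=0, baudcode=0) -> str:
    """Build a ' Propeller ? ...' query (all zeros = any chip, autobaud)."""
    fields = (amask, bmask, adata, bdata, clkcode, baudcode)
    return " Propeller ? " + " ".join(f"{v:X}" for v in fields)


def load_command(data: bytes, signature: bytes,
                 amask=0, bmask=0, adata=0, bdata=0) -> str:
    """Build a ' Propeller ! ...' download: selector, byte count, bytes, hmac."""
    if len(signature) != 32:
        raise ValueError("hmac signature must be 32 bytes")
    head = " ".join(f"{v:X}" for v in (amask, bmask, adata, bdata, len(data)))
    body = " ".join(f"{b:02X}" for b in data)
    tail = " ".join(f"{b:02X}" for b in signature)
    return f" Propeller ! {head} {body} {tail}"


def parse_response(line: str) -> dict:
    """Split a 'Propeller <apins> ... <cordic>' reply into named integers."""
    parts = line.split()
    if len(parts) != 8 or parts[0] != "Propeller":
        raise ValueError("not a Propeller response")
    return dict(zip(RESPONSE_FIELDS, (int(p, 16) for p in parts[1:])))


print(query_command())
```

The reply values used when trying this out are made up; the thread does not give a sample response.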

David Betz · 2016-07-19 17:39

That looks great, Chip! I don't understand the <amask> <bmask> <adata> and <bdata> parameters though. Are these masks to indicate which pins need to be available?

Edit: Duh. I guess what you mean is that the pins need to match the xdata values after being masked by the xmask values. This is to allow the chip type to be determined by using pins as a selector?

cgracey · 2016-07-19 18:11

David Betz wrote: »

That looks great, Chip! I don't understand the <amask> <bmask> <adata> and <bdata> parameters though. Are these masks to indicate which pins need to be available?

Edit: Duh. I guess what you mean is that the pins need to match the xdata values after being masked by the xmask values. This is to allow the chip type to be determined by using pins as a selector?

Say you have 16 chips loading different programs from the same serial signal and they are not using SPI flash chips. You can differentiate them from one another by, say, tying the normally-used SPI flash pins in different high/low combinations. The amask/bmask values would be 00000000/3C000000, while the adata/bdata values would be 00000000/xx000000. The 'xx' represents the four pins that could be tied in different states. You could load each chip with a different program over one wire. No need to plug a PropPlug into any of them, ever.
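The selection rule can be sketched in a few lines of Python: a chip accepts a command when its pin inputs, masked by amask/bmask, equal adata/bdata. The 3C000000 mask covers bits 26 through 29 of the B port, which gives the four strap pins. The function names and the idea of numbering the chips 0 to 15 are illustrative assumptions.

```python
BMASK = 0x3C000000      # bits 26..29 of port B, from the post


def chip_selected(apins: int, bpins: int,
                  amask: int, bmask: int, adata: int, bdata: int) -> bool:
    """True if the masked pin states match the requested data values."""
    return (apins & amask) == adata and (bpins & bmask) == bdata


def bdata_for(chip_id: int) -> int:
    """Strap pattern for one of 16 chips (hypothetical numbering scheme)."""
    if not 0 <= chip_id < 16:
        raise ValueError("chip_id must be 0..15")
    return chip_id << 26


# Chip 5 responds only when the host asks for strap pattern 5.
pins_b = bdata_for(5) | 0x00000123      # other pins may be in any state
print(chip_selected(0, pins_b, 0, BMASK, 0, bdata_for(5)))
```

With all four values at zero the comparison is always true, which matches the "0 0 0 0 = any chip" convention.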

David Betz · 2016-07-19 18:19

Clever!

potatohead · 2016-07-19 23:33

Yes it is. Awesome. Thank you for giving this some detailed thought, Chip.

Tubular · 2016-07-20 00:33

This looks really good, Chip. It could also be useful where multiple P1Vs / P2Vs are embedded in the same FPGA.

jmg · 2016-07-20 01:18

cgracey wrote: »

Say you have 16 chips loading different programs from the same serial signal and they are not using SPI flash chips. You can differentiate them from one another by, say, tying the normally-used SPI flash pins in different high/low combinations. The amask/bmask values would be 00000000/3C000000, while the adata/bdata values would be 00000000/xx000000. The 'xx' represents the four pins that could be tied in different states. You could load each chip with a different program over one wire. No need to plug a PropPlug into any of them, ever.

You may also need to use this flow to program flash chips that are connected.
That means those dual-purpose pins need to be re-defined to valid states for QuadSPI during pgm, but use light pullup/pulldown during Serial ID check.
Should be possible, but each piece of SW needs to tolerate the other operation ?

cgracey wrote: »

...at any baud rate from 9600 to 115.2k. ... while running from the 20MHz RC oscillator (same as power-up).

Is there any reason the Auto-baud is limited to 115200 here ?
20MHz should be ok to > 1MBd, with a NCO/fractional Baud clock ?

Seairth · 2016-07-20 01:21

Would some of those 16 chips potentially time out while other chips are being loaded? Also, I imagine that multi-chip loading is not going to be a common use case. In those cases where something like this is wanted, couldn't you just as effectively load a generic bootstrap to all of the chips that would then provide this capability?

[insert favorite "simplicity" quote here]

ozpropdev · 2016-07-20 01:24

I like it Chip!

Thinking about the identification of the chip from an internal point of view, a GETVER instruction would make code aware of available cogs, pins, hub size, CORDIC, etc.
This would be useful for OBEX objects.

Seairth · 2016-07-20 01:30

ozpropdev wrote: »

I like it Chip!
Thinking about the identification of the chip from an internal point of view, a GETVER instruction would make code aware of available cogs, pins, hub size, CORDIC, etc.
This would be useful for OBEX objects.

I just don't get this need for the loader to have to query this information. You (in general) are going to know exactly what version of the chip you are targeting.

jmg · 2016-07-20 01:34

Seairth wrote: »

I just don't get this need for the loader to have to query this information. You (in general) are going to know exactly what version of the chip you are targeting.

I think the idea here is to confirm you are actually running, on what you hope to be on.
If P2 variants become common, this will be a common question.

It is a good idea generally for loaders to identify the device.

ozpropdev · 2016-07-20 02:12

I find myself already dealing with two different P2's.
Currently I am testing code on a P123-A9 board and a DE2-115 board.
There are considerable differences between the two platforms such as number of cogs, smartpin count, hub size and pin assignment of peripherals.
Not having the luxury of conditional compilation in Pnut means hardware detection was the way to go.
Knowing what platform it is running on, my code adjusts to the available hardware.
Now I have one version of my code that runs on both platforms without modification.
A lot cleaner and simpler.

Dave Hein · 2016-07-20 02:18

For security purposes I would be concerned about making the boot program too flexible. This could create an unexpected security hole that might be exploited later on. It might be better to make the boot loader as simple as possible. Secondary boot programs with features like multi-device loading, debug code, monitors, hex string loading or a Forth kernel could be loaded using a simple boot loader. Maybe the fuses would prevent such a security hole from happening, but I would hate to see the security feature of the P2 circumvented by a hack that exploited some feature of a flexible boot loader.

cgracey · 2016-07-20 02:51

jmg wrote: »

cgracey wrote: »

Say you have 16 chips loading different programs from the same serial signal and they are not using SPI flash chips. You can differentiate them from one another by, say, tying the normally-used SPI flash pins in different high/low combinations. The amask/bmask values would be 00000000/3C000000, while the adata/bdata values would be 00000000/xx000000. The 'xx' represents the four pins that could be tied in different states. You could load each chip with a different program over one wire. No need to plug a PropPlug into any of them, ever.

You may also need to use this flow to program flash chips that are connected.
That means those dual-purpose pins need to be re-defined to valid states for QuadSPI during pgm, but use light pullup/pulldown during Serial ID check.
Should be possible, but each piece of SW needs to tolerate the other operation ?

cgracey wrote: »

...at any baud rate from 9600 to 115.2k. ... while running from the 20MHz RC oscillator (same as power-up).

Is there any reason the Auto-baud is limited to 115200 here ?
20MHz should be ok to > 1MBd, with a NCO/fractional Baud clock ?

Auto baud detection involves measuring the RX pin's states. Software must be able to do a full computation in one bit period. I'll put the code here soon. I got it working today. At 20MHz, 115.2k baud is about 173 clocks, or 86 instructions. We might be able to do 320.4k, but once you shift up to crystal and PLL, you are deterministic, as well as fast, so 20MHz should be possible.
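The budget quoted above can be checked with a little arithmetic, sketched here in Python. The figure of 2 clocks per instruction is inferred from the post's own numbers (173 clocks giving 86 instructions) and is otherwise an assumption; the function names are hypothetical.

```python
def clocks_per_bit(clock_hz: float, baud: float) -> float:
    """System clocks available during one serial bit period."""
    return clock_hz / baud


def instructions_per_bit(clock_hz: float, baud: float,
                         clocks_per_instruction: int = 2) -> int:
    """Whole instructions that fit in one bit period."""
    return int(clocks_per_bit(clock_hz, baud) // clocks_per_instruction)


for baud in (9600, 115_200):
    print(baud, int(clocks_per_bit(20_000_000, baud)),
          instructions_per_bit(20_000_000, baud))
```

Doubling the baud rate halves the budget, which is why the per-bit computation sets the ceiling while running from the RC oscillator.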

cgracey · 2016-07-20 02:54

Dave Hein wrote: »

For security purposes I would be concerned about making the boot program too flexible. This could create an unexpected security hole that might be exploited later on. It might be better to make the boot loader as simple as possible. Secondary boot programs with features like multi-device loading, debug code, monitors, hex string loading or a Forth kernel could be loaded using a simple boot loader. Maybe the fuses would prevent such a security hole from happening, but I would hate to see the security feature of the P2 circumvented by a hack that exploited some feature of a flexible boot loader.

You need the 128-bit key to come up with a valid signature for whatever you download, before it will run.

jmg · 2016-07-20 03:13

cgracey wrote: »

Auto baud detection involves measuring the RX pin's states. Software must be able to do a full computation in one bit period...

You can freely choose which char to use for AutoBaud, so that gives some flexibility
(you are not locked to that one bit period)

cgracey · 2016-07-20 04:35

jmg wrote: »

cgracey wrote: »

Auto baud detection involves measuring the RX pin's states. Software must be able to do a full computation in one bit period...

You can freely choose which char to use for AutoBaud, so that gives some flexibility
(you are not locked to that one bit period)

We look for a space ($20). It has a very unique 10000001001 pattern that can be detected with certainty among text characters.
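The pattern can be reproduced with a short Python sketch, assuming standard 8N1 framing: line idle high, a low start bit, eight data bits sent least significant bit first, and a high stop bit. The helper names are hypothetical, and the run-length view is just one way of looking at what an autobaud routine might measure.

```python
from itertools import groupby


def uart_frame_bits(byte: int) -> str:
    """Line levels for one 8N1 character: idle, start, data (LSB first), stop."""
    data = "".join(str((byte >> i) & 1) for i in range(8))
    return "1" + "0" + data + "1"


def run_lengths(bits: str):
    """Collapse a bit string into (level, length) runs."""
    return [(level, len(list(group))) for level, group in groupby(bits)]


frame = uart_frame_bits(0x20)       # space
print(frame)
print(run_lengths(frame))
```

For a space the line stays low for six bit times, goes high for one, then low for two, so the ratio between the low periods is fixed regardless of baud rate.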
