Propeller II

pedward · 2012-08-09 19:57

The decryption algorithm needs to be implemented in the user-supplied boot loader. I highly recommend staying with proven algorithms, lest you create holes where none existed. If you have room in ROM, I'm sure you can come up with other useful things for the space.

Please...resist...the...urge.......

Kye · 2012-08-09 20:22

Any SPI flash will work. The boot loader executes the "0x03" read instruction featured on every SPI flash on the market today to read the boot image. It's simple and works.
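
As a rough illustration of that read sequence, here is a small Python sketch of the bytes a host would clock out. The function names are made up for this example, and the 24-bit big-endian address following the opcode is an assumption based on the common form of the 0x03 command, not something stated in this post.

```python
READ_DATA = 0x03  # the "0x03" read instruction mentioned above


def read_command(address=0):
    """Return the 4 bytes to send: opcode, then a 24-bit address (assumed big-endian)."""
    if not 0 <= address < (1 << 24):
        raise ValueError("address must fit in 24 bits")
    return bytes([READ_DATA]) + address.to_bytes(3, "big")


def msb_first_bits(data):
    """Expand bytes into the bit sequence shifted out, most significant bit first."""
    return [(byte >> shift) & 1 for byte in data for shift in range(7, -1, -1)]


cmd = read_command(0)
print(cmd.hex())                 # 03000000
print(len(msb_first_bits(cmd)))  # 32
```

After those 32 command bits, the flash keeps shifting out data for as long as it stays selected and clocked, which is why a single command is enough to stream in a whole boot image.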

Sapieha · 2012-08-09 21:31

Hi Chip.

Do Increment and Decrement apply to byte positions, or to the NEXT LONG?

If byte position -- that is nice -- BUT not if it increments/decrements to the next LONG.
If it increments/decrements BYTE positions -- my question is whether it rolls over from "11" to "00" again --- if yes, GOOD; else, BAD.

cgracey wrote: »

but here's SETF/MOVF:

SETF D/# - set up field mover

%w_xxdd_yyss

w: 0=byte, 1=word
xx: destination field control, 00=static, 01=rotate left by 8/16 bits, 10=increment, 11=decrement
dd: initial destination field, 00=byte0/word0, 01=byte1/word0, 10=byte2/word1, 11=byte3/word1
yy: source field control, 0x=static, 10=increment, 11=decrement
ss: initial source field, 00=byte0/word0, 01=byte1/word0, 10=byte2/word1, 11=byte3/word1

MOVF D,S - moves a byte/word from S into a byte/word in D, leaving other bits unchanged (except in the case of xx=01, in which bits rotate by 8 or 16)

cgracey · 2012-08-09 21:47

Sapieha wrote: »

Hi Chip.

Do Increment and Decrement apply to byte positions, or to the NEXT LONG?

If byte position -- that is nice -- BUT not if it increments/decrements to the next LONG.
If it increments/decrements BYTE positions -- my question is whether it rolls over from "11" to "00" again --- if yes, GOOD; else, BAD.

It can't roll over/under to the next/previous long because the D and S registers are hard-coded into the opcode. So, it just rolls under and over to the opposite byte within D and S.

cgracey · 2012-08-09 21:48

Kye wrote: »

Any SPI flash will work. The boot loader executes the "0x03" read instruction featured on every SPI flash on the market today to read the boot image. It's simple and works.

Thanks to Kye for figuring this out last week!

cgracey · 2012-08-09 21:52

pedward wrote: »

The decryption algorithm needs to be implemented in the user-supplied boot loader. I highly recommend staying with proven algorithms, lest you create holes where none existed. If you have room in ROM, I'm sure you can come up with other useful things for the space.

Please...resist...the...urge.......

I won't have room in the ROM for the complete AES-128 algorithm. However, I will have room for the s-box and inverse-s-box tables of 256 bytes each. I think those will be pushed against the top of the ROM. That way, the user's loader program can implement the algorithm and loading function within a single cog, while the s-boxes are not required to be loaded.

Question: Should we omit the s-box tables and regain 512 bytes of hub RAM, instead of ROM?
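
For anyone wondering what would actually sit in those 512 bytes, here is a Python sketch that generates the standard AES S-box and its inverse from the published definition (multiplicative inverse in GF(2^8), then the affine transform). The helper names are invented for this example, and nothing here says how the tables would be laid out in the ROM.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) using the AES polynomial x^8+x^4+x^3+x+1."""
    result = 0
    for _ in range(8):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
    return result


def gf_inv(a):
    """Multiplicative inverse in GF(2^8); 0 maps to 0 by AES convention."""
    if a == 0:
        return 0
    result = 1
    for _ in range(254):  # a^254 == a^-1 in a field with 256 elements
        result = gf_mul(result, a)
    return result


def rotl8(x, n):
    """Rotate an 8-bit value left by n bits."""
    return ((x << n) | (x >> (8 - n))) & 0xFF


def build_sboxes():
    """Return (sbox, inv_sbox), each a 256-byte table."""
    sbox = bytearray(256)
    inv_sbox = bytearray(256)
    for x in range(256):
        b = gf_inv(x)
        s = b ^ rotl8(b, 1) ^ rotl8(b, 2) ^ rotl8(b, 3) ^ rotl8(b, 4) ^ 0x63
        sbox[x] = s
        inv_sbox[s] = x
    return bytes(sbox), bytes(inv_sbox)


sbox, inv_sbox = build_sboxes()
print(len(sbox) + len(inv_sbox))  # 512
```

Since the inverse table is just the forward table read backwards, a loader could also rebuild one from the other at run time, at the cost of 256 bytes of RAM.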

cgracey · 2012-08-09 22:00

Cluso99 wrote: »

Chip,

The MOVF instruction is very nice thanks.

Since we are not going to be able to boot from SD what will be the cheapest SPI Flash usable (generic part no, lowest size) ?

I am starting to use 24LC64 SOT23-5 $0.33/100 on some prop1 projects that have microSD - it just contains a minimal SD FAT bootloader. I would have loved to ditch the flash on P2.

Here is a Winbond 1Mx8 SPI flash chip that is only $0.37 @1k units:

http://www.digikey.com/product-detail/en/W25Q80BVSNIG/W25Q80BVSNIG-ND/2815927

The only cheaper part is 6 cents less, but 1/4 the size.

jazzed · 2012-08-09 22:08

cgracey wrote: »

Thanks to Kye for figuring this out last week!

Indeed. And thanks to you for making it possible to use QuadSPI devices for single-bit boot and multi-bit access.

Kye · 2012-08-09 22:25

There are enough pins to go octal SPI. The speed bump will be worth the extra chip cost. Otherwise you'll have to shift and mask bits... which will greatly impact your speed.

pedward · 2012-08-09 22:26

cgracey wrote: »

I won't have room in the ROM for the complete AES-128 algorithm. However, I will have room for the s-box and inverse-s-box tables of 256 bytes each. I think those will be pushed against the top of the ROM. That way, the user's loader program can implement the algorithm and loading function within a single cog, while the s-boxes are not required to be loaded.

Question: Should we omit the s-box tables and regain 512 bytes of hub RAM, instead of ROM?

2KB of ROM is nicely orthogonal to the 2KB of COG RAM. Only the deciphering s-box needs to be used, so that's only 256 bytes.

Have you already included SPI flash primitives in ROM? If you have 128 longs to play with, I would recommend trying for SD card read access. Writing to flash or SD isn't necessary in ROM because you just load a loader program, like the existing Propeller Loader for GCC does.

IIRC, there is a very compact FSRW read-only object that you could base the SD code off of. I think Rayman used it in the Propeller Presenter thread (http://forums.parallax.com/showthread.php?140156-Smaller-SD-card-reading-driver)

Kye · 2012-08-09 22:38

@pedward - It's not a feature you want to add. True FAT support will require more than 128 longs. Having a RAW SD card loader will not be preferred. Hacking it to work with a lot of caveats is not something that should be put in the ROM.

If a full FAT16/32 loader could be implemented this would be great. Having anything short of the ability to handle every case means that Parallax will have to go through extra work to support the "feature".

Reading SPI flash is much simpler and does not require a lot of complexity.

Thanks,

cgracey · 2012-08-09 22:41

pedward wrote: »

2KB of ROM is nicely orthogonal to the 2KB of COG RAM. Only the deciphering s-box needs to be used, so that's only 256 bytes.

Have you already included SPI flash primitives in ROM? If you have 128 longs to play with, I would recommend trying for SD card read access. Writing to flash or SD isn't necessary in ROM because you just load a loader program, like the existing Propeller Loader for GCC does.

IIRC, there is a very compact FSRW read-only object that you could base the SD code off of. I think Rayman used it in the Propeller Presenter thread (http://forums.parallax.com/showthread.php?140156-Smaller-SD-card-reading-driver)

I've heard from Kye that there is NO simple way to read anything out of an SD card because the protocol is insane and different manufacturers do things a little differently.

If anyone can show otherwise, it's not too late to fit it in. I'll check the link, meanwhile.

Sapieha · 2012-08-09 22:48

Hi Chip.

MOVF D,S
In that case, if it increments/decrements D and S --- that is not good.

If it increments/decrements BYTE positions -- my question is whether it rolls over from "11" to "00" again.

%w_xxdd_yyss

It is this field that needs increment/decrement with rollover of the counter --- to make sequential BYTE moves possible.

w: 0=byte, 1=word
xx: destination field control, 00=static, 01=rotate left by 8/16 bits, 10=increment, 11=decrement
dd: initial destination field, 00=byte0/word0, 01=byte1/word0, 10=byte2/word1, 11=byte3/word1
yy: source field control, 0x=static, 10=increment, 11=decrement
ss: initial source field, 00=byte0/word0, 01=byte1/word0, 10=byte2/word1, 11=byte3/word1

cgracey wrote: »

It can't roll over/under to the next/previous long because the D and S registers are hard-coded into the opcode. So, it just rolls under and over to the opposite byte within D and S.

cgracey · 2012-08-09 23:01

Sapieha wrote: »

Hi Chip.

MOVF D,S
In that case, if it increments/decrements D and S --- that is not good.

If it increments/decrements BYTE positions -- my question is whether it rolls over from "11" to "00" again.

%w_xxdd_yyss
It is this field that needs increment/decrement with rollover of the counter --- to make sequential BYTE moves possible.

w: 0=byte, 1=word
xx: destination field control, 00=static, 01=rotate left by 8/16 bits, 10=increment, 11=decrement
dd: initial destination field, 00=byte0/word0, 01=byte1/word0, 10=byte2/word1, 11=byte3/word1
yy: source field control, 0x=static, 10=increment, 11=decrement
ss: initial source field, 00=byte0/word0, 01=byte1/word0, 10=byte2/word1, 11=byte3/word1

		setf	#%0_1000_1111	'configure movf for endian swap

The D byte field pointer is set to increment and is initialized to 0 (%1000)
The S byte field pointer is set to decrement and is initialized to 3 (%1111)

		movf	x,y		'y.byte3 into x.byte0
		movf	x,y		'y.byte2 into x.byte1
		movf	x,y		'y.byte1 into x.byte2
		movf	x,y		'y.byte0 into x.byte3

After these four instructions, both byte field pointers, being stepped 4 times, have returned to their original positions from the SETF
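
To make the pointer behaviour easier to follow, here is a toy Python model of SETF/MOVF, built only from the field description in this thread. The class and method names are invented for the sketch. It covers byte mode with static/increment/decrement control only; word mode and the rotate mode (xx=01) are left out because their exact behaviour is not spelled out here.

```python
class FieldMover:
    """Toy model of SETF/MOVF in byte mode (w=0), per the %w_xxdd_yyss layout."""

    def __init__(self, config):
        # SETF: unpack %w_xxdd_yyss
        self.word = (config >> 8) & 1
        self.d_ctrl = (config >> 6) & 0b11
        self.d_ptr = (config >> 4) & 0b11
        self.s_ctrl = (config >> 2) & 0b11
        self.s_ptr = config & 0b11
        if self.word or self.d_ctrl == 0b01:
            raise NotImplementedError("word and rotate modes are not modelled")

    @staticmethod
    def _step(ptr, ctrl):
        """Advance a 2-bit field pointer; it wraps within the same long."""
        if ctrl == 0b10:
            return (ptr + 1) & 3
        if ctrl == 0b11:
            return (ptr - 1) & 3
        return ptr  # static

    def movf(self, d, s):
        """MOVF D,S: copy one byte of s into one byte of d, return the new d."""
        byte = (s >> (8 * self.s_ptr)) & 0xFF
        mask = 0xFF << (8 * self.d_ptr)
        d = (d & ~mask & 0xFFFFFFFF) | (byte << (8 * self.d_ptr))
        self.d_ptr = self._step(self.d_ptr, self.d_ctrl)
        self.s_ptr = self._step(self.s_ptr, self.s_ctrl)
        return d


fm = FieldMover(0b0_1000_1111)  # same value as the SETF in the example
x, y = 0, 0x11223344
for _ in range(4):
    x = fm.movf(x, y)
print(hex(x))                   # 0x44332211
print(fm.d_ptr, fm.s_ptr)       # 0 3
```

The pointers are only two bits wide, so they can never leave the long named in the opcode, which matches the wrap-around described earlier in the thread.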

msrobots · 2012-08-09 23:14

How about reading the boot sector and jumping to it, like DOS etc. did? Creating a bootable disk was done for a long time and still works/is supported for SD.
Propeller-System SD. Back to the future.

Really, every PC is doing that, and then loads a second-stage bootloader supporting FAT or CP/M or raw or whatever is on there.

But 128 longs? I do not think that will work. Even mb_small_spi from lonesock needs 487 longs.

But a small HEX-MONITOR or one small debugger? Cluso has this small footprint debugger... simple serial to boot-pins/PropTerm? Just start a cog with it in your own code?

Peter Jakacki · 2012-08-09 23:23

cgracey wrote: »

I've heard from Kye that there is NO simple way to read anything out of an SD card because the protocol is insane and different manufacturers do things a little differently.

If anyone can show otherwise, it's not too late to fit it in. I'll check the link, meanwhile.

Kye's comments have me scratching my head because I am sure that he is speaking with authority; it's just that I have done a lot of work with SD cards and raw access without a FS, and I can treat an SD card like virtual memory. I am just getting back into this with my Tachyon, so I will confirm with the great variety of cards that I have at my disposal whether there are any problems or not. I've found that even with an FS there are sectors that are never touched anyway, and I've hidden information there which even a normal format doesn't remove. But let me get back to you after I confirm or eat my words.

Sapieha · 2012-08-09 23:26

Hi Chip.

THANKS.

Ok -- understand it now.
That does not give automated moving of BYTES in a sequential manner.

For that it needs MOVxF D,y

D = destination LONG address ..... y = BYTE source field

I know it is too late to discuss that --- BUT for the sake of serializing (bit banging), you said we will not have that.
BUT I think we missed one possibility to have that with 2 simple instructions.

OUTX, INPX

Carry bit TO or FROM a pin ---- that would give us serializing of BITs with only 2 instructions ---- rotate the REGISTER through Carry -- then OUTX or INPX.

evanh · 2012-08-09 23:27

4x5n wrote: »

Granted it's been a long time since I've worked with PLCs but most of the time I spent with Omron and would write "subroutines" that were called by having up to a large amount of logic run or not run by way of a set/reset "relay". Often times after the "subroutine" the last rung would reset the "calling relay" so that the next scan would pass it up. Very important that the logic would only "execute" the exact number of times required based on top to bottom scans. If the logic truly ran in parallel a lot of my PLC programs wouldn't have worked properly!

Ya, good example.

Where multicore execution can be applied very easily is functions (Something that the Omron PLCs lacked for a long time). Functions (Which can be compiled code from another language I believe) are normally executed after the main ladder but there is no reason not to execute concurrently and combine the results at I/O update time. And even the I/O update can be done on a double buffered basis and thereby also run concurrently.

There is also room for ladder "tasking" to be concurrent too. It just has to be well documented what the caveats are for each type of control block.

Trying to implement every logic rung as parallel processing would be freakish after getting so used to scan times and the sequential scan nature. But not an impossible idea; after all, the origin of ladder logic is relay logic, which is just plain electrical flows through relays. There is no ordered scanning with relays. Would it be desirable though? Probably not. As you have pointed out, there are edge-case tricks one uses that are reliant on an ordered scan of the program.

cgracey · 2012-08-09 23:28

msrobots wrote: »

...But a small HEX-MONITOR or one small debugger? Cluso has this small footprint debugger... simple serial to boot-pins/PropTerm? Just start a cog with it in your own code?

A small debugger might be really useful. I LOVE that idea!

Do you think it should be driven from the host over the serial, or just an output medium commanded by Prop software?

pedward · 2012-08-09 23:37

How much SPI flash functionality exists in the ROM code presently? I know we discussed a few things other than SHA-256 and bootloading. Are you going to divide the e-fuses into a system and user section so end-user programs have e-fuses for their own uses? I suggested something along the lines of a 140/32 split or a 156/16 split. Last we talked on the subject I think you agreed 16 user bits would represent a goodly amount.

Also, how are you intending to handle the 4K page size of FLASH? I really wanted the second stage bootloader to be a single page that wasn't overwritten when firmware was loaded. This would mean the firmware starts at $1000 in the FLASH address space. Of course this means the minimum chip size is 256KB, but we talked about 1MB being the minimum because of pricing sweet spot.

I strongly advocate having encryption in software and authentication in ROM. The authentication is well understood and unrestricted, however customers may wish to implement different encryption standards due to export controls of their software/hardware products. Export controls are a bugaboo. Also, you can implement signed boot, un-encrypted code with the present design. This may be the preferred method for opensource projects or customers that don't want to deal with key management.

I have a product design in mind that I'm very much interested in using the P2 for, specifically for the authentication and encryption abilities, because I believe it could be a game changer and subject to competitive analysis. The nature of the product demands security as well, to prevent attacks similar to past high-profile compromises.

There is nothing that stops a customer from implementing AES-256 and using key-expansion with the in-ROM SHA-256 code. If anything, making HMAC a simple call in ROM, so the bootloader can do a 2K round key expansion, would be great in my mind.

pedward · 2012-08-09 23:40

cgracey wrote: »

A small debugger might be really useful. I LOVE that idea!

Do you think it should be driven from the host over the serial, or just an output medium commanded by Prop software?

Agreed, anything you can do to implement a debugger (I thought a true debugger would be one that uses a COG to "take control" of the chip). Obviously there are architectural issues that preclude full debugger support. Stuff like IP pause/run and memory peek/poke.

Although, with the task switching (ala i386 TSS swapping) capabilities, you could implement a debug monitor that ran as 1 task and shared with the "original" task, most of the debug handling done as part of the compiler fixups.

If only you had the capability to peek/poke other COG's RAM and pause/unpause execution, you'd be set with a software "hardware" debugger.

cgracey · 2012-08-09 23:41

Sapieha wrote: »

Hi Chip.

THANKS.

Ok -- understand it now.
That does not give automated moving of BYTES in a sequential manner.

For that it needs MOVxF D,y
D = destination LONG address ..... y = BYTE source field

I know it is too late to discuss that --- BUT for the sake of serializing (bit banging), you said we will not have that.
BUT I think we missed one possibility to have that with 2 simple instructions.

OUTX, INPX
Carry bit TO or FROM a pin ---- that would give us serializing of BITs with only 2 instructions ---- rotate the REGISTER through Carry -- then OUTX or INPX.

Sapieha, could you please elaborate a bit on your idea for serializing in two instructions? I want to know what you mean.

Here is a serial output routine I made for testing the SHA-256:

'
'
' transmit byte in x
'
tx		setb	x,#8		'set stop bit
		reps	#10,#3		'ready to repeat 3 instructions 10 times
		shl	x,#1		'insert start bit

		shr	x,#1	wc	'shift out next bit to send
		setpc	#30		'write to tx pin
		nop	tx_period	'pause for baud period, loop

tx_ret		ret


tx_period	long	20_000_000 / 115_200 - 3
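
For readers who don't have the hardware, here is a Python sketch of what the routine above shifts out, following the SETB/SHL/SHR steps one for one. The function names are invented for the example; the clock and baud figures are the ones in the tx_period line, and the `- 3` is taken from the listing as-is.

```python
def frame_bits(byte):
    """Return the 10 bits sent for one byte, in the order they hit the pin."""
    x = (byte & 0xFF) | (1 << 8)  # setb x,#8  -> stop bit above the data
    x <<= 1                       # shl x,#1   -> start bit (0) below the data
    bits = []
    for _ in range(10):           # reps #10,... -> ten passes through the loop
        bits.append(x & 1)        # shr x,#1 wc + setpc -> carry goes to the pin
        x >>= 1
    return bits


def tx_period(clock_hz=20_000_000, baud=115_200, overhead=3):
    """Integer baud-period value, as computed by the assembler expression."""
    return clock_hz // baud - overhead


print(frame_bits(0x55))  # [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]
print(tx_period())       # 170
```

The frame is the usual start bit, eight data bits LSB first, then one stop bit.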

msrobots · 2012-08-09 23:50

Hi Chip

Both would be nice for interaction with the system, but just output commanded by the calling cog may be easier to implement/handle. And no waits on serial. Simple stuff would help. Watching some Hub areas. Trigger on change. Always showing cnt on 'events' to find out when stuff happens.
The calling cog can provide some parameter area in Hub memory for comms.

Interaction via serial might be nice but not really important.

Or talk to Peter Jakacki - maybe we can shrink the Tachyon-Forth-Kernel?

Enjoy!

Mike

Sapieha · 2012-08-09 23:55

Hi Chip.

I think it is what I was thinking of:

shr x,#1 wc 'shift out next bit to send
setpc #30 'write to tx pin: .............................. IF I understand setpc -- it OUTPUTs carry to pin -- look at my OUTX to pin, it has the same meaning

NOW, do you have an instruction that can READ a pin into CARRY, to SHL/SHR it into a REGISTER?
xxxpc #30 'read rx pin to carry .............................. FOR input of serial data from pin to CARRY

pedward · 2012-08-10 00:04

Sapieha wrote: »

Hi Chip.

I think it is what I was thinking of:

shr x,#1 wc 'shift out next bit to send
setpc #30 'write to tx pin: .............................. IF I understand setpc -- it OUTPUTs carry to pin -- look at my OUTX to pin, it has the same meaning

NOW, do you have an instruction that can READ a pin into CARRY, to SHL/SHR it into a REGISTER?
xxxpc #30 'read rx pin to carry .............................. FOR input of serial data from pin to CARRY

You can do that with TEST and conditional execution.

msrobots · 2012-08-10 00:22

@Pedward

Although, with the task switching (ala i386 TSS swapping) capabilities, you could implement a debug monitor that ran as 1 task and shared with the "original" task, most of the debug handling done as part of the compiler fixups.

yes - please - tell me more ... or get CHIP to explain his wonders to us.... please more input ...

I used Viewport but it does too much for me. Serial output is way more usable than 'need a pc there now'.

I do not think about real debugging but having some sort of 'emergency tool' to look at what really goes on there on site. And no need to include this in every application. Just call a new cog from your system and look. Could be useful for aborts also. If nothing to run - run this and tell why over and over again on the serial port....

yes this would be very useful.

Enjoy!

Mike

cgracey · 2012-08-10 00:25

pedward wrote: »

How much SPI flash functionality exists in the ROM code presently? I know we discussed a few things other than SHA-256 and bootloading. Are you going to divide the e-fuses into a system and user section so end-user programs have e-fuses for their own uses? I suggested something along the lines of a 140/32 split or a 156/16 split. Last we talked on the subject I think you agreed 16 user bits would represent a goodly amount.

Also, how are you intending to handle the 4K page size of FLASH? I really wanted the second stage bootloader to be a single page that wasn't overwritten when firmware was loaded. This would mean the firmware starts at $1000 in the FLASH address space. Of course this means the minimum chip size is 256KB, but we talked about 1MB being the minimum because of pricing sweet spot.

I strongly advocate having encryption in software and authentication in ROM. The authentication is well understood and unrestricted, however customers may wish to implement different encryption standards due to export controls of their software/hardware products. Export controls are a bugaboo. Also, you can implement signed boot, un-encrypted code with the present design. This may be the preferred method for opensource projects or customers that don't want to deal with key management.

I have a product design in mind that I'm very much interested in using the P2 for, specifically for the authentication and encryption abilities, because I believe it could be a game changer and subject to competitive analysis. The nature of the product demands security as well, to prevent attacks similar to past high-profile compromises.

There is nothing that stops a customer from implementing AES-256 and using key-expansion with the in-ROM SHA-256 code. If anything, making HMAC a simple call in ROM, so the bootloader can do a 2K round key expansion, would be great in my mind.

The only thing we need to do to read an SPI flash is make a negative edge on its select pin, clock out command $03_000000, then clock the data bits in. We are clocking in $1F0+8 longs of data (through Rx/Tx, in host mode). The first $1F0 are the loader (launched in cog0 after authentication, with all 172 fuses sitting in registers $1F0..$1F5) and the last 8 are the HMAC signature. We do an HMAC on the $1F0 longs, using the first 128 fuse bits as the key, then verify it against the 8 HMAC longs. Authentication is always on, so the HMAC acts as a validator. For virgin parts, this means that the key is $00000000_00000000_00000000_00000000. The just-loaded loader then must perform any AES-128/other on software which it pulls from the Tx/Rx serial or SPI flash chip, using the passed fuses as key bits. So, we have 128 fuse bits being used as an authentication key and another 44 for whatever the user wants. I haven't thought about 4K page sizes yet, and how we'll use them, because that will get addressed in the software loader, not the ROM code.

So, we have authentication in ROM and decryption in software, like you advocate.

Does only committing 128 fuses to the HMAC bug you?
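
The authentication step described in this post can be sketched on the host side with Python's standard hmac module. This is only an illustration: the function names are invented, and the byte order of the longs and of the fuse key inside the real ROM is an assumption the thread does not settle.

```python
import hashlib
import hmac

LOADER_LONGS = 0x1F0   # loader size given in the post
HMAC_LONGS = 8         # 8 longs = 32 bytes, the size of a SHA-256 digest
BYTES_PER_LONG = 4
VIRGIN_KEY = bytes(16) # all-zero 128-bit key for unprogrammed fuses


def sign_image(loader, key):
    """Append an HMAC-SHA-256 signature to a loader, as a host tool might."""
    if len(loader) != LOADER_LONGS * BYTES_PER_LONG:
        raise ValueError("loader must be exactly $1F0 longs")
    return loader + hmac.new(key, loader, hashlib.sha256).digest()


def authenticate(image, key):
    """Check the trailing 8 longs against an HMAC of the first $1F0 longs."""
    split = LOADER_LONGS * BYTES_PER_LONG
    if len(image) != split + HMAC_LONGS * BYTES_PER_LONG:
        return False
    loader, signature = image[:split], image[split:]
    expected = hmac.new(key, loader, hashlib.sha256).digest()
    return hmac.compare_digest(expected, signature)


image = sign_image(bytes(LOADER_LONGS * BYTES_PER_LONG), VIRGIN_KEY)
print(authenticate(image, VIRGIN_KEY))  # True
```

Because the check always runs, an image signed with the all-zero key still has to be intact to boot on a virgin part, which is the "validator" role mentioned in the post.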

cgracey · 2012-08-10 00:30

Sapieha wrote: »

Hi Chip.

I think it is what I have thinking on:

shr x,#1 wc 'shift out next bit to send
setpc #30 'write to tx pin: .............................. IF I understand setpc -- OUTPuts carry to pin -- Look on my OUTX to pin it have same meaning

NOW have You instruction tha can READ pin to CARRY for SHL, SHR it in REGISTER
xxxpc #30 'read rx pin:to carry.............................. FOR input serial data from pin to CARRY

Right, SETPC writes the carry to a pin. There's also: OFFP (make input), CLRP, SETP, NOTP, SETPC, SETPNC, SETPZ, SETPNZ.

For inputting, we have GETP (into C and NZ) and GETNP (into NC and Z).

Sapieha · 2012-08-10 00:36

Hi Chip.

Thanks.

That gives all I need -- that is good not only for serializing BUT for all types of PLC programming.

cgracey wrote: »

Right, SETPC writes the carry to a pin. There's also: OFFP (make input), CLRP, SETP, NOTP, SETPC, SETPNC, SETPZ, SETPNZ.

For inputting, we have GETP (into C and NZ) and GETNP (into NC and Z).

pedward · 2012-08-10 01:08

cgracey wrote: »

The only thing we need to do to read an SPI flash is make a negative edge on its select pin, clock out command $03_000000, then clock the data bits in. We are clocking in $1F0+8 longs of data (through Rx/Tx, in host mode). The first $1F0 are the loader (launched in cog0 after authentication, with all 172 fuses sitting in registers $1F0..$1F5) and the last 8 are the HMAC signature. We do an HMAC on the $1F0 longs, using the first 128 fuse bits as the key, then verify it against the 8 HMAC longs. Authentication is always on, so the HMAC acts as a validator. For virgin parts, this means that the key is $00000000_00000000_00000000_00000000. The just-loaded loader then must perform any AES-128/other on software which it pulls from the Tx/Rx serial or SPI flash chip, using the passed fuses as key bits. So, we have 128 fuse bits being used as an authentication key and another 44 for whatever the user wants. I haven't thought about 4K page sizes yet, and how we'll use them, because that will get addressed in the software loader, not the ROM code.

So, we have authentication in ROM and decryption in software, like you advocate.

Does only committing 128 fuses to the HMAC bug you?

Ok, those are the details I was after since we talked broad strokes.

As for 128 bits, that doesn't worry me. HMAC was designed for that, and SHA-256 hasn't been proven to be vulnerable to a 1-bit attack, much less 128 bits of deterministic entropy.

I am concerned about how WP is implemented, last we talked you and I floated the idea of redundant WP bits to prevent unintentional bricking. Is only part of the e-fuses WP? That's the difference I make between "system" and "user". The "user" is still allowed to blow "user" fuses after the WP is set. Maybe WP should be two-fold, 5 WP bits (interleaved) for the "system" portion and several interleaved WP bits for the user portion?

I was kinda thinking WP bits could gate write access to the next 31 bits of fuses, so you have 1 WP bit for every 31 bits. Though they should be ANDed for the 2 different groups. The system group fuse bits are ANDed together and the user group bits are ANDed together.

Also, to think of it, fuse bits need to be blown in ROM (Ring 0) code because you can read back the bits in Ring 1. The ROM could simply read 5 longs and check the sign bit (high order bit) to see if it's WP and disable write if all bits are true.
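
A sketch of that proposed check, in Python for readability. This models only the idea floated here (one WP bit in the top bit of each long, gating the 31 bits below it, with the group's WP bits ANDed together); it is not a description of the actual fuse hardware, and the names are invented.

```python
WP_MASK = 0x8000_0000    # sign (high-order) bit of each fuse long
DATA_MASK = 0x7FFF_FFFF  # the 31 fuse bits the WP bit would gate


def is_write_protected(fuse_longs):
    """True only if the WP bit is set in every long of the group (ANDed)."""
    return bool(fuse_longs) and all(word & WP_MASK for word in fuse_longs)


def fuse_payload(fuse_longs):
    """Strip the WP bits, leaving 31 usable fuse bits per long."""
    return [word & DATA_MASK for word in fuse_longs]


system_group = [0x8000_0001, 0x8000_0000, 0x8000_0000, 0x8000_0000, 0x8000_0000]
print(is_write_protected(system_group))        # True
system_group[2] = 0x0000_0000                  # one redundant WP bit not blown
print(is_write_protected(system_group))        # False
```

Requiring every WP bit to be set is what gives the redundancy against accidental bricking: a single stray blown bit does not lock the group.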

Yes, the loader cares about 4K...
