Parallax Propeller 2 Code Authentication and Protection

pedward · 2012-03-04 16:35

Hi folks, I've been privately working with Chip on the code authentication scheme used in the Propeller 2. I spent yesterday writing up a description of the method so it could be professionally reviewed. After working through the majority of the details, I am ready to publish a draft preview of the specification.

Chip suggested I publish it here to get some feedback. At this point Chip is pretty happy with the spec and he's moving forward with getting a professional review by a 3rd party company. The parts that aren't completely nailed down are the way the key bits are used.

I discussed a new approach to using the fuse bits with Chip and it would work like this:

The first 42 bits of the fuse bits are user definable; the suggested use is a 32-bit serial number with 10 bits for metadata. The next 2 bits define the mode and write protect, and the last 128 bits are the cryptographic key. When key access is disabled, the first 44 fusible links are still readable; at bit 45 it rolls over to bit 0. Before access is disabled, bits 45-171 are accessible to read the key. This isn't super firm, but it's the most elegant approach so far; the logistics of making it work are up to Chip.

Below are the pasted contents of the document. It is a DRAFT, which means it's still fluid and will be altered.

Parallax Propeller 2
Code Authentication and Protection

By Perry Harrington

DRAFT v1.0

Code protection is a common feature of many microcontrollers available today. This feature is important to many business applications for several reasons, chief among them protection of proprietary IP. Additional reasons include code signing for authenticating firmware updates and authentication to prevent untrusted 3rd party code from running on a product. This system is more elaborate than conventional code protection schemes used by other manufacturers because the microcontroller does not have on-chip non-volatile memory and loads program code from external memory. I believe this system is more robust than proprietary methods used by other manufacturers because it is open and uses well-established technologies and principles, all while meeting the requirements outlined above.

Propeller 2 Key
The Parallax Propeller 2 has 172 one time programmable fusible links within the chip. These fusible links are buried within the chip and difficult to alter or detect due to the nature of how the metal layers of the chip are deposited. This document assumes that the fusible bits are the second weakest link within the system; key management is always considered the weakest link in any cryptographic system.
The fuse bits are readable using the COGID instruction 1 bit at a time via the carry flag. Upon reset, the Propeller 2 permits access to these bits until the COGID instruction is executed with a sentinel value in the destination field of the instruction. Once the internal flip-flop has flipped, the fuse bits may no longer be read until the next reset. This ensures the fuse bits are unavailable to user code.
I propose to allocate 128 of the 172 bits for key storage, 42 bits for user purposes, and reserve 2 bits to indicate external memory is encrypted and to write protect the fuse bits from further programming.
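As a rough illustration of the proposed allocation, here is a minimal Python sketch that splits a 172-bit fuse image into the three fields. The bit order (first bit read is the most significant), the order of the two flag bits, and the names `FuseMap` and `parse_fuses` are assumptions made for the example; the draft does not fix any of them.

```python
from dataclasses import dataclass

FUSE_BITS = 172
USER_BITS = 42    # user-definable bits (e.g. 32-bit serial + 10 bits metadata)
FLAG_BITS = 2     # encrypted flag and write-protect flag
KEY_BITS = 128    # cryptographic key


@dataclass
class FuseMap:
    user: int
    encrypted: bool
    write_protect: bool
    key: bytes


def bits_to_int(bits):
    """Pack a sequence of 0/1 values into an integer, first bit most significant."""
    value = 0
    for bit in bits:
        value = (value << 1) | (bit & 1)
    return value


def parse_fuses(bits):
    """Split 172 fuse bits (read one at a time, bit 0 first) into fields.

    Assumed layout: 42 user bits, then the encrypted flag, then the
    write-protect flag, then the 128 key bits.
    """
    if len(bits) != FUSE_BITS:
        raise ValueError("expected exactly 172 fuse bits")
    user = bits_to_int(bits[:USER_BITS])
    encrypted = bool(bits[USER_BITS])
    write_protect = bool(bits[USER_BITS + 1])
    key_int = bits_to_int(bits[USER_BITS + FLAG_BITS:])
    return FuseMap(user, encrypted, write_protect, key_int.to_bytes(KEY_BITS // 8, "big"))
```

On the chip this would be a loop of single-bit reads shifted into registers; the sketch only shows how the 42 + 2 + 128 split adds up to 172.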

Trust Levels
Within computer security there is the concept of hierarchical trust, often called ring levels¹. This document describes 3 levels, numbered Ring 0 to Ring 2 in descending levels of trust. Ring 0 is entered upon reset, Ring 1 is entered upon execution of the boot-loader, Ring 2 is entered upon execution of the user code.
Ring 0 is considered the most privileged level and has access to all chip resources. It is important that Ring 0 authenticate the boot-loader used in Ring 1, so that untrusted code is not executed with escalated permissions. Ring 0 code does not employ any obfuscation or trickery to hide security aspects from scrutiny.
Ring 1 is the boot-loader code and is authenticated with a 2 round SHA-256 HMAC to obscure the key used for decrypting Ring 2 code. Because Ring 1 is considered privileged, it must be transparent and well written, thus a reference implementation will be provided to end users. The reference implementation of the Ring 1 boot-loader will implement AES-128-CBC decryption to load Ring 2 code into memory.
Ring 2 code is the user application written for the microcontroller, it is considered insecure and all privileges are dropped prior to execution. Ring 2 code may be susceptible to security vulnerabilities, which may allow untrusted code to be executed on the microcontroller, however the security keys are safe from untrusted Ring 2 code.

Propeller 2 Memory Map
The Propeller 2 is a Harvard architecture design with separate volatile memory for user code and data storage, with executable code running from local volatile memory in each core. There is 126KB of volatile HUB RAM and 2KB of volatile COG RAM for each COG. There are 8 COGs within the chip. User code cannot execute directly from HUB RAM; it must be copied into COG RAM to execute. Upon boot-up, an external non-volatile memory is read and loaded into HUB memory, then user code is loaded into the first COG and executed.
The 2KB located above HUB memory is OTP ROM and contains the Ring 0 code. This leaves 2KB of space in EEPROM for the boot-loader and authentication hash.

ROM code
The ROM code executed in Ring 0 contains house-keeping and a SHA-256 implementation. Upon reset various chip setup tasks are performed, then the chip looks for a connection from an external device.
If an external device is present, the ROM code checks for the desired action, run or program. If the action is run, and the encrypted flag is set, access to the fuse bits is disabled, an unencrypted program is downloaded to HUB memory and [optionally]² authenticated, then Ring 2 code is executed directly, skipping Ring 1.
If the action is to program the external memory, the ROM code simply downloads the data to be programmed into external memory and writes to the external memory. No authentication or decryption needs to be performed, this is done upon the next reset.
If an external device is not attached to the serial pins at reset, the ROM code proceeds to load the Ring 1 boot-loader from external memory. The encrypted flag is consulted to determine whether the fuse bits should be protected and whether authentication should be performed.
Programming the fuse bits for the first time is accomplished by downloading a program to HUB memory and running it. If the write protect is not enabled, the Ring 2 code can program the fusible links. It is the responsibility of the developer to set the write protect bit after the fusible bits are programmed. The process of setting the fusible bits should be handled by a wizard program on the PC to prevent accidental bricking of the chip.

HMAC³
Hash-based Message Authentication Code is a protocol for salting and hashing a message in a 2 round protocol. One iteration of HMAC consists of an inner and outer round of hashing, each with the same key XORed with a different value. The purpose of the two rounds is to obscure the key by mixing two sets of hashed data.
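The inner and outer rounds can be written out in a few lines. The following Python sketch follows the standard HMAC construction (key padded to the 64-byte SHA-256 block size, XORed with 0x36 for the inner round and 0x5C for the outer round). It is only an illustration of the construction, not the ROM code; the function name `hmac_sha256` is chosen for the example.

```python
import hashlib
import hmac

BLOCK_SIZE = 64  # SHA-256 processes 512-bit (64-byte) blocks


def hmac_sha256(key: bytes, message: bytes) -> bytes:
    """Compute HMAC-SHA-256 by hand to show the inner and outer rounds."""
    # Keys longer than one block are hashed first; shorter keys are zero padded.
    if len(key) > BLOCK_SIZE:
        key = hashlib.sha256(key).digest()
    key = key.ljust(BLOCK_SIZE, b"\x00")

    inner_pad = bytes(b ^ 0x36 for b in key)  # same key, first XOR value
    outer_pad = bytes(b ^ 0x5C for b in key)  # same key, second XOR value

    inner = hashlib.sha256(inner_pad + message).digest()  # inner round
    return hashlib.sha256(outer_pad + inner).digest()     # outer round


if __name__ == "__main__":
    key = bytes(range(16))  # example 128-bit key, not a real one
    msg = b"boot-loader bytes"
    # The hand-written version should agree with the standard library.
    print(hmac_sha256(key, msg) == hmac.new(key, msg, hashlib.sha256).digest())
```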

SHA-256
The SHA-256 hash algorithm is currently the minimum standard for hash-based authentication. The algorithm processes data in 512 bit blocks and produces a 256 bit hash of the message. There are presently no known practical attacks against this algorithm and it is presumed to be secure for up to 20 years.

AES-128
The AES algorithm is a symmetric block cipher, recognized as the NIST standard for encryption. This algorithm processes 16 byte blocks and uses a 128 bit key. The algorithm is considered to be secure for up to 20 years with a 128 bit key, so there is symmetry between SHA-256 and AES-128 being used as a pair.

CBC⁴
One of the faults of a symmetric block cipher used on its own is that identical plaintext blocks always produce identical ciphertext blocks. If you have a pattern that repeats every 16 bytes, the ciphertext blocks will repeat as well. Furthermore, files that differ only slightly will have ciphertexts that contain a lot of duplication. From a security standpoint this is undesirable. IBM invented CBC in 1976 to ensure duplicate blocks within a file do not have duplicate ciphertexts. Additionally, there is a random initialization vector which is used to salt the ciphertext so that identical plaintexts do not generate identical ciphertexts. This initialization vector should not be reused in subsequent encryption operations.
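To make the chaining concrete, here is a small Python sketch of CBC encryption and decryption. Python's standard library has no AES, so the block cipher here is a toy 4-round Feistel construction built on SHA-256. It is a stand-in for illustration only and is not AES or anything proposed for the chip; all function names are made up for the example.

```python
import hashlib

BLOCK = 16  # bytes per block, same as AES


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _round(key: bytes, i: int, half: bytes) -> bytes:
    # Round function of the toy Feistel cipher (NOT AES).
    return hashlib.sha256(key + bytes([i]) + half).digest()[:8]


def toy_encrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:8], block[8:]
    for i in range(4):
        left, right = right, _xor(left, _round(key, i, right))
    return left + right


def toy_decrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:8], block[8:]
    for i in reversed(range(4)):
        left, right = _xor(right, _round(key, i, left)), left
    return left + right


def cbc_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    if len(plaintext) % BLOCK:
        raise ValueError("plaintext must be a multiple of 16 bytes")
    prev, out = iv, []
    for i in range(0, len(plaintext), BLOCK):
        # Each plaintext block is XORed with the previous ciphertext block
        # (the IV for the first block) before being encrypted.
        prev = toy_encrypt_block(key, _xor(plaintext[i:i + BLOCK], prev))
        out.append(prev)
    return b"".join(out)


def cbc_decrypt(key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    if len(ciphertext) % BLOCK:
        raise ValueError("ciphertext must be a multiple of 16 bytes")
    prev, out = iv, []
    for i in range(0, len(ciphertext), BLOCK):
        block = ciphertext[i:i + BLOCK]
        # Decrypt, then XOR with the previous ciphertext block (or the IV).
        out.append(_xor(toy_decrypt_block(key, block), prev))
        prev = block
    return b"".join(out)
```

Encrypting two identical 16-byte blocks with `cbc_encrypt` gives two different ciphertext blocks, and encrypting the same plaintext under two different IVs gives different ciphertexts, which is the property described above.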

How it all comes together

$00 - $1F
SHA-256 hash

$20 - $2F
CBC IV

$30 - $3F
Reserved

$40 - $1FF
Boot-loader

$200 - $1FFFF
User code

Table 1: Memory map of the external memory

The ROM code loads the boot-loader from external memory and performs HMAC authentication. The resulting hash is compared against the hash stored in the first block of external memory. If they match, the boot-loader is given the key and control of the chip. The boot-loader reads the CBC Initialization Vector from external memory and loads the User code into HUB memory. Next the boot-loader decrypts the first block in HUB memory, XORs the IV with the decrypted block, copies the ciphertext to the IV, then stores the decrypted block in HUB memory. The same steps repeat for each following block.
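The authentication step could be sketched in Python roughly as follows, using the offsets from Table 1 as byte offsets. The draft does not say exactly which bytes the HMAC covers, so this sketch assumes it covers only the boot-loader region ($40 - $1FF); that choice, and the names `authenticate_bootloader` and `build_image`, are assumptions for illustration.

```python
import hashlib
import hmac

# Byte offsets taken from Table 1.
HASH = slice(0x00, 0x20)        # stored SHA-256 HMAC
IV = slice(0x20, 0x30)          # CBC initialization vector
BOOTLOADER = slice(0x40, 0x200) # boot-loader code ($30 - $3F is reserved)


def authenticate_bootloader(image: bytes, key: bytes):
    """Return (iv, bootloader) if the stored hash matches, otherwise None.

    Assumption: the HMAC is computed over the boot-loader region only.
    """
    if len(image) < 0x200:
        raise ValueError("image too short to hold hash, IV and boot-loader")
    computed = hmac.new(key, image[BOOTLOADER], hashlib.sha256).digest()
    # Constant-time comparison of the stored and computed hashes.
    if not hmac.compare_digest(image[HASH], computed):
        return None
    return image[IV], image[BOOTLOADER]


def build_image(key: bytes, iv: bytes, bootloader: bytes) -> bytes:
    """PC-side helper: lay out hash, IV, reserved bytes and boot-loader."""
    size = BOOTLOADER.stop - BOOTLOADER.start
    if len(iv) != 16 or len(bootloader) > size:
        raise ValueError("IV must be 16 bytes and boot-loader must fit its region")
    padded = bootloader.ljust(size, b"\x00")
    tag = hmac.new(key, padded, hashlib.sha256).digest()
    return tag + iv + bytes(16) + padded
```

If the IV should also be protected against tampering, the HMAC input could be extended to start at $20 instead; the draft leaves that open.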

In the end, the whole authentication and encryption system is quite simple, uses industry accepted algorithms, and implements security in an open and robust manner.

1 http://en.wikipedia.org/wiki/Ring_(computer_security)

2 If the encrypted flag is unset, the fusible bits are assumed to be user defined and access is allowed.

3 http://en.wikipedia.org/wiki/HMAC

4 http://en.wikipedia.org/wiki/Block_cipher_modes_of_operation#Cipher-block_chaining_.28CBC.29

Rayman · 2012-03-04 17:01

I was with you up to the "Harvard Architecture" line... I think Prop 1&2 are really von Neumann, although it can pretend to be either...

pedward · 2012-03-04 17:26

It's really more NUMA. You are right that it can be either, but the main distinction that most make is separate data and instruction memories. Since the Prop can't execute from Hub RAM, it resembles that more closely. To be more precise, I would think that a modified Harvard architecture is more appropriate. You could really beat the subject to death, but conveying the separate memory approach helps people understand the difference between COG and HUB memory.

Cluso99 · 2012-03-04 20:57

Great work. WOW, 172 fuses is fantastic. We have 128 hidden and still 42 to play with. This should make (almost) everyone happy. We got our cake and get to eat it too;)

I do have one question (for Chip)... Why can't we have the full 128KB of hub ram. No doubt the chip has this because you would not build 126KB. Therefore, can the ROM either be mapped into the hub on power up and mapped out for hub ram after booting? Or, can the ROM be located above the 128KB hub ram? The loading from eeprom or sram can still be just 128KB if that is required to allow for a 2KB encrypted loader. This 2KB of hub ram will still be valuable.

I am presuming that there are a few options to store the user code, such as eeprom and SD card. Given the extra fuses, might it be possible to use 2 of these bits to speed up the boot process? Something along these lines...
00 = (unprogrammed) = try download, eeprom, sd card, etc??
01 = programmed = can only be loaded from eeprom
10 = programmed = can only be loaded from SD card
11 = programmed = can only be downloaded from 3rd device (if one exists).
This should speed up the boot process for final products that do not require field updates, or can provide access to the eeprom or SD card.
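A tiny Python sketch of that idea, just to show the lookup. The source names and the function `boot_order` are made up for the example, and the order tried in the unprogrammed case is a guess.

```python
# Hypothetical mapping of two fuse bits to the boot sources the ROM would try.
BOOT_SOURCES = {
    0b00: ("download", "eeprom", "sd_card"),  # unprogrammed: try everything
    0b01: ("eeprom",),
    0b10: ("sd_card",),
    0b11: ("third_device",),
}


def boot_order(fuse_value: int):
    """Return the boot sources to try for a 2-bit fuse value."""
    return BOOT_SOURCES[fuse_value & 0b11]
```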

__red__ · 2012-03-04 21:45

pedward wrote: »

How it all comes together

$00 - $1F
SHA-256 hash

$20 - $2F

CBC IV

$30 - $3F
Reserved

$40 - $1FF
Boot-loader

$200 - $1FFFF
User code

I thought the IV needed to be protected with the same rigor? If the CBC IV is in the eeprom then can't you use that to attack that randomization? The first block of your typical payload is likely fairly predictable hence the need for the IV (and why WEP was cracked so easily).

This is a question for a cryptographer with more knowledge than I have.

(From Wikipedia): "If an attacker knows the IV (or the previous block of ciphertext) before he specifies the next plaintext, he can check his guess about plaintext of some block that was encrypted with the same key before (this is known as the TLS CBC IV attack)."

I do not know enough about this subject to hold a strong opinion on the matter.

pedward · 2012-03-05 13:53

The IV doesn't need to be secret. The purpose of the IV is to salt the ciphertext during encryption so that identical plaintexts don't have identical ciphertexts. In decryption the IV is only used on the first block, from that point on the ciphertext from the previous block is used to unsalt the next block.

The SSL vulnerability is on the flip side of the issue, encryption. The CBC system is vulnerable to a known IV attack if you are able to get the encryption engine to encrypt a known plaintext with the unknown key. In our scenario the Developer isn't going to allow some untrusted 3rd party to probe his key a zillion times, he's only encrypting the plaintext of his program binary.

The most important thing about the IV is that you don't reuse the same IV for multiple messages. This means that each Prop 2 device in the field should have a different IV and thus the ciphertexts are different. You cannot derive anything from the ciphertext by knowing the IV.

Cluso99 · 2012-03-05 17:50

I would have expected more replies. Satisfies my needs but I am no expert on encryption so cannot comment on its security.

Circuitsoft · 2012-03-05 17:55

pedward wrote: »

This means that each Prop 2 device in the field should have a different IV and thus the ciphertexts are different.

I do want to clarify that you don't need to re-encrypt the program for every device using the same IV, only make sure there aren't two different programs encrypted with the same IV. Several copies of the same ciphertext aren't going to be any less secure than a single copy.

pedward · 2012-03-05 18:21

From a decryption standpoint, unique IVs are kinda useless. The 2nd block uses the first block as the IV, the 3rd uses the 2nd, etc. It would be good practice to program every device you deploy with a different IV, but it isn't required. What you DON'T want is different plaintexts encrypted with the SAME IV. I consider each chip to be a "message" and in a stream protocol every message should get a unique IV, even if they are identical. It is this randomization that helps to thwart same plaintext attacks in stream protocols like SSL. From a technical standpoint it may be moot, but if you are going to employ best practices, why stop short of the whole boat?

The Prop 2 is going to need a wizard to program the chip because fuse bit handling needs a simple to use interface and the encryption needs to be performed in a safe manner. The only way to ensure that key management is done properly is to build it all into a wizard program. The way I see it, Unix development environments should be able to get away with makefiles and command line programs for stuffing data, but a majority of users will want the confidence of a wizard utility to manage their crypto.

The main item of interest is employing a key-ring program that uses a good passphrase to protect access to the secure keys on the developer's computer. Also, it makes sense to permit this program to keep track of serial numbers written to each chip programmed. It could be feasible that every device has a unique key for highly sensitive applications. In this case the serial number is used to lookup the key for that chip and supply a firmware image. The database would similarly be encrypted on disk to make attack of the developer's computer more difficult.

pedward · 2012-03-07 13:04

Cluso99 wrote: »

I do have one question (for Chip)... Why can't we have the full 128KB of hub ram. No doubt the chip has this because you would not build 126KB. Therefore, can the ROM either be mapped into the hub on power up and mapped out for hub ram after booting? Or, can the ROM be located above the 128KB hub ram? The loading from eeprom or sram can still be just 128KB if that is required to allow for a 2KB encrypted loader. This 2KB of hub ram will still be valuable.

I am presuming that there are a few options to store the user code, such as eeprom and SD card. Given the extra fuses, might it be possible to use 2 of these bits to speed up the boot process? Something along these lines...
00 = (unprogrammed) = try download, eeprom, sd card, etc??
01 = programmed = can only be loaded from eeprom
10 = programmed = can only be loaded from SD card
11 = programmed = can only be downloaded from 3rd device (if one exists).
This should speed up the boot process for final products that do not require field updates, or can provide access to the eeprom or SD card.

I think Chip said a while ago that 128KB of memory is all they had space for. To make things simpler, he stole the top 2KB of memory for the ROM code; if there was more room, there would be more space.

Chip is leaning towards an SPI device as the boot device because I2C devices aren't available at the same price point for the same density, and they are slower. Chip is also leaning towards an 8Mbit chip instead of 1Mbit because the price differential is so small, and 8Mbits is a lot more space for the price.

Chip was interested in booting from SD, however there are some hurdles with that. Firstly, an SPI chip is about a dollar for 8Mbit, cheap SD cards are $5, then you need a socket. That drives the cost way up for a minimal implementation. What I suggested is that onboard SPI flash would be the default, then the boot-loader could boot to SD if that was desired.

One interesting thing that came up, the boot loader is only 512 longs, so the external SPI only needs to be 4KB (that is the flash sector size). This means if you wanted to bootstrap to some other medium, you only need a 16Kbit flash chip to store the bootloader. This changes the game significantly from the Prop 1, you aren't stuck with a 256Kbit minimum size.

If there is room, I2C might be dually supported in the ROM code. As it is, Chip needs to write the ROM monitor. The SHA-256 code takes ~284 longs, leaving ~100 longs if you don't pack code into the transient data space. There is around 96 longs of transient data storage for the algorithm.

The only elements of the bootup that are pretty well figured out are SPI as the primary device and SHA-256 for HMAC of the boot-loader. Chip has 2KB to work with and I don't expect he will waste any of the precious ROM.

I think the ROM monitor is going to be pretty neato and might have some tricks up its sleeve to speed downloads A LOT!

Circuitsoft · 2012-03-07 13:12

If 1Mbit is enough, wouldn't you target that? Wouldn't 8Mbit still work?

pedward · 2012-03-07 13:24

You can use a 1Mbit flash, for sure. It's a 20 cent difference for 8x the storage. The important factor is that the command set is the same. The datasheets I looked at all had 24 bit addresses; a single read command takes 32 bits and just streams data out continuously.

Since flash is page oriented, you need to have the same page size for each chip. The devices I looked at had a minimum of a 4K sector size for erase. The program code will be in ROM to make things simple, so one routine has to work for all devices. The ROM monitor will probably have small operations like "write long to HUB memory" and "write HUB page to flash". This way the device on the other end gives the monitor a list of commands instead of macro operations.

Circuitsoft · 2012-03-07 13:25

Can you read at least the first few bytes if the page size is wrong? Could you store page size on the flash chip?

pedward · 2012-03-07 13:35

The page size has more to do with programming. You could zap the whole chip, but that defeats the purpose. You can read from the flash no matter the page size, writing is the tricky part.

I had first suggested downloading a stub program that did the flash programming, but Chip wants the programming to be part of the ROM.

That said, you could still download a stub loader to HUB memory and use that to program non-standard flash chips.

One of the ideas discussed was switching the Prop 2 over to crystal in the ROM code when downloading from PC. You would handshake and send the clock info and baud rate, then it would switch modes and talk normal serial to the PC. This opens up the possibility of using 920Kbaud serial to download code.

Another idea would be to use a few of the fuse bits as permanent clock settings, so the chip would always start up in PLL mode, so you would always have serial available via the ROM monitor, simplifying the process of slaving chips.

You could download a stub program that enables chip2chip comms and uses the builtin instructions to perform bulk download of firmware at boot time, thus speeding up the booting of slave chips considerably.

This is the same premise that FastLynx used in the 80's for serial data transfer:

MODE COM1:9600,N,8,1
CTTY COM1:

You load a second stage boot-loader over serial, which kicks things up a notch.

Cluso99 · 2012-03-07 19:44

I beg to differ. I don't want the expense, nor space, of any SPI flash chip when I have SD. There is some boot space on the SD card but I would prefer to locate a FAT16/32 file by any fixed name. Hopefully we will get 2 boot choices as has previously been said. I do realise that 2KB ROM is not much. I am happy to use SPI Flash instead of I2C as that appears the way to go.

I think you missed my point re 128KB hub ram. I am almost certain that the prop will not be built with 126KB hub ram, but indeed with 128KB because they are regular blocks and to omit 2KB would not be practical because that lost die space would not be usable. Therefore, I would like access to it and thought that should be simple enough.

canacar · 2012-03-08 22:51

Amazing!

I am now even more impressed with Parallax for doing the right thing regarding crypto/code protection: Using standards/best practices and opening up the design for review.

Here are some comments ...

The scheme is generally well designed. One big issue is the lack of code authentication. Even though the encryption protects the code, it does not provide a guarantee that the decrypted code is the same. There are some transformations in CBC-mode that lead to limited corruption, which might allow an attacker to duplicate or shuffle parts of the image around or use parts from different images encrypted with the same key. This may subvert the program operation and perhaps leak information about the image contents. Authentication in addition to encryption is needed to detect modifications to the (encrypted) image. Computing the image HMAC after encryption should work. It should be possible to use multiple cogs to simultaneously update the HMAC and decrypt the blocks while loading the image. It is best practice not to use the same key for different purposes (such as encryption and authentication) so consider deriving different keys from the master key for encryption and authentication.

Alternatively, an authenticated encryption mode such as EAX or CCM could be used. Note that both of these use CTR-mode for encryption, which is even more sensitive to IV selection. The encryption scheme breaks if the same IV is ever used to encrypt different images with the same key (the wizard should help here). On the other hand, these schemes use the same block cipher (AES) for both authentication and encryption. This may save space by removing the need for SHA-256. These are some of the options I can think of. I am not a crypto person, so I will not make a recommendation here.

Storing the key in fuses and hiding it from the user image is a good idea. However, the 'ring' concept is not a good match since the "inner" rings are no longer available when the user code starts running. It is more like dropping privileges while moving to the next stage.

While hiding the key is good, it may be even better to leave a "proof of authentication" such as a key that authenticated user code can use. Something that non-authenticated code would not be able to know. This may be useful on communication devices and would allow the code to authenticate itself to the peer using this key. It can also be used to authenticate other code or data that the user code loads from external devices (is LMM still relevant?). Obviously this key should be different from the master key in the fuses. Deriving a new key from the master key (using a key derivation function, KDF, which can be implemented using SHA or AES) and leaving it in HUB memory should do the trick. Obviously this key should not be generated when running unauthenticated code.
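As a rough sketch of what I mean by deriving keys, here is one simple way to do it in Python with HMAC-SHA-256 and a purpose label. This is not a standardized KDF and the labels and the name `derive_key` are just for illustration; a real design would pick a reviewed construction.

```python
import hashlib
import hmac


def derive_key(master_key: bytes, purpose: bytes, length: int = 16) -> bytes:
    """Derive a purpose-specific key from the master key.

    Illustrative only: HMAC-SHA-256 keyed with the master key over a
    purpose label, truncated to the requested length (at most 32 bytes).
    """
    if not 0 < length <= 32:
        raise ValueError("length must be between 1 and 32 bytes")
    return hmac.new(master_key, purpose, hashlib.sha256).digest()[:length]


if __name__ == "__main__":
    master = bytes(range(16))  # stand-in for the 128-bit fuse key
    enc_key = derive_key(master, b"encrypt")
    auth_key = derive_key(master, b"authenticate")
    app_key = derive_key(master, b"proof-of-authentication")
    # Three different keys; knowing one does not reveal the master key.
    print(len({enc_key, auth_key, app_key}))
```

The loader would keep the encryption and authentication keys to itself and leave only the last one in HUB memory for authenticated user code.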

Most of the above are implemented in the ring-1 loader, which means they do not necessarily have an impact on the hardware.

Ideally the reference implementation of the ring-1 loader, and the crypto parts of the wizard (or a command-line version) would be released as open source for easy verification.

Also, it would be nice if the AES (and SHA-256) implementations in ROM would be accessible to the user code.

Now, here are some potential attacks to be aware of:

1. Load protected code, reset the chip and load a small unauthenticated image that dumps HUB memory. The mitigation is to clear the rest of the HUB memory before running unauthenticated code.

2. Timing/power attacks: While operating on the key (reading it from fuses, doing computations on it, etc.), if the code takes different paths/cycles based on the value of the key bits/bytes, it may be detectable from the outside and may cause the key to be compromised. For instance, assume that when loading the key from fuses, the code uses an extra instruction depending on whether the bit is a one or a zero. This may be detectable from the outside, especially if the attacker manages to reduce the clock speed and measure power consumption as well. This particular one is unlikely since the fuse bits end up in the carry flag, which can just be shifted into a register regardless of the value, but I guess it illustrates the point.

3. Key reuse: If multiple devices use the same key, it might make it feasible to use more expensive attacks and even decapping one chip to learn the key. Since the authentication and encryption is based on symmetric keys, an attacker who obtains the key not only gains the ability to read the images, but they can modify and "sign" any image they want. The design should encourage a unique key for each device (even make it transparent to the user). The "wizard" is probably a good start.

Thanks again for opening this up for discussion...

pedward · 2012-03-09 00:38

Your point about corruption is well taken, but I think that attack via mixing ciphertexts is a very slim possibility. Since CBC relies on chaining of blocks, you can't simply substitute blocks unless the blocks are identical. The IV is supposed to introduce a bit of entropy in the process to prevent this type of problem. I do see value in checking for corruption, the authentication would come for free and "feel" nice, although I just can't see the attack being possible.

1) upon reset the ROM code runs and zeros out RAM. The ROM code and boot-loader will be written to properly manage the key.

2) The process is exactly the same to read all of the bits, Chip thinks the power signature won't be noisy enough to give away any info.

3) I suggested building the crypto tools on the PC around OpenSSL, the key ring still needs to be sorted out. It would be nice to leverage an existing key ring program like KeePass.

The ROM, boot-loader, and support tools will be open sourced from the get-go, so no worry about that.

Cluso99 · 2012-03-09 02:06

canacar: One of the nice features of the prop is that instructions can be converted to nops so that code takes the same path by using the flags. Each instruction can be conditionally executed based on the condition codes, and each instruction can optionally write to the z & c flags.
Anyway, it seems the bits will be read into the carry and shifted into position, so this is not needed anyway.
As pedward said, the ROM code will be open so no need to dump it. And once reset the prop starts over.

canacar · 2012-03-09 07:27

pedward wrote: »

Your point about corruption is well taken, but I think that attack via mixing ciphertexts is a very slim possibility. Since CBC relies on chaining of blocks, you can't simply substitute blocks unless the blocks are identical. The IV is supposed to introduce a bit of entropy in the process to prevent this type of problem. I do see value in checking for corruption, the authentication would come for free and "feel" nice, although I just can't see the attack being possible.

While encryption with different IVs changes all the blocks, decryption in CBC mode only depends on the current and the previous "encrypted blocks". Which means one can take two or more ciphertext blocks from anywhere in the image (or even from another image encrypted with the same key) and put it anywhere else. Only the first and the following blocks would be corrupted. The remaining blocks in the middle will be decrypted just fine. Here are some possible ways to misuse this property:

1. If an image outputs some data (bitmap etc.) from the HUB RAM, replace this data with other parts of the RAM to get access to bigger chunks of the image that is not normally displayed.

2. If there is a separate encrypted firmware, say a "diagnostics firmware", that outputs some information about the device, you can replace parts of its HUB ram with blocks from the "production firmware" so that these blocks would be output as a "strings" in the diagnostics messages, for instance.
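Here is a small Python sketch of the splicing property. The block cipher is a toy Feistel construction built on SHA-256 (a stand-in so the example runs with the standard library; it is not AES), and the key, IV and 9-block "image" are made-up test values. Three ciphertext blocks are copied to a different position without knowing the key.

```python
import hashlib

BLOCK = 16


def _xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def _round(key, i, half):
    return hashlib.sha256(key + bytes([i]) + half).digest()[:8]


def toy_encrypt_block(key, block):  # toy stand-in cipher, NOT AES
    left, right = block[:8], block[8:]
    for i in range(4):
        left, right = right, _xor(left, _round(key, i, right))
    return left + right


def toy_decrypt_block(key, block):
    left, right = block[:8], block[8:]
    for i in reversed(range(4)):
        left, right = _xor(right, _round(key, i, left)), left
    return left + right


def cbc_encrypt(key, iv, blocks):
    prev, out = iv, []
    for p in blocks:
        prev = toy_encrypt_block(key, _xor(p, prev))
        out.append(prev)
    return out


def cbc_decrypt(key, iv, blocks):
    prev, out = iv, []
    for c in blocks:
        out.append(_xor(toy_decrypt_block(key, c), prev))
        prev = c
    return out


def splice_demo():
    key, iv = b"k" * 16, b"v" * 16
    plain = [bytes([i]) * BLOCK for i in range(9)]  # blocks P0..P8
    cipher = cbc_encrypt(key, iv, plain)
    # Attacker copies ciphertext blocks 1..3 over positions 5..7 (no key needed).
    tampered = cipher[:5] + cipher[1:4] + cipher[8:]
    return plain, cbc_decrypt(key, iv, tampered)


if __name__ == "__main__":
    plain, out = splice_demo()
    for i, block in enumerate(out):
        origin = plain.index(block) if block in plain else "garbage"
        print(i, origin)
```

In the output, positions 6 and 7 decrypt cleanly to the original blocks 2 and 3, while position 5 (the first spliced block) and position 8 (the block right after the splice) come out as garbage. Everything before the splice is untouched.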

Encryption alone should never be used for "authentication". Other (unauthenticated) encryption modes are no better either. One can shuffle blocks around in ECB mode, and you can even flip individual bits in CTR mode for instance, and things would decrypt just fine. It does not mean that these modes are bad (well, ECB should still be avoided). It just means that they are not designed to do authentication. There are also more efficient, single-pass authenticated encryption modes. However, they are all patented.

pedward wrote: »

1) upon reset the ROM code runs and zeros out RAM. The ROM code and boot-loader will be written to properly manage the key.

2) The process is exactly the same to read all of the bits, Chip thinks the power signature won't be noisy enough to give away any info.

3) I suggested building the crypto tools on the PC around OpenSSL, the key ring still needs to be sorted out. It would be nice to leverage an existing key ring program like KeePass.

The ROM, boot-loader, and support tools will be open sourced from the get-go, so no worry about that.

These are very good to know. Since there is no implementation, I wanted to point out some potential pitfalls, and it seems they are already being considered.

canacar · 2012-03-09 07:31

Cluso99 wrote: »

canacar: One of the nice features of the prop is that instructions can be converted to nops so that code takes the same path by using the flags. Each instruction can be conditionally executed based on the condition codes, and each instruction can optionally write to the z & c flags.
Anyway, it seems the bits will be read into the carry and shifted into position, so this is not needed anyway.
As pedward said, the ROM code will be open so no need to dump it. And once reset the prop starts over.

I was not concerned about dumping the ROM. It can be dumped by unauthenticated code, from any chip anyway. My concern was dumping the leftovers in the HUB RAM from the previous "session". Since pedward confirmed that the HUB RAM is cleared by ROM during boot, this is not a concern anymore.

pedward · 2012-03-09 11:30

@canacar

Your point is taken. Authentication of the ciphertext wasn't considered a requirement because it is unprivileged and the requirement was to hide the plaintext.

The attack you describe would require copying blocks from the ciphertext that you know will perform some action. If you don't know what each block does, you can't really expect it to perform an action.

Also, you would have to tailor it perfectly so that corrupted instructions don't lead to improper operation.

In short, even if you had the plaintext, crafting the ciphertext to do something malicious is very difficult. If you didn't have the plaintext, crafting an attack based on just the ciphertext would be a monumental undertaking.

I'm not discounting that it *could* be done, I'm saying that the likelihood of it actually happening is slim.

That said, I think we can find space to put a hash in there so that corruption of the memory image can be detected.
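A rough sketch of what such a check could look like, in Python. One assumption to flag loudly: a bare hash over the ciphertext can be recomputed by anyone who edits the image, so this sketch uses a keyed hash (HMAC-SHA-256) instead. That is a swapped-in choice, not something settled in this thread. The names `seal` and `open_image` and the separate MAC key are hypothetical.

```python
import hashlib
import hmac

TAG_LEN = 32  # size of an HMAC-SHA-256 tag in bytes

def seal(mac_key, ciphertext):
    """Append a keyed tag so later changes to the image can be detected."""
    tag = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    return ciphertext + tag

def open_image(mac_key, image):
    """Return the ciphertext if the tag verifies, otherwise None."""
    if len(image) < TAG_LEN:
        return None
    ciphertext, tag = image[:-TAG_LEN], image[-TAG_LEN:]
    expected = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    # compare_digest avoids leaking the mismatch position through timing
    if not hmac.compare_digest(tag, expected):
        return None
    return ciphertext

mac_key = b"m" * 16                    # hypothetical key, distinct from the AES key
image = seal(mac_key, b"\x01\x02\x03\x04" * 8)

corrupted = bytearray(image)
corrupted[5] ^= 0x80                   # a single flipped bit in the stored image

print(open_image(mac_key, image) is not None)     # True
print(open_image(mac_key, bytes(corrupted)))      # None
```

The check runs over the ciphertext, so a corrupted image is rejected before anything is decrypted.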

pedward · 2012-03-09 11:37

One more thing, the SHA-256 hash function could be used after reset. You would need to load it into a COG and provide a wrapper to communicate with it. I have already written the PASM and SPIN code to do this as part of the development. The ROM code will contain just the sha_256_block function.
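For readers who want to see the split between a block function and its wrapper, here is a sketch in Python rather than PASM. Only the name `sha_256_block` comes from the post above; its signature here (state list in, new state list out), the `sha256` wrapper, and the helper names are assumptions. The wrapper handles padding and length encoding, and the block function only ever sees 64-byte chunks. The constants are derived from square and cube roots of the first primes, as FIPS 180-4 defines them, to avoid a long table.

```python
import hashlib
from math import isqrt

MASK = 0xFFFFFFFF

def _primes(n):
    """First n primes by trial division."""
    found, cand = [], 2
    while len(found) < n:
        if all(cand % p for p in found):
            found.append(cand)
        cand += 1
    return found

def _icbrt(n):
    """Integer cube root (floor) by binary search."""
    lo, hi = 0, 1 << ((n.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

# Fractional bits of cube roots (round constants) and square roots (initial state)
K = [_icbrt(p << 96) & MASK for p in _primes(64)]
H0 = [isqrt(p << 64) & MASK for p in _primes(8)]

def _rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK

def sha_256_block(state, block):
    """Compress one 64-byte block into the 8-word state; returns the new state."""
    w = [int.from_bytes(block[i:i + 4], "big") for i in range(0, 64, 4)]
    for t in range(16, 64):
        s0 = _rotr(w[t - 15], 7) ^ _rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
        s1 = _rotr(w[t - 2], 17) ^ _rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
        w.append((w[t - 16] + s0 + w[t - 7] + s1) & MASK)
    a, b, c, d, e, f, g, h = state
    for t in range(64):
        big_s1 = _rotr(e, 6) ^ _rotr(e, 11) ^ _rotr(e, 25)
        ch = (e & f) ^ (~e & g)
        t1 = (h + big_s1 + ch + K[t] + w[t]) & MASK
        big_s0 = _rotr(a, 2) ^ _rotr(a, 13) ^ _rotr(a, 22)
        maj = (a & b) ^ (a & c) ^ (b & c)
        t2 = (big_s0 + maj) & MASK
        h, g, f, e, d, c, b, a = g, f, e, (d + t1) & MASK, c, b, a, (t1 + t2) & MASK
    return [(x + y) & MASK for x, y in zip(state, [a, b, c, d, e, f, g, h])]

def sha256(message):
    """Wrapper: pad the message, then feed it to sha_256_block 64 bytes at a time."""
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded)) % 64)
    padded += (len(message) * 8).to_bytes(8, "big")
    state = list(H0)
    for i in range(0, len(padded), 64):
        state = sha_256_block(state, padded[i:i + 64])
    return b"".join(x.to_bytes(4, "big") for x in state)

# Cross-check the sketch against the standard library
matches = sha256(b"abc") == hashlib.sha256(b"abc").digest()
```

Keeping only the block function in ROM means the caller owns padding and state handling, which is where the wrapper comes in.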

One issue with authenticating the ciphertext is that the boot-loader has to do both AES-128 and SHA-256. This might be difficult to fit, so it may be necessary to launch a second COG to do the hashing, or the CLUT memory might need to be used as temporary overlay storage, "swap space".

Chip is shooting to have the actual ROM code in the alpha chips; the writing of the boot-loader and the exact details of the encryption/auth will be ironed out then.

Darreen · 2012-03-09 12:14

A few of the points raised here http://debugmo.de/2011/11/almost-secure/ may be relevant

Darren

Jeff Martin · 2012-03-09 16:14

Cluso99 wrote: »

I think you missed my point re 128KB hub ram. I am almost certain that the prop will not be built with 126KB hub ram, but indeed with 128KB because they are regular blocks and to omit 2KB would not be practical because that lost die space would not be usable. Therefore, I would like access to it and thought that should be simple enough.

You're right, to omit 2KB would not be practical and, in fact, the implementation will not do this. Instead, we're using our 128 KB RAM block but are making the last 2 KB of that RAM fixed to specific values... it's read-only RAM. There really is no separate ROM block nor any associated control lines because of this, just the RAM blocks and associated control lines, but the last 2K can only be read (not written). So, there is no wasted, hidden RAM.
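As a way to picture that layout, here is a small Python model. The sizes come from the description above (128 KB total, last 2 KB fixed); everything else is an assumption for illustration, including the `HubMemory` class, the byte-wide access, and the choice to silently ignore writes to the fixed region. The real hardware behavior on such a write is not stated here.

```python
HUB_SIZE = 128 * 1024            # one RAM block, 128 KB
ROM_SIZE = 2 * 1024              # last 2 KB holds fixed values
ROM_BASE = HUB_SIZE - ROM_SIZE   # 0x1F800

class HubMemory:
    """Toy model: a single address space whose top 2 KB cannot be written."""

    def __init__(self, rom_image):
        if len(rom_image) != ROM_SIZE:
            raise ValueError("ROM image must be exactly 2 KB")
        self._mem = bytearray(HUB_SIZE)
        self._mem[ROM_BASE:] = rom_image

    def read(self, addr):
        # Reads work the same everywhere; there is no separate ROM bus
        return self._mem[addr]

    def write(self, addr, value):
        # Assumption: writes into the fixed region are dropped
        if addr >= ROM_BASE:
            return False
        self._mem[addr] = value & 0xFF
        return True

hub = HubMemory(bytes([0xAA]) * ROM_SIZE)
print(hex(ROM_BASE))                 # 0x1f800
print(hub.write(0x100, 0x55))        # True
print(hub.write(ROM_BASE, 0x00))     # False
print(hex(hub.read(ROM_BASE)))       # 0xaa
```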

Dave Hein · 2012-03-09 16:33

For the P1, the physical size of the ROM was about 1/7th the size of the RAM. Wouldn't this have been the case for the P2 as well?

Cluso99 · 2012-03-09 16:40

Jeff: Thanks for that info. Nice concept. I had not thought of that. Reduces the requirement of an extra set of lines to ROM and ROM space, even though it would be small.

A question if you have time... Have you seen the new Raspberry Pi ARM chip that has connection for the RAM (DRAM, I think) on the top? A neat idea but I suppose expensive. Stackable chips.

Jeff Martin · 2012-03-09 16:44

Dave Hein wrote: »

For the P1, the physical size of the ROM was about 1/7th the size of the RAM. Wouldn't this have been the case for the P2 as well?

Yes, that's true (to my knowledge) but we're out of die space and didn't want to make RAM even smaller just to fit dedicated ROM in. Plus, with all the advances of the Propeller 2, we really didn't have the need for as much ROM as in the Propeller 1. From my understanding, doing it this way saved us some complexity, control circuitry, and simplified block placement, making for a little more practical room for the synth'd cogs.

Jeff Martin · 2012-03-09 16:52

Cluso99 wrote: »

A question if you have time... Have you seen the new Raspberry Pi ARM chip that has connection for the RAM (DRAM, I think) on the top? A neat idea but I suppose expensive. Stackable chips.

No, I haven't seen that yet. Yes, seems expensive, but I didn't see anything special about the pictures I saw doing a quick search. Can you send me a link?

pedward · 2012-03-09 16:59

Jeff Martin wrote: »

No, I haven't seen that yet. Yes, seems expensive, but I didn't see anything special about the pictures I saw doing a quick search. Can you send me a link?

http://www.raspberrypi.org/archives/592

pedward · 2012-03-09 17:07

Darreen wrote: »

A few of the points raised here http://debugmo.de/2011/11/almost-secure/ may be relevant

Darren

I read that section. The attack outlined assumes that you have some method to trick the system into decrypting a ciphertext and giving you the plaintext. The idea isn't to hide the plaintext once the chip has booted; a straight-up exploit will enable you to read the HUB memory in user mode.

The other assumption the attack makes is that you have 2 ciphertexts encoded with the same key. You have one ciphertext that you can get the plaintext of and a second ciphertext that you don't have access to in plaintext form, so you stuff it into a region you CAN read. The boot-loader must be compromised in order to make this attack feasible, and you must have some way of getting the plaintext.

I don't disagree that you can replace blocks of the ciphertext, and I don't disagree you can corrupt a ciphertext, but I don't see a viable attack with this architecture.

With the whole-disk encryption example presented, certainly CBC falls down. However, in this particular design CBC does what it needs to without exposing the developer. To carry out a chosen ciphertext attack, you need to know what the ciphertext does; the attacks presented against CBC are about replacing known blocks with unknown blocks and tricking the decryptor into revealing those blocks.
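To illustrate what block replacement does under CBC, here is a Python sketch. It does not use AES-128: the block cipher is a toy 4-round Feistel built on SHA-256, chosen only so the example runs with the standard library. All function names, the key, and the IV are hypothetical. The CBC chaining itself is the standard construction, so the damage pattern is the same as with a real cipher.

```python
import hashlib

BLOCK = 16  # bytes per block, same as AES

def _xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def _round(key, i, half):
    """Toy round function (NOT AES): hash of key, round index, and half-block."""
    return hashlib.sha256(key + bytes([i]) + half).digest()[:BLOCK // 2]

def encrypt_block(key, block):
    left, right = block[:8], block[8:]
    for i in range(4):
        left, right = right, _xor(left, _round(key, i, right))
    return left + right

def decrypt_block(key, block):
    left, right = block[:8], block[8:]
    for i in reversed(range(4)):
        left, right = _xor(right, _round(key, i, left)), left
    return left + right

def cbc_encrypt(key, iv, plaintext):
    prev, out = iv, []
    for i in range(0, len(plaintext), BLOCK):
        prev = encrypt_block(key, _xor(plaintext[i:i + BLOCK], prev))
        out.append(prev)
    return b"".join(out)

def cbc_decrypt(key, iv, ciphertext):
    prev, out = iv, []
    for i in range(0, len(ciphertext), BLOCK):
        block = ciphertext[i:i + BLOCK]
        out.append(_xor(decrypt_block(key, block), prev))
        prev = block
    return b"".join(out)

key, iv = b"k" * 16, b"i" * 16
# Four recognizable blocks: 16 x "A", 16 x "B", 16 x "C", 16 x "D"
plaintext = b"".join(bytes([65 + i]) * BLOCK for i in range(4))
ciphertext = cbc_encrypt(key, iv, plaintext)

# Overwrite ciphertext block 1 with a copy of ciphertext block 2
tampered = ciphertext[:16] + ciphertext[32:48] + ciphertext[32:]
decoded = cbc_decrypt(key, iv, tampered)

# Which plaintext blocks survived the substitution?
intact = [decoded[i:i + BLOCK] == plaintext[i:i + BLOCK] for i in range(0, 64, BLOCK)]
print(intact)  # [True, False, False, True]
```

Both sides of the discussion show up here. The decryptor raises no error, so the change goes undetected without a separate integrity check. The replaced block and the one after it decode to garbage, which is why splicing in blocks that still run as intended is hard.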

No guarantee or assurance has been suggested that Ring 2 code is secure once booted, however it *is* secure against IP theft and the chip won't properly execute code not encrypted by its own key.

Jeff Martin · 2012-03-09 17:21

pedward wrote: »

http://www.raspberrypi.org/archives/592

Cool. At one point Chip and I discussed doing a stacked die inside the chip package, since we couldn't combine our chip process with flash process on the same die... but it just seemed like that'd be fraught with gotchas and cost.
