I'm trying to shift everyone's focus from "code protection" to "code authentication", because the first step to protecting anything is authenticating the parties.
I would tend to urge the opposite direction. I would encourage ignoring code authentication and focusing on code protection.
Simply put, if I want to run my own code I'll simply swap out the Prop on the board for my own. What does code authentication get you when your attacker has a soldering iron?
Code protection, on the other hand, is about hiding the "secret sauce" that is in the EEPROM.
By far, the biggest weakness of the P2 code protection is the human being that is programming it.
Which is why it would be amazing if we could do public key cryptography with the secret key being generated on the chip. It then wouldn't be possible for the inept (or corrupt) employee to steal the key.
Authentication is more important because you have to protect the secret key. If the code were decrypted by the on-chip ROM code, there would be a single algorithm programmed into the P2 at fab time. If that algorithm falls to some new attack, the masks have to be redone. If you put authentication into the chip, using a well-vetted hashing algorithm, then the decryption happens at a different level.
In existing parlance, the ROM code is Ring 0 and can do whatever it wants; Ring 1 is authenticated code and has access to a copy of the key, but not the internal key; Ring 2 is the user code payload of 126K of hub memory.
You perform a chain of authentication: Ring 0 is unencrypted and runs from ROM; Ring 1 is unencrypted, loaded from EEPROM, and authenticated; Ring 2 is encrypted and loaded from EEPROM. Authentication is implied in Ring 2 because it is encrypted with the secret key. By the time Ring 2 code is run, the secret key is long gone and inaccessible.
Each increase in Ring results in a decrease in privilege, until you are running buggy user code. The amount of possibly buggy code is kept to a minimum in Ring 0 and Ring 1; simplicity results in fewer bugs. Furthermore, the source code for Ring 0 and Ring 1 can be publicly disclosed without fear of attack, because the security doesn't rely on obscurity.
So, we have chained authentication, privilege de-escalation, and encryption. This method secures, authenticates, and helps protect developers from their own bugs by ensuring any injected code runs at Ring 2. Code injection can still reveal the contents of your program, which is the most likely attack on code protection.
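To make the chain concrete, here is a minimal sketch in Python (illustration only: the real chip would do this in ROM and PASM, and the HMAC-SHA-256 authentication step, the function names, and the key handling shown here are my assumptions about the proposal, not a Parallax design):

    import hmac, hashlib

    FUSE_KEY = bytes(16)   # 128-bit secret in fuses, readable by Ring 0 only (zeroed for the demo)

    def decrypt_ring2(key: bytes, encrypted: bytes) -> bytes:
        return encrypted   # placeholder for the user-chosen Ring 1 cipher

    def run_ring2(payload: bytes):
        print("Ring 2: running %d bytes of unprivileged user code" % len(payload))

    def ring1_loader(key_copy: bytearray, ring2_blob: bytes):
        """Ring 1: decrypt the Ring 2 payload, then destroy the key before jumping to it."""
        payload = decrypt_ring2(bytes(key_copy), ring2_blob)
        for i in range(len(key_copy)):    # RING 1 CODE MUST DISPOSE OF THE KEY
            key_copy[i] = 0
        run_ring2(payload)

    def ring0_boot(ring1_image: bytes, ring1_mac: bytes, ring2_blob: bytes):
        """Ring 0 (ROM): authenticate Ring 1 against the fuse key, then hand it a copy."""
        expected = hmac.new(FUSE_KEY, ring1_image, hashlib.sha256).digest()
        if not hmac.compare_digest(expected, ring1_mac):
            raise SystemExit("Ring 1 failed authentication; not run")
        ring1_loader(bytearray(FUSE_KEY), ring2_blob)   # fuse access would be blocked from here on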
No matter how you slice it, you cannot have a truly secure system when the key is known to the device.
It depends on what you mean. If you mean that you can't have an obfuscated system where nobody can decrypt the code, then you're correct -- since the device can decrypt the code, obviously an attacker will be able to get it if they invest enough effort (but it may have to be significant effort!). However, you can arrange things so that no attacker can change the code or download new code: just use a public key/private key system where the public key is on the device and the private key is kept in a very secure location available only to the developer.
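A short sketch of that arrangement in Python, using the third-party cryptography package and Ed25519 purely for illustration (the thread does not settle on a signature scheme, and a chip this small would need something far more compact):

    # pip install cryptography
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey
    from cryptography.exceptions import InvalidSignature

    private_key = Ed25519PrivateKey.generate()   # stays in the developer's vault
    public_key = private_key.public_key()        # burned into the device; no secret to steal

    firmware = b"user program image"
    signature = private_key.sign(firmware)       # signed at release time, off-device

    try:
        public_key.verify(signature, firmware)   # checked on-device at boot
        print("authentic: run it")
    except InvalidSignature:
        print("tampered or foreign code: refuse to run")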
We have heard your concerns, and I am able (meaning there is enough room on the silicon) to bump the number of fuse bits up from 65 to 129.
I'm more concerned about the LFSR-based encryption. Has anyone well-steeped in the cryptographic art actually tested its security? There needs to be more analysis than assuming that more bits are better. I have a friend who likes to say, "I don't know knots, so I use lots." I would not trust my life to his knot-tying skills.
I have read a lot of the latest discussion. Most of it went right over my head.
However, I do not see the reason to have public and private keys. The key is always private (and held in the chip's fuses). Yes, there is a copy held by the organisation designing the Prop product, and they have to ensure that key is kept private. This is no different anywhere.
It seems to me that the point with the prop that is being missed, is that each secure product has its own key in the fuses. It is therefore not general encryption like in the PSP etc. As long as a well known and proven encryption scheme is implemented then surely this is sufficient? And for this to be secure from brute force attacks, a large block of data must be decoded, not just a tiny block! That is why I said 32KB as a minimum.
But, if there is room for more bits (thanks Beau), then a couple of these could be used to indicate the minimum size block to be loaded. Therefore, the designer could mandate that 256KB is downloaded, using out-of-order loading and overlapped loading, just to get 192KB of hub RAM. On top of that, the designer can also add his own decryption. As long as the decryption code the ROM executes is a proven method, the Prop will be secure enough. Even if someone's code is broken by brute force, no other code is broken, and the same effort would be required to break another Prop's code.
Doesn't this make sense??? It's a lot easier than multiple keys with short blocks that would permit simple brute force attacks???
However, you can arrange things so that no attacker can change the code or download new code...
Even this can be broken. Many years ago there was a console that was cartridge based. It used a 32-bit encryption key (which, at the time, was pretty decent), and the key was held in inaccessible ROM. On startup, the bootstrap code decoded the first X Kb of the cart into RAM. Friends of mine were able to extract the decryption code (as it couldn't be encrypted). They then ran a static analysis of the code to figure out the cycle counts of the various paths when branches were taken vs. not taken. Because the code was not written to take identical periods of time through all paths, they were able to analyze the memory controller lines to figure out the exact moment each new data line was accessed (in chunks of 16 bytes).
With that information, they ran a PC-based simulation of the algorithm to prune the key space. If a given key could not possibly cause memory to be accessed in the measured manner, it was rejected. In the end it took about 14 hours for the sim to run and find the key, on a 486-33. Once they had the key they could decrypt any existing cart, or load their own software into the machine. It meant they were able to load their software onto store-bought consumer hardware instead of having to buy the overpriced development kits.
My point is, if someone wants it bad enough, they'll probably figure out a way to get it.
One possible way to protect against that attack is to use thin strips of aluminum for the fuse bits, so that if the chip gets depotted, the aluminum should oxidize pretty quickly in a form that hides whether the fuse was open or closed, preventing them from being physically read.
But, as soon as this was known, the attacker would only have to open the chip in an oxygen-free environment, e.g. something like those boxes labs use for working with dangerous microbes, filled with an inert gas of some kind.
As a counter example, the Atari 7800 (6502 and 4K of RAM) verified a digital signature of the cartridge ROM (using a public key) during startup. If the calculated signature didn't match the one stored in the cartridge, it would boot into an Atari 2600 compatible mode. This digital signature prevented homebrew games until the original private key was recovered from a salvaged hard drive.
This was pre-GPU and the availability of similar massive compute power, but it shows how effective good encryption can be at keeping things secret.
Again, as Heater said: rule number one in the security world is "DO NOT TRY TO INVENT YOUR OWN ENCRYPTION ALGORITHM". Which can be extended to: don't try to do it yourself, as the penalty for failure is very harsh. Learn from others' mistakes, as you only have one chance to get it right.
A couple of decades ago I worked at Bell Northern Research with a group doing X.25 encryption (3DES with Diffie-Hellman key exchange). Two of the members of the group were well versed in crypto. One spent his time considering what is now called man-in-the-middle attacks, while the other looked for weak keys (and ways to avoid them). I developed a healthy respect for how difficult crypto is to get right. (About as hard as redundant & fault tolerant computing.)
I'm more concerned about the LFSR-based encryption. Has anyone well-steeped in the cryptographic art actually tested its security?
From Wikipedia:
LFSRs have long been used as pseudo-random number generators for use in stream ciphers (especially in military cryptography), due to the ease of construction from simple electromechanical or electronic circuits, long periods, and very uniformly distributed output streams. However, an LFSR is a linear system, leading to fairly easy cryptanalysis. For example, given a stretch of known plaintext and corresponding ciphertext, an attacker can intercept and recover a stretch of LFSR output stream used in the system described, and from that stretch of the output stream can construct an LFSR of minimal size that simulates the intended receiver by using the Berlekamp-Massey algorithm. This LFSR can then be fed the intercepted stretch of output stream to recover the remaining plaintext.
Important LFSR-based stream ciphers include A5/1 and A5/2, used in GSM cell phones, E0, used in Bluetooth, and the shrinking generator. The A5/2 cipher has been broken and both A5/1 and E0 have serious weaknesses.
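The attack is easy to demonstrate. A minimal sketch (the 16-bit register length and tap positions are arbitrary choices for the demo): XOR a stretch of known plaintext with its ciphertext to recover keystream bits, and Berlekamp-Massey reconstructs the whole generator from a few dozen of them.

    def lfsr(taps, state, n):
        """Fibonacci LFSR: emit n bits; 'taps' are state indices XORed into the feedback."""
        state, out = list(state), []
        for _ in range(n):
            out.append(state[-1])
            fb = 0
            for t in taps:
                fb ^= state[t]
            state = [fb] + state[:-1]
        return out

    def berlekamp_massey(s):
        """Length of the shortest LFSR that generates bit sequence s (over GF(2))."""
        n = len(s)
        c, b = [0] * n, [0] * n
        c[0] = b[0] = 1
        L, m = 0, -1
        for i in range(n):
            d = s[i]
            for j in range(1, L + 1):
                d ^= c[j] & s[i - j]
            if d:
                t = c[:]
                for j in range(n - i + m):
                    c[j + i - m] ^= b[j]
                if 2 * L <= i:
                    L, m, b = i + 1 - L, i, t
        return L

    # keystream = plaintext XOR ciphertext, recovered from a known-plaintext stretch
    keystream = lfsr(taps=[0, 1, 3, 12], state=[1] * 16, n=64)
    print(berlekamp_massey(keystream))   # prints at most 16: the whole cipher falls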
It's not that we disagree with you. I think we all agree on adopting known algorithms rather than inventing something new for the Prop. It's just that down there in the transistors and gates of the implementation there is scope for overlooking some detail that makes it trivially easy to get the keys out when discovered.
Looking back, I see that you and I made the same suggestion in posts #702 and #703. Let's see if I can spell it out for clarity:
Authentication:
1) The binary is kept in the clear.
2) The binary is signed with some key.
3) When downloaded, the binary is authenticated against the key bits in the fuses.
4) If it does not authenticate, it is not run.
Encryption, for when you want to hide your code:
1) The binary includes a decryption algorithm, in the clear. No harm in it being in the clear; we are not trying to hide it.
2) The binary contains the user program, encrypted.
3) The binary is downloaded and authenticated, and the decryptor code is run.
4) The successfully decrypted user code is now run. (A sketch of this flow follows below.)
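A sketch of the two lists in Python (the MAC-first blob layout, the 256-byte decryptor stub, and using HMAC-SHA-256 keyed by the fuse bits as the "signed with some key" step are all my assumptions for illustration; the thread only proposes SHA2-based authentication in ROM):

    import hmac, hashlib

    FUSE_BITS = bytes.fromhex("00112233445566778899aabbccddeeff")  # hypothetical 128-bit fuse key

    def run_decryptor(decryptor: bytes, payload: bytes) -> bytes:
        return payload   # stub: user-supplied decryption code, shipped in the clear (step 1)

    def execute(user_code: bytes):
        print("running %d bytes of authenticated user code" % len(user_code))

    def loader(blob: bytes):
        """Authentication steps 3-4, then encryption steps 3-4, from the lists above."""
        mac, image = blob[:32], blob[32:]
        expected = hmac.new(FUSE_BITS, image, hashlib.sha256).digest()
        if not hmac.compare_digest(mac, expected):
            return                                      # does not authenticate: it is not run
        decryptor, payload = image[:256], image[256:]   # cleartext decryptor + encrypted program
        execute(run_decryptor(decryptor, payload))      # decrypted user code is now run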
Questions:
Did I get this right? Miss any steps?
What advantages does this have over decryption only? Clearly the user can now select whatever decryption algorithm he likes. What else?
It was suggested to use SHA2, blown into ROM, for the authentication step; is that small enough? I suspect that it, plus the rest of the downloader protocol, has to execute from a single COG.
I do worry that this is more complex than straight encryption and hence more prone to breakage.
For example, the key is now used by the authenticator and then passed to the decryptor. A user adopting a poor decryptor implementation may now find that it makes the keys available.
I think a secure technique that shifts the decryption responsibility to the user is a good idea, leaving the ROM code to do authentication only. That gets Parallax off the hook if a dodgy built-in decryption algorithm is broken.
One would have to ask "why bother having to implement your own code protection?" when I could use, for example, an ARM chip without the above headache and have code security. Just a thought.
It gets Parallax off the hook of having a dodgy built-in decryption algorithm, but it shifts that particular problem to the authentication algorithm and implementation.
Also, I worry that it requires using the keys for authentication and then passing them to the decryption, rather than closing access to them for the rest of run time. Is that not opening further risk of a hole?
While this topic is interesting, the general point is missed. The purpose isn't to make the next great "bullet-proof" crypto chip; the goal is to give the Project Manager a modicum of comfort that, if they choose the Propeller, some low-skilled/resourced yahoo isn't going to copy the code and create a knock-off. The chips widely used in embedded devices simply use a plastic case to "protect the code"; that is, the code is stored in flash and is fully readable by someone with a silicon device lab (Parallax has the necessary equipment in their lab), but this is fully adequate for 90% of the market.
While working at Parallax, I was a major champion for putting fuses on board to provide some protection to the code, because I endlessly received calls from people who loved the chip but, because of their company's paranoia, couldn't use it, since the code was stored in plain text in an easily accessible external device. All the company wants to know is "is my code protected"; they really don't care how it's done, just that it's there so they can tick it on the check list for their next-generation processor.
All the company wants to know is "is my code protected", ...
And if there's a published exploit for deciphering the code in EEPROM, due to weak encryption, the answer would have to be, "No, sorry, it's not." There's a difference between having to have expensive equipment to break code protection in flash, and just an internet connection to download cracking tools for free.
Even authentication can be implemented incorrectly -- see the Wii, where the second stage boot loader (authenticated by the first stage boot against an OTP hash) had a bug in the code used to authenticate the next stage. So while the second stage boot loader was eventually updated in new Wiis (without updating the first stage boot loader in ROM), this couldn't be done for existing Wiis.
Hmm... but I think I see where people are going with this. In theory the authentication should be easier to implement correctly, and then if the customer wants encryption they can implement it themselves. So, the boot loader calculates a digital signature of the data loaded into hub RAM. The problem I see with the idea is with key storage:
1. The authentication algorithm will need to at least store the key, and possibly the initialization vector, on chip. If the MAC isn't stored on chip, it will use some of the EEPROM storage. (Assuming the MAC is longer than the key.)
2. If the loaded code needs to perform decryption, it will almost certainly need to store that key on chip. Otherwise an attacker could decrypt the payload and steal the IP.
However, if the boot loader handles the decryption then only one key will need to be stored on chip and the code will automatically be authenticated.
No, the general point is not missed.
...the goal is to give the Project Manager a modicum of comfort that if they choose the Propeller, some low-skilled/resourced yahoo isn't going to copy the code and create a knock off.
Yes indeed. Perhaps a low skilled yahoo could not break the thing by himself. But don't forget those yahoos have the resources of the internet behind them. Remember that DVDs were protected from copying by yahoos until they had the help of:
In October 1999, Jon Lech Johansen and two people who have remained anonymous reverse engineered CSS and created DeCSS to share the exploit with others, in a striking example of the trusted client problem. Not long after, CSS was further revealed to be easily susceptible to a brute force attack, which is implemented by the widely used libdvdcss; the brute-force attack works even if the keys cannot be retrieved from the lead-in area, as is the case when the DVD's region code is different from that of the drive. This allows region-free DVD player software to work with region-locked drives.
As soon as the Prop II's protection falls to a future Jon Lech Johansen, the "code protection" is no longer on the spec sheet check list.
Eric,
Yes, this authentication business does seem to make juggling keys harder and more prone to failure.
There were probably a million more people trying to break DVD than would ever be trying to break Prop2 though...
It all depends on what the Prop2 is used for. IIRC there are a few microcontrollers with security bits, used in things like smartcards or conditional access modules, which have been determined to be, ummm..., less than secure.
And even then, I don't think it matters. Even in cases like DVD, PS3, or Wii, 99% of the effort was done by a small number of people. It's not a crowdsourced brute-force attack but the lone cracker who has decided your encryption is his current challenge. Thus the first step is to thwart attackers who aren't able to decap the chip and read the keys with an electron microscope, but can and will find any weaknesses in your implementation.
Many years ago there was a console that used a 32-bit encryption key held in inaccessible ROM... by simulating the decryption code against measured memory access timing, they found the key in about 14 hours on a 486-33.
That is a perfect example of a poor implementation. I've had a similar experience with a product: it took only a few hours to figure out how to break their scheme (without recovering the key), and then I was able to recover the data I wanted.
I've also broken commercial key generation using a disassembler; it took about 18 hours of probing the code to find the algorithm and reimplement it.
Yeah - I think that most of us with any RE experience have all dabbled in that murky area of the duck pond :-)
I must be dense. I still don't see a compelling reason to do code authentication.
Did I get this right? Miss any steps? What advantages does this have over decryption only? It was suggested to use SHA2, blown into ROM, for the authentication step; is that small enough?
You got it.
"Decryption only" requires the algorithm to be implemented in ROM, if that algorithm falls to some new attack or analysis, then you have to change the mask for the P2 for a new algorithm. By implementing SHA-256 in ROM, you have the strongest hash that can be implemented in 32 bits. A hash is one-way, so cracking it requires much more effort, as statistical analysis won't work because the hash irrecoverably compresses the input data, you can't recover the secret key from processing the hash signature. SHA-256 will compress 16000 bits into 256 bits, even after 1 bit is hashed, the original data is lost.
Rainbow tables are a common way to crack a hash. By using a 128-bit salt added to a known plaintext, even if you subtract the plaintext, you still have 2^128 possible inputs to the hash that generate a unique result. If you were to only hash the 128-bit key, with only 1 round, you haven't done your job. However, when the plaintext and the key (acting as the salt) are hashed together, they are inextricably linked and can't be separated when trying to break the hash.
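As a toy illustration of why the salt defeats precomputed tables (the key-prepended-to-plaintext construction below is mine; a keyed MAC such as HMAC is the standard way to do this):

    import hashlib, os

    plaintext = b"known bootloader plaintext"
    key = os.urandom(16)   # the 128-bit secret key doubles as the salt

    # Without the key, an attacker could precompute a table of hash(plaintext) pairs.
    # With it mixed in, there are 2^128 possible digests for the same plaintext,
    # so no rainbow table can cover them.
    print(hashlib.sha256(key + plaintext).hexdigest())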
I'm working on a SHA-256 implementation that is intended to fit in a small space; this effort is independent of Mark's reference implementation. I'm using all the tricks I can come up with to shave size, and right now it's under 200 longs. There should be enough room in COG RAM to implement the I2C loader and SHA-256, but you need further space to implement decryption, hence the chained authentication.
The notions I have adopted to describe this scheme follow the traditional security model of Ring 0, Ring 1, Ring 2.
Ring 0 code is ROM code and has supreme access to all chip resources, including reading the secret key. Once the Ring 1 code is authenticated, access to the secret key fuse bits is blocked.
Ring 1 code has a copy of the secret key in memory; this means any Ring 1 bootloader must be a secure implementation. The reference implementation would be provided by Parallax and be clear and open. Ring 1 code must dispose of the secret key before executing Ring 2 code.
Ring 2 code is user code and considered insecure. Because Ring 2 is considered insecure, it does not have any privileges and the secret key is inaccessible to it.
I realize people might get squeamish when they hear "Ring 1 code has the key", but that is necessary and accepted. Ring 1 code is written to be as bug-free and secure as possible; if a user decides to implement their own Ring 1 code, they are on the hook for ensuring it is secure. RING 1 CODE MUST DISPOSE OF THE KEY!
Eric, I still haven't seen anyone attack my proposal in a meaningful way, only to offer up "but an expert should do it".
EDIT: http://en.wikipedia.org/wiki/Salt_(cryptography)
The beauty of the design I propose is that if the encryption is found to be weak, you simply recompile a new Ring 1 bootloader and push both the Ring 1 bootloader and the Ring 2 encrypted code to your devices. You aren't paralyzed by a bad algorithm. Yes, you have a bunch of devices in the field that need updating. But if you use a key division system where part of the 128-bit key is dedicated to a family, model, and stepping (like Intel chips), you have a unique key for each subset of devices you make, and not all the devices are broken.
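A sketch of that key division (the 3-byte family/model/stepping prefix and the HMAC derivation of the remaining bytes from an off-device master secret are my assumptions for illustration):

    import hmac, hashlib, struct

    MASTER_SECRET = b"lives in the developer's keyring, never on a device"

    def subset_key(family: int, model: int, stepping: int) -> bytes:
        """128-bit fuse key: 3 bytes identify the device subset, 13 bytes are
        derived per subset, so breaking one key breaks only that subset."""
        tag = struct.pack("BBB", family, model, stepping)
        return tag + hmac.new(MASTER_SECRET, tag, hashlib.sha256).digest()[:13]

    print(subset_key(1, 4, 2).hex())   # the key burned into family 1, model 4, stepping 2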
There were probably a million more people trying to break DVD than would ever be trying to break Prop2 though...
DVD and Blu-ray were broken not because of the algorithms, but because of the key management. Key management is the most important part of crypto. That is why I propose key-rings for the secret keys, so they have an additional layer of protection while on the developer's computer. This also eliminates using plaintext pass phrases to generate the key in the first place, which would further weaken its quality.
An 8-character password, as typed at the keyboard, only has about 48 bits of quality; figuring roughly 6 bits of entropy per typed character, 8 x 6 = 48, and 128 / 6 ≈ 21.3, so you would need to type a 22-character passphrase to generate a 128-bit-quality key.
A common method of generating keys from passphrases involves running the passphrase through a hash more than 1,000 times, hashing the hash together with an accumulator value. This is repeatable and sufficiently obscures the original passphrase. I would leave this up to existing code: use the OpenSSL library to implement key generation, hashing, and encryption on the developer side, and a keyring program like KeePass for storing the secret keys. I have KeePass on my phone and desktop for storing passwords.
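PBKDF2 is the standardized form of that hash-the-hash loop; a minimal sketch using Python's standard library (the iteration count and sizes here are illustrative):

    import hashlib, os

    passphrase = b"a long passphrase of twenty-two or more characters"
    salt = os.urandom(16)   # stored alongside the keyring entry, not secret

    # Stretch the passphrase through many rounds of HMAC-SHA-256 and
    # take 16 bytes: a 128-bit key suitable for the fuses.
    key = hashlib.pbkdf2_hmac("sha256", passphrase, salt, 100000, 16)
    print(key.hex())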
I fail to see how a small, open decryption code downloaded with keys is very secure. Surely all I have to do is replace the open decryption routine with a simple routine to dump the Prop's key, and I have broken it.
For me, the decryption code must not be downloadable, but must execute from within the chip (i.e. from ROM, or copied into a cog to run). This must decrypt the downloaded code (downloaded = serial download, EEPROM, or SD card). The bigger the minimum code block is, the slower brute force attacks can be. Out-of-order and overlapped loading just adds to the complexity of an attack.
If and when some particular code is broken, the maker just has to use a new key with newly encrypted code. It does not follow that once one product's code is broken, all codes are broken. This is the beauty of the scheme, no matter what encryption method is used.
Contrary to the notion, having more encrypted data, rather than less, actually makes the encryption more vulnerable.
Think of it this way: if I had to guess your Facebook password, I would want to know a lot about you. Ciphertext attacks on encryption rely on the same principle. If you have only one example of a program, you will need to analyze that program over and over to try to derive the plaintext. In comparison, if I have 100 different samples that are encrypted with the same key, I can infer the key a lot faster, and infer the structure or patterns.
This particular attack is something that many algorithms are vulnerable to, so in 1976 IBM came up with the concept of Cipher Block Chaining, or CBC mode (http://en.wikipedia.org/wiki/Block_cipher_modes_of_operation#Cipher-block_chaining_.28CBC.29). In simple terms, it feeds each ciphertext block into the encryption of the next block, so no two blocks have the same ciphertext for the same input.
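A quick demonstration with AES in CBC mode (using the third-party cryptography package; the 16-byte repeated block stands in for a recurring instruction pattern):

    # pip install cryptography
    import os
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    key, iv = os.urandom(16), os.urandom(16)
    plaintext = b"MOV r0, r1 ....." * 4   # the same 16-byte block, four times over

    enc = Cipher(algorithms.AES(key), modes.CBC(iv)).encryptor()
    ct = enc.update(plaintext) + enc.finalize()

    # Every ciphertext block differs, even though the plaintext block repeats:
    print([ct[i:i + 16].hex() for i in range(0, len(ct), 16)])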
This means that patterns of MOV or WAITCNT are not telegraphed through the ciphertext such that statistical analysis or frequency analysis can be used to derive the structure of the plaintext.
The biggest threat to security is the meat behind the keyboard. Generating low-quality keys, storing generated keys in open sight, browsing porn sites and getting viruses, etc. are all very valid threats to security.
If industrial espionage was undertaken against a Prop 2 product, they would first read up on how the security was implemented, try a couple of notions, then realize that attacking the developer is more sensible.
They would research the developer on the forums, open a friendly dialog, exchange emails to get the dev's IP address, then perform penetration testing on the dev's computer. If that doesn't succeed, they will anonymously send poisoned emails, or get the dev to visit a site under the guise of friendly communication, where a browser-based trojan will be used to gain access to the dev's computer. Then they effectively have remote desktop access. They would only need to install a keylogger to recover the passphrase, if one was used to secure the key-ring. Alternatively, the dev could use a fingerprint reader to protect access, or a Dallas 1-wire security token for authentication.
So, security begins and ends in meatspace; being unaware of potential threats, engaging in risky behavior, and not keeping your malware defense and software up to date is the recipe for compromising a Prop 2 product.
That is much simpler and easier than attacking the technology.