I fail to see how a small, open decryption routine that is downloaded along
with the keys can be very secure.
Think of it like this. You add a checksum to the code you are going to
download. Let's assume we are not encrypting anything yet. On download the
loader verifies the checksum and rejects any mismatch. So if anyone gets hold
of your binary, from EEPROM or elsewhere and changes the content the checksum
will fail and it won't load.
Of course with a simple checksum the attacker probably knows how it is
calculated and where it is in the binary so they can trivially calculate a new
checksum for their modified binary and bingo it runs.
But what if the checksum was calculated over your binary and some "secret
stuff". The secret stuff being on your PC when you create the binary and in the
Props fuses when it downloads. Now if the attacker modifies the binary the
checksum will need changing but they don't know the secret stuff so they can't
create a valid checksum.
Now we have authentication. Add to that the fact that the checksum
algorithm is not just a simple addition of bytes or a CRC but some clever
algorithm that makes reverse engineering the checksum from the binary
"impossibly" hard. SHA-2 is under discussion here.
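A minimal sketch of this keyed-checksum idea, assuming SHA-256 over the binary plus the secret; the key and image bytes here are made up for illustration:

```python
import hashlib

SECRET_KEY = bytes.fromhex("00112233445566778899aabbccddeeff")  # hypothetical 128-bit fuse key

def sign(binary: bytes, key: bytes) -> bytes:
    # The "checksum" covers the binary AND the secret stuff: SHA-256(binary || key)
    return hashlib.sha256(binary + key).digest()

def loader_accepts(binary: bytes, checksum: bytes, key: bytes) -> bool:
    # The loader recomputes the keyed checksum and rejects any mismatch
    return sign(binary, key) == checksum

binary = b"\x00\x01\x02\x03" * 4           # stand-in for the downloaded image
checksum = sign(binary, SECRET_KEY)        # computed on the PC at build time

assert loader_accepts(binary, checksum, SECRET_KEY)         # genuine image loads
tampered = b"\xff" + binary[1:]
assert not loader_accepts(tampered, checksum, SECRET_KEY)   # modified image is rejected
```

An attacker who modifies the binary cannot produce a matching checksum without the key, which is exactly the authentication property described above.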
Surely, all I have to do is replace the open decryption routine
with a simple routine to dump the props key and I have broken it.
No, as you see from the above argument, if the attacker tweaks the open
decryption routine the authentication will fail and it won't load and the keys
cannot be dumped. Of course if the user puts a broken decryptor in there that
dumps keys then he has screwed himself.
For me, the decryption code must not be downloadable, but must execute from
within the chip.
By the above argument this is not necessary.
The bigger the minimum code is, the slower brute force attacks can be.
By "code" do you mean the binary code or the cipher keys? Bigger keys are
better, of course.
Out-of-order and overlapped just adds to the complexity to attack.
I don't think so. The method of shuffling or overlapping should be part of the
encryption algorithm, and we want to use a known, respected algorithm. Otherwise
it is just something you have dreamed up to make things more obscure with no
rationale behind it.
If and when some particular code is broken, then the maker just has to use a new
key with newly encrypted code. It does not follow that once a particular code
is broken, all codes are broken.
Never mind an attacker getting your key: if the cipher implementation in
EEPROM is broken, if a flaw is found and a way around it is discovered, then it
is broken for everyone everywhere forever...
Now here is the key point: if I understand correctly, the authentication via
SHA-2 is estimated to be unbreakable, more so than the encryption algorithms. So
it is better to authenticate and download the encryption algorithm, which can be
swapped for a new one if it is compromised.
Do I understand correctly?
Think of it like this. You add a checksum to the code you are going to
download. Let's assume we are not encrypting anything yet. On download the
loader verifies the checksum and rejects any mismatch. So if anyone gets hold
of your binary, from EEPROM or elsewhere and changes the content the checksum
will fail and it won't load.
But if the code is short, it will be easier to brute-force attack it. The longer the code, the longer it takes to decrypt and hence the longer it takes someone to try each brute force attack. The longer the code, the more varied the instructions can be, also meaning the more obscure the code can be. Remember, it is not text.
Of course with a simple checksum the attacker probably knows how it is
calculated and where it is in the binary so they can trivially calculate a new
checksum for their modified binary and bingo it runs.
But what if the checksum was calculated over your binary and some "secret
stuff". The secret stuff being on your PC when you create the binary and in the
Props fuses when it downloads. Now if the attacker modifies the binary the
checksum will need changing but they don't know the secret stuff so they can't
create a valid checksum.
Now we have authentication. Add to that the fact that the checksum
algorithm is not just a simple addition of bytes or a CRC but some clever
algorithm that makes reverse engineering the checksum from the binary
"impossibly" hard. SHA-2 is under discussion here.
But here the cracker only has to play with a few bytes or longs for a brute force attack.
No, as you see from the above argument, if the attacker tweaks the open
decryption routine the authentication will fail and it won't load and the keys
cannot be dumped. Of course if the user puts a broken decryptor in there that
dumps keys then he has screwed himself.
By the above argument this is not necessary.
By "code" do you mean the binary code or the cipher keys? Bigger keys are
better, of course.
Agreed.
I don't think so. The method of shuffling or overlapping should be part of the
encryption algorithm, and we want to use a known, respected algorithm. Otherwise
it is just something you have dreamed up to make things more obscure with no
rationale behind it.
The shuffling and overlapping would be in addition (and optional) to a secure algorithm and could be used to insert unused code which would make any statistical analysis fail.
Never mind an attacker getting your key: if the cipher implementation in
EEPROM is broken, if a flaw is found and a way around it is discovered, then it
is broken for everyone everywhere forever...
This is a different argument. I do understand that a secure algorithm is required. And if there is a flaw, the whole method could be compromised. But, provided the algorithm is secure, breaking one supplier's key will not result in breaking the next supplier's key.
Now here is the key point: if I understand correctly, the authentication via
SHA-2 is estimated to be unbreakable, more so than the encryption algorithms. So
it is better to authenticate and download the encryption algorithm, which can be
swapped for a new one if it is compromised.
Surely, if you have an open decryption code with a secure calculated checksum, that checksum can be broken quite simply. Simply put, you would run the plain code through an encrypter (on the PC) with each of the 2^64 keys to find the one(s) that yield the supplied checksum. Voila, key broken.
What am I missing???
Now here is the key point: if I understand correctly, the authentication via
SHA-2 is estimated to be unbreakable, more so than the encryption algorithms. So
it is better to authenticate and download the encryption algorithm, which can be
swapped for a new one if it is compromised.
Do I understand correctly?
Yes. A hash cannot be compromised to reveal the original key, you can only find a collision or failure in the algorithm. Assuming the algorithm is good, that means you have 2^128 tests that have to be performed to find the key that matches the hash.
To make it very clear, the hash is used like this:
[binary Prop 2 opcodes] + [128 bit secret key] = 16,000 bits of "message" to be hashed
Salt is a method that's been used to obscure common plaintexts for a long time. Unix used DES and a salt to encrypt your password.
Here is an excerpt from the crypt man page:
crypt() is the password encryption function. It is based on the Data Encryption Standard algorithm with variations intended (among other things) to discourage use
of hardware implementations of a key search.
key is a user's typed password.
salt is a two-character string chosen from the set [a-zA-Z0-9./]. This string is used to perturb the algorithm in one of 4096 different ways.
By taking the lowest 7 bits of each of the first eight characters of the key, a 56-bit key is obtained. This 56-bit key is used to encrypt repeatedly a constant
string (usually a string consisting of all zeros). The returned value points to the encrypted password, a series of 13 printable ASCII characters (the first two
characters represent the salt itself). The return value points to static data whose content is overwritten by each call.
Warning: The key space consists of 2**56 equal 7.2e16 possible values. Exhaustive searches of this key space are possible using massively parallel computers.
Software, such as crack(1), is available which will search the portion of this key space that is generally used by humans for passwords. Hence, password selection
should, at minimum, avoid common words and names. The use of a passwd(1) program that checks for crackable passwords during the selection process is recommended.
The DES algorithm itself has a few quirks which make the use of the crypt() interface a very poor choice for anything other than password authentication. If you
are planning on using the crypt() interface for a cryptography project, don't do it: get a good book on encryption and one of the widely available DES libraries.
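As a sketch of the salt idea only (using SHA-256 rather than crypt's actual DES-based scheme, with made-up passwords and salts):

```python
import hashlib

def hash_password(password: str, salt: str) -> str:
    # The salt perturbs the computation, so identical passwords produce
    # different stored values; like crypt(3), the salt is stored in the clear
    # as the first two characters of the result.
    return salt + hashlib.sha256((salt + password).encode()).hexdigest()

a = hash_password("hunter2", "ab")
b = hash_password("hunter2", "cd")
assert a != b            # same password, different salt -> different stored value
assert a[:2] == "ab"     # the stored value records which salt was used
```

This is what defeats precomputed dictionaries: an attacker must redo the work per salt value.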
So, you take a plaintext, in this case it's the bootloader opcodes that will be the same across a large population of devices. Next you append the 128-bit key to these 496 longs to get exactly 500 longs of data to send through the hash function.
The hash function generates a hash signature that has 2^256 possible values, predicated on the theory that no 2 input messages will generate the same hash. SHA-256 is capable of hashing messages up to 2^64 - 1 bits in length. A single bit change in the message results in a completely different hash, and sequential changes do not result in sequential hashes.
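The construction and the avalanche property can be sketched like this (placeholder bytes stand in for the real opcodes and the real key):

```python
import hashlib

bootloader = bytes(496 * 4)              # 496 longs of (placeholder) Prop 2 opcodes
key        = bytes(range(16))            # made-up 128-bit secret key (4 longs)
message    = bootloader + key
assert len(message) * 8 == 16_000        # 500 longs = 16,000 bits, as described

h1 = hashlib.sha256(message).digest()

# Flip a single bit in the message: the hash changes completely (avalanche effect)
flipped = bytes([message[0] ^ 0x01]) + message[1:]
h2 = hashlib.sha256(flipped).digest()
differing_bits = sum(bin(a ^ b).count("1") for a, b in zip(h1, h2))
assert h1 != h2 and differing_bits > 64  # roughly half of the 256 output bits differ
```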
I surveyed several block ciphers for applicability to the Prop 2. My criteria were to eliminate any ciphers that were known to be weak or theoretically weak based on cryptanalysis. I also constrained the list to ciphers that were in widespread use. In addition, the ciphers had to be open and not encumbered by patents, royalties, or corporate politics. The ciphers also had to offer 128-bit keys, since the number of fuses in the Prop 2 is limited.
The ciphers that were considered were Blowfish, CAST5 and AES-128. Blowfish, DES, and 3DES are implemented in SSH as stream ciphers. OpenSSL implements all of the candidates, and GPG implements each.
Blowfish requires 4K for S-boxes, which makes it uneconomical to use.
CAST5 is similar to Blowfish and requires large S-boxes.
AES-128 only requires 256 bytes for the decryption S-boxes, has a 128-bit key, and processes blocks 128 bits at a time.
AES-128 leads other algorithms because: a) it's an open standard; b) it's withstood a lot of professional scrutiny; c) it's well understood; d) it's had 10+ years as the preferred DOD algorithm; e) it's easily implementable within the space constraints.
SHA-256 and AES-128 are recommended as a pair by the NSA for the minimum level of security. They have approved these for Secret level information and they are projected to have a 20 year lifespan for encrypted data. If you go to SHA-512 and AES-256, those are classified as the strongest available algorithms and have a 50 year data integrity.
http://en.wikipedia.org/wiki/Key_stretching
With that said, the current Prop 2 architecture has limitations that would make SHA-512 difficult to implement, and AES-256 wouldn't be very useful without a 256-bit key or a key stretching algorithm.
I would bet on the Prop 3 being 64 bit and having more COG ram, which would easily permit a 50 year data security model. But, the proposed algorithms are more than sufficient to protect code in the Prop 2.
http://en.wikipedia.org/wiki/Brute-force_attack
I think you may misunderstand the notion of a brute force attack. To brute force the SHA-256 hash, you have to run 2^128 trials to find a secret key that, when combined with the bootloader, results in the same hash signature. Since the secret key is 128 bits, you have to try all the permutations of a 128-bit number until you find one that matches the hash.
Also consider, the hash is 2^256, and the longest message it can process is 2^64-1 bits.
If half the world's population each had a PC with 4GBytes of RAM it would be about that much.
I would not like to see the prop2 on the export restricted list.
Most pre-1996 US export restrictions have been severely relaxed. Import restrictions in other countries are more of an issue now.
Either way, if the chip only did code verification in a manner that allowed secure cryptography of the rest of program storage, then it would be up to users, but Parallax could still provide standard/sample implementations on a country-by-country basis.
OK. I use a 2^256 algorithm. I am only decrypting the 500 byte block.
So, for a brute force attack, I have to try 2^256 combinations to find the key (the fuses). Assume I use a pc capable of processing 1,000 blocks of the 500 byte blocks per second. Any slower and the prop would take too long anyway.
So, I can check 1,000 combinations per second ~2^10. Therefore, per day this is 60*60*(3*8) seconds = ~80,000 = ~2^16. Therefore we can test 2^26 combinations per day.
So, by brute force I will have tested every combination in 10 days.
2^256/2^26 = 2^(256-26)
= 1725436586697640946858688965569256363112777243042596638790631055949824 days
= 4727223525199016292763531412518510583870622583678346955590770016300 years
= ~363632578861462791751040877886039275682355583359872842737 times the time since the big bang
One problem with the authentication-only idea is customers would expect (nay, require) Parallax to provide a reference implementation of the encryption function. So going that route doesn't relieve Parallax of any effort or risk. Having the encryption routine off-chip also increases the BoM with the additional storage space and increases the boot time. One other advantage of putting the encryption in ROM is it might be possible to expose it to customer code as a kind of library function, which customers could then use to encrypt code & data beyond the initial load.
Since Flash is only really available in power-of-two sizes, having the encryption code off-chip isn't likely to increase the BOM cost at all, but it has significant advantages for continuing security: algorithms can be replaced by the original producer, even without changing fuses; algorithms and implementations can be chosen with various speed/security tradeoffs; and code security can be provided by a third party.
To brute force the SHA-256 hash, you have to run 2^128 trials to find a secret key that, when combined with the bootloader, results in the same hash signature. Since the secret key is 128 bits, you have to try all the permutations of a 128 bit number until you find one that matches the hash.
Just to be clear, the hacker would only have to run 2^128 trials to be guaranteed to find the key. I think on average it would half that number to find it (2^127 is still a huge number), but he might get lucky and find it right away.
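The averaging claim is easy to check on a toy keyspace; this sketch brute-forces a made-up 8-bit "fuse key" many times and confirms the average sits near half the worst case, the same ratio that applies at 128 bits:

```python
import random

# Exhaustive search over an 8-bit keyspace: worst case 2^8 tries,
# about 2^7 on average, and occasionally very few (a lucky early hit).
random.seed(1)
trials = []
for _ in range(2000):
    secret = random.randrange(256)   # the secret key for this round
    trials.append(secret + 1)        # trying keys 0,1,2,... finds it on try secret+1
avg = sum(trials) / len(trials)
assert 118 < avg < 138               # close to the expected (2^8 + 1) / 2 = 128.5
assert min(trials) < 10              # sometimes the attacker gets lucky early
```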
[quote=cluso]
So, I can check 1,000 combinations per second ~2^10. Therefore, per day this is
60*60*(3*8) seconds = ~80,000 = ~2^16. Therefore we can test 2^26 combinations
per day.
So, by brute force I will have tested every combination in 10 days.
Have I made an error somewhere???
[/quote]
Yes.
You can do 2^26 tries per day and you need to get through 2^256.
When dividing these numbers you subtract the exponents.
2^256 / 2^26 = 2^(256 - 26) = 2^230
You will need 2^230 days !!!
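Python's arbitrary-precision integers make the exponent arithmetic easy to check directly:

```python
# Dividing powers of two subtracts the exponents: 2^256 keys at 2^26 tries/day.
rate_per_day = 2 ** 26
total_keys   = 2 ** 256
days         = total_keys // rate_per_day
assert days == 2 ** 230          # not 10 days!

years = days // 365
# On the order of 10^66 years, matching the huge figures quoted earlier.
assert 10 ** 66 < years < 10 ** 67
```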
Eric,
I would not worry about the storage space or the increase in boot time due to the size of the included decryption algorithm; some of them are quite small.
If you use a key division system where part of the 128 bit key is dedicated to a family, model, and stepping (like Intel chips), you have a unique key for each subset of device you make and not all the devices are broken.
Except now you introduce three problems:
1. Your customer now has to have a different signed binary per revision of IC.
2. The cost involved in changing the die per run.
3. You actually reduced the entropy of the system by making a proportion of the keyspace predictable.
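Point 3 can be made concrete. Assuming a hypothetical 24-bit family/model/stepping field carved out of the 128-bit key:

```python
# If 24 of the 128 key bits encode family/model/stepping, an attacker who
# knows the target device no longer has to search those bits at all.
full_keyspace      = 2 ** 128
structured_bits    = 24                          # hypothetical split, for illustration
effective_keyspace = 2 ** (128 - structured_bits)

# The search space shrinks by a factor of 2^24 (~16.7 million)
assert full_keyspace // effective_keyspace == 2 ** 24
```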
Ring 0 code is ROM code and has supreme access to all chip resources, including reading the secret key. Once the Ring 1 code is authenticated, access to the secret key fuse bits is blocked.
So, do the read-only key fuses contain a hash for authentication or a secret key?
Ring 1 code has a copy of the secret key in memory, this means any Ring 1 bootloader must be a secure implementation; the reference implementation would be provided by Parallax and be clear and open. Ring 1 code must dispose of the secret key before executing Ring 2 code.
...
I realize people might get squeamish when they hear "Ring 1 code has the key", but that is necessary and accepted. Ring 1 code is written to be as bug free and secure as possible, if a user decides to implement their own Ring 1 code, they are on the hook for ensuring it is secure. RING 1 CODE MUST DISPOSE OF THE KEY!
So, since you explicitly say that the user can write their own, I will. Mine will write the key out to the serial port.
But wait...
It wouldn't have to. I run it in an emulator which ignores the code signing.
Sure, so where in your scenario does the key live?
You already used the fuse bits for your authenticating digest...
I think the SHA-256/AES-128 combo that pedward is talking about is the best approach.
Beau has found some room to add more fuse bits. Yesterday, he thought we could get over 160 bits into the chip. So, this will afford a 128-bit key, plus extra bits for the user to employ however he wants.
Here's how I understand it would all work:
1) On reset, the ROM code loads all the fuse bits into cog RAM, then trips the circuit to make them unreadable until the next reset.
2) If fuse bits indicate the chip is set for authentication mode:
a) A PASM booter is read from external EEPROM longs $000..$1F7 into hub RAM.
b) The booter is salted with the secret 128-bit key (from fuse bits) and then hashed with SHA-256.
c) The resulting hash is compared to external EEPROM longs $1F8..$1FF.
d) If the hash matches, the secret 128-bit key is put into hub RAM and cog0 is relaunched with the now-authenticated PASM booter code.
e) The authenticated booter, using the secret key conveyed in hub RAM, can now read and decrypt the rest of the EEPROM into RAM.
f) After destroying any copies of the key in hub RAM, the booter cog relaunches itself with final user code.
g) The user code now runs, having been decrypted from EEPROM with no trace of the process that loaded it.
Does this sound right?
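Steps 2a-2d above can be simulated in a few lines. This is only a sketch of the data flow, with a made-up fuse key and a zero-filled booter standing in for real PASM code:

```python
import hashlib

FUSE_KEY = bytes(range(16))              # made-up 128-bit secret key "blown into fuses"
BOOTER_LONGS = 0x1F8                     # EEPROM longs $000..$1F7 hold the booter

def make_eeprom_image(booter: bytes) -> bytes:
    # Build side: append SHA-256(booter || key) at longs $1F8..$1FF
    assert len(booter) == BOOTER_LONGS * 4
    return booter + hashlib.sha256(booter + FUSE_KEY).digest()

def rom_boot(eeprom: bytes) -> str:
    # Steps 2a-2d: read booter, salt with the fuse key, hash, compare
    booter = eeprom[:BOOTER_LONGS * 4]
    stored = eeprom[BOOTER_LONGS * 4:]
    if hashlib.sha256(booter + FUSE_KEY).digest() != stored:
        return "halt"                    # authentication failed: booter never runs
    return "run booter"                  # key may now be handed to the authenticated booter

image = make_eeprom_image(bytes(BOOTER_LONGS * 4))
assert rom_boot(image) == "run booter"

hacked = b"\xff" + image[1:]             # attacker patches one opcode
assert rom_boot(hacked) == "halt"
```

The point the simulation makes: any change to the booter, including one that would dump the key, fails the hash comparison before it ever executes.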
Parallax can provide an implementation of AES-128 which can automatically (through the tools) be implanted into your download to facilitate code security. The user could work some other scheme if he chose to, while only conforming to the authentication protocol (assuming the fuse bits dictated that authentication mode was set, otherwise no authentication and key passing occurs).
For the most part, that sounds good to me. Any way it could be expanded to 256 bits, though? That would allow 128 bits for salt for the SHA-256 verification, and 128 bits for an AES key.
One other feature that may, or may not, be used would be the ability of the early bootloader to set a mask on the I/O pins that can set an I/O pin high or low and prevent further changes after it's been booted. That way, someone could provide a bootloader to allow their hardware to accept user-modified code, but set a flag (ex. to light a red LED?) to make it harder to pass off a device with user code as authentic/original.
Any way it could be expanded to 256 bits, though? That would allow 128 bits for salt for the SHA-256 verification, and 128 bits for an AES key.
I don't think we have room for 256 physical fuses, though we already have 8 address bits allocated for their selection.
It seems to me that the 128-bit key used to validate the loader could be reused for AES-128 without any compromise. Is there an advantage to having another 128-bit key for the AES-128 decrypt?
Note: for best security the key should be user configurable, not hardcoded into the ROM. I think from a cryptanalysis perspective there are disadvantages to reusing the same key for the authentication and encryption and there might be keys which are considered weak for one purpose but not the other (although I think that's more applicable to public key crypto). However, given a sufficiently random key it should be "good enough". There should be literature for this.
Of a bigger concern (to me) is any salt / initialization vector. In many cases using a static salt / IV can make cryptanalysis easier, especially known plaintext attacks. Again, making this user configurable (and secure) is the best idea.
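One common way to avoid using the raw fuse key for both purposes is to derive separate subkeys from it. This sketch is not part of the proposal above, just an illustration using SHA-256 with domain-separation labels:

```python
import hashlib

fuse_key = bytes(range(16))      # single made-up 128-bit key from fuses

# Derive domain-separated subkeys so the authentication key and the
# cipher key differ, even though only one key is stored in fuses.
auth_key = hashlib.sha256(b"auth" + fuse_key).digest()[:16]
enc_key  = hashlib.sha256(b"encrypt" + fuse_key).digest()[:16]

assert auth_key != enc_key
assert len(auth_key) == len(enc_key) == 16   # both usable as AES-128 keys
```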
Question: what's the advantage of making the fuses only one-time readable? The intent of the authentication / encryption is to ensure the Prop2 only executes valid code. Therefore the fuses should be secure (barring extreme measures). The disadvantage is it prevents the user from re-using the keys unless they are stored in HUB RAM, which defeats the purpose of the one-time readability.
Chip,
My concern about reusing the fuse bits is that someone may want to use some other algorithm, or maybe they're just using the bits as configuration space and don't care about encryption. Are fuses really that big that you can't easily find room for them?
Eric,
Remember that we're talking about fuse bits here, so they are user-set. Cracking them will break one manufacturer's device, but not all PropII devices.
The SHA/AES scheme sounds like it will work. One additional advantage of having the loader in EEPROM is that it can be configured by the user for his system's maximum clock speed and/or EEPROM/Flash/FRAM data rate, thus decreasing boot times.
They are big because each fuse has to have a huge 3.3V PMOS device to blow it. Its channel width is around 300um.
Good point, Phil.
The chip's transistors couldn't handle that kind of voltage. Even 4V would be pushing it. As it is, 3.3V should blow the fuses.