Your point is taken; authentication of the ciphertext wasn't considered a requirement, because the ciphertext is unprivileged and the requirement was to hide the plaintext.
The attack you describe would require copying blocks from the ciphertext that you know will perform some action. If you don't know what each block does, you can't really expect it to perform an action.
Also, you would have to tailor it perfectly so that corrupted instructions don't lead to improper operation.
I described multiple attacks. The latest one only requires finding a region of the image which is output by the code. It can be a bitmap that goes to a display, a string that gets printed or sent out to a serial port, etc. Once you find such a region (which you can do by corrupting selected blocks and looking at the output), you only need to copy sets of encrypted blocks from the rest of the image (or from other images encrypted with the same key) into this region. They will be decrypted normally (as CBC only cares about the previous cipher block) and will be output instead of the data (bitmap/font/string etc.) that you replaced. Rinse and repeat, and you will be able to decrypt all the blocks in the image.
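To make the CBC property concrete, here is a toy sketch (pure Python; a deliberately fake one-block "cipher" stands in for AES, since only the CBC chaining structure matters for the point). Copying a run of ciphertext blocks into the output region garbles the first transplanted block but yields the original plaintext for every block after it, no key required:

```python
import hashlib

BLOCK = 16

def toy_block_encrypt(key, block):
    # NOT a real cipher: XOR with a key-derived pad, just to give CBC a
    # concrete block permutation for the demo. Any block cipher behaves
    # the same way with respect to the chaining argument below.
    pad = hashlib.sha256(key).digest()[:BLOCK]
    return bytes(a ^ b for a, b in zip(block, pad))

toy_block_decrypt = toy_block_encrypt  # XOR pad is its own inverse

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def cbc_encrypt(key, iv, plaintext):
    out, prev = [], iv
    for i in range(0, len(plaintext), BLOCK):
        prev = toy_block_encrypt(key, xor(plaintext[i:i + BLOCK], prev))
        out.append(prev)
    return b"".join(out)

def cbc_decrypt(key, iv, ciphertext):
    out, prev = [], iv
    for i in range(0, len(ciphertext), BLOCK):
        c = ciphertext[i:i + BLOCK]
        out.append(xor(toy_block_decrypt(key, c), prev))
        prev = c
    return b"".join(out)

key, iv = b"secret-key", bytes(BLOCK)
# 4-block "image": blocks 0-1 are code, blocks 2-3 get displayed/output
image = (b"CODE-BLOCK-AAAA!" b"CODE-BLOCK-BBBB!"
         b"SHOWN-BLOCK-CCC!" b"SHOWN-BLOCK-DDD!")
ct = cbc_encrypt(key, iv, image)

# Attacker copies ciphertext blocks 0-1 over the displayed region (2-3)
spliced = ct[:2 * BLOCK] + ct[0:2 * BLOCK]
pt = cbc_decrypt(key, iv, spliced)

# Block 2 decrypts to garbage (wrong chaining input), but block 3 now
# "displays" the plaintext of code block 1 -- recovered without the key.
assert pt[3 * BLOCK:4 * BLOCK] == image[1 * BLOCK:2 * BLOCK]
```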
Yes, it won't work for every application, but it will make a nice blog post or a presentation at a security conference the first time it does.
In short, even if you had the plaintext, crafting the ciphertext to do something malicious is very difficult. If you didn't have the plaintext, crafting an attack based on just the ciphertext would be a monumental undertaking.
I'm not discounting that it *could* be done, I'm saying that the likelihood of it actually happening is slim.
That said, I think we can find space to put a hash in there so that corruption of the memory image can be detected.
If you can get some or all of the plaintext as described above, you can a) find bugs/weaknesses to exploit, b) fill in the missing parts and clone the device, or c) reverse engineer the protocol or extract whatever secrets/keys are hidden in the encrypted image (which are some of the reasons why people want code protection).
Given that the purpose of this exercise is to get the design/draft reviewed so that potential issues can be identified, it seems to be working. You have an issue or two which you can evaluate and decide whether to take the risk or fix before things are set in silicon, which is much easier than trying to fix them after the fact.
I am already quite impressed by the design you came up with, and opening it up for review raises my confidence that you are taking this feature seriously. I think the biggest issue is the lack of authentication, and from your response it seems it would be possible to fix. Another "feature" that would be useful is to extend the "trust" to authenticated code by giving it a key derived from the master key (plus the serial, perhaps). This would let protected code do many cool things, such as decrypting and authenticating external code, implementing its own "protected storage", and authenticating itself to other devices.
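To illustrate the kind of derivation I mean (one common software pattern, HMAC-SHA256 used as a one-way KDF; the names and label format are made up for illustration, not a proposal for the exact silicon mechanism):

```python
import hashlib
import hmac

def derive_key(master_key: bytes, serial: bytes, purpose: bytes) -> bytes:
    # HMAC as a one-way KDF: the derived key is bound to this device's
    # serial and a purpose label, and reveals nothing about the master key.
    return hmac.new(master_key, serial + b"|" + purpose,
                    hashlib.sha256).digest()[:16]

master = b"\x00" * 16  # placeholder for the fused master key
storage_key = derive_key(master, b"SN0001", b"protected-storage")
auth_key = derive_key(master, b"SN0001", b"device-auth")
assert storage_key != auth_key  # each purpose gets an independent key
```

Handing authenticated code only such derived keys means even a bug in the protected application can't leak the master key itself.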
On the other hand, these two are related to the ring-1 code, so those who really care about security may be able to implement their own even if you decide not to. I am more concerned about those who won't have the expertise to do so correctly.
I'm not opposed to the change, which is ring-1 boot loader code. I just wanted to point out that the reality is much more difficult than the theory. I want to avoid the ROM code reading more than 512 longs, so the customer can use a small flash chip to boot to some other media.
I understand the constraints. There is always a tradeoff. I think you are doing the right thing. Even if you decide not to do authentication by default, just by documenting it properly, you are letting your customers know the risk. They can decide whether to accept the risk, and if not will know how to fix it.
Thanks again for doing this.
PS: For some reason I keep wondering whether EAX mode would fit in a single cog or not. I should give it a try sometime.
Correct me if I'm wrong, but wouldn't decent code authentication (which I'm a big proponent of us doing) squash someone's attempts to locate known bit/byte patterns (i.e., a string of text or a bitmap) by successively injecting different bit/byte patterns into the code and re-running it?
It doesn't have to be complicated, but shouldn't be too simplistic either. A good polynomial-based CRC like CRC16 would be great for that, I think, and would be simple to implement in hardware... thus it would be a simple matter of clearing the CRC result, piping all decrypted byte values through the CRC16 hardware (as they are decrypted) and comparing the final result with that which is stored in the encrypted image itself. For that matter, CRC32 would be better... and it would be great to have both CRC16 and CRC32 implemented in hardware anyway to facilitate easy and reliable error detection in chip-to-chip or chip-to-device communication.
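For reference, the bit-serial form of such a CRC16 is tiny; here's a sketch of one common variant (CRC-16/CCITT-FALSE, polynomial 0x1021, initial value 0xFFFF; the actual polynomial and init choice would be a design decision). The shift-and-XOR loop is exactly the structure a small hardware LFSR implements:

```python
def crc16_ccitt(data: bytes) -> int:
    # Bit-serial CRC-16/CCITT-FALSE: shift the register left one bit at
    # a time, XORing in the polynomial whenever the top bit falls out.
    crc = 0xFFFF
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

# Standard check value for this CRC variant
assert crc16_ccitt(b"123456789") == 0x29B1

# Usage sketch: pipe decrypted bytes through, compare to the stored value
decrypted_image = b"example decrypted program image"
stored_crc = crc16_ccitt(decrypted_image)  # value the image would carry
assert crc16_ccitt(decrypted_image) == stored_crc
```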
Okay, perhaps my thought of using CRC-based validation of code after decryption is not a good one. This is addressed in the Wikipedia article on CRC (see bold text especially):
CRCs are specifically designed to protect against common types of errors on communication channels, where they can provide quick and reasonable assurance of the integrity of messages delivered. However, they are not suitable for protecting against intentional alteration of data. Firstly, as there is no authentication, an attacker can edit a message and recalculate the CRC without the substitution being detected. This is even the case when the CRC is encrypted, leading to one of the design flaws of the WEP protocol.
Although, I'm still trying to figure out why this would be the case in a system where the encryption algorithm is sufficiently secure and the keyphrase is always private. Can anyone explain this to me? [EDIT]: In fact, now that I've read this statement over and over again, I'm not convinced it applies to how I was suggesting CRC be used for code integrity validation. Of course, calculating a CRC on a payload and encrypting "the CRC" itself doesn't prevent someone from successfully changing the payload, but encrypting everything properly, with an always-private key, seems it would solve our concerns. (I'm no expert of this... just logically thinking it through).
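One way to see the problem is CRC's linearity: for equal-length messages, the CRC of a tampered message is predictable from the original CRC and the bit flips alone, without ever seeing the message. Under a stream cipher (WEP used RC4), flipping ciphertext bits flips the same plaintext bits, so an attacker can flip payload bits and patch the encrypted CRC to match, all without the key. A quick sketch of the linearity property (message contents made up):

```python
import zlib

a = b"transfer $0000100 to account 1234"
b = b"transfer $9999999 to account 1234"
delta = bytes(x ^ y for x, y in zip(a, b))  # attacker-chosen bit flips
zeros = bytes(len(a))

# Linearity over XOR: the new CRC is computable from the old CRC and
# the flips alone -- no need to know the (possibly encrypted) message.
predicted = zlib.crc32(a) ^ zlib.crc32(delta) ^ zlib.crc32(zeros)
assert predicted == zlib.crc32(b)
```

Whether that translates directly to a CBC design depends on the details (a ciphertext bit flip under CBC garbles a whole plaintext block rather than flipping chosen bits); a real MAC avoids having to reason about such cases at all.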
Also...
When stored alongside the data, CRCs and cryptographic hash functions by themselves do not protect against intentional modification of data. Any application that requires protection against such attacks must use cryptographic authentication mechanisms, such as message authentication codes.
and I think there was talk of using MACs, so that's probably the best thing.
Does using MACs for authentication also give us reasonable (or superior) validation of the integrity of the code as well? If so, that would be a great feature to have because customers have occasionally worried about their non-volatile-based code image becoming corrupted via natural or human event and don't want the application executing in such a case.
... and as such, if this is how we'll implement code integrity checking, we need the MAC or CRC (or whatever mechanism) to check code integrity even if encryption is not enabled, and it must not include areas of the non-volatile image likely to be subject to change through the run-time operation of the application (ie: intersession persistent storage) or it will break the application after the first run. If that last requirement proves impossible or awkward to implement, perhaps one bit of the fuses could be used to optionally toggle code integrity checking upon bootup.
There are a few ways to do code authentication of the encrypted code. One method involves changing the cipher mode from CBC to one of the authenticated cipher modes. I briefly looked at these, but didn't find anything right away that was a slam dunk. The advantage of these systems is that the MAC is built into the decryption process, so it all streams at the same time.
Another method is to load the ciphertext into HUB memory and run a SHA-256 MAC on the ciphertext, since we already have that established for the lower-level code. There are two problems with this method. First, we need to find space for the IV and the hash, which we are 16 bytes short on at the moment. I'm leaning towards putting the hash in the first 8 longs of the program space, so if an SD bootloader is written, it will follow the same procedure, and the validation of the program code stays with the program code and isn't mixed with the boot-loader code. This way you can boot multiple firmware images without reflashing the boot-loader.
Speaking of reflashing, this presents a bit of a problem. The bootloader is only 2KB, but the sector size is 4KB, so when writing new firmware you have to read-modify-write (RMW) the first sector.
The second issue with SHA-256 MAC on the ciphertext is COG space. Presently I don't think there would be space for the AES-128 and SHA-256 algorithms in 1 COG. This would require that the bootloader launch a second COG to do the MAC. It would also require either 4KB for the bootloader or the bootloader would require a program image be compiled in and then copied into the other COG.
You would write the bootloader with stub code in it, copy it to HUB RAM, copy the SHA-256 hash from the ROM memory, then launch the MAC COG, load the ciphertext into HUB RAM, then signal the MAC COG to validate the ciphertext and wait for the result.
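For what it's worth, the flow described above amounts to encrypt-then-MAC: tag the ciphertext, verify before decrypting anything. A software sketch of just the check (HMAC-SHA256 here stands in for the keyed SHA-256 MAC; the key/IV names and sizes are illustrative assumptions, and the AES-CBC decryption itself is not shown):

```python
import hashlib
import hmac
import secrets

def seal(mac_key: bytes, iv: bytes, ciphertext: bytes) -> bytes:
    # Encrypt-then-MAC: the tag covers IV + ciphertext, so neither can
    # be swapped or spliced independently of the other.
    return hmac.new(mac_key, iv + ciphertext, hashlib.sha256).digest()

def verify(mac_key: bytes, iv: bytes, ciphertext: bytes, tag: bytes) -> bool:
    # Constant-time compare; reject the image before decrypting anything.
    return hmac.compare_digest(tag, seal(mac_key, iv, ciphertext))

mac_key = secrets.token_bytes(16)
iv = secrets.token_bytes(16)
ciphertext = secrets.token_bytes(512)  # stand-in for the encrypted image

tag = seal(mac_key, iv, ciphertext)
assert verify(mac_key, iv, ciphertext, tag)

# The block-copy splice from earlier in the thread now fails verification
tampered = ciphertext[:32] + ciphertext[0:16] + ciphertext[48:]
assert not verify(mac_key, iv, tampered, tag)
```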
That's a really complicated approach. I'm more inclined to implement a different cipher mode than CBC.
No, I haven't seen that yet. Yes, seems expensive, but I didn't see anything special about the pictures I saw doing a quick search. Can you send me a link?
I have not seen any pics that really show how the chips stack. It is only a single RAM chip that goes on top. They have decided on both versions to use 256MB. So I would think the ARM packaging has an exposed top like a QFN. So maybe QFN on both sides. I just thought it was really neat and could be a way of the future: 3D-stackable chips. Given the Pi's cost, although I thought the concept would be expensive, maybe it's not.
It certainly could be an excellent concept. Use a standard fast ram on top of a P2 for the hub memory??? Or P3 with larger cog ram and external hub ram.
Post-edit: Just read the next post with a link. Seems it is a BGA-type package. My above comments still apply.
Encryption: Would using the CRC outside the decryption work? I see how using it inside the decryption could allow it to be compromised (I think).
TI has a whitepaper about difficulties they had fabbing the first rev of the BeagleBoard. Since it also uses POP flash/RAM, it may prove illuminating for you. It seems JEDEC has a standard pinout for LPDDR which matches what's on all the POP-capable processors.