A simpler alternative to my previous post is to have a word of lock bits, such that once a lock bit is set, the associate I/O pin cannot be state-changed without a system reset. An authenticated bootloader could set said lock bit on an I/O pin, and that I/O pin could have a circuit to make sure the user is aware that it's unauthenticated/non-stock code.
I think this also solves the problem of an insulin pump while still allowing end-user modifications for hardware.
That runs so against the grain that it's beyond my imagining.
But I suppose there are paranoid types out there who -- let's say for safety reasons -- don't want someone messing with the code in their device. So how about a two-tiered lockdown: one that pays attention to a "this is encrypted" bit in the downloaded code and one that assumes everything has to be encrypted.
-Phil
I was thinking about a machine that might control heavy equipment, for which there are potential liabilities if someone gets hurt. It's just an electro-litigal consideration. And yes, the encryption can be staged, so it needn't cause anyone undue headaches.
Thinking about it more, imagine a company that makes two or more devices, differing only in the installed firmware. Each model number uses a distinct load key, so that Company A who has the super-deluxe version can't share his firmware with Company B, who bought the entry-level version. Now, suppose that the manufacturer wants to write a secure test program that works on all models (or for all customers if each customer is keyed differently, say). For this it would be handy to have a master key that works across all of them. So maybe a four-level system would be better, which could be accomplished by having two keys:
0. Unsecured (both keys zero).
1. Secured, but can accept unsecured code (key A non-zero, key B zero).
2. Secured, but can accept master-keyed code (key A and B non-zero, but not equal).
3. Completely secured with a single key (key A and B non-zero and equal).
While I liked the simplicity of just loading one 504 block directly to cog while decrypting, this would be very easy to break.
Using the out-off order blocks, what I did not mention is that it would also be possible to overlap the data, thereby making the code far more difficult to break. eg Block 1 loads into $1040 (504) bytes. Block 2 loads into $1040-301 where 301 is decimal, so it overwrites block 1 by 203 bytes. Later in loading Block n loads into $1040+355, meaning block 1 now has the last 149 bytes overwritten as well. So in effect only bytes 203-354 were actually used, with the rest becoming useless decoy code.
Now, if we also can use coginit(0) to begin from anywhere in hub, the user can further decrypt with their own algorithm, such as copying the first 16 hub bytes from elsewhere, resulting in the fact that no code can be expected to contain anything specific.
Take the start record a little further and allow for this to be out of order and thismakes a really difficult code to break ontop of the actual encryption. In fact make the start record a data record with the start address as the last 4 bytes of the data, so all 504 bytes get loaded into hub and bytes 500_503 are the start addres, or they could be the actual start instruction as like lmm code.
Even if you knew what the first 16 bytes were it would take a very long time (decades or centuries) to determine what the 64-bit key was. You would have try the 16 billion*billion possible keys until you found the right one.
Dave Hein,
Modern GPUs can try many billions of combinations per second. 64bit keys are not really all the secure anymore. Might take overnight or a few days to break using a multi-GPU setup.
They are massively parallel. Think, 512-1536 parallel cores, crunching numbers (in one GPU chip).
If the download takes a whole second before you know whether or not it deciphered okay, how elaborate does the scheme really need to be? Never mind the 'levels' issue, for a moment.
You would send it (or it would read from SD or EEPROM) your code, which it would decipher using the secret key. It would check the deciphered code for okay-ness, and then execute it. If the code didn't validate, nothing would happen. I see no opportunity for data-in/data-out analysis, do you guys? You'd have to send it a whole encrypted program before you'd know if it worked, or not. If that process takes a second, it may take forever to crack. Is this wrong to think?
The initial user code could be constrained to one cog's worth, so that known text strings, let's say, couldn't be searched for.
I don't think it's safe to assume how long it takes to download into the Prop2 and decode will be any kind of hinderance. Using GPU general purpose coding, they can quickly just try every combination until they see prop2 code on the other side (easily identified). the key bit length needs to be longer to make this unreasonable.
I don't think it's safe to assume how long it takes to download into the Prop2 and decode will be any kind of hinderance. Using GPU general purpose coding, they can quickly just try every combination until they see prop2 code on the other side (easily identified). the key bit length needs to be longer to make this unreasonable.
Duh! I don't know how to think about this stuff, yet. Thanks for pointing that out.
Well... wait a minute... How would you know what you are looking for? You'd never know for sure, would you?
Chip,
Data that has been decrypted with the incorrect key is very similar to random data. Propeller assembly language, on the other hand, will have a lot of repeating data, such as most instructions containing the same execution flags and some instructions being very common, so by calculating the randomness of the data, it would be easy to determine if it was correctly decrypted. You could also see if it has a valid header, which is the most common way to determine if a file was decrypted correctly.
Dave Hein,
Modern GPUs can try many billions of combinations per second. 64bit keys are not really all the secure anymore. Might take overnight or a few days to break using a multi-GPU setup.
They are massively parallel. Think, 512-1536 parallel cores, crunching numbers (in one GPU chip).
So how long would it take a 10 GHz machine with 1000 cores to do 16 billion*billion encryptions? Let say a single encryption takes 100 cycles, so that would be 100 billion per second, which results in 16 billion/100 = 160 million seconds. That's about 5 years. So if you had a 10 GHz machine with 1000 cores it is feasible to crack a 64-bit key in a few years.
I cannot comment on encryption methods. I will leave that for the experts in this field.
But, if the code was incorrect, it would be nice for the prop to continue to process code as if it were correct until the load was exhausted, then look for the flag to see if there was a failure. If a failure, change the key, leave the invalid flag set, and then start again. That way, the prop does not in fact ever stop, so that it would indeed take at least as long as a valid decryption takes, if not longer, to realise it had failed. This would slow the brute force attack. And anyone trying to do some form of heat/power signature would not be getting the correct details after the first pass. Just a thought.
In order to get around the "static header" issue surely all one has to do is prepend a bunch of random bytes to the binary before encrypting. Real random that is. Then every time one encrypts the same program the actual binary for down load is different. More effective than shuffling the blocks. Prop just discards those salt bytes on decrypting.
Have a google for "encryption salt".
I would imagine that any form of public key encryption/authentication is off the table as those algorithms start to get huge.
We do seem to be looking at two (or more) different problems here:
1. How to stop an attacker copying your EEPROM code in order to clone your device or otherwise make use of your code.
2. How to stop an attacker loading his own code into a device for some evil purpose.
Cluso,
We cannot assume that the brute force attack is slowed down by having to load to an actual Propeller. One could be attempting billions of decryptions of a binary on a fast machine with no Propeller around. Just keep looking for something to come out that looks statistically like PASM instructions or Spin byte codes.
Hopefully if your algorithm is good this still takes an infeasibly long time.
Something just occurred to me reading this discussion.
What about encryption used as some attack. An open device, operating much as we do now, is produced. Somebody else has nefarious intent, offering an update, or perhaps the device gets updates and malicious code ends up in the Propeller. What if that code then sets the fuses, basically bricking the device?
Guess the questions are:
Does anyone care about that?
If so, is there an option to commit the Prop to "open" only, just as there will be to commit to "trusted" only?
I would imagine that any form of public key encryption/authentication is off the table as those algorithms start to get huge.
Man, but wouldn't it be cool? The secret key is generated on the prop and is never available to anyone. /drool. It also means that the customer wouldn't be able to have the key stolen from them.
We do seem to be looking at two (or more) different problems here:
1. How to stop an attacker copying your EEPROM code in order to clone your device or otherwise make use of your code.
2. How to stop an attacker loading his own code into a device for some evil purpose.
You'll never stop #2. The attacker just has to substitute the prop on the board for one without blown fuses and his own code. As such I'm not sure it's worth defending.
I guess I don't see any value in code signing / validation either for the same reason. The nature of these things is that you either end up with something correct or you end up with something blistering, horribly, obscenely wrong in every way imaginable.
Something just occurred to me reading this discussion.
What about encryption used as some attack. An open device, operating much as we do now, is produced. Somebody else has nefarious intent, offering an update, or perhaps the device gets updates and malicious code ends up in the Propeller. What if that code then sets the fuses, basically bricking the device?
Guess the questions are:
Does anyone care about that?
If so, is there an option to commit the Prop to "open" only, just as there will be to commit to "trusted" only?
Yes, you could just blow the master fuse, leaving the others at '0'. The master fuse would electrically inhibit any further programming. This could have the effect of leaving the device permanently "open".
Potatohead,
Yes I do worry about bricking my Propellers. Never mind any attacker as I said before I had a pile of unusable AVRs on the bench that I had somehow fused the wrong way. One of the reasons I gave up on the damn things for hobby tinkering.
just thinking out loud on your comment.. "Just keep looking for something to come out that looks statistically like PASM instructions or Spin byte codes." ....
What if the hashed code result in the EEPROM after encryption IS/ARE valid PASM instructions? Just not the instructions that were originally intended.
What i mean... suppose the encryption ONLY scatters the the instruction mnemonic of the OP code so that visually it may look like valid PASM code, but if you were to follow the flow of the code, then it would not make any sense.
Seems to me that would suggest something about the encryption used, lowering the bar for an attack because for it to look like real instructions would imply far fewer valid possibilities than just numbers would.
Curious to hear what others say on this, because I like the idea otherwise.
Hmm...I have a suspicion that only partially encrypting stuff like that might not be a good idea. What if the "secret stuff" I'm trying to hide is not PASM instructions or Spin byte codes? Perhaps it's ZPU byte codes for ZOG or perhaps it's not code at all but some other kind of data. It would seem to be better that the encryption is not dependent on the thing you are encrypting.
@Heater, yeah! Bricking them that way is the worst. They are fine otherwise, just useless. On the other hand, bricking them by letting the smoke out does render them really useless as opposed to just technically useless, LOL!!
@Chip, thanks. Makes perfect sense. So then, an uncommitted Propeller is something to be careful with! Whatever gets done, it's probably a good idea to supply a basic object / code that will commit a chip to open code easy cheezy to cut down on mistakes.
The 65th bit seems fine to me. You can choose open (default), locked open (b64=1, b0-63=0), locked encrypted (b64=1, b0-63=key) and of course unlocked encrypted (b64=0, b0-63=key or user).
But it then allows us to use the fuses differently too.
Now, can I suggest something slightly different to the above.
b64=0: Unlocked = fuses can be blown, No decryption. b0-63 passed to user code after loading (maybe $0 unprogrammed)
b64=1: Locked =fuses cannot be changed. b0-63=0 means no decryption. b0-63<>0 means decryption, key cannot be read.
The only problem I see with this suggestion is that you must program the key and lock to use decryption.
Perhaps with decryption, the I, S and D could only be transformed by using one fuse bit. i.e. It keeps the other 5 bits unchanged ?
Is there an update as to when the possible release of the Prop2 will be? The first post mentions November of... oh, that was 2 months ago. lol
Thanks,
David
@potatohead - "Seems to me that would suggest something about the encryption used" - Wouldn't a standard encryption method do the same thing?
"...lowering the bar for an attack because for it to look like real instructions would imply far fewer valid possibilities than just numbers would" - Actually, it would be about the same, assuming that each manipulated bit in the instruction represented a valid instruction and there were no holes. The fact that it would 'look' like a valid instruction would add to the non-deterministic complexity of trying to decipher it, because the hash produces valid code commands, each branch you follow in the code would send you on a goose chase.
Essentially the idea would be the same with any encryption method, but what would be different would be "splitting" the encrypted string across ONLY the bits correlating to the instruction identifier.
@Heater - "Hmm...I have a suspicion that only partially encrypting stuff like that might not be a good idea. What if the "secret stuff" I'm trying to hide is not PASM instructions or Spin byte codes?" - Sadly it's gonna be hard to please everybody :-)
Something else to think about. If you are using external RAM and put you code in the external RAM (xLMM) then the code security on the prop isn't going to help you.
Here is a simple example of just XORing the Instruction register against a 'key' and nothing else.... the xor'd result is a direct translation to the code, but it could just as well be symbolic, meaning that it would translate to a different instruction even further. This would prevent any 'hole' in case there was a hashed result that didn't match an existing instruction. This happened twice in my example below, and for demonstration purposes I reversed the resulting instruction bits on the two that didn't have a match.
I'd just be curious if someone could figure out the original code ... after all this is a simple XOR encryption... the 'real' thing would implement a much harder sequence.
DAT
PASM org
sar dira, #%1
negnz :T1, :P1
neg :T1, cnt
cmpsub :C1 , #16
:L1 addabs :T1, :P1
muxnc outa, #%1
negc :C1, #:L1
ror :C1, #16
:L2 sumnc :T1, :P2
muxnc outa, #%1
negc :C1, #:L2
or :C1, #16
xor #:L1
:T1 long 0
:P1 long 298
:P2 long 322
:C1 long 0
Dave,
It's not inconceivable to have a farm of GPUs to work on the problem. With the latest AMD GPU you have 2048 stream processors, running at about 1Ghz (925Mhz default, many cards with push that above 1Ghz). So 5 of those roughly matches the 1000 10Ghz cores, which you said would take about 5 years. 100 of them would do it in 3 months. At $500 a piece, that's only $50,000 for the GPU cards. It's not practical, but it's well within reach of many.
I don't think we need to jump to 128bit, going to 72 or 80 would be sufficient for the next 10 years or so.
Comments
I think this also solves the problem of an insulin pump while still allowing end-user modifications for hardware.
This is an ATI Radeon 7970 card
I was thinking about a machine that might control heavy equipment, for which there are potential liabilities if someone gets hurt. It's just an electro-litigal consideration. And yes, the encryption can be staged, so it needn't cause anyone undue headaches.
Thinking about it more, imagine a company that makes two or more devices, differing only in the installed firmware. Each model number uses a distinct load key, so that Company A who has the super-deluxe version can't share his firmware with Company B, who bought the entry-level version. Now, suppose that the manufacturer wants to write a secure test program that works on all models (or for all customers if each customer is keyed differently, say). For this it would be handy to have a master key that works across all of them. So maybe a four-level system would be better, which could be accomplished by having two keys:
1. Secured, but can accept unsecured code (key A non-zero, key B zero).
2. Secured, but can accept master-keyed code (key A and B non-zero, but not equal).
3. Completely secured with a single key (key A and B non-zero and equal).
-Phil
Verified bootloader A accepts code A and test Z
Verified bootloader B accepts code B and test Z
Chip A verifies that bootloader A can be verified with public key in fuse bits in chip A
Chip B verifies that bootloader B can be verified with public key in fuse bits in chip B
Using the out-off order blocks, what I did not mention is that it would also be possible to overlap the data, thereby making the code far more difficult to break. eg Block 1 loads into $1040 (504) bytes. Block 2 loads into $1040-301 where 301 is decimal, so it overwrites block 1 by 203 bytes. Later in loading Block n loads into $1040+355, meaning block 1 now has the last 149 bytes overwritten as well. So in effect only bytes 203-354 were actually used, with the rest becoming useless decoy code.
Now, if we also can use coginit(0) to begin from anywhere in hub, the user can further decrypt with their own algorithm, such as copying the first 16 hub bytes from elsewhere, resulting in the fact that no code can be expected to contain anything specific.
Take the start record a little further and allow for this to be out of order and thismakes a really difficult code to break ontop of the actual encryption. In fact make the start record a data record with the start address as the last 4 bytes of the data, so all 504 bytes get loaded into hub and bytes 500_503 are the start addres, or they could be the actual start instruction as like lmm code.
Modern GPUs can try many billions of combinations per second. 64bit keys are not really all the secure anymore. Might take overnight or a few days to break using a multi-GPU setup.
They are massively parallel. Think, 512-1536 parallel cores, crunching numbers (in one GPU chip).
If the download takes a whole second before you know whether or not it deciphered okay, how elaborate does the scheme really need to be? Never mind the 'levels' issue, for a moment.
You would send it (or it would read from SD or EEPROM) your code, which it would decipher using the secret key. It would check the deciphered code for okay-ness, and then execute it. If the code didn't validate, nothing would happen. I see no opportunity for data-in/data-out analysis, do you guys? You'd have to send it a whole encrypted program before you'd know if it worked, or not. If that process takes a second, it may take forever to crack. Is this wrong to think?
The initial user code could be constrained to one cog's worth, so that known text strings, let's say, couldn't be searched for.
Duh! I don't know how to think about this stuff, yet. Thanks for pointing that out.
Well... wait a minute... How would you know what you are looking for? You'd never know for sure, would you?
Data that has been decrypted with the incorrect key is very similar to random data. Propeller assembly language, on the other hand, will have a lot of repeating data, such as most instructions containing the same execution flags and some instructions being very common, so by calculating the randomness of the data, it would be easy to determine if it was correctly decrypted. You could also see if it has a valid header, which is the most common way to determine if a file was decrypted correctly.
— David Carrier
Maybe we need 128 bits.
But, if the code was incorrect, it would be nice for the prop to continue to process code as if it were correct until the load was exhausted, then look for the flag to see if there was a failure. If a failure, change the key, leave the invalid flag set, and then start again. That way, the prop does not in fact ever stop, so that it would indeed take at least as long as a valid decryption takes, if not longer, to realise it had failed. This would slow the brute force attack. And anyone trying to do some form of heat/power signature would not be getting the correct details after the first pass. Just a thought.
Have a google for "encryption salt".
I would imagine that any form of public key encryption/authentication is off the table as those algorithms start to get huge.
We do seem to be looking at two (or more) different problems here:
1. How to stop an attacker copying your EEPROM code in order to clone your device or otherwise make use of your code.
2. How to stop an attacker loading his own code into a device for some evil purpose.
We cannot assume that the brute force attack is slowed down by having to load to an actual Propeller. One could be attempting billions of decryptions of a binary on a fast machine with no Propeller around. Just keep looking for something to come out that looks statistically like PASM instructions or Spin byte codes.
Hopefully if your algorithm is good this still takes an infeasibly long time.
What about encryption used as some attack. An open device, operating much as we do now, is produced. Somebody else has nefarious intent, offering an update, or perhaps the device gets updates and malicious code ends up in the Propeller. What if that code then sets the fuses, basically bricking the device?
Guess the questions are:
Does anyone care about that?
If so, is there an option to commit the Prop to "open" only, just as there will be to commit to "trusted" only?
Man, but wouldn't it be cool? The secret key is generated on the prop and is never available to anyone. /drool. It also means that the customer wouldn't be able to have the key stolen from them.
You'll never stop #2. The attacker just has to substitute the prop on the board for one without blown fuses and his own code. As such I'm not sure it's worth defending.
I guess I don't see any value in code signing / validation either for the same reason. The nature of these things is that you either end up with something correct or you end up with something blistering, horribly, obscenely wrong in every way imaginable.
Yes, you could just blow the master fuse, leaving the others at '0'. The master fuse would electrically inhibit any further programming. This could have the effect of leaving the device permanently "open".
Yes I do worry about bricking my Propellers. Never mind any attacker as I said before I had a pile of unusable AVRs on the bench that I had somehow fused the wrong way. One of the reasons I gave up on the damn things for hobby tinkering.
just thinking out loud on your comment.. "Just keep looking for something to come out that looks statistically like PASM instructions or Spin byte codes." ....
What if the hashed code result in the EEPROM after encryption IS/ARE valid PASM instructions? Just not the instructions that were originally intended.
What i mean... suppose the encryption ONLY scatters the the instruction mnemonic of the OP code so that visually it may look like valid PASM code, but if you were to follow the flow of the code, then it would not make any sense.
Curious to hear what others say on this, because I like the idea otherwise.
@Chip, thanks. Makes perfect sense. So then, an uncommitted Propeller is something to be careful with! Whatever gets done, it's probably a good idea to supply a basic object / code that will commit a chip to open code easy cheezy to cut down on mistakes.
But it then allows us to use the fuses differently too.
Now, can I suggest something slightly different to the above.
b64=0: Unlocked = fuses can be blown, No decryption. b0-63 passed to user code after loading (maybe $0 unprogrammed)
b64=1: Locked =fuses cannot be changed. b0-63=0 means no decryption. b0-63<>0 means decryption, key cannot be read.
The only problem I see with this suggestion is that you must program the key and lock to use decryption.
Perhaps with decryption, the I, S and D could only be transformed by using one fuse bit. i.e. It keeps the other 5 bits unchanged ?
Thanks,
David
"...lowering the bar for an attack because for it to look like real instructions would imply far fewer valid possibilities than just numbers would" - Actually, it would be about the same, assuming that each manipulated bit in the instruction represented a valid instruction and there were no holes. The fact that it would 'look' like a valid instruction would add to the non-deterministic complexity of trying to decipher it, because the hash produces valid code commands, each branch you follow in the code would send you on a goose chase.
Essentially the idea would be the same with any encryption method, but what would be different would be "splitting" the encrypted string across ONLY the bits correlating to the instruction identifier.
@Heater - "Hmm...I have a suspicion that only partially encrypting stuff like that might not be a good idea. What if the "secret stuff" I'm trying to hide is not PASM instructions or Spin byte codes?" - Sadly it's gonna be hard to please everybody :-)
I'd just be curious if someone could figure out the original code ... after all this is a simple XOR encryption... the 'real' thing would implement a much harder sequence.
It's not inconceivable to have a farm of GPUs to work on the problem. With the latest AMD GPU you have 2048 stream processors, running at about 1Ghz (925Mhz default, many cards with push that above 1Ghz). So 5 of those roughly matches the 1000 10Ghz cores, which you said would take about 5 years. 100 of them would do it in 3 months. At $500 a piece, that's only $50,000 for the GPU cards. It's not practical, but it's well within reach of many.
I don't think we need to jump to 128bit, going to 72 or 80 would be sufficient for the next 10 years or so.