All this is very interesting. Chip implied here http://forums.parallax.com/showthread.php?p=593557 that that RDLONG was used to load the cog. Obviously when loading from ROM an RDLONG is not being used or else code would not be scrambled. Along the lines that hippy suggested, maybe 9 bit areas are swapped around? So the instruction bits are swapped with the destination or source parts of the instruction?
1.Start a new spin cog.
2.Stop that cog
3.Start it again with an instruction to return a long from the cog.
4.Read all the remaining longs that way and output them to something or store them in hub ram.
5.Guess what you've overwritten with the instruction,
or copy and compare the image in HUB RAM with the image in ROM to guess the encryption key.
I'm assuming cog RAM doesn't clear. If it does then there is a way to protect other code.
Hmmm. Maybe the cog can be reset after just a few longs and be started then?
Sorry I haven't learned enough PASM yet to evaluate all the likely loop holes.
Chip wrote:
> The next Propeller will hopefully solve this problem, once and for all.
Hmm ... do I have to read that as EEPROM? IIRC, it can't be done with the currently rotating Propeller's process.
Or some kind of OTP? If so, hopefully not the "external" ROM.
Nick
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Never use force, just go for a bigger hammer!
The DIY Digital-Readout for mills, lathes etc.: YADRO
There are no loopholes.
Some threads here from time to time show the need for IP protection even in this small community. My idea 6 monthes ago - when I met the Propeller - was that Parallax had all the ingredients.
I expected from month to month that someone would come up with a disassembled SPIN interpreter.
For Parallax - selling to commercial customers - it would be a very serious argument that for more than a year now, a bunch of highly gifted programmers had not succeeded to break the IP barrier for the interpreter!
I was not aware that no one really tried this seriously
''MPE_SECRETSPIN_002 2007-07-29 by deSilva
'
' Checks for some pecularities of encrypted SPIN Interpreter
CON
_clkmode = xtal1 + pll8x ' Hydra!
_xinfreq = 10_000_000
OBJ
pc : "PC_Interface"
VAR LONG a, b, x
PUB Main | stack[noparse][[/noparse]20]
pc.start(31,30) 'start the PropTerminal
pc.setcol(1)
REPEAT
pc.str(string(13, "COGNEW in µs..."))
a := CNT
cognew(@asm, @x)
b := CNT
checkIt(string(" ASM: "))
a := CNT
cognew(spin, @stack)
b := CNT
checkIt(string(" SPIN: "))
PRI checkit(what)
pc.str(what)
waitcnt(cnt+clkfreq)
pc.dec((x-a)/80)
pc.out(" ")
pc.dec((x-b)/80)
PRI spin
x := cnt
DAT
asm
MOV CNT, CNT
WRLONG CNT, PAR
COGID 0
COGSTOP 0
Results: I am getting are 114(9160),100(8008) for ASM and 317(25364),104(8356) from Spin. Numbers in brackets are the actual number of cycles. The first number is the time it takes for the cog to start for the ASM and the time it takes to start + maybe a little bit more for spin (have to allow for the instruction fetch+jmp to instruction decoding+setting up the stack).
deSilva, it would also be interesting to get an ASM cog to start another ASM cog and spin cog to see how different they are.
edit: just did b-a which is the time taken the do the cognew instruction. Got 1152 for the assembly cog and 17008 for the spin cog so something very interesting is going on. (clock cycles not us)
Post Edited (stevenmess2004) : 2/21/2008 1:01:00 AM GMT
There is some presetting of the stackframe for the new SPIN-COG, as discussed in Paul's thread the other day. But this will not explain the 200 µs difference, sufficient to reload a COG twice!
Interestingly it only takes 104 µs (or 8350 ticks) after the COGNEW for SPIN is issued until the SPIN program responses, so in fact there CAN BE NO 2-tiere decrytion using a (decripting) pre-loader. As we know how to activate SPIN form ASM this could be double checked, but I think this gives no new insight..
I think this is been part of the bootstrap loader...
' Start up oscillator and begin Spin program at $0010
' (assumes EEPROM data already loaded into $0000-$7FFF)
' (assumes currently running RCFAST oscillator)
' (assumes this is COG 0 - gets rebooted)
'
rdbyte address,#$0004 'if xtal/pll enabled, start up now
and address,#$F8 '..while remaining in rcfast mode
clkset address
:delay djnz time_xtal,#:delay 'allow 20ms @20MHz for xtal/pll to settle
rdbyte address,#$0004 'switch to selected clock
clkset address
coginit interpreter 'reboot cog with interpreter
time_xtal long 20 * 20000 / 4 / 1 '20ms (@20MHz, 1 inst/loop)
interpreter long $0001 << 18 + $3C01 << 4 + %0000
address res 1
This time is spent in the SPIN interpreter, part of it needed for the stack frame organization, as said.
However 200 µs is an extremely long time, good for 10 to 20 lines of SPIN generally. This is factor "17" you noticed, in contrast to the one SPN line for the ASM-COGNEW.
The most simple indicator for the quality of encryption is its deviation from noise.
I just took some minutes to count the percentage of zero bytes in the first 2000 bytes of:
- RAM ($0): 18% : This is the generally observed redundance of slightly compressed code and used data
- BootLoader ($F800): 6%
- SPIN ($F000): 4.4%: both values indicate a not so perfect uniform distribution, which gives hope
Probably something really simple like XORing every byte with $42. Actually it would have to be something simple as many countries (France is one, I believe) prohibit the import or possession of "strong" encryption. It will be interesting to see what the results are.
EDIT: I like Hippy's idea of bit swapping. As anyone who has used DES knows, this is easy to do in hardware and difficult in software. I think it was also used on some of the earlier video games which used removable cartridges. The implementation could be something like two data busses connecting the ROM to the rest of the chip. One is a straightforward connection for plaintext data. The other has the rearranged connections for the "scrambled" data. Tri-state drivers on each end would switch between the two as needed. And since there is no keying material it avoids all the legal entanglements of true encryption. Of course I don't know if this is how it is actually done but it would be a simple and clever method.
Now for more theory time. A cog only loads 496 longs. However, there is 511 longs for the interpreter and 516 for the boot loader. Now, if the cog is loaded sequently starting at $F004 and $F800 then there must be 15 and 16 longs respectively at the end that are either used for something else or do nothing. If they do nothing then they could be very useful for figuring out how to crack this. (i.e. if they are all one value then we could try a XORing the rest of the memory with them)
This is all a learning experience for me. Never done anything like this before
Edit: Just checked and the values are not the same so this won't work. Time to try something else.
Post Edited (stevenmess2004) : 2/21/2008 6:57:14 AM GMT
What if you load a cog starting at $FFF0? Will the descramble stay on and rollover to RAM 0 where some code might be?
If not, that code might run and be able to display descrambled data from fff0-ffff to compare with the scrambled code.
VIRAND said...
What if you load a cog starting at $FFF0? Will the descramble stay on and rollover to RAM 0 where some code might be?
If not, that code might run and be able to display descrambled data from fff0-ffff to compare with the scrambled code.
If we write something in assembly this should be possible. May have to have a look at a serial object to see what I can do.
(a) Q: How does the hardware know it has to "decrypt" something in the first place?
(b) When loading cells 496 to 511 the HUB data is suppressed and substituted by 0, in a normal load. However it is possible the COG loading does something special when in decryption mode.
(c) First do a simple statistical analysis on bit and long basis to learn the distribution!
(a) don't know but I wouldn't mind betting that it is either everything in the ROM, everything after $F004 or just when $F004 or $F800 which is not really helpful
(b) Possibly, if using an XOR then the mask may be in there
(c) Distribution of what? certain instructions? 1s and 0s? zcr flags?
Like I said a few posts above I've never worked on anything like this before so don't really have any good idea of what to try so any suggestions will be accepted
At first I figured that it shouldn't be that difficult if you assume that the first instruction is a MOV, exept...
There's a lot of possible variations on that instruction, so trying to 'backtrack' and find the XOR value(If that's what is used) isn't that easy.
Also, does it XOR with the same value all the time, does it use the result of the previous operation, or maybe a variation?
(Shifting the XOR one bit to the left for each operation, maybe)
As for what code to expect...
I have a sneaking suspicion that Chip is even more clever than is normally assumed, and his programming 'style' depends more on what he is trying to achieve, than how he prefers to work.
In other words...
I'm getting a headache just thinking about this...
If the code is ever revealed, would it be possible to get the listing printed on T-shirts, with the heading '496 Longs, who needs more?'
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Don't visit my new website...
Gadgetman said...
If the code is ever revealed, would it be possible to get the listing printed on T-shirts, with the heading '496 Longs, who needs more?'
Very good
Paul posted the code that does the random number generation in the GEAR thread so we can look for that. I just did a quick calculation and it looks like it will probably take somewhere over 3 hours to go through and check every XOR value between 0 and 2^32. So I may write a small program to do this over the weekend and see if I come up with anything unless anyone has a better suggestion.
RE: clock counting on the cognew.
Don't forget the cognew is performed by the Prop itself, not the instruction set, so ANYTHING is possible, as it's ALL parallel inside.
I'm currently of the opinion that there's nothing to it other than bit-swapping of the data, and if that isn't the case then with some bit-inversion / XORing applied as well. There are a large number of bit-swapping possibilities ( is it 32! ? ) so the bit-swapping is not immediately obvious and offers a high degree of protection.
My reasoning for this ...
1) Chip would not do something overly complex when there's an easier, effective method.
2) Bit-swapping has zero silicon overhead, minimal added risk to the project
3) Encrypting the ROM is easy and has low risk of error.
4) It is deterrence rather than protection.
5) It has been described as "scrambling" not encryption.
6) I've done similar simple things and it is hard to see the forest for the trees in the result.
7) Bit-swapping is harder to crack than XORing, in code writing and execution time
8) Looking at the data suggests it could be the case ...
This is particularly true of the data below when loaded to the interpreter Cog ( assuming it's a linear load from $FFF4->$000 onwards ) ...
Common constants used in many programs and most likely the interpreter are -1, $8000_0000 for sign bit manipulation, plus $0000_0200 for destination register incrementing. All those can be found by just bit-swapping.
They are all towards the end of the Cog where it is 'natural' to place constants. I have no explanation for 1E6 and 1E9 being the same, except one may get updated at run-time. The data at 1EB could decode to $FFC0_0000 ... $0000_03FF and in-betweens which could be a useful constant.
Bit-counting I think is the key to identifying known instructions. A "mov ?,PAR" would have at least 12 bits set, at least 11 bits clear for example, and that should appear early on. The only unknown bits there are the register which would be subsequently used in rdxxxx's to set the pc, sp, and global variable pointer.
I think the solution is going to rest on a finding the set of bit-swaps which generate a potentially correct result, those which don't generate "mov ?,PAR" early on can be discarded, those that don't reveal some "rdxxxx ?,?" early on can be discarded. Those which don't turn one of the constants to $0000_0200 can be discarded. Any 'jmp' early on could screw that hope, and the 'rdxxxx' may be hidden in calls to do read and increment ( unlikely IMO as that probably doesn't actually give shorter code ).
The wildcard is whether there is any address line swapping as well. If there is, I suspect it is block swapping or the ASCII Copyright notice would not have fitted so nicely. I would expect that 48 long block is uninitialised runtime buffer data for the loader.
Chip has said that CogInit executes some bytecode held in ROM, taking that as literally as stated, that bytecode could likely be in ROM mapped to the $1F0-$1FF area of the Cog. It could also be the data after the Copyright notice, but that doesn't look like bytecode to me at first glance. The last byte at $FFFF is $01, and I recall ChipVer is simply a read of ROM[noparse][[/noparse]-1], ie ROM[noparse][[/noparse]$FFFF]. There's also some odd interaction with CogInit and ROM in the Spin bytecode which I never got to the bottom of, but that could be incorrect bytecode decoding.
I'm not really sure how anyone would narrow the sets of possible interpreters to just one. Somewhere in the Cog will be some sort of jump table(s) to dispatch to bytecode handlers and we have no idea how Chip has done that. It could be a long list of 'jmp' ( unlikely IMO ), a long list of word or byte addresses. The bytecode falls into categories so there are likely to be multiple tables. None of that would appear as 'legitimate PASM instruction sequences'.
Another potential helper is that a few consecutive encrypted longs are the same. Which same instructions executed consecutively make any sense ? 'call' is one, there may be more, but these could also be duplicated addresses in a table where the same code handles different opcodes.
One brute force approach would be to generate all bit-swapped Cog programs, run them through a Cog emulator executing a known Spin program and see which execute it correctly. The same could be done to crack any encryption method; in the extreme case try all possible Cog Programs until one runs !
I think there's still a long haul to the challenge.
Comments
judging from Chip's seemingly excited posting, he is itching to let the cat out of the bag.
a thin veil of secrecy holding back the floodgates of creativity that issue forth from this forum.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Parallax Forums - If you're ready to learn, we're ready to help.
Is it still encrypted when it's running in cog ram?
1.Start a new spin cog.
2.Stop that cog
3.Start it again with an instruction to return a long from the cog.
4.Read all the remaining longs that way and output them to something or store them in hub ram.
5.Guess what you've overwritten with the instruction,
or copy and compare the image in HUB RAM with the image in ROM to guess the encryption key.
I'm assuming cog RAM doesn't clear. If it does then there is a way to protect other code.
Sorry I haven't learned enough PASM yet to evaluate all the likely loop holes.
The Propeller boots up at only 32Khz clock which is very slow...
> The next Propeller will hopefully solve this problem, once and for all.
Hmm ... do I have to read that as EEPROM? IIRC, it can't be done with the currently rotating Propeller's process.
Or some kind of OTP? If so, hopefully not the "external" ROM.
Nick
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Never use force, just go for a bigger hammer!
The DIY Digital-Readout for mills, lathes etc.:
YADRO
Some threads here from time to time show the need for IP protection even in this small community. My idea 6 monthes ago - when I met the Propeller - was that Parallax had all the ingredients.
I expected from month to month that someone would come up with a disassembled SPIN interpreter.
For Parallax - selling to commercial customers - it would be a very serious argument that for more than a year now, a bunch of highly gifted programmers had not succeeded to break the IP barrier for the interpreter!
I was not aware that no one really tried this seriously
XOR with
$FFFF
$FFFF_FFFF
chip
pihc
PIHC
pihC
CHIP
Chip
Prop
prop
PROP
PORP
porP
porp
%%11111111
%%33333333
%%33333333_33333333
%%11111111_11111111
word swap
word[noparse][[/noparse]0]:=word
word:=word[noparse][[/noparse]0]
word swap with XOR $FFFF
word swap with XOR $FFFF_FFFF
Reverse bits
The reverse bits was at least showing a lot of instructions with the r bit set which you would expect.
But what do these timings tell you
deSilva, it would also be interesting to get an ASM cog to start another ASM cog and spin cog to see how different they are.
edit: just did b-a which is the time taken the do the cognew instruction. Got 1152 for the assembly cog and 17008 for the spin cog so something very interesting is going on. (clock cycles not us)
Post Edited (stevenmess2004) : 2/21/2008 1:01:00 AM GMT
Interestingly it only takes 104 µs (or 8350 ticks) after the COGNEW for SPIN is issued until the SPIN program responses, so in fact there CAN BE NO 2-tiere decrytion using a (decripting) pre-loader. As we know how to activate SPIN form ASM this could be double checked, but I think this gives no new insight..
I think this is been part of the bootstrap loader...
However 200 µs is an extremely long time, good for 10 to 20 lines of SPIN generally. This is factor "17" you noticed, in contrast to the one SPN line for the ASM-COGNEW.
I just took some minutes to count the percentage of zero bytes in the first 2000 bytes of:
- RAM ($0): 18% : This is the generally observed redundance of slightly compressed code and used data
- BootLoader ($F800): 6%
- SPIN ($F000): 4.4%: both values indicate a not so perfect uniform distribution, which gives hope
EDIT: I like Hippy's idea of bit swapping. As anyone who has used DES knows, this is easy to do in hardware and difficult in software. I think it was also used on some of the earlier video games which used removable cartridges. The implementation could be something like two data busses connecting the ROM to the rest of the chip. One is a straightforward connection for plaintext data. The other has the rearranged connections for the "scrambled" data. Tri-state drivers on each end would switch between the two as needed. And since there is no keying material it avoids all the legal entanglements of true encryption. Of course I don't know if this is how it is actually done but it would be a simple and clever method.
Post Edited (Beanie2k) : 2/21/2008 3:35:19 PM GMT
This is all a learning experience for me. Never done anything like this before
Edit: Just checked and the values are not the same so this won't work. Time to try something else.
Post Edited (stevenmess2004) : 2/21/2008 6:57:14 AM GMT
If not, that code might run and be able to display descrambled data from fff0-ffff to compare with the scrambled code.
Post Edited (VIRAND) : 2/21/2008 7:13:19 AM GMT
See attached
If we write something in assembly this should be possible. May have to have a look at a serial object to see what I can do.
(a) Q: How does the hardware know it has to "decrypt" something in the first place?
(b) When loading cells 496 to 511 the HUB data is suppressed and substituted by 0, in a normal load. However it is possible the COG loading does something special when in decryption mode.
(c) First do a simple statistical analysis on bit and long basis to learn the distribution!
Post Edited (deSilva) : 2/22/2008 2:15:53 PM GMT
(b) Possibly, if using an XOR then the mask may be in there
(c) Distribution of what? certain instructions? 1s and 0s? zcr flags?
Like I said a few posts above I've never worked on anything like this before so don't really have any good idea of what to try so any suggestions will be accepted
At first I figured that it shouldn't be that difficult if you assume that the first instruction is a MOV, exept...
There's a lot of possible variations on that instruction, so trying to 'backtrack' and find the XOR value(If that's what is used) isn't that easy.
Also, does it XOR with the same value all the time, does it use the result of the previous operation, or maybe a variation?
(Shifting the XOR one bit to the left for each operation, maybe)
As for what code to expect...
I have a sneaking suspicion that Chip is even more clever than is normally assumed, and his programming 'style' depends more on what he is trying to achieve, than how he prefers to work.
In other words...
I'm getting a headache just thinking about this...
If the code is ever revealed, would it be possible to get the listing printed on T-shirts, with the heading '496 Longs, who needs more?'
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Don't visit my new website...
Paul posted the code that does the random number generation in the GEAR thread so we can look for that. I just did a quick calculation and it looks like it will probably take somewhere over 3 hours to go through and check every XOR value between 0 and 2^32. So I may write a small program to do this over the weekend and see if I come up with anything unless anyone has a better suggestion.
Don't forget the cognew is performed by the Prop itself, not the instruction set, so ANYTHING is possible, as it's ALL parallel inside.
My reasoning for this ...
1) Chip would not do something overly complex when there's an easier, effective method.
2) Bit-swapping has zero silicon overhead, minimal added risk to the project
3) Encrypting the ROM is easy and has low risk of error.
4) It is deterrence rather than protection.
5) It has been described as "scrambling" not encryption.
6) I've done similar simple things and it is hard to see the forest for the trees in the result.
7) Bit-swapping is harder to crack than XORing, in code writing and execution time
8) Looking at the data suggests it could be the case ...
This is particularly true of the data below when loaded to the interpreter Cog ( assuming it's a linear load from $FFF4->$000 onwards ) ...
1E5 FFFFFFFF
1E6 08000000
1E7 00001000
1E8 C1C93008
1E9 08000000
Common constants used in many programs and most likely the interpreter are -1, $8000_0000 for sign bit manipulation, plus $0000_0200 for destination register incrementing. All those can be found by just bit-swapping.
They are all towards the end of the Cog where it is 'natural' to place constants. I have no explanation for 1E6 and 1E9 being the same, except one may get updated at run-time. The data at 1EB could decode to $FFC0_0000 ... $0000_03FF and in-betweens which could be a useful constant.
Bit-counting I think is the key to identifying known instructions. A "mov ?,PAR" would have at least 12 bits set, at least 11 bits clear for example, and that should appear early on. The only unknown bits there are the register which would be subsequently used in rdxxxx's to set the pc, sp, and global variable pointer.
I think the solution is going to rest on a finding the set of bit-swaps which generate a potentially correct result, those which don't generate "mov ?,PAR" early on can be discarded, those that don't reveal some "rdxxxx ?,?" early on can be discarded. Those which don't turn one of the constants to $0000_0200 can be discarded. Any 'jmp' early on could screw that hope, and the 'rdxxxx' may be hidden in calls to do read and increment ( unlikely IMO as that probably doesn't actually give shorter code ).
The wildcard is whether there is any address line swapping as well. If there is, I suspect it is block swapping or the ASCII Copyright notice would not have fitted so nicely. I would expect that 48 long block is uninitialised runtime buffer data for the loader.
Chip has said that CogInit executes some bytecode held in ROM, taking that as literally as stated, that bytecode could likely be in ROM mapped to the $1F0-$1FF area of the Cog. It could also be the data after the Copyright notice, but that doesn't look like bytecode to me at first glance. The last byte at $FFFF is $01, and I recall ChipVer is simply a read of ROM[noparse][[/noparse]-1], ie ROM[noparse][[/noparse]$FFFF]. There's also some odd interaction with CogInit and ROM in the Spin bytecode which I never got to the bottom of, but that could be incorrect bytecode decoding.
I'm not really sure how anyone would narrow the sets of possible interpreters to just one. Somewhere in the Cog will be some sort of jump table(s) to dispatch to bytecode handlers and we have no idea how Chip has done that. It could be a long list of 'jmp' ( unlikely IMO ), a long list of word or byte addresses. The bytecode falls into categories so there are likely to be multiple tables. None of that would appear as 'legitimate PASM instruction sequences'.
Another potential helper is that a few consecutive encrypted longs are the same. Which same instructions executed consecutively make any sense ? 'call' is one, there may be more, but these could also be duplicated addresses in a table where the same code handles different opcodes.
One brute force approach would be to generate all bit-swapped Cog programs, run them through a Cog emulator executing a known Spin program and see which execute it correctly. The same could be done to crack any encryption method; in the extreme case try all possible Cog Programs until one runs !
I think there's still a long haul to the challenge.