Code protect?

stevenmess2004 · 2008-02-20 21:04

All this is very interesting. Chip implied here http://forums.parallax.com/showthread.php?p=593557 that that RDLONG was used to load the cog. Obviously when loading from ROM an RDLONG is not being used or else code would not be scrambled. Along the lines that hippy suggested, maybe 9 bit areas are swapped around? So the instruction bits are swapped with the destination or source parts of the instruction?

CJ · 2008-02-20 21:19

perhaps even simpler, instead of little endian, it is big endian.

judging from Chip's seemingly excited posting, he is itching to let the cat out of the bag.

a thin veil of secrecy holding back the floodgates of creativity that issue forth from this forum.

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Parallax Forums - If you're ready to learn, we're ready to help.

VIRAND · 2008-02-20 21:57

I hadn't noticed that the interpreter was encrypted before.
Is it still encrypted when it's running in cog ram?

stevenmess2004 · 2008-02-20 22:04

No, it gets decrypted somewhere between the HUB and the COG.

stevenmess2004 · 2008-02-20 22:08

Something that may or may not be useful. There is 511 longs allocated for the interpreter in ROM but you only need 496.

VIRAND · 2008-02-20 22:24

Well I was thinking this might work:

1.Start a new spin cog.
2.Stop that cog
3.Start it again with an instruction to return a long from the cog.
4.Read all the remaining longs that way and output them to something or store them in hub ram.
5.Guess what you've overwritten with the instruction,
or copy and compare the image in HUB RAM with the image in ROM to guess the encryption key.

I'm assuming cog RAM doesn't clear. If it does then there is a way to protect other code.

stevenmess2004 · 2008-02-20 22:27

When you load a cog it loads 496 longs starting at the address that you specify overwriting anything that is in the cog.

VIRAND · 2008-02-20 22:34

Hmmm. Maybe the cog can be reset after just a few longs and be started then?
Sorry I haven't learned enough PASM yet to evaluate all the likely loop holes.

VIRAND · 2008-02-20 22:40

Maybe that can be done by single stepping the Propeller and then Resetting it.
The Propeller boots up at only 32Khz clock which is very slow...

Nick Mueller · 2008-02-20 22:41

Chip wrote:
> The next Propeller will hopefully solve this problem, once and for all.

Hmm ... do I have to read that as EEPROM? IIRC, it can't be done with the currently rotating Propeller's process.
Or some kind of OTP? If so, hopefully not the "external" ROM.

Nick

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Never use force, just go for a bigger hammer!

The DIY Digital-Readout for mills, lathes etc.:
YADRO

deSilva · 2008-02-20 22:44

There are no loopholes.
Some threads here from time to time show the need for IP protection even in this small community. My idea 6 monthes ago - when I met the Propeller - was that Parallax had all the ingredients.

I expected from month to month that someone would come up with a disassembled SPIN interpreter.

For Parallax - selling to commercial customers - it would be a very serious argument that for more than a year now, a bunch of highly gifted programmers had not succeeded to break the IP barrier for the interpreter!

I was not aware that no one really tried this seriously

stevenmess2004 · 2008-02-20 23:21

I have tried the following with no luck (ie I couldn't see any useful instructions in the first 20 or so lines.)

XOR with
$FFFF
$FFFF_FFFF
chip
pihc
PIHC
pihC
CHIP
Chip
Prop
prop
PROP
PORP
porP
porp
%%11111111
%%33333333
%%33333333_33333333
%%11111111_11111111
word swap
word[noparse][[/noparse]0]:=word
word:=word[noparse][[/noparse]0]

word swap with XOR $FFFF
word swap with XOR $FFFF_FFFF

Reverse bits

The reverse bits was at least showing a lot of instructions with the r bit set which you would expect.

deSilva · 2008-02-21 00:25

Nice try, Steven

But what do these timings tell you

''MPE_SECRETSPIN_002    2007-07-29 by deSilva
' 
'  Checks for some pecularities of encrypted SPIN Interpreter

CON

  _clkmode = xtal1 + pll8x  ' Hydra!
  _xinfreq = 10_000_000

OBJ
  pc   :       "PC_Interface"


VAR LONG a, b, x
PUB Main | stack[noparse][[/noparse]20]


  pc.start(31,30)    'start the PropTerminal

  pc.setcol(1)
  REPEAT
    pc.str(string(13, "COGNEW in µs..."))
    a := CNT
    cognew(@asm, @x)
    b := CNT     
    checkIt(string(" ASM: "))
    
    a := CNT
   cognew(spin, @stack)
    b := CNT
    checkIt(string(" SPIN: "))

PRI checkit(what)
    pc.str(what) 
    waitcnt(cnt+clkfreq)
    pc.dec((x-a)/80)                
    pc.out(" ")
    pc.dec((x-b)/80)

    
PRI spin
x := cnt

DAT
asm
  MOV CNT, CNT
  WRLONG CNT, PAR
  COGID 0
  COGSTOP 0

stevenmess2004 · 2008-02-21 00:52

Results: I am getting are 114(9160),100(8008) for ASM and 317(25364),104(8356) from Spin. Numbers in brackets are the actual number of cycles. The first number is the time it takes for the cog to start for the ASM and the time it takes to start + maybe a little bit more for spin (have to allow for the instruction fetch+jmp to instruction decoding+setting up the stack).

deSilva, it would also be interesting to get an ASM cog to start another ASM cog and spin cog to see how different they are.

edit: just did b-a which is the time taken the do the cognew instruction. Got 1152 for the assembly cog and 17008 for the spin cog so something very interesting is going on. (clock cycles not us)

Post Edited (stevenmess2004) : 2/21/2008 1:01:00 AM GMT

deSilva · 2008-02-21 01:11

There is some presetting of the stackframe for the new SPIN-COG, as discussed in Paul's thread the other day. But this will not explain the 200 µs difference, sufficient to reload a COG twice!

Interestingly it only takes 104 µs (or 8350 ticks) after the COGNEW for SPIN is issued until the SPIN program responses, so in fact there CAN BE NO 2-tiere decrytion using a (decripting) pre-loader. As we know how to activate SPIN form ASM this could be double checked, but I think this gives no new insight..

I think this is been part of the bootstrap loader...

' Start up oscillator and begin Spin program at $0010
' (assumes EEPROM data already loaded into $0000-$7FFF)
' (assumes currently running RCFAST oscillator)
' (assumes this is COG 0 - gets rebooted)
'
                        rdbyte  address,#$0004          'if xtal/pll enabled, start up now
                        and     address,#$F8            '..while remaining in rcfast mode
                        clkset  address

:delay                  djnz    time_xtal,#:delay       'allow 20ms @20MHz for xtal/pll to settle

                        rdbyte  address,#$0004          'switch to selected clock
                        clkset  address

                        coginit interpreter             'reboot cog with interpreter


time_xtal               long    20 * 20000 / 4 / 1      '20ms (@20MHz, 1 inst/loop)
interpreter             long    $0001 << 18 + $3C01 << 4 + %0000

address                 res     1

stevenmess2004 · 2008-02-21 01:27

What I think is very interesting is the time difference for a to b. It takes almost 17 times more for the spin cog than asm.

deSilva · 2008-02-21 01:38

This time is spent in the SPIN interpreter, part of it needed for the stack frame organization, as said.
However 200 µs is an extremely long time, good for 10 to 20 lines of SPIN generally. This is factor "17" you noticed, in contrast to the one SPN line for the ASM-COGNEW.

stevenmess2004 · 2008-02-21 01:51

The stack framing shouldn't take that long. It should only need to write about 4 wrlongs plus a small amount of code for adjusting the addresses.

deSilva · 2008-02-21 02:18

The most simple indicator for the quality of encryption is its deviation from noise.
I just took some minutes to count the percentage of zero bytes in the first 2000 bytes of:
- RAM ($0): 18% : This is the generally observed redundance of slightly compressed code and used data
- BootLoader ($F800): 6%
- SPIN ($F000): 4.4%: both values indicate a not so perfect uniform distribution, which gives hope

Beanie2k · 2008-02-21 04:44

Probably something really simple like XORing every byte with $42.

Actually it would have to be something simple as many countries (France is one, I believe) prohibit the import or possession of "strong" encryption. It will be interesting to see what the results are.

EDIT: I like Hippy's idea of bit swapping. As anyone who has used DES knows, this is easy to do in hardware and difficult in software. I think it was also used on some of the earlier video games which used removable cartridges. The implementation could be something like two data busses connecting the ROM to the rest of the chip. One is a straightforward connection for plaintext data. The other has the rearranged connections for the "scrambled" data. Tri-state drivers on each end would switch between the two as needed. And since there is no keying material it avoids all the legal entanglements of true encryption. Of course I don't know if this is how it is actually done but it would be a simple and clever method.

Post Edited (Beanie2k) : 2/21/2008 3:35:19 PM GMT

stevenmess2004 · 2008-02-21 06:51

Now for more theory time. A cog only loads 496 longs. However, there is 511 longs for the interpreter and 516 for the boot loader. Now, if the cog is loaded sequently starting at $F004 and $F800 then there must be 15 and 16 longs respectively at the end that are either used for something else or do nothing. If they do nothing then they could be very useful for figuring out how to crack this. (i.e. if they are all one value then we could try a XORing the rest of the memory with them)

This is all a learning experience for me. Never done anything like this before

Edit: Just checked and the values are not the same so this won't work. Time to try something else.

Post Edited (stevenmess2004) : 2/21/2008 6:57:14 AM GMT

VIRAND · 2008-02-21 07:04

What if you load a cog starting at $FFF0? Will the descramble stay on and rollover to RAM 0 where some code might be?
If not, that code might run and be able to display descrambled data from fff0-ffff to compare with the scrambled code.

Post Edited (VIRAND) : 2/21/2008 7:13:19 AM GMT

stevenmess2004 · 2008-02-21 07:11

I just added this (and a waitcnt(cnt) to my propAsm program and it crashed when the cog started

cognew($FFF0,$FFF0)

See attached

stevenmess2004 · 2008-02-21 07:52

VIRAND said...
What if you load a cog starting at $FFF0? Will the descramble stay on and rollover to RAM 0 where some code might be?
If not, that code might run and be able to display descrambled data from fff0-ffff to compare with the scrambled code.

If we write something in assembly this should be possible. May have to have a look at a serial object to see what I can do.

deSilva · 2008-02-21 09:36

The last postings are not very convincing

(a) Q: How does the hardware know it has to "decrypt" something in the first place?
(b) When loading cells 496 to 511 the HUB data is suppressed and substituted by 0, in a normal load. However it is possible the COG loading does something special when in decryption mode.
(c) First do a simple statistical analysis on bit and long basis to learn the distribution!

Post Edited (deSilva) : 2/22/2008 2:15:53 PM GMT

stevenmess2004 · 2008-02-21 09:59

(a) don't know but I wouldn't mind betting that it is either everything in the ROM, everything after $F004 or just when $F004 or $F800 which is not really helpful

(b) Possibly, if using an XOR then the mask may be in there
(c) Distribution of what? certain instructions? 1s and 0s? zcr flags?

Like I said a few posts above I've never worked on anything like this before so don't really have any good idea of what to try so any suggestions will be accepted

Gadgetman · 2008-02-21 10:24

I've been following this thread and...

At first I figured that it shouldn't be that difficult if you assume that the first instruction is a MOV, exept...
There's a lot of possible variations on that instruction, so trying to 'backtrack' and find the XOR value(If that's what is used) isn't that easy.

Also, does it XOR with the same value all the time, does it use the result of the previous operation, or maybe a variation?
(Shifting the XOR one bit to the left for each operation, maybe)

As for what code to expect...
I have a sneaking suspicion that Chip is even more clever than is normally assumed, and his programming 'style' depends more on what he is trying to achieve, than how he prefers to work.

In other words...
I'm getting a headache just thinking about this...

If the code is ever revealed, would it be possible to get the listing printed on T-shirts, with the heading '496 Longs, who needs more?'

▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Don't visit my new website...

stevenmess2004 · 2008-02-21 10:49

Gadgetman said...
If the code is ever revealed, would it be possible to get the listing printed on T-shirts, with the heading '496 Longs, who needs more?'

Very good

Paul posted the code that does the random number generation in the GEAR thread so we can look for that. I just did a quick calculation and it looks like it will probably take somewhere over 3 hours to go through and check every XOR value between 0 and 2^32. So I may write a small program to do this over the weekend and see if I come up with anything unless anyone has a better suggestion.

Baggers · 2008-02-21 11:20

RE: clock counting on the cognew.
Don't forget the cognew is performed by the Prop itself, not the instruction set, so ANYTHING is possible, as it's ALL parallel inside.

hippy · 2008-02-21 12:48

I'm currently of the opinion that there's nothing to it other than bit-swapping of the data, and if that isn't the case then with some bit-inversion / XORing applied as well. There are a large number of bit-swapping possibilities ( is it 32! ? ) so the bit-swapping is not immediately obvious and offers a high degree of protection.

My reasoning for this ...

1) Chip would not do something overly complex when there's an easier, effective method.
2) Bit-swapping has zero silicon overhead, minimal added risk to the project
3) Encrypting the ROM is easy and has low risk of error.
4) It is deterrence rather than protection.
5) It has been described as "scrambling" not encryption.
6) I've done similar simple things and it is hard to see the forest for the trees in the result.
7) Bit-swapping is harder to crack than XORing, in code writing and execution time
8) Looking at the data suggests it could be the case ...

This is particularly true of the data below when loaded to the interpreter Cog ( assuming it's a linear load from $FFF4->$000 onwards ) ...

1E5 FFFFFFFF
1E6 08000000
1E7 00001000
1E8 C1C93008
1E9 08000000

Common constants used in many programs and most likely the interpreter are -1, $8000_0000 for sign bit manipulation, plus $0000_0200 for destination register incrementing. All those can be found by just bit-swapping.

They are all towards the end of the Cog where it is 'natural' to place constants. I have no explanation for 1E6 and 1E9 being the same, except one may get updated at run-time. The data at 1EB could decode to $FFC0_0000 ... $0000_03FF and in-betweens which could be a useful constant.

Bit-counting I think is the key to identifying known instructions. A "mov ?,PAR" would have at least 12 bits set, at least 11 bits clear for example, and that should appear early on. The only unknown bits there are the register which would be subsequently used in rdxxxx's to set the pc, sp, and global variable pointer.

I think the solution is going to rest on a finding the set of bit-swaps which generate a potentially correct result, those which don't generate "mov ?,PAR" early on can be discarded, those that don't reveal some "rdxxxx ?,?" early on can be discarded. Those which don't turn one of the constants to $0000_0200 can be discarded. Any 'jmp' early on could screw that hope, and the 'rdxxxx' may be hidden in calls to do read and increment ( unlikely IMO as that probably doesn't actually give shorter code ).

The wildcard is whether there is any address line swapping as well. If there is, I suspect it is block swapping or the ASCII Copyright notice would not have fitted so nicely. I would expect that 48 long block is uninitialised runtime buffer data for the loader.

Chip has said that CogInit executes some bytecode held in ROM, taking that as literally as stated, that bytecode could likely be in ROM mapped to the $1F0-$1FF area of the Cog. It could also be the data after the Copyright notice, but that doesn't look like bytecode to me at first glance. The last byte at $FFFF is $01, and I recall ChipVer is simply a read of ROM[noparse][[/noparse]-1], ie ROM[noparse][[/noparse]$FFFF]. There's also some odd interaction with CogInit and ROM in the Spin bytecode which I never got to the bottom of, but that could be incorrect bytecode decoding.

I'm not really sure how anyone would narrow the sets of possible interpreters to just one. Somewhere in the Cog will be some sort of jump table(s) to dispatch to bytecode handlers and we have no idea how Chip has done that. It could be a long list of 'jmp' ( unlikely IMO ), a long list of word or byte addresses. The bytecode falls into categories so there are likely to be multiple tables. None of that would appear as 'legitimate PASM instruction sequences'.

Another potential helper is that a few consecutive encrypted longs are the same. Which same instructions executed consecutively make any sense ? 'call' is one, there may be more, but these could also be duplicated addresses in a table where the same code handles different opcodes.

One brute force approach would be to generate all bit-swapped Cog programs, run them through a Cog emulator executing a known Spin program and see which execute it correctly. The same could be done to crack any encryption method; in the extreme case try all possible Cog Programs until one runs !

I think there's still a long haul to the challenge.

Code protect?

Comments