Shop OBEX P1 Docs P2 Docs Learn Events
Code protect? - Page 2 — Parallax Forums

Code protect?

24567

Comments

  • stevenmess2004stevenmess2004 Posts: 1,102
    edited 2008-02-20 21:04
    All this is very interesting. Chip implied here http://forums.parallax.com/showthread.php?p=593557 that that RDLONG was used to load the cog. Obviously when loading from ROM an RDLONG is not being used or else code would not be scrambled. Along the lines that hippy suggested, maybe 9 bit areas are swapped around? So the instruction bits are swapped with the destination or source parts of the instruction?
  • CJCJ Posts: 470
    edited 2008-02-20 21:19
    perhaps even simpler, instead of little endian, it is big endian.

    judging from Chip's seemingly excited posting, he is itching to let the cat out of the bag.

    a thin veil of secrecy holding back the floodgates of creativity that issue forth from this forum.

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Parallax Forums - If you're ready to learn, we're ready to help.
  • VIRANDVIRAND Posts: 656
    edited 2008-02-20 21:57
    I hadn't noticed that the interpreter was encrypted before.
    Is it still encrypted when it's running in cog ram?
  • stevenmess2004stevenmess2004 Posts: 1,102
    edited 2008-02-20 22:04
    No, it gets decrypted somewhere between the HUB and the COG.
  • stevenmess2004stevenmess2004 Posts: 1,102
    edited 2008-02-20 22:08
    Something that may or may not be useful. There is 511 longs allocated for the interpreter in ROM but you only need 496.
  • VIRANDVIRAND Posts: 656
    edited 2008-02-20 22:24
    Well I was thinking this might work:

    1.Start a new spin cog.
    2.Stop that cog
    3.Start it again with an instruction to return a long from the cog.
    4.Read all the remaining longs that way and output them to something or store them in hub ram.
    5.Guess what you've overwritten with the instruction,
    or copy and compare the image in HUB RAM with the image in ROM to guess the encryption key.

    I'm assuming cog RAM doesn't clear. If it does then there is a way to protect other code.
  • stevenmess2004stevenmess2004 Posts: 1,102
    edited 2008-02-20 22:27
    When you load a cog it loads 496 longs starting at the address that you specify overwriting anything that is in the cog.
  • VIRANDVIRAND Posts: 656
    edited 2008-02-20 22:34
    Hmmm. Maybe the cog can be reset after just a few longs and be started then?
    Sorry I haven't learned enough PASM yet to evaluate all the likely loop holes.
  • VIRANDVIRAND Posts: 656
    edited 2008-02-20 22:40
    Maybe that can be done by single stepping the Propeller and then Resetting it.
    The Propeller boots up at only 32Khz clock which is very slow...
  • Nick MuellerNick Mueller Posts: 815
    edited 2008-02-20 22:41
    Chip wrote:
    > The next Propeller will hopefully solve this problem, once and for all.

    Hmm ... do I have to read that as EEPROM? IIRC, it can't be done with the currently rotating Propeller's process.
    Or some kind of OTP? If so, hopefully not the "external" ROM.


    Nick

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Never use force, just go for a bigger hammer!

    The DIY Digital-Readout for mills, lathes etc.:
    YADRO
  • deSilvadeSilva Posts: 2,967
    edited 2008-02-20 22:44
    There are no loopholes.
    Some threads here from time to time show the need for IP protection even in this small community. My idea 6 monthes ago - when I met the Propeller - was that Parallax had all the ingredients.

    I expected from month to month that someone would come up with a disassembled SPIN interpreter.

    For Parallax - selling to commercial customers - it would be a very serious argument that for more than a year now, a bunch of highly gifted programmers had not succeeded to break the IP barrier for the interpreter!

    I was not aware that no one really tried this seriously smile.gif
  • stevenmess2004stevenmess2004 Posts: 1,102
    edited 2008-02-20 23:21
    I have tried the following with no luck (ie I couldn't see any useful instructions in the first 20 or so lines.)

    XOR with
    $FFFF
    $FFFF_FFFF
    chip
    pihc
    PIHC
    pihC
    CHIP
    Chip
    Prop
    prop
    PROP
    PORP
    porP
    porp
    %%11111111
    %%33333333
    %%33333333_33333333
    %%11111111_11111111
    word swap
    word[noparse][[/noparse]0]:=word
    word:=word[noparse][[/noparse]0]

    word swap with XOR $FFFF
    word swap with XOR $FFFF_FFFF

    Reverse bits

    The reverse bits was at least showing a lot of instructions with the r bit set which you would expect.
  • deSilvadeSilva Posts: 2,967
    edited 2008-02-21 00:25
    Nice try, Steven smile.gif

    But what do these timings tell you smile.gif
    ''MPE_SECRETSPIN_002    2007-07-29 by deSilva
    ' 
    '  Checks for some pecularities of encrypted SPIN Interpreter
    
    CON
    
      _clkmode = xtal1 + pll8x  ' Hydra!
      _xinfreq = 10_000_000
    
    OBJ
      pc   :       "PC_Interface"
    
    
    VAR LONG a, b, x
    PUB Main | stack[noparse][[/noparse]20]
    
    
      pc.start(31,30)    'start the PropTerminal
    
      pc.setcol(1)
      REPEAT
        pc.str(string(13, "COGNEW in µs..."))
        a := CNT
        cognew(@asm, @x)
        b := CNT     
        checkIt(string(" ASM: "))
        
        a := CNT
       cognew(spin, @stack)
        b := CNT
        checkIt(string(" SPIN: "))
    
    PRI checkit(what)
        pc.str(what) 
        waitcnt(cnt+clkfreq)
        pc.dec((x-a)/80)                
        pc.out(" ")
        pc.dec((x-b)/80)
    
        
    PRI spin
    x := cnt
    
    DAT
    asm
      MOV CNT, CNT
      WRLONG CNT, PAR
      COGID 0
      COGSTOP 0
    
  • stevenmess2004stevenmess2004 Posts: 1,102
    edited 2008-02-21 00:52
    Results: I am getting are 114(9160),100(8008) for ASM and 317(25364),104(8356) from Spin. Numbers in brackets are the actual number of cycles. The first number is the time it takes for the cog to start for the ASM and the time it takes to start + maybe a little bit more for spin (have to allow for the instruction fetch+jmp to instruction decoding+setting up the stack).

    deSilva, it would also be interesting to get an ASM cog to start another ASM cog and spin cog to see how different they are.

    edit: just did b-a which is the time taken the do the cognew instruction. Got 1152 for the assembly cog and 17008 for the spin cog so something very interesting is going on. (clock cycles not us)

    Post Edited (stevenmess2004) : 2/21/2008 1:01:00 AM GMT
  • deSilvadeSilva Posts: 2,967
    edited 2008-02-21 01:11
    There is some presetting of the stackframe for the new SPIN-COG, as discussed in Paul's thread the other day. But this will not explain the 200 µs difference, sufficient to reload a COG twice!

    Interestingly it only takes 104 µs (or 8350 ticks) after the COGNEW for SPIN is issued until the SPIN program responses, so in fact there CAN BE NO 2-tiere decrytion using a (decripting) pre-loader. As we know how to activate SPIN form ASM this could be double checked, but I think this gives no new insight..

    I think this is been part of the bootstrap loader...

    ' Start up oscillator and begin Spin program at $0010
    ' (assumes EEPROM data already loaded into $0000-$7FFF)
    ' (assumes currently running RCFAST oscillator)
    ' (assumes this is COG 0 - gets rebooted)
    '
                            rdbyte  address,#$0004          'if xtal/pll enabled, start up now
                            and     address,#$F8            '..while remaining in rcfast mode
                            clkset  address
    
    :delay                  djnz    time_xtal,#:delay       'allow 20ms @20MHz for xtal/pll to settle
    
                            rdbyte  address,#$0004          'switch to selected clock
                            clkset  address
    
                            coginit interpreter             'reboot cog with interpreter
    
    
    time_xtal               long    20 * 20000 / 4 / 1      '20ms (@20MHz, 1 inst/loop)
    interpreter             long    $0001 << 18 + $3C01 << 4 + %0000
    
    address                 res     1
    
  • stevenmess2004stevenmess2004 Posts: 1,102
    edited 2008-02-21 01:27
    What I think is very interesting is the time difference for a to b. It takes almost 17 times more for the spin cog than asm.
  • deSilvadeSilva Posts: 2,967
    edited 2008-02-21 01:38
    This time is spent in the SPIN interpreter, part of it needed for the stack frame organization, as said.
    However 200 µs is an extremely long time, good for 10 to 20 lines of SPIN generally. This is factor "17" you noticed, in contrast to the one SPN line for the ASM-COGNEW.
  • stevenmess2004stevenmess2004 Posts: 1,102
    edited 2008-02-21 01:51
    The stack framing shouldn't take that long. It should only need to write about 4 wrlongs plus a small amount of code for adjusting the addresses.
  • deSilvadeSilva Posts: 2,967
    edited 2008-02-21 02:18
    The most simple indicator for the quality of encryption is its deviation from noise.
    I just took some minutes to count the percentage of zero bytes in the first 2000 bytes of:
    - RAM ($0): 18% : This is the generally observed redundance of slightly compressed code and used data
    - BootLoader ($F800): 6%
    - SPIN ($F000): 4.4%: both values indicate a not so perfect uniform distribution, which gives hope smile.gif
  • Beanie2kBeanie2k Posts: 83
    edited 2008-02-21 04:44
    Probably something really simple like XORing every byte with $42. lol.gif Actually it would have to be something simple as many countries (France is one, I believe) prohibit the import or possession of "strong" encryption. It will be interesting to see what the results are. yeah.gif

    EDIT: I like Hippy's idea of bit swapping. As anyone who has used DES knows, this is easy to do in hardware and difficult in software. I think it was also used on some of the earlier video games which used removable cartridges. The implementation could be something like two data busses connecting the ROM to the rest of the chip. One is a straightforward connection for plaintext data. The other has the rearranged connections for the "scrambled" data. Tri-state drivers on each end would switch between the two as needed. And since there is no keying material it avoids all the legal entanglements of true encryption. Of course I don't know if this is how it is actually done but it would be a simple and clever method.

    Post Edited (Beanie2k) : 2/21/2008 3:35:19 PM GMT
  • stevenmess2004stevenmess2004 Posts: 1,102
    edited 2008-02-21 06:51
    Now for more theory time. A cog only loads 496 longs. However, there is 511 longs for the interpreter and 516 for the boot loader. Now, if the cog is loaded sequently starting at $F004 and $F800 then there must be 15 and 16 longs respectively at the end that are either used for something else or do nothing. If they do nothing then they could be very useful for figuring out how to crack this. (i.e. if they are all one value then we could try a XORing the rest of the memory with them)

    This is all a learning experience for me. Never done anything like this before smile.gif

    Edit: Just checked and the values are not the same so this won't work. Time to try something else.

    Post Edited (stevenmess2004) : 2/21/2008 6:57:14 AM GMT
  • VIRANDVIRAND Posts: 656
    edited 2008-02-21 07:04
    What if you load a cog starting at $FFF0? Will the descramble stay on and rollover to RAM 0 where some code might be?
    If not, that code might run and be able to display descrambled data from fff0-ffff to compare with the scrambled code.

    Post Edited (VIRAND) : 2/21/2008 7:13:19 AM GMT
  • stevenmess2004stevenmess2004 Posts: 1,102
    edited 2008-02-21 07:11
    I just added this (and a waitcnt(cnt) to my propAsm program and it crashed when the cog started
    cognew($FFF0,$FFF0)
    



    See attached
  • stevenmess2004stevenmess2004 Posts: 1,102
    edited 2008-02-21 07:52
    VIRAND said...
    What if you load a cog starting at $FFF0? Will the descramble stay on and rollover to RAM 0 where some code might be?
    If not, that code might run and be able to display descrambled data from fff0-ffff to compare with the scrambled code.

    If we write something in assembly this should be possible. May have to have a look at a serial object to see what I can do.
  • deSilvadeSilva Posts: 2,967
    edited 2008-02-21 09:36
    The last postings are not very convincing smile.gif

    (a) Q: How does the hardware know it has to "decrypt" something in the first place?
    (b) When loading cells 496 to 511 the HUB data is suppressed and substituted by 0, in a normal load. However it is possible the COG loading does something special when in decryption mode.
    (c) First do a simple statistical analysis on bit and long basis to learn the distribution!

    Post Edited (deSilva) : 2/22/2008 2:15:53 PM GMT
  • stevenmess2004stevenmess2004 Posts: 1,102
    edited 2008-02-21 09:59
    (a) don't know but I wouldn't mind betting that it is either everything in the ROM, everything after $F004 or just when $F004 or $F800 which is not really helpful smile.gif
    (b) Possibly, if using an XOR then the mask may be in there
    (c) Distribution of what? certain instructions? 1s and 0s? zcr flags?

    Like I said a few posts above I've never worked on anything like this before so don't really have any good idea of what to try so any suggestions will be accepted smile.gif
  • GadgetmanGadgetman Posts: 2,436
    edited 2008-02-21 10:24
    I've been following this thread and...

    At first I figured that it shouldn't be that difficult if you assume that the first instruction is a MOV, exept...
    There's a lot of possible variations on that instruction, so trying to 'backtrack' and find the XOR value(If that's what is used) isn't that easy.

    Also, does it XOR with the same value all the time, does it use the result of the previous operation, or maybe a variation?
    (Shifting the XOR one bit to the left for each operation, maybe)

    As for what code to expect...
    I have a sneaking suspicion that Chip is even more clever than is normally assumed, and his programming 'style' depends more on what he is trying to achieve, than how he prefers to work.

    In other words...
    I'm getting a headache just thinking about this...

    If the code is ever revealed, would it be possible to get the listing printed on T-shirts, with the heading '496 Longs, who needs more?'

    smile.gif

    ▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
    Don't visit my new website...
  • stevenmess2004stevenmess2004 Posts: 1,102
    edited 2008-02-21 10:49
    Gadgetman said...
    If the code is ever revealed, would it be possible to get the listing printed on T-shirts, with the heading '496 Longs, who needs more?'
    Very good smile.gif

    Paul posted the code that does the random number generation in the GEAR thread so we can look for that. I just did a quick calculation and it looks like it will probably take somewhere over 3 hours to go through and check every XOR value between 0 and 2^32. So I may write a small program to do this over the weekend and see if I come up with anything unless anyone has a better suggestion.
  • BaggersBaggers Posts: 3,019
    edited 2008-02-21 11:20
    RE: clock counting on the cognew.
    Don't forget the cognew is performed by the Prop itself, not the instruction set, so ANYTHING is possible, as it's ALL parallel inside.
  • hippyhippy Posts: 1,981
    edited 2008-02-21 12:48
    I'm currently of the opinion that there's nothing to it other than bit-swapping of the data, and if that isn't the case then with some bit-inversion / XORing applied as well. There are a large number of bit-swapping possibilities ( is it 32! ? ) so the bit-swapping is not immediately obvious and offers a high degree of protection.

    My reasoning for this ...

    1) Chip would not do something overly complex when there's an easier, effective method.
    2) Bit-swapping has zero silicon overhead, minimal added risk to the project
    3) Encrypting the ROM is easy and has low risk of error.
    4) It is deterrence rather than protection.
    5) It has been described as "scrambling" not encryption.
    6) I've done similar simple things and it is hard to see the forest for the trees in the result.
    7) Bit-swapping is harder to crack than XORing, in code writing and execution time
    8) Looking at the data suggests it could be the case ...

    This is particularly true of the data below when loaded to the interpreter Cog ( assuming it's a linear load from $FFF4->$000 onwards ) ...

    1E5 FFFFFFFF
    1E6 08000000
    1E7 00001000
    1E8 C1C93008
    1E9 08000000

    Common constants used in many programs and most likely the interpreter are -1, $8000_0000 for sign bit manipulation, plus $0000_0200 for destination register incrementing. All those can be found by just bit-swapping.

    They are all towards the end of the Cog where it is 'natural' to place constants. I have no explanation for 1E6 and 1E9 being the same, except one may get updated at run-time. The data at 1EB could decode to $FFC0_0000 ... $0000_03FF and in-betweens which could be a useful constant.

    Bit-counting I think is the key to identifying known instructions. A "mov ?,PAR" would have at least 12 bits set, at least 11 bits clear for example, and that should appear early on. The only unknown bits there are the register which would be subsequently used in rdxxxx's to set the pc, sp, and global variable pointer.

    I think the solution is going to rest on a finding the set of bit-swaps which generate a potentially correct result, those which don't generate "mov ?,PAR" early on can be discarded, those that don't reveal some "rdxxxx ?,?" early on can be discarded. Those which don't turn one of the constants to $0000_0200 can be discarded. Any 'jmp' early on could screw that hope, and the 'rdxxxx' may be hidden in calls to do read and increment ( unlikely IMO as that probably doesn't actually give shorter code ).

    The wildcard is whether there is any address line swapping as well. If there is, I suspect it is block swapping or the ASCII Copyright notice would not have fitted so nicely. I would expect that 48 long block is uninitialised runtime buffer data for the loader.

    Chip has said that CogInit executes some bytecode held in ROM, taking that as literally as stated, that bytecode could likely be in ROM mapped to the $1F0-$1FF area of the Cog. It could also be the data after the Copyright notice, but that doesn't look like bytecode to me at first glance. The last byte at $FFFF is $01, and I recall ChipVer is simply a read of ROM[noparse][[/noparse]-1], ie ROM[noparse][[/noparse]$FFFF]. There's also some odd interaction with CogInit and ROM in the Spin bytecode which I never got to the bottom of, but that could be incorrect bytecode decoding.

    I'm not really sure how anyone would narrow the sets of possible interpreters to just one. Somewhere in the Cog will be some sort of jump table(s) to dispatch to bytecode handlers and we have no idea how Chip has done that. It could be a long list of 'jmp' ( unlikely IMO ), a long list of word or byte addresses. The bytecode falls into categories so there are likely to be multiple tables. None of that would appear as 'legitimate PASM instruction sequences'.

    Another potential helper is that a few consecutive encrypted longs are the same. Which same instructions executed consecutively make any sense ? 'call' is one, there may be more, but these could also be duplicated addresses in a table where the same code handles different opcodes.

    One brute force approach would be to generate all bit-swapped Cog programs, run them through a Cog emulator executing a known Spin program and see which execute it correctly. The same could be done to crack any encryption method; in the extreme case try all possible Cog Programs until one runs !

    I think there's still a long haul to the challenge.
Sign In or Register to comment.