Beanie2k,
The download protocol was released a long time ago and several people have written downloaders. It's also been known for a long time how to start the Spin interpreter from assembly. There's not much else the bootloader does. It would be instructive to see how Chip did it, but there's little left to "protect".
I'd guess you got your answer. And, much, much more.
Thank you so much for your description of what was used to do this task. Reverse engineering is a very interesting subject to me. I'm glad Chip gave his blessing. You are a hero. Thanks.
"Also, PropellerForth (and other binaries built with propasm) is incompatible with the current version of PropDOS and other SD loaders due to a bug in their init code, which I will eventually send patches for. I noted this on the Getting Started page, but thought it worth reiterating."
I don't know if he sent patches, but you could compare the ROM'ed boot loader to what's being done in the SD loaders.
I've been watching this thread from the start, but lurking quietly, as assembly coding (or decoding in this case) is not one of my strong suits. I congratulate Hippy and all who aided in the discovery.
It was mentioned earlier in the thread that this code, now that it's out in the open, will most likely not lead to counterfeit Propellers because of the hardware aspect.
I wonder now, though, with the Spin Interpreter revealed to the public, whether it would be possible for someone (or more likely a group of someones) to write an open-source Propeller Tool (for users of Linux and Mac). Was this what was needed for such a project to begin?
I'm going to start a new thread where I'll post the INTERPRETER, BOOTER, and RUNNER source code. This way, it will be easier for everybody to locate and reference in the future.
parts-man73 said...
I wonder now, though, with the Spin Interpreter revealed to the public, whether it would be possible for someone (or more likely a group of someones) to write an open-source Propeller Tool (for users of Linux and Mac). Was this what was needed for such a project to begin?
No, the Interpreter source was never needed for that.
All that is needed is a way to do the translation from source code to tokens, and a way to download it.
The first could be achieved by a command-line tool, if anyone at Parallax had the time to build and compile it for your OS of choice. (Probably not a prioritised task since the Windows-based tool is sufficient for 95%+ of the users)
The second part shouldn't be too difficult as the download protocol is already known.
(I'm not very good at serial stuff, though, so don't ask me)
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Don't visit my new website...
Well, I just posted everything into a new thread that could support some discussion, for now. We can easily permanentize it this week into something more like what you guys were suggesting.
The only solution that occurs to me is an external key, as Chip said, until each Propeller has a memory location with a random number in it, and that's it.
Pardon me for butting in. Greetings from Argentina.
Hippy performed an amazing feat in this thread, not only by his insight that Chip's encryption involved bit permutation alone (although that singular FFFF_FFFF was surely a clue), but by performing — and finishing! — the decryption using deductive logic. I took a different, more inductive approach. And even though I did not finish the task, I thought it might be interesting to some if I documented what I was able to accomplish.
The idea behind an inductive approach is to ferret information out of the encryption using statistics based on a known plaintext example. The example I chose as a paradigm was the Float32a.spin file in the floating point object. This was an arbitrary choice, picked only because it contains a lot of assembly code. (The TV object also contains a lot of assembly code, but I was afraid that its peculiarities regarding video output would render it less than representative of assembly code in general.) Programs written in assembly do not use the available instructions in equal proportion. Some instructions, like JMPRET (including JMP, CALL, and RET), MOV, ADD, SUB, etc, are used way more frequently than others. It is this extreme skewing of instruction frequencies that permits information to be extracted from an otherwise scrambled array of bits.
Each Propeller instruction contains a six-bit instruction field. This provides for 64 distinct instructions. In a permutation cipher, these six bits can be scattered over a 32-bit long in 32! / 26! = 652,458,240 different ways. So to find the best match between the enciphered code and some paradigm, we'd have to try each of these 652 million ordered combinations and score it somehow against the paradigm code. This could take a long time. But what if, instead of finding all six bits at once, we could just do two at a time? In other words, find the best match for bits 31 and 30; then using those results, do 29 and 28; and finally, with four bits in hand, polish off 27 and 26. This would reduce the size of the search space considerably, possibly to as little as 32·31 + 30·29 + 28·27 = 2618 points, assuming each step produces one clear-cut winner. For this to work, a typical assembly program would have to be skewed not only in individual instruction frequencies, but also in the aggregate frequencies of instruction blocks defined by subsets of the instruction field. Fortunately, this is also the case, as the following graphs taken from Float32a.spin reveal.
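As a rough illustration of the statistic involved, here is a minimal sketch of how the frequency table over a candidate set of bit positions might be tallied. The function and variable names are mine, not Phil's; the thread shows none of his actual code.

```python
# Sketch only: relative frequency of the value formed by a candidate ordered
# set of bit positions, taken over every long in an image.

def extract_bits(value, positions):
    """Assemble a small integer from the given bit positions (MSB first)."""
    result = 0
    for p in positions:
        result = (result << 1) | ((value >> p) & 1)
    return result

def frequencies(longs, positions):
    """Relative frequency of each possible value of the selected bit positions."""
    counts = [0] * (1 << len(positions))
    for x in longs:
        counts[extract_bits(x, positions)] += 1
    total = len(longs) or 1
    return [c / total for c in counts]
```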
Now, what to use as a measure of fitness for each of our samples? Probability and statistics provides a good candidate called a correlation coefficient. When applied to two ordered sets of numbers (e.g. the instruction frequencies from two different programs), it produces a number ranging between -1 and 1 that tells how well one set correlates — or agrees with — the other set. By computing the correlation coefficient for the relative instruction frequencies of each set of bits in our search space, against the frequencies from the paradigm program, we can rank each candidate and pick the best (and, hopefully, correct) one.
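The coefficient in question is presumably the ordinary Pearson correlation coefficient; a small helper along these lines (the name `pearson` is mine) is all that's needed to score a candidate frequency table against the paradigm's.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy) if sxx and syy else 0.0
```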
This is what I did, starting with the first two bits of the instruction field. The first graph in the following image displays the correlation coefficients, with the X-axis representing the first bit, and the Y-axis representing the second bit. The lighter the square at the intersection, the higher the correlation coefficient. (Anything below 0.7 is shown in black to keep from cluttering up the graph.)
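One pass of that two-bits-at-a-time scan could look something like the sketch below, reusing the frequencies() and pearson() helpers above. This is my reconstruction of the procedure as described, not Phil's code; `rom_longs` and `float32a_longs` stand for the scrambled image and the known-plaintext paradigm, however they were obtained, and the 0.7 cutoff simply mirrors the graphs.

```python
from itertools import permutations

def rank_extensions(scrambled, paradigm, fixed, threshold=0.7):
    """Score every ordered pair of unused bit positions appended to the
    positions chosen so far, against the same-width slice of the paradigm's
    instruction field (bit 31 downward)."""
    width = len(fixed) + 2
    target = frequencies(paradigm, list(range(31, 31 - width, -1)))
    unused = [b for b in range(32) if b not in fixed]
    scored = []
    for i, j in permutations(unused, 2):
        r = pearson(frequencies(scrambled, fixed + [i, j]), target)
        if r >= threshold:
            scored.append((r, fixed + [i, j]))
    return sorted(scored, reverse=True)

# First pass:                          rank_extensions(rom_longs, float32a_longs, [])
# Second pass, seeded with a winner:   rank_extensions(rom_longs, float32a_longs, [3, 7])
```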
This first-order scan produced five likely candidates for the first two bits, so there wasn't a clear-cut winner. These were (3,7), (10,7), (25,7), (19,21), and (21,19). Consequently, it was necessary to search for the second two bits, using each of these results for the first two, and hope that that would result in a clear-cut winner. But, as you can see, both the graphs for (3,7) and (21,19) contain points of interest. So the search for the last two bits will have to start with the top results for (3,7), which are (3,7,21,6), (3,7,25,6), (3,7,25,12), (3,7,21,19), and (3,7,21,12), along with the top results for (21,19), which are (21,19,3,6) and (21,19,25,6). These are shown in the two images that follow:
The (21,19) results are obvious duds. The (3,7) results produced a few candidates, though, followed by their correlation coefficients in brackets: (3,7,21,12,6,19) [0.899], (3,7,25,6,12,19) [0.886], and (3,7,25,12,6,19) [0.896]. The winner, (3,7,21,12,6,19), just happens to agree with Hippy's results, although its closeness to the other two is rather cold comfort.
The next items of business are to determine the immediate bit and the condition code bits. My hope was that the immediate bit could be ferreted out by looking at the JMPRET (incl. JMP, CALL, and RET: 010111) instructions. My reasoning was that the target of these instructions is almost always an immediate address, so the number of ones in the immediate bit position should be higher than the number of ones in the other bit positions. So by counting the ones in the JMPRETs for each bit position (excluding the instruction bits, which are now known) and picking the position with the highest total, we should come up with the immediate bit. The graph below shows a clear-cut winner in bit 15, which agrees with Hippy's results.
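In code, that one-counting step might look like the following sketch, reusing extract_bits() from above. INSTR_BITS is the instruction-field mapping found in the previous step (the scrambled positions of instruction bits 31 down to 26); the remaining names are mine.

```python
INSTR_BITS = [3, 7, 21, 12, 6, 19]   # scrambled positions of instruction bits 31..26
JMPRET = 0b010111

def ones_per_position(longs, exclude=()):
    """Count how often each of the 32 bit positions is set, skipping known bits."""
    counts = [0] * 32
    for x in longs:
        for b in range(32):
            if b not in exclude and (x >> b) & 1:
                counts[b] += 1
    return counts

def find_immediate_bit(longs):
    jmprets = [x for x in longs if extract_bits(x, INSTR_BITS) == JMPRET]
    totals = ones_per_position(jmprets, exclude=set(INSTR_BITS))
    return max(range(32), key=lambda b: totals[b])   # bit 15 for this image
```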
For the condition code bits, my thought was that, since most instructions are executed unconditionally, most of the condition bits will be ones. So the strategy was to do the same as I did for the JMPRETs, but for all instructions, and pick the four positions with the highest totals. From the graph below, you can see that this results in 0, 8, 11, and 14. This gives us the bits, but not the order, and agrees with Hippy's result of (8,11,0,14). Bit 20 was a pretender, however, so the result is not as clear-cut as one might have hoped.
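The same column-total idea, applied to every long rather than just the JMPRETs, is sketched below; it reuses ones_per_position() and INSTR_BITS from the previous snippet. Excluding the immediate bit as well is my own choice for the illustration.

```python
def find_condition_bits(longs, immediate_bit):
    totals = ones_per_position(longs, exclude=set(INSTR_BITS) | {immediate_bit})
    ranked = sorted(range(32), key=lambda b: totals[b], reverse=True)
    return ranked[:4]   # {0, 8, 11, 14} for this image; their order is still unknown
```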
This is where I stopped. Getting the condition bits in the right order would have involved a correlation analysis for each of the 4! = 24 permutations of these bits against the paradigm program, which may or may not have produced the right result; although, clearly, some conditions, such as IF_Z, IF_NZ, IF_C, and IF_NC, are used far more than others. The write-result bit would likely succumb to the "column total" method when performed over a class of instructions that almost always writes the result. (As it turns out, our fifth-place pretender for the condition bits, bit 20, is Hippy's result for the write-result bit.) The write-Z and write-C bits are almost never used with a JMPRET. In the above graph for the JMPRET, bits 4 and 17 (excluding known instruction bits) have the lowest number of ones and are, in fact, Hippy's results for these two bits, although there's no way to tell which is which.
It's very doubtful that the source and destination fields would have fallen as readily to a statistical approach as the other fields did. I'm frankly astonished that Hippy was able to unravel them at all, and I stand in awe of his accomplishment! All in all, this was a fun endeavor, even though it did reduce my productivity to near-zero over several days. Hats off to Chip for providing the incentive and for honoring his commitment to reveal the annotated source!
-Phil
Phil, this is a great explanation of some basics of decryption techniques. And you have also become aware of their limitations. I am sure Hippy will soon explain his method himself... But what he did was use ANAGRAMs in an intuitive way. This is an astonishingly well-working technique, as every puzzle-solver knows.
It is bound to succeed when you have enough texts - and we had 495!
However, ANAGRAMs can only be solved by an expert in the language you are decoding. And Hippy seems to "live" in PASM.
Note also that great feedback is given by the context of the instructions: the program has to make sense. This is something Phil did not take advantage of. Imagine the additional confusion if some of the lower address bits had been swapped as well.
I'm gob-smacked. Far more than anything I could have managed, and an amazing amount revealed from nothing more than 'looking at the raw data'. Trying to untangle which bits were which was incredibly hard from my approach also. If it hadn't been for some known plaintext and a good enough partial decoding to match to, I'd have been stuck with having to try all permutations.
Interestingly, having made a good guess (a bad one in reality), what fell out looked remarkably correct but was very wrong for both WZ/WC and register addresses. In terms of the statistical approach, I suppose this equates to everything having the same 'ranking' or probability, there being too little known to make the correct call. The huge numbers of 0's or 1's in addresses, with most variables in two small groups with just lsb's changing, makes it even harder.
My approach worked because I made a good guess. Yours seems to work without that, and I am sure we would have had the exact same partial decoding before the tough bit mappings had to be solved. It seems that this particular small, very skewed code doesn't lend itself to the statistical method as well as more orthodox code could. Nonetheless you got an incredible amount determined.
Thanks for the explanation (it must be well written, I understood most of it).
So where is the code? I just followed link after link to find this VERY helpful thread that seems to be JUST WHAT I AM LOOKING FOR for my project.
But where is the code posted???
The ROM was encrypted (if you can call it that) to slow down would-be copy cats. When people know enough about something, they can go about replicating it rather casually. If some critical part is unknown, the difficulty can quickly exceed their appetite for work. I would like for everything to be open for customers, but I don't want to enable some thankless rip-off artists. That's why this is the way it is.
In the Propeller II, there will be some form of ROM protection for the booter code, only, since it will be responsible for encrypting data to and from the EEPROM using a unique ID in each chip. If this algorithm (in the booter code) is known, there goes the protection. Real random numbers will also be used to make every download different, so there's never a repeatable 1:1 correspondence between plain and encrypted data. The ultimate point of this is to protect USER code.
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Chip Gracey
Parallax, Inc.
Looking back, was this an attempt to stop the BASIC Stamp copycats from stealing your customers? In that case, you used a commodity part and wrote the glue that was the secret sauce. You were adding value to an existing product; with the Propeller you were making a new product. I still worry about this whole FPGA business, with the soft core being out there, but perhaps that fear is unfounded. I do think it would be interesting if you could monetize the soft core alongside the silicon. After all, the chips the soft core runs on are more expensive than the silicon, so the impetus isn't lower cost; it would be customization. A customer could get some special feature added to the soft core without incurring a huge fab cost and time delays, and they may see the Propeller (2) architecture as a benefit to go along with the rapid development nature of FPGAs. They wouldn't be using an FPGA except for rapid development, or field upgradability, otherwise.
Just did a small experiment approaching this differently. The assumption is that coginit generally unscrambles ROM compared to a plain rdlong. This can be checked quite easily by starting a cog with its code entry at e.g. $7FE0. The cog dumps its registers to hub RAM, where they are compared against ordinary rdlongs over the same range. It turns out everything above $8000 looks different, OK. Knowing that bits are only relocated within the long, the ROM font gives you all the info needed to figure out the bit pairs.
Update: A single bit scan (one bit differs between two longs) returns the first 22 mappings, double bit scan the remaining 10. This info is contained in the first $1E8 longs of the font array.
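For anyone who wants to repeat that scan on the host side, the idea reduces to something like the sketch below: given matching "plain" and "scrambled" views of the same longs (here, the font area read both ways), any two longs whose plain values differ in exactly one bit must have scrambled values that also differ in exactly one bit, and that pair of positions is one entry of the permutation. The code and names are mine, not the poster's.

```python
def single_bit_scan(plain, scrambled):
    """Map plain bit positions to scrambled bit positions from one-bit differences."""
    mapping = {}
    for i in range(len(plain)):
        for j in range(i + 1, len(plain)):
            dp = plain[i] ^ plain[j]
            if dp and dp & (dp - 1) == 0:          # plain longs differ in exactly one bit
                ds = scrambled[i] ^ scrambled[j]
                mapping[dp.bit_length() - 1] = ds.bit_length() - 1
    return mapping
```

Pairs of longs that differ in exactly two bits constrain the remaining positions in the same way, which is presumably the "double bit scan" mentioned above.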
We're getting a lot of "sticky" threads. My recommendation would be to include the source codes in one of the existing "sticky" threads.
Mike
Is there a way to keep everyone from posting on those 'blue' stickies?
More of a 'read-only' for non-Parallax, but 'write' for Parallax?
▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔▔
Harley Shanko
It should be called the "Parallax GNU sticky.."
There are some clever people out there. Well done, guys, very interesting stuff.
Ron...
I was about to ask that same question. Here's a link to the specific post that has the code download link.
http://forums.parallax.com/showthread.php/101336-Code-protect?p=710871&viewfull=1#post710871