x86 is a high level language
potatohead
http://blog.erratasec.com/2015/03/x86-is-high-level-language.html
This looks to be an interesting take on assembly language. I'm still reading and may comment later.
Comments
This is why Microcontrollers will always be with us, in expanding numbers.
I've just written some code to actually measure the pipeline effects on a tiny MCU with the most basic of pipelines.
The results are nothing like the current device data sheets, but I'm sure those will improve as I show them how it really works....;)
Yes it follows simple rules, but they interact a little, and sometimes you can get a free lunch, other times not so good.
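In case anyone wants to try the same on their own bench, here's a minimal sketch of a measurement harness in C. It assumes a Cortex-M class MCU with the DWT cycle counter (register addresses per the ARMv7-M spec); on a part without DWT, substitute any free-running hardware timer.

    #include <stdint.h>

    /* Registers for the Cortex-M DWT cycle counter (ARMv7-M spec).
       On an MCU without DWT, substitute any free-running timer. */
    #define DWT_CTRL   (*(volatile uint32_t *)0xE0001000)
    #define DWT_CYCCNT (*(volatile uint32_t *)0xE0001004)
    #define DEMCR      (*(volatile uint32_t *)0xE000EDFC)

    static void cyccnt_init(void)
    {
        DEMCR |= 1u << 24;       /* TRCENA: enable the DWT unit */
        DWT_CYCCNT = 0;
        DWT_CTRL |= 1u;          /* CYCCNTENA: start counting   */
    }

    /* Measure the cycle cost of a fragment, pipeline stalls included.
       Run it twice and keep the second result so the first pass
       absorbs cold-start effects (flash wait states, etc.). */
    static uint32_t measure(void (*fragment)(void))
    {
        uint32_t start = DWT_CYCCNT;
        fragment();
        return DWT_CYCCNT - start;  /* unsigned subtraction handles wrap */
    }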
The really fun thing isn't even the cryptographic vulnerability. It's that you can go from very good performance to massively poor performance with some relatively minor change, and never have any idea why.
Some of the things they did then to avoid lag because of the slow rotation speed of the drum...
Even simple pipelining is a major b!tch to handle if you want accurate timing. When you start adding stuff such as predictive branching, multiple cache levels, memory segmentation, multiple threads...
And a CPU where the timing of any command may change from series to series...
My head starts hurting...
One had to use techniques like that to get any performance out of Intel's i860 RISC processor back in the 1980s, with its pipelines and parallel execution exposed to the programmer. Oddly, that chip flopped.
I cite the AAA instruction (ASCII Adjust After Addition) as a case in point. A fairly high-level operation that only exists to make life easier for the assembly language programmer. Never used by compilers.
Anyway, the impossibility of knowing the execution time of code on x86 has been with us for a long time, what with its pipelines, caches, parallel and out-of-order execution, etc. Code carefully optimized for one generation of Intel device may not be optimal on a later generation.
-Phil
Quite a few MCUs could benefit from that - some could even add it with no opcode changes
As for x86, it used to be a physical CPU instruction set. With the ~160 registers of current Intel processors and dynamic register allocation (mapping names like "eax", mentioned in the article, onto physical registers), all of that tells me that x86 is simply a virtual machine processed by the Intel CPU. That doesn't make x86 a high-level language any more than running 6502 machine code in an emulator makes 6502 machine code a high-level language. Well, that may not be what the author claims anyway, but he does seem to claim that if counting instructions doesn't give you the timing then it's a high-level language, which I don't agree with either.
On the other hand, I used to work with a 32-bit minicomputer (not a VAX) where the assembly language had such complex instructions (as in "doing much", not as in "complex to use") that writing assembly was nearly the same experience as writing Pascal. That was, in a way, a high-level language. No wonder it was not a particularly fast mini. Well, maybe it was at first, but not when I got to run the same code on the first-generation SPARC Sun machine and it ran 30 times faster.. *after* going through f2c to create C from the original Fortran. When I wrote an emulator for that mini it became very clear that there was *no way* it could be made to run efficiently. 28 addressing modes, and the very variable instruction length that came with them, didn't help.
-Tor
Turns out that when doing encryption/decryption the execution time of the algorithms depends on the actual values of the bits in the data and in the keys. If an attacker has some way to assess that differing execution time he has a way to start learning about your data and your keys. I.e. he can start to crack your crypto system. Not good.
Attacks like this have already been demonstrated to break keys used for code protection on embedded systems. One approach is: turn power on to the device, measure the current consumption as it checks its code signature or whatever, do that a lot, and you see how the current consumption varies with the key bits. Soon you have broken the key.
One solution to this is to ensure your code always takes the same time no matter what the data or key values are. This is very hard to do in a high level language like C. The author is simply pointing out that due to the way modern processors work it's very hard to do in assembler as well. In this respect a high level language and assembler are the same.
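A classic illustration is comparing a secret (a MAC, a password hash) against an input. A minimal sketch in C (function names are mine, for illustration): the naive version leaks, through its early exit, how many leading bytes matched, while the constant-time version touches every byte regardless.

    #include <stddef.h>
    #include <stdint.h>

    /* Naive compare: returns at the first mismatch, so execution time
       reveals how many leading bytes of the guess were correct. */
    int leaky_equal(const uint8_t *a, const uint8_t *b, size_t n)
    {
        for (size_t i = 0; i < n; i++)
            if (a[i] != b[i])
                return 0;            /* early exit = timing leak */
        return 1;
    }

    /* Constant-time compare: OR the differences together so the loop
       always runs to the end no matter where the bytes differ. */
    int ct_equal(const uint8_t *a, const uint8_t *b, size_t n)
    {
        uint8_t diff = 0;
        for (size_t i = 0; i < n; i++)
            diff |= a[i] ^ b[i];
        return diff == 0;
    }

Even then, the source is only constant-time on paper. The article's point is that the hardware underneath - caches, branch prediction, variable-latency instructions - can reintroduce the very variation you worked to remove.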
It's likely to be impossible to ensure execution time is always the same. What about a multiply or divide instruction, for example? The exact same instruction may take a different number of clocks depending on the bits in its operands.
Then the internals of Intel and AMD processors change all the time. If you managed to solve this problem for this year's devices it may well be broken again on next year's devices.
Interesting problem...
-Tor
Those billions in R&D really have left everyone else way behind.
Of course it adds more work to do and lowers the performance of the crypto. These crypto people don't like to use the hardware random number generation built into CPUs. You can never tell if some three-letter agency has a back door in there or not.
Somewhere I found a video of a demo of how to break the key bits of an MCU by measuring the power consumption whilst repeatedly rebooting it. Amazing stuff.
Somebody just released a cheap little board with facilities on it that can be used to automate such cracking quite a bit. Sadly I don't recall what it was called now. Sounded like it could be used to break the proposed code protection bits on the Propeller 2 in short order.
-Tor
https://www.kickstarter.com/projects/coflynn/chipwhisperer-lite-a-new-era-of-hardware-security
I wonder how well it will help in breaking the code protection on the Propeller II?
But there's a bit of inaccuracy in the writeup... Side-channel analysis is ANY indirect information that is being leaked in such a way that it can be measured and analyzed.
Other side-channel leaks are heat (people have uncovered the silicon and used IR microscopy to spot which parts are 'working harder'), noise (only suitable for relatively slow data streams), light and RF.
@Tor; was that 32-bit computer an ND system?
http://www.net-security.org/secworld.php?id=18122
1. Air conditioning...
2. A person is equivalent to a 200W heater, at least.
3. Random 'high power usage' may be noticed.
4. Other random equipment producing heat, such as an electric kettle, or just a big cup of tea... (I have a 4.5 dl cup... That's a lot of heat in one spot on my desk)
For a start, it already assumes both computers are compromised, as far as I can tell. One must get some code onto one machine to modulate the heat output, thus sending data, and get code onto the other machine to read the temperature variations, thus receiving data.
Here I could set up one machine under my bench to modulate its load, which would drive its horribly noisy fan on and off. The sound of the fan would be picked up by the microphone on the headset connected to the other machine under the bench. Easy.
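The transmit side really is that easy. A toy sketch in C, where the one-bit-per-second rate and the listen-to-the-fan receiver are my assumptions for illustration, not part of the published attack:

    #include <time.h>
    #include <unistd.h>

    /* Toy covert-channel transmitter: encode each bit as one second
       of either full CPU load (fan spins up) or idle (fan slows). */
    static void send_bit(int bit)
    {
        if (bit) {
            time_t end = time(NULL) + 1;
            while (time(NULL) < end)
                ;                   /* busy-loop: heat and fan noise */
        } else {
            sleep(1);               /* idle: machine cools, fan slows */
        }
    }

    static void send_byte(unsigned char c)
    {
        for (int i = 7; i >= 0; i--)   /* most significant bit first */
            send_bit((c >> i) & 1);
    }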
The older 16-bit ND model started out as a nearly RISC-like, fixed-instruction-length, single-word-instruction computer; in some ways a bit Propeller-like, in that the original instruction set encoded source and destination as bitfields of the instruction. It was designed to be extremely simple to decode and execute. Even the assembler used that: each opcode could be created by adding together the mnemonics for the operation and the arguments (so whether you specified the source or the destination register first didn't matter..). Unfortunately each new iteration of the CPU added more complex instructions, but (as I found when I wrote an emulator for that one too) the compilers didn't use those complex instructions. And eventually the designers went completely overboard with the new 32-bit CPU, the definition of CISC if there ever was one. But I hear that the VAX instruction set is also way too complex.. I have done some assembly programming on VAX too but I'm not really familiar with it; I was just hacking on some driver code now and then over a period of some months. The ND systems I worked on for years.
-Tor
I want to build a one instruction computer.
One opcode, "SUBLEQ": subtract and branch if less than or equal to zero.
Three operands: subtrahend, minuend, jump target.
Of course, with only one opcode you never need to write it, so in assembler you only write the three addresses
http://techtinkering.com/2009/05/15/improving-the-standard-subleq-oisc-architecture/
Being so simple, if the Propeller were SUBLEQ we could probably get hundreds of COGs into a chip.
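For anyone who wants to play along, the whole machine fits in a handful of lines. A minimal sketch in C, assuming word-addressed memory, the three operands packed consecutively, and a negative program counter meaning halt:

    #include <stdio.h>

    /* Minimal SUBLEQ interpreter: each instruction is three consecutive
       words A, B, C.  mem[B] -= mem[A]; if the result is <= 0, jump to
       C, otherwise fall through.  A negative program counter halts. */
    long mem[4096];

    void run(long pc)
    {
        while (pc >= 0) {
            long a = mem[pc], b = mem[pc + 1], c = mem[pc + 2];
            mem[b] -= mem[a];
            pc = (mem[b] <= 0) ? c : pc + 3;
        }
    }

    int main(void)
    {
        /* Demo: negate the value in cell 9 into cell 10, then halt.
           Cell 8 is a scratch zero used to force an always-taken
           jump to -1 (halt). */
        long prog[] = { 9, 10,  3,     /* mem[10] -= mem[9]            */
                        8,  8, -1,     /* mem[8] -= mem[8] = 0: halt   */
                        0,  0,  0,     /* cells 6..8 (8 = scratch zero)*/
                        5,  0 };       /* mem[9] = 5, mem[10] = 0      */
        for (int i = 0; i < 11; i++) mem[i] = prog[i];
        run(0);
        printf("mem[10] = %ld\n", mem[10]);   /* prints -5 */
        return 0;
    }

The demo negates the value in cell 9 into cell 10 and halts; everything else - addition, copying, unconditional jumps - gets built out of that same single instruction.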
-Tor
Good man. If you can get the SUBLEQ VM to run on the Propeller that would be great.
There is a SUBLEQ machine described here: http://mazonka.com/subleq/ including an emulator, an assembler and a compiler for a C-like language. Also an FPGA implementation.
The SUBLEQ I linked to previously is a slightly improved machine that uses negative addresses to indicate indirection, so A* is the thing A points to, not A itself. This makes a lot of code much shorter and easier to write.
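In terms of the interpreter sketched above, that variant only changes the operand fetch; something like this (the helper name is mine, not from the linked article):

    /* Operand fetch for the improved SUBLEQ: a negative operand means
       indirect, i.e. -A names the cell holding the real address. */
    long resolve(long a)
    {
        return (a < 0) ? mem[-a] : a;
    }

    /* ...and in run(), the subtract step becomes:
           mem[resolve(b)] -= mem[resolve(a)];  */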
The same idea as this process:
==============================================================================================
" Non-Invasive Attacks on Microcontrollers
I have been investigating into copy protection mechanisms in different microcontrollers since 1996. PIC16F84 was one of the first microcontrollers I have unprotected. Occasionally I do some research in this area and provide testing of new microcontrollers and renew testing of old.
Now let me introduce what sort of equipment I have been using for non-invasive attacks on microcontrollers. I built my own designed programmer device controlled by my software program. It operates on IBM PC under MS-DOS or Windows 95/98 and it was written in C++. Hardware of my programmer allows me to rapidly change power supply and programming voltages in wide range causing different power glitches. This is quite essential if you provide attacks on microcontrollers. Signal block of the programmer generates 32 output digital signals with adjustable logic levels and some of them has open-drain output. Also this block inputs 16 signals at correct logic levels. This is quite enough for most of microcontrollers. Another part of the programmer is the board with different IC sockets for most of microcontrollers I have tested. For some microcontrollers I use additional socket adapters. "
http://www.cl.cam.ac.uk/~sps32/mcu_lock.html
I'm supposed to be working and now I'm off reading about SUBLEQ and SUBLEQ FGPA implementations and pondering and puzzling away my time at work!
A pox on both of you for this distraction!!
Set the bar low, going in!
That is good then. We will be expecting a SUBLEQ machine for the Propeller or FPGA from you shortly then.
Of course SUBLEQ is not the only Single Instruction Computer (SIC) possible, so you might like to use some of your boss's time to look into those as well