
x86 is a high level language

potatohead Posts: 10,261
edited 2015-03-28 15:55 in General Discussion
http://blog.erratasec.com/2015/03/x86-is-high-level-language.html?m=1#.VRLn95PF_XE

This looks to be an interesting take on assembly language. I'm still reading and may comment later.

Comments

  • jmg Posts: 15,173
    edited 2015-03-25 15:07
    hehe - yes, the more complex things are, the less predictable they become.
    This is why Microcontrollers will always be with us, in expanding numbers.

    I've just written some code to actually measure the pipeline effects on a tiny MCU with the most basic of pipelines.

    The results are nothing like the current device data sheets, but I'm sure those will improve as I show them how it really works....;)

    Yes it follows simple rules, but they interact a little, and sometimes you can get a free lunch, other times not so good.
  • localroger Posts: 3,451
    edited 2015-03-25 15:26
    Although there were some mainframes that used pipelines back in the day, the original 8086 was the first consumer-level chip to use an instruction pipeline, and it threw a community of developers, who were used to cycle-counting to create fine time delays, for a major loop. I remember reading articles that demonstrated the pipeline behavior in detail, with photos of oscilloscope traces.

    The really fun thing isn't even the cryptographic vulnerability. It's that you can go from very good performance to massively poor performance with some relatively minor change, and never have any idea why.
  • Gadgetman Posts: 2,436
    edited 2015-03-25 16:06
    You should take a look at early computers using 'drum' type memory.
    Some of the things they did then to avoid lag because of the slow rotation speed of the drum...

    Even simple pipelining is a major b!tch to handle if you want accurate timing. When you start adding stuff such as predictive branching, multiple cache levels, memory segmentation, multiple threads...
    And a CPU where timing of any command may change from series to series...

    My head starts hurting...
  • Heater. Posts: 21,230
    edited 2015-03-25 17:12
    Ah, optimizing code to run on drum storage machines, cue the "Story of Mel" https://www.cs.utah.edu/~elb/folklore/mel.html

    One had to use techniques like that to get any performance out of Intel's i860 RISC processor back in the 1980s, with its pipelines and parallel execution exposed to the programmer. Oddly, that chip flopped.
  • Heater. Posts: 21,230
    edited 2015-03-25 17:57
    x86 has always been a "high level" language in a way.

    I cite the AAA instruction as a case in point: a fairly high-level operation that exists only to make life easier for the assembly language programmer, and one never used by compilers.

    Anyway, the impossibility of knowing the execution time of code on x86 has been with us for a long time, what with its pipelines, caches, parallel and out-of-order execution, etc. Code that has been carefully optimized for one generation of Intel device may not be optimal on a later generation.
  • localroger Posts: 3,451
    edited 2015-03-25 20:22
    One of the major things that has changed is that we have gone from an environment where processors were slow and memory was fast, which gave rise to architectures like the 6502 that didn't even bother with many registers because you could use memory as registers, to today, where memory access is literally hundreds of times slower than the processor, so we need all these levels of caching to make the best use of that superfast CPU. This means any tweaking done for one generation is totally inappropriate for the next. Then there are other mods; either the 486 or the Pentium eliminated the barrel shifter, and so there was a point where dividing by a power of two became faster than shifting. Nobody with any hard-iron sense of how things are supposed to work would see that coming.
  • Phil Pilgrim (PhiPi) Posts: 23,514
    edited 2015-03-25 22:15
    'Sounds like the x86 needs a waitcnt! :)

    -Phil
  • jmg Posts: 15,173
    edited 2015-03-25 23:12
    'Sounds like the x86 needs a waitcnt! :)

    Quite a few MCU's could benefit from that - some could even add it, with no opcode changes :)
  • Tor Posts: 2,010
    edited 2015-03-26 00:09
    Well, I disagree with the blog's definition of a 'high-level' language, namely that the deciding criterion is whether you can count instructions and figure out the time taken. To me, a high-level language makes things less complicated (although that may disqualify newer C++ versions..): you don't have to think about pipelines, delayed execution, NOP instructions in the right places, or all that complexity. The general progression goes something like this: machine code (opcodes) -> hex code to simplify entering opcodes -> assembly language -> assembly language plus macros -> high-level language.

    As for x86, it used to be a physical CPU instruction set. With the ~160 registers in current Intel processors and the dynamic allocation of registers (the mapping of names like "eax", mentioned in the article, onto physical registers).. all of that says to me that x86 is simply a virtual machine processed by the Intel CPU. That doesn't make x86 a high-level language any more than running a 6502 in an emulator makes 6502 machine code a high-level language. Well, that may not be what the author claims anyway, but he does seem to claim that if counting instructions doesn't give you the timing then it's a high-level language, which I don't agree with either.

    On the other hand, I used to work with a 32-bit minicomputer (not a VAX) where the assembly language had such complex instructions (as in "doing much", not as in "complex to use") that writing assembly was nearly the same experience as writing Pascal. That was, in a way, a high-level language. No wonder it was not a particularly fast mini; well, maybe it was at first, but not when I got to run the same code on a first-generation Sparc-based Sun machine and it ran 30 times faster.. *after* going through f2c to create C from the original Fortran. When I wrote an emulator for that mini it became very clear that there was *no way* it could be made to run efficiently. The 28 addressing modes, and the very variable instruction length that resulted, didn't help.

    -Tor
  • Heater. Posts: 21,230
    edited 2015-03-26 00:44
    The blog poster's statement that "x86 is a high-level language" is made in a very specific context that he spells out in his opening paragraphs:
    Just so you know, x86 machine-code is now a "high-level" language. What instructions say, and what they do, are very different things.
    Notice the quotes around "high-level". The second sentence there is the key point. No way would we infer from this that the author is literally comparing writing x86 assembler to writing C or Haskell. Except in one major respect that he comes to next:
    I mention this because of those commenting on this post on OpenSSL's "constant-time" calculations, designed to avoid revealing secrets due to variations in compute time. The major comment is that it's hard to do this perfectly in C. My response is that it's hard to do this even in x86 machine code.
    Emphasis mine. What is this all about?

    Turns out that when doing encryption/decryption the execution time of the algorithms depends on the actual values of the bits in the data and in the keys. If an attacker has some way to assess that differing execution time he has a way to start learning about your data and your keys. I.e. he can start to crack your crypto system. Not good.

    Attacks like this have already been demonstrated to break keys used for code protection on embedded systems. One approach is: turn power on to the device, measure the current consumption as it checks its code signature or whatever, do that a lot, and you see how the current consumption varies with the key bits. Soon you have broken the key.

    One solution to this is to ensure your code always takes the same time no matter what the data or key values are. This is very hard to do in a high level language like C. The author is simply pointing out that due to the way modern processors work it's very hard to do in assembler as well. In this respect a high level language and assembler are the same.
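
    To make the C side of it concrete, here is a toy sketch (my own illustration, not OpenSSL's actual code) of the difference between a comparison that leaks timing and a "constant-time" one:

    #include <stddef.h>
    #include <stdint.h>

    /* Naive comparison: returns on the first mismatch, so its run time
       tells an attacker how many leading bytes of his guess were right. */
    int leaky_equal(const uint8_t *a, const uint8_t *b, size_t n)
    {
        for (size_t i = 0; i < n; i++) {
            if (a[i] != b[i])
                return 0;
        }
        return 1;
    }

    /* "Constant-time" comparison: always touches every byte and only
       accumulates the differences, so the loop count never depends on
       the data. As the blog points out, even this fixed instruction
       stream is not guaranteed fixed timing on a modern x86. */
    int careful_equal(const uint8_t *a, const uint8_t *b, size_t n)
    {
        uint8_t diff = 0;
        for (size_t i = 0; i < n; i++)
            diff |= a[i] ^ b[i];
        return diff == 0;
    }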

    It's likely to be impossible to ensure execution time is always the same. What about a multiply or divide instruction, for example? The exact same instruction may take different numbers of clocks depending on the bits in its operands.

    Then the internals of Intel and AMD processors change all the time. If you managed to solve this problem for this year's devices it may well be broken again on next year's devices.

    Interesting problem...
  • Tor Posts: 2,010
    edited 2015-03-26 01:19
    To me the solution to the varying-time problem was always to make it vary more, by a random factor, rather than trying to make it always take the same amount of time, which is nearly impossible (not to mention the even more impossible goal of always using the exact same amount of current). Instead, use a hardware random source (even pseudo-random should do) to deliberately insert variation into the processing time and current consumption.
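
    Something along these lines, as a rough C sketch (the random source, busy-wait and crypto routine are stand-ins I made up, not any particular library):

    #include <stdint.h>

    /* Assumed to exist on the target: a hardware (or decent pseudo-) random
       source, a cycle-burning busy wait, and the sensitive operation itself. */
    extern uint32_t hw_random(void);
    extern void     busy_wait_cycles(uint32_t cycles);
    extern void     sensitive_operation(void);

    void jittered_crypto_step(void)
    {
        /* Burn a random, data-independent number of cycles around the real
           work, blurring both the timing and the current-consumption profile. */
        busy_wait_cycles(hw_random() & 0x3FF);   /* 0..1023 extra cycles */
        sensitive_operation();
        busy_wait_cycles(hw_random() & 0x3FF);
    }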

    -Tor
  • jmg Posts: 15,173
    edited 2015-03-26 01:40
    Meanwhile, at the other end of the scale, I see Intel have a 72 Core 'Chip' with 16GB of very fast local memory in-module, and 100GB/s links.
    Those billions in R&D really have left everyone else way behind.
  • Heater. Posts: 21,230
    edited 2015-03-26 01:42
    Sounds like an idea.

    Of course it adds more work to do and lowers the performance of the crypto. These crypto people don't like to use the hardware random number generators built into CPUs. You can never tell whether some three letter agency has a back door in there or not.

    Somewhere I found a video demonstrating how to break the key bits of an MCU by measuring its power consumption whilst repeatedly rebooting it. Amazing stuff.

    Somebody just released a cheap little board with facilities on it that can be used to automate such cracking quite a bit. Sadly I don't recall what it was called now. Sounded like it could be used to break the proposed code protection bits on the Propeller 2 in short order.
  • Tor Posts: 2,010
    edited 2015-03-26 02:34
    Heater. wrote: »
    Somebody just released a cheap little board with facilities on it that can be used to automate such cracking quite a bit. Sadly I don't recall what it was called now[..]
    Yes, I saw that one too. Quite impressive.

    -Tor
  • Heater. Posts: 21,230
    edited 2015-03-26 03:35
    Here we go. The ChipWhisperer board by Colin O'Flynn. Everything you need to do side-channel analysis on a target MCU and crack its code protection and other security:

    https://www.kickstarter.com/projects/coflynn/chipwhisperer-lite-a-new-era-of-hardware-security

    I wonder how well it would do at breaking the code protection on the Propeller II?
  • Gadgetman Posts: 2,436
    edited 2015-03-26 05:13
    Interesting device, and I wish I could justify the cost...

    But there's a bit of inaccuracy in the writeup...
    Side channel analysis takes advantage of the fact that changing the state of a digital line uses a small amount of power.
    A side channel is ANY indirect leak of information that can be measured and analyzed.
    Other side-channel leaks are heat (people have uncovered the silicon and used IR microscopy to spot which parts are 'working harder'), noise (only suitable for relatively slow data streams), light and RF.

    @Tor; was that 32bit computer a ND system?
  • Heater. Posts: 21,230
    edited 2015-03-26 05:40
    During my time dealing with such things we called them "covert channels".
  • Gadgetman Posts: 2,436
    edited 2015-03-26 05:45
    A Norwegian saying goes something like this: A dear child has many names...
  • Heater. Posts: 21,230
    edited 2015-03-26 05:51
    Funny you should mention heat. I was just reading this story about using heat to get data out of one PC and into another "air-gapped" machine nearby.
    http://www.net-security.org/secworld.php?id=18122
  • Gadgetman Posts: 2,436
    edited 2015-03-26 06:03
    There are some serious problems with that 'technology':

    1. Air conditioning...
    2. A person is equivalent to a 200W heater, at least.
    3. Random 'high power usage' may be noticed.
    4. Other random equipment producing heat, such as an electric kettle, or just a big cup of tea... (I have a 4.5dl cup... That's a lot of heat in one spot on my desk)
  • Heater. Posts: 21,230
    edited 2015-03-26 06:14
    I'm sure there are many issues but I guess one could set up a demonstration that works.

    For a start, it already assumes both computers are compromised, as far as I can tell. One must get some code onto one machine to modulate the heat output, thus sending data, and get code onto the other machine to read temperature variations, thus receiving data.

    Here I could set up one machine under my bench to modulate its load, which would drive its horribly noisy fan on and off. The sound of the fan would be picked up by the microphone on the headset connected to the other machine under the bench. Easy.
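
    The sending side would only need something like this rough C sketch (POSIX timing calls, one second per bit, everything else hand-waved):

    #define _POSIX_C_SOURCE 199309L
    #include <time.h>

    /* Send one bit per second: spin the CPU (fan speeds up, machine gets
       noisy and warm) for a 1, sleep (fan winds down) for a 0. */
    static void send_bit(int bit)
    {
        struct timespec start, now;
        clock_gettime(CLOCK_MONOTONIC, &start);
        if (bit) {
            do {                                    /* busy loop: high load */
                clock_gettime(CLOCK_MONOTONIC, &now);
            } while (now.tv_sec - start.tv_sec < 1);
        } else {
            struct timespec rest = { 1, 0 };        /* idle: low load */
            nanosleep(&rest, NULL);
        }
    }

    void send_byte(unsigned char c)
    {
        for (int i = 7; i >= 0; i--)
            send_bit((c >> i) & 1);
    }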
  • Tor Posts: 2,010
    edited 2015-03-26 06:44
    Gadgetman wrote: »
    @Tor; was that 32bit computer a ND system?
    Yep, it had C functions like strcpy(), strcmp(), strcat(), strchr() in hardware; it also had variants like 'string compare with pad' and substring match.. and very complex loop constructs, in addition to the usual (and OK, except for the way too many addressing modes) arsenal of mul/div/add/sub/comp instructions. And it had a buddy-based memory allocator which could be used as the base for malloc() (something I actually used it for).
    The older 16-bit ND model started out as a nearly RISC-like, fixed-instruction-length, single-word-instruction computer, in some ways a bit Propeller-like in that the original instruction set encoded source and destination as bitfields of the instruction, and it was designed to be extremely simple to decode and execute. Even the assembler exploited that: each opcode could be created by adding together the mnemonics for the operation and the arguments (so whether you specified the source or the destination register first didn't matter..). Unfortunately each new iteration of the CPU added more complex instructions, but (as I found when I wrote an emulator for that one too) the compilers didn't use those complex instructions. And eventually the designers went completely overboard with the new 32-bit CPU, the definition of CISC if there ever was one. I hear the VAX instruction set is also way too complex.. I have done some assembly programming on VAX too but I'm not really familiar with it; I was just hacking on some driver code now and then over a period of some months. The ND systems I worked on for years.

    -Tor
  • Heater. Posts: 21,230
    edited 2015-03-26 07:18
    All far too high-level and complex.

    I want to build a one instruction computer.

    One op code, "SUBLEQ": subtract and branch if the result is less than or equal to zero.

    Three operands: subtrahend, minuend, and jump target.

    Of course, with only one op code you never need to write it, so in assembler you only write the three addresses.

    http://techtinkering.com/2009/05/15/improving-the-standard-subleq-oisc-architecture/

    Being so simple if the Propeller were SUBLEQ we could probably get hundreds of COGs into a chip :)
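
    In fact the whole machine fits in a forum post. A minimal interpreter sketch in C (the memory size and the halt-on-negative-PC convention are just choices I made for illustration):

    #include <stdio.h>

    #define MEMSIZE 4096

    long mem[MEMSIZE];

    /* SUBLEQ: fetch three operands A, B, C. Do mem[B] -= mem[A]; if the
       result is <= 0 jump to C, otherwise fall through to the next triple.
       A negative program counter halts the machine in this sketch. */
    void run(long pc)
    {
        while (pc >= 0 && pc + 2 < MEMSIZE) {
            long a = mem[pc], b = mem[pc + 1], c = mem[pc + 2];
            mem[b] -= mem[a];
            pc = (mem[b] <= 0) ? c : pc + 3;
        }
    }

    int main(void)
    {
        /* One instruction: subtract mem[5] from itself (clearing it) and
           jump to -1, i.e. halt. */
        mem[0] = 5; mem[1] = 5; mem[2] = -1;
        mem[5] = 42;
        run(0);
        printf("mem[5] = %ld\n", mem[5]);   /* prints 0 */
        return 0;
    }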
  • Tor Posts: 2,010
    edited 2015-03-26 07:45
    Aaah! Grr, now I have to go and implement my own virtual SUBLEQ machine and write a compiler for it.. I'll start with a macro assembler (one instruction, the rest are just macros!). I'm sure others have done that, but the fun is in doing it yourself.
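
    Probably something along these lines, sketched in C rather than a real assembler syntax (Z, NEXT and the emit function are my own invented conventions): the one primitive emits a SUBLEQ triple, and ADD, MOV and JMP are just sequences of it.

    #include <stddef.h>

    #define Z 0                     /* scratch cell, assumed to hold 0 between macros */

    static long prog[4096];
    static size_t here;

    static void subleq(long a, long b, long c)      /* the one real instruction */
    {
        prog[here++] = a; prog[here++] = b; prog[here++] = c;
    }
    #define NEXT ((long)(here + 3))                 /* address just after this triple */

    static void m_add(long a, long b)               /* mem[b] += mem[a] */
    {
        subleq(a, Z, NEXT);                         /* Z = -mem[a]       */
        subleq(Z, b, NEXT);                         /* mem[b] -= Z       */
        subleq(Z, Z, NEXT);                         /* Z = 0 again       */
    }

    static void m_mov(long a, long b)               /* mem[b] = mem[a] */
    {
        subleq(b, b, NEXT);                         /* mem[b] = 0        */
        m_add(a, b);
    }

    static void m_jmp(long target)                  /* unconditional jump */
    {
        subleq(Z, Z, target);                       /* 0 - 0 <= 0, always taken */
    }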

    -Tor
  • Heater. Posts: 21,230
    edited 2015-03-26 08:01
    Tor,

    Good man. If you can get the SUBLEQ VM to run on the Propeller that would be great.

    There is a SUBLEQ machine described here http://mazonka.com/subleq/ including an emulator, an assembler, and a compiler for a C-like language. Also an FPGA implementation.

    The SUBLEQ I linked to previously is a slightly improved machine that uses negative addresses to indicate indirection, so A* is the thing A points to, not A itself. This makes a lot of code much shorter and easier to write.
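
    Taking my interpreter sketch from earlier in the thread, that extension only changes how operands are fetched; roughly (my reading of it, details simplified):

    extern long mem[];   /* the machine memory from the interpreter sketch */

    /* "Negative address means indirect": operand -A refers to the cell that
       mem[A] points at, rather than to A itself. */
    static long effective(long operand)
    {
        return (operand < 0) ? mem[-operand] : operand;
    }

    /* Inside the interpreter loop the subtract step then becomes:
           long ea = effective(a), eb = effective(b);
           mem[eb] -= mem[ea];
    */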
  • Bob Lawrence (VE1RLL) Posts: 1,720
    edited 2015-03-26 08:27
    re: break key bits of an MCU by measuring the power consumption

    The same idea as in this process:
    ==============================================================================================
    " Non-Invasive Attacks on Microcontrollers
    I have been investigating into copy protection mechanisms in different microcontrollers since 1996. PIC16F84 was one of the first microcontrollers I have unprotected. Occasionally I do some research in this area and provide testing of new microcontrollers and renew testing of old.

    Now let me introduce what sort of equipment I have been using for non-invasive attacks on microcontrollers. I built my own designed programmer device controlled by my software program. It operates on IBM PC under MS-DOS or Windows 95/98 and it was written in C++. Hardware of my programmer allows me to rapidly change power supply and programming voltages in wide range causing different power glitches. This is quite essential if you provide attacks on microcontrollers. Signal block of the programmer generates 32 output digital signals with adjustable logic levels and some of them has open-drain output. Also this block inputs 16 signals at correct logic levels. This is quite enough for most of microcontrollers. Another part of the programmer is the board with different IC sockets for most of microcontrollers I have tested. For some microcontrollers I use additional socket adapters. "

    http://www.cl.cam.ac.uk/~sps32/mcu_lock.html
  • mindrobots Posts: 6,506
    edited 2015-03-26 08:29
    Dang it, you two!

    I'm supposed to be working, and now I'm off reading about SUBLEQ and SUBLEQ FPGA implementations, pondering and puzzling away my time at work!

    A pox on both of you for this distraction!! :D
  • Heater. Posts: 21,230
    edited 2015-03-26 10:02
    I'm sure your boss will understand your lack of output when you tell him it's because of us :)
  • mindrobots Posts: 6,506
    edited 2015-03-26 10:15
    Heater. wrote: »
    I'm sure your boss will understand your lack of output when you tell him it's because of us :)
    Probably won't even notice it!

    Set the bar low, going in!
  • Heater. Posts: 21,230
    edited 2015-03-26 10:22
    mindrobots,

    That is good, then. We will be expecting a SUBLEQ machine for the Propeller or an FPGA from you shortly.

    Of course SUBLEQ is not the only Single Instruction Computer (SIC) possible, so you might like to use some of your boss's time to look into those as well :)