Compiler ignoring order of operations.
TomUdale
Posts: 75
in Propeller 1
Hi All,
In my quest to find ways to prevent the compiler from reordering all my operations around timing instructions, I discovered what looks to me like an error.
To prevent code movement I came up with the idea of enclosing the code I did not want to move in an if() block that is always taken but cannot be optimized away. The template is this:
extern volatile int alwaysTrue;

void foo(void)
{
    unsigned ct;
    unsigned time;

    ct=CNT;
    if(alwaysTrue)
    {
        // do a bunch of stuff.
    }
    time=CNT-ct;
}
This seems to pretty much prevent any of the // do a bunch of stuff code from floating outside the timing block. I wrapped this up in a macro:
#define BEGIN_COMPILE_BARRIER(name) if(alwaysTrue){
#define END_COMPILE_BARRIER(name) }
(The "name" parameter is for my second implementation which removes the overhead of the if but is not shown here).
However, when I started applying this concept in general, I suddenly got some broken code. It turns out that the error does not seem to be related to my efforts but is more general in nature.
Below is a section of the C code in question.
BEGIN_COMPILE_BARRIER(MEASURE_CALC_LOADS);
extraBase=CNT;
END_COMPILE_BARRIER(MEASURE_CALC_LOADS);
if (status)
{
    status->stimLoopTime=(extraBase-ct)+extraTime; // <This goes wrong.
    data->doneStatus=status;
}
extraTime=CNT-extraBase;
The error is around the calculation of stimLoopTime.
The generated asm is:
412:cpx_stimdriver.cogc ****    BEGIN_COMPILE_BARRIER(MEASURE_CALC_LOADS);
 603                    .loc 1 412 0
 604 0534 0000BC08      rdlong  r7, r12
 605 0538 00007CC3      cmps    r7, #0 wz,wc
413:cpx_stimdriver.cogc ****    extraBase=CNT;
 606                    .loc 1 413 0
 607 053c 00001408      IF_NE wrlong  CNT, sp
414:cpx_stimdriver.cogc ****    END_COMPILE_BARRIER(MEASURE_CALC_LOADS);
415:cpx_stimdriver.cogc ****    if (status)
 608                    .loc 1 415 0
 609 0540 00007CC3      cmps    r14, #0 wz,wc
 610 0544 0000685C      IF_E jmp    #.L33
416:cpx_stimdriver.cogc ****    {
417:cpx_stimdriver.cogc ****        status->stimLoopTime=(extraBase-ct)+extraTime;
 611                    .loc 1 417 0
 612 0548 0000BCA0      mov r7, r14        ; load address of status
 613 054c 0000BC84      sub r11, r8        ; subtract ct (in r8) from extraTime (r11)
 614                    .LVL31
 615 0550 1000FC80      add r7, #16        ; calculate address of stimLooptime
 616 0554 0000BC08      rdlong  r3, sp     ; read extraBase back.
 617 0558 0000BC80      add r11, r3        ; add extraBase to extraTime-ct.
 618 055c 00003C08      wrlong  r11, r7    ; write result to stimLoopTime
418:cpx_stimdriver.cogc ****        data->doneStatus=status;
 619                    .loc 1 418 0
 620 0560 0000BCA0      mov r7, r13
 621 0564 0C00FC80      add r7, #12
 622 0568 00003C08      wrlong  r14, r7
 623                .L33
419:cpx_stimdriver.cogc ****    }
420:cpx_stimdriver.cogc ****    extraTime=CNT-extraBase;
 624                    .loc 1 420 0
 625 056c 0000BCA0      mov r11, CNT       ; save CNT
 626 0570 0000BC08      rdlong  r5, sp     ; read extraBase
 627 0574 0000BC84      sub r11, r5        ; calculate extraTime
 628                    .LVL32
 629                .LBE3
421:cpx_stimdriver.cogc ****    }
 630                    .loc 1 421 0
 631 0578 00007C5C      jmp #.L34
I have commented the relevant areas. You can see that for some reason the compiler has decided to change the order of operations from
status->stimLoopTime=(extraBase-ct)+extraTime;
to
status->stimLoopTime=(extraTime-ct)+extraBase;
This is not really OK to do. extraTime has a value of about 20; ct is a raw CNT clock reading, as is extraBase. So extraTime-ct goes negative, which then wraps to a big positive number. Then we add another big positive number in the form of extraBase, and now our total elapsed time, which should be on the order of 2500, is massive.
If I comment out the BEGIN_COMPILE_BARRIER(MEASURE_CALC_LOADS) block around extraBase=CNT, the code changes to the following:
412:cpx_stimdriver.cogc ****    //BEGIN_COMPILE_BARRIER(MEASURE_CALC_LOADS);
413:cpx_stimdriver.cogc ****    extraBase=CNT;
 602                    .loc 1 413 0
 603 0524 0000BCA0      mov r7, CNT        ; save extraBase into r7
 604                    .LVL33
414:cpx_stimdriver.cogc ****    //END_COMPILE_BARRIER(MEASURE_CALC_LOADS);
415:cpx_stimdriver.cogc ****    if (status)
 605                    .loc 1 415 0
 606 0528 00007CC3      cmps    r14, #0 wz,wc
 607 052c 0000685C      IF_E jmp    #.L32
416:cpx_stimdriver.cogc ****    {
417:cpx_stimdriver.cogc ****        status->stimLoopTime=(extraBase-ct)+extraTime;
 608                    .loc 1 417 0
 609 0530 0000BCA0      mov r6, r14        ; save address of status
 610 0534 0000BC80      add r12, r7        ; add extraTime (r12) and extraBase (r7)
 611                    .LVL34
 612 0538 1000FC80      add r6, #16        ; calculate offset of stimLoopTime
 613 053c 0000BC84      sub r12, r8        ; subtract ct from extraTime+extraBase
 614 0540 00003C08      wrlong  r12, r6    ; save result
418:cpx_stimdriver.cogc ****        data->doneStatus=status;
 615                    .loc 1 418 0
 616 0544 0000BCA0      mov r6, r13
 617 0548 0C00FC80      add r6, #12
 618 054c 00003C08      wrlong  r14, r6
 619                .L32
419:cpx_stimdriver.cogc ****    }
420:cpx_stimdriver.cogc ****    extraTime=CNT-extraBase;
 620                    .loc 1 420 0
 621 0550 0000BCA0      mov r12, CNT
 622 0554 0000BC84      sub r12, r7        ; calculate extraTime
 623                    .LVL35
 624                    .LBE3
421:cpx_stimdriver.cogc ****    }
 625                    .loc 1 421 0
 626 0558 00007C5C      jmp #.L33
It has now recast the math to
status->stimLoopTime=(extraBase+extraTime)-ct;
which is still wrong but at least works in this instance.
But the underlying issue is that it is completely ignoring the ordering specified by the parentheses. I know that compilers can do lots of optimizing, but I don't think they get to redefine things like this. Order of operations involving subtraction is important.
Any thoughts?
Best regards,
Tom
Comments
What results do you actually get?
I'll also note that it looks like the code generated is incorrect in at least one place: The wrlong is using CNT as a destination register, but I think that will cause the shadow register attached to CNT to be used instead (I could be misremembering, but I think CNT only returns the right result when used as a source register).
Working around that bug may be awkward: you'll want to make sure that extraBase is in a register. Putting the CNT read in a subroutine call might help (as then it will be read into r0 instead of onto the stack).
Is that right? I guess that makes sense. I must confess that the subtraction-is-addition-of-two's-complement stuff never really made intuitive sense to me. I convinced myself in part because the results were indeed completely different. In the one case I had the expected answer. In the other case I had either negative or very large positive values, depending on how printf was feeling about things.
To be perfectly honest, this particular cog has gotten so screwed up I can barely make heads or tails of it. All things impossible are starting to happen and I am about to start back at zero with it. I thought I had one small victory with this particular impossibility but I guess not. I will keep looking at it.
All the best,
Tom
x + y - z
Might overflow during the addition. But then the subtraction could bring the result back into reality. For example:
0x7FFFFFFF + 1 - 1
Yields 0x80000000 after the addition, which has gone negative due to overflow. But the subtraction then yields 0x7FFFFFFF, the expected result.
BUT...
The C standard says the result of an overflow is undefined.
That means the addition above is free to produce any random number as a result.
That means it's pretty much impossible to write an expression in C, like x + y + z, that has a defined result!
Or am I missing a point here?
Does the C standard say anything about the order in which a bunch of additions and subtractions are done?
This is in fact entirely consistent with something I have been seeing for a long time. Indeed the counts have always gone wrong when the intermediate variable was on the stack. I always had the feeling there was something inherently flaky with the stack and have often done what you said. I am not sure why I did not remember this in this case. I guess because I kept increasing the stack size (thinking there was an overflow) and nothing changed.
Also, the variables are in fact all unsigned. They are counts after all and (I thought) there would be no chance of negative values because I did the math in a specific order.
My understanding was that it did not. That is why I specified the order with the () because I knew that doing it in that order would not result in an overflow.
Although, in retrospect, propeller code relies on the overflow/underflows working the way you understand. Every timing or wait is either
CNT-previousCnt
or
CNT+somevalue
For those two to work in general, it has to be the way you say.
I just looked into this and you are totally right. CNT can only be used as a source register. Indeed you practically quoted the manual:
"CNT is a read-only pseudo-register; when used as an instruction’s source operand, it reads the current value of the System Counter. Do not use CNT as the destination operand; that only results in reading and modifying the shadow register whose address CNT occupies. "
x + y - z
I might know that the addition may overflow. And I might know that the final result will always be in range. So I write:
x - z + y
To avoid the intermediate overflow.
BUT if the compiler can rearrange the order of those operations, the compiled code may well do the addition first and we are effectively back to:
x + y - z
Which does overflow and the final result is therefore undefined.
Ergo, it's damn hard to write an arithmetic expression in C that actually has a defined result!
Of course C defines operator associativity, which defines their evaluation order, and that is left-to-right for + and -, so the compiler should not be doing those in any order other than the one written.
As far as I can tell that is.
I had always assumed that using brackets would dictate the order of operations. Besides, operator associativity rules suggest order is preserved. Statements above indicated that the optimizer could still reorder things.
Which is it then?
But at least '*' and '/' have higher precedence than '+' and '-', which is often enough to sort out an expression to avoid overflows etc. But maybe not always.
That section (C90, 6.3) does not spell it out for me. What is a subexpression? I take that to mean the parts in parentheses. For example:
(a + b) + (c + d) + (e + f)
has three subexpressions.
We don't care which order the "a + b" or "c + d" or "e + f" is performed in.
We may care that the results of those three subexpressions are added in the order given. As the associativity rules suggest.
What I'm getting is what does the standard actually say?
We know it says the result of overflow is undefined.
If it were to also allow for reordering of operations, as hinted at earlier in the thread, then that would allow overflow where the programmer had carefully ordered operations to avoid it.
That would imply that the standard says that pretty much any expression is undefined!
This is all theoretical. As we know overflowing integer addition on any sane modern processor is defined.
I always imagined that at -O0 the compiler did no optimization and both the addition and division would get generated in code. On the other hand when optimizing, source code analysis can reduce it to just an assignment.
But that means that if overflow can occur, then optimized and unoptimized code would produce different results.
I get terrible headache whenever start reading standards documents.
This "random number" generating feature of integer arithmetic has always bugged me. Why don't processors throw an exception on overflow like they do for divide by zero?
Ada is the only language I know that takes care of this.
I was thinking about this on the drive in, and it occurred to me that because this particular processor (and indeed almost all modern ones) has non-destructive overflow rules, and because the compiler knows this, the compiler is allowed to ignore programmer-specified ordering. I believe the rule for optimizations is "don't change observable behavior". On these machines addition and subtraction order has no influence on the result, so changing that order will not change observable behavior.
However on a machine where the order did matter (say it has saturating logic on arithmetic) then I expect that the compiler will _not_ ignore user specified ordering under the assumption that the user is trying to avoid overflow.
The practical question therefore is should programmers of "normal" machines even worry about this stuff? It is probably folly to think you can write "overflow aware" code on a machine where it does not matter so maybe you should not even bother?
All the best,
Tom
So they all got negative results with a >= 0x40000000
SGI always were great computers. And IRIX.
By the way, is there some bug tracker I should be creating an entry in for this?
Of course it is expected that the compiler implementations define those things.
This leads to all the familiar "surprises" we get when moving code from one system to another, say from a 16 bit machine to a 32, or 32 to 64. Or working on an opposite endianness machine.
It's amazing anything works at all.
I was bitten by this a few years back when trying to create security certificates with openssl. For some reason they were created pre-expired despite my setting the expiry date far into the future. I was busy fighting with the VPN configuration so I did not think about it much. Later I found the same procedure worked on a 64 bit PC. Then it hit me: I had set the expiry date 40 years into the future, well past 2038. It just happens that the Unix time_t overflows its 32 bit integer size in 2038, so my certs were pre-expired! On investigation I found that there had indeed been a bug raised against this issue a year or so after it bit me.
As you may know, OpenSSL has become famous for being riddled with bugs recently, many of which can be attributed to the sloppy nature of C.
But looking at this discussion it's pretty clear that those who claim that C is just some kind of 'high-level' assembler are wrong. With assembly code you are in control of instruction sequence (there are out-of-order CPUs, but we don't want to go there), with C you don't, to some extent at least, as with high-level languages in general. And that's one of the distinguishing differences between a 'compiler' and an 'assembler' too.
As for OpenSSL, I took a look at that code back then (where the heartbeat error was), and I attribute that to horrible programming practice and not much else really.
Yes, writing endian-independent and 32/64-agnostic code is not all that hard. Don't forget 8 and 16 bit machines though.
However you might be surprised at how often I have had to sort out such problems when projects have migrated from machine to machine. It is often a problem when those brought up in the Windows-only world are involved.
I agree C is a lot more than a portable assembler; I think there is more to it than that. Let's look at Apple's SSL bug: apart from the bug pointed out there, I see a lot wrong with this code that I don't think the compiler should even accept.
1) Didn't Dijkstra point out that goto is harmful in 1968! Why is there even a goto in any high level language still?
2) Syntax inconsistency. See how an "if" can apply to a single statement or a block of statements enclosed in braces? This is a famous cause of bugs. The language should enforce the use of blocks after if, while, etc.
3) Those "if" statements are assigning to err as well as testing the call returns. Such side effects should not be allowed in a high-level language.
Enforcing the above would have prevented this bug.
One can say it's "horrible programming practice" to write like that. Which is true. But it would not take much for the compiler to spot such errors or even not support the feature at all.
Luckily we have linters and other such tools to find such problems.
I have not looked at the heartbleed bug closely but I bet we can find similar language issues there.
Probably the github issue tracker for parallaxinc/propgcc. Unfortunately the SimpleIDE version of PropGCC hasn't been updated in years, so it may be a long time before any bug fixes make it out to the general public.
Oh!! I did not realize that there were multiple versions of propgcc. Where is the location of the maintained propgcc? Is it this one:
https://github.com/parallaxinc/propgcc
I am not married to SimpleIDE at all. I just use it to run the disassembler. Nevertheless it would be ideal to just drop in the newest propgcc. Can it be used with SimpleIDE? Or are there some specific patches that SimpleIDE wants?
Tom
Eric
However, that extra 'goto' should have been easily detected; the compiler provides a warning about unreachable code. With the right flags (common now, in default automake setups) that should have failed the build.
It's not really a side-effect problem there, and it's not that bad: the coder has at least used the practice of extra braces around the expression, to tell GCC (and some other compilers) what the intention is. It could definitely have been written differently.
[..] The heartbleed bug was caused by someone trying to save typing by something like this (from memory): and passing that styp around to other functions; the only apparent purpose was to avoid having to write those long lines. Nothing wrong with shortcuts, as long as it's kept inside that single block. But the code was full of those kinds of pointer-copies.
Ah, you remember the original question! That is a nice solution. It is lightweight and does not confuse the compiler much. I found a separate approach that is akin to the if(alwaysTrue) path I was describing earlier. It is this: You use it the same as in the original post in this thread: Unwinding the macro you get: The asm goto statement tells the compiler that the associated asm "code" might jump to either the begin or end label. Because the compiler does not know what will happen, it cannot move any statements across either the begin or end labels in either direction. I included the name parameter to allow multiple blocks within a function or even nested blocks.
This seems to work and does not generate any overhead other than missed optimization opportunities.
The advantage compared to your approach is that for situations where there are large numbers of statements in the //do stuff block, you don't have to USE() a bunch of times to hold them all back. The major disadvantage is that, depending on the code, you can get lots of spurious "variable may not be initialized before use" warnings since the compiler thinks various assignments may get skipped over even though they will not. You do not get this with USE().
I expect there may be some other clever asm trick to "touch" parameters without touching them, thereby making it look like they were assigned, that might solve that problem. I have decided to ignore that for the moment.
I have learned several things in the last few days: how to block optimization movement, why my val=CNT was failing when val was a stack variable and that propgcc is not being updated with SimpleIDE; all good things. Furthermore, not 2 hours ago I finally found a bug I have been chasing for about a week. Faith is being restored to the land.
Thanks and
all the best,
Tom