Memory Barrier
TomUdale
in Propeller 1
Greetings,
Does this actually work on propgcc:
#define COMPILER_BARRIER() asm volatile("" ::: "memory")
I get very strange behavior in that it will compile under SimpleIDE, but when I try to look at the generated asm, I get:
error: 'asm' undeclared (first use in this function)
so I cannot actually tell if it did anything. In my experience with timings, the compiler is so sensitive to changes in code that there is no logic to looking at the actual number you get back and you really have to examine the assembly to figure out why a time did what it did.
I am trying to prevent movement of timing measurements when compiling with -Os optimizations in a COGC. For example:
cnt=CNT;
//do a bunch of stuff
externalStruct->time=CNT-cnt;
The compiler feels free to move both timing lines pretty much anywhere it wants since there are no dependencies between them and anything that happens in the "do a bunch of stuff" code. Indeed, strictly speaking I believe the following is an allowable reordering under the rules of C:
cnt=CNT;
externalStruct->time=CNT-cnt;
//do a bunch of stuff
Obviously it typically does not move them nearly that far, but enough to be annoying when tuning things. So I want to do something like:
cnt=CNT;
COMPILER_BARRIER() ;
//do a bunch of stuff
COMPILER_BARRIER() ;
externalStruct->time=CNT-cnt;
to lock those lines down.
All the best,
Tom
Comments
Did you try "__asm" instead of "asm"?
You mentioned that it worked for you under SimpleIDE, but you couldn't look at the generated asm? How are you generating the asm?
One oddity I have found with GCC is that "asm volatile" is not allowed outside of functions (i.e. at top level), but plain "asm" is. That shouldn't be an issue for your #define as long as COMPILER_BARRIER() is used inside of function bodies only.
For normal builds we have a makefile we run under Visual Studio. We started on SimpleIDE and moved to this approach when we needed multiple build configurations and could not find a nice way to make SimpleIDE do it.
I now use SimpleIDE only for generating asm listings and map files because we have not put that stuff into our makefile and I am too lazy to do so. So I just right click on the file and select the assembly option.
I was surprised when it choked so I tried to build from within SimpleIDE and it was fine. Hence my confusion.
I have not tried the asm volatile barrier outside a function yet. I will make sure not to.
Any chance __asm as suggested by ElectroDude will sort out SimpleIDE? Maybe it does not know about the asm keyword when generating assembly listings for some inexplicable reason? Or would I be better off just adding the ability to generate listings straight from my makefile?
All the best,
Tom
Have you tried it?
According to Google, asm isn't an official C keyword. You can tell GCC whether it should treat asm as a reserved word or as a keyword denoting a block of inline assembly. But since __asm and __asm__ start with double underscores, and anything starting with a double underscore is reserved to the compiler, the compiler can safely always enable them. I think you want __asm__ and not __asm, but I'm not really sure of the difference. David Zemon's PropWare library, which is full of inline fcache'd assembly, uses __asm__ volatile, so I guess that's what you want.
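For reference, a minimal sketch of the barrier macro written with the double-underscore spellings. GCC documents `__asm__` and `__volatile__` as alternate keywords that stay available under `-ansi` and the strict `-std=` modes, where plain `asm` is not recognized. Whether SimpleIDE's listing step actually builds in such a mode is only an assumption. The `shared_flag` variable and the `publish()` function are hypothetical and exist only to show the macro used inside a function body.

```c
/* Compiler-only barrier: emits no instructions, but the "memory" clobber
   tells GCC not to move memory accesses across this point. */
#define COMPILER_BARRIER() __asm__ __volatile__("" ::: "memory")

/* Hypothetical global, used only for illustration. */
int shared_flag;

/* Hypothetical helper: store a value, then read it back after the barrier. */
int publish(int value)
{
    shared_flag = value;   /* this store is not moved below the barrier */
    COMPILER_BARRIER();
    return shared_flag;    /* this load is not moved above the barrier */
}
```

Note that the clobber only constrains memory accesses. Arithmetic that lives purely in registers is not covered by it.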
I thought it was an extreme outside chance that the compiler would know the asm alias for compiling and not for disassembling but I was wrong. __asm does "fix" the problem in that I no longer get errors. I use "fix" because the barrier does not work.
Here is the macro that compiles:
Here is the end of my loop where I use it:
You can imagine of course that ct is set to CNT at the top of the loop.
Here is the assembly listing of the generated code:
You can see that CNT is recorded into a register on line 629. The difference and the address of stimLoopTime are calculated immediately afterward (but it is not saved until line 641). Then there is a pile of code from before the barrier.
What I expect is that lines 629-632 and 641 should be all together immediately before line 655. By my count, my loop time is 15 instructions shy of what it should be. This is not that big a deal because I have more slop in my downstream timings than 15 instructions. But I don't think that anything in principle prevents those 15 instructions from being 50 or 100. What if those instructions are a loop? That would be bad.
Any thoughts anyone?
All the best,
Tom
I've had luck with declaring a function and having that return CNT, and then using the function everywhere in place of CNT:
This seems to discourage code motion, I think because the function call acts as a sequence point.
Eric
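A minimal sketch of the wrapper idea described above, not Eric's actual code. It assumes that `<propeller.h>` provides `CNT` under propgcc and that `__PROPELLER__` is predefined there. The host stand-in for `CNT`, the `fake_cnt` variable, and the `timed_sum()` workload are all hypothetical, added only so the sketch compiles off-target.

```c
#ifdef __PROPELLER__
#include <propeller.h>   /* assumed to provide CNT under propgcc */
#else
/* Host stand-in (hypothetical): each read of CNT advances a fake counter by 4. */
static volatile unsigned int fake_cnt;
#define CNT (fake_cnt += 4u)
#endif

/* Read the counter through a function call instead of touching CNT directly. */
static unsigned int getcnt(void)
{
    return CNT;
}

/* Hypothetical workload: sum n ints and return the elapsed counter ticks. */
unsigned int timed_sum(const int *v, int n, int *sum_out)
{
    unsigned int start;
    int i, sum = 0;

    start = getcnt();
    for (i = 0; i < n; i++)
        sum += v[i];
    *sum_out = sum;
    return getcnt() - start;
}
```

Whether the call really discourages code motion depends on the optimizer. A small static function like this is a prime candidate for inlining, so the generated assembly still has to be checked.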
And I gather then that there is no __asm volatile("":::"code|memory") or whatever? It is very strange that this is not working because I saw this asm volatile code on several different blogs (e.g. http://preshing.com/20120625/memory-ordering-at-compile-time/) and this sort of movement was exactly the kind of thing that was being discussed. Indeed it seems somewhat impossible to write lock free code with GCC if this barrier only prevents movement of the memory access and not that of the subsequent register handling.
Anyway, I will try your getcnt() suggestion. I had thought of that before but did not even try because the ability of the gcc optimizer to unwind tricks like this is staggering. I thought for sure it would see through that function, inline it straight away and then apply the same optimizations as it does with the straight line code.
Thanks for the suggestion.
Best,
Tom
GCC has a noinline attribute, if you're afraid of it inlining getcnt():
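A sketch of what that could look like. `__attribute__((noinline))` is a documented GCC function attribute. The `CNT` stand-in for host builds, `fake_cnt`, and the `elapsed_between_reads()` helper are hypothetical, and the `<propeller.h>` include is assumed to be where propgcc defines `CNT`.

```c
#ifdef __PROPELLER__
#include <propeller.h>   /* assumed to provide CNT under propgcc */
#else
/* Host stand-in (hypothetical): each read of CNT advances a fake counter by 4. */
static volatile unsigned int fake_cnt;
#define CNT (fake_cnt += 4u)
#endif

/* noinline asks GCC to keep getcnt() as a real call at every call site. */
__attribute__((noinline)) static unsigned int getcnt(void)
{
    return CNT;
}

/* Hypothetical helper: ticks elapsed between two back-to-back reads. */
unsigned int elapsed_between_reads(void)
{
    unsigned int start = getcnt();
    return getcnt() - start;
}
```

The attribute only stops inlining. It does not by itself stop the compiler from moving unrelated computations across the call.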
The GCC documentation explicitly states that using the memory "clobberer" creates a compiler memory barrier:
https://gcc.gnu.org/onlinedocs/gcc/Extended-Asm.html#Clobbers
While it might conceivably be a bug in the propgcc implementation, I think the likely problem here is that we are not contending with a memory barrier issue at all but rather, as Eric said, it is a code reordering issue.
Thanks but unfortunately, it does not help. Here is the new code with the getcnt function:
Here is what you get with a bare getcnt definition:
You can see the function is indeed inlined at a very low level. All that remains is the return CNT;
Now here is what you get if you noinline getcnt:
Indeed it does not inline getcnt. But it changes nothing. Combining getcnt with the memory barrier is also completely ineffective.
I think the issue is that all the statements in the code sequence before my getcnt() line are almost completely independent of each other. There are no data dependencies that might enforce some order. So GCC in its wisdom just decides to reorder them based on who knows what.
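One way to give the compiler a dependency it has to respect is to name the computed value as an input operand of an empty asm statement. This is a common GCC idiom and not something suggested in this thread. Whether it pins the CNT read where it is wanted on propgcc is an assumption that would need checking against the listing. `FORCE_COMPUTED`, `timed_scale()`, `fake_cnt`, and the host stand-in for `CNT` are hypothetical names for illustration.

```c
#ifdef __PROPELLER__
#include <propeller.h>   /* assumed to provide CNT under propgcc */
#else
/* Host stand-in (hypothetical): each read of CNT advances a fake counter by 4. */
static volatile unsigned int fake_cnt;
#define CNT (fake_cnt += 4u)
#endif

/* Empty asm that takes v as a register input: v must be computed before
   this statement, and the "memory" clobber also orders memory accesses. */
#define FORCE_COMPUTED(v) __asm__ __volatile__("" : : "r"(v) : "memory")

/* Hypothetical workload: register-only arithmetic bracketed by CNT reads. */
unsigned int timed_scale(int x, int *out)
{
    unsigned int start, elapsed;
    int y;

    start = CNT;
    y = x * 3 + 1;        /* register-only work, no memory dependency */
    FORCE_COMPUTED(y);    /* y has to be computed before this statement */
    elapsed = CNT - start;
    *out = y;
    return elapsed;
}
```

The idea is that the work can no longer sink below the asm statement, because the asm consumes its result. Each value that matters would need to be listed this way.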
The only thing I can think to do now is either to put everything except the timing calls into a big function so I can enforce the order, or to see if there is some specific optimization that can be turned off (not just going from Os to O0, but rather turning off "statement-reorder" or some other hopefully existing thing). I know that Os is actually shorthand for 20 different specific things, so perhaps one of them is what causes this top-level reordering.
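As a sketch of the "turn off one specific thing" idea: GCC has an `optimize` function attribute that applies `-f` style options to a single function, and `-fno-schedule-insns` and `-fno-schedule-insns2` are real GCC options that disable the two instruction scheduling passes. That those passes are responsible for the reordering seen here is purely an assumption. The GCC manual also describes this attribute as a debugging aid rather than something for production code. `NO_SCHED`, `timed_fill()`, `fake_cnt`, and the host stand-in for `CNT` are hypothetical.

```c
#ifdef __PROPELLER__
#include <propeller.h>   /* assumed to provide CNT under propgcc */
#else
/* Host stand-in (hypothetical): each read of CNT advances a fake counter by 4. */
static volatile unsigned int fake_cnt;
#define CNT (fake_cnt += 4u)
#endif

/* GCC only: compile one function without the instruction scheduling passes.
   Other compilers get an empty macro. */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_SCHED __attribute__((optimize("no-schedule-insns", "no-schedule-insns2")))
#else
#define NO_SCHED
#endif

/* Hypothetical workload: fill a buffer and return the elapsed counter ticks. */
NO_SCHED unsigned int timed_fill(int *buf, int n, int value)
{
    unsigned int start;
    int i;

    start = CNT;
    for (i = 0; i < n; i++)
        buf[i] = value;
    return CNT - start;
}
```

The same options can be passed on the command line for a whole file, which may be easier to compare against the existing -Os listing.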
I dare say that GCC is too clever by half for microcontrollers.
All the best,
Tom