Memory Barriers and __asm __volatile ("":::"memory")
idbruce
Posts: 6,197
Hello Everyone
I am trying to duplicate some Arduino code for use on a Propeller project. My goal is to prevent the compiler from optimizing and reordering certain sections of code. It is my understanding that __asm __volatile ("":::"memory") can be used for this purpose, but I am uncertain how to use it properly.
The Teacup firmware use a define named MEMORY_BARRIER() to accomplish this task. In the following code, you will see how it is used.
Of course the Propeller does not use a cli() function, nor is SREG defined, but __asm __volatile ("":::"memory") can be used by the Propeller, since it is utilized within atomicity.h.
Basically, I do not want any optimization to occur between ATOMIC_START and ATOMIC_END.
Any help in this matter will be greatly appreciated.
I am trying to duplicate some Arduino code for use on a Propeller project. My goal is to prevent the compiler from optimizing and reordering certain sections of code. It is my understanding that __asm __volatile ("":::"memory") can be used for this purpose, but I am uncertain how to use it properly.
The Teacup firmware use a define named MEMORY_BARRIER() to accomplish this task. In the following code, you will see how it is used.
#define MEMORY_BARRIER() __asm volatile( "" ::: "memory" ) #define CLI_SEI_BUG_MEMORY_BARRIER() MEMORY_BARRIER() #define ATOMIC_START { \ uint8_t save_reg = SREG; \ cli(); \ CLI_SEI_BUG_MEMORY_BARRIER(); #define ATOMIC_END MEMORY_BARRIER(); \ SREG = save_reg; \ } #endif
Of course the Propeller does not use a cli() function, nor is SREG defined, but __asm __volatile ("":::"memory") can be used by the Propeller, since it is utilized within atomicity.h.
Basically, I do not want any optimization to occur between ATOMIC_START and ATOMIC_END.
Any help in this matter will be greatly appreciated.
Comments
The whole ATOMIC_START and ATOMIC_END thing is based upon __asm volatile( "" ::: "memory" ), SREG, and cli(). In fact, there is no documentation for ATOMIC_START and ATOMIC_END except where it is defined within memory_barrier.h. Most documentation pertaining to __asm volatile( "" ::: "memory" ) pertains to preserving order during compilation, additionally there is also documention on preserving the value of the SREG register.
And what are you basing this upon? I have been investigating this for hours now, and I cannot find any documentation to support that.
When psu_timeout is incremented the value must be read from memory, one is added to it, and then it is written back to memory. If the interrupt were not disabled there is a chance that dda_step would clear psu_timeout between the time clock_250ms reads it, but before it is written back to memory. This would cause the clear to be lost.
psu_timeout is the simplest example of using the ATOMIC macros. The other uses involve manipulation of the queue, and are more complicated.
Note that there is no protection for psu_timeout in the interrupt routines. The assumption is that the ATOMIC_START will disable the interrupt and the interrupt routines will not run. This is not the case when using a separate cog on the Prop. There is no way for one cog to pause another cog. That is why you need to be very careful when trying to get this to run on the Prop. You will need to add locking logic to the timer routines as well as the main routines.
This is why I encourage you to run the thing as a single thread instead of multiple threads/cogs.
EDIT: If you use a single thread you don't have to worry about memory barriers, interrupt enables/disables or locks.
Imagine I have two processes that both run this code: What is the result?
Clearly to do that calculation each process has to:
1) Read the value of x
2) Add one to it
3) Write the new value back to x
That works fine as long as one process gets to do all that and then the next one. We get the result that X becomes x + 2.
But these processes run at the same time. So maybe they both read x and get say 1. Then they both add 1 to what they have. They now have 2. They then both write the value 2 to x. Se we end up with x + 1.
One needs to prevent this "race" happening. Hence enable/disable interrupts in an interrupt driven system or lock/unlock in a real multi-processor system. Well, yes, so you do get it after all.
There are four different items:
1) cli()
2) sei()
3) ATOMIC_START
4) ATOMIC_END
By quoting me, you agree that locks take care of the first two items, but how are ATOMIC_START and ATOMIC_END handled?
When I look at the Teacup source code I see things like this in clock.c: That is almost exactly the case I was describing in my previous post about x := x +1.
Then I see this in analog.c The comment about "16-bit copy" is a clue there. Remember the Arduino is an 8 bit machine. That means the 16 bits of "r" have to be written a byte at a time. That means that anyone reading "r" is in danger of getting a corrupt value as perhaps they read the low byte of an old value and then high byte of a new value.
So even writing a single variable is not an atomic operation if it is more than a byte wide.
Then I see something more complex in dda_queue.c Here we see a whole complex lot of stuff happening that all needs to happen "atomically" with no possibility of some other process reading or writing the data involved.
In short there are not four different items here. There is only one. You need to ensure that only one process can be accessing shared data at a time. You need mutual exclusion.
That mutual exclusion is done with cli/sei if the processes are interrupts. It is done with locks on the Prop. It is done with special hardware atomically locked instructions on Intel x86 and so on.
All those places where you see ATOMIC_START/ATOMIC_END exactly show where mutual exclusion is required. So just put the lock handling code in to those ATOMIC_xxx macros or functions. Leave the code that uses those unchanged.
Or you can just run single-threaded and not worry about the time-bombs buried in the code.