Memory Barriers and asm volatile ("":::"memory")

idbruce · 2015-03-20 14:02

Hello Everyone

I am trying to duplicate some Arduino code for use on a Propeller project. My goal is to prevent the compiler from optimizing and reordering certain sections of code. It is my understanding that __asm __volatile ("":::"memory") can be used for this purpose, but I am uncertain how to use it properly.

The Teacup firmware use a define named MEMORY_BARRIER() to accomplish this task. In the following code, you will see how it is used.

#define MEMORY_BARRIER() __asm volatile( "" ::: "memory" )

#define CLI_SEI_BUG_MEMORY_BARRIER() MEMORY_BARRIER()


#define ATOMIC_START { \
                       uint8_t save_reg = SREG; \
                       cli(); \
                       CLI_SEI_BUG_MEMORY_BARRIER();

#define ATOMIC_END   MEMORY_BARRIER(); \
                     SREG = save_reg; \
                   }

#endif

Of course the Propeller does not use a cli() function, nor is SREG defined, but __asm __volatile ("":::"memory") can be used by the Propeller, since it is utilized within atomicity.h.

Basically, I do not want any optimization to occur between ATOMIC_START and ATOMIC_END.

Any help in this matter will be greatly appreciated.

idbruce · 2015-03-20 16:07

For more information about this subject, visit this link: http://preshing.com/20120625/memory-ordering-at-compile-time/. However, I still cannot find anything similar to the code above.

idbruce · 2015-03-20 16:31

After hours and hours of research, I am slowly coming to the conclusion that ATOMIC_START and ATOMIC_END are solely for the purpose of preserving the value of SREG.

Dave Hein · 2015-03-20 16:39

No, that is not correct. They are used to prevent an ISR (or cog in the case of the Propeller) from modifying variables that are being used by the main thread (or cog). In the case of the Propeller you need to use a lock.

idbruce · 2015-03-20 16:51

Dave

The whole ATOMIC_START and ATOMIC_END thing is based upon __asm volatile( "" ::: "memory" ), SREG, and cli(). In fact, there is no documentation for ATOMIC_START and ATOMIC_END except where it is defined within memory_barrier.h. Most documentation pertaining to __asm volatile( "" ::: "memory" ) pertains to preserving order during compilation, additionally there is also documention on preserving the value of the SREG register.

No, that is not correct. They are used to prevent an ISR (or cog in the case of the Propeller) from modifying variables that are being used by the main thread (or cog).

And what are you basing this upon? I have been investigating this for hours now, and I cannot find any documentation to support that.

idbruce · 2015-03-20 16:55

Additionally, as stated previously, I think the locks should replace cli() ->while(lockset(lockID)); and sei()->lockclr(lockID);.

Dave Hein · 2015-03-20 17:20

The ATOMIC macros are use in several places, but the simplest example is when incrementing psu_timeout. psu_timeout is incremented every time clock_250ms is called. When it reaches a value of 120 (i.e., 30 seconds) the power to the stepper motors is turned off. The timer ISR will clear psu_timeout every time queue_step calls dda_step. The function clock_250ms must disable interrupts to prevent dda_step from being called.

When psu_timeout is incremented the value must be read from memory, one is added to it, and then it is written back to memory. If the interrupt were not disabled there is a chance that dda_step would clear psu_timeout between the time clock_250ms reads it, but before it is written back to memory. This would cause the clear to be lost.

psu_timeout is the simplest example of using the ATOMIC macros. The other uses involve manipulation of the queue, and are more complicated.

Note that there is no protection for psu_timeout in the interrupt routines. The assumption is that the ATOMIC_START will disable the interrupt and the interrupt routines will not run. This is not the case when using a separate cog on the Prop. There is no way for one cog to pause another cog. That is why you need to be very careful when trying to get this to run on the Prop. You will need to add locking logic to the timer routines as well as the main routines.

This is why I encourage you to run the thing as a single thread instead of multiple threads/cogs.

EDIT: If you use a single thread you don't have to worry about memory barriers, interrupt enables/disables or locks.

Heater. · 2015-03-21 01:43

Bruce,

And what are you basing this upon? I have been investigating this for hours now, and I cannot find any documentation to support that.

It's all based on simple logic.

Imagine I have two processes that both run this code:

x := x + 1

What is the result?

Clearly to do that calculation each process has to:

1) Read the value of x
2) Add one to it
3) Write the new value back to x

That works fine as long as one process gets to do all that and then the next one. We get the result that X becomes x + 2.

But these processes run at the same time. So maybe they both read x and get say 1. Then they both add 1 to what they have. They now have 2. They then both write the value 2 to x. Se we end up with x + 1.

One needs to prevent this "race" happening. Hence enable/disable interrupts in an interrupt driven system or lock/unlock in a real multi-processor system.

Additionally...I think the locks should replace cli() ->while(lockset(lockID)); and sei()->lockclr(lockID);.

Well, yes, so you do get it after all.

idbruce · 2015-03-21 02:28

Heater

Additionally, as stated previously, I think the locks should replace cli() ->while(lockset(lockID)); and sei()->lockclr(lockID);.

Well, yes, so you do get it after all.

There are four different items:

1) cli()
2) sei()
3) ATOMIC_START
4) ATOMIC_END

By quoting me, you agree that locks take care of the first two items, but how are ATOMIC_START and ATOMIC_END handled?

Heater. · 2015-03-21 03:02

Bruce,

When I look at the Teacup source code I see things like this in clock.c:

ATOMIC_START
    psu_timeout++;
ATOMIC_END

That is almost exactly the case I was describing in my previous post about x := x +1.

Then I see this in analog.c

ATOMIC_START
  // atomic 16-bit copy
  r = adc_result[index];
ATOMIC_END

The comment about "16-bit copy" is a clue there. Remember the Arduino is an 8 bit machine. That means the 16 bits of "r" have to be written a byte at a time. That means that anyone reading "r" is in danger of getting a corrupt value as perhaps they read the low byte of an old value and then high byte of a new value.

So even writing a single variable is not an atomic operation if it is more than a byte wide.

Then I see something more complex in dda_queue.c

ATOMIC_START
  current = &movebuffer[mb_tail];
  if ( ! current->live || current->waitfor_temp || current->nullmove)
    current = NULL;
ATOMIC_END

Here we see a whole complex lot of stuff happening that all needs to happen "atomically" with no possibility of some other process reading or writing the data involved.

In short there are not four different items here. There is only one. You need to ensure that only one process can be accessing shared data at a time. You need mutual exclusion.

That mutual exclusion is done with cli/sei if the processes are interrupts. It is done with locks on the Prop. It is done with special hardware atomically locked instructions on Intel x86 and so on.

All those places where you see ATOMIC_START/ATOMIC_END exactly show where mutual exclusion is required. So just put the lock handling code in to those ATOMIC_xxx macros or functions. Leave the code that uses those unchanged.

Dave Hein · 2015-03-21 04:10

In addition, you will either need to read through all of the interrupt code and sprinkle locks throughout it, or lock it when the interrupt code starts, and free the lock when the interrupt code finishes. The latter case is safer, and actually simulates how an ISR works, where it takes over the entire machine when running. However, in this case the main code will run in parallel with the interrupt code until it hits a critical point that is locked.

Or you can just run single-threaded and not worry about the time-bombs buried in the code.

Memory Barriers and __asm __volatile ("":::"memory")

Comments

Memory Barriers and asm volatile ("":::"memory")