Introduction
When the C programming language first appeared, compilers did very little optimization when translating code into an executable binary. This led not only to larger executables but also, quite often, to the inclusion of CPU instructions that performed no meaningful task. The extraneous code was as often the fault of the programmer as it was that of the compiler.
A simple example of the programmer being at fault can be seen with the following code fragment.
int *x;

...

void func(void)
{
    *x = 1;
    /* OTHER INSTRUCTIONS */
    *x = 2;
}

...

func();
If the section marked "OTHER INSTRUCTIONS" in example 1 never uses the value pointed to by "x", then the first assignment *x = 1 is redundant and could be removed. Yet a compiler that did not perform optimization would not recognize this, and the *x = 1 statement would still end up in the executable.
In an attempt to avoid situations like the one outlined above, many developers have expended, and continue to expend, a great deal of effort on intelligent optimization in compilers. Today those optimizations handle a wide variety of situations. Not only will the typical compiler optimize out the extraneous *x = 1 in the example, it will also automatically handle much more complex situations that may not be immediately apparent even to a senior programmer. The expected end result is more efficient executables and a smaller memory footprint.
Normally having the compiler perform optimization is a good thing, but as with all things there is also a dark side. Consider what happens if the memory pointed to by variable "x" is a control line for an external device. Also imagine that sending a value of 1 to that memory tells our fictitious device to begin some operation and that sending a value of 2 tells it to stop. If the compiler optimized out the *x = 1 instruction, then the external device would never receive a start signal. This would definitely not be the intent of the programmer.
The first solution to the problem of a compiler optimizing where it should not is to turn optimization off completely. This would certainly work, but then you are sacrificing the efficiency of the entire program for one tiny section. Instead of going to such an extreme, the C programming language supplies an alternative: the "volatile" type qualifier in the definition of the variable in question.
When the keyword volatile is used in the type definition, it tells the compiler how to handle the variable. Primarily it says that the value of the variable may change at any time as a result of actions external to the program or to the current thread of execution. Once the compiler knows this, it knows that no operations involving the variable may be optimized out of the code, no matter how extraneous they appear to the optimization algorithm. It also knows that every use of the variable must read the value currently in memory rather than a previously cached value.
Declaring a volatile variable
Now that the effect of the "volatile" keyword has been briefly covered, we need to know exactly how to use it. It is quite simple if you are already familiar with the "const" keyword, since "volatile" may be used anywhere that "const" is allowed. This means that all you have to do to indicate that a variable is volatile is to include the keyword before or after the type indicator in a variable definition.
An example of declaring a simple volatile int type variable would be:
volatile int x;
int volatile x;
If you had a pointer variable where the memory pointed to was volatile you could indicate that using:
volatile int *x;
int volatile *x;
On the other hand, if you have a pointer variable where the address itself was volatile but the memory pointed to was not then we have:
int * volatile x;
Last but certainly not least if there was a pointer variable where both the pointer address and the memory pointed to were both volatile then you could do:
volatile int *volatile x;
int volatile *volatile x;
Common mistakes and pitfalls
We have already seen one example where declaring a variable to be volatile will ensure the proper compilation and execution of a program. You may in fact be tempted to declare all variables which are not const as volatile. If you are thinking of doing that then realise it is much simpler just to turn off all optimization at compile time instead of typing the word volatile repeatedly. Doing either of these things though would almost always be undesirable. Instead, determine which variables are truly volatile and declare those and only those as such.
There are still other pitfalls to using volatile that haven't been mentioned yet. Looking at some more example code:
volatile int index;

void thread_func_A(void)
{
    ++index;
}

void thread_func_B(void)
{
    ++index;
}
This pattern can be found in many examples of how to use the volatile keyword. The belief behind it is that the increment operation is a "simple" single-step instruction and that declaring the variable volatile makes the increment atomic. This may or may not hold true on a single-CPU system, and it is certainly incorrect if the code is executed on a multi-CPU computer.
The reason the code fails to operate correctly on a system with more than one CPU is that even a simple increment operation actually consists of three steps: the CPU reads the value at the memory location, increments it, and stores the new value back. This means that you can, and eventually will, end up with a sequence like:
- index = 0
- thread A reads index and finds 0
- thread B reads index and finds 0
- thread A increments its value of index by 1
- thread B increments its value of index by 1
- thread A stores 1 back to index
- thread B stores 1 back to index
It's immediately apparent that we expected index to be incremented once by each thread, ending with a value of two. That is not what happened, because declaring a variable volatile does not make operations on it atomic. It only guarantees that the increment reads and writes the value of index directly instead of using a cached copy. If atomic operations are really needed, then a true mutual exclusion lock, such as one provided by the POSIX threads library, should be employed.
Following is another example of where things can go wrong when volatile is used improperly. In many ways it is similar to the threading problem, but it also shows how platform-specific conditions can alter program execution.
volatile long int *dev = PORT;

void func(void)
{
    for ( ; ; ) {
        /* loop until device says data ready */
        while (0x00000001 != *dev) {
            /* short delay */
        }
        action_a();
    }
}
Since we have an example, let's set some other conditions:
- the size of a long int is 32 bits
- memory reads / writes are done 32 bits at a time
- the variable dev maps memory to some external device
- the external device may set dev to any value from 0x00000001 to 0xFFFFFFFF
- the program only wants to break from the loop when dev is exactly 0x00000001
Using the code from example three, if we can be assured that these conditions hold, then the program should work exactly as desired on any system. The problem is that proper operation depends on conditions that may be beyond our control. To see how, let's modify condition two so that we get:
- the size of a long int is 32 bits
- memory reads / writes are done 16 bits at a time
- the variable dev maps memory to some external device
- the external device may set dev to any value from 0x00000001 to 0xFFFFFFFF
- the program only wants to break from the loop when dev is exactly 0x00000001
It is easy to see that these are NOT contrived conditions. Code is often moved between platforms which, depending on the processor(s) and other hardware, may read and write memory in chunk sizes that differ from the size of the variable type you have declared as volatile. Given the new conditions, we could now have the following program execution sequence:
- value of dev is 0xFFFF0001
- code checks dev with two 16 bit reads, finds 0x00000001 != dev, enters short delay
- external device generates a value of 0x0000FFFF
- external device writes high order bytes setting dev to 0x00000001
- code exits delay with two 16 bit reads, finds 0x00000001 == dev, exits loop!!!
- external device writes lower order bytes setting dev to 0x0000FFFF
All that can be said is "oops". In the first case we assumed that any memory read or write would occur in a single step, and we were correct. The minute we switched to another platform with slightly different conditions, the single-step assumption no longer held and we got unexpected behavior from our program.
Now it should be pointed out that if the condition were 0x00000000 != *dev, and we cared only about that condition and not the exact value of dev, then the code would operate correctly on any platform. You could still potentially have a bit of a collision condition. An example of such a collision would be the case where dev was set to 0xFFFF0000 and then 0x0000FFFF. The difference between this collision and the one in the previous example is that here the code would simply be delayed slightly in the conditional loop instead of erroneously breaking out of it.
Conclusion
The point of all this is that volatile can be extremely useful in many circumstances, but you must use it judiciously. It is not meant to be a cure-all, and it certainly should not be used under the assumption that it will somehow standardize the way the underlying system operates. The only effect of volatile is on how the compiler generates code. If you remember these simple facts, you will be fine.