The Volatile Keyword in C

Not understanding what the volatile keyword does and how it should be used is especially problematic because you don't get any errors or warnings. Your code will silently fail or present you with wrong results, which you may consider valid if you don't know what's happening.

You probably don’t need to use it often and that’s why this is such an underrated keyword. But it exists for a purpose. It allows you to say this to the compiler:

Please Mr Compiler, do not optimize code that uses this volatile variable, because it is, well, volatile. I mean, it can really change its value at any time and you are not capable of noticing it! Or I'm just doing stuff you don't understand, so please don't touch it. You have been advised.

In which circumstances can this happen? Let the examples begin. But first, a brief note: results will vary depending on the compiler you use. The compiler itself may decide to accept your "hints" or not. However, I think it's safe to say that nowadays any compiler is clever enough to consider this kind of "hint". The following examples were tested with MinGW (the GCC port for Windows), with compiler optimizations (the -O flag of GCC) enabled.

How Fast Can You Decrement a Huge Value?

Suppose you want to measure the time a program takes to decrement a huge integer value until it reaches zero. You would write something like this:

The lines that matter are 12 and 13: the variable declaration and the empty for loop. The other ones are there essentially for printing the elapsed time in seconds. Can you guess how much time it takes? Zero seconds (well, almost). Does it mean you have a super fast computer? No. You can increase the initial value of the variable and the result will be the same. But why? Because the compiler is not stupid: it notices that you do nothing in the for loop, so it decrements all the values at once and proceeds with the remaining code. Logically, this is absolutely right, but it doesn't allow you to measure the actual time it would take if no optimizations were implicitly applied by the compiler. Let's bring the volatile keyword to the rescue, then. Prepend the volatile keyword to the variable declaration, as in:

volatile uint64_t i = 999999999;

Now the program runs as expected and it takes about 8 seconds to finish (on my computer; it depends on the CPU speed and how busy it is).

Adding Some Complexity

Instead of doing nothing inside the for loop, what happens if we don't use volatile but use the value of i to compute the value of j?

Well, this means the final value of j will be computed from the last value of i. This is also easy for the compiler to optimize, so it will take zero seconds to finish unless you use the volatile keyword.
However, if the computation of j depends on the result of the previous iteration, i.e.:

Then the compiler will actually have to compute all the iterations to get the final result. Still, you may be assigning a value to j, but if you don't use it in the remaining code it's effectively the same as doing nothing (if it's not used, don't compute it). That's why we need to print it (this is our "do something"). In this case, the program takes about 2 seconds to finish. That's still less than the first example. Why? Because the compiler is still applying some optimizations we are not aware of in order to execute the code faster.

Finally, if both variables are declared as volatile, then it would take about 9 seconds, just a bit more than the first example.

So if you need ABSOLUTELY no optimizations on a given variable (“Mr Compiler, please, don’t touch it!”), use the volatile keyword.

GeckoLoader: An EFM32 Bootloader Utility

Are you looking for a CLI or GUI utility to upload programs to EFM32 microcontrollers through the factory-programmed bootloader? Look no further.

Why EFM32 microcontrollers?

EFM32 microcontrollers are my first choice for low-power sensor-based systems because, well, they are "Energy Friendly Microcontrollers", providing several interesting features specially designed for that purpose and a nice set of peripherals (check their website for details). With a 32-bit ARM Cortex processor at their core, performance is guaranteed. They also support the GCC ARM Embedded toolchain which, in my opinion, is a HUGE plus, making them really nice and easy to use (or to get started with), without code size limits or having to pay for a commercial toolchain (like Keil or IAR). Silicon Labs provides all the startup code, linker scripts, an easy-to-use and well-documented peripheral library, tools to monitor power consumption and even an Eclipse-based IDE. It's all inside Simplicity Studio, which can be freely downloaded.

Disclosure: I don't work at Silicon Labs and I am not in any way associated with them. I'm just an EFM32 enthusiast 🙂

A great feature of EFM32 microcontrollers by Silicon Labs is the factory-programmed UART bootloader that may be used to upload programs into the microcontroller instead of using a commercial programmer. The bootloader uses the XMODEM-CRC protocol to transfer data and, as referred to in application note AN0003 UART Bootloader, TeraTerm, which supports that kind of transfer, may be used. However, if you want the ability to upload a program from your own application, a command-line utility is what you need. Look no further: you have efm32_loader, which can run in CLI or GUI mode (GUI if no arguments are provided).

GeckoLoader

CLI mode usage:

UART: efm32_loader.exe <port_name> <bin_file> uart <boot_pol>
USB:  efm32_loader.exe <port_name> <bin_file> usb

Regarding hardware, all you need is a USB-to-UART converter connected to your computer. Connections are as follows:

TX  -- BOOT_RX (E11)
RX  -- BOOT_TX (E10)

Please be aware that, in order to prevent the bootloader from being overwritten, the linker script must be modified as described on application note AN0003.

The source code is available on Github:

And there’s also a Windows executable, available here:


Bug reports? Suggestions or feature requests? Please use the GitHub issues page or the comment section below.

Update (29/09/2015): added support for USB bootloader

The Art of Fixed-Point Representation

Have you been using float or double variables to perform mathematical operations on embedded systems without a Floating-Point Unit (FPU)? You are doing it wrong! That’s incredibly inefficient. Use fixed-point representation instead.

An FPU is a hardware block specially designed to carry out arithmetic operations on floating-point numbers. Even though C/C++ code may work without an FPU, it's always much faster to use hardware designed for a specific purpose, like this one, instead of relying on a software implementation, which is what the compiler will do for you: it knows the hardware restrictions you have, but it won't be efficient. Essentially, it will generate a lot of assembly code, greatly increasing the size of your program and the time required to complete each operation. Thus, if you don't have an FPU available and you still want to perform those arithmetic operations efficiently, you'll have to convert those numbers to fixed-point representation. Integers! But how? By scaling them. Let's see how that scaling value may be determined.

The scaling value, as well as the resulting scaled number (an integer), depends largely on the bitness of the CPU architecture being used. You want to use values that fit in the available registers, which have the same width as the CPU buses. So, whether you are working with an 8-, 16- or 32-bit architecture, the range of integer values we can store in those registers, with b being the number of bits and representing numbers in two's complement, is given by:

    \[-2^{b-1} \leq value \leq 2^{b-1} - 1\]

Fixed-Point Representation

If one bit is used to represent the sign (and in this text we'll always consider signed numbers), the remaining ones may be used to represent the integer and fractional parts of the floating-point number. We may textually represent this format as follows (denoted as Q-format):

    \[Qm.n\]

Where m corresponds to the bits available to represent the integer part and n corresponds to the bits available to represent the fractional part. If m is zero you may write just Qn. So, when you use a register to store both integer and fractional parts (and the sign bit!), the value range is given by:

    \[-2^{m} \leq value \leq 2^{m} - 2^{-n}\]

(note that the first expression is a particular case of this one, for n=0 and m=b-1).

It's up to you to decide how many bits are reserved for m and n (still, you should base your decision on good criteria: the more bits, the greater the precision you can achieve; more on this below). So, you are essentially fixing an imaginary point in your register that separates the integer and fractional parts. That's why it's called fixed-point.

Now, consider the following floating-point number:

    \[x = 0.123456\]

Since the integer part is zero, you have n=b-1 bits to represent the fractional part. You do that by multiplying the floating-point number by 2^n. And that's our scaler! As simple as that. For 8-, 16- and 32-bit architectures, these are the resulting scalers and corresponding scaled values (i.e., the floating-point number represented in fixed-point):

Q-format   Scaler   x_scaled           x_rounded
Q7         2^7      15.802             16
Q15        2^15     4045.4             4045
Q31        2^31     265119741.247488   265119741

Yes, after scaling the floating-point number you may still get a floating-point number, so you have to round it to get an integer.

Precision (Or The Lack of It)

Using finite word lengths (i.e. 8-, 16- or 32-bit registers/variables) limits the precision with which we can represent a number. The more bits we have, the greater the precision we can achieve. It's a trade-off. Usually, the fewer bits available (in the architecture as well as in physical memory), the cheaper the hardware.

Thus, it is important to answer the question: "how many bits do I need to represent a floating-point number in fixed-point format while retaining the same accuracy?"

The answer is given by this expression:

    \[d \log_2 10 \simeq 3.3 d \quad bits\]

Where d is the number of decimal digits of the fractional number. For example, to retain the accuracy of 0.123456 we need at least 3.3 \times 6 = 19.8 \simeq 20 bits.

Arithmetic Operations

Now if you want to perform arithmetic operations with fixed-point numbers, there are some rules to follow.

Multiplication

Multiplying two numbers with the same base and different exponents results in the exponents being added: x^a \times x^b=x^{a+b}. Since for each operand we have 2^{b-1} possible signed numbers, by multiplying them we'll get (2^{b-1})^2=2^{2b-2} possible signed numbers as a result. That means we need twice the space to store the result of a multiplication and that we get 2 sign bits in it. For example, if both operands are 16-bit values, we'll need a 32-bit register to store the result, where the two most significant bits are sign bits and the remaining 30 bits contain the result.

The same principle applies to fixed-point multiplication and the output Q-format is given by:

    \[Qm_1.n_1 \times Qm_2.n_2=Q(m_1+m_2).(n_1+n_2)\]

In the example above, if we represent the operands in Q15 format, then we'll get a result in Q30 format. To remove the extra sign bit you only need to shift the result left by one bit. You may also want to reuse the result in another multiplication; in that case, it's useful to convert it to the same Q-format as the operands, which means truncating it. Discarding bits that contain information results in loss of precision, so to reduce that loss the result is rounded before truncation: add half the weight of the last discarded bit, i.e. 2^{n-1} when right-shifting by n bits, or 2^n when right-shifting by (n+1) bits (in case the extra sign bit was already discarded by left-shifting).

The following code snippet illustrates these operations by multiplying 0.123456 by itself. This code can also be used to test fixed-point multiplication for different Q-formats.
By running it with Q7, Q15 and Q31 formats we can build the following table:

Format   Result (Fixed)   Result (Float)   Error
Float    -                0.015241         -
Q7       2                0.015625         0.000384
Q15      500              0.015259         0.000017
Q31      32730621         0.015241         0

As you can see, the result obtained with Q31 is the only one where the error is zero. That’s because it’s the only format in which we can retain the same accuracy as the floating-point numbers used (the multiplication operands).

Division

Division is trickier. When dividing two numbers with the same base and different exponents, the exponents are subtracted: x^a / x^b = x^{a-b}, which means:

    \[Qm_1.n_1 / Qm_2.n_2 = Q(m_1-m_2).(n_1-n_2)\]

If both operands are represented in the same format, you get the result in Q0. That's a problem! But it has a very simple solution: if we convert the dividend to Q(2 \times n) format, then we get the result in Qn. This is done by left-shifting the dividend by n. We also want to round our values; in division this is accomplished by adding half the divisor to the dividend before proceeding with the division itself. The following code snippet illustrates these operations.

There’s still another “trick”. Consider the following two operands and divisions:

    \[x = 0.123456 \qquad y = 0.654321  \]

    \[x/y = 0.18868 \qquad y/x = 5.3\]

The result of x/y can be represented with m = 0 (i.e., Q7, Q15, Q31). However, for y/x at least 3 bits must be reserved for the integer part. Hence, possible formats to use in this operation, for both operands and, consequently, the result, would be Q3.4, Q3.12 and Q3.28.
Unlike multiplication, the division of two fractional numbers without an integer part may or may not result in a number with an integer part. This is of extreme importance when choosing the most appropriate Q-format.

Again, the following tables can be built:

Format   x/y Result (Fixed)   x/y Result (Float)   Error
Float    -                    0.18868              -
Q7       23                   0.179688             -0.008992
Q15      6182                 0.18866              0.000012
Q31      405182993            0.188679             0.000002

Format   y/x Result (Fixed)   y/x Result (Float)   Error
Float    -                    5.3                  -
Q3.4     160                  10.0                 4.3
Q3.12    21737                5.306885             0.006885
Q3.28    1422717077           5.300034             0.000034

We can conclude that in Q3.4 format there aren't enough bits to get a result with reasonable accuracy. We can also see that Q31 is the only format that, by rounding the result to the same decimal place as the "original" (float) result, yields the exact same value.

Many small microprocessors don't have a hardware divider, unlike multipliers, which are very common. So division may still be considered a heavy operation (to be avoided where possible). However, it's always more efficient to do it with integers than with floats.

Addition and Subtraction

Addition and subtraction are equivalent operations since we are dealing with signed numbers. Two possible issues with these operations are overflow and underflow: the resulting value being larger or smaller than what can be stored in hardware.

The rule is: the sum of N individual b-bit binary numbers can require as many as b+log_2(N) bits to represent the result. For example, the sum of two 8-bit words requires an accumulator whose word length is 8+log_2(2)=9 bits to ensure no overflow errors occur.

Also, in fixed-point representation the operands to be added or subtracted must be in the same Q-format.

Additional Notes

Why did we choose 2^n as the scaler value? Couldn't it be another number? Usually we humans tend to use a 10^n value to scale or aid the representation of fractional numbers, as in scientific notation. That's a value we easily understand: it just moves the decimal point while the digits stay the same. However, in hardware, values are stored in binary, and as such using powers of two is more efficient. That's what allows us to convert numbers between different Q-formats by simply shifting them, thus improving the efficiency of arithmetic operations.

Now you should feel comfortable enough to embrace the world of fixed-point representation. However, if you don't know where to use it (or where it's commonly used), let me tell you one thing: fixed-point representation is widely used in Digital Signal Processing (DSP). You may use it to create digital audio effects, build inertial measurement units or extract other meaningful data from digital (and noisy) signals by filtering and processing them.

On the next post I will write about some types of digital filters and how they can be implemented in fixed-point.

The full source code to test fixed-point multiplication and division is available here.

Did you find this interesting? Let me know what you think.