The true cost of C++ exceptions

Momtchil Momtchev
Jun 23, 2024

A rant about -fno-exceptions as an optimization technique

Not so long ago I got burned, for the second time in six months, by a horrible MSVC pitfall called the _HAS_EXCEPTIONS macro. The second time it took me about an hour to realize the problem. The first time it took me almost three days, including the time to write my previous story about using the Google address sanitizer in a Node.js addon on Windows [#7]. Should you be interested in the particular problem of exceptions on Windows for Node.js addons, I have included some links at the very end of the story.

This story focuses on whether disabling C++ exceptions when building is actually a valid optimization.

There is no such thing as C++ without exceptions

No more than there is C++ without const variables or C++ without switch statements. This horrible compiler option was born out of necessity and is inherited from the early days, when compilers had very poor exception implementations. A 2006 industry study, without any factual benchmarks, but with some very big names on its committee, including Bjarne Stroustrup, came to the conclusion that thanks to advances in compilation, the early performance problems of using C++ exceptions were more or less completely solved [#5].

I will cover the few still valid reasons not to use them in a later section.

Exceptions are one of the greatest advances in computer languages

This is not an overstatement. Exceptions enable structured error handling that greatly simplifies the code. Every single modern high level language has them. And C++, being this unique language that is both high level and low level at the same time, is not an exception (pun intended).

However, exceptions in C++ have a bad reputation. There is a very famous Google coding style document from the 2000s that forbids using them in Google code [#1]. There is also the browser mafia, very inclined to cut corners during the second browser wars, that famously decided to skip them. C++ exceptions are also controversial. A C++ compiler engineer, when asked about exceptions, is wary of starting a flame war [#2]. It seems that the LLVM team itself, the C++ compiler engineers, avoid using them! Novice C++ programmers are told that C++ exceptions are dangerous and slow. C++ exceptions are scary.

If you are new to C++ memory management, learn one simple rule: always catch exceptions by const reference and throw only std::exception objects. You will never have to worry about memory.

Yeah, but exceptions are slow, right?

Are they?

Show me the code and the benchmarks

There are already quite a lot of benchmarks that deal with throwing exceptions [#3] [#4]. And in case you didn’t know it: throwing an exception is expensive, orders of magnitude more expensive than returning an error code.

So if you plan to have a 10% exception rate, as it is the case in the second benchmark from the two cited above, then exceptions are not for you. Exceptions are about exceptional situations. When you have a 10% rate, this is not an error, this is part of the nominal mode and you need another error-handling mechanism.

This story is especially about the compiler option -fno-exceptions and its benefits.

So, before diving into the numbers, let’s clarify what -fno-exceptions does. When a C++ function gets compiled, the compiler must produce the so-called stack unwinding code: code that gets called to destroy all the local objects. As an exception must propagate through a certain number of function calls, it must follow this stack unwinding chain in order to destroy the local objects of each of them. Code compiled with -fno-exceptions lacks part of this information, which means that if an exception reaches it, the exception is transformed into a call to std::terminate, which ends the program.

In order to stress the function calling, we will be using the following horrible program which does nothing but function calls:

Note the branch prediction optimization. The differences we will be measuring are so small that, unless you get this right, you will be measuring mostly the branch prediction performance. It outweighs any impact from stack unwinding by a very wide margin.

We want to measure the impact of compiling with exceptions enabled. Also note that we are not actually throwing any exceptions — we consider throwing to be a very rare event — we want to measure the impact of simply using exceptions.

We will be compiling in a few different modes (you can find a link to the repository with the full code with all #ifdef at the end of the story):

  • no_unwind : without exceptions at all, throwing manually replaced by a std::abort through a compile time macro, compiling with:
-fno-exceptions -fno-rtti -fno-unwind-tables -fomit-frame-pointer
  • unwind_dontcare : without exceptions at all, throwing manually replaced by a std::abort through a compile time macro but exceptions enabled in the compiler — a very important test for everyone who is disabling exceptions for performance benefits in code that does not use them
  • unwind_noexcept : without exceptions at all, throwing manually replaced by a std::abort through a compile time macro but exceptions enabled in the compiler and the fibonacci function marked as noexcept
  • unwind_throwing : with throwing exceptions but without a catch block in fibonacci — only a single catch block in main
  • unwind_catching : the full deal as you see it
  • and a plain old C test — for everyone who firmly believes that the world lost a little bit of performance when we started switching to C++ compilers

I ran all of the tests with gcc, clang and Intel OneAPI without any significant differences, except for the -funroll-loops option, which is not enabled by default in clang. We will be discussing mainly the gcc results on x86-64. Keep in mind that x86-64 has very good hardware support for exceptions and stack unwinding and the results might be slightly different on other architectures.

Let’s try to analyze those results:

No optimization (-O0)

If you do not throw and do not catch, having or not having exceptions enabled comes down to nothing. Throwing has a small cost, catching at each recursion level has a more significant cost. In this case, there is real work to do — saving a handler before each call.

When it comes to the plain old C — the answer is — no, most of the time, compiling C in C++ mode does not lead to a performance loss.

What is absolutely remarkable is that the -fomit-frame-pointer binary is very slightly larger. How can omitting the frame pointer lead to a larger binary??

Well, this is lesson number 1

-fomit-frame-pointer, another not-so-great “optimization” idea.

Omitting the frame pointer saves a couple of instructions in the function prologue and epilogue. However, this also forces the compiler to access all the local variables relative to the stack pointer %rsp. On x86-64 an %rsp-relative memory operand needs an extra SIB byte, so each such instruction is one byte longer than its %rbp-relative equivalent:

Assembly listings compared: -fomit-frame-pointer -fno-exceptions vs. don’t care (i.e. compiler knows best)

Normal optimization (-O2)

Recursion unrolling, an optimization quite similar to loop unrolling, has kicked in. The functions are much bigger and run slightly faster. However, the LLVM compiler engineer cited above was right: throwing and catching get in the way of compiler optimization and render unrolling the recursion impossible.

Lesson number 2

A very important thing to note: our puny efforts at saving several dozen bytes from the function size have been completely dwarfed by the 500 byte increase from the recursion unrolling and now look totally ridiculous. Something to consider in the future.

Aggressive optimization (-O3)

Well, with -O3 gcc now has the upper hand over the exception handling. Recursion unrolling is possible in all cases. Not only are throwing and catching now completely free, but, by pure chance and because of better utilization of the CPU pipeline, catching at each recursion level is actually faster than compiling without exceptions. Just look at the stalled-cycles-backend counter to see the problem:

 Performance counter stats for './test-cc-O3-unwind_catching':

19 565,33 msec task-clock # 1,000 CPUs utilized
184 context-switches # 9,404 /sec
0 cpu-migrations # 0,000 /sec
110 page-faults # 5,622 /sec
66 614 649 139 cycles # 3,405 GHz (49,99%)
6 001 820 007 stalled-cycles-frontend # 9,01% frontend cycles idle (50,01%)
862 867 946 stalled-cycles-backend # 1,30% backend cycles idle (50,02%)
146 312 773 990 instructions # 2,20 insn per cycle
# 0,04 stalled cycles per insn (50,01%)
26 776 756 708 branches # 1,369 G/sec (49,99%)
414 037 604 branch-misses # 1,55% of all branches (49,98%)

19,568759571 seconds time elapsed

19,566978000 seconds user
0,000000000 seconds sys

 Performance counter stats for './test-cc-O3-no_unwind':

21 431,10 msec task-clock # 1,000 CPUs utilized
157 context-switches # 7,326 /sec
0 cpu-migrations # 0,000 /sec
52 page-faults # 2,426 /sec
72 985 918 854 cycles # 3,406 GHz (50,00%)
6 651 978 854 stalled-cycles-frontend # 9,11% frontend cycles idle (50,00%)
9 548 404 958 stalled-cycles-backend # 13,08% backend cycles idle (50,00%)
134 993 845 095 instructions # 1,85 insn per cycle
# 0,07 stalled cycles per insn (50,00%)
19 862 285 441 branches # 926,797 M/sec (50,00%)
788 489 544 branch-misses # 3,97% of all branches (50,00%)

21,433689694 seconds time elapsed

21,428887000 seconds user
0,003999000 seconds sys

I don’t want to dive into this 700-byte function to find out why. The results, which once again are the product of pure coincidence, might be different on a different CPU: this is an older AMD CPU for which the optimization might not be perfect. But still, the lesson stands and it is a very important one:

Lesson number 3

Simply avoid fiddling with compiler options. The compiler authors usually know what they are doing and it is unlikely they left out any hidden magical performance increasing options.

Adding code to the stack unwinding

As we already said, the job of the stack unwinding code is to destroy the local variables. However, in the first test, the function did not contain any local variables with destructors. We will see that adding even a single such variable makes any execution time difference impossible to measure, because the difference is so tiny. It will, however, allow us to see the code size difference. Here is the code for the second test:

We have added an artificial object construction that cannot be optimized away and has a non-trivial destructor that must be called when the exception travels up the stack. Although I have been very careful to avoid any memory allocation, this was more than enough to become the dominant part of the execution — now the execution time difference is (almost) impossible to measure.

However the code size experiment results remain the same — slight code size increase in the unoptimized case, that is completely dwarfed by the various loop/recursion unrolling optimizations:

Stack unwinding under the hood

We can use the last example to see the additional code needed for stack unwinding:

Actual unwinding code (https://github.com/mmomtchev/cpp-exceptions-cost/blob/main/img/test-vector-cc-O0-unwind_dontcare-annotated.png)

This is the raw unoptimized version. The unwinding code is a small subroutine at the end of the function that calls the destructors of the local objects. The new x86-64 endbr64 instruction, part of a security feature called Indirect Branch Tracking, marks valid jump locations and makes it particularly easy to spot the beginning of the function and the beginning of the stack unwinding routine. When the function exits normally, a jmp bypasses the unwinding code. try / catch statements save their entry points in a special memory section which works as a second parallel stack. Sometimes, especially for the standard library, static tables allow for greater efficiency. You can find a more detailed description in [#6].

Alternatives to exceptions

There are a few cases where not using exceptions is probably the right answer. For example, in the world of embedded software, where code size may be very important, disabling them allows skipping parts of the compiler runtime, if the compiler supports it.

Do not forget that most normal C++ compilers will always use the same precompiled C++ runtime which uses exceptions.

Also, in very low-level routines, especially when very high error rates are expected, alternative constructs such as Result in Rust or the new std::expected in C++ can offer a better alternative.

It is worth noting that Rust — which does not have exceptions, but relies only on the higher performance Result — is trying to be a successor of C and not C++.

Because of its special status as both high- and low-level language, C++ offers both mechanisms.

Conclusions

  1. If you don’t use exceptions, and you decide to disable them at the compilation level, you can expect some very small execution time gains in some cases, but generally they tend to completely disappear in an optimized build
  2. The code size gains from reducing the stack unwinding code are on the order of 5% to 10% but these are completely dwarfed by loop unrolling when using optimizations
  3. If you decide to actually use exceptions — without throwing that much — every try and catch block has a small cost, that tends to disappear in an optimized build
  4. A throw statement might get in the way of compiler optimization, don’t use it in tight loops
  5. If you throw a lot, you should try one of the alternative error handling mechanisms

If you feel tempted to disable exceptions when building your C++ project, think again. Take the time to actually test it. Check if it actually improves anything, because it definitely breaks things.

If it is a standalone application, then this will probably affect only yourself, when you start debugging it. But if you are building a target that people will be linking with, I very strongly recommend that you avoid doing this. Otherwise, sooner or later, you will have a user who will spend countless hours debugging a very bizarre problem.

References:

  1. Google C++ Style Guide: https://google.github.io/styleguide/cppguide.html#Exceptions
  2. Exceptions and performance, LLVM discussion: https://discourse.llvm.org/t/exceptions-and-performance/56185
  3. Investigating the Performance Overhead of C++ Exceptions: https://pspdfkit.com/blog/2020/performance-overhead-of-exceptions-in-cpp/
  4. P2544R0 C++ exceptions are becoming more and more problematic: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2022/p2544r0.html
  5. Technical Report on C++ Performance: https://www.open-std.org/jtc1/sc22/wg21/docs/TR18015.pdf
  6. An Introduction to Stack Unwinding and Exception Handling: https://www.zyma.me/post/stack-unwind-intro/
  7. Debugging random memory corruption with ASAN in a Node.js addon on Windows with MSVC: https://mmomtchev.medium.com/debugging-random-memory-corruption-with-asan-in-a-node-js-addon-on-windows-with-msvc-6246af0c22c7

The problem that inspired this story:

Using C++ exceptions in Node.js addons on Windows

Or the horrible _HAS_EXCEPTIONS pitfall in Node.js addons on Windows

If you are building Node.js addons, Microsoft and the Node.js core team have conspired to create a truly horrible landmine waiting for you. MSVC carries two distinct std::exception implementations that differ in memory size and behavior. If part of your code is compiled without exceptions enabled, while another part has them, you will get an almost working binary with some very subtle memory alignment errors. This is the case with Node.js itself and with node-gyp, which builds with exceptions disabled by default. Node.js does not use C++ exceptions and cannot trigger the problem. However, any Node.js addon that uses them is affected when running on Windows.

It is because of this problem that I decided to investigate the benefits of those two dreaded compiler options: -fno-exceptions and /EH* that caused me so much pain. MSVC is a particularly vicious offender since the default behavior deviates from the C++ standard — which means that almost every single project overrides it.

Here is more information on this subject.

  • A note in the SWIG JavaScript Evolution manual:
  • A note in gdal-async :
  • A gist with another example of different behavior of std::exception in MSVC depending on the _HAS_EXCEPTIONS macro:

I am an unemployed engineer living on social welfare in France because of a huge judicial scandal that involves a sexually-motivated extortion with the French police and some of the largest IT companies in the world.

My particular area of expertise is linking C++ and JavaScript.

I am the author of SWIG JavaScript Evolution, the hadron build system and I have authored and maintain a large number of bindings of C++ libraries for JavaScript.
