This document is getting rather dated, but the basic idea - to add a simple and efficient C-ish exception handling mechanism to the C language - remains just as good as it was when this document was first written sometime back in the Summer of 2009. One thing, however, that I feel is close to worthless in this article is the thought experiment of using the "spare" Intel x86/x64 Carry flag for returning information on whether the value in RAX is a mere function result value or an exception pointer. On paper, it looks nifty and is kind of entertaining (somewhat like the Brainfuck programming language), but in real life it doesn't measure up. So my suggestions should be read as follows:
As for efficiency, compared to a setjmp/longjmp exception handling mechanism, I still believe this one is significantly faster in real life because it does not lead to the excessive code bloat (read: CPU cache misses) that the setjmp/longjmp version normally does.
While I've got your attention, I'd like to thank all the readers who have contacted me about this article. I don't recall all of your names and email addresses, but I do recall the stuff that you've told me and asked me about.
This paper presents an obvious extension of the C language implemented in a non-obvious way. It is basically a very rudimentary form of exception handling that requires compiler support. I use the same method in my assembly programs but would prefer to code in C, if possible. I will outline what I do in assembly and you will quickly see what I am aiming at. Please notice that I usually code 64-bit assembly, for which reason the examples are given in x86-64 assembler - please substitute all occurrences of RAX with EAX to make it work on a 32-bit x86 platform. The mechanism presented here is suitable for all contemporary microcomputer architectures and all contemporary imperative languages (even if a full register has to be used instead of a single bit for passing the information that an exception has occurred, as is the case with MIPS CPUs according to what I've been told).
The proposal here may be referred to as "C with exceptions" to keep up the good tradition established by Mr. Bjarne Stroustrup for his "C with Classes" proposal (which eventually became the C++ programming language). The proposal presented here is very fast, leads to very small code, and is completely thread-safe.
The idea is to provide C with a basic, yet highly usable exception handling mechanism whose performance is so great that it does not incur any noticeable overhead on the running program - whether or not one or many exceptions occur.
To many people, C is nearly a dead language. It seems everybody is using C++ and C# these days. This is not the case, however, as an extremely large number of legacy systems and even some contemporary systems (such as Linux) are being coded in C or a dialect thereof. Adding a simple and efficient exception handling mechanism to C would give many, many programmers an extremely useful tool for implementing their solutions.
As C is not an object-oriented language, I suggest that the exception item, the item that gets "thrown" by throw, is any word-size value - be it a pointer to a string, an integer, a pointer to an elaborate exception structure or whatever. The idea is to extend C minimally to give the programmer maximal freedom and efficiency so as to retain the simple and efficient nature of C. The most flexible seems to be to make throw take an arbitrary void * pointer and make that be the value that is passed down the call stack.
Despite being very interested in programming languages, compilers, interpreters, and their design, I have only some background in compiler design and implementation (even though I have been a programmer on an Ada95 compiler once). Therefore, I cannot easily extend existing open-source compilers with my ideas but have to appeal to you to do the actual work of adding the requested feature to your favorite open source C compiler.
When I code in assembly, I use the following convention:
So far, so good. Each assembler function is coded as follows:
The exception instance, ErrorParameter0, could be anything, but for the time being I am only using simple strings.
The caller of MemoryCreate invokes it as follows:
The really interesting thing here is that RAX is used simultaneously for the return code and for the exception instance pointer. As there is no exception instance pointer if there is no exception, the RAX register can safely be used for the function result in that case. And as there is no return value if an exception occurs, the RAX register can safely be used for the exception instance pointer in that case. The Carry flag is used to indicate whether an exception occurred or not.
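The convention can be approximated in portable C by returning a small structure in place of the RAX/Carry pair. The following is only a sketch under that assumption: the names Result, result_ok, result_throw, memory_create, and out_of_memory are hypothetical and not part of the proposal, and a compiler implementing the proposal would keep the flag in the CPU rather than in memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the RAX/Carry pair: `carry` plays the role of
   the Carry flag, `rax` the role of the RAX register. */
typedef struct {
    int   carry;  /* 0 = normal return, 1 = exception */
    void *rax;    /* function result OR exception instance, never both */
} Result;

/* Normal return: roughly "clc" followed by "ret". */
static Result result_ok(void *value)
{
    Result r = { 0, value };
    return r;
}

/* Exceptional return: roughly "stc" followed by "ret". */
static Result result_throw(void *exception)
{
    Result r = { 1, exception };
    return r;
}

/* Assumed exception instance: a plain string. */
static char out_of_memory[] = "Out of memory";

/* Allocates `size` bytes. A size of zero is treated as a failure here,
   purely to make the error path easy to trigger. */
static Result memory_create(size_t size)
{
    void *block = (size == 0) ? NULL : malloc(size);
    if (block == NULL)
        return result_throw(out_of_memory);
    return result_ok(block);
}
```

A caller inspects `carry` first and only then decides how to interpret `rax`, which mirrors the jc instruction that would follow the call in the assembly version.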
The Carry flag is used for one reason only: Because the Intel x86(-64) architecture offers instructions to clear and set this flag directly. On other architectures, any flag that can be directly set and cleared can be used.
Experienced readers may recognize the convention as almost identical to that of the BIOS int 13h functions: Upon error, the Carry flag is set and upon success the Carry flag is cleared. The main difference between this scheme and the BIOS int 13h services is that the BIOS requires the user to make another system call to query the actual error code, whereas this scheme has already loaded the EAX/RAX register with a near pointer to the exception instance.
Overall, the handling of exceptions is now reduced to the simple question of jumping to the exception handler if an exception occurs. If no user-defined exception handler (catch or finally block) is defined, all that needs to be done is to propagate the exception - which takes nothing more than a ret instruction. The Carry flag remains set and the RAX register is already loaded with the proper value, so all that is left is to return to the immediate caller, who can then either repeat the pattern of propagating the exception or break the pattern by actually handling the exception.
If there is program data that needs to be cleaned up prior to returning from a given function, it is only a matter of specifying a finally handler. For languages that offer automatic cleanup, such as C++, the compiler can synthesize an appropriate finally handler to handle the cleaning up whether or not an exception occurs. In many cases, probably most cases, the compiler will only need to synthesize a single finally block for both of the cases.
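The effect of a finally handler can be sketched in plain C with the flag simulated in a structure. This is an illustration only: Result, check, process, bad_input, and cleanup_count are hypothetical names, and with compiler support the cleanup would be the target of a jc instruction rather than ordinary straight-line code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct { int carry; void *rax; } Result;  /* simulated Carry + RAX */

static char bad_input[] = "Bad input";  /* hypothetical exception instance */
static int cleanup_count = 0;           /* counts how often the finally code ran */

/* Hypothetical callee: throws when `n` is negative, else returns NULL. */
static Result check(int n)
{
    Result r = { n < 0, n < 0 ? (void *)bad_input : NULL };
    return r;
}

/* Allocates a buffer, calls check(), and always releases the buffer. */
static Result process(int n)
{
    void *buffer = malloc(64);
    Result r = check(n);

    /* finally: one shared cleanup path for both outcomes */
    free(buffer);
    cleanup_count++;

    return r;   /* carry and rax are passed on unchanged */
}
```

Because the cleanup code does not touch the result, a single block serves both the normal and the exceptional path, which is the case the paragraph above expects to be the most common.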
I suggest the most straightforward and commonly recognized syntax for exception handling:
This would, in x86-64 assembly, be roughly equivalent to this code:
As C is not an object-oriented language, it does not make sense to derive exceptions from a predefined class. Instead, a simple void * pointer is used. The programmer can use this value for anything. This means the exception code will most likely only be usable in a single context, but that is still a major improvement over not having any exception handling in the language at all.
Obviously, the throw statement on its own would be used to re-throw an already signaled exception, just like in pretty much every other C-ish language out there, which would simply entail ensuring the Carry flag was still set and that RAX/EAX was loaded up with the former exception value.
I imagine something like this:
Obviously, it would be nice if one could write things like:
But that goes way beyond the scope of what I have in mind. Such a solution would require some sort of run-time identification of every exception type and that basically means redoing what has already been done for C++. Instead, I propose the exception instance to be a simple void * pointer that by convention points to an Exception structure, or a structure whose first field is an Exception structure, so that it is possible to mix and match modules from various vendors in one single project. However, my primary goal is to implement a C exception handling mechanism that is very efficient while beneficial in a single project. If people need or want a better exception handling mechanism than the one offered here, I suggest they make the transition to C++ or a similar modern language.
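The "first field" convention can be illustrated as follows. The layout of Exception below is an assumption made for the sake of the example (the fields code and message are hypothetical), as are FileException, file_not_found, and describe. The sketch relies only on the C rule that a pointer to a structure, suitably converted, points to its first member.

```c
#include <assert.h>
#include <string.h>

/* Assumed layout; the real Exception structure may look different. */
typedef struct {
    int         code;
    const char *message;
} Exception;

/* A vendor-specific exception that starts with an Exception. */
typedef struct {
    Exception   base;      /* must be the first field */
    const char *filename;
} FileException;

static FileException file_not_found = { { 2, "File not found" }, "config.txt" };

/* A handler that knows only the common Exception part of what was thrown. */
static const char *describe(void *thrown)
{
    const Exception *e = (const Exception *)thrown;  /* valid: base comes first */
    return e->message;
}
```

A handler written by one vendor can thus report an exception thrown by another vendor's module without knowing the extended structure, while a handler that does know it can cast the same pointer to FileException.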
You may worry that all the jump-if-carry (jc) instructions will have a huge performance impact on the generated code. Not so. I have compared my exception handling mechanism to that of C++ and, in my own test, my method was about four times faster than the exception handling mechanism of the C++ compiler I used. Furthermore, it generates code that is much smaller than the code generated by compilers that implement the traditional C++ exception handling mechanism. Notice that the jump instructions are almost never taken. I have read somewhere that forward jumps are assumed, by the Intel 80x86 processor, not to be taken. In other words: The processor happily assumes that forward jumps are not to be executed, so no stalls occur, which is exactly what we want. I have not been able to find this information again, though. In comparison to the Intel processors, processors such as the PowerPC 603 define a mechanism that lets the programmer, or translator, specify if the jump is likely to be taken or not. On such architectures, all of the jump-if-carry instructions should be encoded so that their opcodes indicate that they are not likely to be taken.
For those worrying about the extra clock cycles imposed by the stc, clc, and jc instructions, it is worth noticing that C programmers usually have to check the return value of the called function explicitly to determine if an error has occurred. This is tedious, error-prone, and adds a lot of overhead. Using the presented exception handling mechanism, the programmer no longer has to explicitly check the return value of the called function but can freely focus on his or her real task at hand: to write code that makes use of some functions. The model presented here is actually much more efficient than the explicit checking of return values because it only requires the clc and jc instructions in the typical case that no exception occurs. If an exception occurs, the run-time overhead entails an additional move instruction that loads the RAX/EAX register with the exception instance pointer. I have intentionally used a very complex example of throwing a non-trivial exception instance, allocated on the heap, so as to demonstrate just how powerful this scheme is. Anything can be thrown and exceptions typically occur only very rarely, for which reason it is of no importance how complex and time-consuming the code that generates the exception is.
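In C source code, a comparable "rarely taken" hint can be expressed with the __builtin_expect builtin offered by GCC and Clang. The sketch below assumes one of those compilers and falls back to a plain test elsewhere; UNLIKELY, Result, EX_NEGATIVE, half, and sum_halves are hypothetical names. The builtin is only a hint that may influence code layout, and the sketch makes no claim about how any particular processor predicts the branch.

```c
#include <assert.h>
#include <stdint.h>

/* GCC and Clang provide __builtin_expect; other compilers get a plain test. */
#if defined(__GNUC__)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define UNLIKELY(x) (x)
#endif

/* Simulated Carry + RAX; here the word-size value is an integer. */
typedef struct { int carry; intptr_t rax; } Result;

#define EX_NEGATIVE ((intptr_t)-1)  /* hypothetical exception value */

/* Returns n / 2, or throws EX_NEGATIVE when n is negative. */
static Result half(intptr_t n)
{
    Result r;
    r.carry = (n < 0);
    r.rax   = r.carry ? EX_NEGATIVE : n / 2;
    return r;
}

/* Sums half(i) for i in [first, last]; propagates the first exception. */
static Result sum_halves(intptr_t first, intptr_t last)
{
    Result total = { 0, 0 };
    for (intptr_t i = first; i <= last; i++) {
        Result r = half(i);
        if (UNLIKELY(r.carry))   /* the "jc": hinted as rarely taken */
            return r;
        total.rax += r.rax;
    }
    return total;
}
```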
I have today, Friday 4th of September, 2009, written two simple C and C++ test programs to test an approximation of the performance of this method. The result was that the C version (which simulates the method presented) performed four times faster than the C++ test program. I'll be happy to email my test programs to anyone who cares for them, but they basically do the following:
The C version took 7.5816 seconds to execute whereas the C++ version took 28.2984 seconds. These numbers were obtained on a Dell XPS 420 3.16 GHz Intel Core 2 Duo running Microsoft Windows Vista Home Premium x64. The code was compiled and run in 32-bit mode using the OpenWatcom compiler v1.8. As the C version is not as efficient as it would be with proper compiler support, I guess that a compiler-supported implementation would perform five to ten times better than the C++ version. The reason is that the C version does not have to perform a setjmp() call for each try-catch block, but can happily execute as if nothing had happened at all (please notice that try expands to nothing in the scheme that has been presented in this article). Also, the C version was precisely 33 percent smaller than the C++ version.
I have received a single complaint that I did not properly address border conditions in my paper. I feel that I actually do, because the scheme presented here is so trivial and simple that there are no difficult border conditions. Nonetheless, I will try to complete the presentation so that each and every possible case is considered and explained.
There's the simple case where an exception is thrown and caught within the same function:
When optimized, this code becomes:
This is because the "a += 1;" statement is clearly superfluous and will be optimized away, and because the catch handler neither re-raises the exception nor raises a new one.
The "C with Exceptions" code below:
Becomes the Intel x64 assembler code shown below:
Handling exceptions across functions is actually just as simple as handling exceptions within functions. If the Carry flag is set, the caller jumps to its local catch block. If there is no catch block, the caller simply returns the unaltered Carry flag and Exception Instance Pointer (probably RAX) to its caller and so forth, until an exception handler is found. If an exception handler exists, the caller simply transfers control to it provided the Carry flag is set. It couldn't be much simpler than this. And at the same time, it cannot be much faster than this. This just shows that the KISS philosophy (Keep It Simple, Silly!) generally produces extremely fast code.
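Propagation through a function without a handler can be sketched in plain C, again with the Carry flag simulated by a structure field. All names here (Result, write_block, save_file, save_all, disk_full) are hypothetical. With compiler support, the middle function would contain no explicit test at all; it would merely return with the flag and register untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct { int carry; void *rax; } Result;  /* simulated Carry + RAX */

static char disk_full[] = "Disk full";  /* hypothetical exception instance */

/* Innermost function: throws when `fail` is nonzero. */
static Result write_block(int fail)
{
    Result r = { fail != 0, fail ? (void *)disk_full : NULL };
    return r;
}

/* Middle function: no catch block, so an exception is simply returned. */
static Result save_file(int fail)
{
    Result r = write_block(fail);
    if (r.carry)
        return r;            /* propagate: carry and rax untouched */
    /* ... normal work would continue here ... */
    return r;
}

/* Outermost function: has a catch block and so ends the propagation. */
static const char *save_all(int fail)
{
    Result r = save_file(fail);
    if (r.carry)
        return (const char *)r.rax;   /* catch: report the exception text */
    return "saved";
}
```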
A major disadvantage of the proposal presented here is that it only works if the client code is compiled to conform to the new exception handling mechanism (which is also the case with many C++ exception handling mechanisms). The compiler needs to generate code to propagate unhandled exceptions also in code that does not make use of exception handling. This could easily be implemented by making a new calling convention, say excall, which is automatically enabled for all functions in the program if a given command-line option is specified. Functions that do not make use of the exception handling feature still need to clear the Carry flag upon return - or, alternatively, the compiler needs to keep track of what functions are compiled using the excall calling convention and what functions are not compiled using this calling convention and handle them appropriately.
The disadvantage mentioned above is really rather small: Most PC compilers offer a number of calling conventions, suited to interfacing with various forms of code, and more importantly, the overhead of using the proposed exception handling mechanism is so slight, that it actually is smaller than the overhead of manually testing return codes to see if an error has occurred.
The most beautiful thing about the proposal that I have made here is that it requires no run-time library support whatsoever. Those of us accustomed to making embedded systems using C or C++ know all too well what a pain it is to make a C++ program that uses exceptions run on a bare system. The scheme presented here works right out of the box without any additional run-time library support. The only run-time library support that can be of relevance is a small set of predefined exception instances that define errors that are likely to happen in any program. I have shown a subset of this set in the definition of the header file Exception.h.
There are a number of existing alternatives to the proposal presented here:
All of the presented alternatives, except the undocumented xerror1 module by myself, use setjmp()/longjmp() as the means of transferring control across call frames, which is a very compatible but space-consuming and slow way of doing it. "C with Exceptions", the proposal described in this document, relies on compiler support but in return results in code that will be a bit smaller than the equivalent manually coded error checking code and will also run a bit faster than such code. The reason is that the exception handling mechanism presented here relies on checking a single architecture-dependent CPU flag, whereas manually coded error handling code generally relies on comparing a host of different return values against another host of predefined constants (some functions return an integer that indicates success/failure, some return a pointer, some return a char, some set a global variable, and it is all one big mess).
A compiler such as the OpenWatcom compiler already includes a feature that allows the running program to query the name of the containing function (by deciphering an encoded string that is stored right before the start of the function). Combining this feature with the presented exception handling mechanism, it would be possible to offer full stack traces while retaining the extraordinarily high execution speed that this scheme results in.
Like most nerds out there, I'd love to participate in the design and implementation of a microprocessor. I often think that what we need from contemporary microprocessors is a more high-level approach rather than loads of useless, unused features such as segmentation and so on.
The scheme presented here is extremely well suited for inclusion in a future microprocessor: With a few additions to the basic instruction set, the CPU could become exception aware on the lowest level, opening up for the most beautiful implementation of various system error conditions.
Let's assume that the processor gets a hardware register devoted to the exception instance pointer. Then the processor also needs a jump-if-the-exception-register-is-nonzero instruction (and the complementary jump-if-the-exception-register-is-zero instruction, just for the sake of sensible completeness).
On Intel x86/x64 processors, there's already a jump-if-rcx-is-zero instruction. But what we need is the complementary instruction, jump-if-rcx-is-nonzero. Unfortunately, this instruction does not exist - otherwise it would have been logical to use the RCX register as the exception instance register.
But back to our dream microprocessor:
The great thing about adding an exception instance register and an exception flag to the CPU is that many, many low-level error conditions can be handled directly by the application code - without involving the kernel - so that the CPU could simply assign the Exception Instance Register a predefined exception instance pointer whenever a system error occurred. For instance, take the case of an invalid address being accessed. There's really no need to notify the system about this as long as the application code is exception aware and can handle the exception directly. So if client code X accessed illegal address Y, the processor would simply load the Exception Instance Register with a pointer to a predefined system exception instance and let the client code X handle the exception without notifying the operating system about the addressing error. Some will probably feel that the operating system must know about all serious errors in the system, but not so. Why involve the operating system in something that can be settled directly between the CPU and the application code? I see no reason to, except perhaps for statistical purposes, but in that case, the CPU could possibly be programmed to signal an interrupt whenever something untoward happened. Again, the CPU could be configured to behave however the operating system designer wants, but still offer a very neat and elegant way of handling low-level errors.
This informal paper presented a compact and fast alternative to the traditional stack unwinding that takes place in typical C++ programs. The alternative has been measured to be about four times faster than the equivalent C++ stack unwinding mechanism. The performance gain is achieved by letting the CPU perform the stack unwinding rather than by programmatically implementing an explicit stack unwinding procedure. It is the belief of the author that the method presented here is suited for all languages that implement exception handling (C++, Ada, Java, C#, etc.). The basic idea of using a flag to indicate whether an exception has occurred or not, and to use the primary return value register for storing a pointer to an exception instance, can easily be applied to any imperative language.
I am sending this article to a number of independent compiler writers. I am doing so because I really hope to one day see this excellent error handling method implemented in a real compiler on the market. Most of all, I hope to see this mechanism implemented in one of the open source, freeware compilers that I use:
But I encourage commercial recipients of this article to explore what I have written and to implement it in their compilers. Doesn't a speed-up of four times sound promising? Exception handling is becoming more widely used every day, so more and more programmers can benefit from what this article presents.
I am certain that the Linux coders out there would appreciate having some form of exception handling mechanism in their code - even though some of them would probably laugh at the idea. But having coded in C and C++ for about 20 years, I can only say that the greatest problem of C is not the lack of object-oriented features (these are straightforward to simulate to a certain degree), but the lack of an efficient exception handling mechanism.
I am aware that traditional stack unwind exception handling can be simulated by using the setjmp() and longjmp() functions instead of try/catch blocks, but such an approach is way too expensive, in terms of clock cycles, and also makes the size of the program explode.
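For comparison, a minimal setjmp()/longjmp() simulation might look as follows. It is a deliberately simplified sketch and not taken from any of the existing libraries: the names handler, thrown, throw_exception, divide, safe_divide, and divide_by_zero are hypothetical, and the single global jmp_buf means the sketch supports neither nested try blocks nor multiple threads.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

static jmp_buf handler;               /* the single active "try" context */
static void *volatile thrown = NULL;  /* exception instance in flight */

/* Simulated throw: record the instance, then unwind to the handler. */
static void throw_exception(void *exception)
{
    thrown = exception;
    longjmp(handler, 1);
}

static char divide_by_zero[] = "Divide by zero";  /* assumed instance */

/* Returns a / b, or throws divide_by_zero when b is zero. */
static int divide(int a, int b)
{
    if (b == 0)
        throw_exception(divide_by_zero);
    return a / b;
}

/* Returns a / b, or `fallback` if divide() throws. */
static int safe_divide(int a, int b, int fallback)
{
    if (setjmp(handler) == 0)      /* "try": context saved on every entry */
        return divide(a, b);
    return fallback;               /* "catch": reached through longjmp */
}
```

The setjmp() call runs every time safe_divide is entered, whether or not an exception is ever thrown, which is the per-try cost the paragraph above objects to.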
As of January, 2011, Mr. Keith Reynolds has made a patch that modifies the Tiny C Compiler to support the scheme outlined here. The patch is available from SourceForge under the name tcc-exceptions. If you need a patched Windows version of the Tiny C Compiler, please feel free to write me at firstname.lastname@example.org: The procedure for building a patched Windows version is somewhat difficult, unless you are familiar with Unix tools on Windows.
I have contacted the Tiny C folks and they adamantly refused to incorporate my ideas on the grounds that they are not ISO C99 compliant (despite the Tiny C compiler already having a number of GNU extensions).
I have contacted the OpenWatcom folks and they basically ignored my proposal; a few supporting comments were made but nobody seemed interested in undertaking the project.
I have not yet contacted the GNU folks. If you read this, feel free to forward a pointer to this article to the GNU folks.
If this text is unclear, if you have questions about specific details, or if you have ideas, please feel free to contact me. If you think I am nuts and that the project is insane, feel free to not contact me. I know that the idea of adding (simple) exception handling support to C will make many ask things like "Why not just use C++?" and "Why not write your own compiler?", but look at it from my point of view: I am already programming some assembler modules that make use of this error handling strategy, with great results and very simple code as a result, and would like to write even more code in C and interface this code with my existing assembler code. Existing projects, such as the Linux kernel, could greatly benefit from a simple exception handling mechanism that makes it a breeze to define and handle errors in such a complex software system. Also, C is still widely used in embedded systems development and such systems very often suffer from inadequate error handling and complex code due to the constant problem of handling runtime errors.