23

I'm taking a college course in which one of the labs is to perform buffer overflow exploits on code they give us. This ranges from simple exploits, like changing a function's return address on the stack so it returns to a different function, all the way up to code that changes a program's register/memory state but then returns to the function that you called, so that function is completely oblivious to the exploit.

I did some research into this, and these kinds of exploits are still used pretty much everywhere, in things like running homebrew on the Wii and the untethered jailbreak for iOS 4.3.1.

My question is: why is this problem so difficult to fix? It's obviously one major exploit used to hack hundreds of things, but it seems like it would be easy to fix by truncating any input past the allowed length and sanitizing all input that you take.

EDIT: Another perspective I'd like answers to consider - why don't the creators of C fix these issues by reimplementing the libraries?

ankit
  • 860

9 Answers

35

They did fix the libraries.

Any modern C standard library contains safer variants of strcpy, strcat, sprintf, and so on.

On C99 systems - which includes most Unixes - you will find these under names like strncat and snprintf, the "n" indicating that the function takes an extra argument: the size of a buffer, or a maximum number of elements to copy.

These functions can handle many operations more securely, but in retrospect their usability is not great. For example, some snprintf implementations don't guarantee that the buffer is null-terminated, and strncat takes a number of elements to copy, not the size of the destination buffer - which is what many people mistakenly pass.

On Windows, one often finds strcat_s, sprintf_s, and friends, the "_s" suffix indicating "safe". These too have found their way into the C standard library in C11 (as the optional Annex K), and they provide more control over what happens in the event of an overflow (truncation vs. assert, for example).

Many vendors provide even more non-standard alternatives like asprintf in the GNU libc, which will allocate a buffer of the appropriate size automatically.

The idea that you can "just fix C" is a misunderstanding. Fixing C is not the problem - and has already been done. The problem is fixing decades of C code written by ignorant, tired, or hurried programmers, or code that has been ported from contexts where security didn't matter to contexts where security does. No changes to the standard library can fix this code, although migration to newer compilers and standard libraries can often help identify the problems automatically.

19

It's fair to say that C is "error-prone" by design. Aside from some grievous mistakes like gets, the C language can't really be any other way without losing the primary feature that draws people to C in the first place.

C was designed as a systems language to act as a sort of "portable assembly." A major feature of the C language is that unlike higher-level languages, C code often maps very closely to the actual machine code. In other words, ++i is usually just an inc instruction, and you can often get a general idea of what the processor will be doing at run-time by looking at the C code.

But adding in implicit bounds checking adds a lot of extra overhead - overhead which the programmer didn't ask for and might not want. This overhead goes way beyond the extra storage required to store the length of each array, or the extra instructions to check array bounds on every array access. What about pointer arithmetic? Or what if you have a function that takes in a pointer? The runtime environment has no way of knowing if that pointer falls within the bounds of a legitimately allocated memory block. In order to keep track of this, you'd need some serious runtime architecture that can check each pointer against a table of currently allocated memory blocks, at which point we're already getting into Java/C#-style managed runtime territory.

15

I think the real problem isn't that these kinds of bugs are hard to fix, but that they're so easy to make: if you use strcpy, sprintf, and friends in the (seemingly) simplest way that can work, you've probably opened the door to a buffer overflow. And nobody will notice until someone exploits it (unless you have very good code reviews). Now add the fact that there are many mediocre programmers, and that they're under time pressure most of the time, and you have a recipe for code so riddled with buffer overflows that it'll be hard to fix them all, simply because there are so many of them and they hide so well.

nikie
  • 6,333
7

It's difficult to fix buffer overflows because C provides virtually no useful tools to address the problem. It's a fundamental language flaw that the native buffers provide no protection, that it's virtually, if not completely, impossible to replace them with a superior product (as C++ did with std::vector and std::array), and that buffer overflows are hard to find even in debug mode.

DeadMG
  • 36,914
7

The problem isn't with the C language.

IMO, the single major obstacle to overcome is that C is just plain taught badly. Decades of bad practice and wrong information have been institutionalized in reference manuals and lecture notes, poisoning the minds of each new generation of programmers from the beginning. Students are given a brief description of "easy" I/O functions like gets1 or scanf and then left to their own devices. They aren't told where or how those tools can fail, or how to prevent those failures. They aren't told about using fgets and strtol/strtod, because those are considered "advanced" tools. Then they're unleashed on the professional world to wreak their havoc.

Not that many of the more experienced programmers know any better, because they received the same brain-damaged education. It's maddening. I see so many questions here and on Stack Overflow and on other sites where it's clear that the person asking is being taught by someone who simply doesn't know what they're talking about, and of course you can't just say "your professor is wrong," because he's a Professor and you're just some guy on the Internet.

And then you have the crowd that disdains any answer beginning with, "well, according to the language standard..." because they're working in the real world and according to them the standard doesn't apply to the real world. I can deal with someone who just has a bad education, but anyone who insists on being ignorant is just a blight on the industry.

There would be no buffer overflow problems if the language were taught correctly with an emphasis on writing secure code. It's not "hard", it's not "advanced", it's just being careful.

Yes, this has been a rant.


1 Which, thankfully, has finally been yanked from the language specification, although it will lurk in 40 years' worth of legacy code forever.
John Bode
  • 11,004
5

The problem is as much one of managerial shortsightedness as of programmer incompetence. Remember, a 90,000-line application needs only one insecure operation to be completely insecure. It is almost beyond the realms of possibility that any application written on top of fundamentally insecure string handling will be 100% perfect - which means that it will be insecure.

The problem is that the costs of being insecure are either not charged to the right party (the company selling the app will almost never have to refund the purchase price), or not clearly visible at the time decisions are made ("We have to ship in March no matter what!"). I'm fairly certain that if you factored in long-term costs and the costs to your users rather than just your company's profit, writing in C or related languages would be much more expensive - probably so expensive that it would clearly be the wrong choice in many fields where conventional wisdom nowadays says it's a necessity. But that won't change unless much stricter software liability is introduced, which nobody in the industry wants.

Kilian Foth
  • 110,899
4

One of the great powers of using C is that it lets you manipulate memory in whatever way you see fit.

One of the great weaknesses of using C is that it lets you manipulate memory in whatever way you see fit.

There are safer versions of the unsafe functions. However, neither programmers nor compilers strictly enforce their use.

2

why do the creators of C not fix these issues by reimplementing the libraries?

Probably because C++ already did this, and C++ is largely backward compatible with C code. So if you want a safe string type in your C code, you can use std::string and compile your code with a C++ compiler.

The underlying memory subsystem can help detect buffer overflows by introducing guard blocks and checking their validity - for example, every allocation gets 4 guard bytes of 0xFE appended, and when those bytes are overwritten, the system can raise an error. It's not guaranteed to prevent a bad memory write, but it will show that something has gone wrong and needs to be fixed.

I think the problem is that the old strcpy etc. routines are still present. If they were removed in favour of strncpy etc., that would help.

gbjbaanb
  • 48,749
-2

It is simple to understand why the overflow problem isn't fixed. C was flawed in a couple of areas. At the time, those flaws were seen as tolerable or even as features. Now, decades later, those flaws are unfixable.

Some parts of the programming community don't want those holes plugged. Just look at all the flame wars that start over strings, arrays, pointers, garbage collection...