58

Since we are becoming more and more reliant on computing, including for very critical day-to-day tasks, I was wondering how those vital components are tested.

More technically, how are the compilers and assemblers tested? (I suppose this relates to the halting problem!!)

8 Answers

105

You can't be certain, but you just assume they are, until you discover they are not. There have been plenty of bugs in compilers and hardware over the years.

The way these are tested (a compiler, for example) is that they are very narrowly and rigidly defined, carefully written, and then run against an enormous test suite to verify correctness. Add to that the wide user base of a compiler, and more bugs will be detected and reported. A dentist-appointment scheduling app, by comparison, has many fewer users, and fewer still who are capable of detecting defects.
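At its smallest, a compiler test suite is just a table of inputs paired with expected outputs fixed independently of the implementation. A minimal sketch in Python, using Python's own parser as a stand-in for a real compiler front end (all names here are illustrative):

```python
# Sketch of how a compiler test suite works: every case pairs an input
# program with its expected result, and the suite is run on each build.
# The evaluator below is a stand-in for a real compiler front end.
import ast
import operator

OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
       ast.Mult: operator.mul, ast.Div: operator.truediv}

def evaluate(src: str) -> float:
    """Evaluate an arithmetic expression via Python's own parser."""
    def walk(node):
        if isinstance(node, ast.BinOp):
            return OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.Constant):
            return node.value
        raise ValueError(f"unsupported node: {node!r}")
    return walk(ast.parse(src, mode="eval").body)

# A (tiny) test suite: expected results are fixed up front, so a
# regression in the evaluator is caught immediately.
SUITE = [("1+2*3", 7), ("(1+2)*3", 9), ("6/3*2", 4.0), ("2-3-4", -5)]

def run_suite():
    return [(src, evaluate(src), want)
            for src, want in SUITE if evaluate(src) != want]

print(run_suite())  # an empty list means every case passed
```

Real suites work the same way, just with hundreds of thousands of cases accumulated from specifications and past bug reports.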

SQLite consists of about 73k lines of code, while its test suite consists of about 91,378k lines of code, more than 1,250 times the size of SQLite itself. I expect compilers and other core tools to have similar ratios. Processors today are designed essentially with software, using hardware description languages such as Verilog or VHDL, and those designs have software test suites run against them too, as well as specialized I/O pins for running self-tests at the point of manufacture.

Ultimately it's a probability game, and repeated, broadly covering testing lets you push the probability of defects down to an acceptably low level, the same as any other software project.

whatsisname
  • 27,703
46

In layman's terms:

  1. You cannot.
  2. Compilers and interpreters are unit-tested as any other (professional) software.
  3. A successful test doesn't mean a program is bug-free; it only means no bugs were detected.
  4. A wide user base using the compiler over a long time is a pretty good indicator that it has very few bugs, because users tend to exercise cases the designers didn't think of.
  5. Being open source is also a good indicator. "Given enough eyeballs, all bugs are shallow... Given a large enough beta-tester and co-developer base, almost every problem will be characterized quickly and the fix will be obvious to someone." A closed-source compiler could have bugs that arise only in very specific situations, or that generate less-than-optimal machine code, and the company behind it could simply not disclose their existence and give them a very low priority in the product's road map.

Bottom-line:

I'd say go for OOP (Old, Open and Popular). I just made up that acronym.

Tulains Córdova
  • 39,570
  • 13
  • 100
  • 156
24

It's turtles all the way down.

Nothing is certain. You have no choice but to settle on confidence ratings.

You can think of it as a stack: Math > Physics > Hardware > Firmware > Operating System > Assembler/Compiler/etc

At each level you have tests that you can perform to improve your confidence ratings. Some of these tests have the quality of formal proofs, some of them are based on observation, most are a combination of both.

The tricky part is unravelling the recursion in some of these tests, because we now use programs to do the proofs and observational analysis where it has become too difficult to do them by hand.

Ultimately though the answer is that you try everything you can think of. Static analysis, fuzzing, simulation, running with purposefully selected extreme inputs or random inputs, running/mapping every control path, formal proofs, etc. Basically your aim in testing should always be to do everything possible to prove that your product (e.g. theory/chip/program) doesn't work as intended. If you make a genuine effort and still fail then you're allowed to improve your confidence rating in your product's correctness.
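One of the techniques listed above, running with random inputs, is often combined with a reference implementation into differential testing: two independent implementations of the same specification are fed the same inputs, and any disagreement is a bug in at least one of them. A minimal sketch in Python, using integer square root as the stand-in "product" (the function names are illustrative):

```python
# Sketch of differential fuzzing: feed random inputs to two independent
# implementations of the same spec and flag any disagreement.
import math
import random

def isqrt_newton(n: int) -> int:
    """Integer square root by Newton's method (implementation under test)."""
    if n < 2:
        return n
    x = n
    y = (x + 1) // 2
    while y < x:
        x = y
        y = (x + n // x) // 2
    return x

def fuzz(trials: int = 10_000, seed: int = 0) -> list:
    """Compare against math.isqrt (reference) on random inputs."""
    rng = random.Random(seed)
    return [n for n in (rng.randrange(0, 10**12) for _ in range(trials))
            if isqrt_newton(n) != math.isqrt(n)]

print(len(fuzz()))  # zero disagreements raises confidence, proves nothing
```

Real compiler fuzzers (generating random programs and comparing compilers, or the same compiler at different optimization levels) apply exactly this idea at a much larger scale.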

Testing is at best a semidecision process, meaning that given there is a bug, you will eventually find it, but you can never be sure you've found them all. Even with formally verified software you're still relying on physics, the tools used to do the formal proofs, and on the thing you proved being necessary and sufficient for your program to do what is (often subjectively) "intended". That's not to mention all the other components you're using that don't have formal proofs.

17

This is a "dangerous" question for new developers in that they'll start blaming their tools instead of their code (been there, done that, seen too many do it). Although there are bugs in compilers, runtime environments, OS, etc., developers should be realistic and remember that, until there is evidence and unit tests demonstrating otherwise, the bug is in your code.

In 25+ years of programming in mostly C, C++, and Java I have found:

  • two bugs due to a compiler bug (gcc and SunOS C)
  • about once every year or two a bug due to a Java JVM problem (usually related to memory consumption/garbage collection)
  • about once every month or two a bug in a library, which frequently is fixed by using the latest version or reverting to the prior version of the library

All of the other bugs were directly caused by a bug in my own code or, more frequently, by a lack of understanding of how a library works. Sometimes what seems to be a bug is really an incompatibility, for instance when the Java class structure changed in a way that broke some AOP libraries.

Ed Griebel
  • 329
  • 1
  • 8
8

I think an interesting point here is that the licences of the vast majority of commercial software (and indeed open-source software) explicitly state that you cannot trust the software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.

From Microsoft Word Licence agreement

Except for the Limited Warranty and to the maximum extent permitted by applicable law, Microsoft and its suppliers provide the Software and support services (if any) AS IS AND WITH ALL FAULTS, and hereby disclaim all other warranties and conditions, whether express, implied or statutory, including, but not limited to, any (if any) implied warranties, duties or conditions of merchantability, of fitness for a particular purpose, of reliability or availability, of accuracy or completeness of responses, of results, of workmanlike effort, of lack of viruses, and of lack of negligence, all with regard to the Software, and the provision of or failure to provide support or other services, information, software, and related content through the Software or otherwise arising out of the use of the Software.

In essence, this clause in the licence of almost every piece of software you use specifically tells you that you cannot trust the software, let alone the compiler used to build it.

Software is like a scientific theory: it is deemed to work as specified until it doesn't.

Toby Allen
  • 387
  • 1
  • 11
2

As a compiler writer for a math language*, from my experience I can say that in theory you cannot. And some bugs just give wrong results, like (from my shame list) evaluating 6/3*2 from the right as 6/(3*2) and outputting 1 instead of 4, without crashing or giving nonsensical compile errors.
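The associativity slip described here is easy to reproduce: folding the operand list from the wrong end silently changes the result, with no crash and no error message. A small illustrative sketch in Python (the fold_left/fold_right names are made up):

```python
# Reproducing the associativity bug: reducing "6/3*2" from the right
# instead of the left gives a wrong but plausible-looking answer.

def fold_left(nums, ops):
    """Correct left-to-right evaluation: ((6 / 3) * 2)."""
    acc = nums[0]
    for op, n in zip(ops, nums[1:]):
        acc = acc / n if op == '/' else acc * n
    return acc

def fold_right(nums, ops):
    """Buggy right-to-left evaluation: (6 / (3 * 2))."""
    acc = nums[-1]
    for op, n in zip(reversed(ops), reversed(nums[:-1])):
        acc = n / acc if op == '/' else n * acc
    return acc

print(fold_left([6, 3, 2], ['/', '*']))   # 4.0  (correct)
print(fold_right([6, 3, 2], ['/', '*']))  # 1.0  (the bug)
```

Both functions run cleanly, which is exactly what makes this class of bug dangerous: only a test with a known expected value catches it.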

But IMHO many compilers do not have as many bugs as other software because:

  • Writing unit tests is easy. Each statement is a unit, and you can write tests as simple as: test_unit("2+(-2)*(-2+1)*3+1", 9);
  • A program is a combination of statements, and for a program to output the correct result every individual statement must (mostly) give the correct result. So it is very unlikely for bugs to remain undetected while programs keep giving correct results.
  • As the size and number of written programs increase, the likelihood of catching bugs dramatically increases.

For assemblers, machine instructions, etc., the above also holds; moreover, verification and validation in chip design and production follow much stricter processes, since it is a huge business: see Electronic design automation.

Before going to production, each CPU design should be tested rigorously, because each bug costs nearly a couple of million dollars: chip production has huge non-recurring costs. So companies spend a lot of money and write a lot of simulation code for their designs before going to production, although even this does not give a 100% guarantee; the Pentium FDIV bug is a famous example.

In short, it is very unlikely for serious bugs to survive in compilers, machine code, etc.

My humble math language *

psmears
  • 196
  • 4
Gorkem
  • 137
  • 4
0

Flawless? They're not. I recently installed some "updates", and it took months (and several reprogrammed sections of code) before my ASP.NET site was working properly again, due to unexplained changes in how various basic things worked or failed.

However, they are tested and then used by many very smart, detail-oriented people, who tend to notice, report, and fix most things. Stack Exchange is a great example of (and improvement on) how all the people using those tools help test and analyze how these amazingly complex and low-level tools work, at least as far as practical use goes.

But flawless, no. Although you can also see people on Stack Exchange gaining impressive insight into performance details and standards compliance and quirks, there are always flaws and imperfections, especially when different people have different opinions of what a flaw is.

Dronz
  • 117
-1

To show that the underlying systems are flawless you either

a) Prove they are flawless

  1. A mathematical proof
  2. Only realistically possible for trivial programs

b) Make an exhaustive test

  1. Only possible for trivial programs and some simple programs
  2. As soon as a timing element enters the test, an exhaustive test becomes impossible, since time can be divided indefinitely.
  3. Beyond trivial programs, the number of possible execution paths explodes exponentially.

In software testing, exhaustive testing is used only in unit tests of some simple functions.

Example: You want to test an 8-character UTF-8 input to some field. You choose to cut the input at 8 times the maximum UTF-8 encoding length of 6 bytes (the original UTF-8 design; RFC 3629 later capped it at 4), which gives 8*6 = 48 bytes, so that there is a finite number of possibilities.

You might now think you only need to test the 1,112,064 valid code points for each of the 8 characters, i.e. 1,112,064^8 (roughly 10^48) tests, which is already infeasible. But you actually have to test every value of each of the 48 bytes, i.e. 256^48, which is around 10^116: on the order of the game-tree complexity of chess (about 10^120) and vastly more than the roughly 10^80 atoms in the observable universe.
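The arithmetic can be checked directly with arbitrary-precision integers (the code-point count and the 48-byte bound are the answer's own figures; note the raw-byte count comes out near 10^116, the same ballpark as the chess comparison):

```python
# Checking the combinatorics from the answer: the number of inputs for a
# truly exhaustive test grows far beyond anything that could ever be run.
import math

code_points = 1_112_064   # valid Unicode code points
chars = 8
byte_bound = 48           # 8 characters * 6 bytes (the answer's figure)

by_code_point = code_points ** chars
by_raw_bytes = 256 ** byte_bound

# Rounded orders of magnitude for each counting method.
print(f"code-point combinations ~ 10^{round(math.log10(by_code_point))}")
print(f"raw-byte combinations   ~ 10^{round(math.log10(by_raw_bytes))}")
```

At even a billion tests per second, 10^48 tests would take about 10^31 years, so exhaustive testing here is ruled out long before the byte-level count makes it worse.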

Instead you can use the following, in increasing order of effort (each test should cover all the previous ones):

a) test a good and a bad sample.

b) code coverage, i.e. try to execute every line of code, which is relatively simple for most code. Now you can wonder about the last 1% of the code you can't reach ... bugs, dead code, hardware exceptions etc.

c) path coverage: all outcomes of all branches are tested in all combinations. Now you know why the test department hates you when your function contains more than 10 conditions. Again you wonder why the last 1% can't be tested ... some branches are dependent on previous branches.

d) data test: test a number of samples with boundary values, common problematic values, and magic numbers: zero, -1, 1, min ±1, max ±1, 42, random values. If this doesn't give you path coverage, you know you haven't caught all the values in your analysis.
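The value table described in (d) can be generated mechanically. An illustrative sketch for a signed 8-bit input (the concrete range, values, and function names are assumptions for the example, not from the answer):

```python
# Boundary-value test data as described in (d): minimum/maximum plus and
# minus one, zero, +/-1, and a magic number, for a signed 8-bit range.
INT8_MIN, INT8_MAX = -128, 127

def boundary_values(lo: int, hi: int) -> list:
    """Classic boundary-value candidates clipped to the given range."""
    candidates = {lo, lo + 1, -1, 0, 1, 42, hi - 1, hi}
    return sorted(v for v in candidates if lo <= v <= hi)

def clamp(v: int) -> int:
    """Toy function under test: clamp a value into the int8 range."""
    return max(INT8_MIN, min(INT8_MAX, v))

# Widen the range by one on each side to include out-of-range probes.
for v in boundary_values(INT8_MIN - 1, INT8_MAX + 1):
    assert INT8_MIN <= clamp(v) <= INT8_MAX

print(boundary_values(INT8_MIN, INT8_MAX))
```

The point of the technique is that off-by-one and overflow bugs cluster at exactly these values, so a handful of samples buys a large fraction of the defect-finding power of the infeasible exhaustive test.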

If you already do all this, you should be ready for the ISTQB Foundation exam.

Surt
  • 123