2009年3月9日星期一

Test Tech - How SQLite Is Tested

http://sqlite.org/testing.html

1.0 Introduction

The reliability and robustness of SQLite is achieved in large part by thorough and careful testing.

As of version 3.6.11 (all statistics in the report are against that release of SQLite), the SQLite library consists of approximately 62.2 KSLOC of C code. (KSLOC means thousands of "Source Lines Of Code" or, in other words, lines of code excluding blank lines and comments.) By comparison, the project has 716 times as much test code and test scripts - 44568.6 KSLOC.

2.0 Test Harnesses

There are three independent test harnesses used for testing the core SQLite library. Each test harness is designed, maintained, and managed separately from the others.

  1. The TCL Tests are the oldest and most complete set of tests for SQLite. The TCL tests are contained in the same source tree as the SQLite core and like the SQLite core are in the public domain. The TCL tests are the primary tests used during development. The TCL tests are written using the TCL scripting language. The TCL test harness itself consists of 15.7 KSLOC of C code use to create the TCL interface. The test scripts are contained in 456 files totaling 7.6MB in size. There are 23813 distinct test cases, but many of the test cases are parameterized and run multiple times (with different parameters) so that on a full test run, about 1.1 million separate tests are performed.

  2. The TH3 test harness is a set of proprietary tests, written in C. The impetus for TH3 was the need to have a set of tests that ran on embedded and specialized platforms that would not easily support TCL or other workstation services. TH3 tests use only the published SQLite interfaces. These tests are free to SQLite Consortium members and are available by license to others. TH3 consists of about 2.4 MB or 34.0 KSLOC of C code implementing 7014 distinct test cases. TH3 tests are heavily parameterized, though, so a full test runs about 2.3 million different test instances.

  3. The SQL Logic Test or SLT test harness is used to run huge numbers of SQL statements against both SQLite and several other SQL database engines and verify that they all get the same answers. SLT currently compares SQLite against PostgreSQL, MySQL, and Microsoft SQL Server. SLT runs 5.8 million queries comprising 1.10GB of test data.

All of the tests above must run successfully, on multiple platforms and under multiple compile-time configurations, before each release of SQLite.

Prior to each check-in to the SQLite source tree, developers typically run a subset (called "veryquick") of the Tcl tests consisting of about 40.6 thousand test cases and covering 96.94% of the core SQLite source code. The veryquick tests covers everything except the anomaly, fuzz, and soak tests. The idea behind the veryquick tests are that they are sufficient to catch most errors, but also run in only a few minutes instead of a few hours.

3.0 Anomaly Testing

Anomaly tests are tests designed to verify the correct behavior of SQLite when something goes wrong. It is (relatively) easy to build an SQL database engine that behaves correctly on well-formed inputs on a fully functional computer. It is more difficult to build a system that responses sanely to invalid inputs and continues to function following system malfunctions. The anomaly tests are designed to verify the latter behavior.

3.1 Out-Of-Memory Testing

SQLite, like all SQL database engines, makes extensive use of malloc() (See the separate report on dynamic memory allocation in SQLite for additional detail.) On workstations, malloc() never fails in practice and so correct handling of out-of-memory (OOM) errors is not particularly important. But on embedded devices, OOM errors are frightenly common and since SQLite is frequently used on embedded devices, it is important that SQLite be able to gracefully handle OOM errors.

OOM testing is accomplished by simulating OOM errors. SQLite allows an application to substitute an alternative malloc() implementation using the sqlite3_config(SQLITE_CONFIG_MALLOC,...) interface. The TCL and TH3 test harnesses are both capable of inserting a modified version of malloc() that can be rigged fail after a certain number of allocations. These instrumented mallocs can be set to fail only once and then start working again, or to continue failing after the first failure. OOM tests are done in a loop. On the first iteration of the loop, the instrumented malloc is rigged to fail on the first allocation. Then some SQLite operation is carried out and checks or done to make sure SQLite handled the OOM error correctly. Then the time-to-failure counter on the instrumented malloc is increased by one and the test is repeated. The loop continues until the entire operation runs to completion without ever encountering a simulated OOM failure. Tests like this are run twice, once with the instrumented malloc set to fail only once, and again with the instrumented malloc set to fail continuously after the first failure.

3.2 I/O Error Testing

I/O error testing seeks to verify that SQLite responds sanely to failed I/O operations. I/O errors might result from a full disk drive, malfunctioning disk hardware, network outages when using a network file system, system configuration or permission changes that occur in the middle of an SQL operation, or other hardware or operating system malfunctions. Whatever the cause, it is important that SQLite be able to respond correctly to these errors and I/O error testing seeks to verify that it does.

I/O error testing is similar in concept to OOM testing; I/O errors are simulated and checks are made to verify that SQLite responds correctly to the simulated errors. I/O errors are simulated in both the TCL and TH3 test harnesses by inserting a new Virtual File System object that is specially rigged to simulate an I/O error after a set number of I/O operations. As with OOM error testing, the I/O error simulators can be set to fail just once, or to fail continuously after the first failure. Tests are run in a loop, slowly increasing the point of failure until the test case runs to completion without error. The loop is run twice, once with the I/O error simulator set to simulate only a single failure and a second time with it set to fail all I/O operations after the first failure.

In I/O error tests, after the I/O error simulation failure mechanism is disabled, the database is examined using PRAGMA integrity_check to make sure that the I/O error has not introduced database corruption.

3.2 Crash Testing

Crash testing seeks to demonstrate that an SQLite database will not go corrupt if the application or operating system crashes or if there is a power failure in the middle of a database update. A separate white-paper titled Atomic Commit in SQLite describes the defensive measure SQLite takes to prevent database corruption following a crash. Crash tests strive to verify that those defensive measures are working correctly.

It is impractical to do crash testing using real power failures, of course, and so crash testing is done in simulation. An alternative Virtual File System is inserted that allows the test harness to simulate the state of the database file following a crash.

In the TCL test harness, the crash simulation is done in a separate process. The main testing process spawns a child process which runs some SQLite operation and randomly crashes somewhere in the middle of a write operation. A special VFS randomly reorders and corrupts the unsynchronized write operations to simulate the effect of buffered filesystems. After the child dies, the original test process opens and reads the test database and verifies that the changes attempted by the child either completed successfully or else were completely rolled back. The integrity_check PRAGMA is used to make sure no database corruption occurs.

The TH3 test harness needs to run on embedded systems that do not necessarily have the ability to spawn child processes, so it uses an in-memory VFS to simulate crashes. The in-memory VFS can be rigged to make snapshot of the entire filesystem after a set number of I/O operations. Crash tests run in a loop. On each iteration of the loop, the point at which a snapshot is made is advanced until the SQLite operations being tested run to completion without ever hitting a snapshot. Within the loop, after the SQLite operation under test has completed, the filesystem is reverted to the snapshot and random file damage is introduced that is characteristic of the kinds of damage one expects to see following a power loss. Then the database is opened and checks are made to ensure that it is well-formed and that the transaction either ran to completion or was completely rolled back. The interior of the loop is repeated multiple times for each snapshot with different random damage each time.

4.0 Fuzz Testing

Fuzz testing seeks to establish that SQLite response correctly to invalid, out-of-range, or malformed inputs.

4.1 SQL Fuzz

SQL fuzz testing consists of creating syntactically correct yet wildly nonsensical SQL statements and feeding them to SQLite to see what it will do with them. Usually some kind of error is returned (such as "no such table"). Sometimes, purely by chance, the SQL statement also happens to be semantically correct. In that case, the resulting prepared statement is run to make sure it gives a reasonable result.

The SQL fuzz generator tests are part of the TCL test suite. During a full test run, about 105.2 thousand fuzz SQL statements are generated and tested.

4.2 Malformed Database Files

There are numerous test cases that verify that SQLite is able to deal with malformed database files. These tests first build a well-formed database file, then add corruption by changing one or more bytes in the file by some means other than SQLite. Then SQLite is used to read the database. In some cases, the bytes changes are in the middle of data files. This causes the content to change, but does not otherwise impact the operation of SQLite. In other cases, unused bytes of the file are modified. The interesting cases are when bytes of the file that define database structure get changed. The malformed database tests verify that SQLite finds the file format errors and reports them using the SQLITE_CORRUPT return code without overflowing buffers, dereferencing NULL pointers, or performing other unwholesome actions.

4.3 Boundary Value Tests

SQLite defines certain limits on its operation, such as the maximum number of columns in a table, the maximum length of an SQL statement, or the maximum value of an integer. The TCL test suite contains numerous tests that push SQLite right to the edge of its defined limits and verify that it performs correctly for all allowed values. Additional tests go beyond the defined limits and verify that SQLite correctly returns errors.

5.0 Regression Testing

Whenever a bug is reported against SQLite, that bug is not considered fixed until new test cases have been added to the TCL test suite which would exhibit the bug in an unpatched version of SQLite. Over the years, this has resulted in thousands and thousands of new tests being added to the TCL test suite. These regression tests insure that bugs that have been fixed in the past are never reintroduced into future versions of SQLite.

6.0 Automatic Resource Leak Detection

Resource leak occurs when system resources are allocated and never freed. The most troublesome resource leaks in many applications are memory leaks - when memory is allocated using malloc() but never released using free(). But other kinds of resources can also be linked: file descriptors, threads, mutexes, etc.

Both the TCL and TH3 test harnesses automatically track system resources and report resources leaks on every test run. No special configuration or setup is required. The test harnesses are especially vigilant with regard to memory leaks. If a change causes a memory leak, the test harnesses will recognize this quickly. SQLite is designed to never leak memory, even after an exception such as an OOM error or disk I/O error. The test harnesses are zealous to enforce this.

7.0 Test Coverage

The gcov utility is used to measure the "test coverage" of the SQLite test suite. SQLite strives for but does not yet obtain 100% test coverage. A major goal of the SQLite project is to obtain 100% branch coverage during 2009.

Test coverage can be measured in several ways. "Statement coverage" measures (as a percentage of the whole) how many lines of code are exercised by the test cases. The TCL test suite obtains 99.37% statement coverage on the SQLite core. (The SQLite core, in this case, excludes the operating-system dependent VFS backends.) "Branch" coverage measures (again, as a percentage of the whole) how many machine-code branch instructions are taken at least once in both directions. The TCL test suite obtains 95.16% branch coverage.

To illustrate the difference between statement coverage and branch coverage, consider the following hypothetical line of C code:

if( a>b && c!=25 ){ d++; }

Such a line of C code might generate a dozen separate machine code instructions. If any one of those instructions is ever evaluated, then we say that the statement has been tested. So, for example, it might be the case that the conditional expression is always false and the "d" variable is never incremented. Even so, statement coverage counts this line of code as having been tested.

Branch coverage is more strict. With branch coverage, each test and each subblock within the statement is considered separately. In order to achieve 100% branch coverage in the example above, there must be at least three test cases:

  • a<=b
  • a>b && c==25
  • a>b && c!=25

Branch test coverage is normally less than statement coverage since a C program will typically contain some defensive tests which in practice are always true or always false. For testing purposes, the SQLite source code defines macros called ALWAYS() and NEVER(). The ALWAYS() macro surrounds conditions which are expected to always evaluated to true and NEVER() surrounds conditions that are always evaluate to false. These macros serve as comments to indicate that the conditions are defensive code. For standard builds, these macros are pass-throughs:

#define ALWAYS(X)  (X)
#define NEVER(X) (X)

During most testing, however, these macros will throw an assertion fault if their argument does not have the expected truth value. This alerts the developers quickly to incorrect design assumptions.

#define ALWAYS(X)  ((X)?1:assert(0),0)
#define NEVER(X) ((X)?assert(0),1:0)

When measuring test coverage, these macros are defined to be constant truth values so that they do not generate assembly language branch instructions, and hence do not come into play when calculating the branch coverage level:

#define ALWAYS(X)  (1)
#define NEVER(X) (0)

Another macro used in conjuction with test coverage measurement is the testcase() macro. The argument is a condition for which we want test cases that evaluate to both true and false. In non-coverage builds (that is to so, in release builds) the testcase() macro is a no-op:

#define testcase(X)

But in a coverage measuring build, the testcase() macro generates code that evaluates the conditional expression in its argument. Then during analysis, a check is made to insure tests exists that evaluate the conditional to both true and false. Testcase() macros are used, for example, to help verify that boundary values are tested. For example:

testcase( a==b );
testcase( a==b+1 );
if( a>b && c!=25 ){ d++; }

Testcase macros are also used when two or more cases of a switch statement go to the same block of code, to make sure that the code was reached for all cases:

switch( op ){
case OP_Add:
case OP_Subtract: {
testcase( op==OP_Add );
testcase( op==OP_Subtract );
/* ... */
break;
}
/* ... */
}

For bitmask tests, testcase() macros are used to verify that every bit of the bitmask effects the test. For example, in the following block of code, the condition is true if the mask contains either of two bits indicating either a MAIN_DB or a TEMP_DB is being opened. The testcase() macros that preceed the if statement verify that both cases are tested:

testcase( mask & SQLITE_OPEN_MAIN_DB );
testcase( mask & SQLITE_OPEN_TEMP_DB );
if( (mask & (SQLITE_OPEN_MAIN_DB|SQLITE_OPEN_TEMP_DB))!=0 ){ ... }

The developers of SQLite have found that coverage testing is an extremely productive method for finding bugs. Because such a high percentage of SQLite core code is covered by test cases, the developers can have confident that changes they make in one part of the code do not have unintended consequences in other parts of the code. It would not be possible to maintain the quality of SQLite without coverage testing.

8.0 Dynamic Analysis

Dynamic analysis refers to internal and external checks on the SQLite code which are performed while the code is live and running. Dynamic analysis has proven to be a great help in maintaining the quality of SQLite.

8.1 Assert

The SQLite core contains 2442 assert() statements that verify function preconditions and postconditions and loop invariants. Assert() is a macro which is a standard part of ANSI-C. The argument is a boolean value that is assumed to always be true. If the assertion is false, the program prints an error message and halts.

Assert() macros are disabled by compiling with the NDEBUG macro defined. In most systems, asserts are enabled by default. But in SQLite, the asserts are so numerous and are in such performance critical places, that the database engine runs about three times slower when asserts are enabled. Hence, the default (production) build of SQLite disables asserts. Assert statements are only enabled when SQLite is compiled with the SQLITE_DEBUG preprocessor macro defined.

8.2 Valgrind

Valgrind is perhaps the most amazing and useful developer tool in the world. Valgrind is a simulator - it simulates an x86 running a linux binary. (Ports of valgrind for platforms other than linux are in development, but as of this writing, valgrind only works reliably on linux, which in the opinion of the SQLite developers means that linux should be preferred platform for all software development.) As valgrind runs a linux binary, it looks for all kinds of interesting errors such as array overruns, reading from uninitialized memory, stack overflows, memory leaks, and so forth. Valgrind finds problems that can easily slip through all of the other tests run against SQLite. And, when valgrind does find an error, it can dump the developer directly into a symbolic debugger at the exact point where the error occur, to facilitate a quick fix.

Because it is a simulator, running a binary in valgrind is slower than running it on native hardware. So it is impractical to run the full SQLite test suite through valgrind. However, the veryquick tests and a subset of the TH3 tests are run through valgrind prior to every release.

8.3 Memsys2

SQLite contains a pluggable memory allocation subsystem. The default implementation uses system malloc() and free(). However, if SQLite is compiled with SQLITE_MEMDEBUG, an alternative memory allocation wrapper (memsys2) is inserted that looks for memory allocation errors at run-time. The memsys2 wrapper checks for memory leaks, of course, but also looks for buffer overruns, uses of uninitialized memory, and attempts to use memory after it has been freed. These same checks are also done by valgrind (and, indeed, valgrind does them better) but memsys2 has the advantage of being much faster than valgrind, which means the checks can be done more often and for longer tests.

8.4 Mutex Asserts

SQLite contains a pluggable mutex subsystem. Depending on compile-time options, the default mutex system contains interfaces sqlite3_mutex_held() and sqlite3_mutex_notheld() that detect whether or not a particular mutex is held by the calling thread. These two interfaces are used extensively within assert() statements in SQLite to verify mutexes are held and released at all the right moments, in order to double-check that SQLite does work correctly in multi-threaded applications.

8.5 Journal Tests

One of the things that SQLite does to insure that transactions are atomic across system crashes and power failures is to write all changes into the rollback journal file prior to changing the database. The TCL test harness contains an alternative Virtual File System implementation that helps to verify this is occurring correctly. The "journal-test VFS" monitors all disk I/O traffic between the database file and rollback journal, checking to make sure that nothing is written into the database file which has not first by written and synced to the rollback journal. If any discrepancies are found, an assertion fault is raised.

The journal tests are an additional double-check over and above the crash tests to make sure that SQLite transactions will be atomic across system crashes and power failures.

9.0 Static Analysis

Static analysis means analyzing code at or before compile-time to check for correctness. Static analysis consists mostly of making sure SQLite compiles without warnings, even when all warnings are enabled. SQLite is developed primarily using GCC and it does compile without warnings on GCC using the -Wall and -Wextra flags. There are occasional reports of warnings coming from VC++, however.

Static analysis has not proven to be helpful in finding bugs. We cannot call to mind a single problem in SQLite that was detected by static analysis that was not first seen by one of the other testing methods described above. On the other hand, we have on occasion introduced new bugs in our efforts to get SQLite to compile without warnings.

Our experience, then, is that static analysis is counter-productive to quality. In other words, focusing on static analysis (being concerned with compiler warnings) actually reduces the quality of the code. Nevertheless, we developers have capitulated to pressure from users and actively work to eliminate compiler warnings. We are willing to do this because the other tests described above do an excellent job of finding the bugs that are often introduced when removing compiler warnings, so that product quality is probably not decreased as a result.

10.0 Summary

SQLite is open source. That give many people with the idea that it is not well tested and is perhaps unreliable. But that impression is false. SQLite has exhibited very high reliability in the field and a very low defect rate, especially considering how rapidly it is evolving. The quality of SQLite is achieved in part by careful code design and implementation. But extensive testing also plays a vital role in maintaining and improving the quality of SQLite. This document has summarized the testing procedures that every release of SQLite undergoes with the hopes of inspiring the reader to understand that SQLite is suitable for use in mission-critical applications.

没有评论:

发表评论