Monday, March 9, 2009

Test Tech - Common Software Testing Models: V, W, H, X

Common Software Testing Models: V, W, H, X

The V-Model

In the V-model, the testing process is appended to the second half of the development process, as shown below:




Unit testing checks whether the code under development conforms to the detailed design. Integration testing checks whether the previously tested components combine properly. System testing checks whether the integrated product meets the system specification. Acceptance testing checks whether the product meets the end users' needs.



As for test design, it is clear that V-model users tend to treat test execution and test design separately. Once the development documents are ready, the corresponding test design can begin. As the following figure shows, each test-design activity is overlaid on the corresponding development phase:




The V-model with test design overlaid on the development process



The V-model has an appealingly symmetric shape, and it has led many people astray. This article focuses on the problems it causes in unit testing and integration testing.



For illustration, the following figure shows a single unit together with a group of units, which we call a subsystem.




A hypothetical subsystem



There has been much debate about how big a unit should be: is a unit a single function, a class, or a collection of related classes? Those debates do not affect the point made here. Let us simply say that a unit is the smallest chunk of code that developers can discuss in isolation.


The V-model says that each unit should be tested first. Once all the units in a subsystem have been tested, they are combined and tested together to verify that they form a working whole.



So how do we test a unit? We examine the interface definitions in the detailed design, or the source code, or both; derive inputs that satisfy the criteria of some test design; feed them in; and check whether the results are correct. Because units generally cannot run on their own, we must also build stubs and drivers, as shown below.


A unit with its external driver and stubs



The arrows in the figure represent the execution paths of the tests. This is what most people mean by "unit testing". Sometimes this approach is a poor one.



The same inputs could instead be supplied by other units in the same subsystem, which then act as both stubs and drivers, as shown below:





Test execution paths among the units inside a subsystem



Which approach to choose is a trade-off. How much does it cost to build the stubs and drivers? How will they be maintained? Might the subsystem mask some faults? How hard is it to debug across the whole subsystem? If our testing does not really begin until integration, some bugs will be found late. How does the resulting cost compare with the cost of building stubs and drivers?


The V-model never considers these questions: run unit tests when a unit is finished, run integration tests when the subsystem is assembled, and that is all. To my surprise and dismay, people never make these trade-offs; they are captive to their model.



A useful model should therefore allow testers to consider saving effort by deferring tests.



A test intended to find bugs in a particular unit is best run with the unit in isolation, supported by dedicated stubs and drivers. The alternative is to test the unit as part of its subsystem, with tests designed mainly to find integration problems. Since a subsystem itself also needs stubs and drivers to simulate its connections to other subsystems, both unit testing and integration testing may be deferred until the whole system is at least partially integrated. In that case the tester may exercise units, integrations, and the system all at once through the product's external interfaces. Again, the main goal is to reduce the total lifecycle cost, weighing test cost against the cost of bugs found late because testing was deferred. On this view, the distinctions between "unit testing", "integration testing", and "system testing" are greatly weakened. The result can be seen in the following figure:




A new approach: unit and integration testing deferred at some stages



The box on the right of the figure above would be better labeled "run whatever tests are appropriate and collect the results".



What about the left side of the figure? Consider system test design, whose main basis and source of information is the specification. Suppose you know that two units in a particular subsystem cooperate at run time to implement one particular statement in the specification. Why not test that statement as soon as the subsystem is integrated, just as test design begins as soon as the design is complete? If executing that statement involves nothing outside the subsystem, why wait until the whole system is finished? Isn't it true that the earlier a bug is found, the cheaper it is to fix?



In the previous figure the arrows pointed upward (more effective, but later in time). They can also point downward (earlier in time):




A new approach: test design moved earlier at various stages



In this case, the box on the left would be better labeled "whatever test design can be done with the information available now". When a test design is derived from the description of a single component of the system, the model must allow that test to be run before the component is assembled. I admit my pictures are ugly, with arrows pointing everywhere. Two remarks on that:


1. Our business is not creating beauty; it is finding as many serious bugs as possible at the lowest possible cost.


2. Part of the ugliness comes from having to work in a fixed order: developers produce the system description documents first, and the tests then attach to those documents. The documents are like sturdy old oaks; the test designs are thin vines wrapped around them. If we organized things on different principles the picture might look better, but the complexity would remain, because the problem we are discussing is inherently complex.


The V-model fails because it divides system development into phases with rigid boundaries, making it hard to gather the information testing needs across those boundaries. Some tests should run earlier; others should be deferred. The model also prevents you from combining information drawn from different stages of the system description. For example, some organizations require finished work to be signed off, and extend that rule to system test design. Sign-off means the test design has been reviewed and is considered complete; it will not be revised unless the corresponding design document changes. But if information relevant to those tests is later rediscovered, for example the architecture shows that some tests are redundant, or the detailed design reveals an internal boundary that could be tested together with existing system tests, then the original system test design really does need further adjustment.



A model must therefore allow individual test designs to draw on combined information from different sources, and allow tests to be redesigned when new sources of information appear.




As discussed above, the V-model's limitation is that it says nothing explicit about early testing and so cannot embody the principle of testing early and continuously. Adding test activities that run in step with each development phase evolves the V-model into the W-model. In the model it is easy to see that development forms one "V" and testing forms a parallel "V". Following the principle of early and continuous testing, the test activities in the requirements and design phases should follow the principles of IEEE 1012-1998, Software Verification and Validation (V&V).

The W-model

Evolutif proposed that the W-model is more scientific than the V-model. The W-model develops the V-model by stressing that testing accompanies the entire software development lifecycle, and that the objects of testing are not only the program but also the requirements, functions, and design. Testing proceeds in step with development, which helps problems be found earlier.



The W-model has limitations too. Like the V-model, it treats software development as a serial sequence of activities, requirements, design, coding, and so on, and cannot support iteration, spontaneity, or change.


Neither of the two process models above captures the full testing workflow well. To address this, the H-model was proposed. It separates testing out into a fully independent workflow of its own, making test preparation and test execution explicit.




The H-model of software testing



The diagram shows one testing micro-cycle at some level during the overall production cycle. The other flows in the figure can be any development flow, such as the design flow or the coding flow, or a non-development flow such as an SQA flow, or even the testing flow itself. As soon as the test conditions are ripe and test preparation is complete, test execution can begin.

The H-model reveals that:


· Software testing is not just test execution; it includes many other activities

· Software testing is an independent workflow that runs through the entire product lifecycle, concurrently with other workflows

· Software testing should be prepared early and executed early

· Software testing proceeds in layers according to what is being tested; tests at different layers may run in a particular order, or may be repeated



In the H-model, testing is an independent workflow that runs through the entire product cycle, concurrently with the other workflows. When a test readiness point is reached, testing moves from the preparation stage into the execution stage.


The basic ideas behind the X-model come from Marick, although Marick himself did not advocate building a replacement model. Robin F. Goldsmith took some of Marick's ideas, reorganized them, and produced the "X-model". The name was not chosen to pair with the V-model; rather, X usually stands for the unknown, and Marick felt his ideas were not enough to support a complete description of a model, though they contained the main ingredients a model needs, including highlights such as exploratory testing.




The X-model



Marick's main criticism of the V-model is that it cannot guide the whole of a project. He argued that a model must handle all aspects of development, including handoffs, frequent and repeated integration, and the absence of requirements documents.



Marick held that a model should not prescribe behavior inconsistent with currently accepted practice. The left side of the X-model describes the separate coding and testing of individual program pieces; these are then handed off frequently and integrated into executable programs (upper right), which themselves must be tested. Builds that pass integration testing can be sealed and delivered to users, or become part of integration at a larger scale. The multiple parallel curves indicate that change can occur in any part.

As the figure shows, the X-model also locates exploratory testing (lower right). This is unplanned, special-purpose testing of the "what happens if I try this?" kind, which often helps experienced testers find more bugs beyond the test plan.

Focusing on such low-level behavior, however, invites debate. A model is different from an individual project plan; a model should not spell out every project's details, but should guide and support projects. Of course, a code handoff can simply be regarded as a form of integration, and the V-model does not actually limit how many build cycles occur.

Marick and Graham both agree that tests should be designed before they are executed. Marick advises: "Design tests when you have the relevant knowledge; run tests when you have the deliverables in hand." The X-model includes test-design steps, just as it includes steps for using different test tools, which the V-model does not. But Marick's own examples suggest that the X-model is not really a model in this sense either; instead, the test-design steps should be available for use at any time.

Marick also questioned the V-model because it rests on a set of development steps that must be followed in a strict order, which quite possibly does not reflect actual practice.

Even though many projects lack adequate requirements, the V-model still begins with requirements. The V-model tells us to test what each development phase has produced, but it does not say how much must be produced. With no requirements at all, do developers know what they are building? I argue that the X-model, like other models, needs sufficient requirements for at least one release. A project can muddle through without a model, but an effective model encourages the adoption of many good practices. So one of the V-model's strengths is its explicit role for requirements; the X-model lacks this, which is probably one of its weaknesses.

Marick also questioned the distinction between unit testing and integration testing, since in some situations people skip unit tests and go straight to integration tests. Marick worried that people blindly follow the textbook V-model, working through the steps it prescribes even where they are not practical. I have done my best to build Marick's desire for highly flexible behavior into the X-model: the X-model does not require every program piece to be unit tested (the activities on the left) before the integration testing that is part of building an executable program (upper right). But the X-model offers no criteria for deciding whether to skip unit testing.


The X-model fills gaps left by the V-model and the W-model, and can be a real help to both testers and developers.



This article surveyed the models used in testing, added the author's own views on them, and explained each model's strengths and weaknesses.

----------------------- Editor's note: testage_snooker

Test Tech - How SQLite Is Tested

http://sqlite.org/testing.html

1.0 Introduction

The reliability and robustness of SQLite is achieved in large part by thorough and careful testing.

As of version 3.6.11 (all statistics in the report are against that release of SQLite), the SQLite library consists of approximately 62.2 KSLOC of C code. (KSLOC means thousands of "Source Lines Of Code" or, in other words, lines of code excluding blank lines and comments.) By comparison, the project has 716 times as much test code and test scripts - 44568.6 KSLOC.

2.0 Test Harnesses

There are three independent test harnesses used for testing the core SQLite library. Each test harness is designed, maintained, and managed separately from the others.

  1. The TCL Tests are the oldest and most complete set of tests for SQLite. The TCL tests are contained in the same source tree as the SQLite core and like the SQLite core are in the public domain. The TCL tests are the primary tests used during development. The TCL tests are written using the TCL scripting language. The TCL test harness itself consists of 15.7 KSLOC of C code used to create the TCL interface. The test scripts are contained in 456 files totaling 7.6MB in size. There are 23813 distinct test cases, but many of the test cases are parameterized and run multiple times (with different parameters) so that on a full test run, about 1.1 million separate tests are performed.

  2. The TH3 test harness is a set of proprietary tests, written in C. The impetus for TH3 was the need to have a set of tests that ran on embedded and specialized platforms that would not easily support TCL or other workstation services. TH3 tests use only the published SQLite interfaces. These tests are free to SQLite Consortium members and are available by license to others. TH3 consists of about 2.4 MB or 34.0 KSLOC of C code implementing 7014 distinct test cases. TH3 tests are heavily parameterized, though, so a full test runs about 2.3 million different test instances.

  3. The SQL Logic Test or SLT test harness is used to run huge numbers of SQL statements against both SQLite and several other SQL database engines and verify that they all get the same answers. SLT currently compares SQLite against PostgreSQL, MySQL, and Microsoft SQL Server. SLT runs 5.8 million queries comprising 1.10GB of test data.

All of the tests above must run successfully, on multiple platforms and under multiple compile-time configurations, before each release of SQLite.

Prior to each check-in to the SQLite source tree, developers typically run a subset (called "veryquick") of the Tcl tests consisting of about 40.6 thousand test cases and covering 96.94% of the core SQLite source code. The veryquick tests cover everything except the anomaly, fuzz, and soak tests. The idea behind the veryquick tests is that they are sufficient to catch most errors, but also run in only a few minutes instead of a few hours.

3.0 Anomaly Testing

Anomaly tests are tests designed to verify the correct behavior of SQLite when something goes wrong. It is (relatively) easy to build an SQL database engine that behaves correctly on well-formed inputs on a fully functional computer. It is more difficult to build a system that responds sanely to invalid inputs and continues to function following system malfunctions. The anomaly tests are designed to verify the latter behavior.

3.1 Out-Of-Memory Testing

SQLite, like all SQL database engines, makes extensive use of malloc(). (See the separate report on dynamic memory allocation in SQLite for additional detail.) On workstations, malloc() never fails in practice and so correct handling of out-of-memory (OOM) errors is not particularly important. But on embedded devices, OOM errors are frighteningly common and since SQLite is frequently used on embedded devices, it is important that SQLite be able to gracefully handle OOM errors.

OOM testing is accomplished by simulating OOM errors. SQLite allows an application to substitute an alternative malloc() implementation using the sqlite3_config(SQLITE_CONFIG_MALLOC,...) interface. The TCL and TH3 test harnesses are both capable of inserting a modified version of malloc() that can be rigged to fail after a certain number of allocations. These instrumented mallocs can be set to fail only once and then start working again, or to continue failing after the first failure. OOM tests are done in a loop. On the first iteration of the loop, the instrumented malloc is rigged to fail on the first allocation. Then some SQLite operation is carried out and checks are done to make sure SQLite handled the OOM error correctly. Then the time-to-failure counter on the instrumented malloc is increased by one and the test is repeated. The loop continues until the entire operation runs to completion without ever encountering a simulated OOM failure. Tests like this are run twice, once with the instrumented malloc set to fail only once, and again with the instrumented malloc set to fail continuously after the first failure.

3.2 I/O Error Testing

I/O error testing seeks to verify that SQLite responds sanely to failed I/O operations. I/O errors might result from a full disk drive, malfunctioning disk hardware, network outages when using a network file system, system configuration or permission changes that occur in the middle of an SQL operation, or other hardware or operating system malfunctions. Whatever the cause, it is important that SQLite be able to respond correctly to these errors and I/O error testing seeks to verify that it does.

I/O error testing is similar in concept to OOM testing; I/O errors are simulated and checks are made to verify that SQLite responds correctly to the simulated errors. I/O errors are simulated in both the TCL and TH3 test harnesses by inserting a new Virtual File System object that is specially rigged to simulate an I/O error after a set number of I/O operations. As with OOM error testing, the I/O error simulators can be set to fail just once, or to fail continuously after the first failure. Tests are run in a loop, slowly increasing the point of failure until the test case runs to completion without error. The loop is run twice, once with the I/O error simulator set to simulate only a single failure and a second time with it set to fail all I/O operations after the first failure.

In I/O error tests, after the I/O error simulation failure mechanism is disabled, the database is examined using PRAGMA integrity_check to make sure that the I/O error has not introduced database corruption.

3.3 Crash Testing

Crash testing seeks to demonstrate that an SQLite database will not go corrupt if the application or operating system crashes or if there is a power failure in the middle of a database update. A separate white-paper titled Atomic Commit in SQLite describes the defensive measures SQLite takes to prevent database corruption following a crash. Crash tests strive to verify that those defensive measures are working correctly.

It is impractical to do crash testing using real power failures, of course, and so crash testing is done in simulation. An alternative Virtual File System is inserted that allows the test harness to simulate the state of the database file following a crash.

In the TCL test harness, the crash simulation is done in a separate process. The main testing process spawns a child process which runs some SQLite operation and randomly crashes somewhere in the middle of a write operation. A special VFS randomly reorders and corrupts the unsynchronized write operations to simulate the effect of buffered filesystems. After the child dies, the original test process opens and reads the test database and verifies that the changes attempted by the child either completed successfully or else were completely rolled back. The integrity_check PRAGMA is used to make sure no database corruption occurs.

The TH3 test harness needs to run on embedded systems that do not necessarily have the ability to spawn child processes, so it uses an in-memory VFS to simulate crashes. The in-memory VFS can be rigged to make a snapshot of the entire filesystem after a set number of I/O operations. Crash tests run in a loop. On each iteration of the loop, the point at which a snapshot is made is advanced until the SQLite operations being tested run to completion without ever hitting a snapshot. Within the loop, after the SQLite operation under test has completed, the filesystem is reverted to the snapshot and random file damage is introduced that is characteristic of the kinds of damage one expects to see following a power loss. Then the database is opened and checks are made to ensure that it is well-formed and that the transaction either ran to completion or was completely rolled back. The interior of the loop is repeated multiple times for each snapshot with different random damage each time.

4.0 Fuzz Testing

Fuzz testing seeks to establish that SQLite responds correctly to invalid, out-of-range, or malformed inputs.

4.1 SQL Fuzz

SQL fuzz testing consists of creating syntactically correct yet wildly nonsensical SQL statements and feeding them to SQLite to see what it will do with them. Usually some kind of error is returned (such as "no such table"). Sometimes, purely by chance, the SQL statement also happens to be semantically correct. In that case, the resulting prepared statement is run to make sure it gives a reasonable result.

The SQL fuzz generator tests are part of the TCL test suite. During a full test run, about 105.2 thousand fuzz SQL statements are generated and tested.

4.2 Malformed Database Files

There are numerous test cases that verify that SQLite is able to deal with malformed database files. These tests first build a well-formed database file, then add corruption by changing one or more bytes in the file by some means other than SQLite. Then SQLite is used to read the database. In some cases, the byte changes are in the middle of data fields; this causes the content to change, but does not otherwise impact the operation of SQLite. In other cases, unused bytes of the file are modified. The interesting cases are when bytes of the file that define database structure get changed. The malformed database tests verify that SQLite finds the file format errors and reports them using the SQLITE_CORRUPT return code without overflowing buffers, dereferencing NULL pointers, or performing other unwholesome actions.

4.3 Boundary Value Tests

SQLite defines certain limits on its operation, such as the maximum number of columns in a table, the maximum length of an SQL statement, or the maximum value of an integer. The TCL test suite contains numerous tests that push SQLite right to the edge of its defined limits and verify that it performs correctly for all allowed values. Additional tests go beyond the defined limits and verify that SQLite correctly returns errors.

5.0 Regression Testing

Whenever a bug is reported against SQLite, that bug is not considered fixed until new test cases have been added to the TCL test suite which would exhibit the bug in an unpatched version of SQLite. Over the years, this has resulted in thousands and thousands of new tests being added to the TCL test suite. These regression tests ensure that bugs that have been fixed in the past are never reintroduced into future versions of SQLite.

6.0 Automatic Resource Leak Detection

A resource leak occurs when system resources are allocated and never freed. The most troublesome resource leaks in many applications are memory leaks - when memory is allocated using malloc() but never released using free(). But other kinds of resources can also be leaked: file descriptors, threads, mutexes, etc.

Both the TCL and TH3 test harnesses automatically track system resources and report resources leaks on every test run. No special configuration or setup is required. The test harnesses are especially vigilant with regard to memory leaks. If a change causes a memory leak, the test harnesses will recognize this quickly. SQLite is designed to never leak memory, even after an exception such as an OOM error or disk I/O error. The test harnesses are zealous to enforce this.

7.0 Test Coverage

The gcov utility is used to measure the "test coverage" of the SQLite test suite. SQLite strives for but does not yet obtain 100% test coverage. A major goal of the SQLite project is to obtain 100% branch coverage during 2009.

Test coverage can be measured in several ways. "Statement coverage" measures (as a percentage of the whole) how many lines of code are exercised by the test cases. The TCL test suite obtains 99.37% statement coverage on the SQLite core. (The SQLite core, in this case, excludes the operating-system dependent VFS backends.) "Branch" coverage measures (again, as a percentage of the whole) how many machine-code branch instructions are taken at least once in both directions. The TCL test suite obtains 95.16% branch coverage.

To illustrate the difference between statement coverage and branch coverage, consider the following hypothetical line of C code:

if( a>b && c!=25 ){ d++; }

Such a line of C code might generate a dozen separate machine code instructions. If any one of those instructions is ever evaluated, then we say that the statement has been tested. So, for example, it might be the case that the conditional expression is always false and the "d" variable is never incremented. Even so, statement coverage counts this line of code as having been tested.

Branch coverage is more strict. With branch coverage, each test and each subblock within the statement is considered separately. In order to achieve 100% branch coverage in the example above, there must be at least three test cases:

  • a<=b
  • a>b && c==25
  • a>b && c!=25

Branch test coverage is normally less than statement coverage since a C program will typically contain some defensive tests which in practice are always true or always false. For testing purposes, the SQLite source code defines macros called ALWAYS() and NEVER(). The ALWAYS() macro surrounds conditions which are expected to always evaluate to true and NEVER() surrounds conditions that always evaluate to false. These macros serve as comments to indicate that the conditions are defensive code. For standard builds, these macros are pass-throughs:

#define ALWAYS(X)  (X)
#define NEVER(X) (X)

During most testing, however, these macros will throw an assertion fault if their argument does not have the expected truth value. This alerts the developers quickly to incorrect design assumptions.

#define ALWAYS(X)  ((X)?1:(assert(0),0))
#define NEVER(X)   ((X)?(assert(0),1):0)

When measuring test coverage, these macros are defined to be constant truth values so that they do not generate assembly language branch instructions, and hence do not come into play when calculating the branch coverage level:

#define ALWAYS(X)  (1)
#define NEVER(X) (0)

Another macro used in conjunction with test coverage measurement is the testcase() macro. The argument is a condition for which we want test cases that evaluate to both true and false. In non-coverage builds (that is to say, in release builds) the testcase() macro is a no-op:

#define testcase(X)

But in a coverage measuring build, the testcase() macro generates code that evaluates the conditional expression in its argument. Then during analysis, a check is made to ensure tests exist that evaluate the conditional to both true and false. Testcase() macros are used, for example, to help verify that boundary values are tested. For example:

testcase( a==b );
testcase( a==b+1 );
if( a>b && c!=25 ){ d++; }

Testcase macros are also used when two or more cases of a switch statement go to the same block of code, to make sure that the code was reached for all cases:

switch( op ){
case OP_Add:
case OP_Subtract: {
testcase( op==OP_Add );
testcase( op==OP_Subtract );
/* ... */
break;
}
/* ... */
}

For bitmask tests, testcase() macros are used to verify that every bit of the bitmask affects the test. For example, in the following block of code, the condition is true if the mask contains either of two bits indicating either a MAIN_DB or a TEMP_DB is being opened. The testcase() macros that precede the if statement verify that both cases are tested:

testcase( mask & SQLITE_OPEN_MAIN_DB );
testcase( mask & SQLITE_OPEN_TEMP_DB );
if( (mask & (SQLITE_OPEN_MAIN_DB|SQLITE_OPEN_TEMP_DB))!=0 ){ ... }

The developers of SQLite have found that coverage testing is an extremely productive method for finding bugs. Because such a high percentage of SQLite core code is covered by test cases, the developers can be confident that changes they make in one part of the code do not have unintended consequences in other parts of the code. It would not be possible to maintain the quality of SQLite without coverage testing.

8.0 Dynamic Analysis

Dynamic analysis refers to internal and external checks on the SQLite code which are performed while the code is live and running. Dynamic analysis has proven to be a great help in maintaining the quality of SQLite.

8.1 Assert

The SQLite core contains 2442 assert() statements that verify function preconditions and postconditions and loop invariants. Assert() is a macro which is a standard part of ANSI-C. The argument is a boolean value that is assumed to always be true. If the assertion is false, the program prints an error message and halts.

Assert() macros are disabled by compiling with the NDEBUG macro defined. In most systems, asserts are enabled by default. But in SQLite, the asserts are so numerous and are in such performance critical places, that the database engine runs about three times slower when asserts are enabled. Hence, the default (production) build of SQLite disables asserts. Assert statements are only enabled when SQLite is compiled with the SQLITE_DEBUG preprocessor macro defined.

8.2 Valgrind

Valgrind is perhaps the most amazing and useful developer tool in the world. Valgrind is a simulator - it simulates an x86 running a linux binary. (Ports of valgrind for platforms other than linux are in development, but as of this writing, valgrind only works reliably on linux, which in the opinion of the SQLite developers means that linux should be the preferred platform for all software development.) As valgrind runs a linux binary, it looks for all kinds of interesting errors such as array overruns, reading from uninitialized memory, stack overflows, memory leaks, and so forth. Valgrind finds problems that can easily slip through all of the other tests run against SQLite. And, when valgrind does find an error, it can dump the developer directly into a symbolic debugger at the exact point where the error occurs, to facilitate a quick fix.

Because it is a simulator, running a binary in valgrind is slower than running it on native hardware. So it is impractical to run the full SQLite test suite through valgrind. However, the veryquick tests and a subset of the TH3 tests are run through valgrind prior to every release.

8.3 Memsys2

SQLite contains a pluggable memory allocation subsystem. The default implementation uses system malloc() and free(). However, if SQLite is compiled with SQLITE_MEMDEBUG, an alternative memory allocation wrapper (memsys2) is inserted that looks for memory allocation errors at run-time. The memsys2 wrapper checks for memory leaks, of course, but also looks for buffer overruns, uses of uninitialized memory, and attempts to use memory after it has been freed. These same checks are also done by valgrind (and, indeed, valgrind does them better) but memsys2 has the advantage of being much faster than valgrind, which means the checks can be done more often and for longer tests.

8.4 Mutex Asserts

SQLite contains a pluggable mutex subsystem. Depending on compile-time options, the default mutex system contains interfaces sqlite3_mutex_held() and sqlite3_mutex_notheld() that detect whether or not a particular mutex is held by the calling thread. These two interfaces are used extensively within assert() statements in SQLite to verify mutexes are held and released at all the right moments, in order to double-check that SQLite does work correctly in multi-threaded applications.

8.5 Journal Tests

One of the things that SQLite does to ensure that transactions are atomic across system crashes and power failures is to write all changes into the rollback journal file prior to changing the database. The TCL test harness contains an alternative Virtual File System implementation that helps to verify this is occurring correctly. The "journal-test VFS" monitors all disk I/O traffic between the database file and rollback journal, checking to make sure that nothing is written into the database file which has not first been written and synced to the rollback journal. If any discrepancies are found, an assertion fault is raised.

The journal tests are an additional double-check over and above the crash tests to make sure that SQLite transactions will be atomic across system crashes and power failures.

9.0 Static Analysis

Static analysis means analyzing code at or before compile-time to check for correctness. Static analysis consists mostly of making sure SQLite compiles without warnings, even when all warnings are enabled. SQLite is developed primarily using GCC and it does compile without warnings on GCC using the -Wall and -Wextra flags. There are occasional reports of warnings coming from VC++, however.

Static analysis has not proven to be helpful in finding bugs. We cannot call to mind a single problem in SQLite that was detected by static analysis that was not first seen by one of the other testing methods described above. On the other hand, we have on occasion introduced new bugs in our efforts to get SQLite to compile without warnings.

Our experience, then, is that static analysis is counter-productive to quality. In other words, focusing on static analysis (being concerned with compiler warnings) actually reduces the quality of the code. Nevertheless, we developers have capitulated to pressure from users and actively work to eliminate compiler warnings. We are willing to do this because the other tests described above do an excellent job of finding the bugs that are often introduced when removing compiler warnings, so that product quality is probably not decreased as a result.

10.0 Summary

SQLite is open source. That gives many people the idea that it is not well tested and is perhaps unreliable. But that impression is false. SQLite has exhibited very high reliability in the field and a very low defect rate, especially considering how rapidly it is evolving. The quality of SQLite is achieved in part by careful code design and implementation. But extensive testing also plays a vital role in maintaining and improving the quality of SQLite. This document has summarized the testing procedures that every release of SQLite undergoes with the hopes of inspiring the reader to understand that SQLite is suitable for use in mission-critical applications.