Seminar by Sumit Gulwani
Translation Validation (using Randomization)
Sumit Gulwani
Graduate student, EECS Department
University of California, Berkeley, CA, USA.
Date: Friday, January 11, 2002
Time: 03:30 PM
Venue: CS-101
Abstract
Compilers are an indispensable tool in software engineering because they allow the programmer to work at a higher level of abstraction and to ignore many details of the underlying hardware. While this simplifies programming considerably, it introduces a semantic gap between the source programs that we design, analyze, and reason about and the compiled programs that we actually execute. This is a major problem for safety-critical systems. Formally proving the correctness of a realistic optimizing compiler seems an unrealistic goal because of the huge size and complex nature of the software involved, its ongoing evolution and modification, and, possibly, proprietary considerations. This observation has inspired an alternative technique, translation validation, whose goal is to prove the correctness of each individual translation rather than the correctness of the compiler itself. Checking program equivalence is, in general, an undecidable problem. The key point, however, is that we are not dealing with an arbitrary pair of programs whose equivalence is to be decided: one of the programs has been obtained from the other by applying a fixed set of transformation rules, and the translator did not perform any intractable analysis. Hence, we can hope to solve the problem of translation validation provided the transformation rules satisfy some realistic constraints. We have focused on using randomized techniques to solve the problem of translation validation. These techniques have proven to be much better (i.e., more powerful, faster, and easier to implement) than their deterministic counterparts in tackling this undecidable problem.
The problem of exponential blowup is inherent in translation validation, even for simple programs. There are 2**n paths in a loop-free program with n conditionals. If a program refers to n memory locations (which may be aliased with each other), then there are O(n**n) possible aliasing scenarios. It is necessary to consider all of these exponentially many possibilities while deciding the equivalence of two such programs. The polynomial computed by a straight-line program with n statements over simple operators can have O(2**n) terms, and the only deterministic algorithm known to date for deciding the equivalence of two polynomials is to compare them term by term. Our randomized techniques avoid this exponential blowup. This does not mean that we have polynomial-time randomized algorithms for NP-hard problems. What we claim is that our randomized techniques are sound, and that they are also complete provided the transformation rules satisfy certain constraints. Our partial implementation of these techniques has confirmed our belief that these constraints are satisfied in practice. Our polynomial-time randomized techniques are not only more powerful than the known polynomial-time deterministic algorithms, but also faster and easier to implement.
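To give a flavor of the kind of randomized check involved, the following is a generic sketch of randomized polynomial identity testing (in the style of the Schwartz-Zippel lemma), not the speaker's implementation; the function name, trial count, and prime are illustrative choices:

```python
import random

def probably_equal(p, q, num_vars, trials=20, prime=2**31 - 1):
    """Randomized polynomial equivalence check: evaluate both
    polynomials at random points modulo a large prime. Two *distinct*
    polynomials of total degree d agree at a uniformly random point
    with probability at most d/prime, so agreement on many independent
    trials makes inequality overwhelmingly unlikely."""
    for _ in range(trials):
        point = [random.randrange(prime) for _ in range(num_vars)]
        if p(*point) % prime != q(*point) % prime:
            return False  # found a witness point: definitely not equal
    return True  # equal with all but infinitesimal probability

# (x + y)**2 and x*x + 2*x*y + y*y denote the same polynomial.
# Comparing their expanded forms term by term can blow up
# exponentially with expression size; random evaluation stays cheap.
```

Note that a `False` answer is always correct (a disagreeing point is a proof of inequality); only the `True` answer carries a tiny probability of error, which shrinks geometrically with the number of trials.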
Our randomized techniques are also applicable to a wide class of program analysis problems. The use of randomization in program analysis is a novel idea. Testing has often been used as an alternative to formal verification for checking properties of programs. Though testing is fast, easy to perform, and useful for finding bugs, it cannot prove their absence. Formal verification tools and techniques, on the other hand, can prove the absence of bugs, but they often have exponential complexity and do not scale well to large programs. Our proposed randomized techniques offer the best of both worlds: they are as fast and as simple as testing, yet they also provide guarantees of the absence of bugs (with an infinitesimally small probability of error).
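One way to picture how randomization can sidestep exhaustive path enumeration is to combine the program states from the two branches of a conditional into a single random affine combination, rather than analyzing each path separately. The sketch below is a hypothetical illustration of this idea; the example program, variable names, and prime are invented for exposition and are not taken from the speaker's work:

```python
import random

PRIME = 2**31 - 1

def affine_join(s1, s2, w):
    """Combine two program states as w*s1 + (1-w)*s2 (mod PRIME).
    Any *linear* relationship among variables that holds in both
    input states also holds in the combination, so one combined
    state can stand in for both branches of a conditional."""
    return {v: (w * s1[v] + (1 - w) * s2[v]) % PRIME for v in s1}

def check_invariant():
    # Hypothetical program:  if (*) { x = a+1; y = b-1 }
    #                        else   { x = b+3; y = a-3 }
    # The linear invariant x + y == a + b holds on both paths.
    a, b = random.randrange(PRIME), random.randrange(PRIME)
    s_then = {"a": a, "b": b, "x": (a + 1) % PRIME, "y": (b - 1) % PRIME}
    s_else = {"a": a, "b": b, "x": (b + 3) % PRIME, "y": (a - 3) % PRIME}
    joined = affine_join(s_then, s_else, random.randrange(PRIME))
    return (joined["x"] + joined["y"]) % PRIME == (joined["a"] + joined["b"]) % PRIME
```

With n conditionals, this style of analysis maintains a constant number of states instead of 2**n path-specific ones, which is the source of the speed claimed above.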
About the Speaker
Sumit Gulwani obtained his BTech in Computer Science from IIT Kanpur in 2000, where he was awarded the President's Gold Medal for his performance in the undergraduate program. He is currently a graduate student in the EECS Department (Computer Science Division) at the University of California, Berkeley.