Symbolic Execution: Intermediate Representation

Systematic Comparison of Symbolic Execution Systems: Intermediate Representation and its Generation
Sebastian Poeplau and Aurélien Francillon
Proceedings of the 35th Annual Computer Security Applications Conference (ACSAC) 2019, San Juan, Puerto Rico
PDF Bibtex

Symbolic execution has become a popular technique for software testing and vulnerability detection. Most implementations transform the program under analysis to some intermediate representation (IR), which is then used as a basis for symbolic execution. There is a multitude of available IRs, and even more approaches to transform target programs into a respective IR.


When developing a symbolic execution engine, one needs to choose an IR, but it is not clear which influence the IR generation process has on the resulting system. What are the respective benefits for symbolic execution of generating IR from source code versus lifting machine code? Does the distinction even matter? What is the impact of not using an IR, executing machine code directly? We feel that there is little scientific evidence backing the answers to those questions. Therefore, we first develop a methodology for systematic comparison of different approaches to symbolic execution; we then use it to evaluate the impact of the choice of IR and IR generation. We make our comparison framework available to the community for future research.

Introduction

On this site, we publish all code and data that we used in the study referenced above, as promised in the paper. Feel free to contact us if you have any questions.

Code

To measure the size of the IR representations:

Let's move on to the changes we made in the four analyzed symbolic execution engines. For now, we simply link our patches here. However, we are in the process of submitting as many of our code changes as possible to the respective upstream projects.

In all cases, we fed the SMT queries to z3 -T:1800 -smt2 for benchmarking.

Data

The CGC programs published for the DARPA Cyber Grand Challenge are the basis of our comparison. We used the DECREE binaries provided by Yan Shoshitaishvili for angr and S2E, and custom builds of the Linux port by Trail of Bits for KLEE and Qsym. While the former two systems simply processed the unmodified binaries, the following instructions, based on notes by Brendan Dolan-Gavitt, let you produce test programs for KLEE and Qsym:

Finally, we wrote a small tool that converts PoV descriptions from the original XML format to raw inputs, as well as one that produces ".ktest" files from raw inputs (required for KLEE). The PoV conversion does not work in all cases—when the PoV uses received data to derive what to send next, we cannot translate it to a simple binary input (as noted in the paper). The tools are not currently very user friendly (sorry!). We are working on easier to use versions.

Acknowledgements

This work was supported by the DAPCODS/IOTics ANR 2016 project (ANR-16-CE25-0015).

Contact

Feel free to reach out to us if anything is unclear or if you need more information.