Symbolic Execution

Here we collect various projects on the subject of symbolic execution.

Systematic Comparison of Symbolic Execution Systems: Intermediate Representation and its Generation

Sebastian Poeplau and Aurélien Francillon
Proceedings of the 35th Annual Computer Security Applications Conference (ACSAC) 2019, San Juan, Puerto Rico
PDF Bibtex

Symbolic execution has become a popular technique for software testing and vulnerability detection. Most implementations transform the program under analysis to some intermediate representation (IR), which is then used as a basis for symbolic execution. There is a multitude of available IRs, and even more approaches to transform target programs into a respective IR.

When developing a symbolic execution engine, one needs to choose an IR, but it is not clear which influence the IR generation process has on the resulting system. What are the respective benefits for symbolic execution of generating IR from source code versus lifting machine code? Does the distinction even matter? What is the impact of not using an IR, executing machine code directly? We feel that there is little scientific evidence backing the answers to those questions. Therefore, we first develop a methodology for systematic comparison of different approaches to symbolic execution; we then use it to evaluate the impact of the choice of IR and IR generation. We make our comparison framework available to the community for future research.

Symbolic execution with SymCC: Don't interpret, compile!

Sebastian Poeplau and Aurélien Francillon
Proceedings of the 29th USENIX Security Symposium (USENIX Security 20), Boston, MA, USA
PDF Bibtex

A major impediment to practical symbolic execution is speed, especially when compared to near-native speed solutions like fuzz testing. We propose a compilation-based approach to symbolic execution that performs better than state-of-the-art implementations by orders of magnitude. We present SymCC, an LLVM-based C and C++ compiler that builds concolic execution right into the binary. It can be used by software developers as a drop-in replacement for clang and clang++, and we show how to add support for other languages with little effort. In comparison with KLEE, SymCC is faster by up to three orders of magnitude and an average factor of 12. It also outperforms QSYM, a system that recently showed great performance improvements over other implementations, by up to two orders of magnitude and an average factor of 10. Using it on real-world software, we found that our approach consistently achieves higher coverage, and we discovered two vulnerabilities in the heavily tested OpenJPEG project, which have been confirmed by the project maintainers and assigned CVE identifiers.

SymQEMU: Compilation-based symbolic execution for binaries

Sebastian Poeplau and Aurélien Francillon
Proceedings of the Network and Distributed System Symposium (NDSS 2021), San Diego, CA, USA
PDF Bibtex

Symbolic execution is a powerful technique for software analysis and bug detection. Compilation-based symbolic execution is a recently proposed flavor that has been shown to improve the performance of symbolic execution significantly when source code is available. We demonstrate a novel technique to enable compilation-based symbolic execution of binaries (i.e., without the need for source code). Our system, SymQEMU, builds on top of QEMU, modifying the intermediate representation of the target program before translating it to the host architecture. This enables SymQEMU to compile symbolic-execution capabilities into binaries and reap the associated performance benefits while maintaining architecture independence.

We present our approach and implementation, and we show that it outperforms the state-of-the-art binary symbolic executors S2E and QSYM with statistical significance; on some benchmarks, it even achieves better performance than the source-based SymCC. Moreover, our tool has found a previously unknown vulnerability in the well-tested libarchive library, demonstrating its utility in testing real-world software.

You may also be interested in Inception, a framework for symbolic execution of firmware.