SymQEMU

SymQEMU: Compilation-based symbolic execution for binaries

Sebastian Poeplau and Aurélien Francillon
Proceedings of the Network and Distributed System Symposium (NDSS 2021), San Diego, CA, USA
PDF slides (PDF) Bibtex Evaluation data

Symbolic execution is a powerful technique for software analysis and bug detection. Compilation-based symbolic execution is a recently proposed flavor that has been shown to improve the performance of symbolic execution significantly when source code is available. We demonstrate a novel technique to enable compilation-based symbolic execution of binaries (i.e., without the need for source code). Our system, SymQEMU, builds on top of QEMU, modifying the intermediate representation of the target program before translating it to the host architecture. This enables SymQEMU to compile symbolic-execution capabilities into binaries and reap the associated performance benefits while maintaining architecture independence.

We present our approach and implementation, and we show that it outperforms the state-of-the-art binary symbolic executors S2E and QSYM with statistical significance; on some benchmarks, it even achieves better performance than the source-based SymCC. Moreover, our tool has found a previously unknown vulnerability in the well-tested libarchive library, demonstrating its utility in testing real-world software.

Introduction

SymQEMU is a fast symbolic execution engine for binaries. On this page, we provide its source code, the raw results of the experiments described in the paper, and instructions how you can replicate those experiments yourself.

Code

SymQEMU is available on GitHub.

Experiments

In the paper, we describe three sets of experiments: we first benchmark SymQEMU with Google FuzzBench, then we run it on real-world software, and finally we perform a benchmark comparison during concolic execution of fixed paths. This section describes how to replicate our experiments, and provides links to our results.

FuzzBench (see the report)

We will share our integration scripts shortly; they're being cleaned up to obtain SymQEMU and its dependencies from the new public repository.
Real-world software

For the analysis of real-world software we used the same setup as in the evaluation of SymCC. The binaries for SymQEMU, QSYM and S2E were plain builds without any instrumentation. SymQEMU was run via SymCC's fuzzing helper by prefixing the target command with /path/to/symqemu-x86_64. For the S2E analysis, we created a default project, then enabled the FunctionModels plugin and activated the option generateOnStateFork in the TestCaseGenerator plugin; coverage was evaluated with afl-showmap at the end of the analysis, using the same AFL-instrumented binaries as with the hybrid fuzzers.
- OpenJPEG (our results), libarchive (our results), tcpdump (our results): please find the details on our SymCC page.
- WinRAR (our results): we downloaded version 6.00 for 64-bit Linux.
Benchmark experiments (our results)

After the analysis of real-world software described above, we randomly collected 1,000 generated test cases per open-source target. We ran SymQEMU, QSYM and SymCC on each of those inputs, recording the time spent in execution and SMT solving, respectively, as per the logging output from the QSYM backend.

Acknowledgements

This work has been supported partly by the DAPCODS/IOTics ANR 2016 project (ANR-16-CE25-0015) and partly by the Defense Advanced Research Projects Agency (DARPA) under agreement number FA875019C0003.

Contact

Feel free to reach out to us if anything is unclear or if you need more information.