A gentle introduction to symbolic execution (opens in new tab)

(blog.monic.co)

190 pointsjoelburget7y ago27 comments

27 comments

25 comments · 7 top-level

joelburgetOP7y ago· 8 in thread

Co-author here. We're planning to turn this into a series, with additional posts on how symbolic execution works at a lower level, how to present counterexamples to users, and ideas for future programming tools built on symbolic execution.

dfabulich7y ago

One piece of feedback: I've often wished to work on static analysis tools, but too much of the work in this area is focused on Haskell or Lisp. (All of the code in this article is in Haskell or Lisp.)

Does any of this stuff work on Java, Python, or JavaScript? I want to improve the quality of real-world programs written in industry-standard languages.

This "gentle introduction" makes it seem like the first step of the process is "port your app to a homoiconic language," -- or, in other words, "you can prove the correctness of any language you want, as long as it's Haskell or Lisp."

nickpsecurity7y ago

https://en.wikipedia.org/wiki/List_of_tools_for_static_code_...

dfabulich7y ago

That's static analysis, but not symbolic execution. Looking at the JavaScript section, you can't do any of the stuff described in the article with JSLint.

2 more replies

sophiebits7y ago

Prepack is an example of a (not production ready) tool to do abstract evaluation of JS.

CalChris7y ago

Seriously helpful article. Looking forward to the sequels.

  we can map each syntax tree to an interpreter

Maybe I'm being dense but I'm not sure what you mean by map here. A map is from set to set. I'm not thinking of an interpreter as a set.

joelburgetOP7y ago

I'll use a bit of Haskell in my response, since that's what I'm most comfortable with and what we actually use for our work. Hopefully that's okay.

We can define a data type of expressions in our language, with addition, multiplication, etc.

  data Expr = Addition Expr Expr | Multiplication Expr Expr | ...

Then we need the type of an interpreter which consumes program inputs and produces an output. I'll leave this abstract for now.

  data Interpreter = ...

By map, we just mean that we can write a Haskell function with this type

  interpret :: Expr -> Interpreter
  interpret (Addition a b) = (+) <$> interpret a <*> interpret b
  interpret (Multiplication a b) = (*) <$> interpret a <*> interpret b

If you're curious, here's some real symbolic evaluation code I wrote doing the same https://github.com/kadena-io/pact/blob/234ba3dd01f0df8b4e462...

CalChris7y ago

Well, I'm glad I asked. I get this now. It really is a mapping. But this was a bit of shorthand that I didn't get in the post. You might want to add that.

Xophmeister7y ago

The set of interpreters.

triska7y ago· 5 in thread

Very nice, thank you for sharing this!

Symbolic execution is also known as abstract interpretation. The program is being interpreted, with concrete values abstracted away, generalized to symbolic elements that often denote several or even infinitely many concrete values.

Logic programming languages like Prolog are especially amenable to abstract interpretation, since we can absorb the Prolog system's built-in binding environment for variables, and simply plug in different values for the existing variables. We only have to reify unification, i.e., implement it within the language with suitable semantics.

An impressive application of this idea is contained in the paper Meta-circular Abstract Interpretation in Prolog by Michael Codish and Harald Søndergaard:

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.137....

As one of the examples, the authors show how abstract interpretation over the abstract parity domain {zero, one, even, odd} can be used to derive non-trivial facts about the Ackermann function.

In particular, they deduce that Ackermann(i,j) is odd and greater than 1 for all i greater than 1, by taking a Prolog implementation of the Ackermann function, and interpreting it with these abstract domain elements instead of all concrete values (which would not terminate, since there are infinitely many concrete values). This is computed by fixpoint computation, determining the deductive closure of the relation.

mjn7y ago

> Symbolic execution is also known as abstract interpretation

These aren't the same thing! They do both use abstraction of concrete values, but abstract interpretation solves for fixpoints in a lattice of abstract values, which is a fairly different process from how symbolic execution works. The answers here go into more detail: https://cstheory.stackexchange.com/questions/19708/symbolic-...

One way of digging into abstract interpretation non-theoretically is via this recently open-sourced framework: https://code.fb.com/open-source/sparta/

rscho7y ago

https://docs.racket-lang.org/tutorial/index.html

For a very non-theoretical forage into that.

BTW the guy you are answering to is a prolog expert.

TACIXAT7y ago

This is the most digestible intro to abstract interpretation that I've found, https://wiki.mozilla.org/Abstract_Interpretation.

archi427y ago

I believe GP wanted to say that Symbolic Execution is a subset of Abstract Interpretation; because here the prolog expert is wrong: SE is not strictly equal to AI (though I'd gladly believe he didn't actually meant to say that).

AI itself boils down to "Build some lattices, construct a Galois Connection between them and add some abstract transfer function on top of it" (and make sure that all the necessary correctness equations hold; this is the difficult part). For SE you then choose the appropriate lattices and construct suitable transfer functions.

BTW: SE usually also needs a fixed-point iteration to produce meaningful results. At least when considering cyclic control flow graphs (loops & recursion).

vector_spaces7y ago

A nice and accessible (to someone with minimal introductory pure math under their belt) introduction to abstract interpretation: https://mitpress.mit.edu/books/lazy-functional-languages

souprock7y ago· 2 in thread

There are actual jobs, in industry, making use of this stuff. For example, here is one that I put in the "Who is hiring?" item this month:

https://news.ycombinator.com/item?id=19543995

Typically one would convert a binary executable into some other form, then analyze it to find all possible bugs. Of course one quickly slams into troubles like the halting problem, but it is still usually possible to gain useful understanding of the binary executable.

saagarjha7y ago

I noticed that you didn’t mention an employer in that listing.

schoen7y ago

Looks like it's probably Raytheon Cyber:

https://www.reddit.com/r/ReverseEngineering/comments/9x0v2z/...

llamathrowaway7y ago· 1 in thread

Thank you for this. However the labels 'syntax Haskell' and 'syntax LLVM' do not seem to display properly on Firefox. Wish you could find a way to fix them.

joelburgetOP7y ago

Thanks for the heads up. It never occurred to me the svg might render differently. I'll see if we can fix this.

bediger40007y ago· 1 in thread

I visited this expecting a tutorial about burning someone in effigy. Imagine my surprise!

joelburgetOP7y ago

Stick around for part 2. We haven't gotten to the good stuff yet :)

fwip7y ago· 1 in thread

When you say “what is the result when x is 3 and y is 100?” - shouldn't the answer be 113, not 103?

joelburgetOP7y ago

You have a keen eye -- thanks for the close reading.

kazinator7y ago

When a program has a free variable, we can consider an entire space of possible execution paths, one for each possible value the variable could take.

Or we could put on a greasy old pragmatic ball cap and issue an "unbound variable in line 42" error.

A free free variable has no "possible value". If a variable has a value, it has a binding; if it has a binding, it is not a free variable.

If we suspect that the variable is not actually free, but rather its definition has not yet appeared, we can defer the processing. Perhaps capture a continuation of the symbolic execution machine, and dispatch it later.

If the expression which refers to the free variable is not actually evaluated (as our symbolic execution shows), we can squelch the warning, or change its wording.

The idea that a free variable is implicitly universally quantified works in logic, but it doesn't transfer to an ordinary programming language (i.e. not Prolog, and its ilk) very well.

j / k navigate · click thread line to collapse

27 comments

25 comments · 7 top-level

joelburgetOP7y ago· 8 in thread

dfabulich7y ago

Does any of this stuff work on Java, Python, or JavaScript? I want to improve the quality of real-world programs written in industry-standard languages.

nickpsecurity7y ago

https://en.wikipedia.org/wiki/List_of_tools_for_static_code_...

dfabulich7y ago

That's static analysis, but not symbolic execution. Looking at the JavaScript section, you can't do any of the stuff described in the article with JSLint.

2 more replies

sophiebits7y ago

Prepack is an example of a (not production ready) tool to do abstract evaluation of JS.

CalChris7y ago

Seriously helpful article. Looking forward to the sequels.

  we can map each syntax tree to an interpreter

Maybe I'm being dense but I'm not sure what you mean by map here. A map is from set to set. I'm not thinking of an interpreter as a set.

joelburgetOP7y ago

I'll use a bit of Haskell in my response, since that's what I'm most comfortable with and what we actually use for our work. Hopefully that's okay.

We can define a data type of expressions in our language, with addition, multiplication, etc.

  data Expr = Addition Expr Expr | Multiplication Expr Expr | ...

Then we need the type of an interpreter which consumes program inputs and produces an output. I'll leave this abstract for now.

  data Interpreter = ...

By map, we just mean that we can write a Haskell function with this type

  interpret :: Expr -> Interpreter
  interpret (Addition a b) = (+) <$> interpret a <*> interpret b
  interpret (Multiplication a b) = (*) <$> interpret a <*> interpret b

If you're curious, here's some real symbolic evaluation code I wrote doing the same https://github.com/kadena-io/pact/blob/234ba3dd01f0df8b4e462...

CalChris7y ago

Well, I'm glad I asked. I get this now. It really is a mapping. But this was a bit of shorthand that I didn't get in the post. You might want to add that.

Xophmeister7y ago

The set of interpreters.

triska7y ago· 5 in thread

Very nice, thank you for sharing this!

An impressive application of this idea is contained in the paper Meta-circular Abstract Interpretation in Prolog by Michael Codish and Harald Søndergaard:

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.137....

As one of the examples, the authors show how abstract interpretation over the abstract parity domain {zero, one, even, odd} can be used to derive non-trivial facts about the Ackermann function.

mjn7y ago

> Symbolic execution is also known as abstract interpretation

One way of digging into abstract interpretation non-theoretically is via this recently open-sourced framework: https://code.fb.com/open-source/sparta/

rscho7y ago

https://docs.racket-lang.org/tutorial/index.html

For a very non-theoretical forage into that.

BTW the guy you are answering to is a prolog expert.

TACIXAT7y ago

This is the most digestible intro to abstract interpretation that I've found, https://wiki.mozilla.org/Abstract_Interpretation.

archi427y ago

BTW: SE usually also needs a fixed-point iteration to produce meaningful results. At least when considering cyclic control flow graphs (loops & recursion).

vector_spaces7y ago

A nice and accessible (to someone with minimal introductory pure math under their belt) introduction to abstract interpretation: https://mitpress.mit.edu/books/lazy-functional-languages

souprock7y ago· 2 in thread

There are actual jobs, in industry, making use of this stuff. For example, here is one that I put in the "Who is hiring?" item this month:

https://news.ycombinator.com/item?id=19543995

saagarjha7y ago

I noticed that you didn’t mention an employer in that listing.

schoen7y ago

Looks like it's probably Raytheon Cyber:

https://www.reddit.com/r/ReverseEngineering/comments/9x0v2z/...

llamathrowaway7y ago· 1 in thread

Thank you for this. However the labels 'syntax Haskell' and 'syntax LLVM' do not seem to display properly on Firefox. Wish you could find a way to fix them.

joelburgetOP7y ago

Thanks for the heads up. It never occurred to me the svg might render differently. I'll see if we can fix this.

bediger40007y ago· 1 in thread

I visited this expecting a tutorial about burning someone in effigy. Imagine my surprise!

joelburgetOP7y ago

Stick around for part 2. We haven't gotten to the good stuff yet :)

fwip7y ago· 1 in thread

When you say “what is the result when x is 3 and y is 100?” - shouldn't the answer be 113, not 103?

joelburgetOP7y ago

You have a keen eye -- thanks for the close reading.

kazinator7y ago

When a program has a free variable, we can consider an entire space of possible execution paths, one for each possible value the variable could take.

Or we could put on a greasy old pragmatic ball cap and issue an "unbound variable in line 42" error.

A free free variable has no "possible value". If a variable has a value, it has a binding; if it has a binding, it is not a free variable.

If the expression which refers to the free variable is not actually evaluated (as our symbolic execution shows), we can squelch the warning, or change its wording.

The idea that a free variable is implicitly universally quantified works in logic, but it doesn't transfer to an ordinary programming language (i.e. not Prolog, and its ilk) very well.

j / k navigate · click thread line to collapse