The suggestion elsewhere in this thread to decrease the number of iterations during normal testing and crank it up during nightlies is also good.
The only thing I’m still missing from the libraries is a convenient mechanism for remembering previously failing generated inputs and using them as a static list of test cases alongside the runtime-generated ones, like a regression test of sorts.
Edit: typos
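(For what it's worth, some libraries do persist failures - Hypothesis keeps an example database it replays on later runs. A minimal sketch of the idea in plain Python, where `check` and the corpus file layout are hypothetical, not any particular library's API:)

```python
import json
import os
import random

def check(prop, gen, corpus_path, num_random=100, seed=None):
    """Run `prop` against previously-failing inputs first, then fresh
    random ones. Any newly failing input is appended to the corpus file,
    so the next run replays it like a regression test."""
    rng = random.Random(seed)
    corpus = []
    if os.path.exists(corpus_path):
        with open(corpus_path) as f:
            corpus = json.load(f)
    # known-bad inputs are checked before any new random generation
    for x in corpus + [gen(rng) for _ in range(num_random)]:
        if not prop(x):
            if x not in corpus:
                with open(corpus_path, "w") as f:
                    json.dump(corpus + [x], f)
            raise AssertionError(f"property failed for input {x!r}")
```

The corpus file only accepts JSON-serialisable inputs here; a real implementation would also want shrinking before saving.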
As far as I know, the Haskell QuickCheck library by Koen Claessen and John Hughes was in fact the first library to lay out property-based testing. Hughes then went on to create a paid and expanded version of the library in Erlang. And as QuickCheck rose in popularity, it was re-implemented in many, many different languages.
disclaimer: Auxon co-founder
Also, big h/t to QuickCheck from me as well. Getting into it via Erlang many, many years ago was among the more impactful and transformative developments in how I think about improving software quality.
There are also plugins for IDEs (PyCharm, VS Code and vim), which can be quite helpful during development.
If I were to add just one thing to the list: metatest. Write a test that asserts that your generated test cases are "sufficiently comprehensive", for whatever value of "sufficiently" you need. In an impure language, this is as easy as having the generator contain a mutable counter for "number of test cases meeting X condition" for whatever conditions you're interested in. For example, say your property is "A iff B". You might want to fail the test if fewer than 10% of the generated cases actually had A or B hold. (And then, of course, make sure your generators are such that - say - A and B hold 50% of the time; you want an astronomically small chance of random metatest failure.)
(I did a brief intro to this in the Metatesting section of a talk I did two years ago: https://github.com/Smaug123/talks/blob/master/DogeConf2019/D... . On rereading it now, I see there's a typo on the "bounded even integers" slide, where the final `someInts` should read `evenIntegers`.)
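(In plain-Python terms - no particular framework, and the generator, conditions, and thresholds here are all illustrative - the mutable-counter idea looks something like this:)

```python
import random

def run_property_with_metatest(num_cases=1000, min_interesting=0.10, seed=0):
    """Run a toy property test while counting how often the interesting
    branch is actually exercised - the metatest."""
    rng = random.Random(seed)
    interesting = 0  # mutable counter: cases where A (or B) held
    for _ in range(num_cases):
        x = rng.randrange(100)           # toy generator
        a = x % 6 == 0                   # condition A
        b = x % 2 == 0 and x % 3 == 0    # condition B
        assert a == b                    # the property under test: A iff B
        if a or b:
            interesting += 1
    # metatest: fail if too few generated cases exercised A or B,
    # i.e. the generator is too weak for the property to mean much
    assert interesting >= min_interesting * num_cases, (
        f"only {interesting}/{num_cases} cases exercised A or B"
    )
    return interesting

run_property_with_metatest()
```

With a uniform draw over 0..99, A holds about 17% of the time, comfortably above the 10% threshold, so random metatest failure is astronomically unlikely here.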
I'd just check that when you're writing or changing the tests though; for nontrivial conditions it can take a very long time to get a negligible probability of any metatest failing in a given run, and flaky metatests are just as bad as the usual kind.
If this split is particularly important, we'd usually recommend just writing separate tests for data that satisfy A or B; you can even supply the generators with pytest.mark.parametrize if copy-pasting the test body offends.
I think there's a strong argument with FsCheck to write all your proptest code in F# just to take advantage of the vastly better generator syntax, but that's a hard sell for a team who mostly don't know F# and aren't convinced proptests are much better anyway. Writing the generators in C# seemed really incredibly tedious. I did start to get the hang of identifying properties to test though. Once you're past the mechanics of "how does this work" that can become much easier.
A long road to travel here, but I kind of gave myself a remit to improve software quality, and I do think we need to be looking at this kind of testing to help.
Where do people who are using it find that it offers the most value? I keep feeling that we could really solidify some of our bespoke parsing and serialisation code using this kind of tech.
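(Parsing and serialisation code is a sweet spot precisely because the classic round-trip property - parse(serialise(x)) == x - needs no oracle beyond the input itself. A sketch using Python's json module as a stand-in for your bespoke codec; the generator shape is illustrative:)

```python
import json
import random

def gen_value(rng, depth=0):
    """Hypothetical generator for JSON-serialisable values."""
    kinds = ["int", "str", "bool", "none"]
    if depth < 2:  # cap nesting so generation terminates
        kinds += ["list", "dict"]
    kind = rng.choice(kinds)
    if kind == "int":
        return rng.randrange(-1000, 1000)
    if kind == "str":
        return "".join(rng.choice("abc é") for _ in range(rng.randrange(5)))
    if kind == "bool":
        return rng.random() < 0.5
    if kind == "none":
        return None
    if kind == "list":
        return [gen_value(rng, depth + 1) for _ in range(rng.randrange(3))]
    return {f"k{i}": gen_value(rng, depth + 1) for i in range(rng.randrange(3))}

def test_roundtrip(num_cases=200, seed=42):
    rng = random.Random(seed)
    for _ in range(num_cases):
        v = gen_value(rng)
        # the property: serialising then parsing gives the value back
        assert json.loads(json.dumps(v)) == v

test_roundtrip()
```

The same shape works for any encode/decode pair; the hard part is usually writing a generator that covers the awkward corners of your wire format (empty containers, unicode, boundary numbers).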
I'd very much welcome contributions on documentation, and esp. on approaches of how to keep the C#/F# documentation consistent and still accessible for both types of users. Even if it's just ideas/comments - how would you like the documentation presented? What are examples of excellent C# documentation? We need to balance that with available resources - we don't have a team of ghostwriters to write docs and examples for every language, as you can imagine. I know it's a cliche by this time, but if every user would take a couple minutes to write a paragraph or example where e.g. the C# docs are lacking, it might be in a much better state. From our side, if something is stopping you from contributing in this way, we'd like to hear about it. Addressing that is important.
Separately, I'm surprised you found generators significantly more tedious to write in C# vs F# - could you open an issue with a few examples of this? It would inform v3.0, where we will stop trying to use tricks to make the F# API accessible and instead add a bespoke C#/VB.NET API in the FsCheck.Fluent namespace, separating the F#-specific bits into FsCheck.FSharp.
Property-based testing is Monte Carlo simulation for model checking.
1: https://hypofuzz.com/docs/literature.html 2: https://google.github.io/oss-fuzz/getting-started/new-projec...
In my job Cucumber seems to add little more than just commenting and sequencing functions, tasks that are better suited to your programming language of choice, while adding overhead and complexity.
What am I missing?
I believe that Cucumber is at its best in situations where it's clear to all parties that a specification is valuable. In that case, making the specification executable is very clearly a massively useful way to spend your time.
If your tests are coupled, you're already in a bad way whether you know it or not. Dumping property testing on top of that without addressing the underlying cause sounds like a recipe for misery.
It's probably a great stick and carrot if you're pushing a tech debt reduction agenda though.
link to the graphic for ease of reference:
https://blog.auxon.io/images/posts/effective-property-based-...
[1]: https://fsharpforfunandprofit.com/posts/property-based-testi...
They’re saying that sometimes it’s better to just see the error in full and try and figure it out.
What took the time in my case was simply getting a failure to occur at all. It might take days on a mature compiler before a failure occurs, if ever. That would be millions of attempts.