Or hardware, where the advice is usually to mock the device under test. But if you don't own the hardware, the most you can do is try to emulate it and maybe check that your simulated state machine works. In my experience it's easier to run with the hardware connected and just skip those tests otherwise. There are also extremely subtle bugs that can crop up with hardware interfaces, like needing to insert delays into code (e.g. when sending serial data) that will otherwise fail in the real world.
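That skip-unless-connected pattern might look like this with pytest (the env var and the PING/PONG protocol are made up for illustration):

    import os
    import time
    import pytest

    # Opt in to hardware tests only when a device is actually attached;
    # HW_PORT is a hypothetical env var pointing at the serial device.
    HW_PORT = os.environ.get("HW_PORT")

    @pytest.mark.skipif(HW_PORT is None, reason="no hardware attached")
    def test_device_echo():
        import serial  # pyserial, only needed when hardware is present
        with serial.Serial(HW_PORT, 115200, timeout=1) as port:
            port.write(b"PING\n")
            time.sleep(0.05)  # hardware often needs a settle delay here
            assert port.readline().strip() == b"PONG"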
OpenCV has some interesting approaches to this, for example testing video storage in a given format by inserting a frame with a known shape (like a circle), then reading the video back and checking that the shape can still be detected.
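A rough sketch of that round trip in Python (codec choice and detector parameters are guesses, not OpenCV's actual test):

    import numpy as np
    import cv2

    # Write one frame containing a known circle...
    frame = np.zeros((240, 320, 3), dtype=np.uint8)
    cv2.circle(frame, (160, 120), 40, (255, 255, 255), -1)
    writer = cv2.VideoWriter("roundtrip.avi",
                             cv2.VideoWriter_fourcc(*"MJPG"), 30, (320, 240))
    writer.write(frame)
    writer.release()

    # ...then read it back and check the circle survives the codec.
    ok, decoded = cv2.VideoCapture("roundtrip.avi").read()
    assert ok
    gray = cv2.cvtColor(decoded, cv2.COLOR_BGR2GRAY)
    circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, dp=1, minDist=50,
                               param1=100, param2=30, minRadius=30, maxRadius=50)
    assert circles is not None and len(circles[0]) == 1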
The basic idea is to start with some known-good inputs and outputs, and then generate ways to modify the input that should not change the output.
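As a toy instance of that idea, a metamorphic relation where the transformation provably should not change the output:

    import math
    import random

    # Metamorphic relation: sin(x) == sin(pi - x), so rewriting the input
    # this way must leave the output unchanged (up to float tolerance).
    for _ in range(1000):
        x = random.uniform(-10, 10)
        assert math.isclose(math.sin(x), math.sin(math.pi - x), abs_tol=1e-9)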
The OpenCV example is a pretty easy one. You have clear inputs with clearly defined outputs. The only thing you have to do is to create sample data.
Yup. I work in robotics.
I try to isolate the actual hardware interaction layer so that for testing you can mock the driver and hardware as one piece. Of course that does not test the driver. With any luck, the driver is pretty stable once it works, though. And the driver+hardware piece can have its own (physical) test bench so that manual testing is, well, maybe not efficient, but at least not painful.
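A sketch of that seam, with the driver+hardware mocked as one piece (all names here are invented, not from any real driver):

    from typing import Protocol

    class MotorDriver(Protocol):
        def set_velocity(self, rad_per_s: float) -> None: ...
        def velocity(self) -> float: ...

    # The fake stands in for driver + hardware together; it doesn't test
    # the real driver, but everything above this seam becomes unit-testable.
    class FakeMotorDriver:
        def __init__(self) -> None:
            self._v = 0.0
        def set_velocity(self, rad_per_s: float) -> None:
            self._v = rad_per_s
        def velocity(self) -> float:
            return self._v

    def test_controller_reaches_setpoint():
        drv = FakeMotorDriver()
        ctrl = VelocityController(drv)  # hypothetical code under test
        ctrl.command(2.0)
        assert abs(drv.velocity() - 2.0) < 1e-6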
Simulators are great but not always available. Or are too much work to get going.
One configuration often used for robots is the "boneless chicken". Take a bench, and bolt all the guts down to it in a configuration where they are easy to probe. Put the wheel motors someplace safe, with a synthetic load like a Prony brake. Of course you can't test the nav stack that way. (I once interviewed a firmware engineer who was coming off the Juicero shutdown -- say what you want about Juicero, but from the sounds of it their boneless chicken was outstanding, even integrated into the CI automation pipeline. Of course, they didn't have the nav problem).
Speaking of nav, I once saw a warehouse robot company's micro-warehouse for testing nav PRs. Not the full test warehouse, just a 500 square foot or so area dedicated to testing nav PRs. It was integrated with CI automation. I could tell from the accumulated tire marks on the floor that they had nav pretty much nailed.
I have done several robot-to-elevator interfaces (probably more than anyone else). In the end, final testing always required something akin to a few midnight to 4 AM test blocks on the real elevator. And then of course as you point out:
> the whole system has a ton of potential interactions that are hard to write test cases for.
They often don't show up until the system is under load.
When I'm testing thermal cameras there is a sequence of things I can check to ensure that the test worked: was the command sent without errors? Did I get an error back from the camera (e.g. a CRC failure)? Does the state of the camera change as I expect it to? If all of those things check out, the likelihood is that the command was sent OK. Of course, for states you should check various permutations (e.g. shutter open and shutter closed) to make sure that you don't have a bug in your state-reading code :)
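Sketched as code, with an entirely hypothetical camera API:

    def test_shutter_command(cam):
        # 1. Command goes out without a transport error.
        reply = cam.send_command("SHUTTER_CLOSE")  # hypothetical API
        # 2. Camera didn't reject it (e.g. CRC failure).
        assert reply.status == "ACK", reply.status
        # 3. Observable state changed as expected.
        assert cam.get_state("shutter") == "closed"
        # And the permutation, to catch bugs in the state-reading path:
        cam.send_command("SHUTTER_OPEN")
        assert cam.get_state("shutter") == "open"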
Here's a stereo matching example from OpenCV. This is a case where you do have the correct answer, but you don't expect to match it exactly, and the accuracy you can tolerate varies with the algorithm:
https://github.com/opencv/opencv/blob/055645080161c6af6083b6...
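The pattern boils down to an algorithm-specific error budget against ground truth; a numpy sketch with invented thresholds:

    import numpy as np

    def check_disparity(computed, ground_truth, max_bad_fraction):
        # Fraction of pixels whose disparity error exceeds 1 px; how large
        # a fraction is acceptable depends on which matcher is under test.
        err = np.abs(computed.astype(float) - ground_truth.astype(float))
        assert np.mean(err > 1.0) <= max_bad_fraction

    # e.g. check_disparity(bm_result, gt, 0.25)    # block matching: loose
    #      check_disparity(sgbm_result, gt, 0.10)  # SGBM: tighter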
The hard part is tightening our development feedback cycle. Since we outperform all competitors, we don't have an oracle to test against. We can automate testing with a small sample of input-output pairs, but the brunt of the work is still done by humans trained and paid to judge the quality of the results. It's an awful position to be in.
I have started looking for better ways of doing it, and the most promising I've found so far is metamorphic testing, mentioned in another comment.
Property testing only takes you a short way here, as far as I've been able to figure out.
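One place it does pull its weight without an oracle is round-trip properties, e.g. with Hypothesis (zlib here is just a stand-in for the system under test):

    from hypothesis import given, strategies as st
    import zlib

    # No oracle needed: we only claim that decompress inverts compress.
    @given(st.binary())
    def test_roundtrip(data):
        assert zlib.decompress(zlib.compress(data)) == data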
(I have also glanced at the techniques used in bioinformatics, since those guys are good at comparing sequences, but that's more specific to our case and not a general solution.)
When I think about my projects "working", I always try to answer the following questions:
1) Is my code doing what I believe it should be doing? That question always has an objective answer and is the subject of software engineering testing.
2) Is my solution solving my problem efficiently? That's often a domain-specific question, and different domains have different ways of doing quality assurance; there's no silver bullet.
One more category of tests I would add is meta tests (like mutation tests). These are tests that test the tests, checking whether they would actually catch errors and bugs or would just always report that everything is fine.
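A toy illustration of the idea; real tools such as mutmut automate the mutation step:

    def clamp(x, lo, hi):                  # original implementation
        return max(lo, min(x, hi))

    def clamp_mutant(x, lo, hi):           # mutant: max/min swapped
        return min(lo, max(x, hi))

    def suite_passes(impl):
        return (impl(5, 0, 10) == 5 and
                impl(-1, 0, 10) == 0 and
                impl(11, 0, 10) == 10)

    assert suite_passes(clamp)             # tests pass on the real code
    assert not suite_passes(clamp_mutant)  # a good suite "kills" the mutant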
This is often a good idea, but if you only need the flexibility to enable unit testing, it may make your system more complex than it needs to be. Only introduce indirection where it's really needed. See also "test induced design damage" and "write tests, not too many, mostly integration".
The advice to write mostly integration tests is a terrible one, particularly when they test the integration of everything. When such tests catch bugs, they don't tell you where the problem happened. They also take a long time to execute.
For example, if your code zips something, your test could verify the output with several independent zip engines.
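For instance, assuming the system unzip is available as a second, independent engine:

    import os
    import subprocess
    import tempfile
    import zipfile

    # Differential check: create an archive with Python's zipfile, then
    # have an independent implementation verify it.
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "out.zip")
        with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as zf:
            zf.writestr("hello.txt", "hello world\n")
        # zipfile's own integrity check...
        assert zipfile.ZipFile(path).testzip() is None
        # ...cross-checked by a second engine (assumes `unzip` is installed).
        subprocess.run(["unzip", "-t", path], check=True)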
1. The argument x must not be 0.
2. The variable x must be smaller than the variable y.
3. The list foo must be non-empty.
4. The variable x should have the value 'Success' at the end of the function call if it had the value 'Try' at the beginning.
These 'invariants', or assertions, can be extremely useful for testing the correctness of the code. Put simply, if an invariant is violated (during unit, integration, or system tests), it indicates that either the design or the implementation is wrong. An article on testing methodology would be more appealing if it had some discussion of exploiting invariants/assertions.
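For example, the four invariants above as plain assertions (do_work stands in for the real body):

    def step(x, y, foo, state):
        assert x != 0                      # invariant 1: argument precondition
        assert x < y                       # invariant 2: ordering precondition
        assert len(foo) > 0                # invariant 3: non-empty list
        entered_as_try = (state == "Try")
        state = do_work(x, y, foo, state)  # hypothetical implementation
        if entered_as_try:                 # invariant 4: state transition
            assert state == "Success"
        return state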
Arguably, invariants are especially powerful in testing distributed systems.
You didn't search much: https://sqlite.org/testing.html