There were so many possible states that a system's file system could be in. Were conventional tests going to catch subtle bugs? Here's an example of an unusual but not unheard-of issue: on a Unix system a file can be unlinked, and hence deleted from all directories, while remaining open in one or more processes. Such a file can still be written to and read by multiple processes until it is finally closed by every process holding an open file descriptor, at which point it disappears from the system. Does the distributed file system model this correctly? Many other strange combinations of system calls might be applied to the file system. Would the tests exercise these?
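The unlink-while-open behavior described above is easy to demonstrate on any POSIX system; a minimal sketch in Python:

```python
import os
import tempfile

# Create a file, open it, then unlink it while the descriptor is still open.
path = os.path.join(tempfile.mkdtemp(), "scratch")
fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o600)

os.unlink(path)                      # the name is gone from the directory...
assert not os.path.exists(path)

os.write(fd, b"still alive")         # ...but the open descriptor still works
os.lseek(fd, 0, os.SEEK_SET)
assert os.read(fd, 11) == b"still alive"

os.close(fd)                         # last close: the inode is finally freed
```

A distributed file system has to reproduce exactly this lifetime: the data stays reachable through open descriptors even though no directory entry points at it anymore.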
It occurred to me that the "correct" behavior for any sequence of system calls could be defined by just running the sequence on a local file system and comparing the results with running an identical sequence of system calls against the distributed file system.
I built a system to generate random sequences of file-related system calls and run them against both a local file system and the remote distributed file system that I wanted to test. As soon as the outcomes differed, the test would halt and save a log of the sequence of operations.
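The shape of such a differential test rig can be sketched in a few lines. This is a toy version, not the original system: it applies the same random operations to two directory trees and compares results, comparing error classes rather than messages so that expected failures (e.g. reading an unlinked file) still match across implementations.

```python
import os
import random

OPS = ["create", "write", "unlink", "read"]

def apply_op(root, op, name, data):
    """Apply one file operation under root; return an observable result."""
    path = os.path.join(root, name)
    try:
        if op == "create":
            open(path, "a").close()
            return "ok"
        if op == "write":
            with open(path, "ab") as f:
                f.write(data)
            return "ok"
        if op == "unlink":
            os.unlink(path)
            return "ok"
        if op == "read":
            with open(path, "rb") as f:
                return f.read()
    except OSError as e:
        return type(e).__name__     # compare error classes, not messages

def run(local_root, remote_root, steps=1000, seed=0):
    """Apply an identical random op sequence to both roots; stop on divergence."""
    rng = random.Random(seed)
    log = []
    for _ in range(steps):
        op = rng.choice(OPS)
        name = rng.choice(["a", "b", "c"])
        data = bytes([rng.randrange(256)])
        log.append((op, name))
        r1 = apply_op(local_root, op, name, data)
        r2 = apply_op(remote_root, op, name, data)
        if r1 != r2:
            return log              # the diverging sequence, for replay
    return None
```

In the real setup, `remote_root` would be a mount point of the distributed file system under test; run against two local directories it should never diverge.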
My experience with this test rig was interesting. At first, discrepancies happened right away. As each bug was fixed by the dev team, we would start the stochastic testing again, and then a new bug would be found. Over time the test would run for a few minutes before failure, then a few minutes longer, and finally for hours and hours. It was a really interesting and effective way to find some of the more subtle bugs in the code. I don't recall if I published this information internally or not.
You could run many tests in parallel to reduce the chance, but it will never be completely zero. Writing bug-free software this way is hard. The better way is to design it from the ground up with a bunch of instrumentation that keeps all of your invariants under close observation and that stops the moment anything is not according to your assumptions. This usually gets you to a high level of confidence that things really do work as designed. But of course, that also isn't perfect, and residual risk (and residual bugs...) will always remain in any system of even moderate complexity. File systems are well above that level, especially distributed file systems.
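A minimal sketch of the stop-the-moment-anything-is-wrong style (my own toy example, not from any particular system): every mutation re-checks the structure's invariants, so corruption is caught at the operation that caused it instead of much later.

```python
class CheckedRingBuffer:
    """A ring buffer that verifies its invariants after every mutation."""

    def __init__(self, capacity):
        self.buf = [None] * capacity
        self.head = 0      # index of the next item to read
        self.count = 0     # number of items currently stored
        self._check()

    def _check(self):
        # Invariants under close observation: fail fast on any violation.
        assert 0 <= self.head < len(self.buf), "head out of range"
        assert 0 <= self.count <= len(self.buf), "count out of range"

    def push(self, item):
        assert self.count < len(self.buf), "overflow"
        self.buf[(self.head + self.count) % len(self.buf)] = item
        self.count += 1
        self._check()

    def pop(self):
        assert self.count > 0, "underflow"
        item = self.buf[self.head]
        self.head = (self.head + 1) % len(self.buf)
        self.count -= 1
        self._check()
        return item
```

In production code the checks would typically be compiled in behind a flag, so the instrumented build is what the stochastic tests hammer on.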
Any personal "war stories" you are willing to share explaining how you went about designing such a system? :-) Or any presentation of someone who did that?
Google have a project to do fuzzing on Linux system calls using coverage feedback: https://github.com/google/syzkaller
It was very pleasant to work with such a system. Nowadays I would probably fuzz the patterns with AFL somehow.
Don't get me wrong, we should have more randomization, but it's not good everywhere, which might explain why we don't have as much of it.
generate a random seed, log it, then create an RNG using that "random, but recorded" seed. make sure all randomness used in the test flows from that explicitly-seeded RNG.
then, have an escape hatch where if a seed is provided as an environment variable, it will use that instead of generating one.
if you have a failure occur, you can always re-run with the same seed as a way to reproduce the failure (assuming it was indeed caused by that random seed and not some other factor)
depending on how fast the tests are, it may also be possible to run them multiple times with different seeds. for example, your on-every-commit CI run might run once with a hardcoded seed of 42. or it might run once with a hardcoded seed and once with a random seed.
and meanwhile, you might have a nightly test run that runs that same test suite 100 or 1000 times, with a different random seed each time.
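the recipe above fits in a few lines. a hedged sketch (the `TEST_SEED` variable name is my own choice, not a convention):

```python
import os
import random

def make_test_rng():
    """Create the test's single RNG from a logged seed, with an env override."""
    # Escape hatch: re-run with TEST_SEED=<n> to reproduce a failure.
    env = os.environ.get("TEST_SEED")
    seed = int(env) if env is not None else random.SystemRandom().randrange(2**32)
    print(f"test seed: {seed}")   # log it so any failure can be replayed
    return random.Random(seed)    # all test randomness flows from this RNG

rng = make_test_rng()
sample = [rng.randrange(100) for _ in range(5)]
```

the key property: two runs with the same `TEST_SEED` draw identical sequences, so a seed pulled from a failing run's log reproduces it exactly (modulo other nondeterminism).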
this is a little tricky to integrate into ci
I will note that Google have a programme for doing fuzz testing on open source projects using compute from their cloud: https://google.github.io/oss-fuzz/
maybe someday software people will listen
that would be a good day
1. correctness: from small unit tests to relatively complex integration tests. they typically populate a test database and query it via various interfaces, such as REST or the Postgres protocol. we use Azure Pipelines to execute them - testing on macOS, Linux (both Intel and ARM) and Windows.
2. performance: we tend to use the TSBS project for most of our performance testing and profiling. fun fact: we actually had to patch it as the vanilla TSBS was a bottleneck in some tests. Sadly, the PR with the improvements is still not merged: https://github.com/timescale/tsbs/pull/186
edit: I thought I would link some of the more interesting tests: Since QuestDB supports the Postgres wire protocol we have to gracefully handle even various half-broken Postgres clients. But how do you write a test mimicking a client that generates invalid requests? No sane client will generate broken requests. So we use Wireshark to record network communication between the broken client and our server and then use this recorded communication in tests. Example: https://github.com/questdb/questdb/blob/3995c31210c70664d4b3...
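The general record-and-replay pattern looks roughly like this. Everything here is a hypothetical stand-in (the hex capture, the `parse_startup` function, the length limits), not QuestDB code; the point is that bytes captured with Wireshark become a fixture the parser must reject gracefully instead of crashing on.

```python
# Hypothetical broken startup packet, as captured with Wireshark and
# stored in the test as a hex dump.
RECORDED = bytes.fromhex("0000000400d30b00")

def parse_startup(data):
    """Toy length-prefixed startup-message parser (Postgres-style framing)."""
    if len(data) < 8:
        return "error: short packet"
    length = int.from_bytes(data[:4], "big")
    if length < 8 or length > 10240:
        return "error: bad length"   # reject gracefully, don't crash
    return "ok"

result = parse_startup(RECORDED)
```

The test then asserts that the server reports an error and keeps serving other clients, which is exactly the behavior a crashing broken client would otherwise take down.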
I want to do fuzz testing on a library/framework written in C++. The actual target to test against is a simulator that takes input from a socket according to a network protocol. This simulator is built on both Linux and Windows.
What fuzzing frameworks would you recommend for this? There are quite a few, and it's not always easy for me (as a fuzzing beginner) to understand the differences between them. Some of them seem to be abandoned projects as well.