a) Logs that break the program's semantics when missing. That is, the program no longer works "as expected" for any directly or indirectly affected end user.
b) Logs that have no semantic impact when missing. No one will notice that they are missing unless there is some other problem. These logs are solely meant to help understand the program's operation.
Logs of type A should be tested like everything else. But logs of type B - I don't think I would test them.
While it is not nice to have a failure in production and only then find out that the logging doesn't work, I think it is even worse to break tests because of logging changes. Logging usually is so easy that it is an all-or-nothing thing. I'm not saying there are no advantages to testing these kinds of logs, but in my experience it is not worth the cost in 99% of cases.
Just wanted to throw this in to discourage ignoring logging tests completely. Of course, good static analysis, for example, should somewhat reduce the chance of surprises here.
c) Logs that, when present, "break" (using the term loosely) the application. The example here is an application logging sensitive data, e.g. PII.
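To make case (c) concrete, here is a rough sketch of a test that fails if an email address leaks into the log output. Everything in it is hypothetical: `register_user`, the `pii_demo` logger name, and the deliberately simple `EMAIL_RE` pattern are made up for illustration, and a real PII check would need much more than one regex.

```python
import io
import logging
import re

# Deliberately simple pattern, for illustration only.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def register_user(user_id: int, email: str, logger: logging.Logger) -> None:
    # Hypothetical code under test: it logs the opaque id, never the address.
    logger.info("registered user id=%s", user_id)


def captured_log_output(user_id: int, email: str) -> str:
    # Route the logger into an in-memory stream so the test can inspect it.
    stream = io.StringIO()
    handler = logging.StreamHandler(stream)
    logger = logging.getLogger("pii_demo")
    logger.setLevel(logging.INFO)
    logger.propagate = False
    logger.addHandler(handler)
    try:
        register_user(user_id, email, logger)
    finally:
        logger.removeHandler(handler)
    return stream.getvalue()


output = captured_log_output(42, "alice@example.com")
assert "registered user id=42" in output
assert EMAIL_RE.search(output) is None  # no address leaked into the log
```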
It's surprising that it gets labeled as "logs". Don't we test the expected behavior of the _code_ under test?
Be it business code or auxiliary code like logging, it's still code: some function, class, or facility... well, any code added for a purpose.
Thus, I would test such logging facility for correctness; otherwise, the next time its output is needed, I want to be sure that it can be trusted.
Of course, using ready-made logging frameworks may help shift such burden of testing onto the framework's developer.
> Logging usually is so easy that it is an all-or-nothing thing.
By all means, have the integration test that verifies that logs are produced at all (on each mechanism, if you support more than one). And it's a bonus if that test makes the deployment (or installation if that's your thing) fail.
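A minimal sketch of such a "are logs produced at all" check, using the standard library's `assertLogs`. The `start_service` function and the `smoke_demo` logger name are placeholders I made up; in a real integration test you would point this at each logging mechanism you actually ship.

```python
import logging
import unittest


def start_service(logger: logging.Logger) -> None:
    # Hypothetical stand-in for the application's startup path.
    logger.info("service started")


class LoggingSmokeTest(unittest.TestCase):
    def test_startup_emits_a_log_record(self):
        logger = logging.getLogger("smoke_demo")
        # assertLogs fails the test if nothing at INFO or above is logged.
        with self.assertLogs(logger, level="INFO") as captured:
            start_service(logger)
        self.assertGreaterEqual(len(captured.records), 1)


# Run the smoke test programmatically, as a deployment step might.
suite = unittest.TestLoader().loadTestsFromTestCase(LoggingSmokeTest)
result = unittest.TestResult()
suite.run(result)
assert result.testsRun == 1 and result.wasSuccessful()
```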
What's an example of this kind of logging? I can imagine perhaps legally-mandated audit logs for financial software, but that would be stretching the definition of "semantics" in my mind.
The build asserted that the number of failing unit tests was 0.
It turns out that "breaking the unit test library completely such that 0 tests run instead of all thousands of them" did not break the build.
When someone noticed a week later, that was not fun to clean up.
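The failure mode is easy to reproduce with Python's `unittest`: an empty run counts as "successful", so a build gate also has to check that something ran. This is only a sketch of the idea; `build_passes` and `ExampleTests` are hypothetical names, not anything from the build described above.

```python
import unittest


class ExampleTests(unittest.TestCase):
    def test_addition(self):
        self.assertEqual(1 + 1, 2)


def build_passes(suite: unittest.TestSuite) -> bool:
    result = unittest.TestResult()
    suite.run(result)
    # wasSuccessful() is True for an empty run (zero failures, zero errors),
    # so additionally require that at least one test actually executed.
    return result.wasSuccessful() and result.testsRun > 0


real_suite = unittest.TestLoader().loadTestsFromTestCase(ExampleTests)
empty_suite = unittest.TestSuite()  # what a broken test library might produce

assert build_passes(real_suite)
assert not build_passes(empty_suite)
```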
Tests for tests exist. One such technique is called "mutation testing" [1].
I don't know of any job that actually uses them, but they make sense to me: the tests need checks too, otherwise how do you know they are adding value? It's too easy to write worthless tests, see them pass, and get a false sense of security. Techniques such as mutation testing check how sensitive your tests are to actual bugs (or program changes): do they detect them? If not, they aren't useful.
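A hand-rolled illustration of the idea, assuming a toy `is_adult` function. Real mutation testing tools generate the mutants automatically; here the single mutant is written by hand, and all names are made up for the example.

```python
def is_adult(age: int) -> bool:
    return age >= 18


def is_adult_mutant(age: int) -> bool:
    # Mutation: ">=" was changed to ">", a typical boundary bug.
    return age > 18


def weak_test(fn) -> bool:
    # Only checks values far from the boundary.
    return fn(30) is True and fn(5) is False


def strong_test(fn) -> bool:
    # Also checks the boundary value itself.
    return weak_test(fn) and fn(18) is True


# Both tests pass on the original code.
assert weak_test(is_adult) and strong_test(is_adult)
# The weak test still passes on the mutant, so it would miss this bug.
assert weak_test(is_adult_mutant)
# The strong test "kills" the mutant, which is the signal you want.
assert not strong_test(is_adult_mutant)
```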
- Most test frameworks use unit tests to test themselves.
- Tests need to have value. The dev/team/organization needs to come up with a sensible measure of value that guides people on what to test and how to test it.
If your code is mostly passing data around and managing side effects, a unit test is more like a parasite. It will couple to your code and fail noisily when refactoring, but not provide any actually useful feedback.
No, I’m still going to sweep the floor, just not with a toothpick.
You need to test that you're using the contracts you've made, and integration tests verify that everything works as expected when both sides are complying with the contract.
I'm saying that if I've abstracted the filesystem out into an interface, the distinction between asserting that a method was called on that interface and asserting a side effect is somewhat trivial. I've made it so that side effects must be handled in integration tests.
But now you're enforcing a degree of coupling between your filesystem interface and your code. If you refactor the filesystem interface such that the overall logic hasn't changed, but the parameter order for one of the methods is updated, then you essentially double the amount of work you have to do (update the code and update the tests). If you just test the logic and not the interface, then you only have to update the tests that pertain to the logic itself.
You should read up on the concept of functional core/imperative shell.
While that's true, you don't want to end up in a situation where the test is enforcing coupling which makes it harder to refactor your code. So, for instance, if you have tests that assert whether the method tested calls another method with certain parameters in a certain order, then you're enforcing coupling between the two methods. If you refactor the code to have the first method return a result and then use that result to call the second method, then you also have to update the tests. But they shouldn't have to be updated since the overall logic hasn't changed. If, instead, you test the logic of each method individually, then you can refactor and not have to update the tests.
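A small sketch of what that looks like in the functional core/imperative shell style, assuming a made-up report writer. `render_report` and `write_report` are hypothetical names; the point is only that the logic test never mentions how the filesystem is called.

```python
import tempfile
from pathlib import Path


def render_report(totals: dict) -> str:
    # Functional core: pure logic, no I/O, trivially testable.
    lines = [f"{name}: {count}" for name, count in sorted(totals.items())]
    return "\n".join(lines) + "\n"


def write_report(totals: dict, path: Path) -> None:
    # Imperative shell: the only place that touches the filesystem.
    path.write_text(render_report(totals), encoding="utf-8")


# Unit test of the logic: no mocks, no assertions about call order.
assert render_report({"b": 2, "a": 1}) == "a: 1\nb: 2\n"

# Integration-style check of the shell against a real temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "report.txt"
    write_report({"a": 1}, target)
    assert target.read_text(encoding="utf-8") == "a: 1\n"
```

If `write_report` later changes how it talks to the filesystem, the first assertion is unaffected, because it only depends on the returned string.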
(I'm probably using incorrect terminology here, so happy if someone corrects me)
It's what I describe in "Returning a message list" in this post: https://henrikwarne.com/2020/07/23/good-logging/
I would say it's not great practice, but it can remove a lot of the contortions otherwise needed to make assertions about the execution path taken in a unit test.
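Roughly, the idea can be sketched like this: the function returns its result together with a list of messages describing the decisions it made, and the test asserts on that list. This is my own minimal reading of the approach, not code from the linked post; `apply_discount` and `checkout` are invented for illustration.

```python
import logging


def apply_discount(price_cents: int, is_member: bool):
    # Returns the new price plus messages describing the path taken.
    messages = []
    if is_member:
        price_cents = price_cents * 90 // 100
        messages.append("member discount applied")
    else:
        messages.append("no discount: not a member")
    return price_cents, messages


def checkout(price_cents: int, is_member: bool, logger: logging.Logger) -> int:
    # The caller decides what to do with the messages, e.g. log them.
    total, messages = apply_discount(price_cents, is_member)
    for message in messages:
        logger.info(message)
    return total


# The test asserts on the execution path without capturing any log output.
total, messages = apply_discount(10_000, True)
assert total == 9_000
assert messages == ["member discount applied"]
```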
If I have a tricky thing to test, I ask: if I added the required operational monitoring (e.g. queue sizes), would I suddenly be able to test this more easily? Would it make the test more self-explanatory? If the answer is yes, I might add the monitoring and then do the unit testing. If the answer is no, I have to refactor the implementation for testability.
Conclusion: exposing logs, and especially monitoring counters, to unit tests can supercharge your testing and leave you net positive.
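As a sketch of what I mean, take a bounded buffer whose drop behaviour is awkward to observe from the outside. Once it exposes the counters you would want in production anyway, the test becomes almost self-describing. `BoundedBuffer` and its counter names are hypothetical, made up for this example.

```python
from collections import deque


class BoundedBuffer:
    def __init__(self, capacity: int):
        self._items = deque()
        self._capacity = capacity
        # Monitoring counters, useful in production and in tests alike.
        self.accepted = 0
        self.dropped = 0

    def offer(self, item) -> bool:
        # Reject the item when the buffer is full and count the drop.
        if len(self._items) >= self._capacity:
            self.dropped += 1
            return False
        self._items.append(item)
        self.accepted += 1
        return True

    @property
    def queue_size(self) -> int:
        return len(self._items)


buf = BoundedBuffer(capacity=2)
for i in range(5):
    buf.offer(i)

assert buf.queue_size == 2
assert buf.accepted == 2
assert buf.dropped == 3
```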
When doing TDD there is no reason to skip over logging (although that does seem to happen, everyone likes print debugging ;-)). It's just that sometimes it's a bit clunky to do, which is why I wrote this tool. It supplies a secondary package that has FluentAssertions extensions to make testing even easier.
Code/packages can be found here: https://github.com/sandermvanvliet/SerilogSinksInMemory
Here are the coordinates in Gradle:
compile group: 'org.apache.logging.log4j', name: 'log4j-core', classifier: 'tests'