Your test case is more useless than a turd in the middle of the dining room table unless you put a comment in front of it that explains what it assumes, what it attempts, and what you expect to happen as a result.
Because if you just throw in some code, you're only giving the poor bastard investigating it two puzzles to debug instead of one.
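For illustration, here's a minimal Python sketch of such a header comment; the function and test are hypothetical, not from the original discussion:

```python
# Hypothetical function under test, for illustration only.
def normalize(scores):
    """Scale a list of positive numbers so they sum to 1."""
    total = sum(scores)
    return [s / total for s in scores]

def test_normalize_preserves_ratios():
    # Assumes:  input is a non-empty list of positive floats.
    # Attempts: normalizing [2.0, 6.0], which sums to 8.0.
    # Expects:  output sums to 1 and keeps the 1:3 ratio of the inputs.
    result = normalize([2.0, 6.0])
    assert abs(sum(result) - 1.0) < 1e-9
    assert abs(result[1] - 3 * result[0]) < 1e-9

test_normalize_preserves_ratios()
```

The three-line comment costs seconds to write and spares the next reader from reverse-engineering intent out of assertions.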
The obvious result of Goodhart's Law ensued, leading to test cases like you mention.
Lesson to leaders: Please stop your bad managers from pulling stupid crap like this. It wastes a lot more time in the longer run.
If you have to document your documentation, you might be missing something fundamental in how you are writing your first order documentation. Not to mention that in doing so you defeat the reason for writing your documentation in an executable form (to be able to automatically validate that the documentation is true).
Over time I'm inclined to value human-written documentation, especially when things involve integrations of multiple systems. I've had real cases where two parties point at code and say their code is correct, and in isolation the code does look correct. But when the time comes to integrate the systems, it breaks. If you have a human-readable document where intentions and expectations are specified, it's much easier to come to a common (working) solution.
Not all languages have the capability to express complex intentions, so code-as-documentation does not work most of the time.
Auto-generated API docs combined with handwritten documentation that covers what can't be expressed in code and includes some useful examples seems like the right approach to me. In practice that's the kind of doc I tend to have the best experience with. For example the Rust stdlib docs are auto-generated but the language also supports notes and (automatically unit-tested) examples in docstrings which means the API docs are filled with explanations & examples and mentions what assumptions are made about inputs.
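Python's doctest offers a similar mechanism to the Rust docstring examples mentioned above: examples live in the docstring and are executed as tests, so the prose is automatically checked for truth. A small sketch (the `clamp` function is made up for illustration):

```python
import doctest

def clamp(value, low, high):
    """Clamp value into the inclusive range [low, high].

    Assumes low <= high; behaviour is unspecified otherwise.

    >>> clamp(5, 0, 10)
    5
    >>> clamp(-3, 0, 10)
    0
    >>> clamp(42, 0, 10)
    10
    """
    return max(low, min(value, high))

# Running the embedded examples verifies the documentation is still true.
assert doctest.testmod().failed == 0
```

The "assumes low <= high" note is exactly the kind of thing that can't be expressed in the signature but belongs next to it.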
They almost convinced me somewhere in my career. But the hard truth I learnt is that most people are saying this because they aren’t capable of verbalizing what they are programming.
If your "code is doc", it should be extremely easy to add a little sentence above your method to explain what it does. And no, docs don't go stale. If your documentation no longer matches what your function does, it's probably because you should have written a brand-new function instead of changing an existing function's behavior.
And, as you note, when integrating systems you need more than just the code and comments, since the code might not even be written with the other system in mind.
It's not always feasible to document every little edge case in natural language and keep that in sync with your code. If you "document" edge cases as tests, they _have_ to be in sync with your code. It shouldn't replace traditional documentation, though, and is better suited to internal components than to a public API.
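As a sketch of what documenting edge cases as tests looks like (a hypothetical Python helper; the edge cases live as executable assertions rather than prose):

```python
def chunk(items, size):
    """Split items into consecutive chunks of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]

# Edge cases documented as tests: they cannot drift out of sync with the code.
def test_chunk_edge_cases():
    assert chunk([], 3) == []                     # empty input yields no chunks
    assert chunk([1, 2], 5) == [[1, 2]]           # size larger than input: one chunk
    assert chunk([1, 2, 3], 2) == [[1, 2], [3]]   # last chunk may be short

test_chunk_edge_cases()
```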
If the documentation can also be interpreted by machine to validate what it claims is true you have a nice side benefit, but not the reason for writing your documentation.
Now, you could also have a well organized test suite that goes from most obvious to most detailed, split into sections for each use-case, but this sounds a lot more tedious than "write a one-line comment describing the unit test".
No, the point of automated testing is to verify that what is under test behaves correctly and to be able to scale this verification cheaper than having humans do it. Documenting what it verifies and under what conditions is just a side effect.
A test must be reproducible. If it is not, it is not a test.
This is why I found Gherkin/Cucumber (and BDD in general) to be a total revelation when I first encountered it. No one should be writing tests any other way IMO.
The revelation of TDD, which was later rebranded as BDD to deal with the confusion that arose with other types of testing, was that if your documentation was also executable, the machine could be used to prove that the documentation is true. Gherkin/Cucumber specs themselves are not executable and require you to re-document the behavior in another language, with no facilities to ensure that the two stay consistent with each other.
If you are attentive enough to ensure that the documentation and the implementation are aligned, you may as well write it in plain English. It will give you all of the same benefits without the annoying syntax.
BDD is a QA concern, primarily used for QA tests against a written (BDD) requirement.
TDD is about unit testing, which is about testing the implementation BY developers FOR other developers.
TDD says nothing about the correctness of the software against a spec, only that a given implementation aligns with a developer's intention.
Then try to debug a "document"...
I like the idea. But having tried it at scale, it becomes a mess. Code I can understand. I can read English comments. I can't debug English.
We use Spock, which makes "comments" a very expected thing; that helps us ensure tests without comments don't pass code review.
Just use a tool that helps you and stop writing stupid tests whose impl code looks worse than the code being tested.
  "Average of list" should "be within range" in {
    forAll { (l: List[Float]) =>
      val avg = l.average
      assert(avg >= l.min && avg <= l.max)
    }
  }
This test will fail, since the property doesn't hold for e.g. empty lists. Requiring non-empty lists will still fail if we have awkward values like NaNs, etc. The following version has a better chance of passing:

  "Average of list" should "be within range" in {
    forAll { (raw: List[Float]) =>
      val l = raw.filter(n => !n.isNaN && !n.isInfinite)
      whenever (l.nonEmpty) {
        val avg = l.average
        assert(avg >= l.min && avg <= l.max)
      }
    }
  }
Getting this test to pass required us to make those assumptions explicit. Of course, it doesn't spot everything; here's an article which explores this example in more depth (in Python): https://hypothesis.works/articles/calculating-the-mean

  @Test
  public void myTestMethod_Scenario_ShouldReturnThis() { .... }
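The mean property discussed above is explored with Hypothesis in the linked article; as a rough stdlib-only approximation of the same check (my own generation strategy, restricted to integers-as-floats so the property actually holds, and without the input shrinking Hypothesis provides):

```python
import random

def average(xs):
    return sum(xs) / len(xs)

# The same assumptions the Scala version had to make explicit:
# non-empty lists of finite values.
random.seed(0)
for _ in range(1000):
    xs = [float(random.randint(-10**6, 10**6))
          for _ in range(random.randint(1, 50))]
    avg = average(xs)
    assert min(xs) <= avg <= max(xs), (xs, avg)
```

A real Hypothesis test would additionally shrink any failing list down to a minimal counterexample, which a hand-rolled loop like this doesn't do.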
It(“throws when the object belongs to another user”)
It(“does a business thing when thing is in state BLAH”)
I don't think it quite does it right, but it is of note.
(I would buy a Copilot subscription for this)
I admit to having been guilty of this myself. I have a famous anecdote-example: at a very well-paid contractor job, I explained something about how my then department's software worked to someone from another department. I must have sounded very convincing, because the person went off to change how they used our stuff. A few minutes later, after accidentally running into my boss and casually chatting with him, I realized everything I had said was total garbage. I quickly excused myself and hurried after the person to tell them to forget and ignore everything I had just explained, because it was all wrong. I think that last step is what usually doesn't happen in these cases, because we don't normally realize that such a thing just occurred.
The brain, or parts of it, is great at producing "explanations". I think one of the more established and reproducible results in psychology is that our brain first decides and acts, and only then produces some (often bullshit) "reason" when/if our conscious self asks for one. Does anybody remember if this is true, and have a link?
Relevant are Sperry & Gazzaniga's split brain experiments. Participants of these experiments had had their corpus callosum (one of the major "information" pathways between our brain's two halves) cut. This was an operation performed to keep epileptic seizures in check.
https://en.wikipedia.org/wiki/Split-brain
In these participants, specific brain "functions" such as speech were highly lateralized, meaning only one half of the brain was able to perform it to a satisfying degree.
Note that these were already not neuro-typical people prior to the experiments (given the regular, debilitating epileptic seizures), so reaching general conclusions from these experiments is hard.
Remember also that, like our brains, our bodies are highly lateralized: the right half of our brain controls the left side of our body, and the left half of the brain controls the right side. If you ever wanted proof against intelligent design, the way our brain connects to our eyes and body is one very strong argument.
Anyway, one experiment comes to mind where one half of the brain was instructed to perform some action (move the left arm, or something similar). Then the other half would be asked _why_ that arm had just been moved. It would confabulate, on the spot, totally legit-sounding but obviously bullshit reasoning. E.g. "I felt cold so I wanted to put on a coat", rather than "the experimenter instructed me to move it".
So, rather than claiming "I don't know", it would just make up a plausible reasoning. It is really unimaginable to _not_ know why you moved your arm..
And that's why we test and why tests shouldn't be allowed to fail.
Just because the scenarios described make testing hard does not change the reality of what makes tests valuable.
If pre-existing failures are halting the production pipeline and you don't like it, switch off trunk based development and see if you like the waits and constant rebasing in large projects/teams. But don't eff with the bloody tests!
At $dayjob this works well: if your CI comes up red with some unrelated test failing, you can mark the test as flakey in the UI and CI will allow your code to merge; a Jira ticket is created for the test owner to fix their test (and it's disabled for future test runs).
I think for small to medium projects, you can have all tests succeed but once the repo is large enough / has frequent enough changes, flakey tests are bound to slip in.
I've heard Google does something fancier where they take a test and run it a bunch of times after it fails, to check whether the failure is consistent.
I think the system at work only runs each test a couple of times before giving up and marking it failed.
This is pretty much the one feature that's nice; otherwise it's like a worse version of CircleCI/GitHub Actions.
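The rerun policy described above can be sketched roughly as follows (retry count and labels are made up for illustration): rerun a failing test a few times, treat consistent failure as real, and quarantine mixed results as flaky.

```python
def classify(run_test, retries=3):
    """Run a test several times and classify the overall result.

    run_test: a zero-argument callable returning True (pass) / False (fail).
    """
    outcomes = [run_test() for _ in range(retries)]
    if all(outcomes):
        return "pass"
    if not any(outcomes):
        return "fail"    # deterministic failure: block the merge
    return "flaky"       # mixed results: quarantine and file a ticket

assert classify(lambda: True) == "pass"
assert classify(lambda: False) == "fail"
```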
If testing that way is painful (and it is), then work with people to remove the pain. Tests are supposed to help developers, not constrain or punish them.
Put tests in the same repo as the SUT. Do more testing closer to the code (more service and component tests) and do less end-to-end testing. Ban "flakey" tests - they burn engineering time for questionable payoff.
Test failures can be thought of as "things developers should investigate." Make sure the tests are focused on telling you about those things as fast as possible.
Also, take the human out of the "wait for green, then submit PR" steps. Open a PR but don't alert everyone else about it until you run green, maybe?
The problem becomes: I want to know if there are significant regressions in the vendor tests, i.e. tests that were green for a long time and suddenly changed. You could flag any test that became green at some point as "required" to pass CI, but then you have tests that randomly succeed or fail depending on code you have not yet written (e.g. locking around concurrent structures). Marking these tests manually is impractical and could definitely be replaced by tooling that supports some statistical modeling of success/failure.
You may have the best testing strategy for internal code but as long as you have to test against these conformance tests it's simply unfeasible to say "sorry, only green allowed".
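The statistical flagging suggested above might look something like this sketch (thresholds are invented for illustration): a test whose historical pass rate was high but whose recent runs mostly fail is a likely regression, while a test that has always flapped is left alone.

```python
def is_regression(history, recent, min_pass_rate=0.95, max_recent_rate=0.5):
    """history/recent: lists of booleans (True = pass) for past and recent runs."""
    if not history or not recent:
        return False
    historical = sum(history) / len(history)
    current = sum(recent) / len(recent)
    # Was reliably green before, mostly red now -> flag it.
    return historical >= min_pass_rate and current <= max_recent_rate

# A long-stable test that suddenly fails is flagged...
assert is_regression([True] * 50, [False, False, False])
# ...while a test that has always been flaky is not.
assert not is_regression([True, False] * 25, [False, False, False])
```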
It'd be great if GitHub could open a PR for reviews (aka un-draft) automatically after CI succeeds. (If not in the core product, is there a bot that does that?)
The reviewer then does a git fetch, and then checks out the newly created rr/ branch. They make any small changes that aren't worth a roundtrip and push them to the rr branch. They add FIXME comments for bigger changes. They then either assign the ticket back to the developer, or go ahead and merge straight into their own dev branch. Once an rr branch is merged it's simply deleted. The dev branch is then pushed and CI will merge it to that user's master when it's green.
IntelliJ will show branches in each origin organized by "folder" if you use slashes in branch names, and gitolite (which is what we use to run our repos) can impose ACLs by branch name too. So for example only user alice can push to a branch named rr/alice/whatever in each person's repo. That ensures it's always clear where a PR/RR is coming from.
Because each user gets their own git repo and cloned set of individual CI builds, you can push experimental or WIP branches to your personal area and iterate there without bothering other people.
This workflow gets rid of things like draft PRs (which are a contradiction), it ensures each reviewer has a personal review queue, it means work and progress is tracked via the bug tracker (which understands commands in commit messages so you can mark bugs as fixed when they clear CI automatically) and it eliminates the practice of requesting dozens of tiny changes that'd be faster for the reviewer to apply themselves, because reviewer and task owner can trade commits on the rr branch using git's features to keep it all organized and mergeable.
Test starts failing, immediately send a report with the failing input, then continue with the test case minimisation and send another report when that finishes.
Concurrently, start up another long running process to look for other failures, skipping the input that caused the previous failure. We do want new inputs for the same failure though. This is the tricky one. We could probably make it work by having the prop test framework not reuse previously-failing inputs, but that’s one of the big strategies it uses to catch regressions.
[1] specifically, hypothesis on python
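The skip-known-failures loop described above might be sketched like this (a toy harness, not the actual framework API; note that it reports each failure immediately but omits the test-case minimisation step):

```python
def fuzz(gen_input, run, max_runs=1000):
    """Keep generating inputs; report every new failing input immediately,
    then skip it in later runs so the search finds *other* failures."""
    known_failures = set()
    reports = []
    for _ in range(max_runs):
        x = gen_input()
        if x in known_failures:
            continue  # already reported; we want new failures
        if not run(x):
            reports.append(x)        # immediate report with the raw input
            known_failures.add(x)    # minimisation would run concurrently
    return reports

# Toy demo: the "property" fails on even numbers; each is reported once.
inputs = iter(list(range(10)) * 5)
assert fuzz(lambda: next(inputs), lambda x: x % 2 == 1, max_runs=50) == [0, 2, 4, 6, 8]
```

As the parent notes, the hard part is that deduplicating by input conflicts with the framework's strategy of replaying previously-failing inputs to catch regressions.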
I once witnessed a team creating an app, its specs and its tests in three respective repositories, for no other reason than "each project should be in its own repository".
The added work/maintenance around that is crazy, for absolutely no gain in that case.
Phase 1. Code and test basic functions concerning any kind of arithmetic, mathematical distribution, state machines, file operations and datetimes. This documents any assumptions and makes a solid foundation.
Phase 2. Write a simulation for generating randomized inputs to test the whole system. Run it for hours. If I can't generate the inputs, find as big a variety of inputs as possible. Collect any bugs, fix, repeat. This reduces the chances of finding real time bugs by three orders of magnitude.
This has worked really well in the past, whether I'm working on games, parsers or financial software. I don't conform to corporate whatever-driven testing patterns because they are usually missing the crucial phase 2 and time phase 1 incorrectly.
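As an illustration of phase 2, here's a toy randomized round-trip loop in Python; the encoder/decoder pair is hypothetical, and the real "system" is whatever your project actually does:

```python
import random

# Hypothetical system under test: a trivial field encoder/decoder.
def encode(fields):
    return ",".join(fields)

def decode(text):
    return text.split(",") if text else []

# Phase 2: hammer the whole thing with randomized inputs, collect any bugs.
random.seed(1)
failures = []
for _ in range(10_000):
    fields = ["".join(random.choices("abc", k=random.randint(1, 5)))
              for _ in range(random.randint(0, 8))]
    if decode(encode(fields)) != fields:
        failures.append(fields)  # collect bugs, fix, repeat

assert failures == []
```

The generator deliberately avoids commas in field values; widening it to arbitrary strings is exactly the kind of step that flushes out the next batch of bugs.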
And the answer is pretty simple: pin the specific test repo version! Use lockfiles, or git submodules, or put "cd tests && git checkout 3e524575cc61" in your CI config file _and keep it in the same repo as source code_ (that part is very important!).
This solves all of author problems:
> new test case is added to the conformance test suite, but that test happens to fail. Suddenly nobody can submit any changes anymore.
Conformance test suite is pinned, so new test is not used. A separate PR has to update conformance test suite version/revision, and it must go through regular driver PR process and therefore must pass. Practically, this is a PR with 2 changes: update pin and disable new test.
> are you going to remember to update that exclusion list?
That's why you use an "expect fail" list (not an exclusion list) and keep it in the driver's dir. As you submit your PR you might see a failure saying: "congrats, test X which was expect-fail is now passing! Please remove it from the list". You'll need to make one more PR revision, but then you get working tests.
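The expect-fail mechanism described could be sketched like this (names are illustrative): a test moving from fail to pass is itself reported, prompting you to remove it from the list.

```python
# Expect-fail list, kept in the driver's directory alongside the code.
EXPECTED_FAILURES = {"conformance/test_new_feature"}

def evaluate(name, passed):
    expected_fail = name in EXPECTED_FAILURES
    if passed and expected_fail:
        return "unexpected-pass"   # "congrats... remove it from the list"
    if not passed and not expected_fail:
        return "fail"              # a real regression, blocks the PR
    return "ok"                    # pass, or a known expected failure

assert evaluate("conformance/test_new_feature", passed=True) == "unexpected-pass"
assert evaluate("conformance/test_old", passed=False) == "fail"
```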
> allowing tests to be marked as "expected to fail". But they typically also assume that the TB can be changed in lockstep with the SUT and fall on their face when that isn't the case.
And if your TB cannot be changed in lockstep with SUT, you are going to have truly miserable time. You cannot even reproduce the problems of the past! So make sure your kernel is known or at least recorded, repos are pinned. Ideally the whole machine image, with packages and all is archived somehow -- maybe via docker or raw disk image or some sort of ostree system.
> Problem #2 is that good test coverage means that tests take a very long time to run.
The described system sounds very nice, and I would love to have something like this. I suspect it will be non-trivial to get working, however. But meanwhile, there is a manual solution: have more than one test suite. "Pre-merge" tests run before each merge and contain small subset of testing. A bigger "continuous" test suite (if you use physical machines) or "every X hours" (if you use some sort of auto-scaling cloud) will run a bigger set of tests, and can be triggered manually on PRs if a developer suspects the PR is especially risky.
You can even have multiple levels (pre-merge, once per hour, 4 times per day), but this is often more trouble than it's worth.
And of course it is absolutely critical to have reproducible tests first -- if you come up to work and find a bunch of continuous failures, you want to be able to re-run with extra debugging or bisect what happened.
Indeed. Where I work we have a bunch of repos, but they always reference each other via pinned commits. We happen to use Nix, with its built in 'fetchGit' function; it's also easy to override any of these dependencies with a different revision. For example:
  { helpers ? import (fetchGit {
      url = "git://url-of-helpers.git";
      ref = "master";
      rev = "11111";
    })
  , some-library ? import (fetchGit {
      url = "git://url-of-some-library.git";
      ref = "master";
      rev = "22222";
    }) {}
  }:

  helpers.build-a-service {
    name = "my-service";
    src = ./src;
    deps = { inherit some-library; };
  }

This is a function taking two arguments ('helpers' and 'some-library'), with default arguments that fetch particular git commits. This gives us the option of calling the function with different values, e.g. to build against different commits.

We run our CI on GitHub Actions, which allows some jobs to be marked as 'required' for PRs (using branch protection rules). The normal build/test jobs use the default arguments and are marked as required: everything is pinned, so there should be no unexpected breakages.
Some of our libraries also define extra CI jobs, which are not marked as required. Those fetch the latest revision of various downstream projects which are known to use that library, and override the relevant argument with themselves. For example, the 'some-library' repo might have a test like this:
  import (fetchGit {
    url = "git://url-of-some-library.git";
    ref = "master";
    # No 'rev' given, so it will fetch 'HEAD'
  }) {
    # Build with this checkout of some-library, instead of the pinned version
    some-library = import ./. {};
  }

This lets us know if our PR would break downstream projects, were they to subsequently update their pinned dependencies (either because we've broken the library, or because the downstream project is buggy). It's useful for spotting problems early, regardless of whether the root cause is upstream or downstream.

Those tests should be as small as possible, to verify that everything is still wired together correctly.
Everything else should be either unit tests or narrow integration tests between a small handful of components. And as you said, they should live in the repository of the software they test.
Even if you do have external tests, you still need internal ones for the surface area your external tests don't check for. Unit tests and such don't make sense at all combined with a separate test repo.