It's built by the creators of Puppeteer, which came out of the Chrome team. Some things I like about it:
1. It's reliable and implements auto-waiting as described in the article. You can use modern async/await syntax, and it ensures elements are attached to the DOM, visible, stable (not animating), able to receive events, and enabled (a minimal sketch follows this list): https://playwright.dev/docs/actionability
2. It's fast: it spawns multiple worker processes and runs tests in parallel, unlike e.g. Cypress.
3. It's cross-browser: it supports Chromium, Firefox, and WebKit (the engine behind Safari) out of the box.
4. The tracing tools are incredible: you can step through the entire test execution, get a live DOM snapshot that you can inspect with your browser's existing developer tools, see all console.logs, and so on.
5. The developers and community are incredibly responsive. This is one of the biggest ones: issues are quickly responded to and addressed, often by the founders; pull requests are welcomed; and the Slack is highly active and respectful.
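To make points 1 and 2 concrete, here's a minimal sketch of a Playwright test; the URL, selectors, and expected text are hypothetical, but the auto-waiting behavior is built into every action:

```ts
// checkout.spec.ts: a minimal Playwright test. The URL, selectors, and
// expected text are made-up examples.
import { test, expect } from '@playwright/test';

test('completes a checkout', async ({ page }) => {
  await page.goto('https://example.com/checkout');

  // click() auto-waits until the button is attached to the DOM, visible,
  // stable, enabled, and able to receive events; no manual sleeps needed.
  await page.getByRole('button', { name: 'Pay now' }).click();

  // Web-first assertions retry until the condition holds or the test
  // times out.
  await expect(page.getByText('Order confirmed')).toBeVisible();
});
```

Parallelism (point 2) is controlled via the `workers` and `fullyParallel` options in playwright.config.ts, and traces (point 4) can be recorded with `npx playwright test --trace on` and opened with `npx playwright show-trace`.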
My prior experience with end-to-end tests was that they were highly buggy and unreliable, so Playwright was a welcome surprise and inspired me to fully test every variation of our checkout flow.
Cypress was a surprisingly nice experience as well and led me to research other modern e2e tools. Most of the points above can be compared against Cypress: Playwright supports parallel execution of tests within the same file on the same machine, which Cypress doesn't, so it is much faster. Cypress doesn't use modern async/await syntax. And due to its architecture, Playwright can test across multiple tabs and work with iframes easily, which Cypress can't.
The UI for Cypress's developer tools is nice, but... as I said, Playwright's tracing UI is really excellent, and the documentation is also really well done. This is also a personal thing, but I trust tools that came out of browser teams (Chrome) to emulate browsers in a more efficient way, e.g. spinning up cheap, isolated browser contexts in Chrome, getting the details of waiting for an element to be ready right, etc.
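As an illustration of those cheap contexts and the cross-tab support mentioned above, here's a sketch using Playwright's library API; the URLs and link name are invented:

```ts
// Isolated browser contexts and cross-tab testing in Playwright.
// URLs and the link name are made-up examples.
import { chromium } from 'playwright';

(async () => {
  const browser = await chromium.launch();

  // Each context is an isolated, incognito-like profile (own cookies,
  // storage, cache) and is far cheaper to create than a new browser.
  const userA = await browser.newContext();
  const userB = await browser.newContext();
  const pageA = await userA.newPage();
  const pageB = await userB.newPage();
  await pageA.goto('https://example.com/login');
  await pageB.goto('https://example.com/login');

  // Cross-tab: capture a popup opened by the page under test.
  const [popup] = await Promise.all([
    pageA.waitForEvent('popup'),
    pageA.getByRole('link', { name: 'Terms' }).click(),
  ]);
  console.log(await popup.title());

  await browser.close();
})();
```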
Another post on this: https://alisterbscott.com/2021/10/27/five-reasons-why-playwr...
Are you discovering these for the first time? Great, happy that you are getting exposed to them! If you read these and think, "we could utilize these concepts with our engineers (test or not)," I would encourage you to look at it from an organizational perspective. You may want to add someone with these skill sets to your team(s). Most automation testers understand these concepts well and can help you with the next-level maturity items.
If you have one or more integrations to external systems where you cannot control your test data, it becomes much harder to write stable E2E tests.
Some don't have test environments, some have too few. Most don't allow you to set up data easily either way.
You can, of course, mock the external systems, but if they play a large enough part, your tests start looking more like integration tests again, but with the added overhead of something like browser automation.
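For reference, the network-level stubbing itself is straightforward in Playwright; this sketch fakes a third-party payments API (the endpoint and payload are invented):

```ts
import { test } from '@playwright/test';

test('checkout with a stubbed payment provider', async ({ page }) => {
  // Intercept calls to the (invented) external payments host and answer
  // with canned data, so the test no longer depends on external test data.
  await page.route('**/payments.example.com/**', (route) =>
    route.fulfill({
      status: 200,
      contentType: 'application/json',
      body: JSON.stringify({ status: 'approved', id: 'fake-123' }),
    })
  );

  await page.goto('https://example.com/checkout');
  // ...drive the UI as usual; requests to the payment host now hit the stub.
});
```

The tradeoff described above still applies, though: the more of the system you stub this way, the less end-to-end the test really is.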
It's a hard balance to strike.
I suspect this is a good idea, but it raises some red flags for me. People may not want to fix tests if they don't feel they have time, or if they don't believe fixing tests will help their promotion (i.e. a culture problem). Of course, if you have a good engineering culture, this is probably a useful signal for which tests to remove.
One pattern that we can apply to increase visibility and ownership is stability metrics. Whether a test must or merely should be fixed can be teased out once you can view these metrics. On failure, display how the test has performed in this configuration over the past x runs:
- Passed the last 100 runs? High likelihood the test is highlighting a real bug and must be engaged on.
- 95% pass rate in the last 100 runs? It may be time to quarantine the test and add it to the remediation backlog.
Your acceptable false-positive rate may differ depending on team velocity and suite runtimes.
"How many tests are in quarantine, what is the average time-to-fix, and what direction is this trending" are valuable metrics that we can utilize to find ownership and highlight the technical debt.
As you said, culture around such patterns isn't always there.
I'm a solopreneur building an app with Flutter. Flutter's testing support is mostly broken and/or unwritten. It's very frustrating.
A lot of the reliability in that system came from being able to quickly iterate on different levels of the system. The easier it is to solve a failure where it's happening, the more likely bugs get fixed quickly, and the healthier the system stays (as opposed to suffering from entropy and tech debt).
Especially in mobile/web applications, where you are often consuming loads of services/libraries/SDKs (some in-house, some external), you are often running only a tiny amount of your own code. Adding tonnes of unit tests to that is sort of missing the big picture: you need to test that it all works together, as a user would experience it.
On the spectrum from isolated to realistic, unit tests fall on the far left; workload tests/E2E tests/testing-in-production fall on the far right.
It turns out that there's no 'wrong' level; there are just different tradeoffs. I've worked at a lot of companies that embraced the realism of E2E tests, but then suffered from the maintenance/performance/diagnosability/instability problems of those tests. I have colleagues who worked at places that avoided E2E at all costs and suffered because they would have a green test run while user scenarios that a simple E2E test would have caught were completely broken.
IMO there is a lot that can be done to improve E2E testing at most companies, but E2E tests definitely have the capacity to add value to your release/testing pipeline.