Ask HN: How do large engineering teams manage application exceptions?

mickeyben on Hacker News

There are great services, such as Rollbar and Bugsnag, that help us find and sort our application exceptions. When our engineering team was small, exceptions were very easy to manage. Determining an exception's priority was simple: Is it affecting customers? Which flow is it affecting? How many times has it occurred? Once you have answers to those three questions, you can quickly triage the exception.
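The three triage questions above can be sketched as a small, explicit priority function. This is a hypothetical illustration, not Rollbar's or Bugsnag's API: the `TrackedException` fields, the `CRITICAL_FLOWS` set, and the occurrence thresholds are all assumptions chosen for the example.

```python
from dataclasses import dataclass

# Hypothetical set of business-critical flows; adjust to your product.
CRITICAL_FLOWS = {"checkout", "signup", "login"}


@dataclass
class TrackedException:
    """A grouped exception as an error tracker might report it (assumed shape)."""
    name: str
    affects_customers: bool  # Is it affecting customers?
    flow: str                # Which flow is it affecting?
    occurrences: int         # How many times has it occurred?


def priority(exc: TrackedException) -> int:
    """Return a priority from 0 (ignore for now) to 3 (fix immediately)."""
    if not exc.affects_customers:
        # Internal-only noise: never urgent, but surface it if it is very frequent.
        return 1 if exc.occurrences >= 1000 else 0
    if exc.flow in CRITICAL_FLOWS:
        return 3
    return 2 if exc.occurrences >= 100 else 1


def triage(exceptions: list[TrackedException]) -> list[TrackedException]:
    """Sort by priority, then by occurrence count, highest first."""
    return sorted(exceptions, key=lambda e: (priority(e), e.occurrences), reverse=True)


if __name__ == "__main__":
    queue = triage([
        TrackedException("TimeoutError", False, "cron", 5000),
        TrackedException("KeyError", True, "checkout", 3),
        TrackedException("ValueError", True, "profile", 250),
    ])
    for exc in queue:
        print(priority(exc), exc.name, exc.occurrences)
```

The thresholds (100 and 1000 occurrences) are placeholders; the point of the sketch is that each of the three questions becomes a written, reviewable rule instead of a judgment call made under time pressure.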

Over time, the number of exceptions in our infrastructure grew, as did the number of third-party dependencies, and traffic to our website increased tenfold. As a result, the sheer volume of exceptions has made it difficult to separate the signal from the noise.

My question is: how do other engineering teams manage their exceptions? I don't want to ignore exceptions that could be real issues we should be fixing, but I also don't want to litter our code with logic that does nothing except prevent harmless, non-user-generated exceptions, as that would add complexity to our codebase.

We recently tried a new approach based on a spreadsheet and a few lines of Google Apps Script. Basically, every week we elect a “Bug Master” whose responsibility is to sort the bugs. This worked OK for a while, but engineers aren't all equally involved and aren't always available: they might be rushing to release something, stuck in meetings, etc. In the meantime, exceptions stack up, potentially including important ones.
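A weekly “Bug Master” rotation can also be computed deterministically rather than elected by hand, so everyone knows in advance whose week it is. This is a minimal sketch under stated assumptions, not the spreadsheet and Apps Script setup described above: the `ROSTER` names, the `EPOCH` start date, and the `skip` set for unavailable engineers are all hypothetical.

```python
import datetime

# Hypothetical roster; in practice it might be read from the team's spreadsheet.
ROSTER = ["alice", "bob", "carol", "dave"]

# Any Monday works as the rotation's starting point; 2024-01-01 was a Monday.
EPOCH = datetime.date(2024, 1, 1)


def bug_master(day: datetime.date, roster: list[str], skip: frozenset = frozenset()) -> str:
    """Return the Bug Master for the Monday-to-Sunday week containing `day`.

    Week N after EPOCH maps to roster[N % len(roster)]; anyone in `skip`
    (e.g. on vacation or busy with a release) is passed over for the next person.
    """
    week = (day - EPOCH).days // 7
    for offset in range(len(roster)):
        candidate = roster[(week + offset) % len(roster)]
        if candidate not in skip:
            return candidate
    raise ValueError("everyone on the roster is unavailable this week")


if __name__ == "__main__":
    today = datetime.date(2024, 1, 10)  # a Wednesday, one week after EPOCH
    print(bug_master(today, ROSTER))                           # prints bob
    print(bug_master(today, ROSTER, skip=frozenset({"bob"})))  # prints carol
```

The fallback only helps when someone declares they are unavailable up front; it does not address an assigned engineer who is nominally available but too busy to sort the bugs.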

Any insights would be greatly appreciated, as we're trying to find a more sustainable and scalable way for our team to handle exceptions.

7 points by mickeyben, 11 years ago, 5 comments
