The Five Levels of Logging (opens in new tab)

(aib42.net)

104 pointsdetay7y ago59 comments

59 comments

55 comments · 19 top-level

sagichmal7y ago· 9 in thread

Counterpoint: https://dave.cheney.net/2015/11/05/lets-talk-about-logging

tl;dr: Only Info and Debug levels carry their weight.

This works at a very, very small scale where you are reading your logs by hand. At this point, you don't even need log levels.

In the real world, you're not going to read your logs unless you're trying to investigate an issue. Instead, you're going to monitor your logs. And this kind of monitoring is where WARN and ERROR become helpful.

WARNs are useful because if you run into a WARN-level issue every now and again, it's no big deal, but if you run into a ton of them, that might indicate a problem that's worth fixing. A single WARN log isn't that useful, but a statistically significant number of WARN logs is a monitorable metric. Being able to detect and address issues after they're serious enough to cause elevated WARNs but before they're serious enough to cause customer impact is worthwhile.

ERRORs are useful because then you have a one-size-fits-all way of monitoring for error conditions, if for example, it's an offline job that doesn't result in a 5xx response. You should still monitor for 5xx responses, pipeline job failures, non-zero exit codes from processes, or whatever your application-level "this didn't end well" signal is. But ERROR logging is also a good metric to have if you want to have multiple log-based metrics all in one place.

sagichmal7y ago

If you're parsing your logs with a machine for triage like you describe, you should be doing structured logging, not string logging.

philwelch7y ago

And the structure will include a log level to enable this kind of classification.

1 more reply

Rapzid7y ago

We should all be doing structured logging always in 2018 IMHO. The future is now.

nicolaslem7y ago

The author argues that the error level should not exist because according to him, an error can either be handled or bubble up and logging an error is in a way handling it. If an error is handled, it is not an error anymore.

I think he is overthinking it a bit. Yes there should be very little places where an error is actually logged, yet is does not mean that it should not exist.

In web applications you very often have one instance of:

    try:
        return handle_request()
    except Exception:
        logger.exception('Something bad happened')
        return 500

Sure you can argue that as the client received a 500 it is not an error anymore, but this doesn't help you notice and fix the problem.

sagichmal7y ago

Any exception that bubbles all the way up to a request-scoped catch block indicates a bug in your handler logic. It should probably crash your program, or at least be reported as a structured event to your exception tracking system. In either case, logger.Info is sufficient to record this event in the application logs.

nicolaslem7y ago

Why should exception tracking and logging be two separated things? They perform the same job, an exception tracking system is just another way to present log records.

The real problem is that people still associate logs with lines of text in a file, whereas most logging frameworks create full objects containing a ton of information[0]. Once you stop considering logs as text, exception tracking and logging really become the same thing.

[0] https://docs.python.org/3/library/logging.html#logrecord-att...

2 more replies

helper7y ago

We use 3 log levels:

- Debug: should not show up in normal logs but you can turn it on down to a specific log message in a running process without restarting. We use structured logging so we have the additional ability to turn on debug messages down to a specific key=value pair (or set of key=value pairs).

- Info: An interesting event occurred that someone might care about. This could be an error that we were able to handle or just some important event.

- Alert: An error case that should not happen has occurred and someone needs to look into it. In addition to logging the event, an external system should be notified (e.g. create a ticket, page someone, send an email, etc).

maxxxxx7y ago

"- Debug: should not show up in normal logs but you can turn it on down to a specific log message in a running process without restarting. We use structured logging so we have the additional ability to turn on debug messages down to a specific key=value pair (or set of key=value pairs)."

I always turn on the highest log level unless there is a very strong performance reason against doing so. I have been able to resolve a lot of issues over the years because I had good logging when something went wrong.

I don't really see alerts as logging. They are an integral part of the application's behavior.

ath07y ago· 5 in thread

I support these 5 levels of logging, but given the discussion below I also would like to highlight a lesson from an SRE talk I once attended[citation needed] that really there are only three levels of alert:

1. DEBUG - Look at this when debugging, otherwise no one will ever see it. 2. TICKET - Open a ticket and have someone on your team look at this when it gets to the top of your queue. 3. PAGER - Drop what you're doing and handle this, now.

Of course, you want to track INFO and ERROR type messages because a sufficient number of them might cause someone to raise a ticket... but at scale, you probably should've built that monitoring already, and just drop INFO/ERROR down to DEBUG.

pfranz7y ago

It's nice to have articles like this to give structure to the conversation, but there's no single solution. Python has default levels of; debug, info, warning, error, and critical, but doesn't really give any guidance to what each means. At one company we tried to create some structure, but 3rd party libraries would flood messages at the "wrong" logging level.

I like these 5-levels and the 3-levels you talk about seem like a paired down version that reenforces what each level means.

To illustrate the "no single solution," we tried to use logging for command line output, neither of those systems really has a level for that. We probably should have created a new level between INFO and WARNING because of chatty libraries.

In practice I've found the DEBUG and TRACE are usually best added to code when troubleshooting. When you're writing the code you don't have a good idea what info you'd need when troubleshooting, so it's overly verbose but still missing what you'd want. When trying to stash all logging either locally or centrally (because adding logging and re-running isn't always an option) it can affect performance.

jschwartzi7y ago

I think a good log system should give you the ability to store everything but filter quickly. So for example I'll set up my systems to store everything down to LOG_DEBUG because I might want that information. But when I'm actually looking at the logs I like being able to exclude the DEBUG messages and look at the general behavior, then drill down when I think I understand what the sequence of events is. Good log systems should support this without making me filter as I'm storing the data.

Also in UNIX systems I wish there was a way to generically pipe the output of a command to the system log while preserving the return code in case the command exits. In BASH you can configure the pipe behavior to propagate errors to the calling shell, but in normal shells the last return code is the one that gets stored, which is not usually what you want. It would be really cool if the logger utility captured return codes from upstream and returned them so that I can use it in the antiquated SystemV init scripts I have to maintain.

mixmastamyk7y ago

That may work for an application, but there are many errors that occur while using a library that are expected/handled etc by the application. Perhaps separate logs can help.

yjftsjthsd-h7y ago

There are also messages which should not trigger a page themselves but will provide useful context when responding to one.

solipsism7y ago

That's DEBUG

jdmichal7y ago· 5 in thread

I have always used them as follows. Deciding between DEBUG and INFO is the most difficult. INFO, WARN, and ERROR are simple decisions.

DEBUG: Information which is relevant for tracing activity and data through the program.

INFO: Important workflow decisions made within the normal operation of the program. Allows you to tell what path a particular operation flowed down, but not with tracing every activity.

WARN: Non-fatal errors which do not prevent the operation from completing. Example: Failing to close a file after having read from it.

ERROR: Fatal errors which prevent the operation from completing. Example: Failing to open the file or reading from it.

delinka7y ago

What about FATAL errors which prevent the entire program from continuing?

jdmichal7y ago

Error and fatal are the same thing. If an operation fails and cannot be recovered, that's an error. There is no meaningful difference between an operation failing and the entire program failing. Or, from another perspective, keeping the program running is just another operation that can have fatal or non-fatal errors.

KayEss7y ago

That depends on the nature of the program. We use error and critical for server software. If there is a protocol violation on a particular connection, log an error and drop the connection, but otherwise continues. If the `accept` system call fails that's critical and the program shuts down. Depending on what we're doing there may be other critical operations, but some software doesn't have any.

derefr7y ago

Picture an Erlang web-application server.

A request that causes a 500 error response to the client, should also produce an Error log output on the backend. The request crashed. But, because this is an actor-based system, that crash didn't affect anything other than the request. It was Fatal to that interaction, but not to the system.

A request that causes the entire server process to spontaneously halt, should generate a Fatal log message. That is never supposed to be something that happens. The system is built, on a number of levels, to ensure that it doesn't happen. But it'd be really great, if it does happen, to have enough information to know why.

Ideally, a Fatal log message is accompanied by a crash dump.

ryancouto7y ago

i think OP is implying that error level and fatal level should be regarded as the same

cwingrav7y ago· 4 in thread

I'm going to weigh in on my opinionated library for logging: [woveon-logger](https://www.npmjs.com/package/woveon-logger) (js). Sorry if that's a no-no, but nobody seems to use my much loved library anyway.

The multiple levels are nice, especially for logging errors in production and tracking general information of your software. But in development, logging is often about tracking a particular aspect in your code. You need something that logs to a particular aspect that you are working on, and then can turn off that aspect once you move on. In the future, you often find yourself back to debug and need to flip that aspect back on again.

Summary: Log levels of info, warn and error all have a place. Logging to aspects is valuable during development, easy to turn off for production and useful when returning to debug.

hinkley7y ago

Logging is one of the weirder cross cutting concerns, for many reasons but especially including the one you mention.

We had to roll back an upgrade last month because one nonfatal (but still pretty bad) problem was generating a ridiculous amount of log entries. We had no way to turn it off or throttle it, so we rolled back.

I have fantasized for years about a programming language with a small runtime that supports instrumentation of the code, not just for debugging and profiling, but for logging as well. Logging becomes a set of conditional breakpoints stored with or at least near the code and you can tweak them individually.

Something akin to applying the 80/20 rule to Aspect Oriented Programming.

We fake this by implementing logging frameworks that try to short circuit out cheaply. But it really only pays off if you can marshal the arguments to the logger very cheaply, and you still have problems with third party code, which can't know that you've got workarounds for problems they consider to be fatal, and are therefore so noisy at the WARN or INFO log levels that you can't see your own log entries (Apache Foundation, I'm looking at you).

Rapzid7y ago

Check out log points: https://code.visualstudio.com/docs/editor/debugging#_logpoin...

hinkley7y ago

It's buried but my IDE has a similar feature.

I meant support for running systems, not local development.

1 more reply

Karrot_Kream7y ago

Wouldn't this be the same as naming loggers? Many logging frameworks allow you to name a logger and just reference that. From there, you can enable or disable named loggers, and get this same "apect" that you talk about.

testplzignore7y ago· 3 in thread

Info: What is the program up to. Occasional messages that indicate it is alive and doing what is expected.

Warn: Problems that you know about and wish would not happen, but do anyways. A service call has failed, unexpected input that violates the contract of the method, etc. You may be handling these conditions gracefully and/or returning an error to the caller, but still want to know when they happen. Usually you should have a ticket open in your bug tracker for these.

Error: Problems that you do not know about. When an error is logged, a developer should look at it. Some kind of action should be taken immediately. Put the fire out if it is an emergency. If not an emergency, create a ticket in your bug tracker, then change the level to warn.

I keep warn and info separate so that I can easily grep/filter on one. I often want to just see info messages to know what the status of things are.

jandrese7y ago

IMHO the difference between Warn and Error is that Warn is recoverable while Error is not.

hinkley7y ago

In an enterprise setting, treating one level of logs as actionable, as GP mentions, is a very important for clarifying responsibilities.

If it takes the right pair of eyes on the logs to see a problem for what it is, the whole process slows down and doesn't scale as high. All errors and very chatty warnings should induce me to ask someone what's going on, to make sure it hasn't slipped through the cracks. And they should thank me for pointing it out. Otherwise you reinforce some bad patterns.

paulddraper7y ago

What does recoverable mean though?

If the server process handles it without crashing, is that a recovery?

A more general distinction IMO is the one presented in the article: an Error is something that should not have happened. A Warning is something that maybe should not have happened.

js87y ago· 2 in thread

I increasingly feel that there should be two separate logs, one for the user with the Error/Warning/Info information (what the user wanted to be done and how well it went) and the other for the developer with the Debug/Trace information (capture key information about what the program was doing, especially on the interface - what files it was looking for, what database calls it did etc.).

I always get complaints from users that there is too much useless stuff in the logs through which they need to go. I also think that the developer log should be structured differently, possibly to be easily read by some automation.

exabrial7y ago

An "audit log" or "branch log", what business decisions were made, and a "debug log", what params drove said decision

franciscop7y ago

This is a very common pattern on Node.js... look into the package debug, it's structured to work as a dev help and can be enabled/disabled per part. So you only enable it for the bit you are debugging and unrelated info gets filtered out.

kripke7y ago· 2 in thread

> Connecting to server:4242 as "user":"hunter2"

as an example of DEBUG logging is very wrong. Passwords should not end up in your logs, ever.

jschwartzi7y ago

What password? All I see is * * * * * * *.

sokoloff7y ago

Hah. Context for others: http://bash.org/?244321

hota_mazi7y ago· 2 in thread

I've always preferred levels of logging rather than these arbitrary names (what's the difference between trace and debug, and which one is lower level?).

Give me three levels of logging, 1, 2, and 3, and then warning, and error. Maybe allow to mix and match them, if this makes sense (the typical nomenclature makes them all mutually exclusive, which is dumb).

There are your five levels.

delinka7y ago

Depending on your Coding Culture, you order them based on "priority" (FATAL would have priority over all others, and therefore numbered as 1), or "importance" (FATAL would be most important, and therefore numbered 5.)

Alternatively, each is a flag in a number so that they can be combined, with FATAL being 1 or 16 depending on priority vs. importance.

I would argue that the names do indeed suffice to tell you their "level" - TRACE is the noisiest, least important, least priority level. DEBUG is next, followed by INFO, WARN, ERROR, FATAL.

zoomablemind7y ago

Curiously, in the days of OpenVMS this was called 'severity' level and was a global system concept [0] related to status code of execution.

Info, Warning, and Error are degrees of Success or Fatal/Severe.

The status would then be logged in a structured manner that shows a facility code, serverity, message.

0. https://h41379.www4.hpe.com/wizard/wiz_3582.html

exabrial7y ago· 1 in thread

Talking about logging got me a rejection at Google. The very kind 22 year old interviewing me asked what my thoughts on commenting code was and I said most every time it's used as an excuse to show off a piece of overly clever code, so normally I discourage it in favor of logging, as they are code comments you can use in production. I realize now that was my first graybeard moment.

paulddraper7y ago

Without diving too far down the comment rabbit hole (complex but efficient algorithms, external references, legal retirements, backwards compatibilty), that's not true though.

A comment is attached to code. A log is attached to an event.

Conflating those will either give you very bad logs or very bad comments.

tzs7y ago· 1 in thread

The article is a bit confusing, because the title says 5, the article gives 6 (Error, Warning, Info, Debug, Trace, and Fatal), but the article casts doubt on whether either of those is right when it says this:

> I would like to talk about the five (or six, or four) that I find are the most crucial, the fundamental set to base your own levels (and your thinking) on.

Which are the 2 from the 6 given whose status as levels of logging is in doubt?

Jarwain7y ago

He orders it in terms of relevance too. Note that the last two levels he states are Trace and Fatal. He talks about how Trace may not always be enabled, or how Fatal doesn't necessarily make sense if such an error needs to be handled manually anyways.

xtracto7y ago· 1 in thread

This article reminded me of a writing that an ex-colleague of mine did at a previous company: https://nerds.kueski.com/better-logging/

Everybody developer knows that they should include logging in their systems, however how and what to log is not something so well documented.

When you start working in a system (i.e. at a startup) you put the odd log here and there, without too much attention. However, as your system(s) starts growing, you have to start paying attention at your logs' structure and semantics, so that they are more useful in ELK, Sumologic and similar log searching engines.

pfranz7y ago

What's nice about something like Elasticsearch is if you categorize your logging levels you can more easily drop logs of a certain level at varying times. Debug/verbose/trace is a lot less useful after, say, 24hrs and takes up a lot of space while Fatal errors should be rare and probably want to be tracked year-over-year or between releases.

dionian7y ago· 1 in thread

trace: used for visualizing flow of program, extremely noisy and never on prod

debug: low level flow of program, inputs/outputs

info: what I want to see at runtime in prod, high level flow of program, details about operation so I know what's going on if I want to look

warn: things that may indicate errors or other unexpected situations to be aware of that may be relevant to problems elsewhere

error: something serious or unexpected that probably indicates a defect.. issue that needs to be looked into... unhandled/unexpected state

ohazi7y ago

trace: printf debugging

Dowwie7y ago

Debug: If you can read this, you are standing too close to the program.

blaisio7y ago

I really believe only two levels are necessary most of the time: error, and info. Error is for anything that a dev should be notified about (AKA if you send an error log message, assume someone will be notified about it immediately), and info is for any other useful debugging information, like "we got a request for /foo/bar at 10 pm". I also think structured logging is absolutely essential. Log levels are really just a way to get some of the benefits of structured logging.

grkvlt7y ago

I was really hoping this might continue with 'six, if you work with IoT devices' or something like that, as in the Larry Miller[0] routine.[1] Never mind, I'll leave the links here anyway...

0. https://en.wikipedia.org/wiki/Larry_Miller_(comedian)

1. https://youtu.be/eceSHKqwUPY?t=18

dstick7y ago

Thanks for this! Was just about to start implementing a logging subsystem myself so the timing is perfect :)

stephengillie7y ago

Netscaler logging deserves its own level of hell. You're given 15 undocumented logging levels, with the suggestion not to leave it running longer than necessary, and that level 11 should be enough for anybody.

nwmcsween7y ago

The issue with logging levels is they only convey basic information through the levels and the rest through the actual message which results in the mess that is log processing.

yacse7y ago

Logging is very important because it also allows you to debug/resolve issues easily on production systems. But caring about the appropriate log levels and the ability to configure them at runtime is also super important. It's just insane and ridiculous to log everything on a trace/debug level and stream it to the Expensive Innovative Big Data Machine Learning Smart Log Analytics Product. Yes, I'm looking at you Application Insights.

j / k navigate · click thread line to collapse

59 comments

55 comments · 19 top-level

sagichmal7y ago· 9 in thread

Counterpoint: https://dave.cheney.net/2015/11/05/lets-talk-about-logging

tl;dr: Only Info and Debug levels carry their weight.

philwelch7y ago

This works at a very, very small scale where you are reading your logs by hand. At this point, you don't even need log levels.

sagichmal7y ago

If you're parsing your logs with a machine for triage like you describe, you should be doing structured logging, not string logging.

philwelch7y ago

And the structure will include a log level to enable this kind of classification.

1 more reply

Rapzid7y ago

We should all be doing structured logging always in 2018 IMHO. The future is now.

nicolaslem7y ago

I think he is overthinking it a bit. Yes there should be very little places where an error is actually logged, yet is does not mean that it should not exist.

In web applications you very often have one instance of:

    try:
        return handle_request()
    except Exception:
        logger.exception('Something bad happened')
        return 500

Sure you can argue that as the client received a 500 it is not an error anymore, but this doesn't help you notice and fix the problem.

sagichmal7y ago

nicolaslem7y ago

Why should exception tracking and logging be two separated things? They perform the same job, an exception tracking system is just another way to present log records.

[0] https://docs.python.org/3/library/logging.html#logrecord-att...

2 more replies

helper7y ago

We use 3 log levels:

- Info: An interesting event occurred that someone might care about. This could be an error that we were able to handle or just some important event.

maxxxxx7y ago

I don't really see alerts as logging. They are an integral part of the application's behavior.

ath07y ago· 5 in thread

pfranz7y ago

I like these 5-levels and the 3-levels you talk about seem like a paired down version that reenforces what each level means.

jschwartzi7y ago

mixmastamyk7y ago

That may work for an application, but there are many errors that occur while using a library that are expected/handled etc by the application. Perhaps separate logs can help.

yjftsjthsd-h7y ago

There are also messages which should not trigger a page themselves but will provide useful context when responding to one.

solipsism7y ago

That's DEBUG

jdmichal7y ago· 5 in thread

I have always used them as follows. Deciding between DEBUG and INFO is the most difficult. INFO, WARN, and ERROR are simple decisions.

DEBUG: Information which is relevant for tracing activity and data through the program.

INFO: Important workflow decisions made within the normal operation of the program. Allows you to tell what path a particular operation flowed down, but not with tracing every activity.

WARN: Non-fatal errors which do not prevent the operation from completing. Example: Failing to close a file after having read from it.

ERROR: Fatal errors which prevent the operation from completing. Example: Failing to open the file or reading from it.

delinka7y ago

What about FATAL errors which prevent the entire program from continuing?

jdmichal7y ago

KayEss7y ago

derefr7y ago

Picture an Erlang web-application server.

Ideally, a Fatal log message is accompanied by a crash dump.

ryancouto7y ago

i think OP is implying that error level and fatal level should be regarded as the same

cwingrav7y ago· 4 in thread

Summary: Log levels of info, warn and error all have a place. Logging to aspects is valuable during development, easy to turn off for production and useful when returning to debug.

hinkley7y ago

Logging is one of the weirder cross cutting concerns, for many reasons but especially including the one you mention.

Something akin to applying the 80/20 rule to Aspect Oriented Programming.

Rapzid7y ago

Check out log points: https://code.visualstudio.com/docs/editor/debugging#_logpoin...

hinkley7y ago

It's buried but my IDE has a similar feature.

I meant support for running systems, not local development.

1 more reply

Karrot_Kream7y ago

testplzignore7y ago· 3 in thread

Info: What is the program up to. Occasional messages that indicate it is alive and doing what is expected.

I keep warn and info separate so that I can easily grep/filter on one. I often want to just see info messages to know what the status of things are.

jandrese7y ago

IMHO the difference between Warn and Error is that Warn is recoverable while Error is not.

hinkley7y ago

In an enterprise setting, treating one level of logs as actionable, as GP mentions, is a very important for clarifying responsibilities.

paulddraper7y ago

What does recoverable mean though?

If the server process handles it without crashing, is that a recovery?

A more general distinction IMO is the one presented in the article: an Error is something that should not have happened. A Warning is something that maybe should not have happened.

js87y ago· 2 in thread

exabrial7y ago

An "audit log" or "branch log", what business decisions were made, and a "debug log", what params drove said decision

franciscop7y ago

kripke7y ago· 2 in thread

> Connecting to server:4242 as "user":"hunter2"

as an example of DEBUG logging is very wrong. Passwords should not end up in your logs, ever.

jschwartzi7y ago

What password? All I see is * * * * * * *.

sokoloff7y ago

Hah. Context for others: http://bash.org/?244321

hota_mazi7y ago· 2 in thread

I've always preferred levels of logging rather than these arbitrary names (what's the difference between trace and debug, and which one is lower level?).

There are your five levels.

delinka7y ago

Alternatively, each is a flag in a number so that they can be combined, with FATAL being 1 or 16 depending on priority vs. importance.

I would argue that the names do indeed suffice to tell you their "level" - TRACE is the noisiest, least important, least priority level. DEBUG is next, followed by INFO, WARN, ERROR, FATAL.

zoomablemind7y ago

Curiously, in the days of OpenVMS this was called 'severity' level and was a global system concept [0] related to status code of execution.

Info, Warning, and Error are degrees of Success or Fatal/Severe.

The status would then be logged in a structured manner that shows a facility code, serverity, message.

0. https://h41379.www4.hpe.com/wizard/wiz_3582.html

exabrial7y ago· 1 in thread

paulddraper7y ago

Without diving too far down the comment rabbit hole (complex but efficient algorithms, external references, legal retirements, backwards compatibilty), that's not true though.

A comment is attached to code. A log is attached to an event.

Conflating those will either give you very bad logs or very bad comments.

tzs7y ago· 1 in thread

> I would like to talk about the five (or six, or four) that I find are the most crucial, the fundamental set to base your own levels (and your thinking) on.

Which are the 2 from the 6 given whose status as levels of logging is in doubt?

Jarwain7y ago

xtracto7y ago· 1 in thread

This article reminded me of a writing that an ex-colleague of mine did at a previous company: https://nerds.kueski.com/better-logging/

Everybody developer knows that they should include logging in their systems, however how and what to log is not something so well documented.

pfranz7y ago

dionian7y ago· 1 in thread

trace: used for visualizing flow of program, extremely noisy and never on prod

debug: low level flow of program, inputs/outputs

info: what I want to see at runtime in prod, high level flow of program, details about operation so I know what's going on if I want to look

warn: things that may indicate errors or other unexpected situations to be aware of that may be relevant to problems elsewhere

error: something serious or unexpected that probably indicates a defect.. issue that needs to be looked into... unhandled/unexpected state

ohazi7y ago

trace: printf debugging

Dowwie7y ago

Debug: If you can read this, you are standing too close to the program.

blaisio7y ago

grkvlt7y ago

I was really hoping this might continue with 'six, if you work with IoT devices' or something like that, as in the Larry Miller[0] routine.[1] Never mind, I'll leave the links here anyway...

0. https://en.wikipedia.org/wiki/Larry_Miller_(comedian)

1. https://youtu.be/eceSHKqwUPY?t=18

dstick7y ago

Thanks for this! Was just about to start implementing a logging subsystem myself so the timing is perfect :)

stephengillie7y ago

nwmcsween7y ago

The issue with logging levels is they only convey basic information through the levels and the rest through the actual message which results in the mess that is log processing.

yacse7y ago

j / k navigate · click thread line to collapse