Grepping logs is terrible

(asylum.madhouse-project.org)

94 points_5csa11y ago102 comments

moonshinefe11y ago· 10 in thread

Yes, grepping logs is terrible if "you have 100Gb of logs a day". I'm not sure why the author is thinking his use case is anything near the norm or why he's shocked in most use cases people prefer text files.

I'm also not getting why he just doesn't use scripts to parse the logs and insert them into a database at that point. Why use some ad-hoc logging binary format if you're doing complex queries that SQL would be better suited for anyway, on proven db systems?

Maybe I'm missing something.

mrweasel11y ago

Grepping is just fine if you have a few hundred megabytes of data a day, so wanting to kill text based logging, because YOU reached multiple gigabytes a day is going to be met with resistance from the people who don't have those issues.

As the author himself points out: "I'm sorry, but deciding how much and what we log is not your job. Its ours, and this is the amount we have to deal with."

That goes both ways. If I only have one or two servers, having to run a centralized logging service doesn't scale either; the overhead is not worth the trouble.

If I want to look for an IP in logs from multiple services, text files are perfect. Doing the same across multiple servers, yes, then you want centralized logging. Binary logging ruins the first case, while text based works in both (sort of).

I don't really see the point of binary logs. Either you're small enough that text files won't be an issue, or you're large enough to have centralized logging.

It seems that there's a push towards "scalable solutions" for everything, but people keep forgetting that you need to scale down as well. Most of us will never have to run more than a handful of servers, and in these cases the Twitter/Google/Facebook-like infrastructure just isn't worth the hassle.

lazzlazzlazz11y ago

I think I'm missing the same thing. He keeps going on about structure, but it wasn't obvious to me where the solution (?) actually introduces query-able structure.

He needs a log database, clearly. And when you put it that way, it's obvious why grepping logs is a nice, quick solution in many cases when you aren't getting "100Gb of logs a day".

chakrit11y ago

I think his point was that querying structured data is better than grepping unstructured text. SQL vs Regex, for example. I get the impression that he didn't state what solution to use but simply that binary/structured > text/unstructured. He even says that Journal isn't his ideal solution and never will be.

fletchowns11y ago

Throwing 100GB of data into a relational database and being able to run your queries quickly isn't exactly a no-brainer.

nosir3311y ago

One of the initial challenges I see from an OPS perspective is that the most recent logs are often the most interesting. The latency of the logs being ingested into a DB would prevent me from using the DB. Generally, I find myself grepping logs on the prod servers.

ars11y ago

"you have 100Gb of logs a day"

Logs have lots of redundancy, so they compress quite nicely. So it is actually practical to grep those files since on disk they are not so large, and 100Gb of memory data is not a problem to grep.
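
A minimal sketch of the point above, assuming the logs are gzip-compressed: Python's standard `gzip` module streams a compressed file line by line, so a search never needs the uncompressed copy on disk. The helper name `grep_gzip` and the sample file contents are made up for illustration.

```python
import gzip
import os
import re
import tempfile


def grep_gzip(path, pattern):
    """Yield decoded lines from a gzip-compressed log that match a regex."""
    regex = re.compile(pattern)
    # "rt" decompresses on the fly and yields text lines one at a time.
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if regex.search(line):
                yield line.rstrip("\n")


# Build a small compressed sample log to demonstrate (hypothetical content).
sample_path = os.path.join(tempfile.mkdtemp(), "app.log.gz")
with gzip.open(sample_path, "wt", encoding="utf-8") as handle:
    handle.write("10:00:01 INFO started\n")
    handle.write("10:00:02 ERROR disk full\n")
    handle.write("10:00:03 INFO retrying\n")

matches = list(grep_gzip(sample_path, r"ERROR"))
print(matches)
```

On a shell the same idea is usually spelled `zgrep`; the sketch only shows that the decompression step does not have to be a separate pass.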

leni53611y ago

The author shows a use case for both a small and a large logging system. The use case is complex queries which span multiple applications and don't need regex ninja skills but sensible queries.

onli11y ago

He does not. His small logging system is not small at all; it spans multiple systems and has requirements that are not at all typical for small systems.

madhouse11y ago

You're missing the point. I'm not using a custom logging format. I'm using binary log storage, with emphasis on the storage. There is a database and a search engine behind it.

Logging format and log storage format are two very different things.

Also, I'm not shocked people prefer text files. I'm shocked why they're so much against binary log storage. There's an important distinction between the two: you can prefer text, if that fits your case better, without hating on binary storage.

gambiter11y ago

> There's an important distinction between the two: you can prefer text, if that fits your case better, without hating on binary storage.

Except according to the article (which you posted and are defending all over this thread, so I'm guessing you actually wrote it?) the author has NO intention of honoring those who prefer text logs, in fact using the phrase "so vigilantly against text based log storage". To use your own reply, you can prefer binary, if it fits your case better, BUT DON'T HATE ON TEXT STORAGE.

blueskin_11y ago· 7 in thread

People don't want it because it's binary, not because you can't grep it.

* you need to use a new proprietary tool to interact with them

* all scripts relating to logs are now broken

* binary logs are easy to corrupt, e.g. if they didn't get closed properly.

>You can have a binary index and text logs too! / You can. But what's the point?

The point is having human-readable logs without having to use a proprietary piece of crap to read them. A binary index would actually be a perfect solution - if you're worried about the extra space readable logs take, just .gz/.bz2 them; on decent hardware, the performance penalty for reading is almost nonexistent.
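
A rough sketch of the "index plus text logs" idea, under the assumption that every line starts with a sortable timestamp followed by a space. The index here lives in memory as two Python lists; a real one would be persisted in some compact form. The names `build_index` and `read_from` are hypothetical.

```python
import bisect
import os
import tempfile


def build_index(log_path):
    """Return parallel lists of timestamps and byte offsets, one per line."""
    stamps, offsets = [], []
    with open(log_path, "rb") as handle:
        offset = 0
        for raw in handle:
            stamps.append(raw.split(b" ", 1)[0].decode("ascii"))
            offsets.append(offset)
            offset += len(raw)
    return stamps, offsets


def read_from(log_path, stamps, offsets, start):
    """Seek straight to the first line at or after `start` and read onward."""
    pos = bisect.bisect_left(stamps, start)
    if pos == len(stamps):
        return []
    with open(log_path, "rb") as handle:
        handle.seek(offsets[pos])
        return [line.decode("utf-8").rstrip("\n") for line in handle]


# Hypothetical sample log; the file itself stays plain, greppable text.
log_path = os.path.join(tempfile.mkdtemp(), "app.log")
with open(log_path, "wb") as handle:
    handle.write(b"2015-05-04T16:00:00 boot\n")
    handle.write(b"2015-05-04T16:02:53 GET / 304\n")
    handle.write(b"2015-05-04T16:05:10 shutdown\n")

stamps, offsets = build_index(log_path)
tail = read_from(log_path, stamps, offsets, "2015-05-04T16:01:00")
print(tail)
```

If the index is lost or damaged, the text file is still readable and the index can be rebuilt from it.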

If you generate 100GB/day, you should be feeding them into logstash and using elasticsearch to go through them (or use splunk if $money > $sense), not keeping them as files. Grepping logs can't do all the stuff the author wants anyway, but existing tools can, that are compatible with rsyslog, meaning there is no need for the monstrosity that is systemd.

oblio11y ago

What's wrong with Splunk? Honest question.

blueskin_11y ago

Price, mostly. It's good, but there are alternatives that aren't as ridiculously expensive.

fletchowns11y ago

It's expensive

madhouse11y ago

* Why would you need a proprietary tool?

* What if they get broken? I don't want to look at them raw anyway.

* Text logs are easy to corrupt as well. Oh, append only? Well, you can do that with binary storage too.

And again, there is no need for proprietary tools at all. Everything I want to do is achievable with free software - so much so, that I use only such software in all my systems.

As for compressing - yeah, no. Please try compressing 100Gb of data and tell me the performance cost is nonexistent.

As for LogStash & ES: Guess what: their storage is binary.

Also note that my article explicitly said that the Journal is unfit for my use cases.

leni53611y ago

Why does it have to be proprietary?

cthalupa11y ago

It doesn't have to be - but let's look at reality here. NIH syndrome is everywhere, we have millions of competing protocols and formats, everyone thinks they can build a better solution than someone else, etc.

I suppose that if there was a large push to universally log things in binary the possibility exists that sanity would prevail and we'd get one format that everyone agreed upon, but I don't see any reason that this would be the case when historically it basically never happens.

So, at least from my prediction of a future where binary logging is the norm, we have a half dozen or so competing primary formats, and then random edge cases where people have rolled their own, all with different tools needed to parse them.

Or we could stick with good ol' regular text files and if you want to make it binary or throw it in ELK/splunk or log4j or pipe it over netcat across an ISDN line to a server you hid with a solar panel and a satellite phone in Angkor Wat upon which you apply ROT13 and copy it to 5.25 floppy, you can do it on your own and inflict whatever madness you want while leaving me out of it.

regularfry11y ago

It doesn't, but nothing is universal like `grep`. If you find a machine that's logging stuff which doesn't have `grep`, you're already having a bad day.

You just can't say that about binary log formats. Text is a lowest common denominator; and yes, that cuts both ways, but the advantages of universality can't be trivially thrown away.

616c11y ago· 7 in thread

On a slightly unrelated note, as a largely amateur Linux user: have people made systems that, instead of grepping for info, use machine learning to detect normal patterns in a log file (like what types of events occur, how similar they are, and at what intervals) and report the anomalous output via email or report to an admin?

I was thinking this would be a cool area of research for me to try programming again, but it seems so daunting I am not sure where to start.

nosir3311y ago

I don't know of any systems that do this.

As a software developer, I generally use log levels to indicate severity in my logs. So grepping for ERROR should catch anything I had the foresight to log at the ERROR level.

Simple heuristics like the number of WARN level logs a minute may be useful.

Beyond that it sounds interesting. It may be hard to do in a general way, so focusing on Apache logs or something common may be a simpler task.
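
The WARN-per-minute heuristic could be sketched like this, assuming lines of the form `YYYY-MM-DD HH:MM:SS LEVEL message`. The format, the function names, and the threshold are assumptions for illustration only.

```python
from collections import Counter


def warn_counts_per_minute(lines):
    """Count WARN lines per minute."""
    counts = Counter()
    for line in lines:
        parts = line.split(" ", 3)
        if len(parts) >= 3 and parts[2] == "WARN":
            minute = parts[0] + " " + parts[1][:5]  # drop the seconds
            counts[minute] += 1
    return counts


def noisy_minutes(counts, threshold):
    """Return the minutes whose WARN count exceeds the threshold, sorted."""
    return sorted(minute for minute, n in counts.items() if n > threshold)


sample = [
    "2015-05-04 16:02:01 WARN slow response",
    "2015-05-04 16:02:10 INFO ok",
    "2015-05-04 16:02:30 WARN slow response",
    "2015-05-04 16:02:59 WARN queue growing",
    "2015-05-04 16:03:05 WARN slow response",
]
counts = warn_counts_per_minute(sample)
print(noisy_minutes(counts, threshold=2))
```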

falcolas11y ago

In addition to logging, you can send out a statsd[0] message, graph it, and use something like Skyline[1] for alerting based on trend issues. You can also use logstash to generate metrics on logs when sending them up to Elasticsearch.

[0] https://github.com/etsy/statsd

[1] https://github.com/etsy/skyline
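
For a sense of how small a statsd message is, here is a sketch that formats a counter in the plain-text `name:value|c` form and sends it over UDP. The metric name `app.errors` is made up, and the host and port are assumptions (8125 is the conventional statsd port); the send call is left commented out so the snippet runs without a daemon.

```python
import socket


def statsd_counter(name, value=1):
    """Format a statsd counter datagram: '<name>:<value>|c'."""
    return "{}:{}|c".format(name, value).encode("ascii")


def send_metric(payload, host="127.0.0.1", port=8125):
    """Fire-and-forget UDP send to a statsd daemon."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(payload, (host, port))
    finally:
        sock.close()


payload = statsd_counter("app.errors")
print(payload)
# send_metric(payload)  # uncomment when a statsd daemon is listening
```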

ars11y ago

I use the logwatch program for that. There is no machine learning, it's entirely manual with a large list of things it filters out, but the defaults are quite good.

It emails me any log entries it doesn't know about. I did have to add a large number of ssh lines that it should not bother me about, but other than that it works very well and I find it very useful.

616c11y ago

Cool tip. I remember hearing the name but not knowing it had these features. I will definitely check it out.

malka11y ago

You can use fail2ban for this. It is used to automatically ban IPs that, for instance, try to brute-force your SSH, but it really is an engine that matches regexps against log file lines and fires an action if a regexp matches.

So you can use it for other purposes (such as sending an admin a mail if your server suddenly sends 500 errors, or an unusual number of 404 errors, for instance).
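
The "regexp matches, action fires" engine can be sketched in a few lines. This is not fail2ban's code or configuration format, just an illustration of the idea; real fail2ban also considers a time window, which this sketch leaves out. The `Rule` class and the sample lines are hypothetical.

```python
import re


class Rule:
    """Fire `action` once `threshold` lines have matched `pattern`."""

    def __init__(self, pattern, threshold, action):
        self.regex = re.compile(pattern)
        self.threshold = threshold
        self.action = action
        self.hits = 0

    def feed(self, line):
        if self.regex.search(line):
            self.hits += 1
            if self.hits == self.threshold:
                self.action(line)


alerts = []
# Hypothetical rule: alert after three HTTP 500 responses.
rule = Rule(r'" 500 ', threshold=3, action=alerts.append)

for line in [
    '10.0.0.1 - - "GET /a HTTP/1.1" 500 0',
    '10.0.0.1 - - "GET /b HTTP/1.1" 200 12',
    '10.0.0.2 - - "GET /a HTTP/1.1" 500 0',
    '10.0.0.3 - - "GET /a HTTP/1.1" 500 0',
]:
    rule.feed(line)

print(alerts)
```

Swapping `alerts.append` for a function that sends mail gives the "notify an admin" variant described above.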

616c11y ago

Of course. Not to be dismissive, but I am familiar with fail2ban. I was wondering if anyone had this idea in a form that did not require manual or pre-set rules: the program would go passive for a few days, reading log files and learning that certain log entries will be identical minus the timestamp, that some change by a small amount of text, and that others have never been seen (or will not match in the next stage). The next stage turns active, and the machine filters down and sends you anything it has not seen over time and knows must be something anomalous.

I like fail2ban, a lot, and alternatives in that field, but when I looked at the Arch Linux package last time there were dozens of commented-out, but heavily commented nonetheless regexp template files like you describe. I think this would be a neat machine learning thing.

What I am going for: use AI to train a passive entry-level sysadmin to warn you.
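
A very rough sketch of the passive-then-active idea, with no machine learning at all: variable-looking tokens (numbers, hex ids, IPv4 addresses) are masked so that lines differing only in those collapse to one template, and anything whose template was not seen during the learning phase is reported. The masking regex, the `NoveltyDetector` class, and the sample lines are all assumptions for illustration.

```python
import re

# Replace variable-looking tokens so lines that differ only in numbers,
# hex ids, or IPv4 addresses collapse to the same template.
_VARIABLE = re.compile(r"\b(?:\d{1,3}(?:\.\d{1,3}){3}|0x[0-9a-fA-F]+|\d+)\b")


def template(line):
    return _VARIABLE.sub("<*>", line.strip())


class NoveltyDetector:
    def __init__(self):
        self.known = set()

    def learn(self, lines):
        """Passive phase: remember every template seen."""
        for line in lines:
            self.known.add(template(line))

    def unseen(self, lines):
        """Active phase: return lines whose template was never learned."""
        return [line for line in lines if template(line) not in self.known]


detector = NoveltyDetector()
detector.learn([
    "16:02:53 sshd[1234]: Accepted publickey for alice from 10.0.0.1",
    "16:03:10 sshd[1250]: Accepted publickey for alice from 10.0.0.7",
])

live = [
    "17:45:00 sshd[2001]: Accepted publickey for alice from 10.0.0.9",
    "17:46:12 kernel: Out of memory: Kill process 4242",
]
novel = detector.unseen(live)
print(novel)
```

The obvious weakness is that a new user name or path also counts as "never seen"; deciding which tokens are variable is the hard part.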

madhouse11y ago

I experimented with that, and heard others toying with the idea too. There are even products out there that do something similar.

tatterdemalion11y ago· 4 in thread

This applies more generally than just to logs. I love Unix, but "everything is text" is not actually great. It's better that Unix utils output arbitrary ASCII than that they output arbitrary binary data, but it's obvious why people don't do serious IPC 'the Unix way.' Imagine if instead of exchanging JSON, or ProtoBufs, or whatever, your programs all exchanged text you had to regex into some sort of ad hoc structure. So why do we manage our logs and our pipelines that way? There's no actual reason that the terminal couldn't interpret structured data into text for us so that, in the world of intercommunicating processes on the other side of the TTY, everything is well-structured, semantically comprehensible data.

pjc5011y ago

This is the PowerShell argument. It's a step in the right direction, but it needs the tooling and user community to come along with it.

The advantage of the traditional unix pipe manipulation tools is that most of them are simpler and faster than regex.

ygra11y ago

> There's no actual reason that the terminal couldn't interpret structured data into text for us so that, in the world of intercommunicating processes on the other side of the TTY, everything is well-structured, semantically comprehensible data.

I think you just described PowerShell (or things that follow down the same path, e.g. TermKit) ;-)

njharman11y ago

JSON is text!

Text is not synonymous with unstructured.
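
As a small illustration of "text but structured", here is a sketch with one JSON object per log line. The field names (`ts`, `level`, `status`, `path`) are invented for the example.

```python
import json

# Three log records, one JSON object per line (plain text, still greppable).
log_text = "\n".join([
    '{"ts": "2015-05-04T16:02:53", "level": "INFO", "status": 304, "path": "/"}',
    '{"ts": "2015-05-04T16:02:54", "level": "ERROR", "status": 500, "path": "/api"}',
    '{"ts": "2015-05-04T16:02:55", "level": "INFO", "status": 200, "path": "/api"}',
])

records = [json.loads(line) for line in log_text.splitlines()]

# Structured query: field access instead of a regex over the raw line.
server_errors = [r["path"] for r in records if r["status"] >= 500]
print(server_errors)
```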

tatterdemalion11y ago

Of course JSON is encoded in Unicode, making it "text," but when it is said that text is the universal protocol of Unix, it means that the only guarantee a well-behaving Unix utility can make is that it will output ASCII. You cannot leverage the further structure of JSON or any other protocol because utilities that interpret JSON do not compose with those many Unix utilities which emit non-JSON data.

Only entropic bits are truly "unstructured data." The question is one of how much semantic structure you can rely on in the data you are processing, which is a continuum.

onion2k11y ago· 3 in thread

> Binary logs are opaque! Just as much as text logs.

I don't agree with the second assertion there. Text logs are only opaque as far as the format is concerned, but not so much as far as the content goes. Using the example in the article:

    127.0.0.1 - - [04/May/2015:16:02:53 +0200] "GET / HTTP/1.1" 304 0 "-" "Mozilla/5.0"

You can read a lot of information without knowing the format, the application that generated it, or even which file it was in - you know it's something to do with localhost, you know when it happened, you know the protocol, from which you can infer the "304" means Not Modified, and you know it came from a Mozilla agent. That's a lot more information than you could get from a binary log without any tools.

That isn't necessarily an argument against binary logging, but the notion that text log files are opaque in the same way as binary logs isn't really true.
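
For comparison, once the format is known, the same line can be split into named fields with a single regex. This is a sketch written against the quoted line only; the group names are my own labels, not anything the article defines.

```python
import re

# Regex for the access log layout as it appears in the quoted line.
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

line = ('127.0.0.1 - - [04/May/2015:16:02:53 +0200] '
        '"GET / HTTP/1.1" 304 0 "-" "Mozilla/5.0"')

fields = COMBINED.match(line).groupdict()
print(fields["host"], fields["status"], fields["agent"])
```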

nosir3311y ago

> That's a lot more information than you could get from a binary log without any tools.

In the environment I work in, I am frequently looking at logs that other teams generate. If I needed to ramp up on their custom logging toolset just to perform simple queries, I would give up and waste the team's time by getting them to perform the queries for me.

Jedd11y ago

(Background: I'm not a journal apologist. In fact, I'm finding it challenging to make the mental shifts required to become adept with this new suite of system tools on my myriad Debian boxen.)

> That's a lot more information than you could get from a binary log without any tools.

Arguably you need a tool to get the information you showed above - a single line from an apache log. The tool may have been grep, cat, vi, awk, less, or whatever. That it was installed as part of a base-build on your computer, or at the behest of your usual configuration management system, is either kind of aside, or kind of the point.

Journal uses a bunch of diagnostic & query tools that get installed at the same time that the journal is installed. Yes, the tool / command to get the same type of data you're looking at above -- something that is comparably readable to a line from an apache log file -- is going to be different. But only different.

onli11y ago

It is not only different. It is also less universal.

With a text based logging system, I can take the usb stick with the system that does not boot on my headless homeserver to any computer and read the logs there. I could even boot the original linux system on that server, running a really old kernel and practically no userland tools, and read them there. Cause that server was using journald, that was not possible. Still don't know what went wrong.

datenwolf11y ago· 3 in thread

    > Embedded systems don't have the resources!
    > ...
    > I'd still use a binary log storage, because
    > I find that more efficient to write and parse,
    > but the indexing part is useless in this case.

This is yet again a case of a programmer completely misjudging how an actual implementation will perform in the real world.

When I wrote the logging system for this thing http://optores.com/index.php/products/1-1310nm-mhz-fdml-lase... I first fell for the very same misjudgement: "This is running on a small, embedded processor: Binary will probably be much more efficient and simpler."

So I actually did first implement a binary logging system. Not only logging, but also the code to retrieve and display the logs via the front panel user interface. And the performance was absolutely terrible. Also the code to manage the binary structure in the round robin staging area, working in concert with the storage dump became an absolute mess; mind you the whole thing is thread safe, so this also means that logging can cause inter thread synchronization on a device that puts hard realtime demands on some threads.

Eventually I came to the conclusion to go back and try a simple, text only log dumper with some text pattern matching for the log retrieval. Result: The text based logging system code is only about 35% of the binary logging code and it's about 10 times faster because it doesn't spend all these CPU cycles structuring the binary. And even that text pattern matching is faster than walking the binary structure.

Like so often... premature optimization.

madhouse11y ago

I've worked with a number of implementations, both embedded and others (ranging from a PC under my desk, through dozen-node clusters to ~hundred nodes). For most cases, binary storage triumphed. Most often, we kept text based transport.

Again, transport and storage are different. While I prefer binary storage, most of my transports are text (at least in large part, some binary wrapping may be present here and there).

MrBuddyCasino11y ago

Cool tech you have there, but I only understood it once I saw the video. You basically have a very fast laser that can do volumetric scans at a high framerate, did I get this right? What do people typically use it for?

datenwolf11y ago

    > You basically have a very fast laser that
    > can do volumetric scans at a high framerate,
    > did I get this right?

Sort of. The laser itself is constantly sweeping its wavelength (over a bandwidth of >100nm). Using it as a light source in an interferometer where one leg is reflected by a fixed mirror and the other leg goes into the sample, something interesting happens: The interferometric fringes produced for a certain wavelength correspond to the spatial frequency of scattering in the sample. So the fringe distribution over wavelengths is the Fourier transform of the scattering distribution. So by applying an inverse Fourier transform to the wavelength spectrum of the light coming out of the interferometer you get a depth profile.

Now the challenge is to get the wavelength spectrum. You can either use a broadband CW light source and a spectrometer. But these are slow, so you can't generate depth scans at more than about 30kHz (which is too slow for 3D but suffices for 2D imaging). Or you can encode the wavelength in time and use a very fast photodetector (those go up to well over 4GHz bandwidth).

This is what we do: Have a laser that sweeps over 100nm at a rate >1.5MHz and use a very fast digitizer (1.8GS/s) to obtain an interference spectrum with over 1k sampling points. Then apply a little bit of DSP (mapping time to wavelength, resampling, windowing, iFFT, dynamic range compression) and you get a volume dataset.
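
A toy illustration of the Fourier step only, not of the actual processing chain: a single hypothetical reflector is modelled as a cosine fringe across the sweep, and a naive discrete Fourier transform recovers its depth as the peak bin. The sample count and depth are made-up numbers, and the resampling, windowing, and dynamic range compression mentioned above are left out.

```python
import cmath
import math

N = 64          # samples across one sweep (made-up size)
DEPTH_BIN = 5   # depth of a single hypothetical reflector, in bins

# One reflector produces a cosine fringe across the sweep.
fringe = [math.cos(2 * math.pi * DEPTH_BIN * k / N) for k in range(N)]


def dft_magnitude(signal):
    """Naive O(N^2) discrete Fourier transform magnitudes."""
    n = len(signal)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * f * k / n)
                for k, x in enumerate(signal)))
        for f in range(n)
    ]


profile = dft_magnitude(fringe)
# Only the first half is unique for a real-valued input.
peak = max(range(1, N // 2), key=lambda f: profile[f])
print(peak)
```

For a real-valued fringe the forward and inverse transforms give the same magnitudes up to scaling, which is why the sketch does not distinguish between them.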

BTW, all the GPU OCT processing and visualization code I wrote, too.

    > What do people typically use it for?

Mostly for OCT, but you can also use it for fiber sensing (using fiber optics as sensors in harsh environments), Raman spectroscopic imaging, short pulse generation and a few other applications. But OCT is the bread and butter application for these things.

leni53611y ago· 3 in thread

I don't have experience with binary logs. I think the fragility of binary logs is not baseless though. AFAIK there was (is?) a problem in systemd's journal where a local corruption of the log could cause a global unavailability of the logged data.

People like text logs because local corruptions remain local. Some lines could be gibberish, but that's all. I'm not suggesting that this couldn't be done with binary logs, but you have to carefully design your binary logging format to keep this property.
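
The "local corruptions remain local" property can be shown with a short sketch: the log is split on newlines and each line is decoded on its own, so a damaged line is skipped without affecting its neighbours. The function name and the simulated corruption are assumptions for illustration.

```python
def readable_lines(raw):
    """Split raw log bytes into lines, keeping whatever decodes cleanly.

    Returns (good_lines, number_of_damaged_lines).
    """
    good, damaged = [], 0
    for chunk in raw.split(b"\n"):
        if not chunk:
            continue
        try:
            good.append(chunk.decode("utf-8"))
        except UnicodeDecodeError:
            damaged += 1  # the damage is confined to this one line
    return good, damaged


raw_log = (
    b"16:02:53 service started\n"
    b"16:02:54 \xff\xfe\x00garbage\n"   # simulated corruption
    b"16:02:55 service healthy\n"
)
good, damaged = readable_lines(raw_log)
print(good, damaged)
```

A binary format can keep the same property, but only if each record is independently framed so a reader can resynchronise after a bad one.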

Otherwise I agree with the author that we shouldn't be afraid of binary formats in general, we need much more general formats and tools though (grep, less equivalents).

I'm not fond of "human readable" tree formats like XML or JSON either. bencode could be just as "human readable" as UTF-8 text if one has a less equivalent for bencode.

616c11y ago

> I don't have experience with binary logs. I think the fragility of binary logs is not baseless though. AFAIK there was (is?) a problem in systemd's journal where a local corruption of the log could cause a global unavailability of the logged data.

From my experience (I do not want to troll and presume you have not tried it), systemd picks up where it left off when an old log is corrupted and starts a new one. There is a command line utility to verify the integrity of these files (I am on my Windows laptop at work, so I cannot check). Now, I am not sure about the state of log file repair. I was told it is not possible. However, it seems this means the file is corrupted in a way that it is not easily indexed. It is likely it is still readable. I wish I had seen this last time.

https://www.reddit.com/r/linux/comments/1y6q0l/systemds_bina...

Granted, I use Arch Linux on an old laptop. I had these corruptions routinely happen when I had disabled ACPI controls (I do not use the fancy WMs, I am back to Ratpoison) and completely, and I mean completely, drained the battery until it came crashing to a halt. So, I am not surprised about these corruptions.

Anyone using systemd boxes in production who can comment on this? Flamewar or not, I would like to know more. I do not really care for it one way or the other. Parts I like, parts I do not.

realusername11y ago

I was thinking exactly the same, once you want to create a binary efficient format which you can query, you then have the same problems as a database. And if there is something we have learned in the history of computing, it's that databases are hard to design properly, and especially from scratch.

TheLoneWolfling11y ago

And especially when you want it to be immune to random failures without data loss.

The last few entries of a log file before something catastrophic happens are precisely the entries that are the most important to make sure they aren't lost.

dsr_11y ago· 2 in thread

Change for the sake of change is anti-engineering. It is anti-productive. Your changes must be improvements, and they must not cost more than they save or generate in a reasonable period of time.

Many organizations have a fully functional, well-debugged logging infrastructure. The basic design happened years ago, was implemented years ago, and was expected to be useful basically forever. Growth was planned for. Ongoing expenses expected to be small.

That's what happens when you build reliable systems on technologies that are as well understood as bricks and mortar. You get multiple independent implementations which are generally interoperable. You get robustness. And you get cost-efficiency, because any changes you decide to make can be incremental.

Where are the rsyslogd and syslog-ng competitors to systemd's journald? Where is the interoperability? Where is the smooth, useful upgrade mechanism?

Short term solutions are generally non-optimal in the long term. Using AWS, Google Compute and other instant-service cloud mechanisms trades money, security and control for speed of deployment. An efficient mature company may well wish to trade in the opposite direction: reducing operating costs by planning, understanding growth and making investments instead of paying rent.

Forcing a major incompatible change in basic infrastructure rather than offering it as an option to people who want to take advantage of it is an anti-pattern.

VLM11y ago

"Growth was planned for."

One interesting problem with almost all of the "advantages" of binary logs is that if they're good reasons today, they would have been really awesome reasons in '93 when I started admining my first linux box. The problem with changing the way I've been doing things is I'm already used to the staggering change in performance from a 40 meg non-DMA PATA drive in '93 to dual raid fractional terabyte SSDs. It's really quite a boost in raw power. Yet what I need to log hasn't changed much. So performance gains have been spectacular. So the comparative appeal is incredibly low. It wasn't a "real problem" in '93. It's maybe a thousandth of that problem level today due to technological improvement.

"Hey, if you change everything in your infrastructure, and all your machines, and all your command lines and procedures and ways of thinking to access logs, you MIGHT be 5% more efficient, well, eventually, in the long term" "Eh, so what? I remember transitioning from spinning rust to SSD and getting 100x the overall system-wide performance a couple years ago; if I want 5% it's more economical just to wait for the next tech boost. Also shrinking basically zero load and effort by half is worthless if there's any cost at all, and unfortunately the cost is absolutely huge."

madhouse11y ago

I assume this comment is related to the journal. The article is not.

But, to reply: yes, many organisations have fully functional, well-debugged logging infrastructures. A lot of them also use binary log storage, and have been for over a decade, and are more than satisfied with the solution.

Both rsyslog and syslog-ng have been able to assist with setting such a thing up for about a decade now.

> Where are the rsyslogd and syslog-ng competitors to systemd's journald? Where is the interoperability? Where is the smooth, useful upgrade mechanism?

The journal has a syslog forwarder, but both rsyslog and syslog-ng can read directly from the journal. Interoperability was there from day one. Smooth upgrade mechanism took a while to iron out, but it's there now, too.

regularfry11y ago· 2 in thread

This is all well and good if you want to, and can, spend time up front figuring out how to parse each and every log line format which might appear in syslog so you can drop it in your structured store.

The alternative is to leave everything unstructured, and understand the formats minimally and lazily. Laziness is a virtue, right?

madhouse11y ago

Why would I need to be able to parse everything up front? Taking the syslog example, that has a commonly understood format. As a default case, I can just split the parts and have structured data (esp. with RFC5424, where structured data is part of the protocol to begin with).

Then, I can add further parsers for the MESSAGE part whenever I feel like it, or whenever there is need. I don't need that up front.
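
The "just split the parts" default case might look like the sketch below for an RFC 5424 line. The regex is deliberately simplified: it assumes structured data is either `-` or bracketed elements without escaped `]` characters inside, and it leaves the MESSAGE part unparsed. The names `RFC5424` and `parse` are my own.

```python
import re

# Simplified RFC 5424 header pattern. Assumption: structured data is either
# "-" or bracketed elements without escaped "]" characters inside them.
RFC5424 = re.compile(
    r"<(?P<pri>\d{1,3})>(?P<version>\d{1,2}) "
    r"(?P<timestamp>\S+) (?P<hostname>\S+) (?P<app>\S+) "
    r"(?P<procid>\S+) (?P<msgid>\S+) "
    r"(?P<sd>-|(?:\[[^\]]*\])+)"
    r"(?: (?P<msg>.*))?$"
)


def parse(line):
    """Split one syslog line into a dict of fields, or None if it won't match."""
    match = RFC5424.match(line)
    if match is None:
        return None
    record = match.groupdict()
    pri = int(record.pop("pri"))
    # PRI encodes facility * 8 + severity.
    record["facility"], record["severity"] = divmod(pri, 8)
    return record


record = parse(
    "<34>1 2003-10-11T22:14:15.003Z mymachine.example.com su - ID47 - "
    "'su root' failed for lonvick on /dev/pts/8"
)
print(record["facility"], record["severity"], record["app"], record["msg"])
```

A further parser for `record["msg"]` can be added later for the applications that turn out to matter.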

regularfry11y ago

Because in my experience, the interesting stuff isn't in the syslog metadata. It's in the message part. Until you add that further parser, you're grepping.

babuskov11y ago· 2 in thread

If you need to grep logs on regular basis, you're doing it wrong.

Store important data in the database so that you can query it efficiently.

Keep logs for random searches when something unexpected happens. I log gigabytes per day, but only grep maybe once-twice a year.

madhouse11y ago

Agreed. But if I keep my logs in a database, I may as well use the database to query my logs, instead of grepping in them.

(And voila, you have binary log storage.)

babuskov11y ago

Separate the data you are sure you need often and only store that in the database. Store everything else in the textual logs.

alephnil11y ago· 1 in thread

I guess that much of the resistance against the binary logs of systemd is the unfamiliarity and to some extent lack of well known tools for dealing with them. Sysadmins that have years of experience with traditional Unix tools now suddenly have to start almost from scratch when it comes to everyday tools for examining the system. Not only that, programmers are also most familiar with text based formats, and libraries for handling these formats have to become more available in the most popular programming languages and become familiar for programmers that develop tools for analysing systems. Until that happens, sysadmins feel that they are set back by the introduction of binary logs, even if binary logs are technically superior.

wang_li11y ago

It's like no one remembers the reasons we switched away from fixed format records. The biggest of which is that text based logging is a lot more future proof. Sure I might have to change a regex when time stamps improve their resolution to milliseconds, but at least I won't have to rebuild my entire suite and deal with two incompatible binary files on disk.

halayli11y ago· 1 in thread

Logs should be in text. The last thing you want is to find out that your binary format cannot be decoded due to a bug in the logging or because the file got corrupted. Not to mention that you won't be able to integrate with a lot of log systems like Splunk and friends.

On the other hand, if you have logs, you need to store them in a centralized place and have an aging policy, etc... Grepping is definitely not the answer. Systems like Splunk exist for a reason.

madhouse11y ago

Please don't confuse log storage with log transport. We can transform the stored format into any other, if so need be.

(For example, I use Kibana at home. Works great, though I have no text logs stored.)

jeady11y ago· 1 in thread

I think the author is conflating several problems here. There are several ways logs can be used, and efficiency is a scale. For example, if I receive a bug report, I like to be able to locate the textual logs from when the incident occurred and actually just sit and read what was happening at the time. On the other hand, if I'm doing higher-level analysis such as what features do users use most, clearly it's more efficient to have some sort of structure format because you're interested in the logs in aggregate. The author makes it sound like they're advocating optimizing for the aggregate use case at the expense of other use cases. I think that the declaration that textual logs are terrible is an oversimplification of the considerations in play.

Also, if the author has a 5-node cluster producing 100Gbs of logs a day, the logs may also be too verbose or poorly organized. I work on a system that produces 100s of Gbs of logs a day but with proper organization they're perfectly manageable.

I think that a more nuanced solution is to log things that are useful to manual examination in text form, but high-frequency events that are not particularly useful could reasonably be logged elsewhere (e.g. a database or binary log that is asynchronously fed into a database).

In conclusion, as is frequently the case with engineering, I think the author oversimplifies the problem here and tries to present a one-size-fits-all solution instead of taking a more pragmatic solution. Textual logs are useful when meant for human consumption (debugging) and when they can be organized such that the logs of interest at any time are limited in size, and some other binary-based format is useful for aggregate higher-level analysis.

madhouse11y ago

With a binary log storage system, nothing stops you from browsing all logs that happened around the time of the incident. Instead of locating the files, you just tell the engine to show you the logs from that time onwards (or from a little bit before).

As for our logs being too verbose: nope, read the article.

Also, it's not a one-size-fits-all solution: I have no problem with people using text. All the article wants to show is that binary logs are not evil, bad, useless, etc, and that there are actually very good reasons to use them.

For example, storing logs in a database is one kind of binary log storage: most databases don't store the data as text.
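To make the "show me the logs from around that time" idea concrete, here is a minimal sketch using SQLite from Python's standard library. The table layout, the function names (`open_store`, `add_entries`, `around`) and the ISO 8601 timestamp strings are all assumptions made for illustration; this is not the storage engine the article describes.

```python
import sqlite3


def open_store():
    """Create an in-memory log store (hypothetical schema, for illustration only)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE logs (ts TEXT NOT NULL, host TEXT NOT NULL, message TEXT NOT NULL)"
    )
    # The index lets a time-window query avoid scanning every entry.
    db.execute("CREATE INDEX logs_ts ON logs (ts)")
    return db


def add_entries(db, entries):
    """Insert (ts, host, message) tuples; ts is an ISO 8601 string in one timezone."""
    db.executemany("INSERT INTO logs (ts, host, message) VALUES (?, ?, ?)", entries)


def around(db, start, end):
    """Return entries with start <= ts <= end, oldest first."""
    cur = db.execute(
        "SELECT ts, host, message FROM logs WHERE ts BETWEEN ? AND ? ORDER BY ts",
        (start, end),
    )
    return cur.fetchall()
```

Calling `around(db, "2015-05-04T16:02:00", "2015-05-04T16:05:00")` would return only the entries inside that window, across all hosts. Comparing timestamps as strings works here only because every value uses the same fixed-width ISO 8601 format.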

henrik_w11y ago· 1 in thread

One solution to the problem of too much logging data can be what I call "session-based logging" (also known as tracing). You can enable logging on a single session (e.g. a phone call), and for that call you get a lot of logging data, much more than a typical logging system.

This obviously only works when you are troubleshooting a specific issue, not when you need to investigate something that happened in the past (where the logging for the session wasn't enabled). However, it has proven to be an excellent tool for troubleshooting issues in the system.
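As a rough illustration of the idea, here is a sketch using Python's standard `logging` and `contextvars` modules. It is not how AXE or Nobill implement it; the names `current_session`, `TRACED_SESSIONS`, `SessionTraceFilter`, `build_logger` and `handle_call` are hypothetical.

```python
import logging
from contextvars import ContextVar

# Hypothetical names: the session currently being handled, and the set of
# sessions for which verbose tracing has been switched on.
current_session = ContextVar("current_session", default=None)
TRACED_SESSIONS = set()


class SessionTraceFilter(logging.Filter):
    """Pass INFO and above always; pass DEBUG only for traced sessions."""

    def filter(self, record):
        session = current_session.get()
        record.session = session  # make the session id available to the formatter
        if record.levelno >= logging.INFO:
            return True
        return session in TRACED_SESSIONS


def build_logger(stream):
    """Return a logger that writes to `stream` through the session filter."""
    logger = logging.getLogger("session_demo")
    logger.setLevel(logging.DEBUG)
    logger.propagate = False
    logger.handlers.clear()
    handler = logging.StreamHandler(stream)
    handler.setFormatter(
        logging.Formatter("%(levelname)s session=%(session)s %(message)s")
    )
    handler.addFilter(SessionTraceFilter())
    logger.addHandler(handler)
    return logger


def handle_call(logger, session_id):
    """Simulate handling one session (e.g. a phone call)."""
    token = current_session.set(session_id)
    try:
        logger.debug("routing details for %s", session_id)
        logger.info("call connected")
    finally:
        current_session.reset(token)
```

Adding an id to `TRACED_SESSIONS` turns on the detailed output for that session only; every other session keeps logging at the normal level.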

I have used session-based logging both when I worked at Ericsson (the AXE system), and at Symsoft (the Nobill system), and both were excellent. However, I get a feeling that they are not in widespread use (may be wrong on that though), so that's why I wrote a description of them: http://henrikwarne.com/2014/01/21/session-based-logging/

TheLoneWolfling11y ago

Depending on the language, this can be expensive even if you're not actually logging the data.

And it invites timing-based heisenbugs (enable tracing, problem goes away).

Still a neat approach, however.

ghshephard11y ago

If your logs aren't text, and it's a small system, I'm not going to look at them. Therefore they don't exist. That's one reason why people don't like binary logs - they are effectively useless.

On the flip side, if the system is huge - then we can use tools like splunk.

grep/tail/awk are the first three tools I use on any system - if you create logs that I can't manipulate with those three tools, then you haven't created logs for your system that I can use.

bigbugbag11y ago

The title is misleading. I was expecting to discover a better way of dealing with logs in the general case. Instead I got served an attempt by the author to generalize his approach, as if his quite specific use case could apply to the outside world.

Reading this was a waste of my time.

Being a universal open format, text is a better format than binary, unless you don't care about being able to read your data in the future. There are already enough issues with filesystems and storage media, no need to add more complexity to the issue.

agjmills11y ago

The greatest thing that I've found recently was fluentd and elasticsearch - we have fluentd on all of our nodes that aggregate logs to a central fluentd server which dumps all of the data into elasticsearch, then we use kibana as a graphical frontend to elasticsearch.

It took a while to get developers to use it, but now it's indispensable - particularly when someone asks me 'what happened to the 1000 emails I sent last month'

Now I know; previously, the data would have been logrotated.

hxn11y ago

Text logs let me do all the things I want to do.

Grep them, tail them, copy and paste, search, transform them, look at them in less, open them in any editor. I love to write little bash one-liners that answer questions about logs. I can use these one-liners everywhere, anytime.

I don't have any of the efficiency problems the author talks about.

AceJohnny211y ago

The author's use of logs is sophisticated and proactive. Sadly, most Linux installations I've dealt with are lazy and reactive, where logs are kept around "just in case" for future forensics (hah!).

webhat11y ago

I think binary logging is the wrong word to use. As far as I can tell it's not binary he means, but database logging. Storing things in a database sounds far less scary than binary.

At best it's a NUL-separated database structure where the fields are not compressed, which IS greppable: just use \x00 in your regexp. At worst he might mean BER, which is an ASN.1 data encoding structure.

http://en.wikipedia.org/wiki/X.690#BER_encoding
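For the NUL-separated case, a small sketch of matching on a specific field with `\x00` in the regexp. The record layout (fields separated by NUL, records by newline) and the helper name `grep_field` are assumptions for illustration, not a format the article specifies.

```python
import re


def grep_field(blob, index, value):
    """Return records whose field number `index` (0-based) equals `value`.

    Assumed layout: fields separated by NUL bytes, records by newlines.
    """
    # Skip `index` NUL-terminated fields, then require an exact field match.
    pattern = re.compile(
        rb"(?:[^\x00]*\x00){%d}%s(?:\x00|$)" % (index, re.escape(value))
    )
    return [
        record.split(b"\x00")
        for record in blob.split(b"\n")
        if record and pattern.match(record)
    ]
```

The same pattern works from the command line with a grep that understands `\x00` escapes (for example a PCRE-capable one), which is the point being made above: uncompressed NUL-separated fields are still searchable with ordinary regex tools.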

pdkl9511y ago

So some people want a log format that is more structured than plain text lines. That is going to require some sort of specialized tool. So if a dependency is allowable (instead of leaving the log in a format that is already readable by ~everything), why can't the specialized tool generate an efficient index?

A traditional log with a parallel index would be completely backwards compatible, the query tool should work the same way, and you could even treat the index file as a rebuildable cache which can be useful. The interface presented by a specialized tool doesn't have to depend on any specific storage method.
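A minimal sketch of what such a parallel index could look like, assuming a plain text log whose first whitespace-separated token (say, an IP address) is the lookup key. The file layout, the JSON index format and the names `build_index` and `lookup` are hypothetical; the log itself stays untouched and greppable.

```python
import json
import os


def build_index(log_path, index_path):
    """Map the first token of each log line to the byte offsets where it occurs."""
    index = {}
    offset = 0
    with open(log_path, "rb") as log:
        for line in log:
            parts = line.split(None, 1)
            if parts:
                key = parts[0].decode("utf-8", "replace")
                index.setdefault(key, []).append(offset)
            offset += len(line)
    # Record the log size so a later reader can tell when the index is stale.
    with open(index_path, "w") as out:
        json.dump({"size": os.path.getsize(log_path), "offsets": index}, out)
    return index


def lookup(log_path, index_path, key):
    """Return lines whose first token is `key`, rebuilding a missing or stale index."""
    try:
        with open(index_path) as f:
            cached = json.load(f)
        if cached["size"] != os.path.getsize(log_path):
            raise ValueError("stale index")
        index = cached["offsets"]
    except (OSError, ValueError, KeyError):
        # The index is only a cache: throw it away and rebuild from the text log.
        index = build_index(log_path, index_path)
    lines = []
    with open(log_path, "rb") as log:
        for offset in index.get(key, []):
            log.seek(offset)
            lines.append(log.readline().decode("utf-8", "replace").rstrip("\n"))
    return lines
```

Deleting the index file loses nothing, since `lookup` rebuilds it from the text log on the next call. A real tool would likely append to the index incrementally rather than rebuild it whenever the log grows.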

Really, this recent fad of trying to remove old formats in the belief that the old format was somehow preventing any new format from working in parallel reminds me of JWZ's recommendations[1] on mbox "summary files" over the complexity of an actual database. Sometimes you can get the features you want without sacrificing performance or compatibility.

[1] http://www.jwz.org/doc/mailsum.html

zimbatm11y ago

What binary logging solution is the author using if he's not using the systemd journal?

erikb11y ago

Look at a first year computer science student. He will already put prints in his programs and if he is smart and has a bigger assignment he might already start to write other programs to parse that output. You can't beat that, because it is nearly impossible for a newbie to even know that there might be a problem with text logging and that binary logging might be a solution. In fact he might not even know that what he does is called logging. But he is already doing it!

So even if binary logging is way better (I can't say, not enough experience) you simply can't beat text logging, because text logging is natural. It just happens.

print("Hello World!")

michipili11y ago

Of course grepping logs is terrible! Grep is a generic tool, why shouldn't it be defeated by specialised tools?

http://unix-workstation.blogspot.de/2015/05/of-course-greppi...
