Logging: Unsexy, Important, and now Usable.

(roadtofailure.com)

24 points · gsteph22 · 16y ago · 28 comments

21 comments · 9 top-level

gsteph22 (OP) · 16y ago · 4 in thread

It's true that LogSearch is similar, but we focus on cloud, analytics, ease-of-use and scalability -- each of which we've heard Splunk lacks.

n8agrin · 16y ago

I don't mean to come off sounding like a mouthpiece for Splunk, but Splunk does work with "cloud" data (hurray for buzzwords). It is dead simple to use out of the box, recognizes many special fields like timestamps without any configuration (and provides extensive configurability when needed), and scales like the distributed monster it is: it can handle 10-15 MB of data/sec on a single machine and GBs' worth of data per day when scaled out across multiple machines. You should try it before you judge it on hearsay (and I'll maintain that you should also try any Splunk competitor before judging it as well).

matrix · 16y ago

I have tried Splunk. It's a great concept and I commend the Splunk team for building such an easy to use, polished product. However we found the query performance to be... ahem... not well geared towards large data sets. This combined with the licensing model meant it wasn't an option for us.

I just wish Yahoo would open source Everest (their multi-PB column store DB based on PostgreSQL) -- this would be ideal for building an open source Splunk competitor.

n8agrin · 16y ago

I misspoke. Splunk handles multiple TBs per day easily in a distributed environment.

gsteph22 (OP) · 16y ago

We have tried them :) And so have several of our customers.

randolph_carter · 16y ago · 3 in thread

Out of curiosity, can you be more specific on what scalability, analytics, and ease-of-use improvements LogSearch offers vs. Splunk in particular? We use Splunk internally and have not found it to be significantly lacking in these areas but are always open to alternatives. Additionally, what is the advantage of relegating log management to a cloud application? Typically log data is relatively sensitive; do you provide assurances that the data won't be misused, lost, repurposed, etc.?

gsteph22 (OP) · 16y ago

Thanks! Actually our software runs in the cloud or in your datacenter. If it's in the cloud, it can be encrypted in transmission (or stored encrypted if you're willing to take the performance hit).

If it's sensitive data, I'd recommend just spinning up your own cluster and installing the tool.

randolph_carter · 16y ago

Thanks for the additional context... like one of the other posters mentioned, I think once your target organization has evolved beyond the capabilities of the existing products in this space, the knowledge and analytics built into the product become the differentiator. Scalability is assumed. For known problems, the real value of a log analysis tool is in the knowledge base you buy with the product and the canned reports you can get out of it; that's why there are specialized SIM/SIEM products out there. If it is an unknown problem, your customer will end up generating the knowledge base themselves anyway. Unfortunately, many of your customers may not even know what their problems are, and will expect you to sell them an "Easy button" solution...

All that said, good luck in your endeavors!

gsteph22 (OP) · 16y ago

Oh, and as far as scale goes: we run at "web scale", which means anywhere from a dozen to hundreds or more computers, each generating tons of logs. That's why we're built on Hadoop. Scale is the main complaint our customer research indicated people have with tools like Splunk.

n8agrin · 16y ago · 2 in thread

"We want to be like Geordi in Star Trek, who can see everything on the Enterprise with a few finger presses. I wish there was such a thing! The closest we can get is ganglia, which monitors OS-level events (it can be hacked for apps)."

I disagree. Large software companies already exist in this space: Splunk, LogLogic, ArcSight, etc. It sounds like the author reinvented Splunk in particular. (Disclaimer: I work for Splunk and can see everything in your datacenter with a few finger presses. Call me Geordi LaForge.)

eru · 16y ago

I hope it's not as painful as Geordi's visor.

RyanMcGreal · 16y ago

The client app is called BananaClip.

ajross · 16y ago · 2 in thread

FTA: This means in order to find what you want, you need to explore the data: you need to search it. The only tool most of us can use is grep

Um... what? Have the authors gotten stuck in a time vortex and been dropped off before, y'know, awk? Much less perl, or any of those new-fangled toys.

I mean: writing scripts to do log analysis is a pretty fundamental problem for server-side development, and lots of very smart people have spent the last two^H^H^Hthree decades working on tools to address the issue.

I don't even see how this (indexing the entries across a Hadoop cluster) is all that useful. In general, you don't do log analysis by asking "give me all the entries that match this pattern"; you do it by walking them in order, extracting one or two fields from each line, and building some kind of result data structure. This thing would be fine if you were asking for all the log messages that mentioned "coffee", I guess. But what if you wanted a histogram of hit counts per page per day-of-week?
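
For concreteness, the "walk the lines, extract a couple of fields, build a result structure" approach could look roughly like the sketch below. It assumes Common Log Format style access lines, which the thread never specifies; the regex, the function name `hits_per_page_per_weekday`, and the sample lines are all made up for illustration.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed line shape (Common Log Format); the thread does not name a format.
LINE_RE = re.compile(
    r'\S+ \S+ \S+ \[(?P<ts>[^\]]+)\] "(?:\S+) (?P<page>\S+)[^"]*" \d{3} \S+'
)

DAYS = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")


def hits_per_page_per_weekday(lines):
    """Walk the lines in order and count hits keyed by (page, weekday)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip lines that do not fit the assumed format
        when = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        counts[(m.group("page"), DAYS[when.weekday()])] += 1
    return counts


# Invented sample data, including one line that should be skipped.
SAMPLE = [
    '10.0.0.1 - - [04/Jan/2010:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 512',
    '10.0.0.2 - - [04/Jan/2010:11:30:00 +0000] "GET /index.html HTTP/1.1" 200 512',
    '10.0.0.1 - - [05/Jan/2010:09:15:00 +0000] "GET /about.html HTTP/1.1" 200 128',
    "not a log line",
]

if __name__ == "__main__":
    for (page, day), n in sorted(hits_per_page_per_weekday(SAMPLE).items()):
        print(f"{page:15} {day} {n}")
```

A single pass with a `Counter` is enough here because the histogram only needs running totals, not the matching lines themselves.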

gsteph22 (OP) · 16y ago

Thanks for providing this valuable perspective. I sort of group grep/Perl scripts/everything else together as manual processes. What I was getting at is that the whole "roll your own" scripts approach is a royal pain in the ass.

For analytics, you're right, search is only part of the equation. That's why we make MapReduce easy to use on a cluster. You can write Pig or Hive scr

We also have templates for common data formats (and ways to roll your own) so you can turn unstructured log text into structured data, so that a histogram of hit counts per day-of-week is just a few lines of a script (or maybe even a search).
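
As a rough illustration of the template idea (not LogSearch's actual templates or API, which the thread does not show), a template can be modelled as a named regex whose named groups become the fields of a structured record. The registry `TEMPLATES`, the helpers `register_template` and `parse`, and both patterns below are hypothetical.

```python
import re

# Hypothetical template registry: template name -> regex with named groups.
# Illustrative only; these are not LogSearch's real templates.
TEMPLATES = {
    "syslog": re.compile(
        r"(?P<month>\w{3})\s+(?P<day>\d{1,2}) (?P<time>\d{2}:\d{2}:\d{2}) "
        r"(?P<host>\S+) (?P<program>[^\[:]+)(?:\[(?P<pid>\d+)\])?: (?P<message>.*)"
    ),
}


def register_template(name, pattern):
    """Add a roll-your-own template for a format the registry lacks."""
    TEMPLATES[name] = re.compile(pattern)


def parse(line, template):
    """Return the line's fields as a dict, or None if the template does not match."""
    m = TEMPLATES[template].match(line)
    return m.groupdict() if m else None


if __name__ == "__main__":
    print(parse("Jan  4 10:00:00 web1 sshd[1234]: Accepted password for bob", "syslog"))
    # A custom format for an imaginary application log.
    register_template("app", r"(?P<level>[A-Z]+) (?P<user>\w+) (?P<action>\w+)")
    print(parse("INFO alice login", "app"))
```

Once lines are dictionaries, the day-of-week histogram mentioned above reduces to grouping on one or two keys.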

gsteph22 (OP) · 16y ago

Looks like I got cut off in the middle. *You can write Pig or Hive scripts to generate interesting analytics.

bravura · 16y ago · 1 in thread

I worry, like the other posters, that cloud logging + search is a solution to a problem that doesn't exist.

gsteph22 (OP) · 16y ago

It's not only the cloud, it's also in the datacenter :)

jawn · 16y ago

Hi, I work in the log importation services business.

From my perspective the main hurdle to log aggregation/correlation is not scalability. If Splunk doesn't cut it for your performance needs, you have probably hit the price point where you can afford a LogLogic or similar appliance.

Instead the barrier to entry is in the number of applications supported by a particular log archival product, and the ability to correlate across the different applications.

As I'm sure you know at this point, adding support for log types is a painstaking task. Most vendors punt on this and tell customers to do it themselves.

If there is a niche available to you as a startup I would think that it would be in offering a very low turnaround time in supporting new log types. For example: give us some log sources and we'll support and categorize your logs with our service.

As for running in the cloud on large datasets, I think you'll find that most customers are not going to want to double or triple their outgoing bandwidth, in addition to having concerns from a security compliance standpoint.

That being said, good luck in your venture. Logging is a mess, and could certainly use some clean up. :)

xal · 16y ago

For 99% of all companies a simple unix box that collects log files and moves them along for permanent storage a few days later (S3/Tape) is enough. At Shopify we use Clarity which is a web frontend for Grep and Tail that we released as open source: http://github.com/tobi/clarity
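
A minimal sketch of the "collect, then move along a few days later" step, assuming a local archive directory stands in for S3 or tape. The function name `archive_old_logs`, the `*.log` glob, and the three-day default are assumptions, not anything Clarity or Shopify is stated to use.

```python
import shutil
import time
from pathlib import Path


def archive_old_logs(log_dir, archive_dir, max_age_days=3, now=None):
    """Move *.log files older than max_age_days from log_dir into archive_dir.

    archive_dir is a local stand-in for permanent storage (S3, tape, ...).
    Returns the names of the files that were moved.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # age threshold in seconds
    archive = Path(archive_dir)
    archive.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in sorted(Path(log_dir).glob("*.log")):
        # Age is judged by last-modified time, which is an assumption.
        if path.is_file() and path.stat().st_mtime < cutoff:
            shutil.move(str(path), str(archive / path.name))
            moved.append(path.name)
    return moved
```

Run from cron once a day, something like this keeps recent logs on the box for grep and tail while older ones leave for cheaper storage.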

gsteph22 (OP) · 16y ago

The main thing is: we focus on bringing a cost and performance advantage to the small and medium companies, to make their lives a little better.

spudlyo · 16y ago
