Syslog-NG can already split your log files into subdirectories by the hostname of each server, but it can also redirect messages to named pipes. This is great because you can pipe them into MySQL and stuff all of your log messages into a database. Combine that with a PHP front-end and now your developers and sysadmins can search logs intelligently across multiple servers, and get really fine-grained with their search strings. Want to tail the output from all Tomcat servers in your app server pool looking for a specific string? Go right ahead.
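As a sketch, a pipe destination in syslog-ng can even template each message as an INSERT statement, so a trivial consumer just feeds the FIFO to the mysql client. All the names and paths here are illustrative (and quoting/escaping of $MSG is left out for brevity); $HOST, $FACILITY, $PRIORITY, and $MSG are standard syslog-ng macros:

```
# create the FIFO first:  mkfifo /var/run/syslog-mysql.pipe
destination d_mysql {
    pipe("/var/run/syslog-mysql.pipe"
         template("INSERT INTO logs (host, facility, priority, msg) VALUES ('$HOST', '$FACILITY', '$PRIORITY', '$MSG');\n"));
};
log { source(s_net); destination(d_mysql); };
```

Then something like `mysql -u logger logdb < /var/run/syslog-mysql.pipe`, wrapped in a loop or supervised, drains the pipe into the database.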
Has anyone else experienced this? Is it just a simple configuration tuning problem?
Rather unlikely unless your deployment is very large or you're doing extraordinarily expensive filtering/binning on the sink-host.
The first bottleneck is normally disk space, not disk I/O. Those logs pile up very quickly, depending on how long you retain them.
The raw network and disk I/O, however, are rarely of concern. Before you approach either limit you're already logging to the tune of ~300G per hour - and have probably switched to a distributed architecture of some sort long ago.
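To put that figure in perspective, ~300 GB/hour is already most of a gigabit link, sustained:

```python
# back-of-envelope: what ~300 GB/hour of log traffic means for a NIC
gb_per_hour = 300
mb_per_sec = gb_per_hour * 1000 / 3600   # sustained write rate in MB/s
mbit_per_sec = mb_per_sec * 8            # same rate on the wire
print(f"{mb_per_sec:.0f} MB/s, {mbit_per_sec:.0f} Mbit/s")  # → 83 MB/s, 667 Mbit/s
```

Any single spinning disk or GigE port handles that sequential stream comfortably, which is the point: you'll redesign for other reasons long before raw I/O is the wall.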
Writing broad streams of sequential text is very cheap.
Making sense of what you wrote, ideally before actually writing it, but at the very least before being forced to purge it due to storage constraints, is the difficult part. ;-)
Raising the limits on open files and tuning TCP timeouts and cookies were all we needed. Total volume was on the order of high tens of GB.
If you want to unpack offline, contact details are in my profile.
If you're running a whole datacenter's worth of logs into your loghost, then you might want to consider a distributed approach. For medium-sized companies, I don't see why rsyslog wouldn't do the trick.
On the other hand, a lot of high-security environments want to encrypt their syslog traffic using something like stunnel, which introduces OpenSSL overhead as well as TCP connection overhead. With thousands of clients and lots of encryption going on, you are definitely going to hit some limits sooner rather than later. Check the kernel parameter net.ipv4.ip_local_port_range (on Linux) and make sure you have a large enough range to accommodate all of the clients.
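Checking and widening the range looks roughly like this (the 10240-65535 range is just an example; pick what fits your environment):

```
# check the current ephemeral port range
sysctl net.ipv4.ip_local_port_range

# widen it persistently in /etc/sysctl.conf, then apply with `sysctl -p`
net.ipv4.ip_local_port_range = 10240 65535
```

Each stunnel-wrapped client connection eats a local port on the terminating side, so the default range can run out well before you expect.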
Besides longer log messages (arbitrarily long, with a recompile) and reliable delivery, it obviates my main use for logrotate, since it can be configured to write to a filename (including the directory) based on time, date, or other variables.
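For example, a destination along these lines (paths illustrative; $HOST/$YEAR/$MONTH/$DAY are standard syslog-ng macros) starts a fresh file each day with no rotation step at all — you only need something to prune old directories:

```
destination d_dated {
    file("/var/log/hosts/$HOST/$YEAR-$MONTH-$DAY/messages.log"
         create_dirs(yes));
};
```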
Splunk, given its cost and complexity, is almost never right for startups.
Non-ng syslog is, on the other hand, so simplistic that it's not worth the effort of fancy configuration. Is there some kind of compelling advantage that I've been overlooking?
I never quite understood the conceit that every environment is a precious-and-unique snowflake requiring careful evaluation of any given tool.
We use scribe and we have a stdin2scribe program (Python) that can be used to hook into any log output (like Apache access and error logs). We have it set up as a two-tier system: all systems that we'd want to log from run a "scribe leaf" on a port on localhost, and this forwards all logs to a "scribe aggregator" (behind a load balancer), with buffering space on the local disk for when the aggregator cannot be contacted. It's a pretty solid system and I recommend it.
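The core of a stdin2scribe-style program is just reading lines and grouping them into batches, since scribe's Thrift Log() call accepts a list of LogEntry(category, message) structs. batch_lines below is a made-up helper sketching only that batching half; the real program would hand each batch to the Thrift-generated scribe client:

```python
def batch_lines(lines, category, max_batch=50):
    """Group raw log lines into (category, message) pairs, batched so
    each scribe Log() call can carry several messages at once.
    Usage sketch: for batch in batch_lines(sys.stdin, "apache"): ..."""
    batch = []
    for line in lines:
        batch.append((category, line.rstrip("\n")))
        if len(batch) >= max_batch:
            yield batch
            batch = []
    if batch:  # flush whatever is left at EOF
        yield batch
```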
We also have services, and command line and library interfaces to those services, that let you grab all the logs that came in during a certain time frame, or tail all the data coming into the aggregators in real time (one of them is a wrapper around a more generic tool that just tags the logs; the wrapper takes grep-style filtering arguments and the output is piped to a pretty printer).