
0 points · fauigerzigerk · 9y ago · 0 comments

>They said availability was "death of a child", not dropping log messages.

True, but it appears to me that availability problems and dropped log messages often have the same root cause: network issues.

So whenever they do have availability issues (and dying babies) they won't be able to investigate properly because log messages are being lost as well.

That's obviously a very general observation. It may well be that in their architecture availability issues are mostly caused by something unrelated to networking (e.g. the database).


2 comments · 1 top-level

dkarapetyan · 9y ago · 1 in thread

It would be quite simple to have a two-tiered approach to the logging problem, since they have separated it into two components. One can just write and ship files, while the other is what they have described in terms of providing real-time streaming.

So the question then becomes: what are the failure modes of their logging setup in terms of misbehaving clients? I don't know how Kafka handles misbehaving clients. I suspect it would lead to global effects and a slowdown of the entire cluster because of one or two misbehaving clients, whereas in the current setup local misbehavior stays localized to the nearest aggregator dropping messages. Simple memory usage and other kinds of monitoring can then be used to find these issues and mitigate them accordingly.
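To make the two-tier idea concrete, here is a minimal sketch in Python. It is not their implementation; the class name `TwoTierLogger`, the `demo` helper, and the capacity value are all hypothetical. Every message is appended to a local file (the tier that would be shipped later), while a bounded in-memory queue stands in for the real-time stream and drops messages when it is full, keeping a counter that monitoring could watch.

```python
import queue
import tempfile
from pathlib import Path


class TwoTierLogger:
    """Hypothetical sketch: a durable file tier plus a lossy streaming tier."""

    def __init__(self, log_path, stream_capacity):
        self._file = open(log_path, "a", encoding="utf-8")
        # Bounded queue standing in for the real-time streaming path.
        self._stream = queue.Queue(maxsize=stream_capacity)
        self.dropped = 0  # counter a monitoring system could scrape

    def log(self, message):
        # Tier 1: always append to the local file, to be shipped later.
        self._file.write(message + "\n")
        self._file.flush()
        # Tier 2: best effort; drop locally instead of blocking the caller.
        try:
            self._stream.put_nowait(message)
        except queue.Full:
            self.dropped += 1

    def drain(self):
        """Return everything currently buffered for the streaming tier."""
        items = []
        while True:
            try:
                items.append(self._stream.get_nowait())
            except queue.Empty:
                return items

    def close(self):
        self._file.close()


def demo():
    """Log five messages through a stream that only has room for two."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "app.log"
        logger = TwoTierLogger(path, stream_capacity=2)
        for i in range(5):
            logger.log(f"msg {i}")
        streamed = logger.drain()
        logger.close()
        file_lines = path.read_text(encoding="utf-8").splitlines()
    return file_lines, streamed, logger.dropped
```

In `demo`, the file ends up with all five messages while the stream only sees the first two, so the drop is visible in `dropped` and recoverable from the file. A noisy client only overflows its own buffer.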

This is still a heck of a lot simpler than using Kafka and worrying about all sorts of weird distributed-system failure modes. I'm sure Kafka got them started initially, but continuing to use it is like using a sledgehammer to kill a fly. For their use case this setup is the correct one, and migrating to Kafka will still be possible if it becomes necessary. So in my view this is proper engineering. They've made all the right trade-offs instead of just chasing fads and trends.

fauigerzigerk (OP) · 9y ago

>It would be quite simple to have a two-tiered approach to the logging problem, since they have separated it into two components. One can just write and ship files, while the other is what they have described in terms of providing real-time streaming.

Yes, they absolutely could do that, but they apparently don't. And maybe that's because they would lose a lot of the simplicity they won by ditching Kafka.

Anyway, I didn't want to defend Kafka specifically. The one time I considered it, I ended up not using it because it seemed too heavyweight for my use case in terms of memory usage and complexity.
