Show HN: Homelab Monitoring Setup with Grafana (opens in new tab)

(blog.randombits.host)

155 pointsconor_f3y ago83 comments

83 comments

68 comments · 17 top-level

bovermyer3y ago· 9 in thread

I'm in the process of building out a Grafana stack (Prometheus, Loki, Tempo, Mimir, Grafana) for my day job right now.

...and also for one of my side projects, OSRBeyond.

It's easy to get overwhelmed by all the moving pieces, but it's also a lot of _fun_ to set up.

danwee3y ago

> It's easy to get overwhelmed by all the moving pieces

Exactly my thoughts! Isn't there something (open source and as good as Prometheus+Grafana) that doesn't have as many moving parts as the stack used by OP? I can imagine there are many use cases for that: from side projects (homelabs) to small startups that don't have huge distributed systems, but still need monitoring (without relying on third-parties).

Ideally, my setup would be:

- install an agent in each server I'm interesting in gathering metrics from. In this regard, Prometheus works just fine

- one service to handle logs/metrics/traces ingestion and that allows you to search and visualize your stuff in nice dashboards. Grafana works, but it doesn't support logs and traces out of the box (you need Loki for that)

So, basically 2 pieces of software (if they can be installed by just dropping a binary, even better)

dracyr3y ago

I think there's nothing currently that combines both logging and metrics into one easy package and visualizes it, but it's also something I would love to have.

Vector[1] would work as the agent, being able to collect both logs and metrics. But the issue would then be storing it. I'm assuming the Elastic Stack might now be able to do both, but it's just to heavy to deal with in a small setup.

A couple of months ago I took a brief look at that when setting up logging for my own homelab (https://pv.wtf/posts/logging-and-the-homelab). Mostly looking at the memory usage to fit it on my synology. Quickwit[2] and Log-Store[3] both come with built in web interfaces that reduce the need for grafana, but neither of them do metrics.

- [1] https://vector.dev - [2] https://quickwit.io/ - [3] https://log-store.com/

3 more replies

Cyph0n3y ago

I use Telegraf (collector) + Influx (storage) + Grafana (visualization and alerting). Telegraf is amazingly simple to use for collection and has a ton of plugins available.

2 more replies

alexose3y ago

My simple-as-dirt solution is generally to use InfluxDB + Grafana. InfluxDB provides a nice HTTP interface that all of my devices simply POST to. I write all the queries myself, because I find that it's a heck of lot easier than to track down individual agents/plugins that actually work.

nullify883y ago

The closest thing may well be Elasticsearch (and Kibana for visualisation), if you are fine with the Elastic license. As its document format is very flexible, it can be used to store logs, metrics, and traces. It'll be a solution inferior to specialised tools like Prometheus, Mirmir, Tempo, though. And some may be put off by the difficulty of running Elasticsearch.

Alternatives could be other general purpose databases.

gmuslera3y ago

Telegraf is a single agent that collects a nice amount of metrics and send it to many databases. I prefer to use telegraf (and scripts) to collect the metrics into influxdb and then grafana.

Telegraf have some log parsing/extraction functionality, but for something more generic promtail+loki would be better.

computershit3y ago

> Grafana works, but it doesn't support logs and traces out of the box

Grafana doesn't support anything out of the box by that logic. Before you get any viz in Grafana you have to add a data source cmon.

stasmo3y ago

I've had great success using this helm chart to install the entire stack into my EKS clusters. Even if you're not using Kubernetes, it's still a useful example for how everything should fit together. https://github.com/prometheus-community/helm-charts/tree/mai...

Good luck! It's a lot.

nullify883y ago

Ditto. I've recently just completed a migration from Thanos to Mimir, and I've found that its much easier to operate and administrate. Still stuck on Elastic for logs but I'm slowly convincing developers Loki can be just as effective.

sjsdaiuasgdia3y ago· 8 in thread

This confirms to me what I suspected when I was trying to determine whether to host my own Grafana stack or use the Grafana Cloud free tier - that I'd end up spending a ton of time fiddling with a constellation of services I didn't actually care about that I could spend on the projects and services I do care about.

I've not found it too hard to stay within the limits of the free tier. The 10 dashboards limit is the main one that actually constrains me, but I just put more stuff on each dashboard and live with the scrolling. The free retention is not great but it's good enough for my purposes.

adql3y ago

IIRC grafana cloud requires to use their importer which was a no-start for me.

Also 14 days retention is not useful for home, I want to know temperature and power stats from last winter, not from last 2 weeks.

Even the "first paid" tier contains only 13 months of retention

I just used VictoriaMetrics all-in-one binary for home stuff + grafana as visualisation

heinrichhartman3y ago

I use Grafana Cloud with OpenTelemery without problems.

trallnag3y ago

What database do you use for storing metrics?

gurchik3y ago

If you are running Kubernetes in your homelab then, for better or for worse, the Prometheus helm chart abstracts all of this away. The default Helm values worked perfectly for me to gather metrics from my cluster and make a quick dashboard in Grafana. Other than increasing the default size of the Prometheus storage volume and configuring the node exporter for a non-Kubernetes host I wanted metrics from, I didn't have to touch anything.

nullify883y ago

Alternatively the 3 chosen tools to ship metrics and logs (CAdvisor, promtail, node-exporter) could be replaced with an all-in-one tool such as Vector or Telegraf. If you wanted to slim it down further, Netdata accomplishes what those 3 tools and Prometheus can do in a really nice UI.

If the poster hosted those services in a single node k3s or something, the kube-prometheus-stack helm chart is able to deploy a lot of those tools easily.

_joel3y ago

> This confirms to me what I suspected when I was trying to determine whether to host my own Grafana stack or use the Grafana Cloud free tier - that I'd end up spending a ton of time fiddling with a constellation of services I didn't actually care about that I could spend on the projects and services I do care about.

This. Although it can be fun to learn, I've done that, got the t-shirt (literally from a conference)

adql3y ago

From my experience once setup it's pretty much "touch never aside from updates" type of deal. I had one stack based off influxdb for 5+ years, now changed the backend for victoriametrics (mostly because a lot of stuff supports prometheus-likes), and again, not much touching after setup.

But I did similar stuff for work so I already had the skills.

LaGrange3y ago

It's more moving part than it should be, but it's not so bad - I set up an equivalent locally in one evening. Sure beats keeping my private server logs off-site.

BrandoElFollito3y ago· 6 in thread

I self host for years about 30 services, out of these 3 are vital (bitwarden, home assistant and pihole).

I work in IT, I am a geek so I tried a few monitoring systems and wrote two myself.

Then I realized that I have self-sustaining, 24/7 monitoring agents: wife and children.

I gave up trying to have the right stack and just wait for them to yell.

Seriously: it works great and it made me wonder WHY I am trying to monitor. Turns out this is more for the fun, discovery of tools than a real need at home.

oaktowner3y ago

Reminds me of the (possibly apocryphal) monitoring that was in place when Healthcare.gov was launched: they had a TV tuned to the news, and the news would tell them whenever the site crashed!

Breza3y ago

Oh hey, that was basically my job! After getting out of grad school, I worked overnight for an HHS contractor writing media summaries for the Secretary. Every night, we would gather all the news stories about the topics HHS told us they wanted to track. At 6:00am, we'd publish the Secretary's briefing and at 7:30am we'd publish the org-wide briefing. To this day, I possess more trivia about the Healthcare.gov launch than I will ever need for any conceivable reason.

applied_heat3y ago

I monitor so that when there is a problem I have some data I can use to troubleshoot the problem and identify possible solutions.

lcnPylGDnU4H9OF3y ago

This is the first time I've heard a parent refer to their currently-living-at-home child as "self-sustaining".

akiselev3y ago

Usually that means they know how to microwave their own hot pockets and open a can of spaghetti-Os.

1 more reply

BrandoElFollito3y ago

It depends on the age.

When they were young they were definitely not self sustaining.

As teenagers they now live on food (either provided when it meets their standards, or the one they cook themselves), water and wi-fi.

tacker20003y ago· 5 in thread

I have been using Zabbix to monitor my servers for the last years, since I wanted something simple and this Grafana/Prometheus stack always scared me because, as the OP says, of the amount of “moving parts”.

Zabbix has been quite solid and has lots of templates for different servers (linux, windows, etc), triggers and can also monitor docker containers (although i never tried that).

The only thing Zabbix cant do well is log file monitoring, so I am considering something like an ELK stack as an addition.

maxmalkav3y ago

I utterly dislike Zabbix (enough to login here and complain). I guess that if it fits your needs is all good and fine, but as someone that has been in charge of defining and feeding it with LLD rules and registering multi-dimension metrics with Zabbix Sender, I feel scarred by it.

I cannot find my way around the Zabbix web interface neither and most of the templates, rules and macros system confused me, deeply.

On the other hand we have a Prometheus + Grafana stack for another system and the model makes all the sense to me. I guess that there is something in time series and graph plotting that just clicks with me.

ASalazarMX3y ago

I've monitored two whole sites with Zabbix, dozens of servers at different companies, and everyone was very satisfied with it, myself included. Zabbix has extensive documentation, no one should be confused after reading the manual, all is explained there. I've fed it through zabbix_sender too, and while it can be a complicated setup, if you design it well it will seldomly need maintenance.

ASalazarMX3y ago

For a homelab use case, Zabbix is all you need. No grafana or ELK or Kibana orother overcomplicated solution, just a Zabbix instance. Simple TCP checks will cover most of your services, and there's web monitoring for special cases. Beyond that and CPU/RAM/Storage monitoring, there's nothing much else to do.

stcroixx3y ago

I'm satisfied with Zabbix too. With something like what OP described, I'd always be worried some integration between all these 'moving pieces' could break and my monitoring would be down without me knowing. Definitely appreciate simplicity with regards to monitoring.

Jnr3y ago

I use Zabbix and Grafana. Grafana has a zabbix data source plugin so you can have best of both worlds really.

adql3y ago· 4 in thread

I've found VictoriaMetrics all-in-one binary to be perfect size for home at the very least for metrics gathering.

Supports Prometheus querying and few other formats for ingesting so any knowledge bout "how to get data into prometheus" applies pretty much 1:1 + their own vmagent is pretty advanced. Not related to company in any way, just a happy user.

https://victoriametrics.com/

donmcronald3y ago

I’ll never understand how companies with a UI focused product end up with websites that don’t have any screenshots of the UI. I spent over a minute on the site and couldn’t find a screenshot.

dengolius3y ago

Could you explain why you had that thought? AFAIK VictoriaMetrics is a backend product that is positioned as an alternative to Prometheus/Graphite/InfluxDB/OpenTSDB, perfectly works with Grafana. It has its datasource for Grafana and yes, it has integrated VMUI which you can try at https://play.victoriametrics.com.

1 more reply

ianschmitz3y ago

It’s a database

applied_heat3y ago

I haven’t started using it yet but i identified Victoria metrics as the first time series database I would try as a replacement for our wonderware historian so I won’t have to use AVEVA’s half baked web dashboard product and can use grafana or something else sane instead

majkinetor3y ago· 4 in thread

Is there anything easier for logs? Basically glorified ripgrep?

mekster3y ago

Don't use Grafana for logs. Its (or any other basically, including Kibana or Graylog) interface sucks and I used Metabase which has much more friendly interface to show logs in tabular format instead of throwing raw logs like the others.

I collect logs with Vector on each instances and sent to central ClickHouse which Metabase reads from.

majkinetor3y ago

Interesting. I am already trying that with dBeaver and ClickHouse with docker.

Used this tutorial:

https://clickhouse.com/docs/en/integrations/vector

My services usually produce around 2GB of log data per day. From quick read on the CH I beleive it should not be a problem. Not sure how big the database it is but zip compressed log data is around that size for entire month.

jacksgt3y ago

I think Loki is pretty much the easiest thing you can find (if you want it to be multi server, at least). Loki whole approach comes down to avoiding expensive indexing (compared to Elastic search et all.), and really on "grep" for searching instead.

majkinetor3y ago

What I hate about Elastic is it special grep syntax I can never get right...

I tried loki around v1.0 and it didn't seem to offer much back then...

conor_fOP3y ago· 3 in thread

Hey everyone, this is a post I've been working on the past few months about setting up my own monitoring stack with Grafana for my home server.

I'd love your feedback on how this process could be easier for me, some resources on learning the Grafana query languages, and general comments.

Thanks for taking the time to read + engage!

bombcar3y ago

What does the monitoring actually do for you? I've seen these setups, even setup one for myself a few times (either Grafana or similar such as Netdata, or Linode's Longview) but I've not really seen what it does for me beyond the "your disk is almost full" warnings.

Cyph0n3y ago

I recently setup basic monitoring using Telegraf + Influx + Grafana. Here are the alert triggers, in order of importance (imo):

* ZFS pool errors. Motivator: one of my HDDs failed and it took me a few days to notice. The pool (raidz1) kept chugging along of course.

* HDD and SSD SMART errors

* High HDD and SSD temperatures

* ZFS pool utilization

* High CPU temperature. Motivator: one of my case fans failed and it took a while for me to notice.

* High GPU temperatures. Motivator: I have two GPUs in my tower, one of which I don't really monitor (used for transcoding).

* High (sustained) CPU usage. I track this at the server level, rather than for individual VMs.

1 more reply

aseipp3y ago

Continuous performance monitoring of a service, from its inception. I'm building a storage service using SeaweedFS and also a web UI for another project. One thing I'm looking at doing is using k6[1] in order to do performance stress testing of API endpoints and web frontends on a continuous basis under various conditions.[2] For example, I'm trying to lean hard into using R2/S3 for storage offload, so my question is: "What does it look like when Seaweed offloads a local volume chunk to S3 aggressively, and what is the impact of that in a 90/10 hot/cold split on objects?" Maybe 90/10 storage splits are too aggressive or optimistic to hit a specific number. Every so often -- maybe every day at certain points, or a bigger global test once a week -- you run k6 against all these endpoints, record the results, and shuffle them into Prometheus so you can see if things get noticeably worse for the user. Test login flows under bad conditions, when objects they request are really cold or large paginations occur, etc.

You can run numbers manually but I think designing for it up front is really important to keep performance targets on lock. That's where Prometheus and Grafana come in. And I think looking at performance numbers is a really good way to help understand systems dynamics and helps you ask why something is hitting some threshold. On the other hand, there are so many tools and they're often fun to play with, it's easy to get carried away. There's also a pretty reasonable amount of complexity involved in setting it up, so it's also easy to just say fuck it a lot of times and respond to issues on demand instead.

[1] http://k6.io/, it's also a Grafana project.

[2] It can test both normal REST endpoints but also browsers thanks to the use of headless chrome/chromium! So you can actually look at first paint latency and things like that too.

shrx3y ago· 3 in thread

Mildly related: can anyone recommend a time series database that supports easy aggregation by week (with the ability to configure the start of the week) and month? I'm looking for something to switch from InfluxDB which I'm currently using. The linked article is using Prometheus which also doesn't appear to support this functionality.

sohooo3y ago

You could take a look at Postgres + TimescaleDB extension, which offers a nice time_bucket() function on its hypertables[1]. You can also materialize using continuous aggregates („self updating“ materialized views).

1: https://docs.timescale.com/api/latest/hyperfunctions/time_bu...

shrx3y ago

Thanks, this looks exactly what I want. Sensible interval origins [1] too (January 1, 2000 for months and years, and January 3 2000, a Monday, for weeks) and also configurable.

[1] https://docs.timescale.com/use-timescale/latest/time-buckets...

cocire3y ago

would love an answer to this as well! something with great Python (Flask, maybe even SQLAlchemy) support would be cool too

artisin3y ago· 3 in thread

I went down the Grafana rabbit hole, and without a doubt, it's a fantastic tool. It can handle just about any kind of data you throw at it, and when it comes to visualizing time series data, it's second to none. That said, it's a slog to set up and configure, but once finished, I had a beautiful dashboard for my home media server, and life was good. Unfortunately, a few months later, I was forced to upgrade and lacked the time to reconfigure Grafana. So, as a stopgap, I installed Netdata... fast-forward two years, and today I still haven't reconfigured Grafana, nor do I plan to.

For my use case, a home media server, Netdata turned out to be way simpler to set up, and, most importantly, way less of a hassle/dink-around. It's a basic plug-and-play operation with auto-discovery. While the dashboard isn't nearly as beautiful or configurable, it gets the job done and provides everything I pretty much need or want. It offers a quick overview, historical metrics (over a year of data) to analyze trends or spot potential issues, and push/email notifications if something goes awry.

If you decide to go down this route, there are two major items:

1. You'll need to configure the dbengine[1] database to save and store historical metric data. However, I found the dbengine configuration documentation to be a bit confusing, so I'll spare you the trouble - just use this Jupyter Notebook[2]. If needed, adjust the input, run it, scroll down, and you'll see a summary of the number of days, the maximum dbengine size, and the yaml config, which you can copy, paste, and voila.

2. If you're hoarding data, you'll probably want to set up smartmontools/smartd[3] in a separate Docker container for better disk monitoring metrics. However, I think you can enable hddtemp[4] with Netdata through the config if you don't want or need the extra hassle. You can have Netdata to query this smartd container, but with a handful of disks, it ends up timing out frequently, so I found it's best to simply set up smartd/smartd.conf to log out the smartd data independently. Then all you need to do is tell Netdata where to find the smartd_log[5], and Netdata handles the rest.

Boom, home media server metrics with historical data, done. It still takes a bit of time to set up, but way less than Grafana. Anywho, hopefully, this saves you from wasting as much time as I did. And if you're looking for a smartd reference, shoot me a reply, and I'll tidy up and share my Docker config/scripts and notes.

[1] https://learn.netdata.cloud/docs/typical-netdata-agent-confi... [2] https://colab.research.google.com/github/andrewm4894/netdata... [3] https://www.smartmontools.org/wiki [4] https://github.com/vitlav/hddtemp [5] https://learn.netdata.cloud/docs/data-collection/storage,-mo...

zokier3y ago

Is there way to self-host the netdata "cloud"?

artisin3y ago

Unfortunately, no. At the same time, you don't need their "cloud" to use or run an instance of netdata. From what I gather, it's more or less intended to help monitor multiple instances of netdata on different machines. When I first installed netdata, I initially thought it required the use of their cloud to store historical metrics, but this is not the case. You just need to configure the dbengine, as I mentioned in my post, and you'll be good to go without their cloud.

Jemaclus3y ago

There's an Open Source version: https://github.com/netdata/netdata

I don't know if it has the same features or not, but it looks like you can set it up yourself.

shashasha23y ago· 2 in thread

We've been using nagios and munin for years, this stack is rock solid. We added recently ELK. This feels overkill, heavyweight and fragile.

beAbU3y ago

This is for a homelab. I think overkill is the point.

gmuslera3y ago

You can configure some nagios addons for performance metrics collecting, but is better to have a single/efficient/granular enough metrics collector. And time series monitoring helps a bit being proactive on bad trends or to give good context for past events.

whalesalad3y ago· 1 in thread

check out netdata if y'all haven't already - incredible software

mekster3y ago

People should realize the best part of Netdata is that it can export data to be stored for Grafana to consume.

I don't like its own UI but no need to use it and it can easily gather metrics from systemd services and containers.

czzzzz3y ago· 1 in thread

Shameless plug for AppScope (https://github.com/criblio/appscope) which is designed for exactly this. Capturing observability data from processes in your environment without code modification, and shipping the data off to tools like grafana for monitoring.

cbreshears3y ago

+1 on AppScope!

hardwaresofton3y ago· 1 in thread

Has anyone had lots of trouble configuring Grafana via YAML from the documentation? A lot of it is kind of hard to follow.

I've found that the ability to (pre)configure Grafana without clicking around in it is pretty difficult.

sohooo3y ago

I recommend using your preferred flavor of configuration management tool. It is tricky, especially when you want to provision multiple users in different Grafana organizations, data sources, and their dashboards, but it can be done (I prefer Puppet because of its flexible language, but Ansible should also work).

guybedo3y ago· 1 in thread

shameless plug for uptimeFunk (https://uptimefunk.com) that i soft launched a some time ago. I wanted some uptime monitoring with nice ui and a few advanced features that i didn't find anywhere: - monitoring mongo db/replicaset status

- monitoring sql databases with basic sql queries

- monitoring host cpu, ram and disk usage

- monitoring docker containers

- and being able to monitor all of this through ssh tunnels because not all my services are on the internet

ghoshbishakh3y ago

1. Looks nice 2. Where are some documentation pages showing how simple it is to set up for example monitoring a sql database? 3. Expensive 4. No alerts?

codetrotter3y ago

I recently set up packet loss monitoring on a Raspberry Pi, using Prometheus for logging and graphing.

https://video.nstr.no/w/hjTH3Vggn2fvpTrQitMmVP

I would like to set up Grafana and more monitoring as well, on some of my other machines. But for now this is what I have :D

revskill3y ago

Just push to github and people will contribute the rest for you. Easy!

Demmme3y ago

With 40 containers I would go kubernetes and with Kube stack you basically have this up and running in 5 minutes.

Aligning metric endpoints for fine-tuning.

Add tracing to it in a few more clicks

j / k navigate · click thread line to collapse

83 comments

68 comments · 17 top-level

bovermyer3y ago· 9 in thread

I'm in the process of building out a Grafana stack (Prometheus, Loki, Tempo, Mimir, Grafana) for my day job right now.

...and also for one of my side projects, OSRBeyond.

It's easy to get overwhelmed by all the moving pieces, but it's also a lot of _fun_ to set up.

danwee3y ago

> It's easy to get overwhelmed by all the moving pieces

Ideally, my setup would be:

- install an agent in each server I'm interesting in gathering metrics from. In this regard, Prometheus works just fine

So, basically 2 pieces of software (if they can be installed by just dropping a binary, even better)

dracyr3y ago

I think there's nothing currently that combines both logging and metrics into one easy package and visualizes it, but it's also something I would love to have.

- [1] https://vector.dev - [2] https://quickwit.io/ - [3] https://log-store.com/

3 more replies

Cyph0n3y ago

I use Telegraf (collector) + Influx (storage) + Grafana (visualization and alerting). Telegraf is amazingly simple to use for collection and has a ton of plugins available.

2 more replies

alexose3y ago

nullify883y ago

Alternatives could be other general purpose databases.

gmuslera3y ago

Telegraf is a single agent that collects a nice amount of metrics and send it to many databases. I prefer to use telegraf (and scripts) to collect the metrics into influxdb and then grafana.

Telegraf have some log parsing/extraction functionality, but for something more generic promtail+loki would be better.

computershit3y ago

> Grafana works, but it doesn't support logs and traces out of the box

Grafana doesn't support anything out of the box by that logic. Before you get any viz in Grafana you have to add a data source cmon.

stasmo3y ago

Good luck! It's a lot.

nullify883y ago

sjsdaiuasgdia3y ago· 8 in thread

adql3y ago

IIRC grafana cloud requires to use their importer which was a no-start for me.

Also 14 days retention is not useful for home, I want to know temperature and power stats from last winter, not from last 2 weeks.

Even the "first paid" tier contains only 13 months of retention

I just used VictoriaMetrics all-in-one binary for home stuff + grafana as visualisation

heinrichhartman3y ago

I use Grafana Cloud with OpenTelemery without problems.

trallnag3y ago

What database do you use for storing metrics?

gurchik3y ago

nullify883y ago

If the poster hosted those services in a single node k3s or something, the kube-prometheus-stack helm chart is able to deploy a lot of those tools easily.

_joel3y ago

This. Although it can be fun to learn, I've done that, got the t-shirt (literally from a conference)

adql3y ago

But I did similar stuff for work so I already had the skills.

LaGrange3y ago

It's more moving part than it should be, but it's not so bad - I set up an equivalent locally in one evening. Sure beats keeping my private server logs off-site.

BrandoElFollito3y ago· 6 in thread

I self host for years about 30 services, out of these 3 are vital (bitwarden, home assistant and pihole).

I work in IT, I am a geek so I tried a few monitoring systems and wrote two myself.

Then I realized that I have self-sustaining, 24/7 monitoring agents: wife and children.

I gave up trying to have the right stack and just wait for them to yell.

Seriously: it works great and it made me wonder WHY I am trying to monitor. Turns out this is more for the fun, discovery of tools than a real need at home.

oaktowner3y ago

Reminds me of the (possibly apocryphal) monitoring that was in place when Healthcare.gov was launched: they had a TV tuned to the news, and the news would tell them whenever the site crashed!

Breza3y ago

applied_heat3y ago

I monitor so that when there is a problem I have some data I can use to troubleshoot the problem and identify possible solutions.

lcnPylGDnU4H9OF3y ago

This is the first time I've heard a parent refer to their currently-living-at-home child as "self-sustaining".

akiselev3y ago

Usually that means they know how to microwave their own hot pockets and open a can of spaghetti-Os.

1 more reply

BrandoElFollito3y ago

It depends on the age.

When they were young they were definitely not self sustaining.

As teenagers they now live on food (either provided when it meets their standards, or the one they cook themselves), water and wi-fi.

tacker20003y ago· 5 in thread

Zabbix has been quite solid and has lots of templates for different servers (linux, windows, etc), triggers and can also monitor docker containers (although i never tried that).

The only thing Zabbix cant do well is log file monitoring, so I am considering something like an ELK stack as an addition.

maxmalkav3y ago

I cannot find my way around the Zabbix web interface neither and most of the templates, rules and macros system confused me, deeply.

ASalazarMX3y ago

stcroixx3y ago

Jnr3y ago

I use Zabbix and Grafana. Grafana has a zabbix data source plugin so you can have best of both worlds really.

adql3y ago· 4 in thread

I've found VictoriaMetrics all-in-one binary to be perfect size for home at the very least for metrics gathering.

https://victoriametrics.com/

donmcronald3y ago

I’ll never understand how companies with a UI focused product end up with websites that don’t have any screenshots of the UI. I spent over a minute on the site and couldn’t find a screenshot.

dengolius3y ago

1 more reply

ianschmitz3y ago

It’s a database

applied_heat3y ago

majkinetor3y ago· 4 in thread

Is there anything easier for logs? Basically glorified ripgrep?

mekster3y ago

I collect logs with Vector on each instances and sent to central ClickHouse which Metabase reads from.

majkinetor3y ago

Interesting. I am already trying that with dBeaver and ClickHouse with docker.

Used this tutorial:

https://clickhouse.com/docs/en/integrations/vector

jacksgt3y ago

majkinetor3y ago

What I hate about Elastic is it special grep syntax I can never get right...

I tried loki around v1.0 and it didn't seem to offer much back then...

conor_fOP3y ago· 3 in thread

Hey everyone, this is a post I've been working on the past few months about setting up my own monitoring stack with Grafana for my home server.

I'd love your feedback on how this process could be easier for me, some resources on learning the Grafana query languages, and general comments.

Thanks for taking the time to read + engage!

bombcar3y ago

Cyph0n3y ago

I recently setup basic monitoring using Telegraf + Influx + Grafana. Here are the alert triggers, in order of importance (imo):

* ZFS pool errors. Motivator: one of my HDDs failed and it took me a few days to notice. The pool (raidz1) kept chugging along of course.

* HDD and SSD SMART errors

* High HDD and SSD temperatures

* ZFS pool utilization

* High CPU temperature. Motivator: one of my case fans failed and it took a while for me to notice.

* High GPU temperatures. Motivator: I have two GPUs in my tower, one of which I don't really monitor (used for transcoding).

* High (sustained) CPU usage. I track this at the server level, rather than for individual VMs.

1 more reply

aseipp3y ago

[1] http://k6.io/, it's also a Grafana project.

[2] It can test both normal REST endpoints but also browsers thanks to the use of headless chrome/chromium! So you can actually look at first paint latency and things like that too.

shrx3y ago· 3 in thread

sohooo3y ago

1: https://docs.timescale.com/api/latest/hyperfunctions/time_bu...

shrx3y ago

Thanks, this looks exactly what I want. Sensible interval origins [1] too (January 1, 2000 for months and years, and January 3 2000, a Monday, for weeks) and also configurable.

[1] https://docs.timescale.com/use-timescale/latest/time-buckets...

cocire3y ago

would love an answer to this as well! something with great Python (Flask, maybe even SQLAlchemy) support would be cool too

artisin3y ago· 3 in thread

If you decide to go down this route, there are two major items:

zokier3y ago

Is there way to self-host the netdata "cloud"?

artisin3y ago

Jemaclus3y ago

There's an Open Source version: https://github.com/netdata/netdata

I don't know if it has the same features or not, but it looks like you can set it up yourself.

shashasha23y ago· 2 in thread

We've been using nagios and munin for years, this stack is rock solid. We added recently ELK. This feels overkill, heavyweight and fragile.

beAbU3y ago

This is for a homelab. I think overkill is the point.

gmuslera3y ago

whalesalad3y ago· 1 in thread

check out netdata if y'all haven't already - incredible software

mekster3y ago

People should realize the best part of Netdata is that it can export data to be stored for Grafana to consume.

I don't like its own UI but no need to use it and it can easily gather metrics from systemd services and containers.

czzzzz3y ago· 1 in thread

cbreshears3y ago

+1 on AppScope!

hardwaresofton3y ago· 1 in thread

Has anyone had lots of trouble configuring Grafana via YAML from the documentation? A lot of it is kind of hard to follow.

I've found that the ability to (pre)configure Grafana without clicking around in it is pretty difficult.

sohooo3y ago

guybedo3y ago· 1 in thread

- monitoring sql databases with basic sql queries

- monitoring host cpu, ram and disk usage

- monitoring docker containers

- and being able to monitor all of this through ssh tunnels because not all my services are on the internet

ghoshbishakh3y ago

1. Looks nice 2. Where are some documentation pages showing how simple it is to set up for example monitoring a sql database? 3. Expensive 4. No alerts?

codetrotter3y ago

I recently set up packet loss monitoring on a Raspberry Pi, using Prometheus for logging and graphing.

https://video.nstr.no/w/hjTH3Vggn2fvpTrQitMmVP

I would like to set up Grafana and more monitoring as well, on some of my other machines. But for now this is what I have :D

revskill3y ago

Just push to github and people will contribute the rest for you. Easy!

Demmme3y ago

With 40 containers I would go kubernetes and with Kube stack you basically have this up and running in 5 minutes.

Aligning metric endpoints for fine-tuning.

Add tracing to it in a few more clicks

j / k navigate · click thread line to collapse