Incidentally, yesterday I created a simple exporter for (Linux) coretemp, hddtemp, and NUT's upsc: https://github.com/andmarios/sensor_exporter
Still slowly hacking on my homelab Prometheus bringup, but I've already got a start on developing a battery level/power supply exporter. It's by far the most important node information I want metrics and alarms on in my developer life! https://github.com/rektide/node_exporter/tree/export-battery
Almost went with a standalone Node exporter, but I decided I'd try some Go and tour some of the Prometheus codebase. MustNewConstMetric is very confusing to me, but hopefully I'm on the right track! Feeling doubly inspired to get my homelab Prometheus up and going right now, between renewed excitement for these exporters and the 1.0! So close!
I started down the same path, using github.com/prometheus/client_golang/prometheus, but it exported too many application metrics (an order of magnitude more than the sensor metrics I cared about :p), so I switched to a more custom approach.
In the usual case, however, pull has many benefits:
- You can get high availability by simply running two identically configured, independent Prometheus servers. No clustering required.
- You can run a copy of production monitoring (or similar) on your laptop without changing production. This is great for experimentation and testing changes.
- You get up/down monitoring for free via the scrapes themselves and can use this for alerting.
- When there's an HTTP pull endpoint on service instances, you can also go there manually as a human and check out the current metrics state of any target, independent of the Prometheus server.
- Knowledge of service identities is inverted: instead of each service instance having to know its own identity (usually instance="hostname:port" and some job/service name), the monitoring system knows (usually via some form of service discovery) what instances should be there and how they are labeled, and proactively checks on them. Services have no knowledge of where the monitoring system lives anymore, enabling the above use cases.
- Debatably, push-based monitoring systems make it easier for someone to accidentally DDoS your monitoring. (It's still possible with pull, but there you have one central place that knows what is being pulled from.)
I don't see why processing pipelines would be special for push vs. pull, it's generally a wash.
Granted, you don't need to touch this configuration very often, but anyone who's going to operate the cluster will need to understand it thoroughly.
The fact that it's ad hoc (prometheus.io/probe etc. aren't built in) means everyone's config is probably going to be unique and not portable. For example, we found the current config example insufficient, since the blackbox exporter needs to know whether the probed endpoint is HTTP or HTTPS.
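To make the ad-hoc nature concrete, here's a sketch of the kind of relabeling config this involves. The annotation names (prometheus.io/probe, prometheus.io/scheme), the module names, and the exporter address are all examples chosen per-site rather than anything built in:

```yaml
# Hypothetical prometheus.yml fragment for probing Kubernetes services
# through the blackbox exporter.
scrape_configs:
  - job_name: 'kubernetes-services-probe'
    metrics_path: /probe
    params:
      module: [http_2xx]   # default blackbox module
    kubernetes_sd_configs:
      - role: service
    relabel_configs:
      # Only probe services that opt in via an annotation.
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
        action: keep
        regex: "true"
      # Let services switch to an HTTPS module via another annotation.
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
        regex: https
        target_label: __param_module
        replacement: https_2xx
      # Pass the service address as the probe target, and scrape the exporter.
      - source_labels: [__address__]
        target_label: __param_target
      - target_label: __address__
        replacement: blackbox-exporter.monitoring.svc:9115
```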
Kubernetes' template system, combined with variable expansion, seems like it would be a better model for what you're currently trying to do with service discovery.
Also: I'm setting this up right now, but it seems there's no exporter for the Kubernetes API server proper, just the Kubelet?
The way I would describe Prometheus in relation to those other tools:
- Prometheus is open-source and self-hosted.
- Prometheus is about dimensional numeric time series metrics only (no log-based analysis, no per-request tracing, etc.).
- Prometheus has a strong focus on systems and service monitoring, not so much on business metrics.
- Prometheus is more of a Swiss army knife of monitoring rather than a ready-to-drop-in package that starts monitoring everything automatically.
- Prometheus is very much about whitebox monitoring and manually defining any metrics that could be useful for you (although we support blackbox exporting and bridging metrics from existing systems as well).
- We don't do machine-learning-style anomaly detection, but we do alerting based on manually defined rules.
- For a purely metrics-based solution, the insight we deliver is among the best in the field (via the dimensional data model and the query language that goes with it).
- Many open-source projects are starting to expose native Prometheus metrics (Kubernetes, etcd, ...), which gives Prometheus an advantage when used alongside them.
EDIT: Also try the "Getting Started" tutorial - it only takes a couple of minutes: https://prometheus.io/docs/introduction/getting_started/
https://prometheus.io/blog/2016/03/23/interview-with-life360... and https://prometheus.io/blog/2016/05/01/interview-with-showmax... look at two of our users.
A lot of open-source projects (Kubernetes, etcd, ...) also expose native Prometheus metrics now, making it easier to integrate with them.
EDIT: also check the PromCon schedule for other user companies giving talks: https://promcon.io/schedule/
And the sponsors are also users: https://promcon.io/#our-sponsors
So it's a question of whether you can squeeze your data into that model and whether you need per-event details or whether aggregated time series are ok.
Many thanks to the authors for all their hard work, and congrats.
I'm blocked because a network partition can mean no stats.
Edit: Maybe https://github.com/prometheus/prometheus/issues/398 ? Also, the limit is configurable.
I'm not sure how to filter through, say, 300 servers and select the top 10 for a particular time span. I would think that's a common need, but I guess I'm missing something?
For example, instead of highestCurrent, you could do something like:
topk(3, my_metric)
At every point along the graph, this selects the current top 3 series with the metric name my_metric.
Or if you want to average each series over the last 10 minutes at every point in the graph before selecting the top 3:
topk(3, avg_over_time(my_metric[10m]))
Note that due to the reasons mentioned above, topk() here does not select whichever lines have the largest area under the entire visible graph range, but whatever is at the top at each resolution step. So you may actually get more than 3 series in your graph overall, but only 3 at any given point on the X axis.
There's also an issue asking about this, but we're not sure if that is fundamentally compatible with Prometheus's query execution model without major changes: https://github.com/prometheus/prometheus/issues/586
Still looking forward to rolling out Prometheus for all of the other great features. Congrats on the release!
You can calculate the highest values now (topk) or the highest averages (topk + avg_over_time), and once you know which time series you want, graph those. I believe this is doable in Grafana.
Hopefully we'll have something to show at PromCon.
Some resources:
- Scaling in general: http://www.robustperception.io/scaling-and-federating-promet...
- Federation: https://prometheus.io/docs/operating/federation/
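As a sketch of what federation looks like in practice, a "global" Prometheus can scrape aggregated series from per-datacenter servers via their /federate endpoints. The target address and the match[] selectors below are placeholders:

```yaml
# Hypothetical scrape config on the global Prometheus server.
scrape_configs:
  - job_name: 'federate'
    honor_labels: true        # keep the labels as set by the source server
    metrics_path: /federate
    params:
      'match[]':
        - '{job="node"}'            # pull all series of a given job, or
        - '{__name__=~"job:.*"}'    # only pre-aggregated recording rules
    static_configs:
      - targets:
          - 'prometheus-dc1.example.org:9090'
```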
Note that a hosted Prometheus service would have different design tradeoffs. As Brian mentioned, Prometheus as it stands is really meant to be run as close as possible to your monitored services for maximum reliability. There's also no clustered, long-term durable storage for a similar reason, which will likely be different in hosted versions.