Incidentally, yesterday I created a simple exporter for (Linux) coretemp, hddtemp, and NUT's upsc: https://github.com/andmarios/sensor_exporter
Still slowly hacking on my homelab Prometheus bringup, but I've already got a start on developing a battery level/power supply exporter. It's by far the most important node information I want metrics and alarms on in my developer life! https://github.com/rektide/node_exporter/tree/export-battery
Almost went with a standalone Node exporter, but I decided I'd try some Go and tour some of the Prometheus codebase. MustNewConstMetric is very confusing to me, but hopefully I'm on the right track! Feeling doubly inspired to get my homelab Prometheus up and going right now, between renewed excitement for these exporters and the 1.0! So close!
I started down the same path, using github.com/prometheus/client_golang/prometheus, but it exported too many application metrics (an order of magnitude more than the sensor metrics I cared about :p), so I switched to a more custom approach.
In the usual case, however, pull has many benefits:
- You can get high availability by simply running two identically configured, independent Prometheus servers. No clustering required.
- You can run a copy of production monitoring (or similar) on your laptop without changing production. This is great for experimentation and testing changes.
- You get up/down monitoring for free via the scrapes themselves and can use this for alerting.
- When there's an HTTP pull endpoint on service instances, you can also go there manually as a human and check out the current metrics state of any target, independent of the Prometheus server.
- Knowledge of service identities is inverted: instead of each service instance having to know its own identity (usually instance="hostname:port" and some job/service name), the monitoring system knows (usually via some form of service discovery) what instances should be there and how they are labeled, and proactively checks on them. Services have no knowledge of where the monitoring system lives anymore, enabling the above use cases.
- Debatably, push-based monitoring systems make it easier for someone to accidentally DDoS your monitoring. (It's still possible with pull, but there you have one central place that knows what is being pulled from.)
I don't see why processing pipelines would be special for push vs. pull, it's generally a wash.
Granted, you don't need to touch this configuration very often, but anyone who's going to operate the cluster will need to understand it thoroughly.
The fact that it's ad hoc (prometheus.io/probe etc. aren't built in) means everyone's config is probably going to be unique and not portable. For example, we found the current config example insufficient, since the blackbox exporter needs to know whether the probed endpoint is HTTP or HTTPS.
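To make the ad-hoc nature concrete, here's a sketch of the kind of relabeling config this involves. The annotation names (prometheus.io/probe, prometheus.io/scheme), the module names, and the exporter address are all examples chosen per-site rather than anything built in:

```yaml
# Hypothetical prometheus.yml fragment for probing Kubernetes services
# through the blackbox exporter.
scrape_configs:
  - job_name: 'kubernetes-services-probe'
    metrics_path: /probe
    params:
      module: [http_2xx]   # default blackbox module
    kubernetes_sd_configs:
      - role: service
    relabel_configs:
      # Only probe services that opt in via an annotation.
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
        action: keep
        regex: "true"
      # Let services switch to an HTTPS module via another annotation.
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
        regex: https
        target_label: __param_module
        replacement: https_2xx
      # Pass the service address as the probe target, and scrape the exporter.
      - source_labels: [__address__]
        target_label: __param_target
      - target_label: __address__
        replacement: blackbox-exporter.monitoring.svc:9115
```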
Kubernetes' template system, combined with variable expansion, seems like it would be a better model for what you're currently trying to do with service discovery.
Also: I'm setting this up right now, but it seems there's no exporter for the Kubernetes API server proper, just the Kubelet?
The way I would describe Prometheus in relation to those other tools:
- Prometheus is open-source and self-hosted.
- Prometheus is about dimensional numeric time series metrics only (no log-based analysis, no per-request tracing, etc.).
- Prometheus has a strong focus on systems and service monitoring, not so much on business metrics.
- Prometheus is more of a Swiss army knife of monitoring rather than a ready-to-drop-in package that starts monitoring everything automatically.
- Prometheus is very much about whitebox monitoring and manually defining any metrics that could be useful for you (although we support blackbox exporting and bridging metrics from existing systems as well).
- We don't do machine-learning-style anomaly detection, but we do alerting based on manually defined rules.
- For a purely metrics-based solution, the insight we deliver is among the best in the field (via the dimensional data model and the query language that goes with it).
- Many open-source projects are starting to expose native Prometheus metrics (Kubernetes, etcd, ...), which gives Prometheus an advantage when used alongside them.
EDIT: Also try the "Getting Started" tutorial - it only takes a couple of minutes: https://prometheus.io/docs/introduction/getting_started/
https://prometheus.io/blog/2016/03/23/interview-with-life360... and https://prometheus.io/blog/2016/05/01/interview-with-showmax... look at two of our users.
A lot of open-source projects (Kubernetes, etcd, ...) also expose native Prometheus metrics now, making it easier to integrate with them.
EDIT: also check the PromCon schedule for other user companies giving talks: https://promcon.io/schedule/
And the sponsors are also users: https://promcon.io/#our-sponsors
So it's a question of whether you can squeeze your data into that model and whether you need per-event details or whether aggregated time series are ok.
Many thanks to the authors for all their hard work, and congrats.
I'm blocked because a network partition can mean no stats.
Edit: Maybe https://github.com/prometheus/prometheus/issues/398 ? Also, the limit is configurable.
I'm not sure how to filter through, say, 300 servers and select the top 10 for a particular time span. I would think that's a common need, but I guess I'm missing something?
For example, instead of highestCurrent, you could do something like:
topk(3, my_metric)
At every point along the graph, this selects the current top 3 series with the metric name my_metric.
Or if you want to average each series over the last 10 minutes at every point in the graph before selecting the top 3:
topk(3, avg_over_time(my_metric[10m]))
Note that due to the reasons mentioned above, topk() here does not select whichever lines have the largest area under the entire visible graph range, but whatever is at the top at each resolution step. So you may actually get more than 3 series in your graph overall, but only 3 at any given point on the X axis.
There's also an issue asking about this, but we're not sure if that is fundamentally compatible with Prometheus's query execution model without major changes: https://github.com/prometheus/prometheus/issues/586
Still looking forward to rolling out Prometheus for all of the other great features. Congrats on the release!
You can calculate the highest values now (topk) or the highest averages (topk + avg_over_time), and once you know which time series you want, graph those. I believe this is doable in Grafana.
Hopefully we'll have something to show at PromCon.
Some resources:
- Scaling in general: http://www.robustperception.io/scaling-and-federating-promet...
- Federation: https://prometheus.io/docs/operating/federation/
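As a sketch of what federation looks like in practice, a "global" Prometheus can scrape aggregated series from per-datacenter servers via their /federate endpoints. The target address and the match[] selectors below are placeholders:

```yaml
# Hypothetical scrape config on the global Prometheus server.
scrape_configs:
  - job_name: 'federate'
    honor_labels: true        # keep the labels as set by the source server
    metrics_path: /federate
    params:
      'match[]':
        - '{job="node"}'            # pull all series of a given job, or
        - '{__name__=~"job:.*"}'    # only pre-aggregated recording rules
    static_configs:
      - targets:
          - 'prometheus-dc1.example.org:9090'
```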
Note that a hosted Prometheus service would have different design tradeoffs. As Brian mentioned, Prometheus as it stands is really meant to be run as close as possible to your monitored services for maximum reliability. There's also no clustered, long-term durable storage for a similar reason, which will likely be different in hosted versions.