At the same time, eBPF was on the rise, and it was a match made in heaven. We can provide an incredible user experience without requiring any code changes: all our users need to do is deploy an agent (a single command to run against a Kubernetes cluster), and we can profile just about any popular language (those we can't profile yet are on the roadmap). Not only does eBPF give us an incredible user experience, it also lets us collect this data at never-before-seen low overhead (<1%), since we can grab exactly the data we need and nothing more.
In addition, we built a custom columnar database to back this product, so that a Prometheus-style label-based query language can work on this very high-cardinality data.
In a world where the economics of Moore's Law have already stagnated, everyone is going to have to learn to build more efficient software, and the root of all evil is performance optimization that is not based on measurements.
We're going to be hanging out in the comments all day, so please feel free to ask us any questions and share any feedback you have for us!
https://calendly.com/frederic-branczyk/getting-to-know-polar...
I did some digging in your blog history, and it seems that is referencing https://www.polarsignals.com/blog/posts/2022/07/22/frostdb-i... Digging into the "but why?" section <https://github.com/polarsignals/frostdb#why-you-should-use-f...> seems to imply you favored the embedded feature over having something standalone. I would enjoy hearing (or reading a blog post!) about why you felt it was a better use of your engineering to build your own columnar DB rather than use one of the existing columnar DBs that I have seen referenced a ton in other Show HN announcements around both logging and metrics services.
The big one that existing columnar databases can’t do (or can’t do well) is search and aggregate on user-defined dimensions (think Prometheus-style labels). Influx 3.0 is the only other columnar database that is (now) available that was engineered specifically to be able to do this. The good folks at Honeycomb came to the same conclusion: this type of columnar database (wide column) is necessary to build observability with an exceptional user experience. Plus, we now own our future and can do any kind of optimization, while other companies (and I’ve spoken to them) constantly fear ClickHouse relicensing or otherwise destroying their business.
//edit: we call this feature dynamic columns: https://github.com/polarsignals/frostdb#dynamic-columns
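For readers wondering what "search and aggregate on user-defined dimensions" looks like in practice, here is a minimal Python sketch of the general wide-column idea. It is not FrostDB's implementation or API; the class `DynamicColumnTable` and its methods `insert` and `sum_by` are hypothetical names made up for illustration. The assumption is simply that every label key becomes its own column the first time it is seen, and rows that lack the label hold `None`.

```python
from collections import defaultdict


class DynamicColumnTable:
    """Toy wide-column table (hypothetical, not FrostDB's API)."""

    def __init__(self):
        self.values = []   # the one fixed column: the sample value
        self.labels = {}   # label name -> column (list), created lazily

    def insert(self, value, labels):
        row = len(self.values)
        self.values.append(value)
        for name in labels:
            if name not in self.labels:
                # First time this label is seen: backfill earlier rows.
                self.labels[name] = [None] * row
        # Every known label column gets exactly one entry per row.
        for name, column in self.labels.items():
            column.append(labels.get(name))

    def sum_by(self, name):
        """Aggregate values grouped by one user-defined label."""
        column = self.labels.get(name, [None] * len(self.values))
        totals = defaultdict(int)
        for key, value in zip(column, self.values):
            totals[key] += value
        return dict(totals)


table = DynamicColumnTable()
table.insert(10, {"pod": "a"})
table.insert(5, {"pod": "b", "region": "eu"})
table.insert(7, {"pod": "a", "region": "us"})
by_pod = table.sum_by("pod")
by_region = table.sum_by("region")
```

The point of the sketch is that the schema is not fixed up front: the `region` column only appears once a row carries that label, yet grouping by it works the same way as grouping by `pod`.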
First off, congratulations on the launch!
I am doing some work around Python tracing using eBPF tools and spending a lot of time reading about the CPython interpreter and runtime implementation. I came across this Polar Signals blog post[0] last week. Fantastic, fantastic work! I learned a ton about Python internals in just a few minutes that I had completely missed even after reading the code for days. This post also set a benchmark for technical writing for me.
Huge kudos for all the great work! And thank you for sharing all of it!
[0] https://www.polarsignals.com/blog/posts/2023/10/04/profiling...
If you enjoyed that post, I think you’ll also like the one we wrote about native unwinding without frame pointers: https://www.polarsignals.com/blog/posts/2022/11/29/dwarf-bas...
Continuous profiling is a game changer and once you use it on your production systems you can't do without it.
The first time we got differential flamegraphs working we were so mesmerised that we could finally see exactly what the difference between a low and a high point (like a CPU spike) was, that we clicked around our own code for a good hour.
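As a rough illustration of the idea behind a differential flamegraph, the sketch below compares two profiles stack by stack. It is a simplification under stated assumptions, not how any particular product computes its diff: each profile is assumed to be a plain dict mapping a call stack (a tuple of function names) to a sample count, and the function name `diff_profiles` and the sample stacks are hypothetical.

```python
def diff_profiles(baseline, spike):
    """Return (stack, delta) pairs, largest increase first.

    Both arguments map a call stack (tuple of function names) to a
    sample count. A stack missing from one profile counts as zero.
    """
    stacks = set(baseline) | set(spike)
    deltas = {
        stack: spike.get(stack, 0) - baseline.get(stack, 0)
        for stack in stacks
    }
    return sorted(deltas.items(), key=lambda item: item[1], reverse=True)


# Hypothetical sample counts for a quiet period and a CPU spike.
baseline = {
    ("main", "handle", "parse"): 40,
    ("main", "handle", "render"): 60,
}
spike = {
    ("main", "handle", "parse"): 45,
    ("main", "handle", "render"): 60,
    ("main", "gc"): 95,
}
result = diff_profiles(baseline, spike)
```

A flamegraph renderer would then colour each frame by its delta; here the sorted list alone already shows that the hypothetical `gc` stack accounts for most of the difference.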
Wishing you all the best with your launch and looking forward to seeing this product help countless developers and businesses!
I noticed pricing doesn't include anything about data retention.
Does the data live indefinitely? How far back can I query?
I’ve enjoyed having 3-5 years of prom metrics retention to look at seasonal traffic trends, but we spent a fair bit of CPU aggregating raw metrics down to the right granularity for that kind of long-term retention. My feeling is that the observability world is moving towards small, localized installations with real-time data and separate systems with cheaper, slower, but long-term retention. Curious how you see it as you’re building a product in the space.
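To make "aggregating raw metrics down to the right granularity" concrete, here is a minimal Python sketch of downsampling by fixed-width time buckets. It is only an assumption about the general approach, not a description of any specific system: samples are assumed to be `(unix_timestamp, value)` pairs, the function name `downsample` is hypothetical, and averaging is just one possible aggregation (max or sum may fit other metrics better).

```python
from collections import defaultdict


def downsample(samples, bucket_seconds):
    """Average (timestamp, value) samples into fixed-width buckets.

    Returns a list of (bucket_start, mean_value), ordered by time.
    """
    buckets = defaultdict(list)
    for timestamp, value in samples:
        # Align each sample to the start of its bucket.
        bucket_start = timestamp - timestamp % bucket_seconds
        buckets[bucket_start].append(value)
    return [
        (start, sum(values) / len(values))
        for start, values in sorted(buckets.items())
    ]


# Hypothetical 15-second scrapes rolled up to 1-minute resolution.
raw = [(0, 1.0), (15, 3.0), (30, 5.0), (60, 10.0), (75, 20.0)]
rolled_up = downsample(raw, 60)
```

The CPU cost mentioned above comes from running this kind of pass over every series; coarser buckets mean fewer points to store long term, at the price of losing short spikes.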
Kubernetes Cluster. Kubernetes Nodes are running Linux 5.4 or newer.
Dang.. but it does look like a tool I would want if I used K8s!
We’ll think about building something like this but if there is someone with a use case I’d love to chat and figure out how we can make it work together!