I can see this being very attractive for side projects / early-stage bootstrapped stuff where you may not be able to afford something like Datadog.
On the flip side, there is something to be said about someone else hosting your monitoring systems, as hopefully if you have a massive outage the third party system will still be up.
I'd imagine that you could achieve this here by starting off self hosted and then migrating to their cloud offering once your systems were critical enough to justify it.
You're right that this is a pretty crowded space though - looking forward to seeing how they do.
I think we've done a pretty good job with filtering, grouping, aggregation, and data exploration in general. That is something I am proud of.
Uptrace is significantly cheaper.
As for the rest, it is the same but different. DD is bigger and more complex. I guess that is not a problem once you get used to the UI.
We tried other hosted services, but most if not all of them assume you are creating gold from thin air and charge you accordingly. We started with a hosted Jaeger solution and switched to Uptrace without looking back.
Are there any mobile client SDKs (Android/iOS) that you are aware of?
UI-wise, Uptrace does not have service dependency analysis, because I don't find it very useful/interesting (let me know if I am wrong). I believe Jaeger does not have span grouping or percentiles, and its filters/aggregation are much, much simpler.
Jaeger has remote sampling which can be a powerful feature if users are ready to spend their time configuring it. Would be nice to hear if that is the case and if many people are using it.
I've also seen some work on in-memory tail-based sampling, but I don't know if that is ready to use or not. I plan to add tail-based sampling to Uptrace too.
That is about as much as I know :)
The benefit of remote sampling, as I understand it, is that the sample rate doesn't need to be configured in each app. So in principle, you could have a little slider in the UI to adjust the sample rate for each individual app, perhaps increasing it during a particular incident to capture more traffic data.
Without remote sampling, you'd have to hard-code the rate into the app's config or code, which then requires a redeploy to change the setting. With a lot of microservices to maintain, it seems a lot simpler to have a central location for such settings.
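To make the idea concrete, here is a minimal Python sketch of a head sampler that pulls its rate from a central place instead of app config. It is not Jaeger's or Uptrace's implementation: `RemoteRateSampler`, `fetch_rate`, and `refresh_seconds` are hypothetical names, and `fetch_rate` stands in for whatever call would ask a central service for the current rate.

```python
import time
from typing import Callable


class RemoteRateSampler:
    """Head sampler whose rate comes from a central source, not app config."""

    def __init__(
        self,
        fetch_rate: Callable[[], float],  # hypothetical: e.g. a call to a config service
        refresh_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ):
        self._fetch_rate = fetch_rate
        self._refresh_seconds = refresh_seconds
        self._clock = clock
        self._rate = 0.0
        self._next_refresh = float("-inf")  # force a fetch on first use

    def _current_rate(self) -> float:
        now = self._clock()
        if now >= self._next_refresh:
            try:
                # Clamp to [0, 1] so a bad remote value cannot break sampling.
                self._rate = min(1.0, max(0.0, float(self._fetch_rate())))
            except Exception:
                pass  # keep the last known rate if the central service is unreachable
            self._next_refresh = now + self._refresh_seconds
        return self._rate

    def should_sample(self, trace_id: int) -> bool:
        # Deterministic per trace: compare the low 64 bits of the trace id
        # to a threshold, so the same id and rate always give the same answer.
        threshold = int(self._current_rate() * (1 << 64))
        return (trace_id & ((1 << 64) - 1)) < threshold
```

Changing the value returned by `fetch_rate` (the "slider") takes effect on the next refresh, with no redeploy of the app.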
Uptrace looks fantastic, by the way. It's about time someone gave Zipkin and Jaeger some competition. And ClickHouse makes a lot more sense as a backend than Cassandra.
- ability to use ReplicatedMergeTree in the table schema
- round-robin writing to multiple nodes
It is mostly a matter of providing configuration options. Thought I could skip it in the first release.
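For the round-robin part, a rough Python sketch of what rotating writes across nodes could look like. This is an illustration only, not Uptrace code: `RoundRobinWriter` and the `send` callback are hypothetical, and `send` stands in for the actual insert into a ClickHouse node.

```python
import itertools
from typing import Callable, Sequence


class RoundRobinWriter:
    """Rotate batches across nodes, skipping nodes that fail to accept a write."""

    def __init__(self, nodes: Sequence[str], send: Callable[[str, list], None]):
        if not nodes:
            raise ValueError("at least one node is required")
        self._nodes = list(nodes)
        self._cycle = itertools.cycle(self._nodes)
        self._send = send  # hypothetical: performs the real insert into one node

    def write(self, batch: list) -> str:
        """Send the batch to the next node in rotation and return its name."""
        last_error = None
        # Try each node at most once, starting from the next one in rotation.
        for _ in range(len(self._nodes)):
            node = next(self._cycle)
            try:
                self._send(node, batch)
                return node
            except ConnectionError as exc:
                last_error = exc
        raise ConnectionError("all nodes failed") from last_error
```

The node list is exactly the kind of thing that would come from a configuration option.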
>How are you handling persistent storage?
If you mean avoiding data loss by using ClickHouse cluster, then yes - we use CH cluster and replication :)
ClickHouse handles data corruption surprisingly well - even if there are broken parts, CH continues to serve the rest of the data.
For reference, here's what I'm using: https://github.com/Altinity/clickhouse-operator/blob/master/...
Tail-based sampling will require buffering spans in memory for some time, but tail-based sampling is not implemented yet.
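As a sketch of why buffering is needed, here is a small in-memory tail sampler in Python. It is an assumption-laden illustration, not a planned Uptrace design: `Span`, `TailSampler`, `wait_seconds`, and `slow_ms` are hypothetical, and the keep rule (any error or any slow span) is just one possible policy.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    trace_id: str
    name: str
    duration_ms: float
    is_error: bool = False


@dataclass
class TailSampler:
    wait_seconds: float = 10.0  # how long to buffer a trace before deciding
    slow_ms: float = 500.0      # keep traces containing a span at least this slow
    _buffer: dict = field(default_factory=dict)      # trace_id -> list of spans
    _first_seen: dict = field(default_factory=dict)  # trace_id -> time of first span

    def add(self, span: Span, now: float) -> None:
        """Buffer a span; the decision is postponed until the trace expires."""
        self._buffer.setdefault(span.trace_id, []).append(span)
        self._first_seen.setdefault(span.trace_id, now)

    def flush(self, now: float) -> list:
        """Return spans of expired traces worth keeping; drop the rest."""
        kept = []
        expired = [
            trace_id
            for trace_id, seen in self._first_seen.items()
            if now - seen >= self.wait_seconds
        ]
        for trace_id in expired:
            spans = self._buffer.pop(trace_id)
            del self._first_seen[trace_id]
            # The decision looks at the whole trace, which is the point of
            # tail-based sampling and the reason spans must sit in memory.
            if any(s.is_error or s.duration_ms >= self.slow_ms for s in spans):
                kept.extend(spans)
        return kept
```

Memory use grows with traffic times `wait_seconds`, which is the cost mentioned above.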
The cloud version also uses Kafka to survive surges in traffic, but I guess a "personal" / company version does not need that as much, so there is no need to introduce an additional dependency.
Main things distinguishing this from Jaeger are on the storage side. Jaeger has pluggable span writers and readers so it would be possible to do the same right in Jaeger.
The UI part is probably more work than the actual storage but the default Jaeger UI is anyway not the main tool people tend to work with.
I understand that it is not ideal to have so many competing tools, but contributing to an existing mature project is a nightmare. It is by far easier to start a new one.
>Jaeger UI is anyway not the main tool people tend to work with.
Which tools / features do you have in mind?
Uptrace OS competes with Jaeger / Zipkin / SigNoz / SkyWalking and I believe it already does a pretty good job.
I get your point about contributing, especially features that are incompatible with the maintainer vision. Feature creep, right?
What I value in open source projects is extensibility. Plugins which one can maintain outside of the main product.
> "But" do you need another storage? :)
I’m only saying that it’s possible. I might not need it but if someone does and they want to self host it as a managed solution, it can be done right in Jaeger.
> Which tools / features do you have in mind?
The default Jaeger UI isn’t really ergonomic. Trace info is more useful in the context of other information. As in, tools pulling trace info out of storage and overlaying on other data. There’s also Grafana Tempo.
We also just introduced experimental support for ingesting OTLP/Zipkin spans and a Tempo-compatible API in cLoki, and we are looking for testers to validate this feature:
https://github.com/lmangani/cLoki/wiki/Tempo-Tracing#clickho...
Internally trace spans are stored as tagged JSON logs, meaning they are available from both Loki and Tempo APIs and can be used from pretty much any visualization, too!
For example, a question I want to be able to answer with a query against the distributed trace data: show me the (mean, median) time between a parent http request and a child http request in the same trace tree. As far as I understand, this requires the query language to be able to group by trace id, then be able to identify parent/child relations.
Does the Uptrace query language allow you to do something like this?
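For reference, the computation itself, done outside any query language, might look like the Python sketch below. The span layout (dicts with `trace_id`, `span_id`, `parent_id`, `kind`, `start_ms`) is a made-up assumption, and "time between" is taken to mean the start-to-start gap between an HTTP span and its nearest HTTP ancestor.

```python
from collections import defaultdict
from statistics import mean, median


def nearest_http_ancestor(span, trace):
    """Walk up the parent chain until an HTTP span is found (or the root is passed)."""
    parent = trace.get(span["parent_id"])
    while parent is not None and parent["kind"] != "http":
        parent = trace.get(parent["parent_id"])
    return parent


def parent_child_gaps(spans):
    """Return start-to-start gaps (ms) between HTTP spans and their HTTP ancestors."""
    # Group by trace id first, then index each trace's spans by span id.
    by_trace = defaultdict(dict)
    for s in spans:
        by_trace[s["trace_id"]][s["span_id"]] = s

    gaps = []
    for trace in by_trace.values():
        for s in trace.values():
            if s["kind"] != "http":
                continue
            ancestor = nearest_http_ancestor(s, trace)
            if ancestor is not None:
                gaps.append(s["start_ms"] - ancestor["start_ms"])
    return gaps


# Hypothetical sample data: two traces, one with a DB span between HTTP spans.
SAMPLE_SPANS = [
    {"trace_id": "t1", "span_id": 1, "parent_id": None, "kind": "http", "start_ms": 0},
    {"trace_id": "t1", "span_id": 2, "parent_id": 1, "kind": "db", "start_ms": 5},
    {"trace_id": "t1", "span_id": 3, "parent_id": 2, "kind": "http", "start_ms": 30},
    {"trace_id": "t2", "span_id": 1, "parent_id": None, "kind": "http", "start_ms": 100},
    {"trace_id": "t2", "span_id": 2, "parent_id": 1, "kind": "http", "start_ms": 110},
]

if __name__ == "__main__":
    gaps = parent_child_gaps(SAMPLE_SPANS)
    print(mean(gaps), median(gaps))
```

The two requirements from the question show up directly: the `by_trace` grouping and the parent lookup by span id.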
Sometimes using a UI is not possible, for example, if you want to automate such checks. In that case, I would build a custom metric or two and would use that metric for monitoring purposes. That requires some programming / instrumentation, but it still looks like a better solution to me.
>Also, are there plans for an Elixir client?
The Uptrace client is just a pre-configured distribution of OpenTelemetry, so you can try configuring OpenTelemetry directly: https://github.com/open-telemetry/opentelemetry-erlang . We will provide a pre-configured client for Elixir if there is interest.
Would be happy to answer your questions here.
SigNoz's datastore is built with Cassandra and yours with ClickHouse.