Also, I did some research on alternatives to GA a few days back; it might be helpful to someone:
https://github.com/Open-Web-Analytics/Open-Web-Analytics
https://github.com/matomo-org/matomo
https://github.com/usefathom/fathom
PS: You probably need a better name. Since your website says it is privacy-respecting, 'UserTrack' doesn't exactly convey that. Just pick something without 'Track' in the name.
And adblockers like uBlock Origin tend to block everything like track.domain.com.
And just a bug/overlap I noticed when hovering over the delete button - https://i.imgur.com/aBA7cpr.png
No, XCSme did not pay me for this comment ;)
One thing I haven't seen is someone categorize open source web traffic analytics into Client Side Analytics (via javascript) and Web Server Log analytics.
Since each approach drastically changes the data collected and reported.
I.e. a tool that collects both (or either) server logs and client-side analytics, normalizes them, etc.
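To make the normalizing idea concrete, here is a minimal sketch that maps a web server log line and a client-side JavaScript event onto one common record. It assumes the server writes the Combined Log Format; the `PageView` fields and the client payload keys (`ts`, `path`, `referrer`, `userAgent`) are hypothetical names chosen for illustration, not taken from any particular product.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class PageView:
    source: str                 # "server_log" or "client_js"
    timestamp: datetime         # timezone-aware
    path: str
    referrer: Optional[str]
    user_agent: Optional[str]


# Combined Log Format: host ident user [time] "request" status bytes "referrer" "agent"
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'\d{3} \S+ "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def from_server_log(line: str) -> Optional[PageView]:
    """Parse one Combined Log Format line; return None if it does not match."""
    m = LOG_LINE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
    referrer = m["referrer"]
    return PageView(
        source="server_log",
        timestamp=ts,
        path=m["path"],
        referrer=None if referrer in ("", "-") else referrer,
        user_agent=m["agent"] or None,
    )


def from_client_event(event: dict) -> PageView:
    """Convert a (hypothetical) JSON beacon payload into the same record."""
    return PageView(
        source="client_js",
        timestamp=datetime.fromtimestamp(event["ts"], tz=timezone.utc),
        path=event["path"],
        referrer=event.get("referrer") or None,
        user_agent=event.get("userAgent"),
    )
```

Keeping the `source` field on every record lets reports show the two collection methods side by side, since server logs also see bots and visitors with JavaScript disabled while client-side scripts do not.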
I've been using it for our name generator product Mashword (https://mashword.com) and it was really straightforward to implement. It's reasonably priced, has a clean interface and graphs, is privacy protecting and supports using your own domain for pulling in the js include.
Be careful of this one. It started out as OSS, but switched to proprietary once they'd achieved traction.
Author of Umami here. I totally did not expect this response so it looks like you all hugged my little server to death. The demo should be back up now.
A little background. This is a side project I started 30 days ago because I was tired of how slow and complicated Google Analytics was. I just wanted something really simple and fast that I could browse quickly without diving through layers of menus. So I created Umami to track my own websites and then open sourced it. The stack is React, Redux, and Next.js with a Postgresql backend.
Would be happy to answer any questions you have.
I am wondering why in the past 2 years we went from having few to zero GA alternatives to all of a sudden having dozens of them.
I am genuinely curious.
I always start side projects so I can learn something new. In this case it was Prisma.io, Chart.js, Next.js authentication, JWT and Postgresql. All of which I didn't know about until this project.
EDIT: just noticed the demo is for another site flightphp.com not the landing page umami.is which is sort of weird. That explains it. The demo should really be demoing the metrics for umami.is. Which is a shame, because that would prove how scalable umami.is is. Unfortunately umami.is is not eating its own dog food.
https://app.umami.is/share/8rmHaheU/umami.is
I'm using it for all my websites. The reason I went with another site for the demo is because I wanted something with at least 30 days of data so users can play around with the different settings. Once I get enough data, I'll switch it over.
Have a look at patterns that resolve this like Snowplow Analytics.
> Umami does not collect any personally identifiable information so it is GDPR and CCPA compliant. No cookie notices are needed because Umami does not use cookies.
From auditing the source code, this doesn't seem to be the case. First, it claims it doesn't use cookies, but it clearly uses localStorage to store a "sessionKey"[0].
The other claim, that Umami is GDPR and CCPA compliant because it does not collect any personally identifiable information, is only half true. While the data collected isn't PII (because you can't use it on its own to identify a user), it's still "personal data". This is because the "sessionKey" stored alongside all events is actually a pseudonymous user identifier. It's really just a hash of the user's IP along with a few other properties[1]. Because the data Umami collects, when combined with some other data, can be attributed back to the user, the data is still considered "personal data". That means you're still subject to most of GDPR, such as GDPR deletion requests[2].
[0] https://github.com/mikecao/umami/blob/f4ca353b5c68750bf391e5...
[1] https://github.com/mikecao/umami/blob/master/lib/session.js#...
As for the localStorage, it's just for performance so I don't have to recompute the session hash. The product will work the same without it. But seeing as it is a cause of contention, I am probably going to remove it.
> We do not attempt to generate a device-persistent identifier because they are considered personal data under GDPR.
> Instead, we generate a daily changing identifier using the visitor’s IP address and User Agent. To anonymize these datapoints, we run them through a hash function with a rotating salt.
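For readers who want to see the shape of what that quote describes, here is a minimal sketch of a daily changing identifier. It is not the quoted project's actual code: the class name, the choice of SHA-256, the 32-byte salt and the in-memory salt storage are all assumptions made for illustration.

```python
import hashlib
import secrets
from datetime import date
from typing import Optional


class DailyVisitorHasher:
    """Derives a visitor identifier that changes whenever the day changes."""

    def __init__(self) -> None:
        self._salt_day: Optional[date] = None
        self._salt: bytes = b""

    def _salt_for(self, day: date) -> bytes:
        # Generate a fresh random salt when the day changes. The previous salt
        # is overwritten, so earlier identifiers cannot be recomputed later.
        if day != self._salt_day:
            self._salt = secrets.token_bytes(32)
            self._salt_day = day
        return self._salt

    def visitor_id(self, ip: str, user_agent: str, day: date) -> str:
        salt = self._salt_for(day)
        # A separator byte keeps ("ab", "c") and ("a", "bc") from colliding.
        payload = salt + b"\x00" + ip.encode() + b"\x00" + user_agent.encode()
        return hashlib.sha256(payload).hexdigest()
```

The same visitor gets the same identifier within a day and an unrelated one the next day, which is why this approach can count daily uniques but not returning visitors. Whether that is enough to fall outside "personal data" is a legal question the code cannot answer.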
If you don't feel fit to judge whether something breaches GDPR, then maybe you shouldn't say "so it is GDPR and CCPA compliant".
This is just another misguided attempt to adhere to the letter of the law while going against its spirit. It is misguided because it's based on a wrong understanding of what the letter of the law actually is. You see this a lot with adtech and analytics companies who try to skirt regulations through elaborate mechanisms, but ultimately in vain.
This was mentioned when the guest spoke about the right to be forgotten. The law is really weird, because you need to delete a user's data from your database, but it's OK to keep backups.
> It is the same data that is found in server log files. In the strictest interpretation of GDPR, I don't think any analytics product can exist.
It can exist as long as the user agrees to be tracked. There is a category of "metrics" "cookies" the user needs to agree to before you can track them for metrics. That's the whole point of the law. You need the user's permission.
Besides, it's not only GDPR you should consider, but also the latest cookie verdict by the CJEU. You need consent if you drop cookies, session storage or any other tracking technology, no matter whether you process personal data or not.
The definition in GDPR Art. 4 reads: [1]
> ‘personal data’ means any information relating to an identified or identifiable natural person (‘data subject’); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person;
[1]: https://gdpr-info.eu/art-4-gdpr/
My intuition is that a randomly generated session key could not be tied back to the identity of a natural person, as long as client IP, user agent, etc., are also excluded from the analytics data.
If you only save it on the server, not on the client side, it's not PII. But then it's almost useless for analytics. Because next time the user comes around, you create another hash and therefore another user.
If you do something like Plausible.io with daily changing salts, you know only about daily visitors. This might be GDPR compliant.
If you do something like Fathom with chaining requests, you can see daily uniques, bounce rates and click speed. Not sure this is GDPR compliant though. I would feel better if they ran this past a European GDPR watchdog, which AFAIK they haven't.
If you do something like SimpleAnalytics, using the referrer to find uniques, you can see daily unique visits but with some statistical errors. It should be GDPR and ePrivacy compliant without your customers needing to declare your usage or have a data processing agreement with you. But it gets you the least analytical data (we use SimpleAnalytics).
None of these can do cohorts, the holy grail of VC analytics.
For cohorts I would think you could make something GDPR compliant with Bloom (Cuckoo) filters.
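A rough sketch of what that could look like: one Bloom filter per cohort (say, per signup week), which can answer "was this visitor probably in that cohort?" without storing a list of identifiers. The sizes, the SHA-256 based hashing and the visitor token format are assumptions for illustration, and nothing here establishes that the result is GDPR compliant.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Fixed-size probabilistic set: no false negatives, some false positives."""

    def __init__(self, size_bits: int = 8192, num_hashes: int = 4) -> None:
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive num_hashes bit positions by prefixing the item with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(item)
        )


# Hypothetical usage: one filter per signup week.
week_33_cohort = BloomFilter()
week_33_cohort.add("visitor-token-a")
returning = "visitor-token-a" in week_33_cohort
```

The filter cannot be enumerated to recover the tokens that went in, but anyone who can recompute a visitor's token can still test membership, so the privacy properties depend heavily on how that token is derived.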
Also, anyone with a Tedomum account? It'd be nice if you could open an issue about adding umami.is. https://forge.tedomum.net/ReverseEagle/developers/-/issues
It has a plain PHP + MySQL backend, so it's really easy to install (on a LAMP server, as a WordPress plugin or one-click install on a DigitalOcean droplet).
When I started building it 8 years ago, the idea was exactly this, it should run on any basic shared hosting that can run PHP, so any site can just have its own analytics dashboard, without relying on 3rd parties.
1. Upload the script files.
2. Create a MySQL database for the script to use.
3. Run the auto-installer (to set up DB connection and create the tables in DB).
https://docs.usertrack.net/installation/uploading-the-script
This isn't a criticism of Umami. It looks like a nice clean app that accomplishes what it is trying to do. But if this is all you needed from Google Analytics, then that tool was overkill in the first place.
https://www.kaushik.net/avinash/
https://www.kaushik.net/avinash/digital-marketing-and-measur...
out of curiosity, why so?
As for the question, I saw a lot of great reviews of ClickHouse DB.
Not if you assume that some hours will have more web traffic than others.
For simple sites like blogs, simple low volume ecommerce, etc.
But for more "serious" eCommerce, SaaS-based applications and sites that are concerned with marketing on email, social and web, then optimizing what you show them, and finally generating leads for salespeople to call or actual sales...
Cookies or local storage, or some way of tracking the customer across all the channels and their actions are essential.
If one can avoid using Google Analytics, then that's a good thing also.
But let's get real -- the idea of a cookie-less future is not gonna happen because people actually do business on the web.
But I'm always amazed at how much popularity these projects seem to gather. I myself made a very simple landing page [1] for a similar service (but one that caters more to the saas based applications), and it's managed to gather some interest even though I've barely done any promotion to it.
[1]: https://tinylens.io
Sure, business-wise and cost-wise it might be better, but should we accept it?
Also none of this is "essential" at all. It is only needed in a world where the competition does it too because they think it will give them a competitive advantage.
If we could decide that this kind of tracking becomes illegal, then all those big companies would be totally fine. We'd still be buying the products we need from them.
I know ~10 of them are React, and there are some in there that make sense. But I haven't got the time to audit them all, and re-audit them every time any of those dependencies update.
And escape-string-regexp? Really? it's literally 2 lines of code [0]. Why have I got to give the maintainer of that project commit access to this program that will be seeing potentially sensitive data?
Why, if the developer couldn't come up with those 2 lines themselves, isn't this a Stack Overflow copy/paste?
[0] https://github.com/sindresorhus/escape-string-regexp/blob/ma...
As a noob at UI it was bizarre and unintuitive for me.
Just finding the region locations of the traffic was odd and didn't make immediate sense.
<div class="button umami--onclick--signup-button">Signup</div>
[0] https://softwareengineering.stackexchange.com/questions/2905...
just add the command as a cron job, and you get an auto generated static dashboard. very neat.
I wouldn’t call this a replacement to Google Analytics.
The reason to have something like Google Analytics is to track traffic at a more granular level, and with very specific intent.
Some of the things I _rely_ on include:
- custom parameters
- segments
- goals
- A/B testing
- specific views
And that’s just the short list.
Now, I use Analytics heavily because we spend a lot of effort on growth, both organic (content, seo) and paid (ads), so knowing what’s going on at that level is essential.
If you don’t, there’s not much reason to use something like GA.
Tackling the privacy focus for GA is great, but there are a good deal of products out there that already fill that niche, not to mention the requirements of the privacy crowd usually being a venture in itself.
If you wanted to make it relatively competitive for marketing, the simplest addition would be adding labelling via regex for referrers.
i.e. - Some users want to be able to group Baidu, Google, DuckDuckGo, into a single bucket for comparison. Some users want to break them down into common market segments by country. "https://www.baidu.com/link?url=FyYbCZqj65Vc7A4XeSNrOcQCS2qFX...
is from your live demo referrers, and makes it difficult to actually assess the amount of traffic from Baidu. Using a regex label means that users can break down traffic from Paid/Organic marketing fairly quickly, and start to build up dashboards they can use.
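As a minimal sketch of the regex labelling idea, assuming labels are matched against the referrer's hostname only: the rule list, label names and function name below are hypothetical, and a real setup would let each user define their own rules.

```python
import re
from typing import List
from urllib.parse import urlparse

# Hypothetical label rules: (label, pattern matched against the referrer host).
# A referrer can pick up several labels, e.g. a broad bucket plus a specific one.
LABEL_RULES = [
    ("search", re.compile(r"(^|\.)(google|baidu|duckduckgo|bing)\.")),
    ("baidu", re.compile(r"(^|\.)baidu\.")),
    ("social", re.compile(r"(^|\.)(twitter|facebook|reddit)\.")),
]


def label_referrer(referrer: str) -> List[str]:
    """Return every label whose pattern matches the referrer's hostname."""
    host = urlparse(referrer).hostname or ""
    return [label for label, pattern in LABEL_RULES if pattern.search(host)]


print(label_referrer("https://www.baidu.com/link?url=abc"))
print(label_referrer("https://duckduckgo.com/"))
print(label_referrer("https://example.org/some/page"))
```

Because the labels are derived from the stored referrer rather than stored with the hit, changing a rule and re-running it over past data is just a matter of calling the function again.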
If you ever extended it to allow multiple labels for each hit, could re-run the regex over past data, and could build reports off it, you'd easily have a benefit over GA that would start to wean the marketing crowd off it.
I have been working on something similar at https://argyle.cc -- we combine cloud analytics with a self-hosted analytics collector js. That gives you the best of both worlds: privacy focused, user respecting analytics, but full featured reporting in the cloud and ad-blocker resistance. It also allows event tracking to be done over js/web or in-line/server side.
https://en.wikipedia.org/wiki/Local_differential_privacy and https://en.wikipedia.org/wiki/Randomized_response
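For anyone who doesn't want to read the articles, here is a small sketch of the classic two-coin randomized response scheme applied to a yes/no metric (for example "did this visitor see the pricing page?"). The function names and the simulated data are assumptions for illustration only.

```python
import random
from typing import List


def randomized_response(truth: bool, rng: random.Random) -> bool:
    """Report the truth with probability 1/2, otherwise report a fair coin flip."""
    if rng.random() < 0.5:
        return truth
    return rng.random() < 0.5


def estimate_true_rate(reports: List[bool]) -> float:
    """Recover the population rate from the noisy reports.

    P(report yes) = 0.5 * p + 0.25, so p = 2 * (observed - 0.25).
    """
    observed = sum(reports) / len(reports)
    return 2 * (observed - 0.25)


# Simulated population where 30% of visitors have the property.
rng = random.Random(0)
truths = [True] * 3000 + [False] * 7000
reports = [randomized_response(t, rng) for t in truths]
estimate = estimate_true_rate(reports)
```

No single report reveals a visitor's true answer with certainty (a "yes" is three times as likely from a true "yes" as from a true "no"), yet the aggregate rate can still be estimated, at the cost of more noise for small sample sizes.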
Using it for some personal stuff, and does absolutely everything I need it to, and then some.
I love the ethos of the project, and whilst it's open source, there's a hosted option that looks super reasonable too.
What was your reasoning? Personally, I write tests for all my projects, it forces me to really think hard about how to break down the different components and functionalities and it helps others feel more confident to contribute.
FlightPHP looks nice, too. Why didn't you use that for the backend?
Hacker News hug of death?