We have millions of rows coming in from unique users in real time at $600/mo; with Segment this would be at least $5,000/mo.
We then use Redash to prepare charts, tables, etc. for analysis.
This might!
The pros are:
* full SQL access to the data via BigQuery
* simple to set up (yes, you have to write custom code, but a basic implementation is on the order of a couple dozen lines of code)
* we have full ownership of the data across the pipeline (better for user privacy than using another 3rd party)
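To give a sense of how small that custom code can be, here is a minimal sketch of a collector that buffers events and flushes them in batches. The `EventBuffer` class, its `maxBatchSize` parameter and the `sink` callback are hypothetical; in a real pipeline the sink might wrap a BigQuery streaming insert, which is left out so the example stays self-contained.

```javascript
// Hypothetical collector sketch: buffer events in memory and hand them to a
// sink in batches. ASSUMPTION: `sink` is an async function taking an array of
// rows; in a real pipeline it might wrap a BigQuery streaming insert.
class EventBuffer {
  constructor(sink, maxBatchSize = 500) {
    this.sink = sink;
    this.maxBatchSize = maxBatchSize;
    this.rows = [];
  }

  // Record one event, stamping it server-side, and flush once the batch is full.
  async add(eventType, attributes) {
    this.rows.push({
      ...attributes,
      event_type: eventType,
      received_at: new Date().toISOString(),
    });
    if (this.rows.length >= this.maxBatchSize) {
      await this.flush();
    }
  }

  // Swap out the current batch before awaiting the sink, so events added
  // while the write is in flight go into the next batch.
  async flush() {
    if (this.rows.length === 0) return 0;
    const batch = this.rows;
    this.rows = [];
    await this.sink(batch);
    return batch.length;
  }
}
```

A periodic timer calling `flush()` would cover low-traffic periods. Note that this sketch loses a batch if the sink throws, so a production collector would add retries.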
Prior to this we used Google Analytics, but its paid tier is too expensive for us, and its analytics/aggregation API (though quite powerful) samples a subset of the data, which was not acceptable for some of our use cases.
rakam.logEvent('pageview', {url: 'https://test.com'})
The event types and attributes are all dynamic. The API server automatically infers the attribute types from the JSON blob, creates a table for each event type with a column for each event attribute, and inserts the data into that table. It also enriches the events with visitor information such as user agent, location and referrer. Users then just run SQL queries such as:
SELECT url, COUNT(*) FROM pageview WHERE _city = 'New York' GROUP BY 1
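A rough sketch of the type inference and table creation described above, assuming a simple mapping from JSON value types to SQL column types. The function names, the type names (`BOOLEAN`, `BIGINT`, `DOUBLE`, `VARCHAR`) and the `_` prefix for enrichment fields (borrowed from `_city` in the query) are illustrative assumptions, not Rakam's actual code.

```javascript
// Illustrative sketch only: the type names, the `_` enrichment prefix and the
// DDL shape are assumptions, not Rakam's implementation.

// Map a JSON value to a SQL column type.
function inferType(value) {
  if (typeof value === "boolean") return "BOOLEAN";
  if (typeof value === "number") return Number.isInteger(value) ? "BIGINT" : "DOUBLE";
  return "VARCHAR"; // strings, and anything else stored as text
}

// Merge server-side visitor information into the event under `_`-prefixed keys.
function enrich(attributes, visitor) {
  const enriched = { ...attributes };
  for (const [key, value] of Object.entries(visitor)) {
    enriched[`_${key}`] = value;
  }
  return enriched;
}

// `schemas` maps event type -> Map(column -> type). Returns the DDL needed to
// store this event and records any new columns in `schemas`.
function ddlFor(schemas, eventType, attributes) {
  const columns = Object.entries(attributes).map(([name, value]) => [name, inferType(value)]);
  const existing = schemas.get(eventType);
  if (!existing) {
    schemas.set(eventType, new Map(columns));
    const defs = columns.map(([name, type]) => `${name} ${type}`).join(", ");
    return [`CREATE TABLE ${eventType} (${defs})`];
  }
  const statements = [];
  for (const [name, type] of columns) {
    if (!existing.has(name)) {
      existing.set(name, type);
      statements.push(`ALTER TABLE ${eventType} ADD COLUMN ${name} ${type}`);
    }
  }
  return statements;
}
```

For example, a first `pageview` event with `url` and `_city` yields a `CREATE TABLE`, and a later event carrying a new numeric attribute (say, a hypothetical `load_ms`) yields an `ALTER TABLE ... ADD COLUMN`. The sketch ignores type conflicts and does not quote or validate identifiers; a real server has to do both, since attribute names come from untrusted clients.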
The whole project is open source, and we'd love to get some contributions: https://github.com/rakam-io/rakam
If you just need web server log processing, AWStats is free and pretty good: https://awstats.sourceforge.io/