Analytics at GitHub

(johnnunemaker.com)

125 points · janerik · 12y ago · 16 comments

10 comments · 6 top-level

fuziontech · 12y ago · 2 in thread

Fantastic read. Concise and solid decision explanations. Thanks for writing!

Was there any other reason you chose kestrel over alternatives like kafka? Did you test any others, or were you just that satisfied with kestrel?

jnunemaker · 12y ago

We chose kestrel mostly just from usage/familiarity. We've been satisfied with it, but are currently researching/testing kafka.

nextplaylist · 12y ago

Are you guys using either of them elsewhere or just for analytics?

nicklovescode · 12y ago · 2 in thread

As an aside, do you have any info on the software used to render the charts? I'm guessing d3 is in there somewhere, but maybe not. I've struggled to find a beautiful charting library, and yours are beautiful!

calavera · 12y ago

we use d3 for all our charts.

nicklovescode · 12y ago

any chance of you guys open-sourcing them?

phunge · 12y ago

Jay Kreps speaks the truth. His talk "Building LinkedIn's Real-time Data Pipeline" is along the same lines as the Log blogpost mentioned here and is also extremely informative.

khaledh · 12y ago

Very good article. It aligns with our envisioned architecture for our next-gen analytics platform.

So far our decision is to keep the raw events in Cassandra and pre-aggregate most data for fast reads. I'm just wondering about your decision not to store raw events in Cassandra, to use raw files for that instead, and to use Cassandra only for storing Hadoop analysis results. Do you think this decision may affect you later if you ever decide to support real-time analytics?

nickstinemates · 12y ago

> For any business, the process of collecting data, measuring performance, making changes, and reviewing if those changes were successful is really important.

This applies to any sort of goal/process/?, whether programmatic or personal.

Very cool story, I'm looking forward to additional features. We pull a lot of data about Docker from GitHub that could be more readily available. We'd be more than happy to discuss or beta-test any new features, if you're interested.

alexatkeplar · 12y ago

Nice to see lots of parallels to how we have architected things at Snowplow (trackers -> collectors -> enrich -> storage -> analytics)
