RandomBK 3y ago

>the moment you step out of FAANG

Even in FAANG, most data is going to be extremely messy.

* There's usually very little incentive for good documentation, so you have to guess what the table contains based on column names and exploring patterns in the data.

* Enforcing a single pattern/process for data across the entire org is a pipe dream.

* Logging tables are written to by hundreds of engineers across dozens of teams, with no standardized naming scheme.

* The product contains multiple edge cases/special overrides for very specific circumstances, so there are very few simple queries that don't come with 100 footnotes attached.

FAANG is not immune to large-organization problems, and data quality is no exception.

2 comments · 2 top-level

foobazgt 3y ago

Can't speak for FAANG, but can confirm for a $100B+ business that analytics was a huge mess. There was constant investment in both the engineering and analytics functions to wrangle a coherent view on top of the underlying operational data model.

Eji1700 3y ago

Oh yeah, I'm not surprised to hear that. I've just known one or two people who've been analysts at similarly sized companies, and while the underlying table structure was a nightmare, the data model they dealt with was pretty clean.

But with that in mind, that's because there's a major pipeline of people and processes to get the data to that point. It meant there could sometimes be a significant delay on new KPIs, as they had to be cleanly worked into the model, and of course it didn't represent everything.
