I’m interested in moving to a model that continually extracts changed data to a data lake, then using the power of a cloud data warehouse to read those files and perform the transformations and modeling in SQL. I guess that's the ELT concept that you mentioned in the book's summary.
The goal is to reduce latency and allow for more frequent batches, while making the process more accessible to my team, which has strong SQL skills, and letting us adapt faster to changing business needs.
This book looks like a good way for me to get a glimpse into these new processes. Thanks for putting it together.
- ELT over ETL
- Data modeling is a crucial part of the BI workflow
- Cloud DW over on-premise DW (but this depends on your org's requirements)
- SQL reporting (Redash, Metabase, Looker, Holistics) over non-SQL reporting (Tableau, Qlik)
I like how you explain why the methodologies of the pre-cloud era still hold lessons that apply today, even though implementation best practices have changed thanks to the cost model of the cloud.
The section that stood out the most to me, strangely, was not anything to do with the technology or analytics stack. It was in Chapter 3 – Data Modeling Layer and Concepts where you discuss the dynamic between the CEO and the data analyst and the data. This really articulated quite well how our own dynamic functions at our current company. Even with our current data warehouse, our BI team is a bottleneck, and it is something becoming more and more apparent to me. It is my primary motivation in seeking out how best to re-architect our analytics stack.
I don't think simpler technology can give people the magic answers and easy data access that they crave, any more than no-code tools can let people build complicated, correct systems.
A suggestion of approaches and tools could be useful, whether via tools such as dbt for testing expected field values or frameworks such as Great Expectations. What happens to data that doesn't conform to expected values? How should you handle it? And how can the testing process be automated? This forms an important part of ensuring the data quality and reliability of the analysed output.
A little bit of context: I am a product manager and I have been working with data analysts and engineers for a few months, and even though I have tried to do a lot of research, sometimes I still don't understand what they are saying.
The terminology is difficult and varies from site to site, and it seems like each company has a different understanding of the same term.
So that's where this book comes in handy. It helped me visualize the big picture of the whole data analytics landscape. What's more, I now understand the roles and challenges of the data analysts and data engineers on my team. I was able to communicate with them in their "language", especially when I was explaining why we should use ELT instead of ETL (Chap 3, I suppose).
Anyway, I think this book is great for non-tech people like me, but it requires some experience in the tech industry to get started with. Definitely recommend for other PMs who will be working with data people!
If you're a data analyst, data engineer, or a founder setting up a data analytics stack for the first time, this is a book that will give you a soup-to-nuts overview of an entire field.
Like most books about data analytics, this assumes some amount of technical competence.
Unlike most books in the space, this is mostly about first principles. About the ideas behind the tools, not the tools themselves.
The hope is to give you 'just enough to not get lost'. And the book is written to be read within two hours, or in some cases in no more than two sittings!
There's probably more than a hundred hours of research and writing that went into this. I'm looking forward to reading your comments.
Sending to my data team btw, thanks for sharing