Show HN: We wrote a book to help data people build scalable analytics stacks

(holistics.io)

33 points · shadowsun7 · 6y ago · 16 comments

16 comments · 9 top-level

vinhdp · 6y ago · 2 in thread

I'm a data engineer at a large corporation. At my current company, we use Pentaho to extract and transform our data from Oracle daily. The transformed data is loaded into a staging database, where we model it into a star schema. The final results are then bulk loaded into an on-prem DW. The process takes hours to complete.

I’m interested in moving to a model that continually extracts changed data to a data lake, then using the power of a cloud data warehouse to read those files and perform the transformations and modeling in SQL. I guess that's the ELT concept that you mentioned in the book's summary.

The goal is to reduce latency and allow for more frequent batches, while making the process more accessible to my team, who have strong SQL skills, and letting us adapt faster to changing business needs.

This book looks like a good way for me to get a glimpse of those new processes. Thanks for putting it together.
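For readers who want to see the shape of the ELT flow described in this comment, here is a minimal sketch. It assumes an in-memory SQLite database as a stand-in for a cloud data warehouse; the table names (`raw_orders`, `fct_customer_revenue`), the columns, and the `run_elt` helper are hypothetical and not taken from the book or from the commenter's setup. Changed rows are loaded untouched, and all modeling happens afterwards in SQL.

```python
import sqlite3


def run_elt(raw_rows):
    """Load raw order rows as-is, then model them with SQL inside the database.

    raw_rows: iterable of (order_id, customer, amount_cents, updated_at) tuples.
    Returns a list of (customer, revenue_cents) tuples sorted by customer.
    """
    # Assumption: in-memory SQLite stands in for a cloud warehouse.
    conn = sqlite3.connect(":memory:")

    # Extract + Load: land the changed rows without transforming them.
    conn.execute(
        "CREATE TABLE raw_orders ("
        "order_id INTEGER, customer TEXT, amount_cents INTEGER, updated_at TEXT)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?, ?)", raw_rows)

    # Transform: keep only the latest version of each order,
    # then aggregate revenue per customer into a modeled table.
    conn.execute(
        """
        CREATE TABLE fct_customer_revenue AS
        SELECT customer, SUM(amount_cents) AS revenue_cents
        FROM raw_orders AS r
        WHERE updated_at = (
            SELECT MAX(updated_at) FROM raw_orders WHERE order_id = r.order_id
        )
        GROUP BY customer
        """
    )

    rows = conn.execute(
        "SELECT customer, revenue_cents FROM fct_customer_revenue ORDER BY customer"
    ).fetchall()
    conn.close()
    return rows
```

Because the raw rows are kept, the transformation can be rewritten and rerun in SQL without re-extracting from the source system, which is where the faster iteration would come from.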

huy · 6y ago

I'm one of the authors of the book. Yes, you're right. The book outlines the transition from the "old world of BI" to the "new world of BI". If you read through the book, you'll see our biases clearly stated there:

- ELT over ETL

- Data modeling is a crucial part of the BI workflow

- Cloud DW over on-premise DW (but this depends on your org's requirements)

- SQL reporting (Redash, Metabase, Looker, Holistics) over non-SQL reporting (Tableau, Qlik)

mritchie712 · 6y ago

I'd toss SeekWell[1] in the "new world" category, but we've taken a different approach. Instead of forcing people to use a whole separate platform for BI, we decided to tightly integrate with the tools people were already going to for data (e.g. Google Sheets, Excel, Slack, etc.). We've found teams stay better informed when the data is in places they're already hanging out.

[1] https://seekwell.io/

sondnm · 6y ago · 1 in thread

Really well done! The content looks like a great balance. I'm an engineer working a bit in data engineering, and I find the content of the book relevant to me.

I like how you explain why the methodologies of the pre-cloud era still hold lessons that apply today, even though implementation best practices have changed thanks to the cost model of the cloud.

The section that stood out the most to me, strangely, had nothing to do with the technology or the analytics stack. It was in Chapter 3 – Data Modeling Layer and Concepts, where you discuss the dynamic between the CEO, the data analyst, and the data. It articulated quite well how that dynamic works at our current company. Even with our current data warehouse, our BI team is a bottleneck, and that is becoming more and more apparent to me. It is my primary motivation for figuring out how best to re-architect our analytics stack.

huy · 6y ago

Thank you for sharing your thoughts! Yes, the concepts of data modeling and self-service analytics are interesting, yet few people fully grasp them. Chapter 3.1 is probably one of my favorites.

kentnguyen · 6y ago · 1 in thread

I've been looking for something like this for a while. Whenever I search online for resources on building an analytics stack, most of the content is biased towards the vendor's preferred way of doing things. This looks like it will give me a high-level understanding of the why behind the proposed approach.

huy · 6y ago

This is exactly why we started writing this book. We spoke with a lot of customers, and a fair share of them told us the same thing: they found a lot of how-to content on the web, but none of it is comprehensive or goes deep into the "why" and the "history" of BI.

sixhobbits · 6y ago · 1 in thread

Amazing that we can get such high quality resources for free. That said, I'm not convinced by the example where the CEO uses the "data modelling layer" (essentially what Holistics offers to build for you and what this book is an ad for). In my experience a good data analyst does far more than "translate" the business question to SQL. The exec's understanding is limited not only by not knowing SQL, but also by potential confounds or a billion other things that can make a seemingly meaningful result meaningless or dangerous.

I don't think simpler technology can give people the magic answers and easy data access that they crave, any more than no-code tools can let people build complicated, correct systems.

huy · 6y ago

You’re right, and also right that we might be biased. We are actually not trying to say that with the modeling layer the CEO can remove her reliance on the DA completely; that would be foolish. We think she can reduce that reliance only when it comes to getting access to data. We are not talking about complicated analysis that requires a proper data mindset.

khaito24 · 6y ago · 1 in thread

Great book, and spot on about the problem statement. One thing worth pointing out: I didn't see any mention of testing your data models or of version control. This should be part of the modern analytics stack's process to ensure data quality.

A suggestion of approaches and tools could be useful, whether that's tools such as dbt for checking expected field values or frameworks such as Great Expectations. What happens to data that doesn't conform to expected values? How should you handle it? And how can the testing process be automated? This is an important part of ensuring the data quality and reliability of the analysed output.
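To make the question concrete, here is a minimal sketch of the kind of check being asked about, written in plain Python rather than with dbt or Great Expectations (it does not use or imitate either tool's API). The `check_rows` helper, the column names, and the choice to quarantine rejected rows along with their reasons instead of dropping them are all assumptions for illustration.

```python
def check_rows(rows, required, accepted):
    """Split rows into (valid, rejected) using two simple data tests.

    rows:     list of dicts, one per record
    required: column names that must not be None (a "not null" test)
    accepted: dict mapping column name -> set of allowed values
    Rejected rows are returned with the list of reasons they failed,
    so they can be quarantined and inspected rather than silently lost.
    """
    valid, rejected = [], []
    for row in rows:
        # "Not null" test on every required column.
        problems = [f"{col} is null" for col in required if row.get(col) is None]
        # "Accepted values" test; nulls are left to the not-null test above.
        problems += [
            f"{col}={row[col]!r} not in accepted values"
            for col, allowed in accepted.items()
            if row.get(col) is not None and row[col] not in allowed
        ]
        if problems:
            rejected.append((row, problems))
        else:
            valid.append(row)
    return valid, rejected
```

Run as a step after each load, a check like this could fail the batch or route the rejected rows to a separate table, which is one way to automate the testing process.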

huy · 6y ago

Yes, we did get a fair amount of early feedback on this exact topic of data quality. But we eventually decided to cover it in a separate sidebar to the book. It’s also not a simple topic to cover, and the book is supposed to “give you enough to be dangerous”.

scared2 · 6y ago · 1 in thread

Very nice drawings. What tool was used?

huy · 6y ago

We used an iPad, an Apple Pencil, and the Paper app :)

alanng · 6y ago

This is just the book I need!

A little bit of context: I am a product manager and I have been working with data analysts and engineers for a few months. Even though I have tried to do a lot of research, sometimes I still don't understand what they are saying.

The terminology is extremely difficult and varies from site to site, and it seems like each company has a different understanding of the same term.

So that's where this book comes in handy. It helped me visualize the big picture of the whole data analytics landscape. What's more, I understand what the role and challenges of the data analysts and data engineers in my team are. I was able to communicate with them in their "language", especially when I was explaining why we should use ELT instead of ETL (Chap 3, I suppose)

Anyway, I think this book is great for non-tech people like me, but it requires some experience in the tech industry to get started with. Definitely recommended for other PMs who will be working with data people!

shadowsun7 (OP) · 6y ago

Hey HN. This is something we've been working on for the last three months over at Holistics.

If you're a data analyst, data engineer, or a founder setting up a data analytics stack for the first time, this is a book that will give you a soup-to-nuts overview of an entire field.

Like most books about data analytics, this assumes some amount of technical competence.

Unlike most books in the space, this is mostly about first principles. About the ideas behind the tools, not the tools themselves.

The hope is to give you 'just enough to not get lost'. And the book is written to be read within two hours; in some cases, in no more than two sittings!

There's probably more than a hundred hours of research and writing that went into this. I'm looking forward to reading your comments.

thongda · 6y ago

Nice, the illustrations look pretty good. I just took a look at the table of contents, and it seems to cover a lot of the questions about data analytics that a product guy like me has. I will spend some time reading it this weekend.

Sending to my data team btw, thanks for sharing.
