We are at the stage now we we need to start doing it "properly". I have a reasonably good understanding of tech in general, including on the infra, database and sql side. I have also done a fair bit of research into data warehousing, ELT/ETL pipelines, etc.
To start off with, I'd like to get a proof of concept up and running, just by extracting data from our RDS databases (we don't have a HUGE amount of data - few hundred million records spread over 2 DBs). There are other data sources that we'd like to connect to, but that can come later.
We would prefer AWS or self hosted, but we are open to any possibilities (Google, Fivetran, Snowflake, etc). I guess there are 3 pieces to the puzzle:
- Data warehouse (I almost think that we should just be pushing this all to another RDS database (and denormalise, aggregate, etc where appropriate).
- ETL/ELT
- Reporting (Metabase, Superset, Quicksight?)
We're looking for the simplest option that we can use to get up and running, but also have a clear path to growing with us over time. Do you have any suggestions on the best way to get started?