Loads of questions that might help you find your answers:
What's your re-projection strategy? Are you at liberty to apply the same projection to all of the data in your pipeline? If not (using UTM for rasters, for example), what's the smallest number of CRSs you can get away with?
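If you do end up on UTM, a quick way to answer the "how many CRSs" question is to work out which zone each dataset falls in and count the distinct ones. A rough sketch: `utm_epsg` and the `centroids` list are made-up names for illustration, and it uses the standard 6-degree zone rule while ignoring the Norway/Svalbard exceptions.

```python
def utm_epsg(lon: float, lat: float) -> int:
    """Return the EPSG code of the WGS 84 / UTM zone containing (lon, lat).

    Assumption: plain 6-degree zones only; the special zones around
    Norway and Svalbard are not handled.
    """
    if not -180.0 <= lon <= 180.0:
        raise ValueError("longitude out of range")
    if not -80.0 <= lat <= 84.0:
        raise ValueError("latitude outside the UTM coverage band")
    # Zones are numbered 1..60 eastward from 180W; clamp lon == 180 to zone 60.
    zone = min(int((lon + 180.0) // 6) + 1, 60)
    # EPSG 326xx is the northern hemisphere, 327xx the southern.
    return (32600 if lat >= 0 else 32700) + zone


# Hypothetical dataset centroids as (lon, lat) pairs.
centroids = [(-122.4, 37.8), (-121.9, 36.6), (151.2, -33.9)]
crs_needed = sorted({utm_epsg(lon, lat) for lon, lat in centroids})
print(crs_needed)
```

If that set comes back with more than a handful of codes, that's usually a hint to pick one projection for storage and only go to UTM at analysis time.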
How are you going to efficiently retrieve data?
For example, do you intend to convert your rasters to COGs (Cloud Optimized GeoTIFFs) to enable range reads? Do you intend to pyramid your rasters on ingest so you can pull different zoom levels quickly? If you have a mix of resolutions, do you want to standardize them so that co-registration is easier on the read side?
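On the pyramid question, one common convention (not the only one) is to keep halving the raster until the coarsest level fits in a single tile. A small sketch of how you could compute those decimation factors on ingest; `overview_factors` and the 512-pixel default tile size are my assumptions, not anything standard. The resulting list is the kind of thing you would hand to a tool like `gdaladdo`.

```python
import math


def overview_factors(width: int, height: int, block_size: int = 512) -> list[int]:
    """Return power-of-two decimation factors for a raster pyramid.

    Keeps doubling the factor until the longest side of the overview
    fits within one block_size x block_size tile.
    """
    if width <= 0 or height <= 0 or block_size <= 0:
        raise ValueError("dimensions and block size must be positive")
    factors = []
    factor = 1
    while math.ceil(max(width, height) / factor) > block_size:
        factor *= 2
        factors.append(factor)
    return factors


# A 10980 x 10980 pixel raster with 512-pixel tiles.
print(overview_factors(10980, 10980))
```

A raster that already fits in one tile gets an empty list, i.e. no overviews needed.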
Do you want to automate your ETL process and have it run continuously or are you ok with ad-hoc manual runs?
Is there any data filtering you want to apply in your ETL? Cloud removal, special NODATA cases, spatio-temporal filtering?
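To make the filtering question concrete, here's roughly what a scene-level filter can look like before anything gets ingested. Everything here is illustrative: the `Scene` fields, the 20% cloud threshold, and the bounding-box test are assumptions, and real metadata will have its own field names.

```python
from dataclasses import dataclass
from datetime import date

# Bounding box as (min_lon, min_lat, max_lon, max_lat).
BBox = tuple[float, float, float, float]


@dataclass(frozen=True)
class Scene:
    scene_id: str
    acquired: date
    cloud_cover: float  # percent, 0-100
    bbox: BBox


def intersects(a: BBox, b: BBox) -> bool:
    """True if two axis-aligned boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def keep_scene(scene: Scene, aoi: BBox, start: date, end: date,
               max_cloud: float = 20.0) -> bool:
    """Keep scenes inside the date window, under the cloud limit, and over the AOI."""
    return (
        start <= scene.acquired <= end
        and scene.cloud_cover <= max_cloud
        and intersects(scene.bbox, aoi)
    )


# Hypothetical catalogue entries.
scenes = [
    Scene("a", date(2023, 6, 1), 5.0, (10.0, 45.0, 11.0, 46.0)),
    Scene("b", date(2023, 6, 2), 80.0, (10.0, 45.0, 11.0, 46.0)),  # too cloudy
    Scene("c", date(2022, 1, 1), 1.0, (10.0, 45.0, 11.0, 46.0)),   # too old
    Scene("d", date(2023, 6, 3), 1.0, (50.0, 10.0, 51.0, 11.0)),   # outside AOI
]
aoi = (10.5, 45.5, 12.0, 47.0)
kept = [s.scene_id for s in scenes
        if keep_scene(s, aoi, date(2023, 1, 1), date(2023, 12, 31))]
print(kept)
```

Filtering on metadata like this is cheap; pixel-level cloud masking and NODATA handling are a separate, more expensive step.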
What are your cost, latency, and throughput requirements? Does this project prioritize any of those over the others?
Source: built a raster/vector ingestion pipeline which I now use for analysis. Contact info in bio if you want to chat more about this.