The most common reason for Spark use today seems to be ETL + data lakes (i.e., ETL in and out of cloud object stores).
It seems the actual analysis happens in fast databases that ingest data from the object stores.
Can anyone here comment on this paradigm?