“Airflow is not a data streaming solution. Tasks do not move data from one to the other (though tasks can exchange metadata!)”
Many of our tools take on the order of 5–200 GB of data and perform either a transformation (whose output, of similar size, is passed along to the next tool, possibly after another automated validation step), a validation (at which point that particular branch of the workflow ends), or both.
Our automated modules are self-contained: each task is “data + config parameters in, data out”, and the “data out” of one step becomes the “data in” of the next once configuration parameters for that step are chosen.
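To make the pattern concrete, here is a minimal sketch in plain Python (no Airflow dependency; the file names, config keys, and toy transformation are hypothetical). Each step writes its output to storage and hands the next step only a lightweight reference, a path, which is exactly the kind of metadata that Airflow tasks can exchange (e.g. via XCom) even though the data itself never passes through the orchestrator:

```python
import json
import tempfile
from pathlib import Path

workdir = Path(tempfile.mkdtemp())

def transform(input_path: Path, config: dict) -> Path:
    """Read data, apply a transformation, write the result, return its path."""
    data = json.loads(input_path.read_text())
    scaled = [x * config["scale"] for x in data]
    output_path = workdir / f"{input_path.stem}_scaled.json"
    output_path.write_text(json.dumps(scaled))
    return output_path  # only this small path, not the data, moves between tasks

def validate(input_path: Path, config: dict) -> bool:
    """Validation step: if it fails, this branch of the workflow stops."""
    data = json.loads(input_path.read_text())
    return all(x <= config["max_value"] for x in data)

# Chain: raw data -> transform -> validate -> (next transform, or stop)
raw = workdir / "raw.json"
raw.write_text(json.dumps([1, 2, 3]))

step1 = transform(raw, {"scale": 10})
ok = validate(step1, {"max_value": 100})
print(ok)  # True: the branch continues to the next step
```

The point of the sketch is that the orchestrator only ever sees the small return values (paths, booleans); the multi-GB payloads stay in external storage.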
Would this still be a good use case, or am I misunderstanding what the quote above is about?