Reading the data off S3 means you'll be slower than offerings like Snowflake. Snowflake has optimized the crap out of doing analytics on S3, so you can't beat it with something as simple as DuckDB.
Importantly, you need the data in a format that splits into independent pieces, like Parquet or CSV broken up across multiple files. Otherwise DuckDB can't read it in parallel.