AWS also had customers with petabytes of data in Redshift for analysis. The conversation is missing a key point: DuckDB is optimizing for a different class of use cases, data science rather than traditional data warehousing. The real distinction masquerades as one of size. Even at small sizes, there are other considerations: access control, concurrency control, reliability, availability, and so on. The requirements differ across these use cases. Data science tends to be single-user and local, with lower availability requirements than warehouses that serve production pipelines, data sharing, and the like. DuckDB can be used for those use cases too, but it isn't optimized for them.
Data size is a red herring in the conversation.