The JSON support (SUPER type) is kind of cool, and they are moving towards more “automatic” sorting + partitioning, but it’s just all a bit shit to be honest.
We encountered major bugs with data sharing; our clusters keep insisting that zstd is the best compression format for all our data (but then never actually use it); materialised views often fail to update, and understanding why is a nightmare; terrible performance if your strings are varchar(max) (guess what Glue sets them to…); Redshift often just dies (4-hour downtime recently, no status page) and has some really weird semantics around listing queries; before the Data API you couldn't run async queries, and its EventBridge integration straight up doesn't work; nightmare bugs in the Java connection library that don't show up using psql; a tiny set of types (no arrays, no UUIDs); unkillable queries; AQUA actually causing everything to slow down hugely; critical release notes posted only in a fucking random forum; etc. etc.
Snowflake has apparently sorted this, as well as including ingestion tools (snowpipe) that you’d otherwise have to stitch together with AWS Glue or something (a cursed service if ever there was one).
That being said, in some cases Redshift absolutely flies. But the real world isn’t filled with ideal schemas and natural sort keys. It’s messy. And Snowflake deals with messy better.
Snowflake gives you visibility into clustering [1], and in the query profile view you can see how pruning is working (or not working).
Can you give an example of what visibility you would like to see in terms of partitioning?
1. https://docs.snowflake.com/en/sql-reference/functions/system...
Edit for future readers: the original comment was "I haven’t really been impressed with Redshift, it seems like too little and too late".
basically distributing compute down to the actual storage nodes
What they emit is the dance routine of a sugar-coated cheerleader squad.
I am of the view that everything which is not a strength is obfuscated.
I have zero faith, confidence, or trust in any information AWS emits.
I approach press releases and the docs on the basis that they cover up the actual implementation, and so my task is to find out what is actually going on under the hood, so I can actually make sense of what's been provided and operate it correctly (or avoid it completely, as it may be!)
I believe the real advantage AWS has here is in cost. Snowflake has positioned itself as price-competitive with Redshift, but this is primarily due to Snowflake's ability to scale on demand, whereas prior Redshift versions required you to size for peak usage (RA3 helped with this). In my experience Snowflake is an order of magnitude more expensive if you compare similar workloads and do not account for idle time. We will need to see the performance of a "Redshift Processing Unit" to be sure of the advantage, but even so AWS will be able to provide significant downward cost pressure through this offering.
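To make the idle-time point concrete, here's a back-of-envelope sketch. All the prices and sizes below are assumptions for illustration, not current list prices for either service: a fixed-size Redshift cluster bills around the clock, while a Snowflake warehouse that auto-suspends only bills for busy hours. Run the same comparison with a 24/7 workload and the picture flips.

```python
# Hypothetical rates -- NOT real list prices; adjust to your own quotes.
hours_per_month = 730

redshift_node_hourly = 3.26   # assumed per-node on-demand rate
redshift_nodes = 4
# A provisioned cluster sized for peak bills every hour of the month.
redshift_monthly = redshift_node_hourly * redshift_nodes * hours_per_month

snowflake_credit_price = 3.00  # assumed USD per credit
credits_per_hour = 16          # assumed warehouse consumption rate
busy_hours = 80                # warehouse auto-suspends when idle

# Billed only while queries actually run.
snowflake_monthly = snowflake_credit_price * credits_per_hour * busy_hours
# Same warehouse running flat out, 24/7 -- no idle time to save on.
snowflake_flat_out = snowflake_credit_price * credits_per_hour * hours_per_month

print(f"Redshift (always on):   ${redshift_monthly:,.0f}/mo")
print(f"Snowflake (busy only):  ${snowflake_monthly:,.0f}/mo")
print(f"Snowflake (24/7 load):  ${snowflake_flat_out:,.0f}/mo")
```

With these made-up numbers the mostly idle Snowflake warehouse is far cheaper, while the flat-out one costs several times the fixed cluster — which is the "don't account for idle time" caveat in a nutshell.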
Cost is also why I'm most bullish about Databricks's FOSS https://delta.io
My experience with support/account managers is that they always tell you "yes, Redshift can do this", and the only way to actually get a "no" out of them is to already know Redshift cannot do something, and to explain to them why.
They won't deny reality, but you would never have got that answer from them in any other way.
I suspect the problem is the training AWS gives its staff. The material they are taught is relentlessly positive, and I suspect AWS staff actually have no idea what Redshift is no good for.
(Indeed, if you read the official docs for RS, which I strongly advise you never to do, you will come out the other end under the impression there is literally nothing Redshift cannot do; the docs describe everything using positive terms only.)
The advantage is the flexibility to easily change compute resource. A disadvantage is that your data is now in S3 or something very like it, and I think this alters the write-performance characteristics of the cluster; I've not yet looked into this, but it's on the list.
You absolutely should beware of falling into the trap of imagining that serverless simply gives you flexible compute and that's the only change to behaviour.
AWS in their press releases and docs are relentlessly positive - anything which is not a strength is obfuscated - so only actual experimentation and investigation throws light on what you're really getting.
Technically, Athena is based on modified Presto while Redshift is (very) heavily modified Postgres.
Athena = Lambda + S3 (what I would call true serverless)
Redshift Serverless = Auto AWS Managed EC2 instances with local storage + S3
Although I could be wrong as I just had a quick 5 minute look at it...
From their FAQ [1]:
Q: What is the difference between Amazon Athena, Amazon EMR, and Amazon Redshift?
[...] Amazon Redshift provides the fastest query performance for enterprise reporting and business intelligence workloads, particularly those involving extremely complex SQL with multiple joins and sub-queries.
[...] Amazon Athena provides the easiest way to run ad-hoc queries for data in S3 without the need to setup or manage any servers.
Elsewhere they also recommend Athena for unstructured data. This and a relational database like Postgres are fundamentally different, and you shouldn't use one when your use case is primarily meant for the other.

Having used both, I do think BigQuery is better in a lot of ways (although it's easier to make it a lot more expensive too), but I'm really excited to see Redshift catch up. Adding the serverless option is really great too, since my biggest complaint with Redshift was managing the quantity and type of the underlying instances.
See more details here:
BigQuery is a full database. It is significantly faster than running anything from Athena. The closest comparison on AWS is Redshift.
Getting data into Athena isn't something that is just done for you. Athena just takes what you've put on S3 and queries over it - and leaves getting it onto S3 (and into an efficient format) as an exercise for the reader.
Athena's speed varies a lot depending on what format you put things in. Querying over CSVs will mean that you're slow and reading a lot of data. Querying over ORC (column-store) files is pretty quick.
The big thing is Athena's pricing. They price it on how much data you actually read, not how much data would be read if things weren't optimized. BigQuery charges you based on how much data would be read if it weren't optimized. With BigQuery, an integer is always 8 bytes. It doesn't matter if they're able to optimize it down to nothing using RLE (run-length encoding): you still pay the full 8 bytes. If your ORC files make that integer column tiny, you get the benefit of that.
BigQuery is great, but Athena's pricing is a lot cheaper given that you get to benefit from any storage optimization you do.
Out of curiosity, how have you used Athena that you're seeing it be so much slower? In my experience, BigQuery is faster (maybe 2x faster), but I've been using column-oriented data with Athena. If you're using CSVs with Athena, it will be way slower than BigQuery.
I'm always a little surprised that AWS doesn't build Athena out more, but I guess if they did they'd want money and margin for the value add. Still, Athena is a pretty decent serverless Presto and Presto can work pretty well over data in column formats.
There are too many simple problems which should have been caught in testing, and the problems which have over time been found absolutely imply unprofessional, even amateurish, software development standards.
For example, recently, the format of the version string was changed. This broke a lot of existing software, which had hard coded parsing - SQL Alchemy stopped working - so did AWS's own JDBC driver.
This on the face of it indicates the RS test suite does not include any connections over JDBC.
It then turned out the version string had in any case been inaccurate for months, because RS had moved from GCC 3.4.2 (I think it was) to 7.3, but the version string kept reporting the old numbers.
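The fragility here is hard-coded parsing of a free-form version string. A sketch of more tolerant parsing — the example string below is hypothetical (Redshift reports a Postgres-style banner, but the exact contents vary, which is exactly why the format change broke clients):

```python
import re

# Hypothetical version() output in the Postgres-style banner format.
version_string = (
    "PostgreSQL 8.0.2 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) "
    "3.4.2 20041017 (Red Hat 3.4.2-6.fc3), Redshift 1.0.28965"
)

def parse_versions(s):
    """Pull out whichever version numbers are present, instead of
    assuming fixed positions in the string."""
    pg = re.search(r"PostgreSQL (\d+)\.(\d+)(?:\.(\d+))?", s)
    rs = re.search(r"Redshift (\d+)\.(\d+)\.(\d+)", s)
    return (
        tuple(int(g) for g in pg.groups() if g is not None) if pg else None,
        tuple(int(g) for g in rs.groups() if g is not None) if rs else None,
    )

pg_ver, rs_ver = parse_versions(version_string)
print(pg_ver, rs_ver)  # (8, 0, 2) (1, 0, 28965)
```

Searching for labelled patterns and degrading to `None` when a component is absent survives reordering and extra text, where positional string-splitting does not.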
I can't even begin to describe how many issues there are in the official docs - flat factual errors, and profound gobbledygook.
The whole thing just feels too much like amateur hour.
Load my data where? This is "serverless".
More seriously, "serverless" usually just means you aren't supposed to worry about server/cluster management, not that there are no servers anywhere. So it really means "load your data to Redshift, wherever that lives".