Many SaaS businesses are perfectly happy to let customers shoot themselves in the foot if it generates more revenue. The BigQuery example (presently, by default, `select * from table limit 10` obediently scans the entire table at your expense!) is spot-on.
As the article so well puts it, every SaaS company has a vested financial interest "to leave optimization gremlins in."
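The BigQuery behavior is easy to model: on-demand pricing bills by bytes scanned, and a `LIMIT` clause caps the rows returned, not the bytes read. A toy sketch of the billing math (the $5/TB rate is illustrative, not a quote of current pricing):

```python
# Toy model of BigQuery on-demand billing (illustrative, not Google's code).
# On an unpartitioned table, LIMIT does not reduce the bytes scanned,
# so it has no effect on cost.

TB = 1024 ** 4
PRICE_PER_TB = 5.00  # USD; illustrative on-demand rate, check current pricing

def query_cost(table_bytes, limit=None):
    """Cost of `SELECT * FROM table [LIMIT n]` on an unpartitioned table."""
    bytes_billed = table_bytes  # full scan either way; LIMIT caps rows, not bytes
    return bytes_billed / TB * PRICE_PER_TB

full = query_cost(10 * TB)               # SELECT * FROM table
limited = query_cost(10 * TB, limit=10)  # SELECT * FROM table LIMIT 10
print(full, limited)  # both 50.0: the LIMIT saved nothing
```

Partition filters (or clustered tables) are what actually cut the bytes billed.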
In practice, as has been pointed out in other comments, they do improve their performance (for competitive reasons), and it does cost them money when they do. They did it a couple of quarters ago and left $97 million on the table.
https://www.fool.com/earnings/call-transcripts/2022/03/02/sn...
My own experience with Snowflake absolutely backs up the article's point. At my work we routinely encounter abysmal performance for certain types of queries, due to a flaw on Snowflake's side. We have had numerous talks with them and there is no question that they have an issue, but they have shown absolutely no urgency to fix it. Their recommendation is that we spend more money to work around the problem on their end.
And you’re right. The motivation Snowflake has to improve is survival. It’s not like their architecture is impossible to replicate. Redshift is doing a total reorganization and rewrite of the product to compete more directly with Snowflake (Redshift AQUA, etc.).
They also seem to completely discount the value of SaaS: outsourcing database and storage operations to Snowflake, whose only focus is operating the database product. Running your own clusters is an exercise that seems smart in the first few months; then, like a puppy that grows up, you’re stuck with a dog. If you love dogs and train them well, then great. But the fact is most people are terrible dog owners, and the same is true for MPP clusters. Being able to focus exclusively on query management operations is really ideal. Highly stateful distributed products are a PITA.
He also rants about Snowflake not telling him the hardware. Snowflake runs in EC2, GCP, and Azure; you can practically guess the hardware, since there just aren’t that many suitable instance types for that sort of workload. Discussing SSD vs. HDD is also an obvious sign of ignorance: the basic premise is that it does very wide, highly concurrent S3 GETs and scans of the data, using a FoundationDB metadata catalog to help prune. Being in AWS, it’s implausible they use HDDs, and realistically they could skip local SSDs entirely (I do not remember whether they use local disks for caching, but it’s stateless regardless).
The unit costing being hardware-agnostic is totally normal too: they don’t have to expose the details of their costing because they normalize it to a standard fictional unit.
I agree that if the performance of one of them fell behind the others for any prolonged period of time, the cost to the laggard in market share would be much, much worse than the short-term revenue gain of "being slow on purpose".
It benefits no one except a couple thousand people to play their customers so blatantly. In fact, it's worse, as it incentivizes the same behavior in other market actors in the space.
We have seen many, many examples of executives who are willing to sacrifice the future of the company for a personal short-term gain. Jacking up revenues (or slashing costs) in ways that alienate customers is a great strategy when you plan to jump off with your golden parachute in a couple of years when all your stock options vest.
Agreed, but the author does have one thing right: Snowflake is not transparent about product behavior, which makes it hard to reason about costs and performance.
Open source data warehouses like ClickHouse and Druid don't have this problem. If you want to know how something works, you can look at the code. Or listen to talks from the committers. This transparency is an enduring strength of open source projects.
Snowflake competes on marketing.
Plenty of people rave about Snowflake and have never heard of Databricks, BigQuery or Redshift.
I suspect most data warehouses have similar NDRs.
In many companies a data warehouse is the place where you dump all your data and let everyone run poorly written programs against it.
Add to that the poor engineering culture in data teams (often led by non-technical people), and costs are bound to skyrocket.
Everyone from consultants, SAs, sales, and support is constantly working toward getting customers to “optimize” their spend. Of course any business wants you to give them more money. But none of us are pushed to get customers to spend money on services or methods that do things inefficiently.
I specifically work in consulting, specializing in “application modernization”. That means most of my implementations are cheap, and I’m constantly spending time making sure my implementation is as cheap as possible while still meeting the requirements. I first noticed this attitude from AWS when I was working for a startup.
This isn’t just with AWS. I spent years working in enterprise shops and saw the same attitude working with Microsoft.
I can’t speak for any other large organizations - AWS and Microsoft are the only two I’ve worked with as either a customer or employee where there was huge spending on infrastructure or software.
Now I could easily get started about my opinion of Oracle from the customer standpoint. But I won’t.
Just another way that vendor lock-in occurs (intentionally or otherwise).
It depends on the time scale. A SaaS optimizing for, say, a 1-3 year financial return will see its interests through a different lens than one optimizing for a multi-decade return. Leaving optimization gremlins in isn't aligned with customers' interests in the long run, so customers will eventually find alternatives if the SaaS doesn't align itself with them.
[0] https://cloud.google.com/bigquery/docs/querying-partitioned-...
Or even better engage with a neutral third party such as Jepsen to get on an even playing field and duke it out.
It's like the cloud in general: the cost is high, but so is the hype. When all that dust settles over the coming years, businesses will start shopping on price. They will then realize they have been locked in to some extent and will need to start wriggling loose of the lock-in.
I found the Snowflake statement pretty reasonable. [0]
Vendor benchmarks are largely propaganda. What actually counts is performance on real-world workloads, starting with your own. Plus, good benchmarks are costly to do well. If vendors are going to invest in load testing, it's far better to do it as part of the QA process, which directly benefits users. The other thing for vendors to do is drop DeWitt clauses so others can run benchmarks and share the results. Snowflake announced this in the statement and also changed their acceptable use policy accordingly. [1]
[0] https://www.snowflake.com/blog/industry-benchmarks-and-compe...
[1] https://www.snowflake.com/legal/acceptable-use-policy/
Disclaimer: My company runs a cloud service for ClickHouse that competes against Snowflake.
* being easy to manage
* being able to scale compute up and down, so you can get good performance without having to keep a bunch of machines running
This bit me on BigQuery's public patent search, which I was just noodling with for fun. Each query was $4. Ow!
Standard disclaimer: I work in ProServe at AWS.
When you “consult” and are employed by the company selling the software, billable hours and utilization are not the be-all and end-all. Consulting is just the “nose of the camel in the tent”: they want you to be as efficient as possible so they can make ongoing revenue.
Trust me, AWS is not going to complain if it only took me 20 hours to do work that was estimated for 40 and brings in half as much consulting revenue if it means ongoing revenue from the customer.
There isn’t just a singular focus on utilization rates.
My billable hours do fine while I make operations more efficient and less costly.
The best way to describe Snowflake is as a brute-force method to run complex queries without creating indexes.
If you have a more traditional database, you will notice you need to set up indexes to be able to get anything from it in finite time. What if you don't know the indexes upfront? What if you want your users to be able to ask arbitrary queries and get answers before bedtime?
That's what Snowflake is for. It automates throwing an ENORMOUS amount of hardware at your query to get it executed fast, very inefficiently.
It is not free, though. That inefficiency means queries consume a lot of resources. It is meant for those few queries where your users try to get some insight into your data and you can't predict the indexes beforehand. Sometimes this is exactly what you want, like when you let your data people in to figure stuff out. Or when you have the very rare functionality that allows users to build their own queries -- which you should avoid like hell (and there are tricks to make it index pretty well) but can't always avoid.
For everything else, whenever you can predict your indexes, you always want to use a more traditional database, which can be very efficient on queries properly supported by indexes.
The issue is that a lot of people try to use Snowflake as an application database, or to support frequently executed queries of the same kind. This is bad, and it will cost you.
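The trade-off described above can be shown with a toy contrast (pure illustration, not Snowflake's architecture): a pre-built index answers one known query shape quickly, while a brute-force scan answers any predicate by touching every row.

```python
# Toy contrast of the two strategies: an index answers a known query shape
# in ~O(log n); a brute-force scan answers *any* predicate, but reads everything.
import bisect

rows = [{"id": i, "country": "US" if i % 3 else "DE", "amount": i * 10}
        for i in range(100_000)]

# "Traditional database" approach: a pre-built index on one known column.
index = sorted((r["id"], r) for r in rows)
ids = [k for k, _ in index]

def lookup_by_id(target):
    i = bisect.bisect_left(ids, target)
    return index[i][1] if i < len(ids) and ids[i] == target else None

# "Snowflake-style" approach: no index, just scan everything, for any predicate.
def brute_force(predicate):
    return [r for r in rows if predicate(r)]  # touches all 100k rows

assert lookup_by_id(42)["amount"] == 420  # fast, but only works for id lookups
de_total = sum(r["amount"] for r in brute_force(lambda r: r["country"] == "DE"))
```

Snowflake's trick is making the second strategy fast by throwing wide, parallel hardware at it; the toy version just shows why it costs so much more work per query.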
What I found, however, is that Snowflake is indeed super cheap if we look at Total Cost of Ownership (TCO). Compared with other cloud data warehouses, it is even easy to control costs (warehouse sizing with auto-suspend, plus resource monitors).
I work with many Snowflake customers, and the biggest cost they are concerned with is usually training users so they don't shoot themselves in the foot (wrong joins, external programs "pinging" the service, ...).
Snowflake is mainly expensive because of usage, not because of bad query optimization.
(Co-Founder at https://www.sled.so/)
It seems totally natural to expect these use cases to be well-supported and cost-efficient. The fact that they're not is, I think, likely to surprise a great many people, even technical folks.
1. I like Snowflake, and I think they brought several innovations to the field: instant scale out/up, time travel, unstructured data query support.

2. Snowflake obviously makes innovations and performance improvements, otherwise they would not be the market leader they are. But I also suspect that they make just enough performance improvements to stay on par, and then use vendor lock-in features to make switching hard.
My argument is that their rate of performance innovation has gone down considerably, and Databricks, Firebolt, and open source alternatives just seem more attractive on a cost/performance basis. I agree that Snowflake is still the best data warehouse to start with if you have 100k, but not if you truly plan for a multi-year horizon and your usage expands.
- Redshift also brought a lot of innovation that allowed people to execute analytical queries 100x-1000x faster than any OLTP database out there. I used Redshift for four years, and they kept ignoring performance and features until Snowflake came out. All of a sudden, because of competitive pressure, they put more effort into the product to maintain and gain market share. My hope is that Snowflake finds a solution to their innovator's dilemma, since competitors are hot on their heels.
- Some people point out that 70% usage growth just shows that Snowflake is useful. Nobody disagrees with that. The issue is that the majority of companies don't experience 70% revenue growth to keep up with the growth in costs. At some point, you have to clamp down on costs, which means you have to look for alternatives to run things more efficiently.
Re: Firebolt, I don't consider it to be in the same class as Snowflake whatsoever (even though their advertising seems to indicate otherwise). Snowflake is like a very powerful Swiss Army knife. Firebolt is good for a very specific (dare I say niche?) workload but falls all over itself for the vast majority of a data org's needs.
It runs SQL queries on structured data. Is that niche?
I think you are misunderstanding something very fundamental here. Snowflake has usage pricing, and no one is forcing companies to use Snowflake 70% more every year. In my experience, companies are typically evaluating spend on other platforms and, after some testing, moving additional workloads over to displace cost elsewhere. Say your Snowflake bill was $100k, and you were unhappy with your security data lake provider and replaced a $1M bill there with $200k of Snowflake. Your Snowflake bill has now increased 200% to $300k, but you are still $800k ahead overall. In other words, your existing workload (the original $100k) didn't get more expensive.
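Spelled out, the arithmetic in that example looks like this (the dollar figures are the hypothetical ones from the comment above):

```python
# A growing Snowflake bill can still mean a shrinking overall bill
# when it displaces spend elsewhere.

snowflake_before = 100_000   # original Snowflake bill
other_vendor = 1_000_000     # security data lake bill being replaced
snowflake_added = 200_000    # cost of that workload once moved to Snowflake

snowflake_after = snowflake_before + snowflake_added              # 300_000
growth = (snowflake_after - snowflake_before) / snowflake_before  # 2.0, i.e. +200%
net_savings = other_vendor - snowflake_added                      # 800_000 ahead
```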
I've worked in data warehousing for a lot of years now, and stepping back, I guess I don't understand what you are trying to accomplish here. I certainly think everyone should take a "trust but verify" approach with their vendors, but honestly, I don't think you've proven your case, especially since you appear to completely ignore the competitive reality these vendors live in. Beyond that, I don't think "speeds and feeds" are the most important improvements going on with these platforms at the moment. Check the monthly release notes:
BigQuery: https://cloud.google.com/bigquery/docs/release-notes
Databricks: https://docs.databricks.com/release-notes/product/index.html
Snowflake: https://docs.snowflake.com/en/release-notes.html
Performance is important but it doesn't exist in a vacuum. What percentage of features in the past two months for each of these platforms relate to performance? On the flip side, how much does your company spend on things like data governance? How much would a data breach cost? How many people maintain the platform? What do pipeline failures cost? How is connectivity to other solutions your company uses?
If you look at where innovation is happening (and this is a VERY interesting space these days), the bulk of improvements are in areas arguably more important to companies. BigQuery has added migration improvements, Databricks has added Photon and Unity Catalog improvements, Snowflake has added Java and Python stored procedures. The list is miles long for all of these vendors and I challenge anyone in the space to keep up with everything.
Another comment here said all of these vendors are within 10-20% performance of each other. If that is true, in my opinion you're focused on a problem that is an edge case at best. Something to watch, but not nearly as interesting or as impactful as the rapid pace of innovation across this space in all areas. IMHO.
Fair point, some of that net revenue increase is due to consolidation of workloads, although the majority of the cost is likely still driven by consumers expanding usage beyond what they expected. As I mention in my article, the second part of the increase in costs has to do with data governance, and my argument is that Snowflake doesn't make governance easy. Why can't they stand up an IAM-like service with a nice UI and dashboards? Why can't they make integrations with PagerDuty, Slack, and email work out of the box? Why can't I specify team-based budgets, instead of having to do it on a per-warehouse basis? Why do I have to build custom bespoke tooling on top to make governance work?
I can unequivocally say that at a certain scale you need to move on and that Snowflake and many of the SaaS providers are too expensive even at medium scale companies. This article describes this paradox better than I could: https://a16z.com/2021/05/27/cost-of-cloud-paradox-market-cap...
Moreover, Snowflake's enterprise pricing model scales even worse. Why do companies often have to pay twice the price per credit relative to the standard model? Shouldn't guarantees on security or support come at a fixed cost? Shouldn't enterprise plans offer economies of scale in pricing?
I also wish folks would read my article from end to end, because my conclusion is that you don't really have a choice but to use an enterprise solution when your scale is small. If I had to start my own company with only 2 data engineers, you betcha I would use Snowflake and Databricks.
--- Btw, it really surprises me that nobody has commented on the workload manager. Am I the only one seeing that as an issue? I have enough exposure to compare it with Redshift, and I can say that Snowflake's workload manager is just very bad at optimizing throughput.
For the exact reason that the article claims Snowflake wouldn’t innovate, I’d assert that they would. If they are expensive and slow, and a competitor is faster and cheaper, eventually they will see business move to the competitor. We see it all the time.
Alternatively there is a faster impact on new sign-ups when falling behind competitors on costs and benchmarks.
And large customers are moving to them in droves.
Snowflake lets you roll into pay-as-you-go after a contract expires.
I don't know the market at all, but Snowflake is certainly large and successful (IPOed in 2020, $50bn market cap). I could readily imagine that a company doing so well might not feel the incentive to improve very strongly. Or that they might see themselves more as a sales/marketing-led company than one where technical quality is a key driver. Whereas you folks as a challenger would have a lot more incentive to differentiate yourselves.
Snowflake is not expensive because of perverse incentives, which is the primary claim of the article. It is expensive because it is a highly differentiated and very sticky product.
As others have mentioned, competition is the ultimate incentive to work on performance. Every dollar of Snowflake revenue is a dollar of revenue that Amazon, Google, Microsoft and Databricks are fighting for.
This is true, but misses one detail...
Snowflake runs in the cloud so every dollar of Snowflake revenue is roughly $0.40^1 of Amazon/Google/Microsoft revenue anyway.
^1: Snowflake's gross margin is in the range of 50-60%: https://www.macrotrends.net/stocks/charts/SNOW/snowflake/gro...
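Back-of-the-envelope, the $0.40 figure follows from that margin range: cost of revenue, which for Snowflake is largely cloud infrastructure spend, is what's left after gross margin.

```python
# With a 50-60% gross margin, 40-50 cents of each revenue dollar
# goes to cost of revenue (mostly cloud infrastructure spend).
def infra_share(gross_margin):
    return round(1.0 - gross_margin, 2)

low, high = infra_share(0.60), infra_share(0.50)  # 0.4 to 0.5 per revenue dollar
```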
It eats/consolidates formerly-disparate costs around the org. Because it's so good.
Which makes it look expensive.
That said, I'm not sure your comment is fully accurate:

1) "lack of query level attribution of costs": Snowflake doesn't charge per query, so there can't be default query-level attribution of cost; Snowflake charges by the second of warehouse use. But you CAN easily see which queries ran on which warehouse and allocate costs back using your own criteria (by query-seconds, usually better than by number of queries).

2) "no in-built features for monitoring": Snowflake has built-in cost monitoring dashboards (https://docs.snowflake.com/en/user-guide/cost-overview.html) and resource monitors (https://docs.snowflake.com/en/user-guide/resource-monitors.h...).
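A minimal sketch of the chargeback approach in point 1, assuming you've already pulled per-query execution times from Snowflake's query history (the field names here are illustrative, not the actual schema): take a warehouse's cost over some window and allocate it to users in proportion to query-seconds.

```python
# Sketch: allocate a warehouse's cost to users in proportion to
# the execution seconds their queries consumed on it.
from collections import defaultdict

def allocate_costs(warehouse_cost, queries):
    """queries: [{"user": ..., "seconds": ...}] run on one warehouse."""
    total = sum(q["seconds"] for q in queries)
    costs = defaultdict(float)
    for q in queries:
        costs[q["user"]] += warehouse_cost * q["seconds"] / total
    return dict(costs)

queries = [
    {"user": "etl",      "seconds": 900},
    {"user": "analyst1", "seconds": 60},
    {"user": "analyst1", "seconds": 40},
]
print(allocate_costs(100.0, queries))
# etl gets 90.0, analyst1 gets 10.0 of a $100 warehouse bill
```

This is a rough model (it ignores idle warehouse time and concurrency), but it's the shape of what most teams build on top of the billing data.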
That said, I'm sure improvements could be made. Ask for them. There must be a market for this because Capital One and Acceldata and others offer similar solutions for optimization recommendations.
Snowflake/Databricks scale infinitely across cloud object stores like S3. ClickHouse runs as a single (or sharded) process that uses the local file system like any other SQL database and requires volume provisioning as your data scales. It also has a fixed running cost (EC2 or wherever it's hosted), versus an "on-demand" model where read clusters are spun up to run queries against static objects that have no fixed cost other than storage pricing.
However, it's probably not a great pick if you're already struggling with the operations side of things, which seems to be the main selling point for services like Snowflake.
I don't think there's really a right or wrong answer here, just trade-offs.
Disclaimer: I work on Altinity.Cloud, a platform for managed ClickHouse
0 - https://clickhouse.com/docs/en/sql-reference/functions/date-...
Me: My builds are really slow
CircleCI: Here are a few very low effort answers
Me: git checkout is taking literally 60 seconds, but it takes 3 seconds locally, why?
CircleCI: Mumble Mumble.
They charge per minute, so why would they care if builds are slow? It was about a year of this getting worse and worse, until I finally cancelled the service last week and built my own server in my basement.
I now get 200% faster builds, and the hardware payback time is not very long (6 months of my CircleCI bill?).
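For what it's worth, the payback math is simple; the figures below are made-up stand-ins, not the commenter's actual bills:

```python
# Hypothetical payback calculation for a self-hosted build server.
monthly_ci_bill = 500.0   # assumed CircleCI spend per month (made up)
hardware_cost = 3_000.0   # assumed cost of the basement server (made up)

payback_months = hardware_cost / monthly_ci_bill
print(payback_months)  # 6.0 months under these assumptions
```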
I think it's a huge red flag anytime the metric you care about is one where getting "worse" makes the provider more money.
Always try to find partners or counter parties who win when you do as well. I know we don’t always have that luxury but sometimes a little headache initially is better than being stuck with someone who works in opposition to you in the long run.
Thanks so much for sharing your story. We are in the process of outsourcing some of our Jenkins functionality and these stories are useful to hear.
Rule of thumb: Anyone talking about their honesty is not honest.
https://github.com/philips-labs/terraform-aws-github-runner
philips-labs has some good resources for scaling this up as well.
Not to mention the constant failures.
It's worse than just not caring: they have a direct financial incentive to make sure your builds are as slow as you'll tolerate.
I ended up doing TeamCity over Jenkins, but they do the same thing.
Amazing how fast a 32C/64T EPYC server in my basement can be.
It's usage-based pricing and customers are using more of it.
> a customer that joins a year ago and spends $1 is paying out well over $1.7 a year later
The entire article is based on this 1.7x "net dollar expansion" statement.
After integrating Snowflake, customers have found value in using Snowflake and are using more of it 1 year later.
Since Snowflake is billed on usage, that explains the net-dollar expansion.
It’s also very simple to manage and optimise so less DBA or DevOps type manpower.
Then of course you can perfectly right size your instances and pay by the second for compute and by the byte for storage.
Expensive, but lower TCO than alternate approaches I suspect.
Like any other service there are scale points where it no longer makes sense but for most smaller orgs it's still a bargain over DIY
I think people are falling into a trap of not considering costs because “it takes care of everything”.
> Then of course you can perfectly right size your instances and pay by the second for compute and by the byte for storage.
These two are connected vessels.
I know a bit about the effort involved in chucking around 100 petabyte datasets, and there are numerous niches a SaaS could fill in there, but it’s very murky from the outside.
> The best way to describe Snowflake is that it is a brute force method to run complex queries without creating indexes.
I guess I’m trying to get a read on whether their core competency / moat is distributed columnar query technology or sales/support/marketing.
However, the generally accepted wisdom there was that improving performance had always led to more builds being run, and so still came out as a net positive. This had happened a bunch of times as we upgraded CPUs, storage drivers, or the software version: there'd be a short-term drop in direct revenue, but then it would bounce back quickly as people took advantage of being able to do more in the same amount of time.
I'm told the revenue and finance people were pretty concerned the first time it happened though!
Most dev teams are underinvested in CI. That is, if you queried some random team, they'd probably have a dozen ideas for tests or processes they'd like to write/run if they had the resources, most of which would provide some real value - the ideas likely coming from some previous actual bugs that hit prod.
Most BI teams are overinvested in data. They have way more than is valuable. Large scale analysis is mostly exploratory and speculative, and rarely yields results. Any induced usage is more from fear they might throw away the magic bits than real value being unlocked by better efficiency. (And I think this is probably necessarily true. Any BI process that gets to the point the data is clear and regularly actionable also gets operationalized and right-sized through a more normal dev process.)
I think Snowflake is (still) expensive because it is a venture-backed enterprise software company and goes through a typical trajectory...
Story goes like this: founders are product-driven and first movers -> find PMF -> need VC funding -> VCs only fund enterprise software ventures with 70%+ gross margins and high retention rates -> product/service gets priced to achieve these metrics -> VCs happy to fund sales & marketing machine needed to obtain sales growth, nobody cares about profitability until after IPO -> startup is everyone’s darling until ~2 years after IPO.
Then: economic crisis hits, customers become more price sensitive, competition intensifies. Plus now management is exposed to quarterly pressure of financial markets to deliver on top-line and margin expectations.
Meanwhile a bunch of startups are building (lower priced) alternatives. Perhaps not as mature or feature-rich as Snowflake, but good enough for 80% of use cases that Snowflake covers.
Therefore the assertion that Snowflake is not optimizing their product sounds a bit crazy to me. It would be optimizing for short-term gain while jeopardizing its reputation as the leader in the space. Obtaining excessive margins through excessive pricing only works under monopolistic conditions, or with a truly distinctive product. Neither is the case, imo. Also, it's early days. I'm not exactly sure what Snowflake's market share is, but I bet it is < 5%, so they haven't locked everyone in yet...
I bet that Snowflake will be forced to compete "also on price" in the next five years because free enterprise is a powerful thing. The title of the article could be “Why Snowflake is (still) expensive but will get more affordable over the next few years”..
This is not true. Snowflake has done just that - it has continuously improved performance resulting in reduced credit consumption and revenue from customers on a unit compute/storage basis. And it has negatively impacted their revenues and stock price. Snowflake's incentive is to strengthen their competitive position and to hopefully generate more long-term revenue from their customers.
The CFO forecast a $97 million shortfall when guiding for 2022 revenue, resulting from product improvements. Snowflake stock dropped immediately after.
See Q4 transcript -- https://www.fool.com/earnings/call-transcripts/2022/03/02/sn...
"Similarly, phased throughout this year, we are rolling out platform improvements within our cloud deployments. No two customers are the same, but our initial testing has shown performance improvements ranging on average from 10% to 20%. We have assumed an approximately $97 million revenue impact in our full-year forecast, but there is still uncertainty around the full impact these improvements can have. While these efforts negatively impact our revenue in the near term, over time, they lead customers to deploy more workloads to Snowflake due to the improved economics."
Also see the Bloomberg article -- https://www.bloomberg.com/news/articles/2022-03-02/snowflake....
"Snowflake Inc., a software company that helps businesses organize data in the cloud, dropped the most ever in a single day Thursday after projecting that annual product sales growth would slow from its previous triple-digit-percentage pace.
Executives said improvements to the company’s data storage and analysis products will let customers get the same results by spending less, which will hurt revenue in the short term, but attract more clients in the future.
“The full-year impact of that next year is quite significant,” Chief Executive Officer Frank Slootman said on a conference call Wednesday after the results were released. But “when customers see their performance per credit get cheaper, they realize they can do other things cheaper in Snowflake and they move more data into us to run more queries.”"
FWIW, Keebo (https://keebo.ai/) tries to solve this problem & reduce your Snowflake bill by using Data Learning techniques. It can be configured to return exact results or approximate results.
I don't see AWS changing so dramatically that companies like Databricks are put in hot water (but I could be wrong), but I could see Snowflake improving its product due to competition, putting Keebo in a tough situation.
The bit about Snowflake not being incentivized to care about costs is trivially untrue. The rest of the article perceives trade-offs as simple feature gaps.
For example, Snowflake gives the user more latitude to distribute workloads among “warehouses” than other offerings. With poor distribution the author will experience the workload provisioning issues he describes.
Ops. Unless your core competency is running reports and Spark nodes, it's probably cheaper to outsource the management of Spark and friends than to hire people to keep it always up and running. To be fair, I haven't touched Spark in many years, but having to page someone good enough at Spark to debug why a job stopped at 3am isn't fun.
I think as an end user I would absolutely agree on this point. But many companies use Databricks as part of their automated backend systems that they resell to customers. The cost per "DBU" unit is astronomical for the amount of raw compute in use. It feels a bit like running a restaurant where you serve takeout.
What ops am I missing?
It's a tradeoff. It might cost less dollars but more time. The time and expertise to run their own clusters effectively is not something every org can or desires to do.
Would love to know the TCO trade-off between procuring, securing and deploying on your own clusters vs having them managed via SaaS.
What it can do, successfully, with three engineers was previously impossible with dozens.
What IS expensive is not being careful with it.
The trend in the data space currently is for usage to increase -- as more companies adopt dbt, they're running more and more prebuilt queries (materialized views) on a scheduled basis, rather than on demand. This is overall a good thing in that data is becoming easier to manage and use, but it does come with an increase in warehousing costs.
I think eventually the pendulum will swing back to tools that help optimize warehouse usage, as long as they allow for the same increase in productivity as dbt (disclosure - I work for one such company)
I’m also not sure I understand the dig at streamlit dashboards. If you’re running hardware and introduce new read workflows, eventually you’ll need more read replicas and you’ll pay more for it. Maybe you can argue that snowflake is doing this at a higher cost but the metric data is not available in the sources to make that claim.
EDIT: There it is: https://www.snowflake.com/
Data warehousing, basically.
I'm genuinely curious and would appreciate anyone who could show a real life example of this kind of pipeline where data is accumulated, then processed, then turned into revenue at the other end.
I've implemented systems that do this but my experience is that accumulating data is (too) easy, processing it in a meaningful way is slightly more challenging but ultimately driving positive business processes according to this data, which require a lot of friction with employees (training, procedures, maintenance, support) is the most difficult part.
Every business in every market needs to understand what is going on with its processes. How many sales did I do yesterday, last week, last month compared to last year, and in which stores? What is the average basket amount? What do customers buy together? What size t-shirt do I sell the most? Etc.
That being said, Snowflake is also pushing a marketplace model where you publish your app natively, moving your code to where the customer's environment is. If that becomes successful, performance might not be one of the incentives for companies to go with Snowflake, and the switching cost might be higher, as companies will embed more of their business logic in the system.
Vantage just launched this - https://www.vantage.sh/blog/vantage-launches-snowflake-suppo.... The problems the author describes are almost exactly what we heard from customers:
- list of users/queries that are the most expensive
- alerts and notifications for costs
- query timeout. Not something a third party can do but there is an interesting 'query tagging' feature for snowflake which Vantage supports.
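For reference, Snowflake's query tagging is just a session parameter, so any client (dbt can set it from its project config) can stamp its queries; a minimal sketch, with an arbitrary tag string:

```sql
-- Tag every query issued in this session; the tag is free-form text that
-- later shows up in the QUERY_TAG column of query history
ALTER SESSION SET QUERY_TAG = 'team:analytics;job:nightly_build';
```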
Let's consider Snowflake in this paradigm
- Problems: analytics on data that is not laid out in a way that's directly accessible for analysts.
- Resources: SQL analysts, few or no competent data engineers, spare cash
- Outcomes: run analytics at an industrial scale without requiring competent engineers or DevOps.
Since Snowflake's optimal client gets very easily locked in, it follows that saving said client's money is not something even the client would care about.
The issue with dbt models in Snowflake is that if you ever perform a full refresh and don't sort the output, you ruin any natural clustering that arises from an incremental model. I've run into this issue many times. Auto-clustering gets too expensive at scale, and Snowflake doesn't give you much guidance on alternatives.
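One mitigation is to sort explicitly in the model itself, so even a `--full-refresh` rebuild lands in roughly the same micropartition layout. A sketch of a hypothetical incremental model (the `event_id`/`event_ts` columns and source names are made up for illustration):

```sql
{{ config(
    materialized='incremental',
    unique_key='event_id'
) }}

select *
from {{ source('raw', 'events') }}
{% if is_incremental() %}
  -- only pull rows newer than what's already in the table
  where event_ts > (select max(event_ts) from {{ this }})
{% endif %}
-- keep rows physically ordered on the pruning column; without this,
-- a full refresh scatters rows across micropartitions
order by event_ts
```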
Small nit: Redshift isn't open source. I would also add Clickhouse, Citus, and TimescaleDB as majorly capable open source technologies with commercial offerings in this space.
If they improve performance they can lower the cost to customers, which will make the product more attractive to prospective customers. But if they are already swimming in cash they may not feel the need to gain more customers.
Only threats prompt companies to improve things. Threat of a competitor, threat of losing all their money, threat of bad PR, threat of regulation, threat to the stock price, etc.
I see this every day in companies that don't care about managing their cloud costs. They waste money like crazy because they literally don't care if they lose money, because some exec doesn't care, or they got enough funding until the next round, etc. A couple years later another exec asks why the CISO/CTO is spending so much money without any ROI, and then everybody has to stop everything they're doing to shave pennies off cloud costs.
Companies run by individual executives are insane. I don't understand why people allow companies to be run this way. I think a co-op where employees could be active participants in the running of the company would allow for more sane decision-making.
Certainly Snowflake wants to make it easier for people to spend money and solve all their problems on its platform; every company wants that. But it's a very competitive world out there, and Snowflake's leaders aren't complete idiots -- they have to keep lowering their prices when they can, otherwise new entrants will come along and do things cheaper.
Nonetheless I agree with the basic points of the article.
Another example of misaligned incentives is LinkedIn. LinkedIn charges $3/message. The more messages sent on their platform, the more money they make. They are not incentivized to help sales or recruiters target the right people. It can be a cash cow in the short term, but it creates a negative experience for your users.
The fact that it has worked for so long is a testament to how strong network effects are.
In the case of Snowflake, high switching costs will protect them for a while.
I work for AWS in billing, and the way we calculate bills is to try to get the customer the maximum discount.
Things like calculating savings plan coverage from smallest to largest to maximize utilization, or turning on Reserved Instance sharing by default within an org.
I would say that the seemingly gouging behavior is more often than not technical or time constraints.
I’d be very interested to hear the Snowflake side of this decision, but to the customer it’s simply unforgivable to have cosmetic constraints on a database.
And now "XxxOps" is a meaningless buzzword.
>"Snowflake has no incentive to push a code change that makes things 20% faster because that can correspond to 10–20% drop in short-term revenue" Completely untrue. There is constant optimization of the scheduler, execution process, global services, and compute fabric. The famous "we shipped AWS Graviton and it's like 10% cheaper" was something we did to ourselves. There is work underway to make FoundationDB faster/more efficient too that's totally out of this world. In short, nobody wants to burn extra CPU cycles and bill you for it.
>"Disclose Hardware Specs" This isn't hard to find if you work with Snowflake's SE and Services, but it's not going to give you anything. The whole POINT of Snowflake is to hide all this nonsense and make it "just work". You want CPU and SSD metrics, feel free to use Databricks (many do) or whatever.
Now, there IS something to be said about some sort of observability into query execution as it is going. There are constant discussions on that, and some of the new upcoming features (like programmatic access to query profiler) can open that up. But yeah, Snowflake is NOT something that will open up what's under the hood and it is super intentional
>"Not adopting benchmarks" This goes around and everyone freaks out. Just profile your own work. Whatever. Nobody cares about benchmarks.
>"Optimizer gremlins" Snowflake COULD do more to expose some of the internals. My job (and the job of 100s of my services and technical SE colleagues) is to help customers understand what's happening under the hood. Some of the company's "make it simple" ethos COULD be a bit more open. However, many of the common issues (micropartition pruning) can be solved by simple user education. I've lost count of how many customers I worked with who had zero education in Snowflake, and even a 20-30 minute intro made them open their eyes and go "woah, I get it now". On the other hand, dozens of people told me that it was amazingly easy to use without training, and it IS!
>"Improve the workload manager to increase throughput" Workload manager is considerably more complex and sophisticated than this guy tells us it is. I saw an internal presentation on its internals that I asked to convert to a confluence article which thankfully happened pretty quickly and lots of people benefitted. There is cost-based scheduling that takes expected resources of queries to schedule and also considers actual resources consumed, all very frequently and for every XP. I wish that article was public but I think it will not be made one, but still, it's definitely not FIFO.
>"Not providing observability to monitor and reduce costs" This is valid feedback now and constantly what we do in services. New manageability features are coming to help with this. See CapitalOne or bunches of companies in this ecosystem.
>"What could companies that use Snowflake do better?" I agree with the point about education. A huge portion of people using and abusing Snowflake don't have any formal education in it. The best thing you can do is hire Snowflake PS or get a partner/SI, or just take a damn class; they are REALLY good.
Source: 2 years in services at Snowflake with focus on perf, cost, and manageability.
I spent a number of months last year focused on lowering Snowflake spend. In the process I learned a ton about Snowflake and gained a fair amount of respect for the product. Respect as in "this is really great" as well as respect as in "I need to be on guard here or I'm going to get hurt."
I think my biggest misconception at the outset was thinking of Snowflake like it's a relational database. It's not. Or rather, it is with a large number of caveats. Snowflake doesn't have b-tree indexes -- rather it has "clustering keys," which are sort of like coarse-grained indexes that colocate data in micropartitions, allowing queries to do micropartition pruning. If you have a well clustered table and you're filtering on your clustering keys, things will be great. But if not, or, for example, you have to do multi-table joins on non-clustered columns, you'll suffer. So unless you have search optimization enabled (which costs more!), you have to retrain yourself away from the "oh, just add an index here or there to make things fast" type of thinking you may have had working with Postgres or whatnot.
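Concretely, a clustering key is declared on the table, and Snowflake exposes a function to check how well clustered the table actually is (table and column names here are illustrative):

```sql
-- Colocate rows with similar key values into the same micropartitions
ALTER TABLE events CLUSTER BY (event_date, customer_id);

-- Returns a JSON summary of clustering depth/overlap for those columns
SELECT SYSTEM$CLUSTERING_INFORMATION('events', '(event_date, customer_id)');
```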
Regarding the author's complaints about lack of observability, I generally found it pretty easy to analyze what was going on via the query_history table. And the built in query analyzer is quite helpful. We did add tags to our dbt runs, which was pretty easy, and I wrote a handful of queries to find like the most expensive dbt models. It wasn't really that hard.
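That kind of analysis boils down to a single query against the account usage views; a rough sketch (`execution_time` is in milliseconds, and grouping by tag assumes you've tagged your dbt runs):

```sql
-- Rank last week's workloads by total execution time, a rough cost proxy
select query_tag,
       warehouse_name,
       count(*)                  as runs,
       sum(execution_time) / 1e3 as exec_seconds
from snowflake.account_usage.query_history
where start_time >= dateadd('day', -7, current_timestamp())
group by query_tag, warehouse_name
order by exec_seconds desc
limit 20;
```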
That said, dbt in particular provides a number of foot guns wrt Snowflake. Subqueries, as the author mentions, are one. We created some custom dbt macros so that instead of `select * from foo where x in (select * from blah)` -- if blah was small -- we'd do a query on blah and write the query using a literal list, like `select * from foo where x in ('a', 'b', 'c', 'etc...')`.
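A macro along those lines can be sketched with dbt's `run_query`; this is a hypothetical reconstruction of the idea, not the commenter's actual macro, and it only makes sense when the inner relation is known to be small and string-valued:

```sql
{% macro inline_small_list(relation, column) %}
  {# Replace `x in (select ...)` with a literal list by running the inner
     query at compile time. Only safe for small, stable relations. #}
  {% if execute %}
    {% set rows = run_query("select distinct " ~ column ~ " from " ~ relation) %}
    ('{{ rows.columns[0].values() | join("', '") }}')
  {% endif %}
{% endmacro %}

-- usage in a model:
-- select * from foo where x in {{ inline_small_list(ref('blah'), 'x') }}
```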
Another issue we discovered is that in dbt it's trivial to create views. But we found that if views get too deeply nested, Snowflake can't adequately do predicate pushdown. So big stacks of views on views are suboptimal.
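The usual fix is to materialize the intermediate layer so downstream queries scan precomputed micropartitions instead of expanding a stack of view definitions; in dbt that's a one-line config change (model and column names below are made up):

```sql
-- was: materialized='view'; flattening the view stack here lets Snowflake
-- prune micropartitions instead of inlining nested view SQL
{{ config(materialized='table') }}

select customer_id,
       sum(amount) as total_spend
from {{ ref('stg_orders') }}
group by customer_id
```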
Another interesting one was tests. Dbt makes it trivial to perform null or uniqueness checks against a column. We found we were spending a lot on those tests that simply were doing something like `select * from blah where col is null`. On non-cluster key columns or complex views, these were causing full table scans. We took a number of steps to mitigate those issues. (Combining queries; changing where we did these checks in the dag). The way tests are scheduled is problematic as well. One "long pole" test will keep your warehouse up and using credits even after the other 99.9% of the tests have completed. After some analysis we separated long pole tests from the others and put them on different warehouses.
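Combining checks like that means one table scan instead of one per test; a sketch of the idea using Snowflake's `count_if` (table and column names are illustrative, not the commenter's actual schema):

```sql
-- One pass over the table covers several dbt-style tests at once
select count_if(order_id is null)          as null_order_ids,
       count_if(customer_id is null)       as null_customer_ids,
       count(*) - count(distinct order_id) as dup_order_ids
from analytics.orders;
```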
I could go on and on, actually, but I think that provides a taste of some of the complexities involved. Like almost any tool, you have to really understand it to use it effectively. But it's all too easy for, say, analysts, who may be blissfully unaware of the issues above, to write really poorly performing SQL on Snowflake.