Comparing Fauna and DynamoDB: Architecture and Pricing

(fauna.com)

53 points · evanweaver · 5y ago · 34 comments

yazaddaruvala · 5y ago · 5 in thread

I'm sure Fauna is a great database and probably cheaper in many cases. I just have some issues with the "Complex Example". I just don't feel it is realistic that anyone familiar with DynamoDB would create such a schema. It comes across like a good schema for Fauna is being forced onto DynamoDB, without an evaluation of what would be the recommended "DynamoDB way" of solving the customer's needs.

> We have an accounts table with 20 secondary indexes defined for all the possible sort fields (DynamoDB’s maximum—Fauna has no limit).

Having 20 secondary indexes in DDB is an extremely rare use case. Arguably it should be considered an anti-pattern, only used for an application transitioning query patterns in some way. If this is the norm for an application, I'd argue the product managers/developers do not understand their customer's needs well enough. I'd assume that at this stage in the product's life, a basic Postgres installation is likely a better choice.

Additionally, if the query pattern really needs to be "super flexible" for the long term, you'll find that eventually you'll need more and more of ElasticSearch's tech (or similar tech). A very common pattern is to use a DDB Streams to ElasticSearch connector (obviously sacrificing query-after-write consistency).
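A minimal sketch of the DDB Streams to search index pattern mentioned above, not taken from the commenter. It assumes the stream is configured to include new item images, uses a plain dict as a stand-in for the search index, and only decodes string (`S`) and number (`N`) attributes. The function names `plain` and `apply_stream_records` are hypothetical.

```python
def plain(image):
    """Flatten DynamoDB's typed attribute map; only S and N are handled here."""
    out = {}
    for name, typed in image.items():
        if "S" in typed:
            out[name] = typed["S"]
        elif "N" in typed:
            out[name] = float(typed["N"])
    return out


def apply_stream_records(event, search_index):
    """Apply a batch of stream records to a dict standing in for the search index."""
    for record in event["Records"]:
        keys = plain(record["dynamodb"]["Keys"])
        # Build a stable document id from the item's key attributes.
        doc_id = "|".join(str(keys[k]) for k in sorted(keys))
        if record["eventName"] == "REMOVE":
            search_index.pop(doc_id, None)
        else:  # INSERT or MODIFY: upsert the latest image
            search_index[doc_id] = plain(record["dynamodb"]["NewImage"])
    return search_index
```

Because the index is updated only after the stream delivers the record, a query issued right after a write can miss it, which is the consistency trade-off noted above.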

> Viewing just the default account screen queries 7 indexes and 25 documents. A typical activity update transactionally updates 3 documents at a time with 10 dependency checks and modifies all 35 indexes.

This is such a red flag. If your application requires this from DDB, you should change your schema (probably more de-normalization). However, the example doesn't have enough information for me to suggest a better schema to meet the customer's needs.

Disclaimer: I work at Amazon, but not in AWS. My opinions are my own.

evanweaver (OP) · 5y ago

We are in agreement, the difference in experience as you move between denormalized key/value style modeling and normalized relational modeling is the core of the post. DynamoDB has added relational-like features, but using them in a traditional relational way goes against its architectural grain.

Is it necessary that data modeling flexibility must decline as an application matures and scales, though? This was one of the larger millstones around our neck at Twitter and what we are building Fauna to avoid.

yazaddaruvala · 5y ago

> Is it necessary that data modeling flexibility must decline as an application matures and scales, though?

Yeah, this is a valuable question, and I agree it's not an obvious answer. The tricky part is that at enough scale, humans are really the bottleneck. Teams step on each other's toes, abuse schemas, add data to a schema "because", etc. It's possible that, to best manage the humans, de-normalization and relying on replicated data stores with slightly different views of the data is simplest.

And again, if "overly flexible" is a long-term product requirement, I'd argue you're going to eventually need full text search with all the power of Lucene (I'm betting it's on your roadmap).

If Fauna perfectly addresses this problem domain, it's likely quite helpful, but this article did not convince me it'll always be cheaper / better than DynamoDB + ElasticSearch for the complex use cases. That said, I look forward to the day I'm proved wrong :)

arpinum · 5y ago

> I'd argue the product managers/developers do not understand their customer's needs well enough

The limited sorting options in AWS services seem to be optimising AWS’ costs rather than understanding customer needs. I’m often frustrated by the experience when I think I can click on a column and can’t. DynamoDB doesn’t handle the use case of diverse user groups exploring data and slicing through it. That’s ok, every database has its strengths. But don’t dismiss the idea that unconstrained access patterns can be the solution to a customer need.

pier25 · 5y ago

> I'd argue the product managers/developers do not understand their customer's needs well enough. I'd assume that at this stage in the product's life, a basic Postgres installation is likely a better choice.

You mean compared to DynamoDB?

Because Fauna is just as flexible as Postgres.

yazaddaruvala · 5y ago

Yeah, sorry. I was just comparing to DynamoDB, I have no comparison between Fauna and Postgres.

sargun · 5y ago · 4 in thread

One of the really cool DynamoDB features I love (at least in theory) is CDC / Streams. Also the fact it automagically hooks up to Kinesis is neat. Unfortunately, for personal projects, this can lead to runaway spending.

Does Fauna have a strongly ordered CDC stream?

databrecht · 5y ago

If I'm not mistaken, Dynamo's streaming is pull-based under the hood. I suppose how quickly the price goes up depends on how frequently it pulls and how expensive such a pull is in reads.

In Fauna we have temporality as a first-class citizen. You can get efficient and cheap changesets by leveraging temporality since you can just ask: "what has been added/removed in this collection or index match after timestamp X" and can combine that by writing an index that delivers you the answer to "what are the updated documents in a collection after a certain timestamp?". That brings you very cheap pull-based CDC.
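The pull-based idea described above ("what changed after timestamp X") can be sketched generically. This is an in-memory illustration, not Fauna's API: `ChangeLog`, `changes_after`, and `poll` are hypothetical names, and timestamps are assumed to be strictly increasing.

```python
from bisect import bisect_right


class ChangeLog:
    """In-memory stand-in for a timestamped change history (hypothetical)."""

    def __init__(self):
        self._timestamps = []  # stays sorted because appends are monotonic
        self._events = []      # (ts, action, doc_id), same order as _timestamps

    def record(self, ts, action, doc_id):
        if self._timestamps and ts <= self._timestamps[-1]:
            raise ValueError("timestamps must be strictly increasing")
        self._timestamps.append(ts)
        self._events.append((ts, action, doc_id))

    def changes_after(self, ts):
        """Return every event with timestamp > ts, in timestamp order."""
        start = bisect_right(self._timestamps, ts)
        return self._events[start:]


def poll(log, checkpoint):
    """One pull: fetch the changeset and advance the consumer's checkpoint."""
    batch = log.changes_after(checkpoint)
    new_checkpoint = batch[-1][0] if batch else checkpoint
    return batch, new_checkpoint
```

The consumer only has to persist its last checkpoint; each pull costs one range lookup regardless of how long ago the previous pull happened.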

We recently introduced a second possibility that allows for push-based streaming for documents. Document-based streaming allows you to open separate streams for multiple existing documents to get updates on those. This is only the first phase; sets (such as index matches or whole collections) are coming up. Streaming becomes cheaper if you want your data to be really live (<1s), which is excellent for UI redraws but could potentially also be used for CDC (probably in combination with the temporal features if you need to restart streams). Both push-based and pull-based are strongly ordered.

I describe how to get the query for the pull-based approach here: https://forums.fauna.com/t/example-custom-subscription-funct... And a blog on the streaming API can be found here: https://fauna.com/blog/live-ui-updates-with-faunas-real-time...

Once set streaming (next to document streaming) is out as well, we have two strong solutions and you can choose what suits you best and what is most efficient for your use case. Do you want instant UI redraws? Use push-based. Are you pulling a changeset every hour? Use pull-based.

pier25 · 5y ago

Yeah they recently announced strongly consistent streaming.

https://fauna.com/blog/live-ui-updates-with-faunas-real-time...

jcims · 5y ago

I think the pricing of DynamoDB is the killer for personal projects. I ran it as a bit of a persistent cache one night and ran up $60 in charges.

ledauphin · 5y ago

this is pretty weird for me to hear, as a developer on a project with 3000+ monthly average users where DynamoDB is my only database and the costs are in the hundreds per month.

I'm sure that any database can be "too expensive" if your access patterns are out-of-this-world intense, but it's very difficult to compare apples to apples when we have no idea how beefy the PostgreSQL instance to support your cache would have been.

rafaelturk · 5y ago · 2 in thread

Amazingly Fauna pricing is even more confusing than DynamoDB's

northstar702 · 5y ago

What do you find confusing? Would be interested in your feedback.

rafaelturk · 5y ago

First is the overall pricing strategy: Plans..

Why does it have to be plans? Neither AWS, nor GCP, nor Azure have plans for any of their products. I understand that plans make sense for SaaS, platforms and user-based products. But this is a DB where costs are mostly related to storage + CPU usage.

Second, the plans have fancy names, but they are actually just $-based volume commitments.

So the $25 plan gives you $25 worth of usage, the $150 plan gives you $150 worth of usage. This is not a plan, it is a monthly commitment. AWS Savings Plans are far smarter in this case: after you make a monthly commitment, you receive a discount for your loyalty.
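The "plan as monthly commitment" reading above can be written down as a short sketch. It assumes, without confirmation from the pricing page, that usage beyond the commitment is billed at the same unit rates; both function names are hypothetical.

```python
def committed_bill(usage_dollars: float, commitment_dollars: float) -> float:
    """Monthly bill under a commitment: you pay at least the commitment.

    Assumption: usage above the commitment is charged at the same unit rates.
    """
    return max(usage_dollars, commitment_dollars)


def pay_as_you_go_bill(usage_dollars: float) -> float:
    """Monthly bill with no commitment: you pay exactly what you used."""
    return usage_dollars
```

Under these assumptions the two models differ only when usage falls below the commitment, which is the case the comment objects to.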

Third: Some features are only unlocked after a certain plan. I'm interested in `premium regions`, but this is only unlocked at $500/month. So in order to test this I need to commit to the `business plan`. With AWS, Azure, and GCP you can pretty much use any feature at any scale, so you can test complex features with just a few users or even for just a few days, and you'll only pay for what you use.

Lastly: The real, clear pricing is already there! At the bottom of the page: Billing unit rates. This is a simple, clean table that I can easily reason about and use to evaluate costs for my business plan.

P.S. Sorry, but the headline `Pay only for what you need`? This is BS marketing: as stated above, I can't use premium regions in the individual plan, so it is technically incorrect.

Hope this helps.. Project looks really amazing..

mNovak · 5y ago · 1 in thread

In their simple example of a website hit-counter, can someone explain how you would aggregate batches of 50 requests to amortize compute costs? I thought the whole point of the DB is to store information between disparate requests?

pier25 · 5y ago

AFAIK you can't for that particular contrived example. The article probably mentions the batching of 50 queries just to give you an idea of pricing, not because it works for that example.

Still, even in the worst use case for Fauna I find that $7.50 for 2M queries with all the features it offers is still a good price (multi-region, ACID, realtime, FQL, authentication and authorization, etc).

ledgerdev · 5y ago · 1 in thread

The data consistency seems the most attractive point. I'm wondering where precisely Fauna clusters are located, so I could run my Lambda functions in the same location. What sort of latency do we see when connecting from various Azure/AWS datacenters? Are they in most AWS data centers?

databrecht · 5y ago

At the bottom of this page you can see the regions (and future regions) https://fauna.com/features. Since Fauna is inspired by Calvin we are not dependent on clocks like Spanner to deliver global consistency and can run on any hardware. Currently, each database is automatically distributed so your lambdas would read from the closest location. We are working on region selection in case you want to avoid the overhead (arguably small latency overhead thanks to the algorithm) of multi-region.

Latencies can be found here: https://status.fauna.com/ As a long-time fan of Fauna (who now works as a developer advocate for them after following them for 2 years), what was most attractive for me is the combination of features without compromises: scalability/distribution without losing consistency, relations and powerful indexing (i.e. the best of NoSQL and traditional databases combined). I was also attracted by the temporality aspect personally.

xdmr · 5y ago · 1 in thread

> Finally, let’s imagine we have something more like a typical SaaS application, for example, a CRM. We have an accounts table with 20 secondary indexes defined for all the possible sort fields

What makes you think you've imagined a good CRM schema here?

One problem with this article is that it doesn't have any code. You'd think it would, right? You're selling this thing to developers and architects. Why aren't we linking to a supplementary repo with the examples used for this application for both DynamoDB and Fauna?

One possibility is that the example DynamoDB design is a very bad one (I mean, you're actually using all 20 GSIs; what?), and that anyone familiar with DynamoDB would say "Actually, you can cover all the query patterns with 3 GSIs if you do it this way."
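The comment does not show such a design. One commonly described technique for serving several query patterns from a single index is GSI overloading, where different item types share generic index key attributes. The sketch below simulates that idea in memory; the attribute names (`GSI1PK`, `GSI1SK`), key formats, and sample items are illustrative assumptions, not the article's schema.

```python
from collections import defaultdict


def build_gsi(items, pk_attr, sk_attr):
    """Group items by index partition key, sorted by index sort key."""
    index = defaultdict(list)
    for item in items:
        # Sparse index: items missing either key attribute are not projected.
        if pk_attr in item and sk_attr in item:
            index[item[pk_attr]].append(item)
    for rows in index.values():
        rows.sort(key=lambda row: row[sk_attr])
    return index


def query(index, sk_attr, pk, sk_prefix=""):
    """Emulate a key-condition query: exact partition key, sort-key prefix."""
    return [row for row in index.get(pk, []) if row[sk_attr].startswith(sk_prefix)]


# Hypothetical CRM items sharing one overloaded index.
ITEMS = [
    {"id": "acct-1", "GSI1PK": "STATUS#active", "GSI1SK": "NAME#Acme"},
    {"id": "acct-2", "GSI1PK": "STATUS#active", "GSI1SK": "NAME#Zenith"},
    {"id": "acct-3", "GSI1PK": "STATUS#churned", "GSI1SK": "NAME#Bolt"},
    {"id": "contact-9", "GSI1PK": "ACCOUNT#acct-1", "GSI1SK": "CONTACT#Bob"},
]

GSI1 = build_gsi(ITEMS, "GSI1PK", "GSI1SK")
active_by_name = [row["id"] for row in query(GSI1, "GSI1SK", "STATUS#active")]
contacts_of_acct1 = [row["id"] for row in query(GSI1, "GSI1SK", "ACCOUNT#acct-1", "CONTACT#")]
```

Here one index answers both "active accounts sorted by name" and "contacts of an account"; whether a real CRM's patterns fit into three such indexes depends on access patterns the article does not list.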

Why do this? One possibility is that the ways that Fauna actually is better than DynamoDB are too subtle to get anyone's attention. They're real and useful, but not ridiculous. The people who actually use DynamoDB at massive scale might understand them, but also probably won't want to change up.

So you go after people who aren't using DynamoDB at massive scale. Say, early-stage startup founders who want to be on DynamoDB from day 1 because someday their product will be Web Scale. But don't have a lot of time to carefully evaluate claims like this. They just say "10x cost reduction? Wow, Fauna is the new best DB!" Most of these guys fail, but a few of them are a runaway success (and would have been equally so if they'd used DynamoDB), are now stuck with Fauna whether they like it or not (but let's assume they like it at least as well as DynamoDB, maybe even slightly more), and are now listed as large scale users of Fauna on their website. You too could be a unicorn startup! Start using Fauna today!

Basically, I think the makers of Fauna are trying to con you with this article. It's not that their product is bad, it's that they're trying to get you to buy it for reasons other than that it's good.

databrecht · 5y ago

> Why do this? One possibility is that the ways that Fauna actually is better than DynamoDB are too subtle to get anyone's attention. They're real and useful, but not ridiculous. The people who actually use DynamoDB at massive scale might understand them, but also probably won't want to change up.

I respectfully disagree :). I don't think that the combination of relations, strong consistency, flexible/powerful indexing, a language that allows you to do complex conditional transactions or reads in one query are subtle differentiators. Especially when you can maintain all those things while being multi-region and scalable (and you also get a flexible security system and get to query back-in-time and/or query/alter history and/or get changesets cheaply). Of course, this post didn't go in depth on all of these since that's not the topic of this post.

Many databases have limitations on the former and present workarounds that require you to either do a lot of work or build something in such an inflexible, hard-coded way that it would be very hard to change. The mere fact that they present workarounds (which is essentially what a single-table design is for me), to me, indicates that there is a need for their users to work around it.

> So you go after people who aren't using DynamoDB at massive scale. Say, early-stage startup founders who want to be on DynamoDB from day 1 because someday their product will be Web Scale. But don't have a lot of time to carefully evaluate claims like this. They just say "10x cost reduction? Wow, Fauna is the new best DB!" Most of these guys fail, but a few of them are a runaway success (and would have been equally so if they'd used DynamoDB), are now stuck with Fauna whether they like it or not (but let's assume they like it at least as well as DynamoDB, maybe even slightly more), and are now listed as large scale users of Fauna on their website. You too could be a unicorn startup! Start using Fauna today!

I think you just described the life of a developer when selecting <insert random new technology>. Technological advances are accelerating, and we don't have enough time to research them all, so we skim through the posts/docs and look at what other companies have done. I understand what you mean and it's an everyday source of frustration to me as well that many chase new technologies based on one article. That's how many startups ended up with microservices they didn't need or how a new SPA technology takes the world by storm every 2 years.

> Basically, I think the makers of Fauna are trying to con you with this article. It's not that their product is bad, it's that they're trying to get you to buy it for reasons other than that it's good.

The last sentence is quite unfair imho though. Fauna is one of the databases that tries to be correct in their messaging and respects other products deeply. In my personal opinion, Dynamo and Fauna are very different products with a different focus. Dynamo focuses on a use case where you need scalability and sheer speed and are less interested in relations, many access patterns or consistency over many collections. At the same time they do appeal to people who do need those features by presenting workarounds. Maybe someone from a relational background sees these workarounds, doesn't think it through and then gets stuck in their inflexibility? Is Dynamo to blame? No, they are just helping their users with questions that often come back. Similarly, the question of 'how is Fauna different from Dynamo' and, more importantly for this article, 'how does pricing compare' is a question that often comes back here. It is a question that is hard to answer since it depends entirely on how you use it and on many subtleties that are not visible at first glance. Do you need relations? A single-table approach would help but will also blow up your table with redundant data and therefore increase pricing, although at first Dynamo looked cheap; it depends on the use case. If you do not research a product thoroughly, chances are you will run into a wall and be stuck with it no matter whether the product is Dynamo, Fauna, Spanner, Firebase, etc. All we can do is provide as many details as we can on what we can and can't do, and I think the Fauna docs and forums do quite a good job on that.

I am a developer advocate at Fauna; this reply is, however, entirely my personal opinion.

abadid · 5y ago

IMO, it's hard to put a price on strong isolation and consistency. Being able to write an app that uses atomic transactions, that are isolated from concurrently running transactions, and that see the correct data is something that translates to reduced programmer time and effort, and improves user experience. Many programmers discount those important features when they start out, but they'd be better served including them in the price comparisons of different products that are out there.

pier25 · 5y ago

Dear HN

I've been commenting on this thread and I'd like to add a disclaimer. While I'm not a Fauna employee, I've been paid by Fauna to write articles that have been published in their blog. My opinions are my own though.

That said, I've been using and studying Fauna for almost a year now so if you have any questions let me know!

Graphguy · 5y ago

> "Read operations assume a data size of 4K or less; each additional 4K costs an additional operation. Write operations assume a data size of 1K or less. Notably, index writes count as entirely separate write operations; they are not included in the document’s 1K."

So many customers don't account for this, and it ends up costing $$$ if your data model isn't a good fit. Cosmos even takes it further with 1 KB units (I have spent hours on Cosmos pricing and am still baffled on how to price a workload). Although... it does incentivize decent data modeling practices, which often lead to more performant apps.
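The counting rule in the quote can be made concrete with a short sketch. It assumes that writes scale per additional 1K the same way reads scale per additional 4K (the quote states this only for reads) and that each index entry fits in a single write operation; the function names are hypothetical.

```python
import math

READ_UNIT_BYTES = 4 * 1024  # quoted rule: one read operation per 4K
WRITE_UNIT_BYTES = 1024     # quoted rule: one write operation per 1K


def read_ops(size_bytes: int) -> int:
    """Read operations for one document of the given size."""
    return max(1, math.ceil(size_bytes / READ_UNIT_BYTES))


def write_ops(size_bytes: int, index_entries: int = 0) -> int:
    """Write operations for one document plus its index writes.

    Assumption: each index entry is small enough to cost exactly one
    write operation, billed separately from the document itself.
    """
    document_ops = max(1, math.ceil(size_bytes / WRITE_UNIT_BYTES))
    return document_ops + index_entries
```

For example, under these assumptions a 2,500-byte document with 20 index entries costs 3 write operations for the document and 20 for the indexes, so the index writes dominate the bill.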

crb002 · 5y ago

1) What is the latency within an AWS Region for a key lookup?

2) What is the latency of a global sync between all AWS regions for say a 100kb update?

sudeepj · 5y ago

For simple use-cases [1] isn't replicated Redis much better in terms of cost?

With in-mem DBs, there is no dollar cost for reads + writes and the IOPS will be way better than DynamoDB.

AWS has a Redis offering in ElastiCache.

[1] No indexes, strongly consistent get & put, < 10 GB
