In a world where Opex is much higher than Capex, DynamoDB might make sense, but for me server costs are 5% of dev costs. And even if it works from a cost perspective, how many AWS services have had their console experience ruined by DynamoDB? The UI tricks you into thinking it's a data table with sortable columns, but no! DynamoDB limitations strike again and you are off on a journey of endless paging. The cost savings come at the expense of the user.
DynamoDB also isn't fast. 20ms for a query isn't fast, and 30ms for an insert isn't fast. Yes, it's amazingly consistent and faster than other systems holding 500TB, but that isn't a use case for many users.
- It provides rich types with some odd limitations: strings, sets, lists, and binaries do not allow empty values.
- You can store a maximum of 400 KB of data in one row.
- You can get a maximum of 1 MB of data returned in a single query.
So it's mostly good for high-data-throughput applications, and then only if your high data throughput consists of large numbers of small records, processed a few at a time. This surely describes an important class of workloads. You may suffer if your workload isn't in this class.
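The 1 MB cap means any larger result set has to be read page by page. A minimal sketch of that loop, assuming a callable shaped like boto3's `Table.query` (it accepts `ExclusiveStartKey` and returns a dict with `Items` plus, when more data remains, `LastEvaluatedKey`). The name `query_all` is hypothetical.

```python
from typing import Any, Callable, Dict, Iterator


def query_all(
    query_page: Callable[..., Dict[str, Any]], **query_kwargs: Any
) -> Iterator[Dict[str, Any]]:
    """Yield every item from a paged query.

    `query_page` is assumed to behave like boto3's Table.query: each call
    returns at most one page under "Items", plus a "LastEvaluatedKey"
    when more pages remain.
    """
    kwargs = dict(query_kwargs)
    while True:
        page = query_page(**kwargs)
        yield from page.get("Items", [])
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return
        # Resume the next request where the previous page stopped.
        kwargs["ExclusiveStartKey"] = last_key
```

With a real table this would be called as `query_all(table.query, KeyConditionExpression=...)`. Every page is a separate round trip, which is where the "few records at a time" advice comes from.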
Another annoyance is that (in my experience) one of the most common errors you will encounter is ProvisionedThroughputExceededException, when your workload changes faster than the auto-scaling. Until last year you couldn't test this scenario offline with the DynamoDB Local service because DynamoDB Local didn't implement capacity limits.
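For what it's worth, the usual client-side mitigation is to retry throttled calls with exponential backoff and jitter. A rough sketch, assuming the error exposes its code the way botocore's `ClientError` does (`exc.response["Error"]["Code"]`). The helper names are hypothetical, and the AWS SDKs have their own built-in retries, so this only illustrates the idea.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")

THROTTLE_CODE = "ProvisionedThroughputExceededException"


def error_code(exc: Exception) -> str:
    """Return an AWS-style error code, or "" if the exception has none."""
    response = getattr(exc, "response", None) or {}
    return response.get("Error", {}).get("Code", "")


def call_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 0.05,
    max_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `operation` on throttling, with exponential backoff and full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception as exc:
            is_last = attempt == max_attempts - 1
            # Only throttling is retried; everything else propagates at once.
            if error_code(exc) != THROTTLE_CODE or is_last:
                raise
            # Full jitter: sleep a random amount up to the exponential cap.
            cap = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, cap))
    raise AssertionError("loop always returns or raises")
```

Backoff only smooths over short bursts. If the workload has genuinely outgrown the provisioned capacity, the retries just move the wait into the client.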
That is _infuriating_
It's documented, but it is so surprising when you first hit it. Sometimes empty values have semantics attached to them, and I don't want to scrub them out.
(Disclosure: I work for AWS on DynamoDB and on this)
The rich data types in Dynamo are quite strange. Since they're basically useless for querying, I'm not sure why you would use them. Maybe I'm missing something...
It's not just that it is put in the database category, but that its champions at AWS make statements like "if you are utilising RDBMS you are living in the past", or that "there are very few use cases to choose Postgres over DynamoDB".
Btw, loved your AWS book!
It's definitely a database. The modeling principles are different, and you won't get some of the niceties you get with an RDBMS, but it still allows for flexible querying and more.
S3 is not a database, but DynamoDB is :).
Did you mean to say "as a RDBMS"? Because I don't see how it's not a DBMS.
IMO, there are two times you should absolutely default to DynamoDB:
- Very high scale workloads, due to its scaling characteristics
- Workloads w/ serverless compute (aka Lambda) due to how well it fits with the connection model, provisioning model, etc.
You can use DynamoDB for almost all OLTP workloads, but outside of those two categories, I won't fault you for choosing an RDBMS.
Agree that DynamoDB isn't _blazing_ fast. It's more that it's extremely consistent. You're going to get ~10 millisecond response times whether you have 1 GB of data or 10 TB of data, and that's pretty attractive.
If you can use Aurora Serverless, the Data API makes sense for Lambda.
https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide...
This is only true for AWS. Azure Functions share resources and don't have this issue.
The speed is actually quite sad. It's 5-10x slower than my other databases at p95, and I can't throw money at the problem on the write side. For reads I can use DAX, but then there goes consistency.
If you're approaching that point, you are already going to need an analytics pipeline, a search DB, etc., because maintaining ever-growing indices will kill your latency. You can probably get away with aggregations for a bit longer, but if the number of rows you aggregate is growing too, eventually you will need to come up with something, and the way you do that with Dynamo off a stream isn't a bad way to go about it with MySQL either.
Looking at the tables I have access to, they all come in under 5ms for both reads and writes. This is the same ballpark as our MySQL apps for similar style queries (i.e. not aggregations).
Sadly my favorite reason to use Dynamo is political, not technical. Since it somehow is not classified as a database at my company, the DBAs don't 'own' it. So I don't have to wait 2-3 months for them to manually configure something.
Conway's law strikes again.
RDS goes to 4TB on the X1e instance type. But the point is that RDBMS systems handle a large amount of data and a wide range of workload types before you need to reach for specialist systems.
I don't know how you are doing write transactions in 5ms on DynamoDB. Single puts at p50 maybe, but I've never seen p90 put operations below 10ms.
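Part of the disagreement in these latency numbers is about which percentile is being quoted. As an illustration only, here is a nearest-rank percentile over a list of measured latencies; the function name and the choice of the nearest-rank definition are assumptions, not anything the posters above said they used.

```python
import math
from typing import Sequence


def percentile(samples: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    # 1-based rank; multiply before dividing to limit float rounding.
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]
```

Running `percentile(put_latencies_ms, 50)` and `percentile(put_latencies_ms, 90)` over the same measurements shows how a table can be "under 5ms" at the median and still over 10ms in the tail.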
What's the price of that on the cloud? I know I can run crazy big tables on DynamoDB for a couple of dollars. I don't know what 1 month of a relational database with 2TB of RAM costs on the cloud, but I am pretty sure I can't afford it.
NoSQL modeling is waaay different than relational modeling. I think a lot of NoSQL advice out there is pretty bad, which results in people dismissing the technology altogether. I've been working with DynamoDB for a few years now, and there's no way I'll go back.
The book has been available for about a month now, and I've been pretty happy with the reception. Strong support from Rick Houlihan (AWS DynamoDB wizard) and a lot of other folks at AWS.
You can get a free preview by signing up at the landing page. If you buy and don't like it, there's a full money-back guarantee with no questions asked. Also, if you're having income problems due to COVID, hit me up and we'll make something work :)
Anyhow, hit me up with questions!
EDIT: Added a coupon code for folks hearing about the book here. Use the code "HACKERNEWS" to save $20 on Basic, $30 on Plus, or $50 on Premium. :)
[1] https://syslog.ravelin.com/you-probably-shouldnt-use-dynamod...
You also finally have a way of identifying hot keys with the terribly named CloudWatch Contributor Insights for DynamoDB. [2]
For exceptional use cases, you also have the option of On-Demand Capacity to pay for what you use and not worry about capacity at all. [3]
[1] https://docs.aws.amazon.com/amazondynamodb/latest/developerg...
[2] https://docs.aws.amazon.com/amazondynamodb/latest/developerg...
[3] https://docs.aws.amazon.com/amazondynamodb/latest/developerg...
https://aws.amazon.com/blogs/database/how-amazon-dynamodb-ad...
Basically, most of these issues are gone. As long as you don't have extreme skew in your partition keys, you don't need to worry about throughput limits.
What was your approach to self-publishing here? What tools did you use? If I wanted to publish a book but knew nothing about it, what resources should I read and what approach would you recommend?
The biggest advice I can give you is not about any specific tool, it's about an approach. You need to think about how you will market the book if you're self-publishing.
Engage with the community that will be interested in the book. Write articles, help out on Twitter, write code libraries, etc.
For me, I wrote DynamoDBGuide.com two and a half years ago over Christmas break. I wanted to just make an easier introduction to DynamoDB after I watched Rick Houlihan's talk at re:Invent (which is awesome).
That led to other opportunities and to me being seen as an 'expert' (even when I wasn't!). I got more questions and spent more time on DynamoDB to the point where I started to know more. I gave a few talks, etc.
I finally decided to do a book and set up a landing page and mailing list. I basically followed the playbook that Adam Wathan described for his first book launch.[0] Write in public, release sample chapters, engage with people, etc.
In terms of tooling, I used AsciiDoc to generate the book and Gumroad to sell. On a 1-10 scale, I'd give AsciiDoc a 5 and Gumroad an 8. But the tooling barely matters -- think about how to find the people that are interested :)
Happy to answer any other questions, either in public or via email.
[0] - https://adamwathan.me/the-book-launch-that-let-me-quit-my-jo...
err.. back to what?
You can get many of the benefits of Dynamo (sans auto-sharding) by applying its elegant indexing strategy to a SQL database. It will be as fast or faster, your transactions can be as big as you need them to be, and you retain the ability to occasionally fire off un-indexed ad hoc queries for development or convenience. Running and scaling a SQL DB is also fairly painless these days with options like Aurora.
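A minimal sketch of what that could look like, using Python's built-in `sqlite3` as a stand-in for any SQL database. The table name, the `pk`/`sk` columns, and the `USER#`/`ORDER#` key formats are all hypothetical, chosen to mimic a Dynamo-style composite primary key.

```python
import sqlite3
from typing import List, Tuple


def make_table() -> sqlite3.Connection:
    """Create an in-memory table keyed like a Dynamo table and load sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE items (
            pk   TEXT NOT NULL,   -- plays the role of the partition key
            sk   TEXT NOT NULL,   -- plays the role of the sort key
            data TEXT,
            PRIMARY KEY (pk, sk)  -- composite index, like a Dynamo primary key
        )
        """
    )
    conn.executemany(
        "INSERT INTO items (pk, sk, data) VALUES (?, ?, ?)",
        [
            ("USER#42", "PROFILE", "name=Ada"),
            ("USER#42", "ORDER#2020-03-01", "book, refund"),
            ("USER#42", "ORDER#2020-04-15", "laptop"),
            ("USER#7", "ORDER#2020-01-09", "refund"),
        ],
    )
    return conn


def query_prefix(
    conn: sqlite3.Connection, pk: str, sk_prefix: str
) -> List[Tuple[str, str]]:
    """Indexed range read: one partition, sort keys starting with sk_prefix.

    Assumes a non-empty prefix.
    """
    # Exclusive upper bound: the prefix with its last character bumped by one.
    upper = sk_prefix[:-1] + chr(ord(sk_prefix[-1]) + 1)
    cur = conn.execute(
        "SELECT sk, data FROM items "
        "WHERE pk = ? AND sk >= ? AND sk < ? ORDER BY sk",
        (pk, sk_prefix, upper),
    )
    return cur.fetchall()


def count_matching(conn: sqlite3.Connection, needle: str) -> int:
    """Un-indexed ad hoc query (full scan), for occasional development use."""
    cur = conn.execute(
        "SELECT COUNT(*) FROM items WHERE data LIKE ?", (f"%{needle}%",)
    )
    return cur.fetchone()[0]
```

`query_prefix(conn, "USER#42", "ORDER#")` is the analogue of a Dynamo query with a `begins_with` condition on the sort key, while `count_matching` is the kind of ad hoc scan that the SQL version keeps available.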
But I agree in general about the limitations. Having used RDBMSes like Postgres a lot, as well as Cassandra and DynamoDB in production, I would almost certainly not create a new app with DynamoDB as the primary DB. Even if you have an app where you expect to need to scale writes heavily, it's not going to be on all tables equally. For instance, your users table, and related resources that are relatively small and grow linearly with your users, will probably fit fine in a Postgres DB for a very long time. And being able to have normalized models and powerful indexing and querying patterns available is a big benefit.
DynamoDB can work well for a specific sub-system that needs very high scalability. For instance, if you needed to store pairwise info between every user and product combination for some reason. Or if every user can upload a huge number of resources of some type (though the access patterns need to fit DynamoDB's constraints; if these are documents or files of some type, then another system like S3 or Elasticsearch would probably make more sense). Or if you're tracking advertising views by an advertising identifier or something. Or scraping and importing a bunch of data from other places. In some specific use cases like this, the downsides vs an RDBMS can be very minimal, and the built-in scalability can save you a ton of time vs having to constantly tune and potentially shard your RDBMS.
But even in these cases, you might have better options depending on your access patterns. For instance if you don't ever need to refer to this data by reading it in an OLTP context, you might want to just write it to a log like Kafka to be ingested into Redshift or HDFS for offline processing or querying.
That said, I think you can definitely handle complex, relational patterns in DynamoDB pretty easily. It will take some work to learn new modeling patterns, but it's absolutely doable.
Sounds like a perfect use case for a traditional RDBMS. Why Dynamo?
I ... can't think of a single time I've ever needed this.
That said, a few notes:
1. I added a coupon code ('HACKERNEWS') to knock $20 off Basic, $30 off Plus, and $50 off Premium.
2. If you're from a country where PPP makes this pretty expensive, hit me up. I'm happy to help.
3. If you're facing income challenges due to COVID-19, hit me up, I'm happy to help.
4. If this is unaffordable for any reason, hit me up, I'm happy to help. :)
I bought it and have found it to be completely worth the money. I don't look at prices for these things in relation to how much other books cost but how much time it will save me.
We tend to criticize people in our industry for asking a decent amount of money, whereas people in other industries shamelessly ask for ludicrous amounts of money for pretty much anything (think medical or legal).
Alex answered my questions in such a way that I myself saw where the bug was in my code.
He saved me easily several hours of time.
At my hourly rate, this means that the book had a negative cost in my case.
I was able to repay the favor: I suggested an improvement to one code example in the book, which Alex eagerly accepted.
RDBMS capacity planning basically goes:
1. How much traffic will I get?
2. How much RAM & CPU will I need to handle the traffic from (1)?
With DynamoDB, you can skip the second question.
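To illustrate, provisioned capacity can be estimated from traffic and item size alone. A rough sketch, assuming the documented unit definitions: one write unit covers one write per second of up to 1 KB, and one read unit covers one strongly consistent read per second of up to 4 KB (eventually consistent reads cost half). The function names are hypothetical, and transactional operations are ignored.

```python
import math


def write_capacity_units(writes_per_sec: float, item_size_kb: float) -> int:
    """Write units needed: item size rounds up to the next 1 KB per write."""
    units_per_write = math.ceil(item_size_kb / 1)
    return math.ceil(writes_per_sec * units_per_write)


def read_capacity_units(
    reads_per_sec: float, item_size_kb: float, strongly_consistent: bool = False
) -> int:
    """Read units needed: item size rounds up to the next 4 KB per read."""
    units_per_read = math.ceil(item_size_kb / 4)
    units = reads_per_sec * units_per_read
    if not strongly_consistent:
        # Eventually consistent reads cost half as much.
        units = units / 2
    return math.ceil(units)
```

Nothing in the calculation mentions RAM or CPU, which is the point being made above: the only inputs are request rate and item size.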
Can you tell me why the On-Demand mode doesn't work for you?
It is just stunning how much better it is learning Dynamo/NoSQL in general from this than effectively any other source. Anyone who's had to rely on AWS docs knows how face-meltingly dense they can be.
I went back and refactored all my previous Dynamo work last night, and the difference was night and day. I'm planning to migrate some relational structures later this week, as well.
Is good book.
Edit: never mind, I see another review elsewhere and the author replying. Though, your opinion would still be appreciated! :)
I mean, hand a person a gun and they might shoot themselves in the foot. While you can make bad queries/workloads for a relational database, you can just as easily make bad workloads for DynamoDB.
This is underrated, but it's really helpful. So many times w/ a relational database, I've had to tweak queries or access patterns over time as response times degrade. DynamoDB basically doesn't have that unless you really screw something up.
Say I have a person entity with its attributes listed out in a table. How would you go about sorting by first name, last name, created at, etc.? I was thinking of streaming everything over to Elasticsearch, but that would add extra complexity to maintain.
But how widely used is DynamoDB? And for what use cases?
And what are the problems with it?
- It was designed for super high scale use cases (think Amazon.com retail on Cyber Monday). It has decent adoption there. Competes mostly with Cassandra or other similar tools.
- With the introduction of AWS Lambda, it got more adoption in the 'serverless' ecosystem because of how well its connection model, provisioning model, and billing model work with Lambda. RDBMS doesn't work as well here.
A lot of people find 'problems' with it because they try to use it like a relational database, which it most certainly isn't. You have to model differently and think about it differently. The book helps here :).
That said, the principles apply pretty well to other popular NoSQL databases, especially MongoDB and Cassandra. There will be some slight differences -- MongoDB allows better nesting and querying on nested objects -- but it's broadly the same. If you want to model NoSQL for scale, you need to use these general patterns.
If you want to check it out but find out it doesn't work for you, just let me know. I've got a 100% money-back guarantee with no questions asked if you don't like it.
In others, you might have relations but lose consistency; in still others, you might have relations but only keep consistency under specific conditions (sharding keys, etc.).
NoSQL modeling typically depends on the specific characteristics of the database. Essentially it's about looking at these, seeing what the database doesn't offer, comparing that with what you need, and finding workarounds.
The JS landscape for Dynamo is a bit bare; the notable options all largely ignore the indexing principles that are the real draw of Dynamo. This heartburn caused me to sit down and write a library myself (https://github.com/tywalch/electrodb) that allows you to focus on the models and relationships while taking care of all the little pitfalls and “hacky” tricks inherent in single table design.
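For anyone who hasn't seen those tricks, here is a tiny illustration of the kind of key-building involved. This is not ElectroDB's API (that library is JavaScript); it is a hypothetical Python sketch with made-up `PK`/`SK` attribute names and `USER#`/`ORDER#` prefixes, showing one common pitfall: sort keys compare as strings, so numbers need zero-padding.

```python
from typing import Dict


def user_key(user_id: str) -> Dict[str, str]:
    """Key for the user record itself."""
    return {"PK": f"USER#{user_id}", "SK": f"USER#{user_id}"}


def order_key(user_id: str, order_number: int) -> Dict[str, str]:
    """Key for an order stored in the same partition as its user.

    The number is zero-padded because sort keys compare as strings:
    unpadded, "ORDER#10" would sort before "ORDER#9".
    """
    return {"PK": f"USER#{user_id}", "SK": f"ORDER#{order_number:08d}"}
```

Because both entity types share a partition key, one query on `PK = "USER#42"` can return the user and all of their orders together, which is the main reason for putting them in one table.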
Alex’s book covers all these things and I honestly wish I had had it sooner before having to learn via foot shooting. It’s pricey but if you have a need for Dynamo on your project it really pays off knowing you’re swimming with the current, and Alex definitely gets you there.
I haven't done any serious tests, but I'd say on average my reads to Fauna from Cloudflare Workers are 30ms. That seems like a lot compared to querying a local instance of Postgres, but since Fauna is distributed you end up getting much better latency on average for your worldwide users compared to a single DB in us-east-1.
Writes take longer (probably around 200-300ms on average) but considering these are replicated to all Fauna servers with ACID I'm ok with that.
I wrote a little intro to Fauna's query language which is very powerful if anyone is interested: