If anyone from AWS is here: how is this used internally at Amazon?
I wonder if this is essentially a Presto SaaS product?
https://www.dropbox.com/s/s4cw5x7yyrdl3ch/Screenshot%202016-...
Even the pricing is the same: $5 per TB of data scanned.
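Since you pay per byte scanned, the bill is simple arithmetic. Here is a rough sketch. It assumes 1 TB = 10**12 bytes and ignores any per-query minimums or rounding that AWS may apply. The function name is just for illustration.

```python
# Assumptions: 1 TB = 10**12 bytes; per-query minimums and rounding are ignored.
PRICE_PER_TB_USD = 5.0
BYTES_PER_TB = 10**12

def query_cost_usd(bytes_scanned, price_per_tb=PRICE_PER_TB_USD):
    """Estimated charge for one query that scans `bytes_scanned` bytes."""
    if bytes_scanned < 0:
        raise ValueError("bytes_scanned must be non-negative")
    return bytes_scanned / BYTES_PER_TB * price_per_tb

# Scanning 200 GB comes to about $1.00; scanning 10x fewer bytes costs 10x less.
print(query_cost_usd(200 * 10**9))
print(query_cost_usd(20 * 10**9))
```

This is why compression, partitioning and columnar formats save money: anything that reduces the bytes a query touches shrinks the bill in direct proportion.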
> Amazon Athena supports a wide variety of data formats like CSV, TSV, JSON, or Textfiles and also supports open source columnar formats such as Apache ORC and Apache Parquet. Athena also supports compressed data in Snappy, Zlib, and GZIP formats. By compressing, partitioning, and using columnar formats you can improve performance and reduce your costs.
The other formats are either schema-less (JSON, CSV, etc.) or not supported by Redshift (ORC, Parquet). Some may be less efficient for certain queries (Avro, for example, is not a columnar format), but they are still useful.
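To see why columnar layouts and compression reduce the bytes scanned, here is a toy model. Storing each column as its own blob is only a crude stand-in for ORC/Parquet, which are far more sophisticated. The dataset and column names are made up.

```python
import gzip

# Hypothetical dataset: 10,000 rows of (user_id, country, amount).
rows = [(i, "US" if i % 3 else "DE", i % 100) for i in range(10_000)]

def row_layout(rows):
    """Serialize row after row, like a CSV file."""
    return "\n".join(f"{u},{c},{a}" for u, c, a in rows).encode()

def column_layout(rows):
    """Serialize each column as its own blob (a rough stand-in for a columnar format)."""
    names = ("user_id", "country", "amount")
    return {name: "\n".join(map(str, col)).encode()
            for name, col in zip(names, zip(*rows))}

row_blob = row_layout(rows)
col_blobs = column_layout(rows)

# A query like SUM(amount) has to read the whole row-oriented blob,
# but only the "amount" blob in the columnar layout.
scanned_row = len(row_blob)
scanned_col = len(col_blobs["amount"])
# Storing that column gzip-compressed shrinks the scanned bytes further.
scanned_col_gz = len(gzip.compress(col_blobs["amount"]))

print(scanned_row, scanned_col, scanned_col_gz)
```

The exact numbers depend on the data. The ordering is the point: the row layout scans the most, the single column scans less, and the compressed column scans less still.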
I'm confused, though. The monthly fees for DynamoDB only apply once you exceed the free tier, so for someone who can't commit to a monthly fee because they expect low usage, shouldn't the free tier be sufficient? (Honest question: I'm looking at using DynamoDB, but comments like this make me think I'm missing something.)
Athena is really interesting, and if it lives up to its "Serverless SQL" billing, they've got a killer product in the pipeline: a future where developers no longer spend time scaling, configuring, maintaining and planning deployments, but simply upload code and immediately reap the benefits of serverless tech.
The one missing component that would be a killer feature is an answer to Azure Active Directory. It would be nice to have serverless, plug-and-play user authentication and access control that integrates with Lambda and Athena.
I'd imagine some sort of "RoR on Serverless" framework that scaffolds CRUD, user management and a REST API is in the works as well.
The only potential downside I see at the moment for serverless is the uncertainty around cold boots, which directly affects user experience. It's fine when you have enough traffic to keep things "warm", but there must be no dead zone where a call to API Gateway takes several seconds while it waits for the Lambda function to start.
As for the cold boot issue, I thought the standard solution was to have a "fast-exit", ping-like code path within the Lambda function. Invoke it on a regular basis (you can even do this with a Lambda scheduled event). That way your function should stay warm.
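A minimal sketch of that fast-exit path in a Python Lambda handler. It assumes the ping comes from a CloudWatch Events/EventBridge schedule, whose payload carries "source": "aws.events" and "detail-type": "Scheduled Event". The response body for real requests is a hypothetical placeholder.

```python
import json

def is_warmup_ping(event):
    """True for a scheduled CloudWatch Events / EventBridge invocation."""
    return (isinstance(event, dict)
            and event.get("source") == "aws.events"
            and event.get("detail-type") == "Scheduled Event")

def handler(event, context):
    # Fast-exit path: the scheduled ping only exists to keep the container warm,
    # so return immediately without touching databases or other services.
    if is_warmup_ping(event):
        return {"warmup": True}

    # Normal path, e.g. a request proxied through API Gateway.
    # The body below is a hypothetical placeholder.
    body = {"message": "hello"}
    return {"statusCode": 200, "body": json.dumps(body)}
```

One caveat: a single periodic ping keeps roughly one container warm. A burst of concurrent requests can still spill over into fresh containers that cold-start.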
Simply point to your data in Amazon S3
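In practice, "pointing to your data" means declaring an external table whose LOCATION is an S3 prefix, using Hive-style DDL. Here is a sketch that only builds the DDL string. The table name, columns and bucket are hypothetical, and the helper name is made up. You would then run the string in the Athena console, or possibly via boto3's `start_query_execution`.

```python
def external_table_ddl(table, columns, s3_location, stored_as=None, partitioned_by=None):
    """Build a Hive-style CREATE EXTERNAL TABLE statement over an S3 prefix.

    columns / partitioned_by: lists of (name, type) pairs.
    stored_as: e.g. "PARQUET" or "ORC"; if None, assume comma-delimited text (CSV).
    """
    if not s3_location.startswith("s3://"):
        raise ValueError("location must be an s3:// URI")
    cols = ",\n  ".join(f"`{name}` {typ}" for name, typ in columns)
    ddl = f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} (\n  {cols}\n)"
    if partitioned_by:
        parts = ", ".join(f"`{name}` {typ}" for name, typ in partitioned_by)
        ddl += f"\nPARTITIONED BY ({parts})"
    if stored_as:
        ddl += f"\nSTORED AS {stored_as}"
    else:
        ddl += "\nROW FORMAT DELIMITED FIELDS TERMINATED BY ','"
    ddl += f"\nLOCATION '{s3_location}'"
    return ddl

# Hypothetical Parquet table partitioned by date; the data itself stays in S3.
print(external_table_ddl(
    "my_db.events",
    [("user_id", "bigint"), ("amount", "double")],
    "s3://example-bucket/events/",
    stored_as="PARQUET",
    partitioned_by=[("dt", "string")],
))
```

Partitioning by something like a date column lets queries that filter on `dt` skip whole S3 prefixes, which ties back to the cost point above.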