Cloud Platform at Google I/O – new Big Data, Mobile and Monitoring products

(googledevelopers.blogspot.com)

119 points · Goranek · 12y ago · 31 comments

nostromo · 12y ago

This is one space where Google really excels.

We're in the AWS ecosystem, and the database offerings are really subpar. DynamoDB, which I originally expected to be somewhat comparable to MongoDB, is an incredibly frustrating (and expensive) product to use. AWS Data Pipeline is extremely confusing and very expensive as well.

AWS's offerings really lag behind Google's offerings (like BigQuery) in this space. Hopefully AWS can catch up because I'd rather not have requests bouncing between data centers.

threeseed · 12y ago

If you are in AWS, there are also the four RDS options (Oracle, MySQL, PostgreSQL, SQL Server) as well as Redshift. Also, the best thing about AWS is that there are so many third-party choices, e.g. MongoLab, MongoHQ, Instaclustr, Cloudant.

Databases are not the area I would choose Google for.

Nitramp · 12y ago

I think App Engine's Datastore is generally an under-appreciated gem. Possibly because you have to use App Engine to use it without sacrificing performance, maybe because it's not easy enough to use if all you have is some JavaScript + JSON and don't want or know how to write Python/Java/Go.

But it's actually the only generally available product I know of that solves all the hard problems (availability, partition tolerance, some - but well defined - consistency with cross entity transactions) with zero hassle for you.

If you read through http://aphyr.com/tags/Jepsen, you get some appreciation for how hard this is to pull off without running into operational nightmares (massive data loss, split brains, etc).

Disclaimer: I work for Google, though not on Datastore.

bkirkbri · 12y ago

We've had good luck with DynamoDB, but it could be that it just fits our use case very well. What sort of frustration were you running into? (Honestly interested to avoid trouble down the line)

nostromo · 12y ago

Most recently: hot hash key. DynamoDB uses the hash key of the object being persisted to route it to the right data cluster.

We're a SaaS company with lots of tiny customers and a few very large customers. We need to keep an index to show a specific customer only their data. That means the index for our largest customers gets hit a lot. The problem with this structure is we have to pay as if all of our customers were as popular as our biggest customers, or we get throttled. And even though the DynamoDB interface shows that you have provisioned 10x above your current usage, you still get throttled, because you're being throttled only in a single cluster.
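A rough sketch of the effect described above, under the assumption that provisioned throughput is split evenly across partitions and each hash key maps to exactly one partition. The hash function, partition count, capacity numbers and function names are all made up for illustration and are not DynamoDB's actual internals.

```python
import hashlib
from collections import Counter


def partition_for(hash_key: str, num_partitions: int) -> int:
    """Map a hash key to a partition (illustrative hash, not DynamoDB's)."""
    digest = hashlib.md5(hash_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def throttled_requests(requests, num_partitions, provisioned_per_second):
    """Count requests rejected in one second when capacity is split evenly."""
    per_partition = provisioned_per_second / num_partitions
    load = Counter(partition_for(key, num_partitions) for key in requests)
    # Each partition rejects whatever exceeds its own share of the capacity.
    return sum(max(0, n - per_partition) for n in load.values())


# One large customer sends 90 requests; ten small customers send 1 each.
requests = ["big-customer"] * 90 + [f"small-{i}" for i in range(10)]

# Table-wide capacity is 10x the actual usage (1000 vs. 100 requests),
# but with 20 partitions each partition only gets 50 of it.
throttled = throttled_requests(
    requests, num_partitions=20, provisioned_per_second=1000
)
print(f"{len(requests)} requests, {throttled:.0f} throttled")
```

Even with 10x headroom at the table level, the single partition holding the large customer's key is over its share, so some of that customer's requests are rejected while the other partitions sit nearly idle.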

So, let's say you solve that problem, but now you need to drop the troublesome index on a billion+ row table. With DynamoDB you can't change a table's indexes, so you have to migrate your table to a new table. Doing that without downtime is an incredible challenge.

Which reminds me of when they announced indexes. We were so excited only to find out we couldn't add indexes to our tables, but instead had to recreate them all.

The whole point of SaaS is to make our lives easier, but with DynamoDB our lives were much more difficult than just using Mongo.

Anyway, I need to do a blog post on this -- it's a bit too complicated for a HN comment. :-)

tedsumme · 12y ago

What do you think is the analogous product to Cloud Dataflow in the AWS ecosystem? SWF? http://aws.amazon.com/swf/

persona · 12y ago

I believe it's Kinesis: http://aws.amazon.com/kinesis/

nostromo · 12y ago

I presume AWS Data Pipeline. But they have their differences, so perhaps there is no true analog.

http://aws.amazon.com/datapipeline/

Simple Workflow is new to me, so thanks for putting it on my radar!

ackdesha · 12y ago

Have you looked at AWS Redshift? It is somewhat analogous to BigQuery.

opendais · 12y ago

Google adding cloud monitoring has me sorely tempted to abandon a side project of mine. I'm sure Google could do it better. Bah. :P

samstave · 12y ago

What type of monitoring? We use Stackdriver, which Google just recently bought.

dpg17 · 11y ago

DataDog is an important part of monitoring at SmarterAgent, where we use their APIs and integrations heavily, especially the CloudWatch integrations for Amazon Web Services (AWS). Using these, we have been able to put up effective dashboards for new environments in a matter of minutes.

We leverage DataDog's API primarily for eventing. For example, deployment notifications are posted to DataDog, where they overlay our metrics. This has proven very useful in tracking changes due to deploys and/or configuration changes.

While we do leverage the DataDog agent for standard and custom metrics, DataDog’s ability to put together dashboards (and alerting) for AWS without any modifications to the host is what really closed the deal for us.

opendais · 12y ago

I probably will end up building the bare minimum to meet my needs and moving on tbh.

It was basically a monitoring/metrics system to merge how I handle the monitoring of crons, work queue, system metrics, analytics, etc. into a single service. Right now, I'm stuck using 3.

Sure, I could just build something to merge it together ... but at that point, I'm halfway to building my own.

kzwin · 12y ago

We use DataDog at pelotoncycle.com. My past experience was with Nagios, but DataDog is so much nicer and easier to scale. I looked into other services (both cheaper and more expensive) before deciding on DataDog.

k.z

IanCal · 12y ago

The streaming data stuff looks extremely interesting. My main concern is around cost. Unfortunately, many of these things are great if you've got a massive data problem but not particularly worth it if you've got much smaller data.

I'm in a rather awkward phase of having small enough data that I don't need "Scale to 1000 machines!"; I just want one or a few machines occasionally, but managed for me (turn on, run code, shut off). Tutum works very well for this, but I'd like to use more of the ecosystem available at Google or AWS (pay-per-usage data storage, for example). GCE is pretty decent, but a bit awkward, although the new Docker support helps (but I've had problems getting it to work at all).

Maybe this is my magic bullet :)

smoe · 12y ago

I'm using MIT's StarCluster to quickly spin up a bunch of AWS Spot Instances, run some calculations and shut them down again.

http://star.mit.edu/cluster/ http://www.youtube.com/watch?v=2Ym7epCYnSk

IanCal · 12y ago

Thanks, I'll have a look around at that!

curiousDog · 12y ago

Anyone know if they'll be offering Spanner as a publicly available service?

Goranek (OP) · 12y ago

Probably in the future. They never offer the latest technology...

persona · 12y ago

Google Dataflow seems to be the big one here, especially if it works well for stream processing. Fault-tolerant stream processing with huge scalability? Perfect for the IoT!

isbadawi · 12y ago

Judging from the code samples they showed during the keynote, I'd guess that Google Cloud Dataflow is based on (or an extension of, or a public version of...) FlumeJava, described in this PLDI 2010 paper: http://pages.cs.wisc.edu/~akella/CS838/F12/838-CloudPapers/F...

davecap1 · 12y ago

Anybody have experience moving from AWS to Google Cloud? If so, did you have any surprises in terms of difficulty or cost?

nwfzp · 12y ago

It looks like an attempt to respond to AWS Kinesis, which was released last year. The monitoring stuff seems to be built on the software they got when they bought Stackdriver.
