Amazon Kinesis

pvnick12y ago· 4 in thread

What's going on with Amazon recently? We're seeing a torrent of new technologies and platform offerings. Are we finally catching a glimpse of Bezos's grand scheme?

skorgu12y ago

Amazon's re:Invent conference[0] has been going on over the last few days; it's an obvious time/place to announce.

[0] https://reinvent.awsevents.com/

pvnick12y ago

Oh, derp. Well that makes more sense.

monkeyspaw12y ago

Bezos said recently that he thinks AWS could be Amazon's biggest business. http://techcrunch.com/2013/11/13/jeff-bezos-believes-aws-cou...

From the press conference reported in the link: "Jeff is very excited about the AWS business and he believes - like the rest of the leadership team does - that in the fullness of time it is very possible that AWS could be the biggest business at Amazon."

hatred12y ago

Well, the new plethora of updates is centered on AWS since AWS re:Invent is going on at present. Historically, that is when Amazon likes to release new AWS services.

fizx12y ago· 4 in thread

Seems like a useful reworking of SQS, but all the hard work is being done in the client: "client library automatically handle complex issues like adapting to changes in stream volume, load-balancing streaming data, coordinating distributed services, and processing data with fault-tolerance."

Unfortunately, there's no explanation of the mechanics of coordination and fault tolerance, so the hard part appears to be vaporware.

vosper12y ago

> Unfortunately, there's no explanation of the mechanics of coordination and fault tolerance, so the hard part appears to be vaporware.

I think it's unfair to call it vaporware - Amazon doesn't tend to release vaporware. You can also be fairly confident this has been in private beta for some time, so we'll probably see a few blog posts about it from some of their privileged (big spending) clients - typically someone like Netflix or AirBnB. But I agree it would be nice to get some more information on the details.

As for the client library handling load-balancing, fault tolerance, etc - that might not be ideal, but as long as I don't have to do it myself then it might be okay.

fizx12y ago

The client handling it is ideal from a systems perspective, because the app won't forget to be fault tolerant on its connection to the server.

It's less ideal from a maintenance perspective, because there will need to be feature-rich clients in Java and C (with dynamic language bindings). Applications will be running many, many versions of the clients. Also, for coordination, the clients will need to communicate, so there may be configuration and/or firewall issues for the app to resolve.

It will be interesting to see Amazon make this tradeoff for what I believe is the first time.

javajammer12y ago

The currently available docs reveal the client-nodes coordinate through a DynamoDB table. Processing with the library yields "at least once" semantics.

http://docs.aws.amazon.com/kinesis/latest/dev/kinesis-record...
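
For anyone wondering what "at least once" means for application code: the same record can show up more than once (say, after a worker restarts), so the handler should tolerate repeats. A minimal sketch, assuming each record can be identified by a (shard id, sequence number) pair. The record shape and the `make_idempotent_handler` helper are made up for illustration and are not the client library's interface.

```python
# Sketch only: idempotent handling under "at least once" delivery.
# The (shard_id, sequence_number, data) record shape is an assumption
# for illustration, not the Kinesis Client Library's actual interface.

def make_idempotent_handler(apply):
    """Wrap `apply` so that a redelivered record is applied only once."""
    seen = set()  # a real system would keep this in durable storage

    def handle(shard_id, sequence_number, data):
        key = (shard_id, sequence_number)
        if key in seen:
            return False  # duplicate delivery, skip it
        apply(data)
        seen.add(key)
        return True

    return handle


applied = []
handle = make_idempotent_handler(applied.append)

deliveries = [
    ("shard-0", "1", "a"),
    ("shard-0", "2", "b"),
    ("shard-0", "2", "b"),  # redelivered after a simulated worker restart
    ("shard-1", "1", "c"),
]
results = [handle(*d) for d in deliveries]
print(applied, results)
```

The in-memory set is the weak point here: it only helps within one process, so a real deployment would need the dedup state (or the side effect itself) to be durable.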

fizx12y ago

thanks!

vosper12y ago· 3 in thread

I'm really excited about this - data streaming has been a crucial missing piece for building large-scale apps on AWS.

If the performance and pricing are right it's going to relieve a lot of headaches in terms of infrastructure management.

cjwebb12y ago

Forgive my ignorance, but what would this potentially replace? Kafka/Storm/Something else?

hatred12y ago

Yep, Amazon's version of Kafka/Storm with pay as you go minus the headaches of maintaining the cluster.

ihsw12y ago

Sounds about right.

mikebabineau12y ago· 2 in thread

This is essentially a hosted Kafka (http://kafka.apache.org/). Given the complexity of operating a distributed persistent queue, this could be a compelling alternative for AWS-centric environments. (We run a large Kafka cluster on AWS, and it is one of our highest-maintenance services.)

kodablah12y ago

We are about to deploy Kafka in our ecosystem and I am curious what maintenance you have? Can you explain or write a blog post? Is it on 0.8 beta?

We are choosing Kafka over other solutions like RabbitMQ because we like the persistent txn-log-style messages and how cheap consumers are.

mikebabineau12y ago

We're running 0.7 and most of our problems have been around partition rebalancing. I'm not the primary engineer on this, but here's my understanding:

If we add nodes to an existing Kafka cluster, those nodes own no partitions and therefore send/receive no traffic. A rebalancing event must occur for these servers to become active. Bouncing Kafka on one of the active nodes is one way to trigger such an event.

Fortunately, cluster resizing is infrequent. Unfortunately, network interruptions are not (at least on EC2).

When ZooKeeper detects a node failure (however brief), the node is removed from the active pool and the partitions are rebalanced. This is desirable. But when the node comes back online, no rebalancing takes place. The server remains inactive (as if it were a new node) until we trigger a rebalancing event.

As a result, we have to bounce Kafka on an active server every few weeks in response to network blips. 0.8 is supposed to handle this better, but we'll see.

Handle-jiggling aside, I'm a fan of Kafka and the types of systems you can build around it. Happy to put you in touch with our Kafka guy, just email me (mike.babineau@rumblegames.com). Loggly's also running Kafka on AWS - would be interesting to hear their take on this.

andrewcooke12y ago· 1 in thread

> it is possible that the MD5 hash of your partition keys isn't evenly distributed

how? i mean, apart from poisson stats / shot noise, obviously (and which is noise, so you can't predict it anyway).

thinking some more, i guess this (splitting and merging partitions in a non-generic way) is to handle when a consumer is slow for some reason. perhaps that partition is backing up because the consumer crashed.

but then why not say that, instead of postulating that people are going to have uneven hashes?

[edit:] maybe they allow duplicates?

twotwotwo12y ago

Yes, duplicates, I think. Looks like the partition key can be set to whatever you want, so maybe you log, I dunno, hits sharded by page, and your homepage gets a ton. I'd lean towards sharding randomly to avoid that, but, eh, they're just giving you enough rope to mess up your logging pipe with.
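
A rough sketch of the hot-key effect, assuming the 128-bit MD5 hash space is split into equal ranges across shards. `NUM_SHARDS`, the traffic mix, and the helper names are invented for illustration; a real stream's shard ranges need not be equal.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4          # hypothetical stream size
HASH_SPACE = 2 ** 128   # an MD5 digest is a 128-bit integer


def shard_for(partition_key: str) -> int:
    """Map a partition key to a shard by its MD5 hash (equal ranges assumed)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    hash_key = int.from_bytes(digest, "big")
    return hash_key * NUM_SHARDS // HASH_SPACE


def load_per_shard(keys):
    """Count how many records land on each shard."""
    counts = Counter(shard_for(k) for k in keys)
    return [counts.get(s, 0) for s in range(NUM_SHARDS)]


# Keyed by page: 9000 hits on the homepage, 1000 spread over other pages.
by_page = ["/"] * 9000 + [f"/page/{i}" for i in range(1000)]
# Keyed by a unique request id instead, which is effectively random sharding.
by_request = [f"req-{i}" for i in range(10000)]

skewed = load_per_shard(by_page)
spread = load_per_shard(by_request)
print("by page:   ", skewed)
print("by request:", spread)
```

The hash itself is fine in both cases. The imbalance comes from many records sharing one key, which all land on the same shard.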

kylequest12y ago· 1 in thread

The Kinesis consumer API is somewhat equivalent to the Simple Consumer API in Kafka. You'll have to manage the consumed sequence number yourself. There's no higher level consumer API to keep track of the consumed sequence numbers.

kylequest12y ago

Looks like AWS decided to put this capability in their Kinesis Client Library, which keeps track of the checkpoints in DynamoDB.
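
The general idea of checkpointing can be sketched as below. This is a toy model only: a dict stands in for the DynamoDB table, integers stand in for sequence numbers, and `CheckpointStore` and `consume` are hypothetical names, not the library's API.

```python
# Toy model of checkpointed consumption. A dict stands in for the
# DynamoDB table; sequence numbers are plain integers for simplicity.

class CheckpointStore:
    def __init__(self):
        self._table = {}

    def get(self, shard_id):
        # -1 means "nothing consumed yet"
        return self._table.get(shard_id, -1)

    def put(self, shard_id, sequence_number):
        self._table[shard_id] = sequence_number


def consume(shard_id, records, store, process, fail_at=None):
    """Process (sequence_number, data) pairs newer than the checkpoint."""
    last = store.get(shard_id)
    for seq, data in records:
        if seq <= last:
            continue  # already checkpointed by an earlier run
        if seq == fail_at:
            raise RuntimeError("simulated worker crash")
        process(data)
        store.put(shard_id, seq)  # checkpoint only after processing


store = CheckpointStore()
out = []
records = [(1, "a"), (2, "b"), (3, "c")]

try:
    consume("shard-0", records, store, out.append, fail_at=3)
except RuntimeError:
    pass  # the worker died before handling record 3

# A replacement worker resumes from the stored checkpoint.
consume("shard-0", records, store, out.append)
print(out, store.get("shard-0"))
```

Checkpointing after processing means a crash between the two steps replays that record, which is consistent with the "at least once" behaviour mentioned elsewhere in the thread.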

kylequest12y ago· 1 in thread

Interesting I/O limitations in Kinesis:

1MB/s writes with 1000 writes/s
2MB/s reads with 5 reads/s

senderista12y ago

Per shard.
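
Back-of-the-envelope shard sizing from those numbers might look like the sketch below. Assumptions: 1MB is taken as 10^6 bytes, every consumer application reads the whole stream and shares the per-shard read bandwidth, and `shards_needed` is just an illustrative helper.

```python
# Per-shard limits quoted above (1MB treated as 10**6 bytes, an assumption).
WRITE_BYTES_PER_SEC = 1_000_000
WRITES_PER_SEC = 1_000
READ_BYTES_PER_SEC = 2_000_000


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding float rounding."""
    return -(-a // b)


def shards_needed(records_per_sec: int, avg_record_bytes: int, consumers: int = 1) -> int:
    """Smallest shard count that satisfies every per-shard limit."""
    ingest = records_per_sec * avg_record_bytes
    by_write_bytes = ceil_div(ingest, WRITE_BYTES_PER_SEC)
    by_write_count = ceil_div(records_per_sec, WRITES_PER_SEC)
    # Assumes each consumer reads the full stream from the same shards.
    by_read_bytes = ceil_div(ingest * consumers, READ_BYTES_PER_SEC)
    return max(1, by_write_bytes, by_write_count, by_read_bytes)


small_records = shards_needed(5_000, 100)          # bound by writes/s
large_records = shards_needed(500, 10_000)         # bound by write bytes/s
fan_out = shards_needed(500, 10_000, consumers=3)  # bound by read bytes/s
print(small_records, large_records, fan_out)
```

With many small records the 1000 writes/s cap dominates; with several consumers the read bandwidth becomes the binding limit.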

zhaodaxiong12y ago

As a team member who helped build the service, I would like to offer some of my personal understanding. I am not with Amazon now, and all my views are based on public information on the website.

Like all AWS offerings, Kinesis is a platform. It looks like Kafka + Storm, with a fully integrated ecosystem of other AWS services. From the very beginning, reliability, real-time processing, and transparent elasticity have been built in. That's all I can say.

kylequest12y ago

The 50KB limit on data (base64 encoded data) will be a gotcha you'll have to deal with similar to the size limit in DynamoDB. Now you'll have to split your messages so they fit inside the Kinesis records and then you'll have to reassemble them on the other end... Not fun :-)
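
The split/reassemble dance could look roughly like this. Assumptions: the limit is 50 * 1024 bytes and applies to the base64-encoded payload, and the envelope fields (`id`, `index`, `total`) are invented for illustration; in practice they would also eat into the record size.

```python
import base64

LIMIT = 50 * 1024            # assumed: limit applies to the encoded payload
MAX_RAW = (LIMIT // 4) * 3   # base64 turns every 3 raw bytes into 4 chars


def split_message(msg_id: str, payload: bytes):
    """Split a payload into base64 chunks that each fit within LIMIT."""
    chunks = [payload[i:i + MAX_RAW] for i in range(0, len(payload), MAX_RAW)] or [b""]
    return [
        {
            "id": msg_id,
            "index": i,
            "total": len(chunks),
            "data": base64.b64encode(chunk).decode("ascii"),
        }
        for i, chunk in enumerate(chunks)
    ]


def reassemble(parts):
    """Rebuild the original payload from chunks received in any order."""
    ordered = sorted(parts, key=lambda p: p["index"])
    if len(ordered) != ordered[0]["total"]:
        raise ValueError("missing chunks")
    return b"".join(base64.b64decode(p["data"]) for p in ordered)


payload = bytes(range(256)) * 400   # 102,400 bytes, too big for one record
parts = split_message("msg-1", payload)
restored = reassemble(list(reversed(parts)))
print(len(parts), restored == payload)
```

Using the same partition key for all chunks of one message would keep them on one shard and in order, though the consumer still has to buffer until the last chunk arrives.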

kylequest12y ago

Having to base64 encode data is also a bit awkward. They should be passing PutRecord parameters as HTTP headers (which they are already using for other properties) and let users pass raw data in the body.

itchyouch12y ago

It's interesting to see these messaging platforms and the new use cases starting to hit the mainstream a la kinesis, storm, kafka.

Some interesting things about these kinds of messaging platforms.

Many exchanges/algo/low-latency/hft firms have large clusters of these kinds of systems for trading. The open source stuff out there is kind of different from the typical systems that revolve around a central engine/sequencer (matching engine).

There's a large body of knowledge in the financial industry on building low-latency versions of these message processors. Here are some interesting possibilities. On an e5-2670 with 7122 solarflare cards running openonload, it's possible to pump a decent 2M 100-byte messages/sec with a packetization of around 200k pps.

A carefully crafted system using efficient data structures and in-memory-only stores can pump and process a message through in about 15 microseconds on average, with the 99.9 percent median at around 20 micros. This is a message hitting a host, getting sent to an engine, then back to the host and back.

Using regular interrupt based processing and e1000s probably yields around 500k msgs/sec with average latency through the system at around 100 micros and 99.9% medians in the 30-40 millisecond range.

It's useful to see Solarflare's tuning guidelines on building uber-efficient memcache boxes that can handle something like 7-8M memcache requests/sec.

dylanz12y ago

Can someone with enough knowledge give a high-level comparison of Kinesis with something like Storm or Kafka?

carterschonwald12y ago· 5 in thread

Before I clicked the link I was hoping Amazon was releasing a clone of the kinesis keyboard. Anyone else have that initial hope? :-)

rbanffy12y ago

I wondered why Amazon would enter the keyboard market...

ewoodrich12y ago

They already have:

http://www.amazon.com/AmazonBasics-KU-0833-Wired-Keyboard-Bl...

logicallee12y ago

they would never do that - the margins are too big.

rohitn12y ago

Rest assured, Amazon has nothing to do with the AmazonBasics brand.

ohwait812y ago

they should stick to things they know, like cars.
