Usually when people release open source software, the documentation is lacking and there's no website. These guys absolutely nail it every single time.
Kudos to them, really!
This is very cool. Integrating with a name resolution protocol that every existing programmer and stack knows how to use (often without even thinking about it) should lead to some magical "just works" moments.
In common with Consul:
* DNS interface
* Operates as a distributed cluster
* Uses Raft for consensus
It seems like the right thing to do here would be to take the lessons of building Consul into making Serf something more like a library on which to build other things, rather than a service in its own right.
And while you may not see Serf as having much use, we've personally helped and seen Serf clusters with many thousands of nodes. Serf is very useful to these organizations for its purpose. And while some of these orgs are now looking at Consul, many don't need Consul in the same way (but may deploy it separately).
We're not stopping with Consul. We have something more on the way. But we now have some great building blocks and experience building distributed systems to keep doing it correctly without having to rebuild everything from scratch.
Consul is a CP system, meaning it trades availability for consistency. It has a much more limited ability to tolerate failures. However, its more central architecture allows it to support a richer feature set.
By keeping the tools separate we give developers and operators two different tools. Sometimes you need a hammer, and sometimes a screwdriver will do.
This page compares the two: http://www.consul.io/intro/vs/serf.html
This coalesces a lot of different ideas together into what seems to be a really tight package to solve hard problems. In looking around at what most companies are doing, even startupy types, architectures are becoming more distributed and a (hopefully) solid tool for discovery and configuration seems like a big step in the right direction.
I was planning to make a tool like this (smaller scale, one machine), and this will certainly serve as a good guide on how to do it right (or whether I should even bother at all).
I can't find a trace of a standard/included slick web interface for managing the clusters and agents -- are they leaving this up to a 3rd party (by just providing the HTTP API and seeing what people will do with it)? Is that a good idea?
If I may ask, it seems like the design of the Consul site is one step (iteration) away from the Serf site (particularly the docs pages -- some subtle changes made a large difference)... I agree with the others here, really dig the site, and big text definitely doesn't hurt deeply technical descriptions -- the architecture page was very readable for me.
that said, as i wrote in my blog post on service discovery ( http://nerds.airbnb.com/smartstack-service-discovery-cloud/ ), dns does not make for the greatest interface to service discovery, because many apps and libraries cache DNS lookups.
an http interface might be safer, but then you have to build a connector for this into every one of your apps.
i still feel that smartstack is a better approach because it is transparent. haproxy also provides us with great introspection for what's happening in the infrastructure -- who is talking to whom. we can analyse this both in our logs via logstash and in real-time using datadog's haproxy monitoring integration, and it's been invaluable.
however, this definitely deserves a look if you're interested in, for instance, load-balancing UDP traffic
How much time did it take to put this together?
From the service definition[0] it looks like the IP is always the IP of the node hosting `/etc/consul.d/*` files. I am thinking about it in a scenario where each service (running in a container) is getting an IP address on a private network which is not the IP of the node.
[0]: http://www.consul.io/docs/agent/services.html
Update: An external service is possible: http://www.consul.io/docs/guides/external.html
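For concreteness, here's a sketch of what a static definition dropped into `/etc/consul.d/` looks like, following the shape shown in the linked service-definition docs (name, tags, port). Note this is illustrative: the definition itself only carries the port, which is why the registered IP ends up being the hosting node's address, as described above.

```python
# Sketch: writing a static service definition file for the agent to load.
# The fields follow the documented shape; the path here is /tmp for the
# sake of a runnable example (it would be /etc/consul.d/ on a real node).
import json

service_def = {
    "service": {
        "name": "web",
        "tags": ["rails"],
        "port": 80,
    }
}

path = "/tmp/web.json"
with open(path, "w") as f:
    json.dump(service_def, f, indent=2)

# Round-trip to confirm the file parses as the agent would read it.
with open(path) as f:
    loaded = json.load(f)
```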
Discovery: The Consul page claims to provide a DNS-compatible alternative for peer discovery, but it is unclear what improvements it offers other than 'health checks', and the documentation leaves failure resolution processes unspecified (as far as I can see), thus mandating a hyper-simplistic architecture strategy: run lots of redundant instances in case one fails. That's not very efficient. (It might be interesting to note that at the ethernet level, IP already provides MAC address discovery via ARP. If you are serious about latency, floating IP ownership is generally far faster than other solutions.)
Configuration: We already have many configuration management systems, with many problems[1]. This is just a key/value store, and as such is not as immediately portable to arbitrary services as existing approaches such as "bunch-of-files", instead requiring overhead for each service launched in order to make it function with this configuration model.
The use of the newer raft consensus algorithm is interesting, but consensus does not a high availability cluster make. You also need elements like formal inter-service dependency definition in order to have any hope of automatically managing cluster state transitions required to recover from failures in non-trivial topologies. Corosync/Pacemaker has this, Consul doesn't. Then there's the potential split-brain issues resulting from non-redundant communications paths... raft doesn't tackle this, as it's an algorithm only. Simply put: given five nodes, one of which fails normally, if the remaining four split in equal halves who is the legitimate ruler? Game of thrones.
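On the "who is the legitimate ruler" question, the quorum arithmetic is worth spelling out: Raft requires a majority of the *configured* cluster size, so in the 5-node scenario above, neither 2-node partition can elect a leader or commit writes. Raft answers the split-brain question with "nobody", trading availability for safety. A minimal sketch of that arithmetic:

```python
# Quorum math for the five-node "game of thrones" scenario: a majority
# of the original cluster is required, so after one normal failure and
# an even 2-2 partition, neither side can claim legitimate leadership.
def quorum(cluster_size):
    """Minimum votes needed to elect a leader / commit a write."""
    return cluster_size // 2 + 1

original = 5
assert quorum(original) == 3   # majority of five is three

# One node fails normally; the remaining four split evenly.
partition_a, partition_b = 2, 2
a_has_quorum = partition_a >= quorum(original)
b_has_quorum = partition_b >= quorum(original)
# Neither partition reaches 3 of 5 votes: no leader, no writes,
# the cluster stalls rather than diverging into split-brain.
```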
As peterwwillis pointed out, for web-oriented cases, the same degree of architectural flexibility and failure detection proposed under consul can be achieved with significantly reduced complexity using traditional means like a frontend proxy. For other services or people wanting serious HA clustering, I would suggest looking elsewhere for the moment.
They do this using a gossip-based protocol and a consensus algorithm called Raft (a simpler alternative to Paxos). These two things work together to essentially have the servers that run your various services (whether api or db or cache or whatever) know about EACH OTHER.
The database they use is LMDB, but I think they chose that for lightness -- you could easily replace it with a local instance of cassandra, most likely.
Also, I'm assuming you don't mean switching to a centralized cassandra instance -- why you don't want to do that should be obvious (central point of failure).
I've never had a cluster completely collapse on me unless things were already screwed up enough that Service Discovery was ultimately useless since nothing else would work.
It just seems to me that losing your datastore makes your services unusable... at which point 'discovering them' isn't really the issue. Instead, everyone wants to introduce another datastore you need to rely on, whose loss means you can't find anyone -- even if your services themselves are still functional.
Well, the problem is that as soon as you centralize, you introduce a single point of failure, which is a no-no if you're looking for as pure a distributed system as you can get (distributed systems have their flaws, but single-point-of-failure systems have been worked past at this point; the drawbacks of distributed systems are generally expressed in terms of the number of faulty/byzantine nodes tolerated).
It is definitely true that if the cluster completely collapses, service discovery won't work anyway. But since that is (hopefully) very rare, the thinking here is: what if your centralized cassandra cluster fails? You would need to replicate everything to something else, and once you start preparing for those kinds of failures, you're already building a distributed system.
NOTE: I am assuming here that you mean ONE machine running cassandra... if you mean multiple, then the stuff below doesn't really apply, if cassandra handles dynamic node changes well... but still, why not abstract? Why not make EVERY service you're running (app/db/cache/app2/utility) know about dynamic changes to the architecture?
What do you mean by "losing your data store?" -- from what I understand, a consul agent runs on every machine and EVERY consul agent has an LMDB instance. If you mean losing your data store as in losing the service that provides your actual application data -- that would be the point of automatically discovering services, you could just arbitrarily add nodes that do the "db" service, and your nodes that run "app" would automatically know more "db"s showed up.
Forgive me if this is unnecessary explanation, but:
To illustrate this -- let's say I have 3 servers, 2 are running instances of the app (5 instances each) and 1 big-RAM machine is running the DB. All 10 instances are relying on that DB to not go down. While there are many very very capable & reliable DBs out there (cassandra, postgres, etc), it's dangerous to assume they will not fail.
However, the problem is: how do you just add nodes? You're going to either need to change app code, change some env variables, or do some other kind of monkey patching to let some of the app processes (there are 10 of them) know which DB to use. Also, if you look at just the problem of adding instances of app processes for more load balancing, there are various static-y files that possibly need to change to accommodate them (nginx/apache config, env variables, etc).
Again, someone correct me if I'm wrong, but this is where Consul comes in. If app server 1 knows about at LEAST 1 of the DB clusters, you can easily add more DB clusters, and ask Consul about them. So, if one DB has gone down, and you have consul-aware code in place, consul can tell your app instances where to get their database data.
There are some DBs that make this really easy to do (spin up more DBs that can act as masters, or just read-only backups, or whatever) -- rethinkdb is one of them (http://rethinkdb.com/)... It has a really good web interface that makes adding and managing cluster nodes as easy as starting up a rethinkdb service with some extra options telling it where the master is. However, cassandra seems like it doesn't really handle dynamic node creation (I'm going off this page: http://www.datastax.com/docs/0.8/install/cluster_init). Even if it does, the case for an abstracted, dynamic service discovery layer still stands (cassandra might be OK, but what about when you want to know about service x?)
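To make the "consul-aware code" idea concrete, here's a hedged sketch of what the app side might do: given a list of service instances annotated with health status (the field names and addresses here are purely illustrative, loosely shaped like a health-check query result), skip the DB that went down and pick a healthy one.

```python
# Illustrative sketch: choosing a healthy "db" instance from a
# health-annotated instance list, as consul-aware app code might.
# The data below is a stand-in for a real health query response.
import random

def pick_healthy(instances):
    """Return the address of a randomly chosen passing instance."""
    healthy = [i for i in instances if i["status"] == "passing"]
    if not healthy:
        raise RuntimeError("no healthy 'db' instances available")
    return random.choice(healthy)["address"]

instances = [
    {"address": "10.0.0.5:5432", "status": "passing"},
    {"address": "10.0.0.6:5432", "status": "critical"},  # the DB that went down
    {"address": "10.0.0.7:5432", "status": "passing"},
]

addr = pick_healthy(instances)   # never returns the critical node
```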
We compare Consul to ZooKeeper here, but much of that applies to Cassandra as well: http://www.consul.io/intro/vs/zookeeper.html
Internally, Consul could also use something like Cassandra to store the data. However, we use LMDB, an embedded database, to avoid an expensive context switch out of the process, serving requests with lower latency and higher throughput.
"... However, Serf does not provide any high-level features such as service discovery..."
Hm...
If your application relies on memcached, you need to pass the memcached location to your application somehow. For simple architectures, this may just be a hardcoded localhost:11211.
As you scale, it becomes prudent to distribute services across different servers. Your configuration could then become something like "server1.mycompany.com:11211". But what if memcached moves from server1 to server2? You'll need to reconfigure and restart your application.
More sophisticated apps will often use a dynamic approach: services are registered with something like ZooKeeper or etcd. When serviceA needs to talk to serviceB, serviceA looks up serviceB's address in the service registry (or a local cache) and makes the request.
The good news is that these often include basic health check functionality, so you get a bit of fault tolerance for free. Unfortunately, this requires services to integrate directly with ZooKeeper or etcd, adding undesired complexity.
Some architectures therefore choose to use DNS as their service registry. But instead of hardcoding the DNS address of a single node (like "server1.mycompany.com"), they hit an address associated with the service (serviceB.mycompany.com). This usually means rolling your own system to keep DNS up to date (adding/pruning records based on health state).
Consul is a hybrid approach. It allows you to use DNS as a service registry, but operates as its own, distributed DNS server cluster. Think of it like a specialized ZooKeeper cluster that exposes service information via DNS (and HTTP, if you prefer).
Back to the memcached case. With Consul, you'd point your app at "memcached.consul:11211". If your memcached server fell over and was replaced, Consul would pick up the change and return the new address. And without any app config changes or restarts.
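The mechanics of that failover can be sketched in a few lines. This is a toy simulation, not real DNS: the dict stands in for the DNS-based registry, and the point is simply that an app which resolves the service name freshly on each connection (rather than caching the first answer) picks up the replacement address with no config change or restart.

```python
# Toy model of the memcached failover story: resolve on every connect,
# and the replacement server's address is picked up automatically.
registry = {"memcached.consul": "10.0.0.1:11211"}   # stand-in for DNS

def connect(name, resolve):
    addr = resolve(name)   # fresh lookup each time, no app-level caching
    return addr            # a real app would open a socket to addr here

resolve = lambda name: registry[name]

first = connect("memcached.consul", resolve)     # old address
registry["memcached.consul"] = "10.0.0.2:11211"  # old box dies, new one registers
second = connect("memcached.consul", resolve)    # new address, no restart
```

The failure mode the airbnb comment above warns about is exactly the opposite pattern: if the app caches the first answer for the life of the process, `second` would still be the dead address.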
From what I can tell, Consul supports two registration mechanisms: statically defined services in /etc/consul.d, and dynamically defined services through the HTTP API.
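For the dynamic path, a sketch of registering a service with the local agent over HTTP. The payload shape (capitalized keys, agent on port 8500) follows my reading of the agent API docs and should be treated as an assumption; the PUT is wrapped in a function so the sketch runs without a live agent.

```python
# Sketch: dynamic service registration against the local Consul agent.
# build_registration constructs the payload; register would PUT it to
# the agent's HTTP API (requires a running agent, so it isn't called).
import json
import urllib.request

def build_registration(name, port, tags=()):
    """Assemble a registration payload (field names are an assumption)."""
    return {"Name": name, "Port": port, "Tags": list(tags)}

def register(payload, agent="http://127.0.0.1:8500"):
    req = urllib.request.Request(
        agent + "/v1/agent/service/register",
        data=json.dumps(payload).encode(),
        method="PUT",
    )
    return urllib.request.urlopen(req)   # needs a live agent

payload = build_registration("web", 80, tags=["rails"])
```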
For the statically-defined case, for any given node, you have to create Puppet (or Chef, or whatever) definitions that populate /etc/consul.d with the stuff that's going to run on that node. For the actual configuration itself, you still want Puppet to be the one to populate it. The question then is what you gain by doing this; if that configuration goes into Puppet, then Puppet is still the main truth where you want to centralize things, so then you have this flow of data:
client <- DNS <- Consul <- /etc/consul.d <- Puppet
...compared to the "old" way: client <- /srv/myapp/myapp.conf-or-whatever <- Puppet
In this case, Consul's benefit comes from the fact that it can know which services are alive and which are not, so that when myapp needs otherapp, it doesn't need a load-balancer to figure that out. The documentation makes a point about Puppet updates being slow and unsynchronized, and it's true that you get into situations where, for example, service A is configured with hosts that aren't up yet. With Consul you can update the config "live"; sure, you want to centralize config in Puppet and populate Consul's K/V from Puppet, and then you get the single-point-of-update synchronization missing from Puppet, but you still need to store the truth in Puppet.
So I'm counting two good, but not altogether mind-blowing benefits from using Consul with Puppet, over not using Consul at all. The overlap is looking a lot like two systems vaguely competing for dominance.
I suspect the better use of Consul is in conjunction with something like Docker, where you ditch Puppet altogether (except as a way to update the host OS), and instead build images of apps and services that don't contain any configuration at all, but simply point themselves at Consul. That means that when you bring up a new Docker container, it can start its Consul agent, register its services, and suddenly its contained services are dynamically available to the whole cluster.
The container itself contains no config, no context, just general-purpose application/service code; and Consul doesn't need to be populated through Puppet because in that way, Consul is (in conjunction with some container provisioning system) the application world's Puppet.
That, to me, sounds pretty nice.
What I will say, in my usually derisive fashion, is I can't tell why the majority of businesses would need decentralized network services like this. If you own your network, and you own all the resources in your network, and you control how they operate, I can't think of a good reason you would need services like this, other than a generalized want for dynamic scaling of a service provider (which doesn't really work without your application being designed for it, or an intermediary/backend application designed for it).
Load balancing an increase of requests by incrementally adding resources is what most people want when they say they want to scale. You don't need decentralized services to provide this. What do decentralized services provide, then? "Resilience". In the face of a random failure of a node or service, another one can take its place. Which is also accomplished with either network or application central load balancing. What you don't get [inherently] from decentralized services is load balancing; sending new requests to some poor additional peer simply swamps it. To distribute the load amongst all the available nodes, now you need a DHT or similar, and take a slight penalty from the efficiency of the algorithm's misses/hits.
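The "DHT or similar" the parent mentions can be sketched minimally with rendezvous (highest-random-weight) hashing: every peer independently computes the same owner for a key, so load spreads across nodes without a central balancer, and when a node leaves, only that node's keys move. The node addresses and keys here are made up for illustration.

```python
# Minimal rendezvous-hashing sketch: decentralized peers agree on
# request placement with no coordination, and a node's departure
# moves only the keys it owned.
import hashlib

def owner(key, nodes):
    """Every peer computes the same winner for a given key."""
    def weight(node):
        h = hashlib.sha256(f"{node}:{key}".encode()).hexdigest()
        return int(h, 16)
    return max(nodes, key=weight)

nodes = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
keys = ("user:1", "user:2", "user:3", "user:4", "user:5")
placement = {k: owner(k, nodes) for k in keys}

# Remove one node: only its keys get new owners, the rest stay put.
survivors = [n for n in nodes if n != "10.0.0.2"]
moved = {k for k in keys if placement[k] != owner(k, survivors)}
```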
All the features that tools like this provide -- a replicated key/value store, health checks, auto discovery, network event triggers, service discovery, etc -- can be found in tools that work based on centralized services, while remaining scalable. I guess my point is: before you run off to your boss waving an iPad with Consul's website on it, demanding to implement this new technology, try to see if you actually need it, or if you just think it's really cool.
It's also kind of scary that the ability of an entire network like Consul's to function depends on minimum numbers of nodes, quorums, leaders, etc. If you believe the claims that the distributed network is inherently more robust than a centralized one, you might not build it with fault-tolerant hardware or monitor them adequately, resulting in a wild goose chase where you try to determine if your app failures are due to the app server, the network, or one piece of hardware that the network is randomly hopping between. Could a bad switch port cause a leader to provide false consensus in the network? Could the writes on one node basically never propagate to its peers due to similar issues? How could you tell where the failure was if no health checks show red flags? And is there logging of the inconsistent data/states?
I want to clarify: Of all the buzz words Consul has, one thing Consul ISN'T is decentralized. You must run at least one Consul server in a cluster. If you want a fully centralized approach, you can just run one server. No big deal. Of course, if that server goes down, reads/writes are unavailable. If you want high availability, you run multiple servers. They leader elect to determine who will handle the writes but that is about it.
It is "decentralized" in that you can send read/writes to any server, but those servers actually just forward the requests onto the leader.
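That forwarding behavior can be modeled in a few lines. This is a toy, not Consul's implementation: three servers, an already-elected leader, and followers that accept writes from clients but hand them to the leader rather than applying them locally.

```python
# Toy model of "any server accepts the request, but writes are
# forwarded to the elected leader" -- only the leader's log grows.
class Server:
    def __init__(self, name, cluster):
        self.name = name
        self.cluster = cluster
        self.log = []

    def handle_write(self, entry):
        leader = self.cluster.leader
        if self is not leader:
            return leader.handle_write(entry)   # forward, don't apply locally
        self.log.append(entry)                  # only the leader applies writes
        return self.name

class Cluster:
    def __init__(self, size=3):
        self.servers = [Server(f"s{i}", self) for i in range(size)]
        self.leader = self.servers[0]   # assume an election already ran

cluster = Cluster()
# Client sends a write to a follower; it ends up handled by the leader.
handled_by = cluster.servers[2].handle_write("set x=1")
```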
Now that I've re-read your architecture page, let me see if I understand this: the basic point behind using Consul is to have multiple servers agree on the result of a request, communicate that agreement to a single node which writes it, and then return it to the client. So really it's a fault-tolerant messaging platform that includes features which take advantage of such a network; do I have that right?
Also, your docs say there are between three and five servers, but here you're saying you only need one?
The leader is elected using Raft as the consensus algorithm (it is linked in the internals section of the docs).
You are correct though that if you run multiple servers, the leader election will happen automatically. There is no way to manually override it.