mverwijs7y ago

Being in a similar situation, what do you consider "reasonably large"?

FridgeSeal7y ago

250-300GB.

Not large by absolute standards sure, but large enough to cause issues.

I’m sure there’s some kind of solution that involves re-architecting the ES cluster and indices and re-architecting the data flows and stuff. But if our options are go through all that, or seriously slim down our architecture and costs by just running Sonic + our data warehouse, I’m definitely going to give it a go. After all, worst comes to worst we can go down the re-architecting ES route if Sonic doesn’t work out.

¯\_(ツ)_/¯

Xylakant7y ago

I’d be curious what your expectations and constraints are, but from my experience of running clusters in the double-digit TB range, my ballpark figure for that amount of data would be 2 medium-size data nodes and a small tiebreaker. Alternatively, if you can live with the reduced resilience and availability, even a single node might just do. It depends on the expected churn though; ES really does not like document updates.

cinbun87y ago

That does not sound like a good idea. You can't even maintain a quorum of 2 replicas with n=3 on a cluster like that. Losing one data node would be disastrous.

Xylakant7y ago

That’s really not how ES replicas work. The quorum is formed on the master-eligible nodes (hence a tie-breaker) and is only required to elect a master. The elected master designates a primary shard and as many replicas as you configure. However, replica shards are just replicas and may lag. There’s no read quorum or reconciliation or anything happening. If a primary fails, an (in-sync, depending on the version of ES) replica is auto-promoted. The master keeps track of in-sync replicas and you can request that writes are on a number of replicas before a write returns, but still, no true quorum.

You can absolutely run 2 data/master-eligible nodes plus a single master-eligible tie-breaker node as a safe configuration. The only constraint is that you should have an odd number of master-eligible nodes to avoid a split brain. You also need to understand what the resilience guarantees are for any given number of replicas (roughly: each replica allows for the loss of a single random node) and how many replicas you can allocate on a given cluster (at most one copy of a shard per data node). That would allow you to run a 2-data-node cluster in a configuration that survives the loss of one node.
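As a rough sketch of the replica arithmetic above: the function names are made up for illustration and are not part of any Elasticsearch API, and the model only assumes that a shard never has two copies on the same data node.

```python
def allocated_copies(data_nodes: int, replicas: int) -> int:
    """Copies of each shard that can actually be assigned.

    One primary plus the configured replicas, capped at one copy per
    data node (assumption: two copies of a shard never share a node).
    """
    return min(1 + replicas, data_nodes)


def retains_all_data(data_nodes: int, replicas: int, lost_nodes: int) -> bool:
    """True if every shard still has a copy after losing `lost_nodes`
    arbitrary data nodes (worst case: the lost nodes all held copies
    of the same shard)."""
    return lost_nodes < allocated_copies(data_nodes, replicas)


if __name__ == "__main__":
    # 2 data nodes, 1 replica: one node may fail without data loss.
    print(retains_all_data(data_nodes=2, replicas=1, lost_nodes=1))
    # Asking for 3 replicas on 2 data nodes does not help: only 2 copies fit.
    print(allocated_copies(data_nodes=2, replicas=3))
```

With 2 data nodes and 1 replica, each node ends up holding a full copy of the data, which is why the loss of either one is survivable in this model.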

fnordsensei7y ago

From what I've learned, running any cluster on fewer than four nodes is not really recommended.

Xylakant7y ago

I’ve run quite a few clusters on such a configuration, or alternatively on 3 data/master-eligible nodes. It’s a safe configuration unless you manage to overload the elected master. But if you’re fighting that issue, you’ll have to go beyond 4 nodes and have a triplet of dedicated master-eligible nodes plus whatever data nodes you need.

I pretty much specifically avoid 4-node clusters. You’d have to either designate 3 of the 4 nodes as master-eligible with a quorum of 2, or have all of them master-eligible with a quorum of 3. Both options allow for the failure of only a single node before the cluster becomes unavailable. Any other configuration would either fail immediately on a node loss (quorum of 4) or be unsafe (quorum of 2, which allows for split-brain).

I’d much rather opt for 4 data/master-eligible nodes plus a dedicated master-eligible node with a quorum of 3.
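The quorum options above can be checked with a few lines of arithmetic. This is a minimal sketch, not Elasticsearch code: `QuorumOption` is a hypothetical helper, and it assumes a manually configured quorum (in Elasticsearch versions before 7.0 this was the `discovery.zen.minimum_master_nodes` setting).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuorumOption:
    """A hypothetical model of one master-election configuration."""
    master_eligible: int  # number of master-eligible nodes
    quorum: int           # nodes that must agree to elect a master

    @property
    def split_brain_safe(self) -> bool:
        # Two disjoint groups can each reach the quorum only if
        # 2 * quorum <= master_eligible, so safety needs a strict majority.
        return 2 * self.quorum > self.master_eligible

    @property
    def tolerated_failures(self) -> int:
        # Master-eligible nodes that may fail while a quorum is still reachable.
        return self.master_eligible - self.quorum


if __name__ == "__main__":
    options = [
        QuorumOption(3, 2),  # 3 of 4 nodes master-eligible
        QuorumOption(4, 3),  # all 4 master-eligible
        QuorumOption(4, 4),  # fails on any node loss
        QuorumOption(4, 2),  # split-brain possible
        QuorumOption(5, 3),  # 4 data nodes plus a dedicated tie-breaker
    ]
    for opt in options:
        print(opt, opt.split_brain_safe, opt.tolerated_failures)
```

In this model the two safe 4-node options each tolerate one failure, the same as a 3-node setup, while the 5-node option tolerates two.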

You also need to pick the number of replicas suitably: each replica allows for the loss of a single random(!) data node while retaining all your data. Note that if losses are not random, and you instead want to safeguard against the loss of a rack or an availability zone or such, configurations are possible that distribute primaries and replicas suitably (“keep a full copy on either side”).
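For the rack or zone case, here is a sketch of what such a configuration might look like using Elasticsearch's shard allocation awareness settings. It assumes every data node was started with a custom attribute `node.attr.zone`, and the zone names `zone-a` and `zone-b` are placeholders. The Python below only builds the two request bodies (for the cluster settings API and the index settings API); treat it as illustrative rather than a tested production config.

```python
import json

# Hypothetical zone names; each node would carry node.attr.zone: zone-a or zone-b.
ZONES = ["zone-a", "zone-b"]

# Body for PUT _cluster/settings: spread shard copies across zones, and
# (forced awareness) never pile all copies into one zone if the other is down.
cluster_settings = {
    "persistent": {
        "cluster.routing.allocation.awareness.attributes": "zone",
        "cluster.routing.allocation.awareness.force.zone.values": ",".join(ZONES),
    }
}

# Body for PUT <index>/_settings: one replica, so primary plus replica
# gives one full copy per zone when there are two zones.
index_settings = {"index": {"number_of_replicas": len(ZONES) - 1}}

if __name__ == "__main__":
    print(json.dumps(cluster_settings, indent=2))
    print(json.dumps(index_settings, indent=2))
```

With two zones and one replica, awareness places the primary and its replica in different zones, which is the “keep a full copy on either side” layout.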

FridgeSeal7y ago

Is that double digit TB on ElasticSearch?

Xylakant7y ago

Yes, certainly.
