I say this as someone who got burned hard by weird bugs while using Hazelcast 2.x as a distributed lock manager. After that experience, I'll have a hard think before adopting any part of the Hazelcast ecosystem in the future. When the analysis of Hazelcast 3.x was posted on jepsen.io (https://jepsen.io/analyses/hazelcast-3-8-3) I had a good laugh, because we had already seen a number of the issues it exposed in production on older versions: locks claimed on both sides of a cluster partition, locks never getting released when a node crashed while running, memory leaks, etc. In the end, we had the option of upgrading to 3.x or dumping it entirely in favor of ZooKeeper + Curator. We chose the latter and haven't had a single issue with our locking system since, and nobody has gotten paged in the middle of the night because of a ZooKeeper issue.
After that experience, I'll take every guarantee made by Hazelcast with a giant grain of salt. I've heard good things about later versions, so I'm going to assume things have improved, but I implore people to look very closely at solutions like these and, in particular, at the guarantees they make before picking any of them.
1. Re-implemented concurrency primitives on top of the Raft protocol. This includes Distributed Locks, Semaphores, AtomicLong, etc. Raft provides linearizability, and that's what you usually want for concurrency primitives. See our epic blog post about locking: https://hazelcast.com/blog/long-live-distributed-locks/ or our Jepsen testing story: https://hazelcast.com/blog/testing-the-cp-subsystem-with-jep...
2. Added a FlakeID generator. This is on the opposite side of the consistency spectrum - it's a k-ordered Available (wrt CAP) ID generator. It won't generate duplicates even when there is a split-brain. See: https://docs.hazelcast.org/docs/4.0.2/manual/html-single/ind...
3. PNCounter - CRDT-based eventually consistent data structure, suitable for .. well, counting things:) See: https://en.wikipedia.org/wiki/Conflict-free_replicated_data_...
4. Significantly extended documentation, to be more explicit about Hazelcast replication models and guarantees. The goal is clear: Avoid Surprises. See: https://docs.hazelcast.org/docs/4.0.2/manual/html-single/ind...
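For anyone who hasn't run into CRDT counters before, here is a minimal sketch of the PN-Counter idea from item 3. It is the textbook state-based construction, not Hazelcast's implementation, and the class and method names are made up for illustration.

```python
class PNCounter:
    """Textbook state-based PN-Counter (illustrative, not Hazelcast's code)."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.incs = {}  # replica id -> total increments seen from that replica
        self.decs = {}  # replica id -> total decrements seen from that replica

    def add(self, delta):
        # A replica only ever bumps its own slot, so slots never conflict.
        target = self.incs if delta >= 0 else self.decs
        target[self.replica_id] = target.get(self.replica_id, 0) + abs(delta)

    def value(self):
        return sum(self.incs.values()) - sum(self.decs.values())

    def merge(self, other):
        # Element-wise max is commutative, associative and idempotent,
        # so replicas converge regardless of merge order or repetition.
        for mine, theirs in ((self.incs, other.incs), (self.decs, other.decs)):
            for rid, count in theirs.items():
                mine[rid] = max(mine.get(rid, 0), count)


# Two replicas diverge during a partition, then exchange state.
a, b = PNCounter("a"), PNCounter("b")
a.add(5)
b.add(3)
b.add(-2)
a.merge(b)
b.merge(a)
b.merge(a)  # merging the same state again changes nothing
```

Both replicas accept writes while partitioned and agree on the total once they have merged, which is the "eventually consistent" trade-off described in item 3.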
Disclaimer: Obviously I am biased as I work for Hazelcast.
The whole cloud computing space has me confused. I don't know which horse to bet on and don't have the time to get familiar with every new framework. Is this the new JavaScript world? If so, I'd like to skip the next couple of years until we've found our React equivalent.
edit: Not to be read as an invitation to discuss how React is not the de facto standard of UI web frameworks
It baffles me they're so casual about it ...
None of those things are nefarious, and they don't necessarily provide additional knowledge, as long as care is taken to fully anonymize the data, to fuzz the start and end locations of trips, and to avoid associating trips with one another.
People agree to provide this information to services like Waze, etc. for exactly these tasks.
It’s strange to me that people read something like this and infer the absolute inverse of the actual situation. That is definitely a “thinking fast” reaction.
https://jet-start.sh/blog/2020/06/09/jdk-gc-benchmarks-part1
The point is that Jet can track several million distinct keys, even on a single machine, and finding velocity vectors boils down to a sliding-window linear regression over two floating-point variables.
If your concern is why you would specifically want to track locations, the answer is that there are plenty of location-based apps that track locations with the user's consent.
By user consent, you mean someone clicked a button without thinking to get to the app?
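To make the regression part concrete, here is a rough sketch in plain Python. It assumes the two variables are the x and y position coordinates and that each key keeps its own time window; `VelocityTracker` and `slope` are hypothetical names, and none of this is Jet's API.

```python
from collections import deque


def slope(ts, vs):
    """Least-squares slope of the values vs against the timestamps ts."""
    n = len(ts)
    mean_t = sum(ts) / n
    mean_v = sum(vs) / n
    var_t = sum((t - mean_t) ** 2 for t in ts)
    if var_t == 0:
        return 0.0  # a single point (or identical timestamps) has no slope
    cov = sum((t - mean_t) * (v - mean_v) for t, v in zip(ts, vs))
    return cov / var_t


class VelocityTracker:
    """Keeps a sliding time window of positions per key (hypothetical helper)."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.points = {}  # key -> deque of (t, x, y), oldest first

    def update(self, key, t, x, y):
        window = self.points.setdefault(key, deque())
        window.append((t, x, y))
        # Evict samples that have fallen out of the time window.
        while window[0][0] <= t - self.window_seconds:
            window.popleft()
        ts = [p[0] for p in window]
        vx = slope(ts, [p[1] for p in window])
        vy = slope(ts, [p[2] for p in window])
        return vx, vy


tracker = VelocityTracker(window_seconds=10.0)
for t in range(5):
    # Object moving 2 units/s along x and -1 unit/s along y.
    vx, vy = tracker.update("car-1", float(t), 2.0 * t, 1.0 - t)
```

After the loop, `(vx, vy)` comes out at approximately `(2.0, -1.0)`. A real streaming engine would presumably maintain the running sums incrementally rather than recomputing them on every event, but the per-key state is still only a handful of numbers, which is why millions of keys are feasible.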
Besides, I think this statement is just meant to give a sense of the kind of processing that can be done, and the scale it can reach.
So, more comparable to Apache Beam, like a fancy ETL. Programming via pipes, transformations, etc.
It would hook into a Kafka (or other) stream.
The license itself is similar to the licenses from Confluent, Elastic among many others. You can read more about it here: https://hazelcast.org/blog/announcing-the-hazelcast-communit...
Beam is just an API layer with different backing implementations. But you don't typically use Beam to work with Jet, instead you use its own Pipeline API which is mostly like Java Streams. Jet will also soon get an SQL API.
- Flink uses ZooKeeper for metadata and coordination, Jet doesn't require any external systems for resilience.
- Flink uses RocksDB and HDFS for checkpointing/snapshotting, Jet stores snapshots in a distributed, replicated in-memory store.
- Flink allocates operators to slots, while Jet uses green threads/cooperative multi-threading. This means you can run many concurrent streaming jobs on the same cluster, with very low overhead.
- Jet is basically a single, self-contained JAR. It's all you need to run a production-grade service (+ some connectors, if you'd like)
- Jet can scale up/down with very little friction. You start a couple of processes and they will form a cluster automatically. Kill a couple of the processes, and the cluster goes on.
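To illustrate what cooperative multi-threading means in the list above, here is a toy sketch in Python. It only shows the general idea (each task does a small slice of work and then hands the thread back); it is not Jet's execution engine, and `counting_job` and `run_cooperatively` are hypothetical names.

```python
from collections import deque


def counting_job(name, items, log):
    """A hypothetical tasklet: handles one item, then yields control."""
    for item in items:
        log.append((name, item))
        yield  # hand the thread back to the scheduler


def run_cooperatively(tasklets):
    """Round-robin over tasklets on the calling thread until all finish."""
    ready = deque(tasklets)
    while ready:
        tasklet = ready.popleft()
        try:
            next(tasklet)          # run one small slice of work
            ready.append(tasklet)  # not done yet, so queue it again
        except StopIteration:
            pass                   # tasklet finished, drop it


log = []
run_cooperatively([
    counting_job("alice", [1, 2, 3], log),
    counting_job("bob", [10, 20], log),
])
```

The two jobs interleave on a single thread without either one owning a slot. The catch with any cooperative scheme is that a task which blocks or hogs its slice stalls everything else on that thread.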
That said, Flink has a great set of overall features, especially around persistence and huge states. This is another area we're currently investing in, along with SQL support.
How does the shift to cooperative multi-threading change the way that the cluster is used? In the "slot" approach, Alice and Bob can run concurrent jobs with relatively little coordination needed to "share" effectively -- e.g. they might use different branches of the same shared repo. In exchange for the lower overhead, does Jet's approach require that multiple use cases are more carefully planned?