Also you can't tell a story just with the Aesop ending - you have to tell the fable and END with that line.
By asking both super-microscopic stuff and stuff way out in space, you can now find out what time it is!
Remember... your CPU can halt at any time for any number of milliseconds. That means simple things like:

    upperBound, lowerBound = readTime()
    if (upperBound < deadline)
        do_stuff(x, y, z)

are incorrect. There is no guarantee that the 'if' statement didn't take many milliseconds, and that the stuff didn't end up happening after the deadline.

It's also very easy to write code that works, but is theoretically wrong. You will leave a hidden bug that may only rear its head years down the line.
edit: well, after reading Spivak's comment[1] a bit, I guess it does provide strict upper bounds. May be useful to reduce how often you need to use fallback behaviors / get more byzantine. Though I'm not yet sure how to turn that into something useful. No doubt there are some though.
lowerBound, upperBound = readTime()
often without noticing my error.

    def must_complete_before(func, deadline):
        result = func()
        # time.now() here is pseudocode for a bounded clock read returning (lower, upper)
        lower, upper = time.now()
        if upper < deadline:
            return result
        else:
            raise MissedDeadlineException("Can't guarantee func completed before deadline")

The APIs provided by TrueTime and Time Sync are useful to compare two events, each with their own uncertainty intervals. Then you can be sure if any event "happened before" the other, or if they're concurrent.
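To make the interval comparison concrete, here is a minimal sketch. The `TimeInterval` type and the `compare` function are hypothetical names for illustration, not the actual TrueTime or Time Sync API; the only assumption is that each clock read yields an earliest and a latest bound on the true time.

```python
from dataclasses import dataclass
from enum import Enum


class Order(Enum):
    BEFORE = "before"
    AFTER = "after"
    CONCURRENT = "concurrent"


@dataclass(frozen=True)
class TimeInterval:
    """Hypothetical clock reading: the true time lies in [earliest, latest]."""
    earliest: float
    latest: float


def compare(a: TimeInterval, b: TimeInterval) -> Order:
    """Order two events by their uncertainty intervals."""
    if a.latest < b.earliest:
        # a's whole interval ends before b's begins: a definitely came first.
        return Order.BEFORE
    if b.latest < a.earliest:
        # b's whole interval ends before a's begins: b definitely came first.
        return Order.AFTER
    # The intervals overlap, so neither order can be proven.
    return Order.CONCURRENT
```

Overlapping intervals are treated as concurrent rather than guessed at, which is why tighter clock bounds translate directly into fewer "can't tell" answers.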
syncTime()
And they take care of the mess.
I know this is a somewhat simplified story, but it does make me chuckle.
However, no one is looking at Map Reduce type jobs as a replacement for a database and vice versa. That's like saying "wow linkedin made kafka why do we need a webserver too". Those two technologies are only related in the loosest sense.
Yes, I specifically mentioned non-Google users adopting Hadoop, since it encompassed both a MapReduce implementation and supporting infrastructure.
Once on the bandwagon inspired by the MapReduce paper, many orgs didn't just use MapReduce itself for parallelized batch analytic purposes, but also HBase and Hive and other stuff with actual longer term state atop HDFS, YARN, etc.
> However, no one is looking at Map Reduce type jobs as a replacement for a database and vice versa.
The marketing and sales teams of Hortonworks, Cloudera, etc. certainly sold Hadoop platforms, related Apache projects, and "MapReduce" (as a broad brand name for all this, not the specific technical concept) as replacements for databases, broadly speaking. It's that culture that was a bit shocked when Spanner was unveiled.
In terms of popular nosql vs google sql products, it's more Hadoop : BigQuery :: Mongo? : Spanner
You're pretty explicitly not supposed to run OLAP queries on Spanner.
TrueTime helps with things like Spanner transactions. It's just a totally different use case.
MapReduce was used at Google for highly inappropriate things. For example, the machine learning system I worked on, Sibyl https://www.datanami.com/2014/07/17/inside-sibyl-googles-mas... was implemented using MapReduce, but there was no real technical justification for that; it's just that there was no other system that could scale to the volumes required or handle the constant failures endemic to Google's internal systems. It ended up requiring all sorts of heroic work to make MR scale, for example map-side combiners (which "reduced" items with common keys in the map output before it got flushed to the shuffle files). All of this got replaced with TensorFlow and only the good bits of Sibyl were extracted to TFX.
* slight exaggerations, I know
MapReduce predates TrueTime by a decade or more. MR was critical to scaling internet systems at the time it was released.
However, Flume + Spanner was a much nicer system to work with than MR + GFS, I'll give you that.
* It's not that hard to get your own "world class" time server, for under a thousand. A Rb standard slaved to a GPSDO is going to be very accurate and stable; use that to drive an SBC that supports IEEE 1588, where you run your NTP and PTP server. Oh, but I guess that box, while inexpensive, isn't in Amazon's DC, so it doesn't help you.
* PTP's absence in the Amazon Time Sync Service article is quite conspicuous!
Even on local networks, NTP can only get you so close. If you set up chrony just so, in ideal conditions, I've gotten hundreds of microseconds (more commonly ~500-1000us). But combined with PTP, you can get sub-microsecond accuracy.
If you ask for the time and get A, and then ask for the time again and get B, then TrueTime guarantees that A is less than B.
Obviously, in a distributed setting this is much easier to do if you have accurately sync'ed clocks, but that accuracy goes to reducing the uncertainty in the time (and hence making TrueTime faster) rather than providing the ordering guarantee itself.
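The comment above doesn't say why less uncertainty makes things faster. One mechanism, described in the Spanner paper, is commit wait: pick a timestamp, then hold the commit until that timestamp is definitely in the past. The sketch below is an illustration under assumptions, not Google's implementation; `BoundedClock`, its fixed `epsilon`, and `commit_wait` are hypothetical names.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    earliest: float
    latest: float


class BoundedClock:
    """Hypothetical clock with a fixed uncertainty of +/- epsilon seconds."""

    def __init__(self, epsilon, source=time.monotonic):
        self.epsilon = epsilon
        self._source = source

    def now(self):
        t = self._source()
        return Interval(t - self.epsilon, t + self.epsilon)


def commit_wait(clock, sleep=time.sleep):
    """Choose a commit timestamp, then block until it has definitely passed."""
    commit_ts = clock.now().latest
    # Wait until even the earliest possible current time is past commit_ts.
    while clock.now().earliest <= commit_ts:
        sleep(max(clock.epsilon / 4, 1e-4))
    return commit_ts
```

With this toy clock the wait is roughly twice `epsilon`, so halving the clock uncertainty roughly halves the time each commit spends waiting.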
"Please use the original title, unless it is misleading or linkbait; don't editorialize."
> It uses a fleet of redundant satellite-connected and atomic clocks in each Region to deliver time derived from these highly accurate reference clocks.
https://aws.amazon.com/blogs/mt/manage-amazon-ec2-instance-c...
I can't find any instance of the word "bound" in the GCP or Google NTP docs.
[1]: https://developers.google.com/time/guides#google_compute_eng...
[2]: https://cloud.google.com/compute/docs/instances/managing-ins...
Obviously it's all a matter of reverse engineering rather than documented APIs.
Today they released a new OSS daemon and library for it.
[1] https://aws.amazon.com/about-aws/whats-new/2017/11/introduci...
I really wish they made more of that stuff publicly available.