Amazon Time Sync Service

(aws.amazon.com)

128 points · sizediterable · 4y ago · 88 comments

43 comments · 11 top-level

strimp099 · 4y ago · 8 in thread

Can someone please explain to me what this is all about like I’m five years old?

foota · 4y ago

The combination of a time and a guaranteed error bound can be used as a primitive in distributed systems. You can take timestamps from requests coming from different systems and relate them in a meaningful way, at least in a fashion that allows false negatives: sometimes the answer is "can't tell", but it is never a wrong ordering. E.g., if A occurs at T + 100ms with 1ms uncertainty and B occurs at T + 105ms with 2ms uncertainty, you can conclude that A happened before B, because A's latest possible time (T + 101ms) is earlier than B's earliest possible time (T + 103ms). This does not always let you establish an order (e.g., if the intervals overlap), but that isn't always necessary. For instance, if you are B and you want something to occur after some time A with uncertainty A_uncertainty, you can sleep until your clock reads a time B with uncertainty B_uncertainty such that B - B_uncertainty > A + A_uncertainty.
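A minimal Python sketch of the two rules above, using the numbers from the example. `Reading`, `definitely_before`, and `safe_to_proceed` are hypothetical names for illustration, not part of any Amazon or Google API; times are in milliseconds relative to T.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    """A timestamp plus a guaranteed error bound (hypothetical type)."""
    time_ms: float         # reported time
    uncertainty_ms: float  # true time lies within +/- this bound

    @property
    def earliest(self) -> float:
        return self.time_ms - self.uncertainty_ms

    @property
    def latest(self) -> float:
        return self.time_ms + self.uncertainty_ms


def definitely_before(a: Reading, b: Reading) -> bool:
    """True only when a's whole interval ends before b's begins.

    False means "cannot tell", not "b came first".
    """
    return a.latest < b.earliest


def safe_to_proceed(a: Reading, b_now: Reading) -> bool:
    """The wait condition: B - B_uncertainty > A + A_uncertainty."""
    return definitely_before(a, b_now)


a = Reading(100.0, 1.0)  # A at T + 100ms, 1ms uncertainty
b = Reading(105.0, 2.0)  # B at T + 105ms, 2ms uncertainty
print(definitely_before(a, b))                    # True: 101 < 103
print(definitely_before(a, Reading(102.0, 2.0)))  # False: intervals overlap
```

Note the asymmetry: a `False` result says nothing about which event came first, which is the false-negative behavior described above.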

gnu8 · 4y ago

None of this has to do with Amazon though. I am very uncomfortable with Amazon (or the others) pretending that they invented precision timekeeping and that it is a special feature only they have because it seems like a lot of people believe it and build systems that rely on an API that is proprietary to a cloud vendor rather than just installing ntp.

1 more reply

bigsparky · 4y ago

You must know some pretty smart five year olds!

1 more reply

vel0city · 4y ago

A person with two watches never knows what time it is.

pugworthy · 4y ago

Nah. They just figure out eventually one of them is kind of wonky and ignore it. Or maybe both are, so they end up just looking at the stove clock.

Also you can't tell a story just with the Aesop ending - you have to tell the fable and END with that line.

1 more reply

teddyh · 4y ago

So what I’m hearing is “Get three or more watches.”.

Groxx · 4y ago

Time is hard.

By asking both super-microscopic stuff and stuff way out in space, you can now find out what time it is!

bogomipz · 4y ago

One is cesium? What's the other stuff, way out in space?

6 more replies

londons_explore · 4y ago · 7 in thread

It's really hard to use these APIs correctly.

Remember... your CPU can halt at any time for any number of milliseconds. That means simple things like:

    upperBound, lowerBound = readTime()
    if (upperBound<deadline)
      do_stuff(x, y, z)

Are incorrect... There is no guarantee that the 'if' statement didn't take many milliseconds, and that the stuff didn't end up happening after the deadline.

It's also very easy to write code that works, but is theoretically wrong. You will leave a hidden bug that may only rear its head years down the line.
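To make the failure mode concrete, here is a small Python simulation of the check-then-act gap. `FakeClock` and `check_then_act` are made-up names, and the stall is injected by hand rather than caused by a real scheduler; this is only a sketch of the race, not of any real time API.

```python
from typing import Optional, Tuple


class FakeClock:
    """Deterministic stand-in for a clock-bound API (hypothetical)."""

    def __init__(self, now_ms: float, error_ms: float) -> None:
        self.now_ms = now_ms
        self.error_ms = error_ms

    def read(self) -> Tuple[float, float]:
        """Return (lower_bound, upper_bound) on the true time."""
        return self.now_ms - self.error_ms, self.now_ms + self.error_ms

    def stall(self, ms: float) -> None:
        """Simulate the process being descheduled for `ms` milliseconds."""
        self.now_ms += ms


def check_then_act(clock: FakeClock, deadline_ms: float,
                   stall_ms: float) -> Optional[float]:
    """Return the true time at which the action ran, or None if skipped."""
    lower, upper = clock.read()
    if upper < deadline_ms:
        clock.stall(stall_ms)  # preemption between the check and the action
        return clock.now_ms    # the action runs here
    return None


ran_at = check_then_act(FakeClock(now_ms=90.0, error_ms=1.0),
                        deadline_ms=100.0, stall_ms=25.0)
print(ran_at, ran_at > 100.0)  # 115.0 True: the check passed, the action was late
```

The check itself was correct at the moment it ran; the problem is that nothing ties the moment of the check to the moment of the action.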

Groxx · 4y ago

Yeah, I was thinking about this too. Without a real-time OS on separate hardware, and/or some kind of guaranteed "will not context-switch out of your process for more than X time" + strict coding habits, this seems like at best a practical (>?)99% improvement and at worst snake-oil. Certainly not reliable in any case.

edit: well, after reading Spivak's comment[1] a bit, I guess it does provide strict upper bounds. May be useful to reduce how often you need to use fallback behaviors / get more byzantine. Though I'm not yet sure how to turn that into something useful. No doubt there are some though.

[1] https://news.ycombinator.com/item?id=29103093

karmakaze · 4y ago

Is that really the API? I'm sure I'd be writing:

  lowerBound, upperBound = readTime()

often without noticing my error.

Spivak · 4y ago

I mean it’s not that hard even on non-rt preemptive schedulers. You can’t avoid false negatives, because you could be preempted after func completes but before the time is fetched, but if you get a result it will be valid.

    def must_complete_before(func, deadline):
        result = func()
        # time.now() stands in for a clock-bound read returning (lower, upper)
        lower, upper = time.now()
        if upper < deadline:
            return result
        else:
            raise MissedDeadlineException("Can't guarantee func completed before deadline")
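For anyone who wants to try both outcomes without a real clock, here is a self-contained variant. Passing the clock read in as a `read_bounds` parameter is an assumption made for testability (the snippet above calls a clock directly), and the bounds are hard-coded.

```python
class MissedDeadlineException(Exception):
    """Raised when completion before the deadline cannot be proven."""


def must_complete_before(func, deadline, read_bounds):
    """Run func, then return its result only if it provably finished in time.

    read_bounds is a hypothetical callable returning (lower, upper).
    """
    result = func()
    lower, upper = read_bounds()
    if upper < deadline:
        return result
    raise MissedDeadlineException("Can't guarantee func completed before deadline")


# Finished at 90 +/- 1 against a deadline of 100: provably in time.
print(must_complete_before(lambda: "done", 100.0, lambda: (89.0, 91.0)))

# Finished at 99.5 +/- 1: it may well have been in time, but that can't be
# proven, so it is reported as a miss (the false negative).
try:
    must_complete_before(lambda: "done", 100.0, lambda: (98.5, 100.5))
except MissedDeadlineException as exc:
    print("missed:", exc)
```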

Psyonic · 4y ago

Assuming func() doesn’t have side effects, sure. But if the point was to gate func() to only run before deadline this doesn’t really help.

1 more reply

en4bz · 4y ago

You can use rseq on linux to avoid this.

ergl · 4y ago

To be fair, your example would be unsafe with any kind of time API.

The APIs provided by TrueTime and Time Sync are useful to compare two events, each with their own uncertainty intervals. Then you can be sure if any event "happened before" the other, or if they're concurrent.
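As a rough illustration of that comparison, here is a three-way version in Python. The `Interval` alias, the `Order` enum, and `compare` are hypothetical names, not the TrueTime or Time Sync API.

```python
from enum import Enum
from typing import Tuple

Interval = Tuple[float, float]  # (earliest, latest) bounds on the true time


class Order(Enum):
    BEFORE = "before"
    AFTER = "after"
    CONCURRENT = "concurrent"


def compare(a: Interval, b: Interval) -> Order:
    """Classify event a relative to event b using only their bounds."""
    a_earliest, a_latest = a
    b_earliest, b_latest = b
    if a_latest < b_earliest:
        return Order.BEFORE      # a ended before b could have started
    if b_latest < a_earliest:
        return Order.AFTER       # b ended before a could have started
    return Order.CONCURRENT      # intervals overlap: order is unknowable


print(compare((99.0, 101.0), (103.0, 107.0)))   # Order.BEFORE
print(compare((103.0, 107.0), (99.0, 101.0)))   # Order.AFTER
print(compare((99.0, 104.0), (103.0, 107.0)))   # Order.CONCURRENT
```

"Concurrent" here only means the clocks can't separate the two events; the system still has to decide what to do in that case (retry, wait, or fall back to some other tie-breaker).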

dheera · 4y ago

How about just

    syncTime()

And they take care of the mess

drewda · 4y ago · 7 in thread

Perhaps the biggest "fake-out" in 21st century computing: Google publicly released its MapReduce paper -- directing most of the rest of the industry toward loosely coupled, overly complex distributed data processing systems like Hadoop for the following decade -- but internally they just bought a bunch of atomic clocks and built a distributed RDBMS.

I know this is a somewhat simplified story, but it does make me chuckle.

VirusNewbie · 4y ago

It's rare to see a comment on HN that misunderstands basic distributed systems concepts. MapReduce the paper has nothing to do with a database. You're likely conflating the fact that to achieve fault tolerant distributed computation, hadoop and hadoop like systems use a database like filesystem.

However, no one is looking at Map Reduce type jobs as a replacement for a database and vice versa. That's like saying "wow linkedin made kafka why do we need a webserver too". Those two technologies are only related in the loosest sense.

drewda · 4y ago

> You're likely conflating the fact that to achieve fault tolerant distributed computation, hadoop and hadoop like systems use a database like filesystem.

Yes, I specifically mentioned non-Google users adopting Hadoop, since it encompassed both a MapReduce implementation and supporting infrastructure.

Once on the bandwagon inspired by the MapReduce paper, many orgs didn't just use MapReduce itself for parallelized batch analytic purposes, but also HBase and Hive and other stuff with actual longer term state atop HDFS, YARN, etc.

> However, no one is looking at Map Reduce type jobs as a replacement for a database and vice versa.

The marketing and sales teams of HortonWorks, Cloudera, etc certainly sold Hadoop platforms, related Apache projects, and "MapReduce" (as a broad brand name for all this, not the specific technical concept) as replacements for databases, broadly speaking. It's that culture that was a bit shocked when Spanner was unveiled.

1 more reply

d_watt · 4y ago

Those aren't really equivalent, are they? Hadoop is for analytics, spanner is transactional.

In terms of popular nosql vs google sql products, it's more Hadoop : BigQuery :: Mongo? : Spanner

You're pretty explicitly not supposed to run OLAP queries on spanner.

skj · 4y ago

Google uses map reduce extensively... where it's appropriate.

True time helps with things like spanner transactions. It's just a totally different use case.

dekhn · 4y ago

The tech lead of the Google MapReduce team (which no longer exists) just received their award for turning down mapreduce. IIRC it was officially done 5 years ago. However I believe the code to delete MR was never checked in and I'm not sure if there are still users.

MapReduce was used at Google for highly inappropriate things. For example, the machine learning system I worked on, Sibyl https://www.datanami.com/2014/07/17/inside-sibyl-googles-mas... was implemented using MapReduce, but there was no real technical justification for that; it's just that there was no other system that could scale to the volumes required or handle the constant failures endemic to Google's internal systems. It ended up requiring all sorts of heroic work to make MR scale, for example map-side combiners (which "reduced" items with common keys in the map output before it got flushed to the shuffle files). All of this got replaced with TensorFlow, and only the good bits of Sibyl were extracted to TFX.

4 more replies

drewda · 4y ago

Yes, which is why it's amusing in hindsight that for a decade everyone* outside Google was forcing all* their distributed data tasks into the MapReduce paradigm, without considering alternative approaches like the one used by Spanner.

* slight exaggerations, I know

3 more replies

dekhn · 4y ago

amazing. you literally got everything wrong.

MapReduce predates TrueTime by a decade or more. MR was critical to scaling internet systems at the time it was released.

However, Flume + Spanner was a much nicer system to work with than MR + GFS, I'll give you that.

loxias · 4y ago · 4 in thread

Two observations/questions, the first probably naive:

* It's not that hard to get your own "world class" time server for under a thousand. A Rb standard slaved to a GPSDO is going to be extremely accurate and stable; use that to drive an SBC that supports IEEE 1588, where you run your NTP and PTP server. Oh, but I guess that box, while inexpensive, isn't in Amazon's DC, so it doesn't help you.

* PTP's absence in the Amazon Time Sync Service article is quite conspicuous!

deelowe · 4y ago

Amazon time sync likely also requires extremely accurate clocks on the system board as well to prevent there from being too much drift.

NavinF · 4y ago

You don’t even need a TCXO. If you don’t care about holdover, connecting the PPS pin from a $5 GPS module to a $15 SBC will give you a time source that is significantly better than anything you can serve over NTP to your clients. NTP is always gonna be the weak link for a service like this.

2 more replies

Gelob · 4y ago

Them providing PTP would be really interesting

loxias · 4y ago

To be clear, what I meant was "the fact that they're advertising whiz bang time synchronization, but then not using the correct protocol, makes me question the quality of the whole enterprise".

Even on local networks, NTP can only get you so close. If you set up chrony just so, in ideal conditions, I've gotten hundreds of microseconds (more commonly ~500-1000us). But combined with PTP, you can get sub-microsecond accuracy.
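One reason NTP has a floor on its accuracy is visible in the standard on-wire calculation (RFC 5905): the client assumes the request and reply take equally long, so any asymmetry shows up as offset error, bounded by half the round-trip delay. A small Python sketch; the function name and the timestamps are made up for illustration.

```python
def ntp_offset_and_delay(t0, t1, t2, t3):
    """Standard NTP on-wire calculation.

    t0: client send time    (client clock)
    t1: server receive time (server clock)
    t2: server send time    (server clock)
    t3: client receive time (client clock)
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2
    delay = (t3 - t0) - (t2 - t1)
    return offset, delay


# Made-up exchange, in microseconds. The client is really 10,000 us behind the
# server; the request takes 300 us on the wire, the reply 700 us, and the
# server spends 100 us processing.
offset, delay = ntp_offset_and_delay(t0=0, t1=10_300, t2=10_400, t3=1_100)
print(offset)     # 9800.0: off by 200 us because the paths are asymmetric
print(delay)      # 1000
print(delay / 2)  # 500.0: worst-case error of the offset estimate
```

In this example the estimate is 200us off even though nothing went wrong, which is in the same range as the figures quoted above.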

1 more reply

tyingq · 4y ago · 4 in thread

I'm curious about the title on this submission. I thought TrueTime had unusually strong guarantees about accuracy that don't seem to be called out in what I'm reading on the linked article.

morei · 4y ago

Not quite: Truetime has guarantees about ordering, rather than accuracy.

If you ask for the time and get A, and then ask for the time again and get B, then Truetime guarantees that A is less than B.

Obviously, in a distributed setting this is much easier to do if you have accurately sync'ed clocks, but that accuracy goes to reducing the uncertainty in the time (and hence making truetime faster) rather than providing accuracy.
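A sketch of how uncertainty translates into speed, loosely following the "commit wait" idea from Google's Spanner paper: pick the latest possible current time as the timestamp, then wait until even the earliest possible current time has passed it. `FakeTrueTime` and `commit_wait` are made-up names running against a simulated clock, not the real TrueTime API.

```python
class FakeTrueTime:
    """Simulated clock with a fixed error bound epsilon (hypothetical)."""

    def __init__(self, epsilon_ms: float) -> None:
        self.true_ms = 0.0
        self.epsilon_ms = epsilon_ms

    def now(self):
        """Return (earliest, latest) bounds on the current true time."""
        return self.true_ms - self.epsilon_ms, self.true_ms + self.epsilon_ms

    def sleep(self, ms: float) -> None:
        """Advance simulated time instead of really sleeping."""
        self.true_ms += ms


def commit_wait(tt: FakeTrueTime, step_ms: float = 0.5) -> float:
    """Pick a timestamp, then wait until it is definitely in the past.

    Returns how long the caller had to wait, in milliseconds.
    """
    start = tt.true_ms
    _, commit_ts = tt.now()          # latest possible current time
    while tt.now()[0] <= commit_ts:  # earliest possible time hasn't passed it
        tt.sleep(step_ms)
    return tt.true_ms - start


for eps in (1.0, 4.0):
    print(eps, commit_wait(FakeTrueTime(eps)))  # waits 2.5 and 8.5 ms
```

The wait comes out at roughly twice the error bound, so tighter clock sync buys lower latency rather than a stronger guarantee.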

sizediterable (OP) · 4y ago

Sorry if it's a bit sensationalized. I wanted to give some color as to why this release might be interesting and took this tweet I saw at face value https://twitter.com/rbranson/status/1455923426359578631

dang · 4y ago

If you want to say what you think is important about an article, that's great, but please do it by adding a comment to the thread. Then your view will be on a level playing field with everyone else's: https://hn.algolia.com/?dateRange=all&page=0&prefix=false&so...

"Please use the original title, unless it is misleading or linkbait; don't editorialize."

https://news.ycombinator.com/newsguidelines.html

tyingq · 4y ago

Oh, I wasn't criticizing, just genuinely curious. If the AWS service has a similar design, that's interesting to me. AWS doesn't seem to say much about the details.

babelfish · 4y ago · 1 in thread

Using atomic clocks too, nice. From an older post:

> It uses a fleet of redundant satellite-connected and atomic clocks in each Region to deliver time derived from these highly accurate reference clocks.

https://aws.amazon.com/blogs/mt/manage-amazon-ec2-instance-c...

Faaak · 4y ago

so... GPS ?

adamfeldman · 4y ago · 1 in thread

Google and GCP offer their own NTP endpoints. Does anyone know if GCP also exposes the TrueTime API to customers, or if it's only internal to Spanner?

I can't find any instance of the word "bound" in the GCP or Google NTP docs.

[1]: https://developers.google.com/time/guides#google_compute_eng...

[2]: https://cloud.google.com/compute/docs/instances/managing-ins...

londons_explore · 4y ago

I'm pretty sure if you use the GCP dataflow stuff, the dataflow worker VMs will be given access to an RPC server which has the necessary endpoints for TrueTime.

Obviously it's all a matter of reverse engineering rather than documented APIs.

deanCommie · 4y ago

They released it in 2017. [1]

Today they released a new OSS daemon and library for it.

[1] https://aws.amazon.com/about-aws/whats-new/2017/11/introduci...

glenngillen · 4y ago

Back from my time at Amazon, one of my favourite technical videos on the internal educational/YouTube thing was from a service team explaining how hard accurate clock sync in distributed systems was. One of those problems I’ve just taken for granted over the years. But just layers and layers of complexity where naive assumptions at any point get you the wrong result, but it’s not at all obvious you have the wrong result.

I really wish they made more of that stuff publicly available.

jeffbee · 4y ago

One of the key aspects of the TrueTime system is any device with serious clock error is simply murdered. It seems like offering that would significantly benefit users of this AWS API.

kohlerm · 4y ago

Does that mean cockroachdb or others could implement a Spanner-like DB?
