FoundationDB Record Layer

(foundationdb.org)

321 points · davelester · 7y ago · 78 comments

wwilson · 7y ago · 14 in thread

This is very cool!

FoundationDB excites a lot of people because it's an extremely scalable and extremely reliable distributed database that supports ACID transactions, and which is both open-source and has Apple standing behind it. And yeah, all of that is pretty nice.

But arguably the real power comes from the fact that it exposes a relatively low-level data model that can then be wrapped in one or more stateless "layers". All of these layers write to the same storage substrate, so you can have your document database, your SQL database, your time-series database, your consensus/coordination store, your distributed task queue, etc., etc., but you're only actually operating one stateful system. Your SREs will thank you.
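A minimal sketch of the layer idea, assuming a toy in-memory ordered store rather than FoundationDB itself. `SharedStore`, `DocumentLayer`, `QueueLayer`, and the key prefixes are all hypothetical names chosen for illustration. The point it shows is that each layer keeps no state of its own: everything, including the queue's sequence counter, lives in the one shared store under a layer-specific key prefix.

```python
import bisect


class SharedStore:
    """Toy in-memory ordered key-value store (a stand-in, not FoundationDB)."""

    def __init__(self):
        self._keys = []   # keys kept in sorted order
        self._data = {}   # key -> value

    def set(self, key, value):
        if key not in self._data:
            bisect.insort(self._keys, key)
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def clear(self, key):
        if key in self._data:
            del self._data[key]
            self._keys.remove(key)

    def first_with_prefix(self, prefix):
        """Return the smallest key starting with prefix, or None."""
        i = bisect.bisect_left(self._keys, prefix)
        if i < len(self._keys) and self._keys[i].startswith(prefix):
            return self._keys[i]
        return None


class DocumentLayer:
    """Stateless layer: documents live under the hypothetical 'doc/' prefix."""

    def __init__(self, store):
        self.store = store

    def put(self, doc_id, body):
        self.store.set(f"doc/{doc_id}", body)

    def get(self, doc_id):
        return self.store.get(f"doc/{doc_id}")


class QueueLayer:
    """Stateless FIFO queue: items live under 'queue/', the counter under 'queue_meta/'."""

    def __init__(self, store):
        self.store = store

    def push(self, item):
        seq = self.store.get("queue_meta/next", 0)
        # Zero-padded sequence numbers sort in insertion order.
        self.store.set(f"queue/{seq:010d}", item)
        self.store.set("queue_meta/next", seq + 1)

    def pop(self):
        key = self.store.first_with_prefix("queue/")
        if key is None:
            return None
        item = self.store.get(key)
        self.store.clear(key)
        return item
```

In a real deployment each `push` or `pop` would run inside one transaction so the counter and the item change together; the toy store has no transactions, so that part is left out.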

Writing these layers to be scalable and high-performance can be challenging, but it looks like Apple is actively doing it and willing to release the results to the rest of us. This also suggests that their previous open-sourcing of the MongoDB-compatible document layer wasn't a one-off fluke. All of this is very good news for everybody who needs to run databases in the real world.

Full disclosure: I worked on FoundationDB a long, long time ago.

wwilson · 7y ago

Wow, from the post:

"Together, the Record Layer and FoundationDB form the backbone of Apple's CloudKit. We wrote a paper describing how we built the Record Layer to run at massive scale and how CloudKit uses it."

I think this is the first time that little detail has been publicly disclosed.

Scriptor · 7y ago

Isn't the code on GitHub?

SloopJon · 7y ago

Do I understand correctly that the record layer is only usable by Java clients? That strikes me as a drawback of the layer approach: unless you supply a server protocol, like the document layer does, every language essentially reimplements the layer from scratch. That seems to be the case for the tuple layer, for example.

gregwebs · 7y ago

It certainly doesn't have to be the case. The MongoDB layer can be used via the MongoDB protocol. TiDB is an entire MySQL implementation that any MySQL client can connect to which is built on top of TiKV (equivalent to FDB). If the goal is lower overhead of data transfer, another approach is writing layers in C (or Rust exposed as C) and then generating bindings for different languages.

rubyn00bie · 7y ago

Just wanted to say thanks for your testing distributed systems talk! That has been a huge help to me as I develop distributed systems and consider the problems inherent to them.

wikibob · 7y ago

Is this the video you mention?

https://youtu.be/4fFDFbi3toc

rubyn00bie · 7y ago

Yes it is! Sorry for not linking it and thank you for doing so!

nathan_long · 7y ago

This definitely sounds cool.

> your SQL database

They mention "a declarative query API", but as far as I can tell that's not actually SQL, right? So migrating from another relational db would require learning a new query language?

rdsubhas · 7y ago

One could technically write an SQL translation layer on top of it, as a client-side library? Or does it need support on the server-side / record layer?

nschiefer · 7y ago

(I'm from the iCloud team that works on the Record Layer.) Both building a relational database and implementing a proper SQL interface on top of it are huge projects. The SQL spec is large and complicated, so achieving true compatibility (as opposed to superficial compatibility) is challenging. Even worse, once you have a SQL interface users expect to be able to throw any SQL that they give to, say, Postgres, and have it work just as well, which requires a ton of detailed work on the query optimizer.

The client/server distinction isn't terribly strong in the FDB world. The FDB client is unusual in that it's a (stateless) part of the FDB cluster itself. You could therefore embed it in the client itself or build an RPC service around it. The Record Layer takes the same approach---it's just a Java library---so you could either embed it in the client application or build some kind of wire protocol for accessing it. One could have an embedded SQL layer like SQLite or H2 with no additional server beyond the cluster or a separate SQL layer network server that acted more like Postgres or MySQL.

The Record Layer was designed for use cases that don't need a SQL interface, so we focused on building the layer itself. That said, the Record Layer exposes a ton of extension points so there's a fluid boundary between what needs to live in its main codebase and what can be implemented on top. There are almost certainly enough extension points to implement a SQL interface as another layer on top of the Record Layer. For example, you could add totally new types of indexes outside of the Record Layer's codebase, if that were needed for SQL support. It's still a lot of work, especially on the query optimizer. Perhaps the community is up to that challenge. :-)

ryanworl · 7y ago

It would be tough to implement every SQL construct on top of what is there today. I could explain why (I have tried, and presumably ran into the same issues they describe in various places in the docs and the paper), but the docs give an authoritative answer to save me typing :)

"In the future it is possible that the Record Layer may develop a formal query language, but it is unlikely that such a language would closely resemble the SQL standard." [0]

[0] https://foundationdb.github.io/fdb-record-layer/FAQ.html

Nullabillity · 7y ago

> and has Apple standing behind it.

So far behind it that they already shut it down once.

shereadsthenews · 7y ago

I think foundation excites a lot of people who have never read its code or tried to operate it and therefore have only these statements of hype to go on.

ryanworl · 7y ago

FoundationDB is about as honest and up-front about its limitations and flaws as any system I've ever seen.

Have you had a negative experience with it you can share?

jwr · 7y ago · 7 in thread

Very interesting. I've been looking closely at FoundationDB as a way forward (to replace RethinkDB and Cassandra in existing systems). It's one of the few contenders for a really interesting take on a distributed database.

I am not sure if I will use the record layer (I've been planning to write "my layer" myself), but it will definitely be an interesting thing to look at.

_y4bi · 7y ago

Fellow RethinkDB user here. I’ve been looking at Cassandra and FoundationDB as replacements. I’m genuinely curious— what didn’t you like about Cassandra?

jwr · 7y ago

Cassandra

To be honest, I don't like anything about Cassandra. Beginning with the naming: back when I was trying to learn about Cassandra, I couldn't get past the obscure and bizarre naming (super-columns?). When I dealt with systems using it, I never quite understood how you can keep saying that "the later timestamp wins" and speak of consistency with a straight face: in a distributed system, there is no such thing as a "later timestamp". Or speak of transactions which aren't really transactions at all.

Then I read the Jepsen reports about Cassandra. Yes, Cassandra has made progress since then, but still.

I think of Cassandra as an outdated piece of technology at this point: we can (and do) build better distributed databases today, with better consistency guarantees, and proper transactions in case of FoundationDB. Cassandra was designed for a specific use case and then outgrew its initial design, because there was nothing else at the time. But I see no reason to stick with it any longer.

Even now when you need massive multi-region scalability there is little to choose from — if you want it to be open-source, there's pretty much only FoundationDB left.

aseipp · 7y ago

FoundationDB does not support true geo-replicated multi-region distribution the way Cassandra, Spanner, Cockroach, etc do, at least not without paying huge latency/round trip costs. If you want to avoid that, the best you can have is a separate failover region, and, with FoundationDB 6, you can get closer-to-LAN latencies for failover deployments to separate regions (but only one region) while retaining ACID semantics. You could build truly global geo-distribution on top of it but that would have to be its own layer that implements 2PC/Paxos or something between regions. Ultimately you have to pay the toll somewhere in a truly consistent system like that if you want global availability (unless you're Spanner and have incredible hardware engineering that can be deployed across the globe).

Cassandra/Scylla are the only open source key value stores that do linear scalability by simply adding nodes even in huge, geo-distributed settings as far as I know, but they are ultimately AP systems. And Scylla just has absurd performance compared to Cassandra or FoundationDB. You just have to know what you're getting into. (But yes, ACID transactions are a good model for developers, and truly FDB's linearizable transactions and high scalability make it an obvious choice for many CP systems, if you ask me.)

manigandham · 7y ago

There is nothing else open-source that does multi-region active/active clusters like Cassandra/Scylla.

The rest either have a single cluster that can try to be stretched (usually with bad results or incredibly high latency) or is an enterprise feature using complicated log-shipping to apply updates everywhere.

mnutt · 7y ago

If you’re considering Cassandra, it’s probably also worth considering Scylla. It’s a drop-in replacement for Cassandra so shares some of its flaws, but is considerably more pleasant to run in production.

ddorian43 · 7y ago

Probably async

techie128 · 7y ago

"Probably async"? Could you expand on it?

lima · 7y ago · 5 in thread

This might be the first good alternative to etcd for configuration stores that need real-time updates.

Like Kubernetes.

Many Kubernetes scaling issues are etcd-related.

RethinkDB is dead-ish, and CockroachDB is treating their changefeeds as an enterprise feature that requires a Kafka instance to stream to :(

PhilippGille · 7y ago

Is TiKV an alternative?

Short overview and maybe good to know it's becoming part of the CNCF: https://www.cncf.io/blog/2018/08/28/cncf-to-host-tikv-in-the-sandbox/

Haven't worked with it myself yet, but maybe others can share their experience?

There have also been some HN threads in the past, about TiDB at least.

c4pt0r · 7y ago

TiDB developer here. Yes, I think TiKV is an alternative to FDB. Compared to the FDB Record Layer, TiKV aims to provide a more atomic primitive, offering just Get/Set/Transaction in the key-value layer, so users can build customized distributed systems around it. The main differences between TiKV/TiDB and FDB are:

1. TiKV uses a Multi-Raft architecture; I think Raft provides more HA.

2. TiKV's transaction model is inspired by Google Percolator; it's a classical optimistic 2PC transaction model with MVCC support. I'm not an expert on FDB, but I think different transaction models fit different application scenarios. TiKV's transaction model works well when your workload is mainly small transactions with a low conflict rate.

3. TiDB is a full-featured SQL layer on top of TiKV that aims to provide a MySQL-compatible solution. Most TiDB users migrated from MySQL, so TiDB focuses on compatibility with these legacy MySQL-based applications: for example, reading the MySQL binlog and replaying it on TiDB in real time so that TiDB acts as an active MySQL replica, or supporting complex SQL queries such as distributed joins or GROUP BY. Building a full-featured SQL optimizer is a huge project.
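The optimistic model in point 2 can be sketched in a few lines. This is an illustration under heavy assumptions, not TiKV's or FDB's implementation: it runs in one process, keeps a single version per key instead of real MVCC snapshots, and skips the two-phase commit across nodes. `OptimisticStore`, `Transaction`, and `run` are hypothetical names. What it does show is why a low conflict rate matters: a transaction buffers its writes, and at commit time it aborts and must retry if any key it read was written by someone else in the meantime.

```python
class ConflictError(Exception):
    """Raised when a key read by the transaction changed before commit."""


class OptimisticStore:
    """Toy single-process store with commit-time conflict checks."""

    def __init__(self):
        self.version = 0       # version of the latest commit
        self.data = {}         # key -> committed value
        self.last_write = {}   # key -> version of its latest committed write

    def begin(self):
        return Transaction(self)


class Transaction:
    def __init__(self, store):
        self.store = store
        self.read_version = store.version   # commits after this can conflict
        self.reads = set()
        self.writes = {}                    # buffered until commit

    def get(self, key):
        self.reads.add(key)
        if key in self.writes:
            return self.writes[key]
        return self.store.data.get(key)

    def set(self, key, value):
        self.writes[key] = value

    def commit(self):
        # Abort if any key we read was committed after we started.
        for key in self.reads:
            if self.store.last_write.get(key, 0) > self.read_version:
                raise ConflictError(key)
        self.store.version += 1
        for key, value in self.writes.items():
            self.store.data[key] = value
            self.store.last_write[key] = self.store.version


def run(store, fn, max_retries=5):
    """Run fn in a fresh transaction, retrying when it conflicts."""
    for _ in range(max_retries):
        txn = store.begin()
        try:
            result = fn(txn)
            txn.commit()
            return result
        except ConflictError:
            continue
    raise ConflictError("too many retries")
```

With many writers hitting the same keys, most attempts would fail the commit check and spin in the retry loop, which is the high-conflict case the comment warns about.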

There are some case studies:

https://pingcap.com/success-stories/

https://pingcap.com/success-stories/tidb-in-meituan-dianping...

There are some quick-start documents you can start with:

https://pingcap.com/docs/op-guide/docker-compose/

https://pingcap.com/docs/v2.0/op-guide/migration/#migrate-da...

smarterclayton · 7y ago

For Kubernetes, unless you’re past a couple of million keys, it’s unlikely etcd is the issue, and other things would fall over first. However, if you’re at those levels it might be worth it to use something else.

I’m not sure foundation would be able to do the full consistent list required in the current model though (someone above mentioned full consistent table scans being not possible in FDB). In etcd and cockroach you can leverage MVCC and get a consistent scan over older history. But if FDB has that it’s pretty close (I looked into this before FDB was acquired the first time).

Edit: unlikely = in practice I haven’t seen anything except large range lock contention (which is why we added chunking to the api)

ryanworl · 7y ago

If you’re referring to my comment, by “not possible”, I meant “subject to the 5s transaction duration limit.” etcd and ZK could be implemented on top of FDB.

Artemis2 · 7y ago

DynamoDB with strong consistency turned on works pretty nicely for us.

mbesto · 7y ago · 4 in thread

Has anyone ever used FoundationDB and not found it successful? All I read is "it supports RDBMS + NoSQL and can be distributed". So what use cases doesn't it solve?

ryanworl · 7y ago

The best way I can describe FoundationDB is it is like a file system. You can do just about whatever you’d like with files and a file system, in theory. You can implement just about any data model you can dream up in FDB.

But the current storage engine is not as well optimized as it could be.

It does have scalability limits, although they’re not relevant for 99.9% of use cases.

Upgrading a cluster to a new non-patch version will require a small (seconds) amount of downtime. A mitigating factor there is upgrading your client doesn’t have that limit, which is where all the interesting stuff is.

The minimum latency for a transaction is relatively high compared to systems which acknowledge writes before syncing to disk or only after syncing to a single disk.

I wouldn't say it doesn’t solve “use cases”. Rather, if you can live within the limitations (which means you need to know what they are), you can reduce the complexity and cost of designing a system for your use case by a lot.

Check out my talk from the FDB Summit for an example: https://youtu.be/SKcF3HPnYqg

mbesto · 7y ago

Super helpful, thanks for the info.

Birch-san · 7y ago

Even with the Record layer, it doesn't have support for JOINs in the same way as an RDBMS would.

Yes, the Record Layer helps you define and index into _hierarchies_ of entities, but I suspect it doesn't have an answer for other access patterns (e.g. producing "report" views that relate or aggregate non-hierarchical data).

You could retroactively construct a custom view _after the fact_, but only if you can do so within 5 seconds. And (if you want continued access to that view) you'll need to ensure that that view is maintained thereafter (you would have to define your own layer -- it cannot be entrusted to application logic unless you can atomically switch to a new version of your stack). Maintaining such a view is made more difficult if your data allows updates/deletes.

The same caveat affects schema migrations. You would need to be able to fit the migration into a 5 second transaction (or tell your applications to stop modifying the data for a while, and handle it as a series of smaller transactions).

My assessment is that if you have (non-hierarchical) relational use-cases at scale, FDB really requires you to plan your access patterns from day 1. Whereas a typical RDBMS fares far better at satisfying emergent needs. That said: FDB's model is brilliant for document store and key-value use-cases.

nschiefer · 7y ago

You're right the Record Layer doesn't yet have join support, but note PR #306 [1]. We support aggregate indexes but can't run aggregate “reports” that aren't backed by indexes. The rationale for this is covered in the paper (see [2]), but note that doesn't preclude such support being added into or on top of Record Layer (also discussed in the paper).

The Record Layer does a whole bunch of work to deal with index maintenance (see our docs [3]). Our “index maintainer” abstraction (discussed in the paper) makes maintaining our indexes (including those that are basically materialized views) completely seamless from the user's perspective, even for updates and deletes. We also have a lot of tooling for making efficient schema migrations. For example, schema migrations are performed lazily (when the data is accessed), so they aren't limited by the 5 second transaction limit. If you add/remove/change indexes, they'll be put into a “write-only” mode where they'll keep accepting writes while an “online indexer” builds the index over multiple transactions. We even have fancy logic to automatically adjust the size of the transactions if they start failing due to contention or timeouts!

Basically, the Record Layer solves a lot (but not all) of the pain points that show up when you don't know your access patterns from the beginning. The paper talks a bit about how CloudKit uses some of those features.

[1] https://github.com/FoundationDB/fdb-record-layer/pull/306

[2] https://foundationdb.github.io/fdb-record-layer/FAQ.html — search for “aggregation”

[3] https://foundationdb.github.io/fdb-record-layer/SchemaEvolut...
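The idea of building an index over multiple transactions, shrinking the batch when a transaction fails, might look roughly like the sketch below. It is not the Record Layer's online indexer: `build_index_online`, `index_batch`, the `budget` parameter, and the indexed `name` field are all hypothetical, and the "transaction limit" is simulated by a simple cap on batch size instead of real contention or timeouts. It also ignores the "write-only" mode that keeps the index current for concurrent writes.

```python
class TransactionTooLarge(Exception):
    """Simulated failure standing in for a timeout or size-limit error."""


def index_batch(records, index, keys, budget):
    """Simulated transaction: index the given primary keys, or fail as a whole."""
    if len(keys) > budget:
        raise TransactionTooLarge(len(keys))
    for pk in keys:
        # Index entry: (indexed field value, primary key) with an empty value.
        index[(records[pk]["name"], pk)] = b""


def build_index_online(records, budget, initial_batch=64):
    """Build the index in several batches, halving the batch size on failure."""
    index = {}
    pending = sorted(records)   # primary keys in key order
    batch = initial_batch
    position = 0                # continuation point: next key to index
    while position < len(pending):
        keys = pending[position:position + batch]
        try:
            index_batch(records, index, keys, budget)
        except TransactionTooLarge:
            if batch == 1:
                raise           # cannot shrink any further
            batch = max(1, batch // 2)
            continue            # retry the same range with a smaller batch
        position += len(keys)
    return index, batch
```

Because progress is tracked by `position`, a failed batch costs only a retry of that range; earlier batches stay committed.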

manigandham · 7y ago · 3 in thread

Glad Apple is releasing all of this, I wonder what kickstarted it all?

The paper is rather interesting: https://www.foundationdb.org/files/record-layer-paper.pdf

mastox · 7y ago

recruitment?

gshack · 7y ago

Surely Apple doesn't have an issue with it?

ubershmekel · 7y ago

I'm sure they're good, but better is better.

devj · 7y ago · 3 in thread

Few doubts:

1. Any reason to write it in Java instead of C, C++, Rust, etc?

2. Any reason to use Protobuf instead of Flatbuffers, Avro, etc?

3. Can FoundationDB be used with Apache Arrow?

all0c · 7y ago

The Record Layer is written in Java as it was designed to fit in with an existing stack that was already primarily Java-based. You can read more about how CloudKit uses the Record Layer in the preprint of the Record Layer paper: https://www.foundationdb.org/files/record-layer-paper.pdf

Excellent question regarding the choice to use Protocol Buffers. Firstly, as mentioned in the paper released last year, CloudKit uses Protocol Buffers for client-server intercommunication. As a result, there was already expertise around protobuf, which is a good tie breaker when evaluating alternatives. (Here's that paper, by the way: http://www.vldb.org/pvldb/vol11/p540-shraer.pdf) Secondly, the Record Layer makes heavy use of Protocol Buffer descriptors, which specify the field types and names within protobuf schemata, and dynamic messages. Descriptors are used internally within the Record Layer to do things like schema validation. (For example, if an index is defined on a specific field, the descriptor can be checked to validate that that field exists in the given record type.) Likewise, dynamic messages make it possible for applications using the Record Layer to load their schema at run time by reading it from storage. The FDBMetaDataStore allows the user to do exactly that (while storing the schema persistently in FoundationDB): https://static.javadoc.io/org.foundationdb/fdb-record-layer-...
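As a rough illustration of the descriptor-based validation described above, here is a sketch that uses a plain dictionary as a stand-in for Protocol Buffer descriptors. It does not use the protobuf library or the Record Layer API. `SCHEMA`, `SchemaError`, and `validate_index` are hypothetical names, and the field lists are invented for the example (only the "Order" and "Item" type names and the `item_id` field appear elsewhere in this thread).

```python
# Hypothetical stand-in for descriptors: record type -> {field name: field type}.
SCHEMA = {
    "Order": {"order_id": "int64", "item_id": "int64", "quantity": "int32"},
    "Item": {"item_id": "int64", "name": "string"},
}


class SchemaError(Exception):
    """Raised when an index refers to something the schema does not define."""


def validate_index(schema, record_type, field):
    """Check that an index on record_type.field refers to a real field.

    Returns the field's declared type so a caller could also check that the
    type is one the index supports.
    """
    fields = schema.get(record_type)
    if fields is None:
        raise SchemaError(f"unknown record type: {record_type}")
    if field not in fields:
        raise SchemaError(f"{record_type} has no field {field!r}")
    return fields[field]
```

Since the schema is ordinary data, it can be loaded at run time from storage and validated before any index is built, which is the property the comment attributes to descriptors and dynamic messages.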

The Record Layer's data format is not compatible with the format specified by Apache Arrow, no.

devj · 7y ago

Thanks for your reply. Would be really helpful if you can share the following:

1. Size of the CloudKit cluster and the number of RecordLayer instances. A ratio would also be enough to get an approx. idea.

2. How are metadata changes involving field data types handled?

3. How are relationships and therefore, foreign keys handled? Are any referential actions like cascading deletes supported?

all0c · 7y ago

The Record Layer doesn't currently support foreign key constraints, so foreign keys are more of a “design pattern” than a first-class feature. For example, in a sample schema in the repository, an “Order” message has a field called “item_id” that points to the primary key of an “Item” message: https://github.com/FoundationDB/fdb-record-layer/blob/792c95... There isn't an automatic check to make sure the item exists, though, nor are there cascading deletes. That being said, I don't think the architecture is incompatible with that feature, so it would be a reasonable feature request.
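Since the existence check and the cascade are left to the application, a sketch of doing both by hand could look like this. It assumes a plain dictionary as a stand-in for a record store, keyed by `(record type, primary key)`; `save_order`, `delete_item`, and `MissingReference` are hypothetical helpers, not Record Layer calls. Against a real store, each function body would need to run inside a single transaction so that the check and the write cannot be separated by a concurrent delete.

```python
class MissingReference(Exception):
    """Raised when an Order points at an Item that does not exist."""


def save_order(store, order):
    """Save an Order only if the Item named by its item_id exists."""
    if ("Item", order["item_id"]) not in store:
        raise MissingReference(order["item_id"])
    store[("Order", order["order_id"])] = order


def delete_item(store, item_id):
    """Delete an Item and, by hand, every Order that references it.

    Returns the number of Orders removed. The full scan is only for the
    sketch; a real store would look the Orders up through an index on item_id.
    """
    store.pop(("Item", item_id), None)
    orphans = [key for key, value in store.items()
               if key[0] == "Order" and value["item_id"] == item_id]
    for key in orphans:
        del store[key]
    return len(orphans)
```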

There are some guidelines regarding field type changes in the schema evolution guide: https://foundationdb.github.io/fdb-record-layer/SchemaEvolut... Most data type changes are incompatible with either Protobuf's serialization format or the FDB Tuple layer's serialization format (which the Record Layer uses for storing secondary indexes and primary keys). The general advice for type changes (if there are existing data in your record stores) would instead be to introduce a new field of the new type and deprecate the old one.

mathnode · 7y ago · 3 in thread

Does anyone know if FoundationDB is gaining ground over Cassandra at Apple?

nemothekid · 7y ago

I recall a couple years ago that it was rumored that Apple had bought FDB with the intention of replacing Cassandra (and I think, at the time, Apple had the largest Cassandra cluster ever known).

Combined with other statements in this thread, I think that may be true. I remember reading once that iMessage used to be served by Cassandra, but now it's served by FDB.

This is all speculation though.

seidoger · 7y ago

The FDB Record Layer white paper [0], section 8.1, does open with:

> 8.1 New CloudKit Capabilities

> CloudKit was initially implemented using Cassandra as the underlying storage engine.

So it seems this is what happened, for CloudKit at least.

[0] https://www.foundationdb.org/files/record-layer-paper.pdf

ta7756 · 7y ago

Apple's Cassandra footprint has grown to over 100 PB (https://twitter.com/jjirsa/status/1071357976454316033)

continuations · 7y ago · 3 in thread

Does that mean FDB now supports secondary indexes?

If that's the case, how does FDB compare to ScyllaDB now that they both have secondary indexes?

all0c · 7y ago

(I'm from the FDB team and work on the Record Layer.) As ryanworl's excellent answer suggests, the FoundationDB key-value store does not support secondary indexing on its own. It is strictly an ordered store mapping byte-array keys to byte-array values.

Secondary indexing is a core feature of the Record Layer, though! It includes a variety of secondary index types. The simplest are implemented using essentially the same strategy as ryanworl outlines (with more details on how that index works available in the key-value store documentation: https://apple.github.io/foundationdb/simple-indexes.html). And index updates are all entirely transactional (i.e., as the index update happens in the same transaction as record insertion, they are always consistent and up-to-date). However, all of that happens behind the scenes. The API presented to the user only asks for what record to save (update or insert), and then the Record Layer updates the appropriate indexes using a user-provided schema. Importantly, the Record Layer also supports handling the various stages of index maintenance (e.g., deleting an index's data after removing it from the schema or filling in data from existing records after an index is added). More can be found within the Record Layer overview: https://foundationdb.github.io/fdb-record-layer/Overview.htm...
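To make the "save a record, the layer updates the indexes" flow concrete, here is a toy sketch. It is not the Record Layer API: `RecordStore` and its methods are hypothetical, it keeps everything in memory, and it handles a single value index on one field. The part worth noticing is the update path, where the stale index entry is removed before the new one is added, so the index never points at an old value.

```python
class RecordStore:
    """Toy record store that keeps one value index in step with its records."""

    def __init__(self, indexed_field):
        self.indexed_field = indexed_field
        self.records = {}    # primary key -> record dict
        self.index = set()   # (indexed field value, primary key) entries

    def save(self, pk, record):
        """Insert or update a record and maintain the index in the same step."""
        old = self.records.get(pk)
        if old is not None:
            # Update: drop the entry for the previous field value.
            self.index.discard((old[self.indexed_field], pk))
        self.records[pk] = record
        self.index.add((record[self.indexed_field], pk))

    def delete(self, pk):
        old = self.records.pop(pk, None)
        if old is not None:
            self.index.discard((old[self.indexed_field], pk))

    def lookup(self, value):
        """Return the primary keys whose indexed field equals value."""
        return sorted(pk for v, pk in self.index if v == value)
```

The caller only ever calls `save` or `delete`; in a transactional store both the record write and the index changes would commit together, which is what makes the index consistent at all times.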

ryanworl · 7y ago

FDB does not automatically index your data, but you can write a layer (like this one) to index your data.

In a transaction, you write a key like “users/1” with a value of “bob” and then write another key like “users/bob/1” with no value. Then you can do a range scan over the prefix “users/bob/” and find all the primary keys. After that you do individual gets for the keys in the PK index to retrieve the full record if needed.

The comparison between the two is FDB “secondary indexes” are just like anything else in FDB. Namely, you update them in transactions and they are consistent immediately. Scylla does not AFAIK have this feature.
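The key layout just described can be tried out against a toy ordered store. The sketch below assumes an in-memory sorted key list in place of FoundationDB (`OrderedKV`, `add_user`, and `find_by_name` are hypothetical names) and uses string keys for readability, where a real layer would use byte strings, typically tuple-encoded.

```python
import bisect


class OrderedKV:
    """Toy in-memory ordered key-value store (a stand-in, not FoundationDB)."""

    def __init__(self):
        self.keys = []     # keys kept in sorted order
        self.values = {}   # key -> value

    def set(self, key, value):
        if key not in self.values:
            bisect.insort(self.keys, key)
        self.values[key] = value

    def get(self, key):
        return self.values.get(key)

    def range_prefix(self, prefix):
        """Return all keys starting with prefix, in key order."""
        start = bisect.bisect_left(self.keys, prefix)
        found = []
        for key in self.keys[start:]:
            if not key.startswith(prefix):
                break
            found.append(key)
        return found


def add_user(kv, user_id, name):
    """Write the record and its index entry (one transaction in a real store)."""
    kv.set(f"users/{user_id}", name)        # primary record
    kv.set(f"users/{name}/{user_id}", "")   # index entry with an empty value


def find_by_name(kv, name):
    """Range-scan the index prefix, then fetch each record by primary key."""
    prefix = f"users/{name}/"
    ids = [key[len(prefix):] for key in kv.range_prefix(prefix)]
    return [(uid, kv.get(f"users/{uid}")) for uid in ids]
```

For example, after `add_user(kv, 1, "bob")` and `add_user(kv, 3, "bob")`, calling `find_by_name(kv, "bob")` scans only the keys under “users/bob/” and then does two point reads. A production layout would usually keep records and index entries under separate prefixes so that a user named like an ID cannot collide with a record key.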

misframer · 7y ago

I wrote a blog post on how to implement secondary indexes using an ordered key-value store a couple of years ago: https://misfra.me/2017/01/18/how-to-implement-secondary-inde...

It would work with FoundationDB, RocksDB, etc. I actually learned these techniques when I interned at FoundationDB but have used them the most with other K-V systems.

abalone · 7y ago · 2 in thread

Apple low key does some cool server projects with a Java bent. They've contributed to Netty (well, they hired core developers).[1]

They've basically put them to work reimplementing it in Swift.[2] It's open and out there but not a lot of people are paying attention. While it's still early days I think there may be a year where, suddenly, Swift on the server is a super serious thing and all this work they've been doing on little old CloudKit kind of takes over the world.

Just a fun prediction.. but it wouldn't be the first time Apple pulled something like that.

I do like that Swift's non-tracing garbage collection model is well suited for server apps. Rust is cool too but maybe Swift would be a little friendlier and thus better suited to inherit Java's mantle. I mean can you just imagine if Apple is slowly building up Swift to overtake Java on the server? That that's one of their long game master plans? I know that sounds completely crazy.. It just might work. They do run one of the biggest data center networks in the world so they have a pretty good testbed and can justify a hefty R&D budget.

[1] https://www.infoq.com/presentations/apple-netty

[2] https://github.com/apple/swift-nio

ascagnel_ · 7y ago

Apple is probably hoping to run Swift on their servers. I don't foresee them putting in the effort into enterprise sales and service to make Swift overtake Java, though -- it hasn't really been their MO in the past.

abalone · 7y ago

They wouldn’t need to. They already partner with IBM for that stuff.

spullara · 7y ago · 2 in thread

I had built a layer like this one for my startup Bagcheck called Havrobase[1] (it was on top of HBase/Solr, here is the motivating blog post[2]) that ultimately I put on top of MySQL/Solr and other stores. Later, when we started Wavefront, I ported that layer to FDB and that still powers their metadata. Really a good fit and very much like this record layer. I highly recommend this approach for 24/7 services as you never need to have maintenance windows for schema upgrades and the like.

[1] https://github.com/spullara/havrobase [2] https://javarants.com/havrobase-a-searchable-evolvable-entit...

Initially at Wavefront we were using HBase for telemetry, Zookeeper for the service mesh and MySQL Cluster for entity metadata. All that was moved on top of FDB with 3 different layers that we developed.

I'm excited that this kind of database is now going to be available more broadly and with the confidence that CloudKit is using the same technology since to date implementing something like this was basically a DIY project.

thejerz · 7y ago

What were the pros and cons of using FDB over HBase?

spullara · 7y ago

Several things caused us to move off of HBase:

1) Operationally, HBase is a nightmare whereas FDB is extremely easy to operate.

2) HBase doesn't natively, or efficiently with extensions, support transactions across rows.

3) GC makes HBase performance unpredictable whereas FDB is written in C++.

4) HBase depends on Zookeeper and it is operationally painful to support and we were replacing it with FDB also.

I don't think I will ever again use anything from the Hadoop ecosystem if I can get away with it.

bcx · 7y ago · 2 in thread

I learned that basically all of iMessage and Contacts is stored on FoundationDB; it's pretty great this is making it into open source. Thanks Apple!

ryanworl · 7y ago

Are you using FDB at Olark?

(Saw it in your profile)

bcx · 7y ago

No :) Just off the shelf DBs so far. But the FoundationDB guys are HS friends ;).

georgewfraser · 7y ago · 2 in thread

Seeing this soon after the AWS “wire compatible with Mongo” kerfuffle, it makes me think: it would be amazing if the cloud vendors would offer a managed FDB service. An open-source, cloud-agnostic, horizontally scalable, document-oriented transactional database would be an incredible tool. I know AWS is going in the opposite direction these days with proprietary “wire compatible” services but a guy can dream...

mcintyre1994 · 7y ago

It superficially sounds a bit like Azure's Cosmos DB - they say that'll scale horizontally as much as you need, it's document-oriented, ACID transactions, with SQL, Mongo and graph APIs. Obviously it badly lacks the open-source and cloud-agnostic parts. I wonder if there's a world where Microsoft and Apple could work together to standardise something cloud-agnostic based on the best of both.

thanatos_dem · 7y ago

Why not both? FoundationDB announced a MongoDB-API-compatible document layer in November - https://www.foundationdb.org/blog/announcing-document-layer/

akavel · 7y ago · 1 in thread

Can someone please ELI5/executive summary to me what are the benefits of FoundationDB? Assuming I know the basics of PostgreSQL and ElasticSearch? I see some hype around it, but I can't understand what's the breakthrough. As a helping question: can you maybe try to tell me who are the expected users of it, vs. PSQL, ES? Or, when I should choose it over them? Also, what are its disadvantages? (I suspect bigger complexity, and bigger cost/worse effectiveness at small scale?) TIA!

dominotw · 7y ago

It's a distributed ACID k/v layer that other models can be built on top of.

So you could build something like PostgreSQL or ElasticSearch on top of FoundationDB.

ryanworl · 7y ago

Congrats to the team at Apple for getting this released! They have had a busy few months with getting the document layer released, the FDB Summit, and now the record layer.

pier25 · 7y ago

So why would Apple be doing this now? Maybe preparing the terrain to enter the cloud space and compete with Azure and AWS in a couple of years?

After all, it's no mystery Apple wants to expand their services revenue. Their hardware revenue isn't growing as much as it used to.

nschiefer · 7y ago

The preprint of the paper is now up on arXiv.org: https://arxiv.org/abs/1901.04452

gigatexal · 7y ago

This is super exciting. Can’t wait to have some time this weekend to play with it.

Artemis2 · 7y ago

This is powering CloudKit. Very cool!

j / k navigate · click thread line to collapse

78 comments

69 comments · 18 top-level

wwilson7y ago· 14 in thread

This is very cool!

Full disclosure: I worked on FoundationDB a long, long time ago.

wwilson7y ago

Wow, from the post:

"Together, the Record Layer and FoundationDB form the backbone of Apple's CloudKit. We wrote a paper describing how we built the Record Layer to run at massive scale and how CloudKit uses it."

I think this is the first time that little detail has been publicly disclosed.

Scriptor7y ago

Isn't the code on GitHub?

SloopJon7y ago

gregwebs7y ago

rubyn00bie7y ago

Just wanted to say thanks for your testing distributed systems talk! That has been a huge help to me as I develop distributed systems and consider the problems inherent to them.

wikibob7y ago

Is this the video you mention?

https://youtu.be/4fFDFbi3toc

rubyn00bie7y ago

Yes it is! Sorry for not linking it and thank you for doing so!

nathan_long7y ago

This definitely sounds cool.

> your SQL database

They mention "a declarative query API", but as far as I can tell that's not actually SQL, right? So migrating from another relational db would require learning a new query language?

rdsubhas7y ago

One could technically write an SQL translation layer on top of it, as a client-side library? Or does it need support on the server-side / record layer?

nschiefer7y ago

1 more reply

ryanworl7y ago

"In the future it is possible that the Record Layer may develop a formal query language, but it is unlikely that such a language would closely resemble the SQL standard." [0]

[0] https://foundationdb.github.io/fdb-record-layer/FAQ.html

1 more reply

Nullabillity7y ago

> and has Apple standing behind it.

So far behind it that they already shut it down once.

shereadsthenews7y ago

I think FoundationDB excites a lot of people who have never read its code or tried to operate it, and who therefore have only these statements of hype to go on.

ryanworl7y ago

FoundationDB is about as honest and up-front about its limitations and flaws as any system I've ever seen.

Have you had a negative experience with it you can share?

jwr7y ago· 7 in thread

I am not sure if I will use the record layer (I've been planning to write "my layer" myself), but it will definitely be an interesting thing to look at.

_y4bi7y ago

Fellow RethinkDB user here. I’ve been looking at Cassandra and FoundationDB as replacements. I’m genuinely curious— what didn’t you like about Cassandra?

jwr7y ago

Cassandra

Then I read the Jepsen reports about Cassandra. Yes, Cassandra has made progress since then, but still.

Even now when you need massive multi-region scalability there is little to choose from — if you want it to be open-source, there's pretty much only FoundationDB left.

aseipp7y ago

2 more replies

manigandham7y ago

There is nothing else open-source that does multi-region active/active clusters like Cassandra/Scylla.

mnutt7y ago

ddorian437y ago

Probably async

techie1287y ago

"Probably async"? Could you expand on it?

1 more reply

lima7y ago· 5 in thread

This might be the first good alternative to etcd for configuration stores that need real-time updates.

Like Kubernetes.

Many Kubernetes scaling issues are etcd-related.

RethinkDB is dead-ish, and CockroachDB is treating their changefeeds as an enterprise feature that requires a Kafka instance to stream to :(

PhilippGille7y ago

Is TiKV an alternative?

Short overview, and maybe good to know it's becoming part of the CNCF: https://www.cncf.io/blog/2018/08/28/cncf-to-host-tikv-in-the-sandbox/

Haven't worked with it myself yet, but maybe others can share their experience?

There have also been some HN threads in the past, about TiDB at least.

c4pt0r7y ago

1. TiKV uses a Multi-Raft architecture; I think Raft provides more HA.

There are some case studies:

https://pingcap.com/success-stories/

https://pingcap.com/success-stories/tidb-in-meituan-dianping...

There are some quick-start documents you can start with:

https://pingcap.com/docs/op-guide/docker-compose/

https://pingcap.com/docs/v2.0/op-guide/migration/#migrate-da...

smarterclayton7y ago

Edit: unlikely = in practice I haven’t seen anything except large range lock contention (which is why we added chunking to the api)

ryanworl7y ago

If you’re referring to my comment, by “not possible”, I meant “subject to the 5s transaction duration limit.” etcd and ZK could be implemented on top of FDB.
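One common way to live with a per-transaction time limit is to split a long read into many short ones, each resuming after the last key the previous one returned. The sketch below only simulates that pattern in plain Python; it does not use the FoundationDB client, and `MAX_ROWS_PER_TXN`, `read_chunk`, and `scan_all` are hypothetical names made up for illustration.

```python
import bisect

MAX_ROWS_PER_TXN = 3  # hypothetical stand-in for "what fits in one short transaction"


def read_chunk(sorted_keys, data, start_after, limit):
    """Simulate one short transaction: read at most `limit` rows after `start_after`."""
    i = 0 if start_after is None else bisect.bisect_right(sorted_keys, start_after)
    return [(k, data[k]) for k in sorted_keys[i:i + limit]]


def scan_all(data, limit=MAX_ROWS_PER_TXN):
    """Walk the whole keyspace as a series of small reads, resuming from the last key seen."""
    sorted_keys = sorted(data)
    last_key, rows, txns = None, [], 0
    while True:
        chunk = read_chunk(sorted_keys, data, last_key, limit)
        txns += 1
        if not chunk:  # an empty chunk means the scan has reached the end
            return rows, txns
        rows.extend(chunk)
        last_key = chunk[-1][0]


data = {f"k{i}": i for i in range(7)}
rows, txns = scan_all(data)
print(len(rows), "rows read in", txns, "short reads")
```

The trade-off is that each chunk would see its own snapshot in a real store, so the scan as a whole is not one consistent view of the data.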

Artemis27y ago

DynamoDB with strong consistency turned on works pretty nicely for us.

mbesto7y ago· 4 in thread

Has anyone ever used FoundationDB and not found it successful? All I read is "it supports RDBMS + NoSQL and can be distributed". So what use cases doesn't it solve?

ryanworl7y ago

But the current storage engine is not as well optimized as it could be.

It does have scalability limits, although they’re not relevant for 99.9% of use cases.

The minimum latency for a transaction is relatively high compared to systems which acknowledge writes before syncing to disk or only after syncing to a single disk.

Check out my talk from the FDB Summit for an example: https://youtu.be/SKcF3HPnYqg

mbesto7y ago

Super helpful, thanks for the info.

Birch-san7y ago

Even with the Record layer, it doesn't have support for JOINs in the same way as an RDBMS would.

nschiefer7y ago

[1] https://github.com/FoundationDB/fdb-record-layer/pull/306

[2] https://foundationdb.github.io/fdb-record-layer/FAQ.html — search for “aggregation”

[3] https://foundationdb.github.io/fdb-record-layer/SchemaEvolut...

manigandham7y ago· 3 in thread

Glad Apple is releasing all of this, I wonder what kickstarted it all?

The paper is rather interesting: https://www.foundationdb.org/files/record-layer-paper.pdf

mastox7y ago

recruitment?

gshack7y ago

Surely Apple doesn't have an issue with that?

ubershmekel7y ago

I'm sure they're good, but better is better.

devj7y ago· 3 in thread

Few doubts:

1. Any reason to write it in Java instead of C, C++, Rust, etc?

2. Any reason to use Protobuf instead of Flatbuffers, Avro, etc?

3. Can FoundationDB be used with Apache Arrow?

all0c7y ago

The Record Layer's data format is not compatible with the Apache Arrow specification, no.

devj7y ago

Thanks for your reply. Would be really helpful if you can share the following:

1. Size of the CloudKit cluster and the number of RecordLayer instances. A ratio would also be enough to get an approx. idea.

2. How are metadata changes involving field data types handled?

3. How are relationships and therefore, foreign keys handled? Are any referential actions like cascading deletes supported?

all0c7y ago

mathnode7y ago· 3 in thread

Does anyone know if FoundationDB is gaining ground over Cassandra at Apple?

nemothekid7y ago

I recall a couple years ago that it was rumored that Apple had bought FDB with the intention of replacing Cassandra (and I think, at the time, Apple had the largest Cassandra cluster ever known).

Combined with other statements in this thread, I think that may be true. I remember reading once that iMessage used to be served by Cassandra, but now it's served by FDB.

This is all speculation though.

seidoger7y ago

The FDB Record Layer white paper [0], section 8.1, does open with:

> 8.1 New CloudKit Capabilities

> CloudKit was initially implemented using Cassandra as the underlying storage engine.

So it seems this is what happened, for CloudKit at least.

[0] https://www.foundationdb.org/files/record-layer-paper.pdf

ta77567y ago

Apple's Cassandra footprint has grown to over 100 PB (https://twitter.com/jjirsa/status/1071357976454316033)

continuations7y ago· 3 in thread

Does that mean FDB now supports secondary indexes?

If that's the case, how does FDB compare to ScyllaDB now that they both have secondary indexes?

all0c7y ago

ryanworl7y ago

FDB does not automatically index your data, but you can write a layer (like this one) to index your data.

misframer7y ago

I wrote a blog post on how to implement secondary indexes using an ordered key-value store a couple of years ago: https://misfra.me/2017/01/18/how-to-implement-secondary-inde...

It would work with FoundationDB, RocksDB, etc. I actually learned these techniques when I interned at FoundationDB but have used them the most with other K-V systems.
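As a rough illustration of the general idea (not the code from the blog post, and not the Record Layer's implementation): in an ordered key-value store, a secondary index can be a second set of keys that puts the indexed value before the primary key, so a prefix scan finds every match. The `OrderedKV` class below is a hypothetical in-memory stand-in, and the key layout is an assumption chosen for the example.

```python
import bisect


class OrderedKV:
    """Tiny in-memory stand-in for an ordered key-value store (hypothetical)."""

    def __init__(self):
        self._keys = []  # tuple keys, kept sorted
        self._vals = {}  # key -> value

    def set(self, key, value):
        if key not in self._vals:
            bisect.insort(self._keys, key)
        self._vals[key] = value

    def get(self, key):
        return self._vals.get(key)

    def clear(self, key):
        if key in self._vals:
            del self._vals[key]
            self._keys.remove(key)

    def scan_prefix(self, prefix):
        """Yield (key, value) for every key starting with `prefix`, in key order."""
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i][:len(prefix)] == prefix:
            yield self._keys[i], self._vals[self._keys[i]]
            i += 1


def put_user(kv, user_id, city):
    """Write a record and keep its index entry in sync (one transaction in a real store)."""
    old_city = kv.get(("user", user_id))
    if old_city is not None:
        kv.clear(("idx", "city", old_city, user_id))  # drop the stale index entry
    kv.set(("user", user_id), city)
    kv.set(("idx", "city", city, user_id), b"")  # the index key itself carries the data


def users_in_city(kv, city):
    """Look up primary keys through the index with a single prefix scan."""
    return [key[-1] for key, _ in kv.scan_prefix(("idx", "city", city))]


kv = OrderedKV()
put_user(kv, 1, "Berlin")
put_user(kv, 2, "Paris")
put_user(kv, 3, "Berlin")
put_user(kv, 1, "Paris")  # user 1 moves; the old index entry is removed
print(users_in_city(kv, "Berlin"), users_in_city(kv, "Paris"))
```

The part that needs ACID transactions is `put_user`: the record write and the index write have to commit together, otherwise the index can drift out of sync with the data.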

abalone7y ago· 2 in thread

Apple low key does some cool server projects with a Java bent. They've contributed to Netty (well, they hired core developers).[1]

Just a fun prediction.. but it wouldn't be the first time Apple pulled something like that.

[1] https://www.infoq.com/presentations/apple-netty

[2] https://github.com/apple/swift-nio

ascagnel_7y ago

abalone7y ago

They wouldn’t need to. They already partner with IBM for that stuff.

spullara7y ago· 2 in thread

[1] https://github.com/spullara/havrobase

[2] https://javarants.com/havrobase-a-searchable-evolvable-entit...

thejerz7y ago

What were the pros and cons of using FDB over HBase?

spullara7y ago

Several things caused us to move off of HBase:

I don't think I will ever again use anything from the Hadoop ecosystem if I can get away with it.

bcx7y ago· 2 in thread

I learned that basically all of iMessage and Contacts is stored on FoundationDB. It's pretty great that this is making it into open source. Thanks Apple!

ryanworl7y ago

Are you using FDB at Olark?

(Saw it in your profile)

bcx7y ago

No :) Just off the shelf DBs so far. But the FoundationDB guys are HS friends ;).

georgewfraser7y ago· 2 in thread

mcintyre19947y ago

thanatos_dem7y ago

Why not both? FoundationDB announced a MongoDB API-compatible document layer in November - https://www.foundationdb.org/blog/announcing-document-layer/

akavel7y ago· 1 in thread

dominotw7y ago

It's a distributed ACID k/v layer that other models can be built on top of.

So you could build something like PostgreSQL or Elasticsearch on top of FoundationDB.

ryanworl7y ago

Congrats to the team at Apple for getting this released! They have had a busy few months with getting the document layer released, the FDB Summit, and now the record layer.
