Redis re-implemented in Rust

(github.com)

400 points · wmwragg · 11y ago · 106 comments

56 comments · 15 top-level

shmerl · 11y ago · 9 in thread

I was just thinking that Rust is a great candidate for big data processing tools, much more so than Java (which is annoyingly used a lot there). Something like Spark or HDFS should be implemented in Rust.

frankmcsherry · 11y ago

You can contribute to:

https://github.com/frankmcsherry/timely-dataflow

https://github.com/frankmcsherry/differential-dataflow

Or, just tell your friends. :)

Better yet, write some python / pandas / dataframes / whatever_the_cool_kids_need layer on top, and rule the next big data drama cycle.

shmerl · 11y ago

Thanks, those look very interesting! I'll go read about Naiad first :)

pron · 11y ago

The more cores you have and the more RAM, the bigger the advantage GC has. The thing with having lots of RAM is that it's very hard to take advantage of it with on-stack data (which can, at most, use about 1-2% of the total RAM available -- do the math) and thread-local heaps. Once you use thread-local heaps/arenas, you need to shard your data. Any cross-shard access would mean locking, which doesn't scale very well. That's exactly where GCs shine: they let you have scalable, concurrent access to data with arbitrary lifespan. That's why Java is used for those kinds of applications -- it performs and scales much better than Rust can hope to do on large machines.

You are right, though, that if the processing is extremely "batchy" and all data dies at the same time, then it doesn't make a difference.

shmerl · 11y ago

> That's why Java is used for those kinds of applications

I'm not convinced that's the reason why Java is used for it. There are native alternatives like HPCC which claim to perform better.

As was noted, concurrent access to shared data is not very common in such distributed computation scenarios. Well-designed processing will avoid it, and thus avoid the need for locking as well.

1 more reply

Meai · 11y ago

Rust has heap memory as well..?

1 more reply

maxdemarzi · 11y ago

Little secret... with Rust you can just do big data processing on a single core => http://www.frankmcsherry.org/graph/scalability/cost/2015/01/...

shmerl · 11y ago

In many cases distributed computation is needed when there is BIG data, which won't fit on a laptop like in that example. I.e. distribution is needed not only because computation can't be done on one node (i.e. it would take too long), but because data can't possibly fit on any one node.

2 more replies

sampo · 11y ago

Many parallel algorithms become slower if you make them use too many cores, as they spend most of the time in communication, and very little in actual computations. Maybe the parallel systems would have been faster on a smaller number of cores than 128?

nickpsecurity · 11y ago

That's a pretty sad result to get from 128 cores. I've seen amateur Beowulf clusters get better results.

1 more reply

r0naa · 11y ago · 8 in thread

Impressive!

Could someone (or OP) elaborate on the value that re-implementing a whole piece of software in a new language provides compared to just building an interface "bridging" both worlds?

To clarify, my metric for "value" is usefulness to other people. That is, without considering the (interesting) learning opportunity that it represents for the author.

For example, someone developed a Python interface to the Stanford Core-NLP library (written in Java). Would rewriting the Core NLP library in Python be useful to the community? How do you figure out what people need?

I am asking because, while I think it would be a ton of fun and allow me to learn a lot, I also value building useful software, and rewriting a whole system sounds like overkill except in a few very niche cases.

And if I am not mistaken you would need a team at least as large as the parent project to implement new features, fix bugs and keep pace with it. Looking forward to hearing what HNers think!

edit: clarified ambiguities

nickpsecurity · 11y ago

Aside from the project's own reason, there's one very good reason to re-implement a bunch of stuff in Rust: testing whether it delivers. It's being pushed as a safer, better systems language. So, let's take many different C/C++ apps, rewrite them in Rust, and see what the results are across many metrics. By the end of it, we should know Rust's strengths and weaknesses in real-world coding.

themckman · 11y ago

The README answers this:

  To learn Rust.

Edit: It also mentions not being tied to UNIX and appears to claim it will run on Windows. That's certainly something.

r0naa · 11y ago

Sorry if I wasn't clear, but I am looking for a more general answer! I would like to know in which cases it is useful (to other people) and discuss its value compared to writing interfaces to other languages.

1 more reply

coldtea · 11y ago

> Could someone (or OP) elaborate on the value that re-implementing a whole piece of software in a new language provides compared to just building an interface "bridging" both worlds?

Making it safer and even catching bugs in the original implementation (both things Rust will help with)?

Making it integrate seamlessly with the new language's ecosystem? E.g. Lucene is Java, and someone could use that, but there are tons of ports of it in C, C++, Python etc, providing convenience to integrate it with projects in these languages.

>And if I am not mistaken you would need a team at least as large as the parent project to implement new features, fix bugs and keep pace with it.

Not necessarily. A project with 10 part time contributors could be matched with a project with 2-3 full time competent hackers for example, or even surpassed.

frik · 11y ago

> Lucene is Java ... there are tons of ports of it in C, C++, Python, etc.

There used to be several ports, though most are dead and/or are several major versions behind. A new C++ or Rust port would be great, though unrealistic given the huge project size.

1 more reply

deet · 11y ago

This project specifically is probably not useful for others just because it is written in another language (unless it were to succeed in fixing problems or improving security or performance.) Redis is a server application, not a library, so there's a clearly defined and interoperable protocol to bridge the Redis server to other languages already. Writing a Redis client for the other language would be more useful.
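To make "clearly defined and interoperable protocol" concrete, here is a minimal sketch of how a client might frame a command, assuming the RESP convention of sending a command as an array of bulk strings. `encode_command` is a hypothetical helper name for illustration, not part of any Redis client library.

```python
def encode_command(*args: str) -> bytes:
    """Frame a command as a RESP array of bulk strings.

    Hypothetical helper: real clients also handle replies, errors,
    pipelining and reconnection, none of which is shown here.
    """
    # Array header: '*' followed by the number of elements.
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = arg.encode("utf-8")
        # Each element is a bulk string: '$' + byte length, then the bytes.
        parts.append(b"$%d\r\n" % len(data))
        parts.append(data + b"\r\n")
    return b"".join(parts)


if __name__ == "__main__":
    print(encode_command("SET", "key", "value"))
```

Because the framing is this small, a client in a new language is mostly a matter of writing these bytes to a TCP socket and parsing the replies, which is why a client is usually a cheaper route than a server rewrite.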

More generally, as coldtea mentions, easier integration into the rest of the language's ecosystem is the primary benefit of rewriting in another language.

The value of such a port to others depends on how easy it is to integrate between the two languages, either via libraries or other methods. The harder it is to integrate the two (and the scarcer automated translation tools are), the more valuable the rewrite is to others.

Your Core-NLP example is actually an interesting one, because that library has already been ported to other languages... It is available for the C#/F# ecosystem (http://sergey-tihon.github.io/Stanford.NLP.NET/).

kbenson · 11y ago

In this case, it's a learning exercise. Sometimes, it's because there are other benefits, such as making it easier to distribute or easier to tie into specific parts of the algorithm than may be possible by calling a library that provides a high level interface.

unoti · 11y ago

From a learning perspective, reimplementation has a key advantage over other kinds of projects: the design is completely done, so you can focus exclusively on the mechanics of implementation.

thedufer · 11y ago · 6 in thread

Before anyone starts using this as a Redis replacement on Windows as the readme suggests, take a look at the TODO file. Notable missing features include:

- maxmemory key eviction

- hash values

- ~2/3 of the set operators

- multi/exec

- lua scripting

This is an interesting and potentially useful effort, but a replacement for Redis it is not.

shankun · 11y ago

By the way, if you are looking for a production-quality Windows port of Redis, there is a fork available at https://github.com/MSOpenTech/redis. We (Microsoft) provide it in production as Azure's cache service today, and are committed to continuing to work on it.

kawsper · 11y ago

Although it sparked some debate when Microsoft ported Redis to Win32 with libuv (http://oldblog.antirez.com/post/redis-win32-msft-patch.html) I am impressed by their commitment that the fork is still going 4 years later.

ddlutz · 11y ago

Are you part of the Azure cache service? I was an intern on the Edge Caching and Storage team and am rejoining full-time next month. If things are the same as they were, it would be worth exploring Redis for our cache, and I'd like to talk details.

1 more reply

netcraft · 11y ago

I want to thank you for that work and look forward to 3.0 there.

seppo0010 · 11y ago

Also notice that the main developer only has two months of experience with Rust, so it is probably not as stable and well tested as Redis. As stated in the README, the main goal is to learn Rust.

Zancarius · 11y ago

The README does seem a bit ambitious, but it notes that the purpose was to learn Rust.

I'm sure pull requests to bring it up to feature parity would be welcome!

unfamiliar · 11y ago · 5 in thread

Could somebody give me a tl;dr on Redis? I keep hearing about it but from the summary I can't tell what kind of applications it is being used for.

twic · 11y ago

The phrase I like most is "data structure server". It's basically a giant heap that you can fill with data structures - strings, lists, sets, sorted sets, maps, bitsets, and this wacky HyperLogLog thing:

http://redis.io/topics/data-types-intro

The data structures are all addressed by string keys.

Redis can persist this heap to disk, and load it again, so you get a measure of durability, but the typical use case is for data you can afford to lose - caches, metrics, some kinds of events, etc.
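As a rough illustration of the "data structure server" idea, here is a toy in-process sketch: one dictionary keyed by strings, whose values are themselves data structures. `TinyStore` and its methods are hypothetical names loosely modelled on Redis commands (LPUSH, SADD, SINTER); there is no networking, persistence, or type checking, so it only shows the shape of the model.

```python
from collections import deque


class TinyStore:
    """Toy model: string keys mapping to strings, lists, or sets."""

    def __init__(self):
        self.data = {}

    def set(self, key, value):
        self.data[key] = value

    def get(self, key):
        return self.data.get(key)

    def lpush(self, key, *values):
        # Push each value onto the head of the list, creating it if needed.
        lst = self.data.setdefault(key, deque())
        for value in values:
            lst.appendleft(value)
        return len(lst)

    def lrange_all(self, key):
        return list(self.data.get(key, ()))

    def sadd(self, key, *members):
        # Return how many members were newly added.
        members_set = self.data.setdefault(key, set())
        before = len(members_set)
        members_set.update(members)
        return len(members_set) - before

    def sinter(self, *keys):
        sets = [self.data.get(key, set()) for key in keys]
        return set.intersection(*sets) if sets else set()


if __name__ == "__main__":
    store = TinyStore()
    store.sadd("friends:alice", "bob", "carol")
    store.sadd("friends:dave", "carol", "erin")
    print(store.sinter("friends:alice", "friends:dave"))
```

The point of a real server is that many processes on many machines share this one heap over the network, instead of each holding its own copy.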

Redis's key non-functional strengths are speed and robustness. Operations people love it because you stick it in production and it just quietly keeps on working without needing attention or setting your CPU on fire.

To my mind, any project should have PostgreSQL as its first data store. But it should probably have Redis as its second, when it finally has some data that needs to be changed or accessed so fast that PostgreSQL can't keep up.

(Kafka is third)

aaggarwal · 11y ago

From their official GitHub page, Redis is an in-memory database that persists on disk. The data model is key-value, but many different kinds of values are supported: Strings, Lists, Sets, Sorted Sets, Hashes, HyperLogLogs, Bitmaps.

It simply means that the key-value store is directly loaded into the memory (RAM) and is available for fast access, but the data is retained (persistent) even after the application is closed.

It is usually used as a cache store, or for queuing messages to communicate between processes, locally or distributed.

rakoo · 11y ago

To add to that, Redis is a TCP server so you can speak to it from multiple processes on multiple machines (and Redis will easily support huge loads)

Its data structures cover a good part of what you'd need from generic data structures, which makes Redis an easy way to do the logic of, say, a list intersection of friends common to multiple people, or a sorted set of goods ranked by their amount, all of this shared with other processes.

Redis also offers pubsub capabilities in two forms:

- A standard PUB/SUB couple which does what you think it does

- Blocking pop on a list for a client, and a push for another client, which will "wake up" the first one with the value.
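The second form can be sketched as an in-process analogy, assuming Python threads stand in for the two clients and `queue.Queue` stands in for the list. Against a real server the equivalent would be one client issuing a blocking pop such as BLPOP and another issuing a push such as LPUSH or RPUSH. `blocking_pop_demo` is a hypothetical name.

```python
import queue
import threading


def blocking_pop_demo():
    """One 'client' blocks on a pop until another 'client' pushes."""
    jobs = queue.Queue()
    received = []

    def consumer():
        # Blocks here until a value arrives (or the 5 s timeout expires).
        received.append(jobs.get(timeout=5))

    worker = threading.Thread(target=consumer)
    worker.start()

    # The push wakes the waiting consumer and hands it the value.
    jobs.put("job-1")
    worker.join()
    return received


if __name__ == "__main__":
    print(blocking_pop_demo())
```

Unlike PUB/SUB, where every subscriber sees a message, a pushed value here is handed to exactly one waiting consumer, which is what makes the pattern usable as a work queue.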

It's a very versatile swiss knife.

kaitnieks · 11y ago

It's a memory cache to take some load off your DB. The good thing about Redis is the useful data structures it supports: lists, sets, hashes, bitmaps and somewhat more specific sorted sets.

iagooar · 11y ago

Put key & read key with straight-forward abstractions. Simple and beautiful.

It can be used for caching, queues and for applications with volatile data.

sudhirj · 11y ago · 4 in thread

I'm trying the same thing for similar reasons in Go, but I'm wondering if at some point a Go version would perform better than C. On a machine with a large number of cores, perhaps?

GitHub.com/sudhirj/restis

Also wondering if some rethinking is possible - would an HTTP interface a la DynamoDB be more useful? Can complexity and performance be increased by using a purely in-memory backend with no disk persistence? If there were pluggable backends, would a Postgres or DynamoDB backend be more useful for terabytes / petabytes of data? Is the beauty of Redis the API or the implementation?

endymi0n · 11y ago

> but I'm wondering if at some point a Go version would perform better than C.

The answer is "no" with a certain amount of probability. Redis isn't single threaded by lack of capability, but by design. Concurrency for multiple CPUs will actually slow down a lot of the stuff you see, as you will need to introduce locking mechanisms.

Also, memory management is highly tuned and customized in Redis for the use case of an in-memory DB (in stark contrast to the usual allocation patterns of an application), to the point where it's almost impossible to replicate the performance in a garbage-collected language.

I love Go and we're a 100% Go (and Angular) shop, but for an in-memory DB it wouldn't be a sane choice.

pzduniak · 11y ago

https://github.com/siddontang/ledisdb

vidarh · 11y ago

You can turn off disk persistence in Redis. My main usage of Redis is with disk persistence turned off (it handles a few hours' worth of samples of data that we don't care if we lose - we care about having long term averages only of the data in question).

There should be minimal overhead from having the capability in Redis due to the way it implements disk snapshots (RDB snapshots are done by fork()'ing and relying on copy on write to let the main process keep on doing its thing while the child process writes the snapshot, so the main process doesn't need to care; other than that Redis offers logging/journalling of changes, but the cost of having that as an option is trivial if it's switched off).
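The fork-based snapshot idea can be sketched roughly as below, assuming a Unix-like system (`os.fork` is not available on Windows) and using JSON as a stand-in for the real dump format. `snapshot_with_fork` and `demo` are hypothetical names. CPython's reference counting touches memory pages, so this shows the semantics of the technique rather than its memory efficiency.

```python
import json
import os
import tempfile


def snapshot_with_fork(data, path):
    """Fork a child that writes `data` as it looked at fork time."""
    pid = os.fork()
    if pid == 0:
        # Child: sees a copy-on-write view of the parent's memory.
        try:
            with open(path, "w") as handle:
                json.dump(data, handle)
        finally:
            os._exit(0)  # exit immediately, skipping the parent's cleanup
    return pid  # parent: returns straight away and keeps serving


def demo():
    data = {"counter": 1}
    fd, path = tempfile.mkstemp(suffix=".json")
    os.close(fd)

    pid = snapshot_with_fork(data, path)
    data["counter"] = 2  # parent mutates; the child's view is unaffected
    os.waitpid(pid, 0)

    with open(path) as handle:
        snapshot = json.load(handle)
    os.remove(path)
    return snapshot, data


if __name__ == "__main__":
    print(demo())
```

The parent never takes a lock or pauses for the write, which is the property described above: the snapshot is consistent as of the fork, and the main process carries on.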

Having pluggable backends for things like Postgres or DynamoDB seems a bit at odds with the purpose of Redis, which is exactly that you pay the (very low) cost of in-memory manipulation of simple data structures, though if a single Redis server could partition the key space between plugins, it might potentially be useful by letting you e.g. move keys between backends and still access them transparently to the client. E.g. for the samples I mentioned above, we roll up and transfer data to a CouchDB instance for archival now (doesn't matter that it's CouchDB really - we just need a persistent key-value store; Postgres or DynamoDB would also both have worked fine), but if I could archive them while still having them visible in a Redis-like server together with the in-memory keys, that'd make the client a tiny bit simpler.

For most Redis usage, I think paying the cost of connection setup and teardown and sending HTTP headers etc. would likely slow things down immensely. At least it would for my usage. Having an HTTP interface as an addition might be useful in some scenarios to enable new use cases, but as a replacement for the current Redis API it would be a disaster.

If you want to explore alternative interfaces, I'd instead suggest going in the opposite direction, and experimenting with a UDP interface. In a typical data centre setting packet loss is low enough that while you'd need retry logic, it wouldn't necessarily get exercised much in normal situations.

(On the other hand, for the typical request/reply cycle it might very well not give any benefits vs. TCP in most scenarios, where multiple requests/replies are done over a single connection, thus amortising the connection setup cost - would be interesting to benchmark, though)

alexchamberlain · 11y ago

Ah, HTTP is your hammer

vicpara · 11y ago · 4 in thread

Why would someone do that? To what end? Why isn't anyone re-writing Redis in assembler to have it kick ass like pros? Can you write Windows in rust?

derefr · 11y ago

What I'm personally really surprised about is that nobody's rewriting Redis as a unikernel to clear away all the OS context-switching/networking overhead from its basic operations.

Jweb_Guru · 11y ago

Redis is already leaving plenty of performance on the table, e.g. by not having any concurrent shared memory data structures (the fastest concurrent hash tables achieve better throughput even on inserts than the fastest single-threaded ones). It does this in the name of implementation simplicity. People focused on implementation simplicity don't generally abandon the operating system.

1 more reply

itamarhaber · 11y ago

The "Why" is @seppo0010's to answer (but having it run as-is on all OSs is a big plus, for one). As for writing it in assembler, that makes little practical sense since Redis is written in (ANSI) C and is quite well optimized. In fact, if you profile Redis you'll see that very little time is actually spent by the code itself - OS, storage and network are usually the real bottlenecks.

adamrt · 11y ago

It's at the very top of the readme. Here is a direct link to your question. https://github.com/seppo0010/rsedis#why

undefined0 · 11y ago · 2 in thread

For me, Redis was the first software written in C which I could easily customize with additional features (as I have little C knowledge). It was written beautifully. I've been learning Rust, I certainly find learning Rust easier than C and I like the fact that I can dive into software written in Rust without worrying about GC. You've given me the best of both worlds by writing Redis in Rust. With that said, I'm still having an easier time reading Redis in C than your code, as it lacks comments and well-chosen function/variable names. I admire your work nonetheless.

seppo0010 · 11y ago

That's a fair criticism, and well taken, but keep in mind I'm learning the language as I go, and most of the commits are just rewriting things because they were suboptimal, not idiomatic, or hard to read. At this stage, I would consider adding comments wasteful.

I also have no intention of making this project live as long or have as many users as Redis does.

illumen · 11y ago

Learning to write readable code is a good thing, and I think worth the effort.

Can rust be readable?

1 more reply

kibwen · 11y ago · 1 in thread

Since this seems to be just a learning project, note as well that there exist Rust bindings to Redis itself, from Armin Ronacher (though I'm not sure if they've yet been updated to work on 1.0): https://github.com/mitsuhiko/redis-rs

the_mitsuhiko · 11y ago

Yep, works with 1.0.

wyaeld · 11y ago · 1 in thread

The readme says it's a learning project.

It's a very interesting piece of work though.

I'll be interested to see Antirez's view on the trade-offs between C and Rust for this.

seppo0010 · 11y ago

He is at least curious. https://twitter.com/antirez/status/611189939519229952

ahmetmsft · 11y ago · 1 in thread

Care to post details about this? Is this actually fast? Does it implement all features and guarantees of Redis? Should anybody actually use this in production (maybe because it works on Windows)? Is it well tested?

Looks like a really cool effort, but authors of open source projects often think people will read the code and figure it all out. The truth is people usually look at what's in the README, and that's all the attention span most people are going to have. My 2c: improve your README.md.

detaro · 11y ago

He links a list of missing stuff in the readme.

And if you read "Why? To learn rust" and ask "should I use this in production"...

resca79 · 11y ago

I like this kind of project. But the use case of Redis is a little bit extreme: its main features are speed and the way memory consumption is handled. If these requirements are not satisfied, it is only a very good way to learn Rust (the author's goal) and the Redis internals.

GeertVL · 11y ago

So how do you re-implement something like Redis in another language? Is it more of a translation job, or do you start by splitting out the concepts and try to implement them? Or do you take the idea and go your own way with implementing it?

clu3 · 11y ago

Man you should have named it Rudis

beyondcompute · 11y ago

Spectacular! Could you add synchronous replication though? And coalescing queries (so that entire system processes queries in batches, say 300 times per second)?

vamitrou · 11y ago

Is it compatible with the .rdb redis dumps?
