Performance Improvements Using Judy Arrays (opens in new tab)

exabrial13y ago

holy wow!

twistedpair13y ago

So if you want to run a big site on Ruby, slowly migrate all the Ruby bits you use heavily to C/++? Seems to be the storyline of their blog.

quandrum13y ago

Well, maybe to put it more generously:

1. Build a site in ruby (insert langauge/framework the creators are most comfortable/familiar with and can iterate quickly). 2. Build a community of a certain size that causes it scaling problems. 3. Identify the hotspots and re-implement them in more performant way.

This seems entirely reasonable to me. In fact, I'm guessing this is the path that all applications that get to Github scale take. Of course, it's easy to be snarky and identify Ruby as the problem. (Building the community is the hard part!)

dekz13y ago

You're assuming that writing the code in a different language immediately makes it performant. Would you still not perform the following: 1) Write app in Go 2) Identify hotspots 3) Improve/rework code of hotspot.

icebraining13y ago

Their problem was mostly related to the choice of data structure and the object allocation and subsequent garbage collection, not dynamic typing. I don't see how any of those languages would have helped, except by having a better garbage collector, in the case of Java and Scala.

jemfinch13y ago· 5 in thread

The article's credibility is significantly reduced by referring to Judy Trie lookups as O(log n) operations. They're actually O(log w) operations: the number of operations is bounded by the array's word size, not the size of the Judy array. A million elements in a Judy32 array will cause at most 7 node hops, not the 20 you'd expect from a binary tree.

mcintyre199413y ago

This is the genius of tries. We had to code up the data structures for Twitter for our coursework, for some reason as if it was all in memory. I found a suffix tree (not as clever, it branches on character, so normalised there were 36, and is designed for full text searching) was a really clever way to store that data because search time for a phrase doesn't increase as your data set does. Unrealistic for something like Twitter, but a really useful trick that this article definitely misses.

Peaker13y ago

Twitter's data is all in memory. That memory may not be random-access memory, though.

A pet peeve of mine is how people can do all the CS necessary to get nice data structures for their in-RAM data, but seem to forget everything they know and use very bad structures when they spill to flash or disk storage.

[1] http://en.wikipedia.org/wiki/Hey_Jude

aidenn013y ago

Actually its O(w) and so are optimal hash-tables (since the hashing function must be O(w).

Interestingly enough you can consider hash tables and Tries to be O(log n) since in order to have N distinct keys, the maximum key length must be at least log(N).

AaronFriel13y ago

7 node hops? It should be at most 4. Judy32 has a maximum branching factor of 256, and log base 256 of 2^32 is 4.

jamesaguilar13y ago

Iirc tries are not dense

jrochkind113y ago· 2 in thread

This is great. Someone should change the title so it's descriptive of what's going on, although I have no idea what that title should be, heh.

This post is about Github moving some code from ruby to C for performance, MRI's GC, and judy trie-like data structure and how it compares wrt to both performance and memory consuption with hash implementations.

127001brewer13y ago

"... I have no idea what that title should be, heh."

It's a reference to the Beatles' song, "Hey Jude" (which opens with "Hey, Jude, don't make it bad | Take a sad song and make it better ...").[1]

Edited for clarity.

mitchh13y ago

The issue is that the title, while clever, doesn't give a even a remote idea of what the post is about.

nthitz13y ago· 2 in thread

Sometimes Mods change the title to be the original article title. And sometimes they change it from the original article title to something else!

anateus13y ago

Yes... they change it to the original unless it's really clear the original change was super necessary. In cases when the original title tells you nothing of the content, they change it to something useful. Seems like a pretty transparent and reasonable heuristic to me.

nthitz13y ago

I agree that it's reasonable. I don't feel it's applied uniformly though.

revelation13y ago· 2 in thread

Something here doesn't sit right with me. Please don't launch into a lecture on algorithmic complexity and cache lines if you are running on Ruby on Rails.

jknightco13y ago

Slow platforms are exactly where you need to talk algorithmic complexity. Switching to a faster environment would only reduce a routine's execution time by a constant multiplier—switching algorithms has the potential to do much more.

revelation13y ago

That's not what I mean. I think that any half-decent implementation in C would have been better than their current Ruby solution. And instead of spending all that time to research Judy arrays and other magic bullets, they could have already identified another problem zone and fixed it.

This just feels like "we have this huge and slow Ruby behemoth here, but using magic Judy arrays we made this little gear turn insanely fast!".

lobster_johnson13y ago· 1 in thread

While this is very interesting, the real problem here is not Linguist stuff, but the fact that it's crammed into the web app.

A web app involved in a site as complex as Github's should really contain only the parts required to service web pages. A service like language classification is clearly better designed as a standalone service.

Aside from working around GC overhead, compositing big app from many smaller apps have many advantages. You force yourself to encapsulate complexity with strictly focused, network-friendly APIs (we use REST ourselves), which changes the way you have to think about programs. Hiding the implementation behind an API also allows you to swap out the implementation without the client ever knowing. Since the client and server are separated, you have fine-grained control over how much resources you are willing to spend on which components. And so on.

Having done monolithic Rails apps for many years, it's a model I am not going to back. Today, our web apps contain only UI logic. No database, no models, no data logic, just UI all the way. Everything else is handled by specialized components.

lobster_johnson13y ago

[Sorry about the horrendous spelling/grammar errors. I was in a hurry.]

habosa13y ago· 1 in thread

Wow this is an incredibly interesting and accessible article on performance tuning. I have two questions:

1) Is there an easy tutorial somewhere on calling out to native code from Ruby?

2) Could the team at Github possible give a little more detail on what you did to get all of those pretty benchmarking graphs? How did you get all of the info on memory usage and CPU activity (I assume it wasn't just time {command} > file.txt)

steveklabnik13y ago

> 1) Is there an easy tutorial somewhere on calling out to native code from Ruby?

The canonical documentation is here: https://github.com/ruby/ruby/blob/trunk/README.EXT

pjmlp13y ago· 1 in thread

I wonder if something like Crystal could have helped.

IanChiles13y ago

That looks amazing! Hadn't seen this before, but it's what I've been after for a long time.

thezilch13y ago· 1 in thread

For starters, the blog post was informative and should get some creative juices flowing for applications in others' stacks -- piqued my interests in hacking core utilities. However...

The graphs never show a 30K token mark, which is what the heap-allocation counter showed for their [programming language] classifier.

It's not clear to me that much RSS was saved. Maybe 20Mb? So, the next question is how many classifiers are running, where such a "hit" actually matters. Again, there is no 30K mark on the runtime graphs, but let's assume the generation to the graphs' left are linear. It looks like we're saving a half-second and removing most of the jitter, but it's not made clear how the removal of GC chunking has any effect on processing outside of the classifier. I just can't imagine how a couple of these classifiers can't keep up with the push rate of files to Github -- the classification improvements are neither low-hanging or part of the BigO in Github's stack. The process seems very able to run asynchronous at push time and un-deferred, if unrun, on page view. Unless they're seeing more than 1 million pageviews/sec (per classifier; just run more? one 20Mb service per httpd server?), because I can't really imagine hitting more than 10x their tokens (300K tokens), which is still only ~50Mb; the RSS graph, again, has a weird scale to suggest they might grow their tokens by 67x.

driverdan13y ago

They said in the article that they want to expand the classification data so my guess is they will be going well beyond 30k tokens.

ww52013y ago· 1 in thread

Isn't Judy Array patented? What's the legal status using it?

dalke13y ago

If patent does apply (and there's been no statement or claims about which ones do definitely apply), then people assume it's US6735595: Data structure and storage and retrieval method supporting ordinality based searching and data retrieval.

The software from HP is under the LGPLv2.1 which says "Finally, software patents pose a constant threat to the existence of any free program. We wish to make sure that a company cannot effectively restrict the users of a free program by obtaining a restrictive license from a patent holder. Therefore, we insist that any patent license obtained for a version of the library must be consistent with the full freedom of use specified in this license."

That can imply that there are no additional patent protections should you use the LGPL'ed version of Judy or its derivatives. YMMV. Consult your nearest patent attorney for details.

The patent expires in 7.5 years.

jekub13y ago

There is some interesting things in this post but the main problem is not using Judy or something else, or talking about memory or complexity. The main problem I see here is using the wrong tool.

Replace the token matcher with a simple classifier (a maxent will work very well here) with n-gram of characters features through a hash-kernel and you get a very accurate, fast and low memory system.

I've build one two-years ago who accurately classified a bit over 100 different languages with only 64k features in the end. (So requiring only 8*64ko of memory) And this was without using file extensions as they weren't available in our case.

Before any hard optimizations, first check the methods used, and next the algorithm, anything else should go after.

prezjordan13y ago

Some previous discussion on Judy Arrays [0]. Apparently they are patented!

[0]: https://news.ycombinator.com/item?id=5043667

jerrysievert13y ago

Judy arrays are also faster in node.js: http://legitimatesounding.com/blog/Faster_sometimes_Associat...

the issue is with the patents - has anyone been contacted yet about the usage?

jongraehl13y ago

1. this could be done offline, or cached

2. unconvinced that they can't do better w/ the right hash table (which will be thousands of lines fewer of code). but since they're using a library, i'll give them a pass. not even a benchmark vs. a quick hash table version makes all the claims about "judy arrays are better for our purpose" suspect.

kristianp13y ago

So each language is stored with a prefix in the Judy array. This means that to identify the language of a token, you have to loop for all languages, prefixing the token and looking it up, keeping a count of matches for each language. Does that sound correct?

I wouldn't have used the prefix approach, instead storing a token once in the judy array, and using the data stored to indicate which languages match the token.

alexdowad13y ago

This is awesome!!! I remember looking for an object allocation profiling tool for Ruby 1.9+ some time ago, and not finding one (http://stackoverflow.com/questions/12204012/allocation-trace...). When will Github's --debug-objects option make it into MRI???

ikeepforgetting13y ago

Somebody finally found a use for them!

j / k navigate · click thread line to collapse

51 comments

39 comments · 17 top-level

exabrial13y ago· 6 in thread

I would have liked to seen the Judy array implementation in pure Ruby, so we could compare apples to apples. I'm not trying to troll... but basically they solved their problem by:

* Avoiding Ruby language features

* Rewriting it in a different language

This is why I lean towards static languages like Go, Scala, and Java.

pjmlp13y ago

Have a look at Crystal as well.

exabrial13y ago

holy wow!

twistedpair13y ago

So if you want to run a big site on Ruby, slowly migrate all the Ruby bits you use heavily to C/++? Seems to be the storyline of their blog.

quandrum13y ago

Well, maybe to put it more generously:

dekz13y ago

icebraining13y ago

jemfinch13y ago· 5 in thread

mcintyre199413y ago

Peaker13y ago

Twitter's data is all in memory. That memory may not be random-access memory, though.

[1] http://en.wikipedia.org/wiki/Hey_Jude

aidenn013y ago

Actually its O(w) and so are optimal hash-tables (since the hashing function must be O(w).

Interestingly enough you can consider hash tables and Tries to be O(log n) since in order to have N distinct keys, the maximum key length must be at least log(N).

AaronFriel13y ago

7 node hops? It should be at most 4. Judy32 has a maximum branching factor of 256, and log base 256 of 2^32 is 4.

jamesaguilar13y ago

Iirc tries are not dense

jrochkind113y ago· 2 in thread

This is great. Someone should change the title so it's descriptive of what's going on, although I have no idea what that title should be, heh.

127001brewer13y ago

"... I have no idea what that title should be, heh."

It's a reference to the Beatles' song, "Hey Jude" (which opens with "Hey, Jude, don't make it bad | Take a sad song and make it better ...").[1]

Edited for clarity.

mitchh13y ago

The issue is that the title, while clever, doesn't give a even a remote idea of what the post is about.

nthitz13y ago· 2 in thread

Sometimes Mods change the title to be the original article title. And sometimes they change it from the original article title to something else!

anateus13y ago

nthitz13y ago

I agree that it's reasonable. I don't feel it's applied uniformly though.

revelation13y ago· 2 in thread

Something here doesn't sit right with me. Please don't launch into a lecture on algorithmic complexity and cache lines if you are running on Ruby on Rails.

jknightco13y ago

revelation13y ago

This just feels like "we have this huge and slow Ruby behemoth here, but using magic Judy arrays we made this little gear turn insanely fast!".

lobster_johnson13y ago· 1 in thread

While this is very interesting, the real problem here is not Linguist stuff, but the fact that it's crammed into the web app.

lobster_johnson13y ago

[Sorry about the horrendous spelling/grammar errors. I was in a hurry.]

habosa13y ago· 1 in thread

Wow this is an incredibly interesting and accessible article on performance tuning. I have two questions:

1) Is there an easy tutorial somewhere on calling out to native code from Ruby?

steveklabnik13y ago

> 1) Is there an easy tutorial somewhere on calling out to native code from Ruby?

The canonical documentation is here: https://github.com/ruby/ruby/blob/trunk/README.EXT

pjmlp13y ago· 1 in thread

I wonder if something like Crystal could have helped.