> We figured we could tune the garbage collector to happen more often in order to prevent large spikes, so we implemented an endpoint on the service to change the garbage collector GC Percent on the fly. Unfortunately, no matter how we configured the GC percent nothing changed. How could that be? It turns out, it was because we were not allocating memory quickly enough for it to force garbage collection to happen more often.
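For readers wondering what "an endpoint to change the GC percent on the fly" might look like, here is a minimal sketch in Go. It is not Discord's code: the handler name, the `/debug/gcpercent` path and the `value` query parameter are made up for illustration. The only real API it relies on is `debug.SetGCPercent`, which returns the previous setting.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"runtime/debug"
	"strconv"
)

// setGCPercentFromString parses a GOGC-style percentage and applies it,
// returning the previous setting. A negative value disables the collector.
func setGCPercentFromString(s string) (previous int, err error) {
	pct, err := strconv.Atoi(s)
	if err != nil {
		return 0, fmt.Errorf("invalid GC percent %q: %w", s, err)
	}
	return debug.SetGCPercent(pct), nil
}

// gcPercentHandler is a hypothetical admin endpoint, e.g.
// GET /debug/gcpercent?value=50
func gcPercentHandler(w http.ResponseWriter, r *http.Request) {
	value := r.URL.Query().Get("value")
	previous, err := setGCPercentFromString(value)
	if err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	fmt.Fprintf(w, "GC percent changed from %d to %s\n", previous, value)
}

func main() {
	// Exercise the handler in-process instead of opening a real port.
	req := httptest.NewRequest(http.MethodGet, "/debug/gcpercent?value=50", nil)
	rec := httptest.NewRecorder()
	gcPercentHandler(rec, req)
	fmt.Print(rec.Body.String())
}
```

As the quote says, lowering the percentage only helps if the heap actually grows between cycles; a service that allocates very little will not reach the trigger sooner.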
As someone not too familiar with GC design, this seems like an absurd hack. That this 2-minute hardcoded limitation is not even configurable comes across as amateurish even. I have no experience with Go -- do people simply live with this and not talk about it?
> The ballast in our application is a large allocation of memory that provides stability to the heap.
> As noted earlier, the GC will trigger every time the heap size doubles. The heap size is the total size of allocations on the heap. Therefore, if a ballast of 10 GiB is allocated, the next GC will only trigger when the heap size grows to 20 GiB. At that point, there will be roughly 10 GiB of ballast + 10 GiB of other allocations.
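The arithmetic in the quote can be sketched in a few lines of Go. This is an illustration under simplifying assumptions: `nextGCTarget` is a made-up helper that applies the "live heap plus GOGC percent" rule and ignores details such as stacks and globals, and the ballast here is 64 MiB rather than 10 GiB so the program runs anywhere.

```go
package main

import (
	"fmt"
	"runtime"
)

// nextGCTarget estimates the heap size at which the next collection starts,
// given the live heap after the previous cycle and the GOGC percentage.
func nextGCTarget(liveHeap, gogc uint64) uint64 {
	return liveHeap + liveHeap*gogc/100
}

func main() {
	const gib = 1 << 30

	// The ballast is a pointer-free []byte, so the GC does not scan its
	// contents. 64 MiB here; the quoted example uses 10 GiB.
	ballast := make([]byte, 64<<20)

	// With a 10 GiB live heap and GOGC=100 the target is 20 GiB.
	fmt.Println(nextGCTarget(10*gib, 100) / gib) // prints 20

	// Keep the ballast reachable until the end of main.
	runtime.KeepAlive(ballast)
}
```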
Also, it is almost trivial to edit the Go sources (they are included in the distribution) and rebuild it, which usually takes just a minute. So Go is really suited for your own experiments - especially, as Go is implemented in Go.
Well, parts of it. You can't implement "make" or "new" in Go yourself, for example.
Obviously they consider spending 50% more on hardware a worthwhile compromise for the gains they get (e.g. reduced developer hours, reduced risk of security flaws, and avoiding other effects of invalid pointers).
Ruby 1.8.x wants to say "Hello"
> We kept digging and learned the spikes were huge not because of a massive amount of ready-to-free memory, but because the garbage collector needed to scan the entire LRU cache in order to determine if the memory was truly free from references.
So maybe this is one of those things that just doesn't come up in most cases? Maybe most services also generate enough garbage that that 2-minute maximum doesn't really come into play?
But (virtually) nobody is writing games in Go, so it's entirely possible that it's an unusual case in the Go ecosystem. Being an unsupported usecase is a great reason to switch language.
You could maybe hack around the GC performance without destroying the aims of LRU eviction by batching additions to your LRU data structure to reduce the number of pointers by a factor of N. It's also possible that a Go BTree indexed by timestamp, with embedded data, would provide acceptable LRU performance and would be much friendlier on the cache. But it might also not have acceptable performance. And Go's lack of generic datastructures makes this trickier to implement vs Rust's BtreeMap provided out of the box.
This is something important to know before choosing a GC-based language for a task like this. I don't think "generating more garbage" would help, the problem is the scan is slow.
If Discord was forced to do this in pure Go, there is a solution, which is basically to allocate a []byte or a set of []bytes, and then treat it as an expanse of memory yourself, managing hashing, etc., basically doing manual arena allocation yourself. GC would drop to basically zero in that case because the GC would only see the []byte slices, not all the contents as individual objects. You'll see this technique used in GC'd languages, including Java.
But it's tricky code. At that point you've shucked off all the conveniences and features of modern languages and in terms of memory safety within the context of the byte expanses, you're writing in assembler. (You can't escape those arrays, which is still nice, but hardly the only possible issue.)
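To make the []byte technique concrete, here is a minimal sketch in Go. Everything in it is hypothetical (the `arena` type, the numeric keys, the 4-byte length prefix), and it leaves out exactly the hard parts: eviction, compaction, and reclaiming the space of overwritten values.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// arena stores variable-length values back to back in one []byte.
// The index maps a numeric key to an offset. Neither the buffer nor the
// map holds pointers, so the GC has nothing to trace inside them.
type arena struct {
	buf   []byte
	index map[uint64]uint32
}

// newArena reserves a fixed capacity up front. Offsets are uint32, so this
// sketch assumes the capacity stays below 4 GiB.
func newArena(capacity int) *arena {
	return &arena{buf: make([]byte, 0, capacity), index: make(map[uint64]uint32)}
}

// put appends a 4-byte length prefix followed by the value. It returns
// false when the arena is full. Overwriting a key leaves the old bytes
// behind as dead space.
func (a *arena) put(key uint64, value []byte) bool {
	if len(a.buf)+4+len(value) > cap(a.buf) {
		return false
	}
	off := uint32(len(a.buf))
	var hdr [4]byte
	binary.LittleEndian.PutUint32(hdr[:], uint32(len(value)))
	a.buf = append(a.buf, hdr[:]...)
	a.buf = append(a.buf, value...)
	a.index[key] = off
	return true
}

// get returns a copy of the stored value so callers never alias the arena.
func (a *arena) get(key uint64) ([]byte, bool) {
	off, ok := a.index[key]
	if !ok {
		return nil, false
	}
	n := binary.LittleEndian.Uint32(a.buf[off : off+4])
	out := make([]byte, n)
	copy(out, a.buf[off+4:off+4+n])
	return out, true
}

func main() {
	a := newArena(1 << 20)
	a.put(42, []byte("read state"))
	v, ok := a.get(42)
	fmt.Println(string(v), ok)
}
```

Every offset computation above is unchecked by the type system, which is the "writing in assembler" feeling the comment describes: a wrong length prefix silently corrupts neighbouring entries instead of failing to compile.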
Which is, of course, where Rust comes in. The tricky code you'd be writing in Go/Java/other GC'd language with tons of tricky bugs, you end up writing with compiler support and built-in static checking in Rust.
I would imagine the Discord team evaluated the option of just grabbing some byte arrays and going to town, but it's fairly scary code to write. There are too many ways to count for such code to end up with a 0.00001% bug that results in something like the entire data structure getting intermittently trashed every six days on average, virtually impossible to pick up in testing and possibly even escaping canary deploys.
Probably some other languages have libraries that could support this use case. I know Go doesn't ship with one and at first guess, I wouldn't expect to find one for Go, or one I would expect to stand up at this scale. Besides, honestly, at feature-set maturity limit for such a library, you just end up with "a non-GC'd inner platform" for your GC'd language, and may well be better off getting a real non-GC'd platform that isn't an inner platform [1]. I've learned to really hate inner platforms.
By contrast... I'd bet this is fairly "boring" Rust code, and way, way less scary to deploy.
It is worth it to read and understand: https://blog.twitch.tv/en/2019/04/10/go-memory-ballast-how-i...
If these issues were more common, there would be more configuration available.
[EDIT] to downvoters: I'm not saying it's not an issue worth addressing (and it may have already been since they were on 1.9), I was just answering the question of "why this might happen"
GOGC=off
As someone mentions below. More details here: https://golang.org/pkg/runtime/
I think with that, you could turn off GC after startup, then turn it back on at desired intervals (e.g. once an hour or after X cache misses).
It's definitely risky though. E.g. if there is a hiccup with the database backend, the client library might suddenly produce more garbage than normal, and all instances might OOM near the same time. When they all restart with cold caches, they might hammer the database again and cause the issue to repeat.
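The "turn GC off, then run it at chosen times" idea could be sketched roughly like this in Go. The function name and the fixed-interval policy are assumptions for illustration; a real service might trigger on cache misses instead, as suggested above. `debug.SetGCPercent(-1)` has the same effect as `GOGC=off`.

```go
package main

import (
	"fmt"
	"runtime"
	"runtime/debug"
	"time"
)

// collectEvery disables automatic collection and runs a full GC on each
// tick until stop is closed, then restores the previous GC percent.
// It returns the number of manual collections it performed.
func collectEvery(interval time.Duration, stop <-chan struct{}) (cycles int) {
	previous := debug.SetGCPercent(-1) // same effect as GOGC=off
	defer debug.SetGCPercent(previous)

	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ticker.C:
			runtime.GC() // blocks until the collection finishes
			cycles++
		case <-stop:
			return cycles
		}
	}
}

func main() {
	stop := make(chan struct{})
	go func() {
		time.Sleep(50 * time.Millisecond)
		close(stop)
	}()
	fmt.Println("manual GC cycles:", collectEvery(10*time.Millisecond, stop))
}
```

Nothing in this sketch guards against the OOM scenario described above. On Go 1.19 and later, `debug.SetMemoryLimit` (or `GOMEMLIMIT`) can act as a backstop because the runtime still collects when the limit is approached, but that was not available for the Go versions discussed in this thread.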
IIRC I have used GitLab and Bitbucket and self-hosted Gitea instances the same exact way, and I'm fairly sure there was an hg repo in one of those. Don't recall doing anything out of the ordinary compared to how I would use a github URL.
Sometimes it means an easy thing in most other languages is difficult or tiresome to do in Go. Sometimes it means hard-coded values/decisions you can't change (only tabs anyone?).
But overall this makes for a language that's very easy to learn, where code from project to project and team to team is very similar and quick to understand.
Like anything, it all depends on your needs. We've found it suits ours quite well, and migrating from a Ruby code base has been a breath of fresh air for the team. But we don't have the same performance requirements as Discord.
These are two things that make a lot of sense at Google if you read why they were done.
But unless you're working at Google, I struggle to guess why you would care about either of these things. The first requires sacrificing anything resembling a reasonable type system, and even with that sacrifice Go doesn't really deliver: are we really supposed to buy that "go generate" isn't a compilation step? The second is sort of nice, but not nice enough to be a factor in choosing a language.
The core language is currently small, but every language grows with time: even C with its slow-moving, change-averse standards body has grown over the years. Currently people are refreshed by the lack of horrible dependency trees in Go, but that's mostly because there aren't many libraries available for Go: that will also change with time (and you can just not import all of CPAN/PyPI/npm/etc. in any language, so Go isn't special anyway).
If you like Go for some aesthetic of "simplicity", then sure, I guess I can see how it has that. But if we're discussing pros and cons, aesthetics are pretty subjective and not really worth talking about.
The JVM world tends to solve this problem by using off-heap caches. See Apache Ignite [0] or Ehcache [1].
I can't speak for how their Rust cache manages memory, but the thing to be careful of in non-GC runtimes (especially non-copying GC) is memory fragmentation.
It's worth mentioning that the Dgraph folks wrote a better Go cache [2] once they hit the limits of the usual Go caches.
From a purely architectural perspective, I would try to put cacheable material in something like memcache or redis, or one of the many distributed caches out there. But it might not be an option.
It's worth mentioning that Apache Cassandra itself uses an off-heap cache.
[0]: https://ignite.apache.org/arch/durablememory.html [1]: https://www.ehcache.org/documentation/2.8/get-started/storag... [2]: https://blog.dgraph.io/post/introducing-ristretto-high-perf-...
"Remarkably, we had only put very basic thought into optimization as the Rust version was written. Even with just basic optimization, Rust was able to outperform the hyper hand-tuned Go version."
For those who care, I was interested how off-heap caching works in Java and I did some quick searching around the Apache Ignite code.
The meat is here:
- GridUnsafeMemory, an implementation of access to entries allocated off-heap. This appears to implement some common Ignite interface, and invokes calls to a “GridUnsafe” class https://github.com/apache/ignite/blob/53e47e9191d717b3eec495...
- This class is the closest to the JVM’s native memory, and wraps sun.misc.Unsafe: https://github.com/apache/ignite/blob/53e47e9191d717b3eec495...
- And this, sun.misc.Unsafe, is what it’s all about: http://www.docjar.com/docs/api/sun/misc/Unsafe.html
It’s very interesting because I did my fair share of JNI work, and context switches between JVM and native code are typically fairly expensive. My guess is that this class was likely one of the reasons why Sun ended up implementing their (undocumented) JavaCritical* etc functions and the likes.
Unsafe was one of the cooler aspects to Java that Oracle is actively killing for, well, no good reason at least.
Aren't these Unsafe memory read and write methods intrinsified by any serious compiler? I don't believe they're using JNI or doing any kind of managed/native transition, except in the interpreter. They turn into the same memory read and write operations in the compiler's intermediate representation as Java field read and writes do.
Yeah, but I really don't buy your argument.
When you are reduced to doing manual memory management and fighting the GC of your language, maybe you should simply not use a language with GC in the first place.
They are right to use Rust (or C/C++) for that. It's not for nothing that Redis (C) is so successful in the LRU domain.
> It's worth mentioning that Apache Cassandra itself uses an off-heap cache.
And still ScyllaDB (C++) is able to completely destroy Cassandra in terms of average latency [0]
As far as I know, a mark-and-sweep collector like Go's doesn't have any advantage over malloc/free when it comes to memory fragmentation. Am I missing some way in which Go's GC helps with fragmentation?
I only glossed over the article but the problem they had with Go seems to be the GC incurred from having a large cache. Their cache eviction algorithm was efficient, but every 2 minutes there was a GC run which slowed things down. Re-implementing this algorithm in Rust gave them better performance because the memory was freed right after the cache eviction.
Splitting it across more processes will result in more cache misses and more DB calls.
Now I'm wondering if there's a Rust library for a generational copying arena--one that compacts strings/blobs over time.
You cannot use a caching server at that scale with those latency requirements. It has to be embedded
Can you speak to why using something like memcache or redis may not be an option?
https://go.scylladb.com/7-reasons-no-external-cache-database...
Looks like this issue was resolved for maps that don't contain pointers by [1]. From the article, sounds like the map keys were strings (which do contain pointers, so the map would need to be scanned by the GC).
If pointers in the map keys and values could be avoided, it would have (if my understanding is correct) removed the need for the GC to scan the map. You could do this, for example, by replacing string keys with fixed-size byte arrays. Curious if you experimented with this approach?
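A minimal sketch of the fixed-size key idea in Go, assuming IDs fit in 16 bytes and never contain NUL bytes (otherwise the zero padding would make distinct IDs collide). The `userKey` type and `makeKey` helper are hypothetical names, not anything from the article.

```go
package main

import "fmt"

// userKey is a fixed-size, pointer-free replacement for a string key.
type userKey [16]byte

// makeKey copies id into a zero-padded userKey. It reports false if the
// id does not fit, rather than silently truncating it.
func makeKey(id string) (userKey, bool) {
	var k userKey
	if len(id) > len(k) {
		return k, false
	}
	copy(k[:], id)
	return k, true
}

func main() {
	// Both key and value types are free of pointers, which is the
	// condition the comment above describes for the GC skipping the map.
	lastRead := make(map[userKey]int64)

	if k, ok := makeKey("user-42"); ok {
		lastRead[k] = 1580000000
	}

	k, _ := makeKey("user-42")
	fmt.Println(lastRead[k])
}
```

The trade-off is that every lookup pays for a 16-byte copy and compare, and variable-length data has to live somewhere else.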
After spending weeks fighting with Java's GC tuning for a similar production service tail latency problem, I wouldn't want to be caught having to do that again.
Once I spend even the plurality of my time cleaning up messes instead of doing something new (and there are ways to do both), then all the life is sucked out of me and I just have to escape.
Telling me that I have to keep using a tool with known issues that we have to process or patches to fix would be super frustrating. And the more times we stumble over that problem the worse my confirmation bias will be.
Even if the new solution has a bunch of other problems, the set that is making someone unhappy is the one that will cause them to switch teams or quit. This is one area where management is in a tough spot with respect to rewrites.
Rewrites don't often fix many things, but if you suspect they're the only thing between you and massive employee turnover, you're between a rock and a hard place. The product is going to change dramatically, regardless of what decision you make.
Really all languages with tracing GC are at a disadvantage when you have a huge number of long-lived objects in the heap. The situation is improved with generational GC (which Go doesn't have) but the widespread use of off-heap data structures to solve the problem even in languages like Java with generational GC suggests this alone isn't a good enough solution.
In Go's defense, I don't know another GC'ed language in which this optimization is present in the native map data structure.
Go 1.9 is fairly old (1.14 is about to pop out), and there have been large improvements on tail latency for the Go GC over that period.
One of the Go 1.12 improvements in particular seems to at least symptomatically line up with what they described, at least at the level of detail covered in the blog post:
https://golang.org/doc/go1.12#runtime
“Go 1.12 significantly improves the performance of sweeping when a large fraction of the heap remains live.”
The problem is that garbage collectors are optimized for applications that mostly have short-lived objects, and a small amount of long-lived objects.
Things like large in-RAM LRU are basically the slowest thing for a garbage collector to do, because the mark-and-sweep phase always has to go through the entire cache, and because you're constantly generating garbage that needs to be cleaned.
I think it's not quite that.
Applications typically have a much larger old generation than young generation, i.e. many more long lived objects than short lived objects. So GCs do get optimized to process large heaps of old objects quickly and efficiently, e.g. with concurrent mark/sweep.
However as an additional optimization, there is the observation that once an application has reached steady state, most newly allocated objects die young (think: the data associated with processing a single HTTP request or user interaction in a UI).
So as an additional optimization, GCs often split their heap into a young and an old generation, where garbage collecting the young generation earlier/more frequently overall reduces the amount of garbage collection done (and offsets the effort required to move objects around).
In the case of Go though, the programming language allows "internal pointers", i.e. pointers to members of objects. This makes it much harder (or much more costly) to implement a generational, moving garbage collector, so Go does not actually have a young/old generation split nor the additional optimization for young objects.
There would be no need for a GC to traverse the entire map, but that's because Rust doesn't use a GC.
- You'd have to ensure that your large data structure gets allocated entirely within the special region. That's simple enough if all you have is a big array, but it gets more complicated if you've got something like a map of strings. Each map cell and each string would need to get allocated in the special region, and all of the types involved would need new APIs to make that happen.
- You'd have to ensure that data structures in your special region never hold references to anything outside. Since the whole point of the region is that the GC doesn't scan it, nothing in the region will be able to keep anything outside the region alive. Any external references could easily become dangling pointers to freed memory, which is the sort of security vulnerability that GC itself was designed to prevent.
All of this is doable in theory, but it's sufficiently difficult, and it comes with sufficiently many downsides, that it makes more sense for a project with these performance needs to just use C or Rust or something.
[1] https://docs.oracle.com/javase/9/gctuning/garbage-first-garb...
The real business problem is that Java generates a boatload of garbage, so the GC needs a lot more performance tuning to make the application run normally.
As you fight your language, your GC avoidance system will become larger and larger. At some point you might re-evaluate your latency requirements, your architecture, and which are the right tools for the job.
Probably, yeah. But the Golang team would never add such a feature because of their philosophy of keeping the language simple.
Rust is faster than Go. People use Go, like any other technology, when the tradeoffs between developer iteration/throughput/latency/etc. make sense. When those cease to make sense, a hot path gets converted down to something more efficient. This is the natural way of things.
Well, sure, because categorizing languages as "valid/invalid" doesn't make any sense.
But it does show yet another example of how designing a language to solve Google's fairly-unique problems doesn't result in a general-purpose language suitable for solving most people's problems.
Go is actually great to solve most people's problem with web servers, while Rust is better for edge cases.
How much of Google's infrastructure actually runs on Go tho? :)
This shows only a single example where Go is not very suitable, but it doesn't prove a general case on its own.
Also note this was with Go 1.9. I know GC work was ongoing during that time; I wonder if this type of situation would still happen?
> We tried upgrading a few times. 1.8, 1.9, and 1.10. None of it helped. We made this change in May 2019. Just getting around to the blog post now since we've been busy.
https://www.reddit.com/r/programming/comments/eyuebc/why_dis...
> Another Discord engineer chiming in here. I worked on trying to fix these spikes on the Go service for a couple weeks. We did indeed try moving up the latest Go at the time (1.10) but this had no effect.
> For a more detailed explanation, it helps to understand what is going on here. It is not the increased CPU utilization that causes the latency. Rather, it's because Go is pausing the entire world for the length of the latency spike. During this time, Go has completely suspended all goroutines which prevents them from doing any work, which appears as latency in requests.
> The specific cause of this seems to be because we used a large free-list like structure, a very long linked list. The head of the list is maintained as a variable, which means that Go's mark phase must start scanning from the head and then pointer chase its way through the list. For whatever reason, Go does (did?) this section in a single-threaded manner with a global lock held. As a result, everything must wait until this extremely long pointer chase occurs.
> It's possible that 1.12 does fix this, but we had tried upgrading a few times already on releases that promised GC fixes and never saw a fix to this issue. I feel the team made a pragmatic choice to divest from Go after giving the language a good attempt at salvaging the project.
EDIT: Actually, no it didn't, I misunderstood it.
It isn't surprising to me. It's stated elsewhere they tried 4 different versions of Go, up through 1.10 apparently, and had performance problems with all of them. At some point you can't suffer garbage collector nonsense anymore and since they'd already employed Rust on other services they tried it here.
It worked on the first try.
That's not surprising either.
What would be surprising is if any of these "but version such and such is Waaay better and they should just use that" actually panned out. The best case would be that the issue just manifests as some other garbage collector related performance problem. That's the deal you sign up for when you saddle yourself with a garbage collector.
Go is what would have happened if Bell Labs wrote Java.
In that case, Go is Bell Labs' second attempt at Java.
(And Kernighan was their floor-mate too, that must have been a stunningly great environment)
And Unix is what happened when Bell Labs wrote an operating system -- something that was born outdated from the start.
Just like Golang.
I also think it is great that Discord is using the right tool for the job. It isn't often that you need the performance gains that Rust & Tokio provide, so pick what works best to get the job done and iterate.
> Rust is blazingly fast and memory-efficient: with no runtime or garbage collector, it can power performance-critical services, run on embedded devices, and easily integrate with other languages.
I’m not so sure they would have done the rewrite if the Go GC was performing better, and the choice of Rust seems primarily based on prior experience at the company writing performance sensitive code rather than delivering business value.
We have 2 golang services left, one of them has a rewrite in rust in PR as of last week (as a fun side project an engineer wanted to try out.)
Additionally, as we move towards a more SOA internally, we plan to write more high velocity data services, and rust will be our language of choice for that.
Kinda like this: https://blog.sentry.io/2016/10/19/fixing-python-performance-... ?
This isn’t exactly “Linux kernel: now in Rust!”
Glad you’re making tech for you all better.
We get to take up the externalized runtime costs of the mess that is the Electron app.
Engineers are super efficient at offloading the last mile of effort.
They were able to rewrite their hot spot in a new language without having to rewrite all their business logic in a new language. Not that there wouldn’t have been solutions with a monolith, but this certainly seems elegant and precise.
B) They weren't allocating a lot, and Go was enforcing a GC sweep every 2 minutes, and it was spending a lot of time on their LRU cache. To "reduce allocations" they had to cut their cache down, which negatively impacted latency.
> These latency spikes definitely smelled like garbage collection performance impact, but we had written the Go code very efficiently and had very few allocations. We were not creating a lot of garbage.
The problem was due to the GC scanning all of their allocated memory and taking a long time to do so, regardless of it all being necessary and valid memory usage.
In many languages with GC you can actually do manual memory management relatively easily with a few helper functions. You write your own allocate() and free() functions/methods. When you allocate, you check the free list first; if nothing is available, you do a normal allocation. When you call free, you add the object to the free list. If your memory management leaks, it triggers GC.
Usually you only need to do that kind of thing in a few places and a few data structures to cut GC by 90%.
The article also states that it was quite easy to port over and didn't need any quirky tuning.
Collections are one of the big areas where Go's lack of generics really hurts it. In Go, if one of the built-in collections does not meet your needs, you are going to take a safety and ergonomic hit going to a custom collection. In Rust, if one of the standard collections does not meet your needs, you (or someone else) can create a pretty much drop-in replacement that has a similar ergonomic and safety profile.
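The allocate()/free() pattern described above might look something like this in Go. The `entry` and `entryPool` types are invented for the example, the pool is not safe for concurrent use, and Go's standard library already offers `sync.Pool` for a related (but GC-cooperative) purpose.

```go
package main

import "fmt"

// entry is a stand-in for whatever object the hot path keeps recycling.
type entry struct {
	id   uint64
	data [64]byte
}

// entryPool hands out *entry values, reusing released ones before
// falling back to a normal allocation.
type entryPool struct {
	free      []*entry
	allocated int // number of fresh allocations, kept for illustration
}

// allocate checks the free list first and only allocates when it is empty.
func (p *entryPool) allocate() *entry {
	if n := len(p.free); n > 0 {
		e := p.free[n-1]
		p.free[n-1] = nil
		p.free = p.free[:n-1]
		return e
	}
	p.allocated++
	return new(entry)
}

// release clears the entry and puts it on the free list. Using e after
// this call is a use-after-free bug that the compiler will not catch.
func (p *entryPool) release(e *entry) {
	*e = entry{}
	p.free = append(p.free, e)
}

func main() {
	p := &entryPool{}
	a := p.allocate()
	p.release(a)
	b := p.allocate()
	fmt.Println(a == b, p.allocated)
}
```

This reduces how much garbage is created, but pooled objects are still reachable heap objects, so a tracing collector still has to visit them on every cycle.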
A corollary to this is that adding more generic collections to Go’s standard library implies expanding the set of magical constructs.
(anecdotal: in Java I've never needed anything else than a HashMap or an ArrayList)
EDIT: Currently at -4 downvotes. Would downvoters care to discuss their votes?
But "it's for Google, and you aren't Google" isn't a novel perspective, doesn't leave me with new insights, and isn't really actionable for either Google or people who aren't Google.
Usually this criticism is leveled at Go's dependency management story, with the implication being that it's suited to Google's monorepo but not normal people's repo habits. It's not clear to me how the criticism relates to the issues discussed in the article, which seem to be more about the runtime and GC behavior.
Your comment also doesn't come off as amusing or otherwise entertaining, so it feels like you're just dunking on Go users without really aiming to make anyone's day better.
Disclaimer: I use Go at work and think it's incredibly frustrating at times.
No, I wouldn't say Go is specific to Google's problems, though I'm sure some of the engineers had them in mind. I see Go used far more outside of Google than in.
It was made by people who had been designing languages for about 40 years now. While some design choices seem weird, they usually have very strong argumentation and solid experience behind them.
Also if you read the list of problems that Go is intended to solve, you will be surprised how common they are in software development.
I suppose that won't stop the GC from scanning the memory though ... so maybe they had something akin to that. I assume that a company associated with games and with some former games programmers would have thought to use pool allocators. Honestly, if that strategy didn't work then I would be a bit frustrated with Go.
I have to say, out of all of the non-stop spamming of Rust I see on this site - this is definitely the first time I've thought to myself that this is a very appropriate use of the language. This kind of simple yet high-throughput workhorse of a system is a great match for Rust.
The allocations were not the issue; the article notes that they did little to no allocation, hence the GC only running on forced triggers (every 2 minutes).
I realize you're not advocating pervasive use of the technique, but if someone reading this is going to make pervasive use of manually managed object pools in a GC'd language, they should at least consider the possibility of moving to a language with both good language support for manually managed memory and a good ecosystem of tooling around manual memory management.
Manually managed object pools in a language designed around GC don't fully get rid of the costs of GC, and re-expose the program to most of the errors (primarily use-after-free, double-free, and leaks related to poorly reasoned ownership) that motivated so much effort in developing garbage collectors in the first place.
It appears that Go has a lower CPU floor, but it's killed by the GC spikes, presumably due to the large cache mentioned by the author.
This is interesting to me. It suggests that Rust is better at scale than Go; I would have thought that Go, with its mature concurrency model and implementation, would have been optimized for such cases, while Rust would shine in smaller services with CPU-bound problems.
Great post!
Their use case doesn't seem to have either consideration (note that even when these are considerations a hybrid of languages is often a good idea) so there isn't a compelling reason to choose C++. That doesn't mean C++ is wrong, just that there is nothing wrong with rust. Maybe a great C++ programmer can get a few tenths of a percent faster code (mostly because compiler writers spend more effort figuring out how to optimize C++ - rust uses the same llvm optimizer but it might sometimes do something less optimal because it assumed C++ input), but in general if the difference matters in your environment you are too close to the edge and need to scale.
Rust might be easier/faster to write than modern C++. If so that is a point in favor of rust. They seem to have people who know rust, which is important. There might be more people who know C++, but I can take any great programmer and make them good in any programming language in a few weeks in the worst case (worst case would be writing a large program in intercal or some such intentionally hard language) - not to be confused with expert which takes more experience.
For me the interesting part is that their new implementation in Rust with a new data structure is less than 2x faster than an implementation in Go using a 2+ year old runtime.
It shows how fast Go is vs. a very optimized language + new data structure with no GC.
Overall I'm pretty sure there was a way to make the spikes go away.
Still great post.
Can someone explain to me how BTreeMap is more memory efficient than a HashMap?
Note that this explanation is a bit handwavy, as both data structures have numerous optimizations in production scenarios.
Why would BTreeMap be faster than HashMap? HashMap performance is O(1), while BTreeMap performance is O(log N).
Which brings me to my second point: hashtable based data structures are not worst-case O(1). They are worst-case O(n), because in the worst case, you will either have to scan every entry in your table (open addressing) or walk a list of size n (separate chaining). Of course, good hashtable implementations will not allow a situation with so many collisions, but in order to avoid that, they will need to allocate a new table and copy over the contents of the old, which is also a O(n) operation.
Given two kinds of data structures, one which is average-case O(1) but worst-case O(n), versus one which is best- and worst-case O(log n), which one you choose depends on what kinds of performance you're optimizing for, and how bad the constants are that we've been ignoring. If you care more about throughput, then you usually want average-case O(1), as the occasional latency spikes aren't important to you. But if you care more about latency, then you'll probably want to choose worst-case O(log n), assuming that its implementation's constants aren't too bad.
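The occasional O(n) insert caused by table growth is easy to see in a toy. The following Go sketch is a deliberately naive open-addressing set (identity hash, zero as the empty marker, growth at 50% load); it is not how Go's or Rust's real hash maps are implemented. It only counts how many entries a single insert has to move.

```go
package main

import "fmt"

// probeSet is a toy open-addressing set of non-zero uint64 keys.
// A zero slot means "empty", so callers must not insert 0.
type probeSet struct {
	slots []uint64
	count int
}

func newProbeSet() *probeSet { return &probeSet{slots: make([]uint64, 8)} }

// place stores key using linear probing and reports whether it was new.
// len(slots) must be a power of two with at least one empty slot.
func place(slots []uint64, key uint64) bool {
	mask := uint64(len(slots) - 1)
	for i := key & mask; ; i = (i + 1) & mask {
		if slots[i] == key {
			return false
		}
		if slots[i] == 0 {
			slots[i] = key
			return true
		}
	}
}

// insert adds key and returns how many existing entries had to be moved
// into a bigger table to make room for it.
func (s *probeSet) insert(key uint64) (moved int) {
	if (s.count+1)*2 > len(s.slots) { // keep the load factor at or below 1/2
		bigger := make([]uint64, len(s.slots)*2)
		for _, k := range s.slots {
			if k != 0 {
				place(bigger, k)
				moved++
			}
		}
		s.slots = bigger
	}
	if place(s.slots, key) {
		s.count++
	}
	return moved
}

func main() {
	s := newProbeSet()
	for k := uint64(1); k <= 9; k++ {
		if moved := s.insert(k); moved > 0 {
			fmt.Printf("insert %d moved %d entries\n", k, moved)
		}
	}
}
```

Most inserts move nothing, but the one that crosses the load threshold moves every entry already in the table. That single slow call is the latency spike being weighed against a tree's steady O(log n).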
2. Memory usage on a hash map would be worse especially if the fill ratio is relatively low.
They're probably right, because Google doesn't need it. But for everyone else who decided to use a language designed to solve Google's fairly-unique problems as if it were a general-purpose language: that kind of sucks, doesn't it?
Write their code in a functional style. Get the benefits of the Erlang BEAM platform.
Their system runs over the web, so time sensitivity isn’t as important, in comparison to video games, VR, or AR.
Anyone ever done a performance comparison breakdown between something like Elixir vs. Rust?
No. It would have been unshippably bad. BEAM is generally fairly slow. It was fast at multitasking for a while, but that advantage has been claimed by several other runtimes in 2020. As a language, it is much slower than Rust. Plus, if you tried to implement a gigantic shared cache map in Erlang/Elixir, you'd have two major problems: One is that you'd need huge chunks of the map in single (BEAM) processes, and you'd get hit by the fact BEAM is not set up to GC well in that case. It wants lots of little processes, not a small number of processes holding tons of data. Second is that you'd be trading what in Rust is "accept some bytes, do some hashing, look some stuff up in memory" with generally efficient, low-copy operations, with "copy the network traffic into an Erlang binary, do some hashing, compute the PID that actually has the data, send a message to that PID with the request, wait for the reply message, and then send out the answer", with a whole lot of layers that expect to have time to make copies of lots of things. Adding this sort of coordination into these nominally fast lookups is going to slow this to a crawl. It's like when people try to benchmark Erlang/Elixir/Go's threading by creating processes/goroutines to receive two numbers and add them together "in parallel"; the IPC completely overshadows the tiny amount of work being done. (They mention tokio, but that's still going to add a lot less coordination overhead than Erlang messages.)
Go is a significantly better language for this use case than Elixir/Erlang/BEAM is, let alone Rust.
(This is not a "criticism" of Erlang/Elixir/BEAM. It's an engineering analysis. Erlang/Elixir/BEAM are still suitable for many tasks, just as people still use Python for many things despite the fact it would be a catastrophically bad choice for this particular task. This just isn't one of the tasks it would be suitable for.)
I get for certain core code situations, you want to manage all memory safety yourself (or use built in static GC), but beyond that it seems to me at a higher level you'd rather have the automatic GC. Why burden all of your developers rather than just a core few?
I don't think GC issues are a compelling argument to move everything to Rust. I'm not saying there aren't compelling arguments, but it just seems a bit odd that that's their main argument.
"Go sucked for us because we refused to own our tooling and make a special allocator for this service. Switching to Rust forced us to do that, and life got better"
Don't really know about Go versus Rust for this purpose, but don't really care because read states (like nearly everything that makes Discord less like IRC) is an anti-feature in any remotely busy server. Anything important enough that it shouldn't be missed can be pinned, and it encourages people to derail conversations by replying out of context to things posted hours or days ago.
The big thing IMO is that once started I normally leave Discord running, and most actions within Discord itself feel very snappy, e.g. you click on a voice channel and you're instantly there. I think that's what they mean: they're trying to keep the delay for such an action low. Sometimes you click a voice channel and there's a few seconds of delay; those are for some reason more annoying than the long(ish) startup time.
The idea for this is (if I remember correctly) to be able to return unused memory to the OS. As returning memory requires a GC to run, it is forced at time intervals. I am a bit surprised that they didn't contact the Go developers, as they seem to be interested in practical use cases where the GC doesn't perform well. Besides the fact that newer Go releases improved GC performance, I am a bit surprised that they didn't just increase this time interval to an arbitrarily large number and check whether their issues went away.
To be fair, most languages without GCs also don't have good language constructs to support manual memory management. If you're going to make wide use of manual memory management, you should think very carefully about how the language and ecosystem you're using help or hinder your manual memory management.
This blog post kinda internally matches our upgrade to std::futures and tokio 0.2, away from futures 0.1.
I imagine a bunch of front-end servers managing open WebSocket connections, and also providing filtering/routing of newly published messages. Alas, it’s probably best categorized as a multicast-to-server, multicast-to-user problem.
Anyways, if there’s an elegant solution to this problem, would love to learn more.
(from 2015): "Go is building a garbage collector (GC) not only for 2015 but for 2025 and beyond: A GC that supports today’s software development and scales along with new software and hardware throughout the next decade. Such a future has no place for stop-the-world GC pauses, which have been an impediment to broader uses of safe and secure languages such as Go." [2]
Also, as others have said, lots of big GC improvements were ignored by insisting on go1.9.2 and not the latest.
I just checked and, as usual, I have an entry labeled "Discord Helper (Not Responding)" in my process list. I don't think I've ever seen it in a normal state.
I'm quite disappointed, though, that they did not update their Go version to 1.13[0][1], which would normally have removed the spike issue and thus the latency, before they moved to Rust...
Rust seems more performant with proper usage (tokio + async), but I'm more worried about the ecosystem, which doesn't seem as mature as Go's.
We could cite the recent[2] drama with Actix...
[0] https://golang.org/doc/go1.13#runtime
[1] https://golang.org/doc/go1.12#runtime
[2] https://github.com/fafhrd91/actix-web-postmortem
https://github.com/golang/go/search?q=forcegcperiod&unscoped...
Go's GC seems kind of primitive.
Go's GC is groundbreaking in several aspects, but it probably needs to provide ways to fine-tune it. Posts like this make me believe that one-size-fits-all settings are yet to be seen.
[0]: https://blog.twitch.tv/en/2019/04/10/go-memory-ballast-how-i...
Yes, reality is more complex since they probably have multi socket servers/NUMA, which might add memory access latencies and atomic updates to the LRU might require a locking scheme, which also isn't trivial (and where async Rust might be useful).
This doesn't only go for Go.
[0] https://blog.cloudflare.com/recycling-memory-buffers-in-go/
This is more from niche to niche. Thought that was interesting, but yet the discussion here wasn't all that different to the usual. Guess it's flamewars always, regardless of popularity.
In the gaming industry there are similar problems with GC, and they were solved with memory pools.
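For anyone who hasn't seen the pattern, a memory pool in Go can be as small as a buffered channel used as a free list. This is only a sketch: `BufferPool`, `NewBufferPool`, `Get` and `Put` are hypothetical names, not anything from Discord's code or from a specific game engine. The idea is that buffers get recycled instead of becoming garbage, so the collector has less churn to deal with.

```go
package main

import "fmt"

// BufferPool is a hypothetical fixed-capacity free list of byte slices.
type BufferPool struct {
	free chan []byte // recycled buffers waiting to be reused
	size int         // length of every buffer handed out
}

// NewBufferPool creates a pool holding at most capacity idle buffers.
func NewBufferPool(capacity, size int) *BufferPool {
	return &BufferPool{free: make(chan []byte, capacity), size: size}
}

// Get returns a recycled buffer if one is idle, otherwise allocates a new one.
// A recycled buffer still contains whatever its previous user wrote.
func (p *BufferPool) Get() []byte {
	select {
	case b := <-p.free:
		return b
	default:
		return make([]byte, p.size)
	}
}

// Put hands a buffer back. If the pool is already full (or the buffer is too
// small), the buffer is simply dropped and left to the garbage collector.
func (p *BufferPool) Put(b []byte) {
	if cap(b) < p.size {
		return
	}
	select {
	case p.free <- b[:p.size]:
	default:
	}
}

func main() {
	pool := NewBufferPool(2, 1024)

	a := pool.Get()
	a[0] = 42
	pool.Put(a)

	b := pool.Get() // same backing array as a, not a fresh allocation
	fmt.Println(len(b), b[0])
}
```

The standard library's `sync.Pool` covers a similar need, but it may drop pooled objects during a GC cycle, so a hand-rolled free list like this is the more predictable option when the goal is to keep allocations off the GC's plate.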
Yeah please tell me again how GC is a superior solution to reference counting in cases when you know exactly when you don't need the object anymore.
(Hint: RC is not GC if the object is deallocating itself)
Rust won again.
Garbage collection has gotten a lot of updates in the last 3 years. Why would you not take the exceedingly trivial step of just upgrading to the latest Go stable in order to at least try for the free win? From the go 1.12 release notes: “Go 1.12 significantly improves the performance of sweeping when a large fraction of the heap remains live. This reduces allocation latency immediately following a garbage collection.” ¯\_(ツ)_/¯ This sounds like “we just wanted to try Rust, ok?” Which is fine. But like, just say that.
> Embracing the new async features in Rust nightly is another example of our willingness to embrace new, promising technology. As an engineering team, we decided it was worth using nightly Rust and we committed to running on nightly until async was fully supported on stable.
> Changing to a BTreeMap instead of a HashMap in the LRU cache to optimize memory usage.
It is always an algorithm change
The thing is, you can allocate memory outside of Go, and the GC will simply ignore such regions, since it only scans regions known to it. (Mmap should work like a charm here.) A drawback is that pointers in such regions will not be counted, but it's easy to work around by copying the whole data, which is encouraged by the language itself.
TBH, Go sucks for storing a large amount of data. As you can see here, even the simplest cache can be problematic. The language is biased towards large datacenters, where the amount of available resources is less of a concern. Say, this problem can be solved by having external cache servers and extra nodes around them. Latency will not be ideal, but the service will survive with minimal changes.
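A rough sketch of the mmap idea, assuming a Unix-like system where `syscall.Mmap` and `syscall.MAP_ANON` are available. The helper names (`allocOffHeap`, `freeOffHeap`, `storeValue`, `loadValue`) and the fixed 8-byte slot layout are invented for illustration. Values are copied in and out as plain bytes, so no Go pointers ever live in the mapped region.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"syscall"
)

// allocOffHeap maps size bytes of anonymous memory. The region is not part of
// the Go heap, so the garbage collector neither scans nor frees it.
func allocOffHeap(size int) ([]byte, error) {
	return syscall.Mmap(-1, 0, size,
		syscall.PROT_READ|syscall.PROT_WRITE,
		syscall.MAP_ANON|syscall.MAP_PRIVATE)
}

// freeOffHeap must be called explicitly; nothing else releases the mapping.
func freeOffHeap(region []byte) error {
	return syscall.Munmap(region)
}

// storeValue copies a value into a fixed-size 8-byte slot (hypothetical layout).
func storeValue(region []byte, slot int, v uint64) {
	binary.LittleEndian.PutUint64(region[slot*8:slot*8+8], v)
}

// loadValue copies a value back out of its slot.
func loadValue(region []byte, slot int) uint64 {
	return binary.LittleEndian.Uint64(region[slot*8 : slot*8+8])
}

func main() {
	region, err := allocOffHeap(1 << 20) // 1 MiB outside the Go heap
	if err != nil {
		panic(err)
	}
	defer freeOffHeap(region)

	storeValue(region, 3, 12345)
	fmt.Println(loadValue(region, 3))
}
```

The price is manual lifetime management: forgetting `freeOffHeap` leaks the mapping, and touching the slice after unmapping it crashes the process.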