* They do raw packet reassembly using gopacket, and gopacket keeps TCP reassembly buffers that can grow without bound when you miss a TCP segment. They capped the buffers, and the huge 5 GB spikes went away.
* They were reading whole buffers into memory before handing them off to YAML and JSON parsers. They passed readers instead.
* They were using a protobuf diffing library that used `reflect` under the hood, which allocates. They generated their own explicit object inspection thingies.
* They stopped compiling regexps on the fly and moved the regexps to package variables. (I actually don't know if this was a significant win; there might just be the three big wins.)
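Here is a minimal sketch of the "cap the buffers" fix from the first bullet. It uses only the standard library, not gopacket's own assembler options. The `streamBuffer` type, `ErrStreamTooLarge`, and the cap are all hypothetical. The idea is that when a stream outgrows its budget, it is marked as an error and its memory is released, instead of being allowed to grow without bound.

```go
package main

import (
	"errors"
)

// ErrStreamTooLarge is returned once a stream has exceeded its cap.
// (Hypothetical name for illustration.)
var ErrStreamTooLarge = errors.New("stream exceeded reassembly buffer cap")

// streamBuffer accumulates bytes for one reassembled stream but refuses to
// grow past maxBytes. After that it drops its data and stays failed.
type streamBuffer struct {
	maxBytes int
	data     []byte
	failed   bool
}

func newStreamBuffer(maxBytes int) *streamBuffer {
	return &streamBuffer{maxBytes: maxBytes}
}

// Write appends p unless doing so would exceed the cap.
func (s *streamBuffer) Write(p []byte) (int, error) {
	if s.failed {
		return 0, ErrStreamTooLarge
	}
	if len(s.data)+len(p) > s.maxBytes {
		s.failed = true
		s.data = nil // release the memory now; the stream is an error from here on
		return 0, ErrStreamTooLarge
	}
	s.data = append(s.data, p...)
	return len(p), nil
}
```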
This is a great article. But none of these seem Go-specific†, or even GC-specific. They're doing something really ambitious (slurping packets up off the wire against busy API servers, reassembling them in userland into streams, and then parsing the contents of the streams). Memory usage was going to be fiddly no matter what they built with. The problems they ran up against seem pretty textbook.
Frankly I'm surprised Go acquitted itself as well as it did here.
† Maybe the perils of `reflect` count as a Go thing; it's worth noting that there's folk wisdom in Go-land to avoid `reflect` when possible.
Buffering is a pretty common bad habit. As programmers, we know stuff is going to go wrong, and we don't want to tell the user "come back later" (or in this case, undercount TCP stream metrics)... we want to save the data and automatically process it when we can so they don't have to. But, unfortunately it's an intrinsic Law Of The Universe that if data comes in at X bytes per second, and leaves at X-k bytes per second, then eventually you will use all storage space in the Universe for your buffer, and then you have the same problem you started with. (Storage limits in mirror may be closer than they appear.) Getting it into your mind that you have to apply back pressure when the system is out of its design specification is pretty crucial. Monitor it, alert on it, fix it, but don't assume that X more bytes of RAM will solve your problem -- there will eventually be a bigger event that exceeds those bounds.
Incidentally, the reason why you can make Zoom calls and use SSH while you download a file is because people added software to your networking stack that drops packets even though buffer space in your consumer-grade router is available. That tells your download to chill out so SSH and video conferencing packets get a chance to be sent to the network. The people that made the router had one focus -- get the highest possible Speedtest score. Throughput, unfortunately, comes at the cost of latency (a full buffer adds buffer size / bandwidth of queueing delay to every single packet!), and it's not the right decision overall.
I don't know where I was going with this rant but ... when your system is overloaded, apply backpressure to the consumers. A packet monitoring system can't do that (people wouldn't accept "monitoring is overloaded, stop the main process"), but it does have to give up at some point. If you don't have any more memory to reassemble TCP connections, mark the stream as an error and give up. If you're dumping HTTP requests into a database, and the database stops responding, you'll just have to tell the HTTP client at the other end "too many requests" or "temporarily unavailable". To make the system more reliable, keep an eye on those error metrics and do work to get them down. Don't just add some buffers and cross your fingers; you'll just increase latency and still be paged to fight some fire when an upstream system gets slow ;)
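This is a sketch of fail-fast backpressure in Go. It assumes a single in-process consumer reading from a buffered channel, and `boundedQueue` and `ErrOverloaded` are hypothetical names. The `select` has a `default` case, which makes the send non-blocking. A full queue therefore becomes an immediate error rather than more buffering.

```go
package main

import (
	"errors"
)

// ErrOverloaded tells the caller to back off, e.g. by mapping it to an
// HTTP 503 or 429. Hypothetical name.
var ErrOverloaded = errors.New("queue full, try again later")

// boundedQueue hands work to a consumer through a fixed-capacity channel.
// When the consumer falls behind, Submit fails fast instead of buffering more.
type boundedQueue struct {
	items chan []byte
}

func newBoundedQueue(capacity int) *boundedQueue {
	return &boundedQueue{items: make(chan []byte, capacity)}
}

// Submit enqueues item if there is room; otherwise it rejects it immediately.
func (q *boundedQueue) Submit(item []byte) error {
	select {
	case q.items <- item:
		return nil
	default:
		return ErrOverloaded
	}
}
```

An HTTP handler could translate `ErrOverloaded` into `http.StatusServiceUnavailable` or `http.StatusTooManyRequests`. The channel capacity is then the one buffer you have consciously decided to pay for.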
Edit to add: I have a few stories here. One of them is about memory limits, which I always put on any production service I run. sum(memory limits) < sum(memory installed in the machine), of course. One time I had Prometheus running in a k8s cluster, with no memory limit. Sometimes people would run queries that took a lot of RAM, and there was often slack space on the machine, so nothing bad happened. Then someone's mouse driver went crazy, and they opened the same Grafana tab thousands of times. On a high memory query. Obviously, Prometheus used as much RAM as it could, and Linux started OOM killing everything. Prometheus died, was rescheduled on a healthy node, and the next group of tabs killed it. Eventually, the OOM killer had killed the Kubelet on every node, and no further progress could be made. The moral of the story is that it would have been better to serve that user 1000 "sorry, Prometheus died horribly and we can't serve your request right now", which memory limits would have achieved. Instead, we used up all the RAM in the Universe to try to satisfy them, and still failed. (What was the resolution? I think we killed the bad browser, which happened to be a dashboard-displaying TV next to our desks. Then kubelets restarted, and I of course updated Prometheus to have a 4G memory limit. Retried 1000 tabs with an expensive query, and Prometheus died and the frontend proxy served 990 of the tabs an error message. Back pressure! It works! You can imagine how fun this story would have been if I had cluster autoscaling, though. Would have just eventually come back to a $1,000,000 AWS bill and a 1000 node Kubernetes cluster ;)
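This is related to Little's Law (L = λW: the average number of items in a system equals the average arrival rate times the average time each item spends in it). If entries are added faster than they can be processed, that is, if the average time to service an entry is longer than the average time between new entries, then W keeps growing, and so does your queue.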
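To make the arithmetic concrete, here is a toy deterministic simulation. It uses fixed per-tick rates, whereas real arrivals are random, and the function names are made up for illustration. With 10 arrivals and 9 services per tick, the backlog grows by one item every tick forever. Flip the rates and the queue stays empty. `littlesL` simply computes L = λW.

```go
package main

// backlogAfter simulates a queue for the given number of ticks: each tick,
// `arrivals` items join and up to `services` items are served. It returns
// how many items are still waiting at the end.
func backlogAfter(arrivals, services, ticks int) int {
	queued := 0
	for t := 0; t < ticks; t++ {
		queued += arrivals
		served := services
		if served > queued {
			served = queued
		}
		queued -= served
	}
	return queued
}

// littlesL returns the average number of items in the system (L), given the
// average arrival rate lambda and the average time in system w: L = λW.
func littlesL(lambda, w float64) float64 {
	return lambda * w
}
```

For example, `backlogAfter(10, 9, 1000)` leaves 1000 items waiting, and `backlogAfter(9, 10, 1000)` leaves none. Likewise, `littlesL(200, 0.5)` says that 200 requests/s, each spending 0.5 s in the system, means about 100 in flight on average.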
Anecdotally, this could be a huge win, depending on how often it's called.
A guy I was working with, new to Go, was writing a router config parser and asked why it was so slow.
The first thing I did was move regexp.Compile out of a hot path into a broader scope. It went from something like 40 seconds down to 2 on my machine.
It would probably be a good idea to add performance hints like 'prefer to put static regular expressions in a package variable' in a linter or go vet.
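Here is a sketch of that fix, using a made-up pattern that matches a router config's `interface` lines. When `regexp.MustCompile` is assigned to a package-level `var`, it runs once at program start. A compiled `*regexp.Regexp` is safe for concurrent use, so every caller can share it. The slow variant is included only for contrast.

```go
package main

import (
	"regexp"
)

// Compiled once at package initialization and shared by every call.
// The pattern is a hypothetical "interface <name>" line from a router config.
var interfaceLine = regexp.MustCompile(`^interface\s+(\S+)`)

// interfaceNameSlow recompiles the pattern on every call (the hot-path mistake).
func interfaceNameSlow(line string) string {
	re := regexp.MustCompile(`^interface\s+(\S+)`)
	if m := re.FindStringSubmatch(line); m != nil {
		return m[1]
	}
	return ""
}

// interfaceName reuses the package-level compiled pattern.
func interfaceName(line string) string {
	if m := interfaceLine.FindStringSubmatch(line); m != nil {
		return m[1]
	}
	return ""
}
```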
That is true. I do find, however, that the explicitness of the Rust way of dealing with memory (whether it be lifetimes, who can and can't mutate it, or who the memory belongs to) makes it much easier to reason about the right way of doing these things.
In C++ the same is often possible, but there is no way to have guarantees at the interfaces. Const is a promise that your function won't mutate something; it doesn't put any restrictions on the caller. Pass by reference doesn't guarantee that the reference will be kept alive.
Go (I'm guessing here, with no experience there) probably has fewer footguns, but how explicit is its memory management?
I'm not saying Rust is better or that they made the wrong choice; it sounds like C++ would let users easily make the same "wrong" choices. It's just interesting to carry the thoughts through a bit further.
IIRC it is `reflect.Type.FieldXXX` which is the main culprit of allocations. Since the number of types in a typical application is bounded and small, you can get pretty far by just precomputing/caching struct fields.
But that's not what the problem was here; the product they were building was using `reflect` in anger. They were relying on something that did magic, pulling a rabbit out of its hat to automatically compare protobuf thingies. They used it on a hot path. The room quickly filled with rabbit corpses. I guess you can blame Go for the existence of those kinds of libraries, but most perf-sensitive devs know that they're a risk.
Assuming the file is on a disk and the 1 MB refers to the system memory: as with any potentially unbounded data, you read and write it in chunks. Reading in data of any kind in whole is only reasonable if you can clearly set an upper bound for its size.
As to the issues you mentioned, there's a few 'adages' you could apply; "always use readers", "don't use reflect if you can help it", "move unchanging expressions to package level", etc.
As opposed to, e.g., Java, which, as I ranted elsewhere in the thread, is a trashy mess. I programmed for over a decade in Java, and yeah, it's only gotten worse over the years. They would have done even more custom processing and bypassing of the layers underneath due to Java's typical copy-happiness.
One big difference is that the Java runtime design intent is more in the vein of 'converting memory into performance'. On HN, Ron Pressler ('pron) has written a bunch of interesting stuff about that over the years
https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...
> PRO-REWRITE: Rust has manual memory management, so we would avoid the problem of having to wrestle with a garbage collector because we would just deallocate unused memory ourselves, or more carefully be able to engineer the response to increased load.
> ANTI-REWRITE: Rust has manual memory management, which means that whenever we’re writing code we’ll have to take the time to manage memory ourselves.
Isn't part of the point of Rust that you don't manage memory yourself, and rather that the compiler is smart enough to manage it for you?
It's also like, a perception thing in some sense. Imagine someone writes some code. They get a compiler error. There are two ways to react to this event:
"Wow the compiler didn't make this work, I have to think about memory all the time."
"Ah, the compiler caught a mistake for me. Thank goodness I don't have to think about this for myself."
Both perceptions make sense, but seem to be in complete and total opposition.
The Rust community has done laudable work in bringing down the cognitive threshold of "manual / compile-time" memory management, but I think we're finding out that the returns are diminishing quickly and there's still quite a chasm between borrow checking and GC with respect to developer velocity.
Are there some examples of that?
What I think people _should_ focus on with Rust versus Go (et al) is that Rust allows you to choose where you _place_ memory. You can choose the stack or the heap. The placement can matter in hot regions of code. Additionally, Rust is pretty in-your-face when it comes to concurrency and sharing memory across thread/task boundaries.
Use channels whenever possible.
Rust = when writing your code, you explicitly describe the ownership and lifetime of your objects and how your functions are allowed to consume/copy etc. them and get safety as a result
C = when writing your code, you explicitly allocate and free your objects and you get no assistance from the language about when it is safe to copy/dereference/free/etc. a pointer/allocation
For trivial cases, kind of. But once you start to do anything remotely sophisticated, no. Everything you do in Rust is checked w.r.t. memory management, but you still need to make many choices about it. All the stuff about lifetimes, borrowing, etc: that's memory management. The compiler's checking it for you, but you still need to design stuff sanely, with memory management (and the checking thereof) in mind. It's easy to back yourself into a corner if you ignore this.
Yes, Rc and equivalents have a performance overhead, but for many use cases it really isn't that bad, since you typically aren't creating tons of copies. In practice, I've found one can ignore lifetimes in almost all cases, even when using references, except when storing them in structs or closures. So really the only extra cost is incrementing the Rc counter when a struct or closure takes a copy; beyond that, the cost is dominated by the malloc/free calls for allocation and deallocation.
The trick is to pre-allocate all your objects and buffers and reuse them in a ring buffer. Similar techniques work in zero-malloc embedded C environments.
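Here is a minimal Go version of the preallocated ring idea, where the `bufferRing` type is hypothetical. All buffers are allocated up front, and `Next` hands them out in rotation. As with the C equivalent, there is a caveat: a caller must be finished with a buffer before the ring comes back around to it.

```go
package main

// bufferRing preallocates a fixed set of buffers and hands them out in
// rotation, so the steady state performs no heap allocation.
// Callers must finish with a buffer before the ring wraps around to it again.
type bufferRing struct {
	bufs [][]byte
	next int
}

func newBufferRing(count, size int) *bufferRing {
	r := &bufferRing{bufs: make([][]byte, count)}
	for i := range r.bufs {
		r.bufs[i] = make([]byte, size)
	}
	return r
}

// Next returns the next buffer in rotation.
func (r *bufferRing) Next() []byte {
	b := r.bufs[r.next]
	r.next = (r.next + 1) % len(r.bufs)
	return b
}
```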
Similarly, idiomatic modern C++ rarely needs new and delete calls, but I'd still say it has manual memory management.
I suppose it's reasonable to talk about degrees of manual-ness, and say that memory management in Rust or modern C++ is less manual than C, but more manual than Go/Python/Java.
In GC languages, memory management is generally done at run time by the interpreter/runtime. In C, memory management is generally done at programming time by the (human) programmer. In Rust, memory management is generally done at compile time by the compiler. There are exceptions in all three cases, but the "default" paradigm of a language informs a lot about how it's designed and used.
From what I've read, Rust memory management actually requires more work but provides fantastic safety guarantees. This could mean that Rust actually lowers productivity at first, but as the complexity of the code base grows, some of that productivity is restored, or even exceeds that of C/C++, because you spend no time chasing runtime memory bugs.
For some products or projects, the costs of shipping a security flaw caused by a memory bug exploit could be high enough that a drop in productivity from Rust relative to C is still more than justified due to external costs that Rust mitigates.
“The module that does this inference was recompiling those regular expressions each time it was asked to do the work.”
“The reason for the allocation was a buffer holding decompressed data, before feeding it to a parser. …the output of the decompression could be fed directly into the parser, without any extra buffer.”
The problem here isn’t that the language has GC, it’s that memory usage was just not considered. If you want performance, you have to pay attention to allocations no matter what kind of memory management your language has. And as the article demonstrates, if you pay attention, you can get performance no matter what kind of memory management your language has.
I don't miss the rote parts of manual memory management, but it had the enormously beneficial side effect of making people consider object lifetimes upfront (to keep the retain graph acyclic) and cultivate occasional familiarity with leak tracking tools. Problematic patterns like the undo queue or query correlator that accidentally leak everything tended to become obvious when writing the code, rather than while running it. These days, I keep seeing those same memory management anti-patterns show up when I ask interviewees to tell a debugging war story. Sometimes I even see otherwise capable devs shooting in the dark and missing when it comes to the "what's eating RAM" problem.
I feel like GC in long-form program development trades a small problem for a big one. Short-form programming can get away with just leaking everything, which is what GC does anyway, so I'm not sure there's any benefit there either.
tl;dr: get off my lawn.
I write almost all of my code in Virgil these days. It is fully garbage-collected but nothing forces you into a trashy style. E.g. I use (and reuse) StringBuilders, DataReaders, and TextReaders that don't create unnecessary intermediate garbage. It makes a big difference.
Sometimes avoiding allocation means reusing a data structure and "resetting" or clearing its internal state to be empty. This works if you are careful about it. It's a nightmare if you are not careful about it.
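In Go, one common form of "reuse and reset" is a `sync.Pool` of `*bytes.Buffer`. In this sketch, `bufPool` and `render` are hypothetical. The `Reset` call is the careful part: skipping it would leak whatever the previous user wrote into the next response.

```go
package main

import (
	"bytes"
	"sync"
)

// bufPool recycles *bytes.Buffer values between calls.
var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// render builds a response body in a pooled buffer.
func render(name string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset() // the "careful" part: clear state left by the previous user
	defer bufPool.Put(buf)

	buf.WriteString("hello, ")
	buf.WriteString(name)
	return buf.String() // String copies the bytes, so reusing buf later is safe
}
```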
I'm not going back to manual memory management, and I don't want to think about ownership. So GC.
edit: Java also highly discourages reimplementing common JDK functionality, but I've found building a customized datastructure that fits exactly my needs (e.g. an intrusive doubly-linked list) can work wonders for performance.
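Here is a sketch of an intrusive doubly-linked list. It is written in Go rather than Java to match the other examples here, and the `entry` and `entryList` names are hypothetical. The link pointers live inside the element itself, so inserting allocates nothing extra. By contrast, `container/list` allocates a `*list.Element` on every `PushBack`. Removal is also O(1) once you have the element.

```go
package main

// entry carries its own list links, so linking it into a list needs no
// separate node allocation (that is what makes the list "intrusive").
type entry struct {
	key        string
	prev, next *entry
}

// entryList is a doubly-linked list threaded through entry's own fields.
type entryList struct {
	head, tail *entry
}

// PushBack appends e to the end of the list.
func (l *entryList) PushBack(e *entry) {
	e.prev, e.next = l.tail, nil
	if l.tail != nil {
		l.tail.next = e
	} else {
		l.head = e
	}
	l.tail = e
}

// Remove unlinks e in O(1) without searching for it.
func (l *entryList) Remove(e *entry) {
	if e.prev != nil {
		e.prev.next = e.next
	} else {
		l.head = e.next
	}
	if e.next != nil {
		e.next.prev = e.prev
	} else {
		l.tail = e.prev
	}
	e.prev, e.next = nil, nil
}
```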
I’ve come to a new obvious realisation with this sort of thing recently: if you care about some metric, make a test for it early and run it often.
If you care about correctness, grow unit tests and run them at least every commit.
If you care about performance, write a benchmark and run it often. You’ll start noticing what makes performance improve and regress, which over time improves your instincts. And you’ll start finding it upsetting when a small change drops performance by a few percent.
If you care about memory usage, do the same thing. Make a standard test suite and measure it regularly. Ideally write the test as early as possible in the development process. Doing things in a sloppy way will start feeling upsetting when it makes the metric get worse.
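Here is one way to turn allocations into a standing metric in Go, sketched with made-up record-formatting functions. `testing.AllocsPerRun` reports the average number of heap allocations per call. In a real project this check would usually live in a `_test.go` file, or be written as a benchmark that calls `b.ReportAllocs()`, so that a regression fails CI.

```go
package main

import (
	"fmt"
	"strconv"
	"testing"
)

// appendRecord formats a record into dst, reusing dst's backing array.
func appendRecord(dst []byte, id int, name string) []byte {
	dst = append(dst, "id="...)
	dst = strconv.AppendInt(dst, int64(id), 10)
	dst = append(dst, " name="...)
	return append(dst, name...)
}

// sprintfRecord does the same job but allocates a new string on every call.
func sprintfRecord(id int, name string) string {
	return fmt.Sprintf("id=%d name=%s", id, name)
}

// allocsAppend measures average allocations per call when one scratch
// buffer is reused across calls.
func allocsAppend() float64 {
	buf := make([]byte, 0, 64)
	return testing.AllocsPerRun(100, func() {
		buf = appendRecord(buf[:0], 42, "router-1")
	})
}

// allocsSprintf measures the allocating version for comparison.
func allocsSprintf() float64 {
	return testing.AllocsPerRun(100, func() {
		_ = sprintfRecord(42, "router-1")
	})
}
```

The append-based version should report 0 allocations per call. The `fmt.Sprintf` version should report at least 1, for the result string.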
I find when I have a clear metric, it always feels great when I can make the numbers improve. And that in turn makes it really effortless to bring my attention to performance work.
That has not been my experience. Rewrites do sometimes help, because in a lot of codebases there are too many “pet” modules or badly designed frozen interfaces.
Rewrites can help in those situations, because there are no sacred cows anymore. The issue is that a lot of people do rewrites as translations, without touching structures.
So many posts here over the years of examples of 'how we rewrote from x to y and saw 2000% gains', where x and y are languages. Such examples are 100% meaningless. Rewrites from the ground up -should- always be way faster, since it's all greenfield. If trying to make a language comparison, rewrite the entire thing in both languages!
All too often people don’t understand how to performance tune software properly and instead blame other things first (eg garbage collection)
My point was that, contrary to what is apparently a common impulse, rewriting the same thing in a different language while maintaining the lack of attention to performance considerations that was present in the first version isn't going to help much.
That said, to slot in a new implementation you often have to make the external API very similar to the old one, which can complicate making the improvements you're after.
You can pretty much always get better performance with more control over memory, and more importantly, you can dramatically lower overall memory usage and avoid GC pauses, but you have to weigh that against the fact that automated memory management is one of the few programming language features that is basically proven to give a massive developer productivity boost. In my corner of the industry, everyone chooses the GC'ed languages and performance isn't really a major concern most of the time.
While I agree with the gist of what you're saying, I do think runtimes based on the we'll-clean-it-up-some-day GC paradigm make it more important to consider memory allocation than less laissez-faire paradigms (like RAII or reference counting) do, contrary to how it's presented in the glamorous brochures.
The last error in particular, using byte buffers instead of a streaming abstraction, is pervasive in programming. I don't know if Rust is necessarily any worse than Go's library environment for dealing with that problem, but I doubt it's any better. Because io.Reader has been in the standard library from the beginning (and not because of any other particular virtue of the language, IMHO), Go has had one of the best ecosystems for dealing with streams without having to materialize them as complete byte slices in memory [1].
It amounts to this: the root problem is that they didn't have the problem they thought they had. Rust will blow the socks off the competition w.r.t. memory efficiency of lots of small objects, which is why it's so solid in the browser space. But that's not the problem they were having. Go's just fine where they seem to have ultimately ended up: stream processing with transient per-object work. Even if you do some allocation in the processing, the GC ends up not being a big deal, because each collection scans relatively little live memory, and collections don't happen all that often. This is why Go is so popular in network servers. Could Rust do better? Yes. Absolutely, beyond a shadow of a doubt. But not enough to matter, in a lot of cases.
[1]: An expansion on that thought if you like: https://news.ycombinator.com/item?id=28368080
In the worst case, you can always (even on GC'd languages) pre-allocate buffers and do your work without new memory requests. But you need to plan for this, in the same way you'd do in a language without GC.
The memory and CPU savings are negligible between Go and Rust in practice, no matter what people might claim in theory. However, the side effect of making your team less productive by using Rust is a much higher price to pay than just running your Go service on more powerful hardware.
There are many other non-obvious problems with going to Rust that I won't get into here but they can be quite costly and invisible at first and impossible to fix later.
Simple is better. Stay with Go.
I don't see Rust having much of a place in web services development until there's years of improvements in place. There's plenty of other potentially appropriate places for Rust replacing systems code.
nitpicking here, but this is by design - it's also true for datetimes and random numbers. it isn't a fault, it's a different packaging philosophy.
i agree with the rest - the good things about Rust just don't matter as much when developing bit-shoveling HTTP services, which is what 99% of backend seems to do nowadays.
I don't think my experience is isolated, either. Here, for example, is a quote from a Microsoft employee:
> "For the first week or so, we lost much of our time to learning how borrows worked. After about two weeks, we were back up to 50% efficiency compared to us writing in Go. After a month, we all were comfortable enough that we were back up to full efficiency (in terms of how much code we could write)," writes Thomas.
> "However, we noticed that we gained productivity in the sense that we didn't spend as much time manually checking specific conditions, like null pointers, or not having to debug as many problems."
https://www.zdnet.com/article/microsoft-why-we-used-programm...
Recently I even gave a shot to a todo-backend[1] implementation in Rust[2] and it honestly doesn't look that different from the Go versions.
Granted the todo-backend spec is very very simple. I would prefer to also include stuff like authentication/authorization and maybe even multi tenancy to compare better. But when I'm writing this kind of Rust code I'm often wondering - what makes Rust so unergonomic for other people?
1. https://todobackend.com/
2. https://github.com/drogus/todo-backend/blob/main/src/main.rs
This entirely depends on the ratio of development effort to deployed instances. At one end of the spectrum, lots of developers work for years on a system which is only deployed on one machine; obviously you optimize for developer effort and buy a single massive machine. At the other end of the spectrum, a few developers work for a short time on a system which is deployed at massive scale; obviously you optimize for performance.
At Pernosco we have a very small team deploying a relatively small number of instances, and after five years of Rust we're very happy.
There is a vicious cycle: few projects use Rust because the productivity hit is large, and programmers do not get enough experience using Rust because few projects use it.
I've been feeling the same, but as someone who has only played with Go/Rust (and never professionally), it's nice to hear that professionals feel the same.
If you were struggling with this, turning to Rust might be a thing people would try, even if it wasn't fixing the first order problems, and only addressing the 2nd order ones.
Go and Rust are great in that they let you write code at good speed, although I think this just highlights the well-known problems of over-optimizing a single metric.
No.
Basically whether you write C, C++, or Rust, you have to track ownership the same ways, the only thing that changes is how much the compiler helps you with that. However, if you write your program in Java, Lisp or Haskell, you simply do not care about ownership for memory-only objects, and can structure your program significantly differently.
This can have significant impact on certain types of workflows, especially when it comes to shared objects. A well-known example is when implementing lock-free data structures based on compare-and-swap, where you need to free the old copy of the structure after a successful compare-and-swap; but, you can't free it since you don't know who may still be reading from it. Here is an in-depth write-up from Andrei Alexandrescu on the topic [0].
Note: I am using "object" here in the sense from C - basically any piece of data that was allocated.
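Here is a sketch of the garbage-collected version of that pattern in Go 1.19+. It uses `atomic.Pointer` for copy-on-write snapshots, and the `config` and `store` types are hypothetical. Each writer copies the current snapshot, modifies the copy, and publishes it with compare-and-swap. No one ever frees the old snapshot, because the GC reclaims it once no reader holds a reference. That reclamation step is the one that needs something like hazard pointers in C or C++.

```go
package main

import "sync/atomic"

// config is an immutable snapshot; writers never mutate one in place.
type config struct {
	routes map[string]string
}

// store publishes config snapshots through an atomic pointer.
type store struct {
	current atomic.Pointer[config]
}

// SetRoute installs a new snapshot with one route added, retrying the
// compare-and-swap if another writer got there first. The old snapshot is
// never freed explicitly: readers still holding it keep it alive, and the
// garbage collector reclaims it once nobody does.
func (s *store) SetRoute(path, backend string) {
	for {
		old := s.current.Load()
		next := &config{routes: make(map[string]string)}
		if old != nil {
			for k, v := range old.routes {
				next.routes[k] = v
			}
		}
		next.routes[path] = backend
		if s.current.CompareAndSwap(old, next) {
			return
		}
	}
}

// Route reads from whichever snapshot is current, without locks.
func (s *store) Route(path string) (string, bool) {
	c := s.current.Load()
	if c == nil {
		return "", false
	}
	b, ok := c.routes[path]
	return b, ok
}
```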
Go’s focus on simplicity means that there is only a single parameter, SetGCPercent, which controls how much larger the heap is than the live objects within it.
FWIW, there is a new proposal from a member of the core Go team to add a second GC knob in the form of a soft limit on total memory:
https://github.com/golang/proposal/blob/master/design/48409-...
It includes some provisions to make sure that the application can keep making progress and avoid death spirals (part of the reason why it is a "soft" limit), and also includes some new GC-related telemetry.
From the blog write up, a second GC knob with a soft limit might have only been a minor help here, with the bigger wins coming from the code changes they described in the blog.
I have a Go application that runs in Kubernetes, where memory usage steadily increases until it's at around 90% of the cgroup limit, where it seems to stabilize. As far as I can tell, the Go GC uses the container memory limits to navigate its total memory usage (this might be the fault of the OS not reclaiming what Go has already freed(?)).
However, my issue is that in this app, I also call out to cgo and do manual memory allocations in C++ every 10-30 minutes. This works well, except when the container is stabilized at a high memory usage and my manual allocation brings it over the limit, thus forcing Kubernetes to terminate it. (As far as I know, these allocations are not leaking. For a short while I have two large objects allocated, and 99.9% of the time it's only one.)
So, what I'd ideally want is to be able to specify a target heap size for GoGC, and then have a known overhead for the manual allocation. But as far as I'm aware, this isn't possible (?)
Does anyone have any experience with something like this, or see any obvious avenues to pursue to solve the termination issue?
Is this a typo? You weren't seeing 500 MB of live data, just a little more than 200 MB in the worst case, right?
EDIT: Btw, I read the entire article. It was fascinating, thank you!
Could you not just set a ulimit on memory usage of the process in that case? (And use another process as the parent, e.g. a supervisor or init, to avoid exiting the container and just restart the process instead)
However, I think writing efficient code, even in managed-memory languages, for a large, heavily used service is kind of a normal thing and not above and beyond normal work.