Why does creating a backtrace need such a large amount of memory? Is there a memory leak involved as well?
(Assuming the article wrote Mib where it meant MiB. Used correctly, b = bit and B = byte.)
The only explanation I can see (if their conclusion is accurate) is that the end result of the symbolization takes more than 400 MB of additional memory (which is a lot in my opinion), while the symbolization process itself requires more than 2 GB of additional memory (which is enormous).
The first increase of the memory limit was not to 4 GiB but to roughly 300 to 400 MiB, and the OOM happened again with that setting.
That led to a second increase, to 4 GiB, to be sure the app would not get OOM-killed when the behavior was triggered. We needed the app to stay alive and running so that we could investigate with memory profiling.
Regarding the increase of 400 MiB: yes, it is a lot, and it was a surprise to us too. We were not expecting such an increase. I think there are two reasons behind it.
1. This service is a gRPC server, which has a lot of generated code and therefore a lot of symbols.
2. We compile the binary with debug symbols, plus a flag that compresses the debug sections to avoid a huge binary, which may be part of this issue.
Symbols are usually included even at debug level 0, unless stripped [0], and debuginfo is configurable at several levels [1]. If you've set it to 2/full, try dropping to a lower level; that might also result in less data to load for the backtrace implementation.
[0] https://users.rust-lang.org/t/difference-between-strip-symbo... [1] https://doc.rust-lang.org/cargo/reference/profiles.html#debu...
I don't think the article is misleading, but I do think it's a shame that all the interesting info is saved for this hackernews comment. I think it would make for a more exciting article if you included more of the analysis along with the facts. Remember, as readers we don't know anything about your constraints/system.
How big are the uncompressed debug symbols? I'd expect uncompressed debug symbols to be processed via a memory-mapped file, while compressed debug symbols probably need to be extracted into anonymous memory.
cargo build --bin engine-gateway --release
Finished `release` profile [optimized + debuginfo] target(s) in 1m 00s
ls -lh target/release/engine-gateway
.rwxr-xr-x erebe erebe 198 MB Sun Jan 19 12:37:35 2025 target/release/engine-gateway
What we ship:
export RUSTFLAGS="-C link-arg=-Wl,--compress-debug-sections=zlib -C force-frame-pointers=yes"
cargo build --bin engine-gateway --release
Finished `release` profile [optimized + debuginfo] target(s) in 1m 04s
ls -lh target/release/engine-gateway
.rwxr-xr-x erebe erebe 61 MB Sun Jan 19 12:39:13 2025 target/release/engine-gateway
The diff is more impressive on some bigger projects.
I couldn't find any other complaints about Rust backtrace printing consuming a lot of memory, which I would have expected if this was normal behaviour. So I wonder if there is anything special about their environment or use case?
I would assume that the same OOM problem would arise when printing a panic backtrace. Either their instance has enough memory to print backtraces, or it doesn't. So I don't understand why they only disable lib backtraces.
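For context on the lib/panic distinction: as documented for `std::backtrace`, `Backtrace::capture()` consults `RUST_LIB_BACKTRACE` first and falls back to `RUST_BACKTRACE`, while panic backtraces follow `RUST_BACKTRACE`. A minimal sketch of how a library-style capture reports its status; `status_label` and `library_style_capture` are hypothetical helper names used only for illustration.

```rust
use std::backtrace::{Backtrace, BacktraceStatus};

/// Human-readable label for a backtrace's capture status.
fn status_label(bt: &Backtrace) -> &'static str {
    match bt.status() {
        BacktraceStatus::Captured => "captured",
        BacktraceStatus::Disabled => "disabled",
        BacktraceStatus::Unsupported => "unsupported",
        // BacktraceStatus is non-exhaustive, so a catch-all arm is required.
        _ => "unknown",
    }
}

/// What an error-handling library would typically get: this honors
/// RUST_LIB_BACKTRACE, falling back to RUST_BACKTRACE when it is unset.
fn library_style_capture() -> Backtrace {
    Backtrace::capture()
}
```

Running a program that calls `status_label(&library_style_capture())` with `RUST_BACKTRACE=1 RUST_LIB_BACKTRACE=0` should report "disabled", while a panic in the same process would still print its backtrace.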
You can see my other comment https://news.ycombinator.com/item?id=42708904#42756072 for more details.
But yes, the cache does persist after the first call, the resolved symbols stay in the cache to speed up the resolution of next calls.
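If anyone wants to reproduce the jump on their own binary, here is a rough sketch that reads the resident set size before and after rendering a backtrace. It assumes Linux (`/proc/self/status`) and assumes that formatting a `std::backtrace::Backtrace` is what triggers symbol resolution; the function names are made up for this example and this is not the profiling method used in the article.

```rust
use std::backtrace::Backtrace;
use std::fs;

/// Parse the `VmRSS` value (in KiB) out of /proc/<pid>/status-style text.
fn parse_vm_rss_kib(status: &str) -> Option<u64> {
    status
        .lines()
        .find(|line| line.starts_with("VmRSS:"))?
        .split_whitespace()
        .nth(1)? // second field is the number, third is the "kB" unit
        .parse()
        .ok()
}

/// Current resident set size in KiB (None if /proc is unavailable).
fn current_rss_kib() -> Option<u64> {
    let status = fs::read_to_string("/proc/self/status").ok()?;
    parse_vm_rss_kib(&status)
}

/// Capture a backtrace and render it to a string, sampling RSS around it.
/// Returns (rss_before_kib, rss_after_kib, rendered_length_in_bytes).
fn rss_around_backtrace() -> (Option<u64>, Option<u64>, usize) {
    let before = current_rss_kib();
    let rendered = Backtrace::force_capture().to_string();
    let after = current_rss_kib();
    (before, after, rendered.len())
}
```

Calling `rss_around_backtrace()` twice in a row gives a crude view of the caching effect: the first call pays for loading the debug info, the second should add much less. It only shows the steady-state difference, not the transient peak during symbolization.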
Regarding the why, it is mainly because:
1. this app is a gRPC server and contains a lot of generated code (you can investigate binary bloat in Rust with https://github.com/RazrFalcon/cargo-bloat)
2. we ship our binary with debug symbols, using these options: `ENV RUSTFLAGS="-C link-arg=-Wl,--compress-debug-sections=zlib -C force-frame-pointers=yes"`
For the panic, indeed, I had the same question on Reddit. For this particular service we don't expect panics at all; it is just that by default we ship all our Rust binaries with backtraces enabled. And we have added an extra API endpoint to trigger a caught panic on purpose for other apps, to be sure our sizing is correct.
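A minimal sketch of what the body of such an endpoint could look like, assuming the default `panic = "unwind"` strategy. The function name and the message are hypothetical, and the real handler is not shown in this thread.

```rust
use std::panic;

/// Hypothetical handler body: panic on purpose, catch it, report the message.
fn trigger_caught_panic() -> Result<(), String> {
    // The default panic hook still runs here, so with RUST_BACKTRACE=1 it
    // prints a backtrace and exercises the symbolization path.
    let result: Result<(), _> = panic::catch_unwind(|| {
        panic!("intentional panic for sizing test");
    });

    // Turn the opaque panic payload into a readable message.
    result.map_err(|payload| {
        if let Some(msg) = payload.downcast_ref::<&str>() {
            msg.to_string()
        } else if let Some(msg) = payload.downcast_ref::<String>() {
            msg.clone()
        } else {
            "non-string panic payload".to_string()
        }
    })
}
```

Note that `catch_unwind` does nothing when the binary is built with `panic = "abort"`; in that case the process would simply terminate.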
Collecting a call stack only requires unwinding information (which is usually already present for C++ exceptions / Rust panics), not full debug symbols. This gives you a list of instruction pointers. (on Linux, the glibc `backtrace` function can help with this)
Print those instruction pointers in a relative form (e.g. "my_binary+0x1234") so that the output is independent of ASLR.
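To illustrate the relative form, a small sketch: given the load ranges of the modules in the process, map an absolute instruction pointer to `module+0xoffset`. The `Module` struct and `format_relative` are hypothetical names; in practice the ranges would come from something like `/proc/self/maps` or `dl_iterate_phdr`.

```rust
/// A loaded module (executable or shared library) and where it was mapped.
/// Hypothetical type, for illustration only.
struct Module {
    name: &'static str,
    base: u64, // load address chosen by ASLR for this run
    len: u64,  // size of the mapped range in bytes
}

/// Render an instruction pointer as "module+0xoffset", or as a raw
/// address when it does not fall inside any known module.
fn format_relative(modules: &[Module], ip: u64) -> String {
    for m in modules {
        if ip >= m.base && ip - m.base < m.len {
            return format!("{}+{:#x}", m.name, ip - m.base);
        }
    }
    format!("{:#x}", ip)
}
```

Because only the offset is printed, two runs of the same binary produce the same line even though `base` changes on every start.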
The above is all that needs to happen on the production/customer machines, so you don't need to ship debug symbols -- you can ship `strip`ped binaries.
On your own infrastructure, keep the original un-stripped binaries around. We use a script involving elfutils' eu-addr2line with those original binaries to turn the module+relative_address stack trace into a readable symbolized stack trace. I wasn't aware of llvm-symbolizer yet; it seems like it can do the same job as eu-addr2line. (There's also binutils' addr2line, but in my experience that didn't work as well as eu-addr2line.)
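Our script is not public, but the idea can be sketched roughly like this: parse each `module+0xoffset` frame and build an `eu-addr2line` invocation against the un-stripped binary. The helper names and the path argument are hypothetical, and the sketch assumes the module-relative offset is the address the tool expects, which is typical for position-independent executables but not guaranteed.

```rust
use std::process::Command;

/// Split a frame such as "my_binary+0x1234" into ("my_binary", 0x1234).
fn parse_frame(frame: &str) -> Option<(&str, u64)> {
    let (module, offset) = frame.rsplit_once('+')?;
    let hex = offset.strip_prefix("0x")?;
    let offset = u64::from_str_radix(hex, 16).ok()?;
    Some((module, offset))
}

/// Build (but do not run) an eu-addr2line invocation for one frame.
/// `unstripped_path` is wherever the original binary is archived.
fn symbolize_command(unstripped_path: &str, offset: u64) -> Command {
    let mut cmd = Command::new("eu-addr2line");
    cmd.arg("-e")
        .arg(unstripped_path) // file to read debug info from
        .arg("-f") // also print the function name
        .arg(format!("{:#x}", offset));
    cmd
}
```

Calling `.output()` on the returned command runs the tool and captures the symbolized line; that part is left out here so the sketch does not depend on elfutils being installed.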