It wraps up with:
>>> COMPILE YOUR SOFTWARES. It's going to help you understand it, make your app faster, more secure (by removing the mail gateway of NGiNX for example), and it will show you which softwares are easily maintainable and reliable.
It is a PITA keeping up with compiler changes and library changes of third-party software. In this case, you're throwing different malloc implementations in there too, not to mention different libc implementations. With all those variables for each component you deploy, you're probably less likely to understand what's going on in your software.
Maybe for a small app with 2-3 extra pieces or something that needs to really be optimized this is good advice, but it sounds like a lot of work for significant footprints.
It would be nice if the official docker images had some tags that were better optimized, within reason.
> The argument here I suppose is live on the bleeding edge? Or swap compilers/allocators as necessary? Or use O3? I'm not entirely sure.
My point is that when you are using software, especially open source, you should evaluate it and understand it. BTW, we are running Arch in production, with our own repos.
> It is a PITA keeping up with compiler changes and library changes of third-party software. In this case, you're throwing different malloc implementations in there too, not to mention different libc implementations. With all those variables for each component you deploy, you're probably less likely to understand what's going on in your software.
In my example, the libc implementation is always the same! It's just that it's outdated on all the main Docker images. Furthermore, Redis already uses a different malloc implementation (jemalloc), and its Makefile also supports the standard malloc and tcmalloc, so throwing in another one is very easy.
> Maybe for a small app with 2-3 extra pieces or something that needs to really be optimized this is good advice, but it sounds like a lot of work for significant footprints.
Or when you need security/performance. Of course, if you have 2 servers, it's useless, but we have more than 4000 servers running in production on Arch, so it's worth it for us.
> It would be nice if the official docker images had some tags that were better optimized, within reason.
Or just be up to date.
Yeah - if you're targeting a single platform for a single app it's probably much easier, and DIY is probably even easier with Arch or Gentoo. I've been in the "business" of maintaining multiple systems/libraries (mysql, boost, numpy, scipy, matplotlib, even the JDK) over multiple platforms (centos+devtoolset, macos, clang) and it's a nightmare. I've also, of course, done lots of custom builds of nginx/openresty for specific things which isn't so bad.
We've since moved on to conda-forge for getting most of our third-party dependencies. It moved faster than the defaults channel, even though its libc version compatibility is very old (but improving). Compilers update more frequently than glibc there, but compiler upgrades there are still deliberate.
I believe the ability to deploy from one's own fork is a minimum requirement for a distro of choice. I was wondering, do you have any resources to recommend for how to accomplish this with Arch?
- To read and understand all relevant parts of the build pipelines for each performance-critical part of the infrastructure. In a relatively simple web application that could mean at least a web server, a caching framework, a database and a message bus.
- To debug any build failures, which could be plentiful and hard to parse until you're very familiar with the build infrastructure for that particular piece of software.
- To benchmark the build outputs in comparable ways with realistic configuration and inputs.
- To repeat the above whenever any part of the stack changes significantly.
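To make the benchmarking step above concrete, here is a minimal sketch in Python of timing two builds on the same workload and comparing medians. The function names (`time_command`, `compare`) and the workload are hypothetical placeholders, not anything from the article; a realistic run would point at the stock and custom binaries with production-like configuration and inputs.

```python
import statistics
import subprocess
import sys
import time


def time_command(cmd, runs=5):
    """Run `cmd` `runs` times and return the wall-clock durations in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(
            cmd,
            check=True,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        durations.append(time.perf_counter() - start)
    return durations


def compare(baseline, candidate):
    """Return the median speedup of candidate over baseline (>1.0 means faster)."""
    return statistics.median(baseline) / statistics.median(candidate)


if __name__ == "__main__":
    # Placeholder workload: both "builds" are the current Python interpreter,
    # so the reported speedup should hover around 1.0. Swap in the real
    # stock and custom binaries with identical arguments.
    workload = [sys.executable, "-c", "sum(range(10**6))"]
    stock = time_command(workload)
    custom = time_command(workload)
    print(f"median speedup: {compare(stock, custom):.2f}x")
```

Using the median rather than the mean keeps a single noisy run from skewing the result, but it is no substitute for benchmarking under realistic load.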
I'm happy nowadays when I see there's a binary available, no mucking around with gcc/clang/llvm - just trying to work out which one, let alone which version! - no diving down a rabbit hole of compiling dependencies that then need other dependencies compiled… no deciphering Makefiles that were written in a way that only a C guru can grok, with no comments.
Whatever the benefits are, I prefer sanity.
So are you ready to deploy to prod software that is so hard to compile?
> I'm happy nowadays when I see there's a binary available, no mucking around with gcc/clang/llvm - just trying to work out which one, let alone which version! - no diving down a rabbit hole of compiling dependencies that then need other dependencies compiled… no deciphering Makefiles that were written in a way that only a C guru can grok, with no comments.
But that's my job, as SRE/DevOps/whatever new fancy name!
> Whatever the benefits are, I prefer sanity.
The sanity of having very old software, with backported features that exist only on this distro? I prefer to trust the engineers behind the software that I deploy.
Anyway, thanks for the clarification!
I wouldn't recommend enabling it system-wide because it causes issues with programs that fork() due to limitations in gcc's OpenMP library[2], but other than that it works pretty well. For example, I can fully load my 4C/8T CPU using 3 clang processes because compilation is magically spread over multiple threads. I've seen a "single threaded" program (qemu-img) suddenly start using more than a single core to convert disk images into other formats, leading to speedups.
Also things like PGO/FDO in combination with workload specific profiling data can easily give you 10% or more if you are CPU bound.
[1]: https://gcc.gnu.org/wiki/AutoParInGCC
[2]: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42624 (There was a patch to fix this, but it never got merged and doesn't apply to the current version any more, sadly)
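For anyone unfamiliar with the PGO/FDO workflow mentioned above, here is a rough sketch of its three steps, written as a small Python helper that only builds the command lines. `-fprofile-generate` and `-fprofile-use` are GCC's flags; the helper name `pgo_commands`, the file names and the workload arguments are made up for illustration, and other compilers may need extra steps (clang, for instance, needs the raw profiles merged before reuse).

```python
def pgo_commands(sources, output, workload_args, cc="gcc", opt="-O2"):
    """Return the three steps of a GCC-style PGO build as argv lists.

    1. Build an instrumented binary that records profile data when run.
    2. Run it on a representative workload to collect the profile.
    3. Rebuild, letting the compiler use the collected profile.
    """
    instrumented = [cc, opt, "-fprofile-generate", *sources, "-o", output]
    training_run = [f"./{output}", *workload_args]
    optimized = [cc, opt, "-fprofile-use", *sources, "-o", output]
    return [instrumented, training_run, optimized]


if __name__ == "__main__":
    # Hypothetical project: one source file and a made-up benchmark flag.
    for step in pgo_commands(["app.c"], "app", ["--bench"]):
        print(" ".join(step))
```

The gain depends entirely on how closely the training run matches production traffic, which is why the workload-specific profiling data matters.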