Compile your software

(blog.kalvad.com)

46 points · wowi42 · 4y ago · 43 comments

36 comments · 9 top-level

prpl · 4y ago

The argument here I suppose is live on the bleeding edge? Or swap compilers/allocators as necessary? Or use O3? I'm not entirely sure.

It wraps up with:

> COMPILE YOUR SOFTWARES. It's going to help you understand it, make your app faster, more secure (by removing the mail gateway of NGiNX for example), and it will show you which softwares are easily maintainable and reliable.

It is a PITA keeping up with compiler changes and library changes of third-party software. In this case, different malloc implementations are thrown in too, not to mention different libc implementations. With all those variables for each component you deploy, you're probably less likely to understand what's going on in your software.

Maybe for a small app with 2-3 extra pieces or something that needs to really be optimized this is good advice, but it sounds like a lot of work for significant footprints.

It would be nice if the official docker images had some tags that were better optimized, within reason.

wowi42 (OP) · 4y ago

Author here.

> The argument here I suppose is live on the bleeding edge? Or swap compilers/allocators as necessary? Or use O3? I'm not entirely sure.

My point is that when you use a piece of software, especially open source, you should evaluate it and understand it. BTW, we are running arch in production, with our own repos.

> It is a PITA keeping up with compiler changes and library changes of third-party software. In this case, different malloc implementations are thrown in too, not to mention different libc implementations. With all those variables for each component you deploy, you're probably less likely to understand what's going on in your software.

In my example, the libc implementation is always the same! It's just that it's outdated on all the main docker images. Furthermore, Redis already uses a different malloc implementation (jemalloc), but in the makefile they also support the standard malloc and tcmalloc, so throwing in another one is very easy.

> Maybe for a small app with 2-3 extra pieces or something that needs to really be optimized this is good advice, but it sounds like a lot of work for significant footprints.

Or when you need security/performance. Of course, if you have 2 servers, it's useless, but we have more than 4000 servers running in production on arch, so it's worth it for us.

> It would be nice if the official docker images had some tags that were better optimized, within reason.

Or just be up to date.

prpl · 4y ago

redis:latest uses glibc though? Looks like glibc 2.28.

Yeah - if you're targeting a single platform for a single app it's probably much easier, and DIY is probably even easier with Arch or Gentoo. I've been in the "business" of maintaining multiple systems/libraries (mysql, boost, numpy, scipy, matplotlib, even the JDK) over multiple platforms (centos+devtoolset, macos, clang) and it's a nightmare. I've also, of course, done lots of custom builds of nginx/openresty for specific things which isn't so bad.

We've since moved on to conda-forge for getting most of our third-party packages, which moved faster than the defaults channel even though the libc version compatibility is very old (but improving). Compilers update more frequently than glibc there, but compiler upgrades there are still deliberate.

1MachineElf · 4y ago

> BTW, we are running arch in production, with our own repos.

I believe the ability to deploy from one's own fork is a minimum requirement for a distro of choice. I was wondering, do you have any resources to recommend for how to accomplish this with Arch?

l0b0 · 4y ago

So much advice in software development presupposes that everyone has the same single goal and at least one inexhaustible resource. In this case, time:

- To read and understand all relevant parts of the build pipelines for each performance-critical part of the infrastructure. In a relatively simple web application that could mean at least a web server, a caching framework, a database and a message bus.

- To debug any build failures, which could be plentiful and hard to parse until you're very familiar with the build infrastructure for that particular piece of software.

- To benchmark the build outputs in comparable ways with realistic configuration and inputs.

- To repeat the above whenever any part of the stack changes significantly.
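To make the benchmarking step concrete, here is a minimal Python sketch that compares two builds by the median of repeated runs. The names `stock` and `custom` and all sample numbers are made up for illustration; they are not results from the article.

```python
from statistics import median


def relative_speedup(baseline_ops, candidate_ops):
    """Return the candidate/baseline ratio of median throughput.

    Both sample lists are assumed to come from the same workload,
    configuration and hardware; otherwise the ratio means little.
    """
    if not baseline_ops or not candidate_ops:
        raise ValueError("need at least one sample per build")
    return median(candidate_ops) / median(baseline_ops)


# Hypothetical throughput samples (requests/sec), one per repeated run.
stock = [100_000, 98_000, 102_000]
custom = [110_000, 111_000, 108_000]

speedup = relative_speedup(stock, custom)
print(f"custom build runs at {speedup:.2f}x the stock build")
```

The median is used rather than the mean so that a single noisy run does not dominate the comparison.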

ec109685 · 4y ago

It also could introduce bugs given your company might be the only one on earth with that combination of compiler, kernel and libraries.

coliveira · 4y ago

You mean, expose bugs. If the software doesn't support a certain combination, this is a bug in the software.

thayne · 4y ago

Only if you do it for everything. But it might be worth it for the most performance critical components.

prpl · 4y ago

I buy that for a bottleneck, and also when you encounter bugs (it happens). This post didn’t go into day 2 though, and having dealt with a very large code base, I found it very hard to maintain over the years. We started sourcing most things from conda-forge because it was easier (data science pipelines), but a few things we still built.

brigandish · 4y ago

The sad fact is that a lot of software is difficult to compile; isn't documented well, something that is worse for building; won't work well if installed in a non-standard way, whether that is final location, different supporting libs, or different platform; and can take a long time.

I'm happy nowadays when I see there's a binary available, no mucking around with gcc/clang/llvm - just trying to work out which one, let alone which version! - no diving down a rabbit hole of compiling dependencies that then need other dependencies compiled… no deciphering Makefiles that were written in a way that only a C guru can grok, with no comments.

Whatever the benefits are, I prefer sanity.

Rendello · 4y ago

This is one thing where I see Rust (and probably Zig) making headway. A lot of the newer Rust software isn't in package repos yet, but I don't mind doing a Cargo build. It might take a while to compile, but it always seems to just work.

wowi42 (OP) · 4y ago

> The sad fact is that a lot of software is difficult to compile; isn't documented well, something that is worse for building; won't work well if installed in a non-standard way, whether that is final location, different supporting libs, or different platform; and can take a long time.

So are you ready to deploy software in prod that is so hard to compile?

> I'm happy nowadays when I see there's a binary available, no mucking around with gcc/clang/llvm - just trying to work out which one, let alone which version! - no diving down a rabbit hole of compiling dependencies that then need other dependencies compiled… no deciphering Makefiles that were written in a way that only a C guru can grok, with no comments.

But that's my job, as SRE/DevOps/whatever new fancy name!

> Whatever the benefits are, I prefer sanity.

Sanity of having very old software, with backported features that exist only on this distro? I prefer to trust the engineers of the software that I deploy.

brigandish · 4y ago

I'm sorry but I just don't understand your point. Would you clarify?

binarybanana · 4y ago

That doesn't match my experience. Two decades ago everything was in constant flux and more unreliable, but nowadays it's rare to find broken builds. It's only difficult if your distribution puts headers and such in separate packages, or splits up stuff so much that it's hard to tell what you actually need. Can't remember the last time I had to intervene to compile a random GitHub project I wanted to try out. It just works.

brigandish · 4y ago

Are you on Linux? I'm running a Mac, and I find broken builds for projects big and small all the time. I'd open issues for them, but the number of times I get an "it works for me" or any of the myriad other types of shrug devs like to give tends to put me off.

lnxg33k1 · 4y ago

https://hub.docker.com/r/gentoo/stage3

anthk · 4y ago

Try pkgsrc.

brigandish · 4y ago

I do, it's very good!

powersnail · 4y ago

Please label your bar graph axes, with units. It’s kind of counterproductive to look at a benchmark graph without knowing whether more or less is better.

n8ta · 4y ago

Agreed. I thought this was an interesting post but was baffled by every y axis.

wowi42 (OP) · 4y ago

Noted!

yellow_lead · 4y ago

And better titles. "Performance" is confusing. I think he means "Performance of different base images for Redis"?

wowi42 (OP) · 4y ago

Thanks for the feedback. Let me find a better way to make it more clear!

mathfailure · 4y ago

Why would I read some tosser?

wowi42 (OP) · 4y ago

https://www.larousse.fr/dictionnaires/francais/tosser/78590

Zuider · 4y ago

https://www.etymonline.com/search?q=tosser

wowi42 (OP) · 4y ago

Nice one, I never heard it!

Aeolun · 4y ago

I don’t follow, where did this person compile anything? The only thing I saw him do was use pregenerated docker images?

wowi42 (OP) · 4y ago

Read the article. I compiled redis with Zig, O3 native, and mimalloc!

Aeolun · 4y ago

I did read the article, but that wasn’t immediately clear to me. Maybe something about the formatting of pre-generated docker images in the same line as the compiled versions?

Anyway, thanks for the clarification!

Macha · 4y ago

I wonder how x86-64-v3 for Arch/v2 for Fedora in the near future will change this calculus. Currently you're basically compiling for a Core 2/Athlon 64 era chip, so there are clear wins to be had, but I wonder how much of the benefit can be had just by using software requiring Haswell/Zen1 at minimum.
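For anyone wanting to see which of those levels a given machine would qualify for, here is a rough Python sketch that maps Linux `/proc/cpuinfo` flag names onto the x86-64-v2 and x86-64-v3 feature sets. The helper names are made up for illustration, `xsave` stands in for OSXSAVE as an assumption, and v4 (AVX-512) is left out.

```python
# /proc/cpuinfo flag names for each x86-64 microarchitecture level.
# "pni" is how Linux reports SSE3 and "abm" covers LZCNT.
LEVEL_FLAGS = {
    2: {"cx16", "lahf_lm", "popcnt", "pni", "sse4_1", "sse4_2", "ssse3"},
    3: {"avx", "avx2", "bmi1", "bmi2", "f16c", "fma", "abm", "movbe", "xsave"},
}


def x86_64_level(cpu_flags):
    """Return the highest level (1, 2 or 3) whose flags are all present.

    Level 1 is the plain x86-64 baseline and is assumed, not checked.
    """
    present = set(cpu_flags)
    level = 1
    for candidate in (2, 3):
        if LEVEL_FLAGS[candidate] <= present:
            level = candidate
        else:
            break  # levels are cumulative, so stop at the first miss
    return level


def read_cpu_flags(path="/proc/cpuinfo"):
    """Read the flag list of the first CPU from a Linux cpuinfo file."""
    with open(path) as f:
        for line in f:
            if line.startswith("flags"):
                return line.split(":", 1)[1].split()
    return []
```

On a Linux x86-64 host, `print(x86_64_level(read_cpu_flags()))` would report the level; a Haswell or Zen 1 class CPU should come out as 3.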

binarybanana · 4y ago

-march=native will still give you better performance. It's not just about the instruction set, but also heuristics taking into account cache size, latencies, topology and other things. Intel for example has this quirk that aligning functions (and other jump targets) at 32 byte boundaries speeds up function calls and jumps. I haven't tested it but I suspect you'd gain more from -mtune=native with the generic x86_64 target than -march=native. Some loops that can be autovectorized with AVX instructions will probably be faster though. But cache size especially is important for deciding if some optimization is beneficial or just leads to stalls due to thrashing.

latenightcoding · 4y ago

On my side projects I compile everything myself, but I do not completely agree with this post because Redis is one of the easiest/fastest mainstream databases to compile. With other software it can get very time consuming and the returns are not always there.

wowi42 (OP) · 4y ago

That's true, but without some crazy guy like me to check it, you would not know! :-)

binarybanana · 4y ago

Another cool thing you can do if you compile yourself is use features like auto-parallelization[1].

I wouldn't recommend enabling it system-wide because it causes issues with programs that fork() due to limitations in gcc's OpenMP library[2], but other than that it works pretty well. For example, I can fully load my 4C/8T CPU using 3 clang processes because compilation is magically spread over multiple threads. I've seen a "single threaded" program (qemu-img) suddenly start using more than a single core to convert disk images into other formats, leading to speedups.

Also things like PGO/FDO in combination with workload specific profiling data can easily give you 10% or more if you are CPU bound.

[1]: https://gcc.gnu.org/wiki/AutoParInGCC

[2]: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=42624 (There was a patch to fix this, but it never got merged and doesn't apply to the current version any more, sadly)
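For readers who haven't done PGO before, the cycle mentioned above is roughly: build with instrumentation, run a representative workload, then rebuild using the recorded profile. Below is a small Python sketch that only assembles the three command lines. The file names `app.c` and `app` and the `--bench` argument are hypothetical; `-fprofile-generate` and `-fprofile-use` are the GCC options for the two build phases.

```python
def pgo_commands(source, binary, training_args, cc="gcc"):
    """Return the three argv lists of a basic GCC PGO cycle."""
    common = [cc, "-O2", source, "-o", binary]
    return [
        common + ["-fprofile-generate"],        # 1. instrumented build
        [f"./{binary}"] + list(training_args),  # 2. run a representative workload
        common + ["-fprofile-use"],             # 3. rebuild using the recorded profile
    ]


# Hypothetical project: a single source file and a benchmark mode.
for argv in pgo_commands("app.c", "app", ["--bench"]):
    print(" ".join(argv))
```

Each list could be handed to `subprocess.run(argv, check=True)` in order. How much you gain depends on how closely the training run matches the production workload.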

b215826 · 4y ago

Please avoid GIFs and memes in your article. They add nothing to the actual information in the article, but take the seriousness away and make it less readable.
