Archlinux in Production?

As you may know, at Kalvad we manage thousands of servers running Ubuntu, Alma or Rocky, but we also run a big part of our production on Archlinux. But why? How? Is it stable? Do you have a backup plan?

[Image: "When I explain that we are using Arch in prod!"]

Why?

How do you choose your Linux distribution? Broadly, you have 2 options:

  • Simple and mainstream, like Ubuntu, Fedora, etc. The good point is that most sysadmins know Ubuntu, but we have already written about some issues with it (compilation flags, .deb/.rpm being hard to package)
  • Rugged and configurable, like Arch, or Exherbo (used by the people at Clever Cloud)

Of course, we picked the second choice, but why?

We build our servers as a two-sided system:

  • What is provided by the OS? For example, htop, vector, telegraf, etc...
  • What is provided by the customer (where Kalvad could be the customer too)? For example PostgreSQL, some Java apps, ...

Upstream

Of course, if you want your system to be secure, fast, and shiny :-), you want to be as close as possible to upstream, without maintainers changing code because Valgrind said that a free was missing (true old story). Great, Archlinux is upstream-based!

For example, today (2022/04/24), HAProxy, which is our favorite load balancer, is at version 2.5.5 (the latest one) on Arch, but at 1.8.27 on Alma Linux and 2.4.14 on Ubuntu 22.04, which was released only a few days ago!

Furthermore, Archlinux, like most source-based/rolling-release distributions, tries to stick as closely as possible to the upstream package: if you check how HAProxy is compiled on Arch, you will see that there is only 1 distro-specific patch!

# Alma Linux

haproxy -vvv
HA-Proxy version 1.8.27-493ce0b 2020/11/06
Copyright 2000-2020 Willy Tarreau <willy@haproxy.org>

Build options :
  TARGET  = linux2628
  CPU     = generic
  CC      = gcc
  CFLAGS  = -O2 -g -fno-strict-aliasing -Wdeclaration-after-statement -fwrapv -Wno-null-dereference -Wno-unused-label -Wno-stringop-overflow
  OPTIONS = USE_LINUX_TPROXY=1 USE_CRYPT_H=1 USE_GETADDRINFO=1 USE_ZLIB=1 USE_REGPARM=1 USE_OPENSSL=1 USE_LUA=1 USE_SYSTEMD=1 USE_PCRE=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with OpenSSL version : OpenSSL 1.1.1g FIPS  21 Apr 2020
Running on OpenSSL version : OpenSSL 1.1.1k  FIPS 25 Mar 2021
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
Built with Lua version : Lua 5.3.4
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Encrypted password support via crypt(3): yes
Built with multi-threading support.
Built with PCRE version : 8.42 2018-03-20
Running on PCRE version : 8.42 2018-03-20
PCRE library supports JIT : no (USE_PCRE_JIT not set)
Built with zlib version : 1.2.11
Running on zlib version : 1.2.11
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Built with network namespace support.

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available filters :
	[SPOE] spoe
	[COMP] compression
	[TRACE] trace





# Arch

haproxy -vv
HAProxy version 2.5.5-384c5c5 2022/03/14 - https://haproxy.org/
Status: stable branch - will stop receiving fixes around Q1 2023.
Known bugs: http://www.haproxy.org/bugs/bugs-2.5.5.html
Running on: Linux 5.17.4-arch1-1 #1 SMP PREEMPT Wed, 20 Apr 2022 18:29:28 +0000 x86_64
Build options :
  TARGET  = linux-glibc
  CPU     = native
  CC      = cc
  CFLAGS  = -march=x86-64 -mtune=native -O2 -pipe -fno-plt -fexceptions -Wp,-D_FORTIFY_SOURCE=2 -Wformat -Werror=format-security -fstack-clash-protection -fcf-protection -fwrapv
  OPTIONS = USE_PCRE=1 USE_PCRE_JIT=1 USE_GETADDRINFO=1 USE_OPENSSL=1 USE_LUA=1 USE_ZLIB=1 USE_SYSTEMD=1 USE_PROMEX=1
  DEBUG   = 

Feature list : +EPOLL -KQUEUE +NETFILTER +PCRE +PCRE_JIT -PCRE2 -PCRE2_JIT +POLL +THREAD +BACKTRACE -STATIC_PCRE -STATIC_PCRE2 +TPROXY +LINUX_TPROXY +LINUX_SPLICE +LIBCRYPT +CRYPT_H +GETADDRINFO +OPENSSL +LUA +ACCEPT4 -CLOSEFROM +ZLIB -SLZ +CPU_AFFINITY +TFO +NS +DL +RT -DEVICEATLAS -51DEGREES -WURFL +SYSTEMD -OBSOLETE_LINKER +PRCTL -PROCCTL +THREAD_DUMP -EVPORTS -OT -QUIC +PROMEX -MEMORY_PROFILING

Default settings :
  bufsize = 16384, maxrewrite = 1024, maxpollevents = 200

Built with multi-threading support (MAX_THREADS=64, default=12).
Built with OpenSSL version : OpenSSL 1.1.1n  15 Mar 2022
Running on OpenSSL version : OpenSSL 1.1.1n  15 Mar 2022
OpenSSL library supports TLS extensions : yes
OpenSSL library supports SNI : yes
OpenSSL library supports : TLSv1.0 TLSv1.1 TLSv1.2 TLSv1.3
Built with Lua version : Lua 5.3.6
Built with the Prometheus exporter as a service
Built with network namespace support.
Built with zlib version : 1.2.12
Running on zlib version : 1.2.12
Compression algorithms supported : identity("identity"), deflate("deflate"), raw-deflate("deflate"), gzip("gzip")
Support for malloc_trim() is enabled.
Built with transparent proxy support using: IP_TRANSPARENT IPV6_TRANSPARENT IP_FREEBIND
Built with PCRE version : 8.45 2021-06-15
Running on PCRE version : 8.45 2021-06-15
PCRE library supports JIT : yes
Encrypted password support via crypt(3): yes
Built with gcc compiler version 11.2.0

Available polling systems :
      epoll : pref=300,  test result OK
       poll : pref=200,  test result OK
     select : pref=150,  test result OK
Total: 3 (3 usable), will use epoll.

Available multiplexer protocols :
(protocols marked as <default> cannot be specified using 'proto' keyword)
              h2 : mode=HTTP       side=FE|BE     mux=H2       flags=HTX|CLEAN_ABRT|HOL_RISK|NO_UPG
            fcgi : mode=HTTP       side=BE        mux=FCGI     flags=HTX|HOL_RISK|NO_UPG
       <default> : mode=HTTP       side=FE|BE     mux=H1       flags=HTX
              h1 : mode=HTTP       side=FE|BE     mux=H1       flags=HTX|NO_UPG
       <default> : mode=TCP        side=FE|BE     mux=PASS     flags=
            none : mode=TCP        side=FE|BE     mux=PASS     flags=NO_UPG

Available services : prometheus-exporter
Available filters :
	[SPOE] spoe
	[CACHE] cache
	[FCGI] fcgi-app
	[COMP] compression
	[TRACE] trace

To be completely transparent, there is one library where Archlinux is lagging behind: OpenSSL 3, but that could be another article.

But you can see the difference: on Arch, HAProxy is built and run against the same library versions (on Alma, it was built with OpenSSL 1.1.1g but runs on 1.1.1k), and we are as close as possible to upstream (plus some optimizations for the CPU).
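
The build-versus-runtime comparison can also be checked mechanically. Here is a minimal Python sketch, assuming the version lines keep the "Built with X version : ..." / "Running on X version : ..." layout seen in the two outputs above. The function names and the sample are ours for illustration, not part of HAProxy.

```python
import re

# Matches lines such as "Built with OpenSSL version : OpenSSL 1.1.1g FIPS ..."
# It does not match "Running on: Linux ..." or "Built with gcc compiler version ...".
LINE_RE = re.compile(r"^(Built with|Running on) (\S+) version : (.+)$")


def first_version_token(text):
    """Return the first whitespace-separated token that contains a digit."""
    for token in text.split():
        if any(ch.isdigit() for ch in token):
            return token
    return text.strip()


def library_mismatches(haproxy_output):
    """Map each library to (built, running) when the two versions differ."""
    built, running = {}, {}
    for line in haproxy_output.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        kind, lib, rest = match.groups()
        target = built if kind == "Built with" else running
        target[lib] = first_version_token(rest)
    return {
        lib: (built[lib], running[lib])
        for lib in built
        if lib in running and built[lib] != running[lib]
    }


# Lines copied from the Alma Linux output above.
ALMA_SAMPLE = """\
Built with OpenSSL version : OpenSSL 1.1.1g FIPS  21 Apr 2020
Running on OpenSSL version : OpenSSL 1.1.1k  FIPS 25 Mar 2021
Built with Lua version : Lua 5.3.4
Built with PCRE version : 8.42 2018-03-20
Running on PCRE version : 8.42 2018-03-20
Built with zlib version : 1.2.11
Running on zlib version : 1.2.11
"""

print(library_mismatches(ALMA_SAMPLE))
```

On the Alma sample, only OpenSSL is reported; the same lines from the Arch output give an empty result.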

Lightweight

Contrary to some source-based distributions (like Exherbo), we have precompiled packages, but it's also easy to compile your own packages and build your own repo, and to see a 10-15% performance improvement just by making a small modification inside /etc/makepkg.conf!

What is the point? Distributions like Debian or Ubuntu want to run on everything from a supercomputer to a toaster (NetBSD is not the only one to run on one :-) !), whereas we know what our hardware is, so we can clearly optimize for it through our repos, or at compilation time on the servers!

Hackability

If you look at a PKGBUILD on Archlinux, it's very simple to understand and hack your way around. For example, inside HAProxy's PKGBUILD, changing the CPU from generic to native was just a one-line change!
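
To give an idea of how small such a change is, here is a minimal Python sketch of a one-line PKGBUILD patch. The PKGBUILD fragment is a simplified, hypothetical example (not the real Arch file), and the helper name is ours. We assume the build passes CPU=generic to make, which matches the "CPU = native" shown in our output above once switched.

```python
import re


def switch_cpu_to_native(pkgbuild_text):
    """Replace CPU=generic with CPU=native; return the new text and the number of changes."""
    return re.subn(r"\bCPU=generic\b", "CPU=native", pkgbuild_text)


# Hypothetical, simplified build() function for illustration only.
SAMPLE_PKGBUILD = '''build() {
  cd "$pkgname-$pkgver"
  make CPU=generic TARGET=linux-glibc USE_OPENSSL=1 USE_LUA=1
}
'''

patched, changed = switch_cpu_to_native(SAMPLE_PKGBUILD)
print(changed)
print(patched)
```

Returning the number of substitutions makes it easy for a build script to fail loudly if upstream changes the line and the patch silently stops applying.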

Furthermore, we can maintain our own packages (like warp10), at least for some time, before contributing to the AUR or the repos!

Finally, we are able to patch our own software, and implement a security patch without waiting for the distribution!

Security

Since many attacks are based on injecting some code inside a known binary, and our binaries are not the same as everybody else's, we are far less exposed!

How?

We explained why we chose Arch, but how do we manage it?

We have built a piece of software (not yet open-source, unfortunately) called Konstruis, which was heavily inspired by a fabulous FreeBSD tool called Poudriere!

Long story short, it builds a VM on Xen (XCP-NG), installs the latest version of the packages from the main repo, and starts building our own or upgraded packages, like the HAProxy shown earlier! Then it uploads the built packages to our central storage, and rebuilds our repo!

Furthermore, we have developed another tool: package tracking. All our servers have this tool, which runs periodically and sends:

  • the installed packages
  • their version/source

We can then compare to our internal database and gain 2 advantages:

  • we can detect an unauthorized package on our servers
  • we can compare it with the RSS/Atom feed of security.archlinux.org
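
The first check can be sketched in a few lines of Python. This is not our actual tool: it assumes the report is the plain "name version" listing that `pacman -Q` prints, and the approved list, the package versions, and the function names are hypothetical. The version-drift check is an extra we added for illustration.

```python
def parse_pacman_query(output):
    """Parse "name version" lines (the `pacman -Q` format) into a dict."""
    packages = {}
    for line in output.splitlines():
        parts = line.split()
        if len(parts) == 2:
            name, version = parts
            packages[name] = version
    return packages


def audit(installed, approved):
    """Return (unauthorized package names, {name: (installed, approved)} for version drift)."""
    unauthorized = sorted(name for name in installed if name not in approved)
    drifted = {
        name: (installed[name], approved[name])
        for name in installed
        if name in approved and installed[name] != approved[name]
    }
    return unauthorized, drifted


# Hypothetical report sent by one server.
REPORT = """\
haproxy 2.5.5-1
htop 3.1.2-1
nmap 7.92-1
"""

# Hypothetical extract of the internal database.
APPROVED = {"haproxy": "2.5.5-1", "htop": "3.2.0-1"}

unauthorized, drifted = audit(parse_pacman_query(REPORT), APPROVED)
print(unauthorized)
print(drifted)
```

Here nmap would be flagged as unauthorized and htop as running a version other than the approved one. Matching against the security.archlinux.org feed would follow the same pattern, with the advisories in place of the approved list.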

Then we add the repo to our servers, and it's done!

Stability

Is our solution stable? To be honest, yes! We have a lot of traffic coming in, we are upgrading constantly, and we rarely face issues! Of course, we have had some trouble, but we are used to managing it and improving our redundancy systems, especially for the reboots due to kernel upgrades! (We don't have any infamous servers with 10 years of uptime!)

We have more issues with not being able to patch some sensitive servers on Ubuntu/CentOS-like distributions than with Arch, but is Arch the perfect solution? For the moment, and for us, it is!

Better alternatives

We know that there are some alternatives, especially on the server-side:

  • FreeBSD: same advantages as Arch, but sometimes less up to date. We are using it in production for ZFS (a working btrfs).
  • KissLinux: very minimalist OS, could be interesting for us.
  • Exherbo: actively recommended by some people that we respect a lot!

But we are not yet ready to cross this line!

The future?

We hope to release all our tools on GitHub once the code is cleaned up, but we are also planning to add some new features, like geo-replicated storage through Garage and optimized build machines, and to go to the next step: putting Kalvad's kernel configuration inside the build system.

If you have a problem and no one else can help, maybe you can hire the Kalvad-Team.