We use proot in our build pipeline and it would be interesting to look into alternatives.
Re 'proot': I've never used it (it seems to be a configurator for the mount namespace), but nsjail seems much more advanced: cgroups support, seccomp-bpf policies written in a configuration language, and a few more features (configs, net).
Not exactly, you can technically overwrite a file with bind mounts, e.g. use
nsjail --chroot / -R /dev/null:/etc/passwd -- /bin/sh -i
This will make /etc/passwd empty, but nsjail doesn't rewrite syscalls. In order to do that, you'd have to use SECCOMP_RET_TRACE (TRACE(number) in kafel config lang), and then add some C code to nsjail which will use ptrace() to intercept and rewrite your syscall. It's possible, just not implemented, because it didn't seem like something that's required by users.
Otherwise, it's possible to make it support that. Though, a word of caution: ptrace() is a complex and sometimes buggy interface with a lot of corner cases. In other words, it's easy to make a mistake that has consequences for the security of the whole setup.
PS: It's also possible to use SECCOMP_RET_TRAP (TRAP(number) in the nomenclature of kafel, nsjail's seccomp-bpf config language) and rewrite syscalls in-process with the help of a SIGSYS signal handler.
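To illustrate the SECCOMP_RET_TRAP route, here is a minimal C sketch. It is not code from nsjail, and it assumes x86_64 or aarch64 Linux with glibc. The names `install_getpid_trap`, `on_sigsys` and `FAKE_PID` are hypothetical, and getpid is just a convenient syscall to fake. The filter is written as raw BPF here rather than compiled from a kafel policy.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for REG_RAX in <ucontext.h> */
#endif
#include <errno.h>
#include <linux/audit.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <signal.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <ucontext.h>
#include <unistd.h>

#ifndef SYS_SECCOMP
#define SYS_SECCOMP 1 /* si_code the kernel sets for seccomp-generated SIGSYS */
#endif

#if defined(__x86_64__)
#ifndef REG_RAX
#define REG_RAX 13 /* fallback: index of RAX in glibc's gregs[] */
#endif
#define TRAP_ARCH AUDIT_ARCH_X86_64
#define RET_REG(uc) ((uc)->uc_mcontext.gregs[REG_RAX])
#elif defined(__aarch64__)
#define TRAP_ARCH AUDIT_ARCH_AARCH64
#define RET_REG(uc) ((uc)->uc_mcontext.regs[0])
#else
#error "this sketch only covers x86_64 and aarch64"
#endif

#define FAKE_PID 1 /* hypothetical value reported instead of the real PID */

/* SIGSYS handler: the kernel skipped the trapped syscall, so emulate it by
 * writing the result into the register that holds the syscall return value.
 * That register is restored from the ucontext when the handler returns. */
static void on_sigsys(int sig, siginfo_t *info, void *ctx) {
    ucontext_t *uc = (ucontext_t *)ctx;
    (void)sig;
    if (info->si_code != SYS_SECCOMP)
        return;
    if (info->si_syscall == SYS_getpid)
        RET_REG(uc) = FAKE_PID;
    else
        RET_REG(uc) = -ENOSYS;
}

/* Installs the handler and a filter that traps getpid and allows all other
 * syscalls. Returns 0 on success, -1 on failure. */
int install_getpid_trap(void) {
    struct sock_filter filter[] = {
        /* Foreign architecture: allow. A real policy would kill instead. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, arch)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, TRAP_ARCH, 0, 3),
        /* Load the syscall number: trap getpid, allow everything else. */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, SYS_getpid, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_TRAP),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short)(sizeof(filter) / sizeof(filter[0])),
        .filter = filter,
    };
    struct sigaction sa = {0};

    sa.sa_sigaction = on_sigsys;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSYS, &sa, NULL) != 0)
        return -1;
    /* Required so an unprivileged process may install a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog) != 0)
        return -1;
    return 0;
}
```

Once install_getpid_trap() has returned 0, a call to syscall(SYS_getpid) in the same process returns FAKE_PID instead of the real PID. Keep in mind that the handler runs inside the sandboxed process, so this approach is fine for emulating syscalls but cannot enforce anything against code that is actively hostile.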
Yup, nsjail doesn't have X hacks (I should work on that), though it offers some profiles for Apache-like applications:
https://github.com/google/nsjail/tree/master/configs
I believe nsjail uses one of the most advanced (if not the most advanced) seccomp-bpf config languages, kafel: https://github.com/google/kafel
Are there any other notable differences?
- ability to use config files (in nsjail in protobuf format)
- 3 operational modes: one of them lets it listen on a TCP port and run processes on demand (inetd-style)
- support for cgroups (pid and mem limiting); rlimits alone are not enough here
- more expressive seccomp-bpf rules
systemd-nspawn supports ".nspawn" files (see --settings=true mode)
> socket activation
systemd can start up an nspawn container in response to a systemd socket-activation request, I think?
> cgroups
I guess for that you'd use 'systemd-run --scope -p MemoryLimit=10M -p CPUShares=100 -- systemd-nspawn ...'
> more expressive seccomp-bpf rules
Absolutely!
Without nsjail making that guarantee, nsjail is just yet another command line interface to namespaces.
Thank you authors! Really appreciate your work on this project.