Splitting Warewulf Images Between PXE and NFS

This article was also published via the CIQ blog on 6 December 2022.

Warewulf 4 introduced compatibility with the OCI container ecosystem, which greatly streamlines the process of defining, importing, and maintaining compute node images compared to other systems--even compared to Warewulf 3! But one aspect of compute node images remains unchanged: they can quickly grow in size.

Warewulf (and the technique of PXE-booting a node image more broadly) expects that a compute node image will remain relatively small. Larger sets of software, like you might provide via an Environment Modules stack or, perhaps, via Spack, are typically deployed via a central NFS share, which is then mounted at runtime by the booted compute node. Even OpenHPC, with software packaged as operating system packages, supports this paradigm: packages are installed on the head node, land in /opt, and are then shared from the head node to compute nodes.

However, there are still benefits to maintaining this software as part of a compute node image. The drawback is that such an image can quickly grow to tens of gigabytes, making network booting difficult.

In this article I'll demonstrate how a full software stack can be managed together with a given compute node image while the resulting payload is split in place between PXE-served netbooting and an NFS-mounted file system.


NOTE

This procedure depends on support for /etc/warewulf/excludes, which was broken in Warewulf v4.3.0.


The root image

First, I start with the standard Rocky Linux 8 image as published by HPCng.

[root@wwctl1 ~]# wwctl container import docker://docker.io/warewulf/rocky:8 rocky-8-split

Installing some software

Using the OpenHPC project as a source, I install a set of typical scientific software. Most OpenHPC packages install software in /opt for distribution via NFS, which is what we're going to do, just a little differently than usual.

[root@wwctl1 ~]# wwctl container shell rocky-8-split
[rocky-8-split] Warewulf> dnf -y install 'dnf-command(config-manager)'
[rocky-8-split] Warewulf> dnf config-manager --set-enabled powertools
[rocky-8-split] Warewulf> dnf -y install epel-release http://repos.openhpc.community/OpenHPC/2/CentOS_8/x86_64/ohpc-release-2-1.el8.x86_64.rpm
[rocky-8-split] Warewulf> dnf -y install valgrind-ohpc {netcdf,pnetcdf,hypre,boost}-gnu9-mpich-ohpc

After installing the software, our image is approaching 2 GB. This isn't egregious (and the compressed image as sent over the network is even smaller), but it gives us a point of comparison for what comes next.

[root@wwctl1 ~]# du -h /var/lib/warewulf/container/rocky-8-split.img{,.gz}
1.8G    /var/lib/warewulf/container/rocky-8-split.img
651M    /var/lib/warewulf/container/rocky-8-split.img.gz

Excluding the software from the final image

Warewulf consults /etc/warewulf/excludes within the image itself to define files that should not be included in the built image. For our example here, I exclude the full contents of /opt/, in anticipation that we'll be mounting it via NFS instead.

[rocky-8-split] Warewulf> cat /etc/warewulf/excludes
/boot
/usr/share/GeoIP
/opt/*

After rebuilding with /opt/* excluded, the image is noticeably smaller, and installing further software under /opt would no longer increase the size of the image delivered over PXE.

[root@wwctl1 ~]# du -h /var/lib/warewulf/container/rocky-8-split.img{,.gz}
1.1G    /var/lib/warewulf/container/rocky-8-split.img
483M    /var/lib/warewulf/container/rocky-8-split.img.gz
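To put the two du listings side by side, the reduction can be worked out as a percentage. The following is only a small sketch: percent_smaller is a hypothetical helper name, and because du -h rounds its output the results are approximate. With the figures above it gives roughly 39% for the uncompressed image and 26% for the compressed one.

```shell
# Hypothetical helper: percentage by which a size shrank, to one decimal place.
# Arguments: size before, size after (both in the same unit).
percent_smaller() {
    awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.1f\n", (before - after) / before * 100 }'
}

percent_smaller 1.8 1.1   # uncompressed image sizes, in GB, as reported by du -h
percent_smaller 651 483   # compressed image sizes, in MB, as reported by du -h
```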

Exporting the software via NFS

With the software in /opt excluded from the image, we need to export it via NFS instead. This is relatively easy to do, though we must discover and hard-code paths to the container directory.

[root@wwctl1 ~]# readlink -f $(wwctl container show rocky-8-split)/opt
/var/lib/warewulf/chroots/rocky-8-split/rootfs/opt

Add an NFS export to /etc/warewulf/warewulf.conf, restart the Warewulf server, and configure NFS with wwctl. Note that I've specified mount: false for this export, as I want to control which nodes will mount it: presumably nodes that aren't using this image should not mount this image's software.

nfs:
  export paths:
  - path: /var/lib/warewulf/chroots/rocky-8-split/rootfs/opt
    export options: rw,sync,no_root_squash
    mount: false

[root@wwctl1 ~]# systemctl restart warewulfd
[root@wwctl1 ~]# wwctl configure nfs
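Since the export path has to be hard-coded, it can help to generate the stanza from the discovered path rather than typing it by hand. This is a minimal sketch, not a Warewulf feature: nfs_export_stanza is a hypothetical helper that only prints text, and merging its output into an existing nfs section of warewulf.conf is still a manual step.

```shell
# Hypothetical helper: print an nfs stanza for warewulf.conf for one export path.
# It only prints text; it does not modify warewulf.conf.
nfs_export_stanza() {
    printf 'nfs:\n'
    printf '  export paths:\n'
    printf '  - path: %s\n' "$1"
    printf '    export options: rw,sync,no_root_squash\n'
    printf '    mount: false\n'
}

# On the Warewulf server, the path could be discovered as shown in this section:
#   nfs_export_stanza "$(readlink -f "$(wwctl container show rocky-8-split)/opt")"
nfs_export_stanza /var/lib/warewulf/chroots/rocky-8-split/rootfs/opt
```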

Mounting the software on the compute node

We can mount this new NFS share just like any other, by listing it in fstab.

Warewulf typically configures fstab as part of the wwinit overlay. In order to mount this NFS share without setting mount: true for all nodes, I copy fstab.ww to a new overlay and add an additional entry.

[root@wwctl1 ~]# wwctl overlay list -a rocky-8-split
OVERLAY NAME                   FILES/DIRS
rocky-8-split                  /etc/
rocky-8-split                  /etc/fstab.ww

[root@wwctl1 ~]# wwctl overlay show rocky-8-split /etc/fstab.ww | tail -n1
{{ .Ipaddr }}:/var/lib/warewulf/chroots/rocky-8-split/rootfs/opt /opt nfs defaults 0 0
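The additional entry can be appended to a copy of fstab.ww in a repeatable way. The sketch below assumes a local copy of the file; add_opt_mount is a hypothetical helper, and placing the edited file into the overlay is left out. It appends the entry only if an identical line is not already present, so it is safe to run more than once.

```shell
# Hypothetical helper: append the /opt NFS mount to a copy of fstab.ww,
# unless an identical line is already there.
# Arguments: path to the fstab.ww copy, export path on the Warewulf server.
add_opt_mount() {
    fstab=$1
    export_path=$2
    # {{ .Ipaddr }} is written literally; Warewulf expands it when rendering the overlay.
    entry="{{ .Ipaddr }}:${export_path} /opt nfs defaults 0 0"
    grep -qxF -- "$entry" "$fstab" || printf '%s\n' "$entry" >> "$fstab"
}

# Demonstration against an empty scratch file standing in for a copy of fstab.ww.
scratch=$(mktemp)
add_opt_mount "$scratch" /var/lib/warewulf/chroots/rocky-8-split/rootfs/opt
add_opt_mount "$scratch" /var/lib/warewulf/chroots/rocky-8-split/rootfs/opt   # no-op the second time
cat "$scratch"
rm -f "$scratch"
```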

I can add the new overlay to our wwinit list, and the fstab in rocky-8-split will override the one in wwinit. (Note: --wwinit was specified as --system in Warewulf 4.3.0.)

[root@wwctl1 ~]# wwctl profile set --wwinit wwinit,rocky-8-split default
[root@wwctl1 ~]# wwctl profile set --container rocky-8-split default

From a compute node, we can see that /opt is mounted via NFS as expected.

[root@compute1 ~]# findmnt /opt
TARGET SOURCE                                                      FSTYPE OPTIONS
/opt   10.0.0.3:/var/lib/warewulf/chroots/rocky-8-split/rootfs/opt nfs4   rw,relatime,vers=4.2,rsize=262144,wsize=262144,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.0.0.4,local_lock=none,addr=10.0.0.3

We can further confirm that /opt is empty on the local, PXE-deployed file system.

[root@compute1 ~]# mount -o bind / /mnt
[root@compute1 ~]# du -s /mnt/opt
0   /mnt/opt
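The same check can be scripted, for example as part of a node health check. This is a sketch under assumptions: is_nfs_type is a hypothetical helper, and it accepts either nfs or nfs4 as the file system type reported by findmnt.

```shell
# Hypothetical helper: succeed if the given file system type is an NFS type.
is_nfs_type() {
    case "$1" in
        nfs|nfs4) return 0 ;;
        *)        return 1 ;;
    esac
}

# On a compute node, the type could be read with findmnt (-n: no header, -o: column):
#   is_nfs_type "$(findmnt -n -o FSTYPE /opt)" && echo "/opt is NFS-mounted"
is_nfs_type nfs4 && echo "nfs4 counts as NFS"
is_nfs_type tmpfs || echo "tmpfs does not"
```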

Future work

As demonstrated here, we can implement split PXE/NFS images using functionality already in Warewulf; but future Warewulf development may simplify this process further:

Container path variables in warewulf.conf

We could support referring to compute node images in warewulf.conf. For example, it would be nice to be able to replace

nfs:
  export paths:
  - path: /var/lib/warewulf/chroots/rocky-8-split/rootfs/opt
    export options: rw,sync,no_root_squash
    mount: false

with something like

nfs:
  export paths:
  - path: {{ containers['rocky-8-split'] }}/opt
    export options: rw,sync,no_root_squash
    mount: false

This way, our configuration would not have to hard-code the path to the container chroot.

Move NFS mount settings to nodes and profiles

Right now, NFS client settings are stored in warewulf.conf as mount options, mount, and implicitly via path; but if these settings were moved to nodes and profiles we could configure per-profile and per-node NFS client behavior without having to manually edit or override fstab.