Stateless provisioning of stateful nodes: examples with Warewulf 4
This article was also published via the CIQ blog on 30 November 2022.
When deploying Warewulf 4, we often encounter expectations that Warewulf should support stateful provisioning. Typically these expectations are born from experience with another system (such as Foreman, XCAT, or even Warewulf 3) that supported writing a provisioned operating system to the local disk of each compute node.
Warewulf 4 intentionally omits this kind of stateful provisioning from its feature set, following experiences from Warewulf 3: the code for stateful provisioning was complex, and required a disproportionate amount of maintenance compared to the number of sites using it.
For the most part, we think that arguments for stateful provisioning are better addressed within Warewulf 4's stateless provisioning process. I'd like to go over three such common use cases here, and show how each can be addressed to provision nodes with local state using Warewulf 4.
Local scratch
The first thing to understand is that stateless provisioning does not mean diskless nodes. For example, you may have a local disk that you want to provide as a scratch file system.
Warewulf compute nodes run a small wwclient agent that assists with the init process during boot and deploys the node's overlays during boot and runtime. wwclient reads its own initialization scripts from /warewulf/init.d/, so we can place startup scripts there to take actions during boot.
My test nodes here are KVM instances with a virtual disk at /dev/vda. This wwclient init script looks for a "local-scratch" file system and, if it does not exist, creates one on the local disk.
#!/bin/sh
#
# /warewulf/init.d/70-mkfs-local-scratch

PATH=/usr/sbin:/usr/bin

# KVM disks require a kernel module
modprobe virtio_blk

fs=$(findfs LABEL=local-scratch)
if [ $? == 0 ]
then
    echo "local-scratch filesystem already exists: ${fs}"
else
    target=/dev/vda
    echo "Creating local-scratch filesystem on ${target}"
    mkfs.ext4 -FL local-scratch "${target}"
fi
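Because the script checks for an existing "local-scratch" label before doing anything, it is idempotent: on subsequent boots the existing file system is found and left alone, so data on the scratch disk survives reboots.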
wwclient runs this script before it hands init off to systemd, so it also runs before fstab is processed. This means we can mount the "local-scratch" file system just like any other disk in fstab.
LABEL=local-scratch /mnt/scratch ext4 defaults,X-mount.mkdir,nofail 0 0
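The X-mount.mkdir option tells mount to create the mount point if it does not already exist, and nofail lets the node finish booting even if the disk is missing or the file system has not been created.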
The Warewulf 4 overlay system allows us to deploy customized files to nodes or groups of nodes (via profiles) at boot. For this example, I've placed my customized fstab and init script in a "local-scratch" overlay and included it as a system overlay, alongside the default wwinit overlay.
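If you want to reproduce this, the overlay can be created and populated from the Warewulf server with wwctl. The commands below are a sketch assuming recent Warewulf 4 wwctl overlay subcommands (create, import, chmod); the source file names are simply wherever you saved the fstab template and init script:

# wwctl overlay create local-scratch
# wwctl overlay import local-scratch ./fstab.ww /etc/fstab.ww
# wwctl overlay import local-scratch ./70-mkfs-local-scratch /warewulf/init.d/70-mkfs-local-scratch
# wwctl overlay chmod local-scratch /warewulf/init.d/70-mkfs-local-scratch 0755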
# wwctl overlay list -a local-scratch
OVERLAY NAME   FILES/DIRS
local-scratch  /etc/
local-scratch  /etc/fstab.ww
local-scratch  /warewulf/
local-scratch  /warewulf/init.d/
local-scratch  /warewulf/init.d/70-mkfs-local-scratch

# wwctl profile set --system wwinit,local-scratch default
# wwctl overlay build
Because local-scratch is listed after wwinit in the "system" overlay list (see above), its fstab overrides the definition in the wwinit overlay. 70-mkfs-local-scratch is placed alongside the other init scripts, and is processed in lexical order.
A node booting with this overlay will create (if it does not exist) a "local-scratch" file system and mount it at "/mnt/scratch", potentially for use by compute jobs.
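Once a node has booted with this overlay, a quick sanity check from the node itself might look like the following (standard util-linux tools; device names and sizes will vary):

# findfs LABEL=local-scratch
# findmnt /mnt/scratch
# df -h /mnt/scratch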
Disk partitioning
But perhaps you want to do something more complex. Perhaps you have a single disk, but you want to allocate part of it for scratch (as above) and part of it as swap space. Perhaps contrary to popular opinion, we actively encourage the use of swap space in an image-netboot environment like Warewulf 4: a swap partition that is at least as big as the image to be booted allows Linux to write idle portions of the image to disk, freeing up system memory for compute jobs.
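To size that swap partition, it helps to know roughly how large the booted image is. One way to check is sketched below; the chroot path and image name are examples for a default Warewulf 4 install, not something Warewulf mandates:

# Size of the image chroot on the Warewulf server
# (path is an assumption; check your warewulf.conf for the actual location)
du -sh /var/lib/warewulf/chroots/rocky-8/rootfs

# On an already-booted node: how much memory the in-memory root file system occupies
df -h /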
So let's expand on the above pattern to actually partition a disk, rather than just format it.
#!/bin/sh
#
# /warewulf/init.d/70-parted

PATH=/usr/sbin:/usr/bin

# KVM disks require a kernel module
modprobe virtio_blk

local_swap=$(findfs LABEL=local-swap)
local_scratch=$(findfs LABEL=local-scratch)

if [ -n "${local_swap}" -a -n "${local_scratch}" ]
then
    echo "Found local-swap: ${local_swap}"
    echo "Found local-scratch: ${local_scratch}"
else
    disk=/dev/vda
    local_swap="${disk}1"
    local_scratch="${disk}2"

    echo "Writing partition table to ${disk}"
    parted --script --align=optimal ${disk} -- \
        mklabel gpt \
        mkpart primary linux-swap 0 2GB \
        mkpart primary ext4 2GB -1

    echo "Creating local-swap on ${local_swap}"
    mkswap --label=local-swap "${local_swap}"

    echo "Creating local-scratch on ${local_scratch}"
    mkfs.ext4 -FL local-scratch "${local_scratch}"
fi
This new init script looks for the expected "local-scratch" and "local-swap" file systems and, if either is not found, uses parted to partition the disk and create them. As before, this happens before fstab is processed, so we can configure both in fstab the standard way.
LABEL=local-swap     swap          swap  defaults,nofail                0 0
LABEL=local-scratch  /mnt/scratch  ext4  defaults,X-mount.mkdir,nofail  0 0
This configuration went into a new parted overlay, allowing us to configure some nodes for "local-scratch" only and some nodes for this partitioned layout.
# wwctl overlay list -a parted
OVERLAY NAME  FILES/DIRS
parted        /etc/
parted        /etc/fstab.ww
parted        /warewulf/
parted        /warewulf/init.d/
parted        /warewulf/init.d/70-parted

# wwctl profile set --system wwinit,parted default
# wwctl overlay build
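After rebooting a node onto this profile, the new layout can be confirmed on the node with standard tools, for example:

# lsblk /dev/vda
# swapon --show
# findmnt /mnt/scratch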
(Note: I installed parted in my system image to support this, but the same could also be done with sfdisk, which is included in the image by default.)
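For the curious, a roughly equivalent sfdisk invocation might look like the sketch below (untested here): sfdisk reads a simple partition description on stdin, with "S" selecting a swap partition type and "L" a Linux file system.

sfdisk /dev/vda <<EOF
label: gpt
,2GiB,S
,,L
EOF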
Persistent storage for logs
Another common use case we hear concerns the persistence of logs on the compute nodes. Particularly in a failure event, where a node must be rebooted, it can be useful to retain logs on the compute host so they can be investigated when the node is brought back up: in a default stateless deployment, these logs are lost on reboot.
We can extend from the previous two examples to deploy a "local-log" file system to retain these logs between reboots.
(Note: generally we advise not retaining logs on compute nodes; instead, you should deploy something like Elasticsearch, Splunk, or even just a central rsyslog instance.)
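As a minimal sketch of that last option, a single drop-in file baked into the image forwards everything to a central syslog server; the host name loghost is just a placeholder for your log server.

# /etc/rsyslog.d/forward.conf
# "@@" forwards over TCP; use a single "@" for UDP
*.* @@loghost:514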
#!/bin/sh
#
# /warewulf/init.d/70-parted

PATH=/usr/sbin:/usr/bin

# KVM disks require a kernel module
modprobe virtio_blk

local_swap=$(findfs LABEL=local-swap)
local_log=$(findfs LABEL=local-log)
local_scratch=$(findfs LABEL=local-scratch)

if [ -n "${local_swap}" -a -n "${local_log}" -a -n "${local_scratch}" ]
then
    echo "Found local-swap: ${local_swap}"
    echo "Found local-log: ${local_log}"
    echo "Found local-scratch: ${local_scratch}"
else
    disk=/dev/vda
    local_swap="${disk}1"
    local_log="${disk}2"
    local_scratch="${disk}3"

    echo "Writing partition table to ${disk}"
    parted --script --align=optimal ${disk} -- \
        mklabel gpt \
        mkpart primary linux-swap 0 2GB \
        mkpart primary ext4 2GB 4GB \
        mkpart primary ext4 4GB -1

    echo "Creating local-swap on ${local_swap}"
    mkswap --label=local-swap "${local_swap}"

    echo "Creating local-log on ${local_log}"
    mkfs.ext4 -FL local-log "${local_log}"

    echo "Populating local-log from image /var/log/"
    mkdir -p /mnt/log/ \
        && mount "${local_log}" /mnt/log \
        && rsync -a /var/log/ /mnt/log/ \
        && umount /mnt/log/ \
        && rmdir /mnt/log

    echo "Creating local-scratch on ${local_scratch}"
    mkfs.ext4 -FL local-scratch "${local_scratch}"
fi
For the most part, this follows the same pattern as the "parted" example above, but adds a step to initialize the new "local-log" file system from the directory structure in the image.
Finally, the new file system is added to fstab, after which logs will be persisted on the local disk.
LABEL=local-swap     swap          swap  defaults,nofail                0 0
LABEL=local-scratch  /mnt/scratch  ext4  defaults,X-mount.mkdir,nofail  0 0
LABEL=local-log      /var/log      ext4  defaults,nofail                0 0
Some applications may write logs outside of /var/log; but, in these instances, it's probably easier to configure the application to write to /var/log than to try to capture all the places where logs might be written.
The future
There are a few more use cases that we sometimes hear brought up in the context of stateful node provisioning:
- How can we use Ansible to configure compute nodes?
- How can we configure custom kernels and kernel modules per node?
- Isn't stateless provisioning slower than having the OS deployed on disk?
If you'd like to hear more about these or other potential corner-cases for stateless provisioning, get in touch! We'd love to hear from you, learn about the work you're doing, and address any of the challenges you're having.