<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="../assets/xml/rss.xsl" media="all"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd"><channel><title>civilfritz.net (Posts about technology)</title><link>https://civilfritz.net/</link><description></description><atom:link href="https://civilfritz.net/tags/technology.xml" rel="self" type="application/rss+xml"></atom:link><language>en</language><copyright>Contents © 2024 &lt;a href="mailto:janderson@civilfritz.net"&gt;Jonathon Anderson&lt;/a&gt; &lt;a rel="license" href="http://creativecommons.org/licenses/by-sa/4.0/"&gt;&lt;img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-sa/4.0/88x31.png" /&gt;&lt;/a&gt;</copyright><lastBuildDate>Sun, 24 Mar 2024 06:48:49 GMT</lastBuildDate><generator>Nikola (getnikola.com)</generator><docs>http://blogs.law.harvard.edu/tech/rss</docs><item><title>Splitting Warewulf Images Between PXE and NFS</title><link>https://civilfritz.net/anderbubble/splitting-warewulf-images-between-pxe-and-nfs/</link><dc:creator>Jonathon Anderson</dc:creator><description>&lt;p&gt;&lt;em&gt;&lt;a href="https://ciq.co/blog/splitting-warewulf-images-between-pxe-and-nfs/"&gt;This article was also published via the CIQ blog on 6 December 2022.&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;Warewulf 4 introduced compatibility with the OCI container ecosystem,
which greatly streamlines the process of defining, importing, and
maintaining compute node images compared to other systems--even
compared to Warewulf 3! But one aspect of compute node images remains
unchanged: they can quickly grow in size.&lt;/p&gt;
&lt;p&gt;Warewulf (and the technique of PXE-booting a node image more broadly)
expects that a compute node image will remain relatively small. Larger
sets of software, like you might provide via an Environment Modules
stack or, perhaps, via Spack, are typically deployed via a central NFS
share, which is then mounted at runtime by the booted compute
node. Even OpenHPC, with software packaged as operating system
packages, supports this paradigm, with packages installed on the
head node, landing in &lt;code&gt;/opt&lt;/code&gt;, and then being shared from the head node
to compute nodes.&lt;/p&gt;
&lt;p&gt;Still, there are benefits to maintaining this software as part
of a compute node image. The trouble is that such an image can quickly
grow to tens of gigabytes, making network booting difficult.&lt;/p&gt;
&lt;p&gt;In this article I'll demonstrate how a full software stack can be
managed together with a given compute node image while the resultant
payload is split in-place between PXE-served netbooting and an
NFS-mounted file system.&lt;/p&gt;
&lt;hr&gt;
&lt;p&gt;&lt;strong&gt;NOTE&lt;/strong&gt;&lt;/p&gt;
&lt;p&gt;This procedure depends on support for &lt;code&gt;/etc/warewulf/excludes&lt;/code&gt;, which
was broken in Warewulf v4.3.0.&lt;/p&gt;
&lt;hr&gt;
&lt;h3&gt;The root image&lt;/h3&gt;
&lt;p&gt;First, I start with the standard Rocky Linux 8 image as published by
HPCng.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="gp"&gt;[root@wwctl1 ~]# &lt;/span&gt;wwctl&lt;span class="w"&gt; &lt;/span&gt;container&lt;span class="w"&gt; &lt;/span&gt;import&lt;span class="w"&gt; &lt;/span&gt;docker://docker.io/warewulf/rocky:8&lt;span class="w"&gt; &lt;/span&gt;rocky-8-split
&lt;/pre&gt;&lt;/div&gt;

&lt;h3&gt;Installing some software&lt;/h3&gt;
&lt;p&gt;Using the OpenHPC project as a source, I install a set of typical
scientific software. Most OpenHPC packages install software in &lt;code&gt;/opt&lt;/code&gt;
for distribution via NFS, which is what we're going to do: just a
little bit differently than usual.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="gp"&gt;[root@wwctl1 ~]# &lt;/span&gt;wwctl&lt;span class="w"&gt; &lt;/span&gt;container&lt;span class="w"&gt; &lt;/span&gt;shell&lt;span class="w"&gt; &lt;/span&gt;rocky-8-split
&lt;span class="go"&gt;[rocky-8-split] Warewulf&amp;gt; dnf -y install 'dnf-command(config-manager)'&lt;/span&gt;
&lt;span class="go"&gt;[rocky-8-split] Warewulf&amp;gt; dnf config-manager --set-enabled powertools&lt;/span&gt;
&lt;span class="go"&gt;[rocky-8-split] Warewulf&amp;gt; dnf -y install epel-release http://repos.openhpc.community/OpenHPC/2/CentOS_8/x86_64/ohpc-release-2-1.el8.x86_64.rpm&lt;/span&gt;
&lt;span class="go"&gt;[rocky-8-split] Warewulf&amp;gt; dnf -y install valgrind-ohpc {netcdf,pnetcdf,hypre,boost}-gnu9-mpich-ohpc&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;After installing the software our image is approaching 2GB. This isn't
egregious (and the compressed image as sent over the network is even
smaller), but gives us a point of comparison for what comes next.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="gp"&gt;[root@wwctl1 ~]# &lt;/span&gt;du&lt;span class="w"&gt; &lt;/span&gt;-h&lt;span class="w"&gt; &lt;/span&gt;/var/lib/warewulf/container/rocky-8-split.img&lt;span class="o"&gt;{&lt;/span&gt;,.gz&lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="go"&gt;1.8G    /var/lib/warewulf/container/rocky-8-split.img&lt;/span&gt;
&lt;span class="go"&gt;651M    /var/lib/warewulf/container/rocky-8-split.img.gz&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;h3&gt;Excluding the software from the final image&lt;/h3&gt;
&lt;p&gt;Warewulf consults &lt;code&gt;/etc/warewulf/excludes&lt;/code&gt; within the image itself to
define files that should not be included in the built image. For our
example here, I exclude the full contents of &lt;code&gt;/opt/&lt;/code&gt;, in anticipation
that we'll be mounting it via NFS instead.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="go"&gt;[rocky-8-split] Warewulf&amp;gt; cat /etc/warewulf/excludes&lt;/span&gt;
&lt;span class="go"&gt;/boot&lt;/span&gt;
&lt;span class="go"&gt;/usr/share/GeoIP&lt;/span&gt;
&lt;span class="go"&gt;/opt/*&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Rebuilding the image with &lt;code&gt;/opt/*&lt;/code&gt; excluded, the image is reduced in
size, and further software installation would no longer increase the
final size of the image delivered over PXE.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="gp"&gt;[root@wwctl1 ~]# &lt;/span&gt;du&lt;span class="w"&gt; &lt;/span&gt;-h&lt;span class="w"&gt; &lt;/span&gt;/var/lib/warewulf/container/rocky-8-split.img&lt;span class="o"&gt;{&lt;/span&gt;,.gz&lt;span class="o"&gt;}&lt;/span&gt;
&lt;span class="go"&gt;1.1G    /var/lib/warewulf/container/rocky-8-split.img&lt;/span&gt;
&lt;span class="go"&gt;483M    /var/lib/warewulf/container/rocky-8-split.img.gz&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
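&lt;p&gt;As a minimal sketch of how the exclude entry could be added repeatably, the helper below appends a pattern to an excludes file only when that exact line is missing. The name &lt;code&gt;add_exclude&lt;/code&gt; is hypothetical and not part of Warewulf; only the file path and the &lt;code&gt;/opt/*&lt;/code&gt; pattern come from the example above.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;
```shell
#!/bin/sh
# Hypothetical helper (not part of Warewulf): append a pattern to an
# excludes file only if that exact line is not already present.
add_exclude() {
    file=$1
    pattern=$2
    # -x matches whole lines only, -F treats the pattern as a fixed string,
    # so the "*" in "/opt/*" is compared literally.
    if ! grep -qxF -- "$pattern" "$file" 2>/dev/null
    then
        printf '%s\n' "$pattern" >> "$file"
    fi
}

# Example, run from inside the container shell for the image:
#   add_exclude /etc/warewulf/excludes '/opt/*'
```
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;Calling the helper twice leaves a single entry in the file. The image still has to be rebuilt afterwards for the exclusion to take effect.&lt;/p&gt;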

&lt;h3&gt;Exporting the software via NFS&lt;/h3&gt;
&lt;p&gt;With the software in &lt;code&gt;/opt&lt;/code&gt; excluded from the image, we need to export
it via NFS instead. This is relatively easy, though we must
discover and hard-code paths to the container directory.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="gp"&gt;[root@wwctl1 ~]# &lt;/span&gt;readlink&lt;span class="w"&gt; &lt;/span&gt;-f&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="k"&gt;$(&lt;/span&gt;wwctl&lt;span class="w"&gt; &lt;/span&gt;container&lt;span class="w"&gt; &lt;/span&gt;show&lt;span class="w"&gt; &lt;/span&gt;rocky-8-split&lt;span class="k"&gt;)&lt;/span&gt;/opt
&lt;span class="go"&gt;/var/lib/warewulf/chroots/rocky-8-split/rootfs/opt&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Add an NFS export to &lt;code&gt;/etc/warewulf/warewulf.conf&lt;/code&gt;, restart the
Warewulf server, and configure NFS with &lt;code&gt;wwctl&lt;/code&gt;. Note that I've
specified &lt;code&gt;mount: false&lt;/code&gt; for this export, as I want to control &lt;em&gt;which&lt;/em&gt;
nodes will mount it: presumably nodes that aren't using this image
should not mount this image's software.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="nt"&gt;nfs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;export paths&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;/var/lib/warewulf/chroots/rocky-8-split/rootfs/opt&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;export options&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;rw,sync,no_root_squash&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;mount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;false&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="gp"&gt;[root@wwctl1 ~]# &lt;/span&gt;systemctl&lt;span class="w"&gt; &lt;/span&gt;restart&lt;span class="w"&gt; &lt;/span&gt;warewulfd
&lt;span class="gp"&gt;[root@wwctl1 ~]# &lt;/span&gt;wwctl&lt;span class="w"&gt; &lt;/span&gt;configure&lt;span class="w"&gt; &lt;/span&gt;nfs
&lt;/pre&gt;&lt;/div&gt;

&lt;h3&gt;Mounting the software on the compute node&lt;/h3&gt;
&lt;p&gt;We can mount this new NFS share just like any other, by listing it in &lt;code&gt;fstab&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Warewulf typically configures &lt;code&gt;fstab&lt;/code&gt; as part of the &lt;code&gt;wwinit&lt;/code&gt;
overlay. In order to mount this NFS share without setting &lt;code&gt;mount:
true&lt;/code&gt; for all nodes, I copy &lt;code&gt;fstab.ww&lt;/code&gt; to a new overlay and add an
additional entry.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="gp"&gt;[root@wwctl1 ~]# &lt;/span&gt;wwctl&lt;span class="w"&gt; &lt;/span&gt;overlay&lt;span class="w"&gt; &lt;/span&gt;list&lt;span class="w"&gt; &lt;/span&gt;-a&lt;span class="w"&gt; &lt;/span&gt;rocky-8-split
&lt;span class="go"&gt;OVERLAY NAME                   FILES/DIRS&lt;/span&gt;
&lt;span class="go"&gt;rocky-8-split                  /etc/&lt;/span&gt;
&lt;span class="go"&gt;rocky-8-split                  /etc/fstab.ww&lt;/span&gt;

&lt;span class="gp"&gt;[root@wwctl1 ~]# &lt;/span&gt;wwctl&lt;span class="w"&gt; &lt;/span&gt;overlay&lt;span class="w"&gt; &lt;/span&gt;show&lt;span class="w"&gt; &lt;/span&gt;rocky-8-split&lt;span class="w"&gt; &lt;/span&gt;/etc/fstab.ww&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;tail&lt;span class="w"&gt; &lt;/span&gt;-n1
&lt;span class="go"&gt;{{ .Ipaddr }}:/var/lib/warewulf/chroots/rocky-8-split/rootfs/opt /opt nfs defaults 0 0&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;I can add the new overlay to our &lt;code&gt;wwinit&lt;/code&gt; list, and the &lt;code&gt;fstab&lt;/code&gt; in
&lt;code&gt;rocky-8-split&lt;/code&gt; will override the one in &lt;code&gt;wwinit&lt;/code&gt;. (Note: &lt;code&gt;--wwinit&lt;/code&gt;
was specified as &lt;code&gt;--system&lt;/code&gt; in Warewulf 4.3.0.)&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="gp"&gt;[root@wwctl1 ~]# &lt;/span&gt;wwctl&lt;span class="w"&gt; &lt;/span&gt;profile&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;set&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;--wwinit&lt;span class="w"&gt; &lt;/span&gt;wwinit,rocky-8-split&lt;span class="w"&gt; &lt;/span&gt;default
&lt;span class="gp"&gt;[root@wwctl1 ~]# &lt;/span&gt;wwctl&lt;span class="w"&gt; &lt;/span&gt;profile&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;set&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;--container&lt;span class="w"&gt; &lt;/span&gt;rocky-8-split&lt;span class="w"&gt; &lt;/span&gt;default
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;From a compute node, we can see that &lt;code&gt;/opt&lt;/code&gt; is mounted via NFS as
expected.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="gp"&gt;[root@compute1 ~]# &lt;/span&gt;findmnt&lt;span class="w"&gt; &lt;/span&gt;/opt
&lt;span class="go"&gt;TARGET SOURCE                                                      FSTYPE OPTIONS&lt;/span&gt;
&lt;span class="go"&gt;/opt   10.0.0.3:/var/lib/warewulf/chroots/rocky-8-split/rootfs/opt nfs4   rw,relatime,vers=4.2,rsize=262144,wsize=262144,namlen=255,hard,proto=tcp,timeo=600,retrans=2,sec=sys,clientaddr=10.0.0.4,local_lock=none,addr=10.0.0.3&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;We can further confirm that &lt;code&gt;/opt&lt;/code&gt; is empty on the local, PXE-deployed
file system.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="gp"&gt;[root@compute1 ~]# &lt;/span&gt;mount&lt;span class="w"&gt; &lt;/span&gt;-o&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;bind&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;/&lt;span class="w"&gt; &lt;/span&gt;/mnt
&lt;span class="gp"&gt;[root@compute1 ~]# &lt;/span&gt;du&lt;span class="w"&gt; &lt;/span&gt;-s&lt;span class="w"&gt; &lt;/span&gt;/mnt/opt
&lt;span class="go"&gt;0   /mnt/opt&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;
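&lt;p&gt;The &lt;code&gt;fstab&lt;/code&gt; entry used in this section follows a fixed pattern, so it can be generated rather than typed by hand. The sketch below assumes the chroot layout reported by &lt;code&gt;readlink -f&lt;/code&gt; earlier; the function name &lt;code&gt;opt_fstab_entry&lt;/code&gt; and its optional second argument are hypothetical.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;
```shell
#!/bin/sh
# Hypothetical helper: print the fstab.ww line that mounts a container's
# /opt over NFS. Assumes chroots live under CHROOTS/NAME/rootfs, as shown
# by the readlink output earlier in this article.
opt_fstab_entry() {
    container=$1
    chroots=${2:-/var/lib/warewulf/chroots}
    # "{{ .Ipaddr }}" is copied verbatim from the entry above and is left
    # for the overlay template to fill in.
    printf '{{ .Ipaddr }}:%s/%s/rootfs/opt /opt nfs defaults 0 0\n' \
        "$chroots" "$container"
}

opt_fstab_entry rocky-8-split
```
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;For &lt;code&gt;rocky-8-split&lt;/code&gt; this prints the same line that appears at the end of the overlay's &lt;code&gt;fstab.ww&lt;/code&gt;.&lt;/p&gt;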

&lt;h3&gt;Future work&lt;/h3&gt;
&lt;p&gt;As demonstrated here, we can implement split PXE/NFS images
using functionality already in Warewulf; but future Warewulf
development may simplify this process further:&lt;/p&gt;
&lt;h4&gt;Container path variables in warewulf.conf&lt;/h4&gt;
&lt;p&gt;We could support referring to compute node images in
&lt;code&gt;warewulf.conf&lt;/code&gt;. For example, it would be nice to be able to replace&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="nt"&gt;nfs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;export paths&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;/var/lib/warewulf/chroots/rocky-8-split/rootfs/opt&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;export options&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;rw,sync,no_root_squash&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;mount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;false&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;with something like&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="nt"&gt;nfs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="nt"&gt;export paths&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
&lt;span class="w"&gt;  &lt;/span&gt;&lt;span class="p p-Indicator"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;path&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p p-Indicator"&gt;{{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;containers&lt;/span&gt;&lt;span class="p p-Indicator"&gt;[&lt;/span&gt;&lt;span class="s"&gt;'rocky-8-split'&lt;/span&gt;&lt;span class="p p-Indicator"&gt;]&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p p-Indicator"&gt;}}&lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;/opt&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;export options&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;rw,sync,no_root_squash&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nt"&gt;mount&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="l l-Scalar l-Scalar-Plain"&gt;false&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This way, our configuration would not have to hard-code the path to
the container chroot.&lt;/p&gt;
&lt;h4&gt;Move NFS mount settings to nodes and profiles&lt;/h4&gt;
&lt;p&gt;Right now, NFS client settings are stored in &lt;code&gt;warewulf.conf&lt;/code&gt; as &lt;code&gt;mount
options&lt;/code&gt;, &lt;code&gt;mount&lt;/code&gt;, and implicitly via &lt;code&gt;path&lt;/code&gt;; but if these settings
were moved to nodes and profiles we could configure per-profile and
per-node NFS client behavior without having to manually edit or
override &lt;code&gt;fstab&lt;/code&gt;.&lt;/p&gt;</description><category>technology</category><category>warewulf</category><guid>urn:uuid:0f31138a-a5e0-467b-be30-c9287d505ec9</guid><pubDate>Wed, 07 Dec 2022 23:30:00 GMT</pubDate></item><item><title>Stateless provisioning of stateful nodes: examples with Warewulf 4</title><link>https://civilfritz.net/anderbubble/stateless-provisioning-of-stateful-nodes/</link><dc:creator>Jonathon Anderson</dc:creator><description>&lt;p&gt;&lt;em&gt;&lt;a href="https://ciq.co/blog/stateless-provisioning-of-stateful-notes-examples-with-warewulf-4/"&gt;This article was also published via the CIQ blog on 30 November 2022.&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;
&lt;p&gt;When deploying Warewulf 4, we often encounter expectations that
Warewulf should support stateful provisioning. Typically these
expectations are born from experience with another system (such as
Foreman, xCAT, or even Warewulf 3) that supported writing a
provisioned operating system to the local disk of each compute node.&lt;/p&gt;
&lt;p&gt;Warewulf 4 intentionally omits this kind of stateful provisioning from
its feature set, following experiences from Warewulf 3: the code for
stateful provisioning was complex, and required a disproportionate
amount of maintenance compared to the number of sites using it.&lt;/p&gt;
&lt;p&gt;For the most part, we think that arguments for stateful provisioning
are better addressed within Warewulf 4's stateless provisioning
process. I'd like to go over three such common use cases here, and
show how each can be addressed to provision nodes with local state
using Warewulf 4.&lt;/p&gt;
&lt;h3&gt;Local scratch&lt;/h3&gt;
&lt;p&gt;The first thing to understand is that stateless provisioning does not
mean diskless nodes. For example, you may have a local disk that you
want to provide as a scratch file system.&lt;/p&gt;
&lt;p&gt;Warewulf compute nodes run a small &lt;code&gt;wwclient&lt;/code&gt; agent that assists with
the &lt;code&gt;init&lt;/code&gt; process during boot and deploys the node's overlays during
boot and runtime. &lt;code&gt;wwclient&lt;/code&gt; reads its own initialization scripts from
&lt;code&gt;/warewulf/init.d/&lt;/code&gt;, so we can place startup scripts there to take
actions during boot.&lt;/p&gt;
&lt;p&gt;My test nodes here are KVM instances with a virtual disk at
&lt;code&gt;/dev/vda&lt;/code&gt;. This &lt;code&gt;wwclient&lt;/code&gt; init script looks for a "local-scratch"
file system and, if it does not exist, creates one on the local disk.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="ch"&gt;#!/bin/sh&lt;/span&gt;
&lt;span class="c1"&gt;#&lt;/span&gt;
&lt;span class="c1"&gt;# /warewulf/init.d/70-mkfs-local-scratch&lt;/span&gt;

&lt;span class="nv"&gt;PATH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/usr/sbin:/usr/bin

&lt;span class="c1"&gt;# KVM disks require a kernel module&lt;/span&gt;
modprobe&lt;span class="w"&gt; &lt;/span&gt;virtio_blk

&lt;span class="nv"&gt;fs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;$(&lt;/span&gt;findfs&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;LABEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;local-scratch&lt;span class="k"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;$?&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;-eq&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
&lt;span class="k"&gt;then&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"local-scratch filesystem already exists: &lt;/span&gt;&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;fs&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nv"&gt;target&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/dev/vda
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Creating local-scratch filesystem on &lt;/span&gt;&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;target&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;mkfs.ext4&lt;span class="w"&gt; &lt;/span&gt;-FL&lt;span class="w"&gt; &lt;/span&gt;local-scratch&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;target&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="k"&gt;fi&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;&lt;code&gt;wwclient&lt;/code&gt; runs this script before it passes &lt;code&gt;init&lt;/code&gt; on to systemd, so
it is also processed before &lt;code&gt;fstab&lt;/code&gt;. That means we can mount the
"local-scratch" file system just like any other disk in &lt;code&gt;fstab&lt;/code&gt;.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;LABEL=local-scratch /mnt/scratch ext4 defaults,X-mount.mkdir,nofail 0 0
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;The Warewulf 4 overlay system allows us to deploy customized files to
nodes or groups of nodes (via profiles) at boot. For this example,
I've placed my customized &lt;code&gt;fstab&lt;/code&gt; and init script in a "local-scratch"
overlay and included it as a system overlay, alongside the default
&lt;code&gt;wwinit&lt;/code&gt; overlay.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="gp"&gt;# &lt;/span&gt;wwctl&lt;span class="w"&gt; &lt;/span&gt;overlay&lt;span class="w"&gt; &lt;/span&gt;list&lt;span class="w"&gt; &lt;/span&gt;-a&lt;span class="w"&gt; &lt;/span&gt;local-scratch
&lt;span class="go"&gt;OVERLAY NAME                   FILES/DIRS  &lt;/span&gt;
&lt;span class="go"&gt;local-scratch                  /etc/        &lt;/span&gt;
&lt;span class="go"&gt;local-scratch                  /etc/fstab.ww&lt;/span&gt;
&lt;span class="go"&gt;local-scratch                  /warewulf/   &lt;/span&gt;
&lt;span class="go"&gt;local-scratch                  /warewulf/init.d/&lt;/span&gt;
&lt;span class="go"&gt;local-scratch                  /warewulf/init.d/70-mkfs-local-scratch&lt;/span&gt;

&lt;span class="gp"&gt;# &lt;/span&gt;wwctl&lt;span class="w"&gt; &lt;/span&gt;profile&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;set&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;--system&lt;span class="w"&gt; &lt;/span&gt;wwinit,local-scratch&lt;span class="w"&gt; &lt;/span&gt;default
&lt;span class="gp"&gt;# &lt;/span&gt;wwctl&lt;span class="w"&gt; &lt;/span&gt;overlay&lt;span class="w"&gt; &lt;/span&gt;build
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Because &lt;code&gt;local-scratch&lt;/code&gt; is listed after &lt;code&gt;wwinit&lt;/code&gt; in the "system"
overlay list (see above), its &lt;code&gt;fstab&lt;/code&gt; overrides the definition in the
&lt;code&gt;wwinit&lt;/code&gt; overlay. &lt;code&gt;70-mkfs-local-scratch&lt;/code&gt; is placed alongside other
init scripts, and is processed in lexical order.&lt;/p&gt;
&lt;p&gt;A node booting with this overlay will create (if it does not exist) a
"local-scratch" file system and mount it at "/mnt/scratch",
potentially for use by compute jobs.&lt;/p&gt;
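&lt;p&gt;To check where a new script such as &lt;code&gt;70-mkfs-local-scratch&lt;/code&gt; falls relative to the other init scripts, the names can be sorted the same way. This is only a sketch: it assumes that "lexical order" means a plain byte-wise sort, and the function name &lt;code&gt;init_order&lt;/code&gt; is hypothetical.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;
```shell
#!/bin/sh
# Hypothetical helper: list init scripts in byte-wise lexical order.
# The directory defaults to the path used in this article.
init_order() {
    dir=${1:-/warewulf/init.d}
    # LC_ALL=C keeps the ordering independent of the node's locale.
    ls -1 "$dir" | LC_ALL=C sort
}

# Example, run on a booted compute node:
#   init_order
```
&lt;/pre&gt;&lt;/div&gt;
&lt;p&gt;A numeric prefix such as &lt;code&gt;70-&lt;/code&gt; therefore places the script after anything starting with a lower two-digit prefix and before anything starting with a higher one.&lt;/p&gt;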
&lt;h3&gt;Disk partitioning&lt;/h3&gt;
&lt;p&gt;But perhaps you want to do something more complex. Perhaps you have a
single disk, but you want to allocate part of it for scratch (as
above) and part of it as swap space. Perhaps contrary to popular
opinion, we actively encourage the use of swap space in an
image-netboot environment like Warewulf 4: a swap partition that is at
least as big as the image to be booted allows Linux to write idle
portions of the image to disk, freeing up system memory for compute
jobs.&lt;/p&gt;
&lt;p&gt;So let's expand on the above pattern to actually &lt;em&gt;partition&lt;/em&gt; a disk,
rather than just format it.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="ch"&gt;#!/bin/sh&lt;/span&gt;
&lt;span class="c1"&gt;#&lt;/span&gt;
&lt;span class="c1"&gt;# /warewulf/init.d/70-parted&lt;/span&gt;

&lt;span class="nv"&gt;PATH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/usr/sbin:/usr/bin

&lt;span class="c1"&gt;# KVM disks require a kernel module&lt;/span&gt;
modprobe&lt;span class="w"&gt; &lt;/span&gt;virtio_blk

&lt;span class="nv"&gt;local_swap&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;$(&lt;/span&gt;findfs&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;LABEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;local-swap&lt;span class="k"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;local_scratch&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;$(&lt;/span&gt;findfs&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;LABEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;local-scratch&lt;span class="k"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;-n&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;local_swap&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;-a&lt;span class="w"&gt; &lt;/span&gt;-n&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;local_scratch&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
&lt;span class="k"&gt;then&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Found local-swap: &lt;/span&gt;&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;local_swap&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Found local-scratch: &lt;/span&gt;&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;local_scratch&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nv"&gt;disk&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/dev/vda
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nv"&gt;local_swap&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;disk&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;1"&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nv"&gt;local_scratch&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;disk&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;2"&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Writing partition table to &lt;/span&gt;&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;disk&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;parted&lt;span class="w"&gt; &lt;/span&gt;--script&lt;span class="w"&gt; &lt;/span&gt;--align&lt;span class="o"&gt;=&lt;/span&gt;optimal&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;disk&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;--&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;mklabel&lt;span class="w"&gt; &lt;/span&gt;gpt&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;mkpart&lt;span class="w"&gt; &lt;/span&gt;primary&lt;span class="w"&gt; &lt;/span&gt;linux-swap&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;2GB&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;mkpart&lt;span class="w"&gt; &lt;/span&gt;primary&lt;span class="w"&gt; &lt;/span&gt;ext4&lt;span class="w"&gt; &lt;/span&gt;2GB&lt;span class="w"&gt; &lt;/span&gt;-1

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Creating local-swap on &lt;/span&gt;&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;local_swap&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;mkswap&lt;span class="w"&gt; &lt;/span&gt;--label&lt;span class="o"&gt;=&lt;/span&gt;local-swap&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;local_swap&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Creating local-scratch on &lt;/span&gt;&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;local_scratch&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;mkfs.ext4&lt;span class="w"&gt; &lt;/span&gt;-FL&lt;span class="w"&gt; &lt;/span&gt;local-scratch&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;local_scratch&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="k"&gt;fi&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This new init script looks for the expected "local-scratch" and
"local-swap" and, if either of them is not found, uses &lt;code&gt;parted&lt;/code&gt; to
partition the disk and creates them. As before, this is done before
&lt;code&gt;fstab&lt;/code&gt; is processed, so we can configure these with &lt;code&gt;fstab&lt;/code&gt; the
standard way.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;LABEL=local-swap swap swap defaults,nofail 0 0
LABEL=local-scratch /mnt/scratch ext4 defaults,X-mount.mkdir,nofail 0 0
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;This configuration went into a new &lt;code&gt;parted&lt;/code&gt; overlay, allowing us to
configure some nodes for "local-scratch" only, and some nodes for this
partitioned layout.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="gp"&gt;# &lt;/span&gt;wwctl&lt;span class="w"&gt; &lt;/span&gt;overlay&lt;span class="w"&gt; &lt;/span&gt;list&lt;span class="w"&gt; &lt;/span&gt;-a&lt;span class="w"&gt; &lt;/span&gt;parted
&lt;span class="go"&gt;OVERLAY NAME                   FILES/DIRS  &lt;/span&gt;
&lt;span class="go"&gt;parted                         /etc/        &lt;/span&gt;
&lt;span class="go"&gt;parted                         /etc/fstab.ww&lt;/span&gt;
&lt;span class="go"&gt;parted                         /warewulf/   &lt;/span&gt;
&lt;span class="go"&gt;parted                         /warewulf/init.d/&lt;/span&gt;
&lt;span class="go"&gt;parted                         /warewulf/init.d/70-parted&lt;/span&gt;

&lt;span class="gp"&gt;# &lt;/span&gt;wwctl&lt;span class="w"&gt; &lt;/span&gt;profile&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nb"&gt;set&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;--system&lt;span class="w"&gt; &lt;/span&gt;wwinit,parted&lt;span class="w"&gt; &lt;/span&gt;default
&lt;span class="gp"&gt;# &lt;/span&gt;wwctl&lt;span class="w"&gt; &lt;/span&gt;overlay&lt;span class="w"&gt; &lt;/span&gt;build
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;(Note: I installed &lt;code&gt;parted&lt;/code&gt; in my system image to support this; but
the same could also be done with &lt;code&gt;sfdisk&lt;/code&gt;, which is included in the
image by default.)&lt;/p&gt;
&lt;h3&gt;Persistent storage for logs&lt;/h3&gt;
&lt;p&gt;Another common use case we hear concerns the persistence of logs on
the compute nodes. Particularly in a failure event, where a node must
be rebooted, it can be useful to have retained logs on the compute
host so that they can be investigated when the node is brought back
up: in a default stateless deployment, these logs are lost on reboot.&lt;/p&gt;
&lt;p&gt;We can extend from the previous two examples to deploy a "local-log"
file system to retain these logs between reboots.&lt;/p&gt;
&lt;p&gt;(Note: generally we advise &lt;em&gt;not&lt;/em&gt; retaining logs on compute nodes:
instead, you should deploy something like Elasticsearch, Splunk, or even
just a central rsyslog instance.)&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;&lt;span class="ch"&gt;#!/bin/sh&lt;/span&gt;
&lt;span class="c1"&gt;#&lt;/span&gt;
&lt;span class="c1"&gt;# /warewulf/init.d/70-parted&lt;/span&gt;

&lt;span class="nv"&gt;PATH&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/usr/sbin:/usr/bin

&lt;span class="c1"&gt;# KVM disks require a kernel module&lt;/span&gt;
modprobe&lt;span class="w"&gt; &lt;/span&gt;virtio_blk

&lt;span class="nv"&gt;local_swap&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;$(&lt;/span&gt;findfs&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;LABEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;local-swap&lt;span class="k"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;local_log&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;$(&lt;/span&gt;findfs&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;LABEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;local-log&lt;span class="k"&gt;)&lt;/span&gt;
&lt;span class="nv"&gt;local_scratch&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;$(&lt;/span&gt;findfs&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;LABEL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;local-scratch&lt;span class="k"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;[&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;-n&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;local_swap&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;-a&lt;span class="w"&gt; &lt;/span&gt;-n&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;local_log&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;-a&lt;span class="w"&gt; &lt;/span&gt;-n&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;local_scratch&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;]&lt;/span&gt;
&lt;span class="k"&gt;then&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Found local-swap: &lt;/span&gt;&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;local_swap&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Found local-log: &lt;/span&gt;&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;local_log&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Found local-scratch: &lt;/span&gt;&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;local_scratch&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="k"&gt;else&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nv"&gt;disk&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;/dev/vda
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nv"&gt;local_swap&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;disk&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;1"&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nv"&gt;local_log&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;disk&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;2"&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nv"&gt;local_scratch&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;disk&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;3"&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Writing partition table to &lt;/span&gt;&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;disk&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;parted&lt;span class="w"&gt; &lt;/span&gt;--script&lt;span class="w"&gt; &lt;/span&gt;--align&lt;span class="o"&gt;=&lt;/span&gt;optimal&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;disk&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;--&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;mklabel&lt;span class="w"&gt; &lt;/span&gt;gpt&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;mkpart&lt;span class="w"&gt; &lt;/span&gt;primary&lt;span class="w"&gt; &lt;/span&gt;linux-swap&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="m"&gt;0&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;2GB&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;mkpart&lt;span class="w"&gt; &lt;/span&gt;primary&lt;span class="w"&gt; &lt;/span&gt;ext4&lt;span class="w"&gt; &lt;/span&gt;2GB&lt;span class="w"&gt; &lt;/span&gt;4GB&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="w"&gt;        &lt;/span&gt;mkpart&lt;span class="w"&gt; &lt;/span&gt;primary&lt;span class="w"&gt; &lt;/span&gt;ext4&lt;span class="w"&gt; &lt;/span&gt;4GB&lt;span class="w"&gt; &lt;/span&gt;-1

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Creating local-swap on &lt;/span&gt;&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;local_swap&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;mkswap&lt;span class="w"&gt; &lt;/span&gt;--label&lt;span class="o"&gt;=&lt;/span&gt;local-swap&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;local_swap&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Creating local-log on &lt;/span&gt;&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;local_log&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;mkfs.ext4&lt;span class="w"&gt; &lt;/span&gt;-FL&lt;span class="w"&gt; &lt;/span&gt;local-log&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;local_log&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Populating local-log from image /var/log/"&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;mkdir&lt;span class="w"&gt; &lt;/span&gt;-p&lt;span class="w"&gt; &lt;/span&gt;/mnt/log/&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;mount&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;local_log&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;/mnt/log&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;rsync&lt;span class="w"&gt; &lt;/span&gt;-a&lt;span class="w"&gt; &lt;/span&gt;/var/log/&lt;span class="w"&gt; &lt;/span&gt;/mnt/log/&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;umount&lt;span class="w"&gt; &lt;/span&gt;/mnt/log/&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="se"&gt;\&lt;/span&gt;
&lt;span class="w"&gt;      &lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;rmdir&lt;span class="w"&gt; &lt;/span&gt;/mnt/log

&lt;span class="w"&gt;    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Creating local-scratch on &lt;/span&gt;&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;local_scratch&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="w"&gt;    &lt;/span&gt;mkfs.ext4&lt;span class="w"&gt; &lt;/span&gt;-FL&lt;span class="w"&gt; &lt;/span&gt;local-scratch&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;local_scratch&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="k"&gt;fi&lt;/span&gt;
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;For the most part, this follows the same pattern as the "parted"
example above, but adds a step to initialize the new "local-log" file
system from the directory structure in the image.&lt;/p&gt;
&lt;p&gt;Finally, the new file system is added to fstab, after which logs will
be persisted on the local disk.&lt;/p&gt;
&lt;div class="code"&gt;&lt;pre class="code literal-block"&gt;LABEL=local-swap swap swap defaults,nofail 0 0
LABEL=local-scratch /mnt/scratch ext4 defaults,X-mount.mkdir,nofail 0 0
LABEL=local-log /var/log ext4 defaults,nofail 0 0
&lt;/pre&gt;&lt;/div&gt;

&lt;p&gt;Some applications may write logs outside of &lt;code&gt;/var/log&lt;/code&gt;; but, in these
instances, it's probably easier to configure the application to write
to &lt;code&gt;/var/log&lt;/code&gt; than to try to capture all the places where logs might
be written.&lt;/p&gt;
&lt;h3&gt;The future&lt;/h3&gt;
&lt;p&gt;There are a few more use cases that we sometimes hear brought up in
the context of stateful node provisioning:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;How can we use Ansible to configure compute nodes?&lt;/li&gt;
&lt;li&gt;How can we configure custom kernels and kernel modules per node?&lt;/li&gt;
&lt;li&gt;Isn't stateless provisioning slower than having the OS deployed on
  disk?&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;If you'd like to hear more about these or other potential corner-cases
for stateless provisioning, &lt;a href="https://ciq.co/contact-us"&gt;get in touch!&lt;/a&gt; We'd love to
hear from you, learn about the work you're doing, and address any of
the challenges you're having.&lt;/p&gt;</description><category>technology</category><category>warewulf</category><guid>urn:uuid:f6dd7a5c-d8ad-43a8-a38c-e982ec2e1637</guid><pubDate>Fri, 21 Oct 2022 06:00:00 GMT</pubDate></item><item><title>The SSH agent</title><link>https://civilfritz.net/anderbubble/ssh-agent/</link><dc:creator>Jonathon Anderson</dc:creator><description>&lt;p&gt;This is one part in a series on OpenSSH client configuration. Also read
&lt;a class="reference external" href="https://civilfritz.net/anderbubble/elegant-openssh-configuration/"&gt;Elegant OpenSSH
Configuration&lt;/a&gt; and &lt;a class="reference external" href="https://civilfritz.net/anderbubble/secure-openssh-defaults/"&gt;Secure
OpenSSH Defaults&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;As part of another SSH client article we potentially generated a new ssh
key for use in ssh public-key authentication.&lt;/p&gt;
&lt;pre class="literal-block"&gt;$ ssh-keygen -t rsa -b 4096 # if you don't already have a key&lt;/pre&gt;
&lt;p&gt;SSH public-key authentication has intrinsic benefits; but many see it as
a mechanism for non-interactive login: you don’t have to remember, or
type, a password.&lt;/p&gt;
&lt;p&gt;This behavior is dependent, however, on having a non-encrypted private
key. This is a security risk, because the non-encrypted private key may
be compromised, either by accidental mishandling of the file or by
unauthorized intrusion into the client system. In almost all cases, ssh
private keys should be encrypted with a passphrase.&lt;/p&gt;
&lt;pre class="literal-block"&gt;$ ssh-keygen -t rsa -b 4096 -f test
Generating public/private rsa key pair.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:&lt;/pre&gt;
&lt;p&gt;If you already have a private key that is not encrypted, use the &lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;-p&lt;/span&gt;&lt;/code&gt;
argument to &lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;ssh-keygen&lt;/span&gt;&lt;/code&gt; to set a passphrase.&lt;/p&gt;
&lt;pre class="literal-block"&gt;$ ssh-keygen -p -f ~/.ssh/id_rsa&lt;/pre&gt;
&lt;p&gt;Now the private key is protected by a passphrase, which you’ll be
prompted for each time you use it. This is better than a password,
because the passphrase is not transmitted to the server; but we’ve lost
the ability to authenticate without having to type anything.&lt;/p&gt;
&lt;section id="ssh-agent"&gt;
&lt;h2&gt;&lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;ssh-agent&lt;/span&gt;&lt;/code&gt;&lt;/h2&gt;
&lt;p&gt;OpenSSH provides a dedicated agent process for the sole purpose of
handling decrypted ssh private keys in-memory. Most Unix and Linux
desktop operating systems (including OS X) start and maintain a per-user
SSH agent process automatically.&lt;/p&gt;
&lt;pre class="literal-block"&gt;$ pgrep -lfu $USER ssh-agent
815 /usr/bin/ssh-agent -l&lt;/pre&gt;
&lt;p&gt;Using the &lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;ssh-add&lt;/span&gt;&lt;/code&gt; command, you can decrypt your ssh private key by
entering your passphrase once, adding the decrypted key to the running
agent.&lt;/p&gt;
&lt;pre class="literal-block"&gt;$ ssh-add ~/.ssh/id_rsa # the path to the private key may be omitted for default paths
Enter passphrase for /Users/user1234/.ssh/id_rsa:
Identity added: /Users/user1234/.ssh/id_rsa (/Users/user1234/.ssh/id_rsa)&lt;/pre&gt;
&lt;p&gt;The decrypted private key remains resident in the &lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;ssh-agent&lt;/span&gt;&lt;/code&gt; process.&lt;/p&gt;
&lt;pre class="literal-block"&gt;$ ssh-add -L
ssh-rsa [redacted] /Users/user1234/.ssh/id_rsa&lt;/pre&gt;
&lt;p&gt;This is better than a non-encrypted on-disk private key for two reasons.
First, the decrypted private key exists only in memory, not on disk. This
makes it more difficult to mishandle; in particular, it cannot
be recovered without re-entering the passphrase once the workstation is
powered off. Second, client applications (like OpenSSH itself) no longer
require direct access to the private key, encrypted or otherwise, nor
must you provide your (secret) key passphrase to client applications:
the agent moderates all use of the key itself.&lt;/p&gt;
&lt;p&gt;The OpenSSH client uses the agent process identified by the
&lt;code class="docutils literal"&gt;SSH_AUTH_SOCK&lt;/code&gt; environment variable by default; but you generally
don’t have to worry about it: your workstation environment should
configure it for you.&lt;/p&gt;
&lt;pre class="literal-block"&gt;$ echo $SSH_AUTH_SOCK
/private/tmp/com.apple.launchd.L311i5Nw5J/Listeners&lt;/pre&gt;
&lt;p&gt;At this point, there’s nothing more to do. With your ssh key added to
the agent process, you’re back to not needing to type in a password (or
passphrase), but without the risk of a non-encrypted private key stored
permanently on disk.&lt;/p&gt;
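&lt;p&gt;If you want to confirm that the agent is reachable and holding a key, the exit status of &lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;ssh-add&lt;/span&gt; &lt;span class="pre"&gt;-l&lt;/span&gt;&lt;/code&gt; is one way to tell. The following is a minimal sketch, not part of OpenSSH: &lt;code class="docutils literal"&gt;describe_agent_status&lt;/code&gt; is a hypothetical helper name, and the messages are illustrative wording for the exit statuses that &lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;ssh-add&lt;/span&gt;&lt;/code&gt; typically returns when listing keys.&lt;/p&gt;

```shell
# Hypothetical helper: describe the exit status of `ssh-add -l`.
describe_agent_status() {
    case "$1" in
        0) echo "agent reachable, at least one key loaded" ;;
        1) echo "agent reachable, but no keys loaded" ;;
        2) echo "agent not reachable (check SSH_AUTH_SOCK)" ;;
        *) echo "unexpected exit status: $1" ;;
    esac
}

# Capture the status without aborting if the command fails.
status=0
ssh-add -l || status=$?
describe_agent_status "${status}"
```

&lt;p&gt;A status of 2 usually means that &lt;code class="docutils literal"&gt;SSH_AUTH_SOCK&lt;/code&gt; is unset or points at a socket that no longer exists.&lt;/p&gt;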
&lt;/section&gt;</description><category>technology</category><guid>https://civilfritz.net/anderbubble/ssh-agent/</guid><pubDate>Sat, 29 Oct 2016 06:00:00 GMT</pubDate></item><item><title>Secure OpenSSH defaults</title><link>https://civilfritz.net/anderbubble/secure-openssh-defaults/</link><dc:creator>Jonathon Anderson</dc:creator><description>&lt;p&gt;This is one part in a series on OpenSSH client configuration. Also read
&lt;a class="reference external" href="https://civilfritz.net/anderbubble/elegant-openssh-configuration/"&gt;Elegant OpenSSH
configuration&lt;/a&gt; and &lt;a class="reference external" href="https://civilfritz.net/anderbubble/ssh-agent/"&gt;The
SSH agent&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;It’s good practice to harden our ssh client with some secure “defaults”.
Starting your configuration file with the following directives will
apply the directives to all (&lt;code class="docutils literal"&gt;*&lt;/code&gt;) hosts.&lt;/p&gt;
&lt;p&gt;(These are listed as multiple &lt;code class="docutils literal"&gt;Host *&lt;/code&gt; stanzas, but they can be
combined into a single stanza in your actual configuration file.)&lt;/p&gt;
&lt;p&gt;If you prefer, follow along with &lt;a class="reference external" href="https://civilfritz.net/listings/secure-openssh-defaults/ssh_config.html"&gt;an example of a complete
~/.ssh/config file&lt;/a&gt;.&lt;/p&gt;
&lt;section id="require-secure-algorithms"&gt;
&lt;h2&gt;Require secure algorithms&lt;/h2&gt;
&lt;p&gt;OpenSSH supports many encryption and authentication algorithms, but some
of those algorithms are known to be weak to cryptographic attack. The
Mozilla project publishes a &lt;a class="reference external" href="https://wiki.mozilla.org/Security/Guidelines/OpenSSH#Modern"&gt;list of recommended
algorithms&lt;/a&gt;
that excludes those known to be insecure.&lt;/p&gt;
&lt;pre class="literal-block"&gt;Host *
HostKeyAlgorithms ssh-ed25519-cert-v01@openssh.com,ssh-rsa-cert-v01@openssh.com,ssh-ed25519,ssh-rsa,ecdsa-sha2-nistp521-cert-v01@openssh.com,ecdsa-sha2-nistp384-cert-v01@openssh.com,ecdsa-sha2-nistp256-cert-v01@openssh.com,ecdsa-sha2-nistp521,ecdsa-sha2-nistp384,ecdsa-sha2-nistp256
Ciphers chacha20-poly1305@openssh.com,aes256-gcm@openssh.com,aes128-gcm@openssh.com,aes256-ctr,aes192-ctr,aes128-ctr
MACs hmac-sha2-512-etm@openssh.com,hmac-sha2-256-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-512,hmac-sha2-256,umac-128@openssh.com
KexAlgorithms curve25519-sha256@libssh.org,diffie-hellman-group-exchange-sha256,diffie-hellman-group-exchange-sha1&lt;/pre&gt;
&lt;p&gt;(More information on the available encryption and authentication
algorithms, and how a recommended set is derived, is available in &lt;a class="reference external" href="https://stribika.github.io/2015/01/04/secure-secure-shell.html"&gt;this
fantastic blog post, “Secure secure
shell.”&lt;/a&gt;)&lt;/p&gt;
&lt;/section&gt;
&lt;section id="hash-your-known-hosts-file"&gt;
&lt;h2&gt;Hash your &lt;code class="docutils literal"&gt;known_hosts&lt;/code&gt; file&lt;/h2&gt;
&lt;p&gt;Every time you connect to an SSH server, your client caches a copy of
the remote server’s host key in a &lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;~/.ssh/known_hosts&lt;/span&gt;&lt;/code&gt; file. If your
ssh client is ever compromised, this list can expose the remote servers
to attack using your compromised credentials. Be a good citizen and hash
your known hosts file.&lt;/p&gt;
&lt;pre class="literal-block"&gt;Host *
HashKnownHosts yes&lt;/pre&gt;
&lt;p&gt;(Hash any existing entries in your &lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;~/.ssh/known_hosts&lt;/span&gt;&lt;/code&gt; file by
running &lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;ssh-keygen&lt;/span&gt; &lt;span class="pre"&gt;-H&lt;/span&gt;&lt;/code&gt;. Don’t forget to remove the backup
&lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;~/.ssh/known_hosts.old&lt;/span&gt;&lt;/code&gt;.)&lt;/p&gt;
&lt;pre class="literal-block"&gt;$ ssh-keygen -H
$ rm -i ~/.ssh/known_hosts.old&lt;/pre&gt;
&lt;/section&gt;
&lt;section id="no-roaming"&gt;
&lt;h2&gt;No roaming&lt;/h2&gt;
&lt;p&gt;Finally, &lt;a class="reference external" href="http://www.openssh.com/txt/release-7.1p2"&gt;disable the experimental “roaming”
feature&lt;/a&gt; to mitigate
exposure to a pair of potential vulnerabilities,
&lt;a class="reference external" href="https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2016-0777"&gt;CVE-2016-0777&lt;/a&gt;
and
&lt;a class="reference external" href="https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2016-0778"&gt;CVE-2016-0778&lt;/a&gt;.&lt;/p&gt;
&lt;pre class="literal-block"&gt;Host *
UseRoaming no&lt;/pre&gt;
&lt;/section&gt;
&lt;section id="dealing-with-insecure-servers"&gt;
&lt;h2&gt;Dealing with insecure servers&lt;/h2&gt;
&lt;p&gt;Some servers are old enough that they may not support the newer, more
secure algorithms listed. In the RC environment, for example, the login
and other Internet-accessible systems provide relatively modern ssh
algorithms; but the hosts in the &lt;code class="docutils literal"&gt;rc.int.colorado.edu&lt;/code&gt; domain may not.&lt;/p&gt;
&lt;p&gt;To support connection to older hosts while requiring newer algorithms by
default, override these settings earlier in the configuration file: for
each directive, ssh uses the first value it obtains.&lt;/p&gt;
&lt;pre class="literal-block"&gt;# Internal RC hosts are running an old version of OpenSSH
Match host=*.rc.int.colorado.edu
MACs hmac-sha1,umac-64@openssh.com,hmac-ripemd160,hmac-ripemd160@openssh.com,hmac-sha1-96&lt;/pre&gt;
&lt;/section&gt;</description><category>technology</category><guid>https://civilfritz.net/anderbubble/secure-openssh-defaults/</guid><pubDate>Tue, 25 Oct 2016 06:00:00 GMT</pubDate></item><item><title>Elegant OpenSSH configuration</title><link>https://civilfritz.net/anderbubble/elegant-openssh-configuration/</link><dc:creator>Jonathon Anderson</dc:creator><description>&lt;p&gt;This is one part in a series on OpenSSH client configuration. Also read
&lt;a class="reference external" href="https://civilfritz.net/anderbubble/secure-openssh-defaults/"&gt;Secure OpenSSH defaults&lt;/a&gt; and
&lt;a class="reference external" href="https://civilfritz.net/anderbubble/ssh-agent/"&gt;The SSH agent&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;The OpenSSH client is very robust, very flexible, and very
configurable. Many times I see people struggling to remember
server-specific ssh flags or arcane, manual multi-hop procedures. I even
see entire scripts written to automate the process.&lt;/p&gt;
&lt;p&gt;But the vast majority of what you might want ssh to do can be abstracted
away with some configuration in your &lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;~/.ssh/config&lt;/span&gt;&lt;/code&gt; file.&lt;/p&gt;
&lt;p&gt;All (or, at least, most) of these configuration directives are fully
documented in &lt;a class="reference external" href="http://man.openbsd.org/ssh_config"&gt;the ssh_config manpage&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;If you prefer, follow along with &lt;a class="reference external" href="https://civilfritz.net/listings/elegant-openssh-configuration/ssh_config.html"&gt;an example of a complete
~/.ssh/config file&lt;/a&gt;.&lt;/p&gt;
&lt;section id="hostname"&gt;
&lt;h2&gt;HostName&lt;/h2&gt;
&lt;p&gt;One of the first annoyances people have–and one of the first things
people try to fix–when using a command-line ssh client is having to type
in long hostnames. For example, the Research Computing login service is
available at &lt;code class="docutils literal"&gt;login.rc.colorado.edu&lt;/code&gt;.&lt;/p&gt;
&lt;pre class="literal-block"&gt;$ ssh login.rc.colorado.edu&lt;/pre&gt;
&lt;p&gt;This particular name isn’t too bad; but coupled with usernames and
especially when used as part of an &lt;code class="docutils literal"&gt;scp&lt;/code&gt;, these fully-qualified domain
names can become cumbersome.&lt;/p&gt;
&lt;pre class="literal-block"&gt;$ scp -r /path/to/src/ user1234@login.rc.colorado.edu:dest/&lt;/pre&gt;
&lt;p&gt;OpenSSH supports host aliases through pattern-matching in &lt;code class="docutils literal"&gt;Host&lt;/code&gt;
directives.&lt;/p&gt;
&lt;pre class="literal-block"&gt;Host login*.rc
HostName %h.colorado.edu

Host *.rc
HostName %h.int.colorado.edu&lt;/pre&gt;
&lt;p&gt;In this example, &lt;code class="docutils literal"&gt;%h&lt;/code&gt; is substituted with the name specified on the
command-line. With a configuration like this in place, connections to
&lt;code class="docutils literal"&gt;login.rc&lt;/code&gt; are directed to the full name &lt;code class="docutils literal"&gt;login.rc.colorado.edu&lt;/code&gt;.&lt;/p&gt;
&lt;pre class="literal-block"&gt;$ scp -r /path/to/src/ user1234@login.rc:dest/&lt;/pre&gt;
&lt;p&gt;Failing that, other references to hosts with a &lt;code class="docutils literal"&gt;.rc&lt;/code&gt; suffix are
directed to the &lt;em&gt;internal&lt;/em&gt; Research Computing domain. (We’ll use these
later.)&lt;/p&gt;
&lt;p&gt;(The &lt;code class="docutils literal"&gt;.rc&lt;/code&gt; domain segment could be moved from the &lt;code class="docutils literal"&gt;Host&lt;/code&gt; pattern to
the &lt;code class="docutils literal"&gt;HostName&lt;/code&gt; value; but leaving it in the alias helps to distinguish
the Research Computing login nodes from other login nodes that you may
have access to. You can use arbitrary aliases in the &lt;code class="docutils literal"&gt;Host&lt;/code&gt; directive,
too; but then the &lt;code class="docutils literal"&gt;%h&lt;/code&gt; substitution isn’t useful: you have to
enumerate each targeted host.)&lt;/p&gt;
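&lt;p&gt;The order of the two &lt;code class="docutils literal"&gt;Host&lt;/code&gt; stanzas matters: for each directive, ssh uses the first value it obtains, so &lt;code class="docutils literal"&gt;login.rc&lt;/code&gt; matches both patterns but takes its &lt;code class="docutils literal"&gt;HostName&lt;/code&gt; from the first. The shell function below only emulates that matching logic for illustration; it is not how OpenSSH is implemented. &lt;code class="docutils literal"&gt;resolve_hostname&lt;/code&gt; is a hypothetical name, and &lt;code class="docutils literal"&gt;node0101&lt;/code&gt; is a made-up host.&lt;/p&gt;

```shell
# Emulate the two Host stanzas above: the first matching pattern wins.
resolve_hostname() {
    case "$1" in
        login*.rc) echo "$1.colorado.edu" ;;      # Host login*.rc
        *.rc)      echo "$1.int.colorado.edu" ;;  # Host *.rc
        *)         echo "$1" ;;                   # no alias applies
    esac
}

resolve_hostname login.rc      # prints login.rc.colorado.edu
resolve_hostname node0101.rc   # prints node0101.rc.int.colorado.edu
```

&lt;p&gt;On OpenSSH versions that support the &lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;-G&lt;/span&gt;&lt;/code&gt; flag, &lt;code class="docutils literal"&gt;ssh &lt;span class="pre"&gt;-G&lt;/span&gt; login.rc&lt;/code&gt; prints the configuration that ssh would actually use, including the resolved hostname.&lt;/p&gt;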
&lt;/section&gt;
&lt;section id="user"&gt;
&lt;h2&gt;User&lt;/h2&gt;
&lt;p&gt;Unless you happen to use the same username on your local workstation as
you have on the remote server, you likely specify a username using
either the &lt;code class="docutils literal"&gt;@&lt;/code&gt; syntax or &lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;-l&lt;/span&gt;&lt;/code&gt; argument to the &lt;code class="docutils literal"&gt;ssh&lt;/code&gt; command.&lt;/p&gt;
&lt;pre class="literal-block"&gt;$ ssh user1234@login.rc&lt;/pre&gt;
&lt;p&gt;As with specifying a fully-qualified domain name, tracking and
specifying a different username for each remote host can become
burdensome, especially during an &lt;code class="docutils literal"&gt;scp&lt;/code&gt; operation. Record the correct
username in your &lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;~/.ssh/config&lt;/span&gt;&lt;/code&gt; file instead.&lt;/p&gt;
&lt;pre class="literal-block"&gt;Match host=*.rc.colorado.edu,*.rc.int.colorado.edu
User user1234&lt;/pre&gt;
&lt;p&gt;Now all connections to Research Computing hosts use the specified
username by default, without it having to be specified on the
command-line.&lt;/p&gt;
&lt;pre class="literal-block"&gt;$ scp -r /path/to/src/ login.rc:dest/&lt;/pre&gt;
&lt;p&gt;Note that we’re using a &lt;code class="docutils literal"&gt;Match&lt;/code&gt; directive here, rather than a &lt;code class="docutils literal"&gt;Host&lt;/code&gt;
directive. The &lt;code class="docutils literal"&gt;host=&lt;/code&gt; argument to &lt;code class="docutils literal"&gt;Match&lt;/code&gt; matches against the
&lt;em&gt;derived&lt;/em&gt; hostname, so it reflects the real hostname as determined using
the previous &lt;code class="docutils literal"&gt;Host&lt;/code&gt; directives. (Make sure the correct &lt;code class="docutils literal"&gt;HostName&lt;/code&gt; is
established earlier in the configuration, though.)&lt;/p&gt;
&lt;/section&gt;
&lt;section id="controlmaster"&gt;
&lt;h2&gt;ControlMaster&lt;/h2&gt;
&lt;p&gt;Even if the actual command is simple to type, authenticating to the host
may require manual intervention. The Research Computing login nodes,
for example, require two-factor authentication using a password or pin
coupled with a one-time VASCO password or Duo credential. If you want to
open multiple connections–or, again, copy files using &lt;code class="docutils literal"&gt;scp&lt;/code&gt;–having to
authenticate with multiple factors quickly becomes tedious. (Even having
to type in a password at all may be unnecessary; but we’ll assume, as is
the case with the Research Computing login example, that you can’t use
public-key authentication.)&lt;/p&gt;
&lt;p&gt;OpenSSH supports sharing a single network connection for multiple ssh
sessions.&lt;/p&gt;
&lt;pre class="literal-block"&gt;Match host=login.rc.colorado.edu
ControlMaster auto
ControlPath ~/.ssh/.socket_%h_%p_%r
ControlPersist 4h&lt;/pre&gt;
&lt;p&gt;With &lt;code class="docutils literal"&gt;ControlMaster&lt;/code&gt; and &lt;code class="docutils literal"&gt;ControlPath&lt;/code&gt; defined, the first ssh
connection authenticates and establishes a session normally; but future
connections join the active connection, bypassing the need to
re-authenticate. The optional &lt;code class="docutils literal"&gt;ControlPersist&lt;/code&gt; directive keeps this
connection active for a period of time even after the last
session has been closed.&lt;/p&gt;
&lt;pre class="literal-block"&gt;$ ssh login.rc
user1234@login.rc.colorado.edu's password:
[user1234@login01 ~]$ logout

$ ssh login.rc
[user1234@login01 ~]$&lt;/pre&gt;
&lt;p&gt;(Note that many arguments to the &lt;code class="docutils literal"&gt;ssh&lt;/code&gt; command are effectively ignored
after the initial connection is established. Notably, if X11 was not
forwarded with &lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;-X&lt;/span&gt;&lt;/code&gt; or &lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;-Y&lt;/span&gt;&lt;/code&gt; during the first session, you cannot use
the shared connection to forward X11 in a later session. In this case,
use the &lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;-S&lt;/span&gt; none&lt;/code&gt; argument to &lt;code class="docutils literal"&gt;ssh&lt;/code&gt; to ignore the existing
connection and explicitly establish a new connection.)&lt;/p&gt;
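&lt;p&gt;To make the &lt;code class="docutils literal"&gt;ControlPath&lt;/code&gt; tokens a little more concrete, here’s a minimal Python sketch of how the socket path might expand. The &lt;code class="docutils literal"&gt;expand_control_path&lt;/code&gt; helper is made up for illustration and only handles the three tokens used above (&lt;code class="docutils literal"&gt;%h&lt;/code&gt; for the remote hostname, &lt;code class="docutils literal"&gt;%p&lt;/code&gt; for the port, and &lt;code class="docutils literal"&gt;%r&lt;/code&gt; for the remote username); OpenSSH does the real expansion itself and supports more tokens than this.&lt;/p&gt;
&lt;pre class="literal-block"&gt;def expand_control_path(template, host, port, remote_user):
    """Expand the %h, %p and %r tokens; the leading ~ is left for the shell or ssh."""
    return (
        template.replace("%h", host)
        .replace("%p", str(port))
        .replace("%r", remote_user)
    )


print(expand_control_path(
    "~/.ssh/.socket_%h_%p_%r", "login.rc.colorado.edu", 22, "user1234"
))&lt;/pre&gt;
&lt;p&gt;Because the host, port, and user are all part of the path, each distinct combination gets its own socket, so a shared connection is only reused for a matching destination.&lt;/p&gt;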
&lt;/section&gt;
&lt;section id="proxycommand"&gt;
&lt;h2&gt;ProxyCommand&lt;/h2&gt;
&lt;p&gt;But what if you want to get to a host that &lt;em&gt;isn’t&lt;/em&gt; directly available
from your local workstation? The hosts in the &lt;code class="docutils literal"&gt;rc.int.colorado.edu&lt;/code&gt;
domain referenced above may be accessible from a local network
connection; but if you are connecting from elsewhere on the Internet,
you won’t be able to access them directly.&lt;/p&gt;
&lt;p&gt;Except that OpenSSH provides the &lt;code class="docutils literal"&gt;ProxyCommand&lt;/code&gt; option which, when
coupled with the OpenSSH client presumed to be available on the
intermediate server, supports arbitrary proxy connections through to
remotely-accessible servers.&lt;/p&gt;
&lt;pre class="literal-block"&gt;Match host=*.rc.int.colorado.edu
ProxyCommand ssh -W %h:%p login.rc.colorado.edu&lt;/pre&gt;
&lt;p&gt;Even though you can’t connect directly to Janus compute nodes from the
Internet, for example, you &lt;em&gt;can&lt;/em&gt; connect to them from a Research
Computing login node; so this &lt;code class="docutils literal"&gt;ProxyCommand&lt;/code&gt; configuration allows
transparent access to hosts in the internal Research Computing domain.&lt;/p&gt;
&lt;pre class="literal-block"&gt;$ ssh janus-compile1.rc
[user1234@janus-compile1 ~]$&lt;/pre&gt;
&lt;p&gt;And it even works with &lt;code class="docutils literal"&gt;scp&lt;/code&gt;.&lt;/p&gt;
&lt;pre class="literal-block"&gt;$ echo 'Hello, world!' &amp;gt;/tmp/hello.txt
$ scp /tmp/hello.txt janus-compile1.rc:/tmp
hello.txt                                     100%   14     0.0KB/s   00:00

$ ssh janus-compile1.rc cat /tmp/hello.txt
Hello, world!&lt;/pre&gt;
&lt;/section&gt;
&lt;section id="public-key-authentication"&gt;
&lt;h2&gt;Public-key authentication&lt;/h2&gt;
&lt;p&gt;If you tried the example above, chances are that you were met with an
unexpected password prompt that didn’t accept any password that you
used. That’s because most internal Research Computing hosts don’t
actually support interactive authentication, two-factor or otherwise.
Connections from a CURC login node are authorized by the login node; but
a proxied connection must authenticate from your local client.&lt;/p&gt;
&lt;p&gt;The best way to authenticate your local workstation to an internal CURC
host is using public-key authentication.&lt;/p&gt;
&lt;p&gt;If you don’t already have an SSH key, generate one now.&lt;/p&gt;
&lt;pre class="literal-block"&gt;$ ssh-keygen -t rsa -b 4096 # if you don't already have a key&lt;/pre&gt;
&lt;p&gt;Now we have to copy the (new?) public key to the remote CURC
&lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;~/.ssh/authorized_keys&lt;/span&gt;&lt;/code&gt; file. RC provides a global home directory, so
copying to any login node will do. Targeting a specific login node is
useful, though: the &lt;code class="docutils literal"&gt;ControlMaster&lt;/code&gt; configuration for
&lt;code class="docutils literal"&gt;login.rc.colorado.edu&lt;/code&gt; tends to confuse &lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;ssh-copy-id&lt;/span&gt;&lt;/code&gt;.&lt;/p&gt;
&lt;pre class="literal-block"&gt;$ ssh-copy-id login01.rc&lt;/pre&gt;
&lt;p&gt;(The &lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;ssh-copy-id&lt;/span&gt;&lt;/code&gt; command doesn’t come with OS X, but there’s a
third-party port available &lt;a class="reference external" href="https://github.com/beautifulcode/ssh-copy-id-for-OSX"&gt;on
GitHub&lt;/a&gt;. It’s
usually available on a Linux system, too. Alternatively, you can just
edit &lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;~/.ssh/authorized_keys&lt;/span&gt;&lt;/code&gt; manually.)&lt;/p&gt;
&lt;/section&gt;</description><category>technology</category><guid>https://civilfritz.net/anderbubble/elegant-openssh-configuration/</guid><pubDate>Mon, 24 Oct 2016 06:00:00 GMT</pubDate></item><item><title>User-selectable authentication methods using pam_authtok</title><link>https://civilfritz.net/curc/user-selectable-authentication-methods-using-pam-authtok/</link><dc:creator>Jonathon Anderson</dc:creator><description>&lt;p&gt;Research Computing is in the process of migrating and expanding our
authentication system to support additional authentication methods.
Historically we’ve supported &lt;a class="reference external" href="https://www.vasco.com/products/management-platforms/"&gt;VASCO
IDENTIKEY&lt;/a&gt;
time-based one-time-password and pin to provide two-factor
authentication.&lt;/p&gt;
&lt;pre class="literal-block"&gt;$ ssh user1234@login.rc.colorado.edu
user1234@login.rc.colorado.edu's password: &amp;lt;pin&amp;gt;&amp;lt;otp&amp;gt;

[user1234@login04 ~]$&lt;/pre&gt;
&lt;p&gt;But the VASCO tokens are expensive, get lost or left at home, have a
battery that runs out, and have an internal clock that sometimes falls
out-of-sync with the rest of the authentication system. For these and
other reasons we’re provisioning most new accounts with
&lt;a class="reference external" href="http://www.duosecurity.com"&gt;Duo&lt;/a&gt;, which provides iOS and Android
apps but also supports SMS and voice calls.&lt;/p&gt;
&lt;p&gt;Unlike VASCO, Duo is only a single authentication factor; so we’ve also
added support for upstream CU-Boulder campus password authentication to
be used in tandem.&lt;/p&gt;
&lt;p&gt;This means that we have to support both authentication mechanisms–VASCO
and password+Duo–simultaneously. A naïve implementation might just stack
these methods together.&lt;/p&gt;
&lt;pre class="literal-block"&gt;auth sufficient pam_radius_auth.so try_first_pass # VASCO authenticates over RADIUS
auth requisite  pam_krb5.so try_first_pass # CU-Boulder campus password
auth required   pam_duo.so&lt;/pre&gt;
&lt;p&gt;This generally works: VASCO authentication is attempted first over
RADIUS. If that fails, authentication is attempted against the campus
password and, if that succeeds, against Duo.&lt;/p&gt;
&lt;p&gt;Unfortunately, this generates spurious authentication failures in VASCO
when using Duo to authenticate: the VASCO method fails, &lt;em&gt;then&lt;/em&gt; Duo
authentication is attempted. Users who have &lt;em&gt;both&lt;/em&gt; VASCO &lt;em&gt;and&lt;/em&gt; Duo
accounts (e.g., all administrators) may generate enough failures to
trigger the break-in mitigation security system, and the VASCO account
may be disabled. This same issue exists if we reverse the authentication
order to try Duo first, &lt;em&gt;then&lt;/em&gt; VASCO: VASCO users might then cause their
campus passwords to become disabled.&lt;/p&gt;
&lt;p&gt;Instead, we need to enable users to explicitly specify which
authentication method they’re using.&lt;/p&gt;
&lt;section id="separate-sssd-domains"&gt;
&lt;h2&gt;Separate sssd domains&lt;/h2&gt;
&lt;p&gt;Our first attempt to provide explicit access to different authentication
methods was to provide multiple redundant
&lt;a class="reference external" href="https://fedorahosted.org/sssd/"&gt;sssd&lt;/a&gt; domains.&lt;/p&gt;
&lt;pre class="literal-block"&gt;[domain/rc]
description = Research Computing
proxy_pam_target = curc-twofactor-vasco


[domain/duo]
description = Research Computing (identikey+duo authentication)
enumerate = false
proxy_pam_target = curc-twofactor-duo&lt;/pre&gt;
&lt;p&gt;This allows users to log in normally using VASCO, while password+Duo
authentication can be requested explicitly by logging in as
&lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;${user}@duo&lt;/span&gt;&lt;/code&gt;.&lt;/p&gt;
&lt;pre class="literal-block"&gt;$ ssh -l user1234@duo login.rc.colorado.edu&lt;/pre&gt;
&lt;p&gt;This works well enough for the common case of shell access over SSH:
login is permitted and, since the default &lt;code class="docutils literal"&gt;rc&lt;/code&gt; domain and the
&lt;code class="docutils literal"&gt;duo&lt;/code&gt; alias domain are both backed by the same LDAP directory, NSS
sees no important difference once a user is logged in using either
method.&lt;/p&gt;
&lt;p&gt;This works because POSIX systems store the uid number returned by
&lt;a class="reference external" href="http://www.linux-pam.org"&gt;PAM&lt;/a&gt; and
&lt;a class="reference external" href="https://en.wikipedia.org/wiki/Name_Service_Switch"&gt;NSS&lt;/a&gt;, and
generally resolve the uid number to the username on-demand. Not all
systems work this way, however. For example, when we attempted to use
this authentication mechanism to authenticate to our prototype
&lt;a class="reference external" href="https://jupyterhub.readthedocs.io/"&gt;JupyterHub&lt;/a&gt; (web) service, jobs
dispatched to &lt;a class="reference external" href="http://slurm.schedmd.com"&gt;Slurm&lt;/a&gt; retained the
&lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;${user}@duo&lt;/span&gt;&lt;/code&gt; username format. Slurm &lt;em&gt;also&lt;/em&gt; uses usernames internally,
and the &lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;${user}@duo&lt;/span&gt;&lt;/code&gt; username is not populated within Slurm: only the
base &lt;code class="docutils literal"&gt;${user}&lt;/code&gt; username.&lt;/p&gt;
&lt;p&gt;Expecting that we would continue to find more unexpected side-effects of
this implementation, we started to look for an alternative mechanism
that doesn’t modify the specified username.&lt;/p&gt;
&lt;/section&gt;
&lt;section id="pam-authtok"&gt;
&lt;h2&gt;pam_authtok&lt;/h2&gt;
&lt;p&gt;In general, a user provides two pieces of information during
authentication: a username (which we’ve already determined we shouldn’t
modify) and an authentication token or password. We &lt;em&gt;should&lt;/em&gt; be able to
detect, for example, a prefix to that authentication token to determine
what authentication method to use.&lt;/p&gt;
&lt;pre class="literal-block"&gt;$ ssh user1234@login.rc.colorado.edu
user1234@login.rc.colorado.edu's password: duo:&amp;lt;password&amp;gt;

[user1234@login04 ~]$&lt;/pre&gt;
&lt;p&gt;But we found no such pam module that would allow us to manipulate the
authentication token… so &lt;a class="reference external" href="https://github.com/ResearchComputing/pam_authtok"&gt;we wrote
one&lt;/a&gt;.&lt;/p&gt;
&lt;pre class="literal-block"&gt;auth [success=1 default=ignore] pam_authtok.so prefix=duo: strip prompt=password:

auth [success=done new_authtok_reqd=done default=die] pam_radius_auth.so try_first_pass

auth requisite pam_krb5.so try_first_pass
auth [success=done new_authtok_reqd=done default=die] pam_duo.so&lt;/pre&gt;
&lt;p&gt;Now our PAM stack authenticates against VASCO by default; but, if the
user provides a password with a &lt;code class="docutils literal"&gt;duo:&lt;/code&gt; prefix, authentication skips
VASCO and authenticates the supplied password, followed by Duo push. Our
actual production PAM stack is a bit more complicated, supporting a
redundant &lt;code class="docutils literal"&gt;vasco:&lt;/code&gt; prefix as well, for forward-compatibility should we
change the default authentication mechanism in the future. We can also
extend this mechanism to add arbitrary additional authentication
mechanisms in the future.&lt;/p&gt;
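&lt;p&gt;The decision that the stack makes can be sketched in a few lines of Python. This is only an illustration of the prefix logic, not the module’s actual code; the &lt;code class="docutils literal"&gt;select_auth_method&lt;/code&gt; function and the method labels it returns are hypothetical.&lt;/p&gt;
&lt;pre class="literal-block"&gt;def select_auth_method(authtok, prefix="duo:"):
    """Return (method, token) for a supplied authentication token.

    A token that starts with the prefix selects password+Duo and has the
    prefix stripped; anything else falls through to the default (VASCO).
    """
    if authtok.startswith(prefix):
        return "password+duo", authtok[len(prefix):]
    return "vasco", authtok


print(select_auth_method("duo:example-password"))
print(select_auth_method("1234567890"))&lt;/pre&gt;
&lt;p&gt;The username never changes in either branch, which is the property we were missing with the separate sssd domains.&lt;/p&gt;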
&lt;/section&gt;</description><category>hpc</category><category>technology</category><guid>https://civilfritz.net/curc/user-selectable-authentication-methods-using-pam-authtok/</guid><pubDate>Mon, 16 May 2016 11:37:00 GMT</pubDate></item><item><title>Two software design methods</title><link>https://civilfritz.net/anderbubble/two-ways-software-design/</link><dc:creator>Jonathon Anderson</dc:creator><description>&lt;blockquote&gt;
&lt;p&gt;There are two ways of constructing a software design: One way is to
make it so simple that there are obviously no deficiencies and the
other way is to make it so complicated that there are no obvious
deficiencies.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;p&gt;–C.A.R. Hoare, The 1980 ACM Turing Award Lecture&lt;/p&gt;</description><category>philosophy</category><category>technology</category><guid>https://civilfritz.net/anderbubble/two-ways-software-design/</guid><pubDate>Sun, 13 Mar 2016 02:07:49 GMT</pubDate></item><item><title>Why hasn’t my (Slurm) job started?</title><link>https://civilfritz.net/curc/why-job-not-starting/</link><dc:creator>Jonathon Anderson</dc:creator><description>&lt;p&gt;A job can be blocked from being scheduled for the following reasons:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;There are insufficient resources available to start the job, either
due to active reservations, other running jobs, component status, or
system/partition size.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Other higher-priority jobs are waiting to run, and the job’s time
limit prevents it from being backfilled.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The job’s time limit exceeds an upcoming reservation (e.g., scheduled
preventative maintenance).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The job is associated with an account that has reached or exceeded
its &lt;code class="docutils literal"&gt;GrpCPUMins&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;Display a list of queued jobs sorted in the order considered by the
scheduler using &lt;code class="docutils literal"&gt;squeue&lt;/code&gt;.&lt;/p&gt;
&lt;pre class="literal-block"&gt;squeue --sort=-p,i --priority --format '%7T %7A %10a %5D %.12L %10P %10S %20r'&lt;/pre&gt;
&lt;section id="reason-codes"&gt;
&lt;h2&gt;Reason codes&lt;/h2&gt;
&lt;p&gt;A list of reason codes &lt;a class="footnote-reference brackets" href="https://civilfritz.net/curc/why-job-not-starting/#footnote-1" id="footnote-reference-1" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;1&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; is available as part of the &lt;code class="docutils literal"&gt;squeue&lt;/code&gt;
manpage. &lt;a class="footnote-reference brackets" href="https://civilfritz.net/curc/why-job-not-starting/#footnote-2" id="footnote-reference-2" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;2&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Common reason codes:&lt;/p&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;ReqNodeNotAvail&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;AssocGrpJobsLimit&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;AssocGrpCPUMinsLimit&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Resources&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;QOSResourceLimit&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Priority&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;AssociationJobLimit&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;JobHeldAdmin&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;
&lt;section id="how-are-jobs-prioritized"&gt;
&lt;h2&gt;How are jobs prioritized?&lt;/h2&gt;
&lt;pre class="literal-block"&gt;PriorityType=priority/multifactor&lt;/pre&gt;
&lt;p&gt;Slurm prioritizes jobs using the multifactor plugin &lt;a class="footnote-reference brackets" href="https://civilfritz.net/curc/why-job-not-starting/#footnote-3" id="footnote-reference-3" role="doc-noteref"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;3&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/a&gt; based on a
weighted summation of age, size, QOS, and fair-share factors.&lt;/p&gt;
&lt;p&gt;Use the &lt;code class="docutils literal"&gt;sprio&lt;/code&gt; command to inspect each weighted priority value
separately.&lt;/p&gt;
&lt;pre class="literal-block"&gt;sprio [-j jobid]&lt;/pre&gt;
&lt;section id="age-factor"&gt;
&lt;h3&gt;Age Factor&lt;/h3&gt;
&lt;pre class="literal-block"&gt;PriorityWeightAge=1000
PriorityMaxAge=14-0&lt;/pre&gt;
&lt;p&gt;The age factor represents the length of time a job has been sitting in
the queue and eligible to run. In general, the longer a job waits in the
queue, the larger its age factor grows. However, the age factor for a
dependent job will not change while it waits for the job it depends on
to complete. Also, the age factor will not change when scheduling is
withheld for a job whose node or time limits exceed the cluster’s
current limits.&lt;/p&gt;
&lt;p&gt;The weighted age priority is calculated as
PriorityWeightAge[1000]*[0..1] as the job age approaches
PriorityMaxAge[14-0], or 14 days. As such, an hour of wait-time is
equivalent to ~2.976 priority.&lt;/p&gt;
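&lt;p&gt;As a rough check of that arithmetic, here’s a minimal Python sketch. The &lt;code class="docutils literal"&gt;weighted_age_priority&lt;/code&gt; function is made up for illustration and is not part of Slurm; it assumes that the age factor grows linearly from 0 to 1 over PriorityMaxAge and stays capped after that.&lt;/p&gt;
&lt;pre class="literal-block"&gt;PRIORITY_WEIGHT_AGE = 1000        # PriorityWeightAge from the configuration above
PRIORITY_MAX_AGE_HOURS = 14 * 24  # PriorityMaxAge=14-0, expressed in hours


def weighted_age_priority(hours_waiting):
    """Weighted age priority, assuming linear growth capped at PriorityMaxAge."""
    age_factor = min(hours_waiting / PRIORITY_MAX_AGE_HOURS, 1.0)
    return PRIORITY_WEIGHT_AGE * age_factor


print(round(weighted_age_priority(1), 3))  # one hour of waiting: 2.976
print(weighted_age_priority(7 * 24))       # one week: 500.0
print(weighted_age_priority(30 * 24))      # past the cap: 1000.0&lt;/pre&gt;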
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="job-size-factor"&gt;
&lt;h2&gt;Job Size Factor&lt;/h2&gt;
&lt;pre class="literal-block"&gt;PriorityWeightJobSize=2000&lt;/pre&gt;
&lt;p&gt;The job size factor correlates to the number of nodes or CPUs the job
has requested. The weighted job size priority is calculated as
PriorityWeightJobSize[2000]*[0..1] as the job size approaches the entire
size of the system. A job that requests all the nodes on the machine
will get a job size factor of 1.0, with an effective weighted job size
priority of 28 wait-days (except that job age priority is capped at 14
days).&lt;/p&gt;
&lt;/section&gt;
&lt;section id="quality-of-service-qos-factor"&gt;
&lt;h2&gt;Quality of Service (QOS) Factor&lt;/h2&gt;
&lt;pre class="literal-block"&gt;PriorityWeightQOS=1500&lt;/pre&gt;
&lt;p&gt;Each QOS can be assigned a priority: the larger the number, the greater
the job priority will be for jobs that request this QOS. This priority
value is then normalized to the highest priority of all the QOS’s to
become the QOS factor. As such, the weighted QOS priority is calculated
as PriorityWeightQOS[1500]*QosPriority[0..1000]/MAX(QOSPriority[1000]).&lt;/p&gt;
&lt;pre class="literal-block"&gt;QOS          Priority  Weighted priority  Wait-days equivalent
-----------  --------  -----------------  --------------------
admin            1000               1500                  21.0
janus               0                  0                   0.0
janus-debug       400                600                   8.4
janus-long        200                300                   4.2&lt;/pre&gt;
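&lt;p&gt;The table can be reproduced with a short Python sketch. The function names are hypothetical; the weights and QOS priorities are the ones listed above, and the wait-days column assumes the age settings from earlier (1000 priority over 14 days).&lt;/p&gt;
&lt;pre class="literal-block"&gt;PRIORITY_WEIGHT_QOS = 1500  # PriorityWeightQOS
PRIORITY_WEIGHT_AGE = 1000  # PriorityWeightAge
PRIORITY_MAX_AGE_DAYS = 14  # PriorityMaxAge=14-0

QOS_PRIORITY = {"admin": 1000, "janus": 0, "janus-debug": 400, "janus-long": 200}


def weighted_qos_priority(qos):
    """Normalize the QOS priority to the largest configured value, then weight it."""
    return PRIORITY_WEIGHT_QOS * QOS_PRIORITY[qos] / max(QOS_PRIORITY.values())


def wait_days_equivalent(priority):
    """Days of queue wait that would earn the same weighted age priority."""
    return priority / PRIORITY_WEIGHT_AGE * PRIORITY_MAX_AGE_DAYS


for name in QOS_PRIORITY:
    weighted = weighted_qos_priority(name)
    print(f"{name:12} {weighted:6.0f} {wait_days_equivalent(weighted):5.1f}")&lt;/pre&gt;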
&lt;/section&gt;
&lt;section id="fair-share-factor"&gt;
&lt;h2&gt;Fair-share factor&lt;/h2&gt;
&lt;pre class="literal-block"&gt;PriorityWeightFairshare=2000
PriorityDecayHalfLife=14-0&lt;/pre&gt;
&lt;p&gt;The fair-share factor serves to prioritize queued jobs such that those
jobs charging accounts that are under-serviced are scheduled first,
while jobs charging accounts that are over-serviced are scheduled when
the machine would otherwise go idle.&lt;/p&gt;
&lt;p&gt;The simplified formula for calculating the fair-share factor for usage
that spans multiple time periods and subject to a half-life decay is:&lt;/p&gt;
&lt;pre class="literal-block"&gt;F = 2**(-NormalizedUsage/NormalizedShares)&lt;/pre&gt;
&lt;p&gt;Each account is granted an equal share, and historic records of use
decay with a half-life of 14 days. As such, the weighted fair-share
priority is calculated as PriorityWeightFairshare[2000]*[0..1] depending
on the account’s historic use of the system relative to its allocated
share.&lt;/p&gt;
&lt;p&gt;A fair-share factor of 0.5 indicates that the account’s jobs have used
exactly the portion of the machine that they have been allocated and
assigns the job an additional 1000 priority (the equivalent of 336
wait-hours, or 14 wait-days). A fair-share factor above 0.5 indicates that the
account’s jobs have consumed less than their allocated share and assigns
the job up to 2000 additional priority, for an effective relative 14
wait-day priority boost. A fair-share factor below 0.5 indicates that
the account’s jobs have consumed more than their allocated share of the
computing resources, and the added priority will approach 0 dependent on
the account’s history relative to its equal share of the system, for an
effective relative 14-day priority penalty.&lt;/p&gt;
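&lt;p&gt;To see how the simplified formula behaves, here’s a small Python sketch. The function names and the example account (a 10% share of the machine) are hypothetical, and the half-life decay of historic usage is assumed to have been applied already when computing the normalized usage.&lt;/p&gt;
&lt;pre class="literal-block"&gt;PRIORITY_WEIGHT_FAIRSHARE = 2000  # PriorityWeightFairshare


def fair_share_factor(normalized_usage, normalized_shares):
    """Simplified fair-share factor: F = 2**(-usage/shares)."""
    return 2 ** (-normalized_usage / normalized_shares)


def weighted_fair_share_priority(normalized_usage, normalized_shares):
    """Fair-share factor scaled by PriorityWeightFairshare."""
    return PRIORITY_WEIGHT_FAIRSHARE * fair_share_factor(
        normalized_usage, normalized_shares
    )


shares = 0.10  # hypothetical account holding a 10% share
for usage in (0.0, 0.05, 0.10, 0.20):
    print(usage, round(weighted_fair_share_priority(usage, shares)))&lt;/pre&gt;
&lt;p&gt;An account with no recorded usage gets the full 2000, usage equal to the share gives a factor of 0.5 and 1000 priority, and using twice the share halves that again to 500.&lt;/p&gt;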
&lt;aside class="footnote-list brackets"&gt;
&lt;aside class="footnote brackets" id="footnote-1" role="doc-footnote"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="https://civilfritz.net/curc/why-job-not-starting/#footnote-reference-1"&gt;1&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="http://slurm.schedmd.com/squeue.html#lbAF"&gt;http://slurm.schedmd.com/squeue.html#lbAF&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="footnote-2" role="doc-footnote"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="https://civilfritz.net/curc/why-job-not-starting/#footnote-reference-2"&gt;2&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="http://slurm.schedmd.com/squeue.html"&gt;http://slurm.schedmd.com/squeue.html&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;aside class="footnote brackets" id="footnote-3" role="doc-footnote"&gt;
&lt;span class="label"&gt;&lt;span class="fn-bracket"&gt;[&lt;/span&gt;&lt;a role="doc-backlink" href="https://civilfritz.net/curc/why-job-not-starting/#footnote-reference-3"&gt;3&lt;/a&gt;&lt;span class="fn-bracket"&gt;]&lt;/span&gt;&lt;/span&gt;
&lt;p&gt;&lt;a class="reference external" href="http://slurm.schedmd.com/priority_multifactor.html"&gt;http://slurm.schedmd.com/priority_multifactor.html&lt;/a&gt;&lt;/p&gt;
&lt;/aside&gt;
&lt;/aside&gt;
&lt;/section&gt;</description><category>technology</category><guid>https://civilfritz.net/curc/why-job-not-starting/</guid><pubDate>Mon, 22 Feb 2016 03:50:24 GMT</pubDate></item><item><title>The curc::sysconfig::scinet Puppet module</title><link>https://civilfritz.net/curc/curc-scinet-module/</link><dc:creator>Jonathon Anderson</dc:creator><description>&lt;p&gt;I’ve been working on a new module, &lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;curc::sysconfig::scinet&lt;/span&gt;&lt;/code&gt;, which
will generally do the Right Thing™ when configuring a host on the CURC
science network, with as little configuration as possible.&lt;/p&gt;
&lt;p&gt;Let’s look at some examples.&lt;/p&gt;
&lt;section id="login-nodes"&gt;
&lt;h2&gt;login nodes&lt;/h2&gt;
&lt;pre class="literal-block"&gt;class { 'curc::sysconfig::scinet':
  location =&amp;gt; 'comp',
  mgt_if   =&amp;gt; 'eth0',
  dmz_if   =&amp;gt; 'eth1',
  notify   =&amp;gt; Class['network'],
}&lt;/pre&gt;
&lt;p&gt;This is the config used on a new-style login node like &lt;code class="docutils literal"&gt;login05&lt;/code&gt; and
&lt;code class="docutils literal"&gt;login07&lt;/code&gt;. (What makes them new-style? Mostly just that they’ve had
their interfaces cleaned up to use &lt;code class="docutils literal"&gt;eth0&lt;/code&gt; for “mgt” and &lt;code class="docutils literal"&gt;eth1&lt;/code&gt; for
“dmz”.)&lt;/p&gt;
&lt;p&gt;Here’s the routing table that this produced on &lt;code class="docutils literal"&gt;login07&lt;/code&gt;:&lt;/p&gt;
&lt;pre class="literal-block"&gt;$ ip route list
10.225.160.0/24 dev eth0  proto kernel  scope link  src 10.225.160.32
10.225.128.0/24 via 10.225.160.1 dev eth0
192.12.246.0/24 dev eth1  proto kernel  scope link  src 192.12.246.39
10.225.0.0/20 via 10.225.160.1 dev eth0
10.225.0.0/16 via 10.225.160.1 dev eth0  metric 110
10.128.0.0/12 via 10.225.160.1 dev eth0  metric 110
default via 192.12.246.1 dev eth1  metric 100
default via 10.225.160.1 dev eth0  metric 110&lt;/pre&gt;
&lt;p&gt;Connections to “mgt” subnets use the “mgt” interface &lt;code class="docutils literal"&gt;eth0&lt;/code&gt;, either by
the link-local route or the static routes via &lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;comp-mgt-gw&lt;/span&gt;&lt;/code&gt;
(&lt;code class="docutils literal"&gt;10.225.160.1&lt;/code&gt;). Connections to the “general” subnet (a.k.a. “vlan
2049”), as well as the rest of the science network (“data” and “svc”
networks) also use &lt;code class="docutils literal"&gt;eth0&lt;/code&gt; by static route. The default &lt;code class="docutils literal"&gt;eth0&lt;/code&gt; route
is configured by DHCP, but the interface has a default metric of 110, so
it doesn’t conflict with or supersede &lt;code class="docutils literal"&gt;eth1&lt;/code&gt;’s default route, which
is configured with a lower metric of &lt;code class="docutils literal"&gt;100&lt;/code&gt;.&lt;/p&gt;
&lt;p&gt;Speaking of &lt;code class="docutils literal"&gt;eth1&lt;/code&gt;, the “dmz” interface is configured statically,
using information retrieved from DNS by Puppet.&lt;/p&gt;
&lt;pre class="literal-block"&gt;$ cat /etc/sysconfig/network-scripts/ifcfg-eth1
TYPE=Ethernet
DEVICE=eth1
BOOTPROTO=static
HWADDR=00:50:56:88:2E:36
ONBOOT=yes
IPADDR=192.12.246.39
NETMASK=255.255.255.0
GATEWAY=192.12.246.1
METRIC=100
IPV4_ROUTE_METRIC=100&lt;/pre&gt;
&lt;p&gt;Usually the routing priority of the “dmz” interface would mean that
inbound connections to the “mgt” interface from outside of the science
network would be blocked when the “dmz”-bound response is filtered by
&lt;code class="docutils literal"&gt;rp_filter&lt;/code&gt;; but &lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;curc::sysconfig::scinet&lt;/span&gt;&lt;/code&gt; also configures routing
policy for &lt;code class="docutils literal"&gt;eth0&lt;/code&gt;, so traffic on that interface &lt;em&gt;always&lt;/em&gt; returns from
that interface.&lt;/p&gt;
&lt;pre class="literal-block"&gt;$ ip rule show | grep 'lookup 1'
32764:  from 10.225.160.32 lookup 1
32765:  from all iif eth0 lookup 1

$ ip route list table 1
default via 10.225.160.1 dev eth0&lt;/pre&gt;
&lt;p&gt;This allows me to ping &lt;code class="docutils literal"&gt;login07.rc.int.colorado.edu&lt;/code&gt; from my office
workstation.&lt;/p&gt;
&lt;pre class="literal-block"&gt;$ ping -c 1 login07.rc.int.colorado.edu
PING login07.rc.int.colorado.edu (10.225.160.32) 56(84) bytes of data.
64 bytes from 10.225.160.32: icmp_seq=1 ttl=62 time=0.507 ms

--- login07.rc.int.colorado.edu ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 1ms
rtt min/avg/max/mdev = 0.507/0.507/0.507/0.000 ms&lt;/pre&gt;
&lt;p&gt;Because the default route for &lt;code class="docutils literal"&gt;eth0&lt;/code&gt; is actually configured, outbound
routing from &lt;code class="docutils literal"&gt;login07&lt;/code&gt; is resilient to failure of the “dmz” link.&lt;/p&gt;
&lt;pre class="literal-block"&gt;# ip route list | grep -v eth1
10.225.160.0/24 dev eth0  proto kernel  scope link  src 10.225.160.32
10.225.128.0/24 via 10.225.160.1 dev eth0
10.225.0.0/20 via 10.225.160.1 dev eth0
10.225.0.0/16 via 10.225.160.1 dev eth0  metric 110
10.128.0.0/12 via 10.225.160.1 dev eth0  metric 110
default via 10.225.160.1 dev eth0  metric 110&lt;/pre&gt;
&lt;p&gt;Traffic destined to leave the science network simply proceeds to the
next preferred (and, in this case, only remaining) default route,
&lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;comp-mgt-gw&lt;/span&gt;&lt;/code&gt;.&lt;/p&gt;
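&lt;p&gt;The selection logic at work here can be approximated in Python. This is a simplified model, not how the kernel is implemented: it assumes that the longest matching prefix wins and that a lower metric breaks ties, and it treats routes listed without a metric as metric 0. The &lt;code class="docutils literal"&gt;pick_route&lt;/code&gt; function is made up for illustration; the routes are copied from the &lt;code class="docutils literal"&gt;login07&lt;/code&gt; table above.&lt;/p&gt;
&lt;pre class="literal-block"&gt;import ipaddress

# (destination, device, metric) from the login07 routing table
ROUTES = [
    ("10.225.160.0/24", "eth0", 0),
    ("10.225.128.0/24", "eth0", 0),
    ("192.12.246.0/24", "eth1", 0),
    ("10.225.0.0/20", "eth0", 0),
    ("10.225.0.0/16", "eth0", 110),
    ("10.128.0.0/12", "eth0", 110),
    ("0.0.0.0/0", "eth1", 100),
    ("0.0.0.0/0", "eth0", 110),
]


def pick_route(destination, routes=ROUTES):
    """Return the device chosen for a destination address."""
    address = ipaddress.ip_address(destination)
    candidates = []
    for prefix, device, metric in routes:
        network = ipaddress.ip_network(prefix)
        if address in network:
            # Sort key: longest prefix first, then lowest metric.
            candidates.append((-network.prefixlen, metric, device))
    return min(candidates)[2]


print(pick_route("8.8.8.8"))        # leaves the science network
print(pick_route("10.225.128.80"))  # a "mgt" subnet

# Simulate losing the "dmz" link by dropping every eth1 route.
without_dmz = [route for route in ROUTES if route[1] != "eth1"]
print(pick_route("8.8.8.8", without_dmz))&lt;/pre&gt;
&lt;p&gt;With both links up, the external address goes out &lt;code class="docutils literal"&gt;eth1&lt;/code&gt; and the “mgt” address goes out &lt;code class="docutils literal"&gt;eth0&lt;/code&gt;; with the &lt;code class="docutils literal"&gt;eth1&lt;/code&gt; routes removed, the external address falls back to &lt;code class="docutils literal"&gt;eth0&lt;/code&gt;.&lt;/p&gt;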
&lt;section id="dhcp-dns-and-the-fqdn"&gt;
&lt;h3&gt;DHCP, DNS, and the FQDN&lt;/h3&gt;
&lt;p&gt;Tangentially, it’s important to note that the DHCP configuration of
&lt;code class="docutils literal"&gt;eth0&lt;/code&gt; will tend to rewrite &lt;code class="docutils literal"&gt;/etc/resolv.conf&lt;/code&gt; and the &lt;code class="docutils literal"&gt;search&lt;/code&gt;
path it defines, with the effect of causing the FQDN of the host to
change to &lt;code class="docutils literal"&gt;login07.rc.int.colorado.edu&lt;/code&gt;. Because login nodes are
logically (and historically) external hosts, not internal hosts, they
should prefer their external identity to their internal identity. As
such, we override the domain search path on login nodes to cause them to
discover their &lt;code class="docutils literal"&gt;rc.colorado.edu&lt;/code&gt; FQDN’s first.&lt;/p&gt;
&lt;pre class="literal-block"&gt;# cat /etc/dhcp/dhclient-eth0.conf
supersede domain-search "rc.colorado.edu", "rc.int.colorado.edu";&lt;/pre&gt;
&lt;/section&gt;
&lt;/section&gt;
&lt;section id="petalibrary-repl"&gt;
&lt;h2&gt;PetaLibrary/repl&lt;/h2&gt;
&lt;p&gt;The PetaLibrary/repl GPFS NSD nodes &lt;code class="docutils literal"&gt;replnsd{01,02}&lt;/code&gt; are still in the
“COMP” datacenter, but only attach to “mgt” and “data” networks.&lt;/p&gt;
&lt;pre class="literal-block"&gt;class { 'curc::sysconfig::scinet':
  location         =&amp;gt; 'comp',
  mgt_if           =&amp;gt; 'eno2',
  data_if          =&amp;gt; 'enp17s0f0',
  other_data_rules =&amp;gt; [ 'from 10.225.176.61 table 2',
                        'from 10.225.176.62 table 2',
                        ],
  notify           =&amp;gt; Class['network_manager::service'],
}&lt;/pre&gt;
&lt;p&gt;This config produces the following routing table on &lt;code class="docutils literal"&gt;replnsd01&lt;/code&gt;…&lt;/p&gt;
&lt;pre class="literal-block"&gt;$ ip route list
default via 10.225.160.1 dev eno2  proto static  metric 110
default via 10.225.176.1 dev enp17s0f0  proto static  metric 120
10.128.0.0/12 via 10.225.160.1 dev eno2  metric 110
10.128.0.0/12 via 10.225.176.1 dev enp17s0f0  metric 120
10.225.0.0/20 via 10.225.160.1 dev eno2
10.225.0.0/16 via 10.225.160.1 dev eno2  metric 110
10.225.0.0/16 via 10.225.176.1 dev enp17s0f0  metric 120
10.225.64.0/20 via 10.225.176.1 dev enp17s0f0
10.225.128.0/24 via 10.225.160.1 dev eno2
10.225.144.0/24 via 10.225.176.1 dev enp17s0f0
10.225.160.0/24 dev eno2  proto kernel  scope link  src 10.225.160.59  metric 110
10.225.160.49 via 10.225.176.1 dev enp17s0f0  proto dhcp  metric 120
10.225.176.0/24 dev enp17s0f0  proto kernel  scope link  src 10.225.176.59  metric 120&lt;/pre&gt;
&lt;p&gt;…with the expected interface-consistent policy-targeted routing tables.&lt;/p&gt;
&lt;pre class="literal-block"&gt;$ ip route list table 1
default via 10.225.160.1 dev eno2

$ ip route list table 2
default via 10.225.176.1 dev enp17s0f0&lt;/pre&gt;
&lt;p&gt;Static routes for “mgt” and “data” subnets are defined for their
respective interfaces. As on the login nodes above, default routes are
specified for &lt;em&gt;both&lt;/em&gt; interfaces as well, with the lower-metric “mgt”
interface &lt;code class="docutils literal"&gt;eno2&lt;/code&gt; being preferred. (This is configurable using the
&lt;code class="docutils literal"&gt;mgt_metric&lt;/code&gt; and &lt;code class="docutils literal"&gt;data_metric&lt;/code&gt; parameters.)&lt;/p&gt;
&lt;p&gt;Perhaps the most notable aspect of the PetaLibrary/repl network config
is the provisioning of the GPFS CES floating IP addresses
&lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;10.225.176.{61,62}&lt;/span&gt;&lt;/code&gt;. These addresses are added to the &lt;code class="docutils literal"&gt;enp17s0f0&lt;/code&gt;
interface dynamically by GPFS, and are not defined with
&lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;curc::sysconfig::scinet&lt;/span&gt;&lt;/code&gt;; but the config must reference these
addresses to implement proper interface-consistent policy-targeted
routing tables. The version of Puppet deployed at CURC lacks the
semantics to infer these rules from a more semantic &lt;code class="docutils literal"&gt;data_ip&lt;/code&gt;
parameter, so the &lt;code class="docutils literal"&gt;other_data_rules&lt;/code&gt; parameter is used instead.&lt;/p&gt;
&lt;pre class="literal-block"&gt;other_data_rules =&amp;gt; [ 'from 10.225.176.61 table 2',
                      'from 10.225.176.62 table 2',
                      ],&lt;/pre&gt;
&lt;/section&gt;
&lt;section id="blanca-ics-login-node"&gt;
&lt;h2&gt;Blanca/ICS login node&lt;/h2&gt;
&lt;p&gt;Porting the Blanca login node would be great because it’s got a “dmz”,
“mgt”, &lt;em&gt;and&lt;/em&gt; “data” interface; so it would exercise the full gamut of
features of the module.&lt;/p&gt;
&lt;/section&gt;</description><category>technology</category><guid>https://civilfritz.net/curc/curc-scinet-module/</guid><pubDate>Thu, 14 Jan 2016 13:39:25 GMT</pubDate></item><item><title>Linux policy-based routing</title><link>https://civilfritz.net/anderbubble/linux-policy-routing/</link><dc:creator>Jonathon Anderson</dc:creator><description>&lt;p&gt;How could Linux policy routing be so poorly documented? It’s so useful,
so essential in a multi-homed environment… I’d almost advocate for its
inclusion as default behavior.&lt;/p&gt;
&lt;p&gt;What is this, you ask? To understand, we have to start with what Linux
does by default in a multi-homed environment. So let’s look at one.&lt;/p&gt;
&lt;pre class="literal-block"&gt;$ ip addr
[...]
4: eth2: &amp;lt;BROADCAST,MULTICAST,UP,LOWER_UP&amp;gt; mtu 1500 qdisc mq state UP qlen 1000
    link/ether 78:2b:cb:66:75:c0 brd ff:ff:ff:ff:ff:ff
    inet 10.225.128.80/24 brd 10.225.128.255 scope global eth2
    inet6 fe80::7a2b:cbff:fe66:75c0/64 scope link
       valid_lft forever preferred_lft forever
[...]
6: eth5: &amp;lt;BROADCAST,MULTICAST,UP,LOWER_UP&amp;gt; mtu 9000 qdisc mq state UP qlen 1000
    link/ether e4:1d:2d:14:93:60 brd ff:ff:ff:ff:ff:ff
    inet 10.225.144.80/24 brd 10.225.144.255 scope global eth5
    inet6 fe80::e61d:2dff:fe14:9360/64 scope link
       valid_lft forever preferred_lft forever&lt;/pre&gt;
&lt;p&gt;So we have two interfaces, &lt;code class="docutils literal"&gt;eth2&lt;/code&gt; and &lt;code class="docutils literal"&gt;eth5&lt;/code&gt;. They’re on separate
subnets, &lt;code class="docutils literal"&gt;10.225.128.0/24&lt;/code&gt; and &lt;code class="docutils literal"&gt;10.225.144.0/24&lt;/code&gt; respectively. In
our environment, we refer to these as “spsc-mgt” and “spsc-data.” The
practical circumstance is that one of these networks is faster than the
other, and we would like bulk data transfer to use the faster
“spsc-data” network.&lt;/p&gt;
&lt;p&gt;If the client system also has an “spsc-data” network, everything is
fine. The client addresses the system using its data address, and the
link-local route prefers the data network.&lt;/p&gt;
&lt;pre class="literal-block"&gt;$ ip route list 10.225.144.0/24
10.225.144.0/24 dev eth5  proto kernel  scope link  src 10.225.144.80&lt;/pre&gt;
&lt;p&gt;Our network environment covers a number of networks, however. So let’s
say our client lives in another data network–“comp-data.” Infrastructure
routing directs the traffic to the -data interface of our server
correctly, but the default route on the server prefers the -mgt
interface.&lt;/p&gt;
&lt;pre class="literal-block"&gt;$ ip route list | grep ^default
default via 10.225.128.1 dev eth2&lt;/pre&gt;
&lt;p&gt;For this simple case we have two options. We can either change our
default route to prefer the -data interface, or we can enumerate
intended -data client networks with static routes using the data
interface. Since changing the default route simply leaves us in the same
situation for the -mgt network, let’s define some static routes.&lt;/p&gt;
&lt;pre class="literal-block"&gt;$ ip route add 10.225.64.0/20 via 10.225.144.1 dev eth5
$ ip route add 10.225.176.0/24 via 10.225.144.1 dev eth5&lt;/pre&gt;
&lt;p&gt;So long as we can enumerate the networks that should always use the
-data interface of our server to communicate, this basically works. But
what if we want to support clients that don’t themselves have separate
-mgt and -data networks? What if we have a single client–perhaps with
only a -mgt network connection–that should be able to communicate
individually with the server’s -mgt interface and its -data interface?
In the most pathological case, what if we have a host that is only
connected to the &lt;code class="docutils literal"&gt;&lt;span class="pre"&gt;spsc-mgt&lt;/span&gt;&lt;/code&gt; (&lt;code class="docutils literal"&gt;10.225.128.0/24&lt;/code&gt;) network, but we
want that client to be able to communicate with the server’s -data
interface? In this case, the link-local route will &lt;em&gt;always&lt;/em&gt; prefer the
-mgt network for the return path.&lt;/p&gt;
&lt;section id="policy-based-routing"&gt;
&lt;h2&gt;Policy-based routing&lt;/h2&gt;
&lt;p&gt;The best case would be to have the server select an outbound route based
not on a static configuration, but in response to the incoming path of
the traffic. This is the feature enabled by policy-based routing.&lt;/p&gt;
&lt;p&gt;Linux policy routing allows us to define distinct and isolated routing
tables, and then select the appropriate routing table based on the
traffic context. In this situation, we have three different routing
contexts to consider. The first of these is the set of routes to use
when the server initiates communication.&lt;/p&gt;
&lt;pre class="literal-block"&gt;$ ip route list table main
10.225.128.0/24 dev eth2  proto kernel  scope link  src 10.225.128.80
10.225.144.0/24 dev eth5  proto kernel  scope link  src 10.225.144.80
10.225.64.0/20 via 10.225.144.1 dev eth5
10.225.176.0/24 via 10.225.144.1 dev eth5
default via 10.225.128.1 dev eth2&lt;/pre&gt;
&lt;p&gt;A separate routing table defines routes to use when responding to
traffic from the -mgt interface.&lt;/p&gt;
&lt;pre class="literal-block"&gt;$ ip route list table 1
default via 10.225.128.1 dev eth2&lt;/pre&gt;
&lt;p&gt;The last routing table defines routes to use when responding to traffic
from the -data interface.&lt;/p&gt;
&lt;pre class="literal-block"&gt;$ ip route list table 2
default via 10.225.144.1 dev eth5&lt;/pre&gt;
&lt;p&gt;With these separate routing tables defined, the last step is to define
the rules that select the correct routing table.&lt;/p&gt;
&lt;pre class="literal-block"&gt;$ ip rule list
0:  from all lookup local
32762:  from 10.225.144.80 lookup 2
32763:  from all iif eth5 lookup 2
32764:  from 10.225.128.80 lookup 1
32765:  from all iif eth2 lookup 1
32766:  from all lookup main
32767:  from all lookup default&lt;/pre&gt;
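&lt;p&gt;As a rough sketch, the tables and rules above could be built by hand
with commands along the following lines (run with root privileges). The
addresses, interfaces, and table numbers are those from the listings
above; the ordering is an assumption, chosen because the kernel normally
gives each newly added rule a priority just below the lowest existing
one, which should reproduce the priorities shown.&lt;/p&gt;
&lt;pre class="literal-block"&gt;# Per-interface tables, each holding only a default route
$ ip route add default via 10.225.128.1 dev eth2 table 1
$ ip route add default via 10.225.144.1 dev eth5 table 2

# Select table 1 for packets received on eth2 or sourced from the -mgt address
$ ip rule add iif eth2 table 1
$ ip rule add from 10.225.128.80 table 1

# Select table 2 for packets received on eth5 or sourced from the -data address
$ ip rule add iif eth5 table 2
$ ip rule add from 10.225.144.80 table 2&lt;/pre&gt;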
&lt;p&gt;Despite a lack of documentation, all of these rules may be codified in
Red Hat “sysconfig”-style “network-scripts” using interface-specific
&lt;code class="docutils literal"&gt;route-&lt;/code&gt; and &lt;code class="docutils literal"&gt;rule-&lt;/code&gt; files.&lt;/p&gt;
&lt;pre class="literal-block"&gt;$ cat /etc/sysconfig/network-scripts/route-eth2
default via 10.225.128.1 dev eth2
default via 10.225.128.1 dev eth2 table 1

$ cat /etc/sysconfig/network-scripts/route-eth5
10.225.64.0/20 via 10.225.144.1 dev eth5
10.225.176.0/24 via 10.225.144.1 dev eth5
default via 10.225.144.1 dev eth5 table 2

$ cat /etc/sysconfig/network-scripts/rule-eth2
iif eth2 table 1
from 10.225.128.80 table 1

$ cat /etc/sysconfig/network-scripts/rule-eth5
iif eth5 table 2
from 10.225.144.80 table 2&lt;/pre&gt;
&lt;!--  --&gt;
&lt;blockquote&gt;
&lt;p&gt;Changes to the RPDB made with these commands do not become active
immediately. It is assumed that after a script finishes a batch of
updates, it flushes the routing cache with ip route flush cache.&lt;/p&gt;
&lt;/blockquote&gt;
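&lt;p&gt;One way to check the result, sketched here under assumptions, is to
flush the routing cache and then ask the kernel which route it would
select for a given source address. The client address
&lt;code class="docutils literal"&gt;10.225.128.50&lt;/code&gt; is hypothetical, standing in for a host on the
“spsc-mgt” network. With the rules above in place, the first query
should resolve through table 2 (via &lt;code class="docutils literal"&gt;10.225.144.1&lt;/code&gt; on &lt;code class="docutils literal"&gt;eth5&lt;/code&gt;) rather
than through the link-local route on &lt;code class="docutils literal"&gt;eth2&lt;/code&gt;.&lt;/p&gt;
&lt;pre class="literal-block"&gt;$ ip route flush cache

# Reply sourced from the -data address to a client on spsc-mgt
$ ip route get 10.225.128.50 from 10.225.144.80

# Reply sourced from the -mgt address to the same client
$ ip route get 10.225.128.50 from 10.225.128.80&lt;/pre&gt;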
&lt;/section&gt;
&lt;section id="references"&gt;
&lt;h2&gt;References&lt;/h2&gt;
&lt;ul class="simple"&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="http://linux-ip.net/html/routing-rpdb.html"&gt;http://linux-ip.net/html/routing-rpdb.html&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://access.redhat.com/solutions/19596"&gt;https://access.redhat.com/solutions/19596&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="https://access.redhat.com/solutions/288823"&gt;https://access.redhat.com/solutions/288823&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;a class="reference external" href="http://linux-ip.net/gl/ip-cref/"&gt;http://linux-ip.net/gl/ip-cref/&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;/section&gt;</description><category>technology</category><guid>https://civilfritz.net/anderbubble/linux-policy-routing/</guid><pubDate>Thu, 07 Jan 2016 12:52:34 GMT</pubDate></item></channel></rss>