Secure OpenSSH defaults

This is one part in a series on OpenSSH client configuration. Also read Elegant OpenSSH configuration and The SSH agent.

It’s good practice to harden our ssh client with some secure “defaults”. The following directives apply to all (*) hosts. (Because ssh uses the first value it obtains for each option, keep these general defaults after any host-specific overrides in your configuration file; see “Dealing with insecure servers” below.)

(These are listed as multiple Host * stanzas, but they can be combined into a single stanza in your actual configuration file.)

If you prefer, follow along with an example of a complete ~/.ssh/config file.

Require secure algorithms

OpenSSH supports many encryption and authentication algorithms, but some of them are known to be vulnerable to cryptographic attack. The Mozilla project publishes a list of recommended algorithms that excludes those known to be insecure.

Host *
HostKeyAlgorithms ssh-ed25519-cert-v01@openssh.com,ssh-rsa-cert-v01@openssh.com,ssh-ed25519,ssh-rsa,ecdsa-sha2-nistp521-cert-v01@openssh.com,ecdsa-sha2-nistp384-cert-v01@openssh.com,ecdsa-sha2-nistp256-cert-v01@openssh.com,ecdsa-sha2-nistp521,ecdsa-sha2-nistp384,ecdsa-sha2-nistp256
Ciphers chacha20-poly1305@openssh.com,aes256-gcm@openssh.com,aes128-gcm@openssh.com,aes256-ctr,aes192-ctr,aes128-ctr
MACs hmac-sha2-512-etm@openssh.com,hmac-sha2-256-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-512,hmac-sha2-256,umac-128@openssh.com
KexAlgorithms curve25519-sha256@libssh.org,diffie-hellman-group-exchange-sha256,diffie-hellman-group-exchange-sha1

(More information on the available encryption and authentication algorithms, and how a recommended set is derived, is available in this fantastic blog post, “Secure secure shell.”)
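
(To see which of these algorithm names your own client supports, a relatively recent OpenSSH client, roughly 6.3 or later, can be queried directly with ssh -Q; a quick check:)

$ ssh -Q cipher   # symmetric ciphers supported by this client
$ ssh -Q mac      # message authentication codes
$ ssh -Q kex      # key exchange algorithms
$ ssh -Q key      # key and host-key algorithms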

Hash your known_hosts file

Every time you connect to an SSH server, your client caches a copy of the remote server’s host key in a ~/.ssh/known_hosts file. If your ssh client is ever compromised, this list can expose the remote servers to attack using your compromised credentials. Be a good citizen and hash your known hosts file.

Host *
HashKnownHosts yes

(Hash any existing entries in your ~/.ssh/known_hosts file by running ssh-keygen -H. Don’t forget to remove the backup ~/.ssh/known_hosts.old.)

$ ssh-keygen -H
$ rm -i ~/.ssh/known_hosts.old

No roaming

Finally, disable the experimental “roaming” feature to mitigate exposure to a pair of potential vulnerabilities, CVE-2016-0777 and CVE-2016-0778.

Host *
UseRoaming no

Dealing with insecure servers

Some servers are old enough that they may not support the newer, more secure algorithms listed above. In the RC environment, for example, the login and other Internet-accessible systems provide relatively modern ssh algorithms; but the hosts in the rc.int.colorado.edu domain may not.

To support connection to older hosts while requiring newer algorithms by default, override these settings earlier in the configuration file.

# Internal RC hosts are running an old version of OpenSSH
Match host=*.rc.int.colorado.edu
MACs hmac-sha1,umac-64@openssh.com,hmac-ripemd160,hmac-ripemd160@openssh.com,hmac-sha1-96
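
(If you'd rather test a relaxed setting before committing it to your configuration file, the same override can be supplied for a single connection with -o on the command line; a sketch, using a hypothetical older host:)

$ ssh -o MACs=hmac-sha1 old-node.rc.int.colorado.edu   # relax only the MAC list for this one connection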

Elegant OpenSSH configuration

This is one part in a series on OpenSSH client configuration. Also read Secure OpenSSH defaults and The SSH agent.

The OpenSSH client is very robust, very flexible, and very configurable. Many times I see people struggling to remember server-specific ssh flags or arcane, manual multi-hop procedures. I even see entire scripts written to automate the process.

But the vast majority of what you might want ssh to do can be abstracted away with some configuration in your ~/.ssh/config file.

All (or, at least, most) of these configuration directives are fully documented in the ssh_config manpage.

If you prefer, follow along with an example of a complete ~/.ssh/config file.

HostName

One of the first annoyances people have–and one of the first things people try to fix–when using a command-line ssh client is having to type in long hostnames. For example, the Research Computing login service is available at login.rc.colorado.edu.

$ ssh login.rc.colorado.edu

This particular name isn’t too bad; but coupled with usernames and especially when used as part of an scp, these fully-qualified domain names can become cumbersome.

$ scp -r /path/to/src/ user1234@login.rc.colorado.edu:dest/

OpenSSH supports host aliases through pattern-matching in Host directives.

Host login*.rc
HostName %h.colorado.edu

Host *.rc
HostName %h.int.colorado.edu

In this example, %h is substituted with the name specified on the command-line. With a configuration like this in place, connections to login.rc are directed to the full name login.rc.colorado.edu.

$ scp -r /path/to/src/ user1234@login.rc:dest/

Failing that, other references to hosts with a .rc suffix are directed to the internal Research Computing domain. (We’ll use these later.)

(The .rc domain segment could be moved from the Host pattern to the HostName value; but leaving it in the alias helps to distinguish the Research Computing login nodes from other login nodes that you may have access to. You can use arbitrary aliases in the Host directive, too; but then the %h substitution isn’t useful: you have to enumerate each targeted host.)

User

Unless you happen to use the same username on your local workstation as you have on the remote server, you likely specify a username using either the @ syntax or the -l argument to the ssh command.

$ ssh user1234@login.rc

As with specifying a fully-qualified domain name, tracking and specifying a different username for each remote host can become burdensome, especially during an scp operation. Record the correct username in your ~/.ssh/config file instead.

Match host=*.rc.colorado.edu,*.rc.int.colorado.edu
User user1234

Now all connections to Research Computing hosts use the specified username by default, without it having to be specified on the command-line.

$ scp -r /path/to/src/ login.rc:dest/

Note that we’re using a Match directive here, rather than a Host directive. The host= argument to Match matches against the derived hostname, so it reflects the real hostname as determined using the previous Host directives. (Make sure the correct HostName is established earlier in the configuration, though.)
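
On relatively recent OpenSSH clients (6.8 and later) you can verify how an alias will resolve without actually connecting: ssh -G prints the effective configuration, including the derived HostName and User. A quick sanity check (output abbreviated and illustrative):

$ ssh -G login.rc | grep -E '^(hostname|user) '
hostname login.rc.colorado.edu
user user1234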

ControlMaster

Even if the actual command is simple to type, authenticating to the host may require manual intervention. The Research Computing login nodes, for example, require two-factor authentication using a password or pin coupled with a one-time VASCO password or Duo credential. If you want to open multiple connections–or, again, copy files using scp–having to authenticate with multiple factors quickly becomes tedious. (Even having to type in a password at all may be unnecessary; but we’ll assume, as is the case with the Research Computing login example, that you can’t use public-key authentication.)

OpenSSH supports sharing a single network connection for multiple ssh sessions.

Match host=login.rc.colorado.edu
ControlMaster auto
ControlPath ~/.ssh/.socket_%h_%p_%r
ControlPersist 4h

With ControlMaster and ControlPath defined, the first ssh connection authenticates and establishes a session normally; but future connections join the active connection, bypassing the need to re-authenticate. The optional ControlPersist option causes this connection to remain active for a period of time even after the last session has been closed.

$ ssh login.rc
user1234@login.rc.colorado.edu's password:
[user1234@login01 ~]$ logout

$ ssh login.rc
[user1234@login01 ~]$

(Note that many arguments to the ssh command are effectively ignored after the initial connection is established. Notably, if X11 was not forwarded with -X or -Y during the first session, you cannot use the shared connection to forward X11 in a later session. In this case, use the -S none argument to ssh to ignore the existing connection and explicitly establish a new connection.)
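
(You can inspect or tear down a shared connection explicitly with the -O control commands, which talk to the master process over the ControlPath socket:)

$ ssh -O check login.rc   # report whether a master connection is currently running
$ ssh -O exit login.rc    # ask the master to exit, closing the shared connection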

ProxyCommand

But what if you want to get to a host that isn’t directly available from your local workstation? The hosts in the rc.int.colorado.edu domain referenced above may be accessible from a local network connection; but if you are connecting from elsewhere on the Internet, you won’t be able to access them directly.

Fortunately, OpenSSH provides the ProxyCommand option which, when coupled with the OpenSSH client presumed to be available on the intermediate server, supports arbitrary proxy connections through to otherwise-inaccessible servers.

Match host=*.rc.int.colorado.edu
ProxyCommand ssh -W %h:%p login.rc.colorado.edu

Even though you can’t connect directly to Janus compute nodes from the Internet, for example, you can connect to them from a Research Computing login node; so this ProxyCommand configuration allows transparent access to hosts in the internal Research Computing domain.

$ ssh janus-compile1.rc
[user1234@janus-compile1 ~]$

And it even works with scp.

$ echo 'Hello, world!' >/tmp/hello.txt
$ scp /tmp/hello.txt janus-compile1.rc:/tmp
hello.txt                                     100%   14     0.0KB/s   00:00

$ ssh janus-compile1.rc cat /tmp/hello.txt
Hello, world!
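
(On OpenSSH 7.3 and later, the ProxyJump directive, and its -J command-line equivalent, is a shorthand for this same ssh -W pattern; a sketch, if your client is new enough:)

Match host=*.rc.int.colorado.edu
ProxyJump login.rc.colorado.edu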

Public-key authentication

If you tried the example above, chances are that you were met with an unexpected password prompt that didn’t accept any password that you used. That’s because most internal Research Computing hosts don’t actually support interactive authentication, two-factor or otherwise. Connections from a CURC login node are authorized by the login node; but a proxied connection must authenticate from your local client.

The best way to authenticate your local workstation to an internal CURC host is using public-key authentication.

If you don’t already have an SSH key, generate one now.

$ ssh-keygen -t rsa -b 4096 # if you don't already have a key
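
(If your client and servers are new enough to support it, an Ed25519 key is a smaller, modern alternative to 4096-bit RSA; this assumes OpenSSH 6.5 or later on both ends:)

$ ssh-keygen -t ed25519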

Now we have to copy the (new?) public key to the remote CURC ~/.ssh/authorized_keys file. RC provides a global home directory, so copying to any login node will do. Targeting a specific login node is useful, though: the ControlMaster configuration for login.rc.colorado.edu tends to confuse ssh-copy-id.

$ ssh-copy-id login01.rc

(The ssh-copy-id command doesn’t come with OS X, but there’s a third-party port available on GitHub. It’s usually available on a Linux system, too. Alternatively, you can just edit ~/.ssh/authorized_keys manually.)
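
(Where ssh-copy-id isn't available, a manual equivalent is a one-liner; a sketch, assuming the RSA key generated above at its default path:)

$ cat ~/.ssh/id_rsa.pub | ssh login01.rc 'mkdir -p ~/.ssh && cat >>~/.ssh/authorized_keys'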

Some routes are more default than others

This article was first published in the Fall 2016 issue of Usenix ;login:.

Typical IP-networked hosts are configured with a single default route. For single-homed hosts the default route defines the first destination for packets addressed outside of the local subnet; but for multi-homed hosts the default route also implicitly defines a default interface to be used for all outbound traffic. Specific subnets may be accessed using non-default interfaces by defining static routes; but the single default route remains a "single point of failure" for general access to other subnets and the Internet. The Linux kernel, together with the iproute2 suite, supports the definition of multiple default routes distinguished by a preference metric. This allows alternate networks to serve as fail-over for the preferred default route in cases where the link has failed or is otherwise unavailable.

Background

The CU-Boulder Research Computing environment spans three datacenters, each with its own set of special-purpose networks. Public-facing hosts may be accessed through a 1:1 NAT or via a dedicated "DMZ" VLAN that spans all three environments. We have historically configured whichever interface was used for inbound connection from the Internet as the default route in order to support responses to connections from Internet clients; but our recent and ongoing deployment of policy routing (as described in a previous issue of ;login:) removes this requirement.


Figure 1 - The CU-Boulder Research Computing Science Network, with subnets in three datacenters

All RC networks are capable of routing traffic with each other, the campus intranet, and the greater Internet, so we more recently prefer the host's "management" interface as its default route as a matter of convention; but this unnecessarily limits network connectivity in cases where the default interface is down, whether by link failure or during a reconfiguration or maintenance process.

The problem with a single default route

The simplest Linux host routing table is a system with a single network interface.

# ip route list
default via 10.225.160.1 dev ens192
10.225.160.0/24 dev ens192  proto kernel  scope link  src 10.225.160.38

Traffic to hosts on 10.225.160.0/24 is delivered directly, while traffic to any other network is forwarded to 10.225.160.1. In this case, the default route eventually provides access to the public Internet.

# ping -c1 example.com
PING example.com (93.184.216.34) 56(84) bytes of data.
64 bytes from 93.184.216.34: icmp_seq=1 ttl=54 time=24.0 ms

--- example.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 24.075/24.075/24.075/0.000 ms

A dual-homed host adds a second network interface and a second link-local route; but the original default route remains.

# ifup ens224 && ip route list
default via 10.225.160.1 dev ens192
10.225.160.0/24 dev ens192  proto kernel  scope link  src 10.225.160.38
10.225.176.0/24 dev ens224  proto kernel  scope link  src 10.225.176.38

The new link-local route provides access to hosts on 10.225.176.0/24; but traffic to other networks still requires access to the default interface as defined by the single default route. If the default route interface is unavailable, external networks become inaccessible, even though identical routing is available via 10.225.176.1.

# ifdown ens192 && ping -c1 example.com; ifup ens192
connect: Network is unreachable

Attempts to add a second default route fail with an error message (in typically unhelpful iproute2 fashion) implying that it is impossible to configure a host with multiple default routes simultaneously.

# ip route add default via 10.225.176.1 dev ens224
RTNETLINK answers: File exists

It would be better if the host could select dynamically from any of the physically available routes; but without an entry in the host's routing table directing packets out the ens224 "data" interface, the host will simply refuse to deliver the packets.

Multiple default routes and routing metrics

The RTNETLINK error above indicates that the ens224 "data" route cannot be added to the table because a conflicting route already exists--in this case, the ens192 "management" route. Both routes target the "default" network, which would lead to non-deterministic routing with no way to select one route in favor of the other.

However, the Linux routing table supports more attributes than the "via" address and "dev" specified in the above example. Of use here, the "metric" attribute allows us to specify a preference number for each route.

# ip route change default via 10.225.160.1 dev ens192 metric 100
# ip route add default via 10.225.176.1 dev ens224 metric 200
# ip route flush cache

The host will continue to prefer the ens192 "management" interface for its default route, due to its lower metric number; but, if that interface is taken down, outbound packets will automatically be routed via the ens224 "data" interface.
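
The routing table now carries both default routes, ordered by their metrics (illustrative output; exact formatting varies by iproute2 version):

# ip route list
default via 10.225.160.1 dev ens192  metric 100
default via 10.225.176.1 dev ens224  metric 200
10.225.160.0/24 dev ens192  proto kernel  scope link  src 10.225.160.38
10.225.176.0/24 dev ens224  proto kernel  scope link  src 10.225.176.38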

# ifdown ens192 && ping -c1 example.com; ifup ens192
PING example.com (93.184.216.34) 56(84) bytes of data.
64 bytes from example.com (93.184.216.34): icmp_seq=1 ttl=54 time=29.0 ms

--- example.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 29.032/29.032/29.032/0.000 ms

Persisting the configuration

This custom routing configuration can be persisted in the Red Hat "ifcfg" network configuration system by specifying a METRIC number in the ifcfg- files. This metric will be applied to any route populated by DHCP or by a GATEWAY value in the ifcfg- file or /etc/sysconfig/network file.

# grep METRIC= /etc/sysconfig/network-scripts/ifcfg-ens192
METRIC=100

# grep METRIC= /etc/sysconfig/network-scripts/ifcfg-ens224
METRIC=200

Alternatively, routes may be specified using route- files. These routes must define metrics explicitly.

# cat /etc/sysconfig/network-scripts/route-ens192
default via 10.225.160.1 dev ens192 metric 100

# cat /etc/sysconfig/network-scripts/route-ens224
default via 10.225.176.1 dev ens224 metric 200

Alternatives and further improvements

The NetworkManager service in RHEL 7.x handles multiple default routes correctly by supplying distinct metrics automatically; but, of course, specifying route metrics manually allows you to control which route is preferred explicitly.

I continue to wonder if it might be better to go completely dynamic and actually run OSPF on all multi-homed hosts. This should--in theory--allow our network to be even more automatically dynamic in response to link availability, but this may be too complex to justify in our environment.

There's also potential to use all available routes simultaneously with weighted load-balancing, either per-flow or per-packet. This is generally inappropriate in our environment; but could be preferable in an environment where the available networks are definitively general-purpose.

# ip route equalize add default \
    nexthop via 10.225.160.1 dev ens192 weight 1 \
    nexthop via 10.225.176.1 dev ens224 weight 10

Conclusion

We've integrated a multiple-default-route configuration into our standard production network configuration, which is being deployed in parallel with our migration to policy routing. Now the default route is specified not by the static binary existence of a single default entry in the routing table; but by an order of preference for each of the available interfaces. This allows our hosts to remain functional in more failure scenarios than before, when link failure or network maintenance makes the preferred route unavailable.

Improve your multi-homed servers with policy routing

This article was first published in the Summer 2016 issue of Usenix ;login:

Traditional IP routing systems route packets by comparing the destination address against a predefined list of routes to each available subnet; but when multiple potential routes exist between two hosts on a network, the preferred route may be dependent on context that cannot be inferred from the destination alone. The Linux kernel, together with the iproute2 suite, supports the definition of multiple routing tables and a routing policy database to select the preferred routing table dynamically. This additional expressiveness can be used to avoid multiple routing pitfalls, including asymmetric routes and performance bottlenecks from suboptimal route selection.

Background

The CU-Boulder Research Computing environment spans three datacenters, each with its own set of special-purpose networks. A traditionally-routed host simultaneously connected to two or more of these networks compounds network complexity by making only one interface (the one carrying the default route) generally available across network routes. Some cases can be addressed by defining static routes; but even this leads to asymmetric routing that is at best confusing and at worst a performance bottleneck.

Over the past few months we've been transitioning our hosts from a single-table routing configuration to a policy-driven, multi-table routing configuration. The end result is full bidirectional connectivity between any two interfaces in the network, irrespective of underlying topology or a host's default route. This has reduced the apparent complexity in our network by allowing the host and network to Do the Right Thing™ automatically, unconstrained by an otherwise static route map.

Linux policy routing has become an essential addition to host configuration in the University of Colorado Boulder "Science Network." It's so useful, in fact, that I'm surprised a basic routing policy isn't provided by default for multi-homed servers.

The problem with traditional routing

The simplest Linux host routing scenario is a system with a single network interface.

# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:50:56:88:56:1f brd ff:ff:ff:ff:ff:ff
    inet 10.225.160.38/24 brd 10.225.160.255 scope global dynamic ens192
       valid_lft 60184sec preferred_lft 60184sec

Such a typically-configured host with a single uplink has a single default route in addition to its link-local route.

# ip route list
default via 10.225.160.1 dev ens192
10.225.160.0/24 dev ens192  proto kernel  scope link  src 10.225.160.38

Traffic to hosts on 10.225.160.0/24 is delivered directly, while traffic to any other network is forwarded to 10.225.160.1.

A dual-homed host adds a second network interface and a second link-local route; but the original default route remains. (Figure 1.)

# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:50:56:88:56:1f brd ff:ff:ff:ff:ff:ff
    inet 10.225.160.38/24 brd 10.225.160.255 scope global dynamic ens192
       valid_lft 86174sec preferred_lft 86174sec
3: ens224: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:50:56:88:44:18 brd ff:ff:ff:ff:ff:ff
    inet 10.225.176.38/24 brd 10.225.176.255 scope global dynamic ens224
       valid_lft 69193sec preferred_lft 69193sec

# ip route list
default via 10.225.160.1 dev ens192
10.225.160.0/24 dev ens192  proto kernel  scope link  src 10.225.160.38
10.225.176.0/24 dev ens224  proto kernel  scope link  src 10.225.176.38

The new link-local route provides access to hosts on 10.225.176.0/24, and is sufficient for a private network connecting a small cluster of hosts. In fact, this is the configuration that we started with in our Research Computing environment: .160.0/24 is a low-performance "management" network, while .176.0/24 is a high-performance "data" network.


Figure 1 - A simple dual-homed server with a traditional default route

In a more complex network, however, link-local routes quickly become insufficient. In the CU Science Network, for example, each datacenter is considered a discrete network zone with its own set of "management" and "data" networks. For hosts in different network zones to communicate, a static route must be defined in each direction to direct performance-sensitive traffic across the high-performance network route. (Figure 2.)

server # ip route add 10.225.144.0/24 via 10.225.176.1
client # ip route add 10.225.176.0/24 via 10.225.144.1

Though managing these static routes can be tedious, they do sufficiently define connectivity between the relevant network pairs: "data" interfaces route traffic to each other via high-performance networks, while "management" interfaces route traffic to each other via low-performance networks. Other networks (e.g., the Internet) can only communicate with the hosts on their default routes; but this limitation may be acceptable for some scenarios.


Figure 2 - A server and a client, with static routes between their data interfaces

Even this approach is insufficient, however, to allow traffic between "management" and "data" interfaces. This is particularly problematic when a client host is not equipped with a symmetric set of network interfaces. (Figure 3.) Such a client may only have a "management" interface, but should still communicate with the server's high-performance interface for certain types of traffic. (For example, a dual-homed NFS server should direct all NFS traffic over its high-performance "data" network, even when being accessed by a client that itself only has a low-performance "management" interface.) By default, the Linux rp_filter blocks this traffic, as the server's response to the client targets a different route than the incoming request; but even if rp_filter is disabled, this asymmetric route limits the server's aggregate network bandwidth to that of its lower-performing interface.
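
(Whether strict reverse-path filtering is in effect can be checked, and relaxed to "loose" mode if necessary, with sysctl; a sketch, where a value of 1 is strict and 2 is loose:)

# sysctl net.ipv4.conf.all.rp_filter
net.ipv4.conf.all.rp_filter = 1
# sysctl -w net.ipv4.conf.all.rp_filter=2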

The server's default route could be moved to the "data" interface--in some scenarios, this may even be preferable--but this only displaces the issue: clients may then be unable to communicate with the server on its "management" interface, which may be preferred for certain types of traffic. (In Research Computing, for example, we prefer that administrative access and monitoring not compete with IPC and file system traffic.)


Figure 3 - In a traditional routing configuration, the server would try to respond to the client via its default route, even if the request arrived on its data interface

Routing policy rules

Traditional IP routing systems route incoming packets based solely on the intended destination; but the Linux iproute2 stack supports route selection based on additional packet metadata, including the packet source. Multiple discrete routing tables, similar to the virtual routing and forwarding (VRF) support found in dedicated routing appliances, define contextual routes, and a routing policy selects the appropriate routing table dynamically based on a list of rules.

In this example there are three different routing contexts to consider. The first of these--the "main" routing table--defines the routes to use when the server initiates communication.

server # ip route list table main
10.225.144.0/24 via 10.225.176.1 dev ens224
default via 10.225.160.1 dev ens192
10.225.160.0/24 dev ens192  proto kernel  scope link  src 10.225.160.38
10.225.176.0/24 dev ens224  proto kernel  scope link  src 10.225.176.38

A separate routing table defines routes to use when responding to traffic on the "management" interface. Since this table is concerned only with the default route's interface in isolation, it simply reiterates the default route.

server # ip route add default via 10.225.160.1 table 1
server # ip route list table 1
default via 10.225.160.1 dev ens192

Similarly, the last routing table defines routes to use when responding to traffic on the "data" interface. This table defines a different default route: all such traffic should route via the "data" interface.

server # ip route add default via 10.225.176.1 table 2
server # ip route list table 2
default via 10.225.176.1 dev ens224

With these three routing tables defined, the last step is to define routing policy to select the correct routing table based on the packet to be routed. Responses from the "management" address should use table 1, and responses from the "data" address should use table 2. All other traffic, including server-initiated traffic that has no outbound address assigned yet, uses the "main" table automatically.

server # ip rule add from 10.225.160.38 table 1
server # ip rule add from 10.225.176.38 table 2
server # ip rule list
0:  from all lookup local
32764:  from 10.225.176.38 lookup 2
32765:  from 10.225.160.38 lookup 1
32766:  from all lookup main
32767:  from all lookup default

With this routing policy in place, a single-homed client (or, in fact, any client on the network) may communicate with both the server's "data" and "management" interfaces independently and successfully, and the bidirectional traffic routes consistently via the appropriate network. (Figure 4.)
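
One way to confirm which table will be consulted is to ask the kernel for a policy-aware route lookup constrained to a given source address (a verification sketch; 10.225.144.38 stands in for a hypothetical client address):

server # ip route get 10.225.144.38 from 10.225.176.38

The reply should show a next hop of 10.225.176.1 on ens224, confirming that the "data" table was selected instead of the main default route.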


Figure 4 - Routing policy allows the server to respond using its data interface for any request that arrived on its data interface, even if it has a different default route

Persisting the configuration

This custom routing policy can be persisted in the Red Hat "ifcfg" network configuration system by creating interface-specific route- and rule- files.

# cat /etc/sysconfig/network-scripts/route-ens192
default via 10.225.160.1 dev ens192
default via 10.225.160.1 dev ens192 table mgt

# cat /etc/sysconfig/network-scripts/route-ens224
10.225.144.0/24 via 10.225.176.1 dev ens224
default via 10.225.176.1 dev ens224 table data

# cat /etc/sysconfig/network-scripts/rule-ens192
from 10.225.160.38 table mgt

# cat /etc/sysconfig/network-scripts/rule-ens224
from 10.225.176.38 table data

The symbolic names mgt and data used in these examples are translated to routing table numbers as defined in the /etc/iproute2/rt_tables file.

# echo "1 mgt" >>/etc/iproute2/rt_tables
# echo "2 data" >>/etc/iproute2/rt_tables

Once the configuration is in place, activate it by restarting the network service (e.g., systemctl restart network). You may also be able to achieve the same effect using ifdown and ifup on individual interfaces.

Red Hat's support for routing rule configuration has a confusing regression that merits specific mention. Red Hat (and its derivatives) has historically used a "network" initscript and subscripts to configure and manage network interfaces, and these scripts support the aforementioned rule- configuration files. Red Hat Enterprise Linux 6 introduced NetworkManager, a persistent daemon with additional functionality; however, NetworkManager did not support rule- files until version 1.0, released as part of RHEL 7.1. If you're currently using NetworkManager, but wish to define routing policy in rule- files, you'll need to either disable NetworkManager entirely or exempt specific interfaces from NetworkManager by specifying NM_CONTROLLED=no in the relevant ifcfg- files.

In a Debian-based distribution, these routes and rules can be persisted using post-up directives in /etc/network/interfaces.
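
A minimal sketch of what the data-interface stanza might look like, reusing the addresses and table names from the examples above (the rest of the interface definition will vary by site):

# /etc/network/interfaces (excerpt)
iface ens224 inet static
    address 10.225.176.38
    netmask 255.255.255.0
    # install the policy route and rule once the interface is up
    post-up ip route add default via 10.225.176.1 dev ens224 table data
    post-up ip rule add from 10.225.176.38 table data
    # clean up before the interface is taken down
    pre-down ip rule del from 10.225.176.38 table data
    pre-down ip route del default via 10.225.176.1 dev ens224 table data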

Further improvements

We're still in the process of deploying this policy-based routing configuration in our Research Computing environment; and, as we do, we discover more cases where previously complex network requirements and special-cases are abstracted away by this relatively uniform configuration. We're simultaneously evaluating other potential changes, including the possibility of running a dynamic routing protocol (such as OSPF) on our multi-homed hosts, or of configuring every network connection as a simultaneous default route for fail-over. In any case, this experience has encouraged us to take a second look at our network configuration to re-evaluate what we had previously thought were inherent limitations of the stack itself.

User-selectable authentication methods using pam_authtok

Research Computing is in the process of migrating and expanding our authentication system to support additional authentication methods. Historically we’ve supported VASCO IDENTIKEY time-based one-time passwords, combined with a pin, to provide two-factor authentication.

$ ssh user1234@login.rc.colorado.edu
user1234@login.rc.colorado.edu's password: <pin><otp>

[user1234@login04 ~]$

But the VASCO tokens are expensive, get lost or left at home, have a battery that runs out, and have an internal clock that sometimes falls out-of-sync with the rest of the authentication system. For these and other reasons we’re provisioning most new accounts with Duo, which provides iOS and Android apps but also supports SMS and voice calls.

Unlike VASCO, Duo is only a single authentication factor; so we’ve also added support for upstream CU-Boulder campus password authentication to be used in tandem.

This means that we have to support both authentication mechanisms–VASCO and password+Duo–simultaneously. A naïve implementation might just stack these methods together.

auth sufficient pam_radius_auth.so try_first_pass # VASCO authenticates over RADIUS
auth requisite  pam_krb5.so try_first_pass # CU-Boulder campus password
auth required   pam_duo.so

This generally works: VASCO authentication is attempted first over RADIUS. If that fails, authentication is attempted against the campus password and, if that succeeds, against Duo.

Unfortunately, this generates spurious authentication failures in VASCO when using Duo to authenticate: the VASCO method fails, then Duo authentication is attempted. Users who have both VASCO and Duo accounts (e.g., all administrators) may generate enough failures to trigger the break-in mitigation security system, and the VASCO account may be disabled. This same issue exists if we reverse the authentication order to try Duo first, then VASCO: VASCO users might then cause their campus passwords to become disabled.

Instead, we need to enable users to explicitly specify which authentication method they’re using.

Separate sssd domains

Our first attempt to provide explicit access to different authentication methods was to provide multiple redundant sssd domains.

[domain/rc]
description = Research Computing
proxy_pam_target = curc-twofactor-vasco


[domain/duo]
description = Research Computing (identikey+duo authentication)
enumerate = false
proxy_pam_target = curc-twofactor-duo

This allows users to log in normally using VASCO, while password+Duo authentication can be requested explicitly by logging in as ${user}@duo.

$ ssh -l user1234@duo login.rc.colorado.edu

This works well enough for the common case of shell access over SSH: login is permitted and, since the default rc domain and the duo alias domain are backed by the same LDAP directory, NSS sees no important difference once a user is logged in using either method.

This works because POSIX systems store the uid number returned by PAM and NSS, and generally resolve the uid number to the username on-demand. Not all systems work this way, however. For example, when we attempted to use this authentication mechanism to authenticate to our prototype JupyterHub (web) service, jobs dispatched to Slurm retained the ${user}@duo username format. Slurm also uses usernames internally, and the ${user}@duo username is not populated within Slurm: only the base ${user} username.

Expecting that we would continue to find more unexpected side-effects of this implementation, we started to look for an alternative mechanism that doesn’t modify the specified username.

pam_authtok

In general, a user provides two pieces of information during authentication: a username (which we’ve already determined we shouldn’t modify) and an authentication token or password. We should be able to detect, for example, a prefix to that authentication token to determine what authentication method to use.

$ ssh user1234@login.rc.colorado.edu
user1234@login.rc.colorado.edu's password: duo:<password>

[user1234@login04 ~]$

But we found no such pam module that would allow us to manipulate the authentication token… so we wrote one.

auth [success=1 default=ignore] pam_authtok.so prefix=duo: strip prompt=password:

auth [success=done new_authtok_reqd=done default=die] pam_radius_auth.so try_first_pass

auth requisite pam_krb5.so try_first_pass
auth [success=done new_authtok_reqd=done default=die] pam_duo.so

Now our PAM stack authenticates against VASCO by default; but, if the user provides a password with a duo: prefix, authentication skips VASCO and authenticates the supplied password, followed by Duo push. Our actual production PAM stack is a bit more complicated, supporting a redundant vasco: prefix as well, for forward-compatibility should we change the default authentication mechanism in the future. We can also extend this mechanism to add arbitrary additional authentication mechanisms in the future.
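
Purely as an illustration (our production stack isn't reproduced here), a redundant vasco: prefix could be layered onto the same pattern, assuming pam_authtok's prefix, strip, and prompt options behave as shown above:

# an explicit "vasco:" prefix is stripped, and authentication falls through to the default (VASCO) path
auth [success=ok default=ignore] pam_authtok.so prefix=vasco: strip prompt=password:
# a "duo:" prefix is stripped, and the VASCO module that follows is skipped
auth [success=1 default=ignore] pam_authtok.so prefix=duo: strip prompt=password:
# ...remainder of the stack as above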

Thomas Hauser | CURCast

Our fearless leader joins me in the studio to talk about moving to the US, the genesis of CU Research Computing, and our upcoming HPC resource, 'Summit.'

Music used

  • Fight CU from University of Colorado Boulder Department of Music

  • Western Showdown by Jay Man

Two software design methods

There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies and the other way is to make it so complicated that there are no obvious deficiencies.

–C.A.R. Hoare, The 1980 ACM Turing Award Lecture

Roger Goff | CURCast

Roger Goff from Data Direct Networks visits to talk about the SFA14k, the storage system that serves the scratch filesystem for our upcoming 'Summit' system. We also end up on a number of tangents including procurement processes, Omni-Path, and the minutiae of IO design in a high-performance storage system.

Music used

  • Fight CU from University of Colorado Boulder Department of Music

  • Positivity Rocks by Jay Man

Introduction and Sound Test | CURCast

Listen to me talk into a microphone for a few minutes so I can get some practice with the recording studio and the editing software. And maybe you'll find out why I'm recording the podcast in the first place.

Music used

  • Fight CU from University of Colorado Boulder Department of Music

  • Prelude BWV 846 by Jay Man

Why hasn’t my (Slurm) job started?

A job can be blocked from being scheduled for the following reasons:

  • There are insufficient resources available to start the job, either due to active reservations, other running jobs, component status, or system/partition size.

  • Other higher-priority jobs are waiting to run, and the job’s time limit prevents it from being backfilled.

  • The job’s requested time limit extends into an upcoming reservation (e.g., scheduled preventative maintenance).

  • The job is associated with an account that has reached or exceeded its GrpCPUMins.

Use squeue to display a list of queued jobs sorted in the order considered by the scheduler.

squeue --sort=-p,i --priority --format '%7T %7A %10a %5D %.12L %10P %10S %20r'

Reason codes

A list of reason codes [1] is available as part of the squeue manpage. [2]

Common reason codes:

  • ReqNodeNotAvail

  • AssocGrpJobsLimit

  • AssocGrpCPUMinsLimit

  • resources

  • QOSResourceLimit

  • Priority

  • AssociationJobLimit

  • JobHeldAdmin
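
To see which reason code currently applies to a specific queued job (the job id here is a placeholder):

$ squeue -j <jobid> --format '%i %T %r'   # job id, job state, and the scheduler's reason
$ scontrol show job <jobid> | grep -o 'Reason=[^ ]*'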

How are jobs prioritized?

PriorityType=priority/multifactor

Slurm prioritizes jobs using the multifactor plugin [3] based on a weighted summation of age, size, QOS, and fair-share factors.

Use the sprio command to inspect each weighted priority value separately.

sprio [-j jobid]

Age Factor

PriorityWeightAge=1000
PriorityMaxAge=14-0

The age factor represents the length of time a job has been sitting in the queue and eligible to run. In general, the longer a job waits in the queue, the larger its age factor grows. However, the age factor for a dependent job will not change while it waits for the job it depends on to complete. Also, the age factor will not change when scheduling is withheld for a job whose node or time limits exceed the cluster’s current limits.

The weighted age priority is calculated as PriorityWeightAge[1000]*[0..1] as the job age approaches PriorityMaxAge[14-0], or 14 days. As such, an hour of wait-time is equivalent to ~2.976 priority.
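
That conversion follows directly from the configured weights:

PriorityWeightAge / PriorityMaxAge = 1000 / (14 days * 24 hours) = 1000 / 336 ≈ 2.976 priority per wait-hour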

Job Size Factor

PriorityWeightJobSize=2000

The job size factor correlates to the number of nodes or CPUs the job has requested. The weighted job size priority is calculated as PriorityWeightJobSize[2000]*[0..1] as the job size approaches the entire size of the system. A job that requests all the nodes on the machine will get a job size factor of 1.0, with an effective weighted job size priority of 28 wait-days (except that job age priority is capped at 14 days).
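
The 28-day equivalence follows from the same conversion:

PriorityWeightJobSize / (PriorityWeightAge / PriorityMaxAge) = 2000 / (1000 / 14 days) = 28 wait-days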

Quality of Service (QOS) Factor

PriorityWeightQOS=1500

Each QOS can be assigned a priority: the larger the number, the greater the job priority will be for jobs that request this QOS. This priority value is then normalized to the highest priority of all the QOS’s to become the QOS factor. As such, the weighted QOS priority is calculated as PriorityWeightQOS[1500]*QosPriority[0..1000]/MAX(QOSPriority[1000]).

QOS          Priority  Weighted priority  Wait-days equivalent
-----------  --------  -----------------  --------------------
admin            1000               1500                  21.0
janus               0                  0                   0.0
janus-debug       400                600                   8.4
janus-long        200                300                   4.2

Fair-share factor

PriorityWeightFairshare=2000
PriorityDecayHalfLife=14-0

The fair-share factor serves to prioritize queued jobs such that those jobs charging accounts that are under-serviced are scheduled first, while jobs charging accounts that are over-serviced are scheduled when the machine would otherwise go idle.

The simplified formula for calculating the fair-share factor for usage that spans multiple time periods and subject to a half-life decay is:

F = 2**(-NormalizedUsage/NormalizedShares)

Each account is granted an equal share, and historic records of use decay with a half-life of 14 days. As such, the weighted fair-share priority is calculated as PriorityWeightFairshare[2000]*[0..1] depending on the account’s historic use of the system relative to its allocated share.
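
As a worked example of the break-even point: an account that has used exactly its allocated share has NormalizedUsage = NormalizedShares, so

F = 2**(-1) = 0.5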

A fair-share factor of 0.5 indicates that the account’s jobs have used exactly the portion of the machine that they have been allocated, and assigns the job an additional 1000 priority (the equivalent of 336 wait-hours, or 14 wait-days). A fair-share factor above 0.5 indicates that the account’s jobs have consumed less than their allocated share and assigns the job up to 2000 additional priority, for an effective relative 14 wait-day priority boost. A fair-share factor below 0.5 indicates that the account’s jobs have consumed more than their allocated share of the computing resources, and the added priority approaches 0 dependent on the account’s history relative to its equal share of the system, for an effective relative 14 wait-day priority penalty.