It’s good practice to harden your ssh client with some secure
“defaults”. Starting your configuration file with the following
directives applies them to all (*) hosts.
(These are listed as multiple Host * stanzas, but they can be
combined into a single stanza in your actual configuration file.)
If you prefer, follow along with `an example of a complete
~/.ssh/config file <link://listing/secure-openssh-defaults/ssh_config>`__.
Require secure algorithms
OpenSSH supports many encryption and authentication algorithms, but
some of those algorithms are known to be weak to cryptographic attack.
The Mozilla project publishes a list of recommended algorithms that
excludes those known to be insecure.
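For example, a Host * stanza along these lines restricts the client to
stronger algorithms. (The specific algorithm lists below are
illustrative only; copy the current lists from the Mozilla guidelines
rather than from this example.)
Host *
    KexAlgorithms curve25519-sha256@libssh.org,diffie-hellman-group-exchange-sha256
    Ciphers chacha20-poly1305@openssh.com,aes256-gcm@openssh.com,aes128-gcm@openssh.com,aes256-ctr,aes192-ctr,aes128-ctr
    MACs hmac-sha2-512-etm@openssh.com,hmac-sha2-256-etm@openssh.com,umac-128-etm@openssh.com,hmac-sha2-512,hmac-sha2-256,umac-128@openssh.com
    HostKeyAlgorithms ssh-ed25519-cert-v01@openssh.com,ssh-rsa-cert-v01@openssh.com,ssh-ed25519,ssh-rsa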
Every time you connect to an SSH server, your client caches a copy of
the remote server’s host key in a ~/.ssh/known_hosts file. If your
ssh client is ever compromised, this list can expose the remote servers
to attack using your compromised credentials. Be a good citizen and hash
your known hosts file.
Host *
HashKnownHosts yes
(Hash any existing entries in your ~/.ssh/known_hosts file by
running ssh-keygen -H. Don’t forget to remove the backup
~/.ssh/known_hosts.old.)
Some servers are old enough that they may not support the newer, more
secure algorithms listed above. In the RC environment, for example, the
login and other Internet-accessible systems provide relatively modern
ssh algorithms; but hosts in the rc.int.colorado.edu domain may not.
To support connections to older hosts while requiring newer algorithms by
default, override these settings earlier in the configuration file.
# Internal RC hosts are running an old version of OpenSSH
Match host=*.rc.int.colorado.edu
MACs hmac-sha1,umac-64@openssh.com,hmac-ripemd160,hmac-ripemd160@openssh.com,hmac-sha1-96
The OpenSSH client is very robust, very flexible, and very
configurable. Many times I see people struggling to remember
server-specific ssh flags or arcane, manual multi-hop procedures. I even
see entire scripts written to automate the process.
But the vast majority of what you might want ssh to do can be abstracted
away with some configuration in your ~/.ssh/config file.
All (or, at least, most) of these configuration directives are fully
documented in the `ssh_config manpage <http://man.openbsd.org/ssh_config>`__.
If you prefer, follow along with `an example of a complete
~/.ssh/config file <link://listing/elegant-openssh-configuration/ssh_config>`__.
HostName
One of the first annoyances people have–and one of the first things
people try to fix–when using a command-line ssh client is having to type
in long hostnames. For example, the Research Computing login service is
available at login.rc.colorado.edu.
$ ssh login.rc.colorado.edu
This particular name isn’t too bad; but coupled with usernames and
especially when used as part of an scp, these fully-qualified domain
names can become cumbersome.
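The published example configuration shortens these names with a pair of
HostName mappings roughly like the following. (This is a sketch: the
exact Host patterns in the example file may differ.)
Host login.rc
    HostName %h.colorado.edu
Host *.rc
    HostName %h.int.colorado.edu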
In this example, %h is substituted with the name specified on the
command-line. With a configuration like this in place, connections to
login.rc are directed to the full name login.rc.colorado.edu.
$ scp -r /path/to/src/ user1234@login.rc:dest/
Failing that, other references to hosts with a .rc suffix are
directed to the internal Research Computing domain. (We’ll use these
later.)
(The .rc domain segment could be moved from the Host pattern to
the HostName value; but leaving it in the alias helps to distinguish
the Research Computing login nodes from other login nodes that you may
have access to. You can use arbitrary aliases in the Host directive,
too; but then the %h substitution isn’t useful: you have to
enumerate each targeted host.)
User
Unless you happen to use the same username on your local workstation as
you have on the remote server, you likely specify a username using
either the @ syntax or the -l argument to the ssh command.
$ ssh user1234@login.rc
As with specifying a fully-qualified domain name, tracking and
specifying a different username for each remote host can become
burdensome, especially during an scp operation. Record the correct
username in your ~/.ssh/config file instead.
Match host=*.rc.colorado.edu,*.rc.int.colorado.edu
User user1234
Now all connections to Research Computing hosts use the specified
username by default, without it having to be specified on the
command-line.
$ scp -r /path/to/src/ login.rc:dest/
Note that we’re using a Match directive here, rather than a Host
directive. The host= argument to Match matches against the
derived hostname, so it reflects the real hostname as determined using
the previous Host directives. (Make sure the correct HostName is
established earlier in the configuration, though.)
ControlMaster
Even if the actual command is simple to type, authenticating to the host
may require manual intervention. The Research Computing login nodes,
for example, require two-factor authentication using a password or pin
coupled with a one-time VASCO password or Duo credential. If you want to
open multiple connections–or, again, copy files using scp–having to
authenticate with multiple factors quickly becomes tedious. (Even having
to type in a password at all may be unnecessary; but we’ll assume, as is
the case with the Research Computing login example, that you can’t use
public-key authentication.)
OpenSSH supports sharing a single network connection for multiple ssh
sessions.
Match host=login.rc.colorado.edu
ControlMaster auto
ControlPath ~/.ssh/.socket_%h_%p_%r
ControlPersist 4h
With ControlMaster and ControlPath defined, the first ssh
connection authenticates and establishes a session normally; but future
connections join the active connection, bypassing the need to
re-authenticate. The optional ControlPersist option causes this
connection to remain active for a period of time even after the last
session has been closed.
(Note that many arguments to the ssh command are effectively ignored
after the initial connection is established. Notably, if X11 was not
forwarded with -X or -Y during the first session, you cannot use
the shared connection to forward X11 in a later session. In this case,
use the -S none argument to ssh to ignore the existing
connection and explicitly establish a new connection.)
ProxyCommand
But what if you want to get to a host that isn’t directly available
from your local workstation? The hosts in the rc.int.colorado.edu
domain referenced above may be accessible from a local network
connection; but if you are connecting from elsewhere on the Internet,
you won’t be able to access them directly.
Except that OpenSSH provides the ProxyCommand option which, when
coupled with the OpenSSH client presumed to be available on the
intermediate server, supports arbitrary proxy connections through
remotely-accessible servers.
Match host=*.rc.int.colorado.edu
ProxyCommand ssh -W %h:%p login.rc.colorado.edu
Even though you can’t connect directly to Janus compute nodes from the
Internet, for example, you can connect to them from a Research
Computing login node; so this ProxyCommand configuration allows
transparent access to hosts in the internal Research Computing domain.
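Combined with the HostName and User rules sketched above, connecting to
an internal host is then a single short command (node0101 here is just
a placeholder name):
$ ssh node0101.rc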
If you tried the example above, chances are that you were met with an
unexpected password prompt that didn’t accept any password that you
used. That’s because most internal Research Computing hosts don’t
actually support interactive authentication, two-factor or otherwise.
Connections from a CURC login node are authorized by the login node; but
a proxied connection must authenticate from your local client.
The best way to authenticate your local workstation to an internal CURC
host is using public-key authentication.
If you don’t already have an SSH key, generate one now.
$ ssh-keygen -t rsa -b 4096 # if you don't already have a key
Now we have to copy the (new?) public key to the remote CURC
~/.ssh/authorized_keys file. RC provides a global home directory, so
copying to any login node will do. Targeting a specific login node is
useful, though: the ControlMaster configuration for
login.rc.colorado.edu tends to confuse ssh-copy-id.
$ ssh-copy-id login01.rc
(The ssh-copy-id command doesn’t come with OS X, but there’s a
third-party port available on
GitHub. It’s
usually available on a Linux system, too. Alternatively, you can just
edit ~/.ssh/authorized_keys manually.)
This article was first published in the Fall 2016 issue of Usenix
;login:.
Typical IP-networked hosts are configured with a single default route.
For single-homed hosts the default route defines the first destination
for packets addressed outside of the local subnet; but for multi-homed
hosts the default route also implicitly defines a default interface to
be used for all outbound traffic. Specific subnets may be accessed
using non-default interfaces by defining static routes; but the single
default route remains a "single point of failure" for general access
to other subnets and the Internet. The Linux kernel, together with the
iproute2 suite, supports the definition of multiple default routes
distinguished by a preference metric. This allows alternate networks
to serve as fail-over for the preferred default route in cases where
the link has failed or is otherwise unavailable.
Background
The CU-Boulder Research Computing environment spans three datacenters,
each with its own set of special-purpose networks. Public-facing hosts
may be accessed through a 1:1 NAT or via a dedicated "DMZ" VLAN that
spans all three environments. We have historically configured
whichever interface was used for inbound connection from the Internet
as the default route in order to support responses to connections from
Internet clients; but our recent and ongoing deployment of policy
routing (as described in a previous issue of ;login:) removes this
requirement.
Figure 1 - The CU-Boulder Research Computing Science Network, with
subnets in three datacenters
All RC networks are capable of routing traffic with each other, the
campus intranet, and the greater Internet, so we more recently prefer
the host's "management" interface as its default route as a matter of
convention; but this unnecessarily limits network connectivity in
cases where the default interface is down, whether by link failure or
during a reconfiguration or maintenance process.
The problem with a single default route
The simplest Linux host routing table belongs to a system with a single
network interface.
# ip route list
default via 10.225.160.1 dev ens192
10.225.160.0/24 dev ens192 proto kernel scope link src 10.225.160.38
Traffic to hosts on 10.225.160.0/24 is delivered directly,
while traffic to any other network is forwarded to
10.225.160.1. In this case, the default route eventually
provides access to the public Internet.
# ping -c1 example.com
PING example.com (93.184.216.34) 56(84) bytes of data.
64 bytes from 93.184.216.34: icmp_seq=1 ttl=54 time=24.0 ms

--- example.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 24.075/24.075/24.075/0.000 ms
A dual-homed host adds a second network interface and a second
link-local route; but the original default route remains.
# ifup ens224 && ip route list
default via 10.225.160.1 dev ens192
10.225.160.0/24 dev ens192 proto kernel scope link src 10.225.160.38
10.225.176.0/24 dev ens224 proto kernel scope link src 10.225.176.38
The new link-local route provides access to hosts on
10.225.176.0/24; but traffic to other networks still requires
access to the default interface as defined by the single default
route. If the default route interface is unavailable, external
networks become inaccessible, even though identical routing is
available via 10.225.176.1.
Attempts to add a second default route fail with an error message (in
typically unhelpful iproute2 fashion) implying that it is impossible
to configure a host with multiple default routes simultaneously.
# ip route add default via 10.225.176.1 dev ens224
RTNETLINK answers: File exists
It would be better if the host could select dynamically from any of
the physically available routes; but without an entry in the host's
routing table directing packets out the ens224 "data"
interface, the host will simply refuse to deliver the packets.
Multiple default routes and routing metrics
The RTNETLINK error above indicates that the ens224
"data" route cannot be added to the table because a conflicting route
already exists--in this case, the ens192 "management"
route. Both routes target the "default" network, which would lead to
non-deterministic routing with no way to select one route in favor of
the other.
However, the Linux routing table supports more attributes than the
"via" address and "dev" specified in the above example. Of use here,
the "metric" attribute allows us to specify a preference number for
each route.
# ip route change default via 10.225.160.1 dev ens192 metric 100
# ip route add default via 10.225.176.1 dev ens224 metric 200
# ip route flush cache
The host will continue to prefer the ens192 "management"
interface for its default route, due to its lower metric number; but,
if that interface is taken down, outbound packets will automatically
be routed via the ens224 "data" interface.
# ifdown ens192 && ping -c1 example.com; ifup ens192
PING example.com (93.184.216.34) 56(84) bytes of data.
64 bytes from example.com (93.184.216.34): icmp_seq=1 ttl=54 time=29.0 ms

--- example.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 29.032/29.032/29.032/0.000 ms
Persisting the configuration
This custom routing configuration can be persisted in the Red Hat
"ifcfg" network configuration system by specifying a METRIC
number in the ifcfg- files. This metric will be applied to any
route populated by DHCP or by a GATEWAY value in the
ifcfg- file or /etc/sysconfig/network file.
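For example, the relevant lines in the per-interface ifcfg- files might
look like this (a sketch; only the route-related settings are shown,
and the addresses carry over from the example above):
# grep -E 'GATEWAY|METRIC' /etc/sysconfig/network-scripts/ifcfg-ens192
GATEWAY=10.225.160.1
METRIC=100
# grep -E 'GATEWAY|METRIC' /etc/sysconfig/network-scripts/ifcfg-ens224
GATEWAY=10.225.176.1
METRIC=200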
Alternatively, routes may be specified using route-
files. These routes must define metrics explicitly.
# cat /etc/sysconfig/network-scripts/route-ens192
default via 10.225.160.1 dev ens192 metric 100
# cat /etc/sysconfig/network-scripts/route-ens224
default via 10.225.176.1 dev ens224 metric 200
Alternatives and further improvements
The NetworkManager service in RHEL 7.x handles multiple default routes
correctly by supplying distinct metrics automatically; but, of
course, specifying route metrics manually allows you to control which
route is preferred explicitly.
I continue to wonder if it might be better to go completely dynamic
and actually run OSPF on all multi-homed hosts. This should--in
theory--allow our network to be even more automatically dynamic in
response to link availability, but this may be too complex to justify
in our environment.
There's also potential to use all available routes simultaneously with
weighted load-balancing, either per-flow or per-packet. This is
generally inappropriate in our environment; but could be preferable in
an environment where the available networks are definitively
general-purpose.
# ip route equalize add default \
    nexthop via 10.225.160.1 dev ens192 weight 1 \
    nexthop via 10.225.176.1 dev ens224 weight 10
Conclusion
We've integrated a multiple-default-route configuration into our
standard production network configuration, which is being deployed in
parallel with our migration to policy routing. Now the default route
is specified not by the static binary existence of a single
default entry in the routing table; but by an order of
preference for each of the available interfaces. This allows our hosts
to remain functional in more failure scenarios than before, when link
failure or network maintenance makes the preferred route unavailable.
This article was first published in the Summer 2016 issue of Usenix
;login:.
Traditional IP routing systems route packets by comparing the
destination address against a predefined list of routes to each
available subnet; but when multiple potential routes exist between two
hosts on a network, the preferred route may be dependent on context
that cannot be inferred from the destination alone. The Linux kernel,
together with the iproute2 suite, supports the definition
of multiple routing tables and a routing policy
database to select the preferred routing table
dynamically. This additional expressiveness can be used to avoid
multiple routing pitfalls, including asymmetric routes and performance
bottlenecks from suboptimal route selection.
Background
The CU-Boulder Research Computing environment spans three datacenters,
each with its own set of special-purpose networks. A
traditionally-routed host simultaneously connected to two or more
of these networks compounds network complexity by making only one
interface (the default gateway) generally available across network
routes. Some cases can be addressed by defining static routes; but
even this leads to asymmetric routing that is at best confusing and at
worst a performance bottleneck.
Over the past few months we've been transitioning our hosts from a
single-table routing configuration to a policy-driven, multi-table
routing configuration. The end result is full bidirectional
connectivity between any two interfaces in the network, irrespective
of underlying topology or a host's default route. This has reduced the
apparent complexity in our network by allowing the host and network to
Do the Right Thing™ automatically, unconstrained by an otherwise
static route map.
Linux policy routing has become an essential addition to host
configuration in the University of Colorado Boulder "Science
Network." It's so useful, in fact, that I'm surprised a basic
routing policy isn't provided by default for multi-homed servers.
The problem with traditional routing
The simplest Linux host routing scenario is a system with a single
network interface.
# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:50:56:88:56:1f brd ff:ff:ff:ff:ff:ff
    inet 10.225.160.38/24 brd 10.225.160.255 scope global dynamic ens192
       valid_lft 60184sec preferred_lft 60184sec
Such a typically-configured network with a single uplink has a single
default route in addition to its link-local route.
# ip route list
default via 10.225.160.1 dev ens192
10.225.160.0/24 dev ens192 proto kernel scope link src 10.225.160.38
Traffic to hosts on 10.225.160.0/24 is delivered directly,
while traffic to any other network is forwarded to
10.225.160.1.
A dual-homed host adds a second network interface and a second
link-local route; but the original default route remains. (Figure 1.)
# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:50:56:88:56:1f brd ff:ff:ff:ff:ff:ff
    inet 10.225.160.38/24 brd 10.225.160.255 scope global dynamic ens192
       valid_lft 86174sec preferred_lft 86174sec
3: ens224: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:50:56:88:44:18 brd ff:ff:ff:ff:ff:ff
    inet 10.225.176.38/24 brd 10.225.176.255 scope global dynamic ens224
       valid_lft 69193sec preferred_lft 69193sec
# ip route list
default via 10.225.160.1 dev ens192
10.225.160.0/24 dev ens192 proto kernel scope link src 10.225.160.38
10.225.176.0/24 dev ens224 proto kernel scope link src 10.225.176.38
The new link-local route provides access to hosts on
10.225.176.0/24, and is sufficient for a private network
connecting a small cluster of hosts. In fact, this is the
configuration that we started with in our Research Computing
environment: .160.0/24 is a low-performance "management"
network, while .176.0/24 is a high-performance "data" network.
Figure 1 - A simple dual-homed server with a traditional default
route
In a more complex network, however, link-local routes quickly become
insufficient. In the CU Science Network, for example, each datacenter
is considered a discrete network zone with its own set of "management"
and "data" networks. For hosts in different network zones to
communicate, a static route must be defined in each direction to
direct performance-sensitive traffic across the high-performance
network route. (Figure 2.)
server # ip route add 10.225.144.0/24 via 10.225.176.1
client # ip route add 10.225.176.0/24 via 10.225.144.1
Though managing these static routes can be tedious, they do
sufficiently define connectivity between the relevant network pairs:
"data" interfaces route traffic to each other via high-performance
networks, while "management" interfaces route traffic to each other via
low-performance networks. Other networks (e.g., the Internet) can only
communicate with the hosts on their default routes; but this
limitation may be acceptable for some scenarios.
Figure 2 - A server and a client, with static routes between their
data interfaces
Even this approach is insufficient, however, to allow traffic between
"management" and "data" interfaces. This is particularly problematic
when a client host is not equipped with a symmetric set of network
interfaces. (Figure 3.) Such a client may only have a "management"
interface, but should still communicate with the server's
high-performance interface for certain types of traffic. (For example,
a dual-homed NFS server should direct all NFS traffic over its
high-performance "data" network, even when being accessed by a client
that itself only has a low-performance "management" interface.) By
default, the Linux rp_filter blocks this traffic, as the server's
response to the client targets a different route than the incoming
request; but even if rp_filter is disabled, this asymmetric
route limits the server's aggregate network bandwidth to that of its
lower-performing interface.
The server's default route could be moved to the "data" interface--in
some scenarios, this may even be preferable--but this only displaces
the issue: clients may then be unable to communicate with the server
on its "management" interface, which may be preferred for certain
types of traffic. (In Research Computing, for example, we prefer that
administrative access and monitoring not compete with IPC and file
system traffic.)
Figure 3 - In a traditional routing configuration, the server would
try to respond to the client via its default route, even if the
request arrived on its data interface
Routing policy rules
Traditional IP routing systems route incoming packets based solely on
the intended destination; but the Linux iproute2 stack supports
route selection based on additional packet metadata, including the
packet source. Multiple discrete routing tables, similar to the
virtual routing and forwarding (VRF) support found in dedicated
routing appliances, define contextual routes, and a routing
policy selects the appropriate routing table dynamically based on a
list of rules.
In this example there are three different routing contexts to
consider. The first of these--the "main" routing table--defines the
routes to use when the server initiates communication.
server # ip route list table main
10.225.144.0/24 via 10.225.176.1 dev ens224
default via 10.225.160.1 dev ens192
10.225.160.0/24 dev ens192 proto kernel scope link src 10.225.160.38
10.225.176.0/24 dev ens224 proto kernel scope link src 10.225.176.38
A separate routing table defines routes to use when responding to
traffic on the "management" interface. Since this table is concerned
only with the default route's interface in isolation, it simply
reiterates the default route.
server # ip route add default via 10.225.160.1 table 1
server # ip route list table 1
default via 10.225.160.1 dev ens192
Similarly, the last routing table defines routes to use when
responding to traffic on the "data" interface. This table defines a
different default route: all such traffic should route via the
"data" interface.
server # ip route add default via 10.225.176.1 table 2
server # ip route list table 2
default via 10.225.176.1 dev ens224
With these three routing tables defined, the last step is to define
routing policy to select the correct routing table based on the packet
to be routed. Responses from the "management" address should use table
1, and responses from the "data" address should use table 2. All other
traffic, including server-initiated traffic that has no outbound
address assigned yet, uses the "main" table automatically.
server # ip rule add from 10.225.160.38 table 1
server # ip rule add from 10.225.176.38 table 2
server # ip rule list
0:      from all lookup local
32764:  from 10.225.176.38 lookup 2
32765:  from 10.225.160.38 lookup 1
32766:  from all lookup main
32767:  from all lookup default
With this routing policy in place, a single-homed client (or, in fact,
any client on the network) may communicate with both the server's
"data" and "management" interfaces independently and successfully, and
the bidirectional traffic routes consistently via the appropriate
network. (Figure 4.)
Figure 4 - Routing policy allows the server to respond using its
data interface for any request that arrived on its data interface,
even if it has a different default route
Persisting the configuration
This custom routing policy can be persisted in the Red Hat "ifcfg"
network configuration system by creating interface-specific
route- and rule- files.
# cat /etc/sysconfig/network-scripts/route-ens192
default via 10.225.160.1 dev ens192
default via 10.225.160.1 dev ens192 table mgt

# cat /etc/sysconfig/network-scripts/route-ens224
10.225.144.0/24 via 10.225.176.1 dev ens224
default via 10.225.176.1 dev ens224 table data

# cat /etc/sysconfig/network-scripts/rule-ens192
from 10.225.160.38 table mgt

# cat /etc/sysconfig/network-scripts/rule-ens224
from 10.225.176.38 table data
The symbolic names mgt and data used in these examples
are translated to routing table numbers as defined in the
/etc/iproute2/rt_tables file.
Once the configuration is in place, activate it by restarting the
network service (e.g., systemctl restart network). You may
also be able to achieve the same effect using ifdown and
ifup on individual interfaces.
Red Hat's support for routing rule configuration has a confusing
regression that merits specific mention. Red Hat (and its derivatives)
has historically used a "network" initscript and subscripts to
configure and manage network interfaces, and these scripts support the
aforementioned rule- configuration files. Red Hat Enterprise
Linux 6 introduced NetworkManager, a persistent daemon with additional
functionality; however, NetworkManager did not support rule-
files until version 1.0, released as part of RHEL 7.1. If you're
currently using NetworkManager, but wish to define routing policy in
rule- files, you'll need to either disable NetworkManager
entirely or exempt specific interfaces from NetworkManager by
specifying NM_CONTROLLED=no in the relevant ifcfg-
files.
In a Debian-based distribution, these routes and rules can be
persisted using post-up directives in
/etc/network/interfaces.
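A minimal sketch for the "management" interface, carrying the addresses
over from the Red Hat example (the mgt table name assumes a matching
entry in /etc/iproute2/rt_tables):
auto ens192
iface ens192 inet static
    address 10.225.160.38
    netmask 255.255.255.0
    gateway 10.225.160.1
    post-up ip route add default via 10.225.160.1 dev ens192 table mgt
    post-up ip rule add from 10.225.160.38 table mgt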
Further improvements
We're still in the process of deploying this policy-based routing
configuration in our Research Computing environment; and, as we do, we
discover more cases where previously complex network requirements and
special-cases are abstracted away by this relatively uniform
configuration. We're simultaneously evaluating other potential
changes, including the possibility of running a dynamic routing
protocol (such as OSPF) on our multi-homed hosts, or of configuring
every network connection as a simultaneous default route for
fail-over. In any case, this experience has encouraged us to take a
second look at our network configuration to re-evaluate what we had
previously thought were inherent limitations of the stack itself.
Research Computing is in the process of migrating and expanding our
authentication system to support additional authentication methods.
Historically we’ve supported VASCO IDENTIKEY time-based one-time
passwords, combined with a PIN, to provide two-factor authentication.
But the VASCO tokens are expensive, get lost or left at home, have a
battery that runs out, and have an internal clock that sometimes falls
out-of-sync with the rest of the authentication system. For these and
other reasons we’re provisioning most new accounts with
Duo, which provides iOS and Android
apps but also supports SMS and voice calls.
Unlike VASCO, Duo is only a single authentication factor; so we’ve also
added support for upstream CU-Boulder campus password authentication to
be used in tandem.
This means that we have to support both authentication mechanisms–VASCO
and password+Duo–simultaneously. A naïve implementation might just stack
these methods together.
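A sketch of that naïve stacking (the module names here are assumptions
for illustration: pam_radius_auth for the VASCO RADIUS check, pam_krb5
for the campus password, and pam_duo for Duo; our production stack
differs):
# VASCO one-time password over RADIUS; "sufficient" ends the stack on success
auth    sufficient  pam_radius_auth.so
# otherwise fall through to the campus password ("requisite" aborts on failure)...
auth    requisite   pam_krb5.so
# ...followed by a Duo push, phone call, or SMS passcode as the second factor
auth    required    pam_duo.so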
This generally works: VASCO authentication is attempted first over
RADIUS. If that fails, authentication is attempted against the campus
password and, if that succeeds, against Duo.
Unfortunately, this generates spurious authentication failures in VASCO
when using Duo to authenticate: the VASCO method fails, then Duo
authentication is attempted. Users who have both VASCO and Duo
accounts (e.g., all administrators) may generate enough failures to
trigger the break-in mitigation security system, and the VASCO account
may be disabled. This same issue exists if we reverse the authentication
order to try Duo first, then VASCO: VASCO users might then cause their
campus passwords to become disabled.
Instead, we need to enable users to explicitly specify which
authentication method they’re using.
Separate sssd domains
Our first attempt to provide explicit access to different authentication
methods was to provide multiple redundant
sssd domains.
This allows users to log in normally using VASCO, while password+Duo
authentication can be requested explicitly by logging in as
${user}@duo.
$ ssh -l user1234@duo login.rc.colorado.edu
This works well enough for the common case of shell access over SSH:
login is permitted and, since the default rc domain and the
duo alias domain are both backed by the same LDAP directory, NSS
sees no important difference once a user is logged in using either
method.
This works because POSIX systems store the uid number returned by
PAM and
NSS, and
generally resolve the uid number to the username on-demand. Not all
systems work this way, however. For example, when we attempted to use
this authentication mechanism to authenticate to our prototype
JupyterHub (web) service, jobs
dispatched to Slurm retained the
${user}@duo username format. Slurm also uses usernames internally,
and the ${user}@duo username is not populated within Slurm: only the
base ${user} username.
Expecting that we would continue to find more unexpected side-effects of
this implementation, we started to look for an alternative mechanism
that doesn’t modify the specified username.
pam_authtok
In general, a user provides two pieces of information during
authentication: a username (which we’ve already determined we shouldn’t
modify) and an authentication token or password. We should be able to
detect, for example, a prefix to that authentication token to determine
what authentication method to use.
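A sketch of the resulting stack follows. Here pam_authtok stands in for
the custom prefix-detecting module this section describes; its prefix=
argument and the surrounding module names are illustrative assumptions,
not our literal production configuration. The module is assumed to
succeed only when the duo: prefix is present, stripping the prefix and
resetting the authentication token before the later modules run.
# if the token begins with "duo:", strip it and skip the next two lines (the VASCO path)
auth    [success=2 default=ignore]  pam_authtok.so prefix=duo:
# default path: VASCO one-time password over RADIUS
auth    sufficient  pam_radius_auth.so
auth    requisite   pam_deny.so
# duo: path: campus password, then Duo push
auth    requisite   pam_krb5.so
auth    required    pam_duo.so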
Now our PAM stack authenticates against VASCO by default; but, if the
user provides a password with a duo: prefix, authentication skips
VASCO and authenticates the supplied password, followed by Duo push. Our
actual production PAM stack is a bit more complicated, supporting a
redundant vasco: prefix as well, for forward-compatibility should we
change the default authentication mechanism in the future. We can also
extend this mechanism to add arbitrary additional authentication
mechanisms in the future.
Our fearless leader joins me in the studio to talk about moving to the
US, the genesis of CU Research Computing, and our upcoming HPC
resource, 'Summit.'
Music used
Fight CU from University of Colorado Boulder Department of Music
There are two ways of constructing a software design: One way is to
make it so simple that there are obviously no deficiencies and the
other way is to make it so complicated that there are no obvious
deficiencies.
Roger Goff from Data Direct Networks visits to talk about the SFA14k,
the storage system that serves the scratch filesystem for our upcoming
'Summit' system. We also end up on a number of tangents including
procurement processes, Omni-Path, and the minutiae of IO design in a
high-performance storage system.
Music used
Fight CU from University of Colorado Boulder Department of Music
Listen to me talk into a microphone for a few minutes so I can get
some practice with the recording studio and the editing software. And
maybe you'll find out why I'm recording the podcast in the first
place.
Music used
Fight CU from University of Colorado Boulder Department of Music
A job can be blocked from being scheduled for the following reasons:
There are insufficient resources available to start the job, either
due to active reservations, other running jobs, component status, or
system/partition size.
Other higher-priority jobs are waiting to run, and the job’s time
limit prevents it from being backfilled.
The job’s time limit exceeds an upcoming reservation (e.g., scheduled
preventative maintenance).
The job is associated with an account that has reached or exceeded
its GrpCPUMins.
Display a list of queued jobs sorted in the order considered by the
scheduler using squeue.
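For example (using the state-filter and priority-sort options
documented in the squeue manpage):
$ squeue --states=PENDING --sort=-p,i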
A list of reason codes [1] is available as part of the squeue
manpage. [2]
Common reason codes:
ReqNodeNotAvail
AssocGrpJobsLimit
AssocGrpCPUMinsLimit
Resources
QOSResourceLimit
Priority
AssociationJobLimit
JobHeldAdmin
How are jobs prioritized?
PriorityType=priority/multifactor
Slurm prioritizes jobs using the multifactor plugin [3] based on a
weighted summation of age, size, QOS, and fair-share factors.
Use the sprio command to inspect each weighted priority value
separately.
sprio [-j jobid]
Age Factor
PriorityWeightAge=1000
PriorityMaxAge=14-0
The age factor represents the length of time a job has been sitting in
the queue and eligible to run. In general, the longer a job waits in the
queue, the larger its age factor grows. However, the age factor for a
dependent job will not change while it waits for the job it depends on
to complete. Also, the age factor will not change when scheduling is
withheld for a job whose node or time limits exceed the cluster’s
current limits.
The weighted age priority is calculated as
PriorityWeightAge[1000]*[0..1] as the job age approaches
PriorityMaxAge[14-0], or 14 days. As such, an hour of wait-time is
equivalent to ~2.976 priority.
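(That figure is just PriorityWeightAge divided by the maximum age in
hours: 1000 / (14 × 24) = 1000 / 336 ≈ 2.976 priority per hour of
eligible wait time.)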
Job Size Factor
PriorityWeightJobSize=2000
The job size factor correlates to the number of nodes or CPUs the job
has requested. The weighted job size priority is calculated as
PriorityWeightJobSize[2000]*[0..1] as the job size approaches the entire
size of the system. A job that requests all the nodes on the machine
will get a job size factor of 1.0, with an effective weighted job size
priority of 28 wait-days (except that job age priority is capped at 14
days).
Quality of Service (QOS) Factor
PriorityWeightQOS=1500
Each QOS can be assigned a priority: the larger the number, the greater
the job priority will be for jobs that request this QOS. This priority
value is then normalized to the highest priority of all the QOS’s to
become the QOS factor. As such, the weighted QOS priority is calculated
as PriorityWeightQOS[1500]*QosPriority[0..1000]/MAX(QOSPriority[1000]).
Fair-share Factor
PriorityWeightFairshare=2000
The fair-share factor serves to prioritize queued jobs such that those
jobs charging accounts that are under-serviced are scheduled first,
while jobs charging accounts that are over-serviced are scheduled when
the machine would otherwise go idle.
The simplified formula for calculating the fair-share factor for usage
that spans multiple time periods and subject to a half-life decay is:
F = 2**(-NormalizedUsage/NormalizedShares)
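For example, an account that has used exactly its allocated share has
NormalizedUsage equal to NormalizedShares, giving F = 2**(-1) = 0.5; an
account with no recent usage has NormalizedUsage = 0, giving
F = 2**0 = 1.0.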
Each account is granted an equal share, and historic records of use
decay with a half-life of 14 days. As such, the weighted fair-share
priority is calculated as PriorityWeightFairshare[2000]*[0..1] depending
on the account’s historic use of the system relative to its allocated
share.
A fair-share factor of 0.5 indicates that the account’s jobs have used
exactly the portion of the machine that they have been allocated, and
assigns the job an additional 1000 priority (the equivalent of about
336 wait-hours, or 14 wait-days). A fair-share factor above 0.5
indicates that the account’s jobs have consumed less than their
allocated share and assigns the job up to 2000 additional priority, for
an effective relative 14 wait-day priority boost. A fair-share factor
below 0.5 indicates that the account’s jobs have consumed more than
their allocated share of the computing resources, and the added
priority will approach 0 depending on the account’s history relative to
its equal share of the system, for an effective relative 14-day
priority penalty.