Posts about curc

User-selectable authentication methods using pam_authtok

Research Computing is migrating and expanding our authentication system to support additional authentication methods. Historically we've supported VASCO IDENTIKEY time-based one-time passwords (OTP) combined with a PIN to provide two-factor authentication.

$ ssh joan5896@login.rc.colorado.edu
joan5896@login.rc.colorado.edu's password: <pin><otp>

[joan5896@login04 ~]$

But the VASCO tokens are expensive, get lost or left at home, have a battery that runs out, and have an internal clock that sometimes falls out of sync with the rest of the authentication system. For these and other reasons we're provisioning most new accounts with Duo, which provides iOS and Android apps but also supports SMS and voice calls.

Unlike VASCO, Duo provides only a single authentication factor, so we've also added support for upstream CU-Boulder campus password authentication to be used in tandem.

This means that we have to support both authentication mechanisms--VASCO and password+Duo--simultaneously. A naïve implementation might just stack these methods together.

auth sufficient pam_radius_auth.so try_first_pass # VASCO authenticates over RADIUS
auth requisite  pam_krb5.so try_first_pass # CU-Boulder campus password
auth required   pam_duo.so

This generally works: VASCO authentication is attempted first over RADIUS. If that fails, authentication is attempted against the campus password and, if that succeeds, against Duo.

Unfortunately, this generates spurious authentication failures in VASCO when using Duo to authenticate: the VASCO method fails, then Duo authentication is attempted. Users who have both VASCO and Duo accounts (e.g., all administrators) may generate enough failures to trigger the break-in mitigation security system, and the VASCO account may be disabled. This same issue exists if we reverse the authentication order to try Duo first, then VASCO: VASCO users might then cause their campus passwords to become disabled.
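
To make the failure mode concrete, here is a minimal Python sketch of how the naive stack is evaluated. It is a simplified model of the PAM sufficient/requisite/required controls, not the real PAM library, and the backend functions and credentials are hypothetical stand-ins for RADIUS, Kerberos, and Duo.

```python
from collections import Counter

# Hypothetical credentials, for illustration only.
VASCO_TOKEN = "1234567890"      # <pin><otp>
CAMPUS_PASSWORD = "campus-pw"

# Stand-ins for the real authentication backends.
BACKENDS = {
    "pam_radius_auth": lambda user, token: token == VASCO_TOKEN,
    "pam_krb5": lambda user, token: token == CAMPUS_PASSWORD,
    "pam_duo": lambda user, token: True,  # assume the push is approved
}

NAIVE_STACK = [
    ("sufficient", "pam_radius_auth"),
    ("requisite", "pam_krb5"),
    ("required", "pam_duo"),
]


def run_stack(stack, backends, user, token):
    """Evaluate a simplified auth stack; return (ok, per-module failure counts)."""
    failures = Counter()
    failed = False
    for control, module in stack:
        ok = backends[module](user, token)
        if not ok:
            failures[module] += 1
        if control == "sufficient":
            # Success ends the stack early; failure is ignored.
            if ok and not failed:
                return True, failures
        elif control == "requisite":
            # Failure ends the stack immediately.
            if not ok:
                return False, failures
        elif control == "required":
            # Failure is remembered, but the stack keeps running.
            if not ok:
                failed = True
    return not failed, failures


if __name__ == "__main__":
    ok, failures = run_stack(NAIVE_STACK, BACKENDS, "joan5896", CAMPUS_PASSWORD)
    print(ok, dict(failures))
```

In this model a password+Duo login succeeds, but only after recording one failure against pam_radius_auth. That recorded failure is the spurious VASCO failure described above.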

Instead, we need to enable users to explicitly specify which authentication method they're using.

Separate sssd domains

Our first attempt to provide explicit access to different authentication methods was to provide multiple redundant sssd domains.

[domain/rc]
description = Research Computing
proxy_pam_target = curc-twofactor-vasco


[domain/duo]
description = Research Computing (identikey+duo authentication)
enumerate = false
proxy_pam_target = curc-twofactor-duo

This allows users to log in normally using VASCO, while password+Duo authentication can be requested explicitly by logging in as ${user}@duo.

$ ssh -l joan5896@duo login.rc.colorado.edu

This works well enough for the common case of shell access over SSH: login is permitted and, since the default rc domain and the duo alias domain are both backed by the same LDAP directory, NSS sees no important difference once a user is logged in using either method.

This works because POSIX systems store the uid number returned by PAM and NSS, and generally resolve the uid number to the username on demand. Not all systems work this way, however. For example, when we attempted to use this authentication mechanism to authenticate to our prototype JupyterHub (web) service, jobs dispatched to Slurm retained the ${user}@duo username format. Slurm also uses usernames internally, and only the base ${user} username is populated within Slurm, not ${user}@duo.

Expecting that we would continue to find more unexpected side-effects of this implementation, we started to look for an alternative mechanism that doesn't modify the specified username.

pam_authtok

In general, a user provides two pieces of information during authentication: a username (which we've already determined we shouldn't modify) and an authentication token or password. We should be able to detect, for example, a prefix to that authentication token to determine what authentication method to use.

$ ssh joan5896@login.rc.colorado.edu
joan5896@login.rc.colorado.edu's password: duo:<password>

[joan5896@login04 ~]$

But we found no such PAM module that would allow us to manipulate the authentication token... so we wrote one.

auth [success=1 default=ignore] pam_authtok.so prefix=duo: strip prompt=password:

auth [success=done new_authtok_reqd=done default=die] pam_radius_auth.so try_first_pass

auth requisite pam_krb5.so try_first_pass
auth [success=done new_authtok_reqd=done default=die] pam_duo.so

Now our PAM stack authenticates against VASCO by default; but, if the user provides a password with a duo: prefix, authentication skips VASCO and authenticates the supplied password, followed by Duo push. Our actual production PAM stack is a bit more complicated, supporting a redundant vasco: prefix as well, for forward-compatibility should we change the default authentication mechanism in the future. We can also extend this mechanism to add arbitrary additional authentication mechanisms in the future.
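
The prefix handling can be modelled in a few lines of Python. This is only a sketch of the behavior described here, not the module's actual implementation: the function names are hypothetical, and the prefix and strip arguments mirror the options shown in the PAM configuration above.

```python
def split_authtok(authtok, prefix="duo:", strip=True):
    """Return (matched, token), removing the prefix when strip is set."""
    if authtok.startswith(prefix):
        token = authtok[len(prefix):] if strip else authtok
        return True, token
    return False, authtok


def select_method(authtok):
    """Model the [success=1 default=ignore] jump in the stack.

    A prefix match skips one module (VASCO over RADIUS), so the stripped
    token goes on to the campus password check and then Duo. Any other
    token falls through to VASCO unchanged.
    """
    matched, token = split_authtok(authtok)
    method = "password+duo" if matched else "vasco"
    return method, token


if __name__ == "__main__":
    print(select_method("duo:example-password"))
    print(select_method("1234567890"))
```

Because the decision depends only on the token, the username that reaches NSS, Slurm, and everything else is never modified.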

The curc::sysconfig::scinet Puppet module

I've been working on a new module, curc::sysconfig::scinet, which will generally do the Right Thing™ when configuring a host on the CURC science network, with as little configuration as possible.

Let's look at some examples.

login nodes

class { 'curc::sysconfig::scinet':
  location => 'comp',
  mgt_if   => 'eth0',
  dmz_if   => 'eth1',
  notify   => Class['network'],
}

This is the config used on a new-style login node like login05 and login07. (What makes them new-style? Mostly just that they've had their interfaces cleaned up to use eth0 for "mgt" and eth1 for "dmz".)

Here's the routing table that this produced on login07:

$ ip route list
10.225.160.0/24 dev eth0  proto kernel  scope link  src 10.225.160.32 
10.225.128.0/24 via 10.225.160.1 dev eth0 
192.12.246.0/24 dev eth1  proto kernel  scope link  src 192.12.246.39 
10.225.0.0/20 via 10.225.160.1 dev eth0 
10.225.0.0/16 via 10.225.160.1 dev eth0  metric 110 
10.128.0.0/12 via 10.225.160.1 dev eth0  metric 110 
default via 192.12.246.1 dev eth1  metric 100 
default via 10.225.160.1 dev eth0  metric 110

Connections to "mgt" subnets use the "mgt" interface eth0, either by the link-local route or the static routes via comp-mgt-gw (10.225.160.1). Connections to the "general" subnet (a.k.a. "vlan 2049"), as well as the rest of the science network ("data" and "svc" networks) also use eth0 by static route. The default eth0 route is configured by DHCP, but the interface has a default metric of 110, so it doesn't conflict with or supersede eth1's default route, which is configured with a lower metric of 100.
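
The way these routes interact can be sketched in Python using the standard ipaddress module. This is a simplified model of route selection (longest matching prefix first, then lowest metric), not the kernel's implementation. The table is transcribed from the login07 output above, with routes that show no metric recorded as metric 0.

```python
import ipaddress

# (destination, device, metric), transcribed from `ip route list` on login07.
ROUTES = [
    ("10.225.160.0/24", "eth0", 0),
    ("10.225.128.0/24", "eth0", 0),
    ("192.12.246.0/24", "eth1", 0),
    ("10.225.0.0/20", "eth0", 0),
    ("10.225.0.0/16", "eth0", 110),
    ("10.128.0.0/12", "eth0", 110),
    ("0.0.0.0/0", "eth1", 100),  # default via 192.12.246.1
    ("0.0.0.0/0", "eth0", 110),  # default via 10.225.160.1
]


def select_route(dest, routes=ROUTES):
    """Return the (network, device, metric) that would carry traffic to dest."""
    addr = ipaddress.ip_address(dest)
    candidates = []
    for net, dev, metric in routes:
        network = ipaddress.ip_network(net)
        if addr in network:
            candidates.append((network, dev, metric))
    # Longest prefix wins; among equal prefixes, the lowest metric wins.
    return min(candidates, key=lambda r: (-r[0].prefixlen, r[2]))


if __name__ == "__main__":
    for dest in ("10.225.128.5", "10.225.64.10", "8.8.8.8"):
        print(dest, "->", select_route(dest)[1])
```

Passing a copy of the table without the eth1 entries models the loss of the "dmz" link: external destinations then fall back to the eth0 default route.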

Speaking of eth1, the "dmz" interface is configured statically, using information retrieved from DNS by Puppet.

$ cat /etc/sysconfig/network-scripts/ifcfg-eth1 
TYPE=Ethernet
DEVICE=eth1
BOOTPROTO=static
HWADDR=00:50:56:88:2E:36
ONBOOT=yes
IPADDR=192.12.246.39
NETMASK=255.255.255.0
GATEWAY=192.12.246.1
METRIC=100
IPV4_ROUTE_METRIC=100

Usually the routing priority of the "dmz" interface would mean that inbound connections to the "mgt" interface from outside of the science network would be blocked when the "dmz"-bound response is filtered by rp_filter; but curc::sysconfig::scinet also configures routing policy for eth0, so traffic on that interface always returns from that interface.

$ ip rule show | grep 'lookup 1'
32764:  from 10.225.160.32 lookup 1 
32765:  from all iif eth0 lookup 1

$ ip route list table 1
default via 10.225.160.1 dev eth0
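
As a rough Python model of the rule lookup, consider only default routes and walk the rules in priority order. The two "lookup 1" rules come from the output above. The final rule is the standard Linux "from all lookup main" rule, assumed to be present because the grep filters it out.

```python
# (priority, source address or None for "all", inbound interface or None, table)
RULES = [
    (32764, "10.225.160.32", None, "1"),
    (32765, None, "eth0", "1"),
    (32766, None, None, "main"),  # assumed: the standard default rule
]

# Device of the preferred default route in each table (a simplification).
TABLES = {
    "1": "eth0",     # default via 10.225.160.1 dev eth0
    "main": "eth1",  # default via 192.12.246.1 dev eth1, metric 100
}


def egress_device(src=None, iif=None):
    """Return the device used for a packet with this source or inbound interface."""
    for _priority, rule_src, rule_iif, table in sorted(RULES, key=lambda r: r[0]):
        src_matches = rule_src is None or rule_src == src
        iif_matches = rule_iif is None or rule_iif == iif
        if src_matches and iif_matches:
            return TABLES[table]
    return None


if __name__ == "__main__":
    print(egress_device(src="10.225.160.32"))  # reply from the "mgt" address
    print(egress_device(src="192.12.246.39"))  # traffic from the "dmz" address
```

A reply sourced from the eth0 address matches the first rule and leaves via eth0, so it never takes the eth1 default route in the main table.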

This allows me to ping login07.rc.int.colorado.edu from my office workstation.

$ ping -c 1 login07.rc.int.colorado.edu
PING login07.rc.int.colorado.edu (10.225.160.32) 56(84) bytes of data.
64 bytes from 10.225.160.32: icmp_seq=1 ttl=62 time=0.507 ms

--- login07.rc.int.colorado.edu ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 1ms
rtt min/avg/max/mdev = 0.507/0.507/0.507/0.000 ms

Because the default route for eth0 is actually configured, outbound routing from login07 is resilient to failure of the "dmz" link.

# ip route list | grep -v eth1
10.225.160.0/24 dev eth0  proto kernel  scope link  src 10.225.160.32 
10.225.128.0/24 via 10.225.160.1 dev eth0 
10.225.0.0/20 via 10.225.160.1 dev eth0 
10.225.0.0/16 via 10.225.160.1 dev eth0  metric 110 
10.128.0.0/12 via 10.225.160.1 dev eth0  metric 110 
default via 10.225.160.1 dev eth0  metric 110

Traffic destined to leave the science network simply proceeds to the next preferred (and, in this case, only remaining) default route, comp-mgt-gw.

DHCP, DNS, and the FQDN

Tangentially, it's important to note that the DHCP configuration of eth0 will tend to rewrite /etc/resolv.conf and the search path it defines, with the effect of changing the FQDN of the host to login07.rc.int.colorado.edu. Because login nodes are logically (and historically) external hosts, not internal hosts, they should prefer their external identity to their internal identity. As such, we override the domain search path on login nodes so that they discover their rc.colorado.edu FQDNs first.

# cat /etc/dhcp/dhclient-eth0.conf 
supersede domain-search "rc.colorado.edu", "rc.int.colorado.edu";
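
The effect of the search order can be illustrated with a small Python sketch. It does no real DNS lookups: the set of resolvable names is an assumption made for illustration, namely that login07 has a record in both domains.

```python
# Assumed for illustration: login07 resolves in both domains.
RESOLVABLE = {
    "login07.rc.colorado.edu",
    "login07.rc.int.colorado.edu",
}


def first_fqdn(short_name, search, resolvable=RESOLVABLE):
    """Return the first candidate FQDN, in search-path order, that resolves."""
    for domain in search:
        candidate = f"{short_name}.{domain}"
        if candidate in resolvable:
            return candidate
    return None


if __name__ == "__main__":
    # Search path as superseded in dhclient-eth0.conf.
    print(first_fqdn("login07", ["rc.colorado.edu", "rc.int.colorado.edu"]))
    # Search path with the internal domain first.
    print(first_fqdn("login07", ["rc.int.colorado.edu", "rc.colorado.edu"]))
```

Whichever domain appears first in the search path determines which identity the host discovers, which is why the override lists rc.colorado.edu ahead of rc.int.colorado.edu.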

PetaLibrary/repl

The PetaLibrary/repl GPFS NSD nodes replnsd{01,02} are still in the "COMP" datacenter, but only attach to "mgt" and "data" networks.

class { 'curc::sysconfig::scinet':
  location         => 'comp',
  mgt_if           => 'eno2',
  data_if          => 'enp17s0f0',
  other_data_rules => [ 'from 10.225.176.61 table 2',
                        'from 10.225.176.62 table 2',
                        ],
  notify           => Class['network_manager::service'],
}

This config produces the following routing table on replnsd01...

$ ip route list
default via 10.225.160.1 dev eno2  proto static  metric 110 
default via 10.225.176.1 dev enp17s0f0  proto static  metric 120 
10.128.0.0/12 via 10.225.160.1 dev eno2  metric 110 
10.128.0.0/12 via 10.225.176.1 dev enp17s0f0  metric 120 
10.225.0.0/20 via 10.225.160.1 dev eno2 
10.225.0.0/16 via 10.225.160.1 dev eno2  metric 110 
10.225.0.0/16 via 10.225.176.1 dev enp17s0f0  metric 120 
10.225.64.0/20 via 10.225.176.1 dev enp17s0f0 
10.225.128.0/24 via 10.225.160.1 dev eno2 
10.225.144.0/24 via 10.225.176.1 dev enp17s0f0 
10.225.160.0/24 dev eno2  proto kernel  scope link  src 10.225.160.59  metric 110 
10.225.160.49 via 10.225.176.1 dev enp17s0f0  proto dhcp  metric 120 
10.225.176.0/24 dev enp17s0f0  proto kernel  scope link  src 10.225.176.59  metric 120

...with the expected interface-consistent policy-targeted routing tables.

$ ip route list table 1
default via 10.225.160.1 dev eno2

$ ip route list table 2
default via 10.225.176.1 dev enp17s0f0

Static routes for "mgt" and "data" subnets are defined for their respective interfaces. As on the login nodes above, default routes are specified for both interfaces as well, with the lower-metric "mgt" interface eno2 being preferred. (This is configurable using the mgt_metric and data_metric parameters.)

Perhaps the most notable aspect of the PetaLibrary/repl network config is the provisioning of the GPFS CES floating IP addresses 10.225.176.{61,62}. These addresses are added to the enp17s0f0 interface dynamically by GPFS, and are not defined with curc::sysconfig::scinet; but the config must reference these addresses to implement proper interface-consistent policy-targeted routing tables. The version of Puppet deployed at CURC lacks the semantics to infer these rules from a more semantic data_ip parameter, so the other_data_rules parameter is used instead.

  other_data_rules => [ 'from 10.225.176.61 table 2',
                        'from 10.225.176.62 table 2',
                        ],
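
For comparison, here is what inferring those rules from a list of addresses might look like, sketched in Python rather than Puppet. The data_rules helper is hypothetical and is not part of curc::sysconfig::scinet. It only shows the mapping from each floating IP to a rule string, assuming table 2 is the "data" routing table as above.

```python
def data_rules(data_ips, table=2):
    """Build one source-based policy rule string per floating IP address."""
    return [f"from {ip} table {table}" for ip in data_ips]


if __name__ == "__main__":
    for rule in data_rules(["10.225.176.61", "10.225.176.62"]):
        print(rule)
```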

Blanca/ICS login node

[porting the blanca login node would be great because it's got a "dmz", "mgt", and "data" interface; so it would exercise the full gamut of features of the module]