I just posted the entirety of the P.O.A. album, "Home."
Chris Hill, a member of our church, shared this prayer in the context of the then-upcoming presidential inauguration and Martin Luther King, Jr. Day.
In a time of self-described conservatism vs liberalism, I found it remarkably neither, but only Christian.
Father we approach you today with many gratitudes, thoughts and requests. As a community we first empty our hands of those things that do not belong to us. We lay down our worldly possessions, those things that you have loaned us. We lay down our worldly successes and failures, which do not define us. And we lay down the pride that so easily devours us and those we live around. We bring before you our weakness, and thank you for it. We understand that without it, we would not see our desperation for you.
Behold us, Father, as we Behold you. See us. Understand us. Know our Human hearts. Together, today, we want to bring before you two events that we will undoubtedly carry with us this week. We bring these before you in faith that you are worth approaching and worth glorifying. We also bring these before you recognizing that only you are Good.
As Barack Obama and his family leave the presidency, we thank you for the ways you have worked during the 8 years he has served our country. Would you bless his family as they adjust to life outside of the White House. And as Donald Trump and his family transition into the presidency, we pray with hope and expectation that you would use them to strengthen the Kingdom of Heaven. Give us the strength to bear with one another in love and patience.
As we remember the life and work of Martin Luther King Jr. would you have mercy on us. Will you call us out of our complacency as middle to upper class white America to gaze across the scene and remember what’s painfully obvious and self-evident. That the person we see with a different skin color than our own is, in fact, a person. An image bearer. A jar of clay containing my blood, and your Holy Spirit. Today Father, we remember that much of what happened during the Civil Rights movement was guided and fueled by your Spirit. Thank you for gifting us a man who was able to give an ear to you and an ear to the people while persevering through an agony and persecution that few of us in this room understand. Thank you for the lives of our brothers and sisters who have fought to level our view of humanity. But if we are going to acknowledge what has been done Father, we will also acknowledge the work that still needs to be done. Would you equip those of us who are not white with hope, strength, and perseverance. Would you equip those of us who are white with ears to hear and eyes to see your children.
We believe that you, Holy Spirit, are the only one who can bring reconciliation between us and our neighbor. So we lean into you, ready to be led. Jesus, you are all and you are in all. We ask these things in your perfect name. Amen.
I read Shūsaku Endō's Silence as part of a book club with my pastor and a few other members of our church. The book was scheduled coming out of the holiday season so that if we didn't have time or motivation to read we could at least all watch the new film together and discuss the story.
I hadn't managed to finish the book by the time I reached the theatre. When I left Rodrigues (on the page) he was being brought before the authorities for interrogation and defending the purpose of the church in Japan.
When we passed that moment in the film, I appreciated the fresh perspective of watching the story play out on screen, but I realized that I had actually managed to remain unspoiled on the remaining plot. When Rodrigues was climactically confronted with the decision to trample on the fumi-e or allow others to suffer, I was overwhelmed by the cumulative anticipation of not one but two readings: I've never before experienced so palpable a moment of, "I have no idea what is about to happen."
I've wrestled for years with the question of whether it would be sin to accept damnation--defined here as separation from God--for the sake of another's salvation. Self-sacrifice is good; but might such a sacrifice be construed an elevation of man over God?
Through Silence I've concluded that such a sacrifice is good, but that its consequence is inherent: damnation. In fact, in Christian theology, this is the sacrifice Christ made for us, and only Christ could both endure all of our damnation and still remain blameless.
And still, salvation through Christ is sufficient even for those who would deny him for the sake of others. It's obvious when you consider the apostle Peter, who famously denied association with Christ three times; but I hadn't before seen this portrayed so vividly, and the story of Peter is perhaps too familiar to be so impactful. It's easy to vilify Kichijiro when he repeatedly betrays the Kirishitans, and to become dismissive as Rodrigues when the acts of confession and atonement become rote and seemingly meaningless; but Rodrigues and Kichijiro both demonstrate what Peter did in the Passion: that Christ offers forgiveness and reconciliation even to those who betray him.
After more consideration, though, I fear that the Silence that has affected me so deeply exists only in my own heart and mind. The book, perhaps more than the film, might actually be more concerned with a technical definition of apostasy and Rodrigues' prideful self-image as a Christ figure than it is with deeper questions of the nature of salvation. He's a bit like Job, in a way: so assured of his blamelessness and right to martyrdom that he can't see how he himself falls short of the perfection he aspires to.
But I still can't stop thinking about Silence, and I'm struck more than ever by the potential discontinuity between the story the author wrote and the story in my mind.
I can't imagine what Silence must mean to a Japanese Buddhist. From my western Christian perspective the story is familiar enough, and I implicitly understand the context and motivation of Rodrigues and his fellow Jesuits. But what I read was an English translation from original Japanese, ostensibly intended for a Japanese audience, and that presumably non-Christian. How could a Japanese person, with no experience with the church or Christ, possibly react to any of this?
As part of another SSH client article we potentially generated a new ssh key for use in ssh public-key authentication.
$ ssh-keygen -t rsa -b 4096 # if you don't already have a key
SSH public-key authentication has intrinsic benefits; but many see it as a mechanism for non-interactive login: you don't have to remember, or type, a password.
This behavior is dependent, however, on having a non-encrypted private key. This is a security risk, because the non-encrypted private key may be compromised, either by accidental mishandling of the file or by unauthorized intrusion into the client system. In almost all cases, ssh private keys should be encrypted with a passphrase.
$ ssh-keygen -t rsa -b 4096 -f test
Generating public/private rsa key pair.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
If you already have a private key that is not encrypted, use the
-p argument to ssh-keygen to set a passphrase.
$ ssh-keygen -p -f ~/.ssh/id_rsa
Now the private key is protected by a passphrase, which you'll be prompted for each time you use it. This is better than a password, because the passphrase is not transmitted to the server; but we've lost the ability to authenticate without having to type anything.
OpenSSH provides a dedicated agent process for the sole purpose of handling decrypted ssh private keys in-memory. Most Unix and Linux desktop operating systems (including OS X) start and maintain a per-user SSH agent process automatically.
$ pgrep -lfu $USER ssh-agent
815 /usr/bin/ssh-agent -l
Using the ssh-add command, you can decrypt your ssh private key by
inputting your passphrase once, adding the decrypted key to the running
agent.
$ ssh-add ~/.ssh/id_rsa # the path to the private key may be omitted for default paths
Enter passphrase for /Users/joan5896/.ssh/id_rsa:
Identity added: /Users/joan5896/.ssh/id_rsa (/Users/joan5896/.ssh/id_rsa)
The decrypted private key remains resident in the ssh-agent process.
$ ssh-add -L ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAgEAuuq61MFxmPDHf806s65O6pljn1hM/BB6XpDPu52YjOSriTtN2cgo1CpDDXNJffiwjvodr7Sq0UBTKcFvZzu41N5oGy5ob3H4axY+SRjs7giSYJna3OB+ABO3PV4LduEpt9IOFue7Q8Q0xvtSeZdNqPq2jPGXuxviFzBCVebkFE6BDdL6NI3Y591DIs2LZwm+v/wHk9cLjo//beWyMl7Gku5jnvCNozo7VBS6nfPcFW9SScAH8ow/2XP0sF4+YCOIbikQ6R69Dl18il8MumQUt+Cxc4cL39zjIuzpuGjhSTEb3TY3crHU8TH7cdhY7+WW1Ab8A19gec2vwAbAEOtdx24L7nXA2SFiImuO0FtQBS6gYp9BMKlotm0xBsNzvGsutIz6nb3iA/4OuVV9wUcE3SpicY0nE7Xn1LKfZdeSc+Jjs7K3Sry9SeTJ7awwaqQDEeAhw6GM5PUaZStwtheQe5tAe/OnwRR8J8lCXI+VUOnoTogOG7md99beygNuF1VEPHp044sjLzOf0ubcZYdA0jXqddkxG4S9TNFPNZ7dxGwdLQqWoqnwBDat4xHQ7g7ifdy1F3WKj2V+BVl4owCzlWYCRtgN5f6O18MLAjBLG0Mdd6jZoMGyBHG8jlJkMFNNiuwRqT53mR+hTFMFOHEe+DJXDPgBisLs19HBbK4MM9k= /Users/joan5896/.ssh/id_rsa
This is better than a non-encrypted on-disk private key for two reasons. First, the decrypted private key exists only in memory, not on disk. This makes it more difficult to mishandle, and it cannot be recovered without re-inputting the passphrase once the workstation is powered off. Second, client applications (like OpenSSH itself) no longer require direct access to the private key, encrypted or otherwise, nor must you provide your (secret) key passphrase to client applications: the agent moderates all use of the key itself.
The OpenSSH client will use the agent process identified by the
SSH_AUTH_SOCK environment variable by default; but you generally
don't have to worry about it: your workstation environment should
configure it for you.
$ echo $SSH_AUTH_SOCK
/private/tmp/com.apple.launchd.L311i5Nw5J/Listeners
At this point, there's nothing more to do. With your ssh key added to the agent process, you're back to not needing to type in a password (or passphrase), but without the risk of a non-encrypted private key stored permanently on disk.
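To confirm that the agent is reachable and actually holding a key, we can look at the exit status of ssh-add -l. The sketch below assumes the exit statuses documented for ssh-add (0 on success, 1 on failure such as an empty agent, 2 when no agent can be contacted); the describe_agent_status helper is a hypothetical name introduced here for illustration.

```shell
#!/bin/sh
# Hypothetical helper: translate the exit status of `ssh-add -l`
# into a short description.
describe_agent_status() {
    case "$1" in
        0) echo "agent running, at least one key loaded" ;;
        1) echo "agent running, no keys loaded" ;;
        2) echo "no agent reachable via SSH_AUTH_SOCK" ;;
        *) echo "unexpected status: $1" ;;
    esac
}

# Only query the agent when ssh-add is actually installed.
if command -v ssh-add >/dev/null 2>&1; then
    status=0
    ssh-add -l >/dev/null 2>&1 || status=$?
    describe_agent_status "$status"
fi
```

If this reports that no keys are loaded, running ssh-add once more (and typing the passphrase) should restore password-less logins for the rest of the session.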
It's good practice to harden our ssh client with some secure
"defaults". Starting your configuration file with the following
directives will apply them to all (*) hosts.

(These are listed as multiple Host * stanzas, but they can be
combined into a single stanza in your actual configuration file.)
If you prefer, follow along with
an example of a complete
Require secure algorithms
OpenSSH supports many encryption and authentication algorithms, but some of those algorithms are known to be weak to cryptographic attack. The Mozilla project publishes a list of recommended algorithms that exclude algorithms that are known to be insecure.
Host *
    HostKeyAlgorithms firstname.lastname@example.org,email@example.com,ssh-ed25519,ssh-rsa,firstname.lastname@example.org,email@example.com,firstname.lastname@example.org,ecdsa-sha2-nistp521,ecdsa-sha2-nistp384,ecdsa-sha2-nistp256
    Ciphers email@example.com,firstname.lastname@example.org,email@example.com,aes256-ctr,aes192-ctr,aes128-ctr
    MACs firstname.lastname@example.org,email@example.com,firstname.lastname@example.org,hmac-sha2-512,hmac-sha2-256,email@example.com
    KexAlgorithms firstname.lastname@example.org,diffie-hellman-group-exchange-sha256,diffie-hellman-group-exchange-sha1
(More information on the available encryption and authentication algorithms, and how a recommended set is derived, is available in this fantastic blog post, "Secure secure shell.")
Every time you connect to an SSH server, your client caches a copy of
the remote server's host key in a
~/.ssh/known_hosts file. If your
ssh client is ever compromised, this list can expose the remote
servers to attack using your compromised credentials. Be a good
citizen and hash your known hosts file.
Host *
    HashKnownHosts yes
(Hash any existing entries in your
~/.ssh/known_hosts file by running
ssh-keygen -H. Don't forget to remove the backup
~/.ssh/known_hosts.old file.)

$ ssh-keygen -H
$ rm -i ~/.ssh/known_hosts.old
Host *
    UseRoaming no
Dealing with insecure servers
Some servers are old enough that they may not support the newer, more
secure algorithms listed. In the RC environment, for example, the
login and other Internet-accessible systems provide relatively modern
ssh algorithms; but the hosts in the
rc.int.colorado.edu domain may not.
To support connection to older hosts while requiring newer algorithms by default, override these settings earlier in the configuration file.
# Internal RC hosts are running an old version of OpenSSH
Match host=*.rc.int.colorado.edu
    MACs hmac-sha1,email@example.com,hmac-ripemd160,firstname.lastname@example.org,hmac-sha1-96
The OpenSSH client is very robust, very flexible, and very configurable. Many times I see people struggling to remember server-specific ssh flags or arcane, manual multi-hop procedures. I even see entire scripts written to automate the process.
But the vast majority of what you might want ssh to do can be
abstracted away with some configuration in your ~/.ssh/config file.
All (or, at least, most) of these configuration directives are fully
If you prefer, follow along with
an example of a complete
One of the first annoyances people have--and one of the first things
people try to fix--when using a command-line ssh client is having to
type in long hostnames. For example, the Research Computing login
service is available at login.rc.colorado.edu.
$ ssh login.rc.colorado.edu
This particular name isn't too bad; but coupled with usernames and
especially when used as part of an
scp, these fully-qualified domain
names can become cumbersome.
$ scp -r /path/to/src/ email@example.com:dest/
OpenSSH supports host aliases through pattern-matching in
Host directives.

Host login*.rc
    HostName %h.colorado.edu

Host *.rc
    HostName %h.int.colorado.edu
In this example,
%h is substituted with the name specified on the
command-line. With a configuration like this in place, connections to
login.rc are directed to the full name login.rc.colorado.edu.
$ scp -r /path/to/src/ firstname.lastname@example.org:dest/
Failing that, other references to hosts with a .rc suffix are
directed to the internal Research Computing domain. (We'll use these
later.)

(The .rc domain segment could be moved from the Host pattern to the
HostName value; but leaving it in the alias helps to distinguish
the Research Computing login nodes from other login nodes that you may
have access to. You can use arbitrary aliases in the Host directive,
too; but then the %h substitution isn't useful: you have to
enumerate each targeted host.)
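To make the alias rewriting concrete, here is a minimal shell sketch that mimics the two Host patterns above. It is only an emulation for illustration, not how ssh itself is implemented; resolve_alias is a hypothetical helper name. Like ssh, which uses the first value it obtains for each option, the case statement stops at the first matching pattern.

```shell
#!/bin/sh
# Hypothetical emulation of the two Host stanzas: substitute the alias
# typed on the command line for %h in the matching HostName value.
resolve_alias() {
    case "$1" in
        login*.rc) echo "$1.colorado.edu" ;;      # Host login*.rc
        *.rc)      echo "$1.int.colorado.edu" ;;  # Host *.rc
        *)         echo "$1" ;;                   # no alias applies
    esac
}

resolve_alias login.rc
resolve_alias janus-compile1.rc
```

On a workstation with a reasonably recent OpenSSH client, running ssh -G login.rc and looking at the hostname line should show what the real configuration resolves to.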
Unless you happen to use the same username on your local workstation
as you have on the remote server, you likely specify a username using
either the @ syntax or the
-l argument to the ssh command.
$ ssh email@example.com
As with specifying a fully-qualified domain name, tracking and
specifying a different username for each remote host can become
burdensome, especially during an
scp operation. Record the correct
username in your
~/.ssh/config file instead.
Match host=*.rc.colorado.edu,*.rc.int.colorado.edu
    User joan5896
Now all connections to Research Computing hosts use the specified username by default, without it having to be specified on the command-line.
$ scp -r /path/to/src/ login.rc:dest/
Note that we're using a Match directive here, rather than a
Host directive. The host= argument to
Match matches against the derived
hostname, so it reflects the real hostname as determined using the
earlier Host directives. (Make sure the correct HostName is
established earlier in the configuration, though.)
Even if the actual command is simple to type, authenticating to the
host may require manual intervention. The Research Computing login
nodes, for example, require two-factor authentication using a password
or PIN coupled with a one-time VASCO password or Duo credential. If
you want to open multiple connections--or, again, copy files using
scp--having to authenticate with multiple factors quickly becomes
tedious. (Even having to type in a password at all may be unnecessary;
but we'll assume, as is the case with the Research Computing login
example, that you can't use public-key authentication.)
OpenSSH supports sharing a single network connection for multiple ssh sessions.
Match host=login.rc.colorado.edu
    ControlMaster auto
    ControlPath ~/.ssh/.socket_%h_%p_%r
    ControlPersist 4h
With ControlMaster and ControlPath defined, the first ssh
connection authenticates and establishes a session normally; but
future connections join the active connection, bypassing the need to
re-authenticate. The optional
ControlPersist option causes this
connection to remain active for a period of time even after the last
session has been closed.
$ ssh login.rc
firstname.lastname@example.org's password:
[joan5896@login01 ~]$ logout
$ ssh login.rc
[joan5896@login01 ~]$
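The shared connection lives in a socket file named by ControlPath. As a rough illustration of how the %h, %p, and %r tokens expand, the sketch below substitutes them by hand; expand_control_path is a hypothetical helper, port 22 is an assumed default, and the leading ~ is left for ssh (or the shell) to expand.

```shell
#!/bin/sh
# Hypothetical helper: expand the ControlPath tokens used above.
# $1 = template, $2 = remote host (%h), $3 = port (%p), $4 = remote user (%r)
expand_control_path() {
    printf '%s\n' "$1" | sed -e "s/%h/$2/g" -e "s/%p/$3/g" -e "s/%r/$4/g"
}

expand_control_path '~/.ssh/.socket_%h_%p_%r' login.rc.colorado.edu 22 joan5896
```

Once a master connection is open, ssh -O check login.rc should report whether it is still running, and ssh -O exit login.rc asks it to shut down before ControlPersist expires.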
(Note that many arguments to the
ssh command are effectively ignored
after the initial connection is established. Notably, if X11 was
not forwarded with
-Y during the first session, you cannot
use the shared connection to forward X11 in a later session. In this
case, use the
-S none argument to
ssh to ignore the existing
connection and explicitly establish a new connection.)
But what if you want to get to a host that isn't directly available
from your local workstation? The hosts in the rc.int.colorado.edu
domain referenced above may be accessible from a local network
connection; but if you are connecting from elsewhere on the Internet,
you won't be able to access them directly.
Except that OpenSSH provides the
ProxyCommand option which, when
coupled with the OpenSSH client presumed to be available on the
intermediate server, supports arbitrary proxy connections through to
the target host.
Match host=*.rc.int.colorado.edu
    ProxyCommand ssh -W %h:%p login.rc.colorado.edu
Even though you can't connect directly to Janus compute nodes from the
Internet, for example, you can connect to them from a Research
Computing login node; so this
ProxyCommand configuration allows
transparent access to hosts in the internal Research Computing domain.
$ ssh janus-compile1.rc
[joan5896@janus-compile1 ~]$
And it even works with scp.
$ echo 'Hello, world!' >/tmp/hello.txt
$ scp /tmp/hello.txt janus-compile1.rc:/tmp
hello.txt 100% 14 0.0KB/s 00:00
$ ssh janus-compile1.rc cat /tmp/hello.txt
Hello, world!
If you tried the example above, chances are that you were met with an unexpected password prompt that didn't accept any password that you used. That's because most internal Research Computing hosts don't actually support interactive authentication, two-factor or otherwise. Connections from a CURC login node are authorized by the login node; but a proxied connection must authenticate from your local client.
The best way to authenticate your local workstation to an internal CURC host is using public-key authentication.
If you don't already have an SSH key, generate one now.
$ ssh-keygen -t rsa -b 4096 # if you don't already have a key
Now we have to copy the (new?) public key to the remote CURC
~/.ssh/authorized_keys file. RC provides a global home directory, so
copying to any login node will do. Targeting a specific login node is
useful, though: the
ControlMaster configuration for
login.rc.colorado.edu tends to confuse ssh-copy-id.
$ ssh-copy-id login01.rc
The ssh-copy-id command doesn't come with OS X, but there's a
third-party port available
on GitHub. It's
usually available on a Linux system, too. Alternatively, you can just
This article was first published in the Fall 2016 issue of Usenix ;login:.
Typical IP-networked hosts are configured with a single default route. For single-homed hosts the default route defines the first destination for packets addressed outside of the local subnet; but for multi-homed hosts the default route also implicitly defines a default interface to be used for all outbound traffic. Specific subnets may be accessed using non-default interfaces by defining static routes; but the single default route remains a "single point of failure" for general access to other subnets and the Internet. The Linux kernel, together with the iproute2 suite, supports the definition of multiple default routes distinguished by a preference metric. This allows alternate networks to serve as fail-over for the preferred default route in cases where the link has failed or is otherwise unavailable.
The CU-Boulder Research Computing environment spans three datacenters, each with its own set of special-purpose networks. Public-facing hosts may be accessed through a 1:1 NAT or via a dedicated "DMZ" VLAN that spans all three environments. We have historically configured whichever interface was used for inbound connection from the Internet as the default route in order to support responses to connections from Internet clients; but our recent and ongoing deployment of policy routing (as described in a previous issue of ;login:) removes this requirement.
All RC networks are capable of routing traffic with each other, the campus intranet, and the greater Internet, so we more recently prefer the host's "management" interface as its default route as a matter of convention; but this unnecessarily limits network connectivity in cases where the default interface is down, whether by link failure or during a reconfiguration or maintenance process.
The problem with a single default route
The simplest Linux host routing table is a system with a single network interface.
# ip route list
default via 10.225.160.1 dev ens192
10.225.160.0/24 dev ens192 proto kernel scope link src 10.225.160.38
Traffic to hosts on
10.225.160.0/24 is delivered directly,
while traffic to any other network is forwarded to
10.225.160.1. In this case, the default route eventually
provides access to the public Internet.
# ping -c1 example.com
PING example.com (22.214.171.124) 56(84) bytes of data.
64 bytes from 126.96.36.199: icmp_seq=1 ttl=54 time=24.0 ms

--- example.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 24.075/24.075/24.075/0.000 ms
A dual-homed host adds a second network interface and a second link-local route; but the original default route remains.
# ifup ens224 && ip route list
default via 10.225.160.1 dev ens192
10.225.160.0/24 dev ens192 proto kernel scope link src 10.225.160.38
10.225.176.0/24 dev ens224 proto kernel scope link src 10.225.176.38
The new link-local route provides access to hosts on
10.225.176.0/24; but traffic to other networks still requires
access to the default interface as defined by the single default
route. If the default route interface is unavailable, external
networks become inaccessible, even though identical routing is
available via the second interface.
Attempts to add a second default route fail with an error message (in typically unhelpful iproute2 fashion) implying that it is impossible to configure a host with multiple default routes simultaneously.
It would be better if the host could select dynamically from any of
the physically available routes; but without an entry in the host's
routing table directing packets out the ens224 "data"
interface, the host will simply refuse to deliver the packets.
Multiple default routes and routing metrics
The RTNETLINK error above indicates that the
"data" route cannot be added to the table because a conflicting route
already exists--in this case, the existing ens192
route. Both routes target the "default" network, which would lead to
non-deterministic routing with no way to select one route in favor of
the other.
However, the Linux routing table supports more attributes than the "via" address and "dev" specified in the above example. Of use here, the "metric" attribute allows us to specify a preference number for each route.
# ip route change default via 10.225.160.1 dev ens192 metric 100
# ip route add default via 10.225.176.1 dev ens224 metric 200
# ip route flush cache
The host will continue to prefer the ens192
interface for its default route, due to its lower metric number; but,
if that interface is taken down, outbound packets will automatically
be routed via the
ens224 "data" interface.
# ifdown ens192 && ping -c1 example.com; ifup ens192
PING example.com (188.8.131.52) 56(84) bytes of data.
64 bytes from example.com (184.108.40.206): icmp_seq=1 ttl=54 time=29.0 ms

--- example.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 29.032/29.032/29.032/0.000 ms
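The selection rule itself is simple: among the default routes, the lowest metric wins. The sketch below applies that rule to text in the format printed by ip route list; preferred_default_route is a hypothetical helper, and it deliberately ignores link state, which the kernel also takes into account when an interface goes down.

```shell
#!/bin/sh
# Hypothetical helper: read `ip route list` output on stdin and print the
# device of the default route with the lowest metric. A route without an
# explicit metric is treated as metric 0.
preferred_default_route() {
    awk '
        $1 == "default" {
            metric = 0; dev = ""
            for (i = 2; i < NF; i++) {
                if ($i == "metric") metric = $(i + 1)
                if ($i == "dev") dev = $(i + 1)
            }
            if (best == "" || metric + 0 < best_metric) {
                best = dev; best_metric = metric + 0
            }
        }
        END { if (best != "") print best }
    '
}

# Sample input mirroring the two default routes configured above.
printf '%s\n' \
    'default via 10.225.160.1 dev ens192 metric 100' \
    'default via 10.225.176.1 dev ens224 metric 200' \
    '10.225.160.0/24 dev ens192 proto kernel scope link src 10.225.160.38' |
    preferred_default_route
```

On a live host, piping the real ip route list output into the same function gives a quick check of which interface is currently preferred.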
Persisting the configuration
This custom routing configuration can be persisted in the Red Hat
"ifcfg" network configuration system by specifying a METRIC
number in the
ifcfg- files. This metric will be applied to any
route populated by DHCP or by a
GATEWAY value in the
ifcfg- file or
# grep METRIC= /etc/sysconfig/network-scripts/ifcfg-ens192
METRIC=100
# grep METRIC= /etc/sysconfig/network-scripts/ifcfg-ens224
METRIC=200
Alternatively, routes may be specified using
files. These routes must define metrics explicitly.
Alternatives and further improvements
The NetworkManager service in RHEL 7.x handles multiple default routes correctly by supplying distinct metrics automatically; but, of course, specifying route metrics manually allows you to control which route is preferred explicitly.
I continue to wonder if it might be better to go completely dynamic and actually run OSPF on all multi-homed hosts. This should--in theory--allow our network to be even more automatically dynamic in response to link availability, but this may be too complex to justify in our environment.
There's also potential to use all available routes simultaneously with weighted load-balancing, either per-flow or per-packet. This is generally inappropriate in our environment; but could be preferable in an environment where the available networks are definitively general-purpose.
We've integrated a multiple-default-route configuration into our
standard production network configuration, which is being deployed in
parallel with our migration to policy routing. Now the default route
is specified not by the static binary existence of a single
default entry in the routing table; but by an order of
preference for each of the available interfaces. This allows our hosts
to remain functional in more failure scenarios than before, when link
failure or network maintenance makes the preferred route unavailable.
This article was first published in the Summer 2016 issue of Usenix ;login:
Traditional IP routing systems route packets by comparing the destination address against a predefined list of routes to each available subnet; but when multiple potential routes exist between two hosts on a network, the preferred route may be dependent on context that cannot be inferred from the destination alone. The Linux kernel, together with the iproute2 suite, supports the definition of multiple routing tables and a routing policy database to select the preferred routing table dynamically. This additional expressiveness can be used to avoid multiple routing pitfalls, including asymmetric routes and performance bottlenecks from suboptimal route selection.
The CU-Boulder Research Computing environment spans three datacenters, each with its own set of special-purpose networks. A traditionally-routed host simultaneously connected to two or more of these networks compounds network complexity by making only one interface (the default gateway) generally available across network routes. Some cases can be addressed by defining static routes; but even this leads to asymmetric routing that is at best confusing and at worst a performance bottleneck.
Over the past few months we've been transitioning our hosts from a single-table routing configuration to a policy-driven, multi-table routing configuration. The end result is full bidirectional connectivity between any two interfaces in the network, irrespective of underlying topology or a host's default route. This has reduced the apparent complexity in our network by allowing the host and network to Do the Right Thing™ automatically, unconstrained by an otherwise static route map.
Linux policy routing has become an essential addition to host configuration in the University of Colorado Boulder "Science Network." It's so useful, in fact, that I'm surprised a basic routing policy isn't provided by default for multi-homed servers.
The problem with traditional routing
The simplest Linux host routing scenario is a system with a single network interface.
# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:50:56:88:56:1f brd ff:ff:ff:ff:ff:ff
    inet 10.225.160.38/24 brd 10.225.160.255 scope global dynamic ens192
       valid_lft 60184sec preferred_lft 60184sec
Such a typically-configured network with a single uplink has a single default route in addition to its link-local route.
# ip route list
default via 10.225.160.1 dev ens192
10.225.160.0/24 dev ens192 proto kernel scope link src 10.225.160.38
Traffic to hosts on
10.225.160.0/24 is delivered directly,
while traffic to any other network is forwarded to 10.225.160.1.
A dual-homed host adds a second network interface and a second link-local route; but the original default route remains. (Figure 1.)
# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:50:56:88:56:1f brd ff:ff:ff:ff:ff:ff
    inet 10.225.160.38/24 brd 10.225.160.255 scope global dynamic ens192
       valid_lft 86174sec preferred_lft 86174sec
3: ens224: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
    link/ether 00:50:56:88:44:18 brd ff:ff:ff:ff:ff:ff
    inet 10.225.176.38/24 brd 10.225.176.255 scope global dynamic ens224
       valid_lft 69193sec preferred_lft 69193sec
# ip route list
default via 10.225.160.1 dev ens192
10.225.160.0/24 dev ens192 proto kernel scope link src 10.225.160.38
10.225.176.0/24 dev ens224 proto kernel scope link src 10.225.176.38
The new link-local route provides access to hosts on
10.225.176.0/24, and is sufficient for a private network
connecting a small cluster of hosts. In fact, this is the
configuration that we started with in our Research Computing
environment: .160.0/24 is a low-performance "management"
network, and .176.0/24 is a high-performance "data" network.
In a more complex network, however, link-local routes quickly become insufficient. In the CU Science Network, for example, each datacenter is considered a discrete network zone with its own set of "management" and "data" networks. For hosts in different network zones to communicate, a static route must be defined in each direction to direct performance-sensitive traffic across the high-performance network route. (Figure 2.)
server # ip route add 10.225.144.0/24 via 10.225.176.1
client # ip route add 10.225.176.0/24 via 10.225.144.1
Though managing these static routes can be tedious, they do sufficiently define connectivity between the relevant network pairs: "data" interfaces route traffic to each other via high-performance networks, while "management" interfaces route traffic to each other via low-performance networks. Other networks (e.g., the Internet) can only communicate with the hosts on their default routes; but this limitation may be acceptable for some scenarios.
Even this approach is insufficient, however, to allow traffic between
"management" and "data" interfaces. This is particularly problematic
when a client host is not equipped with a symmetric set of network
interfaces. (Figure 3.) Such a client may only have a "management"
interface, but should still communicate with the server's
high-performance interface for certain types of traffic. (For example,
a dual-homed NFS server should direct all NFS traffic over its
high-performance "data" network, even when being accessed by a client
that itself only has a low-performance "management" interface.) By
default, the Linux rp_filter blocks this traffic, as the server's
response to the client targets a different route than the incomming
request; but even if
rp_filter is disabled, this asymmetric
route limits the server's aggregate network bandwidth to that of its
low-performance "management" interface.
The server's default route could be moved to the "data" interface--in some scenarios, this may even be preferable--but this only displaces the issue: clients may then be unable to communicate with the server on its "management" interface, which may be preferred for certain types of traffic. (In Research Computing, for example, we prefer that administrative access and monitoring not compete with IPC and file system traffic.)
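To see how the reverse-path filter mentioned above is currently set, the per-interface values can be read from /proc. This is a minimal sketch, assuming a Linux host; the helper name rp_filter_mode is hypothetical. The kernel documentation describes 0 as no source validation, 1 as strict mode, and 2 as loose mode.

```shell
#!/bin/sh
# Translate a numeric rp_filter value into the mode name used by the kernel docs.
rp_filter_mode() {
    case "$1" in
        0) echo "off" ;;
        1) echo "strict" ;;
        2) echo "loose" ;;
        *) echo "unknown" ;;
    esac
}

# Report the setting for every interface that exposes one (Linux only).
for f in /proc/sys/net/ipv4/conf/*/rp_filter; do
    [ -r "$f" ] || continue        # skip if the glob matched nothing
    iface=${f%/rp_filter}          # drop the trailing file name
    iface=${iface##*/}             # keep only the interface directory name
    printf '%s: %s\n' "$iface" "$(rp_filter_mode "$(cat "$f")")"
done
```

Strict mode is the one that drops the asymmetric traffic described here; the sketch only reports the settings and changes nothing.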
Routing policy rules
Traditional IP routing systems route incoming packets based solely on the intended destination; but the Linux iproute2 stack supports route selection based on additional packet metadata, including the packet source. Multiple discrete routing tables, similar to the virtual routing and forwarding (VRF) support found in dedicated routing appliances, define contextual routes, and a routing policy selects the appropriate routing table dynamically based on a list of rules.
In this example there are three different routing contexts to consider. The first of these--the "main" routing table--defines the routes to use when the server initiates communication.
server # ip route list table main
10.225.144.0/24 via 10.225.176.1 dev ens224
default via 10.225.160.1 dev ens192
10.225.160.0/24 dev ens192 proto kernel scope link src 10.225.160.38
10.225.176.0/24 dev ens224 proto kernel scope link src 10.225.176.38
A separate routing table defines routes to use when responding to traffic on the "management" interface. Since this table is concerned only with the default route's interface in isolation, it simply reiterates the default route.
server # ip route add default via 10.225.160.1 table 1
server # ip route list table 1
default via 10.225.160.1 dev ens192
Similarly, the last routing table defines routes to use when responding to traffic on the "data" interface. This table defines a different default route: all such traffic should route via the "data" interface.
server # ip route add default via 10.225.176.1 table 2
server # ip route list table 2
default via 10.225.176.1 dev ens224
With these three routing tables defined, the last step is to define routing policy to select the correct routing table based on the packet to be routed. Responses from the "management" address should use table 1, and responses from the "data" address should use table 2. All other traffic, including server-initiated traffic that has no outbound address assigned yet, uses the "main" table automatically.
server # ip rule add from 10.225.160.38 table 1
server # ip rule add from 10.225.176.38 table 2
server # ip rule list
0: from all lookup local
32764: from 10.225.176.38 lookup 2
32765: from 10.225.160.38 lookup 1
32766: from all lookup main
32767: from all lookup default
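The way the rule list is consulted can be pictured with a toy model: walk the rules in priority order and take the first one whose source selector matches. This is only a sketch. The names RULES and select_table are hypothetical, and the model leaves out the "local" and "default" tables as well as the kernel's fall-through to the next rule when a table holds no matching route (which cannot happen here, since tables 1 and 2 each hold a default route).

```shell
#!/bin/sh
# One rule per line: priority, source selector, table (mirrors the listing above).
RULES='32764 10.225.176.38 2
32765 10.225.160.38 1
32766 all main'

# Print the table consulted for a packet with the given source address.
# An empty argument stands for server-initiated traffic with no source yet.
select_table() {
    echo "$RULES" | sort -n | while read -r prio src table; do
        if [ "$src" = all ] || [ "$src" = "$1" ]; then
            echo "$table"
            break
        fi
    done
}

for src in 10.225.160.38 10.225.176.38 ""; do
    printf 'from %s -> table %s\n' "${src:-(unassigned)}" "$(select_table "$src")"
done
```

Responses from the "management" address land in table 1, responses from the "data" address in table 2, and everything else falls through to "main".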
With this routing policy in place, a single-homed client (or, in fact, any client on the network) may communicate with both the server's "data" and "management" interfaces independently and successfully, and the bidirectional traffic routes consistently via the appropriate network. (Figure 4.)
Persisting the configuration
This custom routing policy can be persisted in the Red Hat "ifcfg"
network configuration system by creating interface-specific
route- and rule- files.
# cat /etc/sysconfig/network-scripts/route-ens192
default via 10.225.160.1 dev ens192
default via 10.225.160.1 dev ens192 table mgt
# cat /etc/sysconfig/network-scripts/route-ens224
10.225.144.0/24 via 10.225.176.1 dev ens224
default via 10.225.176.1 dev ens224 table data
# cat /etc/sysconfig/network-scripts/rule-ens192
from 10.225.160.38 table mgt
# cat /etc/sysconfig/network-scripts/rule-ens224
from 10.225.176.38 table data
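If several hosts need the same pattern, the per-table lines in these files can be generated instead of typed. The following is a sketch under stated assumptions: the function name write_policy_files is hypothetical, it writes only the policy-table entries (not the main-table default or the static route shown above), and it stages the files in a scratch directory rather than touching /etc.

```shell
#!/bin/sh
# Write route-<iface> and rule-<iface> for one interface's routing table.
# Arguments: directory interface address gateway table
write_policy_files() {
    # Default route that exists only in the named table.
    printf 'default via %s dev %s table %s\n' "$4" "$2" "$5" > "$1/route-$2"
    # Rule that sends traffic sourced from this address to that table.
    printf 'from %s table %s\n' "$3" "$5" > "$1/rule-$2"
}

# Stage the files for review before copying them into place by hand.
stage=$(mktemp -d)
write_policy_files "$stage" ens224 10.225.176.38 10.225.176.1 data
cat "$stage/route-ens224" "$stage/rule-ens224"
```

Staging first makes it easy to compare the generated lines against the existing files before anything is overwritten.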
The symbolic names mgt and data used in these examples
are translated to routing table numbers as defined in the
iproute2 routing table name configuration.
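iproute2 conventionally reads table names from an rt_tables file, typically /etc/iproute2/rt_tables, where each line pairs a number with a name. The sketch below looks up a name in a file of that format. The function name table_number is hypothetical, and the numbering of mgt and data as 1 and 2 is an assumption chosen to match the numeric tables used earlier.

```shell
#!/bin/sh
# Look up the number for a routing table name in an rt_tables-style file.
# Arguments: name [file]
table_number() {
    awk -v name="$1" '
        /^[ \t]*#/ { next }               # skip comment lines
        $2 == name { print $1; exit }     # line format: <number> <name>
    ' "${2:-/etc/iproute2/rt_tables}"
}

# Sample file: the reserved entries plus the two custom tables (assumed numbers).
sample=$(mktemp)
cat > "$sample" <<'EOF'
# reserved values
255	local
254	main
253	default
0	unspec
# custom tables
1	mgt
2	data
EOF

table_number data "$sample"
```

On a real host the custom lines would be appended to the existing file; without them, commands that reference "table mgt" or "table data" have no number to resolve to.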
Once the configuration is in place, activate it by restarting the
network service (e.g., systemctl restart network). You may also be
able to achieve the same effect using ifup on individual interfaces.
Red Hat's support for routing rule configuration has a confusing
regression that merits specific mention. Red Hat (and its derivatives)
has historically used a "network" initscript and subscripts to
configure and manage network interfaces, and these scripts support the
route- and rule- configuration files. Red Hat Enterprise
Linux 6 introduced NetworkManager, a persistent daemon with additional
functionality; however, NetworkManager did not support rule-
files until version 1.0, released as part of RHEL 7.1. If you're
currently using NetworkManager, but wish to define routing policy in
rule- files, you'll need to either disable NetworkManager
entirely or exempt specific interfaces from NetworkManager by
setting NM_CONTROLLED=no in the relevant "ifcfg" interface
configuration.
In a Debian-based distribution, these routes and rules can be
persisted using post-up directives in the interface configuration.
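As a rough illustration of the Debian approach, the sketch below prints a stanza in the format used by /etc/network/interfaces. The function name print_debian_stanza is hypothetical, the netmask is an assumption derived from the /24 networks used here, and the pre-down cleanup line is our addition rather than something this configuration requires.

```shell
#!/bin/sh
# Print an /etc/network/interfaces stanza with policy-routing hooks.
# Arguments: interface address netmask gateway table
print_debian_stanza() {
    cat <<EOF
auto $1
iface $1 inet static
    address $2
    netmask $3
    post-up ip route add default via $4 dev $1 table $5
    post-up ip rule add from $2 table $5
    pre-down ip rule del from $2 table $5
EOF
}

print_debian_stanza ens224 10.225.176.38 255.255.255.0 10.225.176.1 data
```

A named table such as "data" still needs an entry in the iproute2 table name file; a numeric table can be used in its place.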
We're still in the process of deploying this policy-based routing configuration in our Research Computing environment, and as we do, we discover more cases where previously complex network requirements and special cases are abstracted away by this relatively uniform configuration. We're simultaneously evaluating other potential changes, including the possibility of running a dynamic routing protocol (such as OSPF) on our multi-homed hosts, or of configuring every network connection as a simultaneous default route for fail-over. In any case, this experience has encouraged us to take a second look at our network configuration to re-evaluate what we had previously thought were inherent limitations of the stack itself.
There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies and the other way is to make it so complicated that there are no obvious deficiencies.
--C.A.R. Hoare, The 1980 ACM Turing Award Lecture
How could Linux policy routing be so poorly documented? It's so useful, so essential in a multi-homed environment... I'd almost advocate for its inclusion as default behavior.
What is this, you ask? To understand, we have to start with what Linux does by default in a multi-homed environment. So let's look at one.
$ ip addr
[...]
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 78:2b:cb:66:75:c0 brd ff:ff:ff:ff:ff:ff
    inet 10.225.128.80/24 brd 10.225.128.255 scope global eth2
    inet6 fe80::7a2b:cbff:fe66:75c0/64 scope link
       valid_lft forever preferred_lft forever
[...]
6: eth5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP qlen 1000
    link/ether e4:1d:2d:14:93:60 brd ff:ff:ff:ff:ff:ff
    inet 10.225.144.80/24 brd 10.225.144.255 scope global eth5
    inet6 fe80::e61d:2dff:fe14:9360/64 scope link
       valid_lft forever preferred_lft forever
So we have two interfaces, eth2 and eth5. They're on separate
networks, 10.225.128.0/24 and 10.225.144.0/24 respectively. In our
environment, we refer to these as "spsc-mgt" and "spsc-data." The
practical circumstance is that one of these networks is faster than
the other, and we would like bulk data transfer to use the faster
network.
If the client system also has an "spsc-data" network, everything is fine. The client addresses the system using its data address, and the link-local route prefers the data network.
$ ip route list 10.225.144.0/24
10.225.144.0/24 dev eth5 proto kernel scope link src 10.225.144.80
Our network environment covers a number of networks, however. So let's say our client lives in another data network--"comp-data." Infrastructure routing directs the traffic to the -data interface of our server correctly, but the default route on the server prefers the -mgt interface.
$ ip route list | grep ^default
default via 10.225.128.1 dev eth2
For this simple case we have two options. We can either change our default route to prefer the -data interface, or we can enumerate intended -data client networks with static routes using the data interface. Since changing the default route simply leaves us in the same situation for the -mgt network, let's define some static routes.
$ ip route add 10.225.64.0/20 via 10.225.144.1 dev eth5
$ ip route add 10.225.176.0/24 via 10.225.144.1 dev eth5
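To check which clients these two static routes actually cover, the prefix match can be worked out by hand. The sketch below does the arithmetic in plain shell. The function names ip_to_int and in_cidr are hypothetical, the three client addresses are made up for illustration, and the server's directly connected networks are left out of the comparison.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to a single integer.
ip_to_int() (
    IFS=.              # split on dots (the subshell keeps this change local)
    set -- $1
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# Succeed if address $1 falls inside network $2 (CIDR notation).
in_cidr() (
    net=${2%/*}
    bits=${2#*/}
    mask=$(( (4294967295 << (32 - $bits)) & 4294967295 ))
    [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
)

# Hypothetical clients: two inside the enumerated networks, one outside.
for client in 10.225.70.5 10.225.176.20 10.225.80.9; do
    if in_cidr "$client" 10.225.64.0/20 || in_cidr "$client" 10.225.176.0/24; then
        echo "$client: static route via eth5 (-data)"
    else
        echo "$client: default route via eth2 (-mgt)"
    fi
done
```

The /20 spans 10.225.64.0 through 10.225.79.255, so the client at 10.225.80.9 falls just outside it and its replies leave by the default route.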
So long as we can enumerate the networks that should always use the
-data interface of our server to communicate, this basically
works. But what if we want to support clients that don't themselves
have separate -mgt and -data networks? What if we have a single
client--perhaps with only a -mgt network connection--that should be
able to communicate individually with the server's -mgt interface and
its -data interface? In the most pathological case, what if we have a
host that is only connected to the spsc-mgt network, but we want that
client to be able to communicate with the server's -data interface?
In this case, the link-local route will always prefer the -mgt
network for the return path.
The best case would be to have the server select an outbound route based not on a static configuration, but in response to the incoming path of the traffic. This is the feature enabled by policy-based routing.
Linux policy routing allows us to define distinct and isolated routing tables, and then select the appropriate routing table based on the traffic context. In this situation, we have three different routing contexts to consider. The first of these are the routes to use when the server initiates communication.
$ ip route list table main
10.225.128.0/24 dev eth2 proto kernel scope link src 10.225.128.80
10.225.144.0/24 dev eth5 proto kernel scope link src 10.225.144.80
10.225.64.0/20 via 10.225.144.1 dev eth5
10.225.176.0/24 via 10.225.144.1 dev eth5
default via 10.225.128.1 dev eth2
A separate routing table defines routes to use when responding to traffic from the -mgt interface.
$ ip route list table 1
default via 10.225.128.1 dev eth2
The last routing table defines routes to use when responding to traffic from the -data interface.
$ ip route list table 2
default via 10.225.144.1 dev eth5
With these separate routing tables defined, the last step is to define the rules that select the correct routing table.
$ ip rule list
0: from all lookup local
32762: from 10.225.144.80 lookup 2
32763: from all iif eth5 lookup 2
32764: from 10.225.128.80 lookup 1
32765: from all iif eth2 lookup 1
32766: from all lookup main
32767: from all lookup default
Despite a lack of documentation, all of these rules may be codified in
Red Hat "sysconfig"-style "network-scripts" using interface-specific
route- and rule- files.
$ cat /etc/sysconfig/network-scripts/route-eth2
default via 10.225.128.1 dev eth2
default via 10.225.128.1 dev eth2 table 1
$ cat /etc/sysconfig/network-scripts/route-eth5
10.225.64.0/20 via 10.225.144.1 dev eth5
10.225.176.0/24 via 10.225.144.1 dev eth5
default via 10.225.144.1 dev eth5 table 2
$ cat /etc/sysconfig/network-scripts/rule-eth2
iif eth2 table 1
from 10.225.128.80 table 1
$ cat /etc/sysconfig/network-scripts/rule-eth5
iif eth5 table 2
from 10.225.144.80 table 2
Changes to the RPDB made with these commands do not become active immediately. It is assumed that after a script finishes a batch of updates, it flushes the routing cache with ip route flush cache.