Linux policy-based routing
How could Linux policy routing be so poorly documented? It’s so useful, so essential in a multi-homed environment… I’d almost advocate for its inclusion as default behavior.
What is this, you ask? To understand, we have to start with what Linux does by default in a multi-homed environment. So let’s look at one.
$ ip addr
[...]
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 78:2b:cb:66:75:c0 brd ff:ff:ff:ff:ff:ff
    inet 10.225.128.80/24 brd 10.225.128.255 scope global eth2
    inet6 fe80::7a2b:cbff:fe66:75c0/64 scope link
       valid_lft forever preferred_lft forever
[...]
6: eth5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP qlen 1000
    link/ether e4:1d:2d:14:93:60 brd ff:ff:ff:ff:ff:ff
    inet 10.225.144.80/24 brd 10.225.144.255 scope global eth5
    inet6 fe80::e61d:2dff:fe14:9360/64 scope link
       valid_lft forever preferred_lft forever
So we have two interfaces, eth2 and eth5. They’re on separate subnets, 10.225.128.0/24 and 10.225.144.0/24 respectively. In our environment, we refer to these as “spsc-mgt” and “spsc-data.” The practical circumstance is that one of these networks is faster than the other, and we would like bulk data transfer to use the faster “spsc-data” network.
If the client system also has an “spsc-data” network, everything is fine. The client addresses the system using its data address, and the link-local route prefers the data network.
$ ip route list 10.225.144.0/24
10.225.144.0/24 dev eth5 proto kernel scope link src 10.225.144.80
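We can ask the kernel directly which path it would choose for a given destination with ip route get. (The client address below is a hypothetical example, and the output is abbreviated.)

$ ip route get 10.225.144.90
10.225.144.90 dev eth5 src 10.225.144.80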
Our network environment covers a number of networks, however. So let’s say our client lives in another data network, “comp-data.” Infrastructure routing correctly directs the traffic to the -data interface of our server, but the default route on the server prefers the -mgt interface for the return path.
$ ip route list | grep ^default
default via 10.225.128.1 dev eth2
For this simple case we have two options. We can either change our default route to prefer the -data interface, or we can enumerate intended -data client networks with static routes using the data interface. Since changing the default route simply leaves us in the same situation for the -mgt network, let’s define some static routes.
$ ip route add 10.225.64.0/20 via 10.225.144.1 dev eth5
$ ip route add 10.225.176.0/24 via 10.225.144.1 dev eth5
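As a quick check (using 10.225.176.25 as a hypothetical comp-data client address), ip route get should now report that traffic bound for the comp-data network leaves via eth5:

$ ip route get 10.225.176.25
10.225.176.25 via 10.225.144.1 dev eth5 src 10.225.144.80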
So long as we can enumerate the networks that should always use the -data interface of our server to communicate, this basically works. But what if we want to support clients that don’t themselves have separate -mgt and -data networks? What if we have a single client, perhaps with only a -mgt network connection, that should be able to communicate individually with the server’s -mgt interface and its -data interface? In the most pathological case, what if we have a host that is only connected to the spsc-mgt (10.225.128.0/24) network, but we want that client to be able to communicate with the server’s -data interface? In this case, the link-local route will always prefer the -mgt network for the return path.
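We can demonstrate the problem with ip route get, which accepts a from address to simulate a reply. Using 10.225.128.55 as a hypothetical address for this -mgt-only client, a reply sent from the server’s -data address still leaves via eth2 (output abbreviated):

$ ip route get 10.225.128.55 from 10.225.144.80
10.225.128.55 from 10.225.144.80 dev eth2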
Policy-based routing
The best case would be to have the server select an outbound route based not on a static configuration, but in response to the incoming path of the traffic. This is the feature enabled by policy-based routing.
Linux policy routing allows us to define distinct and isolated routing tables, and then select the appropriate routing table based on the traffic context. In this situation, we have three different routing contexts to consider. The first of these is the set of routes to use when the server itself initiates communication.
$ ip route list table main
10.225.128.0/24 dev eth2 proto kernel scope link src 10.225.128.80
10.225.144.0/24 dev eth5 proto kernel scope link src 10.225.144.80
10.225.64.0/20 via 10.225.144.1 dev eth5
10.225.176.0/24 via 10.225.144.1 dev eth5
default via 10.225.128.1 dev eth2
A separate routing table defines routes to use when responding to traffic from the -mgt interface.
$ ip route list table 1
default via 10.225.128.1 dev eth2
The last routing table defines routes to use when responding to traffic from the -data interface.
$ ip route list table 2
default via 10.225.144.1 dev eth5
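These tables don’t exist until something populates them; at runtime this is done by passing a table argument to ip route add. A sketch, using the same gateways as above:

$ ip route add default via 10.225.128.1 dev eth2 table 1
$ ip route add default via 10.225.144.1 dev eth5 table 2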
With these separate routing tables defined, the last step is to define the rules that select the correct routing table.
$ ip rule list
0:	from all lookup local
32762:	from 10.225.144.80 lookup 2
32763:	from all iif eth5 lookup 2
32764:	from 10.225.128.80 lookup 1
32765:	from all iif eth2 lookup 1
32766:	from all lookup main
32767:	from all lookup default
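At runtime these rules can be created with ip rule add; a sketch (when no explicit priority is given, each new rule is inserted just above the previously added one, which reproduces the ordering shown above):

$ ip rule add iif eth2 table 1
$ ip rule add from 10.225.128.80 table 1
$ ip rule add iif eth5 table 2
$ ip rule add from 10.225.144.80 table 2

Repeating the earlier lookup for our hypothetical -mgt-only client shows the reply now leaving via the -data gateway:

$ ip route get 10.225.128.55 from 10.225.144.80
10.225.128.55 from 10.225.144.80 via 10.225.144.1 dev eth5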
Despite a lack of documentation, all of these rules may be codified in Red Hat “sysconfig”-style “network-scripts” using interface-specific route- and rule- files.
$ cat /etc/sysconfig/network-scripts/route-eth2
default via 10.225.128.1 dev eth2
default via 10.225.128.1 dev eth2 table 1

$ cat /etc/sysconfig/network-scripts/route-eth5
10.225.64.0/20 via 10.225.144.1 dev eth5
10.225.176.0/24 via 10.225.144.1 dev eth5
default via 10.225.144.1 dev eth5 table 2

$ cat /etc/sysconfig/network-scripts/rule-eth2
iif eth2 table 1
from 10.225.128.80 table 1

$ cat /etc/sysconfig/network-scripts/rule-eth5
iif eth5 table 2
from 10.225.144.80 table 2
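Optionally, the numeric tables can be given symbolic names in /etc/iproute2/rt_tables (“mgt” and “data” here are arbitrary names of my choosing), after which the names may be used anywhere a table number appears:

$ cat /etc/iproute2/rt_tables
[...]
1	mgt
2	data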
One final note: changes to the RPDB made with these commands do not become active immediately. After finishing a batch of updates, flush the routing cache with ip route flush cache.