Linux policy-based routing

How could Linux policy routing be so poorly documented? It’s so useful, so essential in a multi-homed environment… I’d almost advocate for its inclusion as default behavior.

What is this, you ask? To understand, we have to start with what Linux does by default in a multi-homed environment. So let’s look at one.

$ ip addr
[...]
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 78:2b:cb:66:75:c0 brd ff:ff:ff:ff:ff:ff
    inet 10.225.128.80/24 brd 10.225.128.255 scope global eth2
    inet6 fe80::7a2b:cbff:fe66:75c0/64 scope link
       valid_lft forever preferred_lft forever
[...]
6: eth5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP qlen 1000
    link/ether e4:1d:2d:14:93:60 brd ff:ff:ff:ff:ff:ff
    inet 10.225.144.80/24 brd 10.225.144.255 scope global eth5
    inet6 fe80::e61d:2dff:fe14:9360/64 scope link
       valid_lft forever preferred_lft forever

So we have two interfaces, eth2 and eth5. They’re on separate subnets, 10.225.128.0/24 and 10.225.144.0/24 respectively. In our environment, we refer to these as “spsc-mgt” and “spsc-data.” The practical circumstance is that one of these networks is faster than the other, and we would like bulk data transfer to use the faster “spsc-data” network.

If the client system also has an “spsc-data” network, everything is fine. The client addresses the system using its data address, and the link-local route prefers the data network.

$ ip route list 10.225.144.0/24
10.225.144.0/24 dev eth5  proto kernel  scope link  src 10.225.144.80

Our network environment covers a number of networks, however. So let’s say our client lives in another data network–“comp-data.” Infrastructure routing directs the traffic to the -data interface of our server correctly, but the default route on the server prefers the -mgt interface.

$ ip route list | grep ^default
default via 10.225.128.1 dev eth2

For this simple case we have two options. We can either change our default route to prefer the -data interface, or we can enumerate intended -data client networks with static routes using the data interface. Since changing the default route simply leaves us in the same situation for the -mgt network, let’s define some static routes.

$ ip route add 10.225.64.0/20 via 10.225.144.1 dev eth5
$ ip route add 10.225.176.0/24 via 10.225.144.1 dev eth5

So long as we can enumerate the networks that should always use the -data interface of our server to communicate, this basically works. But what if we want to support clients that don’t themselves have separate -mgt and -data networks? What if we have a single client–perhaps with only a -mgt network connection–that should be able to communicate individually with the server’s -mgt interface and its -data interface? In the most pathological case, what if we have a host that is only connected to the spsc-mgt (10.225.128.0/24) network, but we want that client to be able to communicate with the server’s -data interface? In this case, the link-local route will always prefer the -mgt network for the return path.
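To see why the return path is fixed, it can help to model the lookup. The sketch below is only an illustration, not how the kernel is implemented: it runs a toy longest-prefix match over the server's main table as it stands at this point (the two link-local routes, the two static routes, and the default). The helper names (`ip_to_int`, `matches`, `lookup`) and the client addresses are made up for the example.

```shell
#!/bin/sh
# Illustration only: a toy longest-prefix match, not the kernel's algorithm.

# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed when address $1 falls inside the CIDR prefix $2.
matches() {
    addr=$(ip_to_int "$1")
    net=$(ip_to_int "${2%/*}")
    bits=${2#*/}
    mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
    [ $(( addr & mask )) -eq $(( net & mask )) ]
}

# Read "prefix device" lines on stdin and print the device of the most
# specific prefix that contains destination $1.
lookup() {
    dst=$1
    best_len=-1
    best_dev=
    while read -r prefix dev; do
        len=${prefix#*/}
        if matches "$dst" "$prefix" && [ "$len" -gt "$best_len" ]; then
            best_len=$len
            best_dev=$dev
        fi
    done
    echo "$best_dev"
}

# The main table from this post, reduced to "prefix device" pairs.
main_table='10.225.128.0/24 eth2
10.225.144.0/24 eth5
10.225.64.0/20 eth5
10.225.176.0/24 eth5
0.0.0.0/0 eth2'

# A hypothetical mgt-only client: the reply always leaves via eth2.
printf '%s\n' "$main_table" | lookup 10.225.128.50
# A hypothetical client inside 10.225.64.0/20: the static route wins.
printf '%s\n' "$main_table" | lookup 10.225.70.9
```

The lookup keys only on the destination, so it makes no difference which of the server's addresses the client originally contacted.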

Policy-based routing

The best case would be to have the server select an outbound route based not on a static configuration, but in response to the incoming path of the traffic. This is the feature enabled by policy-based routing.

Linux policy routing allows us to define distinct and isolated routing tables, and then select the appropriate routing table based on the traffic context. In this situation, we have three different routing contexts to consider. The first of these is the set of routes to use when the server initiates communication.

$ ip route list table main
10.225.128.0/24 dev eth2  proto kernel  scope link  src 10.225.128.80
10.225.144.0/24 dev eth5  proto kernel  scope link  src 10.225.144.80
10.225.64.0/20 via 10.225.144.1 dev eth5
10.225.176.0/24 via 10.225.144.1 dev eth5
default via 10.225.128.1 dev eth2

A separate routing table defines routes to use when responding to traffic from the -mgt interface.

$ ip route list table 1
default via 10.225.128.1 dev eth2

The last routing table defines routes to use when responding to traffic from the -data interface.

$ ip route list table 2
default via 10.225.144.1 dev eth5

With these separate routing tables defined, the last step is to define the rules that select the correct routing table.

$ ip rule list
0:  from all lookup local
32762:  from 10.225.144.80 lookup 2
32763:  from all iif eth5 lookup 2
32764:  from 10.225.128.80 lookup 1
32765:  from all iif eth2 lookup 1
32766:  from all lookup main
32767:  from all lookup default
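The listings above show the end state rather than the commands that produce it. A minimal sketch of those commands follows, assuming the addresses, gateways, and table numbers used in this post. The `run` wrapper and the `setup_policy` function are hypothetical helpers: `run` only prints each command, so nothing is changed until its body is replaced with `"$@"` and the script is run as root.

```shell
#!/bin/sh
# Dry-run helper (hypothetical): print each command instead of executing it.
run() { echo "$*"; }

# Hypothetical helper: create one routing table and the rules that select it.
# Arguments: device, local address, gateway, table number.
setup_policy() {
    dev=$1
    addr=$2
    gw=$3
    table=$4
    # The table needs only a default route out of its own interface.
    run ip route add default via "$gw" dev "$dev" table "$table"
    # Select the table for traffic arriving on the interface...
    run ip rule add iif "$dev" table "$table"
    # ...and for traffic sourced from the interface's own address.
    run ip rule add from "$addr" table "$table"
}

setup_policy eth2 10.225.128.80 10.225.128.1 1
setup_policy eth5 10.225.144.80 10.225.144.1 2
```

Once the real commands have been applied, something like `ip route get 10.225.128.50 from 10.225.144.80` (the client address is made up) should report eth5 rather than eth2.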

Despite a lack of documentation, all of these rules may be codified in Red Hat “sysconfig”-style “network-scripts” using interface-specific route- and rule- files.

$ cat /etc/sysconfig/network-scripts/route-eth2
default via 10.225.128.1 dev eth2
default via 10.225.128.1 dev eth2 table 1

$ cat /etc/sysconfig/network-scripts/route-eth5
10.225.64.0/20 via 10.225.144.1 dev eth5
10.225.176.0/24 via 10.225.144.1 dev eth5
default via 10.225.144.1 dev eth5 table 2

$ cat /etc/sysconfig/network-scripts/rule-eth2
iif eth2 table 1
from 10.225.128.80 table 1

$ cat /etc/sysconfig/network-scripts/rule-eth5
iif eth5 table 2
from 10.225.144.80 table 2

One caveat, from the ip-rule(8) man page: “Changes to the RPDB made with these commands do not become active immediately. It is assumed that after a script finishes a batch of updates, it flushes the routing cache with ip route flush cache.”

Two Compasses

A captain and his first mate were sailing on the open ocean. Each possessed a compass, with a third affixed to the helm. The first mate approached the captain, saying, “Sir: in my clumsiness this morning, I have dropped my compass into the sea! What should I do?” After a moment, the captain wordlessly turned and threw his own compass into the sea. The first mate watched and became enlightened.

The Psalms and Me

Every time I read the Psalms:

Hear my prayer, O LORD, Give ear to my supplications!

“Oh, maybe this will be a nice verse to uplift my friend!”

For the enemy has persecuted my soul; He has crushed my life to the ground; He has made me dwell in dark places, like those who have long been dead. Therefore my spirit is overwhelmed within me; My heart is appalled within me.

“Yes! I’ll bet this message will really resonate with my friend! Sometimes we all feel downtrodden.”

Answer me quickly, O LORD, my spirit fails; Do not hide Your face from me, Or I will become like those who go down to the pit.

“Yes! When we are at our lowest, we should run to God!”

And in Your lovingkindness, cut off my enemies And destroy all those who afflict my soul

“Um… David? We cool, bro?”

Awake to punish all the nations; Do not be gracious to any who are treacherous in iniquity.

“Hey… I don’t know if I meant all that…”

Scatter them by Your power, and bring them down, O Lord, our shield.

Destroy them in wrath, destroy them that they may be no more

Deal with them as You did with Midian, […] they became manure for the ground

“Hold on, there, man! Let’s not get too crazy…”

How blessed will be the one who seizes and dashes your little ones Against the rock.

sigh

“Come on, David. Things were going so well. I mean, sure: you murdered a man to cover up the fact that you impregnated his wife… but you were sorry, right?”

Wash me thoroughly from my iniquity And cleanse me from my sin.

“See? There. That’s more like it…”

O God, shatter their teeth in their mouth; Break out the fangs of the young lions, O LORD.

“No! Bad David! No biscuit!”

“Whatcha got for me, Jesus?”

You have heard that it was said, "You shall love your neighbor and hate your enemy." But I say to you, love your enemies and pray for those who persecute you, so that you may be sons of your Father who is in heaven; for He causes His sun to rise on the evil and the good, and sends rain on the righteous and the unrighteous. For if you love those who love you, what reward do you have?

“Aw, yeah. That’s the stuff.”

“David, have you met this guy? I think you should probably meet this guy.”

6 February 2015

I brought my Go board into work today because Jon mentioned that he was interested in learning how to play. Maybe I’ll finally get to start playing regularly. Yet one more example of how much better life is here.

Speaking of which, I spent all night dreaming about high-stress situations, only to wake up and realize that life is actually so much easier than I was dreaming. For some reason we had moved back to KAUST, and I was trying to justify it to ourselves and to our family. Meanwhile, we were apparently expecting guests for some kind of two-week visit, and I was stressing out about having scheduled activities–notably a schedule of who would be preparing food–ahead of time. Doesn’t even make sense; but I certainly felt better once I was awake.

We had a meeting scheduled with Seagate today to talk about the storage system for Saga; but they’ve just cancelled at the last minute. Quite frustrating that we weren’t told until 10 minutes before the meeting: I was feeling pretty bad this morning, and had been contemplating working from home today. Still, maybe it’s for the best: maybe it’s better that I not try to work from home. Better to communicate with Aaron, after all.

I passed Aaron some docs for Slurm, but it’s certainly not finished. Still, I also gave him links to all the source material in the existing user guide, so he has something to look at, anyway.

I need to port our notes from the bench review into the github issues, and review my flagged email in general.

At home, I want to go through all the inboxes and pay bills. Hooray for the weekend.

Meanwhile, I still need to finish up that pam stack for local passwords and Google OTP.

Oh: and I need to send Thomas a picture of the chair we got!

5 February 2015

Home

I finally got curtain brackets mounted in the family room, though no curtains are hung yet. Last night my drill battery died, and I ended up with a drill bit stuck in the wall; but now it’s out, anchors in, and brackets hung. Tonight, curtains.

Tech

My piratebox still won’t install the software from the usb stick, claiming that the storage device is full. Since there’s plenty of space left on the usb stick, I can only assume it’s talking about the internal storage. Maybe I flashed it incorrectly? Or maybe a previous failed install has left it without some space it expects to have? Or maybe I need to do some kind of reset?

In any case, I was able to telnet in, re-flashed the firmware manually, hit the reset button for good measure, and then tried installing again from install_piratebox.zip. This time it worked, and I have a “PirateBox” ssid once again following me around. I must admit, though: I’m a bit disappointed by the “new, responsive” layout. I think it’s really only the media browser that has improved.

I don’t know that I’m going to do anything about it today, but I’d really like to install OpenProject on civilfritz. I think it would help Andi and me work together to get projects done at home.

CU

We had a meeting with Peak and IBM storage about a possible replacement for the existing home and projects storage N-series system. They are proposing a black-box GPFS system, “IBM Storwize V7000 Unified,” which I should look into further. They have also mentioned ESS née GSS (hopefully still a grey-box solution), though they didn’t have many details on what that would look like. They weren’t even sure if it’s running Linux or AIX. (I’m hoping that their first instinct was wrong, and that it’s Linux on Power.)

We have a talk scheduled about upgrading the PetaLibrary. I specifically am of the opinion that we should be using the PetaLibrary to house home and projects. The way I see it, we should have two independent filesets, /pl/home/ and /pl/projects/, and just NFS export and snapshot each of them. We could even still have each home and project directory be a dependent fileset underneath, if we want.

I’m going to keep working on the user guide today. Hopefully I’ll have a batch queueing / slurm guide to give to Aaron by tomorrow.

I also need to work on a local passwords pam config to tick off the last of my performance plan post-its.

Before I do anything, though, I should look through org-mode and calendar to see what else I might be forgetting.

Understanding OpenStack networking with Neutron and Open vSwitch

I couldn’t figure out OpenStack’s networking system well enough to get my instances’ floating IPs to work, even from the packstack --allinone host itself. I read the RDO document “Networking in too much detail,” but even that seemed to assume more knowledge about how things fit together than I had.

I eventually got some help from the #rdo irc channel; but I think the best documentation ended up being “Visualizing OpenStack Networking Service Traffic in the Cloud” from the OpenStack Operations Guide.

In the end, most of my problem was that I was trying to assign an IP address to my br-ex interface that conflicted with the l3-agent that was already connected to the br-ex bridge. Literally any other address in the subnet that wasn’t also used by an instance gave me the behavior I was looking for: being able to ping the floating addresses from the host.

ip addr add 172.24.4.225/28 dev br-ex
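For reference, a /28 leaves little room to choose from. The sketch below lists the usable host range of a subnet, assuming the usual convention that the network and broadcast addresses are reserved; the helper names (`ip_to_int`, `int_to_ip`, `usable_range`) are hypothetical and exist only for this example.

```shell
#!/bin/sh
# Convert a dotted-quad IPv4 address to an integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Convert an integer back to dotted-quad notation.
int_to_ip() {
    echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# Print the first and last usable host addresses of a CIDR prefix,
# skipping the network and broadcast addresses.
usable_range() {
    bits=${1#*/}
    mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
    net=$(( $(ip_to_int "${1%/*}") & mask ))
    bcast=$(( net | (~mask & 0xFFFFFFFF) ))
    echo "$(int_to_ip $(( net + 1 ))) $(int_to_ip $(( bcast - 1 )))"
}

usable_range 172.24.4.224/28   # prints: 172.24.4.225 172.24.4.238
```

That is fourteen addresses in total, shared between the host's br-ex address, the l3-agent, and the instances' floating IPs.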

Once that was done, I was able to configure NAT on the same host. This is described at the end of the “Networking in too much detail” document, and was echoed by the individual who helped me in #rdo; but I modified the POSTROUTING rule to identify the external network interface, p4p1. If the external interface is left unspecified, then even internal traffic from the host to the guests will be rewritten to the external address, which isn’t valid on the floating-IP subnet.

iptables -A FORWARD -d 172.24.4.224/28 -j ACCEPT
iptables -A FORWARD -s 172.24.4.224/28 -j ACCEPT
iptables -t nat -I POSTROUTING 1 -s 172.24.4.224/28 -o p4p1 -j MASQUERADE

Rebuilding civilfritz TLS

Some random notes on when I rebuilt TLS for civilfritz using gnutls and cacert.

  • https

  • ldap start_tls

http://www.gnutls.org/manual/html_node/certtool-Invocation.html

certtool --generate-privkey --outfile civilfritz.net.key

certtool --generate-request --load-privkey civilfritz.net.key --outfile civilfritz.net.csr

https://www.cacert.org

vi civilfritz.net.pem

certtool --certificate-info < civilfritz.net.pem

Subject Alternative Name (not critical):
   DNSname: civilfritz.net
   XMPP Address: civilfritz.net
   DNSname: www.civilfritz.net
   XMPP Address: www.civilfritz.net

$ cat civilfritz.net.pem /etc/ssl/certs/cacert.org.pem | certtool --verify-chain
Certificate[0]: CN=civilfritz.net
    Issued by: O=Root CA,OU=http://www.cacert.org,CN=CA Cert Signing Authority,EMAIL=support@cacert.org
    Verifying against certificate[1].
Error: Issuer's name: O=CAcert Inc.,OU=http://www.CAcert.org,CN=CAcert Class 3 Root
certtool: issuer name does not match the next certificate


$ cat civilfritz.net.pem cacert.org.pem | certtool --verify-chain
Certificate[0]: CN=civilfritz.net
    Issued by: O=Root CA,OU=http://www.cacert.org,CN=CA Cert Signing Authority,EMAIL=support@cacert.org
    Verifying against certificate[1].
    Verification output: Verified.

Certificate[1]: O=Root CA,OU=http://www.cacert.org,CN=CA Cert Signing Authority,EMAIL=support@cacert.org
    Issued by: O=Root CA,OU=http://www.cacert.org,CN=CA Cert Signing Authority,EMAIL=support@cacert.org
    Verification output: Verified.

Chain verification output: Verified.

Migrating from Apache to Nginx

I've had this hanging out as a draft for a while, and never got it polished up as well as I'd like; but in case it's useful to anyone, here are some notes on my recent migration to nginx.


I run civilfritz.net on a VPS, but I do my best to keep as low a monthly payment as possible. That means running on the smallest (lowest-memory) VM available from my provider: a 1GB Linode.

1GB used to seem like a lot of memory; but when I'm trying to run a Minecraft server alongside a preforking Apache server alongside a Salt master, it fills up quickly.

I've wanted to try moving to a lighter-weight webserver for a while; so today I'm porting my Apache config to Nginx.

sites-available/civilfritz.net

civilfritz.net runs as a pair of Apache virtual hosts to support http and https. I want the majority of the configuration between the vhosts to be identical, so I include a separate common configuration file in each.

The http vhost includes the common config, as well as a rewrite for the ikiwiki /auth section. (Authentication should only happen over https, but attempts to authenticate over http should be redirected there.)

# apache http vhost

<VirtualHost *:80>
    RewriteEngine on
    RewriteRule ^/auth(|/.*)$ https://%{HTTP_HOST}%{REQUEST_URI} [R,L]

    Include sites-available/civilfritz.net-common
</VirtualHost>

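The transition to nginx was pretty simple. The ikiwiki /auth section is a virtually equivalent rewrite rule, and the include directive is also similar. (One difference: nginx's permanent flag issues a 301 redirect, whereas Apache's bare R flag defaults to a 302.)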

# nginx http vhost

server
{
        listen 80;

        rewrite ^/auth(|/.*)$ https://$server_name:443$request_uri? permanent;

        include sites-available/civilfritz.net-common;
}
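The pattern `^/auth(|/.*)$` is anchored so that it covers /auth and everything beneath it, but not paths that merely begin with the same letters. A rough sketch of the same test as a shell `case` pattern follows; `matches_auth` is a hypothetical helper and the sample paths are made up.

```shell
#!/bin/sh
# Hypothetical helper: succeed when a request path would match ^/auth(|/.*)$
matches_auth() {
    case $1 in
        /auth|/auth/*) return 0 ;;
        *) return 1 ;;
    esac
}

for path in /auth /auth/ /auth/login /authors /wiki/auth; do
    if matches_auth "$path"; then
        echo "$path -> redirected to https"
    else
        echo "$path -> served over http"
    fi
done
```

Only the first three sample paths are redirected; /authors and /wiki/auth fall through to the common configuration.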

The https vhost also includes the common config, as well as the requisite ssl config. To support http basic authentication, an instance of pwauth is configured as an external authentication module, which proxies to PAM.

# apache https vhost

<VirtualHost *:443>
    SSLEngine on
    SSLCertificateFile /etc/ssl/certs/civilfritz.net.pem
    SSLCertificateKeyFile /etc/ssl/private/civilfritz.net.key

    AddExternalAuth pwauth /usr/sbin/pwauth
    SetExternalAuthMethod pwauth pipe

    <Location />
        AuthType Basic
        AuthBasicProvider external
        AuthExternal pwauth
        AuthName "civilfritz.net"
    </Location>

    Include sites-available/civilfritz.net-common

    <Location /auth>
        Require valid-user
    </Location>
</VirtualHost>

Again, the nginx vhost starts out similarly. Listen on tcp 443, initialize the requisite certificate and key, and include the common config.

pwauth is an Apache-specific interface, so I wasn't able to use it to proxy to PAM in nginx; but the auth_pam module works well enough and, since I'm not trying to use PAM to auth directly against local unix files (I'm using sssd to access kerberos), I still don't have to run the server as root.

# nginx ssl vhost

server
{
        listen 443 ssl;

        ssl_certificate /etc/ssl/certs/civilfritz.net.pem;
        ssl_certificate_key /etc/ssl/private/civilfritz.net.key;

        include sites-available/civilfritz.net-common;

        location /auth
        {
                auth_pam "civilfritz.net";
                include fastcgi_params;
                fastcgi_pass unix:/var/run/fcgiwrap.socket;
                fastcgi_index ikiwiki.cgi;
                fastcgi_param REMOTE_USER $remote_user;
        }
}

The semantics of Nginx basic authentication differ from Apache's. In Apache I was able to set AuthName globally (at /) and then require authentication arbitrarily at lower points in the tree. Here, the inclusion of the auth_pam directive implies an auth requirement; so I'll have to repeat the authentication realm ("civilfritz.net") anywhere I want to authenticate.

The biggest difference, though, is how Nginx handles cgi. Whereas Apache builds in cgi execution for nominated files or directories, Nginx proxies all cgi execution through an external interface: here, fastcgi. A packed-in fastcgi_params file contains some useful default cgi environment variables, but omits REMOTE_USER. I set it here so that ikiwiki can determine what user has authenticated.
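One way to check that the variable actually reaches a CGI program is a throwaway script that echoes it back. This is only a sketch: `whoami_cgi` is a hypothetical name, and it assumes the script is run by the same fastcgi wrapper from a location that sets the REMOTE_USER param.

```shell
#!/bin/sh
# Throwaway CGI (hypothetical): report the REMOTE_USER the server passed in.
whoami_cgi() {
    # A CGI response is a header block, a blank line, then the body.
    printf 'Content-Type: text/plain\r\n\r\n'
    printf 'REMOTE_USER=%s\n' "${REMOTE_USER:-<unset>}"
}

whoami_cgi
```

If the fastcgi_param line is missing, the body should read `REMOTE_USER=<unset>`.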

sites-available/civilfritz.net-common

The vast majority of my local config is in the common file included by both vhosts.

# Apache initial config

ServerAdmin anderbubble@gmail.com
DirectoryIndex index.html
ServerName civilfritz.net
ServerAlias www.civilfritz.net

LogLevel warn
ErrorLog /var/log/apache2/error.log
CustomLog /var/log/apache2/access.log combined

DocumentRoot /srv/www/wiki

RewriteEngine on

Alias /robots.txt /srv/www/robots.txt

Alias /minecraft/overview /srv/www/minecraft-overviewer

<Location /users/janderson/private>
    Require user janderson
</Location>

<Directory />
    Options FollowSymLinks
    AllowOverride None
    Order deny,allow
    Deny from all
</Directory>

<Directory /srv/www>
    Order allow,deny
    Allow from all
</Directory>

<Directory /srv/www/wiki>
    AddHandler cgi-script .cgi
    Order allow,deny
    Allow from all
    Options +ExecCGI
    ErrorDocument 404 /ikiwiki.cgi
    ExpiresActive on
    ExpiresDefault "access plus 0 seconds"
    Header set Cache-Control "no-store, no-cache, must-revalidate, max-age=0"
    Header set Pragma "no-cache"
</Directory>

<Location /gitweb>
    Order allow,deny
    Allow from all
    DirectoryIndex index.cgi
</Location>

<Directory /home/*/public_html/>
    AllowOverride FileInfo AuthConfig Limit Indexes Options=ExecCGI
</Directory>

WSGIApplicationGroup %{GLOBAL}

New Nginx config

sites-available/civilfritz.net-common

index index.html;
server_name civilfritz.net www.civilfritz.net;

root /srv/www/wiki/;

location /
{
        error_page 404 /ikiwiki-404.cgi;
        expires -1;
}

location /robots.txt
{
        alias /srv/www/robots.txt;
}

location /minecraft/overview
{
        alias /srv/www/minecraft-overviewer;
}

location /ikiwiki.cgi
{
        include fastcgi_params;
        fastcgi_pass unix:/var/run/fcgiwrap.socket;
        fastcgi_index ikiwiki.cgi;
}

location /ikiwiki-404.cgi
{
        internal;
        include fastcgi_params;
        fastcgi_pass unix:/var/run/fcgiwrap.socket;
        fastcgi_param REDIRECT_URL $request_uri;
        # also needed to remove explicit 200
        fastcgi_param REDIRECT_STATUS 404;
}

location ~ /gitweb/(index|gitweb).cgi
{
        root /usr/share/;
        gzip off;
        include fastcgi_params;
        fastcgi_pass unix:/var/run/fcgiwrap.socket;
}

location /gitweb/
{
        root /usr/share/;
        gzip off;
        index index.cgi;
}

location ~ ^/~(.+?)(/.*)?$
{
        alias /home/$1/public_html$2;
        autoindex on;
}
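The last location block stands in for Apache's per-user public_html directories: the first capture is the username and the second is whatever follows it. The sketch below approximates that mapping with shell parameter expansion. `userdir_path` is a hypothetical helper, the username is made up, and unlike the regex it rejects an empty username outright.

```shell
#!/bin/sh
# Hypothetical helper: map a /~user/... request path to the file system path
# that "alias /home/$1/public_html$2" would produce.
userdir_path() {
    uri=$1
    case $uri in
        "/~"?*) ;;
        *) return 1 ;;
    esac
    rest=${uri#"/~"}           # e.g. alice/notes/a.html
    user=${rest%%/*}           # e.g. alice
    suffix=${rest#"$user"}     # e.g. /notes/a.html (may be empty)
    [ -n "$user" ] || return 1
    printf '/home/%s/public_html%s\n' "$user" "$suffix"
}

userdir_path /~alice/notes/a.html   # prints: /home/alice/public_html/notes/a.html
userdir_path /~alice                # prints: /home/alice/public_html
```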

Salt state

nginx.sls

nginx:

  pkg:
    - installed

  service:
    - running
    - enable: True
    - reload: True
    - watch:
      - pkg: nginx

civilfritz/www.sls

include:
  - nginx

[...]

/etc/nginx/sites-enabled/default:
  file:
    - absent
    - watch_in:
      - service: nginx

/etc/nginx/sites-enabled/civilfritz.net:
  file:
    - symlink
    - target: /etc/nginx/sites-available/civilfritz.net
    - require:
      - file: /etc/nginx/sites-available/civilfritz.net
    - watch_in:
      - service: nginx

/etc/nginx/sites-available/civilfritz.net:
  file:
    - managed
    - source: salt://civilfritz/nginx-sites/civilfritz.net
    - user: root
    - group: root
    - mode: 0644
    - require:
      - file: /etc/nginx/sites-available/civilfritz.net-common
    - watch_in:
      - service: nginx

/etc/nginx/sites-available/civilfritz.net-common:
  file:
    - managed
    - source: salt://civilfritz/nginx-sites/civilfritz.net-common
    - user: root
    - group: root
    - mode: 0644
    - watch_in:
      - service: nginx

/srv/www/wiki/ikiwiki-404.cgi:
  file:
    - symlink
    - target: /srv/www/wiki/ikiwiki.cgi