The curc::sysconfig::scinet Puppet module

I’ve been working on a new module, curc::sysconfig::scinet, which will generally do the Right Thing™ when configuring a host on the CURC science network, with as little configuration as possible.

Let’s look at some examples.

login nodes

class { 'curc::sysconfig::scinet':
  location => 'comp',
  mgt_if   => 'eth0',
  dmz_if   => 'eth1',
  notify   => Class['network'],
}

This is the config used on a new-style login node like login05 and login07. (What makes them new-style? Mostly just that they’ve had their interfaces cleaned up to use eth0 for “mgt” and eth1 for “dmz”.)

Here’s the routing table that this produced on login07:

$ ip route list
10.225.160.0/24 dev eth0  proto kernel  scope link  src 10.225.160.32
10.225.128.0/24 via 10.225.160.1 dev eth0
192.12.246.0/24 dev eth1  proto kernel  scope link  src 192.12.246.39
10.225.0.0/20 via 10.225.160.1 dev eth0
10.225.0.0/16 via 10.225.160.1 dev eth0  metric 110
10.128.0.0/12 via 10.225.160.1 dev eth0  metric 110
default via 192.12.246.1 dev eth1  metric 100
default via 10.225.160.1 dev eth0  metric 110

Connections to “mgt” subnets use the “mgt” interface eth0, either by the link-local route or the static routes via comp-mgt-gw (10.225.160.1). Connections to the “general” subnet (a.k.a. “vlan 2049”), as well as the rest of the science network (“data” and “svc” networks) also use eth0 by static route. The default eth0 route is configured by DHCP, but the interface has a default metric of 110, so it doesn’t conflict with or supersede eth1’s default route, which is configured with a lower metric of 100.
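To make the role of the metrics concrete, here is a minimal Python sketch of how a route is chosen from the table above: the longest matching prefix wins, and the metric breaks ties between equally specific routes. It is only a model of the lookup, not part of the module; the name select_route and the tuple layout are my own, and routes printed without a metric are assumed to have metric 0.

```python
import ipaddress

# Routes copied from the `ip route list` output on login07 above.
# Each entry: (destination, gateway or None if directly attached, device, metric).
# Assumption: a route printed without a metric has metric 0.
ROUTES = [
    ("10.225.160.0/24", None, "eth0", 0),
    ("10.225.128.0/24", "10.225.160.1", "eth0", 0),
    ("192.12.246.0/24", None, "eth1", 0),
    ("10.225.0.0/20", "10.225.160.1", "eth0", 0),
    ("10.225.0.0/16", "10.225.160.1", "eth0", 110),
    ("10.128.0.0/12", "10.225.160.1", "eth0", 110),
    ("0.0.0.0/0", "192.12.246.1", "eth1", 100),
    ("0.0.0.0/0", "10.225.160.1", "eth0", 110),
]


def select_route(destination, routes=ROUTES):
    """Pick a route for destination: longest prefix first, then lowest metric."""
    address = ipaddress.ip_address(destination)
    candidates = [
        route for route in routes
        if address in ipaddress.ip_network(route[0])
    ]
    if not candidates:
        return None
    return min(
        candidates,
        key=lambda route: (-ipaddress.ip_network(route[0]).prefixlen, route[3]),
    )
```

For an address outside the science network (198.51.100.7 is an arbitrary example), only the two default routes match, and the metric-100 route on eth1 is selected. An address such as 10.225.128.10 matches the more specific 10.225.128.0/24 route and leaves via eth0.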

Speaking of eth1, the “dmz” interface is configured statically, using information retrieved from DNS by Puppet.

$ cat /etc/sysconfig/network-scripts/ifcfg-eth1
TYPE=Ethernet
DEVICE=eth1
BOOTPROTO=static
HWADDR=00:50:56:88:2E:36
ONBOOT=yes
IPADDR=192.12.246.39
NETMASK=255.255.255.0
GATEWAY=192.12.246.1
METRIC=100
IPV4_ROUTE_METRIC=100

Usually the routing priority of the “dmz” interface would mean that inbound connections to the “mgt” interface from outside of the science network would be blocked when the “dmz”-bound response is filtered by rp_filter; but curc::sysconfig::scinet also configures routing policy for eth0, so traffic on that interface always returns from that interface.

$ ip rule show | grep 'lookup 1'
32764:  from 10.225.160.32 lookup 1
32765:  from all iif eth0 lookup 1

$ ip route list table 1
default via 10.225.160.1 dev eth0

This allows me to ping login07.rc.int.colorado.edu from my office workstation.

$ ping -c 1 login07.rc.int.colorado.edu
PING login07.rc.int.colorado.edu (10.225.160.32) 56(84) bytes of data.
64 bytes from 10.225.160.32: icmp_seq=1 ttl=62 time=0.507 ms

--- login07.rc.int.colorado.edu ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 1ms
rtt min/avg/max/mdev = 0.507/0.507/0.507/0.000 ms

Because the default route for eth0 is actually configured, outbound routing from login07 is resilient to failure of the “dmz” link.

# ip route list | grep -v eth1
10.225.160.0/24 dev eth0  proto kernel  scope link  src 10.225.160.32
10.225.128.0/24 via 10.225.160.1 dev eth0
10.225.0.0/20 via 10.225.160.1 dev eth0
10.225.0.0/16 via 10.225.160.1 dev eth0  metric 110
10.128.0.0/12 via 10.225.160.1 dev eth0  metric 110
default via 10.225.160.1 dev eth0  metric 110

Traffic destined to leave the science network simply proceeds to the next preferred (and, in this case, only remaining) default route, comp-mgt-gw.

DHCP, DNS, and the FQDN

Tangentially, it’s important to note that the DHCP configuration of eth0 will tend to rewrite /etc/resolv.conf and the search path it defines, with the effect of changing the FQDN of the host to login07.rc.int.colorado.edu. Because login nodes are logically (and historically) external hosts, not internal hosts, they should prefer their external identity to their internal identity. As such, we override the domain search path on login nodes so that they discover their rc.colorado.edu FQDNs first.

# cat /etc/dhcp/dhclient-eth0.conf
supersede domain-search "rc.colorado.edu", "rc.int.colorado.edu";
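The effect of the search order can be sketched in a few lines of Python. This is a simplification: a real resolver also applies options such as ndots, and candidate_fqdns is a hypothetical helper, not something the module provides.

```python
# Search order set by the dhclient "supersede domain-search" line above.
LOGIN_SEARCH = ["rc.colorado.edu", "rc.int.colorado.edu"]


def candidate_fqdns(short_name, search_domains):
    """Return the names tried for an unqualified host name, in search order."""
    return [f"{short_name}.{domain}" for domain in search_domains]
```

With this order, candidate_fqdns("login07", LOGIN_SEARCH) lists login07.rc.colorado.edu ahead of login07.rc.int.colorado.edu, so the external name is the first one tried.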

PetaLibrary/repl

The PetaLibrary/repl GPFS NSD nodes replnsd{01,02} are still in the “COMP” datacenter, but only attach to “mgt” and “data” networks.

class { 'curc::sysconfig::scinet':
  location         => 'comp',
  mgt_if           => 'eno2',
  data_if          => 'enp17s0f0',
  other_data_rules => [ 'from 10.225.176.61 table 2',
                        'from 10.225.176.62 table 2',
                        ],
  notify           => Class['network_manager::service'],
}

This config produces the following routing table on replnsd01:

$ ip route list
default via 10.225.160.1 dev eno2  proto static  metric 110
default via 10.225.176.1 dev enp17s0f0  proto static  metric 120
10.128.0.0/12 via 10.225.160.1 dev eno2  metric 110
10.128.0.0/12 via 10.225.176.1 dev enp17s0f0  metric 120
10.225.0.0/20 via 10.225.160.1 dev eno2
10.225.0.0/16 via 10.225.160.1 dev eno2  metric 110
10.225.0.0/16 via 10.225.176.1 dev enp17s0f0  metric 120
10.225.64.0/20 via 10.225.176.1 dev enp17s0f0
10.225.128.0/24 via 10.225.160.1 dev eno2
10.225.144.0/24 via 10.225.176.1 dev enp17s0f0
10.225.160.0/24 dev eno2  proto kernel  scope link  src 10.225.160.59  metric 110
10.225.160.49 via 10.225.176.1 dev enp17s0f0  proto dhcp  metric 120
10.225.176.0/24 dev enp17s0f0  proto kernel  scope link  src 10.225.176.59  metric 120

…with the expected interface-consistent policy-targeted routing tables.

$ ip route list table 1
default via 10.225.160.1 dev eno2

$ ip route list table 2
default via 10.225.176.1 dev enp17s0f0

Static routes for “mgt” and “data” subnets are defined for their respective interfaces. As on the login nodes above, default routes are specified for both interfaces as well, with the lower-metric “mgt” interface eno2 being preferred. (This is configurable using the mgt_metric and data_metric parameters.)

Perhaps the most notable aspect of the PetaLibrary/repl network config is the provisioning of the GPFS CES floating IP addresses 10.225.176.{61,62}. These addresses are added to the enp17s0f0 interface dynamically by GPFS, and are not defined with curc::sysconfig::scinet; but the config must reference these addresses to implement proper interface-consistent policy-targeted routing tables. The version of Puppet deployed at CURC lacks the semantics to infer these rules from a more semantic data_ip parameter, so the other_data_rules parameter is used instead.

other_data_rules => [ 'from 10.225.176.61 table 2',
                      'from 10.225.176.62 table 2',
                      ],
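For illustration only, this is roughly the inference a data_ip-style parameter would need to perform, written in Python rather than Puppet. The function source_rules is hypothetical and is not part of curc::sysconfig::scinet.

```python
import ipaddress

# Floating CES addresses from the text above.
CES_ADDRESSES = ["10.225.176.61", "10.225.176.62"]


def source_rules(addresses, table):
    """Build one 'from <address> table <n>' rule string per source address."""
    rules = []
    for address in addresses:
        # Raises ValueError early if an entry is not a valid IP address.
        ipaddress.ip_address(address)
        rules.append(f"from {address} table {table}")
    return rules
```

Calling source_rules(CES_ADDRESSES, 2) yields the same two strings that are passed by hand to other_data_rules.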

Blanca/ICS login node

Porting the Blanca login node would be great because it’s got “dmz”, “mgt”, and “data” interfaces, so it would exercise the full gamut of the module’s features.

Linux policy-based routing

How could Linux policy routing be so poorly documented? It’s so useful, so essential in a multi-homed environment… I’d almost advocate for its inclusion as default behavior.

What is this, you ask? To understand, we have to start with what Linux does by default in a multi-homed environment. So let’s look at one.

$ ip addr
[...]
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 78:2b:cb:66:75:c0 brd ff:ff:ff:ff:ff:ff
    inet 10.225.128.80/24 brd 10.225.128.255 scope global eth2
    inet6 fe80::7a2b:cbff:fe66:75c0/64 scope link
       valid_lft forever preferred_lft forever
[...]
6: eth5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc mq state UP qlen 1000
    link/ether e4:1d:2d:14:93:60 brd ff:ff:ff:ff:ff:ff
    inet 10.225.144.80/24 brd 10.225.144.255 scope global eth5
    inet6 fe80::e61d:2dff:fe14:9360/64 scope link
       valid_lft forever preferred_lft forever

So we have two interfaces, eth2 and eth5. They’re on separate subnets, 10.225.128.0/24 and 10.225.144.0/24 respectively. In our environment, we refer to these as “spsc-mgt” and “spsc-data.” The practical circumstance is that one of these networks is faster than the other, and we would like bulk data transfer to use the faster “spsc-data” network.

If the client system also has an “spsc-data” network, everything is fine. The client addresses the system using its data address, and the link-local route prefers the data network.

$ ip route list 10.225.144.0/24
10.225.144.0/24 dev eth5  proto kernel  scope link  src 10.225.144.80

Our network environment covers a number of networks, however. So let’s say our client lives in another data network–“comp-data.” Infrastructure routing directs the traffic to the -data interface of our server correctly, but the default route on the server prefers the -mgt interface.

$ ip route list | grep ^default
default via 10.225.128.1 dev eth2

For this simple case we have two options. We can either change our default route to prefer the -data interface, or we can enumerate intended -data client networks with static routes using the data interface. Since changing the default route simply leaves us in the same situation for the -mgt network, let’s define some static routes.

$ ip route add 10.225.64.0/20 via 10.225.144.1 dev eth5
$ ip route add 10.225.176.0/24 via 10.225.144.1 dev eth5

So long as we can enumerate the networks that should always use the -data interface of our server to communicate, this basically works. But what if we want to support clients that don’t themselves have separate -mgt and -data networks? What if we have a single client, perhaps with only a -mgt network connection, that should be able to communicate individually with the server’s -mgt interface and its -data interface? In the most pathological case, what if we have a host that is only connected to the spsc-mgt (10.225.128.0/24) network, but we want that client to be able to communicate with the server’s -data interface? In this case, the link-local route will always prefer the -mgt network for the return path.

Policy-based routing

The best case would be to have the server select an outbound route based not on a static configuration, but in response to the incoming path of the traffic. This is the feature enabled by policy-based routing.

Linux policy routing allows us to define distinct and isolated routing tables, and then select the appropriate routing table based on the traffic context. In this situation, we have three different routing contexts to consider. The first of these is the set of routes to use when the server initiates communication.

$ ip route list table main
10.225.128.0/24 dev eth2  proto kernel  scope link  src 10.225.128.80
10.225.144.0/24 dev eth5  proto kernel  scope link  src 10.225.144.80
10.225.64.0/20 via 10.225.144.1 dev eth5
10.225.176.0/24 via 10.225.144.1 dev eth5
default via 10.225.128.1 dev eth2

A separate routing table defines routes to use when responding to traffic from the -mgt interface.

$ ip route list table 1
default via 10.225.128.1 dev eth2

The last routing table defines routes to use when responding to traffic from the -data interface.

$ ip route list table 2
default via 10.225.144.1 dev eth5

With these separate routing tables defined, the last step is to define the rules that select the correct routing table.

$ ip rule list
0:  from all lookup local
32762:  from 10.225.144.80 lookup 2
32763:  from all iif eth5 lookup 2
32764:  from 10.225.128.80 lookup 1
32765:  from all iif eth2 lookup 1
32766:  from all lookup main
32767:  from all lookup default
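To see how the rules and tables interact, here is a small Python model of the lookup. It is a sketch under simplifying assumptions: the local and default tables are left out, a rule matches only on source address or incoming interface, and the client address 10.225.128.50 used below is a made-up host on spsc-mgt. The names lookup and longest_match are mine.

```python
import ipaddress

# Routing tables copied from the `ip route list table ...` output above.
# Each route: (destination, gateway or None if directly attached, device).
TABLES = {
    "main": [
        ("10.225.128.0/24", None, "eth2"),
        ("10.225.144.0/24", None, "eth5"),
        ("10.225.64.0/20", "10.225.144.1", "eth5"),
        ("10.225.176.0/24", "10.225.144.1", "eth5"),
        ("0.0.0.0/0", "10.225.128.1", "eth2"),
    ],
    "1": [("0.0.0.0/0", "10.225.128.1", "eth2")],
    "2": [("0.0.0.0/0", "10.225.144.1", "eth5")],
}

# Rules in priority order:
# (priority, source address or None, incoming interface or None, table).
RULES = [
    (32762, "10.225.144.80", None, "2"),
    (32763, None, "eth5", "2"),
    (32764, "10.225.128.80", None, "1"),
    (32765, None, "eth2", "1"),
    (32766, None, None, "main"),
]


def longest_match(destination, routes):
    """Return the most specific route containing destination, or None."""
    address = ipaddress.ip_address(destination)
    matches = [r for r in routes if address in ipaddress.ip_network(r[0])]
    if not matches:
        return None
    return max(matches, key=lambda r: ipaddress.ip_network(r[0]).prefixlen)


def lookup(destination, source=None, iif=None):
    """Walk the rules in order and return (table, route) for the first hit."""
    for _priority, rule_source, rule_iif, table in RULES:
        if rule_source is not None and rule_source != source:
            continue
        if rule_iif is not None and rule_iif != iif:
            continue
        route = longest_match(destination, TABLES[table])
        if route is not None:
            return table, route
    return None
```

A reply sent from the data address, lookup("10.225.128.50", source="10.225.144.80"), is resolved in table 2 and leaves through eth5, which is the pathological case from earlier handled correctly. With no source address bound, lookup("10.225.128.50") falls through to the main table and uses the directly attached route on eth2.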

Despite a lack of documentation, all of these rules may be codified in Red Hat “sysconfig”-style “network-scripts” using interface-specific route- and rule- files.

$ cat /etc/sysconfig/network-scripts/route-eth2
default via 10.225.128.1 dev eth2
default via 10.225.128.1 dev eth2 table 1

$ cat /etc/sysconfig/network-scripts/route-eth5
10.225.64.0/20 via 10.225.144.1 dev eth5
10.225.176.0/24 via 10.225.144.1 dev eth5
default via 10.225.144.1 dev eth5 table 2

$ cat /etc/sysconfig/network-scripts/rule-eth2
iif eth2 table 1
from 10.225.128.80 table 1

$ cat /etc/sysconfig/network-scripts/rule-eth5
iif eth5 table 2
from 10.225.144.80 table 2
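The four files follow a regular pattern, which the Python sketch below spells out. The helpers route_lines and rule_lines are hypothetical; they only reproduce the text of the files shown above and do not write anything to /etc/sysconfig/network-scripts.

```python
def route_lines(device, gateway, table, main_destinations=()):
    """Lines for route-<device>: main-table routes, then the per-table default."""
    lines = [
        f"{destination} via {gateway} dev {device}"
        for destination in main_destinations
    ]
    lines.append(f"default via {gateway} dev {device} table {table}")
    return lines


def rule_lines(device, address, table):
    """Lines for rule-<device>: select the table by incoming interface and source."""
    return [
        f"iif {device} table {table}",
        f"from {address} table {table}",
    ]
```

For example, route_lines("eth5", "10.225.144.1", 2, ["10.225.64.0/20", "10.225.176.0/24"]) gives the three lines of route-eth5, and rule_lines("eth5", "10.225.144.80", 2) gives the two lines of rule-eth5. Passing ["default"] as main_destinations for eth2 reproduces its main-table default route as well.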

One caveat from the ip-rule(8) man page: “Changes to the RPDB made with these commands do not become active immediately. It is assumed that after a script finishes a batch of updates, it flushes the routing cache with ip route flush cache.”

Two Compasses

A captain and his first mate were sailing on the open ocean. Each possessed a compass, with a third affixed to the helm. The first mate approached the captain, saying, “Sir: in my clumsiness this morning, I have dropped my compass into the sea! What should I do?” After a moment, the captain wordlessly turned and threw his own compass into the sea. The first mate watched and became enlightened.

The Psalms and Me

Every time I read the Psalms:

Hear my prayer, O LORD, Give ear to my supplications!

“Oh, maybe this will be a nice verse to uplift my friend!”

For the enemy has persecuted my soul; He has crushed my life to the ground; He has made me dwell in dark places, like those who have long been dead. Therefore my spirit is overwhelmed within me; My heart is appalled within me.

“Yes! I’ll bet this message will really resonate with my friend! Sometimes we all feel downtrodden.”

Answer me quickly, O LORD, my spirit fails; Do not hide Your face from me, Or I will become like those who go down to the pit.

“Yes! When we are at our lowest, we should run to God!”

And in Your lovingkindness, cut off my enemies And destroy all those who afflict my soul

“Um… David? We cool, bro?”

Awake to punish all the nations; Do not be gracious to any who are treacherous in iniquity.

“Hey… I don’t know if I meant all that…”

Scatter them by Your power, and bring them down, O Lord, our shield.

Destroy them in wrath, destroy them that they may be no more

Deal with them as You did with Midian, […] they became manure for the ground

“Hold on, there, man! Let’s not get too crazy…”

How blessed will be the one who seizes and dashes your little ones Against the rock.

sigh

“Come on, David. Things were going so well. I mean, sure: you murdered a man to cover up the fact that you impregnated his wife… but you were sorry, right?”

Wash me thoroughly from my iniquity And cleanse me from my sin.

“See? There. That’s more like it…”

O God, shatter their teeth in their mouth; Break out the fangs of the young lions, O LORD.

“No! Bad David! No biscuit!”

“Whatcha got for me, Jesus?”

You have heard that it was said, "You shall love your neighbor and hate your enemy." But I say to you, love your enemies and pray for those who persecute you, so that you may be sons of your Father who is in heaven; for He causes His sun to rise on the evil and the good, and sends rain on the righteous and the unrighteous. For if you love those who love you, what reward do you have?

“Aw, yeah. That’s the stuff.”

“David, have you met this guy? I think you should probably meet this guy.”

6 February 2015

I brought my Go board into work today because Jon mentioned that he was interested in learning how to play. Maybe I’ll finally get to start playing regularly. Yet one more example of how much better life is here.

Speaking of which, I spent all night dreaming about high-stress situations, only to wake up and realize that life is actually so much easier than I was dreaming. For some reason we had moved back to KAUST, and I was trying to justify it to ourselves and to our family. Meanwhile, we were apparently expecting guests for some kind of two-week visit, and I was stressing out about having scheduled activities–notably a schedule of who would be preparing food–ahead of time. Doesn’t even make sense; but I certainly felt better once I was awake.

We had a meeting scheduled with Seagate today to talk about the storage system for Saga; but they’ve just cancelled at the last minute. Quite frustrating that we weren’t told until 10 minutes before the meeting: I was feeling pretty bad this morning, and had been contemplating working from home today. Still, maybe it’s for the best: maybe it’s better that I not try to work from home. Better to communicate with Aaron, after all.

I passed Aaron some docs for Slurm, but it’s certainly not finished. Still, I also gave him links to all the source material in the existing user guide, so he has something to look at, anyway.

I need to port our notes from the bench review into the github issues, and review my flagged email in general.

At home, I want to go through all the inboxes and pay bills. Hooray for the weekend.

Meanwhile, I still need to finish up that pam stack for local passwords and Google OTP.

Oh: and I need to send Thomas a picture of the chair we got!

5 February 2015

Home

I finally got curtain brackets mounted in the family room, though no curtains are hung yet. Last night my drill battery died, and I ended up with a drill bit stuck in the wall; but now it’s out, anchors in, and brackets hung. Tonight, curtains.

Tech

My piratebox still won’t install the software from the usb stick, claiming that the storage device is full. Since there’s plenty of space left on the usb stick, I can only assume it’s talking about the internal storage. Maybe I flashed it incorrectly? Or maybe a previous failed install has left it without some space it expects to have? Or maybe I need to do some kind of reset?

In any case, I was able to telnet in, re-flashed the firmware manually, hit the reset button for good measure, and then tried installing again from install_piratebox.zip. This time it worked, and I have a “PirateBox” ssid once again following me around. I must admit, though: I’m a bit disappointed by the “new, responsive” layout. I think it’s really only the media browser that has improved.

I don’t know that I’m going to do anything about it today, but I’d really like to install OpenProject on civilfritz. I think it would help Andi and me work together to get projects done at home.

CU

We had a meeting with Peak and IBM storage about a possible replacement for the existing home and projects storage N-series system. They are proposing a black-box GPFS system, “IBM Storwize V7000 Unified,” which I should look into further. They also mentioned ESS née GSS (hopefully still a grey-box solution), though they didn’t have many details on what that would look like. They weren’t even sure if it’s running Linux or AIX. (I’m hoping that their first instinct was wrong, and that it’s Linux on Power.)

We have a talk scheduled about upgrading the PetaLibrary. I specifically am of the opinion that we should be using the PetaLibrary to house home and projects. The way I see it, we should have two independent filesets, /pl/home/ and /pl/projects/, and just NFS export and snapshot each of them. We could even still have each home and project directory be a dependent fileset underneath, if we want.

I’m going to keep working on the user guide today. Hopefully I’ll have a batch queueing / slurm guide to give to Aaron by tomorrow.

I also need to work on a local passwords pam config to tick off the last of my performance plan post-its.

Before I do anything, though, I should look through org-mode and calendar to see what else I might be forgetting.

Understanding OpenStack networking with Neutron and Open vSwitch

I couldn’t figure out OpenStack’s networking system enough to get my instances’ floating IPs to work, even from the packstack --allinone host itself. I read the RDO document “Networking in too much detail,” but even that seemed to assume more knowledge about how things fit together than I had.

I eventually got some help from the #rdo IRC channel; but I think the best documentation ended up being “Visualizing OpenStack Networking Service Traffic in the Cloud” from the OpenStack Operations Guide.

In the end, most of my problem was that I was trying to assign an IP address to my br-ex interface that conflicted with the l3-agent that was already connected to the br-ex bridge. Literally any other address in the subnet that wasn’t also used by an instance gave me the behavior I was looking for: being able to ping the floating addresses from the host.

ip addr add 172.24.4.225/28 dev br-ex
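As a quick way to see which addresses are candidates for br-ex, here is a short Python sketch using the standard ipaddress module. The subnet is derived from the 172.24.4.225/28 address in the command above; free_host_addresses is a hypothetical helper, and the caller has to supply the addresses already used by the l3-agent and instances, since none are listed here.

```python
import ipaddress

# Subnet implied by the address assigned to br-ex above.
FLOATING_SUBNET = ipaddress.ip_interface("172.24.4.225/28").network


def free_host_addresses(in_use):
    """List host addresses in the floating-IP subnet that are not already taken."""
    taken = {ipaddress.ip_address(address) for address in in_use}
    return [str(host) for host in FLOATING_SUBNET.hosts() if host not in taken]
```

The /28 has 14 usable host addresses, 172.24.4.225 through 172.24.4.238, so there is not much room to pick from once the router and a few instances have claimed theirs.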

Once that was done, I was able to configure NAT on the same host. This is described at the end of the “Networking in too much detail” document, and was echoed by the individual who helped me in #rdo; but I modified the POSTROUTING rule to identify the external network interface, p4p1. If the external interface is left unspecified, then even internal traffic from the host to the guests will be rewritten to the external address, which isn’t valid on the floating-IP subnet.

iptables -A FORWARD -d 172.24.4.224/28 -j ACCEPT
iptables -A FORWARD -s 172.24.4.224/28 -j ACCEPT
iptables -t nat -I POSTROUTING 1 -s 172.24.4.224/28 -o p4p1 -j MASQUERADE
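The reason for naming the outgoing interface can be modelled in a few lines of Python. This is only a sketch of the rule's match conditions, not of iptables itself; is_masqueraded is a hypothetical name, and passing rule_out_interface=None stands for the rule written without -o.

```python
import ipaddress

FLOATING_SUBNET = ipaddress.ip_network("172.24.4.224/28")


def is_masqueraded(source, out_interface, rule_out_interface="p4p1"):
    """Model the POSTROUTING rule: match the source subnet and, if set, the outgoing interface."""
    if ipaddress.ip_address(source) not in FLOATING_SUBNET:
        return False
    return rule_out_interface is None or out_interface == rule_out_interface
```

Traffic from the host's br-ex address to a guest leaves through br-ex, so is_masqueraded("172.24.4.225", "br-ex") is False with the rule as written, but True when the interface is left unspecified. Traffic from the subnet that leaves through p4p1 is rewritten in both cases.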

Rebuilding civilfritz TLS

Some random notes on when I rebuilt TLS for civilfritz using gnutls and cacert.

  • https

  • ldap start_tls

http://www.gnutls.org/manual/html_node/certtool-Invocation.html

certtool --generate-privkey --outfile civilfritz.net.key

certtool --generate-request --load-privkey civilfritz.net.key --outfile civilfritz.net.csr

https://www.cacert.org

vi civilfritz.net.pem

certtool --certificate-info < civilfritz.net.pem

Subject Alternative Name (not critical):
   DNSname: civilfritz.net
   XMPP Address: civilfritz.net
   DNSname: www.civilfritz.net
   XMPP Address: www.civilfritz.net

$ cat civilfritz.net.pem /etc/ssl/certs/cacert.org.pem | certtool --verify-chain
Certificate[0]: CN=civilfritz.net
    Issued by: O=Root CA,OU=http://www.cacert.org,CN=CA Cert Signing Authority,EMAIL=support@cacert.org
    Verifying against certificate[1].
Error: Issuer's name: O=CAcert Inc.,OU=http://www.CAcert.org,CN=CAcert Class 3 Root
certtool: issuer name does not match the next certificate


$ cat civilfritz.net.pem cacert.org.pem | certtool --verify-chain
Certificate[0]: CN=civilfritz.net
    Issued by: O=Root CA,OU=http://www.cacert.org,CN=CA Cert Signing Authority,EMAIL=support@cacert.org
    Verifying against certificate[1].
    Verification output: Verified.

Certificate[1]: O=Root CA,OU=http://www.cacert.org,CN=CA Cert Signing Authority,EMAIL=support@cacert.org
    Issued by: O=Root CA,OU=http://www.cacert.org,CN=CA Cert Signing Authority,EMAIL=support@cacert.org
    Verification output: Verified.

Chain verification output: Verified.