Posts about technology (old posts, page 4)

Securing Salt file_roots

My only real problem with Salt vs. Puppet is its security model for files stored on the master. Puppet’s fileserver supports per-node export configuration, allowing for node-private file distribution. Salt, on the other hand, exposes all files to all nodes at all times.

How Puppet does it

# fileserver.conf
[mount_point]
path /path/to/files
allow *.example.com
deny *.wireless.example.com

[private]
path /data/private/%h
allow *
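
A node-side manifest then refers to these files by mount point, and the server resolves %h to each client’s own hostname. So a sketch like the following (the file name here is hypothetical) would serve each node only its own copy from /data/private/<hostname>:

file { '/etc/ssh/ssh_host_rsa_key':
  source => 'puppet:///private/ssh_host_rsa_key',
  owner  => 'root',
  group  => '0',
  mode   => '0600',
}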

How Salt could do it

# /etc/salt/master
file_roots:
  base:
    - /srv/salt
    - /srv/salt-example.com:
      - allow: '*.example.com'
      - deny: '*.wireless.example.com'
  dev:
    - /srv/salt/dev/services
    - /srv/salt/dev/states
  prod:
    - /srv/salt/prod/services
    - /srv/salt/prod/states

Proposal

Alternatively, minion-match patterns could sit alongside environment names at the top level, selecting additional file roots only for matching hosts:

file_roots:
  base:
    - /srv/salt
  '*.example.com':
    - /srv/salt-example.com

A New Firewall Salt State

My evaluation of Salt Stack is going pretty well. I’ve moved my main vps over to it with no ill effect, and was able to transcribe its Puppet manifest almost in its entirety. In many instances, I think the Salt version is more readable, and feels lighter than the Puppet version.

One glaring hole, though, is Salt’s support for firewall configuration. I was using the Puppet Labs firewall module to maintain iptables rules for my vps. That worked pretty well; but all Salt has right now is the ability to append new rules to a chain. The existing iptables state is documented as at risk of deprecation, too, so it’s a bad place to start.

It is expected that this state module, and other system-specific firewall states, may at some point be deprecated in favor of a more generic firewall state.

(Salt does have good support for iptables at the functional layer; it’s just the configuration management part that’s lacking.)

Since the firewall module I used before worked well enough, and I have a bunch of config based on it already, I’ve started reimplementing its interface in a Salt state module.

"100 salt-master":
  firewall_rule:
    - managed
    - protocol: tcp
    - ports: 4505:4506
    - action: accept

I’ve found developing a Salt state to be a pretty simple process so far. I really like how cleanly Salt’s layers separate functionality, state management, and configuration. (My firewall state makes liberal use of the existing iptables module, for example.)
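
To make that layering concrete, here is a stripped-down, hypothetical sketch of a state function in the same shape as mine (not the published module itself). firewall_rule and its parameters mirror the example above; iptables.check and iptables.append are functions from Salt’s iptables execution module (I’m assuming their availability here); and __salt__ and __opts__ are globals that the Salt loader injects into state modules.

# _states/firewall_rule.py

def managed(name, protocol='tcp', ports=None, action='accept'):
    '''
    Ensure that an iptables rule identified by its comment (the state
    name) exists in the INPUT chain.
    '''
    ret = {'name': name, 'changes': {}, 'result': False, 'comment': ''}
    rule = '-p {0} --dport {1} -m comment --comment "{2}" -j {3}'.format(
        protocol, ports, name, action.upper())

    # State modules delegate the real work to execution modules.
    if __salt__['iptables.check']('filter', 'INPUT', rule):
        ret['result'] = True
        ret['comment'] = 'rule already present'
        return ret

    # Honor test=True dry runs.
    if __opts__['test']:
        ret['result'] = None
        ret['comment'] = 'rule would be appended'
        return ret

    if __salt__['iptables.append']('filter', 'INPUT', rule):
        ret['changes'] = {'appended': rule}
        ret['result'] = True
        ret['comment'] = 'rule appended'
    return ret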

I’ve just published what I have so far on GitHub. The module at least recognizes that my existing config exists, and would be able to rebuild it in the proper order (sorted lexically by comment) if necessary. There’s a lot of functionality missing, but it’s a place to start. If anyone else uses it, that will just be an excuse to make it better!

Discovering Salt Stack

I’ve been a pretty stalwart Puppet user since I first discovered it in 2009. At that time, my choices were, as I saw them, between the brand-new cfengine3, the I’ve-seen-how-the-sausage-is-made bcfg2, and Puppet. Of those, Puppet seemed the best.

In particular, I liked Puppet’s “defined state” style of configuration management, and how simple it was to describe dependencies between the various packages, files, and services to be configured.

Like I said, I’ve been using Puppet happily for the past 4 years; but now, I think I’ve been swayed by Salt Stack.

I know I looked at Salt Stack before; but, at the time, I think I dismissed it as just “remote execution.” Salt does, after all, start from a very different place than Puppet. At its most simple, it is a mechanism for shipping Python functions to remote nodes and executing them. It seemed the very opposite of the idempotent state management that I was looking for.
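
That mechanism is worth a quick illustration. Any Python file placed under _modules in the master’s file roots becomes remotely callable once synced to minions. The example below is hypothetical (hello and world are names I’ve made up); __grains__ is a global the Salt loader injects.

# /srv/salt/_modules/hello.py

def world():
    '''
    Return a greeting identifying the answering minion.

    CLI Example:

        salt '*' hello.world
    '''
    # 'id' is the minion's configured identifier.
    return 'Hello from {0}'.format(__grains__['id'])

After a salt '*' saltutil.sync_modules, every minion can run it.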

But now that I’ve taken the time to look deeper into the documentation (or, perhaps, now that the project has grown further), I’ve found Salt Stack States: the state-enforcement configuration management system I was looking for, with a trivial-to-set-up remote execution layer underneath it.

Salt is based on 0MQ. I don’t know much about message queues; but I do know that I could never get ActiveMQ working for use with Puppet’s MCollective. After only 30 minutes of hacking, I had Salt, with 0MQ, running on two OS X machines and two Debian machines, all talking to the same master, each from behind its own form of inconveniently private network.

$ sudo salt '*' test.ping
ln1.civilfritz.net:
    True
Jonathons-MacBook-Pro.local:
    True
numfar.civilfritz.net:
    True
dabade.civilfritz.net:
    True

Glorious.

Some other things that I like about Salt:

  • States are defined in YAML, so there’s no proprietary (cough poorly defined cough) language to maintain.

  • The remote execution layer and state module layer help keep executable code separate from state definitions.

  • Key management is a bit less foolish. (It shows you what you’re about to sign before you sign it.)

Of course, no new technology arrives without the pain of a legacy conversion. I have a lot of time and effort invested into the Puppet manifests that drive ln1.civilfritz.net; but converting them to Salt Stack States is serving as a pretty good exercise for evaluating whether I really prefer Salt to Puppet.

I’ve already discovered a few things I don’t like, of course:

  • The abstraction of the underlying Python implementation is a bit thin. This is sometimes a good thing, as it’s easier to see how a state definition maps to individual function calls; but it also means that error messages sometimes require an understanding of Python. Sometimes you even get full tracebacks.

  • Defined states don’t seem to understand the correlation between symbolic names and numeric ids (uid and uidNumber, in LDAP terms). In Puppet I started specifying group ownership as 0 when I discovered that AIX names gid 0 system rather than root. In Salt, this appears to try to reassign the group ownership every time.

  • All hosts in a Salt config have access to all of the files in the master.

  • YAML formatting can be a bit wonky. (Why are arguments lists of dictionaries? Why is the function being called in the same list as its arguments?)

  • No good firewall (iptables) configuration support. The iptables module isn’t even present in the version of Salt I have; but the documentation warns that even it is likely to be deprecated in the future.

That said, I can’t ignore the fact that, since Salt happens to be written in Python, I might actually be able to contribute to this project. I’ve already done some grepping around in the source code, and it seems immediately approachable. Enhancing the roots fileserver, for example, to provide node-restricted access to files, shouldn’t be too bad. I might even be able to port Puppet Labs’ firewall module from Ruby to Python for use as a set of Salt modules.

Time will tell, I suppose. For now, the migration continues.

Introducing civilfritz Minecraft

I started playing Minecraft with my brother and old college roommate a few weeks ago. My expectations have been proven correct, as I’ve found it much more compelling to play on a persistent server with a group of real-life friends. In fact, in the context of my personal dedicated server instance, I’m finding the game strikes a compelling chord between my gamer side and my sysadmin side.

There’s already some documentation for running a Minecraft server on the Minecraft wiki, but none of it was really in keeping with how I like to administer a server. I don’t want to run services in a screen session, even if an init script sets it up for me.

I wrote my own Debian init script that uses start-stop-daemon and named pipes to allow server commands. Beyond that, I made a Puppet module that can install and configure the server. You can clone it from Git at git://civilfritz.net/puppet-minecraft.git.

I also really like maps, so I started looking for software that would let me generate maps of the world. (I was almost pacified when I learned how to craft maps. Almost.) I eventually settled on Minecraft Overviewer, mostly because it seems to be the most polished implementation. They even provide a Debian repository, so I didn’t have to do anything special to install it.

I’ve configured Minecraft Overviewer to update the render once a day (at 04:00 EST, which hopefully won’t conflict with actual Minecraft server use), with annotations updated once an hour. You can see it at http://civilfritz.net/minecraft/overview.
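
The schedule itself is plain cron. A sketch of the crontab, assuming Overviewer’s --config and --genpoi options (the config path is hypothetical; --genpoi regenerates only the marker annotations, not the full render):

# m h dom mon dow  command
0 4 * * *  overviewer.py --config=/etc/overviewer/civilfritz.py
0 * * * *  overviewer.py --config=/etc/overviewer/civilfritz.py --genpoi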

I couldn’t get Overviewer to display over https for some reason I don’t yet understand, so all access is redirected back to http for now.

Installing Netflix on an import PS3

I purchased my first HD console—a used PlayStation 3 Slim—from a friend as he was leaving KAUST. Tokyo Games (the most trustworthy games retailer in the kingdom) imports its PS3s from somewhere in Europe: I knew when I purchased it that I would not be able to play region-locked Blu-ray discs. I really only care about games, though, and PS3 games are region-free.

When I returned to the US, import PS3 in tow, I added a use-case for the system as a streaming media center. Hulu Plus worked well enough while I had it; but, when I dropped the service in favor of Netflix, I couldn’t get the Netflix application to install.

Netflix is only licensed for use from the US. Though I was in the US, and logged into an explicitly US PSN account, the XMB never presented me with the option to install the Netflix application. It’s not listed in the PSN store, either; so if the XMB doesn’t advertise Netflix to you on its own, there’s no way to get it to install.

I went back and forth for a while with Sony support; but, ultimately, they were completely unhelpful. I was sure that, if only I could get the application to install, it would work, but the technician didn’t seem to understand the technical side of the issue I was having.

Thankfully, the people at unblock-us did understand the problem, as I discovered on my own while waiting for the Sony technician to catch up. Ultimately, all I had to do was visit http://ps3.unblock-us.com/ from the internal PS3 web browser. An HTTP redirect points to the installer on the Sony download server, prompting the system to download and install.

As expected, once the application had been installed, streaming worked perfectly.

It’s disappointing that something intended to simplify the user experience (don’t advertise applications that don’t work in your region) ended up severely complicating mine. I had the stubborn persistence to find a workaround myself; but there’s no reason Sony shouldn’t have a similar URL available and documented for people who take the time to contact support. It’d be even better if Netflix was just listed in the PSN store, too, as that’s already filtered by the region of your PSN account, irrespective of the console’s origin.

At least it “works for me” now.

Why I’m abandoning strict Allman style in Puppet manifests

I pretty much always use Allman style in languages that have braces. I like the symmetry, and the visible separation of identifier from value.

Though Allman style has its roots in C, the only brace language I use these days is Puppet. (Python end-runs around this whole issue by omitting braces altogether, which I ultimately prefer.) Pedantic as I am, my choice of brace style has extended (as closely as I could) to writing Puppet manifests.

class motd
(
  $content = undef
)
{
  file
  { '/etc/motd':
    content => $content,
    owner   => '0',
    group   => '0',
    mode    => '0644',
  }
}

This isn’t what most people do, and it’s certainly not what the examples in the Puppet style guide do; but it’s also not in violation of any of the recommendations in the style guide.

I’ve been doing this for years, now; but today, I had one of those “aha” moments where I pleasantly realized that I’ve been doing it wrong.

Allman style works just fine for Puppet class definition; but Puppet resources provide their titles within the braces, rather than outside. This supports the compression of multiple resources into a single declaration.

file {
  '/tmp/a':
    content => 'a';
  '/tmp/b':
    content => 'b';
}

This syntax is explicitly discouraged in the style guide, but it’s part of the language’s legacy.

The problem with Allman style in this context is that it separates the resource title from the resource type. In most braced languages, the title of an element is written outside of the braces, after the type.

#! /bin/bash

function main
{
    # ...
}

In this example, it would be easy to grep a pile of Bash source files for scripts that declare a main function.

$ grep 'function main' *.sh

Not so with Allman style. I can grep for /etc/motd; but that would match against any reference to the file. Finding the declaration itself becomes a manual exercise with a contextual grep (grep --before-context 1).

All of this becomes much simpler, however, if resource declarations include the resource title (and the interstitial brace) on the same line as the resource type.

class motd
(
  $content = undef
)
{
  file { '/etc/motd':
    content => $content,
    owner   => '0',
    group   => '0',
    mode    => '0644',
  }
}

Even I have to admit that grep "file { '/etc/motd':" *.pp is much simpler.

This is immaterial for class definitions, since the class name is located before the brace.

class motd
{
  ...
}

I’d argue that Puppet should at least support a similar syntax for resources; one that puts the title directly after the type.

file '/etc/motd'
{
  ...
}

That could get a bit confusing, though, when using parameterized classes, as the resource-like syntax for declaring a parameterized class is already quite close to the syntax for defining one.

# definition
class motd
{
  # ...
}

# resource-like declaration
class
{ 'motd':
  content => 'Hello, world!',
}

Tracking user actions with the Linux Audit Subsystem

I was given a mandate to log “what the users are doing” on the Minerva cluster system at Mount Sinai. Actually, the original mandate was more prescriptive: implement an auditing ssh daemon on the login nodes.

So that’s what I started doing… or, trying to do. I grabbed the source for auditing ssh, which was, unfortunately, a big custom-patched tarball of openssh, hpn-ssh, and the auditing patches. There was a Red Hat specfile included, so I went to work building a set of packages from these sources.

Unfortunately, my packages, when installed, didn’t function. I say unfortunately, but it might have turned out to be a blessing in disguise. As I researched why my new auditing sshd wasn’t allowing any users to log in (explicitly, with a denied action) I kept coming up against a more general-purpose Linux audit system, built into the kernel.

I had seen bits of this system in use before. I had seen pam_loginuid in default pam stacks, and anyone who has come up against SELinux knows about /var/log/audit/audit.log; but I didn’t appreciate just how flexible the Linux audit subsystem is, right down to, if we really want, the ability to log every tty keystroke. (That said, I think we really only need to log execs; but we’ll see.)

Introduction

The Linux audit system is a kernel subsystem paired with a userspace daemon that, based on a set of rules stored at /etc/audit/audit.rules, maintains an audit log of events that take place in the kernel, either by instrumenting specific syscalls (e.g., open, execve) or by watching for access to specific inodes (e.g., to track changes to sensitive files).

In particular, the Linux audit subsystem can be used in the implementation of a Controlled Access Protection Profile as defined by the NSA. Red Hat ships a ruleset, capp.rules, with the audit daemon to implement such a policy.

Goals

  • Track user actions from login to logout, attributed to a single user (even across su or sudo).
  • Log all user actions.

User tracking

pam_loginuid records the uid of the original login as a process attribute that the kernel copies to every descendant process, surviving su and sudo, so audit records can always be traced back to the login session. It is already wired into the default pam stacks:

$ grep pam_loginuid /etc/pam.d/*
/etc/pam.d/crond:session    required   pam_loginuid.so
/etc/pam.d/login:session    required     pam_loginuid.so
/etc/pam.d/remote:session    required     pam_loginuid.so
/etc/pam.d/sshd:session    required     pam_loginuid.so
/etc/pam.d/ssh-keycat:session    required     pam_loginuid.so

Audit rules

These two rules log every execve syscall at exit, for both the 32-bit and 64-bit ABIs, which covers the goal of logging all user actions:

-a exit,always -F arch=b32 -S execve
-a exit,always -F arch=b64 -S execve

Reporting script

With loginuid tracking and the execve rules in place, reporting on what users ran is just post-processing of the raw audit log:

$ sudo ausearch -r | audit-commands

audit-commands.py
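
As a rough sketch of the approach (hypothetical code, not necessarily what the linked script does): a filter over ausearch -r records only needs to pick out EXECVE events and decode their arguments, which audit hex-encodes when they contain whitespace or other awkward bytes.

#!/usr/bin/env python

import re
import sys

# EXECVE records carry arguments as a0="..." a1="..."; values with
# special bytes appear hex-encoded and unquoted instead.
ARG_RE = re.compile(r'\ba(\d+)=("[^"]*"|[0-9A-F]+)')


def decode(value):
    if value.startswith('"'):
        return value.strip('"')
    return bytes.fromhex(value).decode('utf-8', 'replace')


def main():
    for line in sys.stdin:
        if 'type=EXECVE' not in line:
            continue
        args = ARG_RE.findall(line)
        args.sort(key=lambda pair: int(pair[0]))  # a0, a1, a2, ...
        print(' '.join(decode(value) for _, value in args))


if __name__ == '__main__':
    main()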

Reference: https://www.centos.org/docs/5/html/5.1/Deployment_Guide/rhlcommon-section-0081.html

Experimenting with Twitter Bootstrap

I recently found out about Twitter Bootstrap when version 2.2.0 was tagged. I’ve been wanting to work on the ikiwiki theme I use for civilfritz, and this seemed like a fine opportunity to consolidate my efforts.

I’ve already started hacking around with a Bootstrap-based template for ikiwiki, but it ended up broken enough that I’ve decided to revert to the minimally-modified version of the default theme that I had already been using. For future work, I’ll probably create a fork of the site somewhere to experiment in.

In which Puppet broke DNS queries

So there I was, idly doing some personal sysadmin on my stateside virtual machine, when I decided, “It’s been a while since I’ve done a dist-upgrade. I should look at the out-of-date packages.”

apt-get update, and immediately I noticed a problem. It was kind of obvious, what with it hanging all over my shell and all.

# apt-get update
0% [Connecting to ftp.us.debian.org] [Connecting to security.debian.org] [Connecting to apt.puppetlabs.com]^C

I quickly tracked a few symptoms back to DNS. I couldn’t get a response off of my DNS servers.

$ host google.com
;; connection timed out; no servers could be reached

I use Linode for vps hosting, and they provide a series of resolving name servers for customer use. It seemed apparent to me that the problem wasn’t on my end–I could still ssh into the box, and access my webserver from the Internet–so I contacted support about their DNS service not working.

Support was immediately responsive; but, after confirming my vm location, they reported that the nameservers seemed to be working correctly. Further, the tech pointed out that he couldn’t ping my box.

No ping? Nope. So I can’t DNS out, and I can’t ICMP at all. “I have just rewritten my iptables config, so it’s possible enough that I’ve screwed something up, there; but, with a default policy of ACCEPT on OUTPUT, I don’t know what I could have done there to affect this.” I passed my new config along, hoping that the tech would see something obvious that I had missed. He admitted that it all looked normal, but that, in the end, he can’t support the configuration I put on my vm.

“For more hands on help with your iptables rules you may want to reach out to the Linode community.”

Boo. Time to take a step back.

I use Puppet to manage iptables. More specifically, until recently, I had been using bobsh/iptables (http://forge.puppetlabs.com/bobsh/iptables), a Puppet module that models individual rules with a native Puppet type.

iptables
{ 'http':
  state => 'NEW',
  proto => 'tcp',
  dport => '80',
  jump  => 'ACCEPT',
}

There’s a newer, more official module out now, though: puppetlabs/firewall (http://forge.puppetlabs.com/puppetlabs/firewall). This module basically does the same thing as bobsh/iptables, but it’s maintained by Puppet Labs and is positioned for eventual portability to other firewall systems. Plus, whereas bobsh/iptables concatenates all of its known rules and then replaces any existing configuration, puppetlabs/firewall manages the tables in-place, allowing other systems (e.g., fail2ban) to add rules out-of-band without conflict.

In other words: new hotness.

The porting effort was pretty minimal. Soon, I had replaced all of my rules with the new format.

firewall
{ '100 http':
  state  => 'NEW',
  proto  => 'tcp',
  dport  => '80',
  action => 'accept',
}

Not that different. I’m using lexical ordering prefixes now, but I could have done that before. The big win, though, is the replacement of the pre and post fixtures with explicit, pervasive rule ordering.

file
{ '/etc/puppet/iptables/pre.iptables':
  source => 'puppet:///modules/s_iptables/pre.iptables',
  owner  => 'root',
  group  => 'root',
  mode   => '0600',
}

file
{ '/etc/puppet/iptables/post.iptables':
  source => 'puppet:///modules/s_iptables/post.iptables',
  owner  => 'root',
  group  => 'root',
  mode   => '0600',
}

See, bobsh/iptables uses a pair of flat files to define a static set of rule fixtures that should always be present.

# pre.iptables

-A INPUT -i lo -j ACCEPT
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT

# post.iptables

-A INPUT -j REJECT --reject-with icmp-host-prohibited
-A FORWARD -j REJECT --reject-with icmp-host-prohibited

So, in my fixtures, loopback connections were always ACCEPTed, as were any existing connections flagged by connection tracking. Everything else (that isn’t allowed by a rule between these fixtures) is REJECTed. This works well enough, but the flat files are a bit of a hack.

firewall
{ '000 accept localhost':
  iniface => 'lo',
  action  => 'accept',
}

firewall
{ '000 accept tracked connections':
  state  => ['RELATED', 'ESTABLISHED'],
  action => 'accept',
}

firewall
{ '999 default deny (input)':
  proto  => 'all',
  action => 'reject',
}

firewall
{ '999 default deny (forward)':
  chain  => 'FORWARD',
  proto  => 'all',
  action => 'reject',
  reject => 'icmp-host-prohibited',
}

That’s much nicer. (I think so, anyway.) Definitely more flexible.

Anyway: I spent thirty minutes or so porting my existing rules over to puppetlabs/firewall, with no real problems to speak of. Until, of course, I realize I can’t query DNS.

What could have possibly changed? The new configuration is basically one-to-one with the old configuration.

:INPUT ACCEPT [0:0]
[...]
-A INPUT -p tcp -m comment --comment "000 accept tracked connections" -m state --state RELATED,ESTABLISHED -j ACCEPT
[...]
-A INPUT -m comment --comment "999 default deny (input)" -j REJECT --reject-with icmp-port-unreachable

Oh.

So, it turns out that puppetlabs/firewall has default values. In particular, proto defaults to tcp. That’s probably the most common case, but it was surprising. End result? That little -p tcp in my connection tracking rule means that icmp, udp, and anything other than tcp can’t establish real connections. The udp response from the DNS server doesn’t get picked up, so it’s rejected at the end.

The fix: explicitly specifying proto => 'all'.

firewall
{ '000 accept tracked connections':
  state  => ['RELATED', 'ESTABLISHED'],
  proto  => 'all',
  action => 'accept',
}

Alternatively, I could reconfigure the default; but it’s fair enough that, as a result, I’d have to explicitly specify tcp for the majority of my rules. That’s a lot more verbose in the end.

Firewall
{
  proto => 'all',
}

Once again, all is right with the world (or, at least, with RELATED and ESTABLISHED udp and icmp packets).

In which Cisco SACK’d iptables

At some unknown point, ssh to at least some of our front-end nodes started spuriously failing when faced with a sufficiently large burst of traffic (e.g., cat-ing a large file to stdout). I was the only person complaining about it, though–no other team members, no users–so I blamed it on something specific to my environment and prioritized it as an annoyance rather than as a real system problem.

Which is to say: I ignored the problem, hoping it would go away on its own.

Some time later I needed to access our system from elsewhere on the Internet, and from a particularly poor connection as well. Suddenly I couldn’t even scp reliably. I returned to the office, determined to pinpoint a root cause. I did my best to take my vague impression of “a network problem” and turn it into a repeatable test case, dd-ing a bunch of /dev/random into a file and scp-ing it to my local box.

$ scp 10.129.4.32:data Downloads
data 0% 1792KB 1.0MB/s - stalled -^CKilled by signal 2.
$ scp 10.129.4.32:data Downloads
data 0% 256KB 0.0KB/s - stalled -^CKilled by signal 2.

Awesome: it fails from my desktop. I duplicated the failure on my netbook, which I then physically carried into the datacenter. I plugged directly into the DMZ and repeated the test with the same result; but, if I moved my test machine to the INSIDE network, the problem disappeared.

Wonderful! The problem was clearly with the Cisco border switch/firewall, because (a) the problem went away when I bypassed it, and (b) the Cisco switch is Somebody Else’s Problem.

So I told Somebody Else that his Cisco was breaking our ssh. He checked his logs and claimed he saw nothing obviously wrong (e.g., no dropped packets, no warnings). He didn’t just punt the problem back at me, though: he came to my desk and, together, we trawled through some tcpdump.

15:48:37.752160 IP 10.68.58.2.53760 > 10.129.4.32.ssh: Flags [.], ack 36822, win 65535, options [nop,nop,TS val 511936353 ecr 1751514520], length 0
15:48:37.752169 IP 10.129.4.32.ssh > 10.68.58.2.53760: Flags [.], seq 47766:55974, ack 3670, win 601, options [nop,nop,TS val 1751514521 ecr 511936353], length 8208
15:48:37.752215 IP 10.68.58.2.53760 > 10.129.4.32.ssh: Flags [.], ack 36822, win 65535, options [nop,nop,TS val 511936353 ecr 1751514520,nop,nop,sack 1 {491353276:491354644}], length 0
15:48:37.752240 IP 10.129.4.32 > 10.68.58.2: ICMP host 10.129.4.32 unreachable - admin prohibited, length 72

The sender, 10.129.4.32, was sending an ICMP error back to the receiver, 10.68.58.2. Niggling memory of this “admin prohibited” message reminded me about our iptables configuration.

-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p tcp -m tcp --dport 22 -m state --state NEW -m comment --comment "ssh" -j ACCEPT
-A INPUT -j REJECT --reject-with icmp-host-prohibited

iptables rejects, with icmp-host-prohibited, any packet that isn’t explicitly allowed or part of an existing connection: exactly what we were seeing. In particular, it seemed to be rejecting any packet that contained SACK fields.

When packets are dropped (or arrive out-of-order) in a modern TCP connection, the receiver sends a SACK message, “ACK X, SACK Y:Z”. The “selective” acknowledgement indicates that segments between X and Y are missing, but allows later segments to be acknowledged out-of-order, avoiding unnecessary retransmission of already-received segments. For some reason, such segments were not being identified by iptables as part of the ESTABLISHED connection.

A Red Hat bugzilla indicated that you should solve this problem by disabling SACK. That seemed pretty stupid to me, though, so I went looking around in the iptables documentation instead. A netfilter patch seemed to indicate that iptables connection tracking should support SACK, so I contacted the author–a wonderful gentleman named Jozsef Kadlecsik–who confirmed that SACK should be totally fine passing through iptables in our kernel version. Instead, he indicated that problems like this usually implicate a misbehaving intermediate firewall appliance.

And so the cycle of blame was back on the Cisco… but why?

Let’s take a look at that tcpdump again.

15:48:37.752215 IP 10.68.58.2.53760 > 10.129.4.32.ssh: Flags [.], ack 36822, win 65535, options [nop,nop,TS val 511936353 ecr 1751514520,nop,nop,sack 1 {491353276:491354644}], length 0
15:48:37.752240 IP 10.129.4.32 > 10.68.58.2: ICMP host 10.129.4.32 unreachable - admin prohibited, length 72

ACK 36822, SACK 491353276:491354644… so segments 36823:491353276 are missing? That’s quite a jump. Surely 491316453 segments didn’t get sent in a few nanoseconds.

A Cisco support document holds the answer; or, at least, the beginning of one. By default, the Cisco firewall performs “TCP Sequence Number Randomization” on all TCP connections. That is to say, it modifies TCP sequence ids on incoming packets, and restores them to the original range on outgoing packets. So while the system receiving the SACK sees this:

15:48:37.752215 IP 10.68.58.2.53760 > 10.129.4.32.ssh: Flags [.], ack 36822, win 65535, options [nop,nop,TS val 511936353 ecr 1751514520,nop,nop,sack 1 {491353276:491354644}], length 0

…the system sending the SACK (the one requesting the file) sees this:

15:49:42.638349 IP 10.68.58.2.53760 > 10.129.4.64.ssh: . ack 36822 win 65535 <nop,nop,timestamp 511936353 1751514520,nop,nop,sack 1 {38190:39558}>

The receiver says “I have 36822 and 38190:39558, but missed 36823:38189.” The sender sees “I have 36822 and 491353276:491354644, but missed 36823:491353275.” 491353276 is larger than the largest sequence id sent by the provider, so iptables categorizes the packet as INVALID.

But wait… if the Cisco is rewriting sequence ids, why is it only the SACK fields that are different between the sender and the receiver? If Cisco randomizes the sequence ids of packets that pass through the firewall, surely the regular SYN and ACK id fields should be different, too.

It’s a trick question: even though tcpdump reports 36822 on both ends of the connection, the actual sequence ids on the wire are different. By default, tcpdump normalizes sequence ids, starting at 1 for each new connection; running it with -S would have shown the absolute, unrewritten values directly.

-S Print absolute, rather than relative, TCP sequence numbers.

The Cisco doesn’t rewrite SACK fields to coincide with its rewritten sequence ids. The raw values are passed on, conflicting with the ACK value and corrupting the packet. In normal situations (that is, without iptables throwing out invalid packets) this only serves to break SACK; the provider still gets the ACK and responds with a normal full retransmission. It’s a performance degradation, but not a catastrophic failure. However, because iptables is rejecting the INVALID packet entirely, the TCP stack doesn’t even get a chance to fall back to full retransmission.

Because TCP sequence id forgery isn’t a problem under modern TCP stacks, we’ve taken the advice of the Cisco article and disabled the randomization feature in the firewall altogether.

class-map TCP
  match port tcp range 1 65535
policy-map global_policy
  class TCP
    set connection random-sequence-number disable
service-policy global_policy global

With that, the problem has finally disappeared.

$ scp 10.129.4.32:data Downloads
data 100% 256MB 16.0MB/s 00:16
$ scp 10.129.4.32:data Downloads
data 100% 256MB 15.1MB/s 00:17
$ scp 10.129.4.32:data Downloads
data 100% 256MB 17.1MB/s 00:15