In which I discover how Avahi slowed down ssh

This page, at the time of writing, is served by a virtual machine running at linode. Nothing special: just Debian 6. The node is more than just an Apache server: I connect to it via ssh a lot (most often indirectly, with Git).

As long as I can remember, there’s been a noticeable delay in establishing an ssh connection. I’ve always written this off as one of the side-effects of living so far away from the server; but the delay is present even when I’m in the US, and doesn’t affect latency as much once the connection is established.

Tonight, I decided to find out why.

First, I started a parallel instance of sshd on my VM, running on an alternate port.

# /usr/sbin/sshd -ddd -p 2222

Then I connected to this port from my local system.

$ ssh -vvv ln1.civilfritz.net -p 2222

I watched the debug output in both terminals, watching for any obvious delay. On the server, that delay was preceded by a particular log entry.

debug3: Trying to reverse map address 109.171.130.234.

The log on the client was less damning: just a reference to checking the assumable locations for ssh keys.

debug2: key: /Users/janderson/.ssh/id_ecdsa (0x0)

I confirmed that the address referenced by the server was one of the external NAT addresses used by my ISP. Presumably sshd or PAM is trying to determine a name for use in authorization assertions.

At that point, I set out to understand why there was a delay reverse-mapping the address. Unsurprisingly, KAUST hasn’t populated reverse-dns for the address.

$ time host 109.171.130.234
Host 234.130.171.109.in-addr.arpa. not found: 3(NXDOMAIN)

real    0m0.010s
user    0m0.004s
sys     0m0.005s

That said, getting the confirmation that the reverse-dns was unpopulated did not take the seconds that were seen as a delay establishing an ssh connection.

I had to go… deeper.

# strace -t /usr/sbin/sshd -d -p 2222

Here I invoke the ssh server with a system call trace. I included timestamps in the trace output so I could more easily locate the delay, but I was able to see it in real-time.

00:27:44 socket(PF_FILE, SOCK_STREAM, 0) = 4
00:27:44 fcntl64(4, F_GETFD)            = 0
00:27:44 fcntl64(4, F_SETFD, FD_CLOEXEC) = 0
00:27:44 connect(4, {sa_family=AF_FILE, path="/var/run/avahi-daemon/socket"}, 110) = 0
00:27:44 fcntl64(4, F_GETFL)            = 0x2 (flags O_RDWR)
00:27:44 fstat64(4, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
00:27:44 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7698000
00:27:44 _llseek(4, 0, 0xbf88eb18, SEEK_CUR) = -1 ESPIPE (Illegal seek)
00:27:44 write(4, "RESOLVE-ADDRESS 109.171.130.234\n", 32) = 32
00:27:44 read(4, "-15 Timeout reached\n", 4096) = 20
00:27:49 close(4)                       = 0

That’s right: a five second delay. The horror!

Pedantic troubleshooting aside, the delay is introduced while reading from /var/run/avahi-daemon/socket. Sure enough, I have Avahi installed.

$ dpkg --get-selections '*avahi*'
avahi-daemon                    install
libavahi-client3                install
libavahi-common-data                install
libavahi-common3                install
libavahi-compat-libdnssd1           install
libavahi-core7                  install

Some research that is difficult to represent here illuminated the fact that Avahi was installed when I installed Mumble. (Mumble might actually be the most-used service on this box, after my use of it to store my Org files. Some friends of mine use it for voice chat in games.)

I briefly toyed with the idea of tracking down the source of the delay in Avahi; but this is a server in a dedicated, remote datacenter. I can’t imagine any use case for an mDNS resolver in this environment, and I’m certainly not using it now.

# service stop avahi-daemon

Now it takes less than three seconds to establish the connection. Not bad for halfway across the world.

$ time ssh janderson@ln1.civilfritz.net -p 2222 /bin/false

real    0m2.804s
user    0m0.017s
sys     0m0.006s

Better yet, there are no obvious pauses in the debugging output on the server.

Avahi hooks into the system name service using nsswitch.conf, just as you’d expect.

$ grep mdns /etc/nsswitch.conf
hosts:          files mdns4_minimal [NOTFOUND=return] dns mdns4

Worst case I could simply edit this config file to remove mDNS; but I wanted to remove it from my system altogether. It should just be “recommended” by mumble-server, anyway.

# aptitude purge avahi-daemon libnss-mdns

With the relevant packages purged from my system, nsswitch.conf is modified (automatically) to match.

$ grep hosts: /etc/nsswitch.conf
hosts:          files dns

Problem solved.