In which I discover how Avahi slowed down ssh
This page, at the time of writing, is served by a virtual machine running at linode. Nothing special: just Debian 6. The node is more than just an Apache server: I connect to it via ssh a lot (most often indirectly, with Git).
As long as I can remember, there’s been a noticeable delay in establishing an ssh connection. I’ve always written this off as one of the side-effects of living so far away from the server; but the delay is present even when I’m in the US, and doesn’t affect latency as much once the connection is established.
Tonight, I decided to find out why.
First, I started a parallel instance of sshd on my VM, running on an
alternate port.
# /usr/sbin/sshd -ddd -p 2222
Then I connected to this port from my local system.
$ ssh -vvv ln1.civilfritz.net -p 2222
I watched the debug output in both terminals, watching for any obvious delay. On the server, that delay was preceded by a particular log entry.
debug3: Trying to reverse map address 109.171.130.234.
The log on the client was less damning: just a reference to checking the assumable locations for ssh keys.
debug2: key: /Users/janderson/.ssh/id_ecdsa (0x0)
I confirmed that the address referenced by the server was one of the
external NAT addresses used by my
ISP. Presumably
sshd or PAM is trying to determine a name for use in authorization
assertions.
At that point, I set out to understand why there was a delay reverse-mapping the address. Unsurprisingly, KAUST hasn’t populated reverse-dns for the address.
$ time host 109.171.130.234 Host 234.130.171.109.in-addr.arpa. not found: 3(NXDOMAIN) real 0m0.010s user 0m0.004s sys 0m0.005s
That said, getting the confirmation that the reverse-dns was unpopulated did not take the seconds that were seen as a delay establishing an ssh connection.
I had to go… deeper.
# strace -t /usr/sbin/sshd -d -p 2222
Here I invoke the ssh server with a system call trace. I included timestamps in the trace output so I could more easily locate the delay, but I was able to see it in real-time.
00:27:44 socket(PF_FILE, SOCK_STREAM, 0) = 4
00:27:44 fcntl64(4, F_GETFD) = 0
00:27:44 fcntl64(4, F_SETFD, FD_CLOEXEC) = 0
00:27:44 connect(4, {sa_family=AF_FILE, path="/var/run/avahi-daemon/socket"}, 110) = 0
00:27:44 fcntl64(4, F_GETFL) = 0x2 (flags O_RDWR)
00:27:44 fstat64(4, {st_mode=S_IFSOCK|0777, st_size=0, ...}) = 0
00:27:44 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7698000
00:27:44 _llseek(4, 0, 0xbf88eb18, SEEK_CUR) = -1 ESPIPE (Illegal seek)
00:27:44 write(4, "RESOLVE-ADDRESS 109.171.130.234\n", 32) = 32
00:27:44 read(4, "-15 Timeout reached\n", 4096) = 20
00:27:49 close(4) = 0
That’s right: a five second delay. The horror!
Pedantic troubleshooting aside, the delay is introduced while reading
from /var/run/avahi-daemon/socket. Sure enough, I have
Avahi installed.
$ dpkg --get-selections '*avahi*' avahi-daemon install libavahi-client3 install libavahi-common-data install libavahi-common3 install libavahi-compat-libdnssd1 install libavahi-core7 install
Some research that is difficult to represent here illuminated the fact that Avahi was installed when I installed Mumble. (Mumble might actually be the most-used service on this box, after my use of it to store my Org files. Some friends of mine use it for voice chat in games.)
I briefly toyed with the idea of tracking down the source of the delay in Avahi; but this is a server in a dedicated, remote datacenter. I can’t imagine any use case for an mDNS resolver in this environment, and I’m certainly not using it now.
# service stop avahi-daemon
Now it takes less than three seconds to establish the connection. Not bad for halfway across the world.
$ time ssh janderson@ln1.civilfritz.net -p 2222 /bin/false real 0m2.804s user 0m0.017s sys 0m0.006s
Better yet, there are no obvious pauses in the debugging output on the server.
Avahi hooks into the system name service using nsswitch.conf, just
as you’d expect.
$ grep mdns /etc/nsswitch.conf hosts: files mdns4_minimal [NOTFOUND=return] dns mdns4
Worst case I could simply edit this config file to remove mDNS; but I
wanted to remove it from my system altogether. It should just be
“recommended” by mumble-server, anyway.
# aptitude purge avahi-daemon libnss-mdns
With the relevant packages purged from my system, nsswitch.conf is
modified (automatically) to match.
$ grep hosts: /etc/nsswitch.conf hosts: files dns
Problem solved.