Posts about kaust

In which Cisco SACK’d iptables

At some unknown point, ssh to at least some of our front-end nodes started spuriously failing when faced with a sufficiently large burst of traffic (e.g., cating a large file to stdout). I was the only person complaining about it, though–no other team members, no users–so I blamed it on something specific to my environment and prioritized it as an annoyance rather than as a real system problem.

Which is to say: I ignored the problem, hoping it would go away on its own.

Some time later I needed to access our system from elsewhere on the Internet, and from a particularly poor connection as well. Suddenly I couldn’t even scp reliably. I returned to the office, determined to pinpoint a root cause. I did my best to take my vague impression of “a network problem” and turn it into a repeatable test case, dding a bunch of /dev/random into a file and scping it to my local box.

$ scp 10.129.4.32:data Downloads
data 0% 1792KB 1.0MB/s - stalled -^CKilled by signal 2.
$ scp 10.129.4.32:data Downloads
data 0% 256KB 0.0KB/s - stalled -^CKilled by signal 2.

Awesome: it fails from my desktop. I duplicated the failure on my netbook, which I then physically carried into the datacenter. I plugged directly into the DMZ and repeated the test with the same result; but, if I moved my test machine to the INSIDE network, the problem disappeared.

Wonderful! The problem was clearly with the Cisco border switch/firewall, because (a) the problem went away when I bypassed it, and (b) the Cisco switch is Somebody Else’s Problem.

So I told Somebody Else that his Cisco was breaking our ssh. He checked his logs and claimed he saw nothing obviously wrong (e.g., no dropped packets, no warnings). He didn’t just punt the problem back at me, though: he came to my desk and, together, we trawled through some tcpdump.

15:48:37.752160 IP 10.68.58.2.53760 > 10.129.4.32.ssh: Flags [.], ack 36822, win 65535, options [nop,nop,TS val 511936353 ecr 1751514520], length 0
15:48:37.752169 IP 10.129.4.32.ssh > 10.68.58.2.53760: Flags [.], seq 47766:55974, ack 3670, win 601, options [nop,nop,TS val 1751514521 ecr 511936353], length 8208
15:48:37.752215 IP 10.68.58.2.53760 > 10.129.4.32.ssh: Flags [.], ack 36822, win 65535, options [nop,nop,TS val 511936353 ecr 1751514520,nop,nop,sack 1 {491353276:491354644}], length 0
15:48:37.752240 IP 10.129.4.32 > 10.68.58.2: ICMP host 10.129.4.32 unreachable - admin prohibited, length 72

The sender, 10.129.4.32, was sending an ICMP error back to the receiver, 10.68.58.2. Niggling memory of this “admin prohibited” message reminded me about our iptables configuration.

-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -p tcp -m tcp --dport 22 -m state --state NEW -m comment --comment "ssh" -j ACCEPT
-A INPUT -j REJECT --reject-with icmp-host-prohibited

iptables rejects any packet that isn’t explicitly allowed or part of an existing connection with icmp-host-prohibited, exactly as we were seeing. In particular, it seemed to be rejecting any packet that contained SACK fields.
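Those three rules are evaluated first-match-wins, which is roughly what the toy model below does. It is only a sketch: the conntrack state names are the real ones, but the `Packet` class and `evaluate` function are hypothetical and stand in for the kernel's actual matching.

```python
from dataclasses import dataclass


@dataclass
class Packet:
    proto: str     # "tcp", "icmp", ...
    dport: int     # destination port (0 when not applicable)
    ct_state: str  # conntrack state: NEW, ESTABLISHED, RELATED or INVALID


def evaluate(pkt: Packet) -> str:
    """Walk the three INPUT rules in order and return the first verdict."""
    # -A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
    if pkt.ct_state in ("RELATED", "ESTABLISHED"):
        return "ACCEPT"
    # -A INPUT -p tcp -m tcp --dport 22 -m state --state NEW -j ACCEPT
    if pkt.proto == "tcp" and pkt.dport == 22 and pkt.ct_state == "NEW":
        return "ACCEPT"
    # -A INPUT -j REJECT --reject-with icmp-host-prohibited
    return "REJECT icmp-host-prohibited"


print(evaluate(Packet("tcp", 22, "ESTABLISHED")))  # ACCEPT
print(evaluate(Packet("tcp", 22, "INVALID")))      # REJECT icmp-host-prohibited
```

The point of the sketch is the fall-through: a packet on port 22 that connection tracking does not recognise as NEW, ESTABLISHED, or RELATED matches neither ACCEPT rule and lands on the REJECT.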

When packets are dropped (or arrive out-of-order) in a modern TCP connection, the receiver sends a SACK message, “ACK X, SACK Y:Z”. The “selective” acknowledgement indicates that segments between X and Y are missing, but allows later segments to be acknowledged out-of-order, avoiding unnecessary retransmission of already-received segments. For some reason, such segments were not being identified by iptables as part of the ESTABLISHED connection.
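As a rough illustration of how a sender might read an "ACK X, SACK Y:Z" message, here is a minimal sketch. The function name and the example numbers are made up, it treats the ACK as the next byte expected and each SACK block as a half-open byte range, and it ignores 32-bit sequence wraparound.

```python
def missing_ranges(ack, sack_blocks):
    """Return the [start, end) byte ranges the receiver is still waiting for.

    ack         -- cumulative ACK: the next byte the receiver expects
    sack_blocks -- (left, right) pairs the receiver already holds,
                   with the right edge exclusive
    """
    gaps = []
    cursor = ack
    for left, right in sorted(sack_blocks):
        if left > cursor:
            # Everything between the cursor and this block is a hole.
            gaps.append((cursor, left))
        cursor = max(cursor, right)
    return gaps


# Hypothetical numbers: two holes, one before each SACKed block.
print(missing_ranges(1000, [(2000, 3000), (4000, 5000)]))
```

Only the holes need to be retransmitted; the SACKed blocks are already safe on the other side.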

A Red Hat bugzilla indicated that you should solve this problem by disabling SACK. That seems pretty stupid to me, though, so I went looking around in the iptables documentation instead. A netfilter patch seemed to indicate that iptables connection tracking should support SACK, so I contacted the author–a wonderful gentleman named Jozsef Kadlecsik–who confirmed that SACK should be totally fine passing through iptables in our kernel version. Instead, he indicated that problems like this usually implicate a misbehaving intermediate firewall appliance.

And so the cycle of blame was back on the Cisco… but why?

Let’s take a look at that tcpdump again.

15:48:37.752215 IP 10.68.58.2.53760 > 10.129.4.32.ssh: Flags [.], ack 36822, win 65535, options [nop,nop,TS val 511936353 ecr 1751514520,nop,nop,sack 1 {491353276:491354644}], length 0
15:48:37.752240 IP 10.129.4.32 > 10.68.58.2: ICMP host 10.129.4.32 unreachable - admin prohibited, length 72

ACK 36822, SACK 491353276:491354644… so segments 36823:491353276 are missing? That’s quite a jump. Surely 491316453 segments didn’t get sent in a few nanoseconds.

A Cisco support document holds the answer; or, at least, the beginning of one. By default, the Cisco firewall performs “TCP Sequence Number Randomization” on all TCP connections. That is to say, it modifies TCP sequence ids on incoming packets, and restores them to the original range on outgoing packets. So while the system receiving the SACK sees this:

15:48:37.752215 IP 10.68.58.2.53760 > 10.129.4.32.ssh: Flags [.], ack 36822, win 65535, options [nop,nop,TS val 511936353 ecr 1751514520,nop,nop,sack 1 {491353276:491354644}], length 0

…the system sending the SACK (the one requesting the file) sees this:

15:49:42.638349 IP 10.68.58.2.53760 > 10.129.4.64.ssh: . ack 36822 win 65535 <nop,nop,timestamp 511936353 1751514520,nop,nop,sack 1 {38190:39558}>

The receiver says “I have 36822 and 38190:39558, but missed 36823:38189.” The sender sees “I have 36822 and 491353276:491354644, but missed 36823:491353275.” 491353276 is larger than the largest sequence id the sender has sent, so iptables categorizes the packet as INVALID.
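The check that fails can be approximated as "no SACK block may reach past the highest sequence number sent so far." The sketch below is a deliberate simplification and not netfilter's actual window tracking; the function names are hypothetical, and `highest_sent` is taken from the `seq 47766:55974` line in the earlier dump.

```python
MOD = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


def seq_after(a, b):
    """True if sequence number a comes after b, modulo 2**32."""
    return a != b and (a - b) % MOD < 2 ** 31


def sack_is_plausible(sack_blocks, highest_sent):
    """Reject any SACK block that acknowledges data never sent."""
    return all(not seq_after(right, highest_sent) for _, right in sack_blocks)


highest_sent = 55974  # right edge of the last segment the sender put on the wire

# What the receiver meant to say: fine.
print(sack_is_plausible([(38190, 39558)], highest_sent))          # True
# What actually arrived at the sender: acknowledges data far in the future.
print(sack_is_plausible([(491353276, 491354644)], highest_sent))  # False
```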

But wait… if the Cisco is rewriting sequence ids, why is it only the SACK fields that are different between the sender and the receiver? If Cisco randomizes the sequence ids of packets that pass through the firewall, surely the regular sequence and ACK id fields should be different, too.

It’s a trick question: even though tcpdump reports 36822 on both ends of the connection, the actual sequence ids on the wire are different. By default, tcpdump normalizes sequence ids, starting at 1 for each new connection.

-S Print absolute, rather than relative, TCP sequence numbers.
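The effect of that normalization can be sketched in a few lines. The initial sequence numbers below are invented for illustration (the real ones never appear in the dumps above), and `relative` is a hypothetical helper, not part of tcpdump.

```python
MOD = 2 ** 32


def relative(absolute_seq, isn):
    """Sequence number relative to the connection's initial sequence number."""
    return (absolute_seq - isn) % MOD


# Hypothetical ISNs for the same connection, as seen on either side
# of a firewall that rewrites sequence numbers.
inside_isn = 1_000_000
outside_isn = 2_500_000_000

inside_ack = (inside_isn + 36822) % MOD
outside_ack = (outside_isn + 36822) % MOD

print(inside_ack, outside_ack)           # different numbers on the wire
print(relative(inside_ack, inside_isn))  # 36822
print(relative(outside_ack, outside_isn))  # 36822
```

Each capture subtracts the ISN it observed locally, so both ends print the same small number even though the raw values differ.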

The Cisco doesn’t rewrite SACK fields to coincide with its rewritten sequence ids. The raw values are passed on, conflicting with the ACK value and corrupting the packet. In normal situations (that is, without iptables throwing out invalid packets) this only serves to break SACK; but the sender still gets the ACK and responds with normal full retransmission. It’s a performance degradation, but not a catastrophic failure. However, because iptables is rejecting the INVALID packet entirely, the TCP stack doesn’t even get a chance to try a full retransmission.
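A minimal model of the mismatch, assuming the firewall simply shifts sequence numbers by a fixed per-connection offset. The offset below is derived from the two dumps (491353276 - 38190 = 491354644 - 39558 = 491315086); the function names are hypothetical, and all values are relative to the sender's ISN, the way tcpdump prints them.

```python
MOD = 2 ** 32
OFFSET = 491315086  # 491353276 - 38190, from the two captures


def buggy_inbound(ack, sack_blocks, offset=OFFSET):
    """Translate the ACK back to the inside numbering, but forget the SACK."""
    return (ack - offset) % MOD, list(sack_blocks)


def fixed_inbound(ack, sack_blocks, offset=OFFSET):
    """Translate the ACK and every SACK edge consistently."""
    blocks = [((l - offset) % MOD, (r - offset) % MOD) for l, r in sack_blocks]
    return (ack - offset) % MOD, blocks


# The packet as it leaves the receiver, in the outside numbering.
ack_outside = 36822 + OFFSET
sack_outside = [(38190 + OFFSET, 39558 + OFFSET)]

print(buggy_inbound(ack_outside, sack_outside))
# (36822, [(491353276, 491354644)])  <- what the sender's tcpdump showed
print(fixed_inbound(ack_outside, sack_outside))
# (36822, [(38190, 39558)])          <- what the receiver meant
```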

Because TCP sequence id forgery isn’t a problem under modern TCP stacks, we’ve taken the advice of the Cisco article and disabled the randomization feature in the firewall altogether.

class-map TCP
 match port tcp range 1 65535
policy-map global_policy
 class TCP
  set connection random-sequence-number disable
service-policy global_policy global

With that, the problem has finally disappeared.

$ scp 10.129.4.32:data Downloads
data 100% 256MB 16.0MB/s 00:16
$ scp 10.129.4.32:data Downloads
data 100% 256MB 15.1MB/s 00:17
$ scp 10.129.4.32:data Downloads
data 100% 256MB 17.1MB/s 00:15

The Brzeen Hotel in Riyadh

living room

Unexpectedly, I find myself in Riyadh tonight. This is my first trip here, though so far the sum total of my experience has been airport, taxi, and hotel.

I hope that I manage to see something uniquely Riyadh during the trip, though I will not be surprised if my journey is a sequence of point-to-point trips, with no time in the actual city. A shame.

The story, so far as I know it, is that we (my team lead, Andrew Winfer, and I) have been called out as representatives of KAUST to assist KACST in the configuration of their Blue Gene/P. They have a single rack (4096 PowerPC compute cores, likely with four terabytes of memory distributed among them). As I understand it, KACST is structured much like a national laboratory, and this system is being managed by the group of research scientists using it. Apparently they haven't been terribly pleased with the system thus far; but a Blue Gene is a bit different from a traditional cluster, and those differences can be confusing at first.

I hope we will be able to assist them. More exciting, though, is the possibility that this is the first in a series of future collaborations between our two institutions.

Of course, I haven't been to KACST yet: we only just arrived in Riyadh at 22:00. I'm procrastinating sleep with the trickle of Internet available in my room.

KAUST has put us up in the Brzeen Hotel. (I'm giving up on trying to isolate a correct Arabic spelling.) The room is perfectly serviceable, if a bit barren; but overshadowing everything else is the size of it all.

Anyway: I expect there will be more interesting things to say tomorrow.

bedroom

things stolen from our house

Someone broke into our house and stole from us while we were on vacation in Greece. I had hoped that, when we returned home, I’d write a bit about our trip; but after the theft all I could think about was how broken my world felt.

I was upset about losing the things that were stolen. God has blessed us, and we’ll be alright, but I hate spending money. I didn’t like spending money to buy things in the first place, and it seems doubly wasteful to replace luxury items. Was it wrong to purchase these things in the first place? It certainly seems like it was a waste when they’re gone.

Before we left, a few emails were being circulated about break-ins on the KAUST campus. I usually think that such stories are overblown. People overreact, and panic, and I don’t want to be like that.

I felt like there was so little I could do to prevent this in the future. We’re on a closed campus, with photo IDs checked at the gate. That gave me this implicit sense of security within these walls; but now I’m questioning everything. One of the first things that we did was have key control replace the locks on our doors; but KAUST manages the locks: if someone in key control is stealing things, who can stop them? What if someone in the transportation department is targeting people that they know have gone to the airport?

I wrote all this pretty soon after it happened, when I was trying to make sense of my disconcertion about everything. A few days later, though, we got a call from KAUST security telling us that they thought they had recovered some of our property. A group of four or five guys from the housekeeping service were using their positions to stake out houses that had things worth taking.

So there you have it: I got some closure. We didn’t get everything back (Andi’s necklace and my netbook, most disappointingly) but my relative peace with that makes me feel a little bit better about the materialism that I feared in myself. I think what I needed was an end to the story. That’s a different problem, but it at least disappoints me less than it would for me to find out that my material possessions possess me as much as they seemed to for a bit.

Here’s a list of what we’ve noticed missing so far:

  • MacBook Air ($1400) (returned)

    Part number: MC906LL/A Serial number: [redacted]

  • HP Mini 1000 netbook (approx. $300)

  • Fourth-generation, 8GB iPod Nano, green ($150) (returned)

  • Fifth-generation, 8GB iPod Nano, blue ($150)

    Part number: MC037LL/A Serial number: [redacted]

  • Nokia mobile phone for AT&T GoPhone service (approx. $40)

  • Nokia mobile phone and Mobily SIM card (approx. $40)

  • gold necklace

    I got this for Andi last Christmas in Al Balad, but I don’t remember how much I paid for it at all.

  • yurbuds Ironman earbuds ($50)

  • Nike+ iPod dongle ($30)

  • Foam laptop zipper sleeve (approx. $20)

  • 60W MagSafe Power Adapter for MacBook ($80)

    This was attached to an extended power lead (adapter-specific) that was also taken.

  • Olympus digital camera, waterproof (approx. $170) (returned)

    It wasn’t this precise model, but it was one of these Olympus waterproof cameras.

  • Power adapter for a Western Digital MyBook external hard drive (partial, unknown)

    I think this might have been mistaken for the charger to one of the phones that was taken.

  • Pocket watch (approx. $100)

    Andi got this for me when she was in Turkey.

breakfast with Khan

My attempt to follow the Ramadan fast this year is going much better than last year. (I didn’t get violently ill after one day, after all.) I think it’s about ten days to go now, so I seem to be in the clear.

It’s been an interesting experience. I’ve learned a bit about myself, and about the centrality of food (specifically eating) to society. I haven’t spent much time with my coworkers during the fast: there’s no lunch, no coffee breaks…

Yesterday wasn’t a very good day for me, both for completely unrelated reasons and because I hadn’t had enough water the night before. We were out of food, so I was at Tamimi to get something to eat for iftar. There was Khan: “Jon! Come have breakfast with me!” For once, it just felt right. I went with him, and shared iftar with Khan and his coworkers in the back of Tamimi.

It was awesome.

description of Shaheen from the WatsonLinux decommissioning

I just sent out the word that WatsonLinux is being decommissioned, and managed a relatively good sales pitch for Shaheen in the process. For those of you who wonder what I do…

Shaheen is powered by a 16-rack (65536-core) IBM Blue Gene/P system and a 96-(soon 128-)node IBM System x cluster. It currently ranks at #18 on the TOP500 list [1] of the world’s most powerful supercomputers, and remains the most powerful supercomputer in the Middle East, capable of 222.82 teraflops peak (190.90 sustained).

Built with the environment in mind, Shaheen ranks #9 on the TOP Green500 list [2], providing 378.77 megaflops per watt.

If you are interested in using Shaheen in your research, contact our support desk at shaheen-help@kaust.edu.sa.

[1] http://www.top500.org/list/2009/11/100

[2] http://www.green500.org/lists/2009/11/top/list.php

I can’t always figure out the reason

I’m obsessed with reason. I have to be able to explain things to myself; to understand why things are. That doesn’t mean that my reasons are rational, but I have to put it somewhere in the taxonomy of my mind.

When our house flooded, and the ceiling caved in, KAUST moved us to a house five doors down the block that was identical in design, except that it was the mirror image of our first house. To reconcile the disruption in my mind, I decided that the reversed house represented a turning point of the reversal of our experience at KAUST. From that point on, things would be easier… the reversal of the frustration we had experienced before.

We had a very frustrating time traveling back to Saudi Arabia after our trip to the States for Thanksgiving. Our flight from Indianapolis to Chicago was delayed (picking us up) so our entire itinerary was disrupted. We were rerouted on a different airline network, so none of our new flights had any record of us. Andi’s ticket didn’t match her passport (both via nicknames and married names), causing us to almost miss our flight out of Heathrow after a nine-hour layover (three hours of which was spent arguing that Andi should be allowed to board the plane).

I spent a lot of time during our last flight (Saudi Airlines from Heathrow to Jeddah) praying to God that our luggage would be at the airport when we arrived. We were returning to a strange country with no house to go to, and no idea where we would be staying. We had been through a lot of frustration in transit, and we were both frustrated and frightened. Andi was falling apart, and, as always, I decided it was my responsibility to make it right. To keep things together. If our luggage was at the airport, against all odds, I would have evidence that, through it all, God was watching out for us, and helping us through our times of trial.

We landed at a strange airport with no luggage, no idea where we were, and no driver to pick us up. I was crushed. I didn’t swear off my faith or trust in God, mind you; but my reasoning went something like this:

  1. God does everything for a reason.

  2. God had the power to guide our luggage to us at the airport.

  3. Our luggage was not at the airport.

  4. There was no reason why we should get our luggage later, rather than immediately.

  5. God must not need us to get our luggage.

  6. We would never see our luggage again.

I had it all figured out, and I had doomed myself.

A week later, though, and we had our luggage back. Why? Because I was driven back to the airport by a wonderful man named Arnold. The same man who took Andi and me to Jeddah for Nerph’s first visit to the vet. He told me all about his family back home, why he was working in Saudi Arabia, and how he hopes to be hired as a KAUST employee so he can bring his family with him.

Later that week, he came to my house and beat me at chess. Three times in a row.

the story so far…

My original plan was to get civilfritz set up as soon as I got my apartment in New York, and begin logging my experiences there. Though I suppose I have not yet violated that expectation, it seems that I won’t be in a permanent residence, with a regular ISP, for some time. Thus blogspot.

For those who haven’t heard, things at KAUST have not gone as planned. Though I had intended to live in New York for 12-18 months, I was greeted at IBM/Watson with the announcement that I would need to relocate to Saudi Arabia in June, leaving us with less than 30 days to prepare.

We’ve had a couple of difficult days (the worst of which was probably the day we transferred the contents of the U-Haul to a seemingly tiny storage space) but God has definitely been there with us. This was what we wanted, after all: to be challenged.

We’ve been through three separate hotels so far, including less than 12 hours in an Extended Stay America, but we finally have a wonderful room at the Marriott Westchester Residence Inn in White Plains. It’s a 10-minute walk to the train station, which has been wonderful.

We found a surprisingly great church in the area our first Sunday here: the First Community Church of the Nazarene. I seem to be destined to attend an ethnically-centered church: this time it’s an all-black congregation that is, apparently, mostly Jamaican (or otherwise Caribbean). The minister and his wife sought us out, though, and took us on a tour of Westchester County (including the Pepsi Headquarters), terminating at a seaside restaurant in Stamford, Connecticut.

Shaun came for a quick weekend visit (from New Mexico) to see us off before we leave the country. We spent a day in the city, including a lot of wandering about in the subway and a riverboat cruise around Liberty Island.

Work at IBM/KAUST has been a hectic cross-section of paperwork, IT, and meetings so far, including two days of meetings with the KAUST CIO, John Larson. It’s all gone respectably well so far, and I’m excited to get to Thuwal so I can start building things that matter long-term. Until then, though, it’s back to the 32-node Linux cluster, and more paperwork to get a visa for Saudi Arabia.