Post-mortem on last week’s problems

The last few days have been terrible, for us and for you. The performance on gitorious.org has been really bad, and the site has been going up and down constantly. We’re really sorry about that.

We started receiving notifications that the server was unavailable about a week ago, and since then we’ve been trying to find out what was causing the outages and bad response times – which became worse after the weekend.

One of the first things we noticed was that our caching server was unable to cache any content at all, because our Rails backend was sending a Set-Cookie HTTP header on every request. Since this taxed the server really badly, we deployed a fix on Tuesday and saw some of the most requested content (single commit diffs) being cached again.
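To illustrate why a cookie on every response is so costly, here is a minimal sketch of the kind of rule a shared HTTP cache might apply. It assumes a policy where any response carrying Set-Cookie is treated as user-specific and never stored, which is a common default, but it is not the actual configuration of our cache server. The function name and sample headers are made up for the example.

```python
from typing import Dict


def is_cacheable(response_headers: Dict[str, str]) -> bool:
    """Return True if a shared cache using this (assumed) policy may store the response."""
    # HTTP header names are case-insensitive, so compare in lower case.
    names = {name.lower() for name in response_headers}
    # A Set-Cookie header suggests the response is tied to one user, so skip the cache.
    return "set-cookie" not in names


# Hypothetical responses: the same page with and without a session cookie.
with_cookie = {"Content-Type": "text/html", "Set-Cookie": "_session=abc123; path=/"}
without_cookie = {"Content-Type": "text/html"}

print(is_cacheable(with_cookie))     # False
print(is_cacheable(without_cookie))  # True
```

Under a rule like this, a backend that sets a cookie on every request makes every page look uncacheable, so all traffic falls through to the application.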

We hoped that we had found the root cause and expected performance to improve as our cache warmed up again. When we did not see the desired improvement in response times, we started suspecting that other issues were involved. Analyzing the server load, we found a really high number of Atom requests for pages that rendered slowly, and suspected that the combination of polling RSS clients and missing cache support was slowing down the servers. We set up our cache server to force caching of any Atom request for an hour, removing Set-Cookie headers in the process. When this didn’t work, we temporarily disabled Atom requests entirely, which helped a little.
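The feed rule described above can be sketched roughly as follows. This is an illustration of the idea only, not our cache server's configuration: the function name is invented, and it assumes that feed URLs can be recognised by a `.atom` suffix.

```python
from typing import Dict, Optional, Tuple

FEED_TTL_SECONDS = 60 * 60  # one hour


def feed_cache_policy(path: str, headers: Dict[str, str]) -> Tuple[Dict[str, str], Optional[int]]:
    """Return (headers to store, forced TTL in seconds); the TTL is None for non-feed paths."""
    # Assumption: feeds are identified by a ".atom" suffix on the request path.
    if not path.endswith(".atom"):
        return headers, None
    # Drop Set-Cookie so the cached copy can be shared between clients.
    cleaned = {k: v for k, v in headers.items() if k.lower() != "set-cookie"}
    return cleaned, FEED_TTL_SECONDS


# Hypothetical feed response coming back from the backend.
backend_headers = {"Content-Type": "application/atom+xml", "Set-Cookie": "_session=abc123"}

stored_headers, ttl = feed_cache_policy("/mainline.atom", backend_headers)
print(stored_headers)  # {'Content-Type': 'application/atom+xml'}
print(ttl)             # 3600
```

With a rule like this, a feed reader polling every few minutes would hit the backend at most once an hour per feed.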

But the problems persisted. Last night we noticed that rendering a project page on gitorious.org was really slow: easily over a minute for the most popular ones. Looking through the code, it turned out that we were effectively no longer caching events listed on that page in Memcache, resulting in a lot of database access. Once you’ve found the problem, the solution usually isn’t very far away – which was also the case here. This morning we deployed the fix Christian committed last night, and the servers are handling the load a lot better today.
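For readers unfamiliar with this kind of caching, here is a minimal sketch of the read-through pattern involved. It is not the Gitorious code: the `TTLCache` class is a small in-process stand-in for a Memcache client, and the key name, the ten minute expiry and `load_events_from_db` are all hypothetical.

```python
import time
from typing import Callable, Dict, List, Tuple


class TTLCache:
    """Tiny in-process stand-in for a Memcache client (illustration only)."""

    def __init__(self) -> None:
        self._store: Dict[str, Tuple[float, List[str]]] = {}

    def fetch(self, key: str, ttl: float, compute: Callable[[], List[str]]) -> List[str]:
        """Return the cached value for key, calling compute() only on a miss or after expiry."""
        now = time.monotonic()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = compute()
        self._store[key] = (now + ttl, value)
        return value


db_queries = 0


def load_events_from_db(project: str) -> List[str]:
    """Pretend database query; counts how often it is called."""
    global db_queries
    db_queries += 1
    return [f"{project}: event {i}" for i in range(3)]


cache = TTLCache()
# Render the same project page five times within the expiry window.
for _ in range(5):
    events = cache.fetch("events:mainline", 600, lambda: load_events_from_db("mainline"))

print(db_queries)  # 1
```

If the `fetch` wrapper is bypassed, or the key changes on every request, each of those five renders goes to the database instead, which is the kind of load we were seeing.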

The reason this happened, and the reason it took so long to find the problem, was that we merged quite a big feature into master a week ago: private repositories. Gitorious.org will not be offering private repositories, and there is a “feature switch” which is turned off for gitorious.org. Apparently there were still a place or two where performance was affected even with the feature turned off, which is what happened over the last few days.

We’re not very proud of how we’ve kept you informed about the problems over the last few days. Heads-down, diagnosing the problem, trying different fixes and responding to support email, we neglected updating our status site and Identi.ca/Twitter accounts.

We did a bad job of keeping you informed, and we will make sure that you are kept fully in the loop if trouble should strike in the future.

[Edit: after publishing this post, we discovered a similar issue in the repository pages on Gitorious. We have just deployed a fix for that.]

