Further info on Identi.ca problems

Evan Prodromou

I posted a few days ago about some technical problems with identi.ca. tl;dr version: our Web servers occasionally hit very high load and stop responding, hurting performance of the site. I've found out a few things in the last few days, so I thought I'd update here for those interested.

High load on a server can have several causes. It can be due to heavy I/O (such as network connections) or high CPU usage. The direct cause can be repeated connections to non-responsive network services, or buggy software that works inefficiently or goes into an infinite loop. Load can also be spread over multiple processes, or come from a single process hogging all the resources.

Here's what we're seeing: one Apache process on our Web server shows explosive growth in memory usage. You can see an example in this ps output. Process 26617 has allocated 4GB of memory, which forces the server to swap to disk and slows it to a standstill.
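For anyone who wants to look for the same pattern on their own box, here is a rough sketch that lists the processes with the largest resident memory. It is not the command behind the ps output mentioned above: it reads /proc directly, so it assumes Linux, and top_mem_procs is a made-up helper name.

```shell
# top_mem_procs [N]: print the N processes (default 5) with the largest
# resident set size, as "RSS_kB PID NAME", biggest first.
# Assumes Linux: reads the VmRSS field of /proc/<pid>/status.
top_mem_procs() {
    for status in /proc/[0-9]*/status; do
        # A process may exit while we loop, so ignore unreadable files.
        awk '/^Name:/  { name = $2 }
             /^Pid:/   { pid = $2 }
             /^VmRSS:/ { rss = $2 }
             END { if (rss != "") print rss, pid, name }' "$status" 2>/dev/null || true
    done | sort -rn | awk -v n="${1:-5}" 'NR <= n'
}

top_mem_procs 5
```

A runaway Apache worker like the one described above would be expected to show up at the top of this list with an RSS far above its siblings.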

This memory leak is kind of confusing, since we've got a 96MB memory_limit set in our PHP configuration. Theoretically, StatusNet itself shouldn't be able to allocate this much memory.

Another point that seems worth noting is that my systems team already had checks in place for this. If the server is loaded, periodic checks will kill and restart Apache. That means that largely these servers have been recovering on their own. It also means that the problem happens more frequently than I thought -- once or twice an hour, not once a day.
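A check like this can be very small. The following is only a sketch of the idea (the real checks are not shown in this post): the load_too_high name, the threshold, and the restart command are all assumptions.

```shell
# load_too_high MAX [FILE]: succeed if the 1-minute load average in FILE
# (default /proc/loadavg, Linux-only) is greater than MAX.
load_too_high() {
    awk -v max="$1" 'NR == 1 { high = ($1 + 0 > max + 0) }
                     END     { exit !high }' "${2:-/proc/loadavg}"
}

# Hypothetical use from cron; threshold and restart command are assumptions:
# if load_too_high 20; then service apache2 restart; fi
```

The optional second argument exists only so the function can be pointed at a sample file instead of the live /proc/loadavg.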

At this point, I'm working on a few fronts. First, I'd like to restrict the amount of memory available to any one process on the system. That should prevent (I think) the issue of one process forcing the server to swap and dragging down everything else with it. I'm trying limits.conf (hasn't worked yet) and may venture into cgroups if that doesn't work out.

Second, I've tried to mitigate some of the effects of long-running Apache processes by tuning our Apache settings (including MaxRequestsPerChild) to prevent a process from building up a lot of memory over time.

Third, I'm trying to map individual hits to Apache processes so I can determine what exactly is making that process explode. I hope that I can identify what's causing this explosive memory allocation and fix it so it doesn't happen in the first place.
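One way to do this kind of mapping, sketched here under assumptions rather than as the exact setup in use: Apache's log format supports %P, the process ID of the child that served the request. If the access log is written with the PID as its first field, pulling out every hit handled by a given process is a one-line filter. The LogFormat line, the hits_for_pid name, and the log path below are illustrative.

```shell
# Assumes the access log uses a custom format whose FIRST field is the
# worker PID (Apache's %P), for example:
#   LogFormat "%P %h %t \"%r\" %>s %b" pidlog
# hits_for_pid is a hypothetical helper name.

# hits_for_pid PID LOGFILE: print every log line served by process PID.
hits_for_pid() {
    awk -v pid="$1" '$1 == pid' "$2"
}

# Example (the path is the usual Debian default, assumed here):
# hits_for_pid 26617 /var/log/apache2/access.log
```

The last few lines printed for the exploding PID would be the requests to look at first.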

Thanks to everyone on identi.ca for their patience while I work this out. Still hacking, I promise!

Comments

Changing platforms

Thanks to everyone for your help and comments.

I agree that there is a lot of promise in some lighter-weight Web servers like lighttpd and nginx that run PHP out-of-process. I think we'd more likely look into PHP with FastCGI through Apache. I try to be conservative about this kind of change.

@Francois: ZOMG, great point! I'd tried with limits.conf but as you probably know you can't simulate the -v command-line flag there. Putting it into /etc/default/apache2 is a great way to get this done.

I'm going to try it out right now!

memory limit and C modules

I can confirm François Marier's theory: I too worked out one day that PHP's memory_limit setting does not cover memory used by C functions (modules).
I recommend running the PHP processes under FastCGI and killing those that grow too big, as a workaround until you find which module is hogging the memory.
Cheers,
X_Cli

You rock!

You rock! Thanks for making identi.ca such a great service.

We Notice

I don't think we say it enough. We do notice your hard work.

Thank you, Evan.

Limiting the virtual memory used by external C modules in PHP

We've had a similar problem [0] with our PHP application. For us, it turned out to be due to the GD library having problems with large (as in height x width, not file size) images.

What we discovered is that the PHP memory limit only applies to actual PHP code, not C libraries like GD that are called from PHP. It's not obvious which PHP libraries are implemented as external C calls that fall outside the control of the interpreter, but anything that sounds like it's using some other library is probably not in PHP and is worth looking at with suspicion.

To limit this memory, I tried setting process limits for the main Apache process and all of its children using ulimit [1].

Unfortunately, the one we really wanted to limit (resident memory or "-m") isn't implemented [2] in the Linux kernel. So what we settled on was to limit the total _virtual_ memory that an Apache process (or sub-process) can take using "ulimit -v". That did work for us.

I did it on a Debian box by adding this to the bottom of /etc/default/apache2:

ulimit -v 1048576

(ulimit -v takes its value in kilobytes, so this is a limit of 1GB of virtual memory)

You can of course ensure that it works by setting it first to a very low value and then loading one of your PHP pages and seeing it die with some kind of malloc error.
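As a smaller check that doesn't involve Apache at all, this sketch sets the limit in a subshell and asks a freshly started child shell what limit it sees. Limits set with ulimit are inherited by child processes, which is presumably why setting one before Apache starts also covers its workers. limit_in_child is a hypothetical helper name, and the sketch assumes a shell whose ulimit supports -v (bash and dash on Linux do).

```shell
# limit_in_child KB: set a virtual-memory limit (in kilobytes) inside a
# subshell, then print the limit that a newly started child shell reports.
# The subshell keeps the limit from leaking into the calling shell.
limit_in_child() {
    ( ulimit -v "$1" && sh -c 'ulimit -v' )
}

limit_in_child 1048576
```

If the child reports the same number that was set, the limit is being inherited; the calling shell's own limit is left unchanged.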

I'll have a look at cgroups as it sounds promising, but another thing I just discovered is the RLimitMEM directive [3] for Apache.

Cheers,
Francois

[0] https://bugs.launchpad.net/mahara/+bug/784978
[1] http://ss64.com/bash/ulimit.html
[2] https://www.linuxquestions.org/questions/linux-general-1/limit-computing...
[3] https://httpd.apache.org/docs/2.2/mod/core.html#rlimitmem

Maybe lightweight servers should be considered

Evan,

Thanks for working on this. Ironically, I hadn't been using Identi.ca for a while until just today, so I hope it wasn't *my* return to microblogging that caused this problem! :)

But silliness aside, I'm sure that you have already considered this, but these kinds of problems are almost always alleviated by migrating to a lightweight web server such as nginx. While they are by no means a silver bullet, they definitely have some advantages over a full-blown Apache HTTP server when it comes to handling massive load spikes and large numbers of requests. Memory blow-ups like this would also probably be more avoidable.

It's just a thought. Apache HTTP server is the stalwart of the Internet, of course, but it is by no means the only option. For specialized uses, we have alternatives. :)

Good luck

That sounds like a hell of an issue; good luck to you and the rest of the Status.net team!

Thanks for all you guys do, with both Identi.ca and the Status.net software. :D
