Discussion:
How do you approach the problem of "MaxClients reached" with apache?
francis picabia
2012-02-22 14:10:02 UTC
Hello,

One of the most frustrating problems which can happen in apache is to
see the error:

server reached MaxClients setting

After it, the server slowly spirals down. Sometimes it mysteriously recovers.
This is difficult to diagnose after the problem appeared and went away.

What advice do people have on:

a) diagnosis of the cause when the problem is not live
b) walling off the problem so one bad piece of code or data does not
affect all sites hosted on the same server

Obviously a virtual machine could handle item b, but I am hoping for
something interesting on the apache side. There is always the option
of increasing the limit, but that is typically not the solution - the
real problem for apache is bad code, or data beyond what the code can
digest in a timely manner.

One tip I've seen suggested is to reduce the apache timeout from 120
seconds down to 60 seconds or so.
Jochen Spieker
2012-02-22 15:20:02 UTC
Post by francis picabia
One of the most frustrating problems which can happen in apache is to
server reached MaxClients setting
Well, either you are trying to serve more clients than you have
configured, or whatever is responsible for creating the responses
(application server, CGI script, PHP, the filesystem) is too slow.

You can monitor response times using a custom LogFormat (logged with a
CustomLog directive).
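
For example, a rough sketch of such a format (the "resptime" nickname
and the log path are just placeholders; %D logs the time taken to
serve each request in microseconds, %T the same in whole seconds):

    LogFormat "%h %l %u %t \"%r\" %>s %b %D" resptime
    CustomLog /var/log/apache2/access-resptime.log resptime
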
Post by francis picabia
One tip I've seen suggested is to reduce the apache timeout from 120
seconds down to 60 seconds or so.
What might help is to use a low KeepAliveTimeout, or not to use
KeepAlive at all. That makes Apache kill idle connections faster.
Reducing the compression level (DeflateCompressionLevel, if mod_deflate
is in use) has helped on one site I help out with, too.
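
A sketch of the directives involved (the values are only illustrative,
and DeflateCompressionLevel only applies if mod_deflate is doing the
compression):

    # keep idle connections around for a short time only,
    # or disable keep-alive entirely with "KeepAlive Off"
    KeepAlive On
    KeepAliveTimeout 3
    MaxKeepAliveRequests 100

    # cheaper compression if mod_deflate is in use
    <IfModule mod_deflate.c>
        DeflateCompressionLevel 3
    </IfModule>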

J.
--
Ultimately, the Millennium Dome is a spectacular monument of the
doublethink of our times.
[Agree] [Disagree]
<http://www.slowlydownward.com/NODATA/data_enter2.html>
Karl E. Jorgensen
2012-02-22 17:00:02 UTC
Post by francis picabia
Hello,
One of the most frustrating problems which can happen in apache is to
server reached MaxClients setting
After it, the server slowly spirals down. Sometimes it mysteriously recovers.
This is difficult to diagnose after the problem appeared and went away.
a) diagnosis of the cause when the problem is not live
b) walling off the problem so one bad piece of code or data does not
affect all sites hosted on the same server
To diagnose it, enable server-status - this gives you a URL in apache
where you can see how many workers are busy, how many requests are being
served etc. You want to tie this into your monitoring system.
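
For reference, a sketch of how server-status is usually switched on
(Apache 2.2 syntax; the allowed address is a placeholder for your
monitoring host, and on Debian the module itself is enabled with
"a2enmod status"):

    # show per-worker details (client, vhost, request) on the status page
    ExtendedStatus On

    <Location /server-status>
        SetHandler server-status
        Order deny,allow
        Deny from all
        # localhost plus a hypothetical monitoring host
        Allow from 127.0.0.1 ::1 192.0.2.10
    </Location>

The machine-readable variant at /server-status?auto is the easiest
thing to poll from a monitoring system.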

As for increasing capacity you need to read the docs for the MPM
you're using - for example
http://httpd.apache.org/docs/2.2/mod/prefork.html for the pre-forking
server.
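
The knobs in question for the prefork MPM look roughly like this (the
numbers are purely illustrative; note that MaxClients cannot be raised
above ServerLimit, which defaults to 256 for prefork):

    <IfModule mpm_prefork_module>
        StartServers          5
        MinSpareServers       5
        MaxSpareServers      10
        ServerLimit         256
        MaxClients          256
        MaxRequestsPerChild 2000
    </IfModule>
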
Post by francis picabia
Obviously a virtual machine could handle item b, but I am hoping for
something interesting on the apache side. There is always the option
of increasing the limit, but that is typically not the solution - the
real problem for apache is bad code, or data beyond what the code can
digest in a timely manner.
One tip I've seen suggested is to reduce the apache timeout from 120
seconds down to 60 seconds or so.
francis picabia
2012-02-23 02:30:02 UTC
On Wed, Feb 22, 2012 at 12:39 PM, Karl E. Jorgensen wrote:
Post by francis picabia
Hello,
One of the most frustrating problems which can happen in apache is to
server reached MaxClients setting
After it, the server slowly spirals down. Sometimes it mysteriously recovers.
This is difficult to diagnose after the problem appeared and went away.
a) diagnosis of the cause when the problem is not live
b) walling off the problem so one bad piece of code or data does not
affect all sites hosted on the same server
To diagnose it, enable server-status - this gives you a URL in apache where
you can see how many workers are busy, how many requests are being served
etc. You want to tie this into your monitoring system.
I've never tried this. I'll need to see if it can be polled by cacti
or something.
It seems the problem often happens when there is no one in the office.
Post by Karl E. Jorgensen
As for increasing capacity you need to read the docs for the MPM
http://httpd.apache.org/docs/2.2/mod/prefork.html for the pre-forking
server.
Capacity isn't the issue; the issue is finding the source of the
runaway train/fire, whichever analogy makes sense for you.

Another thing I've looked at is logging more detail. The LogFormat
can include %T, %I and %O to show the time taken to serve the response,
the input bytes and the output bytes. I suspect that if we look back in
the logs while using the enhanced logging and see some of these values
rise, we will have an idea of where the lag is beginning.
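
A sketch of what that enhanced format might look like (the "diag"
nickname and the path are placeholders; %I and %O come from mod_logio,
enabled on Debian with "a2enmod logio"):

    # %T = seconds taken to serve the request
    # %I = bytes received, %O = bytes sent (both including headers)
    LogFormat "%h %l %u %t \"%r\" %>s %T %I %O" diag
    CustomLog /var/log/apache2/access-diag.log diag
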
Bob Proulx
2012-02-22 20:00:02 UTC
Post by francis picabia
One of the most frustrating problems which can happen in apache is to
server reached MaxClients setting
Why is it frustrating?

Is that an error? Or is that protection against a denial of service
attack? I think it is protection.

The default for Debian's apache2 configuration is MaxClients 150.
That is fine for many systems but way too high for many lightweight
virtual servers, for example. Every Apache process consumes memory.
The amount of memory will depend upon your configuration (whether
mod_php or other modules are installed) but values between 20M and
50M are typical. Even on the low end of 20M per process, hitting 150
clients means roughly 3000M (that is, about three gigs) of memory.
If you only had a 512M RAM server instance this would be a serious VM
thrash, would slow your server to a crawl, and would generally be very
painful. The default MaxClients 150 is really only suitable for
systems with several gigs of memory; on a 4G machine the default
should certainly be fine. On a busier system you would need
additional performance tuning.

On small machines I usually decide how much memory I am willing to
allocate to Apache and then set MaxClients accordingly. For example,
on a 512M RAM virtual private server running a small site the best
MaxClients value is down in the range of 12, depending upon various
factors. Hitting MaxClients is often required protection against
external attacks.
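
As a sketch of that kind of budgeting (using the rough 20M-per-process
figure from above; the numbers are only an illustration):

    # ~512M total; leave roughly half for the OS, database, etc.
    # ~256M / ~20M per prefork process => MaxClients around 12
    <IfModule mpm_prefork_module>
        StartServers         2
        MinSpareServers      2
        MaxSpareServers      4
        MaxClients          12
        MaxRequestsPerChild 500
    </IfModule>
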
Post by francis picabia
After it, the server slowly spirals down. Sometimes it mysteriously
recovers. This is difficult to diagnose after the problem appeared
and went away.
a) diagnosis of the cause when the problem is not live
Look in your access and error logs for a high number of simultaneous
clients. Tools such as munin, awstats, webalizer and others may be
helpful. I use those in addition to scanning the logs directly.
Post by francis picabia
b) walling off the problem so one bad piece of code or data does not
affect all sites hosted on the same server
I haven't tried this and have no information, but it would seem
reasonable to hack up some type of iptables rate limiting based upon
attack activity from the log files - something like what fail2ban does
for ssh connections.
Post by francis picabia
Obviously a virtual machine could handle item b, but I am hoping for
something interesting on the apache side. There is always the option
of increasing the limit, but that is typically not the solution - the
real problem for apache is bad code, or data beyond what the code can
digest in a timely manner.
Usually the problem with MaxClients is an unfriendly Internet. Robots
hammering away at sites rapidly in parallel can consume many server
resources. The attack is usually external.
Post by francis picabia
One tip I've seen suggested is to reduce the apache timeout from 120
seconds down to 60 seconds or so.
Reducing the Timeout value is also useful for reducing the attack window.
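
In configuration terms that is just the core Timeout directive, e.g.
with the value suggested earlier in the thread:

    # how long a worker will wait on network I/O for a single request
    Timeout 60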

Bob
francis picabia
2012-02-23 02:20:02 UTC
Post by Bob Proulx
Post by francis picabia
One of the most frustrating problems which can happen in apache is to
server reached MaxClients setting
Why is it frustrating?
Yes, maybe you don't know this condition. Suppose you have hundreds of
users who might decide to dabble in some php and not know much more
than their textbook examples? In this case, part of the high connection
rate is merely code running on the server. It comes from the server's IP,
so no, a firewall rate limit won't help. It is particularly annoying when
this happens after hours and we need to understand the situation
after the fact.
Post by Bob Proulx
Is that an error?  Or is that protection against a denial of service
attack?  I think it is protection.
It does protect the OS, but it doesn't protect apache. Apache stops taking
new connections, and it is just as good as if the system had burned to
the ground in terms of what the outside world sees.

MaxClients isn't much of a feature from the standpoint of running a service
as the primary purpose of the server. When the maximum is reached,
apache does nothing to get rid of the problem. It can just stew there
not resolving any hits for the next few hours. It is not as useful say
as the OOM killer in the Linux kernel.
Post by Bob Proulx
The default for Debian's apache2 configuration is MaxClients 150.
[...]
Ours was at 256, already tuned. So the problem is, as I stated, not
about raising the limit, but about troubleshooting the source of
the problem.
Post by Bob Proulx
Look in your access and error logs for a high number of simultaneous
clients.  Tools such as munin, awstats, webalizer and others may be
helpful.  I use those in addition to scanning the logs directly.
I hate munin. Too much overhead for the systems where we
are already close to performance issues when there is
peak traffic. I already poll the load and it had not increased.
I should add a scan of the memory usage.
I prefer cacti for this sort of thing.

Shortening the timeout is another useful thing as you mentioned.
Bob Proulx
2012-03-20 06:00:01 UTC
Post by francis picabia
Post by Bob Proulx
Post by francis picabia
One of the most frustrating problems which can happen in apache is to
server reached MaxClients setting
Why is it frustrating?
Yes, maybe you don't know this condition.
I am familiar with the condition. But for me it is protection. I am
not maintaining anything interesting enough to be slashdotted. But
some of the sites do get attacked by botnets.
Post by francis picabia
Suppose you have hundreds of users who might decide to dabble in
some php and not know much more than their textbook examples? In
this case, part of the high connection rate is merely code running
on the server. It comes from the server's IP,
That would definitely be a lot of users to hit this limit all at one
time only from internal use. I could imagine a teacher assigning
homework all due at midnight causing that kind of concentration of
load, though.
Post by francis picabia
so no, a firewall rate limit won't help. It is particularly annoying when
this happens after hours and we need to understand the situation
after the fact.
You would need to have a way to examine the log files and gain
knowledge from them. Lots of connections in a short period of time
would be the way to find that.

The default configuration is designed to be useful to the majority of
users. That is why it is a default. But especially if you are a
large site and need industrial strength configuration then you will
probably need to set some industrial strength configuration.
Post by francis picabia
Post by Bob Proulx
Is that an error? Or is that protection against a denial of service
attack? I think it is protection.
It does protect the OS, but it doesn't protect apache. Apache stops
taking new connections, and it is just as good as if the system had
burned to the ground in terms of what the outside world sees.
First, I don't quite follow you. If Apache has been limited because
there aren't enough system resources (such as RAM) for it to run a
zillion processes, then if it were to try to run a zillion processes
it would hurt everything. By hurt I mean either swapping and
thrashing, or having the out-of-memory killer invoked, or other bad
things. That isn't good for anyone.

Secondly Apache continues to service connections. It isn't stuck and
it isn't "burned to the ground". So that is incorrect. It just won't
spawn any new service processes. The existing processes will continue
to process queued requests from the network. They will run at machine
speed to handle the incoming requests as fast as they can. Most web
browsers will simply see slower responses. If the server can't keep
up within the browser's timeout then of course the browser will
timeout. That is the classic "slashdot" effect. But that isn't to
say that Apache stops responding. A large number of clients will get
serviced.

There isn't any magic pixie dust to sprinkle on. The machine can only
do so much. At some point eBay had to buy a second server. :-)
Post by francis picabia
MaxClients isn't much of a feature from the standpoint of running a service
as the primary purpose of the server. When the maximum is reached,
apache does nothing to get rid of the problem. It can just stew there
not resolving any hits for the next few hours. It is not as useful say
as the OOM killer in the Linux kernel.
I read your words but I completely disagree with them. The apache
server doesn't take a nap for a few hours. Of course if you are
slashdotted then the wave will continue for many hours or days but
that is simply the continuing hits from the masses. But if you have
already max'd out your server then your server is max'd. Unlike
people the machine is good at math and knows that giving 110% is just
psychological but not actually possible. And as far as the OOM killer
goes, well, it is never a good thing.
Post by francis picabia
Post by Bob Proulx
The default for Debian's apache2 configuration is MaxClients 150.
[...]
Ours was at 256, already tuned. So the problem is, as I stated, not
about raising the limit, but about troubleshooting the source of
the problem.
I think you have some additional problem above and beyond hitting
MaxClients. If you are having some issue that is causing your system
to be completely unresponsive for hours as you say then something else
is going wrong.
Post by francis picabia
Post by Bob Proulx
Look in your access and error logs for a high number of simultaneous
clients. Tools such as munin, awstats, webalizer and others may be
helpful. I use those in addition to scanning the logs directly.
I hate munin. Too much overhead for the systems where we
are already close to performance issues when there is
peak traffic. I already poll the load and it had not increased.
I should add a scan of the memory usage.
I prefer cacti for this sort of thing.
As I simply listed munin as an example it isn't something I say is
required. But munin on a node is quite low overhead. On the server
where it is producing graphs it can be a burden. But on individual
nodes it is quite light weight. Of course there are a lot of
available plugins. If a particular plugin is problematic (I don't
know of any) I would simply disable that one plugin.

Bob
francis picabia
2012-03-20 15:50:02 UTC
Post by Bob Proulx
[...]
I rattled on about MaxClients because it isn't very helpful in finding
why the server stopped responding. It would be better if it dumped out
a snapshot of which URLs it was servicing at the moment MaxClients was
reached. And yes, in my experience, the next thing apache generally
needs after hitting the limit is a restart of the apache service to
make it responsive again. I've run into this with Lon-Capa and merely
80 users, or with a Contao-driven web site where one web master wrote
inefficient PHP code to update sports scores in a browser client. It
doesn't take a slashdot event to bring down apache - more often it is
some code that generates too many concurrent hits, or not enough
memory for the web app and the expected user numbers.

The solution I've found for monitoring is to enable /server-status
with extended information. I wait for the problem to appear and take
a look at which URLs are associated with the W (sending reply) state.

Thanks for the responses from all.