Applications using the LAMP (Linux®, Apache, MySQL, PHP/Perl) architecture are constantly being developed and deployed. But often the server administrator has little control over the application itself because it's written by someone else. This series of three articles discusses many of the server configuration items that can make or break an application's performance. This second article focuses on steps you can take to optimize Apache and PHP.

Linux, Apache, MySQL, and PHP (or Perl) form the basis of the LAMP architecture for Web applications. Many open source packages based on LAMP components are available to solve a variety of problems. As the load on an application increases, the bottlenecks in the underlying infrastructure become more apparent in the form of slow response to user requests. The previous article showed you how to tune the Linux system and covered the basics of LAMP and performance measurement. This article focuses on the Web server components, Apache and PHP.
Tuning Apache
Apache is a highly configurable piece of software. It has a lot of features, but
each one comes at a price. Tuning Apache is partially an exercise in proper
allocation of resources, and involves stripping down the configuration to only
what's needed.
Configuring the MPM
Apache is modular in that you can add and remove features easily.
Multi-Processing Modules (MPMs) provide this modular functionality at the core of
Apache: they manage the network connections and dispatch the requests. MPMs let
you use threads or even move Apache to a different operating system.
Only one MPM can be active at one time, and it must be compiled in statically
with --with-mpm=(worker|prefork|event).
The traditional model, in which each process handles one request at a time, is
called prefork. A newer, threaded model is called worker; it uses multiple
processes, each with multiple threads, to get better performance with lower
overhead. The final MPM, event, is an experimental module that keeps separate
pools of threads for different tasks. To determine which MPM you're currently
using, execute httpd -l.
Choosing the MPM to use depends on many factors. Setting aside the event MPM
until it leaves experimental status, it's a choice between threads and no threads.
On the surface, threading sounds better than forking, but only if all the
underlying modules are thread safe, including all the libraries used by PHP.
Prefork is the safer choice; you should do careful testing if you choose worker.
The performance gains also depend on the libraries that come with your
distribution and your hardware.
Regardless of which MPM you choose, you must configure it appropriately. In
general, configuring an MPM involves telling Apache how many workers to run and
how to scale that number, whether the workers are threads or processes. The
important configuration options for the prefork MPM are shown in Listing 1.
Listing 1. Configuration of the prefork MPM
StartServers 50
MinSpareServers 15
MaxSpareServers 30
MaxClients 225
MaxRequestsPerChild 4000
In the prefork model, each process handles one request at a time. Spare processes
are kept idle to handle incoming requests, which reduces the start-up latency. The
previous configuration starts 50 processes as soon as the Web server comes up and
tries to keep between 15 and 30 idle servers running. The hard limit on processes
is dictated by MaxClients. Even though a process can
handle many consecutive requests, Apache kills off each process after 4,000
connections, which mitigates the risk of memory leaks.
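To make the interplay of these directives concrete, here is a simplified Python model of the decision the prefork parent makes when it checks its idle children. It is a sketch, not Apache's code: the real server adjusts the pool gradually over several seconds rather than in one step, and the function name spare_server_adjustment is hypothetical. The defaults are the values from Listing 1.

```python
def spare_server_adjustment(idle: int, busy: int,
                            min_spare: int = 15, max_spare: int = 30,
                            max_clients: int = 225) -> int:
    """Return how many processes to add (positive) or remove (negative).

    Simplified model of the prefork spare-server rules; defaults follow Listing 1.
    """
    total = idle + busy
    if idle < min_spare:
        # Too few idle processes: spawn more, but never exceed MaxClients.
        return max(0, min(min_spare - idle, max_clients - total))
    if idle > max_spare:
        # Too many idle processes: the surplus is killed off.
        return max_spare - idle
    return 0

# Right after startup with StartServers 50 and no traffic, 20 idle processes are surplus.
print(spare_server_adjustment(idle=50, busy=0))  # -20
```

In this model, starting more servers than MaxSpareServers only helps during the first moments after startup; once the server sees how little traffic there is, the surplus idle processes are trimmed.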
Configuring the threaded MPMs is similar, except that you must determine how many
threads and processes are to be used. The Apache documentation explains all the
parameters and calculations necessary.
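For the worker MPM, the documentation describes how the limits relate: MaxClients caps the total number of threads, each child process runs ThreadsPerChild threads, ServerLimit caps the number of child processes, and ThreadLimit caps ThreadsPerChild. The following Python sketch checks a candidate configuration against those relationships. The class and function names are hypothetical, and the numbers in the example are illustrative rather than recommendations.

```python
from dataclasses import dataclass


@dataclass
class WorkerMPMSettings:
    max_clients: int        # MaxClients: total threads across all child processes
    threads_per_child: int  # ThreadsPerChild: threads in each child process
    server_limit: int       # ServerLimit: hard cap on the number of child processes
    thread_limit: int       # ThreadLimit: hard cap on ThreadsPerChild


def check_worker_settings(s: WorkerMPMSettings) -> list[str]:
    """Return a list of inconsistencies; an empty list means the limits fit together."""
    problems = []
    if s.threads_per_child > s.thread_limit:
        problems.append("ThreadsPerChild exceeds ThreadLimit")
    # Child processes needed to supply MaxClients threads, rounded up.
    processes_needed = -(-s.max_clients // s.threads_per_child)
    if processes_needed > s.server_limit:
        problems.append(f"MaxClients needs {processes_needed} processes, "
                        f"but ServerLimit is {s.server_limit}")
    return problems


print(check_worker_settings(WorkerMPMSettings(400, 25, 16, 64)))  # []
print(check_worker_settings(WorkerMPMSettings(400, 25, 10, 64)))
```

The second call reports that 400 threads at 25 per process need 16 processes, more than a ServerLimit of 10 allows.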
Choosing the values to use involves some trial and error. The most important
value is MaxClients. The goal is to allow enough worker
processes or threads to run without causing your server to swap excessively. If
more requests come in than can be handled, then at least those that made it
through get service; the others are blocked.
If MaxClients is too high, then all clients experience
poor service because the server runs out of physical memory and spends its time
swapping one process out to let another one run. Too low a setting means you may
deny service unnecessarily. Checking the number of processes running at high loads
and the resulting memory footprint of all the Apache processes gives you a good
idea of how to set this value. If you set MaxClients higher than 256, you must
also raise ServerLimit to the same number; read the MPM's
documentation carefully for the associated caveats.
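A common rule of thumb turns that advice into arithmetic: divide the memory you can give Apache by the average size of one Apache process under load. The Python sketch below assumes you have measured that average yourself (for example, from the RSS column of ps for the httpd processes) and chosen how much memory to reserve for everything else; both inputs are assumptions. Because RSS counts shared pages once per process, the result errs on the conservative side. The function names are hypothetical.

```python
def estimate_max_clients(total_ram_mb: int, reserved_mb: int,
                         per_process_mb: float) -> int:
    """Rough upper bound on MaxClients that should avoid swapping.

    total_ram_mb:   physical memory in the machine
    reserved_mb:    memory kept back for the OS, database, page cache, etc.
    per_process_mb: measured average resident size of one Apache process
    """
    available = total_ram_mb - reserved_mb
    if available <= 0 or per_process_mb <= 0:
        raise ValueError("need positive available memory and per-process size")
    return int(available // per_process_mb)


def needs_server_limit(max_clients: int, default_server_limit: int = 256) -> bool:
    """With prefork, MaxClients above 256 also requires raising ServerLimit."""
    return max_clients > default_server_limit


# Illustrative numbers: 8 GB of RAM, 1 GB reserved, 20 MB per process.
limit = estimate_max_clients(8192, 1024, 20)
print(limit, needs_server_limit(limit))
```

Treat the result as a starting point for load testing, not a final answer.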
Tuning the number of servers to start and keep spare depends on the role of the
server. If the server runs only Apache, you can use relatively generous values such
as those in Listing 1, because Apache can make full use of the
machine. If the system is shared with a database or other server, then you should
limit the number of spare servers being run.
Using options and overrides efficiently
Each request that Apache processes goes through a complicated set of rules that
dictates any restrictions or special instructions the Web server must follow.
Access to a folder can be restricted by IP address, or protected by a username and
password. These options also include the handling of certain files, such as
whether a directory listing is provided, how certain file types are handled, or
whether the output should be compressed.
These configurations take the form of containers in httpd.conf such as
<Directory> to specify that the configuration to follow refers to a
location on disk, or <Location> to indicate that the reference is to
a path in the URL. Listing 2 shows a Directory container in action.
Listing 2. A Directory container being applied to the root directory
<Directory />
AllowOverride None
Options FollowSymLinks
</Directory>
In Listing 2, the configuration enclosed in the
<Directory> and </Directory>
tags is applied to the given directory and everything under it, in this case
the root directory. Here, the AllowOverride directive
dictates that users aren't allowed to override any options (more on this later).
The FollowSymLinks option is enabled, which lets Apache
follow symbolic links to serve the request, even if the target file is outside the
directory containing Web files. This means that if a file in your Web directory is
a symlink to /etc/passwd, the Web server happily serves the file if asked. With
-FollowSymLinks used instead, this feature is disabled,
and the same request causes an error to be returned to the client.
This last scenario is a cause for concern on two fronts. The first is a
performance matter. If FollowSymLinks is disabled, then
Apache must check each component of the filename (directories and the file itself)
to make sure they're not symbolic links. This incurs extra overhead in the form of
disk activity. A companion option called
FollowSymLinksIfOwnerMatch follows the symbolic link if
the owner of the file is the same as that of the link. This has the same
performance hit as disabling following of symlinks. For best performance, use the
options in Listing 2.
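The extra work is easy to picture. The Python sketch below models what a "check every component" policy costs: an extra lstat() system call for every directory in the path and for the file itself. It is only a model of the overhead, not Apache's implementation, and the function names are hypothetical.

```python
import os
import stat


def components_to_check(path: str) -> list[str]:
    """Every prefix of an absolute path, from the first directory down to the file itself."""
    parts = os.path.normpath(path).strip(os.sep).split(os.sep)
    return [os.sep + os.path.join(*parts[: i + 1]) for i in range(len(parts))]


def contains_symlink(path: str) -> bool:
    """Model of the 'no symlinks allowed' check: one lstat() call per path component."""
    for component in components_to_check(path):
        if stat.S_ISLNK(os.lstat(component).st_mode):
            return True
    return False


# Four components, so four lstat() calls for a single static file.
print(components_to_check("/var/www/html/index.html"))
```

Deeper document trees mean more calls per request, which is why leaving FollowSymLinks enabled is the faster setting.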
Security-conscious readers should be alert by now. Security is always a trade-off
between functionality and risk. In this case, the functionality is speed, and the
risk is allowing unauthorized access to files on the system. One of the
mitigations is that LAMP application servers are generally dedicated to a
particular function, and users can't create the potentially dangerous symbolic
links. If it's vital to have symbolic link-checking enabled, you can restrict it
to a particular area of the file system, as in Listing 3.
Listing 3. Restricting FollowSymLinks to a user's directory
<Directory />
Options FollowSymLinks
</Directory>
<Directory /home/*/public_html>
Options -FollowSymLinks
</Directory>
In Listing 3, any public_html directory in a user's home directory has the
FollowSymLinks option removed for it and any child
directories.
As you've seen, options can be configured on a per-directory basis through the
main server configuration. Users can override this server configuration themselves
(if the administrator permits it with the AllowOverride directive) by dropping a
file called .htaccess into a directory. This file contains additional server
directives that are loaded and followed on each request to the directory where the
.htaccess file resides. Despite the earlier discussion about not having users on
the system, many LAMP applications use this functionality to control access and
for URL rewriting, so it's wise to understand how it works.
Even though the AllowOverride directive prevents
users from doing anything you don't want them to, Apache must still look for the
.htaccess file to see if there is any work to be done. A parent directory can
specify directives that are to be processed by requests from child directories,
which means Apache must also search each component of the directory tree leading
to the requested file. Understandably, this causes a great deal of disk activity
on each request.
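To see how quickly this adds up, the following Python sketch lists the access files Apache would have to probe for one request when overrides are allowed from the root directory down. It models the lookup described above rather than reproducing Apache's code. The function name is hypothetical, and the file name defaults to .htaccess, which is what the AccessFileName directive uses unless you change it.

```python
import posixpath


def htaccess_candidates(request_file: str, access_file_name: str = ".htaccess") -> list[str]:
    """Access files probed for one request when overrides are allowed everywhere.

    Every directory from the file-system root down to the file's own directory is checked.
    """
    directory = posixpath.dirname(posixpath.normpath(request_file))
    parts = [p for p in directory.split("/") if p]
    dirs = ["/"] + ["/" + "/".join(parts[: i + 1]) for i in range(len(parts))]
    return [posixpath.join(d, access_file_name) for d in dirs]


for candidate in htaccess_candidates("/home/user/public_html/project/notes.html"):
    print(candidate)
```

That is five probes for a single request, and a page that pulls in images, stylesheets, and scripts repeats them for every file it references.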
The easiest solution is to not allow any overrides, which eliminates the need for
Apache to check for .htaccess. Any special configurations are then placed directly
in httpd.conf. Listing 4 shows the additions to httpd.conf to enable password
checking for a user's project directory, rather than putting them in a .htaccess
file and relying on AllowOverride.
Listing 4. Moving .htaccess configuration into httpd.conf
<Directory /home/user/public_html/project/>
AuthUserFile /home/user/.htpasswd
AuthName "uber secret project"
AuthType basic
Require valid-user
</Directory>
If the configuration is moved into httpd.conf and
AllowOverride is disabled, disk usage can be reduced.
A user's project may not attract many hits, but consider how powerful this
technique is when applied to a busy site.
Sometimes it's not possible to eliminate the use of .htaccess files. In that case,
just as an option was restricted to a certain part of the file system in Listing 3,
overrides can also be scoped, as in Listing 5.
Listing 5. Scoping .htaccess checking
<Directory />
AllowOverride None
</Directory>
<Directory /home/*/public_html>
AllowOverride AuthConfig
</Directory>
After you implement Listing 5, Apache still looks for .htaccess files in the
parent directories, but it stops in the public_html directory because the rest of
the file system has the functionality disabled. For example, if a file that maps
to /home/user/public_html/project/notes.html is requested, only the public_html
and project directories are searched.
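The same kind of Python sketch shows the effect of scoping. It assumes overrides are enabled only at and below a single directory; for simplicity, the wildcard in Listing 5 is replaced with one concrete user's public_html, and the function name is hypothetical.

```python
import posixpath


def scoped_htaccess_candidates(request_file: str, override_root: str) -> list[str]:
    """Access files probed when overrides are enabled only at and below override_root."""
    directory = posixpath.dirname(posixpath.normpath(request_file))
    root = posixpath.normpath(override_root)
    if directory != root and not directory.startswith(root + "/"):
        return []  # overrides are disabled for this part of the tree: nothing to probe
    rel_parts = [p for p in directory[len(root):].split("/") if p]
    dirs = [root] + [posixpath.join(root, *rel_parts[: i + 1]) for i in range(len(rel_parts))]
    return [posixpath.join(d, ".htaccess") for d in dirs]


print(scoped_htaccess_candidates("/home/user/public_html/project/notes.html",
                                 "/home/user/public_html"))
print(scoped_htaccess_candidates("/var/www/index.html", "/home/user/public_html"))
```

The first request now costs two probes instead of five, and requests outside the scoped area cost none.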
One final note about per-directory configurations is in order. Any document about
tuning Apache will tell you to disable DNS lookups through the
HostnameLookups off directive because trying to
reverse-resolve every IP address connecting to your server is a waste of
resources. However, any limitations based on hostname force the Web server to
perform a reverse lookup on the client's IP address and a forward lookup on the
result of that to verify the authenticity of the name. Therefore, it's wise to
avoid using access controls based on the client's hostname and to scope them as
described when they're necessary.
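The double lookup that hostname-based access control requires can be sketched with Python's standard socket module: a reverse (PTR) lookup of the client's address, then a forward lookup of the returned name to confirm it maps back to the same address. This illustrates the procedure and its cost (two DNS queries before the request can be authorized); it is not Apache's code. The resolver functions are passed in as parameters so they can be replaced in tests, the sketch handles IPv4 only, and the function name is hypothetical.

```python
import socket
from typing import Callable, Optional


def verified_hostname(ip: str,
                      reverse: Callable = socket.gethostbyaddr,
                      forward: Callable = socket.gethostbyname_ex) -> Optional[str]:
    """Return the client's hostname only if the forward lookup maps it back to ip."""
    try:
        hostname, _, _ = reverse(ip)         # DNS query 1: address -> name (PTR)
        _, _, addresses = forward(hostname)  # DNS query 2: name -> addresses (A)
    except OSError:  # socket.herror and socket.gaierror are subclasses of OSError
        return None
    return hostname if ip in addresses else None


# Stand-in resolvers; the defaults would send real queries to DNS.
def fake_reverse(ip):
    return ("client.example.com", [], [ip])


def fake_forward(name):
    return (name, [], ["192.0.2.10"])


print(verified_hostname("192.0.2.10", fake_reverse, fake_forward))
```

Every hostname-based rule that applies to a request pays for this round trip, which is why such rules are best scoped tightly.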
Persistent connections
When a client connects to a Web server, it's allowed to issue multiple requests
over the same TCP connection, which reduces the latency associated with multiple
connections. This is useful when a Web page refers to several images: The client
can request the page and then all the images over one connection. The downside is
that the worker process on the server has to wait for the session to be closed by
the client before it can move on to the next request.
Apache lets you configure how persistent connections, called keepalives,
are handled. KeepAlive On at the global level of httpd.conf enables them, and
MaxKeepAliveRequests 5 allows the server to handle 5 requests on a connection
before forcing the connection closed. KeepAlive Off disables the use of persistent
connections (setting MaxKeepAliveRequests to 0 does not; it allows an unlimited
number of requests per connection). KeepAliveTimeout, also at the global
level, determines how long Apache will wait for another request before closing the
session.
Handling persistent connections isn't a one-size-fits-all configuration. Some Web
sites fare better with keepalives disabled
(KeepAlive Off), and some experience a tremendous benefit
by having them on. The only solution is to try both and see for yourself. It's
advisable, though, to use a low timeout such as 2 seconds with
KeepAliveTimeout 2 if you enable keepalives. This
ensures that any client wishing to make another request has ample time, and that
worker processes aren't idling while waiting for another request that may never
come.
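A rough way to see why a short timeout matters is Little's law: the average number of items in a system equals their arrival rate times the time each one spends there. Under the worst-case assumption that every connection leaves its worker idle for the full KeepAliveTimeout after its last request, the number of workers tied up waiting is about the rate of new connections multiplied by the timeout. The Python sketch below is a back-of-the-envelope estimate with illustrative numbers, and the function name is hypothetical.

```python
def idle_keepalive_workers(new_connections_per_sec: float,
                           keepalive_timeout_sec: float) -> float:
    """Little's law estimate: workers waiting = arrival rate x waiting time.

    Worst case: each connection idles for the full KeepAliveTimeout after its last request.
    """
    return new_connections_per_sec * keepalive_timeout_sec


max_clients = 225  # from Listing 1
for timeout in (15, 2):
    waiting = idle_keepalive_workers(new_connections_per_sec=50,
                                     keepalive_timeout_sec=timeout)
    print(f"KeepAliveTimeout {timeout}: ~{waiting:.0f} idle workers "
          f"(MaxClients {max_clients})")
```

With a 15-second timeout, the estimate exceeds the 225 workers Listing 1 allows, so keepalive waits alone could exhaust the pool; with 2 seconds, fewer than half the workers are tied up.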
Compression
The Web server can compress the output before it's sent back to the client. This
results in a smaller page being sent over the Internet at the expense of CPU
cycles on the Web server. For those servers that can afford the CPU overhead, this
is an excellent way of making pages download faster — it isn't unheard of for
pages to be a third of their size after compression.
Images are generally already compressed, so compression should be limited to text
output. Apache provides compression through
mod_deflate. Although
mod_deflate can be simple to turn on, it includes many
complexities that the manual is eager to explain. This article doesn't cover the
configuration of compression except to provide a link to the appropriate
documentation (see the Resources section).
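Python's zlib module implements DEFLATE, the same compression algorithm mod_deflate uses, so it is handy for a quick estimate of what compression buys you. The sketch below compares a repetitive, HTML-like sample with random bytes, which stand in for images that are already compressed. The sample data is made up for illustration; measure your own pages before drawing conclusions.

```python
import os
import zlib


def compression_ratio(data: bytes, level: int = 6) -> float:
    """Compressed size divided by original size (smaller is better)."""
    return len(zlib.compress(data, level)) / len(data)


# Repetitive markup, typical of generated HTML (illustrative sample, not real traffic).
html = b"<tr><td class='item'>Widget</td><td class='price'>9.99</td></tr>\n" * 200
# Random bytes stand in for already-compressed content such as JPEG or PNG data.
already_compressed = os.urandom(len(html))

print(f"HTML-like text: {compression_ratio(html):.2f}")
print(f"Random bytes:   {compression_ratio(already_compressed):.2f}")
```

The random data doesn't shrink at all (DEFLATE's framing can even make it slightly larger), which is why compressing images only costs CPU time.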
Tuning PHP
PHP is the engine that runs the application code. You should install only the
modules you plan to use and have your Web server configured to use PHP only for
script files (usually those ending in .php) and not all static files.
Opcode caching
When a PHP script is requested, PHP reads the script and compiles it into what's
called Zend opcode, a binary representation of the code to be executed.
This opcode is then executed by the PHP engine and thrown away. An opcode cache
saves this compiled opcode and reuses it the next time the page is called. This
saves a considerable amount of time. Several opcode caches are available; I've had
a great deal of success with eAccelerator.
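The idea behind an opcode cache can be illustrated with a Python analogy, in which Python bytecode plays the role of Zend opcode: compile each script once, keep the compiled code in memory, and recompile only when the file's modification time changes. This is a sketch of the caching technique, not how eAccelerator is implemented (eAccelerator keeps its cache in shared memory across processes), and the class name is hypothetical.

```python
import os


class CompiledScriptCache:
    """Cache compiled code objects by path; recompile when the file's mtime changes."""

    def __init__(self) -> None:
        self._cache: dict[str, tuple[float, object]] = {}
        self.compilations = 0

    def get(self, path: str):
        mtime = os.path.getmtime(path)
        entry = self._cache.get(path)
        if entry is not None and entry[0] == mtime:
            return entry[1]  # cache hit: no reading or compiling needed
        with open(path, encoding="utf-8") as f:
            code = compile(f.read(), path, "exec")  # the expensive step being avoided
        self.compilations += 1
        self._cache[path] = (mtime, code)
        return code

    def run(self, path: str) -> dict:
        """Execute the (possibly cached) compiled script and return its namespace."""
        namespace: dict = {}
        exec(self.get(path), namespace)
        return namespace
```

Running the same script twice compiles it only once; the second run reuses the stored code object, just as an opcode cache reuses compiled opcode.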
Installing eAccelerator requires the PHP development libraries on your computer.
Because different Linux distributions place files in different places, it's best
to get the installation instructions directly from the eAccelerator Web site (see
the Resources section for a link). It's also possible
that your distribution has already packaged an opcode cache, and you just have to
install it.
Regardless of how you get eAccelerator on your system, there are a few
configuration options to look at. The configuration file is usually
/etc/php.d/eaccelerator.ini. eaccelerator.shm_size
defines the size of the shared memory cache, which is where the compiled scripts
are stored. The value is in megabytes. Determining the proper size depends on your
application. eAccelerator provides a script to show the status of the cache, which
includes the memory usage; 64 megabytes is a good start
(eaccelerator.shm_size="64"). You may also have to
tweak your kernel's maximum shared memory size if the value you choose isn't
accepted. Add kernel.shmmax=67108864 to
/etc/sysctl.conf, and run sysctl -p to make the setting
take effect. The value for kernel.shmmax is in bytes.
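Because the two settings use different units, the conversion is worth spelling out. The small Python sketch below converts the megabyte value of eaccelerator.shm_size into the byte value kernel.shmmax expects, and can compare it with the current limit read from /proc/sys/kernel/shmmax on Linux. The function names are hypothetical.

```python
def mb_to_bytes(megabytes: int) -> int:
    """eaccelerator.shm_size is in megabytes; kernel.shmmax is in bytes."""
    return megabytes * 1024 * 1024


def shmmax_is_sufficient(shm_size_mb: int,
                         proc_path: str = "/proc/sys/kernel/shmmax") -> bool:
    """Compare the running kernel's limit (Linux only) with the requested cache size."""
    with open(proc_path) as f:
        return int(f.read()) >= mb_to_bytes(shm_size_mb)


print(f"kernel.shmmax={mb_to_bytes(64)}")  # kernel.shmmax=67108864
```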
If the shared memory allocation is exceeded, eAccelerator must purge old scripts
from memory. By default, this is disabled;
eaccelerator.shm_ttl = "60" specifies that when
eAccelerator runs out of shared memory, any script that hasn't been accessed in 60
seconds should be purged.
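The following Python sketch models the purge rule just described: when a new script doesn't fit, entries that haven't been accessed within the TTL are dropped first. It illustrates the policy only; it is not eAccelerator's code, sizes and times are arbitrary units, and the class name is hypothetical. In this model, a script that still doesn't fit after the purge is simply not cached.

```python
class TTLPurgingCache:
    """Model of purge-on-full: when space runs out, drop entries idle longer than ttl."""

    def __init__(self, capacity: int, ttl: float) -> None:
        self.capacity = capacity
        self.ttl = ttl
        self.entries: dict[str, tuple[int, float]] = {}  # name -> (size, last access)

    def used(self) -> int:
        return sum(size for size, _ in self.entries.values())

    def touch(self, name: str, now: float) -> None:
        """Record an access to a cached script."""
        size, _ = self.entries[name]
        self.entries[name] = (size, now)

    def add(self, name: str, size: int, now: float) -> bool:
        """Try to cache a script; return False if it doesn't fit even after purging."""
        if self.used() + size > self.capacity:
            # Out of space: keep only entries accessed within the last ttl seconds.
            self.entries = {n: (s, t) for n, (s, t) in self.entries.items()
                            if now - t <= self.ttl}
        if self.used() + size > self.capacity:
            return False
        self.entries[name] = (size, now)
        return True


cache = TTLPurgingCache(capacity=100, ttl=60)
cache.add("a.php", 60, now=0)
cache.add("b.php", 30, now=10)
cache.touch("b.php", now=70)
print(cache.add("c.php", 50, now=100), sorted(cache.entries))  # True ['b.php', 'c.php']
```

At time 100, a.php has been idle for 100 seconds and is purged, while b.php was used 30 seconds earlier and survives.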
Another popular alternative to eAccelerator is the Alternative PHP Cache (APC).
Zend, the company behind the Zend Engine, also sells a commercial opcode cache that
includes an optimizer to further increase efficiency.
php.ini
You configure PHP in php.ini. Four important settings control how much system
resources PHP can consume, as listed in Table 1.
Table 1. Resource related settings in php.ini
| Setting | Description | Recommended value |
|---|---|---|
| max_execution_time | How many CPU-seconds a script can consume | 30 |
| max_input_time | How long (seconds) a script can spend parsing input data, such as POST data and file uploads | 60 |
| memory_limit | How much memory (bytes) a script can consume before being killed | 32M |
| output_buffering | How much data (bytes) to buffer before sending out to the client | 4096 |
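Table 1 gives memory_limit in bytes but recommends 32M. php.ini accepts the shorthand suffixes K, M, and G, each a power of 1024, so the conversion looks like this Python sketch (the function name is hypothetical, and it handles only a plain integer with an optional suffix).

```python
def php_shorthand_to_bytes(value: str) -> int:
    """Convert php.ini shorthand such as '32M' to bytes; K, M and G are powers of 1024."""
    value = value.strip()
    multipliers = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}
    suffix = value[-1].upper()
    if suffix in multipliers:
        return int(value[:-1]) * multipliers[suffix]
    return int(value)  # plain number of bytes, e.g. output_buffering = 4096


print(php_shorthand_to_bytes("32M"))   # 33554432
print(php_shorthand_to_bytes("4096"))  # 4096
```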
These numbers depend mostly on your application. If you accept large files from
users, then max_input_time may have to be increased,
either in php.ini or in per-directory configuration such as httpd.conf (it can't
be changed with ini_set() from within the script). Similarly, a CPU- or memory-heavy
program may need larger settings. The purpose is to mitigate the effect of a
runaway program, so disabling these settings globally isn't recommended. Another
note on max_execution_time: This refers to the CPU time
of the process, not the absolute time. Thus a program that does lots of I/O and
few calculations may run for much longer than
max_execution_time. This is also why
max_input_time can be greater than
max_execution_time.
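The difference between CPU time and elapsed time is easy to demonstrate. In this Python sketch, time.sleep() stands in for a script waiting on I/O: the elapsed (wall-clock) time grows, but almost no CPU time is consumed, which is why such a script can run well past a CPU-based limit. It's an analogy in Python, not PHP, and the function name is hypothetical.

```python
import time


def cpu_and_wall(seconds_waiting: float) -> tuple[float, float]:
    """Time a task that mostly waits, returning (cpu_seconds, wall_seconds)."""
    cpu_start, wall_start = time.process_time(), time.perf_counter()
    time.sleep(seconds_waiting)  # stands in for waiting on a database or network call
    return time.process_time() - cpu_start, time.perf_counter() - wall_start


cpu, wall = cpu_and_wall(0.5)
print(f"CPU: {cpu:.3f}s  wall: {wall:.3f}s")
```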
The amount of logging that PHP can do is configurable. In a production
environment, disabling all but the most critical logs saves disk writes. If logs
are needed to troubleshoot a problem, you can turn up logging as needed.
error_reporting = E_COMPILE_ERROR|E_ERROR|E_CORE_ERROR
turns on enough logging to spot problems but eliminates a lot of chatter from
scripts.

Summary
This article focused on tuning the Web server, both Apache and PHP. With Apache,
the general idea is to eliminate extra checks the Web server must do, such as
processing the .htaccess file. You must also tune the Multi-Processing Module
you're using to balance the system resources used with the availability of idle
workers for incoming requests. The best thing you can do for PHP is to install an
opcode cache. Keeping your eye on a few resource settings also ensures that
scripts don't hog resources and make the system slow for everyone else.