Web Server
A web server is software, together with the computer that runs it, that processes requests via HTTP, the basic network protocol used to distribute information on the World Wide Web (WWW). When the web server receives an HTTP request, it responds with an HTTP response, such as sending back an HTML page.
Apache is one such web server. Other web servers available on the market include Microsoft IIS, Nginx, Lighttpd, Jigsaw, KLone, Abyss Web Server, Oracle HTTP Server, X5 webserver, Zeus Web Server, IBM HTTP Server, Google Web Server, Oracle iPlanet Web Server, and Red Hat Web Server.
Apache
The Apache HTTP Server is the world's most used web server software. Apache was originally based on the NCSA HTTPd server, and its development began in early 1995 after work on the NCSA code stalled. Apache played a key role in the initial growth of the World Wide Web, quickly overtaking NCSA HTTPd as the dominant HTTP server, and has remained the most popular since April 1996. In 2009, it became the first web server software to serve more than 100 million websites.
Apache is developed and maintained by an open community of developers under the auspices of the Apache Software Foundation. It is most commonly used on Unix-like systems (usually Linux), but the software is available for a wide variety of operating systems, including eComStation, Microsoft Windows, NetWare, OpenVMS, OS/2, and TPF. Released under the Apache License, Apache is free and open-source software.
Apache version | Initial release | Latest release
1.3 | 1998-06-06 |
2.0 | 2002-04-06 | 2013-07-10 (2.0.65)
2.2 | 2005-12-01 | 2015-07-17 (2.2.31)
2.4 | 2012-02-21 | 2015-12-14 (2.4.18)
Client (web browser): A client connects to a server (Apache HTTP Server) using the specified protocol (HTTP) and requests a resource using its URL path.
Server (Apache HTTP Server): The server sends back a response consisting of a status code and, optionally, a response body. The status code indicates whether the request was successful and, if not, what kind of error condition occurred. This tells the client what it should do with the response.
To connect to a server, the client must first resolve the server name to an IP address, which is the location on the Internet where the server resides. Therefore, for your web server to be reachable, its server name must be registered in DNS.
If you don't know how to do this, contact your network administrator or Internet service provider to perform this step for you.
More than one hostname may point to the same IP address, and
more than one IP address can be attached to the same physical server. Thus, you
can run more than one web site on the same physical server, using a feature
called virtual hosts.
If you are testing a server that is not Internet-accessible,
you can put host names in your hosts file in order to do local resolution. For
example, you might want to put a record in your hosts file to map a request for
www.example.com to your local system, for testing purposes.
This entry would look like:
127.0.0.1 www.example.com
The hosts file is usually located at /etc/hosts on Unix-like systems or at C:\Windows\system32\drivers\etc\hosts on Windows.
Features of Apache:
1. Modules
2. Aliases
3. Virtual Hosting
Modules: One of Apache's key features is its modular construction. After installation, you can add extra functionality quickly and easily by loading modules, without having to recompile the source code. For example, you can load mod_dir for basic directory handling or mod_auth to authenticate users against a text file.
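As a minimal sketch, modules are loaded at runtime with the LoadModule directive in httpd.conf. The module identifiers below are the standard ones for mod_dir and for the Apache 2.2+ modules that replaced mod_auth for file-based password authentication. The modules/ path is relative to ServerRoot and is an assumption, because it varies by installation.

```apache
# Load mod_dir (directory index handling) as a Dynamic Shared Object.
# Assumed path, relative to ServerRoot; it varies by installation.
LoadModule dir_module modules/mod_dir.so

# In Apache 2.2 and later, password authentication against a text file
# is split across these modules, which replace the older mod_auth.
LoadModule auth_basic_module modules/mod_auth_basic.so
LoadModule authn_file_module modules/mod_authn_file.so
```

After adding such lines, restart or reload httpd so that the modules are actually loaded.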
The modular approach makes it easy for third-party developers to add functionality to the server. You can also customize Apache for a site by developing your own modules using the Apache module API.
Aliases: Apache supports the use of aliases, which let it serve content from file system locations other than those directly underneath the specified document root.
As a result, an Apache server can reference any content on a computer, or even on other computers, without having to move or duplicate the information.
Virtual Hosting: Virtual hosting is a useful feature of Apache that enables the simultaneous hosting of multiple websites on a single computer.
Virtual hosting has many practical applications. For example, an ISP commonly configures websites for different companies as virtual hosts. This avoids the need for a separate computer for each site.
From version 1.1 onwards, Apache supports both IP-based and name-based virtual hosting.
The IP-based system uses the IP address of the connection to determine which virtual host Apache should serve. Therefore, each virtual host domain requires a dedicated IP address.
Name-based virtual hosting uses host names, which the client sends in the HTTP Host header, to identify virtual hosts. This allows multiple virtual hosts to share one IP address.
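The following minimal sketch shows both styles and assumes Apache 2.x syntax. The domain names, the 192.0.2.10 address (from the documentation range) and the directory paths are all hypothetical. On Apache 2.2 and earlier, name-based hosting on an address also needs a NameVirtualHost *:80 line, but Apache 2.4 no longer requires it.

```apache
# Name-based (hypothetical names): both sites share the same address and port.
# Apache picks the host whose ServerName matches the request's Host header.
<VirtualHost *:80>
    ServerName www.example.com
    DocumentRoot "/var/www/example-com"
</VirtualHost>

<VirtualHost *:80>
    ServerName www.example.org
    DocumentRoot "/var/www/example-org"
</VirtualHost>

# IP-based (hypothetical address): this site gets its own dedicated IP.
<VirtualHost 192.0.2.10:80>
    ServerName www.example.net
    DocumentRoot "/var/www/example-net"
</VirtualHost>
```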
The httpd.conf file is Apache's main configuration file.
The Apache Directory Structure:
The Apache software is typically distributed into the following subdirectories:
Directory | Description
cgi-bin | This is where many, if not all, of the interactive programs that you write will reside. These will be programs written in Perl, Java, or other programming languages.
conf | This directory contains your configuration files.
htdocs | This directory contains your actual hypertext documents and typically has many subdirectories. This directory is known as the DocumentRoot.
icons | This directory contains the icons (small images) that Apache uses when displaying information or error messages.
images | This directory contains the image files (GIF or JPG) that you use on your web site.
logs | This directory contains your log files, namely the access_log and error_log files.
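sbin | In layouts that include it, this directory typically contains the server executables, such as httpd and apachectl.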
Main Configuration Files in Apache:
1. The Apache software is configured by changing settings in several text files in the Apache conf (configuration) directory.
2. Apache has traditionally used four configuration files. The main configuration file is usually called httpd.conf. (In Apache 2.0 and later, access.conf and srm.conf are no longer read, and their directives go in httpd.conf.)
File | Description
access.conf | The security configuration file. It contains instructions about which users should be able to access what information.
httpd.conf | The server configuration file. It typically contains directives that affect how the server runs, such as the user and group IDs it should use when running, the location of other files, etc.
srm.conf | The resource configuration file. It contains directives that define where documents are found, how to change addresses to filenames, etc.
mime.types | A configuration file that relates filename extensions to file types.
httpd.conf File Sections
The httpd.conf file has three main sections.
Section 1: Global environment
The directives in this section affect the overall operation of Apache, such as the number of concurrent requests it can handle or where it can find its configuration data.
AccessConfig /dev/null
ResourceConfig /dev/null
These two lines tell Apache 1.3 not to read the separate access.conf and srm.conf files.
ServerType standalone
Specifies whether the Apache server should run under the inetd daemon or as a standalone server.
ServerRoot "/etc/httpd"
Configuration and log files are stored in subdirectories of this root directory. Do not add a slash at the end of the path.
PidFile run/httpd.pid
The file in which the server records its process identification number when it starts.
StartServers 5
The number of server processes that are created at start-up.
Timeout 300
Sets how long, in seconds, Apache waits for certain events, such as receiving or sending data, before giving up on a request.
KeepAlive Off
Whether or not to allow persistent connections (more than one request per connection). Set to Off to deactivate.
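Here are the Section 1 values above, collected into httpd.conf syntax. This is a sketch that assumes Apache 2.x. ServerType, AccessConfig and ResourceConfig were removed in 2.0, so those lines are left out.

```apache
# Section 1: global environment (Apache 2.x syntax, values from the text).
# No trailing slash on ServerRoot; relative paths below are resolved against it.
ServerRoot "/etc/httpd"
PidFile run/httpd.pid
Timeout 300
KeepAlive Off
StartServers 5
```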
Section 2: Main Server Configuration
This
section contains the directives for the main server. The values of these
directives are also used as the default values for virtual hosts unless the
virtual host section of the file specifies different values.
Port: 80
User and Group: apache
ServerAdmin: root@localhost
ServerName: www.easynomad.com:80
DocumentRoot: "/var/www/html"
DirectoryIndex: index.html (the default file served for a directory request such as http://www.easynomad.com/offers/)
ErrorLog:
Alias:
ScriptAlias: /cgi-bin/ "/var/www/easynomad/cgi-bin"
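Here are the Section 2 values written as actual directives. This sketch assumes Apache 2.x, which uses Listen in place of the 1.3 Port directive. The ErrorLog path is an assumption, and the easynomad.com names come from the list above.

```apache
# Section 2: main server configuration.
# ServerAdmin, DocumentRoot and ErrorLog also act as defaults for virtual hosts.
Listen 80
User apache
Group apache
ServerAdmin root@localhost
ServerName www.easynomad.com:80
DocumentRoot "/var/www/html"
DirectoryIndex index.html
# Assumed log location, relative to ServerRoot.
ErrorLog logs/error_log
ScriptAlias /cgi-bin/ "/var/www/easynomad/cgi-bin/"
```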
Section 3: Virtual Hosts
This section lets you set up virtual host containers so that one Apache server can serve multiple sites.
<VirtualHost>
    ServerAdmin
    DocumentRoot
    ServerName
    ErrorLog
    CustomLog
</VirtualHost>
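Here is one such container, filled in as a sketch. The address, paths and log file names are hypothetical. The combined log format nickname is assumed to be defined by a LogFormat line elsewhere in httpd.conf, as it is in the default configuration.

```apache
# Hypothetical virtual host for the easynomad example site.
<VirtualHost *:80>
    ServerAdmin webmaster@easynomad.com
    DocumentRoot "/var/www/easynomad"
    ServerName www.easynomad.com
    # Per-site logs (hypothetical names), relative to ServerRoot.
    ErrorLog logs/easynomad-error_log
    CustomLog logs/easynomad-access_log combined
</VirtualHost>
```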
Apache Directives
General Configuration Tips
To configure the Apache HTTP Server, edit /etc/httpd/conf/httpd.conf. Then reload, restart, or stop and start the httpd process.
Before editing httpd.conf, make a copy of the original file. Creating a backup makes it easier to recover from mistakes made while editing the configuration file.
If a mistake is made and the Web server does not work
correctly, first review recently edited passages in httpd.conf to
verify there are no typos.
Next, look in the Web server's error log, /var/log/httpd/error_log.
The error log may not be easy to interpret, depending on your level of
expertise. However, the last entries in the error log should provide useful
information.
The following subsections contain a list of short
descriptions for many of the directives included in httpd.conf.
ServerRoot
The ServerRoot directive specifies the top-level directory that contains the server's own files, such as the conf/ and logs/ subdirectories. It does not hold website content. By default, ServerRoot is set to "/etc/httpd" for both secure and non-secure servers.
PidFile names the file where the server records its
process ID (PID).
Timeout
Timeout defines, in seconds, the amount of time that the server waits to receive and transmit data during communications. Timeout is set to 300 seconds by default, which is appropriate for most situations.
KeepAlive sets whether the server allows more than one request per connection. It can be used to prevent any one client from consuming too much of the server's resources.
By default, KeepAlive is set to off.
If KeepAlive is set to on and the server becomes very busy, the server can quickly spawn the maximum number of child processes. In this situation, the server slows down significantly. If KeepAlive is enabled, it is a good idea to set KeepAliveTimeout low and monitor the /var/log/httpd/error_log log file on the server. This log reports when the server is running out of child processes.
MaxKeepAliveRequests sets the maximum number of requests allowed per persistent connection. The Apache Project recommends a high setting, which improves the server's performance. MaxKeepAliveRequests is set to 100 by default, which should be appropriate for most situations.
KeepAliveTimeout sets the number of seconds the server
waits after a request has been served before it closes the connection. Once the
server receives a request, the Timeout directive applies
instead. KeepAliveTimeout is set to 15 seconds by default.
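Here are the persistent-connection directives together, as a sketch. KeepAlive On is shown only so that the two related directives take effect. The KeepAliveTimeout of 5 seconds is an illustrative low value that follows the advice above. It is not the default.

```apache
# Allow persistent connections.
KeepAlive On
# Close a persistent connection after 100 requests (the default quoted above).
MaxKeepAliveRequests 100
# Illustrative low value: wait at most 5 seconds for the next request.
KeepAliveTimeout 5
```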
<IfModule> and </IfModule> tags create a conditional container that is only activated if the specified module is loaded. Directives within the IfModule container are processed under one of two conditions. The directives are processed if the module named in the starting <IfModule> tag is loaded. Alternatively, if an exclamation point [!] appears before the module name, the directives are processed only if the module specified in the <IfModule> tag is not loaded.
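Here are minimal sketches of the positive and negated forms. The MIMEMagicFile line mirrors a common default configuration. The Redirect paths in the second container are hypothetical.

```apache
# Processed only if mod_mime_magic is loaded.
<IfModule mod_mime_magic.c>
    MIMEMagicFile conf/magic
</IfModule>

# Processed only if mod_rewrite is NOT loaded: fall back to a simple
# mod_alias Redirect (hypothetical paths).
<IfModule !mod_rewrite.c>
    Redirect /old-page.html http://www.example.com/new-page.html
</IfModule>
```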
MPM Specific Server-Pool Directives
In Apache HTTP Server 2.0, a group of modules called MPMs (Multi-Processing Modules) is responsible for managing the characteristics of the server pool. These characteristics differ depending on which MPM is used. For this reason, an IfModule container is necessary to define the server pool for the MPM in use.
By default, Apache HTTP Server 2.0 defines the server pool for both the prefork and worker MPMs.
The following is a list of directives found within the MPM-specific server-pool containers.
StartServers sets how many server processes are created
upon startup. Since the Web server dynamically kills and creates server
processes based on traffic load, it is not necessary to change this parameter.
The Web server is set to start 8 server processes at startup for
the prefork MPM and 2 for the worker MPM.
MaxRequestsPerChild sets the total number of requests each child server process serves before the child dies. The main reason for setting MaxRequestsPerChild is to avoid memory leaks in long-lived processes. The default MaxRequestsPerChild for the prefork MPM is 1000, and for the worker MPM it is 0.
MaxClients sets a limit on the total number of server processes, or simultaneously connected clients, that can run at one time. The main purpose of this directive is to keep a runaway Apache HTTP Server from crashing the operating system. For busy servers, this value should be set high. The server's default is 150 regardless of the MPM in use. However, it is not recommended to set MaxClients above 256 when using the prefork MPM.
MinSpareServers and MaxSpareServers
These values are only used with the prefork MPM.
They adjust how the Apache HTTP Server dynamically adapts to the perceived load
by maintaining an appropriate number of spare server processes based on the
number of incoming requests. The server checks the number of servers waiting
for a request and kills some if there are more
than MaxSpareServers or creates some if the number of servers is less
than MinSpareServers.
The default MinSpareServers value is 5; the
default MaxSpareServers value is 20. These default settings
should be appropriate for most situations. Be careful not to increase MinSpareServers to a large number, as doing so creates a heavy processing load on the server even when traffic is light.
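Here is a sketch of the prefork container with the default values quoted in this section, in Apache 2.0/2.2 syntax. The prefork.c identifier follows the 2.0-era default configuration. Apache 2.4 renames MaxClients to MaxRequestWorkers and MaxRequestsPerChild to MaxConnectionsPerChild.

```apache
# prefork MPM server pool (Apache 2.0/2.2 directive names).
<IfModule prefork.c>
    StartServers          8
    MinSpareServers       5
    MaxSpareServers      20
    MaxClients          150
    MaxRequestsPerChild 1000
</IfModule>
```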
MinSpareThreads and MaxSpareThreads
These values are only used with the worker MPM.
They adjust how the Apache HTTP Server dynamically adapts to the perceived load
by maintaining an appropriate number of spare server threads based on the number
of incoming requests. The server checks the number of server threads waiting
for a request and kills some if there are more
than MaxSpareThreads or creates some if the number of threads is less than MinSpareThreads.
The default MinSpareThreads value is 25, and the default MaxSpareThreads value is 75. These default settings should be appropriate for most situations. The value for MaxSpareThreads must be greater than or equal to the sum of MinSpareThreads and ThreadsPerChild. Otherwise, Apache HTTP Server automatically corrects it.
ThreadsPerChild is only used with the worker MPM. It sets the number of threads within each child process. The default value for this directive is 25.
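Here is the matching worker container, again as a sketch using the defaults quoted above in Apache 2.0/2.2 syntax. With these numbers, MaxSpareThreads (75) is at least MinSpareThreads plus ThreadsPerChild (25 + 25 = 50), so the constraint described above holds.

```apache
# worker MPM server pool (Apache 2.0/2.2 directive names).
<IfModule worker.c>
    StartServers          2
    MaxClients          150
    MinSpareThreads      25
    MaxSpareThreads      75
    ThreadsPerChild      25
    # 0 means child processes are never retired by request count.
    MaxRequestsPerChild   0
</IfModule>
```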
The Listen command identifies the ports on which the Web server accepts incoming requests. By default, the Apache HTTP Server listens on port 80 for non-secure Web communications. It also listens on port 443 for secure Web communications, as set in the /etc/httpd/extra/conf/ssl.conf file, which defines any secure servers.
If the Apache HTTP Server is configured to listen on a port under 1024, only the root user can start it. For port 1024 and above, httpd can be started as a regular user.
The Listen directive can also be used to specify
particular IP addresses over which the server accepts connections.
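In the following sketch, the server keeps the default port and also accepts connections on port 8080 for one specific address only. The address 192.0.2.10 is a hypothetical placeholder from the documentation range.

```apache
# Accept connections on port 80 on all interfaces.
Listen 80
# Also accept connections on port 8080, but only on this (hypothetical) address.
Listen 192.0.2.10:8080
```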
Include allows other configuration files to be included
at runtime.
The path to these configuration files can be absolute or
relative to the ServerRoot.
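Here is a sketch of a relative Include. It assumes a conf.d/ subdirectory under ServerRoot, as some distributions use. With ServerRoot "/etc/httpd", this reads every file matching /etc/httpd/conf.d/*.conf.

```apache
# Relative to ServerRoot (assumed conf.d/ layout); the wildcard pulls in every matching file.
Include conf.d/*.conf
```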
LoadModule is used to load in Dynamic Shared Object
(DSO) modules.
ExtendedStatus
The ExtendedStatus directive controls whether Apache generates basic (off) or detailed (on) server status information when the server-status handler is called. The server-status handler is enabled using Location tags.
IfDefine
The IfDefine tags surround configuration
directives that are applied if the "test" stated in
the IfDefine tag is true. The directives are ignored if the test is
false.
The test in the IfDefine tags is a parameter name (for example, HAVE_PERL). The parameter is defined when it is provided as an argument to the server's start-up command, for example httpd -DHAVE_PERL. In that case the test is true, and the directives contained in the IfDefine tags are applied when the Web server starts.
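Here is a minimal sketch that uses the HAVE_PERL parameter from the text. The mod_perl module path is an assumption about the installation.

```apache
# Applied only when httpd is started with -DHAVE_PERL.
<IfDefine HAVE_PERL>
    # Assumed module location, relative to ServerRoot.
    LoadModule perl_module modules/mod_perl.so
</IfDefine>
```

If the server is started without -DHAVE_PERL, the whole container is skipped.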
The User directive sets the user name of the
server process and determines what files the server is allowed to access. Any
files inaccessible to this user are also inaccessible to clients connecting to
the Apache HTTP Server.
By default User is set to apache.
Group specifies the group name of the Apache HTTP Server processes.
By default Group is set to apache.
ServerAdmin sets the email address of the Web server administrator. This email address shows up in error messages on server-generated Web pages, so users can report a problem by sending email to the server administrator.
By default, ServerAdmin is set
to root@localhost.
A common way to set up ServerAdmin is to set it
to webmaster@example.com. Then alias webmaster to the person
responsible for the Web server in /etc/aliases and
run /usr/bin/newaliases.
ServerName specifies a hostname and port number (matching the Listen directive) for the server. The ServerName does not need to match the machine's actual hostname. For example, the Web server may be www.example.com, but the server's hostname is actually foo.example.com. The value specified in ServerName must be a valid Domain Name System (DNS) name that the system can resolve. Do not make one up.
The following is a sample ServerName directive:
ServerName www.example.com:80
When specifying a ServerName, be sure the IP address
and server name pair are included in the /etc/hosts file.
When UseCanonicalName is set to on, the Apache HTTP Server refers to itself using the values specified in the ServerName and Port directives. When UseCanonicalName is set to off, the server instead uses the hostname and port supplied by the requesting client when referring to itself. UseCanonicalName is set to off by default.
The DocumentRoot is the directory which contains
most of the HTML files which are served in response to requests. The default DocumentRoot for
both the non-secure and secure Web servers is
the /var/www/html directory. For example, the server might receive a
request for the following document:
http://example.com/foo.html
The server looks for the following file in the default
directory:
/var/www/html/foo.html
<Directory /path/to/directory> and </Directory> tags create a container used to enclose a group of configuration directives that apply only to a specific directory and its subdirectories. Any directive that is applicable to a directory may be used within Directory tags.
By default, very restrictive parameters are applied to the
root directory (/), using the Options and AllowOverride directives.
Under this configuration, any directory on the system which needs more
permissive settings has to be explicitly given those settings.
In the default configuration, another Directory container is configured for the DocumentRoot. It assigns less rigid parameters to that directory tree so that the Apache HTTP Server can access the files residing there.
The Directory container can also be used to configure additional cgi-bin directories for server-side applications outside of the directory specified in the ScriptAlias directive. To accomplish this, the Directory container must set the ExecCGI option for that directory.
For example, if CGI scripts are located
in /home/my_cgi_directory, add the following Directory container
to the httpd.conf file:
<Directory /home/my_cgi_directory>
Options +ExecCGI
</Directory>
Next, the AddHandler directive must be uncommented
to identify files with the .cgi extension as CGI scripts.
For this to work, permissions for CGI scripts, and the entire
path to the scripts, must be set to 0755.
The Options directive controls which server
features are available in a particular directory. For example, under the
restrictive parameters specified for the root directory, Options is
set to only FollowSymLinks. No features are enabled, except that the
server is allowed to follow symbolic links in the root directory.
By default, in
the DocumentRoot directory, Options is set to
include Indexes and FollowSymLinks. Indexes permits
the server to generate a directory listing for a directory if
no DirectoryIndex (for example, index.html) is
specified. FollowSymLinks allows the server to follow symbolic links
in that directory.
The AllowOverride directive sets whether
any Options can be overridden by the declarations in
an .htaccess file. By default, both the root directory and
the DocumentRoot are set to allow no .htaccess overrides.
The Order directive controls the order in which Allow and Deny directives are evaluated. The server is configured to evaluate the Allow directives before the Deny directives for the DocumentRoot directory.
Allow specifies which clients can access a given directory. The client can be all, a domain name, an IP address, a partial IP address, a network/netmask pair, and so on.
The DocumentRoot directory is configured to allow requests from all, meaning everyone has access.
Deny works similarly to Allow, except that it specifies who is denied access. The DocumentRoot is not configured to deny requests from anyone by default.
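Putting the Options, AllowOverride, Order and Allow settings described above together gives roughly the two default Directory containers in the sketch below. It uses Apache 2.0/2.2 syntax and assumes the DocumentRoot is /var/www/html. In Apache 2.4, the Require directive replaces Order, Allow and Deny, for example Require all granted.

```apache
# Restrictive defaults for the whole file system.
<Directory />
    Options FollowSymLinks
    AllowOverride None
</Directory>

# Less rigid settings for the DocumentRoot tree (assumed path).
<Directory "/var/www/html">
    Options Indexes FollowSymLinks
    AllowOverride None
    # Evaluate Allow before Deny, then let everyone in.
    Order allow,deny
    Allow from all
</Directory>
```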
UserDir is the subdirectory within each user's home
directory where they should place personal HTML files which are served by the
Web server. This directive is set to disable by default.
The name for the subdirectory is set
to public_html in the default configuration. For example, the server
might receive the following request:
http://example.com/~username/foo.html
The server would look for the file:
/home/username/public_html/foo.html
In the above example, /home/username/ is the
user's home directory (note that the default path to users' home directories
may vary).
Make sure that the permissions on the users' home
directories are set correctly. Users' home directories must be set to 0711. The
read (r) and execute (x) bits must be set on the
users' public_html directories (0755 also works). Files that are served from users' public_html directories must be set to at least 0644.
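The following sketch turns the feature on with the public_html subdirectory named above. It is wrapped in an IfModule container so that it only applies when mod_userdir is loaded.

```apache
<IfModule mod_userdir.c>
    # The default configuration has: UserDir disable
    # Serve http://example.com/~username/ from that user's public_html directory.
    UserDir public_html
</IfModule>
```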
The DirectoryIndex is the default page served by
the server when a user requests an index of a directory by specifying a forward
slash (/) at the end of the directory name.
When a user requests the page http://example.com/this_directory/, they get either the DirectoryIndex page, if it exists, or a server-generated directory list. The default
for DirectoryIndex is index.html and
the index.html.var type map. The server tries to find either of these
files and returns the first one it finds. If it does not find one of these
files and Options Indexes is set for that directory, the server
generates and returns a listing, in HTML format, of the subdirectories and
files within the directory, unless the directory listing feature is turned off.
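The default described above corresponds to the single directive in this sketch. The server tries the listed names in order.

```apache
# Try index.html first, then the index.html.var type map.
DirectoryIndex index.html index.html.var
```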
AccessFileName names the file that the server should use for access control information in each directory. The default is .htaccess.
Immediately after the AccessFileName directive, a
set of Files tags apply access control to any file beginning with
a .ht. These directives deny Web access to any .htaccess files
(or other files which begin with .ht) for security reasons.
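In Apache 2.0/2.2 syntax, the AccessFileName directive and the Files container that protects .ht files typically look like the sketch below. The regular expression matches any file name that begins with .ht.

```apache
AccessFileName .htaccess

# Refuse Web access to .htaccess, .htpasswd and similar files.
<Files ~ "^\.ht">
    Order allow,deny
    Deny from all
</Files>
```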
CacheNegotiatedDocs controls whether proxy servers may cache negotiated documents. By default, the Web server asks proxy servers not to cache any documents that were negotiated on the basis of content, because such documents may change over time or because of input from the requester. If CacheNegotiatedDocs is set to on, this function is disabled and proxy servers are allowed to cache such documents.
TypesConfig names the file that sets the default list of MIME type mappings (file name extensions to content types). The default TypesConfig file is /etc/mime.types. Instead of editing /etc/mime.types, the recommended way to add MIME type mappings is to use the AddType directive.
DefaultType sets a default content type for the Web
server to use for documents whose MIME types cannot be determined. The default
is text/plain.
HostnameLookups can be set
to on, off or double. If HostnameLookups is set
to on, the server automatically resolves the IP address for each
connection. Resolving the IP address means that the server makes one or more
connections to a DNS server, adding processing overhead.
If HostnameLookups is set to double, the server performs a double-reverse DNS lookup, adding even more processing overhead.
To conserve resources on the
server, HostnameLookups is set to off by default.
If hostnames are required in server log files, consider
running one of the many log analyzer tools that perform the DNS lookups more
efficiently and in bulk when rotating the Web server log files.
ErrorLog specifies the file where server errors are
logged. By default, this directive is set to /var/log/httpd/error_log.
LogLevel sets how verbose the error messages in the error logs are. From least verbose to most verbose, LogLevel can be set to emerg, alert, crit, error, warn, notice, info, or debug. The default LogLevel is warn.
The LogFormat directive configures the format of the various Web server log files. The actual LogFormat used depends on the settings given in the CustomLog directive.
The following are the format options if
the CustomLog directive is set to combined:
%h (remote host's IP address or hostname)
Lists the remote IP address of the requesting client.
If HostnameLookups is set to on, the client hostname is recorded
unless it is not available from DNS.
%l (rfc931)
Not used. A hyphen [-] appears in the log file for
this field.
%u (authenticated user)
If authentication was required, the user name of the authenticated user is recorded. Usually this is not used, so a hyphen [-] appears in the log file for this field.
%t (date)
Lists the date and time of the request.
%r (request string)
Lists the request string exactly as it came from the browser
or client.
%s (status)
Lists the HTTP status code which was returned to the client
host.
%b (bytes)
Lists the size of the document.
\"%{Referer}i\" (referrer)
Lists the URL of the webpage that referred the client host to the Web server.
\"%{User-Agent}i\" (user-agent)
Lists the type of Web browser making the request.
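As a sketch, the combined format described by these fields is defined with a LogFormat line and then referenced by name from CustomLog, roughly as in the default configuration. The default string uses %>s, the final status after any internal redirects, rather than the plain %s listed above. The log path is an assumption.

```apache
# Define the "combined" format; embedded quotes are escaped with \".
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined

# Write the access log in that format (assumed path, relative to ServerRoot).
CustomLog logs/access_log combined
```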
CustomLog identifies the log file and the log file
format. By default, the log is recorded to
the /var/log/httpd/access_log file.
The default CustomLog format is combined. The following illustrates the combined log file format:
remotehost rfc931 user date "request" status bytes referrer user-agent
The ServerSignature directive adds a line containing the Apache HTTP Server version and the ServerName to any server-generated documents, such as error messages sent back to clients. ServerSignature is set to on by default.
It can also be set to off or to EMail. EMail adds a mailto:ServerAdmin HTML tag to the signature line of auto-generated responses.
The Alias setting allows directories outside the DocumentRoot directory to be accessible. Any URL whose path begins with the alias automatically resolves to the alias' path. By default, one alias for an icons/ directory is already set up. The Web server can access this icons/ directory even though it is not in the DocumentRoot.
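Here is a sketch of the default icons alias. It assumes the icons live in /var/www/icons/, as in Red Hat-style layouts. The Directory container uses Apache 2.0/2.2 access syntax so that files outside the DocumentRoot can be served.

```apache
# Requests for /icons/... are served from /var/www/icons/... (assumed location).
Alias /icons/ "/var/www/icons/"

<Directory "/var/www/icons">
    Options Indexes MultiViews
    AllowOverride None
    Order allow,deny
    Allow from all
</Directory>
```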
The ScriptAlias directive defines where CGI scripts are located. Generally, it is not good practice to leave CGI scripts within the DocumentRoot, where they can potentially be viewed as text documents. For this reason, the ScriptAlias directive designates a special directory outside of the DocumentRoot directory to hold server-side executables and scripts. This directory is known as a cgi-bin and is set to /var/www/cgi-bin/ by default.
It is possible to establish directories for storing executables outside of the cgi-bin directory.
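The default cgi-bin mapping can be sketched as follows, using Apache 2.0/2.2 access syntax. ScriptAlias both maps the URL path and marks everything under it as CGI, so no ExecCGI option is needed here.

```apache
# Map /cgi-bin/ URLs to the script directory and run files there as CGI.
ScriptAlias /cgi-bin/ "/var/www/cgi-bin/"

<Directory "/var/www/cgi-bin">
    AllowOverride None
    Options None
    Order allow,deny
    Allow from all
</Directory>
```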
Redirect
When a webpage is moved, Redirect can be used to
map the file location to a new URL. The format is as follows:
Redirect /<old-path>/<file-name> http://<current-domain>/<current-path>/<file-name>
In this example, replace <old-path> with the old path information for <file-name>, and replace <current-domain> and <current-path> with the current domain and path information for <file-name>.
With this directive in place, any request for <file-name> at the old location is automatically redirected to the new location.
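Here are concrete instances with hypothetical paths. Without a status argument, Redirect sends a temporary (302) redirect. Adding permanent sends a 301.

```apache
# Temporary (302) redirect: the page moved to a new directory (hypothetical paths).
Redirect /old-dir/page.html http://www.example.com/new-dir/page.html

# Permanent (301) redirect for a page that will not come back (hypothetical paths).
Redirect permanent /archive/report.html http://www.example.com/reports/report.html
```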
IndexOptions controls the appearance of server-generated directory listings, for example by adding icons and file descriptions.
If Options Indexes is set, the Web server generates a directory
listing when the Web server receives an HTTP request for a directory without an
index.
First, the Web server looks in the requested directory for a
file matching the names listed in the DirectoryIndex directive
(usually index.html). If an index.html file is not found, Apache HTTP
Server creates an HTML directory listing of the requested directory. The
appearance of this directory listing is controlled, in part, by
the IndexOptions directive.
The default configuration turns on FancyIndexing. This
means that a user can re-sort a directory listing by clicking on column
headers. Another click on the same header switches from ascending to descending
order. FancyIndexing also shows different icons for different files,
based upon file extensions.
The AddDescription option, when used in
conjunction with FancyIndexing, presents a short description for the file
in server generated directory listings.
IndexOptions has a number of other parameters which can
be set to control the appearance of server generated directories. Parameters
include IconHeight and IconWidth, to make the server include
HTML HEIGHT and WIDTH tags for the icons in server
generated webpages; IconsAreLinks, for making the icons act as part of the
HTML link anchor along with the filename and others.
AddIconByEncoding names icons that are displayed next to files with MIME encoding in server-generated directory listings. For example, by default, the Web server shows the compressed.gif icon next to MIME-encoded x-compress and x-gzip files in server-generated directory listings.
AddIconByType names icons that are displayed next to files with MIME types in server-generated directory listings. For example, the server shows the icon text.gif next to files with a MIME type of text in server-generated directory listings.
AddIcon specifies which icon to show in server
generated directory listings for files with certain extensions. For example,
the Web server is set to show the icon binary.gif for files
with .bin or .exe extensions.
DefaultIcon specifies the icon displayed in server-generated directory listings for files that have no other icon specified. The unknown.gif image file is the default.
When using FancyIndexing as an IndexOptions parameter,
the AddDescription directive can be used to display user-specified
descriptions for certain files or file types in a server generated directory
listing. The AddDescription directive supports listing specific
files, wildcard expressions, or file extensions.
ReadmeName names the file that, if it exists in the directory, is appended to the end of server-generated directory listings. The Web server first tries to include the file as an HTML document and then tries to include it as plain text. By default, ReadmeName is set to README.html.
HeaderName names the file that, if it exists in the directory, is prepended to the start of server-generated directory listings. As with ReadmeName, the server tries to include it as an HTML document if possible, or as plain text if not.
IndexIgnore lists file extensions, partial file names,
wildcard expressions or full filenames. The Web server does not include any
files which match any of those parameters in server generated directory
listings.
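Here is a sketch of a directory-listing setup that combines the directives above. The icon paths assume the /icons/ alias, and the IndexIgnore patterns are illustrative. In the AddIconByEncoding and AddIconByType lines, the parentheses hold an alternate text (CMP, TXT) followed by the icon path.

```apache
# Sortable, icon-decorated listings.
IndexOptions FancyIndexing

# Icon paths assume the /icons/ alias.
AddIconByEncoding (CMP,/icons/compressed.gif) x-compress x-gzip
AddIconByType (TXT,/icons/text.gif) text/*
AddIcon /icons/binary.gif .bin .exe
DefaultIcon /icons/unknown.gif

AddDescription "GZIP compressed document" .gz

# Appended after / prepended before the listing, if present.
ReadmeName README.html
HeaderName HEADER.html

# Illustrative patterns: hide dot files, editor backups and README/HEADER files.
IndexIgnore .??* *~ *# HEADER* README*
```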
AddEncoding names filename extensions which should
specify a particular encoding type. AddEncoding can also be used to
instruct some browsers to uncompress certain files as they are downloaded.
AddLanguage associates file name extensions with
specific languages. This directive is useful for Apache HTTP Servers which
serve content in multiple languages based on the client Web browser's language
settings.
LanguagePriority sets precedence for different
languages in case the client Web browser has no language preference set.
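Here is a sketch of the encoding and language directives above. The particular extensions and language codes are illustrative choices, not required values.

```apache
# Files ending in .Z and .gz are compressed encodings of other types.
AddEncoding x-compress .Z
AddEncoding x-gzip .gz

# Map file extensions to content languages (illustrative codes).
AddLanguage en .en
AddLanguage fr .fr
AddLanguage de .de

# Preferred order when the browser states no preference.
LanguagePriority en fr de
```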
Use the AddType directive to define or override
MIME type and file extension pairs. The following example directive
tells the Apache HTTP Server to recognize the .tgz file extension as a tar archive:
AddType application/x-tar .tgz
AddHandler maps file extensions to specific handlers.
For example, the cgi-script handler can be matched with the extension .cgi to
automatically treat a file ending with .cgi as a CGI script. The
following is a sample AddHandler directive for
the .cgi extension.
AddHandler cgi-script .cgi
This directive enables CGIs outside of
the cgi-bin to function in any directory on the server which has
the ExecCGI option enabled in its <Directory> container.
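Putting the two requirements together gives the following minimal sketch. The directory path is hypothetical. AddHandler only marks .cgi files as scripts, and they run only if ExecCGI is enabled for the same directory.

```apache
# Hypothetical directory outside cgi-bin that should run CGI scripts
<Directory "/var/www/html/scripts">
    # Without ExecCGI, requests for .cgi files here are refused instead of executed
    Options +ExecCGI
    AddHandler cgi-script .cgi
</Directory>
```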
Action specifies a MIME content type and CGI script
pair, so that whenever a file of that media type is requested, a particular CGI
script is executed.
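Here is a minimal sketch of both forms of Action, assuming mod_actions is loaded. The script paths, the .xyz extension, and the handler name my-file-type are hypothetical.

```apache
# Run a hypothetical script whenever a file of type image/gif is requested
Action image/gif /cgi-bin/images.cgi
# Alternatively, tie an extension to a custom handler name, then bind that name to a script
AddHandler my-file-type .xyz
Action my-file-type /cgi-bin/program.cgi
```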
The ErrorDocument directive associates an HTTP
response code with a message or a URL to be sent back to the client. By
default, the Web server outputs a simple and usually cryptic error message when
an error occurs. The ErrorDocument directive forces the Web server to
instead output a customized message or page.
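ErrorDocument accepts a local path, a quoted text message, or a full URL. Here is a minimal sketch, with hypothetical paths and an example.com placeholder.

```apache
# Serve a local page for 404 Not Found (hypothetical path)
ErrorDocument 404 /errors/not-found.html
# Send a short text message for 500 Internal Server Error
ErrorDocument 500 "Sorry, something went wrong on our side."
# Point 403 Forbidden at a page on another server (placeholder URL)
ErrorDocument 403 http://www.example.com/forbidden.html
```

With a full URL, Apache sends the client a redirect to that page. The client therefore never receives the original error status code, so local paths are usually the safer choice.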
The BrowserMatch directive allows the server to
define environment variables and take appropriate actions based on the
User-Agent HTTP header field — which identifies the client's Web browser type.
By default, the Web server uses BrowserMatch to deny connections to
specific browsers with known problems and also to disable keepalives and HTTP
header flushes for browsers that are known to have problems with those actions.
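The lines below follow the pattern used in older stock httpd.conf files. Treat the User-Agent strings as illustrations of the syntax rather than as a current list of problem browsers.

```apache
# Disable keepalive for a very old browser
BrowserMatch "Mozilla/2" nokeepalive
# Downgrade protocol handling for a buggy early IE beta
BrowserMatch "MSIE 4\.0b2;" nokeepalive downgrade-1.0 force-response-1.0
# Force HTTP/1.0 responses for clients that mishandle HTTP/1.1 replies
BrowserMatch "RealPlayer 4\.0" force-response-1.0
BrowserMatch "Java/1\.0" force-response-1.0
BrowserMatch "JDK/1\.0" force-response-1.0
```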
The <Location> and </Location> tags
create a container in which access control based on URL can be specified.
For instance, to allow people connecting from within the
server's domain to see status reports, use the following directives:
<Location /server-status>
SetHandler server-status
Order deny,allow
Deny from all
Allow from <.example.com>
</Location>
Replace <.example.com> with the second-level
domain name for the Web server.
To provide server configuration reports (including installed
modules and configuration directives) to requests from inside the domain, use
the following directives:
<Location /server-info>
SetHandler server-info
Order deny,allow
Deny from all
Allow from <.example.com>
</Location>
Again, replace <.example.com> with the
second-level domain name for the Web server.
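Order, Deny and Allow are the Apache 2.2 access-control directives. On Apache 2.4 the same restriction is normally written with Require from mod_authz_host. The following is a minimal sketch, with example.com standing in for your own domain. Note that server-info also needs mod_info loaded.

```apache
<Location /server-status>
    SetHandler server-status
    # Allow only hosts whose name is, or ends in, example.com (placeholder)
    Require host example.com
</Location>
<Location /server-info>
    SetHandler server-info
    Require host example.com
</Location>
```

`Require host` performs DNS lookups on each client. If that is too slow, `Require ip` with your own address range avoids them.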
To configure the Apache HTTP Server to function as a proxy
server, remove the hash mark (#) from the beginning of the <IfModule
mod_proxy.c> line, the ProxyRequests line, and each line in
the <Proxy> stanza. Set the ProxyRequests directive
to On, and set which domains are allowed access to the server in
the Allow from directive of the <Proxy> stanza.
<Proxy *> and </Proxy> tags
create a container which encloses a group of configuration directives meant to
apply only to the proxy server. Many directives which are allowed within a <Directory> container
may also be used within a <Proxy> container.
The ProxyVia command controls whether or not an
HTTP Via: header line is sent along with requests or replies which go through
the Apache proxy server. The Via: header shows the hostname
if ProxyVia is set to On, shows the hostname and the Apache HTTP
Server version for Full, passes along any Via: lines unchanged
for Off, and Via: lines are removed for Block.
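Once uncommented, the proxy section might look like the following minimal sketch. It uses the Apache 2.2 access syntax and .example.com as a placeholder domain. Keep the Allow line narrow, because an open forward proxy can be abused by anyone on the Internet.

```apache
<IfModule mod_proxy.c>
    # Turn on forward proxying
    ProxyRequests On
    <Proxy *>
        # Only clients from the placeholder domain may use the proxy
        Order deny,allow
        Deny from all
        Allow from .example.com
    </Proxy>
    # Add a Via: header containing this proxy's hostname
    ProxyVia On
</IfModule>
```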
A number of commented cache directives are supplied by the
default Apache HTTP Server configuration file. In most cases, uncommenting
these lines by removing the hash mark (#) from the beginning of the line is
sufficient. The following, however, is a list of some of the more important
cache-related directives.
CacheEnable — Specifies whether the cache is a disk,
memory, or file descriptor cache. By default CacheEnable configures a
disk cache for URLs at or below /.
CacheRoot — Specifies the name of the directory
containing cached files. The default CacheRoot is
the /var/httpd/proxy/ directory.
CacheSize — Specifies how much space the cache can use
in kilobytes. The default CacheSize is 5 KB.
The following is a list of some of the other common
cache-related directives.
CacheMaxExpire — Specifies how long HTML documents are
retained (without a reload from the originating Web server) in the cache. The
default is 24 hours (86400 seconds).
CacheLastModifiedFactor — Specifies the creation of an
expiry (expiration) date for a document which did not come from its originating
server with its own expiry set. The
default CacheLastModifiedFactor is set to 0.1, meaning that the
expiry date for such documents equals one-tenth of the amount of time since the
document was last modified.
CacheDefaultExpire — Specifies the expiry time in hours
for a document that was received using a protocol that does not support expiry
times. The default is set to 1 hour (3600 seconds).
NoProxy — Specifies a space-separated list of subnets,
IP addresses, domains, or hosts whose content is not cached. This setting is
most useful for Intranet sites.
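Here is a minimal sketch of a disk cache. It assumes Apache 2.2 module names (mod_cache and mod_disk_cache; on 2.4 the disk module is mod_cache_disk) and a hypothetical cache directory. The expiry values restate the defaults described above, in seconds. CacheSize is omitted because, in 2.2 and later, the size of the disk cache is normally kept in check with the htcacheclean tool rather than a directive.

```apache
<IfModule mod_disk_cache.c>
    # Cache URLs at or below / on disk
    CacheEnable disk /
    # Hypothetical cache directory; it must be writable by the httpd user
    CacheRoot "/var/cache/httpd/proxy"
</IfModule>
<IfModule mod_cache.c>
    # Never keep a document longer than 24 hours
    CacheMaxExpire 86400
    # Heuristic expiry: 10% of the time since the document was last modified
    CacheLastModifiedFactor 0.1
    # Fallback expiry of 1 hour when no better information is available
    CacheDefaultExpire 3600
</IfModule>
```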
The NameVirtualHost directive associates an IP
address and port number, if necessary, for any name-based virtual hosts.
Name-based virtual hosting allows one Apache HTTP Server to serve different
domains without using multiple IP addresses.
To enable name-based virtual hosting, uncomment
the NameVirtualHost configuration directive and add the correct IP
address. Then add more VirtualHost containers for each virtual host.
<VirtualHost> and </VirtualHost> tags
create a container outlining the characteristics of a virtual host.
The VirtualHost container accepts most configuration directives.
A commented VirtualHost container is provided
in httpd.conf, which illustrates the minimum set of configuration
directives necessary for each virtual host.
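A minimal name-based setup for two sites on one IP address might look like this. The host names and document roots are placeholders. On Apache 2.4 the NameVirtualHost line is no longer needed. The first VirtualHost listed acts as the default for requests that match no ServerName.

```apache
# Apache 2.2: use name-based virtual hosting on all addresses, port 80
NameVirtualHost *:80

<VirtualHost *:80>
    # Placeholder site names and paths
    ServerName www.example.com
    DocumentRoot /var/www/example.com
</VirtualHost>

<VirtualHost *:80>
    ServerName www.example.org
    DocumentRoot /var/www/example.org
</VirtualHost>
```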
SetEnvIf sets environment variables based on the
headers of incoming connections. It is not solely an SSL directive,
though it is present in the supplied /etc/httpd/extra/conf/ssl.conf file.
Its purpose in this context is to disable HTTP keepalive and to allow SSL to
close the connection without a close notify alert from the client browser. This
setting is necessary for certain browsers that do not reliably shut down the
SSL connection.
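The SSL configuration line in question typically looks like the sketch below, which matches Internet Explorer by its User-Agent string. Treat the pattern as an example rather than as guidance for current browsers.

```apache
# Any User-Agent containing "MSIE" gets keepalive disabled and may close SSL without a close notify alert
SetEnvIf User-Agent ".*MSIE.*" nokeepalive ssl-unclean-shutdown downgrade-1.0 force-response-1.0
```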
Apachectl Commands
Apachectl: This is the Apache HTTP Server control interface, which helps
administrators manage the httpd daemon.
This
utility includes a variety of commands for starting, stopping, checking httpd
status and running syntax tests.
Apachectl start:
Start command starts the httpd daemon. An error message
displays if httpd is already running.
Restart:
If httpd is running, the restart command restarts the daemon,
automatically checking the configuration files as in configtest to make sure
the daemon doesn't die. If the daemon is not running, this command starts
it.
Graceful:
If the httpd daemon is not running, this command starts it. If it is
running, the command restarts it gracefully: current connections are allowed to
finish instead of being aborted. Like restart, it checks the configuration files first.
configtest:
This command runs a configuration syntax test. It parses
the configuration files and reports either Syntax OK or detailed information
about the syntax error. While this command can't check whether the configuration
files do what you expect them to do, it does make sure all configuration syntax is
correct.
Fullstatus:
This command displays a full status report from mod_status. For it to work, you
need both mod_status enabled on the server and a text-based browser such as lynx
installed on the system.
Status:
This command displays a brief status report, similar to fullstatus but without
the list of requests currently being served.
Apache Redirects
What you are trying to accomplish here is to have one
resource (either a page or an entire site) redirect a visitor to a completely
different page or site, and while doing so tell the visitor's browser that the
redirect is either permanent (301) or temporary (302).
Therefore you need to do three things:
Have 2 resources - one source page or website, and one
destination page or website.
When an attempt to access the source resource is made,
the webserver transfers the visitor to the destination instead.
During the transfer, the webserver reports to the visitor
that a redirect is happening and it's either temporary or permanent.
The ability to control the "status" argument in the
redirect directive (which sets whether it's a 301 or 302) within Apache is only
available in version 1.2 and above. You are best off using version 2 or above
for maximum stability, security and usefulness.
301 Redirect
A function of a web server that redirects the visitor from
the current page or site to another page or site, while returning a response
code that says that the original page or site has been permanently moved
to the new location. Search engines like this information and will readily
transfer link popularity (and PageRank) to the new site quickly and with few
issues. They are also not as likely to cause issues with duplication filters.
SEOs like 301 redirects, and they are usually the preferred way to deal with
multiple domains pointing at one website.
302 Redirect
A function of a web server that redirects the visitor from
the current page or site to another page or site, while returning a response
code that says that the original page or site has been temporarily moved
to the new location. Search engines will often interpret these as a park, and
take their time figuring out how to handle the setup. Try to avoid a 302
redirect on your site if you can (unless it truly is only a temporary
redirect), and never use them as some form of click tracking for your outgoing
links, as they can result in a "website hijacking" under some
circumstances.
mod_rewrite
Mod_Rewrite is an Apache extension module which allows
URLs to be rewritten on the fly. SEOs often use it to convert dynamic
URLs with multiple query strings into static-looking URLs. An example of this would
be to convert the dynamic URL
domain.com/search.php?day=31&month=may&year=2005 to
domain.com/search-31-may-2005.htm
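In practice the rule runs in the other direction: visitors request the static-looking URL, and Apache internally maps it back to the dynamic script. Here is a minimal sketch for an .htaccess file in the site root, reusing the URLs from the example above.

```apache
RewriteEngine On
# /search-31-may-2005.htm is served by search.php?day=31&month=may&year=2005
RewriteRule ^search-([0-9]+)-([a-z]+)-([0-9]{4})\.htm$ search.php?day=$1&month=$2&year=$3 [NC,L]
```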
htaccess
.htaccess (Hypertext Access) is the default name of Apache's
directory-level configuration file. It lets you override, for one directory
and its subdirectories, configuration directives defined in the main
configuration file. You can also place mod_rewrite rules in an .htaccess file.
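Apache only honors .htaccess files where the main configuration allows it. Here is a minimal sketch, assuming a hypothetical document root of /var/www/html. FileInfo is the override class that covers Redirect and the Rewrite directives. Options is needed as well if the .htaccess file sets Options +FollowSymLinks, which mod_rewrite in .htaccess requires.

```apache
<Directory "/var/www/html">
    # FileInfo: Redirect*, Rewrite* and similar; Options: lets .htaccess change Options
    AllowOverride FileInfo Options
    Options +FollowSymLinks
</Directory>
```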
httpd.conf
Apache is configured by placing directives in plain text
configuration files. The main configuration file is usually called httpd.conf.
The location of this file is set at compile-time, but may be overridden with
the -f command line flag. In addition, other configuration files may be added
using the Include directive, and wildcards can be used to include many
configuration files. Any directive may be placed in any of these configuration
files. Changes to the main configuration files are only recognized by Apache
when it is started or restarted.
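Splitting the configuration with Include might look like this. The server root is an assumption, and relative paths are resolved against it.

```apache
# Hypothetical server root; relative paths below are resolved against it
ServerRoot "/etc/httpd"
# Pull in every file ending in .conf from /etc/httpd/conf.d/
Include conf.d/*.conf
```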
Redirection (302)
A default redirection function of IIS that redirects the
visitor from the current page or site to another page or site, while returning
a response code that says that the original page or site has been temporarily moved
to the new location. Search engines will often interpret these as a park, and
take their time figuring out how to handle the setup. Try to avoid a 302
redirect on your site if you can (unless it truly is only a temporary redirect),
and never use them as some form of click tracking for your outgoing links, as
they can result in a "website hijacking" under some circumstances.
Permanent Redirection (301)
An optional function of IIS that redirects the visitor from
the current page or site to another page or site, while returning a response
code that says that the original page or site has been permanently moved
to the new location. Search engines like this information and will readily
transfer link popularity (and PageRank) to the new site quickly and with few
issues. They are also not as likely to cause issues with duplication filters.
SEOs like 301 redirects, and they are usually the preferred way to deal with
multiple domains pointing at one website.
Mod_Rewrite and the Apache Redirect
If you have the mod_rewrite extension installed (it comes
with most Apache installs as a default) you can use it to dynamically change
URLs using arguments on the fly - this is NOT a 301 redirect in itself, but it is
related behavior. For example, if you wanted to redirect .htm files
from an old server to their equivalent .php files on a new one using
a 301 redirect, you would use mod_rewrite together with a redirect to handle
both the URL change and the redirection.
You could do it on a file by file basis by making a really
long list of possible redirects in the .htaccess file by hand without
mod_rewrite, but that would be a real pain on a server with a lot of files, or
a completely dynamic system. Therefore these 2 functions are often used together.
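The .htm to .php case can be handled by a single rule that rewrites the extension and issues the 301 at the same time. This is a minimal sketch, assuming the rule lives in the old site's root .htaccess and using the placeholder www.newdomain.com from this article.

```apache
RewriteEngine On
# /any/page.htm is permanently redirected to http://www.newdomain.com/any/page.php
RewriteRule ^(.*)\.htm$ http://www.newdomain.com/$1.php [R=301,L]
```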
Syntax for a 301 Redirect
The syntax for the redirect directive is:
Redirect /yourdirectory http://www.newdomain.com/newdirectory
If the client requests http://myserver/yourdirectory/foo.txt,
it will be told to access http://www.newdomain.com/newdirectory/foo.txt instead.
Note: Redirect directives take precedence over Alias and
ScriptAlias directives, irrespective of their ordering in the configuration
file. Also, the target URL must be a fully qualified URL, not a relative
path, even when used with .htaccess files or inside of <Directory>
sections.
If you use Redirect without the status argument, it will
return a status code of 302 by default. This default behaviour has given me
problems over the years as an SEO, so it's important to remember to specify the status,
like this:
Redirect permanent /one http://www.newdomain.com/two
or
Redirect 301 /two http://www.newdomain.com/other
Both of which will return the 301 status code. If you wanted
to return a 302 you could either not specify anything, or use "302"
or "temp" as the status argument above.
You can also use 2 other directives - RedirectPermanent URL-path
URL (returns a 301 and works the same as Redirect permanent URL-path URL)
and RedirectTemp URL-path URL (the same, but for a 302 status).
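Side by side, the shorthand forms look like this. The paths are hypothetical.

```apache
# Equivalent to: Redirect permanent /one http://www.newdomain.com/two
RedirectPermanent /one http://www.newdomain.com/two
# Equivalent to: Redirect temp /sale http://www.newdomain.com/summer-sale
RedirectTemp /sale http://www.newdomain.com/summer-sale
```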
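For more global changes, you would use RedirectMatch, which has
the same syntax but takes a regular expression in place of the URL-path:
RedirectMatch 301 ^(.*)$ http://www.newdomain.com$1
or
RedirectMatch permanent ^(.*)$ http://www.newdomain.com$1
These arguments will match any file requested at the old
account, change the domain, and redirect it to the file of the same name at the
new account. The $1 carries the requested path over; without it, every request
would be sent to the new site's home page.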
You would use these directives in either the .htaccess file
or the httpd.conf file. It's most common to do it in the .htaccess file because it's
the easiest and doesn't require a restart, but the httpd.conf method has less overhead
and works fine, as well.
Simple Domain 301 Redirect Checklist
This assumes you just have a new domain (with no working
pages under it) and want it to redirect properly to your main domain.
1. Ensure that you have 2 accounts - the old site and
the new site (they do not have to be on different IPs or different machines).
2. Your main (proper or canonical) site should be
pointed at the new site using DNS. All your other domains should be pointed at
the old site using DNS. Parking them there is fine at this point.
3. Find the .htaccess file at the root of your old
account. Yes, it starts with a "." We will be working with this file.
The new site does not need any changes made to it - the old site does all the
redirection work.
4. Download the .htaccess file and open it in a plain-text
editor.
5. Add this code:
Redirect 301 / http://www.newdomain.com/
6. Then upload the file to your root folder and test
your new redirect. Make sure you also check it using an HTTP header viewer just
to be sure it shows as a 301.
Control Panel Method
cPanel redirect
Log into your cPanel, and look for "Redirects"
under Site Management
Put the current directory in the first box
Put the new directory in the second box
Choose the type (temporary or permanent) temporary=302 and
permanent=301
Click "Add" and you're done
You can only do 302 redirects (or frame forwarding - bad!)
using the Plesk control panel - use .htaccess for 301s instead.
If you use Ensim, the only way to redirect is by using
the .htaccess file (no control panel option at this time).
Basic Old Website to New Website Redirection
This is used when you have an existing website (with pages)
and want to move it to a new domain, while keeping all your page names and the
links to them.
1. Ensure that you have 2 websites - the old site and
the new site, and that they are on different accounts (they do not have to be
on different IPs or different machines).
2. Your main (proper or canonical) site should be
pointed at the new site using DNS. All your old domains should be pointed at
the old site using DNS.
3. Find the .htaccess file at the root of your old
account. Yes, it starts with a "." We will be working with this
file. The new site does not need any changes made to it - the old site does all
the redirection work.
4. Download the .htaccess file and open it in a plain-text
editor.
5a. If you have mod_rewrite installed, add this code:
Options +FollowSymLinks
RewriteEngine on
RewriteCond %{HTTP_HOST} !^(www\.)?newdomain\.com$ [NC]
RewriteRule ^(.*)$ http://www.newdomain.com/$1 [R=301,L]
5b. If you don't have mod_rewrite installed, you really
should. If you can't install it, then you can use this code instead:
RedirectMatch 301 ^(.*)$ http://www.newdomain.com$1
6. Then upload the file to your root folder and test
your new redirect. Make sure you also check it using an HTTP header viewer just
to be sure it shows as a 301.
FrontPage on Apache
After you've done the basic Apache 301 redirection described
in this article, you will also need to change the .htaccess files
in:
_vti_bin
_vti_bin/_vti_adm
_vti_bin/_vti_aut
Replace "Options None" with "Options +FollowSymLinks"
Those folders are part of your FrontPage extensions on the
server, so you will have to use FTP to get to them, since FrontPage hides these
folders by default to prevent them from accidentally being messed with by
novice users.
More Complicated Redirects
You can't use a control panel in Apache currently for these -
.htaccess only.
Redirecting everything to a single page
This is common when you are totally changing the new website
from the old and you just want all your links and requests from the old site to
be directed to a spot on your new site (usually the home page). You actually
need to do it on a page by page basis.
Redirect 301 /oldfile1.htm http://www.newdomain.com
Redirect 301 /oldfile2.htm http://www.newdomain.com
Redirect 301 /oldfile3.htm http://www.newdomain.com
Redirection while changing the filename
This example will redirect all the files on the old account
that end in html to the same file on the new account, but with a php extension.
You can also use this technique within the same account if you want
to change all your extensions but don't want to lose your incoming links to the
old pages. This is common when people switch from static HTML files to
dynamic ones while keeping the same domain name, for example.
Just change the "html" and "php" parts of
the below example to your specific situation, if needed.
RedirectMatch 301 (.*)\.html$ http://www.newdomain.com$1.php
Redirection while changing the filename, but keeping the GET
arguments
Sometimes, you will want to change to a different CMS, but
keep your database the same, or you want to switch everything but you like the
arguments and don't want to change them.
RedirectMatch 301 /oldcart.php(.*) http://www.newdomain.com/newcart.php$1
This will result in
"http://www.olddomain.com/oldcart.php?Cat_ID=Blue" being redirected
to "http://www.newdomain.com/newcart.php?Cat_ID=Blue"
URL Rewriting
Most dynamic sites include variables in their URLs that tell
the site what information to show the user. Typically, this gives URLs like the
following, telling the relevant script on a site to load product number 7.
http://www.pets.com/show_a_product.php?product_id=7
The problems with this kind of URL structure are that the URL
is not at all memorable. It's difficult to read out over the phone (you'd be
surprised how many people pass URLs this way). Search engines and users alike
get no useful information about the content of a page from that URL. You can't
tell from that URL that that page allows you to buy a Norwegian Blue Parrot
(lovely plumage). It's a fairly standard URL - the sort you'd get by default
from most CMSes. Compare that to this URL:
http://www.pets.com/products/7/
Clearly a much cleaner and shorter URL. It's much easier to
remember, and vastly easier to read out. That said, it doesn't exactly tell
anyone what it refers to. But we can do more:
http://www.pets.com/parrots/norwegian-blue/
Now we're getting somewhere. You can tell from the URL, even
when it's taken out of context, what you're likely to find on that page. Search
engines can split that URL into words (hyphens in URLs are treated as spaces by
search engines, whereas underscores are not), and they can use that information
to better determine the content of the page. It's an easy URL to remember and
to pass to another person.
Unfortunately, the last URL cannot be easily understood by a
server without some work on our part. When a request is made for that URL, the
server needs to work out how to process that URL so that it knows what to send
back to the user. URL rewriting is the technique used to "translate"
a URL like the last one into something the server can understand.
Platforms and Tools
Depending on the software your server is running, you may
already have access to URL rewriting modules. If not, most hosts will enable or
install the relevant modules for you if you ask them very nicely.
Apache is the easiest system to get URL rewriting running on.
It usually comes with its own built-in URL rewriting module, mod_rewrite,
enabled, and working with mod_rewrite is as simple as uploading correctly
formatted and named text files.
IIS, Microsoft's server software, doesn't include URL
rewriting capability as standard, but there are add-ons out there that can
provide this functionality. ISAPI_Rewrite is the one I recommend working with,
as I've so far found it to be the closest to mod_rewrite's functionality.
Instructions for installing and configuring ISAPI_Rewrite can be found at the
end of this article.
The code that follows is based on URL rewriting using
mod_rewrite.
Basic URL Rewriting
To begin with, let's consider a simple example. We have a
website, and we have a single PHP script that serves a single page. Its URL is:
http://www.pets.com/pet_care_info_07_07_2008.php
We want to clean up the URL, and our ideal URL would be:
http://www.pets.com/pet-care/
In order for this to work, we need to tell the server to
internally redirect all requests for the URL "pet-care" to
"pet_care_info_07_07_2008.php". We want this to happen internally,
because we don't want the URL in the browser's address bar to change.
To accomplish this, we need to first create a text document
called ".htaccess" to contain our rules. It must be named exactly
that (not ".htaccess.txt" or "rules.htaccess"). This would
be placed in the root directory of the server (the same folder as
"pet_care_info_07_07_2008.php" in our example). There may already be
an .htaccess file there, in which case we should edit that rather than
overwrite it.
The .htaccess file is a configuration file for the server. If
there are errors in the file, the server will display an error message (usually
with an error code of "500"). If you are transferring the file to the
server using FTP, you must make sure it is transferred using the ASCII mode,
rather than BINARY. We use this file to perform 2 simple tasks in this instance
- first, to tell Apache to turn on the rewrite engine, and second, to tell
Apache what rewriting rule we want it to use. We need to add the following to
the file:
# Turn on the rewriting engine
RewriteEngine On
# Handle requests for "pet-care"
RewriteRule ^pet-care/?$ pet_care_info_07_07_2008.php [NC,L]
A couple of quick items to note - Apache treats a line that begins with a hash
symbol as a comment (a comment cannot share a line with a directive), and I'd recommend you use
comments liberally; and the "RewriteEngine" line should only be used
once per .htaccess file (please note that I've not included this line from here
onwards in the code examples).
The "RewriteRule" line is where the magic happens.
The line can be broken down into four parts, plus an optional comment:
RewriteRule - Tells Apache that this line
refers to a single RewriteRule.
^pet-care/?$ - The "pattern".
The server will check the URL of every request to the site to see if this
pattern matches. If it does, then Apache will swap the URL of the request for
the "substitution" section that follows.
pet_care_info_07_07_2008.php - The
"substitution". If the pattern above matches the request, Apache uses
this URL instead of the requested URL.
[NC,L] - "Flags", that tell
Apache how to apply the rule. In this case, we're using two flags.
"NC" tells Apache that this rule should be case-insensitive, and
"L" tells Apache not to process any more rules if this one is used.
# Handle requests for "pet-care"
- Comment explaining what the rule does (optional but recommended). Apache only
treats a line as a comment when it starts with #, so the comment belongs on its own line.
The rule above is a simple method for rewriting a single URL,
and is the basis for almost all URL rewriting rules.
Patterns and Replacements
The rule above allows you to redirect requests for a single
URL, but the real power of mod_rewrite comes when you start to identify and
rewrite groups of URLs based on patterns they contain.
Let's say you want to change all of your site URLs as described
in the first pair of examples above. Your existing URLs look like this:
http://www.pets.com/show_a_product.php?product_id=7
And you want to change them to look like this:
http://www.pets.com/products/7/
Rather than write a rule for every single product ID, you of
course would rather write one rule to manage all product IDs. Effectively you
want to change URLs of this format:
http://www.pets.com/show_a_product.php?product_id={a number}
And you want to change them to look like this:
http://www.pets.com/products/{a number}/
In order to do so, you will need to use "regular
expressions". These are patterns, defined in a specific format that the
server can understand and handle appropriately. A typical pattern to identify a
number would look like this:
[0-9]+
The square brackets contain a range of characters, and
"0-9" indicates all the digits. The plus symbol indicates that the
pattern will identify one or more of whatever precedes the plus - so this
pattern effectively means "one or more digits" - exactly what we're
looking to find in our URL.
The entire "pattern" part of the rule is treated as
a regular expression by default - you don't need to turn this on or activate it
at all.
# Handle product requests
RewriteRule ^products/([0-9]+)/?$ show_a_product.php?product_id=$1 [NC,L]
The first thing I hope you'll notice is that we've wrapped
our pattern in brackets. This allows us to "back-reference" (refer
back to) that section of the URL in the following "substitution"
section. The "$1" in the substitution tells Apache to put whatever
matched the earlier bracketed pattern into the URL at this point. You can have
lots of backreferences, and they are numbered in the order they appear.
And so, this RewriteRule will now mean that Apache internally rewrites
all requests for domain.com/products/{number}/ to
show_a_product.php?product_id={same number}.
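To see how several backreferences are numbered, here is a hypothetical variant that also captures a category name. The category parameter is an invented example, not part of the script above.

```apache
# Hypothetical: /products/parrots/7/ -> show_a_product.php?category=parrots&product_id=7
# $1 is the first bracketed group (the category), $2 the second (the number)
RewriteRule ^products/([A-Za-z-]+)/([0-9]+)/?$ show_a_product.php?category=$1&product_id=$2 [NC,L]
```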
Regular Expressions
A complete guide to regular expressions is rather beyond the
scope of this article. However, important points to remember are that the
entire pattern is treated as a regular expression, so always be careful of
characters that are "special" characters in regular expressions.
The most common instance of this is when people use a period in
their pattern. In a pattern, this actually means "any character"
rather than a literal period, and so if you want to match a period (and only a
period) you will need to "escape" the character - precede it with
another special character, a backslash, that tells Apache to take the next
character to be literal.
For example, this RewriteRule will not just match the URL
"rss.xml" as intended - it will also match "rss1xml",
"rss-xml" and so on.
# Change feed URL
RewriteRule ^rss.xml$ rss.php [NC,L]
This does not usually present a serious problem, but escaping
characters properly is a very good habit to get into early. Here's how it
should look:
# Change feed URL
RewriteRule ^rss\.xml$ rss.php [NC,L]
This only applies to the pattern, not to the substitution.
Other characters that require escaping (referred to as
"metacharacters") follow, with their meaning in brackets afterwards:
. (any character)
* (zero or more of the preceding)
+ (one or more of the preceding)
{} (minimum to maximum quantifier)
? (zero or one of the preceding; directly after another quantifier, makes it ungreedy)
! (at start of string means "negative
pattern")
^ (start of string, or
"negative" if at the start of a range)
$ (end of string)
[] (match any of contents)
- (range if used between square brackets)
() (group, backreferenced group)
| (alternative, or)
\ (the escape character itself)
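A few of these metacharacters combined in practice. This is a minimal sketch, and the script names and parameters are hypothetical.

```apache
# Escaped dot plus a grouped alternative: matches feed.rss or feed.xml only
RewriteRule ^feed\.(rss|xml)$ rss.php?format=$1 [NC,L]
# {4} requires exactly four digits: /archive/2008/ matches, /archive/08/ does not
RewriteRule ^archive/([0-9]{4})/?$ archive.php?year=$1 [NC,L]
```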
Using regular expressions, it is possible to search for all
sorts of patterns in URLs and rewrite them when they match. Time for another
example - we wanted earlier to be able to identify this URL and rewrite it:
http://www.pets.com/parrots/norwegian-blue/
And we want to be able to tell the server to interpret this
as the following, but for all products:
http://www.pets.com/get_product_by_name.php?product_name=norwegian-blue
And we can do that relatively simply, with the following
rule:
# Process parrots
RewriteRule ^parrots/([A-Za-z0-9-]+)/?$ get_product_by_name.php?product_name=$1 [NC,L]
With this rule, any URL that starts with "parrots"
followed by a slash (parrots/), then one or more (+) of any combination of
letters, numbers and hyphens ([A-Za-z0-9-]), and optionally a trailing slash, is
rewritten to get_product_by_name.php. (Note the hyphen at the end of the
selection of characters within square brackets - it must be placed there to be
treated literally rather than as a range separator.) We reference the product
name captured in brackets with $1 in the substitution.
We can make it even more generic, if we want, so that it
doesn't matter what directory a product appears to be in, it is still sent to
the same script, like so:
# Process all products
RewriteRule ^[A-Za-z-]+/([A-Za-z0-9-]+)/?$ get_product_by_name.php?product_name=$1 [NC,L]
As you can see, we've replaced "parrots" with a
pattern that matches letters and hyphens. That rule will now match anything in
the parrots directory or any other directory whose name consists of one or
more letters and hyphens.
Flags
Flags are added to the end of a rewrite rule to tell Apache
how to interpret and handle the rule. They can be used to tell apache to treat
the rule as case-insensitive, to stop processing rules if the current one
matches, or a variety of other options. They are comma-separated, and contained
in square brackets. Here's a list of the flags, with their meanings (this
information is included on the cheat sheet, so no need to try to learn them
all).
C (chained with next rule)
CO=cookie (set specified cookie)
E=var:value (set environment variable var
to value)
F (forbidden - sends a 403 header to the
user)
G (gone - no longer exists)
H=handler (set handler)
L (last - stop processing rules)
N (next - restart rule processing from the first rule, using the rewritten URL)
NC (case insensitive)
NE (do not escape special URL characters
in output)
NS (ignore this rule if the request is a
subrequest)
P (proxy - i.e., apache should grab the
remote content specified in the substitution section and return it)
PT (pass through - use when processing
URLs with additional handlers, e.g., mod_alias)
R (temporary redirect to new URL)
R=301 (permanent redirect to new URL)
QSA (append query string from request to
substituted URL)
S=x (skip next x rules)
T=mime-type (force specified mime type)
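Three of these flags in use. This is a minimal sketch, and the .bak files and old-section path are hypothetical. A substitution of "-" means "leave the URL unchanged", which is the usual way to pair a rule with F or G.

```apache
# QSA: keep the visitor's query string, so /products/7/?ref=mail
# becomes show_a_product.php?product_id=7&ref=mail
RewriteRule ^products/([0-9]+)/?$ show_a_product.php?product_id=$1 [NC,QSA,L]
# F: answer 403 Forbidden for any request ending in .bak
RewriteRule \.bak$ - [F]
# G: answer 410 Gone for a retired section of the site
RewriteRule ^old-section/ - [G]
```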
Moving Content
# Temporary move
RewriteRule ^article/?$ http://www.new-domain.com/article/ [R,NC,L]
Adding an "R" flag to the flags section changes how
a RewriteRule works. Instead of rewriting the URL internally, Apache will send
a message back to the browser (an HTTP header) to tell it that the document has
moved temporarily to the URL given in the "substitution" section.
Either an absolute or a relative URL can be given in the substitution section.
The header sent back includes a status code - 302 - that indicates the move is
temporary.
# Permanent move
RewriteRule ^article/?$ http://www.new-domain.com/article/ [R=301,NC,L]
If the move is permanent, append "=301" to the
"R" flag to have Apache tell the browser the move is considered
permanent. Both forms are external redirects, so the browser shows the new
address in the address bar either way. The difference is that browsers and
search engines may remember a 301 and use the new URL from then on.
This is one of the most common methods of rewriting URLs of
items that have moved to a new URL (for example, it is in use extensively on
this site to forward users to new post URLs whenever they are changed).
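The difference between "R" and "R=301" comes down to which status code travels back with the new location. The Python sketch below models just that decision, under stated assumptions:

- flags arrive already parsed into a dict, with bare flags mapped to None;
- the target is used literally, without back-references;
- `redirect_for` is a hypothetical helper.

```python
import re


def redirect_for(path, pattern, target, flags):
    """Model an [R]-flagged RewriteRule: return (status, Location) or None.

    `flags` is a dict such as {"R": "301", "NC": None, "L": None}.
    """
    regex_flags = re.IGNORECASE if "NC" in flags else 0
    if re.match(pattern, path, regex_flags) is None:
        return None  # rule does not apply; no redirect is sent
    status = int(flags.get("R") or 302)  # a bare R defaults to a 302
    return status, target


temporary = redirect_for("article/", r"^article/?$",
                         "http://www.new-domain.com/article/",
                         {"R": None, "NC": None, "L": None})
permanent = redirect_for("ARTICLE", r"^article/?$",
                         "http://www.new-domain.com/article/",
                         {"R": "301", "NC": None, "L": None})
print(temporary)  # (302, 'http://www.new-domain.com/article/')
print(permanent)  # (301, 'http://www.new-domain.com/article/')
```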
Conditions
Rewrite rules can be preceded by one or more rewrite
conditions, and these can be strung together. This can allow you to only apply
certain rules to a subset of requests. Personally, I use this most often when
applying rules to a subdomain or alternative domain, as rewrite conditions can
be run against a variety of criteria, not just the URL. Here's an example:
RewriteCond %{HTTP_HOST} ^addedbytes\.com [NC]
RewriteRule ^(.*)$ http://www.addedbytes.com/$1 [L,R=301]
The rewrite rule above redirects every request, whatever the path, to the
same URL at "www.addedbytes.com". Without the condition, this rule would
create a loop, with every request matching the rule and being redirected back
to itself. The rule is intended only to redirect requests that are missing the
"www" portion of the host name, though, and the condition preceding the rule
ensures that this happens.
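To see why the condition prevents a loop, the pair of directives can be modelled as a function and the redirect followed once. This Python sketch makes two assumptions. First, the host and path are supplied separately, with the path lacking its leading slash as in .htaccess. Second, `handle` is a hypothetical stand-in for Apache.

```python
import re

# RewriteCond %{HTTP_HOST} ^addedbytes\.com [NC]
HOST_CONDITION = re.compile(r"^addedbytes\.com", re.IGNORECASE)
# RewriteRule ^(.*)$ http://www.addedbytes.com/$1 [L,R=301]
RULE = re.compile(r"^(.*)$")


def handle(host, path):
    """Return (301, Location) if the condition and rule both match, else None."""
    if HOST_CONDITION.match(host) is None:
        return None  # condition failed, so the rule is skipped entirely
    match = RULE.match(path)
    return 301, "http://www.addedbytes.com/" + match.group(1)


first = handle("addedbytes.com", "about/")
print(first)  # (301, 'http://www.addedbytes.com/about/')
# The browser follows the redirect, now asking for www.addedbytes.com
print(handle("www.addedbytes.com", "about/"))  # None: no second redirect
```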
The condition operates in a similar way to the rule. It
starts with "RewriteCond" to tell mod_rewrite this line refers to a
condition. Following that is what should actually be tested, and then the
pattern to test. Finally come the flags, in square brackets, the same as with a
RewriteRule.
The string to test (the second part of the condition) can be
a variety of different things. You can test the domain being requested, as with
the above example, or you could test the browser being used, the referring URL
(commonly used to prevent hotlinking), the user's IP address, or a variety of
other things (see the "server variables" section for an outline of
how these work).
The pattern is almost exactly the same as that used in a
RewriteRule, with a couple of small exceptions. If it starts with certain
characters, described in the following "exceptions" section, the pattern is not
interpreted as a regular expression at all. This means that if you wish to
use a regular expression pattern starting with <, >, = or a hyphen, you
should escape the first character with a backslash.
Rewrite conditions can, like rewrite rules, be followed by
flags, and the two you will use most are these. "NC", as with rules, tells
Apache to treat the condition as case-insensitive. The other is "OR". If you
want to apply a rule when either of two conditions matches, rather than repeat
the rule, add the "OR" flag to the first condition. If either condition then
matches, the following rule will be applied. The default behaviour, if a rule is
preceded by multiple conditions, is that it is only applied if all the
conditions match.
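The way "OR" combines with the default "all must match" behaviour can be written out as a small Python function. It is a sketch of the logic only. It assumes each condition has already been tested and reduced to a pair of (matched, has_or_flag), listed in file order. `conditions_pass` is a hypothetical name. In this model, conditions linked by "OR" form a group that needs only one match, and every group must succeed.

```python
def conditions_pass(conditions):
    """Decide whether a rule applies, given its conditions in file order.

    Each item is (matched, has_or_flag). Consecutive conditions linked by
    [OR] form a group that needs one match; every group must match.
    """
    group_matched = False
    group_open = False
    for matched, has_or_flag in conditions:
        group_matched = group_matched or matched
        group_open = True
        if not has_or_flag:  # this condition closes the current group
            if not group_matched:
                return False
            group_matched = False
            group_open = False
    # A trailing [OR] leaves a group open; with no conditions the rule applies
    return group_matched if group_open else True


# First condition fails but carries [OR], second succeeds: rule applies
print(conditions_pass([(False, True), (True, False)]))  # True
# No OR flag: both conditions must match
print(conditions_pass([(True, False), (False, False)]))  # False
```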
Exceptions and Special Cases
Rewrite conditions can be tested in a few different ways -
they do not need to be treated as regular expression patterns, although this is
the most common way they are used. Here are the various ways rewrite conditions
can be processed:
<Pattern (is test string lexicographically lower than pattern)
>Pattern (is test string lexicographically greater than pattern)
=Pattern (is test string equal to pattern)
-d (is test string a valid directory)
-f (is test string a valid file)
-s (is test string a valid file with size greater than zero)
-l (is test string a symbolic link)
-F (is test string a valid file, and accessible via a subrequest)
-U (is test string a valid URL, and accessible via a subrequest)
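The non-regular-expression tests map closely onto ordinary string comparisons and filesystem checks. The Python sketch below models a subset of them for illustration. `check_condition` is hypothetical. It treats `<`, `>` and `=` as plain lexicographic string comparisons. It leaves out `-F` and `-U`, which rely on Apache subrequests.

```python
import os
import tempfile


def check_condition(test_string, pattern):
    """Evaluate a non-regex RewriteCond pattern (illustrative subset only)."""
    if pattern == "-d":
        return os.path.isdir(test_string)
    if pattern == "-f":
        return os.path.isfile(test_string)
    if pattern == "-s":
        return os.path.isfile(test_string) and os.path.getsize(test_string) > 0
    if pattern == "-l":
        return os.path.islink(test_string)
    if pattern.startswith("<"):
        return test_string < pattern[1:]  # lexicographic string comparison
    if pattern.startswith(">"):
        return test_string > pattern[1:]
    if pattern.startswith("="):
        return test_string == pattern[1:]
    raise ValueError("regular expressions, -F and -U are not modelled here")


with tempfile.TemporaryDirectory() as folder:
    page = os.path.join(folder, "index.html")
    with open(page, "w") as handle:
        handle.write("<p>Hello</p>")
    print(check_condition(folder, "-d"))  # True
    print(check_condition(page, "-s"))    # True
print(check_condition("2009", "<2010"))  # True
```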
Server Variables
Server variables are a selection of items you can test when
writing rewrite conditions. This allows you to apply rules based on all sorts
of request parameters, including browser identifiers, referring URL or a
multitude of other strings. Variables are of the following format:
%{VARIABLE_NAME}
And "VARIABLE_NAME" can be replaced with any one of
the following items:
HTTP Headers
HTTP_USER_AGENT
HTTP_REFERER
HTTP_COOKIE
HTTP_FORWARDED
HTTP_HOST
HTTP_PROXY_CONNECTION
HTTP_ACCEPT
Connection Variables
REMOTE_ADDR
REMOTE_HOST
REMOTE_USER
REMOTE_IDENT
REQUEST_METHOD
SCRIPT_FILENAME
PATH_INFO
QUERY_STRING
AUTH_TYPE
Server Variables
DOCUMENT_ROOT
SERVER_ADMIN
SERVER_NAME
SERVER_ADDR
SERVER_PORT
SERVER_PROTOCOL
SERVER_SOFTWARE
Dates and Times
TIME_YEAR
TIME_MON
TIME_DAY
TIME_HOUR
TIME_MIN
TIME_SEC
TIME_WDAY
TIME
Special Items
API_VERSION
THE_REQUEST
REQUEST_URI
REQUEST_FILENAME
IS_SUBREQ
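Before a condition's pattern is tested, each %{VARIABLE_NAME} in the test string is replaced by its value for the current request. That substitution step can be sketched in Python as below. The request values are made up for illustration, and `expand` is hypothetical. In this sketch an unknown name simply becomes an empty string.

```python
import re

VARIABLE = re.compile(r"%\{([A-Z_]+)\}")


def expand(test_string, variables):
    """Replace each %{NAME} with its value; unknown names become an empty string."""
    return VARIABLE.sub(lambda m: variables.get(m.group(1), ""), test_string)


# Hypothetical values for a single request
request = {
    "HTTP_HOST": "addedbytes.com",
    "REQUEST_URI": "/about/",
    "HTTP_REFERER": "http://example.com/links.html",
}
print(expand("%{HTTP_HOST}%{REQUEST_URI}", request))  # addedbytes.com/about/
print(expand("%{HTTP_REFERER}", request))  # http://example.com/links.html
```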
Working With Multiple Rules
The more complicated a site, the more complicated the set of
rules governing it can be. This can be problematic when it comes to resolving
conflicts between rules. You will find this issue rears its ugly head most
often when you add a new rule to a file, and it doesn't work. What you may find,
if the rule itself is not at fault, is that an earlier rule in the file is
matching the URL and so the URL is not being tested against the new rule you've
just added.
# Process product requests
RewriteRule ^([A-Za-z0-9-]+)/([A-Za-z0-9-]+)/?$ get_product_by_name.php?category_name=$1&product_name=$2 [NC,L]
# Process blog posts
RewriteRule ^([A-Za-z0-9-]+)/([A-Za-z0-9-]+)/?$ get_blog_post_by_title.php?category_name=$1&post_title=$2 [NC,L]
In the example above, the product pages of a site and the
blog post pages have identical patterns. The second rule will never match a
URL, because anything that would match that pattern will have already been
matched by the first rule.
There are a few ways to work around this. Several CMSes
(including WordPress) handle this by adding an extra portion to the URL to
denote the type of request, like so:
# Process product requests
RewriteRule ^products/([A-Za-z0-9-]+)/([A-Za-z0-9-]+)/?$ get_product_by_name.php?category_name=$1&product_name=$2 [NC,L]
# Process blog posts
RewriteRule ^blog/([A-Za-z0-9-]+)/([A-Za-z0-9-]+)/?$ get_blog_post_by_title.php?category_name=$1&post_title=$2 [NC,L]
You could also write a single PHP script to process all
requests, which checks whether the second part of the URL matches a blog
post or a product. I usually go for this option because, although it may
increase the load on the server slightly, it gives much cleaner URLs.
# Process product and blog requests
RewriteRule ^([A-Za-z0-9-]+)/([A-Za-z0-9-]+)/?$ get_product_or_blog_post.php?category_name=$1&item_name=$2 [NC,L]
There are certain situations where you can work around this
issue by writing more precise rules and ordering your rules intelligently.
Imagine a blog with two archives - one by topic and one by year.
# Get archive by topic
RewriteRule ^([A-Za-z0-9-]+)/?$ get_archives_by_topic.php?topic_name=$1 [NC,L]
# Get archive by year
RewriteRule ^([A-Za-z0-9-]+)/?$ get_archives_by_year.php?year=$1 [NC,L]
The above rules will conflict. Of course, years are numeric
and only 4 digits long, so you can make that rule more precise. By running it
first, the only conflict you could encounter would be if you had a topic
with a 4-digit number for a name.
# Get archive by year
RewriteRule ^([0-9]{4})/?$ get_archives_by_year.php?year=$1 [NC,L]
# Get archive by topic
RewriteRule ^([A-Za-z0-9-]+)/?$ get_archives_by_topic.php?topic_name=$1 [NC,L]
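A matching rule with the "L" flag stops processing, so whichever matching rule comes first wins. This Python sketch runs the two archive rules in both orders to show why the more precise year rule has to come first. It is a model, not Apache. `route` is a hypothetical helper, and every rule is assumed to carry [NC,L].

```python
import re

YEAR_RULE = (r"^([0-9]{4})/?$", "get_archives_by_year.php?year={0}")
TOPIC_RULE = (r"^([A-Za-z0-9-]+)/?$", "get_archives_by_topic.php?topic_name={0}")


def route(path, rules):
    """Return the target of the first matching rule, as [L] would."""
    for pattern, template in rules:
        match = re.match(pattern, path, re.IGNORECASE)
        if match:
            return template.format(*match.groups())
    return None  # no rule matched; the URL is left alone


specific_first = [YEAR_RULE, TOPIC_RULE]
general_first = [TOPIC_RULE, YEAR_RULE]

print(route("2009/", specific_first))    # get_archives_by_year.php?year=2009
print(route("2009/", general_first))     # get_archives_by_topic.php?topic_name=2009
print(route("apache/", specific_first))  # get_archives_by_topic.php?topic_name=apache
```

With the general rule first, the year rule can never be reached, which is the same shadowing seen with the product and blog rules earlier.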
mod_rewrite
Apache's mod_rewrite comes as standard with most Apache hosting
accounts, so if you're on shared hosting, you are unlikely to have to do
anything. If you're managing your own box, then you most likely just have to
turn on mod_rewrite. If you are using Apache 1.x, you will need to edit your
httpd.conf file and remove the leading '#' from the following lines:
#LoadModule rewrite_module modules/mod_rewrite.so
#AddModule mod_rewrite.c
If you are using Apache 2 on a Debian-based distribution, you
need to run the following command and then restart Apache:
sudo a2enmod rewrite
Other distributions and platforms differ. If the above
instructions are not suitable for your system, then Google is your friend. You
may need to edit your apache2 configuration file and add "rewrite" to
the "APACHE_MODULES" list, or edit httpd.conf, or even download and
compile mod_rewrite yourself. For the majority, however, installation should be
simple.
ISAPI_Rewrite
ISAPI_Rewrite is a URL rewriting plugin for IIS based on
mod_rewrite and is not free. It performs most of the same functionality as
mod_rewrite, and there is a good quality ISAPI_Rewrite forum where most common
questions are answered. As ISAPI_Rewrite works with IIS, installation is
relatively simple - there are installation instructions available.
ISAPI_Rewrite rules go into a file named httpd.ini. Errors
will go into a file named httpd.parse.errors by default.
Leading Slashes
I have found myself tripped up numerous times by leading
slashes in URL rewriting systems. Whether they should be used in the pattern or
in the substitution section of a RewriteRule or used in a RewriteCond statement
is a constant source of frustration to me. This may be in part because I work
with different URL rewriting engines, but I would advise being careful of
leading slashes - if a rule is not working, that's often a good place to start
looking. I never include leading slashes in mod_rewrite rules and always
include them in ISAPI_Rewrite.
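Part of the leading-slash confusion in mod_rewrite comes from context. In an .htaccess file, the directory prefix is removed before the pattern is tested; for the document root that prefix is just the leading "/". Rules in the main server configuration see the full path, slash included. The Python sketch below models only that difference, assuming an .htaccess file in the document root. `path_seen_by_pattern` is hypothetical, and ISAPI_Rewrite is not modelled.

```python
import re


def path_seen_by_pattern(request_path, in_htaccess):
    """Model the path a RewriteRule pattern is tested against.

    Assumes the .htaccess file sits in the document root, so only the
    single leading "/" is stripped in that context.
    """
    if in_htaccess and request_path.startswith("/"):
        return request_path[1:]
    return request_path


for pattern in (r"^article/?$", r"^/article/?$"):
    for in_htaccess in (True, False):
        seen = path_seen_by_pattern("/article/", in_htaccess)
        context = ".htaccess" if in_htaccess else "server config"
        print(f"{pattern:14} {context:14} matches: {re.match(pattern, seen) is not None}")
```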
Sample Rules
To redirect an old domain to a new domain:
RewriteCond %{HTTP_HOST} old_domain\.com [NC]
RewriteRule ^(.*)$ http://www.new_domain.com/$1 [L,R=301]
To redirect all requests missing "www" (yes www):
RewriteCond %{HTTP_HOST} ^domain\.com [NC]
RewriteRule ^(.*)$ http://www.domain.com/$1 [L,R=301]
To redirect all requests with "www" (no www):
RewriteCond %{HTTP_HOST} ^www\.domain\.com [NC]
RewriteRule ^(.*)$ http://domain.com/$1 [L,R=301]
Redirect old page to new page:
RewriteRule ^old-url\.htm$ http://www.domain.com/new-url.htm [NC,R=301,L]
.htaccess Error Documents
In Apache, you can set up each directory on your server individually, giving
them different properties or requirements for access. And while you can do this
through normal Apache configuration, some hosts may wish to give users the ability
to set up their own virtual server how they like. And so we have .htaccess
files, a way to set Apache directives on a directory by directory basis without
the need for direct server access, and without being able to affect other
directories on the same server.
One up-side of this (amongst many) is that with a few short lines in an
.htaccess file, you can tell your server that, for example, when a user asks
for a page that doesn't exist, they are shown a customized error page instead
of the bog-standard error page they've seen a million times before. If you
visit http://www.addedbytes.com/random_made_up_address then you'll see this in
action - instead of your browser's default error page, you see an error page
sent by my server to you, telling you that the page you asked for doesn't
exist.
This has a fair few uses. For example, my 404 (page not found) error page also
sends me an email whenever somebody ends up there, telling me which page they
were trying to find, and where they came from to find it - hopefully, this will
help me to fix broken links without needing to trawl through mind-numbing error
logs.
[Aside: If you set up your custom error page to email you whenever a page isn't
found, remember that "/favicon.ico" requests failing doesn't mean
that a page is missing. Internet Explorer 5 assumes everyone has a
"favicon" and so asks the server for it. It's best to filter error
messages about missing "/favicon.ico" files from your error logging,
if you plan to do any.]
Setting up your .htaccess file is a piece of cake. First things first, open
Notepad (or better yet, EditPlus2, http://www.editplus.com/), and add
the following to a new document:
ErrorDocument 404 /404.html
Next you need to save the file. You need to save it as ".htaccess".
Not ".htaccess.txt", or "mysite.htaccess" - just
".htaccess". I know it sounds strange, but that is what these files
are - just .htaccess files. Nothing else. Happy? If not, take a look at this
.htaccess guide (http://wsabstract.com/howto/htaccess.shtml), which
also explains the naming convention of .htaccess in a little more depth. If you
do use Notepad, you may need to rename the file after saving it, and you can do
this before or after uploading the file to your server.
Now, create a page called 404.html, containing whatever you want a visitor to
your site to see when they try to visit a page that doesn't exist. Now, upload
both to your website, and type in a random, made-up address. You should, with
any luck, see your custom error page instead of the traditional "Page Not
Found" error message. If you do not see that, then there is a good chance
your server does not support .htaccess, or it has been disabled. I suggest the
next thing you do is check quickly with your server administrator that you are
allowed to use .htaccess to serve custom error pages.
If all went well, and you are now viewing a custom 404 (page not found) error
page, then you are well on your way to a complete set of error documents to
match your web site. There are more errors out there, you know, not just
missing pages. Of course, you can also use PHP, ASP or CFML pages as error
documents - very useful for keeping track of errors.
You can customize these directives a great deal. For example, you can add
directives for any of the status codes below, to show custom pages for any
error the server may report. You can also, if you want, specify a full URL
instead of a relative one. And if you are truly adventurous, you could even use
pure HTML in the .htaccess file to be displayed in case of an error, as below.
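Note that the quoting rules depend on your Apache version. In Apache 1.3, you
start the HTML with a quotation mark but do not put one at the other end (you
can include quotation marks within the HTML itself as normal). From Apache 2.0
onward, enclose the whole message in a pair of double quotation marks, keep it
on one line, and use single quotes for HTML attributes so they do not end the
message early:
ErrorDocument 404 "Ooops, that page was <b>not found</b>. Please try a different one or <a href='mailto:owner@site.com'>email the site owner</a> for assistance."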
Server response codes
A server response code is a three-digit number sent by a server to a user in
response to a request for a web page or document. They tell the user whether
the request can be completed, or if the server needs more information, or if
the server cannot complete the request. Usually, these codes are sent
'silently' - so you never see them, as a user - however, there are some common
ones that you may wish to set up error pages for, and they are listed below.
Most people will only ever need to set up error pages for server codes 400,
401, 403, 404 and 500, and you would be wise to always have an error document
for 404 errors at the very least.
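The first digit of a status code tells you which broad class it falls into. Those classes match the groupings used in the status code list further down. Here is a small Python sketch; `status_class` is a hypothetical helper.

```python
STATUS_CLASSES = {
    1: "Informational",
    2: "Successful",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def status_class(code):
    """Return the class of a three-digit HTTP status code."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    return STATUS_CLASSES[code // 100]


for code in (200, 301, 404, 503):
    print(code, status_class(code))
```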
It is also relatively important to ensure that any error page is over 512 bytes
in size. Internet Explorer 5, when sent an error page of less than 512 bytes,
will display its own default error document instead of yours. Feel free to
use padding if this is an issue - personally, I'm not going to increase the
size of a page because Internet Explorer 5 doesn't behave well.
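If you do decide to pad, an HTML comment is a harmless way to add bytes without changing what the visitor sees. This Python sketch assumes the page is served as UTF-8 and uses the 512-byte figure given above. `pad_error_page` is a hypothetical helper.

```python
MIN_ERROR_PAGE_BYTES = 512  # threshold described in the text above


def pad_error_page(html, minimum=MIN_ERROR_PAGE_BYTES):
    """Return html, padded with an HTML comment to at least `minimum` bytes."""
    shortfall = minimum - len(html.encode("utf-8"))
    if shortfall <= 0:
        return html  # already large enough
    # "<!--" and "-->" add 7 bytes themselves; spaces make up the rest
    padding = " " * max(shortfall - 7, 0)
    return html + "<!--" + padding + "-->"


padded = pad_error_page("<h1>Page not found</h1>")
print(len(padded.encode("utf-8")))  # 512
```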
In order to set up an error page for any other error codes, you simply add more
lines to your .htaccess file. If you wanted to have error pages for the above
five errors, your .htaccess file might look something like this:
ErrorDocument 400 /400.html
ErrorDocument 401 /401.html
ErrorDocument 403 /403.html
ErrorDocument 404 /404.html
ErrorDocument 500 /500.html
HTTP Status Codes
Informational
100 - Continue
A status code of 100 indicates that (usually the first) part
of a request has been received without any problems, and that the rest of the
request should now be sent.
101 - Switching Protocols
HTTP/1.1 is just one protocol for transferring data
on the web, and a status code of 101 indicates that the server is changing to
the protocol it names in the "Upgrade" header it returns to the
client. For example, when requesting a page, a browser might receive a status
code of 101, followed by an "Upgrade" header showing that the server
is changing to a different version of HTTP.
Successful
200 - OK
The 200 status code is by far the most commonly returned. It
means, simply, that the request succeeded, and the response carries the
requested content.
201 - Created
A 201 status code indicates that a request was successful
and as a result, a resource has been created (for example a new page).
202 - Accepted
The status code 202 indicates that the server has received and
understood the request, and that it has been accepted for processing, although
it may not be processed immediately.
203 - Non-Authoritative Information
A 203 status code means that the request was received and
understood, and that information sent back about the response is from a third
party, rather than the original server. This is virtually identical in meaning
to a 200 status code.
204 - No Content
The 204 status code means that the request was received and
understood, but that there is no need to send any data back.
205 - Reset Content
The 205 status code is a request from the server to the
client to reset the document from which the original request was sent. For
example, if a user fills out a form, and submits it, a status code of 205 means
the server is asking the browser to clear the form.
206 - Partial Content
A status code of 206 is a response to a request for part of
a document. This is used by caching tools and download managers, when a user
agent requests only part of a document (using a "Range" header), and just that
section is returned.
Redirection
300 - Multiple Choices
The 300 status code indicates that the requested resource is available in more
than one form (for example, in several languages or formats). The
response will also include a list of locations from which the user agent can
select the most appropriate.
301 - Moved Permanently
A status code of 301 tells a client that the resource they
asked for has permanently moved to a new location. The response should also
include this location. It tells the client to use the new URL the next time it
wants to fetch the same resource.
302 - Found
A status code of 302 tells a client that the resource they
asked for has temporarily moved to a new location. The response should also
include this location. It tells the client that it should carry on using the
same URL to access this resource.
303 - See Other
A 303 status code indicates that the response to the request
can be found at the specified URL, and should be retrieved from there. It does
not mean that something has moved - it is simply specifying the address at
which the response to the request can be found.
304 - Not Modified
The 304 status code is sent in response to a request (for a
document) that asked for the document only if it was newer than the one the
client already had. Normally, when a document is cached, the date it was cached
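is stored. The next time the document is viewed, the client asks the server if
the document has changed, sending the stored date in an "If-Modified-Since"
header. If not, the server replies with a 304 and no body, and the client just
reloads the document from the cache.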
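On the server side, the decision behind a 304 is a date comparison. The Python sketch below shows that logic under simplifying assumptions. `conditional_status` is a hypothetical helper. The document's last-modified time is a timezone-aware UTC datetime. Only the "If-Modified-Since" header is considered, although real servers also use ETags.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_status(last_modified, if_modified_since):
    """Return 304 if the client's cached copy is current, otherwise 200."""
    if if_modified_since is None:
        return 200  # not a conditional request
    try:
        client_date = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable date: ignore it and send the document
    if client_date.tzinfo is None:
        client_date = client_date.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so compare whole seconds
    if last_modified.replace(microsecond=0) <= client_date:
        return 304
    return 200


modified = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
cached = format_datetime(modified, usegmt=True)  # "Wed, 10 Jan 2024 12:00:00 GMT"
print(conditional_status(modified, cached))  # 304
print(conditional_status(modified, "Tue, 09 Jan 2024 12:00:00 GMT"))  # 200
```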
305 - Use Proxy
A 305 status code tells the client that the requested
resource has to be reached through a proxy, which will be specified in the
response.
307 - Temporary Redirect
307 is the status code that is sent when a document is
temporarily available at a different URL, which is also returned. There is very
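little difference between a 302 status code and a 307 status code. 307 was
created as another, less ambiguous, version of the 302 status code: a client
receiving a 307 must repeat the request at the new URL using the same method,
so a POST stays a POST.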
Client Error
400 - Bad Request
A status code of 400 indicates that the server did not
understand the request due to bad syntax.
401 - Unauthorized
A 401 status code indicates that before a resource can be
accessed, the client must be authorised by the server.
402 - Payment Required
The 402 status code is not currently in use, being listed as
"reserved for future use".
403 - Forbidden
A 403 status code indicates that the client cannot access
the requested resource. That might mean that the credentials sent in the
request do not grant access to it, or that the permissions on the server do not
allow what was being asked.
404 - Not Found
The best known of them all, the 404 status code indicates
that the requested resource was not found at the URL given, and the server has
no idea how long for.
405 - Method Not Allowed
A 405 status code is returned when the client has tried to
use a request method that the server does not allow. Request methods that are
allowed should be listed in an "Allow" header sent with the response (common
request methods are POST and GET).
406 - Not Acceptable
The 406 status code means that, although the server
understood and processed the request, the response is of a form the client
cannot understand. A client sends, as part of a request, headers indicating
what types of data it can use, and a 406 error is returned when the response is
of a type not in that list.
407 - Proxy Authentication Required
The 407 status code is very similar to the 401 status code,
and means that the client must be authorised by the proxy before the request
can proceed.
408 - Request Timeout
A 408 status code means that the client did not produce a
request quickly enough. A server is set to only wait a certain amount of time
for responses from clients, and a 408 status code indicates that time has
passed.
409 - Conflict
A 409 status code indicates that the server was unable to
complete the request, often because a file would need to be edited, created or
deleted, and that file cannot be edited, created or deleted.
410 - Gone
A 410 status code is the 404's lesser known cousin. It
indicates that a resource has permanently gone (a 404 status code gives no
indication of whether a resource has gone permanently or temporarily), and no new
address is known for it.
411 - Length Required
The 411 status code occurs when a server refuses to process
a request because a content length was not specified.
412 - Precondition Failed
A 412 status code indicates that one of the conditions the
request was made under has failed.
413 - Request Entity Too Large
The 413 status code indicates that the request was larger
than the server is able to handle, either due to physical constraints or to
settings. Usually, this occurs when a file is sent using the POST method from a
form, and the file is larger than the maximum size allowed in the server
settings.
414 - Request-URI Too Long
The 414 status code indicates that the URL requested by the
client was longer than the server can process.
415 - Unsupported Media Type
A 415 status code is returned by a server to indicate that
part of the request was in an unsupported format.
416 - Requested Range Not Satisfiable
A 416 status code indicates that the server was unable to
fulfill the request. This may be, for example, because the client asked for the
800th-900th bytes of a document, but the document was only 200 bytes long.
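The choice between a 206 and a 416 can be sketched as below. This is illustrative Python under stated assumptions:

- the "Range" header has already been parsed into a single inclusive pair of byte offsets, counted from zero;
- `range_response` is hypothetical;
- a range that runs past the end is shortened rather than refused, so only a start beyond the end gives a 416.

```python
def range_response(document, first, last):
    """Serve bytes first..last (inclusive) of `document`.

    Returns (206, slice) when the range is satisfiable, or (416, b"") when
    the first requested byte lies beyond the end of the document.
    """
    if first < 0 or last < first:
        raise ValueError("invalid byte range")
    if first >= len(document):
        return 416, b""
    # Slicing quietly shortens a range that runs past the end
    return 206, document[first:last + 1]


document = bytes(200)  # a 200-byte document
status, body = range_response(document, 0, 99)
print(status, len(body))  # 206 100
print(range_response(document, 800, 900))  # (416, b'')
```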
417 - Expectation Failed
The 417 status code means that the server was unable to
properly complete the request. One of the headers sent to the server, the
"Expect" header, indicated an expectation the server could not meet.
Server Error
500 - Internal Server Error
A 500 status code (all too often seen by Perl programmers)
indicates that the server encountered something it didn't expect and was unable
to complete the request.
501 - Not Implemented
The 501 status code indicates that the server does not
support all that is needed for the request to be completed.
502 - Bad Gateway
A 502 status code indicates that a server, while acting as a
proxy, received a response from a server further upstream that it judged
invalid.
503 - Service Unavailable
A 503 status code is most often seen on extremely busy
servers, and it indicates that the server was unable to complete the request
due to a server overload.
504 - Gateway Timeout
A 504 status code is returned when a server acting as a
proxy has waited too long for a response from a server further upstream.
505 - HTTP Version Not Supported
A 505 status code is returned when the HTTP version
indicated in the request is not supported. The response should indicate which
HTTP versions are supported.