An Introduction to Apache

Tutorial Details
  • Program: Apache
  • Difficulty: Beginner

If Apache has always seemed like a black box to you, it’s time to learn just what’s going on behind the scenes!

A web server’s job is to accept requests from clients and send back responses. It receives a URL and either translates it to a filename (for static requests) and sends that file back from the local disk, or translates it to a program name (for dynamic requests), executes the program, and sends its output back to the requesting party. If, for any reason, the web server cannot process and complete the request, it returns an error message instead. The term “web server” can refer to the machine (the computer or hardware) itself, or to the software that receives requests and sends out responses.
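
To make the static case concrete, here is a minimal Python sketch of the URL-to-filename step. It is only an illustration of the idea, not how Apache is implemented; the function name handle_request, the index.html default, and the choice of status codes are assumptions made for this example.

```python
from pathlib import Path
from typing import Tuple


def handle_request(url_path: str, doc_root: Path) -> Tuple[int, bytes]:
    """Map a URL path to a file under doc_root and return (status, body)."""
    # Assumption: a request for "/" is answered with index.html.
    relative = url_path.lstrip("/") or "index.html"
    root = doc_root.resolve()
    target = (root / relative).resolve()

    # Refuse paths that escape the document root, such as "/../secret".
    if target != root and root not in target.parents:
        return 403, b"Forbidden"

    # If the file does not exist, answer with an error instead of content.
    if not target.is_file():
        return 404, b"Not Found"

    return 200, target.read_bytes()
```

Calling handle_request("/", Path("htdocs")) would return the contents of htdocs/index.html with status 200 if that file exists, and a 404 otherwise.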

Apache is the most popular web server available, followed by Microsoft’s IIS. The reasons behind its popularity include:

  1. It is free to download and install.
  2. It is open source: the source code is visible to everyone, which enables anyone (who can rise to the challenge) to adjust the code, optimize it, and fix errors and security holes. People can add new features and write new modules.
  3. It suits all needs: Apache can be used for small websites of one or two pages, or huge websites of hundreds of thousands of pages, serving millions of regular visitors each month. It can serve both static and dynamic content.

What is Apache?

The Apache HTTP server is software (a program) that runs in the background under a multitasking operating system and provides services to other applications that connect to it, such as client web browsers. It was first developed for Linux/Unix operating systems, but was later adapted to work under other systems, including Windows and Mac. The Apache binary running under UNIX is called httpd (short for HTTP daemon), and under Win32 it is called Apache.exe.

Installing Apache on Linux does require a bit of programming skill (though it is not too difficult). Installing it on a Windows platform is straightforward, as you can do it through a graphical user interface.

Apache’s original core is fairly basic and contains a limited number of features. Its power comes from functionality added through the many modules that programmers have written and that can be installed to extend the server’s capabilities. To add a new module, all you need to do is install it and restart the Apache server. Functionality that you don’t need or want can easily be removed, which is considered good practice: it keeps the server small and light, lets it start faster, consumes fewer system resources and less memory, and makes the server less prone to security holes. The Apache server also supports third-party modules, some of which have been added to Apache 2 as permanent features. Apache integrates very easily with other open source applications, such as PHP and MySQL, making it even more powerful than it already is.

A web server in its simplest form is a computer with special software, and an internet connection that allows it to connect to other devices.

Every device connected to a network has an IP address through which others connect to and communicate with it. This IP address is sort of like a regular address that you need in real life to call or visit any contact of yours. If they didn’t have an address, you wouldn’t know how to call or reach them. IP addresses serve the exact same purpose. If a device didn’t have one, the other machines on the same network wouldn’t know how to reach it.

The Apache server offers a number of services that clients might make use of. These services are offered using various protocols through different ports, and include: Hypertext Transfer Protocol (HTTP), typically through port 80; Simple Mail Transfer Protocol (SMTP), typically through port 25; Domain Name System (DNS), for mapping domain names to their corresponding IP addresses, generally through port 53; and File Transfer Protocol (FTP), for uploading and downloading files, usually through port 21.


How Apache Works

Apache’s main role is communication over networks, and it uses TCP/IP (Transmission Control Protocol/Internet Protocol), which allows devices with IP addresses to communicate with one another.

The TCP/IP protocol is a set of rules that define how clients make requests and how servers respond, and determine how data is transmitted, delivered, received, and acknowledged.

The Apache server is set up through configuration files, in which directives are added to control its behavior. In its idle state, Apache listens on the IP addresses and ports identified in its config file (httpd.conf). Whenever it receives a request, it analyzes the headers, applies the rules specified for it in the config file, and takes action.

But one server can host many websites, not just one, even though they seem separate from one another to the outside world. To achieve this, every one of those websites has to be assigned a different name, even if they all eventually map to the same machine. This is accomplished by using what are known as virtual hosts.

Since IP addresses are difficult to remember, we, as visitors to specific sites, usually type their domain names into the address box of our browsers. The browser then connects to a DNS server, which translates the domain name to its IP address. The browser takes the returned IP address and connects to it. The browser also sends a Host header with the request so that, if the server is hosting multiple sites, it will know which one to serve back.

For example, typing www.google.com into your browser’s address field might send the following request to the server at that IP address:

GET / HTTP/1.1
Host: www.google.com

The first line contains several pieces of information. First, there is the method (in this case, GET). Next comes the URI, which specifies which page is to be retrieved or which program is to be run (in this case, the root directory, denoted by the /). Finally, there is the HTTP version (in this case, HTTP 1.1).
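
The Python sketch below shows one way a server could split such a request into its parts and then use the Host header to choose between several sites. It is a simplified illustration, not Apache's actual code: the host names, the directory paths in SITES, and the function names are all hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical mapping of host names to document roots (illustrative only).
SITES = {
    "www.example.com": "/var/www/example",
    "blog.example.com": "/var/www/blog",
}
DEFAULT_ROOT = "/var/www/default"


def parse_request(raw: str) -> Tuple[str, str, str, Dict[str, str]]:
    """Split a raw HTTP request head into method, URI, version and headers."""
    lines = raw.split("\r\n")
    # The request line has three space-separated parts.
    method, uri, version = lines[0].split(" ", 2)

    headers = {}
    for line in lines[1:]:
        if not line:  # a blank line ends the header section
            break
        name, _, value = line.partition(":")
        # Header names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return method, uri, version, headers


def document_root_for(headers: Dict[str, str]) -> str:
    """Pick the site to serve based on the Host header."""
    # Drop an optional ":port" suffix before looking the name up.
    host = headers.get("host", "").split(":")[0].lower()
    return SITES.get(host, DEFAULT_ROOT)
```

Two requests that arrive at the same IP address but carry different Host headers would therefore be served from different directories, which is the essence of name-based virtual hosting.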

HTTP is a stateless request/response protocol. It’s a set of rules that govern communication between a client and the server. The client (usually, but not necessarily, a web browser) makes a request, the server sends back a response, and the exchange is complete. The server does not expect further communication, unlike protocols that remain in a waiting state after the request is over.

If the request is successful, the server returns a 200 status code (which means the request succeeded), the response headers, and the requested data. The response headers of an Apache server might look something like the following:

HTTP/1.1 200 OK
Date: Sun, 10 Jun 2012 19:19:21 GMT
Server: Apache
Expires: Wed, 11 Jan 1984 05:00:00 GMT
Cache-Control: no-cache, must-revalidate, max-age=0
Pragma: no-cache
Last-Modified: Sun, 10 Jun 2012 19:19:21 GMT
Vary: Accept-Encoding,User-Agent
Content-Type: text/html; charset=UTF-8
Content-Length: 7560
  

The first line of the response is the status line. It contains the HTTP version and the status code. The date follows next, and then some information about the host server and the retrieved data. The Content-Type header tells the client the type of data retrieved, so it knows how to handle it. Content-Length tells the client the size of the response body. If the request didn’t go through, the client would get an error code and message, such as the following status line in the case of a page not found error:

HTTP/1.1 404 Not Found
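
From the client's side, reading such a response starts with taking the status line and headers apart. Here is a small Python sketch of that step; the function name parse_response_head is made up for this example, and real clients handle many details (repeated headers, line folding, and so on) that are left out here.

```python
from typing import Dict, Tuple


def parse_response_head(raw: str) -> Tuple[str, int, str, Dict[str, str]]:
    """Split a response head into HTTP version, status code, reason and headers."""
    lines = raw.strip().splitlines()
    # Status line: version, numeric code, then a reason phrase that may contain spaces.
    version, code, reason = lines[0].split(" ", 2)

    headers = {}
    for line in lines[1:]:
        # Split at the first colon only; header values may contain colons too.
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return version, int(code), reason, headers
```

A client can then branch on the numeric code, treating 200 as success and 404 as a missing page, and use Content-Type and Content-Length to decide how to read the body.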

TCP/IP Protocol

TCP/IP is actually two protocols built one on top of the other. The IP protocol is responsible for getting the transferred data from one point to another. It takes the data to be transferred between the two points, splits it into smaller packets, attaches the source and destination addresses to each packet, and transfers the data.

TCP handles establishing the connection between the two parties, making sure the data arrives at its destination, and taking care of any data loss and recovery.

Once data is received, the destination sends an acknowledgment (ACK) message to the sending host, notifying it that the data has arrived. If something goes wrong, such as a packet being lost, no acknowledgment arrives for that data; after waiting for a while, the sending host concludes there was a problem and resends the packet.
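
The resend-until-acknowledged idea can be sketched with a tiny Python simulation. This is a deliberately simplified "stop and wait" model in which a lost packet is simply sent again; real TCP is far more sophisticated (sequence numbers, sliding windows, adaptive timers). The function name and the lost_attempts parameter are inventions for this example.

```python
from typing import List, Set, Tuple


def send_reliably(packets: List[str],
                  lost_attempts: Set[Tuple[int, int]]) -> Tuple[List[str], int]:
    """Simulate sending packets one at a time, resending each until it is acknowledged.

    lost_attempts holds (packet_index, attempt_number) pairs that get lost in transit.
    Returns the packets delivered, in order, and the total number of transmissions.
    """
    delivered = []
    transmissions = 0
    for index, packet in enumerate(packets):
        attempt = 0
        while True:
            attempt += 1
            transmissions += 1
            if (index, attempt) in lost_attempts:
                # No acknowledgment comes back, so the sender tries again.
                continue
            # The receiver got the packet and acknowledged it.
            delivered.append(packet)
            break
    return delivered, transmissions
```

Even if the second packet is lost twice, all packets still arrive in order; the only cost is the extra transmissions.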

As discussed earlier, Apache offers many services that clients might want to connect to and make use of. Each service is accessed through a particular port, which differentiates it from the others. This way, any given interface (or host) can offer multiple services. So when a client connects to a host, it passes the port number along with the IP address. Browsers use HTTP, which uses port 80 by default, so there’s no need to specify it.

Take my FTP software (WinSCP) as an example. To connect to my server over FTP, I not only need to provide the IP address (or alternatively type in the domain name), but I also need to specify the port number that my server provides the service through. In the case of FTP, the port number is 21. In the case of SFTP (SSH File Transfer Protocol), the port number is 22.

Under UNIX, a list of services offered along with their respective port numbers can be found in the file /etc/services. The following command will display the contents of the file:

more /etc/services

In this file, services are listed in the first column, followed by the port number each one is accessed at and the name of the protocol the service uses.

Under Windows, the file is called services and can be found under C:\WINNT\system32\drivers\etc\ (or C:\Windows\System32\drivers\etc\ on newer versions).
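
Because the layout is so regular (a service name, then port/protocol, then optional aliases), the file is easy to read from a program. The Python sketch below parses text in that layout into a lookup table. To stay self-contained it works on a short sample string rather than the real file, and the names parse_services and SAMPLE are made up for this example.

```python
from typing import Dict, Tuple


def parse_services(text: str) -> Dict[Tuple[str, str], int]:
    """Return {(service_name, protocol): port} from /etc/services-style text."""
    table = {}
    for line in text.splitlines():
        # Everything after a "#" is a comment.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        # Expect at least "name port/protocol"; skip anything else.
        if len(fields) < 2 or "/" not in fields[1]:
            continue
        port, protocol = fields[1].split("/", 1)
        table[(fields[0], protocol)] = int(port)
    return table


# A short sample in the same layout as the services file.
SAMPLE = """\
# service-name  port/protocol  [aliases]
ftp             21/tcp
ssh             22/tcp
smtp            25/tcp         mail
domain          53/udp
http            80/tcp         www
"""
```

With this table, parse_services(SAMPLE)[("http", "tcp")] gives the port a browser connects to by default.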


Inetd

To preserve system resources, UNIX handles many of its services through the internet daemon (inetd), as opposed to constantly running daemons. inetd is a super server that listens on the various ports and handles connection requests as it receives them by starting a new copy of the appropriate daemon (program). The new copy takes it from there and works with the client, while inetd goes back to listening on the server ports, waiting for new client requests. Once the request is processed and the communication is over, the daemon exits.


General Structure

As mentioned earlier, Apache can be installed on a variety of operating systems. Regardless of the platform used, a hosted website will typically have four main directories: htdocs, conf, logs, cgi-bin.

htdocs is the default Apache web server document directory, meaning it is the public directory whose contents are usually available for clients connecting through the web. It contains all static pages and dynamic content to be served once an HTTP request for them is received. Since files and sub-directories under htdocs are available to the public, correct handling of file permissions is of great importance so as not to compromise the server’s safety and security.

conf is the directory where all server configuration files are located. Configuration files are plain text files in which directives are added to control the web server’s behavior and functionality. Each directive is usually placed on a separate line, and a line that begins with a hash (#) is a comment and is ignored.
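
To illustrate the "one directive per line, # for comments" layout, here is a small Python sketch that reads such text into a list of directives. It is a rough approximation that does not understand container sections such as <VirtualHost> or lines continued with a backslash. Listen, ServerName and DocumentRoot are real Apache directives, but the values in SAMPLE_CONF and the function name are just examples.

```python
from typing import List, Tuple


def parse_directives(text: str) -> List[Tuple[str, str]]:
    """Return (directive, arguments) pairs, skipping blank lines and comments."""
    directives = []
    for line in text.splitlines():
        line = line.strip()
        # Lines beginning with "#" are comments and are ignored.
        if not line or line.startswith("#"):
            continue
        # The first word is the directive name; the rest are its arguments.
        parts = line.split(None, 1)
        name = parts[0]
        args = parts[1] if len(parts) > 1 else ""
        directives.append((name, args))
    return directives


# Example values only; a real httpd.conf is much longer.
SAMPLE_CONF = """\
# Port to listen on
Listen 80

ServerName www.example.com
DocumentRoot "/usr/local/apache2/htdocs"
"""
```

Running parse_directives(SAMPLE_CONF) yields three directives; the comment and the blank line are skipped.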

logs is the directory where server logs are kept, and includes Apache access logs and error logs. The Apache HTTP Server provides a variety of different mechanisms for logging everything that happens on it, from the initial request, through the URL mapping process, to the final resolution of the connection, including any errors that may have occurred in the process. In addition to this, third-party modules may provide logging capabilities, or inject entries into the existing log files, and applications such as PHP scripts, or other handlers, may send messages to the server error log.
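
As an illustration of what an access log entry holds, the Python sketch below parses one line written in the Common Log Format. That the log uses this format is an assumption: the actual layout depends on how logging is configured on a given server. The sample line, the pattern, and the function name are for demonstration only.

```python
import re
from typing import Dict, Optional

# Common Log Format: host ident user [time] "request" status size
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_access_line(line: str) -> Optional[Dict[str, object]]:
    """Parse one access log line; return None if it does not match the format."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    entry = dict(match.groupdict())
    entry["status"] = int(entry["status"])
    # A size of "-" means no response body was sent.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry


SAMPLE_LINE = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
```

Each entry records who made the request, when, what was asked for, the status code returned, and how many bytes were sent back.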

cgi-bin is the directory where CGI scripts are kept. The CGI (Common Gateway Interface) defines a way for a web server to interact with external content-generating programs, which are often referred to as CGI programs or CGI scripts. These are programs or shell scripts that are written to be executed by Apache on behalf of its clients.
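
A CGI program can be very small. In the Python sketch below, the program reads details of the request from environment variables set by the server and writes a header, a blank line, and then the page to standard output. REQUEST_METHOD and QUERY_STRING are standard CGI variables; the function name cgi_response and the page contents are made up for this example, and whether a server runs Python scripts from cgi-bin depends on how it is configured.

```python
import html
import os
import sys
from typing import Mapping


def cgi_response(environ: Mapping[str, str]) -> str:
    """Build the text a CGI program would write to standard output."""
    method = environ.get("REQUEST_METHOD", "GET")
    query = environ.get("QUERY_STRING", "")
    # Escape user-supplied text before placing it in HTML.
    body = "<html><body><p>Method: {}</p><p>Query: {}</p></body></html>".format(
        html.escape(method), html.escape(query)
    )
    # Headers come first, then a blank line, then the body.
    return "Content-Type: text/html\n\n" + body


if __name__ == "__main__":
    # When run by the web server, the request details arrive in the environment.
    sys.stdout.write(cgi_response(os.environ))
```

The server takes whatever the program prints and sends it back to the client as the response.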

It is important to note that the file and directory names discussed above (as well as their locations) can differ from one server to another, depending on the Apache flavor installed and the operating system it runs under. Their roles, though, remain the same.


Conclusion

Apache has been the most popular web server on the internet since 1996, with more than half of the sites on the web running on it. It played a key role in shaping the World Wide Web into what it is today. The reasons behind its success are clear, and the way things are looking, it will probably stay in the lead for quite some time. This was meant to be an introduction to this powerful piece of software, and I hope it helped you understand what this great tool is and how it generally works.

Tags: apache
  • http://armonge.info Andrés Reyes Monge

    Why would you need programming skills to install Apache under linux?

    • Jesus Bejarano

      would’nt been better just installing lamp?

      • Mateusz Charytoniuk

        LAMP = Linux Apache MySQL PHP

    • http://jayokey.net Jay Okey

      I thought the same thing, I guess he means if you compile from source. But, most distros have package.

    • eric

      sure you could just leverage apt, or RPM on linux to get started. Eventually you will want to compile the source yourself to better comprehend how this work of art is structured and glued together. This will give you a better grasp of HTTP, and the dependancies on the underlying operating system. And it will allow you to extend httpd in ways you never realized could have been done.

      • http://syonsoftware.com jspmahavir

        I read this article but i want to know what file gets and how to the whole process will be run on server.

        If we are not an internet user then how we know what’s going on in html and dynamic content.

    • Diana Eftaiha
      Author

      Because you’ll be doing it from the command line so you’ll need to know the commands to download, extract, install and run apache. But as I said, it is not very difficult.

      • http://www.kotaweaver.com Kota Weaver

        Command line = programming skills…..? I know plenty of people who can use bash but can’t write a line of anything else (or even write bash scripts). I wouldn’t say they have programming skills but rather bash skills or perhaps a bit of know-how with the computer. I have never referred to that as “programming skills”… Though perhaps things have changed. One can install Linux without entering in a single command these days I hear.

        Either way, thanks for the interesting article. Any intention of writing a similar one for Nginx?

  • http://inkwell.dotink.org Matthew J. Sahagian

    The article is inaccurate. See: http://www.zdnet.com/blog/open-source/nginx-takes-2nd-place-in-web-servers-from-microsoft-iis/10101

    That said, even when I used Apache, there was always something that bothered me about it. I have greatly reduced that feeling (not to mention my memory footprint) with NGINX.

    • Magnus Andersson

      Why is it inaccurate? Becouse IIS isn’t number two anymore? Thats not the point in the article.

    • Matthew

      Actually, the article is correct. What you are referring to is the fact that NGINX is the second most popular OPEN SOURCE web server. NGINX own website (http://nginx.com/) states as much. So, the article is accurate. You aren’t, and neither is ZDNet. In fact, the article that ZDNet uses is the same that NGINX uses. Only difference? You and ZDNet are using just one statistic from the article to prove your point, when in reality it tells another story. Microsoft IIS has the second biggest market share, period. You are cherry picking (rather, ZDNet is) one statistic out of many. Follow NGINX’s own lead, and don’t make false claims.

    • Biswadip Dasgupta

      That is an assertion which cannot be substantiated.

  • Brad

    Excellent explanation. Thank you

  • Robert Smith

    Excellent tutorial. Please consider writing a follow-up post (maybe, one describing a working example with that amenable explanation).

    • Esteve

      +1 (including rules and exception handlers)

      • Diana Eftaiha
        Author

        thank you guys glad you like. more articles are definitely coming your way soon!

  • http://websourcefree.com saha

    Great tutorial.Very well explained.

  • http://twitter.com/vladimir_light V-Light

    please make a mod_rewrite (premium) video-tutorial.

    • Diana Eftaiha
      Author

      already on the list ;)

  • Richy

    I’d like to see one on installing/configuring/optimizing Apache & NGINX and mod_rewrite.

  • thecodingdude

    Apache may be popular, but that’s only because every web host on the planet seems to use it. Apache is crap when it comes to creating huge websites. Nginx is the way to go.

  • http://www.thomashenson.com Thomas Henson

    Good article, I have been hacking my way through Apache not really understanding some of the structure. This article has helped me understand the WHY.

  • EM

    I stopped at this paragraph …

    “The Apache server offers a number of services … (HTTP) … (SMTP) .. (DNS) … (FTP)”

    Uh, no, Apache is HTTP(S) … other services have their own daemons (eg. Postfix, Bind, vsftpd, etc)

    • http://www.aggressivex.com Luis Hdez (Aggressivex)

      That’s incorrect, apache can work as FTP , DNS etc. There is modules that can work for that purposes mod_ftp mod_dns … Eg: http://httpd.apache.org/mod_ftp/mod/mod_ftp.html

      Anyway I also thing that paragraph wasn’t very good and he made the mistake that you’re referring.

  • Vanja Djurdjevic

    Very nice article, but in my opinion not thorough enough.

  • http://www.elimcmakin.com Eli McMakin

    love it. Would love to see more on Apache from the author. This is a field that is often neglected but very important to understand

  • http://cansurmeli.com C@N

    Nicely done!

  • Gus Fune

    I liked this article a lot. I’d like to see more about webservers. Comparsion between IIS, Nginx, Lighttpd; or introduction to IIS, a getting started with Apache (installing, setup), cache systems (Varnish, Nginx as proxy), etc.

  • Abdallah

    Good easy Explanation. Did you Know in wish programming language it is written?
    thank

  • http://akmwebtech.in awebtech

    nice thank you really the Apache http sever is to good

  • sachin

    Very nice and easy to understand explanation.love it.
    would you please write “how to install and run Apache server”tutorial in near future?I will like it.

  • Marcin

    Thanks for the article. However, I’d love to know more about NGINX, how to setup and how it compares to Apache.

  • felix

    Nice article!
    Would be nice to read more articles like this which introduce base software components.

  • Chris Sanders

    Great Article guys!

  • http://feed2.me David

    Wow, a woman wrote this. Sexy!

    • whage

      wow, i really imagined a man’s voice while reading it

  • http://www.ostheimer.at/ Andreas Ostheimer

    Good tutoria but too short. It does answer some questions in the text but some sequence diagrams would make the back and forth of messages more obvious.

  • http://twitter.com/zsljulius zsljulius

    This is the best introduction I could possibly find online. You have done a great job explaining these important concepts.

  • tiny

    Nice one :)Thnq !!

  • whage

    Such a well written and useful article! Thank you!

  • Tarun

    Amazing! Very well explained. Thanks

  • Biswadip Dasgupta

    Awesome – superb explanation in clear and impeccable English. Bravo!