One of the surprising things about Web sites is that, in certain
cases, a very small machine can handle a huge number of visitors. For
example, imagine that you have a simple Web site containing a number of
static pages
(in this case, "static" means that everybody sees the same version of
any page when they view it). If you took a normal 500MHz Celeron machine
running Windows NT or Linux, loaded the Apache Web server
on it, and connected this machine to the Internet with a T3 line (45
million bits per second), you could handle hundreds of thousands of
visitors per day. Many ISPs will rent you a dedicated-machine
configuration like this for $1,000 or less per month. This configuration
will work great unless:
- You need to handle millions of visitors per day.
- The single machine fails (in this case, your site will be down until a new machine is installed and configured).
- The pages are extremely large or complicated.
- The pages need to change dynamically on a per-user basis.
- Back-end processing is needed to create the contents of the page or to process a request submitted from the page.
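The T3 figure above can be turned into a rough upper bound on traffic. The sketch below assumes that bandwidth is the only bottleneck. It also uses two hypothetical numbers that do not come from the text: an average of 500 KB transferred per visit and 50% average line utilization over the day.

```python
# Back-of-the-envelope capacity estimate for one server on a T3 line.
# The 45 Mbit/s rate comes from the text; bytes_per_visit and utilization
# are hypothetical inputs chosen only for illustration.

T3_BITS_PER_SECOND = 45_000_000
SECONDS_PER_DAY = 24 * 60 * 60


def max_visits_per_day(bytes_per_visit: int, utilization: float = 1.0) -> int:
    """Upper bound on visits per day if bandwidth is the only limit."""
    bytes_per_second = T3_BITS_PER_SECOND / 8          # 8 bits per byte
    bytes_per_day = bytes_per_second * SECONDS_PER_DAY * utilization
    return int(bytes_per_day // bytes_per_visit)


if __name__ == "__main__":
    # Assumed: ~500 KB per visit, line busy half the time on average.
    print(max_visits_per_day(500_000, utilization=0.5))
```

Under these assumptions the estimate comes out to 486,000 visits per day, in line with the "hundreds of thousands" quoted above. In practice, CPU, disk speed or unusually large pages may limit the machine before the line does.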
There are three main strategies for handling the load:
- The site can invest in a single huge machine with lots of processing power, memory, disk space and redundancy.
- The site can distribute the load across a number of machines.
- The site can use some combination of the first two options.
When a site sends you to a different hostname on different visits (for example www1.xyz.com, www2.xyz.com, www3.xyz.com,
etc.), you know that the site is using the second approach at the
front end. Typically the site will have an array of stand-alone machines
that are each running Web server software. They all have access to an
identical copy of the pages for the site. The incoming requests for
pages are spread across all of the machines in one of two ways:
- The site's Domain Name System (DNS) servers can distribute the load. DNS is the Internet service that translates domain names into IP addresses. Each time a browser looks up the Web server's name, DNS hands out the available IP addresses in rotation (a technique known as round-robin DNS), so successive visitors are sent to different servers. Because every server has access to the same set of Web pages, it does not matter which one a visitor reaches.
- Load-balancing switches can distribute the load. All requests for the Web site arrive at the switch, which then passes each request to one of the available servers. The switch can find out from the servers which one is least loaded and send new requests there, so the work is spread roughly evenly. This is the approach that HowStuffWorks uses with its servers: the load balancer spreads the load among three different Web servers, and any one of the three can fail with no effect on the site.
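The DNS rotation described in the first option can be modeled in a few lines. This is a toy sketch, not a real DNS server: the class name `RoundRobinDNS` and the addresses (taken from the 192.0.2.0/24 documentation range) are hypothetical, and it assumes each client simply uses the first address in the answer it receives.

```python
from collections import deque


class RoundRobinDNS:
    """Toy model of round-robin DNS: every lookup returns the full
    address list, rotated one position further than the last answer."""

    def __init__(self, addresses):
        self._addresses = deque(addresses)

    def resolve(self):
        answer = list(self._addresses)
        self._addresses.rotate(-1)  # move the first address to the back
        return answer


if __name__ == "__main__":
    # Hypothetical addresses for www.xyz.com's three Web servers.
    dns = RoundRobinDNS(["192.0.2.10", "192.0.2.11", "192.0.2.12"])
    for visitor in range(4):
        # A client that takes the first address reaches .10, .11, .12, .10 ...
        print(visitor, dns.resolve()[0])
```

Real resolvers cache answers for the record's time-to-live, so many visitors behind the same resolver may reach the same server; the spread is therefore approximate, and DNS by itself does not notice a failed server.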
The advantage of this redundant approach
is that the failure of any one machine does not cause a problem -- the
other machines pick up the load. It is also easy to add capacity in an
incremental way. The disadvantage is that if there is any transaction
processing going on, these machines still have to talk to some sort of
centralized database.
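A least-loaded switch with the failover behavior just described can be sketched as follows. Everything here is hypothetical: the class and server names are invented, and "load" is approximated by a count of active requests, whereas real load-balancing switches may use connection counts, response times or health probes.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    name: str
    active_requests: int = 0
    healthy: bool = True


class LeastLoadedBalancer:
    """Sends each request to the healthy backend with the fewest active
    requests; failed backends are skipped, so the rest pick up the load."""

    def __init__(self, backends):
        self.backends = list(backends)

    def add_backend(self, backend):
        # Capacity can be added one machine at a time.
        self.backends.append(backend)

    def pick(self):
        candidates = [b for b in self.backends if b.healthy]
        if not candidates:
            raise RuntimeError("no healthy backends available")
        # min() returns the first backend among ties, so ties go in list order.
        chosen = min(candidates, key=lambda b: b.active_requests)
        chosen.active_requests += 1
        return chosen

    def finish(self, backend):
        # Called when a request completes.
        backend.active_requests -= 1


if __name__ == "__main__":
    web = [Backend("web1"), Backend("web2"), Backend("web3")]
    lb = LeastLoadedBalancer(web)
    print([lb.pick().name for _ in range(3)])  # one request each
    web[1].healthy = False                      # web2 fails
    print([lb.pick().name for _ in range(2)])  # web1 and web3 absorb the load
```

Keeping the health check inside `pick` means a failed server simply drops out of rotation, which is why one of several machines can fail without visitors noticing.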
Microsoft's TerraServer
takes the "single large machine" approach. TerraServer stores several
terabytes of satellite imagery data and handles millions of requests for
this information. The site uses huge enterprise-class machines to
handle the load. For example, a single Digital AlphaServer 8400 used at
TerraServer has eight 440-MHz 64-bit processors and 10 GB of
error-correcting (ECC) RAM. See the technology description for some truly impressive specifications!