The Explorer Guide : November 2011

Whenever you visit a web page, your browser requests data from a server over HTTP. Before the requested page is displayed, the web server sends an HTTP header that contains a status code, which reports the outcome of the request. A page that loads normally gets the status code 200 OK, but we never see it because the server goes on to send the contents of the page. It is only when something goes wrong that we see a status code, such as 404 Not Found.
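
To make the idea concrete, here is a minimal sketch of how the status code can be read from the first line of a raw HTTP/1.x response. The function name and the two sample responses are made up for illustration; a real client would use an HTTP library rather than split bytes by hand.

```python
def parse_status_line(raw_response: bytes):
    """Return (version, code, reason) from the first line of a raw HTTP/1.x response."""
    # The status line is everything before the first CRLF.
    status_line = raw_response.split(b"\r\n", 1)[0].decode("iso-8859-1")
    # Split into at most three parts so a reason phrase with spaces stays whole.
    parts = status_line.split(" ", 2)
    version = parts[0]
    code = int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    return version, code, reason


# Hypothetical raw responses, written out by hand for illustration.
ok_response = b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>...</html>"
missing_response = b"HTTP/1.1 404 Not Found\r\nContent-Type: text/html\r\n\r\n<html>...</html>"

print(parse_status_line(ok_response))
print(parse_status_line(missing_response))
```

The browser performs this step on every request; the user only notices the result when the code signals an error.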

Origin of Status Codes

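HTTP status codes were not part of the original HTTP 0.9 protocol, in which the server simply returned the requested document. They first appeared in the HTTP specification drafted in 1992 by Tim Berners-Lee, who invented the web and wrote the first web browser in 1990, before the World Wide Web Consortium (W3C) was founded in 1994.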

List of Status Codes

A brief overview of HTTP status codes is given below.

Code	Meaning	Description
100	Continue	Confirms that the server has received the first part of the request and tells the client to continue with the rest of the request, or to ignore this response if the request has already been completed.
101	Switching Protocols	Informs the client that the server is switching, on the current connection, to the protocol specified in the Upgrade message header field.
200	OK	Standard response for successful requests
201	Created	Request fulfilled and new resource created
202	Accepted	Request accepted, but not yet processed
203	Non-Authoritative Information	Returned meta information was not the definitive set from the origin server.
204	No Content	Request succeeded without requiring the return of an entity-body
205	Reset Content	Request succeeded, but the client should reset the document view that caused the request
206	Partial Content	Partial GET request was successful
300	Multiple Choices	Requested resource has multiple choices at different locations.
301	Moved Permanently	Resource permanently moved to a different URL.
302	Found	Requested resource was found under a different URL but the client should continue to use the original URL.
303	See Other	Response to the request is at a different URL and should be retrieved using a GET request.
304	Not Modified	Resource not modified since the last request.
305	Use Proxy	Requested resource should be accessed through the proxy specified in the Location field.
306	(Unused)	Used in an earlier version of the specification; no longer used, and the code is reserved.
307	Temporary Redirect	Resource has been moved temporarily to a different URL.
400	Bad Request	Syntax of the request not understood by the server.
401	Unauthorized	Request requires user authentication
402	Payment Required	Reserved for future use.
403	Forbidden	Server refuses to fulfill the request.
404	Not Found	Document or file requested by the client was not found.
405	Method Not Allowed	Method specified in the Request-Line was not allowed for the specified resource.
406	Not Acceptable	Requested resource can only generate response entities whose content characteristics are not acceptable according to the Accept headers sent in the request.
407	Proxy Authentication Required	Request requires the authentication with the proxy.
408	Request Timeout	Client fails to send a request in the time allowed by the server.
409	Conflict	Request was unsuccessful due to a conflict in the state of the resource.
410	Gone	Resource requested is no longer available and no forwarding address is known
411	Length Required	Server doesn’t accept the request without a valid Content-Length header field.
412	Precondition Failed	Precondition given in one or more request-header fields evaluated to false on the server.
413	Request Entity Too Large	Request unsuccessful as the request entity is larger than that allowed by the server
414	Request-URI Too Long	Request unsuccessful as the URL specified is longer than the server is willing to process.
415	Unsupported Media Type	Request unsuccessful as the entity of the request is in a format not supported by the requested resource
416	Requested Range Not Satisfiable	Request included a Range request-header field, but none of the specified ranges overlap the current extent of the resource
417	Expectation Failed	Expectation given in the Expect request-header was not fulfilled by the server.
422	Unprocessable Entity	Request well-formed but unable to process because of semantic errors
423	Locked	Resource accessed was locked
424	Failed Dependency	Request failed because of the failure of a previous request
426	Upgrade Required	Client should switch to a different protocol, such as Transport Layer Security (TLS), named in the Upgrade header field
500	Internal Server Error	Request unsuccessful because of an unexpected condition encountered by the server.
501	Not Implemented	Request unsuccessful as the server could not support the functionality needed to fulfill the request.
502	Bad Gateway	Server received an invalid response from the upstream server while trying to fulfill the request.
503	Service Unavailable	Request unsuccessful due to the server being down or overloaded.
504	Gateway Timeout	Upstream server failed to send a response in the time allowed by the server.
505	HTTP Version Not Supported	Server does not support the HTTP version specified in the request.

Meaning of 404

When we expand the code 404, the first digit, 4, represents a client error. The server indicates that you made a mistake, such as misspelling the URL or requesting a page that is no longer available.
The HTTP specification gives the remaining two digits no categorization role of their own, so the middle digit, 0, does not carry a separate meaning such as a syntax error.
Together, the last two digits, 04, identify the specific error within the 4xx group.
The World Wide Web Consortium (W3C) states that 404 Not Found should be used when the server cannot find the requested location and does not know whether the condition is temporary or permanent. When a page has been permanently removed, the status code should be 410 Gone. In practice, 410 pages are rarely seen; the 404 Not Found page has become the most commonly used error page.
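
As a small illustration, the class of any status code can be derived from its first digit. The sketch below uses Python's standard `http.HTTPStatus` enum for the reason phrases; the `STATUS_CLASSES` table and the `describe` helper are names invented for this example.

```python
from http import HTTPStatus

# Class names keyed by the first digit of the status code.
STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def describe(code: int) -> str:
    """Return the code, its standard reason phrase, and its class."""
    status_class = STATUS_CLASSES.get(code // 100, "Unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # The code is not one that Python's standard library knows about.
        phrase = "(no standard reason phrase)"
    return f"{code} {phrase}: {status_class}"


for code in (200, 404, 410):
    print(describe(code))
```

Both 404 and 410 fall in the same client error class; only the specific code tells a client whether the page is merely missing or known to be gone.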

Content of a 404 Error Page

As per the HTTP specification, a 404 response code is followed by a human-readable reason phrase. By default, a web server generally returns an HTML page that shows the 404 code and the “Not Found” phrase. You can configure a web server to display a branded page with a better description and a search form instead. The protocol-level reason phrase needs no customization because it is hidden from the user.
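
The following sketch shows one way a branded page can be served while the protocol-level status stays 404. It uses Python's built-in `http.server` module purely for demonstration; the page markup, the `/search` form target, and the handler name are assumptions, and a production site would normally configure this in its web server instead.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical branded error page with a short description and a search form.
CUSTOM_404_PAGE = b"""<html><body>
<h1>Sorry, we could not find that page</h1>
<form action="/search"><input name="q"><input type="submit" value="Search"></form>
</body></html>"""


class BrandedNotFoundHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/":
            body, status = b"<html><body>Home</body></html>", 200
        else:
            # The body is customized, but the status code remains 404.
            body, status = CUSTOM_404_PAGE, 404
        self.send_response(status)  # also sends the default reason phrase
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


# Start the server on a free local port and request a page that does not exist.
server = HTTPServer(("127.0.0.1", 0), BrandedNotFoundHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

connection = http.client.HTTPConnection("127.0.0.1", server.server_port)
connection.request("GET", "/no-such-page")
response = connection.getresponse()
status, reason, body = response.status, response.reason, response.read()
connection.close()

server.shutdown()
server.server_close()

print(status, reason)
```

The visitor sees the friendly page, while crawlers and link checkers still receive an honest 404 status.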

Soft 404s

Soft 404 errors are “Not Found” errors that a web server returns as a standard web page with a 200 OK response code. They are a problem for automated broken-link checkers, which rely on the status code and therefore treat the missing page as valid.
The BT Group in the UK operates Cleanfeed, a content blocking system that returns a 404 error for requests for content identified as illegal by the Internet Watch Foundation. Similarly, a user who tries to access a government-censored website may be shown a fake 404 error.
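
A link checker could flag likely soft 404s with a simple heuristic: the status is 200, yet the page text reads like an error page. The sketch below is only a rough illustration; the phrase list and the function name are assumptions, and a real checker would need more robust signals.

```python
# Assumed phrases that often appear on "not found" pages; tune for your own site.
NOT_FOUND_PHRASES = ("page not found", "file not found", "could not be found")


def looks_like_soft_404(status_code: int, body: str) -> bool:
    """Return True when a 200 response has a body that reads like an error page."""
    if status_code != 200:
        # A real 404 (or any other non-200 status) is not a *soft* 404.
        return False
    text = body.lower()
    return any(phrase in text for phrase in NOT_FOUND_PHRASES)


print(looks_like_soft_404(200, "<h1>Page Not Found</h1>"))
print(looks_like_soft_404(404, "<h1>Page Not Found</h1>"))
print(looks_like_soft_404(200, "<h1>Welcome</h1>"))
```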

404 Error Percentages

A sample web trends summary report by ARCHIVI shows client error details, including the share of 404 Page Not Found errors.

Client Errors
Error	Hits	% of Failed Hits
000 Incomplete / Undefined	29,164	69.62%
404 Page or File Not Found	12,651	30.2%
400 Bad Request	57	0.13%
18745 Incomplete / Undefined	5	0.01%
18747 Incomplete / Undefined	4	0%
401 Unauthorized Access	4	0%
Total	41,885	100%
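
The percentage column can be recomputed from the hit counts. The sketch below copies the counts from the table; the helper name is invented, and the choice to truncate to two decimal places rather than round is an inference from the printed figures, not something the report states.

```python
# Hit counts copied from the client errors table above.
failed_hits = [
    ("000 Incomplete / Undefined", 29164),
    ("404 Page or File Not Found", 12651),
    ("400 Bad Request", 57),
    ("18745 Incomplete / Undefined", 5),
    ("18747 Incomplete / Undefined", 4),
    ("401 Unauthorized Access", 4),
]

total = sum(count for _, count in failed_hits)


def share_truncated(count: int, total: int, places: int = 2) -> float:
    """Percentage of total, truncated (not rounded) to the given decimal places."""
    factor = 10 ** places
    # Integer arithmetic avoids floating-point surprises before the final division.
    return (count * 100 * factor // total) / factor


for label, count in failed_hits:
    print(f"{label}: {share_truncated(count, total)}%")
```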

Web statistics generally vary from month to month, and the percentage of 404 errors depends on how active the website is and on the strategy used to eliminate the errors. Active websites whose content is frequently changed or added to generally experience a higher number of Page Not Found errors, yet many large and busy sites achieve zero percent 404 errors over a period. On average, around 7% of visits to any given web site result in a 404 error page.

Tracking and Preventing 404 Errors

Log Files - Web server log files help in tracking 404 errors. These standard log files are plain ASCII text files that record each HTTP transaction, whether completed or not. Most HTTP errors are recorded in the transfer log and the error log. If you have access to the log files of your website, look at the HTTP status code field. It shows how often and how consistently 404 errors occur, and which referring document led to each error. You can also find any broken link on your site or the misspelled URL that caused the error. With this information, you can correct the link and prevent 404 errors on your website.
Redirects - If a page consistently gets a 404 error, you can create a redirect using the .htaccess file that automatically takes users from the old page to its newer replacement. Permanent and temporary redirects let you "catch" old referrals from other sites and send visitors to the information they were looking for.
Robots File - If a section of your site has pages that change frequently, you can use a robots.txt file to block search engines from indexing them, so that search results do not point to pages that no longer exist.
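
The log file approach can be sketched in a few lines. This example assumes the log uses the common Apache "combined" layout (host, timestamp, request line, status, size, referrer, user agent); your server may be configured differently. The sample log lines, addresses, and URLs are invented for illustration.

```python
import re
from collections import Counter

# Assumed layout: Apache-style combined log format; the referrer and
# user agent fields are optional so plain common-format lines also match.
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def count_404s(lines):
    """Count 404 responses per (requested path, referring document)."""
    counts = Counter()
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match and match.group("status") == "404":
            referrer = match.group("referrer") or "-"
            counts[(match.group("path"), referrer)] += 1
    return counts


# Hypothetical log lines; in practice, read them from your server's access log.
sample_log = [
    '203.0.113.5 - - [21/Nov/2011:10:00:00 +0000] "GET /old-page.html HTTP/1.1" 404 512 "http://example.com/links.html" "ExampleBrowser/1.0"',
    '203.0.113.6 - - [21/Nov/2011:10:00:05 +0000] "GET /index.html HTTP/1.1" 200 2048 "-" "ExampleBrowser/1.0"',
    '203.0.113.7 - - [21/Nov/2011:10:01:00 +0000] "GET /old-page.html HTTP/1.1" 404 512 "http://example.com/links.html" "ExampleBrowser/1.0"',
]

for (path, referrer), hits in count_404s(sample_log).most_common():
    print(hits, path, referrer)
```

Sorting by hit count puts the most frequently broken URL and its referring page at the top, which is usually where a redirect or a link fix pays off first.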

Monday, November 21, 2011

HTTP Status Code