The evaluation of your website must not be overlooked. You need to judge the effectiveness of the website’s content, design, navigation and underlying technology. There are many ways to do this, but for many web managers the most important will be access statistics derived from the web server logs. These can help measure the size of the audience and the patterns of how they use the website. They can also reveal whether the website is reliably delivering pages. However, it is important to understand the limitations of these statistics.
1.4.1 Evaluation and website metrics
Is the web strategy working? Does the navigation get people to the information they need? Is the server reliable? Without measures such as audience satisfaction, user feedback and access statistics, you will not be able to demonstrate value for money, or that you are meeting the needs of users and the aims of management. Regular, formal evaluation exercises of both the content and the technology are therefore strongly recommended; quarterly will be sufficient.
Evaluation of website design and content can be carried out by drawing on:
The effectiveness of the website can also be judged by measuring achievement in other ways. For example, one recruitment website was evaluated on:
If the
Some examples of other relevant metrics that can be identified from web server logs are:
For definitions of these terms, see section 1.4.7 Understanding the terminology.
Additional useful information can include:
This information can be used to do such things as:
It is, in addition, recommended that web teams should:
The web strategy and management team should ensure, at the procurement stage, that ISPs/hosting services are offering to provide a full range of server log information.
It is acceptable to use HTTP cookies or session identities to track visitors' paths through the website (and this will be essential in e-transactional sites). The website should contain a clear statement of policy on the use of cookies.
Good practice dictates that the need for attention to the accuracy and timeliness of information will increase as the level of activity of a site increases.
Web managers should, in the interests of open government, consider publishing a summary of usage statistics on their websites.
Section 1.4.2 contains further details on understanding usage statistics.
1.4.2 Understanding user statistics
Website usage statistics are generally obtained by analysing the server logs. A typical HTTP server log contains a log entry for each HTTP request (or hit) on the server. This entry will contain information about the web resource requested and the browser to which it was served. Software can be used to analyse and process these log files and provide a picture of the traffic to the website. Typically, in addition to the information in section 1.4.1, this will include information such as:
This analysis will also indicate:
There is a wide range of software available for processing and analysing the potentially huge amount of raw data contained in web server logs. This ranges from the commercially available Webtrends product family through to ‘shareware’ packages such as Wusage and free software like Analog.
1.4.3 Using a server log file
A standard HTTP server log entry may look like this:
193.63.182.194 [03/March/2001:11:30:35]
‘GET /webguidelines/index.htm HTTP/1.0’ 200 35000
What this means:
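As an illustrative sketch only, the Python below splits an entry of this shape into named fields: the requesting IP address, the time of the request, the request line (method, resource and protocol), the status code and the number of bytes transferred. It assumes the entry sits on a single line, that plain quote characters surround the request, and that a space separates the method from the resource. The pattern and the function name parse_entry are invented for this example, and real server logs usually carry more fields.

```python
import re
from datetime import datetime

# Pattern for the simplified layout shown above. This is an assumption:
# real servers usually log extra fields (identity, user name, time zone).
ENTRY = re.compile(
    r"(?P<host>\S+) \[(?P<time>[^\]]+)\] "
    r"['\"](?P<method>\S+) (?P<resource>\S+) (?P<protocol>\S+)['\"] "
    r"(?P<status>\d{3}) (?P<bytes>\d+)"
)


def parse_entry(line):
    """Split one log entry into named fields, or return None if it does not match."""
    match = ENTRY.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # The sample entry spells the month out in full, hence %B.
    fields["time"] = datetime.strptime(fields["time"], "%d/%B/%Y:%H:%M:%S")
    fields["status"] = int(fields["status"])
    fields["bytes"] = int(fields["bytes"])
    return fields


sample = (
    "193.63.182.194 [03/March/2001:11:30:35] "
    "'GET /webguidelines/index.htm HTTP/1.0' 200 35000"
)
entry = parse_entry(sample)
for name, value in entry.items():
    print(f"{name}: {value}")
```

Returning None for lines that do not match lets a caller count and inspect malformed entries rather than stopping at the first one.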
Depending upon the logging capabilities of the web server software and how the web server logging has been configured, web server logs may contain a large amount of additional information such as:
Annex 1 Common HTTP server status codes
1.4.4 Advanced techniques
Log files can be further analysed through advanced techniques. For example:
See section 1.4.5 Not the full picture!
Other website server software may also keep logs that can provide useful insights into the way visitors use your website. For example, it may be possible to configure search facility software to record the search terms that visitors have used when they are attempting to find information on your website. This information can be useful when considering whether there are areas of the site that are not easy to find, and can help with organising navigation. It may also indicate what other information users expect to find on the website, which is of use when considering whether additional content should be included.
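As a minimal sketch of how recorded search terms might be summarised, the Python below counts the most frequent phrases after ignoring differences in case and spacing. The sample queries and the function name top_search_terms are hypothetical; how a given search facility actually records its terms will depend on the product.

```python
from collections import Counter


def top_search_terms(queries, limit=10):
    """Count search phrases, ignoring case and surrounding or repeated spaces."""
    normalised = (" ".join(q.lower().split()) for q in queries)
    counts = Counter(q for q in normalised if q)  # skip empty searches
    return counts.most_common(limit)


# Hypothetical queries, as a search facility might have recorded them.
recorded = [
    "Passport renewal",
    "passport  renewal",
    "council tax",
    "",
    "Council Tax ",
    "opening hours",
]
for phrase, count in top_search_terms(recorded, limit=2):
    print(count, phrase)
```

Phrases that appear often but lead to few page views afterwards may point to content that is hard to find or missing.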
1.4.5 Not the full picture!
You should be aware that there are limitations to the information that can be discovered from the analysis of web server log files. The principal issues are:
All of these issues mean that there have to be reservations about the reliability of estimates, derived from standard web server logs, of the number of users of a website or of their browsing behaviour when they visit it. The Internet advertising industry develops and promotes standard website traffic metrics and methodologies for calculating them. It is recognised that the measurements are flawed for the reasons outlined above. However, it is believed that the metrics provide a basis for comparing one website’s usage with another, on the assumption that these issues will affect all websites to broadly the same extent. There is, however, no sound basis for this belief.
The Joint Industry Committee for Web Standards in the UK and Ireland
JICWEBS is the body created by the UK and Ireland media industry whose aim is to ensure independent development and ownership of standards for measuring use and effectiveness of advertising on electronic media.
The International Federation of Audit Bureaux of Circulations
The IFABC Web Standards Committee promotes similar aims on a worldwide basis.
1.4.5.1 User agent masquerading
The term ‘user agent masquerading’ refers to browsers that transmit an incorrect browser identification string in the requests that they send to servers. Some browsers just do not properly identify themselves and are therefore not being identified in server log file records. Deliberate masquerading is also used for a number of reasons:
1.4.6 Downstream caching and pixel tagging
Copies of Web pages served to browsers are often 'captured' by content caching systems. 'Downstream' caching systems are typically operated by third parties such as the ISPs and other organisations through whose networks the pages travel on their route to users' computers. These caching systems are able to serve pages of which they hold copies in response to subsequent requests for them without reference to the origin server.
From an Internet-wide perspective caching content downstream close to the browsers is a good thing: serving content to topologically nearby browsers is quicker and consumes less network resource than transmitting it from the origin servers. It also reduces the load on the origin servers.
In order to have a website inter-operate properly with downstream caches (for example, to avoid out-of-date pages being served to users), it is important that appropriate cache control directives are included in the HTTP headers of the content that it serves. Getting this right normally involves having your server administrator configure the web server software appropriately. Note that it is not appropriate to attempt to control downstream caches by using <meta http-equiv ...> HTML markup elements, because the special-purpose appliances typically used for caching act only upon HTTP directives in the content headers.
There is an important consideration with regard to website traffic measurement arising from the increasing deployment of downstream caches on the Internet. Typically, there will be no record of pages served from downstream caches in your traffic log. As downstream caches are increasingly deployed on the Internet, standard origin web server logs tend to underestimate the number of your pages that have actually been viewed by users.
1.4.6.1 The pixel tag approach
One way of achieving more accurate page view counts in origin web server logs is to ensure that every page contains a content element whose HTTP headers mark it as non-cacheable. This can be achieved by including a tiny transparent image, referred to as a pixel tag, in each HTML page. This pixel tag is typically served from a directory whose contents the web server has been configured to serve with HTTP headers marking them as non-cacheable. In a pixel-tagging regime, page impressions served (including those served from downstream caches) can be estimated by counting the number of pixel tags served. If more detailed information is required about which pages have been served, then all or a part of the page's own URL can be included as a query string on the end of the pixel tag.
1.4.6.2 Examples of pixel tagging
A basic pixel tag could be generated by including the following image element in HTML pages (conventionally just before the closing </body> tag):
<img src="/nocache/trans.gif" width="1" height="1">
In this example, the directory named 'nocache' resides at the root of the web server. The web server would be configured to include HTTP headers marking any files served out of the 'nocache' directory as non-cacheable. The file named 'trans.gif' would be a one pixel square transparent GIF image.
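In practice this is normally a matter of web server configuration. Purely as a sketch of the idea, the Python below uses the standard library's http.server module to add non-cacheable headers to anything requested from under /nocache/. The particular header values, the class name PixelAwareHandler and the helper names are assumptions made for this example, not a recommended production set-up.

```python
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

# Headers commonly used to tell HTTP/1.1 and HTTP/1.0 caches not to keep a copy.
NO_CACHE_HEADERS = {
    "Cache-Control": "no-cache, no-store, must-revalidate",
    "Pragma": "no-cache",
    "Expires": "0",
}


def extra_headers(path):
    """Return the additional headers for a request path (query string allowed)."""
    return NO_CACHE_HEADERS if path.startswith("/nocache/") else {}


class PixelAwareHandler(SimpleHTTPRequestHandler):
    """Serve files from the current directory, marking /nocache/ as non-cacheable."""

    def end_headers(self):
        for name, value in extra_headers(self.path).items():
            self.send_header(name, value)
        super().end_headers()


def serve(port=8000):
    """Run the sketch server on localhost until interrupted."""
    ThreadingHTTPServer(("127.0.0.1", port), PixelAwareHandler).serve_forever()
```

Calling serve() from a directory that contains nocache/trans.gif would serve the pixel with the extra headers, while other files are served without them.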
If it is required to track the actual pages visited by users, the page's own path can be appended to the pixel tag as a query string. In this case, the pixel tag in, for example, the file at:
http://www.e-envoy.gov.uk/insideoee/index.shtml would be:
<img src="/nocache/trans.gif?insideoee/index.shtml" width="1" height="1">
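To illustrate how such tagged requests might be counted, the sketch below tallies pixel tag requests by the page named in the query string. It assumes the requested URLs have already been extracted from the server log; the function name page_impressions, the "(untagged)" label and the sample requests are invented for the example.

```python
from collections import Counter
from urllib.parse import urlsplit

PIXEL_PATH = "/nocache/trans.gif"


def page_impressions(requested_urls):
    """Tally pixel tag requests by the page named in their query string."""
    counts = Counter()
    for url in requested_urls:
        parts = urlsplit(url)
        if parts.path == PIXEL_PATH:
            # A pixel tag without a query string cannot be tied to a page.
            counts[parts.query or "(untagged)"] += 1
    return counts


# Hypothetical requested URLs taken from a server log.
requests = [
    "/nocache/trans.gif?insideoee/index.shtml",
    "/insideoee/index.shtml",
    "/nocache/trans.gif?insideoee/index.shtml",
    "/nocache/trans.gif",
]
for page, count in page_impressions(requests).items():
    print(count, page)
```

Only the pixel tag requests are counted here, because requests for the pages themselves may have been answered by a downstream cache and so be missing from the log.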
1.4.7 Understanding the terminology
browser - the web browser (also known as ‘user agent’) used by a visitor (client) to access your website.
bytes transferred - the number of bytes transferred to the client’s browser as a result of the request.
entry resource - the first web page viewed as part of a visit to your website.
exit resource - the last web page viewed as part of a visit to your website.
hit (or request) - a browser request for any one web resource (page element), for example a web page or a graphic. A web page containing two graphics will take three hits to display that web page in a client’s browser.
hits per visit - the number of hits occurring in a given visit to your website.
page impressions - a file or a combination of files sent to a user as a result of that user’s request being received by the server. For example, one web page that contains three frames and two graphic files will generate one page view but five hits. Also known as ‘page requests’, ‘page views’ or ‘page accesses’. Where service providers, search engines or other organisations cache content, page impressions served from these caches may not be recorded on the originating website.
page views per visit - the number of page accesses occurring in a given visit to your website.
platform - the operating system used by the visitor to your website, eg, Windows ME.
session - [industry-standard definition] A series of page impressions served in an unbroken sequence from within the website to the same user. A session begins when a user connects to a website, continues while page impressions are served in a continuous sequence from within the website, and ends when the user leaves the website.
user - this is defined as the combination of an IP address and an ‘heuristic’. The user agent string is usually employed as the ‘heuristic’. Because of the use of dynamic IP number assignment, NAT, PAT, perimeter caching and dynamic proxying, this definition may overstate or understate the real number of users visiting a website. Alternatively, websites may use cookies and/or registration IDs as the basis for identifying user numbers. Often also referred to as ‘unique user’.
unique user duration - [industry-standard definition] The total time in seconds for all visits of two or more page impressions, divided by the number of unique users making such visits. In order to measure user duration, a first and last page impression record must exist for each visit. Therefore, users making visits of only one page are excluded, since no interval can be established. This metric is sometimes referred to as ‘website stickiness’.
user agent - the browser and platform used by a visitor when accessing your website.
visit - [industry-standard definition] a series of one or more page impressions served to one user, which ends when there is a gap of 30 minutes or more between successive page impressions for that user.
visit duration - [industry-standard definition] the total time in seconds for all visits of two or more page impressions divided by the total number of visits of two or more page impressions.
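The ‘visit’ and ‘visit duration’ definitions above can be sketched in code. The Python below is an illustration under stated assumptions rather than an audited implementation: each user is represented by a hypothetical (IP address, user agent) pair, page impressions are supplied as (user, time) pairs, and the function names are invented for the example.

```python
from datetime import datetime, timedelta

VISIT_GAP = timedelta(minutes=30)


def split_into_visits(impressions):
    """Group (user, time) page impressions into visits.

    A new visit starts when 30 minutes or more separate successive
    page impressions for the same user.
    """
    visits = []   # list of (user, [times]) in order of first impression
    current = {}  # user -> times of that user's most recent visit
    for user, when in sorted(impressions, key=lambda pair: pair[1]):
        times = current.get(user)
        if times is None or when - times[-1] >= VISIT_GAP:
            times = []
            current[user] = times
            visits.append((user, times))
        times.append(when)
    return visits


def visit_duration(visits):
    """Mean seconds per visit, counting only visits of two or more impressions."""
    spans = [(t[-1] - t[0]).total_seconds() for _, t in visits if len(t) >= 2]
    return sum(spans) / len(spans) if spans else 0.0


# Hypothetical users, each an (IP address, user agent) pair.
a = ("193.63.182.194", "ExampleBrowser/1.0")
b = ("192.0.2.10", "ExampleBrowser/2.0")
day = datetime(2001, 3, 3)
impressions = [
    (a, day.replace(hour=11, minute=30)),
    (b, day.replace(hour=11, minute=35)),
    (b, day.replace(hour=11, minute=36)),
    (a, day.replace(hour=11, minute=40)),
    (a, day.replace(hour=12, minute=30)),  # 50 minutes later: a new visit
]
visits = split_into_visits(impressions)
print(len(visits), "visits; visit duration", visit_duration(visits), "seconds")
```

The single-page visit at 12:30 counts as a visit but is left out of the duration figure, since no interval can be measured for it.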
1.4.8 Graphical example of traffic analysis
The following bar graphs summarise the level of page requests over a seven-day period and separately the level of traffic represented hourly over the 24-hour period: