One of the first steps to be taken when an organisation is planning a website is to decide the underlying structure. It must be flexible and organised in such a way as to make the day-to-day housekeeping and long-term maintenance straightforward and efficient.
Use each checklist to ensure that your web pages comply with these guidelines.
2.1.1 Checklist and summary: Core guidance
Checklist
Summary
A website should make effective use of a hierarchical directory structure to separate and organise the files contained within it.
Some departmental websites will contain only a small number of files and will require only a simple structure. Others will be large and will therefore require a more sophisticated structure, which will need detailed advance planning and the support of appropriate management processes.
However large or small, each website will still have to adhere to certain conventions to ensure that all information is as accessible as possible for the duration of its lifespan.
2.1.2 How the Web fits into the Internet
The Internet fundamentally comprises a huge number of interconnected local- and wide-area networks (LANs and WANs). A basic level of intercommunication between all computers attached to the Internet is guaranteed because all of them use the TCP/IP (Transmission Control Protocol/Internet Protocol) protocol suite (a set of signalling rules). Computers that run TCP/IP and have a connection to the Internet are referred to as ‘Internet hosts’ (or simply ‘hosts’). A host’s connection to the Internet may be permanent (for example via an Ethernet LAN connection) or intermittent (typically via a telephone modem dial-up connection).
TCP/IP provides the basic level of communication between hosts that is necessary for the implementation of ‘high-level’ protocols (sometimes referred to as ‘application protocols’). Examples of high-level protocols are those used for email (such as SMTP, the Simple Mail Transfer Protocol), the File Transfer Protocol (FTP) and the Hypertext Transfer Protocol (HTTP), which is used to request and deliver World Wide Web (‘web’) pages and other content.
Many Internet high-level protocols use a software engineering technique named client/server programming (or simply client/server). Client software running on one computer issues a request to matching server software (typically running on another computer) for it to perform a service such as conducting a database transaction, or sending a web page. Client software that makes server requests directly at the behest of a human computer user and that displays the results of the requests to the user is known as a ‘user agent’. In the case of the Web, user agent software is usually referred to as a ‘browser’.
Internet Service Providers (ISPs)
Organisations that provide connections to the Internet network infrastructure for their customers are known as Internet Service Providers (ISPs). ISPs that rent out capacity on servers attached to or close to the Internet’s high-speed backbone networks are termed ‘hosting providers’. Hosting providers that rent out computer room space with backbone class connections in which customers can set up their own servers are sometimes known as ‘co-location’ providers.
Internet hosts running HTTP server software are now normally referred to as ‘webservers’. The term World Wide Web refers to the huge mesh of information stored on the millions of webservers around the world. Web browsers allow access to this information and enable users to ‘navigate’ the hyperlinks between different sections of it regardless of whether the destination of a link is on the same server as its source or on another webserver anywhere else in the world.
Specifying Internet resources: Uniform Resource Locator (URL)
Each resource (for example a web page, image file, PDF file or animation file) on the web is identified by its network ‘address’ called its Uniform Resource Locator (URL). A web client fetches a resource by issuing an HTTP request for its URL to the webserver on which the resource is stored.
An example of a URL is:
http://www.ukonline.gov.uk/explanation/example.htm
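URLs comprise three parts: the protocol to be used to fetch the resource (here http), the DNS name of the webserver on which the resource is stored (www.ukonline.gov.uk), and the path to the resource on that server (/explanation/example.htm).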
Go to section 1.9 Domain name registration
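As an illustration, the parts of the example URL can be separated with the urlsplit function from Python’s standard urllib.parse module. This is only a sketch; the comments label each part of the URL.

```python
from urllib.parse import urlsplit

# The example URL from the text.
url = "http://www.ukonline.gov.uk/explanation/example.htm"

parts = urlsplit(url)
print(parts.scheme)    # protocol: http
print(parts.hostname)  # DNS name of the webserver: www.ukonline.gov.uk
print(parts.path)      # path to the resource on the server: /explanation/example.htm
```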
Pages of text intended for display in web browsers are each stored in a separate file on the webserver. (Note that a ‘web page’ may actually extend to more than a single page when printed out on paper.) Web pages are ‘marked-up’ with instructions to the browser telling it, for example, how to organise the text for display, where in the page image resources are to be inserted, and which page elements are to act as ‘hyperlinks’ to other web pages. The markup language used in web pages is named HyperText Markup Language (HTML).
The time required to fetch a web page across the Internet is an important consideration. A web page that seems to display instantaneously over a corporate high-speed Internet link may take a frustratingly long time to load over a low-speed modem dial-up connection. A slow-loading page is likely to interrupt a user’s train of thought, and if the document does not immediately confirm the user’s reason for accessing it, it will be glanced at and ignored. It is therefore important to consider how documents should be structured for web publication so that they display as quickly as possible over the Internet connection speeds typically available to the intended audience.
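To see why connection speed matters, a rough estimate of transfer time is the page size in bits divided by the link speed in bits per second. The sketch below assumes a hypothetical 100 KB page, a nominal 56 kbit/s dial-up modem and a 10 Mbit/s office link. It ignores latency, protocol overhead and the time the browser spends rendering, so real load times will be longer.

```python
def transfer_seconds(page_bytes: int, link_bits_per_second: float) -> float:
    """Idealised transfer time: page size in bits divided by link speed.

    Ignores latency, protocol overhead and compression, so real
    downloads will take longer than this estimate.
    """
    return page_bytes * 8 / link_bits_per_second


# Hypothetical page: 100 KB of HTML and images (102,400 bytes).
page = 100 * 1024

modem = 56_000        # 56 kbit/s dial-up modem (nominal rate)
office = 10_000_000   # 10 Mbit/s corporate link (assumed for illustration)

print(f"Dial-up: {transfer_seconds(page, modem):.1f} s")   # Dial-up: 14.6 s
print(f"Office:  {transfer_seconds(page, office):.2f} s")  # Office:  0.08 s
```

On these assumptions the same page takes roughly 15 seconds over the modem and well under a tenth of a second over the office link.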
2.1.2.1 An explanation of the Internet Domain Name System
Each computer attached to the Internet is assigned an ‘IP (Internet Protocol) number’ (sometimes referred to as an IP address). An IP number is a string of four numbers separated by full points, for example 62.43.124.67. (Each number is always in the range 0 to 255.) Computers such as web servers and email relays that are permanently connected to the Internet have a permanent or ‘static’ IP number. Computers that are intermittently attached to the Internet, for example by way of dial-up telephone connections, are typically assigned an IP number from a pool of numbers managed by their ISP, which they keep for the duration of each connection.
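The rule that an IP number is four numbers in the range 0 to 255 separated by full points can be expressed as a short check. This is a minimal sketch of the four-number form described here; in practice Python’s standard ipaddress module performs fuller validation.

```python
def is_ipv4_number(text: str) -> bool:
    """Return True if text is four numbers 0-255 separated by full points."""
    parts = text.split(".")
    if len(parts) != 4:
        return False
    for part in parts:
        # Each part must be a plain ASCII decimal number (no signs or spaces)...
        if not (part.isascii() and part.isdigit()):
            return False
        # ...within the range 0 to 255.
        if int(part) > 255:
            return False
    return True


print(is_ipv4_number("62.43.124.67"))   # True
print(is_ipv4_number("62.43.124.256"))  # False: 256 is out of range
print(is_ipv4_number("62.43.124"))      # False: only three numbers
```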
An IP number can be imagined to serve an analogous purpose to that of a conventional telephone number: in order for a web browser to connect to a web server, the browser must know the IP number of the server it needs to reach.
The Internet Domain Name System (DNS) is a solution to the problem that IP numbers are not easily memorable. The DNS associates names (DNS names) such as www.e-envoy.gov.uk, www.bbc.co.uk and the like with corresponding IP numbers. Following the analogy with the telephone system, the DNS may be thought of as providing a roughly equivalent service to directory enquiries: when given a DNS name, the DNS will reply with the corresponding IP number.
The Internet DNS is a complex system. It is replicated, meaning that there are many computers on the Internet that provide a domain name service (nameservers or DNS servers). The DNS is also distributed, meaning that not every nameserver knows the IP number corresponding to every DNS name that has been issued. However, every nameserver does know which other nameservers it should refer lookup requests to when it cannot answer them directly itself.
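The referral behaviour can be modelled in a few lines. This is a toy sketch, not how real nameserver software works: the nameserver names, the domain www.example.gov.uk and the IP number 192.0.2.10 (from a range reserved for documentation) are all hypothetical, and real resolvers also cache answers to avoid repeating lookups.

```python
# Hypothetical nameservers. Each one either knows the answer for a name
# or knows which other nameserver to refer the query to.
NAMESERVERS = {
    "root-ns": {"referrals": {"uk": "uk-ns"}},
    "uk-ns": {"referrals": {"gov.uk": "gov-uk-ns"}},
    "gov-uk-ns": {"answers": {"www.example.gov.uk": "192.0.2.10"}},
}


def resolve(name, server="root-ns", max_referrals=10):
    """Follow referrals from server until some nameserver can answer for name."""
    for _ in range(max_referrals):
        entry = NAMESERVERS[server]
        answers = entry.get("answers", {})
        if name in answers:
            return answers[name]
        # No direct answer: look for a referral covering this part of the namespace.
        for domain, next_server in entry.get("referrals", {}).items():
            if name == domain or name.endswith("." + domain):
                server = next_server
                break
        else:
            raise LookupError(f"{server} cannot resolve {name}")
    raise LookupError(f"gave up on {name} after {max_referrals} referrals")


print(resolve("www.example.gov.uk"))  # 192.0.2.10
```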
In order for computers attached to the Internet to be able to consult the DNS, they must first be told how to contact it. (The telephone system analogy is that you must first know the directory enquiries telephone number before you can get through to have them look up other people’s numbers for you.) The IP number of one or more (topologically) local nameservers usually has to be supplied as part of the initial set-up details for a computer that is to be connected to the Internet.
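On Unix-like systems, for example, the IP numbers of the local nameservers are commonly listed in the file /etc/resolv.conf, one per ‘nameserver’ line. The sketch below reads such lines from a sample file; the sample contents and IP numbers are hypothetical, and other operating systems store these settings differently.

```python
def nameservers_from_resolv_conf(text: str) -> list[str]:
    """Collect the IP numbers given on 'nameserver' lines."""
    servers = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and comments (lines starting with '#' or ';').
        if not fields or fields[0].startswith(("#", ";")):
            continue
        if fields[0] == "nameserver" and len(fields) >= 2:
            servers.append(fields[1])
    return servers


# Hypothetical settings of the kind supplied by an ISP at set-up time.
SAMPLE = """\
# local nameservers
nameserver 192.0.2.53
nameserver 198.51.100.53
search example.gov.uk
"""

print(nameservers_from_resolv_conf(SAMPLE))  # ['192.0.2.53', '198.51.100.53']
```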
When a web address (URL) is typed into a web browser, the browser has first to contact and consult its local DNS server(s) to convert the DNS name portion of the URL into an IP number to which it can subsequently send the web page request. Issuing a request for a web page is therefore normally a two-stage process on the part of the browser: determining the IP number corresponding to the DNS name component of the URL, followed by sending the URL request to the web server's IP number. Both stages have to work properly in order for web pages and other content to be fetched across the Internet. The progress of the different stages is usually reported in the status bar at the bottom left of Web browser windows. Whenever a web page fails to load, checking the progress messages in the status bar will usually reveal whether the problem is with the nameservice, or whether it lies elsewhere.
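The two stages can be sketched in Python. The function plan_request (a hypothetical name) first looks up the IP number of the URL’s DNS name and then builds the HTTP request that would be sent to that IP number. By default it uses socket.gethostbyname, which asks the operating system’s resolver and therefore the configured nameservers; the example passes a stand-in resolver returning a documentation IP number so that it runs without a network connection.

```python
import socket
from urllib.parse import urlsplit


def plan_request(url, resolve=socket.gethostbyname):
    """Return (ip_number, request_text) for a simple HTTP GET of url."""
    parts = urlsplit(url)

    # Stage 1: ask the DNS for the IP number of the URL's DNS name.
    ip_number = resolve(parts.hostname)

    # Stage 2: build the HTTP request that would be sent to that IP number.
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    request = (
        f"GET {path} HTTP/1.1\r\n"
        f"Host: {parts.hostname}\r\n"
        "Connection: close\r\n"
        "\r\n"
    )
    return ip_number, request


# Stand-in resolver so the example works offline (hypothetical IP number).
ip, req = plan_request(
    "http://www.ukonline.gov.uk/explanation/example.htm",
    resolve=lambda name: "192.0.2.10",
)
print(ip)  # 192.0.2.10
print(req)
```

The DNS name is still sent to the server in the Host header, which allows one server at a single IP number to host several websites.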
2.1.3 Basic options for website structure
The design of a website covers many different areas. Decisions need to be made on how it is going to look, how users should navigate through the information it contains, and what content should (and should not) be published on it.
As important as all of these areas are, the underlying organisation of the HTML files and other resources that make up the website should be considered early in the project. This is neither a difficult concept nor a specifically technical issue, so decision-makers do not need a detailed understanding of the way webservers or the Internet work.
A website has to be stored within the directory and file structuring system (the ‘filesystem’) provided by the webserver computer. There are two basic options. One is to construct the website in a flat (linear) fashion. The other is to use the more structured hierarchical approach. Whichever method is employed, careful consideration needs to be given to file names and their relationships to each other.
The arrangement chosen at the beginning will, in all probability, be the one you have to manage during the entire lifespan of the project, because it is difficult to change at a later date.
2.1.3.1 Flat (linear) construction
This method of organising files requires little initial planning. A root directory is named (for example ‘root’) and every file is placed within this one directory. All files (HTML, PDF, text, GIF and JPEG) sit side by side, at the same level in the filesystem hierarchy.
This is the simplest arrangement, although it is likely to turn out to be the least effective way of organising information in the long run if the website grows beyond the smallest of sizes.
In a flat construction scheme, linking between files is a simple matter of specifying the name of the target file. The Web manager always knows where to find a file, because every file is contained in the same directory.
The important issue when using this form of file management is to ensure that all file names are descriptive and meaningful. When a website contains only 50 documents and 30 graphic files, finding the correct file for editing is relatively easy, but once the website has grown and contains hundreds of files the process becomes very much more difficult.
Flat (Linear) Construction Diagram
2.1.3.2 Hierarchical construction
This is by far the preferred option for the majority of websites. It is flexible, expandable and easier to manage on a day-to-day basis.
Hierarchical organisation uses a number of sub-directories stored at the next level down from the website’s root directory. Related files, grouped either by their file type (for example GIF, PDF) or by their relevance to each other (for example business plan 2001), are stored together in their own directory.
The diagram is an example of a simple architecture for a website. The root file (index.htm) is the homepage that is automatically served when the user requests a web URL containing only the website’s DNS name.
The files listed underneath the homepage’s title are also stored in the root directory but are accessed by links from other pages.
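How a webserver turns the path part of a URL into a file can be sketched as follows. Assumptions: the site is stored under a hypothetical directory /var/www/site, and index.htm is the default document, as in the example (on the Apache webserver, for instance, this is set with the DirectoryIndex directive). A request for the bare DNS name arrives with the path /, which is mapped to the homepage file in the root directory.

```python
import posixpath

DEFAULT_DOCUMENT = "index.htm"  # homepage file name used in the example


def url_path_to_file(url_path: str, site_root: str = "/var/www/site") -> str:
    """Map a URL path to a file under site_root (site_root is hypothetical)."""
    # Normalise the path so that '..' segments cannot climb above the site root.
    clean = posixpath.normpath("/" + url_path.lstrip("/"))
    # A path naming a directory (or no path at all) gets the default document.
    if url_path == "" or url_path.endswith("/"):
        clean = posixpath.join(clean, DEFAULT_DOCUMENT)
    return site_root + clean


print(url_path_to_file("/"))           # /var/www/site/index.htm
print(url_path_to_file("/about.htm"))  # /var/www/site/about.htm
```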
In the example, other directories have been set up to contain related files. Within the library directory, there are a number of sub-directories for HTML, PDF, RTF and plain-text documents.
All images for the website have been grouped together. Within this directory there could be many sub-directories for specific areas within the website.
It is a good idea to establish a section of the directory structure as a central images repository to ensure that images used throughout the website are saved only once. Such images could include the organisation’s logo graphic, navigational button images, and so on.
These guidelines recommend that government websites should use Cascading Style Sheets (CSS) to control aspects of the graphical and typographical design specifications for the website. Storing CSS files in their own directory makes it easier to manage access to them and reduces the risk that they are mistakenly amended or deleted.
Hierarchical Construction Diagram
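The example structure can be created, and links between its files worked out, with a short script. The directory and file names are hypothetical, based on the library, images and CSS directories described above. Unlike the flat arrangement, where a link is just the target file name, a link in a hierarchical site usually needs a relative path, which posixpath.relpath can compute.

```python
import posixpath
import tempfile
from pathlib import Path

# Hypothetical directory layout based on the example structure.
LAYOUT = [
    "library/html",
    "library/pdf",
    "library/rtf",
    "library/text",
    "images",  # central images repository
    "css",     # style sheets kept in their own directory
]


def build_site(root: Path) -> None:
    """Create the sub-directories and a placeholder homepage under root."""
    for directory in LAYOUT:
        (root / directory).mkdir(parents=True, exist_ok=True)
    (root / "index.htm").write_text("<html><body>Homepage</body></html>")


def relative_link(from_page: str, to_file: str) -> str:
    """Relative URL from one page to another file, both given from the site root."""
    start = posixpath.dirname(from_page) or "."
    return posixpath.relpath(to_file, start)


with tempfile.TemporaryDirectory() as tmp:
    build_site(Path(tmp))
    print(sorted(str(p.relative_to(tmp)) for p in Path(tmp).rglob("*")))

print(relative_link("index.htm", "css/site.css"))                 # css/site.css
print(relative_link("library/html/plan.htm", "images/logo.gif"))  # ../../images/logo.gif
```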
2.1.4 Do not change names when moving files
Whenever a website is redesigned there is invariably a wish to reorganise and reconstruct the existing files into a structure that better reflects the department.
File names and their placement within the website architecture should, wherever possible, not be changed, for a number of reasons:
External links
Over time, an increasing number of external websites will link to certain pages controlled by a department. Web managers will rarely be aware of how many links out on the web depend on pages under their control because external organisations may well not have asked for permission to establish a link in. If a page is peremptorily relocated it will break links in from other websites.
Search facilities
Search facilities use unattended ‘robot’ applications to visit websites in order to compile their catalogue entries. These automated visits to a department’s website are likely to be sporadic, as there are many websites to scan. Any change made between visits will not be reflected in the search service until the robot next visits, which can result in potential visitors getting ‘Error 404: document not found’ messages on their first visit to a department’s web presence.
Personal bookmarks
Over time users will have bookmarked pages for personal use. Suddenly finding that a useful resource has disappeared can have a negative effect on the perception of your department’s web presence.
Any file or collection of files that has to be relocated must have redirects put in place to ensure that previous visitors who have bookmarked the old location will still be able to access the files in their new location. Departments should not use metadata redirects for this purpose: as discussed in section 4.2.5, some web browsers do not support this feature and this will result in users being unable to access the required information.
Go to 4.2 HTML Pages
Multiple server redirects can be very difficult to maintain over a long period of time. If server redirects are employed, a short message should be available to visitors indicating that the information has a new address and that they should replace any existing bookmark with the new URL.
Go to Annex K Redirect Page
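A server redirect can be sketched with Python’s standard http.server module. The mapping from old to new locations, the file names and the wording of the message are hypothetical. The handler answers a request for a moved page with status 301 (Moved Permanently) and a Location header, which browsers follow automatically and which search robots generally treat as a signal to update their catalogue entries; unknown paths get a 404 response. Because browsers follow the redirect at once, the short message in the response body is rarely seen, so a department may prefer to serve an interim page carrying the message instead.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical mapping from old URL paths to their new locations.
MOVED = {
    "/plans/business-plan-2001.htm": "/library/html/business-plan-2001.htm",
}


class RedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        new_path = MOVED.get(self.path)  # note: self.path includes any query string
        if new_path is None:
            # Not a moved page: report 'document not found'.
            self.send_error(404, "Document not found")
            return
        body = (
            "<p>This information now has a new address: "
            f'<a href="{new_path}">{new_path}</a>. '
            "Please replace any existing bookmark with the new URL.</p>"
        ).encode("utf-8")
        # 301 Moved Permanently: browsers follow the Location header automatically.
        self.send_response(301)
        self.send_header("Location", new_path)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def run(port=8000):
    """Serve redirects on the local machine until interrupted."""
    HTTPServer(("127.0.0.1", port), RedirectHandler).serve_forever()
```

Calling run() starts the sketch on port 8000; a request for the old path then receives the 301 response pointing to the new location.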