When an organisation’s website is planned, the basic levels of file storage and naming conventions have to be considered alongside the design and navigational elements. Planning for the long term from the outset will ensure that the file organisation and naming scheme does not hinder the development and expansion of a site, and will help authors and users alike.
Whatever decisions are made, the site structure, colours, standards of document construction and templates should be fully documented, together with the reasoning behind each of these choices.
Use each checklist to ensure that your web pages comply with these guidelines.
3.1.1 Checklist and summary: Core guidance
Checklist:
Apart from the hyphen or underscore used to separate words, there should be no other form of punctuation in a manually generated URL file path.
Summary:
Web managers need to be aware of the different operating systems used for storing and serving information on the web. Each of these systems has different requirements for internal linking and file naming.
3.1.2 Introduction
Web managers must be aware of any specific characteristics of the web server on which their website is hosted that may affect the way their website works.
A number of points should be considered to ensure that a website can be served from any web server system. These measures will:
3.1.3 Web servers
Of the many operating systems used for web servers, the two most widely used are the Microsoft Windows NT and UNIX families.
There are a number of differences between the two technologies that need to be understood.
3.1.3.1 Microsoft Windows NT Server and files
The current generation of the Windows NT Server operating system family (version 5) is named Windows 2000 Server and is available in three variants. It is most common to use Microsoft’s own Internet Information Server (IIS) as the web server software on Windows NT 4 (IIS4) and Windows 2000 (IIS5), although a range of third-party web server software is also available.
The Microsoft operating systems allow flexibility in file naming conventions. For example, HTML file names can be of mixed case (for example HomePage.htm) and can include spaces (for example Home Page.htm).
Principal features of IIS are its support for ASP (Active Server Pages), which administrators can use to construct and serve web pages dynamically, and its high degree of integration with many other Microsoft software products.
3.1.3.2 UNIX and files
The name UNIX is not an acronym: it doesn’t actually stand for anything (in fact it’s an obscure joke about the name of an earlier operating system named Multics). The name refers to a large number of closely related operating systems that have been developed since 1969.
The development history of UNIX is extraordinarily convoluted, but the name UNIX is currently owned by the Open Group to which many companies that supply UNIX-related operating systems belong.
Specific implementations of UNIX typically have their own product name. UNIX operating systems are supplied, for example, by Sun Microsystems (Solaris), Silicon Graphics (IRIX) and IBM (AIX), typically for their larger, more powerful computers.
Linux is a UNIX-like operating system that is available free of charge. Commercial versions of Linux, which contain additional proprietary features, can also be purchased.
A variety of web server software products are available for UNIX operating systems. The one that is probably most widely used is Apache, which is developed by the Apache Software Foundation.
At the time of writing, Apache is the most widely used web server software on the Internet.
3.1.3.3 UNIX v Windows NT - file naming considerations
It is not the role of these guidelines to suggest which operating system would be best for an organisation’s web hosting requirements, but the filesystems of the two operating systems differ in several ways that affect file naming.
Case sensitivity
The Windows NT filesystem is for practical purposes case-insensitive but UNIX filesystems are case-sensitive.
A file called HomePage.htm on a Windows NT system will be accessed whether the reference to it is homepage.htm, HOMEPAGE.HTM or HOMepaGE.hTM. All three references in the example would be to the same file.
The same example would work quite differently on a UNIX system. If HomePage.htm is required, then HomePage.htm is the reference that must be used. On a UNIX filesystem, homepage.htm, HOMEPAGE.HTM and HOMepaGE.hTM would each refer to a different file, or to no file at all.
For this reason, it is recommended that lower case should generally be used for an organisation’s website file names, regardless of which system they are stored on. All hypertext links and references to images and downloadable files within HTML files should also be in lower case. Following this recommendation will help ensure that website content can easily be moved between Windows and UNIX operating systems. This will, for example, make it easier to use Windows PCs to develop website content that will be served to the Internet from UNIX systems.
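The difference between the two filesystems can be illustrated with a short Python sketch. It assumes you already have a list of the files on the server and a list of the link targets extracted from your HTML pages (both hypothetical inputs); the function names are illustrative only. It reports links that would work on a case-insensitive Windows NT server but break once the site is moved to a case-sensitive UNIX server, and names that do not follow the lower-case recommendation.

```python
def links_broken_on_unix(links, files):
    """Return link targets that match an existing file only when case is
    ignored: they work on Windows NT but break on a UNIX server.

    links: file paths referenced from HTML pages (hypothetical input)
    files: file paths that actually exist on the server
    """
    exact = set(files)
    folded = {f.lower() for f in exact}
    return sorted(
        link for link in set(links)
        if link not in exact and link.lower() in folded
    )


def not_lower_case(names):
    """Return names that break the lower-case recommendation."""
    return sorted(n for n in set(names) if n != n.lower())


files = ["HomePage.htm", "news/index.htm"]
links = ["homepage.htm", "HomePage.htm", "news/INDEX.htm", "missing.htm"]
print(links_broken_on_unix(links, files))  # ['homepage.htm', 'news/INDEX.htm']
print(not_lower_case(files + links))       # ['HomePage.htm', 'news/INDEX.htm']
```

For a site developed on a Windows PC, empty results from both checks suggest that it can be moved to a UNIX server without case-related broken links. Note that missing.htm is not reported: it is broken on both systems and is a job for an ordinary link checker.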
Add-on components are available for popular UNIX web server software that can remove the problems users may meet when a case-sensitive filesystem makes the file paths in URLs case-sensitive too. For example, the Apache web server’s ‘mod_speling’ module and others can be used to make URL paths effectively case-insensitive in websites served from UNIX systems. Where this kind of technology is deployed, the general recommendation in the preceding paragraph may not be appropriate. The important point is to devise a combination of live service, development and test environments that results in a website that is easy for visitors to use, straightforward for web managers to develop and test, and resilient to future changes in the underlying system technology. Server administrators should be involved from the earliest stages in the design of a website’s operational and management regime.
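The general behaviour of a case-correcting module can be modelled in a few lines of Python. This is a simplified sketch of the idea only, not mod_speling’s actual implementation or configuration; the function name and the HTTP status codes it returns are assumptions chosen for illustration. An exact match is served as requested, a single match that differs only in case is redirected to the correctly cased path, and several candidates are offered as a choice.

```python
def resolve_case_insensitively(path, files):
    """Simplified model of a case-correcting web server module.

    Returns a (status, target) pair; the status codes are illustrative:
      (200, path)        exact match: serve the file as requested
      (301, candidate)   one match ignoring case: redirect to it
      (300, candidates)  several matches: offer a list of choices
      (404, None)        no match at all
    """
    if path in files:
        return 200, path
    candidates = sorted(f for f in files if f.lower() == path.lower())
    if len(candidates) == 1:
        return 301, candidates[0]
    if candidates:
        return 300, candidates
    return 404, None


# On UNIX, Report.htm and report.htm can exist side by side.
files = {"HomePage.htm", "about.htm", "Report.htm", "report.htm"}
print(resolve_case_insensitively("homepage.htm", files))  # (301, 'HomePage.htm')
print(resolve_case_insensitively("REPORT.htm", files))    # (300, ['Report.htm', 'report.htm'])
```

The second case shows the limit of this approach: when two files differ only in case, no module can tell which one the visitor wanted, which is a further argument for keeping file names in lower case.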
File name length
Both Windows NT and UNIX filesystems typically allow file names of up to 255 characters, so names can be as descriptive as required. Even so, file names should be kept as short as possible while remaining consistent with the recommendation that they should be descriptive.
Spaces
Although both operating systems allow spaces within file names, URL file paths containing them are unwieldy: a space is not permitted in a URL and has to be encoded as %20, which makes addresses hard to read and type. For this reason, use the hyphen or underscore character to break up file names, eg consultation-2001.htm. No other character should be used for this purpose.
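The naming rule above can be automated when pages are created. The sketch below uses hypothetical helpers that are not part of any standard tool: one derives a file name from a page title using lower-case letters and digits joined by hyphens, the other checks an existing name against the same rule. The 255-character limit is an assumption about the target filesystem.

```python
import re
from urllib.parse import quote

# Assumed per-name limit; check the actual limit on your server's filesystem.
MAX_NAME_LENGTH = 255


def web_safe_name(title, extension="htm"):
    """Hypothetical helper: derive a file name from a page title.

    Keeps only ASCII letters and digits (so accented letters are dropped),
    lower-cases them and joins the resulting words with hyphens.
    """
    words = re.findall(r"[a-z0-9]+", title.lower())
    if not words:
        raise ValueError("title contains no usable characters")
    name = "-".join(words) + "." + extension
    if len(name) > MAX_NAME_LENGTH:
        raise ValueError("file name too long; use a shorter title")
    return name


def is_web_safe(name):
    """True if the name is lower-case letters and digits, separated only by
    single hyphens or underscores, followed by one dot and an extension."""
    pattern = r"[a-z0-9]+(?:[-_][a-z0-9]+)*\.[a-z0-9]+"
    return re.fullmatch(pattern, name) is not None


if __name__ == "__main__":
    print(web_safe_name("Consultation 2001"))  # consultation-2001.htm
    print(is_web_safe("Home Page.htm"))         # False
    # A space must be percent-encoded before it can appear in a URL:
    print(quote("Home Page.htm"))               # Home%20Page.htm
```

A title such as "Annual Report: 2001/02" becomes annual-report-2001-02.htm: the colon and slash are discarded rather than carried into the URL.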
File extensions
In principle, HTML pages can be saved and served with any filename extension. However, the web server must be configured to recognise that files with a given extension contain HTML, and to send them with an ‘internal label’ telling the browser that it is receiving an HTML file and should render it as such. [Technically, the server has to be configured to serve files that have specific filename extensions as MIME type ‘text/html’, which it declares to the browser in the HTTP Content-Type header.]
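The server-side mapping described above can be sketched as a lookup table from filename extension to MIME type. The table, the default type and the function names below are hypothetical; a real server such as IIS or Apache keeps this mapping in its own configuration.

```python
import os

# Hypothetical server configuration: filename extension -> MIME type.
CONTENT_TYPES = {
    ".htm": "text/html",
    ".html": "text/html",
    ".css": "text/css",
    ".gif": "image/gif",
    ".pdf": "application/pdf",
}
# Label this sketch uses when the extension is not recognised.
DEFAULT_TYPE = "application/octet-stream"


def content_type_for(filename):
    """Return the MIME type this sketch's server would send for a file."""
    # In this sketch, extensions are matched case-insensitively.
    ext = os.path.splitext(filename)[1].lower()
    return CONTENT_TYPES.get(ext, DEFAULT_TYPE)


def response_headers(filename):
    """The 'internal label' travels to the browser in the Content-Type header."""
    return {"Content-Type": content_type_for(filename)}


print(response_headers("consultation-2001.htm"))   # {'Content-Type': 'text/html'}
print(response_headers("consultation-2001.shtm"))  # {'Content-Type': 'application/octet-stream'}
```

With this configuration, a page saved as consultation-2001.shtm would not be labelled as HTML, and a browser would typically offer to download it rather than render it.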
In practice, HTML pages are conventionally saved with either the .htm or the .html extension. However, server-side scripting and programming systems often bring their own filename extension conventions, for example .php for PHP scripted pages and .asp for Active Server Pages. Web content authors and editors should establish a standard and adhere to it. Using more than one extension for the same type of file causes confusion when building hyperlinks within a site: report.htm and report.html are different files, so a link written with the wrong extension will not find the page.
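Once a standard extension has been agreed, a simple check can flag files or links that do not follow it. The sketch below assumes .htm is the house standard; the function names are illustrative, and the list of paths would come from the site’s files or from the links in its pages.

```python
import os
from collections import Counter

HTML_EXTENSIONS = {".htm", ".html"}


def _ext(path):
    """Lower-cased filename extension, eg '.htm'."""
    return os.path.splitext(path)[1].lower()


def html_extension_usage(paths):
    """Count how often each HTML extension appears in a list of files or links."""
    return Counter(_ext(p) for p in paths if _ext(p) in HTML_EXTENSIONS)


def off_standard(paths, standard=".htm"):
    """Return HTML files or links that do not use the agreed extension."""
    return [p for p in paths if _ext(p) in HTML_EXTENSIONS and _ext(p) != standard]


paths = ["index.htm", "news/2001.html", "about.htm", "style.css", "report.HTML"]
print(html_extension_usage(paths))
print(off_standard(paths))  # ['news/2001.html', 'report.HTML']
```

Counting usage first shows which extension is already dominant on an existing site, which may make it the cheaper choice of standard.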
Further information on access standards and common file extensions (Annex H)