Technical issues underlying the website
The technical challenges in producing this website are much greater than one might imagine and it has taken us many hours to find practical solutions to these challenges. As anyone wishing to develop a website which contains lots of graphs will come across the same issues, we thought it would be useful to set out what the major challenges have been and how we have addressed them. In every case, our solutions are effective in that the end result is as desired and the process can be largely automated. That is not to say, however, that our solutions are always the best ones available and we would be delighted if anyone could tell us about any better ones.
All the graphs on this website are created in Excel.
Some of the data used for the graphs arrives in formats which can be directly read by Excel, for example .csv files or html pages. In these cases, no software other than Excel is used.
Some other data arrives in pdf documents. For these to be read by Excel, they need to be converted into an electronically editable format. This is done using optical character recognition software (Finereader).
However, most of the data, including all the major datasets, arrives in SPSS format. In these cases, the analysis starts in SPSS and ends in Excel, with choices about how far into the process the data is transferred from SPSS into Excel. For good or ill, the current approach is to get the data into Excel as early as possible in the process, largely because the Excel spreadsheets contain all the intermediate calculations and thus provide a complete audit trail in a way that SPSS does not. For all bar the Labour Force Survey, Annual Population Survey and English National Pupil Database (for all of which the number of records in each dataset precludes such an approach), this effectively means that the use of SPSS is limited to exporting the relevant fields from the dataset into Excel. For the three datasets listed above, however, the basic analysis is undertaken using SPSS syntaxes, with the results then copy-pasted into Excel for further processing.
Producing a graph on your PC or for a report is easy: you simply create it in Excel (or whatever) and, if needed, copy and paste it into your software of choice (e.g. Word). For reasons that are not easy to understand, however, producing a graph for a web page is much more difficult (and this is probably why there are so few high quality graphs on the Internet). The basic problem is that a web page requires the graph to be a separate file in a graphics format (gif, png, etc) but all the features in Excel for producing such files give completely mangled results.
One piece of software which does easily and successfully extract graphs from Excel is Acrobat Writer, where the end result is a .pdf file. This .pdf file can then be embedded into a web page and the end result is as one would hope and expect. Furthermore, .pdf files are dynamically resizeable so the width of the web page can be set to 100% of the user's window and the graphs will automatically resize appropriately. Finally, .pdf files are small so the web page loads relatively quickly. For all these reasons, this website originally had all its graphs as .pdf files.
However, and this is a big 'however', embedding .pdf files in web pages is viewed as non-standard by the cognoscenti. As a result, whilst the embedding of .pdf files works fine in a Microsoft-only world (Frontpage, Internet Explorer, etc), it is much more problematic when other software (e.g. Firefox or other browsers) is used to view the results.
Rather, the only standard way of including graphics in web pages is as .gif or .png files. Of these, .png files are preferable for our purposes as they are quite a lot smaller (so the web page loads faster). The challenge, therefore, is how to convert a .pdf file to a .png file. Furthermore, because .png (and .gif) files are not resizeable without severe degradation in quality, they have to have a fixed size and the conversion process must allow this size to be chosen. Finally, because many graphs have to be converted, the conversion process has to be done in batch mode.
We have never found any software which will do the necessary conversion in a single step. Acrobat Writer will do the conversion but does not allow the size of the .png file to be chosen. Photoshop will also do the conversion but mangles the results. In the end, our chosen solution is to use the 'save for web' option in Photoshop, which has all the required features except that the end result has to be a .gif file rather than a .png file. We then use some software called Irfanview which converts all the .gifs to .pngs in batch mode.
In passing, note that:
- Because .png (and .gif) files have to have a fixed size on the web pages, the web pages themselves have to have a fixed width. The current width on the website is 1024 pixels, chosen because it is great enough for the graphs to be easily readable and small enough to fit on the screens of the vast majority of users.
- Some users requested larger versions of the graphs so that they could incorporate them into their presentations, reports, etc. Somewhat mysteriously, .png files can be resized downwards with only slight degradation if this is done by copy-pasting the file into Word, Powerpoint, etc. Presumably, this is because the copy-pasting process automatically converts the file to some other format, such as bitmap. On all the web pages on this website, links are provided to high resolution versions of all these graphs, where these versions are 2015 pixels wide (rather than the 725 pixels for the versions that actually appear on the web pages) as this is the size automatically produced by the Acrobat Writer exporting process.
- Before converting to .gif, the .pdf files have to be cropped to remove the surrounding white space (and associated hieroglyphics). This is done in batch mode using Acrobat Writer.
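Because the graphs have to be converted to a chosen fixed width, it can help to work out the matching height in advance so that the proportions of the graph are preserved. The following is a minimal sketch in Python of that arithmetic only; the function name and the example height are our own illustrative assumptions, and it does not perform any image conversion itself.

```python
def fixed_width_size(src_width, src_height, target_width):
    """Return (width, height) for a copy scaled to target_width.

    The height is scaled by the same factor as the width so that the
    aspect ratio of the graph is preserved.
    """
    if src_width <= 0 or src_height <= 0 or target_width <= 0:
        raise ValueError("all dimensions must be positive")
    scale = target_width / src_width
    # Round to whole pixels, but never let the height collapse to zero.
    return target_width, max(1, round(src_height * scale))


# Example: a 2015-pixel-wide export (the height of 1300 is hypothetical)
# reduced to the 725-pixel width used on the web pages.
print(fixed_width_size(2015, 1300, 725))
```

The resulting pair of numbers is what would be entered as the output size in whatever conversion tool is being used.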
After trying out all sorts of different approaches, the process of graph production currently used for this website is as follows:
- Produce the graph in Excel.
- Export to a .pdf file using Acrobat Writer.
- Crop the .pdf file using Acrobat Writer (a batch process).
- Convert the .pdf file to a .gif file using the 'save for web' option in Photoshop (a batch process).
- Convert the .gif file to a .png file using Irfanview (a batch process).
- Convert the .pdf file to a large .png file using Acrobat Writer (a batch process).
Important note: the interactive maps are not currently available. For reasons that I don't understand, all the relevant IT companies now seem to view lack of support for SVG standards as a feature rather than as a problem. In response, the only way forward is for me to totally re-develop the maps using either Google Maps or Flash but this will be a major task and will therefore not happen in the immediate future. If there is anyone out there who has experience in converting SVG maps to Google Maps or Flash and who would potentially like to help, please contact me.
As with graphs, maps are easy to produce on a PC but much more difficult to include on a web page.
For a simple map graphic, the main challenge is how to ensure that the text (e.g. local authority names) is easily readable. Simply producing the map plus text in a geographic information system (GIS) and copy-pasting the result into a graphic package does not work because the text becomes blurred when the map is re-sized. Rather, our chosen solution has been to write templates in Word which contain all the relevant text in the right places, then copy-paste the map from the GIS into this template, and then export the results from Word into a .pdf file before processing the .pdf files using the same method as for the graphs. Whilst this approach involves quite a lot of effort whenever a new type of map is needed, largely because the placement of the text takes quite a long time, it makes the production of a map using an existing template relatively painless.
However, our experience of using maps is that they are often most effective when one can flip between maps for different subjects to see how their patterns differ and this requires some interactivity so that the user can switch from one map to another. Welcome to the weird world of scalable vector graphics (SVGs).
Use of SVG has a number of attractive features in theory, including:
- The user can zoom in to particular parts of the map without any degradation in quality (vectors can be viewed at any resolution). To see this in action, simply open a .wmf file in your graphics package of choice and zoom in.
- The user can click parts of the map and cause things to happen (for example, on this website, clicking a local authority causes a page to be created and shown listing a variety of statistics for that authority).
- The maps can easily be updated (because it only requires the data in a simple parameter text file to be changed).
- Additional maps can be added with no impact on download times (because they all use the same boundary file).
However, and (as always) it is a big 'however', it seems that some of the features above can currently only be implemented for people who are using Internet Explorer as their browser. This is because they require interaction between the web page (where the user chooses which map to view) and the SVG file (which is where the map vector and its associated XML are stored) and because (for some reason) such interactions are not viewed as standard by the cognoscenti, they are not supported by other browsers such as Firefox. This is a typical dilemma faced by website developers: abide by open standards throughout, accepting the limitations and practical difficulties that this can cause, or incorporate Microsoft-specific solutions where these have major practical benefits. In the case of interactive maps, this dilemma is at its most stark: adopt the Microsoft approach or abandon having interactive maps altogether.
The end result is that the interactive maps on this website will only work properly if viewed using Internet Explorer combined with the free Adobe SVG viewer plugin, version 3.03. If viewed in other browsers, only a blank map will be displayed, if that.
In passing, note that:
- As part of the 2001 Census, all the relevant geographic boundaries were published by the Office for National Statistics (ONS) in vector form. These vectors are, however, very detailed and result in files which are much too large for their inclusion on a web page. We therefore had to 'thin them' by removing most of the nodes, the result being much smaller files which are nevertheless virtually indistinguishable to the naked eye compared with the originals.
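To illustrate what 'thinning' a boundary can involve, the sketch below uses the well-known Douglas-Peucker line simplification method in Python. This is an assumption on our part for the purposes of illustration: it is one common way of removing nodes whilst keeping the overall shape, not necessarily the method that was used for the boundaries on this website. The function names and the tolerance value are hypothetical.

```python
import math


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the line segment from a to b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        # a and b coincide (e.g. a closed ring), so measure to that point.
        return math.hypot(px - ax, py - ay)
    # Position of the nearest point along the segment, clamped to its ends.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def thin_line(points, tolerance):
    """Remove nodes that lie within `tolerance` of the simplified line."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    worst_index, worst_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        d = point_segment_distance(points[i], first, last)
        if d > worst_dist:
            worst_index, worst_dist = i, d
    if worst_dist <= tolerance:
        # Every intermediate node is close enough to be dropped.
        return [first, last]
    # Keep the furthest node and simplify the two halves separately.
    left = thin_line(points[:worst_index + 1], tolerance)
    right = thin_line(points[worst_index:], tolerance)
    return left[:-1] + right


boundary = [(0, 0), (1, 0.01), (2, 0), (3, 5), (4, 0)]
print(thin_line(boundary, 0.1))
```

A larger tolerance removes more nodes and so gives smaller files, at the cost of a coarser outline. The recursion is fine for a sketch, but very long boundaries might need an iterative version.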
The process by which simple map graphics are produced for this website is as follows:
- Produce the map in a geographic information system, with no text included.
- Produce a template in Word which contains all the text.
- Copy-paste the map into the Word template.
- Export the resulting Word document to a .pdf file using Acrobat Writer.
- Convert to the various formats using the same methods as for graphs.
The process by which more complex interactive maps are produced is as follows:
- Produce the basic map (no colouring) using SVG and XML.
- Maintain the data used to colour the maps and to set the legends in a simple text file.
- Tell the user that they can only use the maps with Internet Explorer combined with the Adobe SVG viewer plugin, version 3.03.
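As an illustration of how a simple text file can drive the colouring of a map, the following Python sketch reads one value per area and assigns each area a colour band. The file layout (one 'code,value' pair per line, with '#' for comments), the area codes, the break points and the colours are all hypothetical; the actual parameter file used for this website may well be laid out differently.

```python
import bisect


def parse_parameters(text):
    """Read 'code,value' lines into a dict, skipping blanks and comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        code, value = line.split(",", 1)
        values[code.strip()] = float(value)
    return values


def colour_map(text, breaks, colours):
    """Map each area code to the colour of the band its value falls in.

    `breaks` must be sorted ascending; there must be one more colour
    than there are breaks.
    """
    if len(colours) != len(breaks) + 1:
        raise ValueError("need exactly one more colour than breaks")
    return {
        # bisect_right counts how many break points are <= the value.
        code: colours[bisect.bisect_right(breaks, value)]
        for code, value in parse_parameters(text).items()
    }


parameters = """
# hypothetical area code, indicator value
A01, 12.5
A02, 30
A03, 20
"""
print(colour_map(parameters, [15, 25], ["#ffffcc", "#fd8d3c", "#bd0026"]))
```

With this kind of arrangement, updating a map only means editing the values in the text file; the boundary file and the code stay the same.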
Like all large websites, this website uses cascading style sheets to control all the formatting.
The key challenges here are twofold. First, given the large number of pages on the website, the indicator selection menu has to be maintained as a separate entity which can then be automatically incorporated into the relevant web pages as the alternative (hard-coded copies of the menu on each page) would be a maintenance nightmare. Second, given the large total size of the indicator selection menu (around 200Kb), its incorporation has to be selective - incorporating only those parts that are relevant to the user given where they are and how they got there - as its complete incorporation on every page would massively slow down response times.
One obvious way of incorporating the menu into the web pages is via the use of frames. In such a scenario, the menu is always shown in the left hand frame whilst the various web pages are shown in the right hand frame. A big advantage of this approach is that the menu is only downloaded once by the user no matter how many pages they view and so their response times after the first page are quick. It is also very easy to implement. For both of these reasons, this website originally had all its web pages presented in frames.
However, and this is a big 'however', the use of frames disrupts searches by Google and other search engines. More specifically, a typical search will result in the right hand frame only being selected but, when viewed by the user, they will not then be able to navigate to other pages as there will be no menu. The search will only find the right hand frame as this is where all the search terms will be. By contrast, the total page comprising both frames will not be found as it actually contains no search terms, being simply links to the two frames.
Rather, the standard way of including menus in web pages is via 'server side includes' (SSIs). In such a scenario, the menu is maintained as a separate file but there is a command in the web page which causes the web server to merge the menu into the web page before downloading it to the user. This works fine but it has a major disadvantage, namely that the resulting page cannot be viewed on a standard PC prior to its uploading, as the merging can only be done by a web server. In effect, those maintaining the website simply have to upload it blind to the Internet, assuming that they have not made any errors. In the real world, this is effectively a showstopper when large websites are being maintained by small organisations. It is less of an issue for large organisations because they can easily maintain their own web servers.
The Microsoft way of getting round this problem is to use 'Frontpage includes'. In such a scenario, the menu is again maintained as a separate file but Frontpage automatically merges the menu into each web page before it is uploaded to the Internet. There are some disadvantages to this approach but none are major issues for this website; for example, it requires the use of Frontpage to maintain the website (not an issue for us as we do anyway) and it requires that all the web pages be uploaded if the menu changes (not a big issue for us as the menus only change rarely). Note that 'server side includes' do not require such uploading as the menu is incorporated into the web page dynamically at the time that the user views the page. Most of the menus on this website are 'Frontpage includes'.
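The general idea of merging a separately maintained menu into each page before uploading can be sketched in a few lines of Python. This is an illustration of the principle only and is not how Frontpage itself works internally. It assumes that pages mark the place for the menu with the include directive used by Apache-style 'server side includes'; the function name and the dictionary of fragments (which in practice would be read from files) are hypothetical.

```python
import re

# Matches directives such as: <!--#include virtual="menu.html" -->
INCLUDE = re.compile(r'<!--#include\s+virtual="([^"]+)"\s*-->')


def merge_includes(page_html, fragments):
    """Replace each include directive with the text of the named fragment."""
    def replace(match):
        name = match.group(1)
        if name not in fragments:
            raise KeyError(f"no fragment named {name!r}")
        return fragments[name]
    return INCLUDE.sub(replace, page_html)


page = '<body><!--#include virtual="menu.html" --><p>Graph</p></body>'
fragments = {"menu.html": "<ul><li>UK</li></ul>"}
print(merge_includes(page, fragments))
```

Because the merging happens on the PC rather than on the web server, the merged page can be opened and checked locally before anything is uploaded, which is the main practical advantage described above. The price is the same as for 'Frontpage includes': every page has to be re-merged and re-uploaded whenever the menu changes.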
However, as the website grew and the menus became consequently larger, another problem emerged. This is that the total size of the indicator menu was such that its inclusion on all the web pages was significantly slowing down response times. The reason for this is that, for technical reasons, the menu could not be cached on users' PCs and thus had to be downloaded in its entirety every time a new page was viewed. This problem exists for 'server side includes' as well as 'Frontpage includes'. Our response was to break up the menu into a number of separate components (e.g. one for each country).
This solution was ok for most of the website but was still resulting in slowish response times for the UK section. The reasons for this are slightly complicated but also somewhat interesting in that they show how technical issues and website design can interact. In the UK section, the various indicator web pages can be selected from a variety of places namely 'by age group', 'by subject', 'by geography', 'by disability related', 'by gender' and 'by ethnic group'. Because, by default, any selected web page does not 'know' what route the user used to get to it but has to incorporate that part of the menu via which the user actually did get to it, the whole of the UK section had to have a single menu which could not be broken up. Putting this another way: even though only part of the menu is actually required in any given case - i.e. that part by which the user actually got to the web page - each page was by definition having to include all those parts of the menu by which the user could potentially have got to the page. The reason that this was slowing down access to the UK section more than the other sections is that it has more possible routes to a web page and thus the resulting menu was bigger.
Technically, there is much more that one could say here but the discussion above is already overlong. If you want to know more, email us.
Finally, to give some idea of scale regarding the issues above, the bullets below summarise the average download size per page within the UK section assuming an average page size, including graphs but without menus, of 70Kb:
- Frames: 70 + 200/n, where n is the number of pages visited, so if n=10 then this totals 90Kb.
- Single menu and either 'server side includes' or 'Frontpage includes': 70 + 200 = 270Kb.
- As above but with separate menus for each country: 70 + 40 = 110Kb.
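The arithmetic behind these three figures can be written out as a short Python sketch, which also makes it easy to try other values of n. The sizes (70Kb per page, 200Kb for the full menu, 40Kb for a single country's menu) are the ones quoted above; the function and constant names are our own.

```python
PAGE_KB = 70          # average page including graphs, without menus
FULL_MENU_KB = 200    # the complete indicator selection menu
COUNTRY_MENU_KB = 40  # the menu for a single country


def frames_average_kb(pages_visited):
    """Average per page with frames: the menu is downloaded only once."""
    if pages_visited < 1:
        raise ValueError("at least one page must be visited")
    return PAGE_KB + FULL_MENU_KB / pages_visited


def include_average_kb(menu_kb):
    """Average per page with includes: the menu is downloaded every time."""
    return PAGE_KB + menu_kb


print(frames_average_kb(10))                # frames, 10 pages visited
print(include_average_kb(FULL_MENU_KB))     # single menu via includes
print(include_average_kb(COUNTRY_MENU_KB))  # separate menu per country
```

Note that with frames the advantage depends on the number of pages visited: for a visitor who views only one page, the average is the same as for a single included menu.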
|Area|Software|How it is used|
|---|---|---|
|Data analysis|SPSS|Much of the data arrives in SPSS format, so SPSS is used either to analyse this data or to extract the relevant parts into Excel.|
||Excel|The main software used for data analysis and graph production.|
||Finereader|Some data arrives in pdf format and needs to be optical character read to get it into an electronically editable format before being exported to Excel.|
||Netcaptor|Used to monitor the set of websites from which data is obtained. Netcaptor was originally chosen because of its tabbing features. Other browser software (e.g. Internet Explorer version 7) could probably equally well be used now that they all support tabbing.|
|Graph manipulation|Acrobat Writer|Used to produce high quality graphs in standalone files in .pdf format from the Excel spreadsheets. Also used in batch mode to convert the .pdf graphs to large .png graphs.|
||Photoshop|Used in batch mode to convert .pdf graphs to .gif graphs. Chosen partly because it can open .pdf files (most graphics software cannot), partly because its 'save for web' option does the conversion correctly (much graphics software does not, including the normal 'save as' in Photoshop), partly because the graphs can be re-sized as part of the conversion (a feature not supported by Acrobat Writer) and partly because it has a batch mode (so the process can be automated).|
||Irfanview|Used in batch mode to convert .gif graphs to .png graphs for inclusion on the relevant web pages.|
||Mapinfo|Used to produce the maps. Other GIS software could equally well be used.|
|Website maintenance|Frontpage|Used to edit the web pages. Other software (e.g. Dreamweaver) could equally well be used.|
||SmartFTP|Used to upload the updated pages to the website. Chosen because it has three important features, namely: the ability to synchronise the website and the local version (so only pages which have been updated are uploaded); the ability to filter on files or folders (so intermediate working files are not uploaded); and the ability to retry the uploading in the event of a failure (so it is a completely automated process).|