
Notes

Technical issues underlying the website

The technical challenges in producing this website are much greater than one might imagine, and it has taken us many hours to find practical solutions to them.  As anyone wishing to develop a website containing lots of graphs will come across the same issues, we thought it would be useful to set out what the major challenges have been and how we have addressed them.  In every case, our solutions are effective in that the end result is as desired and the process can be largely automated.  That is not to say, however, that our solutions are always the best ones available, and we would be delighted if anyone could tell us about better ones.

Data analysis

All the graphs on this website are created in Excel.

Some of the data used for the graphs arrives in formats which can be directly read by Excel, for example .csv files or html pages.  In these cases, no software other than Excel is used.

Some other data arrives in pdf documents.  For these to be read by Excel, they need to be converted into an electronically editable format.  This is done using optical character recognition software (Finereader).

However, most of the data, including all the major datasets, arrives in SPSS format.  In these cases, the analysis starts in SPSS and ends in Excel, with choices about how far into the process the data is transferred from SPSS into Excel.  For good or ill, the current approach is to try and get the data into Excel as early as possible in the process, largely because the Excel spreadsheets contain all the intermediate calculations and thus provide a complete audit trail in a way that SPSS does not.  For all bar the Labour Force Survey, Annual Population Survey and English National Pupil Database (for all of which the number of records in each dataset precludes such an approach), this effectively means that the use of SPSS is limited to exporting the relevant fields from the dataset into Excel.  For these three datasets, however, the basic analysis is undertaken using SPSS syntax files, with the results then copy-pasted into Excel for further processing.


Graph production

Producing a graph on your PC or for a report is easy: you simply create it in Excel (or whatever) and, if needed, copy and paste it into your software of choice (e.g. Word).  For reasons that are not easy to understand, however, producing a graph for a web page is much more difficult (and this is probably why there are so few high quality graphs on the Internet).  The basic problem is that a web page requires the graph to be a separate file in a graphics format (gif, png, etc) but all the features in Excel for producing such files end up with completely mangled results.

One piece of software which does easily and successfully extract graphs from Excel is Acrobat Writer, where the end result is a .pdf file.  This .pdf file can then be embedded into a web page and the end result is as one would hope and expect.  Furthermore, .pdf files are dynamically resizeable so the width of the web page can be set to 100% of the user's window and the graphs will automatically resize appropriately.  Finally, .pdf files are small so the web page loads relatively quickly.  For all these reasons, this website originally had all its graphs as .pdf files.

However, and this is a big 'however', embedding .pdf files in web pages is viewed as non-standard by the cognoscenti.  As a result, whilst the embedding of .pdf files works fine in a Microsoft-only world (Frontpage, Internet Explorer, etc), it is much more problematic when other software (e.g. Firefox or other browsers) is used to view the results.

Rather, the only standard way of including graphics in web pages is as .gif or .png files.  Of these, .png files are preferable for our purposes as they are quite a lot smaller (so the web page loads faster).  The challenge, therefore, is how to convert a .pdf file to a .png file.  Furthermore, because .png (and .gif) files are not resizeable without severe degradation in quality, they have to have a fixed size and the conversion process must allow this size to be chosen.  Finally, because many graphs have to be converted, the conversion process has to be done in batch mode.

We have never found any software which will do the necessary conversion in a single step.  Acrobat Writer will do the conversion but does not allow the size of the .png file to be chosen.  Photoshop will also do the conversion but mangles the results.  In the end, our chosen solution is to use the 'save for web' option in Photoshop, which has all the required features except that the end result has to be a .gif file rather than a .png file.  We then use some software called Irfanview which converts all the .gifs to .pngs in batch mode. 


Summary

After trying out all sorts of different approaches, the process of graph production currently used for this website is as follows:

  1. Produce the graph in Excel.
  2. Export to a .pdf file using Acrobat Writer.
  3. Crop the .pdf file using Acrobat Writer (a batch process).
  4. Convert the .pdf file to a .gif file using the 'save for web' option in Photoshop (a batch process).
  5. Convert the .gif file to a .png file using Irfanview (a batch process).
  6. Convert the .pdf file to a large .png file using Acrobat Writer (a batch process).
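
The six steps above lend themselves to scripting.  As a rough illustration only (not the actual tooling used here, since steps 3 to 6 are each driven through the batch facility of the named program rather than from a command line), a few lines of javascript can plan the intermediate filenames and steps for each graph.  The command names ('crop-pdf', 'pdf2gif', etc.) are invented placeholders:

```javascript
// Plan the conversion steps for each graph, mirroring the pipeline above.
// The command names are hypothetical placeholders: in practice each step is
// run through the batch facility of the program named in the comment.
function planConversions(pdfFiles) {
  return pdfFiles.map(function (pdf) {
    var base = pdf.replace(/\.pdf$/i, '');
    return [
      ['crop-pdf', pdf],                          // step 3: Acrobat Writer batch crop
      ['pdf2gif', pdf, base + '.gif'],            // step 4: Photoshop 'save for web'
      ['gif2png', base + '.gif', base + '.png'],  // step 5: Irfanview batch convert
      ['pdf2bigpng', pdf, base + '-large.png']    // step 6: Acrobat Writer large .png
    ];
  });
}

// Example: plan the steps for one graph file.
var plan = planConversions(['income-trends.pdf']);
```

The point of planning the filenames up front is simply that all four batch steps can then be pointed at a consistent set of input and output files.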


Map production

Important note: the interactive maps are not currently available.  For reasons that I don't understand, all the relevant IT companies now seem to view lack of support for SVG standards as a feature rather than as a problem.  As a result, the only way forward is for me to totally re-develop the maps using either Google maps or Flash, but this will be a major task and will therefore not happen in the immediate future.  If there is anyone out there who has experience in converting SVG maps to Google maps or Flash and who would potentially like to help, please contact me.

As with graphs, maps are easy to produce on a PC but much more difficult to include on a web page.

For a simple map graphic, the main challenge is how to ensure that the text (e.g. local authority names) is easily readable.  Simply producing the map plus text in a geographic information system (GIS) and copy-pasting the result into a graphic package does not work because the text becomes blurred when the map is re-sized.  Rather, our chosen solution has been to write templates in Word which contain all the relevant text in the right places, then copy-paste the map from the GIS into this template, and then export the results from Word into a .pdf file before processing the .pdf files using the same method as for the graphs.  Whilst this approach involves quite a lot of effort whenever a new type of map is needed, largely because the placement of the text takes quite a long time, it makes the production of a map using an existing template relatively painless.

However, our experience of using maps is that they are often most effective when one can flip between maps for different subjects to see how their patterns differ and this requires some interactivity so that the user can switch from one map to another.  Welcome to the weird world of scalable vector graphics (SVGs).

In the SVG world, maps are vectors rather than graphics.  The map boundaries are polygons defined by large numbers of latitude/longitude pairs of numbers (nodes) and the colour with which a particular boundary is filled is an attribute of the polygon.  The basic display of the vector as a map is then controlled using extensible markup language (XML) and what is shown on the map (e.g. the colour of any particular polygon) can be controlled using javascript.  Even though the end result is a map, no GIS or other mapping software is used anywhere in the process.
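
As an illustration of this idea (not the actual files used on this website), a polygon with its fill colour held as an attribute can be generated with a few lines of javascript; the area code, coordinates and colour below are invented:

```javascript
// Build an SVG polygon element for one map area.  The boundary is a list of
// [x, y] coordinate pairs (the 'nodes') and the fill colour is an attribute
// of the polygon, just as described above.
function polygonMarkup(areaId, nodes, fill) {
  var points = nodes.map(function (n) { return n[0] + ',' + n[1]; }).join(' ');
  return '<polygon id="' + areaId + '" points="' + points +
         '" fill="' + fill + '"/>';
}

// Recolouring an area is then just a change of attribute; in a browser this
// would be document.getElementById(areaId).setAttribute('fill', newFill).
var area = polygonMarkup('E09000001', [[0, 0], [10, 0], [5, 8]], '#cc0000');
```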

Use of SVG has a number of attractive features in theory.

However, and (as always) it is a big 'however', it seems that some of the features above can currently only be implemented for people who are using Internet Explorer as their browser.  This is because they require interaction between the web page (where the user chooses which map to view) and the SVG file (where the map vector and its associated XML are stored) and, because such interactions are for some reason not viewed as standard by the cognoscenti, they are not supported by other browsers such as Firefox.  This is a typical dilemma faced by website developers: abide by open standards throughout, accepting the limitations and practical difficulties that this can cause, or incorporate Microsoft-specific solutions where these have major practical benefits.  In the case of interactive maps, this dilemma is at its most stark: adopt the Microsoft approach or abandon interactive maps altogether.

The end result is that the interactive maps on this website will only work properly if viewed using Internet Explorer combined with the free Adobe SVG viewer plugin, version 3.03.  If viewed in other browsers, only a blank map will be displayed, if that.


Summary

The process by which simple map graphics are produced for this website is as follows:

  1. Produce the map in a geographic information system, with no text included.
  2. Produce a template in Word which contains all the text.
  3. Copy-paste the map into the Word template.
  4. Export the resulting Word document to a .pdf file using Acrobat Writer.
  5. Convert to the various formats using the same methods as for graphs.

The process by which more complex interactive maps are produced is as follows:

  1. Produce the basic map (no colouring) using SVG and XML.
  2. Write the logic to colour the maps, to allow user selection of the maps and to control what happens when the user clicks parts of the map using javascript.
  3. Maintain the data used to colour the maps and to set the legends in a simple text file.
  4. Tell the user that they can only use the maps with Internet Explorer combined with the Adobe SVG viewer plugin, version 3.03.


Web page formatting

Like all large websites, this website uses cascading style sheets to control all the formatting.


Website menus

The key challenges here are twofold.  First, given the large number of pages on the website, the indicator selection menu has to be maintained as a separate entity which can then be automatically incorporated into the relevant web pages as the alternative (hard-coded copies of the menu on each page) would be a maintenance nightmare.  Second, given the large total size of the indicator selection menu (around 200Kb), its incorporation has to be selective - incorporating only those parts that are relevant to the user given where they are and how they got there - as its complete incorporation on every page would massively slow down response times.

Both of these challenges relate to how the menu is incorporated into the web pages.  There is also the issue of how the menu is created in the first place.  There are many ways of doing this, most of which use a mixture of standard bullet etc tags (to create the menu items), cascading style sheets (to control what the menu items look like on the web page) and javascript (to control the opening and closing of submenus).  Our chosen solution uses the Ultimate Drop Down Menu package from Brothercake.

One obvious way of incorporating the menu into the web pages is via the use of frames.  In such a scenario, the menu is always shown in the left hand frame whilst the various web pages are shown in the right hand frame.  A big advantage of this approach is that the menu is only downloaded once by the user no matter how many pages they view and so their response times after the first page are quick.  It is also very easy to implement.  For both of these reasons, this website originally had all its web pages presented in frames.

However, and this is a big 'however', the use of frames screws up google etc searches.  More specifically, a typical search will find only the right hand frame, as this is where all the search terms are; the total page comprising both frames will not be found as it actually contains no search terms, being simply links to the two frames.  When the user then views the search result, they will not be able to navigate to other pages as there will be no menu.

Rather, the standard way of including menus in web pages is via 'server side includes' (SSIs).  In such a scenario, the menu is maintained as a separate file but there is a command in the web page which causes the web server to merge the menu into the web page before downloading it to the user.  This works fine but it has a major disadvantage, namely that the resulting page cannot be viewed on a standard PC prior to its uploading, as the merging can only be done by a web server.  In effect, those maintaining the website simply have to upload it blind to the Internet, assuming that they have not made any errors.  In the real world, this is effectively a showstopper when large websites are being maintained by small organisations. It is less of an issue for large organisations because they can easily maintain their own web servers.

The Microsoft way of getting round this problem is to use 'Frontpage includes'.  In such a scenario, the menu is again maintained as a separate file but Frontpage automatically merges the menu into each web page before it is uploaded to the Internet.  There are some disadvantages to this approach but none are major issues for this website: for example, it requires the use of Frontpage to maintain the website (not an issue for us as we use it anyway) and it requires that all the web pages be uploaded if the menu changes (not a big issue for us as the menus change only rarely).  Note that 'server side includes' do not require such uploading as the menu is incorporated into the web page dynamically at the time the user views the page.  Most of the menus on this website are 'Frontpage includes'.

However, as the website grew and the menus consequently became larger, another problem emerged: the total size of the indicator menu was such that its inclusion on all the web pages was significantly slowing down response times.  The reason for this is that, for technical reasons, the menu could not be cached on users' PCs and thus had to be downloaded in its entirety every time a new page was viewed.  This problem exists for 'server side includes' as well as for 'Frontpage includes'.  Our response was to break up the menu into a number of separate components (e.g. one for each country).

This solution was fine for most of the website but still resulted in slowish response times for the UK section.  The reasons for this are slightly complicated but also somewhat interesting in that they show how technical issues and website design can interact.  In the UK section, the various indicator web pages can be selected from a variety of places, namely 'by age group', 'by subject', 'by geography', 'by disability related', 'by gender' and 'by ethnic group'.  Because, by default, any selected web page does not 'know' what route the user used to get to it but has to incorporate that part of the menu via which the user actually did get to it, the whole of the UK section had to have a single menu which could not be broken up.  Putting this another way: even though only part of the menu is actually required in any given case - i.e. that part by which the user actually got to the web page - each page was by definition having to include all those parts of the menu by which the user could potentially have got to the page.  The reason that this was slowing down access to the UK section more than the other sections is that it has more possible routes to a web page and thus the resulting menu was bigger.

After wrestling with this problem for a long time, we have finally come up with a solution, namely 'javascript includes'.  In this scenario, javascript is used to track a user's progress through the menu (via parameters passed from page to page).  When a user gets to an indicator page, javascript is then used to dynamically generate that part of the menu, and only that part of the menu, that is required given the route by which the user actually got to the page.  In other words, the menu items that are incorporated into the web page just prior to its downloading will differ depending on the route by which the user got to that page and will be limited to only those required for subsequent navigation.
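
A minimal sketch of the 'javascript includes' idea follows; the route names and menu items are invented for illustration and the real menus are, of course, far larger:

```javascript
// Menu fragments keyed by the route the user took to reach the page.  In the
// real site these would cover all the indicator pages; the entries here are
// invented placeholders.
var MENU = {
  'by-age-group': ['Children', 'Working-age adults', 'Pensioners'],
  'by-geography': ['England', 'Scotland', 'Wales', 'Northern Ireland']
};

// Build the HTML for just the part of the menu matching the user's route,
// which in practice is passed from page to page as a parameter.
function menuFor(route) {
  var items = MENU[route] || [];
  return '<ul>' + items.map(function (item) {
    return '<li>' + item + '</li>';
  }).join('') + '</ul>';
}

// In the browser the result would be written into the page just prior to
// display, e.g. with document.write(menuFor(route)).
var html = menuFor('by-age-group');
```

The key property is that only the fragment needed for subsequent navigation is ever generated, which is what keeps the download size small.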

Technically, there is much more that one could say here but the discussion above is already overlong.  If you want to know more, email us.

Finally, to give some idea of scale regarding the issues above, the bullets below summarise the average download size per page within the UK section assuming an average page size, including graphs but without menus, of 70Kb:

Summary

In terms of its menus, the website currently comprises 'Frontpage includes' for the non-UK sections and 'javascript includes' for the UK section.  In time, all the 'Frontpage includes' may be converted to 'javascript includes' but, for technical reasons, this would require some additional 'summary' pages in the non-UK sections and this is not something we really want to do from a design perspective.


Software used

Data analysis
  SPSS:  Much of the data arrives in SPSS format, so SPSS is either used to analyse this data or to extract the relevant parts into Excel.
  Excel:  The main software used for data analysis and graph production.
  Finereader:  Some data arrives in pdf format and needs to be put through optical character recognition to get it into an electronically editable format before being exported to Excel.
  Netcaptor:  Used to monitor the set of websites from which data is obtained.  Netcaptor was originally chosen because of its tabbing features.  Other browser software (e.g. Internet Explorer version 7) could probably equally well be used now that they all support tabbing.

Graph manipulation
  Acrobat Writer:  Used to produce high quality graphs in standalone files in .pdf format from the Excel spreadsheets.  Also used in batch mode to convert the .pdf graphs to large .png graphs.
  Photoshop:  Used in batch mode to convert .pdf graphs to .gif graphs.  Chosen partly because it can open .pdf files (most graphics software cannot), partly because its 'save for web' option does the conversion correctly (much graphics software does not, including the normal 'save as' in Photoshop), partly because the graphs can be re-sized as part of the conversion (a feature not supported by Acrobat Writer) and partly because it has a batch mode (so the process can be automated).
  Irfanview:  Used in batch mode to convert .gif graphs to .png graphs for inclusion on the relevant web pages.

Map production
  Mapinfo:  Used to produce the maps.  Other GIS software could equally well be used.

Website maintenance
  Frontpage:  Used to edit the web pages.  Other software (e.g. Dreamweaver) could equally well be used.
  Texturizer:  Used to edit the javascript and SVG scripts.  Other software could equally well be used.
  SmartFTP:  Used to upload the updated pages to the website.  Chosen because it has three important features, namely: the ability to synchronise the website and the local version (so only pages which have been updated are uploaded); the ability to filter on files or folders (so intermediate working files are not uploaded); and the ability to retry the uploading in the event of a failure (so the process is completely automated).
