English National Pupil Database (NPD)
The English National Pupil Database contains at least one record for each pupil at a state school in England, covering both the pupil's characteristics and their examination results. In principle, therefore, it can be used for just about any analysis of educational attainment at school. In practice, however, its sheer size makes any analysis time-consuming and, as such, it should only be used when the required data cannot be obtained from either reports published by the Department for Education (see their website) or their school-level performance tables. For example, on this website its use is restricted to analyses by eligibility for free school meals and ethnic group.
- Available from: the Department for Education. Request via the NPD requesting service, 01325 392059.
- Registration required: no.
- First survey available: 2002.
- Frequency: annual.
- Updated: December.
- Scope: England only.
- Format: text files.
- Files: at least 10 files per year (see below for list).
- Documentation: some, but not a comprehensive user manual.
- Weighted or unweighted: not applicable, as the data is a complete count.
- Household income data: not applicable.
The major files included in each dataset are as follows:
|Table||A record per:||Number of records||Contains data about|
|Census/PLASC||pupil||8,000,000||Basic data about every pupil at school (local authority, school, age, ethnic group, eligibility for free school meals, etc)|
|KS1||Key Stage 1 pupil||600,000||Key Stage 1 results|
|KS2||Key Stage 2 pupil||600,000||Key Stage 2 results|
|KS3||Key Stage 3 pupil||600,000||Key Stage 3 results|
|KS4 candidate indicator||Key Stage 4 pupil||600,000||Key Stage 4 results in summary|
|KS4 results||Exam per Key Stage 4 pupil||6,000,000||Key Stage 4 results in detail|
|KS5 candidate indicator||Key Stage 5 pupil||600,000||Key Stage 5 results in summary|
|KS5 results||Exam per Key Stage 5 pupil||3,000,000||Key Stage 5 results in detail|
|Qual codes||Key Stage 4/5 qualification code||small||Lookup tables matching result and subject codes with their descriptions|
|Mapping codes||Key Stage 4/5 mapping code||small|
Each of the files above, apart from the Census, is available at different times of the year in one of three statuses:
- 'Unamended': from around September/October.
- 'Amended': from around December/January.
- 'Final': from around April.
It is strongly recommended that researchers use either the 'amended' or 'final' versions rather than the 'unamended' versions. This is because, whilst the differences between the 'amended' and 'final' versions are considered to be minimal, the differences between the 'unamended' and 'amended' versions are often considerable, particularly at a local authority level. This is, in turn, because some schools sometimes initially submit totally incorrect responses and these are only corrected in the 'amended' versions.
Which software to use
Because the tables are so voluminous, they can only be analysed by importing the data into software which can both handle large databases and produce statistical results. SPSS or equivalent is therefore the obvious type of software to use.
Which tables to use
Most analyses will take the form of analysing a particular set of results by the characteristics of the pupils. As such, they will need to use the Census table plus one of the results tables. These tables can be linked together using the 'pupil matching reference number'. However, because these tables do not have a 1:1 correspondence, such matching might (depending on the software used) be non-trivial, for example requiring both files to be sorted first, those records with null 'pupil matching reference numbers' to be deleted, and decisions to be made about how to handle records with duplicate 'pupil matching reference numbers'.
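The linking steps described above can be sketched in a few lines of Python. This is only an illustration, not the Department's method: it assumes each table has already been read into a list of dictionaries, that the linking fields are 'pmr' and 'k2_pmr' (the names found in some Census and KS2 extracts), and that, where a 'pupil matching reference number' is duplicated in the Census, the first record is kept. That last choice is an arbitrary assumption which each researcher will need to decide for themselves.

```python
from collections import defaultdict


def link_results_to_census(census_rows, results_rows,
                           census_key="pmr", results_key="k2_pmr"):
    """Link each results record to a Census record on the pupil matching reference.

    Records with a null or empty reference are dropped, and results records
    with no Census match are dropped (so the output has Census coverage).
    """
    # Index the Census records by reference, skipping null references.
    census_by_ref = defaultdict(list)
    for row in census_rows:
        ref = row.get(census_key)
        if ref:
            census_by_ref[ref].append(row)

    linked = []
    for row in results_rows:
        ref = row.get(results_key)
        if not ref:
            continue  # null reference: cannot be linked
        matches = census_by_ref.get(ref)
        if not matches:
            continue  # pupil not in the Census
        # Assumed duplicate policy: keep the first Census record.
        # Census fields are merged last so that they take precedence
        # over any same-named fields in the results record.
        linked.append({**row, **matches[0]})
    return linked
```

Using a dictionary index avoids the need to sort both files first, at the cost of holding the Census keys in memory.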
Which filters to use
This issue is discussed at length in the documentation.
The results data (e.g. KS2, KS4 etc tables) contain individual records for every pupil whose school submits results. As well as all mainstream maintained schools, it is understood that this also includes special schools, some (but not all) referral units, and those independent schools which choose to submit results.
The Census data contains individual records for every pupil whose school is covered by the Census. As well as all mainstream maintained schools, it is understood that this also includes special schools but not either referral units or independent schools.
It follows that a simple analysis, with no filters, will have the same coverage as the table which it is using. Note that one non-obvious implication of this is that, if the results data is linked with the Census data, then the analysis will have the coverage of the results data if no Census variable is included in the analysis but the coverage of the Census data if at least one Census variable is included in the analysis.
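The coverage point can be illustrated with a small sketch. It assumes that the results records are held as a list of dictionaries, that the set of 'pupil matching reference numbers' present in the Census is known, and that the linking field is called 'k2_pmr'. The function name and parameters are purely illustrative.

```python
def records_in_analysis(results_rows, census_refs,
                        results_key="k2_pmr", uses_census_variable=False):
    """Count the results records that contribute to an analysis.

    Without a Census variable, every results record counts. Once a Census
    variable is needed, only records with a Census match can contribute.
    """
    if not uses_census_variable:
        return len(results_rows)
    return sum(1 for row in results_rows if row.get(results_key) in census_refs)
```

For example, a pupil at an independent school that submits results will be counted in an overall pass rate but will silently drop out of the same pass rate once it is broken down by eligibility for free school meals.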
In any event, very few Department for Education analyses include 100% of the records, partly because there are some oddities and partly because the set of independent schools included in the data is somewhat arbitrary. Rather, they apply a range of filters and these filters are all available in the detailed tables. For example, in the 2010 KS2 tables, some of the commonly used filters include:
- KS2_NATRES = 1 to include all schools but to exclude the oddities.
- KS2_NATMTDRES = 1 to include all maintained schools but to exclude the independents and the oddities.
- KS2_VALYYY = 1 (where 'YYY' is a subject such as English) to include only those pupils who should be included for that subject.
There are equivalent filters for use when local authority-level or school-level analyses are required.
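As a rough sketch of how such filters might be applied outside a statistics package, the function below keeps only those records where every named flag has the required value. It assumes the table has been read from the text file into a list of dictionaries, so that flag values arrive as strings; the function name is illustrative only.

```python
def apply_filters(rows, **required):
    """Keep rows where each named flag field equals the required value.

    Values are compared as stripped strings because fields read from
    text files are strings (for example "1" rather than 1).
    """
    return [
        row for row in rows
        if all(str(row.get(field, "")).strip() == str(value)
               for field, value in required.items())
    ]
```

For example, `apply_filters(ks2_rows, KS2_NATMTDRES=1)` would restrict an analysis to maintained schools, where `ks2_rows` is whatever name has been given to the loaded KS2 table.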
Which fields to use
One problem here is that the field names in the documentation sometimes do not match those in the actual tables. The key is to look for field names in the table that have some sort of similarity to the description of the field in the documentation that you want to use and then see if the values for that field are what one would expect from the documentation. For example:
- The documentation says that tables should be linked using the 'pupil matching reference number' and sometimes that the field name is 'pupilmatchingref'. There is no 'pupilmatchingref' field in either the Census table or the KS2 table. There is, however, sometimes a field called 'pmr' in the Census table and one called 'k2_pmr' in the KS2 table. Examination of these fields shows that they are the 'pupil matching reference numbers' (they are unique to each record, the list of records of the appropriate age in the Census table matches the list of records in the KS2 table, etc).
- The documentation sometimes says that the examination serial number in the KS4 results table has the field name 'examno'. Upon review, it becomes clear that the actual name for this field in the table is 'k4r_esn', where the starting clue is that 'esn' is an acronym for 'examination serial number'.
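The detective work described above can be partly automated. The sketch below is one possible approach rather than an official procedure: the first function lists the field names in a table header that contain any of a set of hint strings, and the second checks whether a candidate field is unique to each record, as a linking key should be. Both function names are illustrative.

```python
def candidate_fields(header, hints):
    """Return the field names that contain any of the hint strings.

    Matching is case-insensitive, e.g. the hint 'pmr' finds 'k2_pmr'.
    """
    lowered = [hint.lower() for hint in hints]
    return [name for name in header
            if any(hint in name.lower() for hint in lowered)]


def is_unique_key(rows, field):
    """Check that the non-null values of a field are unique across rows."""
    values = [row[field] for row in rows if row.get(field)]
    return len(values) == len(set(values))
```

A field which passes both checks is still only a candidate: its values should also be compared with what the documentation says the field should contain.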
A second issue is that some Census variables are duplicated in some of the results tables. In these circumstances, it is the Census version of the variable which should be used.
Because the dataset is for England only, the graphs below are also for England only.
|Educational attainment at age 11||second and third||For the 'White other' ethnic group, combine the results for 'any other White background', 'Irish', 'Gypsy/Romany' and 'Traveller of Irish heritage', noting that the latter two groups are very small and do not materially alter the overall figures for 'White other'.|
|Educational attainment at age 16||third and fourth||Use the 'level1' field rather than the 'fiveAG' field to include those with all GCSE equivalencies rather than just GNVQ equivalencies. For the 'White other' ethnic group, combine the results for 'any other White background', 'Irish', 'Gypsy/Romany' and 'Traveller of Irish heritage', noting that the latter two groups are very small and do not materially alter the overall figures for 'White other'.|
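The 'White other' grouping used in the graphs can be sketched as follows. This is a minimal illustration under several assumptions: the linked records are held as a list of dictionaries, the ethnic group is stored as a descriptive label in a field here called 'ethnic_group' (a hypothetical name; the actual tables may well use codes), and the 'level1' field takes the value 1 for pupils who reached that level.

```python
from collections import Counter

# The four groups combined into 'White other' for the graphs.
WHITE_OTHER = {
    "any other White background",
    "Irish",
    "Gypsy/Romany",
    "Traveller of Irish heritage",
}


def regroup(label):
    """Map the four constituent groups onto 'White other'; leave others as-is."""
    return "White other" if label in WHITE_OTHER else label


def attainment_rate_by_group(rows, group_field="ethnic_group", flag_field="level1"):
    """Return the proportion of pupils in each (regrouped) group with flag = 1."""
    totals, achieved = Counter(), Counter()
    for row in rows:
        group = regroup(row[group_field])
        totals[group] += 1
        if str(row[flag_field]).strip() == "1":
            achieved[group] += 1
    return {group: achieved[group] / totals[group] for group in totals}
```

Because the pupils are pooled before the rate is calculated, the very small groups contribute in proportion to their size rather than being averaged as equals.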