British Household Panel Survey (BHPS)
The core of BHPS is a panel survey, meaning that the same households are surveyed each year. Thus, unlike other surveys, it can be used to analyse data either longitudinally (e.g. how does a household's income change over time?) or in terms of persistency (e.g. for households in low income in year X, how long do they remain in low income?).
BHPS's second main use comes from its inclusion of particular subjects which are not in other UK-wide or Great Britain-wide surveys, with examples including consumer durables, participation and risk of mental illness. However, it is a relatively small survey and should therefore not be used when there are alternative surveys with the required data.
- Available from: UK data archive.
- Registration required: yes.
- First survey available: 1991/92.
- Frequency: annual.
- Updated: June for the core dataset; the timing varies for household income.
- Scope: UK-wide.
- Format: SPSS, STATA or TAB.
- Files: around 12 files per year, with some of these being at the household level and others being at the individual level.
- Documentation: comprehensive.
- Weighted or unweighted: weighted.
- Household income data: yes, equivalised.
The BHPS files in the UK data archive are unusual in two respects:
- With each release of data for a new year, all the data for all previous years is also included in the release, with the previous versions being removed from the archive.
- The household income data is released separately from the rest of the data, usually around a year in arrears.
Finally, note that the individual-level files are for adults only (i.e. there are no records for children).
As will become apparent from the discussion below, BHPS is a difficult dataset to use and it is easy to make mistakes. Furthermore, the manual is less helpful than it should be: although comprehensive in scope, its details often do not seem to correspond completely to the datasets themselves. For both these reasons, it is important to familiarise yourself thoroughly with the dataset before using it.
Which software to use
As the annual dataset is around 16,000 records for individuals and 8,000 records for households, it can be exported into Excel.
Which files to use
Because the dataset contains all the data for all the years that BHPS has been in existence, each file name has to have a prefix to indicate the year to which the file applies. These prefixes range from 'a' for 1991/92 through to 'o' for 2005/06.
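The prefix rule can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and it assumes that the prefix letter is simply placed in front of a base file name such as 'hhresp', which should be checked against the files actually downloaded.

```python
import string

FIRST_WAVE_START = 1991  # prefix 'a' covers 1991/92, as stated in the text
LAST_WAVE_START = 2005   # prefix 'o' covers 2005/06, as stated in the text


def wave_prefix(start_year: int) -> str:
    """Return the prefix letter for the survey year beginning in start_year."""
    if not FIRST_WAVE_START <= start_year <= LAST_WAVE_START:
        raise ValueError(f"no prefix is described for a year starting in {start_year}")
    return string.ascii_lowercase[start_year - FIRST_WAVE_START]


def wave_file(start_year: int, base_name: str) -> str:
    """Assumption: the file name is the prefix letter followed by the base name."""
    return wave_prefix(start_year) + base_name


print(wave_prefix(1991), wave_prefix(2005), wave_file(2005, "hhresp"))
```

Keeping the year-to-letter conversion in one place makes it harder to pick up a file from the wrong year by accident.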
For household-level analyses, use the household-level files. For individual-level analyses use the individual-level files, linking these as necessary to selected household-level files for relevant household-level data.
Because the household-level data for a particular year is spread across a number of files, it is likely that any analysis will need to link some of these files. This can be done using the household identification number. Similarly, the individual-level files can be linked together using a combination of the household identification number and person number.
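The linking described above might look like the following minimal sketch, which uses plain Python dictionaries rather than any particular statistics package. The field names 'hid' (household identification number) and 'pno' (person number), and all the example rows, are hypothetical placeholders rather than the actual BHPS variable names.

```python
def index_by(records, *keys):
    """Build a lookup from a key tuple to a record; the keys must be unique."""
    lookup = {}
    for rec in records:
        key = tuple(rec[k] for k in keys)
        if key in lookup:
            raise ValueError(f"duplicate key {key}")
        lookup[key] = rec
    return lookup


def link(left, right, *keys):
    """Attach the matching right-hand fields to each left-hand record.

    Returns (linked, unmatched) so that records without a partner are
    reported rather than silently dropped.
    """
    lookup = index_by(right, *keys)
    linked, unmatched = [], []
    for rec in left:
        match = lookup.get(tuple(rec[k] for k in keys))
        if match is None:
            unmatched.append(rec)
        else:
            linked.append({**match, **rec})
    return linked, unmatched


# Hypothetical example rows, not real BHPS data.
households = [{"hid": 1, "tenure": "owner"}, {"hid": 2, "tenure": "renter"}]
individuals = [
    {"hid": 1, "pno": 1, "age": 40},
    {"hid": 1, "pno": 2, "age": 38},
    {"hid": 3, "pno": 1, "age": 25},
]
other_individual_file = [{"hid": 1, "pno": 1, "score": 2}]

# Individual-level to household-level: link on the household identifier.
linked, unmatched = link(individuals, households, "hid")

# Individual-level to individual-level: link on household identifier plus person number.
both, _ = link(individuals, other_individual_file, "hid", "pno")
```

Returning the unmatched records separately is a deliberate choice here, because the lists of households are not guaranteed to be identical across files.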
Note that household income is not available for all households, and the list of households in the household income files is somewhat different from that in the other files.
Which parts of the files to be used
The core of each BHPS dataset is a panel who are surveyed each year. This is called the 'Essex' sample. In addition, this core is supplemented each year by additional households in Scotland, Wales and Northern Ireland, the purpose being to provide sample sizes which are sufficient for analysis at the home country level. Most analyses will simply use the whole sample.
Which weights to use
Unlike most survey datasets, which have a single weight field, BHPS has many weight fields and the issue therefore arises of which to use when. Furthermore, some of the weights are very different so the choice is important.
The general format of the names for the weight fields is summarised in the table below.
|First character||The year to which the dataset applies, from 'a' for the first year through to 'o' for the latest year|
|Second character||'l' if it is a longitudinal weight; or 'x' if it is not a longitudinal weight|
|Third character||'e' if it is an individual-level weight for the whole population; or 'r' if it is an individual-level weight but only for the actual respondents to the survey questions (basically the adults only and not the children); or 'h' if it is a household-level weight for the whole population|
|Fourth character onwards||'wtuk1' if the weight is to be used to analyse the data at the UK-wide level and the whole sample is to be used; or 'wtuk2' if the weight is to be used to analyse the data at the home country level and the whole sample is to be used; or 'wght' if the weight is to be used to analyse the data at the UK-wide level but only using the core 'Essex' sample (see above); or 'sw1' or 'sw2', although it is not clear that one should ever use either of these weights|
So, for example, if what is wanted is a normal UK-wide analysis at the individual level using the latest year's data, then the weight to use is 'oxewtuk1'. If the same analysis is to be done for Scotland only, then the weight to use is 'oxewtuk2'.
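Because it is easy to pick the wrong weight, the naming scheme can be encoded once and reused. The sketch below is illustrative only: the function and the dictionary keys are hypothetical, and it deliberately leaves out the 'sw1' and 'sw2' weights, whose use is unclear.

```python
LEVELS = {
    "individual_all": "e",          # individual level, whole population
    "individual_respondents": "r",  # individual level, respondents only
    "household": "h",               # household level, whole population
}

SUFFIXES = {
    "uk": "wtuk1",            # UK-wide analysis, whole sample
    "home_country": "wtuk2",  # home country analysis, whole sample
    "essex": "wght",          # UK-wide level, core 'Essex' sample only
}


def weight_field(wave: str, longitudinal: bool, level: str, scope: str) -> str:
    """Assemble a weight field name from the components described in the text."""
    if len(wave) != 1 or not "a" <= wave <= "o":
        raise ValueError(f"unexpected year letter: {wave!r}")
    return wave + ("l" if longitudinal else "x") + LEVELS[level] + SUFFIXES[scope]


print(weight_field("o", False, "individual_all", "uk"))            # oxewtuk1
print(weight_field("o", False, "individual_all", "home_country"))  # oxewtuk2
```

An unknown level or scope raises a KeyError, which is preferable to silently producing a field name that does not exist.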
Analysis by region
Because the sample size is boosted for Scotland, Wales and Northern Ireland, analyses for these countries can be undertaken. Because of the small overall size of the survey, however, sub-England analysis is not recommended.
Analysis by household income
Such analyses can be undertaken but there are a number of caveats:
- First, the list of households in the household income files is somewhat different from that in the other files.
- Second, the household income is only available before deducting housing costs.
- Third, there are two sets of equivalised household incomes, one using the McClements method of equivalisation and the other using the OECD method of equivalisation, but calculations of household income undertaken by the Department for Work and Pensions match neither of these, for reasons that have not yet been resolved.
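A common way of analysing by household income is to allocate each household to an income quintile, which requires weighted quintile thresholds. The sketch below shows one possible convention (a threshold is the lowest income at which the cumulative weight reaches 20%, 40%, 60% or 80% of the total); the survey documentation does not prescribe this convention, and the function names and example figures are hypothetical.

```python
from bisect import bisect_left


def weighted_quintile_thresholds(incomes, weights):
    """Return the four incomes at which cumulative weight reaches each fifth."""
    pairs = sorted(zip(incomes, weights))
    total = sum(w for _, w in pairs)
    targets = [total * k / 5 for k in range(1, 5)]
    thresholds = []
    cumulative = 0.0
    for income, weight in pairs:
        cumulative += weight
        # One heavily weighted record can cross several targets at once.
        while len(thresholds) < 4 and cumulative >= targets[len(thresholds)]:
            thresholds.append(income)
    return thresholds


def quintile(income, thresholds):
    """Return 1 (poorest) to 5 (richest); incomes equal to a threshold fall below it."""
    return bisect_left(thresholds, income) + 1


# Hypothetical equivalised incomes with equal weights.
example_incomes = [100, 200, 300, 400, 500]
example_weights = [1, 1, 1, 1, 1]
example_thresholds = weighted_quintile_thresholds(example_incomes, example_weights)
example_quintiles = [quintile(x, example_thresholds) for x in example_incomes]
```

The weights passed in should be the household-level weights appropriate to the analysis, and the incomes should come from the household income files after linking.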
|Persistent low income||all||indall, hhsamp and nethh||
Requires a complicated analysis.
Use indall for the weights, hhsamp for the region, and nethh for household income and the sample origin. Link these tables using a combination of household identification number and person number.
Use the lewght weights because the analysis concerns longitudinal transitions (hence the l), includes children (hence the e) and is for the original Essex sample only (hence not one of the UK ones).
For each year and each individual, calculate whether the person was in income poverty or not. Then link the requisite years together using the person identification number. Then, for those individuals who were surveyed in each of the years, allocate them to the requisite poverty groups.
|Lacking consumer durables||all||hhresp||
Use the xhwtuk1 weights because the analysis is not longitudinal (hence the x), is at the household level (hence the h) and is for the UK (hence the 1).
Allocate each record to a household income quintile, calculating the quintile thresholds required to achieve this.
|Non-participation||first and second||indresp and hhresp||
Use the xrwtuk1 weights because the analysis is not longitudinal (hence the x), covers adult respondents only (hence the r) and is for the UK (hence the 1).
Allocate each record to a household income quintile using the allocations from hhresp and linking the two tables using household identification number.
Use ONS mid-year population estimates as necessary to translate proportions into absolute numbers.
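The persistent low income steps (flag each person in each year, link the years by person identification number, keep only those surveyed in every year, then group them) can be sketched as follows. This is an illustration under stated assumptions: the field names 'pid' and 'income' are hypothetical, the low income threshold for each year is supplied by the caller because it is not defined here, and the grouping rule (a minimum number of years in low income) is only an example.

```python
def low_income_flags(year_records, threshold):
    """Map each person identifier to True if that person's income is below the threshold."""
    return {rec["pid"]: rec["income"] < threshold for rec in year_records}


def years_in_low_income(years, thresholds):
    """Count low income years for people who were surveyed in every year."""
    flags_by_year = [low_income_flags(recs, t) for recs, t in zip(years, thresholds)]
    present_in_all = set(flags_by_year[0])
    for flags in flags_by_year[1:]:
        present_in_all &= set(flags)
    return {pid: sum(flags[pid] for flags in flags_by_year)
            for pid in sorted(present_in_all)}


def weighted_share(counts, weights, min_years):
    """Weighted proportion of people with at least min_years in low income."""
    total = sum(weights[pid] for pid in counts)
    hit = sum(weights[pid] for pid, n in counts.items() if n >= min_years)
    return hit / total


# Hypothetical data for three years; person 3 is missing from the second year.
year1 = [{"pid": 1, "income": 90}, {"pid": 2, "income": 200}, {"pid": 3, "income": 80}]
year2 = [{"pid": 1, "income": 95}, {"pid": 2, "income": 210}]
year3 = [{"pid": 1, "income": 150}, {"pid": 2, "income": 220}, {"pid": 3, "income": 70}]

counts = years_in_low_income([year1, year2, year3], [100, 100, 100])
longitudinal_weights = {1: 1.5, 2: 0.5}  # placeholder longitudinal weights
share = weighted_share(counts, longitudinal_weights, min_years=2)
```

Person 3 drops out of the result because they were not surveyed in every year, which is the behaviour the steps above call for.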
Scotland, Wales and Northern Ireland graphs
These are effectively a subset of the UK graphs using government region (hhresp) as a filter and, in line with the rules discussed earlier, using the weights with the suffix 2 rather than 1.
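As a rough sketch of a home country analysis, the following filters records by region, calculates a weighted proportion using a weight with the suffix 2, and scales the result by a population figure. The field names 'region' and 'lacks_item', the way regions are coded, and the population figure are all hypothetical placeholders; a real analysis would use the region coding in hhresp and an actual ONS estimate.

```python
def weighted_proportion(records, flag_field, weight_field):
    """Weighted share of records for which flag_field is true."""
    total = sum(r[weight_field] for r in records)
    if total == 0:
        raise ValueError("no weighted records to analyse")
    return sum(r[weight_field] for r in records if r[flag_field]) / total


def country_estimate(records, country, flag_field, weight_field, population):
    """Return (proportion, absolute number) for one home country."""
    subset = [r for r in records if r["region"] == country]
    share = weighted_proportion(subset, flag_field, weight_field)
    return share, share * population


# Hypothetical household records carrying a home country weight (suffix 2).
rows = [
    {"region": "Scotland", "lacks_item": True, "oxhwtuk2": 2.0},
    {"region": "Scotland", "lacks_item": False, "oxhwtuk2": 6.0},
    {"region": "Wales", "lacks_item": True, "oxhwtuk2": 1.0},
]

# 1000 is a placeholder, not a real population estimate.
share, number = country_estimate(rows, "Scotland", "lacks_item", "oxhwtuk2", 1000)
```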