Scottish Household Survey 2022: methodology and fieldwork outcomes

Details of the methodology and survey fieldwork outcomes relating to the 2022 Scottish Household Survey


Data quality and limitations

Introduction

Surveys provide estimates of population characteristics rather than exact measures. Survey error is the difference between the true value of a population characteristic and the estimate of that characteristic provided by the survey. There are two main types of survey error – sampling and non-sampling. Non-sampling errors can be divided into representation errors (including non-response error and coverage error) and measurement errors.

Sampling error

Sampling error results from the variability inherent in using a sample of the population for estimation, rather than collecting information from every member of the population.

All samples can differ from the population by chance. In principle, many samples could be drawn and each would give different results, because each sample would be made up of different people, who would give different answers to the questions asked. The spread of these results is the sampling variability, which generally reduces with increasing sample size.

The likely extent of sampling variability can be quantified by calculating the 'standard error' associated with an estimate. The standard error of the estimate of a percentage depends upon several things:

  • The value of the percentage itself
  • The size of the sample (or sub-sample) from which it was calculated i.e. the number of sample cases corresponding to 100%
  • The sampling fraction i.e. the fraction of the relevant population that is included in the sample
  • The 'design effect' associated with the way in which the sample was selected. For example, a clustered random sample would be expected to have larger standard errors than a simple random sample of the same size.
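
As a rough illustration of how these four factors combine, the sketch below uses the textbook formula for the standard error of a percentage under simple random sampling, scaled by a design factor (the square root of the design effect) and a finite population correction. The function name, parameter names and example inputs are hypothetical and are not taken from the SHS documentation, so this should be read as an approximation rather than the method used to produce the published figures.

```python
import math


def standard_error_pct(pct, n, design_factor=1.0, sampling_fraction=0.0):
    """Approximate standard error, in percentage points, of a survey percentage.

    pct               -- the estimated percentage (0 to 100)
    n                 -- number of sample cases corresponding to 100%
    design_factor     -- multiplier on the simple-random-sample standard error
                         (assumed here to be the square root of the design effect)
    sampling_fraction -- share of the relevant population included in the sample
    """
    if not 0 <= pct <= 100:
        raise ValueError("pct must be between 0 and 100")
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= sampling_fraction < 1:
        raise ValueError("sampling_fraction must be in [0, 1)")

    # Standard error under simple random sampling.
    srs_se = math.sqrt(pct * (100 - pct) / n)
    # Finite population correction: close to 1 when the sampling fraction is small.
    fpc = math.sqrt(1 - sampling_fraction)
    return design_factor * srs_se * fpc


# Hypothetical inputs: an estimate of 50% from 1,000 cases.
print(round(standard_error_pct(50, 1000), 2))
print(round(standard_error_pct(50, 1000, design_factor=1.28), 2))
```

Under this formula the standard error is largest for percentages near 50%, shrinks as the sample size grows, and increases in proportion to the design factor.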

Although the SHS has a large sample that covers the whole of Scotland, it has some geographical limitations because of the sample sizes in small local authorities and because it is designed to be representative only at national and local authority level.

This means:

  • users need to be mindful of the sampling errors for analysis, especially when this is based on breakdowns within a single local authority
  • it is not appropriate to undertake geographical analysis below local authority level, since the sampling techniques used in some local authorities cannot guarantee representativeness in smaller areas.

Confidence intervals and statistical significance

A confidence interval is a range of values, defined by a lower and upper bound, that indicates the variability of an estimate. If we drew 20 random samples and calculated a 95% confidence interval for each sample using the data in that sample, we would expect that, on average, 19 out of the 20 (95%) resulting confidence intervals would contain the true population value, and 1 in 20 (5%) would not.
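
The repeated-sampling interpretation can be checked with a small simulation. The sketch below assumes simple random sampling with replacement from a population in which 30% have the characteristic of interest; the function name, the population value, the sample size and the number of repetitions are all made-up illustration values and do not reflect the SHS design.

```python
import math
import random


def simulate_coverage(true_p=0.3, n=500, n_samples=2000, seed=1):
    """Share of simulated 95% confidence intervals that contain true_p."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    covered = 0
    for _ in range(n_samples):
        # Draw one sample and estimate the proportion from it.
        hits = sum(rng.random() < true_p for _ in range(n))
        p_hat = hits / n
        # 95% interval based on the estimated standard error of that sample.
        se = math.sqrt(p_hat * (1 - p_hat) / n)
        lower, upper = p_hat - 1.96 * se, p_hat + 1.96 * se
        if lower <= true_p <= upper:
            covered += 1
    return covered / n_samples


coverage = simulate_coverage()
print(f"Share of intervals containing the true value: {coverage:.3f}")
```

The printed share should be close to 0.95, although it will not equal it exactly because the simulation itself is subject to chance.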

The margin of error is the standard error multiplied by 1.96. The upper bound of the confidence interval is calculated by adding the margin of error to the estimate. The lower bound is the estimate minus the margin of error. The Excel workbooks published as supporting documents to the SHS 2022 key findings report provide margins of error for a range of estimates and sample sizes, incorporating a design factor of 1.28 to account for the complex survey design. Where the exact value of interest is not given in the table, users can use the closest value in the table, or can derive more precise estimates by using standard formulas for confidence intervals from survey estimates, incorporating the design factor.
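
For estimates not covered by the published tables, the calculation described above might be sketched as follows. The design factor of 1.28 and the multiplier of 1.96 come from the text; the function name and the example estimate (30% on a base of 1,000) are hypothetical, and the formula is the standard one for a percentage, which may differ in detail from the workbook calculations.

```python
import math

DESIGN_FACTOR = 1.28  # design factor quoted for the SHS 2022 workbooks
Z_95 = 1.96           # multiplier for a 95% confidence interval


def confidence_interval(pct, n, design_factor=DESIGN_FACTOR):
    """Return (lower, upper, margin_of_error) in percentage points."""
    if not 0 <= pct <= 100:
        raise ValueError("pct must be between 0 and 100")
    if n <= 0:
        raise ValueError("n must be positive")
    # Standard error of a percentage, inflated by the design factor.
    standard_error = design_factor * math.sqrt(pct * (100 - pct) / n)
    margin = Z_95 * standard_error
    return pct - margin, pct + margin, margin


# Hypothetical example: an estimate of 30% based on 1,000 responses.
lower, upper, margin = confidence_interval(30, 1000)
print(f"30% +/- {margin:.1f} points ({lower:.1f}% to {upper:.1f}%)")
```

For this example the margin of error is about 3.6 percentage points. The approximation is less reliable for small bases and for percentages close to 0% or 100%, where the computed bounds can fall outside the 0 to 100 range.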

Because the survey's estimates may be affected by sampling errors, apparent differences of a few percentage points between sub-samples may not reflect real differences in the population. It might be that the true values in the population are similar but the random selection of households for the survey has, by chance, produced a sample which gives a high estimate for one sub-sample and a low estimate for the other.

A difference between two estimates is significant if it is so large that a difference of that size (or greater) is unlikely to have occurred purely by chance. Conventionally, significance is tested at the five per cent level, which means that a difference is considered significant if, in the absence of any real difference in the population, a difference of that size or greater would be expected to occur by chance in no more than one in 20 samples.

Testing significance involves comparing the difference between the two estimates with the standard errors for each of the two estimates. In general, if the difference is smaller than the larger of the two margins of error, it could have occurred by chance and is not significant. A difference that is greater than the sum of the margins of error is significant.

If the difference is greater than the larger of the two margins of error, the difference might be significant, although the test is more complex. Statistical sampling theory suggests that the difference between the two estimates is significant if it is greater than the square root of the sum of the squares of the margins of error for the two estimates.
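
The two rules of thumb and the fuller test can be written down in a few lines. This is a minimal sketch that assumes the two estimates come from independent (non-overlapping) sub-samples and that both margins of error are at the 95% level; the function names and the example figures are hypothetical.

```python
import math


def quick_screen(est_a, moe_a, est_b, moe_b):
    """Apply the two rules of thumb before running the full test."""
    diff = abs(est_a - est_b)
    if diff < max(moe_a, moe_b):
        return "not significant"
    if diff > moe_a + moe_b:
        return "significant"
    return "needs full test"


def is_significant(est_a, moe_a, est_b, moe_b):
    """Full test: compare the difference with the combined margin of error."""
    diff = abs(est_a - est_b)
    combined_margin = math.sqrt(moe_a ** 2 + moe_b ** 2)
    return diff > combined_margin


# Hypothetical example: 40% (+/- 3.0 points) compared with 45% (+/- 3.5 points).
print(quick_screen(40, 3.0, 45, 3.5))
print(is_significant(40, 3.0, 45, 3.5))
```

In this example the difference of 5 points falls between the larger margin of error (3.5) and the sum of the two (6.5), so the rules of thumb are inconclusive. The combined margin is the square root of 21.25, roughly 4.6 points, so the full test treats the difference as significant.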

It should be noted that the published estimates have been rounded to the nearest whole number, and this can affect the apparent significance of some of the results. For this reason, caution should be exercised where differences are on the margins of significance.

Non-response error

Social survey samples are normally designed so that if everyone responded, the sample would be an accurate representation of the whole population of interest. Non-response bias is where those who take part in a survey are different from those who do not. This can mean that the survey participants are not representative of the whole population of interest. An example of this would be if interviewers only approached households during working hours. In this case, the likelihood of obtaining interviews with retired people would be considerably higher than the likelihood of interviewing the employed population, leading to skewed data.

Research that is dependent upon voluntary participation is always vulnerable to this type of bias, and surveys such as the Scottish Household Survey are designed to reduce the potential for non-response bias. This is done by maximising the response rate and trying to ensure that it is not more difficult for some groups than others to take part.

A high response rate does not necessarily create a quality, unbiased survey sample. Instead, it depends on the patterns of who participates. For example, Groves and Peytcheva (2008) make a distinction between data that is 'missing at random' and 'non-ignorable'.

Data is 'missing at random' when there is a common cause for both nonresponse and key output variables. For example, being young may cause nonresponse, and it may also mean a person is likely to participate in sport. Therefore, if young people are less likely to respond, people who participate in sport will be under-represented.

'Non-ignorable' missing data happens when there is a consistent reason for non-response, and therefore a danger of excluding this subgroup from the sample, creating non-response bias. For example, if the reason for non-response is because some of the respondents cannot read, then this is non-ignorable, as illiterate people are now excluded from the sample. Similarly, if people who participate in sport are less likely to be contacted by interviewers (because they are at home less often) then this would also be 'non-ignorable'.

Good weighting strategies help to correct for patterns of differential response. However, weighting can only correct data 'missing at random', not 'non-ignorable' missing data.
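
A toy example may help show how weighting corrects for differential response by age. All of the numbers below are invented for illustration and are not SHS figures. The sketch assumes that, within each age group, people who respond take part in sport at the same rate as those who do not respond; if that assumption fails, weighting by age would not remove the bias.

```python
# Made-up population shares, respondent counts and sport participation rates.
population_share = {"young": 0.30, "older": 0.70}
respondents = {"young": 150, "older": 850}   # young people under-represented
sport_rate = {"young": 0.60, "older": 0.30}  # rate among respondents in each group

total = sum(respondents.values())

# Unweighted estimate: each respondent counts equally.
unweighted = sum(respondents[g] * sport_rate[g] for g in respondents) / total

# Weight for each group = population share / share among respondents.
weights = {g: population_share[g] / (respondents[g] / total) for g in respondents}

# Weighted estimate: respondents in under-represented groups count for more.
weighted_total = sum(weights[g] * respondents[g] for g in respondents)
weighted = sum(
    weights[g] * respondents[g] * sport_rate[g] for g in respondents
) / weighted_total

print(f"Unweighted sport participation: {unweighted:.1%}")
print(f"Weighted sport participation:   {weighted:.1%}")
```

With these invented figures the unweighted estimate is 34.5%, while the weighted estimate is 39.0%, which matches the value implied by the population shares (0.30 × 0.60 + 0.70 × 0.30).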

The higher the response rate, the less potential there is for non-response bias. While the traditional SHS approach was subject to non-response bias, weighting ensured that estimates were comparable with those from other robust sources. Moreover, because the SHS approach and response rate were relatively consistent over time prior to 2020, the effect of non-response bias was likely to be reasonably consistent between waves, and would therefore not affect analysis of trends over time.

Coverage error

Coverage error, like non-response error, has the potential to affect the representativeness of the survey data. It is bias that occurs when the sampling frame does not coincide with the target population.

The target population of the SHS is all adults living in private households in Scotland. The survey uses Royal Mail's small user Postcode Address File (PAF) as the sampling frame. Overall, the PAF is a good record of all private households in Scotland. The PAF does not include accommodation in hospitals, prisons, military bases, larger student halls etc. Therefore, the SHS provides a sample of private households rather than all households.

Samples of the general population exclude prisons, hospitals and military bases. While prisons and hospitals do not generally have significant numbers of private households, the same may not be true of military bases. These are classified as special enumeration districts (EDs) in the Census and account for just 0.5 per cent of the population. Interviewing on military bases would pose fieldwork problems relating to access and security so they are removed from the PAF before sampling.

The following types of accommodation are excluded from the survey if they are not listed on the Small User file of the PAF:

  • Nurses' homes
  • Student halls of residence
  • Other communal establishments (e.g. hostels for the homeless and old people's homes)
  • Mobile homes
  • Sites for travelling people.

Households in these types of accommodation are included in the survey if they are listed on the Small User file of the PAF and the accommodation represents the sole or main residence of the individuals concerned. People living in bed and breakfast accommodation are similarly included if the accommodation is listed on PAF and represents the sole or main residence of those living there[5].

Students' term-time addresses are taken as their main residence (in order that they are counted by where they spend most of the year). However, since halls of residence are generally excluded, there will be some under-representation of students in the SHS.

Measurement error and comparability with other sources

Measurement error is the difference between a respondent's answer and the true value.

As a multi-purpose survey of households, the SHS is not designed to provide the kinds of information about economic activity and household income that can be obtained from more specialised surveys such as the Family Resources Survey. We have published the results of a project that assessed in detail how accurately the SHS measures household income.

The Scottish Household Survey (SHS) is not the official source of statistics on all of the topics it collects information on. Where a preferred source exists (see Table 6), the SHS still includes questions on these topics for two reasons:

  • To explore differences between groups when analysing other topics e.g. to look at how internet use varies by income.
  • To contribute to the Scottish Surveys Core Questions (SSCQ) pooled sample.

Table 6: Alternative preferred data sources

Topic | Preferred source
Age, sex | NRS mid-year population estimates
Religion, ethnicity, sexual orientation | Scottish Surveys Core Questions
Disability/long term health condition, self-assessed health, smoking, unpaid caring | Scottish Health Survey
Perceptions of crime, confidence in the police | Scottish Crime and Justice Survey
Household income | Family Resources Survey
Employment, unemployment and economic activity | Annual Population Survey

Comparability with previous SHS years

The results of the 2020 and 2021 SHS telephone surveys were published as experimental statistics. They are not directly comparable to SHS face-to-face survey results for other years and are not presented in time series data.

The results of the 2022 survey have been published as official statistics and are broadly comparable to 2019 and earlier years. Unlike 2020 and 2021, interviewers were able to resume visiting people's homes to encourage participation in the 2022 survey, as they had done in 2019 and earlier. The majority (70%) of interviews were conducted face to face.

While most key measures that we would expect to remain broadly stable are in line with 2019, weighted results for educational attainment and tenure are slightly different to what we might expect.

The proportion of adults with no educational qualifications has declined over time, by about 3 percentage points every 3 years up to 2019. Between 2019 and 2022 it decreased by 4.6 percentage points. Similarly, the proportion of adults with a degree or professional qualification has been increasing by roughly 2 percentage points every 3 years, and increased by 4.5 percentage points between 2019 and 2022.

The 2022 SHS results for tenure show an increase in the proportion of owner occupied households (+3.1 percentage points) compared to 2019, and a decrease in the proportion of social rented (-1.7 percentage points) or privately rented (-1.2 percentage points) households. In contrast, social housing dwelling stock administrative data (published by the Scottish Government and the Scottish Housing Regulator) indicate that the percentage of dwellings in the social sector was stable from 2017 to 2022, with the growth in the number of social dwellings over this period matching the growth in total dwelling stock over these years.

Separate administrative data on the size of the private rental sector from properties registered as part of the Scottish Landlord Register suggests that the percentage of dwellings in the private rented sector decreased very slightly (by 0.4 percentage points) between 2019 and 2022. The Scottish Landlord Register provides a measure of the overall supply of privately rented properties based on the number of properties registered, although there are some limitations of this data source, such as the fact that registrations last for a period of three years and there could be a time lag in landlords de-registering properties which are no longer available for rent.

It is not currently possible to get a complete picture of the tenure of Scottish households from published sources other than the SHS. The 2022 census results, when they are published, will provide a very valuable comparison for these figures.

The results highlighted above could be reflective of genuine changes in the population rather than any methodological issues with the survey. They are unlikely to be due to the change in mode of interview, and the mode of approach has returned to face to face. However, response rates were lower in 2022 (44%) than they had been pre-pandemic (63% in 2019). This may have been accompanied by a small change in the pattern of non-response to the survey (with renters and those with no educational qualifications slightly less likely to respond to the survey than before, compared to other groups).

In general, these differences are unlikely to have a significant impact on the reported results. For those results where an impact is more likely, this is highlighted in the relevant chapter and as notes to the data tables.

Contact

Email: shs@gov.scot
