Long term survey strategy: mixed mode research report

Findings from research exploring mixed mode survey designs in the context of the Scottish Government’s general population surveys. The report details information on key issues, potential mitigations and remaining trade-offs, and includes 21 case studies on relevant surveys.


11. Administrative data

Introduction

There was strong interest from some of the stakeholders interviewed for this report in the scope to make greater use of administrative data to enhance or, potentially, replace some of the data currently collected by the SHS, SCJS and SHeS. Administrative data refers to data collected during interactions with public services, normally for reasons other than research. Sources can include education, health and tax bodies, for example.

Changes to the legal basis for data linkage introduced in 2018 across the UK have opened up additional opportunities to use administrative data as part of government surveys, without explicit consent being required from participants. This has driven an increase in the scale of administrative data linkage occurring across government. The drive to make greater use of administrative data is also a global one. For example, several countries, including Denmark, Finland, Norway, Sweden, Austria, Slovenia and the Netherlands, have entirely replaced their census with either administrative data or a combination of administrative and sample survey data, while others, including Italy, have published details of their testing of similar approaches.[57] In Scotland and other parts of the UK, consideration is being given to what the most appropriate mix of surveys, administrative data and Census looks like for our statistical systems. Organisations such as Research Data Scotland and Administrative Data Research Scotland are looking at ways to support the embedding of administrative data into routine research practice, with a view to enhancing data availability and quality, and reducing costs.

This chapter discusses potential options for using administrative data alongside the three Scottish Government surveys that are the focus of this study, drawing primarily on published information on its use in other studies and the views of expert interviewees.

Potential uses of administrative data

To inform sampling

Administrative data can potentially provide sample frames for mixed mode surveys. Depending on the data included, this can make mixed mode surveys easier and more effective. For example, the GP Patient Survey uses English GP registration data as a sample frame. This allows for individual-level selection of a named sample, ensures only the eligible population (those aged 16+ who have been registered with an English GP practice for at least 6 months) is selected, allows for disproportionate sampling by registered GP practice, and also provides additional contact details, which can support SMS reminders as well as postal contacts (Ipsos, 2023).

An obvious limitation of this approach is that administrative datasets in Scotland and the UK do not always contain the whole population of interest. This means that, for example, using GP registration data as a sample frame for a health survey of the general population would by definition exclude those not registered with a GP.

However, where a legal basis exists, there is still scope to add individual-level administrative data to PAF samples to support sampling. This already happens on SHeS, where the child boost sample is screened against health records to identify households with children for inclusion. Aggregate level administrative data (for example, local area statistics on deprivation) can also be added to sample frames, to allow for stratification, boosting and targeting. Again, this is already a feature of sampling on the three Scottish surveys (for example, all three order addresses by urban-rural classification, SIMD rank and postcode), but is also relevant to supporting an effective mixed mode design, since it can enable identification of areas or addresses that might need different contact or mode strategies, as discussed below.
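The address-ordering approach described above is a form of implicit stratification: the frame is sorted by the stratifiers before a systematic sample is drawn, which spreads the sample proportionately across strata. The sketch below illustrates the technique only; the field names and value ranges are illustrative assumptions, not the actual PAF or survey variables.

```python
import random

# Hypothetical address frame; fields are illustrative stand-ins for
# urban-rural classification, SIMD rank and postcode.
addresses = [
    {"address_id": i,
     "urban_rural": random.randint(1, 6),
     "simd_rank": random.randint(1, 6976),
     "postcode": f"EH{random.randint(1, 17)}"}
    for i in range(10_000)
]

def implicit_stratified_sample(frame, n):
    """Sort the frame by the stratifiers, then take a systematic
    sample at a fixed interval from a random start point."""
    ordered = sorted(frame, key=lambda a: (a["urban_rural"],
                                           a["simd_rank"],
                                           a["postcode"]))
    interval = len(ordered) / n
    start = random.uniform(0, interval)
    return [ordered[int(start + k * interval)] for k in range(n)]

sample = implicit_stratified_sample(addresses, 500)
```

Because selection is systematic through the sorted frame, each stratum contributes addresses roughly in proportion to its size without explicit allocation.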

To inform contact and response mode strategies

Using administrative data to understand the profile of the population being studied can help inform decisions about what modes are appropriate for a particular survey. For example, the CQC NHS Patient Survey Programme tested a mixed-mode push-to-web approach with paper follow-up across five different patient populations, and saw markedly different take-up of the online mode in each, depending on the patient profile (which was established using administrative data).[58]

While this is perhaps less relevant to surveys of the general population, administrative data can nonetheless support targeted contact strategies on mixed mode surveys. For example, the Community Life Survey and Participation Survey target paper questionnaires by local area statistics, focusing the paper surveys on populations assessed as less likely to respond online in order to increase response from these groups, but without incurring the cost of mailing a paper questionnaire to the full sample. However, as mentioned in the previous chapter, relying on local area statistics can limit the effectiveness of such approaches in identifying target groups, compared with individual-level data. In a similar vein, an expert interviewee suggested that those in Houses in Multiple Occupation (HMOs), which tend to be larger households, may be less likely to respond online, so if it were possible to use administrative data to identify HMOs in the sample for a mixed mode survey, these households could potentially be targeted for face-to-face interview from the outset. This approach could also be used to target other elements, such as incentives, to encourage response among groups known to be less likely to respond by a particular mode.[59]
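Targeting rules of the kind described above can be expressed as a simple per-address decision based on frame variables. The sketch below is a minimal illustration under stated assumptions: the propensity threshold, the field names and the availability of an HMO flag on the frame are all hypothetical.

```python
# Illustrative targeting rule; thresholds and field names are assumptions.
def contact_strategy(address):
    """Assign a contact strategy from area-level statistics and
    (hypothetical) HMO information attached to the sample frame."""
    if address.get("hmo_flag"):
        # Households thought less likely to respond online get an
        # interviewer visit from the outset.
        return "face-to-face from outset"
    if address["area_online_propensity"] < 0.4:
        # Low predicted online take-up: include paper from the start.
        return "push-to-web with paper questionnaire in first mailing"
    return "push-to-web only, paper at reminder stage"

sample = [
    {"address_id": 1, "area_online_propensity": 0.7, "hmo_flag": False},
    {"address_id": 2, "area_online_propensity": 0.3, "hmo_flag": False},
    {"address_id": 3, "area_online_propensity": 0.6, "hmo_flag": True},
]
strategies = {a["address_id"]: contact_strategy(a) for a in sample}
```

The design choice here mirrors the surveys discussed above: the more expensive treatments (paper from the first mailing, face-to-face) are reserved for the subset of the sample where they are expected to add most response.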

To triangulate and quality assure

Across the three Scottish Government surveys, data linkage could potentially allow exploration of absolute levels of non-response. This would involve comparing survey estimates to robust estimates of a "true" value of a variable. For example, the Scottish 2022 Census provides an opportunity to measure absolute non-response in Scottish Government surveys post-Covid-19 by linking the census directly to the survey. This has been done previously in Scotland by ONS for the SHS in 2001 and is planned for a number of English surveys based on the 2021 Census.

This type of analysis allows a comparison of census characteristics of different categories of responding and non-responding households to identify variables that are independently associated with non-response. It would require the co-operation of National Records of Scotland (NRS), would be innovative, and would provide additional data on the relationship between response rates and data quality.

Census statistics represent the total population rather than just those who completed the questionnaire. Statistical modelling has been used to produce total population estimates across the United Kingdom since the 2001 censuses. In response to the lower-than-expected return rate for the 2022 Scottish census, NRS adapted their census methodologies to estimate those who did not respond. As planned, a Census Coverage Survey (CCS) was conducted immediately after the 2022 Scottish Census collection. The results of this survey were used in NRS modelling, but for the first time the CCS was supplemented with administrative data to improve the accuracy of estimates of the total population. The coverage estimation process thus combined the CCS results with administrative data to create a combined survey and administrative frame, which was used to identify and adjust for the number of people and households not counted, those counted more than once, and those counted in the wrong place.[60]

Using administrative data to assess differential non-response arguably becomes particularly important when considering a move in mode, as it cannot be assumed that the previous survey demographics reflect the most accurate respondent profile possible. Understanding not only what difference, if any, an alternative mode strategy makes to the respondent profile compared with administrative data, but also what this means for differential non-response, is vital for assessing quality across different mode strategies.
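A basic version of this kind of quality check compares the achieved respondent profile under each mode strategy against an external benchmark such as census proportions. The sketch below uses a simple dissimilarity index; all figures and category labels are made up for illustration, and real assessments would use far richer profiles and proper weighting.

```python
# Benchmark profile (e.g. census proportions by age band) - illustrative.
census_age_profile = {"16-34": 0.30, "35-54": 0.33, "55+": 0.37}

# Hypothetical achieved profiles under two mode strategies.
achieved = {
    "face-to-face": {"16-34": 0.24, "35-54": 0.34, "55+": 0.42},
    "push-to-web":  {"16-34": 0.20, "35-54": 0.38, "55+": 0.42},
}

def profile_deviation(profile, benchmark):
    """Sum of absolute differences from the benchmark. Half this
    value is the share of respondents who would need to change
    category for the profile to match the benchmark."""
    return sum(abs(profile[g] - benchmark[g]) for g in benchmark)

for mode, profile in achieved.items():
    print(mode, round(profile_deviation(profile, census_age_profile), 3))
```

In this toy example the push-to-web profile deviates more from the benchmark than the face-to-face profile, which is the kind of signal the comparison is designed to surface before a change of mode is adopted.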

To extend or replace survey data

Adding administrative data to the data collected by surveys to extend the scope for analysis is already an element of many large-scale surveys in the UK. In Scotland, SHeS survey responses are linked to NHS health records (accessible in the National Safe Haven), unless respondents opt out from this. SCJS data have also been linked with other datasets in the Scottish National Safe Haven, for further analysis by approved researchers in a secure environment.

In addition to supplementing data collected by surveys, there was also an aspiration among experts interviewed for this study, and some Scottish Government stakeholders, to be able to use administrative data to replace elements of survey data – in other words, removing questions where the data they collect can instead be added via linked administrative data. In the context of considering moving or changing modes, this might enable the Scottish Government to reduce the length of their surveys, thereby reducing costs and/or making them easier to implement on modes other than face-to-face (see discussion in chapter 7 on survey length and mode). It was perhaps on this topic, however, where there was least consensus among experts about the scope for further use of administrative data, given the challenges discussed below.

Challenges in using administrative data

The survey experts interviewed for this research were in broad agreement on the potential value of administrative data when used to either inform or supplement survey data. However, there was less consensus on the scope to use administrative data to replace survey data, either now or in the future.

On the one hand, there was a strong view that surveys should only collect data that cannot be obtained from any other source, and a sense that "it's the future". It was noted that this is a legal requirement in the Netherlands – if something can be obtained from another data source, government surveys cannot ask about it. In the context of current non-response levels and survey costs, it was suggested that there was an urgent need to consider ways of combining administrative data with surveys in this way.

However, other experts, while not necessarily disputing the principle of maximising the use of administrative data with or instead of survey data, were more sceptical about the feasibility of substantially increasing its use in practice.

“In theory it should be (straightforward); in practice it isn’t.” (Expert interview 8)

“The idea was that admin data would progress at the same rate and (surveys would) just fill the gap that admin data couldn't, and that just hasn’t happened. Surveys are collecting everything still, with the odd bit of admin data.” (Expert interview 4)

Data quality, completeness and relevance

Perhaps the most significant issues around using administrative data – particularly (though not only) if it is being considered as a replacement for survey data – are those relating to its scope and quality.

Even where administrative data are collected on a similar topic to a survey, because the data are not collected for research purposes they may not meet the particular requirements of the research questions. For example, certain demographic variables (such as sexuality) may not be routinely collected as part of interactions with all services, limiting the scope for subgroup analysis (or for using this data to quality assure survey data). Similarly, specific definitions (such as employment status) may vary between datasets. The limitations of administrative data in relation to specific research questions have, in some cases, provided the rationale for survey research. For example, crime statistics based on administrative data can never provide figures on unreported crime – filling this gap is a key reason why the SCJS was established in the first place.[61]

In addition, not all administrative data is of sufficient quality to provide representative statistics, and/or it may need adjustment to account for missing data. For example, HMRC carries out extensive analysis to understand the level of income under-reporting present in administrative data on tax collected, and true data on income would need to account for this gap (HMRC, 2023).

Additional complications of using administrative data include: ensuring the definitions of the sample units of interest (especially households) are the same between the survey and the administrative source; assessing how frequently administrative records are updated and what triggers an update (e.g. visits to GPs, selling or renting a home); and assessing whether the administrative data is collected in a 'neutral' way, where the data collector has no motivation to skew any measure.

One expert interviewee felt that more attention needed to be paid to the quality of administrative data, and that, if it is to be used more often in future, further thought needs to be given to this, as administrative data can include missing data, duplicate records, incorrect records, and so on.

“The survey and the admin data triangulate each other, it’s not that the admin data is the fixed triangulation point for the survey result, they triangulate each other and I think that is particularly true when you’re using admin data to get a sense of coverage, because it’s quite possible there’s under coverage in the admin data as well, and you need to factor that in.” (Expert interviewee, E17)

Various frameworks exist to assess the quality of administrative data, and any adjustments needed, such as the ONS Quality of Admin Data in Statistics toolkit. Careful consideration of the quality of alternative administrative data would need to form part of any review of the scope to replace elements of the data currently collected on SHS, SCJS and SHeS with data derived from such sources.

Data access and ethics

Although, in theory, the changes to the legal basis for linking data in the UK discussed in the introduction to this chapter have opened up new opportunities for using administrative data, a number of survey experts and methodologists interviewed for this study (all of whom were senior and highly experienced) cited experiences of significant access barriers in practice when they had tried to link administrative data to surveys. These challenges were, in part, related to difficulties agreeing a legal basis for data sharing with administrative data holders – although one view was that legal barriers should almost always be surmountable in the long run when the aim is research for the public good. However, interviewees also cited occasions where data holders were reluctant to share administrative data because of concerns about its quality, coverage or biases.

A related issue that emerged in expert interviews was whether, regardless of the actual legal basis, explicit consent ought to be sought to add administrative data to survey responses. Although this may not be a legal requirement for government surveys, there were different views on whether it should be an ethical and/or a methodological requirement. Questions were raised over whether response rates would drop if surveys adopted an 'assumed consent' model, and whether it might damage public or respondent trust in surveys. The potential impact on overall response was being tested on the English Housing Survey, which, at the time of writing, was exploring a move to an 'assumed consent' model for data linkage (see case study in Appendix A). The intention was that respondents would be provided with upfront information about how their data would be linked to administrative data – 'opting out' of this process effectively involves opting out of the survey (as the aim is to achieve as close to 100% matching as possible).

At the same time, there is evidence that consent to data linkage is significantly lower among respondents who complete surveys (and are asked for this consent) online compared with those who take part face-to-face. Understanding Society have found that consent to data linkage is around 20 percentage points lower for those who take part online (Jackle et al, 2021). At the time of writing, they were still exploring the reasons for this but had yet to find a way of adapting their approach (e.g. through changing incentive structures or amending the way in which consent is requested) that negated this difference.

Summary

  • Changes to the legal basis for data linkage in the UK have opened up new opportunities to use administrative data as part of government surveys
  • Potential uses in relation to mixed mode surveys include:
    • To inform sampling, for example, by adding individual or aggregate-level administrative data to PAF samples to support targeting or boosts.
    • To inform contact and response mode strategies, for example by identifying areas or addresses that might require different contact, incentive or mode of interview strategies within a mixed mode design (although there are some limits to the usefulness of address-based administrative information in identifying target groups)
    • To triangulate and quality assure – particularly in relation to helping assess non-response bias
    • To extend or replace survey data – there was a clear aspiration among some experts and stakeholders to use administrative data in this way. However, it was also the area on which there was least consensus as to the scope to go further than at present in the use of administrative data in the near future.
  • Challenges in using administrative data include:
    • Data quality and completeness – missing data, incorrect records, duplicates etc.
    • Relevance – given it is collected for a different purpose
    • Data access – expert interviews indicated that significant practical barriers often remain
    • Ethics – although changes to the law mean it is possible to link administrative and survey data without consent, there was some concern about the potential ethics of doing so and the potential impact on initial opt outs from surveys, particularly if respondents were told that their data would be linked to more sensitive administrative data in future.

Contact

Email: sscq@gov.scot
