Long term survey strategy: summary report and framework to support decision-making

Summarises the key findings from research exploring mixed mode survey designs in the context of the Scottish Government’s large-scale general population surveys.


Key issues, mitigations and trade-offs in changing or mixing modes

Representation: Coverage and sampling

Coverage error occurs when there is a mismatch between the target population and the sample frame. Undercoverage is a key quality concern – if units of the target population (e.g. people or households) are systematically missing from the sample frame, the findings may not be representative of the population they are intended to reflect. The table below summarises the key issues relating to coverage and sampling that need to be considered when thinking about mode change, and options for mitigating some of the challenges that arise.

Priority considerations / issues

Potential mitigations

Remaining issues and trade-offs

Cross-cutting issues

Address-based sample sources (e.g. PAF or the Scottish Address Directory) are currently the only realistic probability sampling option for general population surveys.

Telephone matching is likely to introduce bias, and random digit dialling (RDD) excludes the growing number of mobile-only households.

Probability surveys can be conducted using telephone or web modes by writing to selected addresses and inviting residents to 'opt in' to a web or phone survey (see e.g. Active Lives, Participation Survey, GP Patient Survey, Transformed Labour Force Survey (TLFS), National Survey for Wales).

Knock-to-nudge approaches can also be used, where an interviewer calls in person but encourages participants to take part online or by phone.

Knock-to-nudge approaches were primarily used when Covid-19 restrictions were in place, since sending an interviewer out without conducting an interview is an expensive way of obtaining an online or telephone interview. The TLFS is a notable exception to this (it continues to use an adaptive knock-to-nudge strategy).

Cross-cutting issues

Selection of individual respondents within households is more complex with postal and web modes, as it relies on respondents following instructions (with evidence suggesting compliance with these is low).

Allow for two or more respondents per household to complete the survey.

Sampling efficiency is reduced by within-household clustering, and weighting is required to adjust for this.
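As an illustration of the adjustment involved, the formulas below give a standard within-household selection weight and the usual approximation for the loss of precision from clustering. This is a generic textbook sketch rather than the weighting specification of any Scottish Government survey: A_h is the number of eligible adults in household h, k the maximum number invited, m̄ the average number of responses per household, and ρ the intra-household correlation for a given measure.

```latex
% Within-household selection weight for a respondent in household h
w_h = \frac{A_h}{\min(A_h, k)}

% Approximate design effect from within-household clustering
\mathrm{DEFF} \approx 1 + (\bar{m} - 1)\rho
```

The effective sample size is roughly the achieved sample size divided by DEFF, which is one reason multi-respondent designs need to be reflected in sample size calculations.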

Cross-cutting issues

Targeted boosting of non-geographic sub-groups is difficult when using self-completion modes (as screening is required and is difficult without an interviewer present). (See for example the Ethnic Minority British Election Study).

Administrative or other data can be appended to the geographic sample to try to identify households more likely to contain specific subgroups. (See for example the Participation Survey).

Success in identifying non-geographic sub-groups on other surveys has been variable (see chapter 11 of the main report), so boosting may remain more difficult. However, if the overall sample size is increased, there may in any case be more people within different subgroups.

Cross-cutting issues

If surveys use a mixed mode design that retains a face-to-face element for follow-up of non-responders, some clustering of addresses is likely to be needed for that follow-up to avoid very high costs.

A number of other mixed mode surveys have focused face-to-face follow-up only on areas where expected response rates are lower (e.g. deprived areas) to allow for clustering. (See for example, the TLFS).

See previous columns – if a concurrent design is used, where people are invited to respond by web (or telephone) first, there is likely to be a trade-off between the cost/practicality of following up geographically dispersed non-responders face-to-face and maximising response (see next table).

Scottish Government Core Questions

The issues above re. boosting may be relevant if there is a desire to increase the sample size for people with particular equality characteristics.

See above re. appending admin data – but this is of limited use in this context (see next column).

Linking data on age, disability etc. to the PAF may not enable accurate targeting of web or postal surveys – so, in an approach that includes push-to-web as an initial element, sub-groups could only be boosted by increasing the overall sample size.

SHS

Selection of a random adult within a household is more complicated for push-to-web and postal surveys. Compliance with instructions is often low.

Consider a two-adults-per-household or all-adults approach if moving to a design that includes push-to-web (this requires weighting to adjust for within-household clustering). (See for example Active Lives, British Social Attitudes, Food and You 2, and the Participation Survey).

Potential risk of fraud (especially if incentives are attached to completion). It also reduces sample efficiency, so this needs to be taken into account in sample size calculations.

SCJS

Similar issue to SHS re. selection of a random adult.

See above.

See above.

SHeS

As SHeS already invites all adults in a household to participate, it would not have the same selection issues as SHS and SCJS (though fraud could still be a risk).

Potential risk of fraud.

Representation: Nonresponse

Nonresponse refers to the impact of households or individuals not taking part. It occurs not only at the survey level (who is uncontactable or refuses to take part) but also at the question level (the questions that respondents do not answer). Nonresponse can bias the results if those who do respond are systematically different from those who do not, meaning the findings are not representative of the population they are intended to reflect. The table below summarises the key issues to consider when thinking through the potential impacts of mode change on nonresponse and what steps can be taken to mitigate the issues that may occur.

Priority considerations / issues

Potential mitigations

Remaining issues and trade-offs

Cross-cutting issues

Response rates

While there is considerable variation in response rates between surveys conducted using the same mode or modes, in general, for cross-sectional surveys, F2F has been associated with higher response rates than telephone, web or paper surveys. However, there is some evidence that the difference in response rates between F2F and web surveys may be starting to change (see, for example, ESS Round 10 parallel testing). Response rates are also continuing to decrease over time across modes, and there is an ongoing debate about what constitutes an 'acceptable' response rate, or what the 'norm' might be in terms of response rates to F2F surveys post-Covid.

Item nonresponse is higher in paper questionnaires (since computer-aided checks are not possible) and tends to be lower for interviewer-administered questionnaires. There is evidence that web surveys have higher item nonresponse than telephone or F2F, particularly for complex or sensitive questions.

Response rates primarily matter because they are associated with risk of nonresponse bias. However, meta-analysis and evidence from the Scottish Government surveys has found fairly weak association between overall survey response rates and nonresponse bias. Higher response rates can be associated with greater bias depending on what drives response, while lower response rates do not necessarily indicate higher bias. Nonresponse bias is also item-specific – so some variables may be more impacted by lower survey response rates than others.

Incentives can increase response rates across all modes but are particularly effective for those with lower baseline responses. They are, by extension, particularly important to consider for push-to-web and paper surveys. However, unconditional incentives, although generally more effective in raising response, may be considered more problematic on low response rate surveys as so many will be 'wasted'.

Determining an 'acceptable' current or future response rate for the Scottish Government surveys is likely to be difficult given the current level of uncertainty among experts and stakeholders on 'norms' (and on the importance of response rates per se).
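One way to see why response rates are only a weak guide to bias is the standard deterministic expression for the nonresponse bias of a respondent mean. This is a textbook formulation included for illustration, not a result specific to these surveys.

```latex
\mathrm{Bias}(\bar{Y}_r) = \bar{Y}_r - \bar{Y} = \frac{N_{nr}}{N}\left(\bar{Y}_r - \bar{Y}_{nr}\right)
```

Here N_nr/N is the nonresponse rate and \bar{Y}_r and \bar{Y}_{nr} are the means for respondents and nonrespondents. A lower response rate only translates into large bias if respondents and nonrespondents differ substantially on the measure in question, which is also why the bias is item-specific.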

Cross-cutting issues

Mode and nonresponse bias

Each mode has biases in terms of who is more or less likely to respond via that mode.

  • F2F is better at reaching participants from more deprived areas and those with lower literacy levels. In relative terms, it may be less good at reaching working households and those in flats with door entry systems.
  • Web – older people, those without the internet, those from deprived areas, those with lower levels of formal education or with low literacy, those with English as a second language, renters and larger households have all been found to be underrepresented in web surveys. Contrary to what is sometimes assumed, younger people are not overall more likely than other groups to respond online (though given the choice they are more likely to reply online than by paper). Those who are politically interested and engaged with the specific subject are more likely to respond to web surveys (see, for example, evidence from British Social Attitudes).
  • Telephone surveys have some similar biases to web compared with F2F – renters, those with no qualifications, and people in vulnerable positions such as those who have experienced homelessness are underrepresented.
  • Paper surveys, like web surveys, suffer from nonresponse from those with low literacy and exclude those with English as a second language. Older people are more likely to respond by paper than web.

Multiple contact strategies can increase overall response rates AND reduce nonresponse bias.

Monetary incentives have a greater impact on those with a lower propensity to respond (and are therefore more likely to reduce nonresponse bias).

The evidence on nonmonetary incentives is more mixed – there is some evidence to suggest they work better for groups already more likely to respond, and may therefore increase nonresponse bias.

Weighting is used to reduce nonresponse bias but cannot eliminate it. The nonresponse issues associated with the particular mode or combination of modes used on a survey are likely to remain, at least to an extent, even after post-survey adjustment and weighting.

Multiple contacts and tailored contact strategies are particularly important to web and paper surveys and should form part of design and testing. However, there are practical limitations in the extent to which contact strategies can be targeted at individuals less likely to respond, since in general only address level data is available at the contact stage.

Many different incentive strategies are possible. Testing optimal incentive strategies should ideally form part of testing any proposed mixed mode design, given the potential impact on nonresponse bias.

A core challenge that needs to be considered with push-to-web designs is the risk that those more engaged in the topic will be more likely to respond.

Strategies to reduce nonresponse bias may have different impacts on different items – this also needs to be considered in developing and testing such strategies on mixed mode surveys.
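As noted above, weighting is the main post-survey adjustment for nonresponse. The sketch below shows one common approach, iterative proportional fitting (raking) of weights to external population margins. It is a minimal, generic illustration with made-up margins and category names, not the weighting specification used on any of the Scottish Government surveys.

```python
import pandas as pd

def rake(df, targets, weight_col="weight", max_iter=50, tol=1e-6):
    """Iteratively adjust weights so weighted margins match external targets.

    targets maps a column name to a dict of {category: target proportion}.
    """
    w = df[weight_col].astype(float).copy()
    for _ in range(max_iter):
        max_shift = 0.0
        for var, target in targets.items():
            current = w.groupby(df[var]).sum() / w.sum()
            # Scale weights in each category by target / current proportion
            ratios = {cat: target[cat] / max(current.get(cat, 0.0), 1e-12)
                      for cat in target}
            factor = df[var].map(ratios).fillna(1.0)
            max_shift = max(max_shift, (factor - 1.0).abs().max())
            w = w * factor
        if max_shift < tol:
            break
    return w

# Hypothetical respondent data and external (e.g. Census-style) margins
sample = pd.DataFrame({
    "age_group": ["16-34", "35-64", "65+", "35-64", "16-34", "65+"],
    "tenure":    ["rent", "own", "own", "rent", "own", "rent"],
    "weight":    [1.0] * 6,
})
targets = {
    "age_group": {"16-34": 0.30, "35-64": 0.50, "65+": 0.20},
    "tenure":    {"own": 0.62, "rent": 0.38},
}
sample["weight"] = rake(sample, targets)
print(sample)
```

Even after such adjustment, mode-related nonresponse bias can remain, because weights can only correct for characteristics that are observed and included in the margins.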

SHS

There is some evidence that variables associated with civic responsibility (e.g. volunteering, which is measured on the SHS) are more strongly correlated with survey response – surveys with lower response rates tend to find higher levels of volunteering.

Consent to additional tasks tends to be lower when respondents take part in initial interviews online (see for example evidence from Understanding Society and the Health Survey for England (HSE)). This could affect the proportion agreeing to the physical survey on the SHS.

See above re. factors that may reduce nonresponse (though lower responding modes may still be associated with nonresponse bias on these types of measures).

This would need to be factored into design and testing of any new modes – one option might be to increase the sample size for those asked to opt into the physical survey (and potentially to select a sub-sample).

This highlights the challenge of assessing the impact of a potentially lower response rate on surveys – it may impact differently on different measures. This is something to factor into any plans and testing.

SCJS

There is evidence that those who are victims of crime are more likely to respond to web crime surveys than F2F surveys (see for example development work on the Crime Survey for England and Wales) – echoing similar findings for telephone from the 2005 Scottish Crime and Victimisation telephone survey.

See above re. the SHS. While it is possible to incorporate elements that reduce overall nonresponse, these may not necessarily reduce this specific nonresponse bias.

This would require very careful consideration and testing if online was to form part of the future mode design for the SCJS. The CSEW is still testing possible web options, so there may be scope to learn from this work.

SHeS

As noted, consent to additional tasks tends to be lower when respondents take part in initial interviews online than when they take part F2F. This has impacted on agreement to biomeasures on other studies (see HSE and Understanding Society).

See above re. SHS physical survey.

See above re. SHS physical survey.

Measurement error and mode effects: impacts on different question types

'Measurement error' occurs when the answers people give do not accurately reflect what the survey was intended to measure – in other words, it is the difference between the data you obtain from your survey, and the 'true value' of the thing you were trying to capture. Understanding – and minimising – measurement error is key to ensuring that a survey is providing accurate data on which decisions can confidently be based. The table below summarises evidence on the ways in which different survey modes can impact on measurement error across a variety of types of survey question and what can be done to mitigate this.

Priority considerations / issues

Potential mitigations

Remaining issues and trade-offs

Cross-cutting issues - General

Mode effects occur at the question level – so fully assessing the impact of mode change requires assessing impacts at this level too.

Various frameworks exist that can be used to predict which types of survey item will be most at risk of mode effects and/or to assess the quality of survey questions (see for example work to assess the risks of mode effects for questions in Understanding Society).

A full consideration of impacts of a mixed mode design on measurement error is likely to require a more detailed review of the full content of each survey and a considerable number of questions may require redesign (leading to some discontinuity in trend data – see below). Testing is also necessary as mode effects can be unpredictable, in spite of general patterns.

Cross-cutting issues - General

In general, mode effects are likely to be more limited for simple factual questions, but more likely to occur for complex questions, sensitive questions, and attitudinal or perception questions.

Investment in question design and testing is important to understand the extent of mode effects and test ways of reducing them. Cognitive testing is seen as particularly important to understanding differences in how people understand and respond to questions in different modes.

Question redesign can take a number of different approaches (unimode, best practices, optimode) depending on the objective (see TLFS for an example of the 'optimode' approach). Respondent Centred Design and Device Agnostic Design principles can also help support the process to ensure questions are fit for purpose across modes/devices.

It is generally not possible to know the 'true' value – so if there is a difference between modes, it is not possible to be certain which mode is closer to this 'true' value. Testing questions on new modes may also reveal issues with long-standing questions that need to be acknowledged and dealt with. Effective redesign may involve completely rethinking what is being asked, how and why.

Most importantly, mode effects can be reduced through good design and testing, but cannot be eradicated.

Cross-cutting issues - specific question types/elements

Socially desirable responding is generally a bigger issue in interviewer-administered modes than in self-completion modes. Within interviewer-administered modes, there is evidence that it is more prevalent in telephone than in face-to-face interviews, due to the reduced opportunity to build rapport.

Limited mitigation options (beyond standard good practice around reassuring on confidentiality, encouraging honest response, etc.)

Not applicable.

Cross-cutting issues - specific question types/elements

There is evidence that people respond to attitudinal questions, especially those with Likert-type scales, differently in different modes (including between online and interviewer-administered modes, and between face-to-face and telephone). SHS and SCJS include attitudinal questions, while SHeS includes perception questions that use 5-point Likert scales and may be subject to similar issues.

The greater observed tendency to choose a 'neutral' middle option online could be addressed by removing the middle option and having a four-point scale.

Removing the 'middle' option will impact on comparability over time and may also encourage people to select an answer when, in fact, they don't have an opinion, meaning the responses are arguably less accurate.

Cross-cutting issues - specific question types/elements

There are likely to be differences in selection of 'Don't know' or 'prefer not to say' between interviewer-administered and self-complete modes.

One option is to allow respondents to skip questions but, if they do so, to politely ask them to answer and to make 'don't know' and 'prefer not to say' visible at that point.

In practice, unless DK and PNTS are shown explicitly upfront (which increases their use), they are used much less often in web surveys than in F2F or telephone. There are likely to remain some differences in their use. This may be more of an issue for attitudinal questions, where 'DK' is a valid response.
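A minimal sketch of the two-step approach described above (hiding 'Don't know' and 'Prefer not to say' until a respondent first tries to skip) is given below. The control flow and the render_question callback are illustrative assumptions, not the interface of any particular survey platform.

```python
def ask_with_soft_prompt(render_question):
    """Illustrative control flow: 'Don't know' and 'Prefer not to say' stay
    hidden until the respondent first tries to move on without answering."""
    answer = render_question(show_nonresponse=False)
    if answer is None:
        # Soft prompt: re-display the question, now with DK / PNTS visible
        answer = render_question(show_nonresponse=True)
    return answer

# Minimal stub standing in for a survey platform's question renderer
def demo_renderer(show_nonresponse):
    return "Prefer not to say" if show_nonresponse else None

print(ask_with_soft_prompt(demo_renderer))  # -> Prefer not to say
```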

Cross-cutting issues - specific question types/elements

Where surveys rely on visual presentation (web surveys, paper, and showcards), there are more likely to be primacy effects (where respondents pick the first response that applies rather than reading the full list). Where surveys rely on aural presentation (read outs in F2F or telephone), recency effects (choosing the last valid response they heard) are more likely. Both are more of a risk with questions with long answer lists.

Limited mitigation options beyond standard practice re. mixing or rotating the order of answer options (more difficult with paper questionnaires).
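Where rotation is scripted (web or CAPI rather than paper), a minimal sketch of per-respondent randomisation of answer order might look like the following. The function and the seeding convention are illustrative assumptions rather than features of a specific survey platform; anchored options such as 'Other' are kept at the end of the list.

```python
import random

def rotated_options(options, respondent_id, anchored=("Other", "Don't know")):
    """Return answer options in a randomised order for one respondent,
    keeping anchored options (e.g. 'Other', 'Don't know') at the end.

    Seeding on the respondent ID keeps the order stable if the question
    is re-displayed to the same person.
    """
    movable = [o for o in options if o not in anchored]
    fixed = [o for o in options if o in anchored]
    rng = random.Random(respondent_id)
    rng.shuffle(movable)
    return movable + fixed

# Hypothetical long answer list
print(rotated_options(
    ["Cost", "Lack of time", "Poor health", "Caring responsibilities",
     "Lack of interest", "Transport difficulties", "Other"],
    respondent_id=12345,
))
```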

Cross-cutting issues - specific question types/elements

Open questions may get shorter/less complete responses online.

Limited mitigation options beyond encouraging fuller response.

SHS

Detailed socio-economic status measures have been particularly difficult to transition from interviewer-administered to self-complete. This is relevant to both SHS and SHeS, which ask detailed questions on this to enable derivation of SIC/SOC/NS-SEC categories.

ONS have developed a self-coded set of questions to measure socio-economic classification, but this was still felt to have limitations.

SHS

There is evidence that travel diary data is more difficult to collect online, without an interviewer there to assist and explain.

Limited obvious mitigation. This is one reason the National Travel Survey (NTS) has not moved mode. However, there may be learning from other surveys testing online diaries.

SCJS

There are both methodological and ethical issues around asking questions about within household crimes (like intimate partner violence) without an interviewer present to control who else sees or hears responses.

The role of interviewers in guiding respondents through the victim form to ensure sufficiently detailed responses to enable offence coding was also highlighted.

Limited obvious mitigation. The CSEW does not ask these questions in the telephone follow-up interviews.

There may be scope to share learning and/or testing of revised questions with CSEW.

SHeS

See notes under SHS re. detailed socio-economic measures.

See above under SHS re. detailed socio-economic measures.

SHeS

There is evidence that people may give socially desirable answers to some health behaviour questions (e.g. fruit and vegetable consumption) when interviewer-administered – so moving questions currently asked F2F online may result in different response patterns.

See above mitigations for cross-cutting issues.

Implications of mode for data collection options

In addition to the direct impacts of mode on measurement error, the choice of mode(s) of data collection also has implications for other features of survey design, which in turn may constrain the type and volume of data that can be collected. The table below summarises the potential implications of changing or mixing mode for survey length, survey structure, options for presenting answer categories, collection of additional data beyond the main interview, and accessibility and inclusion.

Priority considerations / issues

Potential mitigations

Remaining issues and trade-offs

Cross-cutting issues

Length: There is no consensus among survey methodologists on the 'optimal' length of interview for different modes. Views vary in particular on how long is 'too long' for online. While some recommend restricting web surveys to 20 minutes, there is also evidence that significantly longer web interviews (up to 50 minutes) can work (see, for example, the European Social Survey). There is evidence of some risks, however, around break-off rates (especially if people complete on a mobile) and data quality with longer online interviews.

Telephone surveys may need to be shorter than F2F – one view was that 35 minutes is the limit before data quality declines (see also evidence from the NSW and HSE pandemic telephone surveys).

The maximum acceptable length for any questionnaire will depend on interest and cognitive burden. Good design is again key, particularly if changing mode. Examples of longer web surveys cited in this report were also implemented after testing the impact on response and drop-off rates.

Options for enabling inclusion of more content with an online mode include:

  • Modularisation, so that different respondents complete different elements (see for example BSA and Food and You 2). However, this reduces the sample size for modularised content and can create complexities in imputation and weighting (a minimal allocation sketch follows this list).
  • Setting up the web survey to enable people to break off and return to it later.
  • Moving to a partially longitudinal design, such that more data can be collected over repeat interviews with the same respondents (this is being tested for the CSEW).
  • Increasing the use of admin data (see below).
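The allocation step in a modularised design can be illustrated with the hedged sketch below. The module names, numbers and seeding are hypothetical; the point is simply that each rotating module is answered by only a fraction of the full sample, which is why modularisation reduces the effective sample size for that content.

```python
import random

def assign_modules(respondent_ids, modules, modules_per_person=2, seed=42):
    """Randomly allocate each respondent a subset of rotating modules.

    Everyone gets the core questionnaire; each rotating module is then
    answered by roughly n * modules_per_person / len(modules) respondents.
    """
    rng = random.Random(seed)
    return {rid: ["core"] + rng.sample(modules, modules_per_person)
            for rid in respondent_ids}

# Hypothetical example: 10,000 respondents, 4 rotating modules, 2 per person
alloc = assign_modules(range(10_000), ["A", "B", "C", "D"], modules_per_person=2)
per_module = {m: sum(m in mods for mods in alloc.values()) for m in "ABCD"}
print(per_module)  # each module ends up with roughly 5,000 respondents
```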

Tolerance for completing surveys online may be increasing – so there may be more evidence on longer web surveys and what works in future.

Modularisation is not possible with paper surveys, so if paper is an element of a mixed mode design, that needs to be taken into account.

Cross-cutting issues

Survey structure: Complex structures (looping, randomisation, rotations, etc.) and routing are far less feasible with paper questionnaires. These are all features of the three Scottish Government surveys.

If paper is part of a mixed mode design, the paper element may need to be a simplified / shortened version. (See for example the Participation Survey, which has a single, simplified paper questionnaire compared with three longer web versions).

Cross-cutting issues

Presenting answer categories: Asking questions currently based on long showcards is difficult by phone. This was a challenge for all three Scottish Government surveys during the Covid-19 pandemic.

Telephone respondents can be sent single-use hard copy showcards or online showcards can be used. (This was implemented on the three cross-sectional Scottish Government surveys during the pandemic and is used on the Childcare and Early Years survey of parents).

One view was that use of showcards is, in any case, problematic for less literate respondents, so redesigning surveys for different modes may be an opportunity to reassess their use.

Cross-cutting issues

Accessibility: Different modes are associated with different accessibility features, and this needs to be considered when choosing potential mixed mode designs.

  • Interviewer-administered modes are more accessible to those with low literacy or who need additional assistance.
  • Web surveys exclude those who are not online, who are typically older and more likely to be living in deprivation, but can offer additional accessibility features for those with some other needs.
  • Telephone interviews are less accessible to those who are deaf or hearing impaired, though text-relay services are possible.

Offering multiple modes can be viewed as increasing options for completion and therefore as a positive for inclusion. However, this may partly depend on how modes are offered (e.g. whether/when alternative modes are actively highlighted).

Testing impacts on inclusion should be a deliberate element of any testing of new or alternative modes.

Ensuring that the combination and sequencing of modes does not exclude particular groups is an important consideration.

Even if particular modes are being prioritised initially (e.g. for cost reasons), there need to be clear alternatives for those for whom specific modes are not feasible (communicated in a manner accessible to those groups).

While offering alternative modes might improve inclusion in the sense of allowing people greater choice over how they complete the survey, the more modes offered, particularly concurrently, the greater the cost (see below re. cost). And the more modes offered, the greater the potential number of mode effects (discussed above) to consider.

SHS

Physical survey: The physical survey involves an in-person surveyor visit.

Limited. The EHS experimented with respondent walk-throughs or filming, but at this point anticipates that this element needs to remain fully face-to-face because of the technical aspects of the task. The SHCS also conducted external-only visits to properties during the pandemic, combined with additional data collection by phone, but analysis of the resulting data noted the impact of the change in approach on key estimates and argued that this approach should not be used in future waves because of mode effects.

There are no obvious alternatives to a face-to-face physical survey at this point.

SHeS

Height and weight data: There is evidence from HSE and Active Lives that respondents overestimate height and underestimate weight when this is self-reported rather than interviewer collected (which would be the case for all modes except F2F).

HSE and Active Lives have developed an adjustment formula, based on comparing the two.

Even with the adjustment, concerns remain about accuracy of height and weight data, especially for those very under- or over-weight. Moreover, so far there does not appear to be similar data to enable adjustment of data on child height and weight.
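The published HSE and Active Lives adjustments are regression-based formulae estimated from respondents with both self-reported and measured values. The sketch below only illustrates that general approach, with placeholder data and a simple linear fit; it does not reproduce the published coefficients, which also condition on characteristics such as age and sex.

```python
import numpy as np

# Hypothetical calibration sample with both self-reported and measured weight (kg)
self_reported = np.array([70.0, 82.0, 65.0, 95.0, 58.0, 110.0])
measured      = np.array([72.5, 85.0, 66.0, 99.5, 59.0, 116.0])

# Fit a simple linear correction: measured ≈ intercept + slope * self_reported
slope, intercept = np.polyfit(self_reported, measured, 1)

def adjust(self_reported_values):
    """Apply the fitted correction to new self-reports (illustrative only)."""
    return intercept + slope * np.asarray(self_reported_values)

print(adjust([68.0, 90.0]))
```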

SHeS

Biomeasures: there are options for collecting some biomeasures remotely (and new approaches have been developed in response to need in the past). However, response rates to self-collected biomeasures can be much lower for those who participate in the initial survey online rather than face-to-face.

Incentives could be tested to improve response rates to self-collected biomeasures, and/or the proportion of the sample asked to return these could be increased.

The issues noted above around response rates and representativeness also apply in this context.

Impacts of changing or mixing modes on time series

Each of the three main cross-sectional Scottish Government population surveys has been used to generate trends and time series that now stretch back over several decades. Understanding the likely impact of changing or mixing modes on these trends and time series is therefore a key issue.

Priority considerations / issues

Potential mitigations

Remaining issues and trade-offs

Cross-cutting issues

Impacts on trends are a direct result of the issues discussed in previous sections – particularly mode effects, but also issues relating to sample composition and representativeness.

The more that can be done to mitigate mode effects and maintain sample representativeness, the greater the potential to reduce impacts on time series.

As noted previously, mode effects can be unpredictable and cannot be completely avoided.

Cross-cutting issues

The scale of impact on trends may vary between questions in the same survey, with those most at risk of mode effects also at most risk of discontinuity.

Work to assess the likelihood of mode effects will also help in identifying those questions where maintaining time series is likely to be an issue.

Cross-cutting issues

Understanding the impact of changing mode on trends was seen as a crucial element in informing decisions about mode change.

Parallel testing is seen as the gold standard for understanding mode effects (and by extension, likely impacts on trends). This also has the advantage of starting to build a new time series (while the 'old' mode is still being used).

The time and resource available will impact on the degree of parallel testing that is feasible before reaching a decision. For example, ONS have conducted a full-scale parallel run of the Transformed Labour Force Survey for over two years and have announced that this will extend to at least early 2025. In contrast, the Participation Survey, which replaced Taking Part, was introduced without any parallel testing.

Cross-cutting issues

The consensus across experts is that it is not, in practice, possible to use statistical techniques to avoid a break in the time series once a survey transitions to a different mode design.

In theory, recalibration of survey trends to account for changes in modes is an option. In practice, this has rarely been done and was regarded by experts as infeasible at large scale and resource intensive; it also introduces additional complexity for users and involves assumptions that are open to question.

Recalibration of all measures to maintain time series is not likely to be a feasible practical option. Recalibration of even a small number of measures may not be practical (or desirable) given the issues identified.

The consensus view was that changing mode means accepting an effective break in the time series.

Cross-cutting issues

Avoiding a break in the time series may be a reason not to change mode (or to maintain some data collection of key trends by the existing mode). (For example, concern about breaking trends was cited as a reason why the Childcare and Early Years survey has not transitioned after a push-to-web pilot).

An alternative view (rather than a mitigation) is that it is better to transition sooner rather than later to start developing future trends.

The trade-off between maintaining trends and other priorities – including a bigger sample size and adapting data collection to current policy needs – is a central issue for any decision on future mode, and is likely to be a priority area for stakeholder engagement.

Survey quality metrics

In considering the potential impact of changing or mixing modes on survey quality, it is important to consider how quality can actually be assessed. The table below sets out some of the challenges, potential solutions and trade-offs around this.

Priority considerations / issues

Potential mitigations

Remaining issues and trade-offs

Cross-cutting issues

A key concern in assessing the potential impact of changing mode design is the impact it will have on survey quality. Assessing quality is complex – there are multiple features of quality and different stakeholders prioritise different elements.

Various existing frameworks have been produced to help support consideration of survey quality across different dimensions of both statistical accuracy and usability. They tend to combine key measures (response rates, attrition rates, breakdowns of sample composition by key measures) and qualitative assessments of other elements, including relevance, accessibility and clarity, etc. These can provide templates to help the Scottish Government consider the quality of different mode designs. (See for example the quarterly ONS Labour Force Survey Performance and Quality monitoring reports).

Frameworks with multiple quality dimensions may be time consuming to complete, are potentially difficult for non-expert users to digest, and can be open to disagreements on interpretation, so need to be approached with care and an eye to effective communication.

Cross-cutting issues continued

Response rates have tended to be used as a key summary measure of quality. However, response rates are increasingly seen as a 'blunt tool'. They contribute to only one element ('accuracy') and (as discussed earlier) have a questionable relationship with nonresponse bias.

Experts suggested focusing on nonresponse bias and item response patterns instead, including comparing weighted and unweighted sample profiles against other external estimates (e.g. Census or Housing statistics), and assessing changes in items that are not expected to change quickly or for which there is a robust external point of comparison.

Response rates remain important to some users – moving away from them may be difficult to explain to stakeholders, especially as there is no single 'metric' that can be produced to measure nonresponse bias or item response. However, a shared interest in data quality provides a starting point for engaging with users on this issue.
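A minimal sketch of the kind of profile check suggested above, comparing unweighted and weighted sample distributions against an external benchmark such as the Census, is shown below. The figures and categories are purely illustrative.

```python
import pandas as pd

# Illustrative sample profile (proportions) and an external benchmark (e.g. Census)
profile = pd.DataFrame({
    "age_group":  ["16-34", "35-64", "65+"],
    "unweighted": [0.22, 0.50, 0.28],
    "weighted":   [0.29, 0.50, 0.21],
    "benchmark":  [0.30, 0.50, 0.20],
})

# Percentage-point differences from the benchmark before and after weighting
profile["unweighted_diff_pp"] = (profile["unweighted"] - profile["benchmark"]) * 100
profile["weighted_diff_pp"]   = (profile["weighted"]   - profile["benchmark"]) * 100
print(profile)
```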

Financial and resource considerations

Resource considerations do not fit neatly into the 'issues-mitigations-remaining trade-offs' format used for other topics in this report. Rather, the question is how to balance available resources, quality (as determined by all the elements discussed in previous chapters in this report), and sample size (which is also an aspect of quality – but one which is arguably worth separating out given its specific relevance to value for money in the context of different mode designs). There is no straightforward formula for deciding how to balance these. However, decisions should be informed by a clear understanding of overall cost drivers, the short-term implications of changing mode, and medium or longer-term costs. These are summarised in the table below.

More radical options for reallocating resources that could be considered alongside mode redesign include:

  • Splitting surveys up so that different elements are conducted using different modes
  • Building in longitudinal elements to maximise return from each face-to-face interview achieved
  • Combining surveys
  • Finding alternative sources of data to use instead of or alongside the surveys, including passive data (which is at an early stage in survey research) and administrative data.

Drivers of survey costs

Short-term resource implications of changing or mixing modes

Medium-term resource implications of changing or mixing modes

Longer-term cost impacts

All modes:

  • Sample size (the cost advantages of cheaper modes are greater the bigger the sample size – see the illustrative arithmetic below)
  • Reminder and incentivisation strategy
  • Questionnaire length
  • Extent of development, piloting and testing
  • Complexity of sample management, data processing and weighting

Telephone and F2F: Reissue strategy

F2F: Clustering of the sample.
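The point about sample size can be illustrated with some simple, purely hypothetical arithmetic: if C_fixed covers design, testing and systems, and c_mode is the variable cost per issued case, then

```latex
\text{Total cost} \approx C_{\text{fixed}} + n \, c_{\text{mode}},
\qquad
\text{Saving from a cheaper mode} \approx n \left(c_{\text{F2F}} - c_{\text{web}}\right)
```

so the absolute saving grows with the sample size n, while the fixed costs of changing mode do not.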

There was a consensus among experts that changing or mixing modes in a planned and robust manner increases short-term costs. Short-term costs of changing/mixing modes include:

  • Questionnaire design and testing
  • Parallel runs
  • Experiments to test incentive and invitation strategies
  • New/enhanced sample management and data collection systems
  • Stakeholder engagement
  • General commissioner and contractor time to plan and implement mode change.

The longer-term resource implications will depend on the exact approach and combination of modes AND what other changes are made at the same time (e.g. to frequency, sample size, etc). Given this, it is not possible to make a definitive comparison of the resource implications of different mode designs.

However, cost savings are likely to be greatest where surveys move to predominantly self-completion modes and make little or no use of F2F. Where face-to-face remains a significant element, cost savings tend to be less substantial.

Costs are not static. While interviewer-administered modes are the most expensive and are likely to see further cost increases (e.g. relating to minimum wage changes), other factors may impact on mixed mode designs too, including:

  • Increased postal costs (which can be significant in the context of push-to-web designs, and especially designs where a paper questionnaire may be included for some or all addresses in at least one mailing)
  • Call blocking technology, which may make telephone surveys more expensive in future.

One option to balance this is to target F2F follow-up only on specific demographics (although there are limits to which groups can effectively be targeted in this way).

Other costs that may remain higher longer-term for mixed-mode designs include: questionnaire design; field management; and data processing.

Whatever design is adopted, there is likely to be a need to continue to innovate to improve efficiency as costs change.

There are also environmental resource implications from surveys that require consideration. All survey modes have some environmental impact, including: travel (F2F); printing (push-to-web and paper especially); vouchers (where used).

Contact

Email: sscq@gov.scot
