Long term survey strategy: mixed mode research report

Findings from research exploring mixed mode survey designs in the context of the Scottish Government’s general population surveys. The report details information on key issues, potential mitigations and remaining trade-offs, and includes 21 case studies on relevant surveys.


6. Measurement error and mode effects: impact on different question types

Introduction

‘Measurement error’ occurs when the answers people give do not accurately reflect what the survey was intended to measure – in other words, it is the difference between the data you obtain from your survey, and the ‘true value’ of the thing you were trying to capture. Understanding – and minimising – measurement error is key to ensuring that a survey is providing accurate data on which decisions can confidently be based.

This chapter explains in more detail what is meant by measurement error and why, and how, measurement errors occur. It then discusses the ways in which different survey modes can impact on measurement error across a variety of types of survey question, and what can be done to mitigate this. It starts, however, with a brief discussion of Scottish Government stakeholder priorities in terms of the measures that matter most to them across the SCJS, SHS and SHeS.

Scottish Government survey stakeholder priorities

In assessing the potential impact of mode effects relating to mixing modes on the three Scottish Government surveys, it is important to consider the kinds of measures that matter most to their stakeholders. This is particularly important since, as this chapter goes on to discuss, mode effects may vary or manifest themselves in different ways for different types of questions. The views of the stakeholders interviewed for this study on this question varied depending on their specific role and interest in the survey in question:

  • SCJS stakeholders discussed the value of both data on victimisation (headline rates and details on experiences of specific crimes – including the profile of victims, the location and time of crimes, and the involvement of alcohol and drugs) and data on views and experiences of justice services. The questions on victimisation were seen as important to the longer-term understanding of, and narrative around, experiences of crime, while questions on justice services were valued by stakeholders responsible for those services. Differing views were expressed on whether the current balance between these elements was appropriate or had swung too far in favour of the latter. However, it was noted that adding the focus on perceptions of justice services had, historically, secured greater buy-in from policy makers, while comments from stakeholders involved in justice services indicated that they continue to value these questions.
  • SHS local authority stakeholders particularly valued the wide range of data relevant to local services and understanding their local population, from questions on satisfaction with services, to views of neighbourhoods, to cost of living and financial resilience questions, and more. Transport stakeholders unsurprisingly felt the travel diary was ‘invaluable’. Data on all the topics where the SHS/SHCS provides the basis of NPF and other key indicators was singled out by internal Scottish Government stakeholders.
  • SHeS stakeholders particularly valued the robust measures collected by interviewers on height and weight, in order to calculate BMI. The detail on prevalence of different long-term conditions, particularly mental health conditions, was also highlighted as of key importance, both overall and within subgroups of the population. The new questions on menopause were also highlighted as providing key data unavailable elsewhere.

This highlights the wide range of measures that matter to stakeholders. This chapter discusses evidence from the literature, expert interviews and survey reviews on the various ways in which different modes may impact on measurement error with respect to particular question types and elements, drawing out the potential relevance to the three surveys where possible. However, as each survey consists of many different measures and question types, a detailed assessment of the possible impact of specific mode options for the three Scottish Government surveys would require a more detailed review of the full questionnaires with these considerations – as well as the priorities of stakeholders – in mind.

Definitions: ‘measurement error’ and ‘mode effects’

In considering measurement error in the context of different survey modes, it is important to be clear what is being described. ‘Absolute mode-specific measurement errors’ reflect the fact that no single survey mode can be expected to capture the ‘truth’ (or, to put it more technically, the sought-after population parameters) perfectly. This is true even in the case of a census where errors of representation (discussed in the previous chapter) can (largely) be dismissed. That is, each survey mode will be associated with some degree of measurement error, and the extent and direction of this error will vary across the items within the data collection instrument. These ‘absolute mode-specific measurement errors’ usually cannot be calculated because the population parameters of interest are generally unknown – in other words, for most survey questions we do not know the ‘true’ value of what we are measuring, which is required to calculate absolute measurement errors.

Instead, survey methodologists are usually restricted to investigating the extent to which different modes deliver different estimates for the same survey item: this difference between modes can be thought of as ‘relative mode-specific measurement effects’. For economy of language this is often shortened to ‘mode effects’; although a mode effect has both a measurement and a representation component, the context usually makes it clear which is under consideration.

To illustrate what is meant by a mode effect (a relative mode-specific measurement effect), consider a question included in both a face-to-face and a web instrument which asks respondents to report on how good, in general, their health is. A relative mode-specific measurement effect describes the tendency for a respondent to provide a more positive (or negative) response in one mode than the other, or perhaps to provide a substantive response in one mode but a refusal or a ‘don’t know’ response in the other. These tendencies are products of the data collection mode, and, as we cannot know the ‘absolute mode-specific measurement errors’ for responses to questions on self-assessed health, we cannot be certain which mode is closer to the ‘true’ underlying distribution of perceptions among the population. This is an important point that was emphasised by several of the survey experts interviewed for this study: where a change in survey mode leads to a change in response patterns on a particular question, you typically cannot be certain whether the old or the new mode is producing a more accurate measure of the ‘true’ value.
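The distinction can also be written in simple notation (an illustrative formalisation rather than one taken from the sources cited in this chapter). If $\mu$ denotes the unknown ‘true’ population value of a measure, and $\hat{y}_{m}$ the estimate obtained under mode $m$ (with representation differences set aside), then:

```latex
% Absolute mode-specific measurement error for mode m
% (unobservable in practice, because \mu is unknown):
\[
  B_{m} = \hat{y}_{m} - \mu
\]

% Relative mode-specific measurement effect ('mode effect') between
% two modes m_1 and m_2 (estimable, e.g. from a parallel-run experiment):
\[
  \Delta_{m_1, m_2} = \hat{y}_{m_1} - \hat{y}_{m_2} = B_{m_1} - B_{m_2}
\]
```

A non-zero $\Delta_{m_1, m_2}$ shows only that the two modes differ; because $\mu$ is unknown, it does not reveal whether $B_{m_1}$ or $B_{m_2}$ is the smaller error, which is the point made above.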

Why do measurement effects occur?

Measurement-specific mode effects have received a great deal of research attention over the past several decades. Survey methodologists have sought to understand why they occur, when one can expect them, their magnitudes, and what steps can be taken to avoid them or to mitigate their impacts. Mode effects arise from a set of interactions between the following domains (e.g. see Biemer and Lyberg, 2003; Groves et al, 2009):

  • The cognitive steps a respondent takes when answering a survey item
  • Features of the survey mode
  • Characteristics of the survey item, and
  • Characteristics of the respondent.

The main focus of this chapter is on research that identifies how cognitive processes and features of survey mode interact with item- and respondent-level characteristics to produce mode effects, since this literature can shed light on the types of questions or respondents that the Scottish Government might need to focus on when considering the likely impact of changing or mixing modes on its surveys. However, before moving on to this literature, we briefly introduce a) cognitive models of survey response processes and b) the key features of survey mode relevant to understanding mode-specific measurement effects.

Cognitive models of survey response: understanding why people respond to questions in the way they do

One of the most widely used models of the survey response process is that described by Tourangeau and Rasinski (1988, and see also Tourangeau, Rips, and Rasinski, 2000). This model breaks down the process of giving an answer to a survey question into four main cognitive stages: (i) comprehension of the survey item, (ii) retrieval of information from memory, (iii) formation of a judgement based on this information, and (iv) selection and reporting of an answer.

This model holds that measurement errors can arise at any stage of the response process. For instance, at the comprehension stage the question wording might be ambiguous or complex, or the respondent may not understand a technical term. At the retrieval stage the respondent may forget information, might ‘telescope’ events as occurring more recently (or longer ago) than they in fact occurred, or might misremember sources of information. At the judgement stage the respondent might make certain assumptions or estimations, or their judgement might be biased by personal opinions or beliefs. And at the response selection and reporting stage the respondent might misunderstand the meaning of a response scale or might fail to notice a response option that is relevant to them.

Returning to the different modes of data collection, each mode has its own distinguishing features, and these features have implications for the types of errors that tend to arise at each stage of the survey response process. Understanding the cognitive processes respondents go through has also been crucial in informing the development and testing work methodologists undertake when planning or considering a change of modes, or a mix of modes, on a survey. The experts interviewed for this study placed a high degree of emphasis on the importance of piloting and testing, including cognitive testing, when changing modes, both in order to understand mode effects and, where possible, to adjust instruments to reduce measurement error (in all modes). This is discussed further below, under ‘Mitigations’.

Features of data collection modes that may impact on how people answer questions

The features of the various survey modes have been summarised in a number of ways by survey methodologists (e.g. Couper, 2011; Schouten et al, 2021). In general, the greatest differences are between self-administered modes (web and paper) on the one hand, and interviewer-administered modes (face-to-face and telephone) on the other. Seven key features of different modes that may impact on how people respond to survey questions (as well as options for data collection, discussed in the following chapter) are:

  • Privacy - the extent to which a respondent provides their answers in private or provides them to (or in the presence of) another individual. Web and paper interviews have high levels of privacy but are not completely private modes: for instance, a respondent might complete an interview in the presence of another household member who they believe may, inadvertently or not, see their responses. This is particularly relevant to interviews on sensitive topics, as discussed below. Face-to-face and telephone interviews have low levels of privacy, as respondents provide their answers to an interviewer. However, this can be mitigated by including a self-completion element within an interviewer-administered interview. Moreover, interviewers can control the interview context to a greater degree than is possible with self-completion, for example by asking for other household members not to be present.
  • Interaction - the extent to which the survey experience is interactive, resembling an ordinary conversation. Face-to-face has the highest level of interaction, followed by telephone. However, as interviewers are generally trained to stick fairly rigidly to the wordings contained in the script (in order to avoid between-interviewer measurement errors), the degree of genuine interaction in these modes can be overstated. Web and paper interviews are not interactive, save for the possibility of web surveys incorporating elements such as chat bots, or questionnaire items which ask the respondent to record video or audio of themselves providing open-ended answers. However, these technologies are in their infancy and are, currently, rarely used (Revilla and Couper, 2019).
  • Assistance - how much help is available to the respondent as they complete the interview. Interviewer administered modes offer the highest potential for assistance, although as with interaction, it is important to note that the types of assistance permitted tend to be highly restricted, to avoid different interviewers giving different ad hoc advice to respondents. Web interviews allow for a small degree of assistance, for instance via help buttons, links to frequently asked questions, the inclusion of check questions which ask the respondent to review impossible or unlikely responses, and the provision of a survey helpline number. Paper interviews allow for very little assistance, limited only to what guidance can fit onto the printed survey instrument and the provision of a survey helpline number.
  • Presentation - whether the interview uses aural, visual, or both channels of communication to present the survey instrument. Telephone interviews are, obviously, wholly aural. Face-to-face interviews are primarily aural but allow for presentation of visual materials. Web and paper interviews are visual modes. Aural and visual communication may be more or less effective for different types of information – for example, most people cannot hold more than four simple response options in their memory if received aurally, but do not need to remember them if presented visually.
  • Computerisation - the extent to which the mode is computer-assisted. Web, face-to-face, and telephone interviews are all computer-assisted, affording opportunities for many complex survey design features around routing, randomisation and rotation of questions and answer codes, logic checks, and use of ‘feed-forward’ data (where answers to one question appear in the text for another, later question, for example); an illustrative sketch of these features follows this list. Paper interviews have no computer-assisted elements, and therefore cannot employ any of these features.
  • Speed - how quickly the respondent goes through the interview, as well as the variations in pace during the interview. Web and paper interviews, being self-completion modes, afford the respondent the greatest flexibility with respect to speed and pace. Face-to-face and telephone interviews, being interviewer-administered, are relatively restricted with regards to speed and pace.
  • Scheduling - the extent to which the respondent can decide when and where they do the interview. Web and paper interviews provide the greatest flexibility in terms of when and where people take part. Face-to-face and telephone interviews are far more restrictive with respect to scheduling, as timings are limited by interviewers’ working hours and availability. Face-to-face interviews also afford very limited flexibility in terms of location, with almost all general population random probability surveys (like the three Scottish Government surveys) taking place in respondents’ homes.
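To illustrate what computerisation makes possible, the sketch below shows routing, a simple logic check, and ‘feed-forward’ text substitution in plain Python. It is a simplified, hypothetical example (the questions, function names and ranges are illustrative assumptions) rather than the scripting used by any particular survey package or by the Scottish Government surveys.

```python
# Hypothetical sketch of three computer-assisted questionnaire features:
# a logic (range) check, routing, and 'feed-forward' text substitution.
# Not based on any specific survey platform.

answers = {}

def ask(question, valid=None):
    """Prompt for an answer; re-ask if the logic check rejects it."""
    while True:
        response = input(question + " ").strip()
        if valid is None or valid(response):
            return response
        print("That answer looks unlikely - please check and re-enter.")

# Logic check: the number of adults must be a whole number between 1 and 10.
answers["n_adults"] = int(ask(
    "How many adults live in this household?",
    valid=lambda r: r.isdigit() and 1 <= int(r) <= 10,
))

answers["employed"] = ask(
    "Are you currently in paid work? (yes/no)",
    valid=lambda r: r.lower() in ("yes", "no"),
)

# Routing: the occupation questions are only asked of those in paid work.
if answers["employed"].lower() == "yes":
    answers["job_title"] = ask("What is your job title?")

    # Feed-forward: an earlier answer is inserted into later question text.
    answers["job_hours"] = ask(
        f"How many hours a week do you usually work as a {answers['job_title']}?"
    )
```

As the bullet above notes, none of these features is available on paper, where routing relies on written instructions and checks can only be applied after the questionnaire is returned.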

How do mode effects impact on different types of questions?

As discussed above, the three main Scottish Government general population surveys contain a wide variety of question types and topics. Mode-specific measurement effects do not apply at ‘questionnaire’ level but at question level – so understanding the potential impact of changing or mixing modes on these surveys requires consideration of the potential mode effects and their impact on the full range of questions the Scottish Government wants to include. In general, the literature and expert interviews suggest that mode effects may be limited for factual questions, at least where these are simple, not particularly sensitive, and require limited recall (e.g. Bosnjak, 2017; see also case study for Childcare and Early Years survey of parents in Appendix A). However, mode effects are more likely to occur for complex questions (including those requiring a high degree of recall), sensitive questions, and attitudinal/perception questions. These are discussed in turn below, although it is important to note that some of the response tendencies associated with mode effects (e.g. satisficing, social desirability bias, etc., which we describe below) may cut across different question types.

Complex questions

Stakeholders for the Scottish Government’s general population surveys highlighted the complexity of some of the concepts the surveys aim to measure when discussing concerns about the potential impacts on data accuracy of moving away from a predominantly face-to-face approach. There was a perception that respondents might be tempted to give less accurate or thoughtful answers, particularly if the surveys moved online where there is no interviewer to prompt, provide additional explanations, or to encourage full responses (reflecting issues around the scope for interaction and assistance in other modes, as discussed above). The role of interviewers in taking respondents through the victim form element of the SCJS was highlighted as something that was seen as very important to obtaining sufficiently detailed and high quality data for offence coding. There was also a concern that people might “race through” their answers (reflecting potential differences in speed of completion by different modes, as discussed above), with negative impacts for data quality. A key benefit of interviews being face-to-face was that they were felt to be conducted in “a considered environment”.

The view that complex questions are more difficult to transition from an interviewer-administered to a self-complete mode was echoed in expert interviews with methodologists for this study. For example, several interviewees discussed the challenges they had encountered collecting data on socio-economic status online. This data is typically based on several questions in a face-to-face or telephone survey, with interviewers trained to probe for detailed information on features of people’s jobs in order to be able to correctly code their socio-economic classification. Encouraging people to provide the same information online was proving very challenging – for example, one interviewee noted that people might enter ‘work in education’, and the coder trying to interpret the data after the interview would not know if that person was a cleaner in a school or a head teacher. Another interviewee cited an example of a mechanic saying they had had 15 jobs in a week – an interviewer would have intervened to clarify and collect the correct information, but this is difficult to correct in a self-completion mode. ONS have developed a self-coded set of questions to measure socio-economic classification, but there were still felt to be limitations in what these are able to collect in terms of data quality and accuracy, with work ongoing to try and improve this. Detailed household financial information was also reported to be more difficult to collect online, without an interviewer there to assist.

The literature also notes that the complexity of a survey item has implications for how much assistance respondents will need, how much time they will need, the likelihood that they will interpret the question as intended, and how motivated they will be to provide an accurate answer. Along these lines, research by Centrih et al. (2020) found that web respondents to a travel survey were more likely to combine two or more separate trips into a single trip than were face-to-face respondents, a finding the authors attribute to the complexity of the task, the lack of interviewer assistance, and possibly the desire of web respondents to proceed through the interview quickly.[41]

The literature also generally holds that questions which are complex or difficult, and which thus require the respondent to do more cognitive work, are more prone to ‘satisficing’ behaviours, whereby respondents take a variety of mental short cuts to enable them to answer more quickly or easily. The tendency to engage in satisficing may also be influenced by survey mode (see Schouten et al, 2021 for a review). For instance, interviewer-administered modes allow for relatively high levels of interaction, and the mere presence of the interviewer can motivate respondents to engage in more effortful cognitive processing. Opportunities for assistance are also greater in interviewer-administered modes, and this can help respondents go through the appropriate cognitive steps in formulating their answers. On the other hand, as discussed above, the degree of interaction and assistance possible in interviewer-administered modes should not be over-stated, and the restrictions on speed and scheduling in these modes may themselves sometimes lead respondents to engage in satisficing behaviours. Specific examples of satisficing (including acquiescence bias, non-differentiation, and non-substantive responding) and their relationship with survey mode are discussed further below.

Sensitive questions

Encouraging honest answers to sensitive questions was a key concern for stakeholders in the Scottish Government surveys, particularly with regard to the SCJS questions on sensitive crimes and the SHeS questions on health behaviours. They expressed concern that respondents might be less likely to give full or honest answers to sensitive questions without an interviewer present to build trust and reassure them that their answers would be treated confidentially.

The likely impact of mode change on ‘fullness’ of answers to sensitive questions arguably relates to issues of complexity, discussed above. Whether or not respondents are likely to give ‘honest’ answers, however, raises questions around social desirability bias, in particular. Social desirability bias occurs when respondents give answers that they see as more likely to be interpreted as socially acceptable, but which do not reflect their true view, behaviour or experience.

The literature largely indicates that social desirability bias is less likely in self-completion modes than in interviewer-administered modes; it is the presence of an interviewer that leads the respondent to modify their answers to appear more socially acceptable, to avoid embarrassment, or to avoid judgement. There are many examples where web-based surveys appeared to find lower reported levels of behaviours or attitudes that might be considered ‘socially desirable’ compared with interviewer-administered surveys. For example:

  • Rillo and Mikucka (2017) found lower ratings of subjective wellbeing in web than telephone interviews.
  • Nigg et al. (2009) found lower reported levels of physical activity and fruit and vegetable consumption in web than telephone interviews – this is relevant to SHeS, which measures all of these.

This evidence was echoed in expert interviews with other survey commissioners and methodologists – there was a general view that people give more truthful responses to sensitive questions when asked without an interviewer. For example, it was noted that when Understanding Society uses telephone interviews to ‘mop up’ final non-responders, sensitive questions that are usually asked self-complete do not work as well. For this reason, sensitive topics, such as questions about domestic violence, are often asked as a self-complete module, even during face-to-face interviews.

Even some questions that might not be considered particularly ‘sensitive’ as such were noted to be potentially subject to social desirability bias – for example, the Participation survey found much lower levels of use of libraries with their web-first approach compared with levels previously found on Taking Part (face-to-face). While it was not possible to establish which mode was more accurate, it was suggested that the difference might reflect a greater willingness to admit to low or no use of libraries in a self-completion mode.

However, there are a number of additional points that are important to note here. First, in practice, sensitive questions on face-to-face surveys tend to be asked in a CASI self-completion section within the interview – this is the approach taken on the SCJS regarding sexual offences and intimate partner violence, for example. Second, expert interviewees highlighted the need to consider not only mode effects but also ethical issues when considering asking sensitive questions by different modes. In particular, some researchers advise strongly against asking questions on intimate partner violence on a general population survey remotely,[42] without an interviewer present, because of the risk of other people in the household overhearing (when by telephone) or seeing (web-based) people’s responses. This was described by an expert interviewee as both an ethical and a methodological concern, since not being able to control who else is in the room when a survey is completed means you might not get an accurate response on crimes committed in the home. Third, it was also noted that some of the differences in response patterns to sensitive questions by mode may in fact reflect differences in sample profiles (covered in the previous chapter of this report). For example, one interviewee described finding differences in responses to a question on history of homelessness when a survey moved from face-to-face to telephone during the pandemic, but it was unclear whether this was a mode-specific measurement effect, or because fewer people likely to have experienced vulnerabilities were participating.

While the extent of socially desirable responding seems to be broadly similar within interviewer-administered (face-to-face and telephone) and self-administered (web and paper) modes (e.g. Bosnjak, 2017; Meich et al, 2021), some differences have been identified. For instance, a meta-analysis by Gnambs and Kaspar (2017) found more socially desirable responding for reports of sensitive behaviours in paper than in web interviews, a finding the authors suggest is attributable to the greater levels of (perceived) privacy afforded by web interviewing. Greater levels of socially desirable responding have been identified in telephone than in face-to-face interviews by Jäckle et al. (2006), a finding the authors attribute to the lower levels of interaction in telephone than in face-to-face interviews, resulting in less rapport being established between the respondent and the interviewer, and fewer opportunities for the interviewer to assure the respondent of the survey’s legitimacy. This was reflected in observations from an expert interviewee that people were less willing to give responses to questions on their financial circumstances when surveys moved from face-to-face to telephone during the pandemic. Similarly, the technical report for the 2020 SHeS telephone survey suggested that the lower opportunity to build rapport coupled with not being able to include a self-completion section via telephone might have made respondents less inclined to reveal sensitive information, resulting in underestimates of sensitive measures such as those relating to mental health or health behaviours like smoking and alcohol consumption (McLean and Wilson (eds.), 2021).

Attitudinal questions

All three of the main Scottish Government general population surveys currently include a range of attitudinal questions on different topics. For example:

  • The SHS asks about attitudes to local council service provision, views of local neighbourhoods, and attitudes to climate change
  • The SCJS asks about views of local policing, sentencing options, and perceptions of crime levels
  • The SHeS self-completion sections include a number of sets of questions that ask people to assess their mental health and wellbeing. While not ‘attitudinal’ in the same sense as the questions included in the SHS and SCJS, these are arguably subject to similar issues to those discussed below.

Attitudinal questions are subject to a number of response tendencies that can interact with mode to produce mode-specific measurement effects, including social desirability (discussed above), as well as acquiescence bias, non-differentiation, and positivity bias.

Acquiescence bias – also known as ‘yea-saying’ – is a form of weak satisficing that refers to the tendency to agree (rather than disagree) with survey items that make an assertion, given that it takes less effort simply to agree than to think of reasons for disagreement. An example would be the tendency to choose the ‘strongly agree’ or ‘tend to agree’ response options at questions using Likert scales, irrespective of one’s true opinion.

Acquiescence bias is often attributed to surveys being interviewer administered – for example, the desire to be polite to the interviewer or avoid confrontation (Leech 1983), deference to the interviewer (Lenski and Leggett, 1960), and not wishing to admit one’s lack of an opinion to the interviewer (Beukenhorst and Wetzels, 2009). Liu et al (2017) found greater evidence of acquiescence bias in face-to-face compared with web interviews. However, the evidence for differences in acquiescence bias by survey mode is somewhat more mixed than this would suggest (e.g. see Roberts et al, 2019). Some studies have found greater levels of acquiescence bias in telephone surveys than in web surveys (e.g. Braunsberger et al, 2007; Christian et al, 2008). However, elsewhere no evidence of differences in acquiescence bias between telephone and web modes (Fricker et al, 2005) or face-to-face and web modes (Heerwegh, 2009) has been found. There is some evidence that acquiescence bias is greater in telephone than in face-to-face interviews (e.g. Holbrook et al, 2003; Ariel et al, 2008) and greater in telephone than with paper (e.g. Dillman et al, 1996).

Non-differentiation is a form of strong satisficing where the respondent chooses the same response option for each of a series of survey items that use the same response scale. Non-differentiation has been found to be greater in web interviews than in both face-to-face interviews (Chang and Krosnick, 2009, 2010; Heerwegh and Loosveldt, 2008) and telephone interviews (Fricker et al, 2005). Research has also found non-differentiation rates to be greater in telephone than in face-to-face interviews (Krosnick, 2005; Holbrook et al, 2003; Roberts, 2007). This may have particular implications for any ‘sets’ of attitudinal questions the Scottish Government surveys include. It may also have implications for other, non-attitudinal, sets of questions where these use the same response scale.

Positivity bias refers to the tendency for a respondent to choose a positive response option from an answer scale. This form of bias is a particular issue, for instance, for attitudinal questions which commonly use Likert-type scales (for example, agree-disagree questions on statements about council services in the SHS). A meta-analysis by Ye et al. (2011) found that those responding via interviewer-administered modes (telephone or face-to-face) were more likely to choose the most extreme positive response option than were those responding via the self-administered modes of web or paper. Ye and colleagues attribute this tendency to a general reluctance for people to communicate unwelcome information to others, the so-called “MUM” (Mum about Undesirable Messages) effect.

Another potential difference in how people respond to attitudinal questions by different modes relates to use of ‘middle’ options (e.g. ‘neither agree nor disagree’). An expert interviewed for this study noted that they had observed more people opting for ‘neutral’ middle options when questions using a five-point Likert scale were transitioned from face-to-face to web. This mode effect was something they were currently “grappling with”, as they used a number of these satisfaction questions for key performance metrics – they suggested they might need to move to a four-point scale to reduce the level of ‘neutral’ responses (but this would reduce comparability with previous measures – trends over time are discussed further in chapter 8).

Use of ‘don’t know’ and ‘prefer not to say’ options

Selection of ‘neutral’ middle options, along with changes in how people use ‘don’t know’ and ‘prefer not to say’ options, can be described as ‘non-substantive’ answer choices. This does not imply that they are not an accurate reflection of respondents’ views – ‘don’t know’ may capture someone’s attitude more accurately than choosing an agree or disagree statement. However, to the extent that different modes appear to result in different levels of use of non-substantive answer categories, there is a need to take account of this when considering mode-specific measurement effects, since they can have significant implications for response distributions.

In general, making “Don’t know” (DK) and “Prefer not to say” (PNTS) response options explicit (as opposed to these options being accepted only if spontaneously volunteered) greatly increases the proportion of respondents selecting them (e.g. Huskinson et al, 2019). The handling of DK and PNTS response options has the greatest implications between interviewer-administered and self-administered modes, given their primarily aural and visual presentations, respectively. For telephone and face-to-face interviews DK and PNTS response options tend not to be offered explicitly – that is, they are not read out by the interviewer, or printed on showcards (though respondents are told at the outset that the interviewer will accept a DK or a PNTS response if this is offered). For paper and web interviews this approach is not possible, and these response options must be either explicitly presented or absent.

Attempts have been made to mirror the typical interviewer-administered approach to capturing DK and PNTS response options in web interviews, in the interests of maximising data quality and minimising mode effects. For instance de Leeuw et al. (2016) advise against offering an explicit DK response option for web interviews, but instead recommend permitting the respondent to skip the question, with a polite follow-up probe asking them to answer, at which point DK becomes available for selection (see also Al Baghal and Lynn, 2015).
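A minimal sketch of this ‘soft prompt’ logic is shown below, in plain Python. It is a hypothetical illustration of the approach described by de Leeuw et al. (2016) rather than an implementation from any survey platform; the question wording, option labels and function names are assumptions made for the example.

```python
# Hypothetical sketch of the 'soft prompt' handling of don't know (DK) and
# prefer not to say (PNTS) options in a web questionnaire: the options are
# hidden at first and only revealed if the respondent tries to skip.

SUBSTANTIVE = ["Very good", "Good", "Fair", "Bad", "Very bad"]
NON_SUBSTANTIVE = ["Don't know", "Prefer not to say"]

def show_question(question, options):
    """Very simple console renderer: empty input means the item was skipped."""
    print(question)
    for i, option in enumerate(options, start=1):
        print(f"  {i}. {option}")
    choice = input("Enter a number, or press Enter to skip: ").strip()
    if choice.isdigit() and 1 <= int(choice) <= len(options):
        return options[int(choice) - 1]
    return None  # skipped

def ask_with_soft_prompt(question):
    """First pass without DK/PNTS; if skipped, probe politely and reveal them."""
    answer = show_question(question, SUBSTANTIVE)
    if answer is not None:
        return answer
    probe = ("Your answer helps us understand the results. "
             "If you can, please choose an option.")
    answer = show_question(probe + "\n" + question, SUBSTANTIVE + NON_SUBSTANTIVE)
    # A second skip is recorded as item non-response.
    return answer if answer is not None else "No answer"

if __name__ == "__main__":
    print(ask_with_soft_prompt("How is your health in general?"))
```

In practice this pattern would normally be configured through a survey platform’s own routing rather than hand-coded, but the underlying logic is the same: the non-substantive options only become selectable after an initial skip.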

However, even when respondents are informed up-front that DK and PNTS response options will become available should they skip a question without having selected an answer, very few tend to make use of this functionality (Jessop, 2019), with rates of DK and PNTS consequently being far lower than in their interviewer-administered counterpart modes (Huskinson et al, in press). Other researchers have recommended that an explicit DK should only be provided in web interviews for questions where there is a reasonable possibility that respondents will not know the answer, for instance for questions that assume knowledge that the respondent might not possess (Vis-Visschers, 2009).

Questions with long answer lists

Questions with long answer lists are particularly likely to be subject to both ‘primacy’ and ‘recency’ effects that may influence how likely people are to choose answers based on their position in the list. A primacy effect is a manifestation of weak satisficing behaviour where the respondent chooses the first response option (or options) that provide a satisfactory answer to a survey item, ignoring valid response options which are placed later in a list. Greater primacy effects have been observed with modes that rely on visual presentation such as web surveys and face-to-face survey questions using showcards (e.g. Chang and Krosnick, 2010). By contrast a recency effect sees the respondent choose the last response option (or options), ignoring valid ones that are placed earlier. Greater recency effects have been observed with aural presentation of answer options, where the interviewer reads out the answer options (e.g. Cernat and Revilla, 2020). However, primacy and recency effects also tend to be more common for more demanding survey items (e.g. those with long answer lists), and when the respondent’s ability and motivation is low (Roberts et al, 2019).

Open questions

There is more limited evidence in the literature on the impact of mode on responses to open questions. For web interviews, some researchers have found that those completing on mobile devices give shorter answers than those completing on PCs (Mavletova, 2013; Wells et al, 2014), although other researchers have found no differences (Buskirk and Andrus, 2014; Toepoel and Lugtig, 2014). An expert interviewee noted that people may be less likely to respond fully or at all to open questions online compared with other modes, including paper.

Mitigating mode-specific measurement effects

Various approaches to questionnaire design have been proposed to mitigate measurement-specific mode effects. We have already touched on some of the specific techniques that can be applied, including adding sensitive questions into CASI modules in face-to-face surveys, and approaches to dealing with DK and PNTS response options. Approaches to adapting questions that use visual or verbal cues for modes where these are less feasible are discussed in the following chapter (chapter 7).

As discussed earlier in this chapter, the experts interviewed for this study who had been involved in transitioning surveys from one mode to another mode or modes placed a strong emphasis on the importance of investing in questionnaire design and testing to identify, understand and, where possible, mitigate mode effects, either within or between modes. Those who had been unable to implement extensive testing before transitioning to a new mode or modes (primarily because of the need to transition quickly during the Covid-19 pandemic) commented that this had made it very difficult to unpack whether any observed changes in response distributions reflected mode-related measurement effects or changes in sample profile.

It was argued that effective redesign and testing of questionnaires for new modes may well involve completely rethinking what is being asked, how and why. Those who had been involved in a substantial survey redesign process noted that survey commissioners need to be aware that it may identify long-standing face-to-face questions that simply do not work via a new mode:

“Now, that is really challenging, it could mean you have four questions when you only had one in the …or it might mean you actually have to change the wording of a question.” (Expert interviewee E4)

It was also noted that many commonly used face-to-face questions have not actually been cognitively tested, so testing as part of redesigning a survey for a new mode may also reveal long-standing issues of measurement error. These may be difficult to communicate to stakeholders who have relied on that data, although they can also present an opportunity to rethink whether the survey content is actually meeting user needs (a point discussed further in chapter 12). Cognitive testing was seen as particularly important to understanding the differences in how people understand and respond to questions when asked via different modes, particularly when moving from an interviewer-administered to a self-complete mode, where people may read and process key information and definitions differently.

There are different approaches to designing questions for surveys that use mixed modes. A common approach, called "unimode design" (Dillman et al, 2014), aims to make survey questions as similar as possible across modes so that respondents receive the same stimuli and, it is hoped, provide responses that are not influenced by mode. This approach prioritises minimising differences between modes, even if it means potentially sacrificing some accuracy in the responses themselves.

An alternative approach, called "best practices design", focuses on designing the best possible questions for each mode separately, without worrying about uniformity between modes. This approach prioritises getting the most accurate responses for each mode, even if it means potentially increasing differences in responses due to the mode of participation. It is less commonly used because, while it is preferable to a unimode approach where the aim is simply to produce the most accurate point estimates possible, it is less appropriate where population sub-groups are to be compared or where trend analyses are of interest, which is the case for the great majority of social surveys (Tourangeau, 2017).

More recently Wilson and Dickinson (2022) have described what they term an “optimode” design as part of their Respondent Centred Design approach to survey development, which is followed by the ONS (including on their Transformed Labour Market Survey development work – see case study in Appendix A). Here the aim is to optimise questions for each survey mode separately, similar to the ‘best practices’ approach, but with cross-referencing between modes in the hope that both relative and absolute mode effects can be minimised. Wilson and Dickinson argue that this approach provides a better respondent experience and improves data quality. In practice, for surveys that include web as a mode they recommend first designing questions that work optimally for the web mode before taking these questions and amending them as appropriate to work optimally for the other modes of administration. Testing work with respondents and interviewers may lead to further iterations of changes, for instance with an online survey item being adjusted in light of how it is interpreted in the face-to-face mode. Designing optimally for the web mode most often also means designing optimally for completion on smartphones, using a Mobile First approach (Antoun et al, 2018; Couper, 2017).

Alongside these overarching approaches some researchers have developed frameworks that aim to predict which specific types of survey item will be most at risk of measurement-specific mode effects, along with mitigating strategies (e.g. Campanelli et al, 2013). D’Ardenne et al. (2017) built on this framework to assess the risk of mode effects for survey items included in the Understanding Society survey.

Other tools have been developed to assess the quality of survey questions and therefore have some application to assessing the risk of mode effects. These include the Survey Quality Predictor (SQP 2) typology (Saris and Gallhofer, 2007; Saris, 2013), the Questionnaire Appraisal System (QAS) (Willis and Lessler, 1999), and the Question Understanding Aid (QUAID) (Graesser et al, 2006).

However, while the application of good survey design and specific theories and approaches such as RCD or TDM can help to mitigate mode effects, it is unlikely to be possible to fully eradicate them. It is also important to be aware that, while some question types and topics may be more susceptible to mode effects than others, mode effects are not always fully predictable or consistent. For example, the CQC NHS Patient Survey programme saw consistently more negative responses to attitudinal questions, which could not be adjusted for by weighting, when a push-to-web approach was tested against the historical paper approach for the Inpatient Survey, the Urgent and Emergency Care Survey, the Community Mental Health Survey, and the parent sections of the Children and Young People Survey. By contrast, there was no impact on trends for attitudinal questions when similar mode changes were tested on the Maternity Survey or the child sections of the Children and Young People Survey.[43] While assessment of the questions likely to be impacted, rigorous design, and robust testing are all important steps in mitigating mode-specific measurement error, there will likely remain differences in responses between modes that cannot be fully adjusted or controlled for, with implications for time series analysis – discussed in chapter 8.

Summary framework to help guide consideration of future mode on SHS, SHeS and SCJS: Measurement error and mode effects - impact on different question types
Cross-cutting issues - General

  • Priority consideration / issue: Mode effects occur at the question level – so fully assessing the impact of mode change requires assessing impacts at this level too.
    Potential mitigations: Various frameworks exist that can be used to predict which types of survey item will be most at risk of mode effects and/or to assess the quality of survey questions (see for example work to assess the risks of mode effects for questions in Understanding Society).
    Remaining issues and trade-offs: A full consideration of impacts of a mixed mode design on measurement error is likely to require a more detailed review of the full content of each survey, and a considerable number of questions may require redesign (leading to some discontinuity in trend data – see chapter 8). Testing is also necessary, as mode effects can be unpredictable in spite of general patterns.

  • Priority consideration / issue: In general, mode effects are likely to be more limited for simple factual questions, but more likely to occur for complex questions, sensitive questions, and attitudinal or perception questions.
    Potential mitigations: Investment in question design and testing is important to understand the extent of mode effects and test ways of reducing them. Cognitive testing is seen as particularly important to understanding differences in how people understand and respond to questions in different modes. Question redesign can take a number of different approaches (unimode, best practices, optimode) depending on the objective (see TLFS for an example of the ‘optimode’ approach). Respondent Centred Design and Device Agnostic Design principles can also help support the process to ensure questions are fit for purpose across modes / devices.
    Remaining issues and trade-offs: It is generally not possible to know the ‘true’ value – so if there is a difference between modes, it is not possible to be certain which mode is closer to this ‘true’ value. Testing questions on new modes may also reveal issues with long-standing questions that need to be acknowledged and dealt with. Effective redesign may involve completely rethinking what is being asked, how and why. Most importantly, mode effects can be reduced through good design and testing, but cannot be eradicated.

Cross-cutting issues - specific question types/elements

  • Priority consideration / issue: Socially desirable responding is generally a bigger issue within interviewer-administered modes. However, there is evidence that it is more prevalent in telephone than face-to-face interviews, due to the reduced opportunity to build rapport.
    Potential mitigations: Limited mitigation options (beyond standard good practice around reassuring on confidentiality, encouraging honest response, etc.).
    Remaining issues and trade-offs: Not applicable.

  • Priority consideration / issue: There is evidence that people respond to attitudinal questions, especially those with Likert-type scales, differently by different modes (including both between online and interviewer-administered, and between face-to-face and telephone). SHS and SCJS include attitudinal questions, while SHeS includes perception questions that use 5-point Likert scales and may be subject to similar issues.
    Potential mitigations: The greater observed tendency to choose a ‘neutral’ middle option online could be addressed by removing the middle option and having a four-point scale.
    Remaining issues and trade-offs: Removing the ‘middle’ option will impact on comparability over time and may also encourage people to select an answer when, in fact, they don’t have an opinion, meaning the responses are arguably less accurate.

  • Priority consideration / issue: There are likely to be differences in selection of ‘don’t know’ or ‘prefer not to say’ between interviewer-administered and self-complete modes.
    Potential mitigations: One option is to allow respondents to skip questions but, if they do so, politely asking them to answer and making ‘don’t know’ and ‘prefer not to say’ visible at that point.
    Remaining issues and trade-offs: In practice, unless DK and PNTS are shown explicitly upfront (which increases their use), they are used much less often in web surveys than in face-to-face or telephone interviews. There are likely to remain some differences in their use. This may be more of an issue for attitudinal questions, where ‘DK’ is a valid response.

  • Priority consideration / issue: Where surveys rely on visual presentation (web surveys, paper, and showcards), there are more likely to be primacy effects (where respondents pick the first response that applies rather than reading the full list). Where surveys rely on aural presentation (read-outs in face-to-face or telephone interviews), recency effects (choosing the last valid response they heard) are more likely. Both are more of a risk with questions with long answer lists.
    Potential mitigations: Limited mitigation options beyond standard practice of mixing or rotating the order of answer options (more difficult with paper questionnaires).

  • Priority consideration / issue: Open questions may get shorter/less complete responses online.
    Potential mitigations: Limited mitigation options beyond encouraging fuller response.

SHS

  • Priority consideration / issue: Detailed socio-economic status measures have been particularly difficult to transition from interviewer-administered to self-complete. This is relevant to both SHS and SHeS, which ask detailed questions on this to enable derivation of SIC/SOC/NS-SEC categories.
    Potential mitigations: ONS have developed a self-coded set of questions to measure socio-economic classification.
    Remaining issues and trade-offs: There were still felt to be limitations to the self-coded questions.

  • Priority consideration / issue: There is evidence that travel diary data is more difficult to collect online, without an interviewer there to assist and explain.
    Potential mitigations: Limited obvious mitigation. This is one reason the National Travel Survey (NTS) has not moved mode. However, there may be learning from other surveys testing online diaries.

SCJS

  • Priority consideration / issue: There are both methodological and ethical issues around asking questions about within-household crimes (like intimate partner violence) without an interviewer present to control who else sees or hears responses. The role of interviewers in guiding respondents through the victim form to ensure sufficiently detailed responses to enable offence coding was also highlighted.
    Potential mitigations: Limited obvious mitigation. The CSEW does not ask these questions in the telephone follow-up interviews. There may be scope to share learning and/or testing of revised questions with CSEW.

SHeS

  • Priority consideration / issue: See notes under SHS re. detailed socio-economic measures.
    Potential mitigations: See above under SHS re. detailed socio-economic measures.

  • Priority consideration / issue: There is evidence that people may give socially desirable answers to some health behaviour questions (e.g. fruit and vegetable consumption) when interviewer-administered – so moving questions currently asked face-to-face online may result in different response patterns.
    Potential mitigations: See above mitigations for cross-cutting issues.

Contact

Email: sscq@gov.scot
