
Student Finance and Wellbeing Study (SFWS) Scotland 2023-2024: technical report

Provides information on the methodology used for the Student Finance and Wellbeing Study Scotland for academic year 2023 to 2024 and its strengths and limitations.


5. Dataset

Key variables

The full dataset includes source and derived variables from the survey data for FE, HN/UG and PG students. A list of key variables, including the break variables used in analysis, can be found in the Appendices to this technical report.

Extreme values

Once the summary measures of income, spending, borrowing and savings had been created and tested, they were reviewed by the Research Team. This allowed any infeasible answers to be corrected (e.g. amounts recorded as annual amounts when they were intended to be term-time amounts, or vice versa). In addition, outliers were identified and trimmed to the highest amount within the accepted range, to avoid skewing the analyses.

Details about the variables that have been trimmed can be found in the Appendices to this technical report.
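
As an illustration only, the trimming step could be sketched as follows. The function name, the example amounts and the upper limit are hypothetical, and the sketch assumes that "the highest amount within the accepted range" means the largest observed value that does not exceed the limit. The limits actually applied are those listed in the Appendices.

```python
def trim_to_accepted_range(values, upper_limit):
    """Replace values above upper_limit with the highest observed value
    that falls within the accepted range. Missing values (None) are kept."""
    in_range = [v for v in values if v is not None and v <= upper_limit]
    if not in_range:
        # Nothing falls within the accepted range, so there is no cap to apply.
        return list(values)
    cap = max(in_range)
    return [v if v is None or v <= upper_limit else cap for v in values]


# Hypothetical amounts in pounds; 250000 is treated as an outlier.
amounts = [1200, 3500, None, 4800, 250000]
trimmed = trim_to_accepted_range(amounts, upper_limit=20000)
print(trimmed)
```

In this sketch the outlier is kept in the data but capped, so the respondent is not lost from the analysis.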

Missing values and imputation

Missing values occur when a respondent provides no answer, or when they give a response of ‘prefer not to say’, ‘refuse to answer’ or ‘don’t know’. A different approach to dealing with missing values has been used for the income section than for the expenditure and savings and debt sections.

The SFWS questionnaire includes a number of question ‘sets’ which build to provide a figure for each element of student income. For example, in most cases students were asked whether or not they received each of a range of different sources of income (such as specific types of student bursary, e.g. a Young Student Bursary, or targeted payments, e.g. Disabled Student Allowance). If they stated that they received a specific type of income, they were then asked how frequently they received this income, and the regular amount received. The answers to these questions were then used to calculate, for each student, the total amount received from that particular source of income (this is a derived variable). The median and mean values were then calculated from this derived variable across all students.
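
A minimal sketch of how one question set might build into a derived variable is shown below. The frequency categories, their multipliers and the example answers are hypothetical and are not taken from the SFWS questionnaire; the sketch only shows the general shape of the calculation (received or not, then frequency multiplied by regular amount).

```python
from statistics import mean, median

# Hypothetical number of payments in the reference period for each frequency.
PAYMENTS_PER_PERIOD = {"weekly": 52, "monthly": 12, "termly": 3, "one-off": 1}


def derived_amount(received, frequency, regular_amount):
    """Total amount for one income source for one student."""
    if not received:
        return 0.0  # student does not receive this source of income
    if frequency is None or regular_amount is None:
        return None  # missing answer; handled separately by imputation
    return PAYMENTS_PER_PERIOD[frequency] * regular_amount


# (received, frequency, regular amount) for three hypothetical students.
answers = [
    (True, "monthly", 200.0),
    (True, "termly", 500.0),
    (False, None, None),
]
totals = [derived_amount(*a) for a in answers]
print(totals, mean(totals), median(totals))
```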

Missing values could occur at any one of the questions that make up a set and, if left untreated, would lead to a missing value for the derived variable. For example, the survey asked students about different sources of income, and the amount of income for each source was then summed to produce the derived variable for total income. At any of the income questions a respondent may have chosen ‘prefer not to say’, but this did not exclude them from having a total income calculated for them.

To ensure that respondents who had not given a response to an income question were still included, it was decided to give missing values an imputed value. This retained all the cases for analysis and made full use of the data that students did provide. Imputed values were either a zero value or a median recipient value (based on the median value of a similar group of recipients). Given the large number of derived variables, each made up of several items, the cumulative impact of missing values was significant enough to warrant this approach.

Zero values were used when there was insufficient additional data to be able to assume a non-zero value (either from the respondents' other answers to the questions in that ‘set’ or from the answers to that specific question/variable from other similar respondents). Non-zero values were used when there was sufficient additional data to be able to estimate a likely response value.
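
The imputation rule described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the procedure used by the Research Team: the record layout, the `group` field standing in for "a similar group of recipients", and the rule for deciding when there is "sufficient additional data" (here, the respondent said they received the income and at least one similar recipient gave an amount) are all hypothetical.

```python
from statistics import median


def impute_income(records):
    """Return one amount per record, imputing missing amounts.

    Each record is a dict with keys:
      'group'    - label used to find similar respondents (hypothetical)
      'received' - True, False, or None if not answered
      'amount'   - float, or None if missing
    """
    # Median amount among recipients with a known amount, per group.
    known = {}
    for r in records:
        if r["received"] is True and r["amount"] is not None:
            known.setdefault(r["group"], []).append(r["amount"])
    recipient_median = {g: median(v) for g, v in known.items()}

    imputed = []
    for r in records:
        if r["amount"] is not None:
            imputed.append(r["amount"])  # answer given, keep as is
        elif r["received"] is True and r["group"] in recipient_median:
            imputed.append(recipient_median[r["group"]])  # non-zero imputation
        else:
            imputed.append(0.0)  # insufficient data to assume a non-zero value
    return imputed


records = [
    {"group": "FT", "received": True, "amount": 1000.0},
    {"group": "FT", "received": True, "amount": 3000.0},
    {"group": "FT", "received": True, "amount": None},
    {"group": "FT", "received": None, "amount": None},
    {"group": "PT", "received": False, "amount": None},
]
result = impute_income(records)
print(result)
```

Every record comes back with a value, which is what keeps the base consistent across the income analysis.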

This approach follows that of the England and Wales SIES. It ensures that a consistent base is used throughout the analysis of income. It has the added benefits that the mean values of each element of student income sum to the mean value of total student income, and that it is possible to estimate the proportion of income among students coming from each source.
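
The reason the means add up can be shown in a short derivation. The notation is introduced here for illustration and does not appear in the report: there are n students in the common base, K income sources, x_{ik} is the (imputed where necessary) amount student i receives from source k, and T_i is student i's total income. The result relies on every source being measured on the same n students, and the shares p_k are defined only when the mean total is greater than zero.

```latex
\bar{T} = \frac{1}{n}\sum_{i=1}^{n} T_i
        = \frac{1}{n}\sum_{i=1}^{n}\sum_{k=1}^{K} x_{ik}
        = \sum_{k=1}^{K}\left(\frac{1}{n}\sum_{i=1}^{n} x_{ik}\right)
        = \sum_{k=1}^{K} \bar{x}_k,
\qquad
p_k = \frac{\bar{x}_k}{\bar{T}}, \quad \sum_{k=1}^{K} p_k = 1
```

If the bases differed between sources, the component means would each be divided by a different n and would no longer be guaranteed to sum to the mean total.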

For the SFWS analysis dealing with expenditure, missing values were treated as missing for the analysis (i.e. excluded from each relevant calculation), and different bases were used depending on the most appropriate sample to use (e.g. all students, or only those students who had incurred a specific expenditure, such as parents who had incurred childcare costs). This approach reflects that an expenditure of £0 in a specific category was a valid response, with the following instruction to respondents appearing on screen: “If you have not spent anything on this, please answer 0”.
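
By contrast with the income approach, the treatment of expenditure could be sketched as below. The function name, the `base` parameter and the example amounts are hypothetical; the sketch simply shows missing answers being dropped from the calculation, zeros being kept as valid answers, and the base changing with the question being asked.

```python
from statistics import mean


def mean_expenditure(values, base="all"):
    """Mean expenditure with missing values (None) excluded.

    base="all"      - everyone who answered, including valid 0 answers
    base="incurred" - only those who reported spending more than 0
    """
    answered = [v for v in values if v is not None]
    if base == "incurred":
        answered = [v for v in answered if v > 0]
    return mean(answered) if answered else None


# Hypothetical childcare costs in pounds; None is a missing answer.
childcare = [0.0, 0.0, 300.0, 500.0, None]
mean_all = mean_expenditure(childcare, base="all")
mean_incurred = mean_expenditure(childcare, base="incurred")
print(mean_all, mean_incurred)
```

Here the two bases give different means because the first counts four respondents and the second only the two who incurred the cost.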

Similarly, with the analysis relating to savings and debt, missing values were also treated as missing for the analysis, and therefore different bases were used for each derived variable. Where there were overlaps between income variables and debt variables, the cleaned and imputed variables from the income section were used to ensure consistency.

Construction of strata variable

The strata variable was created to reflect the stratification by full-time or part-time status used for sampling students within institutions. As such it is appropriate to use in analysis, to account for the oversampling of part-time students in the sample targets provided to participating institutions.

Contact

Email: socialresearch@gov.scot
