
Growing Up in Scotland: changes in language ability over the primary school years

This report investigates the improvement of language ability during the primary school years and identifies factors which appear to help and hinder improvement over this period.

Appendix C: Multivariable analysis results

Description of the analysis undertaken

Linear regression analysis

Many of the factors we are interested in are related to each other as well as being related to cognitive ability. For example, parents on lower incomes are also more likely to have lower levels of education and to live in areas of high deprivation. Simple analysis may identify a relationship between income and language ability. However, this relationship may be occurring because of the underlying association between income and education. Thus, it may be the lower level of education among lower-income parents which is associated with a greater likelihood of lower language ability in their children rather than the fact that they are poor. To avoid this difficulty, multivariable regression analysis was used. This analysis allows the examination of the relationships between a dependent (outcome) variable and multiple independent (explanatory) variables whilst controlling for the inter-relationships between each of the independent variables. This means it is possible to identify an independent relationship between any single explanatory variable and the outcome variable; to show, for example, that there is a relationship between income and language ability that does not simply occur because parental education and income are related.
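
The confounding problem described above can be illustrated with simulated data. The following is a minimal sketch, not the analysis carried out for this report: the variable names, the effect sizes and the small least-squares solver are all assumptions made for illustration, and the simulation is unweighted. The outcome is constructed to depend on education only, yet a simple regression on income still shows a clear slope; adding education to the model shrinks the income coefficient towards zero.

```python
import random


def ols(columns, y):
    """Ordinary least squares with an intercept, via the normal equations.

    `columns` is a list of predictor columns; returns [intercept, b1, b2, ...].
    """
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Build the augmented matrix [X'X | X'y].
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]


rng = random.Random(0)
n = 2000
education = [rng.gauss(0, 1) for _ in range(n)]
# Income is correlated with education (hypothetical strength).
income = [0.7 * e + 0.7 * rng.gauss(0, 1) for e in education]
# The outcome depends on education only, not on income.
score = [0.5 * e + rng.gauss(0, 1) for e in education]

simple = ols([income], score)
adjusted = ols([income, education], score)
print("income slope, simple model:  ", round(simple[1], 3))
print("income slope, adjusted model:", round(adjusted[1], 3))
print("education slope, adjusted:   ", round(adjusted[2], 3))
```

In the simple model the income slope picks up the effect of education; in the adjusted model it is close to zero, while the education slope is close to the value used to generate the data.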

The regression models developed for this report were fitted with standardised WIAT-II vocabulary score (z score) measured when the child was in Primary 6 as the dependent variable. Standardised BAS-II vocabulary score (z score) measured at age 5 was included as an independent variable. Measures of social background characteristics and demographics, and various additional factors identified from the literature were also added as independent variables. By including a measure of ability at age 5, the results of this analysis identify characteristics which are associated with a relative change in assessment score between age 5 and Primary 6, after controlling for other, potentially confounding, characteristics. Note, though, that the identification of associations between one or more independent variables and a dependent variable does not necessarily imply that the independent variable(s) causes the dependent variable (the outcome).
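
Standardising a score means re-expressing it as a number of standard deviations from the mean. A minimal sketch follows; the raw scores are made up, and the use of the unweighted sample mean and the population standard deviation is an assumption, since the text here does not say which reference distribution was used.

```python
from statistics import mean, pstdev


def standardise(scores):
    """Convert raw scores to z scores: (score - mean) / standard deviation."""
    m = mean(scores)
    sd = pstdev(scores)
    return [(s - m) / sd for s in scores]


raw_scores = [92, 100, 104, 110, 119]  # hypothetical raw assessment scores
z_scores = standardise(raw_scores)
print([round(z, 2) for z in z_scores])
```

Because both vocabulary scores are placed on this scale, the regression coefficients in the tables can be read in standard-deviation units.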

The characteristics, experiences and circumstances considered in the analysis are outlined in Table 4-1. Readers should note that, to ensure consistency in the analysis, a separate category ('No information') was created for variables with a high number of missing values (e.g. income). For variables with fewer missing values (roughly fewer than 100 cases), cases with missing values were added to the modal (most common) category. Further details are provided in Appendix A.
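
The handling of missing values described above can be sketched as a small recoding function. This is an illustration only: the function name, the use of `None` for a missing value and the example data are hypothetical, and the exact cut-off is an assumption because the text gives it only approximately.

```python
from collections import Counter

NO_INFO = "No information"


def recode_missing(values, threshold=100):
    """Recode missing values (None) following the rule described in the text.

    If more than `threshold` values are missing, they form their own
    'No information' category; otherwise they join the modal category.
    The exact cut-off is an assumption: the report says only "~<100".
    """
    n_missing = sum(v is None for v in values)
    if n_missing == 0:
        return list(values)
    if n_missing > threshold:
        fill = NO_INFO
    else:
        # Most common observed (non-missing) category.
        fill = Counter(v for v in values if v is not None).most_common(1)[0][0]
    return [fill if v is None else v for v in values]


# Hypothetical data, with a small threshold so both branches are visible.
income = ["low", None, "high", None, None, "low"]
language = ["English only", "English only", None, "Other"]
print(recode_missing(income, threshold=2))
print(recode_missing(language, threshold=2))
```

This is why several reference categories in the tables are labelled '(incl. missing)', while income and the school-level variables have a separate 'No information' row.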

Note also that only children with valid vocabulary scores at both time points were included in the analysis (36 children with a valid vocabulary score at age 5/Primary 1 were excluded from the analysis because there was no valid vocabulary score at Primary 6). Furthermore, data were weighted using the GUS longitudinal survey weight, meaning that only cases which have taken part in every face-to-face sweep of GUS up to and including sweep 8 were included. In total, 2726 children were included in the analysis.

The regression analysis was carried out in four stages:

  • Stage 1: Univariate linear regression models (Tables C-1 to C-24)
    • First, to examine the relationship between the two standardised vocabulary scores used in the analysis, a univariate linear regression model was fitted with standardised expressive vocabulary score at Primary 6 as the dependent variable and standardised expressive vocabulary score at Primary 1 as the only independent variable.
    • Individual linear regression models were then fitted for each of the factors outlined in Table A. In each of these models standardised expressive vocabulary score at Primary 6 was the dependent variable, and standardised expressive vocabulary score at Primary 1 was included as a covariate.
  • Stage 2: Multivariable model with Stage 1 significant factors (Table C-25)
    • The next stage of analysis involved entering the factors which were significant at the 90% level (p<0.10) in Stage 1 into a single regression model. This analysis explored the extent to which each factor remained independently associated with a relative improvement or decline in language ability over the primary school period after controlling for the influence of other factors, including social background.
  • Stage 3: Multivariable model with Stage 2 significant factors (Table C-26)
    • In the third stage of the analysis, a final model was created including only those factors which were significant at the 90% level in the stage 2 model. This is referred to as the 'final model'.
  • Stage 4: Stage 3 multivariable model with interaction effects (Table C-27)
    • To explore whether associations differed according to parental education, interaction effects were fitted to the stage 3 model (the 'final model') between parental education and each of the independent variables except for Primary 1 vocabulary score.
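
The selection rule applied between stages can be sketched using the p-values reported in Tables C-2 to C-25. This is an illustration, not the procedure actually run for the report, and it rests on two assumptions: that the overall p-value is used for variables with more than two categories, and that child's gender is always retained. The second assumption is inferred from Tables C-25 and C-26, where gender appears despite having p > 0.10; the text does not state this rule. The shortened factor names and the function name are hypothetical.

```python
def select_factors(p_values, alpha=0.10, always_keep=()):
    """Return factors with p < alpha, plus any that are always retained."""
    return sorted(name for name, p in p_values.items()
                  if p < alpha or name in always_keep)


# Stage 1 p-values as reported in Tables C-2 to C-24.
stage1_p = {
    "Child's gender": 0.365,
    "Household income": 0.025,
    "Area deprivation (SIMD)": 0.049,
    "Parental education": 0.000,
    "Urban/rural location": 0.005,
    "Languages spoken in household": 0.120,
    "Parent literacy issues": 0.866,
    "Parent mental wellbeing": 0.456,
    "Parent limiting health problem": 0.370,
    "Child limiting health problem": 0.937,
    "Child SEB difficulties": 0.001,
    "Parental separation/re-partnering": 0.600,
    "Changed school": 0.279,
    "Adverse life event": 0.818,
    "Feelings about school": 0.335,
    "P1 school size": 0.228,
    "Denominational school": 0.613,
    "% registered for FSM": 0.095,
    "Parent-child warmth": 0.440,
    "Parent interactions with school": 0.005,
    "Help with school information": 0.333,
    "Home reading": 0.002,
    "Parent belief in influence": 0.089,
}

# Stage 2 p-values as reported in Table C-25.
stage2_p = {
    "Child's gender": 0.122,
    "Area deprivation (SIMD)": 0.224,
    "Household income": 0.555,
    "Parental education": 0.012,
    "Urban/rural location": 0.021,
    "% registered for FSM": 0.799,
    "Child SEB difficulties": 0.011,
    "Home reading": 0.023,
    "Parent interactions with school": 0.340,
    "Parent belief in influence": 0.453,
}

keep = ("Child's gender",)  # assumption inferred from the tables
stage2_factors = select_factors(stage1_p, always_keep=keep)
stage3_factors = select_factors(stage2_p, always_keep=keep)
print(stage2_factors)
print(stage3_factors)
```

Under these assumptions the rule selects the ten factors shown in Table C-25 and the five factors shown in Table C-26.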

Interpreting the tables

The weighted sample size for each category is provided in the 'Weighted base' column. The reference category for each categorical variable is marked '(ref)'; its row shows the weighted base only, with dashes in place of the p-value, coefficient and confidence interval.

All figures quoted in this report have a margin of error because they are estimates based on a sample of children, rather than all children. The p-value is an estimate of how likely it is that we would find a relationship at least this strong in our sample of children if there was no actual relationship in the population (i.e., broadly speaking, among children in Scotland who are the same age as the GUS children but who are not part of GUS). Thus, the smaller the p-value, the more confident we can be that our results are likely to apply to children in Scotland more widely; p-values below 0.05 are treated here as statistically significant.

For continuous independent variables (covariates) the regression coefficient ('Coeff') illustrates the relative level of change (positive or negative) in language ability score at P6 if score at P1 is increased by 1 unit (one standard deviation, since the scores are standardised). A significant (p<0.05) positive coefficient denotes a relative improvement in ability score and a significant negative coefficient denotes a relative decline in ability score for every one-unit increase in P1 score.

For categorical independent variables (factors) the regression coefficient ('Coeff') illustrates the relative level of difference (positive or negative) in language ability for each sub-group as compared with the reference sub-group. A significant (p<0.05) positive coefficient denotes a higher ability score and a significant negative coefficient denotes a lower ability score when compared with the reference sub-group. The reference sub-group is indicated in brackets.
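
As a worked illustration of how both kinds of coefficient are read, the sketch below uses the two coefficients reported in Table C-5. The function name is hypothetical. Because the intercept is not reported in the table, only differences between two children can be computed, not predicted scores themselves.

```python
# Coefficients as reported in Table C-5.
COEF_P1 = 0.411      # per one-unit (1 SD) difference in P1 vocabulary z score
COEF_DEGREE = 0.180  # degree level or above vs. below degree (reference)


def predicted_p6_difference(p1_diff, degree_a, degree_b):
    """Predicted P6 z-score difference between child A and child B.

    The intercept cancels in a difference, so it is not needed
    (it is not reported in the table).
    """
    return COEF_P1 * p1_diff + COEF_DEGREE * (int(degree_a) - int(degree_b))


# Same P1 score; only child A has a degree-educated parent.
print(round(predicted_p6_difference(0.0, True, False), 3))
# Same parental education; child A scored 1 SD higher at P1.
print(round(predicted_p6_difference(1.0, False, False), 3))
```

The first call returns the categorical coefficient (a difference relative to the reference sub-group) and the second returns the covariate coefficient (the difference associated with a one-unit change in P1 score).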

The 95% confidence interval is an indication of the level of uncertainty in the coefficient estimate.
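
A 95% confidence interval that does not include zero corresponds to a coefficient that is significant at p<0.05. The sketch below checks this against three rows taken from Tables C-2, C-5 and C-12. The function names are hypothetical, and the standard error calculation assumes a symmetric normal-approximation interval, which may differ slightly from the method used by the survey software.

```python
def ci_excludes_zero(lower, upper):
    """True if a confidence interval lies wholly above or wholly below zero."""
    return lower > 0 or upper < 0


def approx_standard_error(lower, upper, z=1.96):
    """Back out an approximate standard error from a 95% interval.

    Assumes a symmetric normal-approximation interval; the software
    used for the report may have used a t distribution instead.
    """
    return (upper - lower) / (2 * z)


# (label, p-value, lower, upper) as reported in Tables C-2, C-5 and C-12.
rows = [
    ("Male vs female", 0.365, -0.042, 0.112),
    ("Degree vs below degree", 0.000, 0.100, 0.260),
    ("Above average difficulties", 0.001, -0.310, -0.087),
]
for label, p, lower, upper in rows:
    print(label, ci_excludes_zero(lower, upper), p < 0.05,
          round(approx_standard_error(lower, upper), 3))
```

For each row the two checks agree: the interval for gender spans zero and its p-value is above 0.05, while the other two intervals exclude zero and have p-values below 0.05.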

Table C‑1 Linear regression model predicting standardised expressive vocabulary score at P6, by standardised expressive vocabulary score at P1

Weighted base p-value Coeff 95% Confidence interval
Standardised vocabulary score (P1) 2698 .000 0.431 0.385 0.476
R squared 0.172
Total N (unweighted) 2726
Total N (weighted) 2698

Table C‑2 Linear regression model predicting standardised expressive vocabulary score at P6, by standardised expressive vocabulary score at P1 and child's gender

Weighted base p-value Coeff 95% confidence interval
Standardised vocabulary score (P1) 2698 .000 .432 .386 .478
Child's gender
Male 1374 .365 .035 -.042 .112
Female (ref) 1324 - - - -
R Squared 0.172
Total N (unweighted) 2726
Total N (weighted) 2698

Table C‑3 Linear regression model predicting standardised expressive vocabulary score at P6, by standardised expressive vocabulary score at P1 and equivalised annual household income

Weighted base p-value Coeff 95% confidence interval
Standardised vocabulary score (P1) 2698 .000 .411 .363 .459
Equivalised annual household Income .025
No Information 160 .803 .027 -.189 .243
Top Quintile (>=£37,857) 358 .001 .244 .099 .389
4th Quintile (>=£29,126<£37,857) 525 .041 .150 .007 .293
3rd Quintile (>=£19,643<£29,126) 430 .035 .175 .012 .337
2nd Quintile (>=£12,217<£19,643) 605 .132 .097 -.030 .225
Lowest Quintile (<£12,217) (ref) 620 - - - -
R Squared 0.178
Total N (unweighted) 2726
Total N (weighted) 2698

Table C‑4 Linear regression model predicting standardised expressive vocabulary score at P6, by standardised expressive vocabulary score at P1 and level of area deprivation (SIMD)

Weighted base p-value Coeff 95% confidence interval
Standardised vocabulary score (P1) 2698 .000 .419 .373 .466
Area Deprivation (SIMD) .049
Least Deprived Quintile 523 .029 .153 .016 .290
4th Quintile 556 .026 .153 .019 .286
3rd Quintile 510 .002 .222 .085 .358
2nd Quintile 508 .078 .122 -.014 .259
Most Deprived Quintile (ref) 602 - - - -
R Squared 0.178
Total N (unweighted) 2726
Total N (weighted) 2698

Table C‑5 Linear regression model predicting standardised expressive vocabulary score at P6, by standardised expressive vocabulary score at P1 and parental education

Weighted base p-value Coeff 95% confidence interval
Standardised vocabulary score (P1) 2698 .000 .411 .364 .458
Highest level of parental education
Degree level or above 964 .000 .180 .100 .260
Below degree (incl. missing) (ref) 1734 - - - -
R Squared 0.179
Total N (unweighted) 2726
Total N (weighted) 2698

Table C‑6 Linear regression model predicting standardised expressive vocabulary score at P6, by standardised expressive vocabulary score at P1 and urban/small town or rural location

Weighted base p-value Coeff 95% confidence interval
Standardised vocabulary score (P1) 2698 .000 .424 .379 .469
Urban/rural location
Small town or rural 852 .005 .139 .045 .233
Urban (incl. missing) (ref) 1846 - - - -
R Squared 0.176
Total N (unweighted) 2726
Total N (weighted) 2698

Table C‑7 Linear regression model predicting standardised expressive vocabulary score at P6, by standardised expressive vocabulary score at P1 and languages spoken in the household

Weighted base p-value Coeff 95% confidence interval
Standardised vocabulary score (P1) 2698 .000 .434 .387 .481
Languages spoken in household
Other language(s) spoken 141 .120 .133 -.036 .302
English only (incl. missing) (ref) 2558 - - - -
R Squared 0.173
Total N (unweighted) 2726
Total N (weighted) 2698

Table C‑8 Linear regression model predicting standardised expressive vocabulary score at P6, by standardised expressive vocabulary score at P1 and parent literacy issues

Weighted base p-value Coeff 95% confidence interval
Standardised vocabulary score (P1) 2698 .000 .430 .384 .477
Parent literacy issues
One or more literacy issues 338 .866 -.013 -.171 .145
No literacy issues (incl. missing) (ref) 2361 - - - -
R Squared 0.172
Total N (unweighted) 2726
Total N (weighted) 2698

Table C‑9 Linear regression model predicting standardised expressive vocabulary score at P6, by standardised expressive vocabulary score at P1 and parent mental wellbeing

Weighted base p-value Coeff 95% confidence interval
Standardised vocabulary score (P1) 2698 .000 .433 .386 .480
Parent mental wellbeing
Below average at sweep 5 and/or sweep 8 632 .456 .036 -.060 .132
Average or above at both sweeps (incl. missing) (ref) 2066 - - - -
R Squared 0.172
Total N (unweighted) 2726
Total N (weighted) 2698

Table C‑10 Linear regression model predicting standardised expressive vocabulary score at P6, by standardised expressive vocabulary score at P1 and whether parent has limiting health problem

Weighted base p-value Coeff 95% confidence interval
Standardised vocabulary score (P1) 2698 .000 .431 .385 .478
Parent limiting long-term health problem
Parent had limiting health problem at sweep 5 and/or sweep 8 356 .370 .056 -.068 .179
Parent had no limiting health problem (incl. missing) (ref) 2342 - - - -
R Squared 0.172
Total N (unweighted) 2726
Total N (weighted) 2698

Table C‑11 Linear regression model predicting standardised expressive vocabulary score at P6, by standardised expressive vocabulary score at P1 and whether child has a limiting health problem

Weighted base p-value Coeff 95% confidence interval
Standardised vocabulary score (P1) 2698 .000 .430 .384 .477
Child limiting long-term health problem
Child had limiting health problem at sweep 5 and/or sweep 8 269 .937 -.007 -.174 .161
Child had no limiting health problem (incl. missing) (ref) 2429 - - - -
R Squared 0.172
Total N (unweighted) 2726
Total N (weighted) 2698

Table C‑12 Linear regression model predicting standardised expressive vocabulary score at P6, by standardised expressive vocabulary score at P1 and whether child had above average levels of social, emotional and behavioural difficulties

Weighted base p-value Coeff 95% confidence interval
Standardised vocabulary score (P1) 2698 .000 .418 .371 .465
Child's social, emotional & behavioural difficulties
Above average difficulties at sweep 5 and/or sweep 8 433 .001 -.198 -.310 -.087
Average levels of difficulties at both sweeps (incl. missing) (ref) 2264 - - - -
R Squared 0.177
Total N (unweighted) 2726
Total N (weighted) 2698

Table C‑13 Linear regression model predicting standardised expressive vocabulary score at P6, by standardised expressive vocabulary score at P1 and whether child experienced parental separation or re-partnering

Weighted base p-value Coeff 95% confidence interval
Standardised vocabulary score (P1) 2698 .000 .430 .384 .475
Parental separation/re-partnering
Change in family type 432 .600 -.036 -.171 .100
Stable family type (incl. missing) (ref) 2266 - - - -
R Squared 0.172
Total N (unweighted) 2726
Total N (weighted) 2698

Table C‑14 Linear regression model predicting standardised expressive vocabulary score at P6, by standardised expressive vocabulary score at P1 and whether child changed school

Weighted base p-value Coeff 95% confidence interval
Standardised vocabulary score (P1) 2698 .000 .429 .383 .475
Whether child changed school
Changed school 419 .279 -.063 -.180 .053
Did not change school (incl. did not attend school and missing) (ref) 2280 - - - -
R Squared 0.173
Total N (unweighted) 2726
Total N (weighted) 2698

Table C‑15 Linear regression model predicting standardised expressive vocabulary score at P6, by standardised expressive vocabulary score at P1 and whether child experienced significant adverse life event

Weighted base p-value Coeff 95% confidence interval
Standardised vocabulary score (P1) 2698 .000 .431 .385 .476
Whether child experienced significant adverse life event
Significant adverse event occurred 327 .818 .015 -.116 .147
No significant adverse event (incl. missing) (ref) 2371 - - - -
R Squared 0.172
Total N (unweighted) 2726
Total N (weighted) 2698

Table C‑16 Linear regression model predicting standardised expressive vocabulary score at P6, by standardised expressive vocabulary score at P1 and child's feelings about school

Weighted base p-value Coeff 95% confidence interval
Standardised vocabulary score (P1) 2698 .000 .431 .385 .477
Child's feelings about school
Highly positive 824 .335 .042 -.044 .128
Less positive (incl. missing) (ref) 1874 - - - -
R Squared 0.172
Total N (unweighted) 2726
Total N (weighted) 2698

Table C‑17 Linear regression model predicting standardised expressive vocabulary score at P6, by standardised expressive vocabulary score at P1 and size of P1 school

Weighted base p-value Coeff 95% confidence interval
Standardised vocabulary score (P1) 2698 .000 .431 .384 .478
Number of pupils in P1 School .228
No information 205 .445 .075 -.121 .271
0-100 215 .538 .057 -.127 .240
101-200 629 .495 .058 -.111 .227
201-300 678 .619 -.043 -.214 .129
301-400 688 .390 -.065 -.215 .085
More than 400 (ref) 284 - - - -
R Squared 0.175
Total N (unweighted) 2726
Total N (weighted) 2698

Table C‑18 Linear regression model predicting standardised expressive vocabulary score at P6, by standardised expressive vocabulary score at P1 and whether P1 school denominational

Weighted base p-value Coeff 95% confidence interval
Standardised vocabulary score (P1) 2698 .000 .431 .385 .477
Whether P1 school a denominational school
Non-denominational (incl. no information) 2011 .613 -.024 -.121 .072
Denominational (any religion) (ref) 687 - - - -
R Squared 0.172
Total N (unweighted) 2726
Total N (weighted) 2698

Table C‑19 Linear regression model predicting standardised expressive vocabulary score at P6, by standardised expressive vocabulary score at P1 and % of children at P1 school registered for free school meals

Weighted base p-value Coeff 95% confidence interval
Standardised vocabulary score (P1) 2698 .000 .422 .375 .470
% of pupils in P1 school registered for FSM .095
No information 205 .464 .057 -.098 .213
More than 25% 633 .068 -.113 -.236 .009
25% or less (ref) 1860 - - - -
R Squared 0.175
Total N (unweighted) 2726
Total N (weighted) 2698

Table C‑20 Linear regression model predicting standardised expressive vocabulary score at P6, by standardised expressive vocabulary score at P1 and warmth of parent-child relationship

Weighted base p-value Coeff 95% confidence interval
Standardised vocabulary score (P1) 2698 .000 .430 .384 .476
Parent-child warmth
High level of warmth 916 .440 .028 -.044 .101
Lower level of warmth (incl. missing) (ref) 1782 - - - -
R Squared 0.172
Total N (unweighted) 2726
Total N (weighted) 2698

Table C‑21 Linear regression model predicting standardised expressive vocabulary score at P6, by standardised expressive vocabulary score at P1 and parental interactions with child's school

Weighted base p-value Coeff 95% confidence interval
Standardised vocabulary score (P1) 2698 .000 .425 .379 .470
Parent interactions with child's school
High (7-10 interactions) 510 .005 .112 .036 .189
Low-Medium (0-6 interactions) (incl. missing) (ref) 2189 - - - -
R Squared 0.174
Total N (unweighted) 2726
Total N (weighted) 2698

Table C‑22 Linear regression model predicting standardised expressive vocabulary score at P6, by standardised expressive vocabulary score at P1 and how often parent helps child look for school-related information

Weighted base p-value Coeff 95% confidence interval
Standardised vocabulary score (P1) 2698 .000 .430 .384 .476
How often parent helps child look for school-related information
Most days 493 .333 -.059 -.180 .062
Less often (incl. missing) (ref) 2205 - - - -
R Squared .173
Total N (unweighted) 2726
Total N (weighted) 2698

Table C‑23 Linear regression model predicting standardised expressive vocabulary score at P6, by standardised expressive vocabulary score at P1 and home reading

Weighted base p-value Coeff 95% confidence interval
Standardised vocabulary score (P1) 2698 .000 .420 .372 .468
Home reading in last week
Most days (6-7 days) (incl. missing) 1557 .002 .139 .053 .225
5 days or less (ref) 1141 - - - -
R Squared .177
Total N (unweighted) 2726
Total N (weighted) 2698

Table C‑24 Linear regression model predicting standardised expressive vocabulary score at P6, by standardised expressive vocabulary score at P1 and parental belief they can influence child's achievements at school

Weighted base p-value Coeff 95% confidence interval
Standardised vocabulary score (P1) 2698 .000 .427 .381 .473
Parent belief they can influence child's achievements at school
Highly positive (strongly agree) 1116 .089 .066 -.010 .143
Less positive (incl. missing) (ref) 1582 - - - -
R Squared .173
Total N (unweighted) 2726
Total N (weighted) 2698

Table C‑25 Linear regression model predicting standardised expressive vocabulary score at P6 - by factors individually associated with change in univariate analysis

Weighted base p-value Coeff 95% confidence interval
Standardised vocabulary score (P1) 2698 .000 .384 .335 .433
Child's gender
Boy 1374 .122 .060 -.017 .137
Girl (ref) 1324 - - - -
Area deprivation (SIMD) .224
Least deprived quintile 523 .884 -.012 -.175 .151
4th quintile 556 .938 -.006 -.162 .150
3rd quintile 510 .157 .103 -.041 .248
2nd quintile 508 .445 .052 -.084 .189
Most deprived quintile (ref) 602 - - - -
Equivalised annual household Income .555
No information 160 .748 -.034 -.242 .174
Top quintile (>=£37,857) 358 .126 .128 -.037 .293
4th quintile (>=£29,126<£37,857) 525 .591 .042 -.112 .195
3rd quintile (>=£19,643<£29,126) 430 .188 .103 -.052 .258
2nd quintile (>=£12,217<£19,643) 605 .435 .052 -.080 .184
Lowest quintile (<£12,217) (ref) 620 - - - -
Highest level of parental education
Degree level or above 964 .012 .118 .026 .209
Below degree (incl. missing) (ref) 1734 - - - -
Urban/small town or rural location
Small town or rural 852 .021 .118 .018 .218
Urban (incl. missing) (ref) 1846 - - - -
% of pupils in P1 school registered for free school meals .799
No information 205 .508 .051 -.101 .202
More than 25% 633 .908 .008 -.130 .146
25% or less (ref) 1860 - - - -
Child's social, emotional and behavioural difficulties
Above average difficulties at sweep 5 and/or sweep 8 433 .011 -.163 -.286 -.039
Average levels of difficulties at both sweeps (incl. missing) (ref) 2265 - - - -
Home reading in last week
Most days (6-7 days) (incl. missing) 1557 .023 .107 .015 .200
5 days or less (ref) 1141 - - - -
Parent interactions with child's school
High (7-10 interactions) 510 .340 .035 -.038 .109
Low-Medium (0-6 interactions) (incl. missing) (ref) 2189 - - - -
Parent belief they can influence child's achievements at school
Highly positive (strongly agree) 1116 .453 .029 -.047 .105
Less positive (incl. missing) (ref) 1582 - - - -
R Squared 0.195
Total N (unweighted) 2726
Total N (weighted) 2698

Table C‑26 Linear regression model predicting standardised expressive vocabulary score at P6 - final model

Weighted base p-value Coeff 95% confidence interval
Standardised vocabulary score (P1) 2698 .000 .391 .342 .440
Child's gender
Boy 1374 .119 .063 -.017 .142
Girl (ref) 1324 - - - -
Highest level of parental education
Degree level or above 964 .001 .143 .061 .225
Below degree (incl. missing) (ref) 1734 - - - -
Urban/small town or rural location
Small town or rural 852 .013 .124 .027 .220
Urban (incl. missing) (ref) 1846 - - - -
Child social, emotional & behavioural difficulties
Above average difficulties at sweep 5 and/or sweep 8 433 .002 -.179 -.290 -.068
Average levels of difficulties at both sweeps (incl. missing) (ref) 2265 - - - -
Home reading in last week
Most days (6-7 days) (incl. missing) 1557 .012 .118 .027 .209
5 days or less (ref) 1141 - - - -
R Squared .191
Total N (unweighted) 2726
Total N (weighted) 2698

Table C‑27 Linear regression model predicting standardised expressive vocabulary score at P6 - final model with interaction effects

Weighted base p-value Coeff 95% confidence interval
Standardised vocabulary score (P1) 2698 .000 .391 .342 .440
Child's gender .060
Boy 1374 .713 .019 -.085 .124
Girl (ref) 1324 - - - -
Highest level of parental education .138
Degree level or above 964 .566 .047 -.116 .210
Below degree (incl. missing) (ref) 1734 - - - -
Urban/small town or rural location .017
Small town or rural 852 .015 .153 .031 .275
Urban (incl. missing) (ref) 1846 - - - -
Child social, emotional & behavioural difficulties .004
Above average difficulties at sweep 5 and/or sweep 8 433 .021 -.167 -.307 -.026
Average levels of difficulties at both sweeps (incl. missing) (ref) 2265 - - - -
Home reading in last week .003
Most days (6-7 days) (incl. missing) 1557 .208 .079 -.045 .204
5 days or less (ref) 1141 - - - -
Interaction effects
Parental education * Child's gender - .156 - - -
Parental education * Urban/small town or rural location - .376 - - -
Parental education * Child social, emotional and behavioural difficulties - .850 - - -
Parental education * How often parent reads with child - .223 - - -
R Squared .192
Total N (unweighted) 2726
Total N (weighted) 2698



