Hearings in Scottish courts - ethnicity of individuals: analysis

This occasional paper presents new experimental analysis based on the Scottish Courts and Tribunals Service’s management information, on the ethnicity of individuals who were proceeded against and sentenced from April 2016 to February 2023.


6 Technical Annex

6.1 Data collection and the datasets

Multiple data files were provided to Scottish Government Justice Analytical Services (JAS) by SCTS, covering the period April 2016 to February 2023. The data are management information and have not been subject to the same quality assurance processes as data used for Official Statistics, and some data improvements are required. Many of the data improvement issues identified are the subject of ongoing reviews. These results should therefore be considered experimental analysis, and the observations treated with caution.

Information on the Scottish population by ethnicity was taken from Scotland's Census data 2011 [1]. The values that were extracted from the Census are presented in Table 5.

Table 5: Scotland population by ethnicity (2011).

| Ethnicity | Scotland population | Proportion |
|---|---|---|
| African, Caribbean or Black | 36,178 | 0.68% |
| Asian | 140,678 | 2.7% |
| Mixed or Multiple | 19,815 | 0.37% |
| Other | 14,325 | 0.27% |
| White Minority Ethnic | 221,620 | 4.2% |
| White Scottish/White Other British | 4,862,787 | 92% |
| Total | 5,295,403 | |

The hearings and disposals datasets provided by SCTS were not designed to be linked. As discussed in section 3.2, doing so results in some inconsistencies in the number of convictions and acquittals observed. In order to test the sensitivity of the results to these inconsistencies, three processing methods were applied:

  • Include all cases where both datasets are consistent (in terms of acquittals and convictions) and exclude other cases.
  • Treat all cases where at least one of the datasets indicates a conviction as a conviction, and include acquittals only where the datasets are consistent.
  • Treat all cases where at least one of the datasets indicates an acquittal as an acquittal, and include convictions only where the datasets are consistent.
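The three processing methods above can be sketched as a single reconciliation rule applied case by case. This is a minimal illustration only, not the code used for the analysis: the outcome labels, the method names and the `reconcile` function are hypothetical, and it assumes each dataset records one verdict per case, or `None` where no verdict is recorded.

```python
from typing import Optional

# Hypothetical outcome labels; the real SCTS datasets may code verdicts differently.
CONVICTED = "convicted"
ACQUITTED = "acquitted"


def reconcile(hearings: Optional[str], disposals: Optional[str], method: str) -> Optional[str]:
    """Return the verdict to analyse for one case, or None to exclude the case.

    method is one of (hypothetical names):
      "consistent_only"   - keep only cases where both datasets agree
      "favour_conviction" - any conviction counts; acquittals only if consistent
      "favour_acquittal"  - any acquittal counts; convictions only if consistent
    """
    verdicts = (hearings, disposals)
    # Both datasets agree on a verdict: kept under every method.
    if hearings == disposals and hearings in (CONVICTED, ACQUITTED):
        return hearings
    # Inconsistent case: resolve according to the chosen sensitivity method.
    if method == "favour_conviction" and CONVICTED in verdicts:
        return CONVICTED
    if method == "favour_acquittal" and ACQUITTED in verdicts:
        return ACQUITTED
    return None  # excluded from the analysis
```

Running the same models on the three resulting case sets shows whether any conclusion depends on how the inconsistent cases are resolved.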

6.2 Dependent and independent variables

The dependent variable (sometimes known as the response, outcome, target or criterion variable) is one which depends on other factors. This analysis involves multiple research questions, each with a different dependent variable. These include:

  • Charges that the individual was found guilty of,
  • Type of sentence, such as fine, community sentence, or imprisonment, and
  • Length of prison sentence in days.

Information on the "charges that the individual was found guilty of" is used to find out whether an individual has been convicted of a crime or not, and is used for the research question: is there a relationship between verdict and ethnicity? Information on "type of sentence, such as fine, community sentence, or imprisonment" is used to answer the research question: if guilty, is there a relationship between sentence and ethnicity? Lastly, the "length of prison sentence in days" stores the sentence length for each charge, and is used to examine whether there is a relationship between sentence length and ethnicity.

Independent variables (sometimes known as predictor, explanatory or regressor variables) are those analysed to study how they may affect the value of the dependent variables. The analysis aims to determine the ethnicity characteristics of individuals. Therefore, the independent variable is "ethnicity". The breakdown of how ethnicity and crime type are grouped can be found in Appendix B and Appendix A respectively. They are grouped into six ethnic groups and nine disposal crime types, as shown in Table 6.

Table 6: Ethnic groups and disposal crime types.

| Ethnic groups | Disposal crime types |
|---|---|
| African, Caribbean or Black | Non-sexual crimes of violence |
| Asian | Sexual crimes |
| Mixed or multiple ethnic groups | Crimes of dishonesty |
| Other ethnic group | Damage and reckless behaviour |
| White Minority Ethnic | Crimes against society |
| White Scottish/White Other British | Coronavirus restrictions |
| | Antisocial offences |
| | Miscellaneous offences |
| | Road traffic offences |

6.3 Statistical testing method

A number of statistical testing methods are available. The choice of test depends on the purpose of the testing, the type and distribution of the variable, and the number of groups in the variable. Based on these criteria, four tests were selected, depending on the research question: Chi-Squared goodness of fit test, Logistic Regression, Multinomial Regression, and Poisson Regression. The testing method for each research question is as follows:

  • Do people appearing in court represent the general population/people in prison? (Chi-Squared goodness of fit test)
  • Is there a relationship between court outcome and ethnicity?
      • Is there a relationship between verdict and ethnicity? (Logistic Regression)
      • If guilty, is there a relationship between sentence and ethnicity? (Multinomial Regression)
      • If sentenced to prison, is there a relationship between sentence length and ethnicity? (Poisson Regression)

6.3.1 Chi-squared test

The chi-squared test is a nonparametric statistical test used to determine whether a difference between observed values and expected values reflects a true difference in the population or is due to sampling error. The observed values are the frequencies from the dataset. The expected values are the frequencies expected under the null hypothesis. There are three main types of chi-squared test: the goodness of fit test, the independence test and the homogeneity test. This analysis focuses mainly on the goodness of fit test and the independence test.

The chi-squared goodness of fit test is also referred to as the chi-squared test for a single sample. It is used to test hypotheses about whether the proportions observed in a sample match the population proportions, or other frequencies, specified in the null hypothesis, and is suitable for samples with two or more categories [5].
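As a rough sketch of the goodness of fit calculation, the example below takes the expected proportions from the Census counts in Table 5 and compares them with a set of observed counts. The observed counts are hypothetical, invented purely for illustration, and are not results from this analysis. The function name `chi_squared_gof` is likewise an assumption, and the 5% critical value for 5 degrees of freedom (11.070) is taken from standard chi-squared tables.

```python
# Census 2011 population counts, as presented in Table 5.
census = {
    "African, Caribbean or Black": 36_178,
    "Asian": 140_678,
    "Mixed or Multiple": 19_815,
    "Other": 14_325,
    "White Minority Ethnic": 221_620,
    "White Scottish/White Other British": 4_862_787,
}


def chi_squared_gof(observed: dict, population: dict) -> float:
    """Chi-squared goodness of fit statistic: sum of (O - E)^2 / E over categories.

    Expected counts E are the sample size multiplied by each
    category's share of the reference population.
    """
    sample_size = sum(observed.values())
    population_total = sum(population.values())
    statistic = 0.0
    for group, population_count in population.items():
        expected = sample_size * population_count / population_total
        statistic += (observed[group] - expected) ** 2 / expected
    return statistic


# HYPOTHETICAL observed counts for a sample of 1,000 individuals (illustration only).
observed = {
    "African, Caribbean or Black": 10,
    "Asian": 25,
    "Mixed or Multiple": 5,
    "Other": 5,
    "White Minority Ethnic": 55,
    "White Scottish/White Other British": 900,
}

statistic = chi_squared_gof(observed, census)
# Six categories give 6 - 1 = 5 degrees of freedom.
CRITICAL_VALUE_5PCT_DF5 = 11.070
reject_null = statistic > CRITICAL_VALUE_5PCT_DF5
```

A statistic above the critical value would suggest that the sample's ethnic composition differs from that of the general population by more than sampling error alone would explain.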

6.3.2 Logistic Regression

Logistic regression [5], also known as binomial logistic regression, is used to predict a binary categorical dependent variable and to examine the effect of a number of independent variables on it. It models the odds, log odds or probability of one of the two outcomes occurring. The odds are a way of representing probability: the odds of an event of interest E are the ratio of the probability that E occurs to the probability that it does not occur.

This results in a different interpretation of a comparison such as "x times more likely" than when we use a percentage probability. For example, if an individual were equally likely to be convicted or acquitted, the odds of conviction would be 1 (equivalent to a 50% probability of conviction). In the odds ratio interpretation, an individual who is 2 times more likely to be convicted than this baseline has odds of 2, that is, an odds ratio of 2 relative to the baseline (equivalent to a 66.7% probability of conviction).
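The conversion between odds and probability used in this example can be sketched as follows. The function names are illustrative assumptions rather than part of the analysis code.

```python
def odds_from_probability(probability: float) -> float:
    """Odds = P(event) / P(no event)."""
    if not 0 <= probability < 1:
        raise ValueError("probability must be in [0, 1)")
    return probability / (1 - probability)


def probability_from_odds(odds: float) -> float:
    """Inverse conversion: probability = odds / (1 + odds)."""
    if odds < 0:
        raise ValueError("odds must be non-negative")
    return odds / (1 + odds)


def apply_odds_ratio(baseline_probability: float, odds_ratio: float) -> float:
    """Probability implied by multiplying the baseline odds by an odds ratio."""
    baseline_odds = odds_from_probability(baseline_probability)
    return probability_from_odds(baseline_odds * odds_ratio)


# Baseline of 50% (odds of 1) with an odds ratio of 2 gives odds of 2,
# i.e. a probability of 2/3, not 100%.
doubled = apply_odds_ratio(0.5, 2)
```

The same odds ratio implies a different change in probability for a different baseline: starting from 80% (odds of 4), an odds ratio of 2 gives odds of 8, or a probability of about 89%.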

6.3.3 Poisson Regression

Poisson regression can be used to predict a dependent variable that consists of count data, given one or more independent variables. The Poisson regression model assumes that the response variable has a Poisson distribution. Rather than the odds ratios used in logistic regression, relative risk ratios are used in Poisson regression for count variables [6].
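To illustrate how such ratios are read, the sketch below assumes the simplest case: a Poisson model with a log link and a single categorical predictor. In that case the maximum-likelihood fitted mean for each group equals the group's sample mean, so each exponentiated coefficient is the ratio of a group's mean count to the reference group's mean count. The group labels and sentence lengths are hypothetical, and models with additional predictors would need a fitted regression rather than this shortcut.

```python
from statistics import mean


def rate_ratios(counts_by_group: dict, reference: str) -> dict:
    """Ratio of each group's mean count to the reference group's mean count.

    Matches exp(coefficient) from a Poisson regression with one
    categorical predictor (an assumption of this sketch).
    """
    reference_mean = mean(counts_by_group[reference])
    return {
        group: mean(counts) / reference_mean
        for group, counts in counts_by_group.items()
    }


# HYPOTHETICAL sentence lengths in days (illustration only).
sentence_days = {
    "reference_group": [90, 120, 150],
    "comparison_group": [120, 180, 240],
}

ratios = rate_ratios(sentence_days, reference="reference_group")
```

A ratio above 1 for a group would indicate longer average sentences than the reference group, and a ratio below 1 shorter ones.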

Contact

Email: Justice_Analysts@gov.scot
