Guide to basic quality assurance in statistics
Guidance for those producing official statistics to ensure that quality is monitored and assured.
Checking changes over time
Whether or not you are producing a time-series, you should check changes over time to assess whether your most recent data are credible when compared with previous data.
Graphs give a feel for data by highlighting trends, and deviations from trends, with far less effort than comparing sets of figures in a table, and they avoid mistakes in mental arithmetic. This will help you to spot discrepancies, whether or not these indicate an error:
- You needn’t graph all of your data, but it will certainly be useful to graph the main data from each output to spot any obvious discrepancies.
- It may be best to plot only one series per graph for quality assurance, to avoid patterns or discrepancies being hidden by values of a much higher or lower magnitude that could skew the axis; a scripted version of this approach is sketched below.
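As an illustration, a minimal sketch in Python of plotting one series per graph. The file name and the column names ("period", "category", "value") are assumptions for the example; any plotting package your team already uses would serve equally well.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical input: one row per period per category, with columns
# "period", "category" and "value" (names assumed for illustration).
data = pd.read_csv("quarterly_output.csv", parse_dates=["period"])

# One chart per item, so a series of a much higher or lower magnitude
# cannot skew a shared axis and hide discrepancies in the others.
for category, group in data.groupby("category"):
    fig, ax = plt.subplots()
    ax.plot(group["period"], group["value"], marker="o")
    ax.set_title(f"QA check: {category}")
    ax.set_xlabel("Period")
    ax.set_ylabel("Value")
    fig.autofmt_xdate()  # angle the date labels so they stay readable
    plt.show()
```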
While graphical displays will help to identify trends in time-series and highlight discrepancies, it is not usually practical to graph all of your data as time-series; you should still check all your data against previous figures where possible:
- Set a threshold for the magnitude of change considered reasonable – this might be based on a statistical threshold or on an understanding of the context that makes certain figures unlikely.
- Calculate both the absolute change and the percentage change between intervals and compare the resulting values to your thresholds.
- Consider which packages will help with this kind of analysis. For example, Excel makes this easy through conditional formatting, but Access and SAS can be helpful as well. This will help to identify anomalies at finer levels than an overall total, e.g. errors due to miscoding; a scripted version of the check is sketched below.
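A minimal sketch of this check in Python with pandas. The threshold values, file name and column names are assumptions for illustration; the same logic can be expressed as conditional formatting rules in Excel.

```python
import pandas as pd

# Hypothetical thresholds: flag any move of more than 10 units or 5%
# between intervals (values assumed for illustration; set your own
# from statistical reasoning or knowledge of the context).
ABS_THRESHOLD = 10.0
PCT_THRESHOLD = 5.0

data = pd.read_csv("quarterly_output.csv", parse_dates=["period"])
data = data.sort_values(["category", "period"])

# Absolute and percentage change between consecutive intervals,
# calculated within each category so that anomalies at finer levels
# than the overall total remain visible.
data["abs_change"] = data.groupby("category")["value"].diff()
data["pct_change"] = data.groupby("category")["value"].pct_change() * 100

# Rows breaching either threshold go to a list for manual inspection.
flagged = data[
    (data["abs_change"].abs() > ABS_THRESHOLD)
    | (data["pct_change"].abs() > PCT_THRESHOLD)
]
print(flagged)
```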
If you are producing a time-series, consider whether it is reasonable to construct a time-series from the data you have:
- Is there enough data? Constructing a time-series with only two or three points is often inappropriate, as it could give a misleading impression of a trend.
- Are you comparing data from regular intervals? Although you may wish to compare data that is not from regular intervals, you should consider whether there are any cyclical variations that could affect the interpretation of the results. These first two checks can be automated; see the sketch after this list.
- Have there been major changes that might affect the comparability of historical and more recent data? While some policy or sectoral changes over the years are to be expected, reclassification of variables and definitional changes may affect the validity of comparisons.
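A minimal sketch of automating the first two checks (enough points, regular intervals), in the same assumed Python setting as the earlier examples; the file name, holding a single series, is again hypothetical.

```python
import pandas as pd

# Hypothetical file holding one time-series, with an assumed "period"
# date column, as in the earlier sketches.
series = pd.read_csv("timeseries.csv", parse_dates=["period"])
periods = pd.DatetimeIndex(series["period"].sort_values())

# Enough data? Two or three points can give a misleading impression of a trend.
if len(periods) < 4:
    print(f"Only {len(periods)} points: too few to read a trend.")
else:
    # Regular intervals? infer_freq returns None when no consistent
    # frequency (monthly, quarterly, annual, ...) fits the dates.
    if pd.infer_freq(periods) is None:
        print("No regular frequency detected: check for gaps or irregular timing.")
```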
When publishing time-series you should produce consistent historical data where possible. Where time-series are revised, or changes are made to methods or coverage, you should consider whether these changes make comparisons with historical data unreliable or unsuitable. If this is the case, consider the following:
- Are the figures used in government targets or indicators? If so, it is normal to continue measuring on the old basis if possible (if this is not possible, you will need agreement from ministers before making changes to measurement).
- Where it is not possible to produce a historical series on the new basis, you must produce your best estimate of at least the previous year’s statistics on the new basis, or the new year’s figure on the old basis, to allow change to be measured on a like-for-like basis (you can contact OCS if you require guidance on estimation). Ideally, running the old and new systems jointly would then allow the change due to the new definition or process to be quantified. In terms of presentation, the step change may take the form of an overlap on a graph where the old method stops and the new method starts, as illustrated in the sketch after this list.
- Inform users of revisions through a revisions policy for scheduled revisions or, for unscheduled revisions, a statement within your product explaining the nature and extent of the revision and how the figures should be interpreted.
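As an illustration of the overlap presentation, a minimal Python sketch; the figures are invented for the example and show a two-year period in which both methods run side by side.

```python
import matplotlib.pyplot as plt

# Invented figures for illustration: the old method runs to 2022 and the
# new method starts in 2021, so the two series overlap for two years and
# the step change from the new definition can be read off directly.
years_old, values_old = [2017, 2018, 2019, 2020, 2021, 2022], [100, 104, 107, 103, 108, 111]
years_new, values_new = [2021, 2022, 2023, 2024], [112, 116, 119, 123]

fig, ax = plt.subplots()
ax.plot(years_old, values_old, marker="o", label="Old method")
ax.plot(years_new, values_new, marker="s", linestyle="--", label="New method")
ax.set_xlabel("Year")
ax.set_ylabel("Value")
ax.set_title("Overlap where the old method stops and the new method starts")
ax.legend()
plt.show()
```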
If something looks strange, discuss it with data providers and/or policy colleagues – make use of their knowledge of changes in policy, within sectors, or in the collection process that may explain unexpected data. It is often important to contact the source of the data, e.g. institutions that have submitted administrative data to a central collection, and to ensure that explanations for changes are well evidenced. It’s important to be able to explain anything that looks anomalous; if you can’t, how do you know it’s reliable?