Guidance for Models

The Code of Practice: Guidance for Models

This guidance is in its alpha phase, so we welcome comments and feedback from everyone. Please get in touch at regulation@statistics.gov.uk. We aim to release an updated version of this guidance in early 2022 based on a review of this work.

This document provides guidance on how the principles in the Code of Practice for Statistics can support good practice in designing, developing and using models. It covers both traditional statistical techniques, such as linear regression, and newer techniques, including machine learning, when they are used to create statistics or to generate data that inform decision making and public policy. The guidance includes a number of tick-box statements to help you apply the principles in your work.

Part I explores the planning and designing of a model. It provides steps to ensure your planned work meets the expectations of the Code of Practice before you begin to develop the model. Even before you refer to specific elements of the Code of Practice, two main factors should guide your decision to create or use a model: the context and the purpose of the work. Considering these allows you to demonstrate the appropriateness of your chosen technique. You should consider why you want to continue to use or change your current approach; what alternatives also need to be considered; whether you can meet the quality of existing statistics with the approach; and whether learning and capability are sufficiently prioritised to achieve your aims.

You must also understand the ethical implications of using your model (Ethical considerations). You must ensure that the use of the chosen technique is ethically appropriate. This includes knowing the provenance and biases of the data, as well as knowing legal requirements such as Data Protection Legislation, the Human Rights Act 1998, the Statistics and Registration Service Act 2007 and the common law duty of confidence. The UK Statistics Authority’s (UKSA) Centre for Applied Data Ethics’ ethics self-assessment form can be used to assist this process.

Key to making sure your model succeeds is whether the responsible team is sufficiently skilled to undertake the work (Professional capability). Techniques such as machine learning may require different skill sets from traditional statistical techniques. However, it is important that the team also knows how to apply traditional statistical techniques, so that complex techniques are not used when they are not necessary. If the team does not have sufficient skills, you must decide whether to upskill them or bring in specialist resource. In either case, the team must have the appropriate knowledge and skills to manage both the implementation and the maintenance of the model.

Building capability and staying up to date with the latest techniques help achieve development goals (Innovation and Improvement). You should understand where these skills sit within your team's and organisation's overall development plan. You should make sure learning is sufficiently prioritised in your area, and consider whether the chosen techniques are the best use of available resource.

As in the Code of Practice, the Chief Statistician or Head of Profession, or those with equivalent responsibility, should have sole authority for deciding on methods used for published statistics in their organisation (Transparent processes and management). This is also true for models that are used in decision making. The responsible individual needs to be aware of the methodological choices that have been made in designing the model. Creating a chain of model accountability allows anyone involved in the project to know who to go to if something goes wrong or an error is identified.

Part II focuses on the steps you should take to best develop and use your new model to serve the public good. Part of this is ensuring that users of the model, or data and statistics generated by the model, are at the heart of any decisions to change the way these statistics or decisions are made (Relevance to users). This engagement should continue once the model is introduced to ensure user need is factored into the continuous development and monitoring of the model.

The model documentation and data should also be accessible to all (Accessibility). It is good practice to make data used by and generated from models open access where applicable and appropriate, although in some cases this may not be possible. Model code should be findable, accessible, interoperable and reusable.

You should collaborate with experts in both the type of model being used or developed and the subject matter that the data concern (Clarity and insight). This is to ensure any new insights drawn from the model are aligned with the experts’ understanding. It will also help you identify potential errors or bias in your model. You should be transparent about the involvement of expert groups in the model's creation. You should produce documentation that describes the model and the work for a range of users.

Crucially, you must know how your model works and who you will need to communicate it to (Explanation and Interpretation). There is a risk that poor communication leads to misuse or misinterpretation of the model. This in turn could lead to over- or under-reliance on its outputs. Public acceptability is related to how well you can describe your model and its outputs. There should also be stringent quality assurance processes that can satisfy developers and those who are accountable for the model. Quality assurance shows that the model is fit for purpose and generates outputs that can be trusted.
