Dataset Information (Metadata)
1. Introduction
All datasets on the Neighbourhood Statistics website are accompanied by supporting information, referred to as metadata. The metadata for a dataset:
- provides a summary of what the dataset shows, including which geographic areas it relates to, which organisation provided it, and the date the information relates to;
- explains the contents of the dataset;
- explains how the data are produced, including collection, processing and quality assurance;
- explains any limitations of the dataset; and
- provides links to more detailed documentation and contact information for specific queries.
In other words, the metadata is there to help you understand the dataset and use it appropriately and effectively. We therefore recommend that you read the metadata for each dataset you use.
2. Accessing the Metadata
Every dataset has supporting information available via the blue 'information' buttons, which usually appear next to the dataset name, for example on the Neighbourhood Statistics 'Topics' screen.
Clicking on the relevant button displays an 'Information on' box, which provides some summary information.
Scrolling down, you can find information on the time series, periodicity, coverage and data source. The last entry shows when the data, or metadata, were last updated.
Although these summaries are a useful means of finding out more about a dataset quickly, we recommend that you read the full metadata document for more comprehensive details. It is available via the 'View more' option and opens as a pdf document.
Note: when a dataset is downloaded as an Excel or csv file, the full metadata is also downloaded as a pdf document. This can be found inside the zipped file along with the dataset.
3. Metadata Headings
3.1. 'Information On' box
The information in this box follows a standard format.
- Description: this provides a brief description of the dataset and the information it contains.
- Time Series: the start and end dates of the time series for which data are available. Not all variables are necessarily available for every year, and the time series functionality within the website only works when variables have not changed between years.
- Periodicity: the time period the data relate to, for example, calendar year, financial year or snapshot.
- Latest coverage: the overall area the dataset relates to, for example, England; England and Wales; Great Britain; United Kingdom; or otherwise.
- Source: the organisation that supplied the data; this is usually a government department.
- Last updated: the last time a change was made to the data or the metadata. When a dataset is first published on the website, the last updated date is the publication date. If the dataset is subsequently revised, the last updated date is amended to show the date on which the revised data were published.
3.2. Full metadata document (pdf file)
The presentation of metadata was reviewed at the start of 2009 and a new format adopted. This new format contains the same information as the previous version although some sections have been combined.
The full metadata will help you to identify and compare how different datasets have been collected, processed and quality assured. It provides comprehensive information on the methods used to produce the data, the way a dataset has been validated, and a number of factors affecting overall quality and potential use of the statistics.
General details
This includes a number of subheadings:
- Dataset title: this includes the generic title and the year of the dataset to which the metadata relates.
- Time period of dataset: the date the data relate to, which may be a specific date, such as 30/06/04, or a time period, such as 2002/2003.
- Geographic coverage: the overall area the dataset relates to, for example, England; England and Wales; Great Britain; United Kingdom; or otherwise.
- Lowest area output: the smallest geographic area for which the data are presented, for example, Lower Layer Super Output Area.
- Supplier: the organisation that supplied the data. This is usually a government department.
- Department: the relevant department within the supplier organisation.
- National Statistics data: prior to the creation of the United Kingdom Statistics Authority, outputs were classed as National Statistics, Experimental Statistics or Not National Statistics:
  - National Statistics: the statistics have been produced in accordance with the National Statistics Code of Practice and have been approved as National Statistics by the Head of Profession responsible for their production.
  - Experimental Statistics: statistics that are still in their developmental phase and are moving towards being classified as National Statistics.
  - Not National Statistics: statistics that do not meet the criteria required to be classed as National Statistics. However, before the data were published on Neighbourhood Statistics, their quality had been assessed against the Framework for Assessing and Reporting the Quality of Information provided by the Neighbourhood Statistics Service and judged satisfactory.
  Since the creation of the United Kingdom Statistics Authority, outputs have been classed as either National Statistics or Other Official Statistics. For a new output to be classed as National Statistics, it must now undergo a formal assessment.
- Revisions: whether the dataset has been revised, the nature of the revision, and when it took place.
Data quality
This section explains the purpose of the document. It provides a link to the 'Guidelines For Measuring Statistical Quality' and details the six dimensions against which quality is reported.
This section is used to explain any specialist terminology or technical concepts used. The information will typically be presented as a glossary.
About the dataset
This section provides a more detailed overview of the dataset giving information on the general themes covered by the variables and how the data relate to other datasets. There will be information about how frequently the data will be updated and the expected time lag between the end of the reference period and publication. This section will also draw your attention to any key points covered in the following sections that you need to be aware of.
How the data are collected/created
This section provides information about how the data were produced.
If the data come from an administrative source, this section details the data collection process. The data may have been edited, for example to correct information that is obviously wrong, such as a record stating that someone is aged 200. Some records may be incomplete; the missing information can be added by referring either to data for earlier periods or to current records for similar respondents. This is known as imputation.
If the data are the result of a statistical analysis, this section outlines the methodology used. There are often links to additional technical papers that provide more detail.
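The editing and imputation steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used by any particular data supplier: the plausibility limit `MAX_PLAUSIBLE_AGE`, the record layout (dictionaries with `id`, `group` and `age` keys), the function name `edit_and_impute`, and the use of the median age of other respondents in the same group as the 'similar respondents' fallback are all hypothetical.

```python
from statistics import median

MAX_PLAUSIBLE_AGE = 120  # hypothetical edit rule: older ages are treated as errors


def edit_and_impute(records, earlier, years_elapsed=1):
    """Edit obviously wrong ages, then impute missing ones (illustrative only).

    records: list of dicts with keys 'id', 'group' and 'age' (age may be None).
    earlier: dict mapping record id -> age recorded in an earlier period.
    """
    # Edit: blank out impossible values such as an age of 200.
    edited = []
    for rec in records:
        age = rec["age"]
        if age is not None and not 0 <= age <= MAX_PLAUSIBLE_AGE:
            age = None
        edited.append({**rec, "age": age})  # copy, so the input is left untouched

    # Collect valid ages per group to act as 'similar respondents'.
    valid_by_group = {}
    for rec in edited:
        if rec["age"] is not None:
            valid_by_group.setdefault(rec["group"], []).append(rec["age"])

    # Impute: prefer the earlier-period value (aged on), else the group median.
    for rec in edited:
        if rec["age"] is None:
            if rec["id"] in earlier:
                rec["age"] = earlier[rec["id"]] + years_elapsed
            elif rec["group"] in valid_by_group:
                rec["age"] = median(valid_by_group[rec["group"]])
    return edited


records = [
    {"id": 1, "group": "A", "age": 34},
    {"id": 2, "group": "A", "age": 200},
    {"id": 3, "group": "A", "age": None},
    {"id": 4, "group": "A", "age": 40},
]
cleaned = edit_and_impute(records, earlier={2: 51})
print([rec["age"] for rec in cleaned])  # → [34, 52, 37.0, 40]
```

Here record 2's impossible age is blanked and replaced with a value carried forward from the earlier period, while record 3, which has no earlier value, receives the median of the valid ages in its group.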
Concepts and definitions
Datasets can contain technical terminology that may be unfamiliar to users but is needed for technical accuracy. This section explains such terminology and is effectively a glossary for the dataset.
Data classifications
Some datasets use 'standard classifications', for example, Standard Occupational Classification (SOC) 2000. These are groups of approved terms that are used to ensure comparability with other datasets. If standard or other classifications have been used, these will be specified.
Validation and quality assurance
To assure the quality of a dataset, measures are taken to detect and correct errors or omissions during data collection, processing or analysis. This section provides information on quality checks, including details of routine data audits and other validation processes.
Geographic referencing
Geographic referencing refers to allocating the data to geographic areas. When data are collected, an address, postcode, grid reference or some other geographic identifier is recorded. This can be related to the areas for which the statistics are being presented, for example, Super Output Areas or local authorities. This section of the metadata contains details of any referencing procedures used.
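The allocation step can be illustrated with a simple postcode lookup. In this sketch the lookup table `POSTCODE_TO_AREA`, its postcodes, the area codes such as `AREA_001` and the helper names are all hypothetical, invented for illustration; each dataset's metadata describes the referencing procedure actually used.

```python
from collections import Counter

# Hypothetical lookup: postcodes and area codes are invented for illustration.
POSTCODE_TO_AREA = {
    "AB12CD": "AREA_001",
    "AB13EF": "AREA_001",
    "AB14GH": "AREA_002",
}


def normalise_postcode(postcode):
    """Remove spaces and upper-case, so 'ab1 2cd' matches 'AB12CD'."""
    return postcode.replace(" ", "").upper()


def count_by_area(postcodes, lookup=POSTCODE_TO_AREA):
    """Allocate each record to an area via its postcode and count records per area.

    Records whose postcode is not in the lookup are counted under None.
    """
    counts = Counter()
    for postcode in postcodes:
        counts[lookup.get(normalise_postcode(postcode))] += 1
    return counts


counts = count_by_area(["ab1 2cd", "AB1 3EF", "AB1 4GH", "ZZ9 9ZZ"])
print(counts["AREA_001"], counts["AREA_002"], counts[None])  # → 2 1 1
```

Keeping unmatched records in a separate `None` bucket, rather than discarding them, makes it easy to see how many records could not be allocated to an area.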
Disclosure control
To safeguard confidentiality and ensure that individuals cannot be identified from a dataset, procedures such as rounding and/or suppression are applied to the data. This is known as 'disclosure control'. Its effects may be apparent when you view a dataset containing figures that are all multiples of three or five (indicating rounding), or when cells contain the symbol 'x' (indicating suppression).
Note that disclosure control may mean that subcategories do not sum to their totals, and that figures may not aggregate exactly between geographic areas. Although this section provides summary information on the disclosure control measures used and their implications for interpreting the data, it does not give exact details of the methods, since these might help someone 'unpick' the figures and deduce disclosive information.
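To see why subcategories may not sum to their totals, consider a simplified sketch of rounding and suppression. The rules here (suppress cells below a hypothetical `SUPPRESSION_THRESHOLD` of 3, then round every other cell, and the total, independently to the nearest multiple of a hypothetical `ROUNDING_BASE` of 5) are assumptions chosen for illustration. They are not the methods applied to any Neighbourhood Statistics dataset, which are not published.

```python
SUPPRESSION_THRESHOLD = 3  # hypothetical: counts below this are suppressed
ROUNDING_BASE = 5          # hypothetical: round to the nearest multiple of 5


def round_to_base(value, base=ROUNDING_BASE):
    """Round a non-negative integer to the nearest multiple of `base`.

    Ties, which only occur for even bases, round up.
    """
    return base * ((value + base // 2) // base)


def protect_table(counts, base=ROUNDING_BASE, threshold=SUPPRESSION_THRESHOLD):
    """Return a protected copy of `counts` plus a 'Total' cell.

    Small cells are replaced with 'x'; all other cells, and the total of the
    original counts, are rounded independently of one another.
    """
    protected = {}
    for category, value in counts.items():
        protected[category] = "x" if value < threshold else round_to_base(value, base)
    protected["Total"] = round_to_base(sum(counts.values()), base)
    return protected


table = protect_table({"Aged 0-15": 12, "Aged 16-64": 41, "Aged 65+": 2})
print(table)  # → {'Aged 0-15': 10, 'Aged 16-64': 40, 'Aged 65+': 'x', 'Total': 55}
```

The published cells 10 and 40 sum to 50 while the total shows 55, even though the underlying counts (12 + 41 + 2) do sum to 55: part of the gap comes from the suppressed cell and part from rounding each figure separately.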
Sources for further information or advice
This provides contact details of data suppliers, as well as details of any other potentially useful sources of information.