Case Study: Confidence
Using Income Estimates to Help Understand Confidence Intervals for Modelled Data
Introduction
The purpose of this case study is to illustrate the importance of using confidence intervals when analysing data, using a dataset of income estimates for small geographic areas as an example. The main limitation of estimates for small areas is that they are subject to variability. Confidence intervals summarise this variability and should always be taken into account when making use of estimates. This case study is also available as a downloadable pdf document (182Kb).
Data
There are a number of datasets available on the Neighbourhood Statistics website which include confidence intervals. This case study focuses on ward level income estimates for England and Wales for 2001-2002.
The Office for National Statistics (ONS) has produced estimates of average Weekly Household Income for the financial year 2001-2002 using 2003 Census Area Statistics ward boundaries. The estimates have been produced using a model-based approach because most surveys are not designed to produce reliable estimates at ward level. The model-based process involves finding a relationship between the survey data available on income and other data drawn from administrative and Census data sources.
What is administrative data?
By administrative data we mean the collection of data for official and primarily non-statistical purposes, such as benefit claims, medical records, or vehicle registration.
Estimates and confidence intervals have been produced for four different income types:
- Total Weekly Household Income - the sum of the gross income of every member of the household plus any income from benefits such as disability benefit;
- Net Weekly Household Income - this includes all income after the deduction of income tax, National Insurance, pension contributions, council tax, child support payments, and parental contributions to students;
- Net Weekly Household Income (equivalised before housing costs) - equivalisation adjusts household income to take account of household size and composition, enabling more appropriate comparisons to be made between households;
- Net Weekly Household Income (equivalised after housing costs) - this is additionally subject to deductions for rent, water rates, mortgage interest payments, buildings insurance and ground rent.
Further details on the definitions are included in the Model-Based Income Estimates User Guide (586Kb pdf).
This study will use:
- equivalised estimates to take account of household composition and size;
- net estimates to focus on the amount of disposable income which households have;
- before and after housing cost estimates to analyse the patterns in the amount of income spent on housing costs.
Section 1: Analysis of equivalised net average weekly household income before housing costs
Figure 1.1: Equivalised net average weekly household income estimates before housing costs for the 10 wards in England and Wales that had the lowest estimates for 2001-2002
From the chart in figure 1.1, we can see that the wards of Middlehaven, Bastwell, Bradford Moor and University (Bradford) had the lowest estimated equivalised net average weekly household income before housing costs for 2001-2002, at £210. The black points on the chart represent the estimates for each ward; the vertical lines show the confidence interval around each estimate.
What is a confidence interval?
A confidence interval provides an estimated range of values which is likely to include an unknown value. For this case study the unknown value is the average weekly household income for different wards. In the context of model-based estimates, the confidence intervals summarise the variability in the estimates caused by the modelling process. It is therefore important to look at the confidence intervals as well as the estimate. The range of a confidence interval is expressed in terms of lower and upper limits.
The confidence interval reflects the range between which the true value of the average weekly household income is believed to lie, at a given level of confidence. At the 95 per cent confidence level, assuming that the model holds, on average the confidence interval is expected to contain the true value around 95 per cent of the time.
For example, Middlehaven's estimate of £210 has a 95 per cent confidence interval with a lower limit of £190 and an upper limit of £240. This means that, at the 95 per cent confidence level, the true average weekly income for Middlehaven in 2001-2002 is likely to lie between £190 and £240.
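The phrase "around 95 per cent of the time" refers to repeated use of the same method, and a small simulation can make this concrete. The sketch below is not the ONS modelling process: it assumes a simple, hypothetical setting in which incomes are normally distributed with a known spread, draws many samples, and counts how often a 95 per cent interval around the sample mean contains the true mean. The true mean, standard deviation and sample size are illustrative values only, not figures from the income estimates.

```python
import random
import statistics

def coverage(true_mean, sd, sample_size, trials, seed=1):
    """Share of simulated 95% intervals that contain the true mean."""
    rng = random.Random(seed)
    # Half-width of a 95% interval for a mean when the spread is known.
    half_width = 1.96 * sd / sample_size ** 0.5
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(true_mean, sd) for _ in range(sample_size)]
        estimate = statistics.fmean(sample)
        if estimate - half_width <= true_mean <= estimate + half_width:
            hits += 1
    return hits / trials

# Hypothetical values chosen only for illustration.
rate = coverage(true_mean=300.0, sd=80.0, sample_size=25, trials=2000)
print(f"Intervals containing the true mean: {rate:.1%}")
```

The printed share should be close to, but not exactly, 95 per cent, because any single interval either contains the true value or it does not.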
It is important to take care when interpreting the estimates given for a series of ranked wards. In figure 1.1, the confidence intervals for the 10 ranked wards overlap. For example, the 95 per cent confidence interval for Park End (£210 to £260) overlaps with the 95 per cent confidence interval for Middlehaven (£190 to £240). This means that the actual average weekly income in Middlehaven could be higher than that experienced in Park End for 2001-2002.
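The overlap check described above is easy to express in code. The following is a minimal sketch using the limits quoted for Middlehaven and Park End; the function name is an illustrative choice, not part of any ONS tool.

```python
def intervals_overlap(a, b):
    """Return True if two (lower, upper) intervals share any value."""
    return a[0] <= b[1] and b[0] <= a[1]

# 95% confidence limits quoted in the text, in pounds per week.
middlehaven = (190, 240)
park_end = (210, 260)

print(intervals_overlap(middlehaven, park_end))  # True: both include £210 to £240
```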
From a policy perspective it is important to consider the group of lowest ranking wards which have overlapping confidence intervals together, rather than just the wards which have the lowest estimated income value.
We can also analyse the highest ranking wards in terms of income estimates.
Figure 1.2: Equivalised net average weekly household income estimates before housing costs for the 10 wards in England and Wales that had the highest estimates for 2001-2002
From the chart in figure 1.2, we can see that Austenwood had the highest estimated average weekly household income at £1,070, with a 95 per cent confidence interval of £870 to £1,320. This means that, at the 95 per cent confidence level, the true average weekly household income for Austenwood in 2001-2002 is likely to lie between £870 and £1,320.
We can see that among the 10 highest ranked wards there are wards with overlapping confidence intervals. This means, for example, that the true average weekly household income in Ightham, which had an estimated average weekly income of £900 and a confidence interval ranging from £770 to £1,050, could be higher than the true average weekly household income in Austenwood for 2001-2002.
It is also important to note that, due to the modelling process, the width of the confidence intervals will vary between wards. For example, for the ward of Austenwood the confidence interval ranges from £870 to £1,320, a difference of £450, whereas for the ward of Weston Green the confidence interval ranges from £800 to £1,090, a difference of £290. The narrower the confidence interval, the smaller the range of plausible values for the true income, and the more confidence users can have in the precision of the estimate for policy and planning purposes.
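The widths quoted above follow directly from the lower and upper limits. As a small illustration, the sketch below computes them for the two wards mentioned and lists the wards from narrowest to widest interval; the variable and function names are illustrative.

```python
# 95% confidence limits quoted in the text, in pounds per week.
intervals = {
    "Austenwood": (870, 1320),
    "Weston Green": (800, 1090),
}

def width(interval):
    """Width of a (lower, upper) interval."""
    lower, upper = interval
    return upper - lower

widths = {ward: width(ci) for ward, ci in intervals.items()}

# List wards from the narrowest (most precise) interval to the widest.
for ward, w in sorted(widths.items(), key=lambda item: item[1]):
    print(f"{ward}: £{w}")
```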
In this simplified treatment, estimates for two particular wards are described as significantly different only if the confidence intervals for those estimates do not overlap. The chart and table below provide an example.
Figure 1.3: Equivalised net average weekly household income estimates before housing costs for the wards of Middlehaven, Bastwell and Austenwood for 2001-2002.
Table 1: Equivalised net household weekly income estimates before housing costs for the wards of Middlehaven, Bastwell and Austenwood for 2001-2002.
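The non-overlap rule can be applied to the three wards named in figure 1.3, using the estimates and limits quoted elsewhere in this case study. The sketch below is a simplified illustration of that rule only, not a formal significance test, and the function name is an assumption made for the example.

```python
from itertools import combinations

# (estimate, lower limit, upper limit) quoted in this case study, pounds per week.
wards = {
    "Middlehaven": (210, 190, 240),
    "Bastwell": (210, 180, 250),
    "Austenwood": (1070, 870, 1320),
}

def differs(a, b):
    """Simplified rule: different only if the two intervals do not overlap."""
    _, a_low, a_high = a
    _, b_low, b_high = b
    return a_high < b_low or b_high < a_low

# Compare every pair of wards once.
results = {
    (x, y): differs(wards[x], wards[y])
    for x, y in combinations(wards, 2)
}

for (x, y), different in results.items():
    verdict = "significantly different" if different else "not distinguishable"
    print(f"{x} vs {y}: {verdict}")
```

Under this rule Middlehaven and Bastwell cannot be told apart, while each of them differs from Austenwood.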
Section 2: Comparing equivalised net average weekly household income estimates before and after housing costs
In this section we investigate the differences between the estimates before housing costs and those after housing costs. This helps to show how the modelling process itself can lead to unexpected results, which is important to consider because modelled data is the main area in which confidence intervals are produced.
Using the same group of wards included in figure 1.1, the chart in figure 2.1 illustrates the income estimates for these wards based on the equivalised net average weekly household income after housing costs. We can see that for some wards the income estimates have decreased using this income measure whilst for other wards the income estimates have increased.
Figure 2.1: Equivalised net average weekly household income estimates after housing costs for 2001-2002
For example, before housing costs, the estimated average weekly household income for Middlehaven was £210 with a 95 per cent confidence interval around the estimate of £190 to £240. After housing costs, the estimated average weekly household income for Middlehaven decreased to £180 with a 95 per cent confidence interval around the estimate of £160 to £210. In contrast, before housing costs, the estimated average weekly household income for Bastwell was £210 with a 95 per cent confidence interval around the estimate of £180 to £250. After housing costs, the estimated average weekly household income for Bastwell increased to £230 with a 95 per cent confidence interval around the estimate of £200 to £260, which is clearly not what we would expect.
For a small number of wards in England (213 out of 8,875) the income estimates after housing costs are either the same as or higher than the estimates provided before the deduction of housing costs. This effect has occurred due to the modelling process and underlines the importance of using confidence intervals. For each type of income estimate (before and after housing costs) a different model is selected to best fit the relationship between the survey data on income and other data drawn from administrative and Census sources. This means that for a small number of wards it is possible to end up with income estimates that are higher after housing costs have been deducted.
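A simple check can flag wards where the after housing costs estimate is not lower than the before housing costs estimate, and show whether the two confidence intervals overlap. The sketch below uses only the Middlehaven and Bastwell figures quoted above; the data layout and function name are illustrative assumptions, and comparing intervals from two separately fitted models in this way is a rough guide rather than a formal test.

```python
# (estimate, lower limit, upper limit) quoted in the text, pounds per week.
wards = {
    "Middlehaven": {"before": (210, 190, 240), "after": (180, 160, 210)},
    "Bastwell": {"before": (210, 180, 250), "after": (230, 200, 260)},
}

def summarise(ward):
    """Compare a ward's before and after housing costs estimates."""
    b_est, b_low, b_high = ward["before"]
    a_est, a_low, a_high = ward["after"]
    return {
        "change": a_est - b_est,
        # Unexpected: income is not lower once housing costs are deducted.
        "unexpected": a_est >= b_est,
        "intervals_overlap": a_low <= b_high and b_low <= a_high,
    }

summary = {name: summarise(ward) for name, ward in wards.items()}
for name, row in summary.items():
    print(name, row)
```

For Bastwell the estimate rises by £20, but the before and after intervals overlap, so the apparent increase lies within the uncertainty of the two estimates.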
The effect of this modelling process can also be seen when we consider the lowest 10 ranking wards after housing costs.
Figure 2.2: Equivalised net average weekly household income estimates after housing costs for the 10 wards that had the lowest estimates for 2001-2002
From figure 2.2 we can see that the group of wards with the lowest rankings for net average weekly household income after housing costs is not the same as the group of wards with the lowest ranking for net average weekly household income estimates before housing costs.
As we might expect, whether we consider average weekly income estimates before or after housing costs, the 10 lowest ranked wards in England are all located within local authorities that are in receipt of Neighbourhood Renewal Funding (NRF) and are situated in the North of England. The local authority of Knowsley in the North West of England is an example.
(NRF provides communities in the 88 poorest local authority districts with extra funds to tackle deprivation.)
Figure 2.3: Equivalised net average weekly household income estimates after housing costs for some of the wards within the local authority of Knowsley for 2001-2002
From figure 2.3 we can see that, across the wards shown for the local authority of Knowsley, the true average weekly household income is likely to lie within the range of £150 to £390. Princess is the ward with the lowest estimated average weekly household income, at £170 with a 95 per cent confidence interval of £150 to £200. Roby is the ward with the highest estimated average weekly household income, at £340 with a 95 per cent confidence interval of £300 to £390, for 2001-2002.
From a policy perspective it will be important to consider all those wards with overlapping confidence intervals in a similar way. Even though their estimated values may be different, it is likely that their true values will be within a similar range and will therefore require similar policy considerations.
Summary
The analysis above has illustrated the importance of taking account of confidence intervals when using estimated values. Confidence intervals summarise the potential variability in the estimates. If we do not take account of the confidence intervals provided around the estimates, we may make inappropriate policy decisions. It is important to understand the implications of having overlapping or non-overlapping confidence intervals amongst a group of estimated values. Further, the width of the confidence interval provides an indication of the relative precision of the estimates provided.
Finally, it is worth noting that although the focus of this case study has been on income estimates, there are further datasets available on the Neighbourhood Statistics website which include confidence intervals. For example, within the 'Community Well-being' domain, the dataset which provides 'Best Value Performance Indicators ... General Satisfaction with Local Authority' provides estimated percentage levels of satisfaction with confidence intervals. For this dataset the confidence intervals summarise the variability in the estimates caused by using sample data.
Note: The subject of confidence intervals and the statistical significance of differences is complex, and what is presented here is a simplified example.
References
Model-based Estimates of Income for Wards in England and Wales, 1998/99, User Guide (586Kb pdf).