Department for Work and Pensions


Family Resources Survey 2000-01

Reliability of estimates

All survey estimates have a sampling error attached to them, calculated from the variability of the observations in the sample. From this, a margin of error (confidence interval) is derived. It is this confidence interval, rather than the estimate itself, which is used to make statements about the likely 'true' value in the population; specifically, to state the probability that the true value will be found between the upper and lower limits of the confidence interval. In general, an interval of plus or minus twice the standard error around the estimate is used to state, with 95 per cent confidence, that the true value falls within that interval. A small margin of error results in a narrow interval, and hence a more precise estimate of where the true value lies.

The calculation of sampling errors (and thus confidence intervals) is based on an assumption of simple random sampling, but in practice this method is almost never used for large general population surveys because it is inefficient in cost and time. The sample for the FRS, as described earlier, is selected using a stratified multi-stage design, based on addresses clustered into postal sectors. The sampling error estimate is therefore not simply based on the variability among all units in the sample (whether households or individuals) but must also take into account the variability within and between postal sectors. For example, if a sample characteristic is distributed differently by postal sector (i.e. is clustered), this produces a greater overall variance than would occur in a simple random sample of the same size. In other words, the complex (actual) sampling error is greater than the (assumed) simple random sampling error.
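The effect of clustering on sampling error can be illustrated with a small simulation. The sketch below is not the FRS design: it assumes a toy two-stage sample with no stratification, in which each hypothetical postal sector has its own rate for some characteristic, drawn uniformly between 0 and 40 per cent. All function names, sample sizes and rates are assumptions chosen for illustration only.

```python
import random
import statistics


def srs_estimate(rng, n):
    """Estimate a proportion from n units spread independently over sectors."""
    hits = 0
    for _ in range(n):
        sector_rate = rng.uniform(0.0, 0.4)  # every unit sits in a different sector
        hits += rng.random() < sector_rate
    return hits / n


def clustered_estimate(rng, n_sectors, per_sector):
    """Estimate a proportion from a few sectors with many units in each."""
    hits = 0
    for _ in range(n_sectors):
        sector_rate = rng.uniform(0.0, 0.4)  # shared by all units in this sector
        for _ in range(per_sector):
            hits += rng.random() < sector_rate
    return hits / (n_sectors * per_sector)


def simulate(replicates=500, n_sectors=20, per_sector=20, seed=1):
    """Return the spread of estimates under each design over repeated samples."""
    rng = random.Random(seed)
    n = n_sectors * per_sector  # both designs use the same total sample size
    srs = [srs_estimate(rng, n) for _ in range(replicates)]
    clustered = [
        clustered_estimate(rng, n_sectors, per_sector) for _ in range(replicates)
    ]
    return statistics.stdev(srs), statistics.stdev(clustered)


srs_se, clustered_se = simulate()
print(f"simple random sample standard error: {srs_se:.4f}")
print(f"clustered sample standard error:     {clustered_se:.4f}")
```

Because units in the same sector share a rate, the clustered estimates vary more from sample to sample than the simple random sample estimates, even though both designs use the same number of units.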

The size of the actual standard error relative to the simple random sampling error is represented by the design factor (DEFT), which is calculated as the ratio of the two. Where the standard errors are the same, the DEFT is one, implying that there is no loss of precision associated with the use of a clustered sample design. In most cases, the DEFT will be greater than one, implying that the estimates based on the clustered sample are less precise than those for a simple random sample of the same size. Conversely, a DEFT of less than one implies that the estimate is more precise than would be obtained from a simple random sample.
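The ratio described above can be written as a short calculation. This is a minimal sketch: the function names are hypothetical, and the two standard errors passed in at the end are made-up illustrative values, not FRS figures.

```python
def design_factor(complex_se, srs_se):
    """DEFT: actual (complex design) standard error over the simple random sampling one."""
    if srs_se <= 0:
        raise ValueError("simple random sampling standard error must be positive")
    return complex_se / srs_se


def describe(deft, tolerance=1e-9):
    """Interpret a design factor in the terms used in the text."""
    if abs(deft - 1.0) <= tolerance:
        return "no loss of precision"
    if deft > 1.0:
        return "less precise than a simple random sample of the same size"
    return "more precise than a simple random sample of the same size"


# Illustrative values only (assumed, not taken from the FRS tables).
deft = design_factor(complex_se=0.30, srs_se=0.25)
print(f"DEFT = {deft:.2f}: {describe(deft)}")
```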

Tables SE.1 to SE.10 provide standard errors and design factors for a selection of variables from the 2000-01 FRS. In common with other tabulations, the percentages and sampling errors incorporate weighting factors which are designed to compensate for non-response. An example of how to interpret them follows:

Example: Table SE.1: Standard errors for household composition

Table SE.1 shows that 10.7 per cent of households were composed of one female adult over pension age. The standard error is 0.25. This can be interpreted in the following manner:

It can be estimated with 95 per cent confidence that the true percentage of households composed of one female adult over pension age is:

10.7 ± 2(0.25) = 10.7 ± 0.5

i.e. if sampling error is the sole source of error, the percentage of households composed of one female adult over pension age is between 10.2 and 11.2 per cent, with 95 per cent confidence.
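The interval in this example can be reproduced with a few lines of code. The sketch below uses the estimate and standard error quoted from Table SE.1 and the multiplier of two described earlier; the function name is hypothetical.

```python
def confidence_interval(estimate, standard_error, multiplier=2.0):
    """Return (lower, upper) limits: estimate plus or minus multiplier x standard error."""
    margin = multiplier * standard_error
    return estimate - margin, estimate + margin


# Figures quoted in the text for one female adult over pension age (Table SE.1).
lower, upper = confidence_interval(estimate=10.7, standard_error=0.25)
print(f"{lower:.1f} to {upper:.1f} per cent")
```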

The design factor for this variable was 1.25. This implies that using a clustered sample rather than a simple random sample of the same size increases the standard error by 25 per cent, i.e. a loss of precision of 25 per cent. Similarly, a design factor of 0.99 would have denoted a gain in precision of one per cent.
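Since the design factor is the ratio of the two standard errors, the figures quoted for this variable also imply what the standard error would have been under simple random sampling. The following sketch works this through; the function names are hypothetical, and the implied value is derived here rather than taken from the published tables.

```python
def implied_srs_se(complex_se, deft):
    """Standard error a simple random sample of the same size would have given."""
    return complex_se / deft


def se_change_percent(deft):
    """Percentage change in standard error relative to simple random sampling.

    Positive values are a loss of precision, negative values a gain.
    """
    return (deft - 1.0) * 100.0


# Standard error 0.25 and design factor 1.25, as quoted in the text.
print(f"implied simple random sampling standard error: {implied_srs_se(0.25, 1.25):.2f}")
print(f"change in standard error at DEFT 1.25: {se_change_percent(1.25):+.0f} per cent")
print(f"change in standard error at DEFT 0.99: {se_change_percent(0.99):+.0f} per cent")
```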

The sampling errors shown are likely to be slightly larger than the true sampling errors because the software used for the calculation does not take into account the improvement in precision due to post-stratification.

In addition to sampling errors, consideration should also be given to non-sampling errors. As is clear from the above discussion, sampling errors generally arise through the process of random sampling and the influence of chance. Non-sampling errors arise from the introduction of some systematic bias in the sample as compared to the population it is supposed to represent. Besides response biases, considered above, there are several potential sources of such bias, such as inappropriate definition of the population, misleading questions, data input errors or data handling problems; in fact, any factor that might lead to the survey results systematically misrepresenting the population. There is no simple control or measurement for such non-sampling errors, although the risk can be minimised through careful application of the appropriate survey techniques, from questionnaire and sample design through to analysis of results.
