COLLATION AND COMPARISON OF MORTALITY, HOSPITAL ADMISSION, GENERAL PRACTICE AND SURVEY DATA ON RESPIRATORY DISEASE

Executive Summary and Recommendations

Aims

This report describes work commissioned by the Department of Health under project reference STRACHAN/SURVEILLANCE/96/12.

The aims of this work were:

(a) to investigate whether consistent patterns emerge from nationally available sources of data on respiratory disease when analysed by time, place and person and

(b) to test the validity and feasibility of using routinely available data to explore environmental influences on respiratory disease.

Methods

Different data sources giving information on 10 respiratory diseases in 1991-1995 were compared. Three routine data sources were used: mortality statistics, Hospital Episode Statistics (HES) and the General Practice Research Database (GPRD). Comparisons also included a national survey, the Health Survey for England 1995 (HSE95), which gave information on symptoms of asthma, COPD and hayfever and on the social class and smoking status of individuals. The types of respiratory disease studied were allergic diseases (asthma and allergic rhinitis), obstructive airways diseases (COPD and asthma), infectious conditions (pneumonia, acute bronchitis and bronchiolitis, and tuberculosis) and rarer conditions (cystic fibrosis, fibrosing alveolitis, sarcoidosis and pneumothorax). Cancers were not included.

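Four types of analysis were performed for each disease in order to assess the degree of consistency between data sources: year-on-year trends, age-sex distribution, seasonal pattern and geographical distribution (regional and urban-rural).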

Results

Each disease showed different patterns and it was not possible to extrapolate from one disease to another. The annual numbers of events for each disease and each data source are shown in Table 1. Dividing the number of events by 100 gives an approximation of the numbers expected in an average district health authority (DHA).

Table 1 Total observed number of events in England (1994 data) for patient consultations in the GPRD, emergency hospital admissions in HES, deaths and Health Survey for England 1995

| Condition | GPRD (~6% popn), age 0-84 | HES (100% popn), age 0-84 | Deaths (100% popn), age 0-84 | HSE95 (~0.04% popn), age 2-84 |
| --- | --- | --- | --- | --- |
| Asthma | 81,905* | 78,921 | 1,215 | 2,003† |
| COPD | 15,953* | 52,898 | 18,388 | 1,222 |
| Pneumonia | 3,260* | 43,784 | 22,436 | - |
| Acute bronchitis or bronchiolitis | 84,147 | 25,913 | 294 | - |
| Hayfever | 55,596 | 71 | 0 | 2,832 |
| Tuberculosis | 174 | 1,552 | 260 | - |
| Cystic fibrosis | 100 | 2,954 | 101 | - |
| Sarcoidosis | 159 | 427 | 75 | - |
| Idiopathic fibrosing alveolitis | 211 | 1,075 | 815 | - |
| Pneumothorax | 213 | 4,937 | 41 | - |

* based on patient prescriptions, not patient consultations

† used an inhaler in the past year
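As a minimal sketch of the rule of thumb stated before Table 1 (dividing national counts by 100 to approximate an average district health authority), the following Python fragment applies it to a few rows of the table. The counts are copied from Table 1; the names `TABLE_1_COUNTS` and `per_average_district` are illustrative and are not part of the report. Only the sources with 100% population coverage are used here, so that the result can be read directly as events per district.

```python
# National counts for 1994 copied from Table 1
# (HES emergency admissions and deaths, both covering 100% of the population).
TABLE_1_COUNTS = {
    "Asthma": {"HES": 78921, "Deaths": 1215},
    "COPD": {"HES": 52898, "Deaths": 18388},
    "Pneumonia": {"HES": 43784, "Deaths": 22436},
    "Tuberculosis": {"HES": 1552, "Deaths": 260},
}

# Divisor from the report's rule of thumb for an average district health authority.
DISTRICT_DIVISOR = 100


def per_average_district(national_count: int) -> float:
    """Approximate annual number of events expected in an average DHA."""
    return national_count / DISTRICT_DIVISOR


for condition, counts in TABLE_1_COUNTS.items():
    estimates = {source: per_average_district(n) for source, n in counts.items()}
    print(condition, estimates)
```

On this approximation an average district would see several hundred emergency asthma admissions a year but only about a dozen asthma deaths, which is why the choice of data source matters for district-level work.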

Table 2 summarises the age and sex, seasonal and geographical distributions of the 10 respiratory diseases, including asthma, COPD and previously unpublished information on rarer conditions such as sarcoidosis and idiopathic fibrosing alveolitis. Year-on-year trends were not consistent across data sources for any disease over this five-year period, except for acute bronchitis or bronchiolitis and fibrosing alveolitis, which were partially consistent. Pneumonia mortality rates showed an artefactual rise between 1992 and 1993 due to changes in coding rules for death certificates.

Consistency across data sources varied by condition (Tables 2 & 3). Asthma showed inconsistent disease patterns and weak geographical correlations across data sources, but COPD and tuberculosis were fully consistent. Hayfever, acute bronchitis and bronchiolitis and pneumonia were consistent only for some analyses. COPD, acute bronchitis or bronchiolitis and pneumonia all showed higher (age-sex standardised) rates in Northern areas of England and COPD and pneumonia showed higher rates in urban areas. Adjustment of the prevalences of COPD symptoms for social class and smoking habits using individual data from the HSE95 attenuated the regional and urban patterns but did not remove them.
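This summary does not state which measure of geographical correlation was used. As an illustration only, assuming that agreement between two data sources is judged by comparing how they rank the regions, a Spearman rank correlation could be computed as sketched below. The regional rates are hypothetical and the function names are not taken from the report.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 (smallest), giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # average of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical age-sex standardised rates for six regions (NOT values from the report).
hes_rates = [112.0, 98.0, 105.0, 87.0, 93.0, 120.0]
mortality_rates = [108.0, 101.0, 99.0, 90.0, 95.0, 115.0]
print(round(spearman(hes_rates, mortality_rates), 3))
```

A coefficient near 1 would correspond to the "good" consistency described for COPD, and a coefficient near 0 to the weak correlations described for asthma. The cut-offs between weak, moderate and good are not given in this summary.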

Table 2 Summary of age-sex, seasonal and geographical analyses for 1991-5

Disease: age-sex, seasonal and geographical patterns

 

 

Asthma

  • Inconsistent age-sex patterns: ↑ deaths in elderly, ↑ emergency hospital admissions in ages 0-4, ↑ inhalers for asthma in ages 5-15
  • Inconsistent seasonal pattern: ↑ deaths in winter, ↑ hospital admissions in September, ↑ first ever GP consultations in early summer
  • Inconsistent regional and urban rural patterns: ↑ urban ↓ rural gradient in HES & mortality, no gradient in GPRD or HSE95

 

 

Acute bronchitis or bronchiolitis

  • Partially consistent age-sex patterns. Rates highest at extremes of age but relative magnitudes varied. M>F except ages 15-50 in HES and GPRD
  • Consistently highest in December and January
  • North & Midlands > South. No consistent urban rural pattern

 

 

COPD

  • Rates highest in elderly and M>F in all data sources
  • Winter consistently higher than summer
  • North>south, urban>rural areas in all data sources

 

 

Hayfever

  • Comparisons limited to HSE95 and GPRD because of small numbers
  • Boys>girls but F>M in adults in all data sources with highest rates in children or young adults
  • Rates consistently highest in June and July
  • ↑ SW Thames and Oxford, ↓ Yorkshire region in all data sources
  • No consistent urban rural pattern

 

 

Pneumonia

  • Consistently highest in elderly, M>F
  • Winter>summer in all data sources
  • Consistent regional and urban rural patterns. North>south, urban>rural areas

 

 

Cystic fibrosis

  • Highest in adolescence, but sex distribution varied by data source
  • Small numbers limited ability to interpret seasonal patterns
  • ↑ Yorkshire, Mersey and Wessex, ↓ Northern, SW Thames and NE Thames. Small numbers limited ability to interpret urban rural patterns

 

 

Idiopathic fibrosing alveolitis

  • Increased with age, male rates ≈ 2x female rates
  • Partially consistent for seasonal pattern. Higher in winter in mortality and HES, no seasonal pattern in GPRD
  • Not consistent for urban rural. Urban>rural for mortality. No urban rural pattern for HES and GPRD
  • Small numbers limited ability to interpret regional patterns

 

 

Pneumothorax

  • Male rates ≈ 5x female rates. Partially consistent for age: GP consultations and emergency hospital admissions ↑ in teenagers and ↑ in elderly, but single peak in the elderly for deaths.
  • No seasonal patterns seen in any data source
  • Small numbers limited ability to interpret regional and urban rural patterns

 

 

Sarcoidosis

  • Male rates>female rates to age 50 then females>males. Partially consistent age distributions: ↑ GP consultations and emergency hospital admissions in ages 40-60, ↑ deaths in ages 55-85.
  • Small numbers limited interpretation of seasonality but GP consultations ↑ June and July while hospital admissions and deaths ↓ July to early October.
  • ↑ North Thames, East Anglia, South Western, ↓ North of England. Higher SERs in rural and conurbation than mixed and urban areas.

 

 

Tuberculosis

  • Highest rates in elderly, M>F in all data sources
  • Consistent lack of seasonal pattern
  • Highest regions North Thames & West Midlands, conurbation > rural

 

 

Table 3 Suggested routine data sources with sufficient numbers to permit annual rankings at district and regional health authority level and degree of consistency between data sources for regional rankings

| Disease | Sufficient nos* for district rankings | Sufficient nos† for regional rankings | Consistency of regional rankings‡ |
| --- | --- | --- | --- |
| Common diseases | | | |
| Asthma | HES, GPRD | Mortality, HES, GPRD, HSE95 | Weak geographical correlations across data sources |
| Acute bronchitis or bronchiolitis | HES, GPRD | HES, GPRD | Moderately good geographical correlation between GPRD and HES |
| COPD | Mortality, HES, GPRD | Mortality, HES, GPRD, HSE95 | Good geographical correlations between data sources |
| Hayfever | GPRD | GPRD, HSE95 | Weak geographical correlation between symptoms and GP prescriptions for hayfever |
| Pneumonia | Mortality, HES | Mortality, HES | Moderately good positive correlations between HES and mortality |
| Rarer diseases | | | |
| Cystic fibrosis | - | HES | Good consistency of regional rankings across data sources |
| Idiopathic fibrosing alveolitis | - | Mortality, HES | Moderate consistency across data sources of regional rankings |
| Pneumothorax | - | HES | Could not be assessed due to small numbers even in combined years |
| Sarcoidosis | - | - | Moderate consistency across data sources of regional rankings |
| Tuberculosis | - | HES, Notifications | Good consistency of rankings between HES and mortality. Moderate consistency of GPRD with HES and mortality. Good consistency of notifications with HES and mortality. Moderate consistency with GPRD. |

* at least 100 events per average district, based on observed number of events in 1994

† total of at least 800 events, based on observed number of events in 1994

‡ based on one year of data for common diseases and on several years of data for rarer diseases
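The two numerical thresholds in the footnotes to Table 3 can be expressed as a simple check, sketched below. The thresholds (100 events per average district, 800 events in total) and the division by 100 come from this summary; the function name and the choice of examples are illustrative. The suggestions in Table 3 may also reflect judgements not captured by these two numbers, so this sketch need not reproduce every cell of the table.

```python
# Thresholds taken from the footnotes to Table 3.
MIN_EVENTS_PER_DISTRICT = 100  # events per average district
MIN_EVENTS_REGIONAL = 800      # total events nationally
N_DISTRICTS = 100              # rule of thumb from the report: national count / 100


def ranking_levels(national_count: int) -> dict:
    """Return which ranking levels a national annual count would support."""
    return {
        "district": national_count / N_DISTRICTS >= MIN_EVENTS_PER_DISTRICT,
        "regional": national_count >= MIN_EVENTS_REGIONAL,
    }


# 1994 counts copied from Table 1.
examples = {
    ("COPD", "Deaths"): 18388,
    ("Asthma", "Deaths"): 1215,
    ("Sarcoidosis", "HES"): 427,
}
for key, count in examples.items():
    print(key, ranking_levels(count))
```

For these three examples the check agrees with Table 3: COPD mortality supports both district and regional rankings, asthma mortality supports regional rankings only, and sarcoidosis admissions support neither.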

 

Discussion

Practical problems in using routine data to explore the geographical distribution of respiratory disease

  1. Boundary changes. Information is often required by administrative boundaries, but these are prone to frequent change. Postcoded or small area data are needed to aggregate data to the required boundary. Facilities need to be in place to allow researchers access to data aggregated to specified boundaries (for example, five years of data to 1999 boundaries).
  2. Data quality. This is a particular problem with hospital admissions data, because quality varies between trusts, which may affect certain types of study.
  3. Diagnosis and coding. Regional differences in clinical practice and clinical coding may not be captured on routine quality reports.
  4. Comparability of coding systems. Different data sources use different clinical coding systems or versions: mortality currently uses ICD9, HES uses ICD10, GPRD uses OXMIS codes, while surveys use text-based questionnaires.
  5. Small numbers. These may limit precision, particularly with age-specific rates using single years of data or in small geographical areas such as DHAs.
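One way to see why small numbers limit precision is to treat an annual count as a Poisson variable. This is an assumption made here for illustration, not a method stated in the report. Under that assumption the standard error of a count n is roughly the square root of n, so relative precision improves only slowly as counts grow. The sketch below uses a normal approximation, which is itself crude for very small counts; the function names are hypothetical.

```python
import math


def approx_95_ci(count: int) -> tuple:
    """Approximate 95% interval for a Poisson count (normal approximation)."""
    half_width = 1.96 * math.sqrt(count)
    return (max(0.0, count - half_width), count + half_width)


def relative_half_width(count: int) -> float:
    """Half-width of the approximate 95% interval as a fraction of the count."""
    return 1.96 / math.sqrt(count)


# Counts roughly corresponding to asthma in an average district
# (Table 1 national figures divided by 100): about 12 deaths, about 789 admissions.
for label, n in [("asthma deaths", 12), ("asthma admissions", 789)]:
    low, high = approx_95_ci(n)
    print(f"{label}: n={n}, approx 95% CI {low:.1f} to {high:.1f}, "
          f"+/-{relative_half_width(n):.0%}")
```

With about 100 events the approximate interval is roughly plus or minus 20% of the count, whereas with about a dozen events it is wider than plus or minus 50%, so year-to-year district rankings based on such counts would be dominated by chance.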

Investigation of environmental influences on respiratory disease

The form of any such investigation will be determined by which data sources have sufficient numbers to permit meaningful statistical analysis and by the level of consistency between them (Table 3). Generally, routine data might be sought to investigate environmental influences on respiratory disease at district health authority level in two circumstances:

(i) Rates from a routine data source are reported to be higher than average

For example, hospital admission rates for a particular disease in the past year were significantly higher than average. Districts may wish to compare rates with previous years and with other routine data sources, taking into account the practical problems as above, and to refer to Table 3 or make their own assessment of consistency.

(a) Inconsistent routine data sources. For example, asthma. One cannot infer that asthma prevalence or mortality rates are high in an area with high hospital admission rates. Factors such as the threshold for hospital admission, geographical proximity to hospital and quality of and access to primary care are more likely than environmental influences to explain the inconsistency between data sources.

(b) Consistent routine data sources. For example, COPD. High hospital admission rates suggest high underlying prevalence of disease. Before investigation of environmental influences, known confounders such as smoking need to be adjusted for. Adjusting for social class can be performed, but may partially adjust for environmental influences or for lifestyle factors such as diet. If differences remain, it would be reasonable to suggest that environmental influences may be responsible. However, it may not be clear whether current environmental exposures or prior exposures (such as in childhood or previous years) are more important and further work may be needed to clarify this.
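As an illustration of adjusting for a known confounder, the sketch below uses indirect standardisation by smoking status: the events expected in a district are calculated by applying reference (for example national) stratum-specific rates to the district's own population, and the observed count is then compared with that expectation. This is one common technique and is not necessarily the adjustment used in the report, which drew on individual data from HSE95. All rates, populations and counts below are hypothetical, and a real analysis would normally stratify by age and sex as well.

```python
def expected_events(reference_rates: dict, local_population: dict) -> float:
    """Events expected locally if the reference stratum-specific rates applied."""
    return sum(reference_rates[s] * local_population[s] for s in local_population)


def standardised_ratio(observed: int, expected: float) -> float:
    """Observed/expected ratio, scaled so that 100 means 'as expected'."""
    return 100.0 * observed / expected


# Hypothetical reference admission rates per person per year, by smoking status.
reference_rates = {"never": 0.001, "ex": 0.003, "current": 0.006}
# Hypothetical district population by smoking status.
local_population = {"never": 100_000, "ex": 50_000, "current": 50_000}
# Hypothetical observed number of admissions in the district.
observed_admissions = 660

expected = expected_events(reference_rates, local_population)
print(expected, standardised_ratio(observed_admissions, expected))
```

A ratio that remains well above 100 after such adjustment corresponds to the situation described above, in which differences remain once known confounders have been taken into account.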

(ii) There is a known or suspected environmental hazard locally

Where data are clearly inconsistent, such as asthma, the data source most clearly related to the problem needs to be used. For example, asthma severity might be better assessed using hospital admissions, while prevalence might be better assessed by survey data on symptoms. Where data are clearly consistent, such as COPD, any data source could be used to estimate the impact of environmental influences.

 

Recommendations

(a) To the Department of Health

Use of routine data to investigate environmental influences on respiratory disease

1. Routine data can be used to give information about the patterns of disease, but they should be interpreted with care. In particular, asthma shows striking inconsistency between routine data sources and high rates of hospital admissions for this disease cannot be interpreted as an indicator of an adverse environmental effect.

Improvements to PACT and HES

2. Inclusion of age (even if limited to a simple distinction between child and adult) would improve the epidemiological usefulness of PACT as a routine data source.

3. Dual coding should be performed for HES in the year before ICD changes as is already performed for mortality data. This could be performed on a 1% national sample. It is recommended that this is performed retrospectively on hospital admissions for 1995/6 to assess the impact of the ICD9 to ICD10 changes.

(b) To the Office for National Statistics

Decennial supplement

4. Comparative analyses across data sources provide useful information about the burden of disease. A comparison using ten years of data could usefully be included with the decennial supplement, concentrated on diseases with particular public health relevance such as asthma and COPD.

Improvements to the GPRD

5. The value of the GPRD for epidemiological analysis would be enhanced if postcode-based socio-economic indicators for each patient were linked to the clinical records. This could be done at the practice level to avoid compromising patient confidentiality by centralised compilation of postcodes.

6. The validity of the smoking data contained in the GPRD needs to be evaluated by analysing it in relation to outcomes such as lung cancer and COPD which are known to be strongly related to smoking.

(c) To Health Authorities, Primary Care Groups and Trusts

Quality of data

7. Health Authorities should routinely monitor the quality of hospital data.

8. The implementation of the government’s IT strategy "Information for Health", clinical governance and the use of routine data for performance monitoring as specified in the White Paper should be used as opportunities to improve the quality of HES.

Conducting epidemiological analyses

9. Researchers undertaking detailed epidemiological analyses using HES should use the full (100% sample) unadjusted data rather than published data (which are adjusted and use a 25% sample).

10. Systematic variations in coverage and missing diagnostic codes in HES should generally be investigated in the following circumstances:

  • Comparison of trusts
  • Investigation of time trends using aggregated data from a small number of trusts, for example investigation of a disease cluster within a Health Authority or of an environmental hazard

If levels of missing data are high, trusts may have to be excluded from studies. Where this is not possible (for example, performance management of local trusts), local knowledge, trust level data quality reports and liaison with the trust concerned may be required. In epidemiological studies, statistical adjustment may be needed.

Systematic variations in coverage and missing diagnostic codes may not need investigation in the following circumstances:

  • Daily or weekly time series analyses, such as studies of short-term fluctuations in air pollution levels (variations in data quality are unlikely to vary with exposure)
  • Large aggregations of data, at national and probably at regional level (variations in quality are likely to even out over larger areas)
  • Using "cross-sectional" data from a single trust (variations are likely to be internally consistent, at least over short periods of time)
  • Using all admissions irrespective of cause (completeness of coverage varies less than missing diagnostic codes)