Education + Advocacy = Change


Click a topic below for an index of articles:




Financial or Socio-Economic Issues


Health Insurance



Institutional Issues

International Reports

Legal Concerns

Math Models or Methods to Predict Trends

Medical Issues

Our Sponsors

Occupational Concerns

Our Board

Religion and infectious diseases

State Governments

Stigma or Discrimination Issues


IIf you would like to submit an article to this website, email us at for a review of this paper

any words all words
Results per page:

“The only thing necessary for these diseases to triumph is for good people and governments to do nothing.”


Statistical methods in epidemic modelling: report on progress 2001-2006
Daniela De Angelis



The main focus of this research programme has been the development and application of statistical methodology to estimate the characteristics and evolution of epidemics, in particular those caused by the Human Immunodeficiency Virus (HIV) and the Hepatitis C Virus (HCV). The main progress relates to: the development and application of methods to estimate current and future disease prevalence (often at different disease stages) and incidence using information from a variety of sources; and the characterisation of disease progression, and the factors affecting it, through the analysis of longitudinal data on disease markers collected in observational cohort studies, accounting for the biases inherent in such studies. The philosophy of this work is mainly Bayesian, with an emphasis on the use of information from multiple sources and at different levels of aggregation and, in particular, on the identification of the critical sources of information needed both to resolve apparent conflicts between data sources and to reduce estimation uncertainty. Apart from my close collaboration with the Health Protection Agency (HPA), the research has benefited from the continuation of important collaborations, such as that with the Concerted Action on SeroConversion to AIDS and Death in Europe (CASCADE) project, and from the establishment of new ones, such as that with the Trent study. These have been fundamental in providing longitudinal data to estimate progression of HIV and HCV disease, respectively. Contributions in other areas include: input to cost-effectiveness analyses of HCV therapies; estimation of the evolution of the injecting drug use "epidemic"; and modelling of errors in protein databases.


My research activities are motivated by the need to provide evidence-based input to public health policies in England and Wales. This work is carried out through the link with the HPA (previously the Public Health Laboratory Service), which funds my position. This long-standing collaboration has proved very successful for both organisations. The HPA is the national, and internationally renowned, centre for surveillance of infectious diseases, with responsibility for preparedness for new and emerging health threats such as bio-terrorist attacks and virulent new disease strains. It offers a wealth of information, in-depth epidemiological expertise and, importantly, the opportunity to contribute to the formulation of policies on genuinely significant and pressing public health issues. The Biostatistics Unit, on the other hand, provides the specialised statistical knowledge that is essential to providing sound, evidence-based advice.

Most of my work is focused on the development of statistical methods to address problems relevant to the HPA, and it is mainly carried out at the Biostatistics Unit. However, my responsibilities to the HPA also include providing more routine statistical advice to junior statisticians and epidemiologists.

HIV modelling

HIV incidence estimation

HIV has been the most serious communicable disease in the UK since the mid-1980s. Traditionally, estimates of the number of HIV-infected individuals in high-risk groups and short-term predictions of AIDS cases have been the quantitative underpinning of the Department of Health's public health strategies. The advent of highly active antiretroviral therapy (HAART) in the late 1990s has substantially reduced the incidence of AIDS and death in industrialised countries. This, together with the shift from epidemic to endemic HIV transmission in high-risk groups, has altered research priorities. Therefore, while estimation of prevalence, particularly in ethnic minorities, has become increasingly important, knowledge of future trends in AIDS cases is no longer central to health care planning. Estimation and prediction of the number of people at earlier stages of HIV disease and, in particular, of the number of new infections have become more relevant to policy. Knowledge of the recent and current level of HIV transmission is the basis for the Department of Health's Sexual Health Strategy, whose main aim is to reduce transmission of HIV and sexually transmitted diseases by 25% by 2007. However, the introduction of HAART has also irremediably changed the historical trends in AIDS incidence and the natural history of HIV, compromising the use of more traditional methods for incidence estimation. The back-calculation method [05.503], which estimates the underlying HIV incidence from reports of AIDS cases and the distribution of the time from HIV infection to AIDS (the incubation time), is now problematic to implement. This indicated the need for new approaches that are less reliant on AIDS figures, that can exploit alternative, already existing data sources and, crucially, that can identify the enhancements to current surveillance systems that would most usefully inform estimation.
Little progress in this direction has been made in other industrialised countries, which typically do not have surveillance systems as rich as those of the UK. Attempts have been limited to devising versions of back-calculation that are based on HIV diagnosis data rather than AIDS cases. However, this introduces the added difficulty of estimating the distribution of the time between infection and HIV diagnosis (the time of the first HIV-positive test), which is not the result of a natural process but depends on the state of the immune system as well as on external pressures (such as awareness campaigns) that may change over time (Chau et al, 2003; Ping Yang, Public Health Agency of Canada, personal communication).
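The back-calculation idea described above can be made concrete with a short sketch. Expected AIDS counts in year t are the convolution of past infections with the incubation distribution, mu[t] = sum over s <= t of h[s] * f[t - s]; back-calculation runs this relationship in reverse. The code is a minimal illustration under stated assumptions: annual time steps, a known and made-up discretised incubation distribution `f`, no treatment effects, and an expectation-maximisation (EM) deconvolution for Poisson counts used as a simple, non-Bayesian stand-in for the methods described in this report. The names `expected_cases` and `back_calculate` are hypothetical.

```python
"""Toy back-calculation: recover HIV incidence from AIDS counts by deconvolution.

Sketch only. Assumes annual time steps, a known discretised incubation
distribution f (f[d] = P(AIDS d years after infection)), no treatment effects,
and uses an EM deconvolution for Poisson counts as a simple, non-Bayesian
stand-in for the Bayesian methods described in the text.
"""


def expected_cases(incidence, f):
    """Forward model: mu[t] = sum over s <= t of incidence[s] * f[t - s].

    Requires len(f) >= len(incidence).
    """
    T = len(incidence)
    return [sum(incidence[s] * f[t - s] for s in range(t + 1)) for t in range(T)]


def back_calculate(aids_counts, f, n_iter=500):
    """Estimate annual infection counts h[s] from observed AIDS counts."""
    T = len(aids_counts)
    # F[s]: probability that an infection in year s has progressed to AIDS by
    # the last observed year. It is small for recent s, so recent incidence is
    # poorly determined by AIDS data alone.
    F = [sum(f[: T - s]) for s in range(T)]
    h = [max(aids_counts) + 1.0] * T  # flat, strictly positive starting value
    for _ in range(n_iter):
        mu = expected_cases(h, f)
        new_h = []
        for s in range(T):
            if F[s] == 0.0:
                new_h.append(h[s])  # not identifiable from these data: leave as is
                continue
            ratio_sum = sum(
                aids_counts[t] * f[t - s] / mu[t] for t in range(s, T) if mu[t] > 0
            )
            new_h.append(h[s] * ratio_sum / F[s])  # EM update
        h = new_h
    return h


# Illustrative, made-up inputs (not surveillance data).
f = [0.0, 0.01, 0.03, 0.05, 0.07, 0.08, 0.09, 0.09, 0.08, 0.07, 0.06, 0.05]
true_incidence = [100, 300, 600, 900, 1000, 900, 700, 500, 400, 350, 300, 300]
observed_aids = [round(m) for m in expected_cases(true_incidence, f)]

estimated = back_calculate(observed_aids, f)
fitted = expected_cases(estimated, f)
```

Because `F[s]` is small for recent infection years, estimates of recent incidence rest on very few expected AIDS cases; this is the weakness that motivates the diagnosis-based and multistage alternatives discussed in this section.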

We have addressed this situation through the development of new methods that exploit the complex body of information available on HIV. Our contribution adopts a Bayesian perspective, which is particularly suited to the incorporation of uncertain information from a variety of sources and allows coherent propagation of uncertainty.

Collaboration with Wally Gilks has led to the development of an approach to reconstructing relevant aspects of the HIV epidemic, in particular HIV incidence, consistently with surveillance data at both the individual and aggregate levels, as well as with information from ad hoc surveys, national surveys and routinely collected statistics. One of the main goals of this work was to investigate the role of additional information (whether already or potentially available) in the estimation of HIV incidence, and to make recommendations for the routine collection of such information. The feasibility of the idea has been tested on data on homosexual men from England and Wales. Results have confirmed the potential of the approach while highlighting its computational burden. The appeal of this approach lies in its individual-based modelling, which allows the incorporation of individual-specific information.

As an alternative approach [05.110], we have extended our previous work (Aalen et al, 1997) by proposing a discrete-time multistage version of the back-calculation method that uses as end-points HIV diagnoses and AIDS diagnoses without a previous HIV diagnosis. HIV progression is described as a series of disease stages defined by CD4 count, and HIV diagnosis can occur from any stage, at rates that depend on both stage and calendar time. In this way we model explicitly the dependence of the diagnosis process on both the state of the immune system and calendar time. Information on HIV progression through CD4 stages and data on the end-points are used to estimate diagnosis rates and HIV incidence rates. These parameters are only identifiable if ancillary surveillance data (from the CD4 database held at the HPA) are included in the model, which clearly demonstrates the value of this specific surveillance programme. Aggregate information on the number of undiagnosed infected individuals can also be used to refine estimation. This formulation, in which HIV diagnoses replace AIDS diagnoses, avoids the need to deal with treatment effects, which would complicate the model substantially. Application has concentrated on the epidemic among homosexual men.
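The forward (likelihood) part of such a multistage model can be sketched as follows: infections enter the first CD4 stage, undiagnosed people are diagnosed with stage- and calendar-time-dependent probabilities, and those leaving the last stage undiagnosed appear as AIDS diagnoses with no previous HIV diagnosis. This is a minimal illustration, not the model of [05.110]: annual steps, the within-year ordering (diagnosis before progression), the three stages and every numerical input are assumptions, and the function name `project_diagnoses` is hypothetical. Fitting incidence and diagnosis probabilities to data is not shown.

```python
"""Forward model for multistage back-calculation with HIV-diagnosis end-points.

Sketch under stated assumptions: annual steps; undiagnosed infections move
through K CD4-defined stages; within each year, diagnosis happens first and
then progression; an undiagnosed person leaving the last stage is counted as
an AIDS diagnosis with no previous HIV diagnosis.
"""


def project_diagnoses(incidence, progress, diag):
    """Return expected HIV diagnoses by year and stage, AIDS-first diagnoses
    by year, and the undiagnosed population by stage at the end.

    incidence[t]: new infections in year t (entering stage 0)
    progress[k]:  annual probability that an undiagnosed person in stage k
                  moves to stage k + 1 (from the last stage: to AIDS)
    diag[k][t]:   probability that an undiagnosed person in stage k is
                  diagnosed in calendar year t (stage- and time-dependent)
    """
    K = len(progress)
    undiagnosed = [0.0] * K
    hiv_dx, aids_first = [], []
    for t, new_infections in enumerate(incidence):
        undiagnosed[0] += new_infections
        dx = [undiagnosed[k] * diag[k][t] for k in range(K)]
        remaining = [undiagnosed[k] - dx[k] for k in range(K)]
        moving = [remaining[k] * progress[k] for k in range(K)]
        aids_first.append(moving[-1])  # AIDS with no previous HIV diagnosis
        undiagnosed = [
            remaining[k] - moving[k] + (moving[k - 1] if k > 0 else 0.0)
            for k in range(K)
        ]
        hiv_dx.append(dx)
    return hiv_dx, aids_first, undiagnosed


# Hypothetical inputs: 3 CD4 stages, 10 years; diagnosis is more likely at
# lower CD4 counts and in later calendar years.
years = 10
incidence = [500.0] * years
progress = [0.15, 0.20, 0.25]
diag = [[min(0.9, base + 0.02 * t) for t in range(years)] for base in (0.05, 0.10, 0.30)]

hiv_dx, aids_first, still_undiagnosed = project_diagnoses(incidence, progress, diag)
```

Every infection ends up as an HIV diagnosis, an AIDS-first diagnosis, or still undiagnosed, so the three outputs always add up to total incidence; in inference, the observed end-points constrain incidence and `diag` only jointly, which is why external CD4 data are needed for identifiability.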


HIV prevalence estimation

Estimation of HIV prevalence remains essential even after the introduction of HAART. Traditionally, prevalence estimates have been obtained as a cross-sectional summary of prevalences in various groups at high risk for HIV. The "Direct" method (see, for example, Petruckevitch et al 1997; [06.073]) combines information on the size of each risk group, derived from population-based surveys of HIV-related risk or exposure behaviours, with HIV prevalence estimates from anonymous surveys.

A characteristic of this method is that it depends on the availability of direct data on the relevant parameters, such as the size of a particular high-risk group. If direct data are not available, assumptions and adjustments are used to derive estimates of these parameters. Because of these adjustments, it is not possible to attach a measure of uncertainty to the final prevalence estimates, so only point estimates are available. In this framework there is also no scope for validating results, as there is no notion of a model or of model fitting. This has been the accepted method for estimating HIV prevalence in the UK.
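At its core, the "Direct" method is a sum over risk groups of group size multiplied by group prevalence. The sketch below shows only that arithmetic; the group labels, sizes and prevalences are hypothetical placeholders rather than UK figures, and `direct_estimate` is an illustrative name.

```python
"""The "Direct" method as arithmetic: total number infected = sum over risk
groups of (group size x group prevalence).

Group labels and all numbers below are hypothetical placeholders.
"""

# group label: (estimated group size, HIV prevalence from anonymous surveys)
groups = {
    "risk_group_A": (500_000, 0.05),
    "risk_group_B": (20_000, 0.02),
    "rest_of_population": (30_000_000, 0.0005),
}


def direct_estimate(groups):
    """Return per-group numbers infected and their total (point estimates only)."""
    per_group = {name: size * prev for name, (size, prev) in groups.items()}
    return per_group, sum(per_group.values())


per_group, total_infected = direct_estimate(groups)
# total_infected is a single number: no interval comes out of this calculation,
# which is the limitation noted in the text.
print(per_group, total_infected)
```

Any adjustment made to a group size or prevalence before it enters this sum changes the answer, but nothing in the calculation records how uncertain that adjustment was.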

In collaboration with Tony Ades (Bristol), we have proposed an alternative Bayesian multi-parameter evidence synthesis (MPES) approach [06.405]. The philosophy underlying the approach is that all available data, both direct and indirect, are used, so that the estimation of each parameter can also be informed by indirect data and, in fact, multiple sources of evidence can contribute to it. This uses information more efficiently, leads to more precise estimates, is less prone to biases due to the selection of information and, importantly, allows assessment of whether the various pieces of information are consistent with one another. For example, we found that, under the interpretation routinely assumed in HIV prevalence estimates obtained through the "Direct" method, some of the HIV surveillance sources conflicted with each other. Finally, the Bayesian paradigm naturally provides the crucial measure of uncertainty. A substantive characteristic of the project has been the emphasis on consistency of evidence and the use of diagnostics for model fit and model choice. These important issues will be the topic of an MRC-funded workshop in September 2006.
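A toy example can show how indirect data inform a parameter in evidence synthesis. Here one risk group has prevalence `pi` and proportion diagnosed `delta`; one hypothetical source observes `pi` directly, while two others observe only `pi * delta` and `pi * (1 - delta)`. Everything is an assumption for illustration: the data are invented, the priors are uniform, and a grid approximation replaces MCMC only to keep the sketch dependency-free. It is not the 13-group model described in this section.

```python
"""Toy multi-parameter evidence synthesis on a grid.

Hypothetical set-up: one risk group with HIV prevalence pi and proportion
diagnosed delta; independent uniform priors; three binomial sources, one
informing pi directly and two informing functions of (pi, delta).
"""
import math


def log_binom(y, n, p):
    """Binomial log-likelihood, up to an additive constant."""
    if not 0.0 < p < 1.0:
        return -math.inf
    return y * math.log(p) + (n - y) * math.log(1.0 - p)


def grid_posterior(sources, grid_size=200):
    """Normalised posterior weights for (pi, delta) on a regular grid.

    sources: list of (y, n, g), where g(pi, delta) is the probability that
    the source's binomial observations estimate.
    """
    points = [(i + 0.5) / grid_size for i in range(grid_size)]
    cells = []
    for pi in points:
        for delta in points:
            lp = sum(log_binom(y, n, g(pi, delta)) for y, n, g in sources)
            cells.append((pi, delta, lp))
    top = max(lp for _, _, lp in cells)
    weights = [math.exp(lp - top) for _, _, lp in cells]
    total = sum(weights)
    return [(pi, delta, w / total) for (pi, delta, _), w in zip(cells, weights)]


def posterior_mean(post, index):
    """index 0 -> pi, index 1 -> delta."""
    return sum(cell[index] * cell[2] for cell in post)


sources = [
    (60, 1000, lambda pi, d: pi),            # direct: anonymous prevalence survey
    (70, 2000, lambda pi, d: pi * d),        # indirect: diagnosed infections
    (50, 2000, lambda pi, d: pi * (1 - d)),  # indirect: undiagnosed infections
]

post_all = grid_posterior(sources)
post_indirect = grid_posterior(sources[1:])  # pi informed by indirect data only
pi_all, delta_all = posterior_mean(post_all, 0), posterior_mean(post_all, 1)
pi_indirect = posterior_mean(post_indirect, 0)
print(round(pi_all, 4), round(delta_all, 3), round(pi_indirect, 4))
```

Comparing `pi_all` with `pi_indirect` is a crude consistency check: if the direct source and the indirect sources pointed to clearly different prevalences, that discrepancy would flag a conflict of the kind the text describes.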

We have assessed the feasibility of our approach using surveillance data for 2001, census data, and information from the National Survey of Sexual Attitudes and Lifestyles and the National Study of HIV in Pregnancy. The model developed includes thirteen distinct risk groups and estimates the size of each risk group, the proportion infected in each risk group and the proportion of infected individuals who have been diagnosed, in three regions (Inner London, Outer London, and the rest of England and Wales). The MPES has been used to derive the official HIV prevalence estimates for 2004 [05.405].

A promising development of the above work is its use to obtain HIV incidence estimates. As a result of the work on prevalence, we derive estimates, for each risk group, of the number of people in the compartments "infected", "infected not yet diagnosed" and "infected and diagnosed" at any given point in time. These estimates, combined with an appropriate model for transitions between compartments (and groups), can be used to estimate transition rates, in particular the rate of transition from "uninfected" to "infected", i.e. the incidence of infection. The feasibility of this approach has been demonstrated, and further developments are the topic of Anne Presanis's PhD project.
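In the simplest case, the step from compartment sizes to incidence amounts to solving a transition equation for the infection rate. The sketch assumes, hypothetically, a single closed risk group in which infected people leave only through death or emigration at a constant annual probability, so that I[t+1] = (1 - exit_prob) * I[t] + lam[t] * S[t]; diagnosis compartments and movement between groups are ignored. The numbers are invented and `infection_rates` is an illustrative name, not the model used in the PhD project.

```python
"""Incidence implied by successive compartment estimates.

Hypothetical assumptions: one closed risk group; infected people leave only
through death or emigration, with constant annual probability exit_prob;
I[t+1] = (1 - exit_prob) * I[t] + lam[t] * S[t].
"""


def infection_rates(infected, uninfected, exit_prob):
    """Solve the transition equation for lam[t], the annual infection rate."""
    if len(infected) != len(uninfected):
        raise ValueError("series must have equal length")
    return [
        (infected[t + 1] - (1.0 - exit_prob) * infected[t]) / uninfected[t]
        for t in range(len(infected) - 1)
    ]


# Made-up prevalence estimates for three consecutive years.
infected = [20_000, 21_000, 22_500]
uninfected = [480_000, 479_500, 478_700]
rates = infection_rates(infected, uninfected, exit_prob=0.02)
new_infections = [r * s for r, s in zip(rates, uninfected)]
print(rates, new_infections)
```

In a Bayesian analysis the same calculation would be applied to each posterior draw of the compartment sizes, so that prevalence uncertainty carries through to incidence; a negative rate would signal that the inputs are inconsistent with the assumed transition model.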

HCV modelling

Worldwide, infection with the Hepatitis C Virus (HCV) is a major cause of chronic liver disease, including liver cancer. In England, HCV has been identified as a priority in the Chief Medical Officer's strategy for the control of infectious diseases, with the aim of improving prevention, diagnosis and treatment. Our involvement started with participation in the Department of Health (DH) strategy committee (Department of Health Strategy Group for Hepatitis), whose work defined the HCV Action Plan for England. The research we are now conducting aims to provide quantitative support for this plan. However, information on HCV spread is very limited, surveillance systems are still underdeveloped and little is known about the progression of HCV infection. Once again, a key role of our work is to identify directions for the development of surveillance systems. So far, we have concentrated on two areas: estimation of HCV prevalence and prediction of future burden in the general population.

Estimation of HCV prevalence

There is no agreement on the prevalence of HCV in the general population, and the topic has recently been the subject of public debate. Data on HCV prevalence come from testing residual sera collected within unlinked anonymous programmes in key groups, such as pregnant women and attenders at genitourinary medicine (GUM) clinics, as well as from routine diagnostic hospital testing. As with the "Direct" method for HIV, estimates of the number of HCV-infected individuals could be derived by combining the proportion infected in each group with the group's size. The resulting estimates are, however, difficult to interpret: they are biased estimates of HCV prevalence in the general population, because these key groups are mixtures of sub-groups with different risks of HCV. Only by including information on these mixtures, which may come from several data sources, is it possible to provide a reliable estimate of HCV prevalence. In collaboration with Matthew Hickman (Bristol), Tony Ades (Bristol) and epidemiologists at the HPA, we have developed an epidemiological model of the population of England and Wales aged 15 to 59 years, subdivided by gender, region and risk group (current injecting drug users (IDUs), ex-IDUs and non-IDUs). Our approach is again Bayesian, and our goal is to include all available information on the relevant parameters. This modelling exercise has revealed inconsistencies between sources of information and identified the current lack of information on both HCV prevalence among ex-IDUs and the size of this group.
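The mixture argument above can be illustrated numerically: prevalence in a key group and in the general population are both weighted averages of the same sub-group prevalences, but with different weights. All sub-group prevalences and compositions below are hypothetical, and `mixture_prevalence` is an illustrative name.

```python
"""Why prevalence in a key surveillance group is a biased estimate of
population prevalence: both are mixtures of risk sub-groups, but with
different mixing proportions.

All sub-group prevalences and compositions are hypothetical.
"""

# HCV prevalence within each risk sub-group (hypothetical values).
subgroup_prevalence = {"current_IDU": 0.45, "ex_IDU": 0.30, "non_IDU": 0.005}


def mixture_prevalence(composition, prevalence):
    """Prevalence of a population whose members fall into the risk
    sub-groups in the proportions given by `composition`."""
    if abs(sum(composition.values()) - 1.0) > 1e-9:
        raise ValueError("composition must sum to 1")
    return sum(share * prevalence[group] for group, share in composition.items())


# Hypothetical compositions: IDUs are over-represented in the key group.
key_group_mix = {"current_IDU": 0.02, "ex_IDU": 0.04, "non_IDU": 0.94}
population_mix = {"current_IDU": 0.003, "ex_IDU": 0.01, "non_IDU": 0.987}

key_group_prev = mixture_prevalence(key_group_mix, subgroup_prevalence)
population_prev = mixture_prevalence(population_mix, subgroup_prevalence)
# Applying key_group_prev to the whole population would overstate prevalence;
# correcting it requires knowing both compositions, i.e. extra data sources.
print(round(key_group_prev, 6), round(population_prev, 6))
```

With these made-up inputs the key group's prevalence is almost three times the population value, even though every sub-group prevalence is shared; the size and prevalence of the ex-IDU group, flagged in the text as poorly known, directly drive the gap.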

Prediction of HCV burden

Application of an age-specific version of the Bayesian back-calculation approach developed for HIV has provided estimates of the current and future burden of chronic HCV by disease stage. We have used data over time on hepatocellular carcinoma (HCC) due to HCV, estimates of HCV progression through disease stages and information on the number of hospital admissions due to end-stage liver disease and HCC to reconstruct the underlying incidence of HCV. The resulting incidence has then been used to predict the number of chronically infected individuals by disease stage. Here, the inclusion of additional information, such as hospital admission data, has served to highlight the need for further research on some of the key parameters: for example, data on HCC deaths conflict with information from hospital admissions unless some of the transition probabilities are allowed to vary by age. An alternative explanation is a bias in the hospital admission data, on which there is currently no information. We have presented results in the first annual report on HCV in England, showing a likely substantial increase in the burden on healthcare resources [05.402; 06.118]. Remarkably, trends in the underlying incidence of HCV mirror those in the incidence of injecting drug use, as recently estimated from data on overdose mortality [04.032].
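Once incidence has been reconstructed, predicting the burden by disease stage is a forward projection through the stage transitions. The sketch makes several hypothetical simplifications: annual steps, constant (not age-specific) progression probabilities, four illustrative stages, no background mortality or treatment, and an absorbing last stage. The stage labels, probabilities and the name `project_burden` are assumptions for illustration only.

```python
"""Forward projection of chronic HCV infections by disease stage.

Sketch under hypothetical assumptions: annual steps; constant stage-to-stage
progression probabilities (the text notes some may need to vary by age);
no background mortality or treatment; the last stage is absorbing.
"""

STAGES = ["mild", "moderate", "cirrhosis", "HCC"]


def project_burden(incidence, p_progress):
    """Expected number in each stage at the end of each year.

    incidence[t]:  new chronic infections in year t (entering the first stage)
    p_progress[k]: annual probability of moving from stage k to stage k + 1
    """
    counts = [0.0] * (len(p_progress) + 1)
    history = []
    for new_cases in incidence:
        moves = [counts[k] * p for k, p in enumerate(p_progress)]
        nxt = counts[:]
        for k, m in enumerate(moves):
            nxt[k] -= m
            nxt[k + 1] += m
        nxt[0] += new_cases
        counts = nxt
        history.append(counts)
    return history


# Hypothetical inputs: reconstructed incidence followed by a projected period.
incidence = [1000.0] * 30
history = project_burden(incidence, p_progress=[0.02, 0.03, 0.04])
final = dict(zip(STAGES, history[-1]))
print({stage: round(n) for stage, n in final.items()})
```

Making `p_progress` depend on age would require tracking cohorts by age as well as stage, which is the kind of extension the conflict between HCC deaths and hospital admissions points to.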

Disease progression modelling

Information on disease progression is an essential ingredient in understanding the evolution of epidemics. In the HIV field, estimates of disease progression from HIV infection have been provided by cohort studies, mostly conducted in the United States, of individuals typically enrolled after HIV infection and with an unknown date of seroconversion. Since 1997, the CASCADE collaboration has pooled 22 cohort studies from several European countries, providing what is currently the largest cohort of HIV-infected individuals with well-estimated seroconversion dates. As a member statistician, I have been involved in a number of projects and, in particular, in the parametric and semi-parametric modelling of the incubation time to AIDS [06.031]. More importantly, data from CASCADE have offered the opportunity to estimate CD4-based staged models of HIV progression specific to age at seroconversion, both before and after the advent of HAART. The resulting age-specific transition rates are a fundamental input to the further development of the models described in the HIV modelling section.
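As an example of a parametric incubation-time model, the sketch uses a Weibull distribution whose scale shrinks with age at seroconversion (an accelerated-failure-time form), and discretises it into the annual probabilities that back-calculation needs. The Weibull choice, the shape, the baseline scale and the age coefficient are all hypothetical illustrations, not CASCADE estimates or the models of [06.031]; the function names are illustrative.

```python
"""Parametric incubation-time sketch: a Weibull distribution whose scale
depends on age at seroconversion (an accelerated-failure-time form).

Shape, baseline scale and age coefficient are hypothetical.
"""
import math


def weibull_cdf(t, shape, scale):
    """P(incubation time <= t years)."""
    return 1.0 - math.exp(-((t / scale) ** shape)) if t > 0 else 0.0


def scale_for_age(age, base_scale=11.0, age_coef=0.02, ref_age=30.0):
    """Hypothetical: older age at seroconversion shortens incubation."""
    return base_scale * math.exp(-age_coef * (age - ref_age))


def annual_incubation_probs(shape, scale, years):
    """f[d] = P(d <= T < d + 1), the discretised form used in back-calculation."""
    return [weibull_cdf(d + 1, shape, scale) - weibull_cdf(d, shape, scale)
            for d in range(years)]


shape = 2.3
for age in (25, 45):
    scale = scale_for_age(age)
    median = scale * math.log(2.0) ** (1.0 / shape)  # Weibull median
    print(age, round(median, 1), round(weibull_cdf(10.0, shape, scale), 2))
```

The discretised vector returned by `annual_incubation_probs` plays the role of `f` in a back-calculation forward model; using a different vector per age group is one simple way to let incubation depend on age at seroconversion.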

Research to estimate HCV progression is essential both to projecting future HCV burden and to the clinical management of HCV-infected individuals. Progression is typically estimated using data on disease severity, established through a fibrosis scoring system, from patients who have undergone liver biopsy. At the patient level, it is estimated by dividing the change in fibrosis score between two consecutive observations by the time elapsed between them. There are two problems with the common approaches: firstly, the estimation method assumes that the patient enters the fibrosis stage precisely at the time of observation and that progression is constant thereafter; and secondly, the dependence of recruitment on the underlying disease process produces biased estimates of progression. In collaborative work [06.117] we have addressed these two problems by adopting a three-stage progressive Markov model to describe fibrosis progression from mild HCV disease to a cirrhotic state. Models of this type are well suited to estimating transition rates between disease stages from interval-censored observations. We have analysed biopsy results from three cohorts with different recruitment policies, and estimates of progression vary substantially according to the method of recruitment. The probability of developing cirrhosis 20 years after infection, for a group of patients with a specific profile, was estimated to be 6% (95% CI 3%-13%) using data from the HCV National Register (Harris et al, 2000), 12% (95% CI 6%-22%) using data from the Trent Study (Mohsen, 2001), a hospital-based cohort, and 23% (95% CI 14%-37%) using data from a tertiary referral centre for liver disease. Importantly, the HCV National Register, run at the Health Protection Agency and based on a "lookback" exercise of individuals infected through blood transfusion, recruits patients independently of their disease severity.
We have used estimates from this cohort to predict future HCV burden, while those derived from the hospital-based cohorts have been used as input to cost-effectiveness analyses of treating HCV at a mild stage [06.046].
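A three-stage progressive Markov model handles interval censoring naturally: each pair of consecutive biopsies contributes the probability of moving from the first observed stage to the second over the elapsed time, without needing the unobserved transition times. The sketch uses the closed-form transition probabilities for stages mild, moderate and cirrhosis with constant rates `q1` and `q2` (assumed unequal), and fits them to invented biopsy pairs by a crude grid search. It does not model recruitment bias, and it is an illustration, not the analysis of [06.117].

```python
"""Three-stage progressive Markov model for fibrosis, fitted to
interval-censored biopsy pairs by maximum likelihood on a crude grid.

Hypothetical set-up: stages 0 = mild, 1 = moderate, 2 = cirrhosis; constant
rates q1 (0->1) and q2 (1->2) with q1 != q2; the biopsy data are invented.
"""
import math


def transition_matrix(t, q1, q2):
    """P(t)[i][j] = P(stage j at time t | stage i at time 0)."""
    e1, e2 = math.exp(-q1 * t), math.exp(-q2 * t)
    p01 = q1 / (q2 - q1) * (e1 - e2)
    return [
        [e1, p01, 1.0 - e1 - p01],
        [0.0, e2, 1.0 - e2],
        [0.0, 0.0, 1.0],
    ]


def log_likelihood(pairs, q1, q2):
    """Each pair (stage_first, stage_second, years_between) contributes
    log P(years_between)[stage_first][stage_second]; exact transition times
    between biopsies are never needed (interval censoring)."""
    total = 0.0
    for s_from, s_to, dt in pairs:
        p = transition_matrix(dt, q1, q2)[s_from][s_to]
        if p <= 0.0:
            return -math.inf
        total += math.log(p)
    return total


pairs = [(0, 0, 5.0), (0, 1, 6.0), (0, 0, 3.0), (1, 2, 4.0),
         (0, 1, 8.0), (1, 1, 5.0), (0, 2, 10.0), (0, 0, 2.0)]

# Offset grids so that q1 never equals q2.
q1_grid = [i / 100 for i in range(1, 51)]
q2_grid = [(j + 0.5) / 100 for j in range(1, 51)]
best_ll, q1_hat, q2_hat = max(
    (log_likelihood(pairs, a, b), a, b) for a in q1_grid for b in q2_grid
)

# Probability of cirrhosis 20 years after infection, starting in the mild stage.
p_cirrhosis_20 = transition_matrix(20.0, q1_hat, q2_hat)[0][2]
print(q1_hat, q2_hat, round(p_cirrhosis_20, 3))
```

Because the likelihood conditions on the stage at the first biopsy, patients recruited because they already had advanced disease still bias the rates; addressing that requires modelling the recruitment process, as the text notes.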


Errors in protein databases

My involvement in bioinformatics has focused on modelling the process of error percolation in databases of protein sequences. Proteins are responsible for the functioning of an organism by performing specific tasks. Publicly available databases of protein sequences report this function in the form of an annotation. In good-quality databases, the annotation is assigned manually, on the basis of experimental evidence. Genome sequencing projects have resulted in a rapid increase in protein sequence information, and the annotation process has been accelerated through the use of automatic methods based on sequence similarity. The function of a protein is now more commonly attributed by copying the annotation from already annotated proteins that are "homologous", i.e. that show, through sequence similarity, a common origin with the protein of interest. This process is prone to error if, for instance, the functional annotation of the homologous proteins has itself been derived from sequence similarity, as no information is kept on how the annotation of a protein was acquired. Annotation errors can then percolate through the database via this copying mechanism. In collaboration with Christos Ouzounis's group at the European Bioinformatics Institute at Hinxton, we have modelled the percolation process to investigate the effect of progressive misannotation on the quality of the database [02.029]. Results have shown a worrying progressive deterioration in database quality, and we recommended improved data tracking. We later extended our model to deal with more complex annotation structures [05.032].
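The copying mechanism can be illustrated with a deliberately simplified simulation. Each new protein is either annotated experimentally (assumed correct) or by copying from a randomly chosen existing entry; a copy inherits any error in its source and may also introduce a fresh one. The copying probability, per-copy error rate and function name are hypothetical, and this toy is not the model published in [02.029] or [05.032]; as in the text, no provenance is recorded, so copied errors cannot be traced back.

```python
"""Toy simulation of annotation-error percolation in a growing database.

Hypothetical rules: new entries are annotated experimentally (assumed correct)
with probability 1 - p_copy, otherwise by copying a random existing entry; a
copy inherits its source's error and adds a fresh error with probability
p_error. No record is kept of where an annotation came from.
"""
import random


def simulate_percolation(n_initial, n_new, p_copy, p_error, seed=0):
    """Return the fraction of erroneous annotations after each new entry."""
    rng = random.Random(seed)
    wrong = [False] * n_initial  # initial entries: experimentally annotated
    n_wrong = 0
    fractions = []
    for _ in range(n_new):
        if rng.random() < p_copy:
            source_wrong = wrong[rng.randrange(len(wrong))]
            is_wrong = source_wrong or rng.random() < p_error
        else:
            is_wrong = False
        wrong.append(is_wrong)
        n_wrong += is_wrong
        fractions.append(n_wrong / len(wrong))
    return fractions


fractions = simulate_percolation(1000, 5000, p_copy=0.8, p_error=0.05, seed=1)
print(round(fractions[999], 3), round(fractions[-1], 3))
```

Inherited errors compound fresh ones, so the error fraction tends to drift upwards as the share of copied entries grows; tracking how each annotation was acquired would allow copied errors to be traced and corrected.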


Reconstructing the IDU epidemic

The spread of injecting drug use is analogous to the spread of an infectious disease. Its evolution is strongly related to that of the HCV epidemic, as sharing contaminated needles is the major route of HCV transmission. In collaboration with Matthew Hickman (Bristol), we have conducted work to estimate the characteristics of the IDU epidemic. In a report to the Home Office [02.402] we reviewed methods for estimating prevalence and incidence. In [01.031] we attempted to estimate the incidence of opiate use/injecting drug use using data on the number of IDUs in treatment. Finally, in [04.032] we exploited information on the age at first injection and age-specific mortality due to overdose, as well as information on the duration of injecting histories, to derive age-specific estimates of the incidence of opiate use/injecting drug use. Results are sensitive to assumptions made about key parameters, such as the distribution of the length of the injecting career, on which there is currently little evidence. Further insight into opiate-related overdose mortality, derived from the Cohort Studies on Mortality of Opiate-users workshop run in collaboration with Sheila Bird in November 2003 [05.005], could also help to refine these estimates.
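The sensitivity to injecting-career length can be seen in a simple forward model: expected overdose deaths in a year are proportional to the number still injecting, which depends on past initiations and on how long careers last. The sketch assumes, hypothetically, an exponential career length, a constant overdose death rate while injecting, no other-cause mortality, and that overdose deaths are not removed from the injecting population; all numbers and function names are illustrative, not those of [04.032].

```python
"""Sensitivity of the overdose-death forward model to the assumed length of
the injecting career.

Hypothetical assumptions: constant annual overdose death rate while
injecting; exponential career length with mean `mean_career` years; deaths
are not removed from the injecting population; other causes are ignored.
"""
import math


def current_injectors(incidence, mean_career):
    """Expected number still injecting in each year t:
    sum over s <= t of incidence[s] * P(career longer than t - s)."""
    return [
        sum(incidence[s] * math.exp(-(t - s) / mean_career) for s in range(t + 1))
        for t in range(len(incidence))
    ]


def expected_overdose_deaths(incidence, mean_career, overdose_rate):
    """Expected overdose deaths per year among current injectors."""
    return [overdose_rate * n for n in current_injectors(incidence, mean_career)]


incidence = [1000.0] * 20  # hypothetical initiations per year
for mean_career in (5.0, 10.0):
    deaths = expected_overdose_deaths(incidence, mean_career, overdose_rate=0.01)
    former = sum(incidence) - current_injectors(incidence, mean_career)[-1]
    print(mean_career, round(deaths[-1], 1), round(former))
```

Read in reverse, the same observed death counts imply lower incidence when careers are assumed to be longer, and a different split between current and former users; this is why the back-calculated estimates hinge on the career-length distribution.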

Summary of major achievements

  • development of Bayesian approaches for disease incidence and prevalence estimation with application to HIV;
  • estimation and prediction of HCV burden by disease stage using Bayesian multistage models;
  • detailed statistical analysis of data from observational cohorts leading to a greater understanding of HCV progression.

Publications from this programme


Publications - 2001


CASCADE Collaboration, (participant: De Angelis D). Is the time from HIV seroconversion a determinant of the risk of AIDS after adjustment for updated CD4 cell counts? Journal of Acquired Immune Deficiency Syndromes 2001; 28: 158-165.


Hickman M, Seaman SR, De Angelis D. Estimating the relative incidence of heroin use: application of a method for adjusting observed reports of first visits to specialized drug treatment agencies. American Journal of Epidemiology 2001; 153: 632-641.


Nicoll A, Hughes G, Donnelly M, Livingstone S, De Angelis D, Fenton K, Evans B, Gill ON, Catchpole M. Assessing the impact of national anti-HIV sexual health campaigns: trends in the transmission of HIV and other sexually transmitted infections in England. Sexually Transmitted Infections 2001; 77: 242-247.

Publications - 2002


CASCADE Collaboration, (participant: De Angelis D). Changes over calendar time in the risk of specific first AIDS-defining events following HIV seroconversion, adjusting for competing risks. International Journal of Epidemiology 2002; 31: 951-958.


Gilks WR, Audit B, De Angelis D, Tsoka S, Ouzounis CA. Modeling the percolation of annotation errors in a database of protein sequences. Bioinformatics 2002; 18: 1641-1649.


McHenry A, Evans BG, Sinka K, Shaheem Z, Macdonald N, De Angelis D. Numbers of adults with diagnosed HIV infection 1996-2005 - adjusted totals and extrapolations for England, Wales and Northern Ireland. Communicable Disease and Public Health 2002; 5: 97-100.

Publications - 2003


CASCADE Collaboration, De Angelis D. Impact of tuberculosis on HIV disease progression in persons with well-documented time of HIV seroconversion. Journal of Acquired Immune Deficiency Syndromes 2003; 33: 184-190.

Publications - 2004


CASCADE Collaboration, (participant: De Angelis D). Short-term risk of AIDS according to current CD4 cell count and viral load in antiretroviral drug-naive individuals and those treated in the monotherapy era. AIDS 2004; 18: 51-58.


CASCADE Collaboration, (participant: De Angelis D). Systemic non-Hodgkin lymphoma in individuals with known dates of HIV seroconversion: incidence and predictors. AIDS 2004; 18: 673-681.


De Angelis D, Hickman M, Yang S. Estimating long-term trends in the incidence and prevalence of opiate use/injecting drug use and the number of former users: back-calculation methods and opiate overdose deaths. American Journal of Epidemiology 2004; 160: 994-1004.


Gilks WR, Audit B, De Angelis D, Tsoka S, Ouzounis CA. Percolation of annotation errors through hierarchically structured protein sequence databases. Mathematical Biosciences 2004.


PLATO Collaboration, De Angelis D. Predictors of trend in CD4-positive T-cell count and mortality among HIV-1-infected individuals with virological failure to all three antiretroviral-drug classes. Lancet 2004; 364: 51-64.


Sweeting MJ, Sutton AJ, Lambert PC. What to add to nothing? Use and avoidance of continuity corrections in meta-analysis of sparse data. Statistics in Medicine 2004; 23: 1351-1375.

Publications - 2005


Anderson HR, Atkinson RW, Peacock JL, Sweeting MJ, Marston L. Publication bias in studies of the short-term associations between ambient particulate matter and health effects. Epidemiology 2005; 16: 155-163.


Bargagli AM, Hickman M, Davoli M, Perucci C, Schifano P, Buster M, Brugal T, Vicente J, for the COSMO European Group (DA/SB participants). Drug-related mortality and its impact on adult mortality in eight European countries. European Journal of Public Health 2005; published online.


Gilks WR, Audit B, De Angelis D, Tsoka S, Ouzounis CA. Percolation of annotation errors through hierarchically structured protein sequence databases. Mathematical Biosciences 2005; 193: 223-234.


Sweeting MJ, De Angelis D, Aalen OO. Bayesian back-calculation using a multi-state model with application to HIV. Statistics in Medicine 2005; 24: 3991-4007.

Publications - 2006


Bongartz T, Sutton AJ, Sweeting MJ, Buchan I, Matteson EL, Montori V. Anti-TNF Antibody Therapy in Rheumatoid Arthritis and the risk of serious infections and malignancies: a systematic review and meta-analysis of rare harmful effects in randomized controlled trials. Journal of the American Medical Association 2006; under revision.


De Angelis D, Presanis A, Yang S, Walker S. Parametric models for the distribution of the incubation time between HIV infection and AIDS. Journal of the Royal Statistical Society 2006; submitted.


Grieve R, Roberts J, Wright M, Sweeting MJ, De Angelis D, Rosenberg W, Bassendine M, Main J, Thomas H. Cost-effectiveness of interferon alpha or peginterferon alpha with ribavirin for histologically mild chronic hepatitis C. Gut 2006; in press.


McGarrigle CA, Cliffe S, Copas AJ, Mercer CH, De Angelis D, Fenton KA, Evans BG, Johnson AM, Gill ON. Estimating adult HIV prevalence in the UK in 2003. The Direct method of estimation. Sexually Transmitted Infections 2006; 82: 78-86.


Sweeting MJ, De Angelis D, Neal KR, Ramsay ME, Wright M, Brant L, Harris HE, the Trent HCV Group and the HCV National Register Steering Group. Estimated progression rates in three United Kingdom hepatitis C cohorts differed according to method of recruitment. Journal of Clinical Epidemiology 2006; 59: 144-152.


Sweeting MJ, De Angelis D, Ramsay ME, Brant L, Harris HE. The burden of hepatitis C in England and Wales. British Medical Journal 2006; submitted.


Books, book chapters and reports


De Angelis D. Methods for estimating incidence of heroin use and other problematic drug use in the UK. A Report to the Home Office. Home Office; 2002.


De Angelis D. Prevalence of HIV and hepatitis infections in the United Kingdom. Department of Health; 2002.


De Angelis D. Hepatitis C in England: The First Health Protection Agency Report. Health Protection Agency Centre for Infections; 2005.


UK Collaborative Group for HIV and STI Surveillance, (member: De Angelis D). Mapping the Issues. HIV and other Sexually Transmitted Infections in the United Kingdom. Health Protection Agency Centre for Infections; 2005.


Goubar A, Ades AE, De Angelis D, McGarrigle CA, Mercer C, Tookey P, Fenton K, Gill ON. Bayesian multi-parameter synthesis of HIV surveillance data in England and Wales, 2001 [technical report]. London: Health Protection Agency Centre for Infections 2006; in press.


Editorials and commissioned articles


De Angelis D. Backcalculation. In: Everitt BS, Palmer CR, editors. Encyclopaedic Companion to Medical Statistics. London: Hodder Education; 2005.


De Angelis D. Incubation time. In: Everitt BS, Palmer CR, editors. Encyclopaedic Companion to Medical Statistics. London: Hodder Education; 2005.


European COSMO workshop (Cohort Studies on Mortality of Opiate-users), (members: Bird SM, De Angelis D, Hutchinson SJ). Over 1200 drug-related deaths and 190,000 opiate-user-years of follow-up: relative risks by sex and age-group. International Journal of Drug Policy, submitted.


Other non-peer-reviewed publications


Presanis A. The UK Collaborative Group for HIV and STI Surveillance. Mapping the Issues. HIV and other Sexually Transmitted Infections in the United Kingdom. Health Protection Agency Centre for Infections; 2005.



Taken from the submitted MRC Biostatistics Unit's Quinquennial Review, 2006