Data visualization tools become mainstream

We’ve come a long way from the hand-drawn, photographed black-and-white graphs that were used to illustrate a key point in the days before computer graphics.

The techies among us were the first to play with color, line and movement to make data more intriguing, entertaining, and attractive, and data visualizations have progressed to the point of becoming reference resources rich in data about a given topic.

The Institute for Health Metrics and Evaluation has produced data visualizations on a range of topics. Particularly timely in the US is this visualization showing personal health care spending by disease, type of care (ambulatory, inpatient, prescribed pharmaceutical, nursing facility, dental, and emergency), year, gender, and age group. We can see that in 2013, $18 billion was spent on pharmaceuticals for those under 20, compared to $112 billion for those 65 and over. In the group under 20, the greatest amounts were spent on mental and substance use disorders, followed by chronic respiratory diseases. In the group 65 and over, diabetes, treatment of risk factors, and cardiovascular diseases led the way.

Data visualizations keep coming

And publications are pouring out of these data visualization tools. A good example is the article by Roth et al, “Trends and patterns of geographic variation in cardiovascular mortality among US counties, 1980–2014” published in JAMA this past May. They estimate age-standardized mortality rates by county from cardiovascular diseases (CVD) for the United States.
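For readers who haven’t worked with age-standardized rates, here is a minimal sketch of direct standardization in Python. It is not the method used by Roth et al, who rely on much more elaborate small-area models; the function name, the three age groups, the death counts, and the standard-population weights are all hypothetical and chosen only to show the arithmetic.

```python
def age_standardized_rate(deaths, population, standard_weights):
    """Directly age-standardized rate per 100,000.

    Each age-specific rate (deaths / population) is weighted by that
    age group's share of a standard population, then summed.
    """
    if not (len(deaths) == len(population) == len(standard_weights)):
        raise ValueError("all inputs must have one entry per age group")
    total_weight = sum(standard_weights)
    rate = 0.0
    for d, n, w in zip(deaths, population, standard_weights):
        rate += (d / n) * (w / total_weight)
    return rate * 100_000


# Hypothetical county with three age groups (young, middle, old).
deaths = [2, 30, 400]
population = [40_000, 30_000, 10_000]
# Hypothetical standard-population shares for the same age groups.
standard_weights = [0.50, 0.35, 0.15]

crude = sum(deaths) / sum(population) * 100_000
adjusted = age_standardized_rate(deaths, population, standard_weights)

print(f"Crude rate per 100,000: {crude:.1f}")
print(f"Age-standardized rate per 100,000: {adjusted:.1f}")
```

In this made-up county the crude rate is 540 per 100,000 while the standardized rate is roughly 637, because the county is younger than the assumed standard population. Standardizing in this way is what makes it fair to compare counties with very different age structures.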

Data visualizations help us understand causes of mortality.

US County-Level Mortality From Cardiovascular Diseases. A, Age-standardized mortality rate for both sexes combined in 2014. B, Percent change in the age-standardized mortality rate for both sexes combined between 1980 and 2014.

What, precisely, is the precision medicine initiative?

First of all, it’s a big deal, high on President Obama’s agenda and with its own page on the White House website.

Near term goals

One immediate goal of the Precision Medicine Initiative will be to significantly expand efforts in cancer genomics to create prevention and treatment successes for more cancers.

But the long term goals are broader

The Initiative will 1) support a national network of scientists who possess the talent and skills to develop new approaches for answering critical scientific and medical questions and 2) launch a national cohort study of a million or more Americans to propel our understanding of health and disease. The goal is to set the foundation for a new way of doing research that fosters open, responsible data sharing with the highest regard to patient privacy, and that puts engaged participants at the center.

You’ll soon recognize the PMI logo

Some specific research questions would be to:
  • Identify genomic variants that affect drug response
  • Assess clinical validity of genomic variants associated with disease
  • Identify biomarkers that are early indicators of disease
  • Understand chronic diseases and best management strategies
  • Understand genes/pathways/factors that protect from disease

In the process, we will learn about EHRs, mHealth, patient engagement, and new research methodologies.


Workshops conducted by the PMI Working Group are open to the public if space is available, and can be viewed as webcasts. The recent workshop on Participant Engagement and Health Equity (July 1 and 2, 2015) was phenomenal, and the video is well worth catching.

NHANES – beyond nutrition to prescription meds

NHANES prescription medication data hasn’t always been on my radar.

I’m not sure why this was so. NHANES is a well-known national survey that began as a nutrition survey and quickly expanded to include a range of health variables, including results of a physician exam and laboratory tests. It is a national probability-based sample, which means that one can generalize from NHANES to the entire United States, and its methodological standards are among the very highest. Perhaps the reason I overlooked the prescription medication data is that there is so much data, and also that NHANES was known primarily for nutrition data. Getting past my own blind spot, I decided to take a closer look at the prescription medication data collected in NHANES.

Here are some key points.

The survey

NHANES began in the 1960s and was conducted in waves: NHANES I, NHANES II, and NHANES III. We love it so much that it became a permanent fixture. Since 1999 it has been conducted continuously in two-year cycles, and is now called continuous NHANES, or just NHANES. The US population is sampled over a two-year cycle, and the data need to be analyzed using the full two-year sample. The sample is representative of the non-institutionalized US population; for example, it does not include residents of nursing homes or people in prison.

Sample size

Sample size is critical to being able to estimate drug utilization. The table below, taken from the NHANES website, lists unweighted sample sizes by age group. While the numbers are large (every one of these people was interviewed in their home), they may not be large enough for many purposes in pharmacoepidemiology. For those of us interested in pediatric medication use, there were 4,194 people under 20 included in the sample. When stratified into age groups, the sample might not be large enough to study medications taken by small percentages (fewer than 1%) of children.

Table 2. Unweighted sample size and percents by age groups from NHANES 2005-06, 2007-08 and 2009-2010 for examined participants
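To make the sample size concern concrete, here is a rough back-of-the-envelope sketch in Python. It assumes simple random sampling, which NHANES is not: a real analysis must use the survey weights and the complex design, and the `design_effect` parameter below is only a hypothetical placeholder for that inflation. The function name and the example design effect of 2.0 are my own assumptions, not NHANES figures.

```python
import math


def prevalence_precision(n, p, design_effect=1.0):
    """Expected number of users, standard error, and 95% CI half-width
    for a prevalence p estimated from n respondents.

    design_effect inflates the simple-random-sampling variance; 1.0
    means no inflation (an optimistic assumption for a complex survey).
    """
    if n <= 0 or not (0 <= p <= 1) or design_effect <= 0:
        raise ValueError("need n > 0, 0 <= p <= 1, design_effect > 0")
    expected_users = n * p
    se = math.sqrt(design_effect * p * (1 - p) / n)
    half_width = 1.96 * se
    return expected_users, se, half_width


n_under_20 = 4194   # unweighted count of people under 20, from the text
prevalence = 0.01   # a medication taken by 1% of children

users, se, hw = prevalence_precision(n_under_20, prevalence)
print(f"Expected users in sample: {users:.1f}")
print(f"Standard error: {se:.4f}, relative SE: {se / prevalence:.0%}")

# Hypothetical design effect of 2.0 to mimic clustering in the survey.
_, se_deff, _ = prevalence_precision(n_under_20, prevalence, design_effect=2.0)
print(f"Standard error with design effect 2.0: {se_deff:.4f}")
```

With a 1% prevalence, only about 42 of the 4,194 children would be expected to report the medication at all, and that is before splitting the sample into narrower age groups.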

NHANES prescription medication information

The medication information is collected during an in-person interview in the participant’s home. During the interview, survey participants are asked if they have taken medications in the past 30 days for which they needed a prescription. Those who answer “yes” are asked to show the interviewer the medication containers of all the products used. For each medication reported, the interviewer enters the product’s complete name from the container into a computer. If no container is available, the interviewer asks the participant to verbally report the name of the medication. Participants are also asked how long they have been taking the medication and the main reason for use. This is in contrast to databases that rely on billing or claims data, or electronic health records. Documentation about the 2011-2012 data files containing prescription medication can be found here.

Using NHANES prescription medication data for pharmacoepidemiology

The pros and cons of using NHANES for pharmacoepidemiology are straightforward. On the pro side, NHANES may be the only probability-based population sample in the United States with medication information. This alone makes it extremely valuable, and useful in conjunction with other types of data. The second strength is that, unlike health records, claims, or prescription databases, NHANES documents the presence of the medication in the patient’s home, demonstrating that the prescription was purchased and brought home. Along the continuum of measures, beginning with prescriptions written and prescriptions filled, documenting the prescription in the patient’s home brings us closer to understanding true exposures and levels of use. Another positive that needs to be explored is the availability of information from the physical exam and laboratory tests for the person using a given prescription.

On the con side, the sample sizes may be too small to provide stable estimates for many medications, especially if one wishes to study use within a subgroup. In terms of bias, my first thought is that this method will underestimate use, with people forgetting, omitting, or otherwise not reporting their medication use to an interviewer. Misclassification in the other direction might occur when a person has filled a prescription and shows the container to the interviewer, but does not take the medication. This latter source of bias would lead to an overestimate of use, but it would also affect each of the other types of measures of prescription medication use (prescriptions written and prescriptions filled also overestimate the number of people actually using the medication).
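The two directions of misclassification can be sketched with the standard sensitivity and specificity relationship. This is an illustration only: the true prevalence of 10%, the sensitivity of 0.85, and the specificity of 0.98 are hypothetical values I made up, not estimates from NHANES.

```python
def observed_prevalence(true_prevalence, sensitivity, specificity):
    """Prevalence a survey would report given imperfect measurement.

    sensitivity: share of true users who are recorded as users
                 (below 1 when people forget or omit a medication).
    specificity: share of true non-users recorded as non-users
                 (below 1 when a filled but untaken prescription is shown).
    """
    for value in (true_prevalence, sensitivity, specificity):
        if not 0 <= value <= 1:
            raise ValueError("all arguments must be between 0 and 1")
    true_positives = sensitivity * true_prevalence
    false_positives = (1 - specificity) * (1 - true_prevalence)
    return true_positives + false_positives


true_p = 0.10  # hypothetical true prevalence of use

# Under-reporting only: 15% of true users fail to report the medication.
under = observed_prevalence(true_p, sensitivity=0.85, specificity=1.0)

# Over-reporting only: 2% of non-users show a filled but untaken prescription.
over = observed_prevalence(true_p, sensitivity=1.0, specificity=0.98)

print(f"Under-reporting only: {under:.3f}")
print(f"Over-reporting only:  {over:.3f}")
```

With these made-up inputs, under-reporting pulls the estimate down to 0.085 and over-reporting pushes it up to 0.118. When the medication is uncommon, even a small false-positive rate can move the estimate noticeably, because it applies to the large group of non-users.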

Recent publications using NHANES prescription medication data

A quick search turns up several publications analyzing prescription medication data in NHANES, but not as many as one might expect. An interesting use of the data is that of Bateman and colleagues (2012), who focus on a group with a risk factor, hypertension, and describe medication use within that group. This approach may have applications for people working in health economics and outcomes research.

  • Farina EK, Austin KG, Lieberman HR, “Concomitant Dietary Supplement and Prescription Medication Use Is Prevalent among US Adults with Doctor-Informed Medical Conditions” J Acad Nutr Diet 2014 Apr 4 S2212-2672(14)
  • Bertisch SM, Herzig SJ, Winkelman JW, Buettner C, “National use of prescription medications for insomnia: NHANES 1999-2010” Sleep. 2014 Feb 1;37(2):343-9
  • Chong Y, Fryar CD, Gu Q, “Prescription sleep aid use among adults: United States, 2005-2010” NCHS Data Brief. 2013 Aug;(127):1-8
  • Gu Q, Burt VL, Dillon CF, Yoon S, “Trends in antihypertensive medication use and blood pressure control among United States adults with hypertension: the National Health And Nutrition Examination Survey, 2001 to 2010” Circulation. 2013 Jun 18;127(24)
  • Bateman BT, Shaw KM, Kuklina EV, Callaghan WM, Seely EW, Hernandez-Diaz S, “Hypertension in women of reproductive age in the United States: NHANES 1999-2008” PLoS One. 2012;7(4):e36171
  • Kinjo M, Setoguchi S, Solomon DH, “Antihistamine therapy and bone mineral density: analysis in a population-based US sample” Am J Med. 2008 Dec;121(12):1085-91

Epidemiology and data visualization

We’ve always understood the value of data visualization

John Snow

If you’ve taken a look at any overview about epidemiology, or attended one lecture on the subject, you’ve heard of John Snow, the Victorian anesthesiologist, who tromped around London in his spare time, asking people about their water supply (water was delivered by different companies, and drawn from different pumps, accordingly). He associated one water supplier with a higher incidence of cholera deaths, inferring that something in the water was causing cholera – and this was before scientists had embraced the germ theory, and well before pathogens had been identified as causes of infectious disease.

Data visualization

John Snow’s map showing cases of cholera in 19th century London.

In addition to his excruciating hand calculations of infection and death rates, he mapped the data. We can see references to his maps on all kinds of data visualization sites. Epidemiologists have always known about his maps; now they are garnering attention from non-epidemiologists.

Florence Nightingale

Another Victorian who didn’t mind using pencil and paper to add up lots and lots of numbers was Florence Nightingale. Yes, the lady with the lamp was also an ardent mathematician and statistician. She invented a type of diagram, the “coxcomb,” to visualize mortality by different causes.

Florence Nightingale called these diagrams “coxcombs”.

But was she an epidemiologist? The term “epidemiologist” wasn’t in use when she was doing her work, but “she used statistics to measure health, identify causes of mortality, evaluate health services, and reform institutions.” (Stolley and Lasky, Investigating Disease Patterns, 1995).

Visualizing Health

From the Robert Wood Johnson Foundation and the University of Michigan Center for Health Communications Research

We’re starting to see the fruits of all the excitement about data visualization and health, notably this thorough report from Visualizing Health, a project of the Robert Wood Johnson Foundation and the University of Michigan Center for Health Communications Research.

As they state,

In theory, data can help us make better decisions about our health. Should I take this pill? Will it help me more than it hurts me? How can I reduce my risk? And so on.

But for individuals, it’s not always easy to understand what the numbers are telling us. And for those communicating the information – doctors, hospitals, researchers, public health professionals — it’s not always clear what sort of presentation will make the most sense to the most people.

Their website contains examples of tested visualizations and, what’s especially nice, they’ve done research assessing reactions from the general public. They’ve created a gallery of graphs, charts, and images, and they’ve done the hard work of evaluating them.


from Visualizing Health, one of their data visualizations

Among the goodies, a “wizard” tool to help you learn more about a risk you want to communicate, and a sample risk calculator that shows off some of the best design concepts.

I like the way they’ve identified use cases:

  • Tradeoffs between medication or treatment options?
  • Relating biomarkers (such as BMI or cholesterol levels) to risk?
  • Health risk assessment output?
  • Population risks: disparities?
  • Population risks: emergent disease (“Should I worry about that measles outbreak?”)
  • Understanding multiple side effects?
  • Understanding unique side effects?
  • Motivating a risk-reducing action?
  • Understanding tradeoffs that change over time?
  • Small risks, and understanding how to reduce small risks?
  • Explaining what “average years saved” means for an individual person?

I like the way they describe their methodology, using three tools to test their images (Google Consumer Surveys, Survey Sampling International, and Amazon Mechanical Turk). Transparency is always appreciated!

And, at the back of the report (why at the back?), there is a comic-book-style presentation on visualizing health in practice, using images to educate patients about diabetes.

about health literacy