How do people die?
How do people think we die?
And is there a difference?
Well, it turns out there's a fascinating study conducted by Paul Slovic and Barbara Combs where they looked at how often different types of deaths were mentioned in the news. They then compared the frequency of news coverage with the actual frequency of people who died for each cause.
The results are what one might cynically expect:
"Although all diseases claim almost 1,000 times as many lives as do homicides, there were about three times as many articles about homicides as about all diseases. Furthermore, homicide articles tended to be more than twice as long as articles reporting deaths from diseases and accidents."
For our final capstone project for the fantastic Bradley Voytek's COGS 108 course at UCSD, we thought it would be interesting for us to have our own go at examining potential disparities between actual deaths and their corresponding media attention.
For anyone curious about any of the steps throughout this project, the original data and code we used to do all this analysis is available here on GitHub.
For our project, we looked at four sources:
For all of the above data, we looked at the top 10 largest causes of mortality, as well as terrorism, overdoses, and homicides, three other causes of death which we believe receive a lot of media attention.
In all the charts below, we’ve normalized each value by dividing it by the sum of all values for that year. Thus, the values given represent relative shares rather than absolute counts. This is mainly to make comparisons between distributions easier, as what we really care about here is the proportionality in representation across different sources.
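This per-year normalization can be sketched in a few lines of pandas. The counts below are made up for illustration; they are not the project's actual data:

```python
import pandas as pd

# Hypothetical yearly death counts per cause (invented for illustration,
# not the project's actual data).
counts = pd.DataFrame(
    {"heart_disease": [650000, 655000],
     "cancer": [590000, 600000],
     "terrorism": [50, 30]},
    index=[2016, 2017],
)

# Divide each row by that year's total so each value becomes a relative share.
shares = counts.div(counts.sum(axis=1), axis=0)

# Every row now sums to 1, making sources with different absolute scales comparable.
print(shares.round(4))
```

Because every row sums to 1, a source that counts deaths and a source that counts articles can be compared on the same footing.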
First off, as our “ground truth”, we’ll look at the causes of mortality as given by the CDC.
Immediately, we can see that cancer and heart disease take up a major chunk of all deaths, each responsible for around 30% of the total death count. On the graph, everything is visible except for terrorism, which is so small it doesn’t show up unless we zoom in. (You can do this by clicking on different causes in the legend to “strike them out” from the graph.)
Next, here’s the Google Trends data. (Because Google Trends didn’t start until 2004, we alas aren’t able to explore search data from 1999-2003.)
The two major changes are that heart disease is underrepresented here, while terrorism is heavily overrepresented. Suicide also appears to occupy several times the relative share here that it does in the actual death data. The remaining causes look to be within the right order of magnitude of the CDC data.
Now here’s the data for The Guardian and The New York Times. We put them both below as they appear quite similar. (We’ll be able to quantify the degree of similarity in the next section.)
Here, we see that terrorism, cancer, and homicides are the causes of death that are most mentioned in the newspapers. Though the share that cancer occupies seems largely proportional, the share given to both homicides and terrorism appears grossly overrepresented, given their respective share of total deaths.
Finally, here’s all of the above data presented in one graph, so we can see them side-by-side:
After our cursory glance at the data, we have reason to think that the distributions given to these different causes of death for each source (CDC, Google Trends, The Guardian, and The NYT) are not in fact the same.
To examine whether or not these distributions are the same, we’ll use a 𝛘² (chi-squared) test for homogeneity, which tells us whether a categorical variable is distributed the same way across two groups.
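Such a test is straightforward to run with SciPy. Here's a minimal sketch using invented counts for four causes (the real analysis lives in the linked GitHub repo):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Invented counts for four causes, in the same order in both rows
# (e.g. heart disease, cancer, homicide, suicide). Not the real data.
cdc_counts = [650000, 590000, 17000, 47000]   # deaths per cause
nyt_counts = [500, 4000, 1500, 700]           # articles per cause

# Rows = sources, columns = causes; chi2_contingency tests whether the
# column distribution is the same across rows.
observed = np.array([cdc_counts, nyt_counts])
chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.3f}, p = {p:.3g}, dof = {dof}")
```

A large 𝛘² statistic (and correspondingly tiny p-value) is evidence against the null hypothesis that the two sources share one underlying distribution.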
We’ll run 𝛘² tests with these four pairings of our data:
Here are the results:
| Data Compared | 𝛘² Test Statistic | p-value |
| --- | --- | --- |
| CDC and Google Trends | 49.242 | 1.897×10⁻⁶ |
| CDC and The Guardian | 1198.758 | 3.205×10⁻²⁴⁹ |
| CDC and The NYT | 1204.499 | 1.860×10⁻²⁵⁰ |
| The Guardian and The NYT | 0.056 | 0.999 |
As we guessed, the 𝛘² values for tests 1–3 are indeed quite high. For tests 2 and 3 especially, the p-values are incredibly low: if the null hypothesis were true (that each newspaper’s categorical distribution matches the CDC’s), we would essentially never expect to see results like these.
We can also see that the comparison between the NYT and The Guardian has a very low 𝛘² value, consistent with the two having come from the same distribution. So we now have evidence that our two media sources are roughly similar to each other, and that their shared distribution differs from how causes of death actually affect the population.
During our preliminary graphing of the data, we noted that terrorism and homicides appeared overrepresented in the news data, and that heart disease appeared underrepresented. Below, we’ve listed the factor by which representation differs across sources for each of the 13 causes of death.
(For the Factor of Difference column, we took the larger of Avg Deaths Proportion/Avg Newspaper Proportion and Avg Newspaper Proportion/Avg Deaths Proportion, and appended "Over" or "Under" to denote whether the cause was over- or underrepresented relative to its Avg Deaths Proportion.)
| Cause of Death | Avg Deaths Proportion | Avg Newspaper Proportion | Factor of Difference |
| --- | --- | --- | --- |
| Alzheimer's Disease | 0.036 | 0.009 | 4.172 Under |
| Car Accidents | 0.057 | 0.025 | 2.285 Under |
| Heart Disease | 0.305 | 0.029 | 10.388 Under |
| Kidney Disease | 0.023 | 0.002 | 10.793 Under |
| Lower Respiratory Disease | 0.064 | 0.018 | 3.520 Under |
| Pneumonia & Influenza | 0.028 | 0.041 | 1.486 Over |
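The factor calculation can be sketched as follows. The proportions here are the rounded values from the table above, so the factors computed differ slightly from the (unrounded) reported ones:

```python
# Rounded average proportions from the table above (a subset of causes);
# factors computed from these rounded values won't exactly match the table.
deaths = {"Heart Disease": 0.305, "Kidney Disease": 0.023, "Pneumonia & Influenza": 0.028}
news = {"Heart Disease": 0.029, "Kidney Disease": 0.002, "Pneumonia & Influenza": 0.041}

for cause in deaths:
    ratio = news[cause] / deaths[cause]
    # Report whichever ratio is >= 1, labeled Over/Under relative to the death share.
    if ratio >= 1:
        print(f"{cause}: {ratio:.3f} Over")
    else:
        print(f"{cause}: {1 / ratio:.3f} Under")
```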
Here's a graphical representation of the Avg Newspaper Proportion/Avg Deaths Proportion factors. (Note that the y-axis is log-scaled.)
The most striking disparities here are that of kidney disease, heart disease, terrorism, and homicide. Kidney disease and heart disease are both about 10 times underrepresented in the news, while homicide is about 31 times overrepresented, and terrorism is a whopping 3900 times overrepresented. Kidney disease is a little surprising; we had guessed at the other three, but it was only by calculating the factor here that this disparity became visible.
We set out to see if the public attention given to causes of death was similar to the actual distribution of deaths. After looking at our data, we found that, like results before us, the attention given by news outlets and Google searches does not match the actual distribution of deaths.
This suggests that general public sentiment is not well-calibrated with the ways that people actually die. Heart disease and kidney disease appear largely underrepresented in the sphere of public attention, while terrorism and homicides capture a far larger share, relative to their share of deaths caused.
Though we have shown a disparity between attention and reality, we caution against drawing immediate policy conclusions. One major issue we have failed to address here is tractability: just because a cause of death claims more lives does not mean that it is easily addressable.
A more nuanced look at which causes of mortality to prioritize would likely require an evaluation framework that weighs factors like tractability alongside the scale of the problem.
Throughout the course of this project, we engaged in several, shall we say, questionable methodological conveniences to make the analysis easier on us. These transgressions would likely doom us to the third circle of Statistics Hell—not as bad as p-hacking, but definitely worse than failing to preregister. Thus, to keep our consciences clean, we present to you: