{"id":95,"date":"2023-09-14T09:17:50","date_gmt":"2023-09-14T09:17:50","guid":{"rendered":"https:\/\/owenshen24.local\/?p=95"},"modified":"2023-09-14T09:17:52","modified_gmt":"2023-09-14T09:17:52","slug":"charting-death","status":"publish","type":"post","link":"https:\/\/owenshen24.local\/charting-death\/","title":{"rendered":"Death: Reality vs Reported"},"content":{"rendered":"\n

Data collection and analysis by\u00a0Hasan Al-Jamaly<\/a>,\u00a0Maximillian Siemers<\/a>,\u00a0Owen Shen<\/a>, and\u00a0Nicole Stone<\/a>.<\/p>\n\n\n\n

Background:<\/h2>\n\n\n\n

How do people die?<\/p>\n\n\n\n

How do people think<\/em> we die?<\/p>\n\n\n\n

And is there a difference?<\/p>\n\n\n\n

Well, it turns out there’s a fascinating study<\/a> conducted by Paul Slovic and Barbara Combs, in which they looked at how often different types of death were mentioned in the news and then compared the frequency of news coverage with the actual frequency of deaths for each cause.<\/p>\n\n\n\n

The results are what one might cynically expect:<\/p>\n\n\n\n

\n

“Although all diseases claim almost 1,000 times as many lives as do homicides, there were about three times as many articles about homicides than about all diseases. Furthermore, homicide articles tended to be more than twice as long as articles reporting deaths from diseases and accidents.”<\/em><\/p>\n<\/blockquote>\n\n\n\n

Since 1979, when the original Combs and Slovic study was conducted, there have been several more empirical analyses which have found largely similar results. (Notably, here<\/a> and here<\/a>)<\/p>\n\n\n\n

For our final capstone project for the fantastic Bradley Voytek’s<\/a> COGS 108 course at UCSD, we thought it would be interesting to take our own look at potential disparities between actual deaths and their corresponding media attention.<\/p>\n\n\n\n

For anyone curious about any of the steps throughout this project, the original data and code we used to do all this analysis is available here on GitHub<\/a>.<\/p>\n\n\n\n


\n\n\n\n

Data: The Gathering<\/h2>\n\n\n\n

For our project, we looked at four sources:<\/p>\n\n\n\n

    \n
  1. The Center for Disease Control\u2019s WONDER database for public health data<\/a> (1999-2016).<\/li>\n\n\n\n
  2. Google Trends search volume<\/a> (2004-2016).<\/li>\n\n\n\n
  3. The Guardian\u2019s article database<\/a>.<\/li>\n\n\n\n
  4. The New York Times\u2019 article database<\/a>.<\/li>\n<\/ol>\n\n\n\n

    For all of the above data, we looked at the top 10 largest causes of mortality, as well as terrorism, overdoses, and homicides, three other causes of death which we believe receive a lot of media attention.<\/p>\n\n\n\n

In all the charts below, we\u2019ve normalized the values by dividing each by the sum of all values for that year. The values shown thus represent relative shares rather than absolute counts. This makes comparisons between distributions easier, as what we really care about here is proportional representation across the different sources.<\/p>\n\n\n\n
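Concretely, this normalization is just a row-wise division. A minimal sketch in pandas, where the column names and counts are illustrative toy numbers rather than our actual data:

```python
import pandas as pd

# Toy counts: one row per year, one column per cause of death (illustrative only).
counts = pd.DataFrame(
    {"heart_disease": [650, 660], "cancer": [590, 600], "terrorism": [1, 2]},
    index=[2015, 2016],
)

# Divide every row by that year's total, so each year's shares sum to 1.
shares = counts.div(counts.sum(axis=1), axis=0)

print(shares.round(3))
```

Each chart then plots these relative shares, so a cause's value is its fraction of that source's total for the year.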

    First off, as our \u201cground truth\u201d, we\u2019ll look at the causes of mortality as given by the CDC.<\/p>\n\n\n\n

    We can see that cancer and heart disease take up a major chunk of all deaths, each responsible for around 30% of the total death count. On the graph, everything is visible except for terrorism, which is so small it doesn\u2019t show up unless we zoom in (You can do this by clicking on different causes in the legend to \u201cstrike them out\u201d from the graph).<\/p>\n\n\n\n

    Next, here\u2019s the Google Trends data. (Because Google Trends didn\u2019t start until 2004, we alas aren\u2019t able to explore search data from 1999-2003.)<\/p>\n\n\n\n

The two major changes seem to be that heart disease is underrepresented here, while terrorism is very much overrepresented. Suicide also looks like it has several times more relative share here than in the actual death rate. The rest of the causes appear to be within the same order of magnitude as the CDC data.<\/p>\n\n\n\n

    We see that terrorism, cancer, and homicides are the causes of death that are most mentioned in the newspapers. Though the share that cancer occupies seems largely proportional, the share given to both homicides and terrorism appears grossly overrepresented, given their respective share of total deaths.<\/p>\n\n\n\n


    \n\n\n\n

    Data Analysis<\/h2>\n\n\n\n

    After our cursory glance at the data, we have reason to think that the distributions given to these different causes of death for each source (CDC, Google Trends, The Guardian, and The NYT) are not in fact the same.<\/p>\n\n\n\n

To examine whether or not these distributions are the same, we\u2019ll use a \ud835\uded8<sup>2<\/sup> (chi-squared) test for homogeneity, which can tell us whether a categorical variable is distributed the same way across two groups.<\/p>\n\n\n\n

We\u2019ll run \ud835\uded8<sup>2<\/sup> tests with these four pairings of our data:<\/p>\n\n\n\n

      \n
    1. CDC and Google Trends<\/li>\n\n\n\n
    2. CDC and The Guardian<\/li>\n\n\n\n
    3. CDC and The New York Times<\/li>\n\n\n\n
    4. The Guardian and The New York Times<\/li>\n<\/ol>\n\n\n\n
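Each pairing boils down to a chi-squared test on a two-row contingency table of counts, one row per source. A minimal sketch with scipy, using made-up counts over six causes rather than our actual 13-category data:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Each row is one source's counts over the same set of causes (toy numbers only).
cdc_counts = np.array([305, 279, 57, 8, 17, 36])
news_counts = np.array([29, 171, 25, 251, 118, 9])

# chi2_contingency treats the stacked rows as a contingency table and tests
# whether both rows could plausibly come from the same categorical distribution.
chi2, p, dof, expected = chi2_contingency(np.vstack([cdc_counts, news_counts]))
print(f"chi2 = {chi2:.3f}, p = {p:.3g}, dof = {dof}")
```

With two sources and k categories, the test has (2 - 1) x (k - 1) degrees of freedom; a tiny p-value means the two rows are very unlikely to share one distribution.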

      Here are the results:<\/p>\n\n\n\n

<figure class="wp-block-table"><table><thead><tr><th>Data Compared<\/th><th>\ud835\uded8<sup>2<\/sup> Test Statistic<\/th><th>p-value<\/th><\/tr><\/thead><tbody>
<tr><td>CDC and Google Trends<\/td><td>49.242<\/td><td>1.897\u00d710<sup>-6<\/sup><\/td><\/tr>
<tr><td>CDC and The Guardian<\/td><td>1198.758<\/td><td>3.205\u00d710<sup>-249<\/sup><\/td><\/tr>
<tr><td>CDC and The NYT<\/td><td>1204.499<\/td><td>1.860\u00d710<sup>-250<\/sup><\/td><\/tr>
<tr><td>The Guardian and The NYT<\/td><td>0.056<\/td><td>0.999<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n

As we guessed, the \ud835\uded8<sup>2<\/sup> values for tests 1-3 are indeed quite high. For tests 2 and 3 especially, the p-values are incredibly low, meaning that if our null hypothesis were true and the newspapers\u2019 categorical distributions matched the CDC\u2019s, we would essentially never expect to see results like these.<\/p>\n\n\n\n

We can also see that the NYT\u2019s and The Guardian\u2019s distributions produce a very low \ud835\uded8<sup>2<\/sup> value, which is consistent with the two coming from the same distribution. So we now have evidence that our two media sources are roughly similar to each other, and that this shared distribution differs from the distribution of causes of death in the actual population.<\/p>\n\n\n\n

During our preliminary graphing of the data, we noted that terrorism and homicides appeared overrepresented in the news data, and that heart disease appeared underrepresented. Below, we\u2019ve listed the factor of difference in representation across the different sources for all 13 causes of death.<\/p>\n\n\n\n

(For the Factor of Difference column, we took the larger of <sup>Avg Deaths Proportion<\/sup>\/<sub>Avg Newspaper Proportion<\/sub> and <sup>Avg Newspaper Proportion<\/sup>\/<sub>Avg Deaths Proportion<\/sub>, and added “Over” or “Under” to denote whether the cause was over- or underrepresented relative to its Avg Deaths Proportion.)<\/p>\n\n\n\n

<figure class="wp-block-table"><table><thead><tr><th>Cause of Death<\/th><th>Avg Deaths Proportion<\/th><th>Avg Newspaper Proportion<\/th><th>Factor of Difference<\/th><\/tr><\/thead><tbody>
<tr><td>Alzheimer’s Disease<\/td><td>0.036<\/td><td>0.009<\/td><td>4.172 Under<\/td><\/tr>
<tr><td>Cancer<\/td><td>0.279<\/td><td>0.171<\/td><td>1.631 Under<\/td><\/tr>
<tr><td>Car Accidents<\/td><td>0.057<\/td><td>0.025<\/td><td>2.285 Under<\/td><\/tr>
<tr><td>Diabetes<\/td><td>0.035<\/td><td>0.028<\/td><td>1.260 Under<\/td><\/tr>
<tr><td>Heart Disease<\/td><td>0.305<\/td><td>0.029<\/td><td>10.388 Under<\/td><\/tr>
<tr><td>Homicide<\/td><td>0.008<\/td><td>0.251<\/td><td>30.796 Over<\/td><\/tr>
<tr><td>Kidney Disease<\/td><td>0.023<\/td><td>0.002<\/td><td>10.793 Under<\/td><\/tr>
<tr><td>Lower Respiratory Disease<\/td><td>0.064<\/td><td>0.018<\/td><td>3.520 Under<\/td><\/tr>
<tr><td>Overdose<\/td><td>0.014<\/td><td>0.002<\/td><td>7.143 Under<\/td><\/tr>
<tr><td>Pneumonia & Influenza<\/td><td>0.028<\/td><td>0.041<\/td><td>1.486 Over<\/td><\/tr>
<tr><td>Stroke<\/td><td>0.053<\/td><td>0.059<\/td><td>1.119 Over<\/td><\/tr>
<tr><td>Suicide<\/td><td>0.017<\/td><td>0.118<\/td><td>6.878 Over<\/td><\/tr>
<tr><td>Terrorism<\/td><td>0.000<\/td><td>0.306<\/td><td>3906.304 Over<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n
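The Factor of Difference column can be recomputed mechanically from the two proportion columns. A small sketch (the function name is ours, for illustration; factors recomputed from the rounded table values may differ slightly from the listed ones, which were built from unrounded averages):

```python
def factor_of_difference(deaths_prop: float, news_prop: float) -> str:
    """Return the larger of the two ratios, labeled Over/Under relative to deaths."""
    if news_prop > deaths_prop:
        return f"{news_prop / deaths_prop:.3f} Over"
    return f"{deaths_prop / news_prop:.3f} Under"

# Example: heart disease, using the averaged proportions from the table above.
print(factor_of_difference(0.305, 0.029))  # underrepresented in the news
```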

Here’s a graphical representation of the <sup>Avg Newspaper Proportion<\/sup>\/<sub>Avg Deaths Proportion<\/sub> factors. (Note that the y-axis is log-scaled.)<\/p>\n\n\n\n
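A chart along these lines can be sketched in a few lines of matplotlib; here we plot only the four most extreme factors from the table, and the file name and styling are illustrative:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt

# News share / deaths share for the most extreme causes (values from the table above;
# "Under" factors are inverted so everything is on one news-over-deaths scale).
causes = ["Heart Disease", "Kidney Disease", "Homicide", "Terrorism"]
ratios = [1 / 10.388, 1 / 10.793, 30.796, 3906.304]

fig, ax = plt.subplots()
ax.bar(causes, ratios)
ax.set_yscale("log")  # log scale: the factors span roughly five orders of magnitude
ax.axhline(1.0, linestyle="--", color="gray")  # parity: news share equals deaths share
ax.set_ylabel("News share / deaths share")
fig.savefig("factors.png")
```

The dashed line at 1 marks proportional coverage; bars above it are overrepresented in the news, bars below it underrepresented.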

      The most striking disparities here are that of kidney disease, heart disease, terrorism, and homicide. Kidney disease and heart disease are both about 10 times underrepresented in the news, while homicide is about 31 times overrepresented, and terrorism is a whopping 3900 times overrepresented. Kidney disease is a little surprising; we had guessed at the other three, but it was only by calculating the factor here that this disparity became visible.<\/p>\n\n\n\n


      \n\n\n\n

      Conclusion<\/h2>\n\n\n\n

We set out to see if the public attention given to causes of death matched the actual distribution of deaths. After looking at our data, we found that, in line with earlier results, the attention given by news outlets and Google searches does not match the actual distribution of deaths.<\/p>\n\n\n\n

      This suggests that general public sentiment is not well-calibrated with the ways that people actually die. Heart disease and kidney disease appear largely underrepresented in the sphere of public attention, while terrorism and homicides capture a far larger share, relative to their share of deaths caused.<\/p>\n\n\n\n

Though we have shown a disparity between attention and reality, we caution against drawing immediate conclusions for policy. One major issue we have failed to address here is tractability; just because a cause of death claims more lives does not mean that it is easily addressable.<\/p>\n\n\n\n

A more nuanced look at which causes of mortality to prioritize would likely use a model like an evaluation framework<\/a>.<\/p>\n\n\n\n


      \n\n\n\n

      Full Disclosure<\/h2>\n\n\n\n

      Throughout the course of this project, we engaged in several, shall we say, questionable<\/em>, methodological conveniences to make the analysis easier on us. These transgressions would likely doom us to the third circle of Statistics Hell\u2014not as bad as p-hacking, but definitely worse than failing to preregister. Thus, to keep our consciences clean, we present to you:<\/p>\n\n\n\n

      Statistical Sins We Committed:<\/h4>\n\n\n\n
        \n
1. The article search APIs returned a list of all articles which contained the word anywhere<\/em> (headline or body). Though we originally wanted to look just at headlines, filtering for titles proved unwieldy, so we just grabbed the raw number of hits anywhere. This is a potential confounder in our analysis, especially as some words like \u201cstroke\u201d have multiple usages; it also means that our news data isn’t exactly representative of media hype.<\/li>\n\n\n\n
      2. Also, for the article search, we searched for different synonyms and added them up for our categories, as certain words, e.g. \u201cmurder\u201d, have roughly the same meaning as our initial search terms, e.g. \u201chomicide\u201d, and we wanted to take this into account. However, this might have led to unequal coverage of different topics, as certain words had more synonyms than others. For example, we used hits from \u201cheart disease\u201d, \u201cheart failure\u201d, and \u201ccardiovascular disease\u201d to account for the heart disease category, but only \u201cAlzheimer\u2019s\u201d for the \u201cAlzheimer\u2019s Disease\u201d category.<\/li>\n\n\n\n
3. My understanding is that a \ud835\uded8<sup>2<\/sup> test is typically used for counts of categorical data where the categories are mutually exclusive; that\u2019s a dubious assumption here, as several keywords, e.g. \u201chomicide\u201d and \u201cterrorism\u201d, might be mentioned in the same article. So there\u2019s definitely some double-counting going on here, which muddies our analysis.<\/li>\n\n\n\n
4. Also, for the \ud835\uded8<sup>2<\/sup> test, we used the average counts across all years, rather than running pairwise tests year-by-year. This could prove problematic because, if the underlying distribution differs from year to year, our comparisons using just the average might not be totally valid.<\/li>\n<\/ol>\n","protected":false},"excerpt":{"rendered":"

        Data collection and analysis by\u00a0Hasan Al-Jamaly,\u00a0Maximillian Siemers,\u00a0Owen Shen, and\u00a0Nicole Stone. Background: How do people die? How do people think we die? And is there a difference? Well, it turns out there’s a fascinating study conducted by Paul Slovic and Barbara Combs where they looked at how often different types of deaths were mentioned in the news. They then compared … Read more<\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[],"_links":{"self":[{"href":"https:\/\/owenshen24.local\/wp-json\/wp\/v2\/posts\/95"}],"collection":[{"href":"https:\/\/owenshen24.local\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/owenshen24.local\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/owenshen24.local\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/owenshen24.local\/wp-json\/wp\/v2\/comments?post=95"}],"version-history":[{"count":2,"href":"https:\/\/owenshen24.local\/wp-json\/wp\/v2\/posts\/95\/revisions"}],"predecessor-version":[{"id":98,"href":"https:\/\/owenshen24.local\/wp-json\/wp\/v2\/posts\/95\/revisions\/98"}],"wp:attachment":[{"href":"https:\/\/owenshen24.local\/wp-json\/wp\/v2\/media?parent=95"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/owenshen24.local\/wp-json\/wp\/v2\/categories?post=95"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/owenshen24.local\/wp-json\/wp\/v2\/tags?post=95"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}