Hi, welcome to the presentation "Measuring Correlation-to-Causation Exaggeration in Press Releases." My name is Bei Yu. I'm a faculty member at the School of Information Studies at Syracuse University. My co-authors are Dr. Jun Wang, an independent researcher; Ms. Lu Guo, a visiting master's student; and Ms. Yingya Li, a doctoral student in our school.

This research is motivated by information quality concerns in press releases. Press releases are now the dominant link between academia and the news media. However, press releases have also become a major source of exaggeration in science communication, which later spreads to mainstream media. This happens in a social context where independent journalism is shrinking and lacks the resources to act as a media watchdog, while press officers are under pressure to promote their institutions' research. As a result, exaggeration in press releases undermines public trust in science. Some independent organizations, such as Health News Review, have tried to recruit health experts to manually review press releases and news articles, but that is an extremely time-consuming endeavor.

So the goal of our research is to see whether NLP can help examine the information quality of press releases. We focus on the health domain because the majority of science news is about health topics. We particularly focus on one type of exaggeration: interpreting correlational findings as causal. This tends to happen more often in observational studies, which aim to establish correlational findings, than in randomized clinical trials, which try to establish causal findings.

Here's an example of correlation-to-causation exaggeration, discovered by the NLP tool we developed in this research. In the original research paper, the authors reported a correlation between greater family size and lower cancer risk, and suggested further examination of this correlation. However, when this observational study was reported in the press release, the title made a causal statement: "larger families reduce cancer risk." If an NLP tool can identify cases like this, we will be able to examine a large number of press releases and answer broader research questions, such as: what is the trend of exaggeration over the years? The answer to this question is important for the science community to monitor the severity of the problem. Also, different sources of press releases may differ in their exaggeration rates; identifying the sources that need the most help could enable more targeted interventions, such as training press officers.

To build an NLP tool that identifies this exaggeration, we need to accomplish multiple subtasks. First, we need to build a linked corpus that pairs press releases with their research papers. Second, we need to identify the observational studies. Third, we need to locate the claim statements in both research papers and press releases. Fourth, we need to identify the claim strengths, such as correlational or causal. Last, we compare the claim strengths to determine the cases of exaggeration.

To build the corpus, we downloaded all health-related press releases from EurekAlert!, the largest website for posting press releases. We then linked the press releases to their original papers in PubMed using the DOI links. However, not all press releases on EurekAlert! give the DOI. So we turned to a third-party platform, ScienceDaily.com, which reposts EurekAlert! articles and manually adds the DOI links. That helped us increase the number of pairs to over 60,000.
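To make the linking step concrete, here is a minimal sketch of DOI-based pairing, not our exact pipeline. The file names, column names, and the pandas-based join are illustrative assumptions; only the idea of normalizing DOIs and joining the two collections comes from the procedure described above.

```python
# A minimal sketch of DOI-based linking, assuming press releases and PubMed
# records were already scraped into CSVs with hypothetical column names.
import pandas as pd

press = pd.read_csv("press_releases.csv")   # hypothetical columns: doi, title, body
papers = pd.read_csv("pubmed_papers.csv")   # hypothetical columns: doi, pmid, abstract

# Normalize DOIs so trivial formatting differences do not break the join.
for df in (press, papers):
    df["doi"] = (df["doi"].str.strip().str.lower()
                 .str.replace(r"^https?://(dx\.)?doi\.org/", "", regex=True))

# An inner join keeps only press releases whose paper was found in PubMed.
pairs = press.merge(papers, on="doi", how="inner", suffixes=("_pr", "_paper"))
print(f"{len(pairs)} press-paper pairs linked by DOI")
```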
We then locate the research findings in the research papers by focusing on those with structured abstracts, the same approach we used in a prior study. Structured abstracts include subsections like background, objective, method, result, and conclusion, and we focus on the sentences in the conclusion subsection. That reduces the number of pairs to over 20,000.

We then classify the research papers by study type, particularly to find the observational studies. Luckily, PubMed librarians have manually annotated a number of research papers by their study design. So we sampled 25,000 observational studies and 25,000 clinical trials annotated by the librarians as our training data, and used LightGBM, a decision-tree-based gradient boosting algorithm, to build a prediction model. With an 80-20 split for training and testing data, we achieved an F1 score of 0.95. Applying this model to all the PubMed articles in our data set, we obtained over 16,000 paper-press pairs on observational studies.

We then linked the claims in the research papers to the press releases. Here we utilize the inverted pyramid structure commonly used in press releases and news articles, which means the title and the first couple of sentences mainly describe the main claims. To confirm that the inverted pyramid structure holds in our data set, we randomly checked 100 press releases and found only two with main claims outside this range. So the inverted pyramid structure works well for identifying the main claim in press releases.

We then classify the claim sentences by strength, using the same annotation schema as one of our prior studies, which classified claim sentences by strength in research papers. Each sentence is classified as one of four types. The first type is not a claim, meaning the sentence describes background, future work, or other information rather than the research finding. The second type is a correlational statement, which commonly uses language cues like "associated with." The third type is a conditional causal statement, which uses language cues like "may increase." The last type is a direct causal statement, which uses language cues like "reduce" or "can reduce."

We manually annotated the title and the first two sentences of 700 press releases, resulting in 2,100 manually annotated sentences. We then built prediction models based on different algorithms, including linear SVM, BERT, and BioBERT. We used the PyTorch version, but we revised the loss function to ensure that all classes are treated equally, increasing the penalty for misclassifying examples from the small classes. BioBERT delivered the best performance, with a macro F1 score of 0.89.

Although the performance is satisfactory, we randomly sampled prediction errors to understand areas for further improvement. We found several types of difficult cases: some not-a-claim sentences use causal language in subordinate clauses, even though they do not describe the research finding, and sometimes unusual language cues were used to describe correlation. These cases pose challenges to the prediction model, which we will address in future work.
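Here is a minimal sketch of how the study-type classifier described above could be trained. The TF-IDF feature representation, the hyperparameters, and the loader function are illustrative assumptions; what follows the talk is the use of LightGBM on the librarian-annotated sample with an 80-20 split.

```python
# A minimal sketch of the study-type classifier: LightGBM over the
# librarian-annotated PubMed abstracts. Features and parameters are assumed.
import lightgbm as lgb
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

# Hypothetical loader for the 50,000 annotated examples;
# labels: 1 = observational study, 0 = clinical trial.
abstracts, labels = load_labeled_abstracts()

X = TfidfVectorizer(max_features=50000, ngram_range=(1, 2)).fit_transform(abstracts)
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.2, random_state=42)

model = lgb.LGBMClassifier(n_estimators=500, learning_rate=0.05)
model.fit(X_tr, y_tr)
print("F1:", f1_score(y_te, model.predict(X_te)))
```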
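The revised loss function mentioned above can be realized as class-weighted cross-entropy in PyTorch, which raises the penalty for misclassifying examples from the smaller claim-strength classes. The inverse-frequency weighting and the class counts below are illustrative assumptions, not the exact values from our experiments.

```python
# A minimal sketch of a class-weighted loss for the four claim-strength classes.
import torch
import torch.nn as nn

# Hypothetical training-set counts for
# [not a claim, correlational, conditional causal, direct causal].
class_counts = torch.tensor([900.0, 600.0, 350.0, 250.0])
weights = class_counts.sum() / (len(class_counts) * class_counts)

loss_fn = nn.CrossEntropyLoss(weight=weights)

# During fine-tuning, logits would come from the BioBERT classification head;
# here we use stand-in tensors to show the call.
logits = torch.randn(8, 4)           # batch of model outputs
targets = torch.randint(0, 4, (8,))  # gold labels
loss = loss_fn(logits, targets)
```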
Now that we have a sentence-level prediction tool, we can aggregate the sentence-level predictions to the article level to identify the research papers with correlational findings. A research paper's finding is regarded as correlational if at least one sentence in the conclusion subsection uses correlational language and no sentence uses direct or conditional causal language. That results in over 6,000 press-paper pairs in which the original finding is correlational.

We then check the corresponding press releases to determine whether exaggeration happens. Because the title is extremely important, we use the following procedure to make the decision. We first look at whether the title makes a direct causal statement; if yes, it is exaggeration. Otherwise, if the title contains no claim, we check the first and second sentences: if both are direct causal statements, or if one is direct causal and the other is not a claim, it is exaggeration. All other cases are considered no exaggeration. This criterion is strict, to ensure that the findings in the press release are presented only in causal language. (A sketch of this decision procedure appears at the end of this transcript.)

With this NLP tool at hand, we were able to scan all of the over 6,000 press-paper pairs in which the original research paper drew a correlational conclusion, and we found that 22% of the corresponding press releases made exaggerated causal claims. Breaking down the cases of exaggeration by year, we found that despite the increasing number of cases, shown in the bars, the overall exaggeration rate is actually decreasing, shown in the zigzag line. The decrease is statistically significant, with a Spearman rank correlation coefficient of -0.88. We also compared different sources of press releases, using the media contact person's email address and the name of the source institution to identify the source type. We found that over 25% of university press releases made correlation-to-causation exaggerations, compared with over 16% for journal publishers, a ratio of about 1.5 to 1. This result is consistent with prior studies that used manual content analysis.

To conclude this presentation: we developed and validated an NLP approach for identifying correlation-to-causation exaggeration in health-related press releases. We found some good news, that the exaggeration rate is decreasing over the years, and we confirmed some bad news, that university press releases make more exaggerations than journal publishers. Understanding the reasons behind these patterns requires further research; we plan to interview some press officers to see whether changes in their training contributed to the changes.

Our study has some limitations that we will address in future work. First, we'd like to identify research findings not only in structured abstracts but also in unstructured abstracts. Second, we'd like to match press releases to their original research papers by content, because sometimes the DOI links are not available. Third, we'd like to match the claim sentences in research papers and press releases by content instead of relying on the inverted pyramid structure. That structure works well for press releases, but if we want to extend our work to other news sources and social media, we need to be able to match the claim sentences directly.

Our code and data are available on GitHub. We are a fabulous research team, and we thank the National Science Foundation and Microsoft for their support.
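As promised above, here is a minimal sketch of the article-level aggregation rule and the exaggeration decision procedure. The integer label encoding and function names are illustrative assumptions; the logic itself follows the rules described in the talk.

```python
# Claim-strength labels, as described in the talk.
NOT_CLAIM, CORRELATIONAL, COND_CAUSAL, DIRECT_CAUSAL = range(4)

def paper_is_correlational(conclusion_labels: list[int]) -> bool:
    """A paper's finding is correlational if at least one conclusion sentence
    is correlational and no sentence is conditionally or directly causal."""
    return (CORRELATIONAL in conclusion_labels
            and COND_CAUSAL not in conclusion_labels
            and DIRECT_CAUSAL not in conclusion_labels)

def press_release_exaggerates(title: int, sent1: int, sent2: int) -> bool:
    """Apply the strict decision procedure to the predicted labels of the
    title and the first two sentences of a press release."""
    if title == DIRECT_CAUSAL:
        return True
    if title == NOT_CLAIM:
        pair = {sent1, sent2}
        # Both sentences direct causal, or one direct causal and one not a claim.
        if pair == {DIRECT_CAUSAL} or pair == {DIRECT_CAUSAL, NOT_CLAIM}:
            return True
    return False  # all other cases: no exaggeration
```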