Hi everybody, it's Monica, the friendly neighborhood epidemiologist, here to explain what I think the differences are between domain knowledge in the fields of public health and data science. Now, I realize that's kind of like talking about the difference between something that is soft and something that is yellow; we aren't even talking about the same kind of thing. But I felt like I needed to answer this question because there is quite a bit of overlap between the fields, and there are some pretty serious gaps that need to be addressed. So here we go! Let's say the circle represents the domain knowledge in the field of public health. See how nice and neat the circle is? Okay, now see the zigzaggy shape I placed on top of the public health circle? That represents the domain knowledge in the field of data science. Notice how it's all sort of, well, jagged and unpredictable? It's not neat and circular like public health. Also, notice that there is an overlap between public health and data science. It's kind of a weird shape. I think it's shaped like a hard-boiled egg after you peel off the top of the eggshell. Do you see it? Okay, let's start with domain knowledge in public health. There are three areas where we excel, and, coincidentally, these three areas are the only data science topics we know. They are study design, meaning epidemiology; scientific research, meaning how to use the scientific method; and biostatistics, which is basically what you do when you analyze data from epidemiologic research studies. Notice how on the slide the word "biostatistics" is about halfway into that eggshell-shaped overlap I talked about on the last slide. That's because, of these three areas, I'd say biostatistics is the most data-science-like thing we learn in public health. You can see that up at the top of the circle, the phrase "study design" doesn't really even make it into the overlap; it just kind of touches it. And then scientific research is way over on the left.
Data science doesn't really do much research based on the scientific method. All right, here's your quiz question: what does the slide imply? Okay, I'll give you the public health implications of that slide, and that is that data scientists suck at study design, and they are not really that good at hypothesis-driven research. So if you are a public health person, what's awesome is that you can join any data science team and totally help them out with your study design skills. You can get them doing scientific research. But, yes, there's a big "but": unfortunately, the picture is not 100% rosy. Let's talk about what's in the overlap between public health and data science. As I said, let's focus on that eggshell-shaped overlap. You will see I identified four subjects: management, governance (which also covers policymaking), informatics, and math. I want you to notice how huge the word "informatics" is. Really, I made the slide to scale. Math is little by design. Math is good to have in both domains, but it is not a main player by itself. Governance and policy, same thing: good to have in both domains. And if you're in public health, your skills in both math and policy can be translated to data science pretty easily, and vice versa. But let's get back to that huge word, "informatics." If you look upward on the slide, the word "management" is almost as big. The problem with this scenario is that not only are they big, they are mostly on the data science side of the overlap, not the public health side. So while we learn both management and informatics skills as part of getting a master's degree or doctorate in public health, we really don't gain enough mastery in those two domains to comfortably transition to the data science field without feeling like we have deficits.
Okay, now let's look at what domain knowledge, in my opinion, falls outside of public health and squarely inside the domain of data science. Now, this is my opinion. If you have a different one, feel free to add a comment and we can discuss. But in my opinion, we really don't learn programming in public health, as you can see on the slide. We learn to use SAS and all, but we don't learn programming form and etiquette, or how to program as a group. These are topics students learn in computer science courses, where professors actually teach programming, but we typically don't learn these kinds of concepts in our SAS courses in public health. We also absolutely do not learn engineering. I think everyone would agree with that one. And we only learn biostatistics; we don't learn statistics outside the domain of human health. That's why I put "statistics" on there with only the beginning of the word, the S, in the overlap. When we try to make models using data outside of the health domain, we really have trouble. We are good biostatisticians, but translating that to another domain is a challenge. So how did it all get this way? Well, it has to do with the history of how these two fields evolved. Public health as a professional field came first, although it is still rather new. The philosophical underpinnings of public health began to be assembled in the late 1800s and early 1900s, but you only see schools and colleges of public health forming in higher education in the 1980s. For example, it was in that era that the University of Minnesota started its school of public health and the University of South Florida started its college of public health. And then we didn't even start our CPH, or public health certification examination, until 2008. So what this tells you is that, going back to 1980, we've been doing big data. All of our epidemiologic surveillance studies, you know, Framingham, NHANES, and BRFSS, produce huge data sets.
So we've been coming up with strategies for storing and analyzing big data since the 1980s, and actually even before that. So when did we invent data science? Well, the term was invented in about 2010. In fact, I remember when I first heard the term "data analytics," I didn't even know what it meant. What does the term "data science" even mean? And why was it invented in 2010? What was going on before that? Well, before that it was called business intelligence, or BI for short. Remember that slide I was showing you with the overlap between the public health and data science domains? You might have wondered why business isn't on the slide. The answer is that the term "business" basically means business operations, which is everything: management, finance, human resources. And we do all that in public health and in data science. So "business" kind of means everything we do at work. The term "business intelligence," or BI, means that we can study all of these processes at work because they all produce data. Whenever we hire someone or fire someone, we create data. We create data when we purchase something at work, when we go on a Zoom meeting at work, and when we print something out at work. In fact, early BI involved a lot of data entry and reporting because most business data was not electronic. Managers might choose to make a video of workers working, then collect data from the video to see if they could make a better operational flow. Then they'd make a report. See the printer on the slide? That was reporting from your BI. I worked at a health insurance company that had a whole reporting department doing BI on our insurance data. All these activities were part of BI. So when we look back at my overlap diagram, you will see that all the domains leaning heavily toward the data science side come from business intelligence. The huge informatics part represents all that data entry.
You know, taking videos of people, doing reporting, and such. Being able to do all that successfully required the other domains on the data science side: programming, statistics, and management. And ultimately, this is why engineering is way over on the data science side. Let's think for a moment about a factory. Imagine it's a cereal factory, and there are machines making cereal, putting it in boxes, and putting those boxes in boxes. You know how it goes. If there are sensors all over those machines, they are recording data. And if you can do BI with that data, you can make better decisions about the machines, like how often to shut them down for calibration so they don't drift out of calibration and produce results outside of acceptable standards. So here is your next quiz question: what do you think are the public health implications of the fact that engineering is only on the data science side and not the public health side? Bingo! Public health practitioners make crappy engineers, on average. I know I've seen some pretty cool outliers in my time, like my intern who makes amazing dashboards. She's a public health person, an engineer, a musician, a mom, and everything else. But I admit it: I am a crappy engineer, and most of my public health friends are not that good at engineering either. So the take-home message here is that there is definitely a place at the data science dinner table for the public health practitioner. In fact, she might even be sitting at the head of the table if the data science team needs to be doing some sort of hypothesis-driven research, especially in the health domain. But the problem is that it's hard to jump into data science if you aren't as well versed in programming, management, engineering, and non-healthcare statistics as the people already in the data science field. It can be really daunting.
That's why I specialize in making learning materials for data science that are easily understandable by public health practitioners. So I'll hook you up. Please explore the links in the description to this video for some great public health data science resources. And don't forget to subscribe to my channel, connect with me on LinkedIn, and follow my blog. I hope this video answered some of your questions about the overlap between public health and data science. They are very different, like a yellow thing and a soft thing. But there are things that are both soft and yellow, like bananas. Yeah. Okay. Well, if you have any more questions that I can answer, please post them in the comments and I'll respond. Thank you for taking time out of your public health and/or data science journey to watch this video. If you liked it, please hit the like button. Thanks for watching, and I hope you enjoy the rest of your day.