So thank you, everyone, for sticking around the conference. I know I'm the last speaker and energy is probably super low right now. As you can see, I don't have that many slides, so I'm hoping this is going to be more of a discussion, and I'll show you the inspiration behind this idea of a data scavenger hunt.
Here's a quick overview. I'm going to introduce myself, and I'm trying to keep the questions very simple: Why? What? Who? And then a call to action.
So who am I? I'm an assistant professor at Oregon Health and Science University. My specialty is bioinformatics, but I have really gone full tilt into teaching. Part of that is that I want people to really understand the data that they're generating, and visualization is the other component. When I learned ggplot2, my head exploded and I was like, this is awesome, everyone should know it.
So let's get at the why. Why look at data together? Number one, visualization is a really empowering thing. A lot of people feel like data science might not be for them, but discovering that we're using visual tools that people can easily grasp is very empowering. In that same vein, I feel it's very accessible for a lot of people. In terms of thinking about how we can invite more people to learn data science, I feel like visualization is a good path to inclusivity. Also, looking at the same graph together, doing visualization together, lets us have a conversation: do we believe that things are real in the data or not?
And then finally, and I'm not casting shade on any data science programs or anything, I feel like visualizing and interpreting the data is such a valuable skill, and sometimes it's something we gloss over.
So that's a lot of setup for what is a simple activity. We call it a data scavenger hunt. The core of it is that it's a social learning activity based around a data set. When I was in school, I spent a lot of time doing exploratory data analysis, and I thought it was the greatest thing. It was pioneered by John Tukey, but I think Barron's has a much better description of it: basically finding patterns and revealing structure in the data. The other key part of a data scavenger hunt is giving people focused questions to look for in the data. So here's one example, and hopefully the demo will work: can we look for associations between variables such as body mass index and diabetes? And the final piece is the idea that we need to reflect on our discoveries, present them as a group, and talk about them.
So in terms of what, this is the heart of it. This is a package that Jessica Minnier and I have been working on called burro. Yes, the mascot is a donkey, but it's basically a pun: you can burrow into your data. Our goal eventually is that you can throw a wide variety of data sets into it and it will pop up a Shiny app for exploring lots of different data types. And the cool thing is that it builds all of the Shiny app code for you.
And so if you wanted to bring in other people who aren't necessarily R programmers, you can spin it up as a website and they can start exploring the data. Part of the idea is very carefully going through exploratory data analysis and mapping all of the different kinds of data to the appropriate kinds of visualizations. You'll see the app is very simple, but I think simple is really good in this case, because it allows people to ask questions and look at the data very quickly. So this is just the thinking behind it.
Like I mentioned, the app is very much organized by variable type. I'm sorry, this is not visible, but the basic idea is that you have an overview of the data here, and there are really cool tools for visually summarizing and tabulating the data. Then you can look at single variables. The two variable types we really focus on right now are continuous and categorical data, so you can examine what exists within a single variable. Then, based on the question you're asking, there are different visualizations that we map to. For example, if you have two categorical variables, we do a stacked bar plot to examine things like proportions. If you've got a continuous and a categorical variable, a box plot is a natural way to visualize the data. And then obviously, if you have two continuous variables, a scatter plot.
The example I'm going to show you is the NHANES data set. Has anyone here heard of NHANES? I really hope you've heard of NHANES. It stands for the National Health and Nutrition Examination Survey, and it's basically a survey of nutrition habits. But there are also outcomes that are measured.
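The mapping from variable types to plots described above can be sketched in a few lines. This is only an illustration in Python, not the package's actual code (the package itself is written in R), and the names `PLOT_FOR_TYPES` and `choose_plot` are hypothetical.

```python
# Illustrative sketch only: map a pair of variable types to a suggested plot.
# The names below are hypothetical and are not taken from the package.

PLOT_FOR_TYPES = {
    # frozenset collapses duplicates, so this key means "two categorical variables"
    frozenset(["categorical"]): "stacked bar plot",
    frozenset(["categorical", "continuous"]): "box plot",
    # and this key means "two continuous variables"
    frozenset(["continuous"]): "scatter plot",
}


def choose_plot(type_x: str, type_y: str) -> str:
    """Return the plot suggested for two variables of the given types."""
    key = frozenset([type_x, type_y])
    if key not in PLOT_FOR_TYPES:
        raise ValueError(f"unsupported variable types: {type_x!r}, {type_y!r}")
    return PLOT_FOR_TYPES[key]
```

Using a `frozenset` as the key means the order of the two variables does not matter, so a categorical and a continuous variable get a box plot whichever one is chosen first.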
So in this example, we'll look at depression as the outcome. This is just a little GIF to give you an idea of what the app is doing. The first thing is, we're going to look at episodes of little interest alongside depressive episodes. You can see we've got this variable, and we're looking at the distribution of things. Then we're looking at the association of little interest with depressive episodes. The hard part about GIFs is trying to time things. But when we get to the association, when you get to little interest, you can see that the most depressed people also have the most episodes of little interest.
Like I said, the app is really just a conduit to answer the questions. So the other side of a data scavenger hunt is to have pre-prepared questions that really guide the exploration. One of the guidelines we've followed is that the questions should range from simple, maybe answerable by one single part of the app, to more complex, assessing things like: is the data missing? Can we even answer the question? And like I said, the real goal is to give people a tool where they can have a question and we can give them a visualization.
I know Wi-Fi is spotty, but as I'm talking, if you can get to the app, take a look at it. There are three questions that I have for you to answer. If you look at the age variable, why is it capped at 80 years in the data set? Is marijuana use associated with number of depressive episodes? And are hours of sleep associated with depressive episodes?
So anyone want to tackle the first question? Yeah, sorry. Sorry, it doesn't seem to be zooming in. Is that better? So take a look at this. For those people who can't get onto the app, I'm just going to show a short summary of what's going on.
This is the first thing the students see when they open the app. It's a very cool visual summary tool called visdat by Nick Tierney. When I saw this, I was like, this is an awesome teaching tool. Again, I should have made things a little larger, but you can see that it classifies the different variables into the different data types, so the categorical data and the numeric data. And these gray sections are actually where the data is missing. So it gives people an idea of what parts are missing from the data and gets them to think about why this data is missing.
This is another tool that, of course, doesn't show up right, but it's basically a tabular summary of the data. So we ask things of the students like: how many complete cases are there for depression, for example? Or how many missing cases? They also have access to a data dictionary that lets them look up and search for different kinds of variables. So, for example, if I ask you to look for alcohol, you can search and it will tell you the name of the variable and this kind of thing. So really the idea behind the app is to empower people and give them the idea that they can do data exploration.
So the final part of the data scavenger hunt is this idea of reflection and discussion.
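The overview questions above (what type is each variable, how much is missing, how many complete cases are there) can be sketched on a toy table. This is an illustration in plain Python under stated assumptions, not how visdat or the app compute their summaries; the rows, column names, and function names are all made up for the example.

```python
# Illustrative sketch only: a tiny made-up table, with None marking a missing value.
rows = [
    {"age": 34, "gender": "female", "depressed": "Several"},
    {"age": 80, "gender": "male", "depressed": None},
    {"age": None, "gender": "female", "depressed": "Most"},
    {"age": 52, "gender": "male", "depressed": "No"},
]


def column_type(values):
    """Classify a column as 'numeric' or 'categorical', ignoring missing values."""
    present = [v for v in values if v is not None]
    is_number = lambda v: isinstance(v, (int, float)) and not isinstance(v, bool)
    if present and all(is_number(v) for v in present):
        return "numeric"
    return "categorical"


def summarize(rows):
    """Return the type and the number of missing values for every column."""
    summary = {}
    for col in rows[0].keys():
        values = [row[col] for row in rows]
        summary[col] = {
            "type": column_type(values),
            "missing": sum(v is None for v in values),
        }
    return summary


def complete_cases(rows, columns):
    """Count the rows that have no missing value in any of the given columns."""
    return sum(all(row[c] is not None for c in columns) for row in rows)
```

For example, `complete_cases(rows, ["depressed"])` counts the rows where the depression variable was actually recorded, which is the kind of question students are asked to answer from the tabular summary.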
It's not enough to discuss things in a little group. You have to bring your observations back to the larger group. So we provide people a really simple template to present things, so it's less stressful: what was your question? Then show us in the app where you found your evidence. And then, are there any other variables that might be associated that we should look at? In the end, the activity is really about: what did we learn as a group together?
I kind of glossed over the social parts of this, but we've run this kind of data scavenger hunt in a lot of different situations. Oh, yeah. So luckily I have that. We've run it with graduate students. Portland State University ran it, and we've used this as part of our clinical data wrangling course. This is a course our incoming informatics students take to really understand issues with clinical data. They used a data set called the Sleep Heart Health Study data set. A lot of the material is online. Unfortunately, due to the data use agreement, we can't publish the Sleep Heart Health Study data, so I'm going to try to produce a synthetic version so other people can use this.
There's a group that I help moderate called BioData Club, and we used NHANES for a scavenger hunt. Jessica Minnier was part of that. We thought of really interesting questions. It wasn't just depression; there was physical activity and type 2 diabetes. So we all learned a lot about the data set with that.
I also teach an undergraduate course, and this is part of our data literacy unit. They're public health education students, and they looked at the NHANES data set.
There were interesting results and a lot of interesting discussions. One of the students actually didn't believe the visualization, so that was the basis for an interesting discussion. I don't know if she still doesn't believe the data, but at least we had a discussion about it.
This is just some feedback from the BioData Club session. It seems like people really enjoy this kind of collaborative exploration. And I really like this one: "I like the data set and the open curiosity element of the workshop." So I really want people to be curious about data, and if they have questions, to try to answer them.
This gets into: would this be useful for other audiences? As we've seen, a lot of the presentations here were about citizen-driven data efforts. My question for you is, what would it take to make this useful for citizens in the world? I think one of the things we need to do a much better job of is having better just-in-time documentation that explains things to people. Can we use this as a framework to teach graph literacy and data curiosity? And maybe I'm thinking too big, but maybe we can use this framework as part of an effort to democratize data science.
So what's next for the burro package? This directly has to do with the public health students I've been working with. I've been trying to think of what data is interesting to them, and thinking about things like social determinants of health.
I've been learning how to integrate Shiny with spatial data such as shapefiles, and there's this sf format within R. My Wi-Fi isn't working, or I'd show you an example, but it's the idea of having things that are appealing to a wide variety of students and that help them learn data science principles.
So this is my call to action. I really would like to hear from beginners and novices, and I'm trying to be as welcoming as I can to people who want to be first-time contributors. The app is really rough around the edges and I need help to make it more usable. And for the different data types we're looking at, I need some help testing it on different data sets and then thinking about a very standardized template for the scavenger hunts.
This is my funding acknowledgement. It was partially funded by the National Library of Medicine through our training grants, and also by NCATS, the translational science center, through the Center for Data to Health project.
This is probably very small, but if you want to take a picture, this is all of my info. The slides are here, and this is the main burro site. If you're interested in the package or the app, you can go here. I just added a contributing section. I've still been thinking about the best way for people to get involved, but I have a mailing list sign-up there, which is also here.
And then just a little plug for our group called BioData Club. It's our local Portland club that spans a lot of the universities here.
It focuses on learning about data. We're all about exploding hierarchies, so it's not just students; it's students, staff, faculty, postdocs. If you're interested and curious about data science, we want you. If you want more information about that, you can go to our website. So that's about it.
Was anyone able to access the app? Of the three questions, what did you find? Let's go back to the questions. Did anyone find out why age is capped at 80 years in the data? Yeah, Jim?
Just happened to chance across the data dictionary. It says anybody who is more than 80 years old is categorized as 80 years.
Yes. We actually just dove into this, and she found out it's about identifiability issues, because there aren't many 80-year-olds.
Anyone find out anything about marijuana use? There is a variable called marijuana, and I'm sorry, I just kind of threw this at you. When we do the scavenger hunts, there are a lot more tutorials on how to use the app. So we look at marijuana use. Over 50% of the data is missing, so that gives us an idea that maybe we don't have enough data to ask this question. But you can see that there's a higher percentage of marijuana users and non-marijuana users here. So if we actually look at marijuana use and depression, the data is that you either use marijuana or you don't, and the proportion of people who have the most depressive episodes actually increases slightly if you use marijuana.
And did anyone look at the last question? Okay, I think I'm going to end here, because I don't want to pull everyone by the ear. So anyways, that's basically my presentation. Are there any questions?
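Three ideas from the discussion above (capping age at 80, checking how much of a variable is missing, and comparing outcome proportions between groups) can be sketched as follows. This is an illustration in plain Python under stated assumptions, not the survey's processing code or the app's code; the function names and the example values are hypothetical.

```python
# Illustrative sketch only; names and example values are hypothetical.
from collections import Counter


def top_code(age, cap=80):
    """Record any age above the cap as the cap itself (top-coding)."""
    return min(age, cap)


def missing_fraction(values):
    """Fraction of values that are missing (None)."""
    return sum(v is None for v in values) / len(values)


def proportions_by_group(pairs):
    """For (group, outcome) pairs, return outcome proportions within each group.

    Pairs with a missing group or a missing outcome are skipped.
    """
    counts = {}
    for group, outcome in pairs:
        if group is None or outcome is None:
            continue
        counts.setdefault(group, Counter())[outcome] += 1
    return {
        group: {outcome: n / sum(c.values()) for outcome, n in c.items()}
        for group, c in counts.items()
    }
```

Checking `missing_fraction` first mirrors the order in the discussion: if more than half of a variable is missing, the within-group proportions are computed on a much smaller sample and should be read with caution.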