Well, hello everybody, and welcome to today's lecture. I shouldn't call it a lecture; that makes it sound terrible, right? It's actually more of a demo. We're going to get our hands dirty with a little data and some applications today. My name is Monika Wahi, and I'm an epidemiologist, biostatistician, and informaticist, but I say I'm a data scientist because that covers all of that.
What I'm going to talk to you about today is offloading odious work from SAS to R. If you're a SAS insider like me, you know what I mean by odious. The Output Delivery System, or ODS, is the facility in SAS that allows it to do graphics. What's cool about it is that SAS was invented like a hundred years ago. I'm just joking; it was invented basically in the 70s. The ability to add graphics to such old software was really cool. The problem is that today it's really not a competitor.
So why do we even want SAS? Well, SAS will run regressions and similar analyses on big data sets pretty quickly if you program it right. I love R. R is open source, but I have run into limits in R doing regressions and big, heavy data work. R is wonderful for graphics, though. It's just so awesome. Those of you who use Python, that's also good, but I just haven't used Python. I'm used to using R.
So what I'm really going to talk about is how you solve your SAS graphics problems by making an application pipeline (I'll show you a little diagram) and using R for the display.
As you can see on the slide here, this is a screenshot of an R dashboard display at the top and a screenshot of a SAS dashboard display at the bottom. You might be thinking, a SAS dashboard display? And the answer is yes, you can actually make a dashboard in SAS. I'm going to show you one. If you download these slides and click on this link, you can get a SAS white paper about building a dashboard in SAS.
I'm also going to show you some code, which is unrelated to this white paper. If you want the code for my demonstration, you can go to GitHub here. I'm just looking at the chat to see if you have any questions; go ahead and put questions in the chat.
So, as much as I love SAS (I wrote a book about it), the graphics in SAS suck by today's standards. They're great if you just need a quick histogram or something, but they're not really ready for more than that. They're not easy to use, and they're no comparison to R. Part of the problem is that SAS produces graphics very inefficiently. It's optimized for regression, not for graphics, so it's just too slow. Now, you might make the argument: what if we're not making a dashboard? What if we're just making a two-dimensional graphic for a report? Well, even those graphics don't look that good, right?
So here's what actually happens in SAS. Imagine you have a big data set and you run PROC UNIVARIATE. PROC UNIVARIATE produces summary statistics on a continuous variable like systolic blood pressure. Let's say I have a zillion records in here and I run PROC UNIVARIATE, so I'm going to get the mean, the standard deviation, that kind of thing. And I want to graph it. Maybe I'm not just doing it once; maybe I'm doing it over time, like every 15 minutes or something. SAS is the engine that needs to create the statistics, but it doesn't have to be the one to graph them. Whether you want to graph that on a dashboard or in a two-dimensional report, you can just take the results here and then deal with them in another program.
From one of the SAS white papers, I took a screenshot over here. See how it looks kind of Windows XP? And then I took a screenshot of an R graph gallery. But I'm actually going to show you something from real life here. First I'm going to show you RGui. So what's going on in here?
So this is R. This line sets the working directory, and I'm going to do a demonstration, so I'll run that. Now we're in this working directory, where I have a data set called BF underscore A, where BF stands for Black Friday. If you follow me and go to my blog, you'll see I did a little analysis of a list of incidents that happened on Black Friday. Basically, people fight on Black Friday over buying stuff. So I had this little data set, and I'm reading it in. It's called BF underscore A, and I'll show you what it looks like.
Okay, so this is the data set. This is the R console, and this interface is called RGui. That's just the interface that comes with R. The first column says death, the second injuries. Each of these rows is a news report, so this is from the news; it's public information. What I want you to pay attention to is that I've classified each news report by period. This one is 2020 to 2021, so that's a two-year period. This one is 2016 to 2018. You can read my blog post if you want to figure out why I did this. And then this is the type of incident. Remember, these are people fighting. Is it a shooting? Is it a stabbing? Is it hand-to-hand combat? So basically I have two categorical variables. I just wanted to show you what it looks like.
Okay, now it's time to graph. We're going to do RGui versus SAS, head to head. Those of you who are SAS insiders know that SAS has Base SAS and then it has components. Those of you who are R insiders know that R has base R and then you add packages. So I'm doing a base-to-base comparison: Base SAS graphing versus base R graphing.
The graph is in base R. I first create this, I guess it's a matrix, a table, and I just called it counts. I make a table of type by period. Remember, type is this and period is this. So we're going to make that, and it's called counts. I'll show it to you. Okay, here it is.
So this thing here, you're probably wondering, where's the column heading? This is like OBS in SAS. These are called row names, and they act like a primary key. It's not great to have it this way if you're trying to analyze data, but it's good enough if you're just making a graph. You can see that this counted everything out.
Now I'm going to run the barplot command on counts, and you can see what's going on. Main equals this, so that's going to be the title. The X label is type, like this. The Y label is period, like that. And the colors (red, orange, green, blue, purple) are just built-in named colors. Legend equals row names. Remember, I said these are row names, so I'm kind of gaming that. And beside equals TRUE, which draws the bars side by side instead of stacking them. Let's just run this.
Is that sexy or what? This is gorgeous. I love it. All right, let's just look at this for a second. I just threw this together. Very quickly, we can look at these three periods and see patterns, right? You see there's a bunch of red here, and there's no red here. You see there's a bunch of yellow, or orange, here. All you see over here is green. So what you can really see is that shootings are what continue over time. And you also see that there's just a decrease in incidents. Oh, hi, Michelle. There's a decrease in incidents because people just shop less. This is Black Friday shopping. So that's what you'll see if you go to the blog post.
Okay, so that was base R. Now we're going to do this in SAS. We're using SAS OnDemand for Academics, and I've already set this up in here. I can't really explain SAS if you don't know it already, but I'll just tell you what I'm doing. So first, whoops, I'm going to go over here.
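
For readers following along without the screen share, the base R steps described above (the counts table and the barplot call) can be sketched roughly as follows. This is a minimal sketch, not the speaker's actual script: the data frame `bf`, its rows, and the period labels are invented for illustration, and the title and axis labels are placeholders. Only base R's `table()` and `barplot()` are used, with the arguments mentioned in the talk.

```r
# Hypothetical stand-in for the Black Friday data set: one row per news report.
# All values below are invented for illustration only.
bf <- data.frame(
  type   = c("Shooting", "Shooting", "Stabbing", "Hand-to-hand",
             "Shooting", "Hand-to-hand", "Stabbing", "Shooting"),
  period = c("Period A", "Period A", "Period A", "Period A",
             "Period B", "Period B", "Period C", "Period C")
)

# Cross-tabulate incident type (rows) by period (columns).
counts <- table(bf$type, bf$period)
print(counts)

# Grouped bar chart: one cluster of bars per period, one bar per incident type.
barplot(
  counts,
  main        = "Incidents by period and type",  # placeholder title
  xlab        = "Period",                         # placeholder axis labels
  ylab        = "Number of incidents",
  col         = c("red", "orange", "green"),      # one color per row of counts
  legend.text = rownames(counts),                 # legend entries come from the row names
  beside      = TRUE                              # side-by-side bars rather than stacked
)
```

The full argument name is `legend.text`; the shorter `legend = ...` spelling heard in the talk works in base R through partial argument matching. Setting `beside = FALSE` would stack the types within each period instead.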
I'm going to map a library in SAS so I can get the data set to come in here. Okay, that worked. So I go back to the code library, and now I'm going to go to the next step. I created this code to import the data set. It's in the SAS environment as a CSV, and I have to import it into SAS. Just remember, SAS was built a long time ago, so be nice to it. Oh, I should probably show you the results. Here's PROC CONTENTS, and you'll recognize these variables. Remember type and period. So that's what's going on.
Now we're going to do the bar plot. There's a little bit of code here, but I just want you to look at this one, where we make a chart. Here's ODS; I'm calling on the ODS. We're going to do this frequency table of type by period. Basically, I'm trying to get what I just did in R, and we're going to do it in Base SAS. Here we go. Okay, look at this. We get our PROC FREQ table up here, and this is what we get. Down here we've got the period, and over here we have type equals shooting, and this is over the periods. And this is type equals hand-to-hand combat. You know, it's not until I'm actually doing this that I realize how hard it is to interpret.
Let me show you a comparison of two dashboards. This dashboard, let me refresh it, is a dashboard my colleague and I built in R. The purpose of it is to compare hospitals in Massachusetts in terms of healthcare-associated infections (HAIs). Please do not accept these data as complete. There's a lot of missing data, and we don't know if this is accurate. But what we did was take the HAI data that we thought was the closest thing we could get, and we quartiled it. I mean, the process is much longer than what I'm saying.
But basically, we ranked them in a way so that you want to go to the green ones. Here's a green one. Here it is: Fairview Hospital, Berkshire Health System. By what we've ranked them on, apparently they have really good HAI results, but that could be missing data, so it's not perfect. Here's Addison Gilbert. Yeah, this is a pretty good hospital here. Let's see here. Here's kind of a not-so-good one. And again, don't believe these data; this is just a demonstration project. But look at how gorgeous this is. And when Natasha built this, she wasn't a super expert. So you can do this, right? With just the data that you can see displayed.
Now I was going to show you one of the SAS ones. If you download that white paper from SAS, you can see how to make a SAS dashboard, so I wanted to show you how one of them works. I knew BRFSS had one. So I went down here and thought, okay, I live in Massachusetts, so I'll select Massachusetts. All right, let's go. And this literally happened. I was thinking, oh, I'm going to do this live stream, and I got this: LoJ debugger trace report. And I thought, I'll bet they're rebuilding the dashboard while I'm making fun of it.
So I found another one that I knew was around. This is the NHANES data visualization. I just wanted to show you the functionality of one of these, because I just showed you the other one. So here's what's going on. This is chronic conditions. You can choose a chronic condition, and we're on that choice. If we wanted to do something else, I think you can go somewhere else and choose other things, like nutrition. But here we can choose obesity, which is the default, hypertension, or high total cholesterol. Let's say we wanted hypertension, so we'll choose hypertension. See that I/O? See how that was updating like that? And it says sex here: all.
But if we just wanted female, we'd have to do it like this. And then I guess if we didn't want all, we could remove it. See the I/O? See how if you go over here, there's no I/O? You know why? It's not actually interacting with the data. The data are in the front end. Oh, I screwed up here. Okay, there we go. And then here, this was particularly annoying. See these age groups? You've got 2 to 19, which is all the kids, then 2 to 5, 6 to 11, 12 to 19. You've got 20 and over, which is a lot of adults. There's something for a lot of older adults, but you don't have everybody. And then let's just do the bar graph here; you can do that down here. So if you do the bar graph, I don't know. I mean, it's pretty colors, right?
So the idea is that the data we were displaying in our R dashboard could have been calculated in SAS. We happened to be displaying a small data set about hospitals. But let's pretend. When I worked at the U.S. Army, we had these huge data sets. Obviously the Army is big, and the data have multiple measurements. It took forever to answer a question like, what's the rate of knee injury in the active duty Army in 2008? It took a long time to get that rate, right? But once you get it, you just save it. So basically we would get that rate, and then the ODS will export the summary statistics from PROC FREQ or whatever you want. You just call up the table and it exports it. And then over in R, you can rearrange the table.
Oh, by the way, if you download the slides, there's another dashboard that Natasha and I made, also in R, that I encourage you to take a look at because it's really cool. She did a really nice job. It's on HAIs, but it's a different thing. It's more of a demonstration project on visualizing them at a hospital. Those of you in microbiology would like it.
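
The hand-off described above, where SAS computes the summary once and R only rearranges and displays it, might look something like the following on the R side. This is a sketch under stated assumptions: the exported file is imagined as a small CSV in long format, the column names (`type`, `period`, `COUNT`) and all values are hypothetical, and the text is embedded inline so the example runs on its own. In practice the `read.csv()` call would point at whatever file the SAS job actually writes.

```r
# Stand-in for a small summary file exported from SAS.
# The layout, column names, and values are assumptions for illustration.
sas_export <- "type,period,COUNT
Shooting,Period A,2
Stabbing,Period A,1
Hand-to-hand,Period A,1
Shooting,Period B,1
Hand-to-hand,Period B,1
Shooting,Period C,1
Stabbing,Period C,1"

summary_long <- read.csv(text = sas_export, stringsAsFactors = FALSE)

# Rearrange the long summary into a type-by-period table of counts.
# Type/period combinations that are absent from the export become 0.
counts <- xtabs(COUNT ~ type + period, data = summary_long)
print(counts)

# The heavy computation already happened upstream; R only draws the summary.
barplot(
  counts,
  beside      = TRUE,
  legend.text = rownames(counts),
  xlab        = "Period",
  ylab        = "Number of incidents"
)
```

Because only the summary rows cross from SAS to R, the size of the original data set does not affect how long the display step takes.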
So this is the big picture that I'm trying to teach you: you don't really need to use SAS for your visualizations. This is the overarching strategy, and this is where everything breaks down: in order to do it, you have to really know what you are designing. What is this dashboard? If you're making a dashboard, what is it supposed to do? What is it supposed to fulfill? Who's going to use it, and what are they trying to do with it? That thing I just showed you from NHANES, I could not do anything with that. I can tell you right now, I'm a trained epidemiologist, and, I mean, it had pretty colors, so I could just meditate on it or something, but I could not interpret it. And you saw how I could interpret that bar chart from base R, but whatever came out of SAS, I couldn't interpret. It's the same data; it's just not a good display for interpretation.
So the trick is to back-engineer: what do you want to see? What are you trying to visualize? And then figure out how to get the summary data out of here to support that visualization. Whether you do it in Python, if you're more of a Python programmer, or in R, like Natasha and I, it doesn't matter. That's the basic idea.
So the paradigm is: is there a way we can, one, get other programs to do what SAS is not good at, which in this case is graphics, and two, find efficient ways of moving just the right data from SAS into the other program's environment? Go ahead and ask me any questions if you want. What I'm talking about when I show you this is an application pipeline. If you're already a SAS user, you might have gotten there because you're a researcher like me.
If you go to public health school or biostatistics school or whatever, you are often taught SAS, but you're taught to use it on data sets like BRFSS, survey data sets, or data sets where they measure labs in a clinical trial. Those are prospectively gathered data, well documented in a protocol. The problem is that what we have today are data from apps. Right now, you're on this Zoom call, and there's data coming out of Zoom about who's joining and how long I'm talking. What if somebody wanted to analyze that data and get some insights about Zoom, or even about health communication? What do you do? How do you do that?
A lot of people are having that problem. Everybody wants to analyze data from apps, not necessarily just to figure out the app, but to figure out what people are doing and how their health is. So to solve that problem, I'm holding this workshop called Application Basics: The Big Picture. The learning objective of the workshop is to understand data sets coming from applications well enough to analyze them and produce results. It's basically about computer applications and how they are designed: the structure of the teams that design them, how the data are stored in the applications, and the terminology used in application development, like a lot of the jargon I've been using in my talk today. With this knowledge, you can break through communication barriers to get the answers you need to complete an analysis and be seen as an expert in using data from applications as a data scientist.
So here are the details of the workshop. Again, it's Application Basics: The Big Picture. It's basic, and it's a big-picture workshop. It's Saturday and Sunday, March 23rd and 24th, 2024. It's two sessions, and each session starts at noon Eastern time. I'm in Boston on the East Coast, so if you're on the West Coast, it's 9 a.m.
Each one runs about three hours. For a weekend data science workshop like this, I kind of priced it out, and it's normally about $250 to $750 per workshop. But because you came today, your special cost is free. I'm going to give you a link in the chat to register and to find out more information. Here you go. I really encourage you to come because it's a great experience. It's really a workshop: you're really interacting, you're really networking, and you've got real problems to solve. And as clunky as I am with Zoom, I've gotten kind of good at it, and people really get a good workshop experience, and you don't even have to leave your home. So if you're interested in that, I really encourage you to sign up.
So thank you very much for coming today. This is what I had prepared for you. If you download our slides, please follow our company page, because when I post events and resources and videos, you can go there to get them. You're welcome, Michelle and Ryan. I hope you enjoyed the talk. If anybody else has any questions or use cases, like you're trying to solve a SAS I/O problem or a SAS graphics problem or anything, just let me know and I'll be happy to answer them. I usually give everybody a few moments to think about it. This is kind of like free data counseling, if you're willing to spill your guts. I mean, you don't have to name names. You don't have to name variables or data sets or companies. But if there is a particular problem you're having with SAS I/O, I wrote a book, Mastering SAS Programming for Data Warehousing, and you can probably tell data warehousing is a lot about I/O: trying to make it so that people, or even programmers, aren't clicking and then waiting for their results. How do we program SAS so that SAS runs the best it can? Because SAS is SAS, and you still need it.
It's just that when you don't need SAS, when some other program can do the job, what people need to do is get bold and try to connect them, try to make it an application pipeline, and just see if it will work. A lot of times people are a little scared, but we're in the age of innovation. So I encourage anybody who's interested to try to make SAS connect and work with other applications. I think it's a good thing, because SAS is all over the world. It's not going away, so you might as well see how you can leverage it.
Well, great. Thank you so much, everybody, for showing up today on this nice Tuesday. I hope you have a good day and a good week.