Hi, everyone. It's Monika Wahi here, talking about data science again. I'm particularly excited today because I get to talk about two software packages at once: R and SAS. Why am I using both R and SAS for one project? Well, I wouldn't normally do that. I explain the reason in the blog post I link to in the video description. Briefly, I was already developing an analytic dataset in R for a project where I was analyzing the data in R. Then suddenly another project was assigned, and it was due right away. The second, rush project was like the first project, so I wanted to quickly generate a similar dataset in R and analyze it. The only problem was that for the second project, I wanted to use a SAS PROC for the analysis, and I was in a total hurry. So I'll show you how I solved my problem as a perfect example of SAS-R integration.

So why do we love R? We love it because it's open source, meaning it's free, and it's easy to use for extract, transform, and load, or ETL, code. Creating an analytic dataset for a research study is quick and dirty in R. But the problem with R is that it has trouble doing heavy analytic tasks on big datasets in a normal environment, like on my desktop computer. In fact, I go into detail about this limitation in my blog post, where I demonstrate that I have no problem using R to generate the analytic dataset.

What I'm doing is making two binary flags to represent health care access issues. The first one, the med-cost flag, flags the record if the respondent said they could not see a doctor within the past 12 months due to medical costs. See the coding of the med-cost flag? I generated it from the native variable in BRFSS, which is MEDCOST. And next to it, we have the coding of the binary flag no-plan, from the native variable HLTHPLN1. This one flags the record if the respondent doesn't have a health insurance plan. See, the BRFSS is a population-based survey.
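To make the flag coding concrete, here's a minimal R sketch of the recoding step. The toy data frame and response codes are assumptions on my part (the usual BRFSS convention is 1 = Yes, 2 = No, 7 = Don't know/Not sure, 9 = Refused; check the codebook for your survey year), and for simplicity this sketch lumps 7 and 9 in with 0 rather than setting them to missing:

```r
# Toy BRFSS-style records (made up for illustration)
brfss <- data.frame(
  MEDCOST  = c(1, 2, 2, 7, 1, 9),
  HLTHPLN1 = c(1, 1, 2, 1, 9, 2)
)

# 1 if the respondent could not see a doctor due to cost, else 0
brfss$MEDCOST_FLAG <- ifelse(brfss$MEDCOST == 1, 1, 0)

# 1 if the respondent has no health insurance plan, else 0
# (HLTHPLN1 == 2 means "No" under the assumed coding)
brfss$NOPLAN_FLAG <- ifelse(brfss$HLTHPLN1 == 2, 1, 0)

table(brfss$MEDCOST_FLAG)
table(brfss$NOPLAN_FLAG)
```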
In the first analysis, I was just considering a subset of the records, so I didn't need to use the population weight variables that BRFSS includes. But in the second analysis, I wanted to estimate the population-based rate of these access-to-care variables. So the idea is that all the records are in the denominator, and the numerator is made up of the records with a 1 for the flag. But whatever software we used needed to take the weight variables into account in the analysis. BRFSS offers guidance on how to use both SAS and R, whichever one you want, to run the analysis with the weights. The problem was that when I used the R code, R just wouldn't run. It just hung. So I knew I could not finish the analysis in R, even though I had successfully made the analytic dataset.

Now, the main reason not to use SAS for a project is that the data step sub-language you have to use to edit the data is really onerous. Data steps are necessary for big data, because you need all the tools the data step sub-language offers to manipulate really huge datasets. I explain this in the first few chapters of my book, Mastering SAS Programming for Data Warehousing. But SAS data steps are overkill for small data, and for SAS, most datasets outside a data warehouse environment are effectively small. That's because a dataset has to be pretty big for SAS to even notice it's big.

That was great for us from the analytic standpoint, because R was not able to complete the analysis. The data were too big for the R commands I was trying to use in my desktop computer environment, but I knew the same dataset would look totally tiny in SAS's eyes, so I would have no trouble running the analysis on it in SAS. What I didn't know was whether I could make the analytic dataset small enough to use in SAS OnDemand for Academics, or SAS ODA, because it's a free version that runs online, so it has data limits. It's not really meant for doing a full analysis. It's just meant for practice. But I had an idea.
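The numerator-over-denominator idea with weights boils down to a weighted proportion. Here's a base-R sketch with made-up weights; note that a real BRFSS analysis also needs the strata and PSU design variables for correct variance estimation, which is exactly what the survey-aware procedures handle and what this little sketch ignores:

```r
# Weighted prevalence: numerator = weighted sum of flagged records,
# denominator = weighted sum of all records. Weights are made up
# for illustration (BRFSS supplies a final weight variable).
flag   <- c(1, 0, 0, 1, 0)           # e.g. a no-insurance flag
weight <- c(250, 400, 150, 100, 100) # e.g. final survey weights

weighted_rate <- sum(weight * flag) / sum(weight)
weighted_rate   # population-based estimate

# Compare with the unweighted sample proportion:
mean(flag)
```

The gap between the two numbers is the whole reason the weight variables have to be in the analysis.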
My plan was to run PROC SURVEYFREQ in SAS, which requires just the weighting variables and, of course, our flag variables. And since SAS datasets are so bulky, I instead used R to make a CSV file of the dataset with just the variables I needed for PROC SURVEYFREQ and exported that from R, so I could upload the skinny dataset into SAS ODA. I crossed my fingers, and it worked: I was able to upload it.

But remember, you still have to convert it to a SAS dataset. This is SAS, so of course it's confusing. In order to use the dataset in analysis, you still need to use PROC IMPORT or the import utility to convert it into SAS format in the SAS ODA environment. But once it's uploaded in there, it works. You can see my PROC IMPORT code on GitHub. And look, I was able to get my PROC SURVEYFREQ output to come out. See those weighted frequencies and percentages? That's what you need SAS for.

So I was able to solve my problem with a little SAS-R integration. You can imagine how a similar approach would be helpful in SQL, where you can make a SQL view and hit that from a SAS server environment through an ODBC connection using SAS/ACCESS to limit your import to a smaller subset of data. This is an overall paradigm you can use when thinking about dealing with SAS: try to make datasets smaller in other programs before they get to SAS, so SAS doesn't have to work as hard at data stepping.

I hope you enjoyed my little video on SAS-R integration. Are you doing any SAS-R integration, or any other kind of data software integration? If so, leave me a comment. I'd love to hear about it. Thanks for watching, and have a great day.
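P.S. For anyone who wants to try the skinny-dataset trick, here's a minimal R sketch of the export step. The column names and toy data are illustrative assumptions on my part, not the actual project code; the idea is just to keep only the design/weight variables plus the flags before writing the CSV for upload to SAS ODA:

```r
# Assumed toy analytic dataset; the extra columns stand in for all
# the variables PROC SURVEYFREQ doesn't need.
brfss <- data.frame(
  PSU = 1:3, STSTR = c(11, 11, 12), LLCPWT = c(250, 400, 350),
  MEDCOST_FLAG = c(1, 0, 0), NOPLAN_FLAG = c(0, 1, 0),
  AGE = c(34, 51, 62), STATE = c(25, 25, 25)
)

# Keep only the design/weight variables and the analysis flags
keep_vars <- c("PSU", "STSTR", "LLCPWT", "MEDCOST_FLAG", "NOPLAN_FLAG")
skinny <- brfss[, keep_vars]

# Write the slimmed-down CSV for upload to SAS ODA
write.csv(skinny, "brfss_skinny.csv", row.names = FALSE)
```

On the SAS side, PROC IMPORT then converts the uploaded CSV into a SAS dataset, as described above.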