Well, good afternoon, everybody, and welcome to today's presentation, Building Tables from Scratch in R. The first thing I have to do is clear my throat. Excuse me. It has been so hot here in Boston where I am, even though it's September. But thank you for joining me today. I'm Monika Wahi, and those of you who know me, welcome back. Thank you for coming to my event. And those of you who don't know me, I'm a data scientist and I'm sort of in the healthcare domain, but I didn't stay there. I kind of went other places. So I'm big on SAS, which we use a lot in epidemiology and biostatistics and stuff, but I'm also big on R and I'm big on integration. And so one of the reasons I'm so happy all of you showed up today is I often talk to SAS users about not using SAS. Not like there's anything wrong with using SAS. But I wish SAS users would use SAS for what SAS is good at and not use SAS for what SAS is not good at. And that has been sort of a hard sell for the last 20 years, but now it's an easy sell because we have Python, we have R, and we have all these really awesome alternatives. So now one thing we all know, or maybe we don't all know, but a lot of us realize, is that SAS is amazing for data editing of big data, right? So if you have to edit a huge data set, like make a column or whatever, or if you're running a data warehouse (actually, I wrote a book on data warehousing in SAS), and you want to analyze the data in the warehouse, you don't want to just warehouse it, there's a great use case for SAS, especially SAS Viya, now that SAS Viya is the version that is online, that's web enabled or whatever. So there's a lot of reasons to use SAS's data editing. But the downside to using SAS's data editing, which is data steps, is that those data steps are not good at, like, making a table from nothing, just making one, right?
So let's say you have a big data set, like the BRFSS data set, which I always use to demonstrate. That's a big surveillance data set. It's a phone surveillance data set, so they call people's numbers in the US, it's cross-sectional, and they ask you a bunch of health questions. Okay, so let's say that you're analyzing the BRFSS data, it's got all these columns, all these rows, it's big, no problem, data step, SAS. But let's say you just want to make a little table of summary statistics from that, and you just want to sort of throw it together, you know, just construct it by issuing commands. Well, in SAS we do that when we make arrays, right? It's really easy to make an array object in SAS. Constructing an array, no problem. But SAS doesn't really like you to just construct a table, just make a table, right? And what SAS does is, like if I say to SAS, why can't I make a summary table, you know, with like means and standard deviations for some continuous variables, SAS says I already have a PROC for that, and it's called PROC, you know, UNIVARIATE. And then I'm like, okay, great, it looks beautiful on the screen. How do I get the values out, you know? And then SAS would say, you need to use the Output Delivery System. Now, if you use the Output Delivery System, it delivers output, it's good, it's like Amazon, you can count on it. The problem is the output is sort of delivered in this awkward state. It's a sort of weird shape. So I can describe it more when I'm talking about R, but it's like, if you do a frequency table, like a PROC FREQ in SAS, and use the Output Delivery System to deliver the frequencies out, it'll be kind of like, it'll say, you know, male, age 25 to 55, and then some mean, and then the value, and it's just this really awkward shape. So you can use the Output Delivery System in SAS and get it to export these summary statistics, but it's kind of weird.
And, you know, it's not that big of a deal to put data in R and do something like that. So that's what I'm going to teach you today: how you can do something in R, like what I'm describing, that is not that easy to do in SAS. All right. But before I go on, I have to stop and make a commercial, which is for something that I love to sell because it's free. And that's this online workshop that I'm going to be teaching at the end of the month. So if you have, what is it, two to three free hours on Monday, September 25, Wednesday, September 27, and Friday, September 29, please come to my free online workshop. The topic is Application Basics. And that's the topic of the actual online course that I'll be teaching as a backbone of the workshop. But what we're going to be focusing on is R. We're going to be focusing on how you can involve R in your pipeline. Now, I know, you don't have to talk to me about Python. Python is awesome, right? But in the healthcare domain, we often find ourselves using R instead of Python because it's just easier to adopt for a lot of things. So why not talk about R? Why not talk about places, like in this exact lecture, where you can use R in your application pipelines in health analytics without disturbing SAS or disturbing other things you have going on, Epic or whatever's going on. So it should be an interesting time. And like I said, there's three sessions. And it's based on an online course in Application Basics. And there's a link to register for the free workshop in the description of this LinkedIn event. So I look forward to seeing you there. Alrighty. Okay. So now we're going to move on. And I'm checking the chat. I've got the chat on, so if you've got questions, I'll try to catch them. So I'm just going to set up today's example. I was hinting, I'm going to use the BRFSS data. So that's health surveillance data where you call people on the phone and ask them, do you have diabetes?
And how old are you? And stuff like that. So in SAS, we use data steps like I was saying. And when we do that, we refer to columns by name. So in the BRFSS, there's a variable for education. I think it's called EDUCA. So there's EDUCA. And if you look in the code book, it's got like 10 levels to it, which is too many. So what I did, I pre-processed this data that we're going to use today. You can download it. And it's from one of my LinkedIn Learning courses, actually. And so I created a transformed variable called edgroup that doesn't have that many levels. I just grouped together some of the education groups. So in SAS, we do that in a data step, like data out, set in, and then whatever it was. You create the edgroup variable and you recode it or whatever, however you do it. I'm not actually good at programming SAS. And of course, I've forgotten, and I haven't had my head in SAS for a little while. So in R, when I do data editing, like in SAS, I refer to columns by name. Now, you're probably thinking there's another way to refer to columns. And if you kind of remember, sometimes when we run arrays, or maybe you do this a lot in SAS when you do macros and stuff, sometimes you refer to columns by their number. But most of the time you're referring to them by column names and you're making like arrays to name them. In fact, I hate to say it, but in SAS, we're often renaming columns so they'll cooperate with, you know, like arrays and stuff. So in SAS, you usually are declaring a dataset, like here, BRFSS. And then when you start talking about the variables, it knows which dataset you're talking about. In R, you don't actually have to do that. You have a syntax, right? So let's say I have a dataset in R called BRFSS. If I have a variable called edgroup, I can refer to it by calling it BRFSS$edgroup, which is kind of nice because in the middle of nowhere, I can just start talking to R and say a dataset, a dollar sign, and a variable.
It doesn't have to be a loaded dataset. I mean, it has to be in R's memory, but it doesn't have to be anything I'm doing anything with at that moment, as long as it's available to R. Whereas in SAS, you know, if you're in the middle of a data step, you can't just go start talking about some other data. You have to finish the first one and go deal with the second one, or you've got to do a merge. I don't know, it's much more picky. It's good. You've got to go in order. So that's a long way of saying that R is really flexible for data editing. So you can refer to columns in data frames. A data frame is the object in R I use as a dataset. I say I use, because there's other ways you can do it, like there's a thing called a table, but I generally just use a data frame. And so in data frames, you can refer to the variable by the name, like I said, but you can also refer to it by number. And in fact, you know how in Excel you can refer to specific cells, like A4 or whatever? You can do that in R in an actual dataset. And you're probably thinking, well, who would want to do that in SAS? You have big data. Who would open the data set and look at the whole thing and try to look for a cell? Well, you probably won't want to do it in SAS. But remember, what we're trying to do is make a summary table in R, right, a little baby table. So what I'm teaching you today is for when you want to make a little baby table, right? Maybe it'll have big numbers in it, but it doesn't have a lot of rows and columns. And usually, you know, maybe you're making it for display, like for a report or something. It's not for making a big data table like you would normally edit with a data step. So R has a lot more objects than SAS, and you can really juggle them around for fun. Today, we're going to use a vector object, which will kind of remind you of the SAS array.
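The three ways of pointing at data in a data frame described above (by column name, by column number, and by cell) can be sketched roughly as follows. This is only an illustration: the tiny `brfss` data frame and its values are invented stand-ins, not the real BRFSS extract, and `smokegroup` is an assumed spelling of the smoking variable.

```r
# Toy stand-in for the BRFSS data frame; the values are invented for illustration.
brfss <- data.frame(
  edgroup    = c(1, 2, 3, 4, 9),
  smokegroup = c(1, 2, 2, 9, 1)   # assumed column name
)

by_name   <- brfss$edgroup   # a whole column, referred to by name
by_number <- brfss[, 1]      # the same column, referred to by position
one_row   <- brfss[4, ]      # row 4, all columns
one_cell  <- brfss[4, 2]     # a single cell, Excel-style: row 4, column 2

by_name
one_cell
```

The bracket form is always row first, then column, which is what makes the cell-by-cell filling later in the talk possible.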
And there's a few other objects I'm going to use, but I'll just kind of get started with it. If you want these links, you can get these slides by just downloading them. And then there's the links. And I just wanted to point out, and if I remember, I'll go to the blog post I made about this topic: the demonstration data set on GitHub is in a different GitHub folder than the code I'm going to show you. I just want to call your attention to that. It's on the slide. And when you go to the data set one, just go to the bottom data set, the BRFSS underscore i. I chose that one because I think it's the smallest. And it's still pretty big. All right. So everybody ready? We're going to go over to R now. So here we are in R. I'm in RGui because I get easily flustered by not being in RGui and, you know, seeing RStudio windows all over the place, because I'm old. So here we have our console up here. And I just have one big thing of code for you guys today. So let's start at the beginning here. So we've got a readRDS. If you download that data set, you'll find it's in RDS format, which is R's native format. So just like SAS loves sas7bdat, well, R loves RDS. So this big data file, I'm going to read it in. I'm making it be a data frame in R and calling it BRFSS. So I'm going to run this, and then there it is. Now in SAS, we often run a PROC CONTENTS. What I'm going to do is run colnames to show you the column names. And you'll see here, if you're used to using the BRFSS data set, you'll probably recognize some of these. R puts an X at the beginning. It doesn't like the underscores at the beginning. Just another disagreement between SAS and R. But this is X underscore AGE underscore G. Remember, underscore AGE underscore G is the native age grouping in there.
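A minimal sketch of the read-and-inspect step, assuming nothing about the real file: it writes a made-up three-row data frame to a temporary `.rds` file and reads it back, so the file path and the data are placeholders rather than the download from the talk. The last line shows base R's `make.names()`, which illustrates why a name that starts with an underscore ends up with an `X` in front of it.

```r
# Invented stand-in data; the real talk reads a downloaded BRFSS .rds file instead.
toy <- data.frame(edgroup = c(1, 2, 9), hispanic = c(0, 1, 0))

# Round trip through R's native RDS format using a temporary file.
rds_path <- tempfile(fileext = ".rds")
saveRDS(toy, rds_path)
brfss <- readRDS(rds_path)

# Rough analogue of a quick PROC CONTENTS look: list the column names.
col_names <- colnames(brfss)
col_names

# Names cannot start with an underscore in R, so an "X" is prepended.
fixed_name <- make.names("_AGE_G")
fixed_name
```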
So you'll see some native variables are in here, and some transformed variables. It's just from one of the LinkedIn Learning courses, and I like to use it to demonstrate stuff because it's got variables the way I like them in there. All right. And so the native BRFSS data set is usually 400,000 rows. But the number of rows here... this is like PROC CONTENTS in slow motion. That's the way R is. In SAS, you say PROC CONTENTS and it gives you all this beautiful output. In R, you have to tell it, like, what do you want? So this is the number of rows, nrow. And it's almost 60,000 rows. So it's not anywhere near the whole data set. I must have applied some inclusion and exclusion criteria to get it down there. I forgot, I recorded those so long ago. But in any case, it's still pretty big. So it gives you an idea. Okay. So for our use case today of making a table from nowhere, from air, from scratch, you're probably wondering, well, you just read in a table. But you'll see. Basically, this is the table where we're going to get our summary statistics from. Okay. So let's go. So imagine we wanted to build Table 1 for a study. So remember, when you publish in the peer-reviewed literature, you know, you apply inclusion and exclusion criteria, and you're like, okay, now here is our sample. And you have this Table 1 that says, you know, male and female and education groups and whatever, whatever. What I like to do when I make a Table 1 is I like to take the outcome. Like, I wrote a paper where the outcome was whether they were up to date on their dental visits or not up to date on their dental visits. I'll put those as columns, like all, up to date, not up to date. And then I'll make this bivariate table and put all the exposures and the confounders and their different levels on the y-axis, right? And I'm picky. I like an all, all. Because when I review it, I like to really understand all of these frequencies. Okay.
So we're going to pretend we're going to do that, only I made up a study here. So our outcome, or our dependent variable, and I guess I didn't think about this very logically, is smoking, smoke group, right? I invented smoke group here. And I invented it from their real variables. But I can't remember what... oh, here it is. I put it in three levels. Okay, current smoker is one, and not a current smoker is two. And then nine is unknown, because of course there's unknown. How do you not know if you smoke? Okay, I don't know. Somehow they said unknown. Okay. So I'm going to have, like, all, current smoker, non-smoker, and unknown. So there's four columns for each of these variables in this summary table, right? Which is a frequency table. Okay. So what's going to be along my up-and-down y-axis here? Well, I decided to just demonstrate two scenarios because, you know, you don't want to be here forever. One is this education group, right? So you have education, and it has like five levels: less than high school, high school, some college, and then graduate. Oh, and unknown, that's it. And then the next one I picked out was Hispanic, because that's more like a flag. You're going to put the Hispanics in the table, like how many, but you're not going to put how many are not Hispanic, because you can just guess. And you do this with comorbidities too, like hypertension, yes, no, you know, COPD, yes, no. You don't have to put the no. It just takes up real estate in the table. Okay. So I'm giving you two examples. This is like the babiest table in the world. All right. So let's go. So here you see these are comments. There's the little number sign before, and that's all you have to do. You don't have to do anything fancy like in SAS, where you have to do the slash and whatever. Okay, here is how you can do a frequency table to the screen.
Remember how I taught you this syntax? Smoke group is the variable, BRFSS is the table, and here's the dollar sign. So this is just going to run a one-way frequency. So let's highlight this and run it. Now remember, I told you that there's going to be smokers and non-smokers and people who don't know if they smoke, which I can't even imagine. All right, here. Okay. So that's a one-way frequency. So this is what's going to be in our columns. So pretend we're writing a paper and we have that. And maybe we did this in SAS. Maybe we're even doing regression in SAS or whatever, but we put it in R for this purpose. Okay. And then let's just look at education group here. And so remember how I said, and I'm kind of cheating because I'm looking down here, it's less than high school, high school, like just graduated high school, some college, college graduate, and then they don't know what their education is, which is probably low because they don't know their education. I can't figure it out. And then the Hispanic one was the other one that I'm going to put in. And so all three of these variables are not native variables. Okay. If you go look in the native BRFSS, you won't find these three variables. But if you take my LinkedIn Learning courses, you'll find them, because I demonstrate how I used the native variables and documented them and decided how to reclassify them to reduce the cardinality, that is, to reduce the number of levels, because there's always a zillion levels in these surveillance data sets. So that's how I collapsed the education group. And, you know, we can argue all night about what to do with unknowns. I just kept them in a group here. For the Hispanic flag, I really made a flag. So the unknowns, if you don't know if you're Hispanic, I think it better be no, you know, so they're kind of subsumed into the zero here.
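The one-way frequencies described above can be sketched with base R's `table()`. The data here are ten invented rows, and the column names `smokegroup`, `edgroup`, and `hispanic`, along with the level codes, are assumptions standing in for the transformed variables from the talk.

```r
# Toy stand-in for the BRFSS extract (invented values; assumed column names).
brfss <- data.frame(
  smokegroup = c(1, 2, 2, 9, 1, 2, 2, 1, 2, 9),  # 1 = current, 2 = not current, 9 = unknown
  edgroup    = c(1, 2, 3, 4, 9, 1, 2, 3, 4, 2),  # assumed codes, 9 = unknown
  hispanic   = c(0, 1, 0, 0, 1, 0, 0, 1, 0, 0)   # flag: 1 = yes, 0 = no or unknown
)

# One-way frequencies, printed to the screen.
smoke_counts <- table(brfss$smokegroup)
ed_counts    <- table(brfss$edgroup)
hisp_counts  <- table(brfss$hispanic)

smoke_counts
ed_counts
hisp_counts
```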
So I'm just showing you the data in the original data set that we're going to try and count up. Okay, now we're going to move on to actually making this table from scratch, right? So there's a few approaches to do this. Basically, what we're doing is making pieces of a data frame, then calling it a data frame and putting it together. Okay. And there's different ways of doing that. For example, if you make one of these frequency tables into an object, it's a table object, sort of like a matrix. Well, if you then convert that to a data frame, you've made a table out of thin air, right? So that's just one of the five zillion examples of how you can make a data frame out of thin air in R. But I'm going to show you my favorite way of doing it when I have a Table 1 scenario like this. So my favorite way, and you can do this both ways, but I'll show you my way, is where you create vectors. And each vector is a column. So remember, we're going to have a column for the category, right? Like what education group we're talking about. And then we're going to have a column for all, and then we're going to have a column for current, and then non-smoking, and then unknown, right? So we're going to make all these columns. You can make those as vectors. That's the first step. And then you run a data.frame command on them. And that sews them together into a table, right? It fuses it together. Okay, you can also do it the other way. I think you can make rows and then fuse them together into a table, but we're going to make the columns and fuse them together. All right. So first we have to make the vectors, right? Now this part is a little hard to envision because each vector is going to be horizontal when we look at it, but we have to imagine that it's going to be a column. So let's go to the first vector.
Now this first vector here, this is how you make a vector. You put a c here, and then you put parentheses. And if you're doing a character vector, you have to use these quotes. And if you're just doing a number vector, you don't have to use the quotes. And then you put commas between the members of the vector. So I have all. And remember, this is our description column of what's in each row. So this says all, and then it says ed LTHS, which is education less than high school. It's just for me to remember it. This is education high school, education some college, education grad, education unknown. And then remember, Hispanic yes, because we're not doing Hispanic no. Okay. So let me just run this. And I called it cat_lev, short for category level, because this is sort of a placeholder, you know, just to remember where I am. So I'm going to run this. And so I ran it here. And now when I run cat_lev, you can see this is the vector. So there's all and blah, blah, blah. Okay, see the six? This means it's the sixth member. And then this is seven. So there's seven of these. Okay. Now I have to make the columns. And, you know, I want to fill them in with numbers, right? Like I want to fill in how many, for example, current smokers are Hispanic, right? I have to fill all that in. But I don't have those numbers yet. So what I'm going to do is just fill it with zeros. And the reason why I'm going to fill it with zeros is that way I know that it'll accept numbers in it. Zero is a nice placeholder, because I can update it with an actual number. So I'm like, okay, that's what I'm going to do. I'm going to make my all column, my current smoker column, my non-smoker column, and my unknown smoking column. I'm just going to throw zeros in there for now. Okay. So I'm going to use this rep command, which is repeat, I guess, or replicate.
And how the rep command works is you say what you want repeated, and then how many times you want it repeated. So you could do like rep... I see that. All right. So remember when I was looking at how long this thing was, how it was seven long? Well, length of cat_lev is going to tell you how long cat_lev is, which is seven. I mean, we cheated, we already knew that. But you can see what I'm doing is I'm shoving, basically, however long this is, a string of that many zeros into this all. Okay. And why is it important that this be the right length? It's because when you go to fuse them together in the next step, if they're not the right length, they won't fuse together. So it's just easier to use this. And what will happen in practice is I'll be fussing around with what I really want to put in that table. Like, I might decide I want to put hypertension in there and then change my mind. So that'll mean I want to change this vector here. And this is dependent on how long it is, so it just adjusts, right? So let's run the first one, the all, and just see what's in there. Okay. So we ran all, and let's see what's in all. You probably already know what's in there: seven zeros, right? And if we run these other ones, they're all going to look the same, right? They're all going to have seven zeros. Okay. So now comes the exciting part. We're going to run the data.frame command on cat_lev, all, cursmoke, nonsmoke, and unksmoke, and sew it all together into an object called tbl. I just called it tbl, which stands for table, right? So let's do that and run it and see what we got. Okay. So you probably already imagined this is what we get. So now we've made a table out of thin air, but it's really not a very good table because it doesn't have any information in it. But at least it's there. Okay. So I want to point out something about this table. As you can see, these numbers here say what row it is.
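The skeleton-building step described above (a label vector, zero-filled count vectors sized with `length()`, then `data.frame()`) might look roughly like this. The object and column names (`cat_lev`, `all`, `cursmoke`, `nonsmoke`, `unksmoke`, `tbl`) and the label text are reconstructed from how they sound in the talk, so treat the exact spellings as assumptions.

```r
# Row labels: one "All" row, five education levels, and the Hispanic flag row.
cat_lev <- c("All", "Ed LTHS", "Ed HS", "Ed Some College",
             "Ed Grad", "Ed Unknown", "Hispanic Yes")

# Zero placeholders, sized from the label vector so they always match its length.
all      <- rep(0, length(cat_lev))
cursmoke <- rep(0, length(cat_lev))
nonsmoke <- rep(0, length(cat_lev))
unksmoke <- rep(0, length(cat_lev))

# Sew the column vectors together into one data frame.
tbl <- data.frame(cat_lev, all, cursmoke, nonsmoke, unksmoke,
                  stringsAsFactors = FALSE)
tbl
```

Because every count vector is built from `length(cat_lev)`, adding or dropping a row label (hypertension, say) only requires editing `cat_lev` and rerunning.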
I know that's probably obvious, but I just want to point it out. And these columns, you can kind of count them: this is column one, two, three, four, five. So let's say that I was in SAS and I wanted to update this column, right? And actually this cell, where cat_lev is Hispanic yes and it's the nonsmoke column. I'd have to come up with criteria or something. Whereas in R, I can actually refer to this cell using a numeric reference. Okay. So this is where it gets Excel-like. So I could refer to this column as tbl, remember that's the name of this, dollar sign, nonsmoke, right? I could just refer to the whole column that way. Or I could refer to this specific cell as row seven and then column one, two, three, four. Okay. And I'll show you the syntax for that. And so if you're like, okay, well, if you can refer to this by number by saying row seven, column four, can you just call this column four? Or can you just call this row seven? I'm like, yeah, you can do that. So we're going to use some of those fancy tricks to fill in this table with the right numbers that we get out of querying the other table. So let's get started. So now that we have our table, our goal is to replace these numbers with the right answers. So I usually start on the upper left. So I'm going to need the total. Notice here, all, all, that's just everything. Now remember nrow. I can run nrow, number of rows, to the screen, which is everything that's there, like I said. But I can also save it as a value, which I'm calling total underscore n. I just named it that. So I'm going to do that, right? And then let's just run it to see it. Okay. Yeah, there. Now, in this case, I'm going to show you this in two steps. So I saved that value in total_n. Now I want total_n to be shoved into here. So remember how I just taught you about the numeric referencing? This is how you do it.
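A small sketch of that numeric cell reference, under the same assumptions as before: the ten-row `brfss` data frame is invented, and the names `tbl`, `cat_lev`, `total_n`, and the smoking column names are reconstructed spellings rather than confirmed ones.

```r
# Toy stand-in for the source data (invented values).
brfss <- data.frame(smokegroup = c(1, 2, 2, 9, 1, 2, 2, 1, 2, 9))

# Zero-filled skeleton table, as built earlier in the talk.
cat_lev <- c("All", "Ed LTHS", "Ed HS", "Ed Some College",
             "Ed Grad", "Ed Unknown", "Hispanic Yes")
zeros <- rep(0, length(cat_lev))
tbl <- data.frame(cat_lev, all = zeros, cursmoke = zeros,
                  nonsmoke = zeros, unksmoke = zeros)

# Step 1: save the total number of rows as a reusable value.
total_n <- nrow(brfss)

# Step 2: put it in the upper-left count cell: row 1, column 2.
tbl[1, 2] <- total_n
tbl
```

Keeping `total_n` as its own object is the point of doing this in two steps: it stays available if proportions are wanted later.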
So let's look at what the cell is. This cell is row one and column two in tbl, right? So I do tbl and then this bracket. Okay. The bracket is really important. Bracket, and then row one, because it's row first in R, and then column two. And that's going to shove this total_n into that. Okay. So watch me do that. And then when I run tbl, look at that, it updated. Very good. Now, I could have just made nrow be shoved into that. But I wanted to show you this because, like, what if you wanted to do proportions or something later? It might be nice to have this total_n lying around, you know? But anyway, I just wanted to show you that. Okay. Now the next step is we're going to want to fill in the top row. Since it's the top row, it's just everybody who's a current smoker, everybody who's a non-smoker, and everybody who's an unknown smoker. Those numbers need to go up here. So remember, we kind of did that before, right? When I was up there, we ran this table command, and here it is, it's nested in here. I'll just run it again to remind you what it looks like. So this is our table command. So we're dreaming that this was really here and this was really here and this was really here. So how are we going to do that? And just to let you know, if I take just this table command here and I make it into an object, and I'm just calling the object example, it looks nice, right? But if I run class on example, it says table. You're probably like, well, I don't care if it's a table. I'm like, I care. It's supposed to be a data frame. So I don't like tables, basically, in R. I like data frames. So what am I going to do? Well, I force it to be a data frame. So when I wrap it in this as dot data dot frame, I'm coercing it. You do a lot of coercion in R. The IRB would not approve of R. So I'm coercing it. And this is what I get. So you're probably like, whoa, that reminds me a little of the ODS.
I'm like, yeah, it's a little ODS-like, but it's not annoyingly ODS-like. So here's what happens. It always is going to name this Var1. And it's always going to name this Freq. And later when I do the two-way one, it's going to name them Var1, Var2, and then Freq. So it's always going to be flattened like that. And so this is what it looks like if you wrap it in as.data.frame. So this is before it's wrapped and after it's wrapped. Now I'm going to save it as a data frame called smoke_freqs, as you can see here. So we'll run this, and smoke_freqs. And now, if we do class, whoops, I'm going to just do it over here, class to see what smoke_freqs is. It's a data frame, which I like. I don't like anything else. I'm that picky, right? Okay. So now let's just go back. I'm going to show you tbl again. Okay, just to remind you, we're so happy that this number is here and this number is here and this number is here in the data set, but we wish that from smoke_freqs, we could take this number and shove it in here, take this number and shove it in here, and take this number and shove it in here, right? So what you might have noticed is that's the entire second column. That's the entire Freq column. So I could say this column needs to go in here. Remember SAS arrays, like from this position to this position? That's really what we're doing. So the next thing is smoke_freqs. Now remember our syntax: it's row, comma, column. Only this time our row is going to be one through three, and then our column is two. But actually, I'm just realizing I probably didn't even need to say one through three. You can just get rid of it, and because nothing is there, it's going to assume all rows. Let me show you. It should do it, right?
See, it just made this list here. But, you know, I'm being specific here, and you'll see later with my two-way frequencies, you've got to kind of be specific. But that's the same thing. It's going to give me, you know, one, two, three. Okay, so I figured out how to pick those out of this. And now how am I going to shove them where I want? I want to shove them here, into tbl, row one, comma, and then columns three, colon, five. Three to five, that's what the colon is. See that? So now let's do that. Okay, and then we'll look. Look at that. Okay, so we're pretty happy we did the top row, but now things get difficult because we actually have this whole matrix to fill in, the education by smoking. Well, actually we have two things. We have just the one-way frequency of education, right? And then we need the two-way frequencies here. Oh, everybody please say hi to Ebenezer in the chat. He's my assistant, and if you want him to be your assistant, I'm going to post this video and I'll put a link where you can contact him. So, all right, that's what I'm going to be doing: making a one-way frequency to fill in this part, then making a two-way frequency to fill in this part. So you're going to start recognizing what I'm doing. So here I said fill in all for edgroup. Well, guess what? This is going to go a little bit like what we just did. I already ran the table for edgroup, right? Remember that? Now we're going to wrap it in a data frame and call it ed_freqs. So here we go. Now, here's our ed_freqs, right? So less than high school, high school, you know, down to somebody who doesn't know how much education they had. All right. So going back to our table, where do we want these freqs? Well, we can kind of see it. It's almost like I wish I could just clip this, you know, use scissors, clip this, and then just tape it right here, right? So we're going to do that electronically. Okay.
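The coercion and top-row fill just described can be sketched as below. As in the other examples, the data are invented, and `smoke_freqs`, `tbl`, and the column names are assumed spellings of what is said aloud; `Var1` and `Freq` are the names base R itself assigns when a one-way table is coerced.

```r
# Toy stand-in for the source data (invented values).
brfss <- data.frame(smokegroup = c(1, 2, 2, 9, 1, 2, 2, 1, 2, 9))

# Zero-filled skeleton table.
cat_lev <- c("All", "Ed LTHS", "Ed HS", "Ed Some College",
             "Ed Grad", "Ed Unknown", "Hispanic Yes")
zeros <- rep(0, length(cat_lev))
tbl <- data.frame(cat_lev, all = zeros, cursmoke = zeros,
                  nonsmoke = zeros, unksmoke = zeros)

# A saved frequency table has class "table", not "data.frame".
example <- table(brfss$smokegroup)
class(example)

# Coerce it: the result is a flat data frame with columns Var1 and Freq.
smoke_freqs <- as.data.frame(example)
class(smoke_freqs)
smoke_freqs

# Leaving the row index blank means "all rows"; both forms give the same values.
same_values <- identical(smoke_freqs[, 2], smoke_freqs[1:3, 2])

# Copy the whole Freq column into row 1, columns 3 through 5, of the summary table.
tbl[1, 3:5] <- smoke_freqs[, 2]
tbl
```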
So ed_freqs, here, see, I caught on here. I've got this entire second column here. So we're going to take this entire second column. But here we have to care. We can't just say the entire second column over here, because all is already filled up. So I have to care about this. This is rows two through six here. And I'm filling in column two. You see that? So let's go see how that works. Okay, let's go check our table. So this looks good. Looks pretty good. Okay, now we're going to tackle how we fill in this big sort of matrix. Okay. So when we come down here, we've got table, and now it's a two-way frequency. And actually, let me just run this two-way frequency to the screen. So remember, we're first declaring education group and then smoke group, right? So, okay, this is the smoke group, and this is realistic because that's what we have here. And this is the education groups. Okay. So again, we wish we could just sort of cut this out and put it over here. But I'm going to show you how I do it. You see, it's kind of hard to actually manipulate a table. So I turn this into a data frame and call it ed_smoke_freqs. And now you're going to see what happens when it's a two-way frequency. So remember how the first one was edgroup? So this is less than high school, high school, or whatever. And this is current smoker, right? So you can totally tell exactly what value you're talking about, right? So, you know, the people I make fun of who don't know anything, they can't tell if they're a smoker and they can't tell how much education they have. They're over here, right? I can easily identify them because of the shape of this. And that way I can make sure I don't screw it up. Okay. So as we can immediately see from here, this current smoker information, that's from here, this third column, from rows one to five here. And then this next set here is going to go here.
So you can see what I'm doing up here. So TBL rows two through six, this is always two through six. That's what we're filling in. Current smokers go in column three, non-smokers in four, and unknown smokers in five. And you can see me picking that out of the table. Like these first five are here. And those of you who are good at building functions in R, you're probably going, why isn't she building a function? And the answer is, I don't know math. Like, I'm bad at math. So I guess that's my excuse for everything, for not automating things. I mean, it would be a really cool function to automate, but it would take a lot of parameters. Like, do you want to do this as a flag or as a two-way thing? You know what I mean? But if you can build this into a function, more power to you. I think it's pretty cool. So we'll do that. And then we'll check it. And this is a very error-prone way of doing things, I admit, but it's pretty easy to check it to see if you screwed up. Like if I put something in the wrong place, like if I put something in Hispanic, you know, you can easily see it. So it's kind of like, unless you know what you're doing, you don't want to automate this. This is more for making a research paper, where you're just doing it once. And maybe you'll go back and fix it for the reviewers or something. All right. So this is how, from these three commands, we got this out of here and we put it in here. All right. So now remember the Hispanic flag, where I just wanted to report yes. So here, you know, with Hispanic, you start with the one-way frequency. Well, look at these Hispanic freqs here. So remember, it was a flag, right? So it's zero and one. Well, you know how if you do yes, no, unknown, it's usually one, two, three. So you've got to pay attention that yes is down here in the flag situation. So the value that we want for yes is actually in row two, and then column two, right here. And so that's over here.
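A sketch of the three fill commands described above, again with made-up data and hypothetical names (EDGROUP, SMOKGRP, tbl, ed_smoke_freqs). It also shows one possible quick check, which is an addition of this sketch rather than something shown in the talk: compare the filled block against the original cross-tab.

```r
# Toy data (hypothetical names and codes).
brfss <- data.frame(
  EDGROUP = c(1, 1, 2, 2, 2, 3, 3, 4, 4, 9),
  SMOKGRP = c(1, 2, 2, 2, 9, 1, 2, 2, 2, 9)
)

tbl <- data.frame(
  Category = c("All", "Less than HS", "HS", "Some college",
               "College grad", "Unknown ed", "Hispanic"),
  All = 0, Smoker = 0, NonSmoker = 0, UnknownSmoker = 0
)

ed_smoke <- table(brfss$EDGROUP, brfss$SMOKGRP)
ed_smoke_freqs <- as.data.frame(ed_smoke)

# Column 3 of the long data frame is Freq. Each run of five rows is one
# smoking level across the five education levels.
tbl[2:6, 3] <- ed_smoke_freqs[1:5, 3]     # current smokers
tbl[2:6, 4] <- ed_smoke_freqs[6:10, 3]    # non-smokers
tbl[2:6, 5] <- ed_smoke_freqs[11:15, 3]   # unknown smoking status

# Quick check: the filled block should equal the cross-tab, cell for cell.
stopifnot(all(unlist(tbl[2:6, 3:5]) == as.vector(ed_smoke)))

print(tbl[2:6, ])
```

The row ranges 1:5, 6:10, and 11:15 depend on there being exactly five education levels and three smoking levels, so they would need adjusting for other variables.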
Right. And so where are we going to put that? We're going to put it in seven, two. So here goes. I'm lazy, I keep rerunning this. And there we have it. And then once more, we have to do this frequency here, the two-way. So we've got Hispanic and smoke group again. And then I can look at it, and remember, we're using as.data.frame. That's the data frame. So what do we really care about? Well, we care about when you actually are Hispanic, right? And so when are you Hispanic? You're Hispanic where one is in Var1. So here, the second row, the fourth row, and the sixth row have the info we need. Let's see here. Yeah, here. So two, four, and six. Those are the rows. We're getting this value out. And of course, we're putting them in here. And you can see row seven here. So we'll do that. And so it's like we're cherry-picking just that value out and putting it in here. Let's go look and make sure it came out okay. Lovely. And so if you're worried that you typed something wrong, well, I'll show you how you can kind of quick-check it. So I do this a lot. I make tables like this a lot. Sometimes I'll have multiple bivariate tables in the same paper. And what I usually do is, when I'm making it, I call it TBL. But then when I'm done, I save it as something else, like clinical table or demo table, because I don't want to accidentally overwrite it or get confused in memory. So I changed this to demo table, for demographic table. And then here's the thing that I just love. I probably should have been teasing this from the beginning: you can just export it. It's so easy. So this is the alternative to the ODS in SAS, what I'm going to show you here, which is me outputting this. And I'm going to go over here to where I've got my... I'm going to open the CSV I just exported to show you what it says. Okay. So this is in CSV format.
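A sketch of the Hispanic flag steps, with hypothetical data and names (HISPANIC, SMOKGRP, hisp_freqs, hisp_smoke_freqs, demo_table). It assumes the flag is coded 0/1, as described, so the "yes" count sits in the second row of the one-way data frame and in rows two, four, and six of the two-way data frame.

```r
# Toy data (hypothetical names and codes); HISPANIC is a 0/1 flag.
brfss <- data.frame(
  HISPANIC = c(0, 0, 0, 1, 1, 0, 0, 1, 0, 1),
  SMOKGRP  = c(1, 2, 2, 1, 2, 2, 9, 9, 2, 2)
)

tbl <- data.frame(
  Category = c("All", "Less than HS", "HS", "Some college",
               "College grad", "Unknown ed", "Hispanic"),
  All = 0, Smoker = 0, NonSmoker = 0, UnknownSmoker = 0
)

# One-way: row 1 is level 0 (no), row 2 is level 1 (yes); column 2 is Freq.
hisp_freqs <- as.data.frame(table(brfss$HISPANIC))
tbl[7, 2] <- hisp_freqs[2, 2]

# Two-way, long form: Var1 alternates 0, 1, 0, 1, 0, 1 down the rows,
# so rows 2, 4, and 6 are the Var1 == 1 (Hispanic) rows, one per smoking level.
hisp_smoke_freqs <- as.data.frame(table(brfss$HISPANIC, brfss$SMOKGRP))
tbl[7, 3:5] <- hisp_smoke_freqs[c(2, 4, 6), 3]

# Save the finished table under a more specific name before moving on.
demo_table <- tbl
print(demo_table[7, ])
```

Selecting with `hisp_smoke_freqs$Var1 == 1` instead of the hard-coded row numbers would give the same rows here and is a little harder to get wrong.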
And you'd maybe want to save it as an XLSX, but I'm too lazy. I'm just going to show it. Put the borders on, and then... So you can see here, here are all our numbers. And if you're a SAS user, you're probably wondering, well, what's up with this? Well, these are the row names. Now remember in SAS, OBS? Remember OBS? Number of the observation? That's row names. Okay. So this is like the OBS. And as you can see, these are the numbers. And you're probably like, well, how would you turn that into table one? Probably I'll do another event on how I use Excel to sort of display data better. What I'll do is I'll do the proportions, for example, in Excel, just when I'm preparing this for a peer-reviewed article. But the reality is you actually can do proportions in R. Like if you make a table, let's see here, prop.table, table, and then Hispanic. See these proportions? Remember, most people are not Hispanic, like 96% are not Hispanic, whatever. So you can do proportions, you can do all kinds of fancy stuff in R. I just didn't. I'm just being very pragmatic in getting these numbers out so I can finish them in Excel. You'll see I'm just extremely pragmatic about what I use. But let me actually go back to my slides, unless anybody has any questions, because I want to sort of summarize what I just did. And that is, what did I just do? Well, I did something to make a summary table, for table one in a peer-reviewed article, that I normally would do in SAS, okay? So it's kind of like, imagine I was doing a big project in SAS, but I needed to make this table. Maybe I can just make it in R. In fact, maybe I can just export the variables I need from my SAS dataset into R to do just this. So what did I do? Well, I took in my BRFSS dataset, but I just set it aside for a moment, okay? I did some tables, you know, I looked at the frequencies and I made some plans, but I just set it aside.
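A small sketch of the export and the proportions mentioned above. The object names, the file name, and the toy values are assumptions. It writes to a temporary folder so nothing is overwritten. The first export keeps R's default row names, which is where the extra OBS-like column in the CSV comes from. The second shows the base R option that leaves it out.

```r
# Tiny stand-in for the finished summary table (hypothetical values).
demo_table <- data.frame(
  Category = c("All", "Hispanic"),
  All = c(10, 4)
)

out_file <- file.path(tempdir(), "demo_table.csv")

# Default export: row names are written as an unnamed first column.
write.csv(demo_table, out_file)
header_with_rownames <- readLines(out_file)[1]

# Same export without the row-name column.
write.csv(demo_table, out_file, row.names = FALSE)
header_without_rownames <- readLines(out_file)[1]

print(header_with_rownames)
print(header_without_rownames)

# Proportions instead of counts: wrap the frequency table in prop.table().
hispanic <- c(0, 0, 0, 1, 1, 0, 0, 1, 0, 1)   # toy 0/1 flag
hisp_props <- prop.table(table(hispanic))
print(hisp_props)
```

Writing an XLSX file directly would need an add-on package, so this sketch stays with CSV, as the demo does.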
Instead, I used R to make a blank summary table. I made some vectors, so I designed everything beforehand. I made these vectors, and then I sort of fused the vectors together into a data frame called TBL. And that was the shape of my table one, and it had just zeros in the middle. So I had placeholders for numeric values that I could fill in from querying the big dataset I read in. Then what I did was I used the table command to create little one-way and two-way frequency tables, but I made them be in data frame format, so I could, you know, easily extract the values that I wanted. You know, it's basically like I just threw it into a table. I hope there's nobody who's a SQL programmer on here. I just kind of threw it into a table, picked out what I wanted from that table, and threw it away. That's literally what just happened a bunch of times. And so for the summary table I was making in R, I would make these one-way and two-way frequency tables in data frame format just so I could get the values out, then copy them into this main table, and then just throw them away. Like, it's totally inefficient. You know, SAS would be like, oh my God, I'm having an I/O heart attack, but, you know, R doesn't care about that stuff. You know what I mean? So one of the things that I sort of highlighted in this particular pipeline for making this table is referring to columns in an R data frame not just with the word syntax, like table and then dollar sign and then variable name. I also highlighted how easy it is to use the numerical references, not just to refer to a whole column or a whole row. You can do that too. Like row seven, you could just say seven comma and that's the whole row. But you can also refer to specific cells, like you can in Excel. And so using all of those different tools in R, we filled in the columns and rows with the frequencies, and then I exported it.
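To make the recap concrete, here is a sketch of the blank-table design and the ways of referring to parts of it. The labels and column names are hypothetical, and the talk's TBL is written tbl here. Only the approach (vectors fused into a zero-filled data frame, then addressed by name or by number) is taken from the talk.

```r
# Design the layout first: one vector per column.
Category      <- c("All", "Less than HS", "HS", "Some college",
                   "College grad", "Unknown ed", "Hispanic")
All           <- rep(0, 7)   # zeros are placeholders to be overwritten later
Smoker        <- rep(0, 7)
NonSmoker     <- rep(0, 7)
UnknownSmoker <- rep(0, 7)

# Fuse the vectors into the blank summary table.
tbl <- data.frame(Category, All, Smoker, NonSmoker, UnknownSmoker)

# Ways to refer to parts of the table:
col_by_name <- tbl$All        # whole column, by name
col_by_pos  <- tbl[, 2]       # the same column, by position
row_seven   <- tbl[7, ]       # whole row 7 (note the trailing comma)
one_cell    <- tbl[7, 2]      # a single cell: row 7, column 2
cell_run    <- tbl[1, 3:5]    # row 1, columns 3 through 5

print(tbl)
```

Each of these references can also sit on the left of an assignment, which is how the frequency counts get copied in.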
And my idea was to do post-processing in Excel, but if you wanted to do more, you could do proportions in that table. You could add anything you wanted to the table, because it's like you're sort of sculpting it yourself. And so you can shove anything you want in it, basically. You can find a way, and if it violates the rules of a data frame, you can just turn it into a table or something else. So R is so much less restrictive than SAS when it comes to that. Alrighty. Well, that was all I had prepared to present to you, and I don't see any questions in the chat. Thank you very much for coming. I just want to remind you, if you showed up after the beginning, when I made my pitch, I want to sell you on coming to a free online workshop. Doesn't that sound like fun? If you do have time in your schedule, if you have two to three hours in your schedule on Monday, September 25th, Wednesday, September 27th, and Friday, September 29th, I would love it if you could come to my free workshop called Application Basics. What I teach about is application architecture, basically for SAS users or data analysts. You know, we're data analysts. Why do we care about applications? Well, because we're analyzing data from the applications, right? So it matters when people get data from Epic, or they go get data from this health app or whatever, and they give it to us to analyze. Like, what are we supposed to do? And so that's really what this course is about. It teaches you how to look at applications, how to figure out how to get data out of them, and how to figure out how to hook them all up together to do what you want, like, let's say you're making a dashboard. And that's why I really want you to sign up for this if you're interested, because it's a workshop, even though it's based on my course that I made in the course management system, so you'll have all those materials for free. But it's really interactive, because it's basically a management course.
And so it's necessary to really talk about case studies. I have some really cool case studies in there. And that way, you know, we can all do breakout sessions, and there are challenges we can solve together. So I think it'd be a good time. So I really hope, if you're interested and you have the time, that you can sign up. Well, thank you very much for coming to my event. And if you're not connected with me on LinkedIn, you probably should be, because I'm going to be holding these events. I don't know if you'll like all of them, but there probably will be some topics that you like. So please connect with me on LinkedIn and sign up for my workshop if you want. And thanks again for coming today.