Hi everyone, happy Saturday. I decided to do a little live stream today in case you wanted to eat brunch with me, or supper, or breakfast, wherever you are in the world. I'm Monika Wahi, and I'm starting this live stream a little early so everybody can connect and say hi to each other, and say hi to me, and I can say hi to you. Hi, everyone, just connecting now. All right, I see it's working.

In this time before we officially start, I can update you on a research study I participated in for the application I'm using to stream, which is called Restream. Restream is an application that is for people like... oh, hi, we have somebody from Somalia on. Hello from Boston to Somalia, thank you! So, I've been in a study put on by Restream. Just to be clear about what this software does: it helps people like me who are not very good at engineering. It helps if you want to do something simple like what I'm doing, which is streaming to YouTube and LinkedIn from my little home on my laptop and my desktop. But what I learned from participating in the study (it's over now) is that Restream can do a lot of different things. It can do more; let's say you had a dance recital and you had a big... oh, hello everybody, let's see here. Hi from India, yay! Oh, hi Matthew, how are you? I haven't seen you in a long time; I hope you're doing well. Thank you for showing up to my live stream. I was just talking a little about this before I started; I'm going to officially start in a couple of minutes. I was talking about how I was participating in the study. Oh, is that... hi! I'm glad you were able to show up. I didn't even advertise this very much, so I'm so glad you're here.

So I found out the software I'm using can do all kinds of stuff. Not that I'm trying to do all kinds of stuff, but here's another thing. You know how I've been having trouble with the chat? Like right now,
I have this counter on here that says nobody's watching. Well, obviously you're here, because you're chatting, so that counter's wrong. I've also had live streams where I think nobody's here and nobody's chatting, and after I get off the live stream, people email me and say, "Monika, I was chatting!" So this was coming up among the participants too, because it was a qualitative study, so we were discussing things. Hopefully what we'll get is an improved Restream. It's really taking off; a lot of people are using it now, and it's improving fast.

Another thing I got out of discussing this with other people is that streaming is actually a lot of work. Maybe I didn't realize that, because I like to watch people who are having fun streaming, like walking around Las Vegas, or going to a resort, or fishing, or whatever. Even those people, who look like they're having fun, have a lot of work to do to set up their streams and everything. I'm starting to appreciate that more now, so I give them more likes and thanks, because I'm becoming a little more empathetic. My live streams are a lot easier than theirs, and still I'm like, wow, it's kind of like holding an event, like a lecture or something.

All right, now it's one o'clock Eastern time, time to start the official stream. Thanks to everybody who showed up. I'm Monika, and I'm doing the stream today. I'll give you a little backstory on what I've been trying to do. My goal for this year is to increase my YouTube subscribers. So if you're watching this live, or you're watching the recording, and you like it, that would be awesome. In fact, what I'd like you to do is check out my other stuff; I have a lot of diverse stuff, and if you like it, I'd like you to subscribe, because I'm going to try to do more live streams so I can talk to you in real time. Of course, they're
spontaneous. I also make videos that are not so spontaneous, where I can plan things and do a good demonstration, so it's nice; I have a whole bunch of different things. So that's what I was trying to do. I hardly had any subscribers, around 3,000, and I have over 100 videos, lectures and all kinds of stuff, so I was like, I've got to improve this. If you follow my blog, you'll know that. Also, I'm a LinkedIn Learning author, so of course I turned to LinkedIn Learning; when I don't know how to do something, that's the first place I turn. I took a course and learned some tricks about expanding your YouTube audience, and one of them was just live streaming like this.

So today I'm going to demonstrate something. Oh, one of the things I also learned from that course, which is sort of obvious (but you know how sometimes things are obvious and you still need to be taught them? I needed to be taught this), is that the description of a YouTube video is really important. You're probably consumers of YouTube, so you probably already know this, but if you happen to make any videos... For example, I have a lot of learners who are doing portfolio projects in data science, and I tell them to make a video to showcase how they can communicate about their data and their project. If you do make a video and put it on YouTube, it's really important to make the description nice. What do I mean by nice? First of all, make sure it actually describes the video, and put words in it. If you have a transcript or anything, so you know what words you said, put a lot of those words in. There are 5,000 characters available in a YouTube description, so put the words you say in there, because one of the things that was said in the course is that otherwise there really isn't much text for SEO to grab on to. So you want to put words in the description. Another thing
that I think is nice, and that's easy to do now, is to put in timestamp links. When you have long videos (and you'll see I'm definitely going to do that with this one after it's recorded), you go back, listen to it, and put in timestamp links so people can jump around in the video. I think it makes it easier to learn, especially if you're studying, or if I'm demonstrating stuff and you're trying to do it yourself and want to go back and look at it. Those are really nice.

So, if you're going to do that to your videos, what do you have to do? You have to watch them again. I was watching a playlist I have called Essentials of Statistics, which has really good videos. I'd forgotten how good they were; I was enjoying watching them. I'd sort of forgotten making them (I made them in 2015), but I think they're really good. They're good for people learning statistics in an applied way. They're not theoretical; they're 100 percent applied, and I think that's why a lot of people like them, because most statisticians teach theory. I get the theory, but it doesn't fascinate me; I'm not into math. What I think is so cool about statistics is when you can apply it, when you can do stuff with it, like actually predict something.

I created the videos for the undergraduate nursing class I used to teach, and these nurses loved it. It was all by hand. But I found as time went on that data scientists loved it too, and I'm like, you're already doing artificial intelligence; why do you care about how to calculate a standard deviation by hand? And I realize now, when I'm watching it, that it's a good review for a data scientist, because I talk a lot about how the statistical software we weren't using in the class would do this, and how we were going to do it instead. I would always give them something like five x-y pairs to make a
correlation out of, because who's going to sit there and calculate that forever?

Anyway, this is my long way of introducing today's topic. As I was going through those videos, I realized how much I use the random number functions in Excel for basic research administration. Not for fitting regressions or anything like that, or imputing data; just for basic management of research. When I thought of that, I thought back over the last few weeks of meeting and working with my customers, and I realized I'm always doing this with them. So I thought, why don't I just hold a little tutorial and show people how to do it? None of my customers have ever known how to do it, and everybody seems to know Excel. So let me show you some tricks.

I'm going to share my screen here. First, I'm going to show you where I'm getting the data I'll use to demonstrate: ahd.com, the American Hospital Directory. The reason it's important for you to know that is that this is a really awesome website. It says here "free hospital profiles." Let's say you go here: it's got all this data. These are the hospital statistics by state; here are all the states, and here are the columns: number of hospitals, staffed beds, total discharges, patient days. I think these are either annual or quarterly figures, because Medicare, which is the federal public insurance mainly for elders (there are some other groups in it), requires each of our hospitals to report data to it, I think quarterly. What ahd.com does is get this data, which I think is free, or you just have to pay for the media they give it to you on, because it's a lot of data. They load it into this interface, and you can pay them, log in, and do a bunch of analysis. But the
reason I haven't done that is that it's more for economic forecasting. I've logged into it before (I think they gave me a free account to show me something), but this data is available publicly just to look at, and I find it really helpful when I'm teaching about healthcare.

Now, these are all the states in the United States. I live in Massachusetts, so when I taught my undergraduate statistics course in nursing, I used this a lot for demonstration. It's kind of nice because, first of all, it's public data. You get the hospital name, and if you're curious, you can click on each hospital and it'll bring you to a record. (I'm not a robot; I'm just demonstrating this here.) Again, this is public information, but this record is about Brockton Hospital; here's the website. I like using this when people are learning about hospitals and healthcare in the U.S.,
to have them read about a hospital and compare the patient revenue and all that.

But the data set I'm going to demonstrate with is this one, so I just want to go over the columns. It has the hospital name, then the city. Massachusetts is not a very big state. There are a few big cities in it, and the rest is kind of suburbs and somewhat rural area, but not very rural. There are still places where you would have to drive a long time to get to a hospital, though, so that's one thing to keep in mind. For example, Worcester is a really big city; if you lived in Worcester, you wouldn't have to go very far to get to a hospital, because there are multiple hospitals there. Gloucester is not a huge city, but it's a pretty big one (and you actually pronounce it "GLOSS-ter"; somebody had to teach me that). But cities like Pittsfield or Greenfield are pretty small, and they're actually kind of lucky to have a hospital. As you can see here, this one has 89 beds, so it's not very big. I just wanted to point out that background. I grew up in Minnesota, and if we were looking at the list for Minnesota, I'd be telling you different things, because the geography is different.

Next is staffed beds. For those of you who don't know, a hospital can have beds that are not staffed because there's not enough staff. That's a problem right now in the U.S.
because of the pandemic. So this column says which beds are in operation, meaning you can support them with your staffing. This next one is how many people were discharged. This is inpatient only; all the hospitals offer outpatient services, but this is just about inpatients. And this is how many inpatient days there were. If everybody stayed just one day, the discharges would equal the patient days, but some people stay longer. So the closer these two numbers are to each other, the shorter the stays, and that tells you what kind of hospital it is: whether people stay a long time or not. Then this is the gross patient revenue, and don't get me started on that. If you want to know about it, watch my lecture on financing and reimbursement in the U.S. healthcare system; that's in my other playlist, not my statistics one.

All right, so I took all these and put them in Excel. What I want you to do is pretend that this isn't public data. Pretend it's private; pretend these are participants in a clinical trial and you've done the beginning stuff with them. What's the beginning stuff? First you had inclusion and exclusion criteria, so you went out and said, well, you have to have a blood pressure between here and here, and you have to be this age, or whatever, and you figured out that they were eligible. Then you sat down with them and went through a consent process, where you showed them the consent form and told them everything they're going to have to do in the clinical trial, and they were like, yeah, yeah, I guess so. Let's say you felt convinced that they understood what you were talking about, and they signed the consent form and said, okay, here I am. Then you say, okay, I'm going to measure your baseline data. So you measure the baseline, and then you've got the spreadsheet of data. And now it's time
for you to do something with it. If you're doing an observational study, you might have gathered a bunch of data from these people. Obviously, for an observational study you don't have to put them in any groups, but you do have to create a study ID for them. Maybe they have a unique ID like a medical record number or a Social Security number, but that's private and you don't want to use it; you don't want to use any of those numbers. Also, when I worked for the Army, I found people who had duplicate Social Security numbers. That's not supposed to happen, but people make a typo, or there are other issues, and duplicates happen when they're not supposed to. That's just life in data science.

So the best thing to do in a study is assign everyone a study ID, and I mean that even if you're not studying people. If you're studying animals, or hospitals, or whatever, assign everybody a study ID. You can do that just by numbering them sequentially as they consent, but if you just have a spreadsheet of data, like we do with these hospitals, how do you assign a study ID?

I'm going to start by sharing my screen, and we're going to go over to my data. So what did I do here? All I did was copy the data and change the headers to column names like you would have in a data set, like "pat days" instead of "patient days." I also added this column; let me show you how I added it. I highlighted this column, right-clicked, and chose Insert. I like to right-click and choose Insert, because if you highlight a column, right-click, and choose Insert, it inserts a column; if you do that with a row, it'll insert a row. Now I'm going to press Ctrl+Z to undo that, because I want it the way it was. So I just put the word "order" at the top, because this is the order they came in, and
this is alphabetical order. Let's say this was a class or something: everybody knows that people whose last names start with A are at the top and the Zs are at the bottom, so everybody already knows the order. But maybe I just want to create the order. So I type 1, 2, 3, and then I highlight those three. Now, in Excel, I want you to notice the cursor. Hopefully you can see this: it's an arrow. When I go down here, see this fat plus? When I hover over the edge, see that plus with the barbs on it? I call them barbs; they're arrows, right? But if I go over to the corner, the barbs disappear: see how it's a plus with no arrows, and it's not fat? That's how you do what I call draggy-draggy. We created this pattern, and now we drag it down, and you can see that 10; it's filling them in. Draggy-draggy. Okay, we get to 72, so we have 72 rows. Let me save this.

By the way, you may notice the headers stay up here when I scroll. If you go to View (where's View? there it is), see this Freeze Panes? If I turn freeze panes off and scroll down to row 42, you can't see the headers anymore. If I go back up and click on the row just below the part I want frozen, I can choose Freeze Panes, and then when I scroll down, the headers stay, which I like.

The other thing I added was this column over here called "metro." This is not real data; I made it up, because I wanted to show you a demonstration with a binary field. By metro I meant what I was saying before: Worcester is metro, and I'm in Boston, which is metro. But some of these places, like Greenfield, are in the middle of nowhere, I think. Maybe you live in Greenfield and don't agree. In any case, I just made up this field and
flagged things the way I wanted, just so I could demonstrate a certain thing. Okay, is everybody following me so far?

So let's say I've got these hospitals. Actually, these hospitals do have IDs, like organizational IDs and stuff, but I just added this order column to them. Now let's say I can't use that ID, because we know it's in alphabetical order; if somebody's ID is 37, people know they're in the middle, or whatever. We have to make a random ID. So now we're thinking observational study, and Monika is going to give them a random ID.

How would I do that? Here are the steps I follow. First, I create a new column here (right-click, Insert) and call it "random." I click into the first cell and type =RAND(), that is, equals, RAND, open parenthesis, close parenthesis, and that gives me a random number between 0 and 1. Now, if I go back up here, hit Enter, and click on it again: remember it was 0.77 before? Let me do it again. See, now it's 0.70. It likes to update a lot. I want this random number in all the rows, so remember the trick I taught you: I click on this cell; see, it's a fat plus, and here on the corner it's skinny. Now I'm just going to drag it down. It copies that RAND() formula down all 72 rows, so there are 72 random numbers in here, and they're going to keep changing every time I do something, like update something else. But that's fine; you just leave it there.

Now I'm going to insert a new column. See how the random numbers updated? I'm going to call this one "study ID." This is where I'm going to assign the IDs, and nobody's really going to know what order they were in originally. So what am I going to do before I do that? Well, I'm going to highlight the whole thing, which you can do
up here. Then I go over to Data and choose Sort, and guess what I'm going to sort by? The random column, right? I don't really care what the dialog says; I'm just going to sort it. Now that I'm done, you'll see the random column is already out of order again, because it changes all the time, but look at our original order column: 4, 20, 10, 16. So this worked. Now I go to the study ID column and type 1, 2, 3 (see how the random numbers keep updating), and then I highlight those and do my draggy-draggy down the column.

Okay, so this is how I assign a random study ID to this group of hospitals, which are pretending to be our patients. Now, if I want to get back the original order, I can highlight everything up here and sort by "order," and I get the original order back.

For example, this might be somebody's Social Security number or medical record number, right? You put it in here, did this random thing, sorted them by the random number, and gave them a study ID. So now Brockton Hospital is going to be known in your study as 39, AdCare is going to be 62, Addison Gilbert is 19, and so on. Then, let's say these were real people, and I said, "Thanks for being in our study. I'm going to email you three anonymous surveys." (They went through consent, right?) "When you go to fill those out, you'll have a link. Just go to the link and enter your study ID as the first question." When you think about it, that's a very secure way of getting research data. Even though they consented, if somebody hacks that data (say you're using SurveyMonkey or something) and finds out that study ID 39 has HIV or has cancer or something, they only know it was study ID 39. They don't know the person's medical record number or anything else. That's why it's a good idea to do this in general.
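The RAND-then-sort routine above can also be scripted. Here's a minimal Python sketch, assuming each row is a small dict; the helper name `assign_random_study_ids` and the four sample rows (hospital names from the example, "order" values made up) are illustrative, not part of the Excel workflow. A seeded `random.Random` stands in for RAND(): unlike the spreadsheet, the draw doesn't keep recalculating, and the same seed gives the same IDs.

```python
import random


def assign_random_study_ids(records, seed=None):
    """Hypothetical helper: give each record a random study ID from 1 to n.

    Mirrors the spreadsheet steps: add a random-number column (like =RAND()),
    sort the rows by it, then number the sorted rows 1, 2, 3, ...
    """
    rng = random.Random(seed)  # same seed -> same IDs, unlike a recalculating RAND()
    # Step 1: pair every row with a random key, like the "random" column.
    keyed = [(rng.random(), record) for record in records]
    # Step 2: sort by the random key only.
    keyed.sort(key=lambda pair: pair[0])
    # Step 3: number the shuffled rows sequentially (the "draggy-draggy" step).
    result = []
    for study_id, (_, record) in enumerate(keyed, start=1):
        row = dict(record)  # copy so the original rows are left untouched
        row["study_id"] = study_id
        result.append(row)
    return result


# Illustrative rows: names from the example, "order" values made up.
hospitals = [
    {"order": 1, "name": "AdCare Hospital"},
    {"order": 2, "name": "Addison Gilbert Hospital"},
    {"order": 3, "name": "Brockton Hospital"},
    {"order": 4, "name": "VA Boston"},
]

with_ids = assign_random_study_ids(hospitals, seed=2023)
for row in with_ids:
    print(row["study_id"], row["name"])

# Sorting by the original "order" column brings back the original order.
restored = sorted(with_ids, key=lambda row: row["order"])
print([row["name"] for row in restored])
# ['AdCare Hospital', 'Addison Gilbert Hospital', 'Brockton Hospital', 'VA Boston']
```

Sorting on a random key is just one way to shuffle; `random.shuffle` on the list would do the same job in one call. The sort version is shown because it matches the spreadsheet steps one to one.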
So let me go back and see if anybody has any questions. No, nobody has any questions. Okay, let me unshare for a second.

So the first thing I did right there was show you how to put a study ID on. I made study IDs that just go 1, 2, 3, 4, 5. Let's say you have different sites, like an A site and a B site, and you want study IDs like A1, A2, A3 and B1, B2. Excel will do that too; it does this incrementing thing, so just work with it. And that's how I get a study ID.

Now, the problem with the study ID is this. I want you to imagine I'm the statistician and we're doing a clinical trial. I haven't gotten to the randomization yet, but let's say we fast-forward: we did a randomization, there are two groups, people have been placed in those groups, and it's double-blind. Here's how I've seen a double-blind study done. They take people with high blood pressure, and they have two drugs: the usual drug a lot of people take, and the new experimental drug. They make the experimental drug so you can't really tell it apart from the usual drug. Then administrative staff package up the drug to give to the participants, in packages that all look the same, but with a study ID on each one. So whatever study ID you get, that's your package. It's kind of like a raffle gift. That means the statistician (me) randomizes the study IDs on a spreadsheet and says, okay, hand these out, and whatever study ID a person gets determines which package they get. That way it can be double-blind. The clinician can say, "I just pulled up the chart where I'm recording all the patients who are enrolled, and the next study ID is 60, so I'm giving that to you, and here's your package of drugs." But nobody knows, not the patient
or participant, and not the clinician, which drug it is, A or B.

I've been behind the scenes when they're setting that up, and one of the things they have to realize first is that if somebody gets sick and goes to the emergency room, we're going to have to unblind them: we'll have to figure out what they're taking. Of course, we have to know what ID 60 took for the statistics, but we'll also have to know for the emergency room. So I need to keep track of who is ID 60. We used to have pagers; I'd get a page in the middle of the night and have to go do an unblinding: get up, pull up the spreadsheet, and figure out who is assigned to 60. How do you arrange all that? I'm going to show you right now. What you make is called a crosswalk.

We're going to go back here and pretend that the hospital name is the person's name, "order" is their original number, and "study ID" is their new number. The first thing I'm going to do is copy this whole page with Ctrl+C, then go over here and paste it into a new sheet. Let's make it a little bigger. I'll call this sheet "study ID crosswalk." This random column keeps updating, so I can get rid of it: just right-click and Delete. I'm going to pretend this is the person's name, so we can figure out who to call, or who the emergency room should call, or whatever. And I'm going to erase all this other data. Okay, there we go: this is the study ID crosswalk. Now we can sort this by anything, so let's sort it by study ID. Here, "My data has headers," and okay. See how the original order is all cattywampus now, but the study IDs are in order? So let's say somebody calls me and says this person (well, this hospital, which doesn't make much sense, but go with it) went to the emergency room
and it's study ID 5, and I need to go figure out who it is. It would be VA Boston. But it's not really just who it is that I'd be figuring out. There'd be a column here, or somewhere, with the treatment assignment. I'm trying to remember what I used to do, because this crosswalk spreadsheet was so locked down that almost nobody could get at it, and I think I put the treatment assignment in it. So I would look up the treatment assignment and say, okay, study ID 5: was that assigned to placebo, or... oh, we were doing a blood pressure study, so was it assigned to drug A or drug B?

But this is also what I use when I'm just de-identifying. Say I was going to do that thing where I say, "Thank you, hospital, for consenting to be in my study. We're going to send you three surveys over the next three weeks. Here's your study ID; I want you to enter it and fill out our surveys." Then fast-forward: my data analyst comes and says somebody filled out goofy stuff. I'll say, who filled out goofy stuff? Maybe we can ask them; I'll call them and see why they filled out goofy stuff. The analyst says it was 5, so I look it up and say it was VA Boston. (I don't know why I picked the VA. I used to have a friend who worked there; actually, I still have a friend who works there. Maybe she's on here? Are you on here, Heather? Hi! I don't think I even invited her. I should invite her to these things; she's really smart, too.)

All right, so far I've shown you how to number the rows automatically in Excel, how to make a random number that just keeps updating, how to sort by the random number and make a study ID, and how to make a study ID crosswalk.
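A crosswalk is essentially a locked-down lookup table from study ID to identity (and, in a blinded trial, treatment assignment). Here's a minimal Python sketch under that assumption; the `CrosswalkEntry` fields, the helper names `build_crosswalk` and `unblind`, and the drug assignments are hypothetical, while the study IDs and hospital names follow the example above. The survey or analysis file carries only the study ID.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrosswalkEntry:
    """One row of the locked-down crosswalk (hypothetical structure)."""
    study_id: int
    identity: str  # name, MRN, etc.; never copied into the analysis file
    arm: str       # treatment assignment, e.g. "Drug A" or "Drug B"


def build_crosswalk(entries):
    """Index crosswalk rows by study ID, refusing duplicate IDs."""
    crosswalk = {}
    for entry in entries:
        if entry.study_id in crosswalk:
            raise ValueError(f"duplicate study ID: {entry.study_id}")
        crosswalk[entry.study_id] = entry
    return crosswalk


def unblind(crosswalk, study_id):
    """Emergency lookup: who has this study ID, and which arm were they in?"""
    entry = crosswalk.get(study_id)
    if entry is None:
        raise KeyError(f"study ID {study_id} is not in the crosswalk")
    return entry.identity, entry.arm


# Study IDs and names follow the example; the arm assignments are made up.
crosswalk = build_crosswalk([
    CrosswalkEntry(5, "VA Boston", "Drug A"),
    CrosswalkEntry(19, "Addison Gilbert Hospital", "Drug B"),
    CrosswalkEntry(39, "Brockton Hospital", "Drug A"),
    CrosswalkEntry(62, "AdCare Hospital", "Drug B"),
])

# The survey export carries only the study ID, never the identity.
survey_rows = [{"study_id": 5, "q1": "goofy answer"}]

who, arm = unblind(crosswalk, survey_rows[0]["study_id"])
print(who, arm)  # VA Boston Drug A
```

Raising an error on a duplicate study ID is a deliberate choice in this sketch: an emergency unblinding lookup has to point to exactly one person.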
Now, before we go on, let me see if I have any questions. No, I don't have any questions.

Next, we're going to move on to what we'd do if we wanted to randomize these hospitals into treatment groups, carrying on with the example. I'm going to go over here... actually, we could just do it here. Oh, there's one thing I wanted to tell you about first, and that's RANDBETWEEN. Let me show you why I don't use RANDBETWEEN. I'll copy this over here just to demonstrate; I won't leave it on. I'm going to erase these two columns, because this is what we started with, where I just had "order."

Notice there were 72 hospitals. When I was young and naive, I thought I could just insert a column for a new order and use RANDBETWEEN. How does RANDBETWEEN work? You type =RANDBETWEEN, and it gives you a random whole number between two numbers. You see this tooltip down here: bottom, top. So if I type =RANDBETWEEN(5, 10), we see 5. Now I'm going to hit Enter and drag it down a few; see, there's a 5 again. That's what it does. So I mistakenly thought I could do RANDBETWEEN(1, 72) and just get a new study ID for each hospital, like that.

You've probably already figured out the first problem with that: every time I do something, like just hit Enter, the numbers change. But you can solve that. If you ever need a bunch of random numbers between 1 and 72, you can do this, then copy them, go to Home, Paste, Paste Special, and choose Values. Now if I click here, it says 24, whereas this other cell still says =RANDBETWEEN(whatever). So you can get a bunch of values that way, and that's how I do it: it's like I make a little random number machine, and then I get rid of it once I have my random list.
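To see why RANDBETWEEN(1, 72) can't hand out unique IDs, here's a minimal Python sketch. `random.randint(a, b)` includes both ends, like RANDBETWEEN(bottom, top), and storing the draws in a list plays the role of Paste Special > Values; the helper name `randbetween_column` is hypothetical. Because each cell is an independent draw, the chance that all 72 come out different is 72!/72^72, which is vanishingly small.

```python
import math
import random


def randbetween_column(bottom, top, n_rows, rng):
    """Hypothetical helper: like filling n_rows cells with =RANDBETWEEN(bottom, top)
    and then pasting them as values. Every cell is an independent draw."""
    return [rng.randint(bottom, top) for _ in range(n_rows)]  # randint includes both ends


rng = random.Random(42)  # seeded, so rerunning gives the same "pasted" list
new_order = randbetween_column(1, 72, 72, rng)

n_distinct = len(set(new_order))
n_repeated_slots = len(new_order) - n_distinct
print(f"{n_distinct} distinct values; {n_repeated_slots} rows repeat an earlier value")

# Chance that 72 independent draws from 1..72 are all different:
# (72/72) * (71/72) * ... * (1/72) = 72! / 72**72
p_all_unique = math.factorial(72) / 72**72
print(f"P(no duplicates) = {p_all_unique:.1e}")
```

This is the same reasoning as the birthday problem: drawing with replacement almost guarantees repeats, whereas sorting by a random key (the RAND() approach) is a permutation, so every ID is used exactly once.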
But the problem is... actually, let's do it. So I said, okay, this keeps updating, and we'll solve it by copying and doing Paste Special, Values, and then it's stuck; it stays there. Then I got rid of the random column and said, okay, yay, I've got this new order, I'll sort by that. So I sorted by it, but as you can immediately see, I got duplicates. That's why you can't really do that: you need a unique ID, and you just won't get one that way. That's the reason I use my way to create a random study ID, meaning one that's randomly assigned, instead of using RANDBETWEEN. So now I'm going to get rid of this.

I do want to say, though, that RANDBETWEEN is really useful. I find it useful for making fake data. If you want to create a fake column with values between 1 and 3, you can use RANDBETWEEN(1, 3), like if there are three different sites. But again, you've got to copy and then paste values, because the random numbers will just keep updating. So that's RANDBETWEEN.

Now I'm going to talk to you about how you do a randomization. Remember, in observational studies you'll just have a study ID, and you'll often have a hypothesis. For instance, I might think that taking this kind of pain medication is associated with lower heart attack risk, or something like that. You could have that as a hypothesis for an observational study, but you wouldn't be assigning anybody to take the pain medication; you'd just be studying people who are already taking it or not taking it. People with arthritis often take pain medication, so you could get people with arthritis who are not taking it and who are taking it
and then follow them through time, and when you got them into your study and it was time to make a study ID, you could use my method to give them one. But now let's turn the tables. Let's say you're going to study arthritis medication, and you want to assign which medication participants take, whether it's placebo or this pain medication. If you want to assign it, and you want to do a double-blind trial and all that, I want to remind you what's actually happening in a clinical trial, because it's kind of weird.

Imagine there are two women; let's call them Sarah and Jenny. They're both going to be in a study for, I don't know, birth control pills or something. Let's say we enroll Sarah, she qualifies, and we put her in group A. Then we enroll Jenny, she qualifies, and we put her in group B. We are not treating Sarah and Jenny as two different people. What we are saying is that Jenny is the counterfactual of Sarah. Let's say we give Sarah placebo, so Sarah is the example of Sarah not taking any drug, and we give Jenny the drug. Then Jenny is the example of Sarah on the drug: if there were a moment when Sarah split into two people, one would be Sarah without the drug and one Sarah with the drug, and Jenny is just representing Sarah with the drug.

You're probably thinking that's creepy, right? If you knew my friends Sarah and Jenny, they're really different; I can't even imagine one representing the other in a clinical trial. But if you design your inclusion and exclusion criteria so that you get a very narrow range of people (for example, let's say you're studying Alzheimer's and you enroll only people at a certain level of function), you're getting a very homogeneous group. So if you put one in one group and one in the other, you're hoping they're kind of a mirror
image of each other because they're so alike. So that's what we're doing when we randomize: we're deciding who's going to be the counterfactual of whom.

So let's go back to why we randomize. I talked about Sarah and Jenny, but randomization is really about groups. Let's say I just take a hundred people from Boston. They're going to be diverse: some will smoke, some will be obese, and so on, and whatever my outcome is will probably be influenced by some of those factors. We often adjust for them in analyses, like adjusting for obesity or whatever. But when you randomize, when you just roll the dice and randomly put a hundred people into two groups, you'll be surprised at how similar those groups are. If I took a hundred people from Boston, randomly put them in two groups, and looked at the mean age, it probably wouldn't be that different between the groups. If I took a hundred people in Boston who are age 20 to 30 and randomized them, the mean ages would probably be nearly identical. And even with a broader age range, if I expanded to a thousand people or a million people, eventually when you split them the groups come out essentially the same. That's the same idea behind how machine learning and artificial intelligence split their data, which we're not doing today.

The thing with machine learning and artificial intelligence is that they've got boatloads of data. They say, okay, here are all the Visa transactions; let's split them, build the model on one part, and apply it to the other, right? Well, we don't have the luxury of a million transactions if we're studying people with arthritis and giving them pain medication. I'm remembering a real-life study (you can read about it; it's one of my publications with my friend Bob) where there were 30 people in the study. We
only had 30 people. We were studying pathology trainees, and that's all there were, and we had to randomize them into two groups because they were going to get two different educational interventions. We really wanted the groups to be comparable, because some trainees took a lot longer than others, and the problem was that with so few people, some of our randomizations came out lopsided.

Here's an example of how that happens. I've been helping a Chinese customer who's making some materials in Chinese, and I can't read Chinese. One of the things she made was an online quiz, and I was testing it to see if it worked, and one of the times I took it I got 100%, and I can't even read it! That illustrates what happens when you randomize just a few people: sometimes you end up with all of a certain group in one spot. If there were men and women in those 30, and there weren't as many women, I might end up with all the women in one of the groups, you know what I mean? So sometimes you have to randomize and throw away the randomization; you re-roll the dice a few times until you get a balanced split. You don't have to do this with big numbers, just small ones.

So I'm going to show you how to randomize into two groups, and then, as a special treat, I'm going to show you how to do a blocked randomization and why you would do that. Let's go back to our fake data (actually, it's real data about fake participants). We have the study ID here; let me just sort by study ID. We've got 72 participants. Here's our fantasy: we started a clinical trial, we said you have to meet these inclusion and exclusion criteria, these 72 participants did, and now we want to randomize them. So what should we do? Let's copy this, and I'll put it up there, and
the first thing I'm going to do is just copy it. I just copied that up here. We already have this random column that keeps updating. Now, I could get complicated on you, but let's just say we're randomizing them into two groups. I already know there are 72 of them, so if I make the groups half and half, that's 36 in each group (I'm so bad at math). So I want to put 36 in one group and 36 in the other. Let's sort this again by the random number; that's fine, because we can always get back to the original order by sorting on study ID. So let's sort by random again and make a new field. Let me put it over here; we'll call it treatment. I'm going to make it 1, meaning they're getting the treatment, or 0, meaning they're not. I want 36 of them on treatment, so I could just go here, 1, 2, 3, 4, and drag this down to 36, and you can kind of see it. Oh, that went too far. Okay, here's 36. So these are the first 36 rows, and I'm going to give them the treatment (this random column keeps updating; just ignore it). So they get the treatment, and everybody else doesn't, so everybody else gets a zero; I'm just copying the zero and pasting it. Then I can erase the counting numbers; I only did that to make sure I stopped at the right row. Now let's go back and sort by study ID again. What do we have? We still have this random column that I just keep leaving there, we have this order column, which links back to the original identifiers, like their medical record numbers or names, so we know who they are, and we have their treatment assignment. I guess this is the best time for me to copy this, go over to our study ID crosswalk, and paste it there; that's probably what I'd do. Then I can get rid of
those here. So this is what I would use to unblind: I'd have this treatment assignment in there. All right, so this is randomized.

So what I just did is take 72 participants and randomize them without really looking at the result. But when I went through and flagged them as rural (well, I didn't say rural; I said metro or not), I noticed that there are a lot of metro areas in Massachusetts and not as many non-metro ones. Let's say I did this randomization, looked at it, and all the non-metro ones ended up in one group and none in the other. You'd be like, "Well, Monica, you should have predicted that, because you're the one who did the sampling lecture that says when you do simple random sampling, which is what we're doing, you can get imbalances like that." That's the problem with simple random sampling.

One thing I could do is just throw away the randomization: I could sort it again and look at what I got. So that's an option: throw away the randomization and get another one. That's what I did with my friend; we only had 30 people, so if all the women ended up in one group, I just did the randomization again. But whatever you do, don't do anything picky. Don't go by your gut; just use the randomization.

Let me tell you a funny story. The same friend Bob, who's a pathologist, one day brought me an Excel sheet with 40 rows on it; each row was a tumor with some information about it. He said, "Monica, let's write a paper with my friend." (I forget the name of the friend who had given it to him.) I said this is great, and he said, "But I'm having a problem. Before the friend met me, he wrote an abstract and published it, and he got a mean from one of
these variables, and I can't get that mean." And I'm like, "Yeah, I don't know; he just took 40 people with tumors, right?" He's like, "Yeah." So we met with the friend, and I said, "Okay, when I take the mean I get this, but maybe you took the mean of a subgroup," because they had diagnostic groups. And I still remember, the friend said, "No, I just picked out which values to put in the mean." And we're like, "You just picked them out? Like you pick flowers, or pick fruit at the grocery store? How do you pick them?" And he's like, "Well, this guy was high on this and low on this," because he was looking at lab values (pathologists), "so I picked him, but this other guy, I don't know, I didn't think we measured it very well, so I didn't pick him." I'm like, you can't do that! You always have to be guided by the numbers and the statistics, even when you're removing people from your data set.

So, anyway, what I'm going to do now is demonstrate what you can do if you've got the problem I could have with this data set. I'm taking a simple random sample like I just showed you, but there's this characteristic, this metro column here. Let me sort by metro and show you what I'm talking about. Again, this isn't real; I just made it up. You can see all the zeros here: only about 27 of these 72 are in the non-metro area. So I could end up with a bad situation with my randomization, and I'm going to show you how to do a blocked randomization. It's probably easier for me to show you than to explain it first. First we'll sort this by study ID. Now I'm going to do this thing where I make two lists: a list of the metro and a list of the non-metro. So I'm going to call this one metro and this one non-metro, just to remind myself what I'm doing.
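For anyone who wants to move this from the Excel sandbox into code, here is a minimal Python sketch of the two approaches in this stream: simple randomization (shuffle, then the first half gets treatment = 1) and the metro/non-metro pairing built step by step in the spreadsheet. It is a sketch under assumptions, not the exact procedure from the stream: the study IDs 1 to 72 and the 27 non-metro participants are hypothetical, `random.Random` with a fixed seed stands in for Excel's RAND(), the function names are illustrative, and participants left over after pairing (here, the extra metro participants) are split by simple randomization, which the stream doesn't cover.

```python
import random


def simple_randomize(ids, rng):
    """Shuffle the IDs; the first half get treatment (1), the rest control (0).

    Mirrors the spreadsheet steps: sort by a RAND() column, put 1 in the
    first half of the rows and 0 in the rest.
    """
    order = list(ids)
    rng.shuffle(order)
    half = len(order) // 2
    return {sid: (1 if i < half else 0) for i, sid in enumerate(order)}


def blocked_randomize(metro_ids, nonmetro_ids, rng):
    """Pair one metro with one non-metro participant per block, then assign
    whole blocks to treatment or control at random."""
    metro = list(metro_ids)
    nonmetro = list(nonmetro_ids)
    rng.shuffle(metro)      # random order within each list, like sorting by RAND()
    rng.shuffle(nonmetro)
    n_blocks = min(len(metro), len(nonmetro))
    blocks = list(zip(metro[:n_blocks], nonmetro[:n_blocks]))
    rng.shuffle(blocks)     # like giving each block a random number and sorting
    half = n_blocks // 2
    assignment = {}
    for i, (m, nm) in enumerate(blocks):
        arm = 1 if i < half else 0
        assignment[m] = arm   # both members of a block go to the same arm
        assignment[nm] = arm
    # ASSUMPTION (not covered in the stream): participants left over after
    # pairing are split by simple randomization.
    leftovers = metro[n_blocks:] + nonmetro[n_blocks:]
    assignment.update(simple_randomize(leftovers, rng))
    return assignment


def count_by_arm(assignment, group_ids):
    """Count how many of group_ids landed in treatment (1) and control (0)."""
    counts = {0: 0, 1: 0}
    for sid in group_ids:
        counts[assignment[sid]] += 1
    return counts


if __name__ == "__main__":
    rng = random.Random(2024)                 # fixed seed so the allocation is reproducible
    ids = list(range(1, 73))                  # hypothetical study IDs 1..72
    nonmetro = sorted(rng.sample(ids, 27))    # hypothetical: 27 non-metro participants
    metro = [sid for sid in ids if sid not in nonmetro]

    simple = simple_randomize(ids, rng)
    blocked = blocked_randomize(metro, nonmetro, rng)

    for label, plan in [("simple", simple), ("blocked", blocked)]:
        nm = count_by_arm(plan, nonmetro)
        treated = sum(plan.values())
        print(f"{label}: non-metro treatment={nm[1]}, control={nm[0]}; "
              f"arm sizes treatment={treated}, control={len(plan) - treated}")
```

With 27 pairs the blocks can't split evenly, so in this sketch the blocked plan always puts 13 non-metro participants in treatment and 14 in control (35 versus 37 overall), while the simple plan gives exactly 36 per arm but makes no promise about metro balance. Changing the seed a few times is a quick way to see how lopsided simple randomization can get with small numbers.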
Then over here I'm going to copy this. I'm going to remove this treatment column, because we're going to make a new one, and I'm also going to copy this over here; I'm just copying the whole thing and getting rid of the random order column. Or maybe I want to keep it. Yeah, let me keep the random order column; you never know when you're going to need it. So that's what I did first; that's just a copy. Now my goal is to remove the non-metro ones from the metro list. So I sort this by metro, and see, these are the non-metro ones, the zeros; I select them all, right-click, and delete. Then on this one over here I sort by metro and remove the opposite, the metro ones. So I split the list; that's the first thing you do. Now metro is separate from non-metro, and nobody appears in both.

Now I'm going to create a sheet called blocks, and I'm going to format it a little bit, because otherwise it's totally annoying. I'm going to call this column "block num," for block number, and then "metro ID" and "non-metro ID." How many blocks are we going to have? I'm just going to make 10 for you: six, seven, eight, nine, ten. You're probably like, "What have you been doing?" So I'll show you. The goal is to make pairs of a metro ID and a non-metro ID, but we don't want the pairs to be predictable. So let's sort both lists by the random ID again; that's why I decided to keep it. I'm sorting both the metro and the non-metro lists by the random ID just once, so they're in random order. In the metro list, the first study ID is 70, so I put 70, and in the non-metro list the first study ID is 44. So now we've created a block. Let's create a few more. The next one is going to be 21 and 37, so 21 and 37. And then this one is 62 and 60 (kind of close), so 62 and 60. Okay, now let's
say we filled this in all the way down here. Now remember, my problem was that I might get, for example, all the non-metros in one group. If I pretend these are filled in, I put a random number here (remember RAND) and drag it down. I've got 10 blocks here; of course, I would really have a lot more, but I'm just showing you. First I sort by the random number, and then I can assign treatment: let's say the first five blocks get treatment and the next five get control. See how, if these were really filled in, each group would be forced to have equal numbers of metro and non-metro? The placebo group, or whatever the control group was, would have five metros and five non-metros, and it would be totally random; nobody would even know that happened. So that's what's called a blocked randomization, and that's why you do it: to make sure you get balance on a characteristic that can come out uneven when you do it the simple way.

All right, let me stop here. I didn't get any questions, or if you're asking them, the chat isn't working for me. Anyway, I just thought I'd hold this little live stream to show you these tricks with RAND and RANDBETWEEN. I didn't really show you much good to do with RANDBETWEEN in this exercise, so don't bother with RANDBETWEEN for this, but it's a good function you can use for other things. RAND, though, I definitely use. And you see I spend a lot of time sorting things, creating orders, and creating columns, and I do that in Excel. Now, I also use R and SAS, and obviously if we were dealing with a huge list of transactions, I couldn't be doing this in Excel. What's nice about Excel, though, is that you can take some of those transactions, say 100, play with them in Excel, and decide how you want to de-identify or identify things, or
how you want to do a blocked randomization. I always think of Excel as a little sandbox where I can figure out how I want to do something before I go to SAS or R. Then, whatever I program, I'm not going, "Oh, maybe I should add a random column, maybe I should throw this away, or maybe I should pair this up." That process I just did, where I built the blocks by looking up the first ID and then the second, could so easily be automated in statistical software.

Oh, I got a comment here: "Thanks, Monica, I had to randomize something on the fly just now, and RANDBETWEEN is a great function to know." Awesome, I'm so glad. I'm definitely applied, not theoretical. Okay, awesome; I'm so glad you're already using it. I'm selling RAND, and I even sold RANDBETWEEN today; that makes me feel good.

All right, so as you can see, the things I'm doing in Excel, which you could also do in SAS and R, are not statistics. It's not math; it's research administration, and that's often the thing that makes or breaks your study. Do you remember which treatment group you assigned these people to? Do you know who these study IDs are? Do you know what's going on? If you suddenly feel your heart in your throat because you don't know what's going on, I suggest you take my data curation course. That course bubbled up from all the times I said, "You guys should have kept track of this," and you said, "Okay, I'll keep track of that, but how do I keep track of it?" The spreadsheets I just made come naturally to me, but they don't come naturally to everybody. So if you want to learn more about things like that: how do you even keep track of this? How do you even plan it? I just did this one demonstration, but there are a lot of things I know how to keep track of. I've done a lot of research administration, because when you're an epidemiologist, you're just kind of stuck with it. It's kind of like
I have a friend who's a musician (maybe he's on here), and when you're a musician, you're stuck dealing with sound; you've just got to deal with it. And when you're an epidemiologist, not a laboratory statistician or laboratory technician but an epidemiologist, you're always going to the IRB and always writing protocols. Excel is so much your friend, and Word is your friend. You're not making beautiful stuff; you're making stuff like what I just showed you. You're doing research administration, which is really important: that's how you make sure you're being ethical, and that's how you make sure you get the right answer. A lot of people just don't teach it, or they don't really know how to do it, so I'm trying to fill in the blanks.

All right, well, thank you to everybody who showed up today and joined my audience, because I always feel better when there are more people in my audience. If you liked what you heard, whether you showed up today or you're listening to this afterwards, please hit the like button (I can hardly talk anymore, because I just talked for an hour), and please subscribe, because that's what I'm trying to do this year: get more subscribers on my YouTube channel. It's been around for a long time. I like to think of it as a vacation spot nobody has discovered yet: it's so beautiful, but once everybody gets there, it won't be so pristine anymore. Right now it's still pristine; my channel only has about 3,000 subscribers, so now is the time to get in there, start making comments, be a star, and really support me. All right, well, thanks so much for showing up. It's Saturday, and it's beautiful weather today in Boston, so hopefully you'll get out and do something fun. Thanks very
much and I look forward to seeing you in the chat on my next