 In this video, I want to show you how to easily create some simulated data so that you can use when you are learning how to use your statistical software. Let me show you. I'm going to do this variable by variable, in other words column by column. So I'm not going to do one for every patient that would be too long. I just want to show you if I click in one of these ID cells, the patient ID, and I hold down control of my keyboard that will be command on a Mac and the down arrow key I jump right to the bottom. And we see we are in node 251. That means we have 250 patients in this study. Because remember row number one was reserved for the column headers or the variable names. So I'm going to hold down command and control again, hit up arrow and I'm right at the top. So what we're going to do for age is to suggest that our youngest patient was 18, our oldest patient was 85, and we just want Excel to choose a random integer value, that means a whole number, between 18 and 85 inclusive. And the way I'm going to do that is with all formulas in Excel, I start with equal and I'm going to use the rand between function. We see it comes up there, I've selected it and the tool tip is very small but it says bottom comma top. So I'm going to say the youngest patient was 18, the oldest patient was 85, and I'm going to ask Excel just to choose a random number in that range inclusive, 18 to 85 and lo and behold it chose 18. Now I'm going to go right to the bottom here, remove my cursor until it changes into that little arrow there and I'm going to double click. And what's going to happen is this whole column is going to be populated right to the end, patient number 215. It's going to stop there automatically because the column right next to it stopped there, there was nothing below that. So the spreadsheet software will do that for you automatically, I'm going to go all the way back up. Now I'm going to hit my F9 key and every time I hit the F9 key there's going to be a new random number inside of each of these columns. They continue to change and what you'll see is when we change the next value, the next value, every time I type something in there I might suggest that this patient was a rural and when I hit enter you see a change in the age column. So every time you do this things are going to change and I'll show you how to stop that right at the end. I'm going to go into that column and hit delete because I'm going to move all the way across because I've got something hidden on this side. I'm going to bring it back in. I'm going to hit CTRL X or CMD X to cut and I'm going to CTRL V and paste it right there just so that it's closer. Now this is only and I put some emphasis on these words. So this is only for creating this simulated data. I'm just showing you how to create your own simulated data. In a real dataset we won't have any of these. They will not be there and when I save this file in the end I'm going to actually delete this so that I can have my flat sheet only with the patient's details that I want. The data point values for these various variables. The reason why I'm bringing them in is I just want to show you a little trick. Here on this side for age we just had a uniform random distribution. In other words it was every value between 18 and 85 inclusive was equally likely to be chosen at each spot. But what if I want a random selection of these nominal categorical variables. As I mentioned our patients came either from the coastal region inland or from the city and I want this whole column to be populated by those three values at random. And the way we do that is the following. Now unfortunately there isn't a spreadsheet formula for this. You have to put a few of them together. But once you see it you'll remember it. The first one is equal of course because this is a formula. It's indirect and open parentheses and then concatenate. And then inside of quotation marks the column from which this was taken. So that was column M close my quotation mark comma then ran between ran between open my parentheses. And you see this was in row one two and three. So I'm going to say it goes from row one to row three. So it's one two and three. And I've got to close those parentheses three times because I had opened them three. There's one there's a second one there's a third one so they've all got to be closed. And I'm going to hit enter or I can just hit that little tick mark there and it chose coastal chose one of those. Once again I can hover over there until it turns into a little cross. I should say a little cross and double click that and you can see at random it chooses one of these again. It's a uniform distribution. So every time each of these values will have one third of a probability of being chosen. Next up let's go to temperature. Now let's suggest that we want temperature to follow a certain distribution. And let's have that distribution be the chi-square distribution with one degree of freedom. So what do we do there? Equal chi and if I move down there I see chi sq dot i n v is what I want. I'm going to open my parentheses and it says probability. Now I want a random value so instead of typing a probability between zero and one I'm going to say a rand open close parentheses. Now the rand function there just takes a value between zero and one which just relates to a random value. So a random probability rand open close parentheses and then the degrees of freedom. I suggested we're going to have one degree of freedom but this is going to give us values around about the one mark. So I'm just going to say after closing my chi-square dot i n v parentheses I'm going to say plus 36. Because I want that value which is going to be in the order of one, two or three to be added to 36 to give me my final temperature and we see the first patient's temperature there is 36.2. I'm going to hover over there, move my cursor down, double click, double click and it's going to fill in everything for me. Now you'll notice that I only have one decimal place here. In actual fact if I highlight that whole column there's actually quite a few decimal places. Well let's go down just on the values that we chose. There's quite a few more decimal places there and I'm just increasing them there. It's just the way that I wanted this to be printed to the screen with a one decimal value. If I click on one of these though we see they all follow the same formula but they do have more decimal places and I'll show you in the end that those decimal places will remain even though they printed like this to the screen with only one decimal place. Let's make a heart rate value that follows a normal distribution. So I'm going to say equal n-o-r-m dot i-n-v norm dot inverse i-n-v open and once again it takes a probability, a mean and a standard deviation. The probability I want to be random, I want the standard deviation, the mean to be let's say 18 and with a standard deviation of 15 and I close my parentheses, I hit enter. Once again I'm going to go into that cell and I'm going to double click and everything will be populated. Once again let me just show you there are many decimal values there. I am just hiding them with those little buttons there. For some more practice let's make a white cell count and let's make that from a totally uniform distribution. So once again I'm going to say rand between. Now with a rand between function remember we can only have integer values here but I want a white cell count. Let's make it from 8 to 20 anyway in that region. There we go it has a one decimal place there because the first one there was set with one decimal place. So that is a uniform distribution. Each of these values between 8 and 20 inclusive has an equal probability of being chosen for everyone. Let's practice one of these again. So I just want it to choose between yes and no. The patient either has diabetes or the patient does not have diabetes. Now I'm going to do that remember it's equal, indirect. I'm hitting tab for autocompletion then it's concatenate. It's the second one there and that resides in column n there. The values that I wanted to choose from. RAND BETWEEN. RAND BETWEEN it's only rows 1 and 2. So 1, 2. Close, close, close. Enter. It shows no there for me. It's going to choose something different for you. I double click and everything is completed there. Grades 1, 2 and 3. Let's make that a uniform distribution. So that's just going to be RAND BETWEEN. And all we want is between 1 and 4. And there's a 1 and 4 probability of every one of those values being chosen at every spot there. Double click and everything is populated. The treatment group is going to be nominal categorical. We're going to use indirect, indirect, concatenate, concatenate. This time I'm choosing inside of quotation marks. I'm going to choose from column O. Close my quotation marks comma RAND BETWEEN. And it goes from row 1 to row 4. So I'm going to say 1 comma 4. Close, close, close. There you go. And it shows a P there for me. It's going to be different for you. Double click and everything is completed. Now, survived and died. Let's make a little column for that. I'm going to ask for it to be between survived and died is the two data point values for this nominal categorical variable called outcome. I'm going to choose that. I'm going to say equal indirect, indirect, concatenate, concatenate. There we go. And it is going to be from column P in my instance, comma RAND BETWEEN. And RAND BETWEEN is only 1 and 2. Close, close, close. There we go. It shows survived. Going to double click. Let's do the last one and that is going to be the months. So from the time that the patient started their treatment with a randomization until they died, let's make that a uniform distribution as well. So RAND BETWEEN. And we're going to suggest that we make this from, let's make it from 5 months until let's make it 25. Let's make it longer. Let's make it about 30 months. And I'm going to hit enter there or return. And once again, I'm going to double click and everything gets completed right down to patient 250. So that's a nice way to create some simulated data. You can create all sorts of data this way. All I want to show you now, every time I hit F9, you can see the random values change. And one way to get rid of that is I'm going to choose all these random values. Remember, we started that right here. I'm going to choose all of them in that first row. I'm going to hold down the shift and command or shift and control key, hit the down arrow key. So they're all selected. And I'm going to say copy. And then down here, not just paste, but I want to paste only the values, paste values. As soon as I do that, we are no longer going to have these chosen at random. The values that were there, when we hit copy, they're going to stay intact. I'm going to hit escape to get rid of these marching ends. And now when I click on these, you see that it's actual values. It's the actual values. It's no longer those functions that I typed in, the formulas that I typed in to get these random values. It is now the actual values. And what I can now do is just to get rid of these, I'm going to delete them. I no longer need them because this has been done for me. It's all simulated data that we can now work with. We have a chi-square distribution here. We have normal distributions. We have uniform distributions. And we have uniform distributions just for choosing these nominal categorical variables as well. So there's a beautiful data set for us to work with. We can import it into our analytical software and we can do statistical analysis on it.