All right, let's get started. Hello and welcome to the R/Medicine pre-conference workshop, Intro to R for Clinical Data. My name is Stephan Kadauke and I'm dialing in from beautiful, sunny Philadelphia. I'm the chair of the R/Medicine 2022 organizing committee and I'm running this workshop. I'm an assistant professor of Pathology and Laboratory Medicine here at the Children's Hospital of Philadelphia. Just a little bit about me: I've been coding in R for, I think, more than 12 years now, and I'm an RStudio Certified Trainer. I've taught R to physicians and other healthcare workers at Harvard's Massachusetts General Hospital, at the University of Pennsylvania, and at various conferences, both online and offline. So this is not my first rodeo. I'm joined today by two co-instructors that I'd like to introduce. Joe Rudolf is an assistant professor of pathology at the University of Utah. And Patrick Mathias is an assistant professor and vice chair of clinical operations at the University of Washington in Seattle. We're also fortunate to have two TAs helping with the course today, so thank you very much, Rich and Sarah, for helping out. All right. I wanted to spend just a few minutes orienting us to the technology. I know we've been in COVID times for what seems like forever, but I still wanted to go over this real quick to give you a sense of what we have in store for you. We'll have the main part of the workshop in webinar style, with everyone in one big session, as we'll have 68 participants today. You will all be muted and your cameras will be off. We'll be recording the session for replay. Actually, are we recording? Yeah, seems like we're recording. That's great. We'll also have one or two breakout sessions. For the breakouts, we encourage you to participate by turning on your microphone, and we also encourage you to turn on your camera if you're comfortable, but that's completely up to you.
And just so you know, we will not record the breakout sessions; those will remain private. So, this is your main Zoom window. On the bottom, you should be able to see a button that says Participants and one that says Chat. It may look a little bit different now because Zoom has evolved, but what I want you to do is have both of these windows open: you want the Participants window open and you want the Chat window open. That should look a little bit like this. This is important, because this is how we're going to communicate with each other during the session. We will make some use of Zoom's nonverbal feedback functionality, and at various points I'll ask you to click Yes to let me know that I can keep going. What I mean by that is to click on, I think in your window it says Reactions, and there will be a white checkmark on a green background, and we'll try that out in a minute to see if that works for everybody. We will also use the chat, in case nonverbal feedback doesn't work (sometimes we can't get it to work), but also as a communication tool to help you if you get stuck with a technical issue. Okay. Please note that if I'm presenting, it's hard to present and also read the chat at the same time, so when I'm presenting, don't send me any private messages; just send to Everyone, and one of our co-instructors or TAs will help you. I also want to set some expectations for help. We have 74 participants today, maybe even more later. So please only ask for help if you're really stuck with a technical issue with Zoom or with the training environment that we'll introduce. If you have a general question about the material,
I encourage you to write it down and ask during the breaks or after the workshop; we'll be hanging out and will be able to answer questions at that point. All right, so this is an interactive workshop, and this is our first interactive exercise. This is also where I'm going to learn whether the nonverbal feedback reactions work. What I'd like you to do is locate Reactions at the bottom of your Zoom window and click the Yes button, the green checkmark. Okay, I see them coming in. Wonderful, that's great. That's awesome. And it looks like most of you are finding it. Okay, this is perfect, because this is what we're going to use throughout the workshop for me to know whether something makes sense, whether I can keep going, or whether you're lost. So I see lots of yeses. Next, please write your location, city and country, in the chat window. All right. We have someone from Toronto. Okay, it's too fast to read. Sergio from Toronto also. I see a lot of Philadelphia here. Chris from Greensboro, welcome. We have Colorado, and from Italy, from Padua. Artemis from Russia. All right, welcome everybody. Sergio from Barcelona. Okay, this is a truly international audience. This is wonderful: someone from India, welcome to the workshop. And I wanted to show just a few more slides to orient you to the course. This is a gentle introduction to data science designed for healthcare professionals and clinical researchers, people who work with clinical data but don't have a background in programming. So if you know C++, this is not the workshop for you, though I don't assume there are going to be a lot of C++ programmers in here.
So, the way we're teaching this is using this data analysis pipeline framework. The way we think about doing data analysis is that you always start by importing data and cleaning it, or tidying it up, which puts the data into a format that makes it easy to do data analysis with. Then you start this iterative process of exploratory data analysis, EDA, and usually that requires some data transformation. Then there are two ways that you gain knowledge from your data: visualization and modeling. And finally, you need to communicate your results. We'll introduce the basics of how to do all of this with R, with the exception of modeling, because that's a bit out of scope for this very basic intro course. So that's the framework. Then, how do we teach this stuff? We will introduce new concepts with lecture slides, and we try to make them pretty simple. Here's a lecture slide that we will show later on. After spending a few minutes introducing new concepts, we'll practice these new skills with timed interactive coding exercises, like this "Your Turn" here. You will work in a training environment on the web that we specifically set up for this course today. An important thing I want to point out is that these exercises are timed. You don't have to feel like you need to finish each exercise, but it's important that you try. And when the time is up, I really want to ask you to stop working, return to the main session, and listen to what we're doing, because we will then go back, and one of the instructors, whoever's teaching, will live-code the exercise for you. And I guarantee that you're going to learn the most
If you try the exercise, struggle through however far you get, then stop at the five-minute (or whatever) time mark, and then watch the instructor live-code through that exercise themselves. All right, one more serious thing I need to point out, since this is an interactive course: I want to provide a welcoming and supportive environment for everybody, regardless of background or identity. If you have any questions about this, please take a look at our code of conduct, which can be found on the main page of the website. Also, to respect the privacy of participants, please no screenshots, recordings, or photographs. All of the materials of the workshop are available online, and you will have access to recordings, which we'll post on the website for everybody to see after doing some video processing. All right. So, this is our second interactive exercise: a meet and greet. We'll send you into breakout rooms where you'll have 10 minutes to meet somewhere around 10 of your fellow participants. What I'd like you to do is, if you're comfortable, unmute yourself and pop on the camera. I'd like you to tell each other what you're hoping to get from this course today, and what you're hoping to apply your newfound R skills towards. The other thing that's going to happen is that the TAs will jump into the sessions and hand out, individually to each one of you, your login credentials for the training environment, which we're going to start using in the next session. Can I get a quick double-check on the mic, and on the screen share, that the slides are presenting and that the cloud is presenting?
Okay, great, thank you. Then we'll go ahead and get started. It's great to meet those of you who I met in the breakout session, and now I'm really glad to meet the rest of you. My name is Joe Rudolf and I'm an assistant professor of pathology at the University of Utah. In this session we're going to cover an introduction to RStudio and R Markdown slash Quarto. We're all excited to get started programming, and we'll do that in just a few minutes, but before we do that I want to introduce you to three items. The first is R. R is a programming language for data analysis. There are a lot of really wonderful things about R: it's freely available, it's great for wrangling data, and it's very capable of producing impressive visualizations. There's also a great community of folks using it for data science, and we're excited to welcome you all to the R community via this course today, so welcome. Here's RStudio. RStudio is the name of a free piece of software made by a company also called RStudio, which will have a new name, Posit, coming this fall; but for today, RStudio. You can think of RStudio as a sophisticated text editor for writing R code. You can run RStudio on Mac or Windows, or even on a server, as we'll do today. Finally, there's R Markdown, or at least that's what we used to say when we taught this course. The next generation of R Markdown, known as Quarto, was recently released. Quarto is a computational document format that has executable code. It also includes capabilities for annotating your work, which is incredibly handy, and it supports multiple programming languages for mixing and matching code. If you're interested in hearing more about Quarto, don't forget to check out J.J. Allaire's Quarto keynote, which will be on Thursday, August 25. RStudio and Quarto synergize into a robust ecosystem for performing data analysis. Without further ado, let's dive into getting started with RStudio.
As I just mentioned, RStudio can be installed on a server or locally on your computer. RStudio Server is a version of the RStudio development environment that can be accessed from a web browser. This is what we'll be using for today's training environment. RStudio Desktop, in comparison, is a version of the RStudio development environment that's installed on your computer. This is what you should use after the course to continue learning R and working on R projects. We have a course website, which has been distributed and which we'll share with you all, and in it we've included links to videos demonstrating how to complete a local install of RStudio Desktop. We provide access to the course content via a GitHub repository, so though the training environment today is ephemeral, you'll have access to the course content long into the future. One last reminder: please don't upload any external data, especially protected health information, to the cloud. My slides are stuck here, give me one moment. So now we're going to get logged into RStudio Cloud using the login credentials you each received in the breakout sessions: a username which starts with rmed followed by three digits, and a password consisting of six digits. Navigate your browser to rmed101.cloud and use the login information that we provided, username and password, to get logged into the RStudio Cloud environment. Then, using that reaction feedback, click the Yes, that green checkmark, once you see the RStudio panes. We'll start a timer; it may take a little bit longer to get everybody logged in, and if the TAs and other instructors could monitor chat and give me a signal of how we're doing getting folks logged in, that would be great. Looking pretty good, we have 20 checkmarks already. Twenty checkmarks in 20 seconds, that's great.
Again, if you don't have a username and password, drop a message in the chat to everyone and we'll have someone message you back individually with your username and password. It looks like in chat we've got one person looking for a username and password. Okay, we have another request in chat for a username and password. I'm going to hold for maybe just another one or two minutes here to get folks over the hump and into the environment, because we want to get as many people into the environment as we can. Elizabeth is asking: are the usernames the same as the password? So, you should have received, I think I sent this to you earlier, a username that starts with rmed and three numbers, and the password will be a six-digit number. If you didn't receive that, let us know in the chat and we'll get you a new credential. We have 41 checkmarks; please don't forget to let us know, by clicking on Reactions and the Yes checkmark, that you have been able to log into the environment. Stephan, I'm not seeing any new messages pop into the chat. So, at 38 past the hour, can I get a quick scan of the checkmarks? We've got 45. I think some people are still looking for the Yes button; that's okay. If folks are looking for the Yes button but they're logged in, we'll count that as a success, and we'll continue to find the Yes button under Reactions as we go through the course today. Okay, I'm seeing yeses and no noes, so I'm going to continue. Is that okay? Okay, all right, great. So, here's what the RStudio window looks like. On the top left is the editor; this is where we enter our code. On the top right is the environment, which allows us to look at data that's been loaded into R. For interacting with data, the Environment tab provides some of the functionality that you may be familiar with from working with point-and-click tools like Excel.
The console is on the bottom left. You can use the console to quickly run an individual R command, like installing a package. We won't be using the console much in this course today; we'll be largely focused on the editor. The pane we have labeled as miscellaneous (Misc) has a few tabs. The most important one for today is the Files tab, which shows the files that we'll be using for the course. One of the most powerful aspects of working in the R environment is the ability to conduct reproducible data analyses: analyses that can be shared, revised, repurposed, and reproduced by others. To highlight the importance of reproducibility, let's consider the following case. In 2006, researchers at Duke University tried to use microarray gene expression data from tumor cells to predict sensitivity to chemotherapeutic agents. The approach generated a lot of excitement at the time, and the resulting work was published in many high-profile journals, including Nature Medicine, JAMA, and the New England Journal of Medicine. Unfortunately, there were a number of serious errors in the data analysis. The media focused on the fact that the researchers tried to cover up investigations into these errors and press on towards clinical trials, even though there were, at the time, open questions about the validity of the methods. Even more unfortunately, patients were enrolled in clinical trials and allocated based on flawed models, and in all likelihood patients were actually allocated to the incorrect treatment arms. In the end, 18 papers were retracted and the institution settled more than 10 lawsuits for an undisclosed amount of money. Keith Baggerly and Kevin Coombes, biostatisticians at MD Anderson, uncovered the mistakes that led to these retractions in painstaking work. Let's look at one of the errors they found. These are a few of the hundreds of microarray probe sets, each corresponding to a gene, that the Duke investigators reported to predict sensitivity to 5-fluorouracil.
And here are the probe sets that the MD Anderson team got. You can see that these two sets of probes are different. Looking at the last number of each probe set, you might notice a pattern: the number of the probe set that Duke reported is exactly one less than the number of the probe set that the MD Anderson team found when they redid the analysis. This is what's called an off-by-one indexing error, and it's what happens when you use a point-and-click tool like Excel and accidentally delete one cell: all values in the affected column are shifted by one. It's a simple error to make, but as you might imagine, it invalidates all downstream results. The off-by-one indexing error was just one of many simple errors the MD Anderson team discovered. Another type of error that was pervasive in the study was label reversal, where cell lines were labeled sensitive to a drug when they were actually resistant, and vice versa. This type of error can lead to a scenario where a patient gets the chemotherapy that would be predicted to be least beneficial to them. Other problems they identified were confounding, inclusion of data from sources that were not reported in the paper, and wrong figures shown. These are all simple errors. You don't have to be incompetent or negligent to make them. And because they're so easy to make, and because without good documentation or a reproducible workflow it's hard to catch them, they're also unfortunately very common. A key issue in this case study is that the investigators used point-and-click tools like Excel. This prevented peers and independent investigators from catching errors in the analysis until it was too late. And the Duke study is only one example where the critical barrier to reproducibility was the tendency of investigators to use graphical user interface, point-and-click style tools. These interactive tools usually don't record user actions, and because of this they are fundamentally not reproducible.
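To make the off-by-one error concrete, here is a small R sketch. The gene names and scores are invented for illustration; the point is only how deleting a single cell in a spreadsheet column silently pairs every value with the wrong row:

```r
# Hypothetical miniature of the spreadsheet problem: four genes, each with
# a sensitivity score that belongs row-by-row to that gene.
genes  <- c("gene_a", "gene_b", "gene_c", "gene_d")
scores <- c(0.9, 0.2, 0.7, 0.4)

# Accidentally "delete one cell": drop the first score and let the rest
# slide up, padding the end with NA, as a spreadsheet would.
shifted <- c(scores[-1], NA)

data.frame(gene = genes, correct = scores, shifted = shifted)
# Every gene is now paired with its neighbor's score: an off-by-one error.
```

In a point-and-click tool this shift happens silently; in code, the deletion is an explicit, reviewable line that a collaborator can spot.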
But reproducibility doesn't just help others. Consider the following three statements and ask yourself if they sound familiar: Can we redo the analysis with this month's data? Why do the data in Table 1 not seem to agree with Figure 2? Why did I decide to omit these six samples from my analysis? Your closest collaborator is you from six months ago, and you can do yourself a huge favor by using tools that promote reproducibility. And if you're analyzing clinical data, where patient management or public health might be affected based on your findings, that makes it so much more important that you take the steps to ensure that your analysis is documented, free of errors, and reproducible by others. So, we mentioned the idea of R Markdown, and the newest version of R Markdown is called Quarto, so that's what we're going to refer to going forward. Quarto provides us with the features and tools to tackle the reproducibility problem. In Quarto we can craft computer code mixed in with narrative and annotation that documents the purpose of the code and details about the decisions that we made in our analysis. Quarto provides a lab-notebook-style interface for analysis, visualization, and annotation of our work. Like R Markdown before it, Quarto is quickly becoming the gold standard for reproducible data analysis. In this course we'll teach you how to use Quarto, and we encourage you to continue using it consistently in your future work; in fact, it's the strongest recommendation that we'll make to you today. Quarto documents are composed of three basic building blocks. The first is a header, which includes pieces like the name of the document, the document's author, and the desired output format when the document is assembled. The second block is the text. Quarto documents can be marked up in ways that promote readability, with various formatting styles; here you can see large and small headers.
In the main text you can also have bulleted lists, text styled as bold, and hypertext links; there are a number of formatting options. And finally, there are code chunks. Code chunks include R code that can be executed to output results. Within the Quarto document, we can execute a single code chunk by clicking the run code chunk arrow, which looks a little bit like a green play button, which I've highlighted here. Also, with the click of the button called Render, we can turn a Quarto document from the text on the left here into this organized, annotated, and presentable document here on the right. So let's look at this Quarto document in greater detail. Again, the first block is the header section; on the left you can see the input, and on the right you can see the output once the document is rendered. The main block includes narrative text with styling applied. Then we get to the code chunks, shown in gray boxes. Don't worry about the grammar of these particular code chunks at this point. In brief, the first code chunk generates 100 normally distributed values, on which we perform a summary; the rendered document shows us our code and the output of that code. The second code chunk renders a histogram visualization of that data. The result is a neatly formatted document that includes an annotated description of our analysis, the code we used, and the output of that code. So it's your turn again, Your Turn number two. Follow the instructions in Your Turn number two to open a Quarto document, review the format, execute individual code chunks, and then render the document. We'll meet back in three minutes to review this exercise together.
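Putting the three building blocks together, a minimal Quarto document might look something like this sketch. The title, text, and chunk contents here are illustrative, not the exact exercise file from the course:

````markdown
---
title: "Sample Document"
format: html
---

## A Section Header

Some narrative text, with **bold** styling and a [hyperlink](https://quarto.org).

```{r}
# first code chunk: 100 normally distributed values, summarized
vals <- rnorm(100)
summary(vals)
```

```{r}
# second code chunk: a histogram of the same values
hist(vals)
```
````

Clicking Render runs both chunks top to bottom and compiles the header, text, code, and output into one shareable HTML document.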
Thanks so much, Sarah, for letting me know about my mic; I'll put myself on mute for 60 seconds and pop back. So, once the time has elapsed on the timer, we want to get back together and do the exercise together. As Stephan mentioned in the intro, the most important thing we ask you to do with these exercises is to give them a try, but if you're not finished with the exercise when the timer has elapsed, don't worry: we're going to run through them all together. Just stop working on the exercise individually, turn your attention back to the screen share, and we'll go through the steps together. For this Your Turn, in the RMed cloud, I asked you to open a new Quarto document: File, New File, Quarto Document. You have an option here to title your document at the time that you create it, or you can change the title later, so I'm just going to leave it as Untitled and click the Create button. Once I do that, you'll notice that a pane opens on the left, what we referred to as the editor pane. In it you can see the standard building blocks of a Quarto document. We have the header up here, with some data about the title of the document and the format that we're going to output it in. Then there's a text chunk here, which has headers and some text and also a hyperlink embedded. And as we scroll down, you can see that there are code chunks embedded here; again, those code chunks are distinguished in the document by these gray boxes. For the Your Turn, we asked that you review those three elements: the header, the text, and the code chunks. Then execute the code chunks by using the run current chunk arrow; again, that's this green play button here. So if I select that, you'll notice that the code chunk runs and the output is displayed below it: one plus one is two.
Then there's a second code chunk in here, which has some parameters set around echoing. I think it's not important that we think about the concept of echoing at this point; we're just practicing the muscle memory of pushing that run code chunk arrow. Then, after we've reviewed the contents of the document and executed the individual code chunks, we can click the Render button here, and that will run all of the code in the document and compile that neatly formatted document for sharing. So if I click the Render button, it's going to ask me to save this, and I'm going to name it "sample document" and click Save. After I do that, the code will run, and then it'll output this compiled and executed document here in a new tab. To get back to the RStudio Cloud working environment, you'll notice that the Quarto document was rendered as another tab, so we can close that by X-ing out, and that takes us right back to the RStudio Cloud environment. Now that we're familiar with how to create a Quarto document, we can begin the process of performing data analysis in R in earnest, by importing a clinical data set. This diagram may look familiar to you: you saw it in the welcome presentation that Stephan shared, and you'll see it in other presentations throughout the day today. The first step in the data analysis pipeline is to load the data, and then we can begin to tidy and transform that data. In today's course we'll be using a de-identified data set consisting of COVID-19 laboratory test results from a microbiology lab. This data is stored as a CSV file. So what's a CSV file? CSV stands for comma-separated values. A CSV file is a plain-text file, which means you can open it in a text editor and look at it. Here we have a CSV file with the names, medical record numbers, and dates of birth for three fictional patients.
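For illustration, a CSV file like the one described, with a header row followed by three fictional patients, could look like this. All names, medical record numbers, and dates here are invented, not the values on the slide:

```csv
last_name,first_name,mrn,dob
Doe,Jane,5550001,1982-03-14
Smith,John,5550002,1975-11-02
Garcia,Maria,5550003,1990-07-28
```

Each line is one row, commas separate the columns, and the first line names the columns rather than holding data.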
This data structure is called rectangular, because it falls into rows and columns, where each row has the same number of columns and each column has the same number of rows. Also note that this particular CSV file includes what's known as a header row instead of data in the first row. This holds a descriptor of what kind of data is found in each column. CSV files often, but not always, have such a header row. To import our CSV data, we need some additional data analysis tools. We will be leveraging a set of tools called the tidyverse. The tidyverse is a modern set of tools for data analysis in R, and it, like Quarto, is becoming a de facto standard for doing data science with R. The basic tenets of tidy data analysis include that data should be organized in a consistent, standardized way: each row is an observation and each column is a variable. This is a very common way to organize data in a spreadsheet and will sound familiar to you if you've used other point-and-click tools like Excel to organize your data. Programming code that acts on the data should be consistent, concise, and mimic natural language as much as possible. The second component is that each data analysis can be broken down into a series of atomic steps, such as "select this column" or "arrange the data by values in that column." Accordingly, an arbitrarily complex data analysis can be broken down into a pipeline of atomic steps. The tidyverse is a package: a collection of functions, data, and help documentation that we can use to extend the functionality of R. Packages need to be installed explicitly, with the command install.packages. So let's say you want to install a package named tidyverse: you go to the R console and type install.packages, open parenthesis, "tidyverse" in quotation marks. Each package you want to use needs to be installed only once on your computer. However, in order to use the functions or data in a package, you also need to load the package.
This is done with the command library. So to enable all the functions in the tidyverse package, you type library, open parenthesis, tidyverse. Packages remain loaded until you quit R, so every time you start a new session, you have to load each package that you want to use again. On our RStudio Server, you won't need to install any packages for this course; we've pre-installed them for you. But at times in this course you'll need to load packages, and we'll practice this activity later. So these are the first two commands that we've covered: install.packages and library. Once the tidyverse package has been loaded via library, we can import CSV files using the read_csv function. We also have a template for how to use the read_csv function to create a data frame object from a CSV file: you start with the name of the data frame object, then you have this leftward-facing arrow, then the read_csv function, with the file name in parentheses. This code construct is exceedingly common in R, so we want to spend a few minutes exploring it together. read_csv is a function. Remember that functions are defined in packages; we loaded the tidyverse package to be able to use the read_csv function. You may be familiar with functions from math class: a function takes an input, say an x value, and returns an output, say a y value. Functions in computer programming also take inputs and then return outputs; the inputs and outputs here are the arguments and objects that exist within the context of a programming language. For read_csv, the input is the file name of a CSV file, and the output is a data frame with the contents of that file. An input that goes into a function is called an argument, and the arguments to a function get put in parentheses. A function can have zero, one, or even many arguments. If there's more than one argument, we use a comma to separate them, and we'll see examples of that later today.
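The function-and-argument idea can be tried directly in the R console. A base-R sketch (these particular functions are just convenient examples, not ones from the course data set):

```r
Sys.Date()         # zero arguments: returns today's date
sqrt(16)           # one argument: returns 4
round(3.14159, 2)  # two arguments, separated by a comma: returns 3.14
```

In each case, the inputs go inside the parentheses and the function returns an output you can print or capture.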
The read_csv function outputs a data frame; you can think of a data frame like a table. If we want to capture that data frame inside a named object, we need to specify that explicitly. It's a great idea to capture the output of a function into an object so that it can be used as an input for other functions, for example to summarize or visualize the data. To put the output of the read_csv function into a named object, use what's known as the assignment operator. The assignment operator is a less-than symbol followed by a dash or minus sign; it looks kind of like an arrow pointing to the left and is usually read as "gets." So let's put these pieces together to load our COVID data set. This line of code reads: the covid_testing object gets the output of the read_csv function on the covid_testing CSV file. You might notice that one of these is put in quotes and the other one isn't. To be honest, quotes can be quite confusing in programming languages. Names of objects such as data frames don't get quotes; in contrast, literal file names are always put in quotes. This is part of the grammar of coding, what we refer to as the syntax of R, and it will become more familiar to you as you develop proficiency and comfort with the language. So, for the final Your Turn in this session, the exercise will ask you to open a Quarto document and follow the instructions contained in that document to load and explore our data set. You can find this document under the miscellaneous pane: select the folder "exercises," and in that folder you'll see a document that starts with 01-introduction. Open that document and follow the instructions within to complete the exercise. We'll meet back in five minutes and go over this exercise together. All right, so the five-minute timer's up.
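The object-gets-function pattern just described can be sketched end to end. Because read_csv needs the tidyverse loaded, this self-contained demo uses base R's read.csv, which follows the same shape; the file name and its contents are invented for the demo, not the course's covid_testing.csv:

```r
# Create a tiny demo CSV in a temporary file (a stand-in for the course file)
demo_file <- tempfile(fileext = ".csv")
writeLines(c("mrn,result",
             "5550001,positive",
             "5550002,negative"), demo_file)

# The assignment pattern: object_name <- function("file_name_in_quotes")
# With the tidyverse loaded, read_csv(demo_file) would be the analogous call.
demo_data <- read.csv(demo_file)

demo_data          # the named object now holds the data frame
nrow(demo_data)    # returns 2: one row per observation
```

Note the quoting rule in action: the object name demo_data is unquoted, while the file name handed to the function is a quoted string.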
Thanks, everyone, for engaging with this Your Turn. We're going to gather back together, go through it as a group, and fill in any gaps. In this Your Turn, I asked you to find the exercises folder in the Files pane and open it, and within that folder you'll see a number of documents. I asked you to open 01-introduction, which is a Quarto document. Once I click that, the editor pane opens, and in it we have the contents of 01-introduction. You can see at the top we have a header, followed by some text that describes the analysis we're going to do and some of the functions we're going to use, such as read_csv. The instruction is to run the following code chunk. You can see that there's an embedded code chunk; code chunks are set off from the rest of the page by this gray box, and we can run the code chunk by pushing the Run Current Chunk button. You'll notice that R provides us with some feedback here, which is a warning. We'll discuss the concept of warnings later, but for the purposes of this first exercise we can ignore the contents of that warning. And you can see that after I ran the chunk, I now have a data object that's been populated here. The next part of the exercise asks us to inspect the data frame. On the far right, we have our object that's been created in the Environment pane. If I select it, I get a new tab that opens up and looks very much like a spreadsheet: it's got our data arranged with a header row and some column names, and you can see the individual observations listed below. The first question was: how many rows are in the data frame, and how many columns? You can see here, next to the covid_testing object, that we have 15,524 observations.
So those are rows, and 17 variables, which are columns. You can also see that information displayed below the object in the viewer. The second question asks us to try to edit one of the values in this viewer, and you can see that if I try to select any one of these values to change it, I'm able to highlight it but not able to change it. This is an intentional feature, so that we're not able to edit our data in ways that are not traceable and reproducible. If we're going to edit our data, we need to do it with code, so that we have a lineage and a traceable way to follow the changes we've made to our data set. The next question asks us to look at the pan_day column, which represents, starting with day zero, the number of days into the pandemic, and to find the latest result in this table. We can click these arrows up at the top here to toggle the sort direction, so let's sort in reverse, and you can see the last observation in this object occurred on pan_day 107. We can also add filters; there is a filter button right here. Question four asks us to add some filters: how many tests overall were positive? If we inspect the result column, we can see it shows "negative", and if there's a negative in there, I wonder if there's a positive, so I can type in "pos". When I do that, I see that 865 of the 15,000-plus entries in this table were positive. We can also layer filters on top of each other. So, how many tests were positive in the first 30 days of the pandemic? I can adjust pan_day from zero to 30, and when I do that I get a smaller subset of the data that is both positive and has a pan_day between zero and 30: 137 observations out of the total of over 15,000.
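The same counts that the viewer's filters produce can be reproduced with code, which keeps the steps traceable. A minimal sketch using the tidyverse's filter function, assuming the result column stores the value "positive" exactly as written here:

```r
library(tidyverse)

covid_testing <- read_csv("covid_testing.csv")

# How many tests were positive overall?
covid_testing |>
  filter(result == "positive") |>
  nrow()

# How many tests were positive in the first 30 days of the pandemic?
covid_testing |>
  filter(result == "positive", pan_day <= 30) |>
  nrow()
```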
To recap what we've covered in this introductory lesson: we started by defining and differentiating R, the programming language, from RStudio, the development environment, from Quarto, the document format that we use for reproducible analyses. In addition to learning to create and edit Quarto documents, we discussed some basic coding vocabulary, including packages such as the tidyverse, which extend the functionality of R. We use the install.packages function to install packages and the library function to load them into the environment. Functions do stuff on our behalf, accept arguments, and we can store their output in named objects using the assignment operator. And finally, we discussed and practiced importing data using the read_csv function from the tidyverse, and explored this data visually. At the end of each lesson we've included a "What else?" section to introduce you to what else you might want to explore and learn about after completing this course. This includes helpful hints, other functions in the tidyverse, and exciting packages you may want to check out. The data import cheat sheet helps with the grammar of importing data, including other file types and those with other separators, like pipes instead of commas; it's available on the RStudio website. There are a variety of other packages out there designed to deal with different file formats: Excel, SPSS, Google Sheets. There are ones to scrape the web and even handle JSON data. You can also connect to a variety of different databases directly in R to source your data, and if you use any databases at your institution, you may want to check out these tools for making database connections in R. In this lesson we practiced rendering our analysis to an HTML file; in addition to HTML, Quarto increasingly supports rendering to a variety of other formats, and rendering to interactive dashboards is on the horizon.
We're going to look at dashboards in greater detail later in the course. And then finally, the reticulate package provides an interface between R and Python, which means you can mix and match code chunks written in R with code chunks written in Python. R does many things very well, but no programming language is perfect for all tasks, so it's great that R gives us tools to mix languages across different code chunks. That said, for those of you who are new to coding, we do recommend that you focus on a single language to develop proficiency before moving on to another; it shortens your learning curve for both your first and future languages. And with that, I will end the first session. All right, welcome back, everybody, and welcome to the second session, on data visualization. Just one second. All right. To get started, let's take another look at the covid_testing data frame that Joe introduced earlier. You might remember that it had a lot of rows and columns, and each row here represents a single COVID-19 lab test that was run at the Children's Hospital of Philadelphia. Well, not really: this is actually a completely synthetic data set in which none of the entries are actual patient data, but it's modeled to represent the underlying patterns of the data that we saw here at CHOP in the first few months of the pandemic. The pan_day variable represents the day of testing, starting with zero at the beginning of the pandemic, which was sometime in late February of 2020; that seems like a very long time ago. Just by glancing at the first couple of rows, you can see a few things right away. There was only one test done on the fourth day, then two on the seventh, then three on the eighth, and then a whole bunch on the ninth: there's a gradual ramp-up in testing. Also, if you look at the result column, it looks like in these first few days all tests were negative.
So now, with that in mind, I'm going to ask you: what do you think the following plot would look like? Consider this covid_testing data frame we just looked at. What would a plot look like where, on the x axis, you have pan_day, the pandemic day, and on the y axis, you have the number of tests that were performed on that day? What I want you to do is take a few seconds and try to mentally visualize this graph, or sketch it on a piece of paper in front of you, and once you have an idea, click yes to let me know to go on. All right, I see the yeses coming in. We have a lot of yeses, so you have a mental image. What I asked you to imagine is a plot in which we have the count, or the frequency, of tests on the y axis, broken down by pandemic day on the x axis, and this is exactly, Salata, I don't know if I pronounced your name correctly, a histogram. So let's build a histogram of COVID tests by pan_day. What I want you to do now is go into your RStudio console, not the editor that we've used so far but the console, and type in this code, starting with ggplot, exactly as it's written here. Make sure that you really type it exactly as written, because otherwise it's not going to work. When you're done and you actually see a graph, click yes in the reactions to let me know that you've completed this task. One, two, three. All right, looking good. If you're getting an error, just double-check the spelling; parentheses can be very tricky, and they need to be precise. Computer code tends to be unforgiving with misspelled words or punctuation. All right, is this what you get? Great. Actually, when you run this code...
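Reconstructed from the description of the slide, the console code for this first histogram would look like the following; it assumes the covid_testing data frame from the earlier exercise is already in your environment:

```r
library(tidyverse)  # ggplot2 is loaded as part of the tidyverse

# Histogram of test counts by pandemic day
ggplot(covid_testing) +
  geom_histogram(aes(x = pan_day))
```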
You'll get what looks like an error, but it's actually just a message. R lets you know that when you ask it to draw a histogram, you should tell it how wide each bin should be, or how many bins there should be, because this affects the granularity of the data displayed; if you don't, it will just pick a default value for you. I'm pointing this out for two reasons. The first is that this is actually a useful message, but R tends to present these messages in fire-truck red, and beginning programmers often feel like, oh my gosh, I did something wrong, this is an error. If something is an error in R, it will always say "Error". If it doesn't say "Error", then it's a message or a warning, and this is really just a message. So if you see something in fire-truck red, it doesn't necessarily mean that you did something wrong. The second thing I want to point out is that oftentimes R tries to come up with a reasonable default that will work for you, and then complains that something is missing. So even though we didn't tell R in this case how many bins we wanted, it still created a graph for us, but then it also complained about it: yeah, you should really tell me the number of bins, or you should tell me the bin width. This is a useful message, and it gives you a sense for how R behaves when you interact with it. So, when I asked you to imagine what this plot might look like, the number of COVID tests performed on a given day over time, you might have imagined something like this: very few tests being run at first, maybe because the pandemic hasn't really started yet, and also maybe because the test isn't broadly available.
At some point the number of tests ramps up significantly, and it then remains at a high level. But this simple visualization tells you so much more than that general shape. You can see, for example, that by around day 30 the ramp-up settles, and it shows you a few things you may not have expected and may want to look at later; for example, there's something going on with test volumes going up and down after the 60th day here. None of this would have been apparent from looking at a data frame or spreadsheet with 15,000 rows. The point of this exercise, and it may seem trivial but it's actually quite profound, is that visualization really is one of the main engines of knowledge generation. Visualization is one of the main tools you have in your tool belt as a data analyst to understand what's going on with your data. If you don't visualize your data, you might have some ideas about it, but usually they're incomplete and oftentimes wrong. So you really want to be able to visualize data quickly, and this is where ggplot comes in. ggplot is a package for creating graphics in R. It's part of the tidyverse, so it gets loaded when you load the tidyverse package with library(tidyverse), and ggplot provides a "grammar of graphics" for data visualization. The idea that something has a grammar is actually pretty common in R, especially in the tidyverse; this idea of grammar gets thrown around a lot, and we'll hear it again later today. Essentially, the idea is that there should be a consistent way of doing something, and for ggplot that means there should be a consistent way to build any type of graph.
Having a consistent way to make any type of graph makes it easier to learn, and also easier for humans to read the code later and make sense of it. That's super important, because most people who use R are not programmers; they don't spend most of their time in a code editor. The grammar of graphics basically boils down to this idea: you should be able to specify any type of graph by specifying the data that goes into it, the type of graph that you want, and a mapping that explains how the data from the data frame should be represented as visual marks on that graph. We'll come back to this definition in just a second, but having a consistent grammar means that once you learn how to make a histogram, you can apply that knowledge to make a scatter plot or a box plot with little extra effort; you don't have to learn all kinds of different plots using completely different syntax. This makes it easy to generate lots of different graphs quickly, and it helps you understand your data more quickly. I also wanted to point out that ggplot graphs look great; I use them all the time for reports, and you can generate publication-quality plots with ggplot, no question. So, here's a quick analysis of the code that we just typed into the console to make that histogram. You can see that we give it a data frame, in this case our covid_testing data. We specify the type of plot, or the geom, in this case a histogram. And we specify an aesthetic mapping: in this case, we're saying that we want the x axis to represent the pan_day variable of the covid_testing data frame. For a histogram, I don't have to specify the y axis, because in a histogram the y axis is always the count of cases, which for covid_testing is the count of tests in a particular bin.
A few additional details about this code: in ggplot, you always start the plot with an invocation of the ggplot function, and you connect the ggplot function to the geom function with a plus sign, which you usually put at the end of a line, followed by a new line. And all the mappings, and we'll talk in more detail about what mappings are, go inside the aes function; aes stands for aesthetic mappings. That's a lot of information, so let's consolidate it into a more general template that you can use to make your graphs. Here is a template for making any kind of graph with ggplot2. You start with the code that's written in black here, the constant part, and you fill in the details that are written in blue. The first detail is a tidy data frame, and this contains the data that you want to plot. What do I mean by a tidy data frame? The idea of "tidy" comes up a lot, including in the name of the tidyverse, but here's the definition. A data set can take on a lot of different shapes, but there's one shape that is best suited for data analysis, and that shape is called tidy. A data set is tidy if, number one, each variable is in its own column; number two, each observation is in its own row; and number three, each value is in its own cell. The opposite of tidy is often called messy, and oftentimes a lot of data analysis work is converting messy data into tidy data. Fortunately for us, the covid_testing data set is tidy already, because it complies with these requirements: each variable, such as pan_day, is a column; each observation, here one lab test, is one row; and each value is in its own cell, so we don't have any cells with multiple values in them. Okay, so we pick a tidy data frame; that's the first step. The second step is to pick a geom function, and this is how you tell R what kind of plot you want it to make.
We'll go into more detail about what geom functions are, but for now, just know that you need to tell ggplot what type of graph you want, and you do that by picking the right geom function. Here are a few useful geom functions for visualizing the kind of data we see in clinical practice, but there are many more. With these six, you can make histograms, bar plots, scatter plots, dot plots, box plots, and line graphs. Okay. So we picked a tidy data frame, and we picked a geom function. The third and last step is to write what are called aesthetic mappings, and this is where you tell R how you want the columns of the data frame represented as graphical markings on the plot. So what's an aesthetic, and what's an aesthetic mapping? An aesthetic is a thing that you can perceive about a specific data element on a graphic: its position on an x-y grid, but also other features, for example its color. An aesthetic mapping is how you tell R how you want to represent the columns, or selected columns, of a data frame on the plot. Okay, let's look at an example. Consider this data frame with three columns, a, b, and c, and this x-y grid here: this is my graph. This aesthetic mapping here defines that the x position of the graphical marking should come from the a column, the y position should come from the b column, and the color should come from the c column. That gets us the following graph, which visually encodes all the information from these three columns of the data frame. R automatically figures out things like axis limits and the color scale, and you can manually fine-tune all of this. Again, another example of R coming up with a reasonable default, and then you can fine-tune the heck out of it.
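As a sketch, the three-column example just described might be written like this; the toy data frame and its values are invented here for illustration, only the column names a, b, and c follow the slide:

```r
library(tidyverse)

# A tiny tidy data frame with three columns, as in the example above
df <- tibble(
  a = c(1, 2, 3, 4),
  b = c(2, 4, 1, 3),
  c = c("x", "x", "y", "y")
)

# Map x position to a, y position to b, and color to c
ggplot(df) +
  geom_point(aes(x = a, y = b, color = c))
```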
So this is what aesthetic mappings are: essentially a list of these kinds of equations that connect features of the graph with columns from the data frame. Aesthetics are a powerful and fundamental concept in the grammar of graphics, so let's do a quick Your Turn to explore and consolidate this idea a little further. We looked at x-y position and color; are there any other aesthetics you can think of? Type your answers in the chat. Yeah, shape, shape is definitely an aesthetic. Not sure what you mean by sex; I think sex may be a column in a data frame that you're thinking of, but that's not an aesthetic. Group is an interesting idea; I'm going to come back to that later. Width, yes, width could be an aesthetic. Order and fonts are not actually aesthetics, and I'll explain why. Fill, yes. Size, shape, fill, yep. Okay, alpha, that's a good one. All right, this is great, thank you so much for these answers. There are actually a lot of different types of aesthetics, and they also depend on the kind of plot you're making: for a line graph, you can define the line width and line type, and for a scatter plot, or any kind of dot plot, you can define the shape of the dots. Some of the things you pointed out, like font, are not features of specific data elements that come from your data; they're features of the overall graph, so they wouldn't be considered aesthetics. This is where the definition gets subtle: an aesthetic is really something that can be encoded with values from a column, whether categorical or continuous.
Picking the best aesthetics for your graph is as much an art as it is a science, and I'm going to recommend a reference, Fundamentals of Data Visualization, which is a great introduction to the topic of how to create great graphs; I'll talk about it at the end of the session. All right, let's recap. To make any kind of graph, you start with this template and you fill in the blue parts: you choose a tidy data frame, which contains the data you want to plot; you choose a geom function, which is the type of plot you want to create; and then you write your aesthetic mappings, where you map data columns to position, color, and other features of the graph. Okay. Let's go back into the training environment and open the 02-visualize Quarto document, and I want you to work through the exercises of the section titled Your Turn 5. There's going to be a big block that says STOP HERE, and I want you to stop there and not keep going, because that's for later. I'm going to give you five minutes to work on this, and then we'll come back together and I'll live-code things. If you're done early, let me know by clicking yes. All right, I'm going to start a timer now. So, the timer's up. Let's all come together and look at the exercise together. Here I'm opening 02-visualize.qmd. Okay, so this is our Quarto document, Your Turn 5, and I'm being asked to run the following code chunk, which is all stuff that you've seen before. We're going to load the tidyverse, and we're going to load another package called lubridate, which is a package that helps with formatting dates. And then here we have: covid_testing gets read_csv of covid_testing.csv.
This is all review, and we have the covid_testing data frame in the environment, so this worked. The first thing we're asked to do is recreate the histogram of pan_day. We're basically retracing the steps of filling out the ggplot template with the three pieces we need to put into it: the data set, which is going to be covid_testing; the geom function, which is going to be geom_histogram; and an aesthetic mapping, in this case making the x axis the pan_day variable. What I have here is a fill-in-the-blanks kind of exercise, which actually makes it really easy to fill things out. So, my data set is going to be covid_testing. And here I want to point out something that just happened: the RStudio editor really tries to be helpful when you're coding. I just started typing "covid", and this little box here shows up. This is the RStudio editor saying, hey, I think I know what you're about to type, because there is an object in my environment that starts with those letters. What I can do now is hit the Tab key, and this will auto-complete my variable name. Auto-complete is very powerful, not only because it can speed you up but also because it can reduce errors. So I highly encourage you to use auto-complete a lot: when the editor offers you something, like auto-completing the name of a data frame, hit Tab to complete it. Okay, so now we've picked the data set. Next we want to choose the geom function: that's going to be geom_histogram, and it goes down here. So, what just happened? This is actually another reason why auto-complete is so awesome, and you may have wondered why all of these geom functions start with the same prefix.
The reason is auto-complete, because of what happens when you start typing the name of a function: the RStudio editor shows you auto-complete options. In this case there are many options, and it's smart enough to know that right now we're not looking for a data frame, we're looking for the name of a function. As I keep typing, it filters down to the only function it knows of that makes sense in this context, so I can get to the point where I only have to write "geom_his" and can hit Tab. The other thing that's so cool about this is that in addition to the auto-complete box, you also get a small help window, and that window tells you not just what the arguments to the function you're typing are, but also the beginning of the documentation. From this, I can find out that geom_histogram is for making histograms and frequency polygons, it starts telling me how to use this function, and it suggests I press F1 for additional help. Auto-complete is incredibly powerful, not only to speed you up and help you make fewer errors, but also to discover functionality without having to look inside a manual or look things up on the web. So I really want you to remember to auto-complete things. Okay, so now we've picked the data set and the geom function. Finally, we want to write in the aesthetic mapping. The aesthetic mapping goes inside the aes function, and we want to map x to pan_day. Okay, let's see what happens. Oh, okay. So we get a histogram, and a message that's maybe a little less obnoxious than when we wrote this in the console, because it's not in fire-truck red, but it's the same message from R.
We were supposed to pick our own value for how many bins to use, or what the bin width of this histogram should be. That's actually a good idea, and we want to do that now: in this next exercise, we fill in the code for a modification of the previous call where we actually specify the bin width. Okay, so here we just type in covid_testing, and we have geom_histogram, with x equals pan_day. All right. So, what is the bin width I'm supposed to use if I want to get daily test volumes? Anybody, type it in the chat. Okay, yes, I see a lot of ones, and that's correct, because the variable we're binning is pandemic day, so if we look at one day at a time, we get daily test volumes. Let's see the effect of doing this. The effect of reducing the bin width is that the data is shown in much more granular detail, and all of a sudden we also see this interesting pattern here. Does anyone want to venture a guess and type in the chat what they think these troughs may indicate? Yes, exactly, those are weekends. If you wanted to confirm that these are weekends, one simple way would be to check whether these troughs happen in steps of seven, which they do. That's exactly right, and this is very typical for laboratory data: we just have less volume on the weekends, so we can totally expect this pattern. Okay, let's next add some color. Here we're asked to copy and paste the previous code chunk, and I didn't write this exercise this way because I'm lazy and didn't want to write another skeleton of a code chunk.
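The modified call with an explicit bin width, as reconstructed from the walkthrough, would look like this; it assumes covid_testing is already loaded:

```r
# One bin per pandemic day: daily test volumes reveal the weekend troughs
ggplot(covid_testing) +
  geom_histogram(aes(x = pan_day), binwidth = 1)
```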
But copying and pasting code is something that you'll do all the time when you're working in R, or working with code generally. Copying and pasting, even though it's very simple, works exactly the same way as in Microsoft Word: I'm highlighting everything that's on the gray background, and you can go to Edit, Copy, or hit Command-C, or Control-C if you're on a Windows computer, and then you can go to Edit, Paste. All right, I did it with a keyboard shortcut, and here I have a new code chunk. Now that I've copied and pasted the previous code chunk, I can make a modified version of it, and I'm supposed to add an aesthetic mapping that maps the fill aesthetic to the result column. Remember that when you have a function with more than one argument, the arguments need to be a comma-separated list, so I need to write a comma and then fill equals result: two arguments passed to the aes function here. And this is what we get. R automatically picks a color scheme, maybe not the greatest one, but it picked one where our positive tests are in blue, our negative tests are in green, and the invalid test results are in red. All right, let's spend the next few minutes on an important conceptual distinction about how you can define aesthetics in your plot; actually, this has come up in the chat already, so it's a great segue into this next section. Consider this plot. It's the same as the one you created at the very beginning of the session, right, except the bars are blue, not black. The difference is the fill aesthetic, which we've just played around with: we used it to map the fill to the result column. But this isn't really a mapping, right, because all the bars are the same fill color.
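The mapped version built in this exercise, sketched out; the comma-separated second argument to aes is the part being added:

```r
# Mapping fill to the result column: each result value gets its own color
ggplot(covid_testing) +
  geom_histogram(aes(x = pan_day, fill = result), binwidth = 1)
```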
So the bars don't represent the values of a variable in the data frame; instead, we're setting the fill to a constant value, the color blue. Let's see how we can do this with ggplot. In the exercise, we mapped the fill aesthetic to the result column by writing fill = result inside of the aes function, which I'm highlighting here in red. The general rule is that if you define an aesthetic inside of the aes function, then that aesthetic gets mapped to a variable, that is, to a column. So: inside of aes, it's mapped to a variable. If the aesthetic is defined outside of the aes function, as in this case here, where we pass only x = pan_day to the aes function and then add a comma and define the fill aesthetic inside of the geom_histogram function but outside of the aes function, then it gets set to a constant value, like blue. R knows a lot of different colors by their English names, with one caveat: color names are one of those things that need to be put into quotes. To recap, setting versus mapping aesthetics: on the left here, we're mapping the aesthetic by defining it inside of the aes function, so you can see the closing parenthesis comes after the fill, and that leads to a mapping of the aesthetic to a column in the data frame. If we define it outside of the aes function, but still inside of the geom function, then the aesthetic gets set to a constant value. All right, that was setting versus mapping; very important. Let's talk about geom functions next. We briefly looked at geom functions earlier, and you might now appreciate how they make it so easy to switch one type of graph for another, but let's dive a bit deeper into the concept. Consider these two plots. How are they similar? Type the answer in the chat. How are these two plots similar? All right, same data, exactly.
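The constant-fill version described above, as a sketch; note that fill sits outside aes() but inside geom_histogram(), and the color name is quoted:

```r
# Setting fill to a constant: every bar is blue, regardless of the data
ggplot(covid_testing) +
  geom_histogram(aes(x = pan_day), fill = "blue", binwidth = 1)
```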
Yeah, both display distributions, yes. Same x and y, same axes, exactly. So the axes are the same and the data are the same. What's different? What is different is that on the left the data are shown as a histogram, and on the right they're shown as what's called a frequency polygon. A geom function is a function that, given the data and the aesthetic mappings defined with aes(), generates a geometric object to represent that data. So let's go back to the Quarto document and work through the exercises from the section titled "Your Turn 6" all the way through the end of the document. Again, let me know if you're done early. We'll give ourselves five minutes to work on this independently and then come back together and live-code it. All right, let's come back together and look at the rest of the exercises in the Quarto document. First: run the following code chunk. Easy enough. Okay, this looks like something we've seen before. Now try to figure out how you would modify the code so it draws a frequency polygon instead of a histogram. I'm not going to retype all of that code, because my gut tells me it's going to be very similar to the code I already have. Specifically, I think I'll just have to use a different geom function. But I don't actually know the name of the geom function that makes a frequency polygon. I could Google it, but instead I'm going to see if I can use autocomplete to guess the correct function. Maybe the frequency polygon function starts with an "f", something like "frequency polygon" or "f poly". So I'm just going to type "f", and here I already see something that might fit the bill: geom_freqpoly. And that is indeed the correct function. This is just to demonstrate one more time
how powerful autocomplete is for discovering functionality you want to use, and how RStudio really helps you be efficient this way. Okay, let's see whether I actually did this correctly. Yes, I did. So here we have our frequency polygon. Next: modify the previous code chunk so the line color is blue. Okay, so if I want the line color to be blue, is that a setting or a mapping? Type it in the chat for me, please: setting or mapping. This is a setting, and why is it a setting? It's not dynamic. It's the same for all the data, not a different color for each outcome. Yeah, that's exactly right. I'm setting it to one thing, blue; I'm not mapping it to some kind of variable. Perfect. And if it's a setting, does it go inside or outside the aes() function? Thank you, Victoria. Outside, perfect. So that will be color = "blue". Let's see if that works, and it did. Okay, awesome. "But inside the geom function." Yeah, that's exactly right. Okay. And what do you think the following code will do? Try to predict what you'll see. Okay, what this produces looks kind of weird. It looks like a Franken-geom, a Franken-ggplot, because it looks like we're grafting a second geom function on top of the first one. And that's exactly what's happening here: first we have a geom_histogram(), then there's a plus sign afterwards, and then a geom_freqpoly(). What we see is one plot with an overlay of these two geoms. This is actually a super powerful concept in ggplot2: you can compose arbitrarily complex plots by putting pieces, or layers, of them together. We can combine different kinds of geom functions, or other types of functions that I'll talk about a bit at the end of the session, to fine-tune the heck out of a graph, as we said earlier.
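The layering idea above can be sketched in a few lines. The data frame here is made up for illustration; the pan_day column name is assumed from the session.

```r
library(ggplot2)

# Made-up counts per pandemic day, just to have a distribution to draw
df <- data.frame(pan_day = rep(1:10, times = c(2, 3, 5, 8, 9, 7, 6, 4, 3, 2)))

# Two geom layers composed with `+`: the frequency polygon is drawn
# on top of the histogram, and both share the same data and mapping
p <- ggplot(df, aes(x = pan_day)) +
  geom_histogram(binwidth = 1) +
  geom_freqpoly(binwidth = 1, color = "blue")
```

Each `+` adds another layer; you can keep stacking geoms, scales, and themes the same way.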
All right. Let's first recap what we've done at this point, and I know we've gone through a lot. The slides will be available to you, and I'll suggest further resources at the end, so if you feel a little overwhelmed right now, that's totally fine, and I promise it gets better with practice. So let's recap what we talked about in this session. We talked about ggplot2, an R package that provides a grammar of graphics. The idea of ggplot2 is that you should be able to create any type of plot using a simple template, to which you provide, number one, a tidy data frame, one in which each variable is in its own column and each observation is in its own row; number two, a geom function, which tells R what kind of plot to make; and number three, aesthetic mappings, which tell R how to represent data as graphical markings on the plot. And we talked about how an aesthetic can be mapped to a variable or set to a constant value. All right. In this last part of the session, I want to show a few additional things you can do with ggplot2, without going into too much detail. To save a plot you've created in the console, you can go to the Plots tab in the pane on the bottom right of the RStudio window, click the Export button, and choose "Save as Image". This way you can save graphs that you typed into the console. But I hope you'll make most of your graphs inside an R Markdown or Quarto document, and if you do, you can still save the image to your computer by right-clicking on it and clicking "Save image as". So that's how you can manually save the plots you've made, for example to put them into your PowerPoint presentations.
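Beyond the point-and-click route described above, ggplot2 also provides a ggsave() helper for saving plots from code; it wasn't demonstrated in the session, so treat this as a supplementary sketch (the file name is arbitrary).

```r
library(ggplot2)

p <- ggplot(data.frame(x = 1:5), aes(x = x)) +
  geom_histogram(binwidth = 1)

# ggsave() writes a plot to disk; the file type is inferred from the
# extension, and width/height are in inches by default
ggsave("my_plot.png", plot = p, width = 6, height = 4)
```

This is handy when you want reproducible figure export instead of clicking through the Export dialog each time.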
We've only barely scratched the surface of what you can do with ggplot2. For example, you can change how overlapping objects are arranged: we've been working with a stacked histogram, but you can also make side-by-side bars. You can use different themes, which affect how non-data elements such as axes, grid lines, and the background appear. Here we have a different theme: all the data look exactly the same, but the background is different. You can customize your color scales. To be perfectly honest with you, this is a little more complicated than it needs to be, but there are lots and lots of examples; simply search for "customized color scale ggplot" and you'll get plenty of code examples you can use. It's a little less user-friendly than it should be. You can facet your plot, which means breaking it into sub-plots by another variable, for example gender or location in the hospital, and this is a very powerful way to visualize high-dimensional data. You can easily make radial plots or geographical maps. And you can add titles, subtitles, and annotations, and change the axis labels or their appearance or fonts; all of that is possible. And all of these many manipulations, whether position adjustments, themes, color scales, facets, coordinate systems, or text added to your plot, can be added in exactly the same way that we added a second layer to the Franken-plot in the very last exercise: by writing a plus sign after your last ggplot command and following it with a theme function, a scale function, a facet function, or a coordinate function. All of these functions are part of the ggplot2 package.
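Two of the extensions just mentioned, themes and facets, can be sketched like this. The data frame and column names are invented for illustration.

```r
library(ggplot2)

# Made-up data: pandemic day plus a second variable to facet by
df <- data.frame(
  pan_day = rep(1:5, 4),
  gender  = rep(c("female", "male"), each = 10)
)

# Same histogram, restyled with a theme and split into sub-plots
p <- ggplot(df, aes(x = pan_day)) +
  geom_histogram(binwidth = 1) +
  facet_wrap(~ gender) +   # one panel per gender
  theme_minimal() +        # restyles non-data elements only
  labs(title = "Tests by pandemic day", x = "Pandemic day", y = "Count")
```

Note that every customization is just another `+` layer, exactly like adding a second geom.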
The ggplot2 cheat sheet, which I've linked to in the resources, is nice to have on hand when you're exploring your data. It reviews the basic template for building any type of plot: data, a geom function, and mappings. I didn't make this up; this is the canonical teaching about ggplot2 that we're promulgating here. It also lists useful geom functions for one variable, two variables, and so on and so forth, so it's good to have on hand. If you'd like to learn more about which kinds of graphics are most effective in specific situations, I recommend looking at Fundamentals of Data Visualization by Claus Wilke. It's a very readable and recent primer on data visualization and figure design, and it's available for free at the address shown on the slide. There's no need to write it down, because I've also listed it in the resources on the course website. The survminer package extends ggplot2 to make it straightforward to create survival curves and risk tables. The gt package provides a grammar for creating, not graphs, but display tables, the kind you might want to show in a publication or a regulatory submission. And the gtsummary package makes it trivial to generate publication-ready tables from tidy data, and makes things like computing summary statistics super simple. In fact, Daniel Sjoberg, the creator of gtsummary, taught a gtsummary workshop at R/Medicine this morning, and if you missed it, you'll be able to watch a recording that will be posted on the R/Medicine website. So definitely check out gtsummary if you're interested in making a table that looks like this. All right, welcome back, everyone. Can everyone see the PowerPoint screen? Yes? Awesome. All right.
We're a little behind schedule, but I'll do what I can to catch us up. For this next section, I'm going to cover tidying and transforming data. When we have a data set, we don't always receive it tidy. The data set we have happens to be tidy, so there won't be as much transformation or pivoting needed to make it tidy, but there is additional work we tend to do frequently to get a data set ready to visualize and model. We're going to focus on some functions in the dplyr package, which is a package intended to cover data transformation activities and a lot of bread-and-butter ways to transform data frames. The nice thing about this package is that the functions in dplyr use a consistent syntax, a grammar, for transforming data frames. A lot of these functions are borrowed from concepts in Structured Query Language, or SQL, which is a common, accessible language for querying databases; if you're familiar with SQL, you might actually recognize some of the names of these functions, which offer equivalent functionality. The first function we're going to talk about is select, and this slide shows a nice quick illustration of how select works. It's essentially a mechanism to extract specific columns from a data frame. When you use select on a data frame, you expect the output data frame to have the same number of rows but a decreased number of columns. The syntax is very straightforward: you call the select function, your first argument is the data frame you're going to transform, and your subsequent arguments are the names of the columns you wish to extract. So in this case, we're taking our covid_testing data frame and applying the select function to it.
We're specifically pulling out the mrn and last_name columns from that data frame. If we take our covid_testing data frame and select mrn and last_name, this is the resulting data frame. So, a quick quiz here; you can answer in the chat: which of these would successfully select the first_name column from our covid_testing data frame? Okay, great, I see lots of rapid-fire B's. That's correct: select on our covid_testing data frame and then first_name, and notice that the spelling is exactly the same, same case, as what we see in the data frame. For bonus points, anyone want to guess in the chat what C here does? If you put a minus in front of first_name, what do you think might happen? Right, that would eliminate it; I see Solana and many other folks chiming in that this would remove the first_name variable. So that's a nice shorthand: if you want to drop a single column, you can use that minus syntax. Select is a really quick, easy function to cover, and it's a convenient one, especially if you have a very wide data frame. Maybe you pulled something from your electronic health record and it's 75 columns wide and you only need a few of them; select is one function you can use to really trim that down and work with a more manageable data set. And just quickly: yes, if you take that select call and use the assignment operator, the backwards arrow <-, to store the output into a new object, that smaller data frame is available to you from that point on. That's a nice convenient way to decrease the size of your data frame.
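Here is a runnable sketch of the select calls just described. The covid_testing data frame below is a tiny toy stand-in for the workshop data, with invented values.

```r
library(dplyr)

# Toy stand-in for the workshop's covid_testing data frame
covid_testing <- data.frame(
  mrn        = c(1001, 1002, 1003),
  first_name = c("Arya", "Jon", "Sansa"),
  last_name  = c("Stark", "Snow", "Stark"),
  result     = c("positive", "negative", "negative")
)

kept    <- select(covid_testing, mrn, last_name)  # keep only these columns
dropped <- select(covid_testing, -first_name)     # keep everything EXCEPT first_name
```

Both calls return a new data frame; covid_testing itself is unchanged unless you assign over it.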
Filter is the next function we'll cover, and it's a little more interesting. The idea behind filter is that you can extract the specific rows of your data frame that meet some criteria. The output you expect from filter, if you take it and put it into a new object, is a data frame with the same number of columns but a smaller number of rows. The syntax works like the other functions in dplyr: your first argument to filter is the data frame you want to transform, and you follow that with some logical test. The way this operates is that for every single row, filter evaluates whether the logical test you supplied as the argument is true, and if it is, it returns that row, carrying it over into your output. As you can see here, conceptually you have a data frame where only some of the rows meet the criteria you put in, and only those rows are output from the function and can be put into a new data frame. As an example of the kind of logical test you might apply: you might have a column name and some specific criterion, and you're looking for equality, so you use two equal signs as the logical test to say that the column is equal to that criterion. Let's make that a little less abstract. We have our filter function here, with our covid_testing data set as the first argument. Our second argument is a logical condition: mrn, two equal signs, and this specific number. When you execute this, it goes through and evaluates every single row, determining whether the condition is true or false, and if it's true, it carries that row over into the output.
The same strategy works for characters if you want to test equality against a specific string. In this case, we have our covid_testing data set, and our logical condition is last_name equal to "Stark". Note that this is a string in quotes; Stark is not something R will recognize on its own, it's a string, so you put it in quotes and evaluate whether last_name is equal to that value. And for this specific data frame, this is the output you would expect from evaluating the filter. Equality is one logical test, but there are actually several others you can use. If you're setting conditions on a set of numeric values, which is very common, for example with lab data, you might want to see whether values are outside the reference range; you might use less-than and greater-than to look for values that fall outside some range. We covered equals; there's also less than or equal to, greater than or equal to, and not equal to. There are some additional conditions that can be useful as well, but the rule is: if it's a logical test in R that results in a true or false, it's a candidate to be put into a filter function. All right, another quick quiz for this function: which of these would successfully filter the covid_testing data frame to tests with positive results? We're seeing lots of C's here, right: filter covid_testing where result equals "positive". The result column is a set of strings, so we're matching a string here and put quotes around "positive" to test for equality. Great work, everyone. Any questions about select and filter? Those are some of our rapid-fire functions.
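The logical tests listed above can be sketched side by side. The data frame and its values are invented; the rec_ver_tat column name is assumed from the workshop data.

```r
library(dplyr)

# Toy stand-in with invented values
covid_testing <- data.frame(
  last_name   = c("Stark", "Snow", "Stark"),
  result      = c("positive", "negative", "negative"),
  rec_ver_tat = c(30, 2, 10)  # receipt-to-verify TAT in hours (made up)
)

pos      <- filter(covid_testing, result == "positive")  # string equality
long_tat <- filter(covid_testing, rec_ver_tat > 24)      # numeric comparison
not_snow <- filter(covid_testing, last_name != "Snow")   # "not equal to"
```

Each call keeps only the rows where the logical test evaluates to TRUE.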
So now we transition to talking about pipes. First, someone in the chat made a great point: select and filter don't actually change your object. If you look at covid_testing before and after you run a select or filter command, it still has the same number of rows and the same number of columns. Correct, yeah, that's a great point. One of the keys here is that when you use these functions, they output something; they don't act on the original data frame. If you want to do something with the output, you need to store it in another object. This is also a common pattern as we get into the next section: when we use pipes, we arrange these functions into a sequential order, and very often we'll take the output of that whole sequence of functions and put it into a distinct object. "What is the syntax to use select and filter in one chunk?" Samuel, we'll get to that right now in the next section. "How can you select distinct values?" As was brought up in the chat, there's a distinct() function that can pull out distinct values based on some set of variables. All right, let's move on to pipes. Just out of curiosity, quick show of hands, or quick yeses in the chat: who uses pipes routinely, or is familiar with pipes and knows how to use them? I see a few yeses; okay, quite a few, although not everyone is familiar with these, so this is a good topic for discussion. For dplyr functions, and for many other functions,
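Since distinct() was only mentioned in passing, here is a minimal sketch of it; the data frame and clinic names are invented for illustration.

```r
library(dplyr)

# Toy stand-in with invented clinic names
covid_testing <- data.frame(
  clinic_name = c("picu", "picu", "emergency dept", "oncology ward"),
  result      = c("positive", "negative", "negative", "positive")
)

# distinct() keeps one row per unique combination of the named columns
clinics <- distinct(covid_testing, clinic_name)
```

This is a quick way to see, for example, which clinics appear in a data set at all.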
it's pretty common to want to put multiple functions together and execute them sequentially in some order, taking the output of each step forward. If you're familiar with bioinformatics and genomic data, this is commonly referred to in that context as a pipeline: a sequence of activities you perform on a data set that results in some output. R actually has a very similar construct using a pipe operator. The general idea behind this operator is that it takes the thing on the left side of the pipe and puts it into the first argument of the function on the right side. In this example, we have the filter function, our covid_testing data set, and some logical condition. Instead of putting covid_testing inside the filter function, we can actually put it in front of the filter function and pipe it in, as shown here. This is the original way to call the function, and this is another way to call it. You'll see the value of this shortly, but these two function calls are equivalent: covid_testing is being piped into the first argument. So if, for example, we wanted to run multiple functions in sequence, we might take our covid_testing data set, filter it to just the results from the first 10 days of the pandemic, and then from that select out just the clinic_name variable, to see which clinics were ordering tests in that first time period.
The confusing thing about that nested layout is that you have to read it from the inside of the nested functions outwards; it's not very easy to read, and the order of operations isn't intuitive unless you're very familiar with it. Instead of nesting the functions that way, an alternative is to use pipes: you take your data frame, pipe it into one function, in this case the filter function, and the output of that can then be piped into another function. This lets you lay out multiple functions sequentially. Here we're just focusing on these dplyr functions, but you can imagine chaining together additional data manipulation steps with these pipes, and it's very easy to read because you can sequentially follow what's happening to the data set across each line. Just one note: in very recent versions of R, I think R 4.1 and above, you may see a newer pipe, the native pipe. Historically, R did not have the concept of a pipe as part of the base language. There was a package called magrittr, incorporated into the tidyverse, that provided the pipe you see called here. Since then, the R language has recognized the value of this concept and has developed a base R pipe that looks a little different, two characters instead of three, but it is equivalent. There's also a keyboard shortcut to type the pipe; whether you get the native pipe or the original pipe may depend on your R version and settings, but I think the shortcut is Cmd+Shift+M, if I remember correctly. So, let's do a quick quiz: which of the following is equivalent to this line here?
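The nested-versus-piped contrast can be sketched concretely. The data frame is a toy stand-in with invented values; the magrittr pipe %>% is used here (the native |> behaves the same for this call on R 4.1+).

```r
library(dplyr)

# Toy stand-in with invented values
covid_testing <- data.frame(
  clinic_name = c("picu", "emergency dept", "picu"),
  pan_day     = c(3, 12, 7)
)

# Nested: has to be read inside-out
nested <- select(filter(covid_testing, pan_day <= 10), clinic_name)

# Piped: reads top-to-bottom; each result becomes the next
# function's first argument
piped <- covid_testing %>%
  filter(pan_day <= 10) %>%
  select(clinic_name)
```

The two produce identical results; the piped version just states the steps in the order they happen.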
Great, I'm seeing multiple D's here. Great job, everyone. So this is data piped into select, which means that when we use the pipe, what you see on the left-hand side becomes the first argument to the function. That pipe concept is useful because now we can think not only about trimming our data set down, carving out specific rows or specific columns, but also about adding columns, expanding our data set, and we can do this in sequence with other operations using the pipe. The function we call to create new calculated columns is mutate. The output we expect when we call mutate is a data frame with the same number of rows but an increased number of columns, and the column or columns we create are the result of some calculation that we provide. For the syntax, I've transitioned here to the pipe syntax: we take our covid_testing data set and pipe it in, so it's the first argument, the data frame, to the mutate function. Within the mutate function, the argument after the data frame names a new column, assigned using a single equal sign. Remember, this is distinct from the double equal sign we use when testing for equality; this single equal sign acts more like an assignment operator. Then we provide some calculation that tells R what to put into the new column named on the left side of the equal sign.
Here's a very straightforward example with our testing data set. We haven't really talked about the turnaround times you might be seeing in the data set, but each row represents a test order and result, and each result has turnaround times associated with it, expressed in hours. We have a collection-to-receipt turnaround time and a receipt-to-verify turnaround time. For those not as familiar with lab terminology, collection-to-receipt just means the time from when the sample was collected to when it actually arrived in the laboratory, and receipt-to-verify means the time from when it was received in the laboratory to when someone signed off on the result and sent it out the door to whoever will be using it. In this case, we're using mutate to create a new column, col_rec_tat_mins, which gets the output of this calculation: col_rec_tat, our collection-to-receipt turnaround time, times 60. The original is expressed in hours, so we're just multiplying by 60 to create a column that expresses the same quantity in minutes. And again, with this mutate function, the calculation is applied very similarly to our filter function in that it's evaluated for every row. The difference is that instead of selecting or pulling out specific rows based on the evaluation, we're using the calculation to create a value for every single row: mutate goes through sequentially, performs this calculation for each row, and puts the output into our new column here.
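A runnable sketch of that mutate call, with a toy data frame; the column names col_rec_tat and col_rec_tat_mins follow the workshop's naming but the values are invented.

```r
library(dplyr)

# Toy stand-in: turnaround times in hours (invented values)
covid_testing <- data.frame(
  col_rec_tat = c(1.5, 2.0),  # collection-to-receipt TAT, hours
  rec_ver_tat = c(20, 4)      # receipt-to-verify TAT, hours
)

# mutate() evaluates the expression row by row and appends the
# result as a new column; note the single = inside mutate()
covid_testing <- covid_testing %>%
  mutate(col_rec_tat_mins = col_rec_tat * 60)
```

After this runs, covid_testing has one more column than before, with a value computed for every row.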
So we've covered multiple dplyr functions, and now we're going to pull it all together with a small set of exercises in the Quarto document for this lesson. Go ahead and open your transform Quarto document and work through the first set of exercises; stop when you reach the end of that first set. Click "yes" when you're finished. I'm going to start the timer here. All right everyone, time is up, so let's circle back. Just give me a confirmation that you can see my RStudio Workbench screen. "Yep, but maybe you can blow it up a little bit." Okay. So I'm going to open up my transform Quarto document and jump down to Your Turn 1. I'm going to use the functions we just covered: I'm going to filter the tests to the PICU clinic location, and I'm going to select the column with the receipt-to-verify turnaround time as well as the day from the start of the pandemic. Again, we're using our pipes here. Our data set is covid_testing, so I'll put that at the start of my pipeline. For filter, I need a logical condition; any input from the group as to what goes into this filter function? All right, thank you, Krishna Yang. So: clinic_name, and you can see as I'm typing there's some autocomplete, then the double equal sign, then "picu" in quotes. Then I'm going to select just the columns I care about. I'm going to reverse the order a little: I'll put in pan_day, then the receipt-to-verify turnaround time, and I can hit Tab to autocomplete when I see the right variable coming up. I can run this and take a look, and you'll see I've selected these columns. This looks like it's correct, but I'm not sure, because I don't have my clinic_name column here anymore.
Here's one other trick for troubleshooting these pipelines when you want to evaluate the output at different stages: you can highlight just the part of the code you want to run, you don't have to run the whole chunk, and then hit Command-Return on a Mac or Control-Enter on a Windows machine to run only that, and check whether that part of the code worked. So I'm going to look at the clinic_name column and see that this looks correct: every entry is "picu", which is what I was going after, so that looks right. When I run this part, I can also see there were 261 rows, and when I run the full pipeline, I have just the selected columns and also 261 rows, so the output looks correct. I was able to string these functions together to get the output I was interested in. For the second part of this exercise, we're going to use mutate to create a new column called total_tat, the total turnaround time, which is the sum of both of those intervals, collection-to-receipt plus receipt-to-verify. And I'm going to store this output back into the covid_testing object and view the data in a new tab. So, at the beginning of the pipeline I have my covid_testing object, and I'm going to write the output of this code back into the covid_testing object. This is not always recommended, but in this case I'm not doing anything that fundamentally changes the data set; I'm just adding another column. So for this mutate function, I'm creating the new column total_tat; what goes on the right-hand side if I want to calculate the total turnaround time? I see it in the chat, thanks David: I'm going to sum up those two turnaround time columns.
And then, as a quick tip: sometimes you don't just want to see the output inline below the chunk; you may want to look at the whole data frame using the view function. This view function is equivalent to clicking the table in the Environment pane to pop the data set up in a viewer. So I'm going to run this chunk, verify that I have my data set, scroll all the way to the end, and see that mutate has created this new total turnaround time column. If I do a spot check, I can see that it looks like the sum of these two columns, so this is the intended outcome. I do see someone has called out that sum() didn't work, and that's because mutate evaluates its calculation row by row: every row gets its own output. sum() is typically what we use when we want to add up all the values within a single column, and that's handy with the summarize function, which we're not going to cover today, but that's the mechanism where you'd use sum(). In this case, for mutate, because we're doing the calculation row by row and taking two columns and adding their values together, we just use the simple addition operator. All right, in the interest of time, we're going to stop going over new dplyr functions at this point, but be aware that you'll have access to the slides afterwards, and in addition to these exercises, we also have solutions on the site. One thing I would strongly recommend, if you're not familiar with them, is the functions group_by and summarize. They're a very powerful set of tools for creating summaries of subgroups of your data, so I would highly recommend them; we're just not going to cover them.
Because of time limitations we're going to jump to the dashboards lecture, but just be aware that there are slides you can review on group_by() and summarize(); there's great material in the R for Data Science book, and there's another exercise you can practice on, with solutions on the site. One thing I will mention — I'll skip to the end of the transform section and cover a couple of other things real quick. We covered select(), filter(), and mutate(): select() for extracting columns; filter() for keeping specific rows by logical criteria, which is really helpful when you have a big data set with specific inclusion criteria — use filter() to narrow your data down to just those rows; and mutate() to create new columns based on a calculation. Just want to mention that there are many more dplyr functions we didn't cover. group_by() and summarize() are another powerful pair; arrange() orders your data, very similar to a sort you might do in Excel; you can add rows; and there are operations for putting different data sets together — either by a key shared between them, using the join functions, or, if you have two data frames of the same height, positionally using bind_cols(), which pastes data sets together without a key. These and many other functions are on the dplyr cheat sheet, so I highly encourage you to take a look at it whenever you run into these data transformation problems.
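A minimal sketch of the three functions just mentioned — arrange(), the join functions, and bind_cols(). The data frames and column names are invented for illustration:

```r
library(dplyr)

orders  <- data.frame(patient_id = c(1, 2, 3),
                      test       = c("PCR", "Ag", "PCR"))
clinics <- data.frame(patient_id = c(1, 2),
                      clinic     = c("picu", "nicu"))

# arrange() sorts rows, like a sort you might do in Excel
sorted <- arrange(orders, desc(patient_id))

# left_join() matches rows between data sets by a shared key column;
# patient 3 has no clinic record, so its clinic comes back as NA
joined <- left_join(orders, clinics, by = "patient_id")

# bind_cols() pastes same-height data frames together side by side,
# positionally, with no key involved
glued <- bind_cols(orders,
                   data.frame(priority = c("stat", "routine", "stat")))
```

The join keeps every row of `orders` and fills in `clinic` where the key matches; bind_cols() simply trusts that row 1 lines up with row 1, which is why a key-based join is usually the safer choice.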
And there's a variety of other tidyverse functions that help you tidy data, deal with dates and times, parse strings, iterate (perform an action repeatedly), and even query and manipulate databases. We're running low on time, so I think we'll jump right into the next lesson. We probably won't get through all of it, but particularly for those who are interested in visualization — in how you can pull your data together in a way that's easily digestible by others — we'll cover a few dashboard concepts and maybe do one exercise before we break for the workshop. In this last dashboards lesson we're going to talk about ways to communicate data using a package that's specifically geared toward building dashboards: taking multiple plots, or multiple ways of visualizing data, and putting them into one format that you can even make interactive. I'm sure at least some of the folks in the workshop have at some point seen a dashboard — some graphical representation where multiple plots are put in one place for the purpose of consuming a potentially large amount of data quickly and deriving some understanding from it. There are multiple nice examples of dashboards built with the package we're going to cover here. The package we've found very useful — something we actually use very frequently in my daily work — is called flexdashboard. It takes an R Markdown document and turns it into an interactive dashboard. Notice that, for the time being, this is focused on R Markdown documents. R Markdown documents, you might recall, are essentially similar to Quarto documents in how they function.
There are just some differences in syntax, so this general flexdashboard framework should feel pretty familiar: it's still a document format that lets you include both text and graphics in one place. Here on the left-hand side is an example of a flexdashboard R Markdown document. You can see there's a header, very similar to a Quarto document; in this case we're specifying a flexdashboard output, and a specific orientation of columns is called out. Then, in the text section of the document, there's a specific syntax to call out and carve out different columns: you declare the first column, include a chart with a heading, place some code, and the output from that code chunk will be placed in that section of the dashboard. Then we declare the second column, and here we've actually included multiple charts, which will be laid out within that second column. That's the column orientation; you can also orient the dashboard as rows just by changing the header and the section syntax — instead of calling out columns, we call out rows. And no, there's no specific number of dashes you need to memorize — the nice thing is that when you start one of these documents, RStudio gives you a template to work with. For this exercise, we're going to jump to 04-dashboards.Rmd, which is going to look a little different from your Quarto document.
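The document structure just described can be sketched like this. This is a minimal skeleton assuming the standard flexdashboard syntax; the title and section headings are illustrative, not the workshop's actual file:

````markdown
---
title: "COVID Testing Dashboard"
output: flexdashboard::flex_dashboard
---

Column
-----------------------------------------------------------------------

### Tests Over Time

```{r}
# code chunk producing the first chart goes here
```

Column
-----------------------------------------------------------------------

### Positive Results

```{r}
# code chunk producing a second chart or table goes here
```
````

Switching each `Column` heading to `Row` (and setting `orientation: rows` under the output in the YAML header) flips the same content into a row layout.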
RStudio is going to open this file in source mode, so it looks more like the raw text of the file — a Quarto document would actually look very similar to this in source mode, but by default RStudio opens those in visual mode — so don't be too concerned with that; just open the file. This works the same way as Quarto in that you can render, or knit, it. So the first thing we'll do when we open it is jump straight to knitting the document. Depending on your version of RStudio, you might see a Preview button instead of Knit; whether it says Preview or Knit, you should see a button in the same place where you saw Render for Quarto, so hit that button. You may need to hit Try Again if you see a pop-up window, but go ahead — that will show you what the dashboard looks like. Then we're going to go into the test volumes over time plot and tweak it to visualize the fraction of positive tests on a given day. This jumps back to your ggplot2 lesson: use the fill aesthetic and map it to result. Then knit the document again and note the change. Finally, change the layout from a column orientation to a row orientation. I'll give you five minutes, and then we'll come back together and walk through the exercise. All right, everyone, let's regroup. I'm going to go through this relatively quickly. I've opened the dashboards R Markdown document here — you'll see it's in source mode, and you can see our flexdashboard information up at the top. We're reading in that covid_testing data set, and if I hit Knit, I see a quick, simple dashboard pop up: it shows a plot and a table with some positive test results. So I'm going to go in.
And I want to make this modification to my plot: I want to map the fill aesthetic to result, so I'm going to jump in here and map fill to result before knitting. This brings the results into my ggplot — the fill of the bar plot will now map to the different result values. I'm also going to change the layout from columns to rows: I see the different columns that are called out here, and I'm just going to update them so they're clearly labeled as rows. When I knit this, I now have rows splitting the two sections of the dashboard, and I've mapped my results to the fill. So this is a pretty handy toolkit: with a pretty simple syntax you can very quickly create dashboards from a series of plots or tables. Now, we don't have time to cover it here, but again, you have access to the slides, so you can review this and see some of the other features. You can very easily make plots interactive with plotly: just take your ggplot, store it as an object, and pass it to the ggplotly() function — that turns any ggplot into an interactive plot. Likewise for tables, you can use the datatable() function from the DT package to make interactive tables, and there are exercises around this if you want to play with it and learn how to do it. These are very quick ways to make your dashboard interactive. Someone mentioned this looks like a simpler version of Shiny — and yes, I think one benefit of this format is that you have less complexity than setting up a Shiny app, but you can knit it into an HTML file — the data is actually held within the HTML — and share it with others, and they can open it in any modern web browser.
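The plot tweak and the one-line interactivity upgrades described above look roughly like this. The toy data frame and its column names (`pan_day`, `result`) are assumptions standing in for the workshop data:

```r
library(ggplot2)

# Toy stand-in for daily test results (illustrative values only)
tests <- data.frame(
  pan_day = rep(1:3, each = 2),
  result  = rep(c("positive", "negative"), 3)
)

# Mapping the fill aesthetic to result splits each day's bar by test result
p <- ggplot(tests, aes(x = pan_day, fill = result)) +
  geom_bar()

# One-line interactivity upgrades (assuming plotly and DT are installed):
# plotly::ggplotly(p)    # turns the stored ggplot into an interactive plot
# DT::datatable(tests)   # renders an interactive, sortable table
```

Inside a flexdashboard chunk, the value returned by ggplotly() or datatable() is simply placed in its dashboard section, so making a panel interactive is usually just a matter of wrapping what you already have.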
There's also a variety of other ways you can style these dashboards — different themes give them a different look and feel. These are simple ways to take the skills you've learned in manipulating data and using ggplot2 and turn them into a nice work product you can use to communicate with others. There's a variety of other packages focused on interactivity as well, and you can see some examples here. Shiny is really the ultimate level of interactivity — developing a full web application — which comes with a higher level of complexity but is ultimately a very versatile tool for making these web apps. And there are different ways you can deploy these, whether as static HTML dashboards or as web apps. So with that, we're at the end of our time. I just want to point out a few things that might be of value to you as you go on and develop these R skills. If you don't already have RStudio installed on your computer, there are instructions on the workshop website — you should already have the link in your email — along with guidance on installing the right packages, so you can take the code from this course and use it on your own time. There are multiple exercises we didn't cover that you can work through on your own if you want to continue developing these skills, and we have those solutions as well. When you have straightforward questions about these bread-and-butter tidyverse functions, I highly recommend the R for Data Science book in particular — it's freely available online and a very nice resource.
There are also solutions to its exercises; part of the value of going through R for Data Science is learning by doing, working those exercises yourself, and you can find the solutions to those as well. We focused on Quarto because it's the newest version — the next generation — of R Markdown, but there's a great resource on R Markdown that's freely available as well. There were multiple Quarto highlights in both of the keynotes for the R/Medicine conference, and there's another workshop tomorrow focused on Quarto. I want to thank our TAs, Rich and Sarah, for helping out, keeping us on track, and answering questions, and I want to thank all of you for your participation. Please complete the post-course survey — we'd really appreciate it. It looks like it's working now, so please do fill that out. Some of us — the instructors, and maybe the TAs — will stick around for a little bit afterwards to answer additional questions. Thank you all.