Statistics for healthcare, life sciences, and social sciences, using the R language for statistical computing, brought to you by the School for Data Science and Computational Thinking at Stellenbosch University.
Welcome to this course on statistics using R. My name is Dr. Jean Klapper and I'm a research fellow at the School for Data Science and Computational Thinking. I'm the creator and instructor for many online courses, the largest of which has more than 100,000 participants from all over the world. I have also authored a textbook on statistics. One thing is for sure: I am passionate about learning from data and about showing others the beauty of extracting knowledge from data. I hope that this passion shines through during this course. I really want you to be passionate about it too.
Now, a week is a short time in which to learn any subject, and in deciding what to put into this course I wanted to be cognizant of the audience. We are all domain experts in our fields, or at least learning to become experts, and for most of us there is so much data from which we can extract knowledge. I want to leave you at the end of this week with a full understanding of the power of statistical analysis using R. This means that this is not an easy course. Learning any new topic takes time, effort and practice. R is a computer language. A computer language is much like a spoken language. You don't just pick up a new language and become fluent in it in one week.
This course aims to be both a reference work packed with information and a personal guided journey. I have created video lectures, multiple PDF documents, practice exercises and solution sets, and we are going to have live online sessions. I really want to support you during the start of your journey. There is no way that I want to leave you with some superficial, useless stuff. So I need you to stick with it. Remember, there is no pressure during this week, no expectation to understand everything.
We are not aiming to have a deep conversation on the meaning of life in Latin after this week. The aim is rather for you to have a full understanding of the potential of statistics and R. I want to guide you towards a future where you can join the massive community of domain experts employing the power of data analysis and R.
Now, computer languages such as R have democratized the use of data analysis. The ability to analyze data is no longer the exclusive domain of statisticians. It has instead become increasingly important for experts in other fields to analyze their own data. It is also an essential skill to be able to interpret the published literature. It is in no one's interest to read only the introduction and conclusion sections of a published paper. All experts must be able to critically appraise the results of research, becoming active members of the research community.
The community of R users is enormous and answers to R-related questions are everywhere. Any quick Google search will show pages and pages of links to tutorials, videos, discussion boards and so many other resources that will answer your questions. You will pretty much never get stuck when you need an answer about R.
Now, a computer language needs a program into which you type your code. In this course we are going to make use of RStudio. RStudio is the standard development environment for R code, just as Microsoft Word or Google Docs are for documents. More specifically, we will use RStudio Cloud. You can sign up for a free account and use RStudio without having to install any software on your computer. It is entirely possible to install RStudio on your own computer if you want to, though, and if you have questions about this we will discuss it during the live sessions. Now, on RStudio Cloud you will find copies of the working material for this course. There are code files, exercise files and solution files.
The course is structured around video lectures. For reference, and in case you do not want to watch the video lectures, there is also a set of PDF documents that you can read. During the course you will need to watch the required video lectures or read the documentation each morning or the evening before. You can then attempt the set of exercise files pertaining to each lecture. At a specific time in the afternoon we will have a live session where we work through the exercises and discuss the related topics.
There are 15 chapters in this course and I want to tell you a little about each of them. Chapter 1 is a short introduction and highlights the topics covered in this course. Chapter 2 introduces the R language and RStudio itself. RStudio allows us to create standalone R scripts that contain only code, but more important, and far more useful, is the ability to generate R Markdown files. Now, an R Markdown file starts life as an empty page, just as in the previously mentioned Microsoft Word or Google Docs. You can write normal sentences, format text, add titles and subtitles, add images and videos, and do pretty much everything you would do in a Word document. You can also write R code, though. This makes R Markdown files very powerful research documents that we can share with our collaborators, or the rest of the world for that matter. I introduce R by starting with some simple arithmetic before introducing mathematical functions, computer variables, collections and random values. I introduce you to just enough code to get you going. In Chapter 3 I talk about study types. Most of the papers that you find in journals employ one or a combination of study types. It is important to know about study types as you embark on your own research. I also talk about randomization and the research question. In Chapter 4 I introduce some basic statistical terms and definitions. These are important to know. Chapter 5 is all about data manipulation in R.
I show you how to import data into R, explain how the values in your data set are indexed, and show how to select specific parts of your data for analysis. Chapter 6 expands on Chapter 5, but instead of using base R I showcase the relatively new paradigm in R called tidy principles. It has pretty much become the standard approach to data analysis in R. Chapter 7 is all about summarizing your data. I talk about all things mean, median, mode, variance, standard deviation and quantiles. In Chapter 8 I build on these measures of central tendency and measures of dispersion by showing you how to do comparative summary statistics, where we start to compare values between different groups. Chapter 9 contains one of my favorite topics, data visualization. Where summary statistics start to extract the knowledge hidden in data, data visualization brings data to life. By visualizing data we extract even more of the knowledge that it hides. It is also a great way to communicate your results to others.
Chapter 10 starts our real journey. I explain the topics of sampling and sampling distributions, along the way introducing the famous t-distribution. In Chapter 11 we do some parametric comparison of means. This includes Student's t-test and analysis of variance. Chapter 12 is about linear models, including correlation, linear regression and logistic regression. These techniques are some of the most often used tools in statistics. In Chapter 13 we compare the means of numerical variables between two or more groups. These tests, and the ones from Chapter 12, require some assumptions to be met. In this chapter I show you how to test for these assumptions. If they are not met, we cannot use these common tests. Chapter 14 solves our problem by introducing non-parametric tests. These include the Mann-Whitney test and the Kruskal-Wallis test. The final chapter is about tests for non-numerical variables. This includes the chi-square test for independence and Fisher's exact test.
Now, remember to read the description down below, which will give you an update on which chapters are covered on which days during the course. Before you start on this journey, let's go inside so that I can show you how to prepare for the course.