Let's test the sound: one, two, three, test, one, two, three, test. Not too bad. A bit bad? Well, it's Rotterdam, right? I don't think many people like Rotterdam; there's no Altstadt, which is a shame. So, sounds okay? Yeah, well, it wasn't too bad. It was a big conference, there were a lot of fun talks, but there were also a lot of not-too-interesting talks, so we'll have to see. But it was okay, I enjoyed myself, and I'm a little bit bugged that I missed the dinner at Blijdorp Zoo. Alright, sound is okay, perfect.

I thought we'd just start with some music. I'm still setting up. I actually did something new: I have a second perspective now. I added an additional perspective to the standard one, like this, so that you guys can see what I'm seeing. Because you guys only see the standard thing, but you never see me typing, or my drawing board, and the equipment that I have set up so that it works.

But yeah, thanks everyone for joining us. We're up to two concurrent viewers. Of course Misha's here, Misha's always here, which is good; I really appreciate that, Misha. So that's my email, another email: YouTube mentions that I'm live. I don't know why I actually get notifications about that. All right, let me move some windows around so that I am actually able to do stuff; this is for when we want to look at stuff. So I'm just going to go to the lecture layout, then it's going to be like this. At least you can now see the stuff that I'm using.

Since it's probably the last time that I'm streaming in a long time, I'm not as prepared as I should be. But today it's just you, Misha: when I look at my YouTube thing I can see that it's one concurrent viewer. I don't even think that my moderator is here yet, which is logical. And the fun thing is, my office is actually cleaner than it has been in years, because I spent all morning cleaning up,
tidying up and stuff. So: "I see two". How do you mean, you see two? You see two what? Oh, two viewers, yeah, that's true. So my moderator might be here; I hope so. But I don't expect too many students to show up, because it's been a very, very busy week. Two viewers, yeah, I just saw.

But yeah, I took my phone and added it as an additional camera, so you guys can see what I'm seeing. You can see the YouTube chat when people say things, and besides that you can see how it looks for me. So this is my OBS window, where there's the layout and all of the little buttons, and of course I have my Stream Deck here, so I can play music and switch scenes and stuff. If I just press the Notepad++ button, this one here, then you can see that it's changing. I thought it would be good to show you guys how it works when you're streaming, and that it's more difficult than it looks: you really need two monitors. So I'm just going to put this one back. And of course you need the influencer circle, that's very important, because otherwise the lighting will be very bad, especially since I have lighting from the side. "Cool toy, cool toy": yeah, it's a cool toy to play with. But let's just quit the music for now, because it's really loud in my ears; I know it's not that loud for you guys.

Welcome, everyone. This is going to be my final lecture for the HU. From tomorrow I will still be in at the university, and I think I already told you guys this in a previous lecture: I got a new job, so I will be moving to Northumbria University. The nice thing is, well, the sad thing is that I'm leaving here.

I actually checked the AGNES thing, because some people had problems signing up for the exam. That should be fixed now, so I hope that everyone is able to register for the exam. If you are not, of course, then
send me a message. But for today I will just have a 25, 30, 40 minute presentation about creating an R package. It's not as good as I want it to be, actually: you can see the glare on the screen, which is a little bit annoying, and it's too high up as well. Let's do it like this, and let's hope that it doesn't just drop. Good. It's really interesting, I could have five or six cameras if I wanted to.

So we will be talking about the R package, and then I will give you guys an overview, and at the end of the overview I will show you a couple of exam questions from last year, so you can get a feeling for what kind of questions I'm asking in the exam. With that all out of the way, let's start my final lecture for the Humboldt University.

Like I said, we are going to create an R package: "Not another R package". We've been programming in R and we've been generating a lot of scripts, but of course we want other people to use our code. How to do that? By creating an R package. During this whole presentation I will be creating an R package which has the annoying name "your package name". Originally, when I made this lecture, I thought it would be a really good idea to call it "your package name", but after doing this lecture for a couple of years I figured out that it's a lot harder than it looks to say "your package name" every time.

But we're just going to create a package. If you looked at the lectures from last year, there's "An R package in 15 minutes", and that one just contains R code. But I wanted to show you guys as well how it works when you have C or C++ code that you want to put behind R. R is a very versatile language, and you can actually use it to call C or C++. The added advantage is that you can use things like pointers and very optimized code, because R itself is not that optimized for very big data sets. And the nice
thing is, of course, that by using C or C++ you can actually do that, and create a really advanced R package with really good code behind it.

So first things first: if you want to create an R package you need to have the R compiler. The R compiler is called Rtools, and you can get it from the link on the screen, so https, or http, this is still the old link, CRAN. If you are on Windows then it's bin, windows, Rtools; if you're on Linux it's something like bin, linux, Rtools; and if you're on Mac OS X it's something like osx, Rtools. The thing that you have to be aware of is that the Rtools version that you are installing should match your version of R. When I made this slide the latest version was R 3.2.5, I think, but currently we are at R 4.01 or 4.02, I think. Again, this is very important, because if you install the wrong version it will not be able to build packages that you can install.

And of course, if you want to build help files and these kinds of things, you might need MiKTeX. MiKTeX is a LaTeX compiler, and LaTeX is a layout language which is used a lot in computer science and mathematics. It's not required to build the package, but if you want to build the package and look at the help files that it generates, then you need to install MiKTeX as well. You can get it from here. It's a big install: if you install the full MiKTeX LaTeX compiler suite it will cost you 1.6 or 1.7 gigabytes of your hard drive.

Good. After you've installed these things you can start building an R package. An R package is all about structure, and here we see the official guidelines. Let me just click on it; it opens up in the wrong window, so I'm going to open it up in the correct window. If we look at Firefox, this is the manual "Writing R Extensions", for version 4.2, the latest version. If you go down you can see that it's a massive document which has eight chapters, and it's really, really long.
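Before digging into the manual, the tool setup described above (an Rtools release matching the R version, plus a LaTeX installation for the help files) can be sanity-checked from inside R. This is only a rough sketch: the program names make, gcc and pdflatex are assumptions about what Rtools and MiKTeX put on the PATH, they are not taken from the slides.

```r
# Print the version of the running R session; the Rtools release
# should be chosen to match this version.
cat("R version:", R.version.string, "\n")

# Assumed program names: make and gcc (from Rtools), pdflatex (from MiKTeX).
programs <- c("make", "gcc", "pdflatex")

# Sys.which() returns the full path of each program, or "" if the
# program is not found on the PATH.
paths <- Sys.which(programs)

for (p in programs) {
  status <- if (nzchar(paths[[p]])) paths[[p]] else "not found on PATH"
  cat(sprintf("%-10s %s\n", p, status))
}
```

If one of the programs is reported as not found, that usually points to a PATH problem or a missing install rather than to a problem with the package itself.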
Right, so it just goes into all the nitty-gritty details, and you have to know all of these details, because every little rule that is specified in this document you have to follow. And it's really a lot: if you would print it, it would cost you around 60 to 70 pages, and no one wants to read this document. That's why I made the lecture: instead of reading this 60-page document you can more or less just follow this guide and build an R package which they will accept on CRAN. That's always the goal: get your code bundled up in a package and put it on CRAN, so that people can just do install.packages, give the name of your package, and then use it in their own code. If you get stuck somewhere, of course, then do consult the official guidelines. That's why the link is here.

So, first things first: you need to create a folder. I always create a folder on my desktop when I make a new package, just because it's a good place to put everything. But everything in there needs to match the official guidelines; that is the important part.

Besides that, when you want to build an R package you have to start using the command line. In Windows, you get the command line by pressing the Windows key and starting cmd. cmd.exe is the Windows command line executable, and it opens up this kind of screen, which looks a little bit like MS-DOS, and which allows you to directly type commands that are executed in Windows, or against the Windows subsystem. Because we need to execute things on the command line, I want to make sure that people know how to start it. It's very easy: you just press the Windows key and start basic cmd, not "run as admin", because we don't need any administrative privileges, unless of course you want to install your package somewhere where you don't have full rights. But normally, if you
install R as a user, then you can also install packages as a user, so you don't need administrator privileges. So just open up the command line, and then it looks kind of like this, depending on which version of Windows you're using. I think this is still a screenshot from when I was using Windows 7 or Windows 8; I'm on Windows 10 now, so it looks a little bit different, but it's very similar. Of course this is very advanced, because a lot of people never get to use the Windows command line, but you will have to if you want to create an R package.

Good. So of course we need to go to the desktop. If you start the command line it will go to your user folder. In this case I was called user 2 on the computer; I didn't use my own name. So it's in C, which is the C drive, then Users, which is where all of the user accounts are stored, and then user 2. But because we are creating this package on our desktop, we have to move into that folder. Moving into a folder is done with the cd command, which changes your directory, so what I'm typing is just cd Desktop. There I created an empty folder, and the empty folder is called yourPackageName, because that's going to be the name of my package.

So if everything goes well, you were able to install the Rtools compiler, you were able to install MiKTeX, and you've created this folder on your desktop called yourPackageName. Then, when you cd into the Desktop folder, you can type R CMD, with capitals, check, in small letters, and then the name of your package, so in our case yourPackageName. Since it's an empty folder, it will tell you this. So I'm in C, Users, user 2, I go to the desktop, and then I execute the command R CMD check yourPackageName. It will say: okay, I'm going to use this folder here as my log directory, this is the version of R that we're using, this is the version of
the compiler, or of the platform, so we're on Windows 64-bit, and I'm using this character set, which might be a little bit different if you're from a different country. Then it says "checking for file yourPackageName/DESCRIPTION", and then it says "NO", because it's just an empty folder, so there is no DESCRIPTION file. And that is key in building an R package. I can't repeat this enough: building an R package is only about structure. It has to be structured exactly the way that R expects things to be; R expects a certain file and folder structure.

So let's create this DESCRIPTION file that it needs. We need to create an empty file and call it DESCRIPTION, all caps, so all capital letters. We have to make sure that there is no .txt at the end. It sometimes happens that people on Windows save a file and it's called DESCRIPTION.txt, and then you have to delete the extension, because this file is not allowed to have an extension.

You can open the file using Notepad++, for example, and type the following. You say Package, colon, yourPackageName. You have to specify a version, so in this case it's version 0.0-1, the first version of the package. Then the date that it currently is, or the date at which you are creating your package; you can see that when I first did this presentation it was the 9th of June 2015, so that's more than seven years ago. You have to give it a title, which is very important, because the title is seen by people, so make your title descriptive. If you're making a package for genome-wide association, then have at least "genome-wide association" in the title, because it is the thing that people can search for. If you're creating a package for ecology, make sure that the title contains the word "ecology". If you're creating a package for the analysis of RNA-seq data, make sure that
it's in there, because these are the keywords for Google, and they are used by CRAN to kind of advertise your package.

You have to specify an author and a maintainer. You write down the name of the author, and then between angle brackets, so the smaller-than and larger-than signs, you add the email address. This is very important, because when you submit your package, the email address of the maintainer will be used to contact you about any issues that they found with your package and things that go wrong.

You have to have a Depends field. In our case the only thing that this package requires is, of course, R. So that's what it says here: it depends on R, and in this case our package is going to be suitable for R versions greater than or equal to 3.0.0. This is the thing that determines whether R will install your package. If your package is very specific to one version of R, you can specify a single version; if every version of R from a certain version onwards works, you can use this greater-than-or-equal-to.

The Description field gives a one-line description of your package, so in this case it's just "My first R package". Again, the description is important because it helps people find your package.

And then you have to specify a license. This is relatively tricky, and I'm not a copyright lawyer either, but you have to specify an open source license if you want your package to be distributed on CRAN. They only allow a very limited number of licenses. So it has to be GPL-3 or GPL-2, you can go with the MIT license, I think, there's the BSD license, and there are a couple of other open source licenses which are allowed. But you cannot use a closed source license, because you are distributing your code to other people: the code is given to people, so you cannot keep the code closed. For a list of licenses, go to the R extensions manual. And if you want to know more about
licenses, then I would tell you guys to go to the EFF. The EFF website has a very good overview of which licenses are out there and what each license allows you to do. Because even with an open source license it could still be that you don't want people to use your package for commercial purposes, or to sell it, so you can have limits on what is allowed under an open source license.

Good. So now we've added this DESCRIPTION file, and we can do the same thing again. We open up the Windows command line, we go into the Desktop folder, and we say R CMD check yourPackageName. So just check again: is it now a valid R package? What you can see here is that it now finds the DESCRIPTION file, and then it's going to check a whole bunch of different things. And it's actually a valid package at this point, because there's only a single NOTE at the end, saying that there's something not entirely correct with your package, but the package as such is a valid R package. So the most minimal R package contains no R code: it is just a single folder with a DESCRIPTION file specifying what will be in the package. But you have a NOTE, and the NOTE says: packages without R code can be installed without a NAMESPACE file, but it is cleaner to add an empty one.

A B: "Thank you very much for the live stream. Sorry, won't be able to catch up now, will see the recording later. Please don't stop these videos." Yeah, I will probably continue making videos in my free time. Like I said, I'm changing jobs, and I don't know for sure if I can continue live streaming my lectures from Northumbria University; it's something that I still have to discuss with them there. But if I'm not allowed to, then I will continue making videos like this, just not in work time but in my free time, because I like doing it
a lot as well. So thank you for the comment.

Good. So packages are very easy: it's just a folder with a DESCRIPTION file. But it mentions that if there's no R code, that's perfectly fine, you can install it, however it is cleaner to add a NAMESPACE file. So what is this NAMESPACE file? Well, the NAMESPACE file is there for you to load dynamic libraries. Dynamic libraries under Windows are called DLL files, so you have for example an opengl.dll, but you also have other DLLs. DLLs, or dynamic link libraries, are more or less compiled code which you can use. It is code which is written in, for example, C or C++ or Fortran or some other language, and which can be compiled into one of these dynamic libraries. They contain code to do all kinds of things, and there are literally millions of DLLs available. These dynamic libraries do things like creating a window, or using OpenGL or Vulkan for 3D display, but there are a lot of other things in there. There's for example Bullet, which is an open source system for physics, so you can do all kinds of physics simulations, and you can then call these functions, which other people made available in these dynamic link libraries, directly from R.

But not just that. The NAMESPACE file is there to load external libraries, but it also has a list of all the functions that are available to the user. In this case we have an empty package without any functions that we are giving to the user, but we do want to provide R code in our package. That's why I'm going to create an empty file called NAMESPACE. Again it's all caps, and there is no extension for this file, so no .txt at the end. Inside the file I'm going to add the following. I'm going to say what my package does: my package exports a single function to the user, and I give the name of this function, so that when people start R and load my library, this function will be available to them
to call. In this case I am naming my function myFirstPackageFunction. I could have named it anything: when we write code I could, for example, have a function called msa to do multiple sequence alignment, or a function called square root where I compute the square root of something. But in this case, as an example, we're just going to make a function called myFirstPackageFunction.

Good. So of course I now need to make this function. If I look in my Windows Explorer, I have my DESCRIPTION file and I have my NAMESPACE file, but of course I also need a folder which will contain R code, because I promised in the NAMESPACE file that I would provide at least one function to the user. So I have to create a folder called R, with a capital, and this will hold all of the R code files. All the code in these files needs to be in functions. You are not allowed to have a script which starts with a "set working directory"; you can have a script, but this script can only contain functions, because that is the only thing that you can give to users. It's not like an R file or an R script that you would normally write; it is a package, and a package contains functions.

My strategy is always to have one file with one function in there, just so that I can easily find where a certain function is located. So in this case I have a function which I am promising to make, myFirstPackageFunction, and what I will do is create an R code file called myFirstPackageFunction.R. Within this file I of course add a header, because it's just common sense to add a header, so that people know when it was created, by whom it was written, and when it was last modified. And in this case there is a single function in here, called myFirstPackageFunction, which is a function that takes no parameters and doesn't do anything. Because why would it do anything? Because it's
just an example. And then I save it in this R folder.

Good. So now we have a NAMESPACE file which promises to provide this function, we have an R file which has the function, and we have a DESCRIPTION file describing what the package looks like. Of course the next step is to check the package, so I'm going to do the same thing again: R CMD check yourPackageName. It will do all kinds of checks, and it now comes up with a WARNING. We first had a NOTE, and a NOTE is not that serious, but a WARNING is something that you need to fix: your package is not allowed to have any warnings or notes when you submit it to CRAN.

So what is the warning? It warns you that it found an undocumented code object called myFirstPackageFunction, and it mentions that all user-level objects in a package should have documentation entries. This is key to why R is such a nice system to work with: every function in a package needs to be documented. R forces you to document your code and to write at least a minimal skeleton about what your code does, what the input parameters are, what the expected output is, and an example. And this makes a world of difference, because a lot of other programming languages don't force you to document your code. If I submit code to, for example, the npm registry for JavaScript, there is no requirement to document anything; I could just get away with saying "this package does something", and then no one would know what it does. But in R every function that you write needs to have a documentation object. This is core to how R works, and it makes R very beginner friendly.

So we need to store our documentation somewhere. Documentation goes into a folder called man, for manual, and this is for some reason written with small letters. I don't know exactly why, but it's a choice that they made. So R code goes into a folder called R, with a capital letter, and manual files go
into a folder called man, which has small letters. And again I follow the same structure: one documentation file for one function. So inside my man folder I create a new file called myFirstPackageFunction.Rd. The R is saying that it's an R file and the d means documentation: the extension .R means there's code in here, and .Rd means that this is an R documentation file. So this is what I now have inside of my first R package: I have manual files in the man folder, I have an R folder which contains the R code, I have a DESCRIPTION and I have a NAMESPACE file.

So how does this documentation look? Well, this is where we start using LaTeX. LaTeX is one of these languages which is there to generate all kinds of output formats. If you write your document using the LaTeX layout language, you can use a LaTeX compiler to compile this document into a PDF, but you can also compile it into an HTML file, and you could even compile it into a Word document, because there are LaTeX compilers which take a LaTeX document as input and compile it to a different output format.

So how does this Rd file look? I have to give the name of the function, so I have \name, myFirstPackageFunction. Then I have an \alias, which in this case is the same as the name, so also myFirstPackageFunction. A function can have multiple names in R, and we'll see this when we get to documenting internal functions, because there we use the alias so that multiple functions get redirected to a single help file. But in this case I have a single name and a single alias. And I have to give it a \title. The title in R is generally structured more or less like this: myFirstPackageFunction, then a minus, and then a very short description of what the function does. Then we have a \description, which is a long description of what our function does. So here I
could say something like "myFirstPackageFunction - First function in the package", and then the long description would be "This is the first function in my package. It doesn't do anything and does not have any input parameters."

Then we have a \usage section, and this shows the parameters. In this case we created a function which did not have any parameters, so the usage section will just be calling the function without any parameters. Then I have \arguments, where I need to describe the input parameters, or input arguments, to my function. It does not have any, but I need to provide some text. If I need to provide text but currently don't have it, I always write something like "TODO: add details", because then I can just search through all of the documents for "TODO" and find a list of all the files where I still need to write some documentation. The \details are there to provide details about the algorithm; for example it could be something like "this function uses the method developed by blah blah et al. in 2015". The \value describes what is returned. Here you would write something like: the return value of this function is a list which has two elements, the first element of the list contains a vector and the second element contains a matrix.

And then you have to provide an example. You always have to provide an example; without an example R will not accept your documentation file. This is one of the advantages of R, this is why R is so user friendly: every function in packages which are uploaded to CRAN has at least a single example to show you how to use it. So in this case I will add more or less a comment line saying "an example to execute the function", and then we just call the function with no parameters, since it doesn't do anything and has no parameters.

And then I have to specify a \keyword, and this is how R figures out if
a documentation object is describing a function, which is called a method, or if it is describing things like data, or something like a plot routine. The keywords are very limited: you can use data, graphics, methods, and one or two other ones. In this case we are documenting a function, so the keyword is methods.

Good. So let's recheck the package. I'm going to do R CMD check yourPackageName, and it checks, it checks, we see a whole bunch of OKs, and that's it: you've built your first R package. So step one: learn how to build an R package. You now know how to build an R package which contains R code. I don't know what step two is, but step three is profit.

Of course there's still one step that we need to do: we need to install our newly created package into R. Because we haven't uploaded it to CRAN yet, we can't just use install.packages from R; we have to install the package via the command line. So we open up the command line again and type R CMD INSTALL yourPackageName. In contrast to check, which is written with small letters, INSTALL is written with capital letters. Then it will start installing your package. The next step is to open up R and load your package as a library, so say library(yourPackageName). I'm going to execute my function, and I'm going to do ?myFirstPackageFunction to look at the help file which has been created. Of course the help file in this case is not very meaningful, but at least you can check that the function works, look at the help file, and make sure that everything is okay.

Good, so that's it. It's very easy to create an R package when you only have R code. It takes you around 30 to 40 minutes to make the structure, fill in the DESCRIPTION file, fill in the NAMESPACE file, and then start adding the functions that you created one by one. Of course, writing the documentation will take a little bit longer.
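The whole walkthrough up to this point can be condensed into one small R script that writes the four files into a fresh folder. Treat it as a sketch under assumptions rather than the exact files from the slides: the camel-case spellings yourPackageName and myFirstPackageFunction, the author name and email, the title text and the wording inside the Rd file are placeholders, the package is created in a temporary folder instead of on the desktop, and the \arguments section is left out because the function has no parameters.

```r
# Recreate the minimal package from the walkthrough in a temporary folder.
# Assumptions: the spellings yourPackageName / myFirstPackageFunction and
# the author name and email are placeholders.
pkg <- file.path(tempdir(), "yourPackageName")
dir.create(file.path(pkg, "R"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(pkg, "man"), recursive = TRUE, showWarnings = FALSE)

# DESCRIPTION: no file extension, one "Field: value" pair per line.
writeLines(c(
  "Package: yourPackageName",
  "Version: 0.0-1",
  "Date: 2015-06-09",
  "Title: My First Package",
  "Author: Jane Doe <jane.doe@example.org>",
  "Maintainer: Jane Doe <jane.doe@example.org>",
  "Depends: R (>= 3.0.0)",
  "Description: My first R package.",
  "License: GPL-3"
), file.path(pkg, "DESCRIPTION"))

# NAMESPACE: export the single function to the user.
writeLines("export(myFirstPackageFunction)", file.path(pkg, "NAMESPACE"))

# R/: one file per function; it takes no parameters and does nothing.
writeLines(c(
  "# myFirstPackageFunction.R",
  "# Example function for the package walkthrough",
  "myFirstPackageFunction <- function() {",
  "  invisible(NULL)",
  "}"
), file.path(pkg, "R", "myFirstPackageFunction.R"))

# man/: one Rd file per function ("\\" in an R string is one backslash).
writeLines(c(
  "\\name{myFirstPackageFunction}",
  "\\alias{myFirstPackageFunction}",
  "\\title{myFirstPackageFunction - First function in the package}",
  "\\description{",
  "  This is the first function in my package. It does not do anything",
  "  and does not have any input parameters.",
  "}",
  "\\usage{",
  "myFirstPackageFunction()",
  "}",
  "\\details{TODO: add details}",
  "\\value{NULL, invisibly.}",
  "\\examples{",
  "# An example to execute the function",
  "myFirstPackageFunction()",
  "}",
  "\\keyword{methods}"
), file.path(pkg, "man", "myFirstPackageFunction.Rd"))

# Show the resulting file and folder structure.
print(list.files(pkg, recursive = TRUE))

# From the command line, in the folder that contains yourPackageName:
#   R CMD check yourPackageName
#   R CMD INSTALL yourPackageName
# Then, inside R:
#   library(yourPackageName)
#   myFirstPackageFunction()
#   ?myFirstPackageFunction
```

Writing the files from a script has the side effect that none of them can pick up a stray .txt extension, which is the mistake the Windows editor route tends to produce.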
Good. So I wanted to talk to you a little bit about more advanced packages, because there is a lot more to creating R packages than just this. Not when it comes to R packages which just contain R code, but I wanted to show you guys how you can add your own data to packages, how you can do some more documentation, and also how to use C or C++ code behind your package.

There are two very special files when you create your R package, and both of them are in the manual folder. The first is man, slash, and then the name of your package, so in this case yourPackageName-package.Rd. This is the global package description, and it is more or less an index. If you think about the documentation of your package as HTML, then this would be the entry point, kind of the index.html of your package. Here you can talk about things like: oh, I was bored at work and I wanted to make a package, and that's why I created this package, it has five different functions, and these kinds of things. So this is just the general information about your package.

Then we have another file, which is man/yourPackageName-internal.Rd. All functions in R need documentation, but you sometimes don't want to document certain functions. Imagine that I have a function which is called by the user, and then a little function which that function uses internally. This little function is never supposed to be called by the user of your package: it's an internal function, only used by you, the author of the package, inside of functions which are given to the user. Then you can document it here. This is where the alias field comes in. Imagine that I have a very big program, and this program does six different analysis steps, one by one,
and so step one is data QC, step two is normalization, step three is doing something else. This QC step is generally a function, because the function that I'm providing to the user just calls the QC function, but the QC function itself is generally not called by the user. So what I can do is alias the QC function to this internal help file. The same thing holds for the normalization step: I can take the normalization function and alias it to the internal help file. That means that when people ask for the help file, they just get a help file which says: these are internal functions, there is going to be no documentation, and you're more or less on your own. So all functions need documentation, but small internal functions which are generally not called by the user can be stored here, and we can use the alias field for that.

Let's look at the first one, the global package description or index. This is how it looks. Again we have a name, yourPackageName-package, and we have an alias, which is just yourPackageName. We have a docType in this case, and the docType is package. This tells R that it is a description of the package, so it uses a slightly different format compared to the standard format for help files, because this is the main entry file. Then we have a title, we have a description, we have details, and that's it. There's no example and there is no return value, because we're not describing a function, we're describing the package itself. Of course we have an author section, which lists the author and the maintainer.

Paolo, hey Paolo, welcome, welcome. Sorry I haven't responded to your comment yet. I saw it, but I was at a conference. But thanks for leaving a comment. I'm actually interested in the k-mer stuff that you were proposing, so could you send me an email with some more information? Because I think it
will be interesting to see if I can make, like, a short lecture-ish out of it. But welcome to the lecture. And in this case the keyword is not methods, but the keyword is package. Since this describes the whole package, it's a different keyword, just to let R know that it's not describing a function but the package. So the internal file looks like this. This is the file, like I told you guys, which documents functions which are generally not called by the user. It's a very short file, but generally, if you have a lot of internal functions, there will be a lot of aliases in there. The name of it is yourpackagename-internal. It has a title, and the general title is always "Internal functions", and then you alias all of the different functions that you do not want to document. The rule is that you cannot export a function to the user using the NAMESPACE file when it is documented in the internal file. So in this case I'm aliasing three internal functions, called my internal function one, my internal function two, and so on. Generally this would be something like internal.qc, internal.normalize, internal dot calculate some statistics. So it's generally a lot of functions that are more or less described here. Then you have the description. The description is very basic: internal functions, these are generally not called by the user. You have to specify the author. In this case it's me, so I write down my name and my email address. And then I have a keyword, and the keyword for this file is internal. So there are two keywords here: the one is keyword package, which is the main index page of the package, and then we have the internal functions. The internal functions are not called by the user, but since every function that you write in your package needs documentation, in this case we are going to link these functions, which we do not want to write documentation for because
they're generally not used by the user, we are just going to link them all to the internal file and its internal keyword, using aliases. Good. All right, so if you want to add data: data goes into a folder called data, with small letters. Again, I don't know why they didn't decide to capitalize everything, because the DESCRIPTION, the NAMESPACE and the R folder are, but for some reason the data and the man folder are just small letters. Of course, when I want to add data I need to have some data. So in this case I'm just generating a random matrix, 100 rows, 10 columns, a thousand random values in there, and I'm going to save this random matrix into a file called randommatrix.RData. Again, the file extension here is case sensitive. An .Rd file, where the d is small, is an R documentation file. An .RData file, where Data is capitalized, contains data that is generally given to the user. And again, every object in your package that you are giving to the user needs a documentation object. So if I save this randommatrix.RData, I also have to create a new file in my man folder called randommatrix.Rd. The randommatrix.Rd file looks like this. It has a slightly different structure, and you can see that from the different docType. The name of it is randommatrix. Again it has an alias, and you can use aliases to provide the same data using different names, if you so please. The docType is data. It has a title, it has a description, and it has a usage section. This is how you make the data available to the user: if the user types data(randommatrix) into R, it will load the data file. The format section describes the format, so for example: this is a matrix with 100 rows and 10 columns, and it contains numeric values. The details are there to highlight some details about the data. For example, it could be that part of the data was collected in 2015 as part of this international collaboration or international project. That goes into the details. If the
data has been published before, you can add references as well. So for example, if you uploaded your data to something like figshare, or to other data repositories, then you can add references to these repositories, so that people know how to cite the data or the paper that you published about the data. Data also needs an example. The examples for data are generally just data(randommatrix). The examples are generally the same as the usage, because there's no real example of how to do the data. You could give a small example here on how to subset it, or other things, but this is generally not the most interesting part. The keyword here is datasets. And again, this needs to be written exactly: there needs to be an s at the end and you cannot use a capital. It's very precise. Changing a small letter to a capital letter will already be bad, it will not recognize it. So the keyword is datasets. Good. So now we know how to create a package: create the NAMESPACE file, create the DESCRIPTION file, create manuals, create the R files. The R files can only contain functions, and the manuals need to follow the structure and be written in more or less this LaTeX dialect. We know how to add data to it. Besides that, there are the tests that we have in the individual manual files, because when you check your package, R actually executes the examples in the manual files, to see if the examples really work, so that the example doesn't generate an error. However, if you want to add more tests, you can create a folder called tests, and R files put in this folder will be executed automatically when the package is checked. So you might have a new algorithm, and when the input to this algorithm is 15, then you expect a certain output. You can put that in this tests folder, to make sure that every time you change some code in your package, or you're making a new
version of the package, you do not break the fundamental algorithm. So it's really there to provide more or less a safety harness for when you're coding. Every time you check your package, it will run all of the examples, making sure that every example runs and works, and it will also run all of the tests located in the tests folder. So how do these tests look? Well, I just have a basic, very bad test here. This is test0001.R. Again, you add a header to any R file that you create. And this is a very bad test, because it just randomly stops: 20% of the time I'm drawing a number which is greater than 0.8, and then it will just throw a stop error. But that is generally how these tests look. These tests call one of your functions, so a function that you created, and if the output is not what you expect it to be, then you throw a stop error, and then the check is stopped there. It will directly stop execution and warn you, saying: I encountered an error in test 001, and this is the error message, the error message in this case being "unsuccessful test". Good. So make sure that when you write tests and you are using random numbers as input, you set your seed. Because if I set my seed to a fixed number at the beginning of the test, then this test would either always succeed or always fail, because by setting the seed I'm setting the random number generator to a known point, and drawing one random number after setting my seed will always give me the same random number. In this case I didn't do that, but generally, when you use random numbers in your tests, which is perfectly possible, make sure that you set your seed so that it will always draw the exact same random numbers. All right, so now the last folder of the day: the src folder, the source folder. In the source folder you can put Fortran code, C code or C++ code. This is code that needs to be compiled into one of these dynamic link libraries.
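
Coming back for a moment to the advice about setting the seed in the tests folder: the following is a minimal sketch of what a seeded test file could look like. The file name, the seed value and the function scale_to_unit are assumptions made up for this illustration; they are not from the lecture slides.

```r
# tests/test0002.R  (hypothetical file name, for illustration only)
# A deterministic variant of a random-number test: with a fixed seed,
# every run draws exactly the same numbers, so the test cannot fail at random.

set.seed(1234)                 # arbitrary fixed seed (assumption)
first_draw <- runif(5)

set.seed(1234)                 # resetting the seed replays the same sequence
second_draw <- runif(5)

if (!identical(first_draw, second_draw)) {
  stop("unsuccessful test: seeded draws differ")
}

# Hypothetical function under test; in a real package this would be one of
# the functions defined in the package's R folder.
scale_to_unit <- function(x) (x - min(x)) / (max(x) - min(x))

scaled <- scale_to_unit(first_draw)
if (min(scaled) != 0 || max(scaled) != 1) {
  stop("unsuccessful test: scaled values do not span 0 to 1")
}
```

Because the input is fixed by the seed, a failure of this file points at a change in the code and not at an unlucky draw.
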
Right, so these are DLL files, or if you're on Linux they're called .so, shared object files, and if you're on Mac they're called dylibs, for dynamic libraries. But this code needs to be in the src folder. Yes, hello, hello moderator, very good that you're joining us. So I just want to give you a very, very basic example of how you can use C code, call it from R, and then get the results back in R. We're just going to make one round trip, where we have an R function which calls some C code, in this case not C++ but just basic C code, and then we give the answer which is computed in C back to R. Good. So we need to create a C code file. The C code file that I'm creating is called callTestCfromR.c. It's in the source folder, it has this name, and what is it going to do? Well, it's going to be a very, very basic example. Of course every C file also has a header. In C, comments are slightly different. In R we use the hash sign for a comment line. In C you use forward slash, forward slash, and then everything behind it gets ignored, and slash star, star slash denotes a block of comments. So we're going to write a little C function. I don't want to explain too much about how exactly you write C or C++, because that's a whole different topic, but here we're just going to make a very small piece of C code. So here it says void. The function that we're defining is called R_add, so it adds two numbers together, and this function is going to be void, which means that it does not return anything. You can also see that there is no return statement in the function. In C you specify the return type, the name of the function, and then the function parameters. In this case our function will have three parameters: a pointer to an integer a, a pointer to an integer b, and a pointer to an integer
result. So C uses pointers. Pointers are more or less like little arrows that point to a memory location, and this memory location has a certain format on how to interpret it. An integer is a whole number: it cannot hold 5.3, but it can hold the number seven. So what this is going to do is take this pointer to a, so we follow the pointer to a and look at what is stored there, and then we add the value that b points to. So we just dereference the pointer, look at the value of a, add the value of b to it, and then store this where the result pointer is pointing to. All three of these memory locations are managed by R, so the data never leaves the R-allocated memory. The C code just looks into the memory which R reserved for these things, adds them together, and puts the sum back in result. So what this function does: it takes two whole numbers, adds them together, and then puts the result into this variable called result. Good, so this is our C code. Now we need some R code, because we need to call this code, otherwise it doesn't work. So again we create an R file: in the R folder we put a new file called callTestCfromR.R. Again we have our header saying when it was made, and here we now have callTestCfromR. It is a function which takes two numbers, number a and number b, and it will return the result. It won't put it in a new variable, it will just return the result. But we need to define a memory location where the C code can store its result, and that's what we're doing here: we're defining a new internal variable called result and we're putting the value zero in it. So how do I now call C code? Well, in R that's relatively easy. You just say .C, so dot, big C. So: call C code. Which code do I want to
call? Well, that's the name of the function, so R_add. That's the function that I want to call. Then I want to transform a to an integer, I want to transform b to an integer, and I want to say result is as.integer(result), because I need to provide the C code with a little box where it can put the result. So this will call the C code, provide two values, and provide an empty box for storing the result, and then it will just return everything from this call. So this calls the C code and returns the result. Of course we have to add a manual file for this function, because every R function which is available in the package needs to have a documentation object. Furthermore, we have to update our NAMESPACE file, because in the NAMESPACE file we have to mention that we now have a second function in our package. We only had one function before, but we now have two. Updating the NAMESPACE file now requires two additional lines. First we have to tell R that we have C code, so that we want to use the dynamic library which is being built from this C code. When we build our package now, the first thing it will do is take the C code and compile it into a DLL. So we have to tell R: okay, when you load my package, use this dynamic library. The dynamic library is built by R automatically, but we need to specify the name, and it's the same name as the package, so in our case it's your package name. I'm going to export my first package function, because I already had that function in my package, and then I'm going to export this callTestCfromR, the new function that we made. "Could you kindly tell me when you broadcast your lecture live?" Well, now, and every Thursday at 2 p.m. Central European Time. And generally I will actually do these, not so much announcements, but I will create the live stream before, and then
but generally at 2 on a Thursday we stream. That might change in the future. But generally, if you just go to my main YouTube profile, then if I am live streaming soon, you will see the live stream thing. But I forgot to do it yesterday or the day before, so I only did it this morning. Like half an hour before the stream started I created the live stream itself, so I don't think that I gave people enough warning to follow the live stream. Good. So useDynLib specifies which dynamic library I want to load. In this case I'm telling R: load yourpackagename.dll. And export exposes callTestCfromR to the user, just like the export of my first package function did. Of course I need to provide a manual file. In this case I have two arguments, so the only difference with the previous file is that now we have this arguments section. Oh, this is really small. So this arguments section says callTestCfromR has a parameter a and a parameter b, and you have to specify \item{a} and then give the description of a, and \item{b} and then the description of b: the first number to add up, the second number to add up. And of course we have an example. The example is just calling the function using the numbers 5 and 10. And then, since the examples are actually tests in R: if the result is not equal to 15, then there's something wrong with the code that I wrote in C. So I'm going to explicitly add a test saying that if the result of adding 5 and 10 together is not equal to 15, then I'm going to throw a stop error. That is just to provide myself with the harness, because if I update my code in the future, I might mistakenly change the plus symbol to a multiplication symbol, and if I do that, then when I check my package it will run the example and it will give me an error. So it's really nice to have a lot of these examples which kind of harness
your code with known input and known output. Good. So I have to test my package, so I do an R CMD check. This should not throw any notes or warnings. If there are notes or warnings, read the note, read the warning, fix it. And then I'm going to say R CMD INSTALL yourpackagename, and now we can see if our C code works. We can say library(yourpackagename), and then we just call callTestCfromR, providing a being 5 and b being 10, and then you see it gives us back the two numbers that we put in, and it also returns the result, in a little list. Good. So, some common mistakes when you are building an R package. When you install your package, so when you do R CMD INSTALL from the command line, make sure R is not running while you install the package. If it is, then the package might not really be installed: the code might not be updated. This is not the worst thing in the world, but it can confuse you quite a lot, because if you have R open, you install your package, you go to R, you try your new function, and then it says "function not found", this can be a source of real pain. So every time that you install a package, make sure that R is closed: close your R window, close RStudio, or whatever you're using to run R. Always check your package before installing, so always run R CMD check before you run R CMD INSTALL, because the install doesn't run all of the tests. The check does: it will go into the tests folder and run all of the tests, making sure that everything works. Add enough testing. Use the documentation for very quick and simple tests, and the tests directory is there if you want to do more thorough tests. Of course you have to be able to test your package, but in general, for any function that you write, you can write a test saying that if the input is five, ten and seven, then this needs to be the output. This will really help you, especially when you're building a package which you are maintaining over, like, a five- or six-year period, or a ten-year period.
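
The "known input, known output" idea can be sketched as a small file for the tests folder. This is only an illustration under stated assumptions: add_numbers is a hypothetical pure-R stand-in for a package function such as the C-backed adder (so that the sketch runs without compiling anything), and the list it returns merely imitates the list-shaped result described above.

```r
# Hypothetical stand-in for a package function (assumption, not the lecture's code).
# It returns its inputs together with the result, as a list.
add_numbers <- function(a, b) {
  a <- as.integer(a)
  b <- as.integer(b)
  list(a = a, b = b, result = a + b)
}

# Single known input / known output check: 5 + 10 must be 15.
out <- add_numbers(5, 10)
if (out$result != 15L) {
  stop("unsuccessful test: 5 + 10 should be 15")
}

# A few more cases kept in a small table, so new cases are easy to add.
cases <- data.frame(a = c(5, 0, -3), b = c(10, 0, 3), expected = c(15, 0, 0))
for (i in seq_len(nrow(cases))) {
  got <- add_numbers(cases$a[i], cases$b[i])$result
  if (got != cases$expected[i]) {
    stop("unsuccessful test: case ", i, " returned ", got)
  }
}
```

If the plus were accidentally changed to a multiplication, the first check would already stop with an error, which is the kind of safety harness being described.
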
Right, because in ten years' time you're not going to know exactly what every function is supposed to do, and having these tests gives you the flexibility to look at the code, change the code, and see if all of the tests work. If all of the tests work and you have added enough tests, then you know that you did not break anything. This is really, really useful, and it's kind of this common strategy of test-driven development. Test-driven development means that you write a test first, and then write the code that makes the test pass. Good. So with that out of the way, we will have our first break, or probably the only break for today, because the rest of the lecture is going to be the overview of all of the other lectures. I'm going to go through all of the ten lectures that we had, and I'm going to tell you guys what I think is important and what you should know for the exam. So let's do the first break. I forgot what the first break is going to be. I think it's going to be ducks, but I'm not a hundred percent sure, so don't blame me if it's not ducks. And we'll have some music, the music is called "Barn", from my Stream Deck, and I will run out, get myself some fresh coffee, and I will see you guys in seven to ten minutes. Good, so enjoy the ducks, and be right back. Didn't make it back in time, that was a very short little movie. "Still waiting for the Tasmanian devil." We did Tasmanian devils. When did we do Tasmanian devils? I have to look that up. Let me actually go back to the lecture layout so that you guys can see me as well. I'm just going to quickly look, because I knew I did Tasmanian devils. My videos, live streams. I think we did Tasmanian devils in the linear mixed model lecture, if I'm not mistaken. I'm just going to quickly look at the break. I'll take this one, so you guys can see: lecture number nine, Tasmanian devils, bottle-drinking Tasmanian devils and fighting Tasmanian devils. So we did it. I heard you, we're still seeing your
terminal. It's the wrong chat that I'm reading, so let's put this one back. But yeah, we did Tasmanian devils, so you can't blame me for not doing them. "Why do we have two cameras?" Yeah, why not? Last lecture, right. So yeah, it seems that people didn't watch all of the lectures, but that's okay. I lost it. Yeah, so lecture number nine is Tasmanian devils. Let me mute myself, I really have to cough. Right, so I'm back. So we did Tasmanian devils. "I think you're reading the wrong chat." Yeah, because that's very confusing. You can't really see how many screens I have open. I could put it like this, so that you can kind of see. This is the YouTube live chat thing, and then I have YouTube open of course, and then this is the overview, and then the linear mixed models, and besides that I have all of the other windows open so that OBS can capture them, like the lecture itself. And of course you always need to have this thing open to take a look at your CPU and GPU usage, to make sure that you don't overload the stuff. Streaming is a lot more involved than what you guys just see on the screen. That's why you actually have the second cam today, so that you guys can see what I am doing. And the second camera is nice because I can just do like this. So it's a little bit different today, since it's the last time. "Ah, last lecture?" Yeah, kind of. No, it's not the last video, but it's the last lecture that I'm doing for the Humboldt, since I am moving to a new job. I will be starting a job in Newcastle, at Northumbria University. It's the last lecture of the lecture series, but I'm not going to stop making videos, so there will probably be more, different videos, like the one that I did about monkeypox. And I'm thinking about inviting some of my friends from bioinformatics to have a couple of talks, more like a podcast-y kind of format. I
tried it last time at the end of the R lecture, no, at the end of the bioinformatics lecture, when we had an invited guest speaker. "Thursdays will never be the same." Yeah, I'm very sorry. Depending on how my schedule looks, I might actually stream next Thursday or the Thursday afterwards, but it's the last lecture of the R course. I'm thinking about doing something like showing you guys how to program in D or C++, or perhaps some Python, because of course I program in a lot of different languages. Bioinformatics is not just learning R and doing one language, it's much, much more. Let me actually move this one a little bit. Oh, don't fall, don't fall over. Oh, you shouldn't do a second camera. All right. But yeah, the idea is that I do continue streaming, because I do love streaming, and I'm probably going to teach some different things. "Good luck in the new job." Yeah, I'm really excited about it. I'm also a little bit sad, since it's been eight and a half years of me working here, so it's a big change for me as well. We'll have to see. Anyway, overview lecture. So I'm going to go through all of the 11 lectures that we had, the first 10 of course, plus the R package lecture that we just had, and I'm just going to highlight what I think is important and what I think deserves to be on the exam. I'm going to have one slide for each of the lectures, talk a little bit about what we already covered, and highlight what might be on the exam. So what do you need to study? Everything on the slides. If it is on a slide, it is fair game for me to ask you guys about it. And besides the slides there is going to be the PDF file, so the lecture 10 mixed models PDF by Bodo Winter. I might ask one or two questions about it. I always say that, but I almost never do, I have to be honest about that. "Hold up, big news." How do you mean, big news? "I have to share it right now." All right, so
what's the big news? Waiting for the big news. "Home Office mail is in." Oh, very good. So did you get accepted or rejected? Both of them are big news. "Oscar got a girlfriend?" No, Oscar didn't get a girlfriend. Oscar is my cat, by the way. So fingers crossed, what's it going to be, accepted or rejected? "Visa is granted." Perfect, that is so perfect. I actually got my visa news a couple of days ago, so that means that we can actually go to the UK and work there, and that's good. All right, congratulations. I probably have an audio effect for that, I don't use the Stream Deck enough. "Your application has been successful. You've been granted permission to enter and stay in the UK from the fourth of August." Yay, very good. I'm really happy about that. At least then I don't have to cancel the other job. Good, so let's circle back and continue with the lecture. So what do you need to study? Everything on the slides is fair game for me to ask, so if it's on a slide, then I might ask a question about it. And the PDF, although I generally do not ask too many questions about the Bodo Winter PDF. But I do want you guys to know what a mixed model is, what a random intercept is, what a random slope is, what the difference is, and how you write them down in R. I'm not going to ask anything about the data set, or about the analysis itself. It's just about how we do the analysis in R. Good. So the next slide will mention what I think is important. For lecture number one, we did a very short overview of the history. I talked to you guys about Charles Babbage, I talked about Konrad Zuse, about the first computer, so there will definitely be a question about that. And one of the things that I can tell you in advance is that I love to ask questions about people who won a Nobel Prize, because I'm still hoping. "Don't forget to leave a new email for contact."
My email address will be exactly the same. I'm continuing to use my Gmail, because that's my email, and always has been for the last 18 years or something. So yeah, and do send me a mail about the k-mers, because I think it will make an interesting lecture on how to do k-mer analysis. So, the history. I will definitely ask one or two questions about the history, so just go through the slides, see which people are mentioned, and Google them a little bit. See if they've won a Nobel Prize, because if they've won a Nobel Prize, then they are definitely going to be asked about. There's going to be a question about "why R", and generally the question is going to be something like: name three reasons to use R. And to be very clear: when I do an exam question and I ask you to name x things, so if I ask for three things and you write down four, the whole answer is wrong, because you did not understand the question. If I'm asking for three things, I want to read three things in the answer. I'm so sorry, a very tickly throat. I had some water, but it's okay, I will survive. So if I ask for three things, don't write four. Writing down two will still give you two thirds of the points, writing down one will still make you eligible for one third of the points, but if you write down four things when I'm asking for three, there will be no points given, and that is just because I am not going to pick and choose which are the three right answers. "SARS-CoV-2 tests coming up." I actually am testing myself every day because of the conference last week, and I haven't tested positive yet, so I'm very positive that I'm not positive. So, why R: know a couple of reasons why you should use R, what the advantages are, and also what the disadvantages of R are. Know how to use R as a calculator, and there will definitely be a question about Euclidean division. There will definitely be a question about the built-in constants, like month.abb for the
abbreviations of the 12 months, or about pi, or these kinds of things. There will definitely be questions about the different data types, so the different data types are numeric, vectors, matrices and these kinds of things. I also want you to be able to index a vector or a matrix. So if I give you a vector definition, like vector1 <- c(1, 2, 3, 4, 5, 6, 7), and I'm asking you guys to select the fifth, the eighth and the ninth element from this vector, then you should be able to do that. There might be a question about variables, but I think that we've seen enough variables that people know that it's just a name that you can assign to something. You can put something in a variable, and you can use the variable without knowing what's in it. So that's what I want you guys to know. For lecture two, we started off doing variables again, so a variable is like a box: you can put things in, and you can use the box without knowing what's in there. But there will be some very small control structures. I want you guys to know how to write an if statement, a switch statement, a while loop and a for loop. These won't be very elaborate questions. They will be at most four or five lines of code that you have to write for a single question during the exam. It's not going to be "write an algorithm to compute the greatest common divisor" or something like that. No, there will be very small questions, like: write a while loop which prints out the numbers 10 to minus 5, and make sure that there's a new line after each line that is being printed. I want you guys to know the difference between a statement and an expression, so there might be a question asking: is this a statement or is this an expression? There will be a question about advanced looping, like how to use the apply function or the lapply function. Of course we also talked about functions a little bit. I told you
a little bit about the theory behind functions, and about what the scope of a variable is. So if you have a function, you can define an internal variable, and this internal variable is not visible from the outside. "It's positive to be negative." Yeah, it's definitely positive to be negative, and I was very careful at the conference, so I wore a mask almost everywhere. It's just that you have the shared dinners, where it's really hard, because you can't eat with a mask on. Someone should find a solution to that. Escaping the inevitable. Be sure to know how to escape and what to escape. You can have, for example, an enter, which is a backslash n. So escaping is done by putting a backslash in front of a modifier, and of course the backslash itself needs to be escaped as well. And know the different random distributions: know that there's rnorm for sampling from Gaussian distributions and runif for uniform distributions, there are Poisson distributions, and these kinds of things. I want you to be able to read in data. So when I show you, for example, a little piece of a data file and I ask you to write a read.table or read.csv call that reads in this table with the correct header and row names specification, then you should be able to do that. There will be a little bit of subsetting of data, so know that %in% is there, so that you can ask which elements of this vector are in another vector. The which function takes a vector with logical values and gives you back the indexes which are TRUE. And know that you can write data using write.table and using the cat function. There might be one or two trick questions there, but I generally avoid trick questions. We also talked about biomaRt in lecture three, so know what the difference is between a mart, attributes and filters. A mart is a data provider, attributes are the things that you want to retrieve, and filters are the things that you are going to specify. So a filter might be chromosomal location, and
then the value might be chromosome one, from one million base pairs to two million base pairs. But a filter might also be a gene name, and then of course the value is going to be the gene name. So know how to use biomaRt. I think that biomaRt is important because the R courses are for biological sciences, or plant and animal sciences, so biomaRt is one of these packages which is really important when you do biological research. Lecture four was about univariate versus bivariate statistics. We talked a lot about univariate statistics: what is the mean, the median, the mode. We talked about the dispersion measurements, like the ranges and the quantiles. We talked about spread, like variance and standard deviation, and know how to compute these things using R. We also talked about shape, like skewness and kurtosis of data, and here it is important to know which packages are available and which functions you can use to, for example, assess if a distribution is skewed, or if it shows positive or negative kurtosis. During lecture four we also talked about plots, so know how to generate a basic box plot, a histogram or an image plot in R. We also talked about par to set global parameters, like the size of the points and the size of the letters, but also know how to use par to make, like, a dual plot, so to have one window with two plots in there, using mfrow or mfcol. In lecture five we talked about classes of objects, so that you can actually add a class to a list, for example, and then you have your own default functions. You can write your own summary function, so that when people type the name of the variable holding an object which has a type that you defined, you can print out a little overview of the object, instead of having thousands and thousands of lines of data run in front of your nose. Not only the summary function, but also the print and the plot function and the image
function can be overloaded using this then know what the artist's palette model entails and that base r uses the artist's palette model right so you start with the background and you work towards the foreground right so you set up your axes then you plot for example your points and then you draw a line which goes on top of the points and then you draw an arrow which goes on top of all of the other things that you already drew then we also discussed some important plot parameters like know how to change your axes know how to change the size of the font that you're using and these kinds of things of course you don't have to know that pch being 18 is a filled diamond and these kinds of things that goes way too far but know that pch allows you to set which type of plotting symbol you are using we also discussed some functions for plots things like lines points text text outside of the axes using mtext the title the axis and also know that the with function can be very useful right the with function allows you to take a data frame and make all of the columns of the data frame into variables and also know that this is the reason why there is a requirement on column names of a data frame right because r wants column names to be proper variable names that means that a name cannot start with a number you cannot have like weird symbols in there and these kinds of things and also we had like an overview of what makes a good plot right so make sure that your axes have units on them make sure that there's a legend which explains all of the elements in the plot and these kinds of tips to make good-looking plots in lecture six we talked about the common microarray workflow we talked a lot about different normalization techniques and know why we normalize right know that there are normalizations of scores and normalizations of ratios and know why we are normalizing in general we also talked about log ratios and this is of course because if you have two variables which have a
very different range you can take the log ratio of the two and then still end up with numbers which are linearly related right so this has to do with the dye bias in microarrays where the green dye has a much larger dynamic range than the red dye the red dye has a much smaller dynamic range we talked about t-tests what are the assumptions underlying a t-test so know the assumptions underlying the t-test which are that the data follow a normal distribution that samples are independent and all of these things we talked about correlation as well so correlation is a measurement of how one variable reacts when another variable changes so if one variable goes up the other one goes up as well and then the correlation will tell you how strong this relationship between these two variables is we talked about multiple testing and things like type one and type two errors right so a type one error is saying to a man that he is pregnant while he never could be while a type two error is saying to a woman that you are not pregnant while she actually is so more or less right we have a couple of slides about type one and type two errors so type one errors are false negative type two errors are false no the other way around type one errors are false positives type two errors are false negatives and i also showed you where you can get a lot of free microarray data so know the two databases that provide you with free microarray data and also know the difference between them in lecture seven we talked about algorithms and design patterns i won't ask a lot of questions about this i also probably won't ask you to write a recursive algorithm but know that recursion exists and know that recursion has a single parameter which always goes up or always goes down towards the base case and that this variable is called the recursion invariant so it is for example x right and then when we call the same function in the return statement we call it on x minus one so then x is the recursion
invariant and also know that you can have indirect recursion which means that you have one function calling another function which calls the original function right so that it's kind of a loop of two functions calling each other but yeah know what recursion is know what the base case is know that the recursion invariant is the variable that counts down or counts up towards the base case and makes it so that recursion stops at a certain point so in lecture eight we talked about regression we talked a great deal about regression because it's one of these fundamental algorithms to analyze data and to find relationships or associations between variables know how the regression model works right that we have unknown parameters which are beta one beta two beta three and this is the thing that we want to know right we want to know how strongly some variable influences our response variable so the dependent variable right and the variables whose effects we want to estimate are called the independent variables and they are denoted x and the dependent variable is the thing that we want to predict which is called y so we talked about simple linear regression we talked about that every beta that you calculate comes with a confidence interval and that this is just a statistical association right so it's not the truth because all models are wrong some are useful but that every parameter that you estimate comes with a certain confidence interval we talked about creating regression plots like plotting an independent variable against a dependent variable and then adding in the regression line we talked about multiple linear regression where you don't have a single independent variable but where you have multiple independent variables right where you have for example the body weight of a mouse is determined by the sex of the mouse the age of the mouse perhaps its feeding behavior yes so then we have multiple independent
variables for each of these we want to know the effect on the dependent variable and then it's called multiple linear regression we also talked about regression models which in first instance do not seem to be linear like quadratic regression or e to the power of x right but these are still linear regressions so you can have a model where you say my dependent variable is determined by the body weight to the power of two or my body weight is determined by the sex to the power of three right so there can be quadratics or to the power of three or e to the power of x in regression models and this still is a valid linear regression model because it is still linear in the betas although it might not look linear and know that you can use the curve function to plot things like quadratic regression so in lecture nine we talked about linear mixed effect analysis we talked about the random effects and that when we do an lme a linear mixed effect model we do this because one of the assumptions of a standard linear model is violated for example we have measured the same individual multiple times so that means that not all measurements are independent of each other or we measured brothers and sisters right so these individuals are not independent measurements since a brother and a sister share 50 percent of their genome and this needs to be accounted for in the model because otherwise you would overestimate your significance so know how to do these linear mixed effect models in r and know the difference between a random intercept model where for each individual you allow it to have its own intercept right its own mean of a certain variable and that there is something like a random slope model where you allow the slope of the variables oh my voice is getting really bad so a random slope model is when you estimate multiple slopes each individual is allowed to have its own effect right so if we think about body weight and for example your food intake right then we might allow every individual to have
an individual food intake slope where for some individuals taking in a lot of food will lead to not a lot of increase in body weight but for other individuals eating a lot of food might have a lot of effect on body weight and of course the pdf is part of the thing that you want to learn so the Bodo Winter pdf is part of the exam good so in lecture 10 we talked about different linear models so even more complex linear models where we have a response variable which might not follow a normal distribution or a response variable which might not even be continuous right so in the case of a case control study where some people are responders and other people are non-responders or if we think about a survival curve where some people or some mice die during the experiments and others survive right so if you are a survivor or a non-survivor and then we have to use different models we have to use link functions so we then tell the linear model that well the response that we are looking at the dependent variable is not a normal distribution so know the difference between a standard linear model a linear mixed effect model which is fitted using lmer and then we have a generalized linear model so a generalized linear model allows you to have a response variable which is not a continuous variable and for example follows a Poisson distribution if we are using a generalized linear model or a linear mixed effect model then generally if we have multiple factor levels in the independent variable right we had the example where we have for example your admission into the university being determined by whether you are from a certain level of high school right you might be from a top tier five percent high school or you might be from a top 50 percent high school so the example was we had multiple levels of high schools right so multiple factor levels know that you can use the Wald test to group these into a single probability value like how likely is it that the high school
that you went to is changing the admission because when we are using a factor with different levels in r then every level gets its own beta estimate and gets its own p-value but by using a Wald test we can combine these effects and these p-values into a single effect and a single p-value we also talked about common idioms so things like melting and casting going from a wide format to a long format and we also discussed a bunch of other idioms so be able to use them and be able to recognize some of these idioms so when should i use certain types of code so in lecture 11 which we just did we talked about how to create a package and so what do you need could be a question on the exam and you should then answer well you need r you need Rtools and you need MiKTeX and you should also know why you need it right so you need r to load your package eventually you need Rtools to compile the code be it r code or be it c plus plus code and you need MiKTeX for the documentation files know what the difference is between a DESCRIPTION file and a NAMESPACE file and know that there are some special files and also know the different folders right so if i ask you what is the folder called holding the documentation then you have to say man good so with that for me there's nothing left but to wish you all very very good luck on the exam register for the exam if you haven't the exam will be on the 28th i'm saying from my mind let me actually look that up log in c for leisung's fist i know um leisung's termine irons so the exam will be on the 28th of july so i wish everyone good luck on the exam the exam is not hard if you follow the lectures um then you should be able to pass the exam with at least a 3.0 or higher or lower in the german system because it works the other way around but it's not a hard exam but it does check if you listened to the lectures and if you did some of the assignments so of course there might also be some questions about
the assignment good and then it's your time to give me some remarks some feedback um i see that there's not a lot of students currently attending which is logical because it's like 27 degrees and sunny outside so i when i was a student you when the weather was like this i would not attend lectures as well which is perfectly fine um but if you if you want to leave some feedback um and i'm not looking for positive feedback i'm looking for things that i should improve things that you think should be better um then let me know in the comments or throw it in chat now um but yeah like i'm always like feedback is the thing that helps me improve for next year um and that's always good so if you if you say well um you should do this differently or when you are showing us are it's too small or these kinds of things um then um it's good to let me know in the comments and like michel said before about the corona test it's uh it's it's positive to be negative uh can i go for exam if i start watching now um yes if you are registered to the humbled university you probably could join the exam still um but if you are studying at a different german university then it's going to be hard to make the first uh round because then you still have to register as a neighbor and these kinds of things um but yeah if you in theory if you would just start learning like four days before the exam and you would put the whole youtube movies in like two times speed then you could watch everything in like 20 hours right because it's kind of 40 hours in total and then if you do it at two times speed it's 20 hours and then you can do the assignments in the evening then you should be perfectly able to to join the exam and and pass it right it's the if my goal is not to have you guys fail the exam my goal is just to make sure that you attended the lectures and that you know the basics of r so that you can write a for loop that you can select things from your matrix that you can make a subset that you can 
load in files that you can write out files make some basic plots understand how a linear model works so it's not it's not for me to make it difficult for you i want you guys to pass and i want you guys to get a good grade so that that's my goal good so yeah if you have any feedback hey like it's positive to be negative so let me know and say well dude you're talking way too fast or i don't know i don't know if if there's anything that you don't like about the lectures then do let me know and and don't don't keep quiet because if you are bothered by something then other people might be bothered by it as well and then it's good to let me know so i can change it so with that we are at the end of the lecture so it means this exam is paid i live in france i have attended your lectures max one hour so yeah the problem is is that the lectures are given as part of the lecture series for the Humboldt university so that means that anyone studying at a german university can do the exam you just have to register as a neighbor which is filling in a form i think you pay like 15 euros in administrative cost and that's it if you are part if you are enrolled at the Humboldt university joining the exam cost you nothing so it means this exam is paid i live in france or i have attended your lectures max one hour and i've registered on Moodle but i don't know university procedure yeah so then it then it's going to be relatively hard to do the exam because i can only give credits to people who are in germany because i'm certified as a teacher in germany but not in france so but then like you're more than welcome to follow the lectures and learn stuff from it great lectures hope you make many more love bioathematics and maybe some real practice it would be great thanks a lot for sharing your yeah no you're welcome so um but shana was yeah it it's going to be hard getting your credits approved for a french university um it is possible but it is difficult um so um that that's the thing i 
didn't do the exam to learn a lot yeah no but that's the thing right like the idea is that i stream them to make them available for everyone so everyone can learn how to program r um do i need r if i know a language like c um in theory the languages are equivalent so if you know how to program in r you don't really need to know how to program in c if you know how to program in c you don't really need to know how to use r but every language that is out there and still exists has an advantage like one of the things which r is really bad at is managing random access memory um but a language like c gives you very strict control over your memory but not just memory but also over timing like there's no garbage collector which can come in and screw your timing so if you use r um then r can do a lot of things for you like linear models in like a single line of code in c you can also do linear models but that's going to be like 20 30 40 lines of code because you have to write part of the code yourself or you have to use a dynamic link library developed by someone else and the nice thing about r is that you have built in linear models you have built in probability distributions so the language is different from c i will practice by myself and i'm already using r for my phd tidyverse ggplot yeah yeah but the idea behind my lectures is that you don't need all of that like i've been working with r for eight years here four years phd that's 12 years plus two years master is 14 years i've been programming in r for 14 years and i've never used tidyverse never ever if i see people writing code in tidyverse to select a column from a matrix i'm like facepalming and thinking like my philosophy in programming is that you want to have as few dependencies as possible because dependencies create pain and baggage for the future because tidyverse might change tidyverse might disappear altogether chances of it are
low because it's used by a lot of people but there is no guarantee right someone like Hadley Wickham is a really really good programmer he made ggplot but if he gets hit by a bus and there's no funding to hire someone after him then the whole thing ends right and then you are stuck with the version that you have and bugs don't get fixed and so the idea behind all of these lectures is to make you guys familiar with how to use base r to use r without any dependencies and that is almost impossible because there are some really really good packages out there right so packages like biomaRt but i try to always minimize my dependencies so one of the rules when you build an r package is that you cannot have more than four dependencies so as soon as you type library ggplot library tidyverse library something else right you can only use four and that's the limit i like ggplot only when i use it to plot real-time sensor data it always gets me in trouble i like ggplot as well i love how it looks it's just that after programming in r for like 14 years i am not going to make the investment to learn a whole new syntax and that's the same for tidyverse right everything that you can do with tidyverse you can also do with base r it's just a layer on top which should make it easier for people but if you know base r then there is no reason to use tidyverse or tibble or the other stuff because it doesn't add anything it's just the same but using a different syntax so for someone who already knows r and is already programming in r for 14 years it's just an additional investment for me right i have to spend more time learning a new syntax figuring out what it does like as soon as i see this like percent greater than percent symbol i'm like i'm not gonna read this code because like they're generally using it for very very basic things which could just be achieved by using an apply function um but that's just the way that i think i think
dependencies should be avoided because it just gets you into trouble in the long run because someone like Hadley Wickham he's a perfect guy the code that he writes looks really good i love ggplot i love the stuff that he does but if he gets hit by a bus tomorrow it all ends and then if you only know how to plot in ggplot then you will have a problem because ggplot will more or less cease to be updated and in a couple of versions it will be thrown out of r so there's always a risk when you use dependencies so i love ggplot it looks beautiful i've used it myself in the past a couple of times but if i am teaching a course to teach people how to program r in my opinion ggplot should not be in there because it is a different syntax altogether what is the best route for learning data science the best route for learning data science is learning at least one programming language be it python be it r um be it Perl um those are kind of the three main languages which are used in data science so learn a programming language and then find questions that interest you and try to answer those right try to get a feeling on how you build linear models or how you use things like machine learning to answer questions because in the end data science or bioinformatics is about extracting knowledge from data right data is just that it's just stuff that people measured and it's not knowledge in itself right people always say but we can sequence it right but sequencing a genome does not tell you anything you need to figure out how these variants in the genome are controlling things like phenotype and the same thing holds for data science no matter if you're looking at political science or if you're looking at economic science what people do is they gather large amounts of data and someone in data science is there to formulate interesting questions and then to use the data to try and answer those questions so the best way
to learning it is learn one programming language and become really really good at it so just spend five hours every weekend programming writing little programs for things that interest you i think i gave the example here on stream that i was talking to people and they were mentioning to me like well all my friends are having their birthday in more or less a two-month period so am i selecting my friends based on when they were born or which month they were born and these are very simple questions to answer you can just go on the street ask random people like what is your birthday what are the birthdays of your two best friends right completely anonymous you don't need to know their name you just want to know their birth date and then the birth date of their two or their three best friends and then you can start gathering a little bit of data and then you can start writing a script to analyze that data if you're interested in weather data and for example climate change then there is so much information out there you can get like 150 years of temperature measurements and humidity measurements from from most weather stations just figure out what your local khanemi is khanemi is the the organization in hollands which collects the weather data and and does all of these things and there's so much free data available like you can use google financial to get 25 years of stock prices so if you're interested in modeling stocks and and doing predictions on if if a certain stock will go up or if it will go down all of this data is available for free if you are interested in cancer and how genes or which genes are involved in cancer there is literally 40 50 years of microarray data available at databases like the gene expression omnibus which you can download for free so just download some data and start writing code start understanding what the data structure is what could go wrong how you can model that and how you can model for example the influence of weather on stock prices 
because two data sets completely disjunct from each other the one is weather data the other one is is stock price data and you can mash them up right you can you can see if the weather has an influence on the stocks of tesla or if it has an influence on things like unilever right so there there's a lot of things that you can that you can do and the most important thing is learn a programming language and learn how to use it for yourself to to analyze or to answer your own questions that you might have because just following a bootcamp or just listening to me talking on youtube is not going to make you a good data scientist the way that makes you a good data scientist is to to download data and start doing your own analysis which sounds really conspiracy theory right like people in conspiracy theories always tell you like oh do your own research right but doing your own research is is is what you should do to become good it's just that you shouldn't do research on facebook right you should you should use valid data sources like your local police station which publishes crime reports if you're interested in crime or use your local metrological service if you're interested in weather or climate change but find where the raw data is download the raw data and start modeling yourself and sometimes the answers might really surprise you and and might be different than what other people think but yeah that's that's the important part do your own research but do it based on valid sources and do it based on on how you think things should be modeled like george box always said and i can't repeat this enough like if you do data analysis then you're always doing something which is a statistical model right it's something which is not true you're always working with associations which are not true measurements there there's always a high level of uncertainty be aware that there is uncertainty when you make graphs put error bars on the graphs like show what the confidence is so 
that's kind of my advice is just like learn a programming language and start answering questions that you have yourself and it can be really fun and that that's also one of these things which makes you a good programmer a good programmer does fun stuff we we didn't talk about how to make little movies or animated gifs in r and in the past i wrote a little flappy bird kind of thing for r where you use the r plotting window and you you overdraw every frame and if you press enter then like a dot goes up and you have kind of a flappy bird system where you can do fun stuff with it doing fun stuff doing crazy stuff is part of programming programming is actually a very very creative field in my mind it's it's not it's not like people say a real better right mathematics is really better physics is really better but programming is kind of in between the two and you have programming encompasses everything it allows you to be creative to find new ways of doing good so that's a whole rant about one question thanks for the question shufo dip i hope i'm pronouncing it properly um so if there's no more questions then thank you guys so much for for being here for my for the last lecture of the current r course i will definitely be back i will be streaming on twitch probably next week since i will be on holiday next week and the week after thanks for the lectures the past two two and a half years and good luck at your new job yeah well we'll we'll stay in contact misha and i'll definitely stream more like there's there's so much interesting stuff to tell you guys like i literally have like one and a half gigabytes of code because i had to clean up my my computer here for going to my new job and there's there's there's so much crazy stuff that i did in the past which resources are best for learning r so for learning r i would say that if you follow this course right if you look at the 11 videos that we i now made so the 11 live streams then in total you have around 35 40 hours of me 
talking about r and live coding the examples then you have a relatively solid basis there are a couple of good books out there like what's the one that i always advise and those are actually available on moodle as well so in the moodle system there's three books that i put in there so it is a beginner's guide to r from Zuur 2009 then there is introductory statistics with r by Dalgaard 2008 and then there's understanding statistics using r by Schumacker and Tomek published in 2013 those three books used to be free from springer because of the pandemic and those are really really really good introductory books so you just go through the book most of the books have some assignments in there i can recommend the data analysis using r lectures by dr outins yeah well i don't want to promote myself too much do you have an indication of other great lectures for learning r like yours ooh that's difficult i was at a conference last week in rotterdam and i figured out that one of the um one of the guys that i met there um also gives lectures on youtube i don't know actually if he gives r lectures but i can look that up very quickly because i subscribe to him um and i don't see do they do r courses introduction to genomics blink in r r for beginners um yeah look up genomics bootcamp um and then the uh r for beginners lecture um let me copy the link for you guys to put in chat um see if that link works for some reason what's going wrong with the link then let me try this again so that didn't work too well um but yeah just search for r for beginners and then the channel is called genomics bootcamp i don't like the name of the channel but he's a really good guy like uh Gábor um he's a good guy he really tries hard um and yeah definitely the introduction by Karl Broman like the link that my moderator put in chat is really really good links okay okay yeah because when i click the link i go to a channel error but that might be because i'm coming from the
streaming environment and not on youtube itself so yeah um carl berlman definitely works um is is worth it as well um he's one of the guys that when i was learning r and learning statistics really helped me like a lot in in figuring out how statistics worked um so definitely and there's definitely like like just search for r course right and on youtube it's really hard you have to like the person in a way right the the voice has to be okay the sound like they're it really depends um it also depends on your own background so um but yeah that definitely uh give gabar a shout out because i i met him on the conference last week and i saw his lectures or not so much his lectures he did a talk about education via youtube um and i totally agree with with what he said um like it it is very important to involve people directly when you do lectures um but he he has a slightly different idea behind it because he makes these like 20 minute videos um and he's not like me right like i'm just sitting here talking in my office um but he really like writes a script beforehand and says we want to know this this this and this and then he kind of compacts it all into 20 minutes um while i'm more from the other side i'm more like i want you guys to exactly know what's going on and i i'd rather talk for an hour about how to select something from a matrix and make sure that you understand exactly the minutia while he just does it slightly slightly different um but yeah carl bromann is like perfect as well um he's especially good when it comes to uh comes to statistics um and and genetics so he's definitely uh worth checking out as well all right any more questions like i'm starting to get the feeling that we're just getting warmed up and although the lecture is more or less finished but that we um that there's actually some good questions coming up so um i like that a lot i like that a lot so but if there's no more questions then uh i'm just gonna and the last lecture of this r-course 
very very sad i actually might do one next week about like a topic that interests me and is not part of the exam um like i said i will be free from next week on tomorrow's my last day at work here and then i have like one and a half month uh before the next job start and i might make some videos i might play some video games on twitch because i have my twitch channel as well um so all right so then thank you guys for watching thank you guys for being here and um make sure to give a like or a subscribe to the channel it really helps out on youtube like by liking the video other people will get informed about it and it the the reach is just a lot bigger good then uh with that um thank you for being here for the last like two and a half hours i really enjoyed it i hope you guys enjoyed the course if you have any feedback negative feedback um also let me know um because i'm always glad to hear from students and um get some feedback on things that i might improve in the future so with that um i will see you guys next time um i don't know exactly when but i will make sure that i post at least early or more in advance that i did today today was a little bit silly because i i did not schedule the stream like a couple of days in advance but i hope people that enjoyed the lectures and the stream are still here and uh i i thank you so much for being here and i will see you guys next time so thank you and goodbye
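
A minimal base R sketch of a few points from the exam recap earlier in this lecture (random distributions, `%in%` and `which`, recursion, and quadratic regression as a linear model). Everything here is an assumption made for illustration: the data are simulated, and the names `genes`, `wanted`, `fact`, `age` and `weight` are made up and do not come from the lectures or the assignments.

```r
# Illustrative sketch only: all data below are simulated, not course data.
set.seed(1)

# Random draws: rnorm() for Gaussian, runif() for uniform, rpois() for Poisson
gauss  <- rnorm(5, mean = 0, sd = 1)
unif   <- runif(5, min = 0, max = 1)
counts <- rpois(5, lambda = 3)

# Subsetting: %in% asks which elements of one vector occur in another vector,
# which() turns a logical vector into the indexes that are TRUE
genes  <- c("A", "B", "C", "D")   # hypothetical gene names
wanted <- c("B", "D", "Z")
hits   <- genes %in% wanted
idx    <- which(hits)

# Recursion: x counts down towards the base case (x <= 1), so the calls stop
fact <- function(x) {
  if (x <= 1) return(1)           # base case
  x * fact(x - 1)                 # recursive call on x minus one
}

# Quadratic regression is still a linear model: it is linear in the betas
age    <- 1:20
weight <- 2 + 0.5 * age + 0.1 * age^2 + rnorm(20, sd = 0.5)
fit    <- lm(weight ~ age + I(age^2))
betas  <- coef(fit)               # intercept, beta for age, beta for age^2
ci     <- confint(fit)            # every beta comes with a confidence interval
```

To look at the fit, one option is `plot(age, weight)` followed by `curve(betas[1] + betas[2] * x + betas[3] * x^2, add = TRUE)`, which draws the fitted quadratic on top of the points, in line with the background-to-foreground plotting order described in the recap.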