 I'd like to learn a language, but I'm not really sure what to learn. I hear languages like C, C++, Java, Python, R, Haskell, Ruby, all these different languages. Which one should I learn? I'm just like confused. Well stick around. I will tell you the language that you need to learn and you know what, it might shock you what I have to say. If you do a Google search for should I learn R or Python, you're going to get just scads of different articles telling you which one you should learn. And they will tell you, oh, you should learn R, no, you should learn Python, or no, you should learn Julia, because it's really cool, right? Well, you're going to get just tons and tons of information. And dear viewer, I can just tell you, don't bother, don't bother. They provide a lot of flash without much substance. There's not a lot there when you kind of strip things away and you kind of try to figure out what they're actually trying to tell you. What is most important, though, is that I want to congratulate you. I want to congratulate you for making the first step and acknowledging that you need to learn how to analyze your data in a reproducible way. And to do that, you need to learn a programming language. Many years ago, I was visiting a college campus and as we do, we have lunch with graduate students and postdocs. And one of the things I asked them was, in 10 years, do you think we're going to have more data or less data than we do now? And they all said, oh, of course, we're going to have more data 10 years from now. And then I said, well, do you currently have problems analyzing the data you have? And they just realized what they had just acknowledged, right? That in 10 years, we're going to have more data than we do today. And if you don't have the skills today to analyze today's data, well, you're going to be hosed in 10 years when you have even more data that you need to work with. 
And so the key to figuring out how to analyze that data in a reproducible manner, as you've already acknowledged, is to figure out how to program in a programming language. And so I will tell you the point of this video right now. It doesn't matter what language you learn. What matters is that you learn a language. That's the video. That's my point. That's all I have to say. Learn a language. That's all, right? I don't care if it's Python, R, Julia, Haskell, C++, whatever. Learn a language. But Pat, you might be saying... Well, OK, let me help you winnow things down to those languages that will probably be the most helpful to your growth and success as a scientist. Before we dig into the big debate between Python or R, I want to give you a little bit of the landscape of different programming languages. There are two different types of programming languages. There's compiled code. These are languages like C and C++. With compiled code, you take the code that you write and the compiler converts it to machine code that's understood by the processor on your computer. The advantage of programs written in C or C++ is that they run very fast. If you've ever used my software package, mothur, that's written in C++ because there speed is of great importance. The downside of compiled code is that it tends to be harder to write. There's just a lot more that you have to do to actually get the result. So if I wanted to add a string of 100 numbers, well, that's probably going to take me three or four lines of code. If I do that in R, I can do that in one line, no problem, right? And so the compiled code will run very fast, but will be slower to develop. Interpreted code, on the other hand, is very fast to develop, but tends to run a bit slower. It is interpreted because every time you run it, it reinterprets the code to then be run on your computer. It uses that interpreter to convert what you've written into something that your computer can understand. 
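As a rough illustration of the point about adding up 100 numbers, here is a minimal Python sketch (the video's own one-line example is in R). The function names `sum_with_loop` and `sum_one_liner` are made up for this example. The first spells out the loop, roughly the shape the task takes when you write every step yourself. The second leans on the language's built-in `sum`.

```python
def sum_with_loop(n):
    """Add the integers 1..n one at a time with an explicit loop."""
    total = 0
    for i in range(1, n + 1):
        total += i
    return total


def sum_one_liner(n):
    """Add the integers 1..n using the built-in sum()."""
    return sum(range(1, n + 1))


print(sum_with_loop(100))   # 5050
print(sum_one_liner(100))   # 5050
```

Both versions give the same answer. For a task this small, the difference is in how much you have to type, not in how long you wait.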
And as I said, it is very fast to write. So again, adding up a string of 100 numbers. In R, you'd say sum(1:100), bam, you're done, right? But it tends to be slow. And so I would say for summing up 100 numbers, you're not going to notice the difference in how long it takes to get that sum, right? The difference between C++ and R, it's minuscule, right? And it's certainly not something you're going to notice and isn't worth perhaps all the extra overhead of figuring out how to calculate that sum in C++. So interpreted languages include things like R, Python, JavaScript, Julia, Ruby. These are often also called scripting languages, and they are interpreted, right? And so they don't get compiled. When you run your code through a program like R or Python, that program takes your written code and converts it to something that your computer can understand. So let me help you to narrow that landscape. And let's just think about interpreted languages. For most of what people have to do out there, something like C or C++ just really isn't necessary. If you're doing something that requires a lot of heavy lifting and a lot of intense computation, sure, you can go learn C++, but perhaps see if it works first in one of the interpreted languages. So I'm going to narrow things down even further for you among the interpreted languages, as you've already guessed, to R and Python. And so yes, there are other languages out there, things like Ruby and Julia. Those are interpreted languages, but they're perhaps not as mature or perhaps not as well developed for a lot of data science or bioinformatics types of applications. So why would I have you choose Python or R? Well, one reason is that they're both free. You don't have to pay for them, right? If you're to use something like SAS or SPSS or GraphPad Prism or Microsoft Excel, you're going to have to pay for them. And if not you, then someone at your university is paying for a site license to get that software. 
I see this distinction come up frequently among engineers who all love using MATLAB, thinking that MATLAB is free, but not realizing that their engineering school paid for the site license to MATLAB, which would cost most of us hundreds or thousands of dollars. So while MATLAB has a lot of the same features as things like Python or R, it's not free and it's actually very expensive. And that cost, I mean, it costs money, right? But then that cost also naturally will keep people away from using your code. And we all want people to use our code and to understand what we're trying to do because we want our analyses to be reproducible. So cost is a big factor that keeps our code from being reproducible. Going along with both programs being free is that they generally come pre-installed with any Linux-based operating system. So if you're running Linux, obviously. If you're using a Mac, the OS X operating system has R and Python already installed with it. If you're using Windows 10 and have the Bash subsystem installed, that will also have Python and R installed. The next reason that I would encourage you to think about Python or R is that there are wonderful communities out there developing code to expand the features of these two languages. Could you imagine going to the developers of Microsoft Excel and saying, it'd be really cool if you could figure out how to make these new plots that people are calling joy plots or ridgeline plots. Do you think you could incorporate that into what you're doing here with Microsoft Excel? I wouldn't even know who to ask, right? Well, with R or Python, I could develop that, right? I could develop that in a package and then put that out there for the R or Python community to use. And that's exactly what happened, right? Someone made the ggridges R package, at least for R, so that anybody then can make a ridgeline plot and expand that package even more and make it even better than it originally was. 
That's just something you can't do with those proprietary software packages that you have to pay for or that are so locked down by the company that's developing them. And so really the ability to do all sorts of different analyses, things from textual analysis to different types of data visualization, is just baked into the ecosystem of programming in R or Python. And so I just can't say enough about the wonderful packages that are out there, hundreds, thousands of packages that allow you to really expand the toolset that's available within these languages, which just is not possible with other computational platforms. Beyond the communities that are there developing packages, there are also great communities out there to help you develop your R skills, right? There's Stack Overflow, there's the RStudio community, where you can post questions, you can answer questions, you can look at other people's old questions. There are meetup groups. Many major cities, or not even major cities like Ann Arbor here where I live, have Python and R user groups. There are also user groups that are focused more towards women, and just really nurturing environments that have been built up around both of these programming languages. I'm sure there are probably meetup groups for Microsoft Excel, but I'm sure they're just not as fun or exciting as the R and Python user groups. So R and Python are both modern, yet mature programming languages. They've both been around for several decades now, and they have a full set of features baked into the languages themselves. And as I already mentioned, they have this universe of other packages that can expand the utility of those languages. Other options for these interpreted languages might include things like Julia. And so Julia is an up-and-coming new programming language that I don't think is so full-featured quite yet and doesn't really have the big user community that you'll find for R and Python. 
And so while Julia is something certainly to be aware of, it's not something that I would encourage somebody to jump into right now here at the end of 2021. At the same time, there's also a language like Perl. And so as a postdoc 20 years ago or so, I learned Perl because that was kind of the cool thing at the time and it was really popular among people in bioinformatics. But it kind of became stale and just didn't continue to develop with the same types of packages and support and everything else that you now see from Python and R. And so for that reason, I wouldn't really think about those other interpreted languages and really I would run with R and Python. Something else I'll add is that both R and Python have very rich and very complementary sets of features. Within the tidyverse with R, you're probably familiar with dplyr. My understanding in Python is that the analogous package there is something called pandas. They both have rich plotting systems. They both just have a lot of rich, great features that really complement each other well. If you go online and do these silly Google searches again, you'll probably find people say things like Python is great for machine learning. Well, the same features are available in R as well. And oftentimes the libraries that are available in Python have been ported over to R and those from R to Python. If you're one of these engineers that I love so well, know that there are R packages that are basically wrappers on all the features from MATLAB. And so again, these languages intermix with each other because they want to be as full-featured as possible. The other thing to keep in mind is that as these languages continue to mature, they break down the walls between the languages. And what I mean by that is within R, I can program in C or C++. I can also run Python code within R or I can make an R Markdown document that's running multiple different languages. And the same is also true from the Python environment. 
There are also Jupyter notebooks which come from more of a Python lineage that allow you to run all sorts of different languages within those notebooks. It's not just for Python. You can run R, Julia, Perl, Bash, whatever you want from within those notebooks. And again, as these languages continue to mature and as the different communities talk to each other, I think you can really expect to see a lot of great cross-pollination between the different languages to only make them better. Finally, I will tell you that both R and Python are widely used in the bioinformatics communities and data science communities. If you do Google searches or go to Amazon and do a search for R and bioinformatics or Python and data science, you will find tons of resources to help you to learn those languages. There's lots of great books, all sorts of different discussion forums, as I mentioned, web-based tutorials, videos like this one, all sorts of great materials that are there for the bioinformatics and data science communities, regardless of whether you're using R or Python. It's really hard to go wrong with one of those two languages if you're part of this community, and I'm really hard-pressed to think of another community where you would have to choose one over the other. I think these languages are just so widespread across science, both biological, social sciences, humanities even, that there's just such great features for these languages to help people in these different communities. So by now you should be saying, okay, I believe you, I should learn R or Python, but which language should I learn? Well, let me give you a thought experiment. If you were to move to Argentina and you're before the move date, you're trying to think about, well, what do I need to get ready? Well, I need to learn the language, right? I need to learn a language so that I can communicate with people. What language would you learn? Well, yeah, right? You'd probably learn Spanish. 
You would not learn German, right? I'd be really pretty hard-pressed for you to justify why you would learn German if you're moving to Argentina when nearly everybody there speaks Spanish. The same is true with the programming language. And this is where I'm going to help you to decide between Python and R. Someone joining my lab, what language should they learn? Well, as I've already established, I know R. I speak R pretty fluently. I can probably read Python, but I'd be pretty hard-pressed to open up a blank screen and start programming in Python. It just wouldn't be something I would do. Well, if you were to join my lab and you wanted to be able to communicate with me, you know, communicate in code, what language do you think you'd learn? Yeah, you would learn R. So in my lab, the language we speak is R. I have had people that have joined my lab with no programming experience, and people that have learned C++ or JavaScript or Python in their past, and then they joined my lab. And what do I tell them to program in? Do I tell them to program in their original language? No, I tell them to program in R. And so I try to be pretty strict about this, because when they, you know, finish their degree or finish their post-doctoral training, they go off to great things. Who's left with the code? Me, right? And so I'm going to have to maintain their code. And hopefully also other people in the lab will want to look at their code and build off their code to make it better or to use that code to help facilitate their own analysis. If we had people using all sorts of different languages, it becomes kind of a virtual Tower of programming Babel, right? Where we're all speaking different programming languages and we can't communicate with each other. And the most important person to communicate with is your PI, right? Or your boss, or the person that you're collaborating with the most. And in my lab, that's me, right? 
I have had post-docs who have left the lab with projects kind of, you know, 75% done. And then I have to finish up the project and I look at the code and only to realize that they broke the rule. They perhaps wrote some scripts in Python. And I recall somebody doing this where they had maybe 15 lines of Python and they were basically doing some really basic file manipulation things, things that might be one line in Bash. And so I was like, you don't know Python. Why did you write this in Python, right? So because I didn't know Python and couldn't manipulate it, I had to ultimately get rid of that code and rewrite the functionality in something that I knew. And that kind of wasted my time, right? So again, think about the environment that you are going to. What language do they speak? Is it Python or R? And then I would even say, you know, if they're using a different language, say they're speaking Ruby, go learn Ruby, right? When in Rome, do as the Romans. When in Argentina, speak Spanish. When in the Schloss lab, speak R. The other reason that I'm pretty strict about this is within my research group, I try to have an immersive language environment, an immersive training environment. And so I want people in my lab to be able to talk with each other, to help each other, to train each other. Sure, I provide a lot of training, but I'm not, you know, knee to knee, shoulder to shoulder with them, helping them through their work. They're far more likely to do that with their lab mates. And so if everyone in the lab is programming in a different language, then there's just not as many opportunities for those exchanges. Whereas if I've got five people in my lab at all different, you know, periods of time in my lab, and then all different periods of experience with R, then I have a really good mix of helping people to learn R because the more experienced people can help the more junior people. 
The more junior people can perhaps ask the more senior people to kind of refresh their own skills and to get those skills to be a lot stronger. Again, if everybody's doing a different language or even if I have two different languages in the lab, then I'm basically dividing the lab in half. And in my experience, that has only caused confusion and difficulty in getting everybody to the same place in their programming. So if it isn't clear yet what language you should be programming in, what language you should learn, I have a series of questions for you to answer. So first of all, what are the people in your lab programming in? I've already talked a lot about that, right? Well, you know, perhaps you're the only person in your lab or you've got a really small lab and you all are just getting going with programming. Okay, that's cool. Well, are there other scientists on your floor of your building? What are they programming in, right? And then you might go a little bit more broadly in your department. Are there other people in your department that are doing some data science work? What language are they programming in, right? Well, and then if you go a little bit further, many universities have like a bioinformatics core or a statistical consulting core. What do they suggest you use, right? And then you might go even further and say like, you know, in my town we've got this great user group focused around this one particular programming language. Great, great. So what I want you to ask yourself is, if you run into a problem, and you will run into problems because I do too, who will you go to for help, right? And a physical person, a physical embodied person, is always better than going off to some virtual help forum. So who are you gonna go to for help if you run into a problem, if you pick Python or if you pick R, right? So think about that and then let that inform what language you will choose to learn as your first language. 
And so perhaps you've answered that question and you've made the mistake of doing that Google search for Python or R and you've come back to me and you say, but Pat, this blog post seems really authoritative and they say, I shouldn't learn that language. I should learn this other language instead. Well, let me help you to interpret those blog posts. The first thing to realize is that many people find certain things in Python to be off-putting, not helpful, right? So many people complain about the importance of white space and getting white space and spaces or tabs or whatever just right for Python to run correctly. That just annoys some people, right? Other people find that the fact that you have to use multiple backslashes in a regular expression for R, that's just annoying, right? And so you'll find people that program in R or people that program in Python that have things that they don't like about the other language. And if they're honest, they'll have things that they don't like about their own language. There are no perfect programming languages. So just accept that, right? There is no perfect programming language. And when you go to a blog post saying, which should I learn? You're basically asking for someone to tell you what is the perfect programming language? There's no perfect programming language. The other thing to appreciate in those blog posts is that people have biases and these might not be explicit, but they can also just be implicit. This could just be based on their own knowledge, right? So if I wanted to write a blog post comparing the speed of R and Python and say C++ or whatever, if I know R really well, I know all the tricks to make R code really fast. If I don't know Python very well, I'm gonna write it in some pretty vanilla Python that might not be very efficient. And so I can make R look really fast, right? And I can make Python look really bad. And the opposite is true as well, right? 
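To make the point about biased speed comparisons concrete, here is a small Python sketch. It is not taken from any particular blog post, and the names `total_plain` and `total_idiomatic` are hypothetical. Both functions compute the same sum of squares, one as a plain loop and one using built-ins, and the standard-library `timeit` module times each. Which version comes out ahead, and by how much, depends on the machine and the interpreter version, so no result is claimed here.

```python
import timeit


def total_plain(n):
    """Sum of squares written as a plain loop, as a newcomer might write it."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def total_idiomatic(n):
    """The same sum of squares using a generator expression and sum()."""
    return sum(i * i for i in range(n))


n = 10_000

# Both styles must agree before any timing comparison means anything.
assert total_plain(n) == total_idiomatic(n)

for fn in (total_plain, total_idiomatic):
    seconds = timeit.timeit(lambda: fn(n), number=50)
    print(f"{fn.__name__}: {seconds:.4f} s for 50 runs")
```

Someone who knows a language well usually knows several ways to write the same task. A benchmark that uses only the naive version for the language the author knows less well says more about the author than about the language.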
Someone that's an experienced Python programmer can make Python look just amazing and they can make code in R just look horrible, right? And so realize that people come to those types of blog posts with a bias. I obviously have an R bias that I'm trying not to express here, right? But realize that if I make a comparison between R and Python, I've got a bias in that I know R and I don't know Python very well. That's just a bias, right? And so you need to acknowledge that other people have biases as well. The reality though, is that if you're going to establish your skills in one of these two languages and you're gonna be working in data science for any length of time, very quickly you will probably want to develop skills in other programming languages, right? So I consider Bash programming to be a type of programming language. I consider HTML to be another one, right? And so these are complementary but distinct programming languages from R that I can bring together to make attractive visuals and do some pretty fancy and reproducible analyses. Over my 25 years of programming, I've learned quite a few different programming languages. The first language I learned as an undergraduate was Pascal. I've learned some BASIC, some Perl, R, C++, C. I've read Java, JavaScript, Python. I've programmed some in Ruby. Lots and lots of different languages, right? And I will probably learn more before my career is over. The hardest language was Pascal. Why was Pascal the hardest language? Was it because of the weird syntax? No, it was because I had to wrap my head around how to solve problems with computers. And I did that in Pascal, right? If I'd learned R first, that would have been the hardest programming language for me to learn as well. And so, again, that first programming language, just like the first spoken foreign language, is going to be the hardest for you to learn because that's where you form your mental model of how you interact with the computer. 
So now, if I went off and learned Python, say, I suspect it'd be really easy for me to learn Python because I already have this mental model for how programming languages work, right? And so I might be thinking, I know how to make a for loop or an if statement in R. How do I do that in Python, right? And so I encourage you to learn one language and not get overwhelmed by thinking, oh, I've got to learn all these different languages. Learn one language, learn it well. And then once you understand where the gaps are in that language and what you can and can't do with it, then think about other languages that you might learn to fill in those deficiencies. For me, if I was starting out learning R or starting out learning Python, my second language probably wouldn't be the other language. So for me, I wouldn't go out and learn Python. I would go out and figure out how do I script in Bash? How do I write programs that work at the command line to manipulate files, to move things around, to create files, to run scripts? That might be my next language, or perhaps HTML and JavaScript so I could help to disseminate the work I'm doing to this crazy thing called the World Wide Web, right? So again, as you get immersed in the language, you will identify where the holes are and how you can go to other tools out there to then bring them in. And you might say, you know, I don't need to learn Bash because I can use the file manipulation tools within R or within Python to do the same thing, and you know, that's awesome, more power to you. And again, it is a journey that you will go on, and I can guarantee you that at some point you will start to learn other languages, and that mental model you create with this first language will be so valuable in helping you to develop your skills with that other language. So if you are interested in learning R, have I got a deal for you! I have all sorts of great resources here on this YouTube channel. 
I have a website with all sorts of great resources. I can point you to other resources. I would love to have you stick around. Please be sure that you subscribe to this channel so that you can get updates on whatever I do next. If you decide you wanna learn Python, awesome. That is so cool that you wanna learn Python. I'm really happy for you. Unfortunately, I don't have any Python materials for you, but that's cool. There are other people here on YouTube that would be more than happy to teach you Python. There are other great resources online as well, and if you need help finding those, drop me a note. I'd be more than happy to help you find those resources for learning Python. If you're one of those people that wanna tell an R programmer that they shouldn't be programming in R, or you're somebody that wants to tell Python programmers that they shouldn't be programming in Python, I have a favor to ask of you: go away. You are not helpful, you are not helpful. You are sowing confusion and you are frustrating people and frustrating their educational process. I cannot emphasize enough the importance of helping people to learn their first language. Don't be offended that they didn't pick your language. Be excited that they are jumping in and trying to learn how to program. I'm excited about that, and I hope that together we can help bring other people along the way to help them learn how to program in their language in a far better, more robust, reproducible manner. So keep practicing with whatever language you choose and we'll see you next time for another episode of Code Club.