Thank you for joining us on what appears to be an absolutely beautiful day here in Scotland. I trust it is elsewhere too. Thank you for joining us for the fourth and final code demonstration that we're hosting here at the UK Data Service, and I'm joined once again by my colleague Julia Kasmeier, who will be looking after the chat, posting links, taking questions and just keeping me on track. So thank you for giving up your time today. It's the fourth and final session, and we're going to focus on computational environments. By that we mean setting up your computer so that you can conduct and reproduce a piece of data analysis, one that is computationally dependent or computationally intensive. If you're having issues with the sound or the video, then please post some comments in the chat if you're logged into Twitch, and Julia will be able to respond. I'm going to do about 40 minutes. Computational environments are a wee bit more of a technical topic. You'll see that we've produced a notebook for you to use as well. There's not actually any code in this notebook; everything we'll be doing will be done through the command prompt. You may recognize this as the kind of little black box of doom that you may have occasionally launched on your machine without realizing it. I'll go through all of that. If you need to use the notebook, Julia will post the link into the chat and you'll be able to follow along. But for today in particular, you won't be executing any code in the notebook, so you can just track what I'm doing. The notebook is for later on, when you want to practice and set things up on your own machine. With that noted, let's get started. Again, we have about 40 minutes today, but you've got the notebook for practicing yourself. I'm looking very large; let me make myself a tiny bit smaller. There we go. 
Yes, so key things today. Understand what a computational environment is. Then be able to install and import Python modules on your machine; these provide the extra functionality we need for scraping web pages, manipulating datasets, natural language processing, all the things that don't come as standard with your Python download. Then we'll learn how to capture the computational environment in which you conducted your work and share it with others so they can reproduce, adapt and extend what you've done. And then we'll actually go through the very short process of reproducing a computational environment that somebody else developed, so we can look at somebody else's bit of work on our own machine. It sounds very technical if you don't have any kind of base knowledge, but actually it's a reasonably simple, short task to create environments, capture them, share them and reproduce them, and I'm going to go through everything from step one. Again, these are Jupyter notebooks that we're using. We're not really going to use this one today; it just has, again, the advantage of mixing live code, text and outputs. You're probably all sick of this if you've been watching me before, and in fact it's so out of date that it tells me to enjoy web scraping, and that was two or three code demonstrations ago. So what is a computational environment, just so we're all clear on what we're doing? Each computer has its own unique computational environment that consists of not just the operating system, not just what software you have installed (maybe you use Python, maybe R, maybe NVivo, maybe Stata, for example), but also which versions of that software you have installed. Maybe you have Stata 14 or Stata 15, older versions of SPSS, older versions of Qualtrics, et cetera. 
And there are other features too, such as the kind of machine you're doing your work on. Is it, quote unquote, a supercomputer that you're using to conduct your analysis? Is it a very cheap, bog-standard university machine? Basically, the machine in front of you, the hardware and all the software you use to conduct a piece of analysis, together constitute the computational environment. The obvious question here is: why do you even need to understand that? I'm pretty sure all of you have been doing your work up to this point using a computer, doing fine, robust, defensible research. So why do you need to know about this? Basically, there are two converging trends. One is the fact that our work is becoming more computationally dependent. You're not just using a machine because you have to type, or because you need to save and share electronic files. You increasingly need highly powerful, specialized statistical software, for example, or you need the R programming language, or the Python programming language, or you work with big data: datasets so large they don't even fit on your machine. So we're all becoming more dependent on computational environments to produce our work. In conjunction, there's a big push in science in particular to improve the transparency and the reproducibility of scientific work. It's very, very difficult to view a table, to view a graph, to be told about a pattern that's been found in some empirical data, and then to reverse engineer all the steps the analyst took to produce the work. It's near impossible to take a journal article and actually conduct the exact same work. 
So with these two trends, we're computationally dependent to do our work, and our work is increasingly opaque; it's very difficult to reproduce what we're doing. What's critical, then, is being able to understand and replicate the computational environment in which work is undertaken. And I really like this quote from The Turing Way community that any analysis you do should really be mobile. That means that on your local machine you can define, create and maintain your workflow: let's say using Stata, importing some data, estimating a statistical model and spitting out two graphs, for example. Yes, you need to be able to do that yourself for your own purposes, but you should also be able to move that to another machine in the future, and it should be able to be shared amongst all your teammates, who should all be able to produce the exact same result using the exact same methods. So this is the key idea to get into your head. It's not good enough anymore to just configure your machine so you can do your own work and leave someone else to figure it out themselves. I'm quite stringent on that; I think we need to be better as a scientific and an analytical community. So that's me giving out to you. So how do you capture a computational environment? We know that it's a combination of hardware and software, so basically it involves recording the computational features of your work. What type of machine did you use? Is it a Dell laptop? Is it a Lenovo? Is it a Mac? Et cetera. That's not just geeky knowledge. How much memory does it have? My machine has eight gigs of memory; that's the amount of information my computer can process at any one time. If I had a 16 gig dataset, could I actually load that in and work with it? No. Simple as that. Not without just sampling from it. 
I'd need a machine with more memory. Which operating system do I have: Windows, Mac or Linux? That affects, as I'm sure you all know, how files are stored, which types of software you can actually use with a given operating system, et cetera. Which programming language or statistical software did you use? Was it Python, or R, Stata, NVivo, Qualtrics, et cetera? And then which version of that language? Python has four reasonably recent versions. They're all broadly similar, but slightly different, unfortunately, and that does impact your analysis. And then what additional packages or modules did you install in order to use Python? So, if you want to do natural language processing, you need to install the NLTK module in Python. That's something extra you have to do to configure your computational environment. And we shouldn't overlook the fact that there are lots of miscellaneous research objects that are needed to produce a result. You need the datasets, pretty obviously. Do you need metadata? Do you need documentation? Et cetera. There's lots and lots you need. Phew! So a very basic way of capturing your computational environment is simply documenting all of the above elements, and we do that to some degree with a journal article, or a paper, or a report. We do say: here's the methodology, and you might say, I used this machine, et cetera. Thankfully, as we cover today, there are technological solutions that make this process much simpler, much quicker and much more robust. So finally, the two key ways of capturing your computational environment. You can use something called a package management system. In Python there are two main ones, conda and pip; you don't have to worry about what they are for now. We'll use pip throughout this demonstration. 
Basically, they keep track of all the modules you use in your research. For example, when we did web scraping, we made great use of the requests module. That needs to be installed separately, and then it needs to be kept up to date, because there will be newer versions. pip or conda help you manage that process. You can also, either in conjunction or separately, use a virtual machine or something called a container. A virtual machine is basically the computer in front of you run as an app. You can package up your computer, if you can imagine that making sense, put it online, and then use that computer whenever you need it. It's similar to using software as a service: instead of installing something on your machine, you're using a service in the cloud. A container, then, is a much smaller version of that. It's basically like a zip file containing all of the software, all of the code and all of the data files, which you can move between computers, and it'll work on each one. So it's not a computer itself; it's just a zip file, basically, or a container, like a shipping container, with everything you need to reproduce a piece of analysis. Great. So how do we actually do all of this fantastically, technically interesting stuff? Again, if you've stuck with me, I research charities: very, very interesting, very topical, unfortunately. So we're going to set ourselves a computational task. We're going to download, from my home country, a list of all the registered charities in the Republic of Ireland, and we're going to do this using Python. The first thing we need to do is create our computational environment. And the first thing, which I've done already, is you do have to have a copy of Python. I'm not going to go through this now; it's actually a really easy process. If you go to the python.org website, I'm sure some of you have done this already. 
Here's the latest version for Windows; if you have a different operating system, you'll be able to get that as well. You just follow the really simple instructions here, and you'll have downloaded a version of Python to your machine. So what then is different about what I'm proposing? If we've already downloaded Python, is that all we need to do? The approach we're going to take is that every single piece of work should have its own version of Python. Now, that doesn't mean installing it multiple times, as we're about to see; what it involves is making isolated copies of your download of Python and putting those in different folders on your machine. Personally, I like to have a projects folder on my machine, and in that I have all the various things that I am working on at any given time. So here you go: I've got a projects folder, everything to do with the code demos is here, anything to do with charity data for the Republic of Ireland is here, some research I'm doing, et cetera, et cetera. And what I do is, for each of these separate projects, they all have their own copy of Python, which is perfectly tailored for the work involved. If I don't need web scraping for one of these projects, it will not be installed in that folder. It will be installed for other projects, but it won't be in that particular folder. And the reason that's important is because different Python packages, or R packages, or whatever way you want to do this, all depend on different versions of other packages. The natural language processing package will need a certain version of another package, for example. But if you're doing something else, if you're web scraping, that will require a different version of the same package. And then suddenly you get into an entanglement. 
So if you only have one copy of Python, what ends up happening is you have to keep upgrading and downgrading the versions of the same package just to meet the specific requirement at that time. Much better to set up a computational environment for every separate project: they're all self-contained and they can't (it's a poor choice of language, but) cross-contaminate each other. Computational environments are basically like quarantine, unfortunately, which is very relevant to what we're doing right now. So the preliminary, as I said, is we just want to check if Python... ah, that's why it kicked me out. I should be in this one, apologies. I was using the online version of the code, but I'll use the one on my machine, just because. Perfect. So using Python, I can say: right, which version do I have? OK, so version 3.7. As you can see, there's a version 3.8; I haven't upgraded just yet. And then I can figure out where my copy of Python actually is. So on my hard drive, which is the C drive, there's a folder called anaconda3, and in that is Python, basically. As I said previously, what we're going to do is make multiple copies of that version of Python and spread them out across project folders. So the first thing to do is create a project folder. We're going to download some charity data, so let's create a folder for it. Usually you might do it this way: you might open up Windows Explorer, or whatever it is on a Mac, apologies, go to the C drive, right click, new folder, et cetera. We can do all of that through the command prompt. On a Windows machine, if I type cmd, you can see the command prompt. This is variously known as the command line interface, PowerShell, the terminal (on a Mac, I think), the bash shell, shell scripting; lots of different terminologies. Basically this is a very rudimentary way of interacting with your machine. 
In terms of writing programming code, you can write much simpler code here that tells the machine to do something. Just to prove that all of this is happening in real time, you can see the command prompt is telling me that we're currently in a folder on the C drive called users, and here's my university username. So users, here we go, perfect. I have a list of folders of things I do, and as you can see, there's nothing called charity-data-download just yet. So I take this line of code here and copy and paste it into my command prompt, like so; I use right click there, or you can use Ctrl+V if this is a Windows machine, and it executes the command. And if we want to take a look again: voila, there we go, the folder has been created. It's a very, very simple command; mkdir just means make directory, so create the folder. The second thing, then, is I want to move into that folder, and I can do that through the command prompt as well; cd just means change directory, so go inside that new folder. And now you can see I'm currently on the C drive, in users, and I'm inside here. This is equivalent to me doing this. So that's where I am on my machine. Bear with me while we do this; it's utterly essential. I find it fascinating, you might not yet, but I promise it's absolutely crucial and gets quite interesting, and you do need to interact with your command prompt to do a lot of this. Now the more interesting task: we want to put a copy of Python in this folder. We've created something called charity-data-download, and into that we want to put a copy of Python. And that's what we're referring to as our computational environment: the version of Python we're using for a particular analytical task. That hasn't copied and pasted correctly... perfect. So this line of code here calls on the version of Python that's downloaded to my machine in general, and it creates a virtual environment called env. 
env is just common, standard shorthand for naming virtual environments; you could have called it whatever you wanted. Basically it's the folder that stores the copy of Python that you need. It'll take somewhere between 10 and 20 seconds to execute, probably. Yep, there we go. We can check the results: as you can see, a folder called env has been created, and inside it are various files, and as you can see here, we've got copies of Python. So there it is. It's made a copy and put it in this charity-data-download folder. What that means now is that, for any of the work I do in this folder, I'm using this version of Python. The main version that I downloaded, that's stored in the Anaconda folder, is preserved. That's untouched, untainted; it will stay as it is until I use it explicitly. So what do we need to do now? We've created it; now we need to activate it. This just tells your machine: from now on, use this version of Python until I tell you not to. And here you can see why it becomes important to know which operating system you have, because on Windows this is the command for activating your computational environment, while on Linux or a Mac it's a slightly different command. When I do this, you'll notice the env in brackets. That means my machine is now using the version of Python that's stored in the env folder. To see how this works, you can launch Python through your command prompt. Typing python -V shows the version, and typing python on its own launches it; now I'm using my version of Python stored in the, very uncreatively named, env folder. There we go; now I'm using Python. As you can see, this isn't a really good way of using Python: you can only type one line at a time, and it's much easier to do it a different way, as we're just about to. But now, as you can see, we've got a local copy of Python, and it's activated, so all of our future work in this session uses that version of Python. 
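The whole sequence so far can be sketched as a handful of commands. This is a minimal sketch for a Mac/Linux shell (the folder name matches the demo; on Windows you would use `python -m venv env` and `env\Scripts\activate` instead of the Unix forms shown here):

```shell
# Create the project folder and move into it
mkdir charity-data-download
cd charity-data-download

# Make an isolated copy of Python in a folder called "env"
# (on Windows: python -m venv env)
python3 -m venv env

# Activate the environment.
#   Windows (cmd):    env\Scripts\activate
#   Mac/Linux (bash):
source env/bin/activate

# The prompt now shows (env); confirm which Python is active
python -V
```

After activation, `python` and `pip` refer to the copies inside `env`, so anything you install stays inside this project folder.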
Good, so it's set up; we're ready to actually get to an interesting bit of computational work. We're going to download the Register of Charities; that's basically a census of all registered charities in the Republic of Ireland. It's an Excel file and it's located here. I've written a little bit of Python so that, instead of going to this web page and clicking on the link (if you hover over it, you can actually see the web address where the file is located), I can use Python to request that file and save it to my machine. So here's the code we need to do it. This code is not executable in the notebook; it's stored somewhere else, which I'm just about to show you, but basically this really simple script downloads that file and saves it to our machine. At this stage, again, we're going to move out of the Jupyter notebook and back to the command prompt. The first thing to do, if you are following along online: here's the Python script, so this is the collection of code I just showed you, saved as its own separate file. I'm going to move that into the charity-data-download folder and then I'm going to execute it. It is currently stored in the code demos folder; I'm just going to do this manually for now. Here you go: if I just do Ctrl+C, and I'll put it into the folder that I created, charity-data-download. Perfect. Excellent. So now I have the programming script I need; let's see if it works using my computational environment. Again, it's the same command whether you're on Windows, Linux or Mac. We use the python command, which just tells your computer: hey, I want to do something in Python, and I want to execute all of the code stored in this file right here. So, an intentional error, you'll be glad to know. Python is quite good in terms of errors, it's actually quite descriptive, so as you can see here there's a ModuleNotFoundError: there's no module named requests. 
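The script itself is only a few lines. Here's a minimal sketch of what such a download script looks like; the function name, URL and filename are placeholders for illustration, not the exact contents of the demo's file:

```python
import requests  # third-party module -- this is the import that fails below


def download_file(url, filename):
    """Download a file over HTTP and save the raw bytes to disk."""
    response = requests.get(url)
    response.raise_for_status()  # stop with an error if the request failed
    with open(filename, "wb") as outfile:
        outfile.write(response.content)
    print("Finished executing script")


# In the demo this would be called with the address shown when hovering
# over the download link, something like:
# download_file("https://.../register-of-charities.xlsx",
#               "register-of-charities.xlsx")
```

Because the file is written in binary mode (`"wb"`), the same approach works for Excel files, zips or any other non-text download.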
So I wanted to do some web scraping, but Python is telling me: you haven't installed the requests module. As you can see, if this were a much more complicated, detailed piece of work across a project team of five people in three different universities, or four different departments in a business, this would be quite annoying: you share your code, and then somebody comes back and says, I get this error, you didn't tell me I needed requests, and so on. This is why we create and capture and share computational environments. So we get the error that we expect; it's telling us that we don't have something called the requests module. Remember I mentioned package management systems earlier: pip is a Python package manager. Not sure why it's not letting me copy and paste; there we go. If I use the pip install requests command, what that does is download the requests module; a module is basically a collection of Python code that allows you to do something. You can see my machine has started, and successfully completed, downloading the requests module. To prove that it works, I'm going to re-execute the command; this time it should have the requests module and it should work. So I get a little message here, finished executing script. How do I know it worked? Bingo: there should be a new file that wasn't there before, containing the Register of Charities that I'm interested in. Let's always, always prove what we're doing, just so it's not a magic trick that I spent hours constructing. Here's a list of registered charities in the Republic of Ireland. OK, good. So we've configured, or we've managed, the computational environment. That's all well and good; it's an important task. But what if we want to update? Things don't stay static; the requests module improves over time. So let's say we wanted to update it. And copy and paste... it's amazing, isn't it? All this technical stuff I'm doing, and I can't copy and paste correctly. Perfect. 
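With the environment active, the fix is a single pip command, after which the script runs cleanly. A sketch (the script filename is a stand-in for whatever you saved the download code as):

```shell
# Install the missing module into the active environment
pip install requests

# Re-run the script; the import now succeeds and the Excel file
# appears in the project folder:
# python download-charity-data.py
```

Crucially, because the environment is active, `pip` installs requests into the project's own `env` folder, not into the main Python installation.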
So we can use pip install again with requests, and this time I specify an option, --upgrade, which just tells my computer: uninstall the previous version I downloaded and install the latest version. You can see, because I obviously installed requests about three minutes ago, that it is already the most up-to-date version, so that is absolutely fine. If I wanted a list of packages that possibly need updating, I could use the pip list command with the --outdated option specified. That takes a look inside my computational environment and says: OK, it might be worth updating these modules here. So the installation module, pip itself, is actually out of date, and really, really helpfully, Python actually gives me the command I need to execute if I want to update pip. I'll do that just now, to show you that this is all very quick and easy and very possible. So that updates the pip module. Great. So we've set one up, we've managed it, and we've been able to execute a really simple but still, hopefully, very useful piece of programming. So how do we capture all of this? We've done the hard work and successfully executed the script. How could I then share this with you? You're all using different machines: Macs, Windows, Linux machines. You may have different versions of Python; you might not have installed it yet, et cetera. You're all doing something different on different machines. So how do I actually capture the Python computational environment that I created? Remember, I created something in an env folder on my machine. Basically, I want to pull out aspects of that folder and share them with you so you can reproduce my work. We'll use this command here, but first we'll do the first aspect of it: pip freeze, which is quite a funny term. Basically, that literally freezes, or takes a snapshot of, your computational environment at this point in time. So you can see on my machine here I've got five additional modules that I need to do my web scraping. 
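The update steps described here look like this as commands; a sketch, run inside the active environment:

```shell
# Upgrade a single module to its latest release
# (uninstalls the old version, installs the new one)
pip install --upgrade requests

# List any installed modules that have newer versions available
pip list --outdated

# pip itself is updated the same way; pip prints this command for you:
# python -m pip install --upgrade pip
```

Note that `pip list --outdated` has to contact the online package index to compare versions, so it needs a network connection.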
One, two, three, four, five. Perfect. And you can see I've been using version 2.23.0 of requests. So that's really good. If I wanted to share that with you, basically I can run the same command again. Oh my God, copy and paste, geez. Basically, I'm running the pip freeze command again, and then I'm putting the output of that into a file called requirements.txt. Again, to prove that worked: now you can see there's a little notepad file, a little txt file here, and if I click into it, voila, here's the list of modules that I need to do my web scraping. What that means now is that I can share that txt file with you, and I can share the programming script with you, and as long as you have Python installed on your machine, you should be able to exactly reproduce what I've just done in the last 10 minutes. And that's the real power of this. There shouldn't be any mistakes, no deviations. You should be able to execute the same programming script in the same way to produce, ideally, the same results, assuming the file doesn't go out of date or the web page disappears. And that's just what's so powerful about this. It's obviously a bit technical to get set up, but now, not to be too grand about it, people across the world, on different machines and in different teams, can all reproduce this work. Exactly. And calling it requirements is just convention or standard; you could have called that file what you want, and we'll show you an example of that in just a moment. So basically, my computational environment is a copy of Python that's stored in here, a script that I want to execute, and all of the things the script needs in order to execute. Three simple things: a programming language, a configuration of that programming language, and the actual code itself, which is fantastic. Sorry, I do find all this stuff very interesting. Maybe that's a bit weird. So in terms of sharing this, I mean, you could just email it. So if I emailed you: I really like your work. 
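The capture step is just the freeze command plus a redirect into a file:

```shell
# Print a snapshot of every installed module and its exact version
pip freeze

# Redirect that snapshot into a shareable file
pip freeze > requirements.txt
```

The `>` is ordinary shell redirection: instead of printing the snapshot to the screen, it writes it into requirements.txt, which is then the file you share alongside your script.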
Could you share your script with me? All you'd have to send me is the requirements.txt file and the Python file that I need to execute. You could just zip these together and email them. Better practice, as you'll hopefully agree, and what we've been doing for the code demonstrations, is just making them publicly available. If you go to our GitHub repository, you'll notice that there's always been a requirements.txt file on the main page, and what this contains is all the Python modules you need to reproduce the work. We've now done four Jupyter notebooks; in order to run those on your machine, this is the requirements file that you need. And that's what's been happening: if you've been launching the notebooks online through MyBinder, as some of you have, basically the server looks at the requirements.txt file and builds the computational environment online for you to use. The power of doing this is just fantastic; things have improved enormously. Brilliant. So we've created, we've managed, we've updated, and we've captured and shared work. The final thing to do, then: let's say somebody else did a piece of work. Unfortunately, this is me again, so I'm going to reproduce some of my own work. But this is a totally new computational environment that I want to set up and recreate in order to do something. A couple of weeks ago, in the first coding demo, we did some data manipulation of census files. We had 1961 census data; we linked it to '71 and '81; we sampled from it; we did various things. We'll use that example again, in a much, much simpler form. We're basically just going to load data in, take a quick look at it, and print a message saying we're finished looking at this file. But this census data task requires different modules than the previous one we've just been working on. 
So we need to set up a new computational environment, and we're going through many of the same steps, so I'll do this quite quickly. The first thing I'll do, because I'm done with this computational environment: very simple, if you're done with an environment, deactivate. And as you can see, the env in brackets is now gone; I'm no longer using that computational environment. So we want to go through the same steps again. I just want to go back one folder, back to my T95 folder. I swear to God, if I don't copy and paste, just, yeah. Don't leave me feedback about this, please; I know what I'm doing. Great. So I want to create a new folder, this time calling it... census data, what do I call it? Yep, there we go: census-data-cleaning. So here's a new blank folder to store my computational environment. Again, I just want to navigate to this folder; as I said, that's just like going in and clicking on it. That's like what I've done here: just clicked into the folder. So again, the exact same task; I can copy and paste all of this in one go. All I'm doing is setting up a computational environment in this folder and activating it once again. Again, it takes about 10 to 20 seconds. We're making a copy of Python; this copy is going to be different from the main installation, and it's going to be different from the one we just created previously for scraping charity data. So now we're going to have three copies of Python on my machine, and voila. So now we're... oh, yeah, no, I picked a different folder. That's the problem; I was in the wrong folder. Apologies. How did that go wrong? Ah, yeah, I forgot to update it. There you go. Good spot. It was called something else. As you can see, I created a folder called this, and then I tried to navigate to a folder called that, which doesn't exist. Apologies. This will be really quick. So we navigate to the right folder and just go through the steps of creating it again. Yeah, there we go. 
Live TV, huh? Perfect. So now we're creating a computational environment in the folder called census-data-cleaning, and we're going to activate it, and then we're going to execute our programming script once more. While we're waiting for that, we'll move across the files that we need. In my code folder, there's something called census-1961-data. I'm going to move that in here; that's the programming script that I need. And I'm also going to move across the requirements file that I need for this work as well. Perfect. Let's activate that computational environment. So we're up and running; let's try to execute the census code. If you're working through this yourself, you can download the files from the notebook: the census data script, the data you need, and the requirements file. So let's try to execute the Python file. OK, so we get an expected error again: we need a module called pandas, and we haven't installed pandas yet on our machine. How can we do it? The whole point of creating a requirements.txt file is that you can tell your machine: open up the requirements.txt file and work through each of the modules listed, so install all of them in one go. You don't have to manually say pip install pandas, pip install requests; in one line of code, we can install everything we need to set up this computational environment. So this line of code here is pip install again, but this time we're installing what's contained in the txt file. There's only one thing contained in the txt file: the pandas module. And as you can see, my machine is now installing the module that I need. The reason I only need the pandas module in the requirements file is, as the eagle-eyed among you are probably spotting, that lots of other things are going on. There's another module called numpy (numeric Python) which has been installed. There's something called pytz. There's a module called six, for example. 
Basically, certain modules need other modules in order to function. So by just saying pip install pandas, Python is smart enough then to go, right, pandas needs these three or four different modules, so we'll install those at the same time. So your requirements.txt file doesn't have to list much. It doesn't have to be an essay. I know that if I tell my machine to install pandas, it'll install everything else it needs in order to work. So because it's installing lots of other things, you can see, yeah, one, two, three, four, five. So it's installing five modules. It takes a little bit of time, because it's obviously installing them from an online repository. But in, let's say, five, four, three, two, one. Oh, no, I won't try that again. So hopefully in a couple of seconds, it'll have installed all the modules we need. In the meantime, I will keep riffing, as I think musicians call it. So the next step, once it's installed everything, is to try and execute the programming script again. And this time, it should work, because we should have installed the pandas module. So it's taking its sweet, sweet time, but that's OK. I'm sure Julia can probably put some jokes in the chat. Julia, can you act as the raconteur? Will that work? Yeah, so I'm just now having a little look at the chat myself, so I'll go through some of your comments about if you've got Python installed twice, which one do you need, how do you tell your machine which one to use, et cetera, et cetera. Yes, yes, my wall clock is slow, thanks very much. It's 41, that's quite slow, actually. It's getting slower, that's quite unfortunate. So this is taking quite a while, so I might interrupt it. Oh, no, maybe it's, hey, perfect. OK, that took a surprisingly long time. How bad? It's up and running now, so yeah, Julia, that's enough joking. Thanks very much. So we're at the final stage, we're at the end. So now we want to see if the programming command actually works. I'll stake my reputation on it. Yeah. Ah, yeah. 
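To make that concrete, here is what the requirements file for this task boils down to, a minimal sketch (the install line is left as a comment because it needs an activated environment and an internet connection):

```shell
# The whole requirements file for this task is a single line.
echo "pandas" > requirements.txt
cat requirements.txt
# With an environment activated, one command then installs everything the
# file lists, plus each module's own dependencies (numpy, pytz, and so on):
#   pip install -r requirements.txt
```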
So the thing I forgot to do is I forgot to move across the data set. So my script is looking for a data set. I didn't put it in my project folder. Voila. Pressing the up key brings up the most recent command. And now let's see how we do, yeah, perfect. So it's a very simple programming script. It loads in some census data and it just prints out the first 10 observations as well. So don't worry about what the variables mean. We're not too interested. It's just a very simple programming script. So now we've gone through a full spectrum of technical tasks. We've created brand new computational environments. We've customized them. We've installed new packages. We've upgraded existing ones. We've shared them, so we've seen how you can capture an environment, how you can bundle up the modules you need, put them in a file, and then share that file with others. And then we've seen how we can use that file to reproduce a computational environment. So voila, yeah, that actually went reasonably well on my end. We've learned lots of things. That was last week's, what have we learned? I've just obviously verbally outlined what we've learned this week. I find this interesting. I'm sure some of you do as well. And I've seen some of the comments. Some of you are familiar with using the command line or the command prompt. So it is quite tricky. It's quite technical. It is a vital skill for doing computationally intensive work, research, and analysis. So you need to know file systems. You need to know how to navigate to a folder, activate an environment, et cetera. You need to be able to use the command line interface, which is quite unforgiving if you make a mistake. As you saw, if I didn't copy and paste the correct thing, it can take a while to run all these things. But I do consider the rewards great. Hopefully you agree. I picked some deliberately very simple programming tasks, but you can imagine some natural language processing. 
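The demo script itself is only a few lines. A minimal self-contained sketch of what it does, assuming pandas is installed; the census file's actual name, columns and values aren't shown here, so the ones below are stand-ins:

```python
import pandas as pd

# Stand-in for the census data file that was moved into the project folder;
# the column names and values are purely illustrative.
pd.DataFrame(
    {"district": ["A", "B", "C"], "population": [1200, 3400, 560]}
).to_csv("census-1961-data.csv", index=False)

df = pd.read_csv("census-1961-data.csv")  # load the census data
print(df.head(10))                        # print the first (up to) 10 observations
```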
You could have hundreds of lines of code, but the process is the same. You need to set up your computational environment, install the packages you need, execute the commands, and then package all of that up, bundle it up, and share it with someone else so that they can use it. Also, so good luck if you're gonna continue doing this. Essential free further learning: The Turing Way, you've probably seen that before. I'll just show you, it's a super book by a group across different universities, but I think based at Cambridge, the Turing Institute. They've written an online book, which is absolutely just chock full of fantastic advice. You know, it goes through open research, version control, and it has an excellent chapter on reproducible environments. So this is how I learned basically to do everything I've just shown you, but it goes into much, much more depth also. There's a book called Python 101, which is quite interesting. It has a very, very short segment on virtual environments, but it's still worth a read. And there's this good article I found on Medium. Oh, no, this is the official documentation for a module called pipenv. It's based on a lot of the same things we've seen today. Which version of Python do we have? How do we install modules? But it goes into a bit more detail. So fantastic, yeah, that's it for me. I will now take your questions. So thank you very much. Beautiful day, you've spent it inside with me. Thank you so much. I do really, really appreciate it. Okay, so let's look at some questions. Perfect. Jeff, I have Julia's jokes. Oh, okay, I'll start with most recent. Let me expand the chat box. So somebody has asked, can you use Anaconda to create computational environments? Yes, you can. When you install Python, basically you get a set of modules that are considered standard. So there are things that you are almost, you know, certainly going to need in your work. So it's called the Python Standard Library. You can see I was looking at it recently. 
And it has a list here of all the different modules that just come installed on your machine. You can also install a version of Python from something called Anaconda. So this is basically a customized version of Python. So Anaconda have gone to the kind of hard work of thinking about all the modules you will need to install on your machine and it, you know, provides them for you if you download their version of Python. So absolutely, you can use a version of Python provided by Anaconda. So requests, for example, that comes with Anaconda. You don't need to install that on your machine. It's already provided. But obviously lots of other packages are not. So you still have to go through the process I showed you today, but maybe not as often. So yes, you can also use Anaconda for installing and upgrading packages. And just Google it, there'll be, you know, there'll be lots of information about how you use conda. Yeah, so there you go, create virtual environments for Python with conda. It's much the same process, but instead of, you know, using python -m venv, yeah, you can see here it's a different piece of code. So sorry, long answer short. Yes, you can use what's called the Anaconda distribution of Python to install and update modules on your machine. So I'll work backwards from now. Is there an equivalent of requirements if using R in RStudio? Yes, there is. Usually the way I've done it in R before is you would just give R a list. So you've got the install.packages command and into that you would pass a list of the packages you want to install. Yeah, presumably you could just place the list of all those packages in a txt file and then in R open the txt file, load it into a list, and then put that list into the install.packages command. Yes, but R is not something I use a lot myself, but I'm almost certain you can do something very, very similar. I don't know what the cheese says when it sees itself in the mirror. No, I don't. 
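For reference, the conda equivalents of today's venv/pip workflow look roughly like this, as a sketch (it assumes the Anaconda or Miniconda distribution is installed, and the environment name "census-cleaning" is illustrative). The commands are written out to a crib-sheet file here rather than executed, since conda may not be on your machine:

```shell
# Rough conda equivalents of python -m venv / pip install / pip freeze,
# saved to a file so you can try them on a machine that has conda.
cat > conda-steps.txt <<'EOF'
conda create --name census-cleaning python=3.11   # new environment with its own Python
conda activate census-cleaning                    # switch into it
conda install pandas                              # install a module
conda env export > environment.yml                # capture the environment for sharing
EOF
cat conda-steps.txt
```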
Okay, a more serious question. Ooh, when printing in Jupyter notebook, can you print without the input lines? Would you mind just saying a little bit more about that? I'll answer that question, but I'm just not entirely sure what you mean by printing in that sense. If you could just write a little bit more. A question here. So to reduce waste, can you automatically purge unnecessary resources in the environment? Again, I'm not entirely sure what you mean by that. Do you mean like unnecessary modules? Is that like uninstalling unnecessary modules? If so, then yes, you can obviously uninstall modules as well using pip. Modules don't tend to take up a lot of space on your machine. I mean, you know, Python code like any code is just text that's interpreted in a certain way. So modules don't tend to take up a lot of space. So in that sense, I wouldn't advocate, you know, downloading the Anaconda version of Python and then uninstalling modules that you think you might never need. So if that's what you mean by that question, then yes, you can automatically or manually purge unnecessary modules, but I don't really advocate it. Oh yeah, and just about the R question, yeah, Rob has put something in about RStudio. So thank you very much. That's a really good solution. And yeah, so we had a question here about, yes, so yeah, again, sorry, there are sometimes problems getting MyBinder up and running because it's free. It can sometimes, yeah, just take its time to load or, you know, the server crashes. So apologies for that if it's not working. But as I said, Julia's posted a link to the GitHub. Yeah, so as long as you know where the open data or the code demo GitHub repository is then, you know, all of the notebooks are there, you can install them on your machine. And then as I said, you can take the requirements.txt file and do pip install -r requirements.txt and then you'll have everything you need to run the notebooks on your personal machine. 
Okay, so yeah, I'm happy to keep tipping away. I'm obviously finished in terms of content, but I'm gonna keep answering questions. Yeah, okay, let me keep scrolling. Oh, I see, right. So in Jupyter notebook, you can print a hard copy of your notebook from the file menu. Yes, so very good point. Shouldn't the requirements include the Python version for reliable portability? Yes, absolutely. So into the requirements file, I could and probably should have put the version of Python that I actually used in the programming that I did today. So yeah, absolutely. You've held me to account there. Best practice would be putting the version of Python into the requirements.txt file. So yeah, well spotted, thank you. And so yeah, I haven't been perfectly reproducible or transparent. Oh yeah, that came up earlier. Thanks, Julia. Would you recommend using Visual Studio? I haven't used that since first year of uni. So it depends on the task you have in mind. My limited recollection is that Visual Studio is good for software development. I've never used it for data science or computational social science work. I assume it has functionality that allows you to do web scraping and data analysis and machine learning, for example. Oh, I see, sorry, I didn't scroll down far enough. So would you recommend using Visual Studio for Python programming? Can't answer that question directly. What I can say is I use various platforms for writing Python code. So one obvious one, as you've seen, is Jupyter Notebooks. They have certain advantages, they have certain disadvantages. I quite like using something called Sublime Text, which you can see here. So this is for my own research. I tend to write Python modules that people can then use themselves to collect data about charities. So here's some Python code I've written, 104 lines. And as you can see, it doesn't have a lot of comments because it's quite hard to write comments. 
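On the point about recording the Python version: a plain requirements.txt can't actually enforce an interpreter version, so one common convention, sketched below, is to freeze exact module versions and note the Python version alongside them, for instance in a README:

```shell
python3 --version                          # record this alongside your requirements
python3 -m pip freeze > requirements.txt   # exact versions of every installed module
head requirements.txt                      # each line looks like module==x.y.z
```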
I mean, it's got some, but the output doesn't look great when you do it. So let's just see if this actually works. So you can execute the Python code in Sublime Text. You don't need to use Jupyter Notebooks or something else. So you can see the script is running. Yeah, the output is pretty rudimentary. It's not as nice as a Jupyter Notebook, but it works. So whatever it was doing, it works. You can write your Python code in, you know, notepad, it's as simple as that. You know, you could have import requests. And as long as you save that file as something.py file extension, then it would be a Python executable script. Perfect. Wow, Julia, that is not terribly funny, but I see somebody did find that funny. Okay. Again, I'm happy to stay on for another few minutes if you have more questions. If not, what's gonna happen next is we will email everyone who signed up to all of the coding demos. We will provide a link to the GitHub repository and an evaluation. This was a pilot. It was very enjoyable for me to deliver and for Julia to help me with. I know that. I know some of you have found it good. There's elements of it that could certainly be done better. So if you do get a chance, it'll be a really quick evaluation, you know, five or six questions. And the main thing as well is just let us know about extra topics. So Julia is gonna do some text mining quite soon. I've got plans for social media, social network. Do you wanna learn how to use the command line? Do you want to learn machine learning, big data? You know, crikey, it's endless really, isn't it? So yeah, so just, or even just contact Julia and myself, you know, on Twitter using our emails and we'll be glad to, you know, help you, chat to you. Okay, so final couple of questions. Is social science part of data science? Yes, I think the term that's gaining traction is computational social science, which means answering social science questions using data science approaches. 
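The editor point in miniature: any plain-text file saved with a .py extension is a runnable Python script, whatever program wrote it (the filename here is illustrative):

```shell
# Write a one-line script using a plain shell command, then run it with Python.
printf 'print("hello from a plain text file")\n' > hello.py
python3 hello.py     # prints: hello from a plain text file
```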
So I think data science is a separate thing to social science, but it influences it. And the main way it influences it is through a new disciplinary field called computational social science. So traditional social science questions about, you know, social networks and distancing and, you know, social capital are now, you know, being studied using Twitter data and Facebook data, you know, et cetera. But I agree the terms can be quite unhelpful. What, you know, big data analytics, data science, computational social science. And statistical learning is another one. Yeah, it's, can Python produce compiled code for speed? Python is not the fastest language for that. So Python does use a lot of C code to underpin a lot of its modules. Python is slower because it uses an interpreter instead of a compiler. So it's not very fast, but what I'm talking about here is if, you know, you were a software developer, you know, Python wouldn't be as fast as C and it wouldn't be as, I think, flexible as Java, for example. But it's obviously simpler. It's very, very extensible. You know, it's, yeah. Hopefully that's a good enough language or answer. Apologies, it might not be. Okay, yeah, a command line session would be great. I'll keep that in mind. We can, I'm gonna take a break in June. So maybe in July, that would be good. Yeah, machine learning using Python, deep learning. Can you create another Bitcoin using Python? I think that's the whole appeal of Bitcoin, isn't it? That the algorithm and the code used to create it is so secret and so clever. It's not something I plan on doing anytime soon, but I'm sure you could create your own crypto currency, but don't quote me on that. Thanks very much. Okay, it's one minute to five. I'm gonna stop the stream. This will be put up online so you can rewatch or, you know, watch it for the first time if this is how you come across it. A sincere thank you. Really appreciate you giving up your time. It's an exciting area. 
Good luck with what you're doing. Please contact us and Julia and I will speak to you soon. Goodbye.