I'm Greta Lindsay, and I'm the interim director of Statistical Consulting and Research Services. I'm also the senior project manager for the Human Ecology Learning and Problem Solving Lab, which is a social science data collection and research facility on campus. I got my master's degrees in mathematics and statistics in 2006 and 2008. Then I left to go work for a local software company, where I did a lot of quality control and statistical analysis, and I came back to MSU in 2016. At that company we used Python 2 a lot, and towards the end of my time there they were switching over from Python 2 to Python 3, so there are some slight differences between the two. I primarily use R, but I still have that Python knowledge. I had to learn Python on the fly by basically teaching myself. On top of that, it wasn't really pure Python. It was Python accessed through a particular software package, so it's a little bit different.
So I'm hoping that we can at least give you the basics of what you need in order to get started today. It's only going to be the very briefest touch on a lot of things, but hopefully it's enough to give you a good start so that you're not starting from scratch like I was. I did take some computer science classes in my undergraduate degree, but then when I got to C++, and I got a C+, I was like, I don't need that degree. Actually, what really knocked me out of the CS minor was data structures, which is kind of strange because Python is great with data structures.
Anyway, as a statistician, it's good to be able to use lots of different languages to solve different problems. I personally think that R is better for actual statistical analysis, but Python is great for data wrangling and database management, and it's just a really good skill to know. This is the pilot of our Introduction to Python workshop.
And so it might be a little bit rough, and we might change how we do things the next time we teach this. We will give you the opportunity for a feedback survey at the end, and we would really appreciate constructive feedback to help us revise this. The other thing is that we're doing this a little bit differently. We're not teaching this through Anaconda or a Jupyter notebook. If you are somewhat familiar with Python, you might have heard of those. Those are just different user interfaces. We're teaching this through RStudio, not just to show you that RStudio is for more than just R, but to have a nice consistent framework. The hope is that you'll be able to do some work in Python and then use RStudio to do more sophisticated visualizations and statistical analysis using R, and kind of go back and forth. It might turn out that teaching it in this framework doesn't work the best, but we wanted to give it a shot.
First of all, Sally is a research statistician for SCRS, and so she is also available to help. She's recently gone through the installation process on a Mac, so if you've got a Mac, she could maybe answer some of your questions. But she is also a Python newbie, so I'm trying to get my SCRS team to at least get started on that.
In order to get started today, there are a few things that you needed to do. First, you needed to download R, the language, and that is just to get the connection to Python. That's basically all we're going to be using R for. Then RStudio is the user interface, and from RStudio and R we will download the rest of what we need for Python. Then there's this document here, in the zip folder that you were given. Hopefully you unzipped it and saved it in a location where you can save to; otherwise we'll get an error later. Yes, you raised your hand.
If you didn't get the email after you registered, then you'll want to go into the folder and the installation video. Okay, thank you. So in that zip folder, there's a bunch of images, and that's just so that we can get this document to compile. There's a PDF copy of what I'm going to go over today. And then there's an .Rmd file, and that is what this document is. It is a mixture of code and documentation. If you code in Python, in R, or in any other language, you really want to push reproducible research: having your code along with your documentation, linked together, so that you're minimizing copying and pasting. That will make your life easier and make your research reproducible, and you'd be able to hand off your code to somebody else and they'd be able to go through it.
If you're still in the process of installing and getting caught up, we're just going to give a little bit of background information, and I will go over the installation process a little bit. Some of this code is written in R, and it's just so that when we compile this document into a nice pretty PDF or a Word document, it'll look nice. So don't worry too much about some of the code chunks that you might see right now.
So, Python, R, and RStudio, briefly. Python is a popular programming language, commonly used for statistical analysis, data visualization, and machine learning. There are lots of different ways to interact with Python through different integrated development environments, or IDEs. The choice of IDE depends on your context, the nature of the project, personal preference, who you're working with, what they use, and how you want to collaborate. Some examples are IDLE, PyCharm, Jupyter Notebook, which I have personally used, Visual Studio Code, which I haven't used for Python, Google Colab, and RStudio.
Anaconda and Spyder are two others, basically big packages, that I've used in the past. Python has broad applications beyond data analysis. You can use it for web development and automation, and ultimately your choice between R and Python will depend on your needs and preferences. Again, since we developed this series of workshops using RStudio, and RStudio uses R behind the scenes, you do need to have both of them installed on your computer. And if you decide to use RStudio as your IDE of choice, hopefully you will continue to learn R as well. Any code that is in a Python code chunk in here will work in any Python IDE, so you can just copy and paste it, or type it in, in a different Python environment. So if you don't have RStudio installed but you do have another Python IDE, feel free to use that, but I'm going to be using this one.
Okay, so there is an installation demo for our workshops. I created one for Python. It's half an hour, and that is with some of the installation time cut out, so I really want you to plan on at least 40 minutes to get things installed. If you do not have everything installed and you're playing catch-up, pay attention up here. This workshop is being recorded. It'll take a week or so to get the video up. Sometimes it takes Sarah a little bit longer, but she's going to work really hard to get this one up sooner. You might then have to go back and rewatch some of the beginning, or just rewatch the whole thing, and it will hopefully make more sense as you follow along.
The links to install R are up here, and you can install Python. I was going to have people install Python separately, but I'm going to work through using this code to install it instead. You can't use this on a Chromebook. It's hard to code on a Chromebook unless you're coding in Google's environment, which I haven't done.
Chromebooks just don't run this kind of software or have the power that you need. So, steps to get ready. Hopefully some of these are done. If not, again, catch up. You need R, then RStudio. Download the zip file. Unzip the zip file. I have this emphasized here because you can access things inside a zip file, but if it's not unzipped, you can't save to it, and that causes problems. Once you open RStudio, it will probably look something like this. Then we'll go to File, Open File, and you would browse to the .Rmd file and open that up. As soon as you open it, you should have four panes. And the reason why we have this unzipped is that if we have code here that references a PNG, it'll give you a little preview in this script window.
All right. The next thing we need to do is get reticulate installed. If you opened this up and saw a yellow bar at the top that said to install reticulate, hopefully you went ahead and did that. If you did not see that, come over here to the lower right-hand pane, where we see Files, Plots, Packages, and so on. We click on Packages. This is just for R packages, unfortunately, but we can start typing in reticulate, let it autocomplete, and then install it. It won't take that long to install, but I'm not going to do it right now. That's the next thing that we need. All right, quick check-in. How are we feeling? Anybody have any immediate problems?
All right. So, RStudio orientation and layout. A lot of this is really adaptable. If you've got multiple monitors, or a nice big monitor, you can actually show three panels instead of just two, and you can rearrange things and move things around. We don't have a nice big screen here, so we just have two panes and we're zoomed in. You can zoom in with Control-plus and zoom out with Control-minus, or Command-plus and Command-minus on a Mac.
You can adjust your settings: go to Tools, Global Options, and change your appearance. If you need a black background to make things more readable, you can change that. Under Code, let's see, Display, I have rainbow lines for tabs, and I normally have rainbow parentheses selected as well. All of that helps with readability. Actually, I want to go back to my appearance, and I can't remember what the normal one is. We'll just go with this one. So you can play around with your appearance.
This will be the code window, or the script window, the editor. The console down here is where we can see commands and the output of commands. There are also Terminal and Background Jobs tabs down there; we probably won't touch on those. The environment is where we'll see variables once they're stored. There are other things up here, and some of these will apply to Python and some won't: files, plots, packages, and so on.
All right. So now we have reticulate installed. The next thing we would do is install Miniconda by running this line of code. This is R code, because it's using R to make the connection with Python. You can run commands a line at a time by pressing Control-Enter anywhere on that line, as long as you don't have anything highlighted, or you can run the entire code chunk by pressing the play button. This chunk is going to do three things. First, it's going to install Miniconda, which includes the Python language. When we have the double colon here, it uses a function from reticulate without loading the whole package. Then we'll load reticulate. If you try to load the entire reticulate package without having Miniconda installed, it'll give you an error. Then we install the four Python packages we're going to use today: pandas, SciPy, NumPy, and Matplotlib.
And these all take a little bit of time. NumPy comes with Miniconda; this just gets the latest version of it. If you had all this done beforehand, that just saves some time during the workshop. If not, you can work on installing these as we go.
All right. We do sometimes need to tell RStudio where Python is. So back in Tools, Global Options, if we click on Python and we don't have a Python interpreter selected, we would go to Select. I don't have a standalone Python on my system; I have a conda environment. So if I click on Conda Environments, it takes a few seconds, and it should come up. There's one there. Let me actually run my library line, so Command-Enter or Control-Enter on library(reticulate). This is just loading that package. Let's see if that brought up Python. And now, because we loaded reticulate, we can see our conda environments. I'm using the latest version, 3.11.4. The older version is there if for some reason you needed to use it, but it's better to use a newer version if possible. I'm going to cancel that because I already have one selected, and I'm going to cancel again. Yes? No, as long as it sees that. As long as that bin folder contains the executable file, you should be okay. Good question. Any other questions?
I'm not going to run this install line because that will take a while. This section just talks about what I just did. So now let's talk about working in RStudio. You'll notice that sometimes there are lines of code that have lots of backticks. That is just so that when we print the PDF, you'll actually see what the code chunks look like in the script. If we were to actually run them, we'd take the extra backticks off. In this file, if we want to see what it looks like, I'm going to take off the four backticks above and below, and I'm going to remove this extra backtick, r, quote, quote, backtick.
And now I get what looks like code. We'll call this a code chunk. It has a gray background. If we see a play button, we know that we can run it. If we run this one, it's going to use the R language, so we'll just hit play, and one plus one is two. The one in the square brackets just indicates that it's the first value of the output. We can do the same thing with Python. So let's get rid of the four extra backticks above and below and get rid of the part telling it what styling to use. Now we're using the Python language, and we want to do one plus one. Let's see what happens when we do that. It's now going to switch to the Python environment, so it takes a little bit to get going. It calls reticulate to load Python, and we get the Python version. Before, in the console, we saw one caret (>), then one plus one, and then a two. When we're in Python, we see three carets (>>>), one plus one, and a two. We don't see the one in square brackets, because Python doesn't label its output that way.
When we are doing reproducible coding, we want to make sure that all of our code is in a code chunk so that it will be executable. RStudio also has spell checking built in. It will put a red squiggly line underneath words that it thinks are misspelled. You can go to Edit, Check Spelling, and check all the spelling. If it can identify a possible word in the dictionary, you can click on it and it'll suggest a word. So if you find any words that are misspelled, feel free to fix them in this document.
RStudio will recognize where a variable lives, whether it is a variable in R or a variable in Python. Over in the environment window, we see that we have Python selected here, and under Data we have r, which is an R interface object. We can switch to the R environment and see whether we have any variables stored in R.
And then we can go back and forth between R and Python. All right. We don't need to run this again; I meant to delete that. Sorry about that. So we can use R to run code without using Python. Here we're using R, the language. Let's just store the value of two in x and the value of three in y, and then we'll define z as x plus y, and I'm just going to play the whole chunk. Notice that x, y, and z are stored in R. Normally when we're coding in R we use a different assignment operator than an equals sign, but because this workshop is about Python, and we use equals in Python, we're using equals here. We're going to do the same thing in Python. Again, we just take the r in the curly braces, replace it with python, and we can run the same code. But now those objects are stored in the Python environment. If we change these values, let's say replace x with seven and y with one, and run this again in Python, the values update: seven, one, and seven plus one is eight. If we go back to R, they don't change. We're only changing the Python environment. From here on out we're just going to use Python.
Okay. So we can use Python as a calculator. It's probably not the best use of Python, since most of the time you're going to be calculating things along the way as part of something bigger, but it lets us establish the operators for doing arithmetic in Python. The plus symbol works as we expect: one plus two is three. For multiplication we use the asterisk: 16 times 9 is 144. Division is the forward slash, so 20 divided by 5 is four. Notice that it is 4.0, so it is not an integer; division gives a decimal value. Subtraction is just the dash. Let's see what happens when we run two caret two. Is that the right answer? No. Okay. That's because in Python, to raise something to an exponent, you don't use the caret. The caret is actually doing a bitwise XOR instead of a power.
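The operators just walked through can be collected into a single Python chunk. This is a minimal sketch: the variable names are hypothetical, and the subtraction operands are made up for illustration, since the recording does not say which numbers were used.

```python
# Arithmetic operators in Python, using the workshop's numbers where given.
total = 1 + 2          # addition: 3
product = 16 * 9       # multiplication: 144
quotient = 20 / 5      # division always returns a float: 4.0
difference = 20 - 5    # subtraction (operands are illustrative): 15
caret = 2 ^ 2          # the caret is bitwise XOR, not a power: 0
power = 2 ** 2         # the double asterisk raises to a power: 4

print(total, product, quotient, difference, caret, power)
```

The 4.0 shows that the forward slash returns a float even when the result is a whole number.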
So to get a power we use a double star, or double asterisk. Two double-asterisk two is four. We can put spaces around our operators to make things more readable. Python does not care about extra spaces within a line, so those spaces are for the human eye.
Like R, Python has a wide range of libraries, often called modules, for data analysis. Again, we're going to briefly touch on pandas, NumPy, SciPy, and Matplotlib today, but there are many more than that. We've already installed reticulate. This part goes through how you could get another package if you wanted something other than the ones that we already installed. We do use library(reticulate). This is R code, so it's passing things in the R function way: py_install is the function, and then we put the package that we want in quotes. We already did this, so we don't need to do it again.
Once we have the package installed, we need to import it. In R we would use the library function to load a package; in Python we use the import statement. The nice thing about Python is that you can give packages aliases, because every time you use a function out of a particular package, you have to give it the package name. In R, once you load a library, you have access to all the functions within that package, and you only need to give the package name if there might be conflicts, say if multiple packages define a mean function differently and you have to specify which one you want. With a plain import in Python you always have to use the package name. So if we give it an alias, and it's a shorter name, it just makes typing a lot easier. Here we're going to import pandas as pd, so it's just a two-letter abbreviation for pandas. Everybody go ahead and run this one, so Command-Enter or Control-Enter. And of course it's going to tell me that we don't have pandas. Hopefully all of you had it, and it's just being problematic for me. This will just take another minute.
It's not going to take that long to install; it's just installing pandas. It sounds like there are a few other issues going on. Is anybody else having any other issues? Okay, so if we have to get them one at a time, hopefully they're all as quick as pandas. Once we have run the py_install, we can import pandas as pd. It should work now. We don't really see anything other than the three carets coming back, but we'll test it out to make sure that we've got it in properly.
You can also specify the Python environment by giving it the Python path here. This should be the same as going through the Global Options dialog. If you've had problems so far, it might be because you had a previous Python installation, and it might be getting confused over which one to use, so you might have to reinstall things. Hopefully that's not the case. I'm going to check really quick. It looks like we don't have any questions in the chat, but I will pause just a little bit longer. What's the error? Yeah, you have to run library(reticulate) first for that to show up; it's kind of a chicken-and-egg thing. The other code that I had in there, giving it a path (did I delete that? I might have deleted it), can sometimes help as well. And if you get an install error, I think it's just a bad internet connection, and you have to try it a few times. All right. This is the hardest part. If we can get past the install issues, that's the hardest part.
So, assuming that we've gotten pandas imported, I'm going to keep moving on so that we can stay on track, sort of. Again, we imported pandas as pd. Now let's do some math. In order to do math, we're going to import math. We're not giving it an alias, so we have to type the four letters, math, for every function. So math is the package, and then after the period, or the dot, is the function that we want to use.
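As a rough sketch of the package-dot-function pattern, the chunk being discussed might look like the following. It uses only the standard-library math module; the variable names are hypothetical and were added so the results can be inspected.

```python
import math  # no alias, so every call is prefixed with "math."

root_two = math.sqrt(2)            # square root of two
cos_of_pi = math.cos(math.pi)      # cosine of pi: -1.0
four_factorial = math.factorial(4) # 4 * 3 * 2 * 1 = 24

# Chained expression: cos^2(pi/4) + sin^2(pi/4), which is 1 up to
# floating-point rounding.
identity = math.cos(math.pi / 4) ** 2 + math.sin(math.pi / 4) ** 2

print(root_two, cos_of_pi, four_factorial, identity)
```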
So let's import math: Command-Enter or Control-Enter. Now let's do the square root of two; sqrt is for square root. What do you think this next line is saying? I zoomed in and it really jumped around. math.cos is cosine, and math.pi is pi, so 3.14 and so on. We can get the value of pi by running just that part. And for the whole thing we get negative one. This one, open and close parentheses. Oh, there we go. It just didn't like it when I had it highlighted. Okay, so that's the cosine of pi. Next is factorial. If you don't remember what factorials are, four factorial is four times three times two times one, which is 24. Then we've got a complicated expression where we're chaining together a lot of things. We're taking pi, dividing it by four, getting the cosine of that, and raising that to the power of two. Then we're adding the sine of pi divided by four, also raised to the power of two. What was that formula? Does anybody recognize it if you've taken trigonometry or calculus? It's the unit circle: cosine squared plus sine squared is one. Devon Goodwin graduated last spring, and he was the one who really helped with porting the tutorial over to Python. He got his master's in math, so I'm thinking that he came up with that example of a complicated formula from his math background.
In the next part, we're going to create some objects. We've already done this; we created x, y, and z. Yes? Yeah, most of the time. There's not a lot in base Python. You really have to specify where it's coming from.
Okay, so creating objects. We've already created x, y, and z. Let's talk about what we're doing here. When we assign values to an object in Python, we use the equals sign. Again, if you're not familiar with R, I apologize, but I do want to make some connections here. In R we generally use what we call an assignment arrow, which looks like this. But that does not work in Python, so we always use the equals sign.
And if we run this particular line of code, we see that the value of x in our environment changes from seven to six. We should be able to see the entire value because it's a scalar. Python is case sensitive, and that was true of Python 2 as well, which is probably good because that way you have to be a little bit more careful about things. So just pay attention, and use autocomplete whenever possible so that you're using the right version of a name. Is that true? Did Devon write that right? Is it still case sensitive? Let's test it out. If we type in uppercase X, we don't get a value back. If we type in lowercase x, we get six. So yes, Python 3 is case sensitive. You can't be sloppy about capitalization.
There are lots of coding style guides out there for Python, and we strongly recommend that you try to write clean code, so that you can read it and so that other people can read it. You never know when you're going to write some code, put it away for a few months, and then have to come back and read it again. Generally, object names should be nouns and function names should be verbs. Try not to reuse function names that have already been used. Although with Python, because you have to specify a package, if you define a function with the same name and call it without a package prefix, it will use the version that you created. You could create your own way of summing values, and it would use your version because it's defined locally and not taken from a particular package. Google has a style guide that you could use. There's also a package, pylint, that will automatically check your code for styling issues. I've also heard, but I have not tried it myself, that ChatGPT is really good at reading and correcting code, so you could also ask ChatGPT to fix your code.
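The case-sensitivity check can be reproduced in a short chunk. This is a sketch that assumes uppercase X has never been defined in the session; the try/except wrapper and the message variable are additions for illustration, so that the chunk runs to the end instead of stopping at the error.

```python
x = 6  # lowercase x is defined

try:
    print(X)  # uppercase X is a different, undefined name
except NameError as err:
    message = str(err)
    print("NameError:", message)

print(x)  # lowercase x still holds 6
```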
All right, so if we create an object but don't tell Python to print it or ask what it is, it won't show anything. If we just say x equals six, we just see the code, x = 6, in the console. If we want to get the value out, we either type x or we say print(x), and it will print the six. Sometimes we need the print and sometimes we don't. So if you try to get a value out and you can't see it, just try putting the print function around it.
Once we create an object, we can use it, so we can do math with our value of x. We could do 2.2 times x, and we get 13.2. Notice that it is stored as a floating-point number, so it comes back with lots of zeros and a one at the end, even though we would normally just think of it as 13.2. If we take four plus x, that is an integer plus an integer, so we get an integer back. You can also overwrite the value of an object. We could take our value of x, add six, and store that in y, so y is x plus six. Six plus six is 12, and you see that over here in our environment. Now we can change the value of x. The value of x is now 2, but is the value of y 12 or 8? It's 12; we haven't overwritten y. It's just a quick check. I know it seems like a trick question, but just make sure you know that if you change the value of x, y doesn't automatically change until you tell it to change by running that code again. We can also print y, or just type y, or we can look over in the environment.
All right, so we should be able to use Python as a calculator, create some objects, and do a little bit of math with the objects. Everybody good there, assuming you have a working system? Okay. Let's talk about data types a little bit. We briefly mentioned integers and doubles. We have more than that. We have lists. A vector in Python is formally called a list, or it could be a NumPy array. And that's the basic data type.
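The object examples from earlier in this section can be replayed as one chunk. This is only a sketch; the names scaled and shifted are hypothetical and were added to hold the intermediate results.

```python
x = 6
print(x)          # print shows the stored value

scaled = 2.2 * x  # float arithmetic: close to, but not exactly, 13.2
shifted = 4 + x   # int + int stays an int: 10

y = x + 6         # y is computed once, from the current x: 12
x = 2             # reassigning x does not touch y
print(x, y)       # y is still 12 until its line is run again
```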
A vector is a series of values, which can be either numbers or characters. In a NumPy array, every entry has to be the same data type; a plain Python list will let you mix types, but we'll keep ours consistent. Python can tell that you're building a list when you use square brackets, separating each element with a comma. You can also use the list function, or you can use NumPy: numpy is the package, array is the function, and then parentheses around square brackets will create a NumPy array. Either way, they concatenate a series of entries together.
If we need to install NumPy... Let's see, you should already have NumPy installed, so I'm going to just try to import it. If you installed Miniconda, we should be able to run import numpy as np. Oh, great, that worked. If you did the Miniconda install, you should get NumPy for free. If this import does not work, you can use a pip command, which keeps it in the Python world, or you can go back and use py_install from the R world, and that will work as well. The exclamation point at the beginning of the pip line tells it that this is a shell command rather than a Python command, so it is for interacting with the operating system rather than with Python. In a shell environment, such as the Terminal window, you type commands and execute them by pressing the Enter key, and the shell interprets the command, executes it, and displays the output.
All right. So I already imported NumPy as np. A lot of packages have a default alias that most people use, but again, you can make the alias whatever you want it to be, ideally something shorter than the actual package name. So let's create a base list in Python. This is a list of temperatures. It's going to be a Python list because it's just the square brackets. We created that list, and it's over in our environment, where we can see some of the values. Let's get that in the console.
So I'm going to press Control-Enter to send it to the console, and it prints out all four values. We could do the same thing as a NumPy array. Again, we imported NumPy as np, so np.array is the function that we want, with the same list of values in square brackets, but also inside the parentheses. Now we have two objects in our environment. We've got temps_list, and we created the second one, temps_numpy. We can see the difference: one is identified as an array and one is just a list, even though they have the same values in them. We can also print the NumPy array. If we do print(temps_numpy), you don't see the array function out front; you just see the values. Notice that there are no commas in between the numbers, just spaces. So that's the visible difference between a base list and a NumPy array, and how they're handled will be slightly different.
Those were numbers, integers. We can also create a list of words, or character strings. Character strings can be anything, even just single letters, but it's nice to have words that we can recognize. So here we've got a list of animals. Let's build that list and print it out. Notice how it took the double quotes and made them single quotes. Double quotes and single quotes can be used interchangeably. If you need a character string that has a quote in it, you can alternate which ones you use: the outer ones are what delimit the string as a single unit. We do the same thing in NumPy, and again we get the same values out, but this time space separated instead of comma separated.
Once we have lists, we can check to see what type they are. If we created the list, we should know what type it is, but maybe we have an object that somebody else created and we need to check its type. The type function is a built-in function, so we write type, open parenthesis, temps_list, close parenthesis.
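A sketch of the list and NumPy chunks described here might look like the following. The names temps_list and temps_numpy come from the recording, but the four temperature values and the animal names are placeholders, since the actual values are not read aloud. It assumes NumPy is installed.

```python
import numpy as np

# Placeholder temperatures; the workshop's actual four values are not stated.
temps_list = [68, 72, 75, 81]        # plain Python list
temps_numpy = np.array(temps_list)   # same values as a NumPy array

print(temps_list)    # a list prints with commas between the values
print(temps_numpy)   # an array prints with spaces and no commas

# Placeholder animal names, written with double quotes.
animals = ["cat", "dog", "mouse"]
print(animals)       # Python echoes the strings back with single quotes

print(type(temps_list))   # <class 'list'>
print(type(temps_numpy))  # <class 'numpy.ndarray'>
```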
And if we run that, we'll see that it's a list type. What do you think will happen if we do the same thing for the NumPy version, temps_numpy? Let's give it a try. It says it's a numpy ndarray. An ndarray is a Python class that represents an n-dimensional, or multidimensional, array, so we could have more than one dimension in that array. An array is a data structure that stores a collection of values of the same type, so in this case they have to all be integers, or all be numbers.
All right, let's create a vector, which we'll call dec, that contains decimal numbers, and then check what data type that vector contains. We already have this here, and we've got some examples there. I want you to just run it. I will give you a few seconds to run that, and then I will run it myself. I want you to get ahead a little bit if you're able to, if you've got a working system. I'm going to run all of this at once by using the play button, and then I'll scroll down to see the output. First we created a NumPy array, so it says ndarray. And then the last one is just a Python list. We get the same values; they just look slightly different.
One thing I love about Python that doesn't really have a good equivalent in R is the dictionary. The idea of a dictionary is that you have key-value pairs. If you take a physical dictionary and look up a word, that word could have multiple definitions associated with it. It's got a pronunciation. It maybe tells you the origin. It tells you a lot of information about the word. So the word in the dictionary is the key, and the value is all the information that's associated with that particular word. In this particular case, we have a one-to-one dictionary, where each key has just a single translation. Devon was also multilingual, in English and Spanish, so he created a Spanish dictionary, a literal dictionary, for the dictionary type.
And so here, hello, we can translate that to hola; how are you, to cómo estás; and I am fabulous to its Spanish equivalent. I'm actually going to do this line by line. I'm going to build my translate dictionary. And then I'm going to translate fabulous. So how do we do that? So translate is built as a dictionary in our environment. We can see that we've got curly braces. And it gives us a little bit of a preview. It gives us the key and the value. And if we want to use it, we put the key in quotes in the square brackets, and that should give us the value for fabulous. If we had multiple definitions for fabulous, then it would give us all of the values associated with that particular key. And the nice thing about dictionaries is that they don't have to all be of the same type; the different values can be different types. So vectors are ordered; dictionaries are not. Even though a physical dictionary is in alphabetical order, the dictionary type doesn't have to be in alphabetical order. Vectors are indexed by integer positions; dictionaries are indexed by keys. And all the elements in a vector are the same type, while in dictionaries that doesn't have to be the case. You can change, or mutate, the values contained in a vector or a dictionary, but you cannot change the keys in the dictionary once they're assigned, so keys are immutable. So let's do a few more examples. We could create a vector. We'll call it vec; it could be easy, just the numbers one through four. We can access the values in a vector by index. Python is zero based, and so the first index is zero. R is one based, so the first index in R is one. We've got a graphic here that we'll show in just a little bit. But if we want the first value out of that vector, we say print vec[0], and we need to scroll, and we get the value down there. Again, we can create a dictionary using curly braces, which identifies it as a dictionary.
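A minimal sketch of the dictionary lookup and zero-based indexing described here. The exact keys and Spanish values in the workshop's translate dictionary are not recoverable from the recording, so the entries below are illustrative stand-ins.

```python
# Illustrative entries only; the workshop's exact keys and values are assumptions.
translate = {
    "hello": "hola",
    "how are you": "cómo estás",
    "I am fabulous": "soy fabuloso",
}

# Look up a value by putting the key, in quotes, inside square brackets.
print(translate["I am fabulous"])

# Lists are indexed by integer position, starting at zero.
vec = [1, 2, 3, 4]
print(vec[0])   # first element
```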
The first word is the key, the colon identifies it as a pair, and then the value is after the colon. We can get values in the dictionary by putting in the key, and the key does not have to be a single word; the string can have spaces in it. And we can add a new key-value pair to a dictionary. translate is the dictionary's name; inside the square brackets is the new key, then equals and the new value. So we said I like cheese: me gusta el queso. Again, excuse my pronunciation. We can get that translation out by printing it, and we see that down there. We can modify a value in a dictionary. We can change hello to adiós, if we wanted to really make a confused dictionary. And if we try to get the value of hello out, now we see adiós, where before we would have seen hola. We can get the whole dictionary. If this was an actual dictionary, you know, really big, we wouldn't want to print the whole dictionary. This is small, so we can actually do that. There are other types, including logical, so True/False Boolean values. Python is case sensitive. R wants TRUE and FALSE to be completely capitalized. Python only wants the first letter to be capitalized. So this is one of those times where, if you go back and forth between Python and R, you're going to have to make sure that you're using the right casing. Otherwise it'll give you an error, and then hopefully that will remind you that Python and R are similar but different in important ways. We'll take a little break here in just a few minutes, but I want to get through this section. So True and False represent particular numbers. What numbers do you think True and False represent? Zero and one. Which is which? Yes, True is one, good job. So on or off: True is on, one; False is off, zero. And so we can coerce True and False into numbers if we wanted to do math with them. And so the type of logic is telling us this is a list; we already knew that was a list.
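Adding and modifying dictionary entries, and treating True and False as numbers, can be sketched as follows. The starting dictionary is a cut-down stand-in and the contents of `logic` are assumed.

```python
translate = {"hello": "hola"}                     # cut-down stand-in dictionary

# Add a new key-value pair: name[new_key] = new_value.
translate["I like cheese"] = "me gusta el queso"
print(translate["I like cheese"])

# Modify an existing value by assigning to a key that already exists.
translate["hello"] = "adiós"
print(translate["hello"])
print(translate)                                  # the whole (small) dictionary

# Booleans are capitalized True/False in Python and behave like 1 and 0.
logic = [True, False]                             # assumed contents
print(type(logic))                                # a list, not a Boolean
print(int(True), int(False))
print(True + True)
```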
So if we want to know the type of each element in logic, True or False, we could give it a particular element of that, so the first element of that list. So logic, square bracket, zero, and that's inside type, in parentheses. And now it identifies True as a Boolean. And this is the index; it's not a value. If we try to mix and match False, a word with a space, and the value 2, let's see what happens here. So this is a for loop that will go through this list, and for each item in the list it will print the type. So for i, that's the index, in diff_types, which is our list; the colon tells it that what follows is the set of instructions to loop over. We're going to print the type of each item in the list. Oops, and we have to give it the whole thing. The first one is False: Boolean, string, integer. So even though we're mixing and matching, it's still going to identify the individual type, because it's just looking at that one particular element. All right, before we start talking about tuples, do you want to take a five-minute break? And if you're still having install issues, we can maybe try to work on them a little bit more. Then we'll wrap up types of objects and get into importing data and working with data frames. Okay, so let's maybe come back at 4:08. I'm going to mute myself in case I have side conversations. Remind me to unmute. All right, we'll get started here in just a second, letting people in the room get settled in. All right. Are you guys okay with me going on? Yeah, okay. All right. Anything online, Sarah? Okay. So there's another type that Python has that is unique to Python. There's no R equivalent as far as I'm aware. And that is a tuple. Tuples cannot be modified once you create them. And they can contain elements of different types, including numbers, strings, and other objects. You define tuples with parentheses. Normally parentheses are used for functions, but here it's like a function without a name.
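The element-by-element type check could be written like this. The exact contents of `diff_types` are a guess based on the Boolean, string, integer output described.

```python
logic = [True, False]                # assumed contents
print(type(logic[0]))                # type of one element, not of the whole list

# A list may mix types; each element keeps its own type.
diff_types = [False, "a word", 2]    # assumed contents
for i in diff_types:
    print(type(i))
```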
And so if you just have open and close parentheses with some items inside the parentheses, you're defining a tuple. So if we run this line of code here, over in our environment we can see our tuple listed here. And if we click on it, we can actually get a preview; we can click on anything in this environment and we'll get a preview of what's in there in another tab in this source code area. Oh, notice, if we do that it does identify it as a tuple. Okay. When I resize it, things jump around a little bit. Scroll back. We can access items in a tuple. So if we have tuple_ex, for example, square bracket, zero, square bracket, and we say equals four, what do you think is going to happen here? Nothing. Why? You can't change tuples. Yes. So tuples are immutable. So it just complains and yells at you: does not support item assignment. Okay, let's see what happens when we try to mix different data types into one vector. So again, in tuples we can have multiple data types; what happens when we try to do that in a list or a vector? Let's uncomment this for loop here. So I'm doing Ctrl+Shift+C, and that uncomments. And I'm going to have both lines selected so that I can run both of them. So for i, the item, in our list or vector num_char, colon, we're going to print the type. Int, int, int, string. Okay, that's good. So that didn't coerce anything. We can do the same thing for our logic. We'll call this num_logic: 1, 2, 3, and False. The type of that is a list. And we get three integers and a Boolean. And we do the same thing for char_logic. I'm going to go over here and I'm going to click on it. We can see a list. And we see string, string, string, bool. If we have 1, 2, 3, and then four in quotes, it sees the quotes, and even though four is a number, it treats it as a string. If we're allowed to have multiple types in the same object, the hierarchy is integer, float, string, list, tuple, set, and dictionary. So dictionary is the biggest, and it can contain all of these that are below.
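Here is a small sketch of the tuple behavior and the mixed-type lists just discussed. The tuple's contents are assumed; the `try`/`except` is only there so the sketch keeps running after the error.

```python
tuple_ex = (1, "two", 3.0)       # assumed contents; parentheses define a tuple
print(type(tuple_ex))
print(tuple_ex[0])               # reading an item works

# Assigning to an item fails because tuples are immutable.
try:
    tuple_ex[0] = 4
except TypeError as err:
    message = str(err)
    print(message)

# Lists do not coerce their elements: each item keeps its own type.
num_char = [1, 2, 3, "4"]
num_logic = [1, 2, 3, False]
for i in num_char:
    print(type(i))
for i in num_logic:
    print(type(i))
```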
If we were to make the Python-to-R connection, an R list is equivalent to a Python dictionary, sort of; dictionaries are a little bit more powerful and easier to use in Python. A Python list is equivalent to an R vector. We can also do lists of lists. So we could create three lists in Python: different OS types, maybe some favorite numbers, and some logic. Notice that they're all the same length. Then we can make a layered list by putting those three lists inside the square brackets. If we put layered_list, square bracket, one, what do we get out? Remember zero-based indexing, so that means we got out the second list from the layered list. If we say layered_list, square bracket, zero, square bracket, one, what does that do? The first list, which is OS, and the second object in the first list, the one at index one. To access the elements of a list within a list, we use the name of the list, then the index of the inner list in the first square brackets, and then the index of the element in the second square brackets. We've been outputting values with print and by just running the name of the variable. We can do this because in a Jupyter notebook or an interactive Python shell, such as RStudio, you can simply type the name of the variable or the expression to see the value printed. But if you are running Python from a command line, or in a different kind of environment, you might need to use the print command to display the output. So an IDE like RStudio is nice, and a Jupyter notebook is nice, because you can be a little bit more flexible without having to do print. To create a named list in Python we can use dictionaries. So again, this is going to be a dictionary that has keys, and values that are different lengths. So we'll have a named list. We'll give it title as our key and statistics as our value. Our next key is numbers, and we're going to give it as a value a list from the range 1 to 11. We'll pay attention to that in just a second. And then we're going to give it another key, data, and give it a value of True.
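The list-of-lists indexing can be sketched as below. The three inner lists are invented for illustration, and the first is named `os_types` here rather than `os` so it does not clash with Python's standard `os` module.

```python
# Hypothetical contents; only the structure follows the workshop description.
os_types = ["Windows", "macOS", "Linux"]
favorite_numbers = [7, 13, 42]
logic = [True, False, True]

layered_list = [os_types, favorite_numbers, logic]

print(layered_list[1])       # the second inner list (index 1)
print(layered_list[0][1])    # first inner list, then its second element
```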
Let's run that: Ctrl+Enter, or Command+Enter. And there was a carriage return there; it did not like that. So I had to remove the carriage return. Now we can print that. So list of range 1 to 11 gave us the values 1 to 10. Why did it give us 1 to 10 instead of 1 to 11? Zero based. Okay. I know that in the computer science world, people really love zero based. Zero based to me is really annoying now that I've been using R for so long and not Python as much. And to me, you know, it's 11: one, two, you know, and it's not even 11 items. Again, we've got a graphic here in just a little bit. So it stops before you think it's going to stop. Okay. Let's import some data. We can import data in the Environment pane. This is going to import it into R, but let's go ahead and do that. So we're going to do From Text (base). And in the zip file that you were given, we're going to select the Blackfoot fish CSV and click Open. And we're just going to use all the defaults and click Import. And that went into the R environment and not into Python. So how do we get it into Python? This code down here is how we would read it into R, so this is the R code, because it just has a single caret. We'll use that to help us import it into Python. So we need pandas installed, if we haven't already gotten it, which everybody hopefully has. Let's go ahead and make sure that we import it again. And now we're going to use pandas. We can do read_csv to read a CSV file. We can just give it the name of the CSV file, and it's going to look in the same folder that this .Rmd file is in. If the data is not in the same folder where you're working on your code, you can give it a whole path, and it will import that way as well. So if we needed to specify the whole path, that will work, but the shortcut of the same directory makes for cleaner code.
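A sketch of the named list built as a dictionary, with the variable name `named_list` assumed. One detail worth separating out: `range(1, 11)` stops at 10 because the stop value of `range` is excluded. That convention sits alongside zero-based indexing but is a distinct rule.

```python
named_list = {
    "title": "statistics",
    "numbers": list(range(1, 11)),   # start is included, stop (11) is excluded
    "data": True,
}

print(named_list)
print(named_list["numbers"])
print(len(named_list["numbers"]))    # how many numbers were generated
```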
And we had it in our files right next to each other; the fish file is in the same folder as this intro .Rmd file, and so we can just import it. Now we're in Python. We can see that there's something different over here. We have this blue circle with a white arrow in it, and we can see that it says DataFrame, and we can see a little bit of information about that data frame. You can get a preview of it by clicking on it, and it'll create a tab. It did not look like it imported properly. I think it's just not previewing properly. Let's look at the structure of the data to make sure that it imported. If for some reason you had problems getting the zip file, this next code chunk would get you the actual file from GitHub. Let's look at the type of Blackfoot fish. Okay, it's a pandas DataFrame. So that's good. We're a little bit concerned that it didn't import properly, so let's look at the shape. It's Blackfoot fish dot shape. So Blackfoot fish is the data frame, the function we want to use is shape, and we're going to print that. Okay, so this is looking better. I think that it's just not viewing properly. And parentheses means what type? What type is parentheses? Tuple. So this is immutable. We can't change this. Okay. We can't add more rows. We can't add more columns. And we can't change the values in there. We can get the number of columns out, which should be seven. This is columns. It actually gives you the column names. So these are all the names of the columns: trip, mark, length, weight, year, section, and species. Trip is the time that they went out to collect the fish. Mark, I can't remember what that is; it's probably whether they marked the fish or not. Length is what the length is; weight; year of the collection; section of the river; species of the fish. dtypes is the data types of each column in the first few rows. So trip is an integer, mark is an integer, length and weight are floats, integer for year, and section and species are objects.
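To try the import and inspection steps without the workshop's zip file, here is a sketch that reads a tiny made-up CSV from an in-memory string. The column names follow the transcript; every value, and the variable name `blackfoot_fish`, is an assumption. With the real file you would pass its file name or path to `pd.read_csv` instead.

```python
import io
import pandas as pd

# Made-up rows that only mimic the column layout described in the workshop.
csv_text = """trip,mark,length,weight,year,section,species
1,0,288.0,175.0,1989,A,RBT
1,0,301.0,,1989,A,WCT
2,1,330.0,260.0,1990,B,Bull
2,0,375.0,410.0,1991,B,Brown
"""

blackfoot_fish = pd.read_csv(io.StringIO(csv_text))

print(type(blackfoot_fish))          # a pandas DataFrame
print(blackfoot_fish.shape)          # (rows, columns) as a tuple
print(list(blackfoot_fish.columns))  # column names
print(blackfoot_fish.dtypes)         # one data type per column
```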
And the data type is an object. There is a nice function called describe that will describe all of the columns in the data frame. So let's see, we've got count, the number of objects; that should be the same for all of them as long as we don't have any missing. Yep. Missing values it takes out. So you see that we've got some missing values in weight. Mean, standard deviation, min, 25th percentile, 50th percentile. This is wrapping around. Oh, 75th percentile and max. You can also just check the type of Blackfoot fish, and again, we get DataFrame. Some other functions that you can use to inspect data frames: dataset name dot shape is the rows and columns; shape, square brackets, zero is the number of rows; shape, square brackets, one is the number of columns. The length of the dataset name variable gives you the number of rows as well. We can look at the content: head, open close parentheses, is the first five rows; tail is the last five rows. columns is the column names; dataset name dot index is the row names. We can get a summary of the content using info or describe. The data frame is a two-dimensional, table-like data structure, similar to a spreadsheet in Excel. It's the primary data structure provided by the pandas library. And you can create a data frame in several ways, such as loading data from a CSV, an Excel file, or a SQL database, or constructing it from scratch with lists or dictionaries. A data frame is made up of columns; each column is a Series object. Each column can have a different data type. All the values in a column must have the same data type. And we've touched on this a little bit, but we're going to touch on it a little bit more because it's important: we're going to extract some data from our data frame. Before we do that, any questions on importing data or looking at the data frame? We've already read our CSV, but if we hadn't, we could read it. Actually, we're going to read it in as df, just to give it a shorter name.
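The inspection helpers listed here can be tried on a small stand-in data frame; the values below are invented. One detail worth double-checking when coming from R: in pandas, `len(df)` counts rows, while `len(df.columns)` counts columns.

```python
import pandas as pd

# Invented values standing in for the fish data.
df = pd.DataFrame({
    "length": [288.0, 301.0, 330.0, 375.0],
    "weight": [175.0, None, 260.0, 410.0],   # one missing weight
    "species": ["RBT", "WCT", "Bull", "Brown"],
})

print(df.describe())                 # count, mean, std, min, quartiles, max
print(df.shape[0], df.shape[1])      # number of rows, number of columns
print(len(df), len(df.columns))      # rows, columns
print(df.head(2))                    # first rows (default is 5)
print(df.tail(2))                    # last rows (default is 5)
print(df.index)                      # row labels
df.info()                            # compact summary of columns and types
```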
So I'm going to rerun this, or run it with a shorter name. We're going to access the weight column by saying df, square brackets, and weight in single quotes, and we're going to name that weight_column so that we have a separate copy of that. If we print the weight column... oh, thank goodness it didn't print all 18,000 rows. What did it do? It printed the first five and the last five, and it did give us the length and the type of it. It could have been dangerous. It looks like I need to remove some stuff from the document here. We can get the years out. If we say years dot head without parentheses, it should not really work. It gives us the head and the tail. Let's see what happens if we do open close parentheses. Now we actually just get the first five. Without the open close parentheses, it just summarized the entire object. We need to actually have the open and close parentheses in order for the function to work. We can get the structure of the years, str for structure. No, sorry, str converts years into strings. So let's look at... that's not it either. Okay. So str is just a different way of looking at the years. It's still storing it as integers. But if we wanted to get into a particular data frame or vector by giving it an index, we need to use a pandas function called iloc, for index location. And let's see here, if you look over in the Environment tab, let me see if I can get back up here. It's in the same place. So RStudio tells you the dimensions of Blackfoot fish. So we can roughly view the dataset as a matrix of entries with variable names for each column. We can use the iloc function to perform the same task that we did before, but using the following code. So if we wanted the values in the fifth column, to store them in a variable called years, it's a colon for all the rows, comma, and remember the fifth column is an index of four. And then if we want the values, we'll do period values. And we'll get years out as an array now.
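A sketch of pulling out columns by name and by position. The data frame is a made-up stand-in with the same seven column names, so that the fifth column (index 4) is year.

```python
import pandas as pd

# Made-up stand-in with the seven columns named in the workshop.
df = pd.DataFrame({
    "trip": [1, 1, 2, 2, 3, 3],
    "mark": [0, 0, 1, 0, 1, 0],
    "length": [288.0, 301.0, 330.0, 375.0, 290.0, 310.0],
    "weight": [175.0, 190.0, 260.0, 410.0, 180.0, 215.0],
    "year": [1989, 1989, 1990, 1991, 1991, 1992],
    "section": ["A", "A", "B", "B", "A", "B"],
    "species": ["RBT", "WCT", "Bull", "Brown", "RBT", "WCT"],
})

weight_column = df["weight"]     # select a column by name
years = df["year"]

print(years.head())              # with parentheses: the first five values
print(years.head)                # without: the method object, not its result

# iloc selects by position: all rows (:), fifth column (index 4).
years_array = df.iloc[:, 4].values
print(type(years_array))         # a NumPy array
print(years_array)
```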
So let's get something smaller for practice. We're going to create a data frame from scratch. We're overwriting the Blackfoot fish df with this new data frame, so we have to be careful about reusing names. And I'm actually going to call this df2 so that we're careful about that. x is a vector that has H, N, T, W, V; y is May, October, March, August, February; and then z is some years, and we print it out. So what would be output if we entered df dot iloc, square bracket, 2, comma, colon? And I have an empty code chunk here for you to type that in: df, i, l, o, c, square bracket, 2, comma, colon. Oops, df2. What did we get here? The third row. Yes. And it prints out the column names as well as the values. How do we get an output of 2015? We want 2015. Let's see, that is in row one. Okay, so the second row, the index of one. We could just get the whole row out, and let me make sure I change this to df2, but we want just that last item. So instead of colon, we could say two. Another way to do it: we can do the row first, and then the third object in that row. Let's see if that works as well. Let's say we don't know the location of a particular thing. So let's say we want to look in our data frame in the y column. Let's look for where it equals October, and then we want out the z column, and let's see what this code does. Okay, so we're looking for when y is October, then look in the z column and tell us what that value is, and it tells us 2015. So that's kind of complicated. Another way, this is actually a fourth way: give it the name of the column, z, and the second item in that column. Oops, df2. There, 2015. Okay. So loc is where you give it a name, a named column or a named row. With iloc you give it an index, so i for index. So I want to go to these pictures here. So if we want to index and get a specific element, let's say we have a list of grades.
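The practice data frame and several routes to the value 2015 might look like this. The x and y values follow the transcript; the z values other than 2015 are invented, and the workshop's exact code for each route may differ.

```python
import pandas as pd

df2 = pd.DataFrame({
    "x": ["H", "N", "T", "W", "V"],
    "y": ["May", "October", "March", "August", "February"],
    "z": [2012, 2015, 2018, 2020, 2023],   # only 2015 comes from the workshop
})

print(df2.iloc[2, :])                        # third row, all columns

# Several ways to reach 2015 (row index 1, column index 2):
print(df2.iloc[1, 2])                        # by row and column position
print(df2.loc[1, "z"])                       # by row label and column name
print(df2.loc[df2["y"] == "October", "z"])   # by condition on y, then column z
print(df2["z"][1])                           # column by name, then row label
```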
We want 93 out; to get 93 out, we give it 2 in the square brackets, that's the specific one that we want. So number two is 93. If we want a range of elements: with zero-based indexing, zero starts before the first element, and the commas are essentially what it's counting. Okay, so zero to one would only give us 88. Zero to two would give us 88 and 72. If we want 72 and 93, we have to say grades one to three, and the symbol for to is the colon. So one to three gives us 72 and 93. So when we're slicing, or getting out a range of elements, in zero-based indexing, think of it as indexing on the commas. So it's in between the values. If we want the first object in a list, zero in square brackets, or zero to one, would give you the same thing. Zero comma one would give you the first two objects in a list, but they're adjacent, so you could also do a colon b plus one: the second position that you want, plus one; you have to add one. So one comma three would give the same items as... well, if you were wanting to go from one to three, you would actually not get the same items here, so I need to fix that. So let's check this. We've got a list of five values, one through five, and we're specifying each item in the list, so we see one through five. If we want the same list, but we want to give it a range, again, we have to go one more than what we want, so one to six. If we want to go backwards, we can start from six and go to one, but we have to tell it that we want to step by negative one. So the last thing in this range is the step. So we're stepping backwards by one, but notice we start at six and we don't get all the way down to one. Range 123 to 131 goes from 123 but stops at 130. Three to negative one by negative one should be the values three, two, one, and zero. Now if we want to index into those lists: if the list is one through five, one colon three should give us two and three. In the list from range 123 to 131, three to six gives us 126, 127, 128.
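The slicing and range rules above can be checked directly. The grades list is assumed to start 88, 72, 93 as in the talk, with a fourth made-up value.

```python
grades = [88, 72, 93, 94]        # the last value is made up

print(grades[2])                 # a single element
print(grades[0:1])               # slice: start included, stop excluded
print(grades[0:2])
print(grades[1:3])

# range(start, stop, step) also excludes the stop value.
print(list(range(1, 6)))
print(list(range(6, 1, -1)))     # counts down and stops before reaching 1
print(list(range(123, 131)))
print(list(range(3, -1, -1)))

print([1, 2, 3, 4, 5][1:3])
print(list(range(123, 131))[3:6])
```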
If our list is the range six to one by negative one, two to four will give us four and three. So if you have adjacent items, that colon notation is nicer, as long as you remember that you have to add one more than you might think that you need. If you have nonadjacent items, then listing everything out is the way that you're going to go. So 1, 3, 9 would give you the second, fourth, and tenth items in a list; one, three, and nine if you're thinking zero based. So we can use this in iloc to get out information. So I'm going to change this to df2. So iloc, index location, square bracket, 1, 2, comma. No colon. You can either have the colon there to get all columns or not have the colon; it'll still work. That gets us the second and third rows, or the rows identified by 1 and 2. If we use a colon, since 1 and 2 are adjacent, that gets you the same thing. You can look at them both by hitting play, and we'll see them one on top of the other. If we want nonadjacent items, again, we have to specify them explicitly, with the 1, comma, 3. And change it to df2. You could also do this with names instead of indices. And if we change this to df2: how do we pull off only columns x and y, or x and z? So x and y are the first and the second columns, so we could do 0, comma, 1. Or we could say it with the names x and z. If we do x and z, then we don't use iloc; we use just loc. Does everybody feel fairly comfortable about accessing things out of vectors? I probably want to go a little bit faster so we can get some plotting in. Really quick before we do that: in R there's a big deal about making sure that, you know, data types aren't changing. And with Python, we don't have to worry so much about characters being called factors. In this Blackfoot fish dataset, species takes on values of rainbow trout, westslope cutthroat trout, bull trout, and brown trout. If we get the unique values out, we can just see those levels.
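Row and column selection with iloc and loc, plus unique values, sketched on the practice data frame. The z values other than 2015 and the species labels are invented.

```python
import pandas as pd

df2 = pd.DataFrame({
    "x": ["H", "N", "T", "W", "V"],
    "y": ["May", "October", "March", "August", "February"],
    "z": [2012, 2015, 2018, 2020, 2023],   # only 2015 comes from the workshop
})

print(df2.iloc[[1, 2], :])       # rows 1 and 2 listed explicitly
print(df2.iloc[1:3, :])          # the same rows as a slice (stop excluded)
print(df2.iloc[[1, 3], :])       # nonadjacent rows must be listed
print(df2.iloc[:, [0, 1]])       # columns x and y by position
print(df2.loc[:, ["x", "z"]])    # columns x and z by name, so loc not iloc

# unique() needs its parentheses to return the distinct values.
species = pd.Series(["RBT", "WCT", "RBT", "Bull", "Brown", "WCT"])  # invented
print(species.unique())
```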
And we can do the same thing for the species without the parentheses. If we do unique without the parentheses, again, that doesn't really do anything. It just gives us the header and the footer of the frame, essentially the same thing as not having unique there at all, because you need to have the open close parentheses. We can convert year to a category type. This is similar to converting it to a character or a factor. And so we're going to put an f at the end if we want to treat it categorically. We can also use pandas to do that, and the function there is Categorical, instead of astype category. And when we do that, we can see that it's 10 categories, but they're integer categories, from 1989 to 2006. If you were able to get SciPy installed, which didn't install for me... let me see if I can get this really quick. While that's running: from scipy we're going to... this is a particular notation. So instead of importing everything in the SciPy module, we're only going to import the stats module, and we're going to import it as stats. Since we're not changing the alias, we could just skip that as stats part and just import stats. And we're going to make sure that we have NumPy imported. Notice how we can do assignment of two variables at the same time using a comma. So we want a value of mu and a value of sigma. Those are two separate things, and we're separating them with the comma. We say equals 0, comma, 1. No square brackets. And it magically puts 1 in for sigma and 0 in for mu. From the stats package, we can look at a normal distribution and get random values, or random numbers. The center is mu, the scale is sigma, and we want 1,000. And from that, we can look at the NumPy mean of the samples, which should be something close to zero. Over here it's 0.06. And the standard deviation should be something close to one, and it is 0.985. We can print a string together; the f pastes things together.
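A rough sketch of the category conversion and the normal sampling. The year values are invented, and `stats.norm.rvs` with a fixed `random_state` is one way to draw the samples; the workshop notebook may use a different call, and its numbers (0.06 and 0.985) will not be reproduced exactly.

```python
import numpy as np
import pandas as pd
from scipy import stats

years = pd.Series([1989, 1990, 1990, 2006])   # invented values
year_f = years.astype("category")             # one way to make it categorical
year_cat = pd.Categorical(years)              # the pandas function mentioned
print(year_f.dtype)
print(year_cat.categories)

# Assign two variables at once, separated by a comma.
mu, sigma = 0, 1

# Draw 1,000 values from a normal distribution centered at mu with scale sigma.
samples = stats.norm.rvs(loc=mu, scale=sigma, size=1000, random_state=42)
mean = np.mean(samples)
sd = np.std(samples)

# An f-string substitutes the values inside the curly braces; :.3f rounds them.
print(f"The mean is {mean:.3f} and the standard deviation is {sd:.3f}")
```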
The mean is in curly braces, mean, so that will look up the value of the mean, and the standard deviation is in curly braces, standard deviation. And we probably should round those, but we can get something that looks kind of ugly but prints it all out together. All right, I want to get to plotting. Finding help: there's lots of help out there for Python. And we're going to skip finding means and talking about masking; you can go through that yourself. Cleaning data: pay attention to missing values. Let's do some data visualization. I have to run pip install really quick. I told everybody to come prepared, and my computer decided that it wasn't going to be prepared today. That wasn't too bad. Doing it the other way definitely took longer. Okay. I'm going to import, from matplotlib, pyplot as just plt, for plot. And let's get a scatter plot. So it's plt dot scatter, Blackfoot fish clean... I'm going to just run this really quick, it's dropping the NAs... weight, and then comma, Blackfoot fish clean length. Let's see what happens there. And we need to do plt dot show. So it plotted one thing on the x axis and one thing on the y axis, but oh my goodness, we don't know which one is which. So let's clean this up a little bit and put some labels on there. We're going to use the axis object. We're going to import matplotlib pyplot as plt. fig, ax is going to create a new figure and an axis object that we're going to add things to, and we'll do these one at a time. For the scatter plot on the axis, we're going to have Blackfoot fish clean weight and Blackfoot fish clean length. We're going to set the x label to be weight, since it's plotting x, y. So set the x axis label, set the y axis label, set the ticks. We're going to rotate the ticks by 90 degrees on the y axis, and set a title. And now we're going to show it. You can show this after each step if you want to check your work. And we can expand this: show in a new window.
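The labelled scatter plot could be built roughly as follows. The data values, the variable name `fish_clean`, and the title text are assumptions, and the `Agg` backend line is only there so the sketch runs without a display; leave it out when plotting interactively in RStudio or Jupyter.

```python
import matplotlib
matplotlib.use("Agg")   # assumption: headless run; omit for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Invented values; dropna() removes the row with a missing weight.
fish = pd.DataFrame({
    "weight": [175.0, None, 260.0, 410.0],
    "length": [288.0, 301.0, 330.0, 375.0],
})
fish_clean = fish.dropna()

fig, ax = plt.subplots()                         # new figure and axis object
ax.scatter(fish_clean["weight"], fish_clean["length"])   # x first, then y
ax.set_xlabel("Weight")
ax.set_ylabel("Length")
ax.tick_params(axis="y", labelrotation=90)       # rotate the y-axis tick labels
ax.set_title("Fish length by weight")            # hypothetical title
plt.show()
```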
Hopefully those of you that are watching online can see this in the new window. It just enlarges this. Sometimes if it doesn't show properly, I will minimize it and then expand it again, and we might get more of the axis if it doesn't show it properly. Really quick, let's just do side-by-side box plots. I'm going to scroll down to side-by-side box plots. I'm just going to play the whole thing. And so we can get box plots of fish weight by species. And if you wanted to, you could go through the code there to see how it did it, but the function is just boxplot. Sorry that we're kind of rushing at the end here. Hopefully you're able to explore more on your own. Again, the idea here is just to give you a taste of how to get started and maybe get through some of the install hurdles. I know that we weren't able to solve everybody's install issues here. We will add on a next-step Python workshop in the spring. And then we also have our other R series workshops. Sarah, do you want to give the closing spiel? Yeah, so I guess there are a few things to know. One is that we recorded today's workshop, so I'll post it shortly, in a few days. And then we have other R workshops recorded online. It's montana.edu slash data science. There's training materials there, and our workshop videos are there.