Maybe we'll start at about 12:01, so just hang tight. We're going to wait another minute for everyone to come in.
Okay. Hi, everybody. We have a really great lesson and a lot to get through today, so I'm going to start us off. Welcome to our August webinar, Python Fundamentals with Data Analysis and Visualization, with Dustin Allen, our very own from the New Mexico Smart Grid Center, or the NM EPSCoR office. I'm Brittany Van Dorg, the communication and outreach specialist for New Mexico EPSCoR, which is the Established Program to Stimulate Competitive Research. EPSCoR is a nationwide program funded by the National Science Foundation. Today I'll be your host along with my partner in crime, Isis, our website administrator, who will be working behind the scenes to make it all flow smoothly.
A few housekeeping things before we begin. If you have any questions at any point, please type them into the Q&A box, and Isis will politely but firmly interrupt us and read them out loud so we can get your questions answered. Before we start, I also want to plug our epic webinar training lineup this fall. We're kicking things off with today's webinar. In September, we will hear about research by Dr. Wong, an assistant professor from the NMSU School of Engineering and a new New Mexico Smart Grid Center faculty hire. His talk, Leveraging Energy Storage Resources to Improve Combined Cycle Power Plant Operational Efficiency, is on a topic that's very relevant for the EPSCoR project we're currently running, the Smart Grid Center, so I hope many of you will attend. Finally, in October, we will learn about the research of Dr. Shaw, an assistant professor from the New Mexico Tech Department of Electrical Engineering, who is also a New Mexico Smart Grid Center hire. Registration information for all of this can be found on our website. All right.
So with that, I'd like to introduce our presenter for today, Dustin Allen. Dustin received his MS in information science and assurance in 2015 from Anderson School of Management and has previously been certified in DevOps operations by Red Hat and Mirantis. He has been a part of the New Mexico EPSCoR team since 2011, where he currently is a website developer and also manages and maintains the business computing operations for the state office. Thank you so much for being here, Dustin, and please take it away whenever you're ready.
Sure thing. Thank you, Brittany. Give me one second while I get my browser up so I can start the lesson. All right, everyone. Here we have a fresh Jupyter notebook. If you followed the setup instructions that were sent out with the webinar mailing, they give you a link to Anaconda, and that is what I'll be using for this. The setup instructions say to download the files to your desktop; that is what I have done and how I will be navigating for this webinar. So the first thing we need to do is start a new Jupyter notebook inside the necessary directory. I will go to Desktop, software carpentry, python, and data. This will matter more later on, when we get to the tabular part of the lesson; it will make more sense once we're there.
First and foremost, we're going to cover getting familiar with the interpreter, how to assign variables, and how to work with the basics of Python. Unlike other languages, where you need to declare a type for your variables, you don't need to do that in Python. You type in a variable name, such as weight_kg, and assign it a value, let's say 60, and press Shift+Enter to run the command. Now we have the variable assigned. There are a couple of rules to be aware of when naming variables: names cannot start with a number of any kind, and they are case sensitive.
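As a minimal sketch of the assignment step just described (the capitalized name Weight_kg and its value are hypothetical, added only to illustrate case sensitivity):

```python
# Assign a value to a name; Python infers the type from the value itself.
weight_kg = 60
print(weight_kg)

# Names are case sensitive: Weight_kg is a separate variable
# (hypothetical name and value, used only for illustration).
Weight_kg = 61
print(weight_kg, Weight_kg)

# A name may contain digits but may not start with one;
# writing `2weight = 60` would be a SyntaxError.
```

In a notebook, each of these lines can be run in its own cell with Shift+Enter.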
The types of variables we're going to be working with today are integers, floats, and strings, and I'll show you how to declare each of those. Let's redefine the variable weight_kg as a float. Right now it's an integer. To do that, we retype the variable name and assign it a new value that has a fractional part. Python is smart enough to know that it is no longer an integer and is now a floating-point number. Now let's say we want to create a string, for patient data, for instance: patient_id. To declare a string, we put it in quotation marks. It doesn't matter whether it's numbers or letters; as long as what we declare is inside quotation marks, it will always be a string.
Something fun that we can do with the interpreter is basic arithmetic without having to create functions. Let's say we want to find the weight of someone in pounds. We can do 2.2 times weight_kg. Oops, I forgot to add my operator there. One of the fun things about live coding is that you get to see me troubleshoot on the fly. All right, now we have the weight variable. And if we want to extend a string variable, we can put in patient_id equals 'inflam_' and a plus. When you use plus with strings, it's concatenation. This is a little bit different from what I'm going to show you in just a minute: it makes one string instead of two separate strings. Let's add it to the existing variable, patient_id.
So this is great. We've created a lot of variables. Now let's check the values inside them with some built-in functions. The first one I'm going to show you is print. This is an extremely useful function because it helps you make sure that the code you're creating is the code you want, and that you're getting the correct outputs.
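The float, string, arithmetic, and concatenation steps above might look roughly like this. The starting patient ID "01" and the name weight_lb are assumptions; the transcript does not state them.

```python
# A value with a fractional part makes the variable a float.
weight_kg = 60.3

# Anything inside quotation marks is a string, even if it is all digits.
patient_id = "01"  # assumed starting value, not given in the transcript

# Arithmetic works directly, with no functions needed.
weight_lb = 2.2 * weight_kg  # weight_lb is an assumed variable name

# Using + on two strings concatenates them into one string.
patient_id = "inflam_" + patient_id

print(weight_lb)
print(patient_id)
```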
So let's check ourselves for weight. You saw that little drop-down there; that's because I used tab completion, and it brought up all the variables we currently have. So there we have it: weight in pounds is 132, which is what we expect. Now let's see what patient_id looks like. Print, and then the open parenthesis is how you call a function. Not all functions need inputs, but print has to be given a variable or a value so it knows what to print. Print patient_id: inflam_01. So we seem to be on the right path here.
Now, if you want to print multiple things that aren't all strings on the same line, instead of using concatenation you can give multiple values to the print function. So let's say print, and then patient_id. Now a comma, which ends the first argument. Next, let's put in a string for a description: 'weight in kilograms'. And now the final argument, weight_kg. Here we have inflam_01, weight in kilograms, 60.3, as we expect.
Now let's talk about another useful built-in function, which is type. Print, type, and let's get the type for the value of weight_kg, which is 60.3. That should be type float, which is correct. Now let's do the same for the variable patient_id. It should be a string, and it is. So far we've covered the print function and the type function. These are really sanity checks when you're debugging, so you know that you're getting the output you want and that you're working with the right type of data. I know that I've accidentally tried to concatenate a string with a number before, and that will just error out.
Another really fun thing about the interpreter is that it evaluates any calculation you put inside the print call before printing. Like this: print, 'weight in pounds', comma, and now 2.2 times weight_kg.
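A short sketch of the print and type checks described above, using the values mentioned in the transcript. The label variable and the str() conversion are illustrative additions showing one way around the concatenation error.

```python
weight_kg = 60.3
patient_id = "inflam_01"

# print accepts several comma-separated values of different types.
print(patient_id, "weight in kilograms:", weight_kg)

# type reports what kind of value a variable currently holds.
print(type(weight_kg))
print(type(patient_id))

# "inflam_01" + 60.3 would raise a TypeError; converting with str() first
# is one way to concatenate (label is a hypothetical name).
label = patient_id + " " + str(weight_kg)
print(label)

# The expression is evaluated before printing; weight_kg itself is untouched.
print("weight in pounds:", 2.2 * weight_kg)
```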
Now we have the same weight that we had earlier, up in statement six. One last thing I want to cover in this section: a calculation like this does not change the value of a variable; the variable stays the same. This does not change weight_kg; it does not turn weight_kg into pounds. If I were to check weight_kg right now, it would still say 60.3. Let's check that real quick. So the only way to change a variable's value is to explicitly assign it a new value. So weight_kg equals 65.0. If I didn't put in the .0, it would become an integer, which is not what we want. Now let's do that print statement one more time, weight_kg, and now it is 65.
So this is just the basic introduction to how to use variables. It will help us in our next section, where we start working with tabular data and some patient data. The context for this lesson is that we have 60 patients over 40 days, and we're studying arthritis inflammation data from them. To get started on that, we first want to import a library so that we can do more statistical analysis on our patients and on our data in general. The library we're going to import is called Numerical Python, or NumPy. Now that we have that library in place: a library is a collection of functions that you can download and use for your work. NumPy is very robust and has a lot of mathematical features in it, far beyond what we're going to show today. I'm going to be covering means, minimums, maximums, and standard deviations, but really it's extremely powerful and can be used for heavy statistical analysis for your research needs.
So what do we do with NumPy? First, we need to get some data into our system. So numpy.loadtxt, which is a function inside the library, and this function takes two parameters for our lesson. The first is fname, which stands for file name, and we're going to get some inflammation data from that folder I directed us to in the very first step.
I'm going to use tab completion to see what my options are, and we'll use the inflammation-01 CSV file. The second parameter that we need to give this function is the delimiter. A delimiter is how values are separated in a CSV file. While it's conventional to use a comma, underscores and spaces are also prevalent in the wild when you're working with raw data sets, so it's useful to be able to define the delimiter explicitly. And here we can see the output of this command. What we're seeing is the first three rows of data and the last three rows of data, and within each row the first three values and the last three values. When you see the three dots, that just means the data continues in the same way. So what this is describing is that there's data in the middle of each row here and there are rows in the middle here.
To work with this data, we need to load this CSV file into a variable. So let's put it into a variable called data: data equals, and I'll just paste the text I just wrote. I'll run this one more time. Now let's get some information on data: print data. Hopefully this will look the same as what we have above. And it does; it's slightly different, with not as many commas, but it's still saying the same thing. Within this array, we have at least seven lines, and there are more than seven values per line, which is correct.
Something we showed earlier was the type function. Let's take a look at what this is. Print type data. This is a numpy.ndarray. What that means is it's an N-dimensional array, which is correct: we're looking at tabular data. Think of this as a spreadsheet. If you want to find out what type of data is within this array, there's an attribute we can use that works with NumPy arrays: print data.dtype.
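The loading step can be sketched as follows. Because the CSV file from the setup instructions is not available here, this example reads a few made-up rows from an in-memory buffer; in the webinar, fname is the path to the inflammation CSV file instead.

```python
import io
import numpy

# Stand-in for the CSV file on disk: three made-up rows of comma-separated values.
sample_csv = io.StringIO("0,0,1,3\n0,1,2,1\n0,1,1,3\n")

# fname may be a path or a file-like object; delimiter says how values are separated.
data = numpy.loadtxt(fname=sample_csv, delimiter=",")

print(data)
print(type(data))   # the container type: a NumPy N-dimensional array
print(data.dtype)   # the type of the values stored inside the array
```

Without assigning the result to a variable, loadtxt only displays the array; assigning it to data keeps it around for the later steps.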
Whenever you want to know what your options are, by the way, after you type a variable or a function name and press dot, you can double-tap Tab, and it'll give you your potential options. As you can see, there are many of them. This is a good way to learn what is available to you within a particular library, in addition to the help pages and manual pages that are available online. Okay, that aside, we're working with dtype here. This gives us float64, which is familiar and what we would expect, as we see these values are one-point-something. And because this is a multi-dimensional array, let's also get the shape of the data so that we know what we're working with: print data.shape. Here we have 60 rows and 40 columns, which is correct and what we want to be working with: we have 60 patients over 40 days.
Now that we are able to describe the data and look at the data set as a whole, it would be useful to actually be able to access the data we need. So the first thing I want to show you is how to access specific indices. Print 'first value in data', like we did earlier, and now data, and here's some new syntax: we use square brackets for working with arrays. The first value is the row index and the second value is the column index. And there we go. The first value is 0.0, which corresponds to our summary output up here. Now, here's one that isn't shown in the summary output, but I'll show you how it would be done: print 'middle value in data', 30 comma 20, which is 13.
If this is a little complex at the moment, that's okay; let me give you a visualization of what this is doing. When we access zero zero, we'd be accessing capital A here, but if we were to access zero two, we'd be at C. I hope this helps clarify what I'm talking about when I say we input an index. So when we go to 30, 20, we're going way down over here to row 30 and to position 20 somewhere over here.
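A small sketch of the shape and indexing steps above. The array here is a made-up stand-in with the same 60-by-40 shape as the lesson data, so the printed values differ from those in the webinar.

```python
import numpy

# Made-up stand-in: 60 patients (rows) by 40 days (columns).
data = numpy.arange(60 * 40, dtype=float).reshape(60, 40)

# shape is (number of rows, number of columns).
print(data.shape)

# Square brackets take the row index first, then the column index,
# and both start counting at zero.
print("first value in data:", data[0, 0])
print("middle value in data:", data[30, 20])
```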
The next way we're going to want to access this data is by taking slices of indices. This is really helpful if you want to work with a particular day or a subset of patients. Let's say we want to get the first three patients and their first 10 days' worth of data. So print data, brackets, zero colon four, because Python starts counting at zero and goes up to, but not including, four. So this is the first three patients, or four patients, excuse me, and then zero colon 10. And there's our output. If we wanted to get, say, the patients at indices five up to 10 for those same first days, that would be print data, five colon 10, and zero colon 10 like before.
You can also take these slices and assign them to a variable. Let's create a small subset this time: the first three patients on the last four days. I'll show you a shorthand in the process. small will be the name of our variable: data, colon three. If you do not specify a number before the colon, it will assume you start from the beginning. Likewise, 36 colon, and leave the rest blank; it will assume you continue to the end. So if you wanted all the data for a particular day, it would look like that, or every day's data for one particular patient would look like that; that would be everything for patient three. Anyway, I digress. We're creating a small variable for the first three people on the last four days. Now let's create a nice print statement for that: print 'small is', small. There we have it.
Now let's start working with some of the mathematical analysis within this array. First, let's just get the average across all the values in our data set. Print numpy.mean of our data: 6.14 and change. Now let's do something fun and assign multiple variables at the same time so that we can look at some of these other statistics. maxval comma minval comma stdval equals numpy.max of data, numpy.min of data, numpy.std of data, respectively. Oops, without that dot, we're not accessing it correctly and we'd have a ton of errors. Now let's make some print statements for each of these. Print 'max inflammation', maxval. Print 'min inflammation', minval. Print 'standard deviation', stdval. And there we have all our descriptive statistics.
Now let's start working with some statistics on slices. Let's say we want to work with patient zero, like I was discussing earlier. patient_0 equals data, and it'll be this individual, row zero, and then every day of theirs. Then let's find out what their maximum inflammation is: print 'max inflammation for patient zero', numpy.max of patient_0, which is 18. Excellent. We don't actually have to use the variable patient_0; we could put the slice directly into the print statement. So if we wanted a one-off check of the max value for patient two: print 'max inflammation patient two', and then numpy.max with data, two comma everything, as the argument. Excellent: 19, which is the output we are expecting.
As we see here, this is working across one particular patient. What if we wanted to get the max for every single individual patient? What we do then is define the axis for our array. When we use axis one, we're going across the x axis; when we use zero, we're going across y. This graphical representation should help with what I'm trying to describe. When we set the axis to one, we're getting data per patient; if we set the axis to zero, we're getting data per day. Okay, so let's get back over here and get the average inflammation across days. Print the numpy.mean function of data, and let's use axis zero, so that is per day. That's pretty much what we expect: we see 40 values or so. To double-check that, we can get the shape of this.
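The slicing and per-axis statistics described above can be sketched like this. The data is randomly generated with the same 60-by-40 shape as the lesson file, so the numbers will not match the 6.14, 18, and 19 seen in the webinar; the names per_day and per_patient are illustrative.

```python
import numpy

# Made-up stand-in for the lesson data: 60 patients (rows) by 40 days (columns).
rng = numpy.random.default_rng(seed=0)
data = rng.integers(0, 21, size=(60, 40)).astype(float)

# A slice start:stop includes start and excludes stop: rows 0-3, columns 0-9.
print(data[0:4, 0:10])

# Omitting a bound means "from the beginning" or "to the end":
# first three patients, last four days.
small = data[:3, 36:]
print("small is:")
print(small)

# Statistics over the whole array, assigned in one statement.
print(numpy.mean(data))
maxval, minval, stdval = numpy.max(data), numpy.min(data), numpy.std(data)
print("max inflammation:", maxval)
print("min inflammation:", minval)
print("standard deviation:", stdval)

# Statistics on a slice: every day for one patient.
patient_0 = data[0, :]
print("max inflammation for patient 0:", numpy.max(patient_0))
print("max inflammation for patient 2:", numpy.max(data[2, :]))

# axis=0 collapses the rows (one value per day);
# axis=1 collapses the columns (one value per patient).
per_day = numpy.mean(data, axis=0)
per_patient = numpy.mean(data, axis=1)
print(per_day.shape)
print(per_patient.shape)
```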
So let's put in that same command and then add shape. As we can see, it's (40,), which is correct: we have 40 values, and there's no other dimension to it. That's exactly what we expect. Meanwhile, if we do the same thing on the other axis, we should get 60 values: numpy.mean, axis equals one. There we have it.
So this has been well and good. We've been able to describe our data set and do some descriptive statistics on it. The next section is going to be the visualization of this data, and I'm going to keep building on what we already have here, the same data set and the same patient data. To do visualization, we have to import a new library: import matplotlib.pyplot. Now, showing a visualization takes two things. First you create what you're going to show, and then you show it in a separate line. So let's create the variable image, and we'll use the built-in imshow function: matplotlib.pyplot.imshow, and we're going to show data. Oops, that's actually not going to work quite right, because I didn't use the second command like I said I would: matplotlib.pyplot.show. That's going to be the end of every one of our visualization statements. And here we have it: a heat map that shows the values. What we're seeing here is that the central days seem to have higher values than the beginning and ending days, and it's the same across all 60 of our patients. So let's use some of our other statistics to verify this trend.
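A minimal sketch of the two-step heat map described above: create the image, then show it. The data is a random stand-in, so it will not reproduce the pattern seen in the webinar, and the Agg backend line is an addition that lets the sketch run outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display; omit in a notebook
import matplotlib.pyplot
import numpy

# Made-up stand-in for the lesson data: 60 patients (rows) by 40 days (columns).
rng = numpy.random.default_rng(seed=0)
data = rng.integers(0, 21, size=(60, 40)).astype(float)

# Step 1: create the heat map of the whole array.
image = matplotlib.pyplot.imshow(data)

# Step 2: display it; in a notebook this is what makes the figure appear.
matplotlib.pyplot.show()
```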
First let's start with the average. So average_inflammation equals numpy.mean of data, and we'll continue to use axis zero. Now let's plot this: matplotlib.pyplot.plot, I know there's a lot of 'plot', of average_inflammation, and then matplotlib.pyplot.show. We have the average data, which maintains the same shape we saw in the heat map. Now let's confirm it with the max and min analysis. matplotlib.pyplot.plot, and unlike what we've been doing before, instead of creating an extra variable, we're putting the calculation directly into the call: numpy.max of data, axis equals zero. Now let's take a look at it: show. Again we're seeing the same trend; it's even more pronounced than when we were doing the averages. So let's take a look at the min: the matplotlib library, the pyplot sublibrary, the plot function; in NumPy, use the min function on our data, and of course make sure we're still using axis zero for days. matplotlib.pyplot.show. Still pronounced, not quite as pronounced as the max, but it's still a very interesting data set for patient data. It's because of our ability to visualize that we were able to see this right away, before having to do more analysis. These are things you want to do when you first start working with data, just to get a general shape and feel for how it's going to work in your program. Basic visualizations like these are always a great help when starting out with a fresh data set and trying to figure out what needs to be massaged to work with your programs and what does not.
So this has all been well and good, but we've only been doing one plot at a time. So let's start from the beginning and plot three at once. I'll start a new notebook for this, and we'll start from the top. First let's import the numpy library, and now let's import the matplotlib.pyplot library again. I'm going to show you a shortcut: you can create aliases for these libraries, so instead of having to type that every time, I'm just going to say 'as plt' for shorthand. Okay, now let's load in our data like we did before: numpy.loadtxt with the same file name and the same delimiter. Now let's create a figure, plt.figure, and let's define its size: figsize equals 10 inches by three inches.
All right, now let's tell Python what to put inside this figure. axes1 is going to be fig.add_subplot, and this takes three parameters. The first is how many rows there are going to be; it's all going to be in one row. The second is how many items are going to be in the row; there are going to be three. The third is the particular location of this plot within the figure; this one's going to be in position one. We're going to be doing three plots, so I'm going to copy this and paste it and make this axes2; it's also going to be on the same row, but in position two. Then let's copy it one more time for the third axes, and this one is going to be in position three.
Okay, we're getting there. Now let's put a label on, and actually plot, each one of these. So axes1, forgive me, axes1.set_ylabel, and here we put in a string, 'average'. Now we'll feed it some data: axes1.plot, just like in the previous example when we were doing one at a time; this time it will be numpy.mean of data, and we are still working with axis zero. I'm going to copy and paste these for axes2, since there's so little that needs to be changed. First we'll change the variable to match up here, the label will be 'max', and then we'll change the statistical function from mean to max. Now we'll do it one more time for the minimum, for axes3. All right, we're getting there. There are a couple more things we want to do before we run everything that
we've shown. We'll give a layout for the figure so that it knows not to make things overly wide and fits everything inside the dimensions we gave it, which is the 10 inches by three inches: fig.tight_layout. Another very useful thing is the ability to save the figures that we make, so let's do that as well. plt, that's the shorthand for our library, dot savefig, and this will save in the same working directory that we read the inflammation data from. Let's give it a name, inflammation.png. And now for our last step, plt.show. Hopefully I don't have any typos and this will run first go. And there we have it: all three plots that we did in our previous step, all on one line. And if we go to our desktop, we can find that inflammation.png in the working directory. That's all I have at the moment. Are there any questions? I seem to have finished up a little bit early.
Okay. Well, while we're waiting for questions, at least for myself, it would be interesting to know: could you explain some of the applications of Python? Specifically, what can you do with Python?
Python is a great general-purpose language. You can use it for statistical analysis, like R or MATLAB; that's what I've been displaying here today. But you can also use it in server administration and building websites, which is usually how I use Python. I import libraries like os, for instance, and I can use it to script out and build entire websites or things like that. But I know our audience is more researchers: you can use it to work with biology data or poli-sci data. I remember when I was an undergraduate I would use Python for working with power-law distributions and things like that, and more advanced statistical analysis. You can also use it for big data computation.
Thank you. I actually do have another question: what could you tell me about this Jupyter notebook?
Sure. Jupyter Notebook is a standalone program; it's more of a teaching aid than strictly a production tool. This was launched from the Anaconda package, which is linked in the setup file we discussed at the beginning. It's open source and it runs on Mac, Windows, and Linux, any distribution you want. It's really cool: it creates a browser front end for Python that's running in the background. When you start it up from Anaconda, you'll probably notice your terminal, or command line if you're on Windows, popping up and throwing out a lot of output, and that's because it's actually starting a server on your computer for you to open up in your browser.
Yeah, that's pretty cool. Since a lot of people may refer to this after the fact, before you end, could you give information about where to get these files and such?
Yes. These files are on the Software Carpentry website; that is where they come from. This lesson plan came from the first three lessons of the Software Carpentry lesson on teaching Python.
We actually do have a question from the audience, from Brad Maynard: could you explain the axis function again?
Sure. When we look over here, the axis is just describing which way you're going to be looking at the data. If you don't specify an axis for mean, you're getting the mean of all the data, crosswise across the rows and up and down through the days, which isn't that descriptive a statistic. So if you set it to axis one, you can get the average per patient, or if you set it to axis zero, you can get the average per day.
Awesome. It looks like we have another question. Okay, I'm just going to take it. David wants to know, can you describe, I think it meant, what is in the other files and folders that we downloaded and unzipped?
If you go into the rest of the lesson plan, this is only the first three parts of teaching Python, and I'd only practiced and prepared for these first three lessons. But as you can see, there are many more sections, and those other files that you downloaded come into play when working through those exercises. I strongly encourage you, if you're curious, to continue to work with them if you want to learn more. There are a lot of very useful tools here for computing with data for researchers. It's not designed with web programmers in mind; this is really for researchers and how to leverage Python for their purposes.
David, I hope that answers your question, and if it doesn't, let us know and Dustin would be happy to answer further. It does? Cool, great. All right, everyone, I'm going to give you maybe two more minutes to type your questions into the Q&A box. This is the time. Dustin knows what he's doing; if you have any Python questions, ask him, because he knows. Hearing none. Okay. Yes, I am going to bring up my screen again. Okay. Y'all have been the most silent audience, but they're all rapt, everybody. We had great attendance for this, so thanks, everybody, for joining us today for Python Fundamentals with Data Analysis and Visualization with Dustin Allen. Just a reminder: we've got our next webinar coming up on Wednesday, September 22nd, and that will be on leveraging energy storage resources to improve combined cycle power plant operational efficiency. I want to do one last shout-out to Dustin, let's see if it'll come up. Thank you, Dustin, for doing this webinar.
Thank you, Brittany, for having me. Thanks, Isis, for running things in the background.
And we will see you next month, everybody, for the webinar on the 22nd. We hope to see you there. Bye. Thank you. Bye-bye.
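For reference, the three-panel figure built at the end of the lesson, together with the axis behavior raised in the Q&A, can be sketched as below. The data is a random stand-in for the inflammation file, and the Agg backend line is an addition so the sketch runs outside a notebook; in the webinar, data comes from numpy.loadtxt with the same file name and delimiter as before.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs without a display; omit in a notebook
import matplotlib.pyplot as plt  # plt is the alias used in the webinar
import numpy

# Made-up stand-in for the lesson data: 60 patients (rows) by 40 days (columns).
rng = numpy.random.default_rng(seed=0)
data = rng.integers(0, 21, size=(60, 40)).astype(float)

# One figure, 10 inches wide by 3 inches tall.
fig = plt.figure(figsize=(10.0, 3.0))

# add_subplot(rows, columns, position): one row, three columns, positions 1 to 3.
axes1 = fig.add_subplot(1, 3, 1)
axes2 = fig.add_subplot(1, 3, 2)
axes3 = fig.add_subplot(1, 3, 3)

# axis=0 gives one value per day (40 points per line).
axes1.set_ylabel("average")
axes1.plot(numpy.mean(data, axis=0))

axes2.set_ylabel("max")
axes2.plot(numpy.max(data, axis=0))

axes3.set_ylabel("min")
axes3.plot(numpy.min(data, axis=0))

# Fit the three panels inside the figure size, save to the working directory, then show.
fig.tight_layout()
plt.savefig("inflammation.png")
plt.show()
```

Swapping axis=0 for axis=1 in the three plot calls would draw one point per patient (60 points) instead of one per day.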