So I teach at a small college, and what Aaron talked about is teaching people who come in already interested in using these kinds of tools. I try to sneak up on students and teach them the tools before they know they want those tools. So that's a slightly different approach. How many of you in the room know what Jupyter Notebooks are? I've used them. Okay. Pretty much everybody. Awesome. So that's good. But the thing I really want to spend some time on is not what they are, but how to sneak up on students who would probably avoid your class if they knew you were going to try to teach them to code. They would run somewhere else and not want to be in that class anymore. So I want to talk a little bit about why I think it's important that we use something like Jupyter Notebooks instead of other tools. Before I do that, though, I want to do acknowledgments. People usually do them at the end; I always like to do them at the front. The Jupyter Project is pretty amazing. They do this for free. Right? I mean, that's pretty incredible: it's free to those of us who use it. My colleagues Helen Hu and Sean Raleigh at Westminster: everything I'm talking about is a "we," not an "I." And I'm going to show you some work by some of my students, with their permission. Data Carpentry, I think, is a great resource. A lot of what I'm going to talk about, Erin talked about; I'm just taking it down a level. Also, Tricia Shepard developed a lot of the materials that I put in Jupyter Notebooks. And I've been involved in the TIDES project, which is through the AAC&U. TIDES stands for Teaching to Increase Diversity and Equity in STEM. I'll talk a little bit about how I think what we do as data science people, and what we're doing at an undergraduate institution, actually does address some issues of equity. So why bother with this stuff? What's the issue that we have?
So I was trained in a very classic genetics lab. I did bench work. I've fiddled with computers for a long time, but as for the future, this slide is from a 2017 report. You guys know this, right? But my students don't know that they're going to use a computer for work no matter what. A lot of them who are in science think, well, I'm just going to go work in a physics lab; I don't need a computer. And that's kind of funny to some of my colleagues, who say, well, we used computers in physics in the '70s. But students come in, especially biologists and neuroscientists, not thinking of computers as a big part of what they're doing. So part of the job is not saying "you're required to take a coding class," but "here are some coding skills. And by the way, don't you like this? Isn't it kind of neat? Look at what you can do." And they don't even know they like it until it's too late, and then you've got them hooked. And by the way, they need these skills to get a good job. This came out in Nature in 2017. Again, it's about things that Data Carpentry is addressing, where you're taking people who say to themselves, well, why do I need this coding stuff? I'm in a PhD program, or I'm in science; why do I need to do this? But we can bring them in and give them the skills they need. I also end up teaching a fair bit of reproducible science. This is not something that's in the undergraduate curriculum. Maybe you know what it looks like at the undergraduate level, and maybe you don't, but nobody really teaches it to undergrads. We haven't done it historically, and we really need to. They're entering a world where they can't keep a lab notebook on paper. They're generating huge amounts of data as undergraduates, never mind what they'll have to deal with on their own once they're done.
And you have no idea how many files get submitted to me for assignments with names way worse than the ones that came up earlier: final, final, final number four. Or it's the exact same file name you gave them for the assignment, and it comes back to you, and you don't even know what it is or whose it is, or it overwrites another file. Just teaching students how to keep track of their data and not mess it all up actually takes a lot of time. So that's a big issue for a lot of them, and one we try to address. Thinking about equity: computer science has a problem, right? It's in the news all the time that tech and computer science have some pretty big inequities in the workplace. It's true in academia, and it's true in tech. If you look at the numbers, 57% of professional occupations are held by women, but only 20% of CIOs are women. So how do we engage students in a different way? The traditional culture of who learns to code has not attracted everyone. But what if, instead of being about the computer, it's about the domain? (As I found out, everybody but me calls it that; I came from the domain, so that's how I see it.) If we can teach students that science has these tools and needs these tools, then before they know it, they're actually into computer science or data science. I've been teaching a lot of first-semester students who come to college with no interest in CS at all. If you ask them whether they want to take a computer science class, they say no. The idea is that we could reach the people coming in the door interested in science: biology and neuroscience are at least 50% women, compared to fields like computer science and sometimes engineering. So again, it's about sneaking up on people and saying, well, you don't think this is what you're interested in, but we've got to make a bunch of graphs.
I teach a lot of classes where we have to make graphs because we have data, and a lot of it is actually gathered by the students. What tool are we going to use? Well, we could use Excel, but have you made a histogram in Excel? It's kind of a pain, right? We've used Google Sheets, but these tools have problems. So I'll show you some of the template notebooks we've set up. We hand students these templates that have data in them and say, here's how you make a histogram; now put your data in it and see how that goes. Here's some more data: 57% of bachelor's degree recipients are women, but only 18% of computer science bachelor's degree recipients are. This is an issue that needs to be addressed, and I think one way to address it is to put the domain first and then say, oh, and by the way, you need these skills, and here they are, presented in a different way. There's also actual scientific evidence that diverse teams working together produce better ideas; this is an active area of research right now. So this is something else we try to support. We do a lot of team-based learning. I'm going to introduce you to a paradigm called POGIL, which stands for Process Oriented Guided Inquiry Learning. Has anyone heard of pair programming? Yeah, okay, a bunch of people. Pair programming is something we do as well: you pair students up, and one person drives while the other navigates. But we also have teams, and the activities are scaffolded. The question came up earlier: how do you teach people how to learn? One way we do that is with what's called scaffolding, where you don't just throw people a CSV file and say, figure it out, analyze this data.
Instead, you have a whole activity set up, in our case in a Jupyter notebook, where you walk them through the steps piece by piece. You have them write code or run code, modify code, and look at the results, and then they write answers to questions about how that went: What were the tricky parts? What did this command do? What did that command do? What were the arguments? You walk them through the thinking process inside a Jupyter notebook, and I'll show you some examples of that. But again, working in teams, and in diverse teams, is something that's really worth addressing, and I think doing it in the classroom in this kind of sneaky way actually works pretty well. It's funny: I think of Data and Software Carpentry as being for graduate students and postdocs, and here I'm talking about undergraduates, and there's a whole movement to push this even further down the line, to K through 12 students. The team-based learning we're doing certainly has applications in high school. Okay, this is probably not something I need to talk much about. So why would we do this? I hate bullet-point slides, but here's one. Python and R are the two big languages. We use both at my institution: some people use R, some use Python, and we can do the same activities in both, in two different Jupyter notebooks. I'm also trying to convert the MATLAB holdouts to Python or R, but for the few who won't switch, there's at least Octave. And I've been experimenting a lot with SageMathCloud. I don't know if you know SageMathCloud. It's a free online cloud tool, and it's pretty awesome; you can pay to upgrade it so it runs a little faster. I'll show you what it looks like. We use Anaconda for a lot of the work in Python.
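A scaffolded template cell of the kind described here might look like the sketch below. It is only an illustration, assuming Python with matplotlib; the `make_histogram` helper, the example heights, and the group labels are hypothetical stand-ins for whatever data a class actually collects. Students run it as-is, swap in their own data, then change `bins` and try the stacked version.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; in a notebook the figure would simply render inline
import matplotlib.pyplot as plt

def make_histogram(values, bins=10, title="My data"):
    """Draw a histogram of `values` and return the bin counts and bin edges."""
    fig, ax = plt.subplots()
    counts, edges, _patches = ax.hist(values, bins=bins, edgecolor="black")
    ax.set_xlabel("Value")
    ax.set_ylabel("Count")
    ax.set_title(title)
    return counts, edges

# Step 1: run this cell as-is with the example data (hypothetical heights in cm).
example_heights = [150, 155, 160, 162, 165, 168, 170, 171, 175, 180, 185]
counts, edges = make_histogram(example_heights, bins=4, title="Example heights")

# Step 2: replace example_heights with your own measurements and rerun.
# Step 3: try bins=2 and bins=20. What changes? Which is more informative?

# Step 4: a stacked histogram of two groups (hypothetical group labels).
group_a = [150, 155, 160, 165, 170]
group_b = [162, 168, 171, 175, 180, 185]
fig2, ax2 = plt.subplots()
stacked_counts, stacked_edges, _ = ax2.hist(
    [group_a, group_b], bins=4, stacked=True,
    label=["Group A", "Group B"], edgecolor="black",
)
ax2.legend()
```

The numbered comments mirror the scaffolding idea: run first, then modify, then answer questions about what changed.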
So here's the idea of scaffolding: students don't have to write the code from scratch. They can run the code, copy it, modify it, and do whatever they need to do with it. That's something we've done a lot of. Markdown is amazing. Markdown's the best. In the same notebook, students can write Markdown, format equations in LaTeX, and do whatever else they need to do. And it's really simple to learn. It's kind of amazing how fast students pick up on how Markdown works. I barely even have to show them, and they're off and running, and almost no one complains. I thought most students were going to freak out a little about Markdown because it looks like code, right? And it's not. But since everybody raised their hand about notebooks, I think you all know most of the advantages. So, the POGIL project, this team-based active learning approach. This is the longest lecture I've given this semester so far. I never talk in class like this. We might set something up and just say, hey everybody, this is what we're doing today and this is why it's important. Today we're going to look at numerical methods for approximating things, because computers can't do integrals the way you do them by hand, and here's why that's worthwhile. One of the classes I teach is on scientific computing in Python. It has everybody from biologists and chemists to physicists, neuroscientists, and math majors. Some of them have a little coding experience, some have none; it's all over the map. So we try to pitch these mathematical and computational ideas to all these different kinds of students, get them into the notebook, and get them going. The POGIL framework is actually pretty cool. There is a CS-specific POGIL, if anybody here is in computer science. It came out of chemistry originally, but it has spread to biology too.
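A numerical-methods lesson like the one mentioned above often starts with the trapezoid rule. The sketch below is one possible version in plain Python. The function name `trapezoid` and the test integrand are illustrative choices, not taken from the course materials. It approximates an integral whose exact value is known, so students can watch the error shrink as `n` grows.

```python
import math

def trapezoid(f, a, b, n):
    """Approximate the integral of f from a to b using n equal-width trapezoids."""
    if n < 1:
        raise ValueError("n must be at least 1")
    h = (b - a) / n
    # The two endpoints get weight 1/2; every interior point gets weight 1.
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return h * total

exact = 2.0  # the integral of sin(x) from 0 to pi is exactly 2
for n in (4, 16, 64):
    approx = trapezoid(math.sin, 0.0, math.pi, n)
    print(f"n={n:3d}  approx={approx:.6f}  error={abs(approx - exact):.2e}")
```

A natural follow-up question for students: by roughly what factor does the error drop each time `n` is multiplied by 4?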
There are some biology activities; some are free, and some, unfortunately, are not. The development cycle is pretty interesting too. Guided inquiry works on a learning cycle, which is part of this scaffolding idea. The first thing you do is explore something. You present the students with a table, some equations, or some background: today we're going to look at DNA and how DNA is made into RNA, which is made into a protein. That might be background information on the biology, or on something you'll do in code. Say you want to work with a 2D array; then you might need to introduce NumPy and how to make a 2D array. Then you have them go through and write some of the code, manipulate some of the code, and see how it works. You don't tell them how it works; you let them see how it works. You let them break it and get errors, like Aaron was talking about. What does that error message mean? What did it say? What does it tell you? Gee, you forgot an argument. Okay, what does that argument do? Why do you need it? So they invent their own idea of how this stuff works. And then you get them to apply it. So it moves them from something very easy, something they just kick around a little, to something where they have to figure out how it's put together. And then you ask them to do it themselves: you hand them something and say, okay, now take this DNA sequence, make it into RNA, and then make a protein out of it, all on your computer. I'll show you a few examples of that too. I'm hoping to end a little early and show you a bunch of examples, both things students have done and things we've prepared for them. But then I'd be interested if you have questions, and especially if you have ideas, things you're working on, or things you've used Jupyter notebooks for. I think it would be neat to have a little discussion about that.
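The "apply" step described above (DNA to RNA to protein) could be scaffolded with something like this sketch. To keep it short, it includes only the handful of codons the example sequence uses; a real activity would give students the full 64-codon table. The function names and the sequence are hypothetical. Transcription is simplified to replacing T with U on the coding strand.

```python
# Partial codon table (RNA codon -> one-letter amino acid code); '*' marks a stop codon.
CODON_TABLE = {
    "AUG": "M",  # methionine (start)
    "UUU": "F",  # phenylalanine
    "GGC": "G",  # glycine
    "AAA": "K",  # lysine
    "UGG": "W",  # tryptophan
    "UAA": "*", "UAG": "*", "UGA": "*",  # stop codons
}

def transcribe(dna):
    """Return the mRNA sequence for a coding-strand DNA sequence (T -> U)."""
    return dna.upper().replace("T", "U")

def translate(rna):
    """Translate mRNA starting at the first AUG, stopping at a stop codon or the end."""
    start = rna.find("AUG")
    if start == -1:
        return ""  # no start codon, so no protein
    protein = []
    for i in range(start, len(rna) - 2, 3):  # step through complete codons only
        amino_acid = CODON_TABLE[rna[i:i + 3]]  # KeyError if the codon is missing from this partial table
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)

dna = "CCATGTTTGGCAAATGGTAAGC"
rna = transcribe(dna)
print(rna)             # CCAUGUUUGGCAAAUGGUAAGC
print(translate(rna))  # MFGKW
```

A good discussion question: what happens, and what error message appears, when students feed in a sequence containing a codon that isn't in the table yet?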
I don't know if you can even see that. All right, I have some of them brought up as notebooks as well. As a neuroscientist, one of the things I wanted to bring to the students is modeling the Nernst potential. This is why your cells have any electrical activity at all. So there's a little bit of explanation and some math, and it's pretty simple for students to do the math here. One of the things I love best about these notebooks is that you can just lay it all out. I just wrote it out, formatted the equations, and then we can give them either values or test cases. Probably the biggest struggle for a lot of students in some of these classes is running test cases of their own. We give them some test cases for their code, and then they're like, okay, what now? So we train them to think about exploring a parameter. How far should it go? Should it really be 10,000? Does 10,000 even make sense? Probably not. What's the normal range of that ion concentration, for example, or that velocity, or whatever it is? So we get students to think about running test cases. This is the same thing; you've all done enough with this that you'd know it. This is the Markdown version. And students are really savvy. They see the nice rendered version, and what do they do? They click on it, and it drops into Markdown. Most of them go, oh, okay, I can work with that, and they figure out how it works. We gave them very little training on how Markdown actually works, and mostly they just figured it out and did it, which is pretty cool. Then we give them some programming tasks. That's the model in this POGIL approach: we ask them to start exploring what this does and how it works, program the math, and then we give them a task. We did some object-oriented programming, and we wanted them to get practice making a class, so they had to make a class called Neuron.
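The Nernst potential and the Neuron class exercise might be sketched as follows. The equation is the standard Nernst equation, $E = \frac{RT}{zF}\ln\frac{[\text{ion}]_{\text{out}}}{[\text{ion}]_{\text{in}}}$. The class layout, its attribute names, and the concentration values are hypothetical choices made for illustration, not the course's actual assignment. The concentrations are approximate textbook-style mammalian numbers, and the temperature defaults to body temperature (310.15 K, 37 °C).

```python
import math

R = 8.314    # gas constant, J/(mol*K)
F = 96485.0  # Faraday constant, C/mol

def nernst_potential(z, conc_out, conc_in, temp_k=310.15):
    """Return E = (R*T)/(z*F) * ln(conc_out / conc_in), converted to millivolts."""
    if conc_out <= 0 or conc_in <= 0:
        raise ValueError("concentrations must be positive")
    return 1000.0 * (R * temp_k) / (z * F) * math.log(conc_out / conc_in)

class Neuron:
    """Hypothetical exercise class: one kind of cell, described by its ion concentrations."""

    def __init__(self, name, ions, temp_k=310.15):
        # ions maps an ion name to (valence z, outside mM, inside mM)
        self.name = name
        self.ions = dict(ions)
        self.temp_k = temp_k

    def equilibrium_potential(self, ion):
        """Nernst potential (mV) for one ion in this cell."""
        z, conc_out, conc_in = self.ions[ion]
        return nernst_potential(z, conc_out, conc_in, self.temp_k)

# Approximate textbook-style mammalian concentrations (mM), used only as a test case.
cell = Neuron("mammalian", {"K": (1, 5.0, 140.0), "Na": (1, 145.0, 12.0)})
for ion in ("K", "Na"):
    print(f"E_{ion} = {cell.equilibrium_potential(ion):.1f} mV")

# Exploring a parameter: sweep outside K+ over a plausible range (not, say, 10,000 mM).
for k_out in (2.5, 5.0, 10.0, 20.0):
    print(f"[K+]out = {k_out:5.1f} mM -> E_K = {nernst_potential(1, k_out, 140.0):.1f} mV")
```

Constructing a second `Neuron` with different values for "another kind of cell" is a quick way to check whether the class generalizes.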
And we're working on constructor methods. Then we gave them values to plug in at the top for one kind of cell, and then said: now let's see if your class works with these other values. Moving from the scientific computing context, this is an activity I do with first-semester students who literally come in and may not know how to make a graph, any graph. I give them data and just say, okay, make me a graph, with no instructions. And it produces really interesting results; they're all over the place. Some people make really nice graphs. Some people make things I can't even understand. And then we have an almost semester-long discussion (this is in a genetics class for non-science majors) about how you present data. What do you do with it? Why are some ways better than others for presenting certain types of data? So this would again be the model that students look at. I actually often have them sketch graphs on paper, because that's a good way to build intuition for what a graph is even going to look like. Then they have to think about these different types of graphs. Then I have them load some data, which in this case is genetic data: it has a genotype, some numerical variables, and some categorical variables of different types. And you work through it in R. This is on the high end; it's a little more advanced than what I'd expect a first-semester college student to be able to do. Some of the code I provide and just let them run, so that they get a sense of what it does. And they'll often ask questions. What did this step do? Why is that step there? Why was it necessary? And then they make histograms. It's actually kind of amazing how hard it is to make histograms.
But in something like R with ggplot, or Python with matplotlib, it's actually pretty easy. Then we can go on and make stacked histograms. It's pretty amazing how hard it is to make a stacked histogram in anything but R or Python. I gave up on Excel; I got too frustrated. We used Google Sheets a semester ago, but you can't make a stacked histogram or a box plot in Google Sheets. So again, they can go through, pick this stuff apart, and make all these different representations of the data, starting to think about what the distribution looks like. You can do almost anything you want in this. This one gets kind of advanced by the end, making different kinds of bar charts, tables, and so on. But again, this is running in SageMathCloud, which is free. It actually has a structure where you can set up classes: you can get your whole class in there, and they can submit assignments and you can evaluate them. I did a Python activity in my colleague Sean Raleigh's data science class, where his entire class was in SageMathCloud, and he graded their assignments and everything there. So if you're interested in that, it works pretty well. As for other examples, this one kind of broke. At the end of the scientific computing class, one thing we did, again to get students to connect what they do in science to computing, was a project. These are juniors and seniors from all different areas of science, and we had them write a report. They had to either make a simulation, build a model, or gather, clean, and visualize a data set. This is a student who was in environmental science. She's very interested in climate data. They had to write an introduction and justify everything. This was not a template; it was just an assignment.
Make a notebook that does this, and they didn't have anything to start from. They had a research question, and hers was about temperature. She did all of this in Python: getting the data set from NOAA, cleaning it, and then mapping it, which was her goal. We'd done nothing with matplotlib and geographical data, literally nothing. She gathered all this data and analyzed it, and then found Basemap, which I had never used because I haven't done much with geographical data. She made this map using latitude, longitude, and the temperature data from this data set. And this is a student who had no programming experience at the beginning of the semester; they presented these projects on Monday. So when students are given the opportunity to figure out how to connect to this, they can produce some pretty amazing things. This next student was also taking a GIS class, so she bounced her data back and forth between GIS and Python, and did the data gathering from the census data in GIS. Hers is incredibly long; she's very thorough. Her hypothesis was that the distance you live from a park would depend on the income in your census tract. Her idea was that perhaps in Salt Lake City there were neighborhoods that were underserved as far as parks go, versus neighborhoods with an equitable or good amount of park area. So she had to pull in a lot of census data from GIS. This is all code she wrote entirely on her own. She wrote multiple methods for all these classes, and it's kind of amazing; I'm not sure I understand all of it. In the end, the best thing she did was bounce back and forth: she graphed a bunch of her data, then put it back into GIS and used GIS to draw the maps.
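The first student's workflow (gather station records, clean them, then map temperature by latitude and longitude) can be sketched roughly as follows. This is a hypothetical simplification: the station records are made-up placeholders, not NOAA data. It also uses a plain matplotlib scatter on longitude/latitude axes instead of the Basemap toolkit she found, so it draws no coastlines or map projection.

```python
import math
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; a notebook would show the figure inline
import matplotlib.pyplot as plt

def clean_records(records):
    """Drop records with missing or impossible values; each record is (lat, lon, temp_c)."""
    cleaned = []
    for lat, lon, temp_c in records:
        if lat is None or lon is None or temp_c is None:
            continue  # a field is missing entirely
        if any(math.isnan(v) for v in (lat, lon, temp_c)):
            continue  # NaN used as a missing-value placeholder
        if not (-90 <= lat <= 90 and -180 <= lon <= 180):
            continue  # coordinates outside the valid range
        cleaned.append((lat, lon, temp_c))
    return cleaned

def plot_temperature_map(records):
    """Scatter cleaned stations on longitude/latitude axes, colored by temperature."""
    cleaned = clean_records(records)
    lats = [r[0] for r in cleaned]
    lons = [r[1] for r in cleaned]
    temps = [r[2] for r in cleaned]
    fig, ax = plt.subplots()
    points = ax.scatter(lons, lats, c=temps, cmap="coolwarm")
    fig.colorbar(points, ax=ax, label="Temperature (°C)")
    ax.set_xlabel("Longitude")
    ax.set_ylabel("Latitude")
    return fig, points

# Hypothetical placeholder records (lat, lon, temp °C), not real NOAA values.
stations = [
    (40.76, -111.89, 12.5),
    (41.22, -111.97, 11.8),
    (37.10, -113.58, 18.3),
    (None, -112.00, 10.0),           # missing latitude
    (39.50, -111.50, float("nan")),  # missing temperature
    (95.00, -111.00, 9.0),           # impossible latitude
]
fig, points = plot_temperature_map(stations)
```

Keeping the cleaning step as its own function makes it easy to test separately from the plotting.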
And again, this is a student who the other day asked me, so tell me about data science, what is that? I'm thinking about jobs in data science, doing something with this kind of geographical data. She would never have said that to me a semester before; she wouldn't have known what data science was. So, pretty cool, and incredibly thorough. The last project example I'll show you is from a student who worked on cell biology and used scikit-image, which again we never covered in class, to ask how many outgrowths there are from cells. These are stem cells that are making neurons: she was differentiating neurons from stem cells, and she wanted to know, can I count the number of outgrowths? She had done a biological set of experiments and taken these images, and she worked with a bunch of them. This stem cell is making neurites all over the place; it shouldn't do that. But she wanted to quantify how many outgrowths there are. So she processed the images and wrote a function that would put a blue dot on each of the outgrowths and count them. She did that with a bunch of cells whose images she had flattened and cleaned up a bit. Then she threw some really dirty data at it, and it didn't do very well. And that's fine. Overall, it did reasonably well. So we had this whole conversation about automating things and what you do with these files. How do you keep track of the images before they're processed and after they're processed? All these kinds of skills that she probably never would have dealt with any other way. And again, my favorite part of this is that it's not just code, and it's not just graphs. There's actual text: they think about what they're going to do with this, and what it means scientifically.
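The core counting idea can be illustrated with a much simpler stand-in. The student used scikit-image. The sketch below instead uses plain Python to threshold a tiny made-up intensity grid and count connected bright regions (4-connectivity) with a breadth-first flood fill. That is the basic idea behind many blob counters, but it is not her actual method. Real micrographs need the kind of cleanup she did first, and noisy images will break a naive threshold like this one.

```python
from collections import deque

def count_regions(image, threshold):
    """Count 4-connected regions of pixels whose intensity is >= threshold."""
    if not image:
        return 0
    rows, cols = len(image), len(image[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] >= threshold and not seen[r][c]:
                regions += 1
                # Breadth-first flood fill marks every pixel belonging to this region.
                queue = deque([(r, c)])
                seen[r][c] = True
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and not seen[ny][nx] and image[ny][nx] >= threshold):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
    return regions

# A tiny made-up "image": higher numbers are brighter pixels.
toy = [
    [0, 9, 0, 0, 0],
    [0, 8, 0, 0, 7],
    [0, 0, 0, 0, 8],
    [6, 0, 0, 0, 0],
    [7, 0, 9, 9, 0],
]
print(count_regions(toy, threshold=5))  # 4
print(count_regions(toy, threshold=8))  # 3
```

The two calls show how sensitive the count is to the threshold, which is a good way to start the conversation about dirty data.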
And they can turn in a research report that has code, results, and their interpretation. So that's the advanced end, with students who have a little more experience. But even incoming students making graphs write code, think, describe, and talk about what they've learned, all in the same document. They've exported these to HTML files, and now they can send them to whoever they want to take a look. So I've talked plenty. If you have questions or suggestions, I'd love to hear them, including things you're doing.