Okay, well, hey everybody, hi, welcome. This is Level Up Your Scientific Coding. This is level two, unit testing. This is a webinar brought to you by CSDMS, the Community Surface Dynamics Modeling System. I'm Mark, I'm a research software engineer at CSDMS. Hello, we are all adjusting a bit, so I'm taking the webinar from home, and my name is Benjamin. Yeah, it's too bad. It was fun because we got to do it together last time. It's still kind of fun, though; we practiced a part of it yesterday. All right, so here's the idea behind this webinar. So unit testing is a concept from software engineering that's really useful. Like, professional software engineers use this all the time. And in this webinar, we want to try to explain what unit testing is and hopefully give you evidence and convince you that you, as a busy scientist, could also take advantage of unit testing. It's actually a time saver in the long run. Yeah, because we have to be honest here, Mark, probably you will have to invest some time initially in developing your skills and learning how to do it. I'm going to be honest with you guys, I'm also in the process of learning how to use unit testing, but the evidence is there that you will save some time in the end. So it's absolutely worth the effort. And then we can go to the overview of what we will cover today. Rather than spending a lot of time showing the nuts and bolts of how to do unit testing, we want to present some arguments on why we think it's actually beneficial for you to use unit testing, and we will point you to some resources where you can get nice information on how to do it. So we give you this webinar, but for sure there are others who did the same and who have maybe done it even better. So we will point you to these resources, and feel free to go ahead and have a look over there. So the topics we will cover are, first, what? What is unit testing?
Because it's a little more abstract than our previous webinar, where we discussed version control. This one is slightly more difficult to get. So Mark will spend a couple of minutes on explaining what unit testing is. Then we will present a number of case studies where we think unit testing can be useful, from the perspective of a grad student, a postdoc, a researcher and a professor. After that we point you to those resources I mentioned before, the online resources. And finally we will wrap up, and this will take a good amount of time, with an example where Mark is going to show how you can actually do a unit test. All right, so let's start with the what. So what is unit testing? All right, while I'm doing the webinar, if you see me looking over here, it's because my notes are on the screen over here. So don't be too distracted by the fact that I'm not looking at you. I'm trying, honestly. All right, so unit testing, here's the idea. So for every unit of software that you write, and unit is a little bit of a vague word, but it's meant to be maybe like every function or every sub-program that you write. Not like every line of code that you write or every block of code, but kind of the sub-program level. All right, so for every unit of software that you write, you also write software tests. And the idea is that you run these tests and you then ensure that the software you've written behaves as expected. All right, so the idea is you write code, you write tests. So I'm trying to think how to say this. So I've given kind of a definition of unit testing, but let me talk about some behaviors that you'd also have with unit testing. All right, so the idea is don't check in code to a repository unless the code has tests. All right, don't check in the code unless the tests pass. Any new code that you add to existing code should also have tests. Don't remove old tests, but add new tests for the new code.
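As a minimal sketch of what "one unit, plus its tests" can look like, here is a small function with two tests. The function `celsius_to_kelvin` and both test names are made up for illustration; they are not part of the webinar repository.

```python
def celsius_to_kelvin(temp_c):
    """Convert a temperature from degrees Celsius to kelvin."""
    if temp_c < -273.15:
        raise ValueError("temperature below absolute zero")
    return temp_c + 273.15


def test_celsius_to_kelvin_freezing_point():
    # Expected output for a known input: 0 degrees Celsius is 273.15 K.
    assert celsius_to_kelvin(0.0) == 273.15


def test_celsius_to_kelvin_rejects_impossible_input():
    # The unit should refuse a temperature below absolute zero.
    try:
        celsius_to_kelvin(-300.0)
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for -300.0")
```

A test runner such as PyTest would collect and run the two `test_` functions automatically; they are plain functions, so they can also be called by hand.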
Writing good tests is kind of weird. It's a learned skill. It's somewhere between art and science. And when Benjamin talks a little bit later about some good resources, and when I do my demonstration, I'll try to show the process for writing a test. It's maybe not great, but I think it's useful. All right, so you can see on the slide I've added a couple of keywords here as well. The idea is you want to isolate. So you want to test very narrow pieces of code. You want to cover, that is, you wanna get as much of the code tested as possible. You want tests to be automated so that you just run them. They go through at any time to test the code. And you want reproducible results. You don't want the test to be flaky. That's actually a technical term, flaky, for tests. So Mark, if I understand this well, you actually want to test the fine-grained logic of your source code. Does that mean that you also have to change the way you code? Cause from what you say, it sounds like you need small modules or components which you can actually test. Is that also influencing the way you write your code in general? Right, that's a really good point. So testing definitely lends itself to writing modular code. You want to write small sub-programs and then assemble them into a larger program. And I know, from my perspective, I started writing code a while ago. It was in the late 80s, early 90s, you know, and I was bad. I wrote lots of main programs, lots of scripts, and main programs don't lend themselves quite as well to testing. Although the example that I'll show today actually has a main program. But nevertheless, the idea is that testing does tend to lead to better programs, because you want to have small pieces that you can test, small pieces that you can then assemble into larger programs. And then maybe an additional comment from a scientific perspective.
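To make the "reproducible, not flaky" keyword concrete, here is a small sketch. It assumes a hypothetical function `noisy_mean` that depends on random numbers; passing in a seeded generator is one common way to make such a test give the same result on every run.

```python
import random


def noisy_mean(n_samples, rng):
    """Mean of n_samples draws from a uniform [0, 1) distribution."""
    return sum(rng.random() for _ in range(n_samples)) / n_samples


def test_noisy_mean_is_reproducible():
    # Two generators created with the same seed produce the same sequence,
    # so the test result does not change from run to run.
    first = noisy_mean(1000, random.Random(42))
    second = noisy_mean(1000, random.Random(42))
    assert first == second


def test_noisy_mean_is_in_range():
    # Isolate one narrow property: the mean stays inside the sampled range.
    result = noisy_mean(1000, random.Random(42))
    assert 0.0 <= result < 1.0
```

The design choice here is that the function receives its random generator as an argument instead of using global state, which is what lets the test control it.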
So there are several things when you're writing up your code, and unit testing is really about testing your software and your coding. So next to that, there's of course the thing that you want to make sure that your actual science is good. And that's something we will not cover today. So today we will cover whether your software implementation is right, but even with the perfect software implementation you can have a crappy scientific code. So there are two things we can do to evaluate scientific code. There's verification, which would be comparing your numerical solution, for example, to a known analytical solution. And there's evaluation, where you confront your model's output with real-world data. And both are super important, but it's not something we are going to cover today. Today we will really focus on this unit testing. Right, that's cool, Benjamin. Like, I know that when I do unit testing, it's purely the software. I don't necessarily need to know about the science behind what's going on. Cool, thanks, that was good. All right, so let's move to our first use case. So we will start with the graduate students. And for that I will give you an example I picked up in a paper which is attached in the GitHub folder we share with you. It's from Clune and Rood. And there they lay out why you should do unit testing in science, and more specifically they do it in climate model development. And they say that initially scientists in their team were, let's say, not really willing to do this unit testing because they felt it was distracting them from doing the actual science. And I think that's true in general for software development. You want to implement your code. You want to check whether things are working. But this gradually changed over the last decades in software engineering, and now also in scientific code development. And of course that's because of the awareness of the usefulness of tests.
And a second thing is that there are tools out now which help you a lot in doing this testing. So also for graduate students, you don't have to develop all of that yourself. The tools we will cover today, and Mark is going to give a nice demonstration of how you can use PyTest, for example, are really helpful there. So the example I want to give you is the one covered in this paper of Clune and Rood. And well, I think every grad student, and really every scientist, at one point sees a paper coming out with a nice set of equations and wants to check whether or not they can reproduce it. And what you typically do as a scientist is dive into the code and try to reproduce it as quickly as you can. The alternative is to develop a code where you start unit testing from scratch. That means that for almost every line you will implement a test to see whether the actual software implementation is right. And the example covered in this paper is about implementing a set of equations to simulate the shape of snowflakes. And in the end, the researcher who did the implementation came up with 800 lines of tests and only 700 lines of source code. So this is interesting. And you think, well, I'm spending a lot of time in developing this code, and that's true. But the difference is that upon completion of his source code, he could run the software to completion without a single bug or a single error and reproduce the results directly. And what you typically do as a grad student, and I'm facing this problem myself, is doing the implementation and solving the equations and then start debugging. And without you realizing, this debugging takes a lot of time. And the time which is spent on debugging can easily be compensated for by actually doing this unit testing from the beginning. So that's just a little example.
By the way, I think you can comment that implementing all these components might slow your code, and that's true. But you can still develop a code in different components, test it, and then work on the performance at a later stage. So that's an example for our grad students. And then another example we can cover now is as a postdoctoral researcher, which is actually what I am myself. And this is something which happened to me just a couple of weeks ago. I wrote a piece of software to do calculations on a digital elevation model. And my code was working fine. I did some tests and everything seemed to work great. But then I downloaded a DEM from a different source. I loaded the DEM in my software package and was doing my calculations. And all of a sudden, all my elevation values in this digital elevation model were changing into infinity. I think I spent at least three hours trying to debug the code and trying to find the error. You know what? There was not a real error. The error was in the input arguments of the function I was using. So what happened is I downloaded this DEM, and to save some space on the server, they had converted the DEM to single precision, which is stored as a 32-bit floating-point value. But in my function, I was using a piece of C code, and I actually needed a 64-bit floating-point number. So actually a simple unit test on my input argument would have prevented me from spending at least three hours debugging the code. So that's an example from the postdoc perspective. It was awesome, Benjamin, thank you. I mean, not thank you for all your work, but that was a cool example. It wasn't awesome to waste all my time there, Mark. All right, so I'd like to give some perspectives. So the next use case is from the perspective of a research scientist. This is what I am. So I kind of thought about some things that affect my work. So the idea is that as a research scientist, no matter how much science I'm doing, I'm probably doing some coding as well.
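The kind of input check Benjamin describes could be sketched as follows. His actual code is not shown in the webinar, so everything here is an assumption: the standard-library `array` type stands in for the elevation grid, and `require_double_precision` is a made-up helper. With NumPy, the equivalent check would compare the array's `dtype` against `numpy.float64`.

```python
from array import array


def require_double_precision(values):
    """Return values unchanged, or raise TypeError if not 64-bit floats."""
    # Typecode "d" is a C double (64-bit); "f" is a C float (32-bit).
    if not isinstance(values, array) or values.typecode != "d":
        raise TypeError("expected an array of 64-bit floats (typecode 'd')")
    return values


def test_accepts_double_precision():
    elevations = array("d", [1204.5, 1310.25])
    assert require_double_precision(elevations) is elevations


def test_rejects_single_precision():
    # A grid stored in single precision should fail loudly at the
    # function boundary instead of silently producing bad values later.
    elevations = array("f", [1204.5, 1310.25])
    try:
        require_double_precision(elevations)
    except TypeError:
        pass
    else:
        raise AssertionError("expected TypeError for 32-bit input")
```

The point of the sketch is where the failure happens: at the input, with a clear message, rather than hours later as infinite elevation values.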
I'm also probably working with other people who are also doing some coding. Maybe we're collaborating on a project. Now this is a spectrum, you know, some people may do lots of science, some people may do more software stuff. I'm more on the software end. But I think that the bullets I have here kind of describe how unit testing could be useful as a research scientist. So first of all, confidence. This is a really interesting word that I see used a lot when reading about unit testing, but it really resonates with me actually. You know, the idea that when you're writing code, if you have good tests that cover the code, they will give you confidence that when you write new code, you're not gonna be breaking anything in the existing code base. And I think this is really cool. You know, I write code and I feel free to explore because I know that my tests will tell me if I'm breaking anything. The next bullet, you know, again, people may be relying upon the software that you write. All right, so the idea is if I can test and try to catch bugs before my software gets pushed out to people, that's a good thing. People will have more confidence in me and in the software that I write. All right, and then the last one, so safety in a collaborative project. So again, imagine you're working with collaborators, maybe at different universities, different research institutions around the world. You know, everyone's contributing maybe some pieces of software to a larger project. If everyone's doing unit testing, then again, you have confidence, you have the safety in knowing that the code is tested and is working and won't be introducing bugs into your application. All right, so these are some comments from the perspective of a research scientist. That would be your ideal world, Mark. What's that? That would be your ideal world. Yeah, right, right, yeah. All right, so the last use case is from the perspective of a professor.
I wasn't able, I didn't try very hard, but I didn't invite any professors to do any guest cameos here, but I think I can give the perspective of maybe a professor as a team leader. So a professor who maybe leads a group of graduate students and postdocs and research scientists and wants to be able to help organize and facilitate their work together. So I came up with two ideas for why unit testing could be useful for professors in this context. One is that when we write proposals to NSF, for example, we are always asked to give metrics so that the program officers can kind of keep track and grade, if you will, how the project is going. And I think unit tests give a metric. So you can ask things like, how is the code doing? What's the health of the code? What's the coverage of the code that you've produced? Have you found bugs, things like that? How many bugs found, how many bugs fixed? So unit testing can provide metrics for success in a proposal. It's also useful for my second bullet, just teamwork. Again, if you as a professor are organizing this group of junior to senior type researchers, if you know that everyone's doing unit testing, everyone's putting in the time to make tests, to make the software good, you have greater confidence in the software that you produce. That's great. Yeah, all right. So those are our different use cases. So like we said, there are a lot of resources out there where you can find insightful information on unit testing. What we noticed while looking on the internet is that most of the resources cover software development, not scientific purposes. So I think unit testing is more new to science than it is to software development, which is why we actually do this webinar. But anyhow, you can find a very nice book, Code Complete. I think Mark is a big fan. I'm going to be honest with you guys, I didn't really.
There's the Ministry of Testing, and then there's the PyTest documentation, and especially that one I find very interesting. So if you're new to this unit testing thing, just go to the PyTest documentation website and try to do some of their examples. Because I think that's still the best way to actually learn about unit testing. Just do it. And then there's obviously the paper I already referred to, by Clune and Rood, where you can find a nice example of how to do unit testing. Right on. All right, thanks Benjamin. All right, so now we get to see a short demonstration of unit testing. Let me just get my notes up here. Okay, so we'll start with the repository. And actually this is the same repository that we used for the first webinar. So let me pull that up so you can see it. All right, here we go. So under GitHub, CSDMS level up. All right, and just a quick comment. I've reversed the colors on my web browser to make it a little easier for me to see. So if you go to this page, you'll see a white background. That's the standard background. So just so you know that's happening. Okay, I would like to take a little extra time, hopefully just like a minute and a half, to actually go through and kind of reinforce what we saw in the first webinar. I want to clone this repository and then start using the software inside of it. All right, so let me go through these steps. All right, so I want to click on the clone or download button on this web page. You can see it's kind of a pink color on mine. It's a green color on your web browser, likely. Okay, so. We have the same screen as you have. Well, I know, but if you're following along, you know, if you have your web browser up, yeah, that's all. So all right, so I can clone. Note that you can use either SSH or HTTPS. I think HTTPS is the default. I happen to have it set up with SSH. I'm gonna do that for mine. So I'm gonna copy that link. I have a terminal set up. Okay, so nothing's here now.
I have Git installed on my machine already. All right, so I'm gonna git clone and then paste in this URL to the repository. All right, because I'm using an SSH key, I need to put in my password. All right, so you can see I've cloned the repository if I cd into it. All right, you can see it has files and directories like what you can see on the webpage. Okay, so recall that this Python module that we used last time is a little old. It uses Python 2.7, and so it's a good idea maybe to run it in an isolated environment. So there are a couple of different ways of setting up environments in Python. I like to use conda, all right? So I've installed conda on my machine already, and you can see I've provided an environment file that you can use in the repository. So let me set up this environment for us to use. All right, so I'm using conda. We haven't talked too much about conda here, but I'm gonna plug, I'm gonna plug several times, but this will be my first plug for the summer school that Benjamin and I are helping to teach this coming summer. So we'll talk a lot more about things like Anaconda and other things as well. All right, so I want to, okay, so this is the command, and this command is also listed in the README. I probably could have just copied and pasted it. It's in the README on the web page as well. Mark, maybe a small side note. So if you guys are not used to GitHub, you can always go to the CSDMS website, where we have a bunch of tutorials on how to actually set up Git with SSH, to learn about conda, and to learn how to make an environment. So if you don't know what to do, just go and check out the website, where we have all these tutorials listed. Right on, good idea. Yeah, good idea. Thank you, Benjamin. Okay, all right, so conda's now got me set up. I don't think I can use conda activate. I think I source activate the name of this environment. All right, so now you can see I'm using the level up environment.
This is gonna be Python 2.7. Okay, so let's take a look again at the files. I think just to refresh your memory, let's run that program again so you can see the output. All right, so this is what we did last time. We just call Python on this module. And now you can see there is a new file called gph.png. If I open that up, all right, you can see this is what's produced. All right, and that's on the GitHub page as well. All right, so just to refresh your memory. Oops, let's come back. There we go. All right, we removed that for a second. And let's next look at the code. All right, so again, I haven't done any testing yet. I just kind of want to refresh your memory as to how this module is set up. Okay, so I just opened it up with my favorite editor. All right, so you can see this is Python. Again, I'm using Python because it's a nice language. Everyone can get access to it for free. Even if you use MATLAB, you can read Python and say, oh yeah, okay, that's not too different. All right, so let me give just a high-level overview of this file. So it's made up of a couple of functions. So you can see there's a function called read. And if I scroll down, there's a function called prep. And you can see both read and prep have input arguments. And then the last function is called view. And it has several arguments. Some of them are keywords. All right, and then at the bottom, there's a main program. And this is what's run when I call it with Python. All right, so I would like to use this program that we used in the last webinar. I wanna test it, because I had no tests on this before. And actually, as a sidebar, if you recall from the first webinar, I had to make some changes to this because something happened between when I wrote it several years ago and when I tried to use it in the last webinar.
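The layout Mark describes (three functions, then a main program at the bottom) can be sketched roughly as below. Only the names `read`, `prep`, `view` and `day_of_year` come from the webinar; the function bodies, return values and the file name `sample.nc` are placeholders invented for this sketch, not the real module.

```python
def read(filename):
    """Placeholder: return the contents of a data file as a dict."""
    return {"filename": filename, "values": [1.0, 2.0, 3.0]}


def prep(data):
    """Placeholder: derive the quantity to plot from the raw data."""
    return [2.0 * value for value in data["values"]]


def view(prepared, day_of_year=0):
    """Placeholder: build a plot title instead of drawing a figure."""
    return "Day of year: {}".format(day_of_year)


def main():
    """Chain the three units together, as the main program does."""
    data = read("sample.nc")  # hypothetical file name
    return view(prep(data))


if __name__ == "__main__":
    # Runs only when the file is executed directly, so a test file can
    # import read, prep and view without triggering the main program.
    print(main())
```

Keeping the main program behind the `__name__` guard is what makes the individual functions importable, and therefore testable one at a time.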
If you recall, the date string was wrong in the title. If I'd had unit tests, then I maybe would have caught that, for example. So this code could benefit from having unit tests. Okay, so let's take a look. So if you take a look at the repository, you can see that there is a directory called tests. Let me CD that directory and then do a listing. Okay, before I explain these files, let me just step up a level. Again, I'm using Python because it's easy. I'm also gonna use the Python tools, PyTest and Coverage. And I wanna be careful. I don't want this webinar to be about those Python tools. I wanna try to keep it at a higher level and just think about unit testing in general. We're gonna use these tools to demonstrate unit testing, but don't get too caught up in the details. It's actually one of the tools I refer to to facilitate the testing, but we just wanna go for unit testing as such. Right on, yeah. Okay, so in this directory, you can see, I have a directory called tests. There is a file called underscore, underscore init, underscore, underscore.py. That's a module definition file. I needed it to make my tests. As far as this webinar is concerned, don't worry about it. We're not gonna use it at all. What is important is this file, all right? So it's called test underscore NC20 reanalysis. All right, that test underscore is actually used by PyTest. Let's take a look at that file. So let me just give a quick overview of what's in this file. All right, so you can see I'm importing a couple of things including PyTest. You can see in this line that I have imported from my NC20 reanalysis module the read, prep, and view functions. All right, I've set up some paths because I wanna get access to the sample data file that I use with this program. After that, you can see that there are a series of functions. Okay, I'm just gonna scroll down through this file. All right, so I have a series of functions. Each of them begin with test underscore. 
And most of them make use of the assert statement as well. That's a common statement used in testing. Okay, so that's kind of the overall structure of this file. Let me go back and run PyTest to run these tests on the example program. All right, so I need to step up a directory. So now I'm back at my main directory. And here's the call to PyTest. Now I'm gonna add an option just to make the output from PyTest a little more verbose. Okay, there we go. So you see PyTest ran. The way PyTest works is it looks for any files that begin with test underscore and then any functions whose names begin with test underscore inside those files. So that's why things are named the way they are. Okay, so you can see that there were eight tests and they all passed. Let's go back to the test code. And I wanna make just a couple of comments on what some of these tests look like and what they do. All right, so the first test. Okay, you can see the first test that I make is, does the file that I wanna work with exist? All right, nothing's gonna work if I don't have this file, or at least a file that has the same setup. All right, so you can see I'm asserting that the path to this file exists. All right, so if the file didn't exist, then this part of the equality would be false, and if I try to assert false is true, I get an error. All right, so this will tell me whether the file exists. All right, the next few tests, let me just talk about these next three tests. All right, I'm dealing with the read function. Just to refresh your memory, the read function basically tries to read some data from a netCDF file and then it returns that data as a Python dict. Okay, so you can see I'm testing first of all, you know, what happens if I try to call the read function with no file parameter. All right, so I need to be able to tell it, you know, what file to read. I then test, you know, trying to read a silly file.
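The first few tests described here (the file exists, `read` with no argument, `read` with a nonexistent file, `read` with good data) follow a pattern that can be sketched with the standard library alone. The `read` function below is a made-up stand-in that reads a text file, not the repository's netCDF reader, and `make_sample_file` is a hypothetical helper.

```python
import os
import tempfile


def read(filename):
    """Stand-in for the real reader: return a file's text in a dict."""
    with open(filename) as handle:
        return {"text": handle.read()}


def make_sample_file():
    """Write a small temporary data file and return its path."""
    handle = tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False)
    handle.write("sample data")
    handle.close()
    return handle.name


def test_sample_file_exists():
    path = make_sample_file()
    try:
        assert os.path.isfile(path)
    finally:
        os.remove(path)


def test_read_without_filename():
    # Calling read() with no argument should raise TypeError.
    try:
        read()
    except TypeError:
        pass
    else:
        raise AssertionError("expected TypeError")


def test_read_missing_file():
    # A file that does not exist should raise an OSError subclass.
    try:
        read("no_such_file.nc")
    except OSError:
        pass
    else:
        raise AssertionError("expected OSError")


def test_read_returns_dict():
    path = make_sample_file()
    try:
        assert isinstance(read(path), dict)
    finally:
        os.remove(path)
```

Because every function name starts with `test_`, PyTest would discover these if they were saved in a file whose name also starts with `test_`.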
You know, this file doesn't exist, it's not a netCDF file. I then try to give it correct data and see that it's actually giving back this dict of information. All right, so these are a couple of tests that I can perform on the inputs and outputs of this read function. Okay, I think it might be useful to add a test, and I can use this to show the process of writing unit tests, and recall, this is somewhere between an art and a science. All right, so let me get to my notes here. All right, so in the view function, you can see that there are several arguments. One of them is day of year. Okay, you can see that it's used to subset the data that come in. It's also used to make a title on the plot. Okay, so you can see I provided a default for day of year, but what would happen if I gave an erroneous day of the year? What if I had negative one or 1,000? What's gonna happen to the program? We can test that. So let's go back to our tests. I'm gonna add a new test here towards the bottom. So the process of adding a test is you basically just wanna first make it work and then gradually add more information in order to try to fully test the unit of code. So I'm gonna make a new test, and we'll call this bad day of year. Okay, so the first thing I wanna do is just make the test and try to make it pass. And I can do that by just using some Python code. Don't worry too much about that. Okay, so I made a test. Let's go back and try to run it. So we had eight tests before. If I run PyTest again, okay, good. So now we have nine tests. And you can see that our test, test view bad day of year, passed. All right, let's add more information. So this is gonna be interesting. All right, so I'm gonna need to add a bit. So I need to read my reanalysis file. Then I need to prep the data. Now, I'm using read and prep, but you can see above this in the file, I've tested those functions. So hopefully that means that they're working right now.
And I'm not introducing new errors by using them in this test. So I basically set up so that now I'm using the default day of the year. So this test, as I've written, it should still pass. Let's try it. So I'm going back up, I'm running PyTest again. All right, you can see it took a little longer as I had to actually read the file, but you can see that this test is still passing. Okay, let's see what happens now if I try to make it fail. All right, I'm gonna, let's put in. So now that'll be a day of year that's erroneous, at least on Earth. Okay, so this test should probably fail now. If I run PyTest again, again, this is the great thing about these automated tests. You know, they're all set up, all you need to do is run PyTest. Boom, okay, so it fails. All right, you can see which test failed. You can see there's a stack trace and you can see PyTest is trying to help me. So it's telling me that 1,000 is out of range. I only have 365 days I can work with. All right, so it's an index error. All right, so I'm gonna try to trap that index error. Steal some code from above here. Okay, so now this is a complete test, all right? So now this- Can you comment a bit on what you just did? Oh yeah, thanks Benjamin, yeah. So I want my test to pass, all right? So I am testing for a bad value. So what I'm doing is trapping that bad value. I know that if Python raises a particular error that I know is associated with this bad value, then I can make this test pass. So I'm basically testing the opposite, making sure that if someone does put in some bad data, this error will be thrown. Got it. Okay, yeah, thanks for that, that's a good question. Thank you. Okay, so if I run it now, run PyTest again. All right, now it passes. All right, so good. So we've added a test and you can see the process of adding a test. You know, I kind of built it up from just nothing into something that shows a good test of an input parameter to this function. Okay, so that's an example of writing a test. 
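The finished test traps the error that a bad day of year produces, so the test passes exactly when the error is raised. A minimal sketch of that idea follows, assuming a hypothetical `select_day` function that indexes into 365 daily values; the real `view` function and test are not reproduced here. PyTest provides `pytest.raises` as a context manager for this pattern; the sketch uses plain `try`/`except` so that it runs without PyTest installed.

```python
def select_day(daily_values, day_of_year):
    """Return the value for one zero-based day of the year."""
    return daily_values[day_of_year]


def test_select_day_default():
    # A valid day should simply return the matching value.
    daily_values = list(range(365))
    assert select_day(daily_values, 0) == 0


def test_select_day_bad_day_of_year():
    # Day 1000 is out of range for 365 values, so IndexError is expected.
    # The test passes only if that error is actually raised.
    daily_values = list(range(365))
    try:
        select_day(daily_values, 1000)
    except IndexError:
        pass
    else:
        raise AssertionError("expected IndexError for day 1000")
```

One caveat for this sketch: a negative day such as -1 does not raise `IndexError` on a Python list, because negative indices count from the end, so that case would need an explicit range check before it could be tested the same way.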
I have one last thing I wanna show in this demonstration, and that's the idea of coverage. Okay, so, oops, right. All right, so the call to coverage is something like this. All right, so the coverage package, the idea is it looks to see what code is actually tested by your tests and what code isn't. And it gives a score basically, you know, the fraction of code tested to the total amount of code. Okay, so it runs. Okay, that's not very exciting. It's actually cool to get a report. All right, so you can see, well, I could have added some keywords to remove the tests themselves. What's important here to see is that for the Python module, you know, there are 53 total statements and I missed 18 of them. So my coverage score is only 66%. You can actually go a little further and see what chunks of code you missed by generating an HTML report. All right, so that makes this directory, htmlcov. All right, if I open, oops, oh, wrong one, that's, just a second, I'll try that again. There we go, okay, so you can see that. All right, so this is the same report, but the neat thing is that if I click the link, ah, it shows me my code. Now, let me reverse the colors on this. Oops, oh, darn it. Okay, so you can see the green bars next to the numbers on the side. That shows you that code has been captured in a test. So that code has been tested. So good, that's good, good, all right, good, good. Oh, bad, okay, so you can see that there are chunks of code in this Python module that haven't been tested. So this is where I should focus my efforts if I want to write more tests. All right, so it's a neat tool. I should comment that the coverage score is not a great metric, because you can game your tests to make tests that cover all lines of your code. It's more about making good tests. So even though 66% probably isn't great, I could do better. It's not like a super important number. Okay, I think that's about it.
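Mark's caution that a coverage score can be gamed is easy to show with a toy case. In this sketch, both tests execute every line of a made-up function `classify_slope`, so a coverage tool would count those lines as covered for either test alone, yet only the second test checks any results.

```python
def classify_slope(slope_degrees):
    """Label a hillslope angle; a made-up unit for illustration."""
    if slope_degrees < 0:
        raise ValueError("slope must be non-negative")
    if slope_degrees < 15:
        return "gentle"
    return "steep"


def test_weak_runs_every_line_but_checks_nothing():
    # Every line of classify_slope is executed here, so the lines count
    # as covered, but a wrong label would go unnoticed.
    classify_slope(5)
    classify_slope(45)
    try:
        classify_slope(-1)
    except ValueError:
        pass


def test_strong_checks_behaviour():
    # Same lines covered, but the expected outputs are asserted,
    # including the boundary value at 15 degrees.
    assert classify_slope(5) == "gentle"
    assert classify_slope(15) == "steep"
    assert classify_slope(45) == "steep"
```

Covered lines show where tests are missing, as in the HTML report; they do not show whether the existing tests are any good.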
So one thing, as I was writing the tests for this, I know I didn't do a great job. I didn't spend a whole lot of time on this. I could have done more. So if anyone watching has ideas for more tests, for better tests, you can go to the repository for this webinar and submit a new issue, or if you make changes, you can submit a pull request to us, and that'd be really cool. I'll add anything that you do to some additional documentation for this webinar. Also note that we're using PyTest. I showed some really simple examples. There's a lot more you can do with unit testing, and this again will be another plug for the summer school that Benjamin and I are helping to teach this summer. We'll go into more detail with unit testing. Things like fixtures, for example. Okay, I think that's it. So yeah, thank you all for watching. It's great that you were here. If this was maybe a bit too fast, we will add all the code Mark just displayed to the repository, and I would really recommend you to just go out there and try to test some code yourself, because that's still how you develop your skills. Obviously we are not living on Mars, Mark. So we only have 365 days a year. So we have to speed up a bit, but we have, as Mark said, this summer school coming up where we will cover all of that in more detail, and we have our next webinar in one month, which is on object-oriented programming. And I see there's already a question out there. Mark, did you see it? So I will read it out loud and maybe you can comment on it. So the question is, do you base tests on errors you've had in the past, or is there a list of common tests, and how do you know when you have the proper number of tests? Oh, okay, all right. So I don't think that there is exactly a list of standard tests, but maybe, so I don't have high confidence in this answer, but typically again, when you write unit tests, they're written on units of software. Unit typically means like a function, for example.
And so if you test all of the inputs and outputs for the function, that's the kind of standard procedure for writing unit tests. You can do more, you can do other things, but testing inputs and outputs is a standard procedure. And what was the second half of the question, Benjamin? When do you have the proper number of tests? Oh, right, right, yeah. So again, maybe not high confidence in this answer either, but there isn't a good number. Basically you should do it until you feel that you have confidence that you're testing your software well. I mean, that's kind of the idea behind the coverage score, but just again, bear in mind you can game the coverage score to make it 100% but have really crummy tests. So you just wanna make sure you have enough tests that you feel pretty good about your software. I mean, bugs are still gonna happen, but you at least have some confidence. And this is again, it's kind of an art to do software testing. Absolutely. And another question was whether we can send the link to the summer school, so I added the link in the group chat. Any other questions, go ahead. Okay, Mark, I think that's it. Okay, sounds good. Well, thanks everyone. Be well. See you all. Bye-bye.