In this next section, I want to talk about regression testing. Now that you've built a tool, you can set it up so that you can test the tool. And more importantly, when you publish the tool, you can test it again and again and again.

So, we've spent a few days now, and you guys are pretty good at building Rappture tools. You've got the tool built, you're running it in your workspace, and it works. In a minute, I'll show you how you can get it published on nanoHUB. We'll go through all of this business about how you install and launch and test and finally publish the tool, so pretty soon you'll know how to do that part too. Congratulations: you've got a tool and it's published. So, what's next?

Well, once your tool is published, people on nanoHUB may put in wishes. They say: hey, I love your tool, but I wish it simulated gallium arsenide. Hey, I love your tool, but I wish it had a few more options for plotting. They'll put in different ideas of things they want to see, and there are wish lists on nanoHUB for exactly that. They'll also find bugs in your tool. If you write software, you're writing bugs at the same time. Always. So people will find problems, and then they'll put in tickets on nanoHUB. And guess what? If there's a problem with your tool, you get the ticket. Congratulations. Isn't that great? When you write software there are bugs, and somebody's got to fix them. And if you wrote the tool, you get to fix them. So I'm sure you'll be making changes, because nothing is ever perfect.

The other thing that's going to happen is that your advisor will come along and say: hey, that's great, I can't believe you finished that tool so quickly. I've got some new ideas. Let's try adding this equation; let's try adding that equation. Your advisor will say there's new physics I want you to add into the model, and you'll have to add a few more inputs. So no matter what, you're going to be making changes to your tool. I'm pretty sure.

So let me tell you a story about back in the day. Remember that SEQUAL program I told you I wrote when I was a grad student? I got to this point. I was adding some new physics in. I had a whole section of parameters in my physics, all these different constants that I had fixed up, and I added just a little bit of new physics into my tool. I wanted to republish, right? So at some point, when you add that new physics, you'll go in and click the link that says "please install the latest code for testing and approval," and you'll republish your tool. Okay. Congratulations, you published your tool all over again. I'll tell you the rest of the story in a minute.

But now you've got all these different versions, and it turns out that on nanoHUB you can keep multiple versions online. You can have, say, a version 3.01 that's the very latest, and that's online. And then some people may say: hey, I want to keep this old version 2.1 online too, because a lot of people use it. It's got a different, simpler model, and people still use it. So nanoHUB lets you have multiple versions of the tool online. But getting back to my story, this is where I thought I was headed.
My SEQUAL program, way back in the day, when it was 1985 and I was watching Back to the Future and playing Tetris. So I had this program, and I wanted to make a change, a relatively simple change, to the physics. It turns out there was a part of the code where I had to do a sorting routine. You guys may have seen the standard bubble sort code before; this is kind of what it looks like. I needed to go through a list of values and compare two values at a time: if this value is greater than that value, swap them. That's how you do a bubble sort. And when you're swapping two variables, what you typically do is set temp to the first, the first to the second, and the second to temp. You introduce a temp variable, right, so you can swap values around. So I made a two-line fix, or maybe a seven-line fix, to my program to sort the values like this (there's a sketch of what went wrong at the end of this passage).

And then I started running again, and all of a sudden my program showed this bizarre physics. It started doing all this weird stuff. I got this funny little curve in the output, and my major professor was very excited because we had discovered some brand new quantum physics, right? But we looked more carefully into the code, and it turned out, no, the behavior was completely bizarre because, when I was doing my swapping with my temporary variable, there was another global variable up in my declarations that was also called temp. I was using that one for the temperature of the simulation. So as I was doing my bubble sort, my temp variable was overwriting the temperature of the simulation, and instead of staying at 4 kelvin, it was going all over the place: 193 kelvin, then 7,000 kelvin, then 0 kelvin. Because of that simple fix, I thought all I was doing was adding a few lines of code to the program, but I forgot about my global variable, and all hell broke loose.

So the point is, when you go to publish your program, you get everything all perfect, right? You test it, you check it, everything's perfect. You publish it on nanoHUB, and then a bug report comes in, and you make a five-line fix, and you say, whatever, put it back up. And it turns out the five-line fix could be completely horrible to the rest of your program. Every time you go to republish, you really should check your program again as if you were publishing it for the first time. Go through everything, check it all. And you really should, but nobody does, because it's a lot of work, right? You fix a bug and put it out, fix a bug and put it out, and you just want to be done with it. But really you should be checking everything very carefully each time.

So Rappture has a solution for that. Rappture has a testing tool, and the way it works is you build up a suite of test cases. The first time through, when you're building your program, take a few extra minutes, build up a bunch of test cases, and save them. You might not ever need them again, but then the bug report comes in, or the wish comes in, and you go back to your tool and add something. Just before you go to publish again, you can run back through those test cases.
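Here's a minimal sketch of the bug from that swap story. The original was 1985-era code, not MATLAB, and all the names here are made up, but the trap is the same: in a plain MATLAB script, every statement shares one workspace, so the sort's temp silently clobbers the temperature.

    temp = 4.2;                 % simulation temperature, in kelvin

    v = [3 1 2 5 4];            % values that needed sorting

    % The "seven-line fix": a standard bubble sort, pasted in later
    for m = 1:numel(v)-1
        for n = 1:numel(v)-1
            if v(n) > v(n+1)
                temp   = v(n);  % oops: same name as the temperature above
                v(n)   = v(n+1);
                v(n+1) = temp;
            end
        end
    end

    % Any physics below this point now runs at whatever value the sort
    % left behind instead of 4.2 K
    disp(temp)                  % prints 5, not 4.2

That's exactly the kind of change a saved test suite catches instantly.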
But instead of you doing all the work and taking all the time, you just push a button, and Rappture does it automatically. Rappture goes through all the test cases that you started with and checks whether they're still working. They may not be; they may be broken. And when they're broken, you can look at each broken one and figure out why. In my case, the temperature was varying wildly and I was getting strange physics out, and I wish I had known that, because if I'd had a series of test cases, I would have seen right away that these simple cases were not working anymore, and I would have caught the error immediately.

So let me show you how it works. Maybe the best way right now is to flip over and show you a demo of what it looks like. I've got this program here, my Fermi function calculator, and I can run it. Working great, right? Go ahead and republish the tool. Fantastic, everything's working. If I run rappture -tester, it brings up all these test cases that I created before, all these different test cases that at one point worked perfectly for me. Now I can say "select all" and "run," and this little regression tester runs back through all those test cases. First one worked great. Oh, wait a minute, that one failed. It walks through each of the test cases and tries to replay the old results, and wherever a test fails, it puts a red X. The ones that are fine get check marks, and the ones that are failing get red Xs. So you can run through all of these tests, and again, this is much easier than running them all by hand and eyeballing them.

For any particular test, if it's working fine, great; but when it's not working, it shows you what's wrong. This one says the output status is bad: the test run failure was not expected. If I look, it's giving me "child process exited abnormally." In other words, that test case core dumped for whatever reason, and that wasn't expected; it was supposed to give me a real result there. So okay, something's wrong with that case; I should look carefully at the 1 eV case. The 0 eV case gave me something different: here it tells me there's a result missing. It expected to find a certain string, "ASDF," in the output, but for some reason it isn't there.

And then here it's telling me the details have changed. If I click "view": something changed. Oh, in this case, it's really innocuous. What happened is, the output used to be labeled "Fermi-Dirac statistics," and now it's labeled "Fermi-Dirac factor." You may look at that and say, oh yeah, I changed that; it's supposed to be different now. And in that case, that's okay. Don't worry about changes like that, because if I like that particular test case, I can say: this test case is good, it's a new golden standard for me. Everything's working fine; it's not supposed to have that string "ASDF," and it is supposed to have the different label, "Fermi-Dirac factor." So if you look at a test case like this and decide it's working correctly, all you have to do is click "new golden standard," and it asks: are you sure this is right? I click yes, and I've basically told the system, don't worry, that test case is fine. Now when I run, that test case won't fail anymore; it uses the new output as the golden standard.
So sometimes there are changes in your tests where things really did change; they're supposed to change. And other times there are problems like this one, where it really wasn't supposed to core dump, right? So you want to go through all of the tests that fail, one by one, and see what's happening. For some reason 1 eV is killing me with that test case. So I can bring up my tool, set it to 1 eV, and run it. It ran there, but maybe it doesn't match the... oh, maybe it's a different temperature too. I should look at the test case. By the way, it leaves around all these run.xml files, so I can check them out. There's a series of test cases here that I can look at. I think it was test three that was failing, so I can look in test three, see the default value and the current value and all of that, and nail down what was going wrong.

Let me get back to the lecture and show you the rest of it, now that you've got a sense of how this works. Let me explain what's actually happening. From the normal Rappture runtime environment, we can run the tool. Remember: Rappture generates a driver file, it runs your program, your program produces a run.xml, and we load up the result. Normally that run.xml gets whisked away into your data/results directory, like we've seen up to now. It'd be nice if it stuck around, because I'd like to capture it and use it as a test case.

There are a couple of different ways of doing that. One way to stop Rappture from storing the result is to unset the SESSIONDIR variable. Normally Rappture looks for an environment variable called SESSIONDIR and tries to put the result in that directory; if it can't find that environment variable, it won't move the result there. So if you want to keep those run files sitting around, one way is simply to unset SESSIONDIR. Another way is to create a driver file and run your program by hand. We've seen that too, right? When you run your program by hand with a driver file, the run file gets left behind, because in that case you're running the program, not Rappture. One way or another, you get your hands on a run file, and then you can move it into a tests directory. So if I've got a run case like this one, I move it into the tests directory and keep it around for later as a test case. Imagine running half a dozen different cases, grabbing all the run files, and moving them into the tests directory.

For each one of those test cases, if you add a little label up at the top, Rappture will recognize it as a test case. So I edit the run.xml file and add a test label and a description, and that, basically, is what makes it a test case for the tester. All I need is a run.xml with a little bit of labeling and I have a new test. In the label you can put a pipe symbol, and that's how you create folders: here "room temp" is a folder, and "0EV" is a value inside it, so "room temp|0EV" shows up as a series of folders in the tester. The description is a little note to yourself to remember why you were testing this case. Maybe you write "this should always core dump when you pass in 0 eV," because you can do negative testing with this too; you can expect failures when certain things happen.
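Here's roughly what that labeling looks like, pieced together from the description above. The label and description text are just examples, and the exact placement of the <test> block inside the file may differ; check an existing Rappture test case if in doubt.

    <?xml version="1.0"?>
    <run>
        <!-- added by hand at the top of the captured run.xml -->
        <test>
            <label>room temp|0EV</label>
            <description>Corner case: make sure the tool handles E = 0 eV
                without crashing.</description>
        </test>
        <!-- ...everything Rappture originally wrote (tool, input, output)
             stays exactly as captured... -->
    </run>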
Or you can say: I'm trying to make sure that it does work at 0 eV, because that's a corner case in the problem. So just by taking your run file and adding a little bit of test information to it, you can create yourself a test case. And then, just like we saw before, you can run through all these test cases and see what's going wrong with each particular one. (I should also make sure my laptop is plugged in. There we go.)

One other thing that I didn't show you before. If we run through all these test cases looking for failures, remember, that first one is the one we re-goldenized, so it's passing now, and we have a complete set of test results. Sometimes you'll get outputs that are slightly different. This is a case where the output curve "differs from expected value," and if I click view, it actually shows me subtle differences in the output. What looked okay to my eye is actually wrong. Let me see: the black is the expected result, so it expected to get this black curve, and the red is the test result, which is what it actually got. So here's a case where, if I just looked at it with my eyes, I'd think everything was fine, yet it's actually completely wrong compared to what I got when I first published the tool. That's exactly the kind of subtle error that you want to be able to catch. Notice it gives you two ways to see the output: you can see it as a string, and you can also see it as a plot and scroll through. Turns out, I guess, there's one value in here that's correct: at 0.5 the two curves actually match, right in the middle where they cross, but everything else on this curve is wrong. The tester catches errors like that too.

So you go through case by case, examine all the results, and try to figure out what's wrong. Here it says there's an extra number in the output: somehow there's this extra number the tool is reporting that isn't supposed to be there, and you either say, oh yeah, that's good, or, no, I forgot, I deleted that out of the tool.xml and I didn't mean to. One way or the other, you go through all of these different things (it tells you the subtle differences in labeling, the run status, and so forth), and it's up to you to look at each one, point by point, and convince yourself whether the difference is real and should be there, or whether you need to fix your tool. You can then go back into the tool and edit it. If you do some debugging on this particular Fermi tool, you can see how it's working, and you might notice: ah, here it was 4.2 kelvin, so at 4.2 kelvin and 1 eV it's actually dying.
I had put in an explicit kill, kind of a trap, so if I simulate (oops, not that one) 4.2 kelvin and 1 eV: ah, "child process exited abnormally." So that's the problem that I ran into in my test case, and I can fix the code. Oh yeah, now that's fixed; it doesn't do that anymore. If I put in 4.2 kelvin and 1 eV, now it works; that works the way it should. And if I run my tester, select that particular case, say, and run it: ah, that one's working now. So little by little you can work your way through all of the failures you have and convince yourself: yes, this is working correctly now. You want to make sure you get all green check marks down that column before you republish your tool. That's the last thing you do, kind of a final double check.

At some point, as more and more people use the tester, we may start using it as a standard for all the tools, and on nanoHUB we'll be able to say: this is a tool that has a test suite and passed, and this is a tool that doesn't have a test suite, or didn't pass the last time it ran. So we'll be using that as kind of a quality metric. In the field of science they call this verification and validation: verification is making sure that your code runs correctly, like this, and validation is making sure that the model actually explains reality. Scientists do this very carefully when they're building codes like this, and in the future, like I said, we may start to require it or use it on nanoHUB. Right now we don't, because the tester is not quite feature complete. Unlike the builder, the tester is about halfway there; well, maybe even a quarter or a third of the way there. It does some simple things. It handles most of the simple inputs, like the builder does; on the output side it only handles a couple of output types. It'll handle curves, but it may or may not handle images, and it doesn't handle unirect2d fields for sure; we haven't implemented that yet. So there are still some things in the tester that are not quite there, but for simple tools it'll work great, so give it a try and see what you think.

These are the kinds of errors that you'll be able to catch: an output value has changed, an output value is missing or extra, and also an input value has changed (maybe the label changed, the units changed, or the input is missing or extra). You can track down all of those cases, and again, if you convince yourself that it should be like that (I really did remove that input; it's different now), then you click "new golden standard" and force that test case to pass; from then on, the new result is used.

So here's the assignment for you guys. You have a nice tool working now. Actually, we might want to back up and use the very oldest version of the tool that you created, because, again, the tester is not feature complete. I'm not sure it'll work very well with all the fancy stuff, the notes and the groups and all the enable-and-disable conditions you put into your latest spirograph; I'm not sure that works in the tester. So back up to the plain-Jane version of your spirograph that just has three numbers and produces a spirograph plot, and create a test suite for it. What you do, again, is make a tests directory, run your test cases a few times by hand, and copy the run.xml files in. I want you to test three cases: run the tool as a fancy cross, as a flower, and as a palm branch. For each one, take the run.xml, put it in the tests directory, and then try to run the Rappture regression tester on it.
At that point, if you run it, everything should run cleanly and pass. Then edit one of your tests, delete some of the numbers in the test case, and run it again. At that point your test will look weird compared to the current result, and you should at least be able to catch the error. And you'll say: no, no, my tool is working; it's the test that's wrong. So then you can re-goldenize and make it all proper again. So give that a try. Back up to your early version of spirograph (if you don't have one, let me know, because we can put one up on the bootcamp site that you can start from), create those three tests, and mess around with the tester.

Okay, let me show you my solution for the lab assignment here with the regression tester. Here I am in my directory. I can run Rappture and simulate different cases, and I want to build a set of regression tests for various test cases with different values. Normally when I run Rappture it gives me results, but when I close Rappture and look around, I don't see the run files. Jeez, if only I could get my hands on those run files. There are two ways to do that; I'll show both, and there's a summary of the commands below.

One way is to copy them out of your data/results directory. Under data/results there are all these other directories, one per workspace session that I've fired up, and you might wonder which one is current. I can echo $SESSION, and it shows me the current session number, so I can go into the data/results directory for that session, and there are a bunch of run files. Usually the run file with the highest number is the one I just ran, so (jeez, I'm blind) I probably want that last one, the latest one. All right, so there's a run file. In fact, if I want to double-check what it looks like, remember you can do rappture -load and it pops it up. Okay, so that's the run file, and it's an interesting one, because it's 17, 10, and minus 3, and it gives me that funny looking shape. So I'm going to grab that one as a new test case. I'll copy that file from that directory back into my tests directory, and I'll give it a name; I could leave it the same, or I can call it funny-shape.xml. All right, so now in my tests directory I've got four test cases. Actually, I need to make that into a proper test case, and I do that by adding <test> ... </test> in there, with a label and a description. The description should remind you later, when it breaks, what was important about this test case.

The other way to grab the run.xml files is to unset your SESSIONDIR. Then, if I run Rappture, all the cases I run from here on (one and two, okay, so that's another test case) leave the run file sitting right there, so I don't have to go hunt it down in my data/results directory. If I break that link by getting rid of SESSIONDIR, the run file sits right there, and I can move it into my tests directory and give it a different name again. Let me edit that one and give it a test section with the label; I should set the description too, but I'll just skip it. All right, so that's how I generate test cases.
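Here, roughly, are those capture steps as shell commands. The run-file number and the tests/funny-shape.xml name are placeholders, the ./fermi invocation is purely illustrative, and the ~/data/results path and the SESSION and SESSIONDIR variables are the workspace conventions described above.

    # Method 1: fish the run file out of the current session's results directory
    echo $SESSION                        # which workspace session is this?
    ls ~/data/results/$SESSION           # run*.xml files; highest number = newest
    rappture -load ~/data/results/$SESSION/run1234567.xml   # eyeball it first
    cp ~/data/results/$SESSION/run1234567.xml tests/funny-shape.xml

    # Method 2: keep Rappture from whisking run files away at all
    unset SESSIONDIR                     # results now stay in the current directory
    rappture                             # run a case or two...
    mv run*.xml tests/                   # ...and scoop the run files up

    # (Method 3, mentioned earlier: run the program by hand with a driver
    # file, e.g. ./fermi driver.xml, which also leaves run.xml behind.)

    # To restore: export SESSIONDIR=<the old value>, or just open a fresh xterm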
Now, some people were asking me: boy, how do I set my SESSIONDIR back? Well, one way is just by starting a new xterm, because a fresh xterm will have SESSIONDIR set again. Or I can copy and paste that old value and do export SESSIONDIR=... (can I copy and paste? middle mouse button, there we go), and now I've got my SESSIONDIR back.

All right, so now I've got all these different test cases, and if I run rappture -tester, it brings them up; it finds them all. If it doesn't find some of your test cases, it's probably because you forgot to put <test> with the label in there. Anything it finds in the tests directory that has a label on it, it can pull up as a test case. You notice there's no description for that one test case, but everything else has a description. So I can grab one test case, or I can grab all of them and run through them now. And... oh, it's telling me there's something wrong with that test case. Let's take a look. If I look at that test case, it shows me the output is different; the result differs. If I click on it, it shows me that it expected these values and it got a bunch of other new values over here, and you can even see the difference on the plot: the black line versus the red line, right there, is the difference between this and this.

Whenever you see a difference like that, you should always ask yourself: what the heck is going on? You look in your program, and you convince yourself whether or not your program is correct. In this case, my program seems to be doing the right thing. The latest result that I got, the red result, the test result, is smooth and correct, and the old result is actually weird and choppy. You might ask yourself at this point, how did that test ever get in there? Maybe you just never noticed; the last time you ran the test case it looked okay to your eyeballs, but it was wrong. So you say: all right, the test is wrong, my program is correct, and in that case I want to use the current output of my program as the new golden standard. Rappture asks: are you sure the latest results are completely correct? If so, then yes. So now when I run that test, it checks the output against my latest standard.

Rappture will look at all kinds of things. Let's say I run the Rappture builder and open my tool.xml, and suppose I add a boolean control to my tool.xml, set a default value, and save it. Now when I run Rappture, you see I've got the extra control in there. And now, when I run rappture -tester and run all these tests, they're all failing. Why did they fail? Well, it's telling me that the input has changed: these test cases don't specify that new boolean input value. And of course they don't, because back when I created those tests, I didn't have that control. Now I do.
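For reference, the boolean control added in that demo would look something like this in tool.xml. The id and label here are made up, and the structure follows the usual Rappture input schema as I understand it.

    <input>
        <!-- hypothetical new control, added after the tests were captured -->
        <boolean id="smooth">
            <about>
                <label>Smooth the output?</label>
            </about>
            <default>on</default>
        </boolean>
    </input>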
So the question is: is that okay? And again, if it is okay, then I can say, all right, keep that as my new golden standard. Unfortunately, you have to do it for each one: yes, yes, yes, yes. Maybe someday there'll be some kind of batch thing where you can select them all. So now I've re-goldenized my tests; they expect that control now, and if I run them all, they pass immediately, because I changed the interface.

So as you're building your tool and checking everything, you always want to ask yourself: did I make a mistake in my simulator, or did I change something in the Rappture interface? Remember, it's real easy to mess up your code. It's not very hard at all to make what looks like an innocuous change (you just run one and save your program; oops, not that much of a change), run your tests, and they're all failing again. And this time, yeah, the output is actually completely different from what it's supposed to be. Completely different. And it's like: oh yeah, I was debugging something and I added that stupid 2.1 in there; I didn't mean to. So you eventually track down the problem, and you say, oh man, this is wrong, and you fix it and run again (oops, with the tester), and now you can convince yourself that everything's running cleanly.

That's the whole point of the regression tester: to say, hey, I've got all these test cases, and I want to make sure that when the temperature is zero, or the temperature is 500 million kelvin, or the electron volts are negative, my simulator doesn't crash, it gives the proper value, and I didn't make any mistakes by accident. Because it's so easy, right, to change one character in your code and screw everything up.

Oh, I've got another one for you; I should have done this example case. The classic example is, I have a loop variable up here: I set i = 1, or i = 10, for my loop counter. But wait: i is the imaginary number in MATLAB, right? So now if I run my test cases, select all and run: boom. A simple, innocuous statement like i = 1, one you wouldn't think twice about because you were just putting in a loop counter, and all of a sudden you've completely broken your program. And you can look at each case and ask yourself, what did I do? Oh my gosh, what did I do, right? It doesn't even put out a spirograph anymore, let alone the right one.

So when you do stuff like that, be careful. Check your code carefully, and make sure, when you're about to republish your tool (oops, I always forget the tester part), that you go back through your regression test suite and have it run cleanly. Again, my biggest apology is that right now the Rappture regression tester doesn't work for really complicated tools. We need to fix it and finish it so that it works with all the complicated output types, because then it would really work for you guys and you could use it to check everything. But at least you know it's there, and at least you know you should be very scared of any one-line change you make to your code, because the simplest one-line change sometimes can break everything.
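And a minimal MATLAB sketch of that last trap, with made-up names; the behavior is real.

    k = 2*pi;  x = 0.25;

    psi = exp(i*k*x)      % i is still sqrt(-1) here: complex, as intended

    for i = 1:10          % innocent-looking loop counter...
        % ... some bookkeeping ...
    end

    psi = exp(i*k*x)      % i is now 10, not sqrt(-1): silently wrong physics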