So thank you very much for coming. Can everybody hear me? Okay. Thank you. I have a couple of slides of presentation. Since I don't have my badge with me, please let me introduce myself. I'm Valerio. I have a PhD in Computational Science, and I'm currently a postdoc researcher at the University of Salerno in Italy. I would define myself as a data scientist, whatever that means. And of course, I'm also a very geeky person, so I like all this stuff. Please don't ask me to fix your computer, although I'm quite sure somebody will anyway. Yeah, let's get serious.

These are some of the topics I usually work on. I work with machine learning algorithms for information retrieval and text mining in general, and I recently joined a team in Salerno working with Linked Data and Semantic Web technologies, which is very interesting. I usually apply all this stuff to software: my main research field is software maintenance, so I basically apply machine learning algorithms to source code and source code analysis. Of course, I do all of this with Python, my preferred programming language. These are more or less the tools I use every day. In particular, the machine learning tools I use the most, and like the most, are these ones, and they are the tools I'm going to talk about in a few minutes.

So let's get to the point: machine learning and testing. So what? The presentation is organized in two parts. In the first one, we are going to look at common risks and pitfalls related to machine learning models, or at least I'll try to introduce some of the issues you should be aware of. In the second part, I'm going to talk about testing machine learning code: what that actually means and which tools you may use. Before we start, please let me ask you three questions. First of all, how many of you already know machine learning? So you're all perfectly suited for this talk. Do you already know, or have you heard about, testing or test-driven development? Yeah. And have you ever used scikit-learn for machine learning? Okay, perfect. So I'll try to skip the introductory part about what machine learning is.

This is one of the most common definitions of machine learning. It says that machine learning is the systematic study of algorithms and systems that improve their knowledge and performance with experience. I chose this definition because it points out a very interesting aspect of machine learning: the algorithmic part. Basically, machine learning means writing algorithms and writing code. At a glance, machine learning looks like this: algorithms, data, and statistics. In a nutshell, machine learning can be summarized as algorithms that run on data. From the point of view of this talk, this means we have to deal with the testing of algorithms that analyze data, and we need to take that into consideration to perform our testing properly.

Now, a few common examples of machine learning. The first is linear regression. In this case we have the data, the blue dots, and we want to find a function that generalizes, a function that fits all the data.
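As a concrete illustration of that regression setting, here is a minimal scikit-learn sketch of my own (the data and numbers are made up, not from the slides):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic "blue dots": noisy samples around the line y = 2x + 1
rng = np.random.RandomState(42)
X = rng.uniform(0, 10, size=(50, 1))
y = 2 * X.ravel() + 1 + rng.normal(scale=1.0, size=50)

# Fit a function that generalizes over the data
model = LinearRegression()
model.fit(X, y)
print(model.coef_, model.intercept_)  # should be close to [2.] and 1.0
```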
Another very common problem is classification. We have data divided into classes, and we want an algorithm that properly separates the data into the two classes we have. In this case, we have defined a hyperplane separating the two classes. Another well-known technique in machine learning is clustering. In a clustering problem we have data distributed in the space, the blue dots, and we want to end up with an organization of the data like this, for instance: we want an algorithm that is able to identify the different groups in the data.

The first two examples, as many of you already know, are examples of supervised learning. Supervised learning means that the processing pipeline is more or less like this: we have the data, and we transform the data into feature vectors. These feature vectors are fed into the machine learning algorithm we have defined and want to test, together with labels. This is the supervision part, and that's why these kinds of methods are called supervised learning. After we train the model, we want to exercise the model on new data. Basically, machine learning means trying to define a model that is able to generalize its conclusions, and that's the key word: generalization. This is the supervised learning setting.

The unsupervised learning setting is something like this. It's almost the same, but the difference is in the output and in the fact that the supervision is missing: no labels on the data are provided. Let me just get back to the previous slide. The output of a supervised learning model is the expected label: the algorithm is trained on a set of labeled data, and we expect it to generate the correct label for new data coming in after the training phase. In the unsupervised setting, since the labels are missing, the output is different: in general, we may get a likelihood, or a cluster ID identifying the cluster or group the data belongs to.

That is a general introduction to the techniques we are supposed to deal with. scikit-learn provides this kind of cheat sheet, a sort of mind map you can use to decide which technique to apply to your specific problem. This is quite interesting because, as you can see, scikit-learn provides algorithms for classification, clustering, and regression problems, the three examples I presented previously, plus dimensionality reduction, which is another unsupervised learning problem. Here, even if you may not be able to read it, you can find some tips on how to decide which technique to use. First of all, if you have labeled data, you may end up with regression or classification, so a supervised learning approach. If you don't have labeled data, you end up with clustering approaches, because you have no supervision. That is just a very simple tip.
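To make the two settings concrete, here is a minimal sketch of my own, not from the slides, contrasting a supervised classifier trained with labels against an unsupervised clusterer that only sees the feature matrix:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Supervised: the algorithm is trained on labelled data ...
clf = SVC()
clf.fit(X, y)
print(clf.predict(X[:3]))        # ... and outputs the expected label

# Unsupervised: no labels are provided ...
km = KMeans(n_clusters=3, n_init=10, random_state=0)
print(km.fit_predict(X)[:3])     # ... and the output is a cluster ID
```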
But even if you decide which kind of approach you want to use, regression or classification or clustering, whatever it is, you still need to choose a specific technique, because classification itself is a family of techniques. So first you have to decide which model you're going to use, and after that you also have to decide the set of parameters your model should use to best approximate your result. We have a lot of things to decide. Another definition says: machine learning teaches machines how to carry out tasks by themselves. It's that simple. The complexity comes with the details, and in this talk we're going to deal with exactly these kinds of details.

So this is our starting point. We have the historical data we want to use, and we have decided which kind of model we want to use for our problem. We end up with this sort of iterative process, because we want to test whether the model we have built, the model we have decided to use, fits the problem at hand; we want to evaluate the performance of the model; and we want to optimize the model, which in this case means tuning the parameters of the model in order to improve its performance. This iterative process is what we want to deal with in this talk.

And what about the risks related to machine learning? First of all, we may end up analyzing unstable data, so we need to be robust against data that may contain noise. On the other hand, as I said before, machine learning is essentially algorithms, so we need to test whether the code we have written contains programming faults. We may also hit a problem called underfitting. Underfitting means that the learning function we decided to use, and sometimes this means the set of parameters we decided to give our model, is not properly suited for our data: the learning function does not take enough information into account, so the model is not accurate enough to learn from our data. The opposite problem is overfitting: here the learning function does not generalize enough. This is a quite difficult phenomenon to discover, and we will see that there are techniques to deal with this kind of problem. Finally, we have the unpredictable future: we don't actually know whether our model keeps working, so we need to check and track the performance of our model while it is running.

So, how do we cope with these risks? For the problem of unstable data, we have testing: we're required to write tests for our code. If we want to avoid underfitting or overfitting problems, we have a technique called cross-validation, and we will see some examples of that. And for the unpredictable future, we can track precision and recall over time. Do you know what precision and recall are? Okay, no problem, I'll try to explain a bit.

So let's start with dealing with unstable data. Basically, the point is: test your code. Testing your code is one of the things I suggest you do most of the time. In Python, we have a lot of tools for testing; in particular, we have the great unittest module.
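As a quick reminder of what that looks like in practice, here is a toy unittest example of my own:

```python
import unittest

class TestBasics(unittest.TestCase):

    def test_equal(self):
        self.assertEqual(2 + 2, 4)

    def test_raises(self):
        # assertions are also available for exceptions
        with self.assertRaises(ZeroDivisionError):
            1 / 0

if __name__ == '__main__':
    unittest.main()
```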
unittest is based on a set of assertions. For instance, we have assertEqual(a, b), which tests whether the object a is equal to the object b. We have a lot of assertions; the last column in the table refers to the Python version where each one was introduced in unittest. Let me briefly remind you that the unittest module has been extended, improved, enhanced in some respects in Python 3 with respect to Python 2, and I will show you an example of that in a couple of slides. Moreover, we have assertions to test exceptions, assertions to test warnings, and even assertions to test logs. This is an example of how you can use assertLogs: basically, you test whether the output of the logger corresponds to what you expect.

But in the case of machine learning, we need to take into consideration that we are essentially dealing with numbers. In fact, one of the most important features of scikit-learn is that data is represented through matrices. In general, we end up with the feature matrix X represented as a matrix of numbers, and labels y that are basically an array of numbers. So the unit tests we are going to write have to deal with numerical problems: we need to test numbers, and we need to compare arrays and floating point numbers. Here NumPy comes to our help. I don't know if you already knew it, but NumPy has a testing module that includes some additional assertions, for instance assert_almost_equal and assert_approx_equal, and some assertions related to array comparison. We will see a couple of examples.

For instance, if we want to assert that two numbers are almost equal, we may use the assert_almost_equal assertion in the numpy.testing module, and we may specify the number of decimal places up to which the two numbers are compared. In the first case, we compare the numbers at seven decimal places, and the test passes. In the second case, since the last digit is different once the decimal places to take into consideration are eight, the test fails: we get an assertion error that says the arrays are not almost equal to eight decimals, and the actual and the desired values are reported. This is one of the things we need to take into account when we deal with floating point numbers.

Moreover, we may assert that two arrays are equal. NumPy provides two different functions here: assert_allclose and assert_array_equal. The assert_allclose function implements this comparison; it takes some additional parameters, atol, which is the absolute tolerance, and rtol, which is the relative tolerance, and in this case the test passes. If we instead use assert_array_equal, the two arrays are considered different, and this is the assertion error we get: the mismatch is 50%.
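Here is a sketch reproducing the kind of checks just described; the exact numbers are mine, but the assertions are the real numpy.testing ones:

```python
import numpy.testing as npt

# Passes: the two numbers agree up to 7 decimal places
npt.assert_almost_equal(1.00000005, 1.0, decimal=7)

# Fails at 8 decimal places with:
#   AssertionError: Arrays are not almost equal to 8 decimals
# npt.assert_almost_equal(1.00000005, 1.0, decimal=8)

# assert_allclose checks |actual - desired| <= atol + rtol * |desired|
npt.assert_allclose([1.0, 1.00000005], [1.0, 1.0], rtol=1e-7, atol=0)  # passes

# assert_array_equal requires exact element-wise equality; here one of the
# two elements differs, so it fails and reports a mismatch of 50%
# npt.assert_array_equal([1.0, 1.00000005], [1.0, 1.0])
```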
Again, when comparing floating point numbers, we may take into account the so-called ULP, the unit in the last place, which usually corresponds to the machine epsilon. If we want to know what the epsilon is for NumPy floating point numbers, we can get it with np.finfo(float).eps. In the first case here, the test passes: we check whether two numbers that differ by one epsilon are considered equal, and due to the floating point representation, adding one single epsilon keeps the test passing. In the second case, the test fails because we are adding a quantity greater than the epsilon, so greater than the unit in the last place, and the two numbers are considered different: x and y are not equal to 1 ULP, and the max difference is 2.

Finally, numpy.testing is great because it also adds some more tools for your testing. For instance, it has decorators that integrate with the nose testing framework. Just one example: it has test decorators such as the one shown in the slide, slow, which lets you decorate a function to tell the framework that the test is supposed to run slowly; what "slowly" means depends on your personal definition.

Moreover, we have the mock framework, which is included in unittest in Python 3. This is one of the features I was referring to when I said that the built-in unittest module in Python 3 is a bit extended and enhanced with respect to the one in Python 2. In Python 3, you may do something like "from unittest import mock". In Python 2, if you try to import mock from unittest, you get an error; if you want to use mock in Python 2, you should do "pip install mock", which installs the mock package available on PyPI.

Let's see an example of mock. Do you know what a mock is? Okay, no problem. Basically, here we define a class, NuclearReactor, that calls a function, factorial. The factorial function prints a message, and this message is just used to check whether the actual code is exercised or not; it then calculates the factorial of the given input. In the first test, we mock the factorial function; in the second test, we don't. This is the output. In the first case we have mocked the function, so we assert that the output of the mock is six, the value we have already defined, and no message has been printed, because the actual code has not been exercised: it has just been mocked. In the second case, we get "working" printed, because we are exercising the real code, and we get an assertion error, even though a factorial of three is not supposed to raise any exception. Okay, is it clear? Okay, thank you.
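Reconstructed along the lines of the slides (names and details are approximate), the mock example might look like this:

```python
from unittest import mock  # Python 3; on Python 2: pip install mock

def factorial(n):
    print("working...")          # printed only when the real code runs
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result

class NuclearReactor(object):
    def do_work(self, n):
        return factorial(n)

# Test 1: factorial is mocked, so the real code is never exercised,
# nothing is printed, and the return value is the one we defined.
with mock.patch(__name__ + ".factorial", return_value=6) as fake_factorial:
    assert NuclearReactor().do_work(3) == 6
    fake_factorial.assert_called_once_with(3)

# Test 2: no mock, so the real code runs and "working..." is printed.
assert NuclearReactor().do_work(3) == 6
```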
So that was the part about unstable data. What about model generalization and overfitting? I don't have the time to explain all the code, so I'll just show you the example. The two most important parts of this code are these ones: we randomly generate some data, and we apply the same algorithm, in this case linear regression, to these data using different features, polynomial features in particular. The polynomial features have been generated by the PolynomialFeatures transformer in the scikit-learn package, with different degrees: we try features of degree one, four, and fifteen, and test what the performance of each model is.

This is the output. The blue dots are the data; in green you see the true function, and in blue the function approximated by the model. In the first case, we have a model which is underfitted: the model, a linear model with linear features, is not taking enough information into account. In the second case, we have a very good model, perfectly suited for our data. And in the third case, the model is overfitting, which is not a very good approximation at all.

Now, if we look at the degree-four case, it seems that if we define a model with polynomial features of degree four for this particular data, we are done: we have the perfect model. But this is not the case, because this particular model has been exercised only on training data, and in that sense it is still overfitted. What does that mean? It means that if we consider just the training data, we fit it perfectly, but we have no evidence that the model generalizes: if the model sees new data after having been trained too closely on the training data, it may not generalize at all.

How can we cope with this kind of problem? One extremely important part of model evaluation is a technique called cross-validation, and here the scikit-learn package helps us with a lot of built-in functions for cross-validation and model evaluation. In the simplest case, we apply a very basic form of validation called train and test split: we take the input data and split it into different sets, the training set and the validation set; we train the model on the training data and we evaluate the prediction performance of the model on the validation data. One way to see the prediction performance of the model is the so-called confusion matrix. In this case, we have a classification problem with three classes, so a three-by-three multi-class problem, and we can see three misclassifications.

Another, more complicated example, which unfortunately I don't have the time to show you in detail, is this one: a KNeighborsClassifier applied to some data. Let me just conclude with this, because it is very interesting: we want to compare the performance on the training data and on the cross-validated data. We apply here a function called ShuffleSplit. We have 150 samples, and these are the aforementioned functions generating the true function, the x and y data, as in the regression problem explained before.
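Putting the pieces together, here is a minimal sketch of my own, with made-up data in the spirit of the slides, that uses a train/test split to expose under- and overfitting (note it uses the modern sklearn.model_selection import path):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data around a "true function", as in the slides
rng = np.random.RandomState(0)
X = rng.uniform(0, 1, size=(30, 1))
y = np.cos(1.5 * np.pi * X.ravel()) + rng.normal(scale=0.1, size=30)

# Hold out part of the data: train on one split, validate on the other
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

for degree in (1, 4, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    print(degree,
          mean_squared_error(y_train, model.predict(X_train)),  # training error
          mean_squared_error(y_test, model.predict(X_test)))    # validation error
```

On data like this, degree 1 shows a large error on both splits (underfitting), degree 4 does well on both, and degree 15 shows a near-zero training error but a large validation error (overfitting).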
Then we want to compare the learning curves. Basically, a learning curve shows the training score against the cross-validation score as the training set grows. This is the curve for the degree-four polynomial: we see that when we enlarge the number of training examples we consider, the gap between the training score and the cross-validation score is basically reduced to zero, so it's a very good model. In the case of the polynomial of degree one, which was the underfitting model, the gap between the cross-validation curve and the training curve stays large, so it's not a good model.

Finally, some conclusions. I have more slides, but I don't have the time to show them, I'm sorry. In conclusion, these are the most important pieces of advice. It is always important to have tests for your code, especially if you deal with numerical data and numerical algorithms. Another suggestion, just a hint, a reference for you to look into, is something called fuzz testing. Fuzz testing is very interesting for numerical analysis because it basically feeds randomly generated data to your code; the technique is usually used to test the robustness of your code, that is, the behaviour of your algorithms on randomly generated data. Okay. So thank you a lot for your kind attention.
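As a closing illustration of the fuzz-testing idea mentioned above, here is a minimal sketch of my own: feed randomly generated inputs to the code under test and assert invariants that must hold for any valid input (the normalize function here is just an example):

```python
import numpy as np

def normalize(x):
    """Scale a vector so that it sums to 1 (the code under test)."""
    return x / x.sum()

# Fuzz test: exercise the code with randomly generated data and
# assert properties that must hold for *any* valid input.
rng = np.random.RandomState(0)
for _ in range(1000):
    data = rng.uniform(0.1, 100.0, size=rng.randint(1, 50))
    result = normalize(data)
    np.testing.assert_allclose(result.sum(), 1.0)
    assert np.all(result >= 0)
```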