Okay, so good afternoon everyone, and I wish to thank the organizers, because this is a very nice opportunity for me to join this course and to present something that runs parallel to the topics we will hear during the week. I will talk about machine learning and neural network applications. I will introduce these topics, but I will focus mainly on artificial neural networks, because they are probably the most useful, simple, ready-to-use tools you can find, and they are closely related to other methods you will learn during this week.

What is machine learning, what is artificial intelligence, and how do they relate to other disciplines in similar areas? In this scheme you can see that basically we have data somewhere, and we have sciences that deal with the extraction and discovery of knowledge in large data sets. Different approaches are possible, and statistics is probably the most popular, the most widespread. When you need to recognize patterns, you are partly dealing with statistics and partly dealing with something else, and that something else lies in the field of artificial intelligence and machine learning. Not everyone agrees on the definitions, but usually machine learning is considered a kind of subset of artificial intelligence, which is a broader field. Broader because it also deals with issues that are not computational, issues related to cognition or philosophy, whatever you can think about intelligence and the way we interact with machines. If we look at computational issues only, then we are dealing with machine learning, and artificial neural networks are part of this latter group. They are tools useful for artificial intelligence, and they were developed as tools for understanding intelligence, but then they became something closely related to machine learning, that is, to solving problems. All the activities we can carry out using data as a whole are something we can call data science, while computer science also involves the way we manage data, which is a broader context.

As for the relationship between statistics and machine learning: in the slide we just saw there was no overlap between the two, but actually the relationship is a little more complex. Leo Breiman was a statistician and a machine learning specialist. He was convinced that statistics deals with data modeling, while machine learning deals with algorithmic modeling. Statistics focuses on the properties of data and the tests we can do with data; machine learning deals with the way we can use and process data, and with the way we can exploit algorithms to solve problems.

The situation we are facing is that the amount of data we have to deal with every day is increasing very rapidly, and the knowledge we have to deal with these data is also increasing, but not as fast as the amount of data. The real problem is that our expectations are increasing much faster, and everyone thinks we have a solution, a reason, a good idea about just about everything. This is of course not true; it's not possible, because our knowledge is limited. So we have a gap that keeps increasing, and in order to bridge this gap we need some help, and this help can be provided by machine learning and artificial intelligence. Let's think about these two as more or less the same thing, more or less. Okay, what is machine learning, and what are the main techniques in this field?
The real list of techniques is much longer; what you see here are just a few of the main types. Classification and regression trees: I mean binary trees, or even trees with multiple splits. If there are taxonomists among you: when you need to identify a species you have a key, and you have to answer a number of questions sequentially, and this is exactly what a classification tree does. Artificial neural networks are what we will be discussing today. Support vector machines, genetic algorithms, and case-based reasoning are other techniques; if you are curious about them, I think we will have the opportunity to discuss them after this talk, maybe this evening, and I would be very happy to answer all your questions. But we have limited time, so we must focus on our topic.

I want to mention, however, the ensemble methods, among which random forests are probably the most popular and the most useful, because they are probably now the most useful algorithm we can use for classification tasks. I am not saying the best algorithm, because there is a nice point of view, which I will mention later on, about what "the best algorithm" means. Random forests are basically a combination of classification trees. We accept the idea that if we can develop something that learns reasonably well, but not really very well, and we then combine a large number of these weak learners, the combination will be much more effective than any individual learner. So 100 classification trees are able to provide much better classifications than a single classification tree, even the very best one you can train. And there are other methods that are not based only on, let's say, averaging the results of a single type of model. These are based on a kind of sequential processing: you develop a first model, then you look at the residuals between your predictions and the true results, your targets, and then you develop a second model that tries to narrow the gap between what you predict and the true data, and so on (a toy sketch of this idea follows this passage). But even in this case, I would be happy to answer all your questions after the talk.

So let's focus on artificial neural networks. What is a neural network? I guess you probably already saw something like this, or maybe just read something about neural networks in a newspaper, a scientific review, or even in some scientific papers. Basically, a neural network is a set of very simple processing units, and these processing units are connected with each other in a very complex way; this is what makes a neural network a powerful tool.

What is the history of neural networks? The concept is very old, more than a century, because the idea of the way the human brain is able to learn dates back to the last decades of the 19th century. But it was more or less during the Second World War, and after it, that the first practical implementation was obtained. I mean a practical implementation of a theoretical idea: it was not really able to tackle a real problem, it was just a kind of concept demonstration, nothing more. The first rules for the functioning of artificial neural networks, the first very simple neural networks, date back to the 60s. But something happened during the 60s: two very influential authors showed that these perceptrons, these very simple neural network architectures, had limitations. They were not able, with the knowledge of that time, to solve practical problems.
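To make the sequential idea concrete, here is a toy sketch in R of residual fitting with weak learners. Everything in it is an illustrative assumption (simulated data, a one-split "stump" as the weak learner, a fixed damping factor of 0.3); it is not any specific boosting implementation.

```r
# Sequential residual fitting: each new weak learner is trained on the
# residuals left by the ensemble built so far.
set.seed(1)
x <- runif(100)
y <- sin(2 * pi * x) + rnorm(100, sd = 0.2)    # simulated data

stump <- function(x, r) {                      # weak learner: one split on x
  best <- NULL; best_sse <- Inf
  for (s in quantile(x, seq(0.1, 0.9, 0.1))) {
    fit <- ifelse(x <= s, mean(r[x <= s]), mean(r[x > s]))
    sse <- sum((r - fit)^2)
    if (sse < best_sse) {
      best_sse <- sse
      best <- c(s, mean(r[x <= s]), mean(r[x > s]))
    }
  }
  best                                         # split point, left/right means
}

pred <- rep(mean(y), length(y))                # start from a very weak model
for (m in 1:100) {
  r <- y - pred                                # residuals of current ensemble
  b <- stump(x, r)
  pred <- pred + 0.3 * ifelse(x <= b[1], b[2], b[3])  # damped correction
}
mean((y - pred)^2)                             # error shrinks over the rounds
```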
The consequence of those limitations was that during the 70s most people abandoned the study of neural networks. In the 80s there was a new start. It was not exactly a fresh start, because under the ashes there was still some activity, so it was a new start based on previous knowledge. Teuvo Kohonen published the first paper about self-organizing maps in 1982, and we will discuss self-organizing maps today. In 1986, Rumelhart and his co-workers published the paper about the error back-propagation algorithm, which was finally able to make a neural network work and solve problems. That was the start of a new era in neural network applications. In 1991, Colasanti, who was not an ecologist, wrote in a paper that he was sure there would be ecological applications of neural networks, and I guess that no ecologist read that paper. But in the mid 90s the first ecological applications, I would say the first biological applications, the first applications in our field, appeared, and since then the number of applications has been constantly increasing. The next milestone I want to mention is 2006, when deep learning appeared. Deep learning is related to different learning algorithms, able to train much more complex structures that mimic the functioning of the human brain more closely than the neural networks almost everyone is using now. So this is our future, but it is something that already exists.

Of course, there is a close analogy between nervous systems and artificial neural networks. What makes them similar is that both are made of many units, and these units are connected in a very complex way. But if we think of the most complex artificial neural network existing at the time of this talk, it is probably less complex than the nervous system of a fly. So there is still much, much work to do.

What kinds of problems can we tackle with neural networks? The most common is function approximation, say, empirical modeling: if you can do something using a general linear model or something like that, you can probably do it better with a neural network. Then there are pattern recognition, classification, clustering, and forecasting. Classification is more or less like pattern recognition from my viewpoint: there you know the categories you want to recognize, while in clustering you don't know whether there are different categories in your data or not. Forecasting is just another way of modeling, and so on.

The learning procedure can be supervised, if you already have the answers you expect from the neural network: you can teach the neural network the right answers. It is unsupervised when you want to discover the structure in your data. Learning based on reinforcement is more useful in industrial applications; the Google car that drives by itself is probably a good example of reinforcement learning, because the car learns to get the highest reward from the possible choices.

If we focus on the most popular, most typical applications: pattern recognition deals with groups of different objects, observations, samples, and you want, for instance, to determine the species of an individual, of a specimen; you want to detect the species, or some other property of your data. Clustering is when you want to find the best way to separate homogeneous groups and obtain homogeneous subsets of a larger set of observations.
And this can be done in an unsupervised way: here we don't have the right answers. For pattern recognition we must have the right answers, and we have to teach the neural network how to recognize our objects properly. Another category is regression: you know the values of one, two or many independent variables, and you want to get estimates of one or many dependent ones. You want to fit a series of data and extract a more general way of relating these variables; this too is a supervised way of learning. The search for an optimal behavior, instead, is based on reinforcement, as I was just saying. Okay, I will skip my own applications, because I think they are not really relevant and we are running out of time; they were just meant to give you a broader idea.

Let's see how a neural network is designed. This is a three-layer perceptron, so a multilayer perceptron with three layers. We have some nodes, or neurons, that are the input nodes; here we put the values of the variables we use as predictors. Usually the data are scaled into a 0 to 1 or -1 to 1 interval, because we want to normalize our data (a small scaling sketch follows this passage). The output is the rightmost node, which produces a normalized value that must be converted back to the original units if you want. Here in the middle we have a number of connections, synaptic connections, and each connection is associated with a weight. Then there are the bias nodes: two in this case, because we have three layers, and with n layers we have n - 1 bias nodes. These nodes work like the intercept in a regression; we only need them to move the output of the network up and down.

Okay, now let's have a closer look at the so-called hidden layer, what lies between input and output. The hidden layer contains nodes, neurons, that collect all the inputs, sum them, and use this sum as the argument of an activation function that turns this linear combination of inputs into the output, the response, of the node. And actually this combination of a linear combination of inputs with a non-linear transformation is what makes a neural network work. The activation function is usually a sigmoid like this one. It is not the only possible activation function, there are others, and in some cases the output unit has a linear activation function, so it just performs a linear transformation.

What has been demonstrated is that with a neural network like this, provided the number of nodes in the hidden layer is large enough, and provided you have enough data to train the network, you can approximate any function to the accuracy level you desire. This was proved in 1989, and then in 1991 it was proved that this property does not depend on the particular activation functions, but on the three-layered structure: the hidden layer is the root of this property, provided it has a non-linear activation function.

Okay, how does a neural network like this learn? We pass examples to the neural network, and we know the answers. Our goal is to make the artificial neural network learn, but learn to generalize, not learn by heart; this is the most relevant problem, and we will discuss it in a minute. Testing data, that is, a data set used only to test whether the neural network works properly, are always needed. And this is something we should use independently of the method we are dealing with, if we want to compare different methods and avoid circular reasoning.
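As a small aside, the scaling just mentioned is typically a min-max transformation. A minimal sketch in R, with made-up numbers; in practice the minimum and maximum should come from the training set, so that new data are scaled consistently.

```r
# Min-max scaling to [0, 1] before the network, and back-conversion after.
scale01   <- function(v, lo = min(v), hi = max(v)) (v - lo) / (hi - lo)
unscale01 <- function(s, lo, hi) lo + s * (hi - lo)

richness <- c(12, 25, 40, 7, 33)            # illustrative raw values
s <- scale01(richness)                      # network inputs/targets in [0, 1]
unscale01(s, min(richness), max(richness))  # back to the original units
```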
I won't go into the details of the back-propagation algorithm, because we don't have enough time. But basically, we initialize the weights randomly, or almost randomly; almost randomly means that we can narrow the range of the random values. Then we pass a pattern to the neural network using these random weights, and we obtain an output value. We compare the output value with the known target value, and on the basis of the difference between the two (shown here in red), and of the derivative of the error with respect to the weights, we can define how large the change of each weight must be, and so on. When the error is small enough, we exit the procedure (a minimal sketch of this training loop follows this passage).

This is the theory. In practice, this is what happens. Imagine you have elevation as the independent variable and species richness as the dependent variable, a very simple model: one input, one output, only two neurons in the middle; bias nodes are not shown here. Let's start the training. It may have skipped, but we can start over. You can see that the red line is the output. Okay, now it works. This is the output of one of the hidden nodes, the blue one is the output of the other hidden node, and the red one is the output of the network. As you can see, the weights connecting the hidden layer to the output are negative, and in fact the shape of the curve is reversed. But basically, you can see that the longer you train, the better the red curve fits the data.

This is a very simple example, and it seems that everything works very nicely, but there are problems. There are problems because in order to get the best weights we have to look for a minimum in the error surface, the surface that relates an error measurement, let's say the mean squared error, to the weights. This is only a line, but you can imagine that in two dimensions it is a surface, and in more dimensions a hypersurface. The problem is always the same: you have to start from a random guess, and you can find yourself in a local minimum. This is not going to work, so we have to try again. You can be a little luckier: this is a better solution, but still not the best one. So the training procedure needs some tricks to help the network learn. For instance, when the training gets stuck at a point like this, you can add some noise to the weights: you are basically shaking your neural network, and the learning can start again. If you are lucky enough, you can find the true minimum, the best solution, but it is not so easy. This means that when you train a neural network, you always have to train again and again and again, as long as you can, and you will find the best solution only if you have enough time and computational power.

During the training procedure, if you look at the error on the training set, the data you passed to the network for teaching, the error decreases steadily: the longer you train, the better you get. But if you look at the error on a validation set, which is not passed to the neural network as examples, this error will decrease and then start increasing. Why? Because somewhere there is a minimum, and up to that point you are training a model; from that point on, you are training a memory, something that learns only the patterns you show it. Imagine a person in a large library, who can learn everything that is in that library, but who has never been in the real world: the knowledge that person can collect is only part of the knowledge you get by combining theory, examples, and real-world experience. So we have to combine these two kinds of information, and we have to stop the training when the error on the validation set is at its minimum.
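The training loop referred to above can be sketched as follows for the one-input, two-hidden-node, one-output example. This is a bare-bones illustration under assumed conditions (toy data, a sigmoid hidden layer, a linear output, plain per-pattern gradient steps), with no early stopping or restarts.

```r
# Error back-propagation for a tiny 1-2-1 network with squared error.
sigmoid <- function(z) 1 / (1 + exp(-z))

set.seed(42)
x <- seq(0, 1, length.out = 50)                  # e.g. scaled elevation
y <- 0.2 + 0.6 * exp(-(x - 0.5)^2 / 0.02)        # toy "species richness"

w1 <- runif(2, -0.5, 0.5); b1 <- runif(2, -0.5, 0.5)  # hidden weights, biases
w2 <- runif(2, -0.5, 0.5); b2 <- runif(1, -0.5, 0.5)  # output weights, bias
eta <- 0.1                                       # learning rate

for (epoch in 1:2000) {
  for (i in sample(seq_along(x))) {              # patterns in random order
    h   <- sigmoid(w1 * x[i] + b1)               # hidden node outputs
    out <- sum(w2 * h) + b2                      # network output
    err <- out - y[i]                            # difference from the target
    dh  <- err * w2 * h * (1 - h)                # error sent back to hidden layer
    w2  <- w2 - eta * err * h;   b2 <- b2 - eta * err
    w1  <- w1 - eta * dh * x[i]; b1 <- b1 - eta * dh
  }
}
```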
There are some tricks that can help avoid this problem. For instance, imagine you have this kind of bivariate problem, more or less like the one we just saw, and we expect to train a model with this kind of shape. But if we overfit the model, if we train the neural network too much, what we obtain is like this: it is like using a very high-degree polynomial. You can fit whatever you want, but it doesn't have a meaning. So what can we do? We can use something we call jittering: we add some noise to the data we have, so we use not a single point but a kind of interval, and we fit those intervals; fitting the intervals makes the response much smoother than fitting the points (a minimal sketch follows this passage).

This is a practical consequence of overfitting. This is a typical problem, primary production at different levels of phytoplankton biomass, and this surface shows the relationship between irradiance, photic zone depth (so a kind of transparency), and production. Obviously, a shape like this is not likely to happen: you can't have two maxima in primary production for the same level of irradiance. There is no reason for that. It depends on transparency, but you cannot have two maxima, just one. You could have two minima in that curve, because you can have the optimum at an intermediate level of transparency; it doesn't make much sense, but in theory it can happen. But for sure you cannot have surfaces like this, too complex, with no biological meaning in them. This is what happens when we overfit a neural network.

To avoid overfitting, we discussed early stopping and jittering, but we can also, for instance, pass the training patterns in random order, so the neural network does not learn the sequence of the examples you provide. Or we can use weight decay. This is a smart trick that involves a steady decrease of the weights; basically it is like slowing down with your car: it is easier to drive if you slow down, and this does the same trick with the training. And the last one, from my point of view the most interesting, is that you can add to the training procedure some penalties if the solution you obtain is not ecologically or biologically sound. If you recall the picture I just showed you, the surface representing primary production as a function of irradiance and transparency, that surface must have only one maximum; there are no other possibilities. So if you compute a partial derivative of that function, of that surface, and you apply a penalty to a network that does not comply with the shape you are expecting, you can drive the learning using a biological constraint. This is very interesting for ecologists.

In general, the first point is somewhat trivial, but it is important: we don't want to deal with variables that are not related at all, not even nonlinearly, to our targets. If we have this kind of problem, there are other methods in machine learning that are not sensitive to irrelevant inputs, but neural networks are sensitive. And if we want to model a continuous response, we want to process something that is smooth and continuous, where small changes in the input produce small changes in the output.
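A minimal sketch of jittering, with the noise level as an illustrative assumption to be tuned to the data; weight decay is summarized in the final comment.

```r
# Jittering: replace each training pattern with several noisy copies, so
# the network fits an interval around each point rather than the point.
jitter_data <- function(x, y, copies = 10, sd_noise = 0.02) {
  data.frame(x = rep(x, each = copies) +
                 rnorm(length(x) * copies, sd = sd_noise),
             y = rep(y, each = copies))
}

# Weight decay, in one line: shrink every weight a little at each update,
# e.g. w <- w - eta * (gradient + lambda * w), with a small lambda.
```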
And of course, we must have enough data; enough data is the main requirement for a neural network, in any case. We need at least two sets of data: one for teaching the neural network, the training set, and one for validating during the training. Then, if we want to compare different models, to compare the neural network with other models, we need a third set, the test set.

But let's see how the computational part of what we are discussing works. This is a neural network similar to the one we saw in a previous slide; imagine you have 0.25 as the input value. What we do is just compute the linear combination from these two nodes to the green one, and the linear combination is 0.449; you can see the computation here. Of course, this is 1 times 0.57, where the 1, the output of the bias node, is omitted. We pass this value to the activation function, and the output is 0.62. Now we know that this node has an output of 0.62, and we can do the same with this node and with the output node, and the overall result is 0.55. So our function turns the input value 0.25 into the output value 0.55. The computations are very simple, and you can write 10 or 12 lines of code to run your neural network, once the weights of the network have been defined, of course (see the sketch after this passage). So it is very easy to run a trained network; defining the weights is a little more complex, but not too much. This part is only for normalizing the data before passing them to the neural network. So computationally it is very simple.

Is the training algorithm we use really critical? We just talked about error back-propagation, but there are other algorithms that work as well or, in some cases, better. The truth is that for ecological applications, which are not real-time applications, the training algorithm is not critical. If we stick with error back-propagation, we can usually get the best result we can hope to get from a neural network.

Are there other types of neural networks, or only the multilayer perceptron? No, there are many types. I want to mention very quickly the radial basis function networks, whose intermediate units do not have a logistic activation function but some other type of bell-shaped function. The difference is that the activation depends on the distance between each of these processing units and the input array of values; so it is something that recognizes a pattern by activating only a few of these units, depending on their distance from the input pattern. The second step is just like the multilayer perceptron. I just want to show you this slide: imagine you want to find a way to separate two groups of objects, a typical classification problem. A multilayer perceptron finds this kind of boundary; a radial basis function network finds this other kind of boundary, because it is based on a number of small radial responses, and you can combine radial responses in a very complex way.

If you want to deal with time series, if you want to get forecasts, there are some recurrent architectures that feed some context neurons, which are not input neurons but work as a kind of memory of the previous states of the network; this can improve the ability to predict a time series. This is the so-called Elman network, and this is the Jordan network. The difference is that in the Elman network the context neurons are fed by the hidden neurons, while in the Jordan network they are fed by the output neurons, but the way they work is very, very similar.
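And here is the sketch announced above: running a trained three-layer perceptron really takes only a few lines of R. The weights below are placeholders, not the values on the slide.

```r
# Forward pass of a trained 1-H-1 perceptron: linear combinations + sigmoids.
sigmoid <- function(z) 1 / (1 + exp(-z))

mlp_forward <- function(x, w1, b1, w2, b2) {
  h <- sigmoid(w1 * x + b1)      # hidden layer (each bias node contributes 1 * b)
  sigmoid(sum(w2 * h) + b2)      # sigmoid output node, in [0, 1]
}

# e.g. with two hidden nodes and illustrative weights:
mlp_forward(0.25, w1 = c(0.9, -1.2), b1 = c(0.57, 0.8),
                  w2 = c(1.1, -0.9), b2 = 0.2)
```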
I want to show you just one example. My experience with neural networks started in 1996 with this paper. It was a comparison between a typical linear model for primary production in Chesapeake and Delaware Bay and two neural networks. This is the model that combines the data from the two bays, and this is the neural network that combines the data from the two bays; this is the model that distinguishes between the bays, and this is the neural network that is able to distinguish between the bays. You see, there is an input node that takes the bay, Delaware or Chesapeake, as an input: it is a kind of switch between two different ways of functioning. And of course there was a huge difference in accuracy. There is a story about this, but we don't have time, and I have to skip it.

Let's talk for a minute about how we can open these black-box models by sensitivity analysis. There are several methods, and I mention some of them. The easiest one is just to look at the values of the weights: larger weights probably mean more relevant variables. It is not very effective. You can also look at how the neural network behaves if you take one variable at a time and make it vary from its minimum to its maximum, from zero to one with normalized data. I always use a different method, the perturbation of the input patterns, and there are other methods based on the partial derivatives of the response of the neural network; this is something that is probably not optimized yet. What you can obtain is a ranking of the importance of the variables, and you can try to figure out their role, but you are not able to figure out how they interact. So second-order effects are out of your reach, and this is a serious limitation (a minimal perturbation sketch follows this passage).

Of course, you can also look at the amount of change in the error as a function of the amount of perturbation you applied. In some cases, you will discover that a small perturbation does not affect the error. You may think the network didn't learn. No: this is good evidence that the network is able to generalize, because if it doesn't respond to changes that are not relevant, you are probably dealing with a good, more robust solution.

The ecological problem is that there are some parts of the space of our data where perturbation doesn't make sense. If I do sensitivity analysis by perturbation, and I am dealing with elevation and temperature in rivers, does it make sense to perturb an input so that, at a very high elevation, the temperature is very high? It is not possible. And it is not possible for the temperature to be low close to the mouth of the river. So, from an ecological point of view, the perturbation of the data must be smarter than just adding or subtracting something. There is some work going on, and a PhD student of mine is working on these issues now; it is almost ready.
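The perturbation method can be sketched as follows. `predict_net` stands for whatever prediction function your trained model provides (an assumed name, not a specific package API), and `X` is a matrix of normalized inputs, one column per variable.

```r
# Perturb one input variable at a time and measure how much the network's
# output changes; a larger change suggests a more relevant variable.
perturbation_importance <- function(predict_net, X, delta = 0.05) {
  base <- predict_net(X)                 # outputs for the unperturbed inputs
  sapply(seq_len(ncol(X)), function(j) {
    Xp <- X
    Xp[, j] <- Xp[, j] + delta           # shift a single variable
    mean((predict_net(Xp) - base)^2)     # mean squared change in the output
  })
}
# Sorting the result gives the ranking of variable importance noted above.
```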
Okay, a few minutes, ten minutes, to talk very briefly about another type of neural network, the self-organizing maps. They use unsupervised learning in most cases; there are some types of self-organizing maps that actually use supervised training. A SOM, as the acronym goes, is basically a combination of units that have the same structure as the input data. That means that if you are dealing with a list of species, typical community data, then the structure of a unit in a self-organizing map is a list of species with abundances. What you can obtain is something that works like a clustering algorithm, if you use only a small number of units, or something that resembles an ordination, if you use a larger number of units. If you think about the color picker in any office software, this is what a self-organizing map does: it puts colors that are similar to each other close to each other.

Okay, how does it do that? Imagine you have your list of species, in this case P samples, P observations, and N species. This is the map we want to train. We can use rectangular maps or hexagonal maps, smaller ones for clustering, larger ones for ordination; a hexagonal map has a better topology, but a rectangular one is good as well. Let's imagine we have this map, and each square is an array of values, like this one. We initialize the map with random values, then we pick one of our samples at random and we look for the unit of the map that most closely resembles that sample. Of course it will be different, but we can find the closest unit. Once we have found the closest unit, the best matching unit, we change its values, which again are exactly a list of species in this case, in order to make that unit a little more similar to the sample. We also change the units in a neighborhood, but we change them less than the best matching unit. And we go on with this kind of training, on and on, until we have a structure that doesn't change anymore.

At that point we project all the original observations, our samples, onto the network. Usually this is done using Euclidean distances; most of our applications, all the applications I know, use that distance, but it is possible to use any distance if you want to. Bray-Curtis? Okay, Bray-Curtis. In order to project the samples, we only have to find the minimum distance: this is the structure of the SOM unit, that is the structure of the sample we want to project, and we just have to find the best matching unit.

This is a real application with a number of samples, where the structure of each unit is a list of species; in this case, each species has a probability of occurrence. And I want to show you this example, about the butterflies of the small islands around Sardinia. You know island biogeography, the islands, the mainland, the whole theory; that's the idea. We want to understand how the species are distributed in this case. We can train the self-organizing map and project the samples from the different islands onto the map, and this is what we obtain. In this unit you see the real data, presence or absence, and the values of the self-organizing map, continuous or binarized, and all of them match. So this is a very good representation of the island butterfly data.

What can we do with this map? A number of things. For instance, we can explode the map and pick each unit (this is just an example), compute the distance between that unit and all the neighboring units, and use different shades of gray to represent this distance: the average distance gives the shade of gray of the unit. Then we can decide whether we want to keep all the intermediate units or discard them, and we can obtain a representation where lighter shades of gray mark regions of the map where all the units are similar to each other, and darker shades form kinds of ridges that separate different areas of the map. (A minimal sketch of the SOM training and of this distance map follows this passage.)
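A minimal sketch of the whole procedure just described: training a small rectangular SOM with Euclidean distances, and computing the distance-to-neighbors shades of gray. The map size and the learning-rate and radius schedules are illustrative choices; `X` is assumed to be a samples-by-species matrix.

```r
# Train a rectangular SOM: pick a random sample, find its best matching
# unit, move that unit and its neighborhood a little toward the sample.
som_train <- function(X, rows = 5, cols = 5, steps = 5000) {
  grid <- expand.grid(r = 1:rows, c = 1:cols)             # unit positions
  W <- matrix(runif(rows * cols * ncol(X)), rows * cols)  # random codebooks
  for (t in 1:steps) {
    x <- X[sample(nrow(X), 1), ]                     # one random sample
    bmu <- which.min(colSums((t(W) - x)^2))          # best matching unit
    alpha  <- 0.5 * (1 - t / steps)                  # decaying learning rate
    radius <- max(1, max(rows, cols) / 2 * (1 - t / steps))
    gdist  <- sqrt((grid$r - grid$r[bmu])^2 + (grid$c - grid$c[bmu])^2)
    h <- exp(-gdist^2 / (2 * radius^2))              # neighborhood weights
    W <- W + alpha * h * sweep(-W, 2, x, "+")        # move units toward x
  }
  list(codes = W, grid = grid)
}

# Mean distance from each unit's codebook to its grid neighbours:
# light = homogeneous region of the map, dark = ridge between groups.
u_matrix <- function(codes, grid) {
  sapply(seq_len(nrow(grid)), function(i) {
    nb <- which(abs(grid$r - grid$r[i]) <= 1 &
                abs(grid$c - grid$c[i]) <= 1 &
                seq_len(nrow(grid)) != i)
    mean(sqrt(colSums((t(codes[nb, , drop = FALSE]) - codes[i, ])^2)))
  })
}
```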
Is this map related to the ordination techniques we discussed this morning? Yes, of course. This is a principal coordinates analysis based on Euclidean distances, and this is the self-organizing map based on the same data, with Euclidean distances. The circles around the labels, say MADDA and ASI, Maddalena and Asinara, show samples that are in the same unit, or here in the same group. Look at this group of observations in the upper right corner, and these are in the lower part, and so on. You can see that the structure is obviously the same.

So why should we use a self-organizing map, if we can obtain more or less the same results with a well-known and well-functioning tool? Because we can do something more with a self-organizing map. We can do more or less the same things even with ordination techniques, but there are some advantages, as you will see tomorrow when we practice. For instance, I can represent the density or the probability of presence of a species over the map: Vanessa atalanta is very abundant, or frequent, on this island, and less abundant on those. This is very easy, and you could use a bubble plot on an ordination in the same way. But you can also add other variables to the map, because you can interpolate external variables and overlay them on the map, and then you can discover something about what makes the species behave that way. This butterfly is typical of the higher belt of the Mediterranean vegetation, so not at sea level, only from 100 meters up. This is the maximum elevation of the islands, and that is the frequency of that butterfly, and there is a close relationship. We can do many other things, and I guess we will see something more tomorrow.

But I want to use these last minutes for some take-home messages. Multilayer perceptrons and self-organizing maps are very useful. They are not magic: you can tackle the same problems with other tools. But these tools allow some, let's say, craftsmanship: you can change the way they work, you can adapt them to your problem, you can use them in very different ways. There is no one right way to use these tools; you have to do some experiments, you have to adapt them to your needs.

There are R packages for almost everything you can do with artificial neural networks. The problem is that when you have big data, in the sense of large datasets, you will discover that, especially with neural networks, R can become a bit too slow. So if you want to do some heavy work with your neural networks, you will end up writing and compiling your own code, and the same holds if you want to implement some of the methods we just discussed, and there are many others. So R is a good environment for the main types of network, for standard runs, for experimenting, just to get started; for 99% of the people it is good enough. If you want to do some heavier work, you need something else.

Probably most of you were not familiar with this kind of tool. Now you are a little more familiar, and tomorrow I hope you will have the possibility to get started, tomorrow or next month if you want. What you cannot learn quickly is to add the ecological thinking, because you must always be first an ecologist, and then a modeler or data scientist.
If you ever worked with someone who was not an ecologist, someone very good at dealing with numbers and algorithms but with no ecological knowledge, then you know that something is missing. Sometimes it is necessary, because you need technical skills and you have to ask someone for advice, for instance. But ecological thinking is the basis for good results, and especially for sound results. And remember that if you fall in love with a myth, with a theory, with a hypothesis, and it is very easy to do, then you are probably out of business, because that is not the right way to work; but everyone fails at this, me for sure. There is no such thing as too many data: it is always a good thing to have a lot of data. Too many variables: that depends on the method, but we can deal with it. It is when we do not have enough data, not enough records, that we are really in trouble.

Do we have two more minutes for the last slide? I can jump, because you see that small triangle there: if I click there I can jump to the... Thank you. I have one more slide, but I want to show you something that is a good synthesis of what I, and everyone who has worked with this kind of problem, have learned. This is partly from Leo Breiman and partly from two other authors.

Rashomon: if you remember the movie, it was based on four persons telling the same fact, each from their own point of view; one of them was not a living person, but someone who was dead and spoke through a medium. Basically, there is no objective truth. So there is never "the best", the most adequate method or algorithm: it depends on the way you define your problem, it depends on the constraints.

A second viewpoint, which is a consequence of the first, is the no-free-lunch (or no-free-meal, as you prefer) theorem, and this one was actually demonstrated: if you take two optimization algorithms and test them over a wide range of different problems and situations, you will never find the best one. There will always be a setup where one of them is better than the other. So there is no best method at all.

Occam's razor, you know it. But when we deal with machine learning, is simpler really better? No, because simpler models cannot adapt to complex problems, cannot represent complex behaviors. On the other hand, complex models are very difficult to train, and complexity can turn into learning by heart, learning by memory, and this is not what we want. So it is not always true that simpler is better. Ensemble methods, those based on running a number of algorithms and then, for instance, averaging their results, are a typical case: they are much more complex, but they actually work better if properly trained.

The last point is the so-called curse of dimensionality, the problem related to the high dimensionality of our data: if we have a neural network with, let's say, 100 weights, we cannot train that network unless we have thousands of records, and often we have to train a network that complex with a handful of records. But if we use proper approaches, we can probably take advantage of the high dimensionality of our data: many variables offer a lot of opportunities to use them in different ways, to find one of the models that performs, not the best, but the best we can afford with our data and our knowledge.

Tomorrow we will practice a few of the things we discussed today, and I hope that seeing what happens will help you understand how useful these methods can be. They cannot substitute, they cannot replace good statistics and good ecological thinking; they are something else, another tool in your toolbox. Thank you very much.
If you plan to practice tomorrow: at this address you can find the scripts and the data we will be using. These are the R packages you have to download and install, and these are some of the programs I am using. There are no explanations, no manuals, but I will talk about them for 10 minutes tomorrow. And in case you want to try something that is not only R, for instance for large datasets, I would be happy not only to share the programs with you, but also to give you directions for using them effectively. So it is an alternative solution, not necessarily the best one. Thank you very much.