Hi, everyone. Thank you for the introduction. Today, I will be talking about how to use artificial intelligence to optimize decision-making in sports. We are in the era of machine learning, and every day intelligent algorithms are used to help people in their daily lives. For example, when you take a picture with your smartphone, when you drive your car, or even when you're watching your favorite show, many times you are interacting with artificial intelligence. Andrew Ng, a notable researcher in the field of AI, said that artificial intelligence is like the new electricity because of its great capacity to influence all the industries in the world, from automotive to commerce, entertainment, medicine. I could go on forever. And of course, sports are not an exception. Although 70 years ago talking about technology in sports sounded like science fiction, nowadays all the sports clubs and institutions take advantage of technology to improve their performance.
So now, we have access to a wide variety of extremely useful data sources: for example, tracking data obtained with cameras or GPS, event data with the actions performed by the players during the matches, and other types of data, like medical tests, nutrition, and even genetic data. All of this is considered big data because of the wide variety of sources it comes from and its huge volume. However, in order to optimize decisions, we think that most of the time it is not enough to use a descriptive analysis of this data. To make the most of this data, you need to contextualize the most useful knowledge to the scope of the decision that you are going to make. And we think that the best way to do it is with artificial intelligence.
So let's take a look at an example here. Assume that we are facing a scouting problem where we need to decide whether we're going to sign a player for our team or not. So we can basically go two ways.
We could either analyze the data in a descriptive manner, which would allow us to answer questions about the past performance of the player, or we could take advantage of artificial intelligence to contextualize this player in our club, possibly in a new league with new teammates and a new coach. The estimates provided by the predictive models are most of the time much closer to reality. As an example, when we validate our models on the test data set, which consists of an entire season of data, we see that we are reducing the error by almost 50% in variables such as goals and assists. And this basically extends to the rest of the variables.
So nowadays, it is generally recognized that artificial intelligence will have a great impact in the field of sports. For example, this article from Forbes identified some of the fields where it is most interesting to apply artificial intelligence. Some of them are scouting, player and team performance analysis, and players' health.
So let me give you a little background about Olocip. The company was founded by Esteban Granero, who is currently a professional soccer player at Espanyol, and Gaizka San Vicente. They had a very deep belief that we could better understand football if we analyzed the game from a scientific approach. From the beginning, they counted on the help of Concha Bielza and Pedro Larrañaga, who are full professors in artificial intelligence at the Technical University of Madrid and technological partners of the company. At that time, I was their PhD student. We started working on unveiling the relationships among the different variables that describe the game of football, and since then, we've come a long way. Nowadays, we work as an external AI department for the clubs, and we have turned the knowledge generated during years of research into practical solutions that can be used to enhance the capabilities of the clubs.
So nowadays, most sport management platforms focus on generating a descriptive analysis of the data, which allows you to answer the question: what has happened in the past? We take advantage of artificial intelligence to reach the predictive and prescriptive dimensions, which allows us to answer questions such as: what is going to happen in the future, and what can we do to make our objectives happen?
So now, I will tell you about the process that we usually follow when we need to address a new problem. This is basically the standard in machine learning, but I will give you some specific examples from sports analytics. Most of the time, we receive a relevant question, either from our sport experts or from our clients. First, we search our database for the information that can help us answer the question, and we design the solution that best fits the problem. Then we preprocess the data to generate a data set that can be directly fed to our machine learning algorithms, and the results are predictive models that can be used to answer the question. These models are validated to ensure that the answers they provide generalize to future cases. And when the models meet our quality standards, they are deployed in production and made available to our users.
For us, it is very important that the responses provided by our models are intuitive for the users, and also that they don't only provide point estimates, but also measures of uncertainty. Also, in some cases, like medicine, it is very important for us to learn transparent models. This means that we must know the process that produces the responses of the models. Our final objective is to make the user as comfortable as possible with our software. Therefore, we constantly ask for feedback and update the models and the tools accordingly. So these are some examples of the type of questions that our software can answer.
So for example, in the case of medicine, we could be interested in knowing: what is the risk of injury of a player before a training session or a match? Or what can I do to reduce this risk? In the case of match analysis, we could be asking ourselves: how is my rival going to play against my team? Or in scouting, for example: how would this player perform in a new team? So basically, we need to deal with a great variety of supervised learning problems, both classification and regression problems. And although the results are not so easy to see, during the process we also need to deal with some semi-supervised and unsupervised learning problems, such as clustering, anomaly detection, and structure discovery.
So when we deal with a problem, it is very important for us to understand the data that we're working with. And for this, it is very useful, for example, to project the data into a lower-dimensional space. In this case, we have projected the statistics of the players into a space in such a way that players with similar statistics should be shown closer to each other in this graph. So basically, each point represents a player, and the colors are related to the position of the players on the field. Lighter colors represent more defensive positions, like central defender, and darker colors represent more offensive positions, like a striker, for example. So at a glance, you can see that players with similar positions are grouped together. This gives you a very good idea that the models, even if they only have the statistics, would know quite well which position each player has. Another interesting thing is that the left backs and the right backs were grouped separately. So just by looking at the statistics, with no information about their positioning, you can easily differentiate between left backs and right backs. And also, there are some outlying points. For example, here there are some orange and red points.
And when we looked into it, in most of the cases they were actually players who had switched positions, but the data provider had not registered it. So this may also be useful for detecting some errors. And there we go. So for us, yeah, it is very important to find the errors, because even if you follow the best procedure, if your data is not good, most of the time you won't get good results. And for this, anomaly detection is very useful. But we will continue with this same image because we can get some more insights. For example, these two points are clearly separated from the rest, and those are clearly errors in the data. This other point here, this player, is also slightly separated from the rest, but in this case it is not an error. It's Leo Messi. He must be a special player or something.
So for those of you who are not very familiar with machine learning, I will quote Pedro Domingos, a prestigious professor from the University of Washington, who said that machine learning is like the scientific method on steroids. And why did he say that? Well, the principles are the same. You formulate hypotheses, you test your hypotheses with the data, and you iteratively refine them. The difference is that machines can do it millions of times faster.
But what are our models actually learning? And how can our algorithms represent players and teams in a way that lets us later predict their future behavior? Let's take a look at the example of a player. Our algorithms have access to all the actions that a player has performed during his career. What they do is encode all of this information into a numerical vector, which we refer to as the DNA of the player. With this numerical vector, in combination with predictive models, we can later answer questions about what the player will do in new situations and contexts.
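As a rough illustration of what encoding a player's actions into a numerical vector could look like, here is a minimal Python sketch. It is not the method used in the talk, which is not described in detail: it simply summarises hand-picked event counts per 90 minutes together with success rates. The event vocabulary (`EVENT_TYPES`), the event fields (`type`, `success`) and the function name `encode_player` are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical event vocabulary; a real event feed defines its own types.
EVENT_TYPES = ("pass", "shot", "dribble", "tackle")


def encode_player(events, minutes_played):
    """Summarise a player's events as a flat numerical vector.

    For each event type the vector holds two numbers:
    attempts per 90 minutes and the fraction of successful attempts.
    """
    if minutes_played <= 0:
        raise ValueError("minutes_played must be positive")

    attempts = Counter(e["type"] for e in events)
    successes = Counter(e["type"] for e in events if e["success"])

    vector = []
    for event_type in EVENT_TYPES:
        per_90 = 90.0 * attempts[event_type] / minutes_played
        # Avoid dividing by zero when the player never attempted this action.
        rate = successes[event_type] / attempts[event_type] if attempts[event_type] else 0.0
        vector.extend([per_90, rate])
    return vector


# Toy data: three events over two full matches (180 minutes).
events = [
    {"type": "pass", "success": True},
    {"type": "pass", "success": False},
    {"type": "shot", "success": True},
]
player_vector = encode_player(events, minutes_played=180)
```

Normalising by minutes keeps players with different playing time comparable. A learned representation would replace the hand-picked features, but the output has the same shape: one fixed-length vector per player that a predictive model can take as input.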
So for example, here, imagine that we want to know how Toni Kroos is going to distribute his passes from this specific point on the field in a future match. We could get a response in this form, where the green contours represent the expected distribution of the successful passes, and the red contours represent the expected distribution of the unsuccessful passes. And we would also get, for example, the overall probability of success for this player.
The idea for the teams is similar. We compress all of their information into a numerical vector as well, which represents their playing style. After that, we can answer questions about what this team is going to do in the future. But we can also get an idea of what the models actually think of the teams. What we are going to do is select a subset of this vector, the part related to the pass distribution of the teams, and project this high-dimensional vector into a two-dimensional space. And we get a picture like this. We are projecting the teams from the four major leagues in Europe during last year in such a way that the teams that distribute their passes in a similar way should be closer to each other in the graph.
You can see here that we have hidden four of these teams. In this case, Real Betis, Barcelona, Huesca, and Athletic. So let's see if any of you can figure out which number corresponds to each of these teams. Let's start with the first one. This team is between Manchester City and Bayern Munich. Any guess? Yeah, Barcelona. This one was pretty obvious. The second one is a little bit more tricky. It's close to Atalanta and Chelsea. Both of these teams displayed a very collaborative and offensive style of play. What do you think? Betis, yeah, you got it right. OK, so the third one is close to Getafe. And, well, this team displayed a very direct and straightforward style of play. We only have two left, Huesca and Athletic. No? OK, this is Athletic.
So we already know who the last one is. It's Huesca. They finished in last position in La Liga. And, curiously, most of the teams surrounding Huesca also performed very poorly in their respective domestic leagues. So it seems that they did not display a very effective style of play.
OK, so once we have the representation of the players and of the teams, we can use them as input to predictive models and answer different questions about the future behavior of the players and the teams in new games and in specific contexts. Imagine that we want to analyze the next Clásico, Barcelona versus Real Madrid. And, for example, we would like to know how Benzema is going to do. An interesting question could be: from where is Benzema going to create more danger with his passes? This is basically answered in this graph. The darker areas represent the zones where Benzema is more likely to create danger with his passes. Another interesting question could be: from where is Benzema going to score goals? These expectations are represented here. And we are also providing the probability of Benzema scoring a goal from this specific point, knowing that he's playing against the defense of Barcelona and that ter Stegen is the goalkeeper.
OK, so a question that we get very often is: how certain are you of the responses provided by your models? And we think that the only way to answer this properly is with honest validation. The standard in machine learning is to split your data set into a training set and a test set, so that you learn your model on the training set and use it to make predictions on the test set. You compare these predictions with reality, and that way you get an unbiased estimate of the error that your models are going to make. In our case, we also have a time constraint, which is that when we make predictions, we usually have data from the past up to the present.
And we want to make predictions about the future. So the best way to mimic this is, for example, if we have eight seasons of data, to use the first seven to train the model and the last season as the test set. OK, let me drink a little bit.
So now I will tell you about different tools that are currently available in different fields: medicine, match analysis, and scouting. TCTDoc is our tool for medicine. In this case, we count on a wide variety of sources. We have the tracking of the players during training sessions and matches. We have a great number of medical tests, like blood tests and saliva tests, nutrition, and even genetic data. From all of these, we compute a set of variables which have been identified in the past as possible predictors of injury, either in the literature or by clinicians. With this data, we built a transparent model, which allows us to answer questions about the risk of injury in different parts of the body. The model also suggests how to reduce this risk of injury before each training session.
The next tool that I'm going to tell you about is called TCT Coach Board, and it focuses on pre-match analysis, that is, on predicting the behavior of teams and players in future games so that the user can create effective game plans and strategies. This tool is presented as an interactive coach board on which you can click or select areas, and they are annotated with information. You could proceed as follows. You could select a team to analyze and a rival, and you could also decide whether you want to analyze the team as a whole or, for example, a specific player. You could select different contexts and ask different probabilistic queries. The results would be provided as contour plots over the map. So let's continue with the same example, the next Clásico. An analyst could be interested in knowing how the attack of each team starts.
So they may be interested in analyzing the goal kicks of the respective goalkeepers, ter Stegen and Thibaut Courtois. Here, again, it is the same idea. We analyze the goal kicks from this point, and the green contours represent the distribution of the successful passes, while the red contours represent the distribution of the unsuccessful passes. We can see that our model expects ter Stegen to pass to players that are nearby, while Thibaut Courtois is more likely to mix his game between short passes and long passes. And this also affects the probability of success of his passes.
Now let's see another example. Here we have four pictures. Each picture represents the expected distribution of shots of a different player. The areas that are darker, for example here, represent zones from where the player is more likely to shoot. And let's see again if, oh, also, we are providing the probability of scoring a goal if the player shoots from this position. And it's basically the same position in the four plots. So let's see if you can figure out which player corresponds to each of these plots. OK, the first one, it seems like this player likes to shoot from this position. And he's pretty efficient also. Any ideas who this player may be? Yeah, yeah. So he's a left winger, and it makes sense that he shoots from here. The second one, he's expected to be the least efficient of the three. OK, let's go with Hazard. So any ideas? Yeah, Sergio Busquets. OK, so if you got those two, I think you are going to get the other ones. OK, the third one is the most effective of the three. And it looks like he likes this kind of diagonal. I don't think that you are going to get it. Yeah, that's right, Leo Messi. And we already know the fourth one. It's Casemiro. It seems like he likes to shoot from far away. And although he's not as effective as the elite strikers, he's pretty good also. He's pretty efficient.
OK, so the last tool that I'm going to present to you is TCT Scout.
Well, surprisingly, it's about scouting. One of its functionalities is to predict the performance of a player in a new team. In this case, let's assume that we are at Atlético de Madrid. Recently, Diego Costa was injured, so they may be looking for a replacement. And Edinson Cavani may be a good option. So what we could do is predict how this player is going to perform at Atlético de Madrid. Here we have in green the expected performance of Cavani at Atlético, and in orange what he's actually doing at PSG. It seems that this year the player is having lots of opportunities, but he's not being very effective because he's scoring fewer goals. But if you are an Atlético de Madrid fan, you don't need to worry because, well, our models expect him to clearly improve his efficiency if he goes to Atlético de Madrid.
Another thing that we could do is compare the expectations for several players at the same time. In this case, we could be interested in comparing, for example, Diego Costa with Edinson Cavani, who might be his replacement. You can see in red the expected statistics for Costa and in orange the expected statistics for Cavani. In most stats, Cavani is actually expected to outperform Costa, so he may be a good signing. We also have here a prediction of his future market value. Given that he's already 32 years old, his price is expected to go down a little bit, so maybe we should also consider this.
Another functionality is to predict the progression of a player in the same team. In this case, we could predict the performance of Eden Hazard during the rest of the year. And if you are a Real Madrid fan, you can be happy, because he's expected to improve a lot during the rest of the year. We could also send one of our players to different teams. This can be useful, for example, if you are planning to send one of your players on loan. Vinicius is not playing too much for Madrid this year.
Imagine that they want to send him on loan, and they need to decide between three teams: Alavés, Espanyol, and Arsenal. Real Madrid should be interested in knowing at which team Vinicius is going to improve the most, and also how his market value would change. It seems that, in this case, Arsenal could actually be a very good choice.
Finally, the last functionality that I'm going to talk about is a similarity search that allows you to search for similar players in the context of your team. Let me explain. Let's go back to the previous example, where we were at Atlético de Madrid and we wanted to sign Cavani. Imagine that PSG is asking too much money for Cavani, or they don't even want to sell him. What we could do is search for the players that would perform most similarly to Cavani, not at their club of origin but at Atlético de Madrid. So what we do is contextualize the whole database of players at Atlético de Madrid and compare them with Cavani. And they are displayed in this way. In this case, if I were at Atlético, I could be, I don't know, interested in Teemu Pukki, who is expected to perform similarly. And his market value is around five million, so much cheaper for now.
OK, so during last year's World Football Summit, we were asked to publish some predictions about Benzema and Cristiano Ronaldo. At the time, Benzema was at his worst moment, and we predicted that he was going to improve during the year, which actually happened. And Cristiano Ronaldo was changing teams, from Real Madrid, where he had spent the last years, to Juventus: new league, new teammates, new coach. Our predictions came much closer to reality than his statistics from previous seasons. That's why the mass media covered these predictions.
So OK, as a summary, I would like to highlight what separates us from the rest of the market: instead of focusing on visualization and analysis of descriptive statistics, we are focusing on the predictive and prescriptive dimensions.
We are currently also working in basketball, and we think that this approach generalizes to many other sports, maybe most of them. So we are looking forward to new challenges in the future. So I think I was very fast. That's all. Thank you very much for your attention, and I'll be happy to answer any questions.
Hi, thank you. A small question. For the scouting, do you also take into account the minutes that the player is going to play in that team? Or are you assuming that he's going to play exactly the same number of matches for each team?
So we don't predict the number of minutes that the player is going to play, but for example, in the market value prediction, it is taken into account. For example, the user could input something like, I think this player is going to play a lot in this team, and then the market value changes a lot. The statistics are estimated per minute, so they don't depend on the playing time.
Perfect. That was my question. OK, thank you.
Hello. Thank you. It was very interesting. I'd like to ask, how do you prepare the data? You said, for example, that for a player you trained your model on seven seasons and validated on the eighth. How do you prepare all this data? Because, for example, to get a good example of what Kroos is doing in one position or another, it looks like a lot of manual work and difficult to automate.
So actually, we work with a data provider for event data. So yeah, there's a lot of work in converting the data into the data sets that we use. But the data sets are not so difficult. They are structured data. And for example, so you can get an idea, each record is like an event.
OK, thank you.
You're welcome.
No more questions? Someone? OK, then I would like to thank you all for coming today, and I hope you really enjoyed your time. I want to welcome you back tomorrow and wish you a good evening. Thank you.
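The season-based validation mentioned in the talk and again in the questions (train on the earlier seasons, test on the last one) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (`season`, `goals`), the toy numbers, the function names and the naive baseline that predicts the training mean are all invented for the example and do not come from the speaker's system.

```python
def split_by_season(rows, season_key="season"):
    """Hold out the most recent season as the test set.

    Everything before it is used for training, so the model
    never sees data from the period it is evaluated on.
    """
    seasons = sorted({row[season_key] for row in rows})
    if len(seasons) < 2:
        raise ValueError("need at least two seasons to split")
    last_season = seasons[-1]
    train = [row for row in rows if row[season_key] != last_season]
    test = [row for row in rows if row[season_key] == last_season]
    return train, test


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy data: one record per season for a single player (values are made up).
goals_per_season = [10, 12, 9, 14, 11, 13, 12, 15]
rows = [{"season": 2012 + i, "goals": g} for i, g in enumerate(goals_per_season)]

train, test = split_by_season(rows)

# Naive baseline: predict the mean of the training seasons for every test row.
baseline = sum(row["goals"] for row in train) / len(train)
mae = mean_absolute_error([row["goals"] for row in test], [baseline] * len(test))
```

A random split would let information from later seasons leak into training, which tends to make the error estimate look better than what can be expected on genuinely future matches.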