Thank you very much. It's my pleasure to be in Singapore. Honestly, I lived here for three years, about seven years ago, and it was the greatest time of my life. So coming here again is a real pleasure and happiness for me, and every time I walk along the streets I'm just smiling, because I simply love this city. Well, today I would like to talk about a topic that is pretty interesting: the prediction of very rare events in the airline industry. Usually I do financial analysis, but this time I thought, why not take the mathematical and statistical tools from finance and apply them to something much more interesting? You know, it's good to break the rules and think outside the box. So why not see what we can do with data that we can gather over the internet about airlines and about airplanes flying around the globe, and ask whether we can predict something that, well, does happen: I mean air crashes. But we need to approach that in a systematic way. So what I'm going to tell you and show you is really a combination of creativity in thinking about the problem and a way of looking for proper solutions based on the data we have access to. The outline has just seven points. First, I'll tell you briefly about rare events and forecasting methods. Then we go to the data samples: we will look at an aviation database and at what we can find interesting in it. Then we will try to describe very rare events like air crashes in a mathematical and statistical way, and we will analyze a couple of the best mathematical methods for finding the probabilities of certain events. Then we jump into another dimension of the data, or of playing with our data. So in number four, I will show you a method that I invented, let's say, over the last seven days.
And I really worked on it passionately, driven simply by curiosity about what was going to happen. We will study a problem that I call the probability of finding a plane at a given spot on the planet just by providing GPS coordinates. Then we'll go to something that may cause accidents, namely storms. In this case we will restrict ourselves to a United States database, because it is the only publicly accessible one I found in such a short period of time. Based on that, we will analyze one real example of an air crash, which took place in Chicago on December 8th, 2005, and we will treat it as an out-of-sample prediction: all data in our databases from before December 8th, 2005, combined, will be treated as in-sample. Finally, I will conclude with a simple formula for calculating the probability of an air crash given not only a location but also an area and a time frame in which a really bad thing could happen. So let's start. I will go through a lot of this pretty quickly, because I'm so emotionally attached to this subject: I'm simply an aviation geek and I love flying planes, so this is something that really excites me. Well, a rare event is, by definition, something that happens very rarely. How rarely? I will show you that when we look at the specific kind of air crash data we use, we have about 40 to 43 fatal accidents in a span of 24 years. Now imagine how many planes take off and land around the world every single day. I don't know, it's about 20,000, maybe more; I have even heard 100,000. It doesn't matter, it's an enormous number per day, so 43 accidents over 24 years under these specific conditions is very rare. And when your data sample is so small compared to the huge population of possible events, how can you describe it, or how can you derive any meaning from it?
So this is the problem of rare events. In finance, the area where I specialize, it is somewhat more approachable than in the case of airlines; airlines are a very peculiar case. Forecasting, then, is our way of thinking about the proper mathematical and statistical solution for deriving something from the data we analyze. So let's jump into the database. I found the aviation accident database of the National Transportation Safety Board (NTSB) in the US, which contains a lot of information about air crashes since 1962. If you go to the website, this is what it looks like. There is a lot of information you can choose from: information about the accident, the date range, location, type of aircraft, model, registration, number of engines and so on, and then information about the operation and the airlines. You can screen all of these data as you wish. But what really captured my attention was the weather conditions. Within that area, which I cannot click because it's just a screenshot, there are two categories that I found pretty fascinating and didn't know about. There are two kinds of weather conditions in aviation. The first is called VMC, visual meteorological conditions. That means the pilots have visibility and can see where they fly, so even if something goes wrong with the instruments on board, they can still land or do something, because they have visual contact. The second is called IMC, instrument meteorological conditions. According to the Wikipedia definition, it is an aviation flight category that describes weather conditions that require pilots to fly primarily by reference to instruments. There is no visibility, or it is very limited, and the airplane is affected by horrible, bad weather conditions.
That really lit the spark in my mind, because I thought: if I can extract the data for this IMC condition, then I can analyze how often that sort of accident has happened in the history of aviation and present it. So what we do is download the whole database you see into one CSV file. Here you see just five lines, but look how many categories there are: 31 of them, and we can choose among them to screen the data. It's an enormous amount of information, and it is a good example of big data, which is very popular right now. Don't be discouraged when you read the data into a pandas DataFrame and find that the information is not available for all accidents; for example, the number of engines is missing for many airplane models. So we need to be much more aware of what we filter and what we play with, and that takes time: it took me a couple of hours just to realize what I could really use from that huge database. So, first of all, screening for fatalities: keep all air crashes in which people died, in whatever way, it doesn't matter; the number just has to be larger than zero, at least one. That was the first step of screening. In the second step, I was interested in a specific type of aircraft. I really like jetliners: big Boeings; we love Bombardier, we love the Airbus A380. These machines ignite our imagination, especially once you have flown on board one; they mean something to us. So I simply rejected everything that had no jet engine, and I also eliminated everything that had turbofan or turboprop engines.
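The screening steps above can be sketched with pandas as follows. This is a minimal sketch under several assumptions. The column names (`Weather.Condition`, `Total.Fatal.Injuries`, `Engine.Type`, `Make`), the set of engine types kept and the list of manufacturers are illustrative, so check them against the header of your own NTSB export. The toy DataFrame stands in for the real file.

```python
import pandas as pd

# Assumed column names; the real export may label them differently.
WEATHER, FATAL, ENGINE, MAKE = ("Weather.Condition", "Total.Fatal.Injuries",
                                "Engine.Type", "Make")
KEEP_ENGINES = {"Turbo Jet"}  # assumption: keep pure-jet entries only
JET_MAKERS = {"AIRBUS", "AIRBUS INDUSTRIE", "BOEING", "MCDONNELL DOUGLAS"}


def screen_accidents(df: pd.DataFrame) -> pd.DataFrame:
    """Keep fatal accidents in instrument conditions (IMC) involving jetliners."""
    fatal = pd.to_numeric(df[FATAL], errors="coerce")  # blanks become NaN
    mask = (
        df[WEATHER].eq("IMC")                     # low-visibility conditions only
        & fatal.gt(0)                             # at least one fatality; NaN -> False
        & df[ENGINE].isin(KEEP_ENGINES)           # missing engine type -> dropped
        & df[MAKE].str.upper().isin(JET_MAKERS)   # jetliner manufacturers only
    )
    return df.loc[mask]


# Toy rows for illustration only, not real NTSB records.
sample = pd.DataFrame({
    WEATHER: ["IMC", "VMC", "IMC", "IMC"],
    FATAL: ["12", "3", "", "0"],
    ENGINE: ["Turbo Jet", "Turbo Jet", "Turbo Jet", None],
    MAKE: ["Boeing", "Airbus", "Boeing", "Cessna"],
})
print(screen_accidents(sample))  # only the first row survives
```

Rows with missing fields drop out on their own, because comparisons against NaN or None evaluate to False. That is one way to handle the gaps mentioned above.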
So the full data sample I was left with is 43 crashes, and I'm displaying them in the table. I selected them based on the date, on the aircraft damage status, which was destroyed or substantial, and on the make: Airbus, Airbus Industrie, Boeing, McDonnell Douglas and so on; these are all jetliners. We have the specific model, the number of people who died in the accident, and the nearest location of the accident. Now, this is what we've got in the database, so let's do something with it. Just by looking at the data you can derive a lot. The simplest, and really fun, thing is to translate the last column, the locations, into coordinates and display them on a world map. In Python that's a tricky job, and I found it pretty interesting. I spent about three hours looking for a good and quick way to look up each name and plot it on the map. I found that if you work with the HTML of a special Google request to the website findlatitudeandlongitude.com, passing the name of the location, you can very quickly download all the GPS coordinates. There is a Python library, I think geopy (or PyGeo), that does the same thing, but it stalls when you send it a lot of requests, so this approach simply works much, much better. I'll show you how later. Let me just run that. So these are the results. This is the first step in data analysis: you need to feed your imagination with the spatial distribution of all the events. We can see that, in that category of low or no visibility, the air crashes are more or less randomly distributed around the world. Please memorize the locations of the crashes in the United States; I'll come back to that. There are many more in the east than in the west, and I'll show you later why.
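A geocoding service can stall under a burst of requests, and many accident locations repeat. A common remedy is to look up each distinct name once and space out the remote calls. This is a generic sketch, not the author's code. `lookup` is a hypothetical function you supply, for instance a thin wrapper around a geocoder, that returns `(lat, lon)` or `None`. `min_interval` is an assumed courtesy delay in seconds.

```python
import time
from typing import Callable, Dict, Iterable, Optional, Tuple

LatLon = Tuple[float, float]


def geocode_all(names: Iterable[str],
                lookup: Callable[[str], Optional[LatLon]],
                min_interval: float = 1.0) -> Dict[str, Optional[LatLon]]:
    """Look up each distinct place name once, spacing remote calls apart."""
    cache: Dict[str, Optional[LatLon]] = {}
    last_call = float("-inf")
    for name in names:
        key = " ".join(name.upper().split())   # normalize case and whitespace
        if key in cache:
            continue                           # repeated locations cost nothing
        wait = min_interval - (time.monotonic() - last_call)
        if wait > 0:
            time.sleep(wait)                   # throttle the burst of requests
        last_call = time.monotonic()
        cache[key] = lookup(name)              # None if nothing was found
    return cache


# Usage with a fake lookup standing in for a real service.
calls = []


def fake_lookup(name):
    calls.append(name)
    return {"CHICAGO, IL": (41.79, -87.75)}.get(" ".join(name.upper().split()))


result = geocode_all(["Chicago, IL", "  chicago,  il", "Nowhere"],
                     fake_lookup, min_interval=0.0)
print(result)  # two distinct names, so only two lookups
```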
The interesting point is this: when you look at the data, you can decide to stop analyzing at some point in time and take a later event that we know from the database actually happened. We analyze the data up to some point in time in order to predict, or derive probabilities for, the next event, which we know did happen. This is called in-sample data analysis. I picked that point because of the weather conditions: again, the Boeing 737 in Chicago, which was not destroyed but substantially damaged, affected by the snowstorm that we will come back to later. So the last point in the in-sample data is the event of August 23rd, 2005, in Peru; where it happened doesn't matter, it's just the last point in the data set. Having that, we move forward. We have our in-sample data separated, and we can check the time frame. The first part is, of course, visualization, and a meaningful visualization is the number of fatalities over time. The distribution is pretty interesting: if you calculate the spread, it is more or less equal to half of the average. In red I have marked for you which points in time we take as our in-sample, and the rest we leave untouched. So, with the in-sample separated, we can check, simplifying, the expected number of fatalities for the next event after August 2005: it's 62 people, plus or minus 60. Right. And of course, if you build the histogram and normalize it properly, you can also calculate the probability that 62 or more people die: it's 45 percent. So that's the first step in understanding what the data really tell you. Now, when you go to mathematics or statistics, you find a couple of tools.
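The in-sample/out-of-sample split and the summary statistics described above can be sketched as follows. The cutoff date comes from the talk. The toy DataFrame is invented for illustration. The exceedance probability is the empirical fraction of in-sample events at or above a threshold, which is what a normalized histogram gives you.

```python
import numpy as np
import pandas as pd

CUTOFF = pd.Timestamp("2005-08-23")  # last in-sample event (from the talk)


def split_and_summarize(events: pd.DataFrame, threshold: float):
    """Split at CUTOFF, then summarize fatalities in the in-sample part."""
    in_sample = events[events["date"] <= CUTOFF]
    out_sample = events[events["date"] > CUTOFF]   # left untouched for testing
    fat = in_sample["fatalities"].to_numpy(dtype=float)
    mean, spread = fat.mean(), fat.std(ddof=1)     # sample standard deviation
    p_exceed = float(np.mean(fat >= threshold))    # empirical P(X >= threshold)
    return in_sample, out_sample, mean, spread, p_exceed


# Toy numbers for illustration only, not the NTSB records.
toy = pd.DataFrame({
    "date": pd.to_datetime(["1990-01-05", "1995-06-10", "2001-03-02",
                            "2005-08-23", "2005-12-08"]),
    "fatalities": [10, 120, 40, 100, 1],
})
ins, outs, m, s, p = split_and_summarize(toy, threshold=62)
print(len(ins), len(outs), m, round(s, 2), p)
```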
And they are not easy to reach for, because our textbooks don't really tell us which page or which chapter to take and apply in order to derive what we want. So when modeling the probability of very rare events, we need to be very specific about what we are looking for. In general, we are talking about events that can happen at discrete moments of time or continuously in time, so we don't know exactly when they may happen. Air crashes are the continuous case, and this is called continuous exposure probability modeling. The starting point is calculating the mean interval between the events: based on our in-sample, we estimate the mean time interval between all our air crashes. If you take the in-sample time differences between events, in days for example, their mean is just t-bar. But to model it, we consider a normal distribution with mean t-bar and with sigma normalized by the amount of data. Good question: because the normal distribution can take negative values, right? Sure. That's why the division by the square root of n minus 1, where n is the number of events, reduces the width, and that is the point: we don't go into negative territory. And we know that the more data we collect (not 43 points; sorry, in our in-sample we have 33 events), the more sigma drops. So... these are the numbers of injuries. Okay. Basically, we derive all the differences between the days on which the air crashes happened and calculate their mean. When you visualize that, this is what you get in the in-sample.
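The interval model described above can be sketched as follows. First take the gaps in days between consecutive crashes, their mean t-bar, and a width of sigma / sqrt(n - 1), where n - 1 is the number of gaps for n events. Then draw one simulated realization from the resulting normal distribution. The function name and the toy dates are hypothetical.

```python
import numpy as np
import pandas as pd


def interval_model(dates):
    """Gaps (days) between consecutive events, their mean, and the model width."""
    d = pd.Series(pd.to_datetime(dates)).sort_values()
    gaps = d.diff().dt.days.dropna().to_numpy(dtype=float)
    m = len(gaps)                  # n events give n - 1 gaps
    t_bar = gaps.mean()
    sigma = gaps.std(ddof=1)       # sample standard deviation of the gaps
    width = sigma / np.sqrt(m)     # sigma / sqrt(n - 1), as in the talk
    return gaps, t_bar, width


# Toy dates for illustration only.
gaps, t_bar, width = interval_model(
    ["2000-01-01", "2000-01-11", "2000-01-31", "2000-03-01"])

# One simulated realization of N(t_bar, width). This describes the uncertainty
# of the mean interval, not the spread of individual gaps between crashes.
rng = np.random.default_rng(0)
draws = rng.normal(loc=t_bar, scale=width, size=10_000)
counts, edges = np.histogram(draws, bins=40, density=True)
print(gaps, t_bar, round(width, 3))
```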
So this is how the intervals between events, in days, are distributed. If you normalize it, you can even calculate the probability of a specific interval, but the probabilities come out more or less the same, because we are limited by the number of observations. Now, the more interesting part: having a formula for the model, the normal distribution I just gave, we can always run a simulation and draw random numbers from it, and this is more or less what it looks like; it's just one realization of such a distribution. The point is that the mean value, the expected value from our analysis, says that the expected time interval between these events is about 270 days. So every 270 days we expect an air crash in low-visibility conditions somewhere in the world. We don't know where; we only provide information in time. This is great. But this reduction in sigma only gives us a bound on the estimate of the mean value. We cannot, in fact, take these histograms and calculate the probability of the time difference between events; we can't do it. What we can do instead is stick to beta, which is exactly the expected value of 270 days, and model the events as a Poisson process. It's like calls arriving at a call center: short waiting times are the most likely, and then the probability drops smoothly for longer waits. It is one of the best models we have; I don't claim it holds in nature, it's just something we know works in many instances, especially for discrete events. So we can model what is called the time to the next air crash. This is really fascinating, because if you apply this simple model, one realization of it is a histogram like this. So how do you read it?
If you integrate from 0 to 200, you get the probability that an air crash happens within 200 days from now. Or, if you take the cumulative probability at 200 minus the one at 199, you get the probability, which is our prediction, of the crash happening exactly 200 days from now. So that's the difference, and it's powerful; I'll be using it in just a second. Based on our in-sample data, I derived the following. The last in-sample event took place on August 23rd, 2005, and we want the probability that the next event happens within 107 days. We can do it in two ways. First, you can integrate the histogram and find that the probability of waiting 107 days or less for the next event is 33%. So we have a 33% probability that within the next 107 days there will be a crash. Alternatively, you can calculate the probability that the crash, somewhere in the world, happens exactly 107 days later: it's 0.3%. And, funnily enough, we can also use that histogram to say that the probability of an air crash on the next day is 0.4%. These are all numbers we derived from the data analysis. You can also plot how the second statement looks over time: the probability that an air crash happens exactly in K days, with K plotted here. This is one of the key, meaningful results of our modeling, and we will use these probabilities later to calculate the big picture. In terms of probability per unit of time, choosing a day as the unit, we can also use a Poisson distribution to model the expected number of air crashes per day. If we do it correctly, this is more or less the result.
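Under the Poisson-process assumption, the waiting time to the next crash is exponential with mean beta = 270 days. The numbers above then follow from closed forms rather than from a simulated histogram. A minimal sketch (the function names are hypothetical): `p_on_day` uses the difference of cumulative probabilities described in the talk, and `p_count` gives Poisson counts for a window of days, covering both the per-day case and longer windows.

```python
import math

BETA = 270.0  # mean days between crashes (in-sample estimate from the talk)


def p_within(k, beta=BETA):
    """P(next crash within k days): the exponential CDF."""
    return 1.0 - math.exp(-k / beta)


def p_on_day(k, beta=BETA):
    """P(next crash falls on day k), i.e. between day k - 1 and day k."""
    return p_within(k, beta) - p_within(k - 1, beta)


def p_count(n, days, beta=BETA):
    """Poisson probability of exactly n crashes in a window of `days` days."""
    lam = days / beta  # expected number of crashes in the window
    return lam ** n * math.exp(-lam) / math.factorial(n)


print(round(p_within(107), 3))      # ≈ 0.327
print(round(p_on_day(107), 4))
print(round(p_within(1), 4))
print(round(p_count(0, 1), 4))      # no crash on a given day
```

With beta = 270 this gives about 0.0025 for `p_on_day(107)` and about 0.0037 for a single day. The per-day no-crash probability `p_count(0, 1)` comes out near 0.9963.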
So we have 99.78%, call it 99.8%, that it won't happen, and only 0.4% that it will happen on the next day. That number is very powerful, because we can use it as global information from our analysis: the probability of an accident anywhere in the world on a given day is 0.4%. Just like that. We'll keep it. You can also compute the probability of several events occurring in an interval of time. For example, we look from the end of August 2005, and we know that the next accident happened in December, a period of about four months. You can calculate the probability that zero, one or two accidents happen in that window, and we find about a 30% chance that an accident happens within the next four months. We know that it did happen, but this is the probability as seen from the end of August 2005. Now the next part, which is key information about another side of the probability: the probability of finding a plane at given GPS coordinates in a specific area somewhere in the world. If you go to the OpenFlights website, you will find a pretty interesting database. First, OpenFlights provides a CSV data file with all routes, so you can plot them all together to see the connections between all the airports. For each route we have the code of the airport where the plane departs and of the one where it lands. We assume, and it's the best assumption we've got, that pilots fly all routes in the same way: along the shortest path, to burn less fuel, which is of course along the great circle. It's not a straight line, it's an arc, right? So now I combine two files to have both pieces of information: the codes of the departure and arrival airports, and their GPS coordinates.
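Both steps just described can be sketched together. First, join the route file with the airport file to attach coordinates to each end of a route. Then sample points along the great circle between the two ends, using spherical linear interpolation on the unit sphere. The column names and the two-airport toy table (with approximate coordinates) are assumptions. The real OpenFlights files have no header row, so you name the columns yourself when reading them.

```python
import numpy as np
import pandas as pd


def great_circle_points(lat1, lon1, lat2, lon2, n=100):
    """n points (degrees) along the great circle between two positions.

    Uses spherical linear interpolation; antipodal pairs are not handled.
    """
    phi1, lam1, phi2, lam2 = np.radians([lat1, lon1, lat2, lon2])
    # Endpoints as 3-D unit vectors.
    a = np.array([np.cos(phi1) * np.cos(lam1), np.cos(phi1) * np.sin(lam1), np.sin(phi1)])
    b = np.array([np.cos(phi2) * np.cos(lam2), np.cos(phi2) * np.sin(lam2), np.sin(phi2)])
    omega = np.arccos(np.clip(a @ b, -1.0, 1.0))  # central angle between endpoints
    t = np.linspace(0.0, 1.0, n)[:, None]
    if omega < 1e-12:                              # same airport twice
        pts = np.repeat(a[None, :], n, axis=0)
    else:
        pts = (np.sin((1.0 - t) * omega) * a + np.sin(t * omega) * b) / np.sin(omega)
    lat = np.degrees(np.arcsin(np.clip(pts[:, 2], -1.0, 1.0)))
    lon = np.degrees(np.arctan2(pts[:, 1], pts[:, 0]))
    return lat, lon


# Toy tables; coordinates are approximate and for illustration only.
airports = pd.DataFrame({"iata": ["SIN", "MEL"],
                         "lat": [1.36, -37.67], "lon": [103.99, 144.84]})
routes = pd.DataFrame({"src": ["SIN"], "dst": ["MEL"], "equipment": ["333"]})

# Attach coordinates to both ends of each route.
merged = (routes
          .merge(airports.add_suffix("_src"), left_on="src", right_on="iata_src")
          .merge(airports.add_suffix("_dst"), left_on="dst", right_on="iata_dst"))

row = merged.iloc[0]
lats, lons = great_circle_points(row.lat_src, row.lon_src, row.lat_dst, row.lon_dst)
print(merged[["src", "dst", "equipment", "lat_src", "lat_dst"]])
```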
Having that, I can plot a map, and I also have the equipment, that is, what sort of plane flew on the route. What I don't have, and this is also important for our analysis, is how many planes per day fly on that route. It's not provided in the database, and I don't know whether such a database exists, but that's not a worry; I'll tell you why in a second. If you go to airlinesco.co.uk/…, you can screen for the most interesting aircraft, Boeing, Airbus, Embraer, Canadair and McDonnell Douglas, which are all jetliners. That takes time. So I screened for those; this is an example of random data that we filtered, and we can check that the equipment is an Airbus 300-something, with all these categories. So we have a very nice data sample; let's plot it. We plot it in Python using a library called Basemap. It's a brilliant library; it took me a little while to figure out what I needed to do to get what I wanted, but this is the result. When you plot all 67,000 routes, this is the picture you get. Now I have a question for you: what's the probability of finding a plane here, in Europe? When you overplot all these lines, all these routes, this map suggests it's one, one hundred percent. It's not like that, right? We know the plane is not everywhere. So my idea was to take this flat image of routes and turn it into something called a heat map: a heat map of the probability of finding a plane. Now, how do we calculate it?
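Here is a minimal sketch of one way to build such a heat map, following the construction the talk goes on to describe. Each route becomes a 0/1 pixel footprint on a 1600 × 1200 map. Each of the route's N pixels gets probability 1/N (one plane per route). Routes are combined as independent events, so a pixel's value is 1 minus the product of (1 − p) over the routes crossing it. The function names and the equirectangular pixel mapping are assumptions, not the author's code. Routes must be sampled densely enough that no pixel along the path is skipped.

```python
import numpy as np

WIDTH, HEIGHT = 1600, 1200  # map size in pixels, as quoted in the talk


def to_pixel(lat, lon, width=WIDTH, height=HEIGHT):
    """Equirectangular mapping from degrees to (row, col) pixel indices."""
    col = ((np.asarray(lon, dtype=float) + 180.0) / 360.0 * width).astype(int)
    row = ((90.0 - np.asarray(lat, dtype=float)) / 180.0 * height).astype(int)
    return np.clip(row, 0, height - 1), np.clip(col, 0, width - 1)


def heat_map(routes, width=WIDTH, height=HEIGHT):
    """Probability of finding at least one plane in each pixel.

    routes: iterable of (lats, lons) arrays sampled densely along each route.
    """
    p_none = np.ones((height, width))           # P(no plane in the pixel)
    for lats, lons in routes:
        row, col = to_pixel(lats, lons, width, height)
        footprint = np.zeros((height, width), dtype=bool)
        footprint[row, col] = True              # the route as a 0/1 image
        p = 1.0 / footprint.sum()               # one plane, uniform over its pixels
        p_none[footprint] *= 1.0 - p            # routes combined as independent events
    return 1.0 - p_none


# Two toy routes on a coarse 1-degree grid, crossing at one pixel.
lon_a = np.arange(0.5, 9.0, 1.0)
lat_a = np.full_like(lon_a, 0.5)                # 9 pixels
lat_b = np.arange(-6.5, 6.0, 1.0)
lon_b = np.full_like(lat_b, 4.5)                # 13 pixels
hm = heat_map([(lat_a, lon_a), (lat_b, lon_b)], width=360, height=180)
r, c = to_pixel(0.5, 4.5, 360, 180)             # GPS position -> pixel -> probability
print(hm[r, c])                                 # 1 - (8/9) * (12/13) at the crossing
```

The last two lines also show the lookup step: convert any GPS position to its pixel and read the probability directly from the map.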
I came up with a pretty interesting solution that I believe is applicable to our modeling. Think about just one route. I picked one randomly from our database and plotted it as a line: the shortest path between two cities. I'm not going to read the name in Chinese; it's a place in China, and the route was flown by an Airbus 330 to Melbourne. What can you do with it? You can take that image and convert it into a NumPy array, so you have numerical information about all the pixels. Based on that, you can create a map that contains zeros where there is no plane and ones where the plane can be located, so you know exactly where the route lies on the map. Doing that in Python with this library and a cylindrical projection (the cylindrical projection in Basemap) ensures, to a certain degree, that no matter where we are on the map, a pixel corresponds to the same distance in latitude and longitude. By simple calculation and random checks, I found that every pixel of the map, which is 1600 by 1200 pixels, corresponds to about 25 by 25 kilometers, which is pretty interesting. Having all these ones and zeros, you can count how many pixels the route takes, that is, what percentage of the map the line representing our route occupies. For this particular flight it's 1,450 pixels, which translates into 0.07% of the whole map; in kilometers, it's 0.7% of the area of the surface of the planet, assuming the planet is spherical. This is how we build our understanding of what else we can do. So what's the probability of finding the plane, assuming one plane per day flies the route? Simply put, what's the probability in space that the plane is somewhere on that line? Well, we can assume a simple model: for a route covering a given number of
pixels, the probability is just one divided by the number of pixels. It's a simple probability: there is only one 25-by-25-kilometer pixel where the airplane is located. But what happens when two routes cross each other? For example, one route is flown by an Airbus 380 and covers 13 pixels, the other by a Boeing 777 and covers 9 pixels, and they cross at one point. The probability of finding the Boeing 777 is completely independent of where the Airbus is; this is very important. So in each square we have a probability of 1/9 or 1/13, but at the crossing point the probability of finding either the Airbus or the Boeing has to be calculated differently. If you work out this analytical solution and do the hard work in Python, you finally come up with a heat map of probabilities. This outcome took about four days to derive, and it's pretty interesting. The darker the color, the higher the probability of finding a plane, almost equal to one. Of course, at airports the probability of finding a plane landing or taking off is almost one. It's not exactly one for all airports; for the busiest ones, yes, but not for all of them. This result is pretty interesting because, knowing the fixed dimensions of the image, we can simply take any GPS coordinates, find the corresponding pixel, and read the probability of finding a plane at those GPS coordinates directly from the map. That's it: so simple and so beautiful. All right, next part; we are coming to the grand finale: the probability of bad weather. This is a tricky one. In a period of one week I only found an open database covering the US. In the NOAA National Weather Service data you can find CSV files, from 1996 to the present, with all records and locations of storms of any kind, recorded by different people, stations and sources in the US. If you look at an example of one of the files, the
US storm events locations file, you are only interested in the latitude and longitude, that is, where the storm took place, and of course when. And this is the tricky part: we have the year and month, but no information about the day. So we can derive probabilities of finding bad weather in a specific area, because we have coordinates, but this will be a month-by-month prediction, not day-by-day as in the case of the airlines. It can be solved, I believe. So we screened the data for our case study, Chicago in December 2005, the event we are trying to predict. We look at the whole history of storms before that event, from 1996 through November 2005, month by month, for all storms that happened in that area. And this is beautiful, because I didn't know what I could derive, but I found a record of 341 storms in that area over that period. Wow, that's much better than 33, right? So you can say something more about the probabilities. If you plot them per month, since we screened per month, this is what the number of storms over time looks like. Of course, we can also display all the storms, and this is the result: every single dot is one storm from our storm events database. You can see that in the eastern part of the US the probability of a storm is much, much higher than in the west, and that's why, as I asked you to remember, we had more accidents due to bad weather there, which is pretty good for us when we do the modeling. This is more or less what it looks like. Now, how do we calculate the probability of bad weather, of bad storms (snowstorm, rainstorm, whatever the class) in the area around Chicago? We need a Bayesian approach: we treat every single month, or every single day, as a series of Bernoulli trials. So we simply record whether, in a given month,
there was a storm or there was no storm. That's it: zero, one, zero, one, zero, one. From Bayesian analysis we know that we can start from what is called an uninformative prior, since we have no information about the distribution of the probability of a storm, and then add more data points to our sample as we go forward in time, building up that probability. We can use that approach with a beta distribution and calculate the probability of a storm in the area. Let me visualize it in case you are not familiar with it. The idea behind using the beta distribution with this prior and learning in a Bayesian framework is the same as when we flip a coin and don't know whether it's a fair coin or whether we should expect more heads or more tails. We want to test it, so we start flipping, and the more times we flip and record heads or tails, the more we build up the picture of the probability. After, for example, 500 trials, we see that the probability is more or less centered around 0.5, which is what we expect: with a fair coin we should record tails half the time and heads half the time. That's how it works in practice. We look at the storm data, and with every month that has or doesn't have a storm we build up the probability, which in fact allows us to predict the probability of a storm in December 2005 based on monthly data. If I did it correctly, the probability of a storm in December 2005, based on our in-sample data and a little beyond, through November, is 40%. Now, how do we calculate the probability of a storm per day in December? We can take 40%, divide it by 31 days and estimate about 1.3%. Or we can use something we know from finance, a compounded return model: we know that if you compound the daily percentage and you do
it 31 times, you get the monthly probability of 40%. This is how you can approach it, and you find that the daily probability is 1.1%. So both approaches, dividing by the number of days or working in the compounded-return framework, give about the same answer; whether it's 1.1% or 1.3%, it's about 1%. It's very low, but it's not zero. So we have figured out the probability of a storm and the probability of a plane at any spot, and we have also derived the probability of an accident, about 0.4% per day. Now let's look very briefly at the case study. This is pretty interesting if you like aviation and have watched a lot of National Geographic documentaries about crash investigations: it's fascinating to see the investigators going around the aircraft, looking for all the causes of what happened. The record of this event that we can find on the internet says that on that day a Southwest Airlines jetliner slid off a runway at Midway International Airport while attempting to land in a snowstorm; that's important. The plane came to rest at an intersection at the northwest corner of the airport, and one person in a car was killed; nobody on board the plane was killed. The snowstorm had begun during the early afternoon and affected all of northeast Illinois and northwest Indiana. Storm-total snowfall of 3 to 6 inches was widespread, and the most intense snow came about one hour before the accident, an estimated 3 inches per hour. Another report investigating the event found that the conditions contributed to the formation of a snow band, and insight into why the snow intensity peaked just prior to the airline incident revealed a combination of events that came together simultaneously for a short period over this area. It demonstrates how complex an air crash can be. This is what Chicago Midway International Airport, where the accident happened, looked like: not a crash in the air, but an overrun, again due to the snowstorm that
happened. Then the National Transportation Safety Board, in the report we can find on the internet, gives the probable cause of the accident as the pilots' failure to use available reverse thrust in a timely manner to safely slow or stop the airplane after landing, which resulted in a runway overrun. So it's a human error, but human error can be correlated with bad weather conditions, with insufficient training and so on. With this peculiar case I just want to draw your attention to one point: whatever the causes of the accidents were, when we approach a huge database, a big data analysis, we need to be aware of what's going on behind the numbers. That's why I was asking you about the results: the process of handling the data is important, but what we derive from it is even more important, and all these things are correlated anyway. One more point about Python, the tool we use, and it's pretty cool. You can go to the National Weather Service in the US, at water.weather.gov, and download an image of daily precipitation for every single day. Just to give you the definition of this funny word that I just learned: precipitation is water released from clouds in the form of rain, sleet, snow or hail, all kinds. What we find is that in the Chicago area on that day, in the summary for the last 24 hours, there was, let's say, a fall (I'm not saying snow or rain, but something was recorded) of about 10 to 12, maybe 13 millimeters. Anyway, the final thought brings all these scenarios together: now we are ready to calculate the probability of an air crash given GPS coordinates in a specific area, in K days. One approach: in statistics we know that if all these events are independent, we can just multiply the probabilities, and we expect something very low. So it's the probability of finding an aircraft in a specific area, times the probability of a storm in that area, times the
the probability of an air crash itself; we derived all of these. When you look at the number, it is 4.3 times 10 to the power minus 3, so it's very small. Now, what can we do with that number? We can multiply it again by the probability of an air crash exactly K days after the last day in the sample, which was the date of the last in-sample accident, the 23rd of August 2005. I intentionally put 107 days as the prediction horizon, just to give you an idea, because this is exactly the number of days from that accident to the next one, the air crash in Chicago: a 107-day separation. So we take the approach that we used before to calculate the probability in K days (that's why I mentioned that it is possible to calculate the probability of an event happening in an interval of time), we use it, and this is how it looks. Zero represents the 24th of August 2005, the graph covers 170 days from that, and what we can read out from it is the probability of an air crash on exactly day 107: it is about 0.0011, let's say.

Now, if you want a comparison for a number this small, let's consider a game of lotto; it's very popular in Europe, and in Australia too. In lotto you have 49 numbers and you pick a set of 6 at random, and the probability that you pick all 6 numbers correctly is 1 in almost 14 million (exactly 1 in 13,983,816), which is about 7.2 times 10 to the minus 8. It's just so small. If you take what you can read from the graph, multiply it by the 4.3 times 10 to the minus 3, and divide by that lotto probability, you get roughly 65, so the probability of this air crash in Chicago is within about two orders of magnitude of winning the lotto. The conclusion: both are extremely rare events, which is good news.

Thank you very much. Questions? Yes? Can you fly it on the side? What's the problem? Where do
you fly? At night? At night, but where do you fly? To India? Well, that's amazing, that's a good question. OK, India, which place? OK, I just can't pick it up... assuming that half is about how to play. No problem. OK, let's... OK, I know more or less how to do it quickly. Do you win lotto? The last crash, that's also important, I know it because of the bad weather: it was in Mexico, in Monterrey, on December 5, 2010, which is six years ago, right? No, maybe... sorry, I just mixed something up.

I'm not sure if it's confirmed by your data, but I guess most of the crashes occur during take-off and landing? Yes, that information is contained in the database. So then why would we care about the length of the flight? Because if you take into consideration, for example, Air France flight 447 from Rio to Paris, which I think crashed on the 1st of June 2009, if I'm correct: it was crossing the area around the equator, and the category is not IMC, it's VMC, so visual, but they were flying at night and they experienced what is called supercooled water in the cloud, water that stays liquid at very low temperatures. It caused the pitot tubes that measure the speed of the airplane to freeze, and that, plus pilot error, caused the accident. So it was en route, more or less exactly halfway between Rio and Paris, so it may happen anywhere. That's why. But just look at the approach: I'm saying it's not about precision, it's just a first step towards building a bigger picture. Two days ago I was talking with someone here in Singapore who told me that, for example, metmetal.com is a big service that runs weather analyses at different levels around the world. So if you have access to all that data plus what I derived (and I will certainly improve it, I'm really keen to do it), you can predict the next... I mean, not predict exactly when and where it will happen, because that is difficult; it is such a complex problem that you just
simply don't know what to do, but at least you can derive the probability of a crash in a specific area. Also, please note that I'm talking about an in-sample: the more data we include, the more all these probabilities will change. So if I'm lucky, one day I would be happy to show you the time evolution of this final probability of an air crash for a specific place in the world, how it evolves in time, because the more air crashes happen, the more that number will fluctuate. But we'll see.

You only work with the crashes that happened because of bad weather? Yes, yes, yes. That's something that is applicable. Think about storms over Singapore: we have so many storms, it rains and five minutes later there is sunshine, and there is a lot of air traffic congestion around Singapore in this area, 25, 25 or even more, it's enormous, and the routes change. So you see how complex the problem of calculating the probability really is, but at least with this outline I think I was fairly successful. Sure, I know where you are going, I know where you are going; and the error could come from the pilot, which is also not certain. I think there are many levels of randomness.

So on a broader view, you think that this problem is... even here the data is very sparse? Well, of course, here is what I missed during my talk. You remember the route that I showed you, and I told you that I'm counting the number of pixels in the maps. In fact, when you draw a line in Python with a width of 0.5, it gives you a spread of plus or minus 1 pixel, so the width of the path in pixels is not 1 but 3 pixels, which I think is a good thing: if you think about a route, the plane never travels along the great circle perfectly; the computer always corrects the path, so it may drift 75 kilometers to the right or to the left along the path. So I'm assuming a probability of finding a plane in that area around the route, but I'm not assuming that it may just go anywhere; it's just
this area. We'll take one more question, if anyone has one, before the lucky draw. You assume, for the probability of finding a plane and the rest, that these are independent probabilities, but are they actually independent? They might be different, right? Well, again, I just assumed that you have one plane per route per, and this is difficult to define, per day. So in fact we know that it is not true, but what I can say about the prediction of finding a plane in an area is that it is a lower limit for the probability, because if you take routes with higher congestion, the probability in a given spot or area will be higher. That's why you need to treat the heat map of the probability of finding a plane at any GPS location as a lower bound, which is very good.
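The final estimate for the Chicago example is plain arithmetic, so here is a minimal Python sketch of it (Python being the tool used in the talk). It assumes, as the talk does, that the events are independent, plugs in the two figures quoted for Chicago (4.3 × 10^-3 for plane-in-area × storm × crash, and roughly 0.0011 read off the graph at 107 days, which is only an approximate reading), checks the 107-day horizon from the two dates, and computes the 6-of-49 lotto odds exactly with `math.comb`. The function and variable names are hypothetical, and the K-day model that produced the 0.0011 is not reproduced here.

```python
from datetime import date
from math import comb, prod


def joint_probability_if_independent(*probabilities: float) -> float:
    """Multiply event probabilities; valid only if the events are independent."""
    for p in probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"not a probability: {p}")
    return prod(probabilities)


# Horizon: days from the last in-sample accident to the Chicago crash.
horizon_days = (date(2005, 12, 8) - date(2005, 8, 23)).days

# Figures quoted in the talk (the second is read approximately off a graph).
p_area_storm_crash = 4.3e-3  # plane in area x storm in area x air crash
p_crash_at_horizon = 1.1e-3  # hypothetical name: graph reading at K = 107 days

p_scenario = joint_probability_if_independent(p_area_storm_crash, p_crash_at_horizon)

# Lotto 6 of 49: exactly one winning combination out of C(49, 6).
p_jackpot = 1 / comb(49, 6)

print(f"horizon: {horizon_days} days")
print(f"P(scenario) ~ {p_scenario:.2e}")
print(f"P(jackpot)  ~ {p_jackpot:.2e}")
print(f"ratio       ~ {p_scenario / p_jackpot:.0f}")
```

Running it prints a 107-day horizon, a scenario probability of about 4.7 × 10^-6, a jackpot probability of about 7.2 × 10^-8 and a ratio of about 66. As the last answer points out, independence is a simplification and the plane-in-area factor is a lower bound on congested routes, so the product is best read as a rough, probably low, estimate.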