Thank you very much for the introduction, and thank you, everyone, for coming today and showing interest in our talk. Well, the title of the talk suggests that we will be talking about transformation in the transport industry, and transformation is one of the main concepts in Minsait's philosophy. I just want to show you a video in order to introduce this concept.

We are the mark we leave. The sign of our commitment to the future. A commitment that evolves and stays in motion as days and times change. Leaving our mark on people, on companies, on society is our motivation, because we are the mark we wish to leave. Because by revolutionizing business, we revolutionize our world. A more sustainable, more human, more intelligent and secure world. We believe that in any field the important thing is to innovate and transform, to create advanced products and services, promote a better society and increase the quality of life. Through our own solutions, innovative services and state-of-the-art technologies.
We contribute to the creation of the companies and institutions of the future. We want to be your companions on this journey full of unexplored routes, creating new ways of doing and living and developing new businesses driven by technological innovation. Working together from start to finish, from ideation to implementation, through agile and collaborative cultures and new methodologies that change the rules of the game. But a trip without companions would be nothing, and we know it. That is why we believe in people, in empowering their talent, in the sum of diverse intelligences united by common values. We are committed, curious, non-conformist people collaborating for a common purpose: a better world for future generations. Welcome to the human technological revolution. Welcome to Minsait.

So, the experience that we are sharing with you today has a lot to do with this positive and human impact on society, because transport, maybe along with the communications and energy industries, is one of the industries that has had the highest impact on society in the past 20 years. Transport is something that affects us all, regardless of whether our company is focused on transporting goods or people, or is a company that uses transport just as a means of connecting its products with its final clients, or even if we are regular citizens who use private or public means of transport to go to work or to visit friends or family around town.

Just to give you some numbers: the transport industry generates 15% of the gross domestic product of the whole world, so you can imagine the economic impact that this industry has. And take Europe alone.
There are more than 10 million employees associated with this industry. If you take a look at the average budget of European households, 13% of it is dedicated to transportation. And, whether it is people or products, we travel thousands of billions of kilometers every year.

But this industry also has a huge impact on the environment: 25% of greenhouse gas emissions are associated with it.

And if we go to the data plane, you can imagine the amount of rich data that this industry can generate: sensor information and the Internet of Things, connected vehicles, interaction through social networks and websites, etc. Some studies state that the efficiency of the transport industry can be improved, and the emissions it generates reduced, by 15% just through the use of big data analytics. However, those studies also show that only 19% of transport companies employ big data and advanced analytics technologies and, what's more, 70% of those transport companies do not plan to use them in the near future.

So, in this scenario of great opportunity, the European Union decided to fund a set of initiatives in order to prove the viability and positive impact that big data and artificial intelligence can have on the transport industry. The goal of the project was to act as a catalyst for applying these state-of-the-art technologies in this digitally immature industry, and it was a complete success, as all the developments that were carried out along the project have now been integrated and are changing the way that these transport companies operate. We are very happy to have contributed to bringing some light and creating a continuous improvement cycle in such a relevant industry.

If we take a look at the project, whose name is Transforming Transport, as I mentioned before, it is a European project funded by the Horizon 2020 funds, and this year it received an award as the best
digital transformation use case based on data in Europe. This project was developed by a consortium of 49 different partners and was led by the transport division of Indra. Those partners represented 11 different European countries, and the project had a budget of 19 million euros. This budget is partially private and partially public, so we can see that not only public institutions but also private companies are interested and see the potential of this kind of initiative.

The project took nearly three years, from 2017 to this year, 2019, and its scope included several use cases covering different transport domains: vehicle traffic, port logistics, airport passenger flows, high-speed rail, connected cars, distribution logistics and urban mobility. Our big data and advanced analytics division at Minsait has contributed to three of those use cases. The first one focused on vehicle traffic prediction and looked at two toll roads, one in Malaga and another one in Porto. The second use case focused on railway maintenance and looked at the high-speed line that connects the Spanish cities of Malaga and Cordoba. And the last one aimed to analyze the flow of passengers at the airport of Athens, in Greece.

But in order to do our magic as data scientists, we needed a computing platform to work on, and we used the Onesait open platform, a big data distribution developed by Minsait that has a strong open community behind it. It covered everything that we needed, because it includes components ranging from data ingestion and storage to a powerful artificial intelligence and machine learning engine for us, the data scientists, to do the dirty work. It also included a set of layers for visualization, APIfication and productization that helped
us very much to integrate all the results with the external partners from the consortium.

But the platform by itself could do nothing because, as Pablo Picasso said, computers are useless because they can only give us answers, and we need to find the proper questions. And it is this question-and-answer idea that inspired our unit to create a methodology to get the best results when we face an analytical problem. This methodology, just like the video that I showed you in the introduction, is focused on finding the key people involved in the process, because finding these key people will allow us to get the key knowledge needed to face the problem. This knowledge involves data, because we must know which data is available, the amount of historical data and the quality of this data, so we can establish which questions can be answered and in which way we can set expectations. We can also use this knowledge to establish a set of metrics that allow us to evaluate in an objective way whether the project was successful or not.

Once we have done this preliminary study, we are ready to identify and list a set of opportunities that are relevant to the project, and then we can prioritize them and create a roadmap in order to get the highest value in the shortest time. So it is following this methodology that we carried out the developments in this project.

And now allow me to introduce my colleague David, who will be sharing his experience with the smart highways use case. Thank you.
Well, thank you, Victor. Now imagine yourself in the following driving situations: when you go to work in the morning and come back home in the evening; when you are on holiday and go to the beach, to the mountains or wherever you prefer; when you go out with your friends to have dinner and come back late at night; or when you have to drive on a rainy day. You know that these road traffic situations are different from one another. We have seen only four, but there are many more, and this is the goal of this use case: to model road traffic situations in order to enhance toll road operations, on toll roads that are located in Porto and Malaga. We had to predict road traffic flows within the next 15 minutes, one hour and two hours, and also the probability of having an accident within the same time periods.

For that we used our knowledge about road traffic dependencies, which are mainly the calendar; the location of the road and of its segments; the hour of the day; the parameters of the road, such as the number of lanes or the width of the road; the number of vehicles that are in the same place on the road at the same time; and the weather conditions. And we faced a problem here, which is that humans are not rational at the wheel, so we probably didn't have the most important feature, which is what drivers are thinking at each moment. But despite that, we achieved really good results, as you will see.

For the road traffic prediction model, we started by analyzing historical traffic flows. Both Porto and Malaga have production systems that record, every 15 minutes, the number of vehicles that have passed through each segment of the road. Here is how it is done.
Thanks to electromagnetic induction, when a vehicle passes above a coil that is placed in the asphalt, the vehicle is counted. Those coils are the squares that you can see on the road when you are driving. We grouped this data by segment, by month and by hour, and we analyzed the peak hours of the day, for example 9 o'clock in the morning or 6 o'clock in the evening, and we identified some outliers. These outliers let us know how each road behaves. For example, Porto's road is used mainly by people who go to work, so we only had to cluster the data into two clusters, one for non-working days and the other one for working days. But in Malaga, as the road is located in a vacation environment, with the beach, natural parks and cities around it, we had to make up to six clusters, because this road was used by people who go to work but also by people who are on holiday. To make sure that the clusters were well made, we performed an analysis of variance within each cluster, in order to ensure that the days in the same cluster behaved in the same way.

All this data was enriched with previous traffic flow rates, which let us know if a bottleneck is coming. And then we tried and evaluated models that are suited to the use case, namely regression models and time series models. For the regression models, we tried SVRs, XGBoost, random forests, etc., and for the time series models:
We tried LSTMs and ARIMA models. And we found that the best model is a random forest, which gave us a relative root mean square error of 15% to 33%, which, in terms of vehicles, is a very good error at some segments of the road during the peak hours. We could achieve good predictions for two-hour horizons, which ARIMA models, for example, did not allow, because they were good at predicting up to one hour but then their performance decreased. We could achieve near-real-time predictions because this model predicts really quickly, but also because it is in a production system developed by Indra. And this is a really important fact, because operators need to know as soon as possible how traffic is going to behave in order to optimize resources, both technical and human, for example by relocating people to different places on the road or by opening or closing gates at the toll plazas. All of this has reduced travel time for users, and their driving experience has improved.

For the accident prediction model, we started by analyzing accident reports. These reports are independent for Porto and for Malaga, and we extracted from them the main variables that we know have an impact on accidents. Those are the level of service, which is a measure that combines the number of vehicles that are in the same place on the road and the parameters of the road in that place; the location of each segment of the road; the hour of the day; the calendar; and the weather conditions. We obtained the probability of having an accident due to each of these variables, and we concluded that the ones with the most impact are level of service and location, followed by the hour of the day and the calendar. Finally, and surprisingly, weather conditions, specifically rain, didn't have much impact, but this study is probably biased because we carried it out in Malaga, a place where there are few rainy days, so it should be
performed again in a city like Madrid, where there are more rainy days. In order to combine all these variables, we built a conditional probability model. This combination lets us know the accident alert level in each part of the road.

Now you may be asking yourself how accidents can be predicted, because all we have seen here is a descriptive analysis. Well, you can see here static variables, which are the parameters of the road, the location, the time and the calendar, but also dynamic variables, which are the weather conditions, which we know in advance thanks to weather stations, and the level of service, which is known in advance thanks to our road traffic prediction model. So model integration is really important here, and both models are integrated in the production system I told you about before. With this model, operators can warn drivers if they are driving on a segment of road with a high probability of having an accident.

Thank you, David. And I recommend that you write down his contact information in case you run into a traffic accident when you go back home tonight. And now allow me to introduce Marta, who will be sharing with us her experience in both the railway maintenance and the airport passenger flow use cases.

Thank you, Victor. Well, currently there are two main strategies for maintaining railway infrastructure: corrective maintenance, which consists of repairing breakdowns that have already happened, and preventive maintenance, in which some tasks are performed periodically. The problem with these two is that, while corrective maintenance could come too late, because the fault has already occurred and maybe the service was interrupted, preventive maintenance could come too early and increase the maintenance cost by performing tasks that were not needed at that moment. So, what was the objective of this use case?
Well, predicting when and where the fault will occur. By achieving this, we will be increasing network availability, decreasing maintenance cost and improving maintenance efficiency. This project focused on two critical elements of the track: point machines, which are the elements that allow the train to change from one track to another, and the track itself. To do so, we had information on the dynamic and geometric auscultation of the track, the track geometry, the maintenance tasks performed on point machines, the movement times of the point machines, the mechanical characteristics of point machines, the events that happen on the track and the weather conditions.

Which challenges did we face during this project? Well, the first one was that not all the data was digitized in the way that we needed, and thanks to the fact that Minsait has several digital capabilities, we were able to digitize it in the way that we needed. The second was that one of the main sources of information for the degradation of track profiles came from the dynamic auscultation, and it was not only a time series but also a signal, so we had to treat it in the frequency domain. The third was a consequence of the second: we had a huge number of variables and we needed to reduce them. The most impressive reduction came from applying a PCA, with which we went from around 800 variables to just 10 while keeping 98 percent of the information. The fourth is quite common in predictive maintenance problems: the number of no-faults was far higher than the number of faults. And how did we handle this?
Well, we applied k-means in order to reduce the number of no-faults while keeping a representative sample. And the last one was that not all the data covered the same time segment, which was quite critical for the second objective, predicting the degradation of point machines, as we will see later.

Okay, here you have a summary of the methodology followed for the first objective, predicting the degradation of track profiles. The first thing that we did was to divide the track into 200-meter segments and then, applying this segmentation, we built a historical data lake. Then we applied the variable reduction process that I've already mentioned and tried different models, such as tree ensemble models, support vector machines and neural networks. We gave the scores and the confusion matrices to our business partners so they could decide which model best fitted our needs, and the one finally selected was the neural network. Here you can see part of the results: we achieved an accuracy of over 85 percent, and we wanted to highlight the true positives and false negatives so you could also see how accurate our model was at predicting fault or no fault, and it did a very good job of it. Well, currently this is integrated in real time.
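The evaluation described for the track-profile model (a confusion matrix, overall accuracy, and a focus on true positives and false negatives) can be sketched in a few lines. This is only a minimal illustration, assuming binary labels where 1 means fault and 0 means no fault; the function names and the toy labels are hypothetical and are not data or code from the project.

```python
from collections import Counter


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 = fault, 0 = no fault."""
    pairs = Counter(zip(y_true, y_pred))
    tp = pairs[(1, 1)]  # fault predicted, fault happened
    fp = pairs[(0, 1)]  # fault predicted, nothing happened
    fn = pairs[(1, 0)]  # fault missed by the model
    tn = pairs[(0, 0)]  # no fault predicted, nothing happened
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Share of all segments that were classified correctly."""
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


def recall(tp, fn):
    """Share of real faults that the model caught."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Toy labels invented for illustration only.
y_true = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 1, 1, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print("confusion matrix (tp, fp, fn, tn):", (tp, fp, fn, tn))
print("accuracy:", accuracy(tp, fp, fn, tn))
print("recall on faults:", recall(tp, fn))
```

With heavily imbalanced classes such as faults versus no-faults, accuracy alone can look good while faults are being missed, which is one reason to look at true positives and false negatives separately.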
So we are already reducing maintenance cost.

For the second objective, predicting the degradation of point machines, the methodology was more or less the same, but the results weren't as good because, as I mentioned before, not all the data covered the same time segments.

And what is the real impact of all of this? The maintenance cost has been reduced by almost 35 percent, the number of interventions per month has also been reduced by about 15 percent, and the pollution emissions have also been reduced by between 15 and 25 percent. And that is all real.

Okay, so this is the last use case that we're going to talk about, and I will ask you just to keep in mind that at first it was solely an airport initiative and, thanks to the good results, the shops in the airport wanted to be involved. So just wait and see.

Okay, here we see what we all do when we're going somewhere: basically, check in, pass security, board the flight, and then the flight takes off. This project focused on two stages: predicting arrivals at the security check, and understanding the shopping behavior of passengers in the time between the security check and boarding, which is also known as dwell time. To do so, we had information from the airport, from two local airlines, Aegean Airlines and Olympic Airlines, and from the shops in the airport.

Well, for the first objective, predicting arrivals at the security check, we combined several approaches, and the first thing was finding a probability distribution for each type of passenger. We had four groups.
Three from the airlines from which we had data, and a fourth covering the rest of the airlines. With these probability distributions we had a probability model for the next day ahead, and then, by applying machine learning techniques, we could improve the prediction for the first hour ahead. So, combining all of this in the end, here we have an example, and what we see is that the prediction captures the most important changes in the real data. Just imagine how useful this is for the airport in order to adjust the number of people needed at the security check.

Okay, so now we know how long the passenger is going to stay inside the airport, and the next question is: what variables affect the shopping rate the most? For the shopping rate we had ones and zeros: one if any purchase was made and zero if the passenger didn't buy anything. Previous studies had found that the variable that affected the shopping rate the most was the boarding gate. However, in this new study some relationships between variables were found; for instance, normally all the flights going to the same destination have the same boarding gate assigned. So we wanted to rank all these variables, and for that we fitted a decision tree. The result was that the variable that affects the shopping rate the most is the type of airline, followed by the departure time, then the destination and finally the boarding gate. Just to give you some numbers, the maximum global shopping rate was about 15 percent. The fact that the airline was considered low-cost could change the shopping rate by a factor of two, while the fact that the boarding gate belonged to a given group of boarding gates could only change it by 1%. So that's shocking, right?

Okay, so what do we know at this point?
We know how long the passenger is going to stay inside the airport, we know the variables that affect the shopping rate the most, and the last question to be answered is: in which segment of time is the shopping rate the highest? To answer this question, we had the distribution of dwell time, and what we see here is that only 2% of passengers leave 15 minutes between the security check and the boarding time, and normally the rest of us leave between about 45 minutes and one hour 15 before the boarding time. Then, with the same segmentation, we had the shopping rate, and what we see is that it grows until about one hour and 45 minutes, and then, no matter how long the passenger stays inside the airport, the shopping rate will not increase. And if you remember the number that I've just told you, that the maximum global shopping rate was about 15 percent, well, here we see that it's lower, and this is because for this study only data from local airlines has been considered, and normally when we travel locally we tend to shop less than when we are traveling abroad.

Okay, so just a quick recap of what we studied: we have a model that can help with the way the security check personnel management is done, we know the variables that affect the shopping rate the most, and we know in which segment of time the shopping rate is the highest. Knowing this, some strategies can be implemented in order to have the passenger wait until that time. And now Victor will give you the conclusions.
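The idea of measuring the shopping rate per dwell-time segment can be illustrated with a short sketch. It assumes one record per passenger, made of the dwell time in minutes and a 1/0 purchase flag as described in the talk; the function name, the bin width and the sample records are hypothetical and are not data from the airport.

```python
from collections import defaultdict


def shopping_rate_by_dwell_bin(records, bin_minutes=15):
    """Group passengers into dwell-time bins and compute the shopping rate per bin.

    records: iterable of (dwell_minutes, purchased), with purchased in {0, 1}.
    Returns a dict {bin_start_minute: share of passengers in that bin who bought}.
    """
    totals = defaultdict(int)
    buyers = defaultdict(int)
    for dwell_minutes, purchased in records:
        bin_start = int(dwell_minutes // bin_minutes) * bin_minutes
        totals[bin_start] += 1
        buyers[bin_start] += purchased
    return {start: buyers[start] / totals[start] for start in sorted(totals)}


# Toy records invented for illustration only: (dwell time in minutes, purchased).
records = [(10, 0), (50, 0), (55, 1), (70, 0), (100, 1), (110, 0), (130, 1), (140, 0)]

rates = shopping_rate_by_dwell_bin(records, bin_minutes=60)
for bin_start, rate in rates.items():
    print(f"{bin_start}-{bin_start + 60} min: {rate:.2f}")
```

Plotting these per-bin rates against dwell time is one way to see where the curve stops growing, which is the kind of plateau described above.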
Thank you very much, Marta. So, as my colleagues just said, the project results were quite good, and they are all integrated in real-time management systems and exploited by the different partners of the consortium. However, I want to dive a little bit more into the conclusions that we reached. It is not just about the numbers, because something that we found is that there is still a bit of work to be done on the cultural change around data and analytics in this industry, because sometimes the expectations were different from the results that could be obtained. For example, in the vehicle traffic use case we had the expectation that rain would have a huge impact on accidents, but we didn't have significant historical data on rainy days, so we couldn't establish whether it has a real impact or not. Or, for example, in the case of the train network maintenance, we didn't have enough data for the point machines covering the same time segment to be correlated, so we couldn't carry out a really good analysis. So the first thing we need to do is teach this industry that, depending on the data that we have, the results will be very different.
Okay. And now, going back to the question idea that I mentioned before with Pablo Picasso, I just wanted to list which questions we answered along the project. For the first use case, we asked whether we could do something to predict vehicle traffic behavior, and we were also asking whether we could do something to prevent traffic accidents from happening. In the case of the train network maintenance, we were asking whether we could do something to reduce maintenance cost and also guarantee that this maintenance produces fewer greenhouse emissions and has a smaller impact on the service level that the final clients experience. And in the case of the airport use case, we were asking whether we could do something to properly schedule the personnel working at the airport's security checkpoint, so we could avoid those long queues that I'm sure you will have found at some time at the airport. And there was also a very important question that, as Marta mentioned, wasn't initially contemplated in the project, but the success of the first analysis made the duty-free shops get into the project and ask it: can I do something to predict the shopping behavior of the passengers who are coming to the airport today?
Well, the answer to all those questions was a huge yes. We could do something in all of those cases, and by that we have proven the viability and the benefit that the transport industry can get from the application of big data and advanced analytics. But we have also seen that those results can have a very positive and meaningful impact on society, as citizens will have a different experience when we do something related to transport.

And I can't conclude the presentation without thanking and introducing some other colleagues who helped us along the project and who are here with us today: Paula, Bea, Santiago, Maria, the three of us and, of course, the rest of our data science and artificial intelligence tribe, who help each other every day and share good and bad moments. And now we will be answering your questions, in case you have any.

Just before finishing, I wanted to emphasize that the application of big data analytics is not only relevant for the transport industry. Regardless of the sector of your company, you had better apply it now, because otherwise you may end up at a competitive disadvantage. So hurry up, because the time is now. Thank you very much.

So, any questions? No? Okay, well, thank you very much, everyone, for coming.