To tell us more, we have from Repsol Elena Thomas and Emilio Martin, both senior data scientists at the Data and Analytics Hub. Welcome to you both. Thank you, Nicolas. Hi Emilio, are you sharing your screen? Yes, I'm going to share my screen now, so let me know when you can see my presentation, please. Sure, I'll confirm. Okay, can you see it yet? We can't see it. Now everything is all set. Take it away, Emilio. Okay, thank you. Thank you very much for the introduction, Nicolas. As you said, my name is Emilio Martin. Today, together with my teammate Elena Thomas, I am going to talk about how we are managing corrosion detection in our refineries. Essentially, the talk is divided into two clear parts. The first is about the mission of Repsol as a company and how we are carrying out our digital transformation program. The second part will, I think, be more interesting, not only because I won't be the speaker, but because Elena is going to share a lot of technical detail about the kind of neural network architectures we are using to detect corrosion in our refineries. When we were preparing this talk last week, we tried to put ourselves in our audience's shoes, and the first question that came to mind was: why? Why is a company like Repsol, with more than 25,000 employees and almost 100 years of history, investing so actively in this kind of technology? The answer is pretty simple. We are a multi-energy company, and our mission is to design and deliver the energy of the future to our customers. That simple statement hides a really difficult puzzle that we are trying to solve in our day-to-day work. To be honest, I don't think we have a definitive answer to that puzzle yet, but we know some of the pieces. One of those pieces, for example, is providing clean energy.
In this regard, last December our company made public its commitment to become a net zero emissions company by 2050. I know 2050 may sound like a long way off, thirty years' time, but this commitment comes from our CEO, and we have a set of milestones to achieve along the way, which means we are actively working day to day toward that goal. A good example of how we are using our technology and talent to answer that question is the hackathon we took part in last summer. The hackathon was arranged by Microsoft, and its aim was to detect methane leaks using satellite images and artificial intelligence. The main challenge was that, as is pretty common in our industry, data was scarce. The team came up with an innovative solution: instead of sticking to a supervised learning approach, we built a simulation approach, and on top of that simulator we put an optimizer to assess, for each of our assets, the probability of it being responsible for causing the leaks. We won the hackathon, and it was an international one. I'm sharing this piece of history, which maybe we can expand on in a future talk, not only because the hackathon went well, but because I think it illustrates pretty well how we do things at Repsol, how we are trying to solve this big puzzle of designing the energy of the future. As I said, we don't have a definitive answer yet, but we know some of the pieces, and for us the key ingredients are talented people, technology, and resources. To give you some figures about our digital transformation program, the first thing that may catch your attention is its scale.
We refer to our digital transformation program as a kind of umbrella under which we develop all kinds of digital initiatives. We have more than 200 digital initiatives under way, involving more than 1,200 people in cross-company collaboration, with a budget for the previous year alone of 150 million euros and an estimated return of 1 billion by 2022. Obviously, we can't accomplish such a huge number of projects using only internal resources, so we also rely on partners and suppliers. The last point, which I think is also quite important, is people and talent: under this digital program we created almost 500 new roles. Elena and I work for the Data and Analytics Hub, but there are other hubs, such as digital channels, robotic process automation, the cloud competence center, security, digital marketing, and hardware and robotics. In this picture you can see our value chain. We have different portfolios: exploration and production, trading, refining, distribution, electricity and gas, consumer, and corporate. The project we are going to share with you today is from the refining portfolio, and inside each portfolio we have different strategic initiatives. In the case of refining, the first initiative is aimed at improving workforce and plant safety levels through task automation and real-time data assessment; safety is, I would say, our highest priority. The autonomous plant initiative is about using data to optimize the parameters of our units and plants so as to improve processes automatically, where improving a process could mean reducing energy consumption or producing more product of a certain quality.
Then there is business planning, where we use predictive analytics to optimize decisions, decisions that range from which components to buy to how to load or reload a vessel, and so on. The project we are sharing today is framed within the strategic initiative we call zero unexpected failures, which is aimed at increasing reliability, reducing maintenance costs, extending asset life cycles, and maximizing production. Every digital product we develop under this strategic initiative lives inside an internal product we call Asset Health, an application in which we can deploy different models to assess the health of our assets. Focusing a little more on the topic of this talk, corrosion: as Nicolas said at the beginning of the presentation, corrosion has a huge impact on our operations, typically a couple of million euros per year, and we can distinguish two types, internal corrosion and external corrosion. In this graph you can see the number of interventions we have made each year, with internal corrosion in blue and external corrosion in red. Internal corrosion represents 80% of our interventions and external corrosion the remaining 20%. Here you have a snapshot of our Asset Health application. In a moment I'm going to share a video. The aim of this kind of project is not only to detect corrosion in our pipes but also to provide a prioritized list to our technicians to guide their inspections, because, as you can imagine, with thousands of kilometers of pipe, what you want is a prioritized list of which parts of the refinery you should inspect. So let me play the video.
This is a typical image of one of our refineries, and for those of you who have never been to one, a refinery is a pretty big thing, covering several square kilometers. Regarding the application: when one of our technicians logs in, he has an agenda with a list of upcoming inspections, and for each pipe he sees the name of the asset, the line, the parameter that is having the highest impact on the asset health estimate, and a set of notes. He also sees the set of assets located in the refinery, ordered by the asset health score that the different models provide for each asset. If we click on a particular asset, we can also see how that health index has evolved over time, and so on. Now I will hand over to my colleague Elena, who is going to give more detail about the models and the challenges we faced during this external corrosion project. Elena, are you there? Can you hear me? Yes, can you hear me? Yes, Elena, we can hear you. Okay, great. So, why did we choose external corrosion for our presentation? We think it is the more complex problem, and we have developed our own external corrosion tool, whereas commercial solutions for internal corrosion were already available on the market, so we consider this the more innovative piece. Why is it a challenge? The challenge is to look for correlations between the visual and the thermal images and try to find insulation breakdowns in our pipes. The reason it is challenging is that we want to do it in a non-invasive way: we don't want to check every suspected defect manually, because checking a defect manually means removing the insulation, and we would like to remove the insulation only when the operator is quite sure there is a corrosion problem underneath.
So this is what we aim for, and the way we have started to deal with it is to build a machine learning model that automatically detects these defects using visual and thermal images. In these images you can see four different kinds of defect. The image on the left shows ruptures; the following image shows a complete detachment of the cladding on a pipe; the third shows a case where the detachment of the cladding is only partial; and the fourth shows another very common kind of defect, those that occur at the joints of the pipes. This is our objective, and the difficulties we face are several and varied. In the image you can see the part of the pipe rack at the La Coruña refinery that we used to start this project; if this phase is successful, it will be followed by further phases, but for this part of the project we are limited to data from this one rack. Another difficulty is that the ground truth is very hard to establish. It is not enough for the operators to check the ground truth in the images: the operator has to go to the real site and check whether the defect is real or not. So the ground truth is not easy to determine even for a human, and this is combined with the fact that corrosion defects are sometimes located in parts of the rack that are difficult to reach. It is hard to go there and check, so if we can accurately detect X, Y, and Z coordinates, which is another challenge, recovering three-dimensional coordinates from a two-dimensional image, then our work becomes much more useful to the operator in planning inspections. These are the kinds of images we are going to use; I already showed the visual ones, and these are the thermal images. As I previously mentioned, the available data is scarce for this part of the project.
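The challenge just mentioned, recovering X, Y, Z coordinates from two-dimensional images, can be illustrated with classic two-view triangulation: if the same point is seen from two known camera positions, its 3D location can be solved for linearly. This is only a generic sketch with invented camera intrinsics and poses, not the project's actual photogrammetry pipeline:

```python
import numpy as np

# Assumed pinhole intrinsics (focal length and principal point are made up).
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])

def camera_matrix(R, t):
    """Build a 3x4 projection matrix P = K [R | t]."""
    return K @ np.hstack([R, t.reshape(3, 1)])

def project(P, X):
    """Project a 3D point to pixel coordinates."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

def triangulate(P1, x1, P2, x2):
    """Linear (DLT) triangulation: recover a 3D point from two 2D views."""
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)     # null space of A is the homogeneous point
    X = Vt[-1]
    return X[:3] / X[3]

# Two views one metre apart along x, both looking down +z.
P1 = camera_matrix(np.eye(3), np.zeros(3))
P2 = camera_matrix(np.eye(3), np.array([-1.0, 0.0, 0.0]))
X_true = np.array([0.5, 0.2, 4.0])  # a hypothetical defect position (metres)
X_hat = triangulate(P1, project(P1, X_true), P2, project(P2, X_true))
```

With exact, noise-free projections, `X_hat` recovers `X_true` to numerical precision; real photogrammetry handles many views and noisy detections, but the geometric idea is the same.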
Including both thermal and visual images, we have fewer than 1,000 images, and we have built two models: the thermal model, which distinguishes between two classes, defect or no defect, and the visual model, which distinguishes between the four kinds of defect. The thermal model is not always useful, because some defects may or may not show up thermally depending on the weather and operating conditions: if the pipes are not wet, a defect may be present and yet not be detectable by the thermal model. That is the reason for combining both. Okay, for those of you who don't have it fresh in mind, here is a summary of the analytical challenge in computer vision over the years, which mostly comes down to feature extraction. In the past this was a very hard task to do by hand, but in the last ten to fifteen years, progress in deep learning has transformed computer vision, because these models can build features automatically. The reason is the convolution operation, which you can see at the upper left of the slide. Think of an image as a matrix of numbers; a convolution is a mathematical operation that takes another, smaller matrix, slides it across the original image, and produces a new matrix, a new image. At the bottom left we can see the typical deep learning architecture used for computer vision tasks: it is based on stacks of these convolutional layers, and each layer operates in the way described on the image.
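To make the convolution operation just described concrete, here is a minimal sketch in plain NumPy: a small kernel matrix slides across the image matrix and produces a new feature map. The tiny image and the edge-detecting kernel are invented for illustration:

```python
import numpy as np

def convolve2d(image, kernel):
    """Slide a kernel over an image (valid padding) and return the feature map."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # Element-wise product of the kernel with the window it covers.
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A synthetic image: dark on the left, bright on the right.
image = np.array([
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
], dtype=float)

# A Sobel-like kernel that responds to vertical edges.
kernel = np.array([[1, 0, -1],
                   [2, 0, -2],
                   [1, 0, -1]], dtype=float)

edges = convolve2d(image, kernel)
# The response is zero in the flat regions and large where the
# dark-to-bright boundary falls inside the kernel's window.
```

A deep network simply learns the numbers inside many such kernels from data instead of having an engineer hand-design them.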
The first layers extract low-level features of the image: they detect edges, or differences in lighting. As we move to the right through the layers, we see more complex features: circles, geometric shapes like rectangles, and then even more complex things. The objective is to generalize from the image on the left, which is a car, to something like "this is something large with four wheels", so that the last layer of the architecture can be used for classification tasks. This approach is now everywhere in the scientific community, and there are several problems you can tackle with these tools. In our case we use object detection, because we want to detect the defects, and image classification, because once a defect is detected we have to classify it into one of the four labels, as I said. Over the last ten years, and especially starting from 2012, the top-1 accuracy shown in the upper part of the slide has increased greatly, from about 50 percent to almost 90 percent. Here you can see several famous architectures in the field, like AlexNet and Inception, and the way these architectures improved their results was mainly by increasing the number of parameters; the radius of each circle corresponds to millions of parameters, and some are very large. Around 2019 the community changed direction with EfficientNet, a family that pursues accuracy with far fewer parameters, and which we have used in our model.

But how can we make use of all these state-of-the-art results from the scientific community? The answer is transfer learning. Transfer learning is a change of paradigm with respect to traditional machine learning: in the past we used one dataset to learn one task and a different dataset to learn another task. With the transfer learning paradigm, we can take advantage of a network trained on one dataset for task one and combine that learning with our smaller dataset to make the system learn task two. With fewer images you get less computing time and lower computing cost, and you still obtain good performance on task two. To see this in more detail, let's look at how the parameters are transferred in a more technical way. In the upper part, we train system one to complete task one, which is to classify between the labels elephant, snake, and terrier. The architecture is, as usual, made up of several convolutional layers followed by some fully connected layers, and we pass the training images through it. This is what we call the pre-trained model. Now, we are interested in transferring some of the parameters of this network, which were costly to obtain. The parameters we normally freeze are those encoded in the first layers of the architecture, not the last ones, because they usually correspond to embeddings and feature extraction. We then pass our own training images through, so that, for example, to learn what a window is, we can take advantage of what architecture one already learned in a previous task, such as how to identify rectangles. That is the main idea behind the paradigm. At Repsol we are using it in many projects, in computer vision and also in natural language processing. An additional benefit is that we save computing time and expenditure, and we also reduce our carbon footprint, which, as my colleague Emilio mentioned at the beginning, is one of our goals under the 2050 net zero emissions commitment.

Apart from that, the other advantage of transfer learning, as I've already mentioned, is that in many domains, not only in this project, training data is scarce. The whole company is under the digitalization program, but to make a company of this size a real data-driven company we have to start little by little, and we don't always have all the data we would wish for. So, why did I introduce transfer learning? Because the bases for our visual and thermal models are pre-trained models: the EfficientNet architecture, and ResNet for the thermal model, pre-trained on the COCO dataset. As a framework we used the TensorFlow Object Detection API because of its performance and popularity. From this we obtain our own model, which receives these kinds of images and outputs the coordinates of the defect, the class of the defect, and a confidence score. What about the performance of our model? Since our dataset is unbalanced, we cannot use accuracy, and we have to turn to metrics such as precision and recall. Let me briefly remind you what they mean: precision is the proportion of true positives among everything I flag as positive, and recall is the proportion of the real positives in the sample that I manage to flag. For our business unit, the most important metric here is recall, because they don't want to miss defects in their refineries. If you take a look at the results on the thermal images, precision and recall are quite high: a precision of 95 percent and a recall of 70 percent. However, for the business unit this is not enough, because, as I mentioned, the thermal defects are not always "switched on", so we also need a model that performs well on the visual images.
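The two ideas above, freezing a pre-trained feature extractor while training only a new classification head, and then scoring the result with precision and recall, can be sketched in a few lines of NumPy. Here the "pre-trained" extractor is just a fixed random projection and the defect/no-defect data is synthetic, so this is a conceptual toy, not the real EfficientNet/ResNet setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a pre-trained feature extractor: its weights are FROZEN,
# i.e. never updated while learning the new task.
W_frozen = rng.normal(size=(4, 16))

def extract_features(x):
    return np.maximum(x @ W_frozen, 0.0)   # frozen layer + ReLU

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Synthetic binary task, a stand-in for defect / no-defect images.
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# Only the new head's parameters are trained (logistic regression on
# top of the frozen features), via plain gradient descent on log-loss.
w_head, b_head = np.zeros(16), 0.0
feats = extract_features(X)                # extractor runs, but never learns
for _ in range(500):
    p = sigmoid(feats @ w_head + b_head)
    grad = p - y                           # d(log-loss)/d(logit)
    w_head -= 0.5 * feats.T @ grad / len(y)
    b_head -= 0.5 * grad.mean()

# Score with the metrics discussed above.
pred = sigmoid(feats @ w_head + b_head) > 0.5
tp = np.sum(pred & (y == 1))
fp = np.sum(pred & (y == 0))
fn = np.sum(~pred & (y == 1))
precision = tp / (tp + fp)   # of flagged defects, how many are real
recall = tp / (tp + fn)      # of real defects, how many were flagged
```

The design choice mirrors the talk: the expensive representation is reused as-is, and only a small, cheap head is fitted to the scarce task-specific data.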
Here our performance is lower, but we are trying to improve it by retraining the model with more training data. This machine learning model, though, is only a small part of our whole pipeline, and I want to emphasize the full pipeline we developed for this project. The first part, in green, is the manual process, which involves modelling the space under study in three dimensions; the second part, which is automatic, includes the machine learning model, the link between the 2D images and the 3D georeference of the defects, and the web application we built. For the manual model we use photogrammetry. Photogrammetry, to summarize, converts the space under study into a three-dimensional model. How do we do this? On the right we have an example: to obtain a 3D model of this house, I would take 360-degree images at three different heights, low, medium, and high, and then, as the sketch on the left shows, with images from those perspectives I can geometrically find the edges of the three-dimensional object. This was the first step of the project. However, photogrammetry gives you a relative three-dimensional model; it does not come with absolute coordinates. For that, we developed an object model based on a laser scan of the same area and on the labelled assets the business unit provided us. In this way we model the area in three dimensions and can connect it with the photogrammetry. The objective we achieved is to convert 2D images into 3D points, so that once I have a detection in one 2D image, I can relate it to its X, Y, and Z coordinates and to the name of the asset, which is what really helps the business unit.

What's more, and this is quite advanced, we have built visual and thermal three-dimensional cubes so that we can group defects: one defect may be covered by ten images taken from different perspectives, and perhaps the defect is visible in only one or two of those ten images. The last part of the project concerns model validation, retraining, and prioritization of areas. On the bottom right you can see what the operator sees on the website: the blue dots are the defects we found. The operator can click on any of these defects, prioritize, and see the pictures the model has associated with each one. In this way the operator can choose the area of the refinery he or she needs to visit to assess whether the defect is real, or may see that there are four defects in the same area and decide to remove the insulation there, which is the hard part. That is one part of the web application; the other is a tool for the analyst to relabel an image as a true or false defect, which will help us with model validation and retraining in the future.

Here are the key improvements we identified over the course of the project, mainly three. The first relates to image acquisition: it is very important to take good pictures, meaning no big differences in sunlight or clarity and no shadows, the kind of things that help a computer vision algorithm a lot; the thermal and visual images must overlap across the 360 degrees when we capture them; and another choice we made at the start was to use radiation images instead of temperature images, because a temperature image depends a lot on the kind of day and is difficult to reproduce for the same asset. The second was labelling improvements: we found several difficulties in the first weeks of the project, so one person was dedicated to labelling the thermal images and another to labelling the visual images, so that the labelling was always done consistently; we had to remove one class from the visual model because we did not have enough examples of it, and we had to relabel some defects. For data augmentation, we took different parts of the pictures, zoomed in and out, and rotated them to account for image variability. And the most important thing is retraining: this part of the project lasted three months, we retrained twice during those three months, and we are retraining again now; we have tried to build capabilities into our website so that in the future retraining will be much easier and the amount of labelled data available will grow. With this we finish our presentation. I would like to thank you for the opportunity to present our work; I hope you enjoyed it, and we are happy to answer any questions you may have.

Emilio, Elena, thank you so much for that very interesting presentation. We do have some questions for you, so let's dive straight in. Ruben asks: why defect versus no defect, only one class, in the thermal model? There is only one class in the thermal model because all the defects there simply "switch on": you only see whether something is more wet or not, so you cannot assess the kind of defect. If there is corrosion, you will see that the insulation properties of the pipe are different, or not; that is all.

Another question: because of the extremely limited availability of data, did you try any one-shot or few-shot learning algorithms? Not yet, because in this part of the project we sadly devoted less time to the machine learning model than to everything around it. One of the most difficult parts was having a reliable way of linking the defect to its exact position, so we devoted more time to that. We did not try one-shot learning; we only tried standard transfer learning with the images available.

Next question: did you use the pre-trained network for classification, or for feature extraction? If for feature extraction, which classifier did you finally use? And a follow-up: does your model include both classification and continuous regression to arrive at its output? I'm not sure I understood the whole question, but I will try to answer: we used the feature extraction layers, not the classification layers; the classification layers were our own, and they were classification layers, not regression layers. Hopefully that answers your question, Kenneth; if not, you'll have to get in touch.

How does the model pick up CUI? CUI is the internal corrosion part, isn't it? That was a statistical model based on flow properties, so I can't help further with that here; he can contact us through Twitter, which would be the best way to follow up, as some of these questions are very specific.

Is there any process to check the quality of the images before the machine learning model? A question from Alejandra. Checking image quality has been carried out manually; we don't have an automatic way of saying "this image is valid, this one is not", because all our images go through the training stage. Of course, in the early steps we saw that performance was not good, and we took a step back and concluded that the images were not good enough, but we have no automatic way of filtering them. So there's no automation of the image capture? There is automation of the capture, but regarding quality, all the images are used regardless. We have tried to teach the business units the practices I mentioned, but if in some area they do things badly, we are not detecting that automatically.

Final question, from the human resources point of view: the technicians who previously would have had to walk these thousands of kilometers and inspect all these pipes manually, are they being retrained, or are they being replaced? What are the HR consequences of what you're doing here? Well, I think the technology is just a tool, and essentially they are not going to be replaced. We are simply providing a tool to prioritize the work, because most of the time, when you talk to the technicians, what they say is that they don't know where to inspect: they have so many possibilities and so many kilometers to cover that being given a tool that supports their decisions makes them really happy. We are not trying to replace our technicians, because, you know, it takes a lot of time to train a good technician.

Great, that's good to hear. Well, it just remains for me to thank you both very much indeed. We're almost out of time, so maybe some of these more specific questions will find their way to you directly through the networking session. Once again, thanks very much indeed to both of you. Thank you. Thank you.