Thank you for being here. I'll be talking on AI for engineers. So, a quick check: how many of you are engineers by education? All right, quite a few. We may not all build machines, but we consume them every day: cash machines, even the newspaper you start your morning with has a machine behind it, right? So in today's talk I thought of giving you a few use cases and then talking about trends: what's really happening in the engineering world, and how AI can help.

The talk is "Integrating Digital Twin and AI for Smarter Engineering Decisions". How many of you know digital twins? A few of you, okay. Have any of you implemented a digital twin, actually built one? Fantastic. In the interest of the larger audience, I'll first build some background: what is a twin, then show you a use case of how we can build one, and then we'll see how it really fits in the AI workflow; because this is a data science conference, you'll be learning concepts, key techniques, and then an application.

A bit about myself: I work as an application engineer. Why am I the one talking? I get to talk to various different companies, and I feel fortunate that I can also learn from them. So whatever we have learned, as a company and myself, I'm going to present those use cases here as well.

Okay, so let's build some motivation. Instead of just getting into what a digital twin is, a quick question: why do well-designed engineering systems sometimes fail or underperform? A good challenge to have. A second look at that failure footage shows the centrifugal forces at work; as an engineer I cannot watch that again and again, it hurts, right? Now, these systems are expensive, millions of dollars; maintenance is expensive; and if there is a failure which has not been identified and rectified, unplanned downtime can cost additional money.
At the same time, in many cases this is hazardous as well. Good motivation. So, can we solve these problems? Engineering companies have been doing this by carrying out timely maintenance. Now, can AI help? Can we start understanding the behavior of these systems and then predict? So, can the solution be a digital twin? Well, not the only solution, but one of the solutions.

What is a digital twin? If you Google it, you'll find various different definitions. This is my definition: a digital twin is an up-to-date representation of a real asset in operation. Let me emphasize: up-to-date, real asset, in operation. People might call it a cyber-physical system, a digital avatar, a cyber object; it all means the same thing.

To give you an example, I have not taken the usual wind turbine example; this one is something I have worked with, and it comes from one of our customers as well. Here is an oil extraction unit. Anyone with experience in the oil industry? A few of you, okay. You might have one site like this, or multiple sites like these. Now, as we try to understand the definition of a digital twin: what is the asset? This is a rig, one out of some twenty rigs.
This one rig costs more than a million dollars, and a repair costs $100,000. From their repair logs they found that the pumps are the most critical part, and if you can identify problems well in advance, you can save money. Makes sense. Even though I'm talking about this one use case, the AI workflow I'm going to show can be applied in other applications. An asset could be a component, a valve, a system, or a system of systems.

Up-to-date: that asset must be continuously pushing data to the digital twin, where you might have a condition monitoring system giving you visual indications of the fault. In this particular case it's saying that one of the cylinders has a blockage. And then, if this is the failure scenario, how long do I have before the system fails? So here is a regression model which gives you remaining useful life.

How is this useful? Eventually everything boils down to business impact, right? We started by saying these are expensive systems and maintenance is costly. A site manager might get a flag like this: the pump at this location has a problem, and not just the problem; it gives more details: one of its cylinders is blocked, and under this faulty condition you have 15 more hours. Go ahead and fix it. With this information he can manage his inventory better, run what-if scenarios, operationalize. Correct.

So we have built enough ground on what a digital twin is, and we saw a use case as an example. Let's see how to build one. Again I'll take the same pump, because we have been introduced to it, and we want to solve this problem: what's the fault, what needs to be maintained? So, a fault classification problem. I have some machine data, and I want to develop an algorithm.
This could be a predictive maintenance algorithm; in this world they call it predictive maintenance, but essentially it's an AI model. Now, the challenge. The oil and gas folks, or the industrial folks among you, might be able to answer: what's the challenge when we deal with these systems? Absolutely: because these systems are expensive, we carry out timely maintenance, so most of our data is healthy data. I want my algorithm to detect failure, but if we are not feeding it failure data, not teaching it to learn failure, we cannot expect it to predict failures.

So can I just go ahead and break one of the teeth in my pump because I want failure data? Creating these failure cases is expensive, time-consuming, and sometimes simply not possible. So with this setup, let's build our story from here. Everyone clear on this part? We want to build a classification model, but our challenge is that we don't have failure data; if we just start as-is, there will be class imbalance.

We'll be using this approach, and many of you might be aware of these steps: you start with gathering data, then build a model, and then worry about deploying it. And as my friend there said, we also need to take care of the infrastructure, where you connect the systems and have that streaming engine in place. So where does the digital twin fit in this workflow? That's what this talk is about.

This morning, if you attended Vidal's talk, he also talked about bridging physics-based systems with AI. On the left-hand side, your left-hand side, is a physics-based model, where I know what's happening inside the model; as a domain expert you would know how it functions. The right-hand side is more about behavior.
He gave the example of a falling ball. Well, I could very well put sensors at the ceiling and at the bottom and maybe figure out what's happening, but if I know the physics I can continuously track the complete motion. So the advantage of a physics-based model is that it's transparent. But not everything can be modeled, and I'll show that with an example. Your data-driven model, on the other hand, captures behavior; but as we said, not every parameter can be measured. The idea is: can we complement these two?

So here is my talk's theme: we bridge the AI, data-driven model with the physics-based model, and we get better decisions. I would call this pair a twin: my generation of data will happen with the physics-based model, and I'll then build a data-driven model on top of that.

Let's get started with the first piece: the physics-based twin. As you said, many of you are engineers. When you build a twin, you have got to know the physics of the system. I've taken this slide from IIT Bombay professor Natraj, who works on control systems; he presented it at one of our conferences. If this is a gas turbine on the left-hand side, I start representing it with its various stages, with equations, first principles. Can you spot which of these five stages is difficult to model with first principles? The combustion chamber is a little difficult; we can complement that with data-driven models.

So anyway, he started with first principles and then also used a simulation tool. You have got to be a domain expert who understands the domain; a tool can just help you build one. I did not have those equations handy, which is why I did not show them directly. For my pump, I had a model like this: a multi-domain model, which has mechanical, hydraulic, and electrical parts as well. So the first step is building the model, having a representation of the physical asset.
Is it a twin yet? No, it's just a replica. The second important part is that you have healthy data; you fine-tune the model with that healthy data and make sure it behaves like the real system. Here, black is the measured data and blue is the simulated data, and there is a gap. What do we do? We optimize: you set up an objective function saying you want the parameters tuned, and now it is tuned. Can I label it a digital twin now? Yes; now it behaves as if it were the real system. I can explain this in 30 seconds, but the process really takes a lot of time; for this customer it took two years. It totally depends on how much fidelity and accuracy you want, how much control, how much detail. We said it could be just a component or a complete system; what is critical needs to be identified, and that is something you will do.

All right. Once it is a twin, I can inject failures, as many failures as I want. Again, here I should know what kinds of failures might occur; that is also crucial. Let me show you an example of how this works. I'm going to inject a failure here, a seal leakage fault: seal leakage on. As it's a pump, I'm continuously measuring pressure. With this fault added, how does my pump behave? Initially the pressure was between those lines; now it is going beyond, leaving the normal zone. So I can add one fault interactively and watch it; or, since in the real world you might have combinations of faults occurring together, I can write a nice for-loop: for this range of seal leakage, of blockage. Now that it's a twin, I don't have to worry; I can run as many simulations as I want. This is what is happening, and from it you get synthetic data.
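The fault-injection loop can be sketched as follows. This is a toy Python stand-in, not the talk's actual multi-domain model: `simulate_pump` and its fault effects are hypothetical, invented only to show the sweep-and-label pattern that produces synthetic training data.

```python
import itertools
import numpy as np

def simulate_pump(seal_leakage, blockage, n_samples=1200, seed=0):
    """Toy stand-in for the physics-based pump model. Returns a synthetic
    pressure trace whose baseline and pulsation shift with the injected
    fault severities (the real effects came from the tuned simulation)."""
    rng = np.random.default_rng(seed)
    t = np.linspace(0, 1.2, n_samples)                 # 1.2 s at ~1 kHz
    healthy = 2.5 + 0.3 * np.sin(2 * np.pi * 50 * t)
    # Hypothetical fault effects: leakage lowers mean pressure,
    # blockage adds a low-frequency pulsation.
    fault = -0.8 * seal_leakage + 0.5 * blockage * np.sin(2 * np.pi * 5 * t)
    return healthy + fault + 0.05 * rng.standard_normal(n_samples)

# The "nice for-loop": sweep fault severities (0 = healthy) and label each run.
records = []
for leak, block in itertools.product([0.0, 0.25, 0.5, 1.0], repeat=2):
    trace = simulate_pump(leak, block)
    records.append({"seal_leakage": leak, "blockage": block, "pressure": trace})

print(len(records))  # 16 labeled synthetic runs
```

Each labeled record then goes into the training set alongside the real healthy measurements.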
So, we started by saying we want to build a classification model but did not have failure data. We said the physics-based model can help us build one by generating synthetic data, and this can then be added to your healthy operational data. Makes sense, right? So next I combine whatever sensor data I have with the synthetic data, and this is ready for the next step. Here I have 240 measurements; we sample every millisecond, so each record has 1,200 readings, and there are 240 such records.

Let's move to the next part: building a data-driven model with the available data. Preprocessing: most of you know the challenges; missing values, outliers, offsets. Those are real problems, and I won't get too deep into them because I want to focus more on feature extraction. But let me tell you, our data had two problems: spikes and offsets. The sensor was hitting its maximum value, and we validated with our engineer that it's not possible for the pressure to just shoot up like that, so there was definitely a problem. You're playing the detective role: what's really happening with your data? We concluded that it was noise; that, again, is a crucial step. And even though all my recordings are 1.2 seconds long, I still see offsets. Those were the only two challenges in this particular data.

Now, this data was sitting directly on Hadoop, but I have copied everything onto my own hard drive. What I'm doing is saying: any reading greater than nine, remove it, and then fill the missing values; I'm doing a linear interpolation here. That is how we deal with the spikes and the offset. With that I have so-called clean data, ready for the next step, which is my favorite: identifying features. I'll be using the terms feature and condition indicator interchangeably. So, what is a condition indicator?
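The two cleanup rules just described — drop readings above the physical limit, then fill the gaps by linear interpolation — look like this in a pandas sketch; the series values are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical raw pressure channel with one saturated spike (the sensor
# railed at its maximum) and one missing value: the talk's two problems.
raw = pd.Series([2.4, 2.5, 9.7, 2.6, np.nan, 2.5, 2.4])

clean = raw.mask(raw > 9)                    # readings above 9 become NaN
clean = clean.interpolate(method="linear")   # fill all gaps linearly

print(clean.tolist())
```

The spike at index 2 becomes the average of its neighbors, and the original missing value is filled the same way, leaving a gap-free trace for feature extraction.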
A condition indicator, or feature, is a unique characteristic which helps you achieve your goal. If it is classification, it should be able to distinguish between healthy and faulty. If it is a regression problem, your feature should have a predictable behavior. Correct. The problem with feature extraction is that you have got to consider various features to build a robust algorithm. The second challenge is that you might not know which ones are really important; that's where ranking comes into the picture. I can extract features from the time domain, the frequency domain, or the time-frequency domain, where the frequency content varies with respect to time. I don't expect all of us to have this background, so let's just look at the impact of taking one feature, and see whether a single feature helps to identify failures.

I'm starting with the mean. I have raw pressure data and I take its mean. My faults are nicely labeled, so I'll give you two seconds to observe this plot. The black line shows healthy data; let's look at the blocked inlet, one of the faults. I'm comparing healthy against faulty data by severity of the fault: blue is less severe, red is really severe.

Now let's take the mean and put it in a box plot; box plots are super useful when you're comparing two distributions. Comparing blocked inlet against healthy, I can see it clearly; I don't need machine learning here. If this were an anomaly detection problem, I could just write a threshold: if my mean is less than about 2.7, it's a blocked inlet; otherwise, it's healthy. No machine learning needed. But our problem is not that simple. If my goal were just to flag anomalies, this would still work: healthy versus anomalous. But we might not know what the fault is. My point is that just one feature might not be enough to do that.
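The "no machine learning needed" point can be written as a one-line rule. The 2.7 cut-off here is only read off the box plot in the talk, so treat it as an assumed value.

```python
import numpy as np

THRESHOLD = 2.7  # assumed cut-off, eyeballed from the healthy box plot

def is_blocked_inlet(pressure_window):
    """Rule-based detector: flag a blocked inlet when the window mean
    drops below the threshold separating healthy from blocked data."""
    return float(np.mean(pressure_window)) < THRESHOLD

healthy = np.full(1000, 2.9)   # illustrative healthy window
blocked = np.full(1000, 2.3)   # illustrative blocked-inlet window
print(is_blocked_inlet(healthy), is_blocked_inlet(blocked))  # False True
```

A rule like this covers anomaly detection on one fault; the rest of the talk shows why classifying several faults needs more features and a trained model.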
So in this case, we just looked at the mean. It could be a good condition indicator for anomaly detection, but not for everything. What do we do? I can look at variance, at kurtosis, at skewness. Now I can see very well that with kurtosis, all my blocked inlet faults can be nicely segregated: every time I get raw data and take the kurtosis, whenever it's greater than five, there's a blocked inlet problem. So one feature might help identify a particular fault, and another feature might help with something else; you need a combination. Sometimes you might need to look at two features together, mean versus variance, and as you can see, that distinguishes even better.

Talking about the frequency domain: peaks, peak frequencies. Say you're getting data from rotating machines; just to give an example, here I have data from a machine with three sources: a bearing, a motor shaft, and a disc. If I look only in the time domain, I just get a combined effect. But if I look in the frequency domain, I get three distinct peaks at three different frequencies. Let me show this side-by-side comparison. My point is that sometimes the frequency domain gives you additional information for our goal, which is classifying faults. Look at these two faults, blocked inlet and seal leakage, in the time domain: they follow similar patterns, and it's very difficult, in the time domain alone, to figure out which is which. But if I look in the frequency domain for the same faults, around 2,000 Hz the magnitudes are different; for the blocked inlet it's close to 0.015. So the frequency domain can add information beyond the time domain. Let me show you these two or three steps together.
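A minimal feature extractor covering the indicators just mentioned — mean, variance, skewness, kurtosis, plus the dominant spectral peak — might look like this in Python; the 1 kHz sampling rate matches the talk's one-millisecond measurements, and the test signal is synthetic.

```python
import numpy as np
from scipy import stats

def extract_features(x, fs=1000):
    """Condition indicators from one 1.2 s record: time-domain statistics
    plus the frequency and magnitude of the dominant spectral peak."""
    spectrum = np.abs(np.fft.rfft(x - np.mean(x)))   # drop DC before FFT
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return {
        "mean": np.mean(x),
        "variance": np.var(x),
        "skewness": stats.skew(x),
        "kurtosis": stats.kurtosis(x, fisher=False),  # Pearson: normal = 3
        "peak_freq_hz": freqs[np.argmax(spectrum)],
        "peak_magnitude": spectrum.max() / len(x),
    }

# Synthetic record: a 50 Hz pressure pulsation around a 2.5 bar baseline.
t = np.arange(1200) / 1000.0
feats = extract_features(2.5 + 0.3 * np.sin(2 * np.pi * 50 * t))
print(round(feats["peak_freq_hz"]))  # dominant component at 50 Hz
```

In practice you would compute a row like this per record per sensor, giving the feature table that the ranking and training steps consume.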
So: extracting features, and then our challenge of finding out which of those features are really important. What I'm doing here is importing all the data, using an app-based workflow. This is what I want to show: all those features, if you know them, fine; if not, they are all directly available in the app, time-based and frequency-based. I can select all the features, and they get bundled up into a feature table. When I want to look at the frequency domain, I can take a power spectrum, and I can also see that, say, one fault might have four or five peaks while another fault might not; I can look at that and select the frequency band.

Once this is done, the next important part is understanding which feature or condition indicator contributes the most to separating your failures; what can help you? That's where histograms, or this kind of representation, will help. Let's look at, say, the data mean. How do I read this? I ask: is this a good feature? If it were a good feature, I would see clearly separated histograms for all the faults, but there's overlap. So then I can use other techniques, like ranking the features. Here I'm using one-way ANOVA to rank them; there are various techniques to rank your features, and that's where ranking really comes into the picture.

That's feature extraction, or identifying condition indicators. We have now extracted enough information, ready to be given to machine learning training. Again, one solution does not fit all: we need to compare various models and also be very sure what the impact in production will be.
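One-way ANOVA ranking can be sketched with SciPy. The feature values below are synthetic, chosen so that "kurtosis" separates the fault classes while "mean" overlaps — mirroring the histogram reading above.

```python
import numpy as np
from scipy.stats import f_oneway

rng = np.random.default_rng(42)

# Hypothetical per-record feature values grouped by fault class
# (three classes, 50 records each).
features = {
    "kurtosis": [rng.normal(3, 0.3, 50), rng.normal(6, 0.3, 50), rng.normal(9, 0.3, 50)],
    "mean":     [rng.normal(2.5, 0.5, 50), rng.normal(2.6, 0.5, 50), rng.normal(2.5, 0.5, 50)],
}

# Rank features by the one-way ANOVA F statistic across fault classes:
# a larger F means the class means differ more relative to within-class spread.
ranking = sorted(
    ((f_oneway(*groups).statistic, name) for name, groups in features.items()),
    reverse=True,
)
print([name for _, name in ranking])  # best separator first
```

The highest-ranked features go forward to model training; low-ranked ones can be dropped to keep the model small, which matters later at the edge.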
When you're prototyping, it might be a good model; but when you take it to production, memory footprint and performance might be issues. Correct. I like this slide for a reason: many times we get the question, what is good accuracy? This is an iterative process; that's my highlight here. You start with historical data, preprocess it, apply different techniques, get a model; once the model is ready, you take it, apply it to new data, and integrate it.

As you know, there are various techniques available, so how do we compare them? Again, MATLAB users might know there are apps available; in the earlier step we used the Diagnostic Feature Designer app, and there are various apps for these tasks. Let me show you the Classification Learner app, which is what I used in this particular demonstration. I export all the features we created in the last stage directly into the Classification Learner app. Here I can compare two predictors, two features, and see whether they give a good separation. I can select all the models; that's what is happening. And then the comparison, which was the challenge: I compare the various models and see what really works best. In this case, an ensemble with boosted trees gives the maximum accuracy. There are also ways to look at performance there: confusion matrices, ROC curves. Once this is done, I want to take it and move it to production; code generation is available for that as well.

In this talk I'm not getting into building a regression model, but if you're interested in how to build a remaining-useful-life model, I'll be happy to talk about it later on. Today's case is classification.
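Outside the app, the same compare-many-models step can be sketched with scikit-learn; the feature table here is randomly generated rather than the talk's real pump features, and the three model choices are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Stand-in feature table: 240 records, a handful of condition indicators,
# three fault classes (in the talk this came from feature extraction).
X, y = make_classification(n_samples=240, n_features=6, n_informative=4,
                           n_classes=3, random_state=0)

models = {
    "tree": DecisionTreeClassifier(random_state=0),
    "svm": SVC(),
    "boosted_trees": GradientBoostingClassifier(random_state=0),
}

# Compare models on cross-validated accuracy, like the app's side-by-side view.
scores = {name: cross_val_score(m, X, y, cv=5).mean() for name, m in models.items()}
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))
```

Which family wins depends on the data; the point is the loop itself — train many candidates on the same feature table and pick by validated accuracy, not by habit.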
Okay. But based on your data there are various techniques. I'll just show you what is important and what really happens in the digital twin world. You start building; here I'm showing a degradation model with a safety threshold. I know that my machine will fail after some point; say a car engine will fail after one lakh kilometres, so I know the safety threshold. What happens here is that I start with whatever information I have, and then continuously keep updating the model coefficients, the parameters. That's the key: you're tuning your model, the model is learning, and as you can see here, you get a more and more accurate estimation of remaining useful life. That's also very interesting.

Okay, now coming back to this particular part: once you have built such a system, a digital twin which has the physics as well as the AI model, bridged together, built together, and once it is tuned, it can be shipped directly, put into production.

So, what have we done so far? We started by generating data and mixed that with the sensor data, the operational healthy data; our challenge was that failure data was not available, and we used a physics-based model for that. Then we got into preprocessing; we had only two problems, spikes and offsets, and we dealt with them. Then identifying features: we saw why it's important, and that you might need to look in the time domain, the frequency domain, and the time-frequency domain (which we did not cover, but links are up). Once that was there, we compared various machine learning classification techniques and models. Now it is ready for the last part, which is deployment. And here the key question to ask is: which algorithm needs to be deployed where? How many algorithms, how many techniques, have we built here?
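The continuously updated degradation model can be illustrated with a toy linear version: refit on all the data seen so far and extrapolate to the safety threshold. The threshold value and the simulated health index here are invented for the example.

```python
import numpy as np

THRESHOLD = 10.0  # assumed health-index value at which the machine has failed

def estimate_rul(times, health):
    """Refit a linear degradation model on everything seen so far and
    extrapolate to the safety threshold; remaining useful life is the
    time from the latest observation to the predicted crossing."""
    slope, intercept = np.polyfit(times, health, 1)
    t_fail = (THRESHOLD - intercept) / slope
    return t_fail - times[-1]

# Simulated degradation: health index drifts up 0.1 per hour from 2.0,
# so the true threshold crossing is at t = 80 h.
t = np.arange(1, 41, dtype=float)   # 40 hours of observations so far
h = 2.0 + 0.1 * t

print(round(estimate_rul(t, h), 1))  # hours remaining at t = 40
```

Calling `estimate_rul` again after each new observation is the "continuously updating coefficients" idea: early estimates are noisy, and they tighten as more degradation data arrives.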
Three. First, the model itself, for generating data; second, the feature extraction; and third, the actual machine learning model, the classification model. Where do you think feature extraction should go? Cloud? No: edge. Let's see why it's important to understand this. One way to ask the question — and let me warn you, this is going to be a busy slide; I'll give you five seconds to look at it, but it's extremely useful — is: how much time does your AI model have to act? That is really the key question, and that's why you see so many technologies. On the extreme left-hand side is hard real time: decisions have to be made right there in the environment, so normally for autonomous vehicles and aircraft you'll have algorithms implemented directly in embedded code. Whereas when you're doing lifecycle analysis, dealing with huge amounts of data, the extreme right-hand side comes into the picture, where you have essentially no time constraints. In our case, we'll be taking the feature extraction algorithm to the edge. And in the digital twin world it's important that data is streaming all the time; you're getting data from all your assets.

Let's go through it one by one: why is it important to take feature extraction directly to the edge? If this is a pump with one sensor taking measurements every millisecond, I'll be getting a thousand samples a second, which is 16 KB. Over an entire day that's about 1.3 GB, and with three pumps and 20 sensors each, we'll be generating 76.8 GB of data per day. Does it make sense to send all that data unprocessed, and pay a lot of money for the transmission? You'll be consuming bandwidth as well as cost. So what do you do? On the edge itself, clean the data, extract what is useful, and send only that.
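The bandwidth arithmetic is worth checking. The talk only quotes 16 KB per second, so the 16-bytes-per-sample figure below is inferred; under that assumption the fleet-level total comes out around 83 GB per day, the same ballpark as the slide's 76.8 GB (the slide presumably assumed a slightly smaller record size).

```python
# Back-of-envelope for the edge argument: one sensor sampled every
# millisecond, 16 bytes per sample (assumed: value plus timestamp).
samples_per_sec = 1000
bytes_per_sample = 16

per_sensor_per_sec_kb = samples_per_sec * bytes_per_sample / 1e3   # 16 KB/s
per_sensor_per_day_gb = per_sensor_per_sec_kb * 86_400 / 1e6       # ~1.38 GB/day
fleet_per_day_gb = per_sensor_per_day_gb * 3 * 20                  # 3 pumps x 20 sensors

print(per_sensor_per_sec_kb, round(per_sensor_per_day_gb, 2), round(fleet_per_day_gb, 1))
```

Either way, tens of gigabytes of raw samples per day versus a few feature vectors per second is the whole case for computing features on the edge.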
Send what can be consumed by your machine learning algorithm. Makes sense? So, how are we doing it? Here I'm creating a buffer of a thousand samples; it does not make sense to take the mean at every single point, so I wait for a thousand samples, take the mean, and extract those features there. Okay, something is not working right here, but I wanted to show it; I'll just try one more time. Yes, okay. Sure, I see you have a question; let me stop the demo there, answer that question, and then complete the talk.

So, on historical data: the suggested approach is that you first develop prototypes with the historical data. That's exactly what we have done. Once it is tuned, once you have those algorithms, the algorithm is ready to be shipped to work with the new data.

All right. Now what I'm doing is selecting that I want to convert this into C, and that's what happens; again, it's an app-based workflow. So this is my C code, converted directly from the fourth-generation language. Similarly, if you want to take it to an Android device, you can call this directly from Android Studio; that is also possible.

Now let's talk about the twin part, physics-based and data-driven together, and take it to the cloud. What we'll do is package that streaming function: essentially it has a preprocessing element, a continuous-update element, and the classification, all bundled together. I take that function and convert it into a packaged environment: it builds the binaries, and it is now available as a deployable CTF component. I'm not getting into too many details, but essentially this can then be merged, or put directly, into something called MATLAB Production Server, a server environment.
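Going back to the buffering step from the edge demo a moment ago: it can be sketched as a small streaming class. The thousand-sample window and the features computed are from the talk; the class design itself is just one illustrative way to structure it.

```python
from collections import deque

import numpy as np

class StreamingFeatureExtractor:
    """Buffer raw samples on the edge and emit one feature vector per
    1000-sample window instead of forwarding every raw reading."""

    def __init__(self, window=1000):
        self.window = window
        self.buffer = deque()

    def push(self, sample):
        """Add one sample; return a feature dict each time the buffer fills,
        None otherwise."""
        self.buffer.append(sample)
        if len(self.buffer) < self.window:
            return None
        x = np.array(self.buffer)
        self.buffer.clear()
        return {"mean": float(x.mean()), "variance": float(x.var())}

# One second of data per emitted feature vector: 3000 samples -> 3 vectors.
ext = StreamingFeatureExtractor()
out = [f for s in np.full(3000, 2.5) for f in [ext.push(s)] if f is not None]
print(len(out))
```

Only `out` — a few small dictionaries per second — would cross the network, which is the bandwidth saving computed above.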
I'm showing MATLAB because that's the company I represent, but you could put it elsewhere too. Here we're talking about Azure, where you have repositories, and then you have a mechanism to talk to enterprise applications as well as management interfaces. So your CTF component then gets bundled here.

All right, just to talk about personas: how can we build one, and what kind of team might you want? Domain experts, definitely, who understand your assets and systems; embedded engineers who can talk to edge devices; solution architects for the Azure part; and then data scientists who can collaborate and see the big picture. A data scientist can also be a domain expert, that's very well possible, but I just wanted to call out those subject matter experts. Any thoughts on the difference? I agree, yeah.

All right, a quick summary of what we have seen: we started with that as a problem, we quickly summarized the data generation part, and the last piece talked about how you decide where to take your AI algorithm; we looked at that chart of how much time your AI algorithm has to act.

There are other applications of digital twins as well. We looked at predictive maintenance as a use case; there are people building process twins, that also is possible, mostly the oil and gas folks who do downstream processing; Reliance, maybe. You can have business optimization. Let me tell you a story of some energy optimization folks and how they have taken their algorithms to production; it also relates to all of us. The company is BuildingIQ; has anyone heard of it?
It's an Australia-based consultancy company. Their client could be a hotel like this one; they came to them and said, we want to save energy. You might know this: around 30% of energy is consumed by large buildings: hotels, malls. They have ventilation and air-conditioning systems which are really inefficient, because they do not take into account other parameters like the weather outside or how many people are inside; I can keep running the AC here even if there is no one in the room, right? Normally what happens is someone comes at 8 o'clock and starts the HVAC system, and at 8 p.m. turns it off; that's your set point as well.

So what did they do? They collected the other aspects, both business data and engineering data: what was the energy consumption yesterday, for the entire last year or two; what are the weather patterns; how many people normally visit (it matters if it is a mall; your weekday and weekend scenarios will be different, right?). All that was considered, and they started building the algorithm while keeping human comfort intact, without sacrificing it. Correct.

This was their original system: the HVAC plant, controlled by a supervisory control, which was cleaning and filtering the data and then setting a set point. What they built was a predictive optimization algorithm, which was looking at huge amounts of historical data; an adaptive, multi-objective optimization model, running periodically on the cloud, considering those other parameters and giving you more accurate results and set points. What was the impact? Note that here they don't have any physics-based model as such; it's more data-driven, but they can go forward and build something physics-based for the HVAC systems too. You can learn more about these things; there's a detailed customer story available.
Now, Gartner; all of you know Gartner. They say that by 2022, two-thirds of the companies that have implemented IoT will have at least one digital twin implemented, and this might well happen within the next year or so. So there's a tremendous opportunity in building one. My question to you is: what's your digital twin going to be?

All right, that's what I had. We have a demo booth outside; I'll be happy to talk, and I'll be there till the 10th. Any further questions? Of course you can ask; we have time, right? We have five minutes. Perfect. So I would say: visit the demo booth, where you can learn more about how to develop and deploy deep learning, predictive maintenance, and digital twin algorithms. We also have a workshop on the 10th, which talks about addressing deep learning challenges; that's at 4:30, the last session. And if you are a startup, also talk to us. Okay, that's it; ready for questions.

Yes? Okay, so when we build these algorithms there are two phases. The first is the prototyping phase, where you build algorithms and look at historical data; then you want to take it to production. MATLAB really comes in at the prototyping phase; MATLAB Production Server is the mechanism, a piece of software which can be bundled and put directly on any cloud. It can look at the load, how many requests you're getting; for example, if it is an Uber-style taxi prediction app you have built, how many people right now want to use that particular piece of code? The software automatically balances the load and connects with other presentation layers like Tableau or Spotfire; that is taken care of. It is built for low-latency environments; many financial companies are already using MATLAB Production Server as well. Yes? Yes, of course.
So, with respect to benchmarks, for RUL or machine learning in general: definitely; I'll also give you a link where a third party has done the benchmarks.

Hi, Hitesh. Hi, so I work for Diageo, which is a distillery company, and we are working on something similar. We get the data at the IoT level and we process it in terms of a digital twin, but the challenge is in the auto-correction of the machines. Can we look into that aspect? Because whatever comes out of production is going to be a visualization kind of thing, like "this inlet is the issue" or "this sensor is the issue". Can we send that information back to the IoT devices? Absolutely, absolutely. I don't know whether you noticed this particular part, but that is also key: this arrow here; once you have deployed, it goes back to the sensor. So yes, that's possible; we'll talk more about it. And that needs to be done; there should be a closed loop. 

Sure, I can repeat the question: are there any pre-built physics-based models available? Yes, there are many models which get shipped with the tool. We also give you components; for example, a cylinder, which can be used in a machine or in an engine, so you get a cylinder. If you know the physics of it, you can connect it.

Hi. Yes, hi; so this is out of curiosity. If we can generate quality data, why can't we emulate the whole engine and generate the healthy data as well? That gives you power: you just need to have a physical model, a physics model, and then you can emulate the failures; and it helps you engineer better, even before building something. So essentially, if I understand your question: can you just do everything with the physics-based model?
Right; you can generate both healthy and unhealthy data from the physics-based model and do your modeling over that itself, so you don't really have to invest in sensors initially. Okay, yeah. So I would say that essentially we are doing that initially; that's the important part. But I'll answer the question in two pieces. First, not everything can be modeled; like the combustion chamber, it's difficult to model that with first principles. Second, you have got to have real-world data to make the model better and better. So it will always be a combination of an AI model and a physics-based model complementing each other. Once it is tuned, it becomes a golden reference, and you can do a number of things with it. Yes, of course, that's an important step there. All right, okay; I'll be available afterwards. Thank you very much. Thank you.