My name is Anil Arora; I am part of the SAS analytics practice team. Background-wise, I was with Accenture and Infosys before SAS, so I have been working on predictive modelling practically from day one of my career. At SAS my responsibilities are really around the AI/ML practice: we help our customers get onto their AI/ML journey using the SAS platforms. One quick request for all of you: we are circulating a notebook in the audience where we just need your registration IDs, so that in case any questions come up and we want to get back in touch with you, we have your details. In the meanwhile we will keep circulating the notebooks. I know it is tough at 5.30 in the evening to still be absorbing heavy doses of AI/ML, but I promise to make it more interesting and more worthwhile for you to be sitting late on a Thursday evening. Right, so first of all, thank you everyone for coming and for such a great event. I have been attending a couple of sessions since morning and really found them interesting; I always thought there is so much to learn from everyone, and I really realised that today.

The topic I have chosen for today is essentially building and deploying machine learning models with an open SAS platform. Yes, I am calling it SAS open. It is not open-source SAS software; it is an open platform, and the reason I say it is an open platform I will highlight in my next few slides.

Before we get into how you build and deploy machine learning models, let me very quickly cover what the analytics life cycle is: the different phases of building a machine learning model and finally deploying it into downstream applications. At a very high level, the first and foremost step of building any model, be it a classical machine learning model, a statistical model, or even a deep learning model, is data to work upon. And today you are getting data not just from structured databases, which is what we have traditionally worked on, but also in the form of text, images and videos. That data really forms the fuel for any machine learning or statistical model. The second step is the pre-modelling work. This is where a lot of the scrubbing of the data takes place: you do all kinds of descriptive analytics, check the completeness of the data, check its availability. Remember the philosophy: garbage in, garbage out. It is our responsibility as practitioners to make sure that the fuel we put in to make our machine learning models work is not garbage, because if you feed in garbage, you get garbage out. So it is very important to have your data cleansed as much as possible before you build any kind of machine learning or deep learning model. The third step is modelling. This is where people start getting interested, because, as people still say, this is where the magic happens.
Now, you could build any number of models; you could even iterate many models with just one equation, just one algorithm. But the ultimate goal is to reach the point where you can define your own champion models and your challenger models. Whether it is identifying spam versus non-spam emails, or any classification, prediction or forecasting problem, this is where the modelling really happens, with the many different algorithms now available to us. The fourth stage, the post-modelling steps, is again a very important part of building any predictive model. You could build multiple iterations, as I said, with different algorithms or even with a single algorithm; personally I have built models with one algorithm and run around 50-odd iterations on that same algorithm. That is where I was getting a bit greedy about building the model that can best be used by my business. At times it is good to be greedy about identifying the best iteration of a model, but there are time constraints we generally run up against. To decide which is my best model, there is a lot of validation statistics I need to work through, and only then do I decide upon my champion and challenger models, which then get consumed in downstream applications, or, as some people still do, in manual ways, extracting the code and the algorithms and pulling them back by hand into downstream applications. But there are newer approaches that have come up, which I am going to talk about and hopefully also demonstrate.

Last but not least, and unfortunately the least spoken about, because people still think the magic is in the modelling, so why worry about the rest: to really complete the analytics life cycle, the deployment side has to be very strong. You could build any number of sophisticated models using all those AI/ML and deep learning techniques, neural networks, CNNs, RNNs, YOLO v1, YOLO v2, but if they are not getting deployed, they are really a piece of paper lying in a bin. All the effort you have put in up to that point becomes more of an educational exercise, rather than your business benefiting from it. So it is very, very important to build models that are actually deployable, and there are a number of reasons why models do not get deployed. One of them is the model not being in sync with what the business expects. I built a model in my previous organisation, and statistically very few people could have come and told me that the model did not work; I was very confident of it. I gave it to my client, and the client came back saying, I cannot accept this model. I was taken aback: what is happening? The client said: with the variables you have in your model, I cannot do a classification, because the industry you have taken as one of the important variables cannot, as per the regulations of that country, be used to differentiate or to drive any outcomes or predictions.
So my whole exercise of building that model went for a toss and I had to rebuild it. That gave me a learning: whenever you build a model, build one that can actually be consumed. You need business sense along with being a data scientist who can work deep down in the algorithms. This is where, I personally feel, the whole personality of a data scientist comes in: you cannot just be someone who builds models with the available algorithms, because building a model is using an equation, and we now have tools that have made our life much easier in terms of building models. That is the good part, but it is not really the most challenging part. We need a full, end-to-end view of building and deploying models.

I am going to pick a couple of topics from this whole cycle. One is feature engineering; this is more by public demand, somebody asked whether I could cover feature engineering, so I am covering that. The other is interpreting machine learning models: how can we interpret them, because not long ago people were saying that machine learning models are not really interpretable. So I am going to talk a bit about that, and then we will jump into a very quick demo of building a machine learning model and then deploying it.

Let us get deeper into feature engineering. A feature is really a synonym for an input variable; the inputs are what you call features. Finding the best set of inputs to use in modelling is really about identifying those parameters, and this forms the crux of scrubbing and getting the best data for your modelling purposes. It typically happens at the data preparation stage of the complete analytics life cycle. Why is this important? Because when you get data from your data warehouses, it is in whatever shape the business has been running at the back end; those are the variables that get captured. But you can make many more variables that give you a better prediction. Either the information is available in the existing data sets you get from your warehouses, or you can build newer variables: transformations of existing variables, or any kind of engineering on them. Whatever the way of doing feature engineering, the end goal is to build a model that generalises well on data that was unseen when the model was being built. That is the essence of feature engineering.

Now there is a concept I call the curse of feature engineering, because, as you will have realised, I am talking a lot about creating new variables. High-dimensional data is everywhere, and we keep creating more variables on top of it. What happens is that as you add more features to your existing data sets while the observation set stays stagnant, the sparsity of that feature space relative to the observation space keeps increasing, probably exponentially. So if your observation space does not grow while your feature space does, the risk of the model not generalising to new data increases. That is why this is called the curse of feature engineering, or, for the curse of simply having more variables, the curse of dimensionality.
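A rough Python sketch of this effect, for illustration only: with the number of observations held fixed, the typical nearest-neighbour distance, one simple measure of how sparse the data has become, grows as features are added.

```python
# Illustration of the "curse of dimensionality": with a fixed number of
# observations, the data becomes sparser as the feature space grows.
import numpy as np

rng = np.random.default_rng(42)
n_obs = 200  # the observation space stays fixed

for n_features in (2, 10, 50, 200):
    X = rng.uniform(size=(n_obs, n_features))
    # pairwise distances; ignore the zero self-distances on the diagonal
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    print(f"{n_features:>4} features -> median nearest-neighbour distance "
          f"{np.median(d.min(axis=1)):.2f}")
```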
There is a reason you do feature engineering, but you always carry this curse, because your feature space grows while your observation space does not. So you need to realise how much feature engineering really needs to be carried out, and how much you need to cut down on the redundant variables.

How do we take care of this curse? What actions can we perform while building a machine learning model? At a very high level, it includes a few processes. The first is constructing new features, which is really akin to preparing new data from the data sets you already have. This could be logarithmic transformations, inverse transformations, autoencoders, label encoding, or deriving variables from existing ones. For example, you are given a date, say the date of birth of a customer, or the date a person bought a policy from an insurance company, and you create new variables out of it, such as the tenure of a customer with a particular organisation. That aspect of creating newer variables is predominantly what I mean by constructing new features.

The second is selecting only the key features. You could end up having 200 or 300 variables, but if I can select around 70 or 80 of them that explain the same variance that my overall data set explains, I really do not have to pick up every single variable of that 200-feature space. If I am able to explain the variance in my dependent variable, or in whatever I am trying to predict, with those features, I do not need every single feature presented to me. That is one way of reducing multicollinearity and of getting down to the really important variables; I am not calling them significant variables, because that is more of a statistical term that comes into the model-building process. The good part is that selecting key features helps you preserve the interpretability of the variables, because no new orthogonal component gets created; it is not a principal component, it is essentially picking the same variables. You do not lose interpretability, which makes you much more comfortable explaining the model to the business, because there is no new variable that was calculated by some algorithm at the back end.

The third is clustering features, again for reducing multicollinearity and reducing the dimension space. It helps you put variables into clusters where all the variables within a particular cluster are highly correlated with each other and not really with the other clusters, so you could pick the top one or two variables out of each cluster, which can explain the model well without keeping all 200 or 300 variables.
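A minimal pandas sketch of the construction and selection ideas described above; the column names and data are made up purely for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical customer data; columns and values are purely illustrative.
df = pd.DataFrame({
    "date_of_birth": pd.to_datetime(["1980-05-01", "1975-11-23", "1991-02-14", "1964-08-30"]),
    "policy_start":  pd.to_datetime(["2015-03-10", "2010-07-19", "2018-01-05", "2001-12-01"]),
    "annual_income": [450000.0, 1200000.0, 300000.0, 800000.0],
})
today = pd.Timestamp("2019-06-20")

# Constructing new features: derive age and tenure from raw dates,
# and a log transform to tame a skewed income distribution.
df["age_years"]    = (today - df["date_of_birth"]).dt.days / 365.25
df["tenure_years"] = (today - df["policy_start"]).dt.days / 365.25
df["log_income"]   = np.log1p(df["annual_income"])

# Selecting key features: a crude filter that flags one of any pair of
# highly correlated numeric variables, to cut redundancy / multicollinearity.
num = df.select_dtypes("number")
corr = num.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
to_drop = [c for c in upper.columns if (upper[c] > 0.9).any()]
print("candidate drops:", to_drop)
```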
Now, with machine learning models coming up, the complexity of building a model has increased, and that is why newer techniques have come up for reducing your variables. You will have heard about PCA; a newer algorithm that is now available is robust PCA. The difference is predominantly in how they handle outliers: if you have a lot of outliers in your variables, standard PCA will still form orthogonal components, but they will not be as accurate as what you end up with using robust PCA. There are other methods available, nonlinear techniques like autoencoders, which work on the neural network principle, so we are essentially using neural networks to reduce the number of variables and identify the better ones, again for reducing multicollinearity in your data set. And there is SVD, singular value decomposition, again one of the better ways of reducing multicollinearity. The drawback, if I can call it that, of these algorithms is that you lose interpretability, because a new variable gets created out of the original variables that you have. That is the downside of using these techniques.

The other topic I said I would talk about is machine learning models not being used, organisations still facing resistance to them, because they are not easily interpreted. Can we really make them interpretable? Is there a way I can explain these models to the business: which variables are driving a particular prediction, which are driving it positively, which are driving it inversely? Can we get to those findings? The answer, fortunately, is yes. How many of you have heard of a concept called ELI5? It stands for "explain like I'm 5". Not literally, of course; a five-year-old cannot understand a machine learning model, but the idea is that you should be able to explain these models as if to a five-year-old. That is where these concepts come from, and all the algorithms I talk about on my next slide predominantly sit under this ELI5 umbrella of post-modelling diagnostics: LIME, partial dependence charts, the latest one I was reading about, ICE plots, the relative importance of variables, and so on.

LIME stands for Local Interpretable Model-agnostic Explanations. The first word is "local", which means the interpretation is not a global phenomenon; it is a local explanation. For any instance where you want to see why a particular prediction happened, in your training or test data set, you perturb that data point, create a number of samples around it, score them with the same machine learning model, give weightages to that newly formed data set, and then fit something simple on it, perhaps a linear regression or a decision tree model. The dependent variable of that surrogate model is the probability, or the prediction, that was produced by the machine learning model.
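A from-scratch sketch of that LIME-style local surrogate idea, assuming a generic scikit-learn classifier as the black box; the actual LIME implementation does considerably more (interpretable representations, discretisation, and so on).

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

# A black-box model trained on synthetic data (stand-in for any ML model).
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.3, size=1000) > 0).astype(int)
black_box = GradientBoostingClassifier().fit(X, y)

x0 = X[0]                                        # the instance to explain
Z = x0 + rng.normal(scale=0.5, size=(500, 5))    # perturbed samples around it
p = black_box.predict_proba(Z)[:, 1]             # black-box predictions = new target
w = np.exp(-np.linalg.norm(Z - x0, axis=1) ** 2) # closer samples weigh more

# Weighted linear surrogate: its coefficients are the local explanation.
surrogate = Ridge(alpha=1.0).fit(Z, p, sample_weight=w)
for i, coef in enumerate(surrogate.coef_):
    print(f"feature_{i}: local weight {coef:+.3f}")
```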
This way you are able to explain, at least for a particular instance, what the drivers behind a prediction really are. If you look at the chart on the LIME slide, the idea of that model was to predict whether a shot will be made by a player or not. The actual model says zero, which means the shot will not be made; but when I run LIME on the model that was built by the machine learning algorithm, the probabilities vary, and for the instance I have picked it is 0.83, a very strong local prediction of yes, but not at the global level. So how will you use it? You go back and see which drivers are behind this particular prediction: if the shot style is a dunk and the shot area is the centre, there is a high chance that the shot will be made. You go back and see what is happening behind those variables, what values they take, and you end up running multiple iterations over various instances of your model; finally you are able to give an explanation of why a particular instance predicts a particular outcome. That is what LIME explanations give you.

Another post-modelling diagnostic is partial dependence. What this does is pick a variable and take all the values of that variable; for each value of the variable it runs the model over the values present for all the other attributes, averages the outcomes, and gives a final prediction for that value. Again, this is not a local phenomenon, it is a global one. So it might happen that the partial dependence plot for a variable says the variable is not at all related to the prediction, because the effect has been averaged out: one prediction could be saying, let us say, +0.51, another says -0.49, and they cancel, so there is a very high chance that the apparent relationship comes down to zero. To counter this there is another post-modelling diagnostic, ICE plots, individual conditional expectation. This overcomes that deficiency, if I can call it that, of partial dependence: it does not average the predictions, it nails things down at an observation level, so at an observation level, or at a cluster level, it will tell you why a particular prediction has been made.

The fourth one is relatively simple, variable importance, which I think most of you would know. It builds up the importance of a variable based on the error that is reduced at each step by introducing that variable; how much a variable reduces the error is what gives you its importance. I personally look at relative importance, because it brings everything onto one scale from 0 to 1. Otherwise I do not know how to compare an importance of 81 versus 45; I have no benchmark. What relative importance does is bring that 81 to one, and the other values then revolve around it, so now I know the second variable is about half of the top one, which you cannot see directly from the raw numbers. That is how you are able to judge which are your better variables in terms of prediction, and what to look at when interpreting a particular instance.
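A hand-rolled sketch of the partial dependence and ICE computations described above, so the per-observation curves and the averaging are explicit; scikit-learn also ships these ready-made in its inspection module.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(800, 4))
y = (X[:, 1] - 0.5 * X[:, 3] > 0).astype(int)
model = RandomForestClassifier(n_estimators=100, random_state=1).fit(X, y)

feature = 1
grid = np.linspace(X[:, feature].min(), X[:, feature].max(), 20)

# ICE: one curve per observation - fix the chosen feature at each grid value,
# keep every other column as observed, and record the prediction.
ice = np.empty((len(X), len(grid)))
for j, v in enumerate(grid):
    Xv = X.copy()
    Xv[:, feature] = v
    ice[:, j] = model.predict_proba(Xv)[:, 1]

# PDP: the average of the ICE curves - which is exactly why an effect that
# goes up for half the data and down for the other half can wash out to zero.
pdp = ice.mean(axis=0)
print(np.round(pdp, 3))
```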
Can I take the question? Now, from a SAS perspective, how do we make all of this possible? I have talked about a lot of theoretical concepts, partial dependence charts, LIME, ICE plots; how we make this possible is an evolutionary transformation, if I can call it that, of SAS, which we are naming SAS Viya. The reason I say it is an evolutionary transformation I will highlight in my next couple of slides, but assuming no one is dozing off, I am going to play a very high-voltage video to introduce SAS Viya to the room (is the sound working?). SAS Viya is an open platform. Again, the reason I say it is open is that it is meant to do any kind of analytics you want on a single platform: visualisation, statistical modelling, machine learning, deep learning, optimisation, forecasting, everything happens on a single platform, which is SAS Viya.

The next slide talks about why it is open. On the left-hand side you can see it can ingest data from any source, be it streaming data, Hadoop, or your on-premise databases. At the bottom you can see AWS, Azure, GCP, OpenStack: you can host your applications on any of the clouds, and it is supported on-premise too. On the right-hand side you see the R and Python APIs, so in that sense we are open to the open-source world. We realise there is power in R and Python, so why not let our customers use what they want to use: you choose the language of your choice, you choose which cloud you want to get onto, you choose the type of data you want to ingest, and we will help you get onto that journey. Embracing and extending open source is essentially what SAS Viya accomplishes for you.

Picking up a few snapshots of how it really is an open environment. Language of your choice: you remain on SAS, you remain on Python, you remain on R, but you are still able to use SAS at the back end to run your processes. At a leadership level, the leaders will probably not be bothered about whether they need to hire a SAS person or a Python or R person; what really matters to them is whether the business problem is getting solved, and this is where you can use any language of your choice. SAS is on GitHub, which means the code for running most algorithms is now available on GitHub, so go ahead, pick up the code, run it in SAS, and you can start using analytics without building models from scratch. REST APIs: the model that gets built can be exposed as REST APIs, which helps you consume the models you have developed in your downstream applications. The third part is running native models in SAS: you write your native programs in Python or R, bring them into SAS, compare those models, and whichever gives the best performance on the statistics you want to evaluate, go ahead and consume that one. And it is supported on ONNX formats: if a model is getting built on any other platform, be it Amazon or open-source R or Python, we support the ONNX format, so we can read ONNX files as well.
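As an illustration of the ONNX idea on the open-source side (this is not the SAS workflow itself), here is the common pattern for exporting a scikit-learn model to ONNX with skl2onnx and re-scoring it with onnxruntime; any ONNX-aware platform could then read the same file.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType
import onnxruntime as rt

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Export the fitted model to the portable ONNX format.
onnx_model = convert_sklearn(
    model, initial_types=[("input", FloatTensorType([None, X.shape[1]]))]
)
with open("rf_model.onnx", "wb") as f:
    f.write(onnx_model.SerializeToString())

# Any ONNX-aware runtime (or platform that reads ONNX files) can now score it.
sess = rt.InferenceSession("rf_model.onnx", providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name
pred = sess.run(None, {input_name: X[:5].astype(np.float32)})[0]
print(pred)
```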
That means the playing field, from a practitioner's point of view, is really opened up; it is massive, and this is where we can help practitioners build an analytics ecosystem in their organisation. This next slide is a very late entry to the deck. The reason it was a late entry is that I was on the SAS booth today and a lot of questions were coming up: are we doing image analytics, are we doing video analytics, are we in the computer vision space or not? Just to give you the message: with SAS Viya, machine learning, deep learning, natural language processing and of course the supporting technologies are all available. I am not going to dwell too much on it, I have been mandated not to make this too much of a marketing pitch, but this is just to let the group know, because I got a lot of queries today, that all of this is possible with the various algorithms present in SAS.

OK, the use case for the demo. It is a default-prediction use case that I am bringing today; I wanted to bring the video analytics one, but I did not have the GPUs to work with today, so I brought this one. Ms Sarah is an analytics consultant; she works with a bank, and she has been asked to build a model that predicts whether a customer will default or not. She does not have much knowledge of how to build multivariate models, so let us try to help Sarah complete her assignment. Let the fun begin. I am picking up a data set. As I said, the important part is really to understand what is happening in the data, which is the pre-modelling step. Let me see how my data looks: a customer ID is present, duration in months, purpose of the credit taken, years at employment, marital status, other debt indicators. I have got some five missing values in my age variable, so I really need to have my data complete before I proceed with any modelling technique. I am short on time, but let me step back for a moment: there are AutoML capabilities that come with the tool, so as a first cut you get a model built on a lot of feature engineering and a lot of machine learning algorithms tried automatically. But of course, as a data scientist, as a practitioner, you would still want to refine the models that come automatically out of the tool, so you still get that flexibility of building on top of what the AutoML functionality offers. I am just getting the data set ready for my predictive modelling exercise, and that is it. Let me define what my target variable is: default on payment is my target variable, so I change its role to target. Now I will do a bit of data preparation. I knew there were five missing observations, and once I saw the data I knew it was a bit skewed, so let me also do a bit of outlier treatment.
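A rough open-source analogue of the imputation and outlier-treatment steps just described, with a made-up column for illustration; the demo itself does this through the built-in nodes.

```python
import pandas as pd

# Toy stand-in for the credit data; "age" has missing values and a skewed tail.
df = pd.DataFrame({"age": [23, 35, None, 41, None, 29, 95, 38, 120, 33]})

# Imputation: fill the missing ages, here simply with the median.
df["age"] = df["age"].fillna(df["age"].median())

# Outlier treatment: winsorise, i.e. clip extremes to the 1st/99th percentiles.
lo, hi = df["age"].quantile([0.01, 0.99])
df["age"] = df["age"].clip(lower=lo, upper=hi)

print(df["age"].describe())
```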
Yes, there are techniques where you do not really need imputation or outlier treatment, the WoE (weight of evidence) approaches; I am sure many of you have used them, and you can still use them, but I am just running through the process the way I prefer. Of course you are free to use WoE approaches; they typically started in the risk analytics space in the banking industry, but a lot of marketing analytics teams are also using them now, because they simply help you avoid running imputations and outlier treatments and help you define a better predictive model. You also do not need to do a lot of variable reduction, because the IV, the information value, comes out automatically. But right now I am just going ahead with the missing values, the outlier treatment, a bit of feature engineering, and then I am going to build the models. You have the option of picking, choosing and defining a particular imputation for each variable; currently I have just one variable with five missing values. Of course, in a practical scenario where you really end up building models, you do not get data in this neat and clean a form, I agree, I have worked on a lot of really messy data sets, but given the time constraint I have, I am just showing the ML techniques and how to deploy those models. So I am building a pipeline for the project, the transformations, and then adding some models. Ideally, the way I would really like to work when I build a model is multiple iterations for one algorithm and then multiple iterations for another algorithm, so for me it is really an m-by-n matrix. Just for the sake of time I am building the models in a very simplistic way, but of course this can get really complex in terms of how you arrive at the best iteration of your model.

What I am going to do is run an auto-tuning option for one of these models, gradient boosting or random forest, where I am not going to define, for example for the random forest, what the number of variables to try should be, what the number of observations per tree should be, or what the number of trees to start with should be. Let the model run, give me the best results, and then I will fine-tune those hyperparameters. For one algorithm, say gradient boosting, I will make some changes to the hyperparameters myself; I will also run a logistic regression, maybe stepwise, with a significance level alpha of 0.05, and then we will see what all the results look like. For the random forest, let me change the number of trees to 100. You would generally start with about 60 per cent of the data in each tree's sample, the rest being out of bag, and the rule of thumb for the number of variables to try at each split is the square root of the number of variables; of course that rule does not always fit, but let me just start with five here, because I have around 25 or 26 odd variables. The gradient boosting model I am just running with auto-tuning, so I am not worried about the L1 and L2 regularisation, the number of branches, or the optimal number of observations in a particular node. Now I want to click the run button and run these models; it is going to take some time.
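A rough open-source analogue of the settings just described, assuming scikit-learn: 100 trees, roughly 60 per cent of the data per tree, square-root-of-p variables per split, and a small random search standing in for the auto-tuning idea.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import RandomizedSearchCV, train_test_split

X, y = make_classification(n_samples=2000, n_features=25, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Random forest configured roughly as in the demo: 100 trees, ~60% of the
# data per tree (rest out of bag), sqrt(p) variables tried at each split.
rf = RandomForestClassifier(
    n_estimators=100, max_features="sqrt", max_samples=0.6,
    oob_score=True, random_state=0,
).fit(X_tr, y_tr)

# "Auto-tuning" analogue for gradient boosting: let a random search pick the
# depth, learning rate and number of trees instead of fixing them by hand.
search = RandomizedSearchCV(
    GradientBoostingClassifier(random_state=0),
    param_distributions={
        "n_estimators": [50, 100, 200],
        "max_depth": [2, 3, 4, 5],
        "learning_rate": [0.01, 0.05, 0.1, 0.2],
    },
    n_iter=10, scoring="roc_auc", cv=3, random_state=0,
).fit(X_tr, y_tr)

print("forest OOB accuracy:", round(rf.oob_score_, 3))
print("best GBM params:", search.best_params_)
print("GBM holdout AUC:", round(search.score(X_te, y_te), 3))
```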
In the meanwhile, let me also show you how you could actually build an open-source model alongside SAS. This is a use case I talked about with a few of my customers today in the image analytics space, and it is one we did for one of the governments in India. The use case was that the government wanted to identify whether any two-wheeler or three-wheeler was running on a particular highway, because that highway was meant only for four-wheeled vehicles; a three-wheeler, an auto, or a scooter was not allowed, because it is a very high-speed zone. The task was, using the images from the cameras that had been deployed, to detect whether a two-wheeler or a three-wheeler was being driven there. You will see there is a package called SWAT. It is available on GitHub and is developed by SAS, and it helps you use SAS together with open source; if you do not load SWAT you will not be able to use the integration, because it is really the integration technology between SAS and open source. With SAS you get this, you are able to integrate with open source, and, as I said, you continue writing your code in Jupyter just as you would in the native Python world. There are advantages that come with this that I am not going to go into, because I just want to show you the image and video analytics case being solved using open source and SAS for one of those customers; we are using a VGG16 model to detect whether a particular image contains a two-wheeler, a three-wheeler, or a four-wheeler.

We are almost done, just 30 seconds more. Now, assuming we have passed the validation tests and the model I have chosen really is the one to be used for deployment, how do I actually consume that model, the deployment aspect I talked about? The product gives you a scoring API which can then be consumed in your downstream applications. As a last step of the demo, I am going to show how I call a particular model from a downstream application: it could be a mobile app, a website, or your own application that you have developed. This is an application that was developed by a developer at SAS; it is not really a data scientist's task per se, it is more of a development exercise, where somebody who is a hands-on coder builds these kinds of applications. But once such applications are developed, you are able to communicate with them. What you need to do, very quickly, is that whatever the downstream application is, and wherever your server is, they need to authenticate, so that the machine can talk to my SAS Viya environment; you need to authenticate and validate that particular downstream application. Those are the steps to take before this really gets into working mode. This screen gives me all the models that were built in SAS while I remain in my downstream application, and then you pass in the input values; let me just do that. So you pass in these parameters, for example I am passing age as 35; I think I am short on time, but that is what you do, you fill in all this information. There are a lot of variables, and I really do not want to see all of them in my final model; yes, I would want to stick to around five or six variables in my final model, that is the significance criterion. Once you submit, the score for that particular customer defaulting, or whatever your business problem is, comes back onto your web application or your downstream application.
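To illustrate the scoring flow just described, here is a generic REST-style call from a downstream application; the host, endpoint paths and field names are hypothetical placeholders for illustration, not the actual SAS Viya interface.

```python
import requests

# Purely illustrative: the host, endpoint paths and field names below are
# hypothetical placeholders, not a real scoring service.
BASE = "https://analytics.example.com"

# Step 1 - the downstream application authenticates and obtains a token.
token = requests.post(
    f"{BASE}/auth/token",
    data={"client_id": "webapp", "client_secret": "***"},
).json()["access_token"]

# Step 2 - pass the customer's input values to the published scoring endpoint.
payload = {"age": 35, "duration_months": 24, "purpose": "car", "amount": 250000}
resp = requests.post(
    f"{BASE}/scoring/default_model",
    json=payload,
    headers={"Authorization": f"Bearer {token}"},
)

# Step 3 - the score comes straight back to the calling application.
print(resp.json())   # e.g. {"probability_of_default": 0.18}
```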
So a person sitting out in the field, working in any downstream application, does not really have to bother about what is happening inside the analytics engine; they will still be able to use SAS Viya for live scoring of the models. Thank you, I think this is what I wanted to cover today. I know we have a question, I will take it. Thank you very much, thanks.