Hi, welcome to the session. Today we are going to talk about automated machine learning. Before we get started, how many of you have built machine learning models? How many of you know how machine learning works? Thanks. I am part of the automated machine learning team at Microsoft. My name is Deepak Mukuntu. We will cover this session in roughly three parts: what the machine learning model building process generally involves and what the challenges are; how automated machine learning works and how it helps; and finally a demo showing how you can use it today regardless of your experience in machine learning. With that, I think almost every one of us knows that AI and ML is a trend now. This slide shows one number from Gartner, which predicts how much enterprise revenue will be attributable to AI and machine learning by 2022. Where is all this money going to come from? Are there specific scenarios that will bring a large slice of it? It may look like we are trying to predict which area is hot for machine learning and AI, but the answer is almost every area: marketing, sales, service, finance, operations, workforce; whatever scenario you talk about, there is a chance for machine learning to help. I am not going to spend a lot of time going through each of these scenarios in detail, but I hope some of them resonate with what you do and what you are interested in doing.
When enterprises start looking at machine learning for these areas, the first thing they realize is that they don't have data scientists and don't know how to build machine learning models. So typically they go for pre-built AI services that big cloud providers like Microsoft offer. At Microsoft we have Cognitive Services, where Microsoft-built models are hosted and exposed as REST endpoints for you to call, so you don't need to build any machine learning models yourself. If you ask me what is the best way to get started with machine learning and infusing AI into your application or workflow, that is probably it, without getting into the details of building a model yourself. But why do people then build their own machine learning models? Because most of those cloud offerings are generic; they may not be suitable for your own scenarios, where you may want to customize models to suit your needs. That is where you end up building your own models, and with automated machine learning, what we are suggesting is: start from a system that automatically produces the best model pipeline for you, and if you want to customize more, go and improve that pipeline further. With that, let's talk about what the typical machine learning model building process involves. We will take a very, very simple example: car price prediction. Let's say you are a data scientist, or you want to build a machine learning model that predicts the price of a car. Typically you start with what features make sense. Features are basically the inputs that make a good model for the scenario. Good features in this case could be things like mileage, condition, age of the car, and make and model of the car.
Anything that you think is an important attribute of the car, or of its surroundings, that determines the price is a feature, or input variable. Once you identify the features, the next thing is an algorithm. There are a bunch of options, and typically data scientists pick the favorite they are familiar with. Then come hyperparameters, which are the parameters that guide your model through the training process. With these three in mind, a data scientist typically starts with a set of features, say mileage, then picks an algorithm, in this case gradient boosted trees for example, and then for that algorithm picks a set of hyperparameters. Each of these will have some number of values to pick from, which means you have a huge number of combinations to try. Even with one algorithm and those three features, you already have many combinations to try as machine learning pipelines. So you start with one combination and get a model; let's say the model is 30 percent accurate, not good enough. So you go back and try other hyperparameters for the same algorithm to see whether they give a better result. In most cases you go back and pick a different algorithm; here I am picking KNN after gradient boosting. In a lot of cases, data scientists go back and figure out new features to add to get to a good model. So this is a pretty iterative process. Finally you get to a model that meets your needs, or your bar, deploy that model in production, and you get a car price prediction service you can use. As you can see, building a machine learning model is a pretty iterative and time-consuming process, and a lot of it can be automated. It is not rocket science.
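To make the combinatorial point concrete, here is a small Python sketch; the feature sets, algorithms, and hyperparameter values are made up for illustration and are not from any real run:

```python
from itertools import product

# Illustrative search space: three knobs, each with a handful of options.
feature_sets = ["mileage", "mileage+condition", "mileage+condition+age"]
algorithms = ["gradient_boosted_trees", "knn", "random_forest"]
hyperparams = {
    "gradient_boosted_trees": [{"n_trees": n, "lr": lr}
                               for n in (50, 100, 200) for lr in (0.01, 0.1)],
    "knn": [{"k": k} for k in (3, 5, 7, 9)],
    "random_forest": [{"n_trees": n} for n in (50, 100, 200)],
}

# Every (features, algorithm, hyperparameters) triple is one candidate pipeline.
pipelines = [(f, a, h)
             for f, a in product(feature_sets, algorithms)
             for h in hyperparams[a]]
print(len(pipelines))  # 3 * (6 + 4 + 3) = 39 candidate pipelines
```

Even this tiny toy space yields 39 pipelines; real searches over large hyperparameter ranges explode far faster.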
You are trying out different combinations; simplistically speaking, you could do three nested for loops. But that means you are doing a brute-force search of all possible combinations, which may not be suitable for your scenario, because you are going to spend time and, if you are using cloud compute, money. You need something that can automatically figure out the right machine learning pipelines to try given your problem and your data set. That is where automated machine learning comes in. Automated machine learning is a system that will give you the best machine learning pipeline for your scenario, and all you need to bring is input data. This is just a hypothetical view of the system; think of the middle piece as the system. You bring three inputs: your data, in whatever form you have it; the metric or goal you want to optimize for; and, because the system is going to try different combinations, some constraints to guide it. Constraints could be things like "do not try more than 50 models" or "do not spend more than two hours." You bring these three inputs, and the system automatically gives you the best model, or a good set of models to choose from. That is, at a high level, what automated machine learning is and how it works; we will go into detail on how the middle piece works in a bit. So who is this for? Is it for data scientists, or for people who do not know machine learning or Python? The answer is that when we started this journey of automated machine learning, we started with data scientists.
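As a sketch of the shape of this interface, not the actual Azure SDK, the three inputs could look like this; the function `automl_search` and its fake scoring are purely hypothetical:

```python
# Hypothetical sketch of the AutoML interface described above: you supply
# data, a metric to optimize, and constraints; the system returns candidates.
def automl_search(data, target, metric, constraints):
    """Pretend AutoML: tries pipelines within the given constraints and
    returns scored candidates, best first. Here we only simulate scores."""
    max_models = constraints.get("max_models", 50)
    # A real system would train models; we fabricate scores for illustration.
    candidates = [{"pipeline": f"pipeline_{i}",
                   "score": 0.5 + 0.4 * i / max_models}
                  for i in range(max_models)]
    return sorted(candidates, key=lambda c: c["score"], reverse=True)

best = automl_search(data=None, target="price", metric="r2_score",
                     constraints={"max_models": 10, "max_hours": 2})[0]
print(best["pipeline"], round(best["score"], 2))
```

The point is the contract, not the internals: data, a metric, and constraints go in; a ranked set of models comes out.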
So accelerating AI was our mission, and the focus there was that data scientists spend so much time, as we saw, building a good machine learning model. What if we could automate that, so that instead of spending weeks to get a good baseline model, a data scientist could get one in a couple of hours? That was the goal, which is where accelerating AI as a mission came in. Then, as we started pitching this to customers and customers started using it, we heard a lot of them saying: we do not have professional data scientists; we do not have people with end-to-end machine learning expertise; we have data analysts and engineers who want to build machine learning models. As we saw in the first couple of slides, demand for AI is high, but the supply of good data scientists is very low. To address that supply-versus-demand problem in building machine learning models, you need to enable people who are not data scientists, such as domain experts and data engineers, to build machine learning models. So in addition to accelerating AI, we added democratizing AI as a mission, and I am going to show you at the end of the deck how we are enabling that; regardless of your experience in ML, you can try out our different offerings. Then, as we and our customers matured through the process, we started seeing patterns: customers saying "I have hundreds of windmills and I want one machine learning model per mill, but different mills have different characteristics and inputs, so I want slightly different models for different mills." I am working with a banking customer today who wants different machine learning models for different ATMs, to predict how much money to put in each ATM on a regular basis, depending on how their customers in that locality withdraw money.
So there is a need for easily scaling out from one machine learning model to many models, which is where we are now investing in scaling AI as a mission. With that, I am going to spend the next 10 minutes or so going deep on how this works. It may get a little deep, but I will try to keep it as easy and as high-level as possible. Consider just hyperparameter tuning, the third phase of building a machine learning model (the first phase was feature selection, the second was algorithm selection), where the features and the algorithm are fixed and you are only tuning hyperparameters. How do you solve just that last phase? There are various options today: grid search, random search, and Bayesian optimization. There are theories about which is better or worse, but each is a time-consuming process; even on a single machine it takes a long time to figure out the right hyperparameter combinations, and this is just one of the three steps of the problem space. These approaches work, but with caveats. They work well when the number of hyperparameters is less than about 10, which in most cases is not true; in most cases your hyperparameter space is much larger. They work if you are willing to wait, because they are going to try many different combinations. And there is an assumption that parameters are continuous, not discrete; as soon as you get parameters that are a mix of continuous and discrete, you need different solutions, and the three simple approaches we talked about will not work. Now, with automated machine learning on Azure, which is the team I am part of, Microsoft Research came up with breakthrough research, and that is what we are using as part of our automated ML offering.
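Of the three approaches just mentioned, random search is the simplest to sketch in a few lines; the objective function below is a toy stand-in for "train a model with these hyperparameters and score it," not a real training loop:

```python
import random

random.seed(0)

def toy_objective(params):
    """Stand-in for training and scoring a model. In this made-up example
    the score peaks at lr=0.1, n_trees=150."""
    return 1.0 - abs(params["lr"] - 0.1) - abs(params["n_trees"] - 150) / 1000

# Random search: sample hyperparameters at random, keep the best seen so far.
space = {"lr": (0.001, 0.5), "n_trees": (10, 500)}
best_params, best_score = None, float("-inf")
for _ in range(100):
    params = {"lr": random.uniform(*space["lr"]),
              "n_trees": random.randint(*space["n_trees"])}
    score = toy_objective(params)
    if score > best_score:
        best_params, best_score = params, score
print(best_params, round(best_score, 3))
```

Note the caveat from the talk: 100 evaluations here are instant because the objective is fake; when each evaluation is a full model training run, even random search gets expensive fast.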
The research had two key intuitions on which the whole system is based. The first is that, given there are so many possible ML pipelines, covering pre-processing (typically the first step), featurization, algorithm selection, and hyperparameter selection, the number of pipelines you can think of is so big that it is effectively a continuous space. For a system to automatically identify machine learning pipelines, we need to discretize it. So the first intuition was discretizing ML pipelines. The second was that we cannot treat each data set as a separate entity on its own. We need to see patterns between data sets, so that when you bring a data set, we can look at its characteristics and know that it is like another data set we know of, where a particular kind of pipeline worked well. We can then say: your data set is close to this other data set we have seen before, so this pipeline should work well for your data set too. Those are the two intuitions on which the system is built. In the background, the system is like Netflix or any streaming service that recommends things to users based on their interests: it is a recommendation system that recommends machine learning pipelines given your data set. Think of it that way. It is also backed by a probabilistic framework that tries to explore and exploit different possible pipelines. Let me quickly talk about the difference between exploration and exploitation.
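Before getting into exploration and exploitation, the second intuition, matching a new data set to ones seen before, can be sketched as a nearest-neighbour lookup over data set meta-features; the meta-features and the "known good pipeline" table below are invented for illustration:

```python
import math

# Invented knowledge base: meta-features of previously seen data sets
# (n_rows, n_cols, fraction of numeric columns) -> pipeline that worked well.
known = {
    (1_000,  10, 0.9): "scale -> gradient_boosted_trees",
    (50_000, 300, 0.2): "one_hot -> logistic_regression",
    (500,    5,  1.0): "scale -> knn",
}

def recommend(meta):
    """Recommend the pipeline of the closest known data set, using Euclidean
    distance over crudely normalized meta-features."""
    def dist(a, b):
        return math.sqrt(((a[0] - b[0]) / 1e5) ** 2 +
                         ((a[1] - b[1]) / 100) ** 2 +
                         (a[2] - b[2]) ** 2)
    return known[min(known, key=lambda k: dist(k, meta))]

print(recommend((1_200, 12, 0.8)))  # closest to the first entry
```

The real system builds a far richer model over many meta-features and observed pipeline scores, but the idea is the same: a new data set inherits recommendations from data sets that look like it.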
Exploitation means that if you have found a machine learning pipeline that gives you, say, 70% accuracy, you can go ahead and just keep hyperparameter-tuning that pipeline in the hope of getting a better model. Or you can say: no, that will give me a locally optimized approach, a local maximum; I want to get to a global maximum, which means I want to try not just those features and that algorithm but also something else. The system has a good probabilistic approach to balancing exploration and exploitation. If you are interested in going deep, the link on this slide is a Microsoft Research paper you can read; it shows how the system works. It is a combination of collaborative filtering and Bayesian optimization, and it takes uncertainty into account: the system tries to predict the pipeline with the best accuracy or performance while minimizing uncertainty, and each of the dots here is a different pipeline the system is going to try. I am going to run an animation. On the top you see how model accuracy improves over time as the system recommends different pipelines, and on the bottom you see how the selection of pipelines affects performance and uncertainty. The system tries different things in a probabilistic way to maximize performance and minimize uncertainty: the black regions indicate minimal uncertainty, as you can see, and the light color means performance is at its maximum. That is how the system works at a high level. With that, I am going to move fast because we have 45 minutes; I am happy to answer questions offline if you are interested, and this slide deck will be shared, so you don't need to take pictures. I have links wherever relevant for you to go learn more.
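The exploration-versus-exploitation balance can be illustrated with the simplest possible strategy, epsilon-greedy; the real system uses collaborative filtering plus Bayesian optimization, and the pipeline accuracies below are made up, so this sketch only shows the trade-off itself:

```python
import random

random.seed(42)

# Made-up true accuracies of five candidate pipelines (unknown to the searcher).
true_accuracy = {"p1": 0.70, "p2": 0.72, "p3": 0.81, "p4": 0.65, "p5": 0.78}

def noisy_eval(p):
    return true_accuracy[p] + random.gauss(0, 0.02)  # training noise

observed = {p: noisy_eval(p) for p in true_accuracy}  # try everything once
epsilon = 0.2
for _ in range(50):
    if random.random() < epsilon:
        p = random.choice(list(observed))          # explore: random pipeline
    else:
        p = max(observed, key=observed.get)        # exploit: current best
    # Keep the best score observed so far for that pipeline.
    observed[p] = max(observed[p], noisy_eval(p))

print(max(observed, key=observed.get))
```

With epsilon at 0, the search would hammer whichever pipeline happened to look best first (the local maximum from the talk); the occasional random pick is what gives the truly better pipelines a chance to surface.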
When I talk to customers and say "bring your data and we'll give you a model," people say: no, I cannot give you my data; data is so sensitive. So I want to clarify that even though I say data is one of the inputs, we do not read your data. In this diagram, on the left is your system, your customer subscription or your local machine, wherever you are running your machine learning model training; on the other side is our brain, the meta-model I described. What comes from your side to our side is essentially some metadata about your data: how many rows you have, how many columns, what data type is in each column. We do not even read your column names, because even a column name can carry sensitive information you don't want to share. That metadata is the first input we get. The second is the problem type: are you doing a classification problem, a regression problem, a forecasting problem? You specify that. Then the meta-model recommends the first set of pipelines that we believe will help you, that goes back to your side, and training happens on your side: you control your compute and your resources for training. The results of the training, in this case the accuracy of the model for each pipeline, get fed back to the system, and based on those accuracies the system recommends the next set of pipelines. That is how it works, and I want to reinforce that we do not see the data.
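The back-and-forth just described can be sketched as a loop. Everything below is a toy simulation: `extract_metadata` stands in for the client-side metadata extraction, `cloud_recommend` for the cloud meta-model, and `train_locally` for training in your own subscription:

```python
# Toy simulation of the client/cloud protocol described above.
# Client side: only metadata leaves the machine; training stays local.

def extract_metadata(rows, cols):
    """Stand-in for client-side metadata extraction: shape and problem type
    only; no column names, no values."""
    return {"n_rows": rows, "n_cols": cols, "task": "regression"}

def cloud_recommend(metadata, history):
    """Stand-in for the cloud meta-model: proposes the next batch of
    pipelines, here just walking a fixed candidate list."""
    candidates = ["gbt", "knn", "random_forest", "elastic_net", "svm", "extra_trees"]
    tried = {p for batch in history for p, _ in batch}
    return [p for p in candidates if p not in tried][:2]

def train_locally(pipelines):
    """Stand-in for local training: returns (pipeline, score) pairs."""
    fake_scores = {"gbt": 0.82, "knn": 0.64, "random_forest": 0.79,
                   "elastic_net": 0.71, "svm": 0.68, "extra_trees": 0.80}
    return [(p, fake_scores[p]) for p in pipelines]

history = []
meta = extract_metadata(rows=10_000, cols=20)
for _ in range(3):                        # three recommendation rounds
    batch = cloud_recommend(meta, history)
    history.append(train_locally(batch))  # only scores go back to the cloud

best = max((r for batch in history for r in batch), key=lambda r: r[1])
print(best)  # ('gbt', 0.82)
```

The key property the talk emphasizes is visible in the data flow: the "cloud" functions only ever see metadata and scores, never the rows themselves.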
Data stays with you; we only get some metadata at a high level, plus the scores of the pipelines. So far we have talked about how automated machine learning can help you build a good machine learning model quickly and give you options to choose from. But in the real world, when you want to put models into production, things like transparency become super important, which is why we invest a lot there, for example in model interpretability and feature importance. What does this mean? When you have a model deployed in production, you want to be able to explain each prediction. For example, say you have a credit scoring application, or a machine-learning-based system to approve or reject a loan for a customer in a banking scenario. If you reject a loan, you cannot tell the customer "your loan got rejected because we have a machine learning model." You need to be able to explain that the system is rejecting the loan because these attributes are out of bounds of what we think is acceptable. That is where feature importance comes in as one of the key transparency features that a lot of customers need before they can deploy a machine learning model to production. In addition, throughout the training process, even though we say this is a super magical system where you bring your data and it gives you the best model, there need to be guardrails, because things can go wrong; your data could have problems. For example, if you have a classification problem where you are trying to predict loan defaulters versus non-defaulters, there is a typical class imbalance problem: the number of defaulters will be much, much smaller than the number of people who do not default. You need to treat your data before feeding it into this process, because otherwise the output is garbage in, garbage out. But systems like automated machine learning guide
you through that process. For example, when you put your data in, during training we will figure out that your data has class imbalance and flag it to you; in some cases we can automatically fix it for you and tell you that we have fixed it. The same goes for missing values: if your data has missing values, we have techniques to automatically figure out the right way to impute them, and we tell you that this column had this many missing values and we fixed it automatically for you. I don't want to go into all the details here, but the meta point is that performance, getting to the best model, is not the only thing; most customers want transparency before they put a model in production. So with that, how can you use this? To answer the question: yes, we automatically do feature engineering too; there is a simple flag. I am going to show you, and we can take more questions at the end. Now, on to how you can use this system. Our first version was a Python SDK, which we released in December last year. This is how you specify your input, if you can read it. You indicate your task type; in this case you are saying "I want to build a regression model." Again, this is the Python SDK, so the assumption is that people using it know Python and some level of data science, but we have other experiences tuned more toward non-data-science folks, which I will talk through. You say: my task is a regression task; my primary metric is R2 score, that is what I want to optimize for; the iteration timeout limits how long each model you try can take to train; and I do not want to try more than 30 iterations. So you are saying: try up to 30 models, and each training pipeline should not take more than
30 minutes. Then there is maximum concurrency, meaning you can train two pipelines in parallel. All of these are configurable parameters you can specify. We also do cross-validation: you do not have to give us an explicit validation data set; you can say "I want 5-fold cross-validation," or n-fold, and we support that. Then preprocess=True: this is a single flag you set, and we automatically do feature engineering for you. To answer the earlier question, I do not have a slide on this, but the kinds of feature engineering we do are: simple preprocessing first, including missing value imputation; if you have text features, we automatically produce the right features, such as n-gram features and word-embedding features; if you have a datetime column, which is very typical in a lot of time-series forecasting scenarios, we automatically derive good features from it, things like week of the year, the features data scientists typically build themselves. Before training we also scale and normalize your data automatically, because in most cases you need to normalize and scale your data for the model to do well; that is a model-specific thing we do automatically as well. All of that is controlled by that single flag: you just set it to true and we do it for you. Yes, we also have an API you can call to get the featurized data; that is a good question. We did not have that when we originally released in December, but we had an update. Correct, exactly. The idea is that the system is going to be magical, but it must also be transparent; otherwise you are not going to trust it. At every point where there could be a black-box mindset, we are trying to open up that black box. Feature engineering was one thing we did not
open up when we released it, but since then, in April or May I think, we announced a separate API that you can use to see what feature engineering we did and also to download the featurized data, which you can then use to train your own model. For any of the pipelines, it will give you exactly what feature engineering happened, what algorithm was selected, and what hyperparameter values were used; it can take any pipeline ID as input and give you exact end-to-end visibility for that pipeline. Another thing I am going to mention, which we do not have today but are actively working on, is generating Python code for the training pipeline. That way you can use the system to get to the best model and best pipeline, but you do not have to recode it if you want to customize it further; with a single click of a button you can download the Python training script for that pipeline, get the exact code, and take it further. So we are focusing heavily on transparency. The SDK works on a local machine too, so I can run my own Jupyter server here and use Python code to run it. It requires Azure authentication, because that is needed to call the meta-model; your data can be here and training can be on your local machine, but it requires Azure authentication just to call the meta-model: standard Azure auth, service principal authentication. Okay, so that is for data scientists who know Python. But as we said, not everyone is a data scientist and not everyone knows Python, and we have investments in making it easy for those personas. The first is a simple UI we enabled in the Azure portal. I am going to show a demo of this: you do not need to know Python or write any code; it is a UI experience where you bring your data, configure everything we talked about in the SDK but through the UI, go through the training process, and then either download the best model from there or deploy it directly from there with
single-click deployment. I am going to demo it to you; this is the Azure ML portal on the Azure portal. So let's go through the demo, and then we will come back to the other experiences; we have a Power BI integration and a SQL integration as well. The portal runs in your subscription: when you sign in, you are using your own subscription, your own slice of the public cloud where you control everything; Azure does not see what you have there. We do sophisticated benchmarking to make sure we compare ourselves with similar competitors; I can name them, but you can follow up with me. We have a good repository of standard data sets, and we compare ourselves with other competitors to see how well we do for a given period of time. Second, we also try to compete with humans, which is something we are starting to invest more in, things like entering Kaggle competitions and seeing how well we do versus humans. I must say this is not going to replace humans, for sure; it is going to give you the power, the productivity, and the transparency you need, so you can use it as a baseline system. I have heard customers talk about this: when I ask them why they are not using it, they say "no, I have my own model," and then on the side they will say "I am still using it, because I want to make sure I am not making a simple mistake that an automated system would catch." So some customers use it in production, and some use it to compare themselves against the automated system just to make sure they are not making silly mistakes; there is a wide spectrum of customers using this. And no, feature engineering also happens on your side: everything happens in your subscription or on your local machine if that is where you are running, so we do not take your data. That is a good question, but I had that same question
as a PM when this first came up. On pricing: we do not charge anything more than your Azure compute. If you are using Azure, you pay for the compute you use on Azure; the service itself does not charge anything extra. The key competitors we look at are H2O and DataRobot, and these days other companies have come up: Google AutoML Tables, and Amazon has a hyperparameter tuning offering; they do not have a full end-to-end system so far, but those are the competitors we look at. This goes back to the feature engineering question: we do look at your data, but that happens in your subscription, because when we do feature engineering we have to look at your data to see missing values. We do not look into your data; our code, which runs in your subscription as a client, looks at the data. That is a really good comment, but the meta point behind "we do not look at your data" is that you control your data. People could say: if I give you my data and you give me a model, then you are basically looking at my data and you will overfit to it. That is not happening; that is the meta point I am trying to make. Any automation we do for feature engineering happens in your subscription, under your control, where we cannot look at it; the system does have to look at your data at that point, because without that we cannot do feature engineering. And I want to reiterate: data stays in your subscription, and we do not look at it. Think of it this way: there could be two models. One is that there is a system, I send it my data, and the system does the magic. The other is that the system has two components, a cloud component and a client component; the client component runs in my subscription, and the client component is what does feature engineering; the cloud component, which is the brain, recommends the models, the algorithm and hyperparameter
combinations. So that part is in our control, which is why I say we do not look at your data; for feature engineering, the client component runs in your subscription, and we do not log anything. Your question could be: what if you are logging something there and sending it to your service? We do not; whatever we log, we are very, very careful to abide by the same principle of not looking at your data. I can share more details with you offline. Okay, so I think the question was: DataRobot supports some number of algorithms; how many algorithms do we support? I can give you data offline; those are things we have not published publicly yet, but if you have a scenario, we can work with you. The only thing I can publicly say is that I know customers who have tried DataRobot and H2O and chosen our offering, primarily because of cost and end-to-end completeness, because with Azure ML you get automatic deployment and MLOps capability, which those competitors do not provide. It is a single place that has AutoML capability but also end-to-end lifecycle management of the model; that is what interests customers most. But I can talk to you offline. Yes, we give you the column IDs; again, that gets sent to the client, so what gets displayed to you comes from the client. The client knows your feature names; what we know is some IDs. When an ID comes back to the client in your subscription, it is mapped to the feature name, and any visualization that gets displayed uses the feature name. That is how it works. Yes, we do: as part of the transparency work, we provide the definition of the feature transformation logic so that you can run it yourself. When I talked about wanting to provide end-to-end Python code for training, the first step is that feature transformation logic. Say that again? Yes, EDA: there is an investment in that as part of Azure ML, so I
will show you a demo if I get time, but as part of that we are looking at plugging EDA in here too. That way you can start from your data, do the EDA through the experience, and then bring in AutoML, so the end-to-end experience will include EDA as well. We do not yet have a good story on whether EDA will also be part of the code we generate, because the EDA experience itself is something we are working on right now. Can you hold that question? I want to make sure we spend time on the demo, and then I can answer questions. Okay, so this is the codeless experience we shipped as part of Build, in the April-May timeframe; it is still a preview capability in the Azure portal. What you are seeing here are different experiments, or AutoML runs, I have done. To start a new experiment, I click "create experiment," give the experiment a name, and select a compute; I am going to select a pre-configured compute I have, but if you don't have one, you can create your compute from here. Next, this is going to iterate through all the storage accounts I have, so that I can use data I may already have in a storage account, or I can upload data as a CSV file, or whatever format you have, through this. In this case I am going to pick a CSV file I have already uploaded to my storage account, for the same scenario we talked about earlier. It has a bunch of columns, and it also has some missing values, as you can see. You can easily select or unselect features: for example, to minimize training time you might reduce from 20 columns to 10, and you can say "don't include this column" or "include that one." Then there is profiling; I don't have time to show it because it takes about a minute. Basically, what it does is go and look at your data, though it is not really us looking at your data:
This is running in your own subscription, so it goes and looks at your data and shows trends for each of the columns: how the data is distributed, the mean, the median, the typical things you would do through an EDA experience. I am not going to wait for it, since it takes a minute or two.

Once you have seen that, you specify the task type, in this case a regression task, and then the target column, which is price in this case. That's it; these are the only inputs you need to provide. But we also want to make sure this caters to sophisticated people, data scientists who know machine learning, so you can actually take control: you can choose a different primary metric (we automatically figure out a reasonable one based on what you gave, but you can change it), specify the number of iterations, change the job time for each iteration, provide a different validation mechanism such as cross-fold validation, and set the concurrency. This one is interesting: a lot of customers tell me "I like LightGBM" or "I know KNN will not work best for my problem", so we provide the capability to select the algorithms you think will work better for you. That way we don't search the whole space, only the algorithms you are interested in. Then press next.

What happens now is that the AutoML cloud service is invoked, whatever you gave as input is sent over, and it starts running different iterations. I have done this before: a hundred-iteration run on this data set takes anywhere from 14 minutes to an hour, so I am going to switch to a run I have already done. This one is exactly the same thing, but I ran it for, I think, 20 iterations last time. Once the run is complete, this is what you will see.
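Conceptually, what the service is doing in those iterations can be boiled down to a simple search loop: fit each candidate pipeline, score it on held-out data with the primary metric, and keep the best. Here is a deliberately tiny, stdlib-only sketch of that idea; this is not the actual Azure AutoML SDK, and the toy data and the two candidate "algorithms" are made up for illustration.

```python
import math

# Toy data: (mileage in thousands, price in thousands); made up for illustration.
train = [(10, 30.0), (20, 26.0), (40, 18.0), (60, 11.0)]
valid = [(30, 22.0), (50, 14.0)]

def rmse(model, data):
    """The 'primary metric' to optimize; lower is better."""
    return math.sqrt(sum((model(x) - y) ** 2 for x, y in data) / len(data))

def fit_mean(data):
    """Baseline candidate: always predict the training mean."""
    m = sum(y for _, y in data) / len(data)
    return lambda x: m

def fit_linear(data):
    """Second candidate: ordinary least squares on the single feature."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    slope = (sum((x - mx) * (y - my) for x, y in data)
             / sum((x - mx) ** 2 for x, _ in data))
    intercept = my - slope * mx
    return lambda x: intercept + slope * x

# Restricting this dict is the analogue of selecting algorithms in the UI.
candidates = {"mean": fit_mean, "linear": fit_linear}

best_name, best_score = None, float("inf")
for name, fit in candidates.items():      # one "iteration" per candidate
    score = rmse(fit(train), valid)       # evaluate on held-out data
    if score < best_score:
        best_name, best_score = name, score

print(best_name, round(best_score, 3))
```

The real system searches over featurization steps and hyperparameters as well, and uses a meta-model to pick which pipeline to try next rather than enumerating everything.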
You can see how the system tried different models over time and how things improved; the orange line tracks the best metric so far. Next, you can look at all the pipelines it tried, sorted by the metric you are optimizing for, best first: in this case, a voting ensemble. How many of you are familiar with ensembling? Ensembling is not trying a single algorithm; based on the good-performing algorithms in your run, we automatically figure out the best ones and ensemble them. Think of it, very simplistically, as averaging the outputs of multiple different algorithms into one final output. Ensembling typically wins; in most cases we have seen, it wins because it takes the best of multiple algorithms. Each of the other entries is a single algorithm: in this case, one did a StandardScaler transform and then used random forest. You can also get the hyperparameter values through this experience when you drill in; let's look at the voting ensemble, for example. When you go through each pipeline, we give you the graphs that, as a data scientist, you would typically plot yourself; for different problem types we show different types of charts, and if you enable explainability we also show feature importance graphs here. We also compute a bunch of additional metrics, not just the primary metric: around 20 different metrics you can use for tie-breaking. If two models perform very similarly on your primary metric and you want to compare them on a secondary metric, you can use any of these. Then you can download the model as a pickle file, which is again a transparency feature (we don't do a black box), so you can take the pickle file and deploy it wherever you want.
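To make the averaging intuition behind the voting ensemble concrete, here is a minimal sketch for regression. The three base models are stand-ins for trained pipelines, and the production ensemble is more sophisticated (weighted, built from the best-performing runs), but the core idea is just combining predictions.

```python
# Minimal illustration of a voting ensemble for regression:
# combine the predictions of several base models.

def model_a(x):   # stand-in for, e.g., a LightGBM pipeline
    return 20.0 - 0.2 * x

def model_b(x):   # stand-in for, e.g., a random forest pipeline
    return 22.0 - 0.25 * x

def model_c(x):   # stand-in for, e.g., an elastic net pipeline
    return 18.0 - 0.15 * x

def voting_ensemble(models, x, weights=None):
    """Weighted average of the base models' predictions;
    equal weights by default."""
    weights = weights or [1.0 / len(models)] * len(models)
    return sum(w * m(x) for w, m in zip(weights, models))

models = [model_a, model_b, model_c]
pred = voting_ensemble(models, 10.0)   # averages 18.0, 19.5 and 16.5
print(pred)
```

An individual model's quirks tend to cancel out in the average, which is why the ensemble so often ends up at the top of the leaderboard.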
Or you can deploy it with a single click from here. If you do that (this is something I did last month), it's simple: you give it a deployment name and ask it to automatically generate the scoring script that will be used when the model is scored after deployment, and that's it, a single-click deployment. You click the deploy button... the model name is required, okay, so let's say "odsc demo deploy". This is now going to take the best model, the voting ensemble in this case, create a container for you (an ACI container here), and deploy it. It's going to take around 20 minutes. Once the deployment is complete, this will turn green, like you saw a few seconds back, with a link to a web service that you can use in your application to call the model. So this is a very simple, no-code experience to go from training a model to a web service.

You can either invoke the web service in your application, or, since we also have deep integration with Power BI, if you are a Power BI analyst you can consume the web service through Power BI, and I am going to quickly show you how. In Power BI you have the concept of dataflows, which contain entities; think of this as a SQL database with tables, where each entity is logically a table. In this case I have car price as an entity. You click on it to explore; this takes a few seconds to load, but once it does you will see a new option called AI insights. You click on AI insights and it gives you options to consume any Azure ML models you have built, including models built through automated machine learning in Azure. It is now looking up which models this user has access to: in this case, around eight different Azure Machine Learning models.
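Outside Power BI, the deployed web service can be called from any application over plain HTTP. Below is a minimal standard-library sketch; the scoring URI and the input column names are placeholders, and the `{"data": [...]}` body shape is an assumption based on the scoring script such deployments typically generate, so check the schema of your own service.

```python
import json
from urllib import request

# Placeholder; the real URI comes from the deployment details once it turns green.
SCORING_URI = "http://<your-aci-service>.azurecontainer.io/score"

def build_payload(rows):
    """Build the JSON body; the {'data': [...]} shape is an assumption
    based on the auto-generated scoring script."""
    return json.dumps({"data": rows}).encode("utf-8")

# Hypothetical input columns for the car price model.
body = build_payload([{"make": "toyota", "mileage": 31000, "horsepower": 102}])

# Uncomment to call the live service:
# req = request.Request(SCORING_URI, data=body,
#                       headers={"Content-Type": "application/json"})
# print(request.urlopen(req).read().decode("utf-8"))
print(body.decode("utf-8"))
```

The response is the model's prediction for each row, which is exactly what Power BI does behind the scenes when it populates the new column.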
I am going to pick a model that I deployed in the past, the same car price flow we saw in the UI, deployed as a web service. When I hit apply, it goes and creates a new column in the same data entity, invokes the model for each row, and populates that last column. As you see here, this one is the model-populated price column, and this one is the original price. The original price also had missing values, yet we spent hardly an hour building a model with just 20 iterations and it came this close to the label column. You can imagine that if you spend maybe two hours trying for a better model, you will get a much better one; plus, with the transparency features, once we have the code generated you can go and customize that model however you want.

That's a good question. Yes, currently we support supervised learning only; unsupervised learning is on our roadmap. Similarly, we do not support deep learning currently, only classical machine learning, but deep learning is coming soon. Unsupervised learning, in the sense of clustering, is slightly farther out, but deep learning on images and the like is coming soon. That's a good ask, and a lot of customers ask for it, but we do not support it today. The challenge is that the meta-model, the brain, needs to be retrained for any new primary metric we add. So if you have a metric you think is generic enough to apply to almost everyone, we can work with you to make it a supported primary metric; but if you have your own custom metric, that is something we cannot support, because we are a generic service.

Just a minute; before people move, I want to show you the other experiences we have, and then we can take questions. Give me a minute. We talked about the UI demo and then consuming the model in Power BI. If you are a Power BI analyst and you want to build a machine learning model in Power BI yourself, this is the experience: very similar to what you saw in the Azure UI, but
much more simplistic. Here we don't even show all the pipelines we tried, only the best one, because for analysts we don't believe they have the ability or inclination to compare different models; all they are looking for is "give me the best model you can for a given period of time". So that's the experience for the Power BI analyst. We also have a Visual Studio experience for C# developers: if you are not a Python developer but you are a coding person, a C#/.NET developer, we have the exact same backend cloud supported in C#, similar to the Python experience we saw at the beginning. The last one is SQL Server; a lot of people say "my data is in SQL and I am a SQL expert, I don't know Python, I don't know C#, but I like SQL". So we also have a SQL stored-procedure-based approach, which again uses the exact same backend workflow, but coding-wise you code in SQL.

With that, and this is the last real slide, a lot of customers have multi-disciplinary teams, some data scientists, some developers, some analysts, and they want to collaborate. With that in mind, we have already enabled this flow, the demo we saw: you build a model in Azure using AutoML and then consume it in Power BI. But the other flow people are asking for is: "I am an analyst, I built a model in Power BI using AutoML, but I want to hand it to my data scientist so they can review and approve, or improve it." So we are working on this feature, where with a single click of a button an analyst can export either the full training code or the best model's training code as a Python script that the data scientist can then look at. This is where we are seeing a lot of collaboration capabilities needed as customers and companies mature in AI, with different personas working together, and we will continue to support these collaboration experiences. So with
that, here are a couple of customer use cases where automated machine learning is used in production, Schneider Electric and BP; we have more and are happy to share.

That arrow is the part I said is work in progress. The intent of this arrow is that whatever AutoML did here, I should be able to say "give me a training script that I can run to redo the whole thing", or "give me a Python script for the best pipeline only". We are actually working on it and should have something in the next couple of months.

Yes, the recommendation scenario is work in progress. We currently support the core tasks: classification, regression, and forecasting; forecasting is one of the scenarios we added recently. Collaborative-filtering-style recommendations, which we use behind the scenes for the system itself, are work in progress as a customer-facing scenario. To your original question on custom models, the answer is the same as for the primary metric question: we can support a new model if you think it's generic enough to be useful for other customers too. Given we are a platform, it's hard for us to support arbitrary custom pieces; that said, we are transparent, so you can use our result as a baseline and compare your own algorithms against it.

Yes, it is part of Azure ML, so it's a web service with all the HTTPS support and any enterprise security readiness that you can go ahead and use. Sure, I have documentation links where we talk about exactly how we impute data. Just to be clear, whatever we have in production today is based on research, but customers are using it for real-world scenarios. And no, we don't have anything held back for an insider community, just to be very transparent.

On DNNs: there is cutting-edge research, already published, called NAS, neural architecture search, which is about searching, given your data, for a good DNN network that works well for that data set. We haven't worked on that yet, but research has already published papers on it. The intent is not to keep that for ourselves; it's just that research is always about five years in advance, so we will catch up, and as part of our DNN investment we are looking at bringing NAS research into the product. Think of it this way: the research behind what we have today was published in 2015 and we had it in production in 2018; that's the pace we are going at.

Metadata, for feature engineering, for example column names, somebody asked that question. There we operate on column IDs, but we need to be able to show things back to the user by column name; we haven't done that yet, and I don't know if we will do that anytime soon. The meta-model is trained on public data and public models on that data; it is not being retrained with real customer scenarios yet. Yes. Say that again? Yes, that's a good question. As part of the feature engineering transparency I talked about, we also give you the ability to override our feature engineering: you can look at what feature engineering we have done and say "I don't want these particular transforms, I want just these from you, plus I want to do this one extra thing". We enabled that as part of the SDK release; a lot of customers ask to customize feature engineering.

Everyone asks about the trend, and it's an interesting question, because as part of the collaboration work I am seeing patterns around companies having mixed sets of data scientists and data analysts.
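The feature-engineering override described a moment ago, keep some of the automatic transforms, drop others, add your own, can be pictured as editing a list of transform steps before they run. This is a plain-Python illustration of that idea, not the actual SDK API; the transform names and columns are hypothetical.

```python
import math

def impute_mean(rows, col):
    """Fill missing values in `col` with the column mean (the kind of
    transform the system might propose automatically)."""
    vals = [r[col] for r in rows if r[col] is not None]
    mean = sum(vals) / len(vals)
    return [{**r, col: mean if r[col] is None else r[col]} for r in rows]

def add_log_mileage(rows):
    """A custom transform of your own, appended to the pipeline."""
    return [{**r, "log_mileage": math.log(r["mileage"])} for r in rows]

# Transforms the system proposed (names and choices are hypothetical).
auto_transforms = [
    ("impute_mileage", lambda rs: impute_mean(rs, "mileage")),
    ("impute_horsepower", lambda rs: impute_mean(rs, "horsepower")),
]

# Override: drop one automatic step, keep the rest, add your own step.
chosen = [t for t in auto_transforms if t[0] != "impute_horsepower"]
chosen.append(("log_mileage", add_log_mileage))

rows = [{"mileage": 10000, "horsepower": 102},
        {"mileage": None, "horsepower": None}]
for _, transform in chosen:
    rows = transform(rows)
print(rows)
```

The point is only the shape of the workflow: inspect the proposed steps, edit the list, then run it, rather than accepting the automatic featurization as a black box.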
Some companies are happy to say that for simple, low-business-impact, low-criticality problems, an analyst can build a model and deploy it on their own, but a lot of customers say no, a data scientist needs to review and approve. So there are mixed trends, but almost every customer says data scientist resources are hard to find, so they are augmenting their data science teams with citizen data scientists, analysts who are domain experts, who know the data and want to learn ML. That's the trend I am seeing.

Yes, the single-click deploy I showed automatically enables that: you click deploy once and we automatically generate the right metadata for Power BI to understand and query the model. Correct. And as part of our overall Azure ML MLOps story, the ML operations story, we have that covered; I just talked about automated ML, but there is a broad spectrum of Azure ML capabilities that you get for free alongside it. So that finishes the end to end. Thank you.