From around the globe, it's theCUBE, with digital coverage of AWS re:Invent 2020, sponsored by Intel and AWS. Welcome back to theCUBE's ongoing coverage of AWS re:Invent virtual. theCUBE has gone virtual too and continues to bring our digital coverage of events across the globe. It's been a big week, a big couple of weeks at re:Invent, and a big week for machine intelligence, machine learning and AI, and new services for customers. And with me to discuss the trends in this space is Bratin Saha, who is the vice president and general manager of machine learning services at AWS. Bratin, great to see you. Thanks for coming on theCUBE. Thank you, Dave. Thank you for having me. You're very welcome. Let's get right into it. I remember when SageMaker was announced in 2017; it was really a seminal moment in the whole machine learning space. But take us through the journey over the last few years. What can you tell us? So, when we came out with SageMaker, customers were telling us that machine learning is hard, and only a few large organizations could truly deploy machine learning at scale. And so we released SageMaker in 2017, and we have seen really broad adoption of SageMaker across the entire spectrum of industries. And today, the vast majority of machine learning in the cloud happens on AWS. In fact, AWS has more than twice as much machine learning as any other provider. And, you know, we saw this morning that more than 90% of the TensorFlow in the cloud and more than 92% of the PyTorch in the cloud runs on AWS. So what happened is customers saw that it was much easier to do machine learning once they were using tools like SageMaker. And so many customers started deploying a handful of models, and they started to see that they were getting real business value. You know, machine learning was no longer a niche, machine learning was no longer a fictional thing.
It was something from which they were getting real business value. And then they started to proliferate it across their use cases. And so these customers went from deploying tens of models to deploying hundreds and thousands of models. And in fact, we have one customer that is deploying more than a million models. So that is what we have seen: really making machine learning broadly accessible to our customers through the use of SageMaker. Yes, so you probably very quickly went through the experimentation phase, people had the aha moments, and adoption went through the roof. What kind of patterns have you seen in terms of the way in which people are using data, and maybe some of the problems and challenges that has created for organizations that they've asked you to help them rectify? Yes, and in fact, SageMaker is today one of the fastest-growing services in AWS history. And what we have seen happen is, as customers scaled out their machine learning deployments, they asked us to help them solve the issues that come up when you deploy machine learning at scale. So one of the things that happens when you're doing machine learning is you spend a lot of time preparing the data, cleaning the data, making sure the data is prepared correctly so you can train your models. And customers wanted to be able to do the data prep in the same service in which they were doing machine learning. And hence we launched SageMaker Data Wrangler, where with a few clicks you can connect to a variety of data stores, AWS data stores or third-party data stores, and do all of your data preparation. Now, once you've done your data preparation, customers wanted to be able to store that data, and that's why we came out with SageMaker Feature Store. And then customers wanted to be able to take this entire end-to-end pipeline and automate the whole thing. And that is why we came up with SageMaker Pipelines.
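The end-to-end flow described here (wrangle the data, store the prepared features, automate the whole thing as a pipeline with lineage) can be sketched as a toy in plain Python. To be clear, this is not the SageMaker SDK: the step names, the tiny dict-based "feature store", and the loan-shaped records are all invented for illustration.

```python
import hashlib
import json

# Toy stand-ins for the pieces described above: a "wrangle" step that
# cleans raw records, a "feature store" that keeps prepared features,
# and a pipeline that chains the steps and records lineage hashes.

def wrangle(raw_rows):
    """Drop incomplete rows and normalize the 'income' field to [0, 1]."""
    cleaned = [r for r in raw_rows if r.get("income") is not None]
    max_income = max(r["income"] for r in cleaned)
    return [{**r, "income": r["income"] / max_income} for r in cleaned]

feature_store = {}  # feature group name -> list of feature rows

def ingest(group, rows):
    feature_store[group] = rows

def train(rows):
    """'Train' by computing the mean normalized income of approved rows."""
    approved = [r["income"] for r in rows if r["label"] == 1]
    return {"mean_income_if_approved": sum(approved) / len(approved)}

def run_pipeline(raw_rows):
    # Hash each step's output so the run is reproducible and traceable,
    # the kind of code-plus-data lineage a real pipeline service automates.
    def digest(obj):
        return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()[:8]

    lineage = []
    rows = wrangle(raw_rows)
    lineage.append(("wrangle", digest(rows)))
    ingest("loans", rows)
    model = train(feature_store["loans"])
    lineage.append(("train", digest(model)))
    return model, lineage

raw = [
    {"income": 50000, "label": 1},
    {"income": None, "label": 0},   # incomplete row, dropped by wrangling
    {"income": 80000, "label": 1},
]
model, lineage = run_pipeline(raw)
print(model, lineage)
```

The point of the lineage list is the one Saha makes later in the interview: unlike plain software CI/CD, every run here is tied to both the code and the exact data it saw.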
And then one of the things that customers have asked us to help them address is this issue of statistical bias and explainability. And so we released SageMaker Clarify, which actually helps customers look at statistical bias through the entire machine learning workflow: when you're doing your data processing, before you train your model, and even after you've deployed your model. And it gives you insights into why your model is behaving in a particular way. And then, beyond machine learning in the cloud, many customers have started deploying machine learning at the edge. They want to be able to deploy these models at the edge, and they wanted a solution that says, hey, can I take all of these machine learning capabilities that I have in the cloud, specifically the model management and the MLOps capabilities, and deploy them to the edge devices? And that is why we launched SageMaker Edge Manager. And then customers said, we still need our basic functionality of training and so on to be faster. And so we released a number of enhancements to SageMaker distributed training, in terms of new data parallelism modules and new model parallelism modules, that give the fastest training time on SageMaker across both frameworks. And that is one of the key things that we have at AWS: we give customers choice. We don't force them onto a single framework. Okay, great. And I think we hit them all, except I don't know if you talked about SageMaker Debugger, but we will. So I want to come back and ask you a couple of questions about these features. It's funny, sometimes people make fun of your names, but I like them because they say what the product does, and people tell me that they spend all their time wrangling data. So you have Data Wrangler. It's all about transformation and cleaning. And because you don't want to spend 80% of your time wrangling data, you want to spend 80% of your time driving insights and monetization. So how does one engage with Data Wrangler?
And how do you see the possibilities there? So Data Wrangler is part of SageMaker Studio. SageMaker Studio is the world's first fully integrated development environment for machine learning. So you come to SageMaker Studio, you have a tab there, which is SageMaker Data Wrangler, and then you have a visual UI. With that visual UI, with just a single click, you can connect to AWS data stores like Redshift or Athena, or third-party data stores like Snowflake and Databricks, and MongoDB, which will be coming. And then you have a set of built-in data processors for machine learning. So you get the data in, you do some interactive processing, and once you're happy with the results, you can just send it off as an automated data pipeline job. And it's really today the easiest and fastest way to do machine learning, and it really takes out that 80% that you were talking about. Why has it been so hard to automate the pipeline, to bring CI/CD to data pipelines? Why has that been such a challenge, and how did you resolve it? You know, what has happened is, when you look at machine learning, machine learning deals with both code and data, unlike software, which really has to deal with only code. And so we had the CI/CD tools for software, but someone needed to extend them to operate on both data and code. And at the same time, you want to provide reproducibility, lineage and traceability. And really getting that whole end-to-end system to work across code and data, across multiple capabilities, was what made it hard. And that is where we brought in SageMaker Pipelines, to make this easy for our customers. Got it, thank you. And then let me ask you about Clarify. And this is a huge issue in machine intelligence. Humans by their very nature are biased. They build models, and the models have those biases in them.
And so you're bringing transparency. The other problem with AI, and I'm not sure that you're solving this problem, but please clarify if you are, no pun intended, is that black box. You know, AI is a black box; I don't know how we got to the answer. It seems like you're attacking that, bringing more transparency and really trying to deal with the biases. I wonder if you could talk about how you do that, and how people can expect this to affect their operations. I'm glad you asked this question, because customers have also asked us about it. SageMaker Clarify is really intended to address the questions that you brought up. One is, it gives you the tools to run a lot of statistical analysis on the dataset that you started with. So let's say you were creating a model for loan approvals, and you want to make sure that, you know, you have an equal number of male applicants and female applicants, and so on. SageMaker Clarify lets you run these kinds of analyses to make sure that your dataset is balanced to start with. Now, once that happens, you train the model. Once you've trained the model, you want to make sure that the training process did not introduce any unintended statistical bias. So then you can use SageMaker Clarify to again say, well, is the model behaving in the way I expected it to behave based on the training data I had? So let's say your training data said, you know, 50% of all the male applicants got their loans approved. After training, you can use Clarify to say, does this model actually predict that 50% of the male applicants will get approved? And if it's more or less, you know you have a problem. And then after that, we get to the problem you mentioned, which is how do we unravel the black box nature of this?
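The post-training check described here, comparing the approval rate a group had in the training data with the approval rate the trained model actually predicts for that group, can be illustrated with a toy computation. This is plain Python, not the Clarify API, and the applicant records and the 10% threshold are made up for the example:

```python
# Toy version of the post-training bias check: does the model's
# predicted approval rate for a group match the approval rate that
# group had in the training data? A large gap flags possible
# statistical bias introduced by training.

def approval_rate(rows):
    return sum(r["approved"] for r in rows) / len(rows)

# Hypothetical training labels: 50% of male applicants were approved.
train_male = [{"approved": 1}, {"approved": 0}, {"approved": 1}, {"approved": 0}]

# Hypothetical model predictions on the same applicants: only 25% approved.
pred_male = [{"approved": 1}, {"approved": 0}, {"approved": 0}, {"approved": 0}]

observed = approval_rate(train_male)
predicted = approval_rate(pred_male)
gap = abs(observed - predicted)

print(f"observed={observed:.2f} predicted={predicted:.2f} gap={gap:.2f}")
if gap > 0.10:  # threshold is arbitrary, for illustration only
    print("possible training-induced bias: investigate")
```

The same comparison run per group (male vs. female applicants, for instance) is the kind of disparity analysis Clarify reports across the workflow.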
And, you know, we took the first steps on that last year with Autopilot, where we actually gave you notebooks, but SageMaker Clarify really makes it much better, because it tells you why a model is predicting the way it's predicting. It gives you the reasons, and it tells you, you know, here is why the model predicts that you were approved for a loan, or here is why the model said that you may or may not get a loan. So it really makes it easier, gives visibility and transparency, and helps to convert the insights that you get from model predictions into actionable insights, because you now know why the model is predicting what it's predicting. Yeah, it brings up the confidence level. Okay, thank you for that. Let me ask you about distributed training on SageMaker. Help us understand what problem you're solving. You're introducing auto-parallelism. Is that about scale? Help us understand that. Yeah, so one of the things that's happening is, you know, our customers are starting to train really large models. Three years back, they would train models with 20 million parameters; last year, they would train models with a couple of hundred million parameters. Now customers are actually training models with billions of parameters. And when you have such large models, the training can take days and sometimes weeks. And so we have introduced two things. One is a way of taking a model and training it in parallel on multiple GPUs; that's, you know, what we call a data-parallel implementation. We have our own custom libraries for this, which give you the fastest performance on AWS. And then the other thing that happens is customers take some of these models that are really large, you know, billions of parameters, and we showed one of them today, called T5. And these models are so big that they cannot fit in the memory of a single GPU.
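The data-parallel idea mentioned here, where each GPU trains the same model on a different shard of the data and the per-shard gradients are averaged, can be sketched in a few lines of plain Python. This is a toy simulation of the mechanism, not AWS's actual distributed-training library; the one-parameter model and the data are invented:

```python
# Toy simulation of one data-parallel training setup: a one-parameter
# model y = w * x trained with mean squared error. Each "worker"
# computes the gradient on its own shard, and an all-reduce-style
# average combines them, which is the core of a data-parallel step.

def gradient(w, shard):
    # d/dw of mean squared error over the shard's (x, y) pairs
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def data_parallel_step(w, shards, lr=0.1):
    grads = [gradient(w, s) for s in shards]  # computed "on each GPU"
    avg = sum(grads) / len(grads)             # all-reduce: average gradients
    return w - lr * avg                       # identical update on every worker

# Data generated from the true relationship y = 3x, split into 2 shards.
shards = [[(1.0, 3.0), (2.0, 6.0)], [(3.0, 9.0), (4.0, 12.0)]]

w = 0.0
for _ in range(100):
    w = data_parallel_step(w, shards)
print(round(w, 3))  # converges toward the true weight 3.0
```

Because every worker applies the same averaged gradient, all replicas stay in sync, which is why the approach scales out training without changing the model itself; the model-parallel case discussed next is the complement, splitting a single model across GPUs when it cannot fit on one.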
And so what happens today is, if customers have to train such a model, they spend weeks of effort trying to parallelize that model. And what we introduced in SageMaker today is a mechanism that automatically takes these large models and distributes them across multiple GPUs, the auto-parallelization that you were talking about, making it much easier and much faster for customers to really work with these big models. So the GPU is a very expensive resource, and prior to this you would have the GPU waiting, waiting, waiting, saying load me up, and you don't want to do that with your expensive resource. And, you know, one of the things you mentioned before is SageMaker Debugger. So one of the things that we also came out with today is the SageMaker profiler, which is really part of Debugger, and it lets you look at your GPU utilization, your CPU utilization, your network utilization and so on. So now you know, once your training job has started, at which point the GPU utilization has gone down, and you can go in and fix it. So this really lets you utilize your resources much better, ultimately reducing your cost of training and making it more efficient. Awesome. Let's talk about Edge Manager, because Andy Jassy's keynote was interesting. He was talking about hybrid, and his vision, basically Amazon's vision, is: we want to bring AWS to the edge. We see the data center as just another edge node. And so this is, to me, another example of AWS's edge strategy. Talk about how that works, and in practice, how does it work? Am I doing inference at the edge and then bringing back data into the cloud? Am I doing things locally? Explain that. Yes. So, you know, what SageMaker Edge Manager does is it helps you deploy and manage models at the edge. The inference is happening on the edge device. Now, consider this case: Lenovo has been working with us, and what Lenovo wants to do is take these models and do predictive maintenance on laptops.
So, you know, you're an IT shop and you have a couple of hundred thousand laptops, and you want to know when something may go down. And so they deploy these predictive maintenance models on the laptop. They're doing inference locally on the laptop, but you want to see whether the models are getting degraded, and you want to be able to see whether the quality is holding up. So what Edge Manager does is, number one, it takes your models and optimizes them so they can run on an edge device, and we get up to a 25x benefit. And then, once you've deployed it, it helps you monitor the quality of the models by letting you upload data samples to SageMaker. So you can see if there is drift in your models or if there's any other degradation. All right, and JumpStart is kind of the portal that I go to to access all these cool tools, is that right? Yep, and you know, we have a lot of getting-started material, lots of first-party models, lots of open-source models and solutions. All right, Bratin, we're out of time, but I could go on forever. Thanks so much for bringing this knowledge to theCUBE audience, really appreciate your time. Thank you, thank you Dave for having me. You're very welcome, and good luck with the announcements. And thank you for watching, everybody. This is Dave Vellante for theCUBE, and our coverage of AWS re:Invent 2020 continues right after this short break.