Hey, welcome back everyone. Day three of AWS re:Invent 2022. I'm John Furrier with Dave Vellante, co-host of theCUBE. Dave, ten years for us. "The leader in high tech coverage" is our slogan now. Ten years of re:Invent. We've been to every single one except the original, which we would have come to if Amazon had actually marketed the event, but they didn't; it was more of a customer event. This is day three, the machine learning and AI keynote. Swami's up there, a lot of announcements, and we're going to break it down. We've got Andy Thurai here, Vice President at Constellation Research. Andy, great to see you. You've been on theCUBE before, one of our analysts bringing the analysis and commentary to the keynote. This is your wheelhouse, AI. What did you think about Swami up there? I mean, he's awesome. We love him. He's a big fan of theCUBE and we're fans of his. But he had 13 announcements. A lot.

A lot. So, well, first of all, thanks for having me here, and I'm glad to have both of you on the same show attacking me. I'm just kidding. Some of the announcements really are game-changers, and some of them are, meh, just plugging the holes in what they have.

A lot of golf claps.

Yeah, right. And you could notice it while he was making the announcements: the difference in clapping volume tells you which ones landed better. But some of the announcements are really, really good. In particular, one we talked about: Microsoft stole a march by having OpenAI in there doing the large language models, going after that and having the transformer models available to them. Amazon was a little bit weak in that area. They don't have a large language model, so they're taking a different route, saying, you know what, I'll help you train a large language model yourself, customized models. I can provide the necessary instances.
I can provide the instances, the volume, the memory, the whole nine yards, so you can train the model yourself without depending on them.

So Dave and Andy, I want to get your thoughts, because we've been following Amazon's deep bench on the infrastructure side. They've been doing a lot of machine learning and AI, a lot of data. But the sentiment seems to be that there are other competitors doing a good job too, like Google. Dave, I've heard folks in the hallway even here, ex-Amazonians, saying, hey, they train their models on Google and then they bring them to SageMaker, because it's a better interface. So you've got Google making a play to be that data cloud, and Microsoft obviously putting together a great package to make it turnkey. How do they really stand versus the competition, guys?

Good question. They each have their own uniqueness and their own variation that they take to the field. For example, Microsoft is known for the industry-related things they've been going after, industry verticals and whatnot. So that's one of the things I looked for here, and AWS had this Omics announcement, particularly toward the healthcare genomics space. That's a huge space for HPC-related AI/ML applications. They've put a lot of things together here in SageMaker and in their models, saying, how do you use this to do transformative things like drug discovery, genomics analysis, cancer treatment, the whole nine yards? That's huge volumes of data, though. So they're going into that healthcare area. Google has taken a different route: they want to make everything simple. All I have to do is call an API, give it what I need, and get it done. But Amazon wants to go a level deeper, saying, you know what, I want to provide everything you need, and you can customize the whole thing for what you need.
So to me, the big picture here is, and Swami referenced it, hey, we're a data company. He started out talking about books and how the data informed them as to what books to place front and center. Here's the big picture, in my view: companies need to put data at the core of their business, and they haven't. They've generally put humans at the core of their business, with data, and now machine learning, out on the periphery. Amazon, Google, Microsoft, and Facebook have put data at their core. So the question is, how do incumbent companies do that? He mentioned some: Toyota, Capital One, Bristol Myers Squibb. I don't know, are those data companies? We'll see. But the challenge is that most companies don't have the resources, as you well know, Andy, to implement what Google and Facebook and others have. So how are they going to do it? They're going to buy it, right? Or are they going to build it with tools, which is kind of the Amazon approach, like you said, or buy it from Microsoft and Google? I pulled some ETR data to ask, okay, who are the top companies showing up in terms of spending? Who's spending with whom? AWS number one, Microsoft number two, Google number three, Databricks number four, just in terms of presence. Then it falls off: DataRobot, Anaconda, Dataiku. Oracle popped up, actually, because they're embedding a lot of AI into their products, and of course IBM, and then a lot of smaller companies. But do customers generally have the resources to do what it takes to implement AI into applications and into workflows?

So, a couple of things on that. One is, it's no surprise that the top three are the hyperscalers, because they all want to bring the business to them to run these specific workloads. And the next biggest workloads, as he was saying in his keynote, are two things. One is the AI/ML workloads.
The other is the heavy unstructured workloads he was talking about. Eighty to ninety percent of the data coming off is unstructured, so how do you analyze it? Take the geospatial data he mentioned: the volumes of data you need to analyze, the deep neural networks you ought to use. Only hyperscalers can do it, right? So it's no wonder they're all on top. On the data side, one of the things they announced, which not many people paid attention to, was the zero-ETL they talked about. That's a bit of a game-changing moment, in the sense that if your data is distributed everywhere, bringing it all together and integrating it is a lot of work; that's the ETL. By taking Amazon Aurora and Redshift and combining them with zero ETL, no ETL at all, and then having Apache Spark run analytical applications and ML workloads on top, that's huge. You don't have to move the data around; you use the data where it is.

I think you said it: they're basically filling holes, right? They created this suite of tools, let's call it. You could say it's a mess. It's not a mess, because the tools are really powerful, but they're not well integrated, and now they're starting to sew up the seams, as I say.

Well, yeah, that's a great point. And I would double down and say, look, I think boring is good. We had that phase in the Kubernetes hype cycle where it got boring, and boring is good. Boring means we're getting better, we're invisible. That's infrastructure: in the weeds, in-between-the-toes details. It's the stuff people have to get done. So you look at their 40 new data sources for Data Wrangler, 50 new AppFlow connectors, Redshift auto-copy. This is boring, good, important shit, Dave. You've got to get it done. And the governance is going to be key. So to me, this may not jump off the page.
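To make the zero-ETL point concrete, here is a minimal, purely illustrative sketch of the hand-rolled extract-transform-load pipeline that an Aurora-to-Redshift zero-ETL integration is meant to eliminate. No real AWS APIs are used; the "databases" are plain Python dicts and every name here is hypothetical.

```python
# Illustrative only: the manual ETL work that a zero-ETL integration
# (operational store replicated into the warehouse) removes. The two
# "databases" below are plain in-memory structures, not AWS services.

def extract(oltp_rows):
    """Pull the relevant raw rows out of the operational (OLTP) store."""
    return [row for row in oltp_rows if row.get("status") == "complete"]

def transform(rows):
    """Reshape transactional rows into the warehouse's analytical schema."""
    return [
        {"order_id": r["id"], "amount_usd": round(r["cents"] / 100, 2)}
        for r in rows
    ]

def load(warehouse, rows):
    """Append the transformed rows into the analytical store."""
    warehouse.extend(rows)
    return warehouse

# Operational (Aurora-like) data, kept in the transactional system.
oltp = [
    {"id": 1, "cents": 1999, "status": "complete"},
    {"id": 2, "cents": 500, "status": "pending"},
]

warehouse = []
load(warehouse, transform(extract(oltp)))
print(warehouse)  # [{'order_id': 1, 'amount_usd': 19.99}]
```

With zero-ETL, this whole pipeline is the part that disappears: transactional rows land in the warehouse automatically, and you analyze them where they arrive instead of writing and operating code like the above.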
Adam's keynote also felt a little bit like, we've got to close these gaps, and in a good way. So I think that's a very positive sign. Now, going back to the bigger picture, I think the real question is: can there be another independent cloud, a data cloud? That's what I tried to get at in my story, and your Breaking Analysis kind of hit a home run on this. There's an interesting opportunity for an independent data cloud, meaning something that isn't AWS, isn't Google, isn't one of the big three, but could sit on top of them. Let me give you an example. I had a conversation last night with a bunch of ex-Amazonian engineers, and the conversation was interesting, Dave. They were saying, well, Databricks and Snowflake are basically batch, okay, not transactional. And you look at Aerospike, I can see their booth here; transactional databases are hot right now. Streaming data is different; Confluent is different than Databricks. Is Databricks good at hosting? No, Amazon's better. So you start to see these kinds of questions come up, where Databricks is great, but maybe not good for this, that, and the other thing. You start to see the formation of swim lanes, or at least visibility into where people might sit in the ecosystem. What came out was transactional versus batch, the relationship there, and streaming real-time versus transactional data. So you start to see these new things emerge. Andy, what's your take on this? You're following it closely. This seems to be the alpha-nerd conversation, and it all points to who's going to have the best data cloud. Data superclouds, I call them. What's your take?

Yes, the data cloud is important, but so is the computation that goes on top of it. When the data is unstructured, and that huge, it's going to be hard to handle with low-end compute power. But going back to your point about the data: training AI/ML models requires the batch data.
That's when you need all the historical data to train your models. Then, when you do inference, that's when you need the streaming, real-time data available to you so you can make a prediction. One of the things they also announced, which is somewhat interesting: they have something like 700 different instances geared toward every single workload, and some of them run specifically on Amazon's new chips, Inferentia2 (Inf2) for inference and Trainium (Trn1) for training. So you not only have specific instances, they also run on high-powered silicon. And if you have the data to support both the training and the inference, the efficiency gains, again, those numbers have to be proven. They claim it could be anywhere between 40 and 60 percent.

Well, so a couple of things. You're definitely right: Snowflake started out as a simpler data warehouse, and in its first wave it's not architected to do real-time inference. It's not. Now, the second point is, Snowflake is two or three years ahead when it comes to governance and data sharing. Amazon is doing what it always does: it's copying, and it's customer-driven. They probably walk into an account and hear, hey, look what Snowflake's doing for us, this stuff's kicking ass. And they go, oh, that's a good idea, let's do that too. You saw it with separating compute from storage, which is their tiering. You saw it today with extending data sharing, Redshift data sharing. So how do Snowflake and Databricks counter? They do it with ecosystem. They bring in ecosystem partners, they bring in open-source tooling, and that's how they compete. I think there's unquestionably an opportunity for a data cloud.

Yeah, and I think the supercloud conversation, and then Sky Computing, with the Berkeley paper and other folks, is talking about this kind of pre-multi-cloud era. I mean, that's what I would call it right now.
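The batch-versus-streaming split Andy describes, train once on the historical data, then infer on records as they arrive, can be sketched as a toy example. The "model" here is just a mean-based threshold; everything is illustrative and stands in for no particular AWS or SageMaker API.

```python
# Toy illustration of the batch-train / streaming-infer split: the model
# is fit once on historical (batch) data, then applied record by record
# as new (streaming) data arrives.

def train(batch):
    """Batch phase: learn from all the historical values at once."""
    return sum(batch) / len(batch)  # the "model" is just the mean

def infer(model, value):
    """Streaming phase: score a single incoming record against the model."""
    return "anomaly" if value > 2 * model else "normal"

history = [10, 12, 9, 11, 8]   # historical batch data
model = train(history)          # trained once; model == 10.0

stream = [11, 31, 9]            # real-time records arriving one at a time
results = [infer(model, v) for v in stream]
print(results)  # ['normal', 'anomaly', 'normal']
```

The point is the data shape: training consumes the whole history in one pass, while inference touches one record at a time as it streams in.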
We're kind of in the pre-era of multi-cloud, which, by the way, is not even well defined yet. I think people use that term, Dave, to say some sort of magical thing is happening. People have multiple clouds; they end up there by default, not by design, as Dell likes to say, and they've got to deal with it. So it's more that they're inheriting multi-cloud environments; it's not necessarily the situation they wanted. To me, that is a big, big issue.

Yeah, and again, going back to the Snowflake and Databricks discussion: they're data companies. That's how they made their mark in the market, saying, I do all these things, therefore I have to have your data, because the data is seamless. And Amazon is catching up with a lot of the announcements they made. How much traction that gets, we'll have to see.

Yeah, to me there's no doubt about it, Dave. With what Swami's doing, if Amazon can corner the market on out-of-the-box ML and AI capabilities and make it easier for people, that's going to be the telltale sign at the end of the day. Can they fill in the gaps? Again, boring is good. The competition, I don't know; I'm not following the competition closely, Andy. This is a real question mark for me. I don't know where they stand. Are they more comprehensive? Do they have deeper services? Swami showed all the different capabilities, but where does Amazon stand? What are the prospects?

Particularly when it comes to the models, they're going at it from a different angle: I will help you create the models. We've talked about the zero-ETL and the whole data story. We'll get the data sources in, we'll create the model, we'll move the whole model. We're talking about the MLOps teams here, right? And they have the whole functionality built in, which they've matured over the years. So essentially, they want to become the platform you come to.
I'm the only platform you would use, from model training to deployment to inference to model versioning to management, the whole nine yards. That's the angle they're trying to take: a one-stop platform.

What about this idea of technical debt? Adrian Cockcroft was on yesterday, John; I know you talked to him as well. He said, look, Amazon is Legos. You want a toy for Christmas, you can go and buy a toy, or you can build one. If you buy a toy, in a couple of years it could break, and what are you going to do? You throw it out. But if part of your Lego build needs to be extended, you extend it. So, you know, George Gilbert was saying there's a lot of technical debt, and Adrian was countering that. Does Amazon have technical debt, or is the Lego-blocks analogy the right one?

Well, I talked to him about the debt, and one of the things we discussed was: what do you optimize for, EC2 APIs or Kubernetes APIs? It depends on what team you're on. If you're on the runtime team, you're going to optimize for Kubernetes, but EC2 is the resource you want to use. So the idea of 15 years of technical debt, I don't believe it. I think the APIs are still hardened. The issue he brings up that I think is relevant is that it's an "and" situation, not an "or." You can take the bag of Legos, which is the primitives, and build a durable application platform: monitor it, customize it, work with it, build on it. It's harder, but the outcome is durability and sustainability. Or you can have a toy made from those Legos, glued together for you, that you can play with out of the box, but it'll break over time and then you've got to replace it. So there's going to be a toy business and there's going to be a Legos business. Who are the toys in AI?

Well, out of the box. Who's out of the box?

So you're asking what toy Amazon is building? I mean, Amazon clearly is the Lego blocks, for people who want out of the box. What about Google? What about Microsoft?
Are they basically building more toys, more solutions?

So Google's angle is more about building solutions, like, I give you an API kind of thing. But when it comes to vertical industry solutions, Microsoft is ahead, right? Because they have had years of industry experience. There are other, smaller clouds trying to do that too, IBM being an example. But now Amazon is starting to go after specific industry use cases, and they're thinking them through. For example, the medical one we talked about: they want to build HealthLake, and the Security Lake they're trying to build, which will keep you compliant. It will cover the European regulations, the whole nine yards, and it will help you personalize things as you need as well. For example, if you go for a certain treatment, it could analyze you based on your genome profile and say the treatment for this particular person has to be individualized this way. But doing that requires enormous power, right? So if they build applications like that, whether for healthcare, finance, or what have you, it will be easy for customers to use.

What's the biggest mistake customers make when it comes to machine intelligence, AI, machine learning?

So many things, right? I could start with the model itself. When you build a model, you should be able to figure out how long that model stays effective. As good as it is to create a model, take it to the business, and do things the right way, some people leave a model in place much longer than it's useful, and it hurts the business more than it helps. It could be things like that. Or you're not building in responsible AI from the start, and you have bias in your model. There are so many issues, I don't know if I can pinpoint one. But there are many, many. Responsible AI, ethical AI.

All right, well, we'll leave it there. You're watching theCUBE, the leader in high tech coverage.
Here at day three of re:Invent, I'm John Furrier with Dave Vellante. Andy's joining us here for the critical analysis, breaking down the commentary. We'll be right back with more coverage after this short break.