Live from Washington, D.C., it's theCUBE, covering Inforum DC 2018, brought to you by Infor. Welcome back, we are here on theCUBE. Thanks for joining us as we continue our coverage here at Inforum 18. We're in Washington, D.C., at the Walter Washington Convention Center. I'm John Walls with Dave Vellante, and we're joined now by Rahul Pathak, who is the GM of Amazon Athena and Amazon EMR. Hey there, nice to see you, sir. Nice to see you as well, thanks for having me. Thank you for being with us. You spoke earlier at the executive forum, and I wanted to talk to you about the title of the presentation, which was Data Lakes and Analytics, the Coming Wave of Brilliance. All right, so tell me about the title, but more about the talk too. Sure. So the talk was really about a set of components and a set of trends driving data lake adoption, and then how we partner with Infor to allow Infor to provide a data lake that's customized for the vertical lines of business to their customers. And I think part of the notion is that we're coming from a world where customers had to decide what data they could keep because their systems were expensive. Now we're moving to a world of data lakes where storage and analytics are much lower cost. And so customers don't have to make decisions about what data to throw away. They can keep it all and then decide what's valuable later. And so we believe we're in this transition and inflection point where you'll see a lot more insights possible with a lot of novel types of analytics, much more so than we could do to this point. And that's the brilliance. That's the brilliance of it, right? The opportunity to leverage. To do more. Like you never could before. Exactly. I'm sorry, David. No, that's okay. So if you think about the phases of so-called big data, we went from sort of EDW to cheaper data warehouses that were distributed, right? And this guy always joked that the ROI of Hadoop was reduction of investment. And that's what it became.
And as a result, a lot of the so-called data lakes just became stagnant. And so then you had a whole slew of companies that emerged trying to sort of clean up the swamp, so to speak. You guys provide services and tools. So you're like, okay, guys, here it is. We're going to make it easier for you. One of the challenges that Hadoop generally, and big data generally, had was the complexity. And so what we noticed was the cloud guys, not just AWS, but in particular AWS, really started to bring in tooling that simplified the effort around big data. So fast forward to today. Now we're at the point of trying to get insights. Data's plentiful, insights aren't. Take us, bring us up to speed on Amazon's big data, AWS's big data strategy, the status, what customers are doing, where are we at in those waves? It's a big question, but yeah, absolutely. So, John Furrier question. So what we're seeing is this transition from sort of classic EDW to S3-based data lakes. S3 is our Amazon storage service and it's really been foundational for customers. And what customers are doing is they're bringing their data to S3 in open data formats. EDWs still have a role to play. And then we offer services that make it easy to catalog and transform the data in S3, as well as the data in customer databases and data warehouses, and then make that available for systems to drive insight. And when I talk about that, what I mean is we have the classic reporting and visualization use cases, but increasingly we're seeing a lot more real-time event processing. And so we have services like Kinesis Analytics that make it easy to run real-time queries on data as it's moving. And then we're seeing the integration of machine learning into this stack. So once you've got data in S3, it's available to all of these different analytics services simultaneously.
And so now you're able to run your reporting, your real-time processing, but also now use machine learning to make predictive analytics and decisions. And then I would say a fourth piece of this is there's really been, with machine learning and deep learning, and embedding them in developer services, there's now been a way to get at data that was historically opaque. So if you had an audio recording of a support call, you can now put it through a service that'll actually transcribe it, tell you the sentiment in the call, and that becomes data that you can then track and measure and report against. So there's been this real explosion in capability and flexibility. And what we've tried to do at AWS is provide managed services to customers so that they can assemble sophisticated applications out of building blocks that make each of these components easier, that focus on being best of breed in their particular use case. And you're responsible for EMR, correct? So I own a few of these: EMR, Athena, and Glue. And really, EMR is open source Spark and Hadoop, with customized clusters that operate directly against S3 data lakes, with no need to load into HDFS. So you avoid that staleness point that you mentioned. And then Athena is serverless SQL on S3, so you can let any analyst log in, just get a SQL prompt and run a query. And then Glue is for cataloging the data in your data lake and databases, and for running transformations to get data from raw form into an efficient form for querying, typically. So EMR was really the first service, if I recall, right, sort of the first big data service that you offered, right? And as you say, you really began to simplify for customers because the Hadoop complexity was just unwieldy. Is the momentum still there with EMR, or are people looking for alternatives? Sounds like it's still a linchpin of the strategy. No, absolutely. I mean, I think what we've seen is customers bring data to S3.
They'll then use a service like Redshift for petabyte-scale data warehousing. They'll use EMR for really arbitrary analytics using open source technologies. And then they'll use Athena for broad data lake query and access. So these things are all very much complementary to each other. How do you define just the concept of data lakes versus other approaches to clients, and try to explain to them the value and the use for them, I guess, ultimately how they can best leverage it for their purposes? How do you walk them through that? Yeah, absolutely. So, you know, that starts from the principles around how data is changing. So before, we used to have typically tabular data coming out of ERP systems or CRM systems going into data warehouses. Now we're seeing a lot more variety of data. So you might have tweets, you might have JSON events, you might have log events, real-time data. And these don't fit well into the traditional relational tabular model. So what data lakes allow you to do is you can actually keep both types of data. You can keep your tabular data directly in your data lake and you can bring in these new types of data, the semi-structured or the unstructured data sets. And they can all live in the data lake. And the key is to catalog that all so you know what you have, and then figure out how to get that catalog visible to the analytic layer. And so the value becomes, you can actually now keep all your data. You don't have to make decisions a priori about what's going to be valuable or what format it's going to be useful in. And you don't have to throw away data because it's expensive to store it in traditional systems. And this gives you then the ability to replay the past when you develop better ideas in the future about how to leverage that data. So there's a benefit to being able to store everything.
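To make the "keep everything, but catalog it" idea above concrete, here is a minimal sketch in Python. It is an illustration only, not how S3 or Glue work internally: the `ToyDataLake` class, the `infer_columns` helper, and the sample keys and records are all hypothetical names invented for this example. Raw tabular (CSV) and semi-structured (JSON lines) data are stored untouched, and a small catalog records which fields each dataset contains so that an analytic layer could find them later.

```python
import csv
import io
import json


def infer_columns(fmt, raw):
    """Return the sorted field names found in a raw CSV or JSON-lines blob."""
    if fmt == "csv":
        # For CSV, the first row is assumed to be the header.
        header = next(csv.reader(io.StringIO(raw)), [])
        return sorted(header)
    if fmt == "jsonl":
        # For JSON lines, collect the union of keys across all records.
        fields = set()
        for line in raw.splitlines():
            if line.strip():
                fields.update(json.loads(line).keys())
        return sorted(fields)
    raise ValueError(f"unsupported format: {fmt}")


class ToyDataLake:
    """Hypothetical in-memory stand-in for object storage plus a catalog."""

    def __init__(self):
        self.objects = {}  # key -> raw text, kept exactly as it arrived
        self.catalog = {}  # key -> {"format": ..., "columns": [...]}

    def put(self, key, fmt, raw):
        # Store the raw data as-is and record what it looks like.
        self.objects[key] = raw
        self.catalog[key] = {"format": fmt, "columns": infer_columns(fmt, raw)}

    def find(self, column):
        # Answer "which datasets contain this field?" from the catalog alone.
        return sorted(k for k, e in self.catalog.items() if column in e["columns"])


lake = ToyDataLake()
lake.put("erp/orders.csv", "csv", "order_id,customer_id,total\n1,42,19.99\n")
lake.put(
    "web/clicks.jsonl",
    "jsonl",
    '{"customer_id": 42, "page": "/home"}\n{"customer_id": 7, "referrer": "ad"}\n',
)
print(lake.find("customer_id"))  # ['erp/orders.csv', 'web/clicks.jsonl']
```

Because the raw objects are never rewritten, a better parser or a new analysis can be run over the same stored data later, which is the "replay the past" benefit described in the conversation.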
And then I would say the third big benefit is that placing data in data lakes in open data formats, whether that's CSV or JSON or more efficient formats, allows customers to take advantage of best-of-breed analytic technology at any point in time without having to re-platform their data. So you get this technical agility that's really powerful for customers, because capabilities evolve over time constantly. And so being in a position to take advantage of them easily is a real competitive advantage for customers. I want to get to Infor, but this is so much fun. I have some other questions because Amazon's such a force in this space. When you think about things like Redshift, S3, Kinesis, DynamoDB, Aurora (we're a customer, these are all tools that we're using), the data pipeline starts to get very complex. And the great thing about AWS is I get API access to each of those, and primitive access. The drawback is it starts to get complicated. My data pipeline gets elongated and I'm not sure whether I should run it on this service or that service until I get my bill at the end of the month. So are there things that you're doing to help? First of all, is that a valid concern of customers? And what are you doing to help customers in that regard? Yeah, so we do provide a lot of capability, and I think our core idea is to provide the best tool for the job, with APIs to access them and combine them and compose them. So what we're trying to do to help simplify this is, A, build more prescriptive guidance into our services about, look, if you're trying to do X, here's the right way to do X, or at least the right way to start with X, and then we can evolve and adapt. We're also working hard with things like blogs and solution templates and CloudFormation templates to automatically stand up environments. And then the third piece is we're trying to bring in automation and machine learning to simplify the creation of these data pipelines.
So Glue, for example, when you put data in S3, it will actually crawl it on your behalf and infer its structure and store that structure in a catalog. And then once you've got a source table and a destination table, you can point those out and Glue will then automatically generate a pipeline for you to go from A to B that you can then edit or store in version control. So we're trying to make these capabilities easier to access and provide more guidance, so that you can actually get up and running more quickly without giving up the power that comes from having the granular access. That's a great answer, because the granularity is critical because, as the market changes, it allows you to move fast, right? And so you don't want to give that up, but at the same time you bring in complexity, and you just, I think, answered it well in terms of how you're trying to simplify that. The strategies obviously work very well. Okay, let's talk about Infor. Now here's a big ISV partner. They've got the engineering resources to deal with all this stuff and they really seem to have taken advantage of it. We were talking earlier that, I don't know if you heard Charles' keynote this morning, but he said, when we were an on-prem software company, we didn't manage customer servers for them. And back then the server was a server. Software companies didn't care about the server infrastructure. Today it's different. It's like the cloud is giving Infor a strategic advantage. The flywheel effect that you guys talk about spins off innovation that they can exploit in new ways. So talk about your relationship with Infor and kind of the history of where it's coming from, where it's going. Sure, so Infor is a great partner. We've been a partner for over four years. They're one of our first all-in partners and we have a great working relationship with them. They're sophisticated.
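As an aside on the Glue workflow Rahul describes above (crawl the data, infer its structure, then generate a pipeline from a source table to a destination table), the following toy Python sketch shows the general shape of that idea. It does not use or reproduce the Glue API or its actual inference logic; `infer_schema`, `generate_pipeline`, `run_pipeline`, the `CASTS` table, and the sample records are all hypothetical and exist only for illustration.

```python
import json


def infer_schema(records):
    """Map each field to the Python type name seen for it ('mixed' if types disagree)."""
    schema = {}
    for rec in records:
        for name, value in rec.items():
            seen = type(value).__name__
            if schema.setdefault(name, seen) != seen:
                schema[name] = "mixed"
    return schema


# Casts the toy pipeline knows how to apply, keyed by destination type name.
CASTS = {"int": int, "float": float, "str": str}


def generate_pipeline(source_schema, dest_schema):
    """Build (column, cast) steps for every destination column present in the source."""
    steps = []
    for name, dest_type in dest_schema.items():
        if name in source_schema:
            steps.append((name, CASTS[dest_type]))
    return steps


def run_pipeline(steps, records):
    """Apply the generated steps, keeping only mapped columns and casting their values."""
    return [
        {name: cast(rec[name]) for name, cast in steps if name in rec}
        for rec in records
    ]


# "Crawl" some raw JSON-lines data and infer its structure.
raw = '{"order_id": "1", "total": "19.99", "note": "gift"}\n{"order_id": "2", "total": "5"}'
records = [json.loads(line) for line in raw.splitlines()]
source_schema = infer_schema(records)

# Describe the destination table, then generate and run the pipeline from A to B.
dest_schema = {"order_id": "int", "total": "float"}
steps = generate_pipeline(source_schema, dest_schema)
print(run_pipeline(steps, records))
# [{'order_id': 1, 'total': 19.99}, {'order_id': 2, 'total': 5.0}]
```

The generated `steps` list is plain data, so in the spirit of the conversation it could be inspected, edited, or kept in version control before being run.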
They understand our services well and we collaborate on identifying ways that we can make our services better for their use cases. And what they've been able to do is take all of the years of industry and domain expertise that they've gained over time in their vertical segments and with their customers and bring that to bear by using the components that we provide in the cloud. So all these services that I mentioned, the global footprint, the security capabilities, all of the various compliance certifications that we offer, act as accelerators for what Infor is trying to do. And then they're able to leverage their intellectual property and the relationships and experience they've built up over time to get this global footprint that they can deploy for their customers, that gets better over time. As we add new capabilities, they can build that into the Infor platform and then that rolls out to all of their customers much more quickly than it could before. And they seem to be really driving hard. I have not heard an enterprise software company talk so much about data and how they're exploiting data the way that I've heard Infor talk about it. So data is obviously key. It's the lifeblood. People say it's the new oil. I'm not sure that's the best analogy. I can only put oil in my house or my car. I can't put it in both. Data, I could do so many things with it. So. I suspect that analogy will evolve. I think it should. I'm already thinking about it now. First on theCUBE. You keep going, I'll come up with something. Don't use that anymore. Scratch the oil. Okay, so your perspectives on Infor, its sort of use of data and what Amazon's role is in terms of facilitating that. So what we're providing is a platform, a set of services with powerful building blocks that Infor can then combine into their applications that match the needs of their customers. And so what we're looking to do is give them a broad set of capabilities that they can build into their offering.
So CloudSuite is built entirely on us. And then Infor OS is a shared set of services, and part of that is their data lake, which uses a number of our analytic services underneath. And so what Infor is able to do for their customers is break down data silos within their customer organizations and provide a common way to think about data and machine learning and IoT applications across data in the data lake. And we view our role as really a supporting partner for them in providing a set of capabilities that they can then use to scale and grow and deploy their applications. I want to ask you about security. I've always been comfortable with cloud security. Maybe I'm naive. But compliance is something that's interesting, and something you said before, I think you said cataloging, Glue allows you to essentially keep all the data. And my concern about that is, from a governance perspective, the legal counsel might say, well, I don't want to keep all my data. If it's work in process, I want to get rid of it, or if there's a smoking gun in there, I want to get rid of it as soon as I can. Keep data as long as possible but no longer, to sort of paraphrase Einstein. So what do you say to that? Do you have customers in the legal office that say, hey, we don't want to keep data forever, and how can you help? Yes, just to refine the point on Glue, what Glue does is it gives you essentially a catalog, which is a map of all your data. Whether you choose to keep that data or not keep that data, that's a function of the application. So absolutely we have customers that say, look, here are my data sets, and whether it's for new regulations, or I just don't want this set of data to exist anymore, or this customer is no longer with us, we need to delete that. We provide all of those capabilities.
So our goal is to really give customers the set of features, functionality and compliance certifications they need to express the enterprise security policies that they have and ensure that they're complying with them. And so then if you have data sets that need to be deleted, we provide capabilities to do that. And then the other side of that is you want the audit capabilities, so we actually log every API access in the environment in a service called CloudTrail, and then you can actually verify by going back and looking at CloudTrail that only the things that you wanted to have happen actually did happen. So you seem very relaxed. I have to ask you what life is like at Amazon, because I was at AWS's DC offices and you walk in there and there's this huge, you've seen it, there's this giant graph of the services launched and announced from, I guess it was 2006 when EC2 first came out, till today, and it's just this ridiculous set of services. I mean, the graph is amazing. And so you're moving at this super hyper pace. What's life like at AWS? You know, I've been there almost seven years, I love it. It's been fantastic. I was an entrepreneur and came out of startups before AWS. When I joined I found an environment where you can continue to be entrepreneurial and active on behalf of your customers, but you have the ability to have impact at a global scale. So it's been super fun. The pace is fast but exhilarating. We're working on things we're excited about and we're working on things that we believe matter and make a difference to our customers. So it's been really fun. Well, so you got, I mean, you're right at the heart of what I like to call the innovation sandwich. You've got data, tons of data obviously in the cloud. You're a leader and increasingly becoming sophisticated in machine intelligence. So you've got data, machine intelligence or AI applied to that data, and you've got cloud for scale, cloud for economics, cloud for innovation.
You're able to attract startups. That's probably how you found AWS to be here, right? All the startups, including ours, we want to be on AWS. That's where the developers want to be. And so, again, it's an overused word, but that flywheel of innovation occurs. And that to us is the innovation sandwich. It's not Moore's law anymore. For decades, this industry marched to the cadence of Moore's law. Now it's a much more multi-dimensional matrix, and it's exciting and sometimes scary. Yeah, no, I think you touched on a lot of great points. It's really fun. I mean, I think for us, the core is we want to put things together that the customers want. We want to make them broadly available. We want to partner with our customers to understand what's working and what's not. We want to pass on efficiencies when we can. And then that helps us speed up this cycle of learning. Well, Rahul, I actually was going to say, I think he's so relaxed because he's on theCUBE. Yeah, it could be. Right, that's him. We just like to do that with him. They are fantastic. Thanks for being with us. It's a pleasure. We appreciate the insights and we certainly wish you well with the rest of the show here. Excellent, thank you very much. It was great to be here. You're watching theCUBE. We are live here in Washington, D.C., at Inforum 18.