Hello, everyone. Welcome to theCUBE's presentation of the AWS Startup Showcase, season two, episode two. The theme is data as code, the future of analytics. I'm John Furrier, your host. We have a great lineup for you today: fast-growing startups, great companies, founders, and stories around data as code. And we're going to kick it off here with our opening keynote with Rahul Pathak, VP of Analytics at AWS, CUBE alumni. Rahul, thank you for coming on and being the opening keynote for this awesome event.

John, it's great to see you, and it's great to be part of this event and excited to help showcase some of the great innovation that startups are doing on top of AWS.

Yeah, we last spoke at AWS re:Invent, and a lot's happened there; serverless is the center of the action. But all these startups, Rockset, Dremio, Cribl, Mux, Nexla, Cardo, Ahana, Imply, all doing great stuff. Data as code has a lot of traction. So a lot of momentum still going on in the marketplace, pretty exciting.

No, it's awesome. I mean, I think there's so much innovation happening, and the wonderful part of working with data is that demand for services and products that help customers drive insight from data is just skyrocketing and shows no sign of slowing down. And so it's a great time to be in the data business.

It's interesting to see the theme of the show getting traction, because you start to see data being treated almost like how developers write software: taking things out of branches, working on them, putting them back in, machine learning getting iterated on. You're seeing more models being trained differently with better insights, actionable ones, all kind of working like code. And this is a whole other way people are reinventing their businesses. This has been a big, huge wave. What's your reaction to that?

I think it's spot on.
I mean, I think the idea of data as code, and bringing some of the repeatability of processes from software development into how people build data applications, is absolutely fundamental. And especially so in machine learning, where you need to think about the explainability of a model and what version of the world it was trained on; when you build a better model, you need to be able to explain and reproduce it. So I think your insight is spot on, and these ideas are showing up in all stages of the data workflow, from ingestion to analytics to ML.

The next wave is about modernization and going to the next level with cloud scale. Thank you so much for coming on and being the keynote presenter here for this great event. I'll let you take it away. Reinventing businesses with AWS analytics, Rahul, take it away.

Okay, perfect. Well, folks, we're going to talk about reinventing your business with data. And if you think about it, the first wave of reinvention was really driven by the cloud, as customers were able to really transform how they thought about technology, and that's well under way. Although if you stop and think about it, I think we're only about 5% to 10% of the way done in terms of IT spend being on the cloud. So lots of work to do there. But we're seeing another wave of reinvention, which is companies reinventing their businesses with data, really using data to transform what they're doing, to look for new opportunities and ways to operate more efficiently. And I think the past couple of years of the pandemic have really only accelerated that trend. And so what we're seeing is, it's really about the survival of the most informed; folks with the best data are able to react more quickly to what's happening. We've seen customers being able to scale up if they were in the delivery business, or scale down if they were in the travel business at the beginning of all of this, and then using data to be able to find new opportunities and new ways to serve customers.
And so it's really foundational, and we're seeing this across the board. So it's great to see the innovation that's happening to help customers make sense of all of this. Our customers are really looking at ways to put data to work. It's about making better decisions, finding new efficiencies, and really finding new opportunities to succeed and scale.

When it comes to good examples of this, FINRA is a great one. You may not have heard of them, but they're the U.S. equities regulator; they keep track of all trading that happens in equities. They look at about 250 billion records per day. They use Amazon EMR, which is our Spark and Hadoop service, and they're processing 20 terabytes of data running across tens of thousands of nodes, looking for fraud and bad actors in the market. So, a huge transformation journey for FINRA over the years, a customer I've gotten to work with personally since really 2013 onwards. It's been amazing to see their journey.

Pinterest, another great customer I'm sure everyone's familiar with, is about visual search and discovery and commerce. They're able to scale their daily log searches by really a factor of 3X or more and drive down their costs, using the Amazon OpenSearch Service.

And really what we're trying to do at AWS is give our customers the most comprehensive set of services for the end-to-end journey of data, from ingestion to analytics to machine learning. We want to provide a comprehensive set of capabilities for ingestion, cataloging, analytics, and then machine learning. And all of these are things that our partners and the startups that are running on us have available to them to build on as they build and deliver value for their customers. The way we think about this is we want customers to be able to modernize what they're doing and their infrastructure, and we provide services for that.
It's about unifying data wherever it lives, connecting it so that customers can build a complete picture of their customers and business. And then it's about innovation, really using machine learning to bring all of this unified data to bear on driving new innovation and new opportunities for customers. What we're trying to do at AWS is really provide a scalable and secure cloud platform that customers and partners can build on.

Unifying is about connecting data, and it's also about providing well-governed access to data. So one of the big trends that we see is customers looking for the ability to make self-service data available to their end users. And the key to that is good foundational governance. Once you can define good access controls, you're then more comfortable setting data free. The other part of it is that data lakes play a huge role, because you need to be able to think about structured and unstructured data. In fact, about 80% of the data being generated today is unstructured. And you want to be able to connect data that's in data lakes with data that's in purpose-built data stores, whether that's databases on AWS, databases outside, SaaS products, as well as things like data warehouses and machine learning systems; really, connecting data is key.

And then innovation: how can we bring to bear new technologies like AI and machine learning and reimagine all processes with them? AI is also key to unlocking a lot of the value that's in unstructured data. If you can figure out what's in an image, or the sentiment of audio, and do that in real time, that lets you personalize and dynamically tailor experiences, all of which are super important to getting an edge in the modern marketplace. And so at AWS, we really think about connecting the dots across sources of data, allowing customers to use data lakes, databases, analytics, and machine learning.
We want to provide a common catalog and governance, and then use these to help drive new experiences for customers in their apps and their devices. And then this, in an ideal world, creates a closed loop: you create your experience, you observe how customers interact with it, that generates more data, which becomes a better source that feeds back into the system.

On AWS, thinking about a modern data strategy, at the core are data lakes built on S3, and I'll talk more about that in a second. Then you've got services like Athena and Glue and Lake Formation for managing that data, cataloging it, and querying it in place. And then you have the ability to use the right tool for the right job. We're big believers in purpose-built services for data, because that's the way you can avoid compromising on performance, functionality, or scale. And then, as I mentioned, unification and interconnecting all of that data. So if you need to move data between these systems, there are well-trodden pathways that allow you to do that, and features built into the services that enable it.

Some of the core ideas that guide the work that we do: scalable data lakes are key, and this is really about providing arbitrarily scalable, high-throughput systems. It's about open-format data for future-proofing. Then we talk about purpose-built systems for the best possible functionality, performance, and cost. And then, from a serverless perspective, this has been another big trend for us; we announced a bunch of serverless services at re:Invent. The goal here is to really take away the need to manage infrastructure from customers so they can really focus on driving differentiated business value. Integrated governance, and then machine learning, not just as an end product for data scientists, but also machine learning built into data warehouses, visualization, and databases. And so, with scalable data lakes, S3 is really the foundation for this.
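The "querying it in place" idea mentioned above, Athena over a Glue-cataloged data lake on S3, can be sketched as follows. This is a minimal illustration: the database name, table, and results bucket are invented, and the helper only builds the keyword arguments that would be passed to Athena's `start_query_execution` call, so no AWS credentials are needed to follow along.

```python
# Sketch: querying a data lake in place with Athena. The Glue database
# "sales_lake", the "orders" table, and the results bucket are all
# hypothetical names for illustration.

def athena_query_request(sql: str, database: str, output_s3: str) -> dict:
    """Build the kwargs for athena.start_query_execution()."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_s3},
    }

request = athena_query_request(
    "SELECT region, SUM(amount) FROM orders GROUP BY region",
    database="sales_lake",
    output_s3="s3://my-results-bucket/athena/",
)
# With credentials configured, you would then run:
#   boto3.client("athena").start_query_execution(**request)
```

The point of the shape is that the data never moves: the query runs against files already sitting in S3, described by the Glue catalog.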
One of our original services at AWS, really the backbone of so much of what we do. Really unmatched durability, availability, and scale. A huge portfolio of analytics services, both that we offer and that our partners and customers offer. And really arbitrary scale: we've got individual customers in S3 that are in the exabyte range, many in the hundreds of petabytes, and that's just growing. As I mentioned, we see roughly a 10X increase in data volume every five years. So that's exponentially increasing data volumes.

From a purpose-built perspective, it's the right tool for the right job: Redshift for data warehousing, Athena for querying all your data, EMR as our managed Spark and Hadoop, OpenSearch for log analytics and search, and then Kinesis and MSK for Kafka and streaming. And that's been another big trend: real-time data has been exploding, and customers wanting to make sense of that data in real time is another big deal.

Some examples of how we're able to achieve differentiated performance in purpose-built systems. With Redshift, using Redshift-managed storage and its latest instance types, up to 3X better price performance than what's out there, available to all our customers and partners. In EMR, with things like Spark, we're able to deliver 2X the performance of open source with 100% compatibility, and almost 3X in Presto. With Graviton2, our new silicon chip on AWS, better price performance: about 10% to 12% better price performance and 20% lower costs, all compatible with open source. So drop your jobs in, they run faster and cheaper, and that translates to customer benefits or better margins for partners.

From a serverless perspective, this is about simplifying operations, reducing total cost of ownership, and freeing customers from the need to think about capacity management. At re:Invent, we announced serverless Redshift, serverless EMR, and serverless Kinesis and Kafka.
And these are all game changers for customers in terms of freeing our customers and partners from having to think about infrastructure and allow them to focus on data. And when it comes to serverless options and analytics, we've really got a very full and complete set. So whether that's around data warehousing, big data processing streaming or cataloging or governance or visualization, we want all of our customers to have an option to run something serverless, as well as if they have specialized needs, instances are available as well. And so really providing a comprehensive deployment model based on the customer's use cases. From a governance perspective, lake formation is about easy build and management of data lakes. And this is what enables data sharing and self-service. And you get very granular access controls. So rural security, simple data sharing, you can tag data. So you can tag a group of analysts in the EU and you can say those only have access to EU data that's been tagged with EU tags. And it allows you to very scaleably provide different secure views onto the same data without having to make multiple copies. Another big win for customers and partners. Support transactions on data lakes or updates and deletes and time travel. John talked about data as code and with time travel you can look at querying on different versions of data. So that's a big enabler for those types of strategies. And with Blue, you're able to connect data in multiple places. So whether that's accessing data on premises in other SaaS providers or the clouds, as well as data that's on AWS and all of this is serverless and interconnected. And really it's about plugging all of your data into the AWS ecosystem and into our partner ecosystem. So these APIs all available for integration as well. And then from an ML perspective, what we're really trying to do is bring machine learning closer to data. 
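The EU-analysts example above maps onto Lake Formation's tag-based access control. As a sketch, here is what the grant might look like as the keyword arguments for `lakeformation.grant_permissions`; the role ARN, tag key, and tag value are illustrative assumptions, and only the request payload is built so the snippet runs without AWS access.

```python
# Sketch of tag-based access control: grant SELECT on any table tagged
# region=EU to an EU-analysts role. The ARN and tag names are made up.

def lf_tag_grant(principal_arn: str, tag_key: str, tag_values: list) -> dict:
    """Build kwargs for lakeformation.grant_permissions() scoped by LF-tag."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "LFTagPolicy": {
                "ResourceType": "TABLE",
                "Expression": [{"TagKey": tag_key, "TagValues": tag_values}],
            }
        },
        "Permissions": ["SELECT"],
    }

grant = lf_tag_grant(
    "arn:aws:iam::123456789012:role/eu-analysts", "region", ["EU"]
)
```

The design point is the one Rahul makes: the policy is attached to tags rather than to individual tables, so newly tagged data is governed automatically, with no copies made.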
And so, with our databases and warehouses and lakes and BI tools, we've infused machine learning throughout, powered by the state-of-the-art machine learning that we offer through SageMaker. So you've got ML in Aurora and in Neptune for graphs. You can train machine learning models in SQL directly from Redshift and then use them for inference. And QuickSight has built-in forecasting and built-in natural language querying, all powered by machine learning, same with anomaly detection. And here the idea is: how can we help our systems get smarter at surfacing the right insights for our customers, so that they don't have to always rely on smart people asking the right questions? Really, it's about bringing data back together and making it available for innovation. And thank you very much; I appreciate your attention.

Okay, well done. Reinventing the business with AWS analytics, Rahul, that was great. Thanks for walking through that; that was awesome. I have to ask you some questions on the end-to-end view of the data. That seems to be a theme: serverless in there, ML integration. But then you also mentioned picking the right tool for the job. So you've got all these things moving. Simplify it for me right now. From a business standpoint, how do they modernize? What are the steps the clients are taking with analytics? What's the best practice? What's the high-order bit here?

So the basic high-order bit is: historically, legacy systems are rigid and inflexible, and they weren't really designed for the scale of modern data or the variety of it. And so what customers are finding is, as they move to the cloud, they're moving from legacy systems with punitive licensing into more flexible, modern systems. And that allows them to really think about building decoupled, scalable, future-proof architectures. And so you've got the ability to combine data lakes and databases and data warehouses and connect them using common APIs and common data protection.
And that sets you up to deal with arbitrary scale and arbitrary data types, and it allows you to evolve as the future changes. It makes it easy to add in a new type of engine as we invent a better one a few years from now. And then, once you've got your data in the cloud and interconnected in this way, you can now build complete pictures of what's going on. You can understand all your touch points with customers. You can understand your complete supply chain. And once you can build that complete picture of your business, you can start to use analytics and machine learning to find new opportunities. So think about modernizing, moving to the cloud, setting up for the future by connecting data end to end, and then figuring out how to use that to your advantage.

You know, you mentioned a modern data strategy gives the best of both worlds. And you mentioned briefly, and I want to get a little bit more insight from you on this, open formats. One of the themes that's come out of some of the interviews with the companies we're going to be hearing from today is open source, the role open is playing. How do you see that integrating in? Because again, this is just like software, right? Open source software, open source data; it seems to be a trend. What does open look like to you? How do you see that progressing?

It's a great question. Open operates on multiple dimensions, John, as you point out. There are open data formats, things like JSON and Parquet for analytics. This allows multiple engines to interoperate on data, and it creates option value for customers. If you've got your data in an open format, you can use it with multiple technologies, and it'll be future-proofed; you don't have to migrate your data if you're thinking about using a different technology. So that's one piece. Then open source software is also really a big enabler for innovation and for customers.
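The open-format point above can be seen in miniature with nothing but the standard library: records written as JSON Lines can be read back by any engine or language, with no proprietary reader required. The records here are invented for the example; the same argument applies to Parquet with columnar tooling.

```python
# Open formats in miniature: any tool can parse these bytes back,
# so the data outlives any one engine. Pure standard library.
import io
import json

records = [
    {"user": "a", "clicks": 3},
    {"user": "b", "clicks": 7},
]

# Write as JSON Lines: one open-format record per line.
buf = io.StringIO()
for rec in records:
    buf.write(json.dumps(rec) + "\n")

# A completely independent consumer can recover the same records.
parsed = [json.loads(line) for line in buf.getvalue().splitlines()]
assert parsed == records

total_clicks = sum(r["clicks"] for r in parsed)
```

That round trip, write once in an open format, read anywhere, is the "option value" Rahul describes: switching engines means pointing a new reader at the same bytes, not migrating data.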
You've got things like Spark and Presto, which are popular, and I know some of the startups that we're talking about as part of the showcase use these technologies. This allows the whole world to contribute to innovating in these engines and moving them forward together, and we're big believers in that. We've got open source services, we contribute to open source, and we support open source projects; that's another big part of what we do. And then there are open APIs, things like SQL or Python, common ways of interacting with data that are broadly adopted. This, again, creates standardization and makes it easier for customers to interoperate and be flexible. And so open is really present all the way through, and it's a big part, I think, of the present and the future.

Yeah, it's going to be fun to watch and see how that grows. There seems to be a lot of traction there. I want to ask you about the other comment I thought was cool. You had the architectural slides out there. One was data lakes built on S3, and you had Athena, Glue, and Lake Formation around S3, and then you had the constellation of Kinesis, SageMaker, and other things around it, and you said, pick the right tool for the right job. And then you had the other slide with analytics at the center, and you had Redshift and all the other services around it, around serverless. So one was more about the data lake with Athena, Glue, and Lake Formation; the other one was about serverless. Explain that a little bit more for me, because I'm trying to understand where that fits. I get the data lake piece, okay? Athena, Glue, and Lake Formation enable it, and then you can pick and choose what you need. On the serverless side, what does analytics at the center mean?

So the idea there is that we really wanted to talk about the fact that if you zoom into the analytics use case, everything that we offer within analytics has a serverless option for our customers.
And so if you look at the bucket of analytics across things like Redshift or EMR or Athena or Glue and Lake Formation, you have the option to use instances or containers, but also to just not worry about infrastructure and think declaratively about the data that you want.

So basically, analytics is going serverless everywhere. Talk about volumes. You mentioned 10X volumes. What other stats can you share in terms of volumes? What are people seeing? Velocity, obviously; data warehouses couldn't move as fast as what we're seeing in the cloud with some of your customers and how they're using data. How do volume and velocity play out? Do you have any other insights into those numbers?

Yeah, I mean, from a stats perspective, take Redshift, for example: customers are reading and writing multiple exabytes of data a day across Redshift. And one of the things that we've seen as time has progressed is that data volumes have gone up and data types have exploded, and you've seen data warehouses get more flexible. So we've added things like the ability to put semi-structured data and arbitrarily nested data into Redshift. We've also seen the seamless integration of data warehouses and data lakes. So Redshift was one of the first to enable straightforward querying of data that's sitting locally on drives as well as data that's managed on S3. And those trends will continue. I think you'll continue to see this need to query data wherever it lives and to allow lakes and warehouses and purpose-built stores to interconnect.

You know, one of the things I liked about your presentation was it kind of had the theme of modernize, unify, innovate. And we've been covering a lot of companies that have been, I won't say stumbling, but, like, getting to the future. Some go faster than others, but they all kind of get stuck in an area that seems to be the same spot.
It's the silos: breaking down the silos, getting to the data lakes, and kind of blending in that purpose-built data store. And they get stuck there because they're so used to silos and their teams. And that's kind of holding back the machine learning side of it, because machine learning can't do its job if it doesn't have access to all the data. And that's where we're seeing machine learning kind of becoming this new iterative model, where the models are coming in faster. So the silo busting is an issue. What's your take on this part of the equation?

So there are a few things at play. You're absolutely right; that transition from siloed data to interconnected data isn't always straightforward. And it operates on a number of levels. You want to have the right technology. So, you know, we enable things like queries that can span multiple stores. You want to have good governance that can connect across multiple stores. And then you need to be able to get data in and out of these things, and Glue plays there as well. So there's that interconnection on the technical side. But the other piece is organizational: how do you organize? How do you define who owns data and when they share it? What are the SLAs for enabling that sharing? Think through some of the processes that need to get put in place, and create the right incentives within your company to enable that data sharing.

And then the foundational piece is good guardrails. You know, it can be scary to open data up, and the key is to put good governance in place so that you can ensure that data can be shared and distributed while remaining protected and adhering to the privacy, compliance, and security regulations that you have for it. And once you can assert that level of protection, then you can set that data free, and that's when customers really start to see the benefits of connecting all of it together.
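The guardrails idea above, define the access rules first, then "set the data free" through a layer that enforces them, can be sketched in a few lines. The principals, tags, and rules here are invented for illustration; they stand in for what a governance service like Lake Formation does at scale.

```python
# Toy guardrails: rows are tagged by region, principals are granted
# a set of region tags, and every read passes through one filter.
# All names and rules are hypothetical.

RULES = {
    "eu-analysts": {"EU"},
    "global-admins": {"EU", "US"},
}

def visible_rows(rows: list, principal: str) -> list:
    """Return only the rows whose region tag the principal may see."""
    allowed = RULES.get(principal, set())
    return [r for r in rows if r["region"] in allowed]

rows = [
    {"id": 1, "region": "EU"},
    {"id": 2, "region": "US"},
    {"id": 3, "region": "EU"},
]

eu_view = visible_rows(rows, "eu-analysts")       # only EU-tagged rows
admin_view = visible_rows(rows, "global-admins")  # all rows
```

Because every consumer sees the data only through the filter, the same underlying rows can serve many differently scoped views without making copies, which is exactly the self-service-with-governance pattern being described.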
Right. And then we have a batch of startups here in this episode that are doing a lot of different things. Some have, you know, new lakes forming, observability lakes. You have SQL innovation on the front end, data tiering, innovation on the data-tiering side. Just a ton of innovation around this new data as code. How do you see it, as an executive at AWS? You're enabling all this. Where is the action going? Where are the white spaces? Where are the opportunities as this architecture continues to grow and get traction, given the relevance of machine learning and AI, and with apps embedding data in there now as code? Where are the opportunities for these startups, and how can they continue to grow?

Yeah, I mean, the opportunity is amazing, John. You know, we talked a little bit about this at the beginning, but there is no slowdown in sight for the volume of data that we're generating; pretty much everything that we have, whether it's a watch or a phone or the systems that we interact with, is generating data. And, you know, we talk a lot about the things that will stay the same over time. Data volumes will continue to go up. Customers are going to want to keep analyzing that data to make sense of it. They're going to want to be able to do it faster and more cheaply than they could yesterday. They're going to want to be able to make decisions and innovate in a shorter cycle and run more experiments than they were able to before. And they're always going to want this data to be secure and well protected. And so I think as long as we, and the startups that we work with, continue to push on making these things better, can I deal with more data, can I deal with it more cheaply, can I make it easier to get insight, and can I maintain a super high bar on security?
Investments in these areas will just pay off, because the demand side of this equation is in a great place, given what we're seeing in terms of data and the appetite for it.

I also loved your comment about ML integration being the last leg of the journey here. You've got that enablement of the AI piece, which solves a lot of problems. People can see benefits from good machine learning and AI; it's creating opportunities. And you also mentioned the end-to-end with the security piece. So data and security are kind of going hand in hand these days, not just the governance and the compliance stuff; we're talking about security. So machine learning integration kind of connects all of this. What does that all mean for customers?

For customers, it means that with machine learning, and really enabling themselves to use machine learning to make sense of data, they're able to find patterns that can represent new opportunities quicker than ever before. And they're able to do it dynamically. So in a prior version of the world, we'd have rule-based systems; they would be relatively rigid, and we'd have to improve them by hand. With machine learning, this can be dynamic and near real time, and you can customize. And so that just represents an opportunity to deepen relationships with customers, create more value, and find more efficiency in how businesses are run. So that piece is there. And your ideas around data as code really come into play, because machine learning needs to be repeatable and explainable, and that means versioning, keeping track of everything that you've done from a code and data and training perspective. And datasets are updating machine learning: you've got datasets growing, and they become code modules that can be reused and interrogated.

Security is a big theme. Data is really important in security.
We're seeing that as one of our top use cases, certainly now, in this day and age, with a lot of breaches and hacks coming in and being defended. It brings up the open piece, it brings up data as code. Security is a good proxy for kind of where this is going. What's your take? What's your reaction to that?

So on security, we can never invest enough, and, you know, the guidance at AWS is: security, availability, durability, jobs one, two, and three. And it operates at multiple levels. You need to protect data at rest with encryption and good key management and good practices there. You need to protect data on the wire. You need to have a good sense of what data is allowed to be seen by whom, and then you need to keep track of who did what and be able to verify, and come back and prove, that only the things that were allowed to happen actually happened. And you can then use machine learning on top of all of this apparatus to ask: can I detect things that are happening that shouldn't be happening, in near real time, and put a stop to them? So I don't think any of us can ever invest enough in securing and protecting our data and our systems. It's really fundamental to earning customer trust, and it's just good business. So I think it's absolutely crucial; we think about it all the time and are always looking for ways to raise the bar.

Well, I really appreciate you taking the time. Give the keynote final word here for the folks watching. A lot of these startups that are presenting are doing well business-wise; they're being used by large enterprises, and people are buying their products and using their services. For customers who are implementing more and more of these hot startups' products, they're relevant. What's your advice to the customer out there as they go on this journey, this new data as code, this new future of analytics? What's your recommendation?
So for customers who are out there, I recommend you take a look at what the startups on AWS are building. I think there's tremendous innovation and energy, and there's really great technology being built on top of a rock-solid platform. And so I encourage customers thinking about it to lean forward, to think about new technology, and to embrace a move to the cloud so they can modernize, build a single picture of their data, and figure out how to innovate and win.

Rahul, thanks for coming on. Appreciate your keynote. Thanks for the insight and thanks for the conversation. Let's hand it off to the show. Let the show begin.

Thank you, John. It's a pleasure as always. Great to see you. Thank you.
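As a closing aside, Rahul's security point about keeping track of who did what, and being able to prove that only permitted things happened, can be illustrated with a toy tamper-evident audit log. Each entry is chained to the previous one with an HMAC, so altering any past entry is detectable. The key and events are invented for the sketch; a real deployment would rely on managed audit services rather than hand-rolled logs.

```python
# Toy tamper-evident audit log: each entry's tag covers the previous
# tag plus the new event, so history cannot be silently rewritten.
import hashlib
import hmac

KEY = b"demo-secret-key"  # hypothetical key, for illustration only

def append(log: list, event: str) -> None:
    """Append (event, tag) where tag chains to the previous entry."""
    prev = log[-1][1] if log else b""
    tag = hmac.new(KEY, prev + event.encode(), hashlib.sha256).digest()
    log.append((event, tag))

def verify(log: list) -> bool:
    """Recompute the chain and confirm every stored tag matches."""
    prev = b""
    for event, tag in log:
        expected = hmac.new(KEY, prev + event.encode(), hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            return False
        prev = tag
    return True

log = []
append(log, "alice: read table sales")
append(log, "bob: grant select to carol")
ok_before = verify(log)

# Tamper with history: rewrite the first event but keep its old tag.
log[0] = ("alice: read table hr", log[0][1])
ok_after = verify(log)
```

`ok_before` holds before tampering and `ok_after` fails afterward, which is the property the transcript describes: you can come back later and prove whether the recorded history is intact.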