Okay, welcome back everyone. We're here live in San Francisco for the Amazon Web Services Summit. This is the smaller event compared to re:Invent, the big conference in Vegas, which we also broadcast live. I'm John Furrier, the founder of SiliconANGLE. This is theCUBE, our flagship program where we go out to the events and extract the signal from the noise. And an Amazon show would not be complete without talking to the Amazon guys directly about what's going on under the hood. Our next guests are Adi Krishnan and Ryan Waite, who run the Kinesis teams. Guys, welcome to theCUBE. So Dave Vellante, who's not here, unfortunately has another commitment, but he and I were going gaga over Kinesis. We love Redshift, we love going with the data. I see Glacier as a really low-cost option to store stuff, but when you start adding on Redshift and Kinesis, you're adding new features that really point to where the market's going, which is: I need to deal with real-time stuff, I need to deal with a lot of data, I need to manage it effectively, I need low latency across any use case. Okay, so how the hell did you come up with Kinesis? Give us the insight into how it all came together. We love the real-time. We love how it's all kind of closing the loop, if you will, for a developer. Just take us through how it came about. What are some of the stats now, post-re:Invent? Share with us. Well, the genesis for Kinesis was trying to solve our metering problem. The metering problem inside of AWS is: how do we keep track of how our customers are using our products? Every time a customer does a read out of DynamoDB, or they read a file out of S3, or they do some sort of transaction with any of our products, that generates a metering record. It's tens of millions of records per second and tens of terabytes per hour. So it's a big workload.
What we were trying to do is understand how to transition from a batch-oriented processing workload, where we were using large Hadoop clusters to process all that data, to a continuous processing model where we could read all of that data in real time and make decisions on it in real time. So you basically created an aspirin for yourself. It cured a little pain point internally, right? Yeah, Kinesis is kind of an example of us building a product to solve some of our own problems first and then making that available to the public. Okay, so you guys do your Amazon thing, and I've gotten to know a little bit about the culture there. You guys kind of break stuff, to quote Zuckerberg. You guys kind of invented that philosophy, you know, move quickly, iterate fast. So you solved your own problem. And then was there an aha moment, like, damn, this is good, we could bring it out to the market? Were customers asking for it at the same time? Was it kind of a known use case? How did you bring it to the market? What happened next? We spent a lot of time talking to a lot of customers. I think that was kind of the gist of it. We had customers from all sorts of industry verticals, financial services, consumer online services, manufacturing, digital ad tech, come up to us and say, we have this canonical workflow. This workflow is about getting data off of all of these producers, the sources of data, figuring out a way to aggregate that data, and then driving it through a variety of different processing systems to ultimately light up different data stores. These data stores could be native AWS stores like S3 and DynamoDB, or they could be more interesting, higher-end data warehousing services. But the key thing was: how do we deal with this massive amount of data that's being produced in real time, ingest it reliably, scale it elastically, and enable continuous processing of the data?
You know, we always love the word elastic. It's a term that you guys have built your business around; being elastic means you have a lot of flexibility, and that's a key part of being agile. But I want you guys, while we're here on theCUBE, to define Kinesis for the folks out there. What the hell is it? You define it for the record, and then I have some specific questions I want answered. So Kinesis is a new service for processing huge amounts of streaming data in real time. Short and simple. It scales elastically, so as your data volume increases or decreases, the service grows with you. So like a Node.js error log or iPhone data, would this be an example of streaming? Yeah, exactly. You can imagine that you're tailing a whole bunch of logs coming off of servers. You could also be watching event streams coming out of little Internet of Things type devices. One of the customers we're talking about here is Supercell, who's capturing in-game data from their game Clash of Clans. So as you're playing Clash of Clans, as you're tapping on the screen, all of that data is captured in Kinesis and being processed by Supercell. And this is varied data. I mean, obviously, you mentioned some of these, from Internet of Things, which is a sensor network, to wearable computers, to whatever, mobile phones, obviously event data coming off machines. So you've got machine data, you've got human data, you've got application data. That's kind of the data set we're seeing with Kinesis, right, a diverse set. Also traction with trends like Spark out of Berkeley; you're seeing in-memory. Is this in your wheelhouse? How does that all relate? Because you guys have purpose-built SSDs now in your new EC2 instances and all this new modern gear we heard about in the announcements. How does all the in-memory stuff affect the Kinesis service? It's a great question.
What you can imagine is Kinesis being a great service for capturing all of that data that's being generated by hundreds of thousands or millions of sources. It gets sent to Kinesis, where we replicate it across three different availability zones. That data is then made available for applications to process. The applications processing that data could be Hadoop clusters, they could be your own Kinesis applications, or they could be a Spark cluster. And so writing Spark applications that process that data in real time is a great use case. The in-memory capabilities of Spark are probably ideal for processing data that's stored in Kinesis. Okay, so let's talk about connecting the dots. What other services are you seeing Kinesis work in conjunction with that are being adopted most right now? I mentioned Redshift, I'm just throwing that in there. Obviously a data warehousing-like tool, seeing a lot of business intelligence. So basically people are playing with data, a lot of different needs for the data. So how does Kinesis connect through the stack? I think the number one use case we see is customers capturing all of this data and then archiving all of it right away to S3. It's just been difficult to capture everything. And even if you did, you could probably only keep it for a little while and then you had to get rid of it. But with the prices for S3 being so low and Kinesis making it so easy to capture tiny writes, these little tails of log data coming out of your servers or little bits of data coming off mobile devices, you can capture all of that, aggregate it, and put it in S3. That's the number one use case we see. As customers become more sophisticated with using Kinesis, they then begin to run real-time dashboards on top of Kinesis data. So you could push all the data into DynamoDB, or you could push all that data into something like Redshift and run analytics on top of that.
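The producer side described here, many sources writing small records with a partition key while the service replicates each record across availability zones, can be sketched in Python. This is a minimal illustration, not code from the interview: the stream name, event shape, and the boto3-style client are all assumptions.

```python
import json

def make_kinesis_record(event, key_field="user_id"):
    """Serialize an event and derive a partition key; records that share
    a partition key land on the same shard, preserving their order."""
    return {
        "Data": json.dumps(event, sort_keys=True).encode("utf-8"),
        "PartitionKey": str(event[key_field]),
    }

def put_event(client, stream_name, event):
    """Write one event. Kinesis replicates the record across three
    availability zones before PutRecord returns a sequence number."""
    # client would be boto3.client("kinesis") in a real producer.
    return client.put_record(StreamName=stream_name,
                             **make_kinesis_record(event))
```

Downstream, an archiver, a dashboard, or a Spark job can then read those same records independently.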
The final case is people doing real-time decision-making based on Kinesis. So once you've got all this data coming in, you're putting it into DynamoDB or Redshift or EMR, you then process it and start making decisions, automated decisions that take advantage of the real-time nature of it. So essentially you're taking us down the life cycle, kind of like crawl, walk, run, right? They start small, they store the data, usually probably a developer problem, just inefficiencies, log file management. It's a disaster, we know. It's a pain in the butt for a developer. So step one is solve that pain, triage that. Next step is, okay, I'm dashboarding, I'm starting to learn about the data. And then three is more advanced, real-time decision-making. Like, now that I've got the data coming in in real time, I'm going to act on it. Yeah, so the reason I want to bring this up, and this is more of a theoretical, kind of orthogonal conversation, is that we like, at SiliconANGLE, to point out what's new in the market and why it's important. And what you guys are doing with data really points to a new developer paradigm. I want to get your comments on this. No one's really come out yet and said, here's a development kit or a development environment for data. You're seeing companies like Factual doing some amazing stuff, I don't know if you know those guys. I just met with New Relic; they're launching kind of a data play off the application. So you're seeing what you guys are doing. You can imagine that now the developer framework is, hey, I have to deal with data as a resource constraint. We haven't seen anything yet, so I want to get your thoughts. Do you see things happening in that direction? How will data be presented to developers? Will it be abstracted away? Will there be development environments? Is it a matter of just organizing the data?
What's your vision around that? So that's a really good question, because we've got customers that come up to us and say, I want to meld real-time data with batch processing, or, right now I have lots of little data and I want to aggregate it to make sense of it over a longer period of time. And there's a lot of theory around how data should be modeled, how data should be represented. But the way we are taking the evolutionary step is really learning from our customers. Customers come up and say, we need the ability to capture data quickly, but then what I want to do is apply my existing Hadoop stack and tools to my data, because I know and understand those. Our response to that customer demand was the EMR connector. So now customers can use, say, Hive queries or Cascading scripts and apply them to the real-time data that Kinesis is ingesting. Another response was for customers who really like the stream-processing constructs of Storm. Our step there was to say, okay, we shipped the Kinesis Storm spout. So now customers can bring their choice of paradigm and meld that with Kinesis. So I think the short answer right now is that we keep listening to our customers. It's early. It's early. It's really early, right? I would also add, just as with Hadoop, there are so many different ways to process data. In the real-time space, there are going to be so many different ways that people process that data. There's never going to be a single tool that you use for processing real-time data. It's a lot of tools. It adapts to the way that people think about data. So this also brings us back to the DevOps culture, which you guys essentially founded at Amazon in the early days. And I've got to give you credit for that. You guys deserve it. DevOps was really about building from the ground up in the cloud, post-dot-com bubble, really, if you think about it. That's Amazon's story; you've lived it yourselves, right?
To survive with less and help other developers. But that brings up a good point, right? So, okay, if data is early and advancing slowly, can there be a single architecture for dealing with data? Or is it going to be specialized systems? You're seeing Oracle, which has made some products with engineered systems. You're seeing integrated stacks working. So what's the take on the data equation? Not just Hadoop, there's other data out there, the Internet of Things data. What is the future architecture right now? I think what we're going to see is a set of patterns that begin to evolve, and people will be using those patterns for doing particular types of processing. One of the other teams that I run at AWS is the fraud detection team. We use a set of machine learning algorithms to continuously monitor usage of the cloud and identify patterns of behavior which are indicative of fraud. That kind of pattern of use is very different from, say, clickstream analysis, right? And the kind of pattern that we use for doing that would naturally be different. I think we're going to see a canonical set of patterns. I don't know if we're going to see a very particular set of technologies. Yeah, so that brings us back to the DevOps thing. So, Adi, I want to get your take on this, because DevOps is really about efficiencies. Software guys don't want to be hardware guys. At the end of the day, that's how it all started. I don't want to provision the network, I don't want to stack the servers, I just want to push code. And you guys have created some really easy ways to make that completely transparent. But now you're talking about composite application development. You're saying, hey, I'm going to have EMR over here for my Hadoop cluster, and I'm going to deal with some fraud detection stream data that's going to be a different system than Hadoop, or maybe a relational database. Now I need to basically composite-build it out.
That's what we're talking about here, composite infrastructure, right? Is that kind of the new DevOps 2.0? I mean, what I'm trying to tease out here is, what's next after DevOps? DevOps really means there's no operations. And how does a developer deal with these kinds of complex environments, like fraud detection, maybe an application here, a container for this, a PaaS? Is it going to be fully composite? Well, I don't know if we've run the full circuit with the DevOps development model. It's a great model. It's worked really well for a number of startups. However, making it easy to plug different components together, I think, is just a great idea. So, as Adi mentioned just a moment ago, our ability to take data in Kinesis and pump that right into Elastic MapReduce is great, and it makes it easy for people to use their existing applications with a new system like Kinesis. That kind of composing of applications has worked well for a long time, and I think you're just going to see us continuing to do more and more of that kind of work. So I've got to ask both of you guys a question. Give me an example of when something broke internally. This is not a dig, I don't want to go negative here, but part of your culture is to move fast and iterate. So with these important products like Kinesis, give me an example of a helpful moment when you guys stumbled. What did you learn? What were the key pain points of the evolution of getting it out the door? And what key things did you learn from either a success or a speed bump or failure along the way? Well, I think one of the first things we learned came right after we'd shipped Kinesis, when we were still in a limited preview and trying it out with our customers, getting feedback and learning what they wanted to change in the product.
One of the first things we learned was that the amount of time it took to put data into Kinesis and receive a return code was too high for a lot of our customers. It was probably around a hundred milliseconds from the time that you put the data in to the time that we'd replicated that data across multiple availability zones and returned success to the client. That was a moment for us to really think about what it meant to enable people to push tons of data into Kinesis, and we went back- That's still not bad, a hundred milliseconds is... That's low. No, it wasn't too bad. Right away, we went back and redoubled our efforts, and we came back with somewhere between 30 and 40 milliseconds, depending on your network connectivity. Hey, in the old days, that was the spinning disk of the 10, 20 meg hard disk on a PC. That's right, that's right. Getting those Lotus files out, or accessing those Windows files. So you guys improved performance. That's one example of what you've done. What's the biggest surprise that you've seen from a customer use case? Something that was kind of like, wow, this is really something we didn't see happening on a larger scale, that caught me by surprise. A surprising use case. It could be a corner use case, like, wow, I never figured that. You know, I would say one thing that actually surprised us was how common it is for people to have multiple applications reading out of the same stream. Again, the basic use case for so many customers is, I'm going to take all this data and I'm just going to throw it into S3. And we kind of envisioned that there might be a couple of different applications reading data out of that stream. We have a couple of customers that actually have as many as three applications reading the stream of events coming out of Kinesis. Each one of them is reading from a different position in the stream. They're able to read from different locations and process that data differently.
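The multiple-readers pattern described here works because each consuming application tracks its own position (checkpoint) in the stream and requests its own shard iterator. A minimal sketch, assuming the standard GetShardIterator request shape; the stream name, shard ID, and checkpoint value below are purely illustrative:

```python
def shard_iterator_args(stream_name, shard_id, checkpoint=None):
    """Build a GetShardIterator request for one consuming application.
    With no checkpoint, start from the oldest retained record
    (TRIM_HORIZON); otherwise resume just after the last sequence
    number this application processed."""
    args = {"StreamName": stream_name, "ShardId": shard_id}
    if checkpoint is None:
        args["ShardIteratorType"] = "TRIM_HORIZON"
    else:
        args["ShardIteratorType"] = "AFTER_SEQUENCE_NUMBER"
        args["StartingSequenceNumber"] = checkpoint
    return args

# Two applications, two independent positions in the same stream:
archiver  = shard_iterator_args("clickstream", "shardId-000000000000")
dashboard = shard_iterator_args("clickstream", "shardId-000000000000",
                                checkpoint="example-sequence-number")
# A real consumer would call client.get_shard_iterator(**archiver),
# then loop on client.get_records(ShardIterator=...).
```

Because iterators are independent, an archiver, a dashboard, and a decision engine can each replay or tail the stream without interfering with one another.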
But the idea that Kinesis is so different from traditional queuing systems, and yet provides real-time functionality, and that multiple applications can read from it, that was a bit of a surprise. So what's the number one use case right now? Who's adopting Kinesis? For the folks watching out there, the Kinesis brain trust is right here within Amazon. What are the killer, no-brainer scenarios you're seeing on the uptake side right now that people should be aware of? For those who haven't really kicked the tires on Kinesis yet, what should they be looking at? Well, I think the number one use case is log ingestion. So, I'm tailing logs coming off of my web servers, my application servers, data that's just being produced continuously. We grab all that data and very easily put it into something like S3. The beauty of that model is, I now have all the log data, I got it off of all of my hosts as quickly as possible, and I can go do log dives later if there's a problem. That is the slam-dunk use case for Kinesis. There are other scenarios beginning to emerge as well. I don't know, Adi, if you want to talk about the more interesting cases. Yeah, the other one that's very interesting, and lots of customers are doing this already, is emitting data from all sorts of devices. These devices are not just your smartphones and tablets, which are practically full-blown computing machines, but also seemingly low-power, seemingly dumb devices. And the desire remains the same: there are millions of these out there, and having the ability to capture the data they produce in real time is key. Yeah, just to highlight that, one of the things I'm hearing on theCUBE interviews, from all the customers we talk to, is the number one thing is, I've just got to store the data, I don't know what I'm doing with it yet.
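The log-ingestion pattern above, tailing logs off many hosts and shipping them in as lots of small writes, is usually batched on the producer side. A hedged sketch: the 500-records-per-call cap reflects the documented PutRecords limit, while the hostname-as-partition-key choice and stream name are illustrative assumptions:

```python
def batch_log_records(lines, partition_key, max_batch=500):
    """Turn tailed log lines into PutRecords batches. PutRecords accepts
    at most 500 records per call, so yield one list per API request."""
    records = [
        {"Data": line.encode("utf-8"), "PartitionKey": partition_key}
        for line in lines
    ]
    for start in range(0, len(records), max_batch):
        yield records[start:start + max_batch]

# A real agent would then call, for each batch:
#   client.put_records(StreamName="web-logs", Records=batch)
# and an archiving application downstream would write them to S3.
```

Using the source hostname as the partition key keeps each host's lines in order on one shard while spreading the fleet's traffic across shards.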
Now that's a practice that's a hangover from BI, the data warehousing business, of just storing data for compliance reasons, which is basically, like, that's Glacier as far as I'm concerned. Traditional business intelligence systems are their version of Glacier: it's shipped out somewhere, and give me those reports five weeks later, they come back. But this is different. Now you see people store the data and realize, I need to touch it faster, I just don't know yet when. That's why I'm teasing out this whole development 2.0 model, because I'm seeing more and more people wanting the data hanging around, but not fully parked out at Glacier or some sort of compliance storage solution. You know, I think I kind of understand where you're going. I'm going to use a model for how we used to do BI analytics in our own internal data warehouse. I also run the data warehouse for AWS. The classic BI model there is, somebody asks a question, we go off and do some analysis. And if it's a question that we're going to ask repeatedly, we build a special fact table or a dimensional view or something to be able to grind through that particular view and do it very quickly. Kinesis offers a different kind of data processing model, which is, I'm collecting all of the data, I make it easy to capture everything, but now I can start doing things like, oh, there are certain pieces of data that I want to respond to quickly. Just like we would create dimensional views that would give us access to particular sets of data at a very quick pace, we can now also respond very quickly when those events are generated. Well, you guys are the young guns in the industry now. I'm a little bit older, the gray hair is showing. We actually used the words data processing back in the day. The data processing department, the DP department, or the MIS department. If you remember those days, MIS was management information systems. Are we going back to those terms?
I mean, look at what's happening. Is it the software mainframe in the cloud? I mean, these are some of the words you're using: data processing, data pipeline. MIS, that's my word. I mean, we're back to that old-school stuff, but with a different look and feel. Well, I think those kinds of very generic terms make a lot of sense for what we're doing, especially as we move into these brand new spaces, like, wow, what do I do with real-time data? Real-time data processing is kind of the third type of big data processing. Data warehousing was the first type: I know what my data looks like, I create indices, I have a pre-computation of the data. Hadoop clusters and the MapReduce model are kind of the second wave of big data processing. And real-time processing, I think, will be the third wave of big data processing. Well, I'm getting the hook here, but I've got to just say, you guys are doing an amazing job. We're big fans of Amazon. I always say that, you know, it's very rare in the history of the world; you look at innovations like the printing press, the Wright Brothers discovering flight, and then you have an Amazon with the cloud. You guys have done something that's pretty amazing, but what I find fascinating is, it's very rare to see a company that's commoditizing and disrupting and innovating at the same time. It's really a unique value proposition. And the competition is responding: IBM, Google. So, you guys have a lot of targets painted on your back by a lot of big players. So, one, congratulations on your success, which means that you're not going to go out on the open field and fight the British, to use the American Revolution analogy. You've got to continue to compete. So, what's your view of that? I'm sure you don't talk about competition, you're probably told not to talk about it, but you've got to know that all the guns are on you right now, and the big guys are putting up the seawall for your wave of innovation.
How do you guys deal with that? What are your thoughts? It's not like we ignore our competitors, but we obsess about our customers, right? It's just constantly looking for what people are trying to do and how we can help them. It can seem like a very simple strategy, but the strategy is build what people want, and we get a lot of great feedback on how we can make our products better, and that's what we do. It certainly will force you to up your game. When you have the competition setting their sights on you, you've got to focus on the customer, which is cool, but you guys are kind of aware that the game is on, right? I mean, Amazon, Andy giving a little tech talk, and hey, game is on, guys, that's rock and roll. You guys are aware, right? I think we're totally aware, and I think we're actually sometimes a little surprised at how long it's taken our competitors to get into this industry with us. As Andy talked about earlier today, we've had eight years in the cloud computing market. It's been a great eight years, and we have a lot of work to do, a lot of stuff that we're going to be bringing. We're almost ready for middle school. Final question for you guys, and I'll give you the final word here. Share with the folks, as the last word, why is this show so important, right now, at this point in time, in this market? Why is this environment, with the thousands of people who are here learning about Amazon, so important? What should they know about why this is such an important event? Well, I think our summits are a great opportunity for us to share with customers how to use our AWS services, to learn firsthand not only from our hands-on labs, but also from our partners that are providing information about how they use AWS resources. It's a great opportunity to meet a lot of people that are taking advantage of the cloud computing wave and see how to use the cloud most effectively. Adi?
It's just a great time to be in the cloud right now, and with all these amazing services coming up, there's no better mind-meld of people coming together, so that's probably as good a reason as any. You guys are doing a great job of disrupting, changing the future of the modern enterprise, modern business, and modern applications. Exciting to watch. And if you guys keep focusing on your customers, with that customer base, can you keep up the pace? That's the question. Can you finish the race? That's what I always tell Dave Vellante. I know Dave's watching. Dave, a shout-out to Dave Vellante, who's on the mobile app right now while traveling. Guys, thanks for coming on. Kinesis, great stuff, closing the loop in real time. Amazon really building it out. Thanks for coming on theCUBE. We'll be right back with our next guest after this short break. Thank you.