Live from San Jose, it's theCUBE. Presenting Big Data Silicon Valley, brought to you by SiliconANGLE Media and its ecosystem partners.

Welcome back to Big Data SV, everybody. My name is Dave Vellante, and this is theCUBE, the leader in live tech coverage. You know, this is our 10th Big Data event. We first started covering Big Data back in 2010, when it was Hadoop and everything was a batch job. About four or five years ago, everybody started talking about real time and the ability to affect outcomes before you lose the customer. Lewis Kaneshiro is here; he's the CEO of Streamlio, and he's joined by Karthik Ramasamy, the chief product officer. They're both co-founders. Gentlemen, welcome to theCUBE. My first question is, why did you start this company?

Sure. We came together around a vision that enterprises need to access the value in fast data. As you mentioned, enterprises are moving out of the slow data era and looking to deliver fast data value back to their users and use cases. Coming together around that idea of real-time action, we realized that enterprises can't access this data with the projects available right now, which were not meant to work together and are very difficult to stitch together. So what we did was create an intelligent platform for fast data that's really accessible to enterprises of all sizes. We unify the core components needed to access fast data, which are messaging, compute, and stream storage, building on best-of-breed technology that was open sourced out of Twitter and Yahoo.

So Karthik, I was going to ask you, why does the world need another streaming platform? But Lewis kind of touched on it: because it's too hard, it's too complicated. So you guys are trying to simplify all that.
Yep. The main reason we wanted to simplify it, based on all our experience at Twitter and Yahoo, is to make it consumable by a regular enterprise. Organizations like Twitter and Yahoo can afford the talent and the expertise to build these real-time platforms, but normal enterprises don't have access to that expertise, or the costs they would have to incur. So we wanted to take the open source projects that Twitter and Yahoo have provided, combine them, and make sure you have a simple, easy, drag-and-drop kind of interface, so that it is easily consumable by any enterprise. Essentially, what we are trying to do is reduce the barrier to entry to real time for all enterprises.

Yeah, enterprises will pay up for a solution. The companies that you used to work for will gladly throw engineering at the problem to save time, but most organizations don't have the resources. So how would it work prior to Streamlio? Maybe take us through how a company would attack this problem, the complexities they have to deal with, and what life is like with you guys.

The current state of the world is a fragmented solution: you take pieces from multiple different projects and you have to assemble them together so that you can do end-to-end, real-time stream analytics. The reason people end up doing that is that each of these big data projects was designed for a completely different purpose: messaging is one, compute is another, and the third is storage.
So essentially, what we have done as a company is simplify this by integrating these well-known, best-of-breed projects. For messaging we use Apache Pulsar, for compute we use Apache Heron from Twitter, and for real-time storage we use Apache BookKeeper. We unify them so that under the hood it might be three systems, but as a user, it functions as a single system. You install the system, ingest your data, express your computation, and get the results out of one single system.

So you've unified or converged these functions. If I understand it correctly, from talking off camera a little bit, the team, Lewis, that you've assembled actually developed a lot of these, or are major committers to these open source projects.

Absolutely, co-creators of each of the projects, and what that allows us to do is integrate each project at a deep level. For example, Pulsar is actually a pub-sub system that is built on BookKeeper, and BookKeeper in our minds is a peerless, best-of-breed stream storage solution: fast and durable storage. That storage is also used in Apache Heron to store state. So rather than stitching together multiple different solutions for queuing, streaming, compute, and storage, enterprises now have one option that they can install in a very small cluster, and operationally it's very simple to scale up: we simply add nodes if you get data spikes. What this allows is for enterprises to access new and exciting use cases that really weren't possible before, for example deploying machine learning models to real time. I'm a data scientist, and what I've found is that in data science you spend a lot of time training models in batch mode.
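The log-centric design described above, a pub-sub messaging layer unified with durable stream storage, can be illustrated with a toy in-memory sketch. This is only an illustration of the pattern; the class and method names below are invented and are not Pulsar's or BookKeeper's actual APIs:

```python
# Toy sketch of the pattern described in the interview: messaging
# (pub-sub) and storage unified over one append-only log, so a
# consumer is just a cursor into durable storage.
# Illustrative only; these are NOT real Pulsar/BookKeeper APIs.

class Log:
    """Append-only log standing in for BookKeeper-style stream storage."""
    def __init__(self):
        self.entries = []

    def append(self, payload):
        self.entries.append(payload)
        return len(self.entries) - 1  # entry id

class Subscription:
    """A consumer is just a named cursor over the shared log."""
    def __init__(self, log):
        self.log = log
        self.cursor = 0

    def receive(self):
        if self.cursor >= len(self.log.entries):
            return None  # nothing new to consume
        entry = self.log.entries[self.cursor]
        self.cursor += 1  # "acknowledge" by advancing the cursor
        return entry

topic = Log()
sub = Subscription(topic)
topic.append("event-1")
topic.append("event-2")
print(sub.receive())  # event-1
print(sub.receive())  # event-2
```

Because both consumers and storage operate on the same log, retention and replay come for free: a new subscription simply starts its cursor wherever the retained data begins.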
It's a legacy type of approach, but once the model is trained, you want to put that model into production in real time so that you can deliver that value back to a user in real time, let's call it under a two-second SLA. That has been a great use case for Streamlio, because we have already built an intelligent platform for fast data for ML and AI deployment.

And the use cases are typically stateful, and you're persisting data, is that right?

Yes. It can be used for stateless use cases too, but the key advantage we bring to the table is stateful storage. Since we ship along with the storage, realizing stateful use cases becomes much easier: the storage can hold the intermediate state of the computation, or serve as the staging area for messaging, where data automatically spills over from memory to disk. You can even retain the data as long as you want, so that you can unlock the value later: after the fast data has been processed, you can still access the lazy data later in time.

So give us the rundown on the company: funding, VCs, headcount, give us the basics.

Sure. We raised a Series A from Lightspeed Venture Partners, led by John Vrionis and Sudip Chakraborty. We raised seven and a half million and emerged from stealth back in August. That allowed us to ramp up our team, now 17, mainly engineers, in order to have a very solid product. We launched post-revenue; pre-launch, some of our customers were already looking at geo-replication across multiple data centers, and active-active geo-replication is an open source feature in Apache Pulsar, so that's been a huge draw compared to some other solutions that are out there. As you can see, this theme of simplifying architecture is where Streamlio sits: unifying queuing and streaming allows us to replace a number of different legacy systems. So that's been one avenue to help grow.
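The stateful-computation idea described above, keeping intermediate state durable alongside the stream so a computation can resume where it left off, can be sketched with a toy stateful counter. This is a pattern illustration only; the names are invented and do not correspond to Streamlio's or Heron's actual APIs:

```python
# Toy sketch of stateful stream processing as described above: each
# incoming event updates intermediate state, and that state is
# checkpointed to a "store" alongside the stream so it survives a
# restart. Illustrative only; NOT a real Streamlio/Heron API.

class StateStore:
    """Stand-in for durable state storage (the role BookKeeper plays
    for Heron in the platform described in the interview)."""
    def __init__(self):
        self._snapshot = {}

    def save(self, state):
        self._snapshot = dict(state)

    def load(self):
        return dict(self._snapshot)

def run_counter(events, store):
    """Count events per key, checkpointing state after each event."""
    counts = store.load()      # recover intermediate state, if any
    for key in events:
        counts[key] = counts.get(key, 0) + 1
        store.save(counts)     # persist intermediate state
    return counts

store = StateStore()
run_counter(["click", "click", "view"], store)
# A "restarted" computation recovers the prior counts from the store:
resumed = run_counter(["click"], store)
print(resumed)  # {'click': 3, 'view': 1}
```

The point of the sketch is that when storage ships with the platform, the checkpoint step is just another write to the same durable layer the messages live in, rather than a call out to a separate system.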
The other, obviously, is on the compute piece. As enterprises find new and exciting use cases to deliver back to their users, the compute piece needs to scale up and down. We also announced Pulsar Functions, which is stream-native compute that allows very simple function computation in native Python and Java. So you spin up an Apache Pulsar cluster, or the Streamlio platform, and you simply have compute functionality. That allows us to access edge use cases. IoT is a hugely exciting area of POCs for us right now, where we have connected-car examples that don't need a heavyweight scheduler deployment at the edge; it's just Pulsar and Pulsar Functions. That allows us to do things like fraud detection, anomaly detection, model deployment, interpolation, observability, and alerts at the edge.

And so how do you charge for this? Is it usage-based?

Sure. What we found is that enterprises are more comfortable with a per-node basis, simply because we have the ambition to really scale up and help enterprises use Streamlio as their fast data platform across the entire enterprise. We found that a per-data charge rate would actually limit that growth, so it's per node, on a shared architecture. We made an early investment in optimizing around Kubernetes, so as enterprises adopt Kubernetes, we are the simplest installation on Kubernetes: on-prem, multicloud, and at the edge.

I love it. For years, we've been talking about the complexity headwinds in this big data space. We certainly saw that with Hadoop, and Spark was designed to solve some of those problems, but it sounds like you're doing some really good work to take that further. Lewis and Karthik, thanks so much for coming on theCUBE. I really appreciate it.

Thanks for having us. Thanks for having us, Dave.

All right, and thank you for watching. We're here at Big Data SV Live from San Jose. We'll be right back.