from San Jose. It's theCUBE, presenting Big Data Silicon Valley, brought to you by SiliconANGLE Media and its ecosystem partners.

Welcome back to San Jose, everybody. This is theCUBE, the leader in live tech coverage, and you're watching Big Data SV. My name is Dave Vellante. In the early days of Hadoop, everything was batch-oriented. About four or five years ago, the market really started to focus on real-time and streaming analytics, to help companies affect outcomes while things were still in motion. Steve Wilkes is here. He's the co-founder and CTO of a company called Striim, a firm that's been in this business for around six years. Steve, welcome to theCUBE. Good to see you. Thanks for coming on.

Thanks, Dave. It's a pleasure to be here.

Yeah, so tell us more about that. You started about six years ago, a little before the market really started talking about real time and streaming. So what led you to the conclusion that you should co-found Striim, way ahead of its time?

It was partly our heritage. The four of us who founded Striim were executives at GoldenGate Software; in fact, our CEO, Ali Kutay, was the CEO of GoldenGate Software. When we were acquired by Oracle in 2009, after working for Oracle for a couple of years, we were trying to work out what to do next. And GoldenGate was replication software, right? It's moving data from one place to another. But customers would ask us, in customer advisory boards: that data that's moving seems valuable. Can you look at it while it's moving, analyze it while it's moving, and get value out of that moving data? That kind of sat in our heads, and when we were thinking about what to do next, it became the genesis of the idea. So the concept around Striim, when we first started the company, was that we can't just give people streaming data. We need to give them the ability to process that data, analyze it, visualize it, play with it, and really, truly understand it, as well as being able to collect it and move it somewhere else. So the goal from day one was always to build a full end-to-end platform that did everything customers needed for streaming integration and analytics out of the box. And that's what we've done after six years.

I've got to ask a really basic question. You're talking about your experience at GoldenGate, moving data from point A to point B, and somebody said, well, why don't we put that to work? But was it change data or static data? Why couldn't I just analyze it in place?

Because GoldenGate works on change data.

Yeah, okay, so that's why. There were changes going through. Why wait till it hits its target? Let's do some work in real time, learn from that, and get greater productivity. And now you guys have taken that to a new level. That new level being what? Modern tools, modern technologies?

A platform built from the ground up to be inherently distributed, scalable, and reliable, with exactly-once processing guarantees, and to be a complete end-to-end platform. There's a recognition that the first part of being able to do streaming data integration or analytics is that you need to be able to collect the data, right? And while change data capture is the way to get data out of databases in a streaming fashion, you also have to deal with files and devices and message queues and anywhere else that data can reside. So you need a large number of different data collectors that all turn enterprise data sources into streaming data. And similarly, if you want to store data somewhere, you need a large collection of target adapters that deliver to things not just on premises but also in the cloud: things like Amazon S3, or cloud databases like Redshift and Google BigQuery.
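To make the change data capture idea concrete, here is a minimal sketch of the pattern Wilkes describes: committed database changes are read from the transaction log and emitted as a stream of events, rather than polled for later as rows. The event shape and every name here (ChangeEvent, cdc_stream) are illustrative assumptions, not Striim's actual API.

```python
from dataclasses import dataclass
from typing import Iterator, Optional

@dataclass
class ChangeEvent:
    table: str                # source table the change came from
    op: str                   # "INSERT", "UPDATE", or "DELETE"
    before: Optional[dict]    # row image before the change (updates/deletes)
    after: Optional[dict]     # row image after the change (inserts/updates)
    position: int             # log position, so a reader can rewind and replay

def cdc_stream(transaction_log) -> Iterator[ChangeEvent]:
    """Turn a database transaction log into a stream of change events."""
    for record in transaction_log:   # 'transaction_log' stands in for a redo-log reader
        yield ChangeEvent(
            table=record["table"],
            op=record["op"],
            before=record.get("before"),
            after=record.get("after"),
            position=record["position"],
        )

# Usage: downstream steps consume changes as they happen instead of polling tables.
for event in cdc_stream([{"table": "orders", "op": "INSERT",
                          "after": {"id": 1, "amount": 99.5}, "position": 1}]):
    print(event.table, event.op, event.after)
```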
So the idea was really that we want to give customers everything they need. And that everything isn't trivial, right? It's not just, well, we take Apache Kafka, slap things into it, and take things out. Pretty often, for example, you need to be able to enrich data, and that means joining streaming data with additional context information, reference data. That reference data may come from a database or from files or somewhere else. But you can't call out to the database and maintain the speed of streaming data. We have customers doing hundreds of thousands of events per second, so you can't call out to a database for every event and ask for a record to enrich it with. You can't even do that with an external cache, because it's just not fast enough. So we built an in-memory data grid into the platform, and you can join streaming data with that context information in real time without slowing anything down.
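A sketch of that enrichment pattern, under stated assumptions: reference data is loaded into memory once, so each streaming event is enriched with an in-memory lookup rather than a per-event database round trip. The field names and functions here are hypothetical, chosen only to show the shape of the join.

```python
def load_reference_data(rows):
    """Build the in-memory lookup table (a stand-in for an in-memory data grid)."""
    return {row["customer_id"]: row for row in rows}

def enrich(events, reference):
    """Join each streaming event with its reference record, entirely in memory."""
    for event in events:
        context = reference.get(event["customer_id"], {})
        yield {**event,
               "customer_name": context.get("name"),
               "segment": context.get("segment")}

# Usage: reference data might be loaded from a database or files at startup,
# then every event is enriched at stream speed with a dictionary lookup.
reference = load_reference_data([
    {"customer_id": 42, "name": "Acme Corp", "segment": "enterprise"},
])
for enriched in enrich([{"customer_id": 42, "amount": 99.5}], reference):
    print(enriched)
```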
So when you're thinking about doing streaming integration, it's more than just moving data around. It's the ability to process the data and get it into the right form, to analyze it, to do things like complex event processing on it, and also to visualize it and play with it. That's an essential part of the whole platform.

Well, good, so I wanted to ask you about end-to-end, because I've seen a lot of products from larger, maybe legacy companies that will say it's end-to-end, but what it really is is cobbled-together pieces that they bought in. Oh, this is our end-to-end platform, but it's really not unified. Or I've seen others: well, we have an end-to-end platform. Oh, really? Can I see the visualization? Well, we don't have visualizations; we use a third party for visualization. So convince me that you're end-to-end.

So with our platform, you start by going into a UI and building data flows. Those data flows start from connectors. We have all the connectors you need to get your enterprise data, and wizards to help you build them. So now you have a data stream, and you want to start processing it. We have SQL-based processing, so you can do everything from filtering, transformation, and aggregation to enrichment of data. If you want to load reference data into memory, you use a cache component: drag that in, configure it, and you now have data in memory that you can join with your streams. If you want to take the results of all that processing and write them somewhere, use one of our target connectors: drag that in. So you've got a data flow that's getting bigger and bigger, doing more and more processing. Now you're writing some of that data out to Kafka. Oh, I'm also going to add another target adapter to write some of it into Azure Blob Storage, and some of it's going to Amazon Redshift. So now you have a much bigger data flow. But then you say, okay, I also want to do some analytics on that. So you take the data stream and build another data flow that does some aggregation over windows, maybe some complex event processing, and then you use our dashboard builder to build a dashboard to visualize all of that. And that's all in one product. It literally is everything you need to get value immediately.
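A condensed sketch of the shape of such a data flow: one source stream, a SQL-style filter, a sliding-window aggregate, and fan-out to more than one target. In the product these steps are assembled in the UI with SQL-based components; this Python version only illustrates the flow, and every function name is hypothetical.

```python
from collections import deque

def filter_stream(events, predicate):
    """SQL-style WHERE clause: keep only events matching the predicate."""
    return (e for e in events if predicate(e))

def sliding_average(events, size):
    """Aggregate over a sliding window of the last `size` event amounts."""
    window = deque(maxlen=size)
    for e in events:
        window.append(e["amount"])
        yield {"avg_amount": sum(window) / len(window), "window_size": len(window)}

def fan_out(events, sinks):
    """Deliver each result to every target (e.g. Kafka, blob storage, a warehouse)."""
    for e in events:
        for sink in sinks:
            sink(e)

# Usage: source -> filter -> windowed aggregate -> two targets.
source = [{"amount": 10.0, "status": "OK"},
          {"amount": 250.0, "status": "OK"},
          {"amount": 40.0, "status": "FAILED"}]
flow = sliding_average(filter_stream(source, lambda e: e["status"] == "OK"), size=2)
fan_out(flow, sinks=[print, print])   # real sinks would be Kafka/Redshift/blob writers
```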
And you're right, the big vendors have multiple different products, and they're very happy to sell you consulting to put them all together. Even if you try to build this from open source, and organizations do try, you need five or six major pieces of open source, a lot of supporting libraries, and a huge team of developers just to build a platform you can start building applications on. And most organizations aren't software platform companies. They're finance companies, oil and gas companies, healthcare companies, and they really want to focus on solving business problems, not on reinventing the wheel by building a software platform. So we can just go in there and say, look: value immediately. And that really, really helps.

So what are some of your favorite use cases, examples, maybe customer examples that you can share?

One of the great examples: one of our customers has a lot of data in an HP NonStop system, and they needed visibility into it immediately. This was order processing, supply chain, ERP data. It would have taken a very large amount of time to do analytics directly on the HP NonStop, and finding resources to do that is hard as well. So they needed to get the data out and into the appropriate places, and they recognized that you use the right technology to ask the right questions. They wanted some of it in Hadoop so they could do machine learning on it, some of it in Kafka so they could get real-time analytics, and some of it in HBase so they could query it immediately and use it for reference purposes. So they used us to do change data capture against the HP NonStop and deliver that data stream immediately into Kafka, while also pushing some of it into HDFS and some into HBase. They got value out of that immediately, because they could also build real-time analytics on it: they would send out alerts if things were taking too long in the order processing system, and it gave them visibility into a process they couldn't see before, with far fewer resources and more modern technologies than they could have used before. So that's one example.

Can I ask you a question about that?

Yeah, yeah, absolutely.

You talked about Kafka and HBase, a lot of different open source projects. Have you integrated those, or have you got entries and exits into those?

So we ship with Kafka as part of our product; it's an optional messaging bus. Our platform has two different ways of moving data around. We have a high-speed, in-memory-only message bus that works at almost network speed, and it's great for a lot of use cases. That is what backs our data streams: when you build a data flow, you have streams in between each step, and those are backed by an in-memory bus. Pretty often, though, you need to be able to rewind data for recovery purposes, or have different applications running at different speeds, and that's where a persistent message bus like Kafka comes in. But you don't want to use a persistent message bus for everything, because it's doing I/O, and that slows things down. So you typically use it at the beginning, at the sources, especially for things like IoT, where you can't rewind into the source. Things like databases and files, you can rewind into, replay, and recover from; IoT sources, you can't. So you push those into a Kafka-backed stream, and then subsequent processing is in memory. We have that as part of our product. We also have Elastic as part of our product for results storage; you can switch to other results storage, but that's our default. And we have a few other key components that are part of the product. Then, on the periphery, we have adapters to integrate with a lot of the other things you mentioned. We have adapters to read and write HDFS, Hive, and HBase across Cloudera, Hortonworks, even MapR, so we support the MapR versions: the file system, MapR Streams, and MapR-DB. And then there are lots of other, more proprietary connectors, like CDC from Oracle, SQL Server, MySQL, and MariaDB, and database connectors for delivery to virtually any JDBC-compliant database.

I took you down a tangent before you had a chance to give us another example. We're pretty much out of time, but you can briefly share either that or the last word. I'll give it to you.

I think the last word would be that that's one example, but we have lots and lots of other types of use cases, including migrating data from on-premises to the cloud, distributing log data and analyzing it, and doing in-memory analytics to get real-time insights immediately and send alerts. It's a very comprehensive platform, but each one of those use cases is very easy to develop on its own, and you can do them very quickly. And of course, as the use cases expand within a customer, they build more and more, so they end up using the same platform for lots of different use cases within the same account.

And how large is the company? How many people?

We're around 70 people right now.

70 people. And if you look at the funding, what rounds are you in? Where are you at with funding and revenue and all that?

Well, I'd have to defer to my CEO for those questions.

Okay, that's cool. So, all right, but you've been around for what, six years, you said?

Yeah. We've had a number of rounds of funding: initial seed funding, then an investment by Summit Partners that carried us through for a while, then subsequent investment from Intel Capital, Dell EMC, and Atlantic Bridge. And that's where we are right now.

Good, excellent. Steve, thanks so much for coming on theCUBE. Really appreciate your time.

Great, it's awesome. Thank you, Dave.

All right, keep it right there, everybody. We'll be back with our next guest. This is theCUBE. We're live from Big Data SV in San Jose. We'll be right back.