Live from Union Square in the heart of San Francisco, it's theCUBE, covering Spark Summit 2016, brought to you by Databricks and IBM. Now here are your hosts, John Walls and George Gilbert.

Welcome back to San Francisco as we continue our coverage here on theCUBE of Spark Summit 2016. I'm John Walls, along with George Gilbert, senior analyst at Wikibon and here on theCUBE. And we're now joined by Monte Zweben, the CEO and founder of Splice Machine and a regular here on theCUBE. Monte, nice to have you back.

Thank you.

Yeah, so tell me a little bit about... we're talking about Spark and what it's doing for you, and some unique approaches, obviously, that you're taking in terms of how you're applying that technology: the calculation engines and operational engines and what have you. A little more on that, if you would.

Sure, thanks, John. So Splice Machine is looking at a class of problem that requires both operational workloads, workloads where you have to change, store, and manipulate data in real time, as well as the analytical workloads that Spark is very well known for. This combined operational and analytical type of use case requires different kinds of computational engines. So Splice Machine is a relational database management system that we call a dual-engine relational database management system because, under the covers, it has an operational engine based on HBase, a Hadoop-based key-value store, as well as Spark. Our system just looks like a regular database to an IT person or a developer: you issue SQL to it, and it creates a plan for that SQL, analyzes that plan with an optimizer, and determines what kind of query that SQL is really going to be. If it's going to be a short read, write, or change to some data, it knows it's a transactional type of query, and it'll execute that on HBase.
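The dual-engine routing Monte describes can be pictured as a tiny cost-based dispatcher. The sketch below is a cartoon of the concept, not Splice Machine's actual optimizer: the keyword list, row threshold, and engine names are invented for illustration.

```python
# Toy sketch of the dual-engine idea: an "optimizer" guesses how much
# work a SQL statement implies and routes it to the operational engine
# (HBase) or the analytical engine (Spark). A real optimizer would use
# a full query plan and cost model, not string matching.

ANALYTICAL_KEYWORDS = ("JOIN", "GROUP BY", "ORDER BY")
ROW_THRESHOLD = 10_000  # invented cutoff, purely illustrative

def choose_engine(sql: str, estimated_rows: int) -> str:
    """Return which engine this query would be dispatched to."""
    upper = sql.upper()
    looks_analytical = any(kw in upper for kw in ANALYTICAL_KEYWORDS)
    if looks_analytical or estimated_rows > ROW_THRESHOLD:
        return "spark"   # long-running scan/join/aggregation
    return "hbase"       # short read/write against the key-value store

# A short point write goes to the operational engine...
print(choose_engine("INSERT INTO orders VALUES (1, 'shoes')", 1))
# ...while a large aggregation goes to the analytical engine.
print(choose_engine(
    "SELECT zip, COUNT(*) FROM orders GROUP BY zip", 5_000_000))
```

The point of the toy is only that one SQL front end can hide two very different execution back ends behind a routing decision.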
But if it interrogates the query and realizes it's going to be long-running, maybe a join with aggregations and group bys, it'll execute that on Spark. So you get the best of both worlds, and that's why it's a dual-engine database.

So the flexibility that Spark gives you... and I guess it's really about the versatility too, right? Is it not? Because all of a sudden you can create or take on a whole new set of issues or challenges with various clients, right?

What we see is a desire to have the power that Spark gives you in terms of analytics, but instead of what it's very well known for today, which is really being that data scientist's workbench for manipulating data and articulating exactly how to massage data, customers really want to power applications with that power. They want a 24-by-7 web application or mobile application that can actually take advantage of those analytics, but that also has concurrent usage at the same time. And that's what requires these two different engines. So if you're building a web app or an e-commerce application or something in the social community, and you want multiple users changing data and placing orders, but you also want to analyze that in real time, you kind of need these dual engines, and that's what we target.

Well, you made a little bit of news here with a fairly critical decision for you guys, right? You're going to go open source.

And we are opening the door, right?

And so Splice Machine is taking a big step. Factors behind that, what drove you there? And then what do you think that's going to do for your business?

So I like to split this into why we did it from our perspective and why it's important for the customer base. For the customer base, the CTOs and CIOs that I work with constantly told me that, over time, their whole position on open source changed.
Where in the past they may have been resistant to something that didn't have a company behind it, it has turned around: having a vibrant community around the technologies they adopt is now an insurance policy for companies. It means they're not in a single-vendor lock-in situation, and they're protected should someone acquire their technology provider, or should that provider simply change focus. With an open source project, there is a whole community of people who can carry it on whether or not that company does. So it's a great thing for the companies, but even more so, it means more eyeballs on the software, more than any one company can muster: more bugs are fixed, performance is better, and it's more secure because there's a greater community working on it. So that's why it's great for the customer.

But for us as a company, it's great because we're finally at that tipping point of a company's maturation where the platform works. It's performant, it's used by people, it's live, and you no longer need that small, focused team to put it together. Now we can finally open it up and let the whole community contribute. So for us, it's all about adoption. Instead of having tens or hundreds of customers, there'll be thousands of customers, and that is what really lights a fire under the innovation of the platform. So those are the reasons why we did it, for both ourselves and for the end customer.

Monte, you said something which, in all the conversations we've had, actually shed new light: this optimizer that decides whether a query is a transactional operation or an analytical one. What would be an example of something that needs both, and can the query optimizer do both?

Yes, it definitely can do both, and the simplest example of doing both would be where you're inserting new records based on some complicated analytical query.
So in that case, the optimizer may run a very complicated workload on Spark, doing a pipelined in-memory calculation that may have many joins.

For the sake of... well, I was going to blame it on the viewers, but for my sake, give me a concrete example.

Oh, let's say you want to insert records into the database: find me all the customers who have bought shoes within the last three weeks in these five zip codes, right? That query, which might have to process every transaction the company has, would be executed on Spark, and then the insertion of the five records that came out would go into HBase. That would be a good combination of both computation engines: you're inserting durable, persistent records into our database using HBase, but we computed what that set of records looks like in Spark.

Okay, so let's move on, since we don't have much time left because everyone's stampeding toward the alcohol.

You mean you want to go get a drink?

No, I have to. Yeah. So for years we heard about this architecture where you had batch work, you had real-time work, and then a layer where you combined the two, so you had the historical and the most up-to-date data, and that was a bear to create and maintain. How do you fix that?

Right, and I see this as an evolution in how people have been trying to solve this simultaneous workload problem: simultaneous operational and analytical workloads. As you said, in the early days people typically solved this with specialized engines and a large ETL process, often a painfully long ETL process. This is still rife in enterprises today: you operate your applications on a traditional relational database management system like Oracle or DB2 or SQL Server, and then you use ETL to get the data into an analytical framework like Teradata or Netezza, and that ETL process may take more than a day.
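The shoe-buyers example Monte gives is the classic INSERT ... SELECT pattern: an analytical query feeds a transactional write. A minimal sketch using SQLite below shows the SQL shape; the table and column names are invented for illustration, and in Splice Machine the SELECT would run on Spark while the INSERT lands in HBase.

```python
import sqlite3

# Toy schema standing in for the interview's example: find customers
# who bought shoes in the last three weeks in five zip codes, and
# insert the result as new records. All names are illustrative.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer_id INT, product TEXT, zip TEXT,
                         order_date TEXT);
    CREATE TABLE shoe_buyers (customer_id INT, zip TEXT);
""")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [(1, "shoes", "94103", "2016-06-01"),
     (2, "hats",  "94103", "2016-06-02"),
     (3, "shoes", "10001", "2016-06-03"),   # zip not in target list
     (4, "shoes", "94107", "2016-05-01")],  # outside the time window
)

# The analytical query (the long-running part) feeds a transactional
# insert in a single statement.
conn.execute("""
    INSERT INTO shoe_buyers (customer_id, zip)
    SELECT customer_id, zip FROM orders
    WHERE product = 'shoes'
      AND zip IN ('94103', '94107', '94110', '94114', '94117')
      AND order_date >= '2016-05-20'
""")
rows = conn.execute("SELECT customer_id FROM shoe_buyers").fetchall()
print(rows)  # only customer 1 satisfies all three predicates
```

The same statement exercises both sides of a dual-engine system: a scan-heavy SELECT and a durable point insert.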
And this is painful for organizations because they can't make business decisions in the moment, and they're constantly trying to squeeze that ETL down or make it faster. We have many customers who actually use Splice Machine to fix that ETL problem. But the newer architectures, the greenfield opportunities from the ad-tech world and the life-sciences world, all revolve around this new architecture, which is very technically named the Lambda architecture. The Lambda architecture was an attempt to get your hands on the real-time data right at the moment, to analyze it with no lag, and it has resulted in a very cool set of applications. But those applications require what I call enterprise duct tape: you have to take many different data engines, duct-tape them together, make sure they hold together, and make sure that as versions change, you can keep the versions in sync. It's a very difficult job, and very expensive.

People might use a Kafka event manager feeding into a streaming system like Spark Streaming, Flink, or Storm, then a batch analytics layer, which in more modern systems is typically Spark, though they may use Hadoop with MapReduce and Hive there. Then, on the application side, for the serving layer, they'll use a NoSQL database like HBase or Cassandra, and they may even use some other kind of fast cache to get at records really fast. Well, keeping Cassandra, Spark, Hive, and these other databases all together is a nightmare for companies. So what we've tried to do with our architecture is be a "Lambda in a box": use Kafka and a streaming system to get your data in, but then have one data engine, a Splice Machine type of data engine, that handles the mixed workload. We think that's the answer. There are others trying to do this too.
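The Lambda architecture Monte is criticizing can be sketched in a few lines: a batch layer holds views precomputed over historical data, a speed layer holds updates that arrived since the last batch run, and the serving layer merges the two at query time. This toy keeps all three layers in one process; in a real deployment each would be a separate system (say, Spark for batch, Spark Streaming for speed, HBase or Cassandra for serving), which is exactly the "enterprise duct tape" he describes.

```python
from collections import Counter

batch_view = Counter()   # recomputed periodically from the master dataset
speed_view = Counter()   # incremental, real-time updates since last batch

def batch_recompute(master_dataset):
    """Batch layer: rebuild the view from all historical events."""
    batch_view.clear()
    batch_view.update(event["product"] for event in master_dataset)
    speed_view.clear()   # speed layer now only covers post-batch events

def stream_ingest(event):
    """Speed layer: apply one new event with no lag."""
    speed_view[event["product"]] += 1

def serve(product):
    """Serving layer: merge batch and real-time views at query time."""
    return batch_view[product] + speed_view[product]

history = [{"product": "shoes"}, {"product": "shoes"}, {"product": "hats"}]
batch_recompute(history)
stream_ingest({"product": "shoes"})   # arrives after the batch run
print(serve("shoes"))  # 2 from the batch view + 1 from the speed view = 3
```

A "Lambda in a box" collapses the batch and serving layers into one mixed-workload engine, so only the ingest side (Kafka plus a streaming system) remains separate.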
You can imagine that SAP's HANA and even Oracle's Exadata are also mixed-workload databases, but they're scale-up systems that can cost millions of dollars. What we are is sort of the open source alternative to SAP HANA and Oracle Exadata, one that makes it very cost-effective to scale out rather than scale up.

That is the most articulate explanation I have yet heard, and it'll actually help me with some research that I'm supposed to write this week.

Excellent.

Well, I very much appreciate the time. Well, Monte could help with that. Good luck with the open source transition.

Thank you very much.

I know it's a big move, and continued success as well.

Thank you very much, and if your viewers want to participate in our open source movement, we are calling for contributors and mentors and champions, so contact us on our website.

And the URL is?

www.splicemachine.com

You got it. Thanks, Monte.

Thank you very much, gentlemen. Always good to see you.

George and I will be back with more from San Francisco in just a minute.