Live from the Moscone Convention Center in San Francisco, California, it's theCUBE at Oracle OpenWorld 2014, brought to you by headline sponsor QLogic, with support from HGST, Violin Memory and MarkLogic. And now, here are your hosts, Dave Vellante and Jeff Kelly.

Welcome back to San Francisco, everybody. It's good to be in San Francisco. I'm here with Jeff Kelly, and this is Oracle OpenWorld. This is our fifth year here at Oracle OpenWorld. This is SiliconANGLE and Wikibon's theCUBE, our live mobile studio. We go out to the events and extract the signal from the noise. We're going to talk to MarkLogic. David Gorbet is here; he's a VP of engineering at MarkLogic. MarkLogic is a company that, according to Wikibon, is one of the leaders in big data. Jeff Kelly could talk more about that. David, welcome to theCUBE.

Thank you very much, my pleasure to be here.

So, Oracle OpenWorld, big show. A lot of customers here; we were talking off camera, and a lot of your customers are here. So what do you think this year? 60,000 people. It gets bigger every year. They essentially take over San Francisco. It's getting somewhat out of control, but it's good. A lot of business being done here, a lot of action. So what's your take on the show so far?

Yeah, I think it's great. It's great to see so many people, so many different vendors and so many new things going on in the database world. For a long time there wasn't really a lot going on, but now it's a very exciting market to be in.

Yeah, database was kind of boring for a while. People would say, oh, you're in the database business? Okay, great. And now, all of a sudden, it's exploded.

Yeah, yeah.

And you're seeing key-value stores and all this NoSQL talk and SQL on Hadoop and real time; it's just exciting. So where does MarkLogic fit into all that?

Yeah, so we're a NoSQL database, but we are the enterprise NoSQL database.
If you think about a lot of the NoSQL technologies, which, as you know, were designed to accommodate the type of data that doesn't fit easily into relational rows and columns, a lot of them grew up around single-application web use cases: I'm building a Facebook or something like that. Our focus (and we've been around for 14 years now) has always been on the enterprise: giving customers the enterprise capabilities they need, like high availability, disaster recovery, security, journal archiving and transactional integrity, which a lot of NoSQL technologies are missing.

So when the big data meme really took off... you guys were doing big data before it was called big data.

That's right.

But a lot of the practitioners I've talked to, when you talk about no schema on write, they go, what? How did you get through that in the early days? How do you deal with that with customers?

Well, there's always been complex data out there. There's always been data that's not in a relational database because it's too hard to model. And that data is valuable; depending on the industry, in some cases it is the entire value of the business. Some of our earlier customers were publishers creating information applications. Content is very hard to model in relational databases, so they saw the value right away, and they had to transform their industry. But now we're seeing more and more customers who have complex data: financial services customers with derivatives contracts, which are very complex, and healthcare records, which are very complex too. These are actually not unstructured data. They're structured, but the structure is complex and hierarchical. It would be very sparse in a relational database, and it's very hard to model.
Couple that with the dynamic nature of most businesses today, the agility they need to model new forms of data and do new things with it, and it's very hard to do in a relational world.

Can you talk a little bit about your approach, your philosophy? We're at Oracle OpenWorld, so it'd be good to differentiate from Oracle, if you could. You've got multi-million-dollar boxes, hardware and software engineered together, Exalytics, Exalogic, big iron.

Yeah.

It's like the mainframe all over again. You guys don't make hardware, I don't think.

No, we don't.

So how is your philosophy different? As an engineer, how do you approach the problem differently?

Well, there are several differences, really. One of the key things is that in today's data environment there's a lot of data, a lot more than before, and more being created every day. Not only is there a huge volume of it, but there are a lot of different types of data that people want to model. So our approach is to use a scale-out architecture, which is more typical of NoSQL than of a relational database, so that you can add new capacity on commodity hardware. It's much more cloud capable. We've heard a lot about cloud here at Oracle OpenWorld, but our technology is fundamentally designed that way; it's a scale-out architecture. The schema-agnostic nature of our database also lets you integrate new data easily and quickly. It's not just about the volume; it's about, I suddenly want to start analyzing some new piece of data along with my existing data. I can do that without having to go through a six-month or year-long data modeling exercise.

I was a little tongue in cheek earlier today when we were talking on theCUBE: it was 8i for internet and 11g for grid. Now? G became C, for cloud.
So I'm inferring from your comments that you were sort of born in the cloud, born in this new platform world. Talk about that a little bit.

Yeah, our founder came from a search background, internet-generation technology where scale-out is the model. The way you index for that type of technology is fundamentally different from the way you index for a traditional relational database, and it allows you to do more with the data. It allows you to answer questions you never thought you were going to ask. Not only do you not need to model all the data in advance, you also don't need to figure out in advance all the questions and queries you're going to run so that you can optimize your indexes. So yes, it was born of the new generation of data modeling with a search-based, internet-style paradigm, but focused on enterprise data and enterprise customer use cases.

And that's kind of the definition of enterprise big data. Jeff, you remember, going back to probably 2009 or 2010, when we were trying to define it: it was doing things you couldn't do with traditional technologies, and that was sort of how we came up with our definitions, right?

Sure. In the early days we talked a lot about the things you couldn't do with traditional systems, and then we've seen the conversation shift toward the enterprise-grade components. We want to do these new things that you couldn't do with a traditional relational system or a traditional data warehouse, but guess what? We also need all that enterprise-grade security, high availability, and attention to the privacy implications. So we're starting to see that become much more of a conversation, and suddenly the conversation is swinging in MarkLogic's direction, because you've focused on that for a while.
Take us back a little bit. The company's been around since even before the whole NoSQL revolution, if you will. What did MarkLogic see that was so far ahead of the market? The market seems to have started to catch on a few years ago, but you guys were way ahead in 2001.

Yeah, I think it starts with our founder, who, like I said, came from a search background; he was the principal architect at Infoseek. He just thought it was crazy that you could find stuff on the internet so easily and quickly, but you couldn't find stuff in your own organization. So he set out to unlock the value of that data and make it available, and he very quickly realized that to do that, you have to think about storing it in a different way. You need a database and a search technology together. The way I like to think about it, search is the query language for unstructured data, and if you have complex data that's a mix of structured and unstructured data, you need to be able to mix query and search seamlessly. But enterprise customers have always needed transactional integrity for their enterprise data. They need ACID transactions, and in case you're wondering, ACID transactions are not just about bank debits and bank credits. They're about a consistent view of your data at all times. You can't even take a consistent backup of a database unless you have guaranteed consistency in that database; otherwise you have to quiesce it, stop ingesting, and get it to a consistent state. Our enterprise customers can't afford to do that. So at the end of the day, they're looking for agility and flexibility with their data, but there's this thing called data governance, right?
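The point about taking a consistent backup without quiescing the database is commonly achieved with multiversion concurrency control (MVCC): every commit gets a timestamp, old versions are kept, and a reader pins one timestamp to get a consistent snapshot while writes keep arriving. The sketch below is a generic, single-threaded illustration of that technique under those assumptions; `VersionedStore` and its methods are hypothetical names, and this is not a description of MarkLogic's internals.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class VersionedStore:
    """Hypothetical MVCC store: each key keeps (commit_ts, value) versions in commit order."""
    versions: Dict[str, List[Tuple[int, Optional[str]]]] = field(default_factory=dict)
    clock: int = 0  # monotonically increasing commit timestamp

    def commit(self, writes: Dict[str, Optional[str]]) -> int:
        """Apply a batch of writes atomically under one new timestamp (None means delete)."""
        self.clock += 1
        for key, value in writes.items():
            self.versions.setdefault(key, []).append((self.clock, value))
        return self.clock

    def read(self, key: str, as_of: int) -> Optional[str]:
        """Return the newest version committed at or before `as_of`."""
        result = None
        for ts, value in self.versions.get(key, []):
            if ts > as_of:
                break  # versions are appended in timestamp order
            result = value
        return result

    def snapshot(self, as_of: int) -> Dict[str, str]:
        """Consistent view of every live key as of one timestamp, e.g. for a backup."""
        view = {}
        for key in self.versions:
            value = self.read(key, as_of)
            if value is not None:
                view[key] = value
        return view


store = VersionedStore()
store.commit({"acct:1": "100", "acct:2": "50"})
backup_ts = store.clock                          # the backup pins this timestamp
store.commit({"acct:1": "70", "acct:2": "80"})   # ingestion continues, nothing is paused
print(store.snapshot(backup_ts))    # {'acct:1': '100', 'acct:2': '50'}
print(store.snapshot(store.clock))  # {'acct:1': '70', 'acct:2': '80'}
```

The backup never sees a half-applied transfer between the two accounts, because it reads only versions committed at or before its pinned timestamp. A real system would also need to garbage-collect versions older than the oldest active reader.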
They need to be able to manage their data, and they need to be able to run it in their data center. A lot of these technologies just are not really enterprise ready.

So tell us how you approach that issue, in particular around some of the use cases you were talking about earlier. It almost sounds like the data warehouse use case, but with the added flexibility that a NoSQL database provides. Talk about that use case and where the privacy implications and data governance issues come in, and how you address them, especially when you've got a lot of data sources and you're moving data around, in ways that were harder in the old world of the traditional data warehouse.

Well, one of the things that makes it hard is that there are so many different technologies operating on the same data. If you're generating data in your business, you've got your transaction processing data, and you have a system for that. Then, if you want to analyze it, you ETL it into some data warehouse somewhere, which is a totally different technology stack. So it's not only difficult from a data modeling and ETL perspective; from a governance perspective, these systems use different security models, they have different scale models, and they have different retention and lifecycle features. That makes it very hard, and the more of those systems you have in your data center, the more difficult it gets. So we're seeing a lot of customers who want to simplify that.
They want to use a set of technologies that has the same scaling pattern and the same security model, and to leverage it throughout the data lifecycle: from the live transactional portion, where you're actually processing transactions, through a compliance portion, where you keep the data around to answer compliance queries but it's really read-only at that point, through to the archive portion, where you're not putting things on tape anymore; you might put them in HDFS, the Hadoop Distributed File System, so that you can run MapReduce jobs on the data and figure out what trends are happening long term in your business. Doing all of that with a single database technology vastly simplifies the governance problem, because you're not transforming the data at every step. You can use the same security model, the same scale model, the same data retention model, and so on.

If I can jump in: the same policies, which makes it easier to automate, and presumably ISVs can build on top of that. When you describe the data lifecycle, I know a lot of people want to keep everything, but a lot of general counsels don't want to keep everything. So how does that fit into your model? It's not your primary value proposition, but presumably ISVs can add value on top of that. Can you show me what that looks like?

We actually have several ISVs who have built solutions on top of MarkLogic, one in particular in the eDiscovery space, because getting lots of different information from lots of different places together very quickly, and then searching and discovering what's in there, is a key use case for that.
But for compliance, one of the things that makes it difficult is deciding you have to retain something, or deciding you have to delete something; maybe a privacy law requires you to delete a customer record if the customer requests it. If you have all these different systems ETL'd together, with lots of data flowing downstream and people doing desktop aggregation, you lose control of the data, and you don't even necessarily know where it is. A more data-centric viewpoint, with a technology stack that supports all of that, makes it much, much easier to control the data.

Can you talk more about your technology stack? We touched on search a little bit, and it obviously fits in there, but talk about the stack in particular.

Yeah. We're a database, so we're at the persistence layer of your stack. Typical architectures are three-tier, and we're the database tier. Like every database, we use a file system to store data. Unlike most databases, we can work on almost any file system: SAN or NAS, or local disk, like many NoSQL scale-out technologies. We can also run directly on HDFS. So if you're using Hadoop, you can keep your data directly on the HDFS file system, run MapReduce jobs right underneath MarkLogic, and use MarkLogic for real-time queries as well. On top of that, we have interfaces that let you access your data through REST, through an ODBC driver, if you can believe it (SQL in a NoSQL database), and through Java clients or other middle-tier clients.

It's interesting to hear you talk, because, Jeff, as you know, competitors of MarkLogic often say, well, they were born before Hadoop, so they're not modern. But everything you're saying fits into this big data world, what I'm calling the born-in-big-data, born-in-the-cloud world.
Is that an unfair criticism?

Yeah, I would say we look at all these technologies as they come out. We keep a very close eye on the Hadoop ecosystem, for example; there's a lot of valuable technology there. HDFS is a valuable technology, and MapReduce is a valuable technology. There's also a lot of churn in that ecosystem: people trying to layer on security, people trying to layer on transactions, people trying to layer on indexing. We already have all of that. So being able to run directly on top of Hadoop and put all of our enterprise-grade data management capabilities on top of it is just a natural fit.

So is the industry trying to make Hadoop do things it shouldn't do? I know the security people would say security has to be designed in; you can't do it as a bolt-on. You could say the same thing about high-volume transaction processing, and about a lot of things. You guys started with that in mind.

That's right, yeah.

What's your opinion of that as an engineer?

It's very hard to do that as a bolt-on. When we started, there were certain things we said we were not going to compromise on. We were not going to compromise on transactional integrity. We were not going to compromise on security. Then you design around that. So our design has evolved around the things enterprises need that we decided we would not compromise on. If you didn't design around them, it's much, much harder to go back and retrofit them later. And we're seeing a little bit of that with the Hadoop ecosystem. HDFS and MapReduce were designed to solve a big data problem. Almost everything else in the Hadoop ecosystem was designed to solve problems with HDFS and MapReduce.
You know what I mean? We're seeing a lot of that. That's not to say they're bad technologies; there's a lot of good innovation going on there. But I think it's going to take a long time for all of it to coalesce into something that's really enterprise grade.

I want to go back to the lifecycle you laid out: the live transactions, some of the analytics and the archiving, all using MarkLogic as the database layer. Can you put a little color on that? Maybe talk about a customer or two: what kinds of applications are they running where they handle the live transactions and also bring in the analytics, so they can analyze that data in close to real time? And if you've got a Hadoop example, that's great too.

A good example is fraud analytics. Say you're an investment bank doing transactions; MarkLogic is used, for example, to process derivatives trades. Then you want to look at those patterns and figure out whether there's a rogue trader or some fraud going on. Being able to do that three days later is not useful; you want to do it as close to real time as possible. So that's one example. Another example is the whole healthcare.gov marketplace. They're doing a lot of transactions, and they're also doing analytics on those transactions to figure out who is in what segment, who's applying for which policies, and things like that. And they want to do those analytics on the same dataset without having to do ETL.
So even if you were running those kinds of workloads on a different NoSQL database, you could run the transactions, but at some point you would have to move that data into an analytic environment.

That's right.

We're hearing that some of the challenges with NoSQL databases are exactly what we're talking about: BI and analytics against that data. And that's before we even talk about real-time capabilities. Getting that real-time view would be very challenging in some of these other environments.

Yeah, yeah. And our technology is designed for that. To do the transactional piece, you need transactions, so having ACID transactions is important, but it's more than just ACID transactions. We do memory-buffered writes and lock-free reads so that you can handle high-volume mixed query workloads. So we do a lot to be a really great, fast transactional database. But our indexing technology also allows you to answer just about any arbitrary question of your data, to do more discovery-based analytics. Having those indexes, which give you sub-second response on very complex queries, and also having the transactions in the same database makes it easy to do both. I like to ask: why is Google so fast? You type something into Google; why do they give you the results so fast? It's because they're not actually searching the web. They're searching their indexes. We do the same thing. We index everything that's in the database, and we can return sub-second responses on very, very complex analytic queries. And this is the way business intelligence and analytics in general are going.
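The Google analogy describes an inverted index: a query is answered by looking up precomputed lists of matching document ids rather than scanning the documents. The sketch below is a generic illustration of that technique; the class name, the `field=value` term format and the sample documents are hypothetical and not MarkLogic's actual index design. It indexes both individual words and exact `field=value` pairs, so a single intersection of posting sets mixes structured query and full-text search, as described earlier in the conversation.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def tokenize(text: str) -> List[str]:
    """Lower-case alphanumeric words; a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Hypothetical index mapping words and field=value terms to document ids."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[int]] = defaultdict(set)

    def add(self, doc_id: int, doc: Dict[str, str]) -> None:
        # Indexing cost is paid up front, at ingest time.
        for field_name, value in doc.items():
            self.postings[f"{field_name}={value}"].add(doc_id)  # structured term
            for word in tokenize(value):                         # full-text terms
                self.postings[word].add(doc_id)

    def query(self, *terms: str) -> Set[int]:
        """AND query: intersect posting sets without scanning any document."""
        sets = [self.postings.get(term, set()) for term in terms]
        return set.intersection(*sets) if sets else set()


idx = InvertedIndex()
idx.add(1, {"type": "trade", "desk": "rates", "note": "Large swap booked late Friday"})
idx.add(2, {"type": "trade", "desk": "fx", "note": "Routine spot trade"})
idx.add(3, {"type": "email", "desk": "rates", "note": "Please book the swap before Friday"})

print(sorted(idx.query("type=trade", "swap")))   # [1]
print(sorted(idx.query("desk=rates", "friday")))  # [1, 3]
```

Because every term was indexed at ingest, a follow-up question that nobody planned for (here, any word in any field) costs the same set lookups as the first one, which is the discovery-style querying the interview describes.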
It's going to become a more search-based, operational paradigm. It's not sufficient anymore to say, I'm going to precompute all my dimensions, figure out what my dashboard will look like and create some canned reports. I want to ask a question of the data and then, based on the answer, ask a different question I hadn't thought I was going to ask. It's about discovery-based analytics. Gartner recently released a report on data warehouses, and they now separate the market into four categories. There are traditional data warehouses, and then there's a category of operational data warehouses where, I have to say, MarkLogic is number one.

Well, they're definitely different use cases, and it's interesting: the definition of real time is relative, depending on your particular use case. But the example you gave was a great one. If there's a rogue trader, it's no good figuring that out three days later, when he's off in the Bahamas somewhere enjoying the fruits of his labor. I've got to push you a little, though, from a technology standpoint. Everything sounds fantastic, but there have got to be some trade-offs. Are there trade-offs you've had to make from an engineering perspective, or gaps you're working on filling in?

Yeah, there are always trade-offs to make. For example, if you do journaling, writes take longer than if you don't. When you journal transactions, that's the trade-off you make for durability. If you have a scale-out system and you want transactions across multiple entities in your database, you need some sort of two-phase commit system, and that takes a little bit longer.
That's a trade-off you make to have ACID transactions. We also do a lot of indexing up front; that's a trade-off you make in order to respond to complex queries in sub-second time. We think those are the right trade-offs, because we see people building these capabilities into their application layer when they don't have them in their data layer, and it's more efficient to do them in the data layer. Security is another good example. If you build security in, yes, there's a trade-off, because we index security properties on the documents, and that takes a few extra nanoseconds when you're ingesting data. But if you don't do that, you're going to spend a lot of time doing it in your application. So there are some performance trade-offs.

Dave asked a similar question earlier, but I want to ask it again: as a company, or as an engineer, what's your philosophy about where to make those trade-offs? How do you look at it when you're developing a database and new capabilities? Where do you say, okay, it's worth taking a slight performance hit because this is a quote-unquote enterprise-grade capability that people need?

We won't compromise on things like security or transactional integrity. Skimping there is a false economy: if you lose all your data because a node goes down, it doesn't matter how fast you got the data in there in the first place. You know what I mean? That said, we have a scale-out architecture, so it's very easy to achieve whatever load or query performance you need by adding capacity. And we actually have an elastic model where you can scale up and then scale back down again, which is equally important as your data needs change.
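The two-phase commit trade-off mentioned above comes from the extra round of coordination: a coordinator first asks every node holding part of the transaction to prepare and vote, and only if all vote yes does it tell them to commit. The sketch below is a textbook, in-memory version of that protocol for illustration only; `Participant`, `two_phase_commit` and the sample keys are hypothetical, a real system would write prepare records to a durable log, and this is not MarkLogic's actual implementation.

```python
from typing import Dict, List


class Participant:
    """Hypothetical storage node holding part of the data."""

    def __init__(self, name: str, will_vote_yes: bool = True) -> None:
        self.name = name
        self.data: Dict[str, str] = {}
        self.staged: Dict[str, str] = {}
        self.will_vote_yes = will_vote_yes  # simulates a node that cannot prepare

    def prepare(self, writes: Dict[str, str]) -> bool:
        """Phase 1: stage the writes (durably, in a real system) and vote."""
        if not self.will_vote_yes:
            return False
        self.staged = dict(writes)
        return True

    def commit(self) -> None:
        """Phase 2, success path: make staged writes visible."""
        self.data.update(self.staged)
        self.staged = {}

    def abort(self) -> None:
        """Phase 2, failure path: discard staged writes."""
        self.staged = {}


def two_phase_commit(plan: Dict[Participant, Dict[str, str]]) -> bool:
    """Coordinator: commit on every node only if every node votes yes."""
    prepared: List[Participant] = []
    for node, writes in plan.items():
        if node.prepare(writes):
            prepared.append(node)
        else:
            for p in prepared:  # roll back nodes that already staged writes
                p.abort()
            return False
    for node in plan:
        node.commit()
    return True


a, b = Participant("node-a"), Participant("node-b")
print(two_phase_commit({a: {"trade:1": "booked"}, b: {"position:1": "+10"}}))  # True

c = Participant("node-c", will_vote_yes=False)
print(two_phase_commit({a: {"trade:2": "booked"}, c: {"position:2": "+5"}}))  # False
print(a.data)  # {'trade:1': 'booked'}
```

The second transaction leaves no partial trade on `node-a`, which is the all-or-nothing guarantee the extra latency buys.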
For example, we have a customer who knows that the middle of the week is their busiest period, and weekends are not busy at all. They're on the Amazon cloud. They preemptively scale up their architecture for the midweek rush and scale it back down again on the weekend, because they know they don't need that capacity.

Got you. That's an interesting example. You mentioned the cloud. Are you seeing a lot of traction there? Talk a little bit about the role you think the cloud is going to play, now and going forward. Here at Oracle OpenWorld, cloud is every other word you hear. And of course we've been covering this market; Dave's been writing a lot about cloud and the economics of cloud. How do you see this playing out in the database world specifically?

Well, cloud technologies and the ability to scale elastically have always been there at the application tier, where you have a stateless app server or something like that. It's very hard to do at the data tier, and it's particularly hard for relational databases. So we have a somewhat different philosophy from Oracle with respect to the cloud. We think the cloud is about agility. We don't think it's just about taking an application I wrote and moving it to some other hosting provider. We think it's about having choices: being agile about expanding and contracting your environment so that you can build new applications on top of your existing data, and being able to run in a hybrid environment where some of it is on-premises and some of it is in the cloud. We have some customers doing that as well. And if you think about what MarkLogic is all about, it's agility with your data. It's about being able to do things without doing the data modeling up front and without building rigid ETL.
The cloud is a natural extension of that for us; it's really an extension of our value proposition.

I was going to say, being able to do that at the database tier adds a lot of flexibility to the infrastructure that has never been there before. Can you give us an example? You've been around for a while. As I said, Jeff Kelly's report shows (Jeff, I think I'm correct) that MarkLogic is the largest independent big data company, right? And it's certainly leading in the NoSQL space, according to your data. Can you give us an example of a customer who's doing something with MarkLogic that they couldn't do before, or that maybe changed their business? What's your favorite customer example in that general category?

That's a hard question, because there are so many, really. They range from customers who have gone from 16 different database systems to one MarkLogic system and vastly simplified their infrastructure, to customers like the FAA, for example, which runs its emergency operations network on MarkLogic. This is a situational-

Oh, I hope that thing's good, then. That had better be solid.

Enterprise class. This is a situational awareness application. If there's a hurricane or something and they need to figure out where to land planes or where to reroute them, they can do that. To do it, they ingest geospatial location data for planes and metadata about airports; it turns out a 777 can't land at every airport, because certain runways can't accommodate them. They ingest which fuel stocks are available, so they can make sure planes can refuel and get out of there, and weather data, obviously. They're also monitoring Twitter feeds now: since the Miracle on the Hudson, they've been watching aviation-related tweets, because that may be the first place they learn about some aviation disaster.
They combine all that data into one system and then build a situational awareness application on it.

I like that one, frankly.

Yeah, that's pretty exciting.

My follow-up question was going to be: from a technology standpoint, what's exciting you? That example you just gave is pretty exciting. Are there any others that are really thrilling you?

Yeah. I run engineering at MarkLogic, so I'm always looking forward, thinking about what's coming next, and we have some incredible technology coming out in the next couple of months that's going to really revolutionize the database world. Things like our bitemporal capability, which is going to be really important for anybody in a compliance-type situation. It lets you answer the question: what did we know about a past time period when we made a decision, versus what we know about that time period now? You need to be able to ask those questions so that you can justify the decisions you made and follow the trail of decisions across corrections in your data. It's going to be very important in the financial industry, but it's also very popular in all kinds of other industries.

Oh, I'll bet. That's exciting. And you guys have an event that we're going to be at, I think, next year, in 2015. In April?

In April, yeah. Our MarkLogic World user conference. That's going to be great. We'll see some of those innovations there as well.

That's right. Excellent. All right, David, listen, thanks very much for coming on theCUBE and sharing the MarkLogic story. Congratulations on all the success. Hot company, and we'll be watching, you know, the rock star CEO.

Yeah, that's true.

And we're really excited to watch how this all shakes out.
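The bitemporal question described above ("what did we know then about that period, versus what we know now?") needs two time axes on every record: valid time (when a fact was true in the real world) and system time (when the database recorded that belief). Corrections close out the old record's system time instead of deleting it. The sketch below is a generic illustration of that model for a single entity; `Fact`, `BitemporalLog` and the rating example are hypothetical and are not MarkLogic's bitemporal API. As a stated simplification, a correction supersedes any overlapping earlier record as a whole; production systems split partially overlapping periods.

```python
from dataclasses import dataclass
from typing import List, Optional

INF = float("inf")


@dataclass
class Fact:
    value: str
    valid_from: float       # when the fact became true in the real world
    valid_to: float         # when it stopped being true (INF = still true)
    system_from: float      # when the database recorded this belief
    system_to: float = INF  # when the belief was superseded (INF = current)


class BitemporalLog:
    """Hypothetical append-only bitemporal history for one entity."""

    def __init__(self) -> None:
        self.facts: List[Fact] = []

    def record(self, value: str, valid_from: float, valid_to: float,
               at_system_time: float) -> None:
        """Record a belief; supersede (never delete) current overlapping beliefs."""
        for f in self.facts:
            overlaps = f.valid_from < valid_to and valid_from < f.valid_to
            if f.system_to == INF and overlaps:
                f.system_to = at_system_time  # simplification: close the whole record
        self.facts.append(Fact(value, valid_from, valid_to, at_system_time))

    def as_of(self, valid_time: float, system_time: float) -> Optional[str]:
        """What did the database believe, at `system_time`, was true at `valid_time`?"""
        for f in self.facts:
            if (f.valid_from <= valid_time < f.valid_to
                    and f.system_from <= system_time < f.system_to):
                return f.value
        return None


log = BitemporalLog()
log.record("rating=A", valid_from=1, valid_to=INF, at_system_time=1)
# Day 10: a correction reveals the rating was actually B from day 1 onward.
log.record("rating=B", valid_from=1, valid_to=INF, at_system_time=10)

print(log.as_of(valid_time=5, system_time=5))   # rating=A (what we knew when deciding on day 5)
print(log.as_of(valid_time=5, system_time=12))  # rating=B (what we know now about day 5)
```

Keeping superseded records, rather than overwriting them, is what lets an auditor replay the trail of decisions across corrections.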
We've got a Capital Markets event down in New York, and we're going to be talking about these issues: the disruption big data brings to the traditional enterprise. I know you guys are part of that as well, so thanks very much for all the support, and thanks for coming on theCUBE.

Great, thanks for having me. It's been great.

All right, thank you. Keep it right there, everybody. Jeff Kelly and I will be back to wrap up Oracle OpenWorld right after this. This is theCUBE; we're live from Moscone.