Live from New York City, it's theCUBE at Big Data NYC 2014. Brought to you by headline sponsor WANdisco, with support from EMC, MarkLogic, and Teradata. Now, here is your host, Dave Vellante. Welcome back to Big Data NYC, everybody. I'm joined by Jeff Frick. And this is theCUBE: we go out to the events, we extract the signal from the noise. Big Data NYC is our event that we hold in conjunction with Hadoop World and Strata. This is our fifth year covering Hadoop World, really excited to be here. Ken Krupa is here, he's the enterprise CTO at MarkLogic, a really interesting company that is the leader, actually, in NoSQL. That doesn't mean, folks, no SQL; it means not only SQL. Ken, welcome to theCUBE. Thank you, welcome. I mean, thanks for welcoming me. Well, you're welcome, and it's good to have you. So yeah, MarkLogic, really interesting company. Not a ton of people, but you hear so much hype about the key-value stores, and you guys kind of quietly emerged as one of the leaders. We have you, actually, as the leader in that space. How'd you get there? So, patience, I guess. We've been around for 13 years. It's a virtue, they say. Yeah, yeah. 13 years as a company, and, you know, re-envisioning what a database should be. Right, so our founder, Chris Lindblad, he was really approached with, hey, if you took this search technology that we've been hearing about and made a database out of it, what would you get, right? And that's really what we set out to do. And through the years, as the terms in the industry were evolving, defining technologies by silo, you might say we had a bit of an identity crisis of where should we fit? I guess five or six years ago, this notion of NoSQL, or something other than the database that you're used to, right, the relational rows-and-columns database, really emerged and that meme took off. So sometimes the meme just kind of takes off and it comes to you and you go, oh, right, yeah.
There's a lot of stuff in that messaging that defines what we are and what we could do. And so we've been running with that, but still we don't like to be put in a silo, right? When I meet with customers and they say, so you're search, or you're a database, or you're this, we say, well, you know what? We like to redefine the boundaries of what a database should be, right? I personally don't think search and database should be two separate things. I think if you're putting your data in, you wanna find it. So, yeah, right? If you think about when the relational databases were born, right, there was always this notion of, yeah, there are these keywords to find some text in there as well as values. So it's really been more of the technology inertia, of when search could be achieved at the scale that it could be post-dating the invention of databases, right? So right now we're in this place where we like NoSQL, we like the notion that everybody kind of latches on and knows what that means. We can certainly differentiate ourselves as Enterprise NoSQL, but what we really like is this redefining of what a database should be relative to what the expectations have been, yeah? Okay, I love it when the CTO comes on and I can ask what I call a Columbo question. So let's start with NoSQL. When I started out the segment, I said not only SQL. What does that mean? So what that means is that the language that you use to ask the question of your database is not really what should be defining what you should get out of your database. So NoSQL is an easy thing for people to say, oh yeah, it's not a SQL database. But SQL really is just a language, it's a domain-specific language for asking a particular type of question provided you could visualize your data in a certain way, right, rows and columns. So what we said was, okay, just because we don't start out with rows and columns, let's not limit ourselves to saying you can't ask a SQL question.
It's one of the most successful domain-specific languages in technology, let alone in the database domain. So that's what we said, you know what, SQL is still useful. You're not gonna turn on a dime and say people are gonna stop asking SQL questions. So let's embrace that as well as all of the other paradigms through which you can ask questions of a database. Okay, another question I have is key-value store. Everybody throws that term around. No, it's key-value store, it's key-value store. And people associate that with NoSQL. What is meant by key-value store? So in the NoSQL space, I've heard that as one of the X number of definitions, or the first one. There's a key, some identity for something, and then there's stuff that maps to it. That's morphed a little bit, where you've got this definition of a document store, or an object store, and you might consider MarkLogic as falling into that category. What I do like about it is that it's this notion that there's an identity for the particular things that you're looking for. Whereas in the relational world, that identity was sort of something that maybe didn't live in one place, right? It was this emergent identity. But at least key-value said, all right, I've got things, I've got people, I've got places, and I want to give them an identity. That's the way I see it. In the technology space, key-value store means something like, yeah, I've got this very simple model where I have a key and I put in anything I want, and anything I want can be defined a certain way. We get a little bit more specific and we say, we've got a key, if you will, but the stuff that you give us, the objects, they have other descriptive information in them. So we talk about XML and JSON as ways in which you could present self-describing data. So what you get in that packet is you also get schema information. So schema is data for MarkLogic.
It's not an orthogonal thing that you've got to pre-define somewhere else or pre-define at a specific point in time, right? It's something that comes with the data that now allows you to do better analysis and discovery. Okay, so you think about schema, you have rows and columns, you've got identifiers for those, and then we were talking about schema on read, the term that you used, and then we heard Amy O'Connor talk last night about how, when she goes and talks to practitioners, they have to get their heads around no schema on write. What's the difference? They sound like cousins. Maybe you can help us understand that. Yeah, so no schema on write and schema on read. I guess they're not mutually exclusive. I see it as you don't have to, in a prerequisite way, decide all the questions you want to ask. Right, and that's really what it was about. That's really it, right? You don't need to know the questions ahead of time. Right, and when you think about it, it's kind of counterintuitive. Like, when you want to do discovery on a topic, where do you go? You start typing something into some search engine, right? You just go and say, okay, right? You don't say, let me create a table, right, I'm gonna think of everything I might want to know about this topic I don't know about, and stuff it in there, right? So this notion of a schema, if you know a schema, great, that's great. If you've got a model, if you've got tables of stuff, that's good information, great. Let's not throw that away. If you don't, that's great. But more importantly, as you do that, I mean, you have that conversation with your data. Right, so yes, I talk to data, I'm sorry. Okay, it's out there for everybody to see: I talk to data like it, I don't know. We have that conversation. You have a conversation with sort of a corpus of data. A model might emerge, right? And that's more natural. That's a more natural thing where you think, oh right, yeah, and you mentioned pulling the signal from the noise, right?
You find a signal, and you go, okay, that's something that's worth modeling. So I actually don't like to think of schema on write versus schema on read; it's a continuum, right? I mean, you read it, you write back. As you're doing discovery, that conversation also involves you enriching that information, right? So that's another paradigm shift that we're seeing at MarkLogic. We're hearing it from our customers really, right? If you're doing the right thing, you're reacting to what the customers are telling you and trying to get a little proactive about where they're gonna go next. And one of the places is in this notion that search is not a read-only operation anymore. It's not, right? It's something where, as you find that signal, right, you go, okay, this is valuable. Let me decorate that information with other information. The human-to-computer interaction, yeah? And then that's available for the other folks, right? Because you're not just doing this in isolation. Particularly in an enterprise, if you're an intelligence agency, there are lots of people who are interested in what everybody else knows. There's a crowdsourcing of all that knowledge, if you will. So the search operation, if you will, should be conversational. You should be writing back. You should be asserting new facts. And so we're hearing a lot of that from our customers. And that's a really, you know, it sounds obvious, but then you're like, yeah, but we've been doing it. We haven't been thinking about it in those ways for many years. So that's one that we're particularly interested in. So Ken, you talked about how MarkLogic has this ability to sort of have a self-defining environment. Is that IP? Is that a fundamental sort of component of the technology in general? Can you talk about that more specifically? So you say self-defining environment. So you were talking about the ability of your database to have other information that helps you self-define. Right, where you can enrich the information.
Right, and so that's because we're a database, right? And so what did I just say, right? So the definition of a database is something that you can also write to, right? So search, intuitively, people have thought of it as something that you ask questions of. But a database, people for years have been saying, well, it's also something that you write to. So that's really it, is that we have this very powerful platform that can speak a lot of different technical dialects, if you will, but also give you the ability to update it. It's really that simple. And to do it at scale, right? So the notion of wanting to do this isn't new. You might say that when databases first came out, that was the whole idea: I want to read and write from this database. And then SQL, Structured Query Language, right, is this notion of, okay, and now I'm gonna ask you a question about what I put in. The technology now has caught up, right? We've got a lot of new techniques now. We can scale out, right? The cost of hardware keeps going down. There's the notion of a critical mass around commodity hardware and what you could do with it. And just as you need more scale, you pop more nodes into your racks. That all converging, sort of in the last 10, 15 years, is what gives us the ability to do that. So all the concepts have been out there, and things have come into alignment nicely for us to do that. At a different economics. Much different economics. Because Larry Ellison would say, if we do this, it's all called Oracle. Remember when IBM bought Lotus Notes, he said, stupid acquisition, we're gonna do all that in Oracle. Now, of course, say what you want about Lotus Notes. It's all this unstructured data. Oracle will stand up and say, we deal with unstructured data, we deal with structured data. Of course, they have a NoSQL product that I can't find any customers of.
But my point is, in theory, you could do it with a relational database, but you couldn't do it economically at scale, is I think what I'm hearing, right? Or no. In a sense, well, you could, so the statement that you said about Oracle is correct, except it's not just with a relational database. So name a product category, Oracle has it, right? So they certainly have it. And so it's not untrue for them to come in and say, yeah, we can do that. Okay, well, yes, a lot of people can do a lot of things. Putting it all together is not as easy as people might think. So if you have all these disparate products, whether it's Oracle and a number of acquisitions that they've made, or going completely to the other end of the spectrum and saying, well, let me find what I can get from the open source community and put it together, the answer is yes, you could do that. But in the middle there, the reality is a lot of time, a lot of cost, and in some cases, there's a lot of brittleness associated with that, right? So I talked about us being multivariant in some ways. And by that I mean, you could model things in a lot of different ways in MarkLogic, or not model things. You can ask questions in a very structured and unstructured way. So a case in point, depending upon who you're talking to or what set of features you're looking at, you can consider us a search engine, right? A NoSQL database, a transactional NoSQL database, because you might want to make a distinction between the two, or a triple store. So this notion of semantics and triples and the W3C standards, like RDF, being able to model what they call machine-readable knowledge, right? So that's three or four products right there. Integrating those is hard.
Integrating them in a way such that the whole is greater than the sum of its parts, such that when you ask a question, and you're not sure of the type of question you're gonna ask, you're not sure if the question is best expressed as a graph problem or a search problem or a structured type of question, or better yet, if it's a question consisting of all the above, it really, really helps to have sort of one body of information and one product that can handle all that. I'd just say it strikes me too that it's been a perfect storm of all this Moore's Law-ishness, if you will, across a number of product categories, network, CPU, graphics cards, databases, storage, to drive the cost down and the power up. But also the thing that strikes me is Google's influence on a younger generation, on the expectation of access to this type of information and how I'm going to get it. It's Star Trek, right? On Star Trek, they talked to the machines and the machines answered their questions. With Google now, and you could even argue maybe now that we're not allowed to text in the car, we're all really talking to our phones and asking questions, where before someone would never even think of that as a possibility. And to dive into the enterprise, to actually ask a simple question and get an answer back as to what is our HR policy? I mean, something as simple as that, which was buried in a PDF behind a firewall where you had to VPN in, and maybe you could find that page. Very different paradigm. Yeah, that's a great point. And it's funny, you bring up the Star Trek thing. I've given presentations before and sometimes I actually have a video of one of the Star Trek movies where they go back in time and Engineer Scott picks up the mouse and goes, hello computer, right? So I've done it, because look at that resonance. I just want to do it. And then they're like, just use the keyboard, and he uses the keyboard, oh, how quaint, right?
So I love that clip. It's a spoiler alert. Probably my next presentation is going to have it again. But that's just the point. There's a lot of operational friction that we've become accustomed to. And accept. And accept; it's like bad cell phone service, right? We accept it. We never would have accepted that with our landlines, right? We kind of accepted that everything's scratchy and you're going to drop calls, right? Call back. Right, landlines. They're like, why'd you hang up on me? So probably the first few cell phone calls went that way until they figured it out. So yeah, no, you're right. That shift of expectation is great and that's really what we're embracing, right? And you brought up the enterprise, and not necessarily the Starship Enterprise, but within the enterprise, behind the firewall. That's another key point of ours, because one of the other reasons why MarkLogic was formed was, yes, we have to do the types of things that, say, Google does, but behind the firewall, where it's a slightly different paradigm of how information flows around, right? You don't have the same critical mass of, you know, billions of people on the planet, right? So, you know, you don't have the same statistical advantage, right? At the same time, you know, there are all these security requirements, as you mentioned, who's allowed to see what, right? You know, the Google model's more open, you know, for good reason. So taking that and saying, okay, well, this notion of you've got these folks, and once they get into work, they start thinking differently and speaking a different language, right? Understanding that language of how people, you know, work-speak, right? And then respecting the security boundaries, right? That there's need-to-know, particularly in defense and intelligence, another customer segment of ours, is, you know, what should you see, right? What are you supposed to see?
But making it seamless and friction-free, or as close to friction-free as possible, as close to just having a conversation and, you know, saying, you know, maybe the computer comes back and says, I'm sorry, I can't tell you that, right? But not where, you know, why don't I know? Why am I not getting the answer, yeah? Kind of curious as to how you got here. You mentioned you've got a search engine, NoSQL database, transactional database, the semantic engine. How did that come about? Is it architecture? Was it somebody's vision? Was it serendipitous? So all the above, right? The serendipity part really is, you know, you might say that's the customers, right? We put that in that category. The customers will communicate to us: you know, it would be really cool, right? And in fact, that's how the company got started, right? Customers telling Chris, you know, it would be really cool. But then the vision to say, you're right, that would be really cool. Let's do that, right? And then the architecture, great point, because you have to think where you want to take it, particularly when you're architecting these sorts of things, right? Because the things that you know are going to be important are more difficult to do after the fact. You want to say, you know, I want to put some scaffolding in place here, or I want to focus on a foundation so that when that next cool thing that the customer tells me about, or not cool but important thing that the customer tells me about, comes along, I can layer it in, right? So semantics is a great example, right? So this was a result of our MarkLogic 7 release last year, and we added this triple store capability. But it wasn't like we said, okay, let's just acquire or pull off the shelf some triple store and try to stitch it into MarkLogic. We said, you know what? We've got this incredibly scalable and security-optimized, right, with respect to who can see what, product. This engine that scales out, right?
Let's build the right indexing that leverages all of that capability into our triple store. So that when we come out with a triple store, it inherits all of that. It inherits all of that quality of service and capability. And so that's what we have. So those three aspects are exactly right. You mentioned the three, and they're not mutually exclusive, nor should you take away one if you're gonna be successful. You have to architect it right. You have to have the vision and you have to know when serendipity is gonna hit you over the head with a brick. Hey, wait a minute. That's something I should focus on. So as somebody in the database world, John Furrier always says, five, six years ago if you went to a party and you asked somebody, what do you do? I'm in the database business. They go, oh, see ya. It's cool now. And then all of a sudden it's become so cool. It's like dozens of database companies popping up and everybody's trying to predict the moves on the chess board and the winners, and the open source piece is another wildcard. So tell us, help us figure it all out. What's going on out there? What's going on? So data is cool again, right? So, yeah, you don't put that on a T-shirt, right? I'm sure there's lots of them down across the street. Yeah, you have to be right. I'm sure I'm gonna go back for that. Yeah, so not to get too philosophical here, right? But the business that we've been in, right, has always been called one of two things. Information technology or data processing, right? Or, you know, some variation thereof. So it's not been called hardware technology or software technology or even, and I love my smartphone products, app technology. It's not even that, right? It's still called information technology, right? So it's always been about that. And we keep coming back to that, that all those other things, hardware, software and apps, they are ways, on-ramps, to provide information, right? To give us new insights, new discoveries.
So we keep going through this cycle, right? Like I remember I was doing data warehousing stuff before this stuff called the internet came along. I was like, data warehousing, you know, star schema, and then it's like, oh wait a minute, there's this internet thing, you know, I'm just, you know, gone, right? For a while. Until you realize that, wait, the internet thing is just a hyper-enabler of all of what you're trying to accomplish, right? Because now everybody's contributing and everybody can get the information from everywhere. So it's always been about data and all of these things are coming into alignment now, right? Public data, right? You're talking about commercial and open source. That's great too, right? Because that creates ecosystems where the barriers to entry are as low as they can be, right? And depending upon where you are in terms of how much you wanna roll up your sleeves and just, you know, do a lot of it yourself, that's okay because maybe you don't have the budget, right, to do it any other way versus, you know what, I've got some economies of scale, I've got some budget, I've got some capability and quite frankly, I don't have the time to do that and I have another way that's gonna work for me. So what's in the enterprise space? What's in the commercial space? So it's, you know, it's an exciting time right now and scary and all that other stuff but, you know, data has always been a part of it, will always be a part of it, it's always gonna be called information technology. Watch, 10 years from now we're gonna change the name and we're gonna put up this quote, who's this idiot, right? But it's always been called information technology and so that's why, we're just coming back to where we've always been. What things excite you, professionally, personally, technology-wise? So that stuff excites me, guys, you can probably tell, I'm jumping around. Yeah, so professionally, I'm a geek at heart, you know.
I've always been into technology not so much as an industry but, regardless of what industry I've been in, it's always been about what technology can do, right? Along those lines I think it's interesting to think of how that's cannibalizing itself. It's really not a technology industry anymore, it's just that kind of everything has a technology component to it. Your car, right? There's a whole lot of technology in there, right? Now it's wifi cars, so to me, technology eating the world is something that I can ruminate on for a long, long time. Personally, all of my personal interests point back to technology, right? Sports, right? Big Yankee fan, and I was really into the World Cup, like most Americans every four years, getting more and more into it. All right, we love soccer. Yeah, love it, yeah. All right, and I'd say football when I'm in the right place. So for a global audience I probably have to say many things. But technology there as well, so my kids, the games they play, I've got one of my sons, he's at that perfect Minecraft age. Yeah, he probably sees my kid. Oh, you're probably right. Yeah, maybe was your kid the one who blew up his house? I don't know, something happened. Yeah, so really the fusion of what my personal interests are and my professional interests are is just that, yeah, technology is just an integral part of everything. It's not a single industry sub-segment. I like to think about that a lot. I like to worry about that from time to time. I have kids, right? So what does that mean? The rules are changing much more quickly. So, yeah. Cool, Ken, it was really a pleasure having you on and great to see the tailwinds that are at MarkLogic's back now and you guys participating in earnest in this space and really making a difference. So thanks for coming on, appreciate it. Great, thank you. All right, keep it right there, everybody, we'll be back with our next guest. This is theCUBE, we're live from Big Data NYC.
Right back.