Live from San Francisco, California, it's theCUBE, covering MarkLogic World 2015. Brought to you by MarkLogic. Here are your hosts, John Furrier and Jeff Kelly.
Okay, welcome back everyone. You are watching theCUBE, our flagship program. We go out to the events and extract the signal from the noise. We are live in Silicon Valley from MarkLogic World 2015. I'm John Furrier with SiliconANGLE. I'm joined by my co-host, Jeff Kelly, big data analyst at wikibon.com, and our next guest is Jeremy Bentley, CEO of Smartlogic, an entrepreneur with a growing company in the fast-emerging space around metadata, cloud, search, finding stuff in the enterprise. Welcome to theCUBE.
Thank you.
So what better way to jump right in than to talk about cloud, agile, unlimited compute, flash memory, flash technology, basically technology to make things go faster and faster, at your fingertips, which might speed up the content-type market. We're going to get into that, but I want you to explain a little bit about Smartlogic and what you do, because there really is some innovation going on in how things can be surfaced, in discovery. You know, information discovery is not just for consumers, it's also for systems. I want to get into that in a deeper way, but first start with what you guys do.
Okay, well, I might start with a business scenario. The users, the people who run businesses, don't say, "Oh, I'm really glad you put all the data to do with the balances of my customer in this database, and then put all the documents and information to do with my customer in this content management system." So you already start to see an unnatural separation between what people refer to as structured data sitting in a database and unstructured content sitting in a content management system and being surfaced by a search engine.
And of course, as you get into the restrictions of that, you can ask a database a question, and we'll talk about the changes there, but you can't ask a PDF a question. So you already start to see that information held in content is not available for analysis or processing by another computer. You actually have to read the PDF. The information in content is very valuable, but it's unavailable. You have to give it to someone and say, go off and code into a database the information that's in this content. So what Smartlogic is about is unlocking the data that is in the content, right? And when you do that, it's hugely beneficial. Why? Because 80% of a company's information is in content, and yet the content itself is so restrictive that you can't use it for analysis. You can't process it. You can't automate content-based processes. And that really comes back to a little bit of history, which I think is part of the data versus content argument. We started life with hierarchical databases. Well, actually we started life, Gramercy III had information and he had it on paper. At some point in the 50s, we started to code really useful bits of information into databases. They were restricted: you had eight bits, 16 bits of memory. And so you could only take the really top pieces of information and put them into this arcane system called a hierarchical database. 30 years later comes the invention behind the massive $40 billion market that databases are today, and it is to do with the concept of self-description. The hierarchical database gave way to SQL. SQL has a self-describing nature. Other computer systems can talk to that database. You can write reports. And that really was the beginning of what we call modern IT.
By doing it, you've got this benefit, self-description and therefore automation, but you've also still got a restriction, which is that the schema, the way the database is structured, means that you have to presuppose the question before you ask it, right? And then of course we now move to the post-relational and we move into NoSQL. So that's the data side. Now let's look back at the content side. Well, as data was moving off reports into databases, the reports themselves were still in filing cabinets for at least another 15 years. They then moved into image management systems. They moved from image management systems into file shares. They moved out of file shares into content management systems. We invented enterprise search to sit on top of those. But actually it's all batch and brittle systems. By batch and brittle I mean you can't change these systems and they're not self-describing. And until the information in content becomes self-describing, we don't get the same facilities over that content that you get over data, okay? So self-description is the basic art, is the basic-
Because data can't talk to each other.
Data in databases can talk to each other.
But if they're in different databases-
Well, content can't talk at all.
Yeah, fine.
So Smartlogic's job is to make the information in content, or to make the content, self-describing.
So it's addressable.
So it's addressable.
Okay, got it.
So once the information that's locked up in content becomes self-describing, you can start asking questions of it. You can ask your PDF, what are you doing?
It's kind of like an IP address. Once it's addressable, you can do things with it, right?
The data, so this is content, right? I don't know what this is. This is a report, right? And this report-
Hold it up a little bit.
This report has a huge amount of information in it.
But in order to access it today, you have to go and read it. If you want to know who signed off the minutes of the meeting, you actually have to go and read it. But what if you could query the report like a database and say, who signed off the minutes of the meeting? Who was present at the committee, at the hearing? What date was the committee founded, or rather organized? Were there any dissenting voices? Well, you know, this is data. This is data as valuable as the information that's sitting in the database. So the thing is, 80% of the information is in this form. It's not in the database. So as we move into, if you like, the true information era-
Is that because the content management systems are so lame and old?
That's because they are designed to hold on to pieces of paper. The best they can do is find the piece of paper and make you read it. But I think the time has come. There's a confluence of technological change. And the first is, if you like, the ability to process language, right? The ability to read and understand the meaning of the information in the report. That's moved on significantly. I mean, this stuff didn't exist before 2007. The volume of content that is going into these content management systems, even if they were self-describing, which they're not, means that they're all beyond spec, because they were never designed to hold on to terabytes of content. And if you start to look at big industrial processes and the compliance tasks and the paper trails that organizations have, which they've now automated into content management, this data is not available for analysis because it's stuck in paper, or virtual paper, because it's in a content management system. So yes, the answer is content management systems are batch and brittle applications, right, that don't self-describe.
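[Editor's sketch, not part of the interview. One way to picture "querying the report like a database" is to treat facts already pulled out of the document as fields on a record. The `MeetingReport` class, the `ask` helper, the question phrasings and the sample values below are all hypothetical, and the extraction step itself is assumed to have happened elsewhere.]

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class MeetingReport:
    """Hypothetical record of facts extracted from a minutes document."""
    signed_off_by: Optional[str] = None
    attendees: List[str] = field(default_factory=list)
    meeting_date: Optional[str] = None  # ISO date string
    dissenting: List[str] = field(default_factory=list)


# Maps a normalized question to the field that answers it (illustrative only).
QUESTIONS = {
    "who signed off the minutes": "signed_off_by",
    "who was present": "attendees",
    "when was the meeting": "meeting_date",
    "were there dissenting voices": "dissenting",
}


def ask(report: MeetingReport, question: str):
    """Answer a question by looking up the matching extracted field."""
    key = question.lower().rstrip("?").strip()
    return getattr(report, QUESTIONS[key])


# Made-up sample data standing in for what an extractor might produce.
report = MeetingReport(
    signed_off_by="A. Chair",
    attendees=["A. Chair", "B. Member", "C. Member"],
    meeting_date="2015-05-05",
    dissenting=[],
)

print(ask(report, "Who signed off the minutes?"))  # prints A. Chair
```

[Once the fields exist, the questions become lookups rather than reading tasks, which is the sense of "self-describing" used in the conversation.]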
So how does Smartlogic go about actually breaking down those barriers and making content accessible, and tying that to structured data so you can make better decisions?
Well, I think this is part of the confluence of technologies coming along. We've moved now into the post-relational era, and we have the concept of a fact, an RDF triple, which is, if you like, the atomic level of fact. So what Semaphore, which is our product, does is it goes through the content and turns the facts in the content into data in the form of RDF triples. So if you've got data coming out of your data warehouses and going into post-relational systems as RDF, and you can take the 80% of the information that's in the content into the RDF format, now you've got a common format for all data to sit in a unified information store, if you like, which is incredibly valuable. So you can start to answer questions which very few banks, you know, no banks can answer, such as: give me a list of all my customers who've got a balance of over $20,000 and who wrote a letter of complaint last week. That is an impossible query to ask, and it's only the opening example of how much content-based information is not available. So Semaphore's job is to apply intelligence, that's why we call it content intelligence, in order to pull out the data in content. And what does that mean? It means you have to be able to model the subject matter of the business, and the words associated with models are things like taxonomies and ontologies. The reason they're important is they're a very good container in which to model subject matter and that sort of stuff. You then have to interpret those ontologies. You then have to be able to classify the content as it flies past the system.
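[Editor's sketch, not part of the interview. The bank question above can be illustrated with facts held as (subject, predicate, object) triples, the shape RDF uses. This is plain Python rather than a real RDF store or SPARQL; the identifiers, predicate names, balances, dates and the `rich_recent_complainers` helper are all invented for illustration, and "last week" is assumed to mean the previous seven days.]

```python
from datetime import date, timedelta

# An arbitrary "today" so the example is reproducible.
TODAY = date(2015, 5, 6)

# Each fact is a (subject, predicate, object) triple. Plain strings stand in
# for the IRIs a real RDF store would use. All values are made up.
triples = [
    # Facts that might come from a relational system.
    ("customer:1", "hasBalance", 25000),
    ("customer:2", "hasBalance", 5000),
    ("customer:3", "hasBalance", 40000),
    # Facts that might be extracted from letters in a content store.
    ("letter:17", "writtenBy", "customer:1"),
    ("letter:17", "hasType", "complaint"),
    ("letter:17", "receivedOn", date(2015, 5, 4)),
    ("letter:18", "writtenBy", "customer:3"),
    ("letter:18", "hasType", "complaint"),
    ("letter:18", "receivedOn", date(2015, 3, 1)),
    ("letter:19", "writtenBy", "customer:2"),
    ("letter:19", "hasType", "complaint"),
    ("letter:19", "receivedOn", date(2015, 5, 5)),
]


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is a known fact."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def subjects(predicate, obj):
    """All subjects s such that (s, predicate, obj) is a known fact."""
    return [s for s, p, o in triples if p == predicate and o == obj]


def rich_recent_complainers(min_balance, today, days=7):
    """Customers with a balance above min_balance who complained recently."""
    cutoff = today - timedelta(days=days)
    found = set()
    for letter in subjects("hasType", "complaint"):
        received = objects(letter, "receivedOn")
        if not any(cutoff <= d <= today for d in received):
            continue  # complaint is older than the window
        for customer in objects(letter, "writtenBy"):
            if any(b > min_balance for b in objects(customer, "hasBalance")):
                found.add(customer)
    return sorted(found)


print(rich_recent_complainers(20000, TODAY))  # prints ['customer:1']
```

[The point of the sketch is only that once balance facts and letter facts share one format, a single query can join across them.]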
You need to be able to extract all the facts, the entities, potentially the sentiment. And then you need to be able to visualize that information and co-mingle it with stuff that might be in the database. So that's the confluence, then: post-relational information stores, the improvement in natural language processing and the ability to actually do this stuff that I've been talking about with content. And I think the last one is the concept of the semantic web, the concept of open linked data, where people are sharing their ontologies and making them available, where the information itself possibly doesn't become so valuable, but the question that you're asking over the information becomes the biggest piece of value. And that also is philosophically really quite big, right? Most people think about locking down their information, their data. Actually, as you get into open linked data and open shared information, it's not the information that's the most valuable, it's how you use that information and what questions you ask of it. It's the insight, it's not the data.
Jeremy, we're getting the hook here, we're getting the pull from the producer, but I want to give you the last word. Share with the folks out there real briefly: what about MarkLogic should they know that they might not know? What's the key secret sauce in MarkLogic? What makes them so good?
Well, MarkLogic, like Smartlogic, a lot of logics here, they're hugely successful at providing NoSQL databases for enterprises. You know, an enterprise is a difficult thing. It's volume, it's security, it's speed, right? It's reliability and all those things. And so, very similarly to MarkLogic, Smartlogic works on the same principles.
We are well known for speed, for enterprise-ness, if you like, and the ability to deal with the volumes and the speed and the reliability and the security and things like that. So we make a great combination, the two, and we've done lots of projects together and we're looking forward to doing more.
Okay, we are here live in Silicon Valley at MarkLogic World 2015. This is theCUBE. I'm John Furrier with Jeff Kelly from Wikibon. We'll be right back after this short break.