Live from Cambridge, Massachusetts, it's theCUBE. At the MIT Chief Data Officer and Information Quality Symposium, with hosts Dave Vellante and Paul Gillin. Welcome back to Cambridge, Massachusetts, everybody. We're here at the MIT Information Quality Symposium, the Chief Data Officer Conference, the leading, I think, event on the whole issue of information quality and the emerging role of the Chief Data Officer. I'm Dave Vellante of Wikibon. I'm here with my co-host, Paul Gillin. Thanks, David. I want to introduce our next guest, because we go back a long way. I've been in this business 32 years. It was in 1982 when I started at Computerworld. And there's one person who has sort of dominated the landscape of data warehousing, data management, for all that time. And you don't see that many people who have held that kind of position for as long as Bill Inmon, known as the father of data warehousing, the author of 52 books. And it turns out that we have an intersection point in our careers, so long ago that I just found out this morning, Bill. Well, gee, Paul, I never forgot. Because many years ago, when I was starting my career, the first publishing adventure I ever had, you were my editor. And listen, I remember you. I just remember you don't ever meet. Well, I remember you, you're a god, Bill. In this business, I just don't. And I remember seeing your name in Computerworld, but I didn't remember that I had played some small role in your success. Data warehousing certainly has shifted, come of age, perhaps it's morphing into something else right now. Some people would say data warehousing never really lived up to its expectations. What would be your opinion of that? Well, I think if you take a look at the needs of corporate architecture, I think data warehousing, at least in my perspective, certainly has lived up to its expectations. But on a personal basis, I really haven't done anything with data warehousing in over a decade. 
So I've kind of left data warehousing behind. I still have many fond memories, and I still occasionally talk with people on the subject, but it's something that certainly is of interest to me, but I'm not actively involved. But what you are actively involved with right now through your company, Fox Run, is text analytics and mining text for context and meaning. Big problem that many, many smart people have been attacking for a long time. Have you finally cracked the code? I think we have. We've taken a different approach. The approach that we've taken is that in order to do a credible job of managing text, you've got to start with context. And unfortunately, context is difficult to find and deal with. It's taken myself and my researchers, I would say, 12 years of starts and stops and a lot of failures along the way. But we're now at the point where, in terms of dealing with text at a very profound and deep level, we're able to do that. And can you give us an example of how your technology is being used right now? What are some real-world examples? Some real-world examples are, number one, in terms of call center data. Most corporations have call centers. Most corporations can tell you how many calls they get and how long the calls are. But that's about all they can tell you. And unfortunately, the conversation between the company and its customers is very important information. And so what we're able to do is go in and understand that information and organize it so you can put it into a database. Another interesting arena that we're working in now is in health care, being able to take health care records. Health care records are kind of interesting in that you must have health care records in a narrative form. Why? Because doctors and nurses need to look at that narrative information. And so it's absolutely mandatory that health care records be in a narrative form. The problem is that data that's in a narrative form cannot be effectively analyzed by a computer. 
And so in order to get a proper analysis done, you've got to take that narrative information and turn it into a form that can be analyzed. So there's a lot of other examples out there, but those are two of the more interesting examples. So I wonder if you could talk a little bit more about sort of this initiative that you're taking on. So it sounds like you can ingest voice and presumably social data and diverse data types. And you're bringing them into some kind of data store and then performing analytics on that. Can you talk a little bit more about sort of what it looks like and maybe we can talk about the secret sauce? Well sure, the data in the end analysis looks like a standard database. It looks like any other database that anybody would have, and in fact from a usage standpoint, the end user doesn't know that it came from anywhere different than anywhere else. And so from a standpoint of what does it look like, it looks like any other analytical data that you've ever found. Now, there is this issue of where does the data come from, in terms of voice, in terms of OCR, optical character recognition. Those are some of the issues related to the subject of getting the data. And I'm going to have to say that the technology of voice recognition and transcription is one of the links in the chain of being able to do what we need to do. So is the secret sauce in the voice recognition? Is it in how you handle the diverse data structures? Both? Can you talk about that a little bit? The secret sauce, if there is such a thing as secret sauce, it's in the ability to find and understand context. And context of text is what gives us the capability to do a profound analysis of text. And the difference between what we do and, say, search processing is that we don't do search processing. We go the next level and understand context. 
Once you understand context, then you're in a position to do analytical processing at a very different level than you've been able to do it before. Yeah, well, I've always said search is kind of like a blunt instrument that people use because they can't figure out how to solve this problem that you presumably are solving. But I'm still really interested in how you're doing it. If you can share that, is it a categorization? Is it semantics? Is it some new thing that you've invented? Is it math that's existed for decades? Number one, it has taken us 12 years' time to learn how to do it. Number two, there is no one algorithm. Text, language, is an inherently complex subject. And we take language for granted. Why? Because we speak it, because our mothers taught it to us when we were one year old, and to us, language is very natural and normal. But when you start to put language into a computer, it's anything but natural and normal. And so when it comes to the question of how do you do contextual analysis, you have a hundred different ways that you do it. And why? Because in language, there's a hundred different ways that a context occurs and appears to us. So there is no one algorithm. There is no one way that you do it. The last time that myself and my group of researchers had a conversation, I think we're up to about 62 or 63 separate little algorithms that say, in this case, this is how you do it. In this case, this is how you do it. And so, in the time that we have here, I could not possibly begin to tell you, well, you do it this way, because there is no one way that you do it. Let's take an example. And by the way, I want to apologize. I mispronounced the name of your company. It's Forest Rim Technology, your company. If we were to take a transcript of what we're talking about right now and feed it into your engine, what kind of contextual setup would it require and what could you give us out the back end? 
Okay, the context of our conversation would be, are we in Boston? Are we doing a TV show? Are we at a conference on data and data quality? Are we grown men talking? Is it cold? What time of day is it? Is this a professional conversation or a casual conversation? Those are all of the kinds of things that make a difference to the context of the conversation that we were having. And you need to be able to take those different factors and apply them to the text. So if we had a transcript of what we're talking about, we would take these external factors and say, okay, this is what's going on here. Now, given that this is the context of what's going on, when I'm reading the text, this is how I interpret this to mean this. And so this is what our technology and our software does. Then you take this information that we have created and say, okay, this is the proper and appropriate interpretation of what's being said. We're now gonna take that and put it into a standard relational database. Once I put it into a standard relational database, I can use a hundred different tools. I can use QlikView. I can use Tableau. I can use SAS. I can use Business Objects. I can use a bunch of tools to do the analytical processing against it. And so the trick is, number one, we have to have the content of what we're talking about. But then we have to be able to factor in all of these other external factors that make a difference as to what we're talking about to be able to appropriately interpret the meaning of the words. So I wonder if you could talk a little bit about your company. Yes. Forest Rim Tech. Where are you at? You've been working on this for 12 years. You began the company 12 years ago? That's correct. So now you've got a product. That's correct. You're selling the product. And you're self-funded? Is that right? That's correct. I own the company. Awesome. We love that. No VCs. No VCs. So where are you at with the company? I mean, you got paying customers. 
Can you talk about that a little bit? We are... Working with the NSA, I presume. I know you can't say. Most of us live and work in Colorado. I have partners in Dallas, Texas, and I've got some people in Chicago. But most of us are in the Denver area. Number one. Number two, we do have paying customers. We've got some very large companies. We are working in the area of healthcare. We're working in the area of call center for corporations. And then we have some other very well-known corporations that we're doing other work with. I'm not at liberty, especially on television, to start to go into the names of the corporations, but I promise you, you would know them. And I should mention, this is your third startup, the first one you took public, the second one you sold. So you have a pretty good track record in this area. Right. You probably want to be changing the subject here a little bit, Bill, but you're also very well known as a prodigious writer, 52 books, remarkable. Your latest one is about data science, I understand. The data science. Tell us a little bit about that book. Sure. The publisher of the new venture is Elsevier, Morgan Kaufmann, I'm happy to say. And it's coming out in the month of November. It's a book that talks about data science from the standpoint of fundamentals. I think there's a lot of conversation with big data and data science going on out there. And I think there's this tendency, at least for the people that I've talked with, in data science to be rather cavalier with regard to what I call fundamental information, things that they need to know. And so, that was the inspiration for writing the book. A friend of mine, Dan Linstedt, and I sat down one day and he says, gee, Bill, these people are all talking about big data and data science and all this kind of stuff. 
They should understand, they should know, before they get into it, that this is how things need to be structured, how things are structured, et cetera, and so forth. And so, Dan Linstedt is a co-author of the book. Dan Linstedt is known for something called Data Vault, and he's contributing that part to the book. But it's really a book for the data science person because you take a look at big data. You've got corporation after corporation building things with big data. There is this need right now for people that have a deep understanding of all the technology. And so, this book that we're producing is really a foundation book. It's not a book on how to do statistical analysis. It's not a book on different techniques of analysis. It's a book on how things ought to be structured in terms of volume of data, in terms of the physical media of data, in terms of the meaning of data, and all of the things that we work with. So, my last question is, a lot of young people in the audience interested in data, we're always talking about getting involved in data. If you like math, statistics, data's the place to be. What advice would you give to young people that are interested in this field? Let me tell you, I have a young lady that is the daughter of a friend of mine and she's going to college, and she and I were talking the other day, and she's an ambitious young lady and a smart young lady. And she came to me and she said, Bill, she says, what could you tell me? What advice could you give me? I want to succeed. And I said, look, I said the demand for data scientists is already so large, and it's going to grow exponentially, that if there's one thing I could tell you to do, it's go be a data scientist, go learn what it takes to take data and how to take that analytical data and turn it into useful business value and information. I said, if you do that, you're going to have a happy, successful career. We talked about, of course, the relational era. 
Some people think we're moving into a post-relational era. Maybe it's a NoSQL era or it'll be unstructured text. Data will be managed in less structured forms in the future. Do you see new technologies in the database area that excite you? That is a difficult question. I think, yes I do. I think there are some new technologies that are most interesting, but I think the thing that I don't care for in that question is the emphasis. I think the emphasis in the future is going to be not on the technology and database. Not that we don't still need them. They're still going to be important, but it's going to be like turning the light switch on and off. When you turn a light switch on and off, you don't need to know that back in the wall is a wire and this wire has got electrons and what those electrons are doing. Somebody needed to figure that out years ago and it's important that it's there, but in today's world, we're more interested in the light switch and what it turns on and maybe the decor and stuff like that. I think that yes, there are some interesting database technologies that are coming along, but I think that the emphasis in the future is going to be much more of an emphasis on, can you provide to me a measurable business value? I think that that's much more exciting than the technologies themselves. And we hear a lot of talk about abstracting data at a higher level, about using tagging, semantics, metadata to really create industry standards that a lot of different companies can use. Standardizing data across entire industries the way the manufacturing industry did many years ago. Do you see this coming together at a higher level? Are we going to see more efforts to standardize data so that it can be shared easily across industries? Paul, when you take a look at standardization in the computer profession, standardization is done from commercial usage of a product. 
It's not done from committees of people sitting around saying we're going to do this and that and the other. The truth of the matter is that Bill Gates, Larry Ellison and IBM, for the most part, set the standards, but they don't sit down and think, well, I'm going to create a standard. They sit around and say, I'm going to create a product or a line of products, and then the product becomes so pervasive it becomes the standard. So yes, there are standards coming, but I believe that the way standards are going to be set is much more in product acceptance. I guess I've seen in my lifetime so many committees and so many people, and those standards that people set with committees, they just never seem to happen. By the time they come together they're irrelevant, they're obsolete anyway. Absolutely. Looking at data warehousing, I know it's been a decade but I want to come back to data warehousing and where that's going. We're seeing the evolution of distributed databases through Hadoop, whole new approaches to data warehousing, the data warehousing concept. Does this excite you? Do you see a lot of greater potential in these much flatter, more distributed technologies? I think that data warehousing is becoming one of those switches, or one of those wires in the wall. I think that at one point in time we in the industry were interested and concerned with what the wires look like and how many electrons were flowing. I think that was at one time of great interest and importance, but I think that as time passes we're still going to need the wires, we're still going to need the electricity flowing through the wires, but it's going to be in the wall, hidden, and someone's just going to come along and turn the light switch on. So I think in terms of interest level, I think data warehouse has a diminishing interest level. It doesn't mean that it's going away, it doesn't mean that it's not important, it just means that it's become a standard part of the infrastructure. 
Just very quickly, the term big data, is this a meaningful term to you, or do you think it's just old wine in a new bottle? No, I think to me it has very much a definite meaning, and I think I use the Silicon Valley accepted understanding of big data. IBM wrote a book called Understanding Big Data, which I believe IBM gives away for free, and I think that in that book, IBM pretty much described what big data is. In a recent Wikibon survey, only 5% of people that we surveyed said that big data was a buzzword of unclear meaning. We asked that same question five years ago of cloud and I think 95% of the people said cloud was a buzzword of unclear meaning. So I think most of the community agrees with you, Bill. So hey, thanks very much for coming on theCUBE. It was really a pleasure having you. Thank you so much. It was great to meet you. Thank you, David. All right, keep it right there. We'll be back with our next guest. We're live from MIT. This is theCUBE.