Live from Boston, Massachusetts, it's theCUBE at the HP Vertica Big Data Conference 2014. Brought to you by HP. With your hosts, John Furrier and Dave Vellante.

Okay, welcome back. We're here live in Boston, Massachusetts. This is theCUBE, our flagship program. We go out to the events and extract the signal from the noise. I'm John Furrier, with Dave Vellante, here at the HP Big Data Conference. Our next guest is Janath Manoharaj, Team Lead, Database Services at Blue Cross Blue Shield. We were just talking databases earlier with the Vertica product team, so there's huge contention between Hadoop and the database world as they come together. Give us your take first, not at Blue Shield but in the industry: your view of the database collision with big data. Is it going to be a war of religions? Is it going to be a war of formats? What's your take on it?

You know, before, our old style was, you have the relational databases, and then you have this big data that was so new, and we kept it all distinct. But later on we have seen, even in Blue Cross Blue Shield, that we are all converging. One cannot live without the other. So it's pretty much, you know, like we do data warehouse augmentation, where you put all the data, never purge it, but put it on the low-cost disk, and whenever you need it, you use Vertica, for example, to get the data from it and process it, and then put it back to HDFS, which is pretty much Hadoop. So it's pretty much converged. That's what I see.

So in thinking about your business, the claims apps, the agent systems, they are and have been the lifeblood of the organization, with the processes developed around those, and then all of a sudden this big data theme comes in. So how are you using data historically? Maybe talk about your enterprise data warehouse and how that's, you know, transforming?
What kind of journey are you on with regard to big data?

You know, the old way, even a couple of years ago, each major project would have its own warehouse. It was like data everywhere. One customer comes in, for example one Blue Cross Blue Shield plan, and its data is all in its own place, and another's is in another, and we had multiple copies going back and forth. The total cost of ownership was high because the data is replicated, and you need storage for the multiple copies. But with Hadoop, it's the cheapest solution that I can think of. You put all the data in, it's like one unified repository, and it has pretty good security where you can say, let's isolate this part of the data. So with all these isolation levels, it really helps us in bringing the cost down and also increasing performance.

Okay, so you joined Blue Cross Blue Shield, you're based in Chicago. What's the name of the organization?

Blue Cross Blue Shield Association.

Association, okay. And what does it cover, Chicago, the Midwest?

Yeah, yeah. Blue Cross Blue Shield Association is a federation of 37 independent, community-based Blue Cross Blue Shield plans, and it's spread over like 170 countries. What it does is it owns the trademarks, grants the licenses, and everything, so if you want to use the name, you would need to be licensed by us.

So you joined in 2008. Back then nobody knew what Hadoop was. I'm not even sure if Hadoop was invented by then. When did Doug Cutting come up with the idea, John? It was probably after that, right? But even in 2010, it was: what's Hadoop? In 2011: what can I do with Hadoop? So when did you guys see the potential, as an organization and you personally? Take us through that.
Yeah, you know, before, this is pre-'08, we used to have all the big three databases, the big established vendors like IBM, Oracle, SQL Server, and even now they are big, and we have all their warehouses and things like that.

One of each.

Yeah, one of each. But the thing with all these big three vendors was, it was a one-size-fits-all kind of thing. If it's OLTP, like banking transactions, things that are shorter and very fast, it's the same warehouse or database. And if you wanted to mine the data, you pretty much used the same thing, and that was kind of a problem for us because one size doesn't fit all: for some it works, for some it doesn't. So what we ended up doing, in '08 or so, was we started bringing in Vertica, pretty much for ad hoc processing. And it really went well with Vertica. The performance was at least ten times faster than the previous one that we were using.

Which was a traditional database, an RDBMS or DBMS.

So the move was from traditional DBMS into columnar-based. That was our first step. And then moving forward, now we are thinking about creating a data lake, like one unified repository, and also how to include Vertica and Hadoop integration. Those are the big things, and we want to learn more about Vertica-Hadoop integration. So that's why we are here.

Okay, and so... go ahead, John, sorry.

I was going to ask about some of the health care issues. I mean, honestly, Blue Cross databases are huge because now you have to slice and dice everything. There are all kinds of restrictions and things that you're asked to do from a compliance standpoint, and yet you've got to be innovative. How do you balance that out? Share your perspective as someone who sits in the database every day.
What are the issues that you're facing as you solve these new problems? What are the new problems that you're solving? And how does that relate to some of the things in health care that are unique?

Yeah, you know, in health care it's pretty much all provider claims. I mean Medicare, Medicaid, all medical and pharmacy related data. And the problem was, a person goes on the website and clicks on a button and says, within a five-mile radius, give me all the providers, right? So the information needs to be instantaneous. And that used to be a problem before. I'm just giving you one example. And with the newer columnar store and things like that, we have seen like a 10 times performance boost by going Vertica's way.

This seems to be the big database scuttlebutt going around in the industry, that Vertica's got a badass solution in the database.

Yeah, exactly. I think it's one of the best databases that we've worked with. And even in the POC, right from the proof of concept, it was very much better. The things that used to take like 40 hours, we pretty much take 40 hours in a year.

You know, HP, we always give HP credit for things, but we also give them some critical feedback, which is that they don't market very well. They can't get the word out. They like it in their own way sometimes. I mean, they have praise from all their customers about how great this database is. It is the Ferrari of the industry. But the interesting thing is that there's so much FUD going on around databases because the functionality is changing. IBM has been a big leader in databases for a long time. Certainly Hadoop is open, scale-out, open source. But now you have critical deployments. So in the context of the IBMs of the world and open source, why is Vertica so successful in your opinion?

Well, what I think is, first, the performance is good.
The second, low cost. The third one, I mean, we were early adopters of Vertica, and so far the support has been really good. I even see the support manager, like Amy Miller, calling us, asking if there are any problems. It's a more proactive style of support, rather than you place a call and then they call back. So far all these things really help.

So you talk about building this data lake.

Correct.

I presume that's a virtual data lake, right? You're not going to try to put all the data in one place, or are you? Describe that initiative.

Yeah, you know, the thing is, I can't provide much information because I believe it's proprietary.

Oh, okay. I'm sorry about that. Go ahead, no, share all. I'll talk. But just conceptually, should we be thinking about a data lake as a God box, or sort of a distributed set of data?

Yeah, it's pretty much like you said, a combination of virtual as well as physical, because with all these specialized databases sitting around it, you have a big lake, and it would be the source that feeds all these subsystems. So I'm telling you at a high level.

Okay, we have to be careful there, I understand. But now in terms of Hadoop, are you using open source Apache Hadoop? Can you tell us, are you using a vendor distribution? Are you paying for it? What are you doing with Hadoop?

Well, you know, we are still looking at it. I can't divulge more details, but we are looking at all the big four, like IBM, Cloudera, MapR, and also Hortonworks. And in particular, we are interested in this 50 million, what do you call that, the agreement between HP and Hortonworks. I wanted to learn more about it. And also...

The H in Haven, does it stand for Hortonworks now? Or is it still Hadoop?

Yeah, and also we're pretty much impressed with MapR, their platform, you know, the custom MapR file system.

Okay, so you are using Hadoop today, correct?

Correct.
Okay, and you can't tell us who you're using, but I presume it's some kind of open source distribution. And then how much data are we talking about for your big data projects? Are we talking terabytes, petabytes?

I think we would start with terabytes and easily move towards petabytes. We have so much data that's all distributed now, so easily in petabytes.

Are you bringing in outside services to help you do this stuff, or are you kind of doing it all in-house?

It's a combination of both. In the beginning, for example, with Vertica, we got all the HP folks working with us, I mean, the Vertica folks, since at that time it was a separate company. But as time went on, we got on-the-job training. So it would be the same thing with the Hadoop environment too.

Okay.

But we are not, because Hadoop is evolving so fast. In particular, before, in Hadoop version one, everything was like single threaded, you know, you can't run in parallel, but with the newer Hadoop two or things like that, you can run in parallel, so.

Now, you mentioned you had a lot of different data warehouses, and I'm sure you had data marts, and you have a lot of different data types that you're working with, structured and unstructured. So are your big data projects bringing together diverse data sets? Are you primarily working with singular data sets, or are you bringing in multiple data sets?

I think we'll start with some projects, some data sets, and eventually it would be all in one.

Okay.

That's the plan.

But these are diverse data sets that you're actually having to work with, so how do you deal with the data integration challenges?

You know, it's the big thing. That's why we have all these proofs of concept. Some of the tools have native capability to do it, but I think we're still working on it. It's a work in progress.

Do you have a chief data officer?

Currently we have like two teams that manage this thing.
One is, we call it BPS. It's like business protection services, in combination with the enterprise architecture. They fill this role that you mentioned.

And that's an IT function.

You're correct.

Okay, so the de facto chief data officer sits within IT.

Right.

Okay, and then I wonder if you could think about this: you have $100 to spend. Obviously you spend more than that on your technology, but if you had to look forward and think about allocating that $100 between, say, traditional enterprise data warehouse and more modern architectures, how do you see that spend shifting over time? Will it largely stay the same? Is it going to shift dramatically toward the new modern stuff? I wonder if we could talk about that.

I mean, the way it's going, it's more towards the latest and greatest, you know, the open source solutions along with the specialized, niche databases like Vertica and things like that. So yeah, I feel we are moving towards that.

So culturally, the move to open source, how are you dealing with that? Are there drawbacks of open source that you have to manage? Do you have to change your processes or your skill sets to deal with that?

Yeah, open source has always been a challenge, you know, just like Java was open source compared to .NET, which was proprietary. So there has always been this learning curve, which would be steep. With open source, you would need to have training with one of these big vendors to avoid making mistakes when we go live. So what I've seen with open source is that the learning curve is kind of difficult. You get more releases and things like that. But with proprietary, it's much easier. So that's the trade-off.

So, last question from me: any advice you would give to your fellow practitioners trying to move from the sort of old world to the new world?
What's the one action item you would ask them to take?

Yeah, one of the things that I learned was that, after we do careful analysis, some projects really fit the new world, where you can use Hadoop and other environments, the latest offerings. But for some, like banking and OLTP things, I would stick with the old. I mean, it matters. It depends on the workload and things like that. But if it's like heavy-duty mining, then we will.

Workloads matter.

Correct.

Thanks for coming on theCUBE. Really appreciate you sharing your perspective. We, you know, we give Vertica grief here and there, but they are doing great. Thanks for sharing your insights. And again, the database is where the action is, and it's really interesting right now. It's only going to get more complex, but more important. And with mobile and cloud on the horizon, multi-tenant security, we haven't even gone there. It's just going to get more exciting. So if you're a database geek, it's a great time to be in data. This is theCUBE, live in Boston for the HP Vertica Big Data Conference. We'll be right back with our next guest after this short break.