from Boston, Massachusetts. It's theCUBE, covering HPE Big Data Conference 2016. Now, here are your hosts, Dave Vellante and Paul Gillan.

Welcome back to Boston, everybody. This is the HPE Big Data Conference, hashtag Seize the Data, and this is theCUBE, the worldwide leader in live tech coverage. Mohan Vergera is here. He is a BI architect and Vertica database administrator at Eliza Corp., out of the Boston area, Peabody, Mass. Mohan, welcome to theCUBE. Good to see you.

Good to be here.

Thanks very much for coming on. So this is your second Big Data Conference.

Right.

What do you think of the event? Are you learning things? Are you excited about the discussions with your peers? What are your takeaways?

Right, so it's great to be here, because the technology has been changing rapidly, and Vertica is adding features pretty rapidly too. So it's good to be here to learn all the new stuff they're presenting. And I'm excited to be here. I was here last year too. Again, it's a pretty rapid change of the product itself, and the way they're adding things is pretty good. We want to take advantage of all the capabilities they provide in the product. So that's the main reason that brings me here. And we are six, seven people here from Eliza.

The team. So you say changes. Do you mean things they've introduced in 8.0, or is it cloud? What are the changes that are most significant?

Right. When we started using Vertica, most of the customers we talked to were running Vertica on-prem. But we started with Vertica directly on the AWS cloud, Amazon Web Services. And the primary thing we want to learn here is how Vertica is doing things on AWS, because it's a pretty new market for Vertica. Again, I heard from a lot of customers it's an on-prem thing, because it used magnetic hard drives to store data sequentially for performance and so on.
To answer your question, we have been learning a lot from Vertica at this conference about how they are doing things on the cloud. That's pretty good, because we can take advantage of AWS snapshots, which is a cheaper way of doing backup on S3, rather than spinning up one more AWS Vertica cluster and loading it simultaneously for backup and recovery.

So tell us, I should have started here, tell us about Eliza, what the company does, some of the key drivers that are driving IT.

Sure. Eliza is a health engagement management company. We have 100-plus customers, and most of them are healthcare and pharmacy companies. Our primary goal is basically to reach the right member at the right time via the right channel. We are multi-channel: we reach members through IVR, which is phone, through email, as well as text.

Okay. So I would be a customer of my health plan, and the health plan would contract with you to manage communications with me. Is that what you're saying?

Yes, more or less. Right, so we get data from our healthcare customers. If you are one of those customers' individual members, we get your data and then we'll reach you on their behalf. For example, if it is flu season, we might call you: go get a flu shot, your area has reported too many flu cases. So we basically engage you, we capture your data, and we'll study you down the road in order to reach you at the right time and in the right manner, because we have this multi-channel thing. When we talk to you, we capture things like: what is your preferred mode of communication? If you say email, then we don't call or text you, we email you. Same thing for text. And if you prefer to communicate through the phone, we'll call you.

Okay, so to your story, you're an AWS customer.

Right.

You're using AWS, you're using Redshift, and you started with Vertica in 2014.

Late 2014.

Late, so really 2015, you started.

Yes.

Okay, we mean for real, right? I mean, you're really getting up there.

Well, it was the last quarter of 2014.
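The channel-preference routing Mohan describes, honor a member's stated preference and otherwise fall back to an IVR call, can be sketched in a few lines of Python. This is a minimal illustration, not Eliza's actual implementation; the function name, channel names, and the phone default are all assumptions.

```python
# Toy sketch of preference-based outreach routing.
# Channel names and the phone fallback are illustrative assumptions.

SUPPORTED_CHANNELS = {"phone", "email", "text"}

def pick_channel(preferred=None):
    """Return the outreach channel for a member: their captured
    preference if we have one, otherwise default to an IVR phone call."""
    if preferred in SUPPORTED_CHANNELS:
        return preferred
    return "phone"
```

So a member whose captured preference is email is never called or texted, which matches the routing described above.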
You were sort of crawling, then jogging, and now you're probably sprinting.

Yes, coming along.

So part of the story is you moved data from Redshift into Vertica. Is that right? Or are you constantly doing that? Paint a picture for us.

Sure. Basically, before AWS, we were mostly a SQL Server shop. Our transactional system and our EDW were on SQL Server. But data was growing rapidly, and we wanted to take advantage of AWS, because AWS provides most things as a service, so we don't need much maintenance. It's basically single clicks, double clicks, and then boom, you've got the service up and running. So we started revamping, moving the whole architecture to the cloud. Before then, we had SQL Server as the transactional system and SQL Server as the EDW too. We moved to AWS using DynamoDB as the transactional system and Redshift as the EDW. So we moved everything to Redshift in AWS. Then we bought Vertica for some other reasons, meaning we wanted to do more statistical analysis on the data, and we have a data science team and SAS integration things with Vertica. We came to know that Vertica is a pretty good product, a data warehouse database, to use for all these things. So to answer your question, we stood up Vertica to load partial data from the EDW, which is Redshift, for analysis as a data mart. Currently we are using Vertica as a data mart and Redshift as the EDW.

Ah, okay, so that's ongoing, that's the infrastructure of choice, and DynamoDB is still the transactional system. I see. Okay, and so you designed and architected the ETL from Redshift into Vertica, which is an ongoing process now.

Right.

Okay, and why Vertica? I mean, what was the business case for moving there in the first place? What problem were you trying to solve?
Sure. In Redshift you have hash distribution and dist key things, but in Vertica you have segmentation and live aggregate projections. The live aggregate projections are, well, I wouldn't say the primary reason, but one of the primary reasons to use Vertica as a data mart, because with live aggregate projections, when you load the data, it is aggregated for you in real time. One example I would give you: we have all the data in Redshift in the first place. It's a continuous process of moving data from Redshift to Vertica, right? There are a few parts of the data in Redshift that we have as key-value pairs. In order to get the key-value pairs into columnar format, we take advantage of live aggregate projections. So when we load the data from Redshift to Vertica, on the fly it creates the live aggregate projection, and then we create other tables that consume this data, which was converted to a columnar structure on the fly. That's one of the reasons.

The second reason for Vertica is basically SAS integration. We have our SAS and R teams working on all the data from different clients. What I mean by that is, previously we were getting data from our customers, our healthcare and pharmacy customers, to reach particular people on a daily basis. But lately we made a deal with those healthcare companies: rather than giving us just the members, you give us all the data you have, which is pharmacy, enrollment, and claims data. We do the processing for you, meaning we dig into all the people, we study each and every person, we study their behavior, and then we determine who to call and which programs they should enroll in. So that's where Vertica has been playing a major role for our data science team as well as our people using SAS.

So that's a service that you provide for your customers?

Right.
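The key-value-to-columnar reshaping that Mohan says the live aggregate projections perform at load time can be illustrated with a toy pivot in Python. This is only an analogy for the transformation, not Vertica itself; the row shape and field names are hypothetical.

```python
from collections import defaultdict

def pivot_key_value(rows):
    """Pivot (member_id, key, value) triples into one record per
    member, mimicking the key-value-to-columnar conversion the
    live aggregate projections do on the fly during load.
    Field names here are illustrative, not the real schema."""
    records = defaultdict(dict)
    for member_id, key, value in rows:
        records[member_id][key] = value
    return dict(records)

rows = [
    ("m1", "age", 30),
    ("m1", "channel", "email"),
    ("m2", "age", 45),
]
# pivot_key_value(rows) yields one columnar-style record per member:
# {"m1": {"age": 30, "channel": "email"}, "m2": {"age": 45}}
```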
So you said pharmacy and healthcare companies and the claims data.

Right.

So you ingest that, do the analysis, and then provide the recommendations back?

We provide recommendations back, and we also flag the people who we should call. We are a health engagement management company, right? So we have our patented speech-recognition system. We flag people using something called a vulnerability index. For example, if you are a 30-year-old male, no kids, you don't have many health risks. But if you are a 30-year-old male with three or four kids and a mortgage, we reach you differently, we ask different kinds of questions, because your lifestyle is a little different compared to somebody who's single, right? So we consider all those factors using our analytical team, and then we reach the people.

You must have had some data quality challenges.

Of course, we...

I wonder if you could discuss that a little bit.

Right. We have three channels in the first place. We receive data through email, phone, as well as text, right? As we receive the data from three channels, that's the first place where we need to figure out how to standardize the data going downstream. That's one of the problems we have, because most of the data we collect after we reach out to people may not be in a similar format. We have different programs and different channels, so the data comes in different shapes, right? That's one thing to consider when we load the data through all the ETL processes into Redshift or Vertica. And on data quality: even though we have all the question codes under our control, we have a department called the Interaction Design department. When we reach a person in program A... well, age is a simple example.
For example, age is a simple thing. Or say you want to use a question code for "When did you get the flu shot?", right? For program A, we code it as "flu." Some other program codes it as "flu shot." Even though flu and flu shot are the same thing, when we ingest the data, we treat them as two different question codes. Then when we do analytics, we have to consider "flu" as well as "flu shot" in the analysis, right? That's one of the biggest data quality issues we have. And most of the time, the ETL we build will be making sure that flu and flu shot are treated the same. But there are a few exceptions.

But then, so you rationalize that with just general corporate knowledge, sometimes it's called tribal knowledge. Is it just that you have to remember to do that, or do you have a way to automate that categorization and flag those sorts of data quality issues? Can you automate any of that? Or is it more, we need Mohan, who knows how to do it, to handle this problem?

We can't fully automate these things, because it's a team collaboration. We have three, four departments inside the company. The Interaction Design department decides which question codes we use for these programs, right? But the analysis team doesn't work directly with the interaction team, because the analysis team is the last part of the puzzle, right? The Interaction Design team is the team that designs the question codes. So we have to coordinate with all the departments, which is not an easy thing for each and every question code. But we have to make sure that, in the previous example, flu and flu shot are the same, right? So we basically have to implement all the data quality rules during ingestion.

Okay, last question. Things you're excited about? Things you're working on that you're passionate about?

Yes, lately we have been working on member matching. In this whole data mart and enterprise data thing, we want to identify all the members. We want to identify how many Mohans we have.
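The "how many Mohans" question is a record-linkage problem across customers. Eliza's approach, as described next, uses statistical and neural-network models; the sketch below is a deliberately minimal stand-in that uses simple string similarity from Python's standard library, with hypothetical field names and threshold.

```python
from difflib import SequenceMatcher

def same_member(rec_a, rec_b, threshold=0.85):
    """Toy record-linkage check: require an exact date-of-birth match,
    then fuzzy-compare names. A stand-in for the statistical and
    neural-network matching described in the interview; the fields
    and threshold are illustrative assumptions."""
    if rec_a["dob"] != rec_b["dob"]:
        return False
    ratio = SequenceMatcher(
        None, rec_a["name"].lower(), rec_b["name"].lower()
    ).ratio()
    return ratio >= threshold

# The same person arriving from two customers with slightly
# different spellings and different identifiers:
a = {"name": "Mohan Varkey", "dob": "1980-01-02", "mrn": "A-1001"}
b = {"name": "Mohan Varkee", "dob": "1980-01-02", "member_no": "77"}
```

Real member matching would weigh many more signals (address history, identifiers, phonetic encodings) and learn the weights from data; this only shows the shape of the decision.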
Basically, the same Mohan can come from different customers, right? From customer one we probably have a different MRN, a medical record number. From customer B we have a different member number. So our company is basically building statistical analysis algorithms, neural network things, to identify that the Mohans are the same person. That's one exciting thing to work on. And as for the passion things, we want to take advantage of a Hadoop cluster on EMR to find out how Vertica can integrate with the HDFS system. Again, that's not quite a passion, basically.

Yeah, but fun new stuff. Excellent. Well, Mohan, really appreciate you coming on and talking about how you're solving some of these gnarly problems, and best of luck.

Thank you.

All right, take care. All right, keep it right there. Paul and I will be back with our next guest. This is theCUBE. We're live from Boston, Massachusetts. We'll be right back.