from San Jose in the heart of Silicon Valley. It's theCUBE covering Big Data SV 2016.

Hey, welcome back everybody. Jeff Frick here with theCUBE. We are live in downtown San Jose at Big Data SV, which is part of Big Data Week, which is concurrent with the Strata conference. Everything about big data is happening here in San Jose this week, so we are excited. We're coming in on the last of three days of continuous coverage. We've had a bunch of great guests, and we're really excited to have on a Cube alumni and many-time guest, Lawrence Schwartz, the CMO of Attunity. Lawrence, great to see you as always.

Oh, thanks for having me again. Pleasure to be here.

So what's kind of the vibe at the show? What are you feeling? What's going on here?

It's the 10th anniversary of Hadoop. It's amazing. You can just see the companies and how they've evolved, right? I remember coming to the show maybe four or five years ago and seeing companies in the startup showcase. Now they're keynote speakers with 20-by-20 booths. So you can really see how some of these companies have grown. It's nice to see that. It's nice to see a full ecosystem, and it's nice to see a lot of real customer use cases and people really getting into understanding how they leverage this in their enterprise environments.

So dig down on those use cases. What's the problem that people are talking about on the show floor?

Yeah, I think the problem is you've got a lot of different solutions within the Hadoop ecosystem. You have Spark, you have Kafka, you have Sqoop, all these different things that people are trying to integrate and come up with the best way to implement in their environment. So they come and, of course, ask what our company does, but they also want to understand how we fit in with the ecosystem. Who should I look at?
Where do all the pieces of the puzzle come together? They typically have some use cases today where they're leveraging Hadoop, and now they're trying to understand what's next, how to really take it to the next level. And that's interesting to see, because that shows a level of maturity in the environment.

So we've got Hadoop, which has been kind of over on the side, and we're trying to move that whole ecosystem, or at least the approaches associated with that ecosystem, to where it solves deeper problems in the enterprise. Is there a difference between Hadoop where it's been and Hadoop where it's going as a play within the enterprise?

Oh, absolutely. In a lot of cases, and we've seen this at our company over time, people were trying it at the edges, doing various analytics processes or other things for specialized projects. Now we're seeing more and more that this is hitting the core of the environment. For example, one of our Fortune 100 manufacturing customers came to us and said, look, a Hadoop data lake is going to be a central part of our analytics strategy. They've got to pull thousands of different applications into the lake. They want the flexibility to use things like Kafka, and to go not just into a Hadoop lake but also into NoSQL targets like Cassandra and Mongo. So now this is a major company looking at a massive deployment for their competitive advantage. And that's what's driven our thinking about how we approach it as a vendor and how we incorporate things like Kafka into our current technology.

So the challenge going forward, obviously, is to have this packaged and set up so that it is coherent and not just episodic. In other words, it's not built out of piece parts based on when they become available, or when I realize there's an issue I need to attend to.
So from your perspective, how is this going to become more coherent?

Yeah. So in the space that we play in, we have a lot of customers who are really looking at how they integrate Hadoop with their existing data warehouses and databases, right? You want to make it as seamless as possible to drop in their Hadoop environment. That means it's got to be a few clicks and a few steps to set up: how do I move data between an existing data warehouse and Hadoop? How do I pull it off of a disparate set of databases? How do I have flexibility on the back end? It's about packaging that complexity into an interface where it's a couple of clicks to set up the movements and the flow that you want. The complexity might still be there under the covers, but you present the user an interface that really simplifies it and makes it look like just another source, or just another target, if you will, in their environment.

So as we think about that integration, where's the state of the art today, and how are you advancing it?

Yeah, the state of the art is definitely advancing in a lot of different areas. When we started doing Hadoop, almost two years ago now, we looked at some of the open source tools like Sqoop that were out there for pulling in information, and there are ways to leverage those. But we looked at that and said, that's great as an open source tool, but if you want to go into a production environment, you really need something more robust, something that you can interoperate with and use repetitively and simplify.
And then the latest one we've seen is Kafka, which, again, is a very simple way to go in and set things up with a CLI and whatnot. But if you want to do this repeatedly, for thousands of different applications, and you want to set it up and integrate it with an enterprise environment, you've got to be able to leverage it, as we do on the back end, to make sure people have flexibility, and then plug it into all the interesting things you might want to do. Take Kafka, for example: a lot of people think of it as a streaming platform, which it is, and it gives you a lot of flexibility. But if you think of a lot of existing data sources, it's not just sensors that data might be coming from; it might be changes to an existing SQL database. And you want to be able to stream the changes that come out of that, do change data capture, and pull that in. That's one way to add an existing source, if you will, as a stream source to a Kafka environment, which is often thought of as just for sensor or stream processing.

Let's just say, for the people that aren't familiar with Kafka, give them Kafka 101. You just dropped like six Kafka references in that last answer. We're hearing a lot about it, and we were at Spark Summit talking about Kafka kind of under the covers. So for the folks that aren't familiar, give them the Kafka 101.

Boy, all right. Yeah, so Kafka really gives you that flexibility. It's basically a streaming service that pulls in lots of different formats of data and can then write that data out in different ways to the end environment. A lot of people do use it for high-speed sensor data of very different types coming into a system. That's a great use case; it's kind of where it started. We looked at it, and there's obviously the flexibility to use it for that.
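The change data capture idea mentioned above can be illustrated with a toy sketch. Real CDC tools read the database's transaction log rather than diffing tables; this snapshot-diff version, with made-up table contents, just shows the shape of the change events that would be published to a stream:

```python
# Minimal, illustrative sketch of change data capture (CDC): diff two
# snapshots of a table (keyed by primary key) and emit the kind of
# change events a replication tool might publish to a Kafka topic.
# Production CDC reads the transaction log instead of diffing snapshots;
# the table contents here are hypothetical.

def capture_changes(before, after):
    """Compare two {pk: row} snapshots and return a list of change events."""
    events = []
    for pk, row in after.items():
        if pk not in before:
            events.append({"op": "insert", "key": pk, "data": row})
        elif row != before[pk]:
            events.append({"op": "update", "key": pk, "data": row})
    for pk in before:
        if pk not in after:
            events.append({"op": "delete", "key": pk, "data": None})
    return events

before = {1: {"name": "Ada"}, 2: {"name": "Grace"}}
after  = {1: {"name": "Ada Lovelace"}, 3: {"name": "Edsger"}}
for event in capture_changes(before, after):
    print(event)
```

Each emitted event is a self-describing record (operation, key, new data), which is what makes an ordinary SQL database usable as "just another stream source" downstream.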
But then you think of all the different types of data sources. A lot of enterprise customers have those environments, but they already have hundreds, if not thousands, of SQL databases, for example. Those might be the intermediary that information is written to, and they want to pull off of that, but it's being rapidly updated, right? So now you're not just taking it directly from a sensor; this information might live in an existing SQL store, and you want to pull it out and stream it into a different type of back end. So we're able to leverage the overall flexibility of a messaging pipeline, if you will, to really integrate with an enterprise environment.

And then the same thing on the back end: it's really nice in that there's a lot of flexibility in how you write the data and the format you write it in. While we started doing things like MongoDB over a year ago, and that's a specific format, you've got other things like Cassandra out there, and some really big customers we work with are doing their own flavors of NoSQL databases or their own systems. So we had customers coming to us saying, well, I've got Cassandra, I've got Mongo, I've got my own thing, right? And instead of handling each one separately, we looked at the problem as: how do we have one solution that can help with a lot of different environments? On the back end, when you think of how you write the information, Kafka gives you a lot of flexibility in how you take the information out and place it into all different types of environments. So that's been a great platform that we look at in different ways: we can adopt these open platforms, pull them in, and add additional value.
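The "one stream, many back ends" pattern described here can be sketched as a common writer interface with pluggable targets. The two target classes below are stubs standing in for real MongoDB- or Cassandra-style sinks; the names and event shapes are made up for illustration:

```python
# Illustrative sketch: one change-event stream fanned out to several
# heterogeneous targets behind a common writer interface. The targets
# are in-memory stubs, not real database clients.

class Target:
    def write(self, event):
        raise NotImplementedError

class DocumentStoreTarget(Target):
    """Keeps only the latest value per key, like a document store upsert."""
    def __init__(self):
        self.docs = {}
    def write(self, event):
        self.docs[event["key"]] = event["data"]

class WideColumnTarget(Target):
    """Appends every write, like a time-ordered wide-column table."""
    def __init__(self):
        self.rows = []
    def write(self, event):
        self.rows.append((event["key"], event["data"]))

def deliver(stream, targets):
    """Replay each event to every registered target."""
    for event in stream:
        for t in targets:
            t.write(event)

stream = [{"key": 1, "data": {"v": 10}}, {"key": 1, "data": {"v": 11}}]
doc, wide = DocumentStoreTarget(), WideColumnTarget()
deliver(stream, [doc, wide])
print(doc.docs)   # latest value per key
print(wide.rows)  # full history of writes
```

The design point is that adding a new back end ("I've got my own thing") means writing one more `Target` subclass, not rebuilding the pipeline.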
But what's interesting when you're talking about those is it's not necessarily having one platform that supports lots of applications; now it's really the applications that are driving a plethora of integrated architectures that support the application, and they all just need to pull off of, or have access to, that same data. So it's a very different way to enable application building.

Absolutely, yeah, and it's amazing. You've got some of these large enterprises that have thousands of different applications and all these different formats. So you've got to think differently. We used to look at replication, being in that space, as going from one point to another, or a few points to a few points. Now the number of inputs you can have, if you will, has grown dramatically, and the flexibility on the back end for different types of targets has grown dramatically. So we've had to think of creative ways to adapt our platform to support that.

So one of the reasons why asking a question like Kafka 101 on theCUBE is really relevant is because there are a lot of people entering this world of better decisions, faster decisions, who aren't raw technologists but need to understand this stuff. What are you talking to them about?

You know, it's interesting. We have even big enterprise customers come to us, and at the end of the day, when they come to a replication provider, a data integration and data management provider, the start of the question is: I've got a lot of data, how do I manage it? Do I want to spread it across different systems? How do I think about the lifecycle of the data? What do I want to move, right? Those are the questions that come up. And then we try to think of the best way we can help them do that. Sometimes they might want to keep everything in one area and just run a reporting server.
Sometimes they might want to do some tiering of information, putting high-performance data in one area and using a lake somewhere else. So we're looking at it from that holistic viewpoint and trying to figure out what they're trying to do and whether we can give them flexible enough solutions. And we found that even though we historically specialized in the movement of data, more and more people are asking questions that start before that: what do I want to move, right? What's not being used? What's being used actively? That's made us think about the broader question of data management as a whole. What's your goal? What do you want to get out of each platform? How does Hadoop play a role in that? How does a traditional data warehouse play a role? How does the database play into that? How does NoSQL play into that? How does in-memory? All these questions start popping up. So it's really trying to advise across that spectrum.

We just had a great conversation with Bill Schmarzo about what the value of this exercise is anyway, right? The value is directly tied to the decisions, and the business benefit from those decisions, so they know how much to allocate in a budget. What is the value of that data? So I would imagine those conversations are changing significantly, where it's no longer a fixed cost of worrying about managing infrastructure. It's enabling whole different levels of customer engagement, whole different levels of applications, revenue they couldn't get before, and now suddenly the value of that data justifies the expense on the front end.

Oh, yeah. And that's an interesting thing, because in past years we would see people start looking at Hadoop as, well, I could do analytics off that, but more as kind of a cold pool.
I don't care about the performance as much as I do with a data warehouse. That's still a thought process people go through. But you have to look at your data warehouse and ask: where am I getting good performance? Where am I paying for performance I don't need, that I might want to move off? Give them the tools so they can ask those questions, right? Which user is actually using the data, and why? Getting that kind of BI, if you will, that intelligence on the infrastructure, is important to making those types of decisions. And what's fascinating now is that as some of these companies start getting hundreds of terabytes into Hadoop, you start worrying about other things: okay, do I have a performance cluster of Hadoop and a lower tier of Hadoop, right? So now you have to think not just about what's happening in a data warehouse, but what's happening within Hadoop: who's using it, where is it residing, how often is it being used? Things that a lot of people take for granted in most file systems but that aren't as easy to access and figure out in a Hadoop environment. So we're trying to help customers with those decisions as well.

And the other thing I think is interesting is that data could sometimes be a liability before, in terms of how much you had to store and the expense of running it in some of these classic old systems, which were not inexpensive. Now the whole move is: you shouldn't sample, collect as much as you can, you're not really sure what you're going to need down the road, so maybe you should grab it too. And now it's potentially a strategic asset that can be used in ways we didn't think about before.
How have you seen that conversation change with clients in terms of their attitude about data: not as a liability that I have to buy a bunch of expensive things to take care of, but as an asset that needs to be nurtured, made available to more people, and leveraged to drive top-line business profitability and revenue?

Yeah, it's a great question. We worked with one of the large telecom providers, and they have lots of different sources of data, like 200 different sources, that they're trying to pull in. They pull it into a data lake and keep as much as they can for that long-term analysis. But then they do their transformations and extra work within the Hadoop environment, and they say there's a subset of that data, kind of a golden set of data, if you will, that they then move into a high-performance data warehouse so that other people in the business can query it quickly and get the information they want. So it's an interesting way of keeping as much as you want, but at the same time, if you need high-performance analytics for the rest of the business, you have the flexibility to quickly and easily migrate some of it over and keep it in sync with a high-performance data warehouse. You can see those hybrid models evolving and people looking for that flexibility.

And this is not unlike the data warehouse of many years ago, where we would do stuff within the data warehouse and then say, oh, well, that's interesting, why don't we drive that into a reporting system? There is always this tendency, as we gain more certainty about where the data is, how it's modeled, and what kinds of questions we're going to ask about it, to try to drive performance out of the processing of that data.
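The usage-based tiering decision discussed above (which data stays on the performance tier, which moves to a cheaper one) can be sketched as a simple policy over per-dataset access statistics. The thresholds, dataset names, and numbers below are all made up for illustration:

```python
# Illustrative sketch of a hot/cold tiering decision: given per-dataset
# access statistics, label each dataset "hot" (keep on the performance
# tier) or "cold" (candidate for a cheaper tier or the lake).
# Thresholds and example data are hypothetical.

from datetime import date

def tier(datasets, today, min_hits=10, max_idle_days=30):
    """Return {dataset_name: "hot" | "cold"} based on usage stats."""
    plan = {}
    for name, stats in datasets.items():
        idle_days = (today - stats["last_access"]).days
        is_hot = stats["hits_90d"] >= min_hits and idle_days <= max_idle_days
        plan[name] = "hot" if is_hot else "cold"
    return plan

datasets = {
    "orders":   {"hits_90d": 420, "last_access": date(2016, 3, 28)},
    "clickraw": {"hits_90d": 3,   "last_access": date(2016, 1, 5)},
}
print(tier(datasets, today=date(2016, 3, 30)))
```

The hard part in practice, as noted above, is collecting those access statistics at all in a Hadoop environment; the policy itself is the easy half.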
So as you think about some of the new technologies and some of the announcements that you have, where are you playing a role in that process of trying to drive more performance and lower the cost of integration?

Yeah, that's a great area. One of the things we've been working on this year and announced is one of our products, Compose, which actually helps customers with this. If you have the raw data and you move it into a data warehouse, there's still a lot of work to be done, right? To clean it up, to normalize the data, to get performance on it, to set up a data mart. All those processes in a traditional data warehouse could take people weeks, months, or more. We work with a large worldwide insurance company, right? Massive systems. Their ability to add new information to a data mart, and the flexibility to do that, is almost non-existent. They can make changes maybe once or twice a year; adding in new information and building out new models and marts takes forever in a big environment. So even there, there's a lot of room for improvement in traditional data warehouses. So we've delivered tools that automate that process: once you have a data model, you can build out the data warehouse and then quickly build the data marts from it. If you think about the end-to-end process, it's: I've got to figure out what's going on, I've got to figure out where to move it, and then, once it's over there, how do I quickly get it ready for analytics? Now this large insurance company can do this once a month or so and make big changes to their environment. You want to have as much automation in the process.
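To make "automating the build-out from a data model" concrete, here is a toy sketch that generates star-schema DDL from a declarative model instead of hand-writing it. This is not Attunity Compose's actual mechanism or output; the model, table names, and types are all hypothetical:

```python
# Toy sketch: generate star-schema DDL (dimension tables plus a fact
# table with foreign keys) from a small declarative model, instead of
# hand-coding it over weeks. Purely illustrative; names and SQL types
# are made up.

def build_ddl(fact, dims):
    """Return a list of CREATE TABLE statements for a simple star schema."""
    stmts = []
    for dim, cols in dims.items():
        col_sql = ", ".join(f"{c} {t}" for c, t in cols)
        stmts.append(
            f"CREATE TABLE dim_{dim} ({dim}_id INT PRIMARY KEY, {col_sql});"
        )
    fks = ", ".join(f"{d}_id INT REFERENCES dim_{d}({d}_id)" for d in dims)
    measures = ", ".join(f"{m} DECIMAL(18,2)" for m in fact["measures"])
    stmts.append(f"CREATE TABLE fact_{fact['name']} ({fks}, {measures});")
    return stmts

ddl = build_ddl(
    fact={"name": "claims", "measures": ["paid_amount"]},
    dims={"policy": [("holder_name", "VARCHAR(100)")]},
)
for stmt in ddl:
    print(stmt)
```

Once the schema is derived from the model, adding a new dimension becomes a one-line model change that regenerates the DDL, which is the kind of turnaround improvement (yearly to monthly) described above.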
You want to take a lot of that complexity and put it under the covers, giving people hooks if they want to get into it, but you really want to simplify that whole environment. That's how you get the most value out of marrying up Hadoop with your existing data warehouses, and how you get the most advantage out of both. So that whole end-to-end process is how we think of it.

I love having you on, Lawrence, because you guys are in the guts, down in the engine room, plumbing all this stuff together and hooking it up. But I want to ask you about another term we hear about all the time. You've got your own data, which has value. And then there are all these third-party data sources that, combined with yours and with the right context, may provide a whole other level of value. Especially coming off the example you just used, with a big mature system, it's not that easy to start plugging in new pipes. So how are you seeing people execute now, going out to get another publicly available data set and integrating it with their existing process to come up with a whole other level of insight and opportunity?

Yeah. Being able to pull all those disparate sets into one area like Hadoop is always a great, flexible way to do that. And then I think what becomes more and more important, and what we hear more customers talking about, is: if you have data from one area and another area, they're both coming in, they look similar, how do I compare them? Then you have to start thinking about the lineage of the data, where it came from, and being able to understand that. That's an interesting area, and one where we get more and more requests from people trying to figure it out. So it's being able to pull it in, being able to integrate it, and being able to understand the provenance and value of it, because otherwise it's just a sea of data.
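The lineage idea discussed here can be sketched as tagging every record on ingest with its source system and load batch, so that records from any given load can later be found, compared, or removed. The source names and record shapes below are hypothetical:

```python
# Illustrative sketch of lineage tagging on ingest: wrap each record
# with its source system and a batch id as it lands in the lake, so a
# specific load remains identifiable afterward. All names are made up.

import itertools

_batch_ids = itertools.count(1)

def ingest(lake, source, records):
    """Append records to the lake with lineage metadata; return the batch id."""
    batch = next(_batch_ids)
    for rec in records:
        lake.append({"_source": source, "_batch": batch, "data": rec})
    return batch

def from_batch(lake, batch):
    """Recover exactly the records that arrived in a given batch."""
    return [r["data"] for r in lake if r["_batch"] == batch]

lake = []
b1 = ingest(lake, "crm", [{"id": 1}])
b2 = ingest(lake, "erp", [{"id": 2}, {"id": 3}])
print(from_batch(lake, b2))  # only the ERP load comes back
```

Without metadata like this attached at ingest time, the lake has no way to answer "where did this record come from?" after the fact.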
And if you then want to pull it back out: if you pour this glass of water into the data lake, can you ever get that particular glass of water back? Those are the types of questions people are struggling with.

All right, well, we're coming toward the end of our time. Again, always love having you on, and I want to give you the last word: what are you working on? What's new at Attunity for the next six months that you're getting excited about?

Yeah, there's a lot of interesting work going on right now. Some of the cutting-edge enterprises are looking, as I mentioned, at taking advantage of Kafka and leveraging that, and I think we're seeing more and more of those environments spreading out there. People are looking at Hadoop as a core system and at ways they can integrate it flexibly. That's where we see more and more of the business going. So we're bullish on the value of Hadoop and what it brings, and we're excited to see widespread deployment and the success of a lot of companies here.

Awesome. Well, thanks again, as always, for stopping by, Lawrence.

Oh, my pleasure.

All right, so I'm Jeff Frick with Peter Burris. Follow us on Twitter at twitter.com slash theCUBE. You'll see all the interviews go up, links to the interviews, CubeGems, which we try to get out pretty frequently, nice little highlights, CubeCards, all the fun stuff. Also be sure to follow siliconangle.com and Wikibon to keep an eye on all the things we're doing here at theCUBE. We'll be back after this short break with our next guest. Thanks for watching.