Live from San Jose, California, in the heart of Silicon Valley, it's theCUBE, covering Hadoop Summit 2016, brought to you by Hortonworks. Now, here are your hosts, John Furrier and George Gilbert.

And create some very interesting new use cases where you're combining financial transactions, done on a real-time basis from a database, with new social media or other types of unstructured data in big data environments. So that's one piece.

Just on unstructured data and loose data being brought in, what's the role of Kafka in all this? That seems to be a really big part of it.

Yeah, so what Kafka can do, in the example I'm looking at here, is take the stream of data that's coming in, let's say real-time transactions layering on top of a traditional database, and relay that as a message stream into Hadoop, the data lake, where in turn it can be co-mingled with other types of unstructured data, let's say social media feeds or other types of customer service records. That can enable you to have more real-time interactions with customers. You know the transaction that just happened, you've got other bits of intelligence, you're putting it together for a customer service rep, maybe an in-store clerk, and they can act on that information.

So we've heard a lot about these various design patterns or architectures, Lambda, Kappa. The idea being that in the past, the database was a self-contained thing, and you might have multiple views on the database. But now, because we're putting data to so much greater use, we might ingest data coming in through Kafka or something like that, with the help of Attunity, and it might feed one view over here, another view over there, and another, with it all kept in sync. Have you seen customers asking you for that, or are you advising customers on what they can do with that sort of architecture?

We have. I think the demand for Kafka has been the ability to have that simultaneous broadcast, if you will, to multiple big data ecosystems. Each of them is going to have that river of data streams, and they might have different data points that they're comparing and combining it with in order to act upon different use cases. So that's very much one of the benefits of Kafka, and it's one of the reasons our customers are excited about feeding some of those more traditional structured data sources into Kafka.

So can you paint a picture of some of the new applications that either your customers have built or are starting to think about with that capability? Because that's very different from how, as I gather, it used to be, just shipping a log of changes. This is fanning out to all different views.

Yeah, we're working with a Fortune 100 automaker, and they are consolidating data from about 4,500 applications, which would include DB2 and some others. They're consolidating all this data from structured and unstructured data sources into a Hadoop data lake, and they're also using Kafka to start to broadcast what's coming in on a real-time basis. The whole point here is to ensure that an automobile really becomes more of a connected experience for users. That starts to facilitate, in the future, things like self-driving cars, smart cars, and usage-based car insurance, which is already well underway, as we know from folks like Progressive.
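To make that broadcast pattern concrete, here is a minimal sketch in Python using the kafka-python client. The broker address, topic name, event fields, and consumer group names are assumptions for illustration, not details from the interview; the key mechanism is that each Kafka consumer group tracks its own offsets, so every downstream system receives the full stream independently.

```python
import json
from kafka import KafkaProducer, KafkaConsumer  # pip install kafka-python

BROKERS = "localhost:9092"   # hypothetical broker address
TOPIC = "transactions"       # hypothetical topic of database change events

# Producer side: publish each real-time transaction as a JSON message.
producer = KafkaProducer(
    bootstrap_servers=BROKERS,
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send(TOPIC, {"account": "12345", "amount": 250.00, "op": "insert"})
producer.flush()

# Consumer side: because each group_id keeps its own offsets, a Hadoop
# ingest job and a separate analytics job both receive every message --
# the "simultaneous broadcast" to multiple big data ecosystems.
hadoop_ingest = KafkaConsumer(TOPIC, bootstrap_servers=BROKERS,
                              group_id="hadoop-ingest")
for msg in hadoop_ingest:
    event = json.loads(msg.value)
    # ...land the event in the data lake, co-mingle with social feeds, etc.
```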
And Attunity's role is feeding the very consistent and ordered data into Kafka?

That's correct, so we play two roles here. One is that we can feed data to Kafka message brokers, which will in turn broadcast it to multiple big data ecosystems, including Hadoop. The other is that we can do efficient, real-time ingestion into Hadoop directly. And we'll do it by automating the ETL process, taking the manual coding out of the process to minimize the resource burden, the risk, and the effort required to make it happen.

What's all this talk about a logical data warehouse? What is that all about?

So the logical data warehouse is a very interesting trend. I think what we're seeing, especially at a show like this, is that you've had the traditional data warehouse storing things typically in rows and columns, in structured environments. Those warehouses are starting to grow, and they're growing fast, incorporating new types of data. Now Hadoop comes along, and Hadoop is a great offload candidate for the cold data. Our Visibility offering, where we had an announcement here that I'll talk about, can identify the cold data within data warehouses and move it to Hadoop, where it can be stored more cost-effectively. The logical data warehouse can help you look across your environment, which might include Hadoop and multiple data warehouses, and manage that multi-platform environment in more of a holistic fashion.

What are the biggest things you guys are involved with at customers? The management of data is a big deal, as is the life cycle of data management. With all the real-time streaming and the predictive and prescriptive analytics, what are customers worried about, and where do you fit into the customer conversations? Because there's so much to figure out. Where do you fit in the customer's problem set, and what's the impact for the customer that you help solve?

Sure. I think customers are very excited about the opportunities of data analytics. The upside has been trumpeted by pundits, the press, and so forth, and we're all very excited about what can be achieved with analytics. The reality is that this creates a huge infrastructure burden in terms of cost and complexity. So organizations need to start taking a hard look at how they can build infrastructures, multi-platform environments, that can scale. And a big part of that is data management and data mobility. So we really address some fundamental pain points, helping integrate data more efficiently and more effectively across platforms. Attunity Replicate can help do that. We'll help automate the ETL process and automate the creation of data warehouses and data marts, again shrinking the administrative burden and the skilled resources required to do it. Most importantly, we'll also provide this visibility layer. Visibility will look into a data warehouse and help you understand how tables, rows, and files are being consumed by specific users, user groups, and so forth, and the resulting impact on the underlying infrastructure. So there you get the necessary visibility into how your infrastructure is being used and what the impact is on performance and cost, and you can really start to manage that multi-platform, high-scale, high-growth environment in more of a sustainable way from a budget, cost, and performance perspective.
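The interview doesn't describe how that usage profiling is implemented, but the general idea can be sketched: roll a warehouse's query log up into per-table, per-user consumption metrics. A minimal, hypothetical Python illustration follows; the record layout and field names are assumptions, not Attunity Visibility's actual data model.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical query-log records: (user, table, timestamp, cpu_seconds).
query_log = [
    ("alice", "sales.orders",  datetime(2016, 6, 1),  42.0),
    ("bob",   "sales.orders",  datetime(2016, 6, 20), 13.5),
    ("alice", "hr.payroll_99", datetime(2016, 1, 15),  7.2),
]

# Roll the log up into per-table usage metrics: who touches what,
# how often, at what CPU cost, and how recently.
stats = defaultdict(lambda: {"reads": 0, "cpu": 0.0,
                             "users": set(), "last_access": None})
for user, table, ts, cpu in query_log:
    s = stats[table]
    s["reads"] += 1
    s["cpu"] += cpu
    s["users"].add(user)
    if s["last_access"] is None or ts > s["last_access"]:
        s["last_access"] = ts

for table, s in sorted(stats.items()):
    print(f"{table}: {s['reads']} reads, {s['cpu']:.1f} CPU-s, "
          f"{len(s['users'])} users, last accessed {s['last_access']:%Y-%m-%d}")
```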
Would you put into that category of trying to understand who's using what, and how, vendors such as Alation and Waterline Data? They're not trying to do the propagation of data with integrity the way you are, but they're trying to carve out a small piece of helping the customer manage the metadata. Are they in that space? Are there others?

That's a similar space. I think metadata has a variety of different uses, and there's more and more value in metadata and in understanding how it can help you ensure the integrity of your data and your data quality; it can also help you profile data. We're starting to build some of those pieces into Attunity Compose. What we're doing with Visibility is really providing very intuitive metrics about the usage of physical and logical resources, which can help you make much more informed decisions about performance, cost, data placement, compliance, and, in particular, how to meet data retention requirements.

Oh, so metadata means many things.

It does mean many things to many people, yes.

Okay. And the metadata management piece, how does it all fit in?

So metadata, again, can mean multiple things. In our case, what we're talking about is usage profiling, data usage profiling and workload analytics. How is data being consumed, and what is the resulting workload? So you can understand: okay, if this set of tables here has not been touched in 30 days, but updating it on a nightly basis is taking 30% of the CPU cycles within this premium data warehouse, I'm going to get a huge benefit if I move it off to Hadoop.

What's the biggest problem you guys solve for customers? You have to nail it down.

Yeah, you know, if I were to talk about it from the perspective of Hadoop, which is an increasing chunk of our business, it really boils down to applying enterprise-class, data warehouse-class controls to the Hadoop data lake. And it requires understanding, and this is the new capability we're announcing here at the show with Visibility, how Hadoop files, users, and directories are being used and how they're behaving over time. From that, you can start to say: okay, I have this data lake; it grew 100% last year, it's probably going to grow 50% this year, and I'm going to have three new data types a month. That creates an explosive set of circumstances, and people need to get it under control. They need to apply more rigorous capacity planning, and they need to do it based on metrics. So what we're going to do is provide those metrics about file usage, file utilization, and user activities, so they can understand with a clear-eyed view what to look for and what to prepare for.
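The 30-day example above amounts to a simple filter over those usage metrics: flag tables that are cold to queries but still expensive to refresh. A hypothetical sketch, with assumed field names and thresholds rather than anything the guest specified:

```python
# Hypothetical per-table metrics, as a usage-profiling tool might report them.
tables = {
    "finance.gl_history":   {"days_idle": 45,  "refresh_cpu_share": 0.30},
    "sales.current_orders": {"days_idle": 0,   "refresh_cpu_share": 0.05},
    "archive.logs_2014":    {"days_idle": 400, "refresh_cpu_share": 0.02},
}

IDLE_THRESHOLD_DAYS = 30  # "not been touched in 30 days"
CPU_THRESHOLD = 0.10      # nightly refresh cost worth reclaiming

# Offload candidates: no one queries the table, yet keeping it refreshed
# still burns premium warehouse CPU -- move it to cheaper Hadoop storage.
candidates = [
    name for name, m in sorted(tables.items())
    if m["days_idle"] >= IDLE_THRESHOLD_DAYS
    and m["refresh_cpu_share"] >= CPU_THRESHOLD
]
print("Tier off to Hadoop:", candidates)  # -> ['finance.gl_history']
```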
And the problem you solve is time, money, savings, getting more out of the data?

Yeah, it's: how do you make use of Hadoop without breaking the bank?

So the problem is they're not getting to time-to-value, and it costs too much. You reduce the cost and speed the time to value.

Correct.

Just a quick question. Have you quantified, in the last six to 12 months, what a terabyte in a data warehouse appliance costs versus a terabyte in Hadoop?

You know, I think the numbers change quite a bit, and it's going to vary by customer, so I don't have ready estimates for you. But I think there's no question that there's a huge savings to be had. For example, we worked with a Fortune 100 bank here in the U.S., and by helping them identify cold tables and cold databases and institute a periodic, ongoing tiering plan, they were able to avoid $15 million in planned upgrades to their data warehouse over three years by moving that data to Hadoop. And the cost of Hadoop was a fraction of that.

Okay. So the last question, in the last minute of the interview: I wanted you to describe what you guys are announcing here at the event, the big news, and the focus of the show.

Yeah, Visibility 7.1, our offering, has deeper intelligence for Hadoop. We'll help organizations understand usage of files, usage of directories, and specific user activities. They can formalize and automate reporting schedules for this, which will help them plan more effectively to ensure their Hadoop data lake has the right infrastructure going forward, so they can support growth, meet budget requirements, and comply with regulations.

All right, Kevin, thanks so much for sharing the insight here on theCUBE, appreciate it. Attunity, inside theCUBE at Hadoop Summit 2016. I'm George Gilbert; we'll be right back with more live coverage after this short break.