Live from Munich, Germany, it's theCUBE, covering DataWorks Summit Europe 2017. Brought to you by Hortonworks.

Hey, welcome back everyone. We're here live in Munich, Germany for DataWorks Summit 2017. I'm John Furrier with my co-host, Dave Vellante, with theCUBE, and our next two guests are John Thomas, head of customer development, EMEA for Alation, and Bertrand Cariou, who's the director of solution marketing at Trifacta and partners. Guys, welcome to theCUBE.

Thank you for having us.

Big fans of both your startups, and you're growing. You guys are doing great. We had your CEO on at Big Data SV, Joe Hellerstein, talking about the wrangling, all the cool stuff that's going on. And Alation, Stephanie's been on many times. But you guys are startups that are doing very well and growing in this ecosystem. And everyone's going public: Cloudera has filed their S-1, great news for those guys. So the data world has changed beyond Hadoop. You're seeing it. Obviously Hadoop is not dead, but it's still going to be a critical component of a larger ecosystem that's developing. You guys are part of that. So I want to get your thoughts on why you're here in Europe, okay? And how you guys are working together to take data to the next level, because we're hearing more and more that data is a foundational conversation starter. Because now there's other things happening: IoT and business analysts. You guys are in the heart of it. Your thoughts?

Maybe I'll lead, yeah. So definitely at Alation, what we're seeing is that more and more people across the organization want to get access to the data. And we're kind of breaking out of the traditional roles around just IT managing both metadata and data preparation, like Trifacta is focused on. So we're pretty squarely focused on how do we bring that access to a wider range of people? How do we enable that social and collaborative approach to working with that data?
Whether it's in a data lake (since we're here at DataWorks, really that's one of the main topics) or in other data sources in the organization.

So you're freeing the data. The whole collaboration thing is more of, okay, don't just look at IT as this black box of "give me some data" and they spit out some data. That's the old way. The new way is, okay, all the data's out there, they're doing their thing, but the collaboration is for the user to get into that data ingestion, playing with the data, using the data, shaping the data, developing with the data, whatever they're doing, right?

It's just bringing transparency to not only what IT's doing and making that accessible to users, but also helping users collaborate across different silos within an organization. So we look at things like logs to understand who is doing what with the data. So if I'm working in one group, I can find out that somebody in a completely different group in the organization is working with similar data and bringing new techniques to their analysis, and I can start leveraging that and have a conversation that others can learn from too.

So basically it's like a discovery platform just saying, hey, you know, Mary in department X has got these models, I can leverage that. Is that kind of what you guys are talking about?

Definitely. And breaking through that, enabling communication across the different levels of the organization and teaching other people at all different levels of maturity within the company how they can start interacting with data, and giving them the tools to upskill throughout that process.

Bertrand, talk about Trifacta. It's one of the things that I find exciting about your value proposition, talking to Joe, the founder, besides the fact that they all have GitHub on their About page, which is the coolest thing ever, because they're all developers.
But the reality is that a business person or a person dealing with data in some part of the geography, whether it's in Europe or in the US, might have a completely different view and interest in data than someone in another area. It could be sales data, it could be retail data. It doesn't matter, it's never going to be the same schema. So the issue is, you've got to take that complexity away from the user. That is a really fundamental change.

Yeah, you're totally correct. So information is there, it is available, and Alation helps identify what is the right information that can be used. So if I'm in marketing, I could reuse sales information associated maybe with web logs information. Alation will give me the opportunity to know what information is available and whether I can trust it: if someone in finance is using that information, I can trust the data. So now as a user, I want to take that data, maybe combine the data, and the data is always in different formats, structure, level of quality. And the work of data wrangling is really for the end user. You can be an analyst, someone in the line of business most of the time. Some of the customers we have here in Germany, like Munich Re, would be actuaries building risk models, or doing claim forecasting, payment forecasting. So they are not technologists at all, but they need to combine these data sets by themselves and at scale. And with the work they're doing, they're producing new information, and this information is used directly in their own business. But as soon as they share this information back to the data lake, Alation will index this information, see how it is used, and give visibility to the other users for reuse as well.

So do you guys have a partnership or just more of a standard API kind of thing?

So we do have a partnership. We have planned development on the roadmap which is currently happening.
So I think by the end of Q2 we're going to be delivering a new integration where, if I'm in Alation looking for data and I find something that I want to work with that I know needs to be prepared, I can quickly jump into Trifacta to do that. Or the other way around: in Trifacta, if I'm looking for data to prepare, I can open the catalog and quickly find out what exists and how to work with it.

So basically the relationship, if I get this right, is you guys pass on your expertise in the data wrangling and all the back processes you guys have and advertise that into Alation. They discover it and make it surfaceable for the social collaboration or the business collaboration.

Exactly. And when the data is wrangled, it's again indexed. And so it's a virtuous circle where all the data that is created and combined is exposed to the user to be reused. So yeah.

So if I were a chief data officer, I'd say, okay, there's three sequential things that I need to do, and you can maybe help me with a couple of them. So the first one is I need to understand how data contributes to the monetization of my company, if I'm a public company or a for-profit company. That's, I guess, my challenge. But then the other two things are I need to give people access to that data and I need quality. So I presume Alation can help me understand what data is available. Actually, it kind of helps with number one as well, because I can say, okay, this is the type of data, this is how the business process works. Feed it. And then the access piece and the quality; I guess the quality is really where Trifacta comes in.

Yep.

What about that sequential flow that I just described? Is that common in your business, your customer base?

It's definitely very common. So kind of going back to the Munich Re example, since we're here in Munich, they're very focused on providing better services around risk reduction for their customers.
Data that can impact that risk can be of all kinds, from all different places. You have to think five, 10 years ahead of where we are now to see where it might be coming from. So you're gonna have a ton of data going into the data lake. Just because you have a lot of data does not mean that people will know how to work with it. They won't know that it exists. And especially since the volumes are so high, it doesn't mean it's all coming in in a greatly usable format. So Alation comes into play in helping you find not only what exists, by automating that process of extraction, but also in looking at what data people are actually using. So going back to your point of how do I know what data is driving value for the organization? If we can tell you that, in this schema, this is what's actually being used the most, that's a pretty good starting point to focus in on what is driving value. And when you do find something, then you can move over to Trifacta to prepare it and get it ready for the organization.

So okay, so keying on that for a second. So in the example of Munich Re, the value there is my reduction in expected loss. I'm going to reduce my risk. That puts money in my bottom line. Okay, so you can help me with number one. And then take that Munich Re example into Trifacta.

Yeah, so the user will be the same user using Alation and Trifacta. So it's an actuary. As soon as the actuary has identified the data that is the most relevant for what they'll be planning, they work on it. So the actuaries are working with things like development triangles over 20 years. And usually it's column by column, so they have to pivot the data row by row. They have to associate that with the paid claims, the new claims coming in. So all this information is in different formats. Then they have to look at maybe weather information or additional third-party information where the level of quality is not well known. So they're bringing data into the lake that is not yet known. And they're combining all this data.
The outcome of that work helps in their risk modeling. They can use SAS or R or other technology for the risk modeling. But when they've done that modeling and built these new data sets, they're again available to the community, because Alation would index that information and explain how it is used. The other thing that we've seen with our users, if you think about insurers, banks, pharma companies, is that there's a lot of regulation. So as a user, if you are creating a new data set: where is the data coming from, where is the data going, how is it used in the company? So we are capturing all that information. Trifacta would have the rules to transform the data. Alation will see the overall high-level picture, from Tableau to the source system where the data is coming from. So it's super important as well for auditing.

And just one follow-up. In that example, the actuary (I know hardcore data scientists hate this term), the actuary is the citizen data scientist, is that right?

Yeah, actuaries would know, I would say, statistics, usually. But you've got multiple levels of actuaries. You've got many actuaries who are Excel users. They have to prepare data. They have to clean up and structure the data to give it to the next actuary who will be doing the pricing model, or the next actuary who would be doing the risk modeling.

You guys are hitting on a great formula which is cutting edge, which is why you guys are hot startups. But Bertrand, I want to talk to you about your experience at Informatica. You were the founder of Informatica France and you were also involved in some product development in the old days, back in the days of structured data and enterprise data, which was actually a hard problem: dealing with metadata, dealing with search, you had schemas, all kinds of stuff to deal with. It was very difficult. You have expertise.
I want you to talk about what's different now in this environment, because it's still challenging, but now the world has got so much fast data. It's got so much new IoT data, especially here in Europe where you have an industrialized focus, certainly Germany as a case in point, but there's pretty smart mobility going on in Europe. You've always had that mobile environment. You've got smart cities, a lot of focus on data. What's the new world like now? How are people dealing with this? What's your perspective?

Yeah, so we all know about big data, with all this additional volume and new structures of data. And I would say legacy technology can deal, as you mentioned, with well-structured information. And also you want to give that information to the masses, because the people who know the data best are the business people. They know what to do with the data, but the access to this data is pretty complicated. So where Trifacta is really differentiating, and has been thinking that through, is to say: whatever the structure of the data (IoT, web logs, value pairs, JSON, XML), for an end user it should be just a matrix, a table, because that's the way you understand data. The next thing is that when you play with data, usually you don't know what the schema will be at the end, because you don't know what the outcome is. So, as an end user, you are exploring the data, combining data sets, and the structure is created as you discover the data. So that is also something new compared to the old model, where an end user would go to the data engineer to say, I need that information, can you give me that information? And engineers would look at that and say, okay, we can access it here, what is the schema? There was all this back and forth.

There was so much friction in the old way. The creativity of the user is independent now of all that scaffolding and all the wrangling and pre-processing.
So I get that piece of the citizen journalist, citizen analyst, but the key thing here is that you're abstracting away the complexity to get the job done. So the question then comes in, and it's interesting, because all the talk here at DataWorks Summit in Europe and in the US is that all the big transformative conversations are starting with business people. So it's the business units or the front lines, if you will, not IT, although IT's got to now support that. So if that's the case, the world's shifting to the business owners, hence your startups. Am I getting that right?

I think so. And I think that's also where we're kind of positioning ourselves, which is: you have a data lake, you can put tons of data in it, but if you don't find an easy way to make that accessible to a business user, you're not going to get the value out of it. It's just going to become a storage place. So really what we've focused on is how do you make that layer easily accessible? How do you share around and bring some of the common business practices to that and make sure that they're communicating with IT? IT shouldn't actually be cast aside; they should have an ongoing relationship with the business user.

By the way, I'll point out that Dave knows I'm not a big fan of the data lake concept, mainly because they've turned into data swamps, because IT deploys it and says, we're done, check the box, data's in there. But the data's getting stale because it's not being leveraged. You're not impacting the data or making it addressable or discoverable or even wrangle-able, if that's a word. But I mean, my point is that that's all complexity.

Yeah, so we also call it the frozen data lake. You build a lake and then it's frozen and nobody can come fishing.

You play hockey on it. You dig a hole and you're fishing.

And you need to have this collaboration ongoing with the IT people because they own the infrastructure. They can feed the lake with data, with the business.
If there's no collaboration, and we've seen that multiple times with data lake initiatives, then we come back one year after and there's no one using the lake; like one or two percent of the processing power or the data is used. Nobody is going to the lake. So you need to index the data, catalog the data, to know what is available.

And the psychology for IT is important here. I was talking yesterday with IBM folks at the party here, and this is important because IT is not creating the frozen lake or data swamp because they want to screw over the business people. They just do their job. But here, you're empowering them, because you guys have got some tech that's enabling IT to do a data lake or data environment that allows them to kind of free up the hassles, but more importantly, satisfy the business customer. So you know what I'm saying? So there's a lot of tech involved, and certainly we've talked to you guys about that. Talk about that dynamic of the psychology, because that's what IT wants. It's almost a DevOps mindset for data ops, if you will, or data as code, if you will, is the concept we've been calling it. But that's now the cloud ethos and the data ethos kind of coming together.

Yeah, so I think data catalogs are certainly different in that traditionally they were more of an IT function, to some extent, on the metadata side, whereas on the business side, it tended to be a siloed organization of information that the business itself maintained very manually. So we've tried to bring that together. All the different parties within this process, from the IT side to the governance and stewardship all the way down to the analysts and data scientists, can get value out of a data catalog and can help each other out throughout that process. So if IT is communicating to end users what kind of impact any change will make, that makes their life easier: they have one way to communicate that out and see what's going to happen.
But they also understand what the business is doing, for governance or stewardship. You can't really govern or curate if you don't know what exists and what matters to the business itself. So bringing those different groups together, helping them help each other, is really what Alation does.

Talk about the prospects that you guys are engaging from a customer standpoint. What are some of the conversations with those customers you haven't gotten yet together? And also give an example of customers that you guys have and use cases where they've been successful.

Absolutely. So typically what we see is that an organization is starting up a data lake or they already have legacy data warehouses. Often it's both together, and they just need a unified way of making information about those environments available to end users, and they want to have a better relationship. So we're often seeing IT engaged in trying to develop that relationship along with the business. So that's typically how we start, and in the process of deploying we kind of work into that conversation of: now that you know what exists and you know what you might want to work with, you're often going to have to do some level of preparation or transformation, and that's what makes Trifacta a really great fit for us as a partner. They kind of come into that next step.

Yeah, like MarketShare, one of our common customers. We have BNSF, also a common customer; eBay, a common customer. So we've already got multiple customers. So, some information about the use case at MarketShare. They have to deal with their customers' information. So the first thing they receive is data, digital information about ads, so it's really marketing-type data. They have to assess the quality of the data. They have to understand the values and combine them with their existing data to provide analytics back to their customers.
And in that use case, we were talking to the business users, like the people selling MarketShare to their customers, because the faster they can onboard their data and qualify the quality of the data, the easier it is to deliver the right level of quality analytics and also to engage more customers. So it was really about fast onboarding of customer data and delivering analytics. And where Alation plays is that they can then analyze all the SQL statements that the customers are... maybe I'll let you talk about the use case. But it was the same users looking at the same information. So we engage with the business users.

Exactly.

So I wonder if we could talk about the different roles. You hear about the data scientist, obviously, the data engineer; there might be a data quality professional involved. There's certainly the application developer. These guys may not even be in IT. Then you've got a DBA, then you may have somebody who's sort of a statistician; they might sit in the line of business. Am I overcomplicating it? Do larger organizations have these different roles, and how do you help bring them together?

I'd say that those roles are still in flux in the industry. Sometimes they sit in IT, sometimes they sit in the business. I think there's a lot of movement happening and there's not a consistent definition of those different roles. So I think it comes down to different functions, and sometimes you find those functions happening in different places in the company. So stewardship and governance may happen on the IT side or it might happen on the business side, and it's almost a maturity scale of how involved the two sides are within that. So we kind of play with all of those different groups. So it's sometimes hard to narrow down exactly who it is. But generally it's on the consumption side, whether it's the analysts or data scientists, and there's definitely crossover between the two groups.
Then moving up towards the governance and stewardship that wants to enable those users or document and curate the data for them, all the way up to the IT data engineers who operationalize a lot of the work that the data scientists and analysts might be hypothesizing and working with in their research.

And you sell to all of those roles, so who's your primary sort of user constituency or advocate?

So we sell both to the analytics groups as well as governance, and they often kind of merge together. We tend to talk to all of those constituencies throughout a sales cycle.

And how prominent in your customer base do you see the role of the chief data officer? Is it only confined to regulated industries? Are you seeing it seep into non-regulated industries?

I'd say for us, it seems it's non-regulated industries too.

What percent of the customers, for instance (just anecdotally, not even customers, just people that you talk to), have a chief data officer, a formal chief data officer?

I'd say probably about 60, 70% of them.

That high, okay.

Yeah, same for us. In regulated industries but also others. I think they very often play a role as chief data and analytics officers. It's data and analytics. So they have to look at governance. Governance could be for regulation, because you've got governance policy: which data can be combined with which data? There's a lot, and you need to have that. But then, even if you're less regulated, you need to know what data is available and what data is trustworthy. So you have this requirement as well. We see them a lot. They are more and more powerful, I would say, when they're able to collaborate with the business to enable the business.

Thanks so much for coming on theCUBE. Really appreciate it. Congratulations on your partnership. The final word I'll give you guys before we end the segment is to share a story. Obviously you guys have a unique partnership.
You've been in the business for a while, breaking into the business with Alation. Hot startups. What observations are out there that people should know about that might not be known in this data world? Obviously there's a lot of false premises out there on what the industry may or may not be, but there's certainly a sea change happening. You see AI as a mental model for people: machine learning, autonomous vehicles, smart cities, some amazing, kind of magical things going on. But for the basic business out there, they're struggling, and there's a lot of opportunity if they get it right. What thing, observation, data, or pattern are you seeing that people should know about that may not be known? It could be something anecdotal or something specific. You go first.

So maybe what would be surprising is that Kaiser is a big customer of ours. And you know Kaiser in California and the US is big. They have hundreds of thousands of hospitals. And surprisingly, some of the supply chain people were working for years trying to analyze and optimize the relationship with their suppliers. So typically they would buy a staple gun without staples. Stupid. But they see that happening over and over with many products. They were never able to solve it. Why? Because that would be one product, they have to go to IT, they have to work, it would take two months, and then there's another supplier, a new product. So how to know which are...

They're chasing their tail.

Yeah, but okay, so it's not super exciting, but they are now able to do that in a couple of hours. So they are able, by going to the data lake, to see the data, to see how this hospital is buying. They were not able to do it before. So there's nothing magical here. It's just giving access to the data to the people who know the data best, the analysts.

So your point is don't underestimate the innovations; something as small or inconsequential as it may seem could have a huge impact.
Yeah, the innovation goes with the process, to be more efficient with the data, not so much building new products. It's just basically being good at what you do. So then you can focus on the value you bring to the company.

John Thomas, what are your thoughts?

So sort of related, I would actually say that something we've seen pretty often is that companies of all sizes are struggling with very similar problems, in the data space specifically. So it's not that big companies have it all figured out and small companies are behind trying to catch up, and small companies aren't necessarily super agile and able to change at the drop of a hat.

So it's a journey.

It's a journey, and it's understanding what your problems are with the data in the company, and it's about figuring out what works best for your solution or for your problems and understanding how that impacts everyone in the business. So it's really a learning process to understand what's going to.

What do your friends who aren't in the tech business say to you? Hey, what's this data thing? How do you explain it? The fundamental shift, what do you say to them?

So I'm more and more getting people that already have an idea of what this data thing is, which five years ago was not the case. Five years ago, it was, oh, what's data? Tell me more about that. Why do you need to know about what's in these databases? Now they actually get why that's important. So it's becoming a concept that everyone understands. Now it's just a matter of kind of moving into practice and how that actually works.

Operationalizing it, all the things you're talking about. Guys, thanks so much for bringing the insights. We wrangled it here on theCUBE, live. Congratulations on Trifacta and Alation. Great startups, you guys are doing great. Good to see you guys successful again. A rising tide floats all boats in this open source world we're living in, and we're bringing you more coverage here at DataWorks 2017. I'm John Furrier with Dave Vellante.
Stay with us, more great content coming after this short break.