Live from Dublin, Ireland, it's theCUBE, covering Hadoop Summit Europe 2016, brought to you by Hortonworks. Now your host, Dave Vellante.

Welcome back to Hadoop Summit, Dublin, everybody. This is theCUBE. theCUBE goes out to the events, we extract the signal from the noise. This evening, the big event is of course at the Guinness factory. We're excited about that; the customers are all going to be there, drinking that lovely beer. Andrea Capodacasa is here, joined by Hessel Medema. These gentlemen are big data architects with Cap Gemini, based out of London. Gentlemen, welcome to theCUBE.

Thank you very much.

We've got the double factor of the big data architects, big brains in touch with customers and what customers are doing, so let's unpack it. I mean, we've been talking all day about how Hadoop and big data are kind of reaching adolescence and growing up. So Andrea, let me start with you. From Cap Gemini's perspective, where are we in this whole big data theme?

So after a few years in which, let's say, big data was a technical novelty and a technical problem, we now start to see with our customers, especially the largest ones, the first large implementations going into production. And so we see the focus shifting from a purely technical problem to a wider one. You also need to take into consideration the operational and the organizational aspects when you want to transition smoothly into these modern architectures.

So Hessel, you guys are both big data architects. What is that in the world of Cap Gemini? You look like you dress like a big data architect; Andrea, you dress like a business person. But you guys are both big data architects. Is that a technical role? Is it a business role? Is it a dual role? Talk about that.

So I guess it probably sits on the dividing line of business and technology, as so many of these kinds of roles do.
I think the last couple of years it was primarily a technical role, because we were just trying to implement all these new technologies and learning how to implement them in the best way, because nobody had done it before. I think nowadays it is more about trying to establish a business transformation, to actually know how to leverage all this information that's now readily available to all the people in the business.

And how much of the role is strategy versus sort of implementation?

It depends on the project, of course. And the suit also depends on the project, in general. But I would say at the moment we are lucky enough to be considered as advisors for our customers in their transition to the new world. So I would say that, especially lately, we are probably 60 to 80% in the advisory, in the strategy, and in creating this roadmap for a smooth transition, rather than in the implementation.

I mean, essentially your company is all about getting people to do something to improve their business or their organization. I mean, doing nothing is a strategy, it's just not a very good one. So you're trying to achieve business outcomes. One of the things you guys are talking about is moving folks to a modern data architecture, generally big data, I guess, specifically. What does that mean?

I guess it means that you understand your landscape well enough that you can leverage all the technologies that are available in the big data ecosystem to their best advantage. If you look at all the vendors here at the Hadoop Summit, there are so many technologies to choose from, but you have to actually leverage them in the best way. And I guess a lot of the lessons that were already there, for example in the data warehousing world that everybody has had experience with for tens of years, we have to relearn a little bit. So, for example, how do you manage your data, and how do you tackle data quality and all those kinds of things?
So, Andrea, the traditional enterprise data warehouse: we've got kind of a love-hate relationship with it, I'll say that, right? I mean, it's obviously driven a lot of business value, but it's been a challenge for people. It's like a snake swallowing a basketball in terms of data; it's very difficult to keep up. You have to go through a select few analysts who can get you the answer. And the architecture is, you put a big pipe into a big box and then you have some lords over that box. So how is a modern data architecture different from what I just described? And is what I just described fair?

Yeah, I think it's a really good question. The promise of the enterprise data warehouse was this kind of unified architecture, this kind of ivory tower with, at the top, the oracle: the full definition of reality. And this did not work. This single view is something that cannot be achieved by a single architecture. So what these new architectures are bringing is agility. You will have different layers. This is actually something we will be speaking about later in our presentation at three o'clock. Let me do some sort of promotion: three o'clock, on the second floor.

So, coming back: there will still be some data that are very well certified, of higher quality. And these are, let's say, in other words, what we call the traditional data warehouse. But then we have more flexible ways to join new data that give us better business insight. So we are building on what was the data warehouse, but it is not the only way to serve the data that are being used. We essentially have this kind of archipelago of data that are interconnected. And this is possible just thanks to these new architectures, which are very open, very flexible and very powerful.

And not to mention very cheap. Is this always cloud? Is it 100% cloud, however you define cloud?

From our point of view, we are agnostic.
I mean, the data need to stay in the place where they need to reside. Especially in Europe, you know that there are some concerns about privacy, and different nations have different views on this. So sometimes the best option is to keep them in, let's say, data centers, et cetera. So cloud is an option. In many cases it's a fantastic option, especially to start very, very quickly, because you can start immediately focusing on the data rather than on infrastructure. But we are open to many different...

But what if I define cloud differently than where the data goes? What if I define cloud as sort of the operational experience? In other words, you know, people think of public cloud, Amazon, Azure, whatever, as very simple and agile. Can I replicate that on premise today?

Absolutely.

Okay, so let's call that cloud.

Yeah, yeah, yeah.

So is it fair to say that everything must be cloud as I've just defined it?

Yeah, I think with your definition, essentially... if I can rephrase it?

Yeah, sure.

The users need to focus on their analysis, not on the iron that enables their analysis. And we're going toward that cloud in the sense that we are leaving behind, we are hiding, the technical complexity, and we are leaving the users to do what they need to do, that is, the business.

So I wonder if you could comment on the following. We've seen this emergence of Hadoop and data lakes, et cetera, and I want to talk about the business value that customers are seeing. It seems that a lot of the business value is coming from cutting costs: Hadoop is a cheaper storage container than putting the data in a million-dollar mainframe box, for example. So okay, that's a good way to get ROI, cut the cost. Have we gone beyond that? And what kind of examples can you give in terms of real, tangible business value?

I guess the easiest examples are in the mainstream, for example fraud analytics, just seeing who's committing any credit card fraud.
And with the proper data scientists, it is actually quite easy, with all the technology that's available now, to find those kinds of patterns. And then there's, for example, predictive asset management. I think that's getting very, very popular right now: you can actually, for example, maintain very expensive equipment before it actually breaks down.

I want to follow up on that. Is that a technology factor or is it a cost factor, in the sense that it's now inexpensive for me to eliminate sampling, for instance in fraud detection? I couldn't do it before because it was just too expensive? Or is it a case where the technology wouldn't allow it, it was too rigid, too stovepiped? Is it some kind of combination?

I think it's actually about giving the tools to the right data scientists to actually leverage that information. And just the enablement that a Hadoop environment can bring, to indeed not use any sampling, but also to create the many, many machine learning models that you want to test against your data. That really is the big differentiator, and the agility that it offers.

Has the data scientist kind of taken over the role of the business analyst, the person who built the cubes and so forth? Are they a bottleneck?

They can be, because in some cases, when you have to do something that is very complicated, you need to have very specialized and very bright people who are kind of unicorns, in a sense. They need to understand the mathematical models, they need to understand the business, and they need to understand the underlying technology to crunch this massive quantity of data. However, the solution is to use the data scientists only where they are strictly needed, because there are many cases, for example fraud analytics, in which you need a data scientist to set up the system, but then an analyst without a mathematical degree can easily use the parameters, can tweak them to identify whether there are frauds.
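The division of labor described here, a data scientist sets up the scoring model once and an analyst only tweaks the parameters, can be sketched roughly as follows. This is a toy illustration only; the rules, field names and thresholds are hypothetical, not Cap Gemini's actual system:

```python
def fraud_score(txn, *, amount_limit=1000.0, max_per_hour=5, foreign_penalty=2.0):
    """Score a transaction; higher means more suspicious (illustrative rules only)."""
    score = 0.0
    if txn["amount"] > amount_limit:
        # Unusually large amounts raise the score proportionally.
        score += txn["amount"] / amount_limit
    if txn["txns_last_hour"] > max_per_hour:
        # Bursts of activity are another common fraud signal.
        score += txn["txns_last_hour"] - max_per_hour
    if txn["country"] != txn["home_country"]:
        # Transactions outside the home country add a fixed penalty.
        score += foreign_penalty
    return score

def flag_suspicious(transactions, threshold=3.0, **params):
    # Scans the full data set rather than a sample, which is the
    # enablement the Hadoop discussion above is pointing at.
    return [t for t in transactions if fraud_score(t, **params) >= threshold]
```

The data scientist owns the logic inside `fraud_score`; the analyst only adjusts `threshold`, `amount_limit` and the other keyword parameters, no mathematical degree required.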
So in essence, again, you will have very specialized people for very complex tasks, but in general you are giving very sophisticated tools, easy to use and agile to change, to a much wider and less specialized audience.

So let's simplify the discussion. Let's say there are two types of installations, architectures: one is the traditional enterprise data warehouse and the other is sort of the modern architecture you guys are espousing. When you go in to talk to clients, do you see patterns emerging for each of these that are clearly discernible? Is it still fuzzy? I wonder if we could talk about that, Hessel.

Well, I think it is very sensible just to think about your use case: what do you want to use it for? Data warehousing technology is clearly not obsolete, but what you do see is that more and more use cases can be delivered by big data platforms. Most of the people who are building data warehouses have already been doing this for a long, long time, so they know when to build a data warehouse. I think it is very important to actually know what your right use cases are to start leveraging a modern data architecture. And of course, if it's streaming data, it's quite likely that you want to use a modern data architecture. If it's unstructured data, of course you want to use a Hadoop architecture. If it's structured data, well, I guess it's a cost-benefit kind of analysis that you have to do.

So let's assume the use case lends itself to a modern data architecture. What do you see as the critical success factors to getting there?

So in general, of course, you need to find the right set of tools for the right case. This is kind of obvious, but sometimes, let's say, the enthusiasm for some particular technical solution prevails. That kind of problem, of course, is now fading. But then we also need to make sure that we are able to transition to the new architecture in a very simple and smooth way.
The users want continuity after the migration. So we also have this kind of decommissioning strategy. And of course we also need to take into consideration user adoption. Most of the users don't even know what big data is. This is the reality. We are super excited, but that's the reality: the users, especially in big companies, barely know what a database is, so they cannot really understand. So we really need to think about a communication plan. We need to get them excited about the possibilities, which are enormous, about what you can do, like joining the old data with this streaming data, all the data that are so different, so big, so fast, et cetera. This is something that needs to be done; it cannot be left, you know, to improvisation. But the reward is huge, because then the users start to love the new immediately and forget the old, and then you can decommission the old much more quickly.

So let's talk a little bit more about that adoption. You go through the process of, you know, consulting with your customers, advising them; maybe they have you do some of the implementation work. They bring their best people in, and now they've got this modern data architecture. You talked about some of the ways in which you can get adoption, but can you operationalize it so adoption can really go through the roof?

So I think maybe our biggest success story in this is when we are able to actually create physical environments where the data scientists and the business analysts and the end consumers of the data get together. When you create these insight centers, it really provides a collaborative environment where you can actually transform the raw data into the business insights that you're looking for.
And I think, again, if you compare it with a traditional setup, that's where you used to have the business analyst and the data modeler sitting in their ivory tower, creating these models which they think will lead to business insights. Now you can have a direct impact on how you're going to use the data and find those insights. So when you can set up a physical environment where people can be close together, we definitely recommend it. It's always a success.

So that physical, proximate collaboration. Interesting.

Yeah.

We joke a lot on theCUBE about data lakes. My partner, John Furrier, doesn't like data lakes; he likes data oceans, because lakes sometimes become ponds and then turn into swamps. How do you avoid that dynamic, the data lake just being a bog of data of no value?

So even there, we are in a world in which we have the traditional data, which were significant even before big data, but now they start to explode. You see the terabytes of data created by any sensor, any plane, et cetera, et cetera. So you need some processes that make sense of the data you are introducing into the data lake. And then you need, on the other hand, to give the users a very simple user experience to get the data that they want in a really simple way, not in a technical way, and to be able to join the data so they get most of the information they want. These are the new challenges.

And we do this in two ways. When we ingest the data, we are working on automatic metadata creation. So we are tagging the data: this is customer data. We are ingesting them only if they are not duplicating, only if they are not replicating data that already exist. We suggest where they can join other transactions, et cetera, et cetera. Maybe, when we ingest a PDF with a letter, we understand that this letter is from this person, and we join it to the rest of this person's data. And this is done automatically.
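The ingestion flow just described (tag on arrival, reject duplicates, suggest joins) could be sketched in a few lines. This is only a toy model of the idea; the catalog structure, hashing scheme and tag names are hypothetical:

```python
import hashlib

catalog = {}  # content hash -> metadata record

def ingest(name, content, tags):
    """Register a record at ingestion time, tagging it automatically
    and skipping exact duplicates of data that already exist."""
    digest = hashlib.sha256(content.encode()).hexdigest()
    if digest in catalog:
        # Same content already ingested: do not replicate it.
        return catalog[digest]
    record = {"name": name, "tags": set(tags), "hash": digest}
    # Suggest joins with records sharing a tag (e.g. the same customer id),
    # so a letter in a PDF can be linked to that customer's transactions.
    record["join_candidates"] = sorted(
        r["name"] for r in catalog.values() if r["tags"] & record["tags"])
    catalog[digest] = record
    return record
```

The point of the sketch is that the tagging and deduplication happen automatically at ingestion time, which is what makes the Google-like search interface described next possible at scale.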
On the other hand, there is an interface that is, as I say, Google-like, in which a user says, I want to see the most likely causes of fraud. And then there is essentially a set of information that is categorized and that leverages the metadata I created when I was ingesting the data. This is crucial, because since the data you are ingesting are enormous and very fast, you need an automatic process to tag the data. You cannot rely on just manual processes.

Hessel, when you talk to customers, what's the biggest concern? Is it, hey, I have to modernize because there's such an opportunity, there's such a huge ROI? Or is it, I have to do this because the organization is pressuring us to do this, and I need to make sure it's secure and compliant, and the data location is proper and it's governed, et cetera? Which of those seems to be the overriding factor?

So I guess from a business perspective, they just want to use the data, and they hear all these great use cases and stories. So they always just keep pushing everybody: we really want to apply machine learning to our data, or we really want to be able to correlate these data sets. Then of course, when you talk to the IT department of the big enterprises, it's always a discussion of how can we do this securely, how can we meet compliance and regulations and all those kinds of things. So it really depends on who you talk to within an organization. And I guess that's also why we, as a system integrator, try to talk to all these parties, to align everybody within an organization.

What's the engagement model? How do customers engage? Are you typically going in talking to the line of business? Are you talking to IT? Are you bringing the two together? What's the engagement model?

It depends on the engagement itself. It depends on the use case.
Sometimes there are requests for information or requests for proposal, in which there is a landscape that they probably want to migrate, or some use cases that they want to address. Most generally, we are asked for a point of view on what they can do with these new technologies. So we are given information about their current landscape, we use our experience across all our customer base, and we give them some use cases from other companies that are a little bit more advanced. We help them, but we respond to all types of engagements. It can be very specific: I want to do this with this technology. No problem, we have experience in essentially all the technologies. Or it can start from a road map that is not well defined at the start, essentially just to get them started and to organize themselves for transitioning smoothly.

And you guys are part of a big data practice, not specific to any industry, is that right? But you have industry focus in some cases. Is that true?

Yeah, so we've got a global big data practice.

So you're part of that?

Yes, yeah, yeah. It's a global organization within Cap Gemini called Insights and Data, and it's purely focused on big data and analytics and things around that. The big data team definitely has a global focus, because it's just a global problem and all our clients are global as well, so you have to have that global mindset.

And we also have the global size, because we have 10,000 people, growing at significant rates every year. And we are very lucky because we work with the most advanced customers, who are not only addressing these problems but have already solved them, thanks to our help. So we are very happy to also help the new companies that are embracing this transition to get there as smoothly and successfully as possible.

Excellent. All right, we'll leave it there. Gentlemen, thanks very much for coming inside theCUBE.

Thank you for inviting us.
Thank you for having us.

All right, keep it right there, everybody. Go on Twitter and check out hashtag CubeGems; you'll see little snippets of the interviews from today. Check out siliconangle.tv. Of course, go to wikibon.com; we just released our latest big data market forecast. And always, always go to siliconangle.com for all the news. This is theCUBE. We'll be right back after this word, from Dublin.