Thank you very much. Good morning, everyone, and thanks for coming to this talk today. I'm going to talk about big data governance challenges. The first thing I want to do today is ask all of you: how many of you are managing or working in a data lake environment? Okay, that is more or less what I expected. And how many of you are struggling to get real business value out of all that data? Well, some people do, and some people don't admit it. Maybe there's a boss here or something like that. Let me rephrase the question. How many of you, in your whole professional career working with data, have faced a situation in which some business user complains that your organization has lots of data but too little information? Have you heard that? Yeah, that's something usual. So maybe the title of the talk today is not the right one. It's not big data governance challenges. Maybe I should put it this way, big data governance challenges, because data governance is something that is not really easy to do in our organizations. Some facts to get started. According to IBM, an estimated 3.1 trillion dollars were wasted because of poor data quality. That's in 2016 alone. That's a huge amount of money. Okay, we are in Europe, and we know that these 3.1 trillion are European billions, but it's still a huge amount of money. According to Gartner, in 2017 each organization, on average, incurred 15 million dollars in financial costs because of bad data. That's also plenty of money for only one organization, on average. So the savings and the opportunities to improve are really big. And it's also said that almost 60 percent, more than half, of the data scientists working in organizations spend most of their time cleaning and organizing data. Those are tasks that do not add any value. We are wasting talent, instead of having these people train models, make predictions, and understand the business to analyze what's going on.
They are cleaning and organizing data. So how can we solve this situation? We're going to solve it with data governance. And what is data governance? For me, I like this definition a lot; I took it from VA Survey. Data governance is everything, including people, processes, and technologies, needed to manage and protect the company's data assets. And the objective is to guarantee understandable, correct, complete, trustworthy, secure, and discoverable corporate data. Sorry, I read the definition, I know. But it's good to set a foundation for what we are talking about here. And I like this definition a lot because it talks about more than processes. Maybe when someone asks what data governance is, every one of us thinks about processes and bureaucracy, but it's much more. It's also technology and, even more important, it's also people. And we shouldn't forget that. So that's the reason why I like this definition a lot. Which kinds of questions can data governance address in a data lake environment? First of all, it's going to help us understand and know what data is available and in what format. What can I expect? What can I ask of my data system? Second, what precisely does the data mean? This is a very important question. Sometimes we go to a very high-level meeting in which the business executives, instead of talking about how they are going to solve a business problem, are complaining about why your data is different from mine and which data is correct, and they're wasting their time and their effort. Maybe they are talking about sales in an e-commerce platform. And maybe one of them considers that a sale is everything that happens after a customer checks out their shopping cart on the web page. But maybe another one thinks that a sale happens only when the product is really delivered to the customer.
And maybe another one is thinking that a sale is something that happens when the return period has expired. So the sale is consolidated; there is no going back. So it's very important to have the same meaning for everything, so that every business user and every user of the system has the same understanding of what we are talking about. Data governance also helps us to know where the data comes from. From which system? Which transformations happen there? What is the origin of the data that I'm seeing in this report? Which sources are involved in it? What is going to happen if I change some operational system? Which reports are going to be affected? To what extent is it reliable? That is data quality, and speaking about data quality: is the data precise? Is it an estimation? Has it been cleaned before or not? It's not bad if data is not precise. What is bad is if I'm not aware that it is, to some extent, an estimation. I must know it. Very important: who is responsible for the data? Whom can I ask for help? Who is going to say that this is the right definition for our corporation, for our organization, for this piece of data? Who is the person that is also going to define the authorization policies? Who is authorized to use it? Am I authorized to use this data? Maybe I want the country manager for some country to see all the other countries in the world in my organization, or maybe I only want each country manager to be able to see their own data. Those policies, and also regulations, GDPR, et cetera, are important too in data governance in our data lake. We've been talking about data governance for many years, even before we started talking about big data. And the building blocks are pretty clear: which components we need to carry out a successful data governance initiative. We may have a data catalog that helps us answer the question about what data is available and in what format.
We may have a business glossary that helps us understand the true meaning of each piece of data. We also have data lineage. Sometimes it's included in the data catalog. I like to give it importance, so I have it as a building block of its own, to help answer the question about where the data comes from. Obviously, data quality. And in the foundation, the metadata, the data about the data, the data that helps us feed all the other building blocks. And last but not least, obviously, the security and compliance policies that you should apply to this whole system. So if at a high level we have a pretty clear understanding of what we are doing, of what we are talking about, and everything is written down, why is it so difficult to succeed in data governance? Why do we still have all these problems in really extracting business value out of the data? I took this slide from a book by Jurgen Appelo. It's called Management 3.0. It's a highly recommendable book. I like it a lot. And this slide is going to help me explain the difference between two concepts that many of you think are synonymous. If I talk about something complicated and if I talk about something complex, maybe many of you are thinking that it's the same. Something complex and something complicated are the same thing. And I don't agree with that. It's not the same. Something complicated is the opposite of something simple. It's something that, if you are not an expert, you are not going to be able to understand. But if you are an expert, you are going to be able to understand it. And you are going to be able to predict the outcome of the system. For example, my watch is very complicated. I do not understand how it works. But I know that it's going to tell me what time it is. So it is a complicated system. But it is not complex. It's predictable. But, for example, my family, or any family, is a very simple system. Three, four, five people living in the same house is a simple system.
But it's very complex. You cannot predict what is going to happen. Maybe your child is having a bad day because he or she broke a favorite toy. Or maybe one of the adults in the house had a bad day at work and comes back home really tired. So you cannot predict the outcome, what is going to happen. And there are other systems that are chaotic and are complicated, like the stock exchange. You need to be an expert to understand it. And even if you are an expert, you are not going to be able to predict what is going to happen. So this classification of difficulty is going to help me with what I'm going to say next in the presentation. So remember, complex and complicated are not the same thing. So what is the bad news? It's that data governance is both complex and complicated. We have technologies that are complicated. We have processes that help us keep everything more or less ordered, and those processes are complicated too. But we also have people involved in the equation. We have people who are generating data that may have errors, intentionally or, most of the time, unintentionally. When people generate information, it may have errors, human errors. We have people who are responsible for the data. People who need to take ownership of the data. And sometimes this ownership is not clear. Sometimes you don't know who is responsible for this data. And nobody in the organization can answer that question. So when you include people in the equation, you are adding complexity to the organization. You are adding complexity to the system. And what happens to IT guys like me? I come from IT, not from the business side. But I like to talk to the business here. What happens a lot of the time is that we imagine we are able to solve the problem by managing it as if it were only complicated, only with procedures, only with policies. And that doesn't work. That doesn't usually work.
You have to take into consideration that the problem is both complex and complicated, and there are people involved in the equation. And different kinds of people. You have business people and you have IT people, and they speak different languages. More than 10 years ago, before we started talking only about big data and technology, we talked about how to bridge the gap between business and IT, with BICCs, Business Intelligence Competence Centers, et cetera. And now the situation is bigger. We have more data. We have more complicated systems. And only two or three years ago, we started talking again about these kinds of problems, about data governance, et cetera. So in case we didn't have enough 10 years ago, when data governance was already a challenge because of the people, the technology, and the processes, along came big data. And here we are at Big Data Spain in 2018. And we have data in real time. And we have very complicated batch processes for prediction, machine learning, artificial intelligence. We have different means to deliver the information to the business, not only business reports. We have services and operations embedded in our operational systems, with intelligence in them. We have complex architectures with new technologies that are not as mature as the usual technologies were. So they are more difficult to manage. Here we have an example with different tools. It doesn't matter which. In the booths outside, there are many, many technologies that help us solve the technical challenges that came with the big data explosion. Have you ever seen this? It's a big data talk, so I have to say something about volume, variety, and velocity, the 3 Vs. But what's new here is this math symbol. Let me play a little game with maths. This symbol is a multiplier. What I'm trying to communicate here is that when we add more volume, more variety, and more velocity to our data lake, the difficulty of managing the system is not going to scale linearly. It's going to be exponential. This is an exponential factor.
So the idea here is that the 3 Vs are exponential. So our challenges, our data governance challenges, now in the big data world, now in our data lake, are being multiplied by an exponential factor. So we have more or less the same challenges that we struggled to face 10 years ago, but in a much bigger environment. So that's the reason why many times we do not succeed when we try to extract value out of our data lake. Let me play a little game here. I know that the synonyms, complicated, complex, all of that is confusing. So instead of talking about complexity, I'm going to talk about entropy. If you talk about entropy, the definition in thermodynamics is the amount of disorder or chaos in a system. That's precisely what I was talking about before when I talked about complexity. So I'm going to talk about the exponential factor, the 3 Vs, when I want to talk about the volume, variety, and velocity in our data lake. And I'm going to talk about entropy when I refer to the order or disorder in the system, to its predictability. So let's play a little game here with this graphic, this one here with the quadrants. What do you think would happen if we have a low 3 V exponential factor, low variety, low volume, low velocity, and everything is ordered? We have low entropy. I like to call this place the land of lost opportunities. Everything seems under control, but I'm not leveraging all the data assets available to my organization. So this is not a good place to be. What happens if I add some entropy to this system, if I add some disorder, some chaos? I like to call it the messy teenage room. Nothing more to say here. Please tidy up. There's something that you can do. So let's move up to the big data world, with the high 3 V exponential factor. If we have a very ordered system, I like to call the place the plateau of bureaucracy. I have a feeling that everything is under control, but it requires too much effort, and I'm going to be slow.
This is the kind of place where I can spend two months asking for permissions to access some information that I need for the new system. And I'm going to be late getting the results to my business users. So I'm not going to add value. I'm going to keep everything ordered, but I'm not going to add that value because I'm going to be late. So this is not a place to be, either. We have one last place to be: high volume, variety, and velocity, and high entropy. What happens here? Here is the realm of chaos. OK, I'm not going to follow any kind of process. I'm not going to do any kind of data governance. So what's going to happen here? Unpredictability. Data quality is going to be low. I'm not going to know what's going to happen when I start a project. This is the kind of place where, for example, there are several teams extracting the same data out of an operational system and putting it into the data lake, the same information with different names, wasting storage, wasting computational resources, and even worse, applying different data quality policies to each of the copies of the data. So maybe you have different results in different reports, and that's the worst thing that you can do. Unpredictability, and a lack of trustworthiness in the system. So when we draw this kind of graphic, this kind of picture with four quadrants, normally there is at least one quadrant that we would like to be in. But this is not the case. Everything is bad. What's happening here? It's not so bad. There's a good place to be, and I like to call it the balance zone. This is the balance zone, where you are managing all your data assets and you are keeping a balance between order and chaos. So this narrow zone is the place where real business value is being generated in your organization. Just let me elaborate a little bit more with a different metaphor. If you are in the plateau of bureaucracy, this is the picture that comes to my mind. This is a desert. This is a huge desert. You have a lot of material. You have a lot of sun.
But crossing it is painful, just as it's painful to cross a data lake with a lot of bureaucracy and a lot of processes, one that is not flexible, that is not agile. So you can survive crossing this desert. Maybe you have a very expensive, big truck, and you can cross it and survive, like in the Paris-Dakar race. But maybe your truck is going to get stuck in the sand, and you are not going to have a good time. You are probably going to be late, as I said before. If we move to the other side, which kind of image comes to your mind? If we add chaos to the system? This is the data swamp. This is unpredictable. This is dangerous. If you cross this data swamp, a snake can bite you. You can catch some illness, or whatever. This is the kind of place where you are developing some new project and someone deletes your data sources without giving notice beforehand. So you are in the middle of the project, something happens, and you are stuck back at the beginning again. There is a lot of unpredictability. There are a lot of dangers there, because nothing is under control. So what should be the right place to be? What would it look like? I like this picture. This is a good place to be. This is a healthy environment. This is a healthy lake. This is our healthy data lake. Okay, there may be some unpredictability. I'm not going to know exactly what is going to happen. But it's a safe place to be. I can swim, I can sail, I can fish, I can even run on the shores of the lake. I can deliver a project knowing which data is available, knowing the outcomes, and more or less predicting how much time I'm going to need to deliver the results. So how can we really succeed? How can we make real this promise of balance in our data governance policies in our data lake? How can we reach the balance zone? I have three pieces of advice for all of you today. The first is that when you build a data analysis platform, a data analysis system, you should follow a comprehensive methodology.
Something that usually happens is that technical people use general software development methodologies to deliver these kinds of systems. And that's not a good idea, because there are specific tasks that must be performed in a data analysis system that are not included in general software development methodologies. Software development methodologies are good for developing software, but here you need more steps in your methodology. For example, we at DXC are a huge services company that comes from the merger of the Hewlett Packard Enterprise Services division and CSC. So we have spent a lot of years working on these kinds of initiatives and on analytics, and we have a very mature methodology that includes all the tasks that are specific to data analysis. So this is the first thing; this is what we have at DXC. There are other methodologies too, but you should follow one, and you should make sure that it includes all these data-analysis-specific tasks. And also, you should embrace the agile principles, maybe Scrum or some other methodology. It's not important which specific framework you are going to use. What is important is that you embrace some kind of unpredictability, that you assume that you are working in a complex environment, that innovation, the digital transformation, is an unpredictable journey. So the methodologies that best help us succeed in these kinds of environments are the ones that embrace the agile principles, whichever the framework. Here I am picturing Scrum, but you can do whatever you want; just consider that agile is going to help you. Also, these methodologies should embed data governance processes in them. In our case, for example, at DXC, we have a specific track for metadata and a specific track for business. Because what happens if you forget about metadata until the end of your project?
Your costs are going to increase a lot, you are going to have additional delays, and sometimes it's not even going to be possible to extract all the metadata that you need to properly govern your system. So this is the first piece of advice. The second piece of advice is to automate as many tasks as possible. Many years ago, when I started working in business intelligence and data warehousing, we had a few data sources. We had some data marts with some 50, 100, 200 tables, and we were able to document them and to keep all the necessary information in Excel spreadsheets, manually. It was painful, but it was possible. And we could get some help with these kinds of deliverables in our projects. But that was a long time ago. Now that's not possible. We have thousands of data sources and tens of thousands or hundreds of thousands of transformations on the data. And there are many different tools that can help us get all this data about what we are doing: which kinds of reports our organization has, where the data comes from. And all this information can be managed in different systems. Here there are three different tools from three different vendors that could help us succeed in these kinds of tasks. What I would like to say today is that business value comes from insights, from analytics, from good decisions, and business value comes from, for example, embedding artificial intelligence in our operations, in our organizations. But artificial intelligence can also help us solve these kinds of complex problems of automation, of metadata management, and of our data governance processes. So this is like a wheel that feeds itself to move better. So automate as many tasks as possible. You can build your own system for automation or you can buy one. Here in this slide you can see many vendors that provide solutions, some of them with different characteristics. But there are plenty of options in the market to help us succeed in these kinds of tasks.
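As a minimal sketch of what automating metadata collection can look like, as opposed to maintaining a spreadsheet by hand, the Python below reads table and column definitions straight from a database and builds a small catalog from them. Everything in it is an illustrative assumption and not a tool shown in the talk: the in-memory SQLite database, the `sales` and `customers` tables, and the `harvest_catalog` and `build_demo_catalog` helper names are all hypothetical.

```python
import sqlite3


def harvest_catalog(conn):
    """Build {table: [column descriptions]} from the database's own schema."""
    catalog = {}
    # sqlite_master lists every object in the database; skip SQLite's internal tables.
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
        "ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        catalog[table] = [
            {"column": name, "type": col_type, "nullable": not notnull}
            for _cid, name, col_type, notnull, _default, _pk in columns
        ]
    return catalog


def build_demo_catalog():
    """Create a hypothetical two-table database and harvest its metadata."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (order_id INTEGER NOT NULL, amount REAL, country TEXT)"
    )
    conn.execute("CREATE TABLE customers (customer_id INTEGER NOT NULL, name TEXT)")
    catalog = harvest_catalog(conn)
    conn.close()
    return catalog


if __name__ == "__main__":
    for table, columns in build_demo_catalog().items():
        print(table, [c["column"] for c in columns])
```

The point of the sketch is only that the catalog is derived from the system itself, so it can be regenerated on every run instead of drifting out of date. A real data lake would need the same idea applied across many source types, plus lineage and ownership information that a schema alone does not contain.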
Third, everything should be made as simple as possible, but not simpler. This is a quote that I like a lot, from Albert Einstein. It's a simple quote, but it has many, many important things in it. I was talking about balance. So if everything is made as simple as it can possibly be, where are we heading? In which place are we going to end up? We're going to end up in the realm of chaos that we pictured before. What happens if you make things more complicated than they should be? And that's something that, unfortunately, we technical people sometimes do, because we say, okay, that's great, this is technically great, but we make it complicated. Where are you going to end up? In the plateau of bureaucracy. So this place where everything is made as simple as possible, but not simpler, is your balance zone. It's where you want to be. It's where you are really going to be able to extract business value from your data. A couple of tips for managing simplicity, for keeping the balance. If you know agile, you know the concept of technical debt. Technical debt is not something bad. It's something that happens and that you must manage. What's bad is not managing technical debt, but all of us have technical debt. With data governance, processes, metadata management, and quality, the same thing happens. We may have an optimal data governance progress in our data lake, in our organization, but we cannot always follow this path. Maybe we have budget constraints at some moment in time, or an opportunity with a deadline that we would like to seize, but we cannot perform all the ideal tasks to be able to succeed. So maybe we have some data governance debt there. We made real data governance progress that is lower than we intended. And that's not something bad. Debt is not bad. I have a mortgage. I have debt. And it allows me to have a house. So it's not bad.
What's bad about this, as I said, is not managing it, not being aware that some decision that you made is generating technical debt or data governance debt. So keep that in mind in every decision and every step that you take. And the second tip may sound like an obvious one, but sometimes we forget about the obvious. And again, we technical people sometimes forget about it. And business people forget that we are managing difficult and complicated systems. So we speak different languages, and we should understand each other better in order to succeed, because we need each other. So if you are in doubt about which path to follow in your data governance, in your data lake, whether this is a good or bad decision, whether this task that I'm going to perform is going to help me or not, always think about value generation. This may seem obvious, but in my experience, many, many times we forget about it. So, okay, we are going to do that because it's something that I read somewhere and it's a very good practice. Or, no, I'm not going to do that because it seems expensive. These are things that I have heard again and again. But it may be expensive and still help us to generate more value. Or it may be a good practice and not help at all in generating value in our organization. So always, always consider value generation in each of your decisions related to how you are going to design and deploy your data lake. So, considering these three pieces of advice, what should we expect to achieve? Where are we heading? How does it feel to be in the balance zone? As I said at the beginning of the presentation, if you are in the balance zone, you are going to have data which is easy to find, which is easy to understand, which is correct, which is complete, which is trustworthy, which is secure. So it is the kind of place where you have this predictability.
It's the kind of place in which your data team is going to have a high ability to adapt to new business requirements and to new conditions in your environment. It's going to have flexibility. It's going to have agility. It's going to help you reduce costs, because you are going to be more focused on what you really need. And you are not going to have the same problem as the data scientists I mentioned at the beginning, who are only cleaning and organizing data. You are going to be performing tasks that add value. So also, thanks to this focus, you are going to achieve reduced time to user. You are going to be faster. You are going to be able to deliver faster. You are going to be able to help your business seize all these opportunities that are ahead of us. This is the foundation for making data-driven decisions. And this is the foundation of a data-driven company. And data-driven decisions really generate value in the organization. So to achieve this value chain, we need to be in the balance zone and to put the right data governance processes, technologies, responsibilities, and people in the organization to make sure this can happen. So we are coming to an end today. As a wrap-up, I would like to summarize the main takeaways from today's talk. The first takeaway is that data governance is essential to drive business value in our data environments. Big data environments need data governance. You should use specific methodologies that have data governance processes in mind to manage the complexity in this system. You should also automate as many tasks as possible. The problem is difficult and you need help, technical help, to solve it, including artificial intelligence, as I said before. And last but not least, you should keep it as simple as possible, but not simpler. Use your common sense, please. Do it. Do not go back to your messy teenage room, okay? So that's all. Thank you for your attention.
I would be very glad to answer any questions that you have, here or later in the booth, the DXC booth. And I wish you the best in your journey to the balance zone in your data lakes.

Thank you for your talk. Very interesting. So, for instance, if you have a company in the realm of chaos and you want to move it to the balance zone, and this company has few human resources, okay? What would be the first initiatives related to data governance that you should start with? You know, the key ones. Because there's a lot: data quality, you know, data lineage, business catalog. Should you involve business from scratch? Or start with the technical architecture?

Yeah, it's a very good question. Where do we begin? You should begin with the part where you have the best chance of being successful. You said that you have a low level of human resources. So if you have a low level of human resources, your complexity is going to be manageable. So start with ownership. Start with who is responsible for the data. Ask that question. And maybe you are going to find surprises. Maybe ask some simple questions. Who is responsible for the definition of sales in the organization? And you say, okay, everyone, what is the definition of a sale here? Nobody is going to answer. So begin with that. Small steps that generate awareness of the situation. And when you do that, when you have someone who is responsible, you can say, okay, if you are responsible for this data, here I have five reports. And all of them are telling different stories. And things are getting bad. How can I succeed here? What should I do here? And he's going to ask, where does this data come from? Why is the quality so poor? So you have the next steps. Okay? A simple data lineage initiative to help the person who is responsible understand why the data is not good at all and where it comes from. And you begin pulling the thread and helping everyone understand.
And you are going to get the support and the budget when you begin to improve the quality and the predictability of your initiatives. And this is something like a snowball that gets bigger and bigger. That's also the agile way of thinking. So in this situation, that's my advice. Start with the simplest step, with awareness in the organization. Work with people.

Hello, good talk. I would like to ask: we know that all data lakes usually become data swamps in the end. But my question is, when should data governance on the data lake start? And which role should IT play and which role should business play in this data governance in the data lake? Thank you.

Could you repeat the last part of the question, sorry?

Which role should IT play and which role should business play in the data governance in the data lake?

Very good question, very interesting question. First of all, I would like to disagree that every data lake is going to become a data swamp. Maybe you can sanitize it. Hopefully some of them will not. Some will become deserts too. I've seen deserts also, not only data swamps. That's also a bad place to be. IT should collaborate. Business provides the focus on business value, the focus on what we are trying to solve here. IT are the savvy ones in the technical tasks. So IT is going to be stronger in the technical aspects, for example, automation. Business should be more focused on the business definition of the data. And the processes should be a team effort between both of them. So whichever way you figure it out, IT and business have to work together, and there is not only one answer. There are good practices, but each organization works in a different way. So that requires some kind of reflection, and very good intentions for collaboration and for understanding each other, listening a lot and communicating well. That's something that maybe IT can improve. But the first step to improving it is to recognize it.
And business people also need to understand that something is not as easy as it may sound, or as easy as we technical people may try to pretend when we are selling, for example. So things are balanced. It's not good or bad. But you should work together. You should properly understand what you have in your hands, and if you do that, you are going to succeed. It's as simple as that. Business: definitions. IT: technology. Processes: a joint effort. Okay?

Thanks.

Okay, so that's all. Thank you very much.