Hello and welcome. My name is Shannon. I'm the executive editor of DATAVERSITY. We would like to thank you for joining this month's installment of the monthly DAMA International webinar series. This webinar series is designed to give our Enterprise Data World conference attendees education year round and to inform you about all the great things that DAMA International has going on. We are excited about the upcoming Enterprise Data World 2016 conference, to be held in San Diego, California, April 17 through the 22nd. We've already had a lot of inquiries about that, so be sure to save the date; it was sold out last year. Today's webinar is being presented by a longtime DAMA member and speaker, John Hendricks. Today he will be discussing taking information governance to the next level. Just a couple of points to get us started. Due to the large number of people attending these sessions, you will be muted during the webinar. We'll be collecting questions through the Q&A panel in the bottom right-hand corner of your screen, or, if you'd like to tweet, we encourage you to share highlights or questions on Twitter using the hashtag DAMA. As always, we will send a follow-up email within two business days containing links to the slides, the recording of the session, and any additional information requested throughout the webinar.

Now, let me introduce today's speaker, John Hendricks. John is a highly rated consultant, speaker, and author who has been active in the field of information management and relational database management since 1986. John's experience, combined with his information architecture and management expertise, has enabled him to help many organizations optimize the business value of their information assets. He has a vision for innovation and the ability to translate that vision into strategy. He is the director of the Belgium and Luxembourg chapters of DAMA, the Data Management Association, and runs the Belgian Information Governance Council. He has published articles in many leading industry journals and has been elected to the IDUG Speakers Hall of Fame based on numerous best speaker awards. John also chairs the Presidents' Council at DAMA International. With that, I will hand it over to John to get us started.

Hello and welcome. Good morning, evening, or afternoon, whatever time it is wherever you are. Thanks for joining us for this session. What we're going to do today is talk about what it means to create an information-centric organization. I tried to capture it in a hashtag, because apparently that's something you have to do today, and the hashtag I came up with is #NoHadoop. "No Hadoop," as you can probably imagine, does not mean that we do not want Hadoop, but that we really need to think about what it means to be information-centric, and that means it's not only going to be Hadoop.

So let's look at the context to see what is going on today. One of the words that I see used, and probably abused, very often is "disruptive": people keep saying that we're living in a disruptive age. Things are disruptive, we have disruptive business models, data becomes disruptive. But if you look at this from a broader historical perspective, you notice that most of the things we're so excited about today were actually invented many, many years ago. So, just as a refresher, what are we going to do today with things such as machine learning?
Well, in 1959, Arthur Samuel defined machine learning. Column stores are presented as a whole new paradigm for storing data, but I benchmarked against Model 204 in the '80s, so that's not novel. In-memory computing, SAP HANA, that's something quite new? Well, not exactly, because IBM TPF has been there since 1979. Data mining was defined in 1999, and even in the 1960s statisticians were already talking about data fishing or data dredging. So the point is not that we've come up with a whole bunch of new things. The question, then, is why we are so excited about this now.

One of the things that has fundamentally changed is the cost-to-value equation. A lot of the things we're doing today, we can really say it's thanks to Moore's Law that we have now reached the tipping point where the cost of computing allows us to do things that we were not able to do before, because the cost was simply not in proportion to the value we could get out of the data. Because it has become cheaper, we can start storing things and processing things in a way that we didn't before. It also allows us to change some of the problem settings. One important thing to understand, for example, is that we are now starting to automate processes where issues such as ACIDity are not really a problem. What ACIDity basically means is that I can run a transaction, re-run the transaction, and have consistent output. In many cases, if we are just looking at statistical relevance, or proposing a next best offer, or trying to understand whether a person is actually the person in a picture, then this whole aspect of ACIDity doesn't apply anymore. So by dropping some of those constraints, we can start using quite different techniques than we used in the past, which gives us more capability for processing things. Computing is pretty much everywhere, and so are networking and connectivity. So the methods are not novel, but the price model is novel, and the way that we can connect things and access things has effectively changed the environment.
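To make that ACIDity point a little more concrete, here is a minimal sketch, added as an illustration rather than taken from the slides, contrasting a balance update that genuinely needs transactional guarantees with an approximate, statistically relevant question where exactness is not required. The table layout and the 12% click rate are hypothetical.

```python
import random
import sqlite3

# A bank balance update needs ACID guarantees: run it once, and the
# outcome must be exact and durable, no matter what else is happening.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL)")
conn.execute("INSERT INTO accounts VALUES (1, 100.0), (2, 50.0)")

with conn:  # the transfer commits atomically or rolls back entirely
    conn.execute("UPDATE accounts SET balance = balance - 25 WHERE id = 1")
    conn.execute("UPDATE accounts SET balance = balance + 25 WHERE id = 2")

print(conn.execute("SELECT balance FROM accounts ORDER BY id").fetchall())
# -> [(75.0,), (75.0,)]  exact, every single time

# A "next best offer" style question only needs statistical relevance:
# estimating a click-through rate from a sample is fine, and re-running
# it gives a slightly different, but equally usable, answer.
clicks = [random.random() < 0.12 for _ in range(1_000_000)]
sample = random.sample(clicks, 10_000)
print(f"estimated click-through rate: {sum(sample) / len(sample):.3f}")
```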
The result of that is also what I would call an explosion of informally described data. With this internet of everything, or internet of me, or whatever you want to call it, most of the data growth is actually not in the more traditional things. If I look at an organization and ask how many additional invoices they have, the exponential growth is not in that area. The exponential growth is elsewhere, and I think the best example is my holiday pictures. When I went with my wife on our honeymoon, we spent three weeks in the United States and, believe it or not, we were able to take pictures for the entire three-week trip with three rolls of 24 images. So we were very, very selective about which pictures we were taking. Today, if you go out for dinner with friends and spend 30 minutes together, you have probably taken more pictures or selfies than we took over that whole three-week period. So the growth of the data is not in the formally recognized things but really in areas like smart metering, Fitbits, and RFID tagging, and they all have one thing in common: that kind of data is potentially less formally described. That's one problem.

And the second problem is not only that this data is less formally described, but also that the actual value you can derive from it requires quite different processing than just looking at an invoice or a sales order. It's not so apparent what the actual value is of, say, having my own step data for the last three days. So we need to do something different, and potentially more complex, to actually get insight out of that data. We can afford that growth now because storing it is not too expensive; the question is where we are going to store that information and how we can actually make sense of it.

There is what I would call a marketing buzz around data lakes, or pools, or puddles, or whatever, where people are saying: since we can store everything and the data is available, it is much smarter to just store everything and deal with it later. And, by the way, while we're at it, why don't we try to solve all our problems by applying Hadoop? My point is, and don't get me wrong, Hadoop is not something bad, but we should really start testing and asking what the added value is of, for example, processing something in Hadoop. This comes back to my first remark about ACIDity. If I'm running my banking transactions and I really want to know how much I have in stock, or what the balance of a specific bank account is, then that is not a problem that needs massively parallel computing or machine learning; the bank account goes plus or minus, and that's pretty much all that needs to be done. So the point is not that you should get rid of the data lake or the data pool, that's not the issue, but rather to ask what the added value is of handling something in a massively parallel, scalable way. Some people handle this in a very fundamentalist way, basically saying: put everything in the data lake and then all your problems are solved. Well, not necessarily, because the value proposition of a Hadoop cluster is not necessarily there once you start looking at fault tolerance, high availability, et cetera.

So the lake is not the issue; the issue is which problem you are dealing with. And you pretty much know that people are not looking at the entire picture when you see the kind of architectural drawing that is common today, where somebody draws a lake and then pushes aspects such as operational systems or master data to the outside, basically sitting on the shore of the lake. If you see something like that, you know the real issue has not been addressed. The actual answer is that a true architecture, and the best design, considers the fact that you have a hybrid environment where you can apply the best technology to the problem it solves best. Coming back to what is disruptive today: it is not the fact that we can do novel things, but the fact that you can combine the novel things with the more traditional ways of solving the problem. That means it's a little bit more complicated, it requires a hybrid architecture, but it's not only going to be Hadoop that solves our problems.
Which brings us to the second point: what about statistical relevance, and why is our data lake not going to solve all the problems? What problems are we still having, and shouldn't we be looking at the small data? Let me illustrate this with a new business model. Going back to my explosion of data, let's look at the data that comes from a Fitbit. What does the Fitbit really tell you? It only tells you that the device was at a specific location at a specific point in time. It doesn't tell you who the person was, it doesn't tell you what the demographics are, it doesn't give you the context that allows you to really make sense of it. So here is my new business model. There used to be dog walking companies: people who had no time to walk their dogs could hand them to somebody who walked them. And today I find more and more people who buy a Fitbit but don't have the time to run around and exercise with it. So if I combine the two business models and say, you know what, I'm going to strap the Fitbits to the dogs and let the dogs run around in the park, the data shows exactly what is going on: the device is moving.

Which brings us to the weakest link. If I capture all of this transactional data, all this device data, and throw it into the data lake, what is that really telling me? It tells me that a sensor was at a specific location at a specific point in time, but it doesn't have the demographic context around it. Throwing all that captured data into the lake is going to be pointless, and we will not be able to draw any valid conclusions, if I don't have the surrounding information that tells me that this Fitbit was bought by that specific person, that this person is in this income bracket, and that this person has a household with so many children. That is not the kind of thing you can get from statistical relevance alone. It also means that active quality governance and active data governance, in terms of defining what things are and what the quality objectives are, still need to be combined with the volume. It is not simple volume and statistical relevance that lets you take the right actions and draw the right conclusions; for that you need the small data alongside it, as in the sketch below.
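Here is that sketch, a purely illustrative one with hypothetical identifiers such as device_id and income_band, showing what happens when you try to interpret raw device events with and without the governed customer master next to them:

```python
# Raw device events, as they might land in a data lake: just a device,
# a timestamp and a step count. Nothing about who is behind the device.
events = [
    {"device_id": "fb-001", "ts": "2015-07-01T09:00", "steps": 4200},
    {"device_id": "fb-002", "ts": "2015-07-01T09:00", "steps": 300},
]

# The "small data": a customer/device master maintained with active
# governance. Without it, the events above are just moving sensors.
customer_master = {
    "fb-001": {"customer": "A. Janssens", "income_band": "high", "household_size": 4},
    # note: fb-002 was never registered, so we cannot say anything about it
}

for e in events:
    profile = customer_master.get(e["device_id"])
    if profile is None:
        print(f"{e['device_id']}: no reference data -> no valid conclusion possible")
    else:
        print(f"{e['device_id']}: {e['steps']} steps, "
              f"income={profile['income_band']}, household={profile['household_size']}")
```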
That is also a source of friction today between two schools. One is what you could call old school, or traditional: people who say, I nurture the quality of the data and I want to make the effort of maintaining that quality. The other is the statistical school, which says: whatever is not consistent with my quality objectives, there is a very nice statistical term for it, I'm going to consider it an outlier, and I'm going to discard the outlier because it's not going to contribute to my decision-making model. Both are valid, but if you really want to make sure you take the right decision, then simply taking out an outlier can be dangerous in an operational system.

Imagine that you have a product master and you would like to understand what the allergens are in a specific product, and whether people are potentially going to have an allergic reaction from eating it. I don't think you want to say: we're going to use statistical relevance and make the sample set large enough so that I'm reasonably sure the product doesn't contain any allergens. Everybody concerned with people's health will say no, you can't just use statistical relevance on the bill of material to state that the product does not contain any allergens; operationally that is something you simply can't do. The point is that just having enough data is not going to solve the problem of understanding what the data is and how it works. That understanding has to be maintained, and it defines the ultimate value you will be able to derive from the fact that you've stored everything in your data lake, and whether that data will let you draw the right conclusions.

With the rise of hard-to-obtain-value data, or what some people call big data, what we see is that you actually have to give equal importance to the small data, the reference data, or, to use business intelligence terminology, your conformed dimensions, because those are what actually allow you to make sense of it all. Not having the right quality in the small data that explains the transactions is really going to keep you from drawing the right conclusions from the data sitting in your data lake. So you really have to fundamentally address that mindset in your organization and ask: what do I care about? You can throw in statistical relevance and cognitive computing and whatever to sort things out, but proper data quality is not going to be solved by statistical relevance; you really need the right persons and the right purpose.

Which brings us to the question: who should bring this context, this mindset, into your organization, the mindset that data needs to be nurtured and that it's not just about having the volume? What do we really need? Do we need a chief data officer, a chief digital officer, a chief information officer? Looking back, this whole idea of a chief information officer is something we actually did many, many years ago, because the CIO is a role that has existed in organizations for quite some time. But in reality, what you notice is that the role of the CIO has in many cases slowly converged into more of a chief technology officer role, where IT has pretty much taken up the role of delivering technology services rather than putting information at the center of what needs to be done. In fact, it already goes a step further today, because a lot of those chief technology officers are in many cases becoming COOs, and you might wrongly assume that means chief operating officer, but in many cases the CIO today is actually becoming the chief outsourcing officer, which means that the differentiating value of technology as such is decreasing.
But the fact that information is such a business enabler requires attention; it requires transforming an organization, potentially from a process-driven organization into an information-driven organization. That's not something that happens overnight, which means that what we really need to do is make sure that the focus on proper data management is maintained at the C level. What I'm typically saying here is that you need what I would call a "chief importance officer": somebody who can get a roadmap in place, get a budget, and manage the transition from what in most cases was a process focus, with perhaps a little bit of lean thrown in, to the point where the organization can truly become data-driven.

So the question is why and how you are going to be data-driven, and looking across industries you will notice that the focus can be quite different depending on what kind of organization you're working in. In a highly regulated environment, the importance of information is not necessarily the creation of additional value, but simply being able to keep your license to operate. Look at insurance, banking, or anything that has to do with life sciences: there is a very big push with compliance as the driver. In that case it is quite likely that the top of mind for your board and your CEO is not to create a lot of new value, but at least to stay compliant and to make sure that your market authorization remains in place. Because the driver is essentially compliance, this information focus is often led by the risk officer, the CFO, or the compliance officer, so that roles, responsibilities, and activities are consistent with what is needed to be compliant.

If you're more of an R&D kind of organization, then the information truly becomes part of the business model. We'll talk about that a little bit later, but we see a lot of companies where data has to be fully integrated into the business model, into selling mobility, or selling the ability to understand the quickest way to get from point A to point B. Just look at Google today: in the Netherlands, for example, they've launched a fully integrated view on traffic, which amounts to a new business model. In that case it's not so much a risk angle but more of a value or innovation angle. But the point is that moving from a process view to a data-centric view is a multi-year transition, which means we'll need somebody who keeps it on the horizon and in the scope of the board, which is why you need that "chief importance officer", or data officer, or whatever title you want to put in place.

Another element we see happening in the market today is that there is more and more drive for data to be opened up. Next to net neutrality, there is the idea that data should be exposed so that we can get the benefits out of it and understand what is going on.
A very good example is a retailer I recently worked with who was trying to understand the effectiveness of their sales campaigns. They were only looking at their own sales data, and they concluded they were doing great: their own brand was better positioned than the competing brands. What they noticed is that by bringing in some external and open data sets, they could dramatically improve their decision-making. A lot of the new business models actually come from the fact that we can do smart and interesting things with the data that is available and the data that is being opened up, so there is huge potential there.

However, there is a potential risk, because if you look at some of the things the semantic web can do, what we notice is that the information governance piece of it is sometimes lacking. A good example is something I saw in a newspaper a few months ago in Belgium, where the headline said that 300 nuclear incidents had happened this year in Belgium. You think, well, that's a big number. And then when you start digging into that data set, you have to ask: what is the actual definition of a nuclear incident? So the data is available and the data is open, but the data points are not always sufficiently governed and described for us to draw the right conclusions in a safe way. That is one risk of this kind of openness.

Another risk is what we saw happen with the data published by the London bike hire scheme. When you rent a bike in London, the data points about which bike is rented at which station are made available. Somebody was smart enough to trace the individual bicycles, and because people are by nature lazy, you can pretty much assume that they do not walk from one bike station to another: they typically go to the closest bike station, and when they get off a bike, their destination is probably within the proximity of that station. So by looking at the actual bike movements, you could often drill down to the individual: at nine o'clock in the morning this person leaves this area, goes to work, and works at that kind of company. That is effectively a breach of privacy, because you could work out that this person is going to be out of the house from this time to that time. So we have to be a little bit careful about making things too open, because we potentially run a risk of (a) drawing the wrong conclusions or (b) exposing things that should not be exposed.
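To make the bike-hire risk concrete, here is a small, purely illustrative sketch, with made-up station names rather than the actual London data feed, showing how repeated trips under one persistent bike or user identifier can be turned into a guess about someone's home and work locations:

```python
from collections import Counter

# Hypothetical trip records for one persistent user/bike identifier:
# (hour_of_day, start_station, end_station)
trips = [
    (8, "Elm Grove", "City Square"),
    (8, "Elm Grove", "City Square"),
    (9, "Elm Grove", "City Square"),
    (18, "City Square", "Elm Grove"),
    (19, "City Square", "Elm Grove"),
]

# Morning origins are a decent proxy for "home", evening origins for "work".
home_guess = Counter(start for h, start, _ in trips if h < 12).most_common(1)
work_guess = Counter(start for h, start, _ in trips if h >= 12).most_common(1)

print("likely home station:", home_guess[0][0])   # Elm Grove
print("likely work station:", work_guess[0][0])   # City Square
# With enough trips, an "anonymous" ID becomes a person with a routine.
```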
Which brings us to the next point: we have all the data and we can process it, but the question is, should you process it? What we see today, especially with all the legislation that is happening in the European Community, is that there is going to be a much stronger push, a much stronger drive, to have data privacy really as a core value. Today we already see companies using the fact that they respect your privacy as a selling proposition. Look at the way Apple is positioning itself against Google: Apple is basically saying, listen, you have to pay to store your images with us, but we won't be invading your privacy. So privacy is actually sort of the new green, the new eco: if you stay with us, we will respect your privacy, we will not invade on you as an individual, and we will treat you in a way that is ethical.

That's one of the key issues today: with all the data being available, the question is, should you as an organization do this? You can take two angles. There is the angle published recently in the Harvard Business Review, which basically says that being disruptive means pushing the envelope. But I would actually argue that if you really want to work with your customers in a respectful way, then that's not what you do. You do not invade their privacy, because when people have an option, they will say: I'm not going to stay with this company, I'm going to take my business elsewhere. There was actually an interesting case quite recently where a bank announced it was going to monetize the data of its customers, and there was a lot of pushback from customers saying: hey, I'm not going to pay you to run my banking transactions and then have you extract business value out of me. So that's a very important balance you need to keep: even if we can use this information, should we? And the regulators are likely to be quite severe. In fact, the Netherlands and Belgium have taken the lead in that, and from the first of January on, when you have a breach of data privacy, there are fines of up to 800,000 euros, which is a quite significant amount. This does not only apply to European companies, but to anybody doing data processing on a European data subject, so any company will be exposed to that. So the ethical question, should you, is suddenly becoming very, very important. Especially if you start looking at information as a business model, and this is where the disruption really comes from.

A lot of data management projects today are actually significantly missing the point, especially when we start looking at master data management. People say: I want to handle the quality of product information, or I want to do customer information. But they don't really think about what it means for data to be a differentiator. Data being a differentiator is not necessarily doing better product management; it's about selling more. It's about better using digital intelligence, and digital intelligence basically means having the ability to make the next best offer, to do smart retention, for example having an omni-channel approach in retail and turning somebody from a showroomer into somebody who will actually buy. That's a very big distinction. Or take dropped baskets: there is some very interesting market research showing that if somebody drops a basket and you can follow it up with a proper next best offer, conversion rates of up to 30% are possible. And this is the point: a lot of the exercises people are doing stop at product master data; they're not doing, for example, price configuration, and they're not doing anything with dropped baskets. The point where it can actually become disruptive is when you are properly using this information.
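As a small illustration of the dropped-basket idea, with entirely hypothetical thresholds and field names rather than a reference to any particular commerce platform, a next-best-offer rule can be as simple as an event handler that fires once a basket has been abandoned for a while:

```python
from datetime import datetime, timedelta

# Hypothetical dropped-basket events: customer, basket value, last activity.
baskets = [
    {"customer": "c-101", "value": 120.0, "last_seen": datetime(2015, 7, 1, 10, 5)},
    {"customer": "c-102", "value": 15.0, "last_seen": datetime(2015, 7, 1, 11, 55)},
]

def next_best_offer(basket, now, abandon_after=timedelta(minutes=30)):
    """Return a follow-up action for a basket that looks abandoned."""
    if now - basket["last_seen"] < abandon_after:
        return None  # still an active session, do nothing
    # A simple rule: higher-value baskets justify a richer incentive.
    if basket["value"] >= 100:
        return f"email {basket['customer']}: free shipping + 10% off"
    return f"email {basket['customer']}: reminder with saved basket link"

now = datetime(2015, 7, 1, 12, 0)
for b in baskets:
    action = next_best_offer(b, now)
    if action:
        print(action)
```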
And especially looking a little bit at this context of disruption: disruption does not say anything about your competitor; disruption is a level playing field. In this environment the data is available and the processing power is available, and, I mean, anybody can do that. I've seen many cases where large existing corporations complain that they're being attacked by a small competitor acting in a disruptive way. But if you truly believe in this idea of a data-centric, data-driven organization, then in most cases the larger corporations, with a little bit more legacy and a little bit more background, are the ones with far more data points. For example, if you look at e-commerce, customer profiling means understanding whether a customer has an A-type or a B-type personality, basically whether they are more sensitive to pricing or more sensitive to brands. As an existing retailer, you have much better insight into the transactional history than the new kid on the block. The problem is that the larger retailer is often not using the data points they have, not turning them into insight and into a better way of doing business. And we see, as I said, a lot of shifts, from people selling electricity to people selling the most effective way of consuming it, from people selling cars to people selling mobility solutions, so moving from a more physical world to one where data is actually becoming the business value. The same with Nike, where Nike is installing more and more sensors into their apparel. Why? Because the data that can be captured from that is becoming the business model, rather than selling the shirt or selling the shoes.

This idea of data being the asset is in many cases leading to a complex called hoarding, and what hoarding basically means is that you hang on to the data. Just this afternoon I was sitting in a meeting with somebody who said: you know what, let's just capture the data and we'll figure out what to do with it later. Now that is a completely wrong assumption, because if you say I'm going to store the data anyway and figure out what to do with it later, what happens is that the value of data is only there if you actually have a function, something to do with it, and if you don't have a function for it, then hanging on to the data actually becomes a liability. That is really something to think about: what is the validity of keeping it, and if I capture the data without truly understanding what I want to do with it, do I have enough context to draw the right conclusions afterwards? So just having more does not necessarily mean doing better, and this idea of capture the data and apply the schema to it later has its limitations, because in many cases you won't capture enough context to draw the right conclusions. So I have a very simple rule. If somebody wants to hang on to data, the real question is: if you want to keep it, do you have the ability to maintain it, and do you have the ability to keep the data points reflecting reality? Often the answer is that we do not have the ability to make sense of it at this point, we don't know what to do with it. Well then, let's not capture it at this point, and we'll see what we can do at a later stage.
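As a small illustration of that rule, and this is purely my own sketch with hypothetical fields such as declared_purpose, you can make "do we have a function for this data, and can we maintain it?" an explicit gate before anything is retained:

```python
# Candidate data sets somebody wants to "keep, just in case".
candidates = [
    {"name": "web_clickstream", "declared_purpose": "abandoned-basket follow-up",
     "owner": "e-commerce team", "can_maintain": True},
    {"name": "raw_sensor_dump", "declared_purpose": None,
     "owner": None, "can_maintain": False},
]

def keep_or_not(ds):
    """The simple rule: no function, no owner, no ability to maintain -> liability."""
    if not ds["declared_purpose"]:
        return "do not capture: no function defined for this data"
    if not ds["owner"] or not ds["can_maintain"]:
        return "do not capture: nobody can keep it reflecting reality"
    return f"capture: used for '{ds['declared_purpose']}' by {ds['owner']}"

for ds in candidates:
    print(ds["name"], "->", keep_or_not(ds))
```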
All of which basically means that if you do not have the proper metadata, you really don't have the ability to draw the right conclusions and to use the data in the most effective way. A lot of the regulation that is coming up today is really focusing on traceability and lineage, on the data definitions. If I make a price proposition, if I reach a conclusion regarding a customer, I need to be able to back it up and say where it came from. And that goes way beyond just having a common definition. You could say, well, this is something we could potentially solve by having a proper glossary, but a proper glossary without the traceability behind the definitions still means I cannot guarantee that I'm drawing the right conclusions. So that metadata is really quite fundamental, because it is what is going to allow me to be compliant, for example with BCBS or with any data privacy regulation: it requires me to truly understand where the data came from, what the sources are, and, when I reach a conclusion, where that conclusion really comes from. That's an area where I see a lot of companies still being quite weak today, and just being able to comply with regulation is going to be an issue if you do not do proper metadata management. So documenting it and understanding what is going on is quite fundamental. Now, one of the challenges is that there is no very strong industry standard that facilitates this. Our design repositories are very often not truly linked to the actual IT assets, so in some cases we sort of have to conclude that something happened in the middle but we don't really know what it was. There is still a leap of faith involved when I look at a number and really want to justify that this five is actually a five; that capability is still quite limited.

Looking at all of this, what we notice is that we're pretty much living in a perfect data storm. Why? Because we're in the data age: the cost of execution, whether hybrid or cloud, storage, network, or processing, is going down. Thanks to Moore's Law it is costing us less and less money to manage the environment, the availability of data through sensors and capturing is dramatically increasing, and our analytical capabilities, the algorithms that let us look at a whole bunch of data points and say this is going up or this is going down and draw the right conclusion, are considerably growing, in a way that we can actually afford at a reasonable price. And of course the visualization techniques and the self-service interfaces are also there. So the fact that we have the data, we capture it, we store it, and we process it gives us a new environment that we didn't have before. But, and this is where it becomes the big but, we really have to make sure that we govern it properly, that we architect it properly, and that we apply the right policies to it. Up to now, most data processing designs have been quite linear in process, and this is where we get the tale of the two chickens: how do I get to the other side? We see basically two worlds evolving in parallel.
We have one world, the more traditional relational world, where we basically say: I need to define the schema, I will normalize it, I will try to understand what we're doing, and then I get started. This is the way we've worked for many, many years: we govern the semantics first, and then I can make a safe decision. The new kids on the block, the Not Only SQL (NoSQL) crowd, take a different angle, and some of them push it quite far: they basically say the entire world just consists of a bunch of events, and I can bring the function and the schema to the data as I process it. So the data as such is kept quite schema- and context-neutral; they bring the schema and the function to the data, and then anybody can take the data set and try to make sense of it. But those two worlds are not necessarily contradicting each other, and they can actually live together quite effectively. What we need to do is truly understand the governance challenge on each side.

The relational world is all about defining things up front in the form of semantics: I have to have governance, I have to have the proper definitions, and once I have the definitions I get going. That's one model. If we look at the NoSQL world, with its late binding or schema-less processing, we could say: well, I will define it as I go along. But that's quite tricky, and I always compare this kind of model to giving a bunch of hand grenades to a monkey. They play with them, they draw conclusions, and everybody is happy, but at some point this "I will find the schema when I start processing it" approach will discover that if you pull out the pin, it does something quite different. So this late binding is no reason for not doing governance of the data as such. The difference is that the focal point is not managing the semantics; the focal point is managing the container as such. You have probably already heard it: I have a data scientist, and what the data scientist needs is a data lab. Well, whatever we put in that lab, the container still needs to be managed. For example, if you look at extracting data from Twitter or Facebook or any other data source, there are often quite strict rules on how long you can keep that information and what data can be processed. So if a data scientist takes a data set, a whole bunch of Twitter feeds, they need to know what that container is and be able to say: wait a minute, this potentially contains personal data, which means I have to get rid of it, or I have to understand where it came from and how it needs to be managed. The point is that there is a very strong data governance element in there: I need enough metadata about the container so that I can apply my policies in context. So this schema-less or late-binding world is no excuse for not applying proper data governance; it's just that the focus, in the first stage, is on managing the container rather than on coming up with all the semantics and defining what things are.
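Here is a minimal sketch of that container-first idea, purely illustrative, with hypothetical tags and field names: a container-level policy check for retention and personal data is applied first, and a schema is only applied to the raw events at read time:

```python
import json
from datetime import date

# Layer 1: container-level metadata, known before we look at any content.
container = {
    "name": "twitter_feed_2015_q2",
    "source": "twitter_streaming_api",
    "contains_personal_data": True,
    "retain_until": date(2015, 9, 30),
}

def container_allowed(meta, today):
    """Apply policy on the container itself: retention and PII handling."""
    if today > meta["retain_until"]:
        return False, "retention period expired -> delete the container"
    if meta["contains_personal_data"]:
        return True, "allowed, but only under the personal-data policy"
    return True, "allowed"

# Layer 2: schema-on-read; the raw events stay neutral until we apply
# a schema (which fields we expect and how to interpret them).
raw_events = ['{"user": "u1", "text": "hello", "ts": "2015-07-01T10:00"}']
schema = {"user": str, "text": str, "ts": str}

ok, reason = container_allowed(container, date(2015, 7, 15))
print(reason)
if ok:
    for line in raw_events:
        event = json.loads(line)
        parsed = {field: cast(event[field]) for field, cast in schema.items()}
        print(parsed)
```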
At some stage, if I really want to start industrializing what I've discovered, it makes sense to take this late-binding, apply-the-schema-at-processing work and bring it into the world where I'm gathering the data, serving it, using it, retiring it, and maintaining it, which is the more traditional world of governed semantics. So there is a discovery phase where we say: let's define it as we go, refine it, revalidate it, work with hypotheses; and then, when you want to industrialize it, you have to bring it into the more semantically managed environment. The point is that instead of a linear process, where information is sustainable because I described it all up front, truly sustainable information management means that in some cases the exploratory work, the discovering of what the data is, will come first, but if you then want to sustain it and industrialize it, you have to bring it into an environment where you actually describe the schema and understand what things really are. So this whole idea of sustainable information has two layers in it: a layer of managing the container, with enough metadata on the container so that I can work in a safe way, and then, if you want to industrialize it, the next stage, where you really say what the semantics of each field are, what it means, what the traceability is, what the lineage is. And that's a non-linear process.

Which brings us to what it really means to do sustainable HOV data, and here I'm starting a bit of a crusade to replace this word "big data" with something that makes more sense, because it's not necessarily only about the size; it's about the fact that in some cases it is harder to obtain the value out of the data. Hence the HOV, hard to obtain value, way of looking at it. What you notice is that we really need a three-part model. There is the data scientist: the person who is able to define and validate the validity of the data set and who truly understands the algorithms. In many cases today, what we see is that these people are actually spending too much time trying to source the data or transform the data. So we really have to recognize the data manipulation, data engineering side, and put that data engineer next to the data scientist, so that they can both work on the data store and get the value out of it, and then the business expert can work together with them to see how we can industrialize that and start looking at what the next best action could be. Basically, a truly data-driven organization is an organization that looks at the different data points it has, and this is another aspect of what it means to be information-centric: we are moving from our gut, from the gut feeling that this is probably the right thing to do, into a quantifiable environment.
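Part of moving to a quantifiable environment is being able to back a number up. As a minimal sketch of the traceability and lineage idea mentioned above, purely illustrative and not based on any particular metadata standard, a derived figure can carry its lineage with it:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class Lineage:
    """Minimal lineage record attached to a derived number."""
    sources: list          # datasets the value was computed from
    transformation: str    # human-readable description of the rule applied
    computed_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

@dataclass
class Metric:
    name: str
    value: float
    lineage: Lineage

# "This five is actually a five" because we can trace where it came from.
incidents = Metric(
    name="nuclear_incidents_2015",
    value=5,
    lineage=Lineage(
        sources=["regulator_incident_log_v3", "plant_operator_reports"],
        transformation="count of events classified INES level >= 1",
    ),
)

print(incidents.name, "=", incidents.value)
print("traceable to:", incidents.lineage.sources, "-", incidents.lineage.transformation)
```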
So one of the things that, for example, your chief data officer, chief information officer, or "chief importance officer" should do is ask, for every point where the organization takes a decision: do I have the data to back up that decision, do I have the data points, do I capture the events that actually make sense? A very good example of this that I had quite recently was a company investing a lot of money in digital excellence. They bought a data capturing tool that sat on their website, so any web interaction had a tagging layer that could register what the customer was doing on the site. You could argue: great, we measure, we understand what is going on. But what we noticed is that in reality, by not properly managing the data that was being captured, the only thing they were doing was filling the data lake with huge quantities of information, and those huge quantities of information were actually pointless, because they were never linked back to the understanding part. Measuring is one piece, but understanding what you actually measure and what insights you can get out of it is what matters. Merely having more data is good for bragging at the bar: oh, I capture so many gigabytes per day. If you want to win that game, you should go to CERN, because in terms of capturing they are pretty much the champions. The point is how we can understand what is going on and how that can actually lead to action. Which brings us to the next stage, where it is not only about Hadoop; it is equally about complex event processing, about having zero latency, or very low latency, in acting on a specific insight. That is when it becomes really useful: not just the fact that we can measure, not just the fact that we can understand, but the fact that from an information point of view you have the right understanding to take the next best action, based on the insight, based on the right context, based on the right data quality level, so that you can draw that conclusion for the right purpose.

If we put that puzzle together and look at it in a little broader context, the bigger picture has sustainable information management at the center. There are the two layers: where you haven't formally described everything, you are just managing the container, which you can look at as a data governance challenge, and this is where you can set up your data lab or sandbox or lake or whatever to capture the information; but if you really want to do this in a sustainable way and start running your processes in a safe context, you have to do proper information governance, with the right ownership and for the right reasons. On the other side, we really have to start looking at how we want to differentiate from a strategic point of view: I really want better insight and better actions. And the drivers for that, which is what we see at the bottom of the picture, include risk avoidance, and that is the space where your CDO should look and ask: what data do we have, and is it managed properly? Because you don't want to end up as the next Target, or whatever company runs into trouble, because you have a data breach and you haven't
applied the PCI rules to it. That's just part of describing the container and running the container. So the point I was trying to make is that if you really want this information-centric organization, what we need to do is look at the operational small data, making sure that the right data is put into the right context, and on the other side make sure that you have the right data points to draw the right conclusions, based on the right level of quality that is available. In most cases this broader picture is what's missing, because a lot of organizations are a little bit like the Monty Python sketch about the sprint for people with no sense of direction: you have the data scientist going off in one direction, the people running the MDM projects going off in another direction, and other people off on yet another tangent, and these things really do not come together. The way to make them come together is by looking at these roles and executing these roles, and this model, in a consistent way, so that you are doing it for the right reason and for the right outcome. That pretty much brings me to the end of my slides, so I would say we can open it up for discussion and questions.

Thank you so much, that was a fantastic presentation. Go ahead and submit your questions, if you have any, in the Q&A in the bottom right-hand corner of your screen. One of the most common questions we get is about the slides: we will be sending out a follow-up email by the end of Thursday with links to the slides and links to the recording of this session. And this is very... oh, here we go. Well, actually, if there are no questions yet, I wonder if everyone has had their coffee today; it's very unusual for our crowd. Has anyone typed a question? All right. Well, John, thank you so much. Anything you want to say to everyone about what is going on with DAMA?

Yes, as we already said, we've just launched the central membership. So for people who are not in an area where they have a chapter directly available to them, we now have a quite good way to interact with us, to join, and to get some of the benefits that you would normally have by being a chapter member. So I would encourage people to have a look on our website and see what we have available. That's one quite interesting announcement that we made just a few weeks ago, so that's certainly one thing to look at.

That's a very exciting announcement, and I hope everyone has a chance to go to dama.org and check it out. So I guess that's it; everyone's very quiet today, everyone's ready for the rest of their summer holiday. Thank you again for the presentation today, it was just fantastic, and thanks to everyone for attending. I hope to see you at EDW in April 2016. Absolutely. Thank you. Have a good day. Thank you. Bye.