I'm Shannon Kempe, and I'm the executive editor of DATAVERSITY. We would like to thank you for joining the current 2014 installment of the monthly DATAVERSITY webinar series, Real World Data Governance, with Bob Seiner. Today, Bob will be discussing big data governance: what it is and why it is necessary. Just a couple of points to get us started. Due to the large number of people that attend these sessions, you will be muted during the webinar. For questions, we'll be collecting them via the Q&A in the bottom right-hand corner of your screen, or if you'd like to tweet, we encourage you to share highlights or questions via Twitter using the hashtag #RWDG, Real World Data Governance. As always, we will send a follow-up email within two business days containing links to the slides, the recording of this session, and additional information requested throughout the webinar. Now, let me introduce to you our speaker for today, Bob Seiner. Bob is the president and principal of KIK Consulting & Educational Services and the publisher of The Data Administration Newsletter, TDAN.com. Bob has been the recipient of the DAMA Professional Award for significant and demonstrable contributions to the data management industry. Bob specializes in non-invasive data governance, data stewardship, and metadata management solutions. And with that, I will give the floor to Bob to get today's webinar started. Hello and welcome.
Thank you, Shannon. Thank you, everybody, for taking time to attend this webinar, no matter what time it is. I was going to say good afternoon to everybody, but for some of you, it's still the morning. In fact, for some of you, it's morning tomorrow. I had somebody from Australia who wanted to attend the webinar and told me they were sorry they couldn't make it because it would be on at three o'clock in the morning, and I said, well, you know, if you register for the webinar, you get a follow-up email, and you can watch the webinar at a time of your choosing.
So again, thank you very much for attending today, and thank you very much for attending throughout the year. As Shannon said, this webinar is on big data governance: what it is and why it's necessary. I hope I can answer some of your questions about data governance and big data and the relationship between the two in the next hour's time. And as Shannon said, we're going to leave some time at the end of the webinar if you have some questions.
I also wanted to share with you, and I announced this in the webinar last month, that we've started to put together the subjects for the series for 2015. The subjects for January, February, and March are as follows: in January, we'll be doing agile and data governance, bridging the gap between the two; in February, data governance roles and responsibilities; and in March, we'll be talking about data governance best practices and the criteria for identifying what would be a data governance best practice for your organization.
So, a couple of other things to share with you real quickly, as I typically do. I wanted to let you know that the book on non-invasive data governance, which Shannon had mentioned, is now available. It became available September 1st through Technics Publications, and you can get that book in Kindle format or on Amazon.com. Also, the KIK Consulting website, if you're interested in more information about non-invasive data governance, has now been updated and changed. One more thing to share with you is that the agenda for the Enterprise Data World event has now been posted, and I'm happy to say that I will be speaking at that event. I'll be speaking on progressive topics in data governance. I'm going to talk about big data, like I'm talking about today. I'm going to talk about agile, and I'm also going to talk about the Internet of Things and how that relates to data management and data governance.
So I hope you can attend in Washington, DC. It's a fabulous conference every year, in March and the beginning of April.
This is the abstract for the webinar. I always start my webinars with the abstract. Big data is all the rage. Everybody's asking about big data, researching big data, considering big data, and some companies are even doing big data. Your company may be one of those. And it's kind of funny: when I talk to people about what I do, I talk about data governance, and a lot of them have always said, well, you're talking about information security, right? And I said, well, that's a part of it. What's funny is that now, when I talk to people about data governance, they say, well, is it related to big data? Because they keep hearing all these things about big data. Well, big data is in the news. There's a lot being said about big data. In this webinar, I want to talk to you a little bit about defining what big data is, because there are different definitions depending on who you ask in the industry. I also want to talk to you about data governance and the different definitions for governance, and how we relate the two. Is there such a thing as big data governance, or are we really just talking about the governance of big data? So that is what we'll be talking about today.
Today, the session is going to include defining big data governance, or governing big data; making a connection, for the IT people and for the business people in our organization, between the big data efforts that are taking place and the need to govern the data associated with those efforts; determining the vitality of big data governance; and I will also offer you some considerations for big data governance and the governing of big data. Real quick, I just want to show you a couple of humorous things that I've come across.
If you go out to Wikipedia and you do some research on big data, and I did that a long time ago, the comic that's on the left-hand side of the slide is what they have posted on Wikipedia. It talks about the bringing together of different types of data for a business purpose, making sense of that data, and being able to make decisions from that data. I'll share that comic with you again a little bit later here. Also, on the right: "Why, Grandma, what big data you have!" Everybody's talking about big data.
A couple of other comics real quickly. The one on the right is of particular interest to me: "Let's solve the problem by using the big data that none of us have the slightest idea what to do with." Well, you know what? If we apply governance to that big data, we improve the understanding of that data. If we go through the process of identifying what big data we're going to be using, the chances are that we will be able to make decisions based on the big data. And it will go from being an object that people are talking about but not understanding to something that's well understood and well taken advantage of in organizations. As we know, data is getting bigger and bigger as we speak. The sources of data are many and varied, and they come at us at all sorts of speeds.
So what I have for you next has to do with budgeting for big data. I'm finding that a lot of organizations are budgeting for big data, but I'm not exactly sure what they're budgeting for. I'll tell you a little bit about some of the things that you budget for when it comes to big data. How exactly are they spending their money? Well, big data has been around for several years. The definition continues to change, but organizations seem to be taking it seriously and addressing the things that they need to around big data.
And here is what companies typically are budgeting for: scaling up and scaling out storage, and developing non-production and production environments around their big data. They're budgeting for staff and training on big data tools that are gonna be new to their environment, for operational systems, and for the purging, the archiving, and the disaster recovery for the big data. But a lot of organizations that are thinking about big data don't necessarily put the three items at the bottom of this list at the top of their priority list. Now, we're gonna talk about big data governance and the governing of big data today, but I'm also gonna spend a moment talking about big data metadata, because there is such a thing. I don't wanna just call it big metadata, because it's really the metadata that's associated with your big data. And then all of the rules associated with risk management for all the other data in our organization apply to big data as well. So I just wanted to share those with you.
In fact, big data has become such the rage that, and I think I've said this before, where I'm from, Pittsburgh, Pennsylvania, a local group called the Pittsburgh Technology Council held an event called "I Love It When You Call Me Big Data." They were trying to be risqué, and it attracted a heck of a lot of people, more people than they had ever attracted for an event. The session that I had proposed to them was "You May Not Be Doing Big Data, But Big Data Is Doing You." Data is coming at us from a lot of places and in a lot of different ways, and we need to take advantage of that data in our organization. So let's start real quickly with the definition of what big data is, and the different definitions depending on who you ask.
The traditional definition of big data is that it's high-volume data; it's high-variety data, coming from a lot of different sources and in a lot of different formats; and it's high-velocity data, coming at us at great speed. Great volumes of data are coming at us quickly and in different forms, and we need to understand how we can take advantage of that data for our organization. And that data, just like any other data in the organization, needs to be governed. We'll talk more about that in a couple of minutes here. Another definition of big data is that it's terabytes and petabytes and even exabytes of data, in all sorts of formats, coming to us in a lot of different ways, and in a lot of organizations we're looking at ways to continue using that data to make decisions. But the security of that information is being treated as a secondary or tertiary requirement. Also, big data is unstructured data: data that's found in texts and emails and social media sites and machine-generated logs.
So the question that I have for you, if you would be kind enough to take a moment and enter it into either the Q&A or the chat session, is: how is your organization defining big data? Or maybe even just share with the folks in the session: are you even talking about big data in your organization? Are you talking about governing big data? Are you talking about something that you are calling big data governance? Is there a need to call something big data governance, or is it in fact just the governing of the data, the governing of the big data?
I had a webinar last year in the Real World Data Governance series on big data. This is the only slide that I borrowed from that presentation. So all the stuff that I'm gonna talk to you about here is different from what I talked to you about before, but I thought this slide kind of hit it on the head when it comes to defining big data.
Big data is a term applied to data assets whose size is beyond the ability of commonly used software tools to capture, manage, and process the data within a tolerable elapsed time. But the question is: is it just the size that's the concern to us? Or is it the format that the data is coming to us in that's the concern? Are the software tools that we have in our environment capable of capturing and managing and processing that data, whether it's really large data or it's just data that's coming at us in different formats from different places?
If you look at this pie chart, you'll see that the majority of people understand that big data is a legitimate problem that's stemming from the growth of this unstructured data that's coming at us from all directions. Some people think that it's a meaningless catchphrase. Other people think that it's a way to talk about Hadoop. Other people think it's a data warehouse. Other people don't know. The fact is that big data is becoming more commonplace in organizations. People are using that term; it's just being defined differently in different organizations. So if you can take a moment and share with us what your definition of big data is, that's something that I'll share back with folks in the email that Shannon mentioned, the one we send out within a couple of days after the webinar is over.
Here's also a definition of data governance. I talk about this a lot. Data governance, to me, is the execution and enforcement of authority over the management of data, whether that's big data or small data, data of all shapes and sizes. If we want to get value out of that data, we need to execute and enforce authority over that data. We need to make good decisions about that data. Other definitions of data governance are that it's the orchestration and the harmonization of people, process, and data.
And that's all well and good if you expect that you're gonna be able to get people into a room and get them all to put their arms around each other's shoulders and sway back and forth and solve problems. The fact is that we need to execute and enforce authority. Other people think data governance is the formalization of decision rights. And you know what? You need to have decision rights associated with your big data the same way you have them associated with your small data, your metadata, your master data, and your reference data. All of your data needs to be governed. So we want to make certain that we execute and enforce authority, that we bring together people, process, and data, and that we formalize our decision rights associated with that data, whether it's big data or small data in our organization.
So, as I said, I believe that the definition of data governance is the execution and enforcement of authority over the management of data. Well, what data is that execution and enforcement of authority over? It's over all of those definitions that I just shared. It's the high-volume data. It's the terabytes, petabytes, and exabytes of data in different formats. It's structured data. It's unstructured data. So if we believe that we need to execute and enforce authority over the management of data, then we certainly need to make certain that we extend that governance of data into our big data. Again, the question becomes: do we use a term called big data governance for this, or do we just call it the governance of big data? Do we need a separate program for big data, or separate roles and responsibilities? Let's investigate that further here in the balance of the webinar.
So what are the things that big data governance needs to be concerned with? Well, there are four primary principles that I talk about all the time in organizations. The first of those principles is that we need to manage data as an asset.
We need to formalize accountability for that data, no matter what size it is or where it's coming from. We need to follow the rules that are associated with that data, and we wanna be consistent in the way that we apply governance to that data. So in terms of big data, we need to apply these things the same way to big data as we do to any other type of data in our organization. We need to manage it as an asset. We need to have people accountable for it. We need to follow the rules. We need to specifically manage the definition of the big data. We need to manage the production of the big data. We need to manage the usage of the data. It's just a type of data in our organization. Yes, we might have issues that pertain specifically to that data, but we wanna make certain that we're managing the definition of our big data, managing the production, and managing the usage of the big data. So big data is an asset, and big data is another type of data in our organization. We need to have people responsible for it. We need to follow the rules, and all of those types of things.
I was gonna mention metadata in here as well. The fact is that I saw somebody talking about big metadata, and I said, that does not sound right to me. What we're really talking about is metadata in relationship to big data. And the reason why I showed that comic strip here on this slide again is that, if you look at it closely, the big guy there who's talking to the individual says, "Your recent Amazon purchases, Tweet score, and location history make you 23.5% welcome here." Look at the three different sources of data. Look at the Amazon purchases. Well, where are you gonna get that data from? The tweet score. Where does that come from? The location history. I know this is just an example that goes along with that cartoon right there. But think of it: if we can get data from a whole lot of different sources, we need to have information about that data. We need to know where it came from.
We need to know the format. We need to know the way that it's defined. Now, metadata in the past was always limited to the things that we usually talk about: the names, the definitions, and the labels associated with the data. Well, now, if we're talking about metadata associated with big data, then we need to take into consideration that these sources of data are gonna be varied across the places that we can pull the data from and across the organization. Metadata as a term continues to evolve. We've talked about it going from being a data management asset, to becoming a BI and data warehouse asset, to a data governance asset. Metadata is in the news all the time as a privacy and confidentiality asset. Well, now metadata has certainly become a big data asset, since we need to know information about the data. Where does it come from? How is it defined? When does it get updated? What's the confidence that we can have in that data? All of those things are metadata associated with big data in our organization.
So, on putting together data governance strategies for organizations: the ones that I've had the pleasure and the honor of working with wanna govern structured data and unstructured data and content and records and log data. They don't set up separate data governance programs specifically for structured data versus unstructured data. Granted, a lot of organizations will start with the governance of structured data and then start to focus on unstructured data and content, but the question becomes: can we utilize the same roles and responsibilities? Can we govern processes just the same way that we govern processes associated with normal data governance for the other types of data within the organization, or do we have to develop a separate program that focuses on big data governance? And you'll find that in this session, my suggestion is no, we really don't need to set up something that stands alone by itself as big data governance.
We can take advantage of the roles and responsibilities that we've defined, and of some of the tools that I'll share with you in a couple of minutes that help us to understand who does what with the data across the organization. We need to include the big data in the data that we are inventorying and for which we're looking for who's accountable, and we need to formalize that accountability. Whether it's structured data, unstructured data, content, records, whatever it is, we can use the same program to manage the different types of data within our organization.
Well, the same thing holds true for different subject matters of data. I've worked with organizations that have set up customer data governance programs or supplier data governance programs. And the fact is, when they're setting up their programs specifically for that subset of data, we want to do it in such a way that it's extendable to other types of data within the organization. So we want to make certain that if we define a data governance program for our organization, we have room to grow within our program, to identify the roles and responsibilities that are associated with the governance of big data within our organization as well. I don't see organizations setting up a data governance program for customer, and then setting up another data governance program for supplier, and another one for product. In fact, I see organizations that, from a subject matter and a data domain perspective, can utilize the same structures that they've defined for their governance program in general to focus on these different domains and subject areas of data within their organization. So to answer the bottom question on the slide: typically, one data governance program is really all that's necessary.
It's difficult enough to get one program off the ground, let alone separate programs for different types of data, for different subject matters, and for different structures of data coming into the organization. Again, my suggestion is that we want to define our governance program in a way that's extensible and expandable into other areas of the organization. If one of those areas is the introduction of big data into our environment, then we want to make certain that we can extend our data governance program into that area.
So do we, or will we, govern big data in ways that are different from other data? I want to look at that from an accountability and responsibility perspective, from a process perspective, from a rules perspective, and from an inventory and a decision-making perspective. And I want to share with you quickly here a couple of different graphics that I use all the time with organizations that help me to understand: who's accountable for the data in the organization? How do we apply governance to specific processes? What rules do we need to educate people on across the organization? How are we going to inventory our data, including our big data? And from a decision-making perspective, do we need to create different constructs for making decisions around big data than we do for our other data? And again, I would say that I don't really think that it's necessary.
So, from an accountability perspective, this is a model that I use to define different roles and responsibilities around governance in the organization. A lot of times the first response to this graphic is that it seems pretty bureaucratic. There are a lot of different levels. That's why I incorporate these things on the outside of the diagram that say some of these things already exist to be leveraged within the organization. The fact is, most organizations look at things this way. They look at things from an operational perspective.
They look at things from a tactical perspective, and from a strategic and an executive perspective. So it would not be surprising to you that the roles and responsibilities that I typically talk about in terms of data governance fall into those categories. We've got operational data steward roles. We've got tactical data steward roles. We've got the data governance council at the strategic level. And then we've got a steering committee at the executive level. In fact, the webinar, I think it's in February of 2015, is gonna focus on this operating model of roles and responsibilities and go into the details for each of those.
So when we incorporate big data into our environment, does that mean that we need to create additional layers? We're gonna look at that data with people at an operational level, at a tactical level, and at a strategic level. We need to have all of those same levels of responsibility and accountability for the data. Whether it's big data, small data, metadata, master data, reference data, whatever type of data it is in the organization, it makes sense to define sets of roles and responsibilities one time and then to use those. Now, we don't need to be afraid to alter those in whatever way they need to be altered to address big data in addition to the other types of data that we're governing. But we can use the operating model that we've defined for our governance program, or the roles and responsibilities that you've defined for your program in your organization. We also need somebody who has the responsibility for data governance, no matter what kind of data it is in your organization. And we need to understand that we have to take into consideration information technology and project management and regulatory and compliance when it comes to governing data. So again, I would venture to say that if we're gonna govern big data, we don't have to create a big data governance program to do that. We can take advantage of all these different levels of roles and responsibilities that I've laid out here.
And again, if you're interested in more information about that, please attend our webinar coming up in February.
From a process perspective, we can do the same thing. We can outline those activities that we need to govern for our organization, and we can create these RACI or RASCI charts of the different steps of the processes, the different roles associated with our program, and who does what in those steps of the process. So rather than this process stating "resolve or research information quality issues," perhaps consider that it may be a process for bringing big data into the organization from a variety of sources, or for buying an analytical model that we're going to plug that data into, one that's going to help us to analyze and make sense of that data. All the processes that are involved in managing data and taking advantage of the data, and all the operational processes within our organization: any of those processes that are associated with big data need to be governed as well. So from a process perspective, do we need to redefine the program? Well, not exactly. What we need to do is define the processes, outline the steps of the processes and the roles and responsibilities of the program, and bring the two together to make certain that we've got the right people involved at the right time for the right reason to govern the data through the big data processes that we have in our organization.
So is governing big data different from a rules perspective? Or is big data governance different from any other type of governance from a rules perspective? Well, I would again venture to say that it's important that we understand the rules associated with big data. We need to understand the protection rules, and we need to understand the classification rules as far as what's highly confidential, what's confidential, what's sensitive, and what's public data.
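As an illustration of the RACI chart described above, here is a minimal sketch in Python. The process steps and role names are hypothetical, chosen only to echo the operational, tactical, and strategic levels mentioned in the talk, and the "exactly one Accountable role per step" check is a common RACI convention rather than something stated in the webinar.

```python
# Hypothetical process steps and roles, for illustration only.
# Codes: R = Responsible, A = Accountable, C = Consulted, I = Informed.
RACI_CHART = {
    "identify big data source": {
        "operational data steward": "R",
        "tactical data steward": "A",
        "information technology": "C",
    },
    "classify incoming data": {
        "operational data steward": "R",
        "tactical data steward": "A",
        "data governance council": "I",
    },
    "approve analytical use": {
        "tactical data steward": "R",
        "data governance council": "A",
        "information technology": "I",
    },
}


def roles_with(chart, step, code):
    """Return the roles holding the given RACI code for one step, sorted."""
    return sorted(role for role, c in chart[step].items() if c == code)


def check_chart(chart):
    """List steps lacking a Responsible role or lacking exactly one Accountable role."""
    problems = []
    for step in chart:
        if not roles_with(chart, step, "R"):
            problems.append(f"{step}: no Responsible role")
        if len(roles_with(chart, step, "A")) != 1:
            problems.append(f"{step}: needs exactly one Accountable role")
    return problems
```

The point of the sketch is that a big data process is just another set of rows in the same chart; the roles in the columns do not change.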
We need to make certain that we document the rules associated with the data the same way that we document them for any of the different types of data that we use in our organization. So this data matrix, an older version of the common data matrix, actually has this spot right here that I've called out. It says we need to make certain that, for each subject area of data, we're distributing the rules associated with that data, and we're distributing them to all of the people in the organization that have their hands on the big data. We want to make sure that those rules are well documented. So again, from a rules perspective, governing big data is just like governing any other type of data within the organization. If you feel differently, I'd be interested in hearing from you what your thoughts are.
From an inventory perspective, I've shared the common data matrix before. This is a way of being able to identify, by subject area, what the different types of data are that we have in our organization, and who in the organization uses that data. We need to identify, for each of the subject areas, what the systems are that the data resides in and who in the different parts of the organization makes use of that data. So as we're going to inventory the data, we need to add that big data is another place that we can find customer data, that we can find finance data, that we can find employee, supplier, or vendor data. That's important from an organizational perspective: to know where all of our data is located, and to identify who has accountability for that data, whether it's informal accountability or formalized accountability, which comes into play when we talk about putting our governance program into place.
Is the governance of big data different from our governance of other data from a decision-making perspective?
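To make the inventory idea concrete, here is a minimal sketch of a common data matrix as a Python dictionary. The subject areas, system names, and business units are all hypothetical, and the layout (subject area, then system, then the business units that use it) is an assumption about how such a matrix could be represented, not a description of the actual tool shown in the webinar.

```python
# Hypothetical inventory: subject area -> system -> business units using the data.
matrix = {
    "customer": {
        "CRM system": {"Sales", "Marketing"},
        "billing system": {"Finance"},
    },
    "supplier": {
        "procurement system": {"Purchasing"},
    },
}


def add_source(matrix, subject_area, system, business_units):
    """Record that a system (big data or otherwise) holds data for a subject area."""
    matrix.setdefault(subject_area, {}).setdefault(system, set()).update(business_units)


def systems_holding(matrix, subject_area):
    """Return every system where data for the subject area can be found, sorted."""
    return sorted(matrix.get(subject_area, {}))


def users_of(matrix, subject_area):
    """Return every business unit that uses data for the subject area, sorted."""
    units = set()
    for business_units in matrix.get(subject_area, {}).values():
        units |= business_units
    return sorted(units)


# A big data source is added as one more place where customer data lives.
add_source(matrix, "customer", "social media feed", {"Marketing"})
```

In this sketch, the big data source needs no new structure: it is one more column entry under an existing subject area.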
Again, going back to the operating model of roles and responsibilities that I've shared, we know that we've got operational people, and we know we've got tactical and strategic people. One of the things that I don't talk about a lot in this operating model of roles and responsibilities is this escalation path, an approval path. In a lot of organizations, they want to push as much of the decision-making associated with the data as far down into the organization as possible. Again, I'm talking to this diagram, and in fact, the layers of the pyramid diagram that you see there kind of indicate to you what percentage of the decisions organizations want to make at each level. In a lot of organizations, they talk about pushing the decisions down to the business unit, to the operational level, but when those issues start to cross over business units, then they escalate them up to the tactical level. If the decision cannot be made at the tactical level, we escalate it again, up to the strategic layer, to the data governance council. The most interesting thing about that part that's sticking out of the operating model is that organizations don't escalate issues up to senior management. That's why we have a council made up of representatives of our senior leadership team across the organization: to be in that position, to be able to make the decisions that do get escalated to the strategic level. So again, I would say that if we have a clearly defined decision path, or a decision process, or an escalation process defined for our governance program, we should consider using that same process and that same escalation path for the governance of big data as we would for the governance of any size data within our organization.
Okay, so let's talk about different ways to govern big data within our organization. I've talked about governing data from a definition perspective, from a production perspective, and from a usage perspective.
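The escalation path described above can be sketched as a small routing function. This is a hypothetical illustration in Python: the `Issue` fields and function names are invented for the example, and the only rules encoded are the ones stated in the talk (single-unit issues start at the operational level, cross-unit issues start at the tactical level, and undecided issues move up to the strategic level, which is the top of the path).

```python
from dataclasses import dataclass, field

# Levels from the operating model, lowest first. The strategic level
# (the data governance council) is the top of the escalation path here.
LEVELS = ("operational", "tactical", "strategic")


@dataclass
class Issue:
    description: str
    business_units: set
    # Levels that have already tried and could not reach a decision.
    undecided_at: set = field(default_factory=set)


def starting_level(issue):
    """Issues within one business unit start at the operational level;
    issues that cross business units start at the tactical level."""
    return "operational" if len(issue.business_units) <= 1 else "tactical"


def deciding_level(issue):
    """Walk up the path from the starting level to the first level
    that has not already failed to decide the issue."""
    start = LEVELS.index(starting_level(issue))
    for level in LEVELS[start:]:
        if level not in issue.undecided_at:
            return level
    return LEVELS[-1]  # nothing sits above the council in this sketch
```

Nothing in the function refers to the size or type of the data, which is the argument being made: a big data issue follows the same path as any other data issue.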
Well, if we're talking about our other data from those same three perspectives, then why don't we also consider governing our big data from those perspectives as well?
So, governing the definition of big data. Somebody has to have the responsibility for identifying what big data is available to us. We need to research what the potential sources are. From those sources, we need to identify, out of that big data, what data is going to be useful to the organization. What are we going to need within our organization to be able to handle that data as it comes in from the outside? And how is that big data defined? Those places that we're pulling big data from, are they providing us a definition, or do we need to develop that definition within our organization? You know, some of the things associated with governing the definition of data apply to big data as well, including the definition of the format of the data and who has accountability for defining the big data. We also need to store the definition of the big data somewhere, whether it's a metadata solution or a business glossary, wherever we're providing metadata and information about the data to people within our organization. So one of the ways to govern big data is to govern the definition of the big data. And you know what? It's not going to govern itself. We need to make certain that we've got individuals in our organization that have clearly defined accountability around governing all of these aspects of the definition of big data.
We've also got to govern the production of the big data. How is the big data being produced? Where does it come in from? What's the quality of the data? Do we have accountability assigned to people, or formalized for people, for producing that big data? And then where are we going to store the information about how that big data was produced? Again, that's another way for us to govern big data.
We can govern the definition, we can govern the production, and by the way, we need to govern the usage of that big data. So we need to recognize who's going to use that big data. What rules do we need to educate them on, associated with that big data, as to how it can be used? You know, what the definition of that data is, where it came from, what are some of the applications of that big data to our organization? So we need to govern the usage of big data the same way that we need to govern the usage of any data in our organization. And in fact, for a recent client of mine, the sole focus of their governance program was on protecting the data. The fact is, when they start to pull in big data, that big data is just going to plug right into their governance program for protecting data. They don't need to redefine their governance program. They may find additional tools to help them govern the big data because of the different things that we talked about, the volume, the velocity, and the variety of that data. But we need to govern the usage of the data. So in the ways that we govern the definition, production, and usage of other data in our organization, we need to govern the usage of big data as well. So we might as well look and see, well, what does it mean to govern something? What does it really mean to govern big data? Well, what better place to go than the dictionary, where we talk about the definition of what it means to govern something: to make and administer public policy and affairs, to exercise sovereign authority, to control the speed, to regulate, to control actions or behaviors. Well, if we're going to do all those things over big data, that's what it means to govern big data. We're going to keep it under control. We're going to exercise the deciding or determining influence, all these things. Those are, again, the dictionary definition of what it means to govern something.
All of these things apply to the governance of big data, the governance of small data, metadata, master data, whatever type of data we have in our organization. So one of the things that I want to talk about here when it comes to connecting data governance and big data is those four core principles that I mentioned earlier. We need management to understand and to agree that data must be governed as a valued strategic asset, that we have to have clearly defined accountability for data in general in our organization, that we must follow the rules, and that we must be consistent in the ways that we govern data across the organization. Well, when we start to introduce big data to our environment, the same four core principles apply. We need to make certain that we're managing it as an asset, that we have accountability, that we govern it by the rules, and that we're consistent in our approach to governing data within our organization. So here are some questions around the vitality of big data governance. Do we require a separate discipline called big data governance? And I'll tell you what my answer to that question is: I really think that we can leverage existing governance to handle big data in our organization. How would big data governance differ from the other types of governance? Well, we'll talk about it. The accountability would be the same. The decision-making would be the same. Would the same roles and rules apply? Should we just stop using the term big data governance and call it data governance? Or can we use the fact that organizations are starting to embrace big data in a way that will help us to sell the need for data governance in our organization? So let's take a look at each of those. Do we require a separate discipline called big data governance? Well, in my opinion, the answer is no, it's not necessary.
But what's your opinion? Do you think that we need to have a separate discipline for big data governance from the one we have for data governance in general? We're having a hard enough time in a lot of organizations selling the need and justifying the existence of resources and time and money being spent on data governance. Think about what the reaction of people is going to be when we say that we need a separate governance program to govern big data apart from the other types of data in the organization. Now, how would it differ? How would big data governance differ from the other types of data governance? Would it be different in best practices, in the roles, in the action plan, or in the communication plan? Well, the best practices that I typically see being adopted by organizations are that senior management supports, sponsors, and understands the activities of governance. Well, those same best practices would apply to big data as they would to any other type of data. The second one is that somebody has to have the responsibility for governance within the organization, and that the roles and responsibilities are clearly defined. Well, if we're going to govern big data, we need to have those same types of best practices. And in fact, we can incorporate the analysis and the assessment of big data governance, or big data, with our assessment of other types of data within the organization. You know, I mentioned earlier the different types of roles that are typically associated with a governance program. Well, again, if we're going to try to redefine our roles and responsibilities specifically around big data, we're going to look at that as a duplication of effort.
So perhaps what we want to do is take our data governance initiative and embrace the fact that big data, that data at high volume, at high speed, in a high variety of formats, is coming at us, and we want to make certain that the roles and responsibilities that we define for our program are there to handle and enable the governance of big data along with the other types of data within the organization. So, roles: the same roles apply. We still need to have executive, strategic, tactical, operational, and support roles. The same thing goes for the question of whether the same rules apply, R-U-L-E-S. Do we need to protect the big data the way that we protect other data? Do we need to classify that data? Do we have compliance rules associated with that data and business rules associated with the big data? The question is, if we're going to govern any other type of data, do we need to know the protection, classification, compliance, and business rules associated with the big data the same way that we have them associated with other data in the organization? The next question is, should we stop using the term big data governance? I hear that term and I see it in print and I see it in presentations all the time, and I'll quickly raise the question to the person that's holding that session or writing that article. What I want to know from them is, is there really such a thing as big data governance, or is it just the governance of big data? So the question here is, is big data governance even a thing? Or do we talk about it in terms of the data governance of big data instead? So can we use big data to sell the need for governance in our organization? Well, the answer to that question is potentially yes, if your organization is already embracing big data.
If your company hasn't embraced big data and you think that you see it in the future, you may not be able to use big data to sell the need for governance if you haven't embraced data governance yet. But what we may want to do is at least open the door to the fact that, as we're putting our governance program in place, we're making certain that we're allowing for the fact that there's going to be this high volume, high velocity, high variety data coming at us. We need to stress the need to execute and enforce authority, and we need to stress the need to formalize accountability for the management of that data across the organization. So governance is governance, and we're making certain that we execute and enforce authority and formalize accountability. We can use some of the same tools I showed you earlier, the common data matrix and the operating model of roles and responsibilities around governance in our organization. Now, considerations for the governance of big data. Well, we need to have people that are accountable for identifying and then approving the data requirements when it comes to what big data is going to be useful to us. How are we going to use it? How are we going to apply it across the organization? That's not going to govern itself. We need to have a clearly defined role that has the responsibility for doing those things. We need a role associated with assessing the quality of the big data coming into our organization. We need a role with accountability for accessing the data, or providing access to the big data, in our organization. We need a role for managing and supporting the infrastructure within the organization that we're going to add and that we're going to need to enhance in order for us to be able to maintain and use the big data effectively within our organization. We need accountability for managing stakeholder expectations.
Somebody needs to communicate with the stakeholders, help them to understand what's available and what's possible, and help to manage, again, their expectations as to what they expect to be able to get out of the big data efforts that are taking place in our organizations. In the last few years, I came across IBM's five ways to take advantage of big data. It's not surprising that the very first two items on the list are to build a corporate culture that's savvy around data and savvy around big data. And that means understanding that those four core principles that I mentioned a couple of times earlier need to apply to any type of data in our organization to make sure that we manage that data properly. Another one of the five ways to take advantage of big data is to make sure that security, privacy, and governance are a requirement. So it's not only me saying that we need to apply governance to our big data, it's everybody. It's the IBMs; all the other large consulting companies and small consulting companies say that as we embrace big data, we need to make certain that we're securing that data, that we're holding it in private, and that we're governing that data as well within our organization. The article that I found was actually on Forbes.com, and you can find it there. It goes into more detail on these five ways to take advantage of big data. So we also should consider the following things when it comes to governing big data in the organization. All of the things that I just shared with you on the previous slide, those things are going to run in parallel with each other. If we have the resources, we don't have to govern just one of those aspects that I just spoke about; we can govern things in parallel to each other. We can make certain that we get the right people involved at the right time to make certain that we're assessing the big data opportunities that we have in our organization.
We need to govern the selection of the internal and the external data that we're going to integrate into our organization. We need to govern the selection of the models we're going to use and the tools that support the business goals in our organization as we start to embrace the idea of big data and as we start to put that data into models that we can use to make decisions within our organization. We need to govern the capabilities of the organization to exploit this new resource of data that we have, this new big data resource within our organization. The governing of parallel issues, the selection of which data is going to be associated with our big data effort, the analytical models, the exploiting of this resource, all of this needs to be governed by somebody. Does that need to be somebody who's separate from the rest of the governance initiative? Well, it may be, from a project leader perspective, somebody who is managing the big data projects within our organization. We want to make certain that for all of these decisions that we're making in association with all these different issues, the internal and external data, the analytical models, the expectations and the potential, somebody has the responsibility for managing those aspects of the project. This is the best quote that I've seen out there about big data and data governance. It says that the payoff from joining the big data and advanced analytics management revolution is no longer in doubt. I won't read the rest of the quote, but you'll see that that was quoted in October 2012. So it was two years ago already, two years plus, that this statement was made. Organizations are still just starting to embrace big data. They're certainly starting to embrace advanced analytics, and they're recognizing that in order to manage advanced analytics, we need to have high quality data and we need to have accountability for the data.
While the payoff from joining the big data and advanced analytics revolution is no longer in doubt to some, some organizations are still a little bit behind in catching up to the fact that there's a lot of data out there that we could take advantage of and that can help advance our organization. The fact is that if we don't take advantage of the big data and we don't govern that big data the same way that we govern other data in our organization, our competition is going to leave us in the dust, because they are going to start looking at the importance of these things in their organizations. So the last things that I want to talk about before we take a couple of questions are the applications of big data. What I did was pull up several different industries to share with you, because if big data is still more conceptual to you, I'm trying to make it a little bit more real for you. I mean, by the way, this is real-world data governance. So let's talk about real-world big data as well. The applications of big data in healthcare: organizations are aggregating years of research and development data in medical databases. They're digitizing their patient records. They're bringing together all of the healthcare knowledge into databases, helping them to make better medical decisions. They're managing data from clinical trials to inform patients. They're collecting and analyzing information from multiple sources. These are some of the applications of big data in the healthcare industry. In the retail industry, here are several more. There are growing cross-channel data volumes, increasing investments in retail technologies, solving the behavior puzzle, understanding what customer behavior is, and increasing the sales of our organization by understanding customer behaviors and tailoring the ways that we set up our stores, our websites, and our relationships from one product to the next.
We need to be able to assess the customer behavior information, improve personalization, and segment the most valuable customers. These are other ways that the retail industry is embracing big data. And the last one is the applications of big data in education: feedback in the application processes for higher education, and personalization of the course of study, helping to pull information from universities across the country and across the globe to help individuals understand where they can go with the different courses of study that they're taking. We can improve efficiency by saving time and effort to realize goals for our students, for our faculty, for our programs. They're tracking and understanding the patterns of learners. That's another big use of big data within the education space. And then there's understanding more about the learning process by bringing in all these different aspects of data and analyzing that data so that we can, again, better understand the learning process and what's going to be effective for people. So, we'll recap real quickly and then take a couple of questions. We talked about defining big data. We talked about defining data governance. We talked about big data governance and whether there is such a thing, or whether we're truly just applying governance to the big data of our organization the same way that we apply it to the other data. We talked about ways to govern big data through the definition, production, and usage of data. We talked about using big data to make a connection between IT and the business people, and about determining the vitality of something that we might call big data governance within our organization. We also shared with you, as a last item, some of the considerations for big data governance. So, I went through a lot of this stuff real quickly with you. I hope that it was helpful to you.
As I said, we're sharing the slides with folks, and I'll be happy to take any questions that you have. Just briefly before we start that, the upcoming webinars: we'll talk about agile and data governance in January, data governance roles and responsibilities in February, and data governance best practices and best practice criteria in March. So, thank you for your participation today. Shannon, do we have any questions?
We have questions coming in, and if you have any additional questions, go ahead and submit them in the right-hand corner in the Q&A section. And of course, one of the most common questions that we get from everybody is whether they get a copy of the slides. I will be sending a follow-up by end of day Monday with links to the slides, links to the recording of the session, and anything else requested throughout the webinar, including all the great resources that Bob has mentioned throughout. There was a question about slide 12. Do you know what the demographic of the folks that answered the poll was?
So, that's a good question. I don't really know, but if you go to allanalytics.com, that should be able to answer that question for you. For most of their polls, they do provide that information, but I don't have that handy and I don't want to make it up for you. But that study is really revealing. The majority of the people, even if it's a slight majority, recognize that there's a growth of unstructured data in our organizations. There are people that say that big data is another way to say Hadoop, or a meaningless catchphrase, and that's a limited number of people. I'm afraid I don't have the answer, but I will be glad to get that information for you, and I'll provide it back to you in the answers after the webinar is done.
The next question is, what is your perspective on the gaps between data governance and information governance?
Well, that's interesting, because there are a lot of organizations that are using the term information governance, and in fact, they're using it in different ways. There are some organizations, in fact, there's a client that I worked for, that called it information governance because they had tried data governance several times and it hadn't worked for them, so they needed to call it something else, and they called it information governance. We all know that data and information are the same thing, right? No, they're not. Data plus the context for the data becomes the information. Now, another way that organizations are using information governance is to include not only data governance, but process governance and technology governance. I've had the pleasure of working with several organizations that have called their program information governance as the umbrella term used to embrace not only data governance, but again, technology governance and process governance. I've seen some government organizations that use the term information governance to cover data governance, policy governance, and technology governance. So there are different umbrellas, or the term information governance is being used in different ways in different organizations. So I think that if we're going to look at the gap between data governance and information governance, information governance is going to include more than just the specific data itself. It's going to include the metadata. It's going to include the processes associated with the data. It's going to include the acquisition of new technologies within the organization. So that gap is that the organizations that are calling it information governance are either using it just to replace the term data with information, whether or not there are differences between the two, or they're using it to embrace a wider area of the organization for governance.
And most often I see it as being data governance, process governance, and technology governance. We could spend at least a whole webinar, if not a whole conference, on the difference. So I appreciate that question. Thank you very much.
The next one is actually more a request for your thoughts on the following statement: it seems to me we need a different approach to metadata management for big data, Hadoop specifically, because it is schema on read rather than schema on write. What are your thoughts on that?
Well, I would say that we need to have an approach to metadata. Metadata that is specifically being captured about big data increases our requirements, or expands our requirements, to manage the metadata that specifically goes along with Hadoop. And I'm not a Hadoop specialist. I don't typically talk to things that I don't have a lot of knowledge in. But the point is that if we're doing metadata management for big data, we need to identify what information about that data is going to be necessary for the organization. Is it a different approach to metadata? I would think that you would take the same steps to identify the metadata associated with your big data as for the metadata that's associated with your data warehouse or your master data management solution. Certainly there would be additional types of metadata when we're talking about big data. Again, I talked about that earlier in the webinar, where we need to know where that data came from, what the format of that data is, how that data can and can't be used, and the definition of that data. If we were to identify our metadata requirements for master data or for the data warehouse, we would have some steps that we follow to identify what those requirements are. I would say that we don't really need a different approach to metadata management for big data. We do the same steps.
I understand that some of the results that we receive from doing that assessment of the requirements may be different for big data than they are for the other types of data I mentioned.
Can you tell us what federal agencies you have found that have good, mature DG programs, and ones that have a vibrant DG implementation process?
I can certainly talk to one in particular from recent memory. Actually, I can talk to another as well. So there's a Department of Health and Welfare within a state in the US that I may have mentioned before, and their data governance program was entirely focused on the protection of data. What they did was set up a project to build all the tools and to get all of the rules associated with the protection of the data in place first, and they got those things ready before they started to embrace the stewardship aspect of data. Once they had the rules in place, associated with the specific data to which the rules applied, they would then go out to all of the folks that were potential users of that data and educate them on how that data can be used and how that data can't be used. So that's one good example of a, well, that's not a federal agency. That's more of a state agency. I've talked to the folks at the Department of Education for the US government, and they don't necessarily have a vibrant data governance implementation project underway, but they do have folks that are looking at that at this point. The CIA, I helped work with them to put together a strategy around data governance, and the information that they've shared with me since my engagement ended has been limited, but I know that they have a pretty vibrant and mature data governance program. The Department of Defense has a pretty large data governance program.
In fact, there are governments from other countries, I was going to say from other worlds, governance from other worlds, but at one of the Data Diversity events recently, we had a speaker from the British Army, Department of Manning (Army), talk in depth about the details of their flourishing data governance program. So there are examples out there. If you do a search on the internet for federal agencies and data governance programs, or attend the Enterprise Data World Conference, or attend the Data Governance Winter Conference or the DGIQ Conference, you'll hear case studies from federal agencies, state agencies, and local agencies about how their programs have matured and how they've added value to their organizations over time. So those are just a few ways we can learn more about how federal agencies have founded and taken advantage of their governance programs.
What is the number one question you get from your clients regarding implementing big data?
The question is, well, what is it exactly? And that's why we started the webinar that way. Big data as it was originally defined was those three Vs, the high volume, the high velocity, the high variety of formats of data. But it was the volume that was the focus of the original conversations around big data. A client, an oil and gas company, built sensors onto their oil rigs that were out in the Gulf of Mexico. And those were constantly, 24 hours a day, seven days a week, sending data back to their organization. They wanted to know if that high volume data was big data. And for all intents and purposes, it was. But at this point in time, just going back to that silly cartoon I shared earlier, people talk about location data, about Twitter and social media data, about precious record data. And we all want to know what that big data is. And that's why I asked the question at the beginning, what does big data mean specifically to your organization? Because even though there are definitions of big data out there, it depends on who you ask.
So that's the number one question that I get. And then the second question is, do we need big data governance, or can we just apply our governance to our big data? And I think I answered that in this webinar here. I think that you can just apply your existing levels of governance to big data as well as to other types of data.
I have seen trends of the term big data even going away, just because it's data. There's just a lot of data out there, every day, everywhere, and we just need to manage it. That's what I've seen. I know I'll let you wrap things up in a second, but what do you say to that?
In the webinar that I did last year on big data, I talked about the term small data. And I think that potentially the term big data is going to go away. You know, organizations may focus on small data, small, fine-tuned, refined data sets with high levels of metadata that are used to make critical corporate decisions. You know, big data can be replaced with small data. So I believe there's some truth in what you said, that big data could go away at some point in time.
Thank you as always, Bob, for this great presentation. And as always, thanks to attendees for being engaged in everything we do and for your great questions. Again, I will send out a follow-up email by end of day Monday with links to the slides, the recording of the session, and the additional resources Bob mentioned throughout the webinar. So I hope everyone has a great day and a happy holiday. Thank you.
Happy holidays, everybody. Thank you, Shannon. And hopefully we'll see you again soon.