And welcome, my name is Shannon Kemp and I'm the Executive Editor of Dataversity. We'd like to thank you for joining the current installment of the monthly Dataversity webinar series, Real World Data Governance with Bob Seiner. Today, Bob will be discussing Big Data and BI Analytics Require Data Governance. Just a couple of points to get us started. Due to the large number of people that attend these sessions, you will be muted during the webinar. For questions, we'll be collecting them via the Q&A in the bottom right-hand corner of your screen, or if you like to tweet, we encourage you to share highlights or questions via Twitter using hashtag #RWDG, Real World Data Governance. As always, we will send a follow-up email within two business days containing links to the slides, the recording of this session, and any additional information requested throughout the webinar. Now let me introduce to you our speaker for today, Bob Seiner. Bob is the President and Principal of KIK Consulting and Educational Services and the Publisher of the Data Administration Newsletter, TDAN.com. Bob has been a recipient of the DAMA Professional Award for a significant and demonstrable contribution to the data management industry. Bob specializes in non-invasive data governance, data stewardship, and metadata management solutions. And with that, I will give the floor to Bob to get today's webinar started. Hello and welcome. Hi, Shannon, I hope everything's going well with you. Thank you everybody for attending the webinar today. If you're new to the webinar series, welcome aboard; if you've been a past attendee of the webinars, it's great to have you back. As Shannon mentioned, today's subject is Big Data and BI Analytics Require Data Governance. And what I've found is that a lot of organizations, almost every organization, is talking about big data. 
They're also talking about business intelligence, data warehousing, and with an eye on being able to improve the way that they're able to analyze their data. So they're definitely focusing on BI analytics. And one of the things that I've found and I'm expecting that most of you have found this as well, that is that data governance is required in order for us to get the most out of our big data and to get the most out of our BI data as well. And so that's what we're gonna talk about today. We're not only gonna talk about the relationship between big data and data governance and the relationship between BI analytics and data governance, but we're also gonna talk about the relationship between the two of those big data and BI analytics. And hopefully give you some ideas as to where you can use the fact that you have these types of initiatives underway in order to jumpstart your data governance initiative. And the same thing in reverse. If you've got data governance, how you can then translate that into success in our big data initiatives and our BI analytics initiatives. So before I get started, I've got a couple of real quick slides that I wanna run through real quickly. The real world data governance series will continue through the end of the year at least. And we've got some really fascinating subjects coming up the next couple of months. Next month on August 20th, we'll be talking about data governance and data stewardship certification. That's always a hot topic. The month after that, we'll be talking about data modeling and data modeling is data governance. Kind of ruffle a few feathers out there and get people to understand what the heck it is that I'm talking about. We've got a very interesting special guest attending during that webinar. Dave Hay, data modeler extraordinaire, will be joining us. The month after that, we'll be talking about governing metadata. We'll be talking about agile, we'll be talking about data governance and the internet of things. 
Hopefully these subjects are of interest to you and we will have you back in another month or so for the upcoming webinars. One big announcement that I wanted to make to all of you that are out there listening to this webinar is the announcement of the relaunching of the data administration newsletter. If you're not familiar with tdan.com, please go out, take a look at it, register so you can get the emails about the updated content. The publication has been around since 1997. However, this month, I just announced a partnership with Dataversity to not only redesign the site, but to relaunch the site, embracing things like social media, embracing things like RSS data feeds. I'm looking for authors, I'm looking for practitioners, people who are interested in sharing their success stories about their initiatives that focus on data management. It doesn't have to be a data governance initiative. Anything that you have that would be interesting, please pass it on and please be a regular visitor to the data administration newsletter. I think you'll find the redesign of the site was well worth it. There's all the content from the 17, 18 years that the publication has been around. So please take a look at that. And then real quickly, just to kind of follow up with the current events, I wanted to let you know about the book that I had published back in September called Non-Invasive Data Governance, also the relaunch of the kikconsulting.com website, and also put a real quick plug-in for the Dataversity events that I will be speaking at in the near future. One is coming up in September and that's the Data Governance Financial Services event. Data Governance Financial Services 2015 conference and I'll be speaking on considerations for starting or enhancing a financial data governance program. And then in November, I'll be speaking at the Enterprise Dataversity event where we will be talking about a strategic data framework based on data governance best practices. 
So please, I hope to see you there. Just some information about some upcoming events that you might be interested in. So this webinar is going to address the things that are listed on your screen here. We're gonna talk about existing governance applications or the ways that we can apply governance towards our BI efforts. We're also gonna talk about how we can apply data governance to our big data initiatives. We'll talk about what does the future of big data and BI data hold for us, the relationship between these three subjects, articulating the value of not only data governance to our big data and our BI analytics events, but also the reverse of that. How can we take advantage of the fact that our senior management is talking about big data. They're talking about leveraging as much information from the data warehouse and from our business intelligence initiatives as possible. We're gonna talk about how we can use to our advantage the fact that our executives are talking about those types of things to help to push our data governance program further into the industry, further into our organization. And then we'll talk about true intelligence that is derived from governed data. So we'll wrap up on that note and then take some questions at the end of the session. So as I do usually when I'm starting these webinars is I wanna share with you my definitions of data governance and data stewardship just to kind of get us all on the same page. There's lots of industry definitions out there, the definition that I've grabbed onto is that data governance is the execution and enforcement of authority over the management of data and data related assets. A lot of people look at that definition and say that it's worded really strongly. Well, the fact is your definition of data governance needs to be worded strongly. 
We need to be able to execute and enforce authority over the management of the data, whether that's from a data quality perspective or a data security or a compliance or a classification perspective and really at the end of the day what we're trying to do by putting governance in place is to execute and enforce authority over the data. Data stewardship, if you're familiar with the non-invasive approach to data governance, the one that I mentioned the book is about, the approach that I prefer to talk about focuses on formalizing accountability for the management of data. Where in a lot of organizations they think that data governance has to be about command and control, the non-invasive approach tells us that we're already governing data to a certain extent, we need to take advantage of that level of governance, that level of accountability that's already in place and formalize that so that we can be effective in governing data and stewarding data in our organization. Again, real quickly just to say with a definition of non-invasive data governance, it's the practice of applying that formal accountability through a non-invasive framework and I'm not here to talk to you about non-invasive data governance today, but I want you to know that that's the perspective that I take when I'm putting together these webinars is that we can do governance in a non-invasive way, we can apply it to things that we're already doing in our organization. Some of those things that we might be doing are big data related or our BI data related. So if we can take advantage of the existing levels of accountability in our organization and we can formalize those things, that is that kind of lays at the core of what non-invasive data governance is. So we're here to talk about big data and BI analytics data. 
So one of the things that I wanna do is I wanna just define some of those terms for us real quickly as well, where big data is really a broad term that's used for data that is so large or so complex that our typical ways, our traditional ways of processing the data in these applications are inadequate. And there's a lot of challenges that we come across as we're putting our big data initiatives in place. Some of those are listed on the screen for you here, analysis, capture, curation, search, sharing, all of those things that we need to do with this high volume, high velocity, high variety of data that we are all needing to address these days. And the fact is that in order to analyze the data, in order to capture and curate and search for the data, we must apply governance to all of these challenges that we have in our big data environment. And if we don't do that, then what we're gonna do is we're gonna have a lot of data coming at us from a lot of angles that is not being governed and therefore people don't thoroughly understand that data, they don't know where it came from, they don't trust the data. And so if we want to gain from the investment that we're making in big data, we wanna make sure that we're governing that big data. It does not mean that we need to put a separate program in place called big data governance, it means that we need to include the big data in the data that is being governed through our initiatives. So data governance must be applied to all of these challenges effectively in order for big data initiatives to demonstrate the return on investment, the proof of value that we are expecting from our big data initiatives. The same thing holds true for business intelligence. 
Business intelligence, BI, and data warehousing, they're sets of techniques and tools for transforming the raw data that exists in our applications into more meaningful and useful information for not only business analytical purposes, but for any purpose, any decision-making purposes that we have in the organization. So some of the common functions around BI include the reporting, the process mining, the business performance management, the benchmarking, but also if you notice in some of those common functions that I've listed there, there's three of them that are very analytically based. There's online analytical processing, OLAP, there's predictive analytics, there's prescriptive analytics, and in order to get the most value out of our data in our business intelligence environment, in our data warehousing environment, we must apply governance to all of these functions as well. So it seems pretty sensible for us to understand that if we are focusing on big data and we're focusing on the investment of the data that's in our data warehouse, we wanna make certain that that data is of the highest possible value and the highest level of understanding that it can be for our organization. So that's really all I have as far as definitions are concerned, but we wanna make certain that if we're focusing on things like big data and business intelligence, that we wanna make sure that the data is of high value and that it's accurate and that it's put into the hands of the people that can use it. So I borrowed a couple paragraphs here from a recent article that talks about content management and how content really requires data governance. And so I'm just gonna go through these kind of quickly with you, but as companies generate increasingly more content and if we're talking about increasingly more content, we're really talking about big data. It's structured data, it's unstructured data, it's data that's coming at us from all angles. 
They're coming to recognize that the ability to be able to analyze that data is non-negotiable. Otherwise, why are we putting together big data environments or BI analytical environments if we're not going to use that data for positive purposes for our organization? So gleaning business insight from that data is really the end target for most organizations that are applying dollars towards these types of initiatives. So gleaning that business insight from the big data is still in the early-adopter and well-hyped stage. There's a lot of people talking about it, but there's fewer organizations that are getting the true value out of that data, perhaps because the level of governance associated with that data is not where it needs to be. Companies are beginning to see these results emerge in terms of smoothed workflows, improved search, better compliance, which really means better governance associated with the data that is being fed into our big data and into our BI analytics initiatives. So big data analytics is a business practice that analyzes and derives insight from data. That's what we're trying to do with the resources that we're applying to building these types of data resources for our organization. Big data analytics helps companies manage their data lifecycle. With business metrics in place, companies can identify which of that data in our BI environments, in our big data environments is most valuable to our organization. And if we can identify which of that data is most valuable to the organization, then accordingly we can adjust our investments in the storage and the future analytical needs for our organization. So there's a very strong relationship between big data and BI analytics. There's certainly a big relationship between big data and governance and BI analytics and data governance. Let's talk a little bit more about that here. 
So when we talk about big data, and I love this slide just because of the idea of it, first of all, it's the middle of the summertime, it's hot as heck out there and we've got a skier on the slide. But how much data is big data? So even dating back to when we were talking about data in terms of bits and bytes, up to kilobytes and megabytes and gigabytes, if you look at that progression of how the amount of data has grown in organizations, you'll see that where that red line sticks out, this is the most recent data that I could find, but as of 2012, there were 2.7 zettabytes of data that existed in the digital universe. 2.7 zettabytes, it's not even a term that's used by a lot of organizations. Well, the truth is that by the year 2020, they're predicting that we're gonna have 35 zettabytes of data being created every year. So we've got lots of data. We're putting a lot of money into managing that data and into taking advantage of that data for the best business purposes that we can. We need to make certain that that data is governed, that that data is well-defined and well-understood. So we're gonna focus on whatever you mean by big data in your organization, this is some type of scale to see where we're going. We're going to yottabytes and xenottabytes and shilentnobytes and so on and so forth. Some of these terms, I don't know if we'll be discussing them anytime soon, but the fact is that big data is just getting bigger and bigger daily, certainly annually, and we wanna make certain that we're putting governance around that data. If we are going to capture and collect the most important of the big data that's available to us, we wanna make sure that that data is of high quality, that it's accessible to people, that it's understood, and that we have formal accountability applied to make certain that that data is the best that it can be. So you've heard the expression from the US Army, be all you can be. 
Well, the fact is that we need our big data and we need our BI data to be all that it can be, and that starts with formalizing accountability around the management of that data. A couple of things that we imply by the name of this webinar, you know, that big data and BI analytics require data governance. Well, why does big data require data governance? Well, a lot of us in the data management industry, we use the term that data is an asset all the time, and some people are clear as to what we mean by that and some people are not as clear as to what we mean by that. But when we talk about big data and this volume of data that is now part of what we see daily in our organization, we need to recognize that not only is the data in our applications and in our data warehouse an asset, but any data that we can use as an organization is an asset to the organization. So that big data that is most important to us and most important for usage in the organization is certainly an asset. And we've got to look at big data as an asset in our organization, and if it's an asset of the organization, just like any other asset in the organization, we need to govern that data. We need to execute and enforce authority over the management of data. We need to formalize accountability. We need to make certain that even though this data is huge, this data is still big, you know, we need to make certain that that data follows the rules, whether those rules are classification rules, compliance rules, security rules, privacy rules, business rules that are defined by our organization. We need to make certain that this big data follows the rules that we've defined for us. We need to be consistent in how we apply big data to applications. And I'm gonna talk a little bit today in the webinar about how important metadata is to the big data and to the BI analytics environment and how important metadata is to our data governance environment as well. 
So another one of the reasons why big data requires governance is because in order to understand that data, in order to improve the value and the understanding of that data, we need metadata to support us, to improve both business and technical understanding of the data and the data-related assets. So we need to manage the definition of the big data. We need to manage the production of that big data, where it came from. We've got to manage how that big data is being used across the organization. So there's a lot of reasons, and I'm gonna address some additional reasons for why big data requires data governance in some of the additional slides that I'll be covering, but just real quickly, these are some of the primary reasons why big data requires governance within our organization. So I'll talk about it from the BI analytics environment. Same thing holds true. If we're gonna put time and effort and resources into building out our business intelligence environment and the data is going to drive the analytics, that data needs to be well-defined. That data needs to be understood. That data needs to be made available to people. And one of the ways to be able to do that is to, again, formalize accountability over the definition, production and usage of data in our BI environment. Again, it needs to follow the rules. We need to be consistent in how we apply analytics to the data in our organization. And again, the metadata has always been talked about as being a backbone of our business intelligence initiatives. We need metadata to improve the value and the understanding of our BI analytical data. We need to manage the BI data definition, production and usage of that data as well. In a recent webinar and also in a near future webinar, we're gonna be talking about metadata and its relationship to data governance in more detail. 
But for most of us that have been active in this industry for a period of time, we understand the importance of having metadata to support our data warehouse. So we understand that. We can certainly translate that into why that metadata is necessary for our BI and our analytical environments and why it's important to our big data environments as well. So there's a couple of quick slides here just to share with you why data governance is important to these types of initiatives. So let's talk a little bit about the metadata angle. There's just for a couple of minutes here. And let's talk about where we've come from and where we're going with metadata. Well, in the past, metadata was always limited to things like the data dictionary, the business glossary, the names, the definitions, the labels for the data on the screens that people are seeing, on the reports that people are getting. But metadata is evolving just like the data, the big data and the analytical environments are evolving. And now the metadata encompasses things like not only the data, but the people that have accountability and responsibility for that data. The processes that are using that data and the processes that we follow in order to even get that data into a condition that's gonna be most usable by our organization. And now metadata is also focusing on things like big data as well. Where is the data coming from? What data do we have available to us? What is the definition of that data? Where do we get it from? How can it be used? How is it being applied in our organization? Metadata really lies at the core of pretty much everything that we're doing in the data management industry. So it really shouldn't be a surprise to us that metadata lies at the core of big data and BI analytical data as well. And the truth is, and I've done webinars on this in the past, I look at the relationship between data governance and metadata as being a two-way street. 
So when you're doing data governance, some of the things that are just gonna naturally fall out of and result from your data governance initiatives is metadata, metadata about people and the processes associated to the data. So that's the one direction of the two-way street is that we are going to have metadata that flushes out of any of the BI initiatives, the big data initiatives, and certainly out of the data governance initiatives. The other side of the street is that this metadata that we're now counting on to improve the value that we get from our big data and to improve the value that we get from our BI analytical environment, we need to govern that metadata as well. And again, just like I mentioned earlier where we don't need to have big data governance initiatives or metadata governance initiatives or data warehousing governance initiatives, really what we need is we need to define and roll out programs around data governance that are going to enable us to get better use out of our big data, out of our small data, out of our data warehouse data, out of our content, out of our records, out of everything. So metadata really lies at the core of what we're doing around the governance of data, whether it's big data or BI analytical data. So metadata continues to evolve. Initially, it was looked at as being primarily a data management asset, and then it became a business intelligence and a data warehousing asset, and then it became a data governance asset and a privacy and a confidentiality asset. If you look up in the news and you see the word metadata, most often it's associated with privacy and confidentiality and those types of things. So metadata has evolved not only from being a data management asset to a security and a privacy and a confidentiality asset, but now it has become an asset to all of our big data and analytical platforms that we're putting in place in our organization. 
So one takeaway I'd like to see you take from this webinar is that we understand how metadata really lies at the core of what we're doing around governing the big data and the analytical data that we have in our organization. So the other thing that lies at the core of the big data and the BI analytics is the data governance itself. We need to make certain that we have governance programs in place that will encompass not only the structured data that we have, the bits and bytes and the data and the tables and the databases that we have, but we also need to be able to apply governance to unstructured data. Years ago I was told not to even use the term unstructured data because somebody had really expressed to me that all data is structured to some degree. Okay, well what I mean by unstructured data is data that exists in documents, in audio, in video, in those things that are not your traditional databases. And a lot of the big data that is coming at us these days is unstructured data. So not only do we need to have metadata to address our structured data, but we need to have metadata and we need to have governance in place for unstructured data. Now other terms for unstructured data may include things like content management or records management or logs management. There is certainly a lot of social media data that is coming at our organizations as we start to embrace or we've already embraced social media for our organizations. There is a lot of data coming at us. And if we expect to be able to analyze that data and make use of that data, then we certainly need to govern how that data looks in our organization, govern the quality of the definition, production and usage of that data. And that data includes not only big data, but if there's such a thing as big data then I would be thinking that at some point in time we're also gonna have something that we call small data. And small data may be finely tuned data sets that sit on the desks of managers. 
If we're gonna govern the data around big data, we're gonna govern data around small data, around data warehouse data, master data, metadata, all of this data needs to be governed. So if we understand that data governance and metadata kind of lie at the core of what we're doing around data management, it's for all data in the organization. Big data, small data, analytical data are certainly included. And so what are we talking about when we're talking about governing perspectives or governance perspectives for big data and analytical data? And so typically when we talk about governance for big data and analytical data, there's five perspectives that I wanna share with you around what it really means to govern the data in our big data and our analytical data environment. And the first one is the accountability perspective. From an accountability perspective, who has responsibility for making certain that the data is defined the way that it needs to be defined so that we can get the most value out of that data? Who has the responsibility for making certain that that data is being produced appropriately or that it's being used appropriately? So we're gonna wanna look at governing data from an accountability perspective. We're gonna wanna look at it from a process perspective. How are we gonna use data in the processes? What are the processes themselves for defining, producing, and using the data? We wanna talk about accountability when it comes to the rules perspective. From the inventory of data, what data do we have in our organization from that perspective? And then lastly from the decision-making perspective. So what I wanna do is I wanna walk through each of these relatively quickly here and talk about what is the governance perspective at the accountability perspective for data in our big data environment and then follow and do the same for the process perspective, rules perspective, and so on and so forth. 
So for the accountability perspective, and I've talked a little bit about this already in the webinar, is that if we're gonna govern data, then there's really three pieces or three processes that are associated with the data that need to be governed. And the first is the governance around the definition of the data. So making certain that data that is defined to be used in the organization is not given what I've called in the past cheeseburger definitions. So what's a cheeseburger definition? A cheeseburger definition or the definition of a cheeseburger is that it's a burger with cheese or a client account number is the account number for a client, where the definitions of the data don't really tell us anything more than the name of the data field itself. So if we wanna make certain that we're getting the most value out of our big data and our BI data, then we wanna make sure that we have valid definitions, definitions that have been validated, definitions that have been certified in the organization, so that anybody that is going to gain access to that data, whether it's in our BI environment or our big data environment, has a good understanding of what that data really means to the organization. We also wanna have accountability for the production of that data. Where did the data come from? What did we do to that data in order to get it into the format that you see in your BI environment or your big data environment? We also then, it makes sense to govern the usage of that data. So if there are rules that are associated with how that data can and can't be used, then we wanna make certain that those rules are well-defined and that they're shared with people. And in order to get those rules well-defined for how you can use and how you can't use data, we need to have somebody in the organization who's accountable for doing that. 
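The cheeseburger definition idea described above can be checked mechanically, at least roughly. What follows is a minimal sketch, not anything shown in the webinar: the function name `is_cheeseburger_definition`, the stopword list, and the substring heuristic are all assumptions made for illustration. It flags a definition when every meaningful word in it already appears inside the field name.

```python
import re

# Filler words that carry no meaning in a definition.
# This list is an illustrative assumption, not a standard.
STOPWORDS = {
    "a", "an", "the", "of", "for", "with", "is", "it",
    "that", "this", "to", "in", "on", "and", "or",
}


def is_cheeseburger_definition(field_name: str, definition: str) -> bool:
    """Return True when the definition only restates the field name.

    Heuristic (hypothetical): squash the field name into one lowercase
    string of letters, then check whether every non-filler word of the
    definition occurs somewhere inside that string.
    """
    squashed_name = re.sub(r"[^a-z]", "", field_name.lower())
    words = [
        w for w in re.findall(r"[a-z]+", definition.lower())
        if w not in STOPWORDS
    ]
    if not words:
        # An empty or filler-only definition tells the reader nothing.
        return True
    return all(w in squashed_name for w in words)


if __name__ == "__main__":
    print(is_cheeseburger_definition("cheeseburger", "A burger with cheese"))
    print(is_cheeseburger_definition(
        "client_account_number", "The account number for a client"))
    print(is_cheeseburger_definition(
        "client_account_number",
        "Unique identifier assigned to a billing relationship when it is opened"))
```

A check like this could only ever be a first pass. A definition that passes it still needs to be validated and certified by the people accountable for that data.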
And the fact is that if you're building a data warehouse, if you're building a BI or a big data environment, somebody typically has the responsibility for making sure that we're putting definition to the data that is included in that environment. And if we make certain that we have accountability for the definition, production and usage of the data in these environments, we're taking great steps towards having a governed data environment around these different types of initiatives. Certainly we need to have accountability for protecting the data. Now who can see the data? Who can't see the data? How can data be used? And there's something called data classification that a lot of organizations are talking about where data can be highly confidential or confidential or sensitive or public data. And all of those different classifications of data require governance. Not only do they require governance, but they require that we have classification handling rules. So if the data is identified as being highly confidential, what does that mean to the organization? How do we handle that data in relationship to data that's not classified? And if only certain people can see it, only certain people can have that data shared with them, we need to make certain that we have people in our organization that have accountability for protecting the data. And that doesn't just mean in our applications, in our data warehouse, it means in our BI analytical environment, it means in our big data environment as well. We need accountability for improving the understanding. And I talked about that a little bit with the definition is that we need to make certain that we get the appropriate people in our organization involved in putting definition to the data so that people that are using the data, not only are they using it appropriately, but they're using it in such a way that they're really leveraging what that data means to the organization. 
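The classification handling rules mentioned above can be written down as data, so that "what does highly confidential mean to the organization" has one documented answer. This is a minimal sketch under stated assumptions: the four classification names come from the talk, but the ordering of the levels, the `HandlingRule` fields, and the `can_view` clearance check are hypothetical and would be set by each organization.

```python
from dataclasses import dataclass
from enum import IntEnum


class Classification(IntEnum):
    """The four levels named in the talk; the numeric order is an assumption."""
    PUBLIC = 0
    SENSITIVE = 1
    CONFIDENTIAL = 2
    HIGHLY_CONFIDENTIAL = 3


@dataclass(frozen=True)
class HandlingRule:
    """Hypothetical handling attributes, for illustration only."""
    encrypt_at_rest: bool
    may_share_externally: bool
    steward_approval_required: bool


# One rule per classification, so the same answer applies in the
# applications, the data warehouse, the BI and the big data environments.
HANDLING_RULES = {
    Classification.PUBLIC: HandlingRule(False, True, False),
    Classification.SENSITIVE: HandlingRule(True, False, False),
    Classification.CONFIDENTIAL: HandlingRule(True, False, True),
    Classification.HIGHLY_CONFIDENTIAL: HandlingRule(True, False, True),
}


def can_view(user_clearance: Classification, data_class: Classification) -> bool:
    """A user may see data classified at or below their own clearance."""
    return user_clearance >= data_class


if __name__ == "__main__":
    print(can_view(Classification.CONFIDENTIAL, Classification.SENSITIVE))
    print(HANDLING_RULES[Classification.HIGHLY_CONFIDENTIAL])
```

Keeping the rules in one table rather than scattered through each environment is the design point here. Whoever is accountable for protecting the data then has a single place to maintain them.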
And also from a retention and from elimination perspective, we need to make certain that we have accountability for making certain that we're retaining the data. And if it's big data that it's just more of it, we need to make sure that we're following the retention rules and that we're eliminating data in the appropriate manner as well. So there's accountability. The accountability perspective is very important when it comes to why these different types of applications require governance in our organization. And in fact, most governance organizations, most governance initiatives within organizations address these things. The definition, the production and the usage of data, the protection of the data, the understanding of the data. So the accountability perspective is key when it comes to being successful with our governance initiatives. Let's also talk about it from the process perspective where, and I've done webinars on things that I call bill of rights, getting the right people involved at the right time, using the right data to make the right decision. From a process perspective, that's what data governance is all about. We need to be able to identify who the owners of the data are. And I know that I typically shy away from using the term owner, but it seems to be used a lot in organizations where they've identified people in the organization that are owners of the data. So if there's owners of the data, then there's also gonna be owners of the process associated with the data. And we need to know who those people are as part of our governing perspectives for the big data and the analytical data as well. We wanna make sure, as I said before with the bill of rights, we get the right people involved at the right time. We wanna make certain that the appropriate people are participating in the efforts to improve the definition, the production and the usage of data in our organization. And from a process perspective, we're really looking at three things. 
We're looking at input and output and throughput. And what is it that is input and output and throughput to these processes? It's data. And that includes the big data and BI analytics data. They're all being used in different processes. If we understand what the data looks like coming into the process, what the data looks like as it leaves the process and what's done to that data during the process, we will have a better chance of successfully governing the data associated with these initiatives. From a rules perspective, well, there are a lot of rules, as we know, associated with the data. There are business rules, which are rules that we define internally for how the data is defined, produced and used. There are compliance rules that are handed to us from outside organizations, from the federal government, from our local government or from our industry. We need to make certain that these compliance rules are well documented. The protection rules that I talked about, they need to be well documented. And any standards that we have for data need to also be followed when it comes to data in our data warehouse and data in our big data environment. So there's a rules perspective associated with governing the data in the BI and the big data environments. From an inventory perspective, if you're a regular attendee of these webinars, you'll know that I've shared something in the past that I call a common data matrix. I'm not sharing it in this presentation, but I'm sure that it will come up in a future webinar. But we want to know what data we have and where that data came from. We want to know the quantity of data. We want to know the quality of the data so that we can direct people to the most appropriate data in our environment. We want to know who owns the data and we want to know how the data is used. So from an inventory perspective, we need to know all these things about the data. 
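The inventory questions above (what data, where from, how much, how good, who owns it, how it is used) can be sketched as a simple record. This is an illustrative structure only, not the common data matrix referred to in the talk; the field names, the 0-to-1 quality score, and the sample entries are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class InventoryEntry:
    dataset: str            # what data we have
    concept: str            # business concept it describes, e.g. "customer"
    source: str             # where the data came from
    row_count: int          # quantity of data
    quality_score: float    # quality, on a hypothetical 0.0-1.0 scale
    owner: str              # who owns (or stewards) the data
    uses: list = field(default_factory=list)  # how the data is used

def most_appropriate(entries, concept):
    """Direct people to the highest-quality dataset for a concept."""
    candidates = [e for e in entries if e.concept == concept]
    return max(candidates, key=lambda e: e.quality_score, default=None)

# Hypothetical sample inventory.
inventory = [
    InventoryEntry("crm_customers", "customer", "CRM application",
                   120_000, 0.92, "Sales Operations", ["BI dashboards"]),
    InventoryEntry("clickstream_visitors", "customer", "web logs",
                   45_000_000, 0.61, "Digital Marketing",
                   ["big data analytics"]),
]

recommended = most_appropriate(inventory, "customer")
```

Note that the larger dataset is not the recommended one here: quantity and quality are tracked separately so that each can be weighed on its own.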
And when we're talking about big data, data that comes in various formats from a variety of places, we need to make certain that we're looking at it from a quantitative perspective: can we handle that volume of data? And from a qualitative perspective: is that data suited for purpose within our organization? We want to make certain that from an inventory perspective, we're also looking at how we are going to govern data in our big data and our analytical data environments. And then lastly, from a decision-making perspective, we want to know that when we're talking about data in these environments, who has the accountability for being able to make the decisions? How are they going to make decisions? When are they going to make the decisions? And who's going to be impacted by those decisions? And then lastly, how are we tracking those decisions that we're making? Are they good decisions? Are they bad decisions? That requires governance. It requires that somebody has the formal accountability for executing and enforcing authority over that data, but also has the formal accountability for making sure that we track not only the data itself, but the decisions that we're making on the data and how they're adding value to our organization. So we want to look at the future. One of the items on the agenda was to talk about the future of big data and BI data, but in order to look at the future of these things, let's take a real quick look at the past, where our business intelligence strategies have evolved really from hierarchical databases to relational databases to dimensional to enterprise data warehousing, data marts, cubes, distributed databases, data lakes. That may be an interesting topic for a future webinar, but data lakes, for those of you that may not be familiar with that term, are an object-based storage repository that holds data in its native format. 
So if we've got big data, we're gonna throw all of our big data into this data lake, and then as we pick and choose the data that's gonna add most value to our organization, we pull it out of our data lakes and we move it into our BI environments and into our big data environments. So we're not only concerned with the bits and bytes of data in tables and columns, we're talking about unstructured data, we're talking about big data as well, and we need to make certain that as the whole idea of business intelligence evolves, we're evolving with it from a big data and from an analytical perspective. All right, so now let's look to the future. It's kind of funny, I read this somewhere some time ago, that in business intelligence strategies, we're kind of moving from data mining to the big data analytics perspective around our big data sources. So in data mining, we used to be looking for that needle in the haystack. We'd be looking for the relationships between the data so that we can get that golden nugget of information out of our databases. Well, now in the big data environments, we're not only looking through the haystack and trying to find that needle, we're looking at the entire haystack. And in order to make sure that all of that data in that haystack is of high value or high quality to people, we wanna make sure that we're governing the definition, production, and usage of that data to add value to the individuals and the groups in the organization that are going to use that data. Can't really do a webinar on big data without talking about the three Vs of big data, and I've seen them in different forms, but I usually refer to them as volume, velocity, and variety. We've already talked about the importance of the volume of the data. We know that we've got more data available to us coming from more sources than we've ever had before. It's coming at us faster than it's ever come at us before. 
And it's certainly in more varieties of formats than it's ever come before. One of the things that I suggest for organizations is you take a look at how are you handling the volume, the velocity, the variety of data now. Even before you enter into the big data environment, and you saw the chart on an earlier slide in the slide deck, how the volume of data is growing and growing in the organization. Well, if we're not able to govern the volume, the velocity, and the variety of the regular data that we have in our organization, we need to certainly put something in place to govern that and also be able to handle the data that's coming at us at the higher volume, at the higher velocity, and of different shapes and forms of data coming at us. So we wanna make certain that we understand what we're doing well with the data that we already have, but then as we embrace big data and BI analytics, we wanna make certain that we're governing that data consistently across the organization. So it's not just any data in the organization, it's the data that's going to be most useful to us, the data that we're investing most of our resources in. We wanna make certain that we have governance in place around the volume, around the velocity, around the variety of sources of data that we have coming at us. Really the question becomes, well, what are we gonna do with all this data? What are we gonna do with all this big data and this BI analytical data that we now know is well-defined and well-produced and we've documented the rules about how it needs to be used or how it can be used? Well, the first thing we need to do is we need to make sense of that data. We need to understand and trust the data. We need to protect the data. We need to know where the data comes from. We need to be able to govern the access to that data and also be able to govern how analysis is taking place on the data. 
So we need to ask all of these questions when we start trying to determine what we're going to do with all this data that now exists in our big data environments and our BI analytical environments. And the truth is, I can just get to the next slide here, that in order to achieve these goals of getting the return on investment from these investments that we're making in big data and BI data, is that we need to be able to do all these things, but we really need to have governed data. We need to have governed process and we really need to govern the use of tools in our environment. All these things need to be governed. Not only the data itself, which we've talked about a little bit here, but the process associated with defining, producing and using that data. And then the technologies that we're using to leverage and get the most out of our big data in our analytical environments. And you know, it's funny. So some of the organizations have something that they call information governance as compared to data governance. And oftentimes the information governance really encompasses all three of these areas. They encompass data governance, process governance and technology governance. And I can tell you that there's at least a handful of organizations that I've worked with that have called their initiatives information governance. And why information governance rather than data governance? Well, because we're not only really looking at the data, but we're looking at the process and we're looking at the associated technologies as well. So the next item on the agenda, things to talk about really was the value of governance to these initiatives. But what I really want to do here is I want to kind of turn that around a little bit. So we've already really talked about the value that data governance will bring to our big data and the value that the data governance will bring to our BI analytical data. Let's reverse that here for a moment. 
Let's talk about the value that big data and BI analytics are gonna bring to our data governance environments. So let's talk about that here for a minute. So again, now what we're gonna look at is what's the value of big data to our data governance initiative? What's the value of the BI analytics to our data governance initiatives? Well, the fact is that we can use the fact that our management is talking about big data and BI analytics to push our need for data governance in the organization. If they've already justified the investment in big data and justify the investment in analytics, then we need to make certain that we are gonna get the most return on investment out of these resources that we're building in our organization. We wanna make sure that we're maximizing the use of the resources across these different initiatives. We wanna make certain that we're providing trustable data in our analytical environment and in our big data environment. We wanna make certain that we're making better decisions from that data as well. So if we can look at the fact that our management teams are talking big data and talking analytics, then in order to truly get the most value out of their investment in these things, we need to have high quality data. We need to have understood data. We need to know all those things that we mentioned about, we talked about in the first 45 minutes of this webinar. So instead of again selling the value of data governance to these things, let's look at it the other way around. Let's look at how do we sell the value of these things to data governance? And it's a very different approach for selling why we need governance in our organization. 
But if they're making the investment in these things because these are the buzzwords of the industry and these are the things that everybody needs to address from the industry analyst perspective, from the vendors' perspective, then we wanna make certain that the data that makes it into these environments is well-governed and most useful to the organization. So I'm gonna make two bold statements here. And if you don't agree with me, through the chat or separate from the webinar, please reach out to me and tell me that you disagree. But with these two bold statements, I'm hoping that most of you will agree with me. The first is that organizations that govern their data effectively get more value from their big data and their BI analytics efforts. So if you believe that to be true, I'd love to hear from you things that stand behind that bold statement. But if we can say that organizations that govern their data get more value out of their big data and their analytical environment, that's a pretty bold statement. And then we've gotta back it up with the actions that we put into place around data and around data governance. So my second really bold statement that I'm gonna share with you is that organizations that don't govern their data effectively do one of two things typically. Typically they don't even embark on these types of data activities or they get less value from their data-driven activities. So the first one is that organizations that do govern their data effectively are getting more value out of these initiatives than the organizations that aren't governing their data. And the second is that the organizations that aren't governing their data may not even address these initiatives, or they're certainly getting less value out of their big data resources and their BI analytical resources. So the last thing that I wanna talk to you about before we take a few questions is deriving true intelligence from governed data. And so typically to derive intelligence from the data we need to have quality data. 
We need to have a high level of understanding around the data. We need to protect the data. We need to make the data available and we need to have formalized accountability. Again, that kind of lies at the core of what I talk about with non-invasive data governance: if we look at the fact that there are already people in the organization that have accountability for the data, and really what we need to start with is formalizing that accountability, then we can make certain that we have the appropriate people involved at the appropriate times with the data to make certain that the data is the way that it needs to be to derive that value from our big data resources and from our BI resources. So let's walk through each of these real quickly. From a quality perspective, we're talking about data standards and making certain that we have standards. It's very difficult to improve the quality of the data in the organization if we don't have a standard as to what this data needs to look like and how this data is going to be used. So if we're going to focus on data quality, we want to make certain that we have standards defined for the data in the organization, and that includes data in our BI environment and data in our big data environment as well. We want to make certain that any processes that we put in place in our organization are being followed. You know, a lot of organizations do business process re-engineering and define processes, but then there's no level of accountability for making certain that we're engaging the appropriate people with the appropriate steps in the process. We want to make certain that we have escalation paths to resolve issues around data where we're just coming to disagreement. 
So rather than bang heads and go our merry way, we need to have a formal way of being able to escalate decisions to the appropriate level of the organization so that again, we're looking at governing data as an asset and governing specifically data in our BI environment and our big data environments as an asset as well. We want to make sure that we have decision rights, we know who's responsible for making decisions, we want to make certain that we're certifying the data into these resources and making sure that it follows the standards that we've defined for ourselves. And then the other term that I would add to that is validation, we want to make sure that we're validating the data that makes it into these data resources. So we can derive intelligence from governed data, we certainly need to improve the quality of the data in order to do that. We need to improve the understanding whether or not that understanding is through a business glossary and a vocabulary or through a dictionary. That's making sure that we've captured the information about the mapping of the data that we've managed and that we make available information about the validation of the data, what steps have we gone through with this data to assure the people that are using the data in these environments that the data is what it needs to be in order for them to make decisions. So I call that the data validation piece. And then there's the understanding, there's the education piece of it as well. We need to make certain that people that have access to the BI analytical environments and people that have access to our big data environment that we give them some education around what that data is, how it's defined, where it came from, all the things that we do with traditional BI resources, we need to do the same thing with our big data in our BI analytical data environments. 
From a protection perspective, and we talked about this a little bit earlier, that we have classification rules, security rules, auditability rules. We need to make sure that all of these things are well-defined and that we have a way to be able to keep and hold people formally accountable for following the classification, the security rules, for making certain that we can work with our auditors to make certain that the data is what it needs to be and that it's trustable, and that whether it's internal or external auditors, that they have the metadata that is required to make that data truly as valuable as it can be for the organization. And that involves process and education, it involves enforcement and compliance as well, but protection is one of the things that we need to put in place to derive intelligence from our governed data. Availability, we need to make certain that the appropriate data is available to the appropriate people, that we've taken a look at what the best way is to store the big data, especially when it comes to big data, again, coming at us from many different directions and many different formats, we wanna make certain that we govern the storage of that data. We govern the timeliness of the data to make certain that the most appropriate data is available when it needs to be for the end users of these environments. Data is complete. Kinda goes back to the quality perspective as well, is we wanna make certain that our big data is complete, that our BI data is complete, and that it can be used to answer the questions that people have of the data. From an accountability perspective, we talked about that a little bit. We need to formalize accountability rather than hand it to people with new responsibilities. We need to have stewards of the data, whether it's big data or small data, or BI data or application data. We need to make certain that we have stewardship of the data. 
We need to improve our communications around the data, including the metadata that we've collected about the data. We need to improve how we execute and enforce authority over the management of that data. So the main points that we've covered in this webinar are these. We talked about governance applications and how we can apply governance to our BI environments and our big data environments. We talked a little bit about the future of big data and the future of BI data. And I think we all know that it's growing and growing in importance in our organization. We talked about the relationship between big data, BI, and governance. We talked about articulating value, not only the value that governance will bring to our big data and our BI analytics efforts, but also that the reverse is true, that we can use the fact that we're doing these things to really push the need for governance into our organization. And we talked a little bit about how true intelligence is derived from the governed data in our environment. And with that, I'm getting ready to turn this back over to Shannon here for any questions that we have. But just a real quick reminder, in August, the third Thursday of the month falls on the 20th. We're gonna talk about data governance and data stewardship certification. That's a hot topic for a lot of organizations, so we'd love to welcome you back for that and for future webinars as well. And the last thing that I'd like to share with you really is please, if you get a moment, go out and visit tdan.com, register for the emails that go out announcing the new content. We'd love to have your participation in the tdan.com environment as well as the real world data governance environment. I'm proud to say that I work with Dataversity on both of those things. 
It's really important that we recognize that Dataversity provides all of these resources, and if it wasn't for them, the series wouldn't be here and the publication may not be where it is today. So with that, I'd like to turn it back to Shannon. Do we have any questions, Shannon? We certainly do, yes. And of course, the most popular question is people inquiring about the slides. Just a reminder, I'll be sending out a follow-up email by end of day Monday for this particular webinar with links to the slides, links to the recording, and anything else requested throughout the webinar. And if we don't have enough time for all the questions, keep the questions coming. One of the great things about this webinar series is that Bob will answer in written format the questions that we don't have time to get to, and we'll make sure to get those out in the follow-up email as well. So Bob, the first question coming at you is, how realistic is it that rules are deeply fleshed out? In my experience, I only know of two companies that do a good job of defining data. Most of the others at some point tried, but things are out of date or stale. Well, that's a great question. I appreciate that question. The fact is that in order to be able to get the value out of the data, we need to have people in the organization that have formal accountability for making certain that we capture that information. So it's really important. I mean, there are organizations that govern their data better than other organizations, but what it truly comes down to is, can we define people as being formally accountable not only for the definition, but for the protection of the data and for putting good definitions in place? I would venture to guess there are a lot more than two organizations that do this well. What it really requires, to be successful with this, is that we have people in the organization that are formally accountable for making certain that these things happen. Love it. Okay, so we've got another question coming in. 
Do you attempt governance of the data lakes or only data that was pulled out for a particular business reason? Well, by definition, really, a data lake is data in its native format. So if it's data that's coming from streams that are ungoverned, then the data in the data lake is not gonna be governed. So I would say, and again, I don't have a lot of experience with data lakes, but data in the data lakes is not governed. Data that is pulled out of the data lakes for a useful purpose, typically that's when governance is applied. If I'm off base around that, I'd love to hear about it, but that's the way that I view the data lakes versus the use of BI and big data environments. And that's the way I've heard it and understand it as well. Actually, all the questions that we have coming in, there may be a couple more if you guys wanna type in a couple additional questions, but Bob, thank you so much for another fantastic presentation. This is great as always, and I look forward to next month's presentation. As you mentioned, a very hot topic for a lot of people, the certification of data stewards. Anything else you wanna add before we end the session? No, I appreciate it. I think the next month's topic is gonna be great as well. Big data is a keyword that is used by a lot of organizations, and some organizations use it without having a firm definition of what that means. We're looking at how the volume of data is growing and the usage of that data for analytics, it certainly requires governance, which is really what this webinar was all about. Perfect, thanks so much, Bob, and thank you everyone for attending today and taking the time. Again, I will get that follow-up email out within two business days. Again, with links to the recording, the slides and additional information requested throughout the webinar. And I hope everyone has a great day. Thanks, Bob. Thanks, Shannon. Take care, everybody.