Welcome everyone. I'm Shannon Kempe, and I'm the Executive Editor of DataVersity. We would like to thank you for joining November's installment of the monthly DataVersity webinar series, Real-World Data Governance with Bob Seiner. Today we'll be discussing Governing Data, Big and Small, Come One, Come All. Just a couple of points to get us started: due to the large number of people that attend these sessions, you will be muted during the webinar. We will be collecting questions through the Q&A in the bottom right-hand corner of your screen, or if you would like to tweet, we encourage you to share highlights or questions via Twitter using the hashtag #RWDG, for Real-World Data Governance. And as always, we will send a follow-up email within two business days containing links to the slides, the recording of the session, and additional information requested throughout the webinar. Now let me introduce to you our speaker today, Bob Seiner. Bob is the President and Principal of KIK Consulting & Educational Services and the publisher of The Data Administration Newsletter, TDAN.com. Bob has been a recipient of the DAMA Professional Award for significant and demonstrable contributions to the data management industry. Bob specializes in non-invasive data governance, data stewardship, and metadata management solutions. And with that, I will give the floor to Bob to introduce the webinar. Hello and welcome. Thank you very much, Shannon, like always. Thank you, DataVersity, like always. And thank all of you for taking time out of your busy schedules to sit in on the monthly webinar series. As Shannon mentioned, the name of this installment of Real-World Data Governance is Governing Data, Big and Small, Come One, Come All. We're going to talk a little bit about big data, and we're going to talk a little bit about something that probably hasn't gotten a whole lot of press yet, but that might be getting some press soon. It's something called small data.
We're going to talk about how governing data is, or isn't, impacted by whether the data is big or small. We're going to touch on a couple of different things: we're going to see what some of the people in the industry are saying, and we're also going to share some insights and considerations on managing or governing your data, whether it's really big data governance or we're really just governing big data. So again, thank you very much for attending the session today. I'd also like to let you know about the upcoming installments of the Real-World Data Governance webinar series. In December we'll be talking about data governance expectations: how we set expectations effectively for management in our organization, and how we can focus on getting the business people to speak up and share with us where governance is going to add value, rather than always trying to tell them where governance will add value in their organization. I think that's going to be a really interesting installment of the webinar series. Then in January we're going to talk about a data governance framework for success and an operating model of roles and responsibilities. In February, we'll be talking about data governance policy: whether it's necessary and how it would be used within organizations. And in March, we've got a really interesting session called Agile Data Governance, and I'm going to have a special guest with me during that webinar. We're going to talk about Agile, we're going to talk about data governance, and how the two sync up and how we can learn from one to be successful in the other. With that being said, let's move into the abstract and, hopefully, the reason why you're here for the webinar today.
We all know that big data is a big subject these days, and big data is getting a whole lot of play in technology and data management. We've also been hearing the term big data governance, and sometimes I get asked whether there really is such a thing as big data governance. I've mentioned this in previous webinars: my thought is that there is big data governance, but it isn't really much different from governing other types of data in our organization. So I would say there is the governance of big data, but not necessarily a specific discipline called big data governance. We'll talk about that a little bit more as we move into the session. I get asked that question many times in different venues, and I would love to say that I'm going to answer it here once and for all, but the truth is people are going to continue to have that question, and hopefully at some point they'll be able to decide for themselves whether there is such a thing as big data governance specifically. So one of the things I'm going to try to do is challenge you with a couple of different considerations for governing data. Then at one point I'm going to sit back and, I don't know how successful it will be, turn it over to you and see if you have specific questions about big data that you need to govern differently from the other aspects of governing data that we're going to talk about in the session. So if I challenge you, if I try to make you think, hopefully we'll get a little bit of response from you. Start thinking about it now: what are some of the things about big data that need to be governed differently than any other type of data within our organization? We'll get to that in a couple of slides.
Again, so many organizations are considering big data solutions. We're going to talk about the aspects of big data that might require some type of special attention. Some of these management issues are the same and some of them may be different, so we're going to talk about the things that are similar and the things that are different. I welcome you to sit back and participate if you can; I'd love to hear your thoughts on governing data, whether it's big or small, and I'm glad to see a whole bunch of you on the call today. Now, about the term small data: you might think I made up that term, but in fact, things I've started to read indicate that small data is, at least at some point, going to be talked about in the data management industry. So what exactly is small data? Well, really, the question for this session is what exactly is big data, and we're going to talk about that. But small data, from my understanding, is when people take big data sets and break them down into smaller, usable sets, along with the quality that data is going to need, the metadata, and the accountability for that data within an organization. So I'm not necessarily saying that there's going to be something such as small data governance, but the governance of small data certainly is something that we need to consider. It's not a term that I made up, and hopefully at some point it'll get a little bit more press and people will start to talk about small data the same way they talk about big data. I'm not sure the technology is going to become as important, but small data may become the next big thing around data in the near future. Throughout this webinar I want to throw out a couple of different quotes that I've seen over the years about big data.
If you were attending this webinar series a year ago, you may recall that we did another webinar on big data governance, and I believe at some point in the 2014 schedule we've got another webinar scheduled about big data governance. So the question is how is the industry going to change between now and then? Are there going to be more technologies? Newer technologies? Is big data governance going to become something that takes off? Well, we'll have to check back a year from now. Just like in the session a year ago, I've got a whole bunch of things that have been written and said about big data and data governance that I'm going to share with you. Klint Finley from ReadWrite.com wrote back in 2011 that, in short, big data simply means data sets that are large enough to be difficult to work with. So exactly how big is big? Well, I guess beauty is in the eye of the beholder there. There's no true definition of the size a data set has to reach before it is considered big data rather than any other type of data in the organization, at least not that I've seen. Certainly when we're talking about terabytes and petabytes and exabytes and all the terms used to describe big data, data sets are getting bigger and bigger, and the technology used to manage big data is certainly blossoming at this point. There's a lot of information out there, including a lot on the DataVersity site, and there's a lot being written and talked about. So here's our agenda for today. We're going to talk about what big data is, and we're going to talk about specific considerations for governing big data.
We're going to talk about whether or not people think big data governance is truly a thing, or whether it's just the governance of big data. We're going to talk about governing that data, and then we're going to spend a little bit of time on small data, putting into context some of the things I've been reading in the industry and how small data might be one of the next big things we need to be looking out for. From my experience, and I've been working in data governance for many years, or at least it seems like many years, there are a whole lot of different types of governance out there that people are interested in. There's certainly data governance. In fact, this week in Fort Lauderdale, Florida, was the DataVersity and Debtech International Data Governance Winter Conference. It was attended by hundreds of people, and a lot of people come to the term data governance and then discuss the difference between data governance and information governance. There are some differences, but we hear both terms fairly often, and there are a lot of different reasons why organizations choose to call it information governance: one, because the term data governance hasn't resonated with them in the past, and also because information takes into consideration structured data, unstructured data, and things like that. Big data governance is a term that's out there, and that's something we want to discuss in the webinar today. I've heard a lot about BI governance, which tends to be the governance of the whole BI process, from defining the data to producing the data and then using the data; BI governance typically alludes to governing the process around building a BI environment. We've heard the term metadata governance, and again I'm not sure that it's a specific discipline in itself.
Perhaps it's just that we need to govern the metadata, and so we call it metadata governance. There's certainly project governance, corporate governance, and process governance, and one of my clients recently used the term customer data governance because they were very specific about the subset of data they wanted to manage, or have fall under the auspices of their data governance program. So there are a lot of different types of governance. Certainly there's something called big data governance, but we're going to debate that a little bit and talk about what it even means to call it big data governance versus just the governance of big data. As I typically do when getting started, for those of you who have not been on one of my webinars before, I want to share the definitions I use for data governance and data stewardship and talk about them briefly. There's nothing in the definition that points to big data or small data or metadata or customer data. It just says that data governance is the execution and enforcement of authority over the management of data and data-related resources. I put that kind of teeth behind the definition, and it truly does make some organizations cringe and say it's worded too strongly, because at the end of the day this is what we're typically trying to achieve: the execution and enforcement of authority over the management of data. I've seen other definitions of data governance, like the harmonization or orchestration of people, process, and data, and they all sound very good and very soft. I like to have a definition that has a little bit of teeth behind it, and that's why I use this one. My definition of data stewardship is a little bit less aggressive, or less difficult to understand, should I say.
I say data stewardship is the formalization of accountability over the management of data. Again, we're not saying over the management of big data or big data resources. I don't believe we really need a separate definition for big data governance versus data governance; all we would say is that we're going to execute and enforce authority over the management of big data and big data resources. So use definitions that make sense to you, and if you have a definition you'd like to put into the chat box here on the webinar and share, I'm sure that would be very interesting: definitions of data governance, and whether a definition of big data governance is being considered within your organization. You also have heard me talk about non-invasive data governance, which is what I practice when I work with an organization. The fact is that we already govern data to a certain extent within our organizations. Sometimes it's very informal, inefficient, and ineffective, and all we really need to do is put some formality around it, which is a lot different than a command-and-control approach. The definition of non-invasive data governance that I use is the practice of applying formal accountability through non-invasive roles and responsibilities to existing processes, to assure things like high-quality data production and usage, regulatory compliance, security, and so on.
One thing I noticed when going through this slide deck is that I'm not sharing many of the templates I've shared in previous webinars, but if you're interested in those templates, please let us know and we'll be glad to send them to you as a way of taking the non-invasive approach to data governance into consideration. Another slide describes how governance is applied: it's always meant to be transparent, supportive, and collaborative, whether that's big data or medium data (a term I really haven't heard yet, but who knows, somebody out there might come up with it soon) or small data, which, again, is the topic that seems to be getting a little bit of air, or a little bit of press, right now, and we're going to talk about it in a couple of minutes. Whether data is on the desktop, the server, or the cloud, we need to be able to govern data within our organizations, and we want to do it in the most practical way possible. That way is to take the non-invasive approach, identifying and recognizing people as stewards rather than assigning them to be data stewards. If you're interested, take a look at the rest of the Real-World Data Governance webinar series on the DataVersity site; there are a whole lot of past webinars that address the non-invasive approach in great detail. But we're not going to talk about that today. We're going to talk about big data and small data, and the differences or similarities between them. What I want to do now is share with you a list of the categories of big data, according to Krish Krishnan, who is a pretty well-known writer in different places like the BeyeNETWORK and Information Management. He talks about categories of big data: unstructured data as big data, or semi-structured data.
He defines semi-structured data as earnings reports, spreadsheets, software modules, and those types of things, and structured data as data that comes in bits and bytes, in rows and columns in databases, machine data, and mathematical model outputs. But the truth is that if you look at these, you'd say these are categories for data in general. We've got unstructured data, data in documents, and we've got document management, content management, records management, and knowledge management that cover all sorts of different types of data. These aren't necessarily categories of big data; they're categories of data. Krish also talked about common characteristics of big data. If you've done reading on big data in the past, you've seen people talk about the three V's. I only have two of them listed here, volume and variety; the third is velocity, and we'll talk about that in a minute. In the article I've alluded to, he talks about some common characteristics of big data: the volume of data, or the size of the databases; the variety of the data and the different formats it comes in; and the ambiguity and the quality of the data. But I would say we can cross out the word big and use volume, variety, ambiguity, and quality as a way of looking at any type of data we have. Back in the '80s and '90s, when I was getting started in the industry, we always looked at how large the database was going to be. So is volume something new? When we're talking about the massive amounts of data we associate with big data, yes, volume becomes a big issue. But volume has been an issue since databases were built and since people started to manage and size those databases. There are a lot of different varieties of data, and there's a lot of ambiguity in data across an organization.
So these aren't necessarily just common characteristics of big data; they're common characteristics of any type of data we work with in our organizations. In addition, big data is a term applied to datasets whose size is beyond the ability of commonly used software tools to manage them, so we're talking about very large datasets here. What's interesting is that there was a big data quick poll taken a few years ago, and I wanted to share the results with you. When people were asked what big data is, 51 percent, a little bit more than half, said it's a legitimate problem stemming from the growth of unstructured data in our organizations. Well, the truth is big data does not just allude to unstructured data; it alludes to very high-volume datasets as well. Some of the other results of this survey I found quite interesting: people thought it was a new catchphrase for a data management challenge. Like I just said, people have been concerned about the size of the database since the beginning of databases. A lot of people, even back when this survey was taken, whether that was two years ago or within the last two years, were concerned that big data is just another way to say data warehouse, or another way to say Hadoop, or that it's a meaningless marketing catchphrase. Well, the truth is big data is real, big data needs to be governed, and we need the metadata and the accountability for big data just as we need them for any type of data in our organization. Going back to 2001, a gentleman who is also a friend of mine, Doug Laney, described the three V's associated with big data. He talked about volume, velocity, and variety, and if you've done reading on big data, you've probably come across these three V's a multitude of times.
Just think about it again: volume is the amount of data; velocity is the speed at which data comes in and the speed at which it needs to go out; and variety is the different types of data we need to govern. If we're talking about unstructured data, it's certainly going to take a specific set of tools to work with it, versus high-volume relational data if the data is stored that way. Big data has emerged because of what's happening in our universe, basically. The amount of data we have continues to grow: there are 4.6 billion mobile phone subscribers in the world and between 1 billion and 2 billion people accessing the internet. All of that is truly the exchange of data back and forth, so the volume of data moving at any given time continues to grow. Between 1990 and 2005, more than a billion people entered the middle class, and whether through increases in the amount of money they had or through the reduction in the price of technology, there are more and more people calling for more and more data at all times. So the amount of data, and what counts as big data, just continues to grow. In fact, the world's effective capacity to exchange information was 281 petabytes in 1986, 471 petabytes in 1993, 2.2 exabytes in 2000, and 65 exabytes in 2007, and predictions put it at 667 exabytes annually by 2013. That's a size I can't even fathom. So, just by a show of hands, how many people know what an exabyte is? I didn't know what an exabyte was until I looked it up, and this is what an exabyte is: a one with 18 zeros after it. It's 10 to the 18th power bytes, or 1,000 petabytes, or a billion gigabytes. So when we're talking about big data, we're no longer talking in terms that most people are used to.
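To make the unit arithmetic above concrete, here is a small Python sketch. It assumes decimal (SI) units, where each step up is a factor of 1,000, which matches the talk's "1,000 petabytes" and "a billion gigabytes"; binary units (powers of 1,024) would give slightly different numbers. The `UNITS` table and `to_bytes` helper are hypothetical names introduced only for this illustration.

```python
# Decimal (SI) storage units, as used in the talk: each step is a factor of 1,000.
UNITS = {"GB": 10**9, "TB": 10**12, "PB": 10**15, "EB": 10**18}


def to_bytes(amount: float, unit: str) -> float:
    """Convert an amount in the given SI unit to bytes."""
    return amount * UNITS[unit]


exabyte = to_bytes(1, "EB")
print(exabyte == 10**18)       # True: a one followed by 18 zeros
print(exabyte / UNITS["PB"])   # 1000.0 petabytes
print(exabyte / UNITS["GB"])   # 1000000000.0 gigabytes

# Growth in the world's effective capacity to exchange information, using the
# figures quoted in the talk: 281 PB (earliest figure) versus 65 EB in 2007.
growth = to_bytes(65, "EB") / to_bytes(281, "PB")
print(f"{growth:.0f}x")        # 231x
```

The last comparison shows that the capacity figures quoted grew roughly 231-fold between the earliest data point and 2007.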
We're using the terms petabytes and exabytes, and who knows what we'll call it after that. I'm sure somebody already knows what comes after that; we just haven't had to fathom data that large. So what are people saying about big data on the net? Big data is a catchphrase that's been bubbling up for a long time. Big data is hard to do. It's very expensive and time-consuming. I just wanted to share with you a couple of thoughts from other people out there in the industry about big data. None of this really answers the question of governing big data, or how it compares to governing the data we already have, but we're going to get to that in one second. Before I do, I wanted to share one more slide from a good friend, Karen Lopez, the DataChick. She has a webinar series on DataVersity as well, and if you've ever talked to Karen, you know she can be quite informed and quite amazing. She started a rant on big data, and when people ask her what big data is, she says she's here to tell you that nobody really knows. Big data is just that: it could be anything you want it to be. I'm sure there are definitions of what would be considered big data, but I think most often it's in the eye of the beholder. One thing she noticed about big data is that people capitalize it; they capitalize the word Data. I'm wondering, when small data comes around, are we going to spell small data only in small letters, because we spell Big Data in big letters? She was saying the next things we'll have are huge data and ginormous data by next year's Enterprise Data World, DataVersity's event in Austin. The truth is there may be more talk about small data than there is about huge data or ginormous data or big data. We're talking about governing data in general.
We need to define what we really mean by governing something. A lot of organizations are talking about data governance, but they haven't necessarily taken a step back and talked with people in their organization about what it truly means to govern data. So, as I've done in the past, I'll share what The Free Dictionary tells us governing is. The definition they give is to make and administer public policy, to exercise sovereign authority, and to control the actions or behavior of others. All I did was add data to the definition of govern: to govern data means to exercise sovereign authority over data, and the definition I used at the beginning of the webinar talked about the execution and enforcement of authority over the management of data. So it does tie back to the definition of govern. But as for controlling the actions or behavior of data, a good friend of mine, Len Silverston, once said that we're not really governing the data itself; we're governing people's behavior associated with that data. So when it comes to big data or small data, what we want to do, and I wouldn't really say control, is influence the actions and behaviors of the people in the organization. We can't really control the actions or behavior of the data, but we can control the actions and behavior of the people associated with the data within our organizations. That certainly holds true for any big data set we would use to perform analytics, or anything else we do with our big data. So when we talk about governance, we're talking about formalizing accountability. I mentioned the Bill of Rights here; it's from an article I published recently. The Bill of Rights is getting the right people involved at the right time, using the right data the right way, to make the right decision that leads to results. That's what I mean by the Bill of Rights.
Assuring compliance, following the rules, understanding the data, reducing risk: that is what governance means, and unless somebody can inform me differently, I believe that's what it means around big data as well. We need to formalize accountability for big data. We need to get the right people involved at the right time when we are working with big data. We need to assure that big data is compliant, that it's not getting into the hands of people who shouldn't see that data, that people are following the rules, and that there's improved understanding of the data. That does not mean command and control, slowing down, or adding bureaucracy. In some organizations it may mean those things, but the approach can be more practical, something that takes advantage of the things we already have in our organizations. Governance does not have to mean command and control, slowing down, and adding bureaucracy. So, considerations for governing big data. First let's talk about considerations for governing any type of data within our organization, and then please share with me whether, and how, this is different for big data than it is for small data, customer data, product data, or any other kind of data. To me it's pretty much the same, and I'd love to hear from you if you think it's different. If I don't hear from anybody, it kind of confirms that maybe there aren't a whole lot of things in big data that need special attention, but we will talk about some of them in a couple of minutes.
Some of the considerations for governing any type of data are the definition, production, and usage of the data. I always talk about governance in terms of definition, production, and usage, along with formalizing accountability, governing processes, and risk management. We've got risk management with any data we have in the organization: compliance and regulatory control; data classification, meaning whether data is highly sensitive, sensitive, or highly confidential; and data security. We need to secure data whether it's big or small. The issues around securing the data may be different because of its size, but then again maybe they're not. So governing data of any size requires at least these governance basics and these risk management aspects. On top of this there's issue resolution, certification of the data, and quality control, governing that data in a proactive or a reactive way. The bottom line is that you really need to consider these things for governing any type of data, whether it's big data, small data, medium-sized data, or just any data in any database in your organization. Those responsibilities are similar.
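As a hypothetical illustration of the point that these basics do not depend on size, the sketch below models a dataset's classification and derives the governance controls it needs. The classification levels, control names, dataset names, and byte counts are all assumptions made up for this example, not a standard or any particular organization's policy; the one deliberate design choice is that `required_controls` never looks at `size_bytes`.

```python
from dataclasses import dataclass
from enum import IntEnum


class Classification(IntEnum):
    """Hypothetical classification levels; a real program defines its own."""
    PUBLIC = 0
    SENSITIVE = 1
    HIGHLY_SENSITIVE = 2


@dataclass
class Dataset:
    name: str
    size_bytes: int
    classification: Classification


# Governance basics that apply to every dataset, regardless of size.
BASELINE_CONTROLS = [
    "documented definition",
    "accountable steward",
    "quality control",
    "issue resolution path",
]


def required_controls(ds: Dataset) -> list[str]:
    """Return the governance controls a dataset needs.

    size_bytes is deliberately never consulted: the controls depend on how
    sensitive the data is, not on how big it is.
    """
    controls = list(BASELINE_CONTROLS)
    if ds.classification >= Classification.SENSITIVE:
        controls.append("access limited to authorized people")
    if ds.classification == Classification.HIGHLY_SENSITIVE:
        controls.append("regulatory compliance review")
    return controls


# Hypothetical datasets: one petabyte-scale, one megabyte-scale, same sensitivity.
big_event_log = Dataset("big_event_log", 5 * 10**15, Classification.SENSITIVE)
small_extract = Dataset("small_extract", 2 * 10**6, Classification.SENSITIVE)
print(required_controls(big_event_log) == required_controls(small_extract))  # True
```

Here the petabyte-scale log and the megabyte-scale extract get identical controls; only the sensitivity of the data changes what is required.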
We've got people at the executive and strategic levels who have an interest or concern in the data, and at the tactical, operational, and support levels as well, so we can't necessarily say that you need to redesign an existing governance program to govern data across your organization, including your big data. We've done some webinars on an operating model of roles and responsibilities, and one coming up early in the first quarter of next year is a framework for data governance that includes all of these layers; in that webinar we'll look at the differences for the different types of data in the organization. But for right now, I would say you need to at least consider having executive and strategic level people and tactical, operational, and support people involved when it comes to the data in your organization and the governance of that data. Then there's communication and planning. When we communicate about data governance, whether it's big data or small data, we need to recognize who the audience is that we're working with. Is it the executive and strategic level? We're certainly going to communicate with them differently than with other people in the organization. What are the messages we need to communicate, and what are the tools we're going to use to deliver that communication? Certainly with information governance: information governance often means different things to different organizations, but in most cases I've seen, information governance alludes to the fact that there's process governance, technology governance, policy governance, and in some organizations stewardship. All of these things are important for information governance, for data governance, and for both structured and unstructured data governance. So there are a lot of considerations for governing any type of data in the organization. Is it important to call it big data governance and focus on just the big data in your organization? I'm not here to tell
you that you should do that. I want you to consider that there's governance of all different types of data in your organization, not necessarily something different for big data versus any other type of data in your organization. So what are some of the similarities between governing big data and governing other types of data? Well, there's the governance of the definition of the data. By the way, the pictures you see here don't come from me; they came from my very artistic teenage daughter, so we can thank her for those. We've got to govern the definition of the data, the production of the data, and the usage of the data. When we talk about big data, we probably have to be more concerned about the volume of the data, the size of the data, and how quickly it comes in and out, but I'm not sure that's vastly different for big data sets than it is for other types. Then there's the variety of data. Whether it's big or small, we also need to govern the unstructured data in our organizations. Most organizations start with structured data and then get into unstructured data, but a lot of organizations I've had contact with and worked with have viewed unstructured data as something very important for them to govern, and sometimes the volume of that unstructured data is so large that some organizations would consider it big data as well. In addition, the quote from early in this slide deck talks about the different types of data, including unstructured data, and how unstructured data plays such a big part in big data in organizations. Another similarity between governing big data and other data is the governance of the definition of the data: governing data redundancy, the definition of the metadata, or the metadata about that data, and how we model data. These are similarities between big data and other types of data in the organization as it relates
to the definition of the data. I'm not seeing a slew of people saying that the governance of the definition of big data is different from the governance of any other type of data. If we're not seeing that, maybe people believe that governing data is the same whether it's big or not, or maybe not the same, but very similar, whether it's big, small, or medium-sized data. Then there are the similarities in the production of the data. Now, this may be a little bit different with big data. An organization I worked with recently, an oil and gas company, had wells offshore, and those wells and platforms out in the Gulf of Mexico had thousands of sensors on them pumping data back to the organization 24 hours a day, 7 days a week. So that's very, very high-volume data, and that's very different from how data is produced in other places, data that may not be considered big data. So we've got to worry about the origination: where the data comes from, the creation of that data, and who's responsible for it. Again, that's very similar to the things that we need to govern when we're running any type or any size of data within our organization. Usage as well: there's regulatory compliance, there's privacy and security, and there's the distribution of the data. Again, we need to govern the usage of the data whether that data is big or small, and if it's different for big data, then please, somebody indicate that for us all, so we see that there is such a thing as big data governance out there rather than just the governance of big data. So again, for governing the volume, velocity, and variety, the same thing holds true. Is it really so different from what we do around the governance of other types of data in our organization that we need to call it out as something specifically different? We need to be concerned with the volume, and we need to be concerned with the velocity and the variety, but the same thing holds true for any type of data that we have in our organization, not just big data.
Maybe it's a bigger issue, or maybe it's just that we need to apply different types of technology to it. Certainly there's such a thing as technology governance out there, but as far as the governance of the data itself goes, there aren't a whole lot of differences between the governance of big data and the governance of data of any other size within our organization. So what are some of the differences, if there are any, between governing big data and other data? Well, as I alluded to earlier, we need different types of technology: the use of technology to manage that data, to store that data, and to make that data available. So here's the point where I'm going to ask you, if you're still out there: what do you see as some of the differences between governing big data and other data? Please share some of that with us if you're interested, or maybe you feel the same way that I do, that there are real similarities between how we govern big data and how we govern other types of data within our organization. So please, if you get a chance, send in some of this information, and as it says, feel free to tell us. I would like to get some interaction from you folks and see if you are, in fact, awake out there. Let's talk about the governance of big data again for a second, talking about the volume, the variety, and the tool usage, including some things that I might not be as well versed in, like NoSQL, but certainly the simplicity of use, the complexity, and structured versus unstructured data. All of these are things that we need to consider when we're governing data of any type in our organization. We need to be concerned about the metadata that sits behind the data, and that's going to hold true for the people who are using it. I'm seeing some people send in some responses, so this is good. But the idea, again, is that governing big data may have specific issues, but it's also a lot like the
governance of any other type of data within our organization. So let's talk a little bit about governance and stewardship in big data. In a recent report from TDWI, data stewardship came up quite often, and the report talked about people managing big data efforts having the idea of adding big data initiatives to their existing data governance ecosystem. To me that was somewhat humorous, because I was thinking that it would probably be the other way around. Actually, I would think that if you have an existing governance ecosystem, you just have to identify the differences and the similarities between governing the big data and governing any other data that's already governed by that data governance ecosystem. And among the problems in big data management, a lack of stewardship or governance was cited by a third of the people, second only to inadequate skills or staffing for big data initiatives. So a lot of organizations are saying that stewardship and governance are very important when it comes to big data. Do we need to call that big data governance? Again, that's up to you. I'm seeing a bunch of responses in the chat area, which I'd love to get to when I get to the end of the slide deck here, but thank you for participating in the webinar that way; it always makes it interesting. So we talked about big data, we talked about regular data, and we talked about any differences between governing the different types of data. I want to talk about small data here for a second. The first question that I have is: is there really something called small data out there, or will there come to be something called small data? Dan Lupin has mentioned that Big Data seems to be capitalized. Well, if we refer to it as small data, I guess we should write it in lowercase, because that gives people the idea that it's small. But what is small data? I'm sharing with you, again, something that I read, and I
try to read a lot about what's going on in the industry. Somebody said here that it seems inevitable, and some of the experts have really claimed that small data analysis is the next big buzzword. So we're not going to go from big, huge data to gigantic data; we may actually start looking at what small data is. While big data is getting all the headlines, it says, small data is the next big thing, and its value comes from culling (I love the word culling) vast structured data sets for business insights. So we're talking about culling data: taking larger sets of data and making them smaller and more manageable. When we have those small, manageable data sets, and we have folks focusing on using that data to do analytics, to make core decisions, or to identify that data in terms of its key performance indicators, those smaller sets of data are what we're more concerned with. And when smaller, heavily culled sets of data are used for business insight, it often becomes very important that people understand that data: where did it come from, how did it get here, and how is it different from other data that we have in the organization? Small data is about the use of small data sets in organizations, and at some point, when big data becomes just another thing in organizations, perhaps the focus will turn to small data. Big data may be too big for all but the largest enterprises, which have the time and expertise to build a big data platform. Maybe it's too large; maybe we can't manage the vast amount of data that we have, and we need to break it down into small data sets. So it's just something that I would want you to consider. Again, one of the goals of this webinar series is to make you think about some of the things that we're talking about here, including whether or not you think small data is going to be very important in the near future.
If individuals own small data, would markets start to change from the world of big data profiles and marketing? Again, this came from eWeek.com, which said, "I'm not sure if the business world is ready for that yet, but if we own our information and decide..." There's that word own again. We use the term ownership, and a lot of organizations use the term ownership when it comes to responsibility for data in the organization. I know that in the past I've mentioned that I try to stay away from the term own, but when it comes to small data sets, perhaps that word would be reintroduced into the vocabulary of data governance programs: people own small data sets, and they have the responsibility for making sure that that data is governed very well so that it can be used to make important decisions within the organization. So there's data, and there's small data. One last summary of data governance and small data: if we define small data as data that is limited in purpose, high quality, documented, easy to access, and easy to use to improve organizational analytics, then at some point there will need to be governance around that small data. I don't think too many people out there would argue with the idea that, in order to provide this small data, there will need to be a level of small governance associated with that data as well. So if small data is something real, there will be small data governance at some point in time. We've got data governance now; in the future we'll look at the potential for small data governance. Some questions are: what impact does big data have on data governance efforts, and vice versa? I think it's more about what impact data governance has on big data, and whether there's anything that we need to do differently from the other governing we do in our organization. And to be honest with you, I haven't seen too many people come forward and say that big data
needs its own type of governance. So what aspects of governance need to be altered, and what governance roles and processes must be directed towards big data? We've kind of summarized here that the governance of big data, the governance of small data, and the governance of the in-between data are somewhat similar. So what's the relationship between the governance of big data and the existing aspects of our data and information governance that we can directly apply, from the definition to the production to the use of that data? My suggestion, as always, would be to stay non-invasive in your approach to governing big data, and there's a lot of information out there about non-invasive data governance. There may be some additional technical complexities, maybe some additional product technologies and data complexities; please feel free to share those with us by responding to the email that Shannon sends. If you've got some insight as to how big data governance or small data governance may be different from any other type of governance, I'd love to start a conversation with you, and there are a lot of places where that can be addressed. So we talked about what big data is, we talked about considerations for governing big data and big data governance, and then we talked about small data. Let's take some questions, if there are some out there; I believe that there might be. And again, just to let you know about the monthly webinar series: in December we're talking about governance expectations, and in January, February, and March we're talking about a framework of success, data governance policy, and then Agile Data Governance, which I think is going to be a really interesting webinar in itself. So with that, I'd like to say thank you very much and turn it back to Shannon to see if there are any questions we'd like to address. Thank you for that great presentation, as always. One of the most common questions that we get, of course, is whether people will get a copy of the slides and the recording. As you mentioned, those will go out in the follow-up
email within two business days, so by the end of Monday for this webinar. If you don't have it in your inbox by Tuesday morning, let me know and I'll be sure to get you a copy: Shannon at dataversity.net. So quiet today! I think everyone's just winding down for the holidays, Bob, at least those in the U.S. There we go, we've got a question: would it work to have big data defined as data over which we have no controls and which we cannot validate? Well, that's not the right thing, is it, to have data that we have no control over, especially if it's data that is really important to the organization? I don't know; I've never heard of the term useful data, but again, if it makes sense in your organization to define data that way, then feel free to do that. I mean, there's nothing to stop you, but useful data is certainly something that you would want to have control over. If you cannot control your data and you cannot validate your data, it is my hope or expectation that you're not using that data to make valid business decisions. Again, ideally all data is useful; at least you would hope that most of the data that you collect would be considered useful data for your organization. ... or for governing the data, or something like that, tomorrow?
Well, there are a lot of tools out there, and that's a really great question. There are a lot of vendors and software companies with great tools that can be used to help capture metadata about the data and help you put some structure around the governance in your organization. But I'm not necessarily thinking that we need different types of governance or specific types of tools. Oftentimes it's easy to develop tools internally within an organization, where you can create a common data matrix, which I've used in many of the other webinars that I've given, or governance activity matrices. There are a lot of tools out there that will add value, but recognize that with the tools comes a learning curve, and it takes governance of the tools themselves to make them successful, with the tools being enablers of the governance programs and metadata programs, but not the programs themselves. Are there any vendors or conferences that you'd like to recommend? I of course love that question. There are a lot of great conferences. There was a great conference this week, a DataVersity conference and a DevTech conference, and there's a conference in Austin, Texas, in late April and early May, which is Enterprise Data World. There are a lot of events out there, so I would suggest keeping your eye on dataversity.net and seeing what events they're talking about, because most of that type of information is available there. All of our conferences are there as well, and of course Enterprise Data World covers the full spectrum, and then we have several more specific conferences. There are questions coming into the chat area, and I don't know if people know this, but I know you've mentioned that people should put their questions in the Q&A area. So instead of putting questions into the chat, it would be easier on Shannon if you put them in the Q&A area, if you can. Great, thank you.
Next question: is MDM data governance a subset of enterprise data governance? Is there an MDM data governance? I'll say again, I'm not sure that there is such a thing as MDM data governance. When it comes to the governance of master data, we see the terms master data and data governance joined at the hip in a lot of things that we read and presentations that we see. The data that is under the auspices of the MDM program certainly needs to have governance. We need to make certain that we're talking about what the true master data looks like for the organization, what metadata is associated with that data, and the processes that we use to bring the master data together. So there are a lot of things in the master data discipline that require data governance. Should you consider it to be a subset of your data governance initiative? Certainly it could be a subset, but again, it's not its own thing; the governance of master data would be the same as the governance of big data, I would think. Next question: what can we get from lots of data that we can't get from a sample of that data? Can I have the question one more time, please? So, what can we get from a lot of data that we can't get from a sample, a statistical sample, of that data? That's a great question. I'm not necessarily an expert on big data, but I would be glad to try to answer it.
Compared to a sampling, whether it's customer behavior or industry behavior or whatever it is, the more data that we have, the better we can predict what the future is going to be and which activities are going to provoke other activities to take place. That's the value of the high volume of data. The Kmarts and the Walmarts and the oil and gas companies that are big adopters of big data technology are trying to monitor large volumes of information to predict and know when something is happening that shouldn't be happening, or when something is happening that needs to be addressed. Those are the types of things you can get out of large volumes of data as compared to smaller volumes of data. Here's a comment: "One thing about big data is that even though regular data governance tends to build quality or consistency, some big data information may contain intrinsic value, and we may not need to build substantial data quality and consistency in order to..." and it got cut off there. There's some truth in that. When we're looking at higher volumes of data, maybe we don't need to be as concerned with each individual data element and the value of each piece of data that we're looking at, but we do need to worry about the volume and about the quality of the data. We need to think about where the data came from and how it's defined, because a whole lot of data that's defined very poorly is just a bigger problem than a small amount of data that's defined and produced very poorly. So there's some truth to what he said, but there's certainly a relationship there. Next question: does derived data need to be governed?
Certainly derived data needs to be governed, when you think about the fact that a derivation of a piece of data means that we have some type of formula or equation that we use to derive that data from other data. There needs to be a level of agreement as to what that derivation is going to look like: how it's going to be calculated, how it's going to be rolled up, or how it's going to be broken down. And certainly the quality of that information and the quality of the metadata about that derived data become very important as well. So I would say that yes, derived data needs to be governed the same as, if not more than, other types of data in the organization. Next: should we consider that the governance of big data is better addressed at the top of strategic plans? I don't think so; at least, that's my opinion, and again, I'd love to hear from people if they feel differently. I would say that we don't necessarily need to put big data at the top of the organization's plan just because it's big. If it is a business initiative for an organization to analyze large volumes of data coming at it very quickly, then yes, it makes sense to talk about data governance in terms of big data, but other than that, I can't see a reason to put it at the top of the data strategy for an organization. Those are all the questions that we have today. Thank you very much. Thank you very much, Shannon. I'd like to thank everybody again for attending the webinar. I wish everybody a happy Thanksgiving, and happy holidays if you're outside of the U.S. I'm very thankful to be given the opportunity to do this webinar and for everyone attending it, so I just wanted to put that out there as well, Shannon. Thank you so much for this, and thanks, everyone, for your participation. I just love that the chat's always going and we get lots of questions throughout these.
Again, I will send a follow-up email within two business days with links to the slides and links to the recording, and Bob always writes out the answers to your questions as well, so you'll have that along with the additional information requested throughout. If you don't have that in your inbox by Tuesday morning, you can email me at Shannon at dataversity.net and we will get that information to you. Everyone, have a great day, thanks so much for attending, and Bob, thanks again for another great presentation. Thank you.