My name is Shannon Kempe and I'm the Executive Editor at DATAVERSITY. We would like to thank you for joining today's DATAVERSITY webinar, Demystifying Big Data. This is the May edition of a monthly series called Data-Ed Online with Dr. Peter Aiken, brought to you in partnership with Data Blueprint. Let me give the floor to Megan Jacobs, the webinar organizer from Data Blueprint, to introduce our speaker for today's webinar. Megan.

Hello, everyone, and welcome. My name is Megan Jacobs and I'm the webinar coordinator here at Data Blueprint. We are so excited that you all found the time to join us for today's webinar on Demystifying Big Data. As always, a big thank you goes out to Shannon Kempe and DATAVERSITY for hosting us. We will start in just a few minutes, after I introduce your speaker and let you know about some key logistics. We are planning one hour for the presentation, followed by a 30-minute Q&A session. We try to answer as many questions as time allows at the end, but feel free to submit them as they come up throughout the session; we will also answer the top two most commonly asked questions. You will receive an email with links to download today's materials and any other information you request during the session within the next two business days. You can find us on Twitter and Facebook. We have set up the hashtag #DataEd on Twitter, so feel free to use it in your tweets and submit your questions and comments that way. We'll keep an eye on the Twitter feed, and we will include answers to those questions in our Q&A toward the end as well.

Now let me introduce you to our speaker. Dr. Aiken is an internationally recognized thought leader in the data management field. Many of you already know him and have seen him at conferences nationally and worldwide. He has 30 years of experience and has received many awards for his outstanding contributions. He is also the founding director of Data Blueprint. He has written seven books and dozens of articles, has examined more than 500 data management practices in 20 countries, and is consistently named one of the top 10 data management experts in the world. Peter has spent multi-year immersions with organizations as diverse as the U.S. Department of Defense, Deutsche Bank, Nokia, Wells Fargo, the Commonwealth of Virginia, and Walmart. He is in demand at conferences and workshops and travels to numerous speaking engagements and projects. Recently, Peter was at Enterprise Data World in San Diego. Peter, where are you today?

Thanks, Megan. Shannon and I spent much of that conference in very intense discussions with people who have an interest in this topic. At the San Diego conference, attendance was actually up quite a bit, so we were very pleased with all the interest and the additional attention in this space, and I'm looking forward to the feedback from everybody. What we're going to talk about today is this topic of big data. For those of you who didn't catch the slide as it went by: one of the biggest challenges that we have with big data is that because we have more data, we have more wheat, essentially, to sift through in order to find the kernels, or the needles, underneath it. So as we do this, we want to talk a little bit about the origins of data analysis, and about the challenges that are faced by virtually everybody in this space.
And the key to this is that you need to look at ways of complementing your existing data management practices. This is the first takeaway from this session: when we look around, we see that many organizations are starting out by trying to stand up separate organizations for big data. But these techniques are entirely complementary with the things that you already do, and if you stand up a standalone organization, you may find that it actually hurts your organization in terms of the impact these techniques can have. So look at this as a complementary thing; think of it as ketchup on your french fries, if you will. We'll talk a little bit about some prerequisites that are necessary to exploit big data techniques. And we'll talk about prototyping, which is really the proper way of practicing big data techniques. And you do need to practice them; it's just like any other skill set that you're going to bring into your organization and work with. At the top of the hour I'll leave you with some takeaways, and I look forward to a good rousing discussion. I know that several of you have sent me questions even in advance of this session, and I look forward to responding to those as we go forward. So let's get started.

Now, while the Black Plague is not a fun thing to talk about, it is interesting to look at the origins of data analysis, and they really do start with the plague. In fact, there's a particular book compiled by Captain John Graunt, who was a member of the Royal Society, the society of scientists they had in London. When people looked at the plague and what was happening, they were trying to figure out what was going on. There's a terrific book about the period by Barbara Tuchman called A Distant Mirror; people genuinely thought the world was going to end in those days. But some people asked, well, how bad is it? If you simply say it's really, really bad, that's not actionable. It's not objective. It's not something you can do anything with. So Captain Graunt sat down and put together his book of mortality statistics, which, as you can see from this slide, covered categories like plague deaths, stillborn deaths, and spotted fever. I know it's a great way to start a seminar, talking about death and the like, but there it is. Notice he also recorded the number of males and females, the number buried, and all sorts of other statistics.

Now, the point of this was that once he had these numbers assembled, he could start to look and ask: what's really going on? What is in fact happening? Because if you just say it's all bad and things are terrible, that's not helpful. But if you say that nine people died from "stopping of the stomach," which is one of the categories in this book, people are going to ask, okay, so what else was going on? When you have these kinds of answers, you can start to map them out and ask: where is it happening? There's a map they put together that was the first example of mortality geocoding. You can see the denser areas had over seven deaths and the shaded areas had under three, and you might ask what's going on in those areas, because it's clearly different there than in other places. The next question is: when is it happening? And it turned out they were able to determine quite accurately when the peak of the plague occurred: about the week running from September 12th to September 19th.
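To make Graunt's tabulation concrete in modern terms, here is a minimal sketch in Python of counting burials by week and by parish to answer the "when" and "where" questions. Every record in it is invented purely for illustration; these are not Graunt's actual figures.

    # Graunt-style tabulation: group hypothetical burial records by week
    # ("when is it happening?") and by parish ("where is it happening?").
    from collections import Counter

    # (week_ending, parish, burials): fabricated records for illustration
    records = [
        ("1665-09-05", "St. Giles", 310),
        ("1665-09-12", "St. Giles", 420),
        ("1665-09-19", "St. Giles", 390),
        ("1665-09-12", "Whitechapel", 280),
        ("1665-09-19", "Whitechapel", 300),
    ]

    by_week = Counter()    # when
    by_parish = Counter()  # where
    for week, parish, burials in records:
        by_week[week] += burials
        by_parish[parish] += burials

    peak_week, peak_burials = max(by_week.items(), key=lambda kv: kv[1])
    print(f"Peak week: {peak_week} with {peak_burials} burials")
    print("Burials by parish:", dict(by_parish))

The same grouping-and-counting move, scaled up enormously, is essentially what the analytical techniques discussed in the rest of this session still do.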
In that week in September, the number of total burials peaked, and after that they were on the downside. So the first question was: where is it happening? The next question was: when is it happening? And they eventually figured out, using this data as well, why it was happening; in other words, the answer was that they had those nasty rats. And once you have answered those questions, you can ask the next one: what will happen next? That, of course, is what businesses are looking to do, to predict what is going to happen next.

Now, the plague was in 1665 or so, and a number of years surrounding that. The next milestone was cholera, the cholera epidemic in London. In 1854, John Snow took these same data analytical techniques and mapped the deaths, and if you remember anything about your history, the cholera epidemic turned out to have to do with well pollution. So people simply stopped drinking water from certain wells, and they ended up not dying, which is a happy outcome. We look at these origins of data analysis because this is where the field has been. Now let me ask: where are we going?

The challenges that we're looking to address are really common, and you see a lot of these stories around. The Economist did some terrific reporting on this a couple of years ago. I'll cite a couple of other instances a little later on, but we're starting to talk about things like yottabytes, and while they say that's almost too big to imagine, that's only temporary. So when we ask, what do we mean by big data, the real question is what we mean by "big." If you were an executive at AT&T the year before they introduced the iPhone, and somebody told you that your volume of mobile data was going to increase 8,000 percent over a four-year period, you'd be asking: okay, what do I need to do to deal with that? From AT&T's perspective, they want to service their customers and create a valuable business proposition. There are a couple of other statistics that go along with this as well. Eric Schmidt has been widely quoted as saying that every two days we now collect as much information as we did in the entire time from the beginning of history up to 2003. That's an astounding statistic. When you look at smartphone shipments worldwide, and in China in particular, you start to see that most of the mobile growth is happening in China, and we're going to see some very interesting things happening there as well. In addition, the number of things that produce data keeps going up. If you were watching the revolving slides prior to this session, one of the quotes that's very interesting is that we will soon have self-powered sensors no larger than a grain of sand, which means you can put them everywhere. They can become literally ubiquitous, and when they become ubiquitous, we have an enormous number of things we can use to gather data. Just looking at IP traffic and seeing that it's projected to quadruple by 2015 is an absolutely astounding statistic. And another good measure of how popular these topics are is to look at what we're asking Google to tell us.
Here Google is showing you the volume of searches for the term "big data," and notice the scale only starts at the beginning of 2011. So it has come very fast and very hard, and the number of internet pages mentioning big data has been increasing enormously as well: we're up to something like 50 million pages devoted to it. What you're seeing is that individuals are increasingly making use of these data-producing capabilities, whether it's apps for your iPhone, or sensors, or a Nest thermostat in your house for more efficient energy management. Just incredible things are going on.

Here are a couple of definitive statistics. There are some very good infographics out there, like the ones we produce here at Data Blueprint. One from IBM says big data equals big opportunity: 4.8 trillion ad impressions. That's a lot. One from Wipro, a very fine consulting firm where some of our colleagues have worked, cites 50 million tweets a day and roughly 3 million emails sent every second. Another says that 90% of the world's data was created in the last two years, and that 2.5 quintillion bytes are created daily. Here's something that perhaps brings it a little closer to home, though. Everybody remembers the London Summer Olympics, a very successful set of games. Gigabytes of data flowed every second, and 200,000 hours of testing were run against the systems. Every day, the games generated an entire year's worth of media coverage. 8.5 million people on Facebook were talking about the Summer Olympics, which averaged 15 terabytes of data per day; there were 15,000 tweets per second, 4 billion people watching, and 8.5 billion devices connected. So big data is different, and you can see it's just changing the curve in terms of what's going on.

Now, that's the reality of the situation: we're dealing with a lot more devices, devices we can make use of and that can also generate data. So people are asking, well, how do we start to deal with this? One thing presenters always do is include a whole series of pictures, like the ones I'm flashing in front of you right now. On the other hand, there's a real challenge here too, because when you look at the definitions that exist for big data, you'll see something like this. These are two of the magazines we read on a regular basis in the business school, the MIT Sloan Management Review and the Harvard Business Review. They both had special issues on big data, and they say big data encompasses everything from click-stream data from the web to genomic and proteomic data from biological research and medicine. There are some very cool things happening there, but how is that objective? There is a real problem with this from an objective perspective.

So let's talk about what people's definitions of big data are. Most people are familiar with Doug Laney's original report from 2001, where he said it's volume, velocity, and variety. It has to do with those, doesn't it? Well, I've gone a little further and collected a number of other definitions as well. One adds a fourth V called variability: not only volume, velocity, and variety, but variability, meaning there are many options or interpretations that can confound the analysis. Another one is vitality.
Vitality refers to a dynamically changing big data environment in which analyses and predictive models must be constantly updated as changes occur, in order to seize opportunities as they arrive. That one comes from a publicly available CIA paper. And another definition says big data is virtual: it only includes online assets, not any hard, physical data assets. Again, these definitions are really more problematic than they are useful.

But let me give you a very specific example of big data. Last year, the European Union approved a rule mandating that a trade on their stock exchanges must exist for at least half a second. That's very interesting from this perspective. So this group, Nanex, put together a video that's out on YouTube, and I'm showing you 30 seconds of it right now. It shows that within one such half second there were 1,200 orders and 215 actual trades. That's big data: in half a second, a whole bunch of things happened really, really fast. I urge you to go out and take a look at that video just to get an idea of the enormity of the challenge we're dealing with. That half second of trading in Johnson & Johnson stock in 2013 is absolutely phenomenal.

Here's another interesting use of big data. This is the history flow of the Wikipedia entry for the word Islam. They're asking: who's making what types of entries, and how are changes made? That's interesting from a cultural perspective, and it's certainly an example of big data. Here's another: the spatial flow of information in the New York Talk Exchange. Where does New York communicate with, as calls go back and forth? If you look at these visualizations, they tell you the kinds of things that are happening out there.

But let's go back a number of years, to 1962, to a paper from a fellow named Orrin Clotworthy. I've made it a goal to look up what happened to Orrin; I only found out about this paper recently. But Orrin had some very interesting things in here. For example, at the start of the paper he asks: what do the size of the next coffee crop, bullfight attendance figures, local newspaper coverage of U.S. matters, the birth rate, the average mean temperature, and refrigerator sales have to do with who will be the next president of Guatemala? The answer is perhaps nothing, but it's not a frivolous question. And at the very end he says that this type of analysis will only be the first faltering step of an infant quantified behavioral science that is going to be forced on us for its upbringing like a doorstep baby, which I think is a very interesting analogy. Because he predicted not only the use of computing in the intelligence community and worldwide, but also the use of predictive analytics and the accompanying privacy challenges that we're all going to be facing.

So let's now look at some definitions of big data, and I'm going to put a bunch of them up on the screen. Gartner Group: high velocity, high variety, high volume. What would you expect? That's the original definition. IBM says big data is data whose scale is beyond the ability of commonly used software tools to capture and process. Wikipedia says much the same: data sets that have grown beyond the size that common tools can manage.
The New York Times says it's shorthand for advancing trends in technology. Well, that's not helpful. I was quoted back in '07 as saying big data is about putting the I back in IT. I kind of like that one. But what it really comes down to is that we have no objective definition of big data. Anybody offering any measurement, claim of success, or quantification must be viewed skeptically and with suspicion. Now, does that mean we should give up on trying to manage big data? No, we're not going to stop a third of the way in and say we can't manage this stuff. But the question might be: would it be more useful to refer to big data techniques? Those, in fact, can be quantified. These are new techniques that impact productivity in an order-of-magnitude sense and that complement the existing analytical methods we already have. These big data techniques are characterized by continuous, instantly available data sources; by something called non-Von Neumann processing, which I'll come back to in just a minute; by capabilities approaching or exceeding the limits of human comprehension, which is something else we have to manage; and by the need to be architecturally enhanced with identity and security capabilities. There are also trade-off-focused data processing decisions that we'll talk about as well. So while we can't define big data, we can objectively define big data techniques, and the question then becomes a little easier: where in our existing architecture can we effectively apply big data techniques?

Let's take a moment and remind everybody of the purpose of an architecture. Whether we're talking about data architecture, enterprise architecture, or other types, architectures describe how and why components interact, where they go, when they are needed, what changes need to be implemented, what should be managed regionally versus locally, what standards we should use, what adapters, what rules and policies, et cetera. The key to all of this is that all organizations have architectures. These architectures should be developed in response to organizational needs, which are then instantiated and integrated into a data and information architecture, and the architecture in turn articulates specific information system requirements. When we put a feedback loop in place and ask how it's going, people tend to say, okay: we identified an organizational need, we developed a solution, we tested the solution, and we evaluated the results. Unfortunately, most organizations stop right there, and of course you don't want to stop there; you want to keep the loop going.

Organizations spend a lot of money on this. Worldwide spending on business information is over a trillion dollars a year. The average large enterprise spends approximately $38 million on information, and even small and medium-sized businesses spend hundreds of thousands of dollars. So obviously everybody is spending real money trying to manage their data. Now let's move on and talk about how these techniques can complement our existing practices. I'm hoping that most of you have seen the Gartner five-phase hype cycle. If you haven't, it's certainly worth looking into.
Gartner is usually very happy to give you a taste of it for free, and you can usually find it on websites or in seminars like this one. What happens is that we start off at the bottom left-hand corner with a technology trigger: something cool that somebody figures out. Then you move to a place we call the peak of inflated expectations. Now, hopefully this isn't right after lunch for too many of you, but when you get to the top, you know the next thing that has to occur is that you fall to the bottom, into the trough of disillusionment. Then eventually, as you get a handle on what's going on, you rise up the slope of enlightenment and come to the plateau of productivity. Some of you may recognize this as a damped oscillation: we eventually settle on some happy mean where the technology should be used and where its clear benefits are understood and commercialized.

Let's look at where Gartner positioned some of these things. This is the Gartner big data hype cycle, and you can see it's very dense. Let me just point out a couple of items. Text analytics is currently in the trough of disillusionment. That doesn't mean it's bad; in fact, it's about ready to start rising, and things are going to get exciting. Social network analysis, however, you'll notice is five to ten years away from the plateau. That means I wouldn't put a whole lot of effort into social network analysis expecting it to pay off in immediate terms; it's a decade-long type of investment. On the other hand, web analytics and predictive analytics, both well-established fields, are clearly climbing toward that plateau of productivity. So that's how big data looks in the context of the other pieces. Let me focus on one additional point: Gartner says, in the middle of this report, that a focus on big data is not a substitute for the fundamentals of information management. Thank you, Gartner. We certainly agree with you there.

Now, let's look at where big data itself sits in Gartner's hype cycle. This was released almost a year ago, but even then we were seeing that big data is still two to five years away from peak hype. What that means is we're getting to the point where it will head down into that trough of disillusionment. There are fun things happening with it now, so let's see what we can really expect and where it comes from.

The last piece of this is that most of you are familiar with Gordon Moore's law. It says that over time, the number of transistors in an integrated circuit doubles about every two years; in fact, it's closer to every 18 months, it's getting faster rather than slower, and it shows no real sign of slowing down. Gordon was very prescient in this area. But what we often don't appreciate is that the way we get there is through trade-offs. We used to assume everything would simply keep getting better at once; in fact, there are trade-offs that we need to make. The most typical illustration is a business card that my friend Michael Adams used to give people. On the back it read: price, quality, and speed. People would look at it and ask, what am I supposed to do with this? And he'd say: pick two. That's actually a good illustration, because you can't have it all. So we have to talk about trade-offs at every point along the way.
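As a quick aside on the Moore's law arithmetic above, here is a back-of-envelope sketch of how a fixed doubling period compounds. It assumes an idealized 18-month doubling, which real transistor counts only roughly follow.

    # Compounding under a fixed doubling period (idealized illustration).
    def growth_factor(years: float, doubling_months: float = 18.0) -> float:
        """Multiplicative growth after `years`, doubling every `doubling_months`."""
        return 2 ** (years * 12.0 / doubling_months)

    for years in (2, 5, 10):
        print(f"{years:>2} years -> ~{growth_factor(years):,.0f}x")
    # Ten years at an 18-month doubling is about 2**6.7, roughly 100x.

Ten years at that pace is roughly a hundredfold improvement, which is why the data volumes discussed earlier stop sounding hypothetical so quickly.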
Coming back to trade-offs: even between price and quality alone there are trade-offs to be made. If we spend more, we can generally improve the quality. These trade-offs play out very much in the big data world as well.

Now, here's another development in the big data world: while processors have gotten really, really fast, our hard disks are limited in a number of different ways, and the processing capability we have is literally starving for data. I don't know how many of you remember, but in the old days we had supercomputers, and we would arrange a bunch of regular mainframes around them that would do nothing other than feed them what we called jobs: challenges, problems for the supercomputer to solve. The only way we could keep the supercomputer running at full speed was by having all those other computers feed it as fast as they could. In the same way, today's faster processors have outstripped the capabilities of the hard disks and main memory: the disks became too slow and the memory too small. Flash drives removed both of these bottlenecks. Just as an example, Apple and Yahoo combined have spent more than $500 million to date on flash. There was an interesting article in Forbes recently that, as I recall, described Apple as essentially a seller of high-priced flash memory devices, and there's certainly some truth to that. What this does is make the old storage-versus-memory bottlenecks largely obsolete; we're looking at minimum 10x improvements here. It means we can take what we had before and apply flash as real memory, or apply flash in place of the disks. So as we look at these flash devices: Facebook's Dragonstone servers have three terabytes of flash memory. What can you do with a server that has three terabytes of memory? The answer is a lot of new things. You can use it not only as real memory but as disk memory as well. And of course the iPods and smartphones of today all use flash memory; you'd never put a spinning disk drive in a device like that, it would drain the battery. Flash storage is becoming cheaper than disk storage, and in the future you'll see the hard disks of the world go away. There are other benefits too: with the old hard drives, we had to worry about dropping and breaking them. We don't have that problem anymore; we can drop these devices just like we drop our smartphones. And that's just a sampling of the new capabilities these storage devices bring.

So that's one development. Here's another: what we call non-Von Neumann processing. It's named after John von Neumann, one of the luminaries of computer science, and we have used the von Neumann architecture ever since the invention of computing. The way we used to do it was to take data and move it to a processor, and the only way we could make things go faster was by shrinking the components. We're getting to the point where we can't shrink them anymore. They can't be made much smaller; literally, they are about as small as we can get them.
We have machines that make machines that make chips now. These are all very interesting things, but consider that limitation. Michael Stonebraker has done some terrific work on this. If you don't know Michael, he's the luminary behind Ingres, with really terrific work at MIT and Berkeley. He did a study a couple of years back that found that modern database processing was about 4% efficient. If the furnaces in our houses were 4% efficient, we would all be broke and cold. We need to do better. That's not to say 4% is necessarily bad, but it raises the question: can we relax some of the constraints we had in our previous incarnation of database processing? These big data architectures attempt to do that, but because it's a zero-sum game, we have to trade these characteristics against each other. We might, for example, give up some reliability to get more predictability. The things you've seen in popular projects, such as MapReduce, Amazon's Dynamo, Netflix's Chaos Monkey, Hadoop, and this thing called McDipper (which sounds like it was invented by McDonald's), are all various ways of exploiting non-Von Neumann processing. The real key is this: instead of moving the data to a program running on one processor, the old architecture where big mainframes fed a supercomputer to keep it efficient, we can now break many tasks up across thousands or millions of nodes and have them processed in parallel. In processing in parallel, we're actually taking the software and the processing to the data rather than taking the data to the processing. So when we end up with a server that has three terabytes of memory, we can run thousands and thousands of processes simultaneously through that exercise. Those partial results then need to be reassembled on the other end and put back together, and that's what some of these big data architectures are about.

Now, I mentioned trade-offs. You'll hear this referred to as the CAP theorem, which stands for Consistency, Availability, and Partition tolerance, as in partitions when things go bad. If we look at the traditional model, we pick Consistency and Partition tolerance. That's a really good area for relational database managers to play in, and it gives us what we call the ACID properties. ACID stands for Atomicity: we identify a very precise unit of transaction. Consistency: the same thing will happen time after time. Isolation: we can separate transactions, which means we can audit them and go back and look at them. And Durability: we can keep it running as long as we need to and preserve those results. When we instead pick Availability and Partition tolerance, we're in the NoSQL world. And NoSQL does not stand for "No SQL"; it stands for "Not Only SQL." Again, not a great naming convention, but it's what we've got. Here the focus is not ACID but BASE: basically available, soft state, eventual consistency. It gives you high availability and a soft state, not a hard answer, and you move toward eventual consistency. That's the direction things are going. In the third corner, we can build some small data sets that are both consistent and available, but that's a much harder piece. So just as I showed you on the previous slide, we have a zero-sum game of trade-offs. In big data, we have to start paying attention to these trade-offs as well, which means you're not going to throw out your relational database processing.
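To make the "take the processing to the data" idea concrete, here is a minimal MapReduce-style sketch in Python: map work runs in parallel over data partitions, and the partial results are reassembled in a reduce step. This is only the shape of the idea, under invented partition contents; real systems such as Hadoop additionally schedule each map task on the node that already holds that block of data.

    # MapReduce-shaped word count: parallel map over partitions,
    # then a reduce step that reassembles the partial results.
    from collections import Counter
    from functools import reduce
    from multiprocessing import Pool

    def map_partition(lines):
        """Count words in one partition of the data."""
        counts = Counter()
        for line in lines:
            counts.update(line.split())
        return counts

    def reduce_counts(a, b):
        """Merge two partial results."""
        a.update(b)  # Counter addition: per-word counts accumulate
        return a

    if __name__ == "__main__":
        # Hypothetical partitions, standing in for blocks spread across nodes.
        partitions = [
            ["big data big techniques", "data velocity"],
            ["data volume data variety", "big velocity"],
        ]
        with Pool(processes=2) as pool:
            partials = pool.map(map_partition, partitions)
        totals = reduce(reduce_counts, partials, Counter())
        print(totals.most_common(3))

The reassembly at the end is exactly the "put back together on the other end" step that these big data architectures handle for you at scale.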
Instead, you're going to layer this in on top of your existing database processing, and you'll need to figure out how to do that properly; I'll show you an example in just a few minutes. But consider some of the trade-offs you may have to make. One is between using SQL and using big data, because big data platforms don't currently have a mature way of using SQL. A big reason people care about SQL is that our colleges and universities have, for the past 30 years, been cranking out lots and lots of young people who know how to program in SQL; it was the basis for a standard programming language for data. SQL, of course, ended up fracturing a little and not becoming quite as universal as everybody would have liked. But here's a trade-off: do you want SQL, or do you want big data? Here's another: do you need privacy, or do you need big data? Big data techniques do not have the security that our existing technologies have built in, so both privacy and security can be trade-offs that we weigh against big data. But by golly, if you need massive, high-speed flexibility, there's not much else that will give it to you other than these big data techniques. Those are some of the trade-offs you have to make as you decide where in your existing architecture big data should play a role.

Let me give you a different component here that may be useful. One of the things people are always trying to do is figure out what insight they can get; this is what we do in analytics. They're looking at something and asking: what's happening out there? What can we observe? Oftentimes we can look at something and come up with some rules. There's a very neat simulation out on the internet called Boids, B-O-I-D-S, which showed that we can reproduce the behavior of a flock with just three simple rules. That's a very interesting insight, and it also informs work on crowd behavior and things like that. This is the feedback loop and the ability to discern a pattern: wow, with just three rules I can describe a flock, or a mob. Hmm, very interesting. What can I do with that? Well, not much, unfortunately, because if I can't operationalize it, if I'm not able to exploit that insight and put it into an existing knowledge base, it's only good for me as the analyst who observed it. I don't have the ability to move forward with it. This is what we call the analytical bottleneck, and it's where we're spending an awful lot of money paying an awful lot of people and vendors to put hardware and software together. When we add this to the volume, variety, and velocity we looked at before, people are asking: what is happening, and what can I do? This is really a sense-making exercise: we're trying to figure out what is going on so we can make sense of the patterns we see. The big contributions here are the ones I'm showing in orange, because not only do you get this sense-making capability, you also get the potential and actual insights that come out of the analysis of the data. And our most powerful insights are always those we combine with informed insights drawn from our existing knowledge base.
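For the curious, here is a minimal sketch of the Boids idea mentioned a moment ago: flocking behavior emerging from three simple steering rules (cohesion, separation, and alignment). The weights and radii here are arbitrary illustration values, not anything from the original simulation.

    # Boids in miniature: three steering rules produce flocking behavior.
    import random

    class Boid:
        def __init__(self):
            self.x, self.y = random.uniform(0, 100), random.uniform(0, 100)
            self.vx, self.vy = random.uniform(-1, 1), random.uniform(-1, 1)

    def step(boids, cohesion=0.01, separation=0.05, alignment=0.05):
        for b in boids:
            others = [o for o in boids if o is not b]
            n = len(others)
            # Rule 1: cohesion, steer toward the center of the flock.
            cx = sum(o.x for o in others) / n
            cy = sum(o.y for o in others) / n
            b.vx += (cx - b.x) * cohesion
            b.vy += (cy - b.y) * cohesion
            # Rule 2: separation, steer away from very close neighbors.
            for o in others:
                if abs(o.x - b.x) < 5 and abs(o.y - b.y) < 5:
                    b.vx -= (o.x - b.x) * separation
                    b.vy -= (o.y - b.y) * separation
            # Rule 3: alignment, match the average heading of the flock.
            avx = sum(o.vx for o in others) / n
            avy = sum(o.vy for o in others) / n
            b.vx += (avx - b.vx) * alignment
            b.vy += (avy - b.vy) * alignment
        for b in boids:
            b.x += b.vx
            b.y += b.vy

    flock = [Boid() for _ in range(20)]
    for _ in range(100):
        step(flock)
    print(f"First boid ends near ({flock[0].x:.1f}, {flock[0].y:.1f})")

The point of the example is the one made above: three small rules produce the pattern, but the insight only becomes valuable once it can be operationalized beyond the analyst's screen.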
Now, a fellow many of you know of, David Brooks, wrote a very interesting column a couple of months back, in February, about the limitations of big data. One of the things he says is that data analysis struggles with the social. For example, we can count how often something is mentioned, but we can't tell whether somebody is talking about it because they love it or because they hate it; it measures the quantity but not the quality. You may be able to see interactions with co-workers, but big data won't capture the depth of devotion we feel toward a set of childhood friends or college friends. So when we make decisions around these things, it's very foolish to swap the amazing machine in your skull for the crude machine on your desk. Big data also struggles with context, because decisions are always embedded in sequences and contexts, whereas brains think in stories, and stories weave together multiple causes and multiple contexts. Data analysis is pretty bad at narrative, at emergent thinking, and at explaining. Bigger data also creates bigger haystacks, and that leads to more false positives: statistically significant correlations that may not in fact be useful. Most of them are spurious and deceive us, and this falsity grows with the amount of data we're handling. Big data, for example, has never settled or changed anybody's mind on whether we'd be better off rescuing the economy by putting more money into the hands of the job creators or by putting more money into public works projects. It also favors memes over masterpieces: it can tell when people take an immediate liking to something, but it can't tell when people dislike something initially only because of its unfamiliarity. Finally, big data obscures values. While we talk about, quote, "raw data," we almost never understand the context it comes from, and all data carries some context.

Now, one analysis we looked at talked about some $300 billion a year in potential savings in the healthcare area alone. The first category was transparency in clinical data and clinical decision support; almost half the savings come right out of there. Research and development was $108 billion. Fraud detection, $50 billion. Public health surveillance and response systems, $9 billion. And finally, our little patient-record piece is about a $5 billion item. These are things that can happen. Big data can help in that area, but we've really got to put some effort in.

When you look at knowledge workers in general, they spend about 80% of their time manipulating data and only about 20% of their time actually analyzing it. These are the true hidden productivity bottlenecks. When you re-architect this and put in place the proper balance of big data and existing techniques, even if you only reduce that 80% of their time to 60%, the time available for analysis goes from 20% to 40%, and that represents a doubling of knowledge worker productivity. When we double knowledge worker productivity, we can start to look at some remarkable possibilities. A favorite book of mine is Eric Topol's The Creative Destruction of Medicine. Eric's book talks specifically about things such as the reorientation of medicine from populations to individuals, and I'll give a very personal example here. My mother has suffered from a chronic condition for years and years, and when they treat her with drugs, they say: this helps some people.
Well, personalized medicine will enable us to go directly to saying: Mom, you should take this because it will help you. Because in many instances, across the different regimens she's been on, the cost of taking a drug has actually outweighed its benefits. Big data also gives us new opportunities for data capture. For example, one thing being looked at on a number of fronts is wearable computing devices, which will help with one of the big problems we have in healthcare: whether people are actually taking the medicines they're prescribed. If you look into this, you'll find it's one of the biggest problems we have; they say, here, take an aspirin every day and that will help, and the patients simply forget to take their aspirin. So this is something we can address in a really big way with really good outcomes. The last item in this particular book is printing organs. Literally, we can now print organs, much the way the folks the other day put a printable gun on the internet for a little while. And DNA tests can tell you things like your odds of developing glaucoma. I don't know how many of you saw the paper this morning, but Angelina Jolie has elected to undergo a double mastectomy because she's at very high risk for cancer. If somebody like that can look at these statistics and act on them, that's tremendous; we can do things that are going to be absolutely life-saving in many cases.

So let's look at the prerequisites people need in order to exploit big data. One of the biggest problems we have is that we still aren't educating people about data. When we teach business people about data, there's virtually nothing in the curriculum. And yet how many of them use data today? The answer is 100% of them; that's the definition of a knowledge worker. Worse still, when we look at what we teach IT professionals about data, the typical IT professional has had one course, on how to build a new database. While that's nice, you do have to ask, given that most IT expense goes toward improving existing IT assets, why we are teaching them only how to build new stuff. It's a very underutilized skill. So the impression that smart people who've gone through our colleges and universities get is that data is a technical skill you use to build new databases. This is absolutely not the best way to educate IT and business professionals. Because data is every organization's sole non-depletable, non-degrading, durable, strategic asset. From that perspective, it deserves care and feeding in the way other organizational assets are cared for and fed. After all, you have a chief medical officer and a chief financial officer; the case I'm making here is the case for the chief data officer.

That chief data officer should be arguing for the development of what we call data-centric practices. Let me show you how application-centric development, which is the way people have been doing it for years, works. We start with a strategy. From it we derive some goals and objectives, and it's a natural next step to move to some systems and applications to be developed. The development of those systems and applications then drives the network and infrastructure requirements. For example, if I buy an ERP package at step three, then the network needs to support ERP.
Data and information have been an afterthought: well, yeah, we've got to fill the system up with something. This ensures that the data is formed around the application rather than around the information requirements of the organization; the processes are always formed around the applications, and very little data reuse is possible. I'll offer a quote here: the significant problems we face cannot be solved at the same level of thinking we were at when we created them. To treat data as an asset means recognizing that assets are economic resources: they must be owned or controlled, and they must produce value that can ultimately be converted into cash. These assets then allow us to anticipate future benefits, often economic benefits, but in many cases intangible ones as well. With these assets, we can formalize their care and feeding and put data to work in very significant ways. And I want to give a shout-out to my colleague Ron Redmond here for coming up with that articulation of the concept.

Now let's look at data-centric development. Our strategy and our goals and objectives remain the same. But if we now manage information as an asset as the next step down the line, the network and infrastructure components become easier to maintain: we're dealing simply with the idea of delivering data to people via a series of lightweight systems. This was, in fact, part of the basis for SOA, which many of you have experienced. SOA hasn't been working out very well, because most people are ignoring the data component and looking at it strictly as a technology play. The advantages of this approach are that the data and information assets are developed from an organization-wide perspective. We then have systems that complement the data needs and the existing organizational process flows, which means we can finally reuse data and information to the maximum extent possible. These have been challenges for us; for whatever reason, we have not seen the reuse of application code. In fact, several studies in recent years have found that virtually no code gets reused beyond the expert coder who understands precisely what they're trying to do with their own code. Since in general we are not reusing application code, our contention is that you ought to start looking at reusing data instead.

What this all means is that data is an asset of the organization, and so we contend that organizations should have a chief data officer. The reason is that most of our CIOs are spending their time doing a lot of other very good work managing information technology: application development and packages, making sure you have email, supporting backup and recovery systems. The amount of time they can focus strictly on data is a fraction of the whole. Similarly, we contend that this chief data officer should report in at the same level as the other top asset-management jobs. Finance is an easy comparison: the goal of the CFO is not to manage lots of other things; the CFO manages finance. The chief medical officer manages medical matters.
When we look at this, the chief data officer should be the head of the data governance organization, and that data governance organization should coordinate with the top information technology job to make sure we put things in place in a way that complements the existing architecture. So we have tons of work for this function, but there's not much talent out there yet. I have to tell you, when we talk about projects going wrong, we tend to talk about projects becoming a science project, and I'm not enthusiastic about the current buzz around data scientists. It seems to me the last thing we would want as a chief data officer would be a data scientist, because they would tend to treat things as science projects. We have a lot of work that needs doing, and the CDO is the person who can drive it. Our friend Micheline Casey has done some very good work and some really interesting articulation on this subject, so I urge you to look up her blog as well.

Now, our final section: how are we going to get there? It really goes back to the fundamentals of the projects we run. We talk about this kind of jokingly: here's the original business concept; then the consultants describe it, the customer explains it, and that's where the translation errors start. The business project leader understands it in a certain fashion. The programmer writes it up, and it doesn't work quite as well. The beta testers get something else. Operations stands it up, it gets accredited for operation, and we finally deliver it. It's a little late, it's not very well documented, the help desk doesn't support it terribly well, the customer gets billed for a gold-plated version, and the patches are applied. And all the customer wanted was a tire swing. The reason things work this way is that we documented the systems development life cycle incorrectly in the first place. That's another lecture, and I could certainly give it if you're interested; reach out to us on the side and we'll be glad to go through it. Someone came up with something called the waterfall model of systems development: first you do requirements, then design, then implementation, then verification, then maintenance. I'm sorry: this never existed, and it never should have existed, yet we've been trying to make it work for the past 30 years. Oftentimes you'll see that people put little loops into the process, between analysis and design and implementation, but we need feedback loops at each and every stage along the way, not discrete phases. The better way to do this is Barry Boehm's spiral model, which he published in 1986, a full sixteen years after the original waterfall paper came out. Unfortunately, most people still don't recognize this as the proper way to do things. What we're looking at here is that we start at the very far left with what we have, and we move gradually in a clockwise circle: we have a review, we do a risk analysis and ask how it will work, and then we do some prototyping to see if it does work. If the prototyping proves favorable, we do some engineering to build it out a little, and we plan the next cycle.
We then repeat this iterative build process so that each prototype becomes more and more useful, eventually turning into something that works. This is a risk-driven approach rather than a document-driven or code-driven process. And even though this article is old, these practices are still very rarely implemented the way they should be. Which means that when you look at big data and try to implement it directly, without integrating it into your existing practices, it doesn't tend to work very well, and you end up with big stovepipes: additional piles of information sitting out there.

Current approaches say we should look at two separate paths. The top half of the slide, in blue, is using big data to address a known issue. The idea is to determine some sort of impact within a focus area, execute an assessment against it, and see whether a big data solution is applicable and will fit in with the existing architecture; then develop a short-term strategy that, over the next several months, determines the best fit, and build a prototype to see whether a big data solution really is the answer. The opportunity path at the bottom is not about addressing a known issue but about asking: hey, can we exploit something? We ask what the best need is and what the improvement areas are; we again collect the requirements, create a notional design, and see whether it fits the architecture; then we move out in a prototypical fashion. Either of these is much more productive than the traditional way we see this implemented in organizations, which is to have a separate data management group, then a warehouse group, and finally a third big data group, and by the way, they never talk to each other. That's dreadful.

So what we're looking at here is what I like to analogize to Maslow's hierarchy of needs. You all remember this, probably from high school: at the bottom level are your food, clothing, and shelter needs. If those needs are unmet, you're unlikely to sit down and do what we call self-actualizing. And self-actualizing, shown here as the dessert and whipped cream at the top, covers things like writing novels and being truly creative in going after your passions. In the data world, the basics are the data management practice areas at the bottom of this chart, the part in blue: data program management, organizational data integration, data stewardship, data development, and data support operations. These are necessary but insufficient prerequisites to leveraging your organizational data. Everything at the top, whether it's cloud, MDM, data mining, predictive analytics, or big data, you name it, sits in the green area. Can you get to the things in the green area without doing the basics? Absolutely you can. But it will take you longer, cost you more, and deliver less at greater risk than if you instead concentrate on those basic practices and truly treat them as practice areas. The more you practice the basics, the easier the things at the top will work. I'm going to say it a second time here because I get a lot of questions on this.
Yes, you can do the things in that green triangle, whatever the latest silver bullet is, but it will take you longer, cost more, and deliver less at greater risk than if you do it in an organized fashion, gaining experience along the way. As we approach the top of the hour, I'm going to leave you with a couple of thoughts and then look forward to your questions. The first is one most people ask: what should I be looking at in my data area to focus on big data challenges? The first real criterion is the cost-effectiveness of your existing analytical methods, because you're paying a lot of money for your knowledge workers and this will make them as productive as possible. The second is business optimization: you want to optimize the business processing, not the technology. We can always tweak the technology to get it where we want it. And third, you need high-fidelity, high-quality sets of information. The big data technology itself is not all that new; it's really a matter of keeping your focus and developing skills. And again, hats off to Gartner for coming up with what we consider really good guidance in this area.

Here's another point people may not be aware of. When we look at what people are actually examining with big data, most people think social media, but it turns out that's only 16% of what people are looking at. So the broad stereotype that social media is the answer to everything is not correct. It's important, but by golly, there are other areas. The next one, moving to the left, is getting more detail about our customers, our suppliers, and the like. I'll relate a quick Dilbert here. One of the Dilberts Scott Adams put together was a terrific one where somebody says, hey, I want you to develop a social network for our logistics system. You scratch your head a little at that, and Wally, of course, thinks it's a doomed project. But one of the customers I worked with recently had actually bought a social network for their logistics system so they could own the conversations going on around it. And that's very interesting: you get more detail about your customers, your suppliers, who they're coordinating with, and so on. It's a very interesting move from a strategy perspective.

But here's another thing. We have another term to deal with, called dark data. Gosh, I hate it when people make up these terrible names. Dark data means existing, underutilized data, and we're seeing as much activity in this area as in any of the others. So big data is absolutely not solely about social media, and not solely about integrating publicly or commercially available data, or even more customer detail. There's a good 40% out there saying: we're just using big data to go find out what we actually have. Again, I've mentioned the Gartner recommendations several times, and I think they're quite good. Some of the new analytics made possible by big data have absolutely no precedent (excuse me, I first said preference; I meant precedent). So we need to look at innovation in order to achieve value, which means we need to treat big data projects as innovation projects that will require change management. The business is going to take some time to trust the data source and understand the various analytics that come into play. This means that creative thinking can unearth these reservoirs of information from this, quote, dark data.
Now, I don't want everybody to go out and start using the term dark data, but it will probably catch on. And we're asking the business to understand what data sources you already have before you go out and spend lots of money buying other resources. Finally, while big data gives us the ability to do things faster, getting value from faster analytics requires change, so look at process redesign there as well. This is a lesson we learned with earlier technology waves: people found out that if a capability isn't closely aligned with your processes, you are not going to be able to make big use of it.

Now, it's not always about the money either. We'll close with two more quick lessons. First, when you're able to integrate data using big data techniques, you can automate a series of manual processes. What that gives you is data that passes effectively and efficiently among your existing business practices, eliminating inconsistencies and giving you the ability to cross-analyze. In this particular instance, it reduced the turnaround time for matching patients with donors. And I've got to tell you, I'm here at Data Blueprint with a room full of really brilliant data engineers in the back room, and they enjoy saving companies money; that's really good. But when you tell them they're doubling the number of bone marrow transplants, that's tremendous. That means saving thousands and thousands of lives on an annual basis.

So I'm going to close with a quote from an organization that doesn't usually talk very much. Ira Hunt is the CTO of the Central Intelligence Agency, and he gave a great talk that's up on the web that you can take a look at. He offers a little history lesson in it. He says that sophisticated tools without the data are useless, but mediocre tools with the data are merely frustrating, and given that choice, the analyst will always opt for frustration over futility. What that means is that a lot of the things you're trying to do are already being done out there. So look at ways of making your existing analysts less frustrated: don't just chase the sophisticated tools, but look at improving both your tools and your data quality, and you're more likely to achieve better results. And with that, we've reached the top of the hour, and it's time for your questions, which I'm looking forward to. I'm going to turn the program over to Megan.

All right, Peter, that was a great presentation. Now it's time for Q&A, time for you all to ask your questions. Just click on the Q&A window at the top; you should be able to submit your questions through it. We'll give everyone just a few seconds to get their questions in and then we'll start.

While we wait: here is our next series of events coming up, and we'll put that screen up there. We have business integration and some metadata topics coming up as well. And also, I'm going to see a bunch of you out in San Diego at the upcoming Data Governance and Information Quality Conference. For some reason, San Diego has been a real popular conference destination, and Shannon and I will both be out there meeting and greeting people, so we're looking forward to seeing you there. Peter, I just want to interrupt and make a little announcement that we are partnering with Morgan Kaufmann.
So everybody from today's webinar will get a discount code for the book. Oh, cool. Yes. Thank you. Some questions coming up? We do have a question; let me see if I can get this phrased right. The first question is: isn't your approach, featuring data requirements, or rather data architecture, before technical architecture and systems applications, contradictory to the principle of big data, which consists in storing data without knowing its uses? Outstanding. Terrific question. Absolutely. Very insightful, whoever asked that. Let me just pull this slide back up so that everybody can see the context of the question. One of the features of big data, and remember I mentioned several trade-offs that people have to make here, is that the beauty of many of the big data techniques is that you don't have to decide exactly what you're doing until you've sort of figured it out. In other words, you can get into it a little bit and let the solution evolve. So, absolutely. The question refers to my application-centric versus data-centric comparison, which are the two slides I have here for you. People look at this and say, okay, well, this is number three on this chart, and I'll just go ahead and highlight it so that everybody can see what I'm talking about on the slide. In traditional data processing, this has meant developing a data and information architecture, and what I'm suggesting here is absolutely not that, so it's a great clarifying question. The data and information architecture needs that you have formalized, that you have managed, that you do understand, are the things that are going to tell you what big data holes you still have to fill. That's probably not expressed as well as it could be; I may take another crack at this diagram. That's one of the things we love about these webinars: you all come up with all kinds of helpful suggestions. So let me be very explicit here, because the question is dead on. Megan, could you read it one more time to make sure that I get the language they've used? It's a very good question, and I'd love to hear it one more time. Do you want the whole question again? Please. Isn't your approach, featuring data requirements, or rather data architecture, before technical architecture and systems applications, contradictory to the principle of big data consisting in storing full data without knowing its uses in advance, whereas in the past the data had to be fetched via predefined queries? Again, the answer is no, but I don't think I did a good job of explaining it the first time, so thank you for the question, whoever that was. What this diagram shows is that the more you understand about your existing data capabilities, the better you can make sense of the environment that you have, and again, that's what we're trying to do here: make sense of the environment. The big data components here are in orange. Those will be easier to make sense of if you fully understand your existing IT and data architectures. And if you understand them, you'll say, oh, wow, here's the hook.
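As a rough illustration of the schema-on-read idea the questioner is pointing at, here is a minimal sketch, not from the slides; the raw_events.jsonl file and the customer_id field are illustrative assumptions:

```python
import json

# Schema-on-write: decide the structure up front and reject what doesn't fit.
# Schema-on-read: land the raw blobs first, then impose structure per question.

def load_raw_events(path):
    """Read newline-delimited JSON blobs without enforcing any schema."""
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]

def question_of_the_day(events):
    """Impose just enough structure to answer today's question:
    how many events mention each customer? Fields we don't yet
    understand are simply carried along untouched."""
    counts = {}
    for e in events:
        cust = e.get("customer_id", "<unknown>")  # assumed field name
        counts[cust] = counts.get(cust, 0) + 1
    return counts

if __name__ == "__main__":
    events = load_raw_events("raw_events.jsonl")  # hypothetical file
    print(question_of_the_day(events))
```

The point of Peter's answer stands either way: deferring the schema to read time is powerful, but knowing your existing architecture is what tells you which questions are worth asking of the blobs.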
Now, the analogy for this is a very good one. Some of you may remember that in the old days we used to do this thing called molecular modeling. Basically, once we figured out how to represent the model mathematically, we could represent it in a physical sense, or a virtual physical sense, look at the model, and say, oh, there's a hole there; where can I find something to fill that gap? If I have an aspirin molecule, and, look, I don't know much about molecular modeling, but if I have this need, I can fit something in. This is what big data really does: one of its strengths is that you don't need to know these things in advance. It's a very, very powerful technique. But it should not be done in isolation from your existing pieces. You will be better able to make good, productive use of big data techniques if you have a firm understanding of your existing IT and data architectures. If you don't, all the data in the world will still tell you some things, but you're going to take longer, cost more, deliver less, and run greater risk getting there than you will if you build on your existing architectures. Again, it's just a terrific question, and I hope that makes sense to everybody, because it's a really insightful one. Okay, great. And the next question is: how would the entity life history approach benefit development of integrated systems? How does the entity life cycle approach... I believe that's what it says, yes, entity life history approach. Okay, well, I don't know if that's a formal approach or not. We certainly use entity life histories to describe the life cycle of data. And from that perspective, one of the things we can look at is adding some sort of history to that life cycle approach. If we take a traditional structured approach to developing data, it goes through creating a formal data model, we put the data in structured tables, and we have to develop a whole series of infrastructure around it. By doing that, we have to get the requirements correct, as the last questioner mentioned, or we will not be able to build effectively around them. The whole point of big data is that we don't know these things in advance, and so it gives us that additional flexibility. I'll go back to that same diagram I was looking at a minute ago. The pieces in orange off to the left are things that we're just trying to figure out, and they really give us additional extensions where we didn't have the ability to do this type of extension before. We can bring data in as blobs; we really don't know what it means or what it says or anything else, and we don't have to pre-develop systems around it. This has been one of the big challenges with BI, and I heard this at the Data World conference Shannon and I were both at: somebody came up, I think it was John Ladley if I recall, and said that the best result from BI is another whole series of questions, which leads you to the idea that we're going to keep going around and around, iterating. So the idea of building an infrastructure to answer one precise question is almost antithetical to what we're talking about in this context. Again, I hope that answers the question, because I do believe it's a good one.
The next question is: how can big data be audited? Are auditing techniques for big data different compared to traditional data? They have to be. It can be audited, but given the trade-offs we described before, auditing takes on a different meaning here. We're not going to use an audit in this case to verify the payment on something, or whether a check for $5 went from this vendor to that vendor. That's not the kind of thing big data excels at; those are things that atomicity, consistency, isolation, and durability, ACID, do for you. Auditing still makes sense, though. It also means that if you're at BASE, remember, basically available, soft state, eventual consistency, then your audit has to be tempered by that as well. So, for example, we just finished the spring 2013 semester at VCU last Saturday and graduated several thousand young people with their college degrees to go out and join the workforce. We do an audit on each and every student. You would not do that in a big data environment, but auditing in a big data environment can still be accomplished. It has a different set of goals and objectives: what you're looking for there is whether you did, in fact, see the trends that were coming, the kinds of things we've talked about already, with BASE-style guarantees as opposed to ACID-style guarantees. Again, if that doesn't answer the question, please ask a follow-up.
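As a rough illustration of the ACID-versus-BASE contrast Peter is drawing, here is a minimal sketch, not from the presentation, using SQLite for the ACID side and a toy grow-only counter for the BASE side; the vendor accounts and replica counts are made up:

```python
import sqlite3

# ACID: an all-or-nothing transfer. Either both rows change or neither
# does, which is what a traditional per-record audit relies on.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("vendor_a", 100), ("vendor_b", 0)])  # illustrative accounts
with conn:  # one atomic transaction; any error rolls both updates back
    conn.execute("UPDATE accounts SET balance = balance - 5 WHERE name = 'vendor_a'")
    conn.execute("UPDATE accounts SET balance = balance + 5 WHERE name = 'vendor_b'")
total = sum(b for (b,) in conn.execute("SELECT balance FROM accounts"))
assert total == 100  # the exact, every-record kind of audit

# BASE: a toy grow-only counter replicated across nodes. Reads can be
# stale between gossip rounds, so an audit asks about trends and
# tolerances rather than exact per-record truth.
class GCounter:
    def __init__(self, n):
        # each replica keeps its own view of everyone's increment counts
        self.views = [[0] * n for _ in range(n)]

    def increment(self, r):
        self.views[r][r] += 1  # only replica r knows about this write yet

    def read(self, r):
        return sum(self.views[r])  # possibly stale local view

    def gossip(self, a, b):
        merged = [max(x, y) for x, y in zip(self.views[a], self.views[b])]
        self.views[a] = list(merged)
        self.views[b] = list(merged)

c = GCounter(3)
for _ in range(4):
    c.increment(0)
c.increment(1)
print(c.read(2))  # 0: stale, but the system stays 'basically available'
c.gossip(0, 2)
c.gossip(1, 2)
print(c.read(2))  # 5: eventual consistency once the replicas converge
```

The ACID half audits individual records exactly; the BASE half can only be audited statistically, which is the tempering Peter describes.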
Great. And the next question is: do you think data science is just a new buzzword for what we've been doing already, or is it really substantially different? I do think it has some substantial differences, but I also think it is not, in and of itself, the panacea everybody thinks it is. You heard me earlier talk about what we call projects gone wrong; we tend to call them science projects. And frankly, I don't really know how you would go about developing a generalist data scientist. Where I've seen the most utility come out of this data science movement is within a specific industry or domain, for example the logistics area, the medical area, the transportation area. When you get down to that level, I think we're going to find that a broadly based data scientist is simply not going to be useful, or is going to take too long to become useful. One of the things we've seen here at Data Blueprint, and have actually developed a practice around, is that many specialists during the 90s and 2000s, and even early in this decade, would come into an organization and say, I've got these great techniques I can use, and everybody would say, well, great, let's put them to work; let's find out what the forecast would be if you do this. And it takes approximately three years for them to become truly productive. When we apply our techniques to this, we help them get there faster, so they can become productive in perhaps a year instead of three, by becoming familiar with, again, let's pick logistics as the domain, because it's one we're a little familiar with. When you become a data scientist slash logistics expert, then you've got something I think is useful. But a broadly based data scientist, while they know statistics and all those kinds of things, is going to be a mile wide and not very deep. I think that in order to achieve true utility here, you do have to look specifically at a domain. Great question; I hope that's useful for you. And the next question is: what is one example of a business using big data to connect with their customers? To connect with their customers... let me see if I can come up with something in the public domain that we can talk about. Probably the most useful area, and I'll refer you back to Eric Topol's book on this, he's got several examples of how people actually come to understand what's going on in the medical domain. So, for example, I'm not a doctor, not a medical doctor, in that context. But if we say aspirin helps people, what we may actually find out from analysis is that aspirin helps most people, and that it actually harms other people. Again, I'm absolutely making this up, so please take it only as an example. But to connect with an individual customer within a population in this case, we would now be able to look at your gene sequencing and your individual DNA and say that you should take aspirin, but my brother Tim should not: it will help me, and it will not help my brother. That's a really good way of connecting with people, because now we can start to focus on individual treatment instead of population-based treatment. It also means that when drugs come out, we can say this part of the population should take them and this part should not. We can split it simply into male and female, just as an example, and say that this drug worked really well for males and really poorly for females. Again, I'm not a medical doctor, so I'm certainly not giving you any medical advice here. But I do think that's an area where we can get a lot of connectivity very quickly and achieve some very good results. Okay, the next question is: with so much data being produced and collected, how does an organization decide how long to retain it? Usually the lawyers tell us, and that's probably not who we want making the decision, not that there's anything wrong with lawyers. But most of those decisions are being made as risk-based decisions instead of utility-based decisions. With data being collected, we need to establish classes of data, and we need to establish governance rules that surround different types of things. One of the other definitions of big data that I did not go into is that big data is never captured and never stored; it's only examined. It's search-based. Again, we've got a thousand definitions out there, so we can't say that any one of them is more correct than the others. But if you're never capturing the data and you're only sampling it, like water samples or air samples, then there would be no need to store it at all. The conclusions from that are actually kind of interesting, too. In other words, let's say we took water samples from a body of water. We didn't retain those samples for very long, but we retained the actions we took based on those samples. The laws aren't really clear on whether the metadata surrounding those samples is, in fact, data about them or whether it merely represents a pointer to them. There's a legal case out there, for example, where an e-mail was destroyed under a proper document retention policy, but the fact that the e-mail had existed at one point still caused the organization challenges. Again, we've got a lot to work through in that area, so I'd be really reticent to give you any specific guidance, other than to look for experts who know how to do this. That's something where we're certainly going to see some very interesting careers emerge in the near future.
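The examined-but-never-stored definition Peter mentions has a classic concrete form in streaming analytics, reservoir sampling. This is a minimal sketch offered as an illustration, not anything from the talk; the stream of "pH readings" is invented:

```python
import random

def reservoir_sample(stream, k):
    """Keep a uniform random sample of k items from a stream without
    ever storing the full stream: examine each item once, then discard."""
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)
        else:
            j = random.randint(0, i)  # replacement grows less likely over time
            if j < k:
                sample[j] = item
    return sample

# e.g., keep 5 readings out of a million without holding the stream in memory
readings = (random.gauss(7.0, 0.5) for _ in range(1_000_000))  # hypothetical pH stream
print(reservoir_sample(readings, 5))
```

If the raw stream is never retained, the retention question shifts to the derived artifacts, the samples and the actions taken on them, which is exactly the gray area Peter describes.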
It looks like there are no other questions, so we'll go ahead and wrap up in the next few seconds; let's give it a little time to see if anybody submits anything. So again, we'll put in another plug for a conference that Shannon is going to be at, the Data Governance and Information Quality Conference coming up in San Diego. And of course we have our upcoming webinars, and we'll look forward to connecting with you there. That's it. Thank you, everyone, for participating in today's event. We hope you've enjoyed it. Thank you again to DATAVERSITY and Shannon for hosting us. Once again, you will receive today's materials within the next two business days. Our next webinar will be Unlocking Business Value Through Data Quality Engineering; hopefully you can make it out for that as well. As always, feel free to contact us if you have any questions. Thanks, everyone. Have a great day. Thank you, Peter, another great presentation; it was just another fantastic day. Thanks so much, and thanks, everyone, for attending. Thank you, everyone, and I'll just echo Megan's sentiments and say I hope everyone has a great day. Thank you. Bye-bye.