Hello and welcome, my name is Shannon Kemp and I'm the Chief Digital Manager for DataVersity. We'd like to thank you for joining today's DataVersity webinar, Implementing Big Data, NoSQL and Hadoop: Bigger Is Usually Better, the latest installment in the monthly webinar series called DataEd Online with Dr. Peter Akin, brought to you in partnership with Data Blueprint. Just a couple of points to get us started. Due to the large number of people that attend these sessions, you will be muted during the webinar. If you'd like to chat with us and with each other, we certainly encourage you to do so; just click the chat icon in the upper right-hand corner for that feature. For questions, we will be collecting them via the Q&A in the bottom right-hand corner of your screen. If you'd like to tweet, we encourage you to share highlights or questions via Twitter using hashtag DataEd. To answer the most commonly asked questions, as always, we will send a follow-up email to all registrants within two business days, containing links to the slides. And yes, we are recording and will likewise send the recording of this session as well as any additional information requested throughout the webinar. Now let me introduce to you our speaker for today. Peter Akin is an internationally recognized data management thought leader. Many of you already know him or have seen him at conferences worldwide. He has more than 30 years of experience and has received many awards for his outstanding contributions to the profession. Peter is also the founding director of Data Blueprint. He has written multiple books and dozens of articles. The most recent book is, which one, Peter? Data Strategy. Peter has experience with more than 500 data management practices in 20 countries and is consistently named a top data management expert. Some of the most important and largest organizations in the world have sought out his and Data Blueprint's expertise. Peter has spent multi-year immersions with groups as diverse as the U.S. Department of Defense, Deutsche Bank, Nokia, Wells Fargo, the Commonwealth of Virginia, and Walmart. And today Peter is joined by guest speaker Micah Adalton. So let me turn everything over to Peter to introduce today's guest and to get today's webinar started. Hello and welcome. Thank you, Shannon. Micah, it's good to have you here. Micah is the president of Data Blueprint and he has been with us for about three years. And what we thought we'd do today was to try to talk specifically around what he's been seeing in the marketplace around big data, because we're still getting lots of questions. We're still seeing lots of confusion. And it's just not really clear that some of these messages are getting out or that there's even a consistent set of messages out there. Micah, welcome to your first official podcast on here. Or webcast, with all the pods. There we go. But I'm definitely glad to have you here. Thanks, Peter. It's great to be here. And hello to everybody out there. So Peter, the question that we wanted to start with today, and what we're hearing in the market, is this: we've heard you discuss big data and big data tools and technologies. Often a client will come to us and say, I think I need a reporting solution; my executive tells me I need big data. How should I begin to build my business case to serve my basic needs before jumping into getting a tool that may really not meet the needs that I have to run my business?
And first of all, I hope you applaud whatever customer it is that's coming to you by asking the questions in the right order, because obviously the business needs should be driving the tools. And that's really the direction that we want to go. So we've organized this webinar around a series of questions. We want to start out by talking specifically about why it's important to consider the messenger. And that's important for everybody to understand because it does shade the message. Now full disclosure, Micah and I are both consultants, right? So you're hearing this from folks that work in that way. But then we want to look specifically at what big data technology is good at. And we'll talk about a confluence of events that has come together to really give us some very different capabilities. But just because it's new doesn't mean it's right for all of your problems. And so then we'll look at successful big data approaches: what have been the key features of somebody who's working in this area that actually make them successful. Because the bottom line is up front, you guys: big data projects have been about as successful as IT projects, which means one in three of them succeeds on time, within budget, with the functionality that's specified by the customers. And if my dentist was that bad, I would find a new dentist. Then we'll finish up with a little bit of operational type of context on here. And it's not all bad news. It's just a matter of learning what works in the appropriate way. So Peter, you mean to tell me you'd really want to keep two-thirds of your teeth? Oh, gosh. I stepped right into that one, didn't I? Excellent. Touché there, sir. Touché. All right. So first of all, many of you may not be aware that we're actually officially in the post-big data era, which means we passed it. And I tell that to audiences a lot, and they go, oh, man, you mean it's over? No, it's not over, but the hype portion of it is over. So this is the picture that most people get when they look at what is being called big data. And you'll hear from us in just a little bit, but we don't really like the term because it's kind of imprecise. And that's an issue with this. Doug Laney, our colleague at Gartner, kind of started the thing off, although he doesn't claim credit for it. He's the one that popularized it, saying that there are increases in velocity, volume, and variety of data, and it just continues to go. So we need to see what that does differently for things, and that's a very reasonable proposition. In fact, when we started to catalog these things, there are Doug's pieces right up there for starters. And Doug has a cool new book coming out on infonomics, too. I have not seen a copy of it yet, but I'm really looking forward to reading that one. We found some other definitions out there: variability, vitality, virtuality, value, and veracity. In fact, if you really look hard, there are 13 big Vs of big data. Now, we know right away that if everything starts with the letter V, the marketing people are involved, because there's no way that would happen by accident. According to the legend, John Mashey at Silicon Graphics originally coined this. But what it really does come down to is an aspect that is really very problematic: if we don't have an objective definition of something, then any measures of success, any claims about whether it does this better or that better, don't work.
So you guys will remember, I wear both an academic hat as well as a consultant hat, and I get invited to big data conferences all the time. And my first question to people is, what do you mean by big data? And people kind of scratch their heads and say, well, you know, just a lot. It's like, well, a lot's an interesting answer, but Micah, I think if you tried to sell 'a lot' of value to a client, that would be a problem for you, wouldn't it? It would be. So your approach to this is that when a client comes to us with a problem, you try to pin down the value that they're trying to obtain from that. And if they can't specify the value, it's very hard then to say that there's a business proposition. Again, if I said I was going to give you a cool new car to replace your Jeep, you might say, well, you know, what's that going to do for me? I say, well, it can carry nine kids. And you say, well, that's nice, but I've only got one. We always try to separate what we're doing from the value you get for doing that thing, which I think is exactly the point you're headed at, and what we're talking about today, Peter. The issue then is, when people talk about big data, you need to have a conversation with them and say, what do you mean by big data? So if we're not using the right vocabulary, it becomes very, very difficult to actually talk about the topic. Now, Micah, I won't ask you if you've ever seen this particular quote before, but I know some of our audience certainly have. This is Justice Potter Stewart talking about an obscenity case. And so he says, well, I'm not going to try to define the material, but I know it when I see it. Now, that may be easy for one type of thing for people to look at. But what it really comes down to is that we don't want people to have to actually look at it to decide whether that's what it is or not. They're using that one for gerrymandering now, Peter. We can do that one. Yeah. Well, there'll be a Supreme Court case on that one. There will be. You guys didn't know we were going to inject politics into this one, did you? All right. We won't do too much of that. But anyway, yeah. So even out of context here, though, your point is very well taken: it doesn't necessarily mean what we think it means. So if we have that sort of a problem, it is an issue. And so what I do is, when I'm talking with people about big data, one of the first things I say is, can you give me an objective definition? And an objective definition is one that all of us can agree upon is, in fact, happening, right? So independent people coming to the same process in different ways would come to the same conclusion around that. So I say, let's not talk about big data, because that's a very imprecise term. Instead, I would be very happy to engage with you in a conversation that describes big data techniques or big data technologies. And both of those are things that we can, in fact, describe precisely. Our definition for it is pretty straightforward. These are new techniques that allow organizations to impact productivity by at least an order of magnitude. An order of magnitude is a 10x improvement, the kind of thing where the boss can't help but notice something good has happened. They are characterized by continuously and perhaps even instantaneously available streaming data sources. And the neat thing about that is that many of the streaming things are things that simply pass you by. You don't actually even want to capture them, but you're sort of sampling them; think of sensor data and things like that.
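As an aside, here is a minimal sketch of that sampling idea: keeping a fixed-size, uniformly random sample of a stream you never fully store (the classic reservoir sampling algorithm). The sensor feed here is a made-up stand-in for whatever streaming source you actually have.

```python
import random

def reservoir_sample(stream, k=100):
    """Keep k items, each with equal probability, from a stream of unknown length."""
    reservoir = []
    for i, item in enumerate(stream):
        if i < k:
            reservoir.append(item)           # fill the reservoir first
        else:
            j = random.randint(0, i)         # inclusive on both ends
            if j < k:
                reservoir[j] = item          # replace with decreasing probability
    return reservoir

# Hypothetical usage: sample 100 readings from a million-reading sensor feed
# that is never stored in full (here, simulated temperature readings).
sample = reservoir_sample((random.gauss(20.0, 2.0) for _ in range(1_000_000)), k=100)
print(len(sample))  # 100
```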
We've been having an awful lot of praise for the weather forecasting community for the last two hurricanes, because they kind of got both of them right. They didn't have it exactly right where it was going into the specific state of Florida, but they warned people, people were prepared, and that's much better than not. In fact, I just got off the phone with somebody who said, yeah, my neighbor says they can see my house. It's still standing down there. So that's good. We've actually got a couple of colleagues down there who we're worried about because we haven't heard from them as well. We're going to also talk about non-von Neumann processing. It's kind of hard to define something by it being non-something, but that is the way it is popularly defined. Von Neumann architectures, and we'll describe them in just a minute, are the prevalent models for computing, and have been for largely as long as I've been working in this particular profession. We'll come back to von Neumann in just a minute. We also then want to talk about capabilities approaching or past human comprehension. Micah and I are sitting here in a conference room and we can't see outside or beyond the windows. We have walls. That's the limits of our comprehension here. We might be able to hear a sound, but there are measurable limits to human comprehension. No matter what you and I try to do, we can't see through things or under things or anywhere else, given that particular piece. Finally, it's really key for whatever you're doing in this space to make sure that you have some way of identifying specific instances of things that you're looking for. We talk about architecturally-enhanceable identity and security capabilities. There's also trade-off-focused processing. We'll get into that in just a little bit as well. Maybe a better question to ask, instead of how can I do big data, might be: where in our existing architecture can we most effectively apply big data techniques? Let me ask you this question, because as we go through that, when you push back on clients with that sort of a message, do they kind of get that? What's your basic sense about the level of education of most of the customers? We do get some pretty sophisticated customers, don't we? We do. I think it touches on the differences in understanding. I think it comes down to how you're going to make the decision you're going to make. Are you making that decision out of what you see and are attracted to, or what's really going to solve your problem? And we oftentimes spend a lot of time with our clients separating out what they're trying to achieve, and making sure they're on the right path to achieve that, from something that is a bright, shiny object. We can also talk specifically about various biases that organizations have in decision-making, and that actually is why advertising works, because we have confirmation bias in there. If you're not familiar with this whole issue, there's a wonderful book by Daniel Kahneman called Thinking, Fast and Slow that goes into this in excruciating detail. It's a wonderful book, one of my favorite books that's come out in the last decade around this. Given that, if you push back with a client and say, well, could you talk to us about where big data techniques could go in your environment? Is that a conversation that they're capable of getting into? It is. And it's actually a much more productive conversation, because you're able to separate out the technique from the tool and the value that you get from that.
And then I think you're also able to separate out how data management is really the foundation that allows them to take advantage of those big data tools and techniques. Fantastic. So given that, when we talk about big data techniques, then people don't really care what it's called, they just want the functionality. That's right. And of course, that's exactly what you're attempting to do with them in that conversation, to say, look, you need something done. We're the experts in how it should get done. So let's really define what needs to get done. We'll push the how discussion until later. And the reason that's so important is because we did a study, I think collectively we did, it wasn't a personal study, but we did a study in IT a couple of years back where we tried to find what was the hardest thing to do in IT. And it turns out the hardest thing to do in IT is not the design; it's figuring out the what. So for example, if we start talking about implementing big data techniques, then we can start talking about how many clusters we're going to need and whether we're going to do this type of processing, whether we're going to use cheap computers or expensive computers, blah, blah, blah. None of these are addressing the real question that the customer has, which is, what is it that they need to have done? They are all how questions. And when we look at what happens in these conversations, they tend to be overly dominated by the how. So those people are moving into the how, they are not doing the what, and that becomes a huge, huge problem. One of the things that I love to do is to try not to name the how, but instead to describe the what, because it's the naming that I think oftentimes creates this challenge. Beetlejuice, Beetlejuice, Beetlejuice. Exactly. Oh no, he's here. We're kidding, he's not here. So, great. And we have seen an overabundance of tools in this space. And the next couple of slides that I'm going to show all you guys are just sort of the annual surveys that people do, that pop up different ways of describing what's going on in the data landscape. And I don't expect anybody to get anything out of these slides other than: this space is largely driven by tools. There's nothing wrong with it. The tools are great, but remember, tools are a part of the solution, not the whole solution. So let's talk about how data actually comes about in this. And this is a wonderful chart that the folks at Domo did. They do this every year for the Internet, and it's really cool to see the various bits and pieces of it. The chart says that for every minute of every day in the entire year of 2016, we had 18 million megabytes of wireless data uploaded and downloaded from the cloud back down into our mobile devices. So that's 18 million megabytes every minute. Every minute, Netflix streamed 86,000 hours of video. Now that's enormous. Every minute of 2016, Tinder got a million swipes. I don't know whether they were left or right, but the Weather Channel got 14 million requests for what the weather is going to be like, and I'm sure that's way up with two hurricanes this year and a third one to follow, right? That's a real clip. Jose comes next, right? And three and a half million text messages in that process. And just to put in one final piece of all this, Amazon makes $200,000 every minute. Now remember, Amazon makes less than a penny off of every sale, so you can do the math there and see what's going on in terms of keeping the UPS company quite alive on that.
So when we look at how the data maps come out, well, I think that the couple that I showed you before were kind of useful. I like this one a little bit better, because it really does show us that we have a relational world and a non-relational world, and that they are both sharing the same mind space, because all of this is about delivering data to customers. Whatever that customer is, a researcher or a consumer on the other end of it, it's all got to have some place here. Just so that you guys get this, the key is the next slide here, so you can actually see what's going on. But this shows a much more integrated picture of the whole space, which is really the way we'd prefer to see it. So while I appreciate the work that was done on those previous charts, and it is a fine amount of work, it's not quite as helpful, because it only shows one perspective on the whole thing, and I really like a more integrated perspective of this. Similarly, we're seeing incredible amounts of money being spent on this, although again, if somebody calls something big data, I would assume they're counting that money as going into it, but we've seen some big data pieces that aren't really big data. You know, it's just lots of data. That's exactly right. That's exactly right. One of my favorite people to talk to about this webinar in particular, a fellow that lives up in the Boston area, he said, well, back in the day, I had two billion records and things. Was I doing big data back in those days? I think the answer was, yes, Dave, absolutely you were, right? You can see here we're seeing lots and lots of spending on this, and our guidance here is to be careful and don't find faults, excuse me, don't fall for the shiny object syndrome, because as I said, the technology is just one leg of the stool, and I don't know about you, but if I get on an airplane and they sit me on one of those one-legged things, I'm not going to be very comfortable. In fact, I'm probably going to teeter quite a bit. Our three-legged stool really consists of people, process, and technologies, and big data technologies are a good leg for that stool to stand on for certain types of applications. Before we go into what those types of applications are going to be, let's talk a little bit more about what has happened in this industry space to make this happen. But please don't go out and bet the farm on buying some tools that are going to help you, because that is only a one-legged stool, and it's not really going to get you where you need to go. I'll say one other piece on this, too. Not to be particularly negative, but one of our local customers here in town had a big data initiative, and they were going to do all sorts of big data stuff, and that went well for about three years, and then they said, now we don't like that anymore, we're going to go to the cloud. All right, well, those are two completely different things in terms of how we're setting this up. Hey, Tom, sorry, we didn't tell you we switched rooms on you. So let's look at what those advances in technology are, and the first one is the fact that computing has become cheaper. So computing technology has followed something called Moore's Law, named after, I want to say Roger Moore, but that's not right, I'll have to look that one up. Gosh, that's terrible. I've got him on another slide. But over the years, his observation was that every 18 months or so, the amount of computing power available doubles, and the cost halves.
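A bit of back-of-the-envelope arithmetic shows how quickly that observation compounds. Ten years is 120 months, or about 6.7 doubling periods; the numbers below are just that compounding worked out, nothing more.

```python
# Moore's Law compounding: double capability / halve cost every 18 months.
months = 10 * 12                 # a ten-year horizon
doublings = months / 18          # ~6.7 doubling periods
capability = 2 ** doublings      # growth factor in computing power
cost = 0.5 ** doublings          # remaining fraction of the cost
print(f"{capability:.0f}x the capability at {cost:.1%} of the cost")
# -> roughly 102x the capability at about 1.0% of the cost
```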
Now if our cars had followed Moore's Law, we would be driving around with one tank of gasoline per car, period. We would never have to fill up with gasoline again. That's an incredible amount of progress that's gone on here. So the cost of computing power has gone down very, very much. But more importantly, people like Google have also figured out this really wonderful way of commoditizing what a computer is. And so you probably have heard the story; if you haven't, Google's computers are cheaper than your shoes. I don't care what shoes you're wearing, they're cheaper than them. Because Google is making the cheapest computers they possibly can, knowing that it's not a question of if it will fail, but when, and what they did instead was design a resilient network. So it doesn't matter. They have robots now that go up and down their server farms, and when they find a bad machine, they pull it out. It's like a scene out of The Matrix. You can imagine that particular scene. We'll get that graphic for next time. But anyway, because this type of computing technology for basic server support has gotten so cheap over the years, that's one of the components on this. Gordon. Gordon Moore, thank you. There we go. That's right. The second piece of this, then, is flash memory. And the really surprising thing, we all waited for flash memory to come along. It's the kind of thing where flash memory was, you know, we heard about it. We could put it on a USB drive and things. But most of our laptops and things have no more rotating disk drives. And because of that, it means we can change some things very differently in terms of how the internal architecture works. Oh, by the way, I should say thank you guys for listening to us today, because your alternative is to go watch the Apple show where they're unveiling the new iPhone 8 and X. But we just lost half our audience right there. But hey, you can always watch that on repeat too. Of course we can get it on repeat too. Anyway, so flash drives are following the same pattern. Now nobody would have, nobody did predict, in fact, that this would happen. But people put their time and attention to it. And all of a sudden, we're seeing that flash drives are becoming twice as capable. They're running twice as fast. And they are half the cost every 18 months. Now, that can't continue indefinitely; Moore's Law can't continue indefinitely. But at least at the moment, things are going in the right direction. So I take commodity computing. I add to it very, very large flash drives. And these flash drives allow us to do things that we simply haven't been able to do. So you're seeing some pictures here of the inside of one of the Google data centers, where they're taking a look at all the various bits. The hard disks were slow. The memory used to be small. Flash drives remove both of those bottlenecks. In some cases, people accuse Apple of being nothing more than a very expensive flash drive manufacturer and seller. And there's certainly some truth to that rumor in there. It got so bad for Facebook at one point in time, they had to invent a brand new technology to make their stuff work. Because every time you hit a like, or every time you hit an unlike or whatever, that's a database transaction. And if you were trying to do this off of a relational database, as they were for many years, you eventually hit some bottlenecks, as they did. So the bottom line is, given these new capabilities, what can we do? And I promised I'd describe von Neumann architecture.
And von Neumann architecture is the original architecture. John von Neumann wrote the first paper that popularized the whole computing paradigm. And what I'm showing here on the screen is that we used to take data and we would move it to the processor, the computer, the CPU, central processing unit. And we take the program and we move it to the CPU. And that movement, shown by that double-ended arrow there, is the slowest thing that a computer does. So all that movement was a big problem. And now, if we look at it and say, okay, by the way, when we're done, we also have to move it back out to show the results. So we have a two-way street on that. So the way we've been working with that is we've been working on miniaturization. And the whole point of this is that the miniaturization was simply designed to shorten the length of wire that the electrical signals had to travel. So while they also became lighter, that was good, it was really all about speed that they were trying to optimize on. Now, the problem with that is we will eventually hit a limit, and we're getting close to it. You probably have noticed that computers are running two to four gigahertz, which is pretty darn fast, but we're not going to be able to go a whole lot faster than that. So people like Michael Stonebraker started to poke around and say, hmm, given that that's going to be a bottleneck that's going to have to be solved, what can we do with databases as well? And his piece that had a lot of database manufacturers kind of upset with him was that he came back and said that modern database processing is about 4% efficient. Well, that's an issue. If I told you, again, Micah, your car was 4% efficient, you probably wouldn't be happy. You'd rather go back to that car where I can fill my tank up once in my lifetime and make that work all the way through. Big data architectures are like this, but we are still governed by the laws of physics. So there's a zero-sum game where we have to trade these characteristics back and forth against each other. And if we're going to make databases or computers do things faster, we have to do things differently. The first introduction of that difference is the idea that we can take a task and break it down into multiple pieces. So, for example, you've probably squeezed a file on your hard drive somewhere, compressed a file, zipped a file to send it off to somebody. If you have two CPUs, you can divide the task of compressing a file up among the two CPUs, and they should be able to do it. Probably not twice as fast, but faster. And a lot faster if we can get over the overhead humps on this. So really clearly, big data techniques exploit this non-von Neumann processing. And that's what we want to talk about in particular, how that capability can be useful to folks. Now, we do a terrible job in IT at naming things. Have you had somebody come up and ask you what NoSQL means? All the time. All right. And the answer, of course, is, well, it's both no SQL and not only SQL. Well, it can't be both, right? But nevertheless, people interpret what's going on there. And the idea is that SQL has been a tremendous success. We've taught college and university students for years and years about SQL. And when a business knows it needs to report, they will typically come in and say, I know I need to do reporting, just as the client at the beginning said, but what can I do and how can I incorporate this in here? So the motivations are that it's less complex in design.
It does give us the ability to scale into multiple processing pieces right away. And it does give you a little bit of control over the data in some cases here. So again, we're going to dive into it a little bit further on that, but that's our NoSQL. It's a set of techniques that allow us to go in and mix our old technologies with our new ones. I was at a SAS users group meeting the other day, and even SAS has a PROC SQL now that they can go in and do queries with. Hadoop is the main emblematic piece that goes into what's happening around big data and technologies. And the idea is that it was a great start on the whole process. This is one of my favorite pieces, though, because I like to show this to groups. This is one of our Data Blueprint Hadoop clusters. People go, oh, wow, that's really cool, right? And it's true, it is. Those of you that are familiar, however, know that, well, let's go back to the beginning: we need at least two computers to control what's happening here. So the most parallelism we can bring in is three pieces. We're taking one task and breaking it up into three pieces. That's not going to help. We're really talking about large numbers there, and trying to get to more rapid deployment and clusters of things. So the idea is not a problem that can be divided up into two parts or six parts, but a problem that can be divided up into 10,000, 100,000, a million different parts in order to go and do this. Because that allows us now to go out and work in these areas. So hopefully the first bit back to the question is: reporting is probably not your best area for big data, unless you're encountering a specific problem. Let me give an example. On one of the projects, we came in after some other company had taken a bunch of these files and moved them directly into an Oracle database. And the end-of-day job was running 45 hours. That's just not going to work. So might there be an opportunity there to exploit parallelism? Potentially. Is it necessarily a big data application? Maybe not. So that's sort of the way to think about them. And there are some very good cases on this: if you're modeling risk, if you're doing churn analysis. We were working with Vodafone over in South Africa, and they were so sophisticated in their ability to run data on their customer base that they could tell, six months before a customer could, that they were going to be no longer a customer of Vodafone, just by looking at the patterns that were in the data. Now, they didn't just sit back and look at that. They'd say, okay, this person looks like they're not going to have a contract with us in six months. What can we do about it? And that's where they were then able to apply other types of things in order to do that. What you're really looking at is sort of operational efficiencies or revenue growth or risk reduction areas as being part of the area that goes in. This is how it's typically portrayed. Let me give you a little bit more basic context around this. We use this in computer science, and we call it the CAP theorem. I apologize for getting technical on you guys, but it stands for consistency, availability, and partition tolerance. Now, we're going to deal with the first of these, which is that when you go to an ATM machine and withdraw $10 from the ATM, you want that transaction to be 100% correct, and the bank wants that transaction to be 100% correct.
And we don't have lots of problems in that area, because we've been working with relational database management systems for years that have done a very good job of making sure this stuff works when it's supposed to work. We call this the ACID test. It stands for atomicity: we can identify the precise transaction. It will be consistent: the same transaction will occur over and over again, hundreds of millions, billions of times. The transaction can be isolated: we can find exactly where that transaction is in the log files, down to when it happened and what machine you were in front of and the picture it took of you when you were standing in front of the machine. And it's durable, in the sense that if we have a failure, we can recover. Either it got in or it didn't get in, but we know what happened either way, and it recovers from that. We add to that the availability piece, though. This is where it gets kind of interesting. And this is where NoSQL does a really good job, but we call this BASE, because the transaction is basically available. It means that the data and the response to the analysis of the data will be available, but that the answer will be in what we call a soft state. $10-ish, right? You don't want to hear the bank say ish, do you? And eventually we'll get to consistency. So we're going to come back to this diagram in just a little bit, because it really does represent one of the best ways that we can do this. Now, Micah, I'm sure that whoever your customer is, if I've been babbling at them like this, they've gone to sleep. So help me translate this a little bit just at this point to say where things are, right? In other words, we've got new computing technology. We've got flash drives. Do the customers want to hear about these things? Sometimes they do. They're interested, right? Cool stuff. They're interested. But mostly they want to know, back to our earlier conversation, they want to know about value. What are they going to get for it? And how do they separate out what they have to do from what they're going to get? So if your reporting context needed things that were basically available, in soft state, with eventual consistency, as opposed to atomicity, consistency, isolation, and durability, then we may be looking at a solution that might involve some big data technologies. Absolutely. All right. Let's take it a little bit further. We also, with that, can increase the value of RAM and make huge, huge, huge servers where everything literally is loaded into RAM. So I want you to picture here that this is not 13 Pac-Men, but it's maybe 13 million Pac-Men. That's about how many nodes we have in our computing matrix here. And we're going to take the problem and decompose whatever problem we have into 13 million parts. And we're going to send a Pac-Person against each one of those 13 million parts to chew on them and chew on them. Notice two things are critically important about this type of processing: it has to be decomposable, and we have to reassemble it. If we don't, we haven't added any value. But once again, we're not really going to talk to customers about this, are we? We're going to talk about increased capabilities, and how that relates specifically to some of the technologies and business problems that we have. And business results. Again, it doesn't do any good to just help them with technology; it has to help them with business results. Thank you for correcting me on that, absolutely. So, one more piece out of this, which is parallel-friendly approaches.
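Before going further, here's a minimal sketch of that decompose, process-in-parallel, reassemble pattern, shrunk down to a single machine. It's the same shape as what Hadoop-style MapReduce does across thousands of nodes; this toy word counter ignores all of the distribution and fault-tolerance machinery, and the four-worker split is just an illustrative choice.

```python
from collections import Counter
from multiprocessing import Pool

def count_words(chunk):
    """Map step: each worker counts words in its own chunk independently."""
    return Counter(chunk.split())

def parallel_word_count(text, workers=4):
    lines = text.splitlines()
    # Decompose: carve the input into roughly worker-sized chunks of lines.
    size = max(1, len(lines) // workers)
    chunks = ["\n".join(lines[i:i + size]) for i in range(0, len(lines), size)]
    with Pool(workers) as pool:
        partials = pool.map(count_words, chunks)
    # Reassemble (reduce): merge the partial counts. Without this step,
    # as noted above, we haven't added any value.
    total = Counter()
    for part in partials:
        total.update(part)
    return total

if __name__ == "__main__":
    print(parallel_word_count("big data\nbig tools\nbig value").most_common(1))
    # -> [('big', 3)]
```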
And what this talks about then is that if we could take a problem and easily break it down into small chunks, and process all those chunks the same way, consistently all the way through, we can come up with very good approaches to solving problems that we couldn't solve before. So here's the sort of circumstances where big data is good to think about in your entry-level places. We call it the sandwich use case. And it's a really wonderful analogy on here. We'll give credit at the end of the slides here. But the idea is that your big data might come in at a landing zone where you're just not familiar with what's going on. We were on the phone a few minutes ago with another client that was working on this. They said, we've never really looked at our data that's coming in. We sort of assumed certain characteristics about it. And of course we know now that that's not necessarily a good thing. But again, you don't want to replace your existing architecture. You want to complement your existing architecture. So we can take this landing zone, understand it, and then transform some pieces of that into something that we might use. Nowadays, they're calling landing zones data lakes. Again, unfortunately, we don't have good definitions around that. But it does make at least conceptual sense that when you don't know the data and you bring it in this way, it ends up working a little bit better on that. Same thing with offloading. When we have archival data, we can push it way, way out, so we simply don't need these things as readily. Now, another way of thinking about this is to think a little bit more about something that maybe people don't like. So I'll give you a warning: if you don't like snakes, tune the next five minutes out. Because it turns out that some snakes can actually see in the dark. You might ask yourself a question: how would the snakes see in the dark? Well, it turns out the answer is some snakes are equipped with infrared sensors on the sides of their heads. So as a snake is looking around and trying to figure its way: this is a pitch-black box that the snake clearly can't see anything in, and here's this Austrian researcher who's trying to get the snake to go after something. He's got a little bit of warm milk in this little bulb that he's squeezing back and forth. That's why that bulb is glowing, and that's what the snake is going after. And clearly the snake can see what's happening in this. So these types of snakes can do some really interesting things. And I want you to think about a snake that's crawling in the grass right now. He's crawling along. And the grass is maybe four inches high and the snake's pretty low to the ground. And the snake crawls up here next to you, Micah. That's a warm thing. I like to eat warm things. Do I want to eat Micah? Well, he could put his head up and look at you. I mean, he may be staring at your ankles, but you know, he's still going, okay, Micah's pretty big. If Micah instead was not Micah, but Micah was a mongoose, by the time the snake put his head up, the mongoose would have eaten his head. No more snake. So what the snakes have learned to do is they put these infrared pieces up, kind of like a radar, but it's infrared. And it says: warm probably tastes great, but a little bit big for my taste. So I'm going to keep my head down and slither my way back into the grass, so that I don't get eaten by a mongoose thereafter. And, gosh, here we go back again.
Does anybody really care about this technology stuff in the big data world? I'm still trying to connect with the snake, Peter. I was captivated by your story. I was waiting to see who was going to get eaten and where's the mongoose. Yeah, exactly. You mean, does anybody care about the radar technology? Well, we're talking about it in the context of a business discussion. So if we're having a conversation with a client, I think the answer is, as you said several times, they want business results. They don't care how you get them, but they are getting pushed, a lot of them, with very strong advertising messages that say these technologies do really cool things. And the answer is yes, these technologies do really cool things. Are those really cool things the things that will solve your problem? So I think one of the things that you and I talked about, Peter, as we were preparing for this, was obviously we talk a lot about data being fit for use. We talk here about tools being fit for purpose, and whether or not the tool that you're looking at is really fit for the purpose and the problem that you're trying to solve. It ties immediately back into the conversation about your big data landing zones being additive to the tools that you already used, not in place of the tools that you had before. So so much of what we are doing with our clients is helping them understand that big data isn't a substitute for what you already have. It is additive, and a way to solve a problem that you couldn't solve with the tools you already had. But that doesn't mean you get rid of the tried-and-true things; it's just that you're now able to solve bigger and more complex problems. I've never understood throwing the baby out with the bathwater, but I guess the image is you're holding a baby and you're like, I'm done with washing it, so I'm just going to pour it down the drain. Oops, I forgot to take the baby out. Right. I think that's even worse, but who knows. So let's take a look at how we actually describe this. This is something that we've tried over and over again and had very good results with. It's called the analytic insight cycle. And basically what happens is there's a pattern of some sort that somebody is looking to understand. In this particular one, I'm showing you a simulation called Boids. And what the researchers discovered was the reason that birds flock. And of course, you've seen this pattern perhaps in your life, that birds just flocking around look like they're doing something that's kind of coordinated, but it turns out what they're doing is quite simple. The important insight that came from this was that they could control the birds flying around like that with just three variables. Now, that's pretty interesting. So somebody, maybe in law enforcement, says, oh, maybe I could use those three variables to apply to crowd control if I have a potentially dangerous situation, such as we had in Charlottesville last month. Maybe that would have been a way of approaching that particular piece. So you get a pattern that emerges. You say, oh, okay, what did I see? The only way you can use that, though, is if you operationalize it, where we can take and pass this on to somebody else and say, hey, if you get a crowd of this type and you want to use these types of crowd control techniques, this might be something we could exploit in general. Now, the exploitation can only occur when we get it into a knowledge base.
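As an aside, here is a minimal sketch of the three-rule idea behind that flocking simulation: each bird steers toward the flock (cohesion), matches its neighbors' heading (alignment), and moves away from anyone too close (separation). The rule weights and the neighborhood radius here are illustrative assumptions, not the researchers' actual parameters.

```python
import numpy as np

def boids_step(pos, vel, sep_w=0.05, align_w=0.05, coh_w=0.01, dt=1.0):
    """One update of N birds; pos and vel are (N, 2) arrays."""
    center = pos.mean(axis=0)              # flock center of mass
    avg_vel = vel.mean(axis=0)             # flock average heading
    for i in range(len(pos)):
        coh = (center - pos[i]) * coh_w           # 1. steer toward the flock
        align = (avg_vel - vel[i]) * align_w      # 2. match neighbors' heading
        diff = pos[i] - pos                       # vectors away from every bird
        dist = np.linalg.norm(diff, axis=1)
        close = (dist > 0) & (dist < 1.0)         # too-close neighbors only
        sep = diff[close].sum(axis=0) * sep_w     # 3. steer away from them
        vel[i] += coh + align + sep
    return pos + vel * dt, vel

# Toy run: 30 birds wandering for 100 steps, just three knobs steering them.
rng = np.random.default_rng(0)
pos, vel = rng.normal(size=(30, 2)) * 5, rng.normal(size=(30, 2))
for _ in range(100):
    pos, vel = boids_step(pos, vel)
```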
Now, that's typically training subsequent groups, people that come down the line. Some organizations' approach is to actually put together knowledge bases in a formal capacity, and they manage it a little more formally. This is what we mean by the analytic insight cycle, because getting these insights, we need more of them, faster. That's just the way society is; there are demands for it in order to do this. So if we add in, on the left-hand side of the diagram here, volume, velocity, and variety, our three Vs or 13 Vs or however many we're talking about, we can now start to take a step back as to what actually is happening. We need these big data technologies to see more of what is happening here, not just direct observation, but looking out beyond the capabilities that we've had before. So we can sense things that are going on that we couldn't sense before. Again, the example I gave earlier: Vodafone knowing when their customers are getting ready to terminate their contract with them. We can go back to the patterns, the observations that are occurring there, and say, well, what are we actually seeing in there? And more importantly, we can also hypothesize what would happen if we did see this. For example, again, I'm a teacher down at a university, and if we see lots of students marching around in circles, we may ask the question, is the end of the world coming, right? Because students don't protest anything anymore these days. It just doesn't happen. But if we have that set up, we can actually do something like that. And then we can take the combinatorial value of data, pull both of these things together, to find out what really, really happens. And my favorite example of this is the three data breaches that we had up until last week, which were the biggest ones: the combination of Target losing all that customer data, of Ashley Madison, all that bad information that they had on that one, and then the Office of Personnel Management that had information on cleared people. So we could take people who'd been given a security clearance, who lost their data through that, and who happened to register on Ashley Madison with the same email address. Probably not the brightest thing to do, but we did have hundreds of thousands of them that did, out of the 47 million people that were in there. It was very disappointing to see that. And then we can find out about their shopping habits at Target, which means that the bad guys who have all this data can now start to approach what we consider to be the worst national security threat we've ever faced in this country. So that's just an example of this. And this is really what we're trying to do: get past this bottleneck. And I'm going to drop us back to a couple of really interesting insights that one of my heroes, J.C.R. Licklider, put together. This was done in the late 1950s. And he said, look, what are humans good at? And he listed a bunch of things. And he said, what are machines good at? And he listed a bunch of things. It turns out this taxonomy has held up for a long time. And his observation was the best approaches are going to be the right combination of human and automated methods. The proper balance between those two is really what we strive for here at Data Blueprint, to help our customers know where they should apply data techniques and where they shouldn't. Let's go a little bit further. And, Micah, I'm going to turn your original question back into: how does one approach these projects?
And I've already said IT projects are largely unsuccessful, two-thirds unsuccessful. It's okay for us to be immature currently, because we've only been doing it for a little while, but it's not okay for us to remain immature. We need to do better than that. So the first change is something that Gartner recommends, which is a wonderful recommendation: don't treat these things as IT projects. Instead, take a look at them and say, these projects have no precedent. If they have no precedent, the word innovation applies very, very well to them. And there is a different way of running innovation projects than you run an IT project. We can talk a little bit about that. I don't know if we'll have time today, but maybe that's another webinar that we could pull together. Because creative thinking also helps us to look at, and this is the other part of the conversation we had with the customer just before starting the webinar here. The customer said, I don't think we've looked at our internal data very much. We've been relying on external data that comes in, but we don't have a good idea of what we have existing. And I said to them, don't worry, you're not alone. The average company has lots and lots of dark data lying around; they don't know that they don't know that they have it. And that's a problem for them. And that under-utilizes the results that you have. Now, be careful, though, on the inventory side of this, because this is a trap we've seen get a lot of CDOs in trouble. Again, conventional guidance to CDOs is: go out there and do a data inventory of all your data. In 33 years, I've never seen anybody finish a data inventory. And of course, in a project mode, they want to know what can be done in two weeks. Is it going to take you three weeks, four weeks, four months, four years? True story: one of the big banks out there decided they wanted to purge their environment of Access databases. So they cataloged 400,000 production Access databases, which in and of itself was a feat. Then they actually got rid of them. It took them almost seven years to do so, but they did a great job. One of the best programmatic implementations I've seen across a large number of opportunities here. So be careful with that inventory piece. Finally, these projects create the ability to do analysis faster, but there's no way an IT person can figure out what the business value of that analytics is. Again, Micah, it goes back to your point: if we can't codify the business value, you're almost always going to need a process redesign effort as part of all that. So let's take a look at innovation then. Innovation is different from invention in that innovation refers to a better, and as a result, novel idea. And innovation also differs from improvement in that innovation refers to the notion of doing something different, as opposed to doing the same things better. So we're going to do different things, and we're going to do them in different ways. That's two different things. And that puts you out in a place where not many people are really, really comfortable. I mean, it's kind of a wild landscape out there. Where do you go? Well, there are lots of techniques, as I mentioned. We're not going to get into them here. I'll give you a couple little bits on it, though. Again, you use data to keep track of the innovation process. You measure change, you motivate change, and you manage change in order to do that. Similarly, you also then use data to innovate.
Wow, what if I took not just the radar sensor from the snakes, maybe I could incorporate it into those Google Glasses that I wear, and be safe walking around dark places or something like that. Lots and lots of different ways of doing this, and there's a whole series of these things that you can pull together in order to come up with specifics on it. We won't go into those bits and pieces here, but it is something that you can take a look at. You can see, again, data could be just an absolutely crucial piece in getting these measured. And the tools that we need are going to be data-based as well in order to do that. Then we want to get into a concept that is near and dear to my heart. Again, this, by the way, is Licklider sitting up there in front of what he believed the home computer of the future would look like. Didn't quite work out that way, but I like the picture. And it turns out this comes up every time you do a re-engineering effort. So the first question is: how can you state that you've improved any system if you don't understand the legacy system's strengths and weaknesses? Everybody's going to have some good stuff and some bad stuff. And we want to know what the differences are, because we want to keep the good stuff and we want to try to improve the bad stuff. So the value proposition then is making sure that we do understand that and couple that with the new pieces. That's legacy systems analysis: you must first reverse engineer the existing system and then use that information to add value to whatever those new system requirements are, in order to come up with a new system. But Micah, I'm just worried. I've taken you so far. You just asked me for a new report, right? How do we reconcile all of this? How do you have a conversation with a customer that actually starts to work in that context? I think it goes back to the fit-for-purpose question and making sure that you're helping the client think through: what is it that you have down pat that doesn't need to change and doesn't require any re-engineering? And what new problem, what new question are you struggling to really solve for? How do you identify that, and then figure out the right tool, the right technology, and the right data to help you address that new question? And of course that gets us back to the innovation component. That's exactly right. I'm going to dive into it, sure. And one last piece that we'll add to this, which takes us back to the diagram that we had originally; we call it throwaway prototyping. And I know that's just completely antithetical to most people. So here's, again, our CAP theorem. We talked through these pieces, and I didn't put one piece of the diagram up here. We talked about how NoSQL and relational database management technologies work, but the piece I didn't show earlier is: what if I want consistency and availability? In other words, we've covered consistency and partition tolerance, and we've covered partition tolerance and availability. It turns out what works really well in this corner is small datasets and throwaway prototypes. So first of all, a throwaway prototype is a prototype where we simply plan on trashing the thing, right? We take the area of greatest risk, innovation, whatever it is that we're measuring, and we prototype it. We build something, but we don't build it with the idea that anything we're doing will likely be in the final system. But because we're going to throw it away, we can beat the heck out of it.
We can do all kinds of different things, paint it blue, whatever it is we need to do in order to come up with it. And this throwaway prototyping method is incredibly important, because if we have the ability to simply... This requires patience on the part of our users, where we have to say, we're going to build this for you. Actually, I used to say this a lot: we're going to build it wrong for you three times, right? Oh, my God, why would I pay you to do that? Well, if I can build it wrong three times in the time it takes most people to build it wrong once, I'm still ahead. And hopefully, if I learn from those three successive failures, the fourth time I put it out there, it will actually be maybe more correct. So what do you say to those people who say, you've put in all that work, now you're going to throw it away? You've planned for it up front; you have to plan for it. And this is a hard change, and it's something we work with folks on in order to pull that off. So these are the three pieces, innovation, re-engineering, and throwaway prototyping, that we see help most organizations become more successful with their approaches to big data. They don't know the tools. They don't know what skill sets to hire. If you're building a throwaway prototype, you're probably not going to hire an army of data scientists to throw something away. So we can try something with a low-risk approach, in a way that we can start to learn around that process. Of course, you might have also alluded to the fact that there are some things that are just going to be problems no matter what. And David Brooks, of all people, a political commentator, put together a really good data column in the New York Times a couple of years ago that we've tried to capture here, where he talks about what these things can do. Big data, data in general, is never going to tell you which of the thousand friends you have on your Facebook site are the ones that are going to really make your heart beat, because it's a friend you haven't seen since high school and you're really excited about, wow, what's been happening for the last 20 years, to catch up with them, right? It can count your friends to no end, but it can't tell you which of those friends you really care about. That's huge. Similarly, data is always going to be struggling with context. Again, we use the number 42 an awful lot and toss it around, but if you don't know The Hitchhiker's Guide to the Galaxy, the number 42 is not going to be terribly amusing to you. I do have to say that in 33 years of doing this, I have never failed, in any part of the world, to have at least one person in the room who knew The Hitchhiker's Guide to the Galaxy, but it has come down to one person sometimes. Data creates these incredibly large haystacks, which means you're going to get more correlations. Correlations, however, do not equal causation, and we've got to be able to grow into that as well. It has big trouble with big problems. I've never convinced somebody who was red that the blue issue was right, and I've never convinced a blue person that a red issue was right, because data does not have an impact on those decisions. So all the big data in the world is not going to change political opinions. However, we can make this statement: it does look like the Russians may have figured out how to use big data to swing elections. Keep tuned on that one. We'll find out. That'll be very interesting to see what happens. It favors memes over masterpieces.
Cat videos are always going to win out over real artwork on there, and data, of course, obscures values, given that. So to get back to the top here, I'd like to come back to Maslow's hierarchy of needs, and just get people to really remember what Maslow's real insight was. You have to have a series of necessary but insufficient conditions to get to where you want to be. We now call self-actualization flow: you're in the moment and really operating at your highest capacity. But if you have food, clothing, and shelter needs that are unmet, you'll never be safe. If you're never safe, you can never get to love and belonging. If you don't have any love and belonging, you'll never have self-esteem, and if you don't have self-esteem, you will never get to self-actualization. And our data world is very much like that. There are these wonderful things in the pyramid that are largely technology-based, but people don't know that this requires a foundation. And if we have the foundational practices in place, governance following the strategy, architecture, and operations, now we can start to make progress, because whatever we're doing up in the top half of the pyramid is supported by the bottom half of the pyramid. If we don't have the bottom half of the pyramid, or we don't know that we need the bottom half of the pyramid, everything else ends up taking longer, costing more, and delivering less, while at the same time presenting more risk to the organization. So rather than concentrate on the technology pieces, what we'd really like to do is help organizations do both the technology and the capabilities in order to pull that up. There's a couple of examples here we can go through real quick, which is just to take a look and see what's happening out there in social sentiment. And the movie studios have gotten pretty good at this. So they don't really, I hate to say it, they don't care what the critics think anymore. What they do is they look at the Twitter feeds that are coming out of the movie theaters on opening night of the movies and decide which movies are going to fail or not, based on the Twitter feed, because it is more accurate than the critics. Don't get me wrong, I like reading critical reviews and things like that, but we're finding out that Twitter is actually better at doing that. So there's a really great approach to that. Obviously, I think this begins to speak to the value in being predictive, right? The way that you're using this analytical tool and technology is to predict the value of that movie. The critic's review is the critic's review, for a small slice of people, and carries with it all kinds of bias that we talked about earlier. Yeah, maybe only readers of the New York Times or something like that, whereas Twitter, of course, is ubiquitous. That's right. And we have to say President Trump has been somewhat effective in that area as well. Speaking directly to the people, right? There you go. That's one of his things. Another one is to take a look at how in real time you can start looking at pricing of books. Now, with 10 books out on Amazon right now, I watch those real keenly, not that it makes any difference at all, because when the price goes down or the price goes up, I don't make any more or any less. I just find it interesting to watch. But as you're starting to look at these, one of the things we're seeing is arbitrage that's occurring between eBay, what used to be half.com, and the Amazon marketplaces.
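As a toy sketch of the kind of cross-marketplace price check those arbitrage routines automate: scan a list of books, compare prices on the two marketplaces, and flag the profitable spreads. The two price-feed functions here are hypothetical stand-ins, not real APIs, and real routines would also account for fees and shipping.

```python
def find_arbitrage(isbns, get_ebay_price, get_amazon_price, min_margin=5.00):
    """Return books priced low enough on one marketplace to resell on the other."""
    opportunities = []
    for isbn in isbns:
        ebay, amazon = get_ebay_price(isbn), get_amazon_price(isbn)
        if ebay is None or amazon is None:
            continue                         # no listing on one side; skip
        spread = amazon - ebay               # buy on eBay, sell on Amazon
        if spread >= min_margin:             # ignoring fees and shipping here
            opportunities.append((isbn, ebay, amazon, spread))
    return sorted(opportunities, key=lambda o: -o[3])

# Hypothetical price feeds for illustration only.
ebay_prices = {"978-0000000000": 12.00}.get
amazon_prices = {"978-0000000000": 24.50}.get
print(find_arbitrage(["978-0000000000"], ebay_prices, amazon_prices))
# -> [('978-0000000000', 12.0, 24.5, 12.5)]
```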
Back on the book pricing: people have actually set up, you know, some very sophisticated routines to go back and forth and see what happens when they put certain books on the marketplace. We've got a really interesting test right now, because two data strategy books came out in the same month, mine and another fellow's, and we'll see how the two of them do. We'll be able to compare them side by side. Let's take, actually, two more examples here. Patient data: looking at this, the hospitals are just going crazy. They've got all of their data in various different systems, and it's extremely annoying for them to have to go over to the radiology system to get the radiology data, then to the lab system to get the lab results, and then to get the other pieces. It just takes an awful lot of time and energy to pull this stuff together. So there are some approaches in the healthcare world looking at how big data can help build a more holistic picture of what's happening. You can see a couple of examples in the Wall Street Journal and other places that talk about this. And our final example here is a loyalty program, which is really just trying to find out what's happening with the customer. Now, most people don't know this. I mentioned the Apple event that's going on right now; Steve Jobs was very much against apps on the iPhone. He wanted it to be a pure device so he could control the whole ecosystem. But what really drove the development of apps on our phones is data, because when you have an app, you own the data that's associated with that app. So while we could look up Amazon on the iPhone and get some information, if Amazon actually has its app on the iPhone, then they've got the data that's generated right there with it. And Netflix: why would Netflix want to know where you're watching movies? Well, it helps them position their logistics chain so they can move things around the whole process and come up with an awful lot of really interesting, exciting pieces. Oh, now, Micah, you're trying to distract me. You've got the iPhone X up in front of you. We'll look at that after the hour. All right, so here we are back at the top of the hour. We gave you some good references here to take a look at. Of course, we're very happy to engage you in a conversation around this so we can talk about specific issues, but it's time to turn it back over to Shannon and see what sort of questions folks have for us. All righty. That was a great presentation from both of you. Thank you so much. And Micah, it's so nice to have you joining us this month. If you have questions for Peter and Micah, just go ahead and submit them in the bottom right-hand corner of your screen in the Q&A section. And just to answer the most commonly asked questions, we will be sending a follow-up email by end of day Thursday with links to the slides, links to the recording of the session, and anything else requested throughout. Everyone's being very quiet right now, though, guys. They're all watching me. I thought that was the Apple thing. Yeah, that's exactly it. No questions today. That's very unusual for our community. All right, I'll give you guys just a few more minutes. Anything you want to add? Anything that came to mind while you were speaking? Nothing new for me. It was a pleasure to be here with you, Shannon, and great to meet the community for the first time, and I've enjoyed having this time to chat with Peter. That's great.
One comment says that's a lot of info. Yeah, you guys always pack in a lot. It's awesome. I love it. The recording definitely helps, to go back and listen. So here we've got a question: how do you ensure that business-driven data projects are governed? That's a great question. So one of the things to look for is how the organization is set up structurally. What we end up fighting an awful lot of the time, and I do mean fighting, is organizational culture. Well, I'm sorry, we don't do it that way here. Well, the definition of insanity is doing the same thing over and over again and expecting different results. So one of the things we check as we're setting up a project is whether it has the requisite executive support, some sort of a champion, and, again, a clear business case, as Micah said several times during the presentation. Without those elements, it's very difficult to make any of this successful. I was going to say the same thing. A data project is no different from any other kind of project; I think it just comes with the added complexity that you're often trying to connect an IT user and a business user around something that is shared between them, and then trying to figure out who plays what role. But that is a very common organizational change management problem, and those projects always require very clear sponsorship, very clear executive ownership. My recommendation is always to make sure you've got those things understood before really embarking on it. Oftentimes an easy test case is the place to start, to prove out the value, and to make sure you've got interested people on both the IT side and the business side who can work together, seek out a sponsor, and hold that sponsor up to lead the effort. And we'll just take you back to our opening questions, too: really consider the messenger, ask what big data technology is good at, and then ask what the successful approaches have been. We think we've got a recipe here that allows more organizations to be more successful more rapidly in this environment. Alrighty, so what are some of the tools healthcare companies are using to consolidate data among the various departments? Well, if I just started listing them, it probably wouldn't be very helpful to you. Let's say that most healthcare organizations are faced with a much bigger legacy challenge than the typical environment that we see. We've actually built systems that integrated laboratories together. For one of our long-term customers, we automated her whole laboratory. And each little piece of equipment comes with its own little piece of data, its own interface standards, its own ways of interacting with the rest of the laboratory, or not. It is a big challenge. So healthcare is one of the most challenged spaces. And then when you ask what people are using: we see an awful lot of technology-based solutions, but mostly what's happening these days is that people are trying to consolidate around electronic medical records. Just yesterday, in a conversation with a healthcare group, they specifically mentioned that they had decided to move to EPIC as their system, away from their own proprietary EMR. And the individual I was talking to said, but I'm not worried about that, because I know how long the procurement process is going to take, and I'll be retired before they even start installing the software.
So he was not worried about it, and that's obviously not getting at some of the strategy issues. Micah, anything on that? I think the other thing I would add is that we talked about fit for purpose, about making sure that you're finding the right tool. The healthcare environment isn't changing at the same pace as Apple, or maybe not Apple, but Amazon. The Amazon world changes second by second, minute by minute, in terms of dynamic pricing. There's a whole different set of demands on the healthcare world, which probably requires a different view into the kinds of tools that you're selecting for use, and making sure that you're solving the problem you need to solve for your patients and your stakeholders. I'll add one more piece, too. When we see tool-based integration, what we really think most people should start out with is integrating the data, because the data is your sole non-depletable, non-degrading, durable, strategic asset that's pervasive across the organization, and adding a tool to that mix without considering the impact on the data architecture is going to be hugely problematic. We've untangled enough of those to know that you really need to go back and start with the data integration, and then work from there into your fit-for-purpose scenarios. How would you do data governance on big data solutions? The first thing is to realize that data governance is really about managing a resource. It's just the same as if I came along and said, I've got this great technology that'll grow human beings really, really fast: if we need a bunch of new employees, we can sprinkle some seeds in the backyard, and by the end of the week we'll have a whole new crop of employees. Excuse me, I didn't mean to cough on that one. I would very much be suspect as to what was really happening in there. And if I then told you, oh yeah, by the way, radiation is a component of that solution? Oh, no, I don't think that's the way we want to go. Mutants, you know, whatever it is, that's going to come out of the thing. So it really is absolutely crucial to go back to the business problem that's being solved. IT is here to serve the business, and what we're trying to do is help the businesses get the results they're looking for. And the fundamental roles in governance don't change, right? You still have a data steward. You still have an IT owner. You still have those key roles in the data governance process. They just might insert themselves at different points along the path of using that data. And I think that gets back to how you apply that solution in your organization, given your culture and given the tools that you're using. Those of you who have seen me at conferences know that I say that when you're writing your data governance charter, grab for everything you possibly can. Tell them that data governance has control over IT projects. You'd be surprised how many people don't know what they're signing when they sign that off. And at the very least, you can come back later and negotiate a solution. But again, we see very, very often that organizations will simply go out and buy new stuff. We're actually aware of a healthcare organization, I think we're doing some work for them, that had a very good CDO who came in and said, I'm going to turn everything over to big data techniques. They actually used the right word.
But unfortunately he wasn't able to; payroll is not the right application for these things. And he got held to it a year after he said he was going to do this, and the board said, I'm sorry, we don't think that's the right direction to go, so you can't be the CDO anymore. And they're now back to a more rational approach, which is, as Micah said, complementarity rather than replacement. I love it. So what do you do to help people find what's in the lake? Data lakes are so hot right now, and that's why we're getting that question. I pulled together a slide on data lakes but didn't include it in this deck. The first problem with data lakes, of course, is vocabulary. What do you mean by a data lake? Most people think it's that place where you throw the data on intake, where I showed it earlier on that diagram, the sandwich model we were talking about. Let's pull that one back up. But there are actually quite a number of capabilities involved. If you have a data lake with good profiling technologies that you can apply to it, that becomes very, very useful in terms of just looking at the data before you decide to ingest it; there's a small sketch of that idea below. If we take the model we're showing here and implement it as a security model, this would be the DMZ. In security terms, that means the data has come over the first set of firewalls, and we're going to hold it right here until we figure out exactly whether it's a Trojan horse or a real horse, and whether we can actually use this stuff. That's another aspect of it. How do you see what's in the data lake? The best way is to know what you're putting in the data lake in the first place. Of course, for most organizations, that's the very reason they want a data lake: so they don't have to go through the process of deciding what they're putting into it. The word swamp comes up in this context quite a lot as well. A swamp would obviously be harder to see into than a lake, but I don't know about most of you, I can't see the bottom of a 30-foot lake either. You need some technologies to do it, and big data technologies may be a good approach, depending on your business needs, for seeing whether there's anything in the data lake that looks like the diamond ring that fell off your finger while you were fishing. That's probably really stretching the analogy, but picking a diamond out of the bottom of the lake, even if it's kind of muddy, maybe a magnet as well, you know, that might be good. What have the tides been doing lately? Are there turtles that eat diamond rings? Sorry, Shannon, we can get a little silly on that, but it's a good question. Can you make that quite literal? I could take a diamond or two. That would be lovely. So how do you distinguish between agile and throwaway prototyping? Excellent question. Well done. No, go ahead. It's a fantastic question. Yes, it is. Agile is a really good way of developing quality software faster, but it is really focused on software. The idea that you're going to create data and solve data requirements in an agile sprint is absurd. So our rule, in guiding organizations that are doing this, is: if your data requirements are in good shape, go ahead and do agile. It is a wonderful way to work. But if you are in the middle of an agile sprint and somebody questions a data requirement, you need to stop programming right away, because that can involve different structural requirements for the program. I've got another whole talk on that.
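On that profiling point from the data lake answer, here is a minimal sketch of what looking at the data before you decide to ingest it can mean in practice. It's our own toy, assuming pandas and a made-up landing-zone file path; it is not a product recommendation.

import pandas as pd

def profile(df: pd.DataFrame) -> pd.DataFrame:
    """One summary row per column: enough to decide whether a feed belongs in the lake."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),
        "null_pct": (df.isna().mean() * 100).round(1),
        "distinct": df.nunique(),
    })

# Hypothetical usage against a file sitting in the DMZ awaiting an ingest decision:
# print(profile(pd.read_csv("landing_zone/claims_feed.csv"))
#           .sort_values("null_pct", ascending=False))

A column that arrives 95 percent null, or with a single distinct value, is a Trojan horse you can turn away at the DMZ rather than discover in the swamp later.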
Shannon, maybe we should pull that agile talk out at some point and do that one as well. Agile is very good at what agile is supposed to do, which is help you develop improved software. But don't mix up your data development, which is a programmatic activity, with your software development, which is a project-oriented activity. I would add to that and say that agile is a project management approach to delivering software, so we don't go down the path of: give me a year and I'll bring it to you, and I'm sure you'll like it. It's a way to make sure that you've got the user, the customer, the performer, and the team all aligned throughout the process, and that you get bite-sized, useful pieces of technology throughout the development cycle. The idea of a throwaway prototype is simply that: let me build this thing as a prototype, and it may become the beginning of an agile sprint. You may say, wow, that prototype is really cool, I need to develop it. It goes back to the idea of innovation and how you develop for innovation differently from how you develop for production. You would enter into an agile project to solve a specific business problem. You would enter into a throwaway prototype because you're not quite sure where you're going with it. It's a very different kind of financial model in terms of how you're going to invest in it, and a very different kind of delivery model. I think the agile proposition is: I already know what I want to get out of it; I just don't want to commit to a year and a million dollars. I want to commit to $10,000 a week, and I'm okay that it takes a little longer to get all the functionality I want, because I know that each piece I get will be accretive. I'm going to assure everybody who's listening, we did not plant that question, but we do happen to have a website that we've been working on for a little while that brings some of the concepts from the agile community into a thing we call the Data Doctrine. It talks about the differences between the two. We won't go into them here, but that's another topic we can dive into. If you're interested, take a look at that website. Hopefully you can see it. There you go: datadoctrine.com. That's great. It's a great time to be thinking about new topics as we're putting together 2018. I love it. So we talked about data governance on big data, but how do you implement governance on big data without having a schema or structure? Governance is about an asset, and the schema is about the structure of the asset. One of the things governance can do is help set up guidelines for organizations to address exactly the last question you had: when is it appropriate to use throwaway prototyping, and when is it appropriate to use agile sprint methods for software development? Governance operates at the policy level, and it focuses on providing directional information so that people know what they're supposed to do. Again, think of the Ten Commandments: don't kill people, don't steal stuff. I think there are a couple more that go in there as well. That's general guidance, and that's really what data governance should be, or should certainly start out being. It can get more specific as we find that more specific governance benefits the business, but governance is also an ROI-based exercise. If you just do governance without showing any benefits from it, you will have a major challenge.
One of our favorite things to do is to go in and revitalize a governance effort that has run out of steam or become overwhelmed with way too much bureaucracy and complexity and too little value. So here's a very interesting question for you. On NoSQL versus relational databases: is it correct to say that sensitive, money-related data will more often be kept in relational databases, and only publicly available data will be stored and processed by big data technologies? So, I would not say those are absolute rules. And of course, now that we know about the Ashley Madison, Target, and OPM data breaches, we can definitely tell you there's some money-related data sitting out there as well. But it's a very good question. Are there certain types of data, money being one of them, that should be relational? The point is, if you need consistency and fault tolerance, because we don't want it to fail, then those more sensitive things should be in the areas that are better protected; there's a small sketch of that consistency point below. One of the interesting things about the, I would say, Equifax data breach is that Congress has now written to the company and asked them 13 very specific questions, such as: were you paying people on the outside to try to break in, which is considered good practice in the industry? And if I put the data out in a NoSQL database, it generally has far fewer security features available to it than the well-established world of relational databases. If that's where the data was sitting, I think they may have a bit of legal exposure there. We'll just call it that. So I don't think you can make it as simple as saying only this type of database will hold only this type of data, but that flavor is definitely what we're seeing in the market. Interesting. So, which applications give a snapshot to compare and evaluate tools for consolidating or linking client-specific data? For example, the advantages and disadvantages of each tool? That's worth 30 points on your test, and for the next question we'll give you another 30 points toward that master's degree, right? Wow. Great question. Very hard. We can't take you tool by tool; we don't have an interactive dialogue, so I apologize for that. I flip past these landscape slides, mainly because they're just mind-bogglingly large. When you look at the number of different tools that are out there, what we'd like you to think about, rather than tools, is capabilities. Because if you tell me, I have these types of capabilities that I need, that eliminates an entire section of this diagram that we don't have to go through and figure out. But as you can see, there are many different tools that do many different things, and if you go at it tool by tool, it's a lot of work. You're going to have to become quite expert at a lot of different things that won't end up helping with your problem, because you learn about a tool and then say, oh, that's not going to help. So really, the direction we'd rather go is more toward this type of diagram, where somebody says, okay, is it relational or not? And if it's not, are we going to use grid computing? There are some other high-level classifications, and then we can follow those pathways. I'm not saying these are good or bad diagrams; they're each good in their own way, but it goes back to what's fit for purpose.
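Before moving on to tool evaluation, one way to ground that consistency point: money data wants atomic, all-or-nothing updates, which relational engines give you out of the box. Here is a minimal sketch of ours using Python's built-in sqlite3 module as a stand-in for any relational database; the account names and amounts are invented for illustration.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts ("
             "id TEXT PRIMARY KEY, "
             "balance INTEGER NOT NULL CHECK (balance >= 0))")
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 20)])
conn.commit()

def transfer(src: str, dst: str, amount: int) -> None:
    try:
        with conn:  # one transaction: commits on success, rolls back on error
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE id = ?",
                         (amount, src))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE id = ?",
                         (amount, dst))
    except sqlite3.IntegrityError:
        print("transfer rejected; neither balance changed")

transfer("alice", "bob", 500)  # overdraws alice, violating the CHECK constraint
print(conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall())
# [('alice', 100), ('bob', 20)]: the half-applied debit was rolled back

Getting that same neither-or-both guarantee out of an eventually consistent NoSQL store is left to the application programmer, which is the flavor the answer above is describing.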
So if you're trying to figure out how to evaluate tools, the best thing to do is take a look at one of those reports that do the big overview, the Gartner Magic Quadrant diagrams, or Forrester, or any of the other wonderful advisory firms that cover this space. Or give us a call; we'll be glad to talk to you. All righty. Well, it looks like that's the end of the questions. I'll give everyone a couple of quick seconds if you have any additional questions, and just put out a reminder that I will send a follow-up email by end of day Thursday with links to the slides and links to the recording of the session. Yeah, I think that's about it. Peter and Micah, thank you so much for another great presentation. Again, Micah, thank you for joining us this month. Always a pleasure talking with you, and thanks to all of our attendees for being so engaged in everything we do. We just love it. And thanks for attending during the Apple announcement; we love that even more. So I hope everyone has a great day. Peter, Micah, thanks. Thanks, Shannon. Bye, everybody.