So welcome, everybody. My talk this morning is about big data for good and evil: lessons from the NSA PRISM scandal. My name is Jason Bloomberg. I am president of ZapThink. We are an industry advisory firm focusing on enterprise architecture, service-oriented architecture, and cloud computing. We've been around for about a dozen years, and we were acquired two years ago by Dovel Technologies, a US government contractor, so we have special insight into the ins and outs of US government technology. I've written four books, and I'll be giving away the fourth book, The Agile Architecture Revolution, in a drawing at the end of the session. So just drop your business card, or a scrap of paper with your name on it, into the bin, the waste paper basket going around the room.

Okay, so let's get started. We're talking about big data, and this is a marketing term: a lot of hype, a lot of noise. So how do we define big data? Well, we define it essentially as data sets whose size is beyond the ability of traditional or typical software to deal with. So if you have very large data sets and the existing software isn't up to the task, if you need cutting-edge technology, something at the bleeding edge, something that is still immature, and that's the only way you have to deal with those data sets, then those data sets constitute big data. If you think about this definition, it is relative. As tools mature, what constitutes big data continues to grow, and that is a fundamental part of understanding what big data are all about: big data is an inherently dynamic term.

So, the 2012 big data technology landscape. Somebody put this slide together, and the reason I put it up here is that it is completely out of date. 2012 is oh so last year. This is also part of the big data challenge: the technology context continues to evolve. So you'll see some more traditional, established tools on this slide.
You'll see some new areas, of course quite a bit of open source, quite a bit of Linux-based technology here, but all of it is in flux, is in development. And I'm sure there are plenty of tools and technologies that aren't on the list because they're just too new to make the 2012 slide. This is a challenge in the marketplace. For large organizations looking to deal with big data challenges, it's hard to go shopping for technology. This is a very new area.

So if big data sets are always growing, we can extend the same argument into the past. We've always had big data problems. Whatever the data challenge of the day is, that is your big data challenge. Come on in. It's like nobody wants to sit in the front row, like the presenter's going to bite or something. Except the people wandering in at the end, they have no other choice. There we are, very good.

So if you take the big data challenge and extend it into the past, this tool here, the Hollerith punch card counter, was a big data tool in its day. We weren't able to automate census data collection until this tool came along, and it let us put the census data onto punch cards, which were also a big data tool in their day. So this is part of the story. It's not like big data are new or special or different. We have always had big data problems, and we always will, because there's always some sort of technology at the bleeding edge that enables us to deal with a whole set of problems we weren't able to deal with before.

So if you think about it, people say that the quantity of data in this world doubles every two years, or whatever the factor is. Well, if you look at all the data you have in your organization, some of it is more than two years old, and some of it is less than two years old. And whatever that line is, whether it's two years or a year and a half, about half of your data is older than that point and the other half is not as old as that point.
So at any point in time, half of your data are older than that figure: they're more than two years old, all of your historical data. That is part of your big data challenge. Of all the data you're dealing with, a certain part of it is old data, and you have a special set of challenges for your old data. You have the media issues, in terms of the floppy drive problem or the laser disc problem. You have the data format issues: it's a FoxBase-format file, how are you going to deal with that? And you always have those problems. Again, it's just a moving target. Come on in, plenty of room, plenty of room. This is a moving target, and it will always be a moving target, and it always has been. Yesterday's preferred file formats are today's obsolete file formats, and today's preferred file formats are going to be tomorrow's obsolete file formats. So it continues to evolve. It continues to be a challenge.

And one of the big challenges with big data is that the big data sets continue to grow exponentially, but our ability to deal with big data as human beings does not. Our brains are rather limited in their ability to grow. Our attention spans are rather limited. Our ability to understand information, to process information, at best grows linearly, and that's a bit generous. If anything, my own ability is heading linearly downward, not linearly upward. But no matter where you put it, the quantity and complexity of the information you're dealing with in your organization will eventually swamp your ability to process it. And you'll reach this big data crisis point where you just have too much, where you have no idea what's going on, because all the tools, all the technologies, all the approaches for dealing with the amount of information you have are no longer able to deal with it, and you just get swamped. If you haven't reached that point yet, just wait, because the exponential curve will eventually get you to that point. So why do we have this problem? Well, it's a corollary to Parkinson's law.
So Parkinson's law, as you may remember, is the law that the amount of available work will fill the available time. Well, here's the corollary: the amount of available data will fill the available capacity. So if you have a bigger hard drive, that just means you get to collect more data. So it's not a matter of saying, well, we can finally solve our big data problems simply by improving our technology, because then we'll have technology that will enable us to deal with the big data that we have today. That's missing the point, right? If you improve the technology, then you will have more data, because it's human nature to collect as much as you can, which means the data will always fill up whatever you've got, right? So whether it's punch cards, or magnetic core memory, or, you might recognize the one on the lower right, that's one of the Star Trek computers from the original series. Oh, yeah. Okay.

Okay, so some interesting corollaries. Part of the Parkinson's law pattern is: if someone can collect big data, then someone will, right? If the ability is out there, somebody is going to do it, whether it's interesting data or not, whether it looks like noise, whatever, right? If you can collect web log files, then somebody will. If you can collect your, you know, server log files, somebody will. If you can collect all of the tweets that anybody has ever made, then somebody will. If you can collect all the previous versions of every web page in the world, then somebody will. Or if you're the NSA and you can collect everybody's email, or everybody's, you know, buddy list, or everybody's phone calls, then you will, right? And if it's not the NSA, then somebody else. It's essentially a rule of thumb. There's no sort of technical reason why somebody would have to do it. But it's human nature, right?
If there's a tool out there that enables you to collect a certain amount of information, then somebody, maybe not you, maybe the other guy, is going to collect that information. So now we have to deal with the governance challenge, right? The more data we collect, the more important it is for us to understand what rules we want to follow about those data. And we have to wonder, well, do we want them to? What are the privacy issues? What are the security issues? What are the legal compliance issues? The more powerful our tools get, the more important all these questions become, because we're able to collect so much information, and if we're able to, somebody will.

Okay. So one of the interesting twists to the NSA story is that much of the data they're collecting are actually just metadata: the call detail records. They weren't necessarily collecting the phone calls themselves, the audio. They were collecting the call detail records, that is, the records about the phone calls that Verizon and the other phone companies have. So the call detail record has the time the call was placed, the numbers involved, the duration: what appears on your phone bill if you itemize all your calls, all the information there. It's just a few hundred bytes of information per call, but if you're talking about everybody's phone calls in the world, that's a lot of data.

So this is an interesting point. We're used to thinking, in the technical world, of metadata as being of interest to the techies, because it helps us work with the information in a technical context. But we don't really see that metadata are valuable or important to the business. It's really more a tool that we have for dealing with data on the technology side. But what's becoming increasingly important is to understand the business context of our metadata as well, right?
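To make the metadata point concrete, here's a minimal sketch of what a call detail record might look like. The field names are purely illustrative, not an actual carrier schema; the point is just that the metadata alone, with no audio at all, is only a couple hundred bytes per call.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical call detail record: field names are illustrative,
# not a real carrier's schema.
@dataclass
class CallDetailRecord:
    caller: str        # originating number
    callee: str        # receiving number
    start_time: str    # ISO 8601 timestamp
    duration_secs: int
    cell_tower_id: str

cdr = CallDetailRecord("+15551230001", "+15551230002",
                       "2013-06-05T14:03:22Z", 247, "TWR-0414")

# Serialize it: the whole record, no audio, is on the order of
# a couple hundred bytes, just as the talk describes.
record_bytes = len(json.dumps(asdict(cdr)).encode())
print(record_bytes)
```

Small per record, but multiplied by every call everyone on the planet makes, it adds up fast, which is exactly where the story goes next.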
If the NSA is interested in the call detail records in addition to the phone calls, well, then we have to ask ourselves what's in the metadata, right? What valuable information is there as well? We have to treat the metadata as a big data problem in its own right: do our analytics, figure out what is valuable in our metadata.

Okay, now another interesting twist to the NSA story, and to the big picture of big data as it stands today, is that we don't just have to worry about the data that we want. We also have to worry about the data that we don't want. We're collecting a lot of data, and we want to find the gems of wisdom, the nuggets of gold that give us real value, but we have to worry about all the rest of the data that we've been collecting. Remember the corollary to Parkinson's law: if we can collect it, we will. If we have the storage, we have the capability, and we'll fill up that capability with data. Much of that information is not valuable to us, but it may still have some sort of importance. For example, it may still contain confidential information, or proprietary information, or other information that is important to manage. So you're not just looking for those nuggets of wisdom. You have to deal with all of the information and understand the policies and processes that you need to put in place in your organization to govern those larger data sets, even though the proportion of data that may be valuable may be getting smaller and smaller. So there's much more dross than there are nuggets of gold, and that dross, that leftover stuff, becomes an increasingly challenging governance problem. This is becoming one of the biggest problems, because so many organizations have so much information. They say, well, we just want this little bit about the terrorists; we don't care about the 99.99% that isn't about terrorists. But that's what the rest of us worry about: all that rest of the stuff. And that's one of the lessons of the NSA story.
Nobody's worried about them collecting information on terrorists, except maybe the terrorists. Everybody else thinks that's a good idea. It's everything else they're collecting that is the problem. That is the whole NSA story. You're looking for those nuggets of gold, and that's sort of the goal of analytics: to take a large data set and boil it down to that little bit that has business value. But it may be worse than that, because that little bit that has business value may actually be dangerous. So instead of nuggets of gold, it's actually nuggets of, what, uranium or something, right? And of course with uranium, even a little bit is a problem, and it's the same thing with information.

So for example, personally identifiable information may not be apparent from a large data set, but simply by reducing the data through an analytics routine, you may uncover personally identifiable information that you weren't aware of. Take the US Census Bureau. The US Census data is publicly available, right? Our tax dollars at work, not yours, our tax dollars at work. You can go to their website, download their tools, access their data sets, and anybody in the world can come along and run their own analytics on US Census data. But the Census Bureau is prevented by law from revealing personal information about individual citizens. So they can reveal statistical information about zip codes and about communities and about ethnic groups and all of these categories, but they can't tell you about any individual. Well, suppose in a particular zip code there is only one Native American family with two children. You run these statistics, and you may be able to learn about those people individually, because there's only one of them in that particular category. So put together the right analytics routine and you're distilling personally identifiable information from statistical data sets. And that is what I'm talking about here. So another challenge is the risk of false positives.
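Going back to the census example for a moment, here is a minimal sketch of that re-identification effect, using entirely made-up toy data: when a statistical cell contains exactly one family, the published "average" is no longer a statistic, it's a fact about one identifiable household.

```python
from collections import defaultdict

# Toy "census" rows: (zip_code, ethnicity, number_of_children).
# Entirely fabricated data for illustration.
rows = [
    ("90210", "White", 2), ("90210", "White", 1), ("90210", "White", 3),
    ("90210", "Asian", 2), ("90210", "Asian", 0),
    ("90210", "Native American", 2),   # the only such family in this zip
]

# The "safe" statistical release: average children per (zip, ethnicity).
groups = defaultdict(list)
for zip_code, ethnicity, children in rows:
    groups[(zip_code, ethnicity)].append(children)

for cell, values in sorted(groups.items()):
    avg, n = sum(values) / len(values), len(values)
    # A cell holding exactly one family discloses that family's exact
    # data, even though only "aggregate statistics" were published.
    flag = "  <-- identifies one household!" if n == 1 else ""
    print(cell, "avg children:", avg, "families:", n, flag)
```

This is why real statistical agencies suppress or blur small cells; the sketch just shows how easily a plain aggregation query crosses the line.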
You don't necessarily know your data are completely accurate. The more data you collect, the more likely you will have a mix of accurate and inaccurate data. So if you have this enormous, complex, expensive, challenging analytics algorithm, and it distills out a little bit of information, that information may or may not be correct. If you put excessive emphasis on the value of that information, then having false information becomes increasingly dangerous. If you act on some information that turns out to be false, it may steer your business in the wrong direction, it may cause a lawsuit, who knows what could happen. And this becomes increasingly likely as you start dealing with increasingly large data sets.

So, there's a lot of words on this slide, but I wanted to give you the full quote. Big data can be used to mislead, and this quote is actually from the NSA, explaining that the data they were actually looking at was really only a tiny little bit, so we don't have to worry about it. They use this rather complicated argument: the internet carries 1,826 petabytes per day, that's a whole lot of information, but we only touch 1.6% of that, and of that 1.6%, only a 40th of a percent, 0.025%, is actually selected for review. That is, the automated analytics takes that 40th of a percent and makes it available for review, and then the human beings look at that 40th of a percent and try to figure out what the terrorists are up to. So the effect is that they're looking at a tiny amount of all the internet traffic. A dime on a basketball court. A dime is our 10-cent piece, in case anybody here doesn't know that. So, a dime on a basketball court. Well, we shouldn't have to worry about that, should we? As long as the NSA is looking at terrorist activity, we can all support that, and the rest of our information is the rest of the basketball court, so we don't have to worry about that, right? Okay.
Well, in other words, of all the information that they're processing, 7.5 terabytes per day is the result of their analytics, the part that their human analysts look at to actually try to figure out what's going on. 7.5 terabytes a day. They're trying to position that as not very much, thank you, compared to all the internet traffic per day. But remember, those call detail records are about 200 bytes each. So let's say, just for the sake of argument, we do our own spin. It's a bit misleading, but misleading the other way. Let's say they were only looking at call detail records. Well, if they're 200-byte call detail records, 7.5 terabytes is about 5 phone calls per day for every person on the planet. That sounds like a lot. So that which is a dime on a basketball court is also 5 phone calls a day for everybody on the planet. Same data, right? One argument is meant to make it look like a tiny bit. The other argument is meant to make it look like a whole lot. Same data, right?

You could call this the pancakes-to-the-moon problem. Any time you end up with some sort of big-number problem, and you see it on television, somebody will say, well, it's like a stack of pancakes to the moon, or, we can take all these things and line them up and they'll go around the world 12 times. Those sorts of analogies are intended to make big numbers clear to people, but they can equally be used to mislead people. And you, as the techies, as the folks who are supposed to understand this stuff, should be in the role of saying, wait a minute, the data don't mean that, the data really mean this. And that can be a real challenge: figuring out what the data really mean when we're talking about such big numbers.
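Both spins can be checked with a few lines of arithmetic. The figures below are the talk's own illustrative numbers (the NSA's published percentages plus a nominal 200-byte record and 7 billion people), not independently verified data.

```python
# Two spins on the same number, using the figures from the talk.
PB = 10**15  # petabyte, in bytes
TB = 10**12  # terabyte, in bytes

internet_per_day = 1826 * PB           # claimed total daily internet traffic
touched = internet_per_day * 0.016     # the 1.6% the NSA "touches"
reviewed = touched * 0.00025           # the 0.025% selected for review

# Spin one: a tiny fraction of the internet, "a dime on a basketball court".
print(round(reviewed / TB, 1))         # -> 7.3 (terabytes per day)

# Spin two, same number: how many 200-byte call detail records is that?
cdr_size = 200                         # nominal bytes per call detail record
world_population = 7_000_000_000       # rough 2013 figure
records = reviewed / cdr_size
print(round(records / world_population, 1))  # -> 5.2 calls/day for everyone on Earth
```

Same 7.5-ish terabytes either way; only the denominator changes, which is the whole pancakes-to-the-moon problem in miniature.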
Okay, so, on more of the political side of things. I speak around the world, and it's interesting that when I speak to a US audience, this whole NSA thing causes a certain consternation, but when I go other places in the world, there's a very different perception. When I go outside the US, the whole idea is, well, the US government is spying on all this data, so let's not put our data in the US. Oh, we've got to keep our data out of the US. We've got to keep it in Europe, we've got to keep it in South America, wherever it is. Is that a reasonable conclusion? Well, maybe, maybe not. Who's to say your own country isn't spying on you? Right? Just because Edward Snowden, you know, showed us the cards of what the NSA was doing doesn't mean that MI5 or MI6 isn't doing the same thing here. Right? Who knows? And what about this: is your country working with the NSA? Well, those guys sure look like buddies, don't they? What are they talking about, anyway? Hmm. It's like, well, we collect call detail records. Oh, well, so do we. Well, how about this: we'll look at yours, you can look at ours. In the case of the US and the UK, obviously their intelligence efforts are highly coordinated. There's no way that you can say, here in the UK, well, we want to keep our information out of the US because the NSA is doing something. I mean, who knows? The NSA could be spying on your data here as well. Who's to say the NSA isn't spying on data around the world? They're trying to spy on the whole internet, and can you really say that this part of the internet never touches the US and that part does? Who knows where the NSA really is? It could be anywhere in the world. There's no way to know, right? Oh, we can't put our data in the US because the NSA is looking at it there. Who's to say they're not looking everywhere, right?
So, bottom line: what are your big data policies? Do you have a policy? Oh, we have to comply with, you know, the EU Data Protection Directive, so we don't want our personal data to cross borders; we don't want to put our data in the US because, you know, the US government is doing this, that, and the other. Are those appropriate policies? Do they actually meet the business need? And do they make sense in the context of what the technology actually can do for you? Those are important questions that your compliance people will not really be able to deal with unless your technical people work with them to understand the real ramifications.

Okay, so I've been talking about data governance as essentially the punchline to this story. What do you need to deal with some of these issues? You need better data governance. So what is data governance? Well, here's governance the old way, the traditional way. You have some sort of information problem; in the data governance world it's usually a data quality or a data confidentiality or some other data-centric issue. So what do you do? Well, you go out and buy yourself a data governance tool. There's a bunch of them out there. Fine, buy the tool, but buying the tool isn't governance, right? Governance is a set of organizational processes for creating, communicating, and enforcing policies. So we need policies for how we want to use the tools, right? What policies we put in the tool to ensure that we're meeting the business needs for data quality and data protection and these other data-centric challenges. That's what we mean by governance.

Okay, so far so good. So here's a simple example. Business stakeholder on the left, techie on the right. The business stakeholder has some sort of data governance challenge. Unclean data, very common, right? Address formats are mixed up in different applications, who knows what. That's the problem.
So the techie says, well, let's use this data quality tool. That's fine, but the business stakeholder realizes the tool alone doesn't give you governance; what you need are policies and processes for managing the data using the tool. Fine, okay. The problem with big data, though, is that big data are always in a state of flux. They're always growing. So let's add that fact into the equation. We have an information problem. We need some tools. Then we need policies for using the tools. But now we need a way of dealing better with governance itself: policies for how we do governance. We call those meta-policies. Because what we really need are next-generation governance tools that can give us a best-practice approach for dealing with big data. The challenge is that whatever tools we get, the big data will eventually swamp the tools, right? So we have to have a better way of dealing with big data challenges.

Okay. So there's this notion of a meta-policy, and this is where we get to the topic of my book, which you can win by putting your business card in the waste paper basket going around the room. About half the people here didn't hear that before, so it should be coming around. I'll have a drawing in a moment.

So, okay. Agile architecture. What are we talking about there? Well, instead of just thinking about the things themselves, the data, the policies, the processes, and the other aspects of your business and IT environment, you have to think about how those things change. You want to work at that change level, that meta-level. What are the processes for changing processes? What are your policies for managing policies? These are at that meta-level. What are your requirements for dealing with changing requirements? So when we talk about business agility as a business driver in the organization, we treat that as a meta-requirement.
Because when we say business agility, we're saying we want to build technology that enables us to deal with changing requirements, technology that will provide us with business agility. And if you think about it, agile methodologies are actually meta-methodologies. What do I mean by this? Well, look at the Agile Manifesto, if you're familiar with the Agile Manifesto. One hand? Oh, come on, a lot of you are. Ah, okay, very good. So think about the Agile Manifesto's fourth value: responding to change over following a plan. Now, what if your plan is to follow agile? Well, in order to be agile, you have to be willing to change the plan, even the plan to be agile. We run into this problem all the time. Oh, you're doing Scrum. Okay, so what does that mean? Well, we have all these rules for how to do Scrum; I got them out of the Scrum book. Okay, well, what if there's some sort of challenge? Say the retrospectives are taking too long, so we don't want to do them, but retrospectives are a Scrum principle. Oh, well, we have to do them because they're in the Scrum book. Well, then you're not being agile, because you're not responding to change; you're following a plan. So what agile methodologies really give you is a way of creating and updating methodologies, a way of changing your methodologies to meet changing business needs. And now we're talking about meta-policies: policies for how to do governance. This is at the core of agile architecture, because we need to establish ways of dealing with change that meet the business need, even as that business need evolves.

Okay. So this whole notion of working at the meta-level is about dealing with change. Anybody recognize this guy? Oh, in a room this big, somebody should recognize him. Come on. No, it's the other guy. That's a good guess, though. It's not Euler; no, Euler was blind in one eye. This is Leibniz, Gottfried Leibniz, inventor of the differential calculus; Newton invented the integral calculus.
I really just put him up there because I love his wig. They should bring those back. Wouldn't it be great if we had wigs like that? It would kind of hide my bald spot if I had one of them. Anyway. Okay. So, differential calculus: the mathematics of change. And when we work with architects, this is the story. You should focus on working at that meta-level, because you need to deal with change in your organization with an architectural best-practice approach. Instead of saying, oh, this diagram, that's our architecture, there's our Java architecture, there's our data architecture, some sort of diagram you put on a wall. Well, that's not the architecture. That's a snapshot in time of the architecture. The architecture itself is the set of principles for how that thing evolves over time. And that's the focus, right? The focus of the architecture should be on how things change, not on the things themselves.

So how do things change? Well, typically manually, right? We can automate governance, but dealing with these meta-policies is a set of human activities. We can use tools for agile development, but changing the plan, that's a human activity. So we're working with policies, and we're working with meta-policies. So maybe we need a policy for dealing with meta-policies. Let's call that a meta-meta-policy, right? Oh, and we want to automate those, so let's come up with a meta-meta-meta-policy, right? We call this the hall of mirrors problem. If you have two levels, then you can obviously think about a third and a fourth. But I put this slide in the deck for a reason. The hall of mirrors is an illusion. It's an illusion of an infinite tunnel. In reality, there are just the two mirrors. The same thing here. We work at the policy level, or the process or the task level, and then we work at the meta-level, and those are the two levels. It's an illusion that you have infinite levels.
The infinite tower isn't really there. So basically, at the bottom level, we want to automate. At the meta-level, we want to treat governance as a set of human activities. We've already been doing that; we just haven't made it a formal part of our architecture.

Okay, so back to data governance. How does this fit into the data governance story? Well, here's the problem: a big data problem. We have too much information, right? So, okay, what does the techie say? Well, let's use this big data tool. Ooh, Hadoop. Whatever, pick one off the list. Okay, well, now of course the business stakeholder wants governance: here are the policies and processes for how to use the big data tool. But it's big data, right? Big data are always growing. So what happens? Well, it's just a matter of time until the techie says, uh-oh, the big data got too big for the tool. If it hasn't happened yet, just wait a little while; it's bound to happen. So now what? Well, we need policies for dealing with ever-increasing quantities of data. The challenge here is that the big data tool you have doesn't do that, right? So the techie's at a loss. Well, how can we do that? This is what we need: a way to manage policies for dealing with ongoing big data challenges. That's the core of big data governance. And the tools aren't really up to the task.

So that becomes, essentially, the challenge that the NSA has presented us. Somebody out there is able to collect all this information, so they are collecting all this information, for better or worse, for good or evil. And we have no established policies and processes for governing what they're doing. That's the challenge. So what if that happens in your own organization? Somebody can collect information, so somebody will, whether it's about your customers, your employees, or whatever it is. Now you can run into the same problems.
Because we don't have big data governance tools that solve this problem for us. Okay, well, where do we look? We can look at the tools themselves. The big data analytics tools may also help us with big data governance. May or may not. But for example, suppose data quality is the challenge. How do we know the data are accurate, that the data are consistent, that we don't have different representations of the same data across a large data set? This is a common problem. In fact, this is the world of Hadoop. Hadoop deals well with mixed types of data: some structured, some unstructured, just a big mess of different data. That's one of the use cases Hadoop was designed for: batch analytical processing of differently structured data sets. Well, you'll have good and bad data mixed together as a rule. Data that are accurate, that are properly represented, as well as all sorts of other data that aren't necessarily accurate or well represented. So a tool like that can help you figure out where the actual problems are. So what we need, essentially, are tools that deal better with poor data quality levels. But this is really just scratching the surface, because data quality is only one challenge. There are many other challenges, and the privacy and security challenges are the ones that are more of a concern to us. But with data quality, at least we have a handle on how an analytics tool might be able to, you know, filter out the poor-quality data and give us better-quality data.

So when we talk about governance in this context, that word governance, nobody likes governance, right? Because it brings to mind, you know, this sort of big brother thing, rules and policies, and having to jump through all these hoops to get your jobs done. But what we're talking about here is a much more lightweight, iterative, automated approach to governance that leverages technology. So it's largely automated.
It's agile. It's proactive. It responds to a changing business environment, as opposed to dictating rules that people have to follow. But it's a new way of thinking about governance in many cases, and that's part of the challenge of this whole story.

Okay. So the big picture here is that we're developing more and more powerful tools. Whether it's our big data tools, cloud computing as a category of more powerful tools, or the services we're rolling out as part of our service-oriented architecture, we're giving the business, a wider circle of users, increasingly powerful tools. The same with mobile technologies. We all have supercomputers in our pockets now, right? We're giving business users supercomputers and saying, well, yes, you know, do what you want with them. This is the problem. If you give people powerful tools without the appropriate policies and procedures, it's like giving power tools to children. It's dangerous. And we're running into this in many different categories. This is something you can find on the poster as well: all the different forces of change impacting the IT organization, whether it's mobile, whether it's SOA, whether it's cloud computing. These are all the story of giving the business more powerful tools. And without the appropriate governance, we end up with all sorts of, you know, BYOD problems and misuse of cloud computing resources and a whole series of challenges we just didn't have before. If the IT organization, if the operations team, manages all of the systems and doles out very limited capabilities to the business, you don't have the same kind of problems you have with cloud computing, where anybody with a credit card can go provision their own instances and put your corporate data on them. We have all sorts of new problems that are essentially governance challenges. And the more powerful our tools, the more important it is to deal with governance.
Because you don't want to be in a situation where you simply say, sorry, you can't bring your own phone, sorry, you can't use the cloud. If you take that approach, you're limiting your ability to get value out of your people. IT should be an empowering organization, not there to establish a bunch of rules that keep people from getting their jobs done. Okay, so here's an example: the SOA governance automation slide. In the SOA governance world, especially in the web services context, one of the great things about the web services family of standards is that we have a number of standards aimed at governance: WS-Policy, WS-SecurityPolicy, and a number of others that give us a standard way of representing a policy. So we have a SOA policy, which might be an XML file written in WS-Policy, WS-SecurityPolicy, or a handful of other standards, and that gives us an automated way of creating, communicating, and enforcing that policy. That's an enormously powerful part of the web services story, much more powerful than just SOAP. People think, oh, SOAP is difficult to work with, web services are a pain in the butt. Well, that's true, but the real strength of the web services standards is these policy-centric standards. So once we have that XML representation (and there's nothing sacred about XML; Amazon Web Services uses JSON, it doesn't really matter), we have a metadata representation of our policy. What do we do with it? Well, we can store it. We can put it in a policy creation tool, some sort of model-driven tool with a pretty interface, so we don't have to hand-code the XML. We can store it, manage it, and update it there in a centralized tool. And then we communicate it to our enforcement points: our ESBs, our XML appliances, our system management tools, our identity and access management tools.
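For a flavor of what one of those policy files looks like, here is a minimal, hand-simplified WS-Policy document carrying a WS-SecurityPolicy assertion that requires HTTPS transport. The namespaces follow WS-Policy 1.5 and WS-SecurityPolicy 1.2; real policies attach far more detail, so treat this as a sketch of the idea rather than a production policy:

```xml
<wsp:Policy
    xmlns:wsp="http://www.w3.org/ns/ws-policy"
    xmlns:sp="http://docs.oasis-open.org/ws-sx/ws-securitypolicy/200702">
  <wsp:ExactlyOne>
    <wsp:All>
      <!-- Require transport-level security (HTTPS) for this service -->
      <sp:TransportBinding>
        <wsp:Policy>
          <sp:TransportToken>
            <wsp:Policy>
              <sp:HttpsToken/>
            </wsp:Policy>
          </sp:TransportToken>
        </wsp:Policy>
      </sp:TransportBinding>
    </wsp:All>
  </wsp:ExactlyOne>
</wsp:Policy>
```

Because this is just metadata, any enforcement point that speaks the standard can consume it: that is what makes central creation with distributed enforcement possible.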
Those are the enforcement points that enforce these policies. And the power of a standards-based approach is that you can create these policies centrally and distribute them to distributed enforcement points for enforcement. If all the tools follow the same standards in the same way, and I know that's a big if, then you have an automated SOA governance infrastructure. But what does it mean to do SOA governance? It means having policies for dealing with this whole thing: policies for how we go about setting it up, how we go about creating the individual policy representations, those XML files. That whole thing is SOA governance. It's what the architects, or the architecture team, do with this infrastructure. Those are meta-policies: policies for how we deal with governance. Okay, cloud computing, the same thing. With cloud computing we're shifting responsibility for the IT environment to the user. We're fully automating the cloud environment, and we're giving the user the ability to go to a web page or access a simple API with whatever tool they want, and provision their instances, move their data around, configure the whole environment, set up the load balancing, the storage, the network. You can do it all in an automated fashion now. So what we're doing is making it easier for people to really muck things up. There was an article just a few months ago. Somebody did some research and found that 70% of Amazon S3 buckets were unsecured. That's Amazon's Simple Storage Service, which can hold any kind of object: images, files, whatever. For 70% of them, if you knew the URL, you could download the file. Completely unsecured. And the article was saying, oh, Amazon isn't doing its job on security, blah, blah, blah. It wasn't true. Amazon was doing everything right. People just weren't turning the security on.
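The kind of check behind that research is easy to sketch: an object is exposed if an anonymous, unauthenticated request for its URL succeeds. Here is an illustrative sketch using only the Python standard library; the bucket URL in the usage comment is hypothetical:

```python
# Sketch: anonymously probe an object URL and classify the outcome.
import urllib.request
import urllib.error

def classify(status):
    """Interpret the HTTP status an anonymous request came back with."""
    if status == 200:
        return "public"     # anyone who knows the URL can download the object
    if status in (401, 403):
        return "secured"    # access denied without credentials, as it should be
    if status == 404:
        return "missing"    # no such object at this URL
    return "unknown"

def check_object(url):
    """Issue an unauthenticated HEAD request and classify the result."""
    try:
        req = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(req, timeout=10) as resp:
            return classify(resp.status)
    except urllib.error.HTTPError as err:
        return classify(err.code)

# Hypothetical usage:
#   check_object("https://example-bucket.s3.amazonaws.com/report.pdf")
```

The technology on Amazon's side does the right thing either way; the governance question is whose job it is to make sure nobody's objects come back "public".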
They were just putting stuff at an open URL with no security. The Amazon infrastructure was fine. You had all the security options; you could configure things any way you wanted. People just didn't know how to use the tools properly. Well, the more powerful the tools, the more likely you are to be in a situation where people aren't using them properly. So the focus now is not on how you get those tools to work. Increasingly you can just take the cloud for granted. Not quite there yet, but we're working on it. Now we have to deal with how we're going to use the tools. The more our technology matures, the more governance becomes the central focus of what we need to think about. So this is on the poster. Do you all have the poster? There are a few more left. I don't know if I have quite enough for everybody, but you can download the PDF from our website. Okay. So in the upper left-hand quadrant, complex systems engineering is the big picture. That's another talk; you'll have to see me in a different session for that one. But the center is next-generation governance: governance replaces integration as the key to enterprise IT. It's not about connecting things anymore. We can take the connecting-things part for granted. Once we get the architecture right, once we get the cloud computing right, the technology bits are taken for granted by the business. Then it becomes a governance challenge. How do we use these things? How do we use our services? How do we use our cloud instances? And governance becomes the key to the big data explosion. That's the lower right-hand quadrant of the poster. Filtering the information is more important than simply storing it. We're able to store large quantities of information; understanding how to use it becomes the key to the big data challenge. All right. So, bottom line, I haven't really given you the answer here.
I haven't told you which big data governance tool to buy, because they're not really out there yet. This is a challenge. Big data will always be too big; the big data challenge is always growing. If you think you have it figured out today, just wait until tomorrow. There'll be a new challenge. And the next-generation governance tools? Somebody in this room may actually build one, inspired by my talk, and go code one tonight. Well, they'll have to drive business agility: the ability for the business to support changing requirements as the business environment changes. And the tools will always fall short if you don't have the appropriate architecture. If you want to learn more about that, win my book. Very good. Okay, one last plug. I am returning to Europe to run our two-day cloud computing course on the fourth and fifth of November, and I'll be back in London on the fifth and sixth of December. Just write me an email if you're interested. That was my quick plug. Not too bad. Okay, so I'm ready for questions, and the book giveaway. Make sure that wastebasket is going around the room; put your card or slip of paper in there. I'll do the drawing once the questions are done. Come on, some questions. Either you're all completely overwhelmed or completely lost. Ah, just a couple of questions.

Audience: The problem with the NSA is that it's capturing mostly non-US data.

Yeah. Well, that's what they're focusing on.

Audience: Not only is the NSA capturing foreigners' data, it's also capturing data illegally inside the States, and that's done through American companies.

You know, it's less about what's legal. It's more about what's possible. That's part of the challenge: if you're able to collect the data, somebody needs to know about it.

Audience: Part of that problem is that you need better laws.

Yeah, but laws alone don't do it. Well, better laws, sure.
But you need the appropriate technology approach for dealing with those laws. Simply having a law doesn't mean people will follow it. And when you're dealing with data, we're in a technology-based context: we have to have a way of establishing, communicating, and enforcing the policies. The laws drive our technical policies. And this is what I'm saying: without appropriate data governance, it doesn't matter what the laws say, because somebody is going to break them. Laws by themselves don't solve this problem. We have to connect the dots between the high-level policies, which include laws and regulations, and the policies we can actually enforce in our technology. That's the missing piece we don't have yet. We don't have a way of ensuring the NSA is doing what it's supposed to do.

Audience: The other question was more related to the data problem. Most businesses think that because everybody's doing big data, we should do big data too. We have a lot of data, and we can monetize it. How do you make the business understand that you need a strategy to monetize your data?

Well, one of the things we talk about in any architectural context, especially with a room of hardcore techies, is that you have to start with a business problem. You don't want to start with the shiny thing. Ooh, the cloud is cool, let's do cloud. Big data is cool, let's do big data; let's see if we can figure out what the business wants to do with it. No, you start with a business problem. Somebody, and this is part of the role of the architect, has to have a clear grasp of what the business challenge is. Does the business have a need that big data tools might solve? Then the architect can help figure out which tool is right for the job, whether it's cloud, whether it's big data, whatever it is. There are a lot of different tools now.
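Going back to connecting the dots between high-level policy and enforcement for a moment: in miniature, it means turning a rule derived from a law or regulation into machine-readable metadata that an enforcement point can evaluate. The policy fields and record shape below are invented for illustration; real governance suites model this far more richly:

```python
# Sketch: a high-level rule as policy metadata, plus one enforcement
# point that evaluates access requests against it. All names invented.

POLICY = {
    "name": "customer-data-retention",
    "max_age_days": 365,                      # e.g. from a retention regulation
    "allowed_purposes": {"billing", "support"},
}

def enforce(record, purpose, policy=POLICY):
    """Return (allowed, reason) for one access request against the policy."""
    if purpose not in policy["allowed_purposes"]:
        return False, f"purpose '{purpose}' not permitted"
    if record["age_days"] > policy["max_age_days"]:
        return False, "record exceeds retention window"
    return True, "ok"
```

The value is the separation: the policy is data that a central team can create, version, and distribute, while enforcement points simply evaluate it, which is the same pattern as the WS-Policy story applied to data governance.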
And the challenge is not to have some cool new tool and go shopping for a problem: who wants to pay me to use this tool? You're barking up the wrong tree there. You start with the problem, and you understand all the different tools and technologies that are available, so you can make a recommendation as to what the right tool is. If the business has a problem that really is solvable with some sort of big data-centric solution, then by all means recommend that. But only if you understand that that's the problem. You don't want to give the business new problems just because you now have a tool that can solve them.

Audience: I've seen cases in which they consider it strategic to own the technical solution, even if they don't have a clear idea of what to do or how to analyze the data.

Yeah, well, that's common, right? On the techie side we have the shiny-thing problem, but the business side will look at some technology they read about in a magazine and say, we want that because it's cool. Same thing, but a different kind of ignorance: ignorance of the details on the technology side. Again, it's the role of the architect. Our core audience is architects. The role of the architect is to help the business understand what technology can do for them in the context of the problems the business has. And the architect is also responsible for working with the technical team to make sure that whatever the technical team does is something the business wants to spend money on. It's all driven by where the money is, because the business will come up with money if a problem is painful enough. We call those hot spots: not just a problem, but a problem painful enough that the business wants to spend money to fix it. You spend all your time on those. Other questions? Oh come on, at least one more.
Audience: We're here at a Linux conference, and regardless of the fact that you're using PowerPoint, can you talk a bit about the challenges and the opportunities for the open source community, beyond Hadoop and the tools specific to big data: larger-scope governance tools that can actually enforce policies in a cloud context, all the way down the stack?

Well, I've got a two-minute warning here, but the big picture with open source is that proprietary solutions simply aren't practical in an elastic environment. With a proprietary solution you'll have some sort of licensing arrangement. You run into this with Windows in the cloud: okay, we want an elastic Windows environment, so how many licenses do I need? Well, today it's 12, tomorrow it's 15, the next day it's 18. How do you deal with that? It just doesn't make any sense. You have to approach any elastic environment from the perspective that you can scale indefinitely without simply paying for more licenses. The whole idea of licensed software is not compatible with the cloud. And if you take that as a fundamental principle, that software licenses aren't compatible with the cloud, that pretty much says open source is the only way to do this stuff. That means everything and anything you might want to do in an elastic environment: any kind of infrastructure as a service, or big data, or even platform or software as a service. It's just not practical to take a traditional software licensing approach to any part of it that needs to scale. So it's no mistake that Linux is driving the cloud, because if we didn't have Linux, we wouldn't be ready for the cloud. And it makes you wonder how far Microsoft will get with Azure: just how much can they do unless they decide to give away Windows Server for free? And the chances of that? Well, who knows what will happen with Microsoft.
Audience: Microsoft, give away those server licenses for free.

What's that? Yeah, well, that may be interesting, if that turns out to be the core of the Azure strategy. Could happen, right? It's going to be hard to do it on Windows otherwise.

Audience: Of course, Windows runs on Amazon too.

Yeah, well, exactly. Okay, time's up, so let's do the giveaway. Where's the bin? And there's a consolation prize too, so you've all got a chance. Alright, here we are for the book. Sometimes I draw out a slip of paper, sometimes I draw out a business card. You never know. Here it is this time. Oh, it's sort of a business card. Is that a business card? Yes, that is a business card. It has a name on it. Okay, well, I can't quite make out the details, but I guess you win.