There we go. Hello and welcome. My name is Shannon Kempe and I'm the Executive Editor for DATAVERSITY. We would like to thank you for joining today's DATAVERSITY webinar, Data Architecture Requirements, the latest in the monthly series called Data-Ed Online with Dr. Peter Aiken, brought to you in partnership with Data Blueprint. Now let me give the floor to Megan Jacobs, the webinar organizer from Data Blueprint, to introduce our speaker and today's webinar. Megan, hello and welcome. Hi, Shannon, and thank you. Hello, everyone, and welcome. Thank you for finding the time to join us for today's webinar on Data Architecture Requirements. As always, a big thank you goes out to Shannon and DATAVERSITY for hosting us. We'll get started in just a few moments, after I let you know about some housekeeping items and introduce your presenter. We have a one-hour presentation followed by 30 minutes of Q&A. I will try to answer as many questions as time allows, but feel free to submit questions as they come up throughout the session. And to answer the top two most commonly asked questions: yes, you will receive an email afterwards with links to download today's materials and the webinar recording. These materials will be sent out within the next two business days. You can find us on Twitter, Facebook, and LinkedIn. We set up the hashtag #DataEd on Twitter, so if you're logged on, feel free to use it in your tweets and submit your questions and comments that way. We'll keep an eye on the Twitter feed, and we'll include the answers to those questions in our post-session email. And now let me introduce you to our presenter. Peter Aiken is an internationally recognized data management thought leader. Many of you already know him or have seen him at conferences worldwide. He has more than 30 years of experience and has received many awards for his outstanding contributions to the profession. Peter is also the founding director of Data Blueprint. 
He has written dozens of articles and eight books. The most recent is Monetizing Data Management. Peter has experience with more than 500 data management practices in 20 countries and is consistently named as a top data management expert. Some of the most important and largest organizations in the world have sought out his and Data Blueprint's expertise. Peter has spent multi-year immersions with groups as diverse as the U.S. Department of Defense, Deutsche Bank, Nokia, Wells Fargo, the Commonwealth of Virginia, and Walmart. He often appears at conferences and is constantly traveling. So Peter, where are you today? So today, Megan, both Lewis and I are out here in Pleasanton, California, where we're going to give a presentation tomorrow to the DAMA group. And then we're going to go up and hopefully have dinner with Shannon tomorrow night in Portland. We're going to meet the Portland data group. So welcome, everybody. Just a quick starting-off place. Again, collectively, as data managers, we all believe that data is the most powerful yet underutilized and poorly managed organizational asset. It is our sole non-depletable, non-degrading, durable strategic asset. We've seen a lot of people that are starting to realize now that data is the new oil, and I've seen this in a number of different places. One place I was at recently, they had actually made it like this: data is the new soil, which is kind of fun. You can put something in it, you can grow things in it, and it's not quite as environmentally unfriendly as all the other things that are oil-related. And then I see this t-shirt out and about now too. Megan, maybe we should get some of these for us: data is the new bacon. Regardless of how we say it, what we're trying to do, of course, is unlock business value by strengthening our collective organizational data management capabilities, providing organizations with solutions that match their problems, and building lasting partnerships with the business. 
And today's topic is the requirements around data architecture. In order to understand these, it's really key that you understand, first of all, the context, so we're going into the DAMA, DMBOK, CDMP context that I start all these webinars out with. We'll then move into a section on what data and information architecture are and why they are important. We'll talk about how they're used, particularly for leverage. And we'll finish up with three specific examples: one, a software package implementation; a second one, helping to improve the processing of a donation center; and third, an application of text mining analytics. And then we'll finish up again at the top of the hour with some takeaways. We'll always include a couple of references in here. And then comes the part that we really enjoy, the questions and answers. So let's get started and take a look as we dive in. Again, the first part of this, for many of you, is to understand that we talk about data management as being very much like Maslow's hierarchy of needs. The idea is, of course, that if you are lacking in food, lacking in shelter, and hungry, it's very unlikely that you're going to self-actualize. What we see most of the time in data is the things that we put in the golden triangle here, these buzzwords. Over the past 30 years, we've substituted a bunch of buzzwords as they've come and gone on here. But what has remained constant throughout the entire time is that these really represent just the tip of the data management iceberg. If we don't understand that there are foundational data management practices that need to occur underneath these other practices in order to support them fully, there's no way that we can do them well. Another piece to understand critically about these foundational data management practices is that they operate in a weak-link-in-the-chain scenario. 
So in the little example that I'm showing you here, the data platform and architecture piece is the weak link in the chain because of the paper clip that's holding together that chain. It's not as strong as the other links that you're seeing there. And, of course, the problem is that this makes the entire foundation for everything that the organization wants to do with its self-actualizing data management practices only as strong as the weakest link in the chain, that in this case being the data management platform and architecture. When we talk about these things, you'll see that most of the talk in the Golden Triangle is focused on technologies, whereas what we believe collectively here is that it really should be focused on capabilities. Technologies will come and go, will advance, will do all sorts of wonderful things. If you have poor organizational capabilities, you can still get to the places in the Golden Triangle. You can accomplish the advanced data management practices without being proficient in the foundational practices. However, it will take you longer, cost more, deliver less, and present greater risks to the organization than if instead you crawl, walk, and run your way to the top. And I attribute this directly to Tom DeMarco, who has been asked many, many times, could you do it faster? And he said, yes, I could do it faster, but it will take longer if I do. Tom was a wonderful person to help us articulate these things. Another key piece to all of this, of course, is the idea that we now have, thanks to the CMMI Institute, a set of integrated practices, a set of de facto standards that talk about managing data coherently, managing data assets professionally, maintaining fit-for-purpose data efficiently and effectively, looking at data architecture implementation and life cycle implementation, and, of course, supporting organizational processes around this area. This is what we mean by the foundational data management practices. 
Some of you may or may not have heard of the DAMA Guide to the Data Management Body of Knowledge, published in 2009. We've worked really, really hard on it. And, of course, the idea here is that data architecture management is straight up at 12 o'clock, a place where many people get started on all of these activities. This is the one-page summary of what it means to manage data architecture. You can see it's an IPO diagram. The left-hand side describes the inputs, the middle part the activities, and the right-hand side the outputs, with goals at the top and participants and tools at the bottom. We're not going to go into this in a huge amount of detail here, but we are looking through these educational practices to help you all become more proficient and join the other thousands of CDMP-certified professionals in the area of data management. So that's our little five-minute spiel at the top of the hour. Now we're going to dive into what we mean by information architecture, starting off with the definition of architecture. Again, the key here is that architecture is defined broadly as both the process and the product of planning and designing. So when we talk about architecting, we are architecting an architecture. It really encompasses all design activities, but lately the word has been used to refer to designing any kind of system, and it is used throughout the IT world. In fact, key to taking away some value from this presentation is to understand that all organizations have architectures. The only question is how well understood, and therefore documented, these architectures are, because if they are not understood and documented, they cannot be useful to the organization overall. So we talked about architectures. These are symbolic representations of the structure as well as the use and, of course, for us data people, the reuse of the resources that we have here. 
They link common components that are represented in some form of a standard notation and that are detailed enough to permit the business and the technical personnel to read the same model and come away with a common understanding. This understanding, of course, means a specific definition. It means it's documented and architected as a digital blueprint, and I will point out, in the bottom left-hand corner, that these are brought to you by Data Blueprint, which is our main business. We don't often do commercials there, but I can't resist at this point. The key, of course, is to reach that understanding not just between the business people and the IT people, but also between systems and humans as we go into this. Organizations maintain many different types of architectures: process architectures, systems architectures, business architectures, security architectures. I've heard the word architecture used to refer to the technology stacks that organizations run, and of course there is the one we're focused on here today, data and information architectures. The key thing to be careful of is that if your management perceives that these are just technical committees that sit around and don't do things, then you are not demonstrating business value. If the business understands what you do, then what you do becomes an investment. If they do not understand what you do, you are going to be perceived as a cost. Let's look at some very unsatisfactory definitions of information or data architectures here. The underlying information design principles upon which construction is based. Well, no business value in that. Plans guiding the transformation of strategic organizational needs into specific projects. Again, value, but harder to pull out of there. A framework providing structured discipline of an information asset. 
Again, I don't like these because it's hard when you talk to people. Think about this in the elevator pitch kind of situation, where you get on with an executive at a high floor in the organization and you have 30 seconds to get to the bottom and prove it to them. You can imagine, even though I love Roger and Elaine Evernden's book, Information First, trying to explain to them that information architecture is a foundational discipline describing a series of guidelines, standards, conventions, and factors for managing information as a resource, and they're gone, right? Even with our DMBOK, I'm less than happy with the definition, and we of course expect the community to help us improve the DMBOK all the way around, but it calls it defining the data needs of the enterprise and designing the master blueprints to meet those needs. Okay, but again, if we're trying to show value here, let's talk about a real benefit, which is that what we're trying to do is to develop a common vocabulary. And this common vocabulary is used to express integrated requirements, ensuring that data assets are stored, arranged, managed, and used in systems in support of organizational strategy. I consider this to be a more useful definition, first of all, because everybody understands that if we're not speaking the same language, it is going to be very difficult to communicate, and that just results in additional overhead, even assuming that you don't have any miscues or miscommunications in there. The other key part of this definition is that the focus here is on supporting organizational strategy. If you're not doing something to support organizational strategy, my goodness, what are you in fact doing? Let me tell you a very brief little story about the use of vocabulary in an enterprise architecture for an organization. 
They were buying a software package that was going to manage their accounting functions, and all of the things that they were keeping track of were products that got put in tanks. In this package, every time you moved something from one tank to another tank, the software counted it as a retail sale. Now, if you know anything about financial accounting, you know that there are other types of tanks that people use, and they could have modified their information architecture to reflect the vocabulary that they used. Instead, they decided through data governance to apply information architecture components and differentiate between the two types of tanks that I showed originally, and this tank, which is moving product from place A to place B, this tank, which is storing it, this tank here, which is floating across the ocean, and this tank, which is flying around. Now, all of these are important, because if we don't understand the difference between them and we count them all as retail sales, we'll get to a very dreaded consequence, which is having to restate earnings. I've worked with several companies that have had to go through that oopsie, as they call it, and it's not a very pleasant occurrence. Now, we don't want to get these tanks confused with these tanks, which leads me to the second story in here. The second story is, again, very brief, but the idea is that one of the groups that we were working with didn't have the ability to determine the obsolescence of heavy pieces of equipment, such as the one that I'm showing here. 
In fact, every time they bought one of these pieces of equipment, it calculated more than three million different data values, different data attributes, that came out of every purchase of every tank, and if you didn't know the one or two that determined the obsolescence of this system, it was very, very difficult. Knowing them meant these guys, with our help, were able to save over a billion and a half dollars of impact on their expired inventory. They didn't have to store things that they didn't need. They didn't have to handle them. They could return the operations back to the manufacturer. Again, there were lots and lots of things in here that they were looking at and trying to get to. So let's dive down to a little bit more useful definition: a structure of data-based (not database, but data-based) information assets supporting the implementation of strategies. And again, key to this is that most organizations have data assets that are not supporting their strategies, because their information architectures are not known or are still unhelpful. So the question really becomes, how can organizations more effectively use their information architectures to support their strategy implementation? A good place to look is your web people. Your web people really do understand. They make graphics like this one, which show that the information architecture connects the classification and the hierarchy of the vocabulary that's in there. It has tags and labeling. It has navigation and wayfinding. It has search capabilities, connecting people, in this case through a website, to the content that they need to have. Great piece. We'll come back to some more web-based architecture implementations, but that's not the only part of it. Think about when somebody is building things with a traditional IT project mentality. They may say that programs A, B, and C, in fact, share some data, so we're going to connect them using a database. 
And this database is used, for domain one, to connect the data of programs A, B, and C and allow them to reuse this information. But if you're focused at a project level, when somebody goes to implement programs D, E, and F and use the green database down in the bottom right-hand corner, there's very little opportunity for reuse unless you formally put it in place. It's not going to just happen that the D, E, and F programs will be able to reuse any of the common data elements from the A, B, C database. And of course, this applies over and over again in organizations. What we're talking about from a data perspective is that the architecture has greater potential business value. The focus is broader than just the software architecture or any individual database architecture. It's the system-wide use of all the data because, of course, problems come in when we're trying to exchange or interface this data with everybody else's. So our architectural goals are much more strategic than they are operational in nature. Data architecture, then, is important because it is poorly understood. It's not very easy to calculate the data asset value there. It's often been inarticulately explained, not necessarily within our domain but outside of our domain. I'm showing a picture in the upper right-hand corner of a disguised database systems book that is typical of what many of our undergraduate and master's students use. It has 1,000 pages on how to build databases and very little explanation of the indirect effects of not understanding the broader information architecture that you need to have in order to implement this. And it ends up costing organizations millions in productivity, redundancies, siloed efforts, or poorly thought-out software purchases. I'm going to pick on healthcare.gov right now just because most people have that in their minds and remember it. Now, we know that one of the primary problems with it was that there were 55 contracting organizations in there. 
That was simply too many. And a very good quote by Jim Johnson, chairman of the Standish Group, says the real news would have been if it actually had worked; that would have been a success in and of itself. But interestingly enough, the group that was programming some components of the system was using traditional SQL-based data management technologies, while another contractor, one of the other 55, had incorporated big data technologies into this. The one group would say, tell me what SQL to write, and the big data group would say... The incompatible architecture pieces that they had were at the root cause of some of the failures that occurred in the healthcare.gov rollout. Here's another example. If you are looking to purchase software, I always recommend that you use your purchasing department and ask the vendors to give you a logical data model of the system that comes in. First of all, if they can't figure out what a logical data model is, many purchasing departments are now planning to come in there. So about half the vendors out there are saying, terrific, a smart customer; we would love to talk to you and see whether our package is, in fact, compatible with what you're looking at. Now, here's an example from a DOD organization. A person should be related to an employee by a business rule that is, again, more or less technology speak: zero, one, or more employees can be related to one person. That's a great, correct business rule. But, again, if you think back to the elevator speech, it's very unlikely that the chief executive is going to understand what you mean by that. But if you say to them, we need to make sure that moonlighting is supported, so that somebody can work for one part of our organization during the day and another part at night, and so that somebody doesn't have to go around and manually calculate the tax reporting forms every year, that they will understand. Again, another example on this thing is business rule number three. 
I'm showing it at the bottom there. Zero, one, or more employees can be associated with one position. Well, again, that's a very, very correct business rule, but it does not describe the concept, which is job sharing: saying that, in fact, it's really important for our system to be able to handle the fact that somebody may work this job in the morning hours and somebody else may work this job in the afternoon hours. So here's a query that we found in one of the organizations we were working with. It simply had never been optimized; it was a very, very difficult, dense query. And by simplifying it, we were, in fact, able to make it easier. I don't ever claim that we're going to get all the way to easy, but if we understand the architecture better, we can do a better job with our queries. What happens in many organizations is that these queries are run hundreds, thousands, millions of times, resulting in what we call death by a thousand cuts to the organization as they try to do more with less, while at the same time not realizing how their architectures are really a hidden expense. So the lack of a coherent data architecture represents a hidden expense that costs organizations money. And again, I got this question earlier. If you don't explicitly design your systems to work together at the most granular level, which is the data level, then it's unlikely that they will just happen to work together. And, of course, the architectures cannot be helpful as long as their structure is unknown. So I've got a little book on this. It's actually the subject of the webinar that we've got coming up next month. It really shows that organizations spend between 20% and 40% of their IT budget migrating data, converting data, and improving data. So I'm not going to belabor this particular piece. If you do not know who John Zachman is, please ask us some questions at the end. 
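To make the two business rules above concrete, here is a minimal Python sketch. It is an illustration only, not the DOD organization's actual model: the class names `Person`, `Employee`, and `Position`, the `shift` field, and the helper functions `is_moonlighting` and `is_job_shared` are all hypothetical names chosen for this example. It assumes that an "employee" is one employment record linking one person to one position.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Person:
    name: str
    # Rule: zero, one, or more employee records can be related to one person.
    employees: List["Employee"] = field(default_factory=list, repr=False, compare=False)


@dataclass
class Position:
    title: str
    # Rule: zero, one, or more employee records can be associated with one position.
    employees: List["Employee"] = field(default_factory=list, repr=False, compare=False)


@dataclass
class Employee:
    """One employment record (hypothetical): a person filling a position on a shift."""
    person: Person
    position: Position
    shift: str

    def __post_init__(self) -> None:
        # Register the record on both sides so each rule can be checked directly.
        self.person.employees.append(self)
        self.position.employees.append(self)


def is_moonlighting(person: Person) -> bool:
    """The business concept behind the first rule: one person, several jobs."""
    return len(person.employees) > 1


def is_job_shared(position: Position) -> bool:
    """The business concept behind the second rule: one position, several people."""
    return len({id(e.person) for e in position.employees}) > 1
```

The point of the sketch is the naming: the cardinalities live in the two `employees` lists, while the function names carry the vocabulary (moonlighting, job sharing) that an executive would recognize.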
I'm sure John, like everybody else, is going to be congregating in Washington, D.C. at the end of this month for our big conference, Enterprise Data World. The goal then for data architecture must be making sure that we have a shared IT and business understanding, as well as a shared understanding between the systems. That the sharing is highly automated, and that's dependent on the engineering concepts, which we'll come around to in just a little bit. That these modeling characteristics change over the course of your analysis. That there is a motivation for doing all of the modeling. Now, many people define things in their models. Clive taught me to put motivational purpose statements in there to say, why are we collecting this information? That gets closer to the requirements than simply defining components. Again, the use of modeling of any sort is much more important than the selection of a specific method, because the modeling documents are living documents. They're going to adapt and hopefully evolve over the time period. These models must have access to modern search technologies, and they must provide utility. I worked for one CIO who over the years just never really got what we did in his organization. He visited Bangkok and Hong Kong and Singapore and said, I see your models on the wall. People aren't putting them up there because they're pretty pictures. They're putting them up there because they're extremely useful and there's clearly some business value in them. Now, understanding architecture is very important, and many people laugh at this particular diagram of a house. It turns out that this house is better than this house. This house has a cracked foundation and nothing more should be built on it. 
This first house that I've put up here actually does have a good foundation, a solid foundation, and while we wouldn't want to put a swimming pool on top of the roof, it is possible that we could add another layer of functionality on here. So knowing and understanding the principles of architecture is much more important than most people realize. Most people don't even think about it. They assume that everything they're buying is a wonderful two-bedroom house or whatever it is in there. Now, as we look at architecture in particular, one of the things that we have to understand is that these concepts of abstraction also have to do with completeness and utility. Models are downward facing, toward detail, and architectures tend to be upward facing in nature. It's a great discipline. In the past, if we have to critique those that have come before us, they tried to get the architecture perfect, and in data architecture this is not timely; it's absolutely not feasible. I've seen many organizations where they actually took five years to create a data model of the organization, and they created a great data model of the way the organization was five years ago, which was a useful contribution, don't get me wrong, but it did not give them immediate utility. So rather than focusing on the entire architecture, our guidance here is to focus on specific architectural components that are relevant to the problem at hand and that are governed by a framework, because this gives us more immediate utility. Again, a diagram like this is too much detail from an architectural perspective if you're trying to communicate with the business. It's just not going to be useful. That's not to say that this architecture diagram doesn't have utility in the long run, but for communicating with the business and understanding how it can be used to solve business problems, it is absolutely unhelpful. 
Here's another one. This is again one of the web diagrams; I'll call out Jeff Kern Design for these. They're very nice articulations, where it's just an overview showing how the web developers are using information architecture within a single application to connect users with the data that they have. This is too much detail; this is perhaps the right amount for some uses; and here is maybe one that could have been used in the elevator speech, where you can look and see that the user experience consists of an information architecture component with user research, site maps, wireframes, and testing, but the user interface also has a visual vocabulary, interface design specifically, HTML and Flash implementation components, and the content strategy for these architectural pieces. What we're trying to do, of course, with all of these architectures is to organize details into components, organize these components into larger models, and then organize the models into architectures. Data models expressed as architectures start off with the attributes, which are organized into entities or objects. The objects are then organized into models, and the models are organized into architectures. In this case I'm showing more granularity on top and more abstraction at the bottom. It's okay to flip the chart and do it the other way; the point is to be consistent and not confuse people. Data must be architected in order to deliver value. So here's our definition of data. I use this a lot in my talks: the number 42 is my age, let's see, 14 years ago. 
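The roll-up just described (attributes into entities, entities into models, models into architectures) can be sketched as a simple containment hierarchy. This is a minimal illustration under stated assumptions, not a modeling tool: the class names `Entity`, `Model`, and `Architecture` and the helpers `outline` and `attribute_count` are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entity:
    """An entity (or object) groups attributes, the most granular level."""
    name: str
    attributes: List[str] = field(default_factory=list)


@dataclass
class Model:
    """A model groups entities."""
    name: str
    entities: List[Entity] = field(default_factory=list)


@dataclass
class Architecture:
    """An architecture groups models, the most abstract level."""
    name: str
    models: List[Model] = field(default_factory=list)


def outline(architecture: Architecture) -> List[str]:
    """List every level, most abstract first, indenting as granularity increases."""
    lines = [architecture.name]
    for model in architecture.models:
        lines.append("  " + model.name)
        for entity in model.entities:
            lines.append("    " + entity.name)
            for attribute in entity.attributes:
                lines.append("      " + attribute)
    return lines


def attribute_count(architecture: Architecture) -> int:
    """Count the attributes that roll up into the architecture."""
    return sum(
        len(entity.attributes)
        for model in architecture.models
        for entity in model.entities
    )
```

Here `outline` puts the abstraction on top and the granularity at the bottom, the opposite of the chart described above; either orientation works as long as it is used consistently.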
Okay, so that's a pretty irrelevant fact, and maybe you'll remember it, maybe not. The second meaning is the meaning of life; those of you that read Douglas Adams's Hitchhiker's Guide to the Galaxy will understand that little piece of it as well. But none of that is helpful as data until we supply it as information in response to a request: is Peter old enough to drink? Answer: yeah, he's definitely old enough to drink. On the other hand, how do we make use of it from a strategic perspective? The key here, of course, is saying, well, if Peter is old enough to drink, then we can market to him some things that may be interesting for him at the end of a long day or a long plane flight or whatever it is. Now, the other part of this diagram that's important is to notice that if we don't have a data architecture, we can't have an information architecture. The data architecture is a necessary but insufficient condition. On the other hand, if we don't have a data architecture and an information architecture, there's really no way that the business intelligence, the analytics, whatever it is that we're calling it this particular decade, can be useful. We have a little screen glitch there, but again, if you don't have the data architecture, you can't have an information architecture, and if you don't have the information architecture, you have a very poor foundation to strategically use your data in the long run. This diagram goes back to Dan Appleton's time at DOD, when we were both together there in the early 90s, and it's been very, very useful for articulating this ever since. The other thing that I would caution people about is when they ask, can I call it an information architecture or a data architecture? If you don't manage your data and information together, you create even more problems in your organization, so just pick one name and be consistent in what you call it. Some organizations prefer information architecture. My point here is that they are so close that trying
to manage them separately is more work than any other way to do this. Again, thanks to Dan Appleton for that. Let me give you another example of architecture support, in a restaurant context. Now, if my restaurant is serving dishes and the goal is efficiency, then any dish will do; that's the answer to get to efficiency. Those are the bottom dishes, by the way: just pick the next one off the stack. On the other hand, if I'm in a fairly fancy restaurant, for example here in San Francisco, where there are a lot of fancy restaurants, and every dish has its own unique physical plate, then if you're serving peach cobbler, it gets a peach cobbler plate. That means when I drop a peach cobbler plate or have no clean ones, I can't serve it, and that does not necessarily promote efficiency and effectiveness goals or dexterity on the organization's part. So all of these examples are designed to show that your information architecture, your data architecture, should address a series of questions. How and why do the data components interact with each other? Where do they go? When are they needed? How and why will the changes be implemented? What should be implemented organization-wide, and what implemented locally? What standards should be adopted? What vendors should be chosen? You can read these rules here. The key is, who is going to coordinate these things if it's not the data architecture group? In addition to that, when we look at data architecture, it should of course be developed in response to organizational needs. The organizational needs are articulated and become instantiated in the data and information architecture, which then authorizes certain information system requirements. There's a precedence order in there as well, and I very strongly believe that these things must precede systems development. Of course, if we don't have a feedback loop on it to say how we are doing, because we never get anything perfect the first time, we have an issue there as well, so we need to have some
feedback to say, hey, how's it going, and what improvements can we make along the line? But this also addresses another fallacy, that the information architecture is done. No, it's not. The information architecture needs to evolve as the organization evolves. So let's move on now a little bit and talk about data leveraging. The key here, of course, is that we need to understand that it is not simply a function of technologies. Again, data leveraging is an engineering concept. If I use this graphic that I have on the chart here, you can see that it involves people, process, and technologies: people pull on the lever, the lever sits on the fulcrum, and the process is people pulling on this. With our data we need similar types of things. Within the organization and with our organizational data exchange partners, we need to obtain leverage that is implemented with data-centric technologies, processes, and human skill sets. What we want to do, of course, is eliminate the ROT: 80% of the data in your organization is redundant, obsolete, or trivial, so there's no point in keeping it. And of course the bigger the organization, the greater the leverage potential that exists. Treating data more asset-like simultaneously lowers IT costs and increases our knowledge workers' productivity. All parts of our architecture evolution can be categorized according to this framework. Our architecture is as-is or to-be: that's what we have or what we want it to be. Similarly, it can be categorized as conceptual in nature, logical in nature, or physical in nature. And another dimension that most people do not include in here is validated versus unvalidated. I see that the word "unvalidated" is off on that last piece there; I'll change that real quick before the slides go out. That piece in the corner there should be unvalidated versus validated, because if something is unvalidated it is, again, a weak link in the chain. All of our architecture
activities can fit into that particular framework. Now, the real challenge that we have is that we've been teaching application development incorrectly for years and years. As soon as we get to step three here in this process, where we talk about systems, the system becomes either a package or something that we're going to build, and it sucks the air out of the conversation. This ensures that the data is formed around the applications and not around the organization-wide information requirements. The processes are narrowly formed around those applications, and very, very little data reuse is in fact possible given those circumstances. If we flip our model slightly and say instead that we should develop strategy, goals, and objectives, and then, as I stated a couple of slides back, develop our data and information layer first, this gives us the opportunity to talk about networking components and then get to our systems and applications as an additional development process. The advantages of this approach, of course, are that the data assets are developed from an organization-wide perspective, that the systems support the organizational data needs and complement the organizational process flows, as opposed to the other way around, and finally that we can maximize our data reuse. Now, over the years we've talked about developing software from a number of different perspectives. The real key has always been that we'd like to reuse software, but when we measure it, our worldwide use of reusable software components is as close to zero as it can get without actually being zero. So we should really concentrate on reusing the things that we can categorize and use, which are agreed-upon data elements and data. I mentioned earlier that engineering was important to this process. Engineering goes hand in hand with architecture.
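As a rough illustration of the categorization framework described above, the sketch below tags each architecture artifact along the three dimensions mentioned in the talk: as-is versus to-be, conceptual versus logical versus physical, and validated versus unvalidated. The class names, the `weak_links` helper, and the sample artifacts are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    AS_IS = "as-is"   # what we have
    TO_BE = "to-be"   # what we want it to be


class Level(Enum):
    CONCEPTUAL = "conceptual"
    LOGICAL = "logical"
    PHYSICAL = "physical"


@dataclass(frozen=True)
class ArchitectureArtifact:
    """One architecture work product, placed in the three-dimension framework."""
    name: str
    state: State
    level: Level
    validated: bool


def weak_links(artifacts):
    """Return the unvalidated artifacts, the 'weak links in the chain'."""
    return [a for a in artifacts if not a.validated]


# Hypothetical sample artifacts, for illustration only.
artifacts = [
    ArchitectureArtifact("benefits data model", State.AS_IS, Level.LOGICAL, True),
    ArchitectureArtifact("person identifier design", State.TO_BE, Level.CONCEPTUAL, False),
    ArchitectureArtifact("payroll tables", State.AS_IS, Level.PHYSICAL, False),
]

for artifact in weak_links(artifacts):
    print(artifact.name, artifact.state.value, artifact.level.value)
```

Keeping validation as an explicit field, rather than an afterthought, makes it easy to list what still needs review before it is relied on.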
Architecture is used to create systems that are too complicated to be done by engineers alone. It requires technical details as the exception, but the engineers then develop the technical designs, just as our data engineers develop the technical designs, and our craftspeople deliver these things supervised by building contractors and manufacturers. So we can work hand in hand. I love to show this picture to groups. There are four attributes of this picture: it's taller than I am, it has a clutch, it was built in 1942, and it's still in regular use today. Now, in 1942 we were at war, and in fact we were losing the war, World War II, and consequently it was very, very difficult. We put thousands of young men on ships like the USS Midway here and sent them off into battle, and every day they woke up and they needed pancakes or eggs or whatever it was. I don't care how many KitchenAid mixers you gave them. KitchenAid is a very fine brand, but it has a different duty cycle. You cannot use this mixer to make eggs or pancakes or whatever it is you're going to make for thousands of soldiers every morning and expect it to work for more than 60 years. We didn't know how long the war was going to last; it lasted a lot longer than we originally thought. And the reason we can do that is because we have a series of engineering standards that we use. People understand what we mean by a set of stairs, a set of nails, certain components that we put into play. This is where it becomes incumbent on us as data people to define an architectural work product. Again, I've got an example here from an old paper that we wrote. The architectural work product here, the PeopleSoft version 7 benefits modules implemented on Windows 95 (you can tell how old the example is), illustrated the integration of the three PeopleSoft metadata structures, in this case covering pay, personnel, and benefits. Now that's a very good component to define our
work around, and that's what we should be practicing: not trying to get the entire architecture, but trying to define as much architecture as we can and as we need to get the business problem resolved. Many of you have worked on systems that don't even have this level of decomposition. Again, if this were the entire hierarchical system decomposition, from an architectural perspective I could say authoritatively that there are three processes in the system, that process one is more complicated than process two and process three, and that this was the entire system. Another way to describe a different system would be the pay and personnel modules here. Again, this is back to the PeopleSoft example, where we see level one, level two, and level three definitions of how this architectural component is divided up and used. Here's a more complicated example. This is actually from one of the Veterans Administration systems that we worked on, and you can see there are 10 different components that they look at. The point here is that you're not trying to boil the ocean with your architecture efforts. If we say we're only going to focus on radiology, which you can see in the upper right-hand corner, with its five or six components underneath, then we can focus in on that piece and do a good job on that particular component. Similarly, you may even look at a strategic-level data model. Here's one for the Commonwealth of Virginia's Department of Social Services. I think this is probably 15 or 20 years old, and I'm sure it doesn't bear any resemblance to what's going on there now, but at the time it was very useful to explain to people that there was a taxpayer view, a client view, a governance view, a program delivery view, and a vendor view. I'll show you each of them very briefly. The taxpayer view was concerned with payments, taxpayers, taxpayer benefits, and social service programs. The client view was looking at payments, clients, client benefits, welfare
agencies, and notice that some things are the same between those views; payments stays in the same place as well. When we look at the governance view, you can see that the governance view for the State Department of Social Services was more complex, dealing with resources, a board, policy, approval, programs, and government. The program delivery view, which is the part the taxpayer services are concerned with, was again dealing with clients, as well as social service programs, delivery partners, and the agencies they work with. And finally there was the vendor view, which was dealing with those same aspects (social service programs, clients, welfare agencies) but also goods and services and vendor payments. Only when you put it all together do you get a whole picture of what goes into that process. This is what allows us to leverage these things. The purpose of these diagrams that we used 15 years ago was to help convince the management of the Department of Social Services that they worked in an information-intensive environment and that the information architecture components were absolutely not as helpful as they could be in illustrating all of the effective services that were being provided by the State. So again, old diagrams, but they will give you an idea of how these views work. Let's look at three very detailed examples. Here's a software package implementation. What we're looking at with this software package implementation is taking an old system that was an old green-screen mainframe, and yes, there are still lots of them out there. I know there was a New York Times article several years ago that said all of us COBOL programmers are dead, but I've got news for you, New York Times: some of us are still alive out here. Now, one of the major process changes was that we had an integrated screen, up on the left there, in that particular system, and when we put the new ERP in place, that same information turned out to be on 23 separate screens. That was a big change to the
operational processes that we were involved with in the system. Going from one screen to 23 screens required more effort on the part of the users of the system to be able to in fact use it; the knowledge workers had to be more adept at understanding the process. And there was another piece that was going into place at the same point in time, which was that we were going to replace all instances of Social Security number with something called person ID. That is, of course, a correct architectural implementation piece, because person ID is a better unique identifier of a person than Social Security number, aside from the fact that there are certain legal restrictions involved. However, the management that was involved in the system said, you know, I'm a little scared. We're making major changes to the operational processes; should we in fact also be making changes to this person identifier at the same time? They asked the vendor: how big a change is it to go back and reverse replacing all the Social Security numbers with person ID, that is, to replace all person IDs with Social Security number? And the vendor said, you know, that's not a very big change. Well, unfortunately, that's not a very helpful answer either. The project, as shown in the corner, had a total budget in this case of five million dollars. Now, to help them get some clarity around that answer, it helps to understand the architecture of, in this case, PeopleSoft, which was the system that was being proposed to go in there. PeopleSoft has a home page that is linked to one or more business process names. Each business process name decomposes further into one or more business process component names, and each component name decomposes further into component step names. That one little diagram tells you what you need to know about PeopleSoft metadata from the process perspective. When we look at how we were able to get PeopleSoft to report on its own metadata, you can see that the first business process name was administer base benefits
there in the system, and it had four components: manage benefit enrollments, manage benefit dependents, manage leave accruals, and report benefit participation. Each of those had one or more steps associated with it. This information became key to understanding, because now we could count the number of instances that we had out there in the system. Again, there are the home pages that we had; there were seven of them. There were 39 business process names and 822 individual business step names. In addition to that, if you'll notice, the middle section of the diagram also showed the data records and the panels. There are 26, excuse me, 25,906 instances of a field appearing on a screen; that's a field appearing on a panel. Those are very, very interesting and useful pieces, and once I have that information I can in fact figure out exactly what I need to do and count up the precise number of things that need to be changed. Having these numbers is very, very important, because instead of saying "not a very big change" I can say 1,400 panels, 1,500 tables, and 984 component steps, and I can add in some labor hours at $200 an hour. In this case the organization decided that if they were going to make modifications to the system, those modifications should be paid for by the initial project budget for the next five upgrades. So even though it might have been a $200,000 change ($194,200, as we're counting on this particular piece), when you included the fact that people would certainly upgrade their system, and that when they did we would have to make all these changes again, it became a million-dollar decision in a $5 million budget, a little more than "not a very big change." And of course one of the important questions here is how likely we think it is that it would in fact only take people 15 minutes to make each of these changes, so we are probably underestimating that. Here again we used the architecture to convince management that it was really a bad idea, and a very big change, to go out and replace
every instance of the correct identifier, person ID, with the incorrect identifier, Social Security number, because it was going to cost a fifth of the entire project budget. Let's move on to a donation center processing piece. There's a cancer center that has lots and lots of knowledge workers and lots and lots of patients; everybody knows somebody who's been touched by cancer. When they would get a grant application, somebody would call them up from someplace and say, hey, do you have 200 patients with hair cancer? I don't think there's anything called hair cancer; I'm just giving you that as an example. The challenge was that these knowledge workers, the researchers and staff, would have to pull from these files here and look manually through them, because the information wasn't loaded into a system. It was a very painstaking effort. It would take a month to respond, and of course sometimes the answer needed to come more quickly than that. When we looked at the actual assessment of what was going on in there, we saw that the information was being integrated too far down the system, which meant that we didn't have good information. Again, I've already told you there's a lot of material in there. When we moved the integration closer to the front, we were able to come up with a better architecture that more usefully supported the value that was provided by the donation center. The real key to this is that we see this 80-20 rule, where these knowledge workers spend 80% of their time manipulating data and 20% of their time analyzing it. By the way, a quick little side note: we see the exact same statistics for the so-called data scientist who is supposed to be the savior of analytics in today's environment. When you ask these data scientists what they do, they spend 80% of their time munging the data. What this meant was that if we shifted only 20% of their time from manipulating data to analyzing it, we actually doubled their productive analysis time, and that was a huge lever that we could put in place. This lever then allowed us to
integrate these into a more holistic view, to automate these previously manual processes, passing data effectively and efficiently between the various groups of the donation center, and to eliminate inconsistencies and redundancies. More importantly, we were able to forecast an increase of safe matches from three in ten to six in ten, primarily through the use of this new information architecture. One more quick example. There's a group that was faced with a data quality problem during a system migration. The challenge was that they had millions of stock keeping units, about two million of them, and the information on these stock keeping units was stored in clear text fields. Now, I'll tell you, the reason they were stored in clear text fields was that the Oracle database they were using had replaced a previous, more difficult to use hierarchical database. However, they hadn't changed any of the programs, so they had actually gotten Oracle to work like a hierarchical database, which is something that was developed for them in this case so they didn't have to change the programs. When it came time to migrate these to another software platform, however, the people doing the migration looked at this and said: the data is in comment fields, clear text fields, and we will not be able to extract it automatically. Well, that really left the problem very unsatisfactorily unsolved, and so the group came to us and said, can you help us? We converted what I prefer to call non-tabular data into tabular data. If you say, I'm going to take unstructured data and make it structured, well, the definition of unstructured data is something that cannot be structured. So these terms are more accurate: non-tabular data into tabular data. Here we were able to save a fair amount of money and, in this case, literally a person-century of work. Now, in putting these things together, we were able to hold one end of the equation fixed: that was two software engineers who were working on the project part time, so it was
equivalent to one FTE a week. We had them work as a team, of course, because it was largely a process of collaboration, and during the first week they sucked: they didn't match anything. Now, this is a matter of managing expectations with the customer, who may expect you to make a miracle. However, by the end of the fourth week we had already matched 55% of the specific stock keeping units back to what would now be called a master data management solution. So here we're using text analytics to craft a master data management solution. During the same time, we found that at the end of the first week, and again at the end of the fourth week, at least 1% of the items were ignorable. And finally we tracked our unmatched items. We got a little better and a little worse, so we were wobbly on this one. We managed the expectations carefully with the customer, and we both understood that this would take some time. The question became how much time it should in fact take. The answer is that you should do this until you get to the point of diminishing returns: 30% unmatched went down to 9%, and in fact by week 18 we were down to 7.5%. We had matched everything else out there or declared it ignorable. Again, you can see that this figure got to the point of diminishing returns by week 14, where we could simply say that 22% of the data was of absolutely no value to anybody at all, which meant that our original problem space shrank to about 80%, and of that, 7.5% would require a manual approach but 70% could be handled automatically. Now again, the information architecture components here: when we said to them, we can take 80% of the information that's in your architecture right now and convert it automatically, they looked at this and put together a chart for us that said, my goodness, you've given us the ability to save literally a person-century of work. If you look at the second-to-last line there, it was about $5.5 million, and it was 93 person-years of work.
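The effort figures quoted in the PeopleSoft and stock keeping unit examples can be reproduced with simple arithmetic. The sketch below is only an illustration. The item counts, the 15-minute and 5-minute per-item times, the $200 hourly rate, and the five upgrades come from the talk; the function names and the figure of 1,800 working hours per person-year are assumptions (the latter chosen because it roughly reproduces the 93 person-years quoted).

```python
def effort_hours(item_count: int, minutes_per_item: float) -> float:
    """Total labor hours to touch every item once."""
    return item_count * minutes_per_item / 60


def labor_cost(item_count: int, minutes_per_item: float, hourly_rate: float) -> float:
    """Labor cost of touching every item once at a flat hourly rate."""
    return effort_hours(item_count, minutes_per_item) * hourly_rate


# PeopleSoft example: panels + tables + component steps, 15 minutes each.
peoplesoft_items = 1400 + 1500 + 984
one_time_cost = labor_cost(peoplesoft_items, minutes_per_item=15, hourly_rate=200)
five_upgrade_cost = one_time_cost * 5  # the change is redone at each of five upgrades

# Stock keeping unit example: about two million items, 5 minutes each.
HOURS_PER_PERSON_YEAR = 1800  # assumption, not stated in the talk
sku_person_years = effort_hours(2_000_000, minutes_per_item=5) / HOURS_PER_PERSON_YEAR

print(f"PeopleSoft change, once: ${one_time_cost:,.0f}")
print(f"PeopleSoft change, five upgrades: ${five_upgrade_cost:,.0f}")
print(f"Manual SKU cleanup: about {sku_person_years:.0f} person-years")
```

Because the estimate is linear in minutes per item, doubling or tripling the per-item time doubles or triples the result, which is why a small change in that one assumption moves the total so much.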
Now that's a tremendous amount. That's the first time I've ever gotten to a person-century; you've heard of a person-year. But of course the other number that we put up there was that we said it only takes five minutes apiece to handle each of these. So if we double that, it's two person-centuries and $10 million. Triple it to 15 minutes, which we said was inadequate in the previous example, and now we're up to $15 million and three person-centuries. So with these architectural components, knowing and understanding what we can identify, by finding that 20% of the data is not useful and by converting this non-tabular data into tabular data, we were able to help this particular customer out specifically. So we're approaching the top of the hour here; let's just do some review. Again, would you build a house without an architectural sketch? The answer is of course not. The model is the sketch of the system to be built in a project. Would you like to have an estimate of how much the new project, in this case the house, is going to cost? Yes. The model is going to give you a very good idea of how demanding the implementation work is going to be. If you hired a set of contractors from all over the world to build your house, would you like them to work in the same language, from the same set of plans? Of course, and the answer is that the model is the common language for the project team, just as the information architecture is the model for all of the IT projects that are going on in your organization, because we've already discovered that it is much easier to share your data than it is to reuse your software. Would you like to verify the proposals of the construction team before they have started? The answer is yes. The model can be reviewed in a critical walkthrough before thousands of implementation hours go to work on it. Again, a very, very key piece in project development.
If you had already built the house, would you like to build something similar, perhaps in another place? Now we can reuse these platforms using the exact same model or a derivation of it. Would you randomly drill into a wall without understanding where the plumbing and electrical lines were on the map? The answer is no. The model makes the project easier to support and maintain in the long run. A couple of takeaways here. An information architecture is a structure of data-based (not database) information assets that are used to support the implementation of organizational strategy. There are other components, and we looked at other architectures earlier, but the takeaway is that most organizations have data assets that are not supportive of their strategies. That is, their information architectures are not helpful, either because they are unknown or because they weren't developed specifically to support strategy, and you can't expect that to happen at random. The question is, how can organizations more effectively use their information architectures to support that strategy? When we use information architecture, it is the application of the data assets towards strategic objectives, and this can be assessed by the maturity of the organizational data management practices. I'm going to pause here and put in a little plug. At Enterprise Data World, Melanie will be doing a tutorial on Monday and Lewis Broome will be doing a tutorial on Saturday. Lewis is going to focus on data strategy, and Melanie is going to focus on the data management maturity model, which tells you specifically how to improve the maturity of your data management practices. Because if you do improve your data management practices, this results in increased capability, increased dexterity, and accomplishment through data-centric development practices: the taxonomy, stewardship, the repository, and the idea that data must precede systems development, because data evolves over time and systems come and go.
So to the question of how an organization achieves better use of its information architecture, the answer is continuous re-development. The starting point isn't the beginning. The vast majority of us on this call have never developed an information architecture from the ground up; that only happens when you're starting a brand new organization. By the way, Data Blueprint is going on 16 years this year, so that is one place where we were able to do it. We have helped a couple of organizations out, but the vast majority of our customers end up with existing information architectures that need to be re-engineered. This means using an incremental, iterative approach: focusing on one component at a time and applying formal transformations to that model that I showed you, the one that didn't have the unvalidated piece on it, which I'm going to go back and change right now. Let me show you this last slide here and talk about the upcoming events. Again, as I mentioned, EDW is coming at the end of the month, and the next webinar is on governance strategies, which we'll have in April. With that, I'll turn it back over to Megan and we'll start to go into the questions, but as I said, I'm going to change that one slide. So Megan, I'll turn it back over to you. All right, thanks, Peter. Now it's time for the Q&A, time for you all to ask your questions, so just click on the Q&A chat feature; it's the Q&A window at the top of your screen. The first question is: what is the first step in gathering requirements for a data architecture project? The first step in gathering requirements for a data architecture is to find out what the requirements are. If you don't know the requirements, you're not going to be able to do any sort of valid data architecture at all. So you need to be very curious in order to find out what the requirements are.
I'm reminded of a story, Megan, and I think you've probably heard it around the office as well, which is that we had a company call us at one point and say, hey, can you guys develop a data strategy for us? We said, sure, we'd be glad to; what's your business strategy? And they said, we don't know, we don't care, we just want to check a box. We said, well, we'd be glad to, but we need that first. We got a call back from the IT manager a couple of weeks later, who said, hey, I understand you wouldn't develop a data strategy for our folks. I think that was a really good idea. We should have an IT strategy in place; I'll go figure one out and we'll get back to you in just a little bit on that. We eventually got a call from somebody who was equivalent to the managing director, who said, well, I'll tell you what our team decided; sign the paperwork. He said it's one word: analytics. And we just went, oh my goodness, no, no, you need to understand quite a bit more about your data strategy in order to come up with a good architecture, because if you don't have a strategy, any architecture will do. If you don't know where you're going, any road will take you there. So the first step, again, is to understand those business requirements and make sure that the information architecture supports those requirements. Great question; thanks for starting us off with that. The next question is: do you think an integrated data warehouse can be achieved with conformed dimensions in a collection of star schemas? That's a very hypothetical question. Of course, it depends on the quality of the star schemas that are done, et cetera. It's certainly possible to do. My approach is maybe slightly different. Rather than looking at whether a large integrated component can better serve the needs of the organization, the alternative would be perhaps to put some smaller components together and make sure that those components are driven by a global architecture.
In this case you're more likely to be able to achieve synergies between different components and different dimensions than in one large piece. The larger something is, the more complex it is and the more brittle it becomes. That brittleness in an unchanging environment can work to your advantage, but many of us work in highly evolving environments, and there it becomes a disadvantage. So I think that's a pretty good answer to your question; if not, please push back on it. Let's look at some more. The next question is: who should be the owner of the information architecture? It's a great question. I advocate very strongly that, for today's environment, the business should be the owner of this product. Now, let me go back to the model I put up before; this is a very good question for it. The technical people tend to look at data as storage, and it's just a question of volume from their perspective. They don't really mind what is in the data set or how it's going to be used within that context. And that's not bad; they're not bad people for doing that. We haven't really taught them that data is our sole non-depletable, non-degrading, durable strategic asset. So with these information components here, the data is really what people are focused on at the lowest level of this pyramid that I'm showing. The business is where it starts to translate into information. So IT may provide data, but the business people provide the response to the request. Is the order late? Is the order early? IT doesn't care how many orders are late or early; they just care about accurately reporting that information to the business. So that's the strategic use of the information. I have to say, I flew out here yesterday on United Airlines, and within two hours of my landing they had a customer survey team contact me, and they wanted to know what I thought about the flight.
Now, I don't know how United Airlines is using that information, but I'd be really surprised if they weren't using it to improve their quality of information at the intelligence level. I didn't mention this earlier, but for a couple of years the intelligence level was called the wisdom level, and it was also called the knowledge level. Again, this is one of the things that happens when you get to be as old as I am: you see these things come and go, but the basic structure has not come and gone. Now, back to the questioner's question. Go over there and say, Peter says you should do it this way. I've got a book on it called The Case for the Chief Data Officer. It's up there at Amazon, and it articulates this very well. I've got a number of very fine comments posted in the review section there that talk about the specifics of why this should be the case. Now, this is not to say that it should necessarily always be that way. We've had such neglect and such under-education in this area, but if we don't push it to the business, it's unlikely to get the time and attention that it needs to have. The reason for that is that IT is not worried about just data. IT is worried about security, IT is worried about telecommunications infrastructure, IT is worried about software applications, IT is worried about a number of things. That hasn't worked well in the past, so we should have no expectation that it's going to work well in the future. Instead, what we should say is, push it back over to the other side, and let's err on the side of the business in this case and not err on the side of assuming that IT will be able to handle it. Let me make one more point before we go to the next question: the chief electrification officer. Now, at the turn of the century, businesses were discovering this new capability that they had called electricity, and electricity was kind of cool, but they didn't know exactly what to do with it.
You'll recall that most of the power in those days was provided by steam engines, which were becoming better but had not yet achieved perfection. They certainly didn't have gasoline motors, and electricity was starting to come along, and we were starting to be able to generate it. That meant, first of all, that the plants could get away from the rivers or the windmills or wherever it was they were generating power. On the cattle farms they were using oxen to generate power in general. So for a couple of years there, these chief electrification officers were extremely valuable, and now we just assume that the nice power that we've always expected to be there will in fact be there. This gives us now an opportunity to do the same kind of thing with our chief data officer concept. You'll see a lot of talks around this. However, I'm not sure that chief data officer is the right title. I'm actually promoting a concept called the enterprise data executive, because it is a concept where you don't get the immediate pushback of "oh my gosh, another chief at the C level," and an enterprise data executive can reside at any level of the organization. But again, the business people are the only ones who can articulate the business value of this data. For the IT people, that's not their expertise. So I've given you a very long-winded answer. You can tell it's one of my thoughts that it should be owned by the business. Eventually I'd like to see it merge. There are many fine organizations that do that well today, but I find they are a very, very definitive minority. About one in ten organizations is even coming close to that type of process. Thanks for letting me jump on my soapbox. Next question is what do you want to talk about? So that's a great question. Now, when we talk about big data, first of all, it's a very difficult subject to actually define objectively. So I don't like to talk about big data. I like people to talk instead, and try this yourself.
Next time somebody comes with big data this and big data that, ask them if you can talk about something that you can in fact talk about. Now, I've already said in here it's really people, process and technologies that we're trying to get to. The goal with leveraging data is to in fact put it in the right place, to get the right application of people, processes and technologies all lined up the way we'd like them to be. So, goodness, I'm going to talk about that. Now, we cannot objectively define big data, but we can in fact define big data technologies. The question becomes what the role is for these big data technologies in here. Big data technologies have two very definitive use cases at this point. First of all, they are massively parallel, and that helps out tremendously for the activities that are very, very good for big data technologies here. One is the capture zone, the landing zone for where this data comes in, which means we can determine what is ROT and what is not ROT very quickly in this. The other is the cold storage zones, where we take this data because we don't need it all to be hot, of course. Think of a checking account or a comment period that the bank implements or something like that. So I'm giving you two places there where these big data technologies go. The question was specifically about the role of modeling within them. When we take this data, we are simply not saying that we do not need to model it anymore. What we're saying is we don't need to model everything. And where big data technologies are used, once we understand how the data that is consumed by these big data technologies is used, we can then incorporate it into our existing architectures in a much better, easier fashion. Now let me go one step further than that. The idea, of course, that these are technologies means they give us certain capabilities, and an analogy here is the telescope.
In the old days people said, oh, I can see, and of course we had eyes, and sometimes we had glasses that helped our eyes. For those of us that don't see as well as others, glasses and contacts do very, very well for that. The telescope allows us essentially to see into the future, and in the military people recognized this right away. They would simply say, if I've got a telescope, I can see the enemy coming towards us before we would otherwise realize it. Now, big data technologies allow us to do this. They can allow us to model more effectively. They can allow us to understand, to profile, to quantify the qualities of the things that we're looking at in order to do all of this, and to take data that maybe is redundant, obsolete, or trivial and get rid of it quicker, and sift the wheat from the chaff more effectively in here. This is an opportunity to show people how modeling can in fact help us in the longer run, because by using these big data technologies we can get to models that are useful to the organization much faster. Again, I hope that answers your question. You can tell it's sort of one of my hot button items too, because we're seeing a lot of promises around how big data is going to solve all of our problems and will be magic for us. Thanks for the question. Do you encounter any particular difficulty in getting adoption for the development of an information architecture when an agile software development approach is used? Great question. Some of the organizations that I work with are going through that tension right now. Here's how you solve it. The way to make agile work is to have your data requirements 100% complete before you start. If you don't understand your data requirements, the only possible outcome from your agile development process, which is a very good process for developing software rapidly, the only possible outcome from your software development process is more piles of data.
So what we have to do is understand that there never was a parallelism in the process of developing our data and our software requirements simultaneously. Data evolves at a different cadence than the software. Data is constant, whereas our software tends to be more or less replaceable as we've moved, evolved, changed architectures and done things with it. Now, people are using agile correctly to develop better, high-quality software. But if you don't understand the way to start that, you're literally throwing money down the drain. What happens is people end up with small piles of data that give them less integration, and the value that they get from the improved software development practices is offset, and I have a slide on that as well, by the cost of, in fact, having to go back and integrate the data. So this 20% to 40% of IT costs on this slide from John is actually much higher in organizations that are not separating their data evolution from their software development. Again, very important part, great question, thank you for asking. Okay, and a follow-on from that question is, are there specific strategies used in this case? Well, yes. Again, the strategy is very much to simply separate the two of these activities out. Our data requirements can be used as a gateway on whether we should, in fact, go in and develop the data. Let me put up a different chart here on this. Again, we're talking here about the pervasive use of data throughout our systems. If we're developing or redeveloping program E in the bottom right-hand corner there, and we don't have a full understanding of the data that's shared in the green database, much less the data that's shared in the gray databases, the only possible outcome that we can have there is to have E develop some very good software using a very good method.
Again, if you don't have 100% of your data requirements nailed down before you start that effort, if you're trying to do it at the same time, the only possible outcome is confusion, which means that you will end up with small piles of data. So again, data evolution is separate from, external to, and must precede systems development activities. All right, great, and I think that's all the questions we have for today. Thank you, everyone, for participating in today's event. We hope you've enjoyed it. Thanks again to Data Diversity and Shannon for hosting us. Once again, you will receive today's materials within the next two business days. As always, feel free to contact us if you have any questions. Thanks, everyone, and have an awesome day. Shannon, look forward to getting there tomorrow night, hopefully. Yes, we'll see what I can do, and thank you so much for the great presentation, and Megan, thank you so much for your help as always, and if anything, we'll see you at the end of the month in Washington, DC at Enterprise Data World.