Welcome to our next EDW session, called The Problem with Communicating About Data Is the Assumption That It Has Taken Place, which will be presented by Michael Bork, senior IT associate at Exo Global. All audience members are muted during these sessions, so please submit your questions in the Q&A window on the right-hand side of your screen. Our speaker will respond to as many questions as possible at the end of the talk. Please note that there is a form linked at the bottom of the page titled EDW conference session survey. This is where you can submit session feedback, and we encourage you to do so. So let's begin our presentation now. Thank you, and welcome, Michael. Hello. Good morning or good afternoon to everyone, whatever time zone you are in. Sorry for the delay. I have been an information systems professional for about 40 years. I've worked in a variety of industries: healthcare, manufacturing, banking, international development, higher education. I've been VP of information systems and a consultant. I've managed networks. I've developed systems. I've designed databases. Essentially, I've done it all, but I've always done it with data modeling. And I want to talk about communicating about data and its integration today. Let me just say that I have detailed notes for every slide that you will see, so if you miss something on a slide, that's okay; you can go back and reconstruct it. In addition, if you find any of this material interesting, I'm writing a book on this and I would be happy to share the chapters with you. Just send me your email and I'll get back in touch with you. Currently, we are at an inflection point in information systems: we are on the verge of being able to integrate data anywhere, no matter where it's located, with any user, no matter where the user is located, and no matter what the user needs to do, they will have the data available. The problem is that the technology is outstripping the methodology, in my point of view.
So stakeholders have different views of the world. On the one hand, we have business users. And when I say business, I mean any kind of organization. It could be governmental. It could be for-profit. It could be industry with products. It could be services. It could be an international development agency, an NGO, even a funding organization that provides support for data integration projects, particularly in developing countries. Next. On the other hand, we have the world of IT professionals, and they see the world differently. Next. So what happens is the two major camps of stakeholders are talking at cross purposes. The end users want to talk about their business and how it needs data. And the technical professionals want to talk about data in relation to their equipment, and neither side sees the point of view of the other side. Next. So what we need is a modern-day Rosetta Stone that provides shared concepts and mutual understanding of those shared concepts. You may recall from your history books that the Rosetta Stone was an archaeological artifact: a decree published by an Egyptian king in the second century BC, written in three different scripts, hieroglyphics, Demotic, and ancient Greek. And that was because there were three major groups of stakeholders who spoke different languages, but all needed to understand the concepts behind the law that was being published. So what you are going to see on these slides is a way to develop a common language and understanding between the IT camp and the end user camp. Next. Consequently, when the end user camp issues an RFP or asks for assistance, they will have a better way of expressing their needs, because they will have an understanding of the IT professionals. Next.
Conversely, the IT professionals will have a better understanding of the business and its data needs when they propose products and services to the end users. Next. So what we have is the two camps, as we said: the business and the IT professionals. And you can see business levels, business processes, standalone information systems, little jigsaw puzzle pieces. These are meant to represent the fragmented data that is found throughout the organization. On the other hand, you have the IT professionals, who are now able to provide tools and technology based in the cloud and the Internet that allow these organizations to integrate their data so that it can be used throughout the world in any process needed by any member of the organization. Next. And the Rosetta Stone is a data model. A data model is a graphical representation of the data requirements of a particular organization. It is an enormous object. What you see here is maybe eight or ten different entities, objects, things of interest, but a large organization's full data model would consist of anywhere from 5,000 to 15,000 of these entities. And we have all the jigsaw puzzle pieces on the left, and on the right the data model gives a common ground to start from when the two sides are communicating about their needs and about their capabilities. And that's our Rosetta Stone. Next. It helps when both sides have a shared view of technology, and I have a timeline here that shows the major historical ages as seen from the United States or European point of view. The reason I took the point of view of Europe and the United States is that's where digital computing began. This timeline basically shows the interaction between societal organizations and technology. Next. This other timeline is a simplification of the historical timeline. This is more of a technology timeline, as you can see.
Excuse me. For the first 4,500 years of this historical line, we were basically doing manual data processing: chiseling on stone, writing on papyrus, or drawing on sand. Over time we moved to mechanical data processing; the classic examples of that are the printing press and the primitive calculators that were developed during the age of discovery. Then, later on in the industrial age, we moved to electromechanical data processing: there were moving parts in calculators, along with other components like photoelectric activation of the wheels in the calculators. Then finally, in the 1940s, we developed fully electronic computing, completely without any moving parts. As a result, we have been able to digitize data, store it, process it, and distribute it anywhere we want. Next. This overlay simply shows the growth of data that has resulted from the invention of all-electronic computers. We now talk about zettabytes' worth of data. You can see that essentially the timeline over history was flat; the amount of data grew insignificantly. But starting in 1945, and then continuing through the 2000s, the data capabilities of our systems have been exploding. So we now talk about zettabytes. One zettabyte is 10 to the 21st bytes, a trillion gigabytes. Next. We were talking about the interaction of data and society, and it helps to use a thing called the battleship model. This battleship model is for literature and art over the past 15 centuries. You are looking down at a line of battleships from overhead, and you can see the outline of each individual battleship. It shows the evolution of technology, and this is a good metaphor for the evolution of technology in society. You have a movement: it starts small, expands over time, becomes the major movement, and then is eventually replaced by another movement that goes through the same life cycle. Now this metaphor can also be applied, next.
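The unit arithmetic behind that zettabyte figure can be checked in a couple of lines. This is just a trivial sketch confirming the conversion, nothing from the talk itself:

```python
# Sketch: confirming that one zettabyte (10**21 bytes)
# is a trillion (10**12) gigabytes (10**9 bytes each).
GIGABYTE = 10**9    # bytes
ZETTABYTE = 10**21  # bytes

gigabytes_per_zettabyte = ZETTABYTE // GIGABYTE
assert gigabytes_per_zettabyte == 10**12  # a trillion
print(f"1 ZB = {gigabytes_per_zettabyte:,} GB")
```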
This metaphor can be applied to society, information technology, and law. You can see that society and IT have played this leapfrogging game: as society and its organizations and institutions advance, IT advances in support of that. Sometimes IT will advance and then society will be dragged along by that advance. And this interaction is reflected in laws, and the laws always follow with a particular lag. We can see that over the past couple of decades the number of laws that concern data has increased significantly. Next. And one way you can understand what is going on with the evolution of law is to classify what the law concerns with a thing called the PAPA model. It was invented probably 30 years ago, and it stands for privacy, accuracy, property, and accessibility. You can see that initially the concern of the law fit the PAPA model, but over time, as databases have grown, we now have the Internet of Things, where you have a computer in every object in your environment and we can connect anywhere at any time. We are now moving to PAPA plus I, the I standing for infrastructure. We now realize that with the increasing connectivity come risks to national infrastructure, and whole industries can be taken down by intrusion. Next. A good example of the interaction among society, industry, and IT is the baby boomer generation, to which I belong. That generation began in 1945-46, and it entered society broadly in the 60s and 70s: getting employment in various industries, going to college, getting married. And you see now they're tending towards retirement. You can look at the industries that interact with that demographic group and how information technology interacts with them, and you can then see what laws result from it.
We see the laws that concern privacy in education, accuracy in credit reporting, and access with regard to government documents during the 60s and 70s, when the baby boomers were active in the protests against the Vietnam War. So next, next, next. I'm not going to go into the details, but you can see that for each decade, as the baby boomers have advanced, society and technology have been changing, and the law has been changing to catch up with all these advances. Next. And so what I want to show you here is an example of why an architecture and shared conceptual models are so important to data and its integration through technology. If you take a look over the last 15 months of the COVID pandemic, it affected all continents, and if you're going to deal with the COVID pandemic, you need to perform certain activities: surveillance, care, research and development, manufacturing, logistics. All of this sounds good, but if it doesn't have an architecture behind it, it can be very frustrating trying to manage a huge effort like this. Next. So over the past 15 months, it has occurred to me that there is no underlying architecture behind the whole COVID response, as illustrated by this data model. There's no set of conceptual models that would allow cooperation and collaboration among all of the stakeholders in COVID. Next. So what we have on each continent is a variety of organizations that are involved with all of these activities associated with COVID. Next. All of these organizations have their own sets of information systems. Next. Now we have sets of tools and technologies, in the cloud and on the Internet, that in theory greatly facilitate the integration of all this data, but we did not start out with a data model, so there's been a lot of confusion in data gathering and data analysis, which I don't think will be clarified for another 12 or 18 months. Next.
So once again, this is a repeat of the situation. We have the business world. We have the IT world. They can collaborate based on a shared understanding of the data. But that is not enough. Remember, I told you that a data model is a huge undertaking. It's not done overnight. It can consist of thousands of entities. And you have to get from the current situation, where most organizations have standalone, non-integrated, silo kinds of information systems, each one with its own database. These greatly disrupt the business processes, which is this line up here, because for most of the history of digital computing we have had two design constraints: number one, you cannot have a big enough integrated database to support everybody; number two, even if you did, you don't have a network that provides this data universally and instantaneously. We have now reached the point where both constraints have fallen. And so we need to also talk about the methodologies. It's not enough to have a shared language and set of concepts about data; you need to have methodologies. We're going to discuss those on the next few pages. Next. I would add that there are constantly emerging technologies, and so technologies have a rather short lifespan, but a methodology has a lifespan of decades. It's how you do something. Technologies are really the tools, and the tools are constantly improving, but the methodologies don't evolve as fast, and not enough attention is paid to methodologies. Anyway, here's the methodology for change management. It's called the Seven S model. According to this model, there are seven components to an organization, and each of these components starts with an S. Next. And here's just a short overlay defining each of these components.
And remember, the notes behind these slides have all the details that you see in the text that periodically flashes in. The lines are meant to indicate that if you make a change in strategy, it potentially impacts every other component of the organization, and the strategy may rely on those components: for the strategy to be successful, you may have to change your systems, and if you don't, that's going to undermine your strategy. Next, next. Oh, thank you. I lost my thought for a second. The methodology for change: I'm going to explain that with a concrete example, the Y2K problem. You may recall that older information systems were in danger of malfunctioning when we switched from 1999 to the year 2000. For large companies, this represented the possibility of financial loss, and actually loss of human life if these computers controlled equipment. And large companies, particularly large industrial companies and oil and gas companies, were running antiquated information systems that had been built on a standalone, silo basis. And they said, look, we could spend $100 million and fix the Y2K problem, but then we'd still be stuck with these old systems that aren't any good; they're not integrated. Let's just spend another $200 million and bring in an ERP, an enterprise resource planning package like SAP. For this money, we'll get a completely new set of integrated information systems that will support all our major areas, like finance, operations, customer management, HR, and so on. And so they went and did that, but many of them had nasty surprises, because they didn't consider the Seven S's. Next. Not only did the new strategy of an ERP require significant restructuring of your organizational structure, it also required that you change your systems. Next.
This new IT required that you redesign the business processes. And remember I said, for four decades of information systems there were two fundamental constraints: the data couldn't be integrated, and the access couldn't be universal and instantaneous. Now, all of a sudden, with these ERPs and their gigantic databases and Internet access, it became possible to completely redesign your organization. Next. You had to get more staff. Next, you had to give new skills to your people. Next, you had to require changes to behaviors. I mean, everything was going to change; therefore, employee behaviors were going to change in all your business processes. Next, you might have to change your entire culture, because your corporate culture had been built on information systems that were silos and did not encourage inter-organizational data sharing. Next. Change management implies learning, and here we see three learning curves for a new information system. You learn a new strategy. You learn a new technology. And then you learn new ways to restructure your organization so that it can execute the strategy with that technology. Now, you see that these learning curves are sequential. The trick is to have them clustered as close together as possible, so that when you achieve a fit, a particular level of learning, you'll be able to execute the strategy with the new system in a very effective way. Next. We see two companies, Company A and Company B. Company A has learned faster, so you see its learning curves are closer together in time. Company B has a large gap between organizational execution and technology. Let me just rephrase this: vendors will tell you that if you get their system, you can achieve competitive advantage.
Well, perhaps, but the caveat is you have to be able to assimilate that system fast enough, before your competitors do, in order to achieve competitive advantage. Next: the methodology for data modeling. We have touched on this briefly. Remember, the data model is that Rosetta Stone. It's the common ground by which the two worlds, the business world and the IT world, can achieve mutual understanding. So what is that? Next. When we talk about the data model, once again, it can be thousands of entities large, and it is basically an architectural drawing that shows what things you need to manage for your organization to be successful. And here, in light of COVID and the fact that most people have some experience with healthcare, I'm tending to use healthcare examples in this presentation. However, in the book, there is a section in every chapter that deals with industries other than healthcare, in particular manufacturing, banking, and higher education. So if you're interested, that can be made available to you. In any case, we have the data model, and, next, it can be huge. And it has multiple phases, and these phases are iterative: as you go from one phase to another, you loop back, you learn, and you modify. I'm not going to say anything else in detail about data modeling; the book has an entire appendix on how to build a data model if you really need or want to know that. Next. And the next three slides just show the three phases of data modeling. In phase one, you're interested in just finding out the things, the events, the people, the places that you need to manage in order for your enterprise to be successful, and you state a business reason why they're connected. So a physician writes an order, and this order is for equipment.
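That phase-one idea, entities plus a stated business reason connecting them, can be sketched in code. This is purely an illustration with hypothetical names drawn from the talk's healthcare example; a real conceptual model lives in a data modeling tool, not in Python:

```python
# A minimal sketch of the phase-one conceptual model: entities
# (Physician, Order, Equipment) and the business rules that connect
# them ("a physician WRITES an order"; "an order IS FOR equipment").
# Entity and field names here are hypothetical, for illustration only.
from dataclasses import dataclass

@dataclass
class Physician:
    physician_id: int
    name: str

@dataclass
class Equipment:
    equipment_id: int
    description: str

@dataclass
class Order:
    order_id: int
    written_by: Physician   # relationship: physician writes order
    item: Equipment         # relationship: order is for equipment

# Walking the relationships answers a simple business question:
# who ordered what?
dr = Physician(1, "Dr. Adams")
iv_pump = Equipment(100, "IV pump")
order = Order(5001, written_by=dr, item=iv_pump)
print(order.written_by.name, "ordered", order.item.description)
```

In a later, fully attributed phase, each of these entities would pick up the rest of its fields (dates, statuses, identifiers), which is exactly the iterative elaboration described above.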
This is just a very superficial model, but it illustrates that the model is meant to show the logic of the business in a way that can be understood by both camps. For the next phase, consider that a data model eventually becomes a blueprint for a database, and a database is meant to answer any questions that any user at any level might have about the data. So in this next phase, you're interested in the structure of the data and how one might navigate through it to answer end user questions. Next. And then finally, there are all the fields. Each entity gets analyzed to determine what other fields are needed, content-wise, beyond the structural ones. This is called a fully attributed model. And once again, this end state will not be built overnight. If you are serious about integration, it will take you several years to reach your ultimate data model, but even then, it will continue to change. The reason is that society and technology continue to change all the time, and that just gets reflected in your data model. Next. The next slide illustrates the complexity of data modeling for COVID in the United States. So what we have here is, say, the management of the COVID effort in the United States at the national level and at the local level, with public health officials. And they need a database, which we would call a data warehouse, for decision making. Next. At the same time, we have various types of organizations providing care, and providing lab and pharmacy support for that care. And then we have multiple types of organizations involved in R&D, manufacturing, and shipping. Each of these organizations, particularly the larger ones, has its own data model. So these data models eventually have to be harmonized into an industry-specific data model.
And eventually, next, all these data models need to be linked so that their statistical data can be transferred to the data warehouse used by public health management for the pandemic. Next. Just a little bit on process redesign. Recall, and it cannot be stated too often, that business processes were initially poorly designed. What you see here are, say, the logical processes in an inpatient admission for a small hospital: register the patient, make a diagnosis, take an X-ray, treat the patient, discharge the patient. And this hospital uses the older generation of information systems that were never designed to be integrated. So it has a registration system, an imaging system, and a charting system, which holds all the medical notes and results. They have separate databases, and I have inserted filing cabinets for paper, because this is a poorly designed process. And the poor design is reflected not in this logical flow here, but in the details of the workflow beneath these high-level processes: the day-to-day, minute-by-minute tasks and steps performed by all the stakeholders in the process: the clerks, patients, physicians, nurses, and others like insurers or watchdog agencies. So, because the workflows had to be designed to accommodate flawed information systems, we get, next, what is called non-value-added activity in the processes, and you see this here for each of the stakeholder groups. There are repeated activities. It takes much longer to execute the process. Errors from re-entering or duplicating data can cause bad outcomes in a hospital. And overall, the efficiency of the process, the quality of the process, and the experience, in this case for the patient, or in other industries for the consumer or customer or client, are negatively impacted.
But all this goes away when you start integrating systems. And so there have been four different approaches to justifying the integration of systems; you might call this cost-benefit analysis. Initially, computers were really not viewed as systems to support a business process, like managing a loan or managing a patient or managing a tax return; they were designed to support low-level tasks like printing data, sorting data, storing data, or allowing data entry. Then, in the 1970s, we started to justify systems based on efficiency. We would establish an information system, a silo, a standalone system, for a particular department within the overall enterprise. And this was actually quite successful at the department level, because it provided both efficiency and effectiveness. But overall, these silo systems did not help the enterprise. Enterprise systems came along in the early 90s and continue today. Once again, it's the ERPs. They integrate the major data areas, as I say: finance, operations, customers and products, and human resources. And, once again, they allow you to completely redo your organizational structures and your business processes. The mantra during the 90s, when ERP and other such systems were being implemented, was don't automate but obliterate. In other words, don't try to automate your existing processes; throw them away and start with a clean slate. Over time, as connectivity has improved, databases have gotten bigger and more integrated, and we have computing power stored in almost every object in our daily life, we have realized that an enterprise is really part of an ecosystem. So, for example, an ecosystem could be your suppliers, your enterprise, and your customers. Once the concept of ecosystem arrived, we realized the nation has an ecosystem too, because we have nationwide utilities like oil and gas, telecommunications, electricity.
And it is possible to bring down an entire ecosystem if the data, all this wonderful integrated data and universal access, is not managed properly. Next. Let's talk about the methodology for project management; we're getting close to the end of the methodologies. There are two approaches to managing a systems development project. One's called waterfall; the other one's called Agile. They both involve the same phases, that is, project initiation, analysis, design, development, and implementation. But there are big differences, in that the Agile method allows you to group these phases together in smaller sets of activities and do them iteratively. And notice the Agile method allows you to loop back within one cycle. The development cycles are called sprints, and they're supposed to last weeks rather than the months or years you have in the traditional waterfall method, the waterfall implying that you can never go back. And that's why we have the gray dotted lines going back from one phase to another: because you can't really do it; it's too expensive and it causes too many delays. In any case, with Agile, the IT professionals and the end users frequently are co-located. They work together almost daily on a given sprint. And at the end of a sprint, you have a working version of the eventual system. Notice you have a bunch of question marks for the waterfall method, because the end user never really knows what he or she is getting until the end of the project. And when that happens, there are many nasty surprises, because society and technology do not wait until you finish your system before they change. They're changing all the time. So if you don't have a methodology like Agile that can adapt to that, then you're going to have trouble with project management. And one final thing about Agile, as it relates to the data model. Recall, I said the data model is enormous and is developed iteratively over a span of time.
And that is precisely what the Agile method allows you to do in systems development. Hey, Mike, I just wanted to give you a quick note that... How much time do we have left? A little less than five minutes, if you want any time for Q&A. Okay, fine. I'll go quickly over the next couple of slides. Next. There is a methodology for data governance. Once again, this is highly dependent on the developments in technology and society, in particular the law. We have had three major approaches to data governance, which is data management in general: database administration, data administration, and then data governance. The first one, database administration, was like the care and feeding of files. Data administration came when you had multiple databases, none of which had been designed to be coordinated or integrated; you had to do a lot more work on defining what data means what to what people. And then finally, as we realized that if systems are breached, you can undermine or destroy entire sectors of the economy, we now talk about data governance, because it's so much more important now to understand the legal ramifications of your technical work on systems. Next. This is just a repeat of the concept. We have the end user camp. We have the IT professional camp. Hopefully we've shown you ways to achieve mutual understanding between the two, with a shared understanding of a data model and shared conceptual models of business processes and technical components. Next. And of course, we've been talking about the data model. Next. Once again, just to repeat, the data model allows you to bridge the understanding gap between the two parties, so that the end users understand better what the IT professionals are saying and vice versa. Next. We only have two slides left. Hang in there. But it cannot be emphasized enough that you need to have tools.
You need tools to manage all these models: the data model, the conceptual models of business and technology. So you need a data modeling tool. Next. You need a data dictionary. You have to have graphical software to handle queries. You need software for process modeling, and software to manage your Agile projects. This is very software-intense; you need tools to do something of this complexity and scale. Next. So here we are, at the end of the presentation, and here are the things I'd like you to take away. Next. You've seen the conceptual framework for data integration. Next. You've seen the methodology issues: that is, integrating your organization with integrated data and technology is a very complex task. The tools and the technologies are now becoming available, but you need to have methodologies, rigorous ways of getting from point A to point B. So we've seen that. Next. And finally, I don't know what your backgrounds are, but I'm assuming I'm dealing mostly with the end user camp and that many of you are considering what to do about integration. What I can tell you is you need to find consultants who can show you how they apply numbers one and two to improve your data integration. If you do an RFP, ask the consultant to show you how they bridge this understanding gap between their people and the people who are issuing the RFP. Lately I have seen many, many RFPs, particularly from developing countries that have reacted to COVID and now realize they need to integrate their systems, and it is clear to me that not a single one makes any sense, because they don't have the slightest idea of what they're looking for. So anyway, that's it. And once again, if you find any of this material valuable, send me an email and I'll send you the chapters of my book. Thank you. Mike, thank you so much.
That is all the time that we have. We don't have any time for questions, but there were a couple of questions; I'll get those to you, Mike. And if you want to connect with Mike, you can go into the speaker section in the SpotMe app to connect with him and ask any additional questions you have. And once again, a reminder that there's a form linked at the bottom of the page titled EDW conference session survey; this is where you can submit feedback for today's session. That wraps it up. You're welcome to continue networking with the attendees and each other through the SpotMe app as we break between sessions, and we look forward to seeing you then. And don't forget to check out the sponsor section for information about tools you need to support your data management program. Thank you so much, Mike. Thanks for working through the technical challenges, but we made it. Got it. Thank you. Okay. Thank you.