Hello and welcome. My name is Shannon Kemp and I'm the Chief Digital Manager for DATAVERSITY. We'd like to thank you for joining today's DATAVERSITY webinar, Approaching Data Management Technology, sponsored today by Alluxio and Infogix. It is the latest installment in a monthly series called Data-Ed Online with Dr. Peter Aiken, brought to you in partnership with Data Blueprint. Just a couple of points to get us started. Due to the large number of people attending these sessions, you will be muted during the webinar. If you'd like to chat with us or with each other, we certainly encourage you to do so; just click the chat icon in the bottom middle for that feature. For questions, we'll be collecting them via the Q&A in the bottom right-hand corner of your screen. Or if you'd like to tweet, we encourage you to share highlights or questions via Twitter using the hashtag #DataEd. To answer the most commonly asked questions: as always, we will send a follow-up email to all registrants within two business days containing links to the slides. And yes, we are recording, and we'll likewise send a link to the recording of this session as well as any additional information requested throughout the webinar. Now, let me turn it over to Dipti for a word from our sponsor, Alluxio. Dipti, hello and welcome. Hi, Shannon, and hi, everyone. Great to be here today. Let me go ahead and share my screen, and let me know if you can see it. Looks great. Great, thanks, Shannon. Hello, everyone. I'm here to talk a little bit about Alluxio. Alluxio is a unified data orchestration layer for the cloud. What exactly does this mean? In today's world, we're seeing four big trends that are driving the need to think about a new architecture, and Peter will talk about how important architecture is later on in today's session. We're seeing the separation of compute and storage becoming increasingly important as enterprises move to hybrid and multi-cloud environments, and new storage technologies like object stores, in the cloud as well as on-premises, becoming more important. All of this is driving toward self-service data and a data-driven culture across enterprises. With these four trends, an ecosystem that was very simple about ten years ago — it started off with one compute system and one storage system, Hadoop MapReduce as the compute layer and HDFS as the storage layer — has become quite complex in today's world. In this environment there are many more compute frameworks, with more coming as the ecosystem evolves, and more storage systems keep developing as well, from NFS to HDFS to, increasingly, object storage. In this complicated environment, things are becoming expensive and complex, and performance is starting to suffer for data-driven workloads. What enterprises are seeing is a need for a new layer right in the middle of the stack, between compute and storage, to enable independent scaling of those layers. And that's what Alluxio is. Alluxio is a data orchestration layer that helps you scale out your compute frameworks and burst into the cloud, or on-premises, independently of the storage tier underneath, which might be a range of different storage systems that have evolved over the years.
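To make "data orchestration layer" a little more concrete, here is a small conceptual sketch. This is not Alluxio's actual API — just an invented illustration of the idea: one logical namespace routed across multiple under-stores, with a cache tier sitting between compute and storage.

```python
# Conceptual sketch only -- not Alluxio's actual API. It illustrates the
# "unified namespace" idea: one logical path space routed to many stores,
# with a local cache tier sitting between compute and storage.
from typing import Callable, Dict

class UnifiedNamespace:
    def __init__(self):
        self.mounts: Dict[str, Callable[[str], bytes]] = {}
        self.cache: Dict[str, bytes] = {}  # stand-in for a memory/SSD tier

    def mount(self, logical_prefix: str, reader: Callable[[str], bytes]):
        """Attach an under-store (S3, HDFS, NFS...) at a logical prefix."""
        self.mounts[logical_prefix] = reader

    def read(self, logical_path: str) -> bytes:
        if logical_path in self.cache:           # data locality: cache hit
            return self.cache[logical_path]
        for prefix, reader in self.mounts.items():
            if logical_path.startswith(prefix):  # route to the right silo
                data = reader(logical_path[len(prefix):])
                self.cache[logical_path] = data  # warm the tier for next time
                return data
        raise FileNotFoundError(logical_path)

ns = UnifiedNamespace()
ns.mount("/sales", lambda p: b"rows-from-s3")     # e.g. an S3 bucket
ns.mount("/legacy", lambda p: b"rows-from-hdfs")  # e.g. on-prem HDFS
print(ns.read("/sales/2019/q1.parquet"))          # same API either way
```

The design point is that compute addresses a single path space while the storage systems underneath can change independently.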
And so with this new data orchestration layer, you can achieve better data locality, better accessibility for your data, which may be spread across different data silos, and better elasticity as compute needs to become more and more elastic. So let's take a look at the use cases that this separation and data orchestration enable. The first is better performance and better accessibility in a single cloud; the example here shows Spark, or other big data frameworks, running on S3. With the data orchestration layer in the middle, you can bring your data very close to your compute, in the same environment as your compute, and it accelerates your performance. You can also take existing deployments like on-premises HDFS and enable compute in a hybrid environment, with Hive or Spark or Presto running in the cloud against HDFS instances on-premises. And you can enable much more complex environments, including running big data on object stores, on-premises or in the cloud, which Alluxio and data orchestration enable as well. These use cases are enabled by a few key innovations. Alluxio includes a unified namespace, which makes it easy to address all your data silos that might be spread across different storage systems. It includes APIs for different types of frameworks — an HDFS integration, an S3 integration, a POSIX integration — so the same data can be exposed in many different ways to different frameworks. And it enables data locality with caching and multi-tiering, keeping data in memory, on SSDs, or on disk, as close to the compute as possible. So that, in a nutshell, is Alluxio and what Alluxio enables. Alluxio is an open-source project with great momentum. I invite you to try us out and join the conversation on our Slack channel via alluxio.org. Back to you, Shannon. Dipti, thank you so much for this great presentation and sponsorship. And if you have questions for Dipti, she will be joining us in the Q&A portion of the webinar at the end. Now let me turn the presentation over to Cam from our second sponsor today, Infogix. Cam, hello and welcome. Hey, thanks, Shannon, really appreciate it. Hello to everyone out there. One of the things that makes us so excited to be a part of this data education series, and this webinar that Peter is going to take us through today, is asking the question: why is it that some data management organizations are successful while others fail? I think, like most success stories, there are common patterns and trends among high-performing organizations. As you can see from the slide, Infogix has been around for a long time, and we've helped build hundreds of data programs over the years — for many of you who are on the phone here today. And there's been a lot of research behind this. Based on external research and even our internal findings, there are some specific, common patterns behind why some organizations succeed and others fail. One of the biggest is that most successful data management programs — in fact, 85% — measure the impact that the program has on critical business objectives. So 85% of the successful programs do some sort of measurement of how the program is delivering value to the business. And how do they do that?
Well, typically data impacts the business in three predominant ways: how it gets used for analytics and insights, how it gets used for operational purposes, such as within business processes, and how it's used to reduce risk from a compliance and regulatory standpoint. What we do at Infogix is look at what data is critical to driving the business objectives in those key areas. Typically it comes down to about 1% to 2% of your data overall. So if you think about the vast amount of data that you have in your organization, that 1% to 2% really drives about 90% of the outcomes. What we do is provide services and solutions that help our customers build a strategy and a roadmap that builds competencies around that philosophy, and looks at how the people, process, and technology need to best serve the critical data that drives those outcomes. We then tie those competencies to solutions that measure the business value and the ROI itself. Our solutions are specifically in the areas of data preparation, data quality, data management, and data governance. On the converse side, why do some companies fail? One of the interesting things Peter will get into is that there are patterns there as well, and some pitfalls — lessons some of you on the phone have learned the hard way in the past, and that we've seen in our own experience as well. Companies that look to technology as a silver bullet, instead of accounting for the people and the personas who will benefit from those technologies, tend to misstep. Organizations that don't shape their tool strategy so that it addresses the core business objectives and needs tend to misstep. And organizations that think with a project focus instead of a program focus tend to misstep as well. Those are all really exciting ideas that Peter is going to unpack as part of the conversation. So thank you again for taking the time to listen. We're really proud to be a sponsor and a part of today's conversation. And Shannon, at this point I'll turn it back over to you. Thank you. Cam, thank you so much. And again, Cam will also be available in the Q&A portion of the webinar at the end if you have any questions for him. So let me introduce our speaker for today, Peter Aiken. Peter is an internationally recognized data management thought leader. Many of you already know him or have seen him at conferences worldwide; he was just at our Enterprise Data World conference in Boston. He has more than 30 years of experience and has received many awards for his outstanding contributions to the profession. Peter is also the founding director of Data Blueprint. He has written dozens of articles and 11 books, the most recent on data strategy. Peter has experience with more than 500 data management practices in 20 countries and is consistently named a top data management expert. Some of the most important and largest organizations in the world have sought out his and Data Blueprint's expertise, and Peter has spent multi-year immersions with groups as diverse as the U.S. Department of Defense, Deutsche Bank, Nokia, Wells Fargo, the Commonwealth of Virginia, and Walmart. And with that, let me turn everything over to Peter to take off with today's webinar. Peter, hello and welcome. Hey, Shannon.
And first of all, thanks to Cam and Dipti for setting us up for this. As I said, we do have some interlocking themes that we're going to talk with you all about today. Just a quick note: we've got people scattered about, as far as I can tell from here — as far north as Calgary — and I am in Bogotá, Colombia. So welcome, everybody; it is definitely not winter down here. What we're going to talk about today should give you a better understanding of the fundamentals around data management technologies. And the reason I'm so excited to have Dipti and Cam join in later on is that there's a wide variety of practice around this — some of your organizations are absolutely ready for some very advanced technologies — so, Shannon, probably what we ought to do is look at breaking this into two different sections. Peter, I hate to interrupt, but your audio is starting to break up a little bit. I don't know if it's where your phone is. All right — we may have to come back to that, but we should be good. What we're going to do is talk about technology considerations here for a bit, and then we'll talk about data technology architecture, which is important — very few, maybe one in ten, organizations actually manage a data technology architecture. We'll dive into, of course, our good old friends CASE tools and repositories, and something that's relatively new: profiling as its own separate class of tools. I say relatively — they've been on the scene for about 15 years, as opposed to CASE tools, which have been around for almost 50. Then we'll look specifically at some data quality engineering tools, a little bit on the data quality life cycle, and if we have some time, we'll get to some other technologies and see how it goes, given all the things we've got to try to accomplish today. To get started, I'm going to give you a little bit of background on — I'm going to interrupt you again; we can barely hear or understand you. You're coming in pretty broken up. I have absolutely no idea what's going on — I'm dialed directly in. Let me try switching over to the computer audio instead. Are you getting me at this point? Kind of. Kind of — that's a good answer, thank you for that. There we go, it switched. Are you there? A little better — at least clear; there's no choppiness, but your volume is low. We can turn up the volume here. Sorry, guys — thanks for your patience. We should do a webinar on these challenges in the future. Shannon, can you hear me now? Yes. I'm not sure what that was all about, but we'll give it a try.
Anyway, back to where we were before the interruption, guys. There we go — that's why Shannon is so important to these events. All right, thank you very much. What we see happening in most cases when organizations are moving to the cloud — just to take one example — is the idea that it's a technology project: we're going to put all of our data into the cloud. I've been to a lot of organizations in the past five years that say we're going to do this. The problem is that when you move your data into the cloud that way, there's no basis for making decisions about what data should go and what shouldn't. And as Cam said already, when you move your data to the cloud without any inclusion of architecture or engineering concepts, there's no awareness that those concepts are even missing from the process. There's my little note there: 80% of organizational data is redundant, obsolete, or trivial — ROT — and we just don't need to move it into the cloud at all, unless you just want to enrich the cloud vendors. So really, what you should aim for in moving to the cloud is less data: the data that goes in should be cleaner, and it should be more shareable by definition, which means overall you need less of it all the way around. Now, here are a couple of strategic planning assumptions that Gartner put out in December. By 2021, data strategies using hubs, lakes, and warehouses together will support more use cases than taking any one of those approaches independently. Similarly, by 2022, Gartner says 50% of cloud decisions will be based on the data assets a cloud provides access to rather than on product capabilities — they see a merging of all these technologies. And organizations using active metadata will reduce time to data delivery by 30%, which is something we'd all like to get to. One final prediction from Gartner: by 2023 we'll need 20% fewer data specialists. It's the first time I've ever heard anybody tell me we need fewer data people in this world, and I'm pretty sure they're wrong about that one. So if you're looking at cloud offerings, stop looking at what the cloud does and instead look at what the cloud gives you access to. An easy example: if you need access to YouTube data, then Google is probably a better cloud offering than Azure; if Office 365 data is more important to you, then Azure is going to be the better fit. This next picture is from the Gartner report — again, it's out there at Gartner; you can get all of it at the URL down there. What it really gets to is that we don't have a good definition of data management all the way around. They look at it as moving from collect to connect; we've always looked at it as what happens between the sources and uses of data. The idea is that once you start specializing, you'll have some data engineering, some storage, and some data delivery capabilities, along with your data governance pieces, and that means you need specialized team skills — and that applies to the technology component as well. But even this diagram, as much as we like it, is insufficient in my opinion, because it doesn't reflect the idea that data is something we want to reuse, as opposed to simply deliver.
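Since the ROT point above is actionable, here is a hedged sketch of what a pre-migration screen might look like. The categories come from the talk, but the detection rules and thresholds are illustrative assumptions, not a standard.

```python
# A heuristic sketch of screening for ROT (redundant, obsolete, trivial)
# data before a cloud migration. The thresholds and rules here are
# illustrative assumptions only.
import hashlib, os, time

def classify(path: str, seen_hashes: set, stale_days: int = 730,
             trivial_bytes: int = 64) -> str:
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    if digest in seen_hashes:
        return "redundant"                  # exact duplicate content
    seen_hashes.add(digest)
    age_days = (time.time() - os.path.getmtime(path)) / 86400
    if age_days > stale_days:
        return "obsolete"                   # untouched for ~2 years
    if os.path.getsize(path) < trivial_bytes:
        return "trivial"                    # too small to matter
    return "migrate"                        # survives the screen
```

A real screen would of course use business rules about criticality, not just file age and size, but the shape of the decision is the same: classify before you move.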
So I like to talk about making a better data sandwich in this context. The data sandwich is really composed of three parts: there's the matter of data literacy among the folks you have working on this, and there's a data supply that's probably uneven, with individual uses of standards, sometimes more and sometimes less. What we'd really like to do, of course, is smooth those out and make a much more palatable sandwich. Now, this cannot happen without engineering and architecture; it simply does not work. I was on a tea farm, interestingly enough, in India last summer, where I saw a little plaque behind the cash register: quality, engineering, and architecture work products do not happen accidentally. And when we ask ourselves why there is so much sand in our data sandwich — why our technologies aren't helping us get data delivery the way we'd like — we have some big, big problems. Technologies by themselves are a one-legged stool, and I can tell you that if United Airlines puts me on a one-legged stool to get me from Bogotá back to Newark on Sunday, I won't be comfortable. We know that three legs are the minimum you need, and of course the three legs are people, process, and technology, which should be interrelated as much as they possibly can be, because only when you have them working together do you actually end up with good success. Let me give you a very quick example of this; we do an MDM webinar around it. MDM is a great set of technologies, but it is also a discipline, and most of the time we see it sold as a solution. The problem with that is people think if they buy this silver-bullet solution, it's going to solve their problems. It will — if you also include the people and process pieces. So putting up an architecture like this is only going to be partially helpful to your actual solution. Technology-first means de-emphasizing the people and process components, and successful MDM requires governance and quality, as well as an understanding of your process architecture. Tools and methods are both required to get this stuff to work. There's an enormous demand for data talent out there right now — it is literally going through the roof — but our supply is not increasing, and this is a problem as well. I can give some numbers: there's a study showing that stored data is accumulating at 28% annual growth — I wish our retirement funds would accumulate at 28% annual growth — but the supply of data analysts in the workforce is only growing at 6%. These are the kinds of gaps where we need technology, and yet what we see a lot of times is people going off and buying technologies without understanding their requirements. Again, that's why we're so happy to be working with the two vendors here today, who will tell us a little more in the Q&A section about how they can help organizations do this. I mean, think again about Moore's law — technology keeps getting cheaper. The hardest part is doing the requirements, not doing the buying. So one of the goals — a takeaway, if you will — is to postpone your technology investments as long as you possibly can, because that will allow you to understand more about your requirements, and then see where one of these various technologies can actually come in to help.
Thinking, too, in terms of your leadership — whether it's a CIO or a CDO at the top of your chain — they're feeling a lot of pressure to buy technologies. And if these individuals won't buy — if they do listen to what we're talking about today and postpone the buying decisions until they really are ready to be a good customer — then the vendors go around them, straight to the CEO or straight to the board. It's just amazing to see the amount of pressure these individuals are under. We also need to start doing a better job with what we call vendor or project promise auditing. When somebody stands up and says, "if you deliver this to me by Thursday, I'll get it 50 million to the bottom line," there are some organizations now that are standing up and checking whether the last three promises that individual made were in fact correct — because people don't really understand the hype curve. Now, a quick bit on this: "when considering a new subject, there is a frequent tendency to first overrate what we find to be already interesting and remarkable, and secondly, by a sort of natural reaction, to undervalue the true state of the case." This was written between 1850 and 1852 by the world's first programmer, Augusta Ada King — Lady Lovelace — alongside the first program she wrote. She described something that Gartner later turned into the hype cycle, and they've done a great job with it. The technology hype cycle starts with some sort of technology trigger, rises to the peak of inflated expectations, drops into the trough of disillusionment, climbs back up the slope of enlightenment, and then moves onto the plateau of productivity. The key, to put it in the street vernacular: "wow, it's the best thing in the world," then "oh no, it really sucks," when the answer is that it's somewhere in between, and it's going to be up to us to collectively figure that out. Here are a couple of examples using the hype cycle for data management as of July 2018, the last one Gartner issued. Data as a service was at the top and about to fall to the bottom, so if you're in the data-as-a-service business you may want to brace for the next part of the cycle. If you're looking at information steward applications, things are about to get better for you. And master data management lives at the trough of disillusionment, at the very bottom, right now. Data management platforms as a service, on the other hand, are a fairly mature category. A couple more quickly: information governance and master data again; machine learning is just on the ascending curve — we are just getting started with machine learning. MDM solutions for product data and customer data are now looking pretty mature. So you should consult these cycles as you're looking at technology. Gartner has them out there — they'd like you to pay for them, but I guarantee if you look hard you'll be able to find them. One more quick one: data storytelling is becoming important, as are chatbots; prescriptive analytics, however, is about to go over the oopsie at the very top; and if you're in text analytics, you're doing pretty well.
So let's talk about what a data technology architecture is, with the idea that you need to have some sort of organization to it: you need to understand how the technology works and what particular value it is going to supply in the context of your business requirements, so you can ask these questions of the vendors — and you'll get a chance to at the end of this. What problem is this technology meant to solve? What sets this technology apart from the others? Are there specific requirements you need in order to run it — do I have to be a certain type of shop? Does the technology include security functionality? We've all seen the incidents where people have left databases open on the web. This data technology architecture is part of the overall enterprise data architecture, and it addresses the questions: what technologies are standard, required, preferred, and acceptable? Which technologies apply to which purposes and circumstances? And in a distributed environment, where should everything reside? Now, just so you know, the rest of this presentation is largely organized as a takeaway — Shannon will send out the slides at the end, along with our sponsor slides, and you can come back and take a look later. I'm pretty sure most of you have heard of CASE — computer-aided software engineering — tools. But guess what: the vast majority of our student population out there has never heard of CASE tools. This is appalling. We stopped teaching CASE tools because CASE tools stopped being given away. The CASE tool market is alive and thriving, but it is amazing when you see a Microsoft executive stand up in a public meeting and say, "oh my gosh, I just discovered there are tools that help you do data modeling — this could be very valuable to us." I can't emphasize enough how important it is that your organization understands the role CASE tools can play. Good CASE tools will take you all the way from planning statements — things that are your requirements — through the different components that hold all of this information together in one place, and then actually create work products for you: in some cases XML, in some cases DDL that goes into your databases and regenerates your database. All of these things are important. The perennial drawback has been that when we ask people what their main CASE tools are, they cite three Microsoft products — Excel, PowerPoint, and Visio — none of which is in fact a CASE tool. They are drawing tools, and of course Excel has its own pathologies as well. We've also seen an evolution. In the old days, ERwin and ER/Studio were the big players in the market, and Rational Rose came along because it was being given away to colleges and universities, so we tried to teach students data modeling with Rational Rose — which can be done; it's not the easiest. Now, of course, those products live on under Idera and erwin, we have some very, very good product offerings on the market, and you will see them at the various industry forums. Plus, of course, there's open source — and just remember that open source is never free; it needs its own care and feeding.
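To illustrate the kind of work product a CASE tool forward-engineers — DDL regenerated straight from a model — here is a minimal sketch. The model format is invented for illustration; it is not any particular tool's format.

```python
# A minimal sketch of a CASE-tool work product: a data model held as
# structured metadata, forward-engineered into DDL. The model format
# here is invented for illustration.
model = {
    "Customer": {"customer_id": "INTEGER PRIMARY KEY",
                 "name": "VARCHAR(100) NOT NULL",
                 "email": "VARCHAR(255)"},
    "Order":    {"order_id": "INTEGER PRIMARY KEY",
                 "customer_id": "INTEGER REFERENCES Customer(customer_id)",
                 "placed_at": "TIMESTAMP"},
}

def to_ddl(model: dict) -> str:
    stmts = []
    for table, cols in model.items():
        body = ",\n  ".join(f"{col} {decl}" for col, decl in cols.items())
        stmts.append(f'CREATE TABLE "{table}" (\n  {body}\n);')
    return "\n\n".join(stmts)

print(to_ddl(model))  # regenerate the schema straight from the model
```

The point is that the model is the asset; the DDL, XML, or diagrams are just projections of it.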
I've included here a list of CASE tools that goes on for five pages. Again, these are good things to have, but they are not something most of your folks are familiar with if they've graduated from good colleges and universities recently, because we stopped teaching them about 20 years ago — and that's just absolutely appalling to me. Another problem we've had with CASE tools all the way around is that they cost per seat. Here's an old slide — these numbers are going to look dated, but they haven't changed; the point is still valid. If you have to spend hundreds or thousands of dollars per seat, you have to add it all up: at, say, $2,500 a seat across 75 seats, plus all the workflow you build around it, you can see it adds up to a fairly big investment. Here is a CASE tool taxonomy I adapted from a colleague's piece: there are upper, middle, and lower CASE tools; there are management tools, technical tools, and support tools. There's a lot going on in this environment, but if you don't even know it exists in the first place, it's very hard to make it useful in your organization. To finish up on CASE tools: the real key is that the old way of doing it was that everything had to fit into a single CASE tool environment, with limited access into and out of it. I worked on a project at the Defense Department — it was the early 90s — where we were going to create something called the CASE data interchange format, a precursor to XML and all of that. The new model is that you have your metadata, and you have more CASE tools as a service that can come in and do certain types of things for you. For example, one CASE tool I like a lot does great clustering, and I haven't seen that duplicated in other tools, so I like to use that tool specifically for its clustering algorithm; a different CASE tool, on the other hand, might actually give you a better layout of what the data model should look like. So CASE tools are big pieces. The next thing we're going to talk about is repositories. Again, this is from that same Gartner report referenced before. Just notice the two pieces in green: identifying data that delivers value, and supporting data governance and security — those are the big tent poles here that everybody wants to pay attention to. Gartner did some measurements around this, and they said that metadata occupies about 12% of the time we spend doing data management. That's a nice number; I've seen it before, and Gartner is pretty good with all this. But at the same time, Gartner will also tell you that they view metadata repositories as an esoteric technology — esoteric in this case meaning not directly related to the business. That's a problem. And when we look across the repository environment, we tend to see an awful lot of organizations falling into this pattern: about 45% really don't use anything at all. That's sort of amazing, and it does have something to do with the size of the organization.
Bigger organizations are starting to realize that they will gain efficiencies from using repository technologies. One in four, however, are building their own and doing just fine with it. Then there are the big players, who typically account for about 16% — so you can see the market is quite fragmented. Now I'm going to show you two of the quadrant charts that Gartner likes to produce. This one was 2004, and we're not really worried about what was happening in 2004, other than to say that some of the same players are still in the same quadrants today. All of these documents are available from Gartner — you can contact them directly; they'd love to sign you up for a subscription — but you can still find people who have access to them. You'll see the same vendors playing in exactly the same space in those quadrants. Now, the main problem we've had from the metadata repository perspective is simply this: I think it was 1991 when IBM published something called the AD/Cycle information model, which is literally how everything maps to everything else in an application development environment. There is no business case in the world that will let you put your people to work doing this much mapping of metadata back and forth. However, know that this exists, and there are very good documents out there — there's a whole issue of the IBM Systems Journal, the same journal that published John Zachman's original framework, that talks about how these things fit together. So this is not something you have to go back and reinvent. In fact, my colleague Michael Gorman has a meta-tool he uses to track all of this together. My point is: if you have a metadata problem you're working on, don't beat your head against the wall — talk to people, ask around. Michael loves to talk to people, and I would be happy to share some of those internal documents with you. Because a repository doesn't actually have to be an integrated solution; it has to be a solution that is easily integratable — note the different words there. What I tell people is that for years, instead of having them go out and buy six or seven figures' worth of technology, we've been helping organizations implement metadata repository functionality very inexpensively. It usually takes a SQL Server instance, a good SQL Server programmer, and a good business case. The idea is that it doesn't have to equal a full repository; but if you create it correctly — and I've given you the academic basis for that in those two previous slides — it can be much easier to build something that will eventually evolve into a repository when you find you need the entire prospect. Multiple repositories are not necessarily bad, and many people use Excel quite successfully for this. It's not ideal, but if it gets the job done, that's good enough. The minimal functionality is the ability to create, read, update, delete, and evolve those various metadata items; in order to manage metadata, you need metadata repository functionality. So again, just a little bit there on repositories. The next category, profiling and discovery tools, is a field I'll take a little bit of pride and credit for inventing. When I was in the Defense Department in the late 80s, we discovered that we didn't have enough people to go in and do all of the work needed for Y2K remediation.
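Before picking the profiling story back up, here is a minimal sketch of that "SQL instance plus a good programmer" repository idea: plain create/read/update/delete on metadata items. sqlite3 stands in here for the SQL Server instance mentioned in the talk, and the table layout is an invented illustration.

```python
# A minimal sketch of metadata repository functionality: CRUD on
# metadata items. sqlite3 stands in for the SQL Server instance
# mentioned in the talk; the schema is illustrative only.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE metadata_item (
    name TEXT PRIMARY KEY, kind TEXT, definition TEXT, steward TEXT)""")

def create(name, kind, definition, steward):
    db.execute("INSERT INTO metadata_item VALUES (?,?,?,?)",
               (name, kind, definition, steward))

def read(name):
    return db.execute("SELECT * FROM metadata_item WHERE name=?",
                      (name,)).fetchone()

def update(name, definition):  # "evolve" the item's definition
    db.execute("UPDATE metadata_item SET definition=? WHERE name=?",
               (definition, name))

def delete(name):
    db.execute("DELETE FROM metadata_item WHERE name=?", (name,))

create("customer_id", "attribute", "Unique customer key", "Sales Ops")
update("customer_id", "Surrogate key assigned at account creation")
print(read("customer_id"))
```

That is the whole trick: a table of metadata items, a steward, and a business case — it can grow into a repository later.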
Picking the profiling story back up: Dave and I did a session on this at one of the conferences — we had a lot of fun, and I think it was a huge session as well. We put out a research proposal that asked: can somebody come up with algorithms that would help us analyze data better? A wonderful PhD out of Cornell won the bid and pulled together the first series of algorithms for this. More importantly, what you see on this chart is that the governance, quality, and integration functionality we need here is well over half of what's going on in data management, so focusing on that area turned out to be a really good idea. When we did this, we put it back out into the public sector, and you can see at the bottom of the slide there were companies like Evoke, Metagenix, and Ascential that came along and moved this technology forward in maturity. But the real key was that these data analysis technologies allowed a 10x improvement over the previous manual approaches. The idea is that we changed the way we used to do this. In the past, we came in, sat down with a beamer, as we call them, put it on the desk, put up a blank screen, and said: tell us about your business. Now, that is one way of doing it — I learned to do it that way, painfully, in many cases. But with this new approach we can move to a semi-automated environment that is engineered to be repository-independent. You'll see these tools listed both in a separate category by themselves and in a category that includes data quality tools. Now, to give you an idea of what this looks like: instead of getting the business to tell us everything they knew and trying to make sure they were complete, we can do this analysis offline. We could select a pay code there — there's the pay code with an asterisk in it as the minimum value — and when we do a little homework and double-click on that set of values, what pops up in the window at the bottom right of the screen is the frequency distribution: 11.49% of the data in this sample is governed by this asterisk value. Then somebody says, "11%? Isn't about 11% of our workforce on the UK payment method?" Aha. So instead of asking the users to tell us about payment methods, we can go to them with the asterisk and ask: is it true that an asterisk in that column means the payment method is the UK one? People love to tell you yes or no. But more importantly, this changes the dynamics. When I was doing this full-time for the Defense Department, we would spend three mornings a week — eight till noon — doing what we called model preparation, and the rest of the week doing what we called model refinement and validation sessions. Those sessions went on and on with the users, and the most important thing was that the business wanted to know when they could have their good business people back. (By the way, if they're willing to give you only the people they don't care about, those people probably don't know enough to help you.) With the new, proactive model we can move this to just two afternoon sessions and spare most of that time — the sketch below gives a feel for this kind of value-frequency profiling. And that also gives us the ability to figure out some measures around this.
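Here is that sketch — a toy version of the profiling step just described, with made-up data. The real example's 11.49% asterisk value is exactly the kind of oddity this scan surfaces as a question to take back to the business.

```python
# A sketch of value-frequency profiling: scan a column, report each
# distinct value's share, and flag oddities (like the asterisk) as
# questions for the business. The data here is made up.
from collections import Counter

column = ["UK", "*", "US", "*", "US", "UK", "*", "DE", "US", "*", "FR"]

counts = Counter(column)
total = len(column)
for value, n in counts.most_common():
    pct = 100 * n / total
    flag = "  <-- ask the business what this means" if not value.isalnum() else ""
    print(f"{value!r}: {n} rows ({pct:.2f}%){flag}")
# '*' covers ~36% of this toy sample; in the talk's real example it was
# 11.49%, which a user recognized as the UK payment method.
```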
The next bit looks a little mathematically complex, but it just says you can't tell how long something will take or how much it will cost if you haven't ever done it before. And if you do it and keep track of it, you can set up an evolutionary process that converges toward the ability to predict, with good certainty, how long it's going to take and how much it's going to cost. By the way, I thought I had a brilliant business idea around mergers and acquisitions, because I can tell you which ones are going to work and which ones aren't. The answer I got was: get out of here — we don't want you to mess any merger deal up, because when the deal goes through, we get our fees, and it's somebody else's problem at that point in time. That's part of the reality we live in. I mentioned already that the data profiling tool on the right-hand side is also listed as a data quality tool. And before I dive into those tools, let me tell you one more thing about profiling: if you have a proprietary database — a database where you do not understand what is going on — this set of technologies can reverse-engineer a third-normal-form logical model for you and tell you exactly what's in there. So there are huge uses beyond quality alone; but if you don't look at quality at the same time, shame on you for not paying more attention to it. Here is the next tool that pops up: parsing and standardization. Again, these are starting to appear as services, and no problem with that: if you can feed in a bunch of things that look like phone numbers and have it come back and tell you which are valid phone numbers, that's a service a lot of people would pay some amount of money for — certainly not millions. When you identify certain types of errors, this is really where machine learning is starting to dig in: we can identify automatically that information is being entered in ways that shouldn't happen. Here's a quick example that occurs often in master and reference data: a lot of people will put down Great Britain for the country that is officially known as the United Kingdom. The problem is that GB may not be your official standard definition, so a rule can pick that up and ask: GB — did you really mean Great Britain, or did you want the proper standard value, which is the United Kingdom, UK? These transformation tools look through various patterns, come up with rule-based transformations, and let the organization end up with higher-quality data that gets cleaned and organized better and faster — the sketch below shows the flavor of that kind of rule. Another set in this class is the data quality tools that do identity resolution and matching. The idea — we've published on this as well — is that if you've got people in your database and you use email as the primary way of figuring out who they are, then once somebody changes jobs, the email changes. That's why you hear that a database largely keyed on email addresses has a half-life of about 90, sometimes 180, days — definitely not a robust way of doing this. But the identity matching tools are starting to do some very, very useful things.
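Here is the rule-based standardization sketch mentioned above — a toy normalization of country codes plus a phone-number shape check. The reference values and rules are illustrative assumptions, not any product's actual rule set.

```python
# A sketch of rule-based parsing and standardization: normalize country
# codes against a reference list and validate phone-number shapes.
# The rules and reference values are illustrative assumptions.
import re

COUNTRY_FIXES = {"GB": "UK", "GREAT BRITAIN": "UK", "U.K.": "UK"}

def standardize_country(raw: str) -> str:
    value = raw.strip().upper()
    return COUNTRY_FIXES.get(value, value)  # map synonyms to the standard

def parse_phone(raw: str):
    digits = re.sub(r"\D", "", raw)         # strip punctuation and spaces
    if len(digits) == 10:                   # assume NANP-style numbers
        return f"({digits[:3]}) {digits[3:6]}-{digits[6:]}"
    return None                             # flag for human review

print(standardize_country("gb"))            # -> UK
print(parse_phone("804.555.1212"))          # -> (804) 555-1212
print(parse_phone("12345"))                 # -> None (not a valid shape)
```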
Useful things such as looking you up on the Internet. There's a great one we've used occasionally — I think it's HubSpot, a sort of CRM-ish tool — where, as you start putting data in, it goes out on the Internet and finds what else is known about Peter, so it can come back and ask: is this the Peter you're in fact talking about? Our fifth category under the data quality tools is data enhancement tools, and these are pretty straightforward: things we add to the original data, such as a date-time stamp. If I know something occurred, it's probably more valuable to know when it occurred. We can also add auditing information and contextual information, and we can geocode things. If you saw the Apple announcement from a couple of weeks ago: the new Apple credit card is, guess what, using some type of machine learning to attach geographic information about which Starbucks you bought your coffee from. (By the way, Capital One has had that technology for years, but we're glad Apple's got it as well.) There's also demographic and psychographic data you can pull in to round out these tools. The last category of data quality tool technology here is just the basics of reporting, and many, many organizations have been doing reporting for years. We had a joke about reporting in the old days, because a report was actually a physical piece of paper, back in the green-screen terminal days. When we put a report out, one of the things we would often ask is: I wonder if anybody's reading this. The trick was: stop putting out the report on its periodic basis and see if anybody complains; if nobody complained, then it must have been the right decision. That doesn't work anymore, because now we have dashboards, and people look at their dashboards expecting to see things — if you're not feeding the dashboard, you have a very different set of problems in your environment. So just good basic reporting is a wonderful thing. I've had some very interesting conversations with people who think they're in the data management profession because they're report writers. They do a great job writing reports, but if you don't fully understand everything you're trying to do from the reporting perspective, it's a problem. One of my favorite examples — I worked with a colleague on this one — was two vice presidents, both in a meeting, telling the president of the bank that sales were simultaneously up and down. Their reports showed exactly that; the reports were absolutely correct, but obviously the data underlying them was the problem. So again, very briefly, let me page back through the tools here. Profiling tools: huge stuff — if you're just discovering profiling tools on this webinar, it's good for you to dig in and find out more. There's only been one book written on that subject, so good information is a little hard to find; that book is by Jack Olson, by the way. Then data parsing and standardization tools, data transformation tools, identity resolution and matching tools, enhancement tools, and good old reporting tools. These are all critical pieces of data quality technology. And I want to say just a bit about the data quality life cycle, because it is kind of a problem.
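Before that life-cycle discussion, one quick hedged sketch of the identity-resolution idea from a moment ago: score records on several attributes rather than trusting a volatile key like an email address. The weights here are made up for illustration.

```python
# A sketch of identity resolution and matching: score candidate records
# against each other instead of trusting a volatile key like email.
# The weights are made-up illustrations, not a calibrated model.
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_score(rec1: dict, rec2: dict) -> float:
    # Email gets a low weight on purpose: it has a short half-life.
    return (0.5 * similarity(rec1["name"], rec2["name"]) +
            0.3 * similarity(rec1["city"], rec2["city"]) +
            0.2 * similarity(rec1["email"], rec2["email"]))

a = {"name": "Peter Aiken", "city": "Richmond", "email": "pa@old-job.com"}
b = {"name": "Pete Aiken",  "city": "Richmond", "email": "peter@new.org"}
print(f"{match_score(a, b):.2f}")  # high name+city score despite new email
```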
Originally, our good friend Tom Redman put out, in 1993, that this was the data quality life cycle — you acquire data, you store it, and you use it — and it seemed reasonable enough at the time. But it turns out it's a little more involved than that. This is because our profession, the data management profession, measures the time it has been around in decades, whereas the accounting profession — a profession I'm very comfortable with and understanding of — has been around for 8,000 years. In fact, just this morning here we were shown a bill of sale for beer from, I think, 450 B.C. — so 2,500 years old — and hopefully you'll see an artifact of that, because it's kind of fun. The reason this is important is that if you take only the simplistic perspective, you will miss the idea that vendors can actually help you by supplying specific types of technologies that address various aspects of the data life cycle. We're not going to walk through all of them; just know that the fuller life cycle exists in reference form, and you'll get access to it with the final set of slides. So we're going to finish up here in the last ten minutes or so by looking at a couple of things that are a little further out, moving toward where our sponsors are coming from as well. First of all, everybody wants to do data integration, and there are lots and lots of tools that integrate data. They include servers — a big deal; we need a place to put the data — something called enterprise information integration technology, things called portals, and some conversion tools. We'll look at a couple of these to finish out. Portals were huge a couple of years ago — again, the hype cycle: the greatest thing in the world, then the worst thing in the world, and the answer is of course somewhere in the middle. There's a great piece I got from Terry back in 2001: look, you can take your legacy systems and wrap them up into something that turns them into a portal. The better way to describe it is perhaps like this: you can take web services, take the old legacy application — or just the legacy code out of the old application — and re-package it as a web service. Somebody in the organization can then simply click a button and look at regional reporting by state, by region, whatever it is they're trying to get access to. These types of portals still have tremendous underutilized potential. One of my favorites that I worked with was something called Top Gear, a wonderful technology for wrapping up everything in your ERP — it didn't matter whether it was SAP or PeopleSoft or Oracle. What you're doing is taking all of this information and wrapping it up in a way that's accessible: instead of having to understand screens at the level of detail you see here, you can literally pick up a piece of information and drag it to something else. What you're seeing here is that they dragged a customer number onto a master data stack called customer, and that is then interpreted as a request for additional information about the customer. Very, very useful types of technologies.
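Here is a bare-bones sketch of that wrapping idea — legacy logic re-packaged behind a web endpoint that a portal button could call. The "legacy" routine, the route, and the port are hypothetical stand-ins, and only the Python standard library is used.

```python
# A bare-bones sketch of wrapping legacy code as a web service a portal
# can call. The "legacy" function and the /report route are hypothetical
# stand-ins; only the standard library is used.
from http.server import BaseHTTPRequestHandler, HTTPServer
import json

def legacy_regional_report(region: str) -> dict:
    # Stand-in for old application logic being re-packaged.
    return {"region": region, "sales": 42}

class PortalHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # e.g. GET /report/east -> the legacy routine, now one click away
        if self.path.startswith("/report/"):
            body = json.dumps(legacy_regional_report(self.path.split("/")[-1]))
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.end_headers()
            self.wfile.write(body.encode())
        else:
            self.send_error(404)

if __name__ == "__main__":
    HTTPServer(("localhost", 8080), PortalHandler).serve_forever()
```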
We do not see enough organizations really investing in this, but it's a great way to get to more self-serve kinds of activities. Portals also function very, very well as data quality tools. I don't know that the portal itself will build any data quality into what you're doing — although you can do that — but you can use the portal, like the cloud we started with about 50 minutes ago, as a point of entry: put only data of known quality in the portal. Now I'll tell a quick story here. This was an executive at a company that had just been sold to a European organization, and he said: look, I've got all these people coming to me with reports, and these reports tell me why I shouldn't sell their division but all the other divisions should be sold. The problem, he said, is I can't tell where they're getting the data — he had no way of assessing the lineage behind it. The portal can help with that process, where you can say: only data that's in the portal — the solution you develop for them — can be used to produce these reports, and if you want more data in the portal, make a good business case and show us how you're going to put data of known quality into that portal, as opposed to data of unknown quality. Now, I won't ask the hundreds of you out there to raise your hands on whether you could make the statement "all of the data in our organization is of known quality" — and you probably wouldn't want anybody looking over your shoulder while you answered something like that. A couple more integration pieces. ETL is extract, transform, and load, and it feeds data to the new database, data warehouse, whatever we're talking about, with a very, very large set of processes. People forget how much metadata can be mined from ETL processes — I'll sketch that below. It is a phenomenally useful space, and yet I'm just amazed that people don't even think about it: in order for that job to run every night, it has to be right, and whatever transformation it does has to be the right transformation, or else you've been working from a series of incorrect assumptions for a very long time. There's another category coming out, enterprise application integration, which allows the applications themselves to be connected — again, these rapidly grow through the different packages we put on them. And finally there's the newer category of EII, enterprise information integration, which allows tailored views to be delivered to the user at the time they're required. Here's an example of that, from an older product — MetaMaker — so I'm not endorsing anybody specifically. The only real tables in this diagram are probably the ones in blue. We take the two blue things on the far right and do some sort of transformation to them to get the first orange table, in the middle of the diagram at the very bottom. If we then say we need a little bit more — again, the tagging information I was talking about with the enhancement function — I can come up with the next orange table, which is a little smaller than the first, at least in terms of metadata items. But now I need the last piece for my query, and that's going to require four other tables that I put together. For this one operation, I end up with something that is useful and helpful to us as we look through all of these various bits and pieces.
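Picking up the ETL point flagged above, here is a sketch of mining lineage metadata out of ETL job definitions. The job-spec format is invented for illustration — a real tool would parse its own repository or job files.

```python
# A sketch of mining lineage metadata from ETL job definitions.
# The job-spec format is invented for illustration; real tools would
# parse their own repository or job files.
jobs = [
    {"name": "load_dw_sales", "source": "crm.orders",
     "target": "dw.fact_sales", "transform": "sum(amount) by region"},
    {"name": "load_dw_cust",  "source": "crm.customers",
     "target": "dw.dim_customer", "transform": "dedupe on customer_id"},
]

def lineage(jobs: list) -> dict:
    """Build a target -> (source, rule) map: free metadata from ETL."""
    return {j["target"]: (j["source"], j["transform"]) for j in jobs}

for target, (source, rule) in lineage(jobs).items():
    print(f"{target} <- {source}  [{rule}]")
```

If the nightly job runs correctly, this lineage is, by construction, accurate documentation of how the warehouse is populated — which is exactly why it's such an underused metadata source.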
These integration technologies — EII especially — are wonderful, and I find fewer than one in ten organizations are really ready to start using them. So again, we've flown very, very briefly through data technology architecture, a quick reminder of what CASE tools are, a little bit about repositories, and — more importantly — profiling and discovery tools and data quality engineering tools; we talked a little about the data life cycle, just to show that we need to put more time into it, and a couple of other things. I want to finish up here at the top of the hour with a couple of quick findings from Gartner. The idea is that data assets will increasingly drive strategic cloud service selection. In other words, you're going to stop buying clouds based on whether it's Google, Microsoft, or Amazon, and instead you're going to look at clouds in terms of: if I buy into Google, what can I get access to? You're also seeing a huge sea change coming through machine learning. Most of the knowledge about the tuning and tweaking we do today lives in people's heads, and what we really want is to change that quite a bit and move it into software where machine learning can prove tremendously helpful to all of us — although there's a huge debate about whether machine learning is part of data management or not; I certainly think it is. Again, cloud offerings get us databases in the cloud, so we can start looking at what happens if we put the database in the cloud. It's going to depend on your business needs. Yes, it's a great idea to say, "I'd really like to have no DBAs and have Amazon manage all that stuff for me" — but do a great job of keeping track of what your requirements are, how important they are, what types of things you need to have, and what types of expertise you need to run your business. For example, those of you in the insurance industry understand inherently that you are in an information-producing business, and probably the DBAs have some value in there that I wouldn't necessarily want to simply outsource without at least getting all of that wonderful metadata out of their heads. The last point from Gartner on this slide is not a very surprising conclusion, but it's probably good for them to state it: remember when data lakes were going to take over the world? Well, guess what — Gartner is now saying that if you have a combination of data in warehouses, lakes, and hubs, you can achieve greater flexibility than with any one of them alone; only warehouses, only lakes, or only hubs won't give you as much capability across the board. And with that we're headed to our Q&A session. I'm going to finish up here with a little bit of narrative and hopefully show you how this stuff actually all fits together.
So, while you're thinking about your questions, let's think for a minute about how most organizations pursue data — and I don't mean the people on this call; the hundreds of you out there really do understand that data is much more than the stuff that sits in between IT and the business. So many people see it that way: there's IT, there's data, there's the business. This is really one of the justifications for the whole CDO movement we've seen — and interestingly enough, most people are not aware that the federal government has mandated the use of chief data officers throughout all federal agencies starting on July 14th, 2019. But here's the real thing: your organization and your IT group are swimming in a sea of data, and the data is never going to get any less. If we don't have automation technology — the kinds of things our sponsors will be talking about here in our Q&A session — then there's no chance we're ever going to get a handle on all of that data. The other big thing we see is organizations simply saying: we want to go digital; we absolutely have to go digital. But you can't go digital just by saying "digital" — digital is built on data, it does require additional work, and that is not something we've seen a lot of organizations want to get involved in. So when we look at this, it really comes down to: we've got some bad data, and some wonderful, wonderful thing happens in the middle, but we're still going to get bad results at the end. I like to call this the Ada King rule, because it follows perfectly from the hype cycle: yes, some of this stuff works really well, but unfortunately the salespeople don't always know what they're talking about — present company excepted, of course. One last piece on this — I keep watching it, and it's still really, really relevant. Somebody posted this on LinkedIn as "a recent technology realization," and the realization was: if I've got chocolate ice cream on this end, I'm still going to get chocolate ice cream on the other end — and that's true without blockchain, and it's true with blockchain. The fact that this counts as a recent technology realization says we are not doing a good enough job educating our young people on the foibles we're looking at in this environment, because it is simply something all of us have noticed for decades in a particular statement: garbage in, garbage out is not going to change. We abbreviate it as GIGO. If we get garbage data and we've got a perfect model, we still get bad results, because we had garbage data — and that's true whether you have a data warehouse, machine learning, business intelligence, blockchain, AI, master data management, data governance, or analytics technology of whatever sort. It's going to be problematic if we don't get the data fixed. Our goal here is to replace that poor-quality data with good-quality data, and if we do that, the data will start to propagate through the various streams we have, and eventually we'll be able to get to good results on the other side — in this case, I call it quality in, quality out. And with that, we are right at the top of the hour, in time for us to turn back to Shannon and the Q&A session.
Shannon, back to you. Peter, thank you so much for this great presentation, as always. Just to answer the most commonly asked questions: we will be sending a follow-up email by end of day Thursday to all registrants with links to the slides and links to the recording from today's presentations. If you have any questions, feel free to submit them in the bottom right-hand corner. This first one was partially answered in the chat after it came in, but let me ask it again, Peter, to see if you have anything to add, and Cam and Dipti, feel free to add to it as well: what is meant by active metadata reducing data delivery time?

Of course, one of the fun things about this whole industry is that it's really hard to tell what people mean when they say active metadata. So I did a little research around that, and as best I can tell, and I'll turn it over to Cam and Dipti as well to see if they have anything else, it's the idea that you're using metadata as a tool. Let's start for a minute by realizing that metadata is not really a noun; it's a verb used as a noun. Technically, you metadata some data. There is no such thing as metadata that you can go around and point to, because metadata is data, and it should be managed using data management technologies. The trap most organizations fall into is going around asking, is this metadata, is that metadata? If you instead take the approach that how data is used is itself metadata, then you can start to push that usage information into machine learning algorithms. That's the best I've come up with from a little research, but let me ask Cam and Dipti if they have anything. Cam, want to go first?

Not much more to add than what you've already said, but I'd look at how the metadata is being used in your organization. When you think of things like data lineage, how the data actually flows through the organization, you can collect information about how that metadata is being managed and utilized. That seems to be some of the consensus on what is meant by active metadata. It's certainly one of the more emerging terms, being championed by Gartner and others, so expect to see, hopefully, broader consensus form in the future. Sure. Cam, let me just put you on the spot to tell us a little bit about how the product you're talking about actually incorporates those concepts; I think that's the piece folks would like to hear. Yeah, sure. Primarily what we're looking at, from a data cataloging and also a data governance perspective, is that the use of metadata is incredibly important when determining what data is going to be important for the business. One of the key things you talked about, Peter, is making sure that the technology and the disciplines you're incorporating really serve the business intent, the business value, and the business objectives that the individuals in the organization care about. With the explosion of data, if we can get our hands around the active metadata on how information is being used and where it's flowing in your organization, we can see what data is being fed into reports, what data is being used for enterprise KPIs and analytics that matter to leadership and the executive team, and what data is a part of our core business processes, whether that's creating new products, delivering new services, improving our quote-to-cash processes, or just looking at what data is going to be important from a security and compliance perspective. The better we can derive that metadata, and specifically active metadata about what information we have, the better we can move the needle toward building our data management, data governance, and data quality capabilities, all of the capabilities you talked about, and the better we can build those programs toward driving specific value for the business. So that's how we're thinking about leveraging metadata, specifically in the categories of data discovery, data governance, and data management.
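As a concrete illustration of "usage is itself metadata," here is a minimal Python sketch (hypothetical names throughout, not Infogix's or Alluxio's API): each job records which datasets it reads and writes, so lineage and usage accumulate as active metadata that downstream tooling, or a machine learning algorithm, could consume.

```python
from collections import defaultdict
from datetime import datetime, timezone

# Active metadata store: usage events accumulate per dataset.
usage_log = defaultdict(list)

def record_usage(dataset, action, job):
    """Capture how data is used -- the usage itself becomes metadata."""
    usage_log[dataset].append({
        "job": job,
        "action": action,                      # "read" or "write"
        "at": datetime.now(timezone.utc),
    })

def run_job(job, reads, writes):
    """Stand-in for a real pipeline step; here we only care about the lineage."""
    for ds in reads:
        record_usage(ds, "read", job)
    for ds in writes:
        record_usage(ds, "write", job)

run_job("daily_kpi_report", reads=["orders", "customers"], writes=["kpi_summary"])
run_job("churn_model",      reads=["kpi_summary"],         writes=["churn_scores"])

# A lineage question an "active metadata" tool can now answer:
# which jobs touch each dataset, and therefore what does 'orders' feed?
print({ds: [e["job"] for e in events] for ds, events in usage_log.items()})
```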
That would qualify as a surprise question, because we definitely did not rehearse that one; thank you for playing along. Yeah, metadata is a really important aspect of big data management. As more data gets generated, a large percentage of it might be dark data, and the only way to differentiate between business-critical data and the rest is often metadata. Metadata management is actually a very important part of big data systems in general, and of Alluxio specifically, because we manage metadata as a core value and core innovation, with tiered metadata management. When you think about files and objects, the binary bits are the data, but actually the most valuable piece is the metadata about those files and objects. So there's a lot of innovation in the product itself that helps with metadata management, particularly the temperature of the metadata, in the way we move data around, and in accessing data based on the metadata and pulling it back from the under stores and the data silos where data might be spread.

You said a couple of things there that I think everybody else might be asking about. What do you mean by the temperature of the metadata? When you think about data, you think about the value of data, and as more data gets generated there is really a lot of it, so how do you know what data is most valuable? I think of it this way: if an application, an analytical application or any app for that matter, is using data, that becomes active data, hot data. You want to manage your hot data and warm data in a better way than the rest, to make it special, because that's your most important, business-critical data. And if you manage your metadata in a way where you track usage, track the applications and the trends of those applications using the data, and track all of that as part of the metadata, then you understand the temperature of your data better. That means you can understand the value of your data better, which parts of your data set are most important versus less important, and you can prioritize resources accordingly for that hot and warm data. That gets back to the thing Cam was saying, too, about 1% of your data possibly influencing as much as 90% of your operations. Thank you both for that. And let me just clarify, too: the Gartner term dark data means the data in your organization that you don't know you have or aren't using. Another great term that, don't get us wrong, can be a little bit confusing. Other questions?
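Here is one way to picture the temperature idea Dipti describes, as a minimal Python sketch (my own simplification with arbitrary thresholds, not Alluxio's actual tiering policy): access timestamps are tracked as part of each dataset's metadata, and recency and frequency decide what counts as hot, warm, or cold, and therefore what belongs in memory, on SSD, or left in the under store.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical access history per dataset -- exactly the kind of usage
# metadata a platform would collect for you automatically.
now = datetime.now(timezone.utc)
access_history = {
    "live_orders":   [now - timedelta(minutes=m) for m in (1, 3, 7, 12)],
    "last_quarter":  [now - timedelta(days=d) for d in (2, 9)],
    "old_snapshots": [now - timedelta(days=400)],
}

def temperature(accesses, now):
    """Classify by recency and frequency of access (thresholds are arbitrary)."""
    recent = [a for a in accesses if now - a < timedelta(days=1)]
    if len(recent) >= 3:
        return "hot"    # candidate for memory, closest to compute
    if any(now - a < timedelta(days=30) for a in accesses):
        return "warm"   # candidate for SSD/disk cache
    return "cold"       # leave in the under store / data silo

for ds, accesses in access_history.items():
    print(ds, "->", temperature(accesses, now))
# live_orders -> hot, last_quarter -> warm, old_snapshots -> cold
```

The design point is that the classification consumes nothing but metadata; the binary bits never need to be read to decide where they should live.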
Yeah, absolutely. I love this next question coming in; I think it's a great question for all three of you. Where do you see live data versus static data, real-time capture versus manual input, becoming the majority and not the minority? I'll make this one more difficult for our two colleagues by going first. The key is that it used to be there was always some sort of delay before the data got to you, whether it was a monthly delay or otherwise. I worked in a women's clothing store in my teens, and we took in all this wonderful data, data that could tell us that t-shirts sold better at noon than they did at dusk, simple things like that, and then we rolled it all into monthly statistics that were of absolutely no help whatsoever. That's the opposite of live data. Now, with some of the wonderful new technologies we have, and again not just from these two vendors, we're seeing the ability to get to things a lot faster. People have devices, they have smartphones, things that can actually deliver these bits of information to them in a way that can make a difference. I'm managing a website right now, and when I get a sign-up, that's what I call live data: I can go in right away and act on it, even if I'm waiting in line at the gas station. But that's a modest example, so let me ask you two to talk about how you see the real surge in live data and how your products can help.

Yeah, I think live data, streaming data, has become really important over the last five years. As you mentioned, technology has caught up with actually being able to use live data, as opposed to month-old or even day-old data. A great example we see is in retail, where online retail, e-commerce, and internet companies want sales attribution: which coupon or which promotion is working best. And we're talking about data that's a minute old, or maybe five minutes old. They want to run sales attribution online, in real time, and reuse those coupons or promotions, so you're making decisions about what to promote in real time as well. To be a truly data-driven product or company, you actually need to leverage live data. The most recent data ends up being the most valuable in some cases, particularly retail and telecommunications, because you can use it immediately to create an effect. And that's where Alluxio gets leveraged, because of, again, the temperature: you want to make sure that your most valuable data, which might be the live data, is treated in a special way and brought closer to the compute environment. So what we're seeing is that the world is moving toward live data, but it depends on the use case: in some cases it can make a real impact, and in others it may not, so depending on the use case you might want to prioritize live data.
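Dipti's retail example can be pictured as a sliding-window computation. Here is a minimal Python sketch (hypothetical event shape, not any vendor's product): only the last few minutes of purchase events count, and the window is re-ranked on every query to decide which coupon to promote right now.

```python
from collections import Counter, deque
from datetime import datetime, timedelta, timezone

WINDOW = timedelta(minutes=5)   # only the last five minutes count as "live"
events = deque()                # (timestamp, coupon_code) purchase events, in time order

def record_purchase(coupon, at):
    events.append((at, coupon))

def best_coupon(now):
    """Evict stale events, then rank coupons by conversions inside the window."""
    while events and now - events[0][0] > WINDOW:
        events.popleft()
    counts = Counter(coupon for _, coupon in events)
    return counts.most_common()

now = datetime.now(timezone.utc)
record_purchase("SAVE10",   now - timedelta(minutes=9))   # stale: falls out of window
record_purchase("SAVE10",   now - timedelta(minutes=2))
record_purchase("FREESHIP", now - timedelta(minutes=1))
record_purchase("FREESHIP", now - timedelta(seconds=30))

print(best_coupon(now))   # [('FREESHIP', 2), ('SAVE10', 1)] -- promote FREESHIP now
```

Run against month-old rollups, the same attribution logic would be useless; the value comes entirely from how fresh the events in the window are.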
Super. Cam, anything to add to that? Yeah, I completely agree with Dipti on that. You really look at the time viability: how quickly do you need the data, and what purpose is the information going to be used for? For something like the example Dipti gave around real-time reporting and real-time use, being able to get access to that live data is going to be critically important, versus static data, which might be information used in a use case where the timing can be relaxed. One example: if you're using data to run demand planning for a month out, where you're looking at what information is needed to plan for products a couple of days in advance, that data can perhaps afford to be a little more static. The biggest component is that, with the increasing capability to get more data more quickly, live data is becoming more important, because people want results and information more quickly in general. But it certainly comes back around to the use case and what you're trying to achieve, and the ROI of the use case is part of that as well.

I'm going to add one more piece, since you both gave some good examples. One of the things you'll see happening more and more is a reference to A/B testing, and data, data management in particular, is critical to it. I use MailChimp to manage my mailing lists, and one of the things MailChimp will do is set up two different versions of a mailing, see which one gets the most hits, and send the rest out that way, and it does it automatically. That's a real value-add to me, because I'm an idiot when it comes to online marketing. The other part is that all of your organizations are engaged in some form of A/B testing, and what we're talking about with live data is a shortening of the feedback loop. When I was selling women's clothing in Richmond, and it was sweltering in August, and the company I was working for up in Providence, Rhode Island, was trying to tell me to sell sweaters, it just didn't make any sense, because of the lag involved. So I contend that liveness, the freshness of the data, will actually become an attribute that we may start to manage in very interesting ways going forward. Great question.
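The MailChimp behavior Peter describes is essentially an automated A/B test. Here is a minimal Python sketch of the idea (my own simplification, not MailChimp's actual algorithm; the send and open functions are stubs): try two subject lines on a sample, measure opens, then send the winner to everyone else.

```python
import random

# Stubs standing in for a real mail system. 'opened' simulates that
# Subject B converts better, so the example is self-contained.
def send(person, variant):
    pass

def opened(person, variant):
    return random.random() < (0.10 if variant == "Subject A" else 0.25)

def ab_test(recipients, variant_a, variant_b, sample_frac=0.2):
    """Send two versions to a sample, measure opens, send the winner to the rest."""
    random.shuffle(recipients)
    cut = max(2, int(len(recipients) * sample_frac))
    sample, rest = recipients[:cut], recipients[cut:]

    opens = {variant_a: 0, variant_b: 0}
    for i, person in enumerate(sample):
        variant = variant_a if i % 2 == 0 else variant_b   # alternate assignment
        send(person, variant)
        if opened(person, variant):                        # the shortened feedback loop
            opens[variant] += 1

    winner = max(opens, key=opens.get)
    for person in rest:                                    # everyone else gets the winner
        send(person, winner)
    return winner, opens

winner, opens = ab_test([f"user{i}@example.com" for i in range(500)],
                        "Subject A", "Subject B")
print(winner, opens)
```

The "liveness" point is visible in the structure: the decision about what to send the majority is made from feedback collected minutes earlier, not from last month's report.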
Thanks for that, and great answers, you guys. Thanks to the attendees for the questions; feel free to submit them in the bottom right-hand corner in the Q&A, and we'll try to get to as many as possible. So, guys: people, process, and technology. Where does the data fit? Is it the seat of the three-legged stool? Great question. I don't know that I've got an answer to that, and Shannon, I know we probably need to move quickly, so I'm going to turn it over to you two. Any idea where data fits on this particular slide? Maybe it is the seat.

Yeah, sure, I can jump in. I was actually talking with a customer the other day, and I put it to them in an analogy something like this. Imagine that we're all cooking a meal, and the meal is essentially the result we want our data to deliver. You go into a restaurant, and whether you're thinking about your favorite meal or a piece of cake that you want, the recipe the chefs follow in the back to deliver that meal, that's the process: the codified, proven best way to cook the meal, the one that has delivered the best result. The tools? Well, there are many things in the kitchen that can be the tools. We can grill a steak over a grill, we can cook it in the oven, we can sear it on the range top. The tools you use need to deliver that meal in the way your stakeholders, your personas, so to speak, expect it, and that can be used as an analogy for reporting, or for compliance and regulatory events, where the data has to be prepared in a certain way. The people? That, of course, is the chef; I think no further explanation is needed there. And the data? The data is the ingredients: the ingredients you have in the back freezer and refrigerator that you can pull from to cook your meal. Some ingredients you might not need; in fact, there are probably just a couple you need to cook the meal your stakeholders ultimately require, and to cook it you have to follow the process and use the right tools. I'm not sure I exactly answered your question, but hopefully, through the magic of analogy, it makes a little more sense how data relates to people, process, and technology. And we'll turn it back over to our wonderful audience, because one of the reasons we do this is that we're looking for you all to actively improve on it. We've used this people-process-technology stool for a long time. I like to say the stool is made of data, but that's not very helpful, so there's probably a better analogy, and one of you will probably come up with it. Consider everybody challenged. Dipti, anything you want to add?

Yeah, I think there's another way to think about data: as part of all three legs of the stool. From a people perspective, you need a data-driven culture, a culture where you make decisions based on data, not just anecdotes or subjective information; the people and talent in your organization need to think from a data-driven perspective. The process needs to be data-driven as well. And then, obviously, you have to have the right data technology to solve the right problem; that becomes the method, the implementation detail. But if you don't have the data-driven culture on your teams and a data-driven process, even the best technology, the best data management or metadata management or cloud management, is not going to help unless you have the first two in place. So the way I think about it is that data is actually spread across all three aspects.

Super, thanks. Next question: how does one do the quote-unquote new way of getting to know the data landscape, if no longer the old way of, quote, tell me about your business, unquote? This gets back to the profiling piece, and I think the question really is: what allows us to get that order-of-magnitude improvement in productivity? If you've ever worked in a context where you're trying to get information out of folks who have it in their heads, and you need to formalize it by sitting them down and talking through it in that fashion, you know that if I've always got to have people create from scratch, we run into something called the terror of the blank screen. People just don't like to create, but everybody likes to edit. So by letting the tools generate a first draft, and letting people know that they get cookies, real cookies, not electronic cookies, for answering these questions and confirming this type of information, you get a much more rapid development environment. As somebody was telling me just this morning, it resembles a lot of the agile techniques we're seeing, in the sense that it's an iterative approach and we have the right people in the room, but their role in the room is to confirm or deny the hypotheses we have. We may say, these items appear to be connected, and here's why we think that relationship exists, and if they say, no, that's not right, we say, great, thank you, you've helped improve our model, because as far as we could tell, that was the best we were able to come up with. I think that's a pretty good picture, and we'll get some live data feedback on it, right?
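A minimal sketch of the hypothesis-driven profiling Peter describes (hypothetical column names; real discovery tools are far more sophisticated): candidate relationships are generated automatically from value overlap between columns, and the expert only confirms or denies them, so nobody faces a blank screen.

```python
def overlap(a, b):
    """Fraction of distinct values in column a that also appear in column b."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa) if sa else 0.0

# Hypothetical profiled columns from two tables.
columns = {
    "orders.customer_id": ["C1", "C2", "C2", "C7"],
    "customers.id":       ["C1", "C2", "C3", "C7"],
    "customers.zip":      ["23220", "23229", "02903", "23233"],
}

# Hypothesize a relationship wherever value overlap is high.
hypotheses = []
names = list(columns)
for i, left in enumerate(names):
    for right in names[i + 1:]:
        score = overlap(columns[left], columns[right])
        if score >= 0.8:
            hypotheses.append((left, right, score))

# The expert's role is only to confirm or deny each hypothesis.
for left, right, score in hypotheses:
    answer = input(f"Does {left} reference {right}? (overlap {score:.0%}) [y/n] ")
    print("Confirmed." if answer.lower().startswith("y") else
          "Thanks -- you've just improved our model.")
```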
Any additional comments on that, Cam or Dipti? No, this is Cam; nothing more to add to what Peter already said. That was great, and it's interesting how this ties into the three legs, right, the data-driven aspect of people, process, and technology.

Perfect. So this next one came up in the chat. One of the things we love about our webinar community is that the chat is always on fire, and this question comes up in almost every webinar we do, no matter who the speaker is, so especially with two new speakers on, I wanted to bring it up, because we get it all the time and I'd love some fresh perspectives. A lot of us on the line spend time educating on data management, its value, and its various aspects, but how do you actually get people to listen? And to expand on that: how do you get executive sign-off on these necessary pieces of data management?

Great question. I've talked to a lot of folks about elevator speeches, where it's really critical to make sure everybody on the team can answer a question with a consistent message, because that's the way it works. But now we're starting to see a formal discipline around data storytelling begin to evolve. Notice it's at the beginning of the hype cycle, which means it's going to go up to the top and then crash, just like everything else, but somewhere down the line we'll have more guidance around it. We're also seeing a lot of classes offered in universities now called data journalism, where students are given a data set and told to write stories out of it. If you want to get the people in the corner office to pay attention, you have to translate this stuff into something that matters to them. Usually that's dollars; sometimes it's lives, experience, and things of a critical nature. But it's got to matter to the people in the corner offices, or they're just going to look at you as a technology person and not think you have anything useful to add. Again, I'll turn it over to my colleagues here and see if they want to enhance that.
Yeah, it's interesting, Peter, that data storytelling is so important. A similar analogy is in the startup world: if you have a great technology, it's not going to sell by itself, right? You need to show a real market and a real value. It's the same thing with data. There might be a ton of data, and it might be great data, but unless you can show real value from that data and show the returns, the person on the other end is not going to understand its value. At the end of the day, all these data-driven exercises and processes and technologies are meant to get value out of the data, and telling the story of what value it creates is the important bit. So as you put together an architecture to get more insight and value from your data, it's important to have that storytelling be part of it, because otherwise you're not going to be able to convince the other end of the line, the executives, of the importance of the data, the process, or the technology.

Completely agree with that. This is something that comes up quite a bit, and it's usually asked most often around data governance, where we see the highest obstacle, because data governance has almost been vilified, turned into a four-letter word: it means you're not allowing me to do something. Whereas things like data quality around analytics, or around AI and machine learning, and the various other aspects that pertain to data, tend to be a little more exciting. If you can communicate the value, the significance, and the importance of data governance, you can almost communicate the importance of anything. There are a couple of things we think about. First, I'll absolutely echo what Peter and Dipti have already said: find a way to understand what your audience and your stakeholders care about, and then link your story to that, instead of trying to get them to care about what you already care about. That is extremely important. Think about what your audience is already bought into. As an example, if you're talking to someone in supply chain, they're going to care about on-time delivery, supplier fulfillment, and days sales outstanding. If you're talking to a CFO, they're going to care about Sarbanes-Oxley and hard financial numbers. If you're talking to someone in marketing, they're going to care about prospecting and campaigns. Think about how data really drives the things they care about, and put it into that story, into that framework. The second thing is that there are multiple kinds of value. There's not just hard monetary value; you can certainly tell your story that way, but there are also stories of value around how data can increase performance, increase speed, improve the competitive positioning of the organization, or simply make life easier, if it's a tactical problem of just being able to find the right information. So again, find the pain point, or something they're already passionate about, and fit your story into that, instead of trying to get them to buy into why they should care about data quality, data governance, master data management, or whatever you're trying to champion.
That would probably be the best advice I can give. There's a lot more content we have on that, so if anyone's interested in additional information, you can reach out to us after this session. And I don't think either of you would say this stuff is easy. No, not at all. I would say it's one of the hardest aspects of building a successful program, because people change, people move on and up in the organization, and the intent of the business changes: this year the business cares about X, Y, and Z; next year it will be A, B, and C. So you constantly need to find ways to reinvent your story and make sure you're linking it to whatever the important objectives are.

So I think we have one last question here. Back to your stool metaphor, Peter: where would you place rules? Would that be the fourth leg? So again, the stool may not be the best metaphor; I'm going to push back and ask you all to help us come up with better ways of describing this. But the question is where the rules fit in. I think most people would say that rules inherently go under process, but I don't know that my colleagues would agree with me. Dipti? I think process is a good place to add checks, as part of the flow. As you lay out the end-to-end flow of, say, the metadata management process, rules come up in many different phases, and each phase needs checks and boundaries in terms of what goes in, what comes out, and whether it's aligned with that specific phase. So I would think rules fall into the process leg, and for some processes, like governance or metadata management, it becomes really important that they're in place, because otherwise it could get a little chaotic. I would agree, and I think with this picture you could add an infinite number of legs to the stool. Other variations we've seen in the past are people, process, policies, and technology, or people, process, procedures, and technology, to break the P alliteration. You could certainly go as far as people, process, rules, and technology, or put policies there. So it really comes back to what's going to resonate with the individuals you're trying to communicate with, and whether seeing rules and policies in the overall framework matters to them; if it does, I'd absolutely support including it. Or we'll steal a page out of Gartner's book and make up our own term for it. There you go.
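One way to picture rules living in the process leg, as Dipti suggests, is this minimal Python sketch (hypothetical phases and checks, purely illustrative): each phase of a pipeline carries its own entry and exit rules, so the checks travel with the process rather than sitting off to the side.

```python
# Each phase of a hypothetical data process carries its own rules:
# checks on what goes in and what comes out of that phase.
PHASES = [
    ("ingest",  lambda recs: len(recs) > 0,                     # something arrived
                lambda recs: all("id" in r for r in recs)),     # every record keyed
    ("cleanse", lambda recs: all("id" in r for r in recs),
                lambda recs: all(r.get("amount", 0) >= 0 for r in recs)),
]

def transform(name, records):
    """Stand-in for the phase's actual work."""
    if name == "cleanse":
        return [{**r, "amount": abs(r.get("amount", 0))} for r in records]
    return records

def run(records):
    for name, entry_rule, exit_rule in PHASES:
        if not entry_rule(records):
            raise ValueError(f"{name}: entry rule failed")
        records = transform(name, records)
        if not exit_rule(records):
            raise ValueError(f"{name}: exit rule failed")
    return records

print(run([{"id": 1, "amount": -5}, {"id": 2, "amount": 12}]))
```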
Well, listen, it's been super conversing with you all, and of course everybody's questions are stimulating, as always. Shannon, we'll turn it back over to you for some final thoughts. Thank you so much, and again, thank you, Peter, for another great presentation, and Cam and Dipti, thank you so much for joining us and adding to the conversation. And of course, thanks to all of our attendees for being so engaged in everything we do; we just love all the conversation going on. Again, just a reminder: I will send a follow-up email by end of day Thursday with links to the slides and links to the recording for everybody. And again, thanks to Alluxio and to Infogix for sponsoring and helping make these webinars happen. As you can see, we're already planning next month's webinar, Data Management Maturity, another great presentation, on the data management maturity model, so I hope you all can join us next month. Thank you so much, and thanks, everybody. Have a great day. Thanks, everyone.