Hello and welcome. My name is Shannon Kemp and I'm the executive editor of Data Diversity. We'd like to thank you for joining this Data Diversity webinar, Subscribing to Your Critical Data Supply Chain: Getting Value from True Data Lineage, sponsored today by ASG. Just a couple of points to get us started. Due to the large number of people that attend these sessions, you will be muted during the webinar. For questions, we'll be collecting them via the Q&A in the bottom right-hand corner of your screen. If you'd like to tweet, we encourage you to share your questions via Twitter using hashtag Data Diversity. If you'd like to chat with us or with each other, we certainly encourage you to do so; just click the chat icon in the upper right for that feature. As always, we will send a follow-up email within two business days containing links to the slides, the recording of the session, and additional information requested throughout the webinar. Today, we have two speakers joining us: Stewart, the research director of IDC's Data Integration Software service, and Sue Havas, VP of Product Management at ASG. Before joining IDC, Stewart worked as an architect, consultant, and analyst in information management and middleware markets for 25 years. He spent 10 of those years at IBM, and most recently was an analyst with Info-Tech Research Group in Canada. Stewart holds an honors diploma in transportation engineering technology from Mohawk College in Hamilton, Ontario. Sue has over 18 years' experience working with metadata on the buy side as a customer and the sell side as a vendor, including implementation, showcasing, and program support. She has supported a wide range of clients, including financial, insurance, healthcare, manufacturing, and e-commerce companies, all with a general need for data-driven business practices.
Sue is responsible for launching and guiding ASG's Enterprise Data Intelligence solutions: superior metadata and data governance technology and fresh, modern offerings that deliver excellent value for today's challenging business demands. And with that, I'm going to turn the webinar over to Stewart to get us started. Hi, Shannon. Thank you very much for having me today, and thank you very much to Data Diversity for running this webinar on behalf of ASG. So, as Shannon mentioned, I'm the research director focused on data integration software at IDC. We follow the software markets, we keep an eye on what's going on, and we talk to a lot of end users that are using the software to get an idea of how they're using it, their needs, and their desires, and we feed that information back. So, today we're going to talk about getting value out of true data lineage and the data supply chain. You know, in the 25 years I've been in the industry, data quality, integrity, and data lineage have always been problems, and I'm sure they were problems long before I even started. My experience runs from programming to solutions architecture, and now as an industry analyst, and the problem of data integrity never went away. It really just got worse as data environments got more complicated. As organizations deal with increasingly complex data environments, and with digital transformation of course top of mind these days, data integrity is even more important now than it's ever been before. Today we're going to look at the issue of data integrity, specifically focused on data lineage in the age of digital transformation. We'll follow the data supply chain, understand the value it can provide, and look at the features of an emerging software market segment that's becoming known as data intelligence. Data really is core to digital transformation.
Intelligence over that data is critical to understanding the integrity of the data. Today's data environment is more complex than it's ever been before. We used to have finite data sets with predictable growth; we now have seemingly unlimited data sets with exponential growth. We used to store text and numbers in one or two data technologies and formats; we now have many big data, relational, NoSQL, image, video, text, and audio formats and technologies to put that data in. We used to have databases that could enforce schema; we now have schema-less repositories. Data needs to be available. I did some work with a logistics company not too long ago that had so much data that they couldn't physically move it around every night so that every data center in the world had an exact replica of the data they needed to do their business. So we did some work with them to figure out how they could potentially have that data available to the people that needed it, when they needed it, without having to replicate it and move it around every night. Data needs to be secure. We know that the perimeter is gone; there are too many examples of data breaches in the economy today in the reports we hear in the news. Data needs to be compliant. In today's global economy, with data being stored and accessed around the world, it needs to be compliant with multiple regulations, not only where it's being stored but also where it's being used. Data needs to be trusted. Organizations need to know how dirty their data is, or how clean it is, depending on whether you're a glass-half-empty or glass-half-full type of person. You need to use it appropriately. You need to know where it came from. If you don't know the lineage of your data, you don't know whether you can trust it. The scale of data distribution and the variety of data sources and types on the third platform are greater than ever before.
The third platform, which you see in the diagram on the right of the slide, is a concept that IDC has come up with. It's really talking about where most of the IT spend is today and where all of the digital transformation solutions are being implemented. It's made up of four pillars: cloud, big data and analytics, social, and mobile. On top of that, there are a number of innovation accelerators (blockchain, security, augmented reality, Internet of Things, cognitive systems, robotics, 3D printing) that are really enabling some of the new business models in the digital transformation that's occurring. When you look at this, you realize that data integrity is going to be key to the success of these digital transformation initiatives, and data intelligence is going to be critical to understanding the level of integrity. Metadata management solutions and data lineage solutions really have become the cornerstones of the emerging data intelligence solutions we're seeing. We're expanding the definition of lineage from where and how to answer all five Ws of data. Let's take a look here at some initiatives that are occurring today and how data integrity is impacting them. We ran a survey in the fall of 2015 of about 650 data integration software end users. At the top of the list here, in terms of issues that are already impacting digital initiatives, security and compliance policies came out first. I don't think that's a huge surprise, because when we're talking about the third platform, we're talking about cloud and such, and those issues come up all the time. With regard to policy constraints, from what we've seen, I think it's more because the policies can't keep up with the technology. We've all heard about how cloud data centers can be more secure than on-premises data centers. We also understand that there are compliance policies that can prevent one department or one group from seeing another group's data.
There are a lot of examples of that in healthcare. Budget constraints came in second; not a huge surprise, since everyone seems to have that common concern. But notice that data constraints came in third, ahead of technology constraints, human resource constraints, and even poorly defined requirements. That was a little surprising to me, though not completely surprising given how much I'm focused on data. What made it surprising is that I know, from when I was a consultant, it seemed like requirements always changed, and the business always complained about misunderstood requirements and people not getting them right. The bottom line here is that digital transformation is happening now, and data, which is at the core of digital transformation, is already impacting these initiatives. We suspect that data without integrity won't be able to support those initiatives moving forward. Let's take a little bit closer look at data integrity on the third platform. In the survey, we asked organizations where they were storing the data that was being integrated. And while this chart is up and I'm discussing it, I'm going to get Shannon to launch a poll here, because we want to get a better understanding of where the group listening to this webinar today stands: where are you storing the data that you are integrating? Are you storing it on premises only? Do you have a hybrid environment? Are you storing data in the cloud only? So go ahead and answer that poll over the next minute, and we'll get to the results. As you can see from the chart here, there were more respondents in the survey population storing the data they are integrating in hybrid and cloud-only environments than there were integrating on premises only. The "for rent" sign is up there because we talk about access trumping ownership in the age of transformation. And it's not just limited to IT.
Think about Uber and ride-sharing cars, Car2Go, bicycles in many urban centers, living spaces through Airbnb, and so on. Data is becoming more distributed, and lineage will be harder to track across multiple clouds. Master data is at the highest risk of data integrity issues. Master data is the data about the people, places, and things that your organization cares about, and it's the data that gets distributed the most into cloud environments. Think about it going into CRM systems, HR systems, payroll systems, and ERP systems; it's the master data that is being distributed the most. And really, the number of on-premises-only data environments is going to continue to decrease, because pursuing digital transformation initiatives without a cloud IT foundation will be utterly impossible. Shannon, is the poll done? Yes, it is. It looks like 25% say on-premises only, 22% say hybrid, and 2% say cloud only. Okay, so we're about half and half between everyone that's on the line. So, although our survey results showed data integration increasing in cloud environments, we suspect that trend is going to continue. So what happens when data gets to the cloud? Well, data becomes harder to trust. In the survey, we asked those organizations who were measuring data quality metrics: measuring how many duplicates they had, the completeness of their data, how much conflicting data they had, and the staleness or timeliness of their data. We then cross-referenced those results against the organizations that were integrating data in these three categories of environments. On those metrics, organizations experienced less positive and more negative changes in the measurements in hybrid and cloud environments. So let's take a closer look at that. Looking at on-premises environments first, there are fewer organizations that have negative measurements there.
But you can see the trend as we move to hybrid and then to cloud: the amount of negative gets larger, and the amount of positive gets smaller. And really, when you think about this, it's an intersection of two pillars of the third platform that are driving digital transformation: cloud platforms, and big data and analytics. You might even call this the perfect storm if you're trying to get more value out of the data in your environment. So let's take a closer look at data lineage. Historically, it's been traced in two dimensions or types: where lineage and how lineage. And this is my feeble attempt at drawing a lineage diagram; ASG does a much better job of these, by the way. Where lineage traces the origin of the data; how lineage traces how the data source was manipulated or changed to produce the outcome. If you think about the data supply chain, with data coming from somewhere and going to a target, lineage is a significant part of that supply chain. So in this example here, we're looking at vendor data and invoice data flowing through to a spend report. For where lineage, we can look at both the schema and instance levels. Schema-level where lineage in this example shows that the spend data came from the vendor and invoice tables. Schema-level how lineage in this example documents the selection, summarization, and grouping that produced these results. You can then go down to a finer grain of instance lineage to look at specific data values: the one vendor and all their related invoices, potentially tracing back to the invoices themselves. So this is how we've traditionally looked at and considered lineage, but in the context of the environment we have today, the third platform and the complexity of all the different new data types we have, the need for better data intelligence is really upon us. It's driving new requirements and expanding the definition and the type of metadata being captured.
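The vendor-and-invoice example just described can be sketched in miniature. This is a hypothetical illustration, not ASG's model: the table names, columns, and values are all assumptions. The schema-level how lineage is simply the join, summarization, and grouping logic of the report query, and the instance-level lineage is the trace from one report row back to its source invoice rows.

```python
# Hypothetical vendor/invoice tables feeding a spend report, using sqlite3.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE vendor  (vendor_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE invoice (invoice_id INTEGER PRIMARY KEY,
                          vendor_id INTEGER, amount REAL);
    INSERT INTO vendor  VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO invoice VALUES (101, 1, 250.0), (102, 1, 150.0), (103, 2, 75.0);
""")

# Schema-level "how" lineage: the spend report is produced by joining vendor
# to invoice, summing amount, and grouping by vendor name. That selection,
# summarization, and grouping is exactly what how lineage documents.
spend_report = cur.execute("""
    SELECT v.name, SUM(i.amount) AS total_spend
    FROM vendor v JOIN invoice i ON i.vendor_id = v.vendor_id
    GROUP BY v.name
    ORDER BY v.name
""").fetchall()

# Instance-level lineage: trace one report row ('Acme') back to the specific
# invoice rows that produced it.
acme_invoices = cur.execute("""
    SELECT i.invoice_id, i.amount
    FROM invoice i JOIN vendor v ON i.vendor_id = v.vendor_id
    WHERE v.name = 'Acme'
    ORDER BY i.invoice_id
""").fetchall()

print(spend_report)   # [('Acme', 400.0), ('Globex', 75.0)]
print(acme_invoices)  # [(101, 250.0), (102, 150.0)]
```

A real lineage tool records this join/sum/group metadata automatically rather than requiring you to re-derive it from the query by hand, but the shape of the answer is the same.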
Again, as I mentioned earlier, we're now talking about the five Ws of data: the who, what, where, when, why, and how. Who is using it? What does it mean? Not just what is it, but what does it mean? Where did it come from (traditionally, where lineage), but also, where is it in the organization? Where is it being physically stored and persisted for access? When was it captured, and when is it being used? So we're looking at both backward and forward lineage, but from a timing perspective. Why is it being stored, and why is it being used? How has it changed? How is it being used? And how is it related? Relationships bring a new dimension to metadata and lineage. All of these questions together bring context to the data, giving organizations a higher level of trust in the data that they have, and hence providing a lot more intelligence about the integrity of the data. I mentioned how important relationships are becoming. Relationships now need to be better understood. No longer is just a 360-degree view of master data important; relationships between people, products, services, and processes all need to be understood, internally within the organization, externally outside of the organization, and in terms of how those relationships traverse the walls between the internal and the external. And an understanding of these relationships will need to happen at scale in the age of digital transformation. Let's look at customer intimacy as an example. IDC has predicted that customer intimacy is going to happen at scale, and intimacy at scale may seem a little bit paradoxical, but leaders in digital transformation have demonstrated an ability to do this. Think about personalizing interactions and offerings to millions of customers through the ability to harvest insights from relationships, activities, and preferences. Consider shopping on Amazon, or offers that are pushed to your email or browser.
Or they show up in your Facebook feed, related to your favorite hobby or pastime. It's been said that it can take years to gain a loyal customer and seconds to lose one. In the age of digital conversations and digital transformation, data quality and integrity are going to have a significant impact on customer loyalty, because poor data quality and integrity can turn a customer away pretty quickly. I think we're just scratching the surface of the value that data lineage can bring to an organization. Here are some categories or patterns of data lineage value that we uncovered through some research we did. Governance: providing backward lineage of data to trace the results of reports and such back to the data owners and back to the sources for quality and access control, in addition to providing forward lineage that allows data owners and stewards to manage the use of their data and understand how it's being used. Compliance: providing evidence to regulatory bodies on where the data came from, who's using it, and how it's been changed throughout its life cycle. Change management: allowing users and developers to understand how a data element change will impact downstream systems and solutions. Solution development: allowing better design, testing, and higher-quality deliverables by sharing lineage, glossary, and relationship metadata across distributed development teams. Storage optimization: providing insights into what data is being accessed, where it is in the organization and in the IT environment, how often it's accessed, and by whom; this data is being used for archiving and acquisition decisions. Data quality: improving the quality scores that are calculated through the application of business and standardization rules against the data and the lineage itself. Understanding how clean or dirty your data is through these scores can make for better decision making and better outcomes.
Problem resolution: assisting with root cause analysis and break-fix issues. And really, I think a wider business-level benefit of lineage also exists, focusing on changes in the values of some of the core master data entities I referred to earlier, the data entities that are shared among processes, departments, and applications. An example could be the marketing, sales, or service impact of a contact's change of title, department, address, or even employer. The ability to capture, validate, distribute, and trace these changes in a timely manner could lead to better protection of existing revenue streams and the ability to capitalize on new revenue in business-to-consumer and business-to-business commercial relationships. Not knowing about a change could mean losing credibility, or losing a relationship with a customer you've had for a long time. Knowing about that change, and being involved with that customer through that change, could go a long way to protecting that revenue, perhaps even increasing it. So enough anecdotal evidence; let's look at some quantitative information. "What's not measured can't be improved" is a favorite quote of mine, and so we asked our survey respondents about measuring the data integrity metrics I referred to a little bit earlier. Within the population of respondents, those that had implemented data lineage management reported seeing tangible benefits, with almost all of them reporting benefits within the first year of implementation. This chart illustrates the difference between the number of respondents that reported positive data quality measurements and have implemented the process of data lineage management, compared to the number of respondents that also reported positive measurements but are not doing lineage management.
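The change-management pattern in that list, using forward lineage to see what a data element change will impact downstream, can be sketched as a simple graph traversal. The lineage graph and element names below are purely illustrative assumptions, not from any real system:

```python
# A minimal sketch of forward-lineage impact analysis: edges point from a
# data element to the elements derived from it, and a breadth-first walk
# finds everything downstream of a changed element.
from collections import deque

# source element -> elements directly derived from it (hypothetical names)
lineage = {
    "crm.customer":    ["dw.dim_customer"],
    "erp.invoice":     ["dw.fact_spend"],
    "dw.dim_customer": ["report.churn", "report.spend_by_customer"],
    "dw.fact_spend":   ["report.spend_by_customer"],
}

def downstream_impact(changed):
    """Return every element downstream of a changed element, sorted."""
    impacted, queue = set(), deque([changed])
    while queue:
        node = queue.popleft()
        for child in lineage.get(node, []):
            if child not in impacted:
                impacted.add(child)
                queue.append(child)
    return sorted(impacted)

print(downstream_impact("crm.customer"))
# ['dw.dim_customer', 'report.churn', 'report.spend_by_customer']
```

Backward (where) lineage is the same walk over the reversed edges, which is how root cause analysis for the problem-resolution pattern works.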
We can see here in the numbers and in the bar chart that data lineage is having a very positive impact on reducing the amount of data duplication, reducing the number of data conflicts, and improving the timeliness, or reducing the staleness, of data. Based on these results, we can only hypothesize that data lineage is having a positive impact; the data doesn't prove whether or not it's actually the cause. We also measured the impact of lineage on availability, security, and compliance, with similar results. While I walk through the data on the chart, we're going to have Shannon throw up another polling question: is your organization tracing lineage, and if yes, are you using an automated tool? The options are: we have an automated tool for tracking lineage; it's manual; or no, we're not tracing data lineage. As you can see by the numbers here, there is some positive impact on reducing the time to find data, but there is a significant impact on reducing the amount of time to prepare data for presentation, by a factor of two at least, getting towards three. There may or may not be an impact on security, as the data suggests, but lineage does appear to be having a positive impact on compliance, as suspected, given that this is a key value proposition of lineage and metadata management. Shannon, let's look at the poll results. It looks like we've got a few people with an automated solution, more with manual, a bunch that aren't tracing, and, not quite a majority, no answer. That's interesting. Let's look at some case studies based on the research we've been doing. This first example came from an online payments provider. They use data lineage as input to agile project development sprints, and as a result of having a lineage solution, more than 80% of the information about the data elements used in solutions is now available and consistent across distributed development teams.
That has removed a lot of assumptions, improved the quality of solutions delivered in sprints, and reduced the cost of delivering them. Lineage also provides the teams with the ability to perform impact analysis on proposed changes and develop regression test cases. While the company hasn't fully quantified the value of lineage, it estimates that data lineage has saved at least one two-week sprint per project. Two weeks of an average 8 to 10 developers could be $20,000 to $40,000 per project, without counting the higher-quality deliverables and fewer break-fix cycles after implementation. This particular case study came to us from a utility company. Prior to having data lineage available through an automated solution, the utility company had employed 15 data stewards, each responsible for data in different areas of the business. The company estimated the data stewards spent 30 to 50% of their time on data forensics: knowing what the data in a report meant and where it came from. After implementing an automated data lineage solution, they were able to deploy business-user-friendly data lineage dashboards that let the business users answer their own questions. As a result, the amount of time data stewards spent on forensics became negligible. It also resulted in them uncovering that some security and architectural documents and information had not been kept up to date through system changes, data warehouse changes, data mart changes, and the reports that were being created. As a result, resources couldn't fully comprehend backward where the data came from, or forward who was using it. The data lineage they discovered through the solution helped bring the utility back into compliance and into good standing with regulators once again. Looking at a case study that came from a top-five U.S. bank: initially, the bank required data lineage to assist with data forensic processes and meet federal regulatory audit requirements, including TARP and Basel II.
They implemented an automated lineage discovery solution and really got a lot of value out of the lineage itself as they addressed the forensics and audit concerns. They also discovered they could use it at multiple levels for change management and for facilitating application modernization projects, a common kind of project happening in a lot of large organizations, including banks. They were able to reduce operational risk because of it, and they've been able to decrease the time-to-market windows of these solutions. The bank has been able to qualify the value of data lineage, and it's also been able to quantify the value of automated lineage discovery. Manual tracking of lineage in its complex systems environment was difficult and error prone. Through the implementation of the automated solution, they were able to reduce their efforts 80-fold. A detailed analysis showed approximately $1.1 million in savings from discovering the lineage of just 10 key business elements across 100 applications. This was the justification they needed to expand the program into more elements, more areas, and more applications across the business. So what's ahead for data lineage, the data supply chain, and data intelligence? We're always looking to the future at IDC, so we're going to look out here over the dock into the data lake, if you will. The impact and value of data lineage really is clear, based on everything we've already seen. The complexity of data lineage in the era of digital transformation on the third platform is driving a lot of innovation in solutions to capture and manage this data lineage in an automated fashion, because we need the ability to deliver higher-quality data to the business that is trustworthy, available, secure, and compliant.
Zero-gap data lineage is something that's going to become more important as organizations need to see the full picture of the data supply chain: you need to understand every single system component the data has gone through, where it came from, and where it's going. To fill the gaps that traditional approaches leave, there are increasingly automated solutions coming out in the market that go into application source code, SQL queries, stored procedures, and custom-coded solutions to achieve that zero-gap lineage. Data lineage is an important part of the data value chain, as we've been discussing, along with this whole notion of emerging data intelligence solutions answering the five Ws and understanding more about data. Organizations are going to begin to learn that big data analysis and insight is not just about the data itself but also about how the data is being used, and there are a lot of insights about that that organizations are starting to uncover. Data intelligence is really going to be used to inform and improve data governance, improve data life cycle management, and help clients deliver new insights. There's also going to be an increased focus on instance lineage; the where and how schema lineages have driven many of the solutions to date, but as relationships come in, in order to look at a relationship you have to go down to the instance level. More needs to be known about where the data for a specific product, customer, or service came from, how it's changed, where it is, and how it's being used, in order for that data to be more trustworthy. All of these trends and future predictions will help organizations better understand their own data supply chains and bring more intelligence to the decisions being made every day. With that, I will turn it over to Sue. Shannon, if you can give control to her. Can you hear me okay? Perfect. Excellent results, and definitely on target with what we're experiencing and seeing in the marketplace today.
In this portion of the webinar, I'm going to talk more about how data lineage, and subscribing to the actual lineage and supply chain, increases the effectiveness of your overall data management, compliance, and governance programs. I'm also going to hone in on a particular healthcare example, talk a little bit more about how we're seeing these use cases in other parts of the marketplace, and then touch a little bit on best practices for implementing data lineage. So let's get started by looking at that critical supply chain as a foundation to your governance program. Becoming a truly data-centric environment requires an accurate and precise view of the data lineage supply chain: understanding exactly where the information originated, the quality of the source of that information, and how that information is being reproduced and delivered internally and externally. What we're finding, primarily in the financial compliance community, is that what matters is the speed and the accuracy of delivering this traceability, exactly the evidence that's needed to support stress testing and audit results. Our financial clients, as well as the auditors, are growing very weary of reproducing this information by deconstructing and reconstructing spreadsheets, and the clients are demanding a deeper mining capability that can actually read through the different layers of your code, versus simply looking at the source-to-target mappings from an ETL vendor. So if you miss a hop in the supply chain, perhaps you miss a column, a calculation, or some code, or maybe the lineage stops before reaching its true source, then the accuracy of your overall business decision or audit could be compromised, resulting in a demerit or a financial penalty. It could expose critical data in ways that could be very harmful to your clients or your company in general. And of course the PII issues are something that we all have to be aware of, as Stewart mentioned before.
Data governance tools are doing a great job of providing value in collaboration and crowd-sourcing, forming consistent definitions across the different communities, and aligning those standards with policies. But what's in question here is how strong the alignment is to the data standards, and how strong that alignment is to the critical data and the data supply chain. When you build that strong, comprehensive collection of business assets, it really should connect to a similarly strong, comprehensive collection of data assets. Otherwise the governance foundation could be sitting on a spreadsheet, or maybe it's a representation of just one project or one line of business across the company, or maybe it was captured from an SME interview. So when that data changes, you need to capture those changes, and we're finding clients want this automated and accurate as well, because it's crossing both new and old technology platforms. So this traceability really closes a gap in data trust. Moving to a specific healthcare example, here are a couple of insights that I found online; they're not representative of a certain client or customer. In the first insight, we see there were 2 million hospital stays for patients without insurance, and in the second example, it says that privately insured parties stayed at least one day less in the hospital than those covered under Medicare. That's probably a big deal in the healthcare world today, and if we break it down and look at the business forensics, we're seeing how the lengths of stay are described within the different lines of business. We also see different types of hospital stays, inpatient stays and outpatient stays, and all the different codes that were involved with these insights, like gender codes and region codes, and the business rules and policies behind them.
As we're looking in that business zone, we start to have questions like: how was average length of stay calculated across these various regions? For that we need the data forensics, the data inventory, the data lineage, to understand the where and the how that Stewart mentioned, and how this information came to be. It needs to be accurate, and we need this information not only to support the insight but to act on the insight. So how do we do that? What we're proposing is change detection and subscription to this information. That first level of forensics, data forensics, provides you with your high-level data map. This is the where and the how of data lineage. So when we look at that insight, we can see that the source of the information is coming from the claims database and the patient database, and we need to understand the integrity of those particular databases. We also see it's coming from the MDM hub, which is great, because that's where all of our validated, certified master data is coming from, so that's a good sign. The change happens somewhere inside of the enterprise data warehouse, with a lot of tables being joined and calculations and aggregations happening to produce the results we're seeing here: the patient claim reports and, of course, the results inside of my big data lake. So when we look at the holistic view from this level, we start to question: are my application owners aware of this change? And what about from a claims processing standpoint: do they understand the change that impacted these individual claim reports? And inside of my big data lake results, this could be the possible reason why there was an increase in the length of stay for Medicare patients. So let's take another, deeper look at the actual change that took place here in this example.
So level two is the detailed information, and what we found here is that an equals override was introduced in April, and this override joined two entirely different columns. The length of stay for private claims that were set to yes used the procedure start and end date, versus what they were calculating off of before, which was the hospital check-in and check-out date, for region 14 in the UK. By the way, I did this before Brexit, so I'm not trying to make any more of these insights. But the local change request performed within the EDW affected the entire data supply chain for claims in the big data insights, and when you start to investigate data causation and correlation, you find out that the inclusion of this equals override actually shaved off one full day of that Medicare patient stay on the final global insight and what was reported out through the reporting area. So the question is: how quickly can you pinpoint and detect these changes yourself? How easy is it for you to look across regions, across departments, across different technologies, the ETL code and the calculations? That one particular change could have been buried in paragraphs and paragraphs of PL/SQL, Java, or JSP code, or maybe a mix of all of them. So you really need change detection at that really deep code-analysis level. So yes, quick and accurate data traces are paramount in resolving these issues. However, we feel the evolution is towards detecting these cross-platform changes before they happen, by using subscription and workflow processes to guide your BI and risk analysts through these changes before they hit production.
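One way to picture the level-two change detection described above is as a diff between two point-in-time snapshots of column-level mappings. This is only a sketch of the idea, not ASG's implementation, and the mapping contents are invented to mirror the equals-override example:

```python
# Two hypothetical snapshots of how EDW columns are derived, before and
# after the April change. Column and source names are illustrative only.
march = {
    "length_of_stay": ("hospital_check_in", "hospital_check_out"),
    "total_charge":   ("line_item_amounts", "sum"),
}
april = {
    "length_of_stay": ("procedure_start_date", "procedure_end_date"),
    "total_charge":   ("line_item_amounts", "sum"),
}

def diff_mappings(before, after):
    """Return {column: (old_mapping, new_mapping)} for every changed column."""
    changed = {}
    for col in before.keys() | after.keys():
        if before.get(col) != after.get(col):
            changed[col] = (before.get(col), after.get(col))
    return changed

print(diff_mappings(march, april))
```

A real tool has to extract these mappings by parsing the PL/SQL, Java, or JSP code itself, which is exactly why the deep code analysis is the hard part; the diff is the easy part once the mappings exist as metadata.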
So when you're looking at data lineage tools, you might want to look at whether the lineage is also supported in a work-in-process area, so you can understand that to-be state before it goes into production. You might want to look at line-of-business workflows, making it easier for those different lines of business to submit their lineage before it goes into production and creating an enterprise view. And then of course that code analysis being built into your lineage connections, to make sure that all the hops are represented regardless of how that data is being moved. Finally, I wanted to talk real quick about some other use cases we've experienced. When you combine support for compliance, data assurance, data quality, and data insight, you start to see a much different market segment. For time's sake, I'm going to skip some of the ones we've already talked about and jump to a retail perspective: in one scenario we saw, providing data lineage actually reduced data delays from weeks to days for seasonal campaigns, where they needed to make quick changes to web services producing information for online marketing and promotions. Lineage is serving pharma really well by ensuring that key data sets are monitored via the supply chain lineage and validated for treatment patterns and drug testing outcomes. They use lineage to connect analysts to experts to further their knowledge and insights. Then from a manufacturing standpoint, we've seen a case where product directories and codes failed to connect to the MDM hubs in the lineage, leaving only half the picture. They couldn't pinpoint the source system, and subsequently incorrect codes were being used in their transform and calculation logic. These incorrect codes recorded expenses to the wrong cost center, which really misrepresented the overall product revenue. And I'm sorry, I have a poll here that I didn't read.
Do you have resources managing data supply chains in your organizations today? Pick one, yes or no, and if you could also let us know what industry you're in, whether it's finance, retail, healthcare, pharma, manufacturing, or entertainment, that would be great too. We're just about out of time. People are answering the poll, and if you want to throw your industry into the chat section, the poll is closing in four seconds. All right, perfect. I see some comments coming into the chat, and lots of finance on the phone. Insurance. More finance. All right, and here are the poll results. We'll refer back to these maybe at the end, because I just want to finish at least one more slide before we get to the questions and answers, if that's okay. Hmm, why is it not moving forward? We know there are some things you might want to consider before you go about implementing data lineage. From a software perspective we've worked at simplifying this process and the pull of information, and from a best-practices standpoint we've worked at how we go about scoping and implementing a data lineage project. We start with identifying the critical data. We usually do interviews and work with SMEs to identify where you're most at risk as an organization by not governing particular pieces of data. Then we also look at working backwards, doing some reverse scoping on the actual critical data that you're distributing within and throughout your organization, and understanding what that chain is. We ask you for a baseline, because that baseline becomes very handy when you're talking about return on investment later on. As you start to prove out that lineage, what might take you 60 to 90 days for your initial implementation of data lineage, versus months and months of collecting this information for a one-time pull, is really significant in keeping that lineage project on track. Then we go at the automation.
We point our scanners at collecting this information, and in parallel you can be collecting your business glossary information and your traceability, connecting the business traceability to how it's represented in the physical world. From an ongoing standpoint, we're getting a lot of traction with end users who want to monitor and subscribe to change detection going forward. Finally, one last slide. We've pulled together our lineage experts to provide some pretty great new features in our latest data intelligence release. This latest release includes things like snapshots, being able to provide point-in-time history of when and where data was created, how it moved, and where it was distributed. We're also providing issue tracking and feedback mechanisms, and inserting our lineage into other areas of your technology ecosystem; a line-of-business workflow, which I already talked about; and of course subscribing to that lineage through notifications and workflows. I'm going to stop right there, Shannon, and turn it back to you for questions and answers. Thank you, Sue, and thank you, Stewart, for these great presentations. We certainly have questions coming in. If you have any questions, please leave them in the Q&A section in the bottom right-hand corner. And of course, one of the most popular questions we receive is people asking about the slides and the recording. I will send a follow-up email for this webinar by end of day Thursday with links to both of those and any additional information requested throughout. Sue, I think this is specifically for you. This person said they joined late and weren't sure if you answered this already, and I don't know that you did. Yes, we definitely do. Our automated alerts go through the email system to notify subscribers that a change occurred anywhere along that data lineage supply chain. Thank you.
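The subscription-and-notification idea Sue describes can be sketched in a few lines: analysts subscribe to assets, and a detected change notifies everyone subscribed to the changed asset and to anything downstream of it. All names here are hypothetical, and a real product would send email rather than return a list:

```python
# Hypothetical subscription registry and downstream-flow map.
SUBSCRIPTIONS = {
    "edw":           ["bi_team@example.com"],
    "claim_reports": ["risk_analyst@example.com", "claims_ops@example.com"],
}
DOWNSTREAM = {
    "claims_db": ["edw"],
    "edw":       ["claim_reports"],
}

def who_to_notify(changed_asset):
    """Collect subscribers of the changed asset and of everything it feeds."""
    notify, stack, seen = set(), [changed_asset], set()
    while stack:
        asset = stack.pop()
        if asset in seen:
            continue
        seen.add(asset)
        notify.update(SUBSCRIPTIONS.get(asset, []))
        stack.extend(DOWNSTREAM.get(asset, []))
    return notify

who_to_notify("claims_db")
```

The point of walking downstream is the one made throughout the talk: a local change in a source system is rarely local, so the notification has to follow the supply chain, not just the asset that changed.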
And again to you, Sue: it appears to me that you're suggesting maybe a bureaucratic process, which might bog down the entire process, but I'm not sure. How is it not doing that? I think that the bureaucratic process is more in establishing standards from a policy and a business-definition standpoint. The automation of the lineage is really coming from scanners that port that information into the repository and create that linkage for you automatically, versus running anything through a workflow process. I think maybe where that question comes from is that work-in-process area, so understanding the to-be state before promoting it into production, and that's a step that is really up to you if you want to seek it out. You can make those changes straight to production, or you can work in a preventative manner and ensure that the changes you make don't impact other areas before pushing into production. But that's your choice. Sure, that makes sense. Next question coming in: what applications are you utilizing for data management and data lineage? Is that for Stewart on his poll, or? Probably to both of you. Okay. From our standpoint, we'll work with any sort of package. We've worked with other data governance packages as well as BI and reporting packages. So anywhere you see the need to see that source-to-target information and how the data is flowing and traced through it, we can port our information into that using a REST API. Yeah, I think getting back to the whole zero-gap data lineage idea, it's important to have an understanding of where your data is, regardless of what application it's sitting in or how it's flowing through the organization.
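To give a feel for what "porting lineage information over a REST API" might look like, here is a minimal sketch of building a source-to-target lineage record as JSON. The endpoint, payload shape, and field names are invented for illustration; ASG's actual API will differ:

```python
import json

def lineage_payload(source, target, transformation):
    """Serialize one hypothetical source-to-target lineage hop as JSON."""
    return json.dumps({
        "source": source,
        "target": target,
        "transformation": transformation,
    })

payload = lineage_payload("claims_db.claims", "edw.patient_claims", "join + aggregate")

# A real integration would then POST this to the receiving tool, e.g. with
# urllib.request or the requests library:
#   requests.post("https://governance.example.com/api/lineage", data=payload)
```

The design point is simply that lineage becomes portable once each hop is expressed as structured metadata, so a governance package or BI tool can ingest it without sharing a repository.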
Typically, you know, if you're putting the data into a warehouse or into a data repository in order to build a dashboard over it and track KPIs and such, it's important to understand that it's not just those applications and the data you're looking at there that you need to understand; where the data came from is a question in itself. That's really where the REST API that Sue referred to comes in, and you're seeing a lot of the data lineage software solution providers starting to offer that kind of capability, starting to push into the application code itself, inspecting the SQL on the network, and being able to understand the data as it flows around the organization. Perfect. I believe so, and certainly the questioner can ask more specifics if needed. And, again to both of you: do you monitor all of your data with your ETL processing, or just choose the most critical data sets? That's a great question. Both. We like to start with critical data, but your critical data sets, I think, are another great avenue to start with, and they could be the same in most cases. Starting with your critical data allows you to narrow down your priorities and ensure that you're attacking those areas that are going to provide the best overall value for your program, and see some really great results right away. Yeah, I would agree. I've seen organizations that have started with their critical master data to understand its lineage and its supply chain, and been so successful with it in one domain of master data that they quickly started to expand the program into other domains, and in some cases they're now looking at expanding even further into the transactional data.
The next question, you know, it's certainly going to vary from environment to environment, but maybe Stewart, you want to start answering this question: how large is their EDW? Ours is 200 terabytes in production. Yeah, in the survey that we did, we did not ask about the size of the data warehouse. There are certainly multiple different sizes; it depends on how much data you're actually tracking. There are some organizations I've worked with that have data warehouses and data environments larger than that, and others that are smaller. And when you've got that amount of data, you've really got to, back to the last question, pick and choose what you're going to start with. Look at your more critical data elements and sets. The data that you need to have the highest level of trust in, to support strategic or operational business decisions, that's the data you want to go after, and that's the data you want to understand the most about. Sue, I don't know, you've worked with probably more customers in terms of specific data lineage applications. Do you have an idea of the size of data warehouses they're handling and managing? They're usually quite large. We've worked with some that have 50 to 70 source systems feeding those data warehouses, going out to several different reporting marts and into that big data lake. So the environments can be quite large, and from our perspective we're capturing the metadata, not the data itself, but understanding all the relationships and the links in and out of the data warehouse is very important. A lot of these financial institutions are global implementations, so the data can get quite massive, but again, we're looking at the schema of the information versus the data in most cases. Lovely, and again, questions to both of you.
Did you prototype various data lineage tools, and which did you use? I'll take that: in the research that we did, we did not look at any specific data lineage tools. We wrote a paper recently trying to understand the value data lineage was providing; we didn't go specifically into tools, and we didn't do any comprehensive evaluation of the tool sets. Sure. Well, to both of you, I don't know if you have this handy, or maybe we can include it in the follow-up, but the inquiry is about seeing the impact, in a graphical fashion, of a change at the source, assuming the pipeline is a complicated data federation environment. Okay, I think the question is: how easy is it to see the impact of a change in a really busy environment? Definitely there are different ways you can zoom in and zoom out on the information, and we always feel it's better to start from an app-to-app level first, describing the flow of the information from an application perspective and allowing the end user to drill further into the information. As they drill, it's important to have some online graphics that provide a trace, and with our technology you can hone in on a particular critical data element, say trace, and it will draw a red line around the feeds into and out of the selected item. So absolutely correct, sometimes getting through the spaghetti is not easy, and there should be some good graphics, and it should be dynamic for the end user to control and zoom in and zoom out on. And so does ASG support lineage for hybrid data warehouse implementations comprising Hadoop and relational databases? Yes, we do. I love that, a quick, easy answer. And the final question coming in here, and maybe this is something we can add in the follow-up email: the questioner wants to know where they can find more about this topic, to both of you. Stewart, I think this is a great time to talk about that white paper.
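The "trace" interaction Sue describes, drawing a red line around everything feeding into and out of a selected element, amounts to computing the upstream and downstream edges of one node. A minimal sketch, with an invented, acyclic edge list standing in for a real lineage graph:

```python
# Hypothetical lineage edges (source, destination); assumed acyclic.
EDGES = [
    ("claims_db", "edw"),
    ("patient_db", "edw"),
    ("edw", "claim_reports"),
    ("edw", "big_data_lake"),
]

def trace(edges, element):
    """Edges a UI would highlight: everything feeding `element` plus everything it feeds."""
    hits = set()
    frontier = {element}
    while frontier:                                   # walk upstream
        hits |= {(s, d) for (s, d) in edges if d in frontier}
        frontier = {s for (s, d) in edges if d in frontier}
    frontier = {element}
    while frontier:                                   # walk downstream
        hits |= {(s, d) for (s, d) in edges if s in frontier}
        frontier = {d for (s, d) in edges if s in frontier}
    return hits

trace(EDGES, "edw")
```

The zoom-in/zoom-out behavior then falls out naturally: run the same trace at the application level for the overview, and at the column level once the user drills in.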
Yeah, so the email ASG sends out after this will give you links to a white paper that I wrote at IDC based on the research I did looking at the impact and value of data lineage. You'll see some of the content from today in the white paper, but you'll also see a lot more than that, and you'll be able to grab it from the ASG website. Obviously there's a lot going on in data lineage and metadata management. We're tracking the market, so we've got additional material in terms of market sizing and forecasting, that sort of thing. If you want to look at the broader topic of metadata management, you'll want to start to look at data intelligence solutions, and I'm sure there are a lot of different things going on out there. I know when I was doing my research, my internet searches ended up giving me a lot of things to look at. And in addition to that, I'm sure there's lots of additional material on the ASG website. And that brings us right to the top of the hour. Again, Sue and Stewart, thank you so much for this great presentation, and thanks as always to our attendees for being so engaged in everything we do. We appreciate all the great questions you submit and the engaging conversation. Again, just a reminder, I will send a follow-up email by end of day Thursday with links to the slides, the recording of this webinar, and the white paper Stewart just mentioned. Thanks again, I hope everyone has a great day, and thanks to ASG for sponsoring today's webinar. Our pleasure. Thank you all. Thank you very much, Shannon. Thank you very much, Sue.