Hello and welcome. My name is Shannon Kemp, and I'm the Chief Digital Manager of DataVersity. We'd like to thank you for joining this DataVersity webinar, How to Accelerate BI Responsiveness with Data Lineage, sponsored today by Octopai. Just a couple of points to get us started. Due to the large number of people attending these sessions, you will be muted during the webinar. For questions, we will be collecting them via the Q&A panel in the bottom middle of your screen, or if you'd like to tweet, we encourage you to share highlights or questions via Twitter using the hashtag DataVersity. And if you'd like to chat with us or with each other, we certainly encourage you to do so. Again, to access and open the Q&A or the chat panel, you will find the icons for those features in the bottom middle of your screen. As always, we will send a follow-up email within two business days containing links to the slides, the recording of the session, and any additional information requested throughout the webinar.

Now let me introduce our speakers for today, David Loshin and David Bitton. David Loshin, president of Knowledge Integrity, is globally recognized as an expert in business intelligence, data quality, and master data management, frequently contributing to Intelligent Enterprise, DM Review, and The Data Administration Newsletter, TDAN.com. David Bitton has extensive product knowledge coupled with creative ideas for product application and a solid history of global sales and marketing management of software-as-a-service and internet-driven products. And with that, I will give the floor to David Loshin to start today's webinar.

Hello and welcome. Thanks, Shannon, and thanks to DataVersity and to Octopai for inviting me to speak on this topic. It's a topic I've been familiar with for a long time, although what we will see as I walk through the slides is that perceptions of the value and utility of data lineage have truly accelerated over the last couple of years, and that is contributing to the great value data lineage can provide in accelerating BI responsiveness.

To start off, I think it's good to begin a talk about business intelligence responsiveness by reflecting on a historical perspective on the use of information for decision making. As an interesting side note, recently I've been doing a lot of reading about information value and decision making, going back over decades of articles, research papers, and magazine articles on the subject. Many of them, if not all, reflect the quote I've got here, from an article called "The Value of Information" that was written in 1968: "Ideally, information is the sum total data needed for decision making." When I read through a lot of these papers, there are some common themes that appear over that five-decade-plus span about data usability within and across the enterprise.
For the purposes of making good decisions, these are mainly what I show on the right side of the slide. Information awareness: knowing what data sets exist and could be of value for decision making and business intelligence. Information availability: whether the data sets are available for use, and under what circumstances or restrictions. Information trust: the degree to which the data in those data assets is trustworthy. Accessibility: how that data is actually made available and how I can get access to it. Information currency and information freshness, which both look at how frequently data is refreshed and whether it's kept up to date. And the last item is information format: whether the data is in a format that can be easily used. I think anybody who has worked on any kind of data warehouse, data mart, data visualization, reporting, or analytics application is intimately familiar with the issues that arise from any one of these items.

Organizations that have traditionally used a data warehouse have attempted to finesse these issues by creating a set of well-defined data extraction and data preparation pipelines, but as we will see, this tightly structured data warehouse architecture is gradually disintegrating, and what we'll discuss is how the conventional approach to populating data into a data warehouse is beginning to no longer be the status quo. If you look at this picture of a conventional data warehouse, there are well-defined data processing pipelines. Data is extracted from existing operational or transaction processing systems that are typically known sources well within the corporate data center. The data is extracted, usually in managed batches; those extracted data sets are then typically moved to some kind of staging area, sometimes an operational data store, where the data is processed: it's standardized, it's cleansed, it's transformed, it's reorganized, and it's prepared for loading, typically all in batch, into the target data warehouse. That data warehouse is a single resource shared by the different downstream consumers, and for the most part all of these processes are managed by an IT team, or a small team that owns the data warehouse. So there are well-defined processing pipelines that transform the data from the original formats into one usable format for reporting and business intelligence, and there is some control exerted over those data production processes, some level of oversight. To that extent, the fact that we've limited ourselves to a smaller domain of sources that can be fed into a BI environment gives us some level of control.
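As a rough illustration of that conventional, IT-managed pattern — extract in batch, stage, standardize and cleanse, then load into a single shared warehouse — here is a minimal sketch in Python. The table names, sample values, and cleansing rules are invented for illustration and are not drawn from any particular warehouse; SQLite simply stands in for both the operational source and the warehouse.

```python
import sqlite3

# Stand-ins for the operational source and the target warehouse (hypothetical names).
source = sqlite3.connect(":memory:")
warehouse = sqlite3.connect(":memory:")

# A tiny operational table playing the role of a transaction-processing system.
source.execute("CREATE TABLE orders (order_id INTEGER, cust_name TEXT, amount TEXT)")
source.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                   [(1, " alice ", "100.50"), (2, "BOB", "75")])

# Extract: pull the managed batch from the known source.
staged = source.execute("SELECT order_id, cust_name, amount FROM orders").fetchall()

# Transform: standardize and cleanse in the staging step, as the warehouse team would.
cleansed = [(oid, name.strip().title(), float(amount)) for oid, name, amount in staged]

# Load: forward the prepared batch into the single shared warehouse table.
warehouse.execute("CREATE TABLE fact_orders (order_id INTEGER, customer TEXT, amount REAL)")
warehouse.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", cleansed)
warehouse.commit()

print(warehouse.execute("SELECT * FROM fact_orders").fetchall())
```

The point of the sketch is simply that every hop is explicit and owned by one team, which is what gives the traditional architecture its control, and what gets lost as the landscape described next becomes more distributed.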
What seems to be happening in recent years is that enterprise data strategy has become more complex, driven by three continuously evolving realities. The first is a lower barrier of entry for scalable, high-performance platforms, especially when using cloud resources that don't have any capital acquisition costs and can be scaled up according to demand. So a large number of organizations are looking at migrating their environments to the cloud because it lowers costs and is more economically feasible. Number two is the availability of low-cost, or more frequently no-cost, open source tools that simplify the analytical process. Years ago, analysts were using particular types of end-user reporting and BI tools: there were license constraints, there were limitations on the availability of licensing to the desktop, and you might have needed particular types of expertise to be able to use those systems. Today we've got all sorts of available tools, low cost or no cost, that data consumers are able to configure on their own, so we've got an increasing degree of sophistication in the data consumer communities. That leads into the third point, which is that there's a broader array of personas positioned to derive value from information and analytics. Aside from your traditional data analysts and your business analysts, you've got a number of different team members with a range of skills, from being basically a neophyte when it comes to data analysis, reporting, and BI, to expert data scientists who need to be deep and hands-on with the data. They are all beneficiaries of the business intelligence, reporting, and analytics environments.

The result turns into a virtuous cycle: you've got greater demand for analytics, and that means modernizing the data strategy, and that means growing the enterprise data landscape. Growing the data landscape means there are greater numbers of data sources and data pipelines and a much more diverse distribution of the ways that data is stored, managed, accessed, and piped across this more complex enterprise. This increase in data sources and data pipelines inspires downstream data consumers to become aware of more data sources, which makes them want to explore more ways they can use the data to inform their decisions by creating new reports and new analyses and applying the data in different ways. So we get this virtuous cycle where the more data that's available, the more demand there is for more data, which then continues to increase the complexity of the enterprise.

The challenge is that this makes the environment more complex to support the analytics demands. You've got this growing number of sophisticated data consumers, and each one of them now wants to exercise control over their own data pipelines. So instead of the traditional approach, where you'd have an IT team that oversaw all of the pipelines that extracted data from the sources, did the transformations, and loaded the data into the target environment,
now you've got these different data consumers, and there's much more distributed control over those environments, where each downstream user may be able to get access to data and do their own data preparation, data engineering, and data analysis. You end up with a distribution of knowledge, but as a byproduct you end up with diminishing data awareness: a decrease in centralized authority and an increase in distributed authority, but also a decreased ability to centrally manage all the available resources and make everyone in the enterprise aware of what they are.

This is not an academic question; it actually becomes a complicated problem. We started out by noting that we've known for decades that business decisions are enabled through reporting, analytics, and business intelligence, and that relies on data awareness, data availability, trustworthiness, freshness, currency, and so on. But when your data awareness is diminished, it introduces questions about the data being used for your BI or your reporting, and that impairs the data consumers from making the best use of analytics. So instead of enabling those data consumers, the increase in complexity and distribution of data and the growing network of data pipelines actually ends up reducing the effectiveness of business intelligence and reporting, because it throttles the ability to enable those data consumers.

To enable informed decision making and accelerate responsiveness, data analysts and data consumers have to be aware of what data sets are available, what data sets can be used and under what circumstances, how the data is sourced, and how the data is transformed between the origination points and the point where it's delivered into some kind of report or visualization, or forwarded in a particular format to an analytical application. What dependencies exist across the organization with respect to data sources? Essentially, they are asking the key questions I've got in the bullet points on this slide: Can I trust the data that's made available to me? How are those data sources, or how are reports, impacted by changes in the environment or a change to a data source? And third, can I get the data that I need at the right time?

When you think about it, though, you can reflect on this a little differently and put these questions a different way, because asking them that way treats the end consumer as if they were not part of the process. In reality, when we start looking at how end users are integrated directly into that increasing number of data pipelines, you have to turn these questions on their side and say: wait a second, it's not necessarily about whether I can trust the data in the warehouse, but rather, how can I examine the data pipelines that are feeding the data into my application, so that I can feel confident I can trust the data being made available to me? Or, for the second question, under different what-if scenarios, how can I determine how the different reports that I look at are going to be impacted
if there were some change to the source data, or some change to the data model of the source, or even some change to the circumstances under which the data in the source is collected, or some change to a policy that modifies the way data is being collected and subsequently propagated? And the third question, am I getting the data that I need at the right time, becomes a different question as well: what are the best methods for optimizing the data pipelines in the environment to ensure that I get the data that is most current and trustworthy, in the right time frame, without diminishing anyone else's ability to get access to the data they need at the right time? Are there ways we can look at how the different data pipelines are configured so that data delivery is trustworthy, is not impacted in ways we don't understand, and can be optimized so that everybody in the organization gets what they need, at the right time, in the right way, and with the right degree of exposure, so that the proper protections are in place?

From a logical perspective, informed decision making requires data awareness, and that data awareness is not just being able to look at a simple enumeration of data sets and a list of the data elements in those data sets, but rather having increased visibility into how the data elements that are available to me, in the formats in which they're being made available, are produced. So it's no longer just a question of metadata. If you had come and asked these questions ten years ago, the answer would have been, well, you need a metadata repository. But in fact the metadata repository is only part of the answer; data lineage is really the answer. Data lineage methods are intended to help develop a usable map of how information flows across the enterprise. These methods help map out the landscape and provide a holistic description of each data object that exists within an organization and is used as a source at some point for any of the downstream consumers' artifacts: the data sources, the pipelines through which data flows, the transformations applied to data elements or combinations of data elements along the way, the methods of access to those data objects and the data elements within them, the controls that are imposed, and basically any other fundamental aspect of information use or information utility.

Data lineage combines different aspects of corporate metadata. The first is production lineage, the semantic aspect of tracing how data element values are produced. An example of this, and I think it's an interesting one because people don't usually think about a report as an object that is subject to metadata: when you look at a report, you see that there is a field on the report, and there's a column, and then there is a value. These are typically not persistent; they are produced values, but they are still data elements, and those data elements are at the end of that production line.
So there is an actual semantic interpretation of what that data element is, based on how it was produced. If you are able to get some visibility into the production chain for that particular data element, it gives you insight into what is actually being represented in that component of the report, that bar chart or visualization, or that result of some kind of analysis. The second aspect of metadata associated with data lineage is the technical lineage, which covers the structural aspects of data elements as they are produced, consumed, extracted, and propagated across the enterprise. So it's not just the flow; it's also what happens to that data element as it moves along the way. And the third piece is what we would call procedural data lineage, which is a trace of the journey through the different systems and data stores — essentially an audit trail.

If you looked at this audit trail of changes along the way, it would give you the visibility to answer the types of questions we raised on the prior slide. Can I trust the data in the data warehouse? Well, if I can see the meaning, based on the composition of that data element as it progressed from its original source or sources, and through its transformations, to the delivery point, then I can get a level of assurance that I can trust the data in the data warehouse. The second question: how is a report impacted by a change to a data source? I'd like to be able to trace the relationships and dependencies from the data source to the points where that data is employed or used. And then: am I getting the data that I need at the right time? If I've got visibility into the procedural lineage, I can look at whether any inadvertent delays have been introduced into the way those data values are produced, and that will tell me whether there's any potential for a delay that could impact data freshness or data currency within that process.

So lineage actually gives you visibility in multiple ways, and yet it needs to be addressed across multiple dimensions, and we're going to look at some perspectives on how data lineage has changed over time. It's kind of interesting: I reflect back on some of the work I did 20 years ago with respect to data quality, which looked at being able to trace the lineage of the process flow of the production of data that went into an end-user report or analysis. The issue, though, was that 20 years ago, if you wanted data lineage, it had to be manual: you had to manually walk through your processes, document the metadata, document the dependencies, and manually manage all of that.
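To make those three dimensions a little more concrete before moving on, here is a minimal sketch of how a single column-to-column lineage hop might be recorded so that it carries the production (semantic), technical (structural), and procedural (audit-trail) information together. The field names and example values are my own invention for illustration; they are not any vendor's actual model.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class LineageRecord:
    # Technical lineage: the structural hop from a source column to a target column.
    source: str                 # e.g. "crm.customers.birth_date"
    target: str                 # e.g. "warehouse.dim_customer.age_band"
    # Production lineage: the semantics of how the target value is produced.
    transformation: str         # e.g. "bucket(year(now()) - year(birth_date), 10)"
    # Procedural lineage: the audit trail of the job run that moved the data.
    job: str
    executed_at: datetime
    run_id: str

record = LineageRecord(
    source="crm.customers.birth_date",
    target="warehouse.dim_customer.age_band",
    transformation="bucket(year(now()) - year(birth_date), 10)",
    job="nightly_customer_load",
    executed_at=datetime(2021, 3, 1, 2, 15),
    run_id="run-4711",
)
print(record.target, "<-", record.source, "via", record.transformation)
```

A collection of records like this, one per hop, is enough to answer the three questions above: the transformation explains what a reported value means, the source/target pairs expose dependencies, and the run timestamps reveal where delays creep in.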
What emerged relatively recently, in what I would call the second generation of data lineage tools, was simple automation: being able to do things like creating a data inventory or harvesting metadata, and then inferring system-to-system dependencies. That really tells you, at a very global level, which systems touched which data elements in which data sets — so you can tell that a system read a value or wrote a value — and it may even give you some kind of visual representation and a bridge or interface to a data catalog. That gets you part of the way there: it tells you a little about data awareness and data availability, and it gives you some potential for inferring dependencies on the data elements, but it doesn't give you the depth that is necessary for the level of insight you need to answer those questions the right way.

The emerging — I'll call them bleeding-edge — tools, and I would group what Octopai briefed me on, their Data Lineage XD, among these (these are capabilities they've incorporated into their new release, and I'm sure David is going to elaborate in greater detail), provide lineage of data elements across different systems: transformations from source to target, and between-system column dependencies — basically cross-system lineage that shows how data flows from the origination point through the data pipelines to the different reports and analyses delivered to the data consumers. They provide column lineage that shows the transformations applied to data elements from the source to the delivery point. And they provide inner-system lineage that documents the details of the production of data elements within a specific system context. I'm sure we're going to get more detail on that when I hand it over to David.

But to cycle back: we talked a little bit before about the level of data architecture complexity and why there's a need for automation. If we try to do this manually, it is essentially undoable. Organizations are continuing to expand their data landscapes across on-premises and cloud platforms, and the complexity of these data strategies means that manual oversight and manual management of data lineage is going to be difficult, if not impossible. So organizations are going to need tools that automatically infer, capture, and manage lineage and provide a visual presentation that is intuitive to the data consumers and exposes the details of this multi-dimensional data lineage. The implication is that manual capture is difficult, it's time-consuming, and, worse, it's error-prone. Automated capture and management of lineage is going to provide trustworthy details about the data origin, the transformations, and those dependencies across the enterprise.
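As a small, hypothetical sketch of the difference between those levels: once a tool has harvested column-to-column edges, cross-system lineage is essentially the same edges rolled up to the system level, while column lineage keeps the full detail. The systems and column names below are made up for illustration.

```python
from collections import defaultdict

# Hypothetical harvested column-level edges: (source column, target column),
# with columns written as "system.dataset.column".
column_edges = [
    ("crm.customers.birth_date", "staging.stg_customer.birth_date"),
    ("staging.stg_customer.birth_date", "warehouse.dim_customer.age_band"),
    ("warehouse.dim_customer.age_band", "reports.customer_product.age_band"),
]

# Cross-system lineage: the same edges rolled up to the system level.
system_edges = {(src.split(".")[0], dst.split(".")[0]) for src, dst in column_edges}
print(sorted(system_edges))

# Column lineage: keep the full detail, grouped per target column for quick lookup.
producers = defaultdict(list)
for src, dst in column_edges:
    producers[dst].append(src)
print(producers["warehouse.dim_customer.age_band"])
```

The second-generation tools stop roughly at the first print statement; the multi-dimensional view adds the second, plus the inner-system detail of what each hop actually does.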
So, coming back to the theme of today's talk: data lineage accelerates BI responsiveness because it informs processes and requirements like integrated auditing for regulatory compliance, impact analysis to assess how changes to code or models affect data pipelines, replication of data pipeline segments for optimization, and root cause analysis. Access to these different dimensions allows data consumers to know what report data elements are available, how they are produced, what dependencies they have on the original sources, and what transformations were applied.

We can look at an example use case, and this one is timely and relevant. We've got these data privacy laws that are intended to protect against exposure of sensitive data, and that protection is typically engineered into a bunch of applications. But here's the issue: some of these laws change over time, where perhaps the definition of private data is expanded to include a data element that previously was not included. So if you've got multiple systems that depend on analyzing that data, and all of a sudden the law changes so that some data element in the source is no longer available unless you have a particular right to view it, how would I know where in the environment I need to make a change? The answer requires knowing what processes are impacted, what reports are impacted, and what systems or code need to be reviewed and updated. Data lineage provides this visibility for doing impact analysis: the cross-system lineage allows you to identify which systems are impacted by a modification to that externally defined policy. It will tell you, when you modify the use of, or the rules associated with, a particular source data element, which systems are touching that data element and how they are touching it. The column lineage will show you which direct dependencies need to be reviewed, and the inner-system lineage will expose where there are internal data dependencies that might inadvertently expose data that has now been incorporated into that definition of private data. This is just one example of a use case; there are multiple use cases for how data lineage can be applied in different ways.

So what do we want to look for when we're looking for a data lineage solution? You have to look at this from a practical perspective. Again, data lineage is a tool; it's used in different ways by different processes, and I gave you one example. For the data analyst, data lineage provides insight into the semantics and meanings of the data elements available for developing a report or producing some kind of analysis. For a data engineer, data lineage gives details about the pipelines and cross-system dependencies. The AI developer might rely on lineage to track down issues that are affecting the development of an artifact. A data scientist might want to examine the methods that other data scientists used to prepare data for their analyses, to see if there are opportunities for replication that would speed up the results. The application or system developer might want to see how changes to policies or models need to be addressed across the enterprise.
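The privacy-law scenario above is, at heart, a downstream graph traversal: start from the newly restricted source element and walk every artifact that is fed by it, directly or indirectly. Here is a minimal, self-contained sketch of that idea; the dependency edges are invented for illustration, and a real lineage tool would derive them automatically from harvested metadata rather than hard-code them.

```python
from collections import defaultdict, deque

# Hypothetical column-level dependency edges: source -> set of targets.
edges = defaultdict(set)
for src, dst in [
    ("crm.customers.birth_date", "staging.stg_customer.birth_date"),
    ("staging.stg_customer.birth_date", "warehouse.dim_customer.age_band"),
    ("warehouse.dim_customer.age_band", "reports.customer_product.age_band"),
    ("warehouse.dim_customer.age_band", "analytics.churn_model.features"),
]:
    edges[src].add(dst)

def downstream(column: str) -> set[str]:
    """Breadth-first walk of everything fed, directly or indirectly, by `column`."""
    seen, queue = set(), deque([column])
    while queue:
        for nxt in edges[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

# If birth_date is reclassified as private, these are the artifacts to review.
print(sorted(downstream("crm.customers.birth_date")))
```

Each persona listed above is essentially issuing a variant of this query at a different level of detail — systems, columns, or the logic inside a single job.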
So you need to understand how to look for the right capabilities that will address all these different use cases for all these different types of consumers. I've boiled it down to four categories to look for in a data lineage tool, four facets. Breadth: you want details about the breadth of how information flows across your enterprise. When lineage is limited to system-to-system data flow, it doesn't show the finer details about dependencies or what transformations are being applied during each processing stage, and that's not going to satisfy the needs of the different personas we just talked about, so look for tools that describe the lineage across those different dimensions. Going clockwise, automation: again, I don't think I can emphasize this enough — any attempt to do this manually is doomed to be error-prone. If you rely on manual capture and management, it's going to be time-consuming and error-prone; automation remediates those issues. The third is visualization: you've got to have an intuitive method for presenting the right level of detail to each type of persona, especially as the number of data pipelines increases and their complexity grows. If you recall the continually evolving complexity of our data landscapes, relying on something that doesn't give the right level of detail is going to impact an engineer's ability to make good decisions about how to address their particular use cases. And finally, integration: data lineage is a tool, but it's a tool among an arsenal of other tools. Lineage tools need to integrate with those other tools and utilities, especially if you want to be able to automatically derive lineage information, so look for products that are engineered to integrate with other complementary products. And with that — if you've got questions, I think Shannon's already told us how to share them, and if you think of one after the fact, you can contact me at either my Knowledge Integrity email or my University of Maryland email. I'm going to hand it back to Shannon to introduce David.

Yes, thank you, David Loshin. David, if you want to share your screen to start your side of the webinar. And if you have questions for either David, you may submit them in the Q&A section, which you can find via the icon in the bottom middle of your screen, and we will get to the Q&A after the presentation. David, take it away.

Thank you, Shannon, and thank you, David, for an interesting presentation. I'm very excited to be here today and to introduce you to Octopai's Data Lineage XD, which is actually the first platform on the market to provide advanced multi-dimensional views of data lineage. All right, so what are multi-dimensional views of lineage? First of all, we have cross-system lineage, which provides end-to-end lineage at the system level, from the entry point into the BI landscape all the way to the reporting and analytics. This level provides high-level visibility into the data flow, and it maps where data is coming from and where it's going.
Secondly, we have inner-system lineage, which details the column level within an ETL process, for example, a report, a database object, and so on. Understanding the logic and data flow for each column provides visibility at the column level, no matter how complex the process, report, or object is. And finally, end-to-end column lineage, which details the column-to-column lineage between systems, from the entry point into the BI landscape all the way through to the reporting and analytics.

Now what I'd like to do is jump into a demo and show you the power of Data Lineage XD with an actual use case. So bear with me; we should be able to jump into the Octopai demo environment. What I'd like to do is show you this through a use case. Imagine that you have a support ticket that was issued by a business user — it could be the CFO — and let's say it's the end of a quarter, and there's something wrong with the report they're basing the quarter's results on, which is of course a common scenario that I'm sure many of you are familiar with. In order to figure out what was wrong with that report, you're going to need to understand how the data landed on it, and that is going to require reverse engineering. In most organizations today that are not using Octopai, that will be done with a lot of manual work, which is very time-consuming and, as David mentioned, inefficient, and it will also introduce other production issues as well. Now, this would not be the case with Octopai, so let's go ahead and see how Octopai would address that challenge.

Octopai has searched through all of your various systems in order to gather the metadata that we need. What we see here is the Octopai dashboard. On the left-hand side we see, in our demo environment, a sampling of different ETLs from SSIS and also from SQL Server stored procedures. In the middle we see the different database objects, tables and views, coming from SQL Server as well as some textual files. To the right of that we see the different reports and the reporting systems. Now, in order to investigate the error in this report, most BI teams will go through a very similar scenario: they'll probably start by investigating the structure of the report and the reporting system. After that, everything will need to be mapped, and then they'll probably need to contact the DBA to ask questions such as which tables and views were involved in the creation of that report, if they don't know themselves. They also might go in and take a look at the fields and labels to see if they were given the same names and, if not, which glossary was used. Starting at this level is the most common approach, and it makes sense because the error shows up here, but then you have to take a step back to see where the error actually crept in. Even after investigating everything here, our DBA may be kind enough to tell us there's actually nothing wrong at this level and that we may need to look at the ETL level. So you're going to need to take a step backwards and start investigating at that level, and of course it's going to be a very similar process. Now, in most organizations, that kind of investigation, if you're lucky, may take an hour or two if it's a very simple scenario.
If it's more complicated than that, it may take a day or two, and we even have scenarios where customers tell us it sometimes takes weeks and even months, depending of course on the complexity. So that's a fair synopsis of how this would be handled in most organizations today. What I'd like to do now is show you how it would be addressed with Octopai, literally in seconds and automatically. The report we're having trouble with is called customer product, so I'm going to type that into our lineage module, and as I type it in, Octopai filters through the entire environment, showing me the report that we're having trouble with. I'm going to start by showing you cross-system lineage, so I'm going to click on that, and about a second later we have a complete understanding, at the cross-system level, of how the data landed on that report. The legend is on the left-hand side, so you can see what we're looking at. On the right-hand side is the actual report we're having trouble with, the one our CFO complained about. As I move to my left, I can see how that report was created, and we can see that there was at least one view involved in the creation of that report. As I continue to move to my left, we see another view and a few different tables that were also involved in its creation. If I click on any item on the screen, a radial menu comes up that gives me more options. Now, just for argument's sake, to give you another example of how we can help: if you needed to make a change to this table and wanted to know what the impact would be — imagine what you would have to do today in order to get that information. With Octopai, simply by clicking the button there, we now see, at a high level, the dependent objects that would be impacted if we made changes to that one table, and of course it would be the same for any object on the screen. As we continue to move to the left, we finally find the ETL that was involved in the creation of that report. In this demo environment there's one ETL; many of the organizations we're dealing with are actually using multiple different systems to manage and move their data, and if that's the case in your organization, it's not a challenge for Octopai — we can still show you the path the data has taken in order to land on that report.

We pushed our customer a little further in this scenario: we asked what went wrong with this report, and they admitted that a few weeks earlier, before they had started using Octopai, they had made changes to this one ETL over here, and most likely that's why they're facing production issues today. So we asked them: if they knew that whenever they make a change there will be impacts, why not be proactive and look into the impact those changes would have on the environment, on the system, on the data pipeline, and make the corrections, saving all of the production issues, the data quality issues that arise, and the resulting loss of confidence in the data, and so on.
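To relate that walkthrough back to the mechanics: the backward, root-cause direction of the same exercise is an upstream walk from the troubled report toward the views, tables, and ETLs that feed it. Here is a minimal sketch of that idea; the object names are invented for illustration rather than taken from the demo environment, and a lineage tool would of course build this map automatically from harvested metadata.

```python
# Hypothetical object-level lineage: each target mapped to its direct sources.
parents = {
    "report:customer_product": ["view:v_customer_product"],
    "view:v_customer_product": ["view:v_customers", "table:dim_product", "table:fact_sales"],
    "view:v_customers": ["table:dim_customer"],
    "table:dim_customer": ["etl:load_dim_customer"],
    "table:dim_product": ["etl:load_dim_product"],
    "table:fact_sales": ["etl:load_fact_sales"],
}

def upstream(obj: str, depth: int = 0) -> None:
    """Print everything the given object depends on, walking back to the ETLs."""
    print("  " * depth + obj)
    for parent in parents.get(obj, []):
        upstream(parent, depth + 1)

# Root-cause search for the report the CFO complained about.
upstream("report:customer_product")
```

Run on a real environment with thousands of objects, the same walk is what turns a days-long reverse-engineering effort into a query.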
And of course, as David said, doing this manually is basically undoable in most organizations, because there's just too much to look into. There could be hundreds, if not thousands or even tens of thousands, of different objects that could be affected — different ETLs, tables, views, reports, and so on — by any one change to any one object in the environment, such as a table, a view, or an ETL. So, since organizations are forced to be reactive, because that's the only way they can work, they will try to make changes without causing production issues using the capabilities at hand: the knowledge of the people on the team, if they're all still there and haven't left the organization, and maybe some spreadsheets, hopefully kept up to date — and if not, they'll deal with it. Using all of that, they make those changes, keep their fingers crossed, maybe say a little prayer, and eight or nine times out of ten there will probably be no production issues. The one or two times out of ten that there are production issues, because they're forced to be reactive, they will have to react to those issues. The problem with that is that you're only reacting to what you know about; what you don't know about continues to snowball and creates all kinds of havoc throughout the environment.

Now, with Octopai we can turn that on its head. We can empower you to become more efficient and proactive: before you make a change, you can understand what would be impacted should you make that change within the environment. So let's say, like this customer, while using Octopai we needed to make a change to this ETL — with a simple click of the mouse on lineage, we now understand exactly what would be impacted should we make changes to that one ETL. And what we see here is something quite interesting, because if you remember, the reason we started this entire root cause and impact analysis search is that we had one business user — although it was the CFO — complain about one report. As far as we knew — ignorance is bliss — that was the only report affected. Now that we take a look at the lineage of this ETL, however, we can be almost 100% certain that that is not the end of the scenario: most likely some, if not all, of these different objects on the screen would have been affected by any change to this one ETL. So these different ETLs and stored procedures, these views, these measure groups, dimensions, tables, and views, and of course reports, could all have been affected. Most likely what will then happen in reality is that as these reports get opened, you hope the business users opening them actually notice the errors in them, because if they don't, it's going to be even worse; when they do notice, they're going to open support tickets. Now, these reports will be opened throughout the year by different people within the organization with different job functions.
And since they're opened throughout the year by different people at different times, and those support tickets are opened at different times, there's just no way humanly possible for those who are responsible for fixing those errors to know that there is one root cause. What's going to happen is, as we said earlier, they're going to start to reverse engineer those reports, which could take anywhere from hours to days or even longer, and you probably know better than I do how much time and effort is wasted throughout the year reverse engineering those reports, because it's not going to be limited to six or eight or ten — it's probably going to be hundreds. I say "wasted" because if the team, or those responsible for correcting the errors, had known from the get-go that this ETL was the root cause, they wouldn't have needed to reverse engineer all of those reports to get to that root cause. Now, I left these two here on the side to prove a point, and that is: if you're working reactively and manually, most likely you will get to most of the errors in most of the reports in the system, but not all. Some of these reports will fall through the cracks; they will continue to be used by the organization, and the organization will make business decisions based on those reports, which is of course the most impactful outcome of all.

So, what we've shown you up to now is root cause analysis, and then an impact analysis at the cross-system level. What I'm going to do now is jump into inner-system lineage. Let's take a look at this SSIS package: maybe you need to make a change to this ETL and you want to see the impact at the column level. So let's click on the SSIS package and choose package view. Here we see one package, as this is a demo environment; in your production environment, if you're using SSIS, you'll most likely have multiple packages, and you would see them all here. Now let's delve into the container: by double-clicking I can delve into it and take a look at the logic and transformations that take place within one of these processes. I'm going to take dim product and double-click it, and now it takes me to a column-to-column view at the inner-system level. What we see here is source to target. Now, if you'd like to see the entire journey from source to target, including the transformations and logic that happened within a specific column, you simply choose the column you'd like — you might not be able to see it, but there are three dots that pop up to the right of it. If I click on that, it takes us to end-to-end, column-to-column lineage. So now you're seeing the entire journey from source to target at the column level. We can also take you from the column level up to the table level, the schema level, and the DB level. And all of this is integrated, so if you need to jump further into any one of these objects on the screen — for example, if I needed to go back to cross-system lineage — I can simply click and go back into the cross-system lineage. And that was everything I had to show you here today; of course there are other dimensions to Octopai's platform that I haven't shown you today.
We have, of course, data discovery, and we have a business catalog, which is actually an automated business glossary, ABG. There may be other questions, and if you'd like to see more of Octopai, you can of course get in touch with us, and we'd be happy to arrange a more in-depth demo and presentation. Back to you, Shannon.

Thank you so much for this great demo and information, and thanks to both of you for this great presentation. Again, if you have questions for either David, find the Q&A panel by clicking the icon in the bottom middle of your screen, and we will answer the most commonly asked questions. Just to note, I will send a follow-up email by end of day Thursday, Pacific time, with links to the slides and links to the recording of this presentation. And if you see in the Q&A section that somebody has already asked a question you like, just hit the little thumbs-up button to escalate it. So, to dive in here: this question came in for David Loshin. When you were talking about the same information being used across the enterprise for various purposes, would you address any ethical implications that may be overlooked or not considered?

I'm not really 100% sure I understand what you mean by ethical considerations, although I do think an example might be the determination that an unauthorized approach was used to combine data from multiple origination points in a way that results in exposing information that probably should not be exposed. That might be an example where we can automate the inferencing of characteristics associated with, say, a customer, based on data being pulled from multiple sources in a way it shouldn't be used. I would assume that would be a good example of a use case for lineage, because you're able to see how data sets are being blended and fused for downstream use. But if you want to go back to the Q&A and type in a clarification, maybe we can circle back on that question.

Sure. I see a question here that I wanted to answer, if you don't mind. One of the attendees says: I see the demo uses Microsoft tools — what other reporting or other tools do you support? If you don't mind, I'd just like to answer that one. Octopai actually has the most extensive list of supported systems, not just Microsoft, of course. What you can see here is what we currently support, plus what is on our roadmap. You can find it on our website under octopai.com, supported technologies, but to give you an example, there's ADF, Azure Data Factory; we have Teradata; SQL Server, of course; Amazon Redshift is on the way; Vertica; Power BI; Qlik; MicroStrategy; Cognos; and of course we have many more coming. Sorry about that — sorry to interrupt. Shannon, go ahead.

Sure. Yeah, lots of good questions coming in here about Octopai. In fact, speaking of that: does Octopai work within SAP to collect lineage information?

Within SAP we do not collect lineage. However, we do support SAP BO as a reporting system, and we can provide lineage to and from it.

Yeah. So, what is Octopai's enterprise pricing?

Okay, that is a question that would be a little more difficult to answer in a forum like this, but I can give you an understanding of how it's priced. It's certainly not by user.
Everybody within the organization can have access to Octopai and gain benefit from using it, and that includes our business glossary, so everyone on the business side can also have access. The way we price Octopai is, as I said, not by user; it is by module and by metadata source. Today I showed you one module; there are other modules within Octopai. Ballpark, it is anywhere from around $3,000 to $10,000 per month total, all in — there are basically no limitations on anything, and it includes training, upgrades, maintenance, and so on. It is, of course, an annual license or annual contract.

What was the initial information created in Octopai? — I'm sorry, repeat the question again. — Yeah, what was the initial information created in your demo there? — Okay, I'm not sure I understand the question. — It's asking how it bootstrapped the collection of information; that's how I'm inferring it. — Okay, I might have understood it that way as well: how do we collect the data, the metadata? It's very simply done. There is an Octopai thin client that we send to the client, the customer; the customer installs it once in their environment, on any Windows system. They point that thin client at the various systems we want to extract the metadata from — of course we provide you with all the instructions on how to do that. That entire process of configuring the Octopai client to extract the metadata should take no more than one hour. It's done once; once it's configured, you hit the run button, and Octopai goes ahead, extracts that metadata, and saves it in XML format. The XML files are saved locally; you can of course inspect them to ensure they meet your security standards before you upload them to the cloud, where the Octopai service is triggered. That's where all the magic happens: the algorithms, the machine learning, and the processes run to analyze that metadata and make it available to you in the form of lineage, discovery, and even a business glossary for the business user, via a web browser. Does that answer the question?

Yeah, I believe so, and there's a follow-up to that: not only how is it initially collected, but how does it keep up to date?

Actually, great point — that's something I forgot to mention. The entire process I just described can then be automated, so that on a weekly basis you upload new metadata to the cloud, it's analyzed, and it's given back to you. For example, you upload on a Friday, and Monday morning you come back to work and you have a new version. That actually works quite well for most organizations, because development usually happens during the week and is then promoted to production, and the metadata is uploaded to the cloud, so Monday morning you have a fresh new version.

And while we're on that topic, how do you handle lineage with software-as-a-service apps? Often we're leveraging extracts or API access to data from those apps.

Well, I know that we don't support APIs for that, but I'm going to refer that response to my colleague, who is on the line.

Hi. Yeah, sure. So we have different methods of extracting metadata from all different types of sources, and if in any case a specific type of source is not supported for automation, there's always an option to augment different types of lineage for anything you have that wouldn't be supported.

I love it.
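As a very rough illustration of the kind of harvesting step David described a moment ago — reading structural metadata from a source and writing it to a local XML file that can be reviewed before anything is uploaded — here is a minimal sketch. It is not Octopai's thin client; SQLite, the table, and the file name are placeholders standing in for a real source system.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Stand-in source database; a real harvester would point at SQL Server, Oracle, etc.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE dim_customer (customer_id INTEGER, customer_name TEXT)")

# Walk the source catalog and record table and column structure as XML.
root = ET.Element("metadata", attrib={"system": "demo_dwh"})
for (table,) in db.execute("SELECT name FROM sqlite_master WHERE type = 'table'"):
    t = ET.SubElement(root, "table", attrib={"name": table})
    for _, col, col_type, *_ in db.execute(f"PRAGMA table_info({table})"):
        ET.SubElement(t, "column", attrib={"name": col, "type": col_type})

# Save locally so it can be inspected before anything leaves the environment.
ET.ElementTree(root).write("harvested_metadata.xml", encoding="utf-8", xml_declaration=True)
print(ET.tostring(root, encoding="unicode"))
```

The key design point echoed here is that only structural metadata, never the data values themselves, is extracted and reviewed before upload.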
So how can a tool like this be operationalized to work in an enterprise system, like an MR? I think that one's for you.

Could you repeat that?

Sure. How can Data Lineage XD be operationalized to work in an enterprise system, like an MR?

Again, for all the tools that we support, automation is what David showed, and the list is available on our website. In addition to that, there's always the option of augmenting additional lineage, to get complete, full coverage for anything that's not supported.

I think someone may have misunderstood me here — an attendee had asked whether there is API integration. Yes, we do have APIs that can be called upon, so if you need to export everything, or anything, within Octopai, you can use those APIs, and they can inject that metadata and the lineage into a third-party application. We also have direct integrations with some other industry systems as well.

Yeah. So, it is clear that there is a need for a comprehensive data lineage tool to learn and understand the semantic structure and process. How do you integrate the tool with the ecosystem and other products and utilities?

Okay, once again I'm going to refer that one to my colleague.

Yeah, sorry about that. The way our tool integrates is basically the way David described before: we connect to the different tools in the BI ecosystem, we pull metadata from those tools in an automated way, and once we do that, we stay completely out of the ecosystem's way — we perform all the analysis David talked about on the side and make it available to you through the web.

I love all these questions about the product; lots of interest here. How does Octopai help in a distributed environment where data sets are extracted and used locally?

So again, it really depends on the type of implementation; you would make use of all the different methods we've been discussing so far. It really just depends on the type of environment that you have.

And there are a lot of questions here about data catalogs. Do you connect to any other data catalogs — Collibra, or any others?

There's a direct integration with Collibra. I see a question here — as I mentioned earlier, we have APIs that can be called upon to integrate with others, and we are also in talks with others to integrate directly with them. There was a question about what the data lineage that Collibra delivers does versus what Octopai does: the answer is the use case. If governance use cases are the ones you're concerned with, Collibra would of course be well suited for that. If the use cases are those involved in a BI landscape, such as impact analysis, reverse engineering a report, and the various other scenarios within BI, then Octopai would be more suited.

Augmented data management is a concept catching on with clients. Do you see Octopai catering to that market as well?

I'm not familiar with that — my colleague might know a little more about it; maybe you want to take that one?

Sorry, I missed the first word — what was that? Augmented?

Yeah, augmented. Augmented data management is a concept catching on with clients. Do you see Octopai catering to that market as well?

Yeah, well, they kind of complete each other in a way. And the APIs, as I mentioned before, allow you to augment lineage.
But you still enjoy the big benefit of the automation whenever it's needed; that's also part of what we offer.

There are a lot of questions here about what other products you connect with and how you connect. Is there a link you can give me that I can send in the follow-up email that shows all of them?

So here on my screen is the link to the currently supported systems; I can send that afterwards, or anybody can see it — it's octopai.com/supported-technologies/.

Love it — would you mind putting that in the chat for us, please, and I'll copy it over and include it. Okay, I think we've got time for a couple more questions, or at least one more. Is it possible to include metadata from other governance tools in the lineage — for example, if ClusterSeven has an inventory of reports with lists of sources?

Sorry, run that question by me again.

My guess is the core of it is: is it possible to include metadata from other governance tools in the lineage?

That's a good question, and it all depends. Octopai today supports the systems I mentioned earlier, and we do have augmented lineage, which we can add for systems that are not on the supported list. Do you have any other options, possibly, for that question?

It of course depends on what type of metadata you have in those different tools, but yes, there's an option to import different data assets from different tools into Octopai.

All right, I think we do have time for one more. How might this be used to track data flows from discrete internet of things devices across multiple source channels?

All right, so the way Octopai works is that we connect to the different data pipelines and data elements — the data warehouse, the reporting tools, and all of those — and we pick up the metadata from those places directly. That's the way we harvest the metadata and build the entire lineage.

I love it. Well, that does bring us to the top of the hour, and that is all the time we have — so many great questions about the product, and so much interest. Again, just a reminder, I will send a follow-up email to all registrants by end of day Thursday, Pacific time, with links to the slides and links to the recording, as well as the additional information requested here. Thank you both for the great presentations and information, thanks to Octopai for sponsoring today's webinar and helping make all of this happen, and thanks to everybody who's been so engaged — we really appreciate it, and we hope you all have a great day. Thanks, everybody. Thank you, everyone. Thanks, guys.