Hello and welcome. My name is Shannon Kemp and I'm the Chief Digital Manager at DATAVERSITY. We'd like to thank you for joining this DATAVERSITY webinar, Decoding the Mystery: How to Know if You Need a Data Catalog, a Data Dictionary, or a Business Glossary, sponsored today by Octopai. Just a couple of points to get us started. Due to the large number of people attending these sessions, you will be muted during the webinar. If you have any questions, we will be collecting them via the Q&A in the bottom right-hand corner of your screen. Or, if you'd like to tweet, we encourage you to share highlights or questions on Twitter using the hashtag #DATAVERSITY. And if you'd like to chat with us and with each other, we certainly encourage you to do so. To open the Q&A or chat panel, you'll find the icons for those features in the bottom middle of your screen. Just note that the chat defaults to sending to the hosts and panelists, but you may absolutely change that to chat with everyone. We will send a follow-up email within two business days containing links to the slides, the recording of the session, and additional information requested throughout the webinar. Now I'm going to introduce our speakers for today, Dr. Malcolm Chisholm and Amichai Fenner. Amichai is a product manager with Octopai, with over seven years of experience working as a full-stack BI expert. He has expertise in BI methodology and architecture, as well as technical skills in various BI tools, from ETL to reporting and analytics. He currently leads the development of Octopai's automated data catalog. Malcolm is the president of Data Millennium. He's a thought leader, author, and speaker in data governance and data management. Malcolm has over 25 years of experience in data-related disciplines and has worked in a variety of sectors, including finance, manufacturing, government, pharmaceuticals, and telecoms. And with that, I will give the floor to Malcolm to start today's webinar. 
Hello and welcome. Thank you very much, Shannon, and it's great to be here with the community today to speak about this important topic. So if we can go to the first slide. I think it's a very hot topic today, but it's one which exists in a historical context, and that is the shift to data centricity. Think about when the computerized age really took off, which was in the mid-1960s. For many years thereafter, the focus was on automating the enterprise's processes. Data was thought of as a byproduct of automation. There was very much a rush to do things like, you know, automate the books and records of an enterprise. Because if you think about a bank in those days, people had to go into a bank and a human being would write in a ledger how much money you were taking out of or putting into your account; that's completely unthinkable today. The banks couldn't scale, and these kinds of clerical operations were very expensive. And that's when computers came in and provided a tremendous amount of efficiency. It also allowed scale; it allowed speed to increase. But the focus was on automating processes. To some extent that mindset has stuck with us today, but the reality has changed over the decades. Today, we can find packages to automate practically anything. The core problem we're faced with is getting value out of data. Data is increasingly at the heart of business models. And you can think of, you know, companies like Google and Facebook and Twitter; whatever you might think about them, they are working with data. That's what they get value out of; that's what they monetize. Not all enterprises are directly monetizing data, but they are using data to do things like predictive analytics and BI to run their companies. So that's what's happening today, albeit with somewhat of a process-centric mentality still in it. We have massive data volumes, and data not only from internal enterprise sources, but from external sources as well. 
So data acquisition has become a whole sub-discipline of data management and data governance, and that's affecting the way in which we need to process data and feed it into our artificial intelligence and ML technologies, which are increasingly available and increasingly within reach of small to medium-sized enterprises. This makes everybody very data hungry. So data is at the center of the modern technological world. It's been called the new oil, the new gold, the fuel that runs our digital economy or information age. And that has a profound impact, because we need to know how to manage it, which is where data catalogs, business glossaries, and data dictionaries come in. Next slide, please. So, what is metadata? Well, metadata is the information we need to understand and manage the data assets we have. But metadata isn't a single uniform substance. There are a lot of different kinds of metadata, and that tends to be reflected in these different kinds of tools, which manage different kinds of metadata. So, the business glossary does things like manage terminology for both information and data concepts, manage definitions, and manage classifications. Terminology is important: what do we call something? We don't want different report labels on all our reports for the same piece of information; it would be nice to have that standardized. Well, you can do that with a business glossary: you've got a single place to go and look at the actual terms, and maybe the synonyms and homonyms are in there and get disambiguated. And then we have definitions. Definitions are no longer like, you know, the couple of lines we saw in printed dictionaries. Definitions, when it comes to data management and data governance, are much more than that. They are facts of business significance. Think about a metric like stock on hand. How is that calculated mathematically? What's the methodology for that calculation? Do we do it in the morning, or do we do it in the evening, in our shop? 
How is it to be done? So those facts of business significance that help us to understand data are, in part, in the business glossary. And classifications: we can classify data in all kinds of ways, and the need for that is growing. We've seen that with data privacy, and even within data privacy regulations you see subclassifications; for instance, the California Privacy Rights Act has 12 classifications of personal information which you have to disclose to data subjects when you tell them what you're doing when you collect their personal data. So, a lot of things going on in the business glossary. The data dictionary is very familiar and goes back a long way. Typically, it holds the schema, table, and column information, the structural metadata from our relational databases, but that's grown over the years, and now we have data profiling information in there too. We can do simple things like say, you know, what's the minimum value, the maximum value; you can get all kinds of information from profiling. Data universes, which we'll come to again in a moment, and other relational objects like views, we also see stored in a data dictionary. Additionally, I think it's fair to say the business glossary has been much more on the business side of the house, and the data dictionary has been used by IT professionals; certainly I used one when I was more active in development activities. Then we have the data catalog, and the data catalog is an agglomeration, if we could just go back one slide please, of major data assets: files, data sets, which are logical groupings of data, reports, and other data assets, and it attaches definitions to these data assets at this more aggregated level. Okay, can we go to the next slide, please? Okay, so that's a brief look at them. But what do you need? Well, it depends who you are. Traditionally, usage by role has looked something like this. 
So the business glossary has certainly been used by business users, and increasingly by self-service users. I think self-service business intelligence has really been a pioneering area in self-service, but we're starting to see self-service expanding to other areas with the idea of data democratization: that anyone can use the data, if they can get hold of it, to improve the business, you know, for the benefit of the companies or enterprises we work for. Data architects also use business glossaries, and so do data governance professionals. With the data catalog, you're now having to look at more technical-level data. So certainly a self-service user, particularly in BI, might need to use it to look up columns that they intend to use. A data architect is clearly spending a lot of time dealing with attributes at the logical level, and I certainly did when I was doing a lot of data modeling. Data engineers, the people who are moving data around, building data pipelines, and integrating things, need it too. And again, data governance professionals have an interest in it, maybe again for something like privacy. So that's the data catalog. The data dictionary is much more at the technical level. Again, the data architects need it, the data engineers need it, and now our friends the DBAs obviously need it, because they're primarily working at this level. And again, you'll find the data governance professionals need it also; they need everything. So traditionally we've seen different roles use these things in different ways, but I think what we're seeing today is more of a coalescence of the usage. It's probably still role-based, but it's getting more subtle. So I think it'll be interesting to see how this develops over the next few years as we get more and more technology available and data becomes even more prominent in just regular business usage. 
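To make the data dictionary idea from a moment ago concrete, here's a minimal sketch of harvesting structural metadata plus simple min/max profiling statistics. It uses an in-memory SQLite database as a stand-in for a real warehouse, and every table and column name is invented for illustration; a real harvester would query the platform's own catalog views (for example, information_schema) rather than SQLite's PRAGMA.

```python
import sqlite3

# A tiny in-memory database standing in for a real warehouse.
# Table and column names here are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INTEGER, last_name TEXT, salary REAL)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(1, "Ng", 52000.0), (2, "Diaz", 61000.0), (3, "Okafor", 48500.0)])

def harvest_dictionary(conn):
    """Collect structural metadata (table/column/type) plus simple
    min/max profiling stats, as a data dictionary would store them."""
    entries = []
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    for table in tables:
        # PRAGMA table_info is SQLite's stand-in for information_schema.columns
        for _cid, col, col_type, *rest in conn.execute(f"PRAGMA table_info({table})"):
            lo, hi = conn.execute(
                f"SELECT MIN({col}), MAX({col}) FROM {table}").fetchone()
            entries.append({"table": table, "column": col,
                            "type": col_type, "min": lo, "max": hi})
    return entries

for entry in harvest_dictionary(conn):
    print(entry)
```

The point of the sketch is only that the structural facts and the profiling facts land in one place, keyed by table and column, which is exactly the shape of a data dictionary entry.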
I think one of the things that we've seen in the past is that when somebody goes into a new role, a new job, they ask: well, what do I do? Increasingly, they're asking: what data do I work with? I need to understand it. So these three capabilities are providing the answers that are needed to understand the data that each of us as business people now has to work with. Next slide, please. So data catalogs, which I think are emerging as the primary vehicle today, need content. Okay, and this is something that you have to think very carefully about. You can put out a data catalog just as a capability and say, hey folks, here's our bright new shiny data catalog, please start putting information into it. Well, that's not necessarily going to work too well, because you're asking people to, you know, take time and effort to put the metadata into the data catalog, but there's not enough content yet for them to get any value out of it. That's not going to be received very well, and I think this is something we have to think very carefully about in the data profession: okay, we have tools and they have great capabilities, but if it's just a capability and it's devoid of a lot of what makes it useful, the content, then we're likely to have a problem. So as you can see from this graph, each of us in our enterprises needs to understand the minimum level of content, the minimum viable content, needed for business adoption, past which the business says: aha, this may not have everything, but it's got a lot of stuff that's very valuable to me, and I can use it. And now I'm going to get value out of the data catalog, and that in turn gives value to our data management and data governance programs. The way to do that is to have an initial provisioning stage, which is done with automation, because it can really only be done with automation; we'll see a bit more of that later too. 
So this is something that I really want to emphasize: please think not just in terms of capabilities, but in terms of level of content, minimum viable content, too. Next slide, please. So, a little bit more about data universes. We've talked about data definitions, and people say, yeah, I know what an employee is, I've got the definition. But is that all that you would want to know? So here we have three databases: a global combined employee database, a Canadian employee database, and a US employee database, and there is some sort of data movement and integration between them. The definition of employee may well be the same for all three of these databases. So they're all "employee" from a data definition perspective, but the data universe, the population of things they contain, is different. The Canadian employee database only contains Canadian employees, contractors, interns, directors, you know, members of the board; the US employee database, same thing, but just for the US. The global combined database has everything in it. So populations, what the data contains, are never going to be given to you by a definition alone. The definition explains the concept. It doesn't explain the extension, the coverage, the universe, what the populations are. How are we going to get that information? That has to be in data dictionaries and/or data catalogs too. And as you can kind of see here, we can infer it from lineage sources. So we're already starting to see lineage playing a role in helping us collect metadata to improve our understanding of data assets. That's important, because data universes are very often overlooked; again, something I really encourage everybody to think about in their day-to-day work. Next slide, please. Okay, so what we are seeing, as I mentioned earlier, as a trend today is a move towards consolidation, and the data catalog appears currently to be the point at which this consolidation of metadata is happening. 
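The data-universe point above can be sketched in a few lines. Here each database's population is modeled as a plain Python set, with all segment names invented for illustration; in practice you would infer these populations from lineage sources rather than hand-code them.

```python
# Each database's "universe" (the population it covers), modeled as a set
# of (country, worker_type) segments. All names are invented; the point is
# that the definition of "employee record" is identical across all three.
canadian_db = {("CA", "employee"), ("CA", "contractor"), ("CA", "intern")}
us_db = {("US", "employee"), ("US", "contractor"), ("US", "intern")}

# The global combined database is fed from both sources, so lineage lets
# us infer its universe as the union of its inputs.
global_db = canadian_db | us_db

# Same definition, different coverage:
assert canadian_db.issubset(global_db)
assert us_db.issubset(global_db)
assert canadian_db.isdisjoint(us_db)
print(f"global universe covers {len(global_db)} segments")
```

The set algebra is trivial, which is the point: once universes are recorded as metadata, comparing coverage between data assets becomes mechanical instead of a matter of tribal knowledge.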
And it turns out, and I think you can kind of see this on what is a bit of a busy slide, that the whole, the product in the data catalog, is more than the sum of the individual parts. So here's how it might work. We have a business glossary, and it has business terms in it, and the business terms will have definitions, synonyms will be identified, and so on. We have a database, which we can understand in terms of the structural metadata that we're going to harvest from it, which would traditionally be held in a data dictionary. And then there's something we haven't really discussed yet: reports. Reports are out there, and reports would be seen today in, you know, a data catalog, and there are ways of capturing what happens in reports. Well, in reports we see report labels, report headings, and the actual fields that are populated into a report, and there's a linkage between those fields and the databases from which they come. So we may have a column that's just called CL in our database, but "customer last name" in the reports. Now, what this is solving here, which is very important to note and may often be overlooked, is the traditional, tremendous chasm in metadata management, which was to say: okay, I have my business terms, and I've got a sea of those; and I have my database columns, and I've got another ocean of those. How do I relate the two? Am I going to have to have people do this manually? That's not going to be easy to do at all, unless you want to spend vast amounts of money, and by the time you've got through it, everything will have changed all over again. 
So that's impractical. But you can see that there is data catalog functionality that's going to unite all of these different items of metadata and push them into this consolidated view, where we've now got business terms, reports, and database structural information all together in one place. That provides tremendous value in terms of metadata management. I mean, we have to manage these highly complex, gigantic environments of information that we have today. You would not, for instance, run oil machinery without controls that measured fluid levels, temperatures, and pressures; you'd have them in a control room, and everything would be monitored. We need to do the same and more for our metadata environments: not just operational monitoring, but getting this additional value unlocked from the data that we're dealing with. So here are our problems; I've kind of hinted at them already. How do you collect the metadata? Difficult. How does all the metadata get related; how do you establish relationships among it? That's a key question. How do you keep it updated? We've hinted at that one too. So let's have a look at one solution. Well, you're going to go to automation, because you cannot do this manually; the scale and complexity of data ecosystems is just too large for human effort. I have seen it done manually. It's an enormously expensive, time-consuming effort that gets lots of people angry and doesn't necessarily yield great results, just to get everything documented before you even establish the relationships among it, and then how do you find the relationships? So this is a really difficult nut to crack, but let's take a look at the next slide and see how you might be able to do it. Well, at a very high level, data lineage can do this; it can help you. When you think about data lineage in its full extent, you're starting with something in a database. 
It flows through processes, and we should come back to talk about processes in a minute. Yes, those are ETL processes, technical processes, or they could be SQL scripts or something else, I get it. And then the data maybe goes to another database, and another hop, skip, and jump through other processes, eventually, we hope, ending up in reports. It's very interesting: we find columns that just dead-end, and nobody knows what they're used for. But for the data ending up in reports, as we've seen earlier, that's where we can make the link between a database column and a report label. So we understand: aha, this report label is the business term that goes along with this database column. And then I can go backwards through that chain of data lineage and say, oh, all these things are the same column, because they are all feeding through, without transformations, without changes, all the way back to the original source, the book of record. And it also provides other things that are important, such as the processes. The business processes today are really represented by data flows, which get implemented by things that may seem trivial to business people but are not, rather overly technical: SQL scripts and ETL processes. But those represent business processes. So if you want to do business process reengineering, and you want to know what processes you have, a good deal of that information lies in the process steps in the data lineage. Think about GDPR, for instance: you need to have a process register for what you're doing with personal information. This can help you. So we're starting to see that we have the automation to gather this information. We have the capacity to populate the business glossary, the data catalog, and the data dictionary. 
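The backward walk through lineage just described can be sketched as a simple graph traversal. The lineage graph and all node names here are hypothetical, and a real lineage graph would also record the transformation applied on each hop; this sketch assumes untransformed pass-through, which is exactly the case where a report label and its ultimate source column can share one business term.

```python
# A toy lineage graph: each downstream column maps to the upstream
# columns that feed it. All node names are hypothetical.
feeds_from = {
    "report.customer_last_name": ["mart.dim_customer.cl"],
    "mart.dim_customer.cl": ["staging.cust.cl"],
    "staging.cust.cl": ["crm.customers.cl"],  # crm is the book of record
}

def trace_to_source(node, graph):
    """Walk lineage backwards from a report field to its original sources."""
    upstream = graph.get(node)
    if not upstream:          # no recorded feeders: an original source
        return [node]
    sources = []
    for parent in upstream:
        sources.extend(trace_to_source(parent, graph))
    return sources

print(trace_to_source("report.customer_last_name", feeds_from))
# → ['crm.customers.cl']
```

With untransformed hops, the report label "customer last name" and the column crm.customers.cl can be linked to the same business term automatically, which is the chasm-bridging step described above.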
And because the lineage has these relationships, and yes, I know they're flow relationships, but they're also logical relationships, because a data element populating a like data element is either the same or transformed in some way, we're making the relationships too. So that gives us this consolidated, integrated, whole view of metadata, which is what we want to see, and the data catalog is increasingly the place where we can get it. We get the terminology and the semantics out of business glossary kinds of functionality. We get the structural metadata from data dictionary kinds of functionality. But it's all becoming manifested in the data catalog as the one-stop shop for everybody now, and everybody is going to include the citizen data scientists, the citizen developers, the citizen analysts who work with the data. So that's why this new paradigm is so important: it's all coming together. So data lineage can harvest metadata and build the relationships among it. Data traceability: this is something else I just want to bring out, which is another major reason that we need data catalogs, because, as we're all probably aware, traceability for impact assessment is very much needed. What that means is: if I change something upstream, what is it going to change downstream? And, you know, that's important. And going the other way, sort of traditional lineage: something broke; what's feeding into it that could have broken? But I want to point out that data traceability is becoming more of a general data governance requirement; it's beyond the realm of the technical folks, who, you know, are very important, don't get me wrong. There are things out there such as BCBS 239, from the Basel Committee on Banking Supervision, which says: look, guys, you are doing things like reporting on risk or reporting on capital adequacy ratios; you have to prove that the data that you're reporting actually came from operational systems. 
So, without changes, without people making manual changes to it, and so on. We've got to see the flows from the operational systems to the risk reports, and that's got to be proven. Okay, so you see that traceability itself is going beyond the realm of the more technical environments into very real business needs in many enterprises. So that's something else that you will see increasingly overlaid on the story of the business glossary, data dictionary, and data catalog. Next slide, please. So, in conclusion: the business glossary, data dictionary, and data catalog have somewhat different foci in terms of the metadata they manage, and that's been very traditional, but there are relationships among them, and the business glossary gives us meaning. Automation is going to be needed to harvest, in particular, technical metadata of the kind we see in data dictionaries and data catalogs. Data lineage is a great way to do that. And it also helps us to create trust in the data, because of this full traceability. That's a big deal; I know people will talk about data quality being important for trust, but traceability is too. The data catalog then becomes the place where all this information is integrated, and it's kind of the one-stop shop to understand and collaborate about data. So, that's a very brief overview of this very complex topic. I hope that helped, and with that I'll send it over to Amichai. Great. Thank you, Malcolm, for that really in-depth comparison. Thanks again, everyone, for joining. As you mentioned, Malcolm, data teams face major challenges. These challenges include the lack of visibility and control of data, and, of course, lots of knowledge that is just scattered throughout the entire data ecosystem. The causes of these challenges include the ever-growing amount and diversity of data and tasks that the team is faced with, alongside growing demand from the business as it becomes more and more data driven, as you mentioned earlier. 
And, you know, that's not only to make decisions based on accurate data, but also to incorporate data within the company's offerings, such as in-product recommendations, and so on. To be able to meet these challenges, the company expects users to be more self-sufficient. In most cases, though, without proper tools and processes in place, data citizens are not truly independent in using the data. There's no, you know, ultimate single source of truth about the data. There's a tremendous loss of tribal knowledge, which is all that knowledge that the different subject matter experts share in undocumented ways, or at least not widely accessible ways. We'll take a look soon at how we address these in Octopai, but these are all points that you should keep in mind when you're evaluating any data literacy platform: make sure that it alleviates these challenges. So I'm sure that this is, you know, familiar to many of you, if not all. This is what we refer to as the data hunt. Okay. So the business reaches out to the data team asking about data, and then there's a whole, you know, undocumented loop of communication and collaboration that prevents the data users from quickly and accurately using the data independently. Right. So, this process wastes a lot of time for data team members. What's worse is that this process basically repeats itself every so often, and for the same data, and we all go through this exercise again. This is what our customers have shared as the main drivers for implementing Octopai's data catalog. It's easy to see that successfully adopting a catalog is a win-win for all data citizens, technical and business users. Everyone can easily see what's in it for them, right, and everyone gets time back to do the job they were hired to do, instead of hunting for data or explaining data redundantly, depending on which side of the data you're on, of course: consuming and using it, or creating and maintaining it. 
Some may say the only way, and you touched on this, Malcolm, to truly achieve data literacy is by leveraging automation. Without automation, most attempts fail. By the time you're done manually centralizing an inventory, it's pretty much stale. And because that process is just so time consuming, keeping the inventory up to date without automation is almost impossible. Octopai automates data discovery, which describes where data is used, and data lineage, which describes how data flows through the different systems: what are the sources of the data, what happened to it. Lineage most commonly serves use cases such as root cause and impact analysis. We'll take a brief look at that if we've got some time, but today we will dive deeply into the data catalog. So, let me go ahead and share a demo. Okay, so this here is what Octopai's data catalog looks like; basically, it's the one-stop shop for the data. Let's start out by just briefly running through the different layers of data assets that Octopai automatically harvests. So we harvest assets automatically from the different reporting tools, different databases, and different ETL systems; these are of course just samples of the different technologies that we support automation for. The reason I'm showing this to you is that it relates back to what you were describing before, Malcolm: we have different types of users. They're all going to end up collaborating in one catalog instead of in siloed systems, instead of having technical users using a data dictionary and a business glossary for the business users. Maintaining all these many different tools has really not proven effective in the past few years, so we're going to want everyone to work in the same place. But we want to help each type of user focus on the type of assets that's relevant to them. So different users can set this up to see the types of assets that are relevant to them. Let me give you an idea of what that looks like. 
Take a business user specifically: I'm interested mostly in reporting tools, and specifically the presentation layers of reporting tools. That's where those final columns and KPIs that show up on reports are. I'm also interested in the actual reports. The business user is probably also interested in the semantic layer, where all the logic is, as well as the physical layer, which is how it relates back to the databases. So you can see that every user has their own types of assets and layers of assets that they would be interested in using. And that enables focusing exactly on the type of asset that you're interested in, which is super important, because in an average catalog, to give you guys an idea, there are going to be around a million assets. One million assets means you've got to have good capabilities to focus on the assets you're interested in, to be able to search through them and filter through them, and we're going to go through that in a moment. And understand that this entire inventory has been created automatically from our entire ecosystem: from the ETLs, from the reports, from the databases. So, let's run through this with a use case. Say we've got a business user who's interested in a sales report, and he wants to know which report would match his needs. What he would do is use the filter over here, which is the same as in any marketplace, to filter things down and say, hey, I'm only looking for reports at the moment that have been tagged as sales, and click on apply. We can also add the term summary, for instance, and say: here we go, here's the "order by sales rep summary." Great, that's what I was looking for; I'm looking for a sales report that is summarized. Let me go ahead and click on this and see the different definitions for this report. By clicking on it, what the user sees right away is all the different tags that this report has been associated with. 
This Power BI report has been associated with sales, Salesforce, and the GDPR-specific project. It's been associated with PII. It's been associated with orders. And we can see the rating over here, which is 4.5, rated by two different users; by clicking on it, you can actually see who's been using and rating it. Typically, users with high engagement are ones he may want to collaborate with about this data, and we'll show you how you collaborate within Octopai in a few moments. He can also, of course, rate it through this functionality as well. Next, you can see the status is approved, so this asset has been approved for use; by the way, that's why it's got this badge over here, to make it easy for him to select these assets from the list. And it's flagged as sensitive. All of this gives the business user an idea of whether this is the type of report that he should be looking into further. Okay. Next, you can scroll down and see the different descriptions that were provided for this report. So there's this long description over here: the yearly sales report, containing detailed sales information by sale, a long description, right. And this technical description over here is shorter: yearly sales for last year at account level. We can see here an origin description; for tools that support origin descriptions within the actual tool, Octopai automatically harvests them and shows them right over here. We can see the calculation description. Since this is a report, we've entered here the filter condition: all data filtered for email only. Again, if it's a logical data asset and it's already got a calculation in it, for semantic layers or for reports, for instance, the origin calculation will already show up over here. We can see the asset type, a report; we can see the data type when it's relevant; and we can see the path to it and the source system that's been documented for it. 
We can see two really important roles for this data asset: the data owner, responsible for the business aspects of this asset and the business definitions, and the data steward, responsible for the technical aspects of this asset and its correctness. We can see who updated it and when, and so on. Next, down here, we can see all the assets that have been linked to this report. Since this is a report, Octopai automatically links the assets that come from it: the different KPIs and different columns and so on. We'll take a look at this in a moment. We can also add additional links here, right within Octopai; say you want to relate this report to some type of project, which is also an asset in Octopai: click on add, and add the specific project here to the linked assets for the report. Now let's say, as a business user, this report really seems to answer my needs, but I still have some questions about the data, right? I'm sure that all of you are familiar with that. In Octopai, we've got built-in collaboration that allows your users to collaborate within the platform. Let's take a look at this example. We've got Holly Miller over here, who reached out to Sophia, right here; she's the data owner: does this report represent fiscal or calendar year? Right, she's got additional questions about this report. We can see Sophia has mentioned Holly over here, replying that the report uses fiscal year. By mentioning each other, they each get a notification with a link to continue collaborating about the data right here within the catalog. What's great about that is not only that everyone is collaborating in one place, where everyone knows exactly what they're talking about and everyone's on the same page; it's also that this is tribal knowledge that otherwise gets lost. This is ten other users that would end up reaching out to Sophia if they didn't have this available to them, asking the same question. 
And Sophia replying to each of them separately, maybe even having to check it separately each time. And then what happens when Sophia gets promoted to a different role, and she was the subject matter expert? Who knows this information now, and where does anyone look it up? By documenting this here in context, and giving the option to collaborate about it in context, you get huge benefits: you preserve all that terminology and all of those discussions, and you really capture that tribal knowledge. Let's reach back down to these linked assets. Assuming Holly wants to continue investigating this report and feels it's a good fit for her needs, now she wants to see what it includes. She reaches down here to the linked assets and sees, for example, the asset Total Due Sum. By clicking on it, she goes to look at the details for Total Due Sum, a column within that Power BI report. Of course, it's linked to the actual report; it's also linked to the semantic layer where the logic for this column lives. And all the same attributes we just spoke about exist here as well, including the tags, rating, and so on. Now she's got additional questions, so she reaches out to Jeff Smith this time, asking: it looks like the sum is rounded; can you let me know if the amount is rounded up or down? The numbers aren't matching up with other reports. What this means is that Jeff now gets notified to answer right here within the catalog. He probably needs to check this out, and the way to check it out is traceability. In Octopi we have lineage built in and integrated into the catalog. So what he can do in this case is click on the three dots over here, simply click on the end-to-end column lineage, then go to the column-level lineage for the Total Due Sum in this report, and trace the data flow all the way back through the different database objects and the different ETLs. 
Over here through another database object, through additional ETLs, all the way back to its original source over here. This is completely technology agnostic: you might be connecting databases of different types, maybe Oracle and SQL Server and Snowflake, all in the same lineage, and different types of ETL tools, maybe both SSIS and ADF for instance. That's all the same to Octopi; we bring everything into one unified view, one visualization where you see everything at one level. The Total Due is ultimately coming from TotalDue in SalesOrderHeader in AdventureWorks in this demo environment. But let's say that's not enough. Jeff, we said, is the technical user, the steward, and he wants to see the logic within this specific data flow, this SSIS data flow. So he can click over here and say: take me to the inner-system lineage, to visualize the entire column-level lineage for this process and see the logic for it. So I'm going to go ahead and click on the inner-system lineage. What he sees over here is the column-level mapping of the entire package, showing how the data gets from any column all the way to its target, through all the different components and transformations within this data flow, to the final destination in this SSIS package. Once he's here, let's say he wants to see what else is using this table, DWH Fact Sales. Click on it, and you can see it's DWH fact sales in schema dbo, in the database E2E DWH sales; this is the component name it was given here within the ETL. Let me go ahead and close that. And by clicking on the three dots over here, you can say: okay, let me see the lineage for this entire table as a whole, not at column level this time. Click on the lineage object and see how this table is populated by these two different ETLs, again from completely different systems. 
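Conceptually, the upstream trace Jeff is doing can be sketched as a walk over a column-level dependency graph. This is a toy model under stated assumptions: the node names (including the staging step) are made up for illustration, and this is not Octopi's actual implementation:

```python
from collections import defaultdict

# Hypothetical column-to-column data flow edges (source -> target).
edges = [
    ("SalesOrderHeader.TotalDue", "staging.Orders.TotalDue"),
    ("staging.Orders.TotalDue", "DWH.FactSales.TotalDue"),
    ("DWH.FactSales.TotalDue", "Report.TotalDueSum"),
]

downstream = defaultdict(set)  # for impact analysis
upstream = defaultdict(set)    # for lineage
for src, tgt in edges:
    downstream[src].add(tgt)
    upstream[tgt].add(src)

def walk(graph, start):
    """Collect every node reachable from `start` by following `graph` edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

# Lineage: stand at the report column and look upstream to the origins.
lineage = walk(upstream, "Report.TotalDueSum")
# Impact: stand at a source column and see what a change affects downstream.
impact = walk(downstream, "SalesOrderHeader.TotalDue")
```

The same graph answers both questions; only the direction of the walk changes, which is why a lineage tool can also drive impact analysis.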
The table is also being used in these analytic models, in OLAP and Tabular. It's also being used in these different views and in this stored procedure, and this view is actually being used for these reports over here. By the way, this all ties back together, as you've probably guessed by now. If I click on this view, I can say: let me see the definitions for this view. Click on it, click on the ADC view, the automated data catalog, and now we're looking at all the different definitions for this view in SQL Server. So it's easy to see how all the different types of users collaborate in this one space, which answers all these different types of needs, whether they're more technical or more business oriented. Let me go ahead and share my slide again. Okay, so basically, as you can see, the data catalog creates independence in using data while preserving that tribal knowledge through collaboration, as well as traceability through the lineage. This interactive catalog will enable data citizens to independently answer questions such as: Where should I look for my data? Does this data matter? What does this data represent? Is this data relevant and important? How can I use this data? And I could go on and on. That is where the true value is: adopting a collaborative data catalog is the ultimate enabler of any data-driven organization. And I think we're ready for Q&A. Amakai and Malcolm, thank you so much for this great presentation, it has been fabulous. We've got a lot of questions coming in, so let's answer the most commonly asked ones. Just a reminder, I will send a follow-up email by end of day Thursday for this webinar with links to the slides and the recording, along with anything else requested. So diving in here: what is the difference between lineage and traceability? I brought that up, so I'll give you my definition. 
Lineage is standing at the far end of a data flow, like in a report, and saying: ooh, where did this data come from? and trying to look upstream. Impact is standing upstream somewhere and saying: I'm going to change something, I wonder what it can affect downstream from where I am. I think that's my contribution to the topic. Certainly, I agree, that's a really good definition. I think you can also look at lineage as part of the traceability of the data. As Malcolm mentioned, the lineage will provide you with that data flow and an understanding of exactly where the data originates. Those are the traditional aspects of traceability, and lineage is going to be the backbone of it. Awesome. So which of the three uses apply to data privacy professionals? I think that was part of your section, Malcolm. So the data catalog, the business glossary, and the data dictionary are all going to be important for data privacy professionals. You can think about the data catalog at the data-set level: you would want to know the points at which data is given to service providers, and which data sets we give to service providers that might include personal information, because you're going to have to pass data subject access requests on to them. That's an example of the data catalog. The data dictionary is going to be: what are our actual data elements that contain personal information, and where are they? And then the business glossary would be: I have a business term called, I don't know, previous employer's name. 
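Malcolm's three levels could be sketched as three kinds of records describing the same piece of personal data. A minimal illustration only; all field names, table names, and values below are assumptions made up for the example, not any real schema:

```python
# Data catalog: data-set level. Which data sets go to service providers,
# and do they include personal information? (all values hypothetical)
catalog_entry = {
    "dataset": "hr_monthly_extract",
    "shared_with": ["PayrollVendorCo"],
    "contains_personal_info": True,
}

# Data dictionary: element level. Which actual data elements hold
# personal information, and where do they physically live?
dictionary_entries = [
    {"table": "hr.employees", "column": "prev_employer_name", "pii": True},
    {"table": "hr.applications", "column": "prev_employer_name", "pii": True},
]

# Business glossary: term level. The business meaning, plus the category
# the term rolls up to for disclosure purposes.
glossary_term = {
    "term": "Previous Employer's Name",
    "category": "Employment History",
    "physical_locations": [
        f"{e['table']}.{e['column']}" for e in dictionary_entries
    ],
}
```

The same business term can sit in several tables, which is exactly why one glossary entry points at multiple dictionary entries.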
Okay, and that might be in two or three tables in a human resources database, but previous employer's name is sub-categorized as employment history, and employment history, if I remember correctly (I'm sure people will correct me if I'm not), is one of those categories you have to disclose to people whose personal information you hold under the CCPA, or the CPRA as it will shortly be. So you can see there are different uses for these three capabilities for data privacy professionals; they're all used in some way, albeit different ways. Okay, anything you want to add there? I think that described it really well. Thanks. Awesome. So, a term that gets thrown around: data domains, linking to specific lines of business. Where do those fit? I find there are some aspects from both a business and a technical perspective, so is it in the data catalog or the business glossary? I'll take that one. I think that's a really good question, and I believe the answer is that everyone needs to look at the same system. The last place you want to be is one where you're maintaining different systems for different types of users; maintaining one is difficult enough. So the catalog ultimately should be the place the technical users reach out to, the business users can have their answers there as well, and then everyone can collaborate in that one place to answer all those different use cases. I always think that the word domain is the most overused word in data management and data governance. So could you repeat the question for me, please? Yeah, sure. A term thrown around: data domains, linking to specific lines of business. Where do those fit? I find there are some aspects from a business and technical perspective, so is it in the data catalog or the business glossary? Well, it depends what you mean by data domain. Some people think it's like reference data, lists of valid values; others will say it's like subject areas. 
If it's subject areas, then probably in something like a business glossary. But no, actually, maybe I take that back: probably in the data catalog. So anyway, those are my thoughts on it, Shannon. I love it. So many great questions coming in; I'm just trying to move rapidly through them here. Is the consolidated view of the data catalog a metamodel, because it has a detailed row for each instance of first, middle, and last name? I'll go for it. I mean, by definition, a product like Octopi is dealing with metadata and housing it in a structured way, so it has a metamodel. Anything that is going to house metadata has to have a metamodel, because we would define a metamodel as the data model for metadata. So I think that would answer the question: yes, you do need a metamodel. Anything you want to add to that? Yeah, I think that when we move to thinking in terms of catalogs, we stop looking at the technical aspects of what's going on behind the scenes, and we start speaking in terms of the value and the different types of users and use cases they can serve. Different catalogs use different frameworks, but everything we showed today, being able to provide the definitions, the traceability, the collaboration, those are all things you really should be looking for. You know, we kind of anticipated this: there are quite a few questions on what Octopi connects to. Is there a good resource for that kind of thing? Certainly, you can find them on our website, octopi.com. You'll see all the different supported systems; those are the systems we support for automated harvesting of the different assets. Octopi can also ingest assets. 
Externally ingested assets are supported as well, so both of those options are available; you can see them on the website, and you're welcome to reach out, of course, with any additional questions. Yeah, perfect. With automation, is manually added metadata preserved or overwritten? And what about with product updates; how are custom attributes, for example manually added metadata, preserved? Perfect. So yes, anything manual is of course preserved, and everything automated gets overwritten, in a way. Meaning, if you've got some type of origin calculation, as in the example I gave before, and then we've got a description for that calculation: the description that's been provided manually in Octopi gets preserved, of course, while the actual calculation, if it changes in the actual metadata, gets updated automatically. So, do we need to model the reference data in, say, Power BI before we upload the sheet into the data catalog tool? Not at all; Octopi does that entire process as part of the automation. Awesome, I love it. And can I do data lineage for Python ETL code? Data lineage is normally SQL based, but nowadays all cloud ETL happens using Python data frames. Do you have a solution for this? We'll automatically create the lineage for the SQL and for all the different ETL tools that we support, which are many. Python code is not supported with the automation, but lineage for it can be injected to reflect it with the same visualization and to enrich the already automated lineage you will have from the rest of the ecosystem. So how does the tool manage multilingual definitions? We are just completing the steps of adding additional customizable attributes, which will support exactly that use case. So many great tool questions coming in. In maintaining the data lineage up to date, what steps are performed automatically and what steps need to be performed manually? 
Great. So extracting the metadata, analyzing it, all the machine learning: that happens automatically on our end, so there's really no effort for all the different tools that we support automation for; none of that is manual, it's all automated. As I mentioned before, if you would like to enrich that lineage that's already been created, that's possible to do manually, through the UI or through dedicated APIs and so on. Okay, I'm going to try and squeeze in at least one more question here, and I'll get any questions we don't have time for over to Octopi. So, does it make sense to store reports in the catalog when the reports are just views of some database? At the end of the day, losing a report is a small loss, no real value, but losing data is a big loss. Oh, that's a really good question. So, yes: an asset that's important to one role is not necessarily the asset that's important to another role. The catalog serves so many different types of users, so many different types of data citizens, that you're going to want any data asset that needs some type of description, and that needs to be associated with other terms, documented in the catalog, so that it really is the one source for all the different use cases and all the different types of data assets, regardless of the importance of that particular data to a specific individual. Well, Amakai, thank you so much, and Malcolm, thank you so much as always; another great presentation. But I'm afraid that is all the time we have for today. Again, I will get the questions we didn't have time for over to Octopi. I will send a follow-up email by end of day Thursday with links to the slides, the recording, and the additional information you've been asking for. Thanks so much to all of our attendees for being so engaged in everything we do; as always, another great webinar with you all, and I hope you all have a great day. Thanks so much, everybody. 
And thank you. Thanks a lot everyone.