Hello and welcome, my name is Shannon Kemp and I'm the Chief Digital Manager at DataVersity. We'd like to thank you for joining this DataVersity webinar, Noise to Signal: The Biggest Problem in Data, sponsored today by Alation. Just a couple of points to get us started. Due to the large number of people attending these sessions, you will be muted during the webinar. For questions, we will be collecting them via the Q&A panel in the bottom right-hand corner of your screen. Or if you'd like to tweet, we encourage you to share highlights or questions via Twitter using hashtag DataVersity. As always, we will send a follow-up email within two business days containing links to the slides, the recording of the session, and additional information requested throughout the webinar.

Now let me introduce our speaker for today, Stephanie McReynolds. With over 15 years of data infrastructure and application experience, Stephanie has a track record of bringing new technologies to market and into the hands of business analysts. Stephanie is currently the Vice President of Marketing at Alation. Prior to Alation, Stephanie was instrumental in building the first marketing team at the self-service data preparation provider Trifacta. She previously held senior product management positions at a number of companies, including Teradata, Aster Data, and Oracle. Stephanie earned both her bachelor's and master's degrees from Stanford University. We're glad to have her here. And with that, let me give the floor to Stephanie to get today's webinar started.

Hello and welcome. Thank you, Shannon. It's great to be here today. I would like to start with a little bit of commentary on noise to signal. I am a mother of two sons, and so I found this cartoon pretty humorous, actually pretty reflective of my day-to-day life. The reason I bring it up in the context of data is that our ability to produce, ingest, and store data has grown exponentially over the last seven to ten years, while our ability to parse out insights from that data has not. And you have to start to ask, why is that the case? I often wonder whether our business insights are more like the tinkle of an ice cream truck heard from miles away, or the commands a parent gives their children to do homework. Sometimes I'm not quite sure. I know I have a preference there, but you can't manage what you don't measure, and signals are often heard by those who are looking to find them.

So we'll get serious now and get into the content of today's webinar, but I thought that was a little bit of perspective our attendees might appreciate. What are we going to discuss today around noise to signal in your data pipeline? We'll take a little bit of a look, just to level-set everyone, at where noise is coming into the data pipeline. Sometimes, as daily practitioners, it's hard to remember how much has changed over the last several years in data processing as well as in our access to analytics. So we'll spend a little bit of time there. Then, and I think this might be a little more interesting to the audience, we're going to look at some examples from major companies who have struggled with this noise-in-the-data-pipeline issue. Specifically, we'll look at Pfizer, Munich Re, and eBay. I tried to pick three customers that are representative of a wide range of industries and a wide range of types of organizations, so hopefully there's something in there that sticks for our attendees.
And then we'll dig into the next level of the problem: why is this problem so pervasive across all industries, for all customers? And finally, we'll talk a little bit about what to look for in technology to help, so that you can identify whether a data catalog might be something that could help your organization get more value out of analytics.

So what is the problem statement? When I say there's noise in our data pipelines, what do I mean? The noise-to-signal ratio is an analogy we're using to highlight what's represented by these statistics. Most organizations over the last couple of years have put major big data or self-service analytics initiatives forward, and in fact have made a strategic commitment to using data and analytics to make more accurate, more competitive, more successful decisions. We put a label on that: we're all trying to become data-driven organizations. And yet if you look at what's happened over the last couple of years, very few organizations claim to be completely successful in becoming data-driven. The missing gap seems to be the ability to connect the analytics to management action in the organization. This noise-to-signal problem would indicate that one of the challenges in actually making decisions from analytics is that there's too much production happening in our organizations. There's too much growth in data, too many different types of analytics that we can run against that influx of data. Our data supply chains are overproducing. No matter what level you look at, there has been an overproduction of data assets.

I often think about this as a production line. When I was a kid, on long summer afternoons, of course, I spent some time watching television. My parents might not appreciate me sharing how many hours I watched, but if anyone remembers this episode, I often think it encapsulates what self-service analytics feels like in some organizations. There are all of these data points coming your way, and you're just trying to catch up with them. If you're like Lucy, maybe you're doing that in a pretty elegant way, finding new places to put data; but if you're Ethel, you've had enough. You're just done.

That's a fun analogy, but I think recent data catalog surveys confirm it in detail. This is pulled from a report that Howard Dresner and Bill Hostmann recently did for Dresner Advisory Services. The information was collected through a survey-based study reaching out to many organizations like yours, asking respondents how they locate relevant content for their analysis. That could be the data itself, or it could be the metadata, the descriptions of that data, whether those are technical descriptions or business semantics. And 47% of respondents indicated that they have difficulty just getting started in locating and accessing the relevant content to begin their analysis, to begin to make decisions.

So there are plenty of examples where enterprises are struggling with the influx of data, and with the influx of data assets as well. What's the challenge? There are too many raw sources, whether databases or Hadoop instances, and too many derivative sources like reports and dashboards and Excel files. There are simply too many sources to be able to find data in the organization easily.
There are also too many processing engines and methodologies used to transform data to really understand what nuanced trade-off decisions are being made by the individuals who prepare that data for analysis. Today, we not only have SQL as a language of choice for data transformation, but we have ETL and ELT tools, and Presto and Spark, and click-based self-service data prep tools. There's logic instantiated in each one of those tools as data is manipulated or transformed, and that logic actually has an impact on the accuracy, or perceived accuracy, of the output analytic. So that's a challenge in today's environment.

And at the end of the day, there are often too many sources of reports and dashboards from individuals. Self-service analytics has been great for getting everyone hands-on with data in the organization, but how do you really know who the more trusted individuals are, or which are the more trusted sources of data, to make a decision from? I work with a lot of heads of different departments, whether those are marketing departments or finance departments or supply chain organizations. What they tell me is that it is not uncommon for someone at the vice president or C level of an organization to ask two analysts to go run some analysis and get two distinctly different answers back. How, as a decision maker, do you know what's the right answer? Who did the right analysis? Often it's not really a question of what is right, but of the trade-offs in methods and methodologies that have been used. And it's hard to understand that, and to trust one of those analysts, if they haven't shown their work, if that work isn't transparent and I don't know how to validate those results.

Some organizations are investing in chief data officers and the office of the CDO to begin to peel this back. That is a trend we see rising in the market, and a positive one. The definition of the role of the CDO, if you look at some of the survey data, is also changing: they're no longer just the compliance master of the organization, but are now also tasked with sharing best practices for analytics throughout the organization and starting data literacy initiatives, so that we create a baseline of knowledge and education on how to translate analytics back into decision making. These chief data officers are being asked to really focus on driving change. The survey I showed here was a Forrester survey on how investments in chief data officers are growing across all industries.

This slide represents how Gartner is viewing the world: Gartner clients now indicate that their organizations increasingly view information as an asset. Doug Laney at Gartner has been promoting infonomics as a way to calculate the impact of information as an asset. That work has led organizations to focus on how you measure data as an asset, how you begin to manage it as an asset, and then how you think about potentially monetizing data as an asset; there, I think, we're really talking more about monetizing the analysis as a data product. So that's some of the industry research out there about how organizations are starting to change their investments to approach reducing this noise-to-signal problem.
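To make that transformation-logic point concrete before we move to the examples, here's a minimal, hypothetical sketch in Python: the same "average deal size" computed by a SQL engine, which silently ignores NULLs, and by a naive script that fills missing values with zero. The table and numbers are invented for illustration; only the divergence pattern matters.

```python
import sqlite3

# Hypothetical illustration: the same metric computed two ways.
# SQL's AVG() ignores NULLs; a naive script that treats missing
# values as zero reports a different number from the same raw data.
rows = [("acme", 100.0), ("globex", None), ("initech", 200.0)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE deals (account TEXT, amount REAL)")
conn.executemany("INSERT INTO deals VALUES (?, ?)", rows)

(sql_avg,) = conn.execute("SELECT AVG(amount) FROM deals").fetchone()
script_avg = sum(amount or 0.0 for _, amount in rows) / len(rows)

print(f"SQL AVG (NULLs ignored): {sql_avg}")     # 150.0
print(f"Script avg (NULLs as 0): {script_avg}")  # 100.0
```

Neither number is wrong; each tool made a defensible default choice. Without transparency into that choice, a decision maker cannot reconcile the two answers.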
I want to get real and talk from the perspective of a couple of customers, because this is really happening in every company and every industry, and I think it's only through the use cases that we can really understand how you might put best practices in place to overcome this noise-to-signal problem. So I've pulled three examples from three organizations that I've worked with closely.

One is a financial services organization that is embracing Hadoop, and the freedom that environment gives individuals to query and analyze data. That organization is Munich Re, one of the largest reinsurers in the world, based out of Germany. They started their path to reducing this noise-to-signal problem back in 2015. That was when their chief data officer started building a program at Munich Re focused on data-driven decision making. He started very small, with just 10 employees, trying to think about how they could change how the organization used analytics. As a financial services organization, their foundation was already a pretty data-literate, data-savvy population; they have actuaries who have been involved in the business for many, many years. But they wanted to ensure that it wasn't just the analytic experts who had access to data and analytics, it was every employee. And so they've been on a path: currently there are hundreds of employees using analytics to make decisions on a day-to-day basis, and they're looking to expand that to thousands and tens of thousands in the future. I'll talk in a little bit about what Munich Re put in place to help them get on this growth path.

But I wanted to set up two other companies that you might keep in the back of your mind. Hadoop isn't the only technology organizations are storing data on where they face a challenge of data volume. Companies like Pfizer in the pharmaceuticals space are increasingly moving their analytic environments to the cloud. This is an article from last month in the Wall Street Journal where the team at Pfizer and the Chief Information Officer lay out the journey they embarked on several years ago to build an analytics platform that incorporated AI and machine learning and, more interestingly, to build it out in the cloud so that all of their employees could run analytics. Jeff Keisling, the Chief Information Officer at Pfizer, is quoted in this article saying that data science shouldn't be confined to mathematicians. It shouldn't just be the experts who have access to data; it should be everyone. But in a world with a high noise-to-signal ratio, that means a little bit more confusion, and in a few minutes I'll talk about how Pfizer began to approach that issue so that they could achieve success.

And the third company I'll refer to today is a technology company in the online retail space: eBay. To give you a sense of the infrastructure, eBay has one of the largest Teradata database footprints in the world, and they also have a very large Hadoop footprint. For a technology company like eBay, if you think about how eBay differs from Pfizer and Munich Re, eBay works in a slightly less regulated environment. They have a lot of developers and engineers on staff, and a somewhat different ability to give raw access to data through technical tools to their end users.
But one of the things they really noticed was that as more employees got access to data, data governance became a huge issue for them. Not so much for compliance reasons, but because a lot of individuals were not managing their data in a thoughtful way. So I love this quote from Zoher Karu, who was the chief data officer when I started working with eBay. He says, in effect: any person in the organization, because that's how their culture is, every employee, has access to data. If a random person queries some data, then puts it in Excel, taking it out of a managed environment and into a desktop tool, modifies it, and then puts it into a PowerPoint and ships it around as the truth, that is the biggest sin of data governance. Not a compliance issue; from eBay's perspective, the biggest sin is not having lineage back to the source of the data and a transparent viewpoint into how that data was manipulated. As a result, managers can't actually trust that data for future analysis.

So I think these three examples give you a sense of slightly different cultures and slightly different technical challenges in processing their data. Now I want to tell you their success stories, because in all three of these scenarios, technology helped them start to close the gap between noise and signal in their data pipeline. And not only did the technology help them close the gap, but all three of these organizations were able to use a data catalog to produce business outcomes, to actually get to that point of impactful business decision making. I want to share some of their transformational results as inspiration for what you might be able to do in your own organization.

I'll start with Munich Re. As we mentioned, they're one of the larger reinsurers in the world, and the pressure in their market to stay on top of developments and provide new products is quite high. Their board and their shareholders have expectations for growth of the company. The evolution of larger natural catastrophes, and robotics in our workplaces that opens factory floors up to cyberattacks more than ever before: these global trends have opened up opportunities for new insurance products that might counteract the future risks organizations see, and they provide an avenue for a reinsurer to establish new revenue bases.

Wolfgang Hauner, the chief data officer at Munich Re, has a very interesting perspective on his role. Wolfgang has put analytic infrastructure in place and is managing a data governance program. When Wolfgang was in our offices a few months ago, what he told me was that the most important portion of his job as chief data officer is to ensure that he can support the business in making the right decisions about which new business units to open to drive revenue in these new areas of innovation. To the extent that each investment in a new business unit can be based on an analytic foundation, a new way to gain insights from data, that is of interest to Wolfgang and his team. Their entire data strategy is geared toward figuring out how to provide individuals with the tools to offer new and better risk-related services to their customers. They implemented a data catalog to support that initiative.
So not only could actuaries have access to their Hadoop environment, but analysts, who didn't necessarily have the technical skills to go access raw data themselves, could use the data catalog as their portal to this new storage system for data and for analytic snippets of code. They started with 600 users accessing that data catalog and have built that up into the thousands of users who access data through the catalog and provide input into the evolution of new business units in the organization.

Pfizer is a little bit of a different example, and not just because they're a different type of company. As a pharmaceutical company, you can imagine a lot of the information Pfizer works with is highly regulated: things like physician notes, lab reports, demographic information, and clinical trial results. There are many disparate sources of data, and often many of those sources are stored in file structures rather than databases, so it can be hard to get an aggregate view of that information. One of the important inputs for insights, particularly for drug development, is looking at what they call comorbidities, where there are multiple health issues that may result in a patient's death. Being able to sort through the causes, or the interactions of more than one disease simultaneously in a patient, is important as you're determining the effectiveness of a new drug before taking it to market.

So for Pfizer, access to data in one place is a little less about democratization of data and a little more about finding the most accurate models to potentially identify rare diseases. The data catalog implementation at Pfizer started within their data science team and then was expanded to a larger group of users, and it really helped the teams at Pfizer work on their processes to potentially identify rare heart failures. This was one of the initial projects they took on. The challenge with rare heart failure is that the disease can often go undiagnosed, because the symptoms of a rare heart disease might be very similar to those of more common heart failure. So identifying candidates who might be able to participate in clinical trials, or who might be able to help with a diagnosis, based on all of their records, can feel like finding a needle in a haystack.

What the Pfizer team was able to do, rather than having to scan through all those materials by hand, was this: the data science team worked with the researchers to define machine learning models that could, in an automated way, use the materials and highlight, for human oversight, who might be the best candidates, and then a human could decide who to include in the clinical trials. This ability to share those models, and have machine learning do work that would be impossible for a small group of individuals to do, was made more efficient by being able to register all of this data in a data catalog, so the data scientists could more easily find that data and some of the query code snippets they used to start the machine learning models. So this is an example where a company, Pfizer, has been able to bring breakthrough drugs to market faster by having access to this technology in a data catalog.
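To illustrate the shape of that kind of workflow, here's a minimal sketch, with entirely synthetic data, of a model scoring records and surfacing only the top candidates for human review. This is not Pfizer's actual pipeline; the features, labels, and candidate cutoff are all assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical sketch of the pattern described above: train a model on
# labeled patient records, then surface the highest-scoring candidates
# for human review rather than scanning every record manually.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 4))                        # synthetic encoded record features
y_train = (X_train[:, 0] + X_train[:, 2] > 1).astype(int)  # synthetic diagnosis label

model = LogisticRegression().fit(X_train, y_train)

X_new = rng.normal(size=(1000, 4))         # unreviewed records
scores = model.predict_proba(X_new)[:, 1]  # likelihood of the rare condition

# Hand only the top candidates to clinicians, who make the final call.
top_candidates = np.argsort(scores)[::-1][:20]
print("Flag for human review:", top_candidates)
```

The design point is the division of labor: the machine narrows thousands of records down to a reviewable shortlist, and the human retains the decision.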
And the third example I'll share is eBay. We talked a little bit about eBay's business and their intent to give all of their users access to data, but in a governed way, where best practices are shared and compliance with regulations is maintained. eBay has had a very successful rollout of the data catalog as a foundation for their data governance efforts. What you see in the screenshot here on the left-hand side is their business glossary, and it's a living business glossary: there's a group of data stewards who not only help maintain it but use it as a way to distribute and propagate updates throughout the entire organization. There are over 1,000 weekly users of the data catalog that encapsulates this business glossary, and they're hoping to grow that to 3,000 users by the end of the year. This foundation has allowed them to reduce the amount of time it takes to onboard new individuals who want to make data-driven decisions, because the catalog really encourages reuse of the best practices that information stewards are certifying, and there's an automated way to share those best practices with the rest of the organization. That's given them a very solid foundation, not only of trust in data, but of the ability to govern the data assets in their environment.

So hopefully walking through those examples was useful; when we get to questions, I'm happy to dig into any of the nuances around them. Before we get to questions, though, I'd like to address why this has been such a challenge. There are always some people and process challenges that develop in our environments when we're trying to democratize the use of data, but there are also some long-standing challenges that you can attribute more to how our data ecosystems have evolved. Our data ecosystems have certainly created this noise-to-signal problem that we're now trying to unwind a bit.

So I'll take you through a little bit of history. If you think back to the 80s or the 90s, an organization's data most typically lived in a data warehouse. The answer to disparate data sources was to put all of that data into an IT initiative called the data warehouse, where we would have a single source of truth. ETL pipelines were written to maintain that single-source-of-truth data warehouse. A simple reporting layer was put on top, in the form of Business Objects, where I worked for some time, or Cognos or Hyperion. And it was the head of IT who controlled this data environment. The data environment might have been a little limited in breadth, but the data was nicely organized and modeled in advance of anyone using it. There were some trade-offs, though, right? The breadth of the data was limited, and answers were slow to trickle down in the organization, but it was all right for the most part.

What really started to happen around 2005 was a movement toward self-service analytics, and you saw this flowering of different types of data visualization and self-service tools. Anyone could create a report. There was no single source of truth anymore. There was no IT team that was going to distribute our reports and dashboards to us. Everyone could go seek the truth out on their own and do their own data discovery.
What that meant, however, was that a seemingly straightforward question like "how many customers do we have?" would likely return different answers, because the way someone in sales defines a customer differs from the way someone in customer success does. Depending on those definitions and the data at hand, we had a proliferation of different definitions and thus different answers. And this is the era in which top-down data governance tools started to emerge as a way to control this environment.

So the real problem today is that we still have too many tools creating conflicting answers. We have more complicated code now, because we've added a whole bunch of new systems on top of those old single-source-of-truth systems. And we still haven't solved the problem of governance, although we initially told folks it was a top-down process and procedure. So the environment, as well as the path to getting answers, looks extremely complex. I would argue that we're really only halfway through the self-service analytics revolution: until we solve this problem of how to properly govern these environments, we won't get all of the benefits of the self-service analytics revolution.

So changing this notion of data governance is really what comes next. The traditional response is data governance processes and workflows that try to change behaviors over months and years. But it's super expensive to do this from the top down, and to be honest, in the culture we have in many organizations in the United States, it hardly ever works. The best it does is create two factions of data users: the data governors and the governed. What we believe will happen over the next couple of years is that organizations will stop trying to govern from the top down and think about a more grassroots way to get to sharing best practices, as well as adhering to compliance rules. Data catalogs can actually play the role not only of helping to govern data in organizations, but of promoting best practices and propagating them through the organization, so that best practices become available as recommendations within the workflow of data consumers. This notion that the data governance team, or a team of information stewards, can use a catalog to curate positive behaviors in self-service becomes really important. Gartner has researched this, and they believe that organizations that adopt a curated catalog will realize two times the business value from analytics investments compared with organizations that do not organize around something like a catalog.

Why is this important? Where does that value come from? I think the most obvious place is the catalog as a way to increase the productivity of anyone on your team who is using self-service analytics. The numbers on this slide come from surveys of the over 100 customers we work with at Alation. What we found is that, on average, organizations are saving 20 to 50 percent of their analysts' time, increasing the productivity of each analyst by 20 to 50 percent, by using the Alation data catalog. Most of the impact is in the process it takes individuals to find data and really understand the nuances of those data assets, so they can come to a position of trust. The state of the art several years ago was a series of 30-minute meetings with all the experts in your organization who might understand the technical metadata around those data assets and how the business used them.
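As a concrete rendering of that "how many customers do we have?" example, here's a small hypothetical illustration. The schema, data, and both definitions are invented, but the pattern, two defensible queries returning two different counts from the same table, is the one described above.

```python
import sqlite3

# Hypothetical illustration: two reasonable definitions of "customer"
# give two different answers from the same table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INT, status TEXT, last_order_date TEXT)")
conn.executemany("INSERT INTO accounts VALUES (?, ?, ?)", [
    (1, "active",  "2019-05-01"),
    (2, "active",  None),           # signed up, never ordered
    (3, "churned", "2018-11-20"),
    (4, "churned", "2017-02-14"),
])

# Sales' definition: anyone who has ever placed an order.
(sales,) = conn.execute(
    "SELECT COUNT(*) FROM accounts WHERE last_order_date IS NOT NULL").fetchone()

# Customer success' definition: only currently active accounts.
(cs,) = conn.execute(
    "SELECT COUNT(*) FROM accounts WHERE status = 'active'").fetchone()

print(f"Sales says {sales} customers; customer success says {cs}.")  # 3 vs 2
```

Neither query is a bug; the divergence lives in the business definition, which is exactly what a shared glossary is meant to pin down.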
Today we have technologies, not just from Alation but from many vendors, to help shorten that time to finding and understanding data. We also believe that data catalogs, and we've seen this in organizations, can help get to more accurate answers by increasing the speed and volume of documentation that's achieved. Data catalogs start by crawling through existing documentation and making linkages between different data assets where a definition is common, and that helps manage the documentation environment. The overall effect is reduced time to insight for the consumers of data.

As I've been talking about these examples and impacts, keep in mind that not all catalogs are defined or built or constructed the same way. So if you are interested in looking at a data catalog for your organization, there are five things I recommend looking for in a catalog to make sure it will have this business-outcome-oriented impact.

The top thing I would recommend, and this is probably obvious from what we've been saying today: make sure the data catalog isn't just an inventory built for IT. The business impact of this new era of technology is driven by the ability to surface data for the business user. So this has to be a living catalog, something that is always changing and enhancing the understanding of consumers of data, not just a footprint used for storing and aggregating a list of all the data sets you have in your organization.

Second, I'd recommend you look for a data catalog that has AI or machine learning technologies built into it, with those technologies focused on delivering recommendations to end users. The end users get recommendations from the machine that they can accept or reject, and that collaboration with machine learning helps speed up human processing of data.

The third thing to look for: does this data catalog help you track all of the data assets that come together into an analytic? Does it have access to the data sources as well as the transformation logic that's been used to prepare that data, and does it catalog the reports and dashboards? All of these data assets should have their own catalog page and should be linked together in data lineage, so you can get an end-to-end view of what happened during the analytic process. (There's a small sketch of this idea after this list.)

Fourth, look for some type of just-in-time guidance that the data catalog gives. It shouldn't be a static catalog; it should be a platform by which your information stewards can endorse or deprecate different data assets, give them a certification, say "this is our gold standard," and communicate that throughout the organization.

And the last thing to look for: make sure the data catalog supports collaboration. It should have things like reviews and comments and rankings. Those things help break down some of the organizational silos that lead to finger-pointing around analysis, and if the technology can help with that, you'll find it easier to move the organization toward analytic best practices.

If you're interested in more on data catalogs, there's a link at the bottom if you'd like to go take a look and give that top-five list more consideration.
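As a rough sketch of the third criterion above: every asset, whether a source table, a transformation query, or a dashboard, gets its own catalog page, linked to its upstream inputs so lineage can be walked end to end. The field names and asset types here are illustrative assumptions, not any particular vendor's schema.

```python
from dataclasses import dataclass, field

# Minimal sketch: each data asset has a catalog page with steward
# certification and upstream links that together form the lineage graph.
@dataclass
class CatalogPage:
    name: str
    asset_type: str                      # "table" | "query" | "dashboard"
    description: str = ""
    certified: bool = False              # steward-granted "gold standard" flag
    upstream: list["CatalogPage"] = field(default_factory=list)

    def lineage(self) -> list[str]:
        """Walk upstream links to show where this asset came from."""
        out = []
        for parent in self.upstream:
            out.extend(parent.lineage())
        out.append(f"{self.asset_type}: {self.name}")
        return out

raw = CatalogPage("sales.orders", "table")
prep = CatalogPage("monthly_rollup.sql", "query", upstream=[raw])
dash = CatalogPage("Revenue Dashboard", "dashboard", certified=True, upstream=[prep])
print(" -> ".join(dash.lineage()))
# table: sales.orders -> query: monthly_rollup.sql -> dashboard: Revenue Dashboard
```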
I also wanted to share a growing list of customers we're working with who have data catalogs in their environment. Hopefully, for the folks on the call, you'll see an organization that is in a similar industry to your own or might have a similar business. And then I'll open it up to questions. I know some folks often need to leave at this point in the conversation, so if you are interested in more information: Forrester has come out with their first Forrester Wave ranking some of the vendors who provide machine learning data catalogs. You can find a free copy of that research on our website at the first link presented here on the right-hand side. If you'd like to set up a demonstration that's a little more personalized than what we were able to get to today, and really see a data catalog in action, that's the second link here: go to the Alation website to request a personalized demo of Alation, enter your information there, and we'll be in touch with you. With that, Shannon, I believe I'm ready for questions.

Hopefully we have some good questions coming in via the panel in the right-hand corner of the screen. To answer the most commonly asked question first, a reminder: I will send a follow-up email by end of Thursday for this webinar with links to the slides and links to the recording of the session. Just to kick us off, Stephanie, you mentioned at some point data democracy. This is a key word that we're hearing more and more. Can you tell me a little bit more about what that means to you and how you see it impacting the industry?

I think generally the definition of data democracy is to give everyone the freedom to access data, and also the tool set to be able to perform your own analysis. The tool set for performing your own analysis is probably where there's quite a bit more work to do. In many organizations it's fairly simple to give end users access and to buy an off-the-shelf tool: here's an easy way to drag and drop and visualize data. But I think what we're struggling with as an industry is how to make sure that the individuals who now have access to data can really interpret that data appropriately and apply it to decision making. That's probably where data catalogs and self-service data prep tools and some of these innovations fit in: how do we get to providing folks with the right tools to establish their own foundation of data literacy and make that data access useful, so they can apply it to decision making in the organization?

How does a data catalog relate or compare to a metadata repository?

That's a great question, because you can think about a data catalog as a next-generation metadata repository. A lot of the fundamentals are exactly the same. Metadata repositories collect information on the technical descriptions of data; that's usually the foundational starting point. Traditionally, metadata repositories have been really useful for IT teams who are looking at an inventory of data, and who maybe, for modeling processes, need to look at different descriptions of the technical metadata of a column in a particular database. I think data catalogs take the next step, which is making that metadata, and the understanding of that metadata, automated first.
So by using machine learning technologies to look for patterns in metadata, and by using parsing technologies to automatically parse through query logs to see how that metadata was applied across a bunch of different queries, we're able to automate the creation of a much richer metadata repository, and then turn that metadata repository into a data catalog that can make recommendations to business users about how to use that data going forward. That's what I think is really innovative about data catalogs and makes them different from metadata repositories. You would have to do a lot of technical work to turn a metadata repository into a business-impactful catalog if you were building from the ground up on your own, and you'd need some machine learning and AI data science skills to be able to do it.

Gartner uses their hype cycle to show where new technologies are on that curve. What would you say is the growth and adoption curve for data catalogs?

I think what we see right now is that data catalogs are starting to cross the chasm to mass-market adoption. We started building our data catalog with eBay, one of the original partners we worked with, back in 2012, and in 2015 we brought data catalogs to market and publicly announced Alation as a company. There were several startups at that time that announced their companies as well. I think what you see now in most of the analyst reports covering the market is that vendors have had enough time to work with customers: the use cases are well known, the business opportunity is well known, and we're really crossing that chasm from the early adopters to mass-market adoption.

Everyone is so quiet today. I don't see a lot of questions coming over; feel free to submit them in the Q&A in the bottom right-hand corner. I'm always bragging about how engaged you all are. Here's one: how does a data catalog integrate with various data extraction scripts?

I think most of the work out there for data catalogs has been done with BI tools first, and you see some interesting things happening. We at Alation happen to partner closely with Tableau and MicroStrategy and Salesforce Einstein Analytics. The baseline for integrating with any BI tool is being able to automate the cataloging of all the reports and dashboards in that tool. We do that with all three of those vendors, plus a number more business intelligence tools, and many of the other data catalogs in the market have gotten that cataloging down, where you can automate the creation of a page that describes each report and dashboard in that third-party tool. I think what's more interesting is the work that has been done, and I'll highlight Tableau in particular, on the certification of reports and dashboards. An information steward in Alation can cull through all of the reports and dashboards that have been created in self-service mode in a Tableau environment and select the ones they want to promote as a gold standard for usage across the organization. By selecting those in Alation, we then use API integration with Tableau to have those gold-standard reports and dashboards show up in business users' interactions in Tableau as certified data assets. That's a super powerful example of how a data catalog and a data visualization tool can improve the workflow of end users by giving them a gold standard to start from, rather than asking them every time to create a new one-off self-service analytic.
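To give a flavor of the query-log parsing mentioned at the start of that answer, here's a deliberately naive sketch that tallies table references in SQL statements. Real catalog parsers use full SQL grammars; the regex and log entries here are only illustrative.

```python
import re
from collections import Counter

# Naive sketch: scan SQL statements from a query log for table references
# and tally usage, enriching the metadata repository with behavioral data.
query_log = [
    "SELECT * FROM sales.orders WHERE region = 'EMEA'",
    "SELECT c.name FROM crm.customers c JOIN sales.orders o ON o.cust_id = c.id",
    "INSERT INTO sales.orders_archive SELECT * FROM sales.orders",
]

table_ref = re.compile(r"\b(?:FROM|JOIN|INTO)\s+([\w.]+)", re.IGNORECASE)

usage = Counter()
for sql in query_log:
    usage.update(t.lower() for t in table_ref.findall(sql))

# Most-queried tables are strong candidates for curation and documentation.
for table, count in usage.most_common():
    print(f"{table}: {count}")
# sales.orders: 3, crm.customers: 1, sales.orders_archive: 1
```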
GoDaddy is a customer who's had a tremendous amount of success with that certification approach, and there are a couple of case studies on Tableau's website, as well as our website, describing exactly how they did it.

That's great. A couple more questions are coming in as well. How should a data catalog relate to a standard logical data model?

I think what a data catalog helps do is make a transparent connection between a physical data model and a logical data model. Sometimes, as data organizations, we have a logical data model, but it's not well expressed in one place and it's not accessible to the business. In many data catalogs today, and I can speak to Alation in particular, we have the notion of a data wiki page. That wiki page helps the business user see the translation between a business term they may be aware of, how it's represented in the logical model, and how that logical model may tie back to several data stores or data transformation scripts. Since that's all in one place, it becomes easy for an end user to understand what's going on and grok it. So when someone says, hey, the logical model you're using is great, but it may not be operational for a month because the system is down, or we're replacing the system, or it wasn't available in the last hour because the ETL process broke, you have one place to go to see where that impact was, and it's more obvious to the business users how the concept they rely on is affected.

So, how does Alation discover data being onboarded onto a Hadoop data platform? Does it have data lineage capability?

Yes, Alation does have the capability to show data lineage. With Hadoop, what we see most often is that data comes into Hadoop in a raw form and is processed, often through scripts within Hadoop, into a derivative form, and is then put into Hive or some sort of database tables for use within a data visualization tool. Alation connects to Hadoop directly, to say: here's the file that data first hit when it came into Hadoop. We also partner with Hortonworks and Cloudera, as well as Trifacta and Paxata, some of the tools that are most often used to transform data in Hadoop into Hive or database tables. And then we also connect to Hive and Presto and some of the processing engines on Hadoop to say: okay, what happens next? Is that data then shipped up to a front-end BI tool? So we can show the lineage end to end, across all of the hops that the data has taken.
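Here's a hedged sketch tying those two answers together: lineage stored as simple edges between hops (a raw file, derived Hive tables, a BI dashboard), which also gives you the impact analysis described a moment ago, where an upstream job breaks and you walk downstream to see what's affected. All asset names are invented for illustration.

```python
from collections import defaultdict

# Lineage as edges between assets: when an upstream asset breaks, list
# every downstream asset (tables, Hive views, dashboards) that's affected.
edges = [
    ("hdfs:/raw/orders.csv",  "hive.orders_clean"),      # ingest/prep script
    ("hive.orders_clean",     "hive.orders_by_region"),  # transform job
    ("hive.orders_by_region", "bi.sales_dashboard"),     # BI extract
]

downstream = defaultdict(list)
for src, dst in edges:
    downstream[src].append(dst)

def impacted(asset: str) -> list[str]:
    """Everything downstream of a broken asset, in dependency order."""
    out = []
    for child in downstream[asset]:
        out.append(child)
        out.extend(impacted(child))
    return out

print(impacted("hdfs:/raw/orders.csv"))
# ['hive.orders_clean', 'hive.orders_by_region', 'bi.sales_dashboard']
```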
So if a company has poor data quality, does a data catalog help to improve data quality, or should improving data quality precede the implementation of a data catalog?

Yeah, this is one of those tricky chicken-and-egg questions of where you should start. What a data catalog is really going to help do is surface data quality issues so that they're well known within the population of users of that data. Most data catalogs are not deep data quality tools. The reason is that many organizations already had their data quality tools and standards in place far ahead of adopting a data catalog, so there wasn't that much demand in the early days to build that in; the data catalog is so focused on helping the consumers of data understand how to wade through a data set that features for things like collaboration superseded those requirements. So we typically recommend that if you have deep data quality issues, you do have an IT tool in place to help surface those issues and kick off mitigation workflows, but that the data catalog is integrated with those tools, so that any data quality issues can immediately be surfaced to end users through the catalog. And where end users observe issues an automated tool might miss, they can log those through the data catalog and make sure those usage challenges get to the appropriate individuals in IT.

How does it work for data that has either sparse documentation, or reference material spread across many sources?

Yeah, that's a great question. A data catalog is going to help with bringing all that documentation into one place and allowing linkages to be made across the breadth of documentation that may sit in different systems. The easiest way for me to think about a data catalog is that rather than being a single source of truth, the way data warehouses were positioned, you can think about a data catalog as a single source of reference. It's not the place where you store the data; it's not the place where you process that data. It really is a pointer to all the other systems that hold keys to the answer. So just as we can point to different sources of data where different components are stored, we can also point to different sources of documentation, and have that pointer come from a central place. For organizations that don't have a lot of documentation at hand, where we've found they often start with a data catalog is by using the catalog to see who the top users are of tables in the database and files on Hadoop. We can automate the identification of the top users of different data assets, and then the information stewardship team or the data governance team can reach out to those individuals and ask them for some help with documentation. By prioritizing the documentation effort according to usage, you're making sure that the limited resources you have for documentation are best spent on the highest-value impact.
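A minimal sketch of that usage-based prioritization, assuming the query history has already been reduced to (user, table) pairs: rank tables by usage, then identify the heaviest user of each as the natural person to ask for documentation. The users and tables are invented.

```python
from collections import Counter, defaultdict

# (user, table) pairs distilled from query logs; invented for illustration.
query_events = [
    ("ana", "sales.orders"), ("ana", "sales.orders"), ("bo", "sales.orders"),
    ("bo",  "hr.headcount"), ("cy",  "sales.orders"), ("cy", "fin.ledger"),
]

table_usage = Counter(table for _, table in query_events)
users_per_table = defaultdict(Counter)
for user, table in query_events:
    users_per_table[table][user] += 1

# Documentation effort goes to the most-used tables first, and the top
# user of each table is the likeliest subject-matter expert to ask.
for table, hits in table_usage.most_common():
    (expert, _), = users_per_table[table].most_common(1)
    print(f"{table}: {hits} queries; ask top user '{expert}' to document")
```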
So, is Alation able to crawl legacy mainframes that use JCL, COBOL, Natural, and ADABAS? No. Unfortunately, we are not able to crawl mainframes today, though potentially in the future. We've decided to focus first on databases, Hadoop, and the large ecosystem of business intelligence and data visualization tools, so unfortunately, mainframes have not hit our priority list yet. There may be other data catalogs out there that do, but I am not aware of any, so that is kind of an untapped part of the market. I know it can be a challenging problem.

For data lineage, is Alation able to span cloud data lakes and on-premise data sources? Yes, absolutely. We have a number of customers where we understand both their on-premise instances and their instances in the cloud, be those cloud databases or other cloud instances. This type of hybrid environment is one we see blossoming, particularly as organizations use the cloud for test or development instances while keeping production instances on-premises. Customers like Chegg and Intuit would be good examples within the Alation customer base.

So Stephanie, a follow-up on the earlier question regarding logical data models: did you use that term the way a DBA would, as distinct from a physical model, or are you speaking of a conceptual model?

I was talking about a logical model. It can refer to a data model that crosses multiple sources, or a data model that is not physically instantiated in the relationship definitions between tables in a physical database, but is more a logical model reflecting usage, or a design for how the data should be used. That's what I was referring to. If there's a deeper question there, I'd ask whoever submitted it to ask one more time with a little more description of what they're getting at, because there may be a deeper definition to address.

We'll give them just a moment; we're coming up right at the top of the hour here. Just a reminder to everybody that we will be sending out a follow-up email by end of Thursday with links to the slides and links to the recording of this session. Stephanie, thank you so much for doing the presentation today and joining us. It's been very informative.

Thank you for the chance to speak today. I really appreciate it. Thanks for having me.

Have a great day, everybody. Thanks, Stephanie. Thanks. Bye.