Hello and welcome. My name is Shannon Kemp and I'm the Chief Digital Officer of DATAVERSITY. We'd like to thank you for joining this DATAVERSITY webinar, "I Have Trust Issues with My Data. Can I Really Change That?", sponsored today by Precisely. Just a couple of points to get us started. Due to the large number of people attending these sessions, you will be muted during the webinar. Questions will be collected via the Q&A panel. And if you'd like to chat with us or with each other, we certainly encourage you to do so; just note that Zoom defaults the chat to send to just the panelists, but you may absolutely change it to network with everyone. To find the Q&A or chat panels, click the icons at the bottom of your screen. And as always, we will send a follow-up email within two business days containing links to the slides, the recording of this session, and any additional information requested throughout the webinar. Now let me introduce our speakers for today, Julie Skeen and Paul Rasmussen. Julie is a senior product marketing manager with Precisely with over 25 years of experience working on solutions for customers in data-intensive industries. She focuses on understanding customer needs and ensuring Precisely's data quality and data observability solutions are aligned with those needs. Paul is a product manager at Precisely who is passionate about helping companies tackle their data challenges by focusing on data integrity, analytics, governance, and DataOps. After receiving his Bachelor of Management Information Systems at the University of Texas at San Antonio, he accumulated extensive experience in engineering, consulting, and product management, leading to his strong technical background and business acumen. And with that, I'll give the floor to Julie and Paul to get today's webinar started.

Hello and welcome. Thank you, Shannon, I appreciate it. Before we start, let me take a moment to introduce Precisely. We are a leading vendor in the data integrity space, with decades of experience in data quality, helping some of the world's largest companies get the most from their data. We have 12,000 customers in 100 countries worldwide, serving 99 of the Fortune 100. We work with customers from a variety of industries, from telecommunications, insurance, and financial services to retailers, manufacturers, and many others, and we partner with leading technology companies to jointly solve challenges for our customers. We've seen organizations working to implement various initiatives to meet today's overwhelming business challenges, while competing in an uncertain economy and complying with complex regulations. Those initiatives include risk management, understanding your customers better, and finding ways to leverage AI and ML; across all of them, companies are looking for ways to leverage their data. It's likely no surprise that 83% of CEOs want their organizations to be more data-driven. Leaders recognize the need to leverage data in ways that allow them to make better decisions and drive more efficient processes, and CEOs worldwide are pushing their organizations to be more data-driven to ensure the success of these critical projects, knowing that doing so is only possible with high-quality, trusted data. But there's a gap between what businesses want to do with their data and what practitioners within those organizations feel is possible.
Less than a third of practitioners believe they are leveraging their data to drive their actions, and even fewer believe they can completely trust their data. So the people closest to the data feel they can't trust it, for various reasons. For instance, they can't get it fast enough: it's trapped in complex legacy systems, not available when and where it's needed, and not as fresh as the business demands. They don't understand it: they don't understand the lineage of the data, they don't know how it's used by the business, and they don't have accountability around data changes.

Hey, Julie, that "don't understand it" point is really interesting, because I've heard it from a lot of customers lately. It really stuck out to me, because you'd assume everybody understands the data. In one example, a customer added a new third-party partner data feed, and it introduced a new currency code. The data engineer building the analytics on top of that feed didn't realize a new currency code had been introduced, and that code had downstream impacts that could have been avoided had the data simply been understood. So it's really interesting how "don't understand it" is a modern problem.

Well, thanks for that color commentary, I appreciate it. So, to finish out this slide: other reasons teams don't trust their data are when it's full of errors or non-standardized, or when they don't have the context to use it; they might be lacking additional internal or external data that could provide the context needed for decision making. Also, sometimes they don't know when it's going to break, and downtime comes as a surprise: anomalies unexpectedly impact the business downstream, and the true root cause of a data issue may never even be identified. If you're experiencing any of these situations, you're not alone. As Paul said, every day we talk to companies who are struggling to meet their data-driven business goals because of issues with their data. They might be dealing with how to fit operational data into analytical use cases, managing disparate data sets to get a better view of where the best data lies, or struggling to understand where the data that drives critical decisions is coming from. Even some famous technical organizations have struggled with this. One very public example was NASA in the late 90s: they took a $125 million hit when they lost the Mars Climate Orbiter, because the engineering team responsible for developing the orbiter used English units of measurement while NASA used the metric system. That's definitely an extreme case of not understanding the context or source of the information. Another very public example was when PayPal had to agree to pay the US government $7.7 million after it was found that nearly 500 PayPal transactions worth almost $44,000 violated sanctions that ban US companies from doing business with individuals or organizations on a blacklist. So for those of you thinking, "well, I'm glad we're not trying to put a spacecraft on Mars," the cost of dealing with these types of trust issues can still be quite substantial and can take many forms beyond the direct costs associated with fines or destroyed spacecraft.
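Before moving on to the costs, here is a minimal sketch of the kind of reference-list validation that would have caught Paul's currency-code surprise before it reached downstream analytics. The field name, reference set, and records are hypothetical, and this is one illustrative pattern under those assumptions, not a description of any particular product's check.

```python
# Minimal sketch: flag values in an incoming feed that are missing from a
# known reference list, before the feed is consumed downstream.
# The field name "currency_code" and the reference set are hypothetical.

KNOWN_CURRENCIES = {"USD", "EUR", "GBP", "JPY"}  # reference data you already trust

def find_unknown_codes(records, field="currency_code", reference=KNOWN_CURRENCIES):
    """Return the set of values in `field` that are not in the reference list."""
    return {r.get(field) for r in records if r.get(field) not in reference}

incoming = [
    {"order_id": 1, "currency_code": "USD"},
    {"order_id": 2, "currency_code": "XCD"},  # new code the pipeline has never seen
]

unknown = find_unknown_codes(incoming)
if unknown:
    # Alert a human instead of silently passing the data through.
    print(f"Unknown currency codes found, review before loading: {unknown}")
```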
There's the obvious direct cost, like PayPal having to pay $7.7 million to the government over a compliance issue, but there are other types of costs you need to take into account when you think about the nature of trust and how you come to trust your data. There's the cost of prevention: quality costs can be understood in terms of both avoidable and unavoidable costs, and the cost of prevention includes the cost of building traditional defensive measures into software and analytics products through QA, requirements development, exception cases, training, and so on. There's the cost of finding issues: once you've discovered that you have a problem, the endless search to understand where the problem is coming from and why it occurred this time can be very time-consuming. And there's the cost of failure, when those issues aren't addressed and do get into the final product, where they start to undermine confidence in the process you're setting up, or in the reports and dashboards you're putting together to help drive internal decision making. So with all this, it's probably a good time for us to think about what we mean by trusted data. Paul, maybe you can chime in on this. What do we mean when we talk about trust in data?

Sure. Let's talk about the path to trusted data first. The typical progression we've seen in the market and with customers goes like this, and we're using a pyramid to reflect it. At the bottom of the pyramid, we have the collected data, and we have a lot of it. We all know data and related products have enormous utility, and there's been a gold rush to collect that data; we've seen companies grabbing as much as they can with the intent to figure it out later. You know the old phrase, "there's gold in them hills," and data is the same way. But this also assumes the collection of this data is timely and at the appropriate aggregate and granular levels. The game changer on data collection a few years ago was the mainstreaming of new technologies such as the Internet of Things, commonly referred to as IoT, along with instrumentation and digitalization efforts, and especially the rise in popularity of temporal data: data representing states over time. Next on the pyramid is quality data. This layer represents data that is truly usable and reliable, based on consistent and objectively measured criteria. Here we're making the data trustworthy and trying to avoid the scenario where a user finds a single issue in a data source like a data warehouse and then assumes the entire data source is untrustworthy. You'll also notice that the size of each level gets smaller, representing less data meeting these criteria. The third layer is discoverable data. Discoverable data is data that is coherent, based on varying degrees of data literacy, documentation, and user education. It's really the enablement of users in your organization to know what exists, along with the credibility, quality, context, and status of that data. And lastly is accessible data, which is really the goal: getting the right data to the right users. It represents the ease with which users can obtain usable data products, metadata, and documentation in a secure and cost-efficient manner that meets compliance, confidentiality, and regulatory standards.
And what's really interesting to me about this chart is that it's very similar to one we've seen and used over the years with respect to critical data assets. Essentially, it's the Pareto principle applied to data, whereby 80% of the issues come from 20% of the data, and it posits that the most productive use of resources is to make the most-used data a priority. To give you a real-world example, a mid-sized company may have 20,000 columns in its data catalog. Of those 20,000 columns, only about 400 are typically used to drive the business. That means only 2 to 4% of the data is being used by your organization. So, Julie, what do you need to make this happen?

All right. To achieve trusted data, you need data integrity. Data integrity is data with maximum accuracy, consistency, and context for confident business decision making, and it requires a holistic view of your data. To dig into this a bit: consistency in the data means that people can reliably trust the type of data every time they access it. Accuracy means that once you have the data, it matches what you'd expect, with no data missing, improperly formatted, or otherwise erroneous. Context means you understand how that data relates to other data and how it applies in the specific business scenario you're trying to use it in. Data integrity brings these three dimensions together and applies them equally across the data being used for that specific purpose. Everyone is on a journey to continuously improve the integrity of their data, better understand their business, and ultimately better serve their customers. We've learned from our customers that there isn't a standard linear journey to data integrity that works for everyone, and that the days of large corporate initiatives are a thing of the past. Customers have told us that their business and IT teams are working more closely together than ever, jointly identifying the specific scope that delivers meaningful business impact. As a result, they tackle data integrity through distinct projects that give them business value, no matter where those steps fit into the journey, and then plan their next move. We've seen that organizational needs are changing. It used to be that organizations had data management projects that were led by IT teams and focused primarily on responding to issues. They would target operational use cases, look to improve efficiency and effectiveness, and typically address data that resided on premises. Now we're seeing proactive data engineering, with data engineers embedding data quality within data pipelines. There's a huge focus on analytics, including artificial intelligence and machine learning use cases. In addition, companies are now migrating and centralizing data in the cloud using providers such as Snowflake and Databricks. These organizational trends have a significant impact on how the data quality needs of the business are addressed. We see additional involvement from the business teams, not just as participants but jointly leading the initiatives. Business teams need to ensure that data can be trusted, and they are now active participants in the resolution process. These processes can become more automated through modern deployment methods and built-in intelligence. This intelligence can be used to guide users in building the appropriate rules, to perform sophisticated matching processes, and to monitor the data.
This observation process can provide proactive alerts to users about potential issues, so the issues can be investigated and resolved before any decisions are made based on that data. All of this can provide a more seamless user experience by leveraging semantics and metadata as part of the process. In addition, because companies are migrating and centralizing data in the cloud, they want data quality validations to occur where the data resides, without taking it out of the centralized location in order to apply data quality rules. This requires data quality processes that can run natively where the data sits. So now let's move on to some of the common themes we see in successful data quality programs. Paul, you want to go ahead and start off?

Sure. At the top of this list is leadership buy-in. It is incredibly crucial that management understands the importance of data quality, and you do this by getting them to understand the costs, speaking in their terms: costs both known and hidden, which we've already talked about. And once you gain agreement that data quality is a real and costly problem, identify specific real-world examples and remediations. Next we have institutionalizing data quality and the associated mindset, another common theme we see in successful implementations: a mindset that everyone in the organization can contribute to the quality of the data. Everybody should be able to view and understand the data, its lineage, and its overall status, and even contribute to fixing or correcting issues with the data. That's another hallmark of organizations that are successful in implementing these data quality solutions. Next on the list: metrics and measurements. Everybody has varying definitions of what they consider good data. You need to come to a common understanding of the definition of good data and what fixed data would look like, preferably at a quantitative level. And the key at this juncture is to get everyone actively involved in the process of identifying truly problematic issues and making that entire process a positive experience. Do you want to take the next one?

Yep. For systems and automation: more and more, people are relying on automation to take complex problems, audit them, and execute on them on a regular basis. Automation becomes key, but so does the ability to build decision making into those automated processes to help shift tasks away from manual intervention. Invest in tools that can monitor your data and create automated alerts, so issues can be proactively resolved.

And last on the list, openness and interoperability. Openness and interoperability is a mindset, and it's an absolute necessity in the modern world. I like how one of my customers phrased it the other day: they called it "liberating the data." As a reference point, try to recall what it used to look like when you wanted to use a new application a few years ago. You were dependent on an operating system version, arcane and manual deployment methods, and infrequent release cycles. Now, with the advent of cloud and software as a service, applications are simply, immediately usable. Take an email application as an example: if you want to use one, you just pick one that works for you in the cloud and start using it. The same applies to data. If you want a data set, find a trusted one that works for you and simply use it.
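On the earlier point about data quality running natively where the data sits, here is a minimal sketch of what a pushed-down check can look like: the validation is expressed as SQL and executed inside the warehouse, so no rows leave the platform. It assumes only a generic DB-API-compatible connection; the table and column names (echoing the "customer cleansed" example later in the session) are illustrative, not any vendor's actual implementation.

```python
# Sketch: push a completeness check down to the warehouse as SQL, so the data
# is validated where it lives instead of being extracted first.
# Assumes any DB-API-compatible connection (e.g. to a cloud warehouse);
# the table and column names are hypothetical.

CHECK_SQL = """
SELECT
    COUNT(*)                                           AS total_rows,
    SUM(CASE WHEN email IS NULL OR email = '' THEN 1
             ELSE 0 END)                               AS missing_email
FROM sales.customer_cleansed
"""

def completeness_score(conn):
    """Return the fraction of rows with a populated email, computed in-database."""
    cur = conn.cursor()
    cur.execute(CHECK_SQL)
    total_rows, missing_email = cur.fetchone()
    return 1.0 - (missing_email / total_rows) if total_rows else 1.0
```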
So, to put it simply: find early and automate, standardize, measure, reuse, and educate. Let's look at each of those in a little more detail. The first one: find early and automate. I'm basing this on a fundamental tenet that states that good data increases customer satisfaction and improves profitability. In other words, your data serves one of three purposes: it increases profitability, improves competitiveness, or enables regulatory compliance. Let's assume, for example, that your customer finds an issue related to your data. This is going to be your most expensive outcome. The next scenario is that you find the issue before your customer, which is good; this is definitely going to be cheaper than your customer finding the issue, but still pretty expensive. And lastly, the least expensive option: your system prevents the issue from occurring in the first place. That's where you want to be. And this goes beyond just systems; it's a paradigm shift, so that your entire organization is educated and empowered to improve your data. So what does this look like in practice? Let's take a look at what typically happens at most organizations in their data product delivery process. You've probably seen this in your organization. You have a data supplier, the source of the data, on the far left. It could be a purchased data set, or an internally sourced data set such as a customer list, marketing data, an invoice, or an order. It starts off as a single data set and has numerous processes performed on it across the organization; for example, it might be transformed, enriched, and published to the cloud. And this is just a simple process. In the real world, what starts off as one data set may end up being copied and modified numerous times with thousands of transformations; essentially, it mushrooms across the organization. Then some other process takes this data and consumes it, and then a defect or issue is found in the data. Eventually, that issue is reported to IT. It gets prioritized, assigned, analyzed, fixed, tested, and deployed, and eventually, maybe days, weeks, or potentially months later, that fix gets remediated in the source system. To make matters worse, data in most companies is not single use, so this process repeats itself with every iteration of new data. Now let's look at an alternative model. In this model, we're simply implementing time-tested data quality improvement methods. You find these same techniques in almost every quality management technology or methodology: for example, TQM, which describes a management approach to long-term success through customer satisfaction, as well as a host of others, such as SPC, Six Sigma, and Lean. A basic premise of all of those techniques, whether for a manufactured product or a developed product, is that each step is monitored using statistical methods and rules. In this example, there's an issue in process step number three. If you catch the issue prior to step number four and remediate it, you've effectively, proactively prevented the issue in the final consumed data product. So why isn't this being done for data? Because it's traditionally been time-consuming and expensive for complex data products. The good news is that technology has caught up for data. Let's look at the next one: standardizing and measuring. Following the approach on the prior slide, standardization and measurement are an important part of your data quality initiative.
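To illustrate the "each step is monitored using statistical methods and rules" idea just described, here is a minimal SPC-style sketch: one pipeline step's output row count is compared against its recent history, and the run halts before the next step consumes a suspicious batch. The three-sigma limit and the sample counts are illustrative assumptions, not universal rules.

```python
import statistics

# Sketch of a statistical process control (SPC) style gate between pipeline
# steps: compare this run's output row count to the step's recent history and
# stop before the issue propagates downstream. The 3-sigma limit is an
# illustrative choice.

def check_step_output(history, current, sigmas=3.0):
    """Raise if `current` falls outside mean +/- sigmas * stdev of `history`."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if abs(current - mean) > sigmas * stdev:
        raise ValueError(
            f"Step output {current} outside expected range "
            f"{mean:.0f} +/- {sigmas * stdev:.0f}; halting before next step."
        )

row_counts_recent_runs = [10_120, 9_980, 10_250, 10_060, 10_190]  # illustrative
try:
    check_step_output(row_counts_recent_runs, current=6_200)
except ValueError as err:
    print(err)  # sudden drop detected before the next step runs
```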
We've seen customers have great success using rules and scores to accomplish this. Scores essentially define what fixed data would look like. One example is a quality score. With a quality score, you're asking: how good is my data? Which data is failing the criteria? For instance: is the field null or blank? Has a semantic type been identified? Do the records have outliers? Is the corresponding business logic applied? Those are the categories you'll see traditionally; when you hear about data quality, you'll hear people talk about the dimensions: accuracy, completeness, consistency, and so on. There are lots of variations on those, and individual organizations use their own. The second are governance scores, which answer the question: how well am I stewarding and governing my data? Questions like: is there a business owner assigned? Is a data steward assigned? Does the asset have a description populated? Is the status certified? And third are trust scores: can I trust my data? Examples would be: how many data sets consume this data? Is there a rating system across the organization? Is the governance score high while my quality score is low? The key at this juncture is to get everyone actively involved in the process of identifying truly problematic issues and making that a positive experience. To give you an example, let me show you a system for storing and displaying those scores. What we're seeing here is a data catalog with individual assets; you can see there's a data set called customer cleansed in the sales schema, with a governance score and a quality score. Looking at the quality score, we see individual rules on the far left: do the emails parse, do the phone numbers parse, are the dates standardized, along with a score and then the history of that score. So, for example, we can see the highs and the lows. Are we progressing and getting better with the data, or are we getting worse? Are there alerts on the data? What has the typical pattern of my data quality been? From here, we can see how that score was calculated, so everyone has the same understanding of how those scores are consistently applied and the definition of the score. That's just a quick example of a repository for capturing data quality metrics and showing trends over time.

All right, Paul, thank you. Appreciate that. As you can see, having a clear view into the quality of your data can help drive the conversation around how best to address quality issues. A foundational view of the data via a catalog or governance system is a great kickoff point, providing a common place to understand sources, context, lineage, and other factors that can help you better articulate what issues exist and the extent of those issues. It's also a great place to help level-set how the business defines quality and prioritize what actions should be taken first. Again, the idea isn't to tackle everything all at once, but to focus on business needs and objectives to drive an ongoing process of improvement, bit by bit, step by step. And as you can see, there's an overlap: to get a clear view into the trustworthiness of your data, you need to look at it from the perspective of both governance and quality.
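As a rough sketch of the kind of rules behind a quality score like the one in the catalog view (do the emails parse, do the phone numbers parse, are the dates standardized), each rule below returns pass or fail per record, and the score is the pass rate. The regex patterns and sample records are deliberately simplified assumptions, not production-grade validators.

```python
import re

# Sketch of record-level quality rules like the catalog example: emails parse,
# phone numbers parse, dates are standardized. The regexes are deliberately
# simplified illustrations.

RULES = {
    "email_parses": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "phone_parses": re.compile(r"^\+?[\d\-\s()]{7,15}$"),
    "date_standardized": re.compile(r"^\d{4}-\d{2}-\d{2}$"),  # ISO 8601
}

def quality_score(records):
    """Return per-rule pass rates plus an overall score (mean of the rates)."""
    rates = {}
    for name, pattern in RULES.items():
        field = name.split("_")[0]  # "email", "phone", "date"
        passed = sum(1 for r in records if pattern.match(r.get(field, "")))
        rates[name] = passed / len(records)
    rates["overall"] = sum(rates.values()) / len(RULES)
    return rates

customers = [
    {"email": "ada@example.com", "phone": "555-0100", "date": "2024-01-15"},
    {"email": "not-an-email",    "phone": "555-0101", "date": "01/15/2024"},
]
print(quality_score(customers))  # per-rule pass rates plus the overall score
```

Storing each run's output alongside a timestamp is one simple way to get the score history and trend view described above.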
Cataloging gives you a foundational view of your data and becomes a great starting point by providing a common place to understand sources, context, lineage, and other factors that help you make better decisions on top of that data and better articulate what issues might exist and the extent of those issues. Data governance provides a place to help level-set how the business defines quality and to prioritize what actions need to take place first. The idea isn't to tackle everything all at once, but to focus on the business needs and objectives to drive an ongoing process of improvement. Working with customers, we have found a lot of different definitions of data quality and the associated capabilities. These can include profiling, rule management, and data validation, and the types of scoring that Paul discussed across quality and data governance. Another important aspect of data quality is the complex task of ensuring contact and address data is valid, including geocoding addresses and adding enrichment data to provide context. This can be done in batch, in real time, or as part of manual entry into a CRM or ERP system. Data quality can also include making sure that data is consistent across systems and not duplicated, enabling a single view of critical business data. In our increasingly real-time world, we also need to ensure that we are considering data quality as data moves between systems, and utilize machine learning and AI to quickly identify anomalies and outliers in the data. Paul?

Let's talk about the effect of data quality on AI. We all know AI is out there, and it's here to stay. There are plenty of statistics: 67% of companies indicated that they are integrating gen AI into their overall AI strategy (that was as of October 2023), and by the end of 2024, 60% of professionals are expected to use gen AI as part of their duties. This is on top of the AI already existing in organizations: the traditional AI functions used for categorization and prediction. There are really three themes where we see the effects. One, AI is changing the business. This can be partially attributed to the massive improvements in AI's accuracy; AI is far more accurate now than it was even three to five years ago, and the growth is both cyclical and exponential. The better the data models are trained on, the more accurate the predictions, which results in more AI usage and more data. Another effect is the cloud. The cloud is a serious enabler of AI, especially as it becomes omnipresent in our personal and business lives. Again, it's a cyclical relationship: the more data in the cloud, the more data for AI to learn from, and the more AI usage. And finally, companies will need to catch up on data quality and integrity as AI governance becomes a requirement. We're at the early stages, but we're already starting to see inquiries about this; odds are your organization has already sent you a policy on AI usage. So let's take a couple of examples. The first one: data quality. High data quality saves data scientists time and improves ML accuracy. One of our resident data scientists has a great phrase that I like to repeat: "ML models are an abstraction of the data they are trained on." The quality of a model is a reflection of the data it's trained on; essentially, I'm just rephrasing "garbage in, garbage out." At this point, the science and technologies for data analysis are already quite powerful and well proven. But how much time and money is being wasted?
The answer is a lot. Roughly 80% of a data scientist's time goes to collecting, cleaning, and transforming data, and an estimated 54 to 90% of machine learning models never make it into production from the initial pilots. That's a lot of waste. So let's look at example number two: MLOps. Essentially, data changes over time, and so do your models. You've probably heard the terms MLOps and data observability. If you haven't: MLOps is the people, processes, and technologies involved in the continuous delivery of AI in your organization, and data observability is the process of monitoring your data for anomalies based on historical patterns. One of the key concepts in MLOps is freshness, and how data freshness impacts the relevance of your model predictions over time. When ML models are trained on stale data, they don't capture the trends present in recent data; in other words, ML models tend to expire. In many cases, old training data can even add noise when the model is applied to new data. A couple of examples of freshness: take forest fires. ML models that use forest fire data from 20 to 30 years ago may not capture the patterns emerging with the present climate. And for data drift, say you've got a sales report using predictions based on an old product list that doesn't include the new products. So as you can see, as your data changes, so does the quality of that data.

Thanks, Paul. Appreciate that. So as we start to wrap up, let me talk about an example use case of how trust in data impacts your business. Does this use case sound familiar? "I need to drive our new marketing campaign based on lists I get from outside vendors and internal sources. But how do I know if the lists are any good?" What could be causing trust issues here? How do I judge the quality of the data? Could you establish quality checks that need to be run before the data is onboarded? Do you need to monitor for regular updates or the freshness of the data? Do you need to establish measures for outside data providers to keep track of their performance over time? These are all things that might help address those trust issues. Am I getting only the new data or the entire data set? In many cases, you don't just get what changed from version to version of the data; you must still comb through it to determine what was old information you already knew and what has changed since the last time you saw the data set. So being able to match against your known context, with a process that reconciles and deduplicates the data, is a critical transformation step. That also brings up trust in the data itself: is information missing? Am I getting bad addresses? That's where contact information quality becomes key, particularly for high-cost marketing campaigns like mailers, where address quality can have a big impact on the overall cost of your campaign. Whether the data is fresh and accurate is also an area where you can build measures to improve trust. And then there are the IT issues: monitoring for continual updates and ensuring I'm always getting the freshest information once I start to automate these processes. Am I continuing to see the quality I expected when I initially set up the process?
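One concrete way to automate that kind of ongoing check, in the data-observability spirit described above, is to compare each delivery's last-update time against an expected cadence and alert when it slips. The 24-hour cadence and the vendor-list scenario below are assumed for illustration only.

```python
from datetime import datetime, timedelta, timezone

# Sketch of a freshness check: alert when a data set's last update falls
# behind its expected cadence. The 24-hour expectation is an assumed
# threshold, not a standard.

def is_stale(last_updated, expected_cadence=timedelta(hours=24)):
    """Return True if the data set has not been refreshed within its cadence."""
    return datetime.now(timezone.utc) - last_updated > expected_cadence

vendor_list_updated = datetime.now(timezone.utc) - timedelta(hours=36)  # illustrative
if is_stale(vendor_list_updated):
    print("Vendor list is stale; hold the campaign until a fresh delivery arrives.")
```

Run on a schedule, a check like this turns "is the feed still fresh?" from a surprise into a routine, automated alert.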
Having the ability to observe those pipelines and intelligently look for anomalies and changes that might alert you to potential problems becomes something else you can do to improve the overall trust you have in your data. As we discussed earlier, everyone is on a journey to continuously improve the integrity of their data, better understand their business, and ultimately better serve their customers. As a result, they tackle data integrity through distinct projects that give them business value, no matter where those steps fit into the journey, and then plan their next move. Not surprisingly, that means they want solutions that give them the freedom to make those choices. Data integrity is a journey. It's continuous, and it requires best-in-class solutions working together to deliver value to the business. That's why Precisely offers the modular, interoperable Precisely Data Integrity Suite, which contains everything you need to deliver accurate, consistent, contextual data to your business, wherever and whenever it's needed. This enables you to start wherever you are on your data integrity journey. It's a suite of SaaS-based services built around a common catalog to help you better understand your data, and from there, identify issues, understand the overall quality of the data, define business rules and scoring metrics against which you can better monitor and understand that data, and build remediation processes to improve data accuracy. It also adds context with geo-addressing and spatial analytics, and even taps into a library of data enrichment information to add outside context to the data you have within your organization. All of this is built on a foundation of intelligence and a common data catalog, with agents to enable use with data both in the cloud and on-prem, addressing a wide variety of data management needs with one integrated solution. And with that, I'm going to turn it back over to Shannon.

Julie and Paul, thank you so much for this great presentation and demonstration. If you have questions for them, feel free to put them in the Q&A section, and we'll get to as many of the most commonly asked questions as we can. Just a reminder, I will send a follow-up email by end of day Thursday with links to the slides, links to the recording, and anything else requested throughout. So, lots of questions coming in here. Diving in: can you please provide the name and date of the IDC report you cited early on, in slides four and five?

Yeah, I would need to look that back up, but I can provide it to you, Shannon.

That'll be great, and I can send it out in the follow-up email. Next: can you expect quality data when there is no data integrity enforced between disparate datasets?

I'll take that one. You can expect it, but you'll probably be let down. Really, data integrity between disparate datasets comes down to the definition of disparate datasets, right? Is it a disparate dataset because it came from a different data source, or because of the lineage and parentage, where it started off from the same original dataset but got transformed so many times before it reached the consumed dataset? If you're not doing any data integrity or quality work, if you're letting everybody run their own rules and do their own transformations and then expecting the results to be consistent, you're probably going to be disappointed. Really, the key is to get consistent rules applied.
Now, you may have two disparate datasets that have the same rules applied to them (common validations, reference list checks, validity checks), and then you can absolutely expect quality if you're running those same rules on both. But if you're not doing that, then yes, it would be very difficult to expect high quality data.

Thank you. So many great questions coming in. How are the data quality scores calculated? Does the application calculate them, or is it a direct feed from another source?

Several different methods. In our product, we support both feeding them in from other sources and calculating them, and how they're calculated will vary based on the type of score. For the DQ scores we showed you, there are a couple of approaches. You can introspect each record against pass-fail criteria and then say, at the end, that 80% passed and 20% failed. Or you may use higher-level metrics: for a completeness rule, say, 30% of the records were complete, and then use an aggregate score based on those. For other scores, for example a data governance score, you're not actually looking at the data; you're looking at metadata about the data. Has a business owner been assigned? That's binary, a simple yes or no, and it's calculated directly from the metadata. So the answer is: lots of different ways, depending on how you choose to implement it.

Do you have any advice on how to build a daily process cycle to find data quality issues, and how to know where to search and test?

To build a daily process cycle, it really comes down to using a tool that's purpose-built for that, which will find your data quality issues both known and unknown. And for knowing where to search and test, a data catalog and governance is really how modern organizations communicate those data literacy use cases, enlightening the organization about what data is out there: socialization, who's using the data, ratings and comments on that data, and what rules are being applied. Those are things you'll see in a data catalog or governance product.

Perfect. And is it good practice to monitor data quality on the data sources, or just as part of the pipeline process?

Lots of opinions out there, but the industry has really moved away from doing it in the process. We used to see ETL (extract, transform, and load), where transformation and quality work happened at that middle stage. For a big part of organizations, that has shifted to ELT, especially for cloud data sources and data warehouses, where the data is extracted, loaded into the target system, and then users actually run those transformations after it's been loaded rather than within the process. One of the downsides of doing it in the process is: what if somebody uses the data before it's hit the pipeline process? All of a sudden, they're going to get the same bad data, because they were unaware that rules were run in that pipeline and they're not capturing those same rules. So I would say either the source or the target is the preference.
And Paul, there's probably one more dimension to that: if you're getting external data, you would want to make sure you're checking the quality of it before you add it into your own data source.

Absolutely. And what you mostly see in the pipeline process is transformations. Sometimes those are more difficult to do at the source or the target, so you do them as part of the pipeline process.

Thank you. Next question: we have a QA process that requires manual audits, because we have to review documents to compare them with the data that has been entered into the system. Do you have any suggestions on how to automate that process?

So I think what they're saying is that they're manually entering the data. One thing you can do is have front-end validation; I mentioned earlier the front-end validations that can occur in CRM and ERP systems, so that as someone enters the data, you're making sure it's high quality and that they're not entering things you know right off the bat can't be correct, like an invalid address. Paul, do you want to add to that?

Yeah, I think the key to that question, and I'd need a little more detail, is the documents. When I hear "documents," I assume you're referring to unstructured data, and that's where we're seeing a lot of advancement in data quality. You have your answers, your reference data, or those baseline metrics, and then you're able to use AI to read that unstructured data and answer questions about it. So without knowing more about the data sources, my interpretation is that you're looking at how to parse the unstructured data in the documents to answer questions against a known baseline. AI has come a long way in doing that.

Indeed. Is there a breakdown of the cost of poor quality by industry, say retail, banking, power, et cetera?

Absolutely. If you're in those industries, you've probably seen some of those publications. At Precisely, we've got a strategic services organization that focuses on data quality within specific industries and would be happy to engage; feel free to reach out to us and we can point you in the right direction. I've seen those breakdowns in the past; I can't tell you off the top of my head which publications, but they're definitely out there.

And going back to the topic of AI: using AI for data quality, what about AI hallucination, or even AI bias?

Absolutely. Hallucinations, where you're seeing things in the data that don't exist, or bias: that's where we come back to temporal data and freshness. It's key that you have fresh data that's up to date and of high quality. That's really the point of AI for data quality: without high quality data, your AI is erroneous and going to lead you down paths that are incorrect.

Thank you. We've got a few minutes left, so I'll keep the questions coming as fast as I can. If your data quality scorecard alerts you of a problem, where is the best place to do corrective actions, in the source or the pipeline? I think we already talked about that one, didn't we? The scorecard alerts, that's the same question.
I think the corrective actions part was asked before, but essentially the source is generally the best place, or if you don't have access to the source, you typically see those corrections done in a pipeline. A lot depends on the situation, but as a general rule, fixing the data at the source is the best place. Think about a manufacturing process where you're building bicycles. If you have a defect in the welds on the frames and you find that defect after you've shipped the bicycles, it's going to be incredibly expensive to remediate. If you'd discovered that defect at the very beginning of the manufacturing process, at the source, you would avoid all of those remediation costs and prevent the issue.

Thank you. A slightly different angle on that question: I noted that the scores were "weighted." Could you explain more about that, and maybe why certain decisions were made on different weightings?

Sure. Weights are typically very subjective, based on your organization. There are some weights that come more out of the box; semantics, for example, typically come with a confidence, right? You're 98% confident it's an email address. And then there are aggregations: you can also weight at an aggregate level. It's really subjective to your organization, what the data is, and how you want to score it. Scores themselves vary between graded measures (how good is the data, is it 70% good, is it 100% good?) and very binary checks, where the value either is X or it isn't. So it's really going to vary based on what data you're looking at, what rules you're applying, and how your organization is improving that quality.

Very nice. And do you see weighted scores by dimension for aggregate scores? For example, certain dimensions impacting the aggregated score more than others?

Absolutely. Some notions of validity may be more important than completeness. Having a valid address may be much more important than merely having an address, which is what completeness would measure. So weighting by dimension is definitely important.

Nice. And that is all the questions in the queue for right now, so I'm just going to give it a couple of minutes here. But Julie and Paul...

I'm sorry, there's one question in the chat, too. Ah, the glossary: how do you see integrating or leveraging a business term glossary in data quality?

From that perspective, we see the data catalog as the foundation, so you're cataloging everything you're doing quality on, and that catalog is also integrated with the business glossary. And we see that as very important, especially as we get into AI again, but even in plain data quality. Typically you'll see references to technical assets, the assets in your data sources and databases, but then you have the business lens: your glossary. For example, a valid shipping address is different from a valid marketing address; one has to be 100% accurate, while the other can be anywhere from zero to 99%. So adding the context of a business glossary is gaining importance and becoming very important for data quality. That's where we talk about context:
where is this data being used? How is it being used? Answering those questions.

Perfect. Thank you both so much, and thank you for catching that question in the chat; that was a great one. Thanks to everybody, and as you can see, there's a great QR code up on screen for you to learn more. I love it. Also, Precisely will be presenting their data quality products at our April Demo Day, specifically focused on data quality, so look forward to that as well. And just a reminder again: I will send a follow-up email by end of day Thursday with links to the slides, links to the recording, and anything else. Thank you both so much.

Thanks, everybody. Thank you. Thanks, Julie and Paul, hope you have a great day. Thanks to all our attendees; hope you likewise have a great day. Thanks, everyone. Bye-bye.