Hello everyone, and welcome to our next DDW session, Data Trust: Gain Understanding and Confidence in Your Data. It's going to be presented today by Jeff Brown, Director of Data Quality and Analytics at Infogix. All audience members are muted during these sessions, so please submit your questions in the Q&A window on the right of the screen, and our speaker will respond to as many questions as possible at the end of the talk. Please note that there is a linked form at the bottom of the page titled the DDW Conference Session Survey. This is where you can submit session feedback, and we encourage you to do so. Also, there is a small icon to the lower right of the screen, which will enlarge this window with the speaker and slides. So with that, let's begin our presentation now. Thank you and welcome, Jeff.

Thank you, Eric. So today we're going to talk about data trust and gaining an understanding of, and confidence in, your data. My name is Jeff Brown. I am the Director of Product Management with a focus on data at Infogix. The agenda we're talking through today is the concept of trust signals. We'll be talking about some of the external trust signals that we run into in everyday life. What are some key questions to ask in your organization as they relate to data and trust? And what are some of the trust signal components that you can embed in your organization to help you gain trust? Trust, obviously, is a major component of data and of being a data consumer. Gartner has a quote here that is relevant to data trust; basically, data trust is something that helps business operations. It helps them run more efficiently and more smoothly, and it helps us make better decisions faster with greater confidence. As we think about trust, we really need to think about how we recognize trust in our everyday lives and how we can then apply it to the data we consume. Oftentimes we are consumers of products and offerings. 
We have relationships that we build trust in, but how are we actually applying this to the data in our organizations, and especially to our consumption of that data? How are we relating these two in our lives? So take, for example, our everyday life: I was introduced as Jeff Brown, Director of Product Management. How do I project trust? How do I build trust with you? Sometimes when we do that, we might say, well, there is a LinkedIn profile, so you can verify that I'm a real person. I am extending an olive branch to say, connect with me. Let's build this relationship. Let's build this trust. You might say that's still not enough. So what are some additional trust signals? You might look at trust as a lineage or a pedigree or a data provenance, so that you can say, well, maybe education factors into building trust in who I am and building the trust between you and me. Maybe there are certifications that you might look at: Pragmatic Institute, Toastmasters. These are all trust signals that you use to build trust and to say, do I trust this person, this strange face in a circle on a screen? Well, maybe there are some badges. Maybe you don't know that I'm Certified Fresh on Rotten Tomatoes. I'm not, but there's a trust factor involved there. And then finally, maybe you use titles as a brand of trust. If I had a different title, maybe you would trust me then. So, titles such as King: there are these little trust signals that we build into our everyday lives to assess and compile trust. And how do we do that with data? One of the things I like to do, as I was thinking about how to relate the trust signals in our everyday lives, is to build on the food that we consume. If you are a data consumer, you are evaluating the data that you want to consume for your purpose, to make sure that it is fresh and that it was manufactured or created at a trusted source. 
You also want to know that this data hasn't been tampered with. Here's kind of a gross example; I think this is baby food that has a sketchy type of tape around it. You also assess trust in the things we consume in our everyday lives by spotting things like misspellings, or names that don't fit the mold of the trust that you have in your head. Here's another fake type of trust thing. This is a "YearQuil" that I pulled up. I thought it was kind of funny. Instead of DayQuil, it's YearQuil, for ending 2020 fast. So again, it's these things where you look at something and, out of intuition, you have your own sense of trust set up. If we look at quotes and at offerings, there are some things there that we also build into trust. This might be too good to be true. If we look at data, maybe the quality is too good to be true. Maybe it's something that you don't trust completely because of its origin, or because of a quote around it that you just don't have trust in: "Trust me, I've been using this for years." Or take our everyday interactions; we have trust there too. We pick up a phone call. Do I recognize the number? Do I not recognize it? This is a funny example where you yourself are calling, right? So you're asking, do I trust this number enough to build a relationship? Or you're picking up the phone and asking, do I trust the other person who is asking me for information? As data consumers, we're constantly trying to evaluate this, and relating it to our everyday lives is going to be critical. Think about websites. We visit websites every single day; before you came to join me, maybe you were on a website. You look at those websites and you try to understand: maybe I don't trust that one because the spelling is off. This was kind of a funny website. Dihydrogen, what is it? Dihydrogen monoxide research, which is really just H2O. 
So you look at it, and that kind of seems funny. As data consumers, we're looking at validations for trust. We're looking at trust scores. When we're on websites, we also look for trust signals in the badges that they bear. There's a process that this website maybe went through in order to build consumers' trust, so that they will hand over critical credit card information and buy from it. So as data consumers, for whatever reason we're consuming data, whether you're a data scientist building out a data model or you're in the finance industry building out a quarterly report, you want to be able to take all these different trust signals, combine them, and say: can I really trust the data that's being used? And can I prove out the trust that is being presented in front of me as well? So there's this set of questions that I encourage you to ask yourself, and to ask of the data before you consume it, just as you would in your everyday life. Ask those simple questions. Does this smell okay? What's the expiration date? You can relate that to data as well if you're a consumer of data. You're looking at how well defined and curated the data is. Is there an ability to find the data, to centralize the data? This field that I'm looking at, is it critical? Is it deemed critical by the data stewards? Can I then hold the data stewards who are related to or own the data accountable? Do I trust the data steward who is actually looking over and curating that data? Where does this data even come from? How else is this data being used? Where is it originating from? How fresh is this data? When was the last time this data was updated? So keep asking those questions: before I use this, when was the last time this data set was updated? I've been using it for five years. 
Is there something that might be more recent? And depending on whether you're putting a data set into, let's say, a marketing campaign or into an analytic model, there might be a difference based on consumption. If you're looking at how you're consuming the data, freshness may not be as relevant if you're, let's say, a data scientist and you're just looking for a large data set. However, if you are looking at marketing campaigns, contact addresses, people's addresses, the freshness really is important to you and to how you are consuming it. Then we get into another layer of the questions that you really should be asking of your data. You can find it, you can see how fresh it is; now, how does quality build out that trust? What is the quality of the data? What is this quality score that I'm being presented? How is this quality score being calculated? How is this quality score being weighted? Oftentimes when we talk to customers and to organizations that are setting up data governance programs or data quality initiatives (not programs, though they should be programs), they talk about one data quality score for all. But that may not be relevant to somebody who is looking for something different in a quality score. For their purpose of consumption, they may not care as much about a certain metric, or they may think that something should be weighted differently. So if you think of something like completeness of a field or completeness of a data element, that might be important to everyone, but it may not be as important.
It may hold less weight for a data scientist, or for somebody who's just looking for a sample set of data to get an idea of sales by region. Or somebody might say, I would calculate my rolled-up data quality score slightly differently: I would have completeness, but consistency and accuracy and patterns are very important to me to make sure that my data is of the highest quality as I consider it. So really, when you're looking at silos of data and at the consumers of that data, data quality is extremely important. It's not just important that you can find the data; it's also important to be able to drill down to the next level of quality and apply it. And then finally, if you're looking at a set of data that's being curated and has metadata around it, are there other people who are looking at this data set too? Has it been reviewed? Is it credible? It's just like a product that we would look at on Amazon or on XYZ site: let's look at some of the reviews. Oh, I'm looking at a shoe. A review says that the shoe typically runs too small. Okay, well then let's see what else that person has rated or reviewed. Or you're looking at a review and asking, is this good, is this bad? Oh, they rated it for something you're not supposed to use the product for, so I'm not taking their review into consideration. They're trying to use a baseball bat as something to catch spiders. In data terms, that's something that isn't relevant to my need: I need this data set for my financial report, for my audit trail of XYZ, when really, no, this is just a data set that somebody found and wanted to share with the world inside of the catalog. So these are the types of questions that you should expect answers to in order to build out your data trust. And again, if you are a data-driven organization, you really want to be able to answer these questions. 
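The point above about one rolled-up quality score not fitting every consumer can be illustrated with a small sketch. This is not how Infogix or any particular product calculates its scores; the dimension names, the scores, the consumer types, and the weights below are all hypothetical and only show how the same measurements can roll up differently depending on who is consuming the data.

```python
# Hypothetical dimension scores for one data set, each between 0.0 and 1.0.
DIMENSION_SCORES = {
    "completeness": 0.70,
    "consistency": 0.95,
    "accuracy": 0.90,
}

# Hypothetical weights per kind of consumer (illustrative numbers only).
CONSUMER_WEIGHTS = {
    # A data scientist sampling the data cares less about completeness.
    "data_scientist": {"completeness": 1, "consistency": 2, "accuracy": 2},
    # Regulatory reporting weights completeness heavily.
    "regulatory_reporting": {"completeness": 5, "consistency": 2, "accuracy": 3},
}


def weighted_quality_score(scores, weights):
    """Return the weighted average of the dimension scores."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(scores[dim] * w for dim, w in weights.items()) / total_weight


if __name__ == "__main__":
    for consumer, weights in CONSUMER_WEIGHTS.items():
        score = weighted_quality_score(DIMENSION_SCORES, weights)
        print(f"{consumer}: {score:.2f}")
```

With these made-up numbers, the same data set scores higher for the data scientist than for regulatory reporting, because the weak completeness score counts for more in the second weighting.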
You should be able to answer these questions because, as we talked about in the beginning, building out data trust helps business operations. You become more efficient, and it helps you make better decisions faster. So again, these are all characteristics of an organization that can benefit from trusted data and build that trust into its consumers of data as well. So, what are some of those critical factors? How can we bucket some of this information? When we talk about data, we ask: is this data discoverable? If you think about an entire enterprise of data, an entire ecosystem of data, you want to know whether this data is even discoverable. What is out there? On discoverability, you can hear the questions within your organization. If we were in a physical room right now and I said, raise your hand if you've heard this before from your stakeholders, they would say: I'm not even sure what data we have or where it's at. There's not even a single source of truth that I can trust. We often hear that when we talk to large organizations too. I don't know where it's at. I don't know what we have. Even if I know where it's at, I don't trust it. There's no real centralized location to catalog and curate metadata. So you need to know where the data set lives and where the data sources live. Whether it's business data, a dictionary definition, or policy information, you need a centralized location to catalog it, to curate it, and also to tag some of these assets. When you're looking at discoverability, you need to be able to say: this data looks like other data, so I'm going to tag it. This data looks like a social security number; let's tag it, so that when you go in and say, I need a list of social security numbers of XYZ, or I need a list of customer IDs that's tagged as customer ID, it's there. 
You can then trust it because it's been tagged based on different metrics and different standards before it gets to you, so that you can easily consume it. Once you've found the data, let's make sure that you can measure it, or that there are some statistics around it. You know what's there; you've found the product that you're looking to consume. Let's have some measurements around it. What are the calories on it, so to speak? What are the metrics that you've established as an organization to build out trust? We typically hear: we have the data, we know where it's at, we have lots of data, but the problem is I have no clue how good this data is. That's kind of scary, right? That's like going to a supermarket and not knowing how fresh or how good the food is that you're looking to consume. So as a data consumer, you really need to understand what metrics are being measured on the data that you're looking to consume. Are they relevant to me, and are they relevant to the type of consumption that I'm going to do with that data? So you look at the established metrics: is there data quality? Can I understand how good that data is? And then you also want to look at the quality of the governance, or the governance score, around the data set that you're looking to consume. What we mean by a governance score is: is there a data owner? Is there a definition? Is there a certification around the metadata, around this representation of a data set that I have, let's say, in my catalog? You want to know how well it is being curated and looked after within your organization. Is there a score there? Then you unravel it and go to the next level: what's the quality of the data? So I know how well it's represented and I know how well it's being taken care of. But then for the quality, what does that look like? 
Let's peel it back and actually get our hands on that data to see what some of the quality metrics are. Let's even look at some of the metrics around profiling of the data. What are the ranges? What is the overall validity of the ranges that are in a certain data set or a certain data element, so that I can better understand the metrics and how they fit my data purpose? Let's also look at traceability. Think back to the traceability and lineage of myself, or even of a piece of food. The ultimate data lineage might be farm to table, right? You know where it's being raised. You know maybe what truck it's on and what restaurant it ends up at, all traced back. In an organization, we often hear: I can find my data, I can see how good it is, but I just don't know where this data comes from, or where it's going downstream. You want to be able to say: I'm tracking the lineage of my data. I know where it came from. I can see that it came from this source. It came from Salesforce. It came from this field, customer ID. It then goes to, let's say, a data warehouse, then it goes to a data mart, and eventually it ends up in the data lake on our cloud, so that you can track its lineage. Maybe it's an aggregation of data as well, so you can see how this data is being aggregated. Maybe it's four different sources. And then there's also impact: where does it go from here? What else is related to it? And how do we then determine any related policies and business terms for this data, so that you can see that it's traceable as well? I see a question came in, and I'll touch on that at the end when we have a few minutes. 
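The lineage path described above (Salesforce, then a data warehouse, then a data mart, then a data lake) can be sketched as a small directed graph. This is only an illustration under assumed names: the asset identifiers and the `LINEAGE`, `downstream`, and `upstream` names are hypothetical and are not taken from any catalog product.

```python
from collections import deque

# Hypothetical lineage edges: each key feeds the assets in its list.
LINEAGE = {
    "salesforce.customer_id": ["data_warehouse.customer_id"],
    "data_warehouse.customer_id": ["data_mart.customer_id"],
    "data_mart.customer_id": ["data_lake.customer_id"],
    "data_lake.customer_id": [],
}


def downstream(asset, edges):
    """Return every asset fed by `asset`, in breadth-first order (impact)."""
    seen = {asset}
    order = []
    queue = deque([asset])
    while queue:
        current = queue.popleft()
        for nxt in edges.get(current, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


def upstream(asset, edges):
    """Return every asset that `asset` is derived from (where it came from)."""
    # Reverse the edges, then reuse the same traversal.
    reversed_edges = {}
    for source, targets in edges.items():
        for target in targets:
            reversed_edges.setdefault(target, []).append(source)
    return downstream(asset, reversed_edges)


if __name__ == "__main__":
    print(downstream("salesforce.customer_id", LINEAGE))
    print(upstream("data_lake.customer_id", LINEAGE))
```

Walking downstream answers the impact question (where does it go from here), and walking upstream answers the provenance question (where did it come from). A field aggregated from several sources would simply appear as a target under more than one key.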
So then let's also understand some of the trust signal components that were mentioned. Within discoverable, measurable, and traceable, what are some of the components that help build out discoverability, measurability, and traceability? When we look at discoverability, we're talking about things like definition. We're talking about things that help build trust and credibility: we found it, but what are some of the accreditations that are associated with it? Is there accountability as well? When we look at measurability, we're looking at data quality metrics and at governance scoring, so again, we're looking at how this relates to the measurement of the data. And then finally, for traceability, we're looking at impact, lineage, and related assets. We're also looking at workflow for traceability, meaning: what happens if something changes? Is there something to alert me, as a data consumer, that the data is changing? Because I am now depending on it, and I'm trusting it based on, let's say, a previous value of quality or a previous value of governance score. And then last, social proof: what are some of the comments, and maybe the votes up and votes down, on the data that you're looking to consume?

When we talk about trust signals at Infogix, we offer a product, or a platform, called Data360 that offers data quality, data governance, and data analytics. When we talk to our customers, we talk about how they are building out their trust signals, and how they are building out trust and engaging the consumers to build out trust as well. So we look at things like certification. We've got contact email address in our data catalog, and we are building out a definition; we're making sure that there is a valid description. We're looking at things like governance scoring. We're looking at sensitivity: do we have a sensitivity level associated with it? Oh, this is proprietary. Now I know, as a consumer, that this may not be something that is fit for my
use. Or let's say it's public. Is this PII? Oh, this is actually PII; I actually don't want to look at this. Moving forward, is it tagged? This also looks like it's been tagged for marketing: contact email address. Okay, so let's see that there are relevant tags on it. I'm building out trust to say, is this the right data set that I'm looking for? Is this the right data element that I'm looking to add? Is there an owner? There's a data steward assigned to it. Okay, good. So there are all these key signals that you would use in your everyday life; as it relates to data, you need to start asking these questions.

When we look at data quality and metrics, what is the actual quality of said contact email address as a data set? We're looking at things like completeness. Does this completeness score fit my use? Maybe it does. What about consistency? How consistent are the fields across this email data asset type, in order for me to then trust it for consumption? We also look at things, as I mentioned, like governance scoring. How well is this being looked after? What are the weights associated with this scoring? In this example, you've got contact email address, and a business owner has not been assigned. Maybe that's something I should take into consideration. But you know what? A data steward has been assigned to this asset to look after it. There's a description that's been populated. There's a status: it has been certified. That all feeds into this governance score, which may be to my liking, and again, it builds onto this scoring metric. Maybe this is a new score. Let's look at scores over time to see whether it's been fluctuating, or whether it's a pretty static high score or a pretty static low score. Maybe somebody's forgotten about it.

We also look at things like impact. When we take contact email address, in the center here, what else is it related to? Is it related to some data quality checks? Okay, that's good. What else is it impacting? Well, it's also
coming from, let's say, an SAP field. What are some of the other areas? What are the relationships that are built on it? Looking at lineage: where downstream is it going? Are there active alerts around my lineage, to say that the asset I'm looking at, contact email address, is actively being managed inside of my lineage, so that if something falls off or something changes downstream, I can be alerted? There are active alerts related to it. What are the related assets for this data before I consume it? Oh, it looks like there are some automated tasks associated with it. There are checks for completeness; there are checks for validity.

I'm also looking at things, again, like workflow. Who gets notified? What are some of the checks and balances in place to see if there are threshold violations? Because as a data consumer, I want to know if there's bad data inside of this data set, so that if it falls below a certain level, I'm notified. What happens if something changes? Who is being notified? What are the different steps, and what are the different threshold limits that are involved?

Going back to social proof as a trust signal: what are some of the signs of social proof? We've got Chris Reed here, who's looking at it. Okay, he's made a comment. Michael Ortman's made a comment on it. Okay, they're asking, is this the right contact email address for my consumption? So again, is there activity around it, or is it stale? Has nothing been touched on it? Has it not been uploaded? Have no comments been added to it?

So as we look at key takeaways for our trust signals and how we apply them: trust signals vary depending on your consumption and fit for purpose. Again, what is right for your consumption may not make sense for the data modeling that I want to do as a data scientist, versus someone looking, for a marketing campaign, at a set of email addresses, a set of
customer lists. Maybe it's not filled out; that's okay. But again, that may not be okay for regulatory reporting. So what are the trust signals for each of those lines of business and consumers? What are the trust signals that make buying easier, that make decisions easier? If you treat your consumers of data, your businesses and your business users, as buyers of data, how can you build out those trust signals so those buyers trust the data and it's easier for them? It reduces time spent by IT, and it reduces time spent making a decision, because you know you can trust your data. And then lastly, trust signals empower data-driven organizations. If you've got trust signals in place to empower your business users, you will then become a data-driven organization.

So I'm going to open it up for questions here. And again, trust me, I've got the credentials, I've got the title. No, I don't; I'm just a person speaking at you across the computer. If you have questions, please feel free to submit them to the Q&A box and I will get to them. We'll take some time now to talk about it.

I see that a question has come in: you mentioned the importance of metrics and data being measurable, but will you please explain the role of requirements in your understanding of data quality? So again, there are lots of different requirements and metrics and dimensions of data quality that are important. When you look at strict business requirements, those also play an important part in data quality. There are some generic metrics that you can get from profiling, but there are also business-centric requirements that are extremely important to this as well. That gets again to fit for purpose around the metrics that you want to build out with your data governance and data quality program, to make sure that you can communicate those requirements appropriately and get them into the trust signal that you want to convey. So a trust signal might be: I need the first three digits of
customer ID to be numeric, and they're actually alpha. That's a business requirement. Completeness might be a passive type of requirement, but it's actually a key piece of the requirements and metrics as well. Also, about data quality metrics: one of the things that builds trust is making sure that they provide value. What you don't want to do is over-specify your data quality metrics. You want to ensure that the metrics you are tracking align to a corporate strategy, to a corporate goal, so that you can directly say, we are measuring what is of absolutely critical importance, and you're not measuring things that might just be noise.

And then, let's see, we've got another question: does the tool require the business to define data quality specifications to reveal data quality issues? So you can have the business, let's say data stewards, define them in a governance manner: I want to say this about this field, or I want to provide different thresholds. You can definitely provide those different types of data quality measurements and different data quality thresholds inside of the tool. Or you can also present profiling statistics. And we also have an out-of-the-box set of data quality rules, where we find business users say, let's start here. For email address: it must be complete, it must have an at sign, it must have a proper domain. Those are the different types of data quality issues that can then be conveyed.

We also have: is there a template that can help guide building out the questions to define the trust matrix, per se? So we do. We have a governance framework that can help you best organize and prioritize those trust signals. Again, trust is the overarching umbrella here, right? You've got quality that plays into it, you've got governance that plays into it. But we have a set of professional services that provide a template that can give you exactly what you're looking for and help you prioritize based on, again, your fit for consumption as well. And
then, last question here: is there a guidance document on some of the types of trust components you've described? So again, this gets back to being able to define your trust. If you visit infogix.com and just search in some of the content that we have there, there are some documents and some white papers that help you build out and kick off governance programs and data quality initiatives. We don't have one, per se, that says this is the exact trust for you, because it is different, right? There could be a set of foundational requirements that are universal across organizations, but really it's asking of you: what are the trust signals that you can build into your organization, based on feedback from the business and based on feedback from the science program as well, to help build in those metrics and then to help curate all that metadata, so they can discover it, measure it, and then ultimately trace it as well.

Jeff, thank you so much. I am afraid we are out of time. Or should I call you Your Highness? We appreciate the great presentation. Thanks to our attendees for tuning in and for your great questions. Please complete your conference session survey on the page for this session. The keynote session with Danette McGovery and Ron Ross will start in about 15 minutes. Thanks again, Jeff. Thanks, everyone. Thank you.