Hello, everyone, and welcome to our next DDW session, called Discover, Enhance, Connect, Act: Modern Data Governance to Unleash the Value of Your Data. It will be presented today by Evan Tobin, technical evangelist from BigID. All audience members are muted during these sessions, so please submit your questions in the Q&A window on the right side of the screen, and our speaker will respond to as many questions as possible in the time available. Please note that there is a linked form at the bottom of the page titled the EDW Conference Sessions Survey. This is where you can submit session feedback, and we encourage you to do so; it really does help. Also, there is a small icon at the lower right of your screen which will enlarge the window with the speaker and slides. So, let's begin our presentation now. Thank you all, and welcome, Evan.

Thanks, Eric. So, let's get started. I'm Evan. As a brief overview, I'm BigID's technical evangelist, so I help companies bring BigID's data to external systems: bringing the discovery we'll talk about today into whatever other systems you're using as an organization. We'll see some of that at the end, because I bet you're interested not just in using a data governance framework to get information about your data, but also in how to bring that information to the 30 other tools you're already using. First of all, I'd like to congratulate all of you, because presumably you work in data, and data right now has amazing job security. Organizations are rapidly adding new types of systems to help them fulfill their business needs. With COVID-19, there are new systems for new kinds of machine learning, especially self-serve machine learning and data analytics, and organizations are adding these new systems everywhere.
They're adding them across country boundaries, across types of data sources, types of applications, even types of files: people now expect you to be able to tell what data is inside an image file. And to compound all of that, there are now all these data regulations. Gartner says that by 2023, 65% of the population will be covered by one of these data privacy laws. That alone is going to multiply roughly tenfold the work of governing properly, knowing what is inside these systems, and making sure everything meets your compliance standards. And that is a task that is simply too big for a human being. If you have the thousands of different databases and thousands of different file shares that many modern organizations do, it is not possible to keep up manually; there are too many systems and too much data. So what we've created is a four-layer approach to finding the data inside all of these systems, not just using naive methods like looking at column names or manually linking things. You still need to approve and deny results, but the approach actually mimics the process humans go through as they see and recognize patterns, and it gives good insight from that. That's what we're showing you today: four different lenses that you can combine to look at your data the way a human would, but at huge scale. So you're not just going to be able to process your business glossary for 10 databases; now you can do it for 10,000 databases.
The first of those four lenses for processing your data is the catalog. It takes all of the information that you would usually keep in a spreadsheet or some manually written business glossary, all the information you would have had to track down, get credentials for, and talk to 30 different people about to get proper database access, and it grabs that information from your data sources automatically. It will tell you the column names inside a database and the type of data inside those columns. There has historically been a problem, especially with older systems, where column names are just column one, column two, column three. The catalog sees past that: it will tell you there are emails in a given column. It will also check things like access control and show all of that information in one place. Our next lens is the ability to classify information. This is not just cataloging the information but being able to say, okay, we have this random file or this random column; what really is it? That goes beyond regular-expression classification, beyond saying I can tell this piece of text is an email because it looks like an email, to saying I can tell this file is an invoice because I have a machine learning model that knows what an invoice looks like, and this file matches it. On top of that, it can classify inside those unstructured pieces of data: okay, we have our invoice; what data is inside that invoice? That is something organizations increasingly need in order to comply with data privacy laws and compliance standards, and moving data between countries is a newer, very big driver.
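To make the regular-expression end of that classification concrete, here is a minimal sketch of pattern-based detection over a piece of text. The patterns are toy examples invented for illustration; they are not BigID's shipped classifiers, and production email and phone detection is considerably stricter than this.

```python
import re

# Toy regex "classifiers", purely illustrative. BigID ships its own
# pattern library (plus ML and NLP models); these are simplified stand-ins.
CLASSIFIERS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "us_phone": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def classify_text(text):
    """Return the names of all classifiers that match somewhere in the text."""
    return sorted(name for name, pattern in CLASSIFIERS.items()
                  if pattern.search(text))

found = classify_text("Invoice 118: contact ana@example.com or 555-010-4477.")
```

The machine-learning side described above (recognizing that a whole file "looks like an invoice") is a model trained on examples, which a snippet like this can only gesture at; the regex layer is the part that reduces to a few lines.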
So it can look inside documents and say, okay, there is a phone number inside this document, even if that document is stored as an image. Cluster analysis is our third lens. Cluster analysis lets us say, okay, we have millions of files in our file shares, and those could be cloud file shares, SMB or Windows file shares, whatever they are. It takes the data inside those file shares and finds files that are similar. Maybe they're copies, and it will tell you, you have duplicate data here; you might not want to waste storage costs keeping 20 copies of a single file. It also groups those similar files together. Say we have a folder containing 1,000 files and all of those files are similar, so they must contain similar information. If the folder is named invoices and the files inside look like invoices, you can say, okay, all of our invoices look roughly like this, and then when a request comes in for that type of information, you know it lives in a lot of our invoices, so you should probably check the invoices. Finally, and what I think is the coolest of these four lenses, is the ability to correlate information. What this means is that we can take all of your different data sources and not just say, oh, you have a bunch of files, a bunch of email addresses, a bunch of Social Security numbers and bank account numbers, but really say: this bank account number belongs to this specific person in the real world. And it doesn't have to be people in the real world; I know some people have used correlation for things like part numbers.
For machine part numbers, some people have used our correlation to say, give me all the data on this part number, and it will find all the files and everything else related to that machine part. So correlation is the ability to find information that belongs to something that exists in the real world, or the virtual world for that matter. Those four lenses are what we call our four essential services: catalog, classify, cluster, and correlate. That is the BigID platform. On top of that is whatever you want. A lot of my time is dedicated to teaching people how to create their own data protection and data privacy apps, as well as how to use the many apps available through our marketplace and a few other places, which you can download and run in your environment. It really is the case with BigID that we provide these four services, and everything on top of that is either an app we've built, an app one of our partners built, an app a customer built, or something you wanted and built yourself. We've really enabled people to take these services wherever they want, and people have liked that so far. On my screen here you can see what it looks like when you log into BigID. I've logged into my BigID training environment, and this is the first thing you see: a brief view of those four services. In the menu on the left-hand side there are individual pages for each of the four services. So let's click into correlation, and I can show you a little of what correlation is all about. Here in correlation we have a list of attributes. These are attributes about the things we're tracking; as I said before, you can track people, and it seems like here we're tracking people, because we have user IDs.
And then it also looks like we're tracking songs, because we have song IDs down here as well. So we can say, okay, we have this user ID; let's see where else that leads us. If I enable showing this table here, we can see our user ID, and it seems our people are stored inside this users AP database table. If I enable profiles for a slightly deeper look, we can see that our people's user IDs are also inside these profiles, and that traverses down into a subscription identifier. Correlation, as I said before, tells you: you have this person, this thing in the real world; where is their data? This correlation screen is a good way for me to illustrate that: if we're looking at one piece of information for a person, what systems is it in? So correlation tells us where this person's data is. Once we have that data, we can classify it. We know this data belongs to this person, and now we can say what type it is. This environment has 212 different classifiers loaded. Classifiers can be created by you, or there are 200-some preloaded ones we've already made for you. So I could click on country, or let's click on something closer to PII: US Social Security numbers. We're storing US Social Security numbers, and it looks like we're storing them on a server located in New York, and they're getting transferred to an application located in Portugal. In our case there isn't really a problem here; maybe we just decided to use a cheap Portuguese web hosting provider. But countries such as China and Russia have restrictions on transferring data between countries, and so does the EU, for that matter.
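As a toy illustration of that cross-border concern, a residency check over discovered data flows could look like the sketch below. The allowed-transfer pairs are invented for the example; this is not BigID's policy engine, and real rules depend on the regulation in question (GDPR, China's PIPL, and so on) and your own compliance posture.

```python
# Invented allow-list of (source country, destination country) pairs.
ALLOWED_TRANSFERS = {
    ("US", "PT"),  # e.g. a New York server feeding a Portugal-hosted app
    ("US", "US"),
}

def flag_transfers(flows):
    """Return the flows whose (source, destination) pair is not allowed."""
    return [f for f in flows if (f["src"], f["dst"]) not in ALLOWED_TRANSFERS]

flows = [
    {"attribute": "ssn", "src": "US", "dst": "PT"},
    {"attribute": "ssn", "src": "US", "dst": "CN"},
]
flagged = flag_transfers(flows)  # only the US-to-CN flow is flagged
```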
You're going to want to be able to see: I have this person's data at this physical location; where is it going? And if I scroll down even more... well, there are no people for that one, so I'll just go back up to email. Scrolling down further, I can see exactly where that information is. It looks like we have emails inside this relational database, emails inside this relational database in this specific file, and email addresses inside this image. So it calls out exactly where this information lives. We also have cluster analysis, which, as I said before, looks at files and tells you information about them in aggregate. It looks at all of the files we have that are similar. Here we have all these files that are artwork agreements, and it's going to say, okay, we have these 30 artwork agreements, and these files are pretty similar; what is an artwork agreement to our organization? What does an artwork agreement file mean to us? And it's going to answer: an artwork agreement file, to your organization, means a file with emails, some residency information, URLs, full names, and all this other information inside it. If there were duplicates, it would say, hey, why are you storing this many copies of this file here? It would call that out, so you could adjust your backup settings or where that information is being copied to, and cut those costs. That's called out on the cluster analysis screen as well. And then I like to think of our catalog service screen as a summary: a summary of all the information we found everywhere else. I've had a few people in my classes come up to me and say, I've been given the task of creating a business glossary, I've been at it for three months, we have millions of databases; how do I just get information about all of my systems?
How can I just say, this is the data inside every single system, and import that into some external system? This core catalog service is what allows you to do that. You have every single file we're scanning, and then all of our different databases, and if you look on the right here, all the attributes inside those databases. I can expand that and see, okay, this type of information and that type of information are inside this database. And of course we can export this to Excel to make it easier to read. Or you can click into one of them and see the different things we detected inside: the columns inside the database, what is in each column, and the confidence level, meaning what percentage of the values matched. So once you've completed those four services I talked about before, it's up to you; you run with it. What I mean by that is there are a few different ways you can extend BigID. The most common, I'd say, is the application framework. You can create your own app, as you can see in the upper right, or use all these apps we've created for you that are already deployed. And this isn't even all of them; I think I'm missing three or four apps related to data governance that are not in my environment. From there you can take that information and do things like upload files and have BigID tell you whose information was breached. I upload a sample data breach file, BigID scans it, and BigID says, okay, this database was breached, these users were affected, and this is where the data came from. So breach response notifications become a lot easier. That's just one example of taking those four services and branching out from them. I see a few questions, and I'll get to questions at the end, but they relate to the next thing I'm going over: where is this data coming from?
How do I connect, and what can I connect to? The answer is: anything. BigID takes the approach that, out of the box, we give you around 60-plus connectors you can use. You don't need to pay anything extra for these; they're simply installed when you get BigID. All of these connectors cover the gamut from Office 365 all the way to SAP, Slack, and ServiceNow, and if you just want to do CSV exports or something like that, you can do that as well. What if your data source isn't among those 60 options? We have a one-day class that I teach on how to build a connector. It really is about one day, or maybe a week depending on whether you need to get credentials and line up the business processes, but after that, programming them is very simple. You build a custom connector, as you can see right here in the middle, and you can now connect to that data source. And there are no programming-language limitations on building those custom connectors: you choose your own programming language and your own environment. I teach the classes in TypeScript because I like TypeScript, but you're free to do whatever you want. Beyond connecting to these different data sources to scan the data and give you data insights about them, you can also connect to whatever type of integration system you want. One sample integration that I think is very common is an integration with JIRA: if an alert fires within BigID, it creates a JIRA ticket. That integration is on our BigID community for download; it's just a BigID app. You install it, and BigID will automatically synchronize your information between JIRA and BigID, back and forth.
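To give a feel for what a custom connector does, here is a minimal sketch of a pull-based connector for CSV files. The interface names (`list_objects`, `sample`) are hypothetical, invented for this example; BigID's actual connector framework defines its own contract, and as noted above you can build connectors in any language (the class is taught in TypeScript, but this sketch uses Python).

```python
import csv
import os
import tempfile

class CsvConnector:
    """Hypothetical connector shape: expose schema and sample rows to a scanner."""

    def __init__(self, path):
        self.path = path

    def list_objects(self):
        # Treat the CSV file as one "table"; its header row is the schema.
        with open(self.path, newline="") as f:
            header = next(csv.reader(f))
        return {"object": os.path.basename(self.path), "columns": header}

    def sample(self, limit=100):
        # Hand back up to `limit` data rows for the scanner to classify.
        with open(self.path, newline="") as f:
            return list(csv.reader(f))[1:limit + 1]

# Demo against a throwaway CSV file.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as f:
    f.write("name,email\nAna,ana@example.com\nBo,bo@example.org\n")
    path = f.name

conn = CsvConnector(path)
schema = conn.list_objects()  # columns: ["name", "email"]
rows = conn.sample(limit=10)  # the two data rows
os.unlink(path)
```

The essential point is that a connector only needs to answer two questions for the platform: what objects and columns exist, and what does a sample of the data look like; everything downstream (classification, correlation, cataloging) is handled by the core services.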
We also have what are called metadata connections. There are a few of those, and all of them also take the form of an app: for example, interfacing with Collibra. The way I think of metadata connections is this: BigID presents a view of the real world, because BigID is scanning your actual systems, while Collibra and all these other data governance tools give you a logical view of your systems, that is, what you think they look like. So these metadata exchanges allow you to take BigID's data about the real world and reconcile it with the logical data your data governance organization has about systems. I know it's very common to finally finish documenting all your systems properly, and then the application gets updated and everything changes. So being able to keep that metadata connection between your documentation and the real-world view BigID provides, as the data changes, is very important. I think I'll go to questions now; that way we'll have a little extra time, and I can show you in the product where things actually are. Our first question: I'd be curious to see how I can track where my data goes, not just the data itself; how do you show me that? Currently, BigID does not provide its own lineage tool. We won't discover for you that this data flowed from this system to this system to this system; you can document that in BigID, but we're not going to discover that flow for you. However, we do have integrations with lineage systems that give you that insight: we can do a metadata exchange with a lineage system to enhance their data and then pull the lineage into BigID. The second question: how is the tool deployed, and what is required in order to gain this level of insight?
And how is it able to capture that a user has stored an email in the database, like I showed? First, how is it deployed: there are a few different deployment scenarios. The most common today is on-prem, with on-prem Kubernetes or on-prem Docker being the most common setups. However, there is also a button in the AWS Marketplace where you can just click to deploy BigID, and it will spin up an instance for you, so you can deploy it in the cloud. A lot of our customers go with the on-prem option, though. As for what's required to get that level of insight: the biggest requirement, I would say, is read-only credentials to the data source. If you're trying to scan a MySQL database, you need read-only credentials to connect to MySQL. At that point you click inside BigID to scan the data source, and it goes through our scanning process. There are a ton of different parts to the scanning process; I think we spend about an hour on it in our product class, and I can't cover it all right now because we're time-limited, but at the highest level, it goes through the scanning process and then produces the results. How is it sure that a user has stored an email in the database? During the scanning process, it classifies all the information it comes across using the classifiers installed in your system: this one is a machine learning classifier, this one is a regex classifier, and as you can see, I think there are around 200 in the system. It uses a combination of those regex classifiers and machine learning classifiers, plus natural language processing classifiers, run over all the data, asking: is this piece of data this type of data? No? Next one, next one, next one. I think that answers your question.
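The scan-and-classify loop just described, along with the confidence percentage mentioned earlier on the catalog screen, can be sketched roughly as follows: sample a column's values, run each classifier against them, and report the fraction that matched. The regexes here are toy stand-ins invented for the example; the shipped product also layers on machine learning and NLP classifiers that a plain regex cannot reproduce.

```python
import re

# Toy regex classifiers, simplified for illustration only.
CLASSIFIERS = {
    "email": re.compile(r"^[\w.+-]+@[\w-]+\.[\w.]+$"),
    "us_ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}

def scan_column(values):
    """Map each classifier to the fraction of sampled values it matched."""
    return {
        name: sum(1 for v in values if pattern.match(v)) / len(values)
        for name, pattern in CLASSIFIERS.items()
    }

confidence = scan_column(["a@x.com", "b@y.org", "not-an-email", "c@z.net"])
# The email classifier matches 3 of the 4 sampled values here.
```

A threshold on that fraction is one simple way a scanner could decide whether to label a column "email" despite a few malformed values, which is the kind of judgment the confidence level on the catalog screen supports.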
All right, Evan, it looks like we do have one more question that's come in here. Does this tool also have governance workflow capability, like a review and approval process for metadata changes or catalog updates? Workflow capability, kind of. There is workflow capability; I'll just bring it up. Under data processing and sharing, you can say, okay, I have this flow, and this is the flow I've said my data goes through. Then, when an update appears, it will say, oh, there's an update to this, and I need to get approval on it from someone: a new attribute was discovered; hey, Ellen, can you come check this out? It would send an email to Ellen, and it would go into your JIRA or whatever else you use. So that capability is there. I feel like you may be asking for something different, though, with an approval process for metadata changes, so if you could clarify, that would be helpful. Next question: has BigID been used as a tool as part of due diligence in M&As, as in mergers and acquisitions? I don't think so; I haven't heard of that happening yet, but I could see how it would be an amazing tool for that. I have even installed it locally on my network and, as a demo, scanned my own computer to find lost files. That's not what it's made for, but it works. So I'm not aware of that use, but I don't see why it couldn't do it. How am I defining classifier? For BigID, a classifier is one of three things: a machine learning model that says this file looks like this kind of file; a natural language processing model that says, inside a file, a name looks like this; or a regular expression saying that if a piece of data matches this pattern, then it's, say, a phone number. It's one of those three options that's run against the data.
And if it returns true, then that piece of data is that type of data: if our candidate phone number matches the phone number regular expression, then that piece of data is a phone number. That's how we define a classifier. Great. Oh, I think I missed one question, but you can email me at evint@bigid.com and I can answer your questions; I think that address is in the materials we provided for the conference. Great. Thank you so much, Evan. Also, once this video is processed and posted, the Q&A will remain live, so you'll be able to go back and answer questions in writing there as well. Oh, perfect. Evan, thank you so much for this great presentation, and thanks to all of our attendees for tuning in. A big reminder that the sponsor virtual exhibit booths are open today until 1:30 p.m. Pacific time and will also be open tomorrow, so please do take some time to explore the sponsors area. And please also complete your conference session survey on this page for this session. The next sessions will start in about 10 minutes, so we'll see you all there. Thanks again, Evan.