Hello and welcome. My name is Shannon Kemp and I'm the Chief Digital Manager at DATAVERSITY. We'd like to thank you for joining this DATAVERSITY webinar, which today is Understanding the Data You Have Before Applying a Governance Strategy, sponsored by Aparavi. Just a couple of points to get us started. Due to the large number of people attending these sessions, you will be muted during the webinar. We will be collecting questions via the Q&A section, or if you'd like to tweet, we encourage you to share highlights and questions via Twitter using the hashtag #DATAVERSITY. And if you'd like to chat with us or with each other, we certainly encourage you to do so. To find the Q&A and chat panels, click the icons in the bottom middle of your screen to activate those features. Just a note: Zoom defaults the chat section to send to just the panelists, but you may absolutely change it to chat with everyone. As always, we will send a follow-up email within two business days containing links to the slides, the recording of the session, and any additional information requested throughout the webinar. Now let me introduce our speakers for today, Daryl Richardson and Gary Ling. Daryl is Chief Evangelist at Aparavi and an expert in data management. He brings over 40 years of experience in the IT space and 20 years in data management, focused on helping businesses manage their data with available tools and strategies for proper data retention and governance. Daryl is passionate about building innovative, intelligent software at Aparavi. He brings a wealth of knowledge across industries, from financial services and federal government to privacy and health care. Gary is the Chief Marketing Officer of Aparavi Software and a senior executive with extensive experience trailblazing business growth and leadership in the enterprise software and data infrastructure markets. Gary has worked with both large and early-stage organizations, developing business strategy and product direction and executing go-to-market strategies. He's also well known for his speaking and his motivation as a market visionary. And with that, I will turn everything over to Gary and Daryl to get today's session started.

Hello and welcome. Thanks, Shannon, great introduction. You speak quite fast; you should be a newscaster. Good job on those introductions. We're going to get going here. Daryl, anything you want to say before we start? No, I'm looking forward to this. This is something we're very passionate about here at Aparavi, and throughout my career, and I know, Gary, you're the same. Managing your data correctly has never been more important than it is today. So let's just jump right into this. I concur. At the end of the day, data is the lifeblood of every business. And when you look at the Fortune 500 today, so many people are doing some level of data analytics, predictive analytics, machine learning, deep learning, but it can only be as good as the data it's provided. One of the things we've noticed over our careers, and I've worked at various storage companies, NetApp, Dell EMC, and even SanDisk, on the actual storage layers and storage management, is that great technologies and great companies have innovated over the years.
But it's all been about storing the data: storing it, moving it, replicating it, backing it up, and so forth. And we're in a very strange world today with a huge influx of data security issues, ransomware, and so forth. But again, how can you protect information that you don't know exists? You've got to look at the data, the data usage, the data creation, the data lifecycle. Arguably, we're caught in a rut as an industry, a kind of data management status quo, where we blindly copy information from point A to point B without fully understanding its value. There are some great data integration tools out there, and great analytics like Snowflake in the cloud, all amazing, but they're only as good as the data that goes in. If garbage goes in, it can clog up the system and slow everything down. You can see some stats on this slide that are really shocking to some people, especially when they start to look at how their businesses operate. They use a CRM or ERP system, very much data in a structured world, in databases. But over the next several years, almost 90% of the data we're going to generate is unstructured. That may be a technical term to some people, but we're all part of the data community here; I think almost 900 people registered, and I see an uptick to about 230 people online now. So I'm not going to teach people things they already know, but clearly the number of files being generated is just tremendous, and they have a different lifecycle. You hear from IDC about the amount of video content that is generated and then disposed of. Or, I was listening to the CEO of Microsoft just the other week, talking about how they record every meeting, no matter how big or small, then convert it to text and documents and use it for insights and records. That truly is a powerful thing, but it means storing a lot of stuff. And who's to decide whether the data is good or bad, useful today or useful in the future? There are some crazy numbers out there. 95% of businesses cite this as a major problem. However, when you inspect the data itself in most enterprises, 50% of that data is completely unknown: the state of the information, whether it's operational or transactional, where it's located, it's just not known. And with our customers, and this is also an industry stat, we've seen anywhere from as little as 10% to as much as 48% so far; people are finding they have that much trash data. You'll hear the term ROT, redundant, obsolete, or trivial data. I just like to keep it simple: it's trash data. It's stuff you could just throw away and not worry about. So why are you plugging up the system, and why are you migrating data to the cloud, when 33% of your stuff is trash? It's because we've got outdated practices. We're fueling this massive redundancy of data, and it slows down innovation. So it's time to clean house. You can't have good machine learning or AI if you've got garbage data going in; you have to get rid of the waste. And it also has a massive impact on things like sustainability and greenhouse emissions, given the size of the data centers out there.
So it's getting worse. You can't protect what you don't know. One stat that shocked me when I found it the other week: poor data quality is costing the U.S. economy alone, not worldwide, just the U.S., $3.1 trillion a year. There's a lot of data out there; 180 zettabytes of data creation just makes the problem worse. And with IoT devices, commercial IoT as well, whether video or various sensors, 41.6 billion devices by 2025. That's only a few years away, and it's all unstructured data. When you look at the market, the number of players that have come out in the data ecosystem is incredible, and everyone is vying to store your data in their system, whether it's their servers, their NAS environments, their SAN environments, their cloud, or their application itself. And so the data is controlled by that application, which makes it very difficult to utilize across applications, across devices, and across users. So you really need a level of integration. If you look at the left side of this slide, you can see big data analytics. Those players really know about the data; it tends to be a lot of structured stuff, but unstructured data is being fed in too. What we do, and what I believe the market needs, is solutions that augment and integrate into existing data infrastructure. Why should you have to replace your backup environment or your cloud environment or your Isilon storage devices? Why can't you augment and integrate seamlessly, without causing any headaches? But most importantly, and one of the things we're doing as a company, is orchestrating data across your entire enterprise, so you have visibility into that data, its lifecycle, and more importantly its value, not doing silly little metadata searches, but actually integrating and understanding the content of a file itself. It really is a matter of building data intelligence and automation into your system as core technologies, working seamlessly and in harmony with all the investment you've made in the information that's out there.

Go ahead, Daryl; you know I get on a roll and no one can stop me. Go back to that last slide, because there's something else here that might resonate with the audience: every one of these products is licensed, right? Most of them are capacity-based, some are user-based. User-based licensing is dictated by how many people are in the organization. But capacity-based licensing is where data intelligence really shines as a necessary tool. If you can identify and remove 33% of your trash data, and the average enterprise today is hosting well over 450 terabytes, at roughly one million files per terabyte, you're looking at about 450 million files with 33% of that being junk. You can identify this and remove it, or move it to cold storage, even cold cloud storage where your price is about one cent per gigabyte as long as you don't access the data within nine months. Now you've reduced the cost of your backups, your archive tools, your e-discovery tools by 33% compared with just saying, this is what I have in my environment, let me license this much, right?
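To put Daryl's capacity math in one place: here is a minimal sketch, in Python, of the arithmetic just quoted. All the figures (450 TB, roughly one million files per terabyte, 33% trash, one cent per gigabyte-month for cold storage) are the webinar's round numbers, not measurements:

```python
# Illustrative capacity math from the discussion above (all figures are
# round numbers quoted in the webinar, not measurements).
GB_PER_TB = 1_000  # decimal terabytes, as storage vendors count

total_tb = 450                 # average enterprise estate quoted above
files_per_tb = 1_000_000       # rough rule of thumb from the webinar
rot_fraction = 0.33            # redundant/obsolete/trivial ("trash") share

total_files = total_tb * files_per_tb
rot_files = int(total_files * rot_fraction)
rot_gb = total_tb * GB_PER_TB * rot_fraction

cold_storage_per_gb_month = 0.01   # ~one cent/GB/month archive-tier pricing
monthly_cold_cost = rot_gb * cold_storage_per_gb_month

print(f"{total_files:,} files total, {rot_files:,} likely ROT")
print(f"{rot_gb:,.0f} GB of ROT -> ${monthly_cold_cost:,.2f}/month in cold storage")
```

Running it shows roughly 148 million candidate files and about 148 TB that could either be deleted or parked for around $1,500 a month at archive-tier pricing, instead of being licensed into every backup, archive, and e-discovery tool.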
So tools like Aparavi sit at the top layer of the data management pillars: intelligence, then backup and recovery, archiving and retention, and then your e-discovery tools, because we're going to help you reduce the amount of data that's there. Another surprising fact: that 33% of ROT or trash data holds an immense amount of risk. There's privacy data in there. There's regulated information. There's data that's old or stale and hasn't been accessed for a very long time. The mentality of keeping data forever is a huge issue, and it's why we're in this situation today: there haven't been tools that effectively evaluate this information and give you the ability to make data-driven decisions about the data.

Yeah, it's an important point. It's almost like the biggest risk to your data is your data itself. Right. Maybe that's something we should put in a fortune cookie. But I digress. Daryl will get into how you actually make this happen. Aparavi as an organization, the original vision, came from understanding, and almost getting frustrated with, this blind status quo of data management: moving blocks of data around without understanding their value. In a lot of cases, the data has been so controlled by either the system it's in or the application. So you have to be able to tie into the application itself, whether it's a collaboration app or anything else; you've got to be able to find that data and unlock it so you can make use of it and learn from it. Ultimately that reduces risk, and it can lead to cost avoidance but also cost reduction. If you can build an efficient data infrastructure where you can truly and proactively manage the data lifecycle, and I won't say ILM for those of you who have been around a while, but I just did, then you understand the data's relevancy to your business, you can control your costs, and then you can exploit that data's value. And I said a contentious word there: exploit. It's your data; you should be able to exploit it for its value, put it to good use. You can see some of the stats in the middle of the slide. These are not made-up stats or marketing stats; this is what some of our customers are achieving today, where they can literally reduce their costs and complexity by 40%. You reduce your data, storage, and compute, because you've got less data, less stuff flowing around the network; maybe you don't need quad processors and high-performance servers. The proof is in the pudding in terms of the numbers. Look at some of these figures from IDC and Forrester: 30% growth every year for companies that actually take advantage of data insights, so clearly data is fueling the analytics space. And the IDC quote there, around $430 billion in productivity gains for organizations that analyze their data, shows the phenomenal impact on a business of getting its data act together. So it really is about mastering the data, and I'll turn it over to you, Daryl. Is that what you mean by this? Yes: data mastery is a vision for the next level of data management.
Mastering the data is a term we're coining here because Aparavi, again, sits at this top layer of data management and collects all the information about your files and your data, giving you what you need to make an educated decision. Once you know everything about your data, you've mastered it, right? That's the key phrase we're going to use. And if you go to the next slide, Gary, we can look at some of the ways you can start to understand your data. What I would do as a data governance specialist, as I was for many years, is identify the low-hanging fruit first. The low-hanging fruit is the ROT data: your redundant, obsolete, or trivial information. This is data we've been storing for years, in some cases decades, and it has no perceived value to the organization whatsoever. A lot of ROT discovery is done through policies or classification, or simply your own general records schedule that says: I'm keeping data for this long, and anything after that, as long as it's not on legal hold, I'm going to remove. But the practice of actually removing it has been kind of taboo, because nobody wants to delete data; they think there will be adverse repercussions if a legal issue or request comes in later. So what we would do is say: through auditing and defensible deletion, we can identify the ROT based on its classification or contents or the metadata fields we're looking at, and then remove it. If we can't remove it because that's our policy, then we move it to cold storage and keep it at that cheap one-penny-per-gigabyte cost; just don't access it within nine months, or it moves to the next, more accessible tier depending on your cloud storage. Cloud storage is still a misnomer to a lot of people, but I think people are starting to understand how it's used. As far as ROT goes, the risk involved in it is, I think, the biggest challenge we need to identify. So the number one thing for me is to remove that ROT. ROT can also be old log files or audit files, where some application running in a debugging mode has created tens of thousands or millions of text files that are one, two, or ten gigs in size, because someone forgot to turn off the logging. All this does is clutter up your backup schedules and backup capacities, clutter up your storage, and get migrated from one storage platform to the next every three years. Every time we have a hardware migration, we never look at the data; all we do is copy and paste, or use the storage vendor's built-in migration tools to move the data over. It would be much more valuable to the organization to identify the data first, remove the stuff that has no value whatsoever, and then move what remains. And what you'll probably find is that your storage capacities are no longer growing at 20% every year. If we've been able to identify everything we can remove, and we've removed 33% of our storage, why am I now buying 20% more storage year over year, roughly 60% more capacity over a three-year period?
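To make that question concrete, here is a small sketch of the three-year projection, assuming the 20% annual growth rate quoted above. Note that compounded, 20% a year comes to roughly 73% over three years; the "60%" figure in the talk is the simple, non-compounded sum:

```python
# Three-year capacity projection at the 20%/year growth rate quoted above.
# (Compounded, 20%/year is ~73% over three years; "60%" is the simple sum.)
def project(start_tb: float, growth: float = 0.20, years: int = 3) -> float:
    """Return capacity after compounding `growth` for `years` years."""
    for _ in range(years):
        start_tb *= 1 + growth
    return start_tb

as_is = project(450.0)                 # migrate everything, ROT included
cleaned = project(450.0 * (1 - 0.33))  # remove 33% ROT before migrating

print(f"as-is: {as_is:.0f} TB, cleaned first: {cleaned:.0f} TB")
# as-is: 778 TB, cleaned first: 521 TB -> the delta is hardware never bought
```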
So that's how our storage situation got to where it is. The storage vendors love you guys, everybody out there who never looks at their data and simply copies it from one device to the other because it's beer-thirty and there's no time to look at it. Now we're in this decades-long pattern, and we've got this huge challenge of 450 terabytes of data to look through. So how do you do this?

Daryl, it's also the case that when you talk about data governance, some people automatically gravitate to: oh, that's a compliance thing. Well, no, because corporate data governance means governing your data proactively, and it's not all about the tools. You've got to set your policies ahead of time, sit down, write them down, make them enforceable, and actually make sure people are disciplined, whether that's the productivity workers or the IT department, so there's a data policy or data lifecycle management that people follow. Do you agree?

Yeah, absolutely. And that lines up with the question in the chat right now from Cindy. Cindy, I'm going to answer on this slide, because it's relevant to the difference between a data catalog, by definition, and the data discovery that Aparavi does. A data catalog is a collection of multiple sources of data. It could be structured, it could be unstructured; it could be a category of data you choose to collect. Aparavi collects everything. So if you look at a data catalog as a managed source of information, Aparavi would be the raw source of information, and you can make determinations based on what you find. It could be 70 fields of metadata: if all you're concerned about is locating, say, 1999-era versions of Microsoft Office Word version 10.0.0.10, you can find that, because all those fields are in the metadata. If you're a forensic investigator looking for the original creator of a file, the file might be 12 years old, but the copy you're looking at is only six months old, because it's been opened, re-saved, and renamed ten times. The original creator, however, is still a metadata field sitting right there. So from a forensic perspective, you can trace a problem with a file that's ten years old just by going back to the metadata. There aren't a lot of tools out there that collect all the metadata of a file. And in some cases, like PDFs, there are a lot of settings: can it be printed, can it be printed in low-res or high-res, can it be changed or modified, is there a security password, is there a password to make changes? All of these live in the settings of the Acrobat print driver, and every time you turn one on, it adds a metadata field, and we can collect it. But we're also collecting the contents of the file. So if you want to use Aparavi for enterprise search, bringing all the data into one view and then doing an enterprise-wide search of the unstructured data, you can fulfill a data subject access request, or a FOIA request if you're in the federal government, or a public records request if you're in a state or local agency. All of these requests can be easily fulfilled if you have all the data in one place. You can say: here's the email that says I need these five custodians; I'm going to look for these keywords or phrases, or maybe they've given me a lexicon of 30 different words, within these date ranges. And with a faceted enterprise search, which is what Aparavi has, you can build exactly that query in Aparavi's tool and produce a list of all the files that meet the criteria.
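The webinar doesn't show Aparavi's actual query syntax, so purely as an illustration, here is a generic sketch of the faceted query Daryl describes (custodians AND date range AND a keyword lexicon) run against an index of file records. The record fields and function names are hypothetical:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class FileRecord:
    path: str
    custodian: str
    modified: date
    content: str  # extracted text, as a full-content index would hold it

def faceted_search(index, custodians, keywords, start, end):
    """Generic facet filter: custodian AND date range AND any keyword."""
    hits = []
    for rec in index:
        if rec.custodian not in custodians:
            continue
        if not (start <= rec.modified <= end):
            continue
        text = rec.content.lower()
        if any(k.lower() in text for k in keywords):
            hits.append(rec.path)
    return hits

# e.g. a DSAR-style request: five custodians, a 30-word lexicon, a window:
# hits = faceted_search(index, {"jdoe"}, lexicon, date(2018, 1, 1), date(2020, 12, 31))
```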
Then you send that list back and say: okay, based on what you asked for, out of my 450 million files there are 30 that meet the criteria, and it's going to cost you $15 a file for me to produce them. How do you want to proceed? Most of the time, these requesters go away. Everybody's looking to catch you doing something wrong; they're not necessarily looking for something specific, unless there's a genuine legal issue involving a particular person.

Understanding your data through data classification is another area where Aparavi's data collection and data knowledge helps, because not only are we collecting the contents and all the metadata of a file, we're also classifying the information with over 150 prebuilt policies that are ready to go out of the box. These cover everything from the countries under GDPR to United States personal privacy data, from CCPA down to the newest policies coming out: Nevada has one, Massachusetts has one, Florida is working on one, and the federal government today is working on a GDPR-ish federal privacy law. All of these will be added dynamically, because it's a SaaS platform, and that's what we do.

Daryl, what you're highlighting there is that you don't have to be a lawyer to understand these rules; they're basically checkboxes to go find things, right? Say that again, Gary, sorry about that. You don't have to be a lawyer to implement this. Yeah. There's a reason there are no real professional services needed for Aparavi: as we're installing the application, we can train you on the product's use during the installation, and you can be a productive novice user on day one, hour one. The more you use it, the more comfortable you get. And I see we're getting some questions in here, but we're going to leave them until the end, because some of them are pretty easy to answer.

Daryl, we've both been in this space a while, and one of the things that really jumps out at me is that people automatically assume: oh, this is good, but it must be cost-prohibitive; I need to ingest all of this into the cloud, or make big investments in storage for indexes and so forth. We actually do in-place search. You don't have to ingest or move all this distributed data into a central location. We can find data whether it's in a NAS environment, a controlled productivity application, or an email environment; these are all accessible, and we can find it, analyze it, and provide insights into it without creating any additional cost or infrastructure needs. Yes, that's correct. And the data never truly leaves the environment: the in-place collector sits on a VM within the network. There are two reasons for that. Number one, for organizations that don't want their data leaving the organization, you obviously want to keep the collected data in one place. But it's also for performance, because we're localizing the collection of the data. And to Gary's point, we don't move, copy, or do anything with the data.
We're simply collecting everything and looking at the data as it comes in, allowing you to build reports or export them into business intelligence tools, or other tools that analyze the results from a graphical perspective, whatever you're looking for. If you have all the data in front of you, you technically never need to touch a file again unless you need to delete it, copy it, or move it, and you can do that with our product. Exactly: do the basics first, find the data, analyze it, stack it up and get it organized so you find out what you really have; then you can make the decision to remove it, copy it, or, as you said, automatically and based on policy, move it to the cloud. Yeah, exactly.

And if you go back to the previous slide, Gary, there's one thing here: the dashboard tells you a lot. Where you see the tiles, like the one showing 2.87 terabytes, those are called widgets, and you can customize them to whatever is important to you. If you're on the legal or regulatory side, you might say: I want to know where all my classified information is. One of the widgets shows all your classifications based on the policies you have active, all the data we've classified as something, and you can literally click on a policy and it will bring up all the files matching that regulation; then you can export the list or do whatever you need with it. Another important aspect of the platform, on the left, is that Aparavi is a natively multi-tenant application. What that means is we can segment organizational data: HR data, finance data, technical or IT data, engineering data can all stay separated, with specific data owners who can only see their own data. An HR person can't see the engineering data, and an engineering person can't see the financial data. But at the top level, labeled Aparavi here, if you're granted permissions there, which your compliance, CISO, or legal team typically needs at some point in order to search everything, you'd run queries from that top level. If you're a user of just one branch, East or West as shown here, you'd only be able to search the East data; you wouldn't be able to search everything else. Why that's important: if you're a managed service provider with 60 or so customers, you can install one instance of Aparavi with 60 different clients, and each customer has access only to its own data. You, as a legal person providing legal services for the organizations your company manages, can run these searches for them, and you can also perform regular audits of the information. Having the information in front of you is extremely important.
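To illustrate the scoping rule Daryl just described, here is a small sketch, assuming a simple tree of tenants named after the dashboard example (Aparavi at the root, East and West beneath it). This only models the visibility logic, not Aparavi's actual permission system:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One tenant/branch in the multi-tenant hierarchy."""
    name: str
    children: list = field(default_factory=list)

    def subtree(self):
        """Yield this node's name and every descendant's name."""
        yield self.name
        for child in self.children:
            yield from child.subtree()

root = Node("Aparavi", [Node("East"), Node("West")])

def visible_scopes(grant: Node) -> set:
    """A user granted `grant` sees that node and everything beneath it."""
    return set(grant.subtree())

print(visible_scopes(root))              # compliance/CISO/legal: all branches
print(visible_scopes(root.children[0]))  # an East user: {'East'} only
```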
One incident I can mention, and why we put this on the slide, is about finding rogue PSTs. A PST is just a container holding a bunch of emails. Back in the day, Exchange mailboxes were quota-based, so everybody had to empty out their mailbox every once in a while when they got those emails about being over quota. So a lot of people would create PSTs on a monthly basis, which simply takes all the email, puts it into a container, offloads that container somewhere else, and then removes it from Outlook, clearing your Exchange mailbox. But the problem is that PSTs have now become like viruses: they're everywhere. I can say that the largest healthcare provider in the country, which is a federal agency, and I'm pretty sure that out of 240 people on this call, 239 can figure out who that is, had well over four petabytes of PSTs that we identified, across 160 different locations and four separate data centers. So what do you do with them? Most of it is duplicate anyway. The idea would be to locate all these PSTs, which again is the low-hanging fruit, find them all, and relocate them somewhere else, maybe offload them to tape or some removable storage, so they're not sitting in my primary data center but in second- or third-tier storage. Again, understanding the data is what we're trying to do here: give you the tools to make decisions based on the data, not decisions based on people.

Yeah, people is actually a good point, Daryl, because what you just described is the effect, but the cause is that people are human. We are natural pack rats; we keep things, we don't want to delete them. And the other thing that happens, and I remember this from probably a decade ago, is that people are resilient; they create survival tactics. It's a little different now with Microsoft 365, or Office 365, but when a quota was put on the size of your mailbox, people would literally offload email into PSTs. And that created an even bigger problem, because it just moved the problem from the Exchange servers over to the file servers, and most importantly, the IT department lost control, because the data was no longer inside the Exchange servers. Obviously, things are changing with cloud-based O365, but as you said, the PST is real evidence of these survival tactics. The same thing is happening with data leakage, where people have Dropbox, OneDrive, and so forth. The corporation needs to know when stuff gets moved off-prem, outside the controls of IT. So it's certainly good evidence that governance is one thing, but you also need policies in place to govern individual people's habits.
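The rogue-PST hunt Daryl described is a concrete example of low-hanging-fruit discovery, so here is a minimal stand-alone sketch, outside any product, that walks a mounted share and inventories every .pst container by size. The share path is, of course, hypothetical:

```python
import os

def find_psts(root: str):
    """Walk a mounted share and inventory every .pst container found."""
    results = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".pst"):
                full = os.path.join(dirpath, name)
                try:
                    size = os.path.getsize(full)
                except OSError:
                    continue  # file vanished or access denied; skip it
                results.append((full, size))
    # Largest containers first: the best candidates for offloading to tape
    # or removable second-/third-tier storage.
    return sorted(results, key=lambda item: item[1], reverse=True)

for path, size in find_psts(r"\\fileserver\shares")[:20]:
    print(f"{size / 1e9:8.2f} GB  {path}")
```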
Why don't we jump forward, then; we've covered deep discovery and classification. This really plays well in terms of speeding things up and making the data usable and easy to work with, say, for a paralegal, but also for inspecting whether you're compliant. We're not an e-discovery tool; we augment e-discovery tools by reducing the amount of work that needs to be done, and a lot of those e-discovery tools require you to ingest the data as well. Did I miss anything on that one, Daryl? No, I think you're pretty spot on there. I can throw out another possibility: the SOC 2 audit process. There's an evidence-collection aspect of SOC 2, and Aparavi can certainly be that evidence-collection tool, because everything's right in front of you. But no, you're pretty much spot on.

Let's cover a customer. Obviously, when we're dealing with data, especially compliance data, or someone's data environment where we find the proverbial smoking gun, people don't always want to fess up to it. So we've changed the names here to protect the innocent. This is a large financial institution that Daryl knows extremely well. Go ahead. Yeah, so this particular customer was a nine-month, four-stage governance project. The first stage had to prove itself before the other three would be approved. The challenges: it was a highly regulated industry, with financial regulations from the SEC, FINRA, and Sarbanes-Oxley, and billions of files that needed to be classified. The bank was out of compliance in its own eyes; it hadn't yet been found out of compliance by the likes of the SEC. As a matter of fact, this project grew out of another financial institution that was found out of compliance by the SEC and was being fined millions of dollars a month until it became compliant. So this bank decided to be proactive. A lot of the challenges stemmed from 11 years of mergers and acquisitions, where this large financial institution acquired other companies and simply copied and pasted the data from their NetApp and Isilon servers into the main domain, isolating it by the name of the merged company. As a result of the compliance issue, they needed to determine a schedule for deletion; the whole point was to start removing some of this data. The compliance officers got together and said: anything older than two years that's not on legal hold, we're going to remove. There were some internal projects built around that, but there was also the discovery aspect: where is the data that holds risk, and which files were potentially on legal hold? They would process a thousand custodians at a time. And what they couldn't tell us was who was on legal hold; they could only tell us who wasn't. That added more complexity. But in the end, the solution ensured that all the data under two years old was managed correctly and moved to secure locations, also taking the cybersecurity aspect into account. If you run the data through classification, you can see the value of the data, and you can also see the risk involved, and then you can move it to protected shares and apply more stringent user access. For instance, one of my pet peeves is somebody who has access to a protected data source but hasn't touched it within a month. If that were my organization, I would remove that access, and if they ever needed it back, they would have to send an email to the security team saying: I had access to this before, but now I don't. And I would say: well, you haven't accessed it in 30 days; why do you need it now? That further protects the information. In the end, 35% of the data was identified as ROT within these organizations, and we're talking about 72 petabytes of data. That is a massive undertaking.
And today this bank is still deleting data, up to about 80 terabytes a month, based on the processes we've outlined in this webinar. It was a pretty monumental thing. But based on what we were able to accomplish, the other three stages of the deal were closed as well. There were a lot of learnings there on both sides, for the customer and for Aparavi, in terms of understanding user behavior, as well as policies and how things literally get mushed together over time. Organizations focus on continuous data and application availability and less on the data itself, and then they finally get around to really understanding their data, and the light bulb goes on. Pretty amazing stuff. I know we're running a little over on time, so why don't we get into talking about a governance strategy? You really want to build a trusted one, right?

Yeah, exactly. So the first thing we need to do is determine the type of data we have. What's valuable, what's not valuable? What's regulated, what's not regulated? And what needs to be retained based on our own internal processes? When we look at understanding the data first, and again, Aparavi sits at that top layer, understanding it first helps you make these decisions as you move forward with a data governance strategy. You might say, based on the type of business we're in, let's say health care for instance, there are many different regulations. It's not just HIPAA; HIPAA defines a data type, an information type, but there are also regulations about what types of data you need to retain and for how long. For instance, for patients under the age of 18, health care organizations fall under different laws: once the patient turns 18, the records have to be kept for an additional set of years before they can be removed. So it becomes complex, and if you don't understand what data you have, it's very difficult to build these strategies. Then there's optimizing, of course: looking at duplication. I can cite a survey we ran across state agencies: the average state employee, and in Florida there are about 90,000 of them, duplicates a file twice a week. How? They'll create a file, save it, and then forget where they put it. It's still open, so they can't find it, so they save it again to the place where they thought they put it. Now we have two copies of that file. If you do that across 90,000 employees, week after week, twelve months a year, you're looking at a massive amount of duplication that people are mostly unaware of. So understanding the duplication is one thing.
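At 90,000 employees duplicating a file twice a week, that's on the order of 90,000 × 2 × 52 ≈ 9.4 million redundant copies a year. Because these accidental re-saves are byte-identical, content hashing finds them; here is a minimal sketch (real tools shortlist candidates by file size before hashing, since reading every byte is slow at petabyte scale):

```python
import hashlib
import os
from collections import defaultdict

def find_duplicates(root: str):
    """Group files by SHA-256 of their contents; any group with more than
    one path is a set of byte-identical duplicates."""
    by_hash = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    digest = hashlib.sha256(f.read()).hexdigest()
            except OSError:
                continue  # unreadable file; skip it
            by_hash[digest].append(path)
    return {h: paths for h, paths in by_hash.items() if len(paths) > 1}

for digest, paths in find_duplicates("/mnt/share").items():
    print(f"{len(paths)} copies: {paths}")
```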
Understanding your ROT is another, and we keep hammering on this, but that ROT holds so much risk-related information: privacy data, IP, passwords, authentication, criminal justice data. If you don't understand what it is, it's hard to make a decision about removing it or moving it. Then, once you've reduced a lot of your data because you don't need it, optimized it by removing duplicates and old files from old applications, and classified the data, you're becoming a compliance-aware organization. Now, if you want to find all of your GDPR data because somebody wants their information deleted, it's pretty simple: you put in the name of that user, GDPR is a classification, and you'll find all the information. It really is that simple. And then there's looking at the good data: there are business intelligence tools out there that let you actually make money from the data your company collects. And you have to remember, we've all heard this "data is the new oil" thing. It's not oil at all; it's something that cannot be commoditized that way. Without the data, your organization ceases to exist. So it's more like blood, water, or food: you need to protect your blood, water, and food as much as you possibly can so your company's integrity holds. There are financial losses, and then there's the reputational damage that will actually kill a company. Most organizations can recover from the financial losses of a breach or a privacy incident, but reputation takes years to build back. So do everything you can to understand the data you have; I would highly recommend defining a proper data governance strategy, which starts with understanding what you have.

Yeah, that's a good segue into the next thing, which is: now what? Like any of these webinars, it's a matter of providing guidance and advice. Clearly people have a lot of existing investment. I think everyone has heard by now: find out as much as you can about your data itself. Look at offline data policies, and look at how to police those policies and ensure a level of discipline. Certainly with our platform, you can start to find that data very easily and very quickly and take out a lot of the grunt work, with automatic classification based on regulatory policies or, in a lot of cases, policies you create. You might say anything over so many years old is junk, or any MPEG file over so many years old is junk, and you can go do that kind of cleanup. Then you get into the ABCs of determining what's important, what's duplicate, and how to trash it, so you're in a healthy state. From that point, you can really go on to exploit your data and put it to work. At the end of the day, we're all too busy trying to keep the lights on: replicate it, back it up, back it up again. What if you could actually reduce that complexity and the manual effort of managing that information, and automate as much as possible? Then you've got a very hygienic environment.
Any closing points there, Daryl? No, nothing specific to that. Like I said, if you're in the process of defining a data governance strategy, take a step back and evaluate the processes you're about to start. A lot of data governance strategies forget that top layer of knowledge and go directly to the data management applications. Why is that? Because the vendors out there have trained every user, including most people on this meeting today, to think they need that vendor's product to be compliant, or to govern their data, or to protect their data. That's true to a point, but they're not selling the complete solution of data knowledge. If you have knowledge about the information, you're going to make better decisions about how much licensing you actually need in the data applications that will manage the data further. And there's one comment asking whether we have partners in Europe: we do have a Germany office you can reach out to. As you know, Germany has some of the strictest data sovereignty laws, and we picked Germany for exactly that strategic reason. They're also extremely capable over there; they can help everybody in the European market, and they get support from us as well. But I think this is a very important aspect. And look at cybersecurity as a posture, too. I have done many cybersecurity discussions around the data. If you take away the bait, the fishermen can't fish. Understand the data and where it is, and there's nothing for attackers to get. Even if they did breach your system, you already know what they were able to reach. They didn't breach your at-risk data, because you've already moved it to protected shares, or moved it off the network, or closed the ports and outside access to it. So all they got is Gary's cat pictures and his Michael Jackson "Thriller" MP3, because I'm looking at it right now. You want a Bitcoin for that? Yeah, take it; I don't need it. You're not getting anything from me. So yeah, think about prevention and protection.

All right, let's leave it there. Just in summary, one of the key things is you've got to know what you have; you've got to be able to find the data. You can certainly go to get.aparavi.com, and one of our data specialists, not a salesperson, can give you a more in-depth demo and even set you up with an install. It's pretty easy, and we really augment the environment. This is not rip-and-replace, and it's not going to be intrusive, like "oh my God, it's going to break something if I install this new tool." No, it won't. It's a very easy way to get a good glimpse into where your data really sits. At the end of the day, we're augmenting your existing data infrastructure and then enabling the orchestration of data across the ecosystem. With that, I'll turn it over to Shannon. Shannon, I think we've got some questions we haven't answered, and after about 45 minutes online we've still got a few hundred people here, so why don't we attack those questions? Sure.
Yeah, absolutely, just diving in here. So: does Aparavi work for all types of data — structured, unstructured, files, databases, metadata, cloud providers? Yeah, let me start on that. The connectors to the cloud providers are coming in the first quarter of next year, and the reason is that we want to do it differently than everybody else has. We could have shipped it sooner and just collected data as the status quo does, but we're going to do it better. As far as structured information goes, there are many applications out there that handle structured data. What we can do is use the tools built into those databases to export files, like CSV or JSON files, that can be imported into the platform so you can tell what's in there. Some of these can be scheduled tasks within the databases, so you can export an entire database to a JSON file or some other container file, import it into our system, and then see what's inside. This really helps with privacy regulations like GDPR or CCPA, where you've got to remove data: at the very least, you can find it and quickly locate which data sources hold that information.

Thanks for that question, by the way; I was going to ask it myself. Let me just add to that: there are both data sources and data types, and storage destinations and storage sources. Today, we can go across any file environment and any cloud environment, and move that data between them. So we can move it or copy it up to the cloud, whether that's Amazon, Azure, Wasabi, or Google. And there are thousands of different file types; being able to support, understand, and query those is what we do today. Going forward, we're adding more application and container connectors, for things like Slack and other collaboration tools, because that's where a lot of data gets moved around and shared. You're never done: you're always adding more policies, more legal coverage, more connectors. As we say, we can find data no matter where it lives, whether it's on a storage device or in a controlled application environment. Shannon?

Great. Can we integrate Aparavi with an existing data catalog? No. The main reason is that Aparavi maintains its own proprietary index and metadata catalog. That said, the data can be exported and imported into those other catalog solutions. I'll add one thing: we've published an API, basically a RESTful API. And when you actually move content, it can be done either in native format or in an Aparavi-protected format, so that if you moved it to, say, Amazon, you can only access the data through the Aparavi interface or the Aparavi API. That's another layer of protection, depending on who you want to have access to the data. If you simply want to group it all together, deduplicate your primary environment, consolidate it into an S3 bucket, keep it in native format, and provide access to 100 people, you can do that too; then it's up to the IT administrator to manage the access rights to that data repository.
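Circling back to Daryl's structured-data answer: here is a minimal sketch of that export step, using sqlite3 as a stand-in for whatever database you actually run. The database file, table, and column names are hypothetical:

```python
# A sketch of the export step Daryl describes: dump a structured table to
# JSON so a file-oriented scanner can classify its contents.
import json
import sqlite3

conn = sqlite3.connect("crm.db")      # stand-in for your real database
conn.row_factory = sqlite3.Row        # rows become name-addressable

rows = [dict(r) for r in conn.execute("SELECT * FROM customers")]
with open("customers_export.json", "w") as f:
    json.dump(rows, f, indent=2, default=str)  # default=str handles dates

conn.close()
# Point the unstructured-data scanner at customers_export.json; running
# this as a scheduled job keeps the classification current.
```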
So, you mentioned the data cataloging capability: what is the difference from other data catalog solutions? The difference is in the data you're collecting. The applications that collect this information are designed around what users have decided is important to them. With Aparavi, we just collect everything. If you don't need some of the information we're collecting, you don't have to use it, but it's going to be there, and somebody is going to use the parts you don't, in some role other than yours. If you've built a data catalog based on what you think you need, there will come a time when you say: I wish we had collected this too. And add to that the time it takes to collect all the contents and all the metadata, put it in one searchable location, combine content searching with metadata searching, and layer classification search on top of that; the sky's the limit. You've got access to all the data you'll ever need within the grasp of a UI. Data cataloging solutions are specific to a project; we don't have a specific project in mind. We're just trying to give you everything you need to make your own determination about the value of the data you have. Yeah, I'll add one thing to that: the automation of taking action on the data. It's one thing to find it, collect it, and peruse it; the other is to take action: delete it, copy it, move it, based on policy and without manual intervention. Clearly you can control that through policy, but a lot of catalog solutions don't act at all. It's nice to find stuff, but you've got to do something with it; you need to be able to act on it. Otherwise, you're just sitting there staring at a problem that keeps growing. Very good.

Fantastic. So a question came in: is this a duplication of other applications to some extent, or does it consolidate the need for them? Yeah, let me address this one, because there's a misconception, again from how vendors have educated people, that many applications overlap other applications doing the same thing. What I would say, and I think Renee Rubio asked this, which is a really great question because it comes up a lot: look at the completeness of the solution you're evaluating. If date ranges are all you're looking for, then technically you could write a script to go out and read the properties of all your files: created, accessed, and modified. But in most cases that's not a basis for governance. Do I need the file just because it's older than two years? I need other information about it. Identify old application files: we haven't run the application in five years, yet we've got four terabytes of old database backup files from it. That kind of finding is the low-hanging fruit where you can say: we haven't used this program in five years. Why are we storing its data here? Why are we backing it up in our primary backup, when our backup windows are already pushing past the 24-hour mark and we'll have to add more backup capacity to meet our SLAs?
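The "date ranges only" script Daryl mentions really is trivial to write, which is exactly his point: age alone says nothing about legal hold, classification, or business value. A sketch, with a hypothetical share path:

```python
# List files not modified in two years -- easy to script, but mtime alone
# is not governance: it knows nothing of legal hold or classification.
import os
import time

CUTOFF = time.time() - 2 * 365 * 24 * 3600  # two years ago, epoch seconds

def stale_files(root: str):
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # unreadable; skip
            if st.st_mtime < CUTOFF:  # also available: st_atime, st_ctime
                yield path

for p in stale_files("/mnt/share"):
    print(p)
```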
So look at the completeness of everything, from content collection to metadata collection to the classification policies themselves. Look at the depth of the applications and ask those questions of the vendors, and you'll find out which applications only seemingly overlap others. Some applications that were in this market have moved past it toward more security-centric solutions, because a major player in security went away and left its security customers hanging, and now you've got 10 or 15 large security companies vying to be that king. So some applications that were very good at intelligence-related work have moved on to security. So yes, Renee: look at the completeness of what it is you're collecting. That would be my starting point. But also, don't treat data knowledge as just one of the pillars of data management, because once you go down that rat hole, the vendor is going to tell you about all the great bells and whistles its application has to solve your problem: yes, we might overlap this guy and that guy, but this is why we do it better and why you need us. If you understand the data and the challenge in front of you first, then you can say: I only need this much backup, because I've reduced my one petabyte down to about 660 terabytes, and now I only need to license 660 plus 20% more for my retention sets. All of these things become much more manageable license amounts. Archiving-wise, not all the data you're backing up needs to be archived; only a certain amount does. When you classify the data and identify it, the classification tells you the retention you need. All this data is SEC- or SOX-regulated and has to be kept for seven years? Build a policy that keeps it for seven years based on that classification. But even though a file is financially related, if it doesn't contain information covered by that regulation, you're not obligated to keep it.
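To illustrate that classification-driven retention logic (this is not Aparavi's actual policy engine; the classification names and retention periods here are illustrative), a small sketch:

```python
from datetime import date, timedelta

# Hypothetical mapping of classification -> retention period in years.
RETENTION_YEARS = {"SEC": 7, "SOX": 7, "PII": 2}

def retention_action(classifications, last_modified, on_legal_hold):
    """Decide what to do with one file based on its classifications."""
    if on_legal_hold:
        return "keep: legal hold overrides everything"
    # A file inherits the longest retention of any classification it
    # carries; unclassified/trivial data gets zero retention here.
    years = max((RETENTION_YEARS.get(c, 0) for c in classifications), default=0)
    expiry = last_modified + timedelta(days=365 * years)
    if date.today() > expiry:
        return "eligible for defensible deletion"
    return f"retain until {expiry.isoformat()}"

print(retention_action(["SOX"], date(2015, 3, 1), on_legal_hold=False))
# -> eligible for defensible deletion (the seven years have passed)
```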
So I'm going to put Daryl on the spot here. One more thing we can do, since we're obviously running out of time and there are more questions, is host a demo; you can reach out to Daryl directly, because he's the technical one here, at Daryl.Richardson@aparavi.com. Write that down and send Daryl an email, and then he'll have to go clean out his PSTs. What do you say, one last question, Shannon, and then I think we can let everyone have their day back? I've been answering some of these live. So: it's all fine and well for files and their containers, but what about data rows or JSON blobs in databases, structured data? Again, there are many tools out there that can help you with structured information. The one thing about structured information that puzzles me is that most organizations are really concerned about the type of data in the structured environment, but somebody is entering that information in, so you must already know it contains PII or some privacy data, if that's what you're looking for, because somebody in the organization is filling it in, or it's being generated by some web content. The dark data, the unstructured stuff, is what people create every day, and we have no idea, as storage admins or security admins or CISOs or CIOs, what that data actually is and what risk it holds. Salespeople have been known to keep entire spreadsheets of their customers: addresses, usernames, login credentials for their portals. All this stuff is just sitting there on their machines, but nobody knows it, because we don't have proper ways to look at it. Thanks. Now, Shannon, over to you.

Well, thank you both for this great presentation, but I'm afraid that is all the time we have for this webinar. Just a reminder to everybody: I will send a follow-up email to all registrants by end of day Monday with links to the slides and the recording from today. Hope you all have a great day. Thanks, everybody. Thanks to Aparavi for sponsoring today's webinar; hope you all enjoyed it. Thank you all for listening, and we hope to see you all soon. Take care.