with the OpenShift product management team. And we are here today with BigID. We have Alan, Sachin and Tim, all from BigID, and they'll introduce themselves in a moment. We're really excited about this because they're using AI and data intelligence to bring data privacy, security and governance to the OpenShift platform as well as Cloud Pak for Data, and we're really excited to get to know their platform better. So Alan, please.

Yes, good day everybody. This is Alan Taylor. I'm the Global Alliances VP here at BigID, and on the call with me today we have Sachin, our Global Alliances technical director, and Tim Bain, also on the technical side of the world here at BigID. Sachin will be doing some of the slides and a demo of our product. So, you know, a very OpenShift Commons community session today, as a brand-new member.

For those of you that don't know BigID, we are a four-year-old company, a couple of hundred people, pre-IPO, and we have SAP and Salesforce as two of our investors. So great heritage there and great investors. We generally deal with a lot of the Fortune 5000 customers, and we're proud to say that Fortune 1, Walmart, is one of our biggest. So we work with both small and very large customers.

So today, we're very pleased to present a theme of privacy by design. As this audience is probably a little more technical and a little more developer focused, we're going to talk about privacy by design and the advent of privacy engineering. BigID describes itself as a modern, privacy-aware data discovery and intelligence platform. And as Carina says, we are based on containers and Kubernetes, so that's where the "modern" comes from. And you're going to see what we do across three different buckets: data privacy, data security and data governance. Knowing your data, from discovery to insight, is really what we're all about.

So the agenda we're going to go through: we've done our introductions; I'm going to give you a quick intro into what we mean by privacy by design and privacy engineering; I'll give you a little update on our Cloud Pak for Data status with the IBM team and OpenShift; and we're going to talk a little bit about our ecosystem and our alliances. We have our own marketplace and developer portal and all of these great things, so our ecosystem is very important to us and very strong. And then I'll hand it over to Sachin to go over the solution overview and finally give you a demonstration of our product fairly quickly. We look forward to many more sessions where we can do deeper dives into our solution and get people further educated.

On privacy by design: last year IBM actually had Forrester do a report that talked about data privacy as a new strategic imperative. And really what's driving that is customer trust. Customers trusting their suppliers, their vendors and their partners around data protection and data privacy is seen as a major benefit. In that report, they talked about privacy by design. Think of it as embedding privacy, data security and data governance into everything that you do, in a proactive way and across the lifecycle of data management. And for many of you on the call, privacy by design starts to get you thinking about things like shifting privacy left and this whole concept of privacy engineers. But originally, privacy governance was very much about regulation: GDPR and CCPA, obviously two of the very big ones, for Europe and the US respectively. 
And of course there's similar regulation around the world in different countries. So designing privacy by default into everything you do not only meets a regulatory and compliance mandate, there are also benefits to understanding your data and having data quality in everything you do, whether it's AI, analytics, machine learning, et cetera. Data intelligence and identity data awareness is very much a force multiplier. Forrester also talks about other requirements, such as a holistic platform being scalable and global, and hopefully you're going to see that through our presentation.

On this phrase "privacy engineering": we at BigID wrote a blog on this quite recently, and you'll see a bit of fun here — make room for the privacy engineers, move over CPOs and DPOs. But again, privacy by design is a set of principles that embeds privacy and data protection into all products and services, and privacy engineers are the people who are going to implement this and, in fact, run and operate it. And what we're seeing in the industry around data privacy and data security is that some industry standards are starting to appear. There are some ISO standards, which BigID is part of. Many of you will know the NIST security framework; again, there are mandates for standards starting to appear in there. So all good for the industry. I mean, we're living in an OpenShift Commons community where standards and openness and APIs are what it's all about. On the right-hand side of the screen, you'll see a box that's actually taken from Carnegie Mellon University, where they start to define what a privacy engineering programme looks like. In your own time, you can look through this and say, you know, this is something you'd be very interested in doing, both in your own organisations and personally. And we'd invite you to let us know if you'd like to get into our training programmes, which are free, and go through some of our learning.

So a little update. I manage the IBM and Red Hat relationship globally, and we've recently been working with the IBM Cloud Pak for Data teams. We've actually created a profile on their playground, and of course having a Kubernetes, container architecture right from the get-go is very exciting to that group of people. We will now bring BigID solutions together with some of IBM's Cloud Pak for Data — I know there are other Cloud Pak programmes that we'll look at in turn. But we went through that process and we're in it now. Tim and Sachin are working busily on getting it loaded up onto, in this case, an Amazon instance, building out integrations and use cases, and also including the full-blown platform. So that's underway and we're very excited. I think the IBM strategy, together with Red Hat, of delivering business solutions and business outcomes across a multi-cloud delivery model is a very powerful one. You guys know that story far better than I do at this stage. But not having lock-in and allowing customers to choose one or many clouds is pretty important.

And then we talked about our strategic alliances and the ecosystem that exists. As you think about privacy by design — embedding data privacy, data security and data governance into everything that you do — we not only want you to think of new applications but also the existing ones that are already out there. BigID has a very strategic alliance with all of the cloud vendors. 
That's both from a data source perspective, because of course we are discovering data so we can tell you where sensitive data exists, but we also integrate with many of the cloud vendors' tools on their platforms. If I look at Microsoft Azure, for example, there's Microsoft Information Protection, a tool that sits on that platform, and we integrate with it to do data labelling of that sensitive data. So that's a very good example. I mentioned right at the beginning that SAP and Salesforce are two of our big investors, so of course you would expect us to have developed some very strong, powerful integrations and joint solutions with both of those vendors, and that is in fact the case. So again, as you think of privacy by design and your customers, if they're a big SAP shop or a big Salesforce shop — it turns out that 60% of SAP customers are also running Salesforce. And of course the other big 800-pound gorilla is ServiceNow. You'll hear from Sachin about our APIs and how our UIs can be broken out and embedded into other UIs and other workflows. That's a very good example of what we do with ServiceNow, where they take a lot of what we do and embed it into their workflows and UIs. We also have a very healthy partnership with many of the data privacy, data security and data governance vendors who use our technology, especially some of our discovery technology, to augment and improve what they do. Some of them, on paper, you might think of as competitors to BigID, but that's truly not the case. So our strategic alliances and our ecosystem are pretty important to us.

And that takes me to handing over to Sachin. We go back to that Forrester report that IBM commissioned last year, where it described a holistic data intelligence solution. And that's really what we've done. We're in a very unique position where we've developed this platform, this data discovery and intelligence platform, across the three buckets you see on screen: data privacy, data security and data governance. So with that, I'll hand you over to Sachin. He might do a quick introduction and then take you through the product, followed by a demo.

Thank you for the introduction. I'm going to start off from where Alan left off in terms of positioning. There are two key topics we wanted to cover today: how we play a role in privacy by design, and how we enable the platform for wider usage. Now, as Alan was mentioning, traditionally when people have thought about privacy it has always been retrospective — they've been looking for retrospective solutions in their current data landscape, trying to make sure they can embed something somewhere. But if you take a blue-sky approach, you start to look at it as a greenfield project: what is it that we really need from a privacy perspective? And central to any implementation you'll think about for privacy by design is asking who we are talking about: is it individuals, is it entities? The moment you realise that, you also realise that traditionally discovery has always been classification based, whereas for privacy you really need to go to a deeper level in the data itself to be able to create controls at the data level. 
So when we started to create the data discovery product, we thought that metadata-based discovery or regex-based discovery wasn't going to be enough. We had to find the business context of the data; we had to contextualise any personal information. And that's where we started to look at the different kinds of discovery methods we would use in order to achieve that goal. Any level of data understanding you're doing has to be from the perspective of the individual, the person. So we started to focus on classifying the data and correlating it all back to the individual or person to whom it really belongs. That gave us the basic foundation to look across all kinds of data assets from an individual's perspective. And once we started doing that, we looked at all the privacy regulations that were in place, which were all consumer focused, all individual focused. So it came naturally for us to put the solution into the market to address compliance with privacy regulations; but equally, because of the discovery methods we're using and the way we're looking at the data, it gave us deeper insight into the data itself.

Now, when we talk about it from our perspective, what are we actually saying? Right at the core of the platform is the data discovery foundation — we also call it a C4 platform. And the reason we say that is because, in terms of data discovery, we're fundamentally changing the way people have traditionally been doing discovery: first of all, by not giving you one method but multiple methods that can span all kinds of data systems. I'll talk about coverage in a minute, but really you're looking at structured, unstructured and semi-structured systems — all kinds of different systems whose data assets you can look into.

In terms of the discovery methods we use, we start with classifiers. Traditionally, people have always relied on regex-based classifiers. The moment you start to think about personal data, those do not work anymore, because you can't really define a regex for finding somebody's name; you can't define a regex for finding somebody's postcode or address, for example. You need deeper data intelligence in that scenario. So for us, classifiers themselves became a different kind of technology. We defined our own advanced classifiers based on named entity recognition models, which we pre-trained for context. And hence we are able to identify entities like names, phone numbers and addresses — things you would otherwise struggle to find. At the same time, we wanted to make sure that for actual unstructured data we can look at the overall context of that content and classify it into a particular category. So we built what we call document classifiers, which are also NER-based and trained so that when we look at the content of a document we can identify whether it is a CV, a boarding pass or a mortgage document. 
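To make the regex-versus-NER distinction concrete, here is a minimal sketch using the open-source spaCy library purely as a stand-in. BigID's advanced classifiers are its own pre-trained models, so the library, model name and entity labels below are illustrative assumptions, not the product's implementation.

```python
# Illustrative only: contrasts a regex classifier with an NER-based one.
# spaCy stands in here for a pre-trained named-entity-recognition model.
import re
import spacy

text = ("Please refund Maria Lopez, card 4111 1111 1111 1111, "
        "living at 10 Downing Street, London.")

# A regex can catch rigidly formatted values such as a card number...
card_pattern = re.compile(r"\b(?:\d[ -]?){13,16}\b")
print("regex card hits:", card_pattern.findall(text))

# ...but there is no regex for "somebody's name" or "an address".
# An NER model classifies spans by context instead of by pattern.
nlp = spacy.load("en_core_web_sm")   # small general-purpose English model
for ent in nlp(text).ents:
    # Labels such as PERSON or GPE come from the model's training,
    # not from any hand-written pattern.
    print(ent.label_, "->", ent.text)
```

The same idea scales to document-level classification: instead of labelling spans, a model trained on whole documents assigns a category such as "CV" or "boarding pass".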
So out of the box we created about 20 such document classifiers, and the idea is that that's your starting set: as a user or a developer, you can create your own advanced classifiers and train the model further to create additional ones.

Then we also looked at what else we can do in the unstructured space that does not require training, and that brought us to the concept of cluster analysis, where we look at the similarity of the data. We look at the content and group files together based on the common content that can be identified. That immediately gives you confidence that whatever data I had, I can turn it into a smaller problem: if I had thousands of files, I can group them into smaller clusters and then look at the clusters themselves to see what sort of content I have, and apply a level of classification on top of that. Again, a different kind of method for looking at the data, but one that gives you a different perspective as well.

And then the third discovery method we talk about from a privacy perspective is correlation-based techniques, where we look at the values of the data. We look at data values across the enterprise, and not only do we identify that, yes, I've got these phone numbers, but we go a step further and say whose phone number have I discovered — because that's what matters for applying any kind of privacy control on top of it.

We use all of these different techniques to create a rich catalog, so that you have a central view of the inventory, of all the objects you carry across your enterprise, and can understand for every object what sensitivity of data it carries. You can then develop a number of use cases as a result. The idea was that, having established this discovery foundation, we wanted to go a step further and create specific use-case-based apps around it. We developed an app framework, and there are different kinds of applications we've built depending on the use case, both in the privacy domain and on the data protection and data governance sides.

Looking at what else we can do from a privacy perspective, we also built a consumer-facing portal for requests. Apart from the core platform, we created this privacy portal, which helps individual consumers manage their own requests, and it gives you workflow management of any consumer request that may come in: how is it accomplished, how is it managed within the organisation? That's another specific SaaS-based product we've recently launched. Equally, another specific aspect of regulation is understanding how you manage and share third-party data, so we're also helping build that picture, based on the insights we produce from the data we can understand and find across these different systems. 
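Going back to the cluster-analysis idea for a moment: a rough sketch of "group similar files, then classify per group" is shown below using scikit-learn. BigID's similarity grouping is its own implementation, so the vectoriser, algorithm and cluster count here are assumptions chosen only to illustrate the pattern.

```python
# Illustrative sketch: group documents by shared content so that
# classification can be applied per cluster instead of per file.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

documents = [
    "Curriculum vitae - work experience, education, skills",
    "Resume: professional experience and education history",
    "Boarding pass - gate B12, seat 14A, departure 09:40",
    "Boarding pass - gate C03, seat 02C, departure 17:15",
    "Mortgage agreement between lender and borrower, interest rate terms",
]

# Represent each document by its word weights, then group similar ones.
vectors = TfidfVectorizer(stop_words="english").fit_transform(documents)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(vectors)

clusters = {}
for doc, label in zip(documents, labels):
    clusters.setdefault(label, []).append(doc)

# A reviewer (or a document classifier) now labels a handful of clusters
# rather than thousands of individual files.
for label, members in clusters.items():
    print(f"cluster {label}: {len(members)} file(s)")
```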
More recently, we started looking at how we can scan images and scanned PDFs, for example, to identify personal information in them. So we've created parsers that can OCR image-based content into parsable text and then report based on that as well. That's an additional capability we're now trialling.

Now, one of the key things when we do discovery in the unstructured space is the volume of the data — everybody has huge mountains of data. How do we reduce the amount of data we have to scan? This is where we're coming up with a different approach altogether, called Hyperscan, where we investigate the data assets and learn from them in order to predict. The idea is that, looking at some of the metadata of a file and a sample of its content, we can predict whether it is likely to carry the kind of data we are looking for. It works by doing a sample scan on the assets in a system; that sample scan builds a sort of metadata model, which then drives a predictive model for that asset, and the model starts to generate predictions on top of it. So in effect, we use a sample of the overall volume — let's say between five and six percent — and are still able to predict with a high level of accuracy whether files are likely to contain the data we're looking for. Based on that, we create and roll out this model so that the actual content discovery only happens on a small volume of the data, driven by the model's predictions. So beyond traditional discovery, this is an additional discovery capability we're trialling, which will be rolled out soon.

Now, coming back to the platform itself and how we make it more developer friendly: there are roughly three different dimensions in which we provide developer tools. Firstly, we've created a community portal for developers, where you'll find lots of useful information about the APIs, about the SDKs, and even about creating your own apps. There are a bunch of apps already available there as community content. Really, this exchange community becomes your launch pad as a developer. When we talk about APIs in general, the idea in creating that API layer was that whatever system we connect to, we should be able to expose the information discovered from it through APIs. And from that perspective, the whole platform is API enabled. Everything we'll go through in the demo, which I'll show you in a minute, you would also be able to get at through APIs, which is how we have traditionally integrated with other ISV partners. Equally, we're able to provide whatever level of data insight you may need for creating your data pipelines. 
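Stepping back for a moment to the Hyperscan idea described above — scan a small sample in full, then predict from cheap metadata which remaining files are likely to hold personal data — a rough sketch of that sample-then-predict pattern might look like the following. The features, model choice and threshold are illustrative assumptions, not BigID's actual predictive model.

```python
# Illustrative sketch of metadata-based prediction: fully scan a small
# sample, then predict which unscanned files likely hold personal data.
from sklearn.ensemble import RandomForestClassifier

def features(meta):
    # Cheap metadata-only signals (assumed features, for illustration).
    return [
        meta["size_kb"],
        meta["depth"],                                   # folder depth
        1 if meta["ext"] in (".csv", ".xlsx", ".docx") else 0,
        1 if "hr" in meta["path"].lower() else 0,
    ]

# 1) A full content scan on ~5% of the estate yields labelled examples
#    (label 1 = personal data was found, 0 = none found).
sample = [
    ({"size_kb": 120, "depth": 4, "ext": ".csv",  "path": "/hr/payroll.csv"},    1),
    ({"size_kb": 800, "depth": 2, "ext": ".log",  "path": "/ops/build.log"},     0),
    ({"size_kb": 45,  "depth": 5, "ext": ".docx", "path": "/hr/offers/cv.docx"}, 1),
    ({"size_kb": 300, "depth": 1, "ext": ".png",  "path": "/web/banner.png"},    0),
]
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit([features(m) for m, _ in sample], [label for _, label in sample])

# 2) Cheap prediction then decides which remaining files deserve a full scan.
candidate = {"size_kb": 95, "depth": 4, "ext": ".xlsx", "path": "/hr/benefits.xlsx"}
likely_pi = model.predict_proba([features(candidate)])[0][1]
print(f"probability of personal data: {likely_pi:.2f}",
      "-> schedule full scan" if likely_pi > 0.5 else "-> skip for now")
```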
You may need to look at the metadata in the data catalog, for example, or at what data is shared and how it is monitored; any level of insight you want to produce, you can get from the API layer. What we've also been doing recently, from an API perspective, is exposing our scan APIs, so that as a developer you can embed them in your applications to discover and classify your data as it is being used. This is especially helpful if you have ETL workloads, or if you want to look at payloads asynchronously; that's where this particular API is useful for scanning pipelines and for embedding discovery into ETL tools or applications where it would traditionally be difficult to discover data.

Moving up from the APIs and onto data coverage: today, the kinds of data assets we do discovery against are, as I mentioned, structured, unstructured and semi-structured systems, and enterprise applications as well. But the idea is to make sure the approach is scalable. If today you have a set of legacy applications for which we don't have a connector, we want to make sure you have the developer tools to create your own connector, and that's where we expose our own SDK. We also have a generic REST API-based SaaS connector, which only requires configuration: you can pretty much create your own configuration for a SaaS service, put in the API endpoint that can be used to scan information from it, and there you go — you've created your own connector. So not only are we trying to cover as many different systems as possible by having our own connector factory produce additional connectors, we want to make sure that as a developer you're also able to create your own. And as a SaaS solution, it's easier for you to connect to those systems as well.

Coming back to the app framework: this is where we started to look at specific use cases where we may have a role to play, and these are some examples of the apps we've developed on the platform. In the privacy space, you can manage consent — look at consent management from a privacy angle and tie it back to consumer rights so you can implement that within your applications. You have an access request management app, to understand and look at the data from the individual's perspective. You've got business flows, to create and maintain business flows. Lots of different applications come out of the box in the product, and there's lots of community-level content as well, which I'll show you in a minute. In terms of creating these apps, as I mentioned, it's a customisable app framework built on top of the platform, which means you can create your own specific apps as long as you use the manifest; through that manifest you identify the endpoints through which those operations will be applied. That's how easy it is to create these kinds of custom applications from the ground up. 
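As an aside on the scan APIs mentioned above, a minimal sketch of embedding a discovery-and-classification call into an ETL step might look like the following. The host, path, authentication scheme and response fields are placeholders invented for illustration; the real endpoints and schemas are documented on the developer portal, so treat this as the pattern rather than the actual API.

```python
# Illustrative pattern only: classify a payload via a scan REST endpoint
# before loading it downstream. Endpoint and field names are placeholders.
import requests

BASE_URL = "https://bigid.example.com/api/v1"   # placeholder host
API_TOKEN = "<token issued by the platform>"    # placeholder credential

def classify_payload(record: dict) -> dict:
    """Send one ETL record to a (hypothetical) scan endpoint, return findings."""
    resp = requests.post(
        f"{BASE_URL}/scan",                     # placeholder path
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        json={"payload": record},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

def etl_step(record: dict) -> dict:
    findings = classify_payload(record)
    # Route or mask based on what was found, before the record lands downstream.
    if any(f.get("category") == "PII" for f in findings.get("findings", [])):
        record["routing"] = "restricted_zone"
    return record
```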
And then lastly, what about data in motion — how do we handle data pipelines? For specific data pipelines like Kafka, Kinesis, et cetera, we've already created specific connectors, which subscribe to events on those pipelines and monitor the payloads whenever they come in. Once there is a payload, they can push it back into the system to do discovery and classification, and then, based on the results of that discovery, you can manage the workflow from those data pipelines as well. So that's how we change the model for looking at a data pipeline: the moment data is pushed from the producer into the pipeline, we can consume the payload, get it through the scanning process and correlate it, and hence you have metadata available to make your decisions based on the sensitivity level of the data and how you manage pushing that information back into the system.

So that probably gives you a good overview of what we wanted to accomplish in terms of positioning the product. Now I want to go back and show you a live demo, to see how it works behind the scenes. The environment I'm logging into is one of our sandboxes. The moment you log in, you see this kind of dashboard, which is primarily statistical in nature, but it at least gives you insight into the granularity of the discovery we establish by looking across the different systems.

We start at the top level. In this case, we have 15 different systems configured, and these are a mix of different kinds of systems: SharePoint, MySQL, Postgres, Snowflake, a Kafka pipeline. These are some of the systems we used to discover information against. Beyond the configured data systems, as part of the discovery we start to establish which of them contain any relevant information. For privacy, we're looking for PI — personally related information. That's what is being quantified in this particular case: 14 of these systems have some amount of personal information discovered in their data assets. So right at the top level, we're down from 15 systems to 14.

Within those 14 systems, we then go down to the underlying objects. When we say objects, depending on the class of system, these may differ: for structured systems we're talking about tables or views; for unstructured systems, files; for email or messaging servers, the actual emails and messages are the objects. So depending on the underlying system, that's how we classify these objects. All we're saying in this particular case is that we've identified 5.1K objects within those 14 systems which seem to be carrying the kind of personal information we were looking for. 
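Returning briefly to the data-in-motion pattern described at the start of this segment — subscribe to a pipeline, hand each payload to the scanner, act on the result — a rough, self-contained sketch is shown below. The topic name and scan endpoint are placeholders (the same hypothetical endpoint as in the earlier sketch), not BigID's actual connector code.

```python
# Illustrative sketch of a pipeline connector: consume Kafka events,
# send each payload to a (placeholder) scan endpoint, act on the findings.
import json
import requests
from kafka import KafkaConsumer   # kafka-python client

SCAN_URL = "https://bigid.example.com/api/v1/scan"   # placeholder endpoint

consumer = KafkaConsumer(
    "customer-events",                               # placeholder topic
    bootstrap_servers="broker:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for message in consumer:
    findings = requests.post(SCAN_URL, json={"payload": message.value}, timeout=30).json()
    if findings.get("findings"):
        # Sensitive content detected in motion: tag, mask or divert the event
        # before downstream consumers ever see it.
        print(f"offset {message.offset}: personal data detected, quarantining")
```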
And then we go down to the lowest level of granularity, where we want to establish how many records we can identify in these systems that are relevant to our discovery. In this particular case, we had 1.1 billion records. When we talk about records, we're talking about individual values in this model — individual values of addresses, names, phone numbers, et cetera. That's the lowest level of granularity the system discovers information at.

In between these layers we also have a fourth dimension, which is where the individuals really come in from a privacy perspective: we start to establish what information we have discovered and who it belongs to. So the system goes a step further — after identifying these records, it correlates all of the information back to the individuals or entities to which it belongs. And the classes of entities are themselves configurable in the product: you can define whether the entities should be consumers, customers, employees, patients, insurance policies. It doesn't have to be personal information, either — you can decide to correlate information to products, assets, any kind of entity. But the idea is that you're doing discovery at such a granular level that you're identifying the records and also correlating them back to the entities they belong to, which gives you a much more granular view of the system.

Now, when we do this level of classification from a privacy or regulatory perspective, it's also important to understand the residency of the data, or of the individuals whose information has been discovered. So as an additional classification, the system also identifies whether the data being discovered belongs to Canadian citizens, Brazilian citizens or Chinese citizens, for example. You get that level of granularity in determining exactly what you're looking at. This determination can be done at a country level or at a state level — in the case of the US, we can do it at a state level to identify whether the data we have is of Californian citizens, Texan citizens, and so on. You can decide what level of granularity to go to.

Beyond this statistical view, we can go into a more granular view of the inventory, which gives you much deeper insight into the data. Now we're looking at the records themselves and at their distribution across the different residencies, the different attributes that have been discovered and the different data sources they belong to, with the same kind of geospatial map, but filtered on whatever criteria we want to adopt. So if I were interested in the HR system, for example, I can see that within the HR system I've got eight different attributes of information and about 117.4K records. 
I can then go down to an even more granular level if I want to: say I wanted to identify information about Brazilian citizens in the HR system, I can go down to that level of detail as well. At that level, I can start to identify the underlying objects, the actual records of information, and equally the actual entities — these are all individuals, Brazilian citizens, whose data exists in the HR system. So you're able to create any depth of insight from the system, depending on what level of classification or granularity you want to adopt.

Equally, beyond these top-level filtering criteria, you have other dimensions you can play with. For example, instead of working at an attribute level, if you want to work at an abstract data category level — to identify data that is health related or financial, or personally sensitive, or PII — you can create your own custom categories and build insights on that filtering criterion. And earlier we talked about how the system uses the privacy angle, the entity angle, to correlate all of these data sets; that's another angle you can take while creating insights. For example, if you're interested in knowing everywhere you store customer data, you can filter on that. Or everywhere your insurance customers' data sits — you can be that specific. Or everywhere you hold insurance policies — you can be even more specific than that. Equally, if you want to define risk levels, you're able to categorise the data by different risk levels and customise the criteria by which risk is determined; there's a complete risk model behind the scenes that you can customise as well.

So that establishes the two key pieces: doing the discovery and creating this inventory. But from a data governance point of view, this is where you start to get insights at a catalog level. The catalog is an inverted view of the inventory: it goes down to the individual systems you're looking at, and for each and every object in a system you can get to the information at a more granular level. For example, in this financial object that was discovered, at a column level I can identify the technical metadata elements it carries. But equally, I can go down to that level and identify what relevant information has been discovered in those data elements themselves. That's where this attribute mapping comes in. If I hover over the attribute section, it highlights what sort of information we're discovering in these columns: things like card numbers, transaction amounts, transaction numbers, user identifiers, email addresses, IP addresses. Now, these are all discovered by the different discovery techniques, which is why you see the different prefixes coming in. Things prefixed "classifier" are the result of classifier-based discovery, and things prefixed "enrichment" come from enrichment-based discovery. 
That enrichment is based on a proximity analysis we do across the data sets to identify additional fields. Things which have proper logical names, like emails, residence, user identifier, are based on training the system on the data values — correlation-based discovery. So multiple different discovery techniques have been used on this particular object to determine what sort of data exists there. And any categorisation you have adopted would also come up here.

From a validation perspective, this also lets you look at the data at the asset level. If you want to look at a sample value from the data asset, you have that option here: you can pull up an attribute value for this particular case and validate whether the value the system determined to be an IP address really was one.

Now, that was an example of a structured asset. We can also look at an unstructured asset. I'm going to pick a SharePoint example and see what level of information can be derived at a file level. At a file level, you'll see some additional things coming up, such as a document classifier — earlier I talked about content-level classifiers, but there are also document-level classifiers. Looking at the overall content of this document, the system established that it was a CV. Based on the similarity groupings, it also identified which cluster this particular document was put in. So that's the additional information you get at an unstructured file level. Equally, if I hover over the attribute section, it goes back to the content level, where it says I've discovered full names and emails, and also email addresses. And here you'll see the prefix NER, which shows that some of these findings came from NER-based recognition rather than regex-based recognition. The full names and the country and city fields were identified by the NER-based model; email was identified by a regex-based model; emails and full names were also identified by correlation-based search; and as an overall document classification, we established that this was a CV document.

Another useful feature is identifying duplicates. If you have the same file present in multiple places, the system identifies the duplicates and points out all the file objects — the full file paths of that particular file wherever it is present — so you can reduce the number of duplicates in your estate, optimise your storage cost and optimise the way you handle some of this metadata or master data as well. So that's what the catalog shows you.

Now, going back to a more granular view, let me talk about what you can do based on the discovery itself. If we look at classifier-based discovery, this is where you get to the next level of detail, looking at the specific classifier and where you find those specific findings. 
In this particular case there are data classifiers, which include both regex-based and NER-based classifiers. Then there are metadata-based classifiers, which you can identify only on the objects themselves, by looking at the metadata within them. And then there are document-level classifiers — there are about 18 to 20 of them already pre-trained in the system — to establish, for example, that these four documents look like financial statements, or these four look like invoices. That's based on what we've already trained the model to identify. The model itself can be trained further: you can create your own additional, custom classifiers, which you can either make part of the product or keep as something specific to your environment.

Then there's cluster analysis, which helps us look at the similarity of the data. When we look at similarity in raw form, the system groups the actual files together based on common content. In this particular case, you see seven files were grouped together because they all shared these common keywords. By default, the name of the cluster is just the appended list of those keywords, but as a business user the idea is that you come in, look at the keywords or the content attributes that have been discovered, and give that grouping some business context — decide what it should be called. It also allows you to create tags of your own, or to use established attributes such as risk level and assign a risk level to the cluster. There are a number of different ways you can use this, but the most relevant one here is to auto-classify these groupings of files: first group them into clusters, then apply a level of classification at each cluster to get down to the granular level — a more automated way of flagging and classifying these documents together. And just as we had document-level duplicate handling for each object, we also have it for clusters: even at a cluster level you're able to identify how many of the files within that cluster are duplicates, for example.

So that's another model we have for doing discovery. And coming back to the core discovery model, this is where you do value-based discovery to establish the presence of data elements in the different objects themselves. It creates a model behind the scenes with linkages for each data element, based on which values are discovered where. So here, for example, this email was discovered in the comment section in the note table. Equally, if I look at some other examples, I can see it was also discovered within the note section, or in the content field, as well. 
Again, the whole idea of creating this level of mapping is that you're doing value-based discovery, and at the same time, because of the models you're using, you're also able to establish your level of confidence in finding those values in these objects, which is what's shown here. In this particular model, we are 93% confident that the content section carries the email value. But equally, this content section also happens to hold other values, like patient references, email addresses, et cetera, which means it will now be mapped to three different data elements. If I look at patient reference, for example, the same object appears here as well, but with a different level of confidence: the confidence of finding patient references in the content section is 96%, while it was 93% for finding email addresses. So this is a typical example of a free-text field being used to hold multiple sets of values behind the scenes. That probably gives you a good understanding of the core discovery platform.

Beyond that, as we discussed earlier, there is the application framework. You'll see there's an Add Application button, which allows you to create your own applications and manage them here. But out of the box there are lots of different applications, such as these data privacy applications and data protection applications. And there are also third-party applications developed as community content or by the third parties themselves. Things like the JIRA integration, which allows synchronisation of tasks between JIRA and us. Things like the ServiceNow integration, where there are multiple actions you can take: importing data sources from the ServiceNow CMDB, syncing tickets into ServiceNow, or even pushing the actual data source or asset information back into ServiceNow. These are all API-level integrations we've done. These applications come in different flavours. Some are more security related, such as integrations with password vaults like HashiCorp and CyberArk. Some are more in the discovery domain — we can do network discovery as a result. And then there are others which help with things like breach analysis: the breach response app helps us do breach analysis; the Microsoft AIP app allows us to do labelling based on Microsoft labels; and the Access Intelligence dashboard helps us look at over-permissioning of data sets — beyond looking at the data itself, it highlights where data sets may be over-permissioned and hence open to the whole organisation, giving us a view of the risk as a result. So there are different kinds of applications behind the scenes for us to work with and build on.

That hopefully gives you a good flavour of the product and the way it's positioned so far. If there are any questions, we can probably open the floor for any specific Q&A at this point.

It's incredible seeing how far privacy engineering has come. And you've made it a reality, right? Yep. 
So I mean, this platform looks to be really helpful for getting organisations to go from policy to code to QA, providing real business value and going beyond mere theory on how to build privacy into your products, processes and applications. It's just incredible what you all have done and built. I just wanted to lay that out there.

Well, thank you. Yeah, it was a great vision four years ago, and here we are delivering on that vision. And like yourself, when we present it to partners, to customers, even to the analyst community, it gets the same reaction. It took a complete rethink of the world that we live in, and making it look simple is hard work, as we all know when we build stuff. But our discovery and depth, which is the foundation for all things, is something people on this call can take huge advantage of. We were on a call last week with a partner who had lots of discovery products but was really struggling with unstructured data, and myself and Sachin took them through the story and showed them what was possible. And they thought, oh God, we didn't know a product like this existed. When you can't get to all of your data, that's effectively your weakest link. So thank you for those comments. Any questions from the floor?

Thank you — you've left us speechless. This is usually when we know it's been a really good presentation and we have to go back and digest it. I'd love to have you back again so we can have some more discussions around privacy and everything you've talked about, because there's a lot here. Where do you see it going from here?

Well, I think we're very excited about IBM and Red Hat and the whole Cloud Pak for Data integration. We see that as a great combo. As I said earlier, I think the whole reason for IBM acquiring Red Hat — this whole delivery on a multi-cloud environment — is extraordinary. I think it's a great strategy. And we've been lucky that right from the beginning we defined an architecture that was not only scalable and global, but based on containers and Kubernetes. So we already have customers, as I said earlier, using our technology in an OpenShift environment. We'd love to give it to the community and really deliver against our topic of the day, privacy by design and privacy engineering, and show people how easy it is to take what we've built. In our ecosystem there's a huge amount of heavy lifting, whether it's SAP or the cloud vendors or some of the other ISVs you'll see in your customers, and a lot of that heavy lifting and integration work has already been done — it's just drag and drop and bring it into your flows. So a call to action for us: at the end of the slides you have my contact information and you have Sachin's. We'd love people to reach out to us. We'd love some comments. We'd love to see a whole bunch of people getting onto our training. Through the Covid period we've been doing a huge amount of training, and it's all free; people, while they've been stuck at home, have taken the time to do our training and start to move through the certification process. We'd love to even create a dedicated session for the OpenShift community and run a training course, perhaps, just for the OpenShift teams.

Thank you again — let's bring you back. Welcome to the OpenShift Commons community; this is fantastic. Thank you. Thank you, and thanks for listening, and thanks to the BigID team. Sachin, 
Great job and look forward to our next time.