Okay, thanks a lot Grun for introducing me and organizing this webinar. I'm happy to be here and presenting. I'm not really at the beach that's depicted in the background, but I thought that giving you a glimpse of the Greek summer would be nice, even if we cannot enjoy it right now. My name is Manolis Terrovitis, I'm a researcher and I work in the broad area of data management. In the last few years I've worked a lot on data anonymization techniques, and I'm heading the development of the Amnesia tool. I work at the Athena Research Center, where the tool is developed. In this presentation I'm first going to briefly present what data anonymization is and what one should expect from it. I will tell you a few things about the theory behind Amnesia, and then we're going to have a demo. If something is not clear or if you want some details, please write in the Q&A box and I will do my best to provide more details and explain again whatever is not clear to you. So, let me set up my screen and we can start from here. Well, why do we anonymize data? I think the general issue is obvious. We want to use data from human activities that contain very valuable information. I think the COVID pandemic is a very good example of this: data about how the disease spreads, or the contact networks that are created, is very useful to understand it and contain it. But at the same time, the data that is required is threatening to the privacy of citizens. For example, if movement data is at stake, it can reveal anything from social interactions to political or even sexual preferences. Very detailed information can be revealed from tracking human activity if it's studied closely. So data anonymization comes in and tries to do two things at a time. One is to protect the privacy of the user, so that we cannot recognize a user or make inferences about a specific person by looking at anonymized data.
I'm going to come back to what "cannot recognize" or "cannot make inferences" means. But at the same time, data anonymization produces anonymized data that are as accurate as possible, so that data analysis is feasible. Before getting into the details, I would like to stress the difference between data anonymization and pseudonymization, as it is defined in the GDPR and as it behaves in practice. Pseudonymization, or a naive anonymization that simply replaces direct identifiers like names, social security numbers or any other kind of identifier that clearly describes a person, results in pseudonymized data. This means that if someone can make the link again to the original person, everything can be revealed and no privacy is protected, since reverse engineering of the data is possible. There is no guarantee that a third party cannot use the data to go back to the original person. And this can happen in two ways. If the data are very naively anonymized, for example if the substitute identifiers are derived from the hashing of the original name or social security number, we may even be able to go back mathematically. But the easiest way to go back to the data is by using secondary identifiers: information that might not be identifying by itself, but by combining several bits of different information, we can isolate a specific person. Let me give you an example of how this works. I'll go through the example first and then come back to pseudonymization. This is a real example, and it was used in the first papers that introduced k-anonymity around the year 2000. It came from the state of Massachusetts in the USA. The situation was that, for reasons of transparency, hospitals published anonymized records of treatments.
They said that they removed the name and the social security number, but they made public all the data in the left circle here, described as medical data: ethnicity, visit date, diagnosis and so on. At the same time, in the United States there were public voter catalogs, which did not contain any sensitive information, but contained the data depicted in the right circle here. Now, if you look at these two circles, they intersect on three specific fields: the zip code, the date of birth and the gender of an individual. So any person could combine these two catalogs based on these secondary identifiers, which we call quasi-identifiers in the scope of anonymization. Based on the quasi-identifiers, they were able to link the treatment data from the hospitals to the identity of the person from the voter catalog, and it was actually shown that statistically, I think it was 87% of US citizens were unique based on the date of birth, gender and zip code. So by using these bits of information, which are not direct identifiers for a person, you could reverse engineer the pseudonymization and get the identity of a person. Now, the GDPR makes a strict distinction between pseudonymization and anonymization. It clearly states that pseudonymized data are not anonymous data; they are personal data, and you have to take every measure, the whole regulation applies to pseudonymized data. And the basic difference it defines is that pseudonymization is reversible, but anonymization is not. Now, this is not always black and white. There are many intermediate versions of anonymization, because an anonymization technique might not be reversible, but it might still allow an adversary to make some inferences about the original person.
Such an inference could be, for example, that a specific patient has an increased probability of having cancer in his medical record. But the GDPR says that anonymization methods, and these are methods like the ones provided by Amnesia, k-anonymity, k^m-anonymity, differential privacy, which give a guarantee of irreversibility, produce anonymous data which are no longer personal data. And this makes a huge difference in how we treat the data, because if they are anonymized and no longer personal data, then the GDPR does not apply to them. We can share them, we can use them for research and every other purpose. So real anonymization, like the one performed by Amnesia, frees us from the constraints of the GDPR and the constraints on handling personal data. I will now present the basic guarantees and anonymization methods employed by Amnesia, so you will have some background before we see the demo. I don't see any questions yet, so we're fine. The first guarantee that was proposed for anonymous data was k-anonymity. What does k-anonymity do? With k-anonymity, we transform the data to a more abstract form where the record of each person is indistinguishable from the records of k-1 other persons. Basically, we hide each individual in a group of k individuals. The most popular methods for doing this are suppression and generalization. Suppression is the removal of values: we remove outliers. Generalization is the replacement of specific values with more abstract ones. For example, in this example here, the exact age has been replaced by an age category: less than 30, more than 40, between 30 and 40; these are the three age categories here. For the zip code, the last digits are removed, and this reduces the accuracy of the administrative region described by the zip code. Zip codes actually have an internal structure, where each digit carries different information, usually administrative regions of different sizes.
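Conceptually, the k-anonymity check on generalized data can be sketched like this. This is a minimal illustration with made-up values (the attribute names and the tiny table are my own assumptions), not Amnesia's actual code:

```python
# Hypothetical sketch: checking whether a table satisfies k-anonymity
# over a chosen set of quasi-identifiers.
from collections import Counter

def is_k_anonymous(records, quasi_ids, k):
    """Return True if every combination of quasi-identifier values
    appears in at least k records."""
    groups = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    return all(count >= k for count in groups.values())

# Toy generalized table: exact ages and zip codes already abstracted.
table = [
    {"age": "<30", "zip": "130**", "disease": "flu"},
    {"age": "<30", "zip": "130**", "disease": "cancer"},
    {"age": ">40", "zip": "148**", "disease": "flu"},
    {"age": ">40", "zip": "148**", "disease": "flu"},
]
print(is_k_anonymous(table, ["age", "zip"], 2))  # True: each group has 2 records
print(is_k_anonymous(table, ["age", "zip"], 3))  # False: the groups are too small
```

Note that the sensitive attribute, the disease, is not part of the grouping; only the quasi-identifiers are.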
So in the rightmost table here, we have 4-anonymous data, where even if we know something about a person, say that John is American, he is 29 years old and we even know his zip code, we cannot learn which disease he has, because when we examine the second table there are always four candidates. Now, we're not going to get into a lot of detail here, and it doesn't always matter in practice, but it's good to keep in mind that k-anonymity is irreversible, yet there are always pitfalls. For example, if the equivalence groups are small, if k is low, then there is a chance that everybody in a group will have the same disease. So even if we cannot recognize the exact record, we can infer that a specific person suffers from a specific disease. Now, I'm going to go into some more detail on how Amnesia achieves k-anonymity. Most k-anonymity algorithms, in fact all algorithms that are based on generalization, whether for k-anonymity or for other guarantees like l-diversity, rely on some instruction on how to replace specific values with more generic ones, which we call a generalization hierarchy. Here you see a very simple example of how administrative regions, countries, are grouped into larger regions, and this semantic information is given to the algorithm. So if it needs to generalize a value which is a country, the algorithm will know which greater region it belongs to, and it will iteratively do as many generalizations as needed to achieve the desired privacy guarantee. The simplest way to work with generalization is the following. We examine all the data, and if they are not in groups of size k. Let's say that we have very simple data, where the only quasi-identifier is the location of a person.
Then we group the locations, and if some countries appear fewer than k times, let's say that k is equal to 4, we go one level up in the generalization hierarchy, and now the value of every country is replaced by the greater region it belongs to. If again there are groups that are smaller than k, we go one step up and do one more replacement. If the hierarchy is shaped like a tree, then there's always a root, so the algorithm converges, because in the worst case it will replace everything with a single value, and if we have more than k records then we are going to have k-anonymity. Of course this is a trivial case that guarantees the convergence of the algorithm but would probably not produce useful results. We call this global recoding and full-domain generalization. Global recoding means that if a value, for example Greece, is replaced in one place in the data set, then it will be replaced everywhere in the data set; there's no chance that we're going to see South Europe and Greece in the same data set. It's called full-domain because we go layer by layer: we do not opt to generalize just Greece and Italy to South Europe and leave Germany or Holland as they are, even if that satisfies k-anonymity. This is the simpler version, it requires the simplest algorithm, and the result is intuitive because everything is generalized to the same level. Here is an example of a 2-anonymization, a 2-anonymous table on the right. We want to make certain that each record belongs to a group with two records. So the only transformation we can do, at least the one that the algorithm finds, is the one in the rightmost table. Now the drawback, as you can see in this example, is that full-domain generalization is not very flexible: one problematic value will cause us to generalize all values in a layer. In contrast to full-domain generalization and global recoding, we can use what we call local recoding.
Here, in every different part, every different group, we can do a different substitution, a different recoding of the original values, so that we stay as close as possible to the original data while still providing the required privacy guarantee. As you see, the data here are again 2-anonymous, but the location information is delivered at different levels of granularity. In some cases it is just South Europe; in the case of Greece, which had a lot of appearances in this data set, in two cases it remains the same; and some values that could not be grouped any better appear just as Europe. So the gain with local recoding is that we reduce information loss, but the trade-off is that we have a more complex algorithm, and the data are less intuitive. When you search for a location, for example if you want to estimate how many people in the data set live in Greece, you will have to take all the people that appear to live in Greece, some of those that appear to live in South Europe, and even fewer of those that appear to live in Europe. So it requires more complex data analytics, but at the same time the results you get will be closer to the original. Amnesia supports both methods, and I'm going to demonstrate them both later. An additional advantage of local recoding is that it also allows us to process the data in partitions, and this greatly improves the scalability of our algorithm. Whereas for global recoding and full-domain generalization our algorithm loads everything into main memory, so it's limited by the amount of memory of the server that someone is using for the anonymization, local recoding can process just a part of the data set at each step. So a very large data set can be processed; it might take a lot of time, but it's doable, depending on how many resources you can give to it.
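To make the contrast between the two strategies concrete, here is a hypothetical Python sketch. This is illustrative only, not Amnesia's actual algorithms: the toy country hierarchy and the greedy generalization loops are my own assumptions:

```python
from collections import Counter

# Toy generalization hierarchy (an assumption for illustration):
# country -> region -> continent.
HIERARCHY = {
    "Greece": "South Europe", "Italy": "South Europe", "Spain": "South Europe",
    "Germany": "North Europe", "Holland": "North Europe",
    "South Europe": "Europe", "North Europe": "Europe",
}

def full_domain_recode(values, k, max_levels=3):
    """Global recoding, full domain: whenever any group is smaller
    than k, EVERY value moves one level up the hierarchy."""
    for _ in range(max_levels):
        counts = Counter(values)
        if all(c >= k for c in counts.values()):
            break
        values = [HIERARCHY.get(v, v) for v in values]
    return values

def local_recode(values, k, max_levels=3):
    """Local recoding: only values in undersized groups are generalized,
    so frequent values keep their original granularity."""
    for _ in range(max_levels):
        counts = Counter(values)
        if all(c >= k for c in counts.values()):
            break
        values = [HIERARCHY.get(v, v) if counts[v] < k else v for v in values]
    return values

data = ["Greece", "Greece", "Italy", "Germany", "Holland", "Spain"]
print(full_domain_recode(data, 2))  # every value pushed to the same level
print(local_recode(data, 2))        # Greece survives untouched
```

With k=2, full-domain recoding generalizes all six records to regions, while local recoding keeps the two Greece records intact and only generalizes the singleton countries, which matches the trade-off described above.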
So both these algorithms achieve the same guarantee, which is k-anonymity, but they follow different ways of transforming the data. Now, I would also like to discuss k^m-anonymity, a more complex guarantee that we offer in Amnesia, which is also one of its unique features; to my knowledge there is no other tool that offers k^m-anonymity. K-anonymity works well on simple relational data, where you have a limited number of quasi-identifiers, because it requires that each group has exactly identical records with respect to the quasi-identifiers. This is very problematic when we have sparse, high-dimensional data. I'm going to show you directly with an example. Let's say that in the topmost table we have a very simplified log of transactions in a retail store. We just record that Vasilis has bought fruits and meat, and has not bought vegetables or fish; Manolis has bought fruits, meat and vegetables, but not fish. If every product that Vasilis or Manolis bought is a quasi-identifier, then to produce a 2-anonymous data set we would have to make their records identical. So "vegetables" and "not vegetables" would have to become one value. And since every record here has a unique combination of products bought from a possibly very large domain of different products, the only solution we would get, even for 2-anonymity, is the one in the bottom table: we would only say that each person bought food, because the quasi-identifiers have to be identical. Now, if we were a bit more flexible, we could gain a lot more in terms of information quality, and from a privacy point of view we can afford to be a bit more flexible. For example, think that you go to a supermarket that has 20,000 products and you buy 20 of them.
It is not likely that an adversary can monitor whether or not you bought each of 20,000 products. An adversary would only have partial knowledge, either because he looked at your shopping bag or because he runs a loyalty program that tracks a few of the products you put in your shopping bag. So if we consider adversaries that have only partial knowledge about the products you bought, who know up to m products that were bought in a single transaction, then we can offer what we call the k^m guarantee, which guarantees that every combination of m different products will appear at least k times. So we can offer the more informative table that you see here at the bottom of the screen, where every combination of two quasi-identifiers, fruits and meat, fruits and other food, meat and other food, appears at least two times, in two different candidates. Note that this method does not consider negative knowledge. It does not consider that someone did not buy something, because we consider this a very weak quasi-identifier in real-world cases: if you buy 20 products out of 20,000, knowing that you didn't buy something is not very informative, and it's not easy to come by this knowledge, whereas it is a lot easier to come by the knowledge that you actually bought something. So a 2^2-anonymous data set could be the one that you see below, and this guarantees that any adversary who knows up to two bits of information from what's in your shopping bag, two quasi-identifiers out of all the quasi-identifiers that characterize a person, will always have k, here two, candidate records in the anonymized data set. So this anonymization method is better suited for multidimensional data, that is, when you have too many quasi-identifiers, and it's very effective when the data are sparse, as in many cases where, out of numerous quasi-identifiers, only a few are not null, only a few have some positive value in them.
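The k^m guarantee just described can be checked mechanically: every combination of up to m items must be shared by at least k transactions. Here is a hypothetical sketch (not Amnesia's code; the toy transactions mirror the generalized table from the example above):

```python
# Hypothetical sketch: verifying the k^m guarantee on set-valued data.
from collections import Counter
from itertools import combinations

def satisfies_km(transactions, k, m):
    """Return True if every combination of up to m items appears
    in at least k transactions."""
    counts = Counter()
    for items in transactions:
        for size in range(1, m + 1):
            for combo in combinations(sorted(items), size):
                counts[combo] += 1
    return all(c >= k for c in counts.values())

# Generalized transactions, as in the 2^2-anonymous table of the example.
anonymized = [
    {"fruits", "meat"},
    {"fruits", "meat", "other food"},
    {"fruits", "other food"},
    {"meat", "other food"},
]
print(satisfies_km(anonymized, k=2, m=2))  # True: every pair has 2 candidates
```

Note that, as in the talk, only positive knowledge is modeled: absent items are simply not enumerated, so "did not buy X" never acts as a quasi-identifier.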
So, okay, I'm not going to discuss everything about data anonymization and all its challenges now. I'm going to stick to the challenges that we're going to meet with Amnesia and every other tool. The basic one is that there is still not enough experience on how to set the various values that define the strength of a privacy guarantee. We don't have decades of using anonymization methods, so we don't know which settings are dangerous or not. I can only bring forward the practices of the statistical authorities, which usually publish data in a category only if it is populated enough; I think for Eurostat it was at least five entries in a statistical category, and for the Greek authorities, as a smaller country, it was just three entries in each category. Another challenge. Okay, I see something in the Q&A. I will tell you a few things about k and m; I think this is exactly what I'm answering now. My only guidance at this moment is the practice of statistical authorities, so a k of five would be in line with what statistical authorities do. Now, m should be an estimate by experts in the field of how probable it is for a third person to acquire m parts of a record. So this is a challenge, because it's not a technical challenge; it's not something that we, as the technical people who develop the algorithms, can actually solve. What we can give is flexibility, so that it can be easily adjusted by experts, but it's only experience that actually helps to define k and m. I will answer one more question that I think is relevant: since anonymization is not really tested in practice, should we treat anonymized data as pseudonymized under the GDPR, to be safe? I would say no; I would say that we should treat them as anonymized, not as pseudonymized.
Maybe we should take some small steps: use anonymized data and share them only with a controlled audience, or start by sharing only a small part of our data. But the guarantees are there, and the guarantees provided by anonymization methods, even the simplest ones, even the weakest ones, are a lot stronger than what is done now in practice for anonymizing data. There are no long years of experience with these kinds of methods, but there are some years of experience with other methods, like what the statistical authorities do, or what the HIPAA rules do in the United States. The United States have some fixed rules for providing anonymous data, which would count as pseudonymized under the GDPR: rules saying that, you know, from a zip code you can only publish the first three digits, or if you want to publish the birth date of a person, you only publish the year or the decade. And after years of practice, they have not had very important problems with these kinds of methods. So if we use the existing methods as a point of reference, and we're certain that the methods we provide for anonymizing data are a formal step forward, I think we have taken what the GDPR calls all reasonable measures to guarantee anonymity. Because the GDPR also has this wording, that the data cannot be reversed by spending reasonable resources. So if you use the state of the art, and you know it's better than the older practice, I think you can show conformity to the GDPR. One more question about where anonymization fits: I am asked whether, if we have efficient access control, we still need anonymization. I would say that they are complementary. Access control gives full access to all the information to people that you trust. The idea of anonymization is to give the useful information to people that you do not trust.
For example, with COVID, you might want to share all movement data so researchers everywhere can use them. You want to give the data to a wide audience, where you cannot do any efficient access control on who gets them, or you don't want them to get the personal details anyway. So anonymization comes in when you do not trust the recipient to handle personal data, but you still want to give out the useful information for other purposes, like scientific or marketing research. So access control and anonymization have different goals. Also, the GDPR says that you have to give as little personal information as needed. If somebody wants to produce scientific results or do marketing research, they do not need to know the personal information in the data; they just want to know correlations and patterns. So with anonymization, you try to remove the personal information and leave only the correlations and the patterns. Okay, what time is it? I want to continue with the questions, and okay, I think we have time. Another question is how a data owner who keeps the original data can actually produce anonymous data, since the original data would always allow them to reverse engineer the anonymized data. This is not the case. If we randomize the order of the records, which is trivial, and replace the quasi-identifiers with more general values, we cannot go back, even with the original data. And of course, the data owner does not need to go back. When the GDPR talks about reversibility, it does not refer to the data owner; the data owner who has the original data does not need to do any reverse engineering, he has the original data. It refers to an attacker, and an attacker cannot reverse engineer data that have been anonymized formally, where there is a statistical guarantee about which records can be isolated or not. Okay, one question that, as I'm told, I have not answered completely; it's by Judith, so please send more if I'm not answering.
If I understand well, the question is how we can use these anonymization algorithms on data that are collected incrementally, where over time we get more records. You cannot really make a guarantee across data that are updated. If you anonymize a data set and give it out publicly, and then you add 10% additional records to the same data set, anonymize it again in a different way and give it out publicly, then there is no guarantee that somebody cannot use both versions of the data to learn more than what you guarantee. This is an inherent problem of almost all data anonymization techniques. It's not a problem of Amnesia; it's a problem of k-anonymity, l-diversity, t-closeness and even differential privacy. Differential privacy does offer the option to estimate the possible information leakage, since it just doubles the privacy budget when you anonymize the same data twice. But in general, updates are very problematic. What you can do, if you have, let's say, 100 records that you have published anonymized in the past and you get 10 more, is to anonymize the 10 new records separately and publish them, and not re-anonymize the original 100. This way it is safe, because the two data sets are different and disjoint. About having a portable or a browser version of Amnesia: we do have an online version, and you can find it on our site, which is amnesia.openaire.eu. But we consider it suitable mostly for demo and training purposes, because to use the online version your data have to leave your premises, go to the server, get anonymized there, and then you get them back. So this is not a good practice. As for a portable version: this one is not portable, you install it, but since it's plain Java we can make it portable, without needing an installation on the local computer. Thanks for the comment, I will keep this in mind.
Okay, about the tool storing the data, I think this was partly covered before. If you download the tool, which is the safest practice, it reads the data from your local disk. So it is the user who stores the data; the data are only used in main memory, and the anonymized file is written again locally. If you use the online version, the data are stored for a short while on the server, and that's why we do not propose using the online version for real anonymization; it's mostly for training and demos. If you want to anonymize sensitive data, I would recommend to everyone to download Amnesia and use it locally. It's also a matter of computing resources: the online version cannot really anonymize very big data, because anonymization is an expensive procedure, similar to data mining, and we do not have enough resources in the online version to serve many users at the same time on large datasets. We do not do anonymization on completely unstructured information. We only work on tables, like exported Excel tables, or semi-structured information with set values, and I think I will very quickly move to the demo now. As we cannot anonymize unstructured data, we also cannot anonymize qualitative data, I mean anything unstructured like interview transcripts; we can only anonymize features that have been coded as tables, or object-relational tables, which are a bit less strict. The link for Amnesia: sorry, I thought I had it in the description. So, I think I will stop answering questions here. Actually, I'm not on a clock, so I can spend more time answering them, but I would like to fit the demo inside the designated time period for this webinar. So, let me say... Sorry to interrupt, you can go over time afterwards. So, if you want to first give the demo, then if there are some more questions you want to answer, it's no problem if people want to stay. Okay, thanks. That's what I had in mind.
So, for everyone who does not have enough time, we'll have the demo now. Okay, this is Amnesia. Now you all see the first page of Amnesia. Obviously, the first thing we have to do is to upload the data set, so we choose a data set, one I have already chosen in the past. First, we're going to see how we can do simple k-anonymity. This is more or less an Excel-like import wizard, where we take delimited files and define the delimiter for every field. Amnesia guesses the type of the data, but it might not be correct, so the user has to correct it. Here we can choose which data will participate in the output data set, so we remove the direct identifiers. These are synthetic data, similar to health data in the UK, based on some original data we had access to. I'm going to leave the date of birth, the marital status and the diagnosis codes as quasi-identifiers, and I'm going to do simple k-anonymity. So we have to create hierarchies. Hierarchies are text files in a specific format that can be created either through the tool or manually; they just describe how to generalize each value. This is a hierarchy that was created in the past, and I'm just reloading it. Amnesia also helps us auto-generate hierarchies, which is quite useful when we have continuous domains like the date of birth. Dates are the most complex ones. What I have to give is a name for this hierarchy; Amnesia reads the data set and determines the lowest and highest values. The way it depicts the dates here, the years are 1930 to 1989; it's just the Java date format that's chosen here. And because dates are not on the decimal system, we count them in days and months and years, we have to instruct Amnesia how to define periods in terms of years, months and dates, and then how the five-year periods will be grouped into higher-level ranges.
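A rough sketch of this kind of auto-generated year hierarchy looks like the following. The grouping factors here (5-year bins, grouped by 5 into 25-year periods) and the label format are my own assumptions for illustration, not necessarily Amnesia's defaults:

```python
# Hypothetical sketch: building a two-level generalization hierarchy over
# years, as a date-hierarchy wizard might.
def year_hierarchy(lo, hi, step=5, fanout=5):
    """Map each year to its 5-year bin, and each bin to a wider period."""
    leaf_to_bin, bin_to_period = {}, {}
    for year in range(lo, hi + 1):
        start = lo + ((year - lo) // step) * step
        leaf_to_bin[year] = f"{start}-{start + step - 1}"
    wide = step * fanout  # 25-year periods at the next level up
    for year in range(lo, hi + 1, step):
        wide_start = lo + ((year - lo) // wide) * wide
        bin_to_period[f"{year}-{year + step - 1}"] = (
            f"{wide_start}-{wide_start + wide - 1}")
    return leaf_to_bin, bin_to_period

leaves, bins = year_hierarchy(1930, 1989)
print(leaves[1934])          # '1930-1934'
print(bins["1930-1934"])     # '1930-1954'
```

The two dictionaries together play the role of the hierarchy file: generalizing a value once means looking it up one level higher.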
So, by doing this, we get a hierarchy over dates. When we want to generate a hierarchy over strings, over distinct values, Amnesia does not know how to group them; it can only allow us to do a sorting, alphabetical or not. And here we're going to see the first big problem of simple k-anonymity with more complex data. It assigns arbitrary names to the groups, and at the bottom it has the values of the first rows of our data which, if you want to see it again, looked like this. This is actually a set of different ICD codes. ICD codes are diagnosis codes for patients; each of these codes means that a patient has been diagnosed with a different disease. But if we store them like this in a simple table, one that is not object-relational, then each collection of codes is treated like a different string. So, when we go to the algorithms, we have to instruct Amnesia which hierarchy to use with which attribute, give a value for k, and execute the anonymization algorithm. Now, I'm going to zoom in. The anonymization algorithm for k-anonymity gives us this rather complex lattice. Remember, we had three quasi-identifiers; each node shows a different state of generalizing them. In this node, the first quasi-identifier has been generalized three times, the second one three times, and the last one not at all, and the data look like this. Also, 13% of the data do not fall in some group of size 5; we had k equal to 5. Red nodes depict generalization states that do not produce k-anonymity; blue nodes are valid solutions, that is, solutions that do produce a k-anonymous data set. Look at this k-anonymous data set: the marital status has not been altered at all, the date of birth has been replaced by 25-year periods, and the diagnosis codes have been totally erased, because they are all different.
Because the combination of individual diagnosis codes is different for every person, and k-anonymity requires that they are identical. So just by pressing this, we apply this anonymization solution, and we can save the anonymized data locally. Another option would be to examine a solution that's not valid, that does not produce k-anonymity. I'm sorry, this is not a good example; let me see. Okay, I will go back to this one. This one says that 13% of the values do not adhere to k-anonymity. Now, instead of generalizing the data more, I could choose to suppress, to just delete, that 13% of the data set. In other examples this could be very small, even less than 1%. Then we get an anonymized solution where we have preserved the date of birth more accurately, instead of 25-year periods we have 5-year periods, but we have lost part of the original records. And this is a trade-off that's decided by the data curator. Okay, let me see your questions. To anonymize an Excel file, you just have to save the Excel file as a comma-delimited file, which is a simpler format, and then you can upload the data to Amnesia and define the delimiter. We do have a small data set on our web page, and we will upload a more complete example in the next period, so you can have more options for training. And again, the discussion about how we choose k: I think we've answered this before; it's based on experience, and values of 3 or 5 are in line with what statistical authorities do when publishing data for different groups. So, okay, we have 5 minutes here, so I'm going to show you the k^m-anonymity choice. I'm going to restart and load the same data set again, with the same delimiter, but this time I'm telling Amnesia that this is not a simple table; it's a table with a set-valued attribute.
Those of you who are familiar with computer science: this is an object-relational table, and what we're actually telling Amnesia is that one of the fields is not a simple field. It holds a set of values, an arbitrary number of them. So Amnesia has more flexibility in the data structure it uses to model the data. And now we have to give a second delimiter. The first delimiter separates the different fields, which is the comma here, and the second delimiter separates the different values inside a field. So Amnesia has guessed that this is a set, and it's actually a set of these different ICD codes. I'm going to remove these things and keep just a simple example. I'm only going to keep the marital status, and now I get this data set. I have a predefined hierarchy for the ICD codes, which follows the diagnosis ontology used globally, and I'm going to use a hierarchy in the same way for the marital status. Again, we let the algorithm know which hierarchy to use with which quasi-identifier, but now we're going to ask for k^m-anonymity. This is the way we treat set values: an adversary that knows up to, let's say, two values will not be able to isolate fewer than five records. I'm going to show you what this means. Now, this is the anonymized data. We have gone up the diagnosis hierarchy, but there's still useful information here, and we have sacrificed the marital status. Now, k^m-anonymity is a more expensive algorithm, so the algorithm tries to choose one good solution without offering us all the options that simple k-anonymity offered. The guarantee here is that anyone who knows two values, which could be two ICD codes, or one ICD code and the marital status of a person, will not be able to isolate fewer than k records. The way the algorithm achieved this is that it actually removed the marital status and reduced the detail of the diagnosis codes.
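The two-delimiter idea, a field delimiter plus a second delimiter inside the set-valued field, can be sketched like this. The record layout, delimiters, and codes here are assumptions for illustration, not Amnesia's actual parser:

```python
def parse_record(line, field_delim=",", value_delim=" "):
    """Split a record on the field delimiter, then split the
    set-valued ICD field on the second delimiter into a set."""
    dob, marital, codes = line.split(field_delim)
    return {"dob": dob, "marital": marital,
            "icd": set(codes.split(value_delim))}

record = parse_record("1975-03-12,Married,C50 E11 I10")
print(record["icd"])  # a set of three ICD codes
```

Storing the codes as a set, rather than as one opaque string, is what lets the algorithm later generalize or drop individual codes instead of treating each combination as unique.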
And if you remember, for k-anonymity we had to completely sacrifice the most useful information here, which was the ICD codes, because each combination was unique. Now, you know, the patient here has one, two, three, four, five, six, seven, eight different diagnoses. If we wanted a k-anonymous dataset, we would have to find four other people with the same eight diagnoses. This is infeasible, so we would have to destroy it all. And it's also pointless: if an adversary already knows eight ICD codes about the health condition of a person, then there's probably not much left to hide. What's most important is to protect against someone who knows just a part of your health condition. So here the quasi-identifiers are two diagnosis codes, or a diagnosis code and the marital status. Okay, let me take the questions again. We have only very short video tutorials so far. I will upload more, and now you will have this webinar and the other webinars; I'm also going to create a few more, give me some time for that, covering more specific jobs, like how to create a new hierarchy, or explaining some more details. Access to Amnesia is via the OpenAIRE portal. Okay, and since I see that we have a little more time, I'm going to show you the last algorithm we have in Amnesia, which also offers k-anonymity, but does local recoding and is disk-based. So it scales a lot better and offers better quality in the result. Now, to simplify things, I'm going to use just the date of birth and the marital status for this. I'm going to proceed to the hierarchies: I'm going to generate a hierarchy for the date of birth again. I could have saved the previous one and reused it, but I just didn't, so I'll show you again. Okay, this is the date hierarchy. I select the date of birth and the marital status, choose k equal to five, and run the algorithm. Here there are not many options.
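The k^m-anonymity guarantee described above, that any combination of up to m known values must match at least k records, can be checked with a brute-force sketch. This is fine for tiny examples but is not how Amnesia implements it, and the diagnosis codes are hypothetical:

```python
from collections import Counter
from itertools import combinations

def is_km_anonymous(records, k, m):
    """Every combination of up to m values an adversary might know
    must be contained in at least k of the set-valued records."""
    for size in range(1, m + 1):
        support = Counter()
        for record in records:
            for combo in combinations(sorted(record), size):
                support[combo] += 1
        if any(count < k for count in support.values()):
            return False
    return True

# Hypothetical sets of (generalized) diagnosis codes.
print(is_km_anonymous([{"C50", "E11"}, {"C50", "E11"}], k=2, m=2))  # True
print(is_km_anonymous([{"C50", "E11"}, {"C50", "I10"}], k=2, m=2))  # False
```

Note that the check only considers combinations of up to m values, which is exactly why the full set of eight diagnoses does not have to be identical across k people, as plain k-anonymity would require.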
The algorithm directly chooses the best solution it finds, but I want you to notice the first three values here. We do not know the original age; it was a number between 30 and 49. Now, if we had global recoding, all numbers between 30 and 49, if they were replaced, would be replaced by exactly the same, more generic category. But here we see that one number has been replaced by this range, while another number that might be in the same place has been replaced by this other range, and yet another, which overlaps with the previous one, has been replaced by this range. So we have many ranges that are at different granularity levels and actually overlap with each other, because now the algorithm splits the data into smaller parts and in each part tries to find the best solution for that part. This makes it more scalable, and at the same time an analyst can get more accurate estimations about the original data, for example the distribution of the ages in the original data. Let me see the Q&A. Okay. I don't see any more questions. Okay. I think this is all from me for showing you how Amnesia works overall. Thank you for your questions, and I note that we need to publish more training material; this is actually our goal for the next month. In the rest of June and in July, we're going to publish more material on the website and also as YouTube videos. Also, about the last question, on uploading the file: I'm going to change the upload description, because it's not actually an upload, it's just a loading of the file. If you have Amnesia installed locally, the file is just loaded into main memory, or if you use the disk-based method, only a part of it is kept in main memory. It doesn't go anywhere, and you can use Amnesia locally without any connection to the internet.
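The difference from global recoding can be illustrated with a tiny local-recoding sketch for a single numeric attribute. This is my own simplification, not the disk-based algorithm in Amnesia: each partition of at least k values gets its own range, so different partitions end up with ranges of different widths rather than one global generalization level:

```python
def local_recode(values, k):
    """Sort the values, cut them into partitions of at least k,
    and generalize each partition to its own [min, max] range.
    Unlike global recoding, the range is chosen per partition,
    so ranges can differ in granularity."""
    values = sorted(values)
    parts = [values[i:i + k] for i in range(0, len(values), k)]
    if len(parts) > 1 and len(parts[-1]) < k:  # merge a too-small tail
        parts[-2].extend(parts.pop())
    return [(part[0], part[-1]) for part in parts]

print(local_recode([30, 31, 33, 40, 41, 49], k=3))  # [(30, 33), (40, 49)]
```

Here the first group is generalized to a narrow 4-year range and the second to a 10-year range; with global recoding both would have been forced to the same coarse level. In a real multi-attribute table the per-partition ranges can also overlap, as seen in the demo.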
So when you use it locally, it's completely safe; nothing leaves your computer. If you use it as a service from our server, then the file is loaded on the server, so it is saved there for a small period of time. So I would not recommend using the online service for anything more than training or demonstrating how the tool works. If you want to use it for real, use it locally. As for usage statistics, we only have statistics from the web page, which show, I think, that in the previous year more than a thousand people used the online service and around 2,300 downloaded and installed it; that was the first year we offered it in a more mature form. I see the interest growing every day, but since we do not yet offer any kind of paid subscription or support, we do not know how many would actually buy it and use it on real data. And since we don't track anything when you download Amnesia, we do not know how people use it. Okay, so I hope I gave you a good understanding of how Amnesia works. Please do not hesitate to contact me if you want any help using it, or if you have any comments, anything you would like to see, or something that doesn't work well. We're actively developing it, and since this is a new field, both for the developers and the users, we want to be certain that we cover every need. Also, if you want to use it on a real-world case, we would very much like to have feedback on that. Thanks a lot, everyone. Okay. Thank you very much, Manolis, for taking the time to demonstrate this. I think it was very useful. Everyone, the link to the slides is already available via the OpenAIRE webinar pages, and I will put the recordings there in a couple of hours. So if you have time, please fill in the evaluation form.
Please note, we will not send an additional email to notify you of the availability of the recordings and the slides; you will just have to check the page later this evening and they will be there. And Manolis, did you provide your email address at the end of your presentation? I think I had it in the presentation, on the first slide. Okay. Good. Yeah. Okay. So if you want to contact Manolis directly about this webinar, you can find his email address there. If you have any general questions about webinars, or if you have a question you want me to pass on to Manolis, just use the webinars@openaire.eu address. Thank you very much for attending, and I hope to see you soon in another webinar. Thank you very much. Thank you very much, everyone. Thank you.