Hello everybody. You have probably heard about the GDPR, and even if you are not a hacker or an IT nerd, you will have learned about it because of all the cookie stuff happening in browsers and on websites. Or not happening, or not happening correctly. So we have Karel Kubicek here, who will tell us a lot about the possible violations, the correctness or incorrectness, of these cookie consent notices. He is a PhD student at ETH Zurich. Have fun with his talk.

Thank you for the introduction. This talk is about work that we published and that is going to appear at the USENIX Security conference. The main author is actually not me but Dino Bollinger, and he was the one who implemented it. I want to give him the credit, but he cannot be here to present it.

As I said, you have probably seen pop-ups similar to this one. This one is very simple. It does not give you many choices; it just informs you that cookies are being used "for the best experience", whatever that means. At the other extreme are consent notices so complicated that they contain hundreds of choices. This one has 120 to 130 choices, which basically corresponds to every third party that is included by the page. I don't want to shame the Neue Zürcher Zeitung here; the guy who implemented this even wrote to me that he is unhappy about it. But this is not an outlier. Lots of websites include 100 or more third parties. The outlier would be 200.
Those are the extremes, but 100 still happens.

You probably know these consent notices from Facebook and Google. You would say that these companies have enough budget to have developers implement this correctly, and that they also have a legal team to give the developers the expertise to do it properly. So they should be compliant. Not really: just in January they were fined, 150 million euros for Google and 60 million for Facebook. So even they cannot make it compliant, which suggests that the goal is rather not to be compliant than that they are unable to do it properly.

What do I mean by not compliant? There are two regulations in play. I am not going to overload you with legal details, just the surface. Most of you have heard of the GDPR; the other one that plays a role here is the ePrivacy Directive. The ePrivacy Directive is actually the older one, and it defines what needs consent. It states that all but strictly necessary data processing needs consent. It does not speak about cookies, even though it is sometimes called the cookie law. The technology does not matter; it is consent for data processing. So properly these things should be called data processing consents or tracking consents, but people would probably not agree to tracking. Calling it cookie consent is just a euphemism, another trick being used. Those of you who attended the talk about dark patterns have heard how many tricks like this are used.

The GDPR then defines how consent is given, and there are four requirements on the consent. Consent should be freely given and it should be unambiguous. Freely given means that you should have some choice. Here you can only click "Got it". What does that mean?
What does it mean if I ignore this pop-up, leave it be, and browse the website? Does it mean that I consented? That makes it ambiguous, because we are not sure what we are doing by interacting with it. Consent should also be specific and informed. That is, it should be clear to the user what is being consented to, and the user really should understand it. I will show more details, but another aspect is that the GDPR requires that when data is collected or used, the usage is limited to certain purposes. That is the purpose limitation principle. This is why you sometimes see declared purposes, and processing should be limited to only these purposes.

Let me give you an example of a better notice. I will be using this particular notice a lot during this talk. Specificity is implemented here by giving you choices for concrete purposes. It is informed in that you can click "show details" and read why these cookies are being used. And purpose limitation is achieved by categorizing the cookies into the different purposes and letting you choose which you want and which you do not.

This is still not ideal. The problem is that the checkboxes are checked by default, so this is opt-in by default, and that still violates the law. But one can imagine that the developers fix that and leave the boxes unchecked by default, so there is no problem with designing a proper consent notice.
Yeah So it's everything soft because we can Design a proper consent not really another problem is that actually the websites are not respecting what is being consented to show that let's play this video and We are visiting website for the first time and we are being asked for consent So here you can see the consent before we would even click agree We just show you that everything was by default opt-out, which is the correct implementation really everything That takes a bit longer in the video because they are all the third parties listed and so but everything is off You see that there is nothing that we would agree to we don't interact with the pop-up but we look inside of the browser and now we can see that there are Google Analytics cookies already used our other tracking cookies are used by this website and This just shows that actually even if you think that you honestly interact with the Consent you opt-out of from everything then you are still being tracked often Sorry a bit of Related work here, but again, you don't have to read all on this slide just an overview There are plenty of studies that are showing how these consents are flawed and one Generic aspect here is that More recent studies on the bottom are finding more problems. That is not because The website would actually get worse. That is because the studies just look more into the This is more in-depth analysis while the first studies were very shallow Other aspect that I wanted to show here the things that are in bold Those are showing the violations that are happening Despite your interaction with the Consent so those are the cookies that are in tracking queue despite You rejected them or before you even interacted with the Consent. So it is not novel. I'm just one Next step in this observe in these observations Other direction are the dark patterns. So there were so many studies about dark patterns The conclusion here is that they work. They work incredibly well. 
They trick, they nudge 90% of users, or even more, into agreeing to something that they would not want to agree to. If it were called tracking consent, people would not agree that easily. I would like to ask where in the physical world you would see such a significant level of non-compliance. If regulations for cars worked the same way as digital privacy regulations, I would not be biking to work, because I would be scared that with 90% probability some car running a red light would hit me. I am not going to give you an answer to why this is happening in the digital world. Maybe in the future; I am working on that. But for now I am at least going to give you a solution.

The goal of our work was to create a browser extension that classifies cookies using machine learning and removes them as they are created, thereby enforcing greater privacy for users directly in the browser, without needing the cooperation of the website. How do we do that? This is basically the outline for a significant part of this talk. We found a way to collect a training data set, then we extract features from this data, train a machine learning model, and build a browser extension with this model included. Let's go through these steps.

Data collection. I don't know if any of you have experience with machine learning in the sense that you have ever annotated a data set yourself, but it is a super annoying task. I did that, not for this project, and it really is a demanding task. What was nice here is that we found a trick to collect a data set that is already annotated. I was showing you this consent notice. It lists the cookie names, assigned to domains and then categorized; here the cookie is assigned to the statistical purpose. That means there are consent notices that do the annotation for us.
They match the cookies to the purposes. The problem is that consent notices look very different. If we had to design a method to collect this from all the different types of notices, finding such a method would be another huge task. Luckily, there are JavaScript libraries that you can just include in your website and that do the consent for you. These are third parties providing the consent notices for websites, so they have the same interface on all websites that use them. They are used by about 2% of websites, and these are the three that I was showing here. They have the purposes, so they can serve as our source of annotated data. They also have the advantage that we do not even have to parse text to extract it: they have internal objects that define the actual list of cookies and their assigned classes. These are JSON files or other representations, so we can get the data in a machine-readable format without much work.

I did not really talk about the purposes yet. I mentioned just one so far, the legally defined strictly necessary cookies, and that is really the only one defined by law. The others are de facto standards, proposed by these consent providers. Sorry, I forgot to mention that these are called consent management platforms. I might sometimes use the abbreviation CMP, but I will try to always use the long version because I don't like abbreviations. These consent management platforms define four purposes, so we just stick to those.

Let me introduce them. Strictly necessary is something that is really needed: your shopping basket on an e-commerce website is clearly needed, and so is the authentication cookie for your login. Functionality cookies are something that they define for changing settings of the website.
Do you want to use dark mode? That will often be stored in a functionality cookie. Sometimes this category is also used for switching languages, which I find a bit unfortunate, but they defined it as a functionality cookie, not a strictly necessary one. These two categories are the cookies that you would like to have.

The remaining two categories are what the website wants. The website wants to collect some aggregated information about you, so that it can do A/B testing of which thumbnail or article title increases the number of clicks, or just for debugging purposes. Those are the performance and analytics cookies; they collect information aggregated over all users. The most privacy-problematic type are the tracking and advertising cookies. These focus on your individual actions and build a profile based on which they can give you targeted content or targeted advertisements.

So now we know the purposes and we know where we are collecting the data. Let me report on what we collected. We first did a headless crawl, so without rendering the pages, of six million websites and found that these particular consent management platforms are used on about 37,000 of them. That was the fast detection. Then we crawled these 37,500 websites with full browsers. The library we used, OpenWPM, is often used for privacy studies. It provides very detailed instrumentation of what is being created, which is useful for us.

This crawl had two goals. One was collecting the declared data, that is, the cookies and their assignment to purposes according to the consent notice. The other was observing the cookies actually used, because that is what is going to be classified; we really needed to see the cookies and observe their content. For that we browsed the website randomly.
So then random pages that we opened And we scrolled a bit because often they track actually our mouse movement and these are enabling a lot of cookies So that was the interaction that we did to Create the cookies such that we can observe them and We had to make sure that we are actually Consenting to all the cookies we want to observe the cookies in this case So there's a browser extension called consent automatic that is built for these consent management platforms And it automatically can reject or accept all cookies for these so we used to that one and Made sure that we are consenting everything Now from this crawl of the 37,000 websites roughly 30,000 reworked from them we Found declaration to 2.2 million websites. So sorry 2.2 million cookies and Then we also observed so we're seeing the cookie being used we observed 600 cookies 600,000 cookies And then the overlap of those two is actually just 323,000 cookies so that is going to be our training data set and the first issue here is that We are not observing All the cookies that are being declared which is Coming from two reasons one is that these constants are being outdated They are having cookies that are not being used at all But the main reason is actually that just our crawl is Not able to do as detailed interaction with the website as real human So as human you can register to the website you can change the settings and that will enable a lot of cookies But the crawl is agnostic of the content of the website So it's not going to observe all the cookie and cookies and it is quite some limitation of our work But other problem here is actually that there are plenty of these this green region a lot of cookies that we Found so we observe them, but they are not declared and that is going to be something to which I will Return in the end of the talk because that is basically a privacy violation The constant is not complete and I'm going to present you more types of privacy violations I can also just show that 
The classes of these cookies that we found are highly imbalanced So that is something that we have to deal with in our training and in our process, but They are mostly really most of the cookies on the internet are advertising and analytics cookies That was about the data collection now We need to extract some meaningful features. So we need to take The cookie which is a name and some value and some other flex We need to take that and we need to process it into some numerical Representation that is suitable for the machine learning to train Actually the prediction So for that what we did we defined over 50 methods That are using the expert knowledge here to extract something that the humans would see inside of the cookies So we when we see the cookie we have some understanding as developers. So we tried to give these hints to the machine learning module and I'm not going to present 50 methods. So just for us an example a Lot of cookies they contain some timestamps. So it can be The Unix encoding it can be human readable format, but the important thing is that it is often being used to track some information like your first visit or your last activity happened at this timestamp or your consent expires in This time. So these cookies are going to be used this Parsing this will be then useful to find out if something is potentially tracking you other content that we are seeing often in cookies are Some Text that is saying this is language. So DE or EN so it will be German or English or it can be the currency and similar texts and this is often being used in the functionality cookies If cookie is supposed to track you it needs to contain some randomness. It needs to be unique identifier at least among the seven billion People and our earth. 
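As a back-of-the-envelope check on how large such an identifier has to be, taking the rounded population figure used in the talk: telling N people apart needs about log2(N) bits.

```latex
\log_2\left(7 \times 10^{9}\right)
  = \frac{\ln\left(7 \times 10^{9}\right)}{\ln 2}
  \approx \frac{22.67}{0.693}
  \approx 32.7
```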
That means you need at least something like 33 bits of entropy, of randomness, in it. So we measured the amount of randomness with different metrics: Shannon entropy, how well the value compresses, all of these methods. The last example is that cookies are often not a simple name-value pair but encode some object. The object can be encoded in JSON, in CSV, or in some other encoding. We detect that and give this information to the classifier. If it is a composite object, we also look inside at its attributes and extract a lot of features for the inner attributes. This was just a sneak peek into what is being extracted; there are many more methods.

Now to the implementation of the classifier. Before giving you results, I will start by explaining a baseline so that you can make sense of them. There is a manually annotated data set of cookies called Cookiepedia. This repository is maintained by one of the consent management platform providers, because they use it to ease the work of website maintainers in assigning cookies to purposes. They claim to have over 30 million cookies. From the perspective of our data set of 323,000 cookies, they cover roughly 70% of it. Because it is used by the same consent management platforms that we crawl, it also uses the same purposes. So we can take it and treat it as a hard-coded model that takes the cookie name and classifies it to a purpose, and use that as the baseline for our model.

Our model is a tree ensemble, namely XGBoost; it simply performed best. I am not going to show you those results, but we tried neural networks.
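Returning briefly to the randomness and encoding features described above, here is a minimal sketch of what such extractors might look like. The function names, the timestamp range, and the returned dictionary are illustrative assumptions, not the extension's actual code, and the real feature set is much larger.

```python
import json
import math
import zlib
from collections import Counter


def shannon_entropy_bits_per_char(value: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not value:
        return 0.0
    total = len(value)
    counts = Counter(value)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def compression_ratio(value: str) -> float:
    """Compressed size divided by raw size.

    Random-looking values compress poorly (ratio near or above 1),
    repetitive values compress well (ratio close to 0).
    """
    raw = value.encode("utf-8")
    if not raw:
        return 0.0
    return len(zlib.compress(raw)) / len(raw)


def looks_like_json_object(value: str) -> bool:
    """True if the cookie value parses as a JSON object."""
    try:
        return isinstance(json.loads(value), dict)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False


def looks_like_unix_timestamp(value: str) -> bool:
    """True for 10-digit (seconds) or 13-digit (milliseconds) timestamps.

    The accepted range, years 2000 to 2100, is an arbitrary assumption.
    """
    if not (value.isascii() and value.isdigit()) or len(value) not in (10, 13):
        return False
    seconds = int(value) / (1000 if len(value) == 13 else 1)
    return 946684800 <= seconds <= 4102444800


def extract_features(value: str) -> dict:
    """Turn one cookie value into a small numeric feature dictionary."""
    entropy = shannon_entropy_bits_per_char(value)
    return {
        "length": len(value),
        "entropy_per_char": entropy,
        # Rough estimate of total randomness carried by the value.
        "entropy_total_bits": entropy * len(value),
        "compression_ratio": compression_ratio(value),
        "is_json_object": float(looks_like_json_object(value)),
        "is_unix_timestamp": float(looks_like_unix_timestamp(value)),
    }
```

With these definitions, a value such as `1` carries zero entropy, while a long random identifier scores high on both entropy and compression ratio.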
We also tried other traditional methods, but tree ensembles worked best. What such a model does internally is that it is a decision tree, just a lot of them. You take your cookie and ask: is it a session cookie? If not, you go to the left. Does it have entropy higher than 0.8, whatever that means for the model? If that is true, you arrive at a result, and based on that the probability of the class is assigned. This was a very, very brief introduction to what it actually looks like, but the model worked very well.

If we take Cookiepedia as the baseline, which is basically human experts trying to classify cookies, and compare it directly to our model, then our model achieves comparable or slightly higher accuracy. So we are already outperforming human expertise here, or are at least at its level. If you want more details, these are the confusion matrices, for those who are into machine learning. They show what types of misclassification the models make. For example, if a cookie is necessary but is classified as advertising, it appears here. What I want to show is that our model makes fewer of the more extreme mistakes.
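One way to make "more extreme mistakes" concrete is to order the four categories and weight each confusion-matrix cell by how far the prediction is from the truth. The ordering and the distance-based weight below are assumptions chosen for illustration; they are not necessarily the cost the authors used in training.

```python
# Categories ordered from least to most privacy-invasive (assumed ordering).
CATEGORIES = ["necessary", "functionality", "analytics", "advertising"]


def confusion_matrix(true_labels, predicted_labels):
    """Rows are true categories, columns are predicted categories."""
    index = {name: i for i, name in enumerate(CATEGORIES)}
    matrix = [[0] * len(CATEGORIES) for _ in CATEGORIES]
    for true, predicted in zip(true_labels, predicted_labels):
        matrix[index[true]][index[predicted]] += 1
    return matrix


def mean_severity(matrix):
    """Average distance between true and predicted category.

    A correct prediction costs 0, necessary -> functionality costs 1,
    necessary -> advertising costs 3.
    """
    total = sum(sum(row) for row in matrix)
    if total == 0:
        return 0.0
    weighted = sum(
        abs(i - j) * count
        for i, row in enumerate(matrix)
        for j, count in enumerate(row)
    )
    return weighted / total
```

Two classifiers with the same accuracy can differ in this score: the one whose errors stay in neighbouring categories gets the lower value.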
So if you wanted to keep just the necessary and functionality cookies and remove analytics and advertising, Cookiepedia would make more misclassifications than our model does. We even train the model with the goal of not being too far off: misclassifying something necessary as functionality is more acceptable than misclassifying something necessary as advertising, and vice versa. The performance will not get over 91%. We evaluated that by finding that our data set contains a lot of noise, roughly 9%. I am going to get back to that, because the noise is another type of privacy violation that we observed.

So we trained the model, and now we need to create the browser extension. The problem was that there are no suitable machine learning libraries for JavaScript; or rather, you can get them for JavaScript, but not for the API that browsers allow extensions to use. So we had to reimplement XGBoost prediction in JavaScript for the browser extension. That was done; Dino put a lot of work into it, and he created this browser extension, CookieBlock.

When you install it, it asks you for basically one more consent: you select the categories that you would like to remove and those that you would like to keep. That is going to be the last consent you should have to give. After that, as you browse, it categorizes the cookies using the model and removes them within 10 to 15 milliseconds.

It does not remove the pop-ups. We are not doing that because there are other browser extensions for it. There is uBlock Origin, which allows you to install a filter list specifically for the pop-ups; that is my preference. There is also the browser extension I don't care about cookies, whose only purpose is removing the pop-ups. So if you just want to remove the pop-ups, that is the extension to go for. But if you only remove the pop-ups, some cookies are still going to track you. Preventing that is the goal of CookieBlock, and we kept these functionalities separate rather than reimplementing what those extensions do.

One more aspect, maybe not that important for you here, is that it works for everyone and everywhere. It is language agnostic and also region agnostic. It does not matter that only the European Union has such strong privacy regulations. It also works in the US, in Asia, in South America, everywhere where there are no privacy regulations that would protect users. This is called the Brussels effect: European regulations that are more advanced, as with climate change, affect the whole world, and the whole world benefits from them, even if we are a bit the beta testers of such regulations. I think this is a nice example that we are also exporting privacy.

We evaluated this by browsing 100 websites with CookieBlock, and 85 percent of them worked without any problem; everything was as expected. We tried to register on the websites, we changed some settings, we did everything that was possible. But some of the websites were broken, and this is the sad truth about our extension. On seven websites it was impossible to register, or we were losing the authentication session. That is of course a usability issue, and we address it by maintaining a list of exceptions. Currently it is mostly me maintaining this list. People report broken websites to us, I spend the time finding which cookie is the one that breaks things, and I grant the exception. Then everyone has the information built into the browser extension to ignore that cookie.
Do not remove it. Eight websites were broken for some other reason. In seven cases the consent notice kept reappearing even though the user had consented, because the cookie storing the consent was being removed. That does not matter if you use a browser extension that removes the consent pop-ups, so it is not that big a deal. Unfortunately, on one website it was impossible to switch the language, which is also something I am trying to fix.

You can get the extension in the browser stores, for Chrome, Firefox, Opera, and Edge. It is not available for Safari, because the Safari API did not allow removal of individual cookies. That is being changed, so maybe in half a year we will have Safari as well; I am not sure about that, but at the moment it is not available for Safari.

So far we have seven thousand users and a small community. It is mostly me and Dino who maintain it, but we have had two people committing to the repository, as well as translations into other languages: Spanish and Japanese were translated by users. We have over 100 reviews if we include all the stores and our forum for feedback, and the average rating is around four. Unfortunately, we have people who give one star, "it broke one website for me", and they are unhappy. Yes, the extension breaks some websites. It is unfortunate, but I want to be clear about it, so you have been warned. But it is protecting your privacy.
There is a trade-off between utility and privacy here.

The maintenance is time-demanding. I have other projects alongside this one, and this would take almost one third of my working week if I worked on it all the time. I am now a bit behind, with a lot of feedback from users after the extension was covered in the media. That is the challenging part. If there are people here who are interested in contributing to open source and who like this extension, I am happy to explain how I define the exceptions, and I would be happy to see people helping with this and not only with translation, even though that was very nice.

Here is a demonstration of how it works. It is the same problematic website again. After installation you select which purposes you would like, and then we navigate to the same news website. Again we do not interact with the consent notice, but we look into the storage, and there is no Google Analytics cookie anymore. There is this one, _gat, and that cookie is a session cookie. It is not tracking you, because it has the value 1, so it does not have enough entropy to track you. It is really just there to store the consent. The consent was not actually given, but the cookie stores that it was given, which is just a bug on this website. So it is being removed.

I hinted a few times that we are also detecting privacy violations. This is from the same data set, so only from the websites that use these specific consent management platforms. We defined eight methods to gather evidence of potential privacy violations; potential, because it takes a court decision to say that something is a privacy violation. The result, shocking for those who are not in the field and not shocking for those who have read the previous work, is that almost 95 percent of websites have at least one potential privacy violation. Let's go through some of them. Maybe I actually have time to go through all of them.

First, the most commonly used cookies on the internet are the Google Analytics cookies; almost 70 percent of websites use them. They should be statistical, since they collect aggregated information about users. It is not much of a problem if websites misclassify them as advertising or tracking; that only harms the websites themselves. But it is a problem if they classify them as strictly necessary, where you cannot object to them. That happens on 2.7 percent of websites, and a court has decided that this is a real violation.

But this one serves more as an example. Now let's generalize this case. There are cookies that are declared by multiple websites because they are third-party cookies, so different websites declare the same cookie. We can then basically let the websites vote on the purpose of the cookie. If some of them disagree, we can say that at least some of them are wrong. We found that at least 30 percent of websites have a cookie assigned to a purpose that disagrees with what other websites say. We can actually be wrong here.
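The voting idea can be sketched as follows. This is only an illustration under stated assumptions: the input tuple layout, the function name, and both thresholds (`min_sites`, `min_majority`) are hypothetical and are not taken from the talk.

```python
from collections import Counter, defaultdict


def find_outlier_declarations(declarations, min_sites=3, min_majority=2 / 3):
    """Flag declarations that disagree with a clear majority.

    `declarations` is an iterable of (website, cookie_name, cookie_domain,
    purpose) tuples. A cookie is only judged if at least `min_sites`
    websites declare it and one purpose reaches the `min_majority` share.
    Returns (website, (cookie_name, cookie_domain), declared, majority).
    """
    votes = defaultdict(Counter)
    by_cookie = defaultdict(list)
    for site, name, domain, purpose in declarations:
        key = (name, domain)
        votes[key][purpose] += 1
        by_cookie[key].append((site, purpose))

    flagged = []
    for key, counter in votes.items():
        total = sum(counter.values())
        majority_purpose, majority_count = counter.most_common(1)[0]
        # Skip cookies with too few voters or no clear majority.
        if total < min_sites or majority_count / total < min_majority:
            continue
        for site, purpose in by_cookie[key]:
            if purpose != majority_purpose:
                flagged.append((site, key, purpose, majority_purpose))
    return flagged
```

Requiring a clear majority keeps the check conservative, which matches the framing of the result as a lower bound.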
It can be that the majority is wrong, but this at least gives us a lower bound on this type of potential violation.

Some cookies are assigned to multiple purposes. That does not really comply with the GDPR's idea of separating the purposes, but it would not be a big deal if it were implemented correctly, such that if you reject one of the purposes, the cookie is not used. Unfortunately, if you agree to just one of them, the cookie is already used. This does not happen that often, on 2.3 percent of websites.

Then some of these consent notices also have a category "unclassified". The problem is that there is no checkbox for unclassified, so you cannot object to these cookies. 25 percent of websites have unclassified cookies. That is rather ignorance by developers who did not reassign them, but the problem that you cannot object to them is there, so it is another thing we report.

The fifth method, the most common one, is what I already hinted at: we observe a lot of cookies that are not declared at all. Roughly 40 percent of the cookies used by websites do not appear in the consent notice, and more than 80 percent of websites have such cookies. Again this is more the ignorance of developers; maybe they are just not aware. It is still a problem, because your consent is not informed: you are consenting to something for which you were not given the right information.

The sixth method: cookies have an expiry as one of their attributes, and by law the consent notices are also required to inform about the expiry. That is why there is an expiry column here. We can simply compare the two values, and if the real expiry differs significantly from the declared one, we report it as another potential violation. We found this on roughly 13.5 percent of websites.

The last two methods are not new with us; they were already in prior work, but we can remeasure them very well. Cookies that are set before your consent, which is what I was showing in the two videos, are used by almost 70 percent of websites. That agrees with one previous study but disagrees strongly with another. The last one is when you give consent but reject some of the purposes, and the cookies for those purposes are still used. That happens on one fifth of websites, and again our measurement is very different from previous studies. So maybe it is happening more now, or maybe their study was flawed.

If I show all of them together, it looks roughly like this. You see the different problems with different frequencies. If I aggregate them and ask how many potential violations there are per website, you can see that only 5.3 percent of websites have no problems at all, and that a lot of websites have multiple problems. Also, this counts only the eight types, so the maximum here is eight. If we counted every single cookie that is a potential violation, we would get numbers like 50 or more. There are websites with 100 undeclared cookies.

Our future goals with this project go in two directions. One is that we want to prevent websites from doing this, so we cooperate with data protection agencies. If someone here is from a data protection agency, please get in touch with me, because we are happy to give them the data. So far we are cooperating with just one.
We are working on an auditing version of the browser extension that would be more generic and would basically allow them to check faster whether something is a real problem or not. We also try to detect these potential violations on more websites, not only on those that use the specific consent management platforms but on generic websites, which needs some generalization, natural language processing and so on.

The other direction is something that I hinted at as well: we would like to understand why this is happening. One part is that we want to have continuous measurements and see whether new regulations help and improve the situation, so that we can give feedback to the lawyers. The lawyers do not have this feedback, and they would like to propose something that works. The other part is to understand how the different consent providers shape the generic consent notice: if one of them has some feature, is it going to be implemented by the others?

OK, I am getting to the end. We can clearly see that cookie consents, or tracking consents, are broken, and if we want to prevent that now, it has to be enforced on the client side, in the browser. What we did towards that was that we crawled a data set of cookies with their purposes, extracted features for them, trained a machine learning model that categorizes them, and built a browser extension that includes it, CookieBlock. Along the way we also found evidence of a lot of potential violations, with almost 95% of websites having some problem. That is everything. Thank you for your attention. Your questions?

Are there any questions? If you don't have a question, please stay seated.
so as not to disturb everybody else. But there are quite some questions. I'll start here.

Okay, I think that's really great work, and I really like that you not only use it for research but also let the public benefit from it directly. It's very nice. I have a couple of questions, actually. One thing I'm interested in is this bias that you mentioned: because your crawler doesn't log into websites and doesn't use the interactive features of websites, you lose all the functionality cookies, or at least a high percentage of them. Would you say that this is also the explanation for the websites that don't work? I think the number of websites that don't work with your extension is quite high from a user perspective. And how do you incorporate the exceptions that you said you are maintaining manually? Are you feeding them back to the model so that the model also improves, so do you retrain on them? And do you plan to improve your crawler so that you have fewer problems with this in the future?

Thanks for the question. I totally agree that this is a reason why strictly necessary cookies are underrepresented, and it is a big reason why websites are being broken. We are now rerunning the crawl after quite some time, and we want to include the labels from the manual exceptions as well. But the exceptions work as a hardcoded dataset that is taken into account before the classification happens. It is a JSON file that just says: this cookie has this purpose, don't even try to classify it. And it is up to me or Dino to assign this purpose by looking into the privacy policy. One more thing that you asked is how we can improve the crawler. In other projects
I'm doing a crawler that might be more problematic, so to say: it automates registration on websites, because we want to inspect authenticated sections as well for different privacy problems. Joining the projects would be nice, but that other crawl is extremely expensive just to run; it runs 10 times longer than this one. So yes, eventually my crawlers, my projects, will hopefully merge, but at the moment it's not happening, which is causing this limitation.

You talked about the CMPs. How do they work? Do you download a library, configure it and deploy it on your website, or do you embed it and send personal data to, maybe, a US company?

You embed it directly in your website, so it's a first party, and you then maintain it through their website, because a lot of them are paid. You pay the subscription fee, and you maintain everything through the website of the consent management platform. So everything is set up there and everything is contained by them. And if you ask whether it goes to a US company: that is totally happening.

First of all, thank you for solving this important problem. How big would you rate the risk that the companies adapt their cookies to not get caught by your current algorithm?

If we have some thousand users, then I'm not that afraid. But of course they can use adversarial machine learning methods to create cookies that would evade the detection. At the moment I would address it by saying that the law would be very unhappy with them doing this. So I think their legal team would decide that it is not a good idea to try to evade this. I think they would really be risking more than they could gain from it. In the long term, we can use methods that defend against adversarial machine learning.
Actually, our model is not that susceptible to adversarial machine learning, while neural networks would be much worse. But that would again harm the accuracy of the model. So at the moment we decided to just ignore this possibility and go with the best accuracy we can get.

Any more questions?

This is from the implementation side, from someone who has probably set cookies that might have required consent, so a very specific question. If I set a preference cookie, like in your language example, and I don't set any tracking cookies or anything like that, do I need a consent banner for that? And would that also explain the unclassified cookies, because developers think: oh, this is just a preference that the user can directly manipulate, so I don't need to get consent, I don't even need to list it?

They should not leave things unclassified; they should say that it is strictly necessary. I think that there is some obscurity created by... there is actually a consortium of the consent management platforms. I think that the functionality class is not beneficial for users, and it should be strictly necessary. So if your website uses a cookie to switch the language, please declare it as strictly necessary. You don't need any consent for it; at least that's my perception. To me this is very much functionality of the website. I would like to have this.

Thanks for the talk. Among the CMPs you looked at, were there any that, relatively speaking, produce fewer violations than others?

I don't have data on this at the moment. I would have a personal preference as to which of them have a more user-friendly implementation than others, but I think most of the violations stem from the ignorance of the developers, or just from not being aware of what they should get consent for. So these types of violations are probably not much influenced by the API. But I'm sorry,
I don't have hard data to say this for sure. Some of them, for example, don't have the unclassified class, so then of course that type of potential violation cannot happen.

Thank you for the talk. What about scrambling the cookie contents instead of deleting them?

If you only interacted with the cookie consent, you would still depend on the website to then handle the cookies accordingly. So the last two types of potential violations depend on the implementation of the website, independent of the consent: cookies that are used before you consent, and cookies that are used after you provide consent. These are the most important ones, and you would not get rid of them. The same goes for all the cookies that are misclassified. For us they were not detected as privacy violations, because we took the labels given by the developers as given, but many are misclassified, and CookieBlock prevents these problems as well.

I think the last question was about the idea of disturbing website owners by providing them noisy data instead of no data, so that your extension would not delete the cookie but instead write something else into it. But I actually have a question of my own. I'm interested in your training. You said you had around 350 cookies in your training set, right? How many samples did you have, so how many different values on average did you have for each of the cookies? And about the features: did you also use the name of the cookie as part of the features?
So it is 350,000 cookies that we have in the training dataset. We extract some things from the name: we parse the name into tokens, because it is often in camelCase or some other representation. Then we have a bag of words for the most common names, and we give that to the machine learning model to do whatever it wants with it. So we are not saying: if it contains "tracking", then store that as a feature. We simply let the feature extraction say that it contains "tracking" in the name.

Sorry, the other question from the beginning, I might have skipped that. So how many labels we observed per... Okay, so how many times we observed the same cookie name. For us, a cookie is uniquely identified by the combination of the name and the domain, so we use that as the unique representation. That is a big difference from Cookiepedia, and it is why we did not use data from them: they take just the name and go directly to the purpose. That is a problem, because one website can use a cookie called user ID for tracking while another one uses it for authentication, and those are absolutely opposite goals. There are cookies like the Google Analytics ones that we found on really a lot of websites, and for roughly 8% the labels disagreed with the statistical majority. But apart from these third-party cookies, we don't have a measurement of how many times they appear, because they are unique to every website.

I have a question myself. This is by origin a legal question, these consent things and the GDPR. So do you have any legal people on your team?
In this particular work, we published without lawyers, because I learned from another project that publishing with lawyers means that even when I go into a meeting with some questions, I come out with more questions than I had. So it was just easier to publish it without lawyers. But we have a cooperation with lawyers: two law professors, a postdoc and a PhD student who are involved in other projects, and I sometimes try to get their feedback on this. So I have a lot of legal guidance from them, and much of my overview of the law comes from them.

Are there any more questions?

You mentioned something about an auditing tool. What's the status of that? I think it would be quite nice for website hosts to check whether everything is all right.

Yeah, the timing. I'm supervising a master's student working on that; I'm not implementing it myself. The progress is that we first have to implement CookieBlock for the new browser extension API, which was unfortunately updated in a way that harms a lot of privacy extensions. We are now also detecting a lot of additional things, so that the developer does not need to provide them to us. We are trying to automate the auditing process as much as possible, and for that reason it's not available yet. Some first version will be out in August, so definitely in August I will put a link to it on my website. But the final version, the end of the student's project, is at the end of September. So connect with me if you want to have it, or if you are interested you can be a beta tester, which would be nice for the student as well. But at the moment it's not released.

More questions? Well, I think there were already quite a lot of questions, and you did very well answering more or less all of them. Thank you. So this was a quick check, and I hope you can hear more about all this in future years.
Thank you.

Thanks a lot for the attention. If you would like to have it, you can simply search for the name CookieBlock. Everything else is available from my website or on GitHub, and everything is now indexed quite well by all the search engines.
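
Two points from the Q&A lend themselves to a small illustration: the manual exceptions that are looked up before classification, and the cookie-name tokens that are handed to the model as a bag of words. The Python sketch below is not CookieBlock's actual code. The JSON layout, the vocabulary, the purpose labels, the placeholder model and all function names are assumptions made only for this example.

```python
import json
import re

# Hypothetical exceptions file: "domain|name" mapped to a manually assigned purpose.
EXCEPTIONS_JSON = '{"example.com|lang": "strictly_necessary"}'

# Hypothetical vocabulary of common cookie-name tokens.
VOCABULARY = ["user", "id", "session", "tracking", "lang"]


def cookie_key(name, domain):
    """A cookie is identified by name AND domain, not by name alone."""
    return (name, domain.lower())


def tokenize_name(name):
    """Split a cookie name on separators and camelCase boundaries."""
    tokens = []
    for part in re.split(r"[^A-Za-z0-9]+", name):
        # Acronym runs (ID), capitalized or lowercase words (Tracking, user), digits.
        tokens.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [token.lower() for token in tokens]


def name_features(name):
    """Bag-of-words counts over the vocabulary; the model decides what they mean."""
    tokens = tokenize_name(name)
    return [tokens.count(word) for word in VOCABULARY]


def load_exceptions(raw_json):
    """Parse the exceptions file into a lookup table keyed by (name, domain)."""
    table = {}
    for key, purpose in json.loads(raw_json).items():
        domain, name = key.split("|", 1)
        table[cookie_key(name, domain)] = purpose
    return table


def assign_purpose(name, domain, exceptions, model):
    """Manual exceptions win; only unknown cookies reach the classifier."""
    key = cookie_key(name, domain)
    if key in exceptions:
        return exceptions[key]
    return model(name_features(name))


def placeholder_model(features):
    """Stand-in for the trained classifier; it always returns the same label."""
    return "model_decides"
```

With these definitions, a cookie named `lang` on `example.com` gets its purpose from the exceptions table, while `userTrackingId` on any other domain is tokenized into `user`, `tracking`, `id` and passed on to the model. Keying on name and domain together is what keeps two sites' identically named cookies from sharing a label.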