And we're live from the MIT Media Lab. I'm Dazza Greenwood, and this is the GDPR Hack Day. We're about to start a joint panel with our friend and colleague, Mark Lizar, in London, on the algorithmic provisions of the GDPR. Since most of the panelists are in London, I'm going to hand it over to Mark to get things started. So Mark, it's all yours. Thank you. We're all really happy. We're just tuning in the... Can you guys hear me now? No. So I have to be over here. Can you hear now? Yes. All right. We're still working out muting and unmuting, just to get these speakers going. Hi, everyone. We've been having a really good day. I'm going to hand it over to Nicola, who's actually hosting the session, and then we're going to wrap up with some discussion and questions from both places. So here you go, Nicola. Yeah. Okay. Hi, everyone from MIT. It's been a long day of discussion. We have talked so far about two issues, as you might have seen from the program. One panel was more about standards: the need for standards in this field to facilitate data exchange in line with privacy. And the second panel focused more on the benefits, opportunities, and challenges of data portability. So we have been looking not only at the portability right given by the GDPR, but going a little bit broader. One of the issues that came out of the last discussion was how data portability, and the easier exchange of and access to data, interacts with decisions made from those data through algorithms, decisions that can have adverse effects for individuals. So you can end up with a trade-off: the example given was an insurer that requires you to hand over some data in order to get a discount, and with those data they can build a much better profile of you. So that's what we left for this panel.
We said, yes, in this panel we are going to investigate ways to provide more transparency and accountability around those kinds of decisions. Just by way of introduction, the GDPR has a few key provisions that increase the amount of agency users have with regard to algorithmic processing. You might have heard about the debate on the right to explanation. Article 22 of the GDPR provides that individuals shall not be subject to a decision that produces legal or similarly significant effects on them and is based solely on automated processing, except in specific circumstances. Such a decision can be made only if it is based on the explicit consent of the data subject or on a contract with the data subject, and in any case there must be minimum safeguards, including the right to obtain human intervention and the right to contest the decision. So in this panel we want to explore ways that the services springing up around the GDPR can facilitate that kind of intervention and explanation, so users can understand how algorithmic choices are made. We have a fantastic lineup. I'm going to start by throwing the ball back to you at MIT, because I think Thomas's presentation will connect very well with the previous panel: we were talking about PIM services, and his presentation is really about that. It's about the personal infomediary, a term that I very much endorse. Just by way of introduction, Thomas Hardjono works at MIT on the OPAL project. I guess you can tell us more about the project as you introduce yourself more fully. After your presentation we'll take any burning questions, but we'll keep the fuller discussion for the end, after everyone has presented, and we'll reserve some time for questions from MIT. Thanks, everyone, and I'll leave it to you, Thomas. Let me just...
Yeah? This project of mine was called Clear Button a long time ago: the concept that people can press a button and ask everyone for their data, especially those organizations you don't have a relationship with. That was a long time ago, and now we're here actually talking about it. Particularly, I care about the detail of what happens when you press the button, what the operational definition of the thing actually is. Not the concept in the law so much, but how it works operationally is very important. Thanks. Yeah. Over to you. Okay, great. Can you flip to slide number two, Nicola? Yes. Wait a second. We do have the slides available, but we need to put the projector back on. They put the whole panel together at the last second on their side. Are we good? No, not yet, but you can start talking and hopefully we'll get it up. So the vision of OPAL came out of research here at MIT focused on how you can combine data from different data domains and obtain interesting insights. One of the interesting papers I encourage you to look up is called Money Walks; it combined credit card data with movement data, GPS data, so basically telco data plus credit card data. One of the goals of that research was to improve accuracy in predicting credit card churn. Churn here means that somebody drops a card brand for a different brand: you go from Visa to MasterCard, or from a Bank of America Visa to a Capital One Visa, that kind of thing. And this question really matters to the folks at MasterCard; MasterCard has an entire group just working on data analytics. What this research showed was that if you are able to combine these data together, you get an increase of up to 30% in accuracy in predicting the churn rate for a group of people within the next six months. That is something unheard of in the card payments industry.
A gain like that is an enormous edge. Now, the challenge is that you need access to this diverse data. Slide number two is our vision slide: basically, could we do privacy-preserving data sharing, or more correctly, privacy-preserving insight sharing, across different data domains? Telecoms data, finance data, social data, health data, you name it, and within each of these there are different sub-segments of the market. Next slide, please. Got it. The next slide is "Personal Data: The Emergence of a New Asset Class." I'll narrate for the folks here; I know my slides, I made them. If you're interested in some bedtime reading, I encourage you to look at the blue, or purple, report here. It's a 2011 report from the World Economic Forum, from the big data group there. And, I don't want to say it's a sequence of complaints, but it is a sequence of complaints about how badly personal data is treated and what a mess the infrastructure is. You won't get depressed reading it, I hope. The key thing is that this group was chaired by Sandy Pentland, who is our director here. The other key point is that much of the readership of this document was in fact the European regulators, who proceeded to draft the GDPR. So there is influence there. I also encourage you to read the paper on the right-hand side, the American paper from three or four years ago, Saving Big Data from Itself. It came out in the big boom of big data and analytics, and here's a paper saying, whoa. One of the concerns there is surveillance. And, I could be mistaken, but around the time of the Snowden inquiry it was probably one of the few papers cited by that community as having some interesting threads in it. I'll let you read that. Moving on to the next slide: Open Algorithms. OPAL stands for Open Algorithms. Yes, we're awful at marketing and choosing brands; too geeky to be choosing brands.
So there are some very simple principles, simple in words, but they could have dramatic implications for how people do analytics. The first is: move the algorithm to the data. This means stop doing all this data copying and data selling, and instead send the algorithm to where the data lives. Now, this is classic client-server architecture from the 80s, when people had dumb terminals and all the processing power was on the server side. This is like that. The idea is that in order to stay consistent with the notion of keeping data where it is, you move the algorithm there instead, which is in contrast to what people do today with the big analytics platforms like Hadoop or Spark. There, the assumption is that you've got all these petabytes of data at your disposal and your next problem is how to clean it, how to upload it, how to do all this clustering, which is fun and great, but it rests on the assumption that you have access to the data, which is not always the case. Secondly, and this is probably important for the GDPR, in particular Article 22, is the idea of vetted algorithms only. The term "vetting" here is admittedly hand-wavy, but the idea is that when you want to pair an algorithm, or a set of algorithms, with data sets, you need to understand bias and built-in unfairness, because when people collect data, it's not necessarily fair. In fact, there's a famous book I also encourage you to read: "Raw Data" Is an Oxymoron. The reasoning is that there is no such thing as raw data: all data is biased by the way you collected it. Maybe IoT device data comes close to neutral. So you need to get the algorithm vetted by an expert, somebody in the field. Now, for those of you in the AI and machine learning field, you will know that fairness in machine learning is a huge, huge topic, with multiple conferences, and it is quite a difficult one. But the idea also is that a consumer needs to know what the algorithm does.
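The "move the algorithm to the data" and "vetted algorithms only" principles can be sketched in a few lines. This is a hypothetical illustration, not OPAL's actual API: the names `DataHolder` and `VETTED_ALGORITHMS`, and the spend records, are all invented. The point is simply that the holder executes only pre-approved algorithms against its own records and returns aggregate answers, never the raw data.

```python
# Hypothetical sketch of the algorithm-to-the-data pattern:
# raw records stay inside the data holder; only vetted,
# aggregate-producing algorithms may run against them.

VETTED_ALGORITHMS = {
    # algorithm id -> function computing an aggregate over the records
    "avg_monthly_spend": lambda records: sum(r["spend"] for r in records) / len(records),
}

class DataHolder:
    def __init__(self, records):
        self._records = records  # raw data never leaves this object

    def run(self, algorithm_id):
        # Only algorithms on the vetted list are executed.
        if algorithm_id not in VETTED_ALGORITHMS:
            raise PermissionError("algorithm has not been vetted")
        return VETTED_ALGORITHMS[algorithm_id](self._records)

telco = DataHolder([{"spend": 40}, {"spend": 60}, {"spend": 80}])
print(telco.run("avg_monthly_spend"))  # -> 60.0
```

In a real deployment the vetting step would of course involve human experts and governance, not just a lookup table; the lookup table only stands in for the gatekeeping.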
There's no point in handing the consumer Python code that implements it. No, you have to explain it, and your explanation has to have some kind of legal bearing; you have to take responsibility when you explain this to the consumer. The third principle is: data always encrypted. Next-generation cryptographic solutions such as homomorphic encryption and multi-party computation allow you to compute on encrypted data without decrypting it. So can you compute on encrypted data? Is that homomorphic? That's homomorphic encryption. There are two branches in the field, homomorphic encryption and multi-party computation. And this is relevant for organizations that hold your data. Let's say Equifax had a solution like this: if somebody broke into Equifax, they'd be stealing encrypted data, and they wouldn't be able to do anything with it. Fourth: decentralized and distributed repositories. We all hear about blockchain, both the real thing and the metaphor, because everywhere I go, 90% of what people say about blockchain is really the blockchain metaphor. If we're going to keep data where it is and not move it, you end up with a distributed data architecture, so how do you address those challenges going forward? If you want to know more about this, I encourage you to look at the July EU Presidency presentation; the keynote is by Professor Sandy Pentland, a very high-level presentation. If you just Google "EU Presidency Keynote July 2017," you'll find it. Moving on to the next slide, the MIT OPAL slide. Next slide, please, Nicola. This is a very, very high-level view of OPAL. It has query processing capabilities, machine learning capabilities, privacy protection (a nice broad term), data, of course, algorithms, and consent management. That last part is probably what's most relevant for this audience today. And when we say consent management, of course we mean GDPR-compliant consent management, in the sense of authentic consent, producing a consent receipt, and being able to track it.
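The multi-party computation branch mentioned above can be illustrated with the simplest possible construction, additive secret sharing. This is a toy sketch for intuition only (it is not homomorphic encryption, and a production system would need secure channels and honest-majority assumptions): each data holder splits its private value into random shares, so no single server ever sees a real value, yet the shares still sum to the true total.

```python
import random

# Toy multi-party computation via additive secret sharing
# (illustrative only; all values and party counts are made up).

MOD = 2**61 - 1  # arithmetic is done modulo a large prime

def share(value, n_parties=3):
    """Split `value` into n random shares that sum to it mod MOD."""
    shares = [random.randrange(MOD) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MOD)
    return shares

def reconstruct(shares):
    return sum(shares) % MOD

# Three data holders with private values 10, 20, 30.
all_shares = [share(v) for v in (10, 20, 30)]
# Each "server" i adds up the i-th share from every holder...
server_sums = [sum(col) % MOD for col in zip(*all_shares)]
# ...and only combining the servers' results reveals the total,
# never any individual input.
print(reconstruct(server_sums))  # -> 60
```

Each share on its own is uniformly random, which is why a breach of any single server leaks nothing, echoing the Equifax point above.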
So we talk about using blockchains to track consent receipts, and so on. Fine, that's easily doable. The other thing that's important, and sometimes forgotten when people talk about data and algorithms, is the provenance of the data. The way the industry works today, at least in the US, there's a thriving aggregator market: companies that scour the internet, scrape stuff about people, and then resell it to people who are looking for data, marketing companies and so on. Number one, we don't know the provenance of the data. Number two, the accuracy is not so good. So if you want to run a business, the next shoe company, the next Zappos, whatever, and you're looking for customers, you might be paying maybe $500 or $600 per lead to the aggregator, and the success rate is only 5% or 10%. So the economics are pretty bad, right? This just illustrates how poor quality the data is. Next slide, please, the personal infomediary one. Okay. Number six. Okay, good. Yeah, you're right, number six. Since we're on a panel, I don't want to take up all the time. The idea here is: could you create an instance of the OPAL framework for personal data stores? So I would maintain a data store, but more importantly, I would have an online agent, and we'll just leave it at that, because you can have virtual machines and all that stuff up there, it doesn't matter. It's an agent that is essentially my proxy, and it would interact with social platforms on my behalf, but more importantly, it would collect all the data that comes through, as if I were browsing on Facebook. So instead of me with my Facebook app, it's a client talking to the Facebook server directly, and all the data, including all the junk and advertising, everything gets collected and put into my data store.
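The blockchain-tracked consent receipts mentioned a moment ago can be sketched as a simple hash chain, which is the "blockchain metaphor" in its minimal form. This is a hypothetical illustration (the receipt fields and names are invented, not a standard consent-receipt schema): each receipt commits to the hash of the previous one, so tampering with any entry breaks verification of the chain.

```python
import hashlib
import json

# Hypothetical consent-receipt hash chain: each receipt includes the
# previous receipt's hash, making the history tamper-evident.

GENESIS = "0" * 64

def make_receipt(prev_hash, subject, purpose, timestamp):
    body = {"prev": prev_hash, "subject": subject,
            "purpose": purpose, "timestamp": timestamp}
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    return {**body, "hash": digest}

def verify_chain(receipts):
    prev = GENESIS
    for r in receipts:
        body = {k: r[k] for k in ("prev", "subject", "purpose", "timestamp")}
        recomputed = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if r["prev"] != prev or recomputed != r["hash"]:
            return False
        prev = r["hash"]
    return True

chain = []
prev = GENESIS
for purpose in ("marketing", "analytics"):
    r = make_receipt(prev, "alice", purpose, 1234567890)
    chain.append(r)
    prev = r["hash"]

print(verify_chain(chain))       # -> True
chain[0]["purpose"] = "resale"   # tampering is detectable
print(verify_chain(chain))       # -> False
```

A real deployment would anchor these hashes on a shared ledger so no single operator can rewrite the history; the chain structure itself is the part sketched here.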
The point is that these data stores become important tools for analytics, because an individual record may be meaningless on its own. You remember the case when Facebook announced, a year or two ago, that they had done an experiment with a group of users where they injected certain feeds and measured people's reactions over a period of two or three months. This was before Cambridge Analytica, by the way. They announced it and people were like, what? But it was a limited group of people, and everybody thought, well, that's harmless. What if you could reverse it and say: everybody in this room has a personal data store with their complete feeds, and what if you could run your own analytics on that? So it's AI- and data-driven, and what if you could put this on peer-to-peer nodes, encrypted? That would mean a permanent store of the history of your life. Next slide, please, the one with the diagram. This shows a very high-level, fancy diagram with mobile and blockchains and so on. We think the idea that every individual will host their own server at home is not going to work. You're going to host this somewhere; you're going to get it as a service. The important point is that the service operator needs to enter into a contractual agreement with you, the individual, as to the rules: GDPR-compliant, they're not allowed to copy your data, they're not allowed to read your data, and so on. And they have a fiduciary obligation to you regarding your data. This term is coming into vogue now: people are talking about personal data banks, and the Japanese government is talking about personal information banks. My bank has a fiduciary obligation to me in financial matters, so why can't we have something like that for data? And I think Dazza has talked about using credit unions to organize this a while back.
So the credit union in the U.S. is an interesting legal construct that might be useful here. Finally, the last slide, the two books. Again, if you're looking for sleeping material and there's nothing on TV, and Netflix and Amazon Prime aren't cutting it, take a look at these two books. We'll be revising them pretty soon, but they have a good summary of the research we do here at MIT. That's the end of the slides. Last slide: URLs, if you want to look up more, trust.mit.edu and connection.mit.edu. All right. Amazing. Any questions? Any burning questions now? Hello to everybody who's tuning in over the internet. We know you're watching the live stream now; I'm not sure, Dazza, if you can see how many people, but we have a lot of viewers. I really liked your presentation. If nobody else has one, I have a clarifying question about the AI-driven nature of this service. Do you mean that it's basically going to learn user preferences and act on behalf of the user, for example by recommending which websites to go to or which services are tailored to them? Or would it act only under the specific instruction of the user, or something in between? So the AI and the machine learning are owned by the user. The idea is to level the playing field. In that sense, I could have an AI engine that warns me of anomalous behavior on my Facebook stream or my Twitter stream. Or a group of people could share temporary access to their PDSs and run a collective AI to decide: has Facebook been gaming us? Because that's what they do; that's what Cambridge Analytica has been doing. Is there a covert channel happening here where we don't even realize we're being played? Can we detect that, right? So, hey, Sal. Hey, Thomas. Can you hear me okay? Yeah, I guess you have to come up.
So it sounds like what you're talking about is really maybe starting with an alarm monitor before you get to delegated authorization from an AI, right? An AI as watchdog, I guess, as opposed to an AI in broader terms, which makes a lot of sense. But with the term AI there, the mind wanders, right? Thank you, Sal. Another tiny question. Have you thought about the legal issues with scraping? Because I think what you're describing involves getting a lot of information from Facebook, probably not in accordance with their terms of service. Is that a potential legal issue? That's a good one. Maybe we should go to court. Okay. No, seriously, you may need to take this to court and see. I mean, the only way nowadays to get any action, it seems, is to drag Zuckerberg in front of the cameras. Really, how else are you going to get action? And it goes without saying that my personal favorite article of the GDPR, Article 15, giving people the right to access their own personal data, would make Facebook the core authoritative source of the data people would be getting and putting in their personal data stores. So that would hardly be a violation of law; that would be enforcement of law. Yes. Though there is a difference between requesting your data and self-enforcing the law. I am totally with you in spirit; I just expect that Facebook will fight back, as you know. Of course. Of course, yes. Okay. So with that... There are rumors, right, that Facebook's response to the GDPR will be to just move the service, move the data, out of European servers and elsewhere, so that... It's cheaper. Yeah. There's a lot of incentive to do that. In the worst case, you could demand reciprocity from Facebook, right? If they're going to start complaining about something, you could say: as long as you treat me the way you want me to treat you.
And so maybe in some ways that could even be the thin end of the wedge, right? I think good progress would be if the events in Europe could force an open API for Facebook: a published API such that, as long as you sign on, you can get access to your own data through it. So you can build your own app, your own tools. It levels the playing field, right? Yeah. It levels the playing field. Great. All right. So now back to our panelists here in London. Exactly. Thanks for the provocative presentation. Now we're going to continue the conversation. Thomas, please stay there; we'll want you for the discussion at the end. Our second speaker is Vaishak. Just a second. Do you have the slides up? They're there on the computer. Vaishak Belle is an assistant professor and Chancellor's Fellow at the School of Informatics at the University of Edinburgh, and an Alan Turing Institute fellow. He has done a lot of work on artificial intelligence, published over 40 scientific articles, and won prizes. So we are excited to have him here. He's going to talk about explanation as a service. All right. This is Vaishak. Yes. Okay. We're going to switch to a direct view of the speaker. Perfect. Okay. And you have to speak into the microphone. Okay. Can I put it here? Yeah, I think so. Can you guys hear me? Yes. Yes. Okay. Good. Oh, they don't have the slides. I'm going to send you the slides right now. Yeah. Go ahead and do that. Okay. I want to thank Nicola for inviting me here. As Nicola mentioned, I work in the area of AI; I've been working here for a couple of years. This is a high-level perspective, so let me say up front that it does not mention PIMS. What it does address is the fact that the way AI is able to exploit data seems magical in some sense.
And so what it tries to give you is a perspective on what it means to actually have explainability, and, at a high level but based on technical considerations, what it would mean to offer explanations to your customer or user in your own system. Okay. So let me start by mentioning what's going on with explainable AI, or why we are suddenly concerned about it now, then some key issues and some solutions, and finally conclude. If you look at the current state of AI, it's quite likely that you'll see two extremes. On the one hand, you have tons of news articles about AI systems supposedly reaching near-human abilities, but on carefully defined, circumscribed tasks: we're talking about poker, about Go, about chess, and so on and so forth. But if you put that aside, what you have to recognize is that practically every app you use on your phone is using some form of AI. From recommendations on IMDb, to what movie to watch next on Netflix; your credit approval is essentially based on some kind of automated decision making; recruitment, who contacts you, who gets to see your profile on LinkedIn: all of these decisions are based on AI. Of course, AI is, in a sense, not one thing but a collection of different techniques, so it's no surprise that everything gets clubbed together as AI, but they are very different. I'd say, at a high level, and I won't go into too much detail, that there's a spectrum of techniques, and they impact you in different ways. So why do we care about explanations?
So one of the interesting things, at least on the second part of the spectrum, is that AI systems deployed in an application form a very complex pipeline, and it's near impossible to verify what's going on in there. Sure enough, there is some amount of data collected and some kind of algorithm that works on this data, but it sits in some part of this abstract pipeline that's pretty hard to run an end-to-end verification on. And the interesting thing is that in things like recommendation, recommending movies on Amazon and so on, these decisions are based on millions of data points. Every interaction the company has recorded is essentially contributing to the decision, and it's based on what is called the feature space. What that means is that they look at all of these data points, try to figure out what is essentially important for making a decision, and construct this kind of feature space, and that's essentially what drives the decision. So, for instance, for things you might buy on Amazon, what they're interested in is: what did you buy in the past? What kinds of products do you look at when you search for a certain product? All of these essentially impact the decision. Now, one of the naive perspectives you could take here is: so what? Why do we care, as long as it works, as long as it does the job for me? Actually, there are tons of reasons to be doubtful about what's going on. Here's an example that's been coming up quite a lot recently. As you can see, that's an image labeled as a panda, right? Good enough.
Then, as I mentioned, these artificial intelligence systems are based on this very complex feature space, and some of these features might be regions of interest, certain sub-regions; obviously the distinguishing factor of a panda for us as human beings is that it has these big black eyes, right? But you don't really know what's going on inside one of these machine learning systems. So you can have these very funny-looking noise models that you add to the image, and suddenly the system says it's something else. And this noise is so slight, and yet you can get very high confidence that the label is something very bizarre and strange. So think of this not just in the context of image recognition, but in any kind of application where you might be using AI: if somebody is able to inject a small amount of noise and get the decision about you to be very different, that's something to worry about. Often what happens is that there's a huge mismatch between what you expect to see and what you actually have. As you know, every startup starts somewhere: it starts off with 100 users, 200 users, and goes up to a million users, and it's taking decisions incrementally. So you could imagine that it's taking a decision based on this black line, but the true distribution includes these two important points that the line misses. Again, this is a diagrammatic way of showing that the decisions it takes could be based on something very different from the data it has collected so far, and this could potentially impact you in very adverse ways. There are also other societal implications. There have been many studies indicating that users are much more likely to trust a system when they have an explanation.
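The panda-style attack just described can be made concrete with a toy model. This is a hedged sketch (the weights, inputs, and labels are all invented, and real attacks target deep networks, not a three-feature linear model): a small, targeted nudge to each feature, in the direction the model's weights point, flips the decision even though the input barely changes.

```python
# Toy adversarial example on a linear classifier: score = w . x,
# positive score -> "panda". All numbers are made up for illustration.

weights = [0.5, -1.0, 0.8]   # a "trained" linear model
x = [1.0, 0.4, -0.2]         # score = 0.5 - 0.4 - 0.16 = -0.06 -> "not panda"

def score(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def sign(v):
    return 1.0 if v >= 0 else -1.0

# FGSM-style step: nudge every feature by a tiny epsilon in the
# direction that increases the score.
eps = 0.1
x_adv = [xi + eps * sign(wi) for wi, xi in zip(weights, x)]

print(score(weights, x))      # small negative -> classified "not panda"
print(score(weights, x_adv))  # now positive -> classified "panda"
```

No single feature moved by more than 0.1, yet the classification flipped, which is the essence of the "so slight, yet so confident" point above.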
And one of the interesting issues coming out recently in machine learning is: what if the decision was based on historical data that happens to be biased? For instance, we know there's a lot of historical data reflecting negative biases against women. Imagine this data is fed into one of these systems. You don't specifically bias the system, but because it uses this data, it's going to automatically inherit these biases. There are, of course, legal implications; GDPR, I think, is just one instance. And I'm glad there's been some discussion that we need to look a little bit ahead: it's not just about this specific instantiation in the GDPR, we have to think about the long list of societal implications of collecting data. So what do we really care about in an explanation? I should say that, in my view, there is no system out there that completely implements explanations as social scientists and cognitive scientists understand them. They think of explanations between human beings along three dimensions. The first is that explanations are contrastive: when somebody asks you something, you're explaining why this happened and not that. The second is that they're very selective: when you explain what time you brush your teeth, or what clothes you chose, you're very selective, very directed at what was asked. And ultimately, explanation is a social construct. The reason we really care about explanations is that we want to learn about each other; we want to understand what sort of people we are. What that means in practice, of course, is that you have to understand a bit about where the person is coming from before you just hand out an explanation: what they might know as a customer, right?
What is often sold as an explanation in the literature is what is called a causal attribution, or a correlation. Typically you're saying: look, the reason the credit card was denied for you is that you're 35 and you're still unemployed. It's not a polite explanation, and it's hard to act on a decision of that form, but that's the kind of explanation that's being sold at this stage. And, not that I want to go into detail, but I just want to point out that in the machine learning literature there are many different techniques, and they offer different amounts of explainability. I can come back to this. Yeah. Oh, sorry, yeah. So I know there are numbers on the slides; there's one slide... First, let's ask if MIT has received the slides. Yeah. Have you guys received the slides? The question is whether we received the slides? Yeah. How were they transmitted? By email, at the beginning of the session. Oh, okay. I'll dig them out and make sure they're available in the room. I'm sorry, I'm like 200 emails in debt today. Okay. I hope you could still make sense of what I said without them. Okay, maybe, yeah. Maybe it's more interesting for you to see the slides. Yeah, great idea. We're going to switch the view to the screen where we're projecting the slides. That was the single best video production choice of the day. Good. Okay. So where I was: basically, the landscape of machine learning models in AI fits into a spectrum, where you have some models that are super accurate but not very good at providing any kind of readability, and some models that are not that accurate but very readable. So, again in my view, approaches to explainability in machine learning and AI fall into two basic categories. The first is essentially to use these kinds of interpretable models, right?
And the idea, of course, is that then you don't have to construct any kind of post-hoc explanation. Now, you have to keep in mind that even when they say "inherently interpretable," what they really mean is interpretable to an expert, an engineer or a computer scientist, and this is very different from being interpretable to a user; that still requires a lot of work. The second kind of approach says: look, there's no way we can compete with these awesome, scalable methods, so let's see if we can come up with post-hoc explanations. I'll just give you some illustrations. Again, this is not something you have to implement in a system you might be building; it's just to give you an idea of the kinds of things that are out there, some concrete points. So, going back to the history of AI, one of the classic things people quite like is what are called decision trees. Imagine you have a data set that basically records the conditions under which you play: for instance, if it's raining, you're not going to play. What these systems try to do is look at this data and construct one of these trees, and the numbers you see next to entries like "play" and "don't play" are the number of data points for which that holds. Out of 14 points, 9 say these are conditions you play in and 5 say you don't, and so on; the tree keeps making it more and more granular. Again, this is a very old technique; I'm just giving you an illustration. Now for the type-2 approach, where you construct a post-hoc explanation. I'm changing slides, by the way, so they can see that. What the type-2 approach does is take a fairly complex model, okay?
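The decision-tree example described above can be reproduced in a few lines using the standard textbook "play tennis" data (14 observations, 9 play / 5 don't play). This is a sketch of the counting step only, not a full tree learner: one split on the outlook attribute already shows how the counts get "more and more granular" down the branches.

```python
from collections import Counter

# The classic 14-point play/don't-play weather data set
# (outlook, decision). Counts match the textbook example: 9 yes, 5 no.
data = [
    ("sunny", "no"), ("sunny", "no"), ("overcast", "yes"), ("rainy", "yes"),
    ("rainy", "yes"), ("rainy", "no"), ("overcast", "yes"), ("sunny", "no"),
    ("sunny", "yes"), ("rainy", "yes"), ("sunny", "yes"), ("overcast", "yes"),
    ("overcast", "yes"), ("rainy", "no"),
]

# Root of the tree: overall counts, the "9 play / 5 don't play" node.
print(Counter(label for _, label in data))

# One split on outlook: each branch gets its own, purer counts.
for outlook in ("sunny", "overcast", "rainy"):
    branch = Counter(label for o, label in data if o == outlook)
    print(outlook, dict(branch))
```

Note how the "overcast" branch comes out pure (all "yes"): that purity gain is exactly what a real tree learner like ID3 maximizes when choosing which attribute to split on.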
So, what the model has done is look at a bunch of data points and build a decision boundary separating the right things and the wrong things: yes or no. And suppose you're asking about a very specific point, this big plus sign. What the method tries to do is construct an explanation that at least explains the class of data points around that point. Okay? Again, the vicinity is kept simple, because points that are far away don't really influence the case that you care about. Another example of a type-2 thing (I guess the echo is coming back, but anyway), another example of a type-2 thing is some work that a colleague of mine has been doing. What he does is look at learning systems that play games like Atari and Go, and he tries to construct what is called a program from the learned model. So the program says: take the decision to move if such-and-such facts hold. What you have to notice is that these programs are highly granular; you cannot make sense of them just by looking. They say things like: move if the ball's position is 0.0086. That's not readable from a high-level point of view, but the hope is that if you can learn a program at this very low-level view, maybe you can cluster things into a higher-level view as you go along. Then there are hybrid approaches that we've been looking at, and what these try to do is say: look, why don't we do two things at the same time? Why don't we focus on these kinds of interpretable models, and why don't we also construct intuitive explanations from them? So that's the kind of work we've been looking at. We looked at, essentially, science problems, right? One of the things we do as children, as we're learning about science and math and so on, is take a number of tests like the GRE and SAT. So what we try to do is try to understand how we can compute answers to these things.
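Stepping back for a moment to the local, post-hoc idea described above: one common way to probe a model's behaviour around a specific point, without opening the model at all, is to perturb one input at a time and watch how far the prediction moves. Everything below (the stand-in classifier, the zero baseline) is invented for illustration; this is a sketch of the general perturbation idea, not the method from any talk on this panel.

```python
import numpy as np

def black_box(x):
    # Stand-in for an opaque classifier: the score mostly depends on
    # features 0 and 2, but nothing outside this function "knows" that.
    return 1.0 / (1.0 + np.exp(-(2.0 * x[0] + 0.1 * x[1] + 1.5 * x[2])))

def local_importance(model, x, baseline=0.0):
    """Perturb one feature at a time and record how far the output moves."""
    base = model(x)
    deltas = []
    for i in range(len(x)):
        x_pert = np.array(x, dtype=float)
        x_pert[i] = baseline          # knock one feature out
        deltas.append(base - model(x_pert))
    return deltas                     # large delta = feature mattered locally

print(local_importance(black_box, np.array([1.0, 1.0, 1.0])))
```

For this point, the deltas for features 0 and 2 dominate, which is the tabular analogue of blurring out regions of an image to see which ones carried the classification.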
So, we take some of these questions and we construct a kind of representation which we think is similar to how we imagine the problem. For instance: Mike has a bag of marbles with four white, eight blue and six red marbles, and he pulls one marble from the bag, and it's red. What you can see is that the system has this abstract representation that there's a bag of things; we take the first thing out, it happens to be red, and the rest is everything left in the bag, right? So it tries to construct a representation of the way we actually try to solve the problem, and gives you an explanation of how to do it. Our hope is that maybe this can be lifted more generally, to a point where somebody who's not an expert in machine learning can just say, "this is my problem spec", and it generates a representation that says what's going on, why you get this answer, and what you should be doing, hoping that this can lead to some transparent AI, right? So, that's all I have for today. To conclude: I think explanations are super important, and not just for legal reasons. As I said, if you think about the societal impact of making decisions about people's lives, we have to be super careful about what these decisions are based on. My own bias here is that, if possible, we should prioritize the more interpretable models. There's an argument among people in the community that says: look, if you have a model that's 99% accurate, why do you care about interpretability? That's a fair argument, but if you think back to the panda example, and remember that there's this 1% chance that somebody using a little bit of noise can push a decision way off, then maybe you'll think again about prioritizing the 99%. From the perspective of potentially building applications, starting companies and so on, and providing explanations to users: I'm not saying we can implement exactly what the social scientists ask for, right?
But I think these three features are interesting to think about, in terms of how you can provide proxies for them. For instance, can I provide an explanation to the customer that is contrastive, that clearly says why this happened and that did not happen? Can I help them with the cognitive overload and give very selective explanations, where perhaps they can ask for more granular explanations as they go along? And finally, an explanation is not just one atomic thing; it's very context-dependent. You need to understand what the user is asking, and so what kind of explanation is appropriate given the context in which they're asking. That's it, thank you. [Applause from MIT.] Yes, we're just generalizing about what things there are, but context is going to be a part of what we're trying to do. Okay, sorry. Having answered that question, we are going to switch to another speaker here; that's the second one, putting the slides up. So, GDPR, ODI, yes. Okay. Do you want to have the slides projected, or keep this format? It would be great if you could keep pointing the camera; if it's possible, pull it back slightly or angle it so we could see the whole slide. Oh, that's nice. Oh, that's perfect. Okay, great. So let's move on now to the second speaker. We have Reuben Binns, who is a postdoctoral researcher in computer science at the University of Oxford, currently funded by the EPSRC PETRAS Internet of Things Research Hub, on the project "Respectful Things in Private Spaces". Reuben has authored a number of papers in the last year on fairness and accountability in machine learning, so I'm excited about what he has to say here. Thanks, all. So, I'm really glad that I was able to follow the previous excellent talk, because there are many overlaps with what I'm going to talk about here, so it will help me to skip through the slides. I made too many slides, but many of them are very similar to the previous ones, and one of them is
exactly the same, so that'll be fun. I'm going to be talking about some research that we've done over the last year with colleagues at Oxford. We've heard a little bit already about this right to explanation. I'm not going to go into too much detail, but it's worth pointing out that it's actually quite an old right: it exists in 1978 data protection legislation in France, and it was also, to some extent, in the 1995 Data Protection Directive, and now we've got the right to explanation in the GDPR. As you may have heard, there's controversy about whether the right to explanation exists and what it consists in. Pre-GDPR, before it was passed, when it was in the drafting stages, there was work by Mireille Hildebrandt talking about this as a new era for the transparency of profiling. What my colleagues Lilian Edwards and Michael Veale wrote recently was: whether it exists or not, and whatever form it exists in, do we want it, and why would we want it? That's what this research set out to explore. We wanted to know: how should these provisions in the GDPR be interpreted? What kinds of explanations might be available from the cutting edge of research in this area? What kinds of people might want them, and why, and will they be happy if they get them? I'm going to try to move through these quickly, because a lot of this has already been covered in the previous talk. There's a general principle in intelligent systems research that we should build systems that can account for how they operate, to help users understand how their tasks have been accomplished. When people built rule-based systems, that is, systems based on explicitly coded knowledge bases and inference rules, it was much easier to generate explanations for outputs, because you could go into the rule set and extract the rules used in that particular decision. But machine learning is a little bit different. By machine learning here I'm talking
about, essentially, ways of automatically identifying patterns in data. In particular, I'm mostly talking about supervised machine learning, where you've got labelled data, data from a domain, about the behaviour of people for instance, where the different instances in the training data are labelled with the categories that you're trying to predict or classify. You take the data, you train a learning algorithm on it, that produces a model, and then that model can be used to predict or classify: you show it new instances that it hasn't seen before, and it generates a classification or a prediction. And on some level, on a really basic level, machine learning is not that different from fitting a line across a set of data points. When you've got a really simple model like this one, a graph like this might actually give you the explanation that you want: it just says the higher your salary, the better your credit score will be, and that's a kind of explanation that you might want.
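The "line through a set of data points" picture can be made concrete with a one-variable fit; the salary and score numbers below are made up for illustration.

```python
import numpy as np

# Invented toy data: salary in £k against a credit score.
salary = np.array([20, 30, 40, 55, 70, 90], dtype=float)
score = np.array([480, 530, 570, 620, 680, 740], dtype=float)

# Fit a straight line. For a model this simple, the fitted slope *is*
# the explanation: a positive slope reads as "higher salary, better score".
slope, intercept = np.polyfit(salary, score, 1)
print(f"score = {slope:.2f} * salary + {intercept:.1f}")
```

The coefficient is both the whole model and the whole explanation, which is exactly what stops being true once the relationships get more complicated.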
You might be wondering: if that's all machine learning is, why can't we just use things like this? One reason is that there might be more complicated relationships: lines that aren't straight, lines that go up and down. In the case of C, there's no straightforward explanation that you can draw out of it; it depends on where you are on the gradient. Another reason is that usually we're not dealing with one relationship between one variable and another, but with multiple variables at once. This example is just three-dimensional, but in many cases it's going to be highly multi-dimensional data. As we saw in the last talk, one way that you can explain a machine learning model, in some cases, is when the model itself is a decision tree: if you want to know the explanation for a particular output, you find the leaf node in the tree that gave the output and trace the path all the way to the top, or do it in reverse order, and that, in a sense, is a form of explanation. But often these decision trees are very complicated, and they're really only one class of machine learning models; there are many other kinds that are more complicated. So one form of, I can't remember the terminology that the previous speaker used, but this kind of matched onto some of that terminology, what was it?
Global explanations, that was the term. So this is a way of looking at all of the possible inputs and outputs and then generalising about their relationships in the model. But there are also local explanations, which try to explain a specific portion of the model. So if you want to look at, okay, we have an individual whose salary is 30k and whose age is 30, what can we say about how the model treats that particular input, and about the relationship between input and output in the nearby input space? So for the people who are, say, between 25 and 35, whose salary is similar to this person's, what are the features that contribute to the negative or positive classification in that portion of the model? You may be familiar with this slide; this is an example from a paper about a system called LIME, which is Local Interpretable Model-agnostic Explanations, and this is an example of what LIME can do. The image on the left is fed to Google's Inception image classifier, and LIME is used to explain how the classifier comes up with its classifications; it classifies this image as electric guitar, acoustic guitar and Labrador. With LIME, you perturb the inputs to see how that affects the classification: you systematically blur out portions of the image, called super-pixels, basically a way of dividing the image up into sections, and you see how each blurring affects the classification. You do that across all the different inputs, and then you can show the portions of the image that drove the classification "electric guitar", and, in D, you can see the portions of the image which made the classifier think it was a Labrador. So these can give us useful things. For instance, this is part of a study where we did a broad overview of many different types of explanations, so I'm just going to go
through some of them here. What we called a sensitivity score is essentially a way of saying: if you perturb the inputs to the model, what changes would you have to make in order to change the outcome class? In this case, if someone didn't get the credit score they needed to get their loan, we'd say: what would you have to change in order to get the positive result that you wanted? This is actually implemented in some credit score services that exist; this one, I think, is from the US. You can say: if I opened a new credit card, if I made a balance transfer, if I closed my old card, and you can see the effect on your credit score of changing each of those. Another kind of explanation, which we found in the recommender systems literature, is the case-based explanation. What this does, if you want to explain why a particular input got the output that it did, is find the instance in the training data on which the model is based that's most similar to that input and has the same output. For instance: Marion is like Vivian, and Vivian paid her loan back, and that would be one explanation of why the model thinks that Marion will pay her loan back. Another kind of explanation would be: what are the characteristics or demographics of people who got the same outcome as me, or of people who got a different outcome? And then, finally, there's some new, interesting work coming out that mostly looks at explaining image classifiers based on neural nets. What they do is isolate specific neural pathways; this is a representation of an image classifier, and they look at which neurons are firing when the model classifies certain images as it does. What they found is that there seem to be particular pathways which activate on things that look a little bit like floppy ears, things that look a little
bit like pointy ears, snouts, and so on. So, especially with image recognition, there are ways you can draw out some sense of what's going on in the black box. But what these things don't tell you about is causality: they don't tell you how things work in the real world, why it is that certain attributes of a loan applicant make them a better or worse credit risk. And they don't necessarily tell you what it is that caused the output in the model, either. So, equipped with this kind of overview of some of the explanation styles or explanation systems that have been proposed, we wanted to think about who might want explanations, and why, in particular in the context of the GDPR. We decided to focus on the decision subject: someone who's had a decision made about them that had a significant or legal effect on them, and that was made using purely automated processing. What kinds of things might they want to do? They might want to understand the decision in order to decide whether to mount a legal challenge, to ask for a human to review it, or to change the inputs that they've given to the model, for instance. This is no longer forthcoming work; it's work that's been done and published. We did a series of controlled experiments to test people's perceptions of justice in response to different hypothetical cases. We did several different studies: some were in-person, long, detailed studies where people came into our lab, and we also did a series of online studies, based on the in-person studies, that were quicker and had more people involved. We wanted to test people's perceptions of justice in response to different hypothetical cases, and for that we relied on work done in organizational psychology, which is mostly about responses to human decisions, usually in firing, promotion, disciplinary procedures and so on. This is a large body of work that's been going on for several decades, and it breaks down perceptions of
justice into different aspects: informational, procedural and distributive justice. Informational justice concerns the information that was provided when the decision was made; procedural justice concerns whether the process used to come to the decision was a fair one; and distributive justice concerns whether the distribution of outcomes across the population was fair overall. So we presented people with different hypothetical contexts: imagine Sarah has been evaluated at work by a computer system, and that was used to decide whether she got a promotion or not. We looked at employment, insurance, travel, fraud and financial loans, and we gave people the same context and the same outcomes, but different explanation styles, drawn from the different explanation systems I mentioned earlier, and these were standardized as much as we could. We used these four explanation styles: case-based; sensitivity; influence, which was a measure of the importance of different features in the classification; and demographic, i.e. what were the demographics of the other people who got the same decision. In the in-depth lab study, we got people to talk through their responses as they were considering each case, so we got a lot of rich qualitative data from that. One person (well, many people, but this is one example) talked about whether they felt it was okay for statistical inferences to be used at all in these kinds of contexts. As she said, she's been a victim of this computer system that has to generalize based on somebody else: what it should be looking for is her ability to pay back the loan, not the characteristics of others who couldn't pay back their loans. In particular, people seemed to really object to the case-based explanations: the style where the explanation for the output goes and finds the instance in the training data most similar to the person in question. People seemed to really dislike this. So: just because Joel performed badly in the end doesn't mean that Jing will perform
badly, because there could have been other influencing factors. Maybe Jing, despite having many workplace behaviors similar to Joel's, is actually an excellent worker and should get the job. Then there are the sensitivity explanations, where you imagine counterfactual situations in which you had different features or different inputs. Some people thought this was actually quite a good way of proceeding, because, especially when you're applying for a financial loan, the counterfactual explanations tell you the specific things you could do to try to change whether you get the loan or not. So in this case: he's not earning enough money for the company to feel they can give him the loan of £5,000, but they're also giving him alternatives. If he had an extra £2,000 income he'd be successful, or if he asked to borrow less he would also have been successful. So this is the right thing to do: reject him, but give him alternatives. And then people reflected on whether they thought these things were fair overall; this is mostly about the distributive justice aspect. One person said: well, fairness hasn't really come into it, because it's a cold decision, it's not fair at all, but it is understandable from the perspective of the business. One of my favourite quotes from the study was from someone who, after thinking about what it meant to be fair in an algorithmic decision-making context, said: maybe what's fair is to just be completely chaotic and award the promotions or the loans completely at random. Chaos is fair. So those were some of the qualitative responses we got. For the online studies, we had enough participants that we could get statistically significant outcomes. One of our questions was: do these different explanation styles affect people's justice perceptions in response to these explanations? And we found both yes and no. We used two different experimental designs. One was a between-subjects design, which
means we split people into different groups, gave each group one explanation style across different contexts, and then compared responses across groups; that's between subjects, so each participant got the same explanation style over and over again, in different contexts. We also did a within-subjects design, where one participant got different explanation styles in different contexts; in one case that was sequential, and in another case comparative, so they could see several different explanations side by side for a single decision. We found that between subjects there were few statistically significant differences between explanation styles, and we think what happened there is that people quickly became habituated to the explanation styles; whereas in the within-subjects design, people were able to compare explanations side by side, and that caused them to think differently about the decisions. So: yes, people are unsure about how to attribute responsibility when there's an algorithmic decision; the role of explanations in influencing people's perceptions of justice is complicated, even when contextualized in a specific way; and the explanations seemed to make the most difference to justice perceptions when people could directly compare them for a single decision. I was also asked to reflect on the extent to which these explanation services might be facilitated by infomediaries, the term Nicola used in his work. I think this could work in different ways in different contexts. In a business-to-consumer context, you can imagine trade associations or consortia of companies working in different industries combining to create standards around explanations. You might also see PIMS, personal information management services, facilitating explanations, particularly given the thing I just mentioned, which was that people tend to pay more attention to
explanations when they're comparing different explanations side by side; that seems to lend itself quite well to comparison services. So when you're applying for a loan and comparing multiple providers, if you could compare the explanations for why you would or wouldn't get the loan, or the terms you'd get it on, between providers, that could be quite useful. And then in the public sector and the employment sector, I can imagine NGOs or trade unions playing a role in developing standards or developing explanation systems, perhaps focusing on other forms of explanation services more oriented towards the decision-makers who use these systems. Finally, there's an interesting project in Germany which is using subject access rights to try to reverse-engineer the credit scoring algorithms used by the main credit rating agency in Germany. So that's what's happening there. Thanks very much, Reuben, that was very enlightening. Again, unless there are very burning questions, since we are running out of time, let's move to the next presentation, "GDPR Transparency", which is also the last presentation, I think. Paolo Missier is a reader in large-scale information systems in the School of Computing at the University of Newcastle, and he has a lot of expertise in metadata management and provenance, as well as in system design and architecture. I'll leave it to you, Paolo. Thank you. So, the bad news is that it's past 6pm here and everyone is really at the limit. The good news, I think, is that this is the last talk; the other piece of good news is that the previous two talks basically cover half of my presentation, so it will be very short. Excellent, that's coordination for you; that's a good sign, I think we interpreted this call in a way that is consistent. So I won't go into much detail about what is meant by algorithmic transparency and explanations from the point of view of the model, but we have seen two perspectives:
in one, you're trying to open the black box, which is the model that makes the predictions, either locally or globally; in the other, we use the model as a black box and challenge it by providing input that is carefully crafted, i.e. perturbing the input, to see what happens and to get the model to reveal something about itself. Two very clever approaches. What I would like to talk about is yet a third idea, because I'm not a machine learning person natively (everyone is these days, but not natively); I'm more of a data management person, and I like metadata. Okay, so let me very quickly use this background slide to set the scope. We're interested in decisions that actually make a difference to people, and that people have really no control over. So, for example, the whole world of recommender systems is out of scope, because you're free to follow the recommendation or not. In all the cases here, decisions are made on your behalf, either by the machine or, more commonly, by some human who trusts the prediction made by a machine; it can be an intermediary, it can be someone who pretends to understand what the machine does. But these decisions have consequences, and these are the ones that people care about. As we have seen, this is a classic case, by now practically a textbook case: COMPAS. Should I say something about this? Thank you. This is one of the systems used to predict recidivism, which is basically the likelihood that you reoffend as a criminal. A criticism was made of the system in public, by ProPublica on their web page, saying: look, black defendants who did not recidivate over a two-year period were flagged as high risk at a higher rate than their white counterparts, which obviously touches a nerve everywhere in the world, and especially in the US. And the converse is also true: white defendants who
reoffended within the next two years were mislabelled as low risk at higher rates. So that clearly raised a flag: what is this COMPAS system doing, is it biased, is it working okay? To which, and I apologize, because the reference to the paper that makes the counter-argument has somehow dropped off the slide, the counter-argument was fundamentally this: there's nothing really wrong with the COMPAS system itself; the problem is that the data is biased, in such a way that the model reflects the bias in the data. That's basically the story. When it gets it right, COMPAS gets it right in a fair way; but when it makes mistakes, it makes mistakes in very different ways for white and black defendants. In other words, the false positive rate and the false negative rate are very different. That's just to motivate why one would be interested in looking beyond what the model is. So, models: this part has been covered very well by the previous speakers. There are opaque models, and with models in general there's a trade-off between complexity and accuracy: the accurate models tend to be black boxes, if you're at that end of the curve. Decision trees were mentioned as models in the interpretable class, but normally you don't use a single decision tree, you use an ensemble of decision trees, for example a random forest, and voilà, you've made your life very difficult, because all of a sudden you can't really see what's going on. LIME has been mentioned, and I have the canonical picture from the LIME paper, so here's the reference. There's also work that I recently discovered, from 2015, by Caruana and others from Microsoft Research, on generalized additive models with pairwise interactions, where people look at a middle ground: the model is not linear, because that's too simple, but it's not completely arbitrary,
so there's some hope that you can extract an explanation out of it. Okay, I'm going to skip this. This is a point that the LIME paper makes: don't blindly trust accuracy, because, as we already saw, first of all there can be adversarial action that confuses the prediction, but even when everything is done without malice, the model can still be badly wrong in how it reasons. Two classifiers deciding whether a blog post is about atheism or is a religious blog post can both be correct, yet come to their conclusions in drastically different ways: the words driving the prediction are meaningful in one case and essentially spurious in the other. So at that point, if you have this kind of post-hoc explanation, you start doubting whether the model is actually trustworthy. So LIME, this is the picture from LIME again, has two ambitions: one is to be model-agnostic, and the other is to be locally faithful. It basically fits a linear approximation in the neighborhood of your data point, so it tries to explain, for example, why you got a bad score on your insurance application in terms of that local neighborhood. Okay, we've seen this. Now, what I want to talk about, really, is the part of the picture that is less looked after. Most of the ML community works here, on the model, and what I'm suggesting is that all that happens before this point, before you actually feed data to the model, is equally important, because, for example, data bias comes from here, and decisions are made during data collection and data preparation: how do you collect the data? Thomas earlier from MIT mentioned provenance, and this is basically where this is going. The contention is that the more you know about what has happened in this phase, the better off you are in explaining what the model will do, in addition to what we have seen. This is not meant to dismiss everybody else's efforts, but I insist that this stage is not looked after well enough. And, as
a research agenda, we basically want to answer these questions: can we explain these decisions, and are these explanations actually useful? We haven't done any user study; Reuben has done a fantastic study on trying to understand what people actually expect, but we haven't done anything of that kind. We have started on "can we explain the decisions", and what that means here is: to what extent can metadata analysis on the data explain, for example, model bias? There are two things: the provenance of the data, and the data itself which is used to train a specific model. If you look at data profiling, there's lots of literature on it, from my community essentially, but people still do very simple things, basic quality measures, and that's really, really not enough. And when you look at what kind of data validation is performed before drawing conclusions from an analysis, in reality there's very little. Okay, so I have two more or less random examples from Kaggle that I picked up today, and both are interesting. One is a data analysis of gun violence in the US: if you follow this link, you'll see the excellent documentation, but there's absolutely no profiling; basically, they trust the data for what it is and draw conclusions from it, without questioning the origins of the data. The second example, global terrorist attacks, is also pretty heavyweight, and again, all they do is try to detect null values and maybe eliminate records that are incomplete. Okay, so there is a lot more that can be done, and the list of things you can look out for is pretty long. When you collect the data and you do data wrangling, which is basically the dirty art of getting data into the right shape, possibly from multiple sources, integrated in a proper way, joining different sources and so on, you have these pipelines that preprocess the data and
The data is massaged in very many ways before it is ready to be fed into model training and to drive the predictions. Here are some of the things that people normally do. The bad news is that a lot can happen here, and if you do it wrong, you get a model that you cannot trust. The good news, in my view, is that much of this can actually be documented, and if you bother to track what happens here, i.e. to collect the provenance of this dataset as much as you can, then you are in a good situation, because you can try to reason about why the data looks a certain way: why did you remove outliers, how did you do it, did you do it at all, what thresholds did you use, what techniques, and so on. And the good news, again, is that this is exactly what is needed to explain the model, because what happens here is actually readable: it can be your scripts, typically in Python or Spark, or it can be workflows in a workflow management system, which are completely transparent. So there's actually a lot of transparency here that there isn't in the model; and once you are past this point, it's a bit too late to understand what's going on, because many of the decisions were made here, but you can actually go back to them. If you have never seen a provenance graph, this is what it looks like. I had the privilege of working with the provenance working group at the W3C, and thanks to MIT for hosting us for two years, 2011 to 2013. We ended up producing PROV graphs, which means that you can collect provenance represented in a formal way, and then query and analyze it, and you have a very nice metadata platform where you can hope to answer these kinds of questions. How far this will go in terms of providing explanations that complement what the machine learning people are doing is yet to be decided; we haven't published anything yet,
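A provenance graph of the kind the W3C PROV model describes can be queried mechanically. Here is a toy sketch: the relation names (`wasGeneratedBy`, `used`) follow PROV-DM, but the pipeline steps and dataset names are invented for illustration. The query traces a training set back to its raw inputs, which is exactly the kind of question the speaker wants answered about data preparation.

```python
# Triples (subject, relation, object); relation names loosely follow W3C PROV-DM.
EDGES = [
    ("clean_v1", "wasGeneratedBy", "remove_outliers"),
    ("remove_outliers", "used", "raw_survey"),
    ("train_set", "wasGeneratedBy", "join_sources"),
    ("join_sources", "used", "clean_v1"),
    ("join_sources", "used", "census_extract"),
]

def raw_sources(entity, edges=EDGES):
    """Walk generation/usage edges upstream; nodes with no parents are raw inputs."""
    sources, frontier, seen = set(), [entity], set()
    while frontier:
        node = frontier.pop()
        if node in seen:
            continue
        seen.add(node)
        parents = [o for s, rel, o in edges
                   if s == node and rel in ("wasGeneratedBy", "used")]
        if parents:
            frontier.extend(parents)
        else:
            sources.add(node)
    return sources
```

`raw_sources("train_set")` answers "which original datasets does this model's training data depend on?" without inspecting the model at all.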
so I can't give much away yet. Okay, there are lots of references here, if you want to keep the slides around. Thank you.

Thank you; you're a perfect candidate for our community, and I hope you're going to join another one; you don't want to hear that. But I think that's my last... Is Thomas still there? Yes. Okay. So now, basically, it's impossible to manage everything, because we had ten minutes for questions from MIT, and in theory we have two minutes until the end of the program, but we will keep it going a bit longer. Oh yeah, we're going to have to jump off here. Speakers, can I ask a question? Yeah, unless somebody knows any clue of the answer. So, can I speak? Yeah. So, if I find out that you are six feet tall, and my algorithm decided that all six-foot-tall people are liars, or more likely to be liars, am I obligated to disclose to you that my algorithm decided that you are a liar? To you? I guess so. To whom in particular are you asking? It's an open question. It's a legal question. Well, it has an answer. I don't care, I just want the answer. I think the GDPR would have liked transparency. I mean, there would be several processes, but as an individual you have the right to ask for and get the explanation, and if you're not satisfied, or even if you're satisfied, you could still report it to the regulator at the national level, and they could do an investigation; they have the right to look at the details of the algorithm. So IP, for example, intellectual property, is not a bar to looking at the algorithm. Thank you. Did you hear that? Sort of, kind of. Did anybody else understand it better?
Yes, I did; the answer was yes. Yes, okay, great. Just imagine the height of the regulator. Okay, good stuff.

I guess I'll take a whack at it in the room here. The way the question was asked, I think that, number one, under Article 15 you'd have a right to know that there was personal data about you, and that the personal data was something along the lines of, you know, "user one-two-three" or "Jane Doe is likely to be a liar"; at least that data, so that's personal data. And then there seems to be this, we're trying to find it here, but it's like Article 13 or wherever, where you have the right to get some more information about the logic of the algorithms that are used that impact your rights. I guess there'd be a second question there, which is whether the result of this algorithm impacts your rights in some way, so that might be the trigger question. But I think it's a good question, because posing hypotheticals is one way to let all of us start to grapple with the meaning of this new regulation, how it's going to play out, and what we can do to start to engineer systems that support and reflect this legal regime.

Hey, I want to thank everybody in the room. Oh, Mark is coming up. I just want to thank you all for making this available, for Thomas to be able to present, and for us to be able to get the benefit of your panelists in the room there. I think we should chalk this up as another successful collaboration with Mark Lizar and the Consent Receipt and affiliated communities. So thanks, all. Yeah, thanks, Daza, thanks to you; we look forward to seeing this stuff happen at MIT with OpenGDPR. I mean, it's a really good effort, and I'm really excited to see that stuff happen, the whole thing. Does anybody have any questions for Thomas, for the guys? We're good; we're going to drink beer now. Okay, great, we'll lift a glass together, kind of asynchronously. Okay, thanks, all. I wanted to have a discussion... yeah, I know. Do you mind using the room in London? Thanks. I mean, I do need to... Then: Hoxie, can you hear us okay?
Yeah, I can hear you. Oh, there you are, hello. How's everyone doing? Interesting talks. Yes, learning about the algorithms. So, one of the things, when you're done with your talk and we have some discussion: maybe explore whether we want to hack on your API a little bit, and your open-source code. Could we maybe look at adding another capability that in some way addresses the rights people have with respect to learning about the logic of algorithms that resulted in some decision that impacted their legal rights? So I see you've got the subject access request, the Article 15 stuff, you've got an erasure request, and you've got some other stuff; maybe we could talk about what it would look like if we hacked on it and tried to add something for algorithmic decision-making, and then, maybe, if it worked okay, we could make a pull request and your team could review it. Yeah, let's talk through it. Okay, great.

So, without further ado: Ben Hoxie has very kindly agreed to join us from New York City, United States of America. He is a project manager at mParticle and, it seems just from reviewing the GitHub repo, a frequent contributor to, and kind of a steward of, OpenGDPR, which is currently my favorite open-source tool with respect to supporting and reflecting GDPR in code, in a way that I think makes a lot of sense, it being API-driven. So, with that, thanks for joining us, and we'd love to learn more about OpenGDPR and hear some of your perspectives.

Yes, thank you. First of all, thanks for having me; I appreciate it, and I appreciate the interest in the project. I don't have slides, so feel free to interrupt me at any time; otherwise I'm going to give you a quick rundown and then we can chat. Great. So let me give you a little bit of context about what mParticle is and why we're doing this, other than altruism. mParticle is a customer data platform, and if you're not up on the latest in marketing stacks, that means that we work with clients, who are GDPR controllers, to automate the collection of data from data subjects,
to standardize it, and then to distribute that data to other processors. So basically our entire business is personal data; it is customer data. So, aside from my personal interest in making sure the GDPR works, we have a direct professional interest in making sure that our clients' compliance challenges are smoothly met. Also, our business is really about integrating the data marketing stack, so it makes sense for us to help out here in terms of making it easier for them: if we're going to help them distribute the data, we should help them be compliant with regard to the data. So that's sort of the context.

We founded OpenGDPR a few months ago with a couple of our best-of-breed marketing partners, AppsFlyer, Amplitude, and Braze, and we've been working with them to evolve the concept, the spec, and the repo. Let me try to break apart what the scope was and what we intended, and then we can talk about where it's going. You know, when talking to our clients, the controllers, we saw that the topics of most concern were lawful processing, where there's been a lot of conversation around consent, legitimate interest, and the other four lawful-processing umbrellas, and the rights of the data subject under, I think it's Chapter 3: access, portability, erasure, rectification, restriction; there are a bunch of others. And so we looked at both of these. We built a product in our platform for consent, and there's an IAB consent framework from the ad-tech world, and I think Google launched a consent framework, so consent is both very complicated and also kind of addressed well, or evolving, I should say; it's evolving and it's getting lots of attention. But we noticed that the rights of the data subject really didn't have anybody publicly putting a stake in the ground to say: this is how it should happen. Instead, what we saw was a lot of processors saying: if you need to fulfill a request, use this new API, send a delete request to some bespoke API in our platform,
and pass us these two IDs that you have to collect, and good luck. And so, in the same way that our core business is really about collecting, centralizing, and standardizing regular customer events, it made sense to try to standardize this piece: just to say, this is what a request is, this is how requests flow, and this is how controllers can implement them, both internally and to connect to their processors. So the idea was to lay out a standard, or rather, "standard" is the wrong word, to lay out a spec, a common spec, for people to articulate and interoperate on these requests. That's sort of the vision. Where we landed was basically a description of objects and of a mostly RESTful API that describes a workflow and a defined set of statuses (maybe the list grows; I think it stops there for now) that defines how controllers can represent requests, distribute them, and receive statuses; and then it's up to them, really, to aggregate and respond to the data subject. I think one of the challenges in the whole thing was really the distinction in responsibilities between the controller and the processor. So, you know, I could walk through the spec, but that's probably super boring; maybe I'll talk a little bit about what's covered in it right now and where we're thinking it's going. Today it only covers data subject requests for access, portability, and erasure, and even within that it's a little bit funky, because erasure is relatively clear to everyone, I think, but among the folks I've talked to, the distinction between an access and a portability request is not crystal clear, nor are the processor's obligations under each. I think for portability it's relatively clear that you have to extract the data, but for access the question is: how much of the processor's input does the controller need in order to respond correctly to a data subject about what's been saved and how it's processed? I will also flag right away that the other rights are not explicitly addressed.
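The request envelope Ben describes might be assembled roughly like this. This is a sketch, not the authoritative schema: the field names below only approximate the OpenGDPR v1 spec from memory, so check the repo for the real JSON schema before relying on the exact shape.

```python
import datetime
import json
import uuid

# The three data subject request types covered by OpenGDPR v1, per the talk.
ALLOWED_TYPES = {"access", "portability", "erasure"}

def build_request(request_type, identities, callback_urls=()):
    """Assemble an OpenGDPR-style data subject request envelope.

    Field names are approximations of the spec, used here for illustration.
    """
    if request_type not in ALLOWED_TYPES:
        raise ValueError(f"unsupported subject_request_type: {request_type}")
    return {
        "subject_request_id": str(uuid.uuid4()),
        "subject_request_type": request_type,
        "submitted_time": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "subject_identities": list(identities),
        "status_callback_urls": list(callback_urls),
    }

req = build_request(
    "erasure",
    [{"identity_type": "email", "identity_value": "jane.doe@example.com"}],
)
payload = json.dumps(req)  # what a controller would POST to each processor
```

The point of the envelope is exactly what Ben says next: it standardizes the request and its status lifecycle, while leaving fulfillment to each processor.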
So the right of restriction of processing, the right of rectification, and one or two others aren't covered, partially because it was not obvious how they would be accomplished by API; but maybe that's an interesting topic for... In two minutes, could you repeat the major elements of the scope of the regulation that are not covered? Yeah. So we set out to address the loudest data subject rights: access, portability, and erasure. There are other rights, and the spec doesn't say these aren't important, sorry, the regulation doesn't say these aren't important: the right to rectification, the right of restriction of processing, and I think there's one around the right to object. These are not explicitly covered right now in OpenGDPR, and I think they are reasonably good topics to think about: what would it take to add them, what would that look like? You know, when people ask me about those, I usually say, well, the right to rectification is pretty easy, it's really the controller sending some new data and you're good; but we've added a lot of maturity around the other rights, so maybe it behooves us to do the same for these. I think the one final piece I'll mention, before I'm happy to chat, is that OpenGDPR does not define what each processor must do. It's really a framework to encapsulate the request and to track its status, rather than interpreting the regulation, because that's impossible outside the context, the architecture, and the business of each processor. So it's really still up to the processor to say: when I get this kind of request, I'm going to have my defined fulfillment process, and according to my architecture, my business, and my data, I'm going to execute it in X way, and of course to communicate that in whatever legal use and DPAs and, you know, communication is required with their controllers.

Perfect. So, in scope is my personal favorite article, thank you, Article 15, access to our data, and then also deletion, or so-called erasure,
as it's called in the GDPR's English translation, and the right to object. And then not covered are rectification, you know, basically correcting errors, and restriction of processing, requesting the restriction of processing. And then I would maybe add to that what I was mentioning earlier: perhaps also some way we could recognize the rights people have with respect to algorithmic decision-making. There's kind of a mini access request there: tell me about, what was it, the logic? Can you repeat what you found in there? It's like the logic of the algorithm. Yeah, and what was the section again? So, it's coming up; I remember that it's about profiling, right? So, automated decision-making and profiling that either uses sensitive data or results in impactful decisions: you're allowed to see what the result was, or how that decision was made, which I think has the intention of, you know: you didn't give me a loan, you have to show me why, and what data you used, that kind of stuff. Yeah, what data you used. So to me it seems like there are three aspects. Number one: what was the input data? "Oh well, your input data was incorrect, that doesn't relate to me, or it's old," or whatever. Number two is the, could you repeat it again, the model, the algorithm. It says, if you sort of combine a few pointers together, it seems like Articles 13 or 15 say: a decision based solely on automated processing that produces legal effects, or similarly significantly affects the subject; when an individual is subject to such a decision, the GDPR creates rights to meaningful information about the logic involved in that decision. That's it. So the second thing is meaningful information about the logic, or let's just call the second thing information about the algorithm that you used: what was the model, walk me through how the function worked, as Sam said. And then number three would be the result. Obviously the result is, you know, "Sam is a good credit risk," or, you know, whatever Sally is likely
to do. I would take it one step further: it's not always a black-and-white answer that these algorithms produce. From the various data points you collect, the system is effectively making certain assumptions, and then, based on those assumptions, it's evaluating its answer. So, depending on how granular you can get without violating intellectual property, say at the time this is being audited by a government entity: how granular can you get in exposing not only the data that's being used, but also the assumptions that come before the evaluation?

You know, what's interesting, this is just reminding me, there's a piece in the portability definition that says we are not obligated to extract inferred or calculated data. So they say that on the portability side you only have to give them data about that person, about the data subject; but you are allowed to ask, as you all just mentioned, about the process for decision-making. And so my guess is that you're likely to get the data subject's raw data on the portability side, and then, on the automated-processing question, you're likely to get a very broad view of what is done with that data. I would guess that most people aren't going to give you, you know, the ML bias tree; they're not going to be that detailed; it's going to be very high-level, I suspect. It would be nice if they did, but a lot of expensive rock-star ML people are going to be out of a job if that's the truth. I would think that would be a starting point for extracting the logic, because then, based on the UML, you can determine what actually needs to be protected in terms of intellectual property, trade secrets, etc., and you can start to abstract it; it doesn't need to be directly reflective of the UML architecture, but it could be an analog to it, as far as the logic that's being produced. I suspect that they're going to summarize the model as
something like: we take X third-party dataset, which is economic data, combine it with the X personal information of yours that we've already shown you, and it results in a probability score for a purchase, or a probability score for defaulting on a loan. And they'll probably stop there, and then it's up to the data subject to say, "well, the input was wrong," because they did a portability request, and now they have grounds for some rectification to say: that's not right.

Is there a standardization that you've developed around portability? No, and I'm grouping access and portability together, because it's my belief, of course we'll see how the market begins to respond to these requests in two days, but it's my belief that the access request really is a summary report about what was saved, and so a controller may need to ask the processors, "what kind of data do you have for this person?", because they might not know; but a portability request is really just a dump, a full extract of everything that was recorded about the person, again, not inferred or calculated. As for our own implementation, it's public, so I'm happy to tell you about it: it's just a JSON extract, zipped up; I think there's like one line per day, or one file per day or something, zipped up. And then it's the controller's job to aggregate that across processors and deliver something reasonable to the data subject.

So it looks like, we're just scrolling through now, section 6.3 in the specification has the discovery response; 6.0, let's see, data format... In the status, up here in the spec, there's a result URL, and we don't actually define in the spec what the format has to be. So, once again, this spec really is very limited in scope: it defines the statuses, the lifecycle of communication, but it does not define how the processor fulfills the request. So we go as far as telling people how to communicate the status and where the result is, we define the result URL, but not the format of the result. Something I might want to suggest
you take a look at is RDF and Semantic Web standards, because that might give you a data format that is not only malleable to your own processes but is at least some form of portable standard, because the definitions of the data that you are presenting as fact are included within the structure of the data itself. So, effectively, you now have portability where someone can take that data, and even though they might not have the exact same data fields or the exact same semantics, they can use the ontology structure of the RDF store, of the semantic data storage, to translate it into whatever format they would make use of, thereby getting consistency of not only export but also import of what is portable data. That is a huge point. The spec just says that it has to be exported in a machine-readable format, and there is a bit of skepticism on my part: beyond the data subject being enlightened by browsing what was stored, which I think is the primary concern and is largely addressed with the access request, my skepticism is: is anybody going to accept somebody else's data export? Like, if I go to Facebook and say "export what you have," is anyone going to say, "give us your Facebook export and we'll be off to the races"? That is a pretty intense re-platforming of the data, to understand what each column means, and that is a monstrous undertaking.

We should probably get through some of the core items that we want to talk about. I think you have kind of segued pretty well from Sam's previous observation into the identity and authentication question that we have. So, if we were implementing OpenGDPR and maybe trying to test it out a little bit: what is the idea, or the vision, for how a data controller or processor receiving a request would authenticate the identity of the requester, to ensure that they comply by responding to the requester, but that they are not now violating the law by revealing the data to third parties
that are maybe posing as them? This is another huge point. So, unfortunately, I am going to be a bit of a cop-out here, in that it is the controller's job to communicate with the data subjects, to validate their identity, and to make sure they are not leaking or erasing data for the wrong person. For logged-in experiences this is relatively easy, in that they can require that the request be submitted while the person is logged in, and then they have X security around identification and authorization of the person. If you are a publisher, or most of your data subjects interact with you anonymously, it is super hard, and the riskiness of the data that you collect and process is going to define the measures you go to to make sure that person is who they say they are. So if all you are holding is web cookies, maybe asking them for the cookie is enough; maybe you don't care, because it is so low-risk. There is a nice trade-off here, in that for most high-risk things you are logged in, or you are already authenticated; but if it is super high-risk, you may need to ask the data subject to sign an affidavit, or in some other way prove their identity, before you share sensitive data with them. But again, this is out of scope, because it is up to the controller and their communications with the data subject.

Understood. And then the second question, at a high level, was: what is the current status of the project, and do you have a sense of a roadmap, or of what you would like to do next by way of developing the project further?
Yeah, totally, it is a good question. So right now, what you see on the GitHub is the state of it, in that there isn't reference code. I would love to have a Swagger doc, I would love to have reference code, I would love to have some implementations that people could just grab and stand up. We have implemented it ourselves, and it is our API standard for handling data subject requests. We have also gotten a lot of interest from partners; a handful have reached out and said, hey, we implemented this, we are working on this, or they have asked questions. But because it is open, and I am going to be honest, I may not know everybody who has implemented it, and that is the beauty of it. But we are working with, again, those three founding members, and it is a stake in the ground, rather, to hold our feet to the fire, to say this is what we are going to use, and to try to bring some of the controllers on board; and I think once we can get a few controllers and a few processors on board, it will become commonplace. In terms of roadmap, to be honest, it is the 24th today, so I am happy that we reached a v1, that it is live, that we are using it, and that the world hasn't fallen apart yet, which is always good. We are going to look at covering the other data subject request types, if we decide that they fit the model; to be honest, I have tried not to think too much about them until we got v1 out the door, but if they fit the model, that would be super interesting. We always wanted to do something around consent; again, it is called OpenGDPR, it is not called Open Data Subject Access Request, so I would be totally open to making another project within the repo to address consent, to address another big chunk of interest.

You had another comment here: you said you don't have any reference implementations; would you have an example of the JSON data dump for the portability aspect that we could potentially tinker with? So, the JSON data dump is going to be extremely
dependent on the processor's structures, what kind of data they have, and how they format it. The spec doesn't say it has to be in X standard format, other than JSON, and JSON is super open, because every company, every processor, is going to have their own data store: some are going to be keyed on one thing, some on another; they are going to have their own objects, depending on the way they extract and prepare the result. It is going to be totally processor-dependent, so a sample is not going to be very helpful. Give us a sample of one from mParticle, at least? Let me see if there is one in our public docs; I am happy to point you to it.

Any other questions or comments in the room? I had a general comment that is a bit higher-level. As the world moves towards cloud environments, 100% SaaS environments, and the startup world, how does that impact GDPR, where your data is stored in multiple CRMs, multiple repositories? I imagine it complicates the ability to aggregate everything for a per-individual request. How do you think about that? That is a great question. Your point is 100% right, and the business of mParticle is in collecting and distributing data subject data. There are lots of cloud companies, but the average marketing environment uses 75 processors; the customer data is going everywhere, and mParticle's business is to help them do that; this is why we are helping them to be compliant as well. It is a huge challenge. The pure legal answer would be for the controller to merge the results and prepare a response to the data subject. I will be honest, I don't know how people are planning to do it. There is a difference in the work, though: for an erasure request, you don't have to respond with anything; you can say, great, we did it, we have executed the erasure request. For access and portability, you really only need one copy of what you have collected; you don't need every copy.
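That "one copy is enough" observation suggests what a controller-side merge could look like: gather each processor's export and keep a single copy of each event. A sketch only; the event shape and the `event_id` key are invented here, since the spec deliberately leaves export formats to each processor.

```python
def merge_exports(exports):
    """Merge per-processor export lists, keeping one copy of each event.

    `event_id` is a hypothetical stable key; real processors would need to
    agree on (or be mapped to) a common identifier for this to work.
    """
    merged = {}
    for processor_events in exports:
        for event in processor_events:
            merged.setdefault(event["event_id"], event)
    return list(merged.values())

# Two processors report overlapping copies of the same activity.
processor_a = [{"event_id": "login-1", "type": "login"},
               {"event_id": "buy-7", "type": "purchase"}]
processor_b = [{"event_id": "login-1", "type": "login"}]  # duplicate copy
combined = merge_exports([processor_a, processor_b])
```

Without some shared key like this, the controller is left doing exactly the per-format re-platforming the speakers worry about.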
You don't need to get ten versions of the same login event. So, in some ways, if you are spraying everything everywhere and you have no centralization, then you have a huge challenge; if you have some centralization, you can conceivably extract it from one place and you don't have to do tons of format aggregation. But you are right, it is super hard: if you are spraying it all over, you don't have one place, and you may be getting back differently formatted files. I don't know if there is a better answer than that.

So it sounds like, at a high level, the layer that OpenGDPR operates on at this point is what I might call the envelope, or the wrapper; it is not so much the payload, to get at what Sam was saying. Which is fine; that makes a lot of sense. So, perhaps looking forward, one thing that we could focus on here, if we were to build out the code base, might be not to go deeper into any one right but to complete the coverage: get something on consent, get something on algorithmic decision-making, kind of an access request for that, and perhaps just one or two more things, maybe notices or something, and then start doing deeper dives once there is full coverage. I think that would be super cool. I also think your idea of using, I think it was RDF, something that defines the schema, or you could use JSON Schema or something that gives a little more structure to the raw export, would be super cool. These things make it a little harder for the processors to implement, but if it is more valuable, I am into discussing it. Yeah, okay. So, any other questions or comments in the room here? No? Okay, hearing none. I am not sure where the next speaker is, but... actually, I should say: if you ever come up to Boston, please reach out; we'd be happy to meet up and, if I am here anyway, to host you. And I really do like the approach of your project, and I am looking forward to trying to implement it; it sounds like Sam is interested
in checking it out as well. I think it is one of the key missing pieces. When we look at law at MIT, we are very much interested in identifying ways... fundamentally, MIT is an engineering school, so we look at what it would be like if we could do a better job of engineering legal processes, and I think your tool is a good example of how to engineer the kinds of functions, flows, and technical capabilities that give life to the rules. So it is not, like, yet another amendment to the terms and conditions or the privacy policy; it actually effectuates what the expectations are, so that people can conduct these interactions and get results in a measurable way. So I guess I just want to salute you and the team that came up with the open-source code. It is a really good start, and I do think it is worth spending time on, and seeing whether we can start to implement it, deploy it, and get more feedback, which is going to make it ever better. I appreciate it, and I appreciate the conversation; thanks for having me, it has been a pleasure. Thanks, Ben, cheers. I am confused: if you do not share the copies of the data from other sources...

We are going to mute for a moment, for everyone out there in internet land, while we get the next speaker online. Hello, I can't hear, but I can see. Hey, Nick. Hi, welcome. Thank you, so glad to have you. So, Nick is co-founder of... thank you... what is that in the background there? Sorry, that was the YouTube live stream, which is how I was watching Ben talk about OpenGDPR before, by the way; that was awesome. Do you want to just get us started, maybe do an introduction? Do you mind doing a screen share to show us your slides, and advancing them as you're ready? Let me just pull that up. My name is Nick, and this is Jonathan here; I'm co-founder and CEO of Team Machine. Let me pull up the slides real quick and we'll get going. Awesome. So, while we're doing that: part of the reason why we've invited Team Machine to present is because it seems like there's an
opportunity, in the spirit of engineering the law and achieving more predictable legal results, to look at how data can be used within, say, an enterprise that is, among other things, trying to comply with the GDPR, to help structure the implementation and deployment of, for example, the GDPR compliance tooling that we just looked at, in a way that's more measurable, so we can see things like: how are we doing, how well are we performing, has this been successful? And then perhaps even regulators, in the aggregate, could look at this kind of data to see: how is the law performing, has the law achieved what we would consider success relative to our expectations? So, yeah, from San Francisco, with that: Nick, Team Machine team, take it away.

Yeah, that's a pretty good intro. And can I check, are the slides okay on my end? Yes. Can you go full screen, present? I probably can, yeah; let me just... I think if I want to go full screen I have to change it so that it shares the whole window, not just the application window, so let's try it. How's that, can you see now? Um, yeah, way better. Okay, great. Cool. So, yeah, that was a great intro. Daza and I had a really wonderful conversation about the application of measurement along the process of the law, and that's one of the examples we'll talk about at the end; it's probably the least tangible of the three examples I have, but we can talk broadly about how the system works and then more about each of those particular examples. And I will say, too, I know it's difficult over a screen share, but please interrupt me as questions come along; Daza has done a great job of telling me when I'm not making any sense, and that's appreciated, well-received feedback. But thank you. So, the three things I really want to go through today are: the high-level vision of the way we picture the future of organizations, organizations being really companies today, nonprofits, governments, and then we think of academia and individual disciplines as organizations too; we'll talk about
how our platform works, and then talk through a couple of example use cases, more so than case studies. One being user privacy: how users could share best practices around services they're using and make sure they're increasingly improving their own digital identity and data privacy. The second: how companies can become more compliant with the GDPR in a measurable way that could potentially be transparent to their users as well. And that third example is the government iteration one; we'll get to that too. And I recognize that some of you may very well be familiar with the idea of collective intelligence, but just to frame the macro-level view of what T-Machine is thinking about: we think this inevitable coming future of organizations has a lot to do with the way we make decisions as groups and the way we leverage software to do that in a way that optimizes for our organizational goals. So societal progress is driven by the organizations that are formed throughout history, and we're defining an organization as a group of people who share some common goal. A lot of times in today's world that common goal is profit in a specific market, but we think the definition of organization is much broader than that. What we're seeing in organizations is that they leverage technology and they leverage people to increasingly optimize their ability to achieve that goal. They scale up, they include more technology and processes to achieve that goal across the scale of the organization, and they get better and better over time at achieving those goals; that's what we call organizational learning. And as an aside, in case this piques anyone's interest and they want to ask about it: we've also found that the level of complexity of a goal that an organization can handle is directly correlated to the level of complexity that can be handled by the language that that organization uses over
time. You were struggling on that a little bit: you found the level of what is connected to the level of what? We're going to go down a rabbit hole here if we dig in too much on this one, but it's interesting on my end, so I'm happy to do that. Can you say that again? We'll stop you if it gets too deep. Okay, perfect, thank you. Through human history, as language has evolved, our ability to create organizations that have increasingly complex goals has evolved. You look at human history, the way the internet works, the way communication has... and let me take an aside here just to talk about the idea of language. Language, in our definition, is not just the words used in English; you can imagine the differences between the particular vernacular used by a scientific discipline and the complexity of thought that can be conveyed through that level of communication. And the methods by which communication happens, such as communication over the internet, enable much more complex goals to be held and achieved by organizations, is what I'm trying to say. Okay, maybe we can return to that at the end. Okay, cool. Yeah, that's interesting, and I'm happy to talk more about it; Jonathan has really defined this well for me and can speak to it even better than I can. Okay, thanks, Jonathan. So this all gets to the macro-level hypothesis that we will inevitably use software to get better and better optimization within organizations in the next decades. That's where the world is today and where we're going. So then, when we extrapolate on this learning thing, we think of learning as a function of three different components: the first one is experimenting, the second one is communicating, and the third one is decision making, and those have a nice loop to them. Experimentation is obviously within the scientific realm; it's pretty well defined. We think that can be applied to organizational learning, and government learning as well: if you think about
what are the outputs we're achieving right now, what is something we think might improve that, what's our baseline versus our expected outcome? Let's run some experiment, get some results, and continuously learn over the course of that cycle. Government iterations are pretty slow, but the idea is that in order for an organization to learn, you have to be able to communicate the results of those experiments quickly and unambiguously. So what we've been saying is: get the right information to the right people in the right context. Context is time; context is the situation in which they're going to actually need to make a decision or use the information they learned in a useful way. And this addresses the idea of information overload too. Organizations are saying: I have too much information, I have too much bureaucracy, I don't know where this information lives in the organization, I don't know who knows this thing. You can imagine, in a perfect world, in an organization of the future, those issues are handled by algorithms that give people the right information so they can then make decisions. And we define decision making as groups of people that are dynamically selected to choose specific actions that make progress toward that shared goal. The idea is that this loop is what gets improved over time, and that's what learning is. So the idea here is like just-in-time manufacturing, where you have the inputs come in just as you're making the car; this is just-in-time information, so you can make all the decisions. That's awesome. Yes, that's exactly right, a great analogy for it, and I hadn't thought of that before, so I really appreciate it. Okay, thank you. Cool.
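The experiment, communicate, decide loop described above can be sketched in code. The following is an illustrative Python sketch; all function names, metrics, and numbers are invented for this example and are not part of T-Machine's actual system:

```python
# A minimal sketch of the experiment -> communicate -> decide loop.
# All names and numbers here are illustrative assumptions.

def experiment(baseline, action, measure):
    """Run an action against a baseline metric and record the observed outcome."""
    outcome = measure(action)
    return {"baseline": baseline, "action": action, "outcome": outcome}

def communicate(results, context):
    """'Just in time' delivery: surface only results relevant to this context."""
    return [r for r in results if r["action"]["context"] == context]

def decide(relevant):
    """Pick the action whose past outcome improved most over its baseline."""
    return max(relevant, key=lambda r: r["outcome"] - r["baseline"])["action"]

# One turn of the loop: record experiments, route them, choose an action.
results = [
    experiment(3.0, {"name": "weekly 1:1s", "context": "retention"},
               lambda a: 3.8),
    experiment(3.0, {"name": "pair rotation", "context": "retention"},
               lambda a: 3.2),
]
best = decide(communicate(results, "retention"))  # -> the "weekly 1:1s" action
```

The point of the sketch is only the shape of the loop: each pass produces a record, records are filtered into the right context, and the decision step compares outcomes against baselines.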
So broadly, I kind of touched on this already: we see experimentation and knowledge sharing happening in organizations today across the triple helix. Companies move really quickly; they're experimenting often, but oftentimes they don't have a lot of rigor to their experimentation. They're making decisions but maybe not categorizing or tracking them well, and the bigger the organization gets, the harder it is to communicate clearly across the organization in context, like that just-in-time aspect. And there's decreasingly shared context, because first there are different teams, then different departments, different units, and different entire companies, which just makes it difficult to communicate well. On the government side, I think some of the flaws are in the experimentation that happens. Lawmakers come in and say: we think this thing is important, data privacy is important, we're going to implement GDPR, we're going to do the best possible job we can to design it in such a way that it gets some specific outcome for the populace, and we hope it works. But the accountability metrics on what we're trying to achieve with the legislation, and whether we achieved it... as someone who's not an expert at legislation and involvement with government, it doesn't feel to me like that's something that's happening well. And then, because of the slow experimentation cycle, and I think in addition to that the incentives in government, communication is also slow, so there's this lack of transparency and accountability, which makes it hard, especially for voters, the populace, to understand. And then science: I'd say experimentation is great, but communication is very difficult because of the uniqueness, the nuance, of the language used in a specific discipline. As human knowledge expands, the disciplines get more narrow and more deep, which means it's harder to communicate that
information to people outside of those specific disciplines: they just don't have the language, they don't understand the background information. So how would you characterize that in a way that was helpful to communicate? Was that okay? That was a rhetorical question. Yes, it's a question if we want to return to it. No, back to you. Cool, thank you. So the second section here is, broadly, what does T-Machine actually do, knowing all this about the future of what we would like to see in organizations? What we have is a protocol for representing knowledge and a recommendation engine for delivering that knowledge in the just-in-time fashion. The protocol for knowledge representation, really simply put, is time-series data on what was happening when a specific decision was made; the actions that were taken are the actual decisions, and you want to measure the outcome. So you get something like, in the government example: here are the expectations of implementing some piece of legislation, here are the metrics specifically associated with that legislation that we're trying to improve, the implementation is that we're going to enact that legislation on a certain date, and then we can track over time how those metrics actually change. That can be applied to a government situation, and it can also be applied to a team situation. For example, working with a company that wants to retain their employees, they can say: well, our employees take a survey once a week that assesses their happiness and fulfillment and engagement with the company; when those metrics get low enough, we have to take action to try to improve them; and when we take certain actions in certain contexts, those work well. This methodology can be applied broadly to a number of different contexts. Does that kind of make sense? Yes. Okay. Does anyone have any questions at this point? Not right now; we're
good right now, I think. Let's see more detail and then get into it with you. Sounds good. The other half of the explanation here is that, in order to get the just-in-time information, we leverage the context that's built into the individual knowledge nuggets to understand: what are the situations, what are the actions that have been taken in a situation similar to the one I'm in right now, and which actions taken in the past have an actual outcome similar to the outcome I want to achieve? You can look at the retention use case and say: okay, there are 500 teams in our organization; in each of those teams we've had thousands of engagement surveys taken; we've taken 350 actions to retain specific employees. And then today I want to go in and say: I'm managing an engineer named Joe, and Joe is disengaged. I can measure the context of my specific team against all the teams that have existed in the past, look at which specific actions worked for employees that look like Joe, and then have a confidence interval on taking a specific action that has been taken in the past in the organization, to apply to Joe with the highest likelihood of him continuing to be engaged and not leaving the company. And that is really the communication aspect of the experimentation: each knowledge nugget is kind of its own experiment; they build on top of each other. And then the communication that happens is making sure those knowledge nuggets can be presented in the right place, and the amalgamation of all those knowledge nuggets can say: here's what we think is the right decision for you to make right now. Okay, so it doesn't sound like I have enough time to really give a demo, because I think I'm already behind the schedule of this presentation, but the use cases that I think will make this more tangible, and can really open up the conversation around the GDPR, are the three parties that are involved with the GDPR regulation: the
first one is obviously the end users; the second one would be the companies who are providing services, so you have controllers and processors; and then the governments, first off the ones enacting the GDPR, and then the follow-on governments that will enact regulation similar to the GDPR. And for each of these, I think the interesting use cases to talk about are these. For individuals, if you used a system like the one T-Machine had, you could have people with a customized assessment for their data privacy that said: you're using these services, you want to achieve this level of data privacy, you want to achieve certain assessments, like your data is used in these ways, or you're only using services that are GDPR compliant to a certain degree. Then people can enter recommendations into the system and say: well, in a certain situation, if you wanted to make sure that you were only using services that gave you clarity around rectification, as an example, one of the many end users using this service could then say, I learned that Facebook doesn't do it, so if you want to be... Your audio is a little muffled. Can you hear me now? Yeah, better. All right, what did you miss?
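The retrieval step described a moment ago, finding which actions worked in contexts similar to the current one and attaching a confidence to each, could be sketched roughly as follows. The similarity measure, data shapes, field names, and numbers are all illustrative assumptions, not T-Machine's actual implementation:

```python
import math

# Sketch of the recommendation step: find past actions taken in contexts
# similar to the current one, and rank them by observed success rate.
# A real system would also compute a proper confidence interval; here we
# just carry the sample size alongside the rate.

def similarity(a, b):
    """Cosine similarity over the numeric context keys two records share."""
    keys = set(a) & set(b)
    dot = sum(a[k] * b[k] for k in keys)
    na = math.sqrt(sum(a[k] ** 2 for k in keys))
    nb = math.sqrt(sum(b[k] ** 2 for k in keys))
    return dot / (na * nb) if na and nb else 0.0

def recommend(history, current_context, min_similarity=0.9):
    """Rank actions from similar past situations by how often they worked."""
    tallies = {}
    for record in history:
        if similarity(record["context"], current_context) >= min_similarity:
            wins, total = tallies.get(record["action"], (0, 0))
            tallies[record["action"]] = (wins + record["worked"], total + 1)
    ranked = [(action, wins / total, total)
              for action, (wins, total) in tallies.items()]
    ranked.sort(key=lambda r: r[1], reverse=True)
    return ranked  # (action, success_rate, sample_size)

history = [
    {"context": {"engagement": 2.0, "tenure": 3.0}, "action": "new project", "worked": 1},
    {"context": {"engagement": 2.1, "tenure": 2.8}, "action": "new project", "worked": 1},
    {"context": {"engagement": 2.0, "tenure": 3.1}, "action": "pay raise", "worked": 0},
]
print(recommend(history, {"engagement": 2.0, "tenure": 3.0}))
# -> [('new project', 1.0, 2), ('pay raise', 0.0, 1)]
```

The design choice this illustrates is that nothing is opaque: the ranking is just a tally over concrete past records, which is what makes the step-by-step explanation discussed later in the session possible.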
Just start again from the beginning of the sentence. So yeah, if someone had evaluated Facebook and said Facebook is doing a good job of meeting the minimum requirements of the GDPR, but they're not meeting the ideal requirements around data privacy, and kind of co-creating those ideal requirements as a community, this is the type of system that would allow someone to say: hey, people who care about this higher threshold being met, don't use Facebook, or only use these aspects of Facebook, these information settings in Facebook, so that you get to the level of data privacy that's important to you as an individual. Okay, so it could allow for some kind of benchmarking and rating against these metrics, so people could use it as one of the factors in deciding on a service provider? Right, exactly. Awesome. And I can jump to this slide here; this is really going into that recommendation-engine piece that was on the left side, saying: okay, in the situation you're in, you're looking at what level of data privacy you have; give me recommendations so I can move in the direction of achieving the metrics of data privacy that we've created as a benchmark, as a community. Okay, so that's one. The second one that's worth talking about here is for companies that are providing services: how does a company transition from being not GDPR compliant to GDPR compliant across a number of metrics, from minimally compliant to ideally compliant? That same idea of co-creating, benchmarking not just what the legislation says, or what people's interpretations of the legislation are, but also what the community of the internet's co-created idea of what data privacy should include, and the opportunity to be transparent around the specific metrics used for that assessment. So you might say the minimum is: we need access, we need the right to object, we need erasure. This company does that because it's required by law.
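The minimum-versus-ideal tiering just described could look something like the following sketch. The metric names, thresholds, and tier definitions are invented for illustration; they are not drawn from the GDPR's text or from any real benchmark:

```python
# Sketch of benchmarking a service against a minimum (legal) tier and an
# ideal (community-defined) tier. All thresholds here are assumptions.

MINIMUM = {"provides_access": True, "erasure_days_max": 30}
IDEAL = {"provides_access": True, "erasure_days_max": 7,
         "rectification_clarity": True}

def assess(service):
    """Return which tier a service reaches: ideal, minimum, or non-compliant."""
    def meets(spec):
        return (service["provides_access"] >= spec["provides_access"]
                and service["erasure_days"] <= spec["erasure_days_max"]
                and service.get("rectification_clarity", False)
                    >= spec.get("rectification_clarity", False))
    if meets(IDEAL):
        return "ideal"
    if meets(MINIMUM):
        return "minimum"
    return "non-compliant"

print(assess({"provides_access": True, "erasure_days": 21}))  # minimum
print(assess({"provides_access": True, "erasure_days": 3,
              "rectification_clarity": True}))                # ideal
```

A community could iterate on the `IDEAL` spec over time, which is exactly the "co-created benchmark" idea in the discussion.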
And we can show that there's a specific assessment around the company's execution of those things. And then, okay, the community has decided that clarity around rectification is really important to us, so we can also see that companies are doing that; there's an assessment around those metrics. And right now, the way this looks in our flow, I can show you an example from the UX we have for an actual customer. I don't know if you can see my cursor, but the first column is what it looks like from the user's perspective; the second column is the actions the company is actually taking, so in this case this would be the processor; and the third column is an individualized assessment for a specific metric. So if you're saying access, or erasure, it could be the time to erasure, and we can create a standard around that; it really would require expertise from you guys to figure out what assessments are associated with that. Yep. And the idea of doing all that is that now we can create best practices across entire industries, or all SaaS companies, or something like that, to say, in that same flow of recommendations for individual people: here's what IBM implemented for their GDPR compliance, here's what Airbnb implemented for theirs, that's working in a way that's compliant with the ideal case, the standard that's benchmarked by the community, and here's how you might implement it. That's the power of the recommendation in this scenario. Very awesome. Cool. I know I want to be able to answer questions, but I just wanted to bring up this last point here. The top piece is what a specific decision might look like, and the bottom piece is what we call the computable evolutionary narrative, but really that's what I referred to before as the context for a decision. We're doing something
a little different here: we're using a specific type of reinforcement learning called symbolic learning. The reason I want to bring this up is because, as we use our own system and implement algorithms, the idea of understanding meaningful information about the logic, in the GDPR compliance sense, is something symbolic learning allows us to do directly. We can see which specific elements of past decisions were used in the process of making this particular recommendation, and I think that methodology is generally really powerful. It's not usable in every scenario, but it could give a much deeper perspective into what logic is used to make recommendations for a variety of services, including T-Machine. Dynamite. Cool. So we were just having a quick, I don't know, tribal meeting here, and it's the view of all of us that we'd be very grateful to extend the time for the presentation if you'd be willing to share the demo, so that we could get a sense of what it looks like in implementation and then have a bit of a conversation about what this could mean. We've got people around the table here who are very capable engineers with deep expertise in certification and enterprise standards-based assessments, and we've got lawyers and candlestick makers, and a lot of us would like to have a basis for dialogue, if possible, around your demo. Yeah, that'd be awesome. So I'll show you two important screens here. As soon as you log in, there's a high-level overview, and this is for an organization that's a client. They're a company that is, what do I want to get to, they're a company that is implementing transformations, specifically moving companies from the waterfall development methodology to the agile development methodology, so you can think of this as closest to the "companies transforming to be compliant with GDPR" use case. In this case they had a company that had an assessment specifically around all the different metrics that are relevant to
being a healthy agile team, across hundreds of teams within an enterprise. And the way this works is, across the whole organization, we're able to see where the teams are. The green bar is the health goals that have already been achieved, and you're contextualizing the transformation of this organization by the end goal you're trying to achieve. When you get into the back end of this, this was basically a test harness for data that was fed into their system; we could see it on our end, and it's not the prettiest thing, but it allowed me to explain it. This leftmost column here is the full assessment, the different components that we're measuring across the entire agile team and where those metrics are today, and this rightmost column here is where the goal is that we have for the entire team. A quick question at this point: what's an example of how a goal would be articulated in this context, such that it was trackable and measurable? Yeah, Jonathan is going to help me out here, because I think he has a better explanation of this than I do. So when the team takes an assessment like this, they look at what improvement areas they really want to move the needle on, and this team said: we want to increase our happiness index and our trust and respect. So it takes their baseline index, looks at what other teams in their situation have done, and recommends growth items; that's what's in the second column over here. So it's effectively looking at other teams like them and what's worked for them in the past, but it's also incorporating expertise from consultants and coaches who have worked with a lot of teams like theirs and have recommended certain items. So we have a whole lot of different recommendations, and then the team, toward the transparency Nick was just talking about, can see: why was this recommended for us, what improved on these other teams that we're talking
about right here. And this allows them to have an open discussion during a team meeting, and you're talking about an environment with maybe 10 people in it, so they can decide: let's try this over the next three months, and let's try that over the next three months, so we can achieve those growth goals. So it's really just adding data, context, and insight to the decisions they're already having to make as a team, and giving more grounding to the group decision that's being made. So how do you measure a unit of happiness or respect? Oh, these are qualitative surveys. Yeah, so each team member, before they come to the meeting, has taken a survey; they've said how they felt about the team, and other stakeholders involved with the team are also surveyed. So everything is just relatively indexed on a Likert scale. Got it, okay, great. This is interesting. So in this demo, is this also possible with maybe not so much survey-based data, but more like operations data or logistics, something that measures a thing in time and space? Yeah, absolutely. The way we craft the assessments is that they can map to either qualitative inputs from surveys or quantitative inputs from data sources. A real concrete example of that in enterprises right now is DevOps teams: most of the metrics that matter to them are driven from more quantitative, data-driven metrics. The product engineering teams seem to be making a lot of improvements around more qualitative metrics, because that's more the agile philosophy in this case, and there are assessments that can combine both. Awesome, and I'm sorry, I couldn't hold the question to the end of the demo. Yeah, absolutely. I mean, do you want to walk through the rest of the columns here? Well, I think that's the gist of it. These other columns here just give insights. So this column right here, where it says historical assessments, we're going back to the exact team and the exact improvement that team made, so that you can drill down into the data. Yeah, we are doing some
statistical analysis, but it's all mapped to a symbolic process: we can bring back the exact data points and the exact teams like yours. This really helps to empathize with why a particular action might produce a particular result, or you can even call up this team and say: hey, when you guys did this two years ago, talk to me about what worked and what didn't work, let me gain some better insight into it. That's why we call this product the knowledge marketplace: it's really about sharing knowledge in the context of decision making. That's really enabling the Article 13 and 15 stuff around meaningful information about the logic: it's not just meaningful information, it's not just parameters; we can walk you back, step by step, through each time this action was used, the specific outcomes it had, and why we're pulling it up as a recommendation to use in your current context. Okay, is there more of the demo? This is pretty much it. The way this piece of it was implemented for the customer I was talking about is that they have a front end by which their customers can take a survey; that's then fed into our algorithms on the back end, and we spit out their collection of recommended growth items from their clients and consultants and team members and their historical knowledge nuggets. This was really just a view for us to be able to ensure we understood what was going on and to show it to other people. One of the things we're building right now is an implementation that can do this completely on its own, without integrating into a separate product. Okay, pretty impressive. So I want to kick off with a couple of questions. One of them is: could you extrapolate a little bit on... well, first of all, thank you for sharing the demo and for talking us through your approach. I think it's really novel, and it definitely speaks to conversations around here at law.mit.edu and in the Human
Dynamics Lab about how the law itself could be more measurable, and how we could look at what sometimes almost feels like a quagmire of subjective, not-well-standardized activities and behaviors, such as the legal dimension of activity, and begin to look at ways we could compute, and computational models that could help us measure, improve, and optimize legal activities. And for me the big success metric would be: could we engineer systems such that we can achieve more predictable legal results all around? So thanks for sharing that; it's good stuff. I guess the question I'd use the facilitator's privilege to jump off with first would be: could you extrapolate on the collective intelligence aspect of your approach? It looks like there are definitely groups and teams, and there's information that multiple people see, but I sense, or I think I remember, that there was something even more transformative, almost transcendental, about your collective intelligence approach, and I'd appreciate it if you could spell that out a little more. Yeah, absolutely. So the collective intelligence piece of it is just defined by the group of people who are using a specific instantiation of the software, or a specific assessment, to find the information that's relevant to them. On the one hand, if you had a group of teams within an enterprise learning from each other on retention actions, or something cool like that that drives ROI: wonderful, right? That's a cool shared intelligence within an enterprise context. But what if you think really broadly about this? What if you said: how can every government in the world learn from each other on the data privacy regulations they're putting into place? Look at GDPR as an experiment, and look at GDPR within each country where it has to be implemented as an experiment, and then say each of the specific aspects of the regulation are pieces of that experiment. What did we learn from that? What were we
trying to achieve with each of those? In the future, when the U.S. implements similar legislation, or when another country does, what will they learn from that, what metrics will they be trying to improve, and what can they have learned from the other governments over their time of actually implementing this, in a very tangible, quantified way, as opposed to just an intangible case study, like, oh, as we design legislation we're going to have to look at what the EU did. I think that piece of it is a really important part of the collective intelligence. Like the individual data privacy recommendations: it's cool if we got everyone who's on this call, and everyone at MIT who's focused on GDPR, to contribute to a system by which we can all improve our data privacy. But what if there are a million people running those same experiments and contributing their own knowledge to the system, which gets contextualized when people need to implement things? What if it's 100 million people? That's where the system really scales up to be valuable. That's exciting. Okay, so I guess I'll wrap the first thing on collective intelligence by saying, just as you're saying, we could imagine or hypothesize how this could work. Thanks, Harry. If you imagine that something like OpenGDPR were scaled well, so that we could start looking across organizational boundaries and jurisdictional boundaries to see trends in compliance, like you're saying, and then maybe go the next level deeper... hey, Bill, welcome... to see even things like, I don't know, user feedback, or maybe get more perspectives on the activities that are happening, maybe that could serve as one of the data inputs to systems like T-Machine, which could provide an overlay to monitor and measure performance, not of the law per se, you could say, but of compliance with the law. And perhaps the legislators could get a little bit better at stating things like the
goals of the law, the way they might in a nuclear test ban treaty or in an environmental kind of regulation, where they're looking to cap and reduce some overall level, and then maybe they have a cap-and-trade and a tax incentive and some other stuff, but you always have this stated, objective goal that you can measure the performance of the law against: did we achieve the carbon level, was there a nuclear test that we detected with seismic monitors, or anything like that. And so if in GDPR there were these kinds of stated, objective goals, we could then use platforms like yours to measure how close we got. And then, finally, I think something that Sandy Pentland talks about a lot here in the Media Lab is: could we then actually go the next step and use that to optimize further what the law should be, in view of the performance that we're seeing? Did we get the law right? Were there other ways to achieve the goal better? Was that even the right goal, now that we got it? Did it scratch the itch, did it achieve what we were really hoping for at the end of the rainbow? And so we could maybe start having a better measurement of all that. So that's kind of what I got out of your answer. Yeah, and I have two things I would add to that; I think that's a great synopsis, and I appreciate it. The first one is that there's some automation that actually comes with "what should the law actually be": when we get increasingly confident in these systems, and see the systems making recommendations that are right most of the time, then when we have a certain level of confidence, a certain confidence interval in the recommendation, we don't always need the human in the loop. There's a really important process of designing it so that there is a human in the loop, but there are some decisions that will be so clear that the algorithms can make recommendations on them and you don't need it anymore. And the second piece is that the assessments themselves
can be iterated on by the people who are involved. So the collective intelligence doesn't just come from recommendations on what the growth item is, the item by which you can increase your data privacy, something that worked well for me that you, in a similar context, might also implement to achieve a similar outcome. It can also be: what are the assessment metrics that we use, how do we improve those assessment metrics over time, how do we improve entire assessments, so that the context can be increasingly well defined, so that the just-in-time communication can happen in a way that is increasingly relevant and unambiguous? Beautiful. Would you be able to dive a little bit deeper into the reporting capabilities in T-Machine? What sort of dashboards or reports could I create if I were in an organization and wanted to discuss this with teams or my board? To answer that: currently, honestly, we're doing a lot of these projects on a customized basis, but what seems to be a common thread is outlining the goal you want to make for the organization as a whole and then tracking toward that goal with metrics that can really get buy-in from stakeholders that you're on the right track to achieve the long-term goal. So one of the reporting requirements that comes up a lot in the enterprise transformations is: well, that's great that more teams are becoming agile and moving faster, but what are the bottom-line metrics? Is it delivery output, is it defects in software? What are the bottom-line metrics that meet the ROI goals I have as a VP of the company? And so it comes down to building out business intelligence dashboards and prescriptive analytics that meet those stakeholder needs. Each is a bit of a different niche case, but I think the overarching goal is to articulate the goal, show the progress, and show that each of these decisions being made is moving things better and better toward what the company wants to be. Absolutely. So I think, more
to the point I was trying to figure out if these sorts of visualizations could be self-service or if this was something if I had your software I would have to request from you all our goal is to make a more self-service get the data in a format where you can send it over to Amazon Redshift use the latest business analytics business intelligence tools and we want to strengthen our product to be around the group decision making any other questions or insights we were just talking about measuring were you following the thing Bill when you were there we had a late entry any questions for us you guys have taken a really admirable amount of time putting the presentation together and prepping for the session and now going through it in the middle of a work day I'm very grateful can we get you back somehow so short of like signing up MIT is your like anchor client I know feedback or anything that we could contribute I have something on my mind but if you have anything go for it yeah I'd say one of the things a new space advocating for collective intelligence advocating for data driven management academic perspective is not very new but from a business perspective it is new and I already got the just in time information analogy is awesome if you have other ideas of ways that we could say these concepts that I said more clearly more concisely the feedback directly that really didn't make sense all of that would be really well received you know smart minds that have eyes on this and give feedback is extremely helpful to us and if you don't feel comfortable in this setting or if there's not enough time then all that would also be really well received via email do you have a question we have another question have you been in comfort data may have missed it how much of we working on is strategically open source and how much is like closed source yeah we've talked about this a little bit we haven't open source anything yet and the idea would be I think the core algorithms of what 
we're working on are probably not something that we want to open source, at least until we have some stake in the market that we can feel confident about, or some momentum that we can feel confident about. But leveraging this technology to achieve things as collectives or as groups — like, let's create dynamic best practices for people on data privacy, particularly around GDPR, like the individual's example — is something that would be really cool to be able to reinforce with the algorithms that we have, in a black-box way. And we've talked about this in terms of building narratives around evolutionary biology, and around scientific sharing in the context of academic papers. It's something that's interesting to us; it's just not something that we've worked on a lot yet.

Thanks. So, to piggyback on that question: what other systems do you currently, or in the future plan to, integrate with? I know you mentioned maybe one, but I didn't catch it.

Yeah, I think the integrations happen at a couple of different levels. One is how we get data. The biggest challenge right now is getting data from companies in a way that's cost-efficient and time-efficient: building integrations into the ERP systems that they use, into the communication systems that they use, and into the survey systems that help map into the assessment base for the tool itself.

I lost the video just now — oh no. Okay, what was the very last thing that you said? I'm sorry, there was, like, a photonic disruption in the room.

Yeah, I was saying the integrations with ERP systems in companies, and communication systems — you know, that's email, it's phone calls, it's meetings — and qualitative information such as surveys, which can capture things that are not necessarily quantitatively tested. So those are some of the integrations we have to input the data. And then integrations to output the data include business intelligence tools for dashboards, and interfaces where we can reach people — you know, through transactional emails, text messages, dashboards, and all the ways that are meaningful. Our philosophy as a product is obviously to be as focused and slim as possible, and then use these other great tools to create a great user experience.

So how do we follow along on your progress and keep in touch with you? I think Nick will follow up with some information. We're a growing startup; we're trying to find like minds in the community, and it's one of the reasons we're really happy to have the opportunity today.

So I guess — and I'm going to help you out a little bit — number one: teammachine.io. Number two: if you want to relive the glory and the splendor of the slides, they are embedded in the event site now, as of moments before we went live. And number three: can I email — whatever channel of communication is friendly — can I do a follow-up with the notes and everything from this event, and could I include your contact information?

Absolutely. I probably should have made a slide that showed it much bigger, but this is our contact information — just hold that, here we go.

Yeah, and it's fun to visit them, because they're in this very groovy, like, startup space in San Francisco. So every aspect of all of that is nothing but wonderful.

Yeah, if you're ever over here, we'll show you around the office, and I can make recommendations on other cool stuff and companies that might be relevant to you, and things like that, too. So, just an open communication channel on my end. Okay.

So I guess — I'm not sure if you guys are timing out on a meeting, I understand, and people here probably need to take a break shortly — but would people indulge one more hypothetical while we have them on the line? Is that okay with you guys in San Francisco? Let's just go for it — the brass ring is right there in front of us, so I'm just going to jump for it. So let's say — let us envision that, some amount of time from now, there's a wide deployment of Team Machine, and there's a motivation for companies, for
regulators, and for individuals, in different ways, to get the benefit of using it. What might some of the goals be, how could appropriate goals be measured, and what sorts of data, and sources of data, could fit?

Just to get us started, one thing I could imagine — to me it seems like a natural — would be, as you said, Nick, Article 15 requests for data. We could just measure the requests made. In the FOIA space — the Freedom of Information Act — there are a number of open-source platforms and other portals that get a lot of mileage out of looking at: here's the date the request for the public records was made, here's the date it was fulfilled or denied, or something like that. And they get a lot of business value out of that, including estimating how long it will take to get a given public record: "Oh, well, this one's kind of like that one — piece of cake, it'll probably take three to five days." Or: "review by the Attorney General and a congressional committee — we're guessing ten years, or never."

So to me it seems like one simple one might be Article 15 subject access requests. Although — I'll just go further and say — perhaps a goal might be something like: of all the subject access requests that show up on the system, we get responses that are deemed successful or complete within one business week, or something like that. So that might be a very, almost toy, rudimentary example of a way to do it. But really, that's not a brass ring; I think that's, I don't know, like a tin lid on a can.

I think the brass ring might be if you then had policymakers who said: our underlying goal for GDPR is something more systemic, or, you know, population-wide — such as having personal data act as a new asset class. So: our goal is that people now have such ownership and control of their personal data that we are seeing a measurable bump in GDP — of at least, like, a tenth of a percent of GDP — solely based upon exchanges of ownership rights where the individual is the owner of the personal data. Or: we're seeing, like, a competitive industry in some market, measured somehow, with people now racing to the top to be measured as performing well at providing the best GDPR skills, and that's now a competition point within a market. So those are maybe examples of just-over-the-horizon kinds of goals that we could imagine measuring, and then one that may be available now. So — how close did I get? Could you evaluate my extrapolation of how a GDPR module or implementation of Team Machine could work?

Yeah — can you hear me now? Okay, a little bit. What you bring up about personal data as an asset class is really interesting to me. I've been thinking about it for a long time; blockchain use cases around data ownership are very interesting as a thought experiment. And one of the things that you could do — not just, okay, let's analyze this particular use of the data within an assessment — is create a series of different assessments that apply to different contexts. Here are the minimum GDPR compliance requirements for a company; then here's the transparent version that we show to all of our users; and then you could say, here's the ideal benchmark we were talking about earlier — like a co-created, community, internet non-profit version of what we think GDPR compliance could be, or data privacy generally, even on top of GDPR. And then you could build another assessment that said: if you're someone who is trying to generate value from the ownership of your data, here are the additional requirements — or here are the different sets of metrics that you want to use, within a separate assessment, around how you can maintain data privacy and security, but specific to someone who is selling their personal data. Because there's a whole slew of different things that you now want to think
about — like, how are these people using this data that I'm selling, how are we evaluating the value of this data, and all of those things. And that goes back to what I suggested earlier: these communities need to co-create those assessments as well; those are what really improve over time. Does that answer your question?

Yeah, I think so. I think you sort of validated that we were thinking about it broadly in the right way, and I like how you broke it down into role-based, or kind of vantage-point-based, ways. It really is important that the success metrics — what we're measuring — would and should be different, and adapt to the different clusters of users. The policymakers are measuring different things than individuals, who are measuring different things from companies; and then there are going to be other stakeholders in this personal data ecology, too, doubtless, who will have their own views. So having this kind of focus on enabling a menu of assessment models to emerge — and to support and reflect and adapt to the needs of their communities — I think makes a lot of sense.

One thing I wanted to add there, on having the different types of assessments: these are newer issues for society, complex things, and often, you know, these types of debates get broken down into narratives with two different points of view, where one of them is good and one of them is bad. With all the different points of view that need to be reconciled, you can take experts, and you can take consumers, and you can represent very broad, fragmented points of view by taking these different assessments and relating them back to one another, so that it's a more realistic picture for decision making. And I just think it's so important to break out of that duality of the narrative into something that really is a richer way to communicate about the problem as a whole.

Beautiful. Okay, anything else in the room? I'll hazard a guess: no. Okay, then just a round of applause for you guys — thank you, our hands together.

Okay, thanks very much. And hey, I'll be out in the Bay Area, by the way, in the first few days of June, so I'll definitely look you guys up, and maybe get some more of those snackies at 500 Startups.

Great. Yeah, thank you so much for the time; this has been awesome. Again, feedback is very well received, even if it's, you know, a scathing email about how wrong we are — I'd love to hear that. So, really appreciate it.

Okay, will do — I'll share contact info. Thanks again, guys. Take care. Bye.

And so this is probably a good time to sign off the public broadcast, now that we're done with this part of the program. So if you're out there in internet land, you can find all the content for this discussion at gdprhackday.org, from law.mit.edu. So thanks very much, everyone, and we look forward to seeing you online. Bye-bye.
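The Article 15 turnaround metric floated in the hypothetical above — the share of subject access requests fulfilled within one business week — can be sketched in a few lines. Everything here is illustrative: the request dates, the five-business-day target, and the log format are assumptions for the sake of the example, not anything Team Machine actually implements.

```python
from datetime import date, timedelta

def business_days_between(start: date, end: date) -> int:
    """Count business days (Mon-Fri) after `start`, up to and including `end`."""
    days, d = 0, start
    while d < end:
        d += timedelta(days=1)
        if d.weekday() < 5:  # Monday=0 .. Friday=4
            days += 1
    return days

# Hypothetical SAR log: (date received, date fulfilled, or None if still open).
sar_log = [
    (date(2018, 5, 25), date(2018, 5, 30)),
    (date(2018, 5, 28), date(2018, 6, 8)),
    (date(2018, 6, 1),  date(2018, 6, 5)),
    (date(2018, 6, 4),  None),  # still pending
]

turnarounds = [business_days_between(r, f) for r, f in sar_log if f is not None]
within_week = sum(1 for t in turnarounds if t <= 5)

print(f"closed requests: {len(turnarounds)}/{len(sar_log)}")
print(f"fulfilled within one business week: {within_week}/{len(turnarounds)}")
```

The same shape fits the FOIA analogy drawn in the discussion: swap the SAR log for public-records requests, and use the historical turnaround distribution to estimate how long a new, similar request is likely to take.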