I thought it would be fun to have four speakers who would speak to us about different aspects of both managerial and technological issues, hoping that it inspires some of us, me, and also some of you, to pay more attention to this when we are doing our research. So it's going to be a bit disjointed because it's going to be different topics, but I think there's lots to learn here. Our first speaker is Yves-Alexandre de Montjoye, who teaches at Imperial College and is one of the world's best specialists on privacy, and he will explain to us why technology means that privacy is not just a yes-or-no thing, but there are difficulties with it. And you'll notice that he's got the most beautiful beard of everybody in this conference. Yves-Alexandre, over to you. Unmute yourself, please. Oops, sorry. Yes, can you hear me now? Yeah, perfect. Sounds good. Thanks for the kind introduction, and for the invitation to speak. So I think we have 15 minutes. What I'm going to try to do, in good engineering style, is give you an overview of the topics to discuss and where the state of the art is when it comes to privacy engineering research. More specifically, I'm going to talk about what we call the technical search for anonymous data, and how the field has been moving from de-identification mechanisms to privacy-preserving, anonymous systems. And just to be clear, this is what I focus on. This is the type of data that we work with: large-scale behavioral data collected basically as a side effect of the use of technology. The previous session mentioned, for example, Facebook data, mobile phone data, credit card data and other kinds of data sets. And these data sets are certainly amazing. I mean, they're really, in my opinion, driving a lot of the advances that we've seen in deep learning and AI recently.
It's a game changer when it comes to social science research, but also increasingly in economics. However, when you start digging into these data sets and analyzing them, there's always a moment when you feel there's something weird, right? Like, this is literally someone's life. These are hundreds to thousands of data points per person per day on how someone has been behaving. And the privacy question always comes up very quickly. What's really interesting to me is that very often, when you start digging and asking questions, you get the same answer. So this is one example of a time when that answer was provided, to reassure you that there was no risk and nothing to discuss. This was TfL, Transport for London, the subway system in London, basically collecting Wi-Fi data on every single person using the subway system, or even just passing by any Wi-Fi access point. And when you start asking questions, you get the answer that you really should not worry, because this information is anonymous. And in case you're not sure what anonymous means, they make it overly clear that it was "depersonalized", whatever that means, so that no one can be identified. And actually, this idea of anonymity, this idea of breaking the link between a person and his or her data before giving it to the analyst, is deeply rooted in our data privacy laws. If you look at the GDPR, if you look at the CCPA, both of them roughly stop applying as soon as the data is anonymous. So, so far, everything is good. And even technically, we actually have quite a big literature on how to anonymize data. A reviewer of one of our papers told us that there are 40 years of literature on how we take data and safely anonymize it before using it: basically pseudonymization, de-identification.
You remove direct identifiers (pseudonymization), and then you start adding noise, swapping records, suppressing values, to try to prevent re-identification. However, I think what a lot of research in the past 10 years has shown is that this might not be as secure and as robust as we might believe when reading the news. I'll try to make an argument in three acts on why traditional de-identification techniques, the way we've been doing things in the past, are not appropriate anymore, given the technical state of the art. First act, searching for Jack: pseudonymization is rarely enough. Just removing direct identifiers is very rarely sufficient to properly anonymize data and prevent re-identification. As a little game, let's imagine that we have a data set of mobility data and you're searching for someone in this data set. In this case, Jack: you know where Jack was at a given place, at a given time, and the question is, how hard is it, if I have location data and I know Jack is in the data set, to find him? What our research and others' have shown is that it's quite easy, specifically for mobile phone data. What we showed is that knowing four places and times where someone was is sufficient to uniquely identify him 95% of the time. That basically means that, yes, there are a lot of people around Jack right now at TSE, but very few of them will then go back somewhere close to his home tonight, and then potentially be back at TSE tomorrow morning, et cetera. Very quickly, a few pieces of auxiliary information are sufficient to uniquely identify someone. So pseudonymization is rarely enough. Second act, noise addition. Very often you say, well, yes, but this is because you have precise information.
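The four-points unicity result can be illustrated with a toy simulation. Everything below is synthetic and invented for illustration (number of users, places, and points per trace); it only shows the mechanism, not the original study's data or method.

```python
import random

# Toy pseudonymized mobility data set: each record keeps a trace of
# (location, hour) points but carries no name or direct identifier.
random.seed(0)
N_USERS, N_PLACES, N_POINTS = 200, 50, 30
traces = {
    uid: {(random.randrange(N_PLACES), random.randrange(24)) for _ in range(N_POINTS)}
    for uid in range(N_USERS)
}

def unicity(k):
    """Fraction of users uniquely pinned down by k random points of their own trace."""
    unique = 0
    for uid, trace in traces.items():
        known = set(random.sample(sorted(trace), k))   # attacker's auxiliary knowledge
        matches = [u for u, t in traces.items() if known <= t]
        unique += (matches == [uid])                   # exactly one candidate: re-identified
    return unique / N_USERS

for k in (1, 2, 4):
    print(k, unicity(k))   # unicity grows very quickly with k
```

Even in this crude toy, one known point matches many people, while four points almost always single someone out, which is the intuition behind the 95% figure.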
Maybe if I were to add noise, I could actually hide Jack's data within the data set and prevent re-identification. The idea is quite simple. It's very similar to face recognition on images: basically, I'm going to blur it, right? I make the information less precise. And the question is, can I blur it enough to prevent re-identification? What we showed in another piece of research is that, very quickly, this is not sufficient either. You have decreasing returns with noise addition: as you keep adding more and more noise, the return you get from adding that noise decreases. So not only are you losing utility, losing some of the quality of the data, but each time you need to add more and more noise to decrease the risk of re-identification. So noise addition is not sufficient either. Third act, uncertainty. This is a fairly popular technique at the moment, which goes along the lines of: well, because I don't have the entire data set, or because I'm only giving you a fraction of the data set, even if you were to find Jack in this data set, maybe it's not him. And actually, if I sample 1% of the data set and you find Jack in that 1% sample, 99% of the time you're going to be wrong. What we did lately, in a paper published last year, is to show that this is true, yes, but at the same time, we can do a lot better, right? We have this cool field called statistics that's actually quite good at quantifying uncertainty in whether we identified the right person. And what we did in this paper, published in Nature Communications, is to estimate the likelihood of me having correctly identified Jack in the data set. What does it take? Maybe if I have only a few pieces of information, this might not be sufficient.
But if I find a person who is an economist, teaches at TSE, lives in a certain place, drives a certain type and color of car, has a dog, et cetera, very quickly the likelihood that the part of the data set I do not have access to contains another person matching this description is quite low, and therefore I am very likely to have correctly re-identified Jack. And what we showed is that in the US, 15 demographic attributes are sufficient to correctly re-identify someone in a data set 99.98 percent of the time. So in any data set where you are searching for the person with these 15 demographic attributes, you're going to find the right person 99.98 percent of the time. So the conclusion of these three quick acts is that we really do believe that, yes, anonymization is enticing and a useful notion, but technically, the way we used to achieve it, the way we used to break the link between a person and their data, just does not work anymore, right? The data we have today is too large, too rich. We have too many pieces of auxiliary information that can be used to re-identify someone to be able to reasonably anonymize the type of data sets that we are dealing with today. This is a perspective that was shared, for example, by PCAST, which went even further and said that they do not see these de-identification techniques as a useful basis for policy. Despite all of this research, we just keep seeing it happen. Here are, for example, two examples from Australia, where the government released survey data and medical data, 30 years of de-identified data, in 2016, only to take it back a week or two later when researchers contacted them to say that it was actually quite easy to re-identify people in the data they had made publicly available.
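The attribute-uniqueness effect can also be sketched with a toy computation. The numbers below (5,000 synthetic records, 12 attributes, 8 categories each) are invented for illustration and are not the study's data; the point is only that the share of unique records climbs steeply as you condition on more attributes.

```python
import random
from collections import Counter

random.seed(1)
# Hypothetical "de-identified" demographic records: 12 categorical
# attributes per person (think age band, ZIP area, car type, ...).
N, D = 5000, 12
records = [tuple(random.randrange(8) for _ in range(D)) for _ in range(N)]

def frac_unique(k):
    """Share of people whose first k attributes already single them out."""
    counts = Counter(r[:k] for r in records)
    return sum(1 for r in records if counts[r[:k]] == 1) / N

for k in (2, 4, 8, 12):
    print(k, round(frac_unique(k), 3))   # uniqueness rises sharply with k
```

With two attributes almost nobody is unique; with a dozen, essentially everybody is, which is the mechanism behind the 15-attribute figure.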
It also happened two years ago, when we saw the headlines about whether Trump is a good businessman or not. When you dig into the New York Times piece, you can see that this information comes from publicly available data that had always been available with some level of anonymization, one that was not sufficient to prevent the New York Times reporters from actually finding Trump in these data sets. So these are not just theoretical attacks that academics are building. These are happening in practice and are a real risk. Moving forward, I think increasingly what we see, and where the state of the art is moving, is really towards security mechanisms. Can we find a way to give access to the data while maintaining some level of control? From an information security standpoint, we don't really believe that you can keep anonymizing the data and then just give it away. You need to maintain some kind of control, to be able to control how much information you're releasing and ensure that you protect anonymity. A big part of this is query-based systems: you remotely access the data set, for example through an SQL interface or any other kind of interface that only returns aggregated data to you. Aggregated data can be a count, a machine learning model, or the result of a statistical analysis. A quick note of caution, however: simply returning aggregates is not necessarily sufficient. Here is a very simple example that I want to leave you with. Let's assume there's only one Bob in my data set, and I ask a question that touches only Bob. Obviously, if I'm only returning aggregates, my system will say: no, I'm not answering this question, right? This touches upon one person, so this is not aggregated, therefore I do not answer.
However, just consider this simple other way to get at the same information. I'm going to ask the same thing about everyone, and then ask the same thing about everyone except the person of interest. Very quickly, by taking the difference, I get the answer to my question and bypass the aggregation. There's a range of attacks being developed against these kinds of systems, as well as defense mechanisms. So these systems are good, aggregation is good, but aggregation is not a silver bullet, and it's not sufficient on its own. Here is an example of a state-of-the-art system from the Max Planck Institute that we attacked with my team two years ago: we built upon a theoretical vulnerability from the 2000s to construct the attack, and you can see that as soon as you have the attack, you can do close to as well as if you had direct access to the data set. Finally, the last thing I want to leave you with is to think about privacy not only at the level of the individual, but also about the network effects that can be at play as soon as data relates to more than one person, or you have access to data about people you're connected to. We started formalizing models for this kind of observability and studying the network effects that can be at play. You were talking about Cambridge Analytica at the end of the previous session; we're basically building models to try to estimate the network effects at play in the case of Cambridge Analytica or other surveillance mechanisms. So, to conclude: one, data anonymization in the traditional sense, "let me take the data, anonymize it once and for all, and publish it publicly," doesn't really work anymore and, as I said, does not really have policy relevance anymore.
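The differencing attack described above fits in a few lines of code. The salaries, names, and suppression threshold below are all invented for illustration; real query-based systems are more sophisticated, but the underlying arithmetic is the same.

```python
# Toy aggregate-query interface that refuses "small" queries, and the
# differencing attack that bypasses the refusal.
salaries = {"alice": 52_000, "bob": 61_000, "carol": 48_000, "dave": 55_000}

K = 3  # suppression threshold: refuse any aggregate over fewer than K people

def sum_query(names):
    if len(names) < K:
        raise PermissionError("query touches too few people")
    return sum(salaries[n] for n in names)

everyone = set(salaries)

# Asking about Bob directly is blocked ...
try:
    sum_query({"bob"})
except PermissionError:
    pass

# ... but the difference of two perfectly "safe" aggregates reveals
# Bob's salary exactly: sum(everyone) - sum(everyone but Bob).
bob_salary = sum_query(everyone) - sum_query(everyone - {"bob"})
print(bob_salary)  # 61000
```

This is why simply enforcing a minimum query size is not enough; defenses have to reason about combinations of queries, not queries one at a time.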
Two, there is a broad set of modern privacy engineering techniques that exist and will allow you to fully use the data while preserving privacy, but they need to be developed for every use case. And finally, always remember that we have the theory and we have the actual implementations; none of them are silver bullets, and there's really a need to constantly keep testing the robustness of the techniques that are being deployed, and to make sure that these techniques, under reasonable assumptions, really meet the intent of the law. Thank you. Thank you very much, Yves-Alexandre. This was great. I have one question from Alexandre de Cornière. How costly is it to re-identify people? Because if it's very costly, it's not really relevant. Do you have any idea of... That's a great question. And this is actually something that comes up in a lot of our conversations, right? We hear fairly often: well, it's complex, you need a PhD in statistics to re-identify people. Honestly, it's not, right? The vast majority of these techniques are actually fairly simple and publicly available. To me, the standard is this: I teach a privacy engineering course, and at the end of eight hours of training, we give the students data sets and they manage to re-identify people with no issues. We're not talking about hundreds of GPUs trying to crack some password or pseudonymization mechanism. This is literally statistical matching, potentially profiling techniques. Most of them are available. Okay, thank you very much. And you're cutting out, so I don't even need to cut you off. But thank you very much, Yves-Alexandre, this was fantastic. Jacques, did you unmute yourself? No, somebody unmuted me, thank you.
I mean, economists are very excited about interoperability. If there's going to be interoperability between platforms and other users of a platform, it's going to be through something called APIs, and I thought it would be nice to have a little exploration of the benefits and limits of APIs. In order to do this, I'm cheating, because the next speaker is not an engineer; worse, he is even a lawyer. But he's been working quite a bit on APIs and trying to convince the legal community to take them seriously. So it's Chris Riley, who worked for a long time for Mozilla and now works for the R Street Institute in Washington. Chris, you've got 15 minutes. Thank you, Jacques. I usually say that I'm a public policy professional. I went to law school; I used to be a lawyer, but now I'm getting better. But I do also have a doctorate in computer science, so I'm at that rare, weird intersection of technology and law. You gave a lot of my setup already: the importance of APIs, the importance of interoperability. Yves-Alexandre was just talking about challenging one long-held assumption within the regulatory community around privacy and the relative protections of anonymous data. I think APIs and the emergence of the importance of interoperability are beginning to challenge a fundamental, maybe not an assumption, but a paradigm for how antitrust and competition theory approaches single-firm conduct. And I think we're starting to see that play out. I have had the pleasure of talking about APIs for a few years now, and one of the most impactful and also fun meetings I've ever had with government was in 2018: I had a 15-on-one meeting with 15 members of the leadership of the Federal Trade Commission for an hour and a half, to explain to them the importance of APIs and interoperability in the functioning of the modern internet. I'm going to try to compress some of that into 15 minutes, but please forgive me if I'm abstracting up a fair bit.
So I'm going to use this first slide to illustrate what one form of documentation of an API looks like. This is Slack, the Slack messenger service, increasingly popular throughout the pandemic, of course. I'm a big fan. I'm also a big fan of how they document their APIs. So an API involves software, but it's better not to think about it as a piece of software. I think of it as something like an instruction manual for a service: it tells you what you can do and what effect that action will have. The Slack APIs are designed to help software developers build apps that interoperate with Slack. An API allows you to build a messaging app that can communicate with Slack users, at least in theory, or something completely different, something totally downstream that allows you to build on Slack as a platform. Platforms like Slack operate APIs in order to allow other technologies to work with them. Now, this whole platform economy concept: there are a lot of different people who use this term in different ways. I think it's fair to use the term platform both to refer to hosting content, as an internet service that allows users to contribute content, like a YouTube or a Facebook, and to hosting other businesses. And in fact, when we think of modern-day digital platforms, they are fulfilling both of those functions. That's just a contextual note; in the context of APIs, what we're really focused on is how a platform makes access to its services, its content, its network and its users available to other technologies and other businesses, in order to do more impressive things together. So what types of APIs are there? Now, I don't think there's a single set-in-stone definition for these categories, so I will be upfront that these are the terms I use to distinguish them: open, private and public. I think it's useful to think of three different categories. To me, an open API means no restrictions, no limitations.
You make this interface available and you don't track who's using it. You don't limit use of it. It is truly open. Obviously, with that kind of an API, you can't offer any form of sensitive or personal or protected data, because there are no limits and no restrictions on it. It's kind of an edge case, so I'm not going to talk about open APIs anymore. I just want to flag that many people use "open API" to mean what I call a public API: an API that is made available for a third party to use, as distinct from a private API. However, I resist using "open" for something so limited, because the framing connotations are very challenging. Public APIs in fact do include many restrictions, which I'll get into more in a little bit. But that's really the heart of it: when we think about APIs in relation to interoperability and to antitrust and competition, we are focused primarily on public APIs. It's important to understand that public APIs stand in contrast to private APIs, which are set up primarily, actually exclusively, for use by other services offered by the same company. So for example, I can promise you, although I don't know, not having worked there, that Facebook and WhatsApp have private APIs that allow those two services to exchange functionality and information with each other, and that are not made available in any way, shape, or form to other social networks or other messaging services. Technically, just as a detail, a private API may not be an entire API. There may be a public API, such as the ones I showed with Slack, which we'll come back to in a little bit, by the way, that says: hey, we want to be interoperable. Here's how you can send messages into our messaging service. Here's how you can look to see if there's a user on the service. Here are all our great public APIs and public methods; "method" is another name for the function, the call that comes into the system.
And then as a part of that public API, there may be private methods, private pieces of the public API structure, that are another way platforms can better control and manage access to their services and their infrastructure. But put a pin in those restrictions on public APIs, because I will get into those a little later. So why do we do APIs? Well, abstraction, as a starting principle, is necessary for software development. I learned to program in the 1990s, in the C programming language, which nobody uses anymore because it's horribly unsafe, but it's very powerful. And the thing that I loved about programming in the 90s was that I could think about the entire program. I had every line of code in my head. I knew what every single piece did. With some exceptions: there still were some libraries that I didn't know. So even in the 90s, we needed abstraction in order to build complex software. One of the first things you learn to do when you program is print a line on the screen. In C, there's a function called printf that just takes arguments and puts "hello world" on the screen. I have no idea how printf works. As powerful as my knowledge of C and its relationship to the computer was, I had no idea how printf worked. That same principle of abstraction has just been taken up to another level and put onto the internet. That's what APIs are. That's why they're ubiquitous. That's why they are such a central part of how internet services operate. They help specialize. They help make sure that when you have two large teams at a place like Facebook, you have the user interface team that builds how users send content to and read content from the system, and you have the trust and safety team that looks at that content to see whether it's terrorist content, to make sure that Facebook is in compliance with laws and enforcing its own policies.
The user experience team doesn't need to know how the trust and safety code works, and vice versa; they set up internal methods of communication with each other, generally via APIs, private ones in this case, in order to reach that level of functional specialization. Another thing that you'll see as you read more about APIs: there are a lot of companies that want to sell an API. An API is a very valuable commercial resource. What they're selling is not the API in itself, but rather access to the resource that is protected by the API. So they will have an API that allows for a certain amount of access, and if you want more access to that resource through that same API, you pay more money. It's as simple as that. Now, what kinds of restrictions are put onto these APIs? The first and foremost one to keep in mind is authentication. You have to ensure that the user, the program, the service that is accessing that API has the right to do so, the permission to do so. It's necessary for any form of protected data to be made available. It's also necessary for security: even if you're not offering protected data, you want to make sure that you're not supporting operations or interference from a repressive foreign government or known spammers, or that sort of thing. You generally have to put an authentication layer on any public-facing API that you offer. And particularly if it's commercial, if you're trying to make money off of access to it, you have to have an authentication layer. But even for free services, for these spam, bot and abuse controls, you need authentication layers. That then sets up the other three categories of API restrictions. Pricing and access control, so that you can charge for API access, which sets up very effective freemium-style models; I would argue Twitter offers something like that, though they may frame it differently.
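The two restrictions just described, an authentication layer plus per-key quotas, can be sketched in a few lines. The keys, quotas, window length, and return codes below are all invented for illustration; this is not any real platform's API, just the shape of the control logic.

```python
import time

# Hypothetical API gateway: authenticate the caller, then enforce a
# per-key request quota within a fixed time window (freemium-style).
API_KEYS = {"free-key": 5, "paid-key": 1000}   # requests allowed per window
WINDOW = 60.0                                   # window length in seconds

_usage = {}  # api_key -> (window_start, request_count)

def handle_request(api_key, now=None):
    now = time.time() if now is None else now
    if api_key not in API_KEYS:                 # authentication layer
        return 401, "unknown key"
    start, count = _usage.get(api_key, (now, 0))
    if now - start >= WINDOW:                   # new window: reset the counter
        start, count = now, 0
    if count >= API_KEYS[api_key]:              # throttle beyond the quota
        return 429, "rate limit exceeded"
    _usage[api_key] = (start, count + 1)
    return 200, "ok"

print(handle_request("bad-key", now=0.0))    # rejected: not authenticated
for _ in range(5):                            # exhaust the free tier's quota
    handle_request("free-key", now=1.0)
print(handle_request("free-key", now=2.0))   # throttled within the window
print(handle_request("free-key", now=70.0))  # window has reset: allowed again
```

Pricing and access control then amount to nothing more than attaching a bigger quota (or more methods) to keys that pay more.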
So Twitter has fairly good APIs that allow you to send and receive messages from their services, but they have pretty aggressive limits on them, as a way of making sure that you can get a little bit of access, but not a lot. Now, this is where some of the complexities in competition theory start to come in, in this pricing and access control layer. Those rates aren't subject to any form of legal regulation anywhere in the world, and there are no clear laws against varying prices by who's purchasing. So in the US in particular, there's no limit on self-preferencing or refusing to provide access to a competitor; obviously, other jurisdictions differ. We're going to see this being tested as the Federal Trade Commission's lawsuit against Facebook continues. The massive wave of antitrust lawsuits that we've seen over the past few months is primarily about mergers, but there is a huge section in the FTC's lawsuit which is really just about interoperability and about how Facebook has managed access to its APIs. Rate limiting and throttling I mentioned a little bit: it's a very common technique that allows for some amount of free access to a service, but beyond a certain point, you throttle it back, you slow it down, or you just cut them off so that they can only access it periodically. If, of course, you pay more, you usually can get more. Privacy and security controls on APIs are their own very, very large category, but I think this is an important one, because APIs come in two different categories: there are general APIs, and then there are user-driven APIs. So Twitter used to offer a general API to its entire timeline, to everyone posting every message on Twitter. I think they still share some version of that with the Library of Congress, actually. Anyway, there are lots of privacy complexities here, but let me stick to the categorization.
A general API does not limit the data to that accessible by a specific user. A user-driven API, by contrast, is the sort of thing that Facebook would offer. So imagine a future where you don't have to use the Facebook app to get to Facebook; you can use something else. You can use Slack, maybe. Still, though, Facebook doesn't want to make available its entire user base; that would be disastrous. But Facebook could be expected to make available to you, as a Facebook user, through an API, the content that you can access on Facebook. So that sort of user-driven and user-controlled concept of an API is fundamental to this future vision of interoperability. You, as a user, would say: hey, Facebook, I want Slack to be able to read my Facebook messages. And Facebook would check: yep, Slack is okay, they're not a spammer and not a bot. Yep, you are you, you have authenticated. Okay, I will let Slack access your, and only your, messages within Facebook. Limiting APIs to that kind of user-driven concept makes it a lot easier for the platform to limit the data the API can reach to only data that the user themselves could access if they were directly using the platform. So how do these privacy and security controls actually play out in law and in regulation? We don't know the answer to that yet. The text on this slide is how the 2019 ACCESS Act from US senators Blumenthal, Warner and Hawley addressed it: just some language that says, yes, you need to have privacy and security standards. One of the hardest questions will be how this tension between privacy, security and openness to interoperability works out. APIs evolve over time, like all computer code. This is normal, but it has consequences. We've talked a lot about Cambridge Analytica, but Cambridge Analytica is a very good historical example of an API that needed to evolve in order to limit the kinds of data that were available. Reliance issues come up, though.
If you, like Facebook, offer APIs, and then businesses build downstream of you in reliance on those APIs, and then you change those APIs, maybe, as happened in 2018, that change breaks a fundamental function that a business needed, and they are unable to adapt or compensate for that change. Suddenly there are competition challenges. And maybe Facebook has a right to do that; maybe they were doing it for an anti-competitive purpose. These are the kinds of interesting questions that we don't know the answers to, and they're all illustrated by this API dynamic that has occurred. I personally consider this all a kind of deep theoretical challenge to Clayton Christensen's theory of disruptive innovation. If a disruptive market emerges downstream of a platform, dependent on that platform's APIs, the platform has control over that and can nip that potential disruption in the bud, which really could have some pretty significant effects on economic theory, I think. I'll move quickly to a close here, as I know I'm running low on time. From a regulatory, legal, governance paradigm perspective, sorry to venture back into lawyer territory here, we don't really know what interoperability via APIs looks like. It's not quite the same thing as a traditional supply chain model. It's not quite the same thing as banking, though the open banking movement in the UK and the law there have started to do some pretty interesting things around interoperability. And it's not quite the same thing as traditional communications markets, where there are established principles of non-discrimination and fairness baked into the law. It shares some elements of these paradigms, but it's all a little bit different. So the computer scientist in me approaches this a lot more simply: the internet's secret sauce is its openness, and that openness is made real via APIs.
That's why I'm such a strong advocate for APIs and for interoperability in this context: because I regard it as truly central to the economic and social benefits that we've gotten from the internet. I have one final brief theory, and this is not something I've tested, but it's something that has been driving a lot of my thinking for a while as well. When a platform is small, it benefits more by being heavily interconnected to established services; it shows its value to them and grows by offering value to users of those services through fairly open APIs. Once it reaches a certain scale, suddenly others start to benefit more from its relatively open APIs than it gets in return. Therefore, its natural incentives start to flip, and it starts to close these down and try to silo more and more usage and value in itself. This is what I hope to continue to poke at and test in the years to come. Thank you very much, Chris. That was really interesting, and I'm sure that a number of people in the audience will get back to you to discuss some of these issues during the rest of the day. I'm sure I will, in any case. So our third speaker is Boris Otto, who works, I'm sorry, my German pronunciation is even worse than my English pronunciation, works for the Fraunhofer ISST. Boris has been very influential in data spaces, which are the means by which you exchange data. And in particular, this is very important for the GAIA-X project, whose CEO, Hubert Tardieu, has very kindly joined us. So basically, what limits do we have on exchanging data, trading data? That's a question which is very important for economics, but clearly has some engineering aspects. So Boris, you've got your 15 minutes. Thank you very much for being here. Thank you for the kind introduction, Jacques. Let me first share my screen. It should work now; you should now be able to see the slides, right?
So what I want to talk a little bit about, as you said, Jacques, is the fundamentals of data spaces, how they actually work, and then in particular also a little bit about what implications that has from an economic perspective. As you mentioned, Jacques, data spaces are obviously somehow a hot topic these days if we look into the digital transformation, not only of businesses but also of, let's say, states or even economic regions. I brought with me a quote from Ursula von der Leyen from her State of the Union address of last September, where she basically called for the establishment of common data spaces. And in fact, they want to create nine data spaces. So the topic has received quite some attention and is discussed quite vividly, but then of course questions arise: okay, well, sounds nice, a data space to share data, but what actually is a data space? And that is something that I would like to touch upon on the next slide. In general, a data space is a data integration concept which was first proposed some 15 years ago, and I would like to outline its general design principles for a moment. A data space does not require physical data integration; rather, it leaves data where it is. This is very important because it means that you are not forced to dump your data into a central data store but can basically keep it on your own. Also, there is no common schema required. So in line with the fact that there is no requirement to have all the data in one data store, a data space does not require that all the data matches a certain schema or a syntactic model for how the data is formatted. So basically you can keep the data according to your own schema, and it is then integrated on a semantic level. That also means the data exists redundantly to a certain extent.
Since they describe real-world objects, of course, there can be different data sets that describe the same physical object in the real world. Also, these data spaces can be nested and can overlap. So if you are part of one data space, that does not mean that you are not allowed to be part of another data space, and these can overlap. So those are the general design principles, and when we started the IDS initiative, the International Data Spaces initiative, we added two other design principles which are centered around data sovereignty: the capability of a data provider to control, to a certain extent, what happens to his or her data once it has been shared, and the traceability of these data sharing and data exchange transactions, as well as trust among the participants, which has been touched on already by Chris and by Yves-Alexandre when they referred to authentication and making sure that you really are who you claim to be when approaching certain data. So what does that look like? You know, the language of the engineer is the drawing, so I don't want to go into all the details, but let me touch on a couple of, let's say, roles that are outlined here. We have on the left-hand side the data provider and on the right-hand side the data consumer. And if you remember that they will basically keep their data on their own, then of course, if they want to share the data, they need to find each other, right? Therefore we need a couple of shared services, let's call them, in the middle: in particular a broker service, which basically makes sure that data demand and data supply can be matched, but also a clearing house, which logs not the content that is shared but rather the metadata of the transaction, to make sure that a transaction has successfully been carried out.
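To make the broker role concrete, here is a minimal sketch of that matching step, under the simplifying assumption of an in-memory registry with keyword metadata (the names `Offer`, `Broker`, and the example endpoints are all hypothetical; the real IDS broker works on standardized self-descriptions):

```python
from dataclasses import dataclass

@dataclass
class Offer:
    """A data provider's self-description registered at the broker.

    The data itself stays with the provider; the broker only stores
    metadata and an endpoint where the data can be requested."""
    provider: str
    keywords: set
    endpoint: str

class Broker:
    """Matches data demand and data supply on metadata only."""
    def __init__(self):
        self.offers = []

    def register(self, offer: Offer):
        self.offers.append(offer)

    def search(self, wanted: set):
        # Return every offer whose metadata covers the demanded keywords.
        return [o for o in self.offers if wanted <= o.keywords]

# A consumer looking for inventory data finds the matching provider and
# then contacts that provider's connector directly -- the broker never
# touches the payload.
broker = Broker()
broker.register(Offer("supplier-a", {"inventory", "automotive"}, "https://a.example/connector"))
broker.register(Offer("supplier-b", {"weather"}, "https://b.example/connector"))
matches = broker.search({"inventory"})
print([o.provider for o in matches])  # ['supplier-a']
```

A clearing house would then log only the metadata of the resulting exchange (who requested what from whom, and when), not the content itself.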
And of course we also envisaged an app store provider, which basically functions as a repository for software services and software components that can be used, as we know it from, let's say, the consumer realm when we use, for example, the Apple App Store. To frame it a little more technically, what you can see here are the software components that are used in order to make this ecosystem of roles, which basically form a data space, happen. And I would like to draw your attention to a component that we call the connector, which basically makes sure that data from a data source can be retrieved, then enhanced with certain pieces of metadata, which I will elaborate on in detail further on, and then exchanged: the payload data, so the content of the data that's of interest, together with metadata about what is allowed to be done with the data and by whom. On the other side we also envisage another connector, which is the software component that is able to interpret this metadata and also, to a certain extent, enforce it. The technology that we are using here mainly relates to policy enforcement technologies and, to be precise, to distributed usage control technology. So what does that actually mean? It allows business partners to share data together with certain usage constraints that basically determine who is allowed to do what with the data even after it has been shared. These usage constraints are depicted here in orange, and you can see, from a supply chain context for example, that they specify a certain use context, so you are allowed to use my inventory range or capacity information data for a certain application, for a certain use context, and you are only allowed to use that data for a certain period of time. And in return the OEM, so the automaker, basically shares data with this first-tier supplier.
Again, certain inventory ranges, for example, and it specifies that these pieces of information can be used only in a certain context and also for a certain period of time. So what we did is we came up, through, let's say, an extensive longitudinal field study, with a set of what we call 14 policy classes, and they specify what data providers, data holders, allow other, trusted parties to do with their data. For example: the usage is allowed in very general terms, but may be restricted to certain purposes. The usage is restricted upon the occurrence of a certain event, or to a certain time interval. So you can use my credit card information until, let's say, the last day of the next month, and then you are no longer allowed to. You can use my data n times and not more. Or, for example, you can use the data once and must then delete it afterwards. So these kinds of policies, not to anticipate too much, but to point to this aspect right now, could form, let's say, the foundation for what we sometimes call the terms and conditions for the data economy, or at least for data sharing environments. So I want to be in control of articulating what others are allowed to do with my data. There are a few important things here. The first is that the other party must be able to interpret that, and not a human being sitting in front of a computer screen, but a machine. So it must be machine readable, and it must be interpreted in a, let's say, harmonized and standardized fashion. And to a certain extent, we also want it to be enforceable. So therefore the IDS connector, which I described earlier on, is also able to enforce these rules and, in that case, delete the data in that software environment when a specific event has occurred.
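A few of the policy classes named above, and the consumer-side enforcement, can be sketched as follows. This is a deliberately simplified stand-in: the real IDS policies are written in a standardized, machine-readable policy language, not Python, and the class and field names here (`UsagePolicy`, `Connector`, `max_uses`, and so on) are hypothetical:

```python
import datetime

class UsagePolicy:
    """Simplified illustration of a few policy classes: purpose
    restriction, time-interval restriction, use-n-times, and
    delete-after-use."""
    def __init__(self, purpose=None, not_after=None, max_uses=None,
                 delete_after_use=False):
        self.purpose = purpose            # allowed use context
        self.not_after = not_after        # "usable until this date"
        self.max_uses = max_uses          # "use my data n times and not more"
        self.delete_after_use = delete_after_use
        self.uses = 0

class Connector:
    """Consumer-side connector: interprets the policy metadata shipped
    with the payload and enforces it before releasing the data."""
    def __init__(self, payload, policy):
        self.payload = payload
        self.policy = policy

    def use(self, purpose=None):
        p = self.policy
        if self.payload is None:
            raise PermissionError("data has been deleted")
        if p.purpose is not None and purpose != p.purpose:
            raise PermissionError(f"not allowed for purpose {purpose!r}")
        if p.not_after is not None and datetime.date.today() > p.not_after:
            raise PermissionError("usage period has expired")
        if p.max_uses is not None and p.uses >= p.max_uses:
            raise PermissionError("usage count exhausted")
        p.uses += 1
        data = self.payload
        if p.delete_after_use:
            self.payload = None  # technical enforcement of deletion
        return data

policy = UsagePolicy(purpose="capacity-planning", max_uses=2)
c = Connector({"inventory_range_days": 12}, policy)
print(c.use("capacity-planning"))  # {'inventory_range_days': 12}
```

A third use, or a use for any other purpose, would raise a `PermissionError`; with `delete_after_use=True` the connector drops the payload after releasing it once.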
This has close relations to, let's say, the FAIR principles that we know from the use of research data and which are very prominently discussed in the current debates about establishing research data infrastructures. In particular, if we look into the accessibility principle, we can easily extend it with usage conditions. So usage control goes beyond access control, because it does not only check whether the person is allowed and authenticated to access the data; it also specifies how the data may be used once the data source has been accessed. In general, what we see is that this software architecture basically forms the foundation for the emerging data spaces, which are depicted here in blue in the middle. And those two layers are then the prerequisite for the ecosystems that everybody's talking about, where we basically expect innovative services that make our lives easier or even allow for new digital business models. I'd like to close with this chart, which summarizes on the left-hand side the business requirements that this data space architecture, which we developed in the International Data Spaces Association, is meeting. A couple of those are related to the ecosystem that emerges on top of this architecture. So it must, for example, be open for everybody to join. Everybody should be able to take up the architecture and the software components that are specified, but the provisioning of services that use this specification needs to be certified. So I want to trust that they are not only telling me that they conform with the rules but really have a certification as a proof point.
Other requirements are related to data rights and data heterogeneity, which I outlined earlier on, and also data flow traceability, because we learned in many cases that the reluctance to share data can already be relieved once transparency is created about what happens after I've shared my data, because that is often missing. On many occasions right now I'm asked to share data and never hear anything about what happens to it later on. So transparency about what's happening and about the data flows is something that is deemed very useful. Talking about the economic implications, from our point of view and also from our work in the Gaia-X context, we see a couple of questions arising that clearly go beyond purely technical aspects. First: if I have such a system established, which is basically a decentralized one, not built around one central data store, and which forms an infrastructure, how do I fund and finance it? Is it like the infrastructures that we know from electricity networks or motorways? Is it based on usage fees? How does that work? The other thing is that when we look into, let's say, discussions around data trustees, and also when we look into the Data Governance Act that was recently published by the European Commission, we see discussions along the lines of: okay, we need some sort of a cooperative structure and a cooperative legal form to look into when we talk about the ownership of these data spaces. So is it a public-private partnership? Is it a cooperative society? How should that materialize? The other thing is then clear incentive systems, as we usually have them in, let's say, markets or also on platforms. How do we do that? And also an interesting point: how do we organize for data governance? Because in the past, data governance has mainly had its roots within internal organizations, where instruments of hierarchy can apply.
I have a data standard and everybody is obliged to meet this standard, but this of course cannot be achieved if we talk about ecosystems, where individual members of the ecosystem have, let's say, more freedom to engage or not. So how can you basically ensure that data is used and managed in the desired way? The other thing is: could those architectures be an instrument which, once certified, also by default fulfills the requirements of the Data Governance Act, which as I mentioned was recently published? And could it be a technical means to build federated data trusts, in order also to break the power of control of large platform providers and hyperscalers? A couple of references, and with that I'd like to close my little talk. Thank you for listening, and I hope it was a little bit inspiring for you. Thank you very much, Boris. There's one question by Paul Seabright, which is: is it possible to verify that data has been deleted? And I would like to make it a bit more precise. I mean, do you ensure that data is deleted by promising that you're going to be really mean, legally, to people who don't delete the data, or is there something which you can do as an engineering solution? Yeah, there is something in between. There are technical means that allow for technical enforcement of the deletion. Of course, all these technologies are overhead; that is something that we all need to be aware of. So what we see is that in, let's say, critical applications, and when very sensitive data is shared, these kinds of technologies are applied. And in other cases, we see companies rely on the fact that they have a proof point through the architecture: okay, my business partner has received these pieces of data together with the usage constraints. So I have a proof point that he should have known what he was allowed to do with the data, in case the business partner misuses the data.
So if these usage constraints are violated, I at least have a proof point. There are different levels, so to say, from organizational to technical means, to realize and achieve this adherence to the constraints. But to respond to the question: technical means are there, and it is possible to enforce deletion technically. Thank you very much. So our last speaker is Onno Zoeter, and I'm sorry, I don't know how to pronounce your last name, whom I met when he worked in Grenoble for the Xerox Research Centre, which was something of a mix between academic research and internal consulting for Xerox. And now he's gone entirely commercial, because he works for Booking.com, and he is going to speak to us about how you go about implementing mechanisms in real life, which should be dear to the hearts of many of us. Onno. Thank you, Jacques. Let me share my screen. Does that come up correctly for everyone? It's perfect for me at least. Perfect. So good morning, good afternoon and good evening to everyone in the world. I saw in earlier presentations that some of you know Booking.com already. We're one of the leading online travel platforms, and the main part of our business is a two-sided marketplace where we help travelers find places to stay. On the buyers' side we refer to customers; on the sellers' side we refer to partners. And we are a sizable company, but maybe more important for this audience, for the Digital Economics Conference, we participate in auctions and we run auctions. On the participating side, by the officially announced numbers of 2019, our marketing spend was very close to $5 billion, and the vast majority of that is through auctions. So that is buying advertisement space next to search results, but also places in meta-search engines such as TripAdvisor and Trivago. But we also run auctions. It's not as prominent and well advertised as in, say, paid search, but it definitely is an auction.
So for every customer coming to our store, there are different auctions for partners to increase their visibility. For instance, partners can give discounts to our loyalty program customers, which we call Genius, in return for increased visibility to that group. Or partners can increase their cost of service in return for increased visibility. The details of the programs are maybe not important for this panel presentation, but with this background of such a big company both participating in and running auctions, under the banner of what engineers would like economists to know, I want to share, say, some requests from the trenches: things that you would appreciate when you really try to improve, say, the systems that bid these billions of dollars, or improve the mechanisms that are at the heart of such a big listed company. So when it comes to participating in auctions, and here I'm addressing myself mostly to those of you that are designing auctions or consulting for companies that are designing auctions: in the process of bidding, as we spend billions of dollars every year, we put a lot of effort into building models that predict return on investment. And what we find is that there are many aspects and context inputs that we would like to condition on, that help in our offline setting, but that we can't communicate to the auction, right? So it might be worth your while to discuss with the advanced bidders in your market and reflect on whether there are opportunities to expand the bidding language, because we find now that we need to approximate certain bids, which of course leads to reduced efficiency in the market. And the same goes for the type of data that is given back to bidders. Sometimes we see there's a layer of aggregation, and that severely hinders us in building accurate models that would give accurate return on investment predictions. And again, that leads to reduced efficiency in the market.
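The cost of a restricted bidding language can be illustrated with a toy calculation; the segment names, values, and competing bid below are entirely made up. When a bidder values clicks differently per context segment but the auction only accepts a single bid, it must collapse its values into one average, and surplus is lost:

```python
# Hypothetical per-segment values (value per click) and traffic shares.
segments = {
    "mobile_weekend":  {"value": 2.0, "share": 0.5},
    "desktop_weekday": {"value": 0.5, "share": 0.5},
}

# A single competing bid in a second-price auction.
competing_bid = 1.0

def surplus(bid, value):
    """Expected surplus: win iff bid exceeds the competitor, pay their bid."""
    return (value - competing_bid) if bid > competing_bid else 0.0

# Rich bidding language: bid truthfully per segment.
rich = sum(s["share"] * surplus(s["value"], s["value"])
           for s in segments.values())

# Restricted language: one traffic-weighted average bid for all segments.
avg_bid = sum(s["share"] * s["value"] for s in segments.values())  # 1.25
coarse = sum(s["share"] * surplus(avg_bid, s["value"])
             for s in segments.values())

print(rich, coarse)  # 0.5 0.25
```

With the averaged bid the bidder overbids on the low-value segment and wins clicks worth less than they cost, which is exactly the approximation loss, and the allocative inefficiency, described above.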
So these are relatively obvious points for those of you that are building and designing auctions or consulting for companies that build them. Now, on our running auctions: as with all things, nothing is ever perfect, and we would always like to improve whatever we're doing. The same goes for our auction format. But what we find is that this is actually quite difficult to do for such a big company. It is an oil tanker, and it is sometimes difficult to make changes even when we want to. So I want to raise two aspects. Again, putting on the engineering hat, compare what it would take to change a mechanism to, say, deciding what new servers to buy. That's a typical engineering decision, and in some cases just having the specs and the price is enough to make an informed decision. For mechanisms, for auction formats, we're quite far off from this. Even though there are very many beautiful, very powerful theoretical results, they all rely on assumptions that need to hold, and it is very difficult for us to figure out a priori whether all the required assumptions do hold. And so, in that light, there would be a lot of value for us in more empirical studies, maybe even just war stories on the successes and failures of changes in mechanisms and auctions. So the fundamental theoretical results are important, but there would be a lot of appreciation for these empirical studies as well. It might be non-trivial to get unbiased reports of this if there is a PR department that can censor things. But if we take a cue from A/B testing: I think the relative openness of big companies about their successes and failures in A/B testing has really helped in democratizing that practice for smaller companies to pick up, and it's something where we actively try to participate in that literature. And second, and actually my last point, which I'll try to keep short as it is the end of the conference: compare improving mechanisms versus improving the website, right?
So if we move one level up from, say, the infrastructure decisions: as I'm sure you're all aware, most e-commerce platforms are improved using A/B tests, or randomized controlled trials, where, say, if you want to compare a version with a red button and one with a green button, for every customer that steps into the digital store a coin is flipped, and depending on the outcome of that coin flip the red or the green version is shown. Then, depending on the metrics that you're interested in, for instance clicks, after a sufficient amount of time you'll have the data to determine whether the red or the green button leads to more clicks from customers. This gives the opportunity to improve using small steps that are typically reversible and therefore very safe, and hence test-driven product development. Now, this is a very powerful tool, and something I had not fully appreciated before joining Booking.com is that it also influences culture. There is actually, in this A/B testing space, the slogan "listen to the data, don't listen to the HiPPO", where the HiPPO is the highest paid person's opinion, and the slogan is pushed to the point that Microsoft is handing out plushy, squishy hippos as swag at conferences. There's definitely power to this philosophy, to this slogan. The flip side is that it makes it hard, in that culture, to champion big untested bets, and large changes in mechanisms would actually fall into that category. So it's something that the whole culture is steered away from. And so maybe the main point to make is that mature tools are lacking. One is that if we had reliable, validated testing methods for mechanism changes, that would be very worthwhile. And by "validated" I mean the following: I'm aware that there are more and more papers coming out suggesting ways to experiment with mechanisms, but we would also like to have, say, the test of the test. So we have the test validated.
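The red/green coin-flip test described above can be sketched end to end; the click rates and visitor count here are made up for illustration, and the significance check is a standard two-proportion z-test:

```python
import math
import random

random.seed(0)

# Simulate the coin flip: each visitor is randomly shown red or green
# and clicks with some probability unknown to the experimenter.
TRUE_RATE = {"red": 0.10, "green": 0.12}
clicks = {"red": 0, "green": 0}
visits = {"red": 0, "green": 0}

for _ in range(20_000):
    arm = random.choice(["red", "green"])       # the coin flip
    visits[arm] += 1
    if random.random() < TRUE_RATE[arm]:
        clicks[arm] += 1

# Two-proportion z-test on the click-through rates.
p_r = clicks["red"] / visits["red"]
p_g = clicks["green"] / visits["green"]
p_pool = (clicks["red"] + clicks["green"]) / (visits["red"] + visits["green"])
se = math.sqrt(p_pool * (1 - p_pool)
               * (1 / visits["red"] + 1 / visits["green"]))
z = (p_g - p_r) / se

print(f"green CTR {p_g:.3f} vs red CTR {p_r:.3f}, z = {z:.2f}")
# |z| > 1.96 would indicate significance at the 5% level.
```

This is exactly the kind of small, reversible, safely testable step that is available for buttons but, as argued above, largely missing for mechanism changes.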
And by that I mean: if we consider introducing, say, geo-based experiments within the company, then we have the luxury that we can validate them against our A/B tests. So if we ran an A/B test with red and green buttons, we can, by subsetting the data, artificially recreate an experiment where we gave only the green button to London and the red button to Paris, and then compare the outcome of the geo-based experiment to our A/B test, our more rigorous randomized controlled trial. And with that we can get a sense of how accurate it is and whether there are any pitfalls that we might not have covered. So that's on testing; that would be one aspect of this, say, mirror image of A/B testing for auctions. The other thing would be small steps. Another benefit of this experimentation-based product development is the opportunity to make small steps that are relatively safe. That could take the shape of a mechanism that has, say, a degree of freedom in it. If we squint our eyes and look at paid search auctions, there is a quality score. That the quality score is there, and what role it plays in the auction, is communicated clearly; what secret ingredients go into the soup that makes up the quality score is maybe not fully disclosed, so that over time changes can be made to it. That's a very nice property of such a mechanism if at day one you're not 100% sure what this quality score should be. So one aspect could be to design the auction format for data-driven iterative improvement. The other could be: we might have a strong feeling that with a fundamentally different auction format we'd be better off. If we can find a trajectory of smaller steps that would take us there, then the barrier to starting that journey is a lot lower than if we had to make a fully committed, bet-the-farm change in one go.
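The validation idea just described, recreating a geo experiment from logged A/B data and comparing it to the randomized estimate, can be sketched as follows; all numbers are made up, and the city-level baseline difference is included precisely because it is the kind of confounder a geo design must cope with:

```python
import random

random.seed(1)

# Made-up logged A/B records: (city, arm shown, clicked).
# The true uplift of green over red is 2 percentage points; London
# visitors are assumed to click a bit more overall.
BASE = {"London": 0.12, "Paris": 0.10}
UPLIFT = {"red": 0.0, "green": 0.02}
records = []
for _ in range(40_000):
    city = random.choice(["London", "Paris"])
    arm = random.choice(["red", "green"])        # the RCT coin flip
    clicked = random.random() < BASE[city] + UPLIFT[arm]
    records.append((city, arm, clicked))

def ctr(rows):
    return sum(clicked for _, _, clicked in rows) / len(rows)

# Benchmark from the RCT: compare arms over all visitors.
rct_effect = ctr([r for r in records if r[1] == "green"]) \
           - ctr([r for r in records if r[1] == "red"])

# Recreated geo experiment: green "rolled out" in London, red in Paris.
geo_effect = ctr([r for r in records if r[0] == "London" and r[1] == "green"]) \
           - ctr([r for r in records if r[0] == "Paris" and r[1] == "red"])

print(f"RCT estimate: {rct_effect:.3f}, geo estimate: {geo_effect:.3f}")
# The gap between the two shows how much the city-level baseline
# difference biases the naive geo comparison.
```

Comparing the two estimates on the same logged data is what gives "the test of the test": a measure of how far the cheaper geo design drifts from the randomized benchmark before it is trusted for mechanism changes.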
And then, combined with, ideally, experimentation and, failing that, with monitoring, that would really, I think, in all these e-commerce platforms, create a practice of iterative improvement and testing of these mechanisms. So maybe this is a mix of a presentation and, say, a request. I speak for ourselves, but I think I speak for many e-commerce platforms when I say there's a realization that none of these things are ever perfect and there's a willingness to improve them, but the tools are currently largely lacking. And so we would be very happy consumers of any theoretical results that you might produce, but also indeed of empirical findings and reports. So with that, I hand it back to Jacques, and we're of course open to questions and suggestions today, or you can contact me via mail.