Hello everyone. This week we're going to discuss privacy-preserving contact tracing to cope with the current COVID-19 crisis. We'll discuss the technical aspects of contact tracing in general and the challenges of gaining trust among the population, for instance in terms of privacy and in terms of robustness and resilience to malfunctioning. So let's first start with maybe explaining the protocol. Maybe you can start first with the motivation.
Okay, let's maybe begin with the motivation. Go ahead.
Yeah, so as you probably know, there's currently a pandemic due to a disease called COVID-19. The reason why this is very hard to address is that COVID-19 is extremely contagious, and this produces the exponential growth that you've probably heard about. But what makes COVID-19 trickier than a lot of other diseases is that it is contagious even for people who don't have symptoms yet. You can be contagious even though you don't have any symptoms, so it can be very hard for you to know that you should be a lot more careful and maybe stay home, because if you go outside you may transmit the disease to somebody else. This is extremely problematic because as soon as you are not careful, the number of people you contaminate can be larger than one. And if it's larger than one, then you have this exponential growth and essentially the disease is out of control. So if you want to make sure that there's no exponential growth and that the pandemic is contained, what you really want to do is make sure that anybody who's at risk stays at home and self-isolates. This is, by the way, the recommendation of the World Health Organization, and many epidemiologists agree with it.
And this is, by the way, how some countries were a bit more successful, such as Taiwan, which arguably has had the best management of the crisis so far, because people were tested very early and quarantined very early.
Yeah. So the idea of contact tracing is that you try to identify who is likely to have COVID-19 by looking at the infected people, the people who tested positive for COVID-19, and at who these people were in contact with a few days ago and may have transmitted the disease to. You can do this by hand, and it is still done this way now: you take someone who's positive and ask them who they were in contact with over the last few days. But the problem is that it's slow and also very costly; you need a lot of manpower to gather this information. And it has flaws, because people can misremember or have bad memories. Unfortunately, the human brain is not the best storage system. What we're going to talk about today is electronic contact tracing, namely the idea of having this contact tracing done by algorithmic solutions, in particular by your phones. And DP3T is a particular protocol to achieve this while guaranteeing a high level of privacy and security.
Actually, I believe the motivations are not well communicated and understood. You mentioned exponential growth, and maybe we can spend some time on how non-intuitive exponential growth is, even for the most educated scientists. I don't know, Louis, if you want to comment on the non-intuitive part of exponential growth and why we should be really very careful and not be overconfident, as, for example, Angela Merkel warned two days ago.
Yeah, so to explain it to my friends, I use a famous riddle that I heard a long time ago. In the middle of a lake, there is a flower that doubles in size every single day. After 31 days, it covers the whole lake. The question is: on what day did it cover half of the lake? Sometimes people say after half the time, so after 15 days. The correct answer is that it was covering half the lake just one day before covering the full lake. Now I put this in parallel with how full the hospitals are right now due to the pandemic. At some point, the pandemic was at least doubling every three days. When the hospitals are 50% full, the problem is that everyone has the impression that, oh, half of the hospital capacity is still free, we still have a lot of capacity. And we have been talking about COVID for one month now; it took one month to get the hospitals half full, so we are not at risk at all. But this is untrue, as the flower in the lake shows. Half-full capacity happens three days before the hospitals are overwhelmed with patients. So if it takes one month to get the hospitals half full, it will take just three additional days to get them 100% full. This is why it's really not intuitive to deal with exponentials.
And so, I don't know if you want to comment on the efficiency of contact tracing in the case of Taiwan, or we may just go ahead and discuss the protocol.
Yeah, well, I'm going to talk about something else, but what I want to stress is that it's not only about saving the lives directly threatened by COVID-19. The current lockdown in nearly all developed countries in the world is putting a lot of stress on a lot of different things. It's no longer only about healthcare.
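As a quick check of the lake riddle and the hospital analogy, here is a minimal sketch in Python. It assumes idealized, uninterrupted doubling, which real epidemics only approximate; the function name `periods_until_full` is made up for this illustration.

```python
import math


def periods_until_full(current_fraction: float) -> float:
    """Doubling periods left before the fraction reaches 1.0.

    Assumes perfect doubling each period (an idealization for illustration).
    """
    if not 0 < current_fraction <= 1:
        raise ValueError("current_fraction must be in (0, 1]")
    return math.log2(1.0 / current_fraction)


# Lake riddle: the flower doubles daily and covers the lake on day 31.
full_day = 31
half_day = full_day - periods_until_full(0.5)

# Hospital analogy: occupancy doubles every 3 days and is currently 50%.
doubling_time_days = 3
days_left = doubling_time_days * periods_until_full(0.5)

print(half_day)   # 30.0: half the lake is covered one day before the end
print(days_left)  # 3.0: half-full hospitals are three days from full
```

However long it took to reach 50%, the remaining time is always exactly one doubling period.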
Now it's affecting the way businesses are working. Companies have been shut down for months. Most of them, or at least many of them, cannot afford this. Some of them have to ask for loans just to stay alive. If the lockdown carries on for more months, or if there's a second confinement because there's a second wave of the pandemic, then many of them will probably have to declare bankruptcy. And this will lead to a lot of unemployment. In the US right now, it's particularly critical. There are tens of millions. I don't know what the figures are right now.
16 million, at least, I heard. And talking about exponentials, it was 6 million and then a few days ago it was 16 million, so almost tripled.
Yeah, so all of this is also growing exponentially, or at risk of growing exponentially. That's why it's extremely important, to avoid potentially major risks, that we think about deconfinement, hopefully in the not too distant future. But it has to be safe. It's critical that it's a safe deconfinement.
Maybe just a point about the economy, because people hear a lot about this dilemma between saving the economy and saving lives. Unfortunately, some people argue for saving the economy with sometimes the wrong arguments. Actually, these are not contradicting objectives. If unemployment figures rise, we will eventually have deaths and suicides, and in many countries people's healthcare is tied to their employment, so if they lose their employment, they lose their healthcare. There are also many side effects of being under lockdown. For example, domestic violence is on the rise according to many public health agencies. This is also a side effect of a prolonged lockdown. So looking to shorten the lockdown is not only motivated by basic greedy economic motivation but also by life-saving economic motivation.
So people really have to take time and think about the fact that those are not contradicting objectives. Saving the economy and saving lives are not necessarily always contradicting. And it's not about the rich. I mean, the rich people are going to be fine. This is about poor people who are homeless or don't have decent housing. You see videos from poor countries where some people are literally asking for the lockdown to be lifted so that they stop being hungry. Not everyone has a safety net and a salary that shows up at the end of the month while working remotely.
Yeah, yeah. So that's why it's really important to think about how to reopen the economy. But it's also critical to make it safe. And that's why anything that can help should at least be considered. In particular, contact tracing is estimated to have a non-negligible impact. It's not going to solve everything, and it's not going to completely change the game. But I've made some rough calculations based on estimates of presymptomatic contaminations, that is, contaminations by people who don't yet have symptoms. If you can reduce these even by 50%, or even by 20%, it can dramatically reduce the reproduction number, the number of people that every infected person contaminates. This can shorten the lockdown by months, or bring deconfinement forward by months, or it can prevent a second confinement. And this may in the end save maybe hundreds of thousands, if not millions, of people. So it's really, really critical to at least consider contact tracing. And for this, we need to do it right.
Okay, so maybe now let's move to the protocol and maybe just discuss the basics and then discuss specific ones such as DP3T.
So the basic idea of contact tracing is that, since most of the population has a smartphone with them at all times, the smartphones will communicate with one another and register which other smartphones they get close to. Then when someone feels sick, takes a test and learns that they are infected with COVID-19, they publish on an online database, let's say, their name, and other phones that registered that they came close to that person are then able to know that they were in contact with someone who was sick, and roughly at what time this happened. Then people who were close to someone who got sick receive a notification on their phone telling them: you are at risk, you should maybe confine yourself or get tested.
One advantage of this contact tracing that we didn't mention yet is that regular contact tracing without digital technology, where we simply ask the sick person who they have been in contact with, not only has the problem of memory and this kind of human problem, but also there are a lot of people that you pass by every day whom you don't know, so you will never be able to recall them and point to them. For example, the person next to you in line at the supermarket, the person you sat by at the cinema, the person who sat next to you on the bus: you don't know any of them, but if your phones are able to detect that they have been in contact, then your phone will be able to notify them, so it makes it a lot more efficient.
Maybe I will show a cartoon made by Nicky Case with the help of Professor Carmela Troncoso from EPFL. This is a cartoon about how it works. It is a very general representation, and I think it's helpful to understand the core ideas behind most contact tracing apps. So this is what you explained: Alice's phone. Our phones are always broadcasting signals to antennas, etc.
But here, in this case, it will be through Bluetooth, so they will just be broadcasting.
Yes, what I didn't say is that this doesn't work by sending your name to every phone that passes next to you; it works using randomly generated keys and IDs that are updated every few minutes. This is done to avoid leaking too much private information to those around you.
So if you take this case, for example, this phone is broadcasting those random strings, 5L, POMK, etc. When the two phone owners meet, their phones exchange those messages and keep track of all the random strings they received. So both phones remember what they said and heard in the past 14 days. Of course, the 14 days can be tuned depending on what epidemiologists and virologists know about the virus and how that knowledge is updated. So you can imagine many people meeting many people. This is your phone: it keeps track of all of the random messages that it broadcast, and it keeps track of all the random messages that it received from nearby devices. Then when someone is diagnosed positive, she sends her messages to a hospital. It can be a hospital, it can be a public health authority, it can be any abstraction that you put here; it is not necessarily a hospital. So here you have a database of what COVID-19-positive cases said. And then Bob, who happened to be sitting next to Alice earlier, or actually not Bob manually but the app, the protocol, the algorithm, would compare the messages it heard with the public record of the random messages said by COVID-positive people. The algorithm would recognize the messages, and so it would recognize that Bob sat next to someone with COVID. It's important to highlight here that this is not enough, absolutely not enough, for Bob to identify Alice.
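The flow described in the cartoon can be sketched as a toy simulation in Python. This is only an illustration under simplifying assumptions, not the DP3T implementation: direct method calls stand in for Bluetooth, days are plain integers, and the `Phone` class and its method names are hypothetical.

```python
import secrets

RETENTION_DAYS = 14  # tunable window, as mentioned in the discussion


class Phone:
    """Toy model of one phone; direct method calls stand in for Bluetooth."""

    def __init__(self, owner):
        self.owner = owner
        self.said = []   # (day, random_id) pairs this phone broadcast
        self.heard = []  # (day, random_id) pairs received from nearby phones

    def broadcast(self, day):
        """Generate a fresh random identifier, remember it, and return it."""
        random_id = secrets.token_hex(8)
        self.said.append((day, random_id))
        return random_id

    def hear(self, day, random_id):
        """Record an identifier broadcast by a nearby phone."""
        self.heard.append((day, random_id))

    def forget_old(self, today):
        """Drop everything older than the retention window."""
        cutoff = today - RETENTION_DAYS
        self.said = [(d, r) for d, r in self.said if d > cutoff]
        self.heard = [(d, r) for d, r in self.heard if d > cutoff]

    def report_positive(self):
        """Identifiers to upload after a positive test (no name attached)."""
        return {r for _, r in self.said}

    def exposures(self, published_ids):
        """Count heard identifiers that appear in the public record."""
        return sum(1 for _, r in self.heard if r in published_ids)


alice, bob, carol = Phone("Alice"), Phone("Bob"), Phone("Carol")

# Day 1: Alice and Bob sit next to each other; Carol is elsewhere.
bob.hear(1, alice.broadcast(1))
alice.hear(1, bob.broadcast(1))
carol.broadcast(1)

# Day 5: Alice tests positive and uploads what her phone said.
public_record = alice.report_positive()

print(bob.exposures(public_record))    # 1: Bob's phone finds a match
print(carol.exposures(public_record))  # 0: Carol's phone finds none
```

The public record contains only random strings, so a match tells Bob's phone that an exposure happened without naming Alice.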
And then, if Bob has received messages saying, for example, that you had two hours of exposure to a COVID-positive person at less than some number of meters, it means that Bob should also self-isolate. For example, if Alice self-isolates because she tested positive in a hospital and Bob also self-isolates, this would divide by two the potential spread from Alice's case.
So to conclude on the efficiency of such a technique, what's important to understand is that its efficiency is proportional to the square of the proportion of people who are using it. The way to understand this is that every time a contact happens, the probability that we detect this contact is the probability that the sick person is using the app multiplied by the probability that the second person is also using the app. So, for example, if 70% of the population is using this contact tracing app, we get 70% squared, so 49%: it will detect approximately half of the contacts, and we can expect it to reduce the exponential growth of the pandemic. On the other hand, if only 10% of the population is using it, it only detects 1% of the contacts, so in that case it would not be worth using such an app.
Yeah, the effectiveness of contact tracing is critically dependent on its wide adoption. And unfortunately, right now, it's still very hard to foresee what fraction of people are actually going to use it. A scary number is that in Singapore, around 20% of people were using this app, at least at the beginning of April. I don't know if the figures went up; I hope they did. So if we want this solution to be effective, it's extremely important that we communicate well around this.
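The quadratic effect of adoption can be checked with a few lines of Python. This is a minimal sketch that assumes the two people in a contact install the app independently of each other, which is a simplification; the function name is made up for the example.

```python
def detected_contact_fraction(adoption: float) -> float:
    """Fraction of contacts where both people run the app.

    Assumes each person uses the app independently with the same probability.
    """
    if not 0.0 <= adoption <= 1.0:
        raise ValueError("adoption must be between 0 and 1")
    return adoption ** 2


for adoption in (0.10, 0.20, 0.70):
    detected = detected_contact_fraction(adoption)
    print(f"{adoption:.0%} adoption -> {detected:.0%} of contacts detected")
```

With the figures quoted in the conversation, 10% adoption detects 1% of contacts, 70% detects 49%, and the roughly 20% reported for Singapore would detect about 4%.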
And people need to gain trust in the system and be willing to use it, unless it's made mandatory, which could be another way to go. But so far, maybe it's going to change, there seems to be a strong push for this not to be mandatory. So there's a big challenge here in terms of communication, and the other challenge is that if you want to convince people to use this app, it's also extremely important for it to have guarantees of privacy. Privacy is important in general, I guess, but especially in this case, very good privacy is extremely important, so that people really think it's a good application and also because it's going to protect the different users. DP3T in particular puts a huge emphasis on privacy. A lot of the protocol is designed so that even... So there are two kinds of attacks we can think of on the system. One kind is called honest-but-curious. These would typically be spies, people who are trying to find information by poking into the system. And then there's this other kind of bad user, which is a user who tries to screw up the system. Sometimes we call them Byzantine in computer science. You want the system to be robust to both of these kinds of attacks. In terms of privacy, the main trick is this idea of random ephemeral identifiers that we've already talked about. I guess there are a few additional tricks and subtleties that I won't get into. But I think the Byzantine resilience of the system is also extremely important. It has been criticized: maybe the system is not going to be resilient enough, because the basic attack is that anyone can create a lot of identities and just declare publicly that they have contracted COVID-19, which scares a lot of people and eventually makes the system unreliable. So this is prevented by using...
Essentially, you need proof that you have tested positive for COVID-19 to be able to declare that you actually contracted it, and this is based on interaction with health agencies. But there can be other kinds of attacks, for instance replay or relay attacks. So one kind of attack is that as soon as you see a name appear in the list of confirmed COVID-19 cases, based on this information you can compute the ephemeral IDs of the people who contracted COVID-19, and you can replay or relay these ephemeral IDs so that people will think they've been in contact with someone who has contracted COVID-19, and this can scare a lot of people for bad reasons. So defending against all of these attacks is really a big challenge, and I think DP3T did an excellent job at mitigating a lot of them. But my main concern right now is about this Byzantine resilience. You can imagine that this system is not going to be very popular among some communities, who may fear privacy issues with this app or maybe just don't like technology in general, and I think we should fear that some people will try to screw up the system because of this. So I would be more confident if the system was more Byzantine resilient. But maybe if you really want to guarantee that it's more Byzantine resilient, then you need to sacrifice something else, maybe in terms of privacy.
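To make the replay attack mentioned above concrete, here is a hedged Python sketch. It assumes a simplified scheme in which a phone derives its ephemeral identifiers from a secret day key with HMAC-SHA256, and in which that key is published after a positive test. This is an illustration only: the derivation, the constant `IDS_PER_DAY`, and the function names are assumptions, not the DP3T specification, and the mitigations DP3T applies are not reproduced here.

```python
import hashlib
import hmac
from typing import List, Set

IDS_PER_DAY = 96  # hypothetical: one identifier per 15-minute slot


def ephemeral_ids(day_key: bytes, count: int = IDS_PER_DAY) -> List[bytes]:
    """Derive short identifiers from a secret day key (illustrative scheme)."""
    return [
        hmac.new(day_key, slot.to_bytes(4, "big"), hashlib.sha256).digest()[:16]
        for slot in range(count)
    ]


def naive_match(heard: Set[bytes], published_key: bytes) -> bool:
    """Naive check that ignores when and where an identifier was heard."""
    return bool(heard & set(ephemeral_ids(published_key)))


# Alice's phone broadcasts identifiers derived from her secret key.
alice_key = hashlib.sha256(b"alice-demo-secret").digest()
bob_heard = {ephemeral_ids(alice_key)[10]}  # Bob was near Alice in slot 10

# Alice tests positive and her key is published: Bob's phone finds a match.
print(naive_match(bob_heard, alice_key))  # True

# Replay: an attacker recomputes the same identifiers from the public key
# and rebroadcasts one near Dave, who never met Alice.
dave_heard = {ephemeral_ids(alice_key)[0]}
print(naive_match(dave_heard, alice_key))  # True, a false alarm
```

A matcher this naive cannot tell a genuine encounter from a rebroadcast, which is why a real protocol needs extra checks beyond identifier equality.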
From my experience in researching Byzantine resilience and fault tolerance in general, everything else being equal, it's harder to achieve Byzantine resilience if you also want to preserve privacy, and vice versa: it's harder to achieve more privacy if you want more fault tolerance. Again, this is very high level, and specific situations can change this, but in general, when you want to achieve privacy, you want to obfuscate who is saying what, and when you want to achieve fault tolerance and be resilient to malicious inputs, you want to spot the misleading input among the other inputs, and by spotting it you might de-anonymize it. So it's a very hard challenge to combine privacy preservation and Byzantine resilience. It's not impossible, but it's extremely hard.
Yeah, it makes a lot of intuitive sense. If you're trying to set up an organization with people, and you want to make sure the organization is working well, intuitively the more you know the different people and their strengths and vulnerabilities, the more you can make your system robust and secure. But conversely, this also means that you're going to learn more about the other people, which is a violation of privacy.
Yeah, so I think for this kind of thing it's important that we're not too deontological. Things are not going to be black and white; there's not going to be a system that is fully private and fully robust to Byzantines. There are trade-offs, and we need to acknowledge them and maybe express our different preferences regarding these trade-offs. Another thing is that it's important not to lose sight of the fact that in the end there's also an effectiveness problem: you want the application to be effective at identifying contacts between infected people and susceptible people. I think it's important to take all of this into account when judging whether this application, DP3T in particular, let's say, should be widely promoted and widely used in our societies or not.
And now maybe another challenge that we have is the multiplicity of protocols and applications, because as we explained in the beginning, the power of contact tracing grows quadratically with the number of people who adopt it. So if you move from 10% to 20%, you don't double the efficiency, you multiply it by 4. Keeping that in mind, you would like the contact tracing protocol used inside a geographic region to be widely adopted and ideally interoperable, so that we don't have a pocket of people using protocol A and app A and another pocket of people using protocol B and app B. This is a very hard challenge we now face, where the multiplicity of protocols and apps could also be a barrier to the efficiency of contact tracing. So what do you say about that?
Yeah, at the time we're speaking, there's a proposal from French and German computer science research organizations, Inria on the one hand and the Fraunhofer institutes on the other. They propose an alternative to DP3T called ROBERT. We haven't had time to read the details so far; it was just posted today. I think there are differences that are important enough that the two of them are not clearly compatible. So at some point I think there's a coordination problem, and maybe in the future there will be other proposals. So far it's fine to have a lot of different research; I think it's even desirable, because maybe we haven't explored all solutions yet. But at some point, when you move to deployment, it's important to see that there's a coordination problem, and you don't want people to be split between these two different apps, because of the contacts that would then go undetected.
So ideally you would like one thread of deployment that is global and centralized, and multiple threads of exploratory research that should be kept free in the future. But if there's consensus about one protocol, ideally this protocol should be the one deployed. And this raises another challenge, which is very general: global governance. We've seen it with the World Health Organization. We are very fortunate to live in an era where we have the World Health Organization. Despite all we can say about how it did not do something, or did something less well than it should have, if we didn't have a central global organization such as the World Health Organization, I think this pandemic would be a nightmare. We have examples from the past.
Yeah. So I would strongly argue as well that the World Health Organization has been critical in relaying information and calling for stronger measures. This kind of coordination is a huge challenge, and it's going to be important, especially for contact tracing.
So we could imagine, for example, research going on independently, with several research groups coming up with different protocols, but
in deployment, ideally it should be, for example, the health authority in Switzerland that organizes the deployment in Switzerland, the health authority in France that organizes the deployment in France, etc. And eventually, as they come up with improvements to each deployment, they should have interoperable apps, because eventually borders will be opened and people will be traveling again, and this might happen sooner than vaccines arrive.
One thing we didn't mention about the contact tracing app is that a second goal of this DP3T project is to collect data, on a voluntary basis, from app users who accept to give this data to epidemiologists. Today we are already collecting quite a lot of data, but there is a delay between the data we collect and what is really happening, because we mostly collect data on patients who have already gone to the hospital or already have symptoms, and that comes 5-6 days after being infected. Using these apps, epidemiologists would have the opportunity to collect much more interesting data, data that's more timely. For example, as we discuss different measures to ease the confinement over time, if we want to know whether we are doing the right thing, we want very quick feedback on the measures we are taking. If we reopen the cinemas, we want to know right away: was it a mistake, should we go back? But the problem with the data we collect today is that we will only know 6 days later, and because of exponential growth, it would already be a huge mistake by that time. So we would know, but with a 6-day delay, which, given the exponential growth we've kept repeating since the beginning, would be a dangerous delay.
And arguably the delay is longer than this, because if you look at the confirmed cases, I would say it's more on the order of 14 days than on the order of 6 days. So
yeah, real-time estimation of the reproduction number in particular is critical, to see if our interventions are good enough to contain the pandemic or not.
A lot of people are scared about the privacy of this application. I think DP3T is excellent, but maybe it's still hard for some people to estimate the privacy concerns, and I think it's interesting to compare this with the privacy leaks that we have from using other kinds of applications. You can think of Google or Facebook, Amazon or Tinder or whatever, all the apps you're using on your phone. DP3T is designed to leak a teeny tiny fraction of the private information that you are leaking through these other apps.
Some friends, all of them academics, were discussing the privacy concerns with DP3T on Facebook, and they just said exactly that: this is a fraction. Here you're giving your real names, location, etc. If you look at the permissions you would give to DP3T, it's epsilon, it's tiny and microscopic compared to the permissions you give here. So people are now ranting about the privacy concerns with DP3T using social media; they publish their rants on Facebook, which is a bit weird.
I think it's important to have a more quantitative approach to these notions of privacy, because if we don't, we can miss the fact that, yeah, there are going to be privacy leaks, but they are negligible compared to what we give away to other apps. And maybe the counterpart of this is that right now there's a lot of uncertainty about a lot of things. For instance, are people following the rules? How many people are outside today in New York City or whatever city you have in mind?
And I haven't seen Google, Facebook and so on being involved in this kind of thinking, even though they have data that are so relevant to this.
Actually, Google and Apple have published Mobility Reports. I think you missed that.
Yeah, we saw it today. I agree, that's helpful, though we could have better data. There is also now a published machine-readable version of that data for researchers, so CSV instead of PDFs.
It's a bit counterintuitive, because in proximity tracing you can just let the app know whether one person met another, whether Alice met Arno, but it doesn't need to know the location where they met. Even though some countries are using location, we might argue that it's not necessary. You only need to know if A met B; you don't necessarily need to know where they met or what they said to each other. So with way less data, just whether A met B, you can have valuable epidemiological information, while social networks are accessing way more data and not necessarily giving researchers anything to use.
Yeah, there's an application by MIT called Private Kit where GPS data is actually exploited to do contact tracing. I think it's a bit of a shame that these companies have so much data and, unfortunately, I don't know all the details, but maybe these companies did not make enough effort to work with health agencies and help them. On the other hand, maybe researchers did not make the effort of going to these companies and asking them. I know there are also a lot of legal concerns in doing this, especially under the GDPR. It's quite a complicated problem, but given the scale of the problem right now, I would hope that there could be more fruitful collaborations along these lines.
Next week we will be talking about other AI algorithms that can help fight the COVID disease, in particular to test different proteins and different molecules and to do research for, I guess, vaccines. These are a
few ideas listed by Jürgen Schmidhuber, who is one of the founding fathers and most influential researchers, especially on neural networks. Goodbye.