Our closing and much-anticipated keynote is from Nuria Oliver. Nuria, hi, how are you doing? Hi, how are you? Good evening, very good, thank you. Are you in Valencia or are you in Alicante? I'm in Alicante. Alicante, not that far, not that far, it's fine. For sure the weather is better there. So, Nuria... 27 degrees today. Oh my God. That sounds great. Well, again, a reminder to the audience that they can ask questions, and questions in English and in Spanish are both welcome. Nuria and I both speak English, but we speak much better Spanish, so both are welcome. Nuria is talking about the war that data science is fighting against COVID. I think it's a very, very interesting topic. So, Nuria, it's all yours.

Thank you, thank you so much. Well, thank you for the invitation and for the interest in the work that we have been doing. So, as for probably all of us, my life was very different in March. In my case, I was very focused on ELLIS, which stands for the European Laboratory for Learning and Intelligent Systems. In particular, I was very focused on creating an ELLIS unit in Alicante, which is a research team working on human-centric AI, and particularly on three areas of AI. But then, of course, COVID-19 happened. Spain was one of the most impacted countries in the first wave. And I had worked on the value of data science in the context of pandemics since 2009, with the H1N1 flu outbreak. So I felt compelled to contact both the central government in Spain and the government of Valencia, the autonomous region, to propose the idea of creating a data science team that would work on analyzing data and try to help them make better decisions. I got a very positive response from the government of the Valencian region. Very quickly, basically the same day, they said: yes, let's create the team and let's see how data science can help us in the context of the pandemic.
In particular, the goal of the work that we have been doing since March is to bridge this gap, the gap between where the data is and where policymakers are. Ideally, we would like to make decisions that are based on evidence, and that evidence is typically captured by the data. So, if we want to make policies that are substantiated by this evidence, we need to be able to analyze the data, draw insights from it and make sense of it. But there is this big gap, as you can see here, between where the data is and where the policymakers are. So our goal is to bridge that gap.

And how have we been doing it? We have divided our work into four large teams. The first team has been working on mobility modeling. We know that an infectious disease like COVID-19, which spreads from human to human, doesn't become a pandemic if we don't move, because it is by moving that we spread it geographically. So understanding and measuring human mobility is very important to understand how the disease might be spreading. But we have also been confined, and one of the strategies that has been used to contain the pandemic is reducing mobility. So measuring mobility also enables us to determine whether these confinement measures are actually working or not.

The second team has been working on computational epidemiological models. The previous speaker has already presented the different techniques and types of models that are available. In our case, we have used two types of models: a compartmental metapopulation model, an SEIR model, and an agent-based model, as I will explain later. The third team has been working on building predictions of the number of cases, the number of hospitalizations and the usage of the intensive care units, and also, for example, on inferring prevalence.
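As an illustration of the compartmental (SEIR) model mentioned above, here is a minimal sketch of one discrete update step for a single, well-mixed population. It is not the team's metapopulation model: the function names and the parameters `beta`, `sigma` and `gamma` are hypothetical, and the values in the usage note are made up rather than taken from the talk or from the literature.

```python
def seir_step(s, e, i, r, beta, sigma, gamma, dt=1.0):
    """Advance a single-population SEIR model by one time step.

    s, e, i, r : people who are susceptible, exposed, infectious, recovered
    beta       : transmission rate (assumed, illustrative)
    sigma      : rate of moving from exposed to infectious (assumed)
    gamma      : rate of moving from infectious to recovered (assumed)
    """
    n = s + e + i + r
    new_exposed = beta * s * i / n * dt      # S -> E
    new_infectious = sigma * e * dt          # E -> I
    new_recovered = gamma * i * dt           # I -> R
    return (
        s - new_exposed,
        e + new_exposed - new_infectious,
        i + new_infectious - new_recovered,
        r + new_recovered,
    )


def simulate(state, days, beta, sigma, gamma):
    """Return the list of states for day 0 through day `days`."""
    history = [state]
    for _ in range(days):
        state = seir_step(*state, beta=beta, sigma=sigma, gamma=gamma)
        history.append(state)
    return history
```

For example, `simulate((990.0, 5.0, 5.0, 0.0), 30, beta=0.5, sigma=0.2, gamma=0.1)` returns 31 states whose components always add up to the initial 1,000 people. Lowering `beta` is one simple way to mimic the effect of a confinement measure in a sketch like this.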
And the last team has been working on a very large citizen survey called the COVID-19 Impact Survey, which I would like to invite you to participate in if you don't know about it. It has been really instrumental in helping us understand the situation and the perception of people throughout these 34 weeks of pandemic so far.

But still, the work of these teams is quite technical, and there is still a gap. So one of the secrets for a team like ours to be successful is to have some members from the decision-making side be part of the team. In our case, we have a director general who works for the president of the Valencian government, Ana Berenguer, who is part of our team, who comes to every meeting and who helps us identify priorities and also translate the results of our work into actionable items and into insights that will be useful for the policymakers.

It's not easy to create a team like this one, and that's why there aren't that many of them. Some of the challenges are related to a lack of capacity and a lack of a digital mindset in a lot of public administrations; difficulties in accessing data; lots of concerns over privacy and data protection, even if all the data is fully aggregated and anonymized, as in our case; and difficulties around the gap between where research is and where the operational projects are. Also, in the context of a pandemic, where you need to make decisions really quickly, there is a lack of preparedness for this kind of immediate action. If you are interested in knowing more about how mobile data can help in the context of public health actions, a team of scientists from different countries, including us, published this paper early in the pandemic, back in April. And because these teams are not so common, our team has been featured internationally in MSNBC and in Politico.
In terms of technical skills, everyone is a scientist working in one of the universities or research centers in the Valencian region, but depending on the area of work, they have different areas of specialization. The mobile data analysis team has a lot of expertise in data wrangling, spatio-temporal time series, visualization and statistics. The epidemiological team has a very strong background in modeling and computational modeling. And the prediction team and the survey team have a lot of background in statistics and machine learning. These are all the members of the team, so the work that I'll be presenting is the work of this large team working together.

Until the summer, until the new normality in Spain at the end of June, we met every day. Now we meet every week. The meetings are organized, we have a common code repository, and everyone signs NDAs and a code of ethics. We have very strict data access controls and we communicate via a Slack channel. This is an example of one of our daily meetings in April or May. If you want to know more about our work, we also have a website at the Generalitat, and you can read some of the reports that we've written for the different areas of work.

So what have we done? I'll just give you a quick summary of the main lines of work in the different subgroups. In terms of the human mobility data analysis, we were the autonomous region that was declared the pilot region, in collaboration with INE, the Spanish National Office of Statistics, for getting access to aggregated, anonymized human mobility data derived from the mobile network infrastructure, through a collaboration that INE had with the three largest telcos in Spain.
So through that collaboration, we were able to understand how mobility changed during confinement, whether the interventions worked or didn't work, what kinds of mobility were impacted, and also whether the reduction in mobility was enough to contain the pandemic. This is a visualization of the data. This is the Valencian region of Spain. It's in the east of Spain, on the Mediterranean coast.

One of the first results was related to the radius of gyration, which is the radius of the circle that contains most of the movements in a population. We found a very significant reduction in the radius of gyration from the moment the confinement measures started in March: the reduction was 65%, which was larger than the 54% average in Spain. This means that if the radius of gyration was 10 kilometers before confinement, during confinement it shrank to 3.5 kilometers.

Another analysis that we did was related to the stay-at-home campaign. Let me explain a little bit more about this data that we had access to, and that is now available on the website of the National Office of Statistics. The National Office of Statistics divides the space into cells, and each cell has to have at least 5,000 people living in it. So if there are municipalities with fewer than 5,000 people, they put together different municipalities until they reach 5,000 people. Municipalities with between 5,000 and 70,000 people form a single cell, and larger municipalities are split into several cells, for example the cities of Valencia, Castellón or Alicante. For each of these cells, we get an estimate of how many of the people in the cell sleep in that cell, and then of how many people who do not sleep in that cell (so it is not their home cell) spend more than two hours there, with that cell being the one where they spend the most time.
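The radius-of-gyration figures quoted above can be checked with a short sketch. The talk describes the metric informally; the formula below is the commonly used definition (root-mean-square distance of visited locations from their center of mass), which is an assumption about what the team computed. The function names and the planar kilometer coordinates are illustrative only.

```python
import math


def radius_of_gyration(points):
    """Root-mean-square distance of visited points from their centroid.

    points: list of (x_km, y_km) positions in a planar projection
    (hypothetical input format, chosen for illustration).
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    squared = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points)
    return math.sqrt(squared / n)


def apply_reduction(radius_km, reduction):
    """Radius left after a fractional reduction, e.g. 0.65 for 65%."""
    return radius_km * (1.0 - reduction)
```

With the numbers from the talk, `apply_reduction(10.0, 0.65)` gives approximately 3.5 kilometers, and the 54% average for Spain would leave approximately 4.6 kilometers from the same 10-kilometer starting point.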
The data is actually recycled data from a pilot project that the INE had with the telcos to compute labor mobility. The main purpose of the data was to measure how many people were moving for work, and that's why they measure which cell you spend the most time in during the day outside of your home cell. But we could repurpose that data in the context of the pandemic.

So when we look at what percentage of people didn't leave their home cell for more than two hours during the day (they were always in their cell, except maybe for going to another cell for less than two hours), we found a very significant reduction in mobility during the confinement. On working days, 88% of the people remained in their home cell, and on weekends, 92% of the people. You can access all the data and all our analysis on this website, where you can choose the time period and see how many people stayed home, how many people left home and so forth.

This is a visualization of the percentage of people that stayed in their area of residence from March 16th to April 27th. The two weeks in Spain where we didn't have labor mobility ran from March 30th until Monday, April 13th or 14th. The greener the map, the larger the percentage of people that stayed in their area of residence. As you can see, before the labor mobility confinement there were a lot of areas that were yellow and orange, which meant that 80% or 70% of people remained in their home cell, and then during the confinement everything became really green, meaning that a lot of people did stay in their home cell.
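As a rough sketch of how a stay-at-home percentage like the ones above could be derived from per-cell counts, assuming each cell reports its number of residents and how many of them spent more than two hours in another cell: the `CellDay` record, its field names and the sample counts are hypothetical and are not the INE data format or real data.

```python
from dataclasses import dataclass


@dataclass
class CellDay:
    cell_id: str
    residents: int       # people whose home cell this is
    residents_away: int  # residents who spent more than 2 hours in another cell


def stay_home_share(cells):
    """Fraction of residents, over all cells, who stayed in their home cell."""
    total = sum(c.residents for c in cells)
    away = sum(c.residents_away for c in cells)
    if total == 0:
        raise ValueError("no residents in the given cells")
    return (total - away) / total


# Made-up counts for one working day in two cells.
sample = [CellDay("A", 5000, 600), CellDay("B", 10000, 1200)]
share = stay_home_share(sample)
```

Here `share` comes out at about 0.88. The same function can be applied separately to working days and weekends, or to cells grouped by department of health, to compare periods and areas.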
We've also performed all the analyses at different spatial granularities, including the department of health granularity, which is the one that is meaningful for the Department of Health here in the Valencian region of Spain. There are 24 departments of health. Here we can see the percentage of people in their area of residence on working days versus weekends in the different departments of health in the Valencian region.

We also looked into labor mobility, because labor mobility is one of the biggest sources of mobility. We found that on average, during the confinement period between March 16th and April 27th, there were 60% fewer people outside of their area of residence during working hours when compared to a baseline day in November (we had access to a normal baseline day in November). So that was a big drop in labor mobility.

We also defined a variable called the activity variable, which measures the sum of the incoming flows and the outgoing flows in each of these different areas, and we also found very significant drops in activity. Here, the greener the region, the larger the drop in the level of activity when compared to a baseline day in November. On March 24th we still had labor mobility, so we see that most of the areas are light green or yellowish, which means a drop of between 40% and 50% in activity levels versus a baseline day in November. And what happened during the two weeks where we didn't have labor mobility? We found a very significant drop in activity levels: most of the map is green, meaning we had an average drop larger than 70%, and in some regions it was as high as 95%. Carrying out the same analysis for the 24 departments of health, we also observed a big drop in the levels of activity during the confinement, which is in yellow, versus the baseline in November, which is in orange.

Using this mobility data, we
can also run community detection algorithms to identify regions that are self-contained in terms of their mobility. We did that because we thought it could be helpful if selective confinement of certain regions were ever needed: it would be very helpful to know how connected different regions are, and which regions have a lot of internal mobility but not a lot of mobility to other regions. So using the same data, we ran our community detection algorithms and identified 14 communities, 14 large areas, most of which had pretty high levels of internal mobility, from 93% in this area to 40% in the least contained area. We thought that identifying these areas would be helpful if we ever had to make decisions about partial confinement. Of course, the Valencian region is very touristy, so we also analyzed the impact on tourism, and we found a very significant drop in the visibility and presence of phones from outside the Valencian region and from outside Spain during the confinement.

The second team is working on epidemiological models, and luckily the previous talk has been about this, so I don't really need to explain it in much detail. The main purpose of this work is to be able to answer questions such as: what is going to be the evolution of the pandemic, how many people are going to be infected, what is the impact of the different confinement measures, are the confinement measures enough to flatten the curve and to lower the growth in the number of infections, and so forth. We've been running two different types of models. The first is an SEIR metapopulation model, a compartmental model where you divide the population into different states: S for susceptible, E for exposed, I for infectious, and R for recovered or removed from the system. For COVID-19, the parameters that determine the probability of moving from being susceptible to being exposed, from
being exposed to being infectious, and from being infectious to being recovered are already defined, so we use the parameters from the literature. This model is given by these equations, which tell you how to update the populations at each time step. We fitted the model to the Valencian region and to the different provinces, and we've been running and updating it since then.

We've also adapted an agent-based epidemiological model to the region, and we've been running simulations in parallel with both models, the metapopulation compartmental model and the agent-based model. Each of the agents is one citizen of the Valencian region, so we have five million agents. They have their demographics and their behaviors, and based on what they do they might get infected or not, with the same parameters in terms of the probability of infection and so forth as the SEIR model. Running these two models, we were able to simulate different scenarios and see what the impact on the pandemic was, from doing nothing, to only having social distancing, to closing schools and so forth. According to our models, the impact of the confinement was really significant in reducing the number of infections and flattening the curve.

The third team has been working on building predictive models. We have a couple of websites where we publish our estimations and our predictions every day. We also ran a model to infer prevalence back in April, before there was any knowledge of how many people were really infected, because there weren't any tests available.

And finally, the last area is a citizen science project. Why would we do that if we have all these other data sources? The main reason is that there has been a lack of relevant data regarding very important elements of this pandemic. For example, the social contact behavior of people, the resilience of the population, the prevalence of symptoms, the availability of tests, the emotional impact
that the confinement and the pandemic are having on us, which individual protection measures we are taking, whether contact tracing is working or not. I mean, there are so many questions that we don't have regular data sources for. So we decided to ask people, with this really large survey called the COVID-19 Impact Survey. I encourage you to participate. You can participate every week, and it's anonymous.

We never thought it was going to become so big. We launched it on March 28, right before the two weeks of severe confinement in Spain without labor mobility, and it has 25 questions. Originally we launched it in Spanish and in English, but later on we expanded to many different countries thanks to the collaboration of a lot of people, associations, town halls and universities. The survey went viral: in the first 40 hours after launching it, we collected 140,000 answers from Spain, and since then we have collected more than 380,000 answers. We felt a big sense of responsibility, given how much people helped us and how everyone shared it with their contacts, so we felt really responsible for sharing the results of the analysis of the first wave of the survey. So we wrote this paper, which is freely available in JMIR, where we report the methodology and some of the main findings from analyzing the data from the end of March and the beginning of April.

As I say, we have a lot of answers, over 308,000 from Spain and another 70,000 or 80,000 from other countries in the world. We also have two websites with lots of visualizations where you can play with the data: this one, which shows the results up to date, and another one through the ELLIS foundation. This one uses ArcGIS and this one uses Tableau.

In the Valencian region we reached the peak of the infection at the beginning of April. So from the beginning of April there were some important questions that we wanted to answer, and one of the most
important questions was: has there been herd immunity? Is there going to be a second wave? How many people are really infected? At the time there were not enough tests, and there were a lot of asymptomatic and mildly symptomatic people who were not diagnosed, so we really had no idea how many people were infected. So we decided to infer prevalence using three different methods.

The first method used our survey. We built a generalized linear model that used the answers to three questions from the survey: the question on prevalence of symptoms, the question on whether a family member is infected or not, and gender and age. Using that, we ran our model for all of Spain, all the regions in Spain, and we inferred a prevalence that was quite aligned with the prevalence that was later determined by the Carlos III Health Institute, a month and a half later. So back at the beginning of April we had already determined that we were very far away from herd immunity, because the average for Spain was around 5% prevalence.

We used a second method based on the deaths: how many infected individuals do you need to have to explain the number of deaths that were observed? Using the excess deaths and the deaths, we estimated that we had around 5% prevalence in Spain and around 2.37% prevalence in the Valencian region, which was also very aligned with the later results.

And finally, the last method used our epidemiological models. Our models have an underlying number of infected individuals that is much, much larger than the observed number of infected individuals. Here, the red curve is the number of infected individuals according to the model, and the blue curve is the reported individuals. We estimate every day the detection ratio, that is, how many of the infected are actually detected because they are tested and reported. So, using the underlying number of infected, we
were estimating that around 2% of the Valencian population would be infected. So our answer was no, we are very far away from herd immunity, and there is going to be a second wave as soon as we lift the measures, unless there is a vaccine or an efficient treatment. In fact, there are other simulations where the curve started growing immediately as soon as we lifted the measures.

Another analysis that we did was of the impact of contact tracing. Using our agent-based model, we ran different simulations where different percentages of the population were contact traced, from 100% of the people being contact traced, which is this really light blue curve with basically barely a second wave, down to 10% or 0% of the people being contact traced. This assumes that everyone who is infectious can isolate themselves, which we know from our survey is not true, so this would be a lower-bound estimate of the numbers.

We have also been doing a lot of analysis of the answers from the survey, and I just wanted to share with you some of the more interesting ones. One of the results that really surprised us was the emotional impact of the pandemic, because since the very beginning the most impacted age group has been the youth. Here we show the different emotional impacts and the abusive use of technology, drugs and alcohol by age group, where blue is the youth, orange is the middle-aged people and gray is the older people. We see that the levels of stress, anxiety, abusive use of technology, sadness and even loneliness are really high among the youth. So one of our messages and recommendations since the beginning of April has been to deploy programs for the youth. When we look at the impact by sex, we also find that women are consistently more psychologically impacted than men in every aspect except for drug abuse and alcohol abuse. So we've also been sending
messages that women report the highest levels of anxiety, sadness, stress and so forth.

Another interesting finding is related to the willingness to stay in confinement. We ask people for how long they would stay in confinement, and the possible answers are zero days, one week, two weeks, one month, three months or six months. We found that at the end of March there was barely anyone who would say zero days. It was the beginning of the pandemic and no one had been confined so far, so the most popular answer was one month. Then, as the weeks went by, the percentage of people saying one month went down and the percentage of people saying zero days went up a lot, but the percentage of people saying six months also went up. So we went from having a unimodal distribution around one month to having a bimodal distribution, with a significant percentage of people reporting that they could be confined for only two weeks or less, and another significant percentage saying that it could be three months or more. That has been a surprising finding.

When we look at the key factors that determine whether someone will be willing to be confined or not, we find that the most important factor is actually the economic impact. People who report economic impact are six times more likely to say that they will not be able to be confined than people who do not report economic impact. This is followed by psychological impact, which is also a very big driver of saying that we cannot stay in confinement any longer.

When we ask about the perception of the government measures (this shows the evolution every month until now), we have observed a very interesting behavior. Until the new normality at the end of June, more or less 40% of the people were saying that they wanted more measures and 40% of the people were saying that the measures that the government was taking were enough. And then
the new normality came at the end of June, and the percentage of people demanding more measures started growing monotonically, while the percentage of people who considered that the measures were enough fell in the same proportion, to the point that now the people who consider that the measures are enough are a smaller percentage than the people who don't know how to evaluate the measures. Again, when we look at which factors drive the perception of the measures, we find that the emotional impact and the economic impact are the biggest drivers of considering that the measures are too much.

When we look at the economic impact by profession, we find results very similar to what other studies have found, where hospitality is the most affected sector, together with entertainment, domestic services, construction, and retail and commercial activities.

A very worrisome finding from the survey is that a very large percentage of the population reports that they would not be able to confine themselves if they had to. That percentage has been growing over time, and now we are at about 50%. When we look at why people cannot self-isolate, we find significant differences by age and by gender. The good news is that 70% of older people, people aged 60 and older, report that they would be able to confine themselves. That's very good news because they are the most vulnerable demographic group. But then we have some worrisome findings. For example, the youth are the ones that report the largest levels of fear of stigmatization and psychological impossibility as the main reasons why they wouldn't be able to confine themselves, after home sharing, which is the main reason for everyone. Another interesting finding is for women aged 30 to 59, where the percentage who report that they wouldn't be able to confine themselves because they take care of children is significantly larger than for men in the same age bracket.

We've also been asking
about the perception of safety of different activities. Here, perhaps the most interesting finding is that the activity considered to be the safest is individual sports, followed by buying in small shops, where there is an important age difference between the elderly and the rest of the groups, followed by going to locations where you need to ask for an appointment, like going to the hairdresser or some other appointment-based system. The ones that every week have been considered the least safe are flying by plane, where we also find a very significant difference by age, with the young people thinking it is safer than the older people do, and going to church, where the results are the opposite: the older people think it is safer than the young people do.

Some of the more worrisome results are related to schools, where right now about a third of the population thinks that going to school is safe, that it entails a low risk of getting COVID-19, and to the hospital, where only about 50% of the people think that going to the hospital is safe. We think that's a very low percentage, and possibly a lot of people are not going to the hospital because they are scared of getting COVID-19, but they actually probably should go. In terms of gender, we don't find very significant differences, but in general women tend to be more cautious than men. This is the evolution of the perception of the safety of schools, which has increased a lot over time: in May, only 7% of women and 10% of men thought that going to school was safe. So it has increased a lot, but it's still pretty low.
We also ask about individual protection measures: do people wear masks, do they disinfect their hands, are they doing physical distancing, are they limiting their contacts, do they ventilate, and so forth. Here the main finding is basically that women do a lot more than men. The youth are pretty good at wearing masks and disinfecting, but not so good at limiting their social contacts. The youth would be the light blue and the light pink versus the other age groups.

But then we have a very peculiar finding about the vaccine. We ask people the same question that the Spanish sociological institute asked, which is whether people would get the vaccine when it was available. We find a very significant gender difference, where a much larger percentage of men than women say that they would get the vaccine. We also find age differences, where the group that is the most likely to get the vaccine is the older men.

There has been a lot of discussion over the past few months on whether this pandemic is a pandemic or a syndemic, because it is disproportionately affecting different groups: immigrants, people who are poor, people who have some kind of disability, women and so forth. To shed light on that, we look at the behaviors and the economic and psychological impact of those who report being positive in the survey versus those who report being negative. We do find significant differences in their economic impact and in their psychological impact, and not so much in their behavior. I mean, they all report wearing masks and disinfecting hands. But we find, for example, that 18% of the ones who report being positive tell us that they lost part of their savings or all of their savings, versus [inaudible] for the ones who test negative. Or, for example, we find that 10% of the ones who test positive say that they lost their job, versus only 7% for the ones who test negative. Or 13% of the positives say that they have fear of stigmatization because of
COVID-19, versus only 6% of the negatives.

We've also been looking at the temporal evolution of the number of close contacts, because this is a very important figure: the reproduction number, the famous R or Rt, is very correlated with the number of close contacts. We've been asking about the number of close contacts since before the new normality, since the beginning of June, and we indeed observe an increase in the number of close contacts as we reach the new normality. The blue bar would be the people who say that they have zero contacts outside of their home, and the dark blue would be the people with 50 or more contacts outside of their home in one week. We find that when we were still in partial confinement, 42% of the people were reporting that they had two or fewer close contacts outside of their home in one week. That percentage went down to around 23% with the new normality, and it has remained like this. But now, given that we are in the second wave and the number of infections and the incidence are pretty high in Spain, we do observe that the number of close contacts has decreased, and that is good news.

When we look at the origin of the infection, we find that most people know the origin of their infection. Around 38% of people don't know, but the majority do. The main origins of infection are family members, which would be 21% household members plus 10% other relatives, so 31% in total, followed by colleagues at work.

And finally, when we look at contact tracing, we find an interesting gender difference. We ask the people who had been in close contact with an infected individual whether any contact tracer had called them, and we find that a lot more women than men say that a contact tracer called them. Our hypothesis is that women are more likely to answer the phone when an unknown number calls them, which is the contact tracer, but this is just a hypothesis that
we will have to corroborate.

So what have I learned after all these months, 8 months working on this? The main thing that we have learned is that a pandemic is not just a public health problem, it is a societal problem, and therefore solutions cannot be simple. They have to be holistic, to account for all the different dimensions. In particular, I think there are three areas that we could really work on to help us a lot in finding better and more efficient solutions for this pandemic.

The first one is data. There's been a lot of talk about data, about the lack of data on this pandemic, and I have lived this in my own skin for the last 8 months. There is a tremendous lack of high-quality data that is captured, updated, and shared in a systematic and regular way. This is absolutely necessary, because the data is a reflection of reality. It will enable us to know where we are, how we got where we got, what's working, what's not working, and so forth.

But we also need to invest in people, in the right resources: the right contact tracers, researchers, teachers, doctors, and social personnel, but also people enhanced with the right technology and the right data. There is no use in having a lot of contact tracers if they are using software from the 1990s and it takes half an hour to enter every contact. So we really need to invest in the people, but also in the technology and the capabilities that enable them to do a good job.

And finally, why would we do all of this? Because we want to identify weaknesses in the system, we want to identify areas for improvement, and we want to design public policies that would actually tackle those weaknesses. For example, if we know that 50% of people cannot self-isolate, let's design policies to help people self-isolate so they won't go and infect other people. If we know that the youth are really impacted by the pandemic, let's design programs for the youth, because they're really suffering.

And finally, in this recent paper that we just
published last week, we propose six recommendations to really make the best of this pandemic. I think the first recommendation is to really think and act boldly now. Let's take this opportunity to build back a better society, a society that is fairer, more evidence-driven, and more digitally savvy. We also need to make a very clear assessment of the technology and the data that we are using and make sure they are fit for purpose. There is a lot of concern about over-technification and over-datafication of the world, an excessive collection of data with the excuse of the pandemic that is going to become the new reality, and this is probably not a reality that we want. We also need to always place people at the center and people in the loop. We really need to invest in data literacy: the lack of capabilities, the lack of knowledge, the lack of skills in citizens in general, but particularly in public administrations, is absolutely terrible. It's also very important to test and scale sustainable business models, because a lot of the valuable data is privately held data. And then I think we should think of regulation as an enabler, and be creative in thinking about how regulation can help us accelerate the achievement of this better world that we all want to build with the analysis of the data and the technologies that we have.

So thank you very much, and again, I encourage you to answer the survey if you haven't answered yet. Thank you.

So thank you, Nuria, for this amazing presentation.
I think there's a lot of data, a lot of information. We have a very technical question asking about the sample of your research: how did you take this sample, the sample size, and all this stuff. I don't know if it's information that you can share, or maybe the person asking the question can find the answer on the web or in some other place.

Yeah, so the sample size for the survey... I mean, I can actually show you the survey right now. We have 380,000 answers worldwide and 308,000 answers from Spain. If you go to this link, you can actually see all the data. And then if you go to the paper, which is publicly available, open science, the paper has the full description of the methodology. This paper is in JMIR. This is actually the preprint version, but if you click on the name of the paper you get the paper, and the paper also gives access to the data that we report on. So the sample size is huge. Still, of course, it is a voluntary online survey, so it does have some biases, and we use reweighting to compensate for the biases and to make the distribution of our data match the census data in terms of gender, age, geographical region of Spain, and also profession.

Okay, so thank you, Nuria, and take care. Thank you very much. Thank you.
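
Editor's note: the reweighting mentioned in the last answer can be illustrated with a minimal post-stratification sketch. This is not the survey team's code, and the talk does not say which reweighting method they used; the function names, the cells, the census shares, and the answers below are all hypothetical, chosen only to show how weights can pull a skewed voluntary sample toward known population shares.

```python
from collections import Counter


def poststratification_weights(cells, population_shares):
    """Return one weight per respondent.

    cells: the demographic cell of each respondent (e.g. gender and age group).
    population_shares: hypothetical census share of each cell, summing to 1.
    Each weight is (population share) / (sample share) of the respondent's cell,
    so over-represented cells are weighted down and under-represented ones up.
    """
    counts = Counter(cells)
    n = len(cells)
    return [population_shares[cell] / (counts[cell] / n) for cell in cells]


def weighted_mean(values, weights):
    """Weighted average of numeric answers (e.g. 1 = yes, 0 = no)."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical sample: women are over-represented (7 of 10 respondents).
cells = ["woman"] * 7 + ["man"] * 3
# Hypothetical census shares, not real figures.
census = {"woman": 0.5, "man": 0.5}
# Hypothetical yes/no answers to a single survey question.
answers = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]

weights = poststratification_weights(cells, census)
print("unweighted share of yes:", sum(answers) / len(answers))
print("reweighted share of yes:", weighted_mean(answers, weights))
```

In this toy sample the reweighted share is lower than the raw one, because the over-represented group answers yes more often. Matching several variables at once (gender, age, region, profession) would need either joint cells or an iterative method such as raking, which this sketch does not cover.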