Thank you. Thank you all so much for the introduction, and thank you, Big Data Spain, for the invitation. I'm going to talk about something that perhaps is less known in the context of big data, which is how we can leverage big data to have positive social impact. If we change to the slides, please. There we go. In particular, I'm going to talk about a specific type of big data, which is the data that is collected by the mobile network infrastructure.

So let's first look at the context, which I'm sure you are all very aware of. We live in a world where there are more mobile phones than people. Worldwide there are over 5,000 million mobile phone subscriptions, which corresponds to approximately 70% of the world's population. The mobile phone is the most widely adopted piece of technology in our history, with penetration rates between 89% and 120%. It is bigger than a hundred percent because many of us, like me, have more than one mobile phone.

But there are two very important facts for the purpose of this talk. The first one is that we love our phones. They are always with us, and we actually spend more time with our phones than with anyone or anything else in our lives. The second one is that this is a global phenomenon, happening both in developing countries and in developed countries, and the fact that it is global will be very important for having positive social impact.

In fact, in 2013 MIT Technology Review declared the idea of using the data collected by the mobile network infrastructure from phones (not just smartphones, but any phones) as one of the breakthrough technologies, because it could enable us to understand human behavior at a level that we have never been able to before. And this has given rise to a new area called computational social sciences, which is the result of merging social sciences and computer science, because for the first time in our history we can model large-scale human behavior, we can quantify that behavior, and we can also verify, or not, social theories about us as a
species, as a society.

That's the technological context, but there is also the development context. In 2015, as many of you might know, the world was reaching the end of the objectives that the United Nations had set for the world, the objectives for the millennium, and they were thinking about which new objectives to set. In that process, the United Nations realized that there was something called big data, and they called for a data revolution. They realized that the availability and existence of all this data about the world could actually help us achieve the goals that the United Nations wanted to set. So they commissioned a report from a group of experts, one of whom was Sandy Pentland, my PhD advisor. It is called the data revolution report and is publicly available, and in that report they outlined how data could help us, basically, make the world a better place.

So in 2016, when the 17 Sustainable Development Goals were defined, in parallel to the definition of the goals there has been a movement on how data can help us in two ways with respect to the goals. The first way, which is the most obvious and easiest to understand, is how data can help us measure whether we are achieving the goals. As you see, the goals are very ambitious. For example, number one is to end poverty. And one of the criticisms that the United Nations has had in the past when setting these types of goals is that we cannot really measure them. If you cannot measure them, how do you know if you actually achieved the goal? Measuring poverty using traditional means is very expensive, and it also doesn't really scale. So thanks to the existence of data, maybe we can better measure and track whether we are achieving the goals.

But there is a second, less obvious way where data can help. And of course, when I say data, I mean data plus all the algorithms necessary to make sense of the data. Otherwise the data in itself has, you know, really almost no value.
So the second area is how, through the analysis of the data, we can actually achieve the goals. Can we end poverty better just because we have this data? Can we eliminate hunger, or can we ensure that there is more gender equality, et cetera, through the analysis of the data? I'll give you some examples today in my talk.

For the past couple of years, for the first time in its history, the United Nations has organized a very big conference called the United Nations World Data Forum. I don't know if any of you has been there or knows about it. The first one was in South Africa last year, and the second one has just taken place in Dubai, I think it was last week. This last one gathered, I think, over 2,000 people from a hundred different countries, everyone talking about how we can leverage data to help us achieve the 17 Sustainable Development Goals. So it has become a very important track within the development area.

In my particular case, one of the questions that I have been trying to answer in my research, and with the teams that I work with, is how we can leverage these mobile digital footprints to understand aspects of human behavior that could help us improve the world, that could help us achieve these 17 Sustainable Development Goals. So how do we do this? Before I explain to you the areas where we can have impact,
I thought I could share with you the type of data that we use. This is the most basic type of data collected by the mobile network infrastructure, but because it is the simplest, it is also the most available across all the different countries, and particularly in developing economies. The type of data is CDRs, which I don't know if you are familiar with.

So this is how mobile phones see the world. As you know, the phones are connected to the mobile network infrastructure. The mobile network infrastructure consists of a number of cell towers, called base transceiver stations, and each has a certain area of coverage, which is usually a three-lobed area that is difficult to model. So typically what we do is an approximation: we compute a Voronoi tessellation in such a way that at the center of each of the cells there is a cell tower, and we assume that the area of coverage of each cell tower is its polygon in the Voronoi tessellation. Some other times we do another simplification and just make a grid, but it doesn't matter in the end. The idea is that you have these cell towers in space, and they give coverage to a certain geographical region. You don't know specifically where in that region any phone is, but all the phones that are connected to the same cell tower would be, for example, in the orange polygon down there, et cetera.

So, as I said, the type of data that we use is called CDRs. CDRs are collected for billing purposes pretty much,
I would say, in every country. They are event-driven data, so they only exist when the phone makes or receives a phone call, sends or receives an SMS, or connects to the data network. For most purposes we typically use voice CDRs and SMS CDRs, that is, the CDRs of the voice events and the SMS events. Obviously, for privacy reasons, all personal data is encrypted, and the data is usually analyzed in an aggregate form.

CDRs have many fields, but for the types of projects that we do we typically use the timestamp of when the event took place, the encrypted originating number, the encrypted destination number, and the cell tower ID of the originating number. If both phones are on the same mobile operator, then you also have the cell tower ID of the destination number, but if they are on different operators you obviously don't know it. And then there is the duration of the phone call. With SMS it is the same, but there is no duration.

This data is many times referred to as metadata because in reality there's no content. There's no actual data.
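To make the record layout and the Voronoi approximation described above concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the format of any real operator: the class name `CDR`, its field names, and the helpers `nearest_tower` and `events_per_tower_hour` are all hypothetical, and tower positions are treated as flat x/y coordinates rather than latitude and longitude. Assigning a point to its nearest tower is equivalent to asking which Voronoi polygon it falls in.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from math import hypot
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class CDR:
    """One voice or SMS event (hypothetical field names)."""
    timestamp: datetime           # when the event took place
    caller_hash: str              # encrypted originating number
    callee_hash: str              # encrypted destination number
    caller_tower: str             # cell tower ID of the originating number
    callee_tower: Optional[str]   # known only if both phones share an operator
    duration_s: Optional[int]     # call duration in seconds; None for SMS


def nearest_tower(x: float, y: float,
                  towers: Dict[str, Tuple[float, float]]) -> str:
    """Return the tower whose Voronoi cell contains (x, y).

    A point lies in a tower's Voronoi polygon exactly when that tower
    is the closest one, so a nearest-neighbour search is enough here.
    Assumes planar coordinates, which is a simplification.
    """
    return min(towers,
               key=lambda tid: hypot(x - towers[tid][0], y - towers[tid][1]))


def events_per_tower_hour(cdrs: Iterable[CDR]) -> Counter:
    """Aggregate events by (originating tower, hour), dropping identities."""
    return Counter(
        (c.caller_tower,
         c.timestamp.replace(minute=0, second=0, microsecond=0))
        for c in cdrs
    )
```

The aggregation step keeps only tower and hour, which mirrors the point that the data is analyzed in aggregate form rather than per person.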
There are obviously no conversations and no content of the SMS. It's just data about the fact that there was an event.

When you aggregate this type of data in space and in time, you are able to model emergent elements of human behavior. In particular, we typically compute three different types of variables. We compute consumption variables that measure how intensely the network is used, for example the total number of phone calls, the total number of incoming phone calls, outgoing phone calls, et cetera. We can build what is called the call graph, which is the graph of all the phones that are calling each other, and once you have a graph you can compute any graph characterization variables to characterize it. And then we can also compute some aggregate, rough, so to say, mobility variables, like what is called the radius of gyration, which is the radius of the circumference that covers most of the cell towers used by the different phones, or the distance traveled, or the most popular antennas, and so forth.

I'm just going to show you a couple of videos so you get a sense of the type of data. This video shows the number of phone calls connected to the cell towers in a neighborhood in London, starting at, I think it was, like 3 a.m.
or something, and going through the morning. And as you see, in the morning there are a lot of phone calls: the bigger the circle, the higher the number of phone calls. Just by looking at this we have a sense of human activity. So if there had been, for example, a natural disaster at that time, just by looking at the number of phone calls connected to the different cell towers we can have a proxy of roughly how many people might have been affected by the natural disaster.

When we look at the mobility of the phones, we can have aggregate mobility measures. This is an example of a collaboration with the Wales transportation agency to understand traffic in Wales and also to estimate CO2 emissions. Again, it is very high level, very aggregate; it seems not that detailed. But when the alternative is having nothing, as it is in many developing countries, this data becomes very valuable.

So the first take-home point from my talk is that this type of data, mobile data, brings tremendous value for having positive social impact in a number of areas, and I'll share with you the areas where, in the experience of my research and the research of other teams, we have found that this data brings value.

The first one is, as I mentioned with the first video, natural disasters, humanitarian crises, and climate change control. When a disaster takes place, and as you probably know, because of climate change the number of extreme weather events is increasing, so there are more and more natural disasters happening, we can use this data to very quickly assess roughly how many people might have been affected by the natural disaster, where these people might be, and also whether there are displacements of the population to other areas and where those areas are.

Another area that I have also worked on is financial inclusion and socioeconomic development. We have found that this data can be a proxy for socioeconomic status inference, and socioeconomic status
inference is important in developing economies because, first, the national statistics and the censuses tend to be very old. They could be 40 years old, like in some countries in Africa. But also, socioeconomic status is usually a proxy for access to education, access to drinking water, or access to sanitation. So if we are able to infer socioeconomic status on an ongoing basis, we can better understand where we can invest money and where we can deploy interventions to help certain areas.

Another area, which is very big, with lots of examples, is urban studies: understanding cities and understanding, you know, the land use in cities. For example, we did a project on predicting crime hotspots in cities using this type of data.

Transportation is very obvious. I showed you the video with the highways. There are a lot of projects also on understanding transportation, predicting flows, predicting traffic, predicting CO2 emissions, and so forth.

One area that I'm particularly passionate about is public health, and in particular pandemics and infectious diseases. The World Health Organization has declared that this century there will probably be a big pandemic that very likely will kill a lot of people. One of the reasons why there is concern is that we have more mobility than ever, and an infectious disease, human-transmitted or even mosquito-transmitted, doesn't spread if people don't move. So understanding mobility is very important, and as I showed in one of the videos, we can start to understand human mobility. If we combine that with epidemiological models, we can have more accurate models of the transmission of infectious diseases.

There are some examples on energy, on how this data can be a proxy for energy consumption and might help us predict peaks in the consumption of energy. There is a world movement happening in most of the national statistics offices in the world on how we can leverage these new data sources to
compute cheaper, more scalable, and more recent censuses everywhere. And there are even some examples for agriculture, on how we can use this data to better predict the yield of crops, for example.

In our case, we have projects on transportation, as I showed in the video, and on national statistics and population studies. Vodafone owns M-Pesa, which is the leading mobile money transfer system in the world, so we are also doing projects on financial inclusion and seeing how we can help developing economies prosper, both through the use of mobile technology and through the use of machine learning techniques on data. We've also studied social capital. Social capital is what measures the resilience of a community, or what creates the fabric that makes communities prosper, and we're trying to understand how we can infer social capital from this type of data. And then finally, we're also doing a number of projects on public health, both in developed and in developing economies. In particular, we have a collaboration on malaria with the Malaria Consortium and some of the best malaria researchers in the world, to see how we can use this data to help tackle malaria. I think it would be amazing if we could contribute to the eradication of malaria in the world. I'll have time to explain more details.

So until now I've explained to you what we can do, the type of data, and the advantages. I think there are three key areas in terms of the advantages of using this data. The first one is the cost and the effort. The particular data that I told you about, CDRs, is collected anyway, so it doesn't seem that it would be very costly to use it, because it's already being collected. The second one is the temporal and the spatial granularities. I said that the temporal and spatial granularities were not very good, and it's true. We are not talking about GPS-level precision, and the same is true temporally.
This is event-driven data, so it only exists if the phones are actively used. But when we compare it to having nothing, or to having information that is 40 years old or even 10 years old, this data becomes very valuable. And then, thirdly, we could argue that it is even more accurate and definitely scales better than the traditional methods, because there is no human in the loop.

There have been a number of examples in the past five years of how we could use this data for social good. But why is it that we are not using it? If there were, you know, a pandemic right now, or a natural disaster, more likely than not we would not be using this data to make better decisions. So why is that? For the rest of my presentation I wanted to share with you my personal view, based on all these years of experience, on why we are not really using this, and hopefully this will inspire some of you to work in this area and try to tackle some of the barriers that I'm going to be talking about.

Some of these barriers became evident, not in the last Ebola outbreak, because the last one was just this summer, but in the previous one that took place in West Africa, where there was a genuine attempt to use this data, but it became very, very difficult for a variety of reasons, which I'll explain next.

The first set of barriers is internal. A lot of this data is privately held, but until now there hasn't really been a department in the companies that hold this data called the big data or AI for social good department, where, you know, people work on these types of projects. So these projects exist mainly because there are passionate individuals in these organizations who believe in this and try to push for it. But of course, if those individuals leave those companies and go somewhere else, then what happens, you know, with all these projects?
So I think this is changing. I think more and more companies are creating departments or teams that are working on this, but historically this hasn't been the case.

The second set of challenges is technical, which, for a researcher in the field, is of course nice because it gives you a lot of ideas of things to work on. There are many, many technical challenges, and I'm mainly going to focus on the more data analytics types of challenges.

The first challenge is the representativeness of the data. Usually the data is just a window into the complexity of reality, and in most cases you only have data from one service, say from one mobile operator or from one internet service. But of course not everyone is a customer of those services, so you need to understand how well your data might generalize to the overall population. This leads to the fact that the data might have biases, and as we very well know, if we don't take provisions for these biases, whatever machine learning model we train with that data will learn the biases.

All these projects require combining data from multiple sources, because all the use cases that I explained have their own data, be it malaria prevalence data, transportation data, weather data, et cetera. So you have to combine data from the technological sources with the human and social sources, which makes the problems quite complex, because the data obviously has not just different formats but different sampling rates, different characteristics, and different levels of noise. If you're lucky they are for the same time period; sometimes they are not even from the same time period.

For many of the projects there is no ground truth because, as I said, the benchmark might be information from 20 years ago, which you probably shouldn't use as ground truth. So many times the only way to really validate these projects is to actually do an intervention, to make a
decision based on the analysis of the data and then see what happens. But of course doing interventions in the real world is not easy, and you can't just be trying things. It's not a lab. So this makes it difficult to assess the impact and the value of some of the projects.

Of course, we cannot confuse correlation with causality. I think you all know this.

For many of the use cases we need to be able to interpret the results. In fact, the value might not be in the performance of the algorithm itself. The value is in the understanding of what matters to model a particular problem. So we need interpretable models, and as you know, for many tasks we cannot use, or it's difficult to use, deep-learning-based models if they are very complex, because they are less interpretable than other types of models.

It is very rare to see an example of using this type of data in real time, but there are many use cases where you need real-time analysis, so moving to real time is still a big challenge. And then there is also a challenge, which I'll explain later on, regarding how to make this data available to different partners, because most of these projects involve people working in different organizations. There is a literature on this, and these are some of the recent papers that we wrote describing some of these projects.

There are also social and skills barriers. There aren't a lot of clear guidelines on how to use data for humanitarian purposes, because this use of data has not been done before, so we really need to work on that and figure out what the guidelines are.

There is a big risk of having negative unintended social consequences of very well-intentioned projects, and I can give you an example. Imagine there's a developing country that doesn't have a democratic government, and you are very interested in modeling the behavior and the mobility of a certain ethnic group which happens to be persecuted by this non-democratic
government. On the positive side, you can really help humanitarian institutions help that group. But if the government sees exactly where these people are, and they are not their friends, you know, they could use that information to target them. So in all the projects we need to think: OK, this is the real world, these are real people's lives, and we need to try to anticipate any potential negative consequences.

And the social barrier that I think is the biggest, from my perspective, is the divide that there is right now in skills. We are very few people, and probably all of you are part of these very few, who know the possibilities of what can be done with this type of data and who can actually make it happen. But there's a vast majority of people, including decision makers or politicians, who really don't have the expertise to be able to do this. If we want to make this a reality, we really need to invest in education, at a professional level and for citizens, but also for decision makers and politicians.

There are obviously privacy risks that need to be minimized, even though the data, as I showed you, is largely aggregated and pseudonymized at a minimum, or fully anonymized. But I think there are two pillars that are very important, which in the European case the GDPR also considers very important and has in the regulation: control and transparency. So, giving people full transparency and control over their data.

Security is obviously very important, but in my experience, in all the organizations that I have worked in, the data is very secure. You obviously cannot ever say that something is 100% secure, but I think they use state-of-the-art security systems.
I highly recommend that anyone working on this should sign and comply with an ethical code of conduct. In all the organizations that I have worked in, I have signed, complied with, and contributed to the design of ethical codes of conduct, to make sure that whatever analysis is done is done in an ethical way.

In this context there is a big movement, and I don't know if there is any talk in this conference on this, but if there isn't, it could be very interesting to add for next year. There is a big movement not only in the computer science community but also in other communities. For example, I just came from a conference called Robotiuris, which is organized by lawyers, and they are very interested in AI and in algorithmic decision-making from a legal perspective, and they are working a lot on ethics. In computer science we're also working a lot on the ethics of algorithmic decision-making and AI. So there are many examples, and in fact I think the point now is that we have too many examples, and maybe we should converge on which principles we really want to follow.

So these are some examples. The ones on the right are from a paper in PLOS Computational Biology from last year that outlines 10 basic principles for dealing with big data. And if you don't know about it, I recommend you read this letter. I really like it. It's a manifesto called the Copenhagen Letter. Does anyone know it?
No? OK, so I encourage you to look for it. It has five basic principles that I really like and that I really follow. The first one is that technology should not be above us. Technology should be there to help us, not the other way around. The second one is that innovation doesn't equal progress. Not every innovation represents progress, and what we should work for is progress, for how to use technology to help us all make progress, and not necessarily do innovation for the sake of innovation when it maybe has a negative impact on progress. Trust is very important, and you develop trust through transparency. So designing for scrutiny is very important. And then there's a very interesting concept. I'm sure you have all heard of human-centric design, but now we are moving to humanity-centric design. It's not just looking at the individual and basing the design on people. It's actually basing the design on humanity, because we are all connected, and the projects that I presented affect countries, or affect the world as a whole, not just a single individual.

So, bringing together all the different ethical principles, I like to summarize them in this acronym, FATEN: algorithms that would be fair... I don't have the time to explain each of these concepts. That's a different talk, but at least I wanted to throw the concepts at you. So, fair, so they don't discriminate. Accountable.
So there is clear responsibility. But also autonomy: they preserve our autonomy. Autonomy is a key pillar in Western ethics. Autonomy means that we decide what we think and what we do. However, as you very well know, I'm not sure we decide what we think or what we do today, because there is such an intertwined relationship with algorithms that are modeling us that a big part of our decisions are modulated by algorithms. So I think autonomy should be at the core.

Transparent, as I explained. Then algorithms should be beneficent. This implies that they should strive for progress; that they should have some provisions for veracity (I'm sure there might be some talks here on deepfakes and fake content; if not, that's another interesting topic for the future); that they should be sustainable; promoting diversity, diversity in terms of teams and also in terms of gender and other demographic factors; and investing in education. And then the N is that they should also be non-maleficent. So they should preserve privacy, they should be reliable, and there should be some guarantees of reproducibility, security, and prudence.

The fifth area is the way of working. As I said, these projects require people from different institutions to work together. I'm not an expert in malaria. I'm not an expert in emergency crises, et cetera. So we need to partner with experts in the field, and this means that many times you have to make agreements between private and public institutions. I don't know if there are any lawyers in the room, but when you get the lawyers to talk to each other to sign the agreements, it always takes a long time. So this is something that we need to work on. Also, the agendas of these different institutions might not necessarily align, so it takes time to solve these issues.

And then there is a very important factor that we cannot forget, which is the financial factor.
We need to understand who will be paying for this, we need to understand what an adequate business model is, and we need to start building from examples. There have been a number of successful pilots, but how you move from a pilot to a sustainable product that would be used at a large scale is still a challenge to overcome.

So what can we do? To sum up, I think the key guiding principles for any project in this area are that it obviously needs to be legally compliant. You cannot break the law. But it not only needs to be legally compliant, it needs to be socially acceptable. It needs to be developed within the ethical framework of the societies where it is going to be deployed. It needs to be commercially and financially viable, otherwise it's not going to move beyond being a pilot. And of course it has to be technically feasible. It has to actually work technically if we want any of these projects to be sustainable over time.

As I told you at the beginning, this is a world movement; there are increasingly more and more initiatives happening and more and more people working on this. I want to share with you a couple of them. The United Nations has the Global Partnership for Sustainable Development Data, which a few hundred institutions have joined, and it's just a statement of intent that you really support using data for the Sustainable Development Goals. In the last United Nations World Data Forum, which as I said, I don't remember if it was last week or two weeks ago, one outcome from the forum was this declaration, the Dubai Declaration, to invest more in using data for sustainable development. The GSMA launched in February of last year another action called the GSMA mobile data for social good, where 20 operators right now, including Vodafone, are working on how we can use this data for positive social impact in two main use cases, natural disasters and public health. And then the European Commission just called for a high-level group of experts
on business-to-government data sharing, to see what kind of frameworks we need to develop to tackle one of the barriers that I mentioned, in terms of access to the data, even though I think the biggest barrier is the access to the knowledge and to the expertise more than the access to the data.

There is this initiative that I wanted to share with you. Data-Pop Alliance is a nonprofit organization where I'm chief data scientist, and our goal is to leverage big data for positive social impact. One of the biggest projects that Data-Pop is doing is OPAL, where the idea is to bring the algorithms to the data instead of the data to the algorithms, and to enable authorized third parties to run authorized queries against proprietary data in a privacy-preserving way. So if there were a natural disaster and a certain NGO wanted to have a sense of how many people have been affected by it, the NGO could just send a query asking for the number of people, and it would answer with approximately the number of people. There are a couple of pilots going on in the world.

There are lots of publications, if you are interested in the topic. You can also email me. This is a publication that is going to come out in Nature Scientific Data, I think next month. We are a lot of authors; as you see, the author list is almost as long as the paper. In that paper we propose a privacy-conscientious way to use mobile data for social good, and it's very relevant to this talk.

So I'll just finish with a question. I am working on how to make this a reality, and there are of course many other people working on this. So I guess the end of my talk is an invitation to you, if you would like to join in this revolution on how to leverage big data and AI to have positive social impact in the world. Thank you.
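As an appendix-style illustration of the OPAL idea mentioned in the talk (bring the algorithm to the data, and let authorized parties run only authorized queries that return approximate answers), here is a minimal Python sketch. It does not describe how OPAL is actually implemented, which the talk does not cover. The class `DataHolder`, the query name `count_people`, and the protection choices (rounding to the nearest 100 and suppressing counts below 50) are all assumptions made for the example.

```python
from typing import Callable, Dict, Iterable, Set, Tuple

# Hypothetical private record: (pseudonymized phone id, cell tower id).
Record = Tuple[str, str]


def count_people(records: Iterable[Record], towers: Set[str]) -> int:
    """Number of distinct phones seen at any of the given towers."""
    return len({phone for phone, tower in records if tower in towers})


class DataHolder:
    """Keeps raw records private and answers only allow-listed queries."""

    def __init__(self, records: Iterable[Record],
                 round_to: int = 100, min_count: int = 50) -> None:
        self._records = list(records)  # never returned to callers
        self._allowed: Dict[str, Callable[..., int]] = {
            "count_people": count_people,
        }
        self._round_to = round_to      # assumed coarsening step
        self._min_count = min_count    # assumed suppression threshold

    def run(self, query_name: str, **params) -> int:
        """Run an authorized query and return only a coarsened count."""
        if query_name not in self._allowed:
            raise PermissionError(f"query not authorized: {query_name}")
        exact = self._allowed[query_name](self._records, **params)
        if exact < self._min_count:
            return 0  # small groups are suppressed entirely
        return round(exact / self._round_to) * self._round_to
```

In this sketch the NGO from the example would call something like `holder.run("count_people", towers={"A"})` and receive a rounded figure, while the raw records never leave the data holder. A real system would need stronger guarantees than rounding and suppression.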