Well, let's proceed with the second part of the session, where we have Nuria Oliver. People were asking me whether we are family; well, we can confess: we are cousins. Distant cousins. It shows. Anyway, for me it is a pleasure to have Nuria once more at the university giving a talk. She is Scientific Director at Telefónica I+D, here in Barcelona. The talk will be given in English, so, Nuria, go ahead. Thank you. Thank you for the invitation, Miquel, and for organizing these sessions. Now I'm going to switch to English because they asked me to give the talk in English. The slides are also in English. My name is Nuria Oliver, and I'm the scientific director of a research team at Telefónica in Barcelona. And today I'm going to talk about how we are using mobile phone data in different areas, and particularly in the public sector. I'm not touching on education, though, which is the topic of most of the sessions before, but I'm going to cover different aspects, in particular how mobile phone data can help in official statistics, in urban planning, in security and crime, and in public health. And then I'll highlight some of the opportunities that we see, and also the many challenges that there are to be able to use this data in general, and in particular for the public sector, and then some conclusions. But before I start, I thought that it might be useful for you, particularly the PhD students, to hear a little bit about the research team that we have here in Barcelona. So this is our building; it's next to Diagonal Mar, next to the Forum building. And the research organization in Telefónica is about 20 full-time researchers. We are about eight, nine years old, and we are divided into two big functional areas, and I'm responsible for one of the areas.
We have a very successful internship program whereby PhD students from anywhere in the world, but also locally, can come and do an internship with us for three to six months and work with the researchers on a research project. We also have some visiting professors and an open innovation approach. And we are hiring: my last slide is actually the URL for our hiring site, in case you are graduating and you are interested in joining the lab after you hear what we do. In my research area, I'm responsible for a multidisciplinary research team that I came here to create, and we work on a variety of areas: machine learning, data analysis, human-computer interaction. The main theme of everything that we do is building computational models of human traits or human behavior, either individual or aggregated, from a variety of data. So we can build models from voice; in the past we have done it from video and images, from mobile data, from data from Telefónica services, for example, etc. These are some of the areas that we cover. I don't have time to describe a lot of these areas today, so I'm just going to focus mainly on the areas related to aggregate analysis, but we're doing a lot of work on individual modeling too, including recommender systems, as I mentioned voice analysis, and human-computer interaction. We have a lot of external visibility for being such a "young", in quotes, and small team: we've generated a lot of patents, we've won a lot of awards, and our projects have also appeared a lot in the press. So we are very happy and proud about this. So without further ado, I'm just going to describe these four areas where mobile phone data can have an impact on the public sector, and before going into the areas I thought that I would first give a quick primer on what the mobile data that I'm talking about is. So why are we interested in mobile data, and why is mobile data useful to model large-scale human behavior?
So the main reason is that there are more phones in the world than people. In fact, mobile phone penetration worldwide ranges between about 90% and 120%. In addition, we love our phones. We carry our phones with us all the time, many times even when we are sleeping, and this is very important, because the phone is a sensor and a computer that is connected and that is always with us. The other very important factor, particularly for large-scale computational sociology projects, is that this is a global phenomenon that happens both in developing countries and in developed economies, and you don't need to have a smartphone to be able to leave these traces of data behind. So as a result there is a new area called computational social science, which is focusing on modeling large-scale human behavior using this kind of data, because we leave these digital traces behind, and then the idea is to be able to analyze those digital traces. In fact, two years ago, in 2013, MIT Technology Review named this area of analyzing human behavior from mobile data one of the breakthrough technologies, and the United Nations, with whom we have been collaborating actively for the past couple of years, is right now calling for a data revolution, because they realize that this data is extremely powerful to help make better decisions to improve the world. So what is this data about? I'll just quickly show you what the data looks like. Have you ever worked with mobile data, any of you? No? OK.
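To make the three kinds of variables concrete before the description that follows, here is a minimal sketch in Python. The records, field names, and tower coordinates are entirely hypothetical; real Call Detail Records are anonymized and far richer. It only illustrates computing a consumption feature and one mobility feature, the radius of gyration.

```python
import math

# Hypothetical, simplified call records: one row per call event, with the
# (x, y) position (in km) of the cell tower that handled it. Purely illustrative.
cdrs = [
    {"phone": "A", "direction": "out", "duration": 120, "tower": (0.0, 0.0)},
    {"phone": "A", "direction": "in",  "duration": 60,  "tower": (2.0, 0.0)},
    {"phone": "A", "direction": "out", "duration": 300, "tower": (2.0, 0.0)},
    {"phone": "B", "direction": "in",  "duration": 30,  "tower": (5.0, 5.0)},
]

def consumption_features(records, phone):
    """Consumption variables: incoming/outgoing call counts and total duration."""
    mine = [r for r in records if r["phone"] == phone]
    return {
        "n_out": sum(1 for r in mine if r["direction"] == "out"),
        "n_in": sum(1 for r in mine if r["direction"] == "in"),
        "total_duration": sum(r["duration"] for r in mine),
    }

def radius_of_gyration(records, phone):
    """Mobility variable: RMS distance of visited towers from their centroid."""
    points = [r["tower"] for r in records if r["phone"] == phone]
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    return math.sqrt(
        sum((p[0] - cx) ** 2 + (p[1] - cy) ** 2 for p in points) / len(points)
    )

print(consumption_features(cdrs, "A"))   # {'n_out': 2, 'n_in': 1, 'total_duration': 480}
print(round(radius_of_gyration(cdrs, "A"), 3))   # → 0.943
```

Social network features would additionally require the call graph (who calls whom), to which standard network-science measures such as degree can then be applied.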
So this is how a mobile phone network would see a city. This is a simplification, but usually there are a lot of cell towers or base stations, and we can compute a Voronoi tessellation - these pink lines here - and then in the center of each of these cells there would be a cell tower, and this region would be the area of coverage of that cell tower. In each of the cell towers, some information is registered every time a phone makes or receives a phone call, or sends or receives an SMS. This data is traditionally called Call Detail Records. It is all anonymized - all the personal information is encrypted - but it is still quite valuable data when you do aggregated analysis, because there is a lot of data from a lot of people. Typically from this data we can compute variables of three types. You can compute what are called consumption variables, which characterize how many phone calls are happening: how many incoming phone calls, how many outgoing phone calls, what the duration of those phone calls is. You can compute some social network features, because you can build what is called the call graph, which is the graph of all the phones that call each other, and from there you can apply network science to compute some characteristics of that network. And then you can compute some rough, not very precise, mobility features - we are not talking about GPS precision; we are talking about the granularity of these cells that I showed before - but you can infer some characteristics such as the distance travelled, or the most popular antennas, or the radius of gyration, which is the radius of a circle or circumference that would cover most of the travelled areas. I am just going to show you a couple of videos to illustrate what this data looks like. The first video shows the activity in the different cell towers in the state of Oaxaca in Mexico right before, during and after an earthquake takes place. Each of these bubbles is proportional in size
to the amount of phone calls that are connected to that tower. So just by looking at that, we can see this data as a sensor of roughly how many people there are in each of the areas. You can see this is a more populated area; this was where the earthquake took place, and then suddenly there is this surge of activity. By looking at how many phone calls are connected to each of the towers, we can have a proxy of roughly how many people there are in those areas, and this is extremely valuable, because if there is a natural disaster like this one, we can help the Red Cross or we can help the government and tell them: OK, this area is very populated, there are a lot of people here, and there aren't that many people here, so they can determine how much help to send. If we look at the mobility between the phones, then we can see where the main cities are and how people are travelling between the main cities - this is in the UK - and even how they are going to Northern Ireland. So just by looking at this very aggregated and anonymized data, we realize that there is a lot of valuable information that can help particularly the public sector, which is the purpose of this presentation. So as I mentioned, there is an entire research area called computational social science on how to use this data as social sensors, and there has been work on monitoring mobility, understanding pandemics (which I will present later), inferring socioeconomic indicators, inferring different traits, and a lot of these papers are actually papers coming from my research team. In fact, we have a research area within the team which we call big data for social good, where, as I mentioned, we are working with the United Nations or with MIT or with some governments to see how we can help them, thanks to the existence of this data. So now I am going to quickly present four projects that I think can illustrate the value of this data in the public sector, and the first area where I think we
can tremendously help is for official statistics. Official statistics are statistics that are generated typically by governments and that try to characterize, from a quantitative and qualitative perspective, different aspects of people's lives in a certain country or a certain territory. Many of the variables in official statistics characterize the population - the gender, employment situation, immigration - but also economy, trade, energy, etc. These statistics are usually computed by hand, so countries typically do a census, typically every 10 years, and it's a very expensive process, because they need to ask every single individual in the region to answer these very long questionnaires to collect all these statistics. So the question is: can we leverage the fact that we have all this data to help build more up-to-date and cheaper statistics about a certain region? And the answer is probably yes. In fact, we have done work in the red area, but there is research by other teams working also on the blue area, and today in particular I will show an example of how we can use this data to help build better socio-economic indicators. This is the next project, which is work by Vanessa Frias-Martinez in my team. The challenge, as I mentioned, is related to the construction of census maps and in particular socio-economic indicators, and this is very relevant in Latin America. In Latin America there are a lot of differences between the different socio-economic levels: typically there is a pyramid of socio-economic levels that goes from A to E or F, where the bottom of the pyramid is very big and is the poorest people, and the top of the pyramid is very small and is the richest people. The socio-economic indicator is very important because it is usually a proxy for access to education, or access to clean water, or access to health care, so it is very important for the country governments to know what the different socio-economic levels of the different regions in a country are. But as
I said, it is a very expensive process, and very difficult to collect, and therefore they only compute it every few years. So what we did in this project was to see if, by having access to the mobile data and characterizing consumption patterns, mobility patterns and social network patterns of each of these different regions in a country, we were able to infer the socio-economic levels, in such a way that the country wouldn't need to spend millions of dollars in creating these indicators, and could also compute them every month instead of every ten years. We know from previous work that there are already some correlations that have been found: in particular, higher socio-economic indicators are usually correlated with a larger area of mobility - so the richer you are, the more you travel and the further you travel - and conversely, lower socio-economic indicators are usually correlated with smaller degrees in the social network, which means that the poorer you are, the smaller your social network tends to be. So this is the question that we tried to answer: can we infer the socio-economic levels from mobile data? And the answer, as you will see, is yes, with a certain level of accuracy. The main benefit that we see from this technology would be that the national statistical institutes would only have to carry out surveys on a subset of the regions, which we can use to train our models; then we build our models, and then we apply those models to unseen regions, for which we automatically compute the socio-economic indicators. And this is what we did - the details are in the papers that I will mention later. There are a number of challenges. One of the first ones is that the Voronoi cells that you divide the space into don't map onto the census regions, so you need to apply a mapping between the Voronoi cells and the census regions to assign a particular socio-economic level to a Voronoi cell, to a cell tower. And then for each of the different cell towers we compute all the different
behavioral variables - consumption variables, mobility variables and social network variables - and predict the socio-economic indicators. In order to evaluate it, we had the ground truth data for a particular country, which was the census data, and we took about two thirds of the data for training and one third of the data for testing. Usually in all these projects you generate a huge amount of variables - it depends; in one of the projects that I will present later we generated over 6,000 variables - it could go from a few hundred to a few thousand variables, so it is very important to apply feature selection techniques that will automatically determine which ones are the most relevant variables, because otherwise you will be overfitting instead of building a model that generalizes. We tried different algorithms, and the one that performed the best was random forests: when we reduced the dimensionality to 38 features we obtained an accuracy of 82% estimating 3 socio-economic indicators, and if we went to 6 socio-economic indicators we got a performance of about 63%. So as I mentioned, the main value of this is to be able to collect the indicators for only a small part of the country and then infer the rest by using our technology. In terms of which variables matter, something that was very interesting is that mobility seemed to be very important: four of the top variables were mobility variables, which we knew from previous work, but it was nice to see that we could corroborate previous work - and no one had tried to do this for socio-economic indicator inference. And then communication and social variables were also important, so the different aspects of human behavior seem to play a role in characterizing the socio-economic level of a region. The second project is related to urban planning. I'm presenting all the projects pretty fast, at a high level - we have papers that describe the details, but I'd rather give you a taste of the different projects - and in particular
within urban planning, the project that I'm going to present is about determining the use of the land, or what is called land use identification. One of the tasks in urban planning is to know how a particular part of the region is being used: is this a residential area, is it an office area, is it a leisure area, is it an industrial area? Because depending on how it's being used, the town hall or the government decides to build different infrastructures and different services for the people living there. Traditional techniques, again, are based on questionnaires, like for the census, which again don't scale and don't allow for very updated information, so usually the land use information available is, again, a few years old at the very best. The question that we wanted to answer is: can we infer how the land is being used by looking at the patterns of activity in the cell towers? To do that, we could analyze data for cell towers both in Madrid and in Barcelona, and we had the ground truth from the city of Madrid. This ground truth was obsolete - it was much older than the data that we had - but it was the only ground truth that we had, so we had to use it to validate the algorithm. So how do we do that? How do we characterize a region? The main idea here is that if you look, over a 24-hour period, at the number of phone calls that are connected to a particular cell tower, you see a specific signature of that cell tower. Here, this is the number of phone calls and this is the hour, from 0 to 24, on a weekday here and at the weekend there. So as you see, there are no phone calls, no phone calls, and then around the time when people wake up there are a lot of phone calls; then there is a dip at around lunchtime; then there is another big peak; and then people go to their homes, so in this particular cell tower there are almost no phone calls. So this could suggest that this is a working area, because it seems to match the working hours, and then at the weekends there is no one. This is another example for a different
cell tower, which is located in a different place, and as you see it has a different signature. So then what we did is, for each cell tower, we computed this graph, and then we applied a clustering technique, which in this case was spectral clustering, to cluster cell towers that had a similar signature into five different clusters, which were residential, commercial, leisure, nightlife and industrial. Automatically, the algorithm is able to determine which cell towers are the most similar to which ones and segment the city into these five different clusters, and then we evaluated it against the real land use. So I'll show you an example of what the algorithm found for the city of Madrid, and then very quickly a very short video for the city of Barcelona. These are four of the clusters in the city of Madrid. This red cluster represents offices, and as I showed before, the characteristic pattern for this cluster is activity during working hours and no activity at the weekends, and you can even see here the street where Telefónica is based - this is like the main office area in Madrid. If we look at the commercial areas, we see that there is activity both at the weekend and on weekdays, and they have a similar pattern - this is very typical in human behavior analysis, where there are these two peaks: usually there is a morning peak and then there is an afternoon peak. If we look at the nightlife cluster, the main difference here is that there is a lot of activity at the weekend, and pretty late - this is like 2 am, 3 am, and there is a lot of activity - so this represents the nightlife areas, and then the commercial areas have their own characteristics. Something that is very interesting is particularly the nightlife areas and the leisure and transport areas, because those areas tend to change relatively quickly, so the information that town halls have, because it's only computed every few years, tends to get obsolete very quickly, whereas with this approach we can
detect on a weekly basis, or on a monthly basis, where the main nightlife areas are emerging. When we validated with the ground truth, which was the data from the city of Madrid, one of the observations is that in the official land use ground truth there is no nightlife, because they don't have a way to characterize this - because, as I said, it changes a lot - so we were able to identify some of what they thought was residential or offices as actually being used as nightlife. And I'll very quickly show a video of the same analysis in Barcelona, where the different colors represent the different clusters: the red one is industrial, the orange one would be commercial, the yellow one is nightlife, the light green one is leisure and the dark green one is residential. So I'll just show very quickly a video, and you can see how the area around the airport is automatically classified as industrial, and then there are some pockets of nightlife around downtown; this is mainly residential, but it is able to identify other areas within the city that are used for a different purpose. So this is another example of how to use this data for urban planning. The third area where we are finding that we can provide value is in the context of safety and crime, and this is a joint project that we did with FBK in Italy and MIT, and it's a project about predicting crime using this kind of data. Crime is studied a lot because it affects the quality of life of a place, and usually, again, mainly city officials and governments are very interested in understanding how safe their citizens are and where the crime areas are. There are a lot of studies that have found correlations between crime and socioeconomic levels, or levels of unemployment, or percentages of immigrants, etc., and most of the recent studies of crime today don't focus on modelling individuals and determining whether a particular person is going to commit a crime; they focus on modelling what are called hot spots. So
one of the observations is that crime tends to cluster in particular geographic spaces, and almost all of you know that in Barcelona not all the neighborhoods are equally safe - there are some neighborhoods that are less safe than others. Why is that? Because crime tends to cluster in these hot spots, so most of the research today in understanding crime is focusing on understanding and finding those hot spots. In terms of theories to explain what contributes to crime, until now there have been two competing theories. The first theory was proposed by Jane Jacobs, who was a social activist in the 60s in the US, and she wrote about "eyes on the street", where what she says is that areas that have a lot of people going through them, and that have a lot of diversity of people, are safer, because there are eyes on the street and all of us are policing each other - you know, there are a lot of people, and if I see that someone is being mugged I am going to do something and protect that person. However, about 10 years later, Newman proposed the opposite theory, and he called it the defensible space theory, and he said: no, places that have a lot of people and a lot of diversity of people are more anonymous, and anonymity induces less safety, because if I know everybody I'm going to care for them, but if I'm just walking by and I don't know the person, I'm just going to say, OK, it's not my problem, and I'm going to go. So the question is: who is right? Because those two theories are actually in conflict with each other. We found some evidence about who is right, but I'm not going to tell you until the end. As I mentioned, we are focusing on a place-centric approach, not a people-centric approach, to characterizing crime, and this is very important, because this project became very popular with the press, and then they were talking about Minority Report and this and that, and it has nothing to do with Minority Report. But anyways, the particular study that we did was a study in London,
and this was in the context of a datathon that Telefónica organized, where Telefónica shared large-scale aggregated data with research teams from anywhere in the world, to work on any project - but it had to be for social good - and the winning project was actually this project, and from that we started working together, and this is the result of the collaboration. The approach is multi-modal, because we are using people dynamics data coming from the mobile network in London, and as I said, we are focusing on finding crime hotspots - not criminals, not individuals. In terms of the data that we used for this project, we had three sources of data. Something else that you've probably seen in all the projects that I've mentioned is that these projects are all about combining different sources of data: we have the human behavioral data coming from the mobile network, but we need the domain data, or the ground truth, coming from somewhere else. In the land use case it was coming from the town hall; here, the criminal cases dataset, which has the ground truth of the crimes, was coming from the London police; in the socio-economic status project the ground truth was coming from the census data. So it's very important to understand that in order to solve a real-world problem, you need to have the data for the specific domain that you're trying to have an impact on, plus the behavioral data that you're using to try to make the inferences. And then, to compare with the state of the art, we had the census data. This is the same kind of data as I presented for the socio-economic level inference project, for the case of London: in London it's called the London Borough Profiles dataset, and it has 68 variables that are census variables for the different neighborhoods in London. The mobile data, as I said, was shared and was coming from a product that Telefónica has, which in this case is called Smart Steps, and what Smart Steps does is, instead of creating a Voronoi tessellation, it creates a grid, and in
each of these cells of the grid you know, every hour, an approximation of how many people there are, and then for what percentage of them this is their home, this is their work, or neither home nor work, and then a rough approximation of their ages - what percentage are young, middle-aged, etc. This is what I just explained. The crime data was coming from the London police, and we had two months, so we used one month for training and one month for testing, and we had all the crimes that took place in that month, geolocated. To define a hotspot, we looked at the median value of all the crimes in the month - in this case it was 5 - and then we just said, OK, if there were more than 5 crimes in this particular location in this month, that's a hotspot, and if there were 5 or fewer, it is not a hotspot. Regarding the spatial granularity, similarly to the socio-economic level project, in the case of London these census regions are called LSOAs, and they are smaller than a zip code - there is a picture here - and the way they are defined is that they are regions in space that have roughly 1,500 people, because when they compute the census they want to have the same number of people in each region, so they can compare. So again we had to find the mapping between the areas of coverage of the cell towers and these census regions. The census data, which is the London Borough Profiles data, had 68 variables, which are the census variables: demographic information, unemployment information, socio-economic level information, percentage of immigrants, percentage of retired people, etc. As I also mentioned, from the mobile data we computed a huge number of features - usually you compute every possible feature that you can think of, and then statistics of those, so first order statistics and second order statistics, etc. - so we generated over 6,000 features. Something very important to take into account is that all this data is spatio-temporal data, so a big unknown is what the right time scale is to model the phenomenon
that you are trying to model. So usually you try many different time scales, because you don't know what the underlying right time scale for that phenomenon is: you compute, say, the total number of phone calls every hour, then grouped by 4 hours, then grouped by day, then grouped by weekdays, then grouped by the whole week, then grouped by 2 weeks, etc., because you don't know what the right time scale is. And then you apply feature selection techniques - in this case we used the Gini coefficient to select which ones are the most discriminative features - and at the end we selected the top 68 features, to be able to compare with the census data, which also had 68 features. So we built 3 models: we built a model of crime that was using these 68 features coming from the mobile network; a census model, or London Borough model, that had 68 features coming from the census data; and then the combined model with both. And this is what we found. In terms of the classification, it's a supervised problem, because we have the ground truth, and again random forest was the best classifier - but we tried a lot of different classifiers from the state of the art - and this is the performance. We built the models using the data from one month and then we tested using the data from the following month. F1 is usually the best measure to use in this case, and a baseline, totally random classifier would be at 0.5, which is the equivalent of random. The Borough Profiles model is a little bit better than random, 57%, or 0.57. Using the mobile data we were significantly better than using the census data - and that was very surprising in a sense, because the mobile data doesn't have nearly as much detail as the census data - and we were at 0.65, and then combining them we were at 0.67; and if we look at the accuracy, we were at almost 70% accuracy. We did a lot of visualizations, even though it doesn't really help a lot here: this is the ground truth and this is what the model was
saying, and you can see a lot of the big hotspots are correctly classified, though visually it's difficult to see whether it's actually working or not. Roughly, with 70% accuracy we were able to determine whether a particular region of London was going to be a crime hotspot or not. So then, as I told you at the beginning: who is right, is Jane Jacobs right or is Newman right? To answer that, we looked at the features that were selected by the classifier, and what we found, first of all in terms of the time scale, is that daily features were more discriminative than the other time scales. In terms of the features that were important, we found that areas that had an increased ratio of residents had more crime, which would contradict Newman's theory of defensible space, where he was saying that if you have a lot of residents there should be less crime. Moreover, we found that entropy-based features - and entropy measures how easily predictable a variable is - were actually correlated with less crime, which again seems to go against Newman's theory and support Jane Jacobs' theory. So from our findings we have a lot of empirical evidence to think that Jane Jacobs was right and Newman was wrong, and that's why we called this project Moves on the Street instead of eyes on the street - because we don't have eyes, but we have the movements of people. Another interesting finding was that when we used the combined model, with both mobile network features and census features, only 6 of the 68 selected features were census features - the model automatically selects the features, and it only selected 6 out of the 68 coming from the more traditional variables - and interestingly, the ones that were selected had actually been found in previous work to be correlated with crime, so our empirical evidence is also corroborating some of the previous work, which wasn't done empirically but using observations and questionnaires. And then to wrap
up actually what time did we start sorry I think I have 5 minutes left so I'll cover very quickly the last area where I think this data can bring value which is in public health in fact and you might have read some of these already in the newspapers one of the biggest concerns in terms of the survival of the species so to say is actually the risk of extremely lethal pandemic that could wipe out a big percentage of all of us on the planet earth and this is like an article from the International Monetary Fund saying that the pandemic risk is one of the global threats of the next century so there is a lot of concerns about how we will respond to a pandemic que happened with Ebola last year and how much of a crisis it was so the question is can we use this data to help in the context of a pandemic and the answer is yes and I'm just going to share with you very quickly what we did when the H1N1 flu outbreak happened in Mexico in 2009 which I don't know if you remember so in 2009 there was the first H1N1 outbreak in the history and actually the first pandemic in the 21st history in the 21st century and I'm just going to quickly give you an overview of what happened during the first days of the outbreak so around April 17 2009 the first confirmed cases of flu were detected and the government declared a state of medical alert which was just a recommendation it wasn't any kind of intervention where they recommended people to stay home because there was a risk that there was this flu going on and it was a pretty dangerous flu however the number of confirmed cases continued to increase and there was pressure on the government to do something stronger than just a recommendation and what we did is they raised the alert level to a second level of alert on April 24 and they closed the schools and universities and they closed some of the tourist sites they also closed the port of calls in Mexico from the cruise ships for example also forbid people from going to mass or going to 
big soccer stadiums to see soccer matches, and so forth. However, the number of confirmed cases continued to increase, so on May 1 they did something that was extremely unprecedented: they essentially shut down all non-essential activities in the country, which meant everything except hospitals, firemen and police, and this closure lasted five days. Unfortunately, about a month later the World Health Organization declared a world pandemic of H1N1 flu. So one of the questions that stayed in the air was: what was the impact of all these measures that the government took? Did they actually help in any way? Because in the end the disease spread, so was it a waste of money to do all of this? Did it help? This is what we answered with our data.

What we did was take a million anonymized phones from one of the most affected areas in Mexico and characterize the mobility of those phones before the flu and during each of the alert periods, because for an infectious disease to become a pandemic people need to move. If I were infectious right now and just stayed put in my house, my disease wouldn't become a pandemic, because I'm not moving. So mobility is key when there is a risk of a pandemic, which is why a lot of interventions focus on reducing the mobility of the population, and the interventions that the government took were aimed at reducing the mobility of the population. Thanks to the existence of the mobile data, we can actually quantify whether the population indeed reduced its mobility or not.

And this is what we found (I'll just skip this). If we look at each of the different levels of alert, we found some surprising things. First of all, we found that there was no significant reduction in mobility during the recommendation period: during this period the mobility of the people was roughly the same as if there was no alert. So one of the first conclusions is that recommendations from the government don't seem to be very effective, because people just continued with their lives.
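As a sketch of the measurement itself: one common way to characterize per-phone mobility from this kind of tower-sighting data is the radius of gyration, compared across periods. The record format, field names and toy numbers below are assumptions for illustration, not the actual pipeline.

```python
import math
from collections import defaultdict

def radius_of_gyration(points):
    """Root-mean-square distance of a phone's visited tower positions
    to the centroid of those positions: a simple mobility measure."""
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    return math.sqrt(sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points) / len(points))

def mobility_by_period(records):
    """records: iterable of (phone_id, period, x, y) sightings.
    Returns {period: mean radius of gyration across phones}."""
    visits = defaultdict(list)
    for phone, period, x, y in records:
        visits[(phone, period)].append((x, y))
    per_period = defaultdict(list)
    for (phone, period), pts in visits.items():
        per_period[period].append(radius_of_gyration(pts))
    return {p: sum(v) / len(v) for p, v in per_period.items()}

# Toy example: one phone roams widely before the alert, stays near home during it.
records = [
    ("a", "before", 0, 0), ("a", "before", 10, 0), ("a", "before", 0, 10),
    ("a", "alert", 0, 0), ("a", "alert", 1, 0),
]
m = mobility_by_period(records)
print(m["before"] > m["alert"])  # mobility dropped during the alert
```

With a million phones, the same per-phone statistic aggregated per alert period is what lets you say whether mobility actually fell.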
The second finding was that during the second level of alert 80% of the people significantly reduced their mobility, whereas only 55% of the people significantly reduced their mobility during the third period of alert; however, the third period was much more costly for the government than the second. So another important finding was that closing schools and universities during working days (the second period happened during working days, while the third one happened during holidays) is more effective than imposing much more severe restrictions during the holiday period, because during holidays the measures are not as effective.

And then the last question we answered was: was this reduction sufficient to delay the progression of the disease? To do that, we applied a state-of-the-art computational epidemiological model called the SEIR model, where you have a lot of agents and each agent can be in a Susceptible state, be Exposed to the infection, then become Infectious, and then Recovered, with different rates governing the probability of going through each of these phases. Then what we did (I'll just go quickly through this) was take the mobility model coming from the mobile data, the social-network model also coming from the mobile data, and the disease model being this SEIR model, and put all of that into a simulation of the progression of the disease. We ran two simulations: the first one assuming there was no intervention from the government, so the mobility was the same all the time, and a second one where the mobility was the reduced mobility due to the interventions. What we found was that the number of infected people at the peak of the infection was about 10% smaller, and the peak came about 40 hours later, thanks to the measures that the government took.
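A heavily simplified, deterministic stand-in for the comparison just described: the actual model in the talk is agent-based, with mobility and social-network structure estimated from the mobile data, but the qualitative effect (reduced mobility gives a smaller, later peak) can be sketched with a compartmental SEIR in which the mobility reduction is folded into the transmission rate. All parameter values here are illustrative assumptions.

```python
def seir_peak(beta, sigma=1/2, gamma=1/3, n=1_000_000, i0=10, days=200, dt=0.1):
    """Deterministic SEIR (Susceptible, Exposed, Infectious, Recovered)
    via Euler integration. Returns (peak infectious count, peak time in days)."""
    s, e, i, r = n - i0, 0.0, float(i0), 0.0
    peak, t_peak = i, 0.0
    for step in range(int(days / dt)):
        new_exposed = beta * s * i / n * dt     # S -> E (transmission)
        new_infectious = sigma * e * dt         # E -> I (incubation ends)
        new_recovered = gamma * i * dt          # I -> R (recovery)
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_recovered
        r += new_recovered
        if i > peak:
            peak, t_peak = i, step * dt
    return peak, t_peak

# Baseline vs. intervention: the mobility reduction is modeled, as an
# assumption of this sketch, as a 20% cut in the contact rate beta.
peak_base, t_base = seir_peak(beta=0.9)
peak_intervention, t_intervention = seir_peak(beta=0.9 * 0.8)
print(peak_intervention < peak_base, t_intervention > t_base)  # smaller, later peak
```

Running both scenarios and comparing peak height and peak time is the same experiment structure as the two simulations in the talk, just without the agent-level mobility and contact networks.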
And then of course the question is: is that enough? Is that a lot? Well, 10% potentially represents hundreds of thousands of people, and 40 hours, when you are dealing with a crisis, is a lot of time, because it gives you time to order more medication, to free up more beds, or to mobilize more medical personnel. So it's not a negligible amount of time.

I'm just going to finish here because I don't have time, but I'll quickly mention three areas where I think there are still a lot of challenges in how to use this data in general, and in particular for social good and for the public sector. The first set of challenges is regulatory and social: the regulation is not up to date, there are potential unintended consequences, and there is also the risk of creating a digital divide between the people who have access to this data and the people who don't. Then there are many technical challenges, which is very exciting for us as researchers: how well this will generalize, how you combine data from different sources, and of course all we are finding are correlations, so if you want to attribute any kind of causal effect you need to do interventions; there are also many challenges in terms of the features you select and the machine learning. And then of course there are potential privacy and security risks, depending on how careful people are when they are dealing with this data.

And with this I will finish. These are some of the relevant papers; you can also find them online. And as I promised at the beginning, we are hiring, so if any of you is graduating and interested in applying, please go to this website and enter your information there. Thank you.