Hello, good afternoon. Thank you very much to the organizers for having me here at this very fascinating workshop. I'm Elisabeth Day, an assistant professor at the Department of Network and Data Science at the Central European University. Today I'll focus on a somewhat more specific topic, I would say: giving you one example of how data-driven models can help us monitor the SDGs. The broader context of my talk is the realization, during the last 15 years or so, that a lot of the new data we now produce on a daily basis using our phones and the internet, but also through all the satellite imagery that is available nowadays, can be used beyond its original scope. Researchers and international organizations have started to use this data as a proxy to try and measure socioeconomic indicators, for example, things that previously were mostly measured through surveys, census data, et cetera. These traditional methods are still fundamental for assessing socioeconomic indicators. But nowadays we are also trying to figure out whether other kinds of data can help us whenever we don't have first-hand data available. Over the years, some of these data and methodologies have been used to address different kinds of questions related to sustainable development, and to the SDGs in particular. There's a lot of work, for example, on estimating wealth using satellite imagery, cell phone data, et cetera. Other works have focused on measuring gender inequalities or the integration of immigrants in cities, just to give a few examples. But today I will focus on SDG 2, Zero Hunger, which has been the focus of my work in the last few years.
Before transitioning back to academia last year, I worked for over two years at the World Food Program in Rome, and that's where the work I'm going to present started. So let's start with some basic definitions. Food security exists when all people at all times have physical as well as economic access to sufficient, safe, and nutritious food. So it's about availability of food, but also about people being able to access it. When these conditions are not met, we say that a population in a given area at a given time is food insecure. Just to give you a few numbers: in 2021, the estimate was that there were almost 200 million acutely food insecure people around the world, across 53 countries, from Latin America to Sub-Saharan Africa to Asia, and more recently even in Eastern Europe with the war. And these numbers are growing. Back in 2016, there were around 100 million people, so the number has almost doubled within about six years. Of course, the pandemic didn't help. So this is not only a big problem, but also a complex phenomenon. To give you an idea, food security experts try to characterize food insecurity by its main drivers, and three main drivers have been identified: conflict and physical insecurity, weather extremes, of which there are more and more, and economic shocks. In basically every region a main driver has been identified, but what we actually see is that most food crises are the result of multiple drivers, and of how these different drivers, from weather to economic shocks to conflict, interact. This said, how do we measure food insecurity? Several indicators have been developed over the years by FAO, the World Food Program, et cetera.
Most of them look at the diversity of households' dietary intake, and also at the consequences of constrained access to food. The indicators I've focused on are the food consumption score, measuring the former, and the reduced coping strategy index, measuring the latter. The food consumption score is measured by interviewing households and asking them: in the last seven days, how often did you eat food from different food categories, from staples to fruits to dairy products? Each of these categories has a nutritional value, and these values often depend on the country. So this is a general methodology that is applied in several countries, but depending on the region you have different values for the weights of the food groups. Then, from these frequencies, you build a weighted sum to get a final score that allows you to classify households into three groups: poor, borderline, or acceptable food consumption. So this is the first kind of indicator. Another very important measure is: OK, I might not have access to food, but then what are the consequences of not having access to food for some time, and how do I cope with this? So another kind of question that is often asked is: in the last seven days, how often did you have to rely on the following coping strategies? How often did you have to borrow food from family or friends, or limit portion size at mealtimes, or restrict consumption by adults in order for small children to eat, et cetera? Here, too, you have different levels of severity for each of these strategies. You then sum up the frequencies in a weighted manner to come up with a score for each household of how insecure they are in terms of coping strategies. And in order to collect this information, there are several ways to do it.
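Both household scores described above are weighted sums of seven-day frequencies. The following is a minimal Python sketch of that arithmetic. The food-group weights, the coping-strategy weights, and the 21/35 cut-offs are commonly cited WFP defaults and are assumptions here, not values given in the talk, which notes that the weights can differ by country. All function and key names are hypothetical.

```python
# Assumed default food-group weights (not stated in the talk; may vary by country).
FCS_WEIGHTS = {
    "staples": 2.0, "pulses": 3.0, "vegetables": 1.0, "fruits": 1.0,
    "meat_fish": 4.0, "dairy": 4.0, "sugar": 0.5, "oil": 0.5,
}
# Assumed cut-offs for the three consumption groups.
FCS_POOR_MAX = 21.0
FCS_BORDERLINE_MAX = 35.0

# Assumed severity weights for the food-based coping strategies.
RCSI_WEIGHTS = {
    "less_preferred_food": 1, "borrow_food": 2, "limit_portions": 1,
    "restrict_adults": 3, "reduce_meals": 1,
}


def _clamp_days(days):
    """Keep a reported frequency inside the 0..7 day recall window."""
    return min(max(days, 0), 7)


def food_consumption_score(days_eaten):
    """Weighted sum of the number of days each food group was eaten."""
    return sum(FCS_WEIGHTS[group] * _clamp_days(d) for group, d in days_eaten.items())


def classify_fcs(score):
    """Map a score to poor / borderline / acceptable using the assumed cut-offs."""
    if score <= FCS_POOR_MAX:
        return "poor"
    if score <= FCS_BORDERLINE_MAX:
        return "borderline"
    return "acceptable"


def reduced_coping_strategy_index(days_used):
    """Weighted sum of the number of days each coping strategy was used."""
    return sum(RCSI_WEIGHTS[s] * _clamp_days(d) for s, d in days_used.items())
```

For instance, a household reporting staples on 7 days, vegetables on 3, and oil on 4 would score 2×7 + 1×3 + 0.5×4 = 19 under these assumed weights, which falls in the poor group.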
The more traditional ones are face-to-face assessments, such as the comprehensive food security and vulnerability analyses, which are normally carried out around twice a year. But in the last, let's say, less than 10 years, there have also been some new ways relying on mobile phones. Nowadays, even in sub-Saharan Africa, there's quite a high penetration of mobile phones, at least of the old-style ones. So it's possible to carry out these kinds of surveys remotely, so that you can do many more surveys in less time and spending less money. For example, WFP has been carrying out these surveys on a daily basis in over 35 countries, starting five years ago and then increasing the number of areas where the surveys are done. So you collect all this data at the household level, and then you aggregate. Of course, you need to have a statistically significant number of surveys per area and per time window. Most of the time households are also weighted: some post-stratification methods are applied to make sure that you are accounting for the characteristics of the population. Then you can measure the prevalence of people with insufficient food consumption by looking at all the households that have poor or borderline food consumption, and the prevalence of people using crisis or above crisis food-based coping by selecting those that have an rCSI score greater than a given threshold. This is nowadays done in many places, very often. But the question remains: we cannot really do this in all places, at all times. It's still time consuming, it's expensive, et cetera.
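The aggregation step described above, from weighted households to an area-level prevalence, can be sketched as follows. This is an illustration under stated assumptions: the `weight` field stands in for whatever post-stratification weight the survey pipeline produces, the cut-offs (FCS of 35 or below, rCSI of 19 or above) are assumed values rather than ones given in the talk, and the toy survey records are invented for the example.

```python
FCS_BORDERLINE_MAX = 35.0  # assumed: poor or borderline means score <= 35
RCSI_CRISIS_MIN = 19       # assumed threshold for crisis-or-above coping


def weighted_prevalence(households, is_flagged):
    """Share of the total survey weight carried by flagged households."""
    total_weight = sum(h["weight"] for h in households)
    if total_weight <= 0:
        raise ValueError("households must carry a positive total weight")
    flagged_weight = sum(h["weight"] for h in households if is_flagged(h))
    return flagged_weight / total_weight


def insufficient_consumption(household):
    """True for poor or borderline food consumption."""
    return household["fcs"] <= FCS_BORDERLINE_MAX


def crisis_coping(household):
    """True when the coping index reaches the assumed crisis threshold."""
    return household["rcsi"] >= RCSI_CRISIS_MIN


# Toy records for one area and one time window (hypothetical values).
survey = [
    {"weight": 2.0, "fcs": 20.0, "rcsi": 25},
    {"weight": 1.0, "fcs": 40.0, "rcsi": 5},
    {"weight": 1.0, "fcs": 30.0, "rcsi": 10},
]

print(weighted_prevalence(survey, insufficient_consumption))
print(weighted_prevalence(survey, crisis_coping))
```

Passing the flagging rule as a function keeps the same aggregation code usable for both indicators.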
So the question that I was posed when I joined the World Food Program was: can we use secondary information, data from all these relatively new sources, and methods from data science, machine learning, et cetera, to give an estimate, a sort of prediction, of what the food insecurity situation is, that is, an estimate of these indicators, for areas where we don't have up-to-date primary data? In order to do this, the first thing we did was to go to the food security experts. There were a lot in-house, of course. So let's look at what the causes are. I mentioned this earlier: there are three main causes that experts identify. Of course, there are many more; think, for example, about animal diseases, crop diseases, health emergencies. But these are the three main ones. So the first thing we did was to start building a database, as wide as possible, that would put together and harmonize all this kind of data so that we could start building our models. For conflict information, we resorted to the ACLED project, which collects, on a weekly basis, daily news from all around the world about conflict. They are also able to extract the number of fatalities involved in each conflict, categorize the conflicts, et cetera. Then we also resorted to a variety of macroeconomic indicators, such as food and headline inflation, currency exchange rates, et cetera, as well as things that WFP collects itself, like market prices, that is, the prices of different commodities in local markets across the world. And finally, on the weather side of things, we resorted to data coming from the satellites I was mentioning, which provide new measures every 10 days of the status of the vegetation and of how much it rained.
And having collected data now for over 30 years, you can also look at anomalies with respect to the average, for example the average rainfall or vegetation greenness in that same area in that same period of the year, and so see whether things are worse or better. So we created this database. It's not a static one, in the sense that we wanted to build a model that could then run in near real time. So this is a database that updates itself every night, or whenever there's new data available. We also wanted to build something that relied only on open and frequently updated data, so that we could build a sort of sustainable system, in the sense that the system could keep running over time, even in the long term. Then we also had a set of training data: thousands of data points on the prevalence of people with insufficient food consumption and the prevalence of people using crisis or above crisis food-based coping strategies. We had this information for several areas around the world. I don't know if you can see the resolution, but what we look at is the first-level administrative units across countries. So we have this for hundreds of different first-level administrative units around the world and at different points in time. This is our training set. Then, for each area at each time, we associate the values of the different kinds of indicators that I was mentioning before: economic, weather, and conflict indicators. And when available, we also look at what the last measured prevalence was. As I was saying, this is a complex problem for which we don't really have a mechanistic model of how things work, so we went for a data-driven approach. We used state-of-the-art machine learning techniques. I'm not going to go into the details now, but of course we tuned the parameters, did some feature selection, et cetera. And what we obtained were models.
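The weather anomalies mentioned at the start of this passage compare a current reading with the long-term record for the same area and the same period of the year. A minimal sketch of one way to compute such a feature follows. The talk does not say which anomaly definition was used, so both the ratio to the long-term mean and the standardized difference are shown as assumptions, and the function name and numbers are hypothetical.

```python
from statistics import mean, pstdev


def seasonal_anomaly(current, same_period_history):
    """Compare a current value with past values for the same area and period.

    Returns (ratio_to_mean, z_score). Both definitions are illustrative
    assumptions; the talk does not specify the one actually used.
    """
    baseline = mean(same_period_history)
    spread = pstdev(same_period_history)
    ratio = current / baseline if baseline else float("nan")
    z_score = (current - baseline) / spread if spread else 0.0
    return ratio, z_score


# Hypothetical rainfall totals (mm) for the same 10-day period in past years.
history_mm = [100.0, 120.0, 80.0]
ratio, z_score = seasonal_anomaly(50.0, history_mm)
print(ratio, z_score)  # a ratio below 1 and a negative z-score mean drier than usual
```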
We actually have two models for each of the two indicators I mentioned: a model where we only use secondary information, and a model where we also take into account the last available measurement for that area. We also tested these models against a baseline, since of course, when you include the last measured prevalence, you already obtain quite high scores. And you might wonder: is it because you can simply use that and get a good enough metric for what's happening right now? Well, actually, food security is quite a dynamic process. So what we looked at was a baseline model where we only used the last available prevalence as our predictive variable. And we see that our results using all these other dimensions as well are much better than this naive approach. One important thing, once we had trained and tested our models, was to look at explanations. Because in a decision-making context it's not only important to make predictions; predictions are often not as accurate as you need them to be, et cetera. So you also need to provide the users of the model with some concrete ways to understand where the predictions come from, so that they can understand why you obtained this number, and also make an assessment themselves of whether things make sense or not, or of which elements are worth exploring more. So we started by using SHAP values, just to give you an idea of the contribution of each independent variable compared to a broad average. But what we did after that was to look at differences in SHAP values, because what we have is a static model. We want to predict, let's say, the food security situation for a given area today, this month. So we use secondary data about that area; of course, this is data about, for example, the last three months, et cetera.
And then we make one prediction. But then we run this model every day, or as soon as some of the input variables change, so that we can also look at how things change over time. It's not an intrinsically dynamic model, though. For now, it's just a static model that you run every day. What's interesting to look at is when the model predicts a change. You can see here that most of our predictions have this kind of steps. Until one day, you get this value, and then from one day to the next you have an increase in the prediction, let's say, so a deterioration of the food security situation. So we use differences in SHAP values to try to understand which variables play a role in this change. This is really what interests the decision makers, so that they can assess whether it's worth exploring this further. The idea is not to provide a number that the decision maker should just take as the ultimate truth, of course, but to give them a tool that says: OK, these are the things that you need to be looking at. For example, this is the variable that has changed a lot and has also had an impact on the prediction. Finally, another very important part of the work was to operationalize this model. The whole work started with this idea in mind; this was the initial idea of the managers at the World Food Program. They wanted a platform like this, where you had this world map and you could click here and there and get estimates of the food security situation. So this is a publicly available website that is updated on a daily basis. When you click on a country, if there is continuous data collection in that country, you will see the numbers coming from, let's call them, actual data. And in other countries, you will see the predictions that come from our model.
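The idea of explaining a step in the prediction through differences in SHAP values between two runs of the same static model can be illustrated with a deliberately simplified case. For a linear model with independent features, the SHAP value of a feature has the closed form weight × (value − background mean), so the difference between two runs reduces to weight × (change in value). The model in the talk is not linear, and the weights, feature names, and numbers below are hypothetical; this is a sketch of the bookkeeping, not of the actual system.

```python
def predict(weights, intercept, x):
    """Toy linear model standing in for the trained model (an assumption)."""
    return intercept + sum(weights[f] * x[f] for f in weights)


def linear_shap(weights, background_mean, x):
    """Closed-form SHAP values for a linear model with independent features."""
    return {f: weights[f] * (x[f] - background_mean[f]) for f in weights}


def rank_shap_changes(weights, background_mean, x_before, x_after):
    """Per-feature change in SHAP value between two runs, largest effect first."""
    before = linear_shap(weights, background_mean, x_before)
    after = linear_shap(weights, background_mean, x_after)
    changes = {f: after[f] - before[f] for f in weights}
    return sorted(changes.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical model and inputs for one area on two consecutive runs.
weights = {"food_inflation": 0.4, "rainfall_anomaly": -0.2, "fatalities": 0.1}
intercept = 20.0
background = {"food_inflation": 12.0, "rainfall_anomaly": 0.0, "fatalities": 5.0}
before = {"food_inflation": 10.0, "rainfall_anomaly": -1.0, "fatalities": 5.0}
after = {"food_inflation": 15.0, "rainfall_anomaly": -0.5, "fatalities": 5.0}

for feature, change in rank_shap_changes(weights, background, before, after):
    print(f"{feature}: {change:+.2f}")
```

In this toy case the per-feature changes add up to the change in the prediction itself, which is the property that lets a step in the output be attributed to specific inputs.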
So basically, the predicted prevalence, which is then transformed into the number of people with insufficient food consumption and the number of people using crisis or above crisis food-based coping. Now, let's go to the challenges and limitations, which is usually, I would say, the most interesting part, also to spark some more discussion on where we could go. One of the big limitations, and also a critique that we received, was: why are you building a global model? We are putting all these different areas together into one box and then training our models using all of this, whereas a local approach could give you more nuance on what's really going on. The main problem is an operational one. We need the model mostly for places where we don't have that much data, so we're not able to train a model only for those places, because these are precisely the places for which we don't have a lot of data. But this means that there are a lot of limitations. We actually had more data than what we used, but most of it was for Yemen and Syria, for example. And if we only used that to make predictions for a sub-Saharan African country, of course, that would not work. So we had to sample the data. We know that our model is less sensitive to local patterns than to more global ones. And then, of course, the transferability to geographical contexts that are not included in the training data is not guaranteed. So this is something to keep in mind while using this kind of model. Another challenge was, as I mentioned earlier, finding open data that were available on a global scale at the same geographical and temporal resolution. The data that we have is all open data, but it's not at the same geographical and temporal resolution. Some is daily, some is monthly, some is weekly.
And some is national, some is sub-national. So a lot of the work was also to put all of this together and find ways to integrate it in the model. And finally, as I said, these predictions are not meant to trigger decision-making, but rather to trigger further assessment of the situation. This is actually something I find an interesting question. Is it just a matter of getting to more and more precise models? Or should it always be the case that, when you deal with these kinds of systems, there is this sort of human step in decision-making? I think there probably always should be, and these models should probably just be used as tools for decision-makers to get some early warning or to trigger further in-depth analysis. Then, in terms of future directions, this is really something where I'd love to hear your input over the next few days. As I said at the beginning, we're using a purely data-driven approach, so we throw in all these variables from different drivers. But is there a way to go about this complex phenomenon and still come up with a quantitative output, but one that maybe has a bit more of a mechanistic component? How can we combine these two approaches, or probably even more than two? I would say that in the field of computational epidemiology, or epidemic spreading, they have so far been able to combine the two a bit more. So can we do this also in other fields, such as food insecurity or other phenomena of this kind? That would be what I would love to do in the future, and I hope it is one of the things on which this workshop can bring more insight: how to combine these two worlds. So, in the interest of time, these were the main things I wanted to tell you. This is all in this preprint, but if you're interested, I would suggest holding on.
Wait, because there's a new version coming soon: the article has finally been accepted for publication, so we're going to update the preprint in a couple of weeks. The methodology has also changed a bit. This was posted about a year ago, and in the meantime, thanks also to the reviewers' input, we have made a few improvements. So this will be available soon. There's also follow-up work that I've been doing with colleagues at ISI Foundation. What I talked about until now is estimating the current situation. But then the question is: in areas where we do have data, such as Yemen or Syria, as I mentioned, where data has been collected on a daily basis for almost four or five years now, can we use these time series to go from just nowcasting to forecasting how the situation is likely to evolve in these places? And one other thing that I'm very interested in, as I said earlier: I was originally trained in physics, and I have my master's in physics of complex systems, so that's really the kind of methodology that has inspired me the most. Recently I wrote a piece with colleagues at UNICEF and at ISI Foundation on the current applications of complex systems science, complexity science, for the most vulnerable, that is, for monitoring and addressing the SDGs. We outlined there a few success stories, but we also discuss a lot of the challenges, limitations, and future opportunities. So this is also something I'm happy to discuss in the next few days. We also organize a workshop every year at the Conference on Complex Systems. The next one is going to take place in Palma de Mallorca in October of this year. So if you are at the conference, I hope to see you also at the satellite workshop.
Okay, thank you very much, and let's have some discussion.
Well, thank you, Lisa, this was great. So I have a question. On the slide where you show this transition in your food insecurity index, can you trace down what happened at that point in time?
Yes. So as I was saying, what happens is that, from one week to the next, what you can see here in the picture tells you which input features are responsible for this change. And here what it seems is that food inflation went up, and in the model this caused a deterioration, so an increase in the number of food insecure people.
Food prices.
Yeah. And at the same time this was slightly counterbalanced by rainfall going more towards normality, but still not enough. So it was mostly about food prices, yes.
Okay.
And this is just an example, of course. You can create this kind of figure for any of the countries where we run the model, and it will allow you to understand why you see that change.
Thank you for the interesting talk. I was wondering if in your model you also consider the adaptive capacity of a society, of individuals or of the society itself. How can they react to this food insecurity? Is it something that you can consider? Because it's also important: they have a shock, but how can they react, whether it's migration or whether they are going to do something else. Would you consider this as well?
No, not really, not explicitly. One of the two indicators that we measure is how they cope with insufficient food availability, but it's not something that we explicitly take into account.
I don't know exactly how you would measure that, but it would be very interesting to explore.
Thank you. I had a question about the question that you ask your informants about food sharing: when they have food insecurity, how often do they borrow food or rely on help from friends or relatives? For a lot of small-scale societies, food sharing is just a normal part of day-to-day life. So how do you tease out whether that is 'I have insufficient food and I need to borrow this or get help', versus a norm in the society that someone who has more, even if it's just a little bit, is going to share with everyone?
Right. I don't know enough of the details to give a detailed answer to your question, but what I know is that when they do this kind of survey, first of all, what I put here is the English translation, the version that is valid for everyone, but each of these surveys is actually done in the local language and customized for the local context. So I would assume that in societies such as those you described, the question would be phrased slightly differently. And what I would also think is that if you interview a large enough sample of your population and you see that consistently everyone has to try and borrow food or come up with different coping strategies, then this would mean that there actually is a problem for everyone, right? But I think the most important point is that these questions are asked in the local language and in the local context, so they probably have different variants.
Thank you very much. Two points. I read that as Fields Medal Manual. I thought great, but anyway, that's an aside. So, a question, and a comment on the question at the back.
Presumably you have periods of time where your prevalence index is reasonably invariant but your underlying factors are variable, right? So that would be in some sense a measure of adaptive capability: the invariance to variation in your primary factors. So I guess that's in the data, and you could look at that.
Right.
But my question connects to Luisha's talk, because this is exactly the kind of thing that Luisha was alluding to at the end as a signal of aggregate behavior. And the question then becomes: does anyone actually attend to any of these things? In other words, you have all these fancy dashboards that you showed us. Does anyone actually care? Do policymakers look at them and make decisions, or is this just a lot of money spent on graphics? I'm just curious to know.
So, how is this used within WFP? There is the website, but from the same data and models an automatic report is also generated on a weekly basis, summarizing all the main results and giving numbers for each country. This is sent every Friday, well, sorry, let's say during the weekend, to the members of the advisory board of WFP, who meet on Monday morning with this information at hand. What I've seen in my time there was that this is seen as a useful tool at HQ level, where you get a bit of an overall sense of what the situation is in the different countries and whether there's any country where we need to take a closer look. And again, just to underline it, these reports and the hunger map are a combination of the data that is collected on a daily basis and the predictions. I would say that right now people still trust the data that is being collected more, which of course makes a lot of sense.
But in general, it's a tool that is looked at at HQ level, whereas the individual country and field offices don't find it as useful, because on a local level they say that they have much more information than what goes into the model and the data collection, which, as I was saying, is done at the level of first-level administrative units, whereas you often need more localized information, right? And things also change more locally. So I think we still need to bridge this gap between different geographical scales. But it seems that at HQ level this is at least looked at on a weekly basis, which for me personally is already a good indication that we are going at least in the right direction. We are not building this just, as you were saying, to build fancy dashboards, but there's definitely more work that needs to be done there.
I think this issue is interesting: what is the right scale to see a particular phenomenon? Because if you are at too large a scale, you can essentially average out what is going on. And I think this is connected to what Luis was saying. Personally, I don't believe what you're saying, so you have to convince me. But also coming to the model part, connecting this to the model, this was sounding a little bit like it was missing this issue of poverty traps, which has been studied, I mean, a part is expert about this. So when is it that the level of calories you take in makes it impossible for you to get a revenue, and then there's a positive feedback so that you fall into these poverty traps?
Yeah, so this is supposed to be integrated into this indicator. When we look at this threshold here, et cetera, this is something that has been developed by food security analysts, and this threshold had probably been...
Is there some evidence of these positive feedbacks? I mean, that you can see in the data?
No, that's not something we really looked into yet. But yeah, I have to explore more.
If there are no other questions, then we have a coffee break upstairs, and maybe we can reconvene.