Hi, I'm Aisha Walcott-Bryant, a research scientist and manager here at IBM Research Africa in Nairobi, Kenya. Of course, I would have loved for all of us to be together in beautiful Addis Ababa, Ethiopia, but I'm very happy to engage with everyone, at least virtually. So today I'll talk about Africa's AI innovations for global impact. To give a little context, I'll provide some background about myself. I was born and raised in Maryland. I always had an interest in computing and digital-physical interactions, and so I pursued an undergraduate degree in computer science at Clark Atlanta University in Atlanta, Georgia. I then went on to graduate school at MIT, where my PhD focused on mobile robot navigation in dynamic environments. Now, while at MIT, I worked on three projects in Africa with amazing colleagues. The first was deploying the very first fab lab on the continent of Africa, and the second was teaching computer science and entrepreneurship. Both of those were in Ghana. The third was developing a computing and educational technology curriculum for schools in Central Kenya. These experiences, together with my academic research, deeply influenced my decision to pursue research abroad and research in the African context. So after I graduated from MIT, I had the opportunity to work in Spain for a few years, where I worked on smart cities projects. And then there was a post that IBM Research was establishing a lab in Africa, and literally the rest was history. So I moved with my family from Spain to Nairobi, Kenya, and joined IBM Research Africa. Now, we're part of a network of global labs on six continents with more than 3,000 research scientists and engineers. We really focus on working with partners and organizations to address Africa's grand challenges, transform society, and have global impact. Africa has more than 1 billion people, over 2,000 languages, and more than 3,000 different ethnic groups.
This rich diversity offers amazing opportunities for addressing complex research questions within the African context that can have global impact. So let's take a look at a few. What is the optimal mix of sustainable energy sources to improve water and food security? How can emerging economies restart productive society after a crisis? And how will climate change affect the way infectious diseases spread? For the remainder of this talk, I will focus on our research in one area, and that is global health. So when we think about health care, we think about health care starting with the individual, then the family, then the community, and upwards to the global scale. And right now, we're all experiencing a global pandemic due to the novel coronavirus. Countries across the world are implementing various interventions and countermeasures to reduce the spread of the disease. At the same time, health care needs that are ongoing or new still require attention. And what does this look like? What are the implications for individuals who need access to health systems for non-COVID-19 issues? For example, on March 27, the government of Kenya implemented a curfew to help curb the spread of coronavirus in the country. And on that same day, my son had an accident. We had to take him to a local health clinic to immediately address the wound, clean it, and get him stabilized. There we had x-rays done, and the doctor noted that he had a broken bone and would need to see an orthopedic surgeon. So we conferred with the doctors to identify a surgeon who could see my son that day. And we took his x-rays and the referral note with us from this health clinic to a hospital on the other side of Nairobi, where my son was to get surgery. On entering that hospital, there was a policy that only one parent could enter. This was to limit the number of people in the waiting areas, again to minimize the spread of coronavirus.
So my husband and I had to decide who was going to wait in the parking lot and who would go in with my son. And I decided to go in with my son. Now, this countermeasure is also being put in place at health institutions across the globe for the same reason. So once my son was checked out and then admitted, we were able to rejoin each other as we waited for surgery in Nairobi. Looking out the window of the room, you could actually see the white tent in which possible cases of COVID-19 could be addressed by health workers. Later on that day, my son had surgery. All went well, fortunately. And then we started to enter the night, when the curfew would officially begin. Meanwhile, my young daughter was at home, fortunately with a caretaker. And we all agreed that we would stay with my son overnight and then rejoin my daughter in the morning. So from this experience, I wondered, what are the implications of COVID-19 interventions and countermeasures for countries that already have a high burden of disease? Thinking about this as a researcher and a mom ties into one of the overarching research questions that we're looking at in the lab. And that is, how can we leverage AI to transform health systems and improve global health? For the remainder of this talk, I will focus on two key areas of our ongoing research in global health. That is, decision support for disease intervention planning and methods to characterize subpopulations to understand health outcomes. This work was done in collaboration with the Bill and Melinda Gates Foundation. So the malaria parasite has been around since prehistoric times. It was originally on six continents across the globe, excluding Antarctica. And over time, various countries put measures in place in order to eliminate malaria in their geography. However, it still persists in some areas. There are very startling facts about malaria. Every 30 seconds, a child somewhere in the world dies because of malaria.
90% of all malaria cases occur in sub-Saharan Africa. And in 2018 alone, over 400,000 people died from malaria. It can be treated with antimalarials, and transmission can be prevented with effective malaria control. Now, malaria policy or decision makers have to take into account a number of factors in order to reach their target, like a reduction in mortality, eradicating the parasite, reducing the prevalence, and so forth. And they have a set of constraints as well, like budget constraints, resources, and supply chains. At the same time, this decision-making process requires malaria intervention models. These intervention models are usually very rich, very complex, and stochastic in nature, which makes it difficult for decision makers to consume them immediately in order to make an effective plan that's optimal for their context. The intervention models also take as inputs various interventions, like you see at the bottom of this slide, including case management, indoor residual spraying, larviciding, and insecticide-treated nets or net distribution. So it turns out this space of all possible interventions is essentially infinite, as any proportion of the population can have any combination of these interventions. Traditional methods for malaria control usually require a gathering of the minds in the space, developing long-term, three- or maybe five-year plans, such as spraying homes every five years, setting targets together, and outlining strategies. And then at the same time, there are multi-drug-resistant strains of malaria that are spreading in Asia. So overall, it still remains an open challenge. Our aim is to look at how AI methods can be used to explore this intervention space. We know that the domain models are rich and comprehensive. Generating samples is computationally expensive. The search space is vast and complex. And AI methods are actually well suited to address these types of problems. So here's the classic example of an agent and environment.
And in this case, the environment is represented by a malaria intervention model, or simulation environment. An agent can perform an action on this environment, get some reward, and then there's some future state of the world. So we started off looking at what the existing interventions in place are, and then what methods we can use to explore or to find new intervention plans. We have a set of actions. In our case, it was insecticide-treated nets and indoor residual spraying. We have a reward function, which is cost per DALY averted, where DALY stands for disability-adjusted life year. And then we have our malaria environment. We use OpenMalaria from the Swiss Tropical and Public Health Institute. You can find out a bit more information in our paper. We're using a stochastic multi-armed bandit approach and Gaussian processes to do inference over the rich, complex search space. The top shows samples that represent the space, and the bottom is interesting. The bottom, which was calibrated for a specific geography in Kenya, shows the intervention that was in place at the time of our work, which is about 55% ITN coverage, that's net coverage, and 70% spraying, that's IRS. And then it shows what the human expert thought was the best decision, and that was about 80% net coverage and 90% residual spraying for that geography. And then we deployed our AI algorithms on this landscape, and we actually found new intervention plans that were more cost effective. You can see the three of them there in the darker red part of the surface. So then we wanted to extend this work to support sequential decision making. What if we vary the plan over time, given that there is generally a multi-year strategy put in place? So here we did multi-step intervention planning, which starts to look more like your reinforcement learning problems.
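As a rough illustration of the single-step bandit search described above, here is a minimal sketch. It is not the method from the paper: it uses plain UCB1 without the Gaussian process, and `simulate_cost_per_daly` is a hypothetical toy function standing in for an expensive simulator run (it does not call OpenMalaria). The coverage grid, the noise level, and the `explore` constant are all assumptions made for the example.

```python
import math
import random

def simulate_cost_per_daly(itn, irs, rng):
    """Hypothetical stand-in for one stochastic simulator run (not OpenMalaria)."""
    # Toy cost surface with its minimum at 30% ITN and 60% IRS coverage, plus noise.
    true_cost = 50.0 + 100.0 * ((itn - 0.3) ** 2 + (irs - 0.6) ** 2)
    return true_cost + rng.gauss(0.0, 1.0)

def ucb1_search(arms, n_rounds, rng, explore=5.0):
    """Return the arm with the lowest estimated cost after n_rounds simulated runs."""
    if n_rounds < len(arms):
        raise ValueError("need at least one round per arm")
    counts = [0] * len(arms)
    mean_reward = [0.0] * len(arms)
    for t in range(1, n_rounds + 1):
        if t <= len(arms):
            i = t - 1  # play every arm once before using the confidence bound
        else:
            # Pick the arm with the highest upper confidence bound on reward.
            i = max(
                range(len(arms)),
                key=lambda k: mean_reward[k]
                + explore * math.sqrt(2.0 * math.log(t) / counts[k]),
            )
        reward = -simulate_cost_per_daly(*arms[i], rng)  # lower cost, higher reward
        counts[i] += 1
        mean_reward[i] += (reward - mean_reward[i]) / counts[i]  # running mean
    return arms[max(range(len(arms)), key=lambda k: mean_reward[k])]

# Each arm is one candidate plan: (ITN coverage, IRS coverage).
levels = (0.0, 0.3, 0.6, 0.9)
arms = [(itn, irs) for itn in levels for irs in levels]
best_plan = ucb1_search(arms, n_rounds=500, rng=random.Random(0))
```

The point of the bandit framing is that simulator runs are spent mostly on plans that look promising, rather than evenly across the whole grid.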
We built a surrogate tree of the malaria model itself using Monte Carlo tree search, and then did selection and expansion using upper confidence bounds at each node. We ended up needing to run some expensive simulations, but obviously not for the full space, and then we are able to find the path through the space that is the most cost effective. It has different types of interventions that can be done at different points in time and are dependent on the actions in the past. So it turns out this approach actually had a faster time to reaching the target. In this case, the target was prevalence, so reducing the prevalence of the malaria parasite to, let's say, 10%. The yellow dots show the multi-step intervention plan, and at the bottom, the X axis shows the cost in millions of US dollars. And then the blue line shows when we applied our single-step intervention plan. So it turns out these multi-step intervention plans can be more cost effective for malaria control. Now, furthermore, the malaria modeling community is a broad community that, as I said, has been around for quite some time, and there are many different models. We want to take advantage of combining multiple models together to provide, ideally, a better description of an uncertain future. So here's an example where we have three different models. The target prevalence is the very dark purple here, where prevalence can go down to zero. So those are the dark purple regions in each of the models. And you can see that they differ; they don't necessarily completely agree. So we use a covariance-weighted averaging method to combine these models together and take advantage of all of them at one time for intervention planning. Now, all of this work is being looked at in a case study. The case study is with the country of Uganda, where we're looking to generate new malaria intervention programs to reduce prevalence and minimize deaths in Uganda.
This is in partnership with the government of Uganda and the malaria modeling community. The target there is that by 2025, they would like to drive the malaria parasite prevalence in all 134 districts of Uganda to 5% or less. On the right, you see the map where the red shows high prevalence and yellow shows much lower prevalence. So you can see that for 134 districts with varying levels of prevalence and different factors in the environment, it's a very complex problem for decision makers. So what should they do for an intervention program that allows them to reach those goals in those districts? Now, even further, this research has been extended in a few ways. First, we are now looking at, and have developed, a few deep reinforcement learning algorithms for policy learning, starting to bring in constrained optimization methods. And then there is a very important part around being able to have explainability of our work, because we work closely with the domain experts and the results need to be understood and consumable by them. We now have a trusted decision platform to help support decision makers in finding optimal and context-relevant intervention plans for these different diseases. This platform allows us to bring together decision and policy makers, AI algorithm developers, and domain modelers in malaria or any other disease. We've deployed this platform, or a portion of it, at reinforcement learning challenges in 2019, including at KDD and the Deep Learning Indaba. And we've presented some of our work at NeurIPS in 2019 as well. Now, the key is that this is replicable and scalable across many different diseases and actually a broader space of interventions and decision support, including COVID-19. And HIV/AIDS, where the target is that by 2030, 95% of those with HIV are identified, 95% of those identified are on treatment, and 95% of those on treatment have the virus suppressed.
And other application areas look at how climate models can inform consistent access to water. So let me talk about this application area of COVID-19. Again, we have these COVID-19 task forces, both in business, in the public sector, and in the private sector. They're trying to do things like contain the disease, limit economic disruption, and reduce mortality, and they're constrained by their budgets, the supplies within the health system, and the number of health workers. And there is, right now, a rich set of COVID-19 domain models actively being developed. Again, these are complex models whose space we want to explore. You see these very common interventions, stay at home, school closing, testing, and social distancing, that are being applied in countries across the globe. So the work that we're doing for malaria control can very much be leveraged to address problems such as a global pandemic and COVID-19. Now, let me talk about the other portion of this talk, in which we're exploring methods to characterize subpopulations to understand health outcomes. If you think about it, once you're able to understand the different subpopulations, you can actually apply targeted interventions for, ideally, a bigger impact. I'll do this with two examples of work that we are doing with the domain experts. The first is in family planning, where we apply discriminatory sub-sequence mining, and the second is in maternal, newborn and child health, where I'll quickly highlight our automatic stratification method. First, family planning. Between 20 and 40 percent of women discontinue a contraceptive method for reasons other than the desire to become pregnant. And more than 214 million women of reproductive age who want to determine when they get pregnant are not using a modern contraceptive. Family planning has been known to reduce maternal mortality risks and to improve child survival through birth spacing.
Contraceptive discontinuation can actually signal that there's dissatisfaction with a contraceptive method. So working with the domain experts, we wanted to explore two questions. What do women transition to when they discontinue or switch between contraceptive methods? And are there any recurrent sequences of contraceptive use and discontinuation across countries? So to do this, we started with a heavily mined data set, from what's known as the Demographic and Health Surveys (DHS) Program. It's been around since about the 1980s, and there have been 300 surveys done across more than 90 countries. They're usually done on an every-five-year basis. Some of the surveys have calendar data, so you have some time-based event data. Specifically, we look at contraceptive calendar data for a five-year period. This data contains episodes, which include the type of contraceptive used and the events that occurred during that month, such as birth, pregnancy, or termination. It also includes discontinuation and the reasons for discontinuation. So here's just a sample of this data, where there's an individual contraceptive calendar at the top. Each entry is an episode, which records, for that specific month, the method that was used by that individual and whether they discontinued. In our case, we're going to map all the types of contraceptive methods to numbers, and I'll use that throughout the remainder of the talk. So given that we had those specific questions on understanding these patterns, and the type of data that we have, which is this time-based contraceptive use, we use sequence mining to identify patterns that are unique to one cohort of women versus another. You can think of methods within the association rule mining space. For example, your items would be the different types of episodes. A sequence is the ordered list of types of episodes.
And then our approach was actually to extend an existing method called PrefixSpan to find discriminatory sub-sequences of contraceptive use. So we proposed discriminatory sub-sequence mining, and it has two key elements to focus on. You have a cohort that we're calling the left cohort, and you have a right cohort. And you can use PrefixSpan to find the proportion of frequent sub-sequences of contraceptive types or episodes within these populations. We call this support left when we compute it for the left cohort and support right for the right cohort. So if we take a sub-sequence and we get the proportions within the left and right cohorts, we can compute something called the lift, which is essentially dividing the support left by the support right. If it's one, then the sub-sequence is about equally represented in both populations and therefore not very discriminatory. But if it's much greater than one, then you can say this pattern is representative of the left sub-population. So we've done quite a bit of analysis and have many results, and I'll just give you a sample here. We used over 95,000 contraceptive calendars from five countries, namely Kenya, Ghana, Nigeria, Ethiopia, and Burkina Faso, based on the interests of our partner. The surveys are Kenya 2014, Ghana 2014, Nigeria 2013, Ethiopia 2016, and Burkina Faso 2010. And on the right, we have a left cohort that contains sub-populations that discontinued their methods because of health concerns. The right cohort is all the remaining women, excluding those who discontinued to become pregnant. The general sub-sequence that represents most of this population, the left cohort, is shown with the items three and zero, or some combination thereof. In Kenya, for example, the support left is more than 30%. And that's true for Nigeria and Ghana and Burkina Faso. And in Nigeria, we show two different sub-sequences that had more than 30% support left.
But we show that only in Ethiopia, these patterns were actually not discriminatory for this population. And discriminatory, again, is when lift is much, much greater than one. In Nigeria, there are issues with the data. There is a lot of reporting of non-use, and so we don't know if it's non-use or just not reporting at all that was then filled in. Another example is from the Kenya DHS in 2014. First, we looked at a left cohort where women discontinued because they got pregnant while using contraceptives, and a right cohort where women discontinued because they wanted to get pregnant. And there, again, we see the common sub-sequence where more than 25% of the women in the left cohort had this item set, or this combination of event sub-patterns. And any woman with this sub-pattern is 3.1 times more likely to be in the left cohort. Some further analysis is shown on the right. By the way, the one shown on the left was for 2009 to 2014, whereas the results shown on the right are just from one year, and that's 2014. So in the left cohort for 2014, women discontinued because they got pregnant while using a contraceptive. And then the right cohort is all reasons except for wanting to get pregnant. And it found a different sub-sequence that was representative of the left cohort. 31% of these women had this sub-sequence, or item set. And anyone with this item set was 172 times more likely to be in the left cohort. So in summary, I talked about how we characterize behavior patterns using discriminatory sub-sequence mining. We actually use the outputs of this method as hypotheses to start to understand causal effects of contraceptive use and discontinuation using causal analysis. We're also looking into, and are starting to develop, predictive methods, using recurrent neural networks, to predict when a woman would discontinue.
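To make the support and lift calculation described in the talk concrete, here is a minimal sketch. It only scores one given pattern (the ratio of its support in the left cohort to its support in the right cohort); it does not perform the PrefixSpan pattern search itself. The episode codes and the two tiny cohorts are invented for illustration and do not come from DHS data, and the function names are hypothetical.

```python
def contains_subsequence(sequence, pattern):
    """True if the items of pattern occur in sequence in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)  # `in` advances the iterator

def support(cohort, pattern):
    """Fraction of calendars in the cohort that contain the pattern."""
    return sum(contains_subsequence(seq, pattern) for seq in cohort) / len(cohort)

def lift(left_cohort, right_cohort, pattern):
    """Support in the left cohort divided by support in the right cohort."""
    s_left = support(left_cohort, pattern)
    s_right = support(right_cohort, pattern)
    if s_right == 0:
        # Pattern never seen on the right: infinitely discriminatory if seen on the left.
        return float("inf") if s_left > 0 else 0.0
    return s_left / s_right

# Made-up calendars: each list is one woman's ordered episode codes.
left = [[3, 0, 3], [3, 0], [1, 0], [3, 0, 1]]
right = [[1, 1], [3, 0], [2, 0], [1, 2]]
pattern = [3, 0]

support_left = support(left, pattern)    # 3 of 4 calendars contain 3 then 0
support_right = support(right, pattern)  # 1 of 4 calendars contain 3 then 0
pattern_lift = lift(left, right, pattern)
```

In this toy data the pattern is three times as common in the left cohort as in the right one, so it would count as discriminatory for the left cohort.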
And furthermore, we've created an interactive dashboard to understand and explore family planning across these many different countries. And the final area that I'll just highlight is in maternal, newborn and child health, and applying automatic stratification to really understand health outcomes in this space. So some background. In maternal, newborn and child health, 99% of all maternal deaths are in developing countries. 47% of all deaths of children under the age of five occur within the first 28 days of birth, essentially among neonates. And there is a 15 times higher mortality rate for children under five in sub-Saharan Africa than in high-income countries. Again, many of these deaths are preventable. So when we started out this work, we actually used the DHS data set, but not calendar data. We took the snapshot survey data and built predictive models to understand the risk of poor health outcomes, really focusing on maternal mortality and neonatal or under-five child mortality, and used those models to find the most important features that represent these different outcomes. And that was because we were focusing on understanding vulnerable populations and what the markers and triggers for vulnerable populations are. Essentially, we're seeing the important features as potential markers for vulnerable populations. And then to get towards triggers, you have to move more towards causality. So in addition, we started to take a different approach to answer that question, and that was around stratification. This was to focus on whether or not we could identify sub-populations of mothers and children who are disproportionately susceptible to poor health outcomes. Let's look specifically at Nigeria's DHS 2018 calendar data on under-five child mortality, in which the rate is 13.1%. That is, 13.1% of mothers had experienced the loss of a child under the age of five.
So, in working with the domain experts, we do some manual stratification to understand under-five child mortality by region. And you can see here that the southwest and south south regions have lower mortality, and the northwest region has more than 17% under-five child mortality. So the big question was, what combinations of household and mother features drastically increase the odds of under-five mortality? And we'll do this in two different ways. We'll look at both who the vulnerable population is and who the protective population is. So we applied our method on this data set, and we found some key characteristics of the vulnerable subpopulation. This includes mothers with more than three births in the past five years, those who reside outside the south south region, and some other characteristics. In this subpopulation, 48% had experienced under-five mortality, which is 6.1 times increased odds of under-five mortality compared to the average. This included over 1,100 of the respondents and 19% of the overall under-five child mortality. So the difference in this subpopulation compared to the average is startling. Now let's look at the protective population. Key characteristics of the protective population are mothers with one birth in the past five years, mothers less than age 40, and mothers residing in a household of size three or larger. In this population, 4.4% had experienced under-five child mortality, which is much lower than the average. It consisted of over 9,000 of the respondents, they had a one-third decrease in the odds of under-five mortality, and they represent over 14% of the under-five child mortality. So now we've looked at both the vulnerable population and the protective population using automatic stratification. So I'll summarize the method that's used and point you to a few references to learn a bit more.
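As a quick check on the figures quoted above, the odds comparison can be reproduced from the stated rates alone. This sketch assumes the comparison is a simple odds ratio against the overall 13.1% rate; the talk does not spell out the exact calculation, so treat it as an illustration.

```python
def odds(p):
    """Convert a probability into odds."""
    return p / (1.0 - p)

def odds_ratio(p_subgroup, p_baseline):
    """Odds in the subgroup relative to odds at the baseline rate."""
    return odds(p_subgroup) / odds(p_baseline)

baseline = 0.131    # overall share of mothers who lost a child under five
vulnerable = 0.48   # share within the vulnerable subpopulation
protective = 0.044  # share within the protective subpopulation

vulnerable_or = odds_ratio(vulnerable, baseline)  # roughly 6.1
protective_or = odds_ratio(protective, baseline)  # roughly 0.31
```

Under that assumption the vulnerable figure matches the 6.1 times quoted in the talk, and the protective subpopulation comes out with odds of roughly one third of the average.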
So you can see here that all of these variables or features, region, wealth index, household size, have a set of values that they're categorized into. And what we really want to do is identify these extreme cases of vulnerability and protection for the survey respondents across these different variables. So we start with all the features and their values being included. Then we randomly select one feature. We look at all combinations of the values for that feature in order to find those with a higher probability of under-five child mortality. And then we mark that sub-population in blue, and the values in orange are excluded from the search. We now condition on the blue, look at another feature at random and all combinations of the values for that feature, and decide again which combination has the most evidence of higher under-five mortality. In that case it was the poorest and the poorer by wealth index, shown in blue, and the others are conditioned out, shown in red. So here we look at household size. As you notice, there are eight different values, making 255 different combinations. So you're wondering, I know, how can you explore this space efficiently? What we're able to do is actually exploit a property of our optimization function. There's a method of scoring within these combinations, and it allows for a calculation of the most anomalous subset, or the subset that deviates the furthest from the average, in O(n) time, essentially making the algorithm polynomial rather than exponential. And you can see the references below. So next steps of this work are looking at how you penalize the subsets and the complexity of the subsets. They can be very vast in terms of all the different feature values and the combinations thereof. So how do you even explain that?
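The efficient search over value combinations described above can be sketched as follows. This is an illustration of the general idea, not the scoring function used in the work: it assumes an expectation-based Poisson score, for which ranking values by their observed-to-expected ratio and scoring only the top-k prefixes is known to find the best subset. The household-size counts are made up, and the brute-force function is included only to show that the shortcut agrees with checking all 255 combinations.

```python
import math
from itertools import combinations

def score(observed, expected):
    """Expectation-based Poisson score; zero unless observed exceeds expected."""
    if observed <= expected:
        return 0.0
    return observed * math.log(observed / expected) + expected - observed

def best_subset_fast(counts, baselines):
    """Score only the n prefixes of values ranked by observed/expected ratio."""
    order = sorted(counts, key=lambda v: counts[v] / baselines[v], reverse=True)
    best_score, best_subset = 0.0, ()
    observed = expected = 0.0
    for k, value in enumerate(order, start=1):
        observed += counts[value]
        expected += baselines[value]
        s = score(observed, expected)
        if s > best_score:
            best_score, best_subset = s, tuple(order[:k])
    return best_score, best_subset

def best_subset_brute(counts, baselines):
    """Score every non-empty subset (2**n - 1 of them); for comparison only."""
    values = list(counts)
    best_score, best_subset, n_scored = 0.0, (), 0
    for r in range(1, len(values) + 1):
        for subset in combinations(values, r):
            n_scored += 1
            s = score(sum(counts[v] for v in subset),
                      sum(baselines[v] for v in subset))
            if s > best_score:
                best_score, best_subset = s, subset
    return best_score, best_subset, n_scored

# Made-up data: household size -> (respondents, mothers reporting a loss).
data = {"1": (100, 10), "2": (200, 22), "3": (300, 30), "4": (400, 60),
        "5": (300, 55), "6": (200, 40), "7": (100, 12), "8+": (100, 9)}
overall_rate = sum(d for _, d in data.values()) / sum(n for n, _ in data.values())
counts = {v: d for v, (_, d) in data.items()}
baselines = {v: n * overall_rate for v, (n, _) in data.items()}  # expected losses

fast_score, fast_subset = best_subset_fast(counts, baselines)
brute_score, brute_subset, n_scored = best_subset_brute(counts, baselines)
```

With eight values, the fast version scores 8 candidate subsets after one sort, while the brute-force version scores all 255, and both arrive at the same highest-scoring group.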
And there are some methods that we're working on in that space. Most importantly, we're bringing in this notion of targeted intervention planning, where we find a sub-population, let's say it's X, and we can have the biggest change in effect from a targeted intervention by addressing the modifiable or actionable characteristics. And third, this work can be directly used in clinical applications, of course, in MNCH, HIV/AIDS, chronic disease management, and again in COVID-19. And for COVID-19, it's really about understanding these different populations that are more susceptible to poor health outcomes based on where you are in the world. So the algorithm, of course, scales across context and geography. And we've, again, developed some interactive dashboards to understand the models, including some partial dependence plots, ICE plots and so forth, to help with understanding these sub-populations and to think through the right types of targeted interventions. And what are the variables that can be changed to improve outcomes for a population? So I talked about two of the ways that we're looking at AI for global health: in the disease intervention planning space, and in characterizing and understanding sub-populations. And none of this work would be possible without collaborations and partnerships. So we work very closely with different organizations, public and private sector, anywhere in Africa, of course, or even beyond, so that we can understand their context. They bring very challenging questions. And we also work with academia. And we do technical engagements in order to show domain experts in those different areas how methods in AI can potentially accelerate progress towards their targets. And this is some broader ongoing work at IBM Research Africa. It's in climate and food security, financial inclusion, healthcare, as you saw, water access, and developing novel algorithms in core AI, or what we call core AI science.
And a key aspect of this is, let's be realistic, these problems are very interrelated. And so what we're starting to explore now is how AI can be used to derive cross-data and cross-domain insight in these different areas. So before I close, I just want to highlight some key activities at ICLR 2020 by IBM Research. There'll be a talk by David Cox, who heads our MIT-IBM Watson AI Lab in Cambridge. We'll have a social hour on AI and COVID-19 research. We'll also have technical demos at our booth, including our family planning demo and our malaria intervention planning demo. I want to say thanks to my amazing colleagues who work with me here in Nairobi, in South Africa, and in our other labs on this very important effort. And in closing, in this talk, I focused on AI methods for identifying and understanding subpopulations, as well as supporting decision making for disease intervention planning. We're all experiencing the COVID-19 pandemic, where we see the interplay of decisions and interventions put in place by one country and their effects on other countries and the people within those countries in unique ways. I see this as a parallel with the work that we're doing in Africa. Because of Africa's rich diversity and complexity, I see that the work that we're doing here and the innovations that we're creating can also be applied around the world for global impact. Thank you.