Can anyone hear me online now? Give me a thumbs up if you can hear me. I think you can. Perfect, thank you. Okay, sorry about the shuffle; we moved to a different room. Maybe that wasn't necessary, because we have a number of people here and a number of people over there, and there are a lot of competing sessions right now. But thanks for joining the session today. We're going to be talking about artificial intelligence and machine learning and how they fit into the DHIS2 ecosystem. We have a few presenters who are going to share AI/ML work as well as data science and advanced analytics. I'm going to do a quick intro on AI/ML and talk a little about how it can apply to DHIS2 and the areas where we work, and then we'll hand over to the presenters. Hopefully we'll have a little time at the end of the session for questions, and maybe ideas as well, because it's a very quickly changing field with a lot happening, and I'm excited to see where it will take us. This slide has no significance other than being a very cute picture that an AI drew.

All right, so what are the opportunities for AI and machine learning in DHIS2? Before I move to the next slide, where I have a few possible answers to this question, do we have any ideas in the audience? I'll repeat what you say for those online. Does anyone in the audience or online have an idea of a good way to use AI or machine learning, or even just data science and advanced analytics, in DHIS2? Any show of hands, any ideas?

Lars: data forecasting, very good. Any others? Yes, interpretation of data points, maybe interpretation of visualizations: not just showing somebody something visually but telling them what's interesting about what they're looking at could be a very interesting use of large language models. Lars has a lot of ideas. Anomaly detection, that's a good one. Any others? Do we have any online? Feel free to put it in the chat. Sorry, it was a little hard to hear. Yes, tuberculosis: X-ray interpretation for tuberculosis detection, absolutely. Image analysis, that's another good one.

Okay, so there are a few ideas here. Let's see what ChatGPT says. I actually asked ChatGPT this question, and it came up with some pretty good answers. They're not perfect, and some of them we've already heard here today, but you can all follow the link when I share the slides. I first asked: how can AI and machine learning be used in DHIS2? We get data quality and cleaning, which goes along with anomaly detection, also on the list. We have predictive analytics, which is forecasting. We have decision support, which is maybe similar to interpretation of visualizations, and that one also appears further down as data visualization and exploration, along with a number of other things.

Then I asked: what about the software development process of DHIS2? How could we apply artificial intelligence, ChatGPT, or large language models there? There are some good ones here. Data pre-processing: that one I'm not entirely sure it knows what it's talking about, but that's okay. Automated testing is a very big one that I think is very much worth exploring: making sure that the software is robust and that we don't have regressions.
That's something that humans are not that great at, right? You can throw a lot of human power at it, but it takes a lot of time and you're going to miss some things, whereas a machine is very good at figuring out all of the possible paths through something. Then there's code optimization, and documentation generation is a very good one: being able to say, all right, we know what this does, now let's describe it and help people figure out what's going on. And obviously bug detection and resolution.

Then I asked it again: what about user training, documentation and support? We'll learn a little more about this later. User training is something that chatbots or virtual assistants could be very helpful for, and then contextual documentation. We'll talk a little about this later, but I think this is a huge area of opportunity. We have generic documentation and we have very configurable DHIS2 systems, and it can be difficult, especially for an end user, to take the generic documentation and their specific configuration, put the two together, and figure out how to do what they want with their system. If you can take that documentation, contextualize it to their use case and how their system is configured, and present that to them, it could be very, very useful. And you can do it in an interactive way, so they don't have to search through a huge document to find what they're looking for but can simply ask the question they want. Troubleshooting and support, and recommendations, I think are good ones as well.

Another one that didn't come up here but is worth exploring is intelligent mapping of configurations or metadata across systems: automatically figuring out that something used in one system is called something slightly different in another. We had a presentation from Pete Linnigan from BAO Systems the other day who used a fairly lightweight model to find names of data elements in two different systems that are similar but not exactly the same, within a certain threshold, and to suggest that that mapping should exist. So if it's called ANC in one and antenatal care in the other, it can say those are probably pretty similar, even though it's not an automatic mapping. There's a small sketch of this idea at the end of this intro.

This is all to say that there's a lot of opportunity here. Obviously some of it is hype, and we're not going to throw everything out immediately and jump on the AI train. But especially with the recent developments in AI and machine learning, and in advanced analytics, which we didn't talk about as much here, there's a lot of opportunity to supplement and support the processes people are using DHIS2 for. So that's all of my intro; I just wanted to give a fun little dive into ChatGPT. We could explore this a little more. Oh wait, I'd have to log in with my account, and that's over there, so I'll do that another time. Maybe think about the next question to ask, perhaps diving into one of these specifically, and we can ask it at the end if we have some time.
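To make that metadata-matching idea concrete, here is a minimal sketch of threshold-based name matching using only the Python standard library. It illustrates the general technique rather than the actual model from the BAO Systems presentation; the names and the threshold are made up.

```python
from difflib import SequenceMatcher

# Candidate data element names from two hypothetical DHIS2 instances.
system_a = ["ANC 1st visit", "ANC 2nd visit", "Measles doses given"]
system_b = ["Antenatal care first visit", "Antenatal care second visit",
            "Measles dose administered"]

THRESHOLD = 0.5  # similarity cut-off, tuned per dataset

def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two metadata names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Suggest, rather than auto-apply, the best candidate above the threshold.
for name_a in system_a:
    best = max(system_b, key=lambda name_b: similarity(name_a, name_b))
    score = similarity(name_a, best)
    if score >= THRESHOLD:
        print(f"suggest: {name_a!r} -> {best!r} (score {score:.2f})")
```

The important design choice is the last step: the tool only suggests the mapping, and a human confirms it.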
But with that, I'll turn it over to our first speaker, from Solid Lines, who is going to present a bit about data science and advanced analytics with DHIS2 data, and then we'll talk more about machine learning and artificial intelligence in the following presentations. Thank you.

Great, I hope the people online can hear me. I'm going to be presenting an integrated architecture for adding analytical power. My name is Sakura Lopez and I'm from Solid Lines. We won't be speaking too much about ML and AI applications, but more about an architectural approach that can allow you to do this type of analysis. First things first: the agenda said that Carlos Tejo would be presenting this. He's one of our lead software architects, but unfortunately he couldn't make it, so I'm presenting on his behalf. I'm a data analyst and BI specialist at Solid Lines, so I will do my best to answer your technical questions afterwards, but I may have to send some of them his way after the presentation.

A little bit about us: we are a small digital consulting company based in Spain, and we do a lot of different projects, from server management to DHIS2 implementation. We have 15-plus years of experience working across 25 different countries, and recently we've really been interested in helping organizations build their analytics platforms. Today we're going to look at some of the DHIS2 analytics capabilities, what it looks like to scale that architecture, the strengths of that architecture, and then some use cases from partners we work with.

First, DHIS2. There are a lot of analytics capabilities in there, as you know, and the functionality is improving with each release. What we've seen with a lot of our partners is that most of their use cases could be achieved within DHIS2. I had one partner come to me and say, "I want you to help me with this Power BI report," only to spend many hours figuring out we could have done it all in DHIS2. We want to prevent that. The key takeaway is that you should have a really strong use case before you pull data out of DHIS2 and visualize it in a BI tool, and DHIS2 is usually the best option when you have all of your data in one DHIS2 instance.

But there are use cases that really push the boundaries of what's possible with DHIS2 in terms of analytics. Some of the things we've seen: people want to build elaborate dashboards for their donors, or a public-facing dashboard; they have really massive datasets; they're analyzing complex indicators with heavy data transformations; they want to combine data from multiple DHIS2 instances; and often they want to triangulate their data with external or unstructured data sources, which can range from chatbot free-text conversations to weather conditions to system logs.

So what does it look like when you want to scale that architecture? As I said, a lot of what people want to do can be done in the dashboards and visualizations, but sometimes people need to pull the data into a BI tool for more elaborate visualizations and complex analytics needs, when they need to do more complex transformations; a side benefit is that you're then able to pull in other data sources. However, there are limitations to this approach of importing data directly from DHIS2 into the BI tool.
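For context, "importing directly" usually means the BI tool calling the DHIS2 analytics Web API on every refresh. A minimal sketch of such a pull (the server URL and credentials are placeholders, and the dimension IDs are from the public demo database):

```python
import requests

# One analytics pull, the kind of request a BI refresh repeats many times.
BASE_URL = "https://dhis2.example.org"   # placeholder instance
AUTH = ("admin", "district")             # demo-style credentials

params = {
    "dimension": [
        "dx:fbfJHSPpUQD",     # data element UID (ANC 1st visit in the demo DB)
        "pe:LAST_12_MONTHS",  # relative period
        "ou:ImspTQPwCqd",     # org unit UID (demo root org unit)
    ],
    "displayProperty": "NAME",
}

resp = requests.get(f"{BASE_URL}/api/analytics.json", params=params, auth=AUTH)
resp.raise_for_status()

payload = resp.json()
columns = [h["column"] for h in payload["headers"]]
for row in payload["rows"]:
    print(dict(zip(columns, row)))
```

Multiply calls like this across many users, dashboards and refresh schedules, and you get exactly the server stress discussed below.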
When that's the case, the organization may want to consider integrating a data repository, and this is for more advanced, complex analytics needs. The data repository is comprised of data lakes and a common data warehouse, which we'll delve into in a bit. But first, let's talk a little about the difference between pulling data directly into a BI tool and using this data warehouse solution.

With a BI tool, you go through various steps to do some kind of data analysis and then facilitate data use, and you can do this with an array of tools. We had a presentation on Superset yesterday, and there's Tableau and Power BI. However, when you have these more advanced analytics use cases, you may want to use the data repository for the first two steps: connecting and transforming the data, and modeling the data. A key point here is that even if you use a data repository, you're still going to pull that data into the BI tool to visualize it and share it with your data users. If the organization decides to implement this integrated analytics system, it can be done on-premise or using cloud computing platforms like Azure and AWS. On-premise is a lot of work and quite expensive; within our own implementations at Solid Lines we use cloud platforms, so that's what I'll be speaking to mostly throughout the rest of the presentation.

So as you saw on the slide, both BI tools and the data repository can do similar things: they can both connect to, transform and model the data. So why would you choose to incorporate a data repository? There are several reasons. The first, which we've seen quite a few times, is that when you have people pulling a lot of DHIS2 data, with a lot of API calls really often, it can stress the server, which you want to avoid if you have a large DHIS2 instance with many users across multiple countries. It also lets you rapidly access huge amounts of data. It reduces dependency on the availability of data sources: for example, Facebook will prevent you from pulling historical data from, say, five years in the past, but if it's in your data warehouse, it's there for you to analyze until you remove it. It also provides a single point of truth for multiple users, especially around your common dimensions. What we mean by this is that datasets people use often in analysis, like your org units, can live in the data warehouse; people can pull them over and over again without having to redo the transformations themselves. It structures data in a way that makes it easy for BI tools, open source or commercial, to analyze. And lastly, it really helps an organization scale.

With regard to the cloud platform, scalability means being able to store and manipulate significant amounts of data from any source, structured or unstructured. But the cloud platform also helps with transparency and consistency. You get core data consistency for those common dimensions, like your org unit tree or a calendar table. You get a collaborative approach to data transformations: for the final integrated dataset, it's easy to see which transformations took place to produce it. It also facilitates data security, since your data is distributed across multiple nodes.
And if there are ever failures, you can store those failures for further analysis, and there are ways to put safeguards in place for restricting sensitive data and PII. It's also cost-effective: especially when you end up with huge amounts of data, the storage cost is quite low. And it eliminates silo costs, in the sense that people aren't performing the same transformations over and over; one person can do the transformations and store them in the DWH, and then they're there for all of your analysts to pull into their BI reports.

So what does this architecture look like? First you have your data sources, and the raw data is extracted through a pipeline into the data lake repository. The data lake repository has three data lakes; this is typically best practice for the infrastructure. The first is a raw data lake, where your data sits in its original format. Then you have a curated data lake, where the data has undergone transformations; it's also where you can remove sensitive information and PII so that users cannot access it later in the warehouse. And finally you have a staging data lake, where your data is stored as an analytic sandbox in which your data scientists can perform deep data analysis. (There's a sketch of this three-lake flow just below.) You can then store those integrated datasets in a data warehouse, which we often call the DWH; again, that's done through a pipeline that pushes the datasets into the DWH. The DWH provides the structure for that data, usually through dimensional data modeling, and once the data is modeled it can be fed into a BI tool to visualize it.

So what types of data do we see being put into this repository? Most often it's chatbot data, social media data, RapidPro, demographics, weather conditions, server logs, and free-text conversations, which can range from comments on YouTube to Facebook Messenger. There's also a lot of data from the systems themselves: DHIS2, FHIR, Moodle, Matomo, all kinds of things. The possibilities are really endless.

We're going to look at a couple of use cases from one of our biggest partners, Population Services International. This is from a project called DISC, and they're pulling data from over 30 countries and multiple data sources: chatbots, social media, FHIR, system logs. In this particular part of their Power BI report they're looking at ways to facilitate follow-up for people doing self-care in reproductive health. This is another example where they're pulling data from multiple DHIS2 instances to look at quality of health care within the private sector across many countries.

So once you have this architecture in place, how does an organization engage with it? In the data lake repository, your data engineers are typically configuring the system: the structure of the data lakes, the DWH, and the pipelines that really make it all work. Your data scientists are typically doing the big-data and deep-data analysis within the data lake repository, where you have all that unstructured data. Within the data warehouse, your data analysts are querying that routine data, but they're also using it to develop reports in Power BI, so that eventually the data users can gain insights and make informed decisions.
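To make the three-lake flow concrete, here is a minimal PySpark sketch of the raw-to-curated-to-staging hops. The paths, column names and PII list are assumptions for illustration, not the actual Solid Lines pipeline.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("curate-dhis2-extract").getOrCreate()

# 1. Raw lake: the extract exactly as it came out of the source system.
raw = spark.read.json("s3://lake/raw/dhis2_tracker/2023-06-15/")

# 2. Curated lake: standardize types and strip PII so that nothing
#    downstream (staging, DWH, BI tool) ever sees it.
curated = (
    raw.drop("firstName", "lastName", "phoneNumber")      # remove PII columns
       .withColumnRenamed("orgUnit", "org_unit")
       .withColumn("event_date", F.to_date("eventDate"))  # normalize types
)
curated.write.mode("overwrite").parquet("s3://lake/curated/dhis2_tracker/")

# 3. Staging lake: the analytic sandbox where integrated datasets are
#    prepared before a pipeline pushes them into the DWH.
curated.createOrReplaceTempView("tracker_events")
staged = spark.sql("""
    SELECT org_unit, event_date, COUNT(*) AS events
    FROM tracker_events
    GROUP BY org_unit, event_date
""")
staged.write.mode("overwrite").parquet("s3://lake/staging/tracker_daily/")
```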
So what is driving the analytical power behind this type of architectural approach? In our opinion, it's two key things. The first is the design principles behind the architecture: the data lake structure, where you have three different lakes for your raw data, your curated data and your staging data; common dimensions and dimensional data modeling, which we didn't get to go into much but is one of my favorite topics, so if you have questions afterwards I'm happy to talk more about it; and reusability, in the sense that the pipelines that are built can be reused, and the data itself can be fetched over and over from the data warehouse because it's there for multiple data users. The second is the technology side: Hadoop, specifically HDFS, a highly fault-tolerant distributed file system, and Spark, which provides the computational power for transformations over large distributed datasets. Both are open source.

For our final use case, we'll look at usage analytics, also from PSI. They were interested in understanding how people navigate the system, how visualizations are accessed, and whether the dashboards are really being used. There are capabilities within DHIS2 that let you analyze favorite views and top favorites, but if you pull the system log data into a data repository using the data lake and DWH approach, you get much more dynamic analysis capabilities and a lot more flexibility in what you can analyze. The possibilities are really endless: you can look at which API endpoints are used, latency times, the devices and browsers accessing the system, and conflicts and errors during synchronization. (A sketch of this aggregation follows after the takeaways.)

So what does this look like in practice? The data is pushed to the data lake, where it's stored in a file system, and you can push it as often as you like; you set the schedule, and in this case the log data is pushed daily. This is what the raw data looks like in the raw data lake. It then goes through the other two lakes, getting curated and put into the staging lake, and then it's modeled in the DWH; this is what the model looks like once it's brought into Power BI or another BI tool. Once you have those core transformations and the modeling, you can end up with reports that give you very intricate, detailed analytics. For example, you can see the number of users in the system, the size of the data, and the number of API calls; you can look at how many calls succeeded versus how many failed; and you can get a better sense of who is accessing the system and how often, even down to which dashboards and which visualizations they're looking at.

So, key takeaways from this presentation. This architectural approach is definitely an investment in time and resources, and it requires buy-in at the leadership level; this is not a project-based approach, and you really need to do it at the organizational level. But as an organization's data size and complexity increase, you really need to think about the limitations of connecting the data directly to your BI tool. The architecture can be on-premise or in the cloud, but cloud solutions typically provide better functionality for scaling and security. And the powerful aspects of this architecture are driven by the data lakes and the DWH, which facilitate scalability and data processing because they provide high data availability and fault tolerance.
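Returning to the usage-analytics example for a moment: once the curated log data is in the lake, the aggregation step itself is small. A sketch with an assumed log schema (ts, username, endpoint, status, latency_ms); the real fields depend on how the logs are shipped.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("dhis2-usage-analytics").getOrCreate()

# Curated access logs; the schema is an assumption for illustration.
logs = spark.read.parquet("s3://lake/curated/dhis2_access_logs/")

daily_usage = (
    logs.withColumn("day", F.to_date("ts"))
        .groupBy("day", "endpoint")
        .agg(
            F.count("*").alias("calls"),
            F.countDistinct("username").alias("users"),
            F.avg("latency_ms").alias("avg_latency_ms"),
            F.sum(F.when(F.col("status") >= 400, 1).otherwise(0)).alias("failed"),
        )
)

# One fact table in the DWH now answers: which endpoints are hit, how often,
# by how many users, how slowly, and how many calls fail per day.
daily_usage.write.mode("overwrite").parquet("s3://dwh/fact_api_usage/")
```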
But you can also have really powerful analysis of the data, provided you make sure it's properly modeled. So thank you so much.

Does anyone have a quick question while we switch speakers? Great presentation, thank you. I'll repeat your question for those online. Actually, I'm going to have you repeat it so I don't miss anything. "Since you are pulling data from different sources, if you have the same data, say population data or stock data, from two different sources, how do you reconcile which data to take and which to ignore?" So you're saying that if you had two different datasets that are both around population data and you have to decide between sources: I think that would really depend on the organization and what they decide is the most credible data source, or what's most appropriate for their use case, for example the granularity of the dataset and whether that's useful for what they want to visualize in the end. So, credibility, and then the aspects that influence how you can apply the data, like how granular it is. "So basically the data analysts make the choice of which data to use if it's coming from multiple sources?" Yes; in our implementations we typically have a discovery phase where you look at the various data sources and try to understand how the data can be pulled into the system and how it would be visualized. But in my opinion, you always want to understand what the end goal is and what actually needs to be visualized based on user needs. When people look at that data, what kind of actions are they supposed to take? That should drive which data source you choose, because the data source needs to be able to support the decisions they need to make at the end of the process. Go ahead.

"Hi, Sean Brumman from HISP South Africa. Really just a comment. I wanted to say thanks very much, this is fantastic. We as an organization prioritized data science about two or three years ago, and if we had seen this presentation then, it would have been incredibly helpful. So for anybody wanting to go down this journey, this is a fantastic cheat sheet; we spent a lot of time learning lessons that would have been saved if we'd seen this presentation. And for those who want to see what we've done: we've built a production-level national system that uses machine learning to predict staff attrition in South Africa, which we're covering in the next session." Very cool, thank you.

I think we do have to move on to the next speaker, unfortunately, sorry about that. There was one question in the back, so maybe you could talk to him after. We'll move on now to Kayla, who is going to talk to us about applying machine learning with low code.

Okay, hi everyone. I want to thank the last speaker as well; that was a great presentation, about something we're also struggling with and hoping to improve on in the future, and it's very relevant to what I'm talking about. As mentioned, I'm going to talk about an approach we're using for machine learning with our DHIS2 Tracker data. I'm a technical advisor in health informatics and data science at FHI 360, where I work on a large HIV project.
So Austin kind of stole my thunder: I also asked ChatGPT some questions. Who here has used ChatGPT before? Yeah, a lot of people. AI tools are becoming more and more available, and something I hear people say often, especially people working in analytics, is "AI is going to take our jobs." So I thought I would test that hypothesis and asked ChatGPT to write me a little intro to my presentation, and what it came up with was actually a knock-knock joke. So I'm going to tell the first AI-generated DHIS2 knock-knock joke; please help me out if you're familiar with the format. Knock, knock. Who's there? Tracker. Tracker who? Tracker Capture your attention! DHIS2 is here to help you track and manage your health data.

I hear some clapping, but not necessarily a lot of laughing. I would argue this is not a very good joke, and I included it because I think it's an example of the limitations of AI. If you take away one thing from this presentation, it's that while we can automate machine learning, the most important part of the process is everything else that goes into it. We still need to collect our data, clean our data, and format our data, as the last presenter mentioned, in a way that can be analyzed with machine learning. And then we need to take the results of the model and apply them to something that actually matters. So the automated machine learning part I'm talking about is really a very small piece of the puzzle.

With that context, let me go a little into our approach. In general, this is the supervised learning process, if you're familiar with machine learning. We start with data acquisition: we get data from the source and try to understand it. Next, we do data cleaning and something called feature engineering, which is essentially deciding which parts of our dataset we're going to use to try to predict our outcome. We do the actual modeling, and then we deploy the model to new data and put it in the hands of the people making decisions based on that prediction. (A small Python sketch of the modeling step comes after the demo.) The way we're doing this is with a combination of DHIS2 and software available through the Microsoft Power BI platform, and I'm going to give you a demo of it. I have some more detailed slides, but I don't think I have enough time to talk through all of them, so if you're interested in more information, feel free to reach out to me; here I'm going to give you a practical application of how we're using this.

This is one of our use cases: HIV in Lesotho. Lesotho is a country in Southern Africa, and it's estimated that around 290,000 people live with HIV there; it's thought to have the highest prevalence of adult HIV in the world. We run a community-based project in Lesotho in 12 community councils, and we work with participants at high risk of acquiring HIV: female sex workers, men who have sex with men, transgender people, and priority populations. We collect data for our community-based project using DHIS2 Tracker, and this is just an example, from a mobile phone, of the kind of information we collect. You'll see we have a number of program stages, and many of the stages are collected by different people. When a client is reached by our project, they first get a risk assessment from a peer educator. They're then offered HIV testing services, and an HTS lay counselor provides those services.
If they test positive, they work with a peer navigator for treatment, and if they're interested in prevention via pre-exposure prophylaxis, they work with a PrEP nurse. So all of these different stakeholders use our system to collect data. We've been using the system since around 2021, I think, but we actually back-entered data, so we have about three years of data in the system right now.

So why did we want to use machine learning in this context? Some of you are probably familiar with the 95-95-95 goals, which is quite a mouthful, but essentially they're global goals aiming to ensure that 95% of people living with HIV know their status, 95% of those who know their status are on treatment, and 95% of those on treatment are virally suppressed. And this is what that looks like for Lesotho; they're actually doing a fairly good job. So we wanted to ask: can we use machine learning to prioritize HIV testing and reach the undiagnosed? We're using this with a subset of our population; the idea is to look at people we've reached but who haven't been tested, for a number of different reasons. Ideally we would follow up with all of them, counsel them, and try to get them to come in for testing. But if we don't have the resources to do that, how can we prioritize that time-intensive follow-up for those most at risk of being positive?

So now I'm going to give a quick demo. I hope this works. Okay. I just want to highlight some features of the approach we're using. We do not have a data lake bringing all our data into the model; that would be great, but instead we go directly to the BI platform. This is Power Query in Power BI dataflows. And I want to point out that this is our data model, and we've parameterized it: that means you can just change this information about the system, refresh the model, and bring in data from any DHIS2 instance and program. We work in very many countries, was it 35? I think we have 18 countries with DHIS2 trackers right now, so we wanted an approach we could replicate. A lot of you probably know it's sometimes very challenging to get data out of DHIS2, so we parameterized this model so we could easily replicate it across countries.

Once you save your data model, this is what the system looks like, and I want to show you how easy it is to apply the automated machine learning. You just go to machine learning models right here; you can see I already have a trained model. If I want to add a new model, I press "Add ML model". I choose the table I want and then my outcome; I have a lot of data here, but I'll scroll down to the bottom, choose test results, and press next. All right, just getting to my next tab. Okay. Once you press next, it takes a minute to load because it's analyzing what that column contains, and then it gives you options for the kind of model you can run. You'll see regression is grayed out here: it knows this is not numeric data and not appropriate for regression modeling. So I'm going to choose binary prediction. It has you choose your target outcome: I know the result I want the system to predict is "positive", and that's how it's stored in my dataset. And then it just has you specify a match label and a mismatch label.
So instead of "not positive", I'm just going to put "negative", because everyone who's not positive in our dataset has tested negative. And then this is really the important part: it lets you select which features you'd like to use to try to predict your outcome. So again, it's not a magic system that can automatically understand everything about your data; this is where the human element really comes in. I'm not going to train this model right now, so let me just select one thing. The last thing you do is give your model a name and select a training time. You can have the model train itself for up to six hours; I've tried that before, and it never actually takes six hours, though maybe with more data it would. After you press "Save and train", and I think this is one of the really powerful parts of AutoML, it gives you a really good training report: some basic model performance statistics, a confusion matrix, which I won't go into here, and you can see visually what the top predictors of your outcome are, if it loads, and then dig down for more information about each predictor, which is really helpful for understanding how the model makes its calculations.

You're also able to fine-tune your probability threshold. Again, I promised not to go into statistics, but this is incredibly important depending on the kind of outcome you're looking at. Testing HIV positive is a fairly rare outcome, but we don't want to miss anyone, so we want to ensure our recall is quite high and we identify everyone who is positive with our results. I'm happy to go into more detail with anyone who's interested, but I'm cognizant of time, so I'll just briefly go through some results. We applied the model, with a probability threshold of 0.35, to around 8,000 people reached by the EpiC project, and it predicted that a subset of them could be positive. We haven't fully tested all of this yet, but we've reached a subset of the population identified by the model, and we were able to identify new positives at a higher rate than our regular testing. We were able to identify some new positives, which I think in any case means this was a successful pilot.

There are a number of limitations to this approach; I could talk about the limitations alone for 15 minutes. There's limited ability to fine-tune the model; changes to the model take significant time; Power Query, while very powerful, is also very slow in some cases; and it's difficult to replicate the model via program rules. Power BI AutoML gives you a lot of detail about how the model is built, and it's actually based on scikit-learn, so it could be replicated in Python, but it's not simple enough to replicate directly in a DHIS2 program.
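Since AutoML here is scikit-learn under the hood, the core of the demo, training a binary classifier and then choosing a probability threshold that favours recall, translates directly to Python. A toy sketch on synthetic data; the features, model choice and sizes are assumptions, not the AutoML internals.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for ~8,000 clients with risk-assessment features.
rng = np.random.default_rng(0)
X = rng.normal(size=(8000, 6))
y = (X[:, 0] + rng.normal(scale=2.0, size=8000) > 2).astype(int)  # imbalanced outcome

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = GradientBoostingClassifier().fit(X_train, y_train)

# The key move from the demo: score probabilities, then pick the threshold
# yourself instead of accepting the default 0.5.
proba = model.predict_proba(X_test)[:, 1]
pred = (proba >= 0.35).astype(int)  # 0.35 trades precision for recall

print("recall:   ", recall_score(y_test, pred))     # miss as few positives as possible
print("precision:", precision_score(y_test, pred))  # the cost: more follow-up visits
```

Lowering the threshold from 0.5 to 0.35 catches more true positives at the cost of more follow-up visits, which is exactly the trade-off described above for a rare outcome you cannot afford to miss.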
So, just back to ChatGPT briefly. I wanted to give it a chance to redeem itself, because its joke was not very good, so I asked it to write an ending to my presentation, and I actually kind of liked it, so I'll read it here to finish us off. It said: "In conclusion, the combination of Power BI, Microsoft AutoML, and DHIS2 Tracker can enable advanced analytics with minimal coding. While automation plays a role, our human understanding of data sources, systems, context, and model evaluation remains essential. Let's embrace these tools, harness the power of human intelligence alongside machine learning, and unlock new realms of innovation. Thank you for joining me today, and may your data-driven journey be filled with endless possibilities." Thank you, everyone.

Thank you very much, Kayla. I think the point you made a couple of times is really essential: your predictions and your analysis are only as good as the data you base them on, and making sure you have quality data is not an automated process. You can use automation to help with it, but it's still a very human process, and making sure you're training on the right data is very important. I'm sure many people have questions for you, but we don't have too much time; you've seen what she looks like, so come up and speak with her afterwards. Next up, we have a very interesting presentation on documentation for DHIS2 and how to use some advanced machine learning and artificial intelligence to enhance it. I'd like to introduce Eirik from our tracker team.

Yeah, I trust you can all hear me. There have been three great speakers presenting today, and they all mentioned ChatGPT, so you can probably guess what I'm going to talk about. Okay, hopefully I'm not sharing to the other room now. Oh yeah, there you go. For those of you who don't know me, my name is Eirik. I'm a developer on Tracker, part of the core development team here at the University of Oslo. In addition to being a developer, I'm also a project lead on a project where we use DHIS2 to track climate emissions for the private sector and businesses in Norway. So for the last two and a half, almost three years, I've both developed and used DHIS2. I know your struggles; I know your pain points.

Today I'm mostly going to talk about LLMs, or large language models as they're called. The term itself is pretty much unknown; you usually know them through ChatGPT, or just GPT. But before we go that far, we need to make sure everyone's on board with what LLMs actually are. And I say LLMs and not ChatGPT because there are multiple; it's not only OpenAI, and there's much more to choose from. So what are LLMs? LLMs are basically just huge, complex, advanced AI models, the same kind of AI models we're getting used to when we scroll social media or buy products on Amazon. The only difference is that these models are trained on a lot of conversational data, billions and billions of pieces of text. The model analyzes the text and can actually get its context, and that is the revolutionary part compared to just having autocomplete on your iPhone or whatever.

ChatGPT has gotten a lot of media attention in the last couple of months, and usually it paints a dark and gloomy picture of how we're not connected anymore and students don't learn anything. I brought two samples, both from just the last hour before making this presentation, so there's a lot of dark and gloomy stuff out there. The first quotes OpenAI founder Sam Altman saying that learning is going to be a bit different. I read the article, and they paint this picture of students not having to learn anything and not needing to go to school. And of course, that's not the reality.
The other one is actually a pretty fancy one from Tecto Dano, a famous Norwegian technology magazine: the very first all-AI church service. You can see the screen up there; the reverend was created by AI, the sermon was a 40-minute service basically written by ChatGPT, and over 300 people attended. Almost as many as the annual conference itself.

All right. I've talked about LLMs analyzing text and actually getting its context, and I'll give you a live demo of what we've done: we've tried to improve the documentation of DHIS2. Before we go there, show of hands: how many people have opened the DHIS2 docs in the last year? That's pretty many. Typical LLM use cases are things like creating SEO content, content that does well on Twitter and Google; writing essays and long texts; and translating languages. We've taken a different approach, and this is just a brainchild of mine. It was very cold in Norway, a lot colder than it is now, and the benefit of the cold is that you get a lot of hours to think at your computer. What I did was use GPT and take the docs and split them up. There's some technical jargon behind it that isn't really important, but I'll show you what it is. Hopefully you can see it up here.

So this is just using the GPT model, and you can ask it pretty much anything related to DHIS2. We'll try: "What is a data element?" Once you write it, we check a vector database in the backend, try to match the context of the user query, fetch the relevant context from the documentation, and send it to OpenAI to rephrase, so it reads like a person responding to your question. (A toy sketch of this retrieval flow follows below.) We can also do this for more complex tasks; that's one of the benefits of LLMs, that they can break down pretty complex issues and give them to you in a format that's personalized for you. Some people have been talking about how to use R with DHIS2, so we can ask: "Can you give me five reasons to use R with DHIS2?" It will go through the documentation on how to use R and what R is, and give you five good reasons, or at least five reasons. We can also rephrase it; say we want to print this and actually show it, we can ask: "Can you show me it in a table?" so you get it back in the exact format you want. This is, of course, calling OpenAI, so it takes some time, but there you go: it comes back as a table you can display to others. This doesn't always make sense, of course, but the point is having the option to do it.

Another thing about having your own separate application: if we go to the OpenAI site and just use GPT-3, and say, as a developer, I want to check out the new tracker endpoints, which are no longer new, and a shameless plug that you should move on from the old ones, and ask "What are the new endpoints in the tracker?", GPT apologizes. Of course it doesn't know. It knows a lot about DHIS2 in general, but the new tracker endpoints are so new that it hasn't been trained on that data yet.
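That gap is what the retrieval flow described earlier fixes: split the docs into chunks, match the user's query against a vector store, and let the model rephrase the best chunk. A toy sketch, assuming the official OpenAI Python client; the model names, the chunking and the in-memory stand-in for the vector database are all assumptions.

```python
import numpy as np
from openai import OpenAI  # assumes the official OpenAI Python client

client = OpenAI()  # expects OPENAI_API_KEY in the environment

# Two toy chunks standing in for the split-up DHIS2 documentation.
chunks = [
    "A data element is a fundamental building block of DHIS2, defining "
    "what is being recorded, for example number of ANC visits.",
    "The new tracker API exposes endpoints for tracked entities, "
    "enrollments, events and relationships, replacing legacy endpoints.",
]

def embed(texts):
    """Embed texts; the embedding model name is an assumption."""
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

chunk_vecs = embed(chunks)  # in a real app: precomputed and kept in a vector DB

def answer(question: str) -> str:
    # 1. Embed the question and retrieve the most similar documentation chunk.
    q = embed([question])[0]
    sims = chunk_vecs @ q / (np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(q))
    context = chunks[int(np.argmax(sims))]
    # 2. Let the chat model rephrase the retrieved context as the answer.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # model name is an assumption
        messages=[
            {"role": "system",
             "content": f"Answer using only this documentation:\n{context}"},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content

print(answer("What are the new endpoints in the tracker?"))
```

Because the relevant documentation is supplied at query time, the base model never needs to have been trained on it.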
So what should we do? Should we wait for GPT-5 or whatever to come out and actually be trained on it, or should we provide the data ourselves? So let's ask the same question in here: "What are the new endpoints in the tracker?" Sorry for my spelling, I'll try again. It analyzes the context of the question and, hopefully, returns the five new endpoints. Thank you.

I'll also very quickly show you how to do some reporting and analysis of data through GPT. I know not everyone likes to do it this way, but it's more to give you an idea of how easy it is to get started with these kinds of things. I have a visualization in the Data Visualizer app; this is from the Sierra Leone demo, basically ANC 3 coverage for the last 12 months. So I choose that same visualization, I won't enter a custom prompt, I'll just say "generate a report from this data", and it gives you a report. It starts off describing the data, saying this is data for the last 12 months, and it also wants to give you a huge overview of all the data. So it's pretty good, but it's not good enough. How about we do the same thing and provide a custom prompt? Let's say "include a title, key insights and actionable points", and generate. I see that I'm running out of time, so I'll just try to go through it quickly. Here you have the same visualization, and the only thing we pass in is basically what we get back from the API; we don't do any alterations. And here you have some key insights. Ah, sorry; once again, I tried to read your lips, but I'm not that good at it. All right: it goes through and says it's for the last 12 months, June 2022 to May 2023, which districts it covers, and the district-wise performance; it tells you the minimum, the maximum and the average; and it also gives you some actionable points. Say you're a district facilitator or something: suddenly you can get these actionable points to send out to all your facilities, and the actionable points are actually pretty good. This is of course live data, so I'm not really sure what it says back, but it's things like monitoring performance and investigating data quality issues, and it's not only saying that you should investigate; it's saying that in Pujehun, a specific district, there's something that might be wrong. This is not replacing the great stuff the people before me have talked about; it's more about giving an indication, and a really easy way of seeing what can be done.

These use cases are of course not viable yet; they're not ready for production, they're not ready for anything. But I'll tell you this to conclude: if you were to build this app the old-school way, with just code and no AI, does anyone want to guess how long it would have taken? I can tell you now: this is under three hours of development. It doesn't take a lot of money, it doesn't take a lot of effort, and it's really simple to get started. I don't know what all the use cases are, but in true DHIS2 fashion, I'm just expecting you to go wild with it. Yes. All right, I'll end there.
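The reporting demo boils down to prompt assembly: the app forwards the analytics payload, unaltered, together with the user's instruction. A minimal sketch, reusing the same assumed OpenAI client; the model name and message layout are stand-ins.

```python
from openai import OpenAI  # the same assumed OpenAI Python client as above

client = OpenAI()  # expects OPENAI_API_KEY in the environment

def generate_report(analytics_payload: dict, custom_prompt: str) -> str:
    """Send the raw analytics API response plus the user's instruction."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # model name is an assumption
        messages=[
            {"role": "system",
             "content": "You write short reports about DHIS2 analytics data."},
            {"role": "user",
             "content": f"{custom_prompt}\n\nData:\n{analytics_payload}"},
        ],
    )
    return response.choices[0].message.content

# e.g. with `payload` being the /api/analytics.json response for ANC 3 coverage:
# print(generate_report(payload,
#                       "Include a title, key insights and actionable points."))
```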
Thank you, Eirik. We are wrapping up now, so it'll be time for the coffee break in a minute, but I did want to touch on something a few people have said. Eirik mentioned education in particular, and people saying that AI is going to destroy learning; there's actually a pretty good recent talk by Sal Khan of Khan Academy arguing that it's the opposite. Conversational artificial intelligence in particular can give everyone a personalized tutor that understands their context, helps them do their job or learn something new, and helps them understand it better. So, being able to evaluate a visualization in DHIS2, but also being able to ask, "With the configuration of my DHIS2 instance, how do I do this?" If you have a thousand users, you would need a lot of people to answer that question, or a lot of time spent training them. So I think training, documentation and analytics within the public health context and within DHIS2 is a big area of opportunity for us. Thank you all for joining; everybody is around if you want to keep talking AI, machine learning and advanced analytics. And thank you to the presenters again.