But let's get started. So the plan is this: the last part we added, and we've actually moved that to tomorrow, so ignore the third bullet point there. I'll give an introduction to the different approaches for integrating tracker and aggregate data and why it's useful. Then Claude will give an introduction to the new Apache Camel tool for doing tracker-to-aggregate that the interoperability team has developed, and hopefully we'll have time for questions for him. The TB experiences from Pakistan will be a separate session tomorrow. So why is this useful to think about? First, when we're doing data analysis of our DHIS2 data, it's often useful to be able to compare data coming from the aggregate and the tracker data models together. Say you have an immunization registry in tracker and your monthly stock reports in the aggregate data model, and you want to compare the immunizations with the stock data, for example. So for analytics, it's useful to bring together different health programs or health components. We also see in a lot of places that with tracker, it takes time to get full geographical coverage. You often have, maybe even for years, tracker in some facilities or districts and aggregate in others, and you need a way to look at the data together in one place to get the full picture geographically. Often we have tracker and aggregate running in different DHIS2 instances, and if you want to look at the data together, you typically need to move it from tracker into aggregate. And finally, which has become very clear in the last couple of years with COVID: with these huge, large-scale tracker implementations, there are performance issues with tracker analytics as you reach millions or tens of millions of enrollments and events.
So these are the three approaches we have for linking tracker and aggregate data. The first is that if you have your data in the same database, you can pull program indicators, aggregate indicators, and data elements into the same analytical objects in the Data Visualizer, in one visualization, or make separate charts and put them together in a dashboard. Second, you can use aggregate indicators to combine program indicators and aggregate data elements, for example. And third, you can actually export your tracker data as aggregate values and save them in the aggregate data model. I'll speak briefly to the first two, and then the rest of the session will focus on the third, which I think is what most people think of when we talk about tracker and aggregate together. Just a few examples here: we have, for example, COVAX data, combining program indicators for vaccine doses given with aggregate AEFI reporting in one chart. We have part of a dashboard where the case-based cause-of-death data sits together with aggregate numbers on the top causes of death. We also have the possibility, like I said, to combine the aggregate and tracker data models in aggregate indicators, which is what you typically need for doing coverage based on tracker data: you need aggregate population data together with your service data coming from tracker. But this is also useful if, as I mentioned, different geographical areas use aggregate and tracker. You can combine the numbers you want to look at in aggregate indicators to get full geographical coverage within the same indicator. Also, if you're transitioning from aggregate to tracker, you might have five years of historical data in aggregate, and then you start using tracker and have your recent data there.
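As a sketch of that aggregate-indicator approach: in DHIS2 indicator expressions, a program indicator can be referenced with I{uid} and an aggregate data element with #{uid}, which is how the two data models get combined in one coverage indicator. All names and UIDs below are hypothetical placeholders, not anything shown in the session:

```python
# Sketch of an aggregate indicator combining tracker and aggregate data
# (coverage = tracker service data / aggregate population data).
# All UIDs here are made-up placeholders. In DHIS2 indicator expressions,
# I{...} references a program indicator and #{...} a data element.

PI_CHILDREN_IMMUNIZED = "Uq9implPIa1"  # hypothetical program indicator UID
DE_POPULATION_U1 = "Uq9popDEaa1"       # hypothetical aggregate data element UID

indicator = {
    "name": "Immunization coverage <1y (tracker + aggregate)",
    "indicatorType": {"id": "Uq9pctTYPE1"},  # hypothetical "percent" type UID
    # Numerator: count of immunized children, from the tracker data model
    "numerator": f"I{{{PI_CHILDREN_IMMUNIZED}}}",
    "numeratorDescription": "Children <1y immunized (program indicator)",
    # Denominator: population under 1, from the aggregate data model
    "denominator": f"#{{{DE_POPULATION_U1}}}",
    "denominatorDescription": "Population <1y (aggregate data element)",
}

print(indicator["numerator"])    # I{Uq9implPIa1}
print(indicator["denominator"])  # #{Uq9popDEaa1}
```

A payload like this could then be POSTed to /api/indicators; the same pattern covers the other cases mentioned, such as summing a tracker program indicator and an aggregate data element across geographical areas.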
You can then combine them in an aggregate indicator to look at the time series across that transition. But the focus in this session will be on taking your tracker case-based data and producing data values for the aggregate data model in DHIS2. Like I said, the focus recently has been on performance, because of these huge COVID tracker implementations that have struggled with tracker analytics. But it's also important if you're thinking from an overall health information system perspective, where you have existing routines for routine reporting, you're implementing tracker in a specific domain, and you want to somehow bring the two together. Also important to keep in mind is how this is actually implemented in the country. If you're only using tracker, you have your tracker instance, and that's where this will happen. Some places might combine tracker and aggregate in one instance; perhaps you have, for example, one parallel instance focusing on one program where you do both. But what is increasingly the norm, and what is recommended, is to have separate instances for tracker and aggregate: you keep your routine reporting in one place, and if you're implementing large-scale tracker, you set up a separate database for that. In terms of the different approaches I've talked about, only the third one, actually extracting your tracker data and saving it as aggregate data values, is usable in that third scenario with separate instances. So this is the one we'll focus on. Why would you want to do this? I've already touched on some of it. One thing is that, taking into account the whole health information architecture in the country, you want a place where you bring the data together, both your tracker and your aggregate.
Same with phasing in tracker geographically: you want a way of combining the data during the implementation. I touched a bit on data use and analytics. But a key limitation with the tracker data model at the moment, in terms of analytics, is the lack of dimensionality of the data. If you're setting up program indicators to aggregate your tracker data, to get counts of children in an immunization program or confirmed malaria cases, you can make program indicators for this with age and sex disaggregations, for example. But there are no categories in the same way as we have with aggregate data. There are ways of using option sets, et cetera, but in particular when you want to put aggregate and tracker data together in the same table, it doesn't really work, because the way you disaggregate tracker data and aggregate data is different. If you bring everything into the aggregate data model, you can have confirmed malaria cases under five using a category, together with data collected as aggregate data with the same age/sex disaggregation, in the same table. And finally there's performance, which I'll come back to later as well. So there are some challenges with this. It can be a bit complicated. There is no built-in functionality in DHIS2 for actually taking the tracker data and saving it as aggregate data values. The last slide will be a bit on the plans for building that into the core; you'll have to wait until the end before I reveal that. One thing is that you need a tool for actually moving the data. But you also need to map your tracker metadata to your aggregate metadata, so that whatever tool you're using knows which program indicator is linked to which aggregate data element and category. And that second point will, at least for the time being, still be the case once we have functionality in core for doing the actual data transfer.
As soon as you have two DHIS2 instances, the third challenge is that you need to keep your org units in sync as well, which could be a big or small problem depending on the implementation, how many org units you're dealing with, and whether there is already some service set up for syncing them. So this is an attempt to describe the data flow from tracker data being collected until you're able to present it in the Data Visualizer on a dashboard as aggregate data elements. We start with tracker data coming in and being saved in a few different tables in the database: for enrollments, for the actual data element values, for the tracked entity itself. Then there is the tracker analytics process, which, I always forget whether it normalizes or denormalizes the data, makes the analytics queries against it more efficient. But it's still not actually aggregating anything; it's just making the data more efficient to query against. So when we define program indicators, the example here being counting children under one year given BCG doses, the query against the tracker analytics data still has to actually count the rows in the database. Even though we've run tracker analytics, we don't have that number pre-calculated; the counting happens when you request the program indicator data. In terms of performance, this is the step that is problematic in these huge tracker implementations. Once we have our program indicators defined, we can extract the program indicator values from the API as a data value set, which is the format DHIS2 uses for aggregate data, and we can import that again as aggregate data elements. Then we can run analytics on the aggregate data values and use them for producing visualizations, maps, dashboards, et cetera. So that's the A-to-Z of the process we're going through with this tracker-to-aggregate integration.
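That export/import round trip can be sketched against the DHIS2 Web API, which serves program indicator values in data value set format from /api/analytics/dataValueSet and accepts aggregate imports on /api/dataValueSets. The base URL and UIDs below are made-up placeholders, and this is an illustration of the flow, not the tool itself:

```python
# Sketch of the export/import round trip described above: read program
# indicator values from the analytics API as a data value set, then import
# them as aggregate data values. The instance URL and UIDs are placeholders.
from urllib.parse import urlencode

BASE = "https://dhis2.example.org"  # hypothetical instance
PI_UID = "bcgU1doseP1"              # hypothetical program indicator UID
OU_UID = "OuRootAaaa1"              # hypothetical org unit UID

def export_url(pi_uid: str, period: str, org_unit: str) -> str:
    """Build the analytics query returning PI values in data value set format."""
    params = [
        ("dimension", f"dx:{pi_uid}"),
        ("dimension", f"pe:{period}"),
        ("dimension", f"ou:{org_unit}"),
    ]
    return f"{BASE}/api/analytics/dataValueSet.json?{urlencode(params)}"

# The response is a data value set; after remapping dx to the mapped aggregate
# data element (per the PI-to-data-element mapping), it can be POSTed back to:
import_url = f"{BASE}/api/dataValueSets"

print(export_url(PI_UID, "202301", OU_UID))
```

This is also the step where the counting actually happens on the server, which is why the later discussion is all about batching and scheduling these requests.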
Of course, the export/import step relies on having a mapping between the program indicator and the data element, the category option, and the attribute option if you're using that, as well as the org units. And if you then change anything in your tracker or aggregate configuration, you need to make sure this stays synchronized. Depending a bit on how the tracker and aggregate systems are implemented in the country, there are a few options. If you only have one tracker instance, you would typically do this either because you need it for performance reasons or because you want to use some of the dimensionality in analytics. If you have separate tracker and aggregate instances, there are two options. One is to do the export and import across the two instances: you do the extraction of the aggregate values from your tracker instance and then import them into your aggregate database. But what we're generally seeing as the best approach in most cases is to first do the aggregation within your tracker instance, so that you actually have your aggregate data in the tracker instance, and then do a plain aggregate-to-aggregate transfer later. A couple of reasons for this: you get some of the benefits of having your aggregate data in the tracker instance, in terms of analytics performance and the additional options for analytics. And you also avoid, at least in the first step, dealing with the org unit issues; you separate this into one step where you don't have to deal with org units. Then of course you still need to do that as you move the aggregate data later. The disadvantage, in addition to dealing with org units later, is that you then need to have your aggregate metadata in two places and keep those in sync. So I'll just say a little bit about the less technical, more implementation-oriented side of this.
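A minimal sketch of that second, plain aggregate-to-aggregate step, assuming hypothetical instance URLs and a shared code scheme between the two instances. Using idScheme=CODE on both sides is one way to cope with the two instances having different UIDs for the same metadata, which only works if the codes are the part you keep in sync:

```python
# Sketch of the aggregate-to-aggregate transfer described above: pull an
# already-aggregated data value set from the tracker instance and push it
# to the separate aggregate instance. URLs and the dataSet/orgUnit codes
# are placeholders; matching on CODE assumes codes are synchronized.
import json
from urllib.parse import urlencode
from urllib.request import Request

TRACKER = "https://tracker.example.org"    # hypothetical tracker instance
AGGREGATE = "https://hmis.example.org"     # hypothetical aggregate instance

def pull_request(data_set: str, period: str, org_unit: str) -> Request:
    """GET the aggregated values already produced in the tracker instance."""
    q = urlencode({
        "dataSet": data_set, "period": period, "orgUnit": org_unit,
        "children": "true", "idScheme": "CODE",
    })
    return Request(f"{TRACKER}/api/dataValueSets.json?{q}")

def push_request(data_value_set: dict) -> Request:
    """POST the same payload to the aggregate instance, matching on codes."""
    return Request(
        f"{AGGREGATE}/api/dataValueSets?idScheme=CODE",
        data=json.dumps(data_value_set).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

print(pull_request("IMM_MONTHLY", "202301", "NATIONAL").full_url)
```

The org unit mapping problem mentioned above shows up exactly here: whatever identifier scheme you pick, both instances have to agree on it for the org units in the payload.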
If any of you have attended the tracker academies, you might recognize the figure on the right there. This is about integrating your tracker data with an aggregate reporting system like an HMIS; to actually do that integration, you need tracker-to-aggregate. But there are a few challenges. One thing we're seeing is that when you have pre-existing routine HMIS monthly reporting forms and a tracker program covering more or less the same area, the tracker program is typically not complete. If you think of immunization, you might get all your vaccinated children, immunized under one and above one, from tracker. But the monthly reporting form on immunization would typically also have other information: community outreach sessions, maybe AEFI, maybe your stock data. So even though you have tracker and it can partially replace your routine aggregate reporting, it's often not complete. The other thing, which I already talked a bit about, is that you might not have the same geographical coverage for tracker as you do for the HMIS, so maybe you have to do both in parallel. There are also all the decisions that need to be made about how you do the data transfer from tracker to aggregate. How often should you do it: daily, weekly, monthly? Should you update data for the last three months, only the last month, the last year? In terms of harmonizing tracker and aggregate, many countries want a way of locking aggregate data after a certain period, but then, if you know you have updates to your case-based data, how do you align that? Similar questions arise around data quality. What if you have a process for reviewing the quality of your tracker data, and you also have a routine of generating your aggregate reporting from tracker? How do you harmonize those? You might make updates to your tracker data after you've produced your aggregate data.
What if, when you're looking at your aggregate data, you realize there must be something off with your tracker data? How do you go back and sort that out? Then there's the big question of completeness and timeliness of reporting, which is a key data quality metric in the aggregate domain. But how do you know when your tracker data is complete? Then there is the question of data access and ownership. You might have one set of users entering your tracker data. What happens when this is aggregated and becomes part of the routine aggregate reporting flow? Who owns that data? Is it the people entering the tracker data? Is it the people responsible for the aggregate data? And how do you make sure people actually have access to the data they should have access to, in particular when you're dealing with two different instances? It's also important to keep in mind that if you're transitioning from aggregate reporting to case-based reporting, you would typically need a transition period where you do both in parallel for a while and compare the numbers you get in through tracker with what you're getting through aggregate. They will probably never be exactly identical, but comparing the numbers and looking at the discrepancies needs to be part of the discussion on when you're ready to end the parallel aggregate reporting and rely fully on your case-based data. Then, as I've touched on several times, there's the whole issue of keeping the configuration in sync, which you need to take into account. Also, at the moment, like I said, we don't have this fully built into DHIS2, so you need a tool outside DHIS2 for doing the actual migration. Someone needs to be able to configure that, maintain it, and keep it up to date, in addition to DHIS2 itself.
And there may be changes to DHIS2 that you need to take into account, et cetera. Some of this will improve in the next few versions as more of it is built into core, but it won't go away completely. Just a quick note on the metadata packages you've probably heard about earlier in the week: we're trying, as far as possible, to include the mapping between the aggregate and the tracker metadata there. So, for example, for the TB surveillance package and the aggregate TB packages, there is already a mapping based on codes from tracker and program indicators to the aggregate data needs. And with that, I give the word to Claude, who will talk about the tool the interoperability team has developed for the data migration. Can you hear me? Yes, hello everyone. Yes, I'm Claude. I'm the software engineer who contributed to this tool. I think it was Shatura from HISP Sri Lanka who started it, and then I took over. So yes, the tracker-to-aggregate tool, T2A we call it for short. You see the sign T2A here: that's the Java application we use to pull, to export, the program indicators from DHIS2 and push them back as data value sets. It's to the same instance, okay? Not across different DHIS2 instances; the same instance. I mention Java because it's important: you need to have Java installed on the machine where you're going to run this application. And you see this funny icon here, the clock, because it's a batch job. It's going to run every so often; of course, you can configure the schedule. One of the reasons you'd want to use this tool is to avoid this state. Can I get a show of hands: who has ever gotten a timeout from a dashboard? Oh, wow, okay. Much more than expected. So: to avoid this state. The endless spinning circle on the dashboard, an unhappy user, a crying user in this case, and probably an overloaded DHIS2 server, which can have an impact on other operations.
So yeah, we should try to avoid that state as much as possible. Now, before you run this tool, there are a number of DHIS2 configuration steps you need to follow. As Olaf said, some metadata packages have these steps already configured. I tried to come up with a clever acronym for this, but I really couldn't, so if you have some way to abbreviate it and make it easy to memorize, let me know. You need to: create a PI attribute; create aggregate data elements for the relevant program indicators; map the relevant program indicators to the aggregate data elements you created in the previous step; and finally assign the program indicators to a PI group, which could already exist or which you could create just for this scenario. Steps one, two, and three are already set up in some metadata packages, so that's work you may not have to do. Okay, let's go very briefly through each step. I'll try to keep this short, because I know it's late in the afternoon and concentration is not at its best. The first step: create a program indicator attribute. You go into the Maintenance section of DHIS2, click on Attributes, create attribute, and create a PI attribute; this attaches an attribute to the program indicator. I hope everyone is more or less familiar with this screen. That's the first step, very easy. Just take note of the description, because we're going to see it in the next screen. I wish I had a laser pointer, but I don't. The second step is to create an aggregate data element. This aggregate data element will be used later on to map to the PI, the program indicator. In this case, we created an aggregate data element with the code C-V-C-E-I-R-A-G-G-P-P-L first dose. Keep this code in mind, because it's going to be used in the next step. And yes, the domain type should be aggregate; I'm not sure if it makes a difference if you set it to tracker instead, but it should be aggregate, I guess. Next step: map the program indicator to the aggregate data element. As you can see here, we are referencing the code, the same code from the previous step. Very important: use the code, not the name. And the final step in the DHIS2 configuration is to assign the program indicator to a group. This could be a new group you create or an existing one. You will see later on that we reference the group ID from the tracker-to-aggregate tool. And those are the four steps. I'll repeat: some of these steps are already done for you in the metadata packages. All right, let's get into the specifics, the logistics of how to run this tool. You go to our DHIS2 GitHub page and type T2A; there's the link over there, actually, but if you want, type T2A and the first result that comes up is the repository, so just click on it. Go to the releases page and download the latest Java archive, the jar. You can see that guy there, by the way, the contributor. Once you've downloaded it, and assuming you have Java installed on the machine where you're going to run it, you can run it in this manner. I hope everyone can see the parameters from there; let me know if it's hard to see. This is the minimal set of parameters you need to run the tracker-to-aggregate tool: the DHIS2 API URL, which is the endpoint for the DHIS2 web API; the API username and password the job will run as, which will probably change to an API token in the next version, because that's the recommended way to authenticate against DHIS2; and the org unit level.
This is the level of the org unit hierarchy for which the tracker-to-aggregate tool will export the program indicators. In this case, we're exporting the program indicators for the third level of the org unit hierarchy, which could be, I guess, the facility level. Same goes for the periods: for which periods the program indicators will be exported. It can accept multiple periods; they can be absolute or relative, and they need to be comma-delimited. And finally there's the PI group ID, the program indicator group ID we saw earlier; we need to take note of that group ID, and all the program indicators in that group will be exported. And voila, that's it. That's how you run it. Some other optional parameters: this is a scheduled job, okay, it's going to run every so often. With that in mind, you can schedule the time it's going to run. I think this one is midnight, but my cron is not that good. Anyway, there's a website where you can convert times to cron expressions; I found that very useful. You can also kick off the job manually if you need to, by hitting the application over HTTP via a web browser. And you can also specify the address the job is listening on. Very importantly, please don't leave this application exposed to the outside world. It needs to be behind an HTTP gateway like NGINX, for example. Don't leave it exposed, because some very nasty things can happen otherwise, like someone just kicking off the job again and again and overloading your server. Now, yes, overloading: this job can take a very long time to run by default. We do that on purpose, actually, so that we don't overload the DHIS2 server or kill it. So by default, it can take a long time to complete.
It depends, of course, on the number of tracked entity instances you have, how complicated the PI expressions are, how many PIs you have, and how many periods and org units as well. So in order to reduce the run time of the job, we have provided a parameter called org unit batch size. With this, you can reduce the chattiness, the network communication going on between the tracker-to-aggregate tool and the DHIS2 server. Less network communication usually means a faster run time. Here you can see an example using an org unit batch size of one: that means we're exporting a PI for one org unit at a time. And here we have an org unit batch size of two: that means we're exporting program indicators for two org units on every iteration. You can raise this as much as you want, but be careful, because you can kill the server like this. So please don't try this on production first. Test it, see that it scales well, and then you can run it on prod. Kind of the same thing, but for periods. Yes, I forgot what this slide is about. Yes: for periods, by default, split periods is set to true. That means it's going to export a PI for each period, one period per network request. If you set it to false, it's going to batch the periods into one request. Again, you would use this to optimize, to reduce, the run time. As I said, be careful: test it first, make sure you don't kill the server, and once it scales well, run it on prod. There are a few more parameters; I'm not going into them, because I don't want you to doze off, but you can find the documentation in the readme file in the T2A GitHub repo. And there's a blog post as well: on the DHIS2 blog, I explain what I explained here, but better and in more detail. Okay, that's it. Thank you.
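To illustrate the trade-off behind the org unit batch size and split periods parameters just described, here is a rough sketch of the request arithmetic. This mimics the batching idea, not the tool's actual implementation:

```python
# Illustration of how the batching parameters discussed above trade request
# count against request size. This is not the T2A tool's code, just the idea.

def chunk(items, size):
    """Split a list into batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def request_count(n_org_units, n_periods, ou_batch_size=1, split_periods=True):
    """Number of analytics requests needed for one run over one PI."""
    ou_batches = len(chunk(list(range(n_org_units)), ou_batch_size))
    # split_periods=True sends one request per period per org unit batch;
    # False batches all periods into a single request per org unit batch.
    return ou_batches * (n_periods if split_periods else 1)

# 1000 facilities, 12 monthly periods:
print(request_count(1000, 12, ou_batch_size=1, split_periods=True))    # 12000
print(request_count(1000, 12, ou_batch_size=50, split_periods=True))   # 240
print(request_count(1000, 12, ou_batch_size=50, split_periods=False))  # 20
```

Fewer, larger requests mean less network chattiness but more work per request on the server, which is exactly why the advice above is to test the batch sizes before running them against production.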
So only one slide left, which is the roadmap for doing some of this within the DHIS2 core. The plan is that the backend functionality for doing the tracker-to-aggregate data transfer, basically what the tool you just saw is doing, is, according to Lars, planned for 2.39. That's the backend functionality; when there will be a UI for it is to be confirmed. And the plan is that configuring these mappings between tracker and aggregate will be done through a new metadata object in DHIS2. So you have configurations that are metadata objects, which you can import into and export from DHIS2: self-contained elements. The benefits of doing this within the core are, first of all, that it runs internally, so it should automatically be more performant, because you're not going via the API, over the network. It also means you can schedule these tracker-to-aggregate jobs as part of the DHIS2 scheduler, where you also schedule analytics runs, et cetera. It also makes it possible to run this automatically in parallel: what Claude was talking about with batch sizes and doing more at the same time can be handled within this feature in DHIS2. The plan is to have some auto-partitioning of the program indicators, doing the same again with the batches: if you have a configuration with 200 program indicators, the system will take care of separating those into smaller jobs. Finally, it means that dealing with authentication, et cetera, will not have to be done outside of DHIS2; that will be part of the configuration. And this functionality in DHIS2 will support both doing this within one instance and doing it from one instance to another, so you have the possibility of doing it across instances in one operation. Then you would need the credentials for your target DHIS2 system. That's the plan. Any questions about any of this? If you're still alive?
Really excited for this, because I've been doing similar stuff, so it's exciting to see it getting a proper tool. The last time I asked, there wasn't any support for attribute option combos in the T2A tool; it was only splitting by category option combos. Do you know when that's going to be available as part of the tool? Yes, I was actually going to implement it this week, but I was busy at the conference. So it's probably going to be next week. Thank you. I totally understand the benefits of combining tracker with aggregate data. But my question would be: do you see a benefit in using this just for converting tracker data to improve the speed or efficiency of the dashboards? Yeah, I think this is relevant if you have large-scale tracker and you don't have any aggregate system you're integrating with. I think this is still useful, because pre-aggregating the data in the aggregate analytics is more efficient than computing the program indicators from tracker analytics each time. And you would run it off-peak hours? Yes, you would run it off-peak hours. But as was said, the advantage is that you run it once; you're not exporting the PIs every time you load the dashboard. And then, actually, that's one slide I should have added: on the dashboard you would then load the aggregate data element instead of the PI. So doing this tracker-to-aggregate transfer means you're only exporting a program indicator once per org unit and period, instead of every time someone opens a dashboard or chooses that program indicator in analytics. Yeah, you talked about mapping data elements, but how are we dealing with the category option combos? How is the mapping applied for category option combos?
Yeah, so that's actually in the program indicators: one of the built-in attributes of the program indicator is called category option combo for export, and there's also an attribute option combo for export. So that's part of the program indicator. Whereas the attribute defining the data element you're exporting to is not built into the program indicator, so that you need to set up as a custom attribute. There is a reason for that, but I don't remember exactly what it is. A couple of questions from Zoom. Could you talk more about how Apache Camel was used? I mean, for the user it's not really going to make a difference in this case. But yes, we use Camel to help with the splitting, with scaling up, with performing the operations in parallel. For example, you have the split pattern, the split operation, in Camel, and that's actually very useful because it transparently runs the operations in parallel instead of in sequence. So instead of you writing that in plain Java code, Camel does it for you. But it's all hidden away, so the user, the administrator, is not going to know about it. Cool. And the other question was: can you please upload your slides to the schedule? Yes. Anyone else? Predictors? John: yes, you can do this with predictors. We don't know how it scales. You know, tracker-to-aggregate has been done many times, with many variables, in many places; it's not new, and we have multiple solutions. One is what you demonstrated, and there are other things. The good thing is that because it runs in the backend and sends the data as a batch: so what we did was, we didn't want to use the custom attribute to create the mapping, so we used a predictor to create the mapping, because in a predictor you can select where you're getting the data from.
And you can also say which data items are the output, including category option combos and all the different things; you can define that. So we used that as the mapping, and then from the external service, the script, we pointed to it to get all the data and pushed it through in exactly the same way. I'm just saying it could have been used. One of the things I found with predictors is that the performance was not so good, so that's why it was good to do it outside. But for storing the mapping, it might still be a good solution. Yeah, and that was the reason: like John says, it is possible within a predictor to specify a program indicator as the data source and specify a target data element that you save the output into. And the reason we started doing this outside was that the performance wasn't good enough initially. There have been a lot of performance improvements to predictors in the last couple of versions, so it's possible that it would also scale to bigger volumes, but that's not something we've tested for this purpose, as far as I know. But I think, yes, you don't have to create the attributes, but then you need to create one predictor for each data element, so it's not effortless, and you still need some way of doing the mapping. Speaking of the mapping, I should also mention that there is an app on the App Hub for helping you generate the program indicators. When you have age disaggregations or sex disaggregations, instead of repeating those program indicator expressions again and again, Pete has made an app, available on the App Hub, that helps you avoid manually repeating the generation of the program indicators. Yeah, it's called the Program Data Set Connector app.
So you can find it on the App Hub, use it to set up the metadata mapping, and it then creates the program indicator groups. So the output of that basically plugs into the input of this tool quite nicely. So yeah, final question? Okay, then I guess we're done.