So, every year we do this session, and we keep doing it because there are always new things to talk about and new things to show off. But throughout the course of the year, I am also constantly reminded that nobody remembers that these things are possible in DHIS2.
Who are you, Scott?
Who am I? Oh, I'm the analytics product manager, and I also support a lot of different implementations doing different things. I'll give an example. A couple of weeks ago, I was sitting in a meeting and someone from a ministry that will remain nameless said, "Oh, well, we had to build an app because you couldn't do that calculation in DHIS2." And I said, "No, you could have done that calculation in DHIS2. In fact, you could have done that calculation two years ago in DHIS2." They just didn't know. And, you know, kudos to us for making a platform that's super easy to build an app for, I guess. But their solution was not to read the user docs, but just to default to making an app. There's an interesting research perspective there, but it is kind of discouraging, and that's just one example of something that happens all the time. It's discouraging that we keep improving these functionalities and making DHIS2 better at calculating things, and people are stuck with the concept of what DHIS2 was able to do five years ago. It's way, way more powerful than that now, obviously. So we do this session to have one opportunity to update you all, but more importantly, this is now being recorded. So I have a YouTube video that I can forward to folks when they say, "Can DHIS2 do this?" or "I'm thinking about building an app because I think DHIS2 can't do this." It's more for posterity than anything else. But I'm glad you guys are here to observe it.
How do I make this go?
Yeah, you can use the arrow keys once the cursor is in the box.
OK, great. So the first point here is that I'm pleading with you to not be afraid of predictors anymore. I think I'm preaching to the choir in this room. But everybody who's watching this on YouTube later, please don't be afraid of predictors. Essentially, if you're not familiar, what a predictor does is allow you to run a calculation as a kind of job. It produces a new value, and that value is saved as a data element in DHIS2. That data element can then be used in every other way that you use a data element. It can be put into a chart. It can be put into an indicator. It's just normal data element stuff. When predictors were originally developed by Jim a while ago, they were a very powerful tool at the time, but they did run into some performance issues. People were saying, "Oh, wow, I can make a predictor for every single data element and run that job and it'll just magically work." And what we realized was that there were some scalability issues. The thing to remember now, though, is that technology continues to improve. Jim continues to write great code, and he has improved predictor performance by something like 140 times, which is...
14,000 percent.
Yeah, 14,000 percent. So those issues that you had four years ago, they don't exist anymore. Well, at least we're pretty sure, because no one's telling us that those problems still exist.
You can find new issues this year.
There are other issues, sure. So the point is that performance has really dramatically improved, and that needs to be known and understood. Stop living in the past. Start using predictors, because they'll actually work. And the other thing is that we really want to hear when people are struggling. So don't suffer in silence.
You have our emails, Scott's and Jim's, and you can also reach out on the Community of Practice. If you just Google "predictors DHIS2" you'll get about 20 links to the Community of Practice, and half of those links are where Jim and I have basically answered the same question 10 times, "How does this work?", with a very detailed, long response. So there's plenty of information out there. Of course, there's everything in the user documentation, which is quite thorough now. There are also quite a number of YouTube videos on how predictors work. If you like the sound of my voice, you can watch those YouTube videos, because it's me explaining how predictors work over and over again. So there are plenty of resources. Just please don't suffer in silence. And if you do run into issues, email us.
So now we're going to jump right into some use cases for predictors. How we're going to do this presentation is that I'm going to give you some context, and then Jim is going to explain all of the math and science behind it, the technical stuff.
More or less.
Yeah. So I feel like I'm just the hype man for this, more or less, the Flavor Flav of predictors.
You're the front man of the band.
Yeah. I stopped paying attention to pop culture a long time ago, so that's my most recent reference. That's my only reference.
OK, seriously. So what do you do? And this is not even a rhetorical question. Let's say you have your population data. Maybe you import it from our cool new Google Earth Engine population data. But that population comes in from 2020. The only thing that we're able to get from WorldPop is the population estimates from 2020. And let's say you know that you have a 2.2 percent growth rate in your country, which is quite standard for most African countries, somewhere around 2.2 percent. How do you then
project or predict that data to be applied to 2021, 2022, or 2023? Because you're getting other data in for today, for this month, and you want to use that population as a denominator in some of your indicators, your coverage indicators. How do you get that 2020 data into 2023? And remember that you have all of that data disaggregated by sex and age. So you have under one, one to five, six to 10, and then you also have male and female. So you have quite a disaggregated amount of data for 2020, but you want to have it applied with a 2.2 percent growth rate to 2023. Does everybody kind of get the scenario? Great. OK. So that's the first situation, and Jim is going to explain how you can actually do that.
The other one, which is becoming increasingly common, is this: let's say you're using DHIS2 to track commodities, and you have your commodities disaggregated by brand and dosage. So you have, like, Paracet, but you have 10 different producers of it. I guess Paracet is a brand. I'm not sure. Anyway, you have some kind of antihistamine or whatever, and you have 10 different brands of that antihistamine with lots of different dosages. But you're tracking just antihistamines, and you need to be able to count the number of facilities that are stocked out of antihistamines, or maybe by a disaggregation of the specific brand or dosage of antihistamine. Does this make sense? OK, so, Jim.
So I'm going to talk about some of the stuff that Scott promised. Really, I am. But we just put this slide deck together in the last whatever. I'm also going to talk about new prediction features that provide multiple predictions with the same predictor definition. I mean, it has always done multiple predictions in terms of different org units and periods and all that. But there are new ways that it does multiple predictions, and I'll talk about them.
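To make the population scenario above concrete, here is a minimal Python sketch of the arithmetic only. It is not DHIS2 predictor syntax. It assumes the 2.2 percent rate compounds annually (the talk does not say whether growth is compounded or linear), and the function name, age bands, and sample figures are hypothetical.

```python
# Illustrative arithmetic only; not DHIS2 predictor syntax.
# Assumption: the growth rate compounds once per year.

def project_population(base_value: float, annual_growth_rate: float, years_ahead: int) -> float:
    """Project a base-year population forward by compounding an annual growth rate."""
    return base_value * (1 + annual_growth_rate) ** years_ahead


# Hypothetical 2020 values, disaggregated by (age band, sex).
population_2020 = {
    ("under 1", "female"): 1000.0,
    ("under 1", "male"): 1040.0,
    ("1 to 5", "female"): 4200.0,
    ("1 to 5", "male"): 4350.0,
}

# Apply the same projection to every disaggregation, 3 years ahead (2020 to 2023).
population_2023 = {
    disaggregation: project_population(value, 0.022, 3)
    for disaggregation, value in population_2020.items()
}

for disaggregation, value in population_2023.items():
    print(disaggregation, round(value))
```

The point of the sketch is that the same formula is applied to each age and sex combination, so the disaggregation of the 2020 data carries through to the projected year.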
One way is by data element group. That can answer some of the questions Scott was just raising about the commodities you're tracking, where you have a certain amount that's dispensed, a certain amount that's restocked, and a certain amount that's lost or past the expiration date, whatever. And I'm also going to talk about prediction disaggregations. On all the slides, I'm going to try to show the version in which the feature was introduced. I'm going to talk about a lot of new features, and I realize that a lot of you are probably on older versions of DHIS2, so you may have to be patient to get them. But at least you'll know what's coming down the pike. So, yeah, next slide.
OK, this one's back to me, sorry. So this actually answers one of my old questions. In logistics, we've started to do prediction by data element group. What this means is that the logistics team has set it up in DHIS2 with two different methodologies. The old way was to have a single data element for each commodity and for each value of that commodity. So you have drug A received, drug A dispensed, drug A missing, drug A resupplied, and it just goes down the list. And you do this for drug B and drug C, for all tracked commodities, which could be 200 commodities. That means you're producing a thousand, two thousand data elements. So it just became an administrative nightmare. No one can manage that. Well, some people might be able to, but it's a serious task. And it becomes even more difficult when you're constantly changing commodities. It's not like your commodity list is static. It's always changing in these countries, or in most implementations. So you're adding new commodities, you're getting rid of old commodities, and you're continuously updating these two thousand data elements.
It became horribly burdensome and nobody could really manage it. So what the logistics team has worked out is that you have one data element per commodity, and then these different values, like received, dispensed, and missing, are grouped together in a category combination. That category combination is then applied to the drug. So when you have a new drug, all you have to do is make a data element for the new drug and, in one click, apply the category combination, and then all of a sudden you get all of those different fields where you can enter the received, dispensed, missing, and so on. Some of you might be thinking, wait a minute, that's not how category combinations were designed to be used. If you're a real DHIS2 nerd like me, you know that the original design of category combinations was that everything should be aggregatable within that category. So you have a population being broken down by age and sex, which is a common one, or you have a total number of cases being broken down by age. That total number of cases, which is the data element, is the aggregation of all of the values in the category combination, if this makes sense to anyone. But of course, if you do it this way, these drug values don't aggregate to a meaningful sum. It's just gibberish. If you look at an aggregated value of a commodity's received, dispensed, missing, and so on, it doesn't add up to anything that represents a true value for any purpose. But this approach is a means to make it very easy to manage that list of commodities. And so what we've been doing is almost unscrambling the egg a little bit and saying, OK, we have a really great tool to manage commodities in DHIS2 using category combinations. We have an easy pathway to do that.
But we have this legacy concept that it all has to aggregate to something meaningful. And one of the ways in which we started to unscramble this egg is being able to apply predictions by data element group. Yeah, question?
Yeah, the question is whether it would make sense to have the commodities going out represented as essentially negative. It means your data entry people would always have to enter negative numbers on the form, and that would be awkward.
Which no one's ever going to do, and it's still a burden.
Yeah. So, for one example: this is again a new feature in 2.40, which just came out, so many of you may have to wait in order to actually use it. Let's say that all of our commodities are together in a data element group. That's the way this feature works. And then you have some category option combinations, as was said: one for stock balance, one for restock, one for the stock that's used during the period, one for stock lost. And then this is the actual syntax. You don't have to remember this syntax. Just say, "Yeah, I saw it once and he told me it worked," and you can go back and find it in the user manual now. But you're saying, basically, that a single predictor can operate on all the commodities, all the data elements in your data element group. The syntax that we dreamed up is that you say "for each", and this is a variable you can define, a question mark followed by anything you want, for each data element in the group. And this is the ID of the group. Then take the sum, which in predictor-speak means go into the past period. In this case, you're probably just looking one month or one week back, depending on your stock cycle. Go to the past period and take the balance at the start, plus the restocking, minus the stock used, minus the stock lost.
And because predictors can go through time, you can put that into next month's opening balance, and that can be your output. Again, because you said sum, you used an aggregation function, even if you're just taking it from one month in the past. In a predictor, when you use an aggregation function, that's the predictor's cue to go and get it from your past period data. So you can run this and it can update the balances for the next period. Any questions about that?
The next example I'll show does not have a sum operator, so it's only operating within the current period. Let's say that you wanted to have a net stock change. Did we gain stock or lose stock on net? You want to compute that from just the restock during the period and how much stock was used or lost during the period, which shows you the net gain. And let's say you want to compute that for the same period that you have these values in. You're not trying to set next month's starting balance. You're just within a month, so there's no cumulative thing. You just say, for each data element, calculate the restock minus what was used minus what was lost, and put it into a new thing that you call stock change.
The group is for all the commodities you want to do? All the antihistamines and all the anti-everything-elses, and the pro-anything, probiotics and antihistamines and all that?
Yeah, right. So it's up to you to define the data element group. You can have different data element groups to do different things. That's fine, too. What happened was that the people doing the stock management were saying, "We're going to need a Python programmer to write a script to generate 500 predictors, each of which handles a different thing." You know, there's a better way.
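The two stock calculations just described can be sketched in plain Python. This is only an illustration of the logic, not the DHIS2 predictor expression shown on the slide: the class, function names, commodity names, and numbers are all hypothetical, and the dictionary stands in for a data element group.

```python
# Illustrative logic only; not DHIS2 predictor syntax.
from dataclasses import dataclass


@dataclass
class StockRecord:
    """One commodity's values for a single period (hypothetical structure)."""
    opening_balance: float
    restocked: float
    used: float
    lost: float


def next_opening_balance(previous: StockRecord) -> float:
    """Balance carried into the next period, computed from the previous period's values."""
    return previous.opening_balance + previous.restocked - previous.used - previous.lost


def net_stock_change(current: StockRecord) -> float:
    """Net gain or loss within a single period; no past-period lookup involved."""
    return current.restocked - current.used - current.lost


# Stand-in for a data element group: one entry per commodity.
commodity_group = {
    "antihistamine_brand_a_10mg": StockRecord(100, 50, 60, 5),
    "antihistamine_brand_b_5mg": StockRecord(40, 0, 25, 0),
}

# One definition is applied to every commodity in the group,
# instead of writing a separate calculation per commodity.
opening_next_period = {name: next_opening_balance(r) for name, r in commodity_group.items()}
stock_change = {name: net_stock_change(r) for name, r in commodity_group.items()}

print(opening_next_period)
print(stock_change)
```

Adding a commodity to the group means adding one entry to the dictionary; neither calculation has to be rewritten, which is the administrative saving the speakers describe.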
So we invented this.
The second way in which predictors have recently been enhanced to do multiple things in the same predictor is disaggregations. A disaggregation is a category option combination. You have sex equals male and age equals under five, or whatever, combined. You can have one or more dimensions there. So, for example, you might have "total on TB treatment" as your output data element, and it may be disaggregated by age and sex. Oh, and this didn't quite make it for 2.40, so you're going to have to wait for 2.40.1 for this feature. The way it used to be is that you'd specify only one category option combo. A predictor could only output to the default category option combo, or if you wanted to predict to male under five, you'd have to say, predict to the combo male under five. And if you wanted something for all the age bands and all that, you'd have to define a lot of predictors. So this is a way of trying to get around that.
As for how you configure a predictor to work with disaggregation, there's actually no special new thing in the configuration. You just select an output data element with disaggregations. If your output data element is only disaggregated by age, you'll get predictions that are only disaggregated by age. If your output is disaggregated by age and sex, you'll get predictions that are disaggregated by age and sex. And then don't select an output category option combo. Don't say "I want male under five", and by default it'll just give you all the combinations. Yeah?
Oh, so this is a completely different feature from the data element group and "for each". Yeah, the same goal is to use a single predictor for a lot of things. With the stock management, they have a certain way that they treat category option combos that's different from this.
This is treating the category option combos in the classic DHIS2 sense: they're disaggregations. So it's for a different use case, but it accomplishes the same thing of using a single predictor for a lot of different category option combos or data elements. In the next couple of slides, I'll show some examples of how.
If you have input that's already disaggregated by age and sex, and again it can be many different data elements, whatever kind of fancy computation you're doing, it'll go through and say, OK, we're going to output for female under 15, so we'll choose all the input combinations that have female under 15 and put them into that value. And then we're going to produce male under 15, so let's look for all the male under 15s and put them into that value, and so on. But if your output is only disaggregated by age, you can collapse dimensions if you want. Again, it'll say, OK, the output has just age as the category, and that means we want to get under 15 and 15 and over. So it'll go through the input and say, anything that has under 15 will go into under 15, and anything that has 15 and over will go into 15 and over.
And even more, yeah, next slide. If you have an output category that has only a positive or negative test result, but your input has a different category that has positive, negative, or unknown, then it will go through and say, OK, we've got to get all the under 15 positives, I'll pick that. We've got to get all the under 15 negatives, I'll pick that. And this one doesn't go anywhere. So if, for example, you want to show all the tests that had some result, a known result, you could automatically filter this way. It's really up to you to define the category combination you want for the output, and it'll go and collect the input. And the way it does that, here's kind of the secret sauce, is that it looks at something and says, OK, this is positive.
That comes from a category that has positive, negative, or unknown, and so if I'm looking for positive, I'll put it through. Then, jumping down, this one is unknown. OK, this comes from a category that has positive, negative, and unknown, but I know that positive and negative are some of my outputs. This was Ben's idea, by the way. I know that I'm going to get to positive and negative later, so I'm not going to put unknown into either positive or negative. And if you have other data elements that aren't related to these categories, it'll be business as usual. It sums them up. If you don't have any category option combo, or if you do have a category option combo, you get what you ask for. This took a couple of iterations, and a version of this was actually in 2.39, but we didn't have the front end yet, so I didn't put it in the documentation. And since I've rewritten this a couple of times, I think I've got a good way to do it now.
One of the values of doing this, and this has come up in discussion with PEPFAR, is the following, which is a little bit different. Let's say that you have an indicator that is just showing the positive results, and you want to filter out all the negative results. If you do that with an indicator, it will still show up with the little green dots that say, "Oh, by the way, test result is one of the dimensions here." If you define a data element where test result is not a dimension, because it's showing positive only, then it won't show up with that little green dot. That was exactly one of the use cases that PEPFAR came up with: we don't want people selecting the test result for a positive-only indicator.
Yeah, you can make an indicator with just one value of the data element. Sure. Yeah.
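A simplified Python sketch of the matching rule described above may help. It is an approximation for illustration and not the actual DHIS2 implementation: it treats each disaggregation as a set of category options and routes an input value to an output combination whenever every option of the output also appears on the input. It does not model the "business as usual" case for data elements whose categories are unrelated to the output. All names and values are hypothetical.

```python
# Simplified illustration; not the actual DHIS2 algorithm.
from typing import Dict, FrozenSet, Iterable

Combo = FrozenSet[str]


def predict_disaggregated(inputs: Dict[Combo, float], output_combos: Iterable[Combo]) -> Dict[Combo, float]:
    """For each output combination, sum the inputs whose options include all of its options."""
    return {
        out: sum(value for options, value in inputs.items() if out <= options)
        for out in output_combos
    }


# Hypothetical input disaggregated by age and by a three-valued test result.
test_inputs = {
    frozenset({"under 15", "positive"}): 4.0,
    frozenset({"under 15", "negative"}): 6.0,
    frozenset({"under 15", "unknown"}): 2.0,
    frozenset({"15 and over", "positive"}): 3.0,
}

# Output with only positive/negative results: the "unknown" input goes nowhere.
known_results = predict_disaggregated(
    test_inputs,
    [frozenset({"under 15", "positive"}), frozenset({"under 15", "negative"})],
)

# Output disaggregated by age only: the test result dimension is collapsed.
by_age = predict_disaggregated(
    test_inputs,
    [frozenset({"under 15"}), frozenset({"15 and over"})],
)

print(known_results)
print(by_age)
```

The subset test covers both behaviours from the slides: an output with fewer dimensions collects every matching input (collapsing), while an input option that has no counterpart in the output category, such as "unknown", is simply never selected.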
So this kind of filtering is not a huge part of the feature, but I just wanted to mention it, because often when I put stuff in, people use it for things that I didn't think of. Oh, questions? Sorry.
Do we... I was going to sit down.
Yeah, but right after this slide, you can take over again.
OK, so, where's this? OK. This one's a bit different. This is a hard pivot. We should have warned you.
We have been talking about predictors. We're still talking about predictors.
Yeah, but this is a very different kind of function. OK, so, DHIS2 is not a statistical analysis tool. It's not R, Stata, SPSS, or any of these. But there are many use cases where you need more advanced statistical analysis functions to be able to actually utilize the data. One of those is, again, coming from supply chain, and another comes from vaccination campaigns. In vaccination campaigns, what some people need is the ability to predict or forecast: am I on target? Will I reach my vaccination goals? Vaccination campaigns make annual goals: I need to vaccinate so many children by the end of the year. How do you know, if you're eight months into your program, whether you're on target to reach that goal? Or even earlier, if you're two or three months into the project, how do you know you're on target to reach that goal? So what we've added is a statistical calculation called the probability density function. Using the probability density function, it will give you the probability that a value you have today is within an acceptable range of values, given a normal distribution. So, practically speaking, you say, here's my value today; what's the likelihood that this value will be in the range that I need? Does this value fall within an acceptable tolerance, to some extent?
And so you can ask, given my total number of vaccinated children today, what's the probability that, if I continue at this rate, I will get to my target? It's a little abstract, but you can use that.
The other one is coming from supply chain again. What we need to be able to do is, based upon commodity consumption data, predict low stocks and stock-outs. So here's my average consumption, and here's what I have this month. Am I going to be stocked out next month based upon my average consumption trends? For this one, we've added another function called cumulative distribution. Cumulative distribution basically says, is this value likely to end up less than or equal to another value? So this is my current consumption; what's the probability that this consumption rate will get me to a low stock or a stock-out? It's not machine learning. It's not AI. It's just statistics, but it's much more advanced statistics. So we've added these two statistical analysis functions to predictors as well.
And so here are the two functions. These functions mimic what's in Excel and what's in LibreOffice. They have the same functions. They name them a little differently, but I actually tested them against Excel and LibreOffice to make sure they were generating the right numbers. You can get the cumulative distribution function or the probability density function. Honestly, the cumulative distribution function seems to be the more useful one, because it gives you everything from the start up to where you are. For those mathematically inclined, it's the integral of the probability density function. So it says, what's my chance that everything up until now is in range? You can call it with a single argument, and this can be a data element, or it can even be a sub-expression if you want.
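For readers who want to see what the two normal-distribution functions compute, here is a small Python sketch using the standard library's `statistics.NormalDist`. It is not the DHIS2 implementation. The function names, the sample consumption figures, and the choice of the sample standard deviation are all assumptions made for illustration.

```python
# Illustrative only: the normal-distribution PDF and CDF via Python's standard library.
from statistics import NormalDist, mean, stdev


def normal_density(x: float, mu: float, sigma: float) -> float:
    """Probability density of the normal distribution N(mu, sigma) at x."""
    return NormalDist(mu, sigma).pdf(x)


def normal_cumulative(x: float, mu: float, sigma: float) -> float:
    """Probability that a value drawn from N(mu, sigma) is less than or equal to x."""
    return NormalDist(mu, sigma).cdf(x)


# Hypothetical monthly consumption of one commodity over past sampled periods.
past_consumption = [90.0, 110.0, 100.0, 95.0, 105.0]
mu = mean(past_consumption)
sigma = stdev(past_consumption)  # assumption: sample standard deviation

# How likely is it that next month's consumption stays at or below the stock on hand?
stock_on_hand = 108.0
probability_stock_suffices = normal_cumulative(stock_on_hand, mu, sigma)
probability_stock_out = 1.0 - probability_stock_suffices

print(f"mean={mu:.1f}, stdev={sigma:.2f}")
print(f"P(consumption <= {stock_on_hand}) = {probability_stock_suffices:.3f}")
print(f"P(stock-out) = {probability_stock_out:.3f}")
```

The cumulative function is the running integral of the density, which is why it answers "less than or equal to" questions such as the stock-out case directly, while the density only describes how typical a single value is.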
If you only specify one argument, it takes the current period value and tells you how that compares with the distribution of that same thing over the past sampled periods. Or, if you want to compare one value against the probability density of a different variable, you can add the mean of the other variable and the standard deviation of the other variable to the expression. So this, again, follows the same kind of optional-argument pattern that other functions do. I even looked at R, but I decided to model it more after the office products.
So we've covered the basics of predictors, and now we'll go on to talk about other expressions. I just wanted to put up here what we mean when we say "other expressions". Expressions are a common element in DHIS2 for indicators, validation rules, predictors, program indicators, and program rules. For program rules, we're in the process of redoing how we do parsing, and we hope to bring program rules a little more into the group. If you look at expressions very closely, you might notice that some things are left out of program rules that are in all the others, and we hope to fix that over the next release or so when we refactor a bunch of this stuff. Mostly we're going to be talking about new functions for indicators and the rest. But some of these expression functions apply to a wide range of expressions. Scott, you're up.
Oh, yeah. OK. I feel like we should have done softer transitions somehow. I feel like we're just jerking you into completely different concepts constantly. But anyway, one of the major use cases now for predictors, and for indicators for that matter, but definitely for predictors, is the ability to calculate immunization categorization.
This is just an example, and I'm going to get to the actual functionality, but WHO and Gavi have defined specific categories for immunization campaigns, or immunization data in general. So category one is that your coverage is greater than 90 and your dropout rate is less than or equal to 10. Category two is that coverage is greater than 90 and the dropout rate is greater than 10. You see the pattern. So how do you get to a scenario where you have your coverage and you have your dropout rate, and these are two separate indicators within DHIS2, and you want to assign a category? You want to say that this org unit, this facility, has a coverage greater than 90 and a dropout rate less than or equal to 10, so it's in category one, so assign a value of one. Or if it's in category four, assign a value of four. What you can do in predictors now is write these long conditional statements. It's just an "if": if this value is greater than 90 and this one is less than or equal to 10, then output a one. And one thing that's very important to notice here, though you probably can't read it too well, is that we're referencing an indicator. You can reference indicators in predictors and in other indicators, and this is actually the reason why that country that will remain nameless built an app: they didn't know you could do that. They thought you just couldn't reference indicators in indicators. You can. So this is just an example of how you can put in a long conditional statement. You can make it quite complex, and you can reference indicators in indicators and in predictors.
And when Scott and I did this presentation last year, a few other people who were in the room learned that too. "Oh, I didn't know you could reference one indicator from another indicator." And the answer is, yeah, that's been in there for a few versions.
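As an illustration of the categorization logic just described, here is a short Python sketch of the nested conditional. It is not the DHIS2 expression shown on the slide. The thresholds for categories one and two follow what is said in the talk; the rules for categories three and four are not spelled out there, so their definitions below (low coverage with low or high dropout) are an assumption, as are the function name and sample values.

```python
# Illustrative only; not DHIS2 expression syntax.
# Categories 1 and 2 follow the thresholds stated in the talk.
# Categories 3 and 4 are an assumed continuation of the same pattern.

def immunization_category(coverage: float, dropout_rate: float) -> int:
    """Assign an immunization category from a coverage percentage and a dropout percentage."""
    return (
        1 if coverage > 90 and dropout_rate <= 10 else (
            2 if coverage > 90 and dropout_rate > 10 else (
                3 if dropout_rate <= 10 else 4
            )
        )
    )


# Hypothetical org units with (coverage, dropout rate) taken from two separate indicators.
org_units = {
    "Facility A": (95.0, 5.0),
    "Facility B": (93.0, 14.0),
    "Facility C": (80.0, 8.0),
    "Facility D": (72.0, 21.0),
}

categories = {
    name: immunization_category(coverage, dropout)
    for name, (coverage, dropout) in org_units.items()
}
print(categories)
```

The conditional is written as nested expressions on purpose, to mirror the shape of a chain of nested if functions, including the pile of closing parentheses at the end.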
It's in the user guide, but of course there are a lot of things in the user guide.
Yeah. And people don't read. That's basically the number one rule of life: people don't read. I mean, I don't read. You don't read. No one reads.
So the magic behind the color code you just saw is simple. It's just an if statement, which we have in all of those expression types, even program rules. You can do this in a predictor, or you can even do it in a straight indicator. These can be data elements, and this is just an indicator expression. It can be very simple. You're just saying, if this data element is less than 10 and this one is greater than or equal to 90. I pulled this code straight out of the thing Scott wrote that made all those colors. It's just an if statement, and it gets a little messy because you have a lot of closing parentheses over here. We have a ticket for something I've been wanting to write for years. But DHIS2 is not driven by people like me who want to do geeky things. DHIS2 is driven by users who have concrete use cases. People come to me and say, "I don't like all those open parentheses and close parentheses. We need a case statement that says, if this, do that; if this, do that." And I say to them, "I've been wanting to write that for years, but no one will write a Jira ticket that says we need a case statement, so please write a Jira ticket." And they did, and it got approved. So in the next release, I expect we'll have a case statement. All I'm saying is that users are really important to the process, because we don't just go off and do our geeky things because we think they're cool. We do things because people come to us and say, "I have a need." And when you're writing that Jira ticket, please, please, please tell us what it's for. Some people write a Jira ticket and they say, "I want you to do this thing and return that result."
And it's like, OK, but if you say "I need it for malaria coverage" or "I need it for education" or whatever, just tell us what you need it for, because that helps us understand the meaning and that helps us prioritize it. Tell us why, not just what, when you write a Jira ticket. So at the moment you have to do these nested ifs, because no one asked for a simpler way. But now someone has, so in the future you'll get a simpler way.
OK, so if that was an old one, here's an example of something that somebody else just asked for, and I said, yes, I'd love to write that: you ask whether something is in a group of things. So you can say, is this data value in a list of negative or unknown? This was requested particularly for program indicators, because in program indicators sometimes you want to compare something against a number of things, and sometimes this thing itself generates a sub-expression in SQL when it runs, and it's costly. So you don't want to say, if this costly thing equals A, or if this costly thing (please evaluate it again) equals B, or if this costly thing (please evaluate it a third time) equals C. Now instead you can just say, if this costly thing is in A or B or C. But because it's a nice language feature, we make it available for other types of expressions as well. So this is new in 2.40.
OK, Scott: sub-expression. This is the thing I got most excited about, because when we gave this talk last year we had a new thing called sub-expression, but it could only work for a single data element, or a single data element plus category combination. I'll show you on the next slide why, and how it's been changed. But the response we got last year was, yeah, that's kind of great, but we really want to be able to say how many facilities we have where the malaria rate is over
five percent of the population. We want to compare two different things, like the malaria rate in the population. Having, you know, how many facilities where we had more than 20 malaria cases, that's okay, but we really want to compare two things. So now, as of 2.40, you can compare two things. You always could say whether you have more than a fixed number; that was true since 2.38.1, since it didn't quite make the 2.38 release. And now you can compare two things. So go ahead, Scott, we'll talk more about that. So the way sub-expressions used to work is that in the analytics tables each data value is on its own row in the SQL table. It used to evaluate the sub-expression on a single data value in a single database row, and that's why you couldn't compare two different things: they're in different rows. And then you aggregate up from there. So you could say, count the number of facilities where... Before this was added, to do this you had to use predictors: you say each facility is a one or a zero, and then you use aggregate data to sum the ones and zeros. Sorry, I'm kind of jumping around a bit. I know there are people here who know what I'm talking about, and there are other people who don't. Just to clarify: a sub-expression gives you the ability to count the number of org units, which everybody needs all the time, right? And I apologize it took us till version 2.38 to put it in. We knew it years ago, but, you know, things take time. So you need to be able to count, like, the number of org units that have a malaria outbreak, the number of org units that are stocked out, the number of org units that have such a high patient load. The sub-expression gives you the ability to do that in an indicator. And so the way they work from 2.40 is we aggregate all the dimensions except for org units. So if you're collecting weekly data and you have a quarterly report, we'll add up all the weeks in
the quarter, and if you have a bunch of disaggregations that you want added up, we'll add them up, because maybe you're just looking for the total number of malaria cases regardless of which age and sex demographic they fall in. Then we evaluate the sub-expression on each organization unit, and the sub-expression very often will give you ones and zeros: whether an organization unit is something you want to count with that indicator or not. And then finally, oops, we add up all the ones and zeros across the organization units, and that gives you the count of how many org units are whatever you're looking for. There's a small analytics cost. It's not free; in tests that I ran it adds maybe 25 to 30%, so it's not bad. It's doing a little more work, but it's not doing several times the amount of work. If you were to do this in predictors then it would be 30% faster, but you'd have all the overhead of writing additional data elements back into the database and then having to run them through analytics again, all that kind of stuff. Yeah, but this is a great time to bring up our sponsors for this YouTube video, TransferWise and NordVPN. I think we have to edit that part out. Scott and I are both looking like, who's going to get the next slide? I don't know, it's a surprise every time. Oh, so aggregation. I just talked about how we're aggregating some stuff before and some stuff later. What do we use for the aggregation? Before the sub-expression is applied, we take whatever aggregation type you have for that data element. So if you've said this data element sums, we sum it; if you've said this data element counts, we count it. But because we have this aggregation type function, you can override that if you want and say, in this particular case, use the average instead of the sum. It's completely flexible, you can say whatever you want. This is
before the sub-expression is evaluated: you take the data element inside the sub-expression and you say how it aggregates before you evaluate. Next slide, if you want. Once you're going up and combining all the org units, by default it uses sum, so by default you're very often counting the number of org units in each district that are over some limit. But maybe you just want to say, okay, does this district have any org unit that is over that limit? In that case you can override the whole thing with aggregation type max to get that answer, for example. So outside of the parentheses you can have an override, and you can have as many different data elements as you want inside. It gives you complete flexibility to say how you want the aggregation to work both inside and outside the sub-expression. So yeah, there's a lot of new stuff, and this is all written up, I hope intelligibly, in the new user guide. That's all out there. Okay, Scott, you're on. Okay, cool. So this one's hopefully pretty easy to wrap our minds around. We have a new expression in indicators that is year to date, and it will give you the total sum of whatever you put into the indicator, starting from the beginning of the year up to that date. Okay, so we already do this in the data visualizer app. So this is like BCG doses given, and you just do cumulative values, right, and you get a chart like this. Every month they're doing more BCG doses, and at the end of the year you end up with some high value. This just produces this number as an indicator, every single month. So instead of giving you the value for each month, it gives you the sum of the values over the year so far. Okay, so again, lots of folks have been asking for it. We still want to add cumulative values to pivot tables, which is
still a request that we haven't managed to do yet, but this is a little bit of a workaround. You could do year to date for an indicator and then just turn on that indicator for every month, and you'll get something that looks exactly like this in a pivot table. Just put the indicator in the pivot table and you'll have it for now. Yeah, thanks. The question is, is there a way to do financial year to date? And my answer to you is, which financial year? Because there's like seven of them that we currently have. The PEPFAR one is arguably the most important one. And I have another answer, which is: this is what we were asked to do, so this is what we did. But I had in mind that you could do this for any period type, so write us a Jira ticket. I'd love to do it. In fact, write it right now and I'll seriously approve it right now. We could have you specify any period type, so you specify the financial period: I want it for financial October, or, sorry, not October but November, or July, or April, and we'll do it. Yeah, love to do it. Thanks for asking. This is how software gets made, I think. So what do you mean, like weeks within a quarter, how are we doing by the end of the quarter? Yeah, I mean, it would just go up until whatever period you turned it on for. So if you have a use case for that, write a Jira ticket. If you don't have one, okay. This is how we determine what's important to implement and what's not. We don't do speculation, so just keep your comments to yourself if you're just wondering. Okay, and this is the syntax for doing that. It's pretty simple: you just have something and you say dot year to date, and it will give you that. And then I'll show you what these other things are. If you want to say what's the average of all the periods so far, then we have a special thing called period in year. If you're talking monthly periods, this is one for January and two for
February and three for March. So if you want to say, this is March, what's my year to date divided by three, this will show you what's my average so far. And I was asked for this, by the way; it wasn't just my idea. Any questions about that first, before we get to the final one? Yeah. No, this is all in indicators. Would you like it in program indicators? Please write us a Jira ticket. Oh, sorry, the question was: is this period in year special value available in program indicators? And I said write us a Jira ticket and I'll put it there, I'd love to. We're really not very good to people: if it's not in Jira, it doesn't exist in our brain. And finally, say you have a population, and you want to get to everyone in the population by the end of the year, and you're doing this monthly. That means you want to get to a twelfth of that population each month on average, and you want to track how well you're doing: am I going to make it by the end of the year? If this is the population of your district, or whatever unit you want, then you can say: it's March, so this times three divided by 12, am I on track to make it? And the value of using these and not just hard coding a three, well, you can't hard code a three, but the value of not just hard coding a 12 is that if you put this in your indicator, then someone can do a pivot table and say I want monthly or quarterly or weekly, and whenever they select a different thing, this will show the right number. Most years, we say, have 52 weeks, but because of the rounding, once in a while you get a year with 53. That all comes out in this: this will give you 52 in the years that have 52 weeks and 53 in the years that have 53 weeks. You don't have to worry about someone selecting monthly, quarterly, whatever. Yeah, so these are two new constants for indicators so far, but I hear that they're going to be in program indicators before long. Okay, this is a... yeah, the question. No, that's okay. 
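The year-to-date arithmetic described above can be sketched in a few lines of Python. This is only an illustration of the calculation, not DHIS2 expression syntax; the function names and the sample numbers are hypothetical, and `period_in_year` and `yearly_period_count` simply mirror the two special values mentioned in the talk (the position of the period in the year, and the number of periods in that year).

```python
from itertools import accumulate


def year_to_date(values):
    """Running total from the first period of the year up to each period."""
    return list(accumulate(values))


def average_so_far(values):
    """Year-to-date total divided by the period's position in the year (1 = first period)."""
    return [ytd / period_in_year
            for period_in_year, ytd in enumerate(year_to_date(values), start=1)]


def on_track_target(population, period_in_year, yearly_period_count):
    """Share of the annual target that should have been reached by this period."""
    return population * period_in_year / yearly_period_count


doses = [100, 120, 80]  # made-up monthly values for January to March
print(year_to_date(doses))             # [100, 220, 300]
print(average_so_far(doses))           # [100.0, 110.0, 100.0]
print(on_track_target(12000, 3, 12))   # 3000.0
```

Passing the period count as a parameter instead of hard coding 12 is what lets the same formula work for monthly, quarterly, or weekly periods, including years with 53 weeks.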
That's okay. This is a little bit of a repeat, but it's worth repeating. If you folks have been in the DHIS2 world for a while, you remember that when you're making an indicator there's that little box that says annualized. Do you know what clicking that box does? I'm giving you the answer right there. It just takes the numerator for your indicator and multiplies it by 12. That's all it does. Yeah, it's pretty stupidly basic. But people would do this for vaccinations and all kinds of programs. They're like, I want to annualize my BCG doses under one, so you have data that's captured monthly but a denominator that's annual, like the total target for my vaccinations or total population for this year. That's your denominator. So when you annualize it, essentially all you're doing is taking that numerator, multiplying it by 12, and generalizing it across the whole year, if that makes sense. It's as if every month were the same, and we know every month is not the same. So what you're able to do now in indicators is use a function called period offset, and the period offset allows you to actually add up every single month in your numerator. So instead of doing an annualized indicator, you can say I want to add up every single month as your numerator, and then as your denominator you can say total live births or your annual target. So you actually get the data, as opposed to taking a generalized numerator that's just some value times 12. Does that make sense? Okay, and before we had period offset, indicators could not move through time and you couldn't get this. The best you could do was take this month's value times 12 as the annualized value. 
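To make the difference between the annualized checkbox and summing twelve offset periods concrete, here is a minimal Python sketch. It is not DHIS2 syntax; the function names and the data are made up for illustration, and returning `None` when there is not enough history is an assumption of this sketch rather than documented DHIS2 behaviour.

```python
def annualized(monthly_values, index):
    """The 'annualized' behaviour as described in the talk: this period's value times 12."""
    return monthly_values[index] * 12


def trailing_sum(monthly_values, index, window=12):
    """Sum the value at `index` plus the window-1 periods before it
    (offsets 0, -1, ..., -(window-1)). Returns None without enough history."""
    if index < window - 1:
        return None
    return sum(monthly_values[index + offset] for offset in range(-(window - 1), 1))


doses = [50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160]  # made-up monthly doses
annual_target = 2000                                             # made-up denominator

print(annualized(doses, 11) / annual_target)    # 0.96
print(trailing_sum(doses, 11) / annual_target)  # 0.63
```

With rising monthly numbers, multiplying the latest month by 12 overstates coverage (96%), while adding up the twelve actual months gives the real figure (63%).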
And now with period offset... so this is the complicated formula for the last 12 months, but you can also just do something simpler and say how much something changed since last month. So what's the value this month minus the value of the same thing with a period offset of minus one, for last month or last week or whatever your reporting period is. Or you can say percent change since last month, or something like that, because it's relative to the period you put into the data visualizer app or whatever app you're using, the maps app, etc. So you can put in last month, you can put in last year, and it will still do the same calculation, but it's relative to whatever period you set in the analytics. And this is way ungainly, and maybe when you ask us for something by a Jira ticket we'll make it even better and more elegant, but now at least you can say: take this month's, plus last month's, plus, dot dot dot, up to 11 months ago, and you'll have a 12-month rolling sum, which you can divide by 12 if you want a rolling average. Yeah, not sexy to put in, but it works. So now we enter the part where, well, actually with period offset we're already entering a recap of the best of recent years' expression enhancements. So all this stuff we presented last year, and it's been there for a while. If you remember it all, you're welcome to leave at any point. Aggregation type: this is something that, again, was asked for for years and years, and finally I went to Elmarie from South Africa last year and said, we did it, we did the thing you've been asking for for 17 years, which was to be able to say: I have a data element which has a default aggregation type of sum, but maybe I want to use it in this indicator with an aggregation type of count. 
And so there's this dot aggregation type that you can use. You can even use it on a whole expression, and it will override the aggregation type of all the data elements within that expression. So it applies to a group. You can say average, or put in any of the standard aggregation types, and override it for this particular indicator formula. Min date and max date: PEPFAR requested this one, but Scott assures me that there are many installations out there that have the same issue. What happens is that things change over the years. You have one set of data collection requirements, and then the next year they say, oh, we still want to collect that data, but it's no longer part of this total, because now it's redundant, because we're collecting it somewhere else. So your indicator expression needs to be different depending on which year you're running it for. Many installations, as I understand it, have the 2020 indicator, which you've got to run on 2020, and the 2021 indicator, which you've got to run on 2021, and so on. What this allows you to do is make a single indicator formula. You use this and you say, okay, this data element only counts during 2021. And some other data element was collected up until 2020, but after 2020 you don't want to count it, so you put a max date on it of December 31st, 2020, and it's not going to be counted later. So this allows you to make a single indicator and have data elements coming in or out depending on the year. You're very welcome to create an indicator per year if that suits you better, but if you want to create an indicator that spans different years with different requirements, that's why we put in the min date and max date functions. Yeah, well, what we see is a lot of countries will change their indicator definitions. 
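The min date and max date idea can be sketched as follows. This is a hedged illustration in Python, not DHIS2 syntax: the helper names are hypothetical, and the exact boundary rule used here (a value counts only if its period starts on or after the min date and ends on or before the max date) is an assumption of the sketch.

```python
from datetime import date


def bounded(value, period_start, period_end, min_date=None, max_date=None):
    """Return the value if the period lies inside the date bounds, otherwise 0."""
    if min_date is not None and period_start < min_date:
        return 0
    if max_date is not None and period_end > max_date:
        return 0
    return value


def combined_indicator(old_value, new_value, period_start, period_end):
    """One formula spanning two definitions: the old data element counts
    through the end of 2020, the new one from the start of 2021."""
    old_part = bounded(old_value, period_start, period_end, max_date=date(2020, 12, 31))
    new_part = bounded(new_value, period_start, period_end, min_date=date(2021, 1, 1))
    return old_part + new_part


print(combined_indicator(40, 55, date(2020, 6, 1), date(2020, 6, 30)))  # 40
print(combined_indicator(40, 55, date(2021, 6, 1), date(2021, 6, 30)))  # 55
```

The same formula gives the old definition for a 2020 period and the new definition for a 2021 period, which is the single-indicator-across-years behaviour described above.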
That's because they change the reporting forms, or WHO changes how they're supposed to report on it, and so you end up with ANC 1 coverage defined one way for 10 years, and then they change it, and ANC 1 coverage is defined another way going forward, right? And they build these charts, you may have seen them, line graphs that have two ANC 1 coverages: one goes for a while, then it ends, because they've made a new indicator for ANC 1, and that starts over with a different definition. Does that make sense? With this you're able to combine the two, so your definition will change over time, but you can put time limits on when to use this definition and when to use that definition for the indicator. That's pretty dope, actually, for you. That was pretty simple to write. I use a chair to make in progress. So these are just some other functions that were put in recently. Again, this one was put in in 2.37, so we're getting a little bit older now. You can have different rules for different parts of your organization unit hierarchy. If you have an indicator that you want to calculate the same way for all of your countries except this one, where it's different, you can say: if the org unit ancestor is somewhere under Senegal, we're going to do it one way, or if it's somewhere under Sierra Leone, we're going to do it a different way. And again, you can put that flexibility into your indicators with this org unit ancestor function. Like, you may have indicator definitions that change by facility type. So in hospitals you define the indicator this way, but in health posts... That's my next slide. Oh, I'm sorry. Yeah, question. The question is: this says it's in predictors and validation rules, is it also in indicators? It might not be. Sorry for what I just said about indicators. Thank you for reading the slide. Please. 
Yeah, the answer is, we're catching on fast. Yeah, right, I think either that's incorrect or it's not in indicators and we need a Jira ticket to prompt us. Yeah, interesting that it's in validation rules and not indicators. Isaiah? All ancestors, anywhere above. So you can have a country and you can say anything under this country, and that means the provinces, the districts, the facilities. Yeah, and as Scott was saying, you can also go by org unit group if you want to handle something differently. You might want to say how you might want to handle something differently. Well, I mean, that's essentially it. You have lots of different org unit groups. You see countries that have org unit groups by provider, private versus public, hospitals versus clinics versus health posts versus whatever, just lots of different disaggregations of org units into org unit groups, and this functionality gives you the ability to calculate a single indicator differently by org unit group. So I want to calculate this vaccination coverage for private hospitals this way, but I want it calculated for public hospitals that way, and at the end of the day I just want one number. Right, and this is a simplified example. I'm saying if it's in this group, do a one, else zero, but these can be any expressions you want. If it's in this group, do this data element times that plus this, and if it's in that group, do whatever you want. It's a general expression. Yeah, with this one you can count the number of facilities in a group, or the number of org units in a group. True, although we have simpler ways to do that too. True. Could do it that way. And finally, again with facilities, you can say if a facility belongs to a certain data set, treat it differently, or if a facility belongs to a certain program, treat it differently. 
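The per-group calculation described above can be sketched in Python as follows. This only illustrates the logic (a different formula per org unit group, rolled up into one number); it is not DHIS2 syntax, and the group names, field names, and figures are all hypothetical.

```python
def numerator(org_unit):
    """Pick a different formula depending on which group the org unit belongs to."""
    if "private" in org_unit["groups"]:
        return org_unit["doses_a"] + org_unit["doses_b"]
    if "public" in org_unit["groups"]:
        return org_unit["doses_a"]
    return 0  # org units in neither group contribute nothing


facilities = [
    {"name": "Clinic A", "groups": {"public"}, "doses_a": 30, "doses_b": 5},
    {"name": "Hospital B", "groups": {"private"}, "doses_a": 20, "doses_b": 10},
    {"name": "Post C", "groups": set(), "doses_a": 7, "doses_b": 1},
]

# One number at the end of the day, even though each group used its own formula.
total = sum(numerator(f) for f in facilities)
print(total)  # 60

# The simplified "one, else zero" case: counting org units in a group.
private_count = sum(1 if "private" in f["groups"] else 0 for f in facilities)
print(private_count)  # 1
```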
So we're trying to give you a bunch of extensible tools, all of which have been requested by Jira tickets, to operate differently in different parts of your organization unit hierarchy. Yes? Then, okay, well, please ask for it. The manual is never wrong. It's the gospel truth. But no one's going to read it, so it's fine. And of course tell us why you need it. Don't just ask us to do it; tell us why you need it. Of course. But I'm picking up on a trend here, Jim, that we should probably just self-correct. Yeah, a little bit, and maybe just be consistent across all of our things. That'd be nice, but doing this for indicators is actually in a different part of the code, and that gets into the analytics queries and the real-time SQL generation, and it would have been a bit more effort to put it in. These two, predictors and validation rules, actually use the same code to pull the stuff out, and so it was easy to do both. So if it's easy, I just kind of put it in everywhere, but if not, you know, let us know the use case. Yeah, we'll think about it and probably do it. This was something that went in new in 2.38. People were complaining: I get all these zeros, I don't want to see all these zeros, I just want blanks. With indicators you can filter things out, but there was no way to say give me a blank in the pivot table instead of a zero. Once you have an underlying value and you're saying, oh, that doesn't count, you still get a zero. And so this allows you to say, you know, if it's less than zero, don't just put a zero, call it an empty slot in the pivot table. 
So, the null keyword. And again, when I was initially writing the parser to do all this stuff I wanted to put in null, I really did, but no one asked me for it, so I didn't do it till someone asked. I had all these geeky, amazing things I was going to do, but Lars said, and he's right, no, we just do things when people ask for them; we're user driven, not geek driven. Okay, and then there's remove zeros, which is sometimes easier than explicitly using the keyword null. So if you say I want to subtract two things, and if the result is zero then I want it to be a null, that's what this function does. It's the same as if you say: okay, if I subtract two things and it equals zero, then put a null in the pivot table, otherwise put the difference of these two things. So this function is just an easy way to avoid having to write the same part of your expression twice. And just to review some of the other things that have been in there since 2.34: we have a test for is null and a test for is not null, which is the same as not is null. I mean, we have a not operator, but this is just a handy thing. 
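The equivalence between the long form and the remove-zeros shortcut can be shown with a small Python sketch, using `None` to stand in for null. The function names here are hypothetical stand-ins for the behaviour described in the talk, not the DHIS2 functions themselves.

```python
def remove_zeros(value):
    """Turn an exact zero into a missing value; leave everything else alone."""
    return None if value == 0 else value


def is_null(value):
    return value is None


def is_not_null(value):
    return not is_null(value)


def difference_long_form(received, issued):
    """Long form: the subtraction has to be written twice."""
    return None if received - issued == 0 else received - issued


def difference_short_form(received, issued):
    """Short form: write the subtraction once and strip zeros afterwards."""
    return remove_zeros(received - issued)


for received, issued in [(10, 10), (12, 7), (3, 8)]:
    assert difference_long_form(received, issued) == difference_short_form(received, issued)

print(difference_short_form(10, 10))  # None
print(difference_short_form(12, 7))   # 5
```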
We have greatest and least functions. Some people, including me, had been tying ourselves up in knots saying if a is less than b then show a, but if a is less than c then..., and you'd get all these if statements that go around and around and the code becomes completely unreadable. You can just say print the greatest of a, b, c, however long you want to go. These take a variable number of arguments: it'll just print the greatest one, or it'll just print the least one. The reason these are called greatest and least is because we already use min and max for other things when we're aggregating data, and this follows the exact syntax that's there in Postgres. They have a greatest function with a variable number of arguments and a least function, so I figured we don't need to reinvent the wheel there, we'll just copy what they do. Yes? No, unfortunately it isn't. Yeah, okay, the question is will it stop it from crashing, and unfortunately no. In the Java code, if we do the wrong thing we get a null pointer error, and those are pernicious and difficult and bad. This is just looking at the data, whether the data is absent. It won't fix our internal bugs that generate the null pointer errors, unfortunately. Good question. Yeah, so we're getting near the end of the presentation, and here's the laundry list. These things have been there for quite some time. These are the d2 functions that were there before we did the new parser. All the other functions that you've seen have been there since we completely rewrote the parser in 2.38, I think; no, earlier than that. Anyway, it used to be that we were using regular expression processing for everything, and it helped to have the d2 tag on the front of all the functions. These have been there for a long time, and these are just for program indicators or program rules. Stage offset was added without a d2 more recently. So we still require all these d2, like you have to type d2? Yeah. Can we just get
rid of that? Make a Jira ticket. It was definitely not my product stream. So, okay, Scott doesn't want to do it anymore; that's a use case, and Scott gets to remove new features. So let me just step back for a minute. We're through talking about functions and expressions, so let's talk a little about using predictors, because this is something you need to understand if you want to use them. Predictors can take aggregate data or event data. You can put a program indicator or a program data value in your predictor expression the same as you can put them in an indicator expression; they more or less take the same kinds of things. So a predictor can pull aggregate data or event data, and a predictor always generates aggregate data values. So yes, predictors can be used for tracker-to-aggregate conversion. That's a whole other discussion that we still have a lot of work to do on, but it is possible. And they can take data from the same period or a past period. Actually, when we named them, we thought they would only be used for past period data. But I was programming and I said, well, what if they don't tell me how to aggregate that data? What if it's not inside a sum function or a standard deviation function? And since I had to do something, either return an error or do something else, I said, oh well, if they don't put an aggregation function I might as well just take the current period data, not knowing that people would actually start using that as a feature. Predictors now do a lot of things that they were never intended for originally, never designed for, never named for. You think you're predicting something in the future, but you can actually use them for current period data just as well. So this is the basic picture of what predictors do. And so what are predictors good for? They're good for past period data, although indicators can now do some of that with the period offset function,
especially if it's just one past period you're looking for. Indicators can't, or can't yet, give you the sum or the standard deviation of several past periods; they can only do a single past period at a time. You can count the number of organization units in a predictor, and this is something where, with the sub-expression in indicators, we've been trying to provide more ability to do it directly in indicators. So predictors are basically a Swiss army knife with which you can do a lot of different kinds of data transformation. Where possible, where it makes sense, most people would rather do that data transformation just in an indicator, without having to go through a predictor, store extra values in the database, and then pull them back again. So over time we're trying to give predictors fewer and fewer jobs to do, and maybe someday they'll disappear altogether. I don't know, that's one idea I've had. But some things can still be done only with predictors, and of course some things you just can't do even with predictors. They're like a Swiss army knife, a data transformation tool; think of them that way. Okay, so here is how it works if you have aggregate input data, and I will talk about event data on the next slide. How do predictors work with the analytics table rebuilds? How do you coordinate all that stuff? This is an attempt to show you. It's kind of complicated at the moment, and I hope that some year in the future it will be simpler, but this is what you've got to do. If you have aggregate data, you need to run the predictor, which will generate more aggregate data. Then before you see it in analytics you have to build the aggregate analytics tables in an analytics run, and then you can see it in your analytics. Next slide. If you want event analytics data to come into your predictors, they pull it from the event analytics tables. So first you have to build the event analytics tables
from your event raw data, and then you run the predictors, and then the predicted data can get... So this is a really important flow chart. The simpler flow chart was for when you're just building predictors off of aggregate data, but this is the full thing you've really got to do if you want event data to go through predictors. And of course when you do analytics runs, you can rebuild only the event analytics tables in a run if you want, and then you've got to run the predictors, and then you can build only the aggregate tables if you want. So yeah, sorry, this is complicated, and I hope that someday it won't be, but in the meantime, if you know it, you can charge a lot in consultancy fees. There you go. I'm not even joking. That's it. Any other questions? George? Not yet. The question was: given that indicators can reference other indicators, can predictors reference other predictors? And, well, no, they can't in the expression, but predictors reference data values, so one predictor can produce data values and then another predictor can subsequently reference the data values that were produced by the first one. So in a sense, yes. You just have to run your predictor jobs in a cascading manner, or more than once. Okay, two more things I'll say about predictors accessing data that is produced by predictors, because this is something I've tried to accommodate as best I can. One thing I'll say is that predictors always predict forward through time. So if you have a predictor that takes the stock values and produces a starting balance for the next period, and then you want to run that same predictor on the next period, taking those values and producing a starting balance for the period after, it's the same predictor. It will always run from earlier times to later times, and it makes sure that it actually feeds back any predictions so it can use them as it hits the next period. Second thing I want to say about that is that we've
given a lot of thought to the order in which predictors run. If you run a predictor group, it will run all the predictors within that group in alphabetical order, so by your naming convention for predictors you can control the order. All this is in the user manual, trust me. It will control the order in which the predictors within that group run. If you set up a job that runs multiple single predictors, it's not well supported in the user interface, but it'll actually run them in the order in which you define them in the job. If you squint your eyes and look closely you can see that order, and the only way you can change that order at the moment is to delete the first one you added and add it again at the end. But it will honor that order. If you add multiple predictors in a single predictor job, or if you add multiple predictor groups within a single predictor job, it will also honor the order in which you added them. Hopefully somebody will write a Jira ticket because they have a use case, and the front end people can make that user interface look a little nicer. But yeah, we tried to do as much as we can to allow you to control the order of the predictor runs. And so the answer was about writing more Jira tickets, and again, this is where we have to prioritize. We're not ever going to implement every Jira ticket that's been written, but this is why it's important to tell us the use case. If we look at something and we say, oh, I understand why that's important, then we'll give it priority. And if we look at something and say, I have no idea what that's for, we're not likely to say it's more important than this other thing which we understand. And there's also technical feasibility. There's a lot of factors that go into what actually gets done. So we look at technical feasibility, level of effort, how many developers it's going to take, how long it takes to be able to put that in, and can we afford
that given our pre-estated roadmap roadmap that's publicly available on our website and the great thing about Jim has been that he's kind of been a bit more freelance with us and able to kind of put these things in as they come around predictors and indicators and validation rules and I hope it remains that way and and and you know and he's found fairly efficient ways surprisingly efficient ways to be honest to to get in these things as they've kind of been coming now if you ask for like I need a new filter dimension for dashboards right that's a much bigger job and that's going to take a larger team of developers and you know but in this domain validation rules predictors and indicators I think we're able to be a bit more agile than we are with other products in the DHS to platform no we when we were making the the custom calculations or some people call them on the fly calculations in the data visualizer remember we demoed that on Monday we we found we figured out how to do that how to not have to reference the the UID of the of whatever it is the data element or indicator and that kind of stuff and it was surprisingly easy and not not as taxing or heavy as we thought it would be and and we've already talked about like you know holy shit we should definitely just put this everywhere because it's way better and the so now there has to be a hand so we have multiple product streams I'm in charge of one product stream and then the the product stream that has to respond to that is David's product stream so we have to do like a handoff which needs to be orchestrated still but I think we I think you can count on that happening because there's no reason we should ever reference UIDs in in a graphical user interface there might be some reasons but it's good questions you guys ready to call it quits or I don't believe we finished on time we finished four minutes early. Happy hour. Yes, we do. Everybody happy. Oh, good question. No, okay. No, that's it. All right. 
You don't have to go home, but you can't stay here. Thank you.
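For anyone reading the transcript rather than watching: the predictor ordering rules described in the Q&A can be sketched in a few lines of Python. This is only an illustration of the rules as stated in the session, not DHIS2 code. The `PredictorGroup` class, the `run_order` function and the predictor names are all hypothetical, and plain `sorted()` is an assumption standing in for whatever alphabetical comparison DHIS2 actually applies.

```python
from dataclasses import dataclass, field


@dataclass
class PredictorGroup:
    """Hypothetical stand-in for a predictor group: a name plus member predictor names."""
    name: str
    predictors: list = field(default_factory=list)


def run_order(job_items):
    """Return predictor names in the order the session describes.

    Items in a job are processed in the order they were added to the job.
    A single predictor (a plain string here) runs in place; a group expands
    to its members in alphabetical order (assumed: Python's default string sort).
    """
    order = []
    for item in job_items:
        if isinstance(item, PredictorGroup):
            order.extend(sorted(item.predictors))
        else:
            order.append(item)
    return order


# Hypothetical job: numeric prefixes in the names control the order within the group.
stock = PredictorGroup(
    "Stock", ["02 consumption", "01 opening balance", "03 closing balance"]
)
job = ["Z standalone", stock, "A standalone"]

for name in run_order(job):
    print(name)
```

Note that the two standalone predictors keep the order in which they were added to the job, while the group's members are reordered by name. That is why a naming convention such as a numeric prefix is the lever for controlling order inside a group.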