Hi everybody, my name is Jakub Wiśniewski and I am a research software engineer at MI² DataLab and a student at Warsaw University of Technology. Today I'll be presenting my joint work with Przemysław Biecek, which is the R package fairmodels. fairmodels is a flexible tool for bias detection, visualization and mitigation.

Now let's answer the question: why should we even care about algorithmic fairness? Well, decision-making systems and machine learning models have a history of discrimination and somewhat shady practices. For example, ProPublica found that software used across the United States to predict whether someone would be a recidivist was biased against Black defendants. In Gender Shades, the researchers found that popular gender classifiers were biased against darker-skinned females: the algorithms performed best on light-skinned male faces, and the gap in accuracy was as high as 30%. As The Verge reported, Google removed gorillas from its image training sets because the model was prone to categorizing Black people as gorillas. So as you can see, there are many potential harms, and it is necessary to test our models before we put them into production.

Now let's define bias. The word has many meanings, but in this presentation, and in the fairness field, we'll define it as discrimination against some groups of people that gives an unfair advantage to others. From now on I will call those groups of people subgroups, and the one with the most advantage will be called the privileged subgroup. The vector of indicators of membership in those subgroups will be called the protected vector. Bias can be detected with non-discrimination criteria, namely independence, separation and sufficiency. Essentially, they ask whether the response of the model, or the target of the model, is independent of the protected vector, possibly given some condition. But this mathematical way of measuring bias is inconvenient, so we'll use relaxations, and sometimes equivalents, of those non-discrimination criteria. We can do this with fairness metrics. Those metrics can be derived from the confusion matrix of each subgroup, so with well-known metrics like true positive rate, false positive rate, precision and accuracy, we can detect bias. For example, in the ProPublica case, we would like our model to have similar false positive rates for both Black and white defendants. (A small sketch of this idea follows at the end of this passage.)

The backbone of fairmodels is the DALEX package, which is a wrapper around the data and the model, and it is model-agnostic. So it does not matter whether you use XGBoost or models from caret or mlr. fairmodels focuses on group fairness metrics and utilizes an iterative, cumulative approach where, after obtaining a model, you can perform a fairness check. You then receive the answer whether your model is fair or not. If your model does not pass the fairness check, you have some options: for example, use different data, use a different model, or maybe use some mitigation techniques. With fairmodels you can, for example, make a dozen machine learning models, then make some visualizations to pick the best one, and check whether this model passes the fairness check. As you can see, this sort of pipeline, or model-development flow, makes it easy to test and prototype your solutions.

Now we'll dive into the details. Let's take the German Credit data, where we aim to predict whether a person will have good or bad credit risk. In this case, this is binary classification.
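To make the confusion-matrix idea above concrete, here is a minimal sketch in base R (not fairmodels code; the toy vectors `y`, `y_hat` and `race` are invented for illustration) that derives a per-subgroup false positive rate, the metric from the ProPublica example:

```r
set.seed(42)
# Toy data: binary target, model predictions, and a protected vector
y     <- rbinom(200, 1, 0.5)
y_hat <- rbinom(200, 1, 0.5)
race  <- sample(c("black", "white"), 200, replace = TRUE)

# False positive rate per subgroup, derived from each subgroup's confusion matrix
fpr_by_group <- sapply(split(data.frame(y, y_hat), race), function(d)
  sum(d$y == 0 & d$y_hat == 1) / sum(d$y == 0))  # FP / (FP + TN)
fpr_by_group  # similar values across subgroups suggest no bias on this metric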
And we'll be looking at bias and fairness from the perspective of the protected vector, which here is the gender, or sex, of the person. So let's start with a simple classification model, which will be logistic regression. The next step is to make a DALEX explainer. This can be done with one line of code where we pass the model, the data, and the numerical target. Now we are ready to check fairness.

The main function of fairmodels is fairness_check(). It wraps around and encapsulates the explainer, the protected vector, and the privileged subgroup, which specifies which level of the protected vector is suspected of being the most privileged. The function returns an object of class fairness_object, which we assign to a variable. The fairness object can, of course, be plotted. The plot includes five metric scores, depicted by bars drawn on top of red and green areas. The red area symbolizes the field where the bias is significant. So, intuitively, when all the bars lie within the green area, the model, in terms of these metrics, can be called fair. The border between the red and green areas can be adjusted with the epsilon parameter. On the y-axis the subgroups are presented, and for each subgroup the metrics are calculated. If there were more than one subgroup, there would be more bars for each metric; the protected vector doesn't have to be binary, it can have many categorical values or levels.

But what are those values on the x-axis? Well, let's dive deeper. The idea behind those values is the ratio of a metric for the unprivileged subgroup, which in our case is female, to that for the privileged subgroup, which in our case is male. We don't care whether the metrics are high or low, as long as their ratio is close to one. For example, let's take the equal opportunity ratio, which is measured with the true positive rate. When the ratio of the true positive rate scores is close to one, the bar lies within the green area, and we would say that there is no visible bias here. So, formally speaking, we would like our ratio to lie between epsilon and one divided by epsilon, where epsilon is some value between 0 and 1 which denotes the lowest acceptable ratio between the unprivileged and privileged subgroups.

Now let's look at some code behind this fairness object. The values after the privileged parameter are the default values for this function. The epsilon parameter is by default set to 0.8, following a guideline set by the US Equal Employment Opportunity Commission: generally, any selection rate below 80%, or four-fifths, can be considered adverse impact. Cutoffs and labels can also be adjusted; labels have to be unique for each explainer in a fairness check. With such a newly created fairness object we can, of course, plot it and print it. Printing is a way of summarizing the plot: the print method shows how many metrics the model passes, and it also shows the total loss of the model, which is the summed height of the bars.

A cool feature of fairmodels is that you can incrementally add explainers to a fairness object. Notice that when we pass a fairness object, we don't have to provide the protected vector or the privileged parameter again, because they are already included in the object. On the fairness check plot on the right we can see green bars added, which denote the metric scores of a ranger model for the female subgroup. It passes all of them, so in the print method it is printed in green. If the ranger model exceeded the acceptable score on more than one metric, it would be printed in red.
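A minimal sketch of the workflow just described, closely following the fairmodels documentation (the `german` data ships with the package; exact defaults may differ across package versions):

```r
library(fairmodels)
library(DALEX)
library(ranger)

data("german")                        # German Credit data shipped with fairmodels
y_numeric <- as.numeric(german$Risk) - 1

lm_model <- glm(Risk ~ ., data = german, family = binomial())
explainer_lm <- explain(lm_model, data = german[, -1], y = y_numeric)

fobject <- fairness_check(explainer_lm,
                          protected  = german$Sex,
                          privileged = "male")   # epsilon defaults to 0.8
plot(fobject)
print(fobject)

# Incrementally add a second model to the same fairness object
rf_model <- ranger(Risk ~ ., data = german, probability = TRUE)
explainer_rf <- explain(rf_model, data = german[, -1], y = y_numeric)
fobject <- fairness_check(explainer_rf, fobject)  # protected/privileged reused
plot(fobject)
```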
Sometimes there is a need to plot unscaled metric scores: not the ratios we have seen earlier, but the raw metric scores of the models in specific subgroups. In the plot legend we have colors and shapes: each color denotes a model, and each shape denotes a certain subgroup. You might ask, where is the male, privileged subgroup? Well, it is represented by the vertical lines. So the intuition behind this plot is that the closer the shapes are to those vertical lines, the better.

In some visualizations, parity loss is used. It is a custom function that aggregates a metric's ratios over all of the subgroups. So, for example, let's take the true positive rate and calculate its parity loss. The intuition behind this measure is that the bigger the difference between the metric across the subgroups, the larger the parity loss will be. Here are some other visualizations that use parity loss; they can be obtained with the code snippets below the plots. These are only two of the many visualizations available in fairmodels: on the left side there is a radar plot, and on the right side there is a stacked metrics plot. They both visualize the amount of bias in many metrics at the same time.

The package also offers some simple bias mitigation strategies, which can be used during data pre-processing, before we create a model, or on the explainer as post-processing (see the sketch after this passage). Resampling focuses on mitigating statistical parity: it duplicates or removes observations given some condition. Reject option based classification (pivot) is a post-processing method that adjusts the probabilistic response of the model: when an observation from the unprivileged subgroup is close to the cutoff, its prediction is pivoted to the other side, and the opposite happens for the privileged subgroup on the other side of the cutoff. You can see the effects of mitigation in the plot on the right; both methods successfully mitigate the bias in our model.

A somewhat new addition to fairmodels is support for regression-type models. It works just like the one for classification, but instead of fairness_check() the user must use fairness_check_regression(). This experimental module uses logistic regression to approximate the fairness non-discrimination notions. The module is relatively new and still awaits feedback, so if you use it, please let me know what you think.

To dive deeper into the concepts of fairness and fairmodels, I recommend visiting the fairmodels landing page, available at this link. There are many resources to choose from, including an article, blogs, documentation, and tutorials. You can also visit our blog, which focuses on responsible machine learning: things like explainability, new packages, case studies, and there is even a whole series about the basics of explainable AI. If you want to explain the behavior of your model and understand on what grounds it makes its decisions, you can check out the DALEX package; fairmodels is also implemented in Python as the fairness module of the dalex package.

That will be all for me. Thank you for your attention. I hope that I gave you some intuition and a basic understanding of fairmodels. If you have any questions, please go ahead. Thank you.
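A sketch of the two mitigation methods mentioned above, continuing the earlier snippet (argument details such as `theta` are taken from the package docs and should be treated as assumptions about the version in use):

```r
# Pre-processing: uniform resampling towards statistical parity
idx <- resample(protected = german$Sex, y = y_numeric)
lm_model_res <- glm(Risk ~ ., data = german[idx, ], family = binomial())
explainer_res <- explain(lm_model_res, data = german[, -1], y = y_numeric,
                         label = "lm_resampled")

# Post-processing: reject option based classification (pivot) on the explainer
explainer_piv <- roc_pivot(explainer_lm,
                           protected  = german$Sex,
                           privileged = "male",
                           theta      = 0.05)  # width of the region around the cutoff
explainer_piv$label <- "lm_pivoted"            # labels must be unique

fobject_mit <- fairness_check(explainer_res, explainer_piv,
                              protected  = german$Sex,
                              privileged = "male")
plot(fobject_mit)  # compare against the unmitigated model
```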
Thanks a lot, Jakub. We have a couple of questions in the Q&A. So the first one would be: does the bias originate from the methods of collecting the data, or were the data biased themselves?

Well, there are many origins of bias. I once saw a paper that found as many as 20 of them, and bias can enter the data at many stages: for example, during data collection, or when the data depict a state of the world that we don't want to propagate, or during data labeling, et cetera. So the bias can be picked up along the road, I guess.

Thank you. Another question that we got, from Andy, is the following: what would be the advantages or disadvantages of measuring fairness against all cases rather than against only the privileged group? Is there a way to automatically detect the privileged group beyond picking the larger class?

Okay, great question. I guess there is no such automatic way implemented, but you can iterate through every class and see whether bias exists; the numerical values can be accessed through the fairness object, so there is a nice way to do it. So I guess, yeah, you can iterate through each subgroup.

Thank you. Yeah, we have time for another question, so just keep them coming; we also have time after all of the talks to answer more questions. So I see another question: what happens if the mitigation is done in a biased way? How do we check whether the changes fix the biases or end up introducing more bias?

Another great question. Well, in terms of the five metrics that you saw in the fairness check, you can always plot them after the mitigation to see whether the bias was indeed mitigated or whether other biases were introduced. You also have to keep in mind that these bias mitigation methods affect the accuracy, or the performance, of your model, which will most likely decrease. That is not always what we are looking for, but it can also be visualized in fairmodels.

Nice, thanks a lot. And we can get back to more questions later; I see there are more coming, and I also have some questions for later, I guess. Okay, but now let's hand over to Liz again to introduce our second speakers.

Yes, our next talk is about DoubleML, or double machine learning, in R. So it fits very well that we have two presenters, Philipp Bach and Malte Kurz from the University of Hamburg. They're going to talk about the estimation of causal parameters by machine learning, applied to a wide variety of models. Let's see what they have to say.

Welcome everybody to our talk on the DoubleML package for R. We are Malte Kurz and Philipp Bach from the University of Hamburg, and we present this project, which is joint work with Victor Chernozhukov and Martin Spindler. First of all, I would like to thank the organizers of useR! 2021 for inviting us to present our R package. Today I'm going to start with an introduction to the double machine learning framework, which is the statistical framework behind our package; later, Malte is going to provide some insights into our implementation in R.

Let's start with a basic question: what is double machine learning? In some sense, double machine learning can be considered a combination of the strengths of two separate fields. On the one hand, we have the literature on machine learning, illustrated here by the box on the left-hand side; on the right-hand side, we have a box which stands for the literature in the fields of econometrics and statistics. In machine learning, powerful tools have been developed that are able to deliver very accurate prediction rules, such as the lasso, regression trees, or random forests.
The statistics and econometrics literature has developed various approaches to causal inference and also delivered asymptotic analysis of certain estimators of causal parameters. The double machine learning framework makes it possible to estimate causal parameters while basing the estimation on these powerful machine learning techniques. As a result, the procedure outputs an estimator that has nice statistical properties, such as an asymptotically normal distribution, and we can set up hypothesis tests and confidence intervals.

On the next slide we have listed some examples of when causal machine learning might be of interest: for example, when data scientists perform and evaluate A/B testing, when researchers evaluate clinical studies, or in political science or economics, when certain policy measures are evaluated. The general question underlying all these research agendas is: what is the effect of a certain treatment on a variable of interest?

Let's continue with a short example on a partially linear regression model. In this model we are interested in the causal effect of a variable D on the outcome Y, provided we control for some confounding variables X, and this way of controlling for the confounding variables can be very general. In this model we are interested in the parameter theta, which represents the causal effect. When we want to estimate this causal effect using machine learning methods, we have to be very careful in general, because machine learning methods generally introduce some form of regularization and thereby some bias. If we naively apply machine learning procedures to estimate the parameter of interest in, for example, the partially linear regression model, the resulting estimator might be severely biased and also have a non-normal asymptotic distribution. In the histograms below we have illustrations of two separate naive approaches: one is based on a lasso variable selection step, and the second is based on naively plugging in predictions obtained from a random forest. In both histograms we see that the empirical distribution of the estimator is very different from a normal density, which is illustrated by the red solid line.

The double machine learning framework is a general approach to the estimation of treatment effects based on machine learning estimators. Identification of the parameter of interest is based on a moment condition with a score function psi, some data W, and a nuisance term eta that has population, or true, value eta naught. The double machine learning approach can be summarized in terms of three key ingredients: the first is Neyman orthogonality, the second is the use of high-quality machine learning estimators, and the third is the use of sample splitting.

Let's turn to the first ingredient, Neyman orthogonality. Neyman orthogonality is a property of the score function that is used for identification. The idea is that the moment condition used for identification of the causal parameter tolerates small errors in the estimation of the nuisance part eta around its true value eta naught. In some sense, the estimation procedure is immunized against the first-order, or regularization, biases that result from using machine learning estimators. In the partially linear regression example we saw before, we can establish orthogonality by including an additional regression step.
Here, if we include the first-stage relationship, which is a regression of the treatment variable D on all covariates X, we can obtain a Neyman-orthogonal score of the form shown below, which now has two nuisance components: the first one is the function g showing up in the main regression equation, and m comes from the first-stage step.

Let's go to the second key ingredient, the use of high-quality machine learning estimators. In general, we want to use very accurate machine learning methods for the estimation of our nuisance part eta. In theoretical terms, this amounts to requirements on the rate of convergence of the estimators in use. In practice, the choice of machine learning methods will be based on structural assumptions: for example, whenever we are willing to make an assumption of sparsity, we might use an l1-penalized estimator, like the lasso.

The third ingredient is the use of sample splitting, which is a common procedure in machine learning. Sample splitting avoids biases that arise from overfitting. So, in the algorithm of the double machine learning estimator, we fit the machine learning methods only on one part of the data, the train sample, and generate predictions on the holdout part, the test sample. We can then swap the roles, which is called cross-fitting: by swapping the roles, we can use every observation in the dataset once for training and once for prediction. Once we obtain these predictions, we can plug them into the score and solve for our target parameter theta.

Once we include all three ingredients, and under some regularity assumptions, it can be shown that the double machine learning estimator, theta tilde, is asymptotically normally distributed. More details are involved, and for these we refer to the paper by Chernozhukov and co-authors (2018), and also to our package vignette, which is available online and gives a more extensive introduction to the double machine learning framework.

To conclude our partially linear regression simulation example, we have included here the two histograms associated with the machine learning estimators based on the lasso and the random forest, now using the orthogonal score and sample splitting. We see that the empirical distribution is now much more similar to the normal density illustrated by the red solid line. That was the theory part, and now I hand over to Malte.
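For reference, the model and score the speakers describe can be written out as follows (a reconstruction following Chernozhukov et al. 2018, since the slide formulas themselves are not in the transcript):

```latex
% Partially linear regression (PLR) model
Y = D\,\theta_0 + g_0(X) + U, \qquad \mathbb{E}[U \mid X, D] = 0,
D = m_0(X) + V,               \qquad \mathbb{E}[V \mid X] = 0.

% Identification via a moment condition, nuisance \eta = (g, m) with true value \eta_0
\mathbb{E}\bigl[\psi(W; \theta_0, \eta_0)\bigr] = 0.

% A Neyman-orthogonal score with the two nuisance components g and m
\psi(W; \theta, \eta) = \bigl(Y - D\theta - g(X)\bigr)\bigl(D - m(X)\bigr).
```

Small errors in estimating g or m leave this moment condition unchanged to first order, which is exactly the "immunization" property described above.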
Yeah, thank you, Philipp. Now that we have learned how the double machine learning framework works, we want to have a closer look at the actual implementation in the R package called DoubleML. We have just learned the three key ingredients of double machine learning, namely the orthogonal score, high-quality machine learning methods, and sample splitting, and these three key ingredients also translate to the central parts of our implementation. The orthogonal score gets a central role via an object-oriented implementation using R6 classes. We also want to give the user the ability to use high-quality machine learning methods; this is achieved by making the API flexible with regard to using all the machine learning methods available via the mlr3 ecosystem. And sample splitting is also needed, and here we also build on top of mlr3.

So the main dependencies of our DoubleML R package are the mlr3 ecosystem, in terms of the mlr3 package, mlr3learners, and mlr3tuning. Besides that, for object orientation we need the R6 package, and as a data backend, for efficient data storage, we use data.table. How can you install or get our package? It has been released to CRAN, so you can install it via the standard command, and the development version is available through GitHub.

So why did we actually choose an object-oriented implementation? We have learned that the Neyman-orthogonal score function has a prominent role, and once we have defined this orthogonal score function for a model we are interested in, we can implement a lot of things in a very general way: the estimation of the causal parameters, the computation of the score function, the estimation of standard errors and confidence intervals, and also a multiplier bootstrap, all in a general way, just using the score function. This is done in an abstract base class called DoubleML, and from this we can then inherit all the model classes. The only model-specific parts are how the score function is actually implemented and which nuisance models we have to estimate using ML methods.

Coming to the model classes we have: Philipp introduced the partially linear regression model, which we see here on the left, but we also have other model classes; at the moment, four different models. For example, if you want to add an instrumental variable to your partially linear regression model, you can use the partially linear IV regression model, or if you are interested in heterogeneous treatment effects, we also have interactive regression models. All these model classes inherit from the DoubleML abstract base class, and you can, for example, also extend the package by adding other model classes.

So what are the main advantages of this object-oriented implementation of DoubleML? First of all, it gives the user very high flexibility with regard to the model specification: you can throw in all the different ML methods you might want to use for estimating your nuisance functions, you can alter the resampling scheme (how many folds and how many repetitions you use for repeated cross-fitting), you can choose between two different DML algorithms, and you can choose among different Neyman-orthogonal score functions. A second key advantage is that our package is easily extendable: you can add new model classes by inheriting from the central abstract base class DoubleML, and you can, for example, also add other resampling schemes.

Last, I want to quickly advertise our online resources. We have a website, doubleml.org, where you find an extensive user guide and a lot of examples. You can also see there that we have a Python twin, so if you are not only interested in R but also sometimes use Python, that's available as well. In addition, we of course have a package vignette, which is available as an arXiv working paper. And with that I'd like to conclude, and I want to thank you for watching this video. We are looking forward to the discussions at the useR! conference.
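A minimal sketch of the workflow just described, using the package's simulated-data helper and mlr3 learners (argument names such as `ml_g` changed across DoubleML versions, so treat the exact signatures as assumptions):

```r
# install.packages("DoubleML")                        # CRAN release
# remotes::install_github("DoubleML/doubleml-for-r")  # development version

library(DoubleML)
library(mlr3)
library(mlr3learners)
set.seed(1234)

# Simulated data from a partially linear regression setting
df <- make_plr_CCDDHNR2018(n_obs = 500, return_type = "data.frame")
dml_data <- DoubleMLData$new(df, y_col = "y", d_cols = "d")

# Random forests for both nuisance functions g(X) and m(X)
ml_g <- lrn("regr.ranger", num.trees = 100)
ml_m <- lrn("regr.ranger", num.trees = 100)

dml_plr <- DoubleMLPLR$new(dml_data, ml_g = ml_g, ml_m = ml_m,
                           n_folds = 5, score = "partialling out")
dml_plr$fit()        # cross-fitted nuisance predictions, then solve the score
dml_plr$summary()    # estimate and standard error; $confint() gives intervals
```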
And as we have a bit of time left, I want to quickly show you: we have two papers available for the two different packages, and on our website you also find things like the DoubleML workflow, which shows, for a data example, how you can use the DoubleML package and guides you through the important steps for using DoubleML for causal machine learning. Thank you.

So thanks a lot, both of you. We're still waiting for questions, so please everyone, type your questions. In the meantime, I would like to say I really like the logo of your package; I would also like to know how you came up with that. But I can also start with a question on your talk. At the end, you said that DoubleML is easily extendable to other models, or all kinds of models. Did you get any feedback from people who tried to extend it and had good experiences with that?

So I think I can take this, right? Yes, we already got some feedback. And of course, the paper by Victor Chernozhukov on double machine learning is pretty general; it's a very general approach. I think it's pretty common for a statistical framework to first be developed in a very general way, and practitioners then have to adjust the details according to their analysis. So you have to cluster standard errors, for example, or you want some extension, some slight modification: you want to trim your propensity scores, and all these little things that you have to tune and parameterize in your analysis. And I think this is the part we are currently developing; we are going to integrate more and more of these details, which are not really details, but are relevant if you work on a particular application.

That makes a lot of sense, thank you. So I see there are quite some questions coming in. One of the questions, which I actually also had, is from Andy: were there particular real-world applications which inspired you to do this work?

So we have some relevant experience: we come from statistics and econometrics, and we already had some experience with using lasso-based methods for causal inference in practice. The natural development, or extension, was generalizing this to the double machine learning approach, where we can basically use any kind of learner. And then, of course, we were curious to find out whether everything works well. And yes, this is how we got to this problem; it was, of course, driven by wanting to use all this machine learning machinery in applications.

Great. I see a more provocative question, from Carlo, I guess: you're combining machine learning and statistics; do you want to conquer the world?

I don't know how to answer this question; you can help me out on this. So, we don't have plans to conquer the world. We just try to combine these two different literatures, because they are, of course, both relevant. On the one hand, the machine learning literature is developing very quickly; on the other hand, there is more and more happening in causal inference, and more causal models are appearing there. So I think conquering the world is the super, super big goal; even keeping up with the pace of publications is still a challenging goal. So we just try to be productive at all.
Yeah, maybe we can come back to this after the last talk. So I invite Liz again to present our last speaker. Thank you for the questions.

Our next speaker is Jian Ma from Tsinghua University in Beijing, and he is going to talk to us about his package called copent, for estimating copula entropy and transfer entropy in R. Let's see what he has to say.

Hello useRs, I'm Ma Jian. I got my PhD from Tsinghua University, majoring in computer science. The title of my talk is "copent: Estimating Copula Entropy and Transfer Entropy in R". In this talk, I introduce the package copent, a package for estimating copula entropy and transfer entropy. This work is part of my PhD thesis; the copent package was first developed during my PhD study.

This is the outline of my talk. I will first give a brief introduction to the background of copula entropy. Next, I will introduce the implementation of the package. Two examples will follow: one on variable selection, the other on causal discovery. Finally, a summary and some information.

First, the introduction. Statistical independence and conditional independence are two fundamental concepts in statistics, with many applications. Copula entropy is a mathematical concept for multivariate statistical independence, which I proposed during my PhD study; I also proposed a non-parametric method for estimating it in my PhD thesis. Transfer entropy is a tool for measuring causality. It generalizes the famous Granger causality to nonlinear cases. Recently, I proved that it can be represented with only copula entropy (CE), and therefore can also be estimated non-parametrically via CE. The copent package in R implements the above methods for estimating CE and transfer entropy (TE). This talk introduces the implementation of the package and then compares it with other related packages.

This slide is for copula theory. Copula theory is about the representation of multivariate dependence with the so-called copula function; at the core of the theory is Sklar's theorem, as shown on the slide. With copula theory, we give the definition and the theorem in the theory of copula entropy. We first define copula entropy, a type of Shannon entropy defined with the copula density function, along with a proof of the equivalence between mutual information and copula entropy. The theorem says that mutual information is actually negative copula entropy. One difference is that mutual information is defined for bivariate cases, while copula entropy is defined for multivariate cases.

Copula entropy is an ideal measure of statistical independence, with several good properties that the other measures don't have: it is multivariate, symmetric, non-negative (and equal to zero if and only if the variables are independent), invariant to monotonic transformations, and equivalent to the traditional correlation coefficient in Gaussian cases. The table on the slide compares copula entropy with two other famous measures, distance correlation and HSIC; we can see that copula entropy has several advantages over the other two.

This slide is on the method for estimating CE. Estimating mutual information is considered notoriously difficult. Here we propose a method for estimating mutual information, or copula entropy, based on the theorem just mentioned. It is simply an elegant composition of two simple steps: first, estimating the empirical copula density function with rank statistics; then the problem becomes an entropy estimation problem, and among the available methods we propose the kNN method for estimating entropy. Because both steps are non-parametric, the final method is non-parametric. It has several advantages, as shown on the slide; please refer to the paper on the slide if you are interested in more details.
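A minimal sketch of the two-step estimator with the copent package (a toy bivariate Gaussian example invented here; per the package docs, copent() returns negative copula entropy, i.e., an estimate of mutual information):

```r
# install.packages("copent")
library(copent)

set.seed(1)
# Toy data: two correlated Gaussian variables
rho <- 0.6
x1  <- rnorm(800)
x2  <- rho * x1 + sqrt(1 - rho^2) * rnorm(800)
x   <- cbind(x1, x2)

# One call does both steps: rank-based empirical copula + kNN entropy estimation
copent(x, k = 3, dtype = 2)           # -CE, i.e., estimated mutual information

# The two steps can also be run separately:
u <- construct_empirical_copula(x)    # step 1: empirical copula via rank statistics
-entknn(u, k = 3, dtype = 2)          # step 2: kNN entropy estimate, negated
```

For comparison, the true mutual information here is -0.5 * log(1 - rho^2), about 0.22.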
This slide is for estimating transfer entropy. Transfer entropy is an important concept for measuring causality; it is essentially conditional independence testing, and it has wide applications in different fields. However, estimating it is just as difficult as estimating mutual information. Recently, I proved that transfer entropy can be represented with only CE, as shown in equation (5): transfer entropy equals three CE terms. With this representation, we propose a non-parametric method for estimating transfer entropy, composed of two simple steps: first, estimating the three CE terms in equation (5), and then calculating TE from the estimated CE terms.

After the theory, this slide gives an overview of the copent package. It implements the methods for estimating copula entropy and transfer entropy. The latest version is 0.2 and includes the five functions listed on the slide; I will introduce them one by one. This slide is for the three functions for estimating copula entropy. The method for estimating copula entropy is composed of two steps, so we have one function for each step: construct_empirical_copula() estimates the empirical copula density, and entknn() implements the kNN method for estimating entropy, here for estimating copula entropy from the empirical copula density from the first step. The main function, copent(), is two lines of code calling the above two functions; for the user's convenience, copent() returns negative copula entropy. ci() is the function for testing conditional independence; it is simple, just estimating the three CE terms and calculating the result. transent() is the function for estimating TE. It is also simple: it just prepares the data according to the lag argument, which is for the time lag, and then calls the function ci(), because transfer entropy is essentially conditional independence.

Now let's demonstrate the usage of the package with examples. The first example is on variable selection; the paper and the code for this example are listed at the bottom of the slide. Variable selection with copula entropy simply means selecting variables based on the rank of the association between each variable and the target, as measured by copula entropy. There are several other related measures in R, such as HSIC in the dHSIC package, distance correlation in the energy package, the HHG test in the HHG package, Hoeffding's D test and the Bergsma-Dassios T* sign covariance in the independence package, and ball correlation in the Ball package. We will compare them in this example.

The data used here is the Heart Disease dataset in the UCI Machine Learning Repository. It contains four databases, including 899 samples without missing values, and each sample has 76 attributes, of which number 58 is the diagnosis; 13 of the attributes are recommended by professionals as clinically relevant. The goal of the example is to select attributes for predicting the diagnosis. We load the data directly from the UCI server with the code on the slide. Here is the main code for the example: we call the functions for all the measures in the R packages, and with this code we estimate the dependence between attribute number 58 and the other attributes.
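A sketch of both package uses on toy data (the UCI loading code stays on the slides, so the data below are invented stand-ins; the second part anticipates the causal-discovery example described next):

```r
library(copent)

## Variable selection: rank attributes by copula entropy with the target
## Toy stand-in for the heart disease data: 10 numeric attributes, target in column 1
set.seed(7)
hd <- as.data.frame(matrix(rnorm(300 * 10), ncol = 10))
hd[, 1] <- hd[, 2] + 0.5 * hd[, 3] + rnorm(300)     # make two attributes informative

ce <- sapply(2:ncol(hd), function(j) copent(cbind(hd[, 1], hd[, j])))
names(ce) <- names(hd)[-1]
sort(ce, decreasing = TRUE)                         # V2 and V3 should rank on top

## Causal discovery: transfer entropy at lags 1..24 (toy hourly series)
pres <- rnorm(501)
pm25 <- c(0, head(pres, -1)) + rnorm(501)           # pm25 driven by lagged pressure
te <- sapply(1:24, function(l) transent(pm25, pres, lag = l))  # TE from pres to pm25
plot(te, type = "b", xlab = "lag (hours)", ylab = "transfer entropy")
```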
This slide shows the figures for the selection results with the six measures. The red line in each sub-figure marks the dependence between attribute number 58 and attribute number 16, which serves as the selection threshold for each measure. This slide is for the interpretability of the selections: we compare the selections with the recommended variables to check the number of selected recommended variables, and we see that copula entropy selects more of them than the other measures did.

Next is the example on causal discovery; the paper and the code for this example are listed at the bottom of the slide. Causal discovery means inferring causality from observational data. Here we will infer causality by estimating transfer entropy with the copent package. There are also other related R packages, including kernel-based conditional independence in the CondIndTests package, conditional distance correlation in the cdcsis package, and the conditional dependence coefficient (CODEC) in the FOCI package.

The data used here is the Beijing PM2.5 dataset, also from the UCI repository, containing hourly observations spanning five years, with several meteorological factors. In this example, we use the pressure factor. To avoid missing values, we use only a part of the continuous observations, including 501 samples. Here is the main code of the example: we estimate the causal strength from pressure to PM2.5 at lags from one hour to 24 hours. We first prepare the data according to the lag in hours, and then call the function for each measure. For transfer entropy, we have two ways of doing the job: we can call the function transent() directly, or we can call the function ci() with the prepared data. This slide shows the estimation results with the different measures; we can see that transfer entropy, CDC, and CODEC give similar results.

Now let me summarize the talk. First, I introduced the background of the copent package. Then I introduced the implementation of the package. Next, I used two examples to demonstrate the usage of the package and compared it with other R packages. This slide shows the references for this talk, and this slide shows the software: the copent packages in R and Python are now available on CRAN and PyPI respectively, and also on my GitHub. And finally, let me show my electric car, a Golf I named Copula Entropy. I enjoy the power of Copula Entropy on my way to the office and also in my office; Copula Entropy really gives me a lot of fun every day. I hope you can also enjoy it, with the copent package. Thank you for listening.

Thanks a lot for your talk, Jian. Thank you. So there are no questions from the audience yet, if I see correctly, but let them come, please. In the meantime, I can start with a question: which one should I choose? For example, in the variable selection example, you compared different methods, or different R packages, and showed the results, and the copula entropy framework selected more variables than the others. Is there a reason for that? Can you explain that?

I think I can explain it theoretically and practically. Theoretically, copula entropy is defined rigorously; I think it is an ideal mathematical concept for statistical independence measures, and compared with the other theoretical tools, it has advantages over the others. And as for the method for estimating those measures, our estimation method is non-parametric, so it is model-free and can be applied to any case. It also takes advantage of the copula function, which makes our estimation more stable and reliable. So I think that answers your question from these two aspects. Thank you.
Also regarding the same example: since you compared these different methodologies, did you also compare the predictive performance of the different models that came out of the different frameworks?

Yes, I did. In another paper, shown in the reference slide, titled "Variable Selection with Copula Entropy", I show more details on these examples. After we select variables, we compare the predictability of the models, and the model built with copula entropy also presents the highest predictability.

Great. I see a question now from Martin Maechler. He also commented in the questions that the copula package also provides, or estimates, the empirical copula, and the question is: how do you tune the smoothing when estimating the empirical copula?

Estimating the empirical copula is very simple: we just use the rank statistics and divide by the sample size, and we call that the empirical copula. That's very simple, just two or three lines of code can do the job. I hope that answers your question, Martin; otherwise, just ask another one.

So we have two more minutes reserved for you, so I can ask another one of my questions. OK, he actually says that he thanks you, and that they found in their package, I guess, that it sometimes needs tuning of the smoothing. Yeah. OK, so I don't know if you want to go on with that, but otherwise my question would be: you also implemented your method in Python. Do you see advantages to R, or why did you do that?

Because I think our estimation method is meant to be applied in more areas. I have applied it to several projects successfully, so I think copula entropy can also be applied in the much larger Python community, which can then use the copent package. And I see many Python users have already used my package in their projects; I got a lot of feedback from them.

That's great if it's used; that's why you make it, I guess. So I believe now we're in the global Q&A part of this session, so I invite all the speakers to come back to the stage and switch on their cameras if they want, and I invite all the participants to keep asking questions, also on Slack. So, Liz, do you have a question that you would like to ask one of the speakers before I go? Yeah, go for it. OK.

All right. So I hope that I direct these to the right speakers now; I tried to write it down, but I'm not 100% sure I did this correctly. I guess the first one goes to Jakub, from Anastasia: do you think that unsupervised algorithms can also be biased?

Well, I guess so, because we have some data, we do unsupervised learning, we are, for example, clustering something and measuring the distance between the clusters, and bias may appear in such a way. But I didn't focus my research on unsupervised learning, so I don't know if there are, for example, packages or more research methods on this topic. Thank you. Also, if the other speakers want to jump in, that's totally fine.

Yeah, I think we're going on with bias, so I guess it's again for Jakub: are you testing the bias-mitigated model against a separate dataset, or can you use cross-validation?

You can use, I guess, different data. For example, you may train your models on training data and, of course, evaluate them on the test data, and you can measure fairness on this test data, of course.
I don't know if that answers your question. But also, in fairmodels, you can just use different columns, for example, or differently preprocessed data, et cetera, as long as the target variables match and you are predicting on the same set of people or instances.

Yeah, and then, like I said, there's another question for you: is there any particular stage of collecting and analyzing data when we should start paying attention to bias detection?

I guess the most focus should be put on the data gathering, and on thinking about what features we need in this data to predict the outcome: maybe features like the gender, or the place where a person lives, et cetera, and correlated variables like zip codes, which may indicate where someone is from. I guess this setting of the data gathering, and all the things you have to do before making your machine learning model, is the most important part, in my opinion. But you have to keep it in mind at all stages.

Yeah, thanks a lot. So I do have a couple more questions anyway. And Liz, if you have something, just take the floor. Are you looking at Slack as well? There's, I think, nothing; there are a couple of people that are interested, which is good. But yeah, maybe Malte and Philipp, can you give us another real-data example where you applied your methodology?

Yes. So what we did, as a first test of our package: in the original paper by Victor Chernozhukov, where the double machine learning approach was developed — it appeared in 2018 in The Econometrics Journal — there is an application where you estimate the causal effect of eligibility for, or participation in, certain pension plans on net financial assets. This is what we replicate, and we have it online on our website, so you can go through all the single steps of the estimation, and you can also compare the results to what they have in the paper; the dataset has been used in other publications as well. So you can find out what the important parts are: do the parameters vary in some way, do your results change if you use different learners, or if you change something, or if you use only a subset of the data, anything. So we have one example, and we are also working on getting the package to run in more and more applications; we're currently working on this, and they will all be published. We also have a vignette where we go through some examples. So in case you're interested in seeing how it works and what the idea is — because in our talk we were speaking rather generally about the method — it may be nice to see a more hands-on example, and that's what's available online. So in case you're interested, have a look.

Very good to know. Yeah, I would have another question for Jian, also regarding applications: can you name an application or a case study where the copula entropy framework is favorable compared to other methods?

You're muted, sorry.

Let me mention hydrology, that is, water resources and their management. Copula entropy is getting more and more popular in that area; many researchers in this area apply copula entropy to their problems. I know one researcher from — I don't know how to pronounce it — Lisbon? Yes, one researcher from Europe applied our method in their project to manage water resources in Brazil.
And also in my country, many researchers from different universities use copula entropy to design water resources monitoring networks, and in many other applications in hydrology. I know many other areas also apply our methods, but due to time, I won't say much more. Thank you.

Thanks a lot. I mean, you can also put some links on Slack, maybe, so that we get a better idea, if you want to. Yeah. So there are still no more questions coming in, but now I just dare to ask about this logo. There are actually two rhinos on the logo; I was wondering why. How come?

Very nice question. So it's basically something like a two-headed rhino, right? Yes. It was kind of hard to come up with a nice logo that illustrates the idea of the estimator. The basic challenge was to visualize the double robustness property of the estimator, so we thought of something double and something robust.

All right, so I have one last question for Jakub, and then I think we're done, I guess. On one of your first slides, you showed this diagram, as I'd call it, with the different steps of getting to a fair model, and there you actually presented the choice of the best model before the fairness check. I was wondering what the reasoning behind this ordering is, and whether it would also make sense to do the fairness check before testing which model is best.

Well, in this diagram, you have to do a fairness check to obtain the fairness object, and then you can make a lot of visualizations on top of that. And my point was that if you have many, many models, and some of them, or maybe all of them, do not pass the fairness check — maybe there are one or two metrics that are not met — you may use some visualization techniques to obtain the model with the least amount of bias. Maybe not a fair model, but the best you can get in this situation, because sometimes there may not be many ways to get rid of the bias without sacrificing the predictive power of the model.

Okay, so I guess it's kind of a trade-off you have to weigh somehow. Yeah, interesting. Nice. Perfect. So thanks a lot again to all the speakers, the four speakers of this session. And thanks a lot to Stefan, behind the scenes, for taking care of all the technical parts of this session, and to Liz for introducing the speakers. What is up next: you can enjoy some more social events, starting with trivia, which will happen at 3:15 UTC, and then at 4:15 UTC — so quite soon — you can enjoy another very exciting joint keynote, on communication. So thanks a lot, all of you, and see you in the next session. I'd also like to mention that if you are interested in this topic of modeling and data analysis, there will be another, similar session, 9C.