Hello everyone. My name is Mykola, and today I want to present our work towards a fair vandalism detection system for Wikipedia. Wikipedia plays a central role in today's web, and its knowledge is frequently used to power other websites and products. At the same time, it is a very fast-changing environment, with around 16 changes per second across more than 250 different languages. Importantly, not all of those changes are good-faith edits: some violate the rules and can even be damaging, and such damaging revisions should be identified and reverted if needed. There is a specific type of user, patrollers: volunteers who work on identifying those revisions. Even though there are tools like ORES that help patrollers do their work, there are still open problems, such as model performance, fairness, and language coverage.

Our goal is to create a model that helps editors identify edits that require patrolling. We use implicit annotations, reverts, collected from historical data to train machine learning models. We introduce a new generation of open-source, multilingual models for content patrolling on Wikipedia. Our tool outperforms the current state of the art and also increases language coverage by more than 60%. We also study the trade-off between the fairness and the performance of the models.

Given a revision as input, we compare it with its parent revision and extract the following features. We extract features based on the MediaWiki edit types package. We also extract the pieces of text that were inserted, removed, or changed. We then use masked language models, in our case fine-tuned multilingual BERT models, to classify those inserts, changes, and removals. Having those scores, we apply mean and max pooling for each of the signals and build a unified feature set (a minimal sketch of this pooling step appears below). Finally, we combine the extracted features with user and revision metadata and apply a final classifier to produce the probability of the revision being reverted.

To train the models we collect data from the MediaWiki history and MediaWiki wikitext history databases, covering the 47 most edited language editions of Wikipedia. We collect six months of data for training and the following week for testing. We also apply filtering, excluding bot revisions and revisions involved in revert wars or edit wars.

Here is an example of extracting the text signal from a revision. In this example there is only one change, in a single word; however, we extract a pair of texts that includes this change along with its surrounding context. There are no inserts or removals in this piece of text. Another example is the features extracted using the MediaWiki edit types package.

To train the model, we split our training dataset into two independent parts. We do that randomly, while making sure that different articles end up in different parts (also sketched below). We also keep a holdout dataset of one week to evaluate the final solution. We train different configurations of the model, meaning that we use different feature sets; we also experiment with training the model on revisions from anonymous users only. Here you can see two tables: the performance of the models trained on all users and on anonymous users only.
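To make the scoring-and-pooling step concrete, here is a minimal Python sketch. It is an illustration under assumptions, not the actual implementation: score_fragments is a stub standing in for the fine-tuned multilingual BERT classifiers, and the feature names and the 0.0 fallback for empty signals are my own choices.

```python
import numpy as np

def score_fragments(fragments: list[str]) -> np.ndarray:
    """Stub for the fine-tuned multilingual BERT classifiers.

    The real system scores every inserted / removed / changed text
    fragment with a masked-language-model classifier; here we return
    a deterministic placeholder so the sketch is self-contained.
    """
    return np.array([min(len(f), 50) / 50.0 for f in fragments])

def pool_signal(scores: np.ndarray) -> tuple[float, float]:
    """Mean and max pooling over one signal's fragment scores.

    An empty signal (e.g. an edit with no removals) falls back to a
    neutral 0.0 so every revision yields a fixed-length vector.
    """
    if scores.size == 0:
        return 0.0, 0.0
    return float(scores.mean()), float(scores.max())

def build_features(diff: dict[str, list[str]],
                   metadata: dict[str, float]) -> dict[str, float]:
    """Unified feature set: pooled MLM scores per signal plus metadata."""
    features = dict(metadata)  # user and revision metadata pass through
    for signal in ("insert", "remove", "change"):
        mean_s, max_s = pool_signal(score_fragments(diff.get(signal, [])))
        features[f"{signal}_mlm_mean"] = mean_s
        features[f"{signal}_mlm_max"] = max_s
    return features

# Example: one changed fragment, no inserts or removals.
print(build_features(
    {"change": ["The city was founded in 1066 ."]},
    {"user_is_anonymous": 1.0, "user_edit_count": 3.0},
))
```

The resulting fixed-length vector is what the final classifier would consume to estimate the probability of a revert.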
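The article-disjoint random split can be expressed with scikit-learn's GroupShuffleSplit. The toy data, the column roles, and the 50/50 ratio below are illustrative assumptions, not the actual training setup.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Toy stand-ins: one row per revision, grouped by the article it belongs to.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))                     # revision feature rows
y = rng.integers(0, 2, size=10)                  # 1 = reverted, 0 = kept
article_ids = np.array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4])

# Random split that keeps all revisions of an article on the same side,
# mirroring the "different articles in different parts" constraint.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.5, random_state=42)
part_a_idx, part_b_idx = next(splitter.split(X, y, groups=article_ids))

# No article appears in both parts.
assert set(article_ids[part_a_idx]).isdisjoint(article_ids[part_b_idx])
```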
We can see that the configuration trained on all users' revisions, with the user features and the features extracted from the masked language models, outperforms all other configurations as well as the current state of the art, ORES; the same is true for anonymous users. There is, however, an important insight: the model trained on anonymous users' revisions only is the best for anonymous users. Still, its gap to the best all-users configuration is small, which is why we use the model trained on all users as our final configuration. Moreover, evaluating our model on different languages, we conclude that it outperforms or performs equally to the previous state of the art, ORES, on all languages.

We also evaluate the fairness of the model. To do that, we use the disparate impact ratio (a sketch of the computation follows at the end). First, we calculate the base rate, which in our case is 7.93. It shows the difference in the probabilities of being reverted between revisions created by anonymous users and those created by registered users; anonymous users are, in our case, the unprivileged group. We see that this rate is not 1, which is important: naturally, revisions created by anonymous users tend to be more likely to be reverted than revisions created by registered users. However, as we can see from the table, the disparate impact ratio for the ORES model is 20.02, much higher than the base rate, meaning that the model introduces a significant bias against anonymous users. Our best configuration shows a much closer value of 9.54. That is still a little higher than the base rate, meaning we still introduce a minor bias against anonymous users, but it is much smaller than for the previous generation of models. We also analyze the difference in AUC score between the different groups of users and conclude that our model performs better on this metric as well, introducing less bias against the unprivileged group of anonymous users.

That is all for today. Thank you for your attention.
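To make the fairness metric concrete, here is a minimal sketch of the disparate impact ratio as described above: the ratio of the positive rate for anonymous users to the positive rate for registered users. The function name and the toy data are assumptions for illustration.

```python
import numpy as np

def disparate_impact_ratio(positive: np.ndarray,
                           is_anonymous: np.ndarray) -> float:
    """P(positive | anonymous) / P(positive | registered).

    Fed with the true revert labels this gives the base rate (7.93 in
    the talk); fed with a model's flagged revisions it gives that
    model's disparate impact ratio (20.02 for ORES, 9.54 for ours).
    """
    return float(positive[is_anonymous].mean() / positive[~is_anonymous].mean())

# Toy example: anonymous edits are reverted far more often.
labels = np.array([1, 1, 1, 0, 0, 1, 0, 0, 0, 0], dtype=bool)
anon   = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0], dtype=bool)
print(disparate_impact_ratio(labels, anon))  # 0.75 / 0.1667 ≈ 4.5
```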