Just in case, is Sergei Zagorouk in the room? Because I don't know him personally. Seems no, and I don't see him online, okay. Then, well, we need to start, so let me act as a session chair, at least for some time. So, we have a computer vision session right now, and the first talk will be by Razan Dibo, on some cool computer vision with X-ray images. So, Razan, can you please try sharing your screen, and also say something? Yeah, good morning. Good morning, we hear you very well. Yeah, it's perfect, please go ahead. Okay, thank you so much. So, good morning, esteemed attendees. It's truly an honor to present in front of such a distinguished audience today. My name is Razan, and I'm delighted to share with you insights from our project titled DeepLOC: Deep Learning-Based Bone Pathology Localization and Classification in Wrist X-ray Images. In fact, bone diseases are a common and serious problem. According to statistics, there are around 1.5 million cases of fractures due to bone diseases. It was also found that fractures of the wrist have the highest frequency of misdiagnosis, accounting for 32% of all cases. It's important to note that manual analysis of X-ray images is time-consuming even for experienced doctors. Also, bad-quality images may hide important details for the diagnosis. Furthermore, the experience of the radiologist is a key factor in the error probability. All in all, the problem can be stated as follows: the detection of wrist bone diseases is manual, time-consuming, and has a high probability of errors, and that's why there is strong motivation to automate this process. So, our goal in this project is to increase the speed and efficiency of the diagnostic procedure. As you can see here, this is the input image for our model. It takes several minutes for doctors to evaluate the image, while it takes less than a second for our model to evaluate it and draw these bounding boxes around the injured areas. The objectives of this project were, first of all, to apply the YOLOv7 architecture to the modified dataset that I will explain later, then to try to enhance the results by adding an attention mechanism, then to try adding vision transformers, and finally to combine all of these techniques together, that is, the attention mechanism with a Swin Transformer and YOLOv7, and to fine-tune the final architecture. Before I start talking about the proposed approach, let me first introduce what the Swin Transformer is. Swin stands for shifted windows, and it is basically a vision transformer variant, but the idea here is that it processes the image in a hierarchical way. Like any vision transformer, it still relies on patches, but instead of choosing one patch size and sticking with it through all the layers, the Swin Transformer starts with a small patch size and merges patches into bigger ones as we go deeper into the transformer layers. And here you can see each of these transformer blocks: it consists of the shifted-window multi-head self-attention module followed by a two-layer multi-layer perceptron, with layer normalization before each of them and residual connections. Talking about attention mechanisms, there are actually different attention mechanisms that have been suggested, either what's called channel attention or spatial attention, which reweight the feature maps to suppress the unimportant channels.
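To make the Swin block structure just described more concrete, here is a minimal PyTorch-style sketch of a Swin-like windowed attention block (layer normalization before each sub-block, multi-head self-attention inside non-overlapping windows, then a two-layer MLP, with residual connections). The window shifting and patch merging between stages are omitted, and all sizes are illustrative assumptions rather than the exact configuration used in the talk.

```python
# Minimal sketch (not the authors' implementation) of one Swin-style block:
# LayerNorm -> window-based multi-head self-attention -> residual,
# then LayerNorm -> two-layer MLP -> residual. Window shifting is omitted.
import torch
import torch.nn as nn

class WindowAttentionBlock(nn.Module):
    def __init__(self, dim=96, num_heads=3, window=7, mlp_ratio=4):
        super().__init__()
        self.window = window
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, mlp_ratio * dim), nn.GELU(),
            nn.Linear(mlp_ratio * dim, dim))

    def forward(self, x):                      # x: (B, H, W, C), H and W divisible by window
        B, H, W, C = x.shape
        w = self.window
        # partition the feature map into non-overlapping w x w windows
        xw = x.view(B, H // w, w, W // w, w, C).permute(0, 1, 3, 2, 4, 5)
        xw = xw.reshape(-1, w * w, C)          # (num_windows * B, w*w, C)
        h = self.norm1(xw)
        h, _ = self.attn(h, h, h)              # self-attention inside each window
        xw = xw + h                            # residual connection
        xw = xw + self.mlp(self.norm2(xw))     # residual connection
        # reverse the window partition back to (B, H, W, C)
        xw = xw.view(B, H // w, W // w, w, w, C).permute(0, 1, 3, 2, 4, 5)
        return xw.reshape(B, H, W, C)

# usage: out = WindowAttentionBlock()(torch.randn(1, 56, 56, 96))
```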
So, there is one study called CBAM, the Convolutional Block Attention Module, which considers both spatial and channel attention together, sequentially. But the problem with this approach is that it ignores the channel-spatial interactions; I mean that sometimes we have common, cross-dimension information, and in CBAM this cross-dimension information is lost. To amplify these cross-dimension interactions, GAM, the Global Attention Mechanism, was proposed, which is capable of capturing the significant features across all three dimensions. As you can see here, before going into the channel attention, we start with a 3D permutation to preserve the information across the three dimensions, and we have a two-layer multi-layer perceptron to amplify this cross-dimension information. Then we have the spatial submodule, which consists of two convolution layers for spatial information fusion. Now we can talk about the proposed approach. As you can see here, YOLOv7 was used as the baseline. YOLOv7 is made up of three main components: the head, the neck and the backbone; sometimes the head and the neck are merged together. The backbone is where the convolution layers extract the key features of the image for later processing. The main convolution block here is called ELAN, the Efficient Layer Aggregation Network, which uses group convolution to increase the cardinality of the features and combines the features of different groups in a shuffle-and-merge-cardinality manner. This way of operation can enhance the features learned by different feature maps and improve the use of parameters and computation. Then we have the neck, which takes these features from the backbone into the fully connected layers in the heads, to finally predict the bounding box coordinates and the classification probabilities. So, the final output layer, which is the head, makes the prediction, as we can see, at three different levels; the key point here is that this improves the model's ability to detect small, medium and large objects. If you notice these red boxes, you can see that we propose to insert the Swin Transformer and GAM attention before each of the detection heads. And, of course, you are now wondering why exactly in this position; let me first describe the dataset before answering this question. We used a pediatric wrist trauma X-ray dataset, which has 9 classes, and here you can see some samples for the different classes. It has around 20 thousand images with doctors' annotations, which are the bounding boxes and the disease labels. But after talking with doctors in the hospital, we decided to modify this dataset to keep only four classes. We merged two classes, foreign body and metal, into one class that we call foreign body, and the other classes we merged into a single class that includes pronator sign and the other diseases; we also kept a class that is important for doctors, which is the text class, so that we end up with four classes. After this modification, we trained the models on the modified dataset. It was split into training and testing sets, 70% and 20% respectively, and the results are shown in the table. The mAP metric is used to evaluate and compare the different models.
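Returning to the GAM block described above (a 3D permutation plus a two-layer MLP for the channel sub-module, followed by two convolutions for the spatial sub-module), here is a hedged PyTorch sketch of the idea; the reduction ratio, kernel sizes and normalization choices are assumptions, not the exact configuration used in DeepLOC.

```python
# Rough sketch of a GAM-style attention block (reduction ratio r=4 and 7x7 convs
# are assumptions): a channel sub-module that permutes (B,C,H,W) -> (B,H,W,C) and
# passes each position through a two-layer MLP over channels, then a spatial
# sub-module with two convolutions.
import torch
import torch.nn as nn

class GAMAttention(nn.Module):
    def __init__(self, channels, r=4):
        super().__init__()
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, channels // r), nn.ReLU(),
            nn.Linear(channels // r, channels))
        self.spatial = nn.Sequential(
            nn.Conv2d(channels, channels // r, kernel_size=7, padding=3),
            nn.BatchNorm2d(channels // r), nn.ReLU(),
            nn.Conv2d(channels // r, channels, kernel_size=7, padding=3),
            nn.BatchNorm2d(channels))

    def forward(self, x):                       # x: (B, C, H, W)
        # channel attention: the 3D permutation keeps cross-dimension information
        a = x.permute(0, 2, 3, 1)               # (B, H, W, C)
        a = self.channel_mlp(a).permute(0, 3, 1, 2)
        x = x * torch.sigmoid(a)
        # spatial attention: two convolutions for spatial information fusion
        s = self.spatial(x)
        return x * torch.sigmoid(s)

# usage: out = GAMAttention(128)(torch.randn(1, 128, 20, 20))
```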
Before starting the experiments, we had to find the best preprocessing and augmentation techniques. We tried different techniques, such as denoising with different filters, CLAHE, and various augmentations, and we settled on a pipeline that involves resizing the original images to 640 by 640 and using mosaic augmentation and horizontal flipping. Here you can see all these preprocessing steps. We then tried to find the best place to insert the attention, trying it before each of the detection heads and for all of the heads. We compared the results with CBAM, because theoretically we claimed that GAM attention is better than CBAM, but we had to confirm this in the experiments. And yes, it is better for each placement, although we can see that the difference is small; it is not a very big difference here. So, here you can see a table comparing the different models. We tried YOLO alone, we tried it with CBAM and with GAM, we tried it with Swin Transformers only before the heads, and we also checked whether it helps to use Swin Transformers even in the backbone. Finally, we tried all these techniques together, YOLO with GAM and the Swin Transformer, which we call DeepLOC, because it is the final model. You can see the results for each class; it achieves the best results, reaching 0.65 mAP. We also compared the precision-recall curves of the different models: the baseline YOLO, YOLO with CBAM, and our modified, winning model DeepLOC. Looking specifically at the curve related to the bone lesion class, you can see a real enhancement using our proposed idea, and in the mean over all curves there is also an enhancement. For the ablation, we tried to see what the effect of each component is separately rather than only looking at the combined model. In this table you can find the precision and recall results for the chosen model, for all classes together and for each of them. Now let's look at real images to see the effect of our chosen model. Using plain YOLO on this image, you can see that it wrongly predicts a region of the image as a foreign body, while this problem does not exist in our modified version, which correctly returns that there is a fracture. Note that the white bounding boxes are the ground truth, and the other colors are returned by the models. In another example, using YOLO, only one of the two instances of periosteal reaction was returned, while our modified version returned both correctly. And here is also a very important example, because using YOLO none of the ground-truth bounding boxes were detected, whereas with our model you can see that, even if only one class is predicted, the predicted bounding boxes still cover the ground-truth regions. We also tried testing our model on external data from a hospital.
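As a small aside on the mosaic augmentation mentioned in the preprocessing pipeline above, here is a minimal NumPy sketch of the idea: four images are tiled into one training canvas so that objects no longer sit only near the image centre. It is a simplified illustration, not the actual YOLOv7 implementation, and it omits the corresponding bounding-box transformation.

```python
# Minimal sketch of 4-image mosaic augmentation (not the exact YOLOv7 code):
# four images are resized and tiled into one 640x640 canvas, so objects end up
# in different regions of the training image instead of always near the centre.
import numpy as np

def mosaic(images, out_size=640):
    """images: list of 4 HxWx3 uint8 arrays; returns one out_size x out_size mosaic."""
    assert len(images) == 4
    half = out_size // 2
    canvas = np.zeros((out_size, out_size, 3), dtype=np.uint8)
    cells = [(0, 0), (0, half), (half, 0), (half, half)]
    for img, (y, x) in zip(images, cells):
        h, w = img.shape[:2]
        # nearest-neighbour resize to the cell size (a real pipeline would also
        # transform the bounding boxes accordingly)
        ys = (np.arange(half) * h / half).astype(int)
        xs = (np.arange(half) * w / half).astype(int)
        canvas[y:y + half, x:x + half] = img[ys][:, xs]
    return canvas
```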
First of all, the doctors were interested only in fractures, and you can see that the model was able to detect the fracture correctly, with very good accuracy, even though the image was somewhat rotated. This is also a very important example, because even experienced doctors could not see that there is a fracture here, since it is an external example outside the original data; the model was able to detect that there is a fracture here, and here. In conclusion, I can say that adding the Swin Transformer to YOLO improved the results by about 6%, and when we added the attention mechanism we had another positive impact on the detection results of about 8%. We found that the position of the attention mechanism had a relatively small impact on the result, and it's worth noting that we still have a limited amount of bone lesion class images, which negatively affected the average precision result. So, of course, adding new images to the dataset will definitely improve the results. For future work, I can say that expanding the dataset and exploring more transformers or even attention mechanisms are important areas for further experiments. And thank you so much. Colleagues, do we have questions? We have questions. Thank you for the talk. I have two questions in particular. The first question: unfortunately, YOLOv7 has a research-only license, so no hospital could use it for free. Could you please tell me whether replacing it with some alternatives or earlier versions would affect the result a lot, or would it be more or less the same? Could you please comment on that? Yeah, thanks for the question. Actually, I didn't know that its use is limited, but I think, yeah, if you use alternatives, of course it will affect the result, because the baseline we used is YOLOv7. But actually, I've also tried using YOLOv5 and even YOLOv8, and YOLOv7 has the best results on this dataset. Can you comment on the difference between v5 and v7? Because v5 is, at least for now, more or less free. Yeah. So I think the main difference is the use of ELAN and also, in the detection heads, the sizes of the anchor boxes used to find the small, medium and large objects in the image itself. Okay, thank you. A second small question. Image augmentation really helps in computer vision, with different augmentation approaches and so on. My question is, are there specific augmentations that can be used for X-ray images? Thank you. Yeah, thanks for the question. Yes, actually I used something called the mosaic augmentation technique. You know that in an X-ray image the object is often in the center of the image, but with the mosaic technique you combine many images together into one image, so this way you have objects in different areas of the image itself. Thank you. One more question. Thank you for your talk, and a simple question: which framework did you use to build your models? Do you mean PyTorch? Yeah, PyTorch or something else? Yeah, PyTorch. Okay, and another simple question. Maybe I was a bit too late, but just to clarify, what's the amount of samples in the training data? We have 20,000 images, and 70% of them are the training data. So, 13,000? No, 20,000 in total. Okay, thank you. Okay, colleagues, if anyone has further questions, we have a co-author of this paper here, so we can ask later, and let us thank Razan again. Thanks a lot for a great talk. Thank you so much.
Now we proceed. Our next talk is on greedy algorithms for fast finding of curvilinear symmetry of binary raster images. My name is Nikita Lomov, in this report I represent Tula State University, and the title of my report is Greedy Algorithm for Fast Finding of Curvilinear Symmetry of Binary Raster Images. Let me start with the field of our activity. It is the analysis of symmetry of planar shapes, and it is intuitively clear that a planar shape is symmetric when we can cut it into two halves and they will be equal. When we take a photo of some real-life object, there will be noise, there will be deviations caused by the point of view or by pose, and some occlusions, so usually we can't talk about pure symmetry in images; we can only talk about quasi-symmetry or approximate symmetry. We can see that all the objects that produced these binary silhouettes are intrinsically symmetric, but sometimes their shape is affected by pose; for example, for this alligator the symmetry axis can be expressed as some sort of curved line, and we need a straightening procedure and an algorithm for finding this curved line to conclude that the initial object was really symmetric. In fact, the task of estimating the symmetry of a planar object has been studied by a bunch of approaches, and they can be divided into two main groups: one group is contour-oriented and the second one is interior-oriented. In the first class of approaches there is some procedure for comparing parts of the contour, for example by computing a Fourier transform or by finding some critical points, and the problem with these approaches is that the estimate, the number that expresses the quality of our symmetry, is hardly interpretable; it is just some distance in Fourier space or some discrepancy between critical points, so we need something more accessible and intuitively clearer. The other class of approaches is interior-oriented, and, for example, our previous research belongs to this class; there are methods for calculating the best transform according to the Jaccard measure: we just reflect our image, overlay the two, and in fact it is just the intersection-over-union measure well known, for example, from segmentation with neural networks. We can at first analyze the projections of the image, and of course for a symmetric shape these projections will be symmetric as functions too, and a more advanced approach considers arbitrary affine transforms of the image, so we can find the optimal solution, in terms of the Jaccard measure, for straightening and symmetrizing our image over all possible affine transformations. For this part I can conclude that it is all about the parametrization of the transform: in fact, the real task is how to find a transform that makes the image symmetric with respect to the vertical axis. For strict symmetry we just need to rotate and shift the image, while for curved symmetry the task is much more complicated, because we can of course express the curved line by splines or by some control points and so on, but it is too hard to optimize. The task of searching for curved symmetry is well established in bioinformatics, for example in the analysis of worms or fishes, and sometimes there are different representations of the images because of the nature of the imaging procedure: sometimes we cannot, for example, obtain the whole outline of the object, we can only obtain some point cloud, and we need to approximate the curved line through this point cloud.
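For reference, the Jaccard symmetry score mentioned above reduces to a few lines once the shape is given as a binary mask; this is only a sketch of the basic measure for reflection about a fixed vertical axis, without the search over transforms described in the talk.

```python
# Minimal sketch of the Jaccard (intersection-over-union) symmetry measure for a
# binary mask: reflect the mask about the vertical axis and compare it with itself.
import numpy as np

def jaccard_symmetry(mask):
    """mask: 2D boolean array (True = object pixel)."""
    reflected = mask[:, ::-1]                    # mirror about the vertical axis
    inter = np.logical_and(mask, reflected).sum()
    union = np.logical_or(mask, reflected).sum()
    return inter / union if union else 1.0

# a perfectly symmetric shape gives 1.0; the lizard example in the talk reaches ~0.777
```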
Our previous approach to this task was based on dynamic programming: we can express the curved line as a sequence of steps, each with a small turning angle, and compute the optimal solution. But we have two problems. The first one is that the approach is too time-consuming, because in dynamic programming we should check all possible lines; even if we can stitch parts of them within the dynamic programming paradigm, it is still too tedious and time-consuming. The other problem is that we need to forbid cycles in our trajectories, and this implies hard restrictions on the possible curves. So we need to achieve better speed and better flexibility of the line. What is the idea of straightening? We just draw perpendiculars evenly distributed along the curve, and we fill the straightened image row by row along these perpendiculars. The approach to straightening will be the same, but the approach to constructing the line will be completely different: it will be constructed via a greedy approach. It is the same sequence of steps, but of course we trace only one line, maybe with some range of possible directions at every step. As I said, we use the Jaccard measure as a measure of symmetry, and we can apply the Jaccard measure not to the whole image but to a single line: we have some segment and some point on this line, and we can compare the left side and the right side of this segment, compare their lengths, and this gives a partial Jaccard measure along this perpendicular; from these partial measures the total measure is collected. So in fact our approach is simple. We start from some user-predefined point, or some point extracted from the skeleton, on the boundary of the object, and we take steps; at every step we can adjust the direction using our Jaccard measure, with some penalty for deviation from the previous direction, and we trace the line with an adaptive choice of step and an adaptive choice of the range of possible angles, until we leave the interior of the object, that is, we go out of the image itself or out of the black part. There are a lot of formulas, and it would be too time-consuming to explain everything, but maybe this one is the most interesting: it is the measure for choosing the proper angle, and it consists of three terms. It is proportional to the Jaccard measure, the local measure of symmetry along the future perpendicular, and to the square root of the cosine, to penalize rotations from the previous direction, so if the direction stays the same the rotation angle is zero and this factor becomes one; and in the denominator we have some factors, namely the length of our perpendicular weighted by the portion of the line lying inside the figure. This means our method prefers stepping into rather narrow parts of the shape, because we faced the problem that the trace was shifted toward too-wide parts of the image, and our aim was to trace the shape along its length. The method starts by choosing one principal direction out of 8 evenly distributed ones; as a measure of a direction we just estimate the distance to the first boundary point, so when the boundary is far in a given direction, the direction is considered better. Another parameter of our algorithm is the step; there is a procedure for the adaptive choice of step, which is just the square root of this distance to the boundary, so when the distance to the boundary is short our algorithm becomes more accurate and less risky, as I would say, and when the boundary is far we can make much longer steps.
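A plausible reading of the direction-scoring formula just described (the exact form of the normalization is my assumption, not a quote from the talk) is

    score(α) ∝ J(α) · sqrt(cos Δα) / ℓ_in(α),

where J(α) is the partial Jaccard symmetry measure along the candidate perpendicular for direction α, Δα is the deviation from the previous direction (so the second factor equals 1 when the direction is unchanged), and ℓ_in(α) reflects the length of the perpendicular chord lying inside the figure, which is what makes narrow parts of the shape preferable.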
That's the idea. Another parameter tied to this distance is the range of possible angles from which we can choose the direction; it follows the same idea, so when the distance to the boundary point is large we not only can make larger steps but can also allow greater deviations from the last direction, while in a narrow and very curved part this distance will be small and the range of alpha will be greater, because we need much more freedom in the trajectory. Of course, after applying the straightening procedure we can compute the classical Jaccard measure between the straightened and reflected images; for example, here it is 0.777, which is a rather high value, enough to consider this image symmetric, and now we have some advanced procedures to align not only the main body but all the other parts of our lizard together. The main result is that our algorithm has been successfully applied to a wide variety of images, and it is very fast, on the order of dozens of milliseconds; this line-tracing stage is the main source of time consumption in our algorithm, because the straightening itself is much faster. There are many successful examples, but sometimes we can face problems: because our algorithm is greedy, it cannot see the future, so sometimes it chooses a wrong direction and, by its nature, it has to follow it to the end. Sometimes we need to find some tip of the shape and end our curved line at this point, but before we reach it we don't know about its existence; here is an example of a problematic image. So, yes, we have successfully solved the task of searching for curvilinear symmetry for elongated and curved binary shapes, and we have some prospects for future improvement, for example to align more subtle parts of the image, to prevent cycles, and so on. Thank you very much for your attention; I am ready to answer your questions. Colleagues, any questions? My question is from a different angle. Can you show me the straightening, yes, the shape straightening? This question arises for me: all objects are by nature fractal. Could you define the fractality of the shape, not by doing such a computation with your algorithm but with another algorithm, which might make things easier for you, since fractals and fractality are very closely connected to symmetrization and symmetry? That is, instead of making this curved object straight, for example by shape straightening, could you do that, or is it just a reflection? That's the question for you. It's a very broad question; it seems to me that other approaches and other techniques would be more suitable for this search for fractality. For example, I think that graph representations via a skeleton, where the skeleton is just the medial axis, some sort of thinning, and the result of thinning of our shape is a graph, could be used: it seems we could analyze the self-similarity and the fractal parts via this skeletal graph, maybe, but not via this rather restricted straightening procedure. Any more questions, colleagues?
I will also ask one question. Probably I was not listening that well in the beginning, but what's the ultimate goal? So you achieved that, you understood your shape very well, but how do you use it in the end, I mean, what is the end application? Yes, for example, here is a result from the past: in biological research we may need to get a straightened texture for our object, so we have a mask in the form of a silhouette, a binary image, and when we have constructed this curvilinear symmetry line we can straighten it and, for example, get these straightened textures and compare them, because it is much more convenient to deal with straightened images of the same size, in the same coordinate system, and so on. But our research is in fact mostly theoretically inspired, so we are interested in parametrizations of symmetry of every possible kind and form, also symmetry in graphs, symmetry in complicated shapes, and so on. Thanks, Nikita, and I think we have no more questions, so let us thank Nikita again. Okay, so the next talk will be on handwritten text recognition and browsing in an archive of prisoners' letters from the Smolensk convict prison. Wow. Please go ahead. It is a completely different field, topic and set of tasks. We analyze historical documents because we have a grant from the Russian Science Foundation; it is assigned to the Higher School of Economics, but primarily with some participants from Moscow State University, and it is about the analysis of handwritten historical documents and, of course, some sort of automatic machine analysis. The problem with this task is that there is a lack of large-scale, general-purpose datasets of handwritten documents in Russian; there are some, of course, in English and maybe in other languages, but we have just several such datasets, and some of them are too simple, for example Handwritten Kazakh and Russian, because it is restricted in phrases and its structure is just forms filled with predefined words, so it doesn't contain much diversity, and that is a problem. There are also several script styles, and in fact we found that the way of writing changed dramatically over the last century, so those datasets cannot be regarded as historical. We also have some datasets dedicated to historical persons, for example Peter the Great, who had a very difficult and unique style of handwriting, and of course that cannot be directly generalized to other historical archives, because the nature of the handwriting will be completely different, with different features; these datasets are commonly used, and a lot of papers use them in experiments, but in practical tasks we need to find another way. Our basic data consisted of only 67 photographs; it is a notebook containing letters from the Smolensk convict prison, and they were all written in the same handwriting, because, due to censorship, these letters were opened and rewritten by the same gendarme, and this stability of the handwriting is a great feature of our dataset; of course we used it extensively. There was a transcript, but a noisy one, with a lot of mistakes and missing parts, and some preprocessing tasks were needed, because there are overlaid pages and some parts of the same letter can be distributed over several pages; this preprocessing was done manually. What are the possible representations of handwritten text data and the practical approaches? There are three main ones: the image can be split into individual lines aligned with the text transcript;
sometimes you can pass the whole paragraph to the handwritten text recognition system, which is suitable when the lines are well aligned, just one above another, with no stamps, no marginalia, no portions of text in between; and sometimes there is the possibility to get a more structured representation of a page, with different paragraphs and different types of text, but of course in applications, collecting data with such representations, with all fragments isolated and all fragments marked, is much more tedious. So we work primarily with the line and paragraph representations, and we apply algorithms to these two types. We need some preprocessing to obtain these representations, and it was achieved mainly by using ready-made neural networks: the first one extracts a grayscale representation of the page, and the second one extracts baselines, in the form of a binary image whose connected components represent these baselines. The previous step can be done automatically, but in our case we found that applying ready-made models to extract text fragments was not successful, because the fragments are overlaid and they are in the same script, so the system cannot distinguish, for example, between a letter and the letter underneath it, because they have almost the same structure; this step was therefore done manually, and it resulted in 86 text fragments corresponding to parts of the same letters. To obtain the line representation, we performed a step of distributing the portions of pen strokes to baselines; it is done by analyzing connected components, so we filter out the noise, the parts that are far from the baselines, and distribute components according to a majority rule. But sometimes there are two or more candidate baselines, because in this script we have underlined elements and loops above the line, which can be stitched together so that the same component should be distributed to several baselines; in this case we need to cut these components into several parts, and this is done at the narrowest parts of the strokes, because they correspond to the places where different lines of the pen trace are stitched together. Another preprocessing task is text straightening, straightening again, because we have some rotation, some inclination of the text direction, and it is also a simple algorithm: we just need to correct the baseline to make it fully horizontal and apply the same transformation to the initial color image. Another very valuable step is to eliminate the pixels belonging to strokes that are informative but related to other lines; this is achieved by analyzing the mask and filling the pixels related to this mask with the nearest background pixels. Without this, the noise remains and the result is much worse than the denoised option in the bottom row; the same procedure can be applied to the page as a whole, so this transformation is applied not line by line but to the page itself. There was also a lot of manual work, of course, in preparing the transcript, because we need to achieve full correspondence between the image and the text to use it in machine recognition, so every feature of the spelling, every feature of line breaking and so on was retained when preparing this ground-truth transcript. There are some popular architectures for handwritten text recognition, and we have taken the vertical attention OCR network, because our dataset is rather small, so we cannot use full transformers, but it does contain some sort of attention mechanism,
although it just expresses the distribution of a line over the rows of the image, maybe with some pooling or striding; the attention in this case is just a column vector, and the representation of a line is just a weighted sum of the feature representations of the rows, within a fully convolutional network. Using this approach we achieved satisfying results in terms of character error rate and word error rate, and here is an example of one portion of text. Of course, this neural network itself knows nothing about the Russian language; it just analyzes the graphical structure and features of the image, so there are a lot of strange mistakes, a lot of words and letter sequences that don't exist in Russian. The final recognition rate was slightly over 5% in character error rate, and we can see that our preprocessing is also very valuable, because without denoising it is almost 7% character error rate, so the denoising is very valuable; the straightening is maybe not so valuable in terms of error rate, but it speeds up the network because the size of the image becomes much smaller. So we are dealing with noisy, machine-generated output, and to use it in practical tasks we need to do some post-processing, and there were two directions of post-processing. The first one was dictionary-based and rule-based: we transform the automatic transcript to modern spelling by correcting the alphabet and applying correction rules for prefixes, since these can of course be prescribed, and to make the words realistic we used a dictionary-based method relying on Levenshtein distance and lemmatization, and also a library for further analysis, because the text should be prepared for navigation procedures. The other direction is GPT-based correction; it eliminates the process of searching for the proper instruments, it is just an instruction to the model to correct our text, and we found that we need to provide some additional instructions not related to text correction itself, for example to pay attention to named entities, to the actors in the text. The problem was that the result with the initial prompt was too loose: GPT can use synonyms and can do some reordering, so there was an instruction to forbid such transformations of the text and to keep its structure as similar to the original as possible. Here our quality measure is word error rate, because it is more suitable for GPT, which can correct spelling but still does some reordering of words, and in terms of character error rate the penalty for reordering would be much higher. After improving our instruction with some examples and a role description, it overcomes the error rate achieved by the dictionary-based methods. One might expect that GPT perfectly corrects the text, but that's not the case: we counted the mistakes in the automatic transcription, and far less than half of the mistakenly transcribed words were corrected by GPT, sometimes even in cases where the solution seems evident to a human being; we have no such word as "matrasny" in Russian, but it keeps it as is. Another problem is that GPT produces the output as one portion of text without line breaks, so we invented some sort of dynamic programming, very similar to edit distance, to set the line breaks in the proper places of the text, minimizing the edit distance not between the whole texts but paying attention to line breaks. In fact, all this was a preparation for the navigation system for our colleagues from the humanities, historians and philologists.
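The dictionary-based correction mentioned above can be sketched in a few lines; this is only an illustration of the idea (closest in-vocabulary word by edit distance, accepted if close enough), not the project's actual pipeline, which also involves lemmatization and spelling-modernization rules.

```python
# Minimal sketch of dictionary-based post-correction: replace each out-of-vocabulary
# word in the automatic transcript with the closest dictionary word by Levenshtein
# (edit) distance, if it is close enough.
def levenshtein(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def correct(word, vocab, max_dist=2):
    if word in vocab:
        return word
    best = min(vocab, key=lambda v: levenshtein(word, v))
    return best if levenshtein(word, best) <= max_dist else word

# usage: correct("sympol", {"symbol", "simple"}) -> "symbol"
```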
One task they are interested in is keyword search and topic extraction, in the case where a topic can be expressed as a set of keywords. What is the flow of our system? We have the automatic transcription, and since the system uses the CTC loss during training, we can align the predicted text with the columns of the image; we also know how the text is distributed over the lines, so we have a placement along both the x-axis and the y-axis, and we can make a visualization that underlines the found keywords using this placement. Here are the results of our system: it was a model task where an expert defined four topics expressed by sets of keywords, so there are four colors corresponding to these topics, and we have made such a visualization; the analysis of collocations and more advanced topic extraction are future work. The rightmost image is the visualization from the page-based model, and we can see that this highlighting is much smoother, simply because of the nature of the attention: it is a very simple distribution over vertical levels, so the underlined part will be completely horizontal. Another possible task using this machine-generated output is the search for personal names, and we utilized two simple approaches: to extract everything that starts with a capital letter and to analyze the context, because when, for example, a capitalized word comes right after a period, it may not be a proper name but just an ordinary word. So we have two types of highlighting here: ordinary capitalized words, and words regarded as proper names. It is interesting that the search for proper names is the task where the GPT-based correction was the most successful, and this is because it corrects maybe not so much the particular words as the overall structure of the text and the separation into sentences, and it restores capitalization much better than it corrects spelling; that was maybe an unexpected conclusion from our experiments, so we cannot use GPT as a solution for everything, we have to search for the proper task to utilize it. Some slides about future prospects: of course, visualizing and analyzing an archive of 100 images looks like a toy problem, although it can still be very helpful, but now we have a much more extended archive, four volumes of the diaries of Admiral Fedor Litke; it contains several thousand pages, and only several hundred are manually transcribed, with significant artifacts, so we are planning to transcribe it automatically, and we are moving towards a successful solution. The notebook is lined, so the line spacing seems stable in our data, and in this case we can, for example, invent some new type of attention for equally spaced lines. Another trick we utilize is CTC loss with missing parts: missing parts are ubiquitous in this archive, especially when transcribing another language like German or Latin, so we can use special symbols when the true character is unknown, or even when the length of the sequence is unknown, and it still works and the model can still be trained. The main result of our work is that we can achieve a rather satisfying recognition rate using just about 1000 lines, but written in the same handwriting. So thank you very much, thank you for your attention and interest. Okay, thank you very much, Nikita. I think, well, the project is amazing, and also your effort today,
giving two talks together, it was longer than a plenary one, so hey, it's definitely difficult. Colleagues, do we have any questions here? Thank you very much for the talk and for sharing this task with us. Did you use any specific or exotic augmentation techniques for the training process of the OCR model here, something like handwritten stroke augmentation or so? Thank you. Okay, there is some augmentation, but it deals with images, so possible geometric distortions, tone corrections and so on, maybe stitching parts of words from different lines together, but we haven't generated new text; maybe it could improve the results, but we struggled with it because of the low amount of data, though of course it's possible. Any more questions here? Did you use only GPT-based models for your transcripts and recognition, or other models too, for example Anthropic's, Google's Bard and so on? Okay, in fact, dealing with language models is the task of my co-author, who is maybe more experienced than me, but he checked some language models, maybe 3 or 4, and concluded that GPT is better, and it is the most accessible one; but of course there are a lot of tasks here, and everything can be improved, so yes, there are a lot of open questions and a lot of drawbacks in our system. These are not drawbacks, these are directions for improvement, so let's be positive. Any more questions? Well, if there are no more questions, then thank you, Nikita, again, and I think that our next talk is going to be online. Okay, hi everyone, my name is Maxim Kuprashevich, I am a computer vision team leader at the Layer team, and together with Andrey Tolstyk we are the authors of the multi-input transformer for age and gender estimation. Okay, so let's start with the task definition. Our task can be divided into two slightly different sub-tasks. The first one is very straightforward, and the vast majority of methods try to solve exactly this task: let's say we have some crop of a person and the face is visible enough to be used for prediction, and we predict age and gender with our black box; this task is, I think, very plain and very common. But there is another type of task that not so many methods try to solve, when the face isn't available or is heavily occluded; although this task is less common, it is still very important for science and business, for example for surveillance cameras, or personalization, or, as in our case, for clothes and accessories recommendation systems, because, let's say, we cannot estimate the gender from the picture at the bottom, therefore we have to search the entire index, and we will search just by visual similarity, so we can easily find something wrong, for example, I don't know, a kid here, and of course this would be an absolute fail; that's why it's important. Our goal was to create a model that can operate in both cases, be as precise as possible, and run in real time, because we're solving a business task first of all. Let's discuss metrics; I'm not describing them in detail here. For gender we use just accuracy, as most works before us have used this metric, and for age the main metric is also quite simple, usually mean absolute error, so most works use this metric and we did the same; an additional metric is the cumulative score CS@L, where L is usually 5, and it is also a very clear metric, easy to understand.
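For completeness, the two age metrics just mentioned are easy to state in code; a minimal sketch, with L=5 as the usual choice for the cumulative score:

```python
# Minimal sketch of the two age metrics mentioned: mean absolute error (MAE) and the
# cumulative score CS@L, i.e. the fraction of samples whose absolute error is <= L.
import numpy as np

def age_metrics(pred, target, L=5):
    pred, target = np.asarray(pred, float), np.asarray(target, float)
    err = np.abs(pred - target)
    return {"MAE": err.mean(), f"CS@{L}": (err <= L).mean()}

# usage: age_metrics([23, 40, 67], [25, 39, 60]) -> {'MAE': 3.33..., 'CS@5': 0.66...}
```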
What about data? Not so many datasets have age and gender ground truth; you can see the biggest open datasets for this task on the slide, and they can be separated into regression and classification datasets, the same as the methods. Of course, classification is much cheaper, you can mine much more data with this approach, but it is less precise and less strong, while the regression approach guides the model to better generalization; one work we refer to in our paper, called Deep Imbalanced Regression, has really good recommendations on this, and if you're interested you can read the original. As you can see, we performed most of our experiments on IMDB-clean and UTKFace, and there is not so much data, and IMDB is heavily biased towards celebrities, so this dataset mostly contains celebrities. Obviously we had to mine our own data, and we did this from the Open Images dataset, the huge Google dataset, from our production system, where we have many products, and from some additional sources like web scraping and so on. We sent these images to crowdsourcing and asked annotators to roughly estimate the age and, I hope not roughly, the gender, with an overlap of 10 votes per image. After we finished, the question arose of how to aggregate these 10 votes into one precise prediction. For gender everything is quite simple: you can just use the mode, and we did this, and we also discarded samples with inconsistent votes like 50/50 or 40/60; those are usually images of bad quality, since the samples were generated with a detector network, so some mistakes are possible. But with age everything is not so simple: of course you can use some statistical methods, you can see many of them in Table 1, and you can see that the resulting MAE, like 4.77 for example, isn't that good; it can be used for training, but the error is still quite high. We calculated these results based on control tasks, which we use for quality control, so we can compute these numbers. And there is one quite simple idea: if you have control tasks, obviously some people estimate age better and some worse, and a few slides later I will show you that this MAE is almost normally distributed; if we have this MAE and can calculate it individually for each annotator, we can weight the votes, and we did this with an exponential term, and you can see that the results are much better, with a significant margin over the other methods, so 3.5, that's a really nice result, I can say. After we finished, we had collected more than half a million images, and half a million we used for training; it's our closed, proprietary dataset, because we cannot share it, some of the images are from our production and cannot be licensed, but we decided to create a new benchmark from images from the Open Images dataset, which we call the Layer Age and Gender dataset; you can see the statistics on the slide, you can see the histogram, and you can see it is almost perfectly balanced except at the very right, because it's very hard to mine old ages. We decided to create it because previous benchmarks are heavily biased towards celebrities or are very small, having been collected in police offices or studios, and you can download this new benchmark via the URL, for free, without any forms, so you're welcome. Okay, let's move on to the methods. We wanted to start with some baseline, some strong model, so we could be sure we can move on to something more complex, and we took the vision outlooker, VOLO, as our baseline; if you're not familiar with this network, you can read the original paper, but roughly it is a modified vision transformer with a modified attention block. We started with the simplest task, a face crop as input and age as output, and then we added another output for gender.
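Going back to the vote-aggregation step described a moment ago, the weighted mean idea can be sketched as follows; the exponential weighting term here is an assumption about its general shape, not the exact formula from the paper: annotators with a lower personal MAE on control tasks contribute more to the final age estimate.

```python
# Sketch of exponentially weighted vote aggregation (the exact weighting term and the
# alpha value are assumptions): annotators with lower personal MAE on control tasks
# receive larger weights.
import numpy as np

def aggregate_age(votes, annotator_mae, alpha=0.5):
    """votes: list of age votes; annotator_mae: per-annotator MAE on control tasks."""
    votes = np.asarray(votes, dtype=float)
    weights = np.exp(-alpha * np.asarray(annotator_mae, dtype=float))
    return float(np.sum(weights * votes) / np.sum(weights))

# usage: aggregate_age([30, 35, 42], [3.0, 5.0, 9.0])  # pulled toward reliable voters
```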
I won't go deep into the training procedure here; it is slightly modified, for example compared with classification models, because we describe these details in our paper and we don't have much time; let's take a look at the results. You can see that our baseline already beats the state of the art; for IMDB-clean, for example, here is the previous state of the art; so, very good results for our new training set and benchmark test set. And what's really interesting here is that the age-and-gender model with double output is much more precise than the age-only model; the only difference between these two models is the gender output, nothing was modified in the training procedure. That's a well-known phenomenon, when a model generalizes better with multiple tasks, but it's not so often you can observe it so directly, so that's amazing, and you can see the same for all the datasets: the model with the additional output is more precise. Okay, so we beat the state of the art, but that wasn't our goal; our goal was to create a model that can operate in any case, with any combination of faces and bodies, where some of them can be unavailable. So obviously we cannot use a single input of the entire person crop, because the effective resolution of the face would be very small and the face features would just vanish; we cannot do this, so we had to create a model with a multi-input architecture. What we did: you can see that we separate the face and body crops, and to perform this cross-view feature fusion we created a feature-enhancer module. First we pass our inputs through the original patch embedding, and it is very important that we preserve the original dimensions, because otherwise we cannot use transfer learning and the model would be slow, so we have to use early fusion and keep the same dimensions. Our feature-enhancer module fuses features in the way you can see on the right: we do cross-view attention, so first we perform face-to-body cross-attention, then vice versa, body-to-face cross-attention, then we concatenate the features and pass them through a multi-layer perceptron, and eventually we have a fused joint representation of the same dimensions as in the original network, so we can use transfer learning and everything else, and the model is quite fast because of this early feature-fusion strategy; everything else is the original network, except the output, of course. Let's check the results: here you can see the baseline at the top, and here our multi-input model, and you can see that we achieved a significant improvement for age and also for gender on our Layer Age and Gender benchmark, the hardest and biggest benchmark for now, I think, and slightly lower results for gender elsewhere, but the difference is so small that it's hard to say why; this needs to be researched deeper. What is most interesting, you can see that we can use only the face or only the body input in the model trained on joint face and body inputs, and when you use just the body input the model still performs quite well, for example for age it's 6.66, let's remember this number, we will need it a few slides later, and you can see that for all the datasets gender works quite well too; of course the results are lower than with the face, since the face is the most reliable way to predict age and gender, but it still works with quite good accuracy. At this point we beat the state of the art on every benchmark, and we became interested in what would happen if we ran more benchmarks; there remain not so many benchmarks we can run with a regression output for age; we tried AgeDB, but only some old measurements were available there, so of course we easily beat those results.
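Returning to the cross-view fusion step described above (face-to-body cross-attention, then body-to-face cross-attention, then an MLP that keeps the original token dimension), here is a rough sketch; this is not the authors' code, the module and parameter names are made up for illustration, and it assumes equal numbers of face and body tokens.

```python
# Rough sketch of a cross-view feature-fusion step: face attends to body, body attends
# to face, then an MLP fuses the two streams while keeping the original token dimension
# so pretrained transformer weights can still be reused downstream.
import torch
import torch.nn as nn

class CrossViewFusion(nn.Module):
    def __init__(self, dim=192, num_heads=4):
        super().__init__()
        self.face_to_body = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.body_to_face = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.GELU(), nn.Linear(dim, dim))

    def forward(self, face_tokens, body_tokens):   # both: (B, N, dim), same N assumed
        f, _ = self.face_to_body(face_tokens, body_tokens, body_tokens)  # face attends to body
        b, _ = self.body_to_face(body_tokens, face_tokens, face_tokens)  # body attends to face
        fused = self.mlp(torch.cat([f, b], dim=-1))                      # joint representation
        return fused                                                     # same dim as inputs

# usage: CrossViewFusion()(torch.randn(1, 196, 192), torch.randn(1, 196, 192)).shape
```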
Much more interesting was to run our model on classification benchmarks. Of course, we cannot use any of their training data for our model, because we use regression while these datasets have classification outputs, age ranges as ground truths, and the ranges are different, they define these classes differently; so we cannot use any of their training data, but we can simply map our regression output to the class ranges, and we ran our model on the validation parts of these datasets. You can see that we also beat the state of the art for both of these datasets; for example, for Adience, quite an old dataset, here is the previous state of the art for gender, and you can see that the margin is really significant for our model, and that's amazing, because it proves the really good generalization power of our model. Here are a few samples from the internet: with a visible face everything is very simple, here the error is less than one year, and you can see that even with tricky samples, for example here with quite tricky hair features, our model still performs very well, and age also works very well; you can see it works even when the face is not available, so the model performs really well. And you need to remember that the model never saw samples like, for example, these two without visible faces, because we trained it on face-centric datasets, since it's hard to annotate data without faces, for humans this would be super hard; we just removed faces from the data during training, dropping them randomly, and we also randomly dropped body crops, so the model never saw samples like this, and it still works very well, so the generalization power is really good. What about the human level? Because we had control tasks, we can calculate it, and you can see the human level, on average, mean and median, here; so, okay, we humans are quite bad at this task, you can see that the MAE is higher than 7, which is quite a big MAE, and if you remember, the model achieves an MAE of 6.66 even when faces are not visible, even when faces were removed, so on average the model can operate better even without faces, although some annotators have quite a good MAE, like 4.5, but only a few. Another plot is about the relationship between MAE and age: you can see that the model works well over almost the whole range except at the very left, but this data was created from IMDB control tasks, and this dataset is quite heavily imbalanced, so there are just a few samples at the very left and the very right, and it's hard to say why that is, it needs to be researched deeper, but maybe it's just because there are too few samples there for any conclusions. And about speed, about performance: the original model is very fast, you can see that with a big batch size it can run at more than 1000 frames per second; of course our model is a little slower because of the multiple inputs, but thanks to the early fusion strategy the slowdown is just about 20%, so it's still very fast, almost 1000 frames per second can be achieved. Okay, so the validation code, all the models trained on open datasets, a demo with our full closed model, and everything else you may need is available at this URL on GitHub; our contacts are also there, feel free to contact us, and we also have a Telegram channel of our team, it's in Russian, but if you are interested you can follow us via the QR code. Thank you, do you have any questions? Thank you very much, Maxim, we have questions in the audience. Hi there, thank you for your talk, I have quite a few questions. Maybe first a straightforward question: how does your model perform versus the real state of the art, namely the NIST challenge? You have compared with some baselines in academia, which are not quite up to date,
where there is less and less data. Have you tried to submit, or have you already submitted? Can you please comment? Thank you for the question. We based our comparison mostly on Papers with Code, so we took all the big benchmarks we could find there that reflect our real-world task, without celebrities and with celebrities, we took all the benchmarks, so we relied mostly on Papers with Code, and we took first place on everything we could find there, choosing among big benchmarks like this. Okay, but you are aware of the NIST challenge, right? Okay, so the second question is about the data. Maybe I misunderstood, but two big chunks of your data are, a) celebrities, and b) people whose age was crowdsource-estimated, so you don't have the real age, I mean you don't really have an age, you just have an estimation from different people, or celebrities, who look better than they should, right? Yeah, okay, so it's very hard to explain in a short time, there is so much information here, so yes: again, existing datasets have usually been collected either in studios or in police offices, of course, because only there can you establish the real age, so they are small, and most of the datasets contain celebrities, like IMDB, for example, for obvious reasons, because you can obtain their ages; that's why we used crowdsourcing, and yes, we can't establish the real ground truth, but we can estimate how good our annotation eventually is; it should be around 3.5 MAE based on the control tasks; of course this estimation uses control tasks generated from IMDB, and I expect it should be even lower, because IMDB is harder for annotators, as you mentioned, because celebrities are biased very hard; so yes, we can't obtain real ground truths, but we expect our annotation to be super precise, thanks to the weighted mean strategy, because with other statistical methods, as you can see, the MAE would be quite high. So maybe this is the last question: you use the standard MAE because it's used in some of the papers, but there are quite a few papers, and even this new challenge states that it makes sense to have a different metric for age estimation, because a 2-year difference for children or for old people is not the same as for middle-aged people; as far as I remember, quite a few metrics were proposed back in 2014-15, could you please comment on that? Yes, I agree that MAE isn't a perfect metric here, and for sure an error of, say, 2 years at the very left or very right of the range is not the same; we even have some samples with age above 100, so 2 years there is not an error at all, because it's impossible to see the difference from the picture, so I agree, but we used something that can be compared with other works, yes, because of that; maybe in the future we will use more advanced metrics, but because we have this plot we can be sure that the model performs quite well: you can see that the error grows, of course that should happen, but not as strongly as, for example, for a human. Thank you, really great work, and thank you for this work and the dataset, it's really important because there are problems with datasets right now. Thank you very much, and for your questions. One comment and one question. The comment: your model, by using celebrities, may not be right, because celebrities regularly undergo cosmetological procedures and they are age cheaters, so the age on their face, for example in a picture, doesn't show their actual age. The question is: how does your age estimation model differ from, for example,
Microsoft's and Insilico Medicine's age estimation models, and what was the range, plus or minus, of the right answers for your age estimation model, and how does it differ from Microsoft's and Insilico Medicine's models? Sorry, did I get you right, what do you mean, do you mean the accuracy of the model, or what? Yeah, accuracy is meant. I don't have results for their models, so I can't compare, I never tried them; if you have them, send them and we will try to run them if there is some benchmark, so I don't know, but you can use our demo, for example, if there are no benchmarks, and compare manually, but I don't know a benchmark I could run and compare with these Microsoft models. Any more questions? Seems no. I can just comment on the Insilico model: probably they have an updated one by now, but I saw a demonstration of it back in 2016, I guess, and it was actually very funny, because it was Alex Zhavoronkov presenting it to Skoltech president Alexander Kuleshov, who is very well known for looking much younger than his age; at least at that time he was around 70 but looked maybe 60, and the model output was something like 82, so apparently there were issues for elderly people, probably because there is not that much data for this category. The same for us, the distribution is super hard, we are trying to solve this in the next version, so yeah, that's a real mess. Yeah, sure. Okay, so let us thank all the speakers of this part of our session, and now we have a coffee break; we ended a little bit later, so I suggest that we still take the full 10 minutes and just start a little bit later, so at 15 minutes to one we meet again. Let's start, because every minute we take now will be subtracted from our lunch, so please let's try to keep it brief. My name is, again, Zabalov, I am the replacement chair, and our next talk is Interactive Image Segmentation with Superpixel Propagation. Good afternoon everyone, I am delighted to present our paper titled Interactive Image Segmentation with Superpixel Propagation; this is a collaboration between Picsart and the American University of Armenia. Our main task is interactive image segmentation, which is to create a user-friendly tool that allows users to achieve precise segmentations by actively guiding the process. Let's look at some characteristics of state-of-the-art methods: they are mostly click-based methods, using click user interactions; they are commonly deep learning methods trained on huge datasets, and they are designed to reach 85-90% intersection over union in as few interaction iterations as possible. Here is an example of one such method, FocalClick, after one iteration and after 8 iterations of user clicks. To address this challenge, we concentrate on a method that doesn't require a training dataset, and we prioritize achieving higher than 95% intersection-over-union precision. So let's take a look at the workflow of our method. The first step is user zooming: the user zooms into a part of the image, and this sub-image gets partitioned into superpixels using the ETPS algorithm. Next, the user may click one or more times inside the object of interest, and the next step is the fast marching method and the arrival-time distribution, which is the main contribution of our workflow; this is the crucial step for controlling the propagation of the wave over the superpixels. Finally, the user can use a slider to control the overflow of the wave outside the boundaries of the object. This is a cyclic process, and it continuously improves until it reaches an acceptable segmentation, and the user can then extract the mask for further use.
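To illustrate the propagation-and-slider idea described above, here is a hedged sketch that stands in for the fast marching step with a Dijkstra-style computation over the superpixel adjacency graph: a wave is started at the clicked superpixels, crossing between dissimilar superpixels is expensive, and the slider simply thresholds the resulting arrival times. The actual method computes an arrival-time distribution with the fast marching method; this is only an approximation of the concept.

```python
# Hedged sketch (not the authors' exact algorithm): Dijkstra-style "arrival times"
# of a wave started at the clicked superpixels; a user-controlled threshold (the
# slider) then decides how far the selection spreads.
import heapq

def arrival_times(adjacency, cost, seeds):
    """adjacency: {node: [neighbours]}, cost(a, b): edge cost, seeds: clicked nodes."""
    dist = {s: 0.0 for s in seeds}
    heap = [(0.0, s) for s in seeds]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue
        for v in adjacency[u]:
            nd = d + cost(u, v)
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def select(dist, threshold):                 # the user-facing "slider"
    return {node for node, d in dist.items() if d <= threshold}
```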
Let's take a look at the user interface shown in the video: the right part is the final mask and the left part is the segmentation process. This example achieved 99.86% intersection over union in about two minutes. Let's dive into the datasets we used for our experiments. We are interested in accurate image segmentation with well-defined boundaries, so we use datasets with reliable ground-truth masks. The first two, Berkeley and DAVIS, are benchmark datasets used by other state-of-the-art methods and are very popular; in addition, we use our in-house logos dataset for experiments in another setting, another domain. Now let's look at the graphs summarizing our evaluation experiments. The first one is the mean number of iterations per intersection-over-union value. It is worth noting that we extended our experiments up to 500 iterations instead of the commonly used 20, to explore how the methods behave at higher precision. The purple line in the graphs describes our method; the other two lines are state-of-the-art deep learning approaches. As you can see, the deep learning methods reach the initial segmentation more rapidly than our method, but as the number of iterations increases, our method achieves much higher intersection over union; it keeps improving over time. It is also worth noting that on the logos dataset some methods didn't even achieve 85% intersection over union, which may be a consequence of the specifics of the training datasets of these two deep learning approaches. The next metric is the cumulative number of images that achieve certain intersection-over-union values; in these graphs, a higher curve corresponds to better performance, and as you can see, on all three datasets our method outperforms these methods. To do such extensive experiments on large datasets and to reduce user effort, we need to simulate the user interaction for all methods using the ground-truth masks. Here are two examples of such an optimization process, one for a state-of-the-art method and one for ours: the left side of each video is the segmentation process and the right side is the difference mask between the current segmentation and the ground-truth mask. So in general we have several important contributions to the image segmentation field. First, state-of-the-art deep learning approaches usually achieve a good segmentation in under 10 iterations, but it is hard to reach higher precision with such methods; while they are faster at the initial segmentation, our approach outperforms them on high-accuracy segmentations and on detailed boundaries. Our approach also requires considerable user effort for good results, but it is significantly less than manual segmentation; for this part, please refer to our paper, as we have limited time to discuss it during this presentation. Looking ahead, there are some improvements for future research: improving our method to better handle textured images, adding negative clicks for fewer iterations and for boundary variations, and investigating hybrid approaches that use deep learning methods for the initial segmentation and our method for the final refinement, to reduce overall user effort. Thank you for your attention. Do you have any questions? Dear colleagues, do you have any questions? Because I have many. Thanks for the talk; I'm not really into the field of interactive segmentation, but I learned a lot about segmentation.
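For the simulated user interaction mentioned above, a common convention (used, for example, by RITM and FocalClick) is to place the next click at the most interior point of the largest connected region of the error between prediction and ground truth. The following is a minimal sketch of that convention, assuming numpy boolean masks; it is an illustration, not necessarily the exact procedure used in the paper.

import numpy as np
from scipy import ndimage

def simulate_click(pred_mask, gt_mask):
    """Return ((row, col), is_positive) for the next simulated user click."""
    error = pred_mask.astype(bool) ^ gt_mask.astype(bool)      # difference mask
    if not error.any():
        return None
    labels, n = ndimage.label(error)
    sizes = ndimage.sum(error, labels, index=range(1, n + 1))
    largest = labels == (np.argmax(sizes) + 1)                  # largest connected error region
    dist = ndimage.distance_transform_edt(largest)
    row, col = np.unravel_index(np.argmax(dist), dist.shape)    # deepest interior point
    is_positive = bool(gt_mask[row, col])                       # positive click if foreground is missing
    return (row, col), is_positive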
The first, straightforward question: have you compared with the Segment Anything Model, which appeared in the last few months? We did this research quite a bit earlier, and we actually didn't compare with that method, but we compared with state-of-the-art interactive segmentation approaches, and our main advantage is that we are better at higher precision than deep learning approaches. Yes, I see. And could you please also go back to the slide with these very good curves? My understanding is that these clicks are not really user clicks but clicks generated by some algorithm, right? So how do you distinguish whether the clicking procedure was good from whether your algorithm was good? Is there a way to tell? Maybe if I click in a correct way with a different model, I will achieve better results faster. We used the interaction-automation scheme mainly used by the other methods: we take the click as the center of the largest connected component of the difference mask, and we tried to get automation very similar to the other approaches like RITM or FocalClick. How well does your model perform in more complex scenarios, not two-segment images like this, but, say, a person against a background, or hair, which is not always a hard mask; sometimes you also need a soft mask? The main advantage of our method is that it gives full control to the user, unlike deep learning approaches, where it can be hard to achieve fine boundary delineation as in the example you mention; with our method the user can fully obtain the segmentation for the desired parts, because it is a traditional method, not a deep learning approach. Any other questions? Well, in that case, let's thank the speaker again. Our next talk is acne recognition: training models with experts.

Thank you. I'll present the work on acne recognition, training models with experts. I represent the Higher School of Economics, and we did this work in collaboration with a skin research institute in the US. We'll start with motivation. We know that acne is a huge problem nowadays; many people have it, and it affects their quality of life, as they can feel insecure about it, and so on. The doctors who study and treat this disease are called dermatologists, and to diagnose it and then choose a treatment they use so-called severity grading systems. There are several severity grading systems, but all of them are based on visual features, and the majority of them simply count the number of acne lesions on the face of the individual and then give the individual a severity score. Examples of such systems are IGA and GAGS. Our goal is to develop our own dataset with our own grading criteria, because there are drawbacks to the majority of the grading criteria: they only focus on the acne lesion count. The second goal is to develop an automatic severity grading system. We first start with the issues of labeling from the point of view of dermatologists. First of all, if we take two different dermatologists, let them discuss what they think are the right criteria, and then put them in separate rooms and give them the same image, they will most likely return different scores. Another issue is that the same dermatologist can give one score on one day and another score on another day. And, as I mentioned, most systems are count-based. To illustrate that issue, there are two different dermatologists in question.
As you can see, the red line in the first plot shows the sorted scores of the first dermatologist, and the blue line shows the scores of the second dermatologist ordered by the same index as the first sort. You can see there are a lot of outliers and distortions: the general trend is preserved, but the distortions are there. The second plot is a histogram of both scorings, where you can observe the differences as well. So we acquired our own data and let the dermatologists determine an optimal guideline for scoring the images, and this table represents the conclusion they reached together. There are around 7 criteria, and the score is real-valued, not categorical; as you can see, a lot of different factors are taken into account apart from the counting. After this consensus, the images were labeled accordingly, and we can see the distribution of the scores over the dataset. Later on I'll explain why we need an additional dataset, but just to state some facts about it: it's called ACNE04, and it's open source. One important thing to mention is that our dataset consists mostly of selfies taken from the front, while the photographs in this dataset are taken from different angles; those angles and shooting conditions are called the Hayashi requirements. In total there are approximately 1,500 images, and most importantly, this additional dataset has bounding boxes around each lesion, around 90,000 bounding boxes in total. The distribution of bounding boxes is shown here: most images have almost no bounding boxes, and the more bounding boxes there are, the rarer such images become. In the examples from this dataset you can tell the photograph was taken at an angle, and you can see the bounding boxes around the lesions. Since our main dataset has real-valued target variables, we solve a regression problem, and to evaluate the quality of the automatic grader we chose the following metrics. The first one is the well-known mean absolute error, which measures the absolute error of each prediction and averages them. The second one is the symmetric MAPE; in short, it's a symmetric version of MAPE, basically an analog of a relative error over the whole dataset, and the symmetric part is about the denominator: in the standard MAPE formula the denominator consists only of the true value, which penalizes the case where our prediction is bigger than the actual target differently from the case where our prediction is smaller, so this denominator is basically a symmetric correction of the metric.
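For reference, the two metrics just described can be written as follows (a minimal formalization; the exact form of the symmetric denominator, for example whether it is halved, is an assumption):

\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \lvert y_i - \hat y_i\rvert,
\qquad
\mathrm{SMAPE} = \frac{1}{n}\sum_{i=1}^{n} \frac{\lvert y_i - \hat y_i\rvert}{\bigl(\lvert y_i\rvert + \lvert \hat y_i\rvert\bigr)/2}.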
For all the experiments that follow, we use augmentation techniques to increase our data size. The first baseline approach is just to choose some backbone, add a fully connected layer at the end, and get one value for the score. You can see the results presented here: the mean absolute error is more or less decent, especially for MobileNetV3, given that our scores range from 0 to 1, but the symmetric mean absolute percentage error is quite big, which means that for some individual images the absolute error jumps a lot. This is the transfer learning paradigm. Next we thought about why we have such deviations, why we get such results, and the main conclusion is that our main dataset has no positional information about the acne lesions, so we had to search for something additional, and the additional dataset which I described earlier is exactly that. We then used this additional dataset to build some sort of detector, or segmentator; I guess we could call it that. We had to somehow adapt our target images to the segmentation problem, so what we did is simply label everything inside the bounding boxes as target pixels and everything outside as non-target pixels. This way we could try to build a segmentation model, but we see that the segmentation model underperforms the detection model; visually it looks like this: when there are not many acne lesions we get the picture shown above, and the more acne lesions there are, the better the segmentation is, but nevertheless it's not suitable for our use. For detection we trained a YOLO model, and we can see in the image that there is a significant improvement: we can observe correctly labeled bounding boxes around acne lesions. Using this detector, we can now improve upon our initial baseline, so we propose the following scheme: we choose a detector, in our case YOLO, we get the bounding boxes, and then the simplest thing to do with the bounding boxes is just to count them and build on top of that some classical machine learning technique, a regressor, boosting, etc.; with this single count feature we found that plain linear regression works best. This approach already produced an improvement, but later on we discovered that we can improve slightly upon that as well by introducing two more heuristic, basically handcrafted, features. The first one measures the coverage of the detected lesions, basically the total area that those lesions cover. The second feature (sorry, it seems this is an old version of the presentation, but here is the second handcrafted feature) measures the number of lesions detected in different regions of the individual's face: to do that, we split the face into an N-by-N grid and count how many boxes were detected in every single cell of this grid. So we acquire two features, but the second feature, which we call positioning, basically amounts to N-squared additional features, because the grid is N by N. The proposed scheme is shown here: we add the two handcrafted features, and we can now discard the count feature, because the positioning features basically sum up to the count feature; the count feature is directly related to these positioning features. With this we get a slight improvement upon the results: from the initial baseline we didn't deviate much in terms of mean absolute error, it still looks more or less the same, though it is still an improvement, while in terms of the symmetric mean absolute percentage error we improved upon the initial results.
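As an illustration of the proposed feature scheme, here is a minimal Python sketch that turns detected boxes into the count, coverage and N-by-N positional features and feeds them to a linear regressor; the box format and names are assumptions for illustration, not the authors' code.

import numpy as np
from sklearn.linear_model import LinearRegression

def acne_features(boxes, img_w, img_h, grid_n=4):
    """boxes: list of (x1, y1, x2, y2) lesion boxes in pixel coordinates."""
    count = len(boxes)                                             # lesion count
    area = sum((x2 - x1) * (y2 - y1) for x1, y1, x2, y2 in boxes)
    coverage = area / (img_w * img_h)                              # fraction of the face covered
    grid = np.zeros((grid_n, grid_n))
    for x1, y1, x2, y2 in boxes:
        cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
        grid[min(int(cy / img_h * grid_n), grid_n - 1),
             min(int(cx / img_w * grid_n), grid_n - 1)] += 1       # positional counts per cell
    return np.concatenate([[count, coverage], grid.ravel()])

def fit_grader(features, scores):
    # One linear regressor over the handcrafted features; scores are the consensus severities.
    return LinearRegression().fit(features, scores)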
To sum up: we developed new grading criteria for the severity score of acne, we suggested a model to automatically grade the severity according to these criteria, and we proposed new handcrafted features. For future work there are a lot of degrees of freedom to improve upon, for example choosing the number of cells in the grid for the positioning feature, varying the detectors, using different detectors, and so on; there is a lot of room for improvement. Thank you; if you have any questions, I am happy to answer them. Thank you for your beautiful talk. Do you have any questions, colleagues? How do you think, is it possible to take into account some additional information about a person, such as age? We didn't, really, but it's a good question, because that information could probably help, and there was also a lot of talk about whether to take into account the race of the individual, since it's a sensitive topic in the US. But yes, you're right, it would be better to take additional features into account; we just didn't do that. If no more questions arise, let's thank the speaker again.

Thank you. Hi everyone, today I present, also on behalf of my professor and colleagues from Higher School of Economics university, our paper on facial expression recognition in the wild from synthetic data using lightweight neural networks. Facial expression recognition is the task of classifying the expression in digital images or video frames into categories like anger, fear, surprise, sadness, happiness and so on. It has a wide range of applications, for example in marketing, in gaming, in monitoring, or in any human-machine interaction, and I believe this is also a very familiar topic. We have summarized some recent progress in facial expression recognition, and we found that despite numerous proposed methods, the performance hasn't improved significantly over the last two years; this is on the AffectNet dataset. This motivated us to explore the concept of ensemble models, where we combine different single solutions to leverage their advantages and enhance performance. In terms of datasets, I think the challenge is not that we are lacking datasets for this task; for example, in the table on the right, I believe this number of images is not a big number when we compare it with the number of people in the world. This leads to a common problem: models perform well in a controlled lab environment, but their performance drops significantly in real-life conditions. Gathering a proper amount of real-life data is time-consuming and expensive; moreover, obtaining the consent of individuals to use their facial images is not an easy task. Using synthetic datasets can address those problems, because it offers numerous advantages, primarily due to how the data is generated, so we were also motivated to use synthetic data in our research. In our paper we use the synthetic dataset from the Learning from Synthetic Data (LSD) challenge of the Affective Behavior Analysis in-the-Wild (ABAW) workshop: we use the LSD training set to train our models and the LSD validation set from this competition in our validation step. Because the test set was not public, we take sample images from the Multi-Task Learning (MTL) challenge and use them as the test set to evaluate our models. We use F1 as the metric in our validation step, similar to the original LSD competition, and after that we deploy our models and measure the frames per second on random input tensors. This slide provides a more detailed illustration of the data pre-processing used in our research. In total we have 277K images in the LSD training set and 4.7K in the LSD validation set. Because the original images are somewhat blurry and their input size is quite small, we decided to add an additional pre-processing step here: we use a super-resolution algorithm to upscale and deblur the images.
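For the FPS measurement on random input tensors mentioned above, a minimal sketch could look like this, assuming a PyTorch model; the input shape, warm-up count and device are placeholders.

import time
import torch

def measure_fps(model, input_shape=(1, 3, 224, 224), n_iters=100, device="cpu"):
    model = model.eval().to(device)
    x = torch.randn(*input_shape, device=device)        # random input tensor
    with torch.no_grad():
        for _ in range(10):                              # warm-up iterations
            model(x)
        start = time.perf_counter()
        for _ in range(n_iters):
            model(x)
    elapsed = time.perf_counter() - start
    return n_iters / elapsed                             # frames per second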
We ran the experiments highlighted here on these variants of the datasets. We don't apply any augmentation to the images, and we apply the same training procedure to every model, because we want a fair comparison of the models on this limited data. In terms of the proposed methods, we reviewed and selected the solutions used by the top performers of the LSD competition. First we selected MT-EmotiEffNet, which is currently the state-of-the-art model on the AffectNet dataset; the second one is DAN, an attention-based model which is well known for its good generalization; and the last one is a graph convolution model (GERS) that was used in the second-place solution of the LSD competition. We then propose several ensemble approaches to utilize their advantages. In the first one we use the backbone from MT-EmotiEffNet and the attention layer and classification head from the DAN model; in the second one we use the backbone from MT-EmotiEffNet and the graph convolution layer and classification head from GERS; and the last one is the combination of all of those. In our results we find that the best results were achieved by the ensemble models: in particular, MT-EmotiEffNet with GERS achieves the highest F1 score of 0.771 on the original validation set, and MT-EmotiEffNet with DAN achieves the highest F1 score of 0.419 on the MTL data. In general, the ensemble models achieve better results than any single model on the LSD dataset, and the ensemble of MT-EmotiEffNet and GERS even increases the F1 score by a significant factor, around 10%. Similar results were observed on the MTL data, except for the most complex ensemble model: although it achieved the highest score on the LSD dataset, it failed to provide a good solution on the MTL data. Another observation is that the ensemble of MT-EmotiEffNet and DAN not only achieves a higher score compared to the single models but is also able to achieve higher results on the MTL data. I believe that thanks to the strength of MT-EmotiEffNet, when we use its backbone we can extract better embeddings from the data, and when we pass these better embeddings to a more complex classification head, such as those from DAN or GERS, we can improve the performance significantly. However, when we compare the F1 scores between the LSD and MTL data, we see that the F1 score drops significantly; this could be due to the limited generalization of the synthetic dataset and the difference in label distribution between the LSD and MTL data. In terms of inference speed, since the ensemble models are more complex than any single model, their inference is slower; however, with real time being around 30 FPS, we can see that almost all models achieve faster or near-real-time speed. If we stopped here, we could say that this may be good enough for real-time applications, but we wanted to evaluate it in a more practical setting, so we included those models in a typical end-to-end video analytics application with a face detector, a face recognizer and our emotion recognition models; I have used such a pipeline at some scale in my work before, and I got some positive feedback.
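Before moving on to the timing results, here is a minimal PyTorch sketch of the backbone-plus-foreign-head ensembling described above; the backbone and head are generic placeholders rather than the exact MT-EmotiEffNet, DAN or GERS modules.

import torch
import torch.nn as nn

class BackboneWithNewHead(nn.Module):
    def __init__(self, backbone: nn.Module, head: nn.Module):
        super().__init__()
        self.backbone = backbone      # e.g. a trunk with its original classifier removed
        self.head = head              # e.g. an attention or graph-convolution classification head

    def forward(self, x):
        feats = self.backbone(x)      # embeddings of shape (batch, feat_dim)
        return self.head(feats)       # expression logits

# Example wiring with a simple linear head standing in for the DAN/GERS head:
# model = BackboneWithNewHead(my_backbone, nn.Linear(1280, 6))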
But if we look at the frames per second here, we see that even when only one face appears in the frame, the frame rate drops below 20, and when the number of faces increases, the FPS drops very quickly. So I believe further work is needed here to make the models faster and bring them closer to real-time applications. To conclude: first, we propose an ensemble approach that combines different single state-of-the-art models to achieve a better result on the facial expression recognition task; secondly, although there is some gap in F1 score between the LSD dataset and the MTL dataset, using synthetic data still looks highly feasible, and in further work perhaps not only for this synthetic dataset but also for other facial expression recognition datasets; and lastly, since the inference speed of many deep learning models has to meet the near-real-time requirement in real applications, there is further work that we will explore later to make our models faster. That's all, and thank you for your attention. Thank you for your talk; as someone who previously worked on facial recognition, my heart is filled with joy when I see that people not only combine models but also deploy them on hardware and measure FPS, so thank you for that. Do we have any questions, dear audience? Well, it seems that I have a question. I am really curious, and I am not familiar with the original challenge: can you please comment on whether each face represents one emotion, or whether there are some kind of soft labels, because some faces may show both disgust and anger, something like that? I think emotion is a very complex concept; for example, some politicians hide their emotions, so maybe in the face you don't see anything, but they are feeling happy or angry, something like that. So I think that, depending on the requirements of a specific use case, we can develop models for specific tasks, for example for children in a classroom or something like that. Okay, thank you. Let's thank the speaker again. Our next talk should be semantic-aware GAN manipulations for human face editing.

Okay, hello, my name is Khrusov Pavel, and today I will present the paper on semantic-aware generative adversarial network manipulation for human face editing. The present study is devoted to manipulation methods in the latent space of generative adversarial networks in the context of human face editing. For this study we chose several unsupervised methods for the detection of semantically meaningful directions in the StyleGAN2 latent space; we evaluated the quality of the obtained directions and of the images obtained with them, and we also analyzed the results and propose an original method that allows us to improve the quality of manipulations with larger shifts. In this work the StyleGAN2 model is used. This model uses a style-based architecture: as opposed to the original architecture, where random noise is fed directly to the synthesis network, here a non-linear mapping is used that produces an intermediate, so-called style vector. This architecture makes the factors of variation more separable; in particular, it has been shown that the intermediate space W is better linearly separable than the original latent space, which is useful for image editing tasks. In this study we chose three methods for the discovery of semantically meaningful directions in the GAN latent space. The first method is based on optimization and includes two main parts: the first one is a matrix A containing the directions along which the shift is made, and the second part is
a reconstructor that takes two images, original and edited, and tries to predict the chosen direction and the shift size. These two parts are trained together, and the loss function uses cross-entropy for the predicted shift direction and the mean squared error for the prediction of the shift size. The next method is based on principal component analysis: to get the directions, we sample n random vectors from a Gaussian distribution, feed them to the mapping network to obtain intermediate latent style vectors w, calculate the principal components on their basis, and use the principal components as directions. The last method is closed-form factorization of the latent space. It is based on the assumption that the weights of the modulation layers contain knowledge about the semantics; as directions we use the singular vectors of the weight matrices of the modulation layers. For the experiments we calculated the singular vectors for the layers, took the average, and used those vectors as directions. As metrics we use the Fréchet inception distance, as this metric allows us to evaluate the quality and variability of the generated images; the next metric is the learned perceptual image patch similarity, which allows us to evaluate the perceptual similarity of images; to evaluate how well the target attribute changes, a pre-trained regression predictor was used; and the last metric is cosine similarity. As an example, consider a shift along the direction associated with age change. As can be seen in the images, all methods are able to find a direction associated with an age change; at the same time it is clear that when we perform the manipulation, with changes both in the direction of decreasing and increasing age, all methods add additional attributes such as glasses, a change of face position, hair color and so on. But generally it can be said that the optimization-based method shows the best performance, and this is confirmed by the metric values: the optimization-based method has the best Fréchet inception distance, the best learned perceptual image patch similarity, and the best values for the other metrics. During the experiments it was found that it is difficult to achieve changes of only local attributes, such as eye size, eyeglasses and so on. To overcome this problem we used the so-called layer-wise editing technique: in this method the shift is applied only to the vectors fed to certain layers. For example, consider a shift along the direction associated with eyeglasses; this direction, presented here, was detected by the optimization-based method. The first row shows images obtained by applying the shift to all the layers; in the second row the shift is applied only to the first two layers, and we can see that in the second case the identity of the face is well preserved and there are practically no changes in independent attributes. In the case of shifts applied only to certain layers we get much better metric values, especially for the Fréchet inception distance.
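To make the last two ideas concrete, here is a minimal Python sketch of closed-form direction discovery (SVD of a modulation weight) and of layer-wise editing in a per-layer W+ latent; the generator interface, tensor shapes and hyperparameters are assumptions for illustration, not the exact StyleGAN2 code used in the study.

import torch

def closed_form_directions(style_weight: torch.Tensor, k: int = 10):
    """Top-k candidate semantic directions as right singular vectors of a modulation weight.

    style_weight: (out_channels, w_dim) weight matrix of one style/modulation layer."""
    _, _, vh = torch.linalg.svd(style_weight)
    return vh[:k]                              # each row is a direction in the W space

def layerwise_edit(w_plus: torch.Tensor, direction: torch.Tensor,
                   alpha: float = 3.0, layers=range(0, 2)):
    """Apply the shift only to the chosen layers (here the first two) of a (num_layers, w_dim) latent."""
    edited = w_plus.clone()
    for i in layers:
        edited[i] = edited[i] + alpha * direction
    return edited

# Hypothetical usage: img = generator.synthesis(layerwise_edit(w_plus, closed_form_directions(W)[0]))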
To improve the quality of images obtained by large shifts, we developed an original method based on the use of the extended latent space of StyleGAN2: instead of one mapping network, several are used, their number equal to the number of modulation layers, so each modulation layer receives its own vector w. Initially all mapping networks are obtained as copies of the original mapping network; how many of their layers are trained (in our case the last four) is a hyperparameter. Since the manipulations should not change the image domain, images obtained by the modified generator and edited images should belong to the same distribution as images generated by the original generator, so we minimize the deviation in the discriminator prediction for images obtained by the original generator and by the modified generator, and we also minimize the deviation in the discriminator prediction for images obtained by the original generator and shifted images. As the generator and the discriminator we took the pre-trained networks from the original StyleGAN2 model. To preserve identity between images obtained with the original and the modified generator, we minimize the learned perceptual image patch similarity between them. To improve training, we also minimize the deviation of the products of the modulation weights, in accordance with the metric. Now let's look at the results. Despite the fact that only the directions obtained by closed-form factorization were used in training, improvements were obtained for shifts along directions found by all the previous methods. For example, here are shifts along a direction obtained by the optimization-based method: in the upper row the images were obtained with the original generator, and in the lower row with the modified generator. We can see that the images in the first column are almost identical, and that the images obtained with the modified generator look more natural: more natural colors, more natural faces, and also fewer artifacts. The same can be seen when we use shifts along directions obtained with PCA and along directions obtained with the factorization method. The results of the visual analysis are confirmed by the metrics computed for the direction associated with age change. One may suppose that the modified generator, which uses the extended latent space instead of a single latent code, slightly shifts the distribution of generated images; this hypothesis is supported by the metrics for the zero shift: we observe higher distances, but the discriminator still recognizes the images as real with the same level of confidence as for images obtained without the extended space. In conclusion, I can say that these methods can be used for human face manipulation, or human face editing; we can also use the layer-wise editing technique, and the proposed method improves the quality of manipulations with large shifts. Thank you for your attention. What questions do we have? Thank you, Pavel. Do we have questions? If not, then I have some. One of my questions is about ... The other question is whether it is possible to transfer these methods to, let's say, the new era of generators, namely diffusion models. Thank you. About the first question: since we use unsupervised methods, editing really changes not only the target attribute but also independent attributes, so when we change gender, we usually change the hair as well; but when we use the so-called layer-wise editing technique, we can, in some cases but not all, achieve more stable changes. And about the second question: in this paper we don't consider the possibility of transferring this method to diffusion models, but I think it is a goal for following research. Okay, thank you. Let's thank the speaker again. And we are approaching our final talk. So I'll present to you our work on dynamic gesture recognition via contrastive pre-training on video sequences.
We'll start with the motivation, then the problem statement that we try to solve, the datasets that can be used, and the current state-of-the-art approaches; then our proposed approach, after which we'll show the results and conclude the work with further research. So what is the task? One second, I'm trying to move it. The motivation is basically to develop a system to recognize dynamic hand gestures. It can be used in many scenarios, mostly human-computer interaction: we can control robots, computer systems, games, VR and AR applications, and we can translate and generate sign languages, especially for people with speech and hearing impairments. We expect to see even more applications as research and development in this area continues, and we mainly focus on human-computer interaction here. The problem statement: we are given video sequences, I mean image sequences, and we have to identify which gesture class has been performed in the last N frames, where N is the fixed-size window that we slide over the whole sequence. The sub-tasks here are, first, hand detection, because given a full frame we have to find the hand and then come up with a gesture class, and preferably segmentation, to remove any other irrelevant information from the image. We also have to recognize the neutral gesture, which is basically "not a gesture": we are showing our hand but not doing anything special that needs to be recognized. And it has to be an end-to-end trainable pipeline, because everyone executes gestures differently, and using some other subsystem which is not trainable can produce noisy results and a poor solution. The datasets that we found are listed here; we'll go over them quickly. First is the Cambridge hand gesture dataset, a very simple dataset: nine classes and only RGB image sequences, executed on a dark background. Then there's the DHG dataset; we will use it for evaluation of hand-keypoint-based methods. It contains depth images as well as sequences of 22 3D keypoints, and it has 14 classes; the keypoint sequences were generated with the Intel RealSense depth camera. Then there's the EgoGesture dataset: it contains RGB and depth data, it's a very large dataset with 83 classes, and the camera is mounted on top of the head, so it's an egocentric camera view. Then there's a dataset for dynamic hand gesture recognition systems that also contains RGB and depth data and has the 27 classes shown here; the quality of this dataset is probably the highest among the others, because every participant was trained to execute the gestures and they strictly followed the guidelines. Then there is the Nvidia dynamic hand gesture dataset: it has RGB, depth and infrared data, an example is shown on the slide, and it has 25 classes. The Jester dataset: what is different about this dataset is that it also has neutral gestures; it has 27 classes, and only RGB data is captured. Then there's the IPN Hand gesture dataset; we will use it for evaluation of image-based methods. It contains RGB, optical flow and hand segmentation sequences, with only 13 classes. We will not use the optical flow and hand segmentation, because we want to focus only on RGB data, as it is the least noisy. Then there's the ChaLearn dataset: RGB-D image sequences, 249 classes, a very large dataset, and here the depth sequences are shown. And the Sheffield Kinect Gesture dataset contains RGB and depth image sequences
with 10 classes. This dataset is a bit peculiar, because some of the classes are basically drawing figures, like the triangle shown on the left here. Since we focus on human-computer interaction and not really on sign languages, but there are very high-quality datasets for sign languages, we'll mention those here as well. First is an American Sign Language dataset: it contains RGB image sequences and 29 classes, basically the letters of the English alphabet plus the space and delete signs, and the neutral gesture as well. Then there is a modern, new Russian Sign Language dataset, a very large dataset containing 1000 classes covering words, phrases, numbers, names and sentences. It contains RGB data and 21 3D hand keypoints, but the hand keypoints were generated with the MediaPipe Hands framework, so they are not manually annotated and are prone to inaccuracies. Now let's discuss the current state-of-the-art approaches; as I said, we'll consider only hand-keypoint-based and pure RGB image-based approaches. The first keypoint-based approach, Parallel count, basically applies 1D convolutions to each keypoint coordinate in the sequence, but it doesn't really treat the sequence as a sequence, so it's unlikely to be superior to sequence-based methods like transformer architectures; the other drawback is that it works only on complete sequences, which is not the case in real-world scenarios, where we actually have to use a sliding window because we never know when the gesture is ending. Then DG-STA applies spatial and temporal attention to the keypoints and makes use of positional encoding, and, as we'll show, this is the best approach we've found so far for hand keypoints. For image-based methods we used 3D ResNeXt-101; there's also MB2, which you'll find in the table later. Here, when we say 3D, we mean we're dealing with 2D image sequences, that is, 2D images over time. Now our proposed approach: what inspired us, what we actually propose to do, and the sub-tasks that we solve. We were inspired by OpenAI's CLIP method, which works as follows. It takes pairs of images and their text descriptions; it has a text encoder, which transforms the given text into a vector of size k, and an image encoder, which does the same thing but with an image. Both encoders are trained to maximize cosine similarity within a pair and minimize cosine similarity outside of the pair; this is called contrastive pre-training. It also uses a symmetric cross-entropy loss to minimize the loss in both directions, text-to-image and image-to-text. The rest of CLIP is less relevant for our task, and we'll discuss why. What we propose is to replace the original image encoder with a 3D image-sequence encoder, because we have not just one image but a list of images, and to take pairs of gesture image sequences and their textual descriptions. We can take a large pre-trained text encoder, which generates the text embedding from the text, and train specifically the image-sequence encoder to produce embeddings similar to those of the text encoder, in the same way CLIP does it. So we replace the original classification task with a metric learning setting. Do we even need the text encoder afterwards? No; it serves its role through text supervision, so during the inference stage we can just use the image-sequence encoder and forget about the text encoder. This means we can use any huge state-of-the-art model as the text encoder, as long as it is trained to produce good embeddings for the text.
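A minimal sketch of the CLIP-style symmetric contrastive objective described above, assuming batch-aligned pairs of sequence and text embeddings; the encoders themselves are placeholders and are not shown.

import torch
import torch.nn.functional as F

def clip_style_loss(seq_emb: torch.Tensor, txt_emb: torch.Tensor, temperature: float = 0.07):
    """Symmetric cross-entropy over cosine similarities, as in CLIP."""
    seq_emb = F.normalize(seq_emb, dim=-1)           # (B, K) gesture-sequence embeddings
    txt_emb = F.normalize(txt_emb, dim=-1)           # (B, K) text embeddings
    logits = seq_emb @ txt_emb.t() / temperature     # pairwise cosine similarities
    targets = torch.arange(len(logits), device=logits.device)
    loss_s2t = F.cross_entropy(logits, targets)      # sequence -> text direction
    loss_t2s = F.cross_entropy(logits.t(), targets)  # text -> sequence direction
    return (loss_s2t + loss_t2s) / 2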
We picked a sliding window of size 32; it should really be determined specifically for every dataset, and how to pick the window size properly should be studied. The sub-tasks that we solved: we don't need irrelevant information from the scene, so we can use a pre-trained hand detector to crop the hand, taking the largest visible hand as the one we are working with, and we can also remove the other irrelevant parts of the image using hand segmentation, as we discussed before. The results we obtained for hand-keypoint-based approaches: we can see in the table that MLP and CatBoost outperform LSTM, which is pretty interesting, because MLP and CatBoost don't treat the sequence the way LSTM does, so the relation between the noisy hand keypoints in the sequence is non-trivial. For image-based results we use the IPN Hand dataset, with C3D, then ResNet, ResNeXt, and MobileNetV3-Small; the baseline has been chosen as the MobileNet, and it could also be substituted with the latest models, for example a transformer. In conclusion, we deduced that hand-keypoint approaches are more prone to errors by design, since they use several subsystems to generate, for example, the hand keypoints if we use MediaPipe. We then designed a novel system that is able to recognize dynamic hand gestures, can generalize to new, unknown gestures, and doesn't need to be trained on all possible gestures thanks to the metric learning setting. We combine vision and text, which is in line with how we humans understand gestures, as they provide a language of their own; there is a link to sign languages here. For further research we propose the following. Current datasets have no textual descriptions, just labels, but we want to use the text encoder, so we probably want very detailed descriptions for every sequence; we therefore propose using a large pre-trained video-to-text model to generate descriptions for us. For example, for a wave gesture it could produce the following: the person's hand is moving from side to side with an open palm, and it's the wave gesture. Current datasets would just have the label for the wave class, but we need a detailed description, as I said, so we'll use generated descriptions as the input to our text encoder. The implementation of the proposed approach is left for further work, and it also needs an ablation study to understand which parts of the system actually benefit the final solution. We also need to carefully study the sliding window for every dataset and come up with an algorithm to choose it properly; right now we just used a window size of 32. It's also important to investigate depth-aware models for the task: right now we only use RGB data, but we could add depth data to it. So thank you for your attention. Thank you for your talk. Does anyone have any questions? Well, we're actually not even slightly out of time, and I also don't have any questions, so let's thank the speaker again. Thank you.

I'm the chair of today's session, and it's my pleasure to welcome the speaker, Gagik. Gagik, sorry. And the talk, visualization-driven graph sampling strategy for exploring large-scale networks. The floor is yours. Thank you. Hi, everyone, thanks for coming. Today I will present our research work, conducted with Irina Tyrosyan and Vartey Razaryan, on a visualization-driven graph sampling strategy for exploring large-scale networks. So what are graphs themselves?
Graphs are complex data structures that represent pairwise relationships between different objects, and when we speak of large graphs, it means we have a lot of entities that may or may not be interconnected with one another. Here we can see several examples of large graphs. For example, on the left we have the social circles Facebook dataset, which basically represents the interconnections between different Facebook accounts; in graph terminology, the people or entities here are called nodes, and the connections between them are called edges, so, for example, if people are friends on Facebook, they will have an edge connecting them. And here are other graphs; for example, these two represent the collaboration of different paper authors with one another. The key thing about graphs is that they are gaining more and more usage nowadays; we have a really large amount of information, and graphs are becoming too large and too complex to apply any analysis to them, especially visual analysis, since with node counts that can grow to tens of thousands or even millions, it becomes quite hard to apply good visual analysis to a graph. What we offer for that is an approach called graph sampling: it refers to selecting only some portion of the nodes and edges, rather than operating on the whole graph, so that we have less information to process and it becomes much easier to gain insights. The challenge with sampling is that we must find a good way to apply it, so that when we conduct some analysis on the sample we can actually carry those results back to the original graph and all the information we retrieve is correct. Having said that, our research mainly focuses on two questions. The first is the evaluation of the existing sampling approaches: which are the state-of-the-art approaches, and what drawbacks or advantages do they have. The second is the proposal of a novel approach that addresses at least some of the drawbacks of the current state-of-the-art approaches. Speaking of such approaches, after conducting an extensive literature review we identified three algorithms which are more or less good at graph sampling: random jump sampling, random edge-node sampling, and the final one, mino-centric graph sampling (MCGS), which is actually the best one we found. We will now go over each of them and explain how the sampling is conducted in each case. First, random jump sampling. Random jump sampling is based on the idea of a random walk. A random walk is when we randomly pick a node, an entity of the graph, and just randomly start traversing between adjacent nodes: we can start, for example, at some point and with random movements go and explore the whole graph. What random jump adds to the random walk is that at each step, with some fixed probability, we can jump to a completely random new region. This solves some generalization problems, because if we apply a regular random walk, we start at some point and can only move to the nodes that are connected to it, directly or indirectly; with the help of random jumps we are able to generalize over the whole graph better. But, of course, as the whole process is applied randomly, there are drawbacks related to it.
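A minimal sketch of random jump sampling as just described, assuming an undirected networkx graph; the jump probability and stopping rule are illustrative choices, not those of the paper.

import random
import networkx as nx

def random_jump_sample(G: nx.Graph, target_nodes: int, jump_prob: float = 0.15):
    nodes = list(G.nodes)
    target_nodes = min(target_nodes, len(nodes))
    current = random.choice(nodes)
    sampled = {current}
    while len(sampled) < target_nodes:
        neighbors = list(G.neighbors(current))
        if random.random() < jump_prob or not neighbors:
            current = random.choice(nodes)        # jump to a completely random node
        else:
            current = random.choice(neighbors)    # regular random-walk step
        sampled.add(current)
    return G.subgraph(sampled).copy()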
And sometimes, based on at which point we are starting, we might miss some important components of the graph. And so this was the random jump. The next thing is random edge node sampling, which is pretty close to the previous approach. In this case, we randomly select edges from the graphs and also the nodes that form this edge. And in the end, all the nodes that we have selected, if there is also some interconnection between them, even though the corresponding edges have not been selected randomly, we strictly connect all the nodes that we have picked. And so in this way, again, we are able to get a very good graph sample. But, again, the main drawback is related to the fact, to the word random. Again, as the underlying process is completely random, we still might miss some important components of the graph. And finally, the last approach, as we already stated, this is the state-of-the-art approach, minocentric graph sampling. And the idea of this approach is to break this selection process into two parts, minority selection and majority selection. In case of minority selection, actually we are trying to first pick, like, anomaly nodes from the graphs. By saying anomaly, we don't mean like that thing, just the nodes that are different from the others and might contain more important information than the regular nodes that we have. And so in this case, we define four types of minority structures. The first ones are superpivots. We can note one here. And what superpivots are, they are nodes with high degree, meaning node that has a lot of connections with other ones, and also with high connectivity between its neighbors. Okay, we can see here. So such kind of nodes marked by number one are actually called superpivots. So they are mostly important structures in the graph. The next type of minority that we have are the huge stars. We can see one here. Again, huge stars also have high connectivity. But in this case, now they are forming like star-like structure, meaning their adjacent nodes have no connectivity with one another. The next important minority structure are dreams. They are these edge-like structures getting out from the main clustered component. And also we have bridges. Okay, here, then bridges just connect different highly-connected components with one another. And so this approach is based on first selecting these types of nodes from the graph. And after the minority selection is complete, majority selection is being applied. And the idea of majority selection is very simple. All the remaining nodes that we have are being evaluated iteratively. Like all the ones that are remained, we calculate several distance metrics for them. We actually add them to the graph one by one. And then calculate some metrics. And the node that provided the best distance metric, meaning the sampled graph is closer to the original one, this node is being selected. And once we actually add this node to the graph, the process repeats for all the other ones. And we can directly say that this is very computationally expensive approach, because with a greedy approach, we add all the nodes, then remove them, and then simply add the best one. And then this process repeats. And in fact, with MCGS, we are able to retrieve a pretty good sample, but the main drawback is actually related to computational expense of this majority selection phase, because too many iterations are done. And if we have a very large graph, the process can take really long. 
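A minimal sketch of the greedy majority-selection step as just described, assuming a user-supplied distance function between a sample and the original graph; this is an illustration of the idea, not the authors' implementation.

import networkx as nx

def greedy_majority_selection(G, sample_nodes, remaining, budget, distance):
    sample_nodes = set(sample_nodes)        # nodes already chosen in the minority phase
    remaining = set(remaining)
    while len(sample_nodes) < budget and remaining:
        best_node, best_dist = None, float("inf")
        for node in remaining:                              # try every remaining node
            candidate = G.subgraph(sample_nodes | {node})
            d = distance(candidate, G)                      # how close is the sample to G?
            if d < best_dist:
                best_node, best_dist = node, d
        sample_nodes.add(best_node)                         # keep only the single best node
        remaining.discard(best_node)
    return G.subgraph(sample_nodes).copy()

# Example distance: absolute difference in average clustering coefficient.
# distance = lambda S, G: abs(nx.average_clustering(S) - nx.average_clustering(G))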
We also have the problem of imperfect minority selection: here we identify four main minority types, but there might be other important components of the graph that are simply missed by the algorithm. So what we offer: we take MCGS as the best currently existing approach and apply several modifications to it. They are named Batch Major, Enhanced Minor, Connected Component, and also the ensembling of all the separate approaches; let's go over each one of them. The first modification is called Batch Major MCGS. It means we take regular MCGS but modify the majority selection phase: instead of picking a single node at each step, the one that performs best, we take a batch of nodes, say 10 nodes, at each iteration. This reduces the number of iterations by a factor of 10, and the algorithm becomes much faster. Here we can see the actual results; for example, here we have the Condensed Matter collaboration network, the largest network we used in our analysis, which represents collaboration between paper authors in the field of condensed matter physics. We can see that with the original MCGS the running time of the algorithm was 79 seconds, but with batch selection, when we picked 10 nodes in a batch instead of a single one, it runs in 8.6 seconds, almost a 10x execution-time improvement. How did we pick this batch size? Here we can see some analysis applied to the same largest graph, the Condensed Matter collaboration network: we sampled with different sampling rates and different batch sizes and measured the execution time of the algorithm in all these cases. We then applied the elbow method to identify the best point, the best batch size, and we can see that starting from 10, for almost all sampling rates, the decrease in execution time is no longer significant; in fact, it gets close to zero. So we decided that 10 is a good breaking point, and a batch size of 10 is a good choice for our approach. So this was the first modification we offered. Next we tried to change the minority selection process. We said that even after selecting the four main types, there might still be important minority structures missing. When observing our graphs, for example here we again see the graph of Facebook users connected with one another, we can see a bridge-like structure with a single node between two highly connected components, and we can see that regular MCGS misses this bridge: the bridge does not exist here, even though it connects two high-degree nodes that have been included, so the connection between them is missing. So we decided that for the minority structures that we pick, in particular the superpivots and huge stars, we should also include the shortest paths between them: once the minority selection is done, we also add the shortest paths between the high-degree nodes, and only after that does the majority selection process begin. We can see that with our modification the bridge is actually retrieved in the given sample. And the last approach we offer is called Connected Component MCGS. Regular MCGS operates on the whole graph; it doesn't differentiate between different connected components.
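A minimal sketch of the connected-component modification: run a base sampler independently on every connected component and merge the results; base_sampler is a placeholder for MCGS or any other sampling routine.

import networkx as nx

def per_component_sample(G: nx.Graph, rate: float, base_sampler):
    parts = []
    for comp in nx.connected_components(G):
        sub = G.subgraph(comp)
        budget = max(1, int(round(rate * sub.number_of_nodes())))
        parts.append(base_sampler(sub, budget))   # sample each component separately
    return nx.compose_all(parts)                  # union of all component samples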
And as we can see in this example, some information can be missed: in the original graph, at the top, in this cloud-like region, we have many small connected components of 2, 3 or 4 nodes, and when applying regular MCGS the central structure is kept pretty well, but this cloud-like part is not retrieved, because the components were too small and the algorithm didn't pick them up. So what we offer is to independently apply MCGS on each connected component and then combine the results; as we can see, having done that, we get a much better representation in this case. And finally, as I said, we also offer the ensembling of these methods: basically we take all the possible combinations of length 2 and 3 of our approaches and combine them together, like Batch Major with Connected Component, or Enhanced Minor with Connected Component, and apply them jointly. Once our approaches were ready, we needed to evaluate their efficiency. For that we picked 8 different graphs from the Stanford Large Network Dataset Collection, with varying numbers of nodes and edges; as I mentioned, Condensed Matter is the largest one, with almost 23,000 nodes and approximately 93,000 edges. In the evaluation we decided to go with two steps: first a quantitative evaluation, and second a qualitative one. In the quantitative evaluation we tried to measure, with actual metrics, how close the sampled graph is to the original graph. For this we picked 5 metrics. The average clustering coefficient and the global clustering coefficient of the nodes give some information about the general node connectivity in the graph, and from them we calculate the distance between the sample and the original graph; we also calculate the difference between the number of connected components of the sample and of the original graph; and we calculate a divergence distance between the degree distributions of the graphs. Having these 5 metrics, we take all 8 graphs, and for each graph we run each of our algorithms for 6 sampling rates, from 0.05, which means that only 5% of the nodes will be retained, up to 0.5. Also, as all the algorithms have implicit randomness, we run each one 4 times for a given graph and a given sampling rate. We then calculate the stated metrics for each run, and for each metric we rank the algorithms: the one with the smallest distance to the original graph gets the first position, and the worst one the last. We then sum up all the positions that the algorithms have gathered across the different runs, which means that the algorithm with the smallest aggregated sum of ranks appeared at the top most often, so this is the best approach.
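A minimal sketch of three of the five distance metrics, computed with networkx; the exact divergence used for the degree distributions is omitted, since its name was not clear from the talk.

import networkx as nx

def graph_distances(sample: nx.Graph, original: nx.Graph):
    return {
        "avg_clustering_diff": abs(nx.average_clustering(sample)
                                   - nx.average_clustering(original)),
        "global_clustering_diff": abs(nx.transitivity(sample)
                                      - nx.transitivity(original)),
        "component_count_diff": abs(nx.number_connected_components(sample)
                                    - nx.number_connected_components(original)),
    }

# Algorithms are then ranked per metric (smallest distance = rank 1), and the rank sums
# are aggregated over graphs, sampling rates and repeated runs.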
Using this ranking mechanism, we first identified which of our proposed approaches is the best, and then compared it with the existing state-of-the-art algorithms. Here we can see that MCGS still leads and is a little above our approach in the final number of points, but the difference is not that big; and taking into account that our approach, especially with the Batch Major step, makes the algorithm more than 10 times faster, we concluded that this is a pretty good and sufficient result. So this was the first type of evaluation we performed, the quantitative evaluation based on the metrics. Since the main focus of the research was the visual analysis of graphs, we also tried to visually compare the generated samples with the original graph representations, so we performed a qualitative evaluation of the algorithms, and for this we conducted a survey with 100 real users. How was the survey conducted? Here we can see a sample screen that the users received in the survey. In each question they received a triplet of graphs; the middle graph was always the original representation, and on the left and right sides we had samples generated by different algorithms, of course generated with the same sampling rate. What the users had to do was look at the original graph and then pick which of the given samples was better; if a user was not able to decide, they could just mark the two sampling algorithms as equal. If one algorithm won over another it got 2 points, otherwise, in the case of a draw or equality, we gave each algorithm a single point, so the algorithm that collects the most points is the best one. We found that in this case, in visual analysis, our approach actually wins, and surprisingly MCGS even dropped to the third position, performing worse than the random edge-node sampler. So we can conclude that our algorithm performs well and sufficiently in terms of the visual analysis that users can apply. What conclusions can we make? We got pretty solid quantitative results (we were behind MCGS by just a small margin), we got the best results in terms of visual analysis, and of course a very good time-complexity optimization. Regarding future work, the problem of not selecting all minority structures remains, and there might be room for improvement in the minority selection process by selecting additional complex and important components. We can also test on a larger set of graphs, because, as you remember, we picked 8 graphs for the main analysis; there may be potential improvements at the implementation level, as there might still be room to speed up the algorithm; and finally we can look at applications in diverse domains, because graphs are really used in different domains, as we saw, from social networks like Facebook up to natural sciences like chemistry or physics, so real applications in such domains would also help us a lot. And basically that is all, thank you. Do we have any questions? Yes, please. Thank you for your presentation; on the slide where you showed the time complexity of the algorithm, for what size of network was it? Yes. Okay, okay, thank you. More questions? Okay, let me ask one question. Can we please go to the slide with the manual results, the human evaluation? Are the points the sum over all the people who participated in the survey, or was there some specific formula used to calculate the final score? We just showed each algorithm an equal number of times to each user; the triplets were shuffled randomly, and so, across all 100 results, each user was given 16 such triplets, and if
Okay, thank you. And maybe it would be interesting to check whether people agree on the same images — do they tend to choose the same representations, the same samplings — to have some measure of agreement, how much they agree on this. Yeah, actually we haven't gone into much detail on this aspect, whether there are some trends or something else, but I think it will be a good direction for future work that we can apply. Thank you, Tom. Yeah, more questions? Thank you. You said that you estimated your algorithms 4 times each — isn't that too few, I mean, or not? We just made different rounds because the algorithms have implicit randomness, especially the random edge-node and random jump samplers, and also the minority selection process in MCGS that is described in the original paper: if they tried to identify all the minority structures with a regular approach it would take too much complexity, so they applied some modification. That is why basically everything we have has some implicit randomness, so, not to rely on only one good result and say that this graph is good, but to get a more general representation, we decided to run each algorithm 4 times and include that in the comparison. Okay, let's thank the speaker again. Thank you. Sergey Sidorov, Sergey Mironov and Alexey Grigoriev. Thanks for the opportunity to present our work titled Limit Distributions of the Friendship Index in Scale-Free Networks, by Sergey Sidorov, Sergey Mironov and me, Alexey Grigoriev. So, what is the friendship index? It allows one to measure degree disproportions in networks. The friendship index is closely related to the friendship paradox, which says that your friends are, on average, more likely to be popular than you. In a network represented by a graph, the friendship index of a node can be calculated as the average degree of its neighbours divided by its own degree. It's known that the friendship index allows one to measure the direction of influence in networks and also allows one to compare different networks using the friendship index. So, in this presentation we will find out how the friendship index is distributed in real networks, such as social networks, and in simulated networks produced by the Barabasi-Albert model and configuration models. Also, we will estimate the share of nodes for which the friendship paradox holds true, or, in other words, for which the friendship index is higher than one, and we'll see how different real and simulated networks are. Okay. First, let me briefly define some notation that will be used. A complex network is represented by a graph, which has a set of vertices and a set of edges — quite simple. d_i is the degree of node v_i, and then the friendship index, denoted as beta, is calculated as the sum of the degrees of the neighbours divided by the squared degree of the node, or, in other words, it is the same as the average degree of the neighbours divided by the node's own degree. And we're interested in finding not only a node's friendship index but the average friendship index among all nodes with the same degree; this will be denoted as s_n(k). Don't worry, when you see it again I'll remind you what this is. So, I know all of you are tired already, probably. Okay, let's move on. As I've said, we'll see how the friendship index is distributed in real and simulated networks.
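To make the definition above concrete, here is a minimal Python sketch (using networkx, with the Zachary karate club graph only as a stand-in network) that computes the friendship index beta of each node and the per-degree average s_n(k) described by the speaker.

```python
import networkx as nx
from collections import defaultdict

# beta_i = (sum of neighbour degrees) / d_i^2,
# i.e. the average neighbour degree divided by the node's own degree.
G = nx.karate_club_graph()  # placeholder network, not one from the talk

def friendship_index(G, v):
    d = G.degree(v)
    return sum(G.degree(u) for u in G.neighbors(v)) / d ** 2

by_degree = defaultdict(list)
for v in G.nodes:
    by_degree[G.degree(v)].append(friendship_index(G, v))

# s_n(k): average friendship index over all nodes of degree k
s = {k: sum(vals) / len(vals) for k, vals in by_degree.items()}
print(sorted(s.items()))
```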
For simulated networks, we picked ones created by the configuration model. This model is convenient because it creates networks with a given power-law degree distribution with a known exponent, and the generated networks don't have degree-degree correlations, which will be helpful for us later. So, to create a configuration-model network we need a degree sequence; these degree sequences are independent and identically distributed samples of a random variable X, and this variable X follows a power law with parameter gamma. Okay. Now we'll see what the limits are of the average friendship index among all nodes with some fixed degree. So, first of all, let's introduce nu_1 and nu_2: these are the first and second moments of the random variable X, if they exist, and L_0 is a slowly varying function, which, in other words, is a function whose value does not really change, as its argument tends to infinity, when the argument is multiplied by some positive a. So, what we have here is that in configuration models, if gamma, the exponent of the power law, is more than 2, then the average friendship index among all nodes with some degree k actually tends to nu_2 divided by nu_1, divided by k — that is, a constant divided by k — which is really convenient for us: we know this value, we can calculate it. However, when gamma is between 1 and 2, it's not so great: this s_n(k), which is the average friendship index among all nodes with degree k, when divided by the appropriate power, tends to a gamma/2-stable random variable with parameters 1, 1 and 0. This result is not so great, because what we get is that the average friendship index among all nodes with degree k depends on n, the size of the network, and this way we cannot compare networks of different sizes. Speaking quickly about the proof of this theorem: when gamma is more than 2, the first and second moments are finite, and with the use of the central limit theorem we get the first result; when gamma is between 1 and 2, the second moment is infinite, and with the use of the stable-law central limit theorem we get the second result. Okay, this was the theorem, and now let's see how the friendship index is distributed in simulated networks. So we choose a number of combinations of model and gamma, the exponent of the power law, and for these combinations we create networks of size 300 000, and we will be looking at 3 measures: the sample mean, which is the sum of the friendship indices over all nodes with degree k divided by the number of such nodes; the sample variance, which is the analogous variance for the friendship index measure; and the sample coefficient of variation, defined as the ratio of the standard deviation to the mean.
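A rough, illustrative sketch of this simulation setup is given below: it draws an i.i.d. power-law-type degree sequence, builds a configuration-model graph with networkx, and computes the three per-degree statistics just mentioned. The network size, exponent and minimum degree are placeholders, not the values used in the talk.

```python
import random
import statistics
from collections import defaultdict

import networkx as nx

def powerlaw_degrees(n, gamma, d_min=2):
    # Inverse-transform sampling of a Pareto-type law: P(D >= d) ~ d^{-(gamma - 1)}
    degs = []
    for _ in range(n):
        u = 1.0 - random.random()            # in (0, 1], avoids division by zero
        degs.append(int(d_min * u ** (-1.0 / (gamma - 1.0))))
    if sum(degs) % 2:                        # configuration model needs an even degree sum
        degs[0] += 1
    return degs

random.seed(0)
G = nx.configuration_model(powerlaw_degrees(10_000, gamma=2.5))
G = nx.Graph(G)                              # drop parallel edges for simplicity
G.remove_edges_from(nx.selfloop_edges(G))

# Friendship index of every node, then per-degree mean, variance and coefficient of variation.
beta = {v: sum(G.degree(u) for u in G.neighbors(v)) / G.degree(v) ** 2
        for v in G if G.degree(v) > 0}
by_k = defaultdict(list)
for v, b in beta.items():
    by_k[G.degree(v)].append(b)

for k, vals in sorted(by_k.items())[:5]:
    mean = statistics.fmean(vals)
    var = statistics.pvariance(vals)
    cv = (var ** 0.5) / mean
    print(f"k={k}  mean={mean:.3f}  var={var:.3f}  cv={cv:.3f}")
```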
Let's see the results. Well, there are a lot of images here, but let's focus on some of them. Each row here represents the results for its own network. Let's look at the left column first: on these logarithmic, log-log plots, horizontally is the degree and vertically is the average friendship index among all nodes with this degree, and for all networks we see that the average friendship index actually follows a power law with parameter minus 1 — I mean, these are log-log plots, so the straight lines here correspond to a power law; it would not look like this if these weren't log-log plots. Okay, in the middle we see the variances; let's just skip them and move on for a moment. The right column is one of the most interesting, because at first glance it may look the same for all networks; however, it should be noted that only for the network where gamma is more than 2 — well, 2.5 — do you see that for nodes of all degrees the coefficient of variation is less than 1, which means that the mean would be larger than the standard deviation, while for the other networks, for small degrees, it is higher. Okay, let's move on. These were results for simulated networks; let's compare them with some large real networks. These are networks from online sources, the data for which was already collected before us; they come from different sources and are of different sizes. Let's see the results for them. Surprisingly, despite all these networks being very different, we see similar results: the average friendship index also follows a power law with parameter close to minus 1. The variances for real networks are much larger than for simulated networks, which results in the coefficient of variation also being much larger, which makes it harder to predict the values of the friendship index in these networks. So, this was purely about the friendship index and its distribution; let's move closer to the friendship paradox. As I've said earlier, the friendship index is closely related to the friendship paradox, which says that your friends are more likely to be popular than you. Some of the known facts about the friendship paradox: it is present in social networks — most nodes in social networks have a friendship index larger than 1, which means they have the friendship paradox — it holds true at both the individual and the network level, and, last but not least, the presence of the friendship paradox was confirmed in some random networks generated by the Barabasi-Albert model. And the final theorem for today from my presentation: here we estimate the share of nodes for which the friendship index is larger than 1. So, if a random Barabasi-Albert network again follows a power law with parameter gamma and its degrees begin with m, we can estimate the proportion of nodes for which the friendship index is more than 1 — we can find bounds for it. It is actually bounded by 1 minus a_1 and 1 minus a_2, and these a_1 and a_2 differ in the upper bounds of the inner sum; they are highlighted in red. So, what do we do with it? Here we plot the upper and lower bounds based on the results of the theorem, and as you may see, the value of the bounds depends on gamma, the exponent of the power law, and on m, the minimum degree of nodes in the network. So this is a plot of the bounds of kappa, which is the share of nodes for which the friendship index is higher than 1, or the share of nodes which have the friendship paradox. These were theoretical results, and now I show empirical results for simulated networks. These networks are based on the configuration model, and as you may see — I will just switch slides back and forth — the results are quite similar, so that's nice. Of course, again, the result depends on m and gamma, which is the same. One thing I should mention is that when gamma is between 1 and 1.5, the share of nodes for which the friendship index is higher than 1 is close to 1; it means that almost all nodes have it. But then the question arises: when this share is not equal to 1, for which nodes is the paradox present and for which nodes is it not? We'll see this for real and random networks. These are random networks that you are already familiar with, that I've shown previously, and here we see the distributions of the share of nodes whose friendship index is higher than 1, but based on the degree: for each degree we get the share of nodes, among nodes with this degree, for which the friendship index is higher than 1, or for which the friendship paradox is present.
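An empirical version of this quantity is easy to sketch: the share kappa of nodes with friendship index above 1, overall and broken down by degree. The graph below is again just a stand-in; the real experiments used much larger configuration-model and real networks.

```python
import networkx as nx
from collections import defaultdict

G = nx.karate_club_graph()  # placeholder network

def friendship_index(G, v):
    d = G.degree(v)
    return sum(G.degree(u) for u in G.neighbors(v)) / d ** 2

beta = {v: friendship_index(G, v) for v in G.nodes}

# kappa: overall share of nodes for which the friendship paradox holds (beta > 1)
kappa = sum(b > 1 for b in beta.values()) / G.number_of_nodes()

# kappa(k): the same share among nodes of degree k
per_degree = defaultdict(list)
for v, b in beta.items():
    per_degree[G.degree(v)].append(b > 1)
kappa_k = {k: sum(flags) / len(flags) for k, flags in sorted(per_degree.items())}

print(round(kappa, 3))
print(kappa_k)  # low-degree nodes tend to have the paradox, hubs tend not to
```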
Well, for all these networks we see that for smaller degrees the friendship paradox is present; however, for large degrees, for hub nodes, it is almost never present. This also depends on gamma: if gamma is smaller, then the share of nodes with the friendship paradox is higher, and vice versa. Okay, I think that's everything here, and finally let's see the same for real networks. Well, these are also quite similar, yes; you see the shapes of the slopes differ, they aren't as beautiful as for simulated networks — or you may consider them more beautiful. So, yes, it also depends on gamma, the parameter of the power law, but real networks are more complex, so the results differ a little from the simulated networks. So, to sum everything up: we looked at the values of the friendship index, at their distributions in real and simulated networks, and at how they depend on the degree of the nodes. In networks without degree-degree correlations, as the network size tends to infinity, we've proved that when the power-law degree distribution has a finite second moment, the value of the average friendship index tends to a constant divided by k; however, when the second moment is infinite, this quantity is not bounded and converges, after normalization, to a stable-distributed random variable. And secondly, the friendship paradox is present in all networks whose degree distribution follows a power law; it depends on the exponent of the power law, and a larger minimum degree leads to a stronger friendship paradox in the network. Thank you. Thank you very much for your talk; do we have questions? Yes, please. Thank you for your talk; it reminds me of one interesting chapter in "Proofs from THE BOOK" by Aigner and Ziegler, a chapter about friends and politicians, and the theorem says that if every two people have exactly one friend in common, then there is a politician who is everybody's friend. But your friendship paradox should also be a famous thing; maybe it's somehow related to this type of theorems, or not — just curious. Yeah, and just adding about this well-known thing that everyone knows everyone through some 4-6 handshakes — it also crossed my mind; maybe comment on this. Well, actually the friendship paradox is a known topic — it's not I who invented it, of course — and usually it was developed in social studies and not really in network studies; the friendship index as a measure, well, this name was introduced recently. About your exact example — that's an interesting one; I can't really say whether it's the same or not, it's a nice thing to check out, and I'll do it, thank you. Wow, and one more technical question: you mentioned a bipartite graph, and you also applied the power-law analysis to it — did you, how to say, sort out the vertices of one part from the other, or is it just the same sample? Right, thank you. Thank you very much. Do we have more questions? Okay, thank you, let's thank the speaker again. And the last one is Approximate Density Computation, by Dmitry Ignatov, Kamila Usmanova and Daria Komissarova. Thank you for introducing our talk. This is a talk done together with my former students and lab members, and also with some colleagues from the Institute for Molecular Genetics: since our referee asked us to include some other examples where biclustering can be applied, we included this part; they are responsible for the data, we are for the processing. So, as for the motivation of our research, it was originally multimodal clustering: imagine that you have data; you can think of it as a regular, uniform
hypergraph, where you have several types of vertices, like the authors of papers marked by some tags. Such a structure is sometimes called a folksonomy — a particular one, because people use specific tags to mark some resources, some data. But this is a three-dimensional case, and a lot can be done in the two-dimensional case. So here we can find the so-called tri-communities, that is, users that share the same resources and mark them by the same subset of tags; they are the so-called tri-communities, or tri-clusters. But the question that we asked was: what is a good approximation of tri-concepts, that is, maximal triadic rectangles, in such data? We answered this question in our paper published in the Machine Learning journal already several years ago, but in the 2D case — we are going back to the 2D case now — there are many things to do concerning the performance of similar techniques. And we use formal concept analysis as a language, which is a rather simple language, as we will see: we have a set of objects, a set of attributes, a binary relation on them, and two operators called derivation operators, or Galois operators, saying, given a set of objects, what are their common attributes, and, vice versa, given a set of attributes, what are the objects that share them all. A concept is just a unit of thought, as in philosophy, but here it's more formal: it has two parts, an extent and an intent; all objects from A share all the attributes from B, and, vice versa, B is exactly the set of attributes common to all objects from A. These concepts are hierarchically ordered. Let's have a look at this small world of geometric figures: an equilateral triangle, a right triangle, a rectangle and a square, and let us consider just 4 properties: has exactly 3 vertices, has 4 vertices, has a right angle, and is equilateral. We can extract all the formal concepts, which are just maximal rectangles in our data, and hierarchically order them by set inclusion of the first component. For example, the concept ({2,3,4}, {c}) is just the concept of right-angled figures; it is more general than the concept ({3,4}, {b,c}), which is just the concept of rectangles. This is the tool to analyze real data: for example, we took all the publications on formal concept analysis, performed term extraction — or maybe we used keywords — and combined them into taxonomic terms, a sort of topics. So at the top of such a lattice diagram we have about 1000 papers devoted to formal concept analysis, and they are split into sub-topics, or sub-communities of authors, that write papers on formal concept analysis but also on a specific sub-topic like software engineering. These concepts may intersect — the key property of lattices, intersections and suprema; here it's a bit more complicated. You can also analyze real website users: those are the visitors of HSE university websites, and they also read some other news, like, I don't know, RIA news, or Cosmopolitan, or expert.ru. In the triadic case, as a source of this data we can use, for example, BibSonomy, a German project where the authors can share their papers and tag them, so we have triadic data. But those were the motivating examples; let's talk about biclustering. A bicluster is an approximate version of a concept — the term was coined by Boris Mirkin, but definitely biclusters were analyzed before, by Hartigan, for example — and biclustering refers to the simultaneous clustering of both objects and attributes. In genetics, for example, the objects are genes and the attributes are some tissues or conditions under which specific genes can be co-expressed.
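To make the derivation operators tangible, here is a small Python sketch of the toy context of geometric figures described above (the object and attribute codes are my own labelling of that example: 1 = equilateral triangle, 2 = right triangle, 3 = rectangle, 4 = square; a = 3 vertices, b = 4 vertices, c = right angle, d = equilateral); it checks that ({2,3,4}, {c}) is indeed a formal concept, i.e. a maximal rectangle.

```python
context = {
    1: {"a", "d"},       # equilateral triangle: 3 vertices, equilateral
    2: {"a", "c"},       # right triangle: 3 vertices, right angle
    3: {"b", "c"},       # rectangle: 4 vertices, right angle
    4: {"b", "c", "d"},  # square: 4 vertices, right angle, equilateral
}

def common_attributes(objs):      # A': attributes shared by all objects in A
    sets = [context[g] for g in objs]
    return set.intersection(*sets) if sets else {a for s in context.values() for a in s}

def common_objects(attrs):        # B': objects having all attributes in B
    return {g for g, s in context.items() if attrs <= s}

A, B = {2, 3, 4}, {"c"}
# (A, B) is a formal concept iff A' = B and B' = A
print(common_attributes(A) == B and common_objects(B) == A)   # True
```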
So, this is actually a matrix where each cell is a number, from 0 up to 30 000 or so, and here we can see that some genes are grouped because of their co-expression: the very large values are in red, and if we consider the tissues which are, how to say, suspected to contain malignant tissue, they can be the source of cancer, or biomarkers of cancer. The formal definition of a bicluster is just a sub-matrix of an input matrix, like a gene expression matrix, a real-valued one; we will consider only binary ones. And it turns out that in bioinformatics they rediscovered formal concepts as inclusion-maximal biclusters — there are theorems saying that. But we propose something which is a relaxation of this rigid notion where all the objects should share all the properties: we consider just one pair, an object and an attribute, maybe a gene and some condition, and we can apply the prime operators, saying that m-prime is just the set of objects that share this attribute, and, vice versa, g-prime is the set of all attributes which describe the object g. Then we can intersect such a rectangle with the incidence matrix and count the share of non-empty cells, and this is exactly the density of such a bicluster. Here is an example, a geometric one: the (g, m) cell is here in the center, and these stripes, the gray stripes, are full of ones. We can find the first primes, so to speak — they shape this bounding box — but we can also find the second primes, and they form this green, dense cross full of cells; there might be some other non-empty cells, but they do not form such a cross-like structure, so these black cells also belong to the bicluster. Actually, we may think of biclusters as sub-lattices in the whole lattice of formal concepts, of maximal rectangles. They have some properties: the density lies in the interval from 0 to 1 — actually, in reality the zero is unreachable, the density is always a bit larger, for a non-empty binary relation — and they also can be hierarchically ordered. But we cannot devise efficient data-mining algorithms like the Apriori algorithm for finding association rules or frequent itemsets, because the relation "one is a sub-bicluster of the other" is neither monotonic nor anti-monotonic with respect to the density constraint. But we can set the density threshold to 0 and consider all the biclusters, and then the following fruitful property holds: every concept, that is, every dense maximal rectangle, is contained in some bicluster. In terms of computational time, there are some propositions which say that the total gain is polynomial versus exponential in the worst case for finding all the maximal rectangles, or formal concepts: this L might be exponential in terms of the size of G and M, the numbers of objects and attributes, where L is just the size of the lattice. Here is also an example from the past: we analyzed a Yahoo dataset with 2000 companies and 3000 advertising terms, and we applied object-attribute biclustering for recommendation purposes. If we generate concepts with different constraints on the A and B components, that is, on the number of firms that advertise their goods and the number of keywords on Yahoo, then for 0 and 0 thresholds we obtain about 9 million concepts; this is infeasible for manual analysis, for example, and that's why biclustering was one of the means to reduce this number. When we vary the minimal density threshold, we can obtain a number of patterns which is suitable for manual analysis, and we can interpret the found biclusters as markets.
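A minimal sketch of this object-attribute bicluster construction and its density follows, on a tiny made-up binary relation (the object and attribute names are purely illustrative):

```python
# Incidence relation I as a set of (object, attribute) pairs.
I = {("g1", "m1"), ("g1", "m2"), ("g2", "m1"), ("g2", "m3"), ("g3", "m1")}

def obj_prime(g):                 # g': attributes describing object g
    return {m for (h, m) in I if h == g}

def attr_prime(m):                # m': objects sharing attribute m
    return {h for (h, n) in I if n == m}

def bicluster_density(g, m):
    # The bicluster induced by the pair (g, m) is (m', g');
    # its density is the share of non-empty cells in the rectangle m' x g'.
    objs, attrs = attr_prime(m), obj_prime(g)
    filled = sum(1 for h in objs for n in attrs if (h, n) in I)
    return filled / (len(objs) * len(attrs))

print(bicluster_density("g1", "m1"))   # 4 filled cells out of 6 -> ~0.667
```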
We do not show the first component of the biclusters — they are just IDs, which we do not get from Yahoo — but we have the real keywords: affordable hosting, web hosting, web, and so on; this is about the hosting market. Something about the hotel market: the pattern here is very simple, the name of the city and the word "hotel". We also applied object-attribute biclustering to scaled data known in machine learning from the UCI machine learning repository. Here are just some statistics, some figures: the number of concepts versus the size of the relation in terms of non-empty pairs, and we can also see how the number of concepts relates to the number of object-attribute biclusters found; there is a drastic reduction, up to several tens of times. Now, SNA-related examples. Probably one of the most famous, but small, datasets in community detection is the so-called Southern Women dataset: about 18 women in the southern part of the United States, I believe; anthropologists asked them several questions about what kinds of activities they share together, like going to church, or to a dinner, or to some party. A cross means that they participated in some event, so it results in a bipartite graph, and we applied biclusters here and compared them with cliques as communities. You can see that biclusters can sometimes capture larger groups than cliques, due to their sparseness. Here is another example, about the karate club that split into two parts after the conflict between — I don't know the right word — the coach, the master, and the president of the club: some of the club members decided to stay with their coach, the master, and the other part decided to stay with the president. There are some key persons here, like the president and this master, and subgroups of people that were identified with cliques and with biclusters; here I can say that biclusters are better, but they also captured similar structures. For three-mode data — yeah, okay — for our three-mode data we have authors, papers and tags, and we can analyze this in a similar manner. The more focused result, which is maybe not as interesting as the real examples, is how to compute the density of such a bicluster efficiently. We were trying to use an epsilon-delta approximation here, with the Chernoff-Hoeffding inequality, allowing us to use only some fraction of the points to estimate the density of a bicluster. Unfortunately, this function, the number of points that we should test, is unbounded at the most desirable values, like the probability threshold delta equal to 1 and the accuracy epsilon equal to 0, so we can at least test it in a realistic scenario. Here is a small example: for a rather sparse bicluster, we can compute the density of this gray cross almost immediately by this formula, and with the formula for n as a function of epsilon and delta, this probability threshold, we obtain that we need to test 10 samples. We test these 10 samples and find out that among them there are only 3 non-empty cells, and that's why the coefficient here is 0.3, and this is the size of the non-gray areas together. We tested the approach on three datasets — Southern Women, Zoo and internet advertising — and the greater the density of the dataset, the greater the accuracy; in the white cells here you can see for which dataset the theory agrees with the experiment — actually, the Zoo dataset; this is for internet advertising; and for Southern Women it was not very stable. And here is also an example from genetics, where we studied ischemic-stroke individuals versus non-ischemic-stroke individuals, with their attributes known as single nucleotide polymorphisms being the attributes; we found biclusters.
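For the epsilon-delta density estimation mentioned above, here is a hedged sketch: sample random cells of the bicluster rectangle and take the share of non-empty ones as the estimate. The sample size below uses the standard Hoeffding bound n >= ln(2/delta) / (2 * epsilon^2); the exact formula and parameter values used in the talk may differ.

```python
import math
import random

def sample_size(eps, delta):
    # Standard Hoeffding bound for estimating a proportion within eps
    # with probability at least 1 - delta.
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))

def estimate_density(I, objs, attrs, eps=0.2, delta=0.1, rng=random):
    n = sample_size(eps, delta)
    objs, attrs = sorted(objs), sorted(attrs)
    hits = sum((rng.choice(objs), rng.choice(attrs)) in I for _ in range(n))
    return hits / n

# Tiny illustrative relation, not data from the talk.
I = {("g1", "m1"), ("g2", "m1"), ("g3", "m1"), ("g1", "m2")}
print(sample_size(0.2, 0.1))                                  # 38 sampled cells
print(estimate_density(I, {"g1", "g2", "g3"}, {"m1", "m2"}))  # approximate density
```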
We counted their density, size and purity, that is, the number of individuals which should not be included, with the aim of finding single nucleotide polymorphisms describing non-healthy individuals, those at risk of having an ischemic stroke. Later on, these SNP descriptors are decoded to their identifiers and geneticists analyze them, and we are in the process of publishing a paper together with them, where the findings are discussed with experts in terms relevant to genetics. So, I skip everything which is left; it's just the things which were done and can be done, and I'm ready to answer your questions. Thank you. Do you have any questions? Yes, please. What is the complexity of the algorithm? So, the complexity is linear, quadratic... Actually, it's linear in each of the input parameters, that is, the number of non-empty pairs belonging to the input relation, the number of objects and the number of attributes. But for maximal rectangles, or formal concepts, it's not linear: it looks polynomial, but in fact it's not. In case you have a very simple object-attribute matrix, say of size 4 by 4, where all the crosses are present except the main diagonal — am I right? that seems so — then this is G, this is M, and let it be n, that is 4; then you will have 2 to the power n patterns. So here the complexity is in terms of the input and also in terms of the output, and in the worst case we will have even exponential complexity. But we managed to use approximate patterns: we lost some information, but we can use it with real data, with rather large data at least. Thank you. Any more questions? If not, let's thank the speaker again. This was the last talk of the session, and now I invite you all to go to the main hall, because there is going to be a closing ceremony, and we're going to announce the best paper in this track as well. So thank you all, and let's go there. Thank you.
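As a small addendum to the complexity discussion in the Q&A above: the 4-by-4 example raised there (all crosses except the main diagonal, the so-called contranominal scale) can be checked by brute force. The naive enumeration below is only meant to illustrate the 2^n blow-up in the number of formal concepts, not to be an efficient concept-mining algorithm.

```python
from itertools import combinations

n = 4
objects = range(n)
attributes = range(n)
# Contranominal scale: object g has attribute m whenever g != m.
incidence = {(g, m) for g in objects for m in attributes if g != m}

def attrs_of(objs):
    return {m for m in attributes if all((g, m) in incidence for g in objs)}

def objs_of(attrs):
    return {g for g in objects if all((g, m) in incidence for m in attrs)}

concepts = set()
for r in range(n + 1):
    for subset in combinations(objects, r):
        B = attrs_of(set(subset))
        concepts.add((frozenset(objs_of(B)), frozenset(B)))

print(len(concepts))   # 16 == 2 ** 4
```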