We have him here on stage to talk about keyword spotting. For example, he was part of the IMPACT project, where he contributed tremendous improvements to OCR, and he was also part of tranScriptorium, which is where READ and Transkribus were conceived. So it's great to have you, one of the main experts in the field, here. Thanks.

Thank you very much. Thank you very much for inviting me to talk to you about this hidden component. We are a member of the READ consortium, and I'm going to talk to you about this component as "hidden" because it is not yet integrated into Transkribus. Transkribus, which you use, is a great tool that you can try and give feedback about. Unfortunately, for this component you are not going to be able to give feedback yet, because it is still not incorporated, not integrated. But I will give you some hints about the value of having such a tool.

As the term implies, word spotting is the action of identifying words in a document or a collection when you query, usually by a string; you all have this experience. What I'm going to talk to you about is query by word image: taking an example image which contains a word and trying to retrieve similar places in the collection, in other documents, which relate to this query. This talk will be a trade-off between a technical description and the basic ideas; I don't think the very technical details are so important here.

When we deal with this problem, we can either segment a document into words, which is a difficult task, or leave it unsegmented. So we have segmentation-based and segmentation-free methodologies. The one that I'm going to describe is a segmentation-free methodology, which supports something that goes beyond typical keyword spotting, since it can retrieve not only single words, but parts of words, maybe complete phrases, and also graphical components, like logos, which sometimes identify the origin of a document.

As for the difference between the two families: on the one hand, segmentation-based methodologies apply when we have simple layouts and noise-free documents, and there the results are positive; but with complex layouts it is not evident, in the current state of the art, that they can be used for big data, for big collections. That approach cannot cope with degraded documents, and it can match only words; it cannot go beyond the word, as I mentioned before. On the other hand, segmentation-free methodologies rely on a local representation; this is the key idea that we apply in order to tackle the word spotting problem. They can be used with complex layouts, since no segmentation is required, and, as I mentioned before, they can match parts of words, phrases and symbols. Also, as I'm going to describe today, by choosing particular structures for indexing you can cope with the vast memory and computational power requirements of this approach. Additionally, I would like to mention that this approach needs no training: we don't need transcribed images, we don't need any user feedback, we don't need any user effort. As you are going to see, there is an unsupervised training step, but it does not require feedback; it is applied automatically and does not involve the user.
A general picture of the architecture used for the approach taken in the READ project is this one. The general architecture comprises two stages. In the offline stage we do the indexing: we take the collections that we are going to query, and we index the information into two main types of storage: an in-memory storage, which permits very fast access for big data, and another storage which uses hash structures, also in order to facilitate this fast retrieval.

The basic idea is to use local information in order to model the different words. You see words which can be written in different ways, and we have certain key points, as we call them, which identify similar points in the same structure even when it is written differently. So we have two steps: one is the key point detection, and for each key point that is chosen we compute a particular descriptor, as we call it. The key point detection can be seen in different stages; just to guide you visually: we work on the gradient orientation, which is quantized, and from all the connected components related to the quantized levels we define the different key points by taking the centre of gravity of each connected component. For each key point we use different windows depending on the scale; there is an automatic scale selection based on a function which is optimized, and we build a vector of values which represents this particular region. The region around the key point is divided into 16 cells, as in the picture, and for each cell we have a particular vector which represents the information inside the cell.

What is important is to index this information so as to make retrieval very fast, in particular when you have very big data, which was a basic requirement in the READ project; you need very efficient strategies for that. For this purpose we have an approach which is based on a well-known principle in computer vision called bag of words, for which you build a visual dictionary. You can imagine a visual dictionary as a collection of particular visual words; these are not related to the usual words you have in mind, but visual words which are characteristic of particular key points. We expect that a set of those visual words can be combined and give us the words in question. That's why, in the offline stage, we produce clusters of similar descriptors, which are the visual words; here, for a particular key point, we have rectangles, each of which is related to a particular cluster, a particular visual word. Having done that, we can use a hash function, like the one that you see here: each coefficient w, with index 1, 2, 3, 4, relates to the visual word chosen for this particular rectangle. In this way we provide a particular ID for each key point, and we obtain an inverted structure. Each descriptor is stored with this type of information: x and y, which are the coordinates in the document; the doc ID of the document in the collection we have indexed; and the particular descriptor itself, which will be used to compute similarities during the query process.
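To make the indexing scheme just described more concrete, here is a minimal sketch of a bag-of-visual-words dictionary and a hash-based inverted index. The dictionary size, the four-rectangle hashing (following the coefficients w1..w4 mentioned above) and all names are illustrative assumptions, not the project's actual implementation.

```python
# Minimal sketch of the offline indexing stage: a bag-of-visual-words
# dictionary plus a hash-based inverted index. All names and parameters
# are illustrative; this is not the project's actual implementation.
import numpy as np
from sklearn.cluster import KMeans

K = 64  # assumed size of the visual dictionary (number of clusters)

def build_visual_dictionary(descriptors: np.ndarray) -> KMeans:
    """Cluster local descriptors; each cluster centre is one 'visual word'."""
    return KMeans(n_clusters=K, n_init=10, random_state=0).fit(descriptors)

def hash_id(dictionary: KMeans, cell_descriptors: np.ndarray) -> int:
    """Map the 4 sub-rectangles of a key-point region to visual words
    w1..w4 and combine them into a single integer ID (a perfect hash
    over the K**4 possible combinations)."""
    w = dictionary.predict(cell_descriptors)  # 4 visual-word indices
    return int(w[0] + w[1] * K + w[2] * K**2 + w[3] * K**3)

# Inverted index: hash ID -> list of occurrences (x, y, doc_id, descriptor),
# so all key points sharing a visual-word signature are fetched at once.
index: dict[int, list] = {}

def add_keypoint(dictionary, x, y, doc_id, cell_descriptors, full_descriptor):
    index.setdefault(hash_id(dictionary, cell_descriptors), []).append(
        (x, y, doc_id, full_descriptor))
```

The point of the combined hash ID is that all key points sharing the same four visual words land in the same bucket, so candidate matches for a query can be retrieved in near-constant time instead of scanning the whole collection.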
Also, we have a special hash structure, built with a special hash function, in order to model not only the central key point of the word (we are going to see in the sequel what this means) but also the spatial context. So imagine that we have a query image; for this query image we take the central key point, and for this key point we find all the relevant key points in our collection. Such an example can be seen here. For these identified key points, we examine the neighbourhood, similar to the neighbourhood of the query: for example, here we have a central point and the neighbouring key points, so we have a spatial context. This spatial context is compared, for similarity purposes, with the context that arises around the candidate key points, in order to find the bounding boxes of the particular words which exist in the collections. In that way, and I hope this is reasonably clear (I'm going to give examples in the demo later of how this works), we end up with the retrieved words for that query word. So I repeat: my query really is a word image, and I get the positions where this word image exists in the collection of images.

For this purpose we also have a REST API readily available, which can be used to connect by someone who is going to use this from a backend or a program. There is also a demonstrator, which we plan to integrate into Transkribus; you can already use it at this address, but please don't go there before I demonstrate it to you. You can all use it simultaneously; that is not really a problem.

Just to give you some idea of the results: we quantify the performance of this approach on different datasets with this number of queries. The datasets look like this type of handwriting, to give you an idea of the type of documents and the problems that we are addressing. Here we have the performance of this approach for the different collections, and also the query time that is required for querying these datasets. The very important thing, in order to address big datasets, is that, as the order of magnitude grows, we don't have a linear increase; we arrive at addressing 50,000 documents with an increase of just 0.5 seconds.
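As a rough illustration of the query stage just outlined, the sketch below looks up candidates through the hash index and then verifies each one by comparing spatial contexts. The context radius, the thresholds and the similarity measure are my assumptions for the example, not the project's actual choices.

```python
# Sketch of the query stage, assuming the inverted `index` built above.
# Radius, tolerance and score threshold are illustrative assumptions.
import numpy as np

def context(points, cx, cy, radius=50.0):
    """Offsets of the key points lying within `radius` of (cx, cy)."""
    pts = np.asarray(points, dtype=float)
    d = np.linalg.norm(pts - (cx, cy), axis=1)
    return pts[(d > 0) & (d < radius)] - (cx, cy)

def context_similarity(ctx_q, ctx_c, tol=5.0):
    """Fraction of query offsets with a close counterpart in the candidate."""
    if len(ctx_q) == 0 or len(ctx_c) == 0:
        return 0.0
    d = np.linalg.norm(ctx_q[:, None, :] - ctx_c[None, :, :], axis=2)
    return float(np.mean(d.min(axis=1) < tol))

def query(index, keypoints_by_doc, query_hash_id, query_ctx, min_score=0.6):
    """Candidates share the query's hash ID; keep those whose neighbourhood
    of key points matches the query's spatial context."""
    hits = []
    for (x, y, doc_id, _descriptor) in index.get(query_hash_id, []):
        score = context_similarity(
            query_ctx, context(keypoints_by_doc[doc_id], x, y))
        if score >= min_score:
            hits.append((doc_id, x, y, score))
    return sorted(hits, key=lambda h: -h[3])
```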
So now I would like to change the mode in order to show you the demo. Here we are. By the way, here we have a collection from one of the archives; you see some examples. For example, here we have a logo, which is a very distinctive logo; sometimes we may want to retrieve all similar documents having this particular logo, which means that we select it and get the similar logos identified in other documents. But sometimes we would also like to make queries which relate to partial phrases. For example, I would like to see which documents contain this part, and we get a response with the relevant words. Also, to come closer to historical documents: here, I think, is a typical example. This is a protocol, and we would like to retrieve all documents which are related to it because of the appearance of this word. We select the word, we search, and then we get as a response the similar documents in which this word appears. Another case which appears as a challenge, because of the tables: the library of Passau gave us very challenging, let's say, manuscripts. Here we have one table which is very interesting to index, so as to retrieve related information; it is very interesting if we can identify thematic information related to particular columns. For example, I would like to retrieve tables which contain this particular type of column, and you see that we have retrieved different other tables respectively. Regarding intensity variation: you see here that we have different intensity variation profiles, which are not easy to deal with. And there are many, many more ideas for use cases, because this tool permits selecting a component, and you can view a word as a component, which means we can handle explicit abbreviations, for example additive abbreviations; you can extend your imagination in order to retrieve information from vast collections of data. Thank you very much. Any questions or queries?

Question: Is it possible, because I don't see the right one here, is it possible to upload a collection? Can everybody feed his own collection into it, since it is connected to Transkribus? Answer: Not yet, it is not yet connected to that, but there is a pipeline for uploading a collection; the collection will be indexed, and then we can query this collection.

Question: Would you get a better result if you preprocessed the images to high-contrast black and white? Answer: Perhaps, but currently we don't see this problem with the material that we have already tested; maybe for other material that could be a good direction to follow.

Question: That was incredibly impressive, how you mentioned the use of symbols and marks, and you demonstrated the principle with symbols. How responsive is your system to variation in a particular mark? I'll give you a use case: a mark could be very formal, but it could also be distorted; say an anchor, which could be displayed in different ways. So I am interested in marks which have some underlying consistency but can show a lot of variety. Answer: The variety is related to different transformations. Currently this approach is not rotation invariant, but that can be done; we should talk further, this is not a big problem.

We have time for one more question.

Question: Thank you. I wanted to know if you know what the algorithm is missing. Did you ever put it on trial to see what would be found and what the loops are missing? It's great that you find something, but do you know what you do not find, whether there are other words? Answer: First of all, there are two things here. Part of the accuracy trade-off comes from the need to access big data, and because of that we step back a little on accuracy. If we set aside access to big data and focus on accuracy, we think that we should work a bit more on the scale component, to improve the scale invariance. I think this is the most relevant issue: when we have a word at a very small scale, and then a very, very big word, there we have to work. Thanks a lot, and thanks again. Thank you very much.

Our next speaker is from the Technical University of Valencia; he is in the team of Enrique Vidal and he is one of the main experts regarding keyword spotting. They have already carried out several different projects on keyword spotting, and he will demonstrate and also take you through the technical side.

I will talk about the indexing and searching of really large manuscript collections, as carried out in the READ project. This approach is intended to make searchable really large collections, I mean on the order of one million page images, which is beyond what most keyword spotting systems handle, and of course we want to make this searchable in reasonable time. This is more or less the content of this talk: we talk about the motivation, we talk a bit about the probabilistic indexes we build to make collections searchable, and we talk a bit about the performance measures and the project collections; at the end there is a demonstration.

As everybody knows, there are massive handwritten collections held by cultural institutions around the world, and most of these documents lack transcripts; nobody has hand-transcribed these documents. If we had transcripts for such documents, we could use them to build indexes and make these collections searchable. Also, everybody knows that manual or interactive transcription of such material is an expensive and unaffordable task: we need experts in the topic who can read this kind of lettering. Alternatively, we could apply a fully automatic transcription system; but if we use automatic transcription to obtain transcripts of such material, it is not error-free, so it is not useful for search purposes. The idea here is to build some kind of probabilistic index, and I will talk about how we can use such an index.

Here you can see a piece of a page image; this is taken from the Bentham collection. We build this spatial probabilistic map for a word, in this case "matter". You can see the peaks here: there are four different peaks, corresponding to each of the possible locations of the word "matter" in this image. Also look here: this is the word "matters", which is different from "matter", so the system is working correctly here; the probability of "matter" appearing at this position is very low. This is more or less the idea of how we obtain this probabilistic map; not this concrete one, to be precise, but we can say it comes from a sophisticated word classifier which takes into account the context around each word. For example, here the word "matter" gets a very low probability because it is next to an "s"; taking account of this context, the classifier can decide that "matter" is not likely to appear here.

We could use this probabilistic map to search directly, but the problem with that is that searching in this high-dimensional map takes a lot of time, and we also have the problem of storing this map on disk, which takes a lot of space. So, in order to solve this problem, we compute from this probabilistic map the relevance probabilities for words, and with them we directly build a probabilistic index like this one. Here we have the same piece of the image, and here is the index we build from it: we have all the words here, this is the relevance probability, which indicates whether or not the word appears in the image with this value, and we also have the position of each word in the image.
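To illustrate the structure just described, here is a toy probabilistic index for one page and a threshold-based lookup over it; all entries and numbers are invented for the example.

```python
# Toy probabilistic index for one page: each entry is a word hypothesis,
# its relevance probability, and its position. Values are invented.
page_index = [
    # (word, relevance probability, bounding box: x, y, width, height)
    ("matters", 0.99, (305, 512, 102, 30)),
    ("matter",  0.04, (305, 512,  88, 30)),  # same spot, rightly unlikely
    ("matter",  0.93, (120, 410,  88, 30)),
]

def search(index, query_word, threshold):
    """Return the spots of `query_word` whose relevance probability passes
    the threshold; raising it trades recall for precision."""
    return [(p, box) for word, p, box in index
            if word == query_word and p >= threshold]

print(search(page_index, "matter", 0.5))  # -> [(0.93, (120, 410, 88, 30))]
```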
Look here, for example: the word "matters" appears at this position with very high probability, practically 0.99, while "matter" at the same position has a very low probability, as it should. We can also remark that in this index there are a lot of entries that don't make any sense as words; the idea here is that we can index everything. Actually, we don't call these "words": this is a lexicon-free indexing tool, in which we don't apply any dictionary, and in this way we solve the problem of out-of-vocabulary words. So this index is expected to contain all the words likely to appear in the image.

How can we compute this probabilistic index? As I commented before, we use a sophisticated word classifier, which is obtained from a convolutional and recurrent neural network. As Gundram explained yesterday, we depart from a text line image and obtain a confidence matrix; to this confidence matrix we then apply some techniques based on transducers, to incorporate language models which come from external corpora, and from this technology we obtain the final probabilistic maps. This is more or less the whole process: we depart from the raw images, we apply the probabilistic indexing we explained previously, we obtain the corresponding index for each of the pages, then from these indexes we extract information to fill a database, and finally, for keyword search, the user searches over the images and the database and can find the spots in the full collection.

Now, talking a bit about how we evaluate the system: indexing and search quality can be assessed in this case by precision and recall. As everybody knows, precision is high when most of the retrieved results are correct, while recall is high when most of the correct results are retrieved. Of course, if we had a perfect transcription, recall and precision would be at this point, 100%. On the other hand, if we use an automatic transcription, which is not error-free, we will obtain, for example, this point, with lower precision and lower recall. In contrast, using the probabilistic index, what we actually obtain is many different points, which form a recall-precision curve; each of these points corresponds to some specific threshold which the user can set in order to make the search. So, for example, the user can decide to get more precision: go up, increase the threshold, and obtain more correct samples, but probably lose recall. If we want to push recall, to get the maximum recall possible, we decrease the threshold, of course at the price of more false positives.

In the process of obtaining the indexes of the collections I will show later, we use the Transkribus tools, in this case mainly for three different tasks: first, text line detection, as well as training with the HTR engine implemented by Rostock; then, in order to train the network, we use the page image and text alignment; and of course we also use Transkribus to transcribe some pages in order to have more training samples.
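Coming back to the evaluation just described: since each threshold yields one recall-precision point, the curve and its area can be traced as in the sketch below. The scores and relevance labels are invented, and I assume the reported average precision is the standard retrieval measure.

```python
# Tracing the recall-precision curve: sort retrieved spots by score; each
# prefix of the ranking corresponds to one threshold. Data is invented.
def recall_precision_curve(results):
    """results: list of (score, is_relevant) pairs for retrieved spots."""
    ranked = sorted(results, key=lambda r: -r[0])
    total_relevant = sum(rel for _, rel in ranked)
    curve, tp = [], 0
    for i, (_score, rel) in enumerate(ranked, start=1):
        tp += rel
        curve.append((tp / total_relevant, tp / i))  # (recall, precision)
    return curve

def average_precision(results):
    """Mean of the precisions at the ranks of the relevant results:
    an estimate of the area under the recall-precision curve."""
    ranked = sorted(results, key=lambda r: -r[0])
    tp, precisions = 0, []
    for i, (_score, rel) in enumerate(ranked, start=1):
        if rel:
            tp += 1
            precisions.append(tp / i)
    return sum(precisions) / len(precisions) if precisions else 0.0

spots = [(0.99, 1), (0.93, 1), (0.70, 0), (0.40, 1), (0.10, 0)]
print(recall_precision_curve(spots))
print(average_precision(spots))  # -> ~0.917
```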
These are the big collections we have indexed so far. The first one is the Chancery collection; it is a collection from the 14th and 15th centuries, from the National Archive of France, with more than 70,000 pages. The other one is Bentham, which will be the topic of the talk after mine. And the last collection is the Golden Age of Spanish Theatre; it is from the 16th and 17th centuries, about 50,000 pages of manuscript.

These are the recall-precision curves we prepared. Of course, we conducted these experiments on a very small subset of each collection, because we don't have ground truth for all the pages. This is the training and test partition for Chancery, Bentham and TSO, using the language model; this is the number of queries we tested in each of the collections; and these are, more or less, the obtained recall-precision curves. This is the average precision, which means the area under the recall-precision curve; it is more or less similar across the collections.

So this is the first indexing and search demonstration, the Chancery one; you can access and test it at this address. As I said, the number of indexed pages is around 76,000; the number of entries in the index is about 266,000,000 spots. And this is something interesting: using the probabilities, we can compute the expected number of running words in the whole dataset, in this case about 44,000,000 words. This is the Bentham collection; it is the biggest collection so far, about 95,000 pages, of which only 89,000 pages were indexed, because the rest were empty pages or printed text pages, which are quite different from the dataset; this is the number of spots, and the running words in this case are 25,000,000. And this is the last one, the indexing of the Golden Age of Spanish Theatre, with this number of pages; the running words in this case are 5,000,000.

Now I will show you the demonstration; everything is working. For example, we can look for the word "Austria". The interface shows you all the books which contain some page where the word "Austria" appears. We can go inside one of them; here appear two pages which have this word, and this bar shows the confidence of the word found here. We can look here: here we have the word "Austria". What is also interesting to see is that we can click on every part of the image, and we obtain, for example, different alternatives for each of the words; as you know, this is a lexicon-free system, so it also gives, for example, other possible alternatives for the words written in this area. This can be a good tool for finding out what exactly is written, or for getting some idea of what might be written in some areas of the document.

I also want to show that it is possible here to make a combination of queries using boolean expressions, for example OR and AND, and also to search directly for character strings if you enclose them in square brackets. We also have other kinds of demonstrators in preparation. In one of them we are indexing another kind of collection, not HTR but, for example, music manuscripts, like this one; I don't know much about music, but if you open one of the pages, you can click on each of the parts and it gives you the music symbol found at each position. The interesting thing here is that for this demo it doesn't make much sense to look for specific symbols; the really interesting thing is to look for sequences of symbols. That's all.
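One simple way to support such boolean queries over the relevance probabilities, reusing the toy page index sketched earlier, is to combine them under an independence assumption; this is my assumption for illustration, not necessarily how the actual demonstrator combines scores.

```python
# Boolean combination of relevance probabilities at page level, under an
# independence assumption. `page_index` is the toy index sketched earlier.
def p_word(page_index, word):
    """Probability that `word` occurs somewhere on the page:
    a noisy-OR over all of its spots."""
    p_absent = 1.0
    for w, p, _box in page_index:
        if w == word:
            p_absent *= 1.0 - p
    return 1.0 - p_absent

def p_and(page_index, words):
    """All query words occur on the page (product of probabilities)."""
    result = 1.0
    for w in words:
        result *= p_word(page_index, w)
    return result

def p_or(page_index, words):
    """At least one query word occurs on the page (noisy-OR)."""
    p_none = 1.0
    for w in words:
        p_none *= 1.0 - p_word(page_index, w)
    return 1.0 - p_none
```

Pages can then be ranked by `p_and` or `p_or` for the query, exactly as single-word searches are ranked by their relevance probabilities.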
Question: Is it possible to upload a private collection and have the index made from it? Because it is not connected with Transkribus? Answer: No, not for now; I think we are planning to integrate this at the beginning of next year.

Maybe you need to say something about how you produce the index and what the user's part is in that regard. So in this case you usually prepare the HTR and then you do some post-processing; maybe you could briefly talk about what steps are to be taken if one envisions using your tool. Answer: OK. As I mentioned before, we apply the tools available in Transkribus; the layout analysis and segmentation are expected to be done with the tools available there. Then we apply some preprocessing: this is image preprocessing that corrects the slant and the skew and, if I remember correctly, applies some filtering to the image, and then we train the neural network. Of course, we apply some data augmentation, in order to train with possible distortions, and of course we also apply regularization to the network.

Question: You said that you also need the language model in order to process it. So will the user build the language model, or will the model be built automatically? Answer: Yes, the language model can be built automatically: usually it can be built directly from the training samples, from the ground truth, or you can directly use external data, for example an English corpus if you are going to recognize English sentences, or another language; there are tools available for that.

Question: I have a question about the training material you are using for the three series you named, so Chancery, Bentham and the Spanish Theatre: how many pages did you need before you could see results here? Answer: Of course. In Chancery we used, if I remember, 300 pages to train the model; in Bentham we used 800 pages, more or less; in TSO, in this case, we used 286 pages, which in comparison with the remaining pages is practically nothing.
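Returning to the language-model answer above: as a toy illustration of building a model automatically from the training transcripts, here is a word-bigram count model. Real systems use proper toolkits (in the style of SRILM or KenLM) with smoothing, which this sketch omits.

```python
# Toy word-bigram language model estimated from training transcripts.
# No smoothing: unseen bigrams get probability 0, unlike real toolkits.
from collections import Counter, defaultdict

def bigram_lm(transcripts):
    counts, totals = defaultdict(Counter), Counter()
    for line in transcripts:
        words = ["<s>"] + line.split() + ["</s>"]
        for a, b in zip(words, words[1:]):
            counts[a][b] += 1   # count the bigram (a, b)
            totals[a] += 1      # count the history a
    # P(b | a) by relative frequency
    return lambda a, b: counts[a][b] / totals[a] if totals[a] else 0.0

p = bigram_lm(["the matter rests", "the matter of fact"])
print(p("the", "matter"))  # -> 1.0 in this toy corpus
```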