Hello everyone! In this talk, we present the bootstrapping of an enterprise knowledge graph with Wikidata, and specifically how to constitute a nucleus focused on business domains of interest. To enrich a knowledge graph, we can perform knowledge extraction from texts and tables. However, this method requires a supporting knowledge graph that contains business terms and relations to focus on. This is a virtuous loop between the enterprise knowledge graph and the knowledge extraction step. But we face a cold start problem: initially, the enterprise knowledge graph is empty, which prevents knowledge extraction approaches. So, to cope with this problem and to provide initial business terms and relations to focus on for later knowledge extraction steps, we can leverage existing generic knowledge graphs to bootstrap it. Such bootstrapping could be manual, but this is a tedious process. That's why we propose an automatic approach relying on Wikidata and a set of business terms of interest.

From this internal source, business terms are semi-manually aligned with Wikidata entities, each identified by a QID. Then, from these entities, we perform an expansion along their ontology hierarchy by retrieving the classes that will be provided to the knowledge graph. As a concrete example, we align a business term with its Wikidata entity, then retrieve its direct classes (for instance, programming paradigm), then the superclasses of this class (for instance, paradigm), and so on. So the global expansion strategy consists of starting from the direct classes of the business term (in yellow on the slide), and retrieving all superclasses, all subclasses, and the instances of the deepest subclasses.

From more than 800 starting business terms, such a strategy leads us to retrieve far too many subclasses and instances, many of which deviate from the enterprise vocabulary: for example, from Linux, we reach galaxy, human, star, and so on. Thus, we must prune these irrelevant classes before integrating them into the enterprise knowledge graph. We propose to prune these classes with relative and absolute thresholds based on node degree and on distance in an embedding space.

A knowledge graph embedding is a representation of the graph in an n-dimensional vector space. There are many types of models; probably the simplest is the translational model, where each relation between the entities of the graph is represented by a translation in the vector space. Thus, the knowledge graph embedding preserves the properties of the graph, and we want to leverage the embeddings for a distance-based pruning approach.

So the expansion algorithm with pruning takes as input a set of seed QIDs. For each QID in this set, we retrieve its direct classes. Then, for each class, we check whether it respects three different thresholds. First, we check the absolute degree threshold, a fixed threshold not to be exceeded, which avoids exploring classes that are too generic. Then we check the relative degree threshold, which is computed at each level of expansion and verifies that the class does not have a large degree deviation from the other classes of the same expansion level. Finally, we check the relative distance threshold, which is based on the mean of the distances between the direct classes and the seed QID in the embedding space, and allows us to check whether the class moves away in the embedding space. A sketch of this loop is given below.
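To make the loop concrete, here is a minimal Python sketch, not code from the talk: get_direct_classes, degree, and embedding are hypothetical helpers standing in for Wikidata lookups and a trained embedding model, and the three threshold values are illustrative placeholders, not the configuration used in the experiments. For simplicity it sketches only the class-expansion part of the strategy.

```python
import numpy as np

ABS_DEGREE_MAX = 10_000   # absolute degree threshold (illustrative value)
REL_DEGREE_FACTOR = 2.0   # tolerated deviation from the level's mean degree
REL_DIST_FACTOR = 1.5     # tolerated deviation from the level's mean distance

def expand_with_pruning(seed_qids, get_direct_classes, degree, embedding):
    kept = set()
    # Each frontier element pairs the node to expand with its originating seed,
    # so distances are always measured against the seed QID.
    frontier = {(qid, qid) for qid in seed_qids}
    while frontier:
        next_frontier = set()
        for qid, seed in frontier:
            classes = get_direct_classes(qid)
            if not classes:
                continue
            # Statistics over the current expansion level, used by the
            # relative thresholds.
            degrees = {c: degree(c) for c in classes}
            dists = {c: float(np.linalg.norm(embedding(c) - embedding(seed)))
                     for c in classes}
            mean_degree = np.mean(list(degrees.values()))
            mean_dist = np.mean(list(dists.values()))
            for c in classes:
                if degrees[c] > ABS_DEGREE_MAX:
                    continue  # prune: class is too generic
                if degrees[c] > REL_DEGREE_FACTOR * mean_degree:
                    continue  # prune: degree outlier at this level
                if dists[c] > REL_DIST_FACTOR * mean_dist:
                    continue  # prune: drifts away in the embedding space
                if c not in kept:
                    kept.add(c)               # keep the class...
                    next_frontier.add((c, seed))  # ...and keep exploring from it
        frontier = next_frontier
    return kept
```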
For the distance threshold, we use two different definitions (see the code sketch at the end). With the first definition (D1), the distance between a class and the starting QID is the Euclidean distance between their embeddings. With the second definition (D2), the distance between a class and the starting QID is the Euclidean distance between the centroids of the embeddings of their respective instances. Then, if a class respects these three thresholds, it is kept and its own classes are retrieved in turn; otherwise, we prune it. Then we continue the exploration, and so on.

We performed several expansions with different configurations and with the two definitions of the distance on more than 800 business terms, and we manually labeled the pruning and keeping decisions. Then we computed the pruning precision as the number of correctly pruned classes divided by the number of pruned classes, and the expansion precision as the number of correctly kept classes divided by the number of kept classes. The first table shows the results of the pruning decisions for each threshold, and the second table shows the results for the kept classes. We can notice that D2 is better than D1, since it has the best global pruning precision; even though the expansion precision with D1 is better than with D2, D2 keeps more relevant classes for the enterprise knowledge graph. Finally, this strategy limits the expansion from about 2.5 million subclasses to about 2,000 subclasses.

To sum up, a distance based on centroids of instances is the best pruning configuration. Nevertheless, the degree thresholds remain essential, since some classes are only pruned by the degree thresholds, which avoids exploring and retrieving too generic classes. Now that we have a set of labeled pruning and keeping decisions, in future work we want to train and experiment with different binary classifiers and compare them with symbolic feature-based pruning or propagation-based graph metrics approaches. Thank you. Do you have any questions?
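For reference, a minimal sketch of the two distance definitions discussed above, not code from the talk: embeddings are assumed to be numpy arrays, and instances_of(qid) and embedding(qid) are hypothetical helpers standing in for a Wikidata instance lookup and a trained embedding model.

```python
import numpy as np

def d1(class_emb, seed_emb):
    # D1: Euclidean distance between the embeddings of the class
    # and of the starting QID.
    return float(np.linalg.norm(class_emb - seed_emb))

def centroid(qid, instances_of, embedding):
    # Centroid of the embeddings of an entity's instances
    # (assumes the entity has at least one instance).
    return np.mean([embedding(i) for i in instances_of(qid)], axis=0)

def d2(class_qid, seed_qid, instances_of, embedding):
    # D2: Euclidean distance between the centroids of the embeddings
    # of the respective instances of the class and of the starting QID.
    return float(np.linalg.norm(
        centroid(class_qid, instances_of, embedding)
        - centroid(seed_qid, instances_of, embedding)))
```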