Previously, we assessed a list of words from an Excel file. Now we can load this data directly with the Corpus widget and display it in a Corpus Viewer. Here we go: 150 words. We've already developed a workflow where we select a word and find a number of the semantically closest words from our list. We judge the similarity from the embedded representation, that is, by turning the words into sets of numbers.

This time, let's do something similar, but I want to type a reference word into Orange directly. To do this, I'll use the Create Corpus widget. I'll keep just one placeholder for the documents, use "new text" as the title of the document, and then type "watermelon" into the document text box. I can use another Corpus Viewer to make sure everything is okay. It is: I now have one text document containing the word watermelon.

Just like before, I want to find the semantically closest words to watermelon from my list of 150 words, so I need to embed my words into a vector space. I'll use the Document Embedding widget to do this and double-check its output in another Data Table. There I find my words and their corresponding 384-dimensional vectors. Great.

Now I also need to embed my new watermelon document, so let's add another Document Embedding widget and connect it to the output of Create Corpus. I can connect this to a Data Table as well: this is what a watermelon looks like in 384-dimensional space.

Now it's time to wrap up our workflow. Just like last time, I'll use the Neighbors widget, feed it the data from my list of words, and provide the text I typed into Create Corpus as the reference. I need to make sure I use the data that already includes the embeddings, though, so I'll connect the outputs of the Document Embedding widgets to Neighbors. Also note that the order in which I make the connections matters: I want the word embeddings first and the reference embedding second. But even if I do it in reverse order, I can always rearrange the links between Orange widgets by clicking on the connection.

Now I'll use the cosine distance in the Neighbors widget and look for the three nearest neighbors, then use a Corpus Viewer to see what I get. As it turns out, watermelon is also a fruit, just like grape or lemon.

Let's try some more words, say, truck. I need to wait a little for the results, because the text I typed has to be processed by Document Embedding, and it can take a while to get the embeddings from the server. But here we go: the word truck is semantically most related to tractor, van, and car. Makes sense.

Instead of single words, I can also embed text fragments or sentences. Let's try "resting on a porch". I get doorbell, blanket, and chair. Chair and blanket are fairly obvious, and I guess I can explain doorbell as well, since doorbells are pretty common on porches.

Anyway, let's continue and try to find an example where semantic search fails. Let's try "animal species". This time we get zoo, dog, and puppy, and neither zoo nor puppy is a species of animal. So I might want to try this with a different embedder. If you remember, SBERT is optimized for sentence embeddings, while fastText focuses more on words. So let's switch over to fastText. I'll first change the widget that embeds my list of words here at the top, and then I'll also have to switch the other one. I have to change both of them, because not only do different embedders produce different numbers of features, they also map into completely separate spaces.
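To see concretely why both Document Embedding widgets have to be switched together, here is a minimal sketch in Python. The 384 dimensions for SBERT match what we saw in the Data Table; the 300 dimensions for fastText are the usual size of its pretrained vectors and are an assumption here, as are the random stand-in vectors.

```python
import numpy as np

# Hypothetical vectors for the same word from two different embedders.
sbert_vec = np.random.rand(384)     # SBERT: 384 dimensions (as seen above)
fasttext_vec = np.random.rand(300)  # fastText: typically 300 dimensions (assumed)

# Mixing embedders fails immediately on the dimension mismatch...
try:
    np.dot(sbert_vec, fasttext_vec)
except ValueError as err:
    print("shape mismatch:", err)

# ...and even if the dimensions happened to agree, the coordinates of the two
# spaces are unrelated, so distances across embedders would be meaningless.
```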
Anyway, with both embedders switched, I now find elephant, tiger, and zebra on the output, and these are indeed species of animals. Semantic text search can be pretty fun and turns out to work quite well, but of course everything depends on the quality of the embeddings. So next time, we'll take these embeddings further and do some semantic clustering.
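As a recap of what this whole workflow computes under the hood, here is a minimal, self-contained sketch. It assumes the sentence-transformers package and the all-MiniLM-L6-v2 model, which also happens to produce 384-dimensional vectors; the model behind Orange's embedding server may differ, and the word list is just a small illustrative sample rather than the full list of 150.

```python
# A minimal sketch of the embed-and-find-neighbors workflow.
# Assumes: pip install sentence-transformers numpy
import numpy as np
from sentence_transformers import SentenceTransformer

# all-MiniLM-L6-v2 is one SBERT model that outputs 384-dimensional vectors;
# the model behind Orange's Document Embedding widget may differ.
model = SentenceTransformer("all-MiniLM-L6-v2")

# A small illustrative word list and a reference text, echoing the video.
words = ["grape", "lemon", "tractor", "van", "car", "zoo", "dog", "puppy"]
reference = "watermelon"

word_vectors = model.encode(words)         # shape: (len(words), 384)
ref_vector = model.encode([reference])[0]  # shape: (384,)

def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity."""
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

distances = np.array([cosine_distance(v, ref_vector) for v in word_vectors])

# Take the three nearest neighbors, like the Neighbors widget set to 3.
for i in np.argsort(distances)[:3]:
    print(f"{words[i]}: {distances[i]:.3f}")
```

Because the neighbors come straight out of the geometry of the embedding space, a different model can return a different top three, which is exactly what the animal-species example showed.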