Hello, everyone. Today I will explain how OpenSearch can help us adopt a friend, also known as a pet. My name is Laysa Uchoa, and I work for Aiven as a developer advocate, where we do our best to manage open source technologies such as Kafka, OpenSearch, and others for everyone. I also happen to like Python, and I live in Munich, which has led me to organize the PyLadies Munich chapter. In my free time, I like to read, to travel, and to code.

So, what are we learning today, and who is this talk designed for? You are all welcome in this talk. It is designed for OpenSearch beginners and also Python beginners who are interested in learning about OpenSearch, who want to see how to perform OpenSearch queries, and who want to learn a little bit about dashboards. I'll give an overview of OpenSearch during this talk. I will also talk about OpenSearch queries, so you can understand how they work and write your own. And I will show you how to explore your data with OpenSearch Dashboards.

We as an industry, and we as developers, deal with data all the time. There is a saying that if you cannot measure it, you cannot improve it. We want to collect data and understand data, because data helps us make better business decisions and improve our applications. So today we will talk about a tool that can help us with that: OpenSearch.

So, what is OpenSearch? OpenSearch is a distributed, open source search and analytics suite suitable for real-time application monitoring, log analytics, and more. OpenSearch is an alternative to Elasticsearch and is often used to enable search functionality in your application. Some of the key features of OpenSearch are relevance scoring of results, scalability, stability, and high performance: OpenSearch performs really well on large amounts of data. You can also perform aggregations on your data with OpenSearch.
There is more that you can do: you can also do analytics, and so on and so forth.

So, what does the power of OpenSearch look like in action? OpenSearch includes many features that can greatly improve the user experience. For example, say we are looking for "how to adopt...". When we type this, we expect our search engine to be so smart and powerful that it can kind of guess what we are thinking and autocomplete our thoughts or our sentences. OpenSearch can actually do that. It can suggest phrases as you type what you are looking for. So this is pretty cool, right?

But what else? As humans, we are very visual, and we want to see results in a way that makes it easy to find all the relevant terms, so that just by looking we can identify them. So it's very nice if the terms that we searched for are highlighted, so that all matches stand out. This is another thing that you get with OpenSearch: highlighting matches. OpenSearch highlights the search terms in the results.

You may be thinking there are more things that we want our search engine to do. There is another thing that we do, even though we shouldn't, because we are not robots: when we write, we make mistakes. We mistype things. Those mistakes are not intentional, but we still want the search engine to find relevant results for us. OpenSearch can also help us with that. It can make sense of typos and still find relevant results. And there is much more, such as matching synonyms, ordering by relevance, and so on. These are some of the features that we will learn about today, and I hope you are motivated.
We're also going to see that OpenSearch is not only a full-text search engine, but also offers some analytics. From OpenSearch Dashboards, you can derive insights about your data. We are going to show this today too, and I hope you enjoy it. So, let's see how it's done.

Before we jump in, let's try to understand some of the terms that are used when we talk about OpenSearch. Is OpenSearch a data store, or is OpenSearch a search engine? What do you think? It's actually both. You can use OpenSearch to store your data, but it is also a search engine, as I mentioned before.

So, how does the communication work? How can you communicate with your cluster to send your data, perform search queries, do aggregations, and so on? OpenSearch supports communication via a JSON-based REST API over HTTPS. You can also use any programming language that has an OpenSearch client. When you use an OpenSearch client, you have access to a wider variety of methods, and it can be easier to use. We will use Python because it's a JSON-friendly and beginner-friendly language. But OpenSearch supports other languages, so you can check them out. I know there is support for JavaScript, and for others too. Maybe there is a language that you are already familiar with and want to use.

When we send data to OpenSearch, this data is organized into something called documents. Documents are the units of data that we send to a cluster. They are JSON objects containing whatever data you want to send to your cluster. And documents can be indexed. A document is similar to what a row is in a relational database.

So, what is an index? I already said the word. An index (plural: indices) is a collection of documents that have similar characteristics. They are logically related.
Indices store the documents in dedicated data structures that correspond to the data types of the fields you are sending. When we run search queries, we run them against an index, as we will see. Indices are like databases in a relational database.

When we send our documents, OpenSearch creates something called shards to allocate our data. Sharding is a way to divide your data into smaller pieces, and each piece is called a shard. Sharding is done at the index level.

Next to the shards, we have something called replicas, which are where your data is replicated to. The way OpenSearch does sharding and replication helps an OpenSearch cluster to be highly scalable and to perform very well when running search queries, for example. Replication works by creating copies of your shards. Replication is configured at the index level as well. A replica can serve search requests just like a shard can, so some of the requests can be routed to the replicas.

Shards and replicas are organized into something called nodes. A node is an instance of OpenSearch that stores data. Nodes are not machines: you can run many nodes on the same machine. A collection of related nodes that together contain all our data is called a cluster. Clusters are independent of each other by default, but you can actually search across different clusters. It's not really common, but you can do it.

So, this is a simple overview of how OpenSearch works. I hope you are now familiar with the terms that we're going to be using, and we can move forward.

The time has arrived, and we're now actually going to look for a pet. I'm very excited about it. You can find here the demo that I prepared.
You can either scan the QR code or find it in the GitHub repository. For the demo we're going to be using Python, an OpenSearch cluster, and of course some data. So, let's see how it looks in the code.

We have here our repository with some files, such as config.py, where we first create the client. You can see it here: I import opensearch-py and create the client. Once we have the client, our next step is to send the data to the OpenSearch cluster. In index.py you can find the function called load data, which enables us to send more than 5,000 documents in one API call by using helpers.bulk. You can see here that I give it the client, the data, and so on. So, let's run it and send the data. You can see that it is ingesting the data. It takes a little bit of time because there are a lot of documents, but it's actually quite fast. And you can see that we sent all the data.

If you want to explore what the data looks like: we didn't set any explicit mapping of fields to types, so we rely on OpenSearch doing this for us with dynamic mapping. But you could also set it yourself. I will run a function to check what the mapping looks like. I have this function called get mapping, so I run index.py and write get mapping. And these are the results. We can see all the fields, because I'm printing the keys of the dictionary, and we can see the schema. For example, if we check the birth date, you can see that it was set as a date. That is something I wanted to double check. There is some metadata here too, but you don't need to worry about it. Another thing we can check is that the weight in kilograms is a numerical value. So, I think our data was mapped correctly and we can continue to the queries. We have sent our data, and now it's time to query.
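The ingestion and mapping check described above can be sketched roughly as follows. This is a minimal illustration and not the code from the demo repository: the index name `pets`, the field names, and the helper names `to_bulk_actions` and `mapped_types` are assumptions, and `load_data` only borrows the name mentioned in the talk. The calls `opensearchpy.helpers.bulk` and `client.indices.get_mapping` are part of the opensearch-py client.

```python
from typing import Any, Dict, Iterable, Iterator

INDEX_NAME = "pets"  # hypothetical index name, not confirmed by the talk


def to_bulk_actions(rows: Iterable[Dict[str, Any]], index: str) -> Iterator[Dict[str, Any]]:
    """Wrap each row in the action format that helpers.bulk accepts."""
    for row in rows:
        yield {"_index": index, "_source": row}


def load_data(client: Any, rows: Iterable[Dict[str, Any]], index: str = INDEX_NAME) -> int:
    """Send all rows through the bulk helper and return the success count."""
    # Imported lazily so the pure helpers in this sketch work without the package.
    from opensearchpy import helpers

    success, _errors = helpers.bulk(client, to_bulk_actions(rows, index))
    return success


def mapped_types(mapping_response: Dict[str, Any], index: str = INDEX_NAME) -> Dict[str, str]:
    """Reduce a get_mapping response to {field name: field type}."""
    properties = mapping_response[index]["mappings"]["properties"]
    return {field: spec.get("type", "object") for field, spec in properties.items()}


# Usage against a real cluster (host and credentials are placeholders):
#   from opensearchpy import OpenSearch
#   client = OpenSearch(hosts=["https://localhost:9200"], http_auth=("user", "password"))
#   load_data(client, rows)
#   print(mapped_types(client.indices.get_mapping(index=INDEX_NAME)))
```

With dynamic mapping, a date-like string field would be expected to show up as `date` and a numeric field as a numeric type such as `float` or `long`, which is the check performed in the talk.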
I have some questions in mind that I want to answer from the data: what am I looking for in a pet? One thing is that I prefer a small animal, because I live in a very tiny, tiny, tiny flat, and I don't have space for a big animal. So this is one of the requirements. Another one is that the animal should be around zero to five years old, for example. One of these is a numerical value and the other one is a date, so we can use a search query called range. With it we can find results whose fields fall within a certain range. This is how the syntax for the range query works, and you can also use different kinds of limits.

Let's see it in action. We have here the range query being constructed, and we can specify the field. Let's run it with some values. For example, I put 2015 to 2020 for the birth date of the animal, and we get some results. We could do the same with the kilograms: we call range on weight in kilograms and specify, for example, five to ten, to find animals which are smaller, or at least lighter. And here I have some of the animals and I can take a look. So that is how the range query is used.

Another query that we have is the match query, for when you are looking for a word or words in a certain field. I already have an idea of one that we can try out: we can search for stray animals in this category. Let's see it in action. We have a file here called search.py where we're going to be looking at match, for example. Here is the code. As you can see, we have the client and we call client.search. So, let's run it with what we are looking for. I can write match here.
If I'm not sure how it works, I can use --help. I'm using Typer here, so I get some instructions. In this case, I first add the field and then the query. For the field, I'll use returned reason, and for the query, I'll look for stray. Stray, not star. Let's see what kind of results I get. All these pets are in this category, which is really great. I can already take note of that. I have the name and the chip ID here, so it's already a good result for us.

We're not going to stop at match. We also have multi-match, which means that the word or words are looked for across several fields. It is a way of combining things that we want to check. The syntax looks like this: you specify the term and the fields in which to look for that term. Let's see in the code what kind of cases we can use this for, and find some funny combination for our pets. I have the file here, and we have multi-match in it too. You can see how we construct the query and how we call it. Let's call it with a funny combination. For example, we run multi-match, looking in the fields animal name and base color for the word orange. We do have some animals that have the color orange and are also named Orange. It's quite funny. And you can do this with different fields and different values.

Now let's try to actually write things wrong. Yes, you can also do that with OpenSearch. If you mistype something, you can use something called fuzzy queries. Not funny queries, fuzzy queries. Let's try them and see how it's done. You can see the syntax here, with some examples of how fuzzy queries can help correct words and still bring you relevant results in your match, even if you mistype.
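The query types covered so far (range, match, multi-match, and a fuzzy match) can be sketched as plain request bodies. This is an illustrative sketch, not the code from search.py: the function names and field names are assumptions, and adding a `fuzziness` parameter to a `match` query is only one way to get fuzzy matching. The query DSL keys themselves (`range`, `match`, `multi_match`, `fuzziness`) are standard OpenSearch.

```python
from typing import Any, Dict, List, Optional


def range_query(field: str, gte: Any, lte: Any) -> Dict[str, Any]:
    """Match documents whose field value lies between gte and lte, inclusive."""
    return {"query": {"range": {field: {"gte": gte, "lte": lte}}}}


def match_query(field: str, text: str, fuzziness: Optional[str] = None) -> Dict[str, Any]:
    """Full-text match on one field, optionally tolerating typos."""
    clause: Dict[str, Any] = {"query": text}
    if fuzziness is not None:
        # "AUTO" lets OpenSearch pick the allowed edit distance from the term length.
        clause["fuzziness"] = fuzziness
    return {"query": {"match": {field: clause}}}


def multi_match_query(text: str, fields: List[str]) -> Dict[str, Any]:
    """Look for the same text across several fields."""
    return {"query": {"multi_match": {"query": text, "fields": list(fields)}}}


# Usage with an opensearch-py client (index and field names are hypothetical):
#   client.search(index="pets", body=range_query("birth_date", "2015", "2020"))
#   client.search(index="pets", body=match_query("returned_reason", "stray"))
#   client.search(index="pets", body=multi_match_query("orange", ["animal_name", "base_color"]))
#   client.search(index="pets", body=match_query("animal_name", "bety", fuzziness="AUTO"))
```

Keeping the builders free of any client dependency makes them easy to unit test and to paste into other tools as JSON.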
So, let's see it in action. We have our search file here, and we're going to pass a parameter called fuzziness. In our case, we can give it the automatic setting, so OpenSearch decides for us what level of fuzziness to use on our data. Let's call it: python search.py fuzzy. Now, we look in the animal name field. Let's type "pet", although what we actually intend to find is a similar name. And we set the fuzziness parameter to auto. You can see that we could still find results for the word "pet". It finds Betty and so on. So it's quite cool that we still get results.

We also have Boolean queries. Boolean queries are a combination of queries. Here is an example of how a Boolean query could be built: for example, you are looking for an animal name and also a range of birth dates, and so on. Let's see it in action. Here we have a combined query, with at least two queries combined, that we're going to run. Let's run it and see what kind of matches we find. We are looking for "bat", and for an animal that was born between 2016 and 2021. We don't need to pass any parameters here, so let's just run it, and we see the results. We're very successful: we found some animals here that I would note down for adopting.

Now that we have seen how to write search queries, we can look at how to use OpenSearch Dashboards to run queries and also to see some analytics, some visual things that you can get from your data. So, let's see how it's done.

We are here in Dashboards, and the first thing we need to do is create something called an index pattern. To do this, we click here. Among all these options, there is one called Management, and under it Stack Management, which you click. You will see Index Patterns there. Let's make it bigger here.
There is Index Patterns here, which you click, and we're going to create a new index pattern. When we deal with OpenSearch, we often deal with an index or indices. But OpenSearch Dashboards works with index patterns, which means that you can combine and see the data from different indices as long as they match the same index pattern. So, if you have an index that is created daily, and every day the date is added to the index name, you can combine many of those days just by adding an index pattern. It's similar to a regex pattern, for example. Here we don't have anything like that, but there is an explanation of how it works and how you could do it. In our case, we're just going to use the name that we already have for the index. The index pattern can be the same as the index name.

So, I'm going to click next. For the time field, we don't have any timestamp that we want to follow over time, so we don't select one. Here we have all the fields, and we can also see the types. Just going over some of the fields: we have one field for the birth date, and it is saved with a date format, according to OpenSearch. So it's quite cool. I think it's good and it's working.

As the next step, we can see this data in Discover. Let's go to OpenSearch Dashboards and then to Discover. Here we see all the data that we have, all the documents, and each document is a hit at the moment because we are matching everything. We have this number of hits, or documents. If we click to expand one and see what is under _source, we can see the data in the form of a table or in the form of JSON. You can see all the fields and the key-value pairs under _source. From here, we can add filters.
There is an option here to add a filter, and you can select the field that you are interested in and the kind of operator that you want. For example, let's set intake reason.keyword with the operator "is". The keyword field treats the value as a whole word and aggregates the existing values, so I can choose the ones that are interesting to me. So, let's pick the intake reason keyword field. In this case, I don't need to type the value because I'm using the keyword one. Let's see the animals who were, for example, abandoned, and save. We see that from more than 5,000 hits we went down to 183, so we already have fewer hits.

We can continue adding more filters. For example, I can also set the weight in kilograms of the animal with the operator "is between", so we're doing a kind of range. We're going to see animals who are 10 to 20 kilograms. The search now returns 58 hits, so we have a much smaller set to look at. We can continue adding filters, for example on the sex of the animal. If I don't select the keyword field, I have to type the value myself, so I need to be sure about the format I'm using and what I'm looking for, whether it's male or female. I will look for female animals. It already shows up here as a keyword, and I save it. Now I end up with 27 hits.

It's interesting that we can see all of these animals, but it's also a bit hard to read. So what I will do is select just the fields that are important to me, and you'll see how it becomes much easier to read. I will add the sex name, which is one of the important things. Let's also add the intake reason. Another important one is the ID chip number, because that makes it easier to find the animal. And another thing we can add here is the objectives.
Now we have a table, and we can also add the animal name if you are interested. Let's add the animal name. We can move these columns: as you can see, I'm moving the animal name to be the first column here. And now it's the first column. So we have here Precious, who is a female, abandoned, and this is her ID chip number and her objectives.

We can also inspect here. If you click Inspect, you see the request, which shows what is happening behind the scenes. We send a request to the OpenSearch cluster with the query written here. You can see how it's defined: a match_phrase on sex name with the word female, and you see that all the filtering we did in the interface appears here in the query. We can select this query. Let's see, it goes down to must_not in this one. Let's select this Boolean query. I will select everything, actually, and then we have to deal with the closing brackets and so on. You can see the request, and you can see the response, with all the hits available, the fields that are there, and so on and so forth. There is also some metadata that OpenSearch adds for configuration.

I want to quickly show you how you can also do this from the so-called console. If you go to Dev Tools, there is a console, and you see that we can write our own queries here and it will run the search for us. This has some advantages. We don't need to do any authentication, because we are already connected to our cluster. Another good advantage is that it does some formatting for us. It's JSON-friendly, with syntax highlighting, so it shows where each bracket opens and where it is supposed to close. So here I have the query that I copied, and we can reuse it. We need to double check that everything that is opened is also closed.
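The filters applied in Discover end up as clauses of a Boolean query, which is what the Inspect panel shows. A rough sketch of an equivalent request body follows. The field names, the index name, and the `gte`/`lt` bounds for the "is between" filter are assumptions, and the body that Dashboards generates contains more keys than this.

```python
import json
from typing import Any, Dict


def filtered_search(intake_reason: str, min_kg: float, max_kg: float, sex: str) -> Dict[str, Any]:
    """Combine three filters in one bool query; every filter clause must match."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"match_phrase": {"intake_reason.keyword": intake_reason}},
                    {"range": {"weight_kg": {"gte": min_kg, "lt": max_kg}}},
                    {"match_phrase": {"sex_name": sex}},
                ]
            }
        }
    }


body = filtered_search("Abandoned", 10, 20, "Female")

# Pretty-printed JSON, which can be pasted into the Dev Tools console under a
# request line such as: GET pets/_search   (the index name is hypothetical)
print(json.dumps(body, indent=2))
```

Clauses under `filter` do not contribute to the relevance score, which suits yes-or-no conditions like these.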
You can see that I can use this tool to navigate the query easily. But actually we don't need these other fields, we just need the query one. So let's see: here is the bool, and it closes, and closes. So it's good, and we can hit play. And you can see that we find the results here as well. So if you want to do this from OpenSearch Dashboards, you can use the console to write your own queries directly.

What I'm going to show now is how you can quickly build some dashboard visualizations. So let's create a new dashboard. Here I can click and create a new visualization. Let's pick one of these. It will be a tag cloud. Let's do a tag cloud. Here we select our data, and you see that it covers everything, so we need to add some buckets. We're going to select Terms, and one of the fields that we can see, for example intake reason.keyword, and click update. We see five items. If I want to see more, for example 100, I can change it, and we see more things here. The tag cloud works by making the size of each word proportional to the number of hits it has, like a counter. That shows too many, so let's put 50 and hit play. Wait, we don't need the filter here. Yes, so now we see it with just 50 of them selected, and there is a lot of information. I still think it's a bit hard to read. It's a bit too much. Maybe 40 will look better. Or the top 10. Let's see the top 10. So we can save it, and we save it as intake reason. And you can see it here in your dashboard as a visualization.

We can take some insights from here. For example, for most animals the intake reason is stray, but there are lots of other interesting reasons, such as moving, abandoned, and landlord issues.
Those are things that you also have to think about when you adopt an animal: you have to think about your responsibility in the long term. If you are adopting in a special situation, you need to consider again whether you really want to adopt or not. You need to make sure that the animal can be at your house, that there are no issues with your landlord or the law, and that your housing conditions are sufficient for the animal. And also that, if you move, you can take the animal with you or find someone to take care of it. So I think this shows us our data, but it also makes us think.

Let's create a new visualization. For this one, I will use a bar chart. I select the data here, and then there are the metrics for the Y-axis and the buckets for the X-axis. On the X-axis I select an aggregation. Let's choose histogram, because then we can see the distribution of our data over ranges of numerical values. The field has to be a numerical field, and the one we have here is weight in kilograms. So let's select it and update. There are a lot of small bars here, and it's very hard to read. We still have "use auto interval" switched on, so we're going to type the interval that we want manually. Let's put, I don't know, every one kilogram, for example. It's already much better. Maybe two. So we can see that most of the animals are around four kilograms, so they are small animals. Let's save this as weight in kilograms. Save and return.

So we already have some data here, and our OpenSearch dashboard is being created. I will make this one smaller. Let's add a new one, for example Controls. We can have a range slider based on the pets data and the field weight in kilograms. The step size could also be one. Let's add it, update, and see. So we can see here how the range works. Let's also save this one as a range in kilograms.
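The tag cloud and the histogram built in Dashboards correspond to a `terms` aggregation and a `histogram` aggregation in the query DSL. A minimal sketch of the same two requests from Python follows, assuming hypothetical field names, aggregation names, and index name. The bucket layout read by `bucket_counts` is the standard aggregation response shape.

```python
from typing import Any, Dict


def intake_reason_terms(size: int = 10) -> Dict[str, Any]:
    """Top intake reasons by document count, like the tag cloud."""
    return {
        "size": 0,  # return only aggregation results, no individual hits
        "aggs": {"intake_reasons": {"terms": {"field": "intake_reason.keyword", "size": size}}},
    }


def weight_histogram(interval: float = 2) -> Dict[str, Any]:
    """Count animals per weight bucket of the given width in kilograms."""
    return {
        "size": 0,
        "aggs": {"weights": {"histogram": {"field": "weight_kg", "interval": interval}}},
    }


def bucket_counts(response: Dict[str, Any], agg_name: str) -> Dict[Any, int]:
    """Flatten an aggregation response into {bucket key: document count}."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {bucket["key"]: bucket["doc_count"] for bucket in buckets}


# Usage with an opensearch-py client (index name is hypothetical):
#   response = client.search(index="pets", body=weight_histogram(interval=2))
#   print(bucket_counts(response, "weights"))
```

Setting the interval explicitly plays the same role as turning off the automatic interval in the visualization editor.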
So here we already have three visualizations in the dashboard, and you can create even more. They all have the same size, but we can make some bigger than others, depending on how we want to organize our data. For example, I think this one is too big, so I'll make it very small and bring it over here. So we can see the three visualizations that we just built very easily from OpenSearch Dashboards itself, without any extra programming language or any other tool.

From here you can continue exploring. There are many types of visualizations that you can build: pie charts, bars, region maps. If you have locations in your data set, for example longitude and latitude, you can also see this data on maps. So all these options are quite interesting. I will close here. This was just a quick overview of how OpenSearch Dashboards works, and how you can create an index pattern, go to Discover, and evaluate your data a little bit there. I hope you have enjoyed this part.

So, this is the end of our talk. I hope you have enjoyed learning how to get started with OpenSearch Dashboards and with OpenSearch queries using Python to find your pet. And I hope you can find an animal that you want to adopt and give it lots of love. Thank you for joining me.