Okay. So Paolo is going to talk to us about how to build a full-text search engine using only Django and Postgres. I'll let you start. All the best.

Hello everyone. I'm very happy to be here with you at EuroPython 2020. I want to thank all the organizers for making this online edition possible, and thank you all for attending from all over the world. If you're asking yourself what a Pythonic full-text search is, I'll show you an example. This is the search function on the Django website. How many of you have searched for information on it in the past? I think a lot of you. That search function is based only on Postgres and Django itself, and I was the one who built it. So the next question is: who am I? I'm Paolo Melchiorre and I'm the CTO of 20tab, a Python software company based in Rome, for which I work remotely. I'm a software engineer and a longtime Python backend developer. After using Django for a few years, I became a contributor to the project. Now I want to explain a bit more about the title of this talk, "A Pythonic full-text search". You can read the definition of Pythonic by entering import this in the Python interpreter. These are only the first principles of the Zen of Python. The most important one for me is the third, "Simple is better than complex", and I think it's also the most difficult to follow. Full-text search refers to techniques for searching a computer-stored document or collection in a full-text database. There are a lot of search engines that already provide full-text search as in this definition. The most popular search engine library is Apache Lucene, open source software written in Java. Based on Lucene, there are two very popular search engines that I have used in some past projects. Solr is the first one, and it's part of the Apache Software Foundation. Elasticsearch is a product of the company Elastic. The last big project where I used one of them is Docs Italia.
Docs Italia is an Italian government website for finding public documents. I worked on this project with my colleagues to improve the search function. Under the hood, Docs Italia is a fork of the open source project Read the Docs. So, like the original project, it's a Django-based platform, and it requires a lot of Python packages to access the Elasticsearch instance and ask it for results. Of course, the search function works very well now, but we can't consider this a simple solution. We can say various things about search engines. On the good side, they are very popular, they have a lot of features, and you can find a lot of online resources about them. On the bad side, you always need a driver to use them from Django, you have to use their specific query language, and synchronization problems are common. But let's go ahead. Oops, this is embarrassing. Jokes aside, this is similar to what happens in e-commerce when you find a product in the search results, but it's no longer available when you click on it. Usually this happens because the search results come from the search engine, which is not yet synchronized with the database. So, why don't we search directly in the database? Maybe a big one, with an elephant's memory, like this one. Postgres is a very popular and long-lived database. It added full-text search years ago, with specific data types and special indexes, and since then many useful new features have been added every year, up to the latest version. The main concept of full-text search in Postgres is the document. A document is the unit of searching in a full-text search system: for example, a magazine article or an email message, or the union of their parts, such as the title, the abstract and the body text. But implementing a web search function directly on the database can be a low-level task. To do this, we can use a web framework. Maybe one of the best. Django is a very popular Python web framework.
Django added full-text search a few years ago, in the django.contrib.postgres module, with specific fields, expressions and functions. Since then, many useful features have been added every year, up to the latest version, Django 3.1, which will be released in a few days. The Django documentation describes document-based search as full-text search with advanced features: weighting, categorization, highlighting, multiple languages. We can implement all of them with Django itself. To better understand how full-text search in Django works, we are going to see how to perform some queries, from the basic ones to the more complex ones, which also perform well with a large amount of data. To do that, we can use the blog models defined in the Django documentation. Here we have three classes with a few fields each: a Blog with a name, an Author with a name CharField, and an Entry connected to both of them, with text fields such as the headline and the body text, plus other fields. We can perform basic queries on these models using field lookups. For example, we can search for an author using part of their name. We can get more results by performing a case-insensitive query. To find a word with an accented letter, which is common in Italian and other languages, we can activate the unaccent extension. After that, we can search for an author's name even if we don't know exactly which letters are accented. To get results even if we don't remember exactly what the author's name is, we can activate the trigram extension. Searching for an author, we then get results with similar, but not necessarily identical, names, as you can see here. But to use all of the above features, we have to add the django.contrib.postgres module to INSTALLED_APPS. After that, we will also be able to perform a full-text search on a field. For example, we can search for a word and also get results containing its plural form. To search for text in more than one field, we can use the SearchVector function.
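The unaccent and trigram lookups mentioned above rely on two Postgres extensions, unaccent and pg_trgm. To build some intuition for what they compute, here is a plain-Python sketch. It is not the extensions' actual code: accent stripping is approximated with Unicode decomposition (the real unaccent extension uses a rules file), while the trigram functions follow the behavior documented for pg_trgm (lowercased alphanumeric words, each padded with two spaces before and one after; similarity is the number of shared trigrams divided by the number of distinct trigrams in both strings). The function names are hypothetical.

```python
from __future__ import annotations

import re
import unicodedata


def unaccent(text: str) -> str:
    """Approximate Postgres unaccent: drop combining marks after NFKD decomposition."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def trigrams(text: str) -> set[str]:
    """Trigram set in the style of pg_trgm's show_trgm()."""
    result: set[str] = set()
    # Non-alphanumeric characters act as word separators; words are lowercased.
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = "  " + word + " "  # two spaces before, one after
        result.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return result


def similarity(a: str, b: str) -> float:
    """Shared trigrams over the union of trigrams, like pg_trgm's similarity()."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


print(unaccent("Niccolò"))                       # Niccolo
print(similarity("Paolo", "Paola"))              # 0.5
print(round(similarity("word", "two words"), 4))  # 0.3636
```

A misspelled name still shares most of its trigrams with the correct one, which is why a trigram lookup can match it. As far as I know, Django's trigram_similar lookup uses pg_trgm's % operator, which matches when the similarity exceeds the pg_trgm.similarity_threshold setting (0.3 by default).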
We can define our document as the union of the entry's body text and the related blog's name. After that, we can search for a word and get more accurate results. To search using more complex text, we can use the SearchQuery expression. We can also use common web search syntax directly in the query text, using the websearch search type. For example, we can search for two words at the same time, potentially getting more results. To perform a full-text search in a specific language, we can use the SearchConfig expression. We can specify the language in both the document and the query. After that, we get more precise results than before in the selected language. To show the most relevant results first, we can use the SearchRank function. Based on the query text and the document, Django creates a rank. We can order and filter our results using this rank, and we can also show it. To perform a fine-grained full-text search, we can use the weight attribute of SearchVector. For example, we can decide that words in the headline are more relevant than words in the body text. After that, we will see a new rank in our results, even when performing the same search. To highlight results, we can use the SearchHeadline function. We have to specify the field to highlight. After that, we will see some HTML tags in the results. All of this can be customized using some attributes. To speed up searches and simplify the query, we can use SearchVectorField. We have to update our search vector fields manually before running a query, but after that we get the same results as before, much more quickly. I started using full-text search in Django 1.10 and frequently searched the Django documentation for information about this new feature. In the meantime, I started asking myself how the search function on the Django website itself was implemented. I noticed that the search only worked on English content, and in some cases there were HTML tags in the results.
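The weighting and highlighting steps described above can be illustrated with a toy model in plain Python. This is a sketch under stated assumptions, not what Postgres computes: the rank below simply sums a weight for every matching word and is not the ts_rank formula, and tokenization skips stemming and stop words. The only parts taken from Postgres are the default weights for the labels D, C, B and A (0.1, 0.2, 0.4, 1.0) and ts_headline's default <b> and </b> markers. The function names and sample entries are hypothetical.

```python
from __future__ import annotations

import re

# Postgres' default ts_rank weights for the labels D, C, B, A.
DEFAULT_WEIGHTS = {"A": 1.0, "B": 0.4, "C": 0.2, "D": 0.1}


def tokens(text: str) -> list[str]:
    """Lowercased word tokens (no stemming or stop words, unlike to_tsvector)."""
    return re.findall(r"[^\W_]+", text.lower())


def toy_rank(fields: list[tuple[str, str]], query: str) -> float:
    """Sum the label weight of every document token that matches a query term.

    Each (text, label) pair plays the role of one weighted SearchVector part,
    e.g. the headline with weight "A" and the body text with weight "B".
    """
    terms = set(tokens(query))
    score = 0.0
    for text, label in fields:
        score += sum(DEFAULT_WEIGHTS[label] for tok in tokens(text) if tok in terms)
    return score


def toy_headline(text: str, query: str, start_sel: str = "<b>", stop_sel: str = "</b>") -> str:
    """Wrap each word that matches a query term in start/stop markers."""
    terms = set(tokens(query))

    def mark(match: re.Match[str]) -> str:
        word = match.group(0)
        return f"{start_sel}{word}{stop_sel}" if word.lower() in terms else word

    return re.sub(r"[^\W_]+", mark, text)


entries = {
    "Cheese on toast": [("Cheese on toast", "A"), ("Grill the bread first.", "B")],
    "Pizza": [("Pizza", "A"), ("Add cheese and tomato.", "B")],
}
ranked = sorted(entries, key=lambda name: toy_rank(entries[name], "cheese"), reverse=True)
print(ranked)                                          # ['Cheese on toast', 'Pizza']
print(toy_headline("Add cheese and tomato.", "cheese"))  # Add <b>cheese</b> and tomato.
```

Both entries match "cheese", but the one with the word in its headline (label A) ranks above the one with it only in the body text (label B), which is the effect the weight attribute is meant to produce.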
I then studied the Django website source code and found out that the documentation was generated with Sphinx and all the data was stored in Postgres, but searches were performed on an external search engine. So I proposed fixing that on the Django developers mailing list. A lot of Django developers shared different opinions about the idea. The doubts were the amount of work to be done, the equivalence of the search features, and the increased workload on the database. The benefits, on the other side, were less maintenance, a lighter setup, and the exclusive use of Django on its own website. After that, I organized a Django sprint during EuroPython 2017 in Rimini, and some developers joined me to work on the search update. In one sprint day, we created a draft of the Postgres-based full-text search. But we also spent a lot of time trying to set up the Django website locally, mainly because of the external search engine. In the following months, I opened an official pull request with a complete version of the full-text search. I received a lot of suggestions from other developers, and after a lot of comments, they merged my pull request. That was the first of several merged pull requests for the same full-text search function. So today, after a few years, the Django website's full-text search is multilingual, it's based only on Postgres, and it returns clean results. It's a low-maintenance solution, and it's much easier to set up than before, also locally, if you want to try setting it up on your PC. As I already said, new search features are released every year in both Postgres and Django, and I want to add all of them to the Django website search: for example, misspelling support, search suggestions, highlighted results, web search syntax, and search statistics. Next, I want to share with you some useful tips to learn more about full-text search and how to become an expert on it.
As I said before, I think the starting point is reading the Django documentation. The documentation on the Django website is full of information about the full-text search feature. You can read about all the attributes you can use and the functions and expressions you can implement in your full-text search. It's well written, and there are a lot of examples, more than the ones I showed you today. If you want more details, you have to read the Postgres documentation on the Postgres website. It helps you understand how things work at the lower level, and for me it was very useful to understand why Django developers implemented something in a certain way. After that, you can also read the source code of both projects. You can find them on GitHub, learn something from the source code, and find documentation there too. It helps you understand more deeply how things work. Then, a strange tip: I suggest you search for questions on Stack Overflow without reading the answers. Try to answer them yourself, solving the problem and submitting the answer. It's something that will take you to the next level. Last but not least, you can also study this presentation, because it's released under a Creative Commons license. You can download it from the link at the end of this talk, reuse it, and share it with other people. I hope I've been able to show how it's possible to develop a more complete full-text search using less software in the stack. Doing more with less is the motto of 20tab, and it's our interpretation of Pythonic. You can find out more about our open source projects and our work through these contacts on different social media and on our website. To find out more about my work with Python and Django, you can use all my contacts. Using this QR code, you can download this presentation from my website. Thanks again, and enjoy the next talk in the conference. Thanks.

All right. Thank you for the talk. I think we have two questions.
Will you be able to take them? Yeah, thank you. Okay, so here's the first one: does the annotate on a search vector involve a massive database overhead to perform the query? So, as I said before... can I go back to my slides? Sorry, I did not understand what you said. I'm going back to the slide related to this question. Okay, got it. As I said before, to speed up the search query and keep the database workload very low, we can use SearchVectorField. Because it stores the document we constructed with SearchVector, we can add an index on it. Everything then works as fast as querying a normal column of your database, or a normal field of your model. So this is the solution to speed up our query. Okay, so the next question is about a similar slide, I think. Here's the question: when using SearchVectorField, I was unable to populate this field with fields outside the current model, for example the author's name of a blog, if the SearchVectorField is in the Blog model. Do you know why, and how to include relationship fields? Yes, the example I showed at the beginning covers exactly this. The document I built with SearchVector is the union of the body text of the Entry model and the name of the related object, the Blog. So, as you can see, we can construct the search vector using both models. To populate the field, you may need something more sophisticated. You can update your search vector field using an update, or with routines in the database such as triggers, but in theory you can add more than one field here, and even join many-to-many fields using aggregation. So everything is possible. Your document can be very big if you want. Okay, so here's the next question: how much more load does this full-text search feature put on the Postgres database? Oh, I think this was just asked in other words. Yeah, okay. Yes.
Actually, a lot of people told me that the workload on the Postgres database could be affected by the use of full-text search. But in the end, I can say the workload on the database is the same as before, because the search vector field is only a new column. And if you also add an index on it, when you search in this column you are querying an indexed column, and everything works very fast. Faster than you think, and faster than I thought before I started using it. You can check it yourself using the search on the Django documentation website. So here's the last question: when should Django Postgres search not be used in production? Sorry, can you repeat? When should Django Postgres search not be used in production? I didn't understand the first word. Okay, so when should Django Postgres search not be used in production? When should this not be used? If I understood the question well, I think you can use it in production without problems, because I've used it in a lot of projects, and as I already said, for the last three years the Django documentation search feature has been built using exactly this. It runs queries using Django's full-text search features on Postgres. So it has been in production for a long time. Okay, that's awesome. Thank you very much for your talk, it was pretty amazing. Thank you. Thank you very much.