So welcome to the second talk for today, excluding the keynote of course, so it's the third talk for today. The speaker is Benoit, who works as tech lead for GrabOne. Is that correct? And he's going to talk about Python and using Elasticsearch in Python, and I'll just hand it over to him.

So, hello everyone. Is it okay? We started, around a year and a half ago at GrabOne, to move away from our legacy codebase, which was in PHP. Search was done with SQL LIKE queries, like lots of people do it. We had more and more data in our database, and users are more and more used to searching like Google: they type something, there is spelling correction, it just gives you what you need. They're not used to the old way of searching where you have to be exact with your syntax. We started to have performance issues as well on the website, because GrabOne is quite popular in New Zealand and was still growing. We had more and more deals, and it was slower and slower.

So that's basically how our search looked before, which is really nice when you look at it, but it's not really working, and it's not really scalable. So a year and a half ago we started to use Elasticsearch, which had just reached 1.0 when we started.

So what is Elasticsearch? Elasticsearch is a JSON document-oriented data store. You just store JSON with a REST API. It's really simple: you call the API with your JSON, you call another API, and it returns your JSON. It is near real time, meaning that the indexing is not instant; it takes from a few milliseconds up to half a second to index your documents. So you don't want to use this for transactions; you want to use this for search. By design it's clustered: you can have multiple nodes.
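That "call the API with your JSON" workflow can be sketched with nothing but the Python standard library. This is a minimal illustration, not GrabOne's code; the URL, index and type names are assumptions, and the default local port 9200 is only a common convention:

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumed: a single local Elasticsearch node

def index_request(index, doc_type, doc_id, doc):
    """Build the method, path and JSON body for indexing one document."""
    return "PUT", "/%s/%s/%s" % (index, doc_type, doc_id), json.dumps(doc)

def send(method, path, body=None):
    """Fire a request at a running node (only works with Elasticsearch up)."""
    req = urllib.request.Request(
        ES_URL + path,
        data=body.encode("utf-8") if body else None,
        method=method,
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))

# Usage, against a running cluster:
#   send(*index_request("deals", "deal", "1", {"name": "Pizza deal", "price": 9.5}))
#   send("GET", "/deals/deal/1")   # reads the document back, with metadata
```

In practice you would use the official client rather than raw HTTP, but this is the whole protocol: PUT a JSON document in, GET it back out.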
You can spawn them and do replication. It's really, really easy to install, it's open source, written in Java, and really active.

Just a basic tutorial on how to install it: install Java, get it from the website, unzip, and launch the binary. That's it. It's really simple; if you want to test it on your computer, you can do it at lunchtime, and you'll see it's really easy.

So, the Python ecosystem around Elasticsearch. There is the official low-level library, written in Python and using the HTTP interface, and, more interesting for us because we're using it, the high-level library, which is a query DSL helper: it helps you write queries a bit like a Django model query. There is a simple ORM that allows you to read and save objects in your database. We use Django, so a year ago I wrote a module for the Django Debug Toolbar which is exactly the same kind of thing as the SQL debug toolbar, but for Elasticsearch. And there is an awesome command-line cluster management tool for Elasticsearch that is written in Python as well and is maintained by Elasticsearch.

Practically, how do you take something and insert it into Elasticsearch with Python? That's about eight lines of code: reading a one-million-line CSV and inserting it into Elasticsearch. There is no more to it than that, because Elasticsearch infers data types by default. If something looks like a date, it will save it as a date; if something looks like a number, it will save it as a number; if something looks like text, it will just do full-text search on it. So it's really easy by default. That's why the learning curve is really flat at the start, then gets steeper when you try to do more complex things.

In Python, how would you do a search for documents?
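Before getting to search: the eight-line CSV load just described could look roughly like this. The generator below builds the actions the official client's bulk helper consumes; the index name, type name and sample data are assumptions, and the actual client call is left as a comment because it needs a running cluster:

```python
import csv
import io

def bulk_actions(csv_file, index="deals", doc_type="deal"):
    """Turn CSV rows into actions for the official client's helpers.bulk()."""
    for i, row in enumerate(csv.DictReader(csv_file)):
        yield {"_index": index, "_type": doc_type, "_id": i, "_source": row}

# With the official client it would be roughly (needs a running cluster):
#   from elasticsearch import Elasticsearch, helpers
#   helpers.bulk(Elasticsearch(), bulk_actions(open("deals.csv")))

# Elasticsearch infers types server-side: "9.50" looks like a number,
# a date-shaped string becomes a date, the rest is full-text searchable.
sample = io.StringIO("name,price\nPizza deal,9.50\nLeather boots,120\n")
actions = list(bulk_actions(sample))
```

That is the entire import pipeline: no schema, no mapping file, just rows streamed in.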
So first, if you just want to get a document by ID, it's pretty simple. If you want to do a more complex search, you can use the library, which will transform your query into the query DSL, which is JSON, and you get your documents back.

This is basically the JSON that is returned by Elasticsearch for each query. You have a bit of metadata at the top: how long it took, how many shards were used (a shard is just a place where you store your documents), how many hits you got from your search (in this case we have two of them), and the max score, which is an important feature of Elasticsearch. By default everything is scored: if your text matches a document better, the score will be higher, and if the document contains words that are more distinctive across the cluster, the score will be higher as well. And then you get your list of documents, each with its index, type, ID, and score, and then the source of your document, which is exactly the JSON document you sent to Elasticsearch.

So our first challenge was: how can we connect our legacy MySQL database and our legacy PHP code to Elasticsearch? There are many ways to do it. I'm listing four of them, from most simple to most complex, and we're using the last one at the moment.

You can have a CSV that you generate every day from your application and import into Elasticsearch with the eight lines I showed you. It works, but of course only if you don't need real time; if you are indexing books and you just do them overnight, then there is no issue. You can have the same kind of thing with a cron job every minute: you read your database, check whether something has changed, serialize the changes, and send them to Elasticsearch.
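As a sketch of the two search shapes just mentioned, here are the raw pieces as plain Python: the path for a get-by-ID, the query-DSL body for a full-text search, and a helper to pull documents out of the response JSON. Field names and sizes are illustrative assumptions:

```python
def get_by_id_path(index, doc_type, doc_id):
    """GET /index/type/id fetches a single document by ID."""
    return "/%s/%s/%s" % (index, doc_type, doc_id)

def search_body(text, fields=("name", "description"), size=10):
    """The query-DSL JSON you would POST to /index/_search for full text."""
    return {
        "query": {"multi_match": {"query": text, "fields": list(fields)}},
        "size": size,
    }

def hits(response):
    """Pull the original JSON documents back out of a search response."""
    return [h["_source"] for h in response["hits"]["hits"]]
```

The high-level Python library builds exactly this kind of JSON for you and wraps the response in normal Python objects, so you rarely write these dicts by hand.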
Then you are just one minute behind your main database, which is probably enough for most people, even for us at GrabOne. Being one minute behind could be okay. "Sold out" would be a bit weird, because you'd click on a deal that says it's not sold out and one minute later it is sold out, but this happens all the time on e-commerce websites anyway, so it wouldn't really be a problem.

Then you can have a database queue using triggers, with an on-row-change trigger for example: you create a queue, you read this queue every second, and now you're just one second behind your cluster. Or, in our case in PHP, we had a post-save hook; you can do exactly the same thing in Django. This posts to RabbitMQ, and then we have a consumer that reads from RabbitMQ and indexes into Elasticsearch.

An important thing: Elasticsearch, and JSON document stores in general, are not SQL, so you need to think completely differently. You need to duplicate your data, which is really weird when you come from a SQL world. If we look at these documents, it's just boots and sneakers, but we have a merchant sub-document inside the documents, and we are going to repeat the merchant's name and repeat his location. It's really important to repeat everything, because then you can do aggregations on the name and aggregations for the clustering, but each document needs to contain all the info you're going to search on or group by.

In the end, there aren't that many fields you need to display to your end user, because you're going to use Elasticsearch for your end users, not for your main data store. You need the name, descriptions, some basic info. Just look at your website, look at what's displayed, and you'll see that it's not that many fields in the end.

Elasticsearch versus SQL, because everybody's always asking me: how is it different, how does it work?
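The post-save-hook approach we use can be sketched like this. The serializer below is a plain function; the Django signal and the RabbitMQ publishing are shown only as comments, since the model name, queue name and channel wiring are all assumptions rather than GrabOne's actual code:

```python
import json

def index_message(doc_type, pk, fields):
    """Serialize a just-saved record into the message the consumer indexes."""
    return json.dumps({"type": doc_type, "id": pk, "source": fields})

# A Django post-save hook along these lines (names are assumptions, and the
# RabbitMQ wiring with a client such as pika is omitted):
#
#   from django.db.models.signals import post_save
#   from django.dispatch import receiver
#
#   @receiver(post_save, sender=Deal)
#   def push_to_queue(sender, instance, **kwargs):
#       channel.basic_publish(
#           exchange="", routing_key="es-index",
#           body=index_message("deal", instance.pk, {"name": instance.name}))
```

The consumer on the other side just reads messages off the queue and indexes each `source` document into Elasticsearch as it arrives.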
A simple query is a SELECT * from an index. With basic pagination you have size and from, where in SQL you'd have LIMIT 25, 25. You have a sort on name where in SQL you'd have ORDER BY name. It's pretty similar; it's just JSON.

But where I found that Elasticsearch really shines is that, by default, even with Django, if you just want the total number of results for your search, SQL has to do a COUNT(*), because there is no way to count your documents without doing a second query: in SQL you cannot retrieve 10 documents and count everything at the same time. On top of that, Elasticsearch has aggregations, which are used more and more at the moment with what's called the ELK stack, which is Elasticsearch, Logstash and Kibana. It's basically log aggregation: doing aggregations on logs, building graphs and things with maps. It's using the exact same technology.

If you look at GitHub, for example, this page is fairly complex. You have lots of stuff on the left, you have a total, you have a full-text search, you have a list of results with a number of stars, you have pagination at the bottom. This page can be done with one query in Elasticsearch. In our use case it takes around 10 milliseconds to return a page that complex; for GitHub it's a bit longer because they have more documents. And yes, this page is powered by Elasticsearch on GitHub. If we take a closer look at it, you have two different aggregations at the same time, repository and languages; you have the order by, you have the total of results, and you have the list view as well; at the bottom you have pagination.

I was somehow inspired by this page and said, hey, we need to do something like this at GrabOne: being able to search everything, but with far fewer queries than we had, which was probably around 1,000 SQL queries to display a page.

So what types of aggregations can you do in Elasticsearch?
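Before looking at aggregations, the size/from/sort mapping above can be written down as one function. A sketch, with the match-all query and sort field as illustrative assumptions:

```python
def page_body(page, per_page=25, sort_field="name"):
    """Roughly SELECT * ... ORDER BY name LIMIT 25, 25 as query DSL.

    Unlike SQL, the total hit count comes back in the same response
    (hits.total), so no second COUNT(*) query is needed for pagination.
    """
    return {
        "query": {"match_all": {}},
        "from": page * per_page,       # like the SQL OFFSET
        "size": per_page,              # like the SQL LIMIT
        "sort": [{sort_field: "asc"}],
    }
```

POSTing `page_body(1)` to `/deals/_search` would return results 26 through 50, already sorted, together with the overall total.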
A terms aggregation means, for example, if you want to aggregate per category, you'll get the category names with the list of categories ordered by the number of matching documents. There are statistics on anything, price buckets, time-based histograms, geo distance, geohash grid. I'm going to highlight a few of them just so you can understand how easy it is in Python to create those aggregations and display them.

If we take the date histogram, the query is simply `aggs`, which is the keyword for saying "I want to do an aggregation"; then you give a name, then you say it's a date histogram, that you want the field called `dates`, an interval of one month, and the output format yyyy-MM-dd. The result on the right shows you a list of buckets, and in those buckets you have the key as a string, the key as the real timestamp, and then how many documents match. Then, with the Elasticsearch library in Python, you just have a normal Python object at the end; you can iterate through it and display it to the user.

More interesting, because it's really difficult to do in the SQL world, is the geo-distance aggregation: I want to know all the restaurants that are one kilometre from me, five kilometres, and a hundred kilometres from me. It's the same kind of aggregation as the date histogram, but based on locations, and it's out of the box with Elasticsearch.
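Here are sketches of three of the aggregation bodies from the list above, date histogram, geo distance, and geohash grid, written as plain dicts. The bucket names, field names, and ring distances are illustrative assumptions:

```python
def date_histogram_agg(field="dates", interval="month"):
    """One bucket per month, keyed both as yyyy-MM-dd and as a timestamp."""
    return {"aggs": {"per_month": {"date_histogram": {
        "field": field, "interval": interval, "format": "yyyy-MM-dd"}}}}

def geo_distance_agg(lat, lon, field="location", rings_km=(1, 5, 100)):
    """Count documents within 0-1 km, 1-5 km and 5-100 km of the origin."""
    ranges, start = [], 0
    for edge in rings_km:
        ranges.append({"from": start, "to": edge})
        start = edge
    return {"aggs": {"near_me": {"geo_distance": {
        "field": field,
        "origin": {"lat": lat, "lon": lon},
        "unit": "km",
        "ranges": ranges}}}}

def geohash_grid_agg(zoom, field="location"):
    """Cluster points into geohash cells; coarser cells when zoomed out.

    The zoom-to-precision mapping here is an assumption for illustration.
    """
    precision = max(1, min(8, zoom // 2))
    return {"aggs": {"clusters": {"geohash_grid": {
        "field": field, "precision": precision}}}}
```

Each of these goes in the same search body as the query itself, so one round trip returns both the matching documents and all the buckets.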
You can easily draw circles and find how many things are near you or far away. Then the one we are using that is even more interesting is the geohash grid, which does clustering of points a bit like Google does on Google Maps: when you have multiple points, it groups them together and says "three in this area". This is also out of the box, using geohashes, which are an industry standard for clustering things at low geo precision. So here's our map with our deals; you can see Christchurch. If there is only one deal matching in the area, it will just display it. If we have, say, five of them at the same place, it will draw a circle saying "five of them"; if you click on it, it's going to zoom in, and because the zoom level is lower, it will probably stop clustering and display them individually. On the left I put our timings at the moment, which are around 15 to 17 milliseconds per query. So it's really fast; it's not costly for our servers.

Full-text search: what do you get out of the box with Elasticsearch to put on your website? You can search multiple fields at the same time. You can handle spelling mistakes; I've put an example with "restaurant", and this just finds words that are more common and closer to the one typed. You can use synonyms. You can have "did you mean" and autocomplete, and you can also have something called "more like this", which is: show me products that are similar to this one, but different.

I put up the code that actually does this "more like this" on GrabOne, and it's 80 lines, which is pretty simple, but what it's doing under the hood is really interesting. It takes the price point and applies a decay around it.
It looks at the text of the documents, at how many words are common between them. I had a pizza deal, and I can see all the pizza deals that are around the same price. The configuration is pretty minimal to get a result like this, and if you tune it you can have a really powerful "did you mean" and "more like this" search.

We started with just search, and then suddenly we realised: why are we still using SQL for our website, for our end users? We don't need it anymore. We can display list views, we can display products, we can display everything. So recently we switched to full Elasticsearch for the entire front end. We don't use MySQL anymore except in the back end; only if you're logged in is the session a Django session with normal Django, but all pages just display Elasticsearch results. In this data store we also save CMS pages, so the static pages of the website, because in the end they're just another type of document. So the static pages are just normal documents like any other.

I talked a bit about the learning curve before. When I started to play with Elasticsearch, I was amazed how easy it was; I could do things like the CSV example in five lines. And of course it gets harder and harder as you go, because you have to learn a lot about full-text search: how it works under the hood, how the TF-IDF ranking works. You have to dig into it, but it's easy to get started, and it's fun; it's really interesting. You can probably prototype something for your website in a few hours. I'd love to run, one day, maybe a sort of hackathon around this with Python and Elasticsearch: bring your data and we'll try to do something out of it.

So, in conclusion, I'd say: just do it, try it.
You'll probably find a use case for your company or for yourself. Keep your prototype really self-contained and try not to put it inside your main application, because we switched from a big monolithic application, and now that we have our front end separated from the back end, it's really easy for us to bring in new features without breaking legacy things. It's more than search: like I said, it's our entire front-end database at the moment. We've had nearly no major issues in a year and a half with it. The only one we had was a disk-full issue that went badly, but that was a bit our own fault. And you still need MySQL: remember that it's near real time, you don't have transactions, there are no ACID guarantees. It's just pure documents, and they're immutable. You need your back end with a SQL database behind the scenes, but you can easily push this out to your users with Elasticsearch.

I have some interesting links here. I'm sure you have lots of questions, because you can already see some use cases. I'd also be happy, if others are interested, to try to create an Elasticsearch user group at some point in Auckland, not Christchurch. So if you are interested in this, come talk to me later and we'll see if we can do something.

Thank you, Benoit. I'm sure this talk will spark a lot of interesting questions, and I'm pretty sure most of you will have some use case for Elasticsearch. So, please, any questions?

Do you know if it's at all feasible to use it as a component in a desktop client-side application?

I'd say, if your client is in Java, yes, because it's Java based. You could potentially install it even if you use Python, as long as your clients have Java on their machine. There is no install; it's just a binary, a jar, to run.
So yes, definitely you could potentially shift it with your program Run it behind the scene using Python to run the binary and it's just running It's using under the hood Lucine index indexes Which is used by Sola as well and for example an ID like PyCharm is using this under the hood to do the search On PyCharm. So because but it's fully Java. It's easier for them because they just interface with Java Hello Have you do you when you're thinking about switching to elastic search? Do you consider Postgres as opposed to my SQL considering Postgres has full tech support as what tech search support as well as Postgres for doing the GIS dog queries. So I've used Postgres project the Postgres When I was working at yellow To do this the same kind of things and I switched to elastic search Because first it's faster for millions or billions of documents in Postgres It's just not fast enough because it's not Distributed you can have around I think GitHub has more than 500 nodes so 500 servers with elastic search clustered then you can really do aggregations on those 500 With postgres you'll have an issue with this the full tech search Yeah, it's kind of same but then you have more features it's easier to interface because it's JSON So it's working everywhere, but I started with postgres. 
It's definitely good for starting; Elasticsearch is more powerful.

That was a good talk, thanks. When you were listing out the ways that you were doing the updates, did you say that you'd essentially moved through those and settled on the last one, where you're using a Django hook?

Oh, yeah, this one. We are using the last one, in PHP and Python. We still have part of our back end that is in PHP, a Symfony back end, and there is the same thing there, a post-save hook. When we save anything, we serialize it and push it to RabbitMQ. Then we have our front end, which is just a Django application with a consumer that is listening to the queue, indexing documents all the time as they come in. At the moment we have two back-end systems, one in PHP, one in Python, so we need a way for them to talk to the front end, and RabbitMQ, with both of them sending documents, is a good, easy way for us.

Right, okay, good, thanks. And the other question was: when you first used it, did you make use of a book which you would recommend?

No, not when I started, but I can recommend one now, which is the official Elasticsearch book. I don't remember the name, but it's on their website. It's an e-book, and a physical book as well; I'm not sure the physical book is released yet, it's probably supposed to come out in the next few months. I read it; it's a massive one, but it's really good. I was a bit in the dark when I started, because it was still 0.9 back then; 1.0 came and drew a lot of attention to it, so now it's way better for finding documentation and getting help, and this book is good.

Hey, Benoit. So you said that you started off with MySQL and then you moved to Elasticsearch and you're no longer using MySQL. Are you still using that as the source data? Is MySQL still your source of truth for your data, or not?
That is correct: MySQL is our source of truth, but we're not exposing it to our users. It's only our admin people, or the sales people who are entering deals, who use the MySQL-backed part of the website. The front end itself, when you search or look at deals, is Elasticsearch. That's not true for carts, for example: carts and transactions on the website still happen with a transactional database. We're not using Elasticsearch for transactional things, because there are no transactions, so it's a bit hard to roll back if you have an issue. But for the entire front end, most of the time you don't need a SQL database; you can just use a JSON document store.

And how do you back up your Elasticsearch indexes? Is there any way of doing so?

That's a very good question. There is a built-in snapshot and restore feature where you can just say, from the REST API, "please back up my Elasticsearch to this path", and then you'll have a backup, and then from any other cluster you can say "please restore from this", using the REST API. So yeah, it's built in, and we're using it. I've also restored the production cluster onto my dev computer; it comes back from prod directly, and I have a local cluster that is exactly like prod.

So how would you compare it to something like Solr?

It's really similar, because under the hood they're both using Lucene, so it's the same speed and the same kind of features. The difference is that Solr is pretty rigid, and that's exactly why the guy who created Elasticsearch called it Elasticsearch: he found Solr really rigid and said, "I'm going to call it Elasticsearch because it's less rigid than Solr." I'd say it's way easier to get started with Elasticsearch than Solr. You can do really good things with Solr, but you need to know what you're going to do; with Elasticsearch you can figure it out as you go.

We still have time for a couple more questions, even if they're really complicated.
I'm pretty sure Benoit is happy to answer them. If there are no more questions, please join me in thanking Benoit again.