So let's start. Hello everybody, thank you for joining me. It's a great honor to be here, sharing my passion and knowledge about Elasticsearch. I'm really excited, because this is my first DrupalCon session, and excited is the right word. I want to apologize: we were supposed to be two speakers, but the other speaker, Valim Velchev from Propeople, had some issues and wasn't able to come. So, sorry about that. My name is Nikolay Ignatov, and this is my Twitter account. I worked several years for Propeople, and now I'm the founder of a new company that will focus on Drupal development and Elasticsearch integrations. I've been using PHP since 2006 and Drupal since 2009, when I joined Propeople, and it was love at first sight. I really love new technologies and try to pass my knowledge on to people when I can. I'm a volunteer in the Drupal Bulgaria foundation, where we're trying to make Drupal popular in Bulgaria, and I'm the maintainer and main architect of the Elasticsearch Connector module for Drupal. My colleagues used to call me Mr. Elastic because of this. So, what are we going to talk about? What Elasticsearch is, how it works, and what we did for Drupal. I will show you a quick live demo; let's hope it will work. I have a video as backup, so don't worry, we will see the demo. And we will talk about the roadmap of the project. But before we start, let's show some respect to the creator of Elasticsearch. His name is Shay Banon, and you can find more about him at the links I provide. He did a great job with Elasticsearch, I can say, and he had a lot of experience with distributed systems and open source search. So at first I asked myself: is it really elastic, and why call it Elastic? It seemed very strange to me why you would call something Elastic.
And after a while playing with the system, I really saw that it is very elastic. I was wondering how to show you just how elastic the system can be, and I took this photo. This is actually not Photoshop. With Elasticsearch, you can split your data and scale massively, and the most important thing is that it's not so hard to do. It's a great technology, for me. But to get your attention back after that distraction of a picture, let's see who is using this system. Bloomberg, this big company, is using Elasticsearch as a data warehouse, and they crunch more than 1.5 billion items per day. The Guardian analyzes how users interact with the news, and they have more than five million users, I think. And GitHub: most probably all of you search on GitHub, and GitHub is using Elasticsearch for searching in code, so there are a lot of files there. And companies like Atlassian, Foursquare, SoundCloud, Stack Overflow and Wikipedia are also using Elasticsearch. So a lot of big companies are using Elasticsearch, and it seems like Elasticsearch fulfills their needs. So what is Elasticsearch, actually? It's a distributed, real-time search and analytics engine, and it's a data store, like MongoDB. It's distributed out of the box: it's very easy to just start a new instance of Elasticsearch, and the instances will form a cluster. And this was something I really liked. It has a REST API with JSON; it's very easy to work with, as you will see. It's built on top of Apache Lucene, and it's open source, so you can find the repository online. It's schema-free: you don't need to specify your schema before you index the documents; there is no need to tell Elasticsearch that this field is a string, that field is an integer, and so on. And it's document oriented: you put JSON documents into the system. It also supports nested documents, which is great.
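The schema-free behavior can be illustrated with a small sketch: when the first document arrives, the JSON value types are inspected and a mapping is auto-generated from them. This is a simplified Python illustration of the concept, not Elasticsearch's actual mapping logic; `infer_mapping` and the type names are made up for the example.

```python
def infer_mapping(doc):
    """Guess a field mapping from the JSON types of one document
    (a toy version of what schema-free indexing does for you)."""
    mapping = {}
    for field, value in doc.items():
        if isinstance(value, bool):    # check bool before int: bool is an int subclass
            mapping[field] = "boolean"
        elif isinstance(value, int):
            mapping[field] = "long"
        elif isinstance(value, float):
            mapping[field] = "double"
        elif isinstance(value, str):
            mapping[field] = "string"
        elif isinstance(value, dict):  # nested JSON objects map recursively
            mapping[field] = infer_mapping(value)
    return mapping

doc = {"title": "Hello", "price": 9.99, "stock": 3,
       "location": {"lat": 42.7, "lon": 23.3}}
print(infer_mapping(doc))
```

Note that `location` comes out as a plain nested object of two doubles; nothing marks it as a geographic point, which is exactly the kind of field where you have to set the mapping yourself.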
So, how can you install it? It's pretty simple: just download Elasticsearch from elasticsearch.org and unzip it. If you have Java, just go to the bin folder inside the main Elasticsearch folder and execute elasticsearch, and it will start Elasticsearch. And if you keep the default settings and start Elasticsearch on another system, the two will form a cluster immediately. And that's it: you can start using it and pushing your documents there. It's very easy, and I will show you right now. But first I just want to show you some tools that I'm using for the demos. One is the Advanced REST Client for the browser; it's a pretty good tool for testing REST APIs. And I'm using a plugin for Elasticsearch called head to see what is happening when we push indexes into Elasticsearch and so on. Elasticsearch has a plugin system and can be extended a lot. When I started using Elasticsearch, maybe one year ago, there were not many plugins, but right now there are a crazy number of them; the community there is good. So, I have one virtual machine right now, and I have downloaded Elasticsearch. Can you see? Downloaded from elasticsearch.org. And I have this version and this version; it's just the same archive extracted twice. So I can go to the first one and just execute elasticsearch. Now Elasticsearch is trying to find other nodes in the network to form a cluster. It's pretty easy. And this is the head plugin; right now we can see that this is my instance of Elasticsearch, up and running. I can also do a simple GET to see that Elasticsearch is running. Okay. So we now have Elasticsearch started, and we can start pushing documents, indexing stuff and so on. To show you how the distributed search works, and to understand it: right now we started only one instance of Elasticsearch, and that is one node, in Elasticsearch terms; they call it a node. So we started one node of Elasticsearch, and this is like having a freshly installed MySQL. Right now we need to create the database for it, or the index, as Elasticsearch calls it.
We just do a PUT HTTP request with the name of the index. I'll show you with the REST client. Okay, we have the server address over HTTP, and as I said, we specify the name of the index. And here we specify some settings. They're actually very important: you can specify a lot of settings here, but these are the most important ones to remember, the number of shards and the number of replicas. Let's first create the index, and then I will explain. Okay, we have "acknowledged": true, so it's created. And if I reload the head plugin, I can see shards zero, one and two. And what is this? What's happening is that I put three shards in the system, which means there are three units of data. These are actually whole Lucene indexes, and they allow you to distribute your data, to three servers in this case. So if I add another server here, some of the shards will move to the second one and some will be replicated. On the first image you saw that there was no replica, and this is because we have only one node; it has nowhere to replicate to. If we add a third node, it will move some shards to the third node. And if we lose one node, it will automatically recover from the other nodes. And you can have multiple indexes; let me show you. Right now I can see the shards: these are the primary shards, which are assigned, and these are the replica shards, which you can see are not assigned yet. And if I start a new instance right now, on the same machine but just another instance of Elasticsearch, they will form a cluster and move the shards between them. It will take some time to transfer the data. There we have it. So now I can create, for example, the twitter index, again with three shards and one replica, and we can see it here again. So right now we have two databases, one called node and the second one called twitter, and we can actually start using the system.
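The index-creation request above can be sketched like this in Python. I'm only building the request here, not sending it; the host and index name are just the demo values, and I assume Elasticsearch's default HTTP port.

```python
import json

# Index settings: the two important ones mentioned above.
settings = {
    "settings": {
        "number_of_shards": 3,     # three units of data (whole Lucene indexes)
        "number_of_replicas": 1,   # one copy of each shard, for another node
    }
}

# An index is created with a PUT to http://<host>:<port>/<index_name>.
host = "http://localhost:9200"     # default Elasticsearch HTTP port
index_name = "twitter"
url = "{}/{}".format(host, index_name)
body = json.dumps(settings)

print("PUT", url)
print(body)
```

On success, Elasticsearch answers with `"acknowledged": true`, as in the demo.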
So, indexing a document is very easy. We have the index now; we just specify the index, the type of the document and the ID of the document. For example, imagine you have a node of type article: the type would be article, the ID would be the node ID, and the body of the document is a JSON object. If you PUT this to Elasticsearch, it will index it automatically. What actually happens is that the node we send it to decides which shard the document goes to, the document is replicated, then control goes back to the first node and the result is returned to the client. Let me show you. I have one document here, and I can just put it: you can see the PUT command, and I receive the following response. What is important to remember in the response? The version. This is very important in a distributed system, because it's how you handle conflicts. For example, if two users try to put the same document to different nodes, the system must somehow resolve the conflict, because otherwise you cannot tell which version of the document you are trying to update. When you put the document again, it just replaces it and increases the version. And you can also specify the version explicitly: for example, if you fetch the document and its version is two, you put it back specifying version two. If somebody else put this document in the meantime, you will get a conflict. So you can resolve conflicts like this. To get a document, you just execute a GET command and you receive the document, with all the fields of the JSON object in the _source field. How does GET work? You can call whichever node you want; it finds the right shard for you and returns the document. To update a document, as I said, you either put it again with the PUT command, or use an Elasticsearch script that updates a specific field. It does the same thing, but with scripting it happens on the shard level, inside Elasticsearch.
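The version-conflict rule just described can be sketched with a toy in-memory store. This is not the real Elasticsearch API, just an illustration of optimistic versioning; the class and method names are made up.

```python
class VersionConflict(Exception):
    """Raised when a write specifies a version that is no longer current."""

class TinyStore:
    """Toy document store with Elasticsearch-style optimistic versioning."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, source)

    def index(self, doc_id, source, version=None):
        """Store a document. If `version` is given, it must match the
        currently stored version, otherwise the write is rejected."""
        current = self._docs.get(doc_id)
        if version is not None and current is not None and current[0] != version:
            raise VersionConflict(
                "expected version {}, got {}".format(current[0], version))
        new_version = 1 if current is None else current[0] + 1
        self._docs[doc_id] = (new_version, source)
        return new_version

    def get(self, doc_id):
        version, source = self._docs[doc_id]
        return {"_id": doc_id, "_version": version, "_source": source}

store = TinyStore()
store.index("1", {"user": "user1", "message": "hello"})        # version 1
store.index("1", {"user": "user1", "message": "hello again"})  # replaces, version 2
print(store.get("1")["_version"])
```

If another writer had updated the document in the meantime, a write like `store.index("1", ..., version=1)` would raise `VersionConflict`, which is how you detect and resolve the race.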
Scripted updates will actually be a bit faster, but they do the same thing. To delete a document, just execute DELETE; to delete the index, you also execute a DELETE command. It's pretty straightforward. So, the fun part is the search. We have put some documents in, and now we need to search them. The beginner examples on elasticsearch.org show the query in the URL, but that's the old-style approach that I don't like very much, so I will use the request body in our examples to show you how the DSL query works; the URL query parameters are just a way to quickly test the system. What happens when you make a search? You again call one node, but this time, because it's a full-text search, the request has to go to all the shards, because Elasticsearch cannot know which specific shard to ask, unless you specify routing, of course, but that's another topic. It merges the results from all the shards and then gives you back the result. So let me index some documents right now and show you how you can search. I've put one document, and I will put a second one right now. And now I can go and search. As you can see, I'm calling the _search endpoint, and I don't have any query in the URL; I have this body that will match my documents. This is called the DSL query. It's very, very powerful, and it's something that makes the difference, I think, compared to other search engines: you can nest a lot of stuff here. It's pretty awesome. So you execute a POST to the _search endpoint, and I tell Elasticsearch that I need to match documents where the user is user2. And right now I don't get any documents, okay? That's because I don't have any documents with user2 yet, so I will put another document with user2. Now I will go and search again, and here is my document. I will not discuss the DSL query in much detail, because it's a very big topic; you could do a whole session just on that, actually.
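For reference, the match query from the demo, and a slightly nested variant, look like this as JSON bodies; I'm building them as Python dicts here. The `user2` value is the demo's, while the message text in the second query is invented for the example.

```python
import json

# The simple match query from the demo: find documents whose
# "user" field matches "user2". POSTed to /<index>/_search.
match_query = {
    "query": {
        "match": {"user": "user2"}
    }
}

# Queries can be nested: a bool query combining a full-text match
# with an exact term filter, composed from standard DSL clauses.
bool_query = {
    "query": {
        "bool": {
            "must": [{"match": {"message": "trying out Elasticsearch"}}],
            "filter": [{"term": {"user": "user2"}}],
        }
    }
}

print(json.dumps(match_query))
print(json.dumps(bool_query, indent=2))
```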
Let's focus on how the full-text search actually works. Full-text search works by creating a so-called inverted index, and most search engines use an inverted index. What happens when you put in, for example, a text, is that the text gets separated into words, or terms. Then a sorted, unique list of these terms is made, and for each term it is recorded which documents contain it. For example, if I have three documents indexed in the system and some words in them, each word is matched to each document, something like this. And when I search for, for example, drupal.com, I will get the third document as relevant. Or if I search only for "being", I will get two documents. This is what makes it very fast. The process of building this inverted index is called tokenization and normalization, and in Elasticsearch it's called analysis. When you put in a document and one of its fields is a string, by default it goes through this process: the default analyzer splits the string into words, lowercases them and stores them in the inverted index. And you can have one tokenizer, which does the separation into words, and multiple token filters, which for example implement stemming and so on. Elasticsearch has many more nice features. As I said, there is the DSL query language. There is percolation, which is reverse searching: instead of indexing a document and running a search, you register a search query, then send in a document, and it returns which queries match. For example, if you have a shop and you allow users to subscribe for price alerts, so they get a notification when the price of a product drops below a specific price, you can do this very easily with percolation. It's a very powerful system. You also have facets and aggregations, the same as the Apache Solr stuff; it's the same, actually.
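The tokenize, lowercase and record process described above can be sketched in a few lines of Python. This is a toy analyzer and inverted index, not Elasticsearch's implementation, and the document texts are invented for the example.

```python
import re
from collections import defaultdict

def analyze(text):
    """Default-style analysis: split on non-word characters and lowercase.
    (Real analyzers can also apply token filters such as stemming.)"""
    return [t.lower() for t in re.findall(r"\w+", text)]

def build_inverted_index(docs):
    """Map each unique term to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

docs = {
    1: "Elasticsearch is fast",
    2: "Drupal loves Elasticsearch",
    3: "Search on drupal.com",
}
index = build_inverted_index(docs)
print(index["elasticsearch"])  # [1, 2] -- documents 1 and 2 contain the term
print(index["drupal"])         # [2, 3]
```

A search for a term is then just a lookup in this dictionary, which is why full-text lookups are so fast.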
In the latest versions of Elasticsearch we will not have facets, only aggregations, because aggregations do the same and are more powerful than facets. We have the parent-child relation, which is a very powerful thing; it's something like joins in MySQL: you can specify a parent document and a child document. You have aliasing: if you have one index with several types, for example, you can alias that index under another name and filter the whole result set by some criteria. For example, you want only documents of type article to be shown: you create an alias of the index with that filter, and when you query the alias, you get only those documents. You also have geolocation and attachments; attachments are handled with Apache Tika, the same as in Apache Solr. There are many more features, but I have selected some videos that you should watch to get more information about Elasticsearch, and the community there is big, something like Drupal's, I think. So, on to the Drupal part. This is the module that I've made: Elasticsearch Connector. The main idea of this module is to build a whole ecosystem around Elasticsearch. Right now it has integration with Search API. It has a watchdog module. It has a statistics module, but the statistics module is not very good right now; it's just a port of the Drupal core Statistics module, which we all know is not so good, but using Elasticsearch as a backend. We have a developer module, and we have the Views integration module, which I like, that lets you pull documents out of Elasticsearch without having them in Drupal. So, let's hope the demo will work. I have already installed a Commerce Kickstart distribution, just so it looks nicer. We have the product listing page right now; it comes from Search API, but the backend for Search API is the database. So I can go and enable the modules. I don't have the latest module from the Drupal repository.
So, the module requires the Elasticsearch PHP library, and you have two options to install it. The first one is to use the Composer Manager module, which will download it for you, but that requires some work, some commands to execute. The other one is to enable a module from this package, called Easy Install; it's a package with all the libraries needed for this stuff. For example, if you have Drupal on shared hosting, which is not very good, but it happens, and you want to use the search, normally your option is to go with a dedicated server or something like that so you can install Elasticsearch and the libraries. But if the client just wants to install the libraries easily, they can use this package. For the demo I will install this package: this is the Elasticsearch Connector module, and this is the package with the libraries. I will also install the Search API and watchdog modules for the demo. It's not the fastest system, but it's installed, so I can go and create my new cluster here. This is just pointing to the Elasticsearch instance. I named it specifically, because I had it working before the session. You can see the two nodes, the state is green, everything is okay. So we have actually set up the cluster now. The Elasticsearch Connector itself just handles the communication with Elasticsearch in Drupal. You have some statistics here: you can see the indices in this cluster and so on. But it only handles the Elasticsearch-specific things. Now we can go and configure Search API. We just specify the Elasticsearch Connector service and select the cluster we need to use. The next step is to change the index server to be Elasticsearch. You can easily do this by selecting the Elasticsearch server in Search API, and you can select from the existing indexes or add a new one. Again, you can specify the number of shards and replicas; it's three and zero here, and we have the index now.
And we just save the settings. There are some statistics I want to implement here, for example missed queries and so on, because Elasticsearch can handle this stuff, and it's very important for site owners to know when somebody searched for something and didn't find anything. So it will be very good to have those statistics here; it's under development right now, but I don't know when it will happen. I just indexed the documents into Elasticsearch, and now if I go to the products, I have them served from Elasticsearch, in the same order. We have the facets, we have everything working. So, to show you the watchdog now: we just need to go to the watchdog settings, because we installed the module without setting it up. We again need to select the cluster, then we need to select the index. We don't have an index for the watchdog yet, but we can create one. Here we can specify the type in Elasticsearch and which types we want to show; I will show you why we need these fields in a moment. Okay, it's configured, so if I log out and log in and now go to the reports, you will see the same thing as the Drupal watchdog screen, but it's using Elasticsearch now. And we have filtering with facets, showing you how many messages you have, and you can do full-text search over the watchdog. But to show you why we had to specify the type for the watchdog, I will show you a second installation of Commerce Kickstart. We can collect the watchdog logs from several sites and show them in one place. So if you have several sites on shared hosting, you can install Elasticsearch somewhere, or use one of the plenty of hosted Elasticsearch solutions, use it as storage for your watchdog, and monitor everything from one instance. Let me show you how it's done. Again, we need to install the modules; this time I will enable only the watchdog, and there is one more module to show you after that. So again we need to configure our cluster, because it's not present here.
Actually, I forgot to delete it, so we have it. Great. So we can select the cluster, the watchdog index, and let's say we call the type watchdog message 2. And here we just select to search across the two types. And if I log out and log in again to generate some logs, I can see the logs here. Wrong one; this one was the correct one. So I can see the two domains here, and you can filter the watchdog by domain. So, for example, your SLA team can monitor errors happening on the other sites. The last thing I want to show you from this package is the Views integration. As I said, you can select from any indexes you have in the system without having the documents in Drupal, so you can share documents or whatever. We have a real use case for this: we are importing some transactions from MySQL into Elasticsearch and using this module to see the transactions in Drupal, in Views pages. It's not stable yet, but it works. So if I go to the Views settings and create a view, I now have the Elasticsearch cluster, and the types and the indexes I have on my cluster. So I can, for example, select the tweets that I indexed earlier through the Advanced REST Client and build my view from them. Now you can see the preview of the view showing the messages from that type in Elasticsearch, and they are not in Drupal, actually. So, enough of the demo. We need to talk about the roadmap of this module. My main idea is to gather as many contributors as I can for this module, because I believe it can be something good and helpful for all of us, with the performance benefits and so on. The roadmap is, of course, first to make a stable version for Drupal 7. There was a Google Summer of Code project that made an integration for Drupal 8, and I have a request to review the code, maybe merge it into my module, and start working on the Drupal 8 version. I want to build out the statistics and to integrate Kibana, a tool that's very nice for admin dashboards and such.
And these are, I think, pretty good goals, and they are very big. So if you are interested in contributing to this module, I will really appreciate it. I also want to thank some guys from Denmark who invested some money in this project without asking for anything in return. The guys are actually very cool: they just invest in the module because they want to make it better for its users. And we did contribute some things back, because for Drupal 7 we have two modules for Elasticsearch. The first one was Search API Elasticsearch, which is only a Search API integration. We started contributing to that module, but there was no response there, the patches were not applied, and we decided to switch to another version of the library and build this ecosystem. So thanks to those guys and to the contributors. Thank you. Do you have any questions?

Yeah, I have a question about geo search. What geo field do you prefer to use for it?

Well, right now we have an implementation for the Location module, I think, but we haven't done deep research on which is the best field module, so it's hard to say. This is a part that we still need to investigate and implement properly in this module, because it's not very well supported right now. For example, the Geofield module is not supported at all; it's Location, or something like this, that is supported right now.

And a last question: how should you work with Elasticsearch when you have production and development environments? Should you have different Elasticsearch servers, or work with different indexes?

Well, that's actually up to you. I personally prefer to have different servers.

Thank you.

Hi, great presentation. I have two questions. First, we're using Solr a lot with Drupal, and that's great when you have a strict data model and schema and everything. So I'm interested in the schema-less part.
Now, if you create a new content type with a bunch of fields, create content and click index or whatever, is the output good enough, I mean, for an API?

Well, it's schema-less in the sense that when you put in a document without a schema, it will auto-generate a schema for you based on the document. It will detect whether your field is a string in the JSON, whether it's an integer, whether it's a float, and it will create the schema for you. And there are some corner cases, for example the geolocation stuff: if you put in a geolocation without adding a schema yourself, you will get a nested object; it will not be a geo point. So from time to time you need to touch the schema. But most of the time, if you don't have any special data like geolocation, it will just work for you.

So basically, you can just add or remove fields and it just uses the machine names? Yes, and you are not tied to one structure of the document: a document with another ID can have something different in its structure, it can have more fields, and you can add it to Elasticsearch without problems.

Okay, second question. Is there a soft commit option, similar to Solr's soft commit, where you don't have to re-index a lot but just update that specific part?

Well, I'm not sure, but I think there were some options; I'm not sure.

Sweet, thanks. Awesome stuff.

Thanks for the presentation. I also have two questions. The first one: did you think about an option to make a caching backend in Elasticsearch for Drupal?

I actually had the idea to remove MySQL entirely and use Elasticsearch.

That was my second question: would that be feasible?

I don't believe so. In Drupal 8, I'm not sure; I didn't go deep into the database layers. But in Drupal 7, for sure it will not work, because a lot of modules use MySQL-specific queries, and it's just not possible, actually.

What about the caching backend option?
The caching backend option, I believe, is okay, because caching is just key-value, and that would not be a problem.

The advantage would be that Elasticsearch is self-replicating and consistent, so it would be very easy to set up highly available caching.

That is actually a good point; maybe I will open an issue for it. Thank you.

Okay, thanks for this very nice presentation. I have a question about the Views integration, because Search API also has Views integration, so why did you write your own Views integration module?

Yeah, this is because you may not have the content in Drupal. For example, the use case I mentioned: we have a MySQL table with payment transactions, and we need to search them with full-text search and so on. We put them into Elasticsearch, and they are not in Drupal; they go straight from one MySQL table into Elasticsearch, and then Drupal can build a view from that Elasticsearch index. That's the view.

Okay. And we are now sprinting on Search API for Drupal 8, so if you want to join or have a discussion, that would be very nice.

Thank you. Okay, if you have any more questions, I will be around, and I'm a responsive guy. It's a little bit hard for me to speak English, but I will try my best to explain things and share ideas. Thank you.