You may already have noticed one thing that might happen: I could lose my voice. That would be really unfortunate. The other thing is that my laptop might run out of battery, and I only have a European charger, but I hope we will be fine on both counts.

Matthias talked a lot about using Elasticsearch for log analysis, doing centralized logging with Elasticsearch, and that's a really good use case. If you're not doing centralized logging for your big applications already, it's really helpful, and for developers it's a great thing to have. In the last project I did, we had centralized logging with Elasticsearch, Elasticsearch for business reporting, and Elasticsearch as a search engine, all in the same project. So that was really nice to have.

I'm Florian, originally from Germany, and I do a lot with search, obviously with Elasticsearch. Matthias just reminded me that this is the old logo and I shouldn't use it anymore, but I like it better than the newer one because of the glasses. I also do a lot with Lucene and Solr. So I'm more on the side of application development, but I also do a bit of centralized logging and things like that. Besides the search work, which I mainly do as an independent developer and consultant, I'm a classic backend developer, I would say, all in the Java landscape: classical stuff like Spring and Hibernate, different databases like MongoDB and MySQL, and the different microservice technologies that are available and talked about right now.

I'm extremely happy to be here today. As you might have noticed, I run the Java user group in my hometown as well, or I used to be the organizer. It's maybe the same number of people, I guess, normally with talks every month, so it's really nice to be here as a speaker at the Java user group here. I'm also one of the organizers of the Search Meetup, which is more specialized. I think there are two meetups like this in Singapore as well, but there haven't been any talks for a while.

Singapore: I really like Singapore. I'm here for the fourth time, I think. That's one of the pictures that was available for free, so I made sure I don't have any problems with licensing. Are there more buildings now? Okay, I didn't notice, sorry. Maybe that's because I'm mainly here for the food, normally; I really like the food in Singapore. All of this sums up to a really nice city. Currently I'm looking into relocating here and looking for work in Singapore, so if you have interesting jobs, I'd be happy to talk to you.

That's enough about me. This talk is about Elasticsearch, of course, and the Java clients for Elasticsearch. Matthias already showed some parts of how Elasticsearch works, but let's take one step back and look at the basics again. This is a definition from the Elastic website: Elasticsearch is a distributed, JSON-based search and analytics engine, designed for horizontal scalability, maximum reliability and easy management. I really like this definition because it's very concise and contains a lot of the different aspects of what Elasticsearch is. Important for us right now: let's first look at the aspect that it's a JSON-based search and analytics engine. What does that mean? First, let's have a look at what search is generally.
This is something probably all of you have seen and used already: the GitHub website, where you can search for projects or code snippets or something like that. You're doing a keyword search; search normally is keyword search. One of the aspects that distinguishes Elasticsearch, or search engines generally, from databases is the notion of relevance, which we have already heard about. When searching for "elasticsearch", it is very likely that somebody searching for this term means the project Elasticsearch, and that is what is happening here: GitHub manages to have this project in the first position, even though there are other projects with the same name. I don't know what GitHub is doing internally, but I would guess that they also use the popularity of projects for boosting. This is important for search: that users get the documents they are looking for in the first places.

From this screen you can also see a lot of supporting features of search. There is the highlighting feature, which lets you see where a match occurred; this can help users see whether a document really is the one they are looking for. You can also sort by other criteria. And there are more supporting features, for example faceting, which we have already heard about; it's for refining the current search, and it's done using the aggregations feature of Elasticsearch. Search like this can be built using Elasticsearch, and in GitHub's case it is Elasticsearch as well. They're using it for a really large amount of data, and it seems to work fine.

Okay, if we want to build something like this ourselves, we first have to install Elasticsearch. That's really easy: all we have to do is download the artifact, an archive, a zip archive (there's a tar.gz as well), unpack it, and run a script. All you need to have available is a Java runtime. Once it's running, you can call it using HTTP. I'm using curl here, the command line HTTP client, doing a request on the default port of Elasticsearch, 9200, and Elasticsearch answers with this JSON document. It's just a bit of information, like the underlying Lucene version that is used for the search features, or the build date of the libraries. For us the important thing is that the application is running. And Elasticsearch uses JSON everywhere, so we will see that a lot.

Okay, but we really want to build an application, and for me, with all the good food in Singapore, it is of course very important to be able to find it. So I will be indexing different dishes from Singapore. This is chicken rice, very popular, I like it. And this is a classic document that can be indexed in Elasticsearch. You can see different aspects of JSON documents here: there are simple strings, there is a list or array where you can have multiple strings, you can have a sub-document like the favorite section here, and there are numeric types, boolean types and the geo types we already heard about.

So where do we put this data? We just add two fragments to the URL. The first one is the index name, which is just a logical collection of documents, like a database in the relational world; you can choose any name. The second one is the type of the document, and the type determines the structure: how it will be stored in Elasticsearch and how it can be searched afterwards.
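Just to make the HTTP and JSON side concrete: you don't even need a client library for this step. Here is a minimal sketch in plain Java of indexing such a dish document, doing the same thing as the curl call; the node address, the food/dish index and type, and the field names are my own assumptions for illustration.

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class IndexDish {
    public static void main(String[] args) throws Exception {
        // A dish document: simple strings, an array, a sub-document and a numeric field
        // (the field names here are made up for the example)
        String dish = "{"
                + "\"food\": \"Hainanese Chicken Rice\","
                + "\"tags\": [\"chicken\", \"rice\"],"
                + "\"favorite\": { \"location\": \"Maxwell Food Centre\", \"price\": 3.5 }"
                + "}";

        // PUT it into the index "food" with type "dish" and id 1
        URL url = new URL("http://localhost:9200/food/dish/1");
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        connection.setRequestMethod("PUT");
        connection.setRequestProperty("Content-Type", "application/json");
        connection.setDoOutput(true);
        try (OutputStream out = connection.getOutputStream()) {
            out.write(dish.getBytes(StandardCharsets.UTF_8));
        }
        System.out.println("Response code: " + connection.getResponseCode());
    }
}
```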
It's very important to know that, even though we have indexed another document before, there is no requirement that all documents look the same. Normally, in applications, it is very advisable to have documents that are close in structure, but of course you can leave parts of the original document out or add other attributes. So it's a schema-free style of development: there is a schema in the background, but as a user you can evolve your documents easily.

Once we have indexed those documents, without doing any configuration, we can already search them, this time using a GET request on the _search endpoint. In the easiest case we just add one parameter containing the search term. This works without any configuration and returns the two documents we have indexed, and the list of hits also contains the original source of the documents. So without any configuration work you can start Elasticsearch, index documents and search them.

Using the parameter alone won't get us very far, though. We want a lot of search features, and queries can get rather complicated. That's why, with Elasticsearch, you normally don't do the parameter style but use the so-called query DSL instead. This is now a POST request that sends a JSON body to Elasticsearch for searching, and this JSON structure describes the query we are about to do. In this case we have one match query; the match query is the one used most often for full text search. We are querying a special field that is a combination of all the other fields, and saying we want "rice" contained anywhere. Then we add a filter that queries the tags, using the keyword feature that was introduced in Elasticsearch 5. So the filter down here needs to be an exact match, while the match query is a free text match, so there might be more processing happening there. This is the way Elasticsearch works. At first it might seem like a bit of overhead to pass this large JSON structure just for a search, but it's really useful: it's easy to get started with, and it's even easier to maintain after a while. The query DSL is also the place where you request a lot of the other features, like highlighting, aggregations and so on; you can do a lot more with it. Also, for production applications you normally don't rely on just indexing documents without preparation; you would define a mapping for them. But those features are out of scope for now, because we want to look at how we can use Elasticsearch from Java.

First, let's look at another aspect that is also contained in the definition. It says that Elasticsearch is distributed and designed for horizontal scalability and maximum reliability. This means that starting one node only is normally not enough. We will have one Elasticsearch node and our application somewhere; the application can be a Java application or something else, and it accesses Elasticsearch and uses it for search or anything else. Once this node goes down, our application would be down as well.
And this is not what we want for a highly reliable application. So what you normally do is run multiple Elasticsearch nodes, and those will form a cluster. It used to be more magic in the background; right now you have to configure a bit more to get the production mode Matthias talked about, but it's still rather easy to manage a cluster. You can still talk to one node only, and it will take care of asking the correct node where the data resides. So this is for high availability on the one hand, because we don't want the cluster to go down completely, and on the other hand it's for handling very large data sets that don't fit on one node. Those are the two aspects.

Okay, let's recap the basics of Elasticsearch. It's a Java-based search server. It uses HTTP and JSON everywhere. The search and filtering query DSL is very powerful; there are many queries for different use cases, and often you have to combine them with the analyzing process. There are lots of features supporting search, mostly from Lucene, like highlighting and suggestions, but also custom features like faceting using the aggregations. And, very important, it's a distributed system: the nodes can form a cluster.

Okay, let's look at the first option there is to access Elasticsearch from Java, which is also the oldest one: the so-called transport client. This has been available in Elasticsearch more or less forever, I think. With Elasticsearch 5.0 it is now a separate artifact that you can pull in with Maven or, here, with Gradle. You just say you want the transport client, and it will pull down some JARs, quite a lot of JARs. Then you define a transport address, which is one address for accessing your cluster; in my case it's running on the local machine, so localhost, port 9300. From that you build one of the clients, the PreBuiltTransportClient it's called right now, pass any settings that might be necessary, for example when you have a custom cluster name, and add the address. This client interface can then be used to do everything with Elasticsearch: the classic search applications, the indexing, administration, monitoring, everything.

Let's look at the search part. Again, this is the same query we have seen before. The good thing about the transport client API is that it's very close to the query DSL. So what you can often do is experiment with the query DSL, typing it into the browser or into the excellent Sense plugin, which is now also part of Kibana, and once you are happy with the result, you can take that structure and translate it nearly one to one into the Java application. The code above here runs the same query. If we recall what we had before, there is a bool query that contains a must section and a filter section; the must section contains the match query and the filter contains the term query. And we see the same thing here: a bool query containing the must section with the match query and the filter with the term query. So it's really easy to go from the query DSL to the Java code. There's a bit more around it: we start the search with the prepareSearch method, where we can give it the index and type, and then execute it. This is all executed asynchronously, so the actionGet here is the blocking call.
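Put together, a search with the transport client looks roughly like this. This is my own sketch against the 5.x transport client API; the index, type, field names and search terms are assumptions that mirror the food example.

```java
import java.net.InetAddress;

import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.transport.client.PreBuiltTransportClient;

public class TransportClientSearch {
    public static void main(String[] args) throws Exception {
        // Connect to a local node on the transport port 9300
        TransportClient client = new PreBuiltTransportClient(Settings.EMPTY)
                .addTransportAddress(
                        new InetSocketTransportAddress(InetAddress.getByName("localhost"), 9300));

        // Same structure as the query DSL: bool -> must (match) + filter (term)
        SearchResponse response = client.prepareSearch("food")
                .setTypes("dish")
                .setQuery(QueryBuilders.boolQuery()
                        .must(QueryBuilders.matchQuery("_all", "rice"))
                        .filter(QueryBuilders.termQuery("tags.keyword", "chicken")))
                .execute()
                .actionGet(); // the blocking call

        System.out.println("total hits: " + response.getHits().getTotalHits());
        for (SearchHit hit : response.getHits()) {
            System.out.println(hit.getSourceAsString()); // the original JSON document
        }

        client.close();
    }
}
```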
The search response object we get back is the same structure we have seen before as the response to a search. It contains the hits section, where we can for example get the total number of hits, and this hits section then holds the list of hits themselves, where we can access the original source of each document again.

When indexing data, we can go the same way. There are different options available; this time we start with one of the builders, the jsonBuilder, a statically imported method. It is very useful for constructing the object you want to index: you say startObject and endObject for the JSON structure, add fields, add arrays, start nested objects again, and so on. So you recreate the JSON structure using Java code. This can then be used as the source for the prepareIndex method; it is the document that is going to be indexed. You can pass a lot of different objects and types as the source document: very common is the builder we've just seen, or you pass a string, for example built with something like Jackson or Gson, and there are some convenience methods for building very simple documents or passing a map. So there are different options available.

Okay, we have seen that it connects to an existing cluster, and you might have noticed that it didn't use the same port that is used for HTTP. That's because it's using the transport protocol, which is why it's called the transport client. This is a binary protocol that is also used for the communication inside the cluster, so of course it is more efficient. It looks like this: basically all of the communication here is binary. One thing I didn't address before: I talked about high availability, but there's only one arrow going from our application to this one node. So if this node goes down, we might be in trouble even though there are two or more other nodes available, because our application doesn't talk to them. That's why the transport client, and the other two clients I'll be introducing later on, have a feature called sniffing. Sniffing allows our application to retrieve the state of the cluster from one of the nodes and then talk to all of them in a round-robin fashion, so it does client-side load balancing. When one node goes down, our application doesn't have a problem because it can still talk to the other nodes. You enable this by passing the setting client.transport.sniff, and it will do the client-side load balancing automatically.

So that's the transport client. The important points: it has full API support, as we saw when building the queries. It always supports everything in the version of Elasticsearch it ships with, because it uses the same code that Elasticsearch uses internally, so you never have to wait for a client library to catch up with new features. The communication is very efficient because it's a binary protocol, and it has client-side load balancing. But there are also some serious drawbacks with the transport client. Because you're using an internal protocol, you need to be very careful about the Elasticsearch versions used in the application and in the cluster: those need to be nearly the same, I think with some exceptions, and this makes upgrading a lot more difficult than it should be.
You have to take care that your application is upgraded when the cluster is upgraded, and the other way around, and that is not workable for all applications. And also the JVM version? Yes, the JVM version needs to be the same as well, and that might be even more dramatic, I think. Another drawback that can be just as bad is the dependency on the full Elasticsearch server. Even though the transport client is a separate artifact now, pulling in that JAR will pull down all the Java code of the Elasticsearch server, which means you will have dependencies on Lucene, on Guice, and on a number of other libraries. Especially for existing applications, or if you're integrating into an existing CMS or online shop, you will already have some dependencies, and it can be a real nightmare to make all of them work together; it can even be impossible. That's why Elastic introduced the REST client with version 5.0. But so far, are there any questions about the transport client or anything else I introduced? Fine. If there are any questions, just feel free to ask anytime.

Okay, of course, the picture is different now: our application accesses the Elasticsearch cluster using HTTP instead of the binary protocol. That's the big difference. The Elasticsearch nodes themselves still talk to each other using the transport protocol, but our application talks to Elasticsearch using HTTP. This makes it far more independent of the Elasticsearch cluster, of the versions, and even of the API, because the HTTP API doesn't change as much as the internal one. The REST client is available in Maven Central as well; instead of the transport artifact you pull in the REST artifact. And it has far fewer dependencies: I think it's actually one dependency with some transitive dependencies, it just uses the Apache HTTP client library. So it's a lot easier to integrate into existing applications, of course.

So how do you start with it? We already saw it on one of Matthias' slides: you add one or more HTTP hosts that are used for communication, and this is again one of the builders, Elasticsearch uses the builder pattern a lot, and it will create a RestClient. The important thing is that we don't have many dependencies, and because of that there is not much support right now for building query DSL structures. The REST client, as it is right now, is mostly for talking HTTP; there's not much support for querying and indexing. So, for example, this is what one might do right now for querying Elasticsearch with the REST client: you create one of those HTTP entities, I think this is also part of the Apache HTTP client, and you really pass the string you want to send as the payload. In this case it's a match_all query, because I wouldn't be able to fit a normal query on the slide. But even this might not be as bad as it seems, because when building an application you don't have that many different queries, so you can use something like a template engine, or just store your queries in text files and load them in the application; that can work fine as well. Then you perform the request, saying which endpoint you want to hit, and that's basically all. After that, you can retrieve the result of the request; again you get an HTTP entity back, and there's a convenience method for converting it to a string.
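As a sketch, a complete round trip with the low-level REST client might look like this; again my own minimal example against the 5.x client, using the match_all query mentioned above, with assumed index and type names.

```java
import java.util.Collections;

import org.apache.http.HttpEntity;
import org.apache.http.HttpHost;
import org.apache.http.entity.ContentType;
import org.apache.http.nio.entity.NStringEntity;
import org.apache.http.util.EntityUtils;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestClient;

public class RestClientSearch {
    public static void main(String[] args) throws Exception {
        // Talk to the node over HTTP on port 9200
        RestClient restClient = RestClient.builder(new HttpHost("localhost", 9200, "http")).build();

        // The query is just a JSON string passed as the request body
        HttpEntity body = new NStringEntity(
                "{ \"query\": { \"match_all\": {} } }", ContentType.APPLICATION_JSON);

        Response response = restClient.performRequest(
                "POST", "/food/dish/_search", Collections.<String, String>emptyMap(), body);

        // Convenience method that turns the response entity into a string
        String json = EntityUtils.toString(response.getEntity());
        System.out.println(json);

        restClient.close();
    }
}
```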
After that, though, you are completely on your own when it comes to doing something with this string. So you might need something like JSONPath or similar to get at the information you'd like to have. A lot more will be coming here: a separate query DSL for the REST client is being talked about, but I'm not sure how long it will take, and I think Matthias isn't either. Still, this will be the future of accessing Elasticsearch. I didn't show the indexing part because it's basically the same: you just pass a JSON string in the body.

The important thing is that the REST client is far less dependent on the Elasticsearch version and the JDK version. Something that can also be very important, especially for larger operations, is that there's a clean separation between the cluster and the rest of the application network: you don't have to open any ports besides the HTTP port to your cluster, and the internal communication stays internal. This can also be very beneficial when it comes to firewalls; I have seen very bad things happen with some corporate firewalls and the transport protocol. It has minimal dependencies, which is very good. There's an additional library that supports sniffing; it has a few more dependencies, on Jackson and a bit more, but with it you also get the client-side load balancing we have seen with the transport client. There's more, like error handling, timeouts, basic auth and so on. But for now there is no query support and no indexing support. That's the state of Elasticsearch 5.0. Elastic didn't offer an HTTP client until this version, but you were able to use HTTP even before. By the way, the REST client can also be used with Elasticsearch 2.x versions, so it's not tied to Elasticsearch 5.

The Jest client is an older client. It's a community project, so Elastic is not involved in building it, and it's an alternative REST client for Elasticsearch that is still maintained. It's available in Maven Central as well; it doesn't follow the versioning scheme of Elastic, though. It has a client object as well, which is created using a factory. It's really similar across all the libraries: you pass in the URL you want to access, you can enable multiple threads, and then you have the client that is responsible for querying, indexing and so on. Again, it doesn't have much support for building queries, so you will have your JSON string somewhere: you can keep it in a text file, build it using string concatenation, or even use the Elasticsearch classes for building it, but then of course you have the dependency on Elasticsearch again. It's a really similar API, but not the same. It also has builder classes where you can build the request and then execute the search. For the search result, one way to access the data is through a JsonObject structure, so you can navigate all the way down, but that is not what you normally do with Jest; accessing the JSON structure like that is far too complicated. The good thing about Jest is that it supports Java beans. So you build a Java bean, classic stuff with properties and getters and setters, maybe annotate one field as the JestId, in this case for the auto-generated ID, and that's all you have to do. Jest then populates this class from the JSON it retrieves from Elasticsearch, which is really nice, of course.
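A small sketch of what that can look like with Jest: the Dish bean, field names and index are hypothetical, and the API calls are written from memory of the Jest 2.x client, so treat the details as assumptions.

```java
import java.util.List;

import io.searchbox.annotations.JestId;
import io.searchbox.client.JestClient;
import io.searchbox.client.JestClientFactory;
import io.searchbox.client.config.HttpClientConfig;
import io.searchbox.core.Search;
import io.searchbox.core.SearchResult;

public class JestExample {

    // A plain Java bean; Jest maps the JSON source onto it
    public static class Dish {
        @JestId
        private String id;          // filled with the (auto-generated) document id
        private String food;
        private List<String> tags;

        public String getFood() { return food; }
        public void setFood(String food) { this.food = food; }
        public List<String> getTags() { return tags; }
        public void setTags(List<String> tags) { this.tags = tags; }
    }

    public static void main(String[] args) throws Exception {
        // Client creation via the factory, pointing at the HTTP endpoint
        JestClientFactory factory = new JestClientFactory();
        factory.setHttpClientConfig(new HttpClientConfig.Builder("http://localhost:9200")
                .multiThreaded(true)
                .build());
        JestClient client = factory.getObject();

        // The query itself is still a JSON string
        String query = "{ \"query\": { \"match\": { \"food\": \"rice\" } } }";
        Search search = new Search.Builder(query).addIndex("food").addType("dish").build();

        SearchResult result = client.execute(search);
        List<Dish> dishes = result.getSourceAsObjectList(Dish.class);
        dishes.forEach(dish -> System.out.println(dish.getFood()));
    }
}
```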
Again, you can just use your Java beans for anything you might want to do. Of course, the same is true for indexing: you can pass Java beans to Jest and they will be indexed, which is really nice. So Jest is an alternative HTTP implementation. It has been around for a long time, and I think quite a few people are using it, especially when the dependencies of the transport client are too much for your application. The queries are still strings, but you can index and search Java beans, which is really nice. It also supports sniffing; they call it node discovery, but it's basically the same as in the transport and REST clients.

That's all I wanted to show you about the clients. There's one more thing I want to show you, which is Spring Data Elasticsearch. This is not really a client; it's more an abstraction on a higher level. It uses the Elasticsearch clients and provides a bit more around them, which can be a nice thing especially for people starting with Elasticsearch, I would say. What is Spring Data? Who is using Spring Data already? A lot of people; that's good, because it's a really nice family of projects. What they do is provide abstractions for different data stores: for relational databases, for MongoDB, for Redis, I think, and a lot more. They don't try to press everything into one module; the specialties of each data store are still available. A key-value store has different characteristics than a document database, for example, and this is reflected in the projects. Something that can be very impressive is the dynamic repository implementations we will see in a moment. I think the most popular modules are Spring Data JPA and Spring Data MongoDB, which are both very nice examples of the approach. And of course there's a module for Elasticsearch as well. This one is built by the community, so it's not really done by the people behind the Spring framework but by people outside, yet it's still released together with the rest of Spring Data.

How it works is that you annotate your Java classes, in this case with annotations specific to Spring Data Elasticsearch, and you can tell it different things, for example the index name it should use and some more characteristics. The simplest way is to just add the Id annotation to one field, but you can also influence how the other fields will be stored, so there's support for the analyzing process as well. What you then do is use this class to create a typed interface. We just create an empty interface, called DishRepository in this case because we want to search for food, and it extends ElasticsearchCrudRepository, which is a special interface; there are more available in the project, but this is one you can use. Once you have done this, you just configure it. You can use Java config, or the XML namespace the project brings, where you can for example create an Elasticsearch client; this will be a standard transport client here. It uses the ElasticsearchTemplate for the basic operations, but the really important part is the piece below: it will scan our classpath for any interfaces that inherit from one of the Spring Data interfaces and automatically create implementations for them, so we don't have to write any code for accessing Elasticsearch. All of this is done automatically.
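As a sketch of those two pieces, here is a hypothetical annotated entity plus the empty repository interface, in the style of the Spring Data Elasticsearch 2.x API; the index and field names are made up for the food example.

```java
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.repository.ElasticsearchCrudRepository;

// The entity: Spring Data Elasticsearch stores instances of this class in the given index
@Document(indexName = "food", type = "dish")
class Dish {

    @Id
    private String id;        // the document id
    private String food;

    public String getId() { return id; }
    public void setId(String id) { this.id = id; }
    public String getFood() { return food; }
    public void setFood(String food) { this.food = food; }
}

// The repository: no implementation needed, Spring Data generates it at runtime
interface DishRepository extends ElasticsearchCrudRepository<Dish, String> {
    // derived query methods like findByFood(String food) can be added here later
}
```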
So in this case, we can again just create the Java bean and call the repository's save method, which is available automatically, passing it one or more Java objects. This will persist the data, index it, and afterwards we can of course search it again. These are the basic methods available on this CRUD interface: you can retrieve all documents or find one by ID. That's not very exciting for a search engine; we want to query our own fields, our text data. What we can do with Spring Data Elasticsearch is simply add more methods to this interface. For example, we can add a method findByFood, and just by using this naming convention it will build an implementation that queries the food field with the parameter that is given. This makes for a very nice language for querying your data; you can have things like findByFavoritePriceLessThan, which will do a range query on the price and return all the matching dishes for you. This is really nice if you don't want to get that deep into Elasticsearch and just want to do some basic queries. There's also support for adding the query as a JSON string, and you can even implement the methods yourself and do any custom logic you want. So that's the great thing about Spring Data: it's a high-level abstraction. It uses the existing clients, the transport client or an older option that is not recommended anymore. There is a pull request for using Jest for accessing the data, so it might even be possible in the future to use HTTP communication, and I think somebody might be working on support for the REST client as well. You work with entity beans, and the dynamic repositories are really nice. The HTTP support is in the making. But to be honest, the project itself doesn't move at the fastest pace, unfortunately; currently it's stuck on version 2.2 of Elasticsearch, and a lot of features are missing. So if somebody is interested in contributing a bit of development work to one of the Spring Data projects, this would be a very good place. My impression is that it's more for people who either don't want to dig that deep into Elasticsearch, or who are using Spring Data already and want to use it here as well; normally you would use one of the clients instead.

So, that's all I have. We saw the transport client, which has the benefit of full API support, but you have to live with the Elasticsearch dependency. The REST client uses HTTP but is lacking features right now; that will change, and I'm sure it will be a very nice option in the future. And there's Jest, which you can also use right now. It's a bit difficult to say what to recommend, because once there is good query support in Elastic's own HTTP client, the Jest support may not be kept up that well anymore, and the APIs are a bit different, so you might have to switch at some point. And there's Spring Data Elasticsearch, which is really more suited for people either starting with Elasticsearch or already using Spring Data and wanting to use it everywhere. For the transport client and the REST client, you can find a lot of information in the reference manual on the Elastic website. Jest is on GitHub; it used to be by a company called Searchbox, and I'm a bit concerned that this company doesn't seem to exist anymore, but there's still development on the project. Spring Data Elasticsearch has its own documentation as well. And all the example code is also on GitHub, on my account.
And I will also publish a transcript of this talk, I wrote all of it down, on my blog, so you can read all of it there again. So thank you. I'm happy that both my voice and the laptop battery lasted to the end. Are there any questions?

About the concept of sniffing: it allows your client to talk to multiple nodes in a cluster and not be stuck on a crashed node. But what if a node crashes suddenly? Is there intelligence in the client to shut it off from the pool of nodes?

It differs a bit between the implementations of the different clients, and correct me if I'm wrong: the transport client, for example, will check every five seconds. It does a special request against the Elasticsearch API; Elasticsearch has a monitoring API where you can retrieve the state of the current cluster, and the client uses this to build the information about which nodes to access. As a crashed node will not be in this cluster state, it will not be queried again. It might be queried once, I think, or even a bit more, but after a while it should be fine again. I think it checks every five seconds; in other clients it's a few minutes, but in all of them you can configure what happens. In the new REST client there is also some scoring based on connection failures, so it applies something like a back-off algorithm. Yes, there is good functionality for this case.

Is there a concept of guided search, like Oracle Endeca, where you add more filters and it narrows down the results? How do we achieve this kind of thing?

If I understand correctly, this is the faceting feature again, like what you can see on Amazon when clicking on the categories. You would use the aggregations feature, which can build these facets, and then you add filter queries for refining the search result. This is a very common feature to use, and especially with larger datasets it can be very beneficial for the user, because it helps them find the right thing: at first they might not know exactly what they are looking for, but then they can see which facets are available and just click on one.

Okay, if there are no more questions, you can also reach me by email or Twitter or anything, and I'm around here for a few more minutes, I guess. Thank you.