All right, good morning, everyone. Thanks for coming so early, and please welcome Jiri Holusa with his talk about Infinispan 8. OK, so good morning, and thank you for coming to this presentation about Infinispan 8. My name is Jiri Holusa, and I work in the JBoss Data Grid Quality Engineering team at Red Hat. JBoss Data Grid is the supported version of Infinispan, which I will be talking about today. I would like to go through this presentation with as many examples as I possibly can, so I prepared a live demo for you. I also know that not everybody knows Infinispan, so I will start with a short introduction: what Infinispan really is, how you can use it, and how you could benefit from it. So let's get started. What is Infinispan? Infinispan is a distributed in-memory data store. What does that mean? It's distributed: the data is replicated or distributed all over the cluster when you install it. It's in-memory, and it's usually used as a cache. I will be talking about caches all the time, because the basic implementation of the API is named Cache. I'm pretty confident you all know what a cache is; it works exactly like a hard-drive cache when you want to accelerate operations. And it's a distributed data store: when you want to distribute data all across a cluster, Infinispan does it for you. It doesn't expect anything from you beyond configuration; we'll see an example in a minute. Infinispan comes with basic APIs, like a very simple map-like API; the Cache interface is actually an implementation of ConcurrentMap, so it's very easy to use. You use the basic operations you know from the Java Map API: get, put, putIfAbsent, size, keySet, and so on. It also has other APIs, including the new functional map API, which I will introduce today and which is new in Infinispan 8. I forgot to say that Infinispan is a NoSQL data store, and here is what that means.
When I say SQL databases, you probably think of a relational database: databases, tables, rows, columns, records, and so on. Everybody knows it. A NoSQL database is different in that it doesn't have a fixed schema, so no tables. When I want to put something in the map, I put the whole entry, and when I want to retrieve it, I get the entry as it is. So I can put, let's say, a Person entity or a Product entity, whatever, and I can retrieve it back, just like with a basic Java map. That's a rough description of NoSQL data stores: they don't have a fixed schema, they have a flexible schema. Infinispan can operate in two modes, library mode and client-server mode. Library mode means that you bundle the Infinispan library with your application, your WAR file, JAR file, or whatever. So when I open your JAR file, I see the Infinispan JAR, the Infinispan library, inside it. In contrast, in client-server mode, as you probably know, the application is completely independent of the data store: the application runs on one server, and the data store, the Infinispan server, runs on another node, and they communicate remotely. That's the difference between library and client-server mode. As I said, the basic unit of operation is the cache, and it can operate in three basic modes. There are actually four, but three of them are the basic ones. Local mode is when you have the Infinispan cache on one single node. It's basically a map with extra features like transactions, eviction, expiration, querying, and so on; it's like a super map. But the real value of Infinispan shows when you use it in a clustered environment.
In replicated mode, you have several nodes across the cluster; when you put an entry on any one of them, Infinispan will distribute the entry all across the cluster, so all the nodes will have all the values. This is great for failover handling, because all the nodes have all the data out of the box. You don't have to do anything besides configuration; Infinispan distributes the data for you. So whenever any node goes down, you will still be able to continue working, which is pretty neat. I think this is the right time to show you a very basic example of Infinispan for those who are not familiar with it. So I prepared a demo: entering and listing products through the Cache API. It's basically a create-and-retrieve operation. How does it look in code? As I said, Cache is basically an implementation of ConcurrentMap, so you should be really familiar with this code. I retrieve the cache from some kind of cache manager; you don't have to know why. Then I create the entry and put it with a key and value. That's it. It's really easy; everybody understands it. Now, I was talking about clustering, about replication of the data. If you notice, this server is on port 9080, and here I have another one with a port offset of 100. So I will try to add a new product; I don't really care what. I enter it, the product is successfully added, and now I will try to retrieve it from the other server. I didn't do anything special. You saw the map, you saw the put operation. When I go to list products, I see the product there. Infinispan did this for me out of the box. That's basically the basic use case of Infinispan. So where did I configure it? Pardon? [Question from the audience.] Yeah, sure. Yes, it can be. Pardon? So the question is whether Infinispan can be used for asynchronous putting, as a queue, a shared queue.
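Since the demo code itself isn't reproduced here, a minimal sketch of what it looks like: because Cache implements ConcurrentMap, the same calls behave identically against a plain ConcurrentHashMap, which is what this runnable sketch uses. The Product class and the cache name are hypothetical, stand-ins for the demo's entities.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class BasicCacheDemo {
    // Hypothetical value type; in the real demo this would be the Product entity.
    record Product(String name, double price) {}

    public static void main(String[] args) {
        // With Infinispan this would be something like:
        //   Cache<String, Product> cache = cacheManager.getCache("products");
        // Cache implements ConcurrentMap, so the operations below are identical.
        ConcurrentMap<String, Product> cache = new ConcurrentHashMap<>();

        cache.put("shoes", new Product("shoes", 49.9));        // create
        Product p = cache.get("shoes");                        // retrieve
        cache.putIfAbsent("shoes", new Product("shoes", 0.0)); // no-op: key already present

        System.out.println(p.name() + " " + cache.size());
    }
}
```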
I'm not sure I fully understand your question, so maybe I could answer it at the end; would that be OK? Because I don't know how the time will go. But I definitely want to answer it later. So when I look at the configuration of the cache manager, the other stuff is not really important. The really important part is that I'm enabling cluster mode, and this says: replicate everything on all the nodes in the cluster. That's it. Besides this really basic... I just have to kill one of the servers, sorry. Besides this really basic, pardon? [Question from the audience.] No, no, this is done lazily. When I try to retrieve the entry from the other node, it will retrieve it then, so you don't have to wait, OK? There is also a distributed mode; replicated mode is just a specific case of distributed mode where the number of owners equals the number of nodes, so all the data is replicated to all the nodes. In distributed mode, you can replicate each entry to an arbitrary number of nodes, a subset, let's say. OK, so back to the presentation. This was the very basic usage of Infinispan. What other features can it handle? Expiration and eviction, which I will maybe introduce later, transactions, failover handling when a node goes down, where Infinispan automatically replicates the data so you will not lose it, and so much more. But this presentation should be about new features in Infinispan 8. So what's new in Infinispan 8? It's based on Java 8, so it leverages the new Java APIs as much as possible, the Stream API in particular. I will introduce the new APIs, the querying enhancements, the new web-based admin console, and some other minor stuff. OK, so let's get started with the new APIs. The functional map API. This is not a replacement for the basic Cache API I was showing you a few moments ago; it's a supplementary API. It provides a new way to operate on the cache.
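For reference, the "enable cluster mode and replicate everything" configuration looks roughly like the following declarative fragment. The element names follow the Infinispan 8 XML schema as best I recall, so treat this as an illustrative sketch rather than a copy-paste configuration; the cluster and cache names are made up.

```xml
<infinispan>
   <cache-container default-cache="products">
      <!-- The transport is what makes the nodes form a cluster -->
      <transport cluster="demo-cluster"/>
      <!-- Replicated: every node holds every entry.
           SYNC: a put returns only after the other nodes acknowledge it. -->
      <replicated-cache name="products" mode="SYNC"/>
   </cache-container>
</infinispan>
```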
And it also gives us new features, which I will show by example. So let's see: I have the live demo again, using the functional API. I killed my server, of course. So, using the functional API, I will put another product; it doesn't really matter what. It does the same thing as I showed you before. How does it look in the code? This is what it really looks like. The functional map API needs some kind of entry point: I create a FunctionalMap implementation from the cache, from the advanced cache. Then I retrieve one of the implementations for operations over the cache, like a write-only map, a read-only map, or a read-write map. And then I can execute operations via the eval method, or evalMany or evalAll, I think, and I can use a lambda to operate on the entry. So what's the difference between the classic API and this one? This one is completely asynchronous. All the operations return a CompletableFuture, so nothing blocks. The code flows like this: it executes the eval and goes on, and it's blocked only by the future's get, which I can call later. This is just an example. So this is a completely asynchronous API, and what I do here is just put an entry. So you might argue: why do I do this? It looks more complicated. It's still simple, but it's more complicated than cache.put. Why do it if I get the same result? Why the hell did we do this? And I'm going to show you, because there is one pretty nice use case where you can benefit from it. Suppose you have the following scenario: you have a cache of entries, like products, so big, complex values, and you want to retrieve only one property from them, say the names. So I want to do something like this: I want to show only the product names. In the basic API, the answer is pretty obvious. I'm a really big fan of the Java 8 Stream API, by the way.
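Since the eval methods all return CompletableFuture, the non-blocking flow described here is ordinary CompletableFuture composition. A runnable sketch with plain java.util.concurrent, where the runAsync call is a hypothetical stand-in for an eval invocation:

```java
import java.util.concurrent.CompletableFuture;

public class AsyncEvalSketch {
    public static void main(String[] args) {
        // Stand-in for something like writeOnlyMap.eval(key, value, ...),
        // which likewise returns a CompletableFuture and does not block the caller.
        CompletableFuture<Void> put = CompletableFuture.runAsync(
                () -> System.out.println("entry stored"));

        // Execution continues immediately; we can chain follow-up work...
        CompletableFuture<String> done = put.thenApply(v -> "done");

        // ...and block only when we actually need the result, via get()/join().
        System.out.println(done.join());
    }
}
```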
So I use it as much as possible. With the basic map, how would you do it? You retrieve all the entries, and then you just select the names. That's what it really does: I have to retrieve all the entries, all the values. But let's see how it goes with the functional map API. When I go to get all the names, let me move it a bit: I create the entry point, the functional read-only map, and then I can execute a lambda on the keys. And here is the big difference: the lambda gets serialized and is transported to the node where the data lives. It is executed there, and the result is transported back to me. So with the basic Cache API, the whole Product entity goes over the network, but here, only the strings go through the network. Suppose you have a very big entity, a very large entity: this can be a huge performance boost, because you retrieve only the information you really need. This is the use case you can handle only with the functional map API, not with the classic one. It's a very nice feature, and it's motivation to use it when you need to decrease the payload. Let's go a bit further. Another nice new API is the distributed streams API. If you're familiar with Java 8, you probably know the Java Stream API, and this is the distributed implementation of it. You operate basically as you would with the normal Stream API, and Infinispan will distribute the work for you. So it runs in parallel, and it's topology-aware, meaning the jobs are targeted at the nodes where the data lives. And that's it: it's parallel, so you're getting parallel execution out of the box, with nothing to configure. So let's see an example. Let's say I want to compute some statistics, like the average price and the total number of pieces, some sum. Immediately, you think of something like sum and average.
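The bandwidth argument can be illustrated with a tiny hypothetical helper that plays the role of the remote node: the function ships to where the data lives, and only the projected strings come back. Everything here (the evalAll helper, the store, the Product fields) is made up for illustration; it is not Infinispan's API.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class ProjectionSketch {
    record Product(String name, double price, byte[] image) {}

    // Pretend this body runs on the node owning the data: only the results
    // of fn would cross the network, never the full Product values.
    static <K, V, R> List<R> evalAll(Map<K, V> store, Function<V, R> fn) {
        return store.values().stream().map(fn).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Product> store = Map.of(
                "p1", new Product("shoes", 49.9, new byte[1024]),
                "p2", new Product("hat", 19.9, new byte[1024]));

        // With the classic API the whole Product (including the large image)
        // travels to the client; here only the names do.
        List<String> names = evalAll(store, Product::name);
        System.out.println(names);
    }
}
```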
And there is also a grouping example: I want to know how many products have a number of pieces equal to one. OK, so it's a really basic example. Let's see how I computed it using distributed streams. As you see, it's pretty simple and pretty straightforward. I retrieve the value stream, just like I would with a normal map, and then I do some Java 8 Stream API magic. If you're not familiar with it, it might look strange, but what it really does is average the prices, sum the pieces, and group by the pieces. So it's really just syntactic sugar. The thing is that I called the cache's value stream, and I got a distributed implementation of streams. So it's really nice: I didn't have to do anything, and since it's parallel, it can be much faster. Let's go through other enhancements in Infinispan 8, namely querying. As I said before, Infinispan is a key-value store. What does that mean? You have a value and you have a key; when you don't know the key, you cannot retrieve the value. That's the basic definition of a key-value store. But sometimes you don't know the key, right? So it might be useful for you to somehow query the data, and Infinispan has had this ability for a long time. But the new version also adds aggregations, like min, max, sum, whatever; grouping, the classic GROUP BY thing from SQL; and continuous queries, which are a bit more important, so I will show them later. Again, just a basic example of the aggregations: I want to compute the same statistics as I did with distributed streams, and I want to show you how it looks in Infinispan. I basically create some query factory, then create a query, and you see that I'm using the aggregations, average and sum. I think this is really readable. Then I retrieve the values, and here I use the GROUP BY. I think this is pretty straightforward, and it's just for you to see how it really looks.
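Because the distributed stream implements the standard java.util.stream interfaces, the exact same pipeline compiles against a plain collection. A runnable sketch of the statistics just described (with hypothetical Product fields), where cache.values().stream() would simply take the place of values.stream():

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class StreamStatsSketch {
    record Product(String name, double price, int pieces) {}

    public static void main(String[] args) {
        List<Product> values = List.of(
                new Product("shoes", 40.0, 1),
                new Product("hat", 20.0, 1),
                new Product("coat", 60.0, 3));

        // On Infinispan this would come from the cache's value stream, which is
        // distributed and topology-aware; the pipeline itself is unchanged.
        double avgPrice = values.stream().mapToDouble(Product::price).average().orElse(0);
        int totalPieces = values.stream().mapToInt(Product::pieces).sum();
        Map<Integer, Long> byPieces = values.stream()
                .collect(Collectors.groupingBy(Product::pieces, Collectors.counting()));

        System.out.println(avgPrice + " " + totalPieces + " " + byPieces);
    }
}
```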
So the new feature is that these two things, aggregations and grouping, weren't implemented in previous versions of Infinispan. If you think about it, it may seem obvious, so how could they not have been there before? But keep in mind that this is a NoSQL data store: the values can be complicated, complex values, so this is not as straightforward to implement as in a relational database. And it's really nice, really useful, of course, to have a GROUP BY clause. I was also talking about continuous queries. [Question from the audience.] So, to repeat the question: what happens if I run a query during an update of the cache, basically, or while putting a new entry? I think it might be pretty tricky, but I would bet that the query would be executed as if the change hadn't happened yet, without the new value, OK? But this is a really deep implementation detail; I can definitely verify it for you from the documentation, but I would bet 90%. OK, 87. [Question from the audience.] So the question is whether the query API supports nested querying. What I showed you is Infinispan's own query API; it's just a layer on top, but you can always use pure native Lucene queries, so it can do everything that Lucene does. So yes, OK? And so the new feature is called continuous queries. What is it? You create a query, store it somewhere, and attach it to a cache with a listener. And whenever a new entry is added to or removed from the cache and matches the results of the query, you get notified. I think that describes it pretty well, so I will show you an example again. Here I have continuous queries. Let's say this is some kind of system notification, like the unread notifications you have on GitHub. So here is a list of notifications that I haven't read yet.
So I will mark the notification as read and insert a new product. We don't want any exceptions, so no price. I add the product, look at the continuous queries, and there it is. So how did I implement it? When we look at the code: I'm constructing the query, a very simple one that retrieves all the products. I create a listener; I will show you its implementation in a moment. Then I say that I want to create a continuous query for the cache and attach the listener to the query. That's it. I think this is also pretty straightforward. Whenever a new entry joins or leaves the results of the query, the listener gets notified. And what does the listener look like? It's a very simple implementation of a continuous query listener. You have to override two methods, result joining and result leaving, with the obvious meanings. When an entry joins the result, it provides me the key and value; when it leaves the result, it provides me the key, and I can react to that event. [Question from the audience.] Sure. Yeah, yes, this one will be triggered if you update the entry, yes, exactly. OK, so these were the API and programmatic features new in Infinispan 8. And I also want to show you something new: the Infinispan developers, not me, also came up with a new management console. Suppose you have a cluster of Infinispan nodes and you want to manage it somehow: create caches at runtime, see the statistics of a cache. I will show you a demo, because a demo is worth a thousand words. I will just demonstrate it, because it's a management console for the Infinispan cluster. So here's what it looks like. I log into it; I have two servers running. It looks ugly in this resolution, but I cannot do anything about it. Here I see a list of cache containers.
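The joining/leaving semantics can be modeled in a few lines of plain Java. This is a toy stand-in, not Infinispan's classes: a predicate plays the role of the query, and the listener fires when an entry enters or leaves the predicate's result set on a put.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

public class ContinuousQuerySketch {
    interface Listener<K, V> {
        void resultJoining(K key, V value);
        void resultLeaving(K key);
    }

    // Toy cache that re-evaluates the "query" (a predicate) on every put.
    static class WatchedCache<K, V> {
        private final Map<K, V> data = new HashMap<>();
        private final Predicate<V> query;
        private final Listener<K, V> listener;

        WatchedCache(Predicate<V> query, Listener<K, V> listener) {
            this.query = query;
            this.listener = listener;
        }

        void put(K key, V value) {
            V old = data.put(key, value);
            boolean was = old != null && query.test(old);
            boolean is = query.test(value);
            if (!was && is) listener.resultJoining(key, value); // enters the result set
            if (was && !is) listener.resultLeaving(key);        // leaves the result set
        }
    }

    public static void main(String[] args) {
        StringBuilder log = new StringBuilder();
        // "Query": all unread notifications.
        WatchedCache<String, Boolean> cache = new WatchedCache<>(unread -> unread,
                new Listener<String, Boolean>() {
                    public void resultJoining(String k, Boolean v) { log.append("+").append(k); }
                    public void resultLeaving(String k) { log.append("-").append(k); }
                });

        cache.put("n1", true);  // unread: joins the results
        cache.put("n1", false); // marked as read: leaves the results
        System.out.println(log);
    }
}
```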
I can see a list of caches. I can also search among them, of course: say I want to see only transactional caches, or I want to see the statistics of a cache. It looks much nicer at normal resolution, but I can see cache statistics, how many read hits and misses, and also the properties and much other stuff. I can configure the cache, and I can also see information about the servers running in the cluster, in domain mode. Here I see I have two running servers and one stopped; I can enable it, and so on. So this was a quick walkthrough, but it's a really nice feature, and this management console is evolving every day; it's really under heavy development. I would also like to mention a few more core enhancements. First, an eviction enhancement; I will try to go through it quickly. Eviction means you can have a bounded cache: you can say this cache will hold only 100 entries, so whenever you put the hundred-and-first entry, one of them will be evicted, removed from the cache. This feature has been in Infinispan from the beginning, but the new thing is memory-size eviction: you can say this cache can hold only 100 megabytes of data. This is done via an estimation, and it works under certain circumstances, but it's not a blocker, really. One more very nice core enhancement is cluster-wide expiration events. Expiration is the ability to give a lifespan to an entry: you say this entry will last only for four minutes. This feature has also been in Infinispan for a long time, I think from the beginning, but now it's implemented so that you can get a cluster-wide notification. So whenever an entry expires, you get a notification, very similar to the continuous queries: this entry with this value has expired. That could be very useful, too.
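Count-based eviction, the bounded cache just described, is the same idea as a classic LRU map. A stdlib sketch of the behavior (not Infinispan's eviction, which is configured declaratively and also supports the memory-size estimate mentioned above):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BoundedCacheSketch {
    // Access-ordered LinkedHashMap that evicts the least-recently-used entry
    // once the bound is exceeded -- the behavior a bounded cache gives you.
    static <K, V> Map<K, V> bounded(int maxEntries) {
        return new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;
            }
        };
    }

    public static void main(String[] args) {
        Map<String, String> cache = bounded(2);
        cache.put("a", "1");
        cache.put("b", "2");
        cache.put("c", "3"); // bound is 2, so the oldest entry ("a") is evicted
        System.out.println(cache.keySet());
    }
}
```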
One last slide is about the new integrations of Infinispan with Hadoop and Apache Spark. The integrations are about running Hadoop or Spark jobs over the Infinispan cluster, so Infinispan serves as a data source for the job. I don't really know much about it, but there is another guy who knows everything about it: Vojta Juranek, sitting right here. And he has a presentation, I think, at 12 o'clock. So if you want to know more about this particular Apache Spark and Infinispan integration, or about it in general, you should definitely see his presentation. And that's it from my side. I think I'm ready to answer all your questions; hopefully I will be able to. So if you have any questions, feel free to ask. [Question from the audience.] You mentioned that the client supports distributed mode? Client-server, client-server mode, OK. Does it mean that the client sends all the data to all the nodes? No, no, no. Client-server mode basically means: either you have library mode, where Infinispan is bundled with the application, or you have client-server, which means the server is a standalone server; you can imagine something like a JBoss AS server. It's a standalone Infinispan server that acts as the data store. The servers cluster among themselves, and I can connect to them remotely; we are the client. [Follow-up from the audience.] OK, so, let me put it this way: maybe I misheard, but you mentioned that the server has two modes, distributed and clustered. Oh, replicated and distributed, right. Yeah, replicated and distributed. OK. And when you have distributed mode, does the client application send the data to all nodes? Or what's the difference between these modes? Yeah, so in distributed mode, you have a cluster of Infinispan nodes, and every entry is replicated; there are as many copies of a single entry as you specify. So let's say the number of owners in distributed mode is two.
Then even if I have a cluster of five nodes, every entry will be in the cluster twice. So I can be sure that whenever one node goes down, I will not lose any data, because I have two copies. OK, so this is distributed mode. And replicated mode is just a special case of distributed mode where the number of owners equals the number of nodes, so all the data is replicated on all the servers. Any more questions? [Question from the audience.] Can you tell us maybe some good examples where this is already used in production? Why is this cache better than memcached or other kinds of caches? OK, I'm just thinking whether I can. In production, yeah, so thanks for the question. We have a number of users, but I don't think I can name them. But there are several differences; like, why would you use Infinispan? First of all, performance. I'm not sure if I can say it, because I don't know if it has been published, but I think yes: in comparison to memcached, the performance is pretty nice. But it aims differently, because it has many other features that you can use. For instance, Redis, if you're familiar with it, is a really pure key-value store: no querying, I don't think. I've only met Redis slightly, but I think there is no querying. It's used for pure key-value purposes, but Infinispan can aim at other ones. So there are different aims, I would say. It's also Java-based, which can be a benefit over other stores; somebody prefers Java-based. There are also some performance comparisons, but as always, different products are better in some areas, so I cannot really say that Infinispan is the best; I cannot say it, because it can't be. And yeah, there are lots of features; I would say that this is the biggest advantage of Infinispan.
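The owner-placement idea above (numOwners copies per entry regardless of cluster size) can be sketched with a toy hash placement. Infinispan's real consistent hash is segment-based, so this is only an illustration of the invariant, not of the actual algorithm.

```java
import java.util.ArrayList;
import java.util.List;

public class OwnerPlacementSketch {
    // Toy placement: pick numOwners consecutive nodes starting from the key's
    // hash. It only illustrates that each entry lives on exactly numOwners of
    // the cluster's nodes, however large the cluster is.
    static List<String> ownersFor(String key, List<String> nodes, int numOwners) {
        List<String> owners = new ArrayList<>();
        int start = Math.floorMod(key.hashCode(), nodes.size());
        for (int i = 0; i < numOwners; i++) {
            owners.add(nodes.get((start + i) % nodes.size()));
        }
        return owners;
    }

    public static void main(String[] args) {
        List<String> cluster = List.of("node1", "node2", "node3", "node4", "node5");
        // numOwners = 2: five nodes, but every entry is stored exactly twice.
        List<String> owners = ownersFor("product-42", cluster, 2);
        System.out.println(owners);
    }
}
```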
You can also configure it with an offline store, so you can evict the entries to the hard drive, you can query them, there's failover handling; there are a lot of features that other projects maybe don't have. Infinispan is also directly in WildFly, so it's battle-tested and proven to be rock solid. There are, of course, many other deployments in various products, and some open-source frameworks, like Apache Marmot, are built on top of it. Or, for example, if you visited yesterday's presentation about Keycloak, behind the scenes it again uses Infinispan for clustering the applications. So there are quite a lot of projects which use Infinispan under the hood. And of course custom projects use Infinispan that we don't know about; it's open source, so everybody can download it, so we probably don't have any list of users. I can also give one particular example that I'm familiar with: at my university, Masaryk University, they use it in one really interesting project running a similarity-search application. Similarity search is basically where you have an application, you upload, let's say, an image, and you say: give me the 10 most visually similar images, and it retrieves them. And since they have a large amount of data, like hundreds of gigabytes, they used Infinispan as a backing store because of its distributed execution framework, which is basically something very similar to distributed streams: you can execute a particular task on particular nodes, on the keys. And the leader of that project said this is the feature that he didn't find in any other NoSQL data store: you can perform the operation only on the keys that you want, you can select the keys and execute the task on them, not on all the keys like MapReduce. It's different. So this is just one special case to maybe answer your question.
One more addition to this: I realized another famous project which uses Infinispan, and that's Hibernate. You probably know Hibernate; it again uses Infinispan, as a second-level cache. So that's another famous project which uses Infinispan. Are there any other questions? I didn't forget about you; I definitely want to answer your question, but I feel I will need to think about it harder. Or contributions are welcome, anyway. [Question from the audience.] Do you use ZooKeeper for the distributed mode? What was it? ZooKeeper, the consistent distributed map: you have a cluster of nodes, they all have to be consistent, and you have to know when they go down. How is this managed behind the scenes? Is this something in-house, or is it ZooKeeper behind it, or Consul, or, I heard yesterday about etcd? To be honest, I would have to check, because I'm not familiar with the ZooKeeper thing, so I don't know exactly what you're talking about. So, I guess I'm out of time. Out of time? So thank you very much.