Crate is actually a database which is developed in Austria, and as you can hear from my accent, I'm one of the few people in the team who is not Austrian, I'm German. The other guys are really Austrians, which basically means that when they talk Austrian dialect I can't understand anything. So we are supposed to speak the same language, but that's sometimes not the case. Crate is not very well known in the US, I guess that's also why, for the late hour, there are not a lot of people here, but I think it's actually a really, really cool open source database project. It actually won the TechCrunch Disrupt Startup Battlefield contest. So let me give you a little bit of an overview; the title of the slide is Big Data from the big Austrian mountains. So let's go into it. What is Crate? Crate is a database which was built to scale massively. It's not an in-memory database, which are very popular right now; it does of course cache in memory, but the main idea behind it is to have a database with persistent storage. It has very powerful search capabilities and it can run very powerful data analysis in a large cluster. This is still a very young product, so there are a couple of projects we know of, and a lot of projects we don't really know of because they simply use the open source version of our product and we don't know what kind of scale they have. Of the production systems we know, one of the bigger ones is something like 120 nodes running on bare metal, with about 3 billion inserts per day. So that's the kind of scale you can get to. It's a really scalable database. By the way, there's one thing I always kind of mix up or have difficulties with.
When we're talking about Crate, there's a slightly strange thing: I basically say Crate is built on NoSQL technology, on a NoSQL architecture, but the database understands SQL. So this is a special case. We have a lot of features you will recognize from NoSQL databases, but it understands SQL, and that's why it's a NewSQL database. One of the goals of Crate is to be extremely simple to operate. It's a database which is specifically built for easy operation, and especially to attract DevOps people and developers, because you can simply install the database by yourself, and up to a certain degree, probably many dozens of nodes, you don't really need a dedicated database person for it; you can more or less run it by yourself. Crate scales almost linearly when you add new nodes, and the search capabilities, or basically the queries, scale with it. You can run Crate on commodity hardware if you want to. It will of course do better on server hardware with SSDs, a lot of CPUs and so on, but it scales out horizontally very well on commodity hardware. There is no centralized storage needed; it's a shared nothing architecture, and storage is all local. And Crate is extremely elastic: adding new nodes or taking nodes away doesn't really require a system administrator. You can simply stop the process and shut the server down, and Crate will figure it out by itself. Or you just start a new node, and the nodes in the cluster will detect the new node, start to stream data to it, and give it something to work on.
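The elasticity described above, shards flowing onto a node that joins the cluster, can be sketched with a toy model. This is a conceptual illustration in plain Python, not Crate's actual rebalancing algorithm; all names are made up.

```python
# Toy sketch (not Crate's actual algorithm): redistributing shards when
# nodes join or leave a shared-nothing cluster. All names are illustrative.

def rebalance(shards, nodes):
    """Assign shards round-robin so each node holds roughly the same number."""
    assignment = {node: [] for node in nodes}
    for i, shard in enumerate(shards):
        assignment[nodes[i % len(nodes)]].append(shard)
    return assignment

shards = [f"shard-{i}" for i in range(6)]

# Two-node cluster: three shards per node.
before = rebalance(shards, ["node-1", "node-2"])

# A third node joins; the cluster streams some shards to it automatically.
after = rebalance(shards, ["node-1", "node-2", "node-3"])

print(len(before["node-1"]), len(after["node-3"]))  # 3 2
```

The point of the sketch is only that no administrator intervenes: the assignment is recomputed from the current node list, and the data moves accordingly.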
It's very resilient. In Crate we have all the good stuff like sharding, partitioning and replication, so as long as you choose a replication factor which is big enough, the cluster will be extremely resilient. For example, let's say you have a replication factor of two and you take two nodes down, because accidentally you just rebooted them; then some of the data will be unavailable for a while, and when the nodes come up again the data will be available again. So even that does not kill the cluster. It has read-after-write consistency. It's a typical NoSQL architecture: we don't really have transactions, but you have atomic consistency, and when you write or create a row in the database and immediately read after that, you can be 100% sure that you get the changed row back. That's read-after-write consistency. When you look at the database solution space, I don't know if you noticed, but there are a couple of reports coming out every year about new databases; there are probably a couple of hundred different databases in existence right now. There's a wide mix of databases with different storage technologies and different data models behind them, a huge variety you can choose from, and a lot of them are open source like Crate. To give you an idea where Crate is located: Crate is based on Elasticsearch, and I will show you some slides on that later. We didn't reinvent the wheel; we reused some other open source projects. So even if Elasticsearch sits here in the document store bucket, we see Crate as aimed specifically at cloud usage. I'll show you later why that's the case.
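The read-after-write guarantee just described can be illustrated with a toy model. This is a conceptual sketch of the semantics, not Crate's implementation: a write is only acknowledged once every copy has applied it, so a read issued immediately afterwards always sees the new row.

```python
# Conceptual sketch of read-after-write consistency (not Crate's internals):
# a write is acknowledged only after primary and replicas have applied it,
# so a read issued right after the acknowledgment sees the new row.

class ToyTable:
    def __init__(self, replicas=2):
        # primary copy plus N replica copies
        self.copies = [{} for _ in range(replicas + 1)]

    def write(self, key, row):
        # apply to every copy before acknowledging
        for copy in self.copies:
            copy[key] = row
        return "ack"

    def read(self, key):
        # any copy now has the row, because write() only returned
        # after all copies were updated
        return self.copies[0].get(key)

t = ToyTable(replicas=2)
t.write("user:1", {"name": "Alice"})
print(t.read("user:1"))  # {'name': 'Alice'}
```

This also hints at the performance trade-off mentioned later in the talk: waiting for all copies before acknowledging costs latency, which is why the behavior is tunable.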
Our problem right now is that we are really drowning in data. There has never been so much data generated as in our time today, and at the same time, what do we really do with the data? We have the problem that a lot of data is generated, but we are just not really able to analyze it in a proper fashion. There's an estimate, and by the way I have to give credit to Cassandra, or rather to DataStax, because these numbers are from one of their slides, that there is about a zettabyte of data in our world today, which is probably about 135 gigabytes of data for every person on this planet. So what are the challenges in this kind of environment? If you compare the properties of traditional databases with big data: with traditional databases my database sizes range from gigabytes to terabytes, while when we talk about big data we are talking about petabytes or exabytes. With traditional databases I'm talking about a centralized database, while with petabytes of data this is clearly not possible anymore, so you need a distributed database. That's what Crate is. Another challenge is that while in a traditional database a lot of the data is structured, with table structures which don't change very often, in the big data space we very often have unstructured or semi-structured data which can change from one day to the next, from one record to the next. That's also something Crate can manage. And in the traditional data space we have stable data models which can be very deep; in the big data space that's a problem, so we actually try to have flat schemas, because otherwise it's not really manageable.
Very often the relations we have in the traditional sense are resolved as embedded relations in the data itself. In the traditional data space I have a lot of complex relationships; in the big data space I have fewer interrelationships, because a lot of them are embedded. And in the big data space I have real-time data, I have analytics I want to do, and I have search. One of the patterns we have noticed in the last couple of years, over and over again, is that a lot of people in internet-scale projects use a mix of technologies for data storage. Very often they use a relational database in combination with a document database, in combination with some search functionality they need, and additionally they have a blob storage, basically a store for their web assets for example. So people build different stacks here, like Riak, Solr and RADOS, or MongoDB, Elasticsearch and GridFS, or CouchDB, Elasticsearch and HDFS with Hadoop. This is something which makes projects really complicated, because you have different data stores, and that's what Crate is trying to address. The target for Crate is that you can combine these technologies: a database with NoSQL capabilities and a flexible data model, kind of like a document oriented database to store your data, with blob storage and with search, in one single open source product. That's what we are trying to do. Now, very quickly, I want to show you how easy it is to set up Crate; I hope you can actually read this. This is another target we have: this should be the easiest database you ever set up.
These instructions are for starting an instance on Amazon EC2, from the EC2 command line. This is starting, I think, an Ubuntu image on Amazon. You log in to your server. Then this command here loads a shell script from the Crate server, which detects which Linux distribution you have. You can also do this manually if you don't want to execute a script you haven't reviewed; there are instructions on the website. It adds local repositories to your Linux server and then installs Crate. At this point, and I'm sorry for the color here, Crate is already installed and you can access it from the dashboard. When you want to start to create a couple of tables, you need to install a command line tool which has the nice name Crash. That's the Crate shell, called Crash. This requires Python, actually Python pip, the Python package manager; with that you install Crash. Then you can call Crash, and as I noticed, that command is missing on the slide; it's simply the command crash. This is the Crash command line: you connect to the local server you have installed, and at this point you're connected to your database and you can create a table, insert data into it, and you're good to go. That's all that's required to install Crate and get it working. When you want to install a second server, you basically repeat exactly the same steps. If you are locally in a subnet which accepts multicast traffic, the second node will automatically find the first node, and the table you have created at this point will sync up with the second server automatically.
So they find each other, and together they build a cluster. On Amazon, one more step is required if you want to set up a cluster with multiple nodes. Amazon doesn't allow multicast messages, so we have a plugin which either lets you use unicast, or, even more elegant: at the time the EC2 instance is created, you can give the EC2 instance a tag, and then you can define in the Crate configuration that all the servers running in your account which have the same tag are part of your cluster. What happens is that every couple of seconds your server scans, using the AWS interface, for other new servers with this tag, and then it automatically connects them together. To give you an impression of the SQL syntax for Crate, there are a couple of additions. One of the things I mentioned before is that Crate allows for semi-structured data. With semi-structured data you can either define this explicitly, for example with an object which has nested attributes in it, or you can simply create a very basic database schema and then just start to insert data into it. At the time you start to insert data, Crate will automatically notice the columns it doesn't know, modify the internal structure of the table, and basically learn the data you're inserting. So it's adapting the metadata of the table structure. This is the default behavior. If you don't want that, for example because you don't want faulty data to create new columns, you can turn this behavior off. But the default behavior is that if you insert data, Crate is going to adapt the table metadata.
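The dynamic schema adaptation described above can be illustrated with a toy model. This is plain Python and not Crate's implementation; it only shows the idea that unknown columns in an inserted record extend the table's metadata unless dynamic adaptation is switched off.

```python
# Toy illustration (not Crate's implementation) of a dynamically adapting
# table schema: unknown columns in an inserted record extend the table's
# metadata, unless dynamic adaptation is switched off.

class ToySchema:
    def __init__(self, columns, dynamic=True):
        self.columns = set(columns)
        self.dynamic = dynamic

    def insert(self, record):
        unknown = set(record) - self.columns
        if unknown:
            if not self.dynamic:
                raise ValueError(f"unknown columns: {sorted(unknown)}")
            # learn the new columns from the data, like Crate adapting
            # the table metadata on insert
            self.columns |= unknown
        return record

t = ToySchema({"id", "name"})
t.insert({"id": 1, "name": "a", "city": "Dornbirn"})
print(sorted(t.columns))  # ['city', 'id', 'name']

strict = ToySchema({"id"}, dynamic=False)
try:
    strict.insert({"id": 2, "oops": True})
except ValueError as e:
    print(e)  # unknown columns: ['oops']
```

The `dynamic=False` branch corresponds to turning the adaptation off, so faulty records are rejected instead of silently widening the table.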
You also see here a couple of keywords concerning sharding and partitioning. I'm defining here how many shards I want to use, and also on which column I want to do the partitioning. Also, at this point I'm defining how many replicas I want to have in my Crate cluster. Replicas here means the number of additional copies: two means that in the complete cluster I have three copies of my data in total. And then you can simply insert data, so you can do a mass insert. For example, one data source you can use is JSON: if you have a JSON file, you can simply do a COPY FROM and it will insert it into the database. Crate tries to figure that out by itself and rebalances the cluster based on the size of the partitions. But you can also define ranges, so you can explicitly define what is part of a partition and how the partitions are used. The partitions you define here span all the shards; the partitions basically define the range of the values, so they define the bucket the data is in. It defines the partition, that's correct. Let me actually think about it and I'll come back to you at the end. It actually maps, because Crate is sitting on top of it, to Elasticsearch; it's an Elasticsearch term, correct. Crate is written in Java, and as Elasticsearch is also written in Java, Java is going to be installed as part of the process. One little piece of information here, something I noticed yesterday when installing the latest version of Crate: the latest version now requires Java 8. So when you install it, you need to make sure that your Linux distribution has Java 8.
Otherwise you will first need to install it manually from somewhere else, because the installation process will not be able to get the right Java version. There are different drivers available for talking with Crate from different programming languages; I'll talk a little bit about that later. These drivers talk with Crate either through a REST interface, or there's also a binary interface, and I think the binary interface is mainly for Java clients. There is a UI with Crate where you can see the cluster status. It will give you an overview of all the tables in the system, an overview of where the system is regarding load, and it will also tell you if there's any kind of problem in the system, for example if a node is down or if certain data is under-replicated. In that case it will also show which kind of action it's taking. There is a blob storage you can use, so there's an interface for uploading and managing blobs. These blobs are managed with the replication settings you define in the database, so they are stored according to the replication settings you have. By the way, they are not stored in Elasticsearch; they're stored in the file system of the nodes. Crate will look at the size and the available space and distribute them across the cluster. There is a plugin infrastructure which you can use to extend the SQL queries. It's not really like stored procedures, but you can define something like user defined functions; that's something you can implement with plugins.
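To make the REST interface mentioned above concrete, here is a minimal sketch of what a driver's call over HTTP might look like. In the Crate versions I have used, the HTTP endpoint is `/_sql` on port 4200 and accepts a JSON body with the statement; treat the exact endpoint, port, and payload shape as assumptions and check the documentation for your version. The request is only built here, not sent, since sending would require a running node.

```python
# Sketch of a driver-style call over Crate's REST interface. The /_sql
# endpoint, port 4200, and the {"stmt": ..., "args": ...} payload shape
# are assumptions based on the versions I have used -- verify against
# the docs for your version. The request is built but not sent.

import json
import urllib.request

def build_sql_request(host, stmt, args=None):
    body = json.dumps({"stmt": stmt, "args": args or []}).encode("utf-8")
    return urllib.request.Request(
        f"http://{host}:4200/_sql",   # 4200 as the assumed default HTTP port
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_sql_request("localhost", "SELECT name FROM tweets WHERE id = ?", [1])
print(req.full_url)                   # http://localhost:4200/_sql
print(json.loads(req.data)["stmt"])   # SELECT name FROM tweets WHERE id = ?

# Against a live node you would then do:
#   with urllib.request.urlopen(req) as resp:
#       print(json.loads(resp.read()))
```

The parameter placeholders (`?` plus an `args` list) are the same idea the language drivers expose, so switching between the REST interface and a driver is mostly a matter of convenience.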
And there are many different client libraries available for all the main languages like Java, Python, Ruby, PHP, Scala, Node.js, Erlang, and a couple of other ones as well. Also, Crate runs really, really well in the cloud, and in containerized environments too. The main reason why it runs so well there is that it does the whole discovery automatically, also when nodes are going up or down. Crate takes care of the replication and that the data is distributed onto the new nodes. So these are a couple of features of Crate. It supports, I would say, about 95% of what SQL-92 supports. Additionally, it allows you to have arrays and nested objects, which is something I showed you with the CREATE TABLE statement. This is, by the way, directly mapped onto nested objects in Elasticsearch. Internally it has an information schema which you can query to find out metadata about your database schema, like in other SQL-based databases. Cluster and node state is also something you can find in this meta information, so you can build your own little client which queries the cluster and finds out what state the cluster is in, in case that's important for your application. And one really, really big feature is that in many cases you can reuse the O/R mappers you're already using, because Crate accepts most standard SQL. You can use O/R mappers like Hibernate, SQLAlchemy, Active Record, PHP PDO and so on directly. And if you, for example, have a project which is already running, let's say on Postgres or MySQL, and you don't really use stored procedures, then in most cases it's relatively easy to move it over to Crate. So this is a brief overview of the different components on a Crate node.
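The nested objects and arrays mentioned above are addressed in Crate's SQL with dotted paths. As a toy illustration of the idea, in plain Python and nothing Crate-specific, resolving such a path against a nested document looks like this (the field names are made up):

```python
# Toy illustration of dot-path addressing into nested objects, the idea
# behind querying nested fields like author.location.city in Crate's SQL.
# Plain Python, nothing Crate-specific; the document is made up.

def resolve(doc, path):
    """Walk a nested dict/list structure along a dotted path."""
    current = doc
    for part in path.split("."):
        if isinstance(current, list):      # array element access by index
            current = current[int(part)]
        else:
            current = current[part]
    return current

tweet = {
    "text": "hello",
    "author": {"location": {"city": "Dornbirn", "country": "AT"}},
    "tags": ["db", "sql"],
}

print(resolve(tweet, "author.location.city"))  # Dornbirn
print(resolve(tweet, "tags.1"))                # sql
```

In Crate itself this addressing happens inside the SQL statement, e.g. in a SELECT or WHERE clause, rather than in client code.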
All the nodes in a Crate cluster are exactly identical; it's a shared nothing architecture and all the nodes are equal. There is, though, one node in the cluster which normally has a special role, it's not really an exception: the so-called master node. The master node is the node which holds all the cluster's metadata. This master node is not a single point of failure: if the master node goes away, the rest of the cluster holds an election and elects a new master node. The metadata from the master node is also replicated in the cluster, so the master can be replaced by a new node which takes over this responsibility. The clients in general have the choice to use either the REST interface or the binary interface. Then we are using an SQL parser, I will show it on the next slide, which is Presto, coming from Facebook. Then this piece with the analyzer and the distributed planner, this is again a piece from Crate. It does the distributed execution, and then it merges and collects the data and gives it back to the client. For the blob storage it works a little bit differently; it bypasses most of this and retrieves the blob directly from the nodes. So this is an overview of the different layers, and also of what kind of technologies are used here. The base storage layer is Lucene and Elasticsearch; everything is basically stored there. And on the side we have the blob storage, which is outside of Lucene and Elasticsearch. Then there's the network layer, where we are using Netty for transferring the data. Then we have the aggregation level.
You can see from the color coding here where the pieces are coming from. This is what we have developed completely by ourselves: the distributed SQL, the distributed reduce, and the data transformation here. For the query parsing layer we're using Facebook's Presto, and that calls the query planner, which then does the execution. Also on this layer is the module for importing and exporting data. And as clients, we have the Crate dashboard, then the Crate shell, which is called Crash, and we also have a lot of different client libraries, with Java being kind of a first class citizen here, because it uses the binary protocol and directly uses Lucene. Yes. So the current version, which was published in the last sprint, is using Elasticsearch, I think, 2.0, which is basically brand new. There are some brand new functions regarding geo data types in there, but I have not actually played around with them, so I'm not really sure what kinds of geo queries are supported. But this is a feature which came in with the latest version, so I would need to have a look and see what exactly is supported there. So these are the technical highlights of Crate. It's a very easy database to scale and to manage. It is based on a shared nothing architecture. You can use real-time SQL. It supports environments with high availability requirements. And you have a data store which developers have a very easy time developing with, because it doesn't really give them a lot of restrictions; you can very freely insert data into the database and manage it. Regarding scalability: in our bigger setups we have had up to 300,000 records inserted per second into a Crate cluster.
And if I remember correctly, this was not even on bare metal; I think this was actually on an EC2 cloud deployment. Shards can be moved manually or automatically; the standard setting that comes with Crate is that Crate will try to optimize that by itself. In general, data can be semi-structured, and you can use either strong SQL schemas or schemaless tables; that's up to you. You can choose to define a table schema by yourself, or let Crate auto-detect the schema. You can have nested documents, which is basically an Elasticsearch property, and these nested fields are something you can use in SQL queries exactly like you would use normal fields. If you have nested fields, you write the field name, dot, the nested field, dot, if there's another nested field, dot, et cetera. And the same thing if you have an array: you can select array elements out of it. This is directly supported in the SQL, so that's kind of an extension to normal SQL, that you can have those nested elements there. Then the Crate planner is the piece in Crate which tries to optimize and has an internal strategy for optimized queries on the Crate cluster. The collect, shuffle, and reduce phases are used for data aggregation across the cluster. And of course Crate uses a cache internally for speeding up queries, and caching, for example with paging, to have faster access. And Crate is developed with the Java NIO interface, so it uses the asynchronous capabilities of the NIO library in Java to process the IO. Yeah, on a single table you can't; there is no constraint support. But when we talk about tables, there are joins.
I haven't mentioned that so far. It's actually in one of the latest versions: there are now inner joins and also cross joins, but outer joins, for example, are currently not supported. So you can have some dependencies between tables, not regarding constraints, but you can, for example, do the joins. That was actually a pretty big step; I think it took about a year to develop, so it was quite a big subject. Yeah, you can, that's happening automatically. No, it's happening on the servers. It's happening on the Crate level, exactly; that's the secret sauce of Crate, basically. It's relatively new. I have played around with it, but I don't really have a test environment where you can see a comparison between, let's say, a MySQL database joining locally versus Crate doing that. Anyway, it's a question of how you can really test that well, because to be fair and have good numbers, you would need to choose a scenario where MySQL really would have problems because of the size. So I haven't seen that yet; I hope we can publish something on the blog soon about the performance there. There are actually, I think, two blog entries right now on the Crate website which talk about optimizing the performance of joins. If you want, you can have a look there, but like I said, I don't have a direct comparison between relational versus NoSQL joins. Sorry. No, we don't. I was actually thinking about something else. When I was thinking about real-time data: there's currently some work being invested into making Crate a MySQL slave, so regarding streaming data, but not in the Spark world.
I don't know, but if you give me your email address, I can ask. My colleagues are actually meeting tomorrow for some skiing in the Austrian mountains, and they have a week of planning and so on ahead. Unfortunately, or fortunately because I really hate the cold, I'm not there. So I don't know right now what is planned, but I will be able to find out at the end of the week. Yeah, the problem, you got it exactly. So in general, Elasticsearch is packaged into Crate, and you can query Crate through the Elasticsearch interface, but you should not insert or update data that way, because that would confuse the metadata layer which is sitting on top of it, the schema information and the internal information Crate has about it. So the Elasticsearch connection is read-only. Crate is licensed under the Apache license. If you want to have a look into the source code, you can do that; all the source code is published on the GitHub account. Also, we tried to reuse as many open source projects as we could, to not reinvent the wheel, so if you're interested in what we are using, you can find that here. That said, just a couple more words regarding database scaling with Crate. That's one of the features we are pretty proud of: you have a shared nothing architecture, so all the nodes are basically the same. By the way, the connectors we have built here, the drivers, are intelligent enough that when they talk to a node, the node will tell them either to keep talking with it or to talk to another node. Also, there is a failover mechanism built into the drivers.
In case one of those nodes goes away, the driver can talk to another node; so in this case the intelligence is built into the drivers. Crate runs really well in different container-based environments. One of the really good integration tests we have done was together with CoreOS: for example, you can run Crate containers there with fleet, push the Crate containers out with Docker and fleet, manage them there, and scale up and down. This works really, really well. In general, in all these different environments, like ContainerShip, or directly with Docker, or with Tutum, we have built Docker containers and tested in these environments, and this works really well, like I said, because of Crate's capability of auto-detecting the cluster and doing all of this automatically when you scale up and down. Yes. If that's the case, then we have a problem. Yeah, sure. I think I mentioned that before: if multicast doesn't work, there are a couple of other possibilities to configure it. What is the default right now? Unicast. So like I mentioned before, on Amazon for example you have different ways of doing it. One way, I think, is with DNS, but the most elegant way is tagging the instance. And I don't think the Docker container has been tested with ECS. There's actually a Mesos scheduler, I think, which has been written; that was maybe three quarters of a year ago. So there's a scheduler for that, and it runs on Mesosphere as well. I'm not sure about Kubernetes; I think there's something built there as well, but I'm not sure about that. The easiest way is simply to search on our website; you will find it.
Two or three weeks ago, I think, we had a problem with the blog, and part of the content was down because of a little mishap, but right now all the content should be up again. As soon as there's a new feature out, you will find it for sure on the blog. There's also quite a lot of documentation out there; it's actually pretty well documented. The only thing is, we have a couple of third-party drivers, for example, which might not be as well documented, but whatever is maintained by Crate has quite good documentation. And if not, there's a little chat box in the bottom corner with a real engineer sitting behind it, so you can actually ask him these kinds of questions. The only thing you shouldn't do is probably ask super late in the night, because these are Austrians, on Central European Time, so they're not going to be in the office. But if you try to chat with them in the morning, they will respond to you directly. So Crate is available as a Docker container, and you can directly use that, for example with Swarm. Then it's on Kitematic, that's another place where you can get the container; that's this one here, though I think the projector is not really sharp. And there are also a lot of different cloud platforms you can run Crate on. On AWS there's an AMI you can directly use. There's also a website, and if you're interested you can come to me and I can give you the address, with a CloudFormation generator. In the CloudFormation generator you can type in: I want to have, I don't know, an extra large Crate cluster with 10 instances, and I want to use EBS storage, and two availability zones, and so on.
And then at the end it will spit out a complete CloudFormation template, and you can load that template into Amazon. The only thing you need to have ready at this point is your SSH key; everything else in Amazon it will automatically orchestrate and create, so security groups and all the other stuff will be created and configured for you. For Azure an image is available, for GCE an image is available, and for SoftLayer an image is available as well. SoftLayer is a little bit different; there's basically a Salt-based installation procedure there. That's one of the things I haven't tried yet, so I can't tell you exactly how that works. Crate is, as you can see, mainly popular in Europe; we are very present there at user group meetings, conferences, and so on. These are our download numbers: in the US we now start to have more and more people looking into Crate, but right now the most people looking into Crate are in Europe. If you want to play around with it, I also recommend a couple of nice examples you can try, to get a feeling for how you can develop with Crate and how the drivers basically work. There is an example on GitHub. I don't know if you know this, but GitHub has actually opened up a quite big amount of data about the commits of their users to public repositories, and that's data you can mine. GitHub now has a data-mining competition every year where projects come up with interesting analyses of that data, and we have actually built such an analysis on Crate. There's an example you can download from GitHub which analyzes languages, request latency, sentiments, and so on. So that's the way the example looks.
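To give a flavor of what such an analysis looks like in SQL on Crate, an aggregation over the GitHub event data might look roughly like this. The table and column names here are made up for illustration and will differ from the actual example app.

```sql
-- Hypothetical aggregation over imported GitHub events:
-- the most active languages by number of push events.
SELECT repo_language,
       count(*) AS pushes
FROM github_events
WHERE event_type = 'PushEvent'
GROUP BY repo_language
ORDER BY pushes DESC
LIMIT 10;
```

Queries like this are executed distributed across all nodes holding shards of the table, which is what makes the aggregations scale with the cluster.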
You can basically reuse that for your apps, and you can see how the analytics and the aggregations perform. If you download the complete GitHub archive, this is going to be at least 100 to 200 gigs of data, so you can see how Crate actually performs with big data. Also, what I want to mention: because there's actually Elasticsearch under it, Crate works very well with Elasticsearch tools. For example, you can use Kibana; in this case, these are some web logs, and these web logs are fed into Crate as a SQL table. Then you can use Kibana on the Elasticsearch interface to analyze them as a dashboard. [Audience question.] Not to my knowledge. It might be possible that you can do that through the plug-in interface, but I'm not 100% sure. Sorry, what is it? The writes are consistent. The writes are consistent, but there is a performance penalty for doing that, because you need to wait until the last of your replicas has actually written before you can return. I'm not sure how this is implemented; I think this is a Crate thing as far as I know. I only know that you can tune it, and, for example, you can disable read-after-write consistency, but it is heavily recommended to use it. Okay, that's all I had as slides. Questions? [Audience question.] Correct. So when you download a blob, you don't need to know which node it is actually located on; this is handled internally by Crate. It's not sharded, that's correct; it's basically a single file, but it's replicated. And, since this is probably the next question: if you have a replication factor, it will only download from one node. It will not go to multiple nodes and stream from them at the same time; it will only be downloaded from one.
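The blob support from that last answer works through dedicated blob tables. A minimal sketch, with an illustrative table name and replica count:

```sql
-- Create a blob table; the name and replica count are illustrative.
CREATE BLOB TABLE screenshots WITH (number_of_replicas = 1);
```

Uploads and downloads then go over Crate's HTTP interface, addressed by the content's SHA-1 hash, something like `PUT /_blobs/screenshots/<sha1>`. As mentioned above, even with replicas configured, a download is served from a single node.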
Okay, then thank you very much. I'm also in the exhibition hall; I forget right now which number the booth is, it's something like 300-something. So whenever you have time or want to ask some more questions, just come over. I'll be there tomorrow at the normal hours. Cool. Thank you very much.