My name is Niels de Vos. I work on Gluster and am one of the co-maintainers, and I'll just tell you a bit about the project and the features that we have.

So, a very short introduction. Gluster is a POSIX-like file system. It gives access through different protocols, including normal file system access such as FUSE mounts and NFS mounts, object storage, and also block storage through, for example, QEMU with native drivers. There is no metadata server: LizardFS, from the talk before this one, has a metadata server; Gluster doesn't have a metadata server; Ceph doesn't have metadata servers for the object store. We prefer to calculate the position of where the data is instead of asking someone where to find the object. So our clients are a bit smarter and actually know where the data is, instead of a central service keeping track of everything.

Gluster runs on basically any kind of hardware. At home I run my Gluster environment for testing, installation media and everything on small 32-bit ARM systems, and that works perfectly fine. At Red Hat we have many customers that run on pretty decent Dell, HP, and any-vendor-you-can-imagine kind of hardware. So from the small things to the big things, we do everything, basically. Sizes are important, so you can scale up: add more servers if you need more storage.

There are integrations with different projects, so besides using it through FUSE, NFS, Samba, or any of the other tools, other software can speak to Gluster directly. If you want to store backups on Gluster you can use, for example, Bareos: back up all your workstations, and Bareos knows how to speak the Gluster protocol and automatically stores your data there. And of course it's a backup solution, so it can also recover it. Bareos even has the option to back up Gluster volumes themselves. So Bareos is a full-featured backup solution, and if you want to back up your Gluster volume onto tape or Amazon Glacier or any slow cold storage, Bareos and other backup tools can do that. We even have partnerships, or, well, working relationships, with commercial products that use Gluster and provide backup.

Replication is one of the main features. So this is a distributed replicated Gluster volume: you see, on the top there's the distribution algorithm, and then the replication algorithm. Very similar to how this mathematical description works, we first do the distribution on one layer, and then the second layer does the replication. In our case the two are completely disconnected, so we run into these kinds of problems: there's no connection between those algorithms, which is not always the most efficient use of disks, and you can run into unexpected issues.

Instead of using replication and distribution there are also erasure-coded options. We call it disperse; it's erasure coding, so it's like a RAID 5 over the network. It's a bit more complex, because Gluster by default does everything based on files. So distribution works on the whole file: the file will be located on one of the bricks. If you do replication, this particular file gets replicated exactly to all the bricks you need. So it's really file-based (a minimal placement sketch follows below). And in absolute emergencies, a system administrator can go to the storage, go to the XFS file system or whatever file system you use, and see the files. You can access the files without using Gluster. We don't recommend it, but if you're in an emergency situation you can actually do this, so administrators feel very happy about it.
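To make the file-based placement concrete, here is a minimal sketch of hash-based placement in the spirit of Gluster's elastic hashing. The hash function, brick names, and replica grouping are illustrative assumptions, not Gluster's actual algorithm (which uses a different hash and per-directory layout ranges stored in extended attributes):

```python
import hashlib

# Illustrative brick layout: two replica pairs behind one distribution layer.
REPLICA_SETS = [
    ["server1:/bricks/a", "server2:/bricks/a"],  # replica pair 0
    ["server3:/bricks/b", "server4:/bricks/b"],  # replica pair 1
]

def locate(filename: str) -> list[str]:
    """Pick the replica set for a file purely from its name.

    Because every client runs the same calculation, no metadata
    server has to be asked where the file lives.
    """
    digest = hashlib.sha1(filename.encode()).digest()
    index = int.from_bytes(digest[:4], "big") % len(REPLICA_SETS)
    return REPLICA_SETS[index]

# The whole file lands on one replica set; the replication layer then
# copies it to every brick in that set.
print(locate("photos/cat.jpg"))
```

Note how distribution (picking the replica set) and replication (copying within the set) are two independent layers; that independence is exactly the disconnect between the algorithms mentioned above.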
With disperse volumes, files get split into chunks and encoded so that you have the erasure-coded bits. If you go to the back end as an administrator and you want to read a file, you only see the encoded bits: you have a couple of encoded bits on this particular server and another couple on other servers, so you somehow have to reconstruct the file (a toy sketch of the encoding idea follows below). This is something that is relatively new, and we don't see a very high rate of adoption, because administrators don't feel too confident about it: they just can't see the contents immediately the way they could with the other types of volumes. It arrived maybe two years ago for the first time, so in the meantime it's very stable and used everywhere, but many users are still not very eager to adopt it. It also comes with a performance penalty, but that's it.

All the Gluster packages are part of many different distributions, so we try to push the packages in: for example, in Debian there's a version of Gluster, in Fedora there's a version of Gluster, and we provide them through the CentOS Storage SIG. There is a NetBSD contributor who pushes the Gluster packages into NetBSD, and there has been work with FreeBSD people to put it in the ports tree. I'm not sure what the current status is, because the maintainer who was doing a lot of work there found something else to do, so we probably need to pick that up again. We try to make it really easy for users to install Gluster and make sure that every distribution has its native installation source. There are several quick-start guides in the documentation on gluster.org, but some of the distributions also offer their own wikis and you can go to the wiki pages there.

Recently we changed our release schedule a little bit. We were doing releases roughly every six months, and every release was maintained for about two releases, until the third follow-up release was done. So, for example, the 3.5 release, well, this is the old schedule, was planned to be supported until the 3.10 release was done. Unfortunately, six-month release cycles were not ideal, because every release would have something like 30 new features. That is a huge amount of features, and users cannot easily go through them all and say: okay, what's new, I want to try this out. We would overwhelm users. And developers that work for several months on a feature would like to see that feature adopted and get feedback from users; for small features, with one or two months of development, waiting six months for the next release is really long. So we changed the release schedule, and with the current releases we do a three-month release cycle. That makes it possible for developers and users to try out new features and get feedback.

But we can't require users of a storage solution to migrate their storage every six or maybe nine months. Users prefer a stable solution and don't want to upgrade too often. So what we do is have long-term maintenance releases. 3.8 is the first official long-term maintenance release that we have. Three months later, 3.9 came out; 3.9 is a short-term maintenance release and is supported for three months, until the next release comes out. So with the release of 3.10, 3.9 gets deprecated and abandoned.
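Coming back to disperse volumes for a moment, here is the promised toy illustration of why an administrator can no longer read files directly on the bricks. This uses simple XOR parity, matching the RAID 5 analogy from earlier; actual Gluster disperse volumes use a Reed-Solomon style code, so treat this purely as a sketch:

```python
def encode(data: bytes, k: int = 2) -> list[bytes]:
    """Split data into k chunks plus one XOR parity chunk.

    Any single missing chunk can be rebuilt from the others,
    like RAID 5 over the network. (Parity math here only
    handles k == 2, to keep the sketch short.)
    """
    if len(data) % k:                      # pad so the chunks line up
        data += b"\x00" * (k - len(data) % k)
    size = len(data) // k
    chunks = [data[i * size:(i + 1) * size] for i in range(k)]
    parity = bytes(a ^ b for a, b in zip(*chunks))
    return chunks + [parity]

def rebuild(chunks: list[bytes], lost: int) -> bytes:
    """Reconstruct the chunk at index `lost` by XOR-ing the survivors."""
    survivors = [c for i, c in enumerate(chunks) if i != lost]
    out = survivors[0]
    for c in survivors[1:]:
        out = bytes(a ^ b for a, b in zip(out, c))
    return out

pieces = encode(b"hello world!")             # each brick stores one piece
assert rebuild(pieces, lost=0) == pieces[0]  # one brick may fail
```

Each brick stores only one piece, so reading a file on the backend directly shows meaningless fragments; only Gluster can combine them back into the original file.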
3.8 is a long-term maintenance release, so it will still be supported and get bug-fix updates and everything. And this is the current approach that we're taking. A 3.10 release candidate was tagged last week, I think, and it is planned to be released in the coming weeks. So we're doing some tests to make sure that everything is there, and we'll surely get it released before March.

The 3.9 release, which is the current short-term maintenance release, is still being worked on, and bugs get fixed in it. We added commands to ease the replacement of bricks. Multi-threaded self-heal existed for normal replication, but it didn't for erasure-coded replication; now we have multi-threaded self-heal for erasure-coded, or dispersed, volumes too. Also for erasure-coded volumes we added CPU optimizations: we have assembler code for the erasure-coding functions so that they are better optimized for the CPU. Not that the CPU is normally a bottleneck, but with erasure-coded volumes you definitely have more CPU usage, and if you have low-end hardware these optimizations make a difference.

Sometimes clients take a lock on a file and then the client has an issue. The lock needs to be revoked, and we now have a CLI command for that (CLI is the command-line interface). Facebook uses Gluster a lot, and they have to deal with crashes on their client-side systems. The Facebook team that provides the storage interfaces says: well, applications are not our business, but they tend to crash, they forget to release locks, and we cannot wait for the timeouts it takes to detect this kind of lock loss or application loss; we want applications to take over immediately. So we added functionality to actively revoke locks on demand: when the application management interfaces notice that an application crashed or is not responsive anymore, they kill the application and revoke its locks as well. That makes things a bit more flexible.

We have bitrot detection. If data changes on the backend, or disks start to return non-fatal read errors, we can detect that; it's a background process that runs periodically. Now there's also an on-demand way of triggering this scrubbing and checking.

We added more APIs so that you can do eventing. Any management infrastructure around Gluster needs to get notifications: they are important to know when bricks or servers are missing or crashed, or disks are failing, and displaying these events in a web interface really helps administrators figure out what's wrong. They don't seem to like going through logs anymore; everyone wants a web interface. So we needed a notification API to provide this.

We also improved our geo-replication. We have multi-site replication, and it's always a bit tricky to set up, so we now have an interface to make it easier to configure; that's also command-line. 3.8, the release before, already added some REST APIs, so you can configure Gluster through REST; mostly Heketi does this, in combination with Kubernetes.

We added improvements for sparse files. With sparse files over a network, it is useful not to read the empty areas. If you have a gigabyte file which only contains a little bit of data at the beginning and maybe at the end, and the middle is empty, the file is sparse; over a network file system you prefer to skip that empty area, because reading, for example, zeros is not very efficient and just costs a lot of network bandwidth. Backup applications were not very happy that this support didn't exist, and we added it in the previous release (a small sketch of the underlying hole-seeking idea follows below).
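The sparse-file support is about exposing where the holes are so that readers can skip them. As a hedged illustration of the idea, not Gluster's internal implementation, here is how a client-side tool can skip holes on Linux using the standard SEEK_DATA/SEEK_HOLE interface that this kind of feature builds on:

```python
import os

def copy_sparse(src: str, dst: str) -> None:
    """Copy a file while skipping its holes, so no zeros are read.

    For simplicity each data region is read in one go; a real tool
    would read in bounded chunks.
    """
    fin = os.open(src, os.O_RDONLY)
    fout = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        size = os.fstat(fin).st_size
        offset = 0
        while offset < size:
            try:
                # Jump straight to the next region that holds real data.
                data = os.lseek(fin, offset, os.SEEK_DATA)
            except OSError:                    # only a trailing hole left
                break
            hole = os.lseek(fin, data, os.SEEK_HOLE)
            os.lseek(fin, data, os.SEEK_SET)
            os.lseek(fout, data, os.SEEK_SET)  # seeking past EOF keeps the hole
            os.write(fout, os.read(fin, hole - data))
            offset = hole
        os.ftruncate(fout, size)               # restore the original length
    finally:
        os.close(fin)
        os.close(fout)
```

The point for backup applications is that only the regions that actually contain data cross the wire; the holes stay holes on the destination.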
Again geo-replication, because it's one of the big functionalities that we offer in Gluster: we made it tiering-aware. With tiering you have a fast tier and a slow tier, and geo-replication can now actually handle that decently. Sharding is splitting big files into chunks; geo-replication didn't handle that efficiently either, and that has been addressed.

Some users use only two copies of the data instead of the three we recommend. If you have two copies of your data you can run into split-brain scenarios, which is a problem. You have to resolve them, and you resolve them in Gluster with policy-based split-brain resolution. That means you can say the biggest file is always the correct file, or the file that was updated last; several of these policies now exist (a small sketch follows at the end of this section). It helps administrators immensely: they don't have to recover or fix the split-brain for every single file, they can just set the policy on this particular type of volume for this particular kind of workload and pick whatever resolution they like best. And yes, in the previous release we added multi-threaded self-heal.

I'm sharing this slot with Patrick, so I don't know how much time we have left. Yeah, okay. So 3.7 is the oldest release that we still support. Many users stay on the release that they originally installed, so we still see a lot of users on 3.6, and 3.6 does not get any bug-fix updates anymore. 3.7 is the oldest release, and it goes out of maintenance within the next couple of weeks, when 3.10 is released. Still, the features in it are not known to many users. We have small-file performance enhancements: things like mail directories are still not very efficient, because they contain really a lot of small files, and file systems just don't like small files; especially network file systems don't like small files. But we improved this, and a lot of work has been going on to enhance it further. Tiering: hot tiers and cold tiers, so data that is used a lot is on fast storage and data that is not used much is on cold storage.

Windows users like to be able to undelete files. In the Linux and Unix world we just delete a file and know it's gone; Windows users don't seem to like that, they want a trash directory. They want that kind of support on Gluster as well: we have a lot of users that use Gluster in combination with Samba, and they requested something like a trash or recycle-bin directory where deleted files are still kept, so they can just copy the files back if they want to.

There were also improvements in NFS, more advanced configuration, also contributed by Facebook. Facebook tends to send huge features that they use internally, and that's actually quite nice: most of the time we get really small changes or improvements from users, sometimes bug fixes, but Facebook sends something like 10,000 lines of code changes in patches. It's not always easy to go through those, but it's quite nice to see those features. Yeah, one of the hot items is NFS Ganesha, which is coming; just yell if you want to replace the Gluster stuff with Ceph.
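Returning to the policy-based split-brain resolution mentioned above, here is a minimal sketch of what it boils down to: given the metadata of the conflicting copies, a configured policy deterministically picks the winner. The replica structure and policy names here are illustrative assumptions, not Gluster's actual internals:

```python
from dataclasses import dataclass

@dataclass
class Copy:
    """One replica of a file that ended up in split-brain (illustrative)."""
    brick: str
    size: int      # bytes
    mtime: float   # seconds since the epoch

# Hypothetical policy table: each policy picks the winning copy.
POLICIES = {
    "bigger-file": lambda copies: max(copies, key=lambda c: c.size),
    "latest-mtime": lambda copies: max(copies, key=lambda c: c.mtime),
}

def resolve(copies: list[Copy], policy: str) -> Copy:
    """Return the copy that the configured policy declares correct."""
    return POLICIES[policy](copies)

a = Copy("server1:/bricks/a", size=4096, mtime=1486000000.0)
b = Copy("server2:/bricks/a", size=8192, mtime=1485000000.0)
print(resolve([a, b], "bigger-file").brick)   # server2 wins: bigger file
print(resolve([a, b], "latest-mtime").brick)  # server1 wins: newer write
```

Once a policy is set for a volume, every split-brain file is healed the same way, instead of an administrator inspecting each file by hand.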
So NFS Ganesha is one of the things that's coming and being adopted much more. With 3.10, NFS Ganesha is the suggested NFS solution; Gluster/NFS, the old built-in NFS server, is basically deprecated. We'll fix bugs in it, but we won't spend much time enhancing it further.

Brick multiplexing: at the moment we use a process for every storage unit. For the people who were in the earlier talk, the OSD is the storage unit for Ceph; bricks are the storage unit for Gluster, and we use a process per brick. That is fine as long as you don't have too many bricks or too many volumes, because a volume needs a number of bricks. With the container kinds of solutions that are happening, people want really a lot of small volumes: instead of the huge data storage and file systems that we normally do, containers want small volumes, like 50 or 100 gigabytes, where we would normally go into the terabytes. So instead of a couple of big file systems, we now have to provide many small file systems, and starting more and more processes for all of these bricks is cumbersome to manage. So we have brick multiplexing, which is one multi-threaded process that serves multiple bricks, and that improves the manageability and performance for these kinds of workloads quite a lot. That's coming up next.

There are more metadata cache enhancements, mostly for small files: if you store home directories or Git repositories on Gluster, 3.10 will improve performance quite a bit there.

NFS Ganesha and Samba will use Storhaug. Storhaug is a project that provides an HA solution for storage environments. At the moment we suggest CTDB for Samba environments and Pacemaker for NFS Ganesha; Storhaug combines the same approach for both NFS Ganesha and Samba, using Pacemaker, and that is what we will suggest our users to use. We'll test it much more than we test CTDB and Pacemaker separately, because we'll have one project that we use for everything; it makes it much easier for users to set up, and they'll be familiar with the setup.

We split out the tier daemon, so the tiering process is now managed by glusterd. You can easily check its status and see if there are any problems, or how busy the process is; it's a bit easier to manage and we'll be able to debug things better. For applications that have integrations, QEMU for example, more improvements for tiering are coming.

We still do not support subdirectory mounts over FUSE. If you need subdirectory mounts over NFS or Samba you can do that, but FUSE doesn't allow us to do it yet, because of how our internal structures for volumes work. That's something we really want to address in 3.11; we tried to get it into 3.9, but some of the developers got busy with other tasks, and now they want to put it in 3.11, which is happening in like three months, so hopefully they have the time to look into it. iSCSI on Gluster volumes is something Jiffin will speak about this afternoon.

Server-side DHT, to improve readdir: by default we have all our logic on the client side. The client knows about the distribution, so the client knows exactly where all the files are. That's fine if you just know the file name and need to open the file, but we have many Windows users that like to do ls, or dir, or open their Explorer window and see all the files. readdir is very inefficient there, because you have to connect to all of the distributed parts, combine all the results, and display them decently. And readdir is kind of funky: you get a whole stream of directory entries, but you can also rewind, so you can basically seek in a POSIX way through a directory listing. That means that if you seek back and forth, the order of the entries still needs to be roughly the same, which is very difficult in a distributed environment. So we're moving part of our distribution logic to the server side, where the server does some of that magic, and the client can do the readdir and the rewinding a bit more efficiently (a toy merge sketch follows below). That's also coming in the next three months.
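To show why a stable, rewindable listing is hard across bricks, here is a toy sketch of merging per-brick directory listings into one deterministic stream with seekable offsets. The merge-by-sorted-name scheme is an illustrative assumption, not the actual DHT protocol:

```python
import heapq

# Hypothetical per-brick directory listings; each brick returns its
# entries in sorted order, as a local readdir could after sorting.
BRICK_LISTINGS = {
    "server1:/bricks/a": ["apple.txt", "kiwi.txt", "pear.txt"],
    "server2:/bricks/b": ["banana.txt", "mango.txt"],
}

def merged_readdir(offset: int = 0) -> list[tuple[int, str]]:
    """Merge all brick listings into one deterministic stream.

    Because the merge order is always the same, an offset handed
    out earlier stays valid: seekdir()/rewinddir() can replay from it.
    """
    stream = heapq.merge(*BRICK_LISTINGS.values())
    return [(i, name) for i, name in enumerate(stream) if i >= offset]

print(merged_readdir())          # the full listing, offsets 0..4
print(merged_readdir(offset=3))  # seek back to entry 3 and replay
```

Doing this merge on the server side means the client no longer has to hold connections to every brick just to list a directory.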
Hopefully this summer we'll get Gluster 4.0. The first talk this morning was from Kaushal about GlusterD 2.0. GlusterD is the management daemon; it's being rewritten in Go, and it's one of the key components for Gluster 4.0, because the rewrite is mainly done to improve scalability: to make sure that we can address 1000 Gluster servers more efficiently than we can now. If you go over something like 100 Gluster servers today, all the servers talk to each other, which is very inefficient, and you will most likely run into bottlenecks if you don't have much network bandwidth. So GlusterD 2.0 will be a big improvement for the management infrastructure.

We'll do things like inotify, so you can actually listen on a directory and watch whether new files get created, things like that; that is something people really want to see. The Gluster protocols use SunRPC with XDR encoding, very similar to NFS, but they are not really secured. We have SSL support, but SSL works from a client to a server and not on a per-user basis, so if you have a number of tenants on one server they would all use the same SSL connection, which is not really appreciated. We want the same kind of security that NFS offers, because the NFS security infrastructure is well known in many organizations; they mostly use it already, and we want to plug into that same infrastructure.

WORM, retention and the like are things people want: similar to a life cycle, a way to keep a file locked for the next two years or something like that, or to move a file to a particular directory when certain events happen. Those things are coming, but slowly, because they are not really at the top of our priority list. Hopefully we'll have a bit more of that functionality in the 4.0 release: we have basic WORM features, but it is nowhere near what people expect from compliance-grade WORM. We have simple write-once kinds of features, but it's more like a toy, nothing really official.

That actually ends my slides, so we have a few minutes left to talk. If you have questions about Gluster, I'll be in this room or at the Gluster booth in the K building; otherwise, e-mail the users list or ask in the IRC channel: we have #gluster on Freenode, with many users all over the world who answer questions and everything.

Alright, as he mentioned, I'm just going to take a few minutes to talk a little bit about what's going on in the Ceph community lately. As some of you who came in late may not know, this talk was originally supposed to be about the Ceph user committee, and Wido was going to cover some of what was going on there, so I will hopefully touch on a few of those things. For those of you that don't know me, my name is Patrick McGarry.
I'm the director of community for Ceph at Red Hat. I just wanted to touch on a few things; there's always a lot going on in the Ceph community, so I wanted to hit a few of the highlights in the few minutes we have left here before we get on to a real presentation.

One of the great things that we like to do throughout the year is our Ceph Days program. Basically, we go around the globe, find somebody who wants to host, and have a whole-day event about Ceph, where individual members of the community come in and talk about what they're doing. The reason I bring this up is that we have plenty of Ceph Days coming up this year. The ones that are already published: San Jose on the 17th of March, Boston during OpenStack, and the Netherlands on September 20th. Some of the others in progress right now: I believe we are headed back to Frankfurt, we may be in Stockholm and Warsaw in April, we'll probably also be in Taiwan, we'll be in Sydney in early November, and several others. If you look at the new Ceph website under Ceph Days, you should see them as they become available.

One thing I also wanted to talk about is that we're moving to a new metrics platform. Right now we have metrics.ceph.com; it's a great dashboard that shows all the ins and outs of what's going on in the Ceph community, and it was built by the Bitergia guys with us. Those of you that were at GrimoireCon this past Friday already know a lot of this, but we're pretty excited about where we're headed next: we're moving to a Kibana-based dashboard, so you'll actually be able to play with it and dig into the numbers.

More importantly, we're also launching a new tool called Ceph Brag. This is part brag and part tool: users have the ability to run performance tests on their Ceph clusters and share that performance data with the Ceph community at large. On the one hand they can share, you know, "I've got 1.4 million IOPS on my cluster, look how awesome I am"; on the other hand it allows users that are either new to Ceph or looking to improve their footprint to take a look at the Ceph Brag data and sort by use cases similar to their own. So take a look at people that have, say, large numbers of small files, check out other people doing similar things, see their hardware footprint, see their tuning options, and get a much better idea of where to start. We're hoping it'll be a little bit of both.

As I mentioned, Wido was originally supposed to be here to talk about the user committee. The user committee was the first little baby step into governance that we took about three and a half years ago, and it has since grown into a very nice part of the community underneath the now Ceph advisory board, which is our actual governance. Most recently they've been involved in helping with our release cadence, doing contributor credits, and most notably our mirror network. Ceph is actually building a functional global mirror network, and we're very excited about that: we have mirrors in China and Australia, a couple in the EU, a couple in the US, and more are coming every day; I believe there's one planned in Japan and several throughout Southeast Asia. It's very exciting to see, and anyone can become a mirror, assuming you meet a couple of simple guidelines. If you're interested, definitely go check it out: it's on the Ceph website, it's also in the Ceph docs, and you can
check that; if you can't find it, let me know, I'd be more than happy to point you in the right direction.

As I mentioned, we have real Ceph governance now. This launched in October of last year. We tried to get a pretty good representation across the entirety of the Ceph ecosystem: a little bit of the academic roots, a little bit of the individual contributors, and then some of the larger players that are making long-term strategic bets on Ceph. We wanted to make sure everyone had a voice in helping to drive Ceph forward. The nice thing is that, as you know, Ceph was designed so that no one can really own it: Red Hat may own the trademarks, but no one can actually own the code. It was deliberately built with a fractured, or distributed, copyright, so anyone that contributes to Ceph owns their contributions in perpetuity. That's nice: no big bad can come along, buy Ceph, and make it evil. So it's good to see that a lot of the companies doing large contributions are also getting involved and helping to drive the community forward as well as the code.

Ceph Tech Talks: if you would like to know more about deeper technical topics in the Ceph community and tangentially related projects, every month we do a Ceph Tech Talk. It's the fourth Thursday of the month, and it's done online, so anyone can join. This one we hold at the same time every month, so I don't flop it around like the developer monthly; it's 1 p.m. Eastern, I think, so it's only like seven o'clock here, not too bad. We usually have a pretty good variety of speakers, everything from core team members (I know Josh has previously talked about RBD, and Sam talked about RADOS) all the way over to the other side, where people talk about integration work, management tools, and things like that. So it's usually a pretty good hour's worth of deep technical Ceph discussion and a great time to ask questions, and if anyone wants to talk about their Ceph work, you're more than welcome.

And as I mentioned, we have the Ceph Developer Monthly. It's the first Wednesday of the month, and we alternate this one: usually we have a US-East-and-Europe-friendly time slot and a US-West-and-APAC-friendly time slot, depending on the month, and we alternate back and forth. Again, this is online, and it's designed for people doing work on Ceph to get together and say: here's what's coming, here's what I'm working on, and make sure everyone's aware of what's going on.

The biggest thing in the community lately is the new ceph.com. We launched a new website, and I'm pretty happy with how it turned out. There are definitely some great resources there, and everything is a lot more discoverable, so if you haven't stopped by ceph.com lately, definitely give it a shot. We can skip that.

And the big news for this year is that we are holding our inaugural Ceph conference, August 23rd through the 25th in Boston. It will be the first all-Ceph conference that we've ever held, and we're pretty excited about that; definitely take a look and come by if you can join us. So that's the end of the Ceph community updates. I'll be around if anyone has any questions; otherwise, I believe it is about time to start a real presentation. So, thank you.