Okay, hello. [In Spanish:] First of all, I'd like to take this opportunity to thank the people of Barcelona for their hospitality. Your city is truly amazing, but please excuse me for not saying this in Catalan. Thank you very much.

Hello, my name is Andrew Boag, and I'm here to talk to you about clustered file systems in your application stack.

Moving right along, I want to start this presentation with a definition of a complex problem. There are a couple of different flavours of this definition. I heard it somewhere, and I can't actually remember where, but I really like it and it's very applicable in the IT space: a complex problem is one where, when we start thinking about solutions as technical people and propose those solutions, what we end up doing is changing the problem. And the problem in the case of a clustered or distributed file system, well, one of the problems, is that there can be some new and unexpected behaviours when you move from a single traditional file system that was sitting one cable away from your CPU in a standard server to one that is distributed or networked.

Just a bit about Catalyst to give you some context: we're all about open-source technology. These are some of the systems and technologies we work with. Our headquarters are in Wellington, New Zealand (I'm a New Zealander), and we also have offices in Australia, in Melbourne, Sydney and Brisbane, and a UK office in Brighton. We have our own OpenStack public cloud in Wellington, New Zealand, with a number of availability zones and regions, and that's been going since 2014.
It was the first public cloud in New Zealand. We're all about open-source technologies, and that open-source focus took us to OpenStack.

I just want to give everyone a heads-up, a rip cord, to let you know what this talk is all about, because there are a lot of great talks going on. This talk is very much aimed at OpenStack users, at solutions architects, at people who are deploying solutions into an OpenStack cloud. I'm going to talk about some of the stories and experiences we've had, some of the things we've seen and what we've done, and this includes both legacy and other cloud platforms, because it's reasonably agnostic. This is a Linux-centric talk; we don't do Windows stuff at all. I'll spend a bit of time talking about GlusterFS, because I think that's quite an interesting technology at the moment and we seem to get a lot of interest around it. I'm not going to mention the Manila project; that's not what this is about at all. And I will get on to object storage, because object storage is relevant in terms of the clustered file system problem.

So let's just take a step back in time: the file system, file systems 101. Once upon a time, and this is still very much the case for many computers that are turned on, including the phone in your pocket, they had a CPU, they had RAM, they had a file system, and these were all, proverbially, one cord, one cable, away from each other. There were quite a few assumptions that came with that, around things like availability, consistency and performance, right?
They were quite predictable, and consistency is probably the one that, for me personally, takes the most mindset adjustment when you move away from a singular file system to one that's networked or distributed: the idea that once you make a write or an update, it will be consistent and perpetual, and you won't get any ambiguous answers. But as we've noticed, these ideas and this belief about what a file system is have really influenced some of the challenges we've faced when we've moved application stacks onto a networked or distributed file system model.

So we move from the traditional single file system to network file systems. Now, when I say network file system I don't mean specifically NFS, although obviously NFS is going to be discussed, but any file system that is a network away from your application or web server, one that is available to multiple compute instances. It's something everyone has had some exposure to, whether you're a developer or someone working in an office: the bottomless X drive inside your office was a big shared pile of mess, and these also have their challenges, as we've discovered. They get full of clutter. They have some performance challenges.
They get filled up quite quickly. They carry a certain amount of operational overhead around managing capacity, quotas and all those sorts of things. So they solved some problems in terms of capacity and availability, but very much brought some new challenges to the table.

One day, of course, we all had our first bad experience with a clustered file system, with a shared file system. Maybe it was that X drive: it broke, or there was massive performance degradation because of locking or race-condition issues, or it was just unreliable and there was never any space on it, or we weren't able to write things to it fast enough. There were challenges that everyone has seen, including it simply not being available.

So now I'm going to go through some of the technical approaches we've used to solve the challenge of having file storage available to our application and web servers.

NFS, of course. NFS: the workhorse, working hard since 1984. Thanks very much, Sun, and everyone who's contributed to NFS since then. A bit of a pop quiz: who is using some form of NFS right now? Lots of hands, of course. Who's using a replicated NFS solution? Okay, and who has what they consider to be a rock-solid, high-availability, automatically failing-over NFS solution? Please come and talk to me afterwards. So we're still using it very much.
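As a point of reference, the basic NFS arrangement being discussed here is only a couple of configuration fragments. The hostname, network and paths below are made up for illustration:

```
# /etc/exports on the NFS server (hypothetical host "nfs1")
/srv/appdata  10.0.0.0/24(rw,sync,no_subtree_check)

# /etc/fstab entry on each application/web server
nfs1:/srv/appdata  /mnt/appdata  nfs  defaults,hard,vers=4  0  0
```

After editing the exports file you re-export with `exportfs -ra`. A `hard` mount blocks rather than returning errors during an outage, which is usually the safer behaviour for application data, and is exactly why a hung NFS server shows up as the whole application stalling.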
It's a great technology. I'm not a network engineer or a sysadmin; I'm sort of an amateur generalist who has to get involved in decision-making, and some of the things I ask my sysadmins and network architects to do, they look at me and say "you can't do that", but I'm just being pragmatic and I want to get things done. The thing I like most about NFS, one of its really strong qualities, is that we know it well. Any sysadmin who's been around for a while really does understand what the symptoms of a poorly performing NFS system actually mean. They understand what potentially happens when you try to use locking aggressively, or when it's just not working properly, or when there are underlying disk issues. And that's a very good thing, because when the gloves come off and there are problems, you have people with pragmatic solutions for how to fix them. In the real world, when things go wrong, as they do, that's infinitely valuable. That's relevant to some of the issues I'll talk about later with other technologies, where we weren't in the best position to diagnose and resolve problems.

The next one, given we've got a lot of NFS going on, which is no great surprise: DRBD. Once again, who's using DRBD in some way, shape or form?
Okay. So, DRBD. DRBD is block-level RAID over the network, essentially. When we saw this for the first time, probably about eight years ago, it really solved a lot of problems for us. We used it, prior to moving onto cloud infrastructure, for replication of file systems. It really solved the replication problem, because it was agnostic about what it replicated, and that gave us a lot of power: an agnostic replication solution where we were very confident that the data in one place was getting replicated to another.

We did some quite interesting things with it. It wasn't our initiative, we saw it in the market, but we did some great solutions with Postgres replication, using DRBD as a way of having zero data loss in a replicated Postgres setup. We had a lot to do with Postgres ten years ago, when there were about four different ways of doing master-slave replication, all with certain levels of trade-off. With the DRBD model, I was actually present during a demonstration where someone had set up two physical machines in a primary/secondary DRBD pair, writing an incremental insert into the database on the primary. Someone went and pulled the plug out of the primary, and you could note the last value that had been inserted. Then they fired up the secondary and started the database on it, which is not available until you do a failover, and you could see that the very last data point had made it all the way to the secondary server. That was quite interesting, because the other replication strategies didn't guarantee it. Most of the time the latency and lag would be very small, but it was never guaranteed.

We had one other interesting experience with DRBD, where we ran an entire Xen image on a DRBD-replicated server, so that in the instance of a failure the whole Xen image popped up on the other side of the pair. This was our early attempt at a high-availability VM. Of course it seemed brilliantly genius to all of us at the beginning; we thought, what a great idea. And it did solve a lot of problems for us: we didn't have to do any application-level replication, all of the configuration of the server was exactly the same, it looked exactly the same once it had moved from one node to the other, all the VPNs, everything worked. But we had a horrible, frightening moment. For some reason the DRBD replication had jammed, and because we never actually knew whether we were running on the primary or the secondary, when it failed over, all it looked like to us was a reboot. At the time it failed over, the replication had actually been jammed for three months. So at some random time of the evening we started getting all these alerts, and they were strange, random alerts. It wasn't that the system was broken; it was that certain checks started firing on our Nagios, and what we discovered was that these were the alerts we had added over the last several months, because the corresponding changes were not on the server anymore. The whole thing had been wound back three months. Hallelujah: this particular system was a pre-production solution where the client had been dragging their feet on rolling it out, so we had backups and we were able to recover. But that put a pretty horrible taste in our mouths around DRBD. Unless things have changed recently, the challenge with the secondary node in some DRBD setups is that it's a bit of a black box: you don't necessarily know what its state is until you actually enable it. You can ask DRBD, "are you okay?"
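To make "asking DRBD" concrete: DRBD 8.x reports connection state (`cs:`), roles (`ro:`) and disk state (`ds:`) in /proc/drbd, and a naive probe is just string matching on those fields. A minimal sketch, with sample status lines hard-coded for illustration:

```shell
#!/bin/sh
# Sketch only: decide whether a DRBD pair looks in sync by matching the
# cs:/ds: fields DRBD 8.x exposes in /proc/drbd. The sample strings are
# hard-coded; on a real node you would read /proc/drbd (or use
# `drbdadm status`) instead.

check_drbd() {
  case "$1" in
    *cs:Connected*ds:UpToDate/UpToDate*) echo "in-sync" ;;
    *)                                   echo "DEGRADED" ;;
  esac
}

check_drbd "cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate"   # in-sync
check_drbd "cs:StandAlone ro:Primary/Unknown ds:UpToDate/DUnknown"    # DEGRADED
```

The catch, and the point of the story above, is that a probe like this only reports what DRBD believes about itself; it would not have caught our jammed replication, which is why an external test that validates the actual data matters.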
But asking the application if it's okay is not the same as having an external test that validates the state. So a little bit of caution snuck into our world after that. However, we are still using it, and the fact that it's agnostic about what you store makes it an extremely powerful solution.

So, OCFS2: has anyone ever used OCFS2? Wow, okay. We saw that, the Oracle Cluster File System, six or seven years ago and thought: holy grail, this is what we want. It offers a lot of what a cluster offers. We installed it, we did not do enough testing, we did not do enough profiling, and on one happy day it all just came crumbling down. This was on physical machines at the time; this was before there was even AWS in Australia. We ended up in a situation where we basically had a broken file system, and our DR model didn't involve leaving this technology, so we ran away from it. I did look recently to see if anyone was still using it, and I didn't see much of interest. We experimented, and it looked very promising as a multi-master file storage setup, but we would not go anywhere near it again.

So, GlusterFS. Gluster is something we came to in 2012. The background is that this engagement was using AWS, but the information here is agnostic to any platform, because there was nothing here you couldn't do on any cloud, even Azure. We were brought in because we proposed a solution for a large LMS stack. It was a MOOC for an Australian university, and it's a big platform: it gets over a million pages a day and has lots going on. It's quite a standard LAMP-style stack, using Postgres rather than MySQL but the same sort of stuff, using all the horizontal-scaling techniques you would imagine.
In our solution we had NFS; we decided we wanted some sort of high-availability NFS solution. That drew a lot of criticism from a number of solutions architects, who said that doesn't adhere to the architect-for-failure model and you need to go and start using GlusterFS. So that was where we began, in 2012, and we are still using it in this particular platform. It's been clear to us that there's interest in the GlusterFS space: there was a blog post on the Catalyst website about some of the technical tools we use and the ways we apply them, really just our GlusterFS story. We talked about what we used it for, what we'd seen, some of the problems we'd faced and when we thought it was suitable, and that got a lot of traffic. We were contacted on LinkedIn and elsewhere by people saying: we're thinking about using Gluster, what do you think?
What we would say is that it's different, and you need to think about how you're going to use it, and we're happy to have a discussion; some of those conversations went forward into engagements of work.

So, some things to understand. GlusterFS is something we've used in certain ways, and there are other ways of using Gluster that we are not using. First: your application server or web server, if you're talking about a standard web application, needs a client library to actually mount the storage. Sometimes you'll see that Gluster is NFS-compatible, but if you're mounting it over NFS you're not getting the redundancy advantages of Gluster. So your application host needs its own client library, and your application servers then understand that there is more than one Gluster storage node. In the same way that a resolver knows there's more than one DNS server, or mail understands there's another server it can go and talk to without an explicit failover, the failover happens, conceivably, at the app level. It depends on how you've set up your underlying Gluster cluster, but if you have a small number of redundant Gluster nodes, the client will fail over if you, say, pull the plug on one of them; there's some configuration around how long you let it time out, and so on. The storage nodes behind the scenes are also doing some syncing.
They're talking to each other, replicating between themselves and auto-healing. The thing that got us, when we have had issues with Gluster, is that the files themselves are only visible by mounting the Gluster volume. Where you had some NFS problem, for example, and decided it had all gone horribly wrong and performance was terrible, but you really needed to get one particular file off, you could just SSH into the machine and pull the file off. With Gluster that's not always possible, because the files are represented as metadata and pieces, so generally the way you get things off is by mounting the Gluster volume itself, which, if Gluster isn't working very well, is quite problematic. And of course there are a number of considerations around RAID setups and architectures, and all the trade-offs you can make between performance, redundancy and flexibility.

So GlusterFS could be for you for these sorts of reasons: it's just not feasible to move away from a file storage layer, you have to use one, but you don't want to use NFS. Another is if you have a good understanding of the usage pattern of your files, because that really matters. What sort of files is the application putting on there? How many times is it checking for a file that never changes? Is it trying to use locking? Is it using little session files to store some data? Are they changing? Are they small or are they large?
And, following on from that: are you able to configure or patch your application to behave a little more sensibly on Gluster? Larger immutable file objects, the sorts of things you'd put in object storage, are actually a reasonably good use case for Gluster. One of the problems we faced was that our application, the same large Moodle LMS, was using a session file that stored user data. There are a couple of places you can put that: in the database, in a caching server, or on the file system. It's just a small file that stores some information, and if it's running on a single machine, getting updated and read all the time on local disk, that works just fine. But when it was put onto Gluster it didn't work very well at all. It only really manifested itself under certain load conditions: if there wasn't a lot happening it worked reasonably well, but once things started going badly, they started going very badly. That was something we discovered, and it was particularly visible on Gluster as opposed to other solutions.

Some ideas about when Gluster might not be right for you. If you don't understand what your application is really doing, say you're doing a proprietary lift-and-shift exercise where you're moving something into the cloud, you decide you need a clustered file system and you go "we'll just use Gluster", be careful, because you don't really know unless you look at what's actually happening. And there are ways to profile these things.
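On the Gluster side, the built-in profiler is one way to see what a workload is actually doing to the volume. A sketch of the commands, assuming an illustrative volume named gv0:

```
# Start collecting per-brick statistics for the volume.
gluster volume profile gv0 start

# After running a representative workload, dump latency and call counts
# per file operation (LOOKUP, STAT, READ, WRITE, ...).
gluster volume profile gv0 info

# Stop collecting when done; profiling adds some overhead.
gluster volume profile gv0 stop
```

A high proportion of LOOKUP/STAT calls relative to READ/WRITE is the classic signature of the small-hot-file patterns described above.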
There is a risk that it will do things that don't suit Gluster very well. Again, if you're not able to make changes to the application, that limits your ability to fix any issues that arise. Certainly small, highly accessed, mutable files are ones to be careful with; it's not that it can't be done, but it's one to watch. And of course the lack of internal expertise: if you don't have the people, the support, the agreements or the partners to work through any challenges, then once again, be careful.

So what does a sad GlusterFS look like? I'm not saying don't use Gluster at all; I'm just passing on our stories of what it looks like when it's in an unhappy state. We have certainly seen these problems, for a number of reasons. In one case a few years ago it was nothing to do with Gluster: it was actually a kernel bug where, because so many nodes had been created (this is beyond my knowledge of how the connection tracking and routing works), something was being cached that meant the network basically wasn't working between the machines. But this broke Gluster grandiosely, and twice it was a failure of Gluster itself. So we saw massive performance degradation and a complete failure of the application to work. You may get timeouts; file operation failures; split-brain scenarios, which are pretty frightening; clustered failure, meaning one part of your file system, potentially shared across a number of applications, fails, and it fails for one and it fails for all; and of course application outage.
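When you are in that unhappy state, the first-response commands look something like the following sketch (volume name gv0, host names and mount point are illustrative). Note that, as described earlier, files are only reachable through a mount, not by poking at the bricks directly:

```
# Are all the peers and bricks actually up?
gluster peer status
gluster volume status gv0

# Any files pending heal, or in split-brain?
gluster volume heal gv0 info
gluster volume heal gv0 info split-brain

# Mounting with fallback servers, so the client has other nodes to try
# if the first volfile server is down.
mount -t glusterfs -o backup-volfile-servers=gluster2:gluster3 \
    gluster1:/gv0 /mnt/shared
```

The `backup-volfile-servers` option only covers fetching the volume layout at mount time; once mounted, the client talks to all the bricks itself, which is where the app-level failover described above comes from.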
So these sorts of things are very bad. Taking that on board, what are the mistakes that we have seen, and made, in our Gluster implementation journey? Not enough performance and usage profiling. This goes back to treating it as a new model, not just the same as the RAID and file storage models built before. There was quite a standard way of testing those: people were very interested in IOPS, throughput and availability, and how you might tune the RAID setups, and that's still important, but you have to actually understand the usage and how your application is talking to the file system.

Load testing, of course. Load testing gave us an interesting scenario with Apache. We had a pretty standard web application with a site-data area visible to Apache, and by default, every time Apache served something from a folder on that site-data volume, it looked for the small .htaccess file. Even though we never use that particular technique, we never do .htaccess that way, we had centralised configuration, by default Apache was still checking whether one existed. That gave massive performance degradation; once we reconfigured Apache to not do that anymore, we got about ten times more throughput.
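The reconfiguration is a one-line Apache directive. A sketch, assuming the shared site data lives under the hypothetical path /var/www/sitedata:

```
# With AllowOverride None, Apache stops probing for .htaccess in every
# directory it serves, so each request no longer stat()s the shared volume.
<Directory /var/www/sitedata>
    AllowOverride None
</Directory>
```

On a local disk those extra stat() calls are nearly free; on a networked file system each one is a round trip, which is why the same default behaviour only became a problem on Gluster.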
But that was, once again, something we weren't expecting. The other implementation mistake is the file-system-assumptions problem I talked about: assuming that because it looks like a file system and behaves like a file system, it's the same file system you've always used. That assumption is dangerous.

Another is getting too distracted by the traditional file system and storage measuring exercises: what sort of RAID setups you want, how many IOPS you need, because those are solvable problems in other ways. If you're talking about cloud-native applications, there are other ways to get faster throughput, other tools you can use. It isn't like the past, where the fastest place to write persistent storage was the centralised disk. There are other approaches that still adhere to high availability and no single point of failure, even in-memory ones, which, if you're comfortable with them and have set them up properly, are still valid. When we use storage nodes in the cloud, we tend to just use a single disk, a single block device, because there's already redundancy under the hood, and redundancy built across a number of nodes. But sometimes, because of the legacy of how we did this in the past, we spend a lot of time faffing around with all sorts of RAID setups, and I don't necessarily think that was always terribly valuable.

Backup and monitoring in the Gluster world are also different; you need to have a think about how you're going to do them.
It isn't snapshotting block devices Is something that is a good way of just getting started But isn't always perfect and can end up causing you to pay quite a lot of money for a lot of snapshots Even if you have one that does only incremental Even if you have a solution that only charges you for the incremental difference If you start doing it lots and lots and lots you still find yourself really storing a lot of things That you probably don't really need to store and that just means you spend more and isn't always And monitoring as well monitoring you always need to spend more time on getting your monitoring tidy and getting it slick review review Retro talk talk about problems get in a room Discuss what your challenges are see if you can improve it experiment I mean this is the great thing about a cloud infrastructure is that you can prototype and you can stand off You can have an orchestrated stack and fire it up and run load testing and all these things There was a lot more challenging traditionally with physical machines because you only had a symbol Small amount of them and they weren't they weren't sort of lying around for your use So now I do sort of come on to just mentioning objects and I this cap theorem I just think is genius when I first saw the cap theorem and it's relevant for file storage and databases and Everything that is based on the single point of truth is really understanding what your storage needs can tolerate Because I mean really as much as this is about clustered file systems in the title You want to be starting to think always about how you can move to objects And I mean I don't have time to talk about a large reason of why objects are great Or what are the advantages of them, but they have they have many advantages in terms of management and overhead and capacity and all those things but they they don't map one-to-one with the use pattern of of applications expecting a file system and It's not always easy But it's really I find it 
really useful just to think about which one of those three things you can sacrifice So you got consistency availability and partition tolerance Traditionally file systems don't do the partition tolerance terribly well, but they are consistent and they are available Whereas object storage use you have tolerance to petitions and availability, but the answer isn't always consistent now that that that's fine if your application understands that and Certainly more and more we see the future of our cloud native applications as being using objects And we're already starting to do that But there are challenges and it's quite a journey, especially if something was built wasn't built for that and In our opinion some of the object implementation challenges that you will face if you're taking an application moving it into object storage from Network storage Using the APIs right you still gonna have to go through this application and find all the places where it's writing things Even where it has a file system layer and implement the API use the whole eventual consistency just makes my brain hurt It shouldn't be like that you wrote something you should get it back and you will most of the time But your application has to not be confused when it doesn't get the same answer back Right and things like large objects for example the risk once once objects start getting larger the risk of getting inconsistent Answers is very low, but still it will be very bad if these things happen like for example with the with the session Example like I've earlier where data was being stored inside a session that would not be suitable for objects because you would get inconsistent answers and You might get different answers at different times any and the latency of adding these things are generally too high a Mutable objects objects can't change you have to create them and delete them There's access latency sometimes. 
It's not that fast to push in and out objects compared to writing to a file system and The other one is once you decide what is a good use of object storage Understanding what you're going to do with the pieces of your file system sort of workload that don't fit that model Which other tools are you going to use? Are you going to use Redis? Are you going to use a queue? Are you going to use some in-memory storage really understanding what your options are and what's a good idea and how that needs to Be architected and that's very much not a solve problem, but there are so many tools in the open-sport space that really allow you to amazing things The last one is just that legacy file system addiction that we just stuck on it And it's the way it should be and like that. That's what that's where we want to be So this is the this is the poorly worded Cloud native application oath, you know my cloud native application does not need a file system for persistent storage Well, of course many of the ones that still run on our workloads do but moving forward There are just so many advantages to objects in terms of the I mean in terms of the underlying infrastructure If you're actually running an open-state you cloud yourself I mean Swift just makes so much more sense as opposed to big disks Your capacity management the way you actually have to care about the assets in there and your ability to roll things over the way You do backups everything it makes a lot of sense But is a bit of a radical leap One more thing actually that I've actually got a few minutes left. So I'll mention is that technically Does anyone still use blobs database blobs? in their application Okay, so not many because once upon a time of course, especially in the Oracle world Storing files was very much about using the blob layer. 
That meant putting binary objects into the database, which was essentially a network file system in the sense that it was available from a lot of places and gave you a centralised point of storage. But it had lots of problems; every time I tried to use blobs, it just wasn't a very elegant way of doing things.

So I've actually finished early, which is good, and I hope I can take some questions. Thank you very much. Do you have any questions?

[Audience question about CephFS.] So, CephFS I did not mention; we have not used CephFS. We're aware of it. We're using Ceph as a storage layer in our OpenStack cloud to provide elastic block services, but we have not used CephFS. We would like to have a look at it, I'm very interested in having a look at it, but we haven't rolled it out yet. Okay. Thank you very much.