going through the process of setting up and installing Swift. So this will be the process of, by hand, bringing down the code, doing the ring building, pushing the rings out, all of the steps involved with setting up and getting Swift running. And it's a great place for us to start. The next session will start in another hour or so, after a 10-minute break. In that one we'll do a Swift install with the SwiftStack platform and go through the automation processes. It'll be a little bit shorter workshop than this first one here.

When we first started doing this, we brought in ISOs and distributed USB keys around. Everyone got one and, oh my gosh, it was tough. But we had to do it because we couldn't have internet. Fortunately, the OpenStack conferences and summits have been so good on internet that we said, well, let's do something over the network. And we have way more people now, and it's a pain in the butt to manage USB keys. So we spun up a bunch of instances on Rackspace for us to log in to and do the configuration on. That's what we're going to be using as a baseline image here. Can I get the volume turned down a little bit, please? That's the gain. Thank you. Hey, Martin, do we have the presentation up? Oh, OK. Yeah, I will.

So the first step here is that we're going to log in to each one of those nodes. We're going to show a presentation of the concepts, the slideware, if you will, and then Martin will be driving what we're going to be typing on the command line on our instances. So really, the first step is to log into the instance that we have running in the cloud and get a command prompt like Martin has right here. Does everyone who wants one have one of the handouts with the login information on it? And I see that because we have a handout, some people are going to run ahead and just do everything all at once.

Let me quickly introduce myself and the folks in the room. I'm Joe Arnold, CEO of SwiftStack, and that's John Dickinson, the project technical lead, also at SwiftStack. Martin helps out with our engagements, and we have Clay here who's going to be walking around, so if you have any questions along the way about Swift or SwiftStack, ask him. Mark Hugo, in the back with the paper, is another person you can ask questions during the course of this workshop, too. So yeah, exactly as Clay said, if you have a question while we're going through this, because we're going to talk about some of the core constructs, don't hesitate to ask. We're here to answer questions. This is a workshop, not a lecture series.

OK. Right. So the philosophy of the lab: we're going to do this step by step, not with a magical deployment tool; that's in the next workshop. And we're going to go through the key concepts as we start typing things along here. So the steps: we're going to log into the node, format the devices, mount the drives, create the partitions, build the rings, start up Swift, and then upload something to it. That's what we're going to do in the next 40 minutes. This is the first step, log in. So has everyone done this by now?
I know we've been kind of stalling around. Anybody need help logging in, if it's not working for them? Because we could have missed something in our automated provisioning scripts that set everything up. OK.

So the first thing we're going to do is format the devices on the node itself. We're going to be using XFS to format the drives. Hey, John, do you want to say why we're using XFS? Because it rocks. Because it rocks. In Swift, we generally recommend that people use XFS because, in a lot of testing and in large-scale deployments, it happens to be exceptionally good at failure handling and at handling very large numbers of inodes. And it stays quite performant as it gets more and more data stored in it. So that's why we do that.

Now, when we format, there are some options that we pass into the format command to make the file system, and a couple of those are very important. One is the inode size: we generally recommend using 512-byte inodes. This is where all of the metadata for your objects is stored, so an inode size of 512 is a nice balance. You have enough space to store most of your metadata without having to spill into new extents on the disk, so you get lower overhead. The next, and most important, thing is mounting by label. This is just a very good best practice so that when your system reboots for any reason, your drives don't get mounted in a different order and remapped to different mount points. So creating the file system with a label and then mounting by that label becomes very important. Not that we have any practical experience with nodes rebooting and the entire node coming up where we don't know where any of the data is.

Yeah, yeah, question? No, that's just a rule of thumb. For almost all deployments it's a good size to use, whether you're using 2 TB disks or 4 TB disks; it's still just a good thing. The reason is that the inodes store not only information like directories, but also the actual object metadata, because it's kept in extended attributes. So you can probably construct a situation where, if you had a metadata-intense workload, you might want to bump that number up. And even if you overflow it, it will just go to another extent, which is not great, but it still works.

And Martin's really far ahead. So just a note, thank you, thank you. Because we booted these environments on Rackspace for such a large group, we created a bunch of block devices, and then Rackspace got mad at us: why do you need 600 block devices? We got a call from them. So at the last minute we had to change both the slides and the handout, and Martin was up all night straightening that out. So it's using LVM right now, which means you need to put /dev in front of the mapper path; the slide has a little typo in it, and it's slightly different from what's on your handout. So you're formatting these /dev/mapper devices, xvd d through f. Right, so change that here. I'll change it on here right now.
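As a rough sketch, the format step looks something like the following for each of the three devices. The exact /dev/mapper device names depend on how the instances were provisioned, and the d1 through d3 labels are just the naming convention used in this workshop, so treat both as assumptions:

    # format each block device with XFS, 512-byte inodes, and a label we can mount by
    mkfs.xfs -f -i size=512 -L d1 /dev/mapper/xvdd
    mkfs.xfs -f -i size=512 -L d2 /dev/mapper/xvde
    mkfs.xfs -f -i size=512 -L d3 /dev/mapper/xvdf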
So it should look like this. Apologies for the handout; we had to make the adjustments at one o'clock last night, but this is what the command should look like right here.

The question is: is there a reason we're not creating partitions but using the whole drive? Why have a partition? When we're building out these systems, we're not running an operating system on these drives or anything that needs swap space. We're consuming the whole drive into the system, that's our failure domain, and that's what gets mapped into the ring. So we don't really feel the need to chop it up into more pieces. That's a good observation. We just format the device directly, and that's the practice we recommend as you add more drives.

Another thing to note: we're not going to be using RAID here. So when you're provisioning equipment, the drives are exposed in JBOD mode; they're not RAIDed together. If you have some existing equipment at home or at work and you want to try this and it has a RAID card, just create one volume per drive for testing. But if you're buying equipment for a deployment, there are hardware configurations we recommend where you just have an HBA, not a RAID card, and you expose one disk per volume, and that's what you format here. Yeah, there are some features included on RAID cards that can be beneficial, but you won't use the RAID functionality, so there's a cost trade-off there.

So the next thing we're going to do is create a place for each of these formatted devices to be mounted. The steps here are just to make a mount point for each one. So mkdir -p, which builds the full path, and then we're going to mount each of the devices we just created by its label.

So we just ran out. If there's anyone not participating that has a handout with a sticker, let us know so we can give it to someone who may want to use it. Thank you very much. We may have a blank handout, I don't know. Hey Hugo, do we have any blank handouts that some folks may want to take home? We should. Go ahead. So start with this one. No, this really does happen; the labeling's really important. Yeah, question. I don't know, do we have any blanks? I don't think we have any more blanks. No, we're out of blanks. This is what we have.

Yeah, so just to reiterate, the purpose of labeling the disks is that when a system reboots, sdb, sdc and so on will sometimes get jumbled around. Or if you have a failure and you're removing that drive and putting a new one in, or you're replacing a controller, or adding a different expander card, Linux will often jumble those around. So we label the devices so that we know which is which in a reboot situation, and we can then take that drive and mount it in the same location.

And if we did that in fstab... first off, another thing: we have a book which goes through this as well, so grab a copy as soon as this workshop is over if you haven't gotten one already. One of the things it covers is: don't do this mounting in fstab, because in a dense storage system you'll see a lot of drive failures, and you don't want to halt the boot process due to a bad disk. So we recommend having something that runs after the operating system has booted to do the mounting, and to do that mounting based on the label.
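A minimal sketch of the mount step, assuming the conventional /srv/node/&lt;device&gt; layout that Swift's sample configs point at and the d1 through d3 labels from the format step (the noatime option is a common recommendation, not something specific to this workshop):

    # create a mount point per device and mount each one by its label
    mkdir -p /srv/node/d1 /srv/node/d2 /srv/node/d3
    mount -t xfs -L d1 -o noatime /srv/node/d1
    mount -t xfs -L d2 -o noatime /srv/node/d2
    mount -t xfs -L d3 -o noatime /srv/node/d3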
We're not going to do that in this 40-minute workshop, but it's covered in the installing Swift part of the book. Time, okay. So next, we're going to change the ownership of the file systems so that they're owned by the Swift user.

Yes, I'll start talking now. Okay, so to get you started, we have already installed all of the Swift dependencies on that server, thank you, so that you don't have to worry about grabbing the code right now and going through a package install. We've already done that for you, and we've already put some configs in place that are just nice default configurations, in the interest of time for this workshop.

So the next step is to create the ring builder files. The ring builder files are what allow you to describe your cluster. These are generally created and managed offline from your cluster; you generally do this not on your live cluster. And you create a ring for each of the different kinds of things that are stored in Swift: accounts, containers, and objects. Each of these three is described by a single ring. Now, these rings can actually have all of the same drives assigned to them, which is exactly what we're going to do here today. So we're going to create the account, the container, and the object rings, and we're going to put all three of the file systems we just made into each of them.

Now, the ring builder command here, I want to describe a couple of points on it. The first thing: obviously we have the CLI command, then the type of builder file we're creating, then the create option. There are others, of course, such as add; we'll see those in just a moment. But the magic numbers here are 14, 3, and 1. As you create your ring, you need to choose these based on... yeah, no, I'm going to go through the next slides, because we have a slide on each one of these.

So the first one is: how big are you going to be when you grow up? This is called the partition power, and it's a scary thing for most people to set because it's something you have to decide before you deploy the Swift cluster. We recommend setting it to a large number. What this sets is the size of the partition space in Swift, and that partition space gets distributed across all of the disks in the system. You want enough partitions that you avoid having any hotspots, but not so many that there's more processing than needed to find out where a partition is. So we have a rule of thumb, which is based on how much space you have available in your data center and how many spindles you'll physically be able to fit inside it. It's a calculable number, and this is how we recommend setting it: take the number of drives you think you'll eventually have and, because it's a partition power, carry it up to the next power of two (we'll do the math on the next slide). A good setting that we recommend is 18, which will carry a cluster up to a pretty decent size. In the workshop, we're going to recommend setting it smaller so that it doesn't take very long to build the rings, because it's computationally intensive.
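A sketch of those two steps, assuming the /srv/node layout above and the 14 / 3 / 1 values (partition power, replica count, min_part_hours) that the next few slides explain:

    # give the swift user ownership of the mounted file systems
    chown -R swift:swift /srv/node

    # create a builder file for each ring: partition power 14, 3 replicas, min_part_hours 1
    swift-ring-builder account.builder create 14 3 1
    swift-ring-builder container.builder create 14 3 1
    swift-ring-builder object.builder create 14 3 1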
And just to go through the math in very explicit detail: let's assume hypothetically that your cluster was only ever going to have four JBODs, each of which had 24 spindles inside it, so a total of 96 drives, and you knew this was the maximum size you were ever possibly going to grow your cluster to. What's the appropriate partition power? In that case, we have 96 drives total, and we want about 100 partitions on each of those at scale, when we're all grown up. So we multiply 96 times 100 to get 9,600. Now we look for the smallest power of two that is at least 9,600, and in this case, which is the example we're using here in the workshop, that's 14, because two to the 14th is just the right size: 16,384, yeah.

So that's that setting. The next one is how many replicas we want to have in the cluster. Now, this is changeable over the life of the cluster; you can actually change it incrementally, you can go from three to 3.1. Anyway, we recommend setting it to three, and that's what's been tested. How the system behaves under failures with three replicas is very well understood and very well tested. We know there are some deployments running with two with some success, and we're working with a larger distributed cluster to go up to four replicas, where you have two replicas on each side.

Min-part-hours, do you want to do that one? And then finally the last magic number: if you remember, the swift-ring-builder command had the 14, 3, 1; 14 for the partition power, 3 for the replica count. And the 1 is something we call min_part_hours. It's a nice little programmer variable name for the minimum number of hours that must elapse before the swift-ring-builder command line tool will allow you to do another rebalance. This is something that helps protect you when you're doing a rebalance. As you're growing, you're adding more drives; as you're replacing things, you're taking old drives out. Each time you do that, you want to rebalance your ring so that the data is spread smoothly throughout all of it. So when we do a rebalance, Swift will make sure that it locks down at least two copies of your three replicas, so that you can always remain available even while you're in the process of deploying your rebalanced ring.

You set this number generally based on the amount of time it takes your cluster to undergo a replication cycle: how long does it take to ensure that all of the data is in the right place? There's an item in the log that will come across saying, okay, finished the replication check, and you can watch that data point across your cluster to know how long a full replication cycle takes. Then you take that number and put it in the ring builder, so it knows how quickly or how aggressively it can move data around when you're rebalancing, adding new capacity, and the like. One implication of this setting is that, in order to keep your data available, it may take more than one rebalance cycle to completely smooth out your data. You can see this very easily if you're adding a significant percentage of capacity to your cluster all at once: you may need to do a rebalance, let it all settle out, then rebalance again, and continue until it's really done.
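A quick way to sanity-check that arithmetic for the hypothetical 96-drive cluster (the 100-partitions-per-drive figure is the rule of thumb from above):

    # 4 JBODs x 24 spindles = 96 drives; aim for ~100 partitions per drive at full size
    echo $(( 96 * 100 ))    # 9600 target partitions
    # smallest power of two >= 9600: 2^13 = 8192 is too small, 2^14 works
    echo $(( 2 ** 14 ))     # 16384, so the partition power is 14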
And that command line tool, as you play with it, gives a lot of good feedback on the current status of your ring and how well it's working.

The next thing is to add the devices. And we're ever lazy, so there's a little shell script for you that adds each of the devices into the ring builder. The ring builder is a database; you're putting information about each device into that database with swift-ring-builder add, and you're doing that for each of the account, container, and object ring files. One of the other important parts of the swift-ring-builder add command is this number at the end, 100. This number is the weight of the storage volume that you are adding into the cluster. The weight is a dimensionless number; it only makes sense in comparison to the others. So if something has a weight of 100, it's generally going to be assigned twice as many partitions, and therefore roughly twice as many objects, as something with a weight of 50.

I heard Clay say it, so: we added a new feature in Swift, which is regions. When we say add here, we did not specify the region, so it defaults to a single region. That way it doesn't break any existing scripts, but it'll say, hey, we have this new region feature, we're going to just default it to one. So you should have gotten a warning about that.

The other thing to talk about is zones. In Swift, data is going to be distributed, as we say, as uniquely as possible across disks, nodes, zones, and regions. Zones are for unique fault-tolerant domains. So if, when you're doing your build, a group of equipment is on different power, or a different network segment, or in a different data center room, then that is what you would anoint as a zone. Most deployments under a petabyte will be running with a single zone most of the time, and data will be distributed across the nodes just fine. It's when you start breaking out of that one-to-two-rack range that we start using zones.

Yeah. Say that again. So the list is: drives, nodes, zones, and regions. If you have one box, which is what we have here, and you have three drives, they can all be in the same zone, and data is going to be placed as uniquely as possible. We only have one node to work with, but we have three drives, so we're going to put replicas across all three drives. If we had two boxes, two replicas would go on one and one would go on the other, more or less randomly, two on one and one on the other, or vice versa. As you add more devices and you start having a fault-tolerant or failure domain, then you can say, this is a zone. And once you create another group of machines that are connected somewhere else, then you can call that another zone.

Yeah. Yeah, that's a good question. So yeah, in this exercise, maybe what we should have done is put everything in zone one; that would probably have been more illustrative of how you should proceed. The danger is that if you don't truly represent your failure domains correctly, you run the risk of having replicas exist in failure domains that are actually close together. So if you have 20 zones spread across three different rooms, you could set up a situation where all the replicas of a particular object happen to sit in one room. If you partition the zones too fine-grained, that's the risk you run.
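The little shell script on the handout boils down to swift-ring-builder add calls along these lines. The loopback IP and the 6002/6001/6000 ports are the stock single-node defaults of that era, and, as the zone discussion above suggests, the workshop script drops each drive into its own zone; treat all of those specifics as assumptions that depend on your config:

    # add each labeled device to all three rings with a weight of 100
    for i in 1 2 3; do
        swift-ring-builder account.builder   add z$i-127.0.0.1:6002/d$i 100
        swift-ring-builder container.builder add z$i-127.0.0.1:6001/d$i 100
        swift-ring-builder object.builder    add z$i-127.0.0.1:6000/d$i 100
    done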
So don't run a tremendous number of zones; only use zones where they truly represent a failure domain.

No, we just did, with that ring-builder command. If you go back. Which one? Next slide. This one? This one. So that mount point is, see the very end, the d, dollar sign, i: that's the disk label that we set up earlier. Correct, because earlier we mounted the device labeled d1 at d1, so it's a nice one-to-one mapping. But ultimately this is the mount point of that volume.

So this is a nice step to do, to validate the rings. If you run this command, take a look, and you'll be able to see all of the devices that have been registered with the builder file. Martin would have shown the output, but apparently Martin's catching up right now; he had to. Okay, good.

Yeah. You use regions when you have data centers with latency sensitivity between them. Between two different regions, Swift has different rules about how data traverses that WAN link. When a write occurs, the write happens durably in one of the regions and is then asynchronously replicated to the other. And when a read happens, a request into one region will prefer data in that region and not go over the WAN link to try to retrieve it. So that's when you'd use regions.

Yeah. No, it doesn't matter. So the question was: we do account, container, and object, and we've just been doing them in that order. There's really no ordering per se. The reason there are three rings is that you want the option to be able to tier account and container data onto different hardware in the system. For example, we're working with a deployment where we want high performance for account and container, so we've tiered that off onto its own infrastructure running SSDs, and the objects are running on spinning media. By having separate rings for account and container, we get the flexibility to tier those things apart. But in our example now, and in the next one, we're just going to be running everything on all the spindles.

Rebalance. Oh, these are? Yeah. So once we have done this, we need to rebalance the ring, just as I described a little bit earlier. This will assign the partitions to the particular storage volumes. Again, we'll keep using the word partitions; as we talked about earlier, these are not file system partitions, but pieces of the overall partition space of that ring. The rebalance will make sure that the partitions are allocated as uniquely as possible across all of your available storage volumes. In this case, since we have it very simply deployed with just three storage volumes, we're going to have one third of the partitions assigned to each of those devices. So you can run the rebalance command here, and that will create the serialized version of your ring file, which is then deployed out to your cluster, because the builder file is actually pretty big. And rebalancing is slow. And rebalancing is slow. You're distilling down from a database of all of the devices, when each of those devices got introduced into the system, and its history of which partitions have existed on it, into a condensed roadmap, if you will, of where all the partitions are supposed to live. That's what the rebalance command does: you're taking a database and distilling it down into what is, in effect, a pickled blob.
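Concretely, the validation and rebalance steps look roughly like this. Running a builder file with no subcommand prints the device table, and rebalance writes out the serialized .ring.gz files; this assumes you're working in /etc/swift, which is where Swift looks for its ring files by default:

    # show the devices registered in each builder (sanity check before rebalancing)
    swift-ring-builder object.builder

    # assign partitions and write out the serialized ring files (*.ring.gz)
    swift-ring-builder account.builder rebalance
    swift-ring-builder container.builder rebalance
    swift-ring-builder object.builder rebalance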
That's going to get loaded into memory by all the nodes in the system. So now, once you're done rebalancing, this creates that ring file and puts it in the appropriate place. As it happens, you don't have to deploy it out anywhere, because we're all running on a single server. Then you can run that swift-ring-builder command again with your builder file, and you'll see that partitions have been assigned. You'll see that the ring balance is now zero, which means nothing is over-full or under-full; it's right at zero, which is exactly what you want. And we're ready to go.

And now it is time to start Swift. Has anybody had any problems getting up to the ring building step yet? If so, raise your hand and Hugo or Clay or Darryl and I will help you out. I think we're good.

Okay, so now that you have everything in place: we installed Swift for you and set up a sample config file, because we're not going to have time to walk through all the config options and things like that. And now that we have created some file systems, mounted those drives, ensured that they're in the ring properly, got the proper balance on everything, and deployed our ring file out to the server just by virtue of it running there in this case, we're ready to start Swift.

Swift comes with a swift-init command, a binary CLI tool, that allows you to control the Swift processes. There are lots of different server processes and daemon processes that can run. The one on the screen right now, swift-init main restart (you could just use start in this particular case), will start up the main server processes: the proxy server, the account server, the container server, and the object server. These four processes will start up in the background and start listening and responding to connections. At this point, you have a working Swift cluster, at least from an API perspective, with an endpoint. Yeah, the replicators, the consistency processes, don't get started with this command; we'll walk through that next, if we have time. If we have time.

So you can see what's going on: one of the things I left out is that we've also configured syslog for you to segment off all of the Swift logs into /var/log/swift/all.log. Normally, since Swift uses syslog, unless you have it configured specially these would go into the default /var/log/syslog. But you can segment off where it goes, in very fine detail.

So the first thing we're going to do once we have an API system up and running is walk through the authentication. The mechanism Swift uses for auth is: you send in a request, a token is generated and returned to the client, and that token is used for subsequent API requests. We can do this by hand, manually if you will, using curl, and we can see a token being generated from an authentication request to the authentication service that's running with Swift. There's a configuration setting for how long the token lasts; it depends on the auth system you're using, but for the one that comes out of the box with Swift, the default is 86,400 seconds, 24 hours. So when you make this request, you're going to get a token, represented by the X-Auth-Token (also returned as X-Storage-Token) header, and you're also going to get a URL.
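A sketch of starting the main servers and making that auth request by hand. The admin:admin account and admin key are only a stand-in for the workshop's pre-configured credentials, and the proxy is assumed to be listening on port 80 on these instances, so adjust all of that to your own config:

    # start the proxy, account, container, and object servers in the background
    swift-init main restart

    # watch Swift's logs, which syslog is segmenting into this file
    tail -f /var/log/swift/all.log

    # ask the auth service for a token; the response headers include
    # X-Storage-Url and X-Auth-Token for use on subsequent requests
    curl -i -H 'X-Auth-User: admin:admin' -H 'X-Auth-Key: admin' \
        http://127.0.0.1/auth/v1.0

    # then use the returned token and storage URL for API requests, e.g. an
    # account GET (which will be empty until something is uploaded):
    #   curl -i -H "X-Auth-Token: <token>" <storage-url>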
And so as a client, when you're building one, you do the authentication request, you pull out that storage URL, you pull out that token, and then you use those two items to make your subsequent requests. So this is what a request on the account would look like: you take that auth token and that URL, and it says, hey, there's no content here, because, well, you haven't uploaded anything yet, right?

One of the things that I like to point out with Swift, I think we're... nope. Nope, okay, never mind. We're not out of time, but we're running short. So the next step is to upload something. What we're going to do next is use the Swift command line. And we've put up a picture of a cat, because if you're not uploading cat pictures to a web-addressable storage system, you're doing it wrong, that's right. There's a Swift command line client that lets you specify the authentication endpoint and your user credentials, and then issue upload and download commands so that you can get something into the system. So what we're going to type here is swift, dash U and the pre-configured account name that we already have on the cluster, dash K, which is the key, the password admin, and dash A, which stands for auth URL, and we're going to point that locally at the system. If you cd into the home directory, there's a cats.jpg that we'll upload. Are there any questions about this part? Anybody having issues with this part? It will automatically create the container if it doesn't exist. It's only one file in there. Correct. Millions of files. Yeah.

So then, if we ask the cluster again for the listing of all the files, we should get back, hey, the file we just uploaded. Okay, now we're going to get really fancy, really fancy. What we're going to do is change some of the metadata on that container so that it's globally readable. On the object, I think? Nope, it's on the container. I think it's on the container. And it's setting it to be publicly readable: anything in that cats container, anyone can read. So this is taking advantage of Swift's ACL functionality. When we do that, we do the post, dash r, reads, splat: the referrer ACL on the cats container. Then we can pull up that IP address, the one on the front, the one you SSH'd into. We're running Swift on it, on port 80. So now, effectively, we've just stood up a web server that has a storage system underneath it. We just hit that URL, and we should be able to load up our cat picture.

The minus r, so the description of this part: when you're posting metadata onto this container, minus r is a shortcut for read ACLs, and the .r:* is a slightly abbreviated shortcut for "any referrer is allowed to have read access to this." Yes, that's the referrer.

And so that is the workshop. We've covered a lot of ground in a very short period of time. We have a book that we've done on Swift which goes into more detail on all of these steps and all of these configuration settings, and we'll have that available up front here. We're going to take, I believe, a 10-minute break, and we're going to come back and do another install of Swift, because I know you can't get enough ways to set up and install Swift. So we're doing another one this time.
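Putting those commands together, roughly; the admin:admin account, the admin key, and the cats container name are stand-ins for the workshop's pre-configured values, so treat them as assumptions:

    # upload cats.jpg into a container called "cats" (created automatically if needed)
    swift -A http://127.0.0.1/auth/v1.0 -U admin:admin -K admin upload cats cats.jpg

    # list what's in the container; we should see the file we just uploaded
    swift -A http://127.0.0.1/auth/v1.0 -U admin:admin -K admin list cats

    # make the container publicly readable: ".r:*" means any referrer may read
    swift -A http://127.0.0.1/auth/v1.0 -U admin:admin -K admin post -r '.r:*' cats

After that, hitting the object's URL on the instance's public IP in a browser, something like http://&lt;instance-ip&gt;/v1/AUTH_admin/cats/cats.jpg, should show the cat picture, with the account portion of the path depending on the storage URL your auth request returned.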
We're going to do it with the SwiftStack platform. And we'll start at, I think it's 2:20, right? 2:10? Well, it's 2:10 now, but I think it's a 10-minute break between workshops. Yep, yep, 2:20. So we'll come back. And if you have questions about getting this up and running now, there are a few of us who will walk around during this break and help you get through that last mile. Then come up front; we have more copies of the book to hand out. So thank you.