All right, let's see. Sound good? Can't see anybody up there. All right, so welcome, everybody, to bringing up a standalone multi-node OpenStack Swift cluster. Thank you all for being here so early in the morning. I know it's early for a lot of us; it's 2 a.m. to me. And especially after breakfast, I know we'd all rather still be asleep. So thank you for being here. So who am I? I'm James Thorne. I'm a sales engineer at Rackspace, working in the Rackspace private cloud business unit. I've been with Rackspace for about a year and four months, and that's as much of an introduction as I'm going to give, so we're going to get right to it. What are we going to talk about today? First, a very quick overview of OpenStack Swift. Then we're going to go through the Vagrantfile I've put together to create a standalone multi-node OpenStack Swift cluster. Then we'll do a quick demo of some of the swift commands and the swift-recon commands, open it up for Q&A, and at the very end I'll show you where you can get all this stuff, so you can do it on your own time. So what is OpenStack Swift? I say OpenStack Swift and not just Swift because, thanks to Apple, we now have two popular things called Swift, and it makes googling for information a bit more difficult. Hopefully Swift has been trademarked by the OpenStack Foundation, and we won't have any silly name changes like we've had with some of the other OpenStack projects. I could attempt my own definition of Swift, but the folks over at the OpenStack developer website provide a very efficient one: Swift is a highly available, distributed, eventually consistent object/blob store. Organizations can use Swift to store lots of data efficiently, safely, and cheaply. And those last three words, efficiently, safely, and cheaply, are kind of the foundation of Swift.
It's something you can run on commodity gear or enterprise-grade servers, or anything in between. So, use cases. What can you do with Swift? It's an object/blob store. What does that mean? It's a place to put stuff. A blob, an object, is essentially a file. So one of the biggest use cases that always comes to my mind is photo storage. We all have cell phones in our pockets with high-resolution cameras. We all have digital SLRs at home. And we're always taking pictures of our cats and our kids. And even though all those pictures look exactly the same, we always want to keep them around. And a Swift cluster is a fantastic place to put all that information, all those pictures, efficiently, safely, and cheaply. In addition, it's a perfect place to put anything static and file-like: your photos, your audio files, your static web assets. It's not something where you would run any sort of real-time data, so you're not going to run a virtual machine off Swift. You're not going to run a database off of Swift. But what you can do is back up those database snapshots to a Swift cluster. Back up those virtual machine snapshots to a Swift cluster. Anything static that you need to write once and read later, maybe many times, is a perfect use case for a Swift environment. So how do you access an OpenStack Swift cluster? It's done entirely through the API. So you have the Python swift command; it's using the APIs on the back end to talk to the Swift cluster. You can, of course, use curl. And if you tie Swift into the Horizon dashboard, there's a GUI that sits in front of Swift. That, of course, uses the APIs as well to interface with the Swift cluster to create containers, upload objects, download objects, delete containers, and so on and so forth. So I mentioned that was a very brief overview, and the meat of this presentation is actually in the demo. So I'm going to jump out of this presentation.
Now, where does PowerPoint hide this thing? All right. So on my laptop, and this is just a four-year-old MacBook Air with 4 gigs of RAM, I'm running four virtual machines. So that kind of shows you the small amount of resources necessary. But this laptop is running out of RAM, so things are going to be a bit slow. I recommend at least eight gigs of RAM to really do this efficiently. Each virtual machine has about half a gig of RAM, which is not enough, but it'll work for demo purposes. So to show you real quick, let's see. We're running vagrant status here. So I'm running Vagrant with VirtualBox, and I'm sure you all are familiar with VirtualBox. VMware Fusion is another option you can use as well. And then Vagrant, if you're not familiar, sits on top of both of those things, as well as VMware Workstation. It allows you to bring up virtual machines very quickly without having to interface with the VirtualBox GUI or the VMware Fusion GUI. It becomes very tedious going through all those menu options when you're creating a lot of virtual machines, especially when you want to bring up an entire environment quickly, bring it back down, and maybe do it again after some changes or testing has been done. So with an image and a text file, you can create an environment. I can give you those things, and you can create an exact replica environment. That's what Vagrant is really meant to do, and it does a great job at it. So I currently have four virtual machines running: one proxy node and three object nodes. An OpenStack Swift cluster can be architected in a lot of different ways, but typically you have the proxy nodes, and you can have many of those for redundancy and high availability. That's where the Swift API runs, and that's the front end for the environment. So whenever you upload or download or do any sort of querying against the cluster, it's gonna go through that proxy node, and then that proxy node will work with the object nodes to service that request.
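To make that request flow concrete, here is a rough sketch of what talking to the proxy looks like over raw HTTP with curl, using the TempAuth credentials and proxy address that appear later in this demo (the token value shown is purely illustrative):

```shell
# Step 1: authenticate against the proxy's TempAuth endpoint (v1 auth).
# The response headers include X-Storage-Url and X-Auth-Token.
curl -i \
  -H 'X-Storage-User: admin:admin' \
  -H 'X-Storage-Pass: admin' \
  http://192.168.236.60:8080/auth/v1.0

# Step 2: send that token with every subsequent request, e.g. list containers:
curl -H 'X-Auth-Token: AUTH_tk_example' \
  http://192.168.236.60:8080/v1/AUTH_admin
```

Everything the swift client and Horizon do is ultimately a sequence of requests like these.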
So on the object nodes, that's where the actual storage is. So all your hard disks. In this case, I just have a loopback file on each virtual machine, a 10 gig loopback file that's created as a device, and I'll show you that here shortly. On those object nodes are the object services, the account services, and the container services. All those can be separated out, but wherever your storage is, that's where the object services need to run. And then you could distribute the account and the container services to other servers depending on how large your environment is, or how heavily it's taxed, or whatever your use case is. So let's actually look at the Vagrantfile. I'm gonna start here at the bottom. And this is gonna look a little different than a typical Vagrantfile. I've taken some time to try and reduce some of the redundant code that you typically see in a Vagrantfile. And what do I mean by that? So I have four virtual machines running, and each of those virtual machines is defined by what I call a Vagrant virtual machine definition. It's this block you see on the screen up here. And the top of that block is the one piece that defines the Vagrant box, which is the image you clone from to actually create the VM. I'm using a CentOS 6.5 box, and if you don't happen to have that Vagrant box, you can pull it down from that URL. So I could give you this text file right after you had installed Vagrant and VirtualBox; you could type vagrant up, and it would go grab that box and begin building the entire environment. In addition, there's a VMware Fusion block in there as well. If you prefer to use that, it will go pull down the VMware Fusion compatible Vagrant box. In addition, there's this "turn off shared folders" piece here. With Vagrant, you can mount a shared folder from your laptop into the virtual machine.
And that allows you to funnel data between the two. You can also scp that data over as well; there are some caveats there with VirtualBox based around network connectivity. But in this particular case I'm not using shared folders, so I've turned it off. It makes the boot time of the Vagrant environment a bit quicker. And then we get to the actual part which is defined for each virtual machine. So starting at the boxes line and going all the way down to the last end statement, typically you would have multiple of those blocks in a Vagrantfile. I have four virtual machines, so typically you would see four of those. Each one would have a lot of the same code, but with the unique parameters for each one: a unique hostname, unique IP addresses for each of the additional network adapters I add, the amount of CPU, the amount of RAM, and those could be the same or different. As well as specifying any sort of inline scripts you wanna run, or provisioners. Vagrant has the ability to tie in with Ansible, Puppet, Chef, all those things, to lay down your software after you bring up the environment. In this particular demo I'm just using inline shell scripts, which are in this file up at the top, and we'll look at them in a second. But instead of having these blocks repeated four times, I have what is pretty much a for loop. At the very top you'll see the boxes.each. It'll loop through and put all the particular parameters in place for each virtual machine. But where do those parameters come from? So I'm gonna hop back to the beginning of the file. The top there is just Vagrantfile boilerplate specifying a minimum required version, so nothing special there. And then this is where the actual unique parameters come from. It's just a boxes variable with a JSON-style array. Inside of that array are four objects, and each object has different keys: name, eth1, eth2, mem, cpu, and node type.
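As a condensed sketch of that pattern, a Vagrantfile in this style looks roughly like the following. This is not the actual file from the talk: the box name, the object-node IPs, and the script variable names are assumptions; the 192.168.236.60 proxy address is the one used later in the demo.

```ruby
# Unique per-VM parameters in one array; everything else is shared
boxes = [
  { :name => "swift-proxy01",  :eth1 => "192.168.236.60", :eth2 => "192.168.244.60",
    :mem => "512", :cpu => "1", :node_type => "proxy"  },
  { :name => "swift-object01", :eth1 => "192.168.236.70", :eth2 => "192.168.244.70",
    :mem => "512", :cpu => "1", :node_type => "object" },
  # ... two more object-node entries ...
]

Vagrant.configure("2") do |config|
  config.vm.box = "centos-6.5"
  config.vm.synced_folder ".", "/vagrant", disabled: true  # shared folders off

  boxes.each do |opts|
    config.vm.define opts[:name] do |node|
      node.vm.hostname = opts[:name]
      node.vm.network "private_network", ip: opts[:eth1]
      node.vm.network "private_network", ip: opts[:eth2]
      node.vm.provider "virtualbox" do |vb|
        vb.customize ["modifyvm", :id, "--memory", opts[:mem], "--cpus", opts[:cpu]]
      end
      # Arbitrary node_type key decides which inline scripts run
      if opts[:node_type] == "proxy"
        node.vm.provision "shell", inline: $common_script + $proxy_script
      else
        node.vm.provision "shell", inline: $common_script + $object_script
      end
    end
  end
end
```

One `boxes.each` loop replaces four nearly identical machine definitions.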
So, pretty self-explanatory there. The name for each Vagrant box, the unique IPs I want for eth1 and eth2, memory, CPU, and node type. You can put in whatever keys you want here. The node type is an arbitrary key I added, saying whether it's a proxy or an object node. I have particular inline scripts that need to be run depending on which node is being brought up. So that's where I can simply use an if statement down at the bottom of that virtual machine definition to say: if it's a proxy, run this set of scripts, and if it's an object node, run this set of scripts. Or commands, rather. I'll open up for questions here in a second. All right, so that's where that boxes.each loop pulls all the parameters from. So let's look at the actual shell scripts. The first set of commands we're gonna run is the common script. In this script are a bunch of common things that I want done on each box, so this is gonna run on every virtual machine that is brought up. Simply laying down an /etc/hosts file so I can ping the short names of each node, as well as laying down some environment variables. I'm putting in environment variables for eth1's IP and eth2's IP, and I'm doing that for the set of commands we're about to look at. This is also a technique that you can use to reduce some of the redundant code. So there are three object nodes, and on those object nodes pretty much the same set of commands needs to be run to set them up. But of course you need the unique parameters, such as the IPs for services to listen on. So instead of having an object one script, an object two script, and an object three script, I can set these environment variables, have one object script, and reference those environment variables in that script. I can show you that here shortly. So with the common script out of the way, we can then begin to look at the proxy script. There's only one proxy node, so I just have one proxy script.
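The environment-variable trick just described can be sketched in a few lines of shell. This is a hypothetical, minimal version: the variable name and config contents are stand-ins, and it writes under /tmp so it can run anywhere, whereas the real scripts write under /etc/swift.

```shell
# Per-node value that the common script would lay down (hypothetical)
export ETH1_IP=192.168.236.70

# One shared object script can then template a node-specific config for any box
mkdir -p /tmp/swift-demo
cat > /tmp/swift-demo/object-server.conf <<EOF
[DEFAULT]
bind_ip = $ETH1_IP
bind_port = 6000
EOF

cat /tmp/swift-demo/object-server.conf
```

Because the heredoc expands `$ETH1_IP` at write time, the same script produces the right `bind_ip` on every object node.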
And there are some hard-coded values in here. If I had another proxy node, I could do something similar to the object script with environment variables to reduce some of the redundant code. So what are we doing to actually install the Swift proxy node? First, we're installing the RDO repo. It's a CentOS VM, so I'm installing EPEL. And then I'm actually installing the OpenStack Swift proxy software, as well as the Python swift client and memcached, which the proxy node will use as well. I'm not gonna go through every line verbatim; it would simply take too long and bore us all to death. But we start to get to the actual configuration of memcached: enable the service, start the service. And then we get to the actual proxy-server.conf stuff. So you can see right here at the very end is one of those environment variables that it's pulling. When it cats in that file, it'll pull that environment variable, put the IP in there, and it will have the proper configuration. Scrolling down here, one piece I wanna point out is this piece right here. So in the title of this talk is the word standalone. What does that mean? Typically in an OpenStack Swift environment, you're gonna have a full OpenStack environment at your disposal, so you'll have Keystone as well. Swift can be very easily tied into Keystone to provide tenant authentication, user authentication, all that stuff. But in an effort to reduce resources, I don't have Keystone. So what can I use? Swift has a TempAuth authentication system that you can use, and it essentially provides the same things with fewer features. So I have an admin tenant, a test tenant, and a test2 tenant. And then I have an admin user, a tester user, and tester2 and tester3 users. Each of those users has different roles and privileges applied to them. For example, the admin user has the admin and the reseller admin privileges.
The admin role allows that user to have full control over their account in that tenant, and the reseller admin is the full admin of the entire environment. The tester users likewise have full admin over their own stuff, but they can't affect anybody else in the environment like the true admin user can. So that's what the standalone piece means in this particular case. And then we're going and setting up the swift.conf file, and that's generating some hashes that will be used in creating a Swift ring. And I'm not even gonna attempt to get into the discussion of a Swift ring. It's kind of a complicated topic. Essentially, it allows Swift to figure out where data is and where to put data. So if you have questions around that, I recommend talking to the SwiftStack guys. They're right across from the Rackspace booth, and they're even giving away signed copies of their OpenStack Swift book. I'm sure there are a great many chapters in there on the Swift ring, so definitely check that out. And then we get into the actual creation of the Swift rings. So we create an account ring, a container ring, and an object ring, and this is the first piece right here to do that. And then we get into the actual places where the storage devices are. So for each hard disk or storage device you have in the environment, you're gonna have a set of three lines: one for the account ring, one for the container ring, and one for the object ring. And that may change depending on how you architect the environment as well. In this particular case, each object node has one 10 gig loopback file. And at the very end here, you'll notice that I have just a 10. That's the weight. Typically you match that to the size of the disk; it makes things a bit easier. You can make it whatever you want, but that's gonna introduce confusion down the road. So 10 gigs, just 10, makes it simple. So that will actually add each object node's device to the ring, and I have three of those blocks there.
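The ring-creation and device-add steps just described look roughly like this. This is a hedged sketch, not the talk's exact script: the object-node IP and zone are assumptions, the create parameters (2^18 partitions, 3 replicas, 1 hour between partition moves) are common values, the ports are Swift's defaults, and loop2 is the device name used in this demo.

```shell
cd /etc/swift

# Create the three rings: part power 18, 3 replicas, 1 hour min_part_hours
swift-ring-builder account.builder   create 18 3 1
swift-ring-builder container.builder create 18 3 1
swift-ring-builder object.builder    create 18 3 1

# One set of three lines per storage device, repeated for each object node.
# z1 is the zone; 6002/6001/6000 are the default account/container/object ports.
swift-ring-builder account.builder   add z1-192.168.236.70:6002/loop2 10
swift-ring-builder container.builder add z1-192.168.236.70:6001/loop2 10
swift-ring-builder object.builder    add z1-192.168.236.70:6000/loop2 10
```

The trailing 10 is the weight matching the 10 gig loopback device.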
And then once it's built, we rebalance it. That can take a while, depending on the speed of your CPUs and how many hard disks you have in the environment. Then we're setting some permissions, enabling the OpenStack Swift proxy service, and starting it, and that's it for the proxy node. At this point in the build process, you have a functioning proxy node, but if you go to run any swift commands, they're gonna fail, because Swift is gonna try to talk back to the object nodes, which you specified up here when you created the ring. You'll notice at the end there, there's an IP address, a port, and a slash device name, and whenever you issue any swift commands, it's gonna try to talk to those things. If it can't, it's just gonna error back, and you won't be able to do anything. So before we can actually use it, we need to run all the object scripts. So again, starting from the top of the object script: install RDO and EPEL, and then install the Swift account, container, and object packages, as well as xfsprogs, which is going to allow us to format the hard drives that Swift uses as XFS. Swift is designed to work with any standard Linux file system, but to my knowledge it's been more thoroughly tested on XFS and provides better performance there, so that's what we're gonna use here. That's actually what we do at Rackspace as well; we format as XFS. And then we're also installing xinetd for the rsync daemon. So once those packages are installed, we're actually setting up the rsync daemon, and again, there's one of those environment variables to make things a bit cleaner. Once that's in place, we enable the service, we start the service, we create some directories, again set permissions, and then we get into laying down the configuration files for the account server, the container server, and the object server. And these are not the full configuration files.
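For orientation, a minimal object-server config along those lines might look like the following. This is a hedged sketch with typical defaults, not the file from the talk; the bind_ip is a hypothetical object-node address.

```ini
[DEFAULT]
bind_ip = 192.168.236.70
bind_port = 6000
devices = /srv/node
mount_check = false

[pipeline:main]
pipeline = object-server

[app:object-server]
use = egg:swift#object

[object-replicator]

[object-updater]

[object-auditor]
```

The account-server and container-server configs follow the same shape on ports 6002 and 6001.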
There are additional things you can add in here to open up new features for querying the environment. And then once that's in place, this next bit right here is a bit specific to Vagrant. Because I want this fully automated, I wanna type one command, go make a sandwich, come back, and have a full environment. So there are some additional things added here that you typically wouldn't see. The object nodes come up after the proxy node, and the proxy node, as mentioned earlier, created the Swift ring. When it creates that ring, it simply creates a handful of files: there's an account file, there's a container file, there's an object file, and there's also the swift.conf file that contains the hashes that were used when creating the Swift ring. All four of those files need to be on every node in the environment, so the object nodes need to pull those files. So I'm creating an SSH key on each object node and running the ssh-keyscan command from each object node against the proxy node so we can get the known_hosts file populated, because as you all know, when you SSH in the first time, it's gonna ask you to accept the fingerprint to get it into your known_hosts file. And if that prompt comes up while this is running, it's just gonna stall out, and that doesn't help you in any automated way. So I'm doing the short name and doing the IP. And then, to actually get the key onto the proxy node, you can use ssh-copy-id. Of course, you have to enter a password, and I don't wanna be there to type in that password. So you can use expect to do that: install expect, and then actually do the expect statement. So spawn that ssh-copy-id command, wait for the output to come back saying please type in your password, and pass the password. It's just vagrant throughout the entire environment; this is all running on a workstation, so there are not really any security concerns.
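Put together, that key-distribution dance might look roughly like this. Treat it as a sketch rather than the exact script: the proxy hostname is an assumption, while the IP and the vagrant password are the demo's values.

```shell
# Populate known_hosts for both the short name and the IP of the proxy,
# so nothing prompts for a host fingerprint during an unattended run
ssh-keyscan swift-proxy01 >> /root/.ssh/known_hosts
ssh-keyscan 192.168.236.60 >> /root/.ssh/known_hosts

# Install expect, then answer ssh-copy-id's password prompt automatically
yum -y install expect
expect -c '
  spawn ssh-copy-id vagrant@192.168.236.60
  expect "password:"
  send "vagrant\r"
  expect eof
'
```

After this, scp from the object node to the proxy works with no human at the keyboard.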
And with that done, each object node can now log in via SSH directly to the proxy node. And then we just run scp to pull down the ring files. So it's just *.ring.gz to pull in the account, container, and object rings, and put them in the particular directory they need to be in. And then the same for the swift.conf file as well. With those in place, we again set permissions, and then we get into actually creating the storage device. So we make a directory here, just /srv/node/loop2, and then we use dd to create the 10 gig file. Then we use losetup to actually turn that into a device we can format and mount: formatting it as XFS, making an /etc/fstab entry, and then mounting it. With it mounted, we again set some permissions, or in this particular case the ownership, the user and group. And then we go and restart all the services. So there is a swift-init command where you can do swift-init object restart, swift-init account restart, and so on. I like to have this a bit more verbose, especially for newcomers, just to see all the different services that are being restarted. So with everything all set up, it'll restart all the services, and now that object node is available to use. And then again, this last part is specific to Vagrant. Because I'm using that loopback file, if you reboot the environment (vagrant reload will restart all the virtual machines) without this last piece here, the actual loopback device will not be set back up again, and the object services are gonna fail to start. So we're just recreating that loopback device, mounting it, and restarting the services to make sure everything is healthy. So with the end of the object script, which runs three times, we're back to that virtual machine definition. And that's it for the Vagrantfile. So now I'm going to actually begin running some commands against the environment.
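The loopback storage setup walked through above can be sketched as follows. The backing-file path is an assumption; the mount point, size, and XFS choice match the talk.

```shell
# Create a 10 GB sparse backing file and attach it as a block device
mkdir -p /srv/node/loop2
dd if=/dev/zero of=/srv/swift-disk bs=1024 count=0 seek=10485760
losetup /dev/loop2 /srv/swift-disk

# Format as XFS, persist the mount, and hand ownership to swift
mkfs.xfs /dev/loop2
echo '/dev/loop2 /srv/node/loop2 xfs noatime,nodiratime,logbufs=8 0 0' >> /etc/fstab
mount /srv/node/loop2
chown -R swift:swift /srv/node
```

The losetup and mount steps are exactly what the reboot-handling piece has to redo, since the loop device does not survive a restart.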
So I have the Python swift client installed on my workstation here, and I have access to that proxy node, so I can communicate with it. So let's do that. Let's just run the swift command real quick. And of course, like anything, it's gonna output all the help. This is actually more useful help than you typically see. You have all your various subcommands: delete, download, list, post, stat, upload, capabilities, and tempurl. And then some very useful examples, especially around authentication; that's always kind of the trickiest part. So in this demo, I'm using version one of the Swift auth API; there's also version two. So how do we actually authenticate against the environment and begin issuing commands? Luckily, there's this very useful example here to show how to do that. So that's helpful, but there are some parameters there that need to be changed. So go ahead and do that: swift -A, and then I need to point to the actual auth URL of the proxy node. So that's 192.168.236.60 on port 8080, authing with version one. Then dash uppercase U, and I'm gonna do admin colon admin. In version one of the API, this is not just a username; it's the tenant colon user. One thing to keep in mind. So we're in the admin tenant with the admin user. And then dash K is the API key, or the password; in this case, it's just admin. And then we can actually issue a subcommand, so let's just type list. And it's gonna return nothing. The list command simply lists containers, or objects in containers, and in this case we don't have any containers. But as you know, this is a very tedious command to write over and over and over. And as with any of the other OpenStack services, you can just create a text file with your environment variables, typically called an openrc file. So I already have one here with the particular variables that you need: ST_AUTH, ST_USER, and ST_KEY. So let's just source that in, and now we can just type the swift list command. Much simpler, much easier.
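That openrc file boils down to three exports; the values below are the demo credentials used a moment ago:

```shell
# openrc: TempAuth credentials for the demo cluster
export ST_AUTH=http://192.168.236.60:8080/auth/v1.0
export ST_USER=admin:admin
export ST_KEY=admin
```

After `source openrc`, a plain `swift list` picks these variables up instead of the -A, -U, and -K flags.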
So great, we have that swift list command, and there are no containers. Let's run a swift stat. That gives us a bit more output. So we're in the AUTH_admin account: containers, none; objects, none; bytes, none. And then a bunch of timestamp information. All right, that's kind of useful, but let's get some photos into this. Let's actually upload something into the environment. So I have this lovely llama picture here that I wanna make sure is kept around forever. I wanna replicate it three times, because by default, data in a Swift environment is replicated three times; that's why I have three object nodes. Ideally, you would have more than that, with failure zones. But this is my treasured llama picture that I wanna upload and save. So let's get it up into the environment. So again, swift upload, very similar to swift list or swift stat, very easy to use. So: swift upload photos. I'm gonna create a container called photos, and you'll notice I didn't do something like swift container create photos; that's not an actual command. In this case, if the container doesn't exist and you're uploading a file into it, it just creates the container and uploads that file. So we upload it. Successful output is that it returns the object's name, and of course you can check the return status with echo $?; it returned a zero status, good to go. If it returned anything else, then you should probably look into what's going on. So now if we run swift list, there's our photos container. Let's run swift list photos; there's our picture in that container. If we run swift stat, we now have one container, and zero bytes, which is interesting, over the entire environment. So let's run swift stat photos. And here we actually have a byte value; we're looking at statistics on that container. Again, we're in the account AUTH_admin, we're in container photos, objects: one. There's the size of that container. And then we have read and write ACLs. We're gonna come back to the read ACL here in a second.
They're empty right now. And then there are a bunch of other variables, or keys, that you can use when using the environment. So we can of course upload another file if we like, and we now have two pictures. So let's say I, of course, accidentally delete my picture on my workstation. It's gone. My treasured llama picture is gone. I need to download it. Very similar to the swift upload command, we have the swift download command. If you just do swift download photos, it'll download everything in the container, and of course you can specify one particular file. So we download it, and it gives you a bit more information: the speed at which it downloaded, and a good return statement. And I have my picture back there, llama one. So cool, we got that back. But let's say we wanna share that photo with someone. I want all of you to be able to access that photo without any sort of authentication required, and I want you to be able to do it just through your browser. I don't expect you to have to use the swift command to download this picture. So let's actually apply some metadata to the read ACL so you can do that. To do that, we use the swift post command: swift post, dash R to specify the read ACL. And then the syntax for the particular metadata is a period, R, colon, asterisk (.r:*), with single quotes around it, and then the photos container. So if we do swift stat photos, you'll notice we now have a read ACL with information in it. But how do we actually get to that data? I'm gonna open up the browser here, and this is probably gonna autocomplete because I've accessed it before. But it's the IP address of the proxy node and the particular port that it's listening on; then specify version one of the API, specify the account, which is AUTH_admin, specify the container, and then the picture. And there's our lovely llama picture that we can access. So great, now you all can access it. But it turns out the photographer of this picture doesn't like the fact that I'm distributing it.
He sends me a copyright notice, so I have to take it down. So to remove it, just swift post dash R again, with just an empty pair of single quotes, and then photos. And we'll run swift stat on photos again, and it's gone. If we go back and refresh, we now get a 401 HTTP error, unauthorized. So hopefully that made the photographer happy. But he also stated that I need to delete it from the actual cluster. So: swift delete photos llama1.jpeg. And if we do a swift list, it should be gone; just the other llama pictures are in there. One thing to keep in mind: if you do swift delete photos and you accidentally hit the enter key and there's a bunch of stuff in there, it's all gone. There is no are-you-sure-you-wanna-do-this. So just keep that in mind when using the swift command. I have a lot of just silly things in Rackspace Cloud Files, which you can also use the swift command with, and it's very easy to destroy a bunch of stuff with one command. So just one thing to keep in mind. So that's all I'm gonna show for the swift command. The last thing I'm gonna show is the swift-recon command. So I'm actually gonna log into the proxy node. The swift-recon command is something that you can use to start getting an overall view of the health of the environment. So this is gonna take a second because I have no RAM left on this laptop; it takes about five seconds here, or more. There you go, all right. Go up to root; all right, swift-recon. And this is something you have to run from an actual Swift node, because it needs connectivity into the rest of the environment; it's not something you can run from a workstation, to my knowledge. So swift-recon: we get the help. And what we're gonna look at are the disk usage stats, the load average stats, and the replication stats. To enable this feature, I mentioned earlier that when you're setting up the Swift configs, there are additional things that you can add.
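Those additions are, roughly, the standard recon middleware lines in each storage server's config; sketched here for the object server, with the usual default cache path:

```ini
[pipeline:main]
pipeline = recon object-server

[filter:recon]
use = egg:swift#recon
recon_cache_path = /var/cache/swift
```

The account and container server configs take the same recon filter in their pipelines.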
Typically this part of the config is not in there, so you have to go in and add it to be able to use this. And you'll know it's not working, or that you're missing that config, by just running the swift-recon command and seeing it not function. It's a quick Google to find the particular configs for that; it's a couple of lines in those Swift config files. So, swift-recon dash l: let's get the load average stats. So it gives you the typical one minute, five minute, and fifteen minute load averages across the entire environment. This also gives you a quick way to make sure that all your nodes are up. At the very end here you can see reported: three. That's good. Typically, if any of the swift-recon commands fail to run, you'll see a real nasty error statement pop up before the rest of this output. So this is a good way just to see what the load is across the entire environment, to see if maybe something's wrong. I expect in a much larger environment you're gonna have a pretty consistent baseline with higher load, especially if stuff is replicating back and forth. And then let's run swift-recon dash d, for the disk usage stats. This gives you a distribution graph of disk usage: space used, space free, and the lowest, the highest, and the average amount of disk used. We of course don't have much in this environment. Each object node has a 10 gig loopback file, so that gives you 30 gigs, but data is replicated three times, so you only have 10 gigs usable in the environment. And the last thing here: replication stats. Again, it gives you replication stats: the lowest, the highest, the average, any failures, the oldest completion time, the most recent completion time, and on what node. So as for the demo, that's all I have. So I'm gonna flip back to the presentation real quick, go to PowerPoint, and open it up for questions. Yes. [Audience question about running this with Docker.] If it does, you would probably need a Linux box to do it, because, I mean, OS X doesn't have Docker support unless you use Docker within a VM.
But yeah, I'm not sure if it has, what do they call it, a Vagrant provider, I think. So VirtualBox is a Vagrant provider, VMware Fusion is a Vagrant provider, and there may be a Docker provider or something like that, or an LXC one. There's no Docker provider? Okay, thank you. Okay, cool. Any other questions? Good, cool. All right, so again, thank you all for coming. Everything I showed you here today is available on my website. Go to thornlabs.net. On there you'll see a search bar; search for Swift. There are gonna be a bunch of results, but look for the one that's named similarly to this presentation. It should be Install a Standalone Multi-node OpenStack Swift Cluster. And there's an entire post there with links to the GitHub repo with the Vagrantfile, as well as all the manual steps. They're a little bit different because you do it all by hand, but it's all there. If you have questions, feel free to ping me on Twitter. Thank you all for coming, and have a good rest of the summit.