So thanks, everybody, for coming to the workshop. A note for afterwards: we have copies of the Swift book to give out. If you would like one, drop your name card in here, or swing by our booth; we're at A29, I believe. There's actually a signing session you can come and join. Or just let us know and we'll send you a copy of the book. We have a bit of swag here up front as well. We'll probably throw those out for questions, so if you've got a question later on down the row, put your hand up. We'll try not to hit anybody. So we'll give another minute or two for people to get settled. In the meantime, if you have a Mac, one of the hands-on workshop labs is for you to get your own Swift node up and running. We'll be using the SwiftStack software, but you'll need a terminal, and for Windows users you'll need PuTTY. So get those installed now while we wait for everybody, and we'll be good to go in a second or two. Yeah? Yeah. So the lab guide will be posted on the link, and we'll get you to download it in a sec, so make sure your internet is working as well. Yeah, we're going to do a general overview of Swift by itself, and then you're going to get two URLs and some QR codes that you can go grab. We're going to have two different types of lab manuals. There's the one that we're going to do today, which is relatively short; it has about three sections to it. And then we're also leaving a full manual, which has about 13 different sections. The idea is that we're going to leave the lab up for the rest of the week so that you can jump in and do whatever you need to do. If you want to go through the rest of the lab yourself, feel free to do so. Stop by the booth, ask us questions, whatever you want to try to do. You'll get both of those once we get about halfway through here. OK, let's get started. So my name is Albert Chen. I'm one of the Systems Engineers for SwiftStack.
That's the driving force behind OpenStack Swift. Eric Reif, I'm also a Senior Systems Engineer. I actually work predominantly in the federal space. But as Albert said, we're here to talk about Swift, not necessarily SwiftStack, so we want to try to keep as much of us out of this as possible. But thanks for coming; appreciate you attending. So now that we've gone over who we are, why are we here? Swift has been branded as one of the newest storage systems out there. It's a new paradigm of storage, alongside S3. So how do we make it easy enough for you to use? You can run this in your own private data center, and as software-defined storage, we want to make it deployable and very user-friendly. So here's the agenda for today. We'll talk at a high level to cover what object storage is and why you would want to use it. Then we'll go under the hood and give you a little more detail about some of the components underneath. This is a very high-level course, though, so if you want more information, feel free to grab us at the end; we'll give you as much detail as you want. We'll also give you some of the use cases, and then we'll proceed to the hands-on lab. So before we get into it, I want to get a show of hands: who understands what Swift is within the OpenStack framework? So we've got a moderate number of people that have some understanding. We want to make sure this is informative. For some of you that have already touched Swift or have been involved in the space, it might seem a bit redundant; for those that haven't, it should be informative. I think the lab is interesting. It gives you a chance to see the personalities that exist inside of a Swift environment, what they do, and why you might pick one versus another. And then we've got some interesting use cases that give you some real-world scenarios in which you might want to use this. What's an easy way to dip your toe in?
And then once you grow your environment, how do you turn that into something that really benefits your business as a whole? Great. So what is an object? What we find today is that an object is a piece of data with additional metadata tagged alongside it. This is one of the pictures I took when I was in Japan last time, for the OpenStack Summit. So in addition to the file name, you have additional information about the photo: the size of the photo, the date it was taken, the location it was taken, the GPS coordinates, and also additional tags I can add on to it. So one of my colleagues raised a very interesting thing the other day, and I thought I'd make this presentation very much my own. So I have a lot of objects, and they all have pictures and metadata that tag along with them. What can I do with it? That's a better question. So let's jump right into that. One of the things that SwiftStack and Swift are going to be talking about, and I think I'm letting you know a little bit ahead of time, is what you can do with metadata search. We will tie this into Elasticsearch and allow you to find out more information about the data that you have. So not only can you hold the data, you can actually do something with the data. It makes the information searchable. Say I want to search for what I want to eat for dinner tonight. I want to search for Taiwanese beef noodle. I want to pull out the recipe, pull out the picture, and pull out the data. Elasticsearch will allow me to do such a thing, so I can make my data useful, not just hold on to it. But then why do I want to do it with Swift? There are so many other systems out there today that tout object storage and a single namespace. Well, one of the things we find is that there is a shift in the type of storage you're dealing with, and the type of application you're dealing with goes along with it.
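The metadata-search idea above can be sketched in a few lines of plain Python: each object is data plus a dict of tags, and a search filters on those tags. This is a stand-in for what an index like Elasticsearch does at scale, not the actual integration; the object names and tags are made up for illustration.

```python
# Toy metadata search: filter objects on their metadata tags.
# A real deployment would index these tags in a search engine instead.

def search(objects, **wanted_tags):
    """Return names of objects whose metadata matches every wanted tag."""
    return [
        name
        for name, meta in objects.items()
        if all(meta.get(k) == v for k, v in wanted_tags.items())
    ]

objects = {
    "beef-noodle.jpg": {"cuisine": "taiwanese", "type": "photo"},
    "beef-noodle.txt": {"cuisine": "taiwanese", "type": "recipe"},
    "sushi.jpg":       {"cuisine": "japanese",  "type": "photo"},
}

print(search(objects, cuisine="taiwanese"))  # both Taiwanese entries
```

The point of the example is the speaker's "make my data useful" claim: a query over tags can pull back the recipe and the photo together, without knowing the object names in advance.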
Before, it was a single storage application and a single storage silo. So if you have three applications, you have three different types of storage supporting them. In the newest era, you have everything converging into a single storage system. So you want that to be extremely durable, performant, scalable, flexible, and easy to use. I don't want you to read all the bits and pieces on this slide, but the idea is you have block storage from 20, 30 years ago. Then you move to NAS, NFS, and CIFS, this type of network-accessed storage. And then you're moving further along to where you have object storage. One of the key things to keep in mind is the reduction in pricing and what each is optimized for. So I'm going to add in here; sorry, I apologize for jumping in. When we're talking to customers, one of the things that's interesting is that when you look at block, people think of a couple of different protocols: you've got SCSI from way back in the day, Fibre Channel, iSCSI today. When you look at file, you're talking about NFS and SMB. And now comes this new vertical called object. And while they're not necessarily protocols, there are two predominant APIs that exist, and those are your Swift and your S3 environments. So lucky for us, I suppose, Amazon came out and said, hey, we're going to do this object thing. So now there's a huge proliferation of people wanting to do object-based environments. If you want to run your own environment, you want to be able to provide Swift access to your customers so they can use future applications. This is the new vertical. This is about large capacity, large data retention, your big data environments. So it's just a little bit of a different way of thinking versus your traditional file and block. Thanks. So like Eric said, Amazon S3 is at the top of that list, but there are other ones, such as Rackspace and HP, which have their own, and IBM's own cloud.
So all of these are a sort of object storage one way or the other, whether they're backed by Swift or some other proprietary technology of their own. So one of the... sure. The question was, what does the term data lake mean, and does it have anything to do with object storage? So a data lake is essentially a giant pool of data. GE is one of the ones I've heard of that's building something called a data lake. It's a generic term: they funnel in all the sensor data from all the sensors and components they build, and they want to keep that data available so they can do analysis later. But one of the key issues is, how do I keep all this data, petabytes of data, for years and years? If I want to do weather analysis, or study wear and tear on an engine or motor, that's done through several decades of research and collection of data. How do I keep this data durable enough that it lasts that long? We all know a hard drive lasts about three to five years. And also cheap enough that I don't go in the hole for keeping all this data. So, very good question. I think to sum that up: yes, object storage plays inside of a data lake, but so can block and file. The term comes from this pool of data, this lake of data, and it's everything. Throw it all in there, you've got your whole ecosystem going on, and then how do you intelligently view and visualize the data inside of that lake? I don't think so. I'm not 100% sure what you're asking, but I'm pretty sure that's not in there. So we will talk about an auditor that exists to look for data corruption when you have your replications and copies, and how that auditor works: it basically finds and quarantines bad data, and then does the recovery. That's how you handle long-term retention. So let's cover these slides and we'll move on to the next one. So, one of the reasons you'd want to choose OpenStack Swift and the Swift API?
I think one of the biggest things is the open-source nature of it, and the fact that you're not locked into a single vendor's hardware. It's at the very bottom of the slide for some reason. The idea is you're no longer tied to a single vendor's hardware, say Lenovo, HP, or IBM, which we'll talk about a little later, and no single vendor's hard drives. So this year, if you're happy with HP, if you're happy with Seagate, go with them. Next year, if you're happy with Hitachi, use those instead. It gives you flexibility and gives more control back to your data center. Okay, all right. So what is under the hood? Well, let's go over the API methods first. There are only six API access methods that you really need to know to be able to write your application against Swift. You've got PUT and GET, which are upload and download; you've got COPY and DELETE, which are pretty obvious; and then HEAD and POST. Would you like to talk about the difference between those two? So this is what makes object interesting, right? It is not your traditional modify, update, and write-in-place; it doesn't use the traditional terminology. Everything is immutable inside of an object store, meaning there's only one version of it. You can expire an object and then write a new one, but everything is done over the HTTP protocol. So it uses terms that people are familiar with. With your HEADs and things like that, you're actually going to get back a 200 or a 404 or a 202; you're literally going to get HTTP response codes. So it's a different way of dealing and interacting with your data. The point of this is to show that it's a very limited command set, in that you don't have to worry about a lot, but you can do a lot with it. You can add metadata tags, as Albert alluded to earlier, right? So you can go and put custom tags on data after the fact.
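The six verbs and the HTTP-style status codes just described can be modeled with a toy in-memory store. This is an illustration of the semantics (immutable whole-object writes, metadata-only HEAD and POST), not a real Swift client; the status codes follow common HTTP conventions and the paths are invented.

```python
# Toy model of the six Swift verbs: PUT, GET, COPY, DELETE, HEAD, POST.
store = {}   # path -> (body, metadata)

def request(method, path, body=None, meta=None):
    """Dispatch one of the six verbs against the toy store."""
    if method == "PUT":            # whole-object write: replaces, never edits in place
        store[path] = (body, dict(meta or {}))
        return 201
    if path not in store:
        return 404
    body0, meta0 = store[path]
    if method == "GET":            # download the object
        return 200, body0
    if method == "HEAD":           # headers/metadata only, no body
        return 200, meta0
    if method == "POST":           # update metadata on an existing object
        meta0.update(meta or {})
        return 202
    if method == "COPY":           # server-side copy to a new path (dest passed via meta)
        store[meta["dest"]] = (body0, dict(meta0))
        return 201
    if method == "DELETE":
        del store[path]
        return 204

print(request("PUT", "/v1/acct/photos/cat.jpg", body=b"...jpeg bytes..."))  # 201
print(request("HEAD", "/v1/acct/photos/cat.jpg"))                           # (200, {})
print(request("GET", "/v1/acct/photos/dog.jpg"))                            # 404
```

Note how HEAD and POST never touch the payload, which is the distinction the speakers are pointing at: everything else is a whole-object operation.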
So if you're interested in taking a bunch of historical data you have and putting a set of tags on it, so that you can then do visualization, or groupings, or metadata searches, you can do that. You can go in after the fact and add them. There is no practical limit to the number of metadata tags you can add to an object. So it basically gives you organizational capabilities, right? Okay. No worries. So the question was, is the metadata in a standard format? What you can do is run a standard HTTP command and literally do a PUT or a POST against it. You can run curl commands at the command line. It is literally HTTP: you pass a header like X-Object-Meta, you define the tag, and then whatever the value is. So you could say X-Object-Meta-Short: Eric, right? There you go. And that way you can do searches on everybody that's short. So every single object has an address that you can talk to. We won't go too much into it, this is very high level, but you have a base URL, then the API version, then what are called the account and the container, the folder that you're going to put it in, and then you have the object itself. This is exactly how you would access the object, and one thing to note is that you can put this out on the web, so you can access this object externally to your system, depending on how you segment your firewall, okay? So here's the high-level architecture. We're going to go over the load balancing, the proxy, the account and container layers and what they do, quickly, the replication and auditor, and the standard hardware, which we can talk about later. So why do we need the load balancer? One of the reasons is that Swift is a distributed system. That means there is no single component that you talk to, but multiple components that you can talk to, and they all do the same thing.
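The URL structure and the custom-metadata headers just described fit together like this. The `/v1/<account>/<container>/<object>` path shape and the `X-Object-Meta-*` header prefix are the real Swift conventions; the hostname and account name below are made up for illustration.

```python
# Build a Swift object URL and custom-metadata headers.

def object_url(base, account, container, obj, version="v1"):
    """Base URL + API version + account + container + object."""
    return f"{base}/{version}/{account}/{container}/{obj}"

def meta_headers(**tags):
    """Turn keyword tags into Swift custom-metadata headers."""
    return {f"X-Object-Meta-{k.capitalize()}": v for k, v in tags.items()}

url = object_url("https://swift.example.com", "AUTH_demo", "photos", "cat.jpg")
print(url)                         # https://swift.example.com/v1/AUTH_demo/photos/cat.jpg
print(meta_headers(short="Eric"))  # {'X-Object-Meta-Short': 'Eric'}
```

At the command line, the same tag from the talk would be set with something like `curl -X POST -H "X-Object-Meta-Short: Eric" <url>` against an authenticated endpoint.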
So the load balancer acts as your SSL termination and your authentication layer... actually, no, authentication is in the proxy layer. So, SSL termination. It can then find the most efficient way to access your data, and that ties in to the proxy layer. Sometimes the load balancer and the proxy actually run as a single service. You can think of the proxy as the valet at the parking lot. All you need to do is give him or her your ticket, and he or she will go and bring the car back to you. You don't need to know exactly where your car is actually parked. And this can be a scale-out type of node: if you need more throughput, you add more of these nodes to your cluster, and your GETs and your PUTs will go faster as part of the process. So next, you have the account, container, and object layers. The account and container layers are responsible for knowing where all the data sits. This is a distributed service, and usually you have three, four, or more in your cluster to ensure durability. They are also the layer that is responsible for authentication and authorization. So you need the correct username, password, or token to be able to access your object, and you can also share that object or token with other accounts as well. So there's multi-tenancy built in to allow you to scale out to a very large environment. The object server at the bottom is obviously where the payload is. Its job is to be responsible for managing all the disks underneath and storing data successfully, okay? Go ahead. If you're starting... In the... So the account piece maintains the tenant information and knows about the containers that are associated with that tenant. The container knows about the objects associated with the container. So this is where it gets weird. We use the word account, but it's a Swift account; it's not necessarily a user account. There is a differentiation between them.
Think of an account more as a tenant, right? So I could have company A, or more accurately, organizational units A, B, and C, and each of them could have their own login and their own structure that exists under their specific tenant, and whether or not they can see each other's is completely up to you as the admin to define. So: accounts, containers, and then objects. And that account and container tier is very small. It doesn't take up a lot of space, literally gigs of data, but it is vitally important to making sure your data is accessible in rapid fashion, because it's your pointer, I guess, right? It tells you where everything is, and we need to be able to read that SQLite database as quickly as possible. So that's where that... So the account and container layer has nothing to do with the physical location; it is just aware of the physical location. Swift as an entity will compute all of the different permutations of where an object will exist. We're not going to get into it today, but what's kind of cool is that if you run a specific command, you can see where an object would exist even if it doesn't exist yet. You can run this command with any name you want. You can type in ABCDEF and it will tell you where it would be put. It will tell you which primary locations it would be in, as well as its secondary, or what we call handoff, locations. So it's already pre-calculated. That ring structure is the foundation of where everything exists. So you could actually do pre-planning: if you knew what you were going to write, you could find out the location of where that's going to be. Yeah. All right, so I'm going to keep going. So the next thing we're going to cover is the replication and consistency portion of the system. So one of the questions was, hey, how do I know the data is secure throughout its lifetime?
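The "pre-calculated" placement just described comes from hashing the object's path: the hash picks a partition, and the partition maps to pre-assigned primary devices plus handoffs. Below is a heavily simplified sketch of that idea; real Swift builds a ring file with weighted device assignment (inspectable with the `swift-get-nodes` tool), and the device table and partition math here are toy versions for illustration.

```python
import hashlib

PART_POWER = 4                       # 2**4 = 16 partitions in this toy ring
DEVICES = [f"d{i}" for i in range(8)]

def partition(account, container, obj):
    """Hash the object path and take the top PART_POWER bits as the partition."""
    digest = hashlib.md5(f"/{account}/{container}/{obj}".encode()).digest()
    return int.from_bytes(digest[:4], "big") >> (32 - PART_POWER)

def nodes(part, replicas=3):
    """Primary devices for a partition; the next device is the first handoff."""
    primaries = [DEVICES[(part + i) % len(DEVICES)] for i in range(replicas)]
    handoff = DEVICES[(part + replicas) % len(DEVICES)]
    return primaries, handoff

# Works for names that don't exist yet, just like the speaker's ABCDEF example:
part = partition("AUTH_demo", "photos", "ABCDEF")
print(part, nodes(part))
```

Because the mapping is a pure function of the name, the locations are knowable before the object is ever written, which is exactly the pre-planning property the speaker is describing.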
Well, we do have what we call an auditor that will go and check each piece's metadata, the MD5 checksum of each object, to make sure it has not been changed or corrupted. And if it has, it will pull from the other two copies that are usually in the cluster, saying, hey, do I have any corruption? And then replace it with a good copy. But one thing to know is that the auditor and the replicator are two separate services. The replicator's job is, once I write a piece of data in, to move a copy of that data to its secondary and tertiary designated locations. So these services are always running in the background, and they're meant to be a kind of system-management, maintenance type of service to make sure your data is secure on all the different disks you have. Do you have anything to add? That's about it. The easiest thing to think about is, and you'll also see this when we do the lab, there is this definition of networks, right? You have a customer-facing network, a cluster-facing network, and a replication-facing network. They can be independent, and they can also be shared. Active environments may have more cluster-facing traffic, versus ones that are rapidly changing, which may have more replication traffic. So you can define all of that. We'll show you during the lab; you'll see where you can pick and choose that in the network structure. You'll see the three interfaces when we do it. But the point is you can get really granular with what Swift can and cannot do. If you want to lump it all together, great, lump it all together and kick it out the door. If you want to be really, really prescriptive about the way you implement it and the way it provides data to you, you can do that too. Go ahead. There's no such thing as RAID in object storage. We have a thing called erasure coding. Erasure coding is actually a superset of RAID; RAID came out of this global concept called erasure coding.
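The auditor/replicator split described above can be sketched as a checksum comparison: the auditor compares each copy's MD5 against the stored checksum (Swift records this as the object's ETag) and quarantines mismatches, and repair then comes from a surviving good copy. The list of replicas below stands in for the three copies on disk.

```python
import hashlib

def audit(copies, expected_md5):
    """Split replicas into good and quarantined based on their checksum."""
    good, quarantined = [], []
    for data in copies:
        if hashlib.md5(data).hexdigest() == expected_md5:
            good.append(data)
        else:
            quarantined.append(data)   # pulled aside, never served
    return good, quarantined

original = b"object payload"
etag = hashlib.md5(original).hexdigest()
replicas = [original, b"bit-rotted payload", original]   # one copy silently corrupted

good, bad = audit(replicas, etag)
print(len(good), len(bad))   # 2 good copies, 1 quarantined
repaired = good[0]           # the replicator copies a good replica back into place
```

The key detail from the talk survives in the sketch: auditing (find and quarantine) and replication (copy a good replica back) are two separate steps.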
RAID is an intimate association of a pool of drives that can now only be used for that specific purpose. Erasure coding says, I want to slice my data into some predefined number of chunks, whether it be an eight plus four, so 12 smaller pieces, or whether it be a 25 plus four. And that gets distributed across the disks that participate. All disks. So if I've got 1,000 disks, I'm going to use any of the 1,000 I want. I need 25 plus four, for example, but I'm just going to distribute it. It's not an intimate association. So you don't suffer the challenge of a drive failure and having that whole RAID rebuild time and all that. That's no longer a thing; you just wait, it'll be there. Yeah. I've got a slide that answers exactly that, so yeah. No, that's cool. So one of the key things to know is that we run on standard x86 hardware. What does that mean? Anything that can run the operating systems we support, Ubuntu, CentOS, Red Hat today, will be able to run Swift on top of it. SATA or SAS disks are your choice, or we can even support flash as well. The key thing is: no RAID. And part of the reason, as Eric mentioned earlier, is we want to know when failures happen. Swift is designed to handle failure rather than mask it through hardware redundancy. So if you have an array of 10 drives and one of the drives fails, we want to know that and direct the traffic of that 10th drive elsewhere, versus making all the other nine drives run slower in a degraded state. And this also allows you to change drive replacement from saying, oh, I have to have a six-hour window to plug in a new drive and then start a 48-hour rebuild window, to a once-a-month drive replacement: one guy just comes in with a box of drives, removes all the bad drives, and plugs the new ones in. So it makes your administration a proactive rather than a reactive kind of approach. Makes it a lot easier. You don't get called at three in the morning.
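The capacity tradeoff between replication and the erasure-coding schemes mentioned above is simple arithmetic: triple replica stores 3x the raw data, while an 8+4 scheme stores 12 fragments for every 8 data fragments, so 1.5x. This is just the overhead math implied by the talk, not a durability comparison.

```python
# Storage overhead: bytes stored on disk per byte of user data.

def replica_overhead(copies):
    return float(copies)

def ec_overhead(data_frags, parity_frags):
    return (data_frags + parity_frags) / data_frags

print(replica_overhead(3))   # 3.0  -> 1 TB of data occupies 3 TB
print(ec_overhead(8, 4))     # 1.5  -> 1 TB of data occupies 1.5 TB
print(ec_overhead(25, 4))    # the wider 25+4 example from the talk, ~1.16
```

The wider the stripe (25+4 versus 8+4), the lower the overhead, at the cost of touching more disks per read and more fragments to rebuild on failure.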
Hey, can you pop in a new drive? I need to rebuild my array. I think there's a report that came out that says an average sysadmin or storage admin can manage 400 terabytes of traditional storage versus two petabytes of object storage. Whatever, that's from the people that supposedly know, right? The point is, it doesn't require nearly as much hands-on work. It doesn't require nearly as much attention from you. If built properly, you have this blast-radius concept: this idea that a single loss of a drive, or even a single loss of a node, doesn't cause an outage and ultimately doesn't matter. For those that have been in the space for a while, think about HPC environments where we had 1,000 nodes back in the day, right? We all bought a bunch of 1U servers and built Beowulf clusters and that kind of stuff. The idea is that individually, those are basically disposable systems. So you want to be able to take that system and not worry about it. Today... yeah, so I just listed some of the general hardware that we have running, that some of our customers have used, or that we've seen work very well for object storage. You have the general purpose: the single 1U or 2U server. They may have anything between eight to 12 to 24 drives, and you put a couple of those into your data center, or two or three data centers, and you have a cluster running. Or you have more of a dense type of environment, where you have 50-, 84-, or 90-bay chassis running six-terabyte drives, and after a rack or two of those you've got multiple petabytes of storage running in your data center. So depending on what your use case is and what server you use, you can go from very small scale, a couple of terabytes, well, we usually start at about 50 to 100 terabytes, that's where it shines, to multi-petabyte. Customers are asking, how do I stand up a 20-petabyte cluster? And that's not a small feat by any means, okay?
So, a quick summary: we have a native HTTP API that's open source, with no single point of failure. We're looking at something that scales linearly: for any of these components, if you need more storage, you add storage nodes; if you need more throughput, you add proxies. So it makes it very easy to grow from a small cluster to a large cluster overnight. It's extremely durable: we put a lot of redundancy into the software layer, and you'd have to break all of it. So it's really difficult; that's where all the nines come from for Swift. The consistency model and the ability to run on any type of hardware make it extremely flexible and very friendly for large environments and large clusters, okay? Any questions right now before we move to a slightly more interesting topic from Eric? I want to be cognizant of time, because I want you to be able to play in the lab as much as possible. Okay, we'll take one; there's one question over there. Yeah, go ahead. Correct, yeah, so it is a completely different way of interacting with your applications. If you go back, say, seven years, this was very foreign; it required a lot of custom professional services to write custom API integrations, it was very difficult to integrate, and it was used in very specific use cases. Today, a lot of the manufacturers out there are already creating cloud connections, or cloud connectors, right? So you can get Commvault, you can get NetBackup; there's a slew of products inside of the media and entertainment space that also do it. We're going to use one at the end. They make FTP clients that now do Swift, right? So it's not nearly as difficult to interact with. There are products like Storage Made Easy, right? If you want to create a Google Drive type of interaction, if you want to have what looks to you like a regular Windows share, that absolutely exists already today, and you don't have to code it, right?
So that's nice. I actually have a demo of that at the end, so we'll save more questions for later and let Eric dive into multi-region and global clusters. So part of what makes Swift interesting, I think, is the fact that we have this idea of regions and zones, right? If you think about the structure in which Swift is defined, we know about the nodes that exist, and inside each node we know about the drives that exist. Those we can find out on our own, just from the technology. But what we can't do is figure out where that node sits. Does it sit in rack one or rack two? Does it sit in data center one or data center two? So in an effort to provide data-location awareness to Swift, so that we can make sure we have the most durability, this concept of regions and zones exists. Regions are usually defined by your data centers, right? You might have a region in California and a region in Colorado; and then for zones, a lot of people will use them to define the racks that might exist. Now, you can use these in any way you want, you can actually name the zones whatever you want, but it's still this region-and-zone concept. And the idea behind it is: if you think about an individual node that has three drives in it, if we're doing a triple-replica design, we're going to make sure a copy sits on each drive. If we're doing three servers in a rack, we're going to make sure a copy sits on each server. If we're doing three racks in a data center, we're going to make sure a copy sits in each rack. If we're doing three data centers across the world, we're going to make sure a copy sits in each data center. So there's this as-unique-as-possible placement algorithm. By leveraging those regions and zones, you can help define and give granularity to where your data is going to be distributed. And then, to add to that, we have this concept of storage policies.
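The as-unique-as-possible walk just described (spread copies across regions first, then zones, then nodes) can be sketched as a greedy selection: for each replica, pick the device that shares the fewest regions, then zones, then nodes with the copies already placed. The device list is invented for illustration; real Swift bakes this into the ring rather than deciding per-write.

```python
# Greedy as-unique-as-possible replica placement sketch.

def place(devices, replicas=3):
    """Pick replica devices, spreading across region, then zone, then node."""
    chosen = []
    for _ in range(replicas):
        def score(dev):
            # Lower is better: count overlaps with already-chosen copies,
            # region overlap weighted most heavily via tuple ordering.
            return (
                sum(c["region"] == dev["region"] for c in chosen),
                sum(c["zone"] == dev["zone"] for c in chosen),
                sum(c["node"] == dev["node"] for c in chosen),
            )
        pick = min((d for d in devices if d not in chosen), key=score)
        chosen.append(pick)
    return chosen

devices = [
    {"node": "n1", "zone": "z1", "region": "CA"},
    {"node": "n2", "zone": "z1", "region": "CA"},
    {"node": "n3", "zone": "z2", "region": "CA"},
    {"node": "n4", "zone": "z1", "region": "CO"},
]

for d in place(devices):
    print(d["region"], d["zone"], d["node"])
```

With only two regions available, the sketch uses both regions for the first two copies, then falls back to the unused zone for the third, which is the "as unique as possible" behavior described in the talk.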
This by itself could be a whole session, but the idea is that a policy defines whether you're doing single, I'm sorry, double or triple replica. You could do quad replicas; you could do whatever you want. If you're doing erasure coding, you define whether that's a seven plus six or an eight plus four. It just depends on what you want to do. What's important is that in your storage policy, and you'll see this when we do the lab, you're going to pick the drives that you want to participate in that policy. So you literally could pick a drive from, say, region one, zone one, a drive from region two, zone one, and a drive from region three, zone one, make a small pool of three drives, and get very, very granular and say anything written to this storage policy can only be written to these drives. Obviously, that's not the greatest idea, because you want availability and you want the drive performance. But the point is, by taking advantage of regions, zones, and storage policies, you can get very, very low-level control of where your data sits and its availability. So one of the questions was, how do we make sure we maintain that durability? How do we make sure that if we have a loss, it doesn't hurt us? This is the type of thing. It can get a little complicated as you add more zones and regions, but ultimately it's going to give you better control of where your data exists. Now, one of the things that happens by default, the way Swift works, is if you have, say, three sites, we want to write a quorum; we want 66% of those writes to land. So it's going to go to site A and site B by default. Once I've got sites A and B, then I can do the next write. So this is not a latency-optimized environment; this is meant for large-throughput types of transactions. There's a feature that exists called write affinity. There's read affinity as well; I'll go over that in a second. What write affinity does is say, hey, give me my quorum locally.
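The quorum arithmetic behind the "66%" figure above is just a majority of the replica count: for three replicas, a write is acknowledged once two copies land, which is why the default write goes to sites A and B before returning.

```python
# Write quorum: smallest majority of the replica count.

def quorum(replicas):
    return replicas // 2 + 1

for r in (2, 3, 4, 5):
    print(r, "replicas -> quorum of", quorum(r))
```

Write affinity doesn't change this number; it changes *which* replicas count toward it, letting the quorum land on local nodes while the remote copies catch up in the background.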
Allow me to use nodes that are inside my local cluster and write out that quorum. So now I write to a local set of servers or nodes, I get my redundant copies, and I now have the durability I expect from my environment. Then, in the background, we're going to start pushing to those other sites. So we're still going to get that as-unique-as-possible placement, remember, eventual consistency, but the idea is that I can write locally. So think of this; we have customers that do this all the time. I'm writing a backup, say, to Colorado, and then I want that backup replicated to Maryland. So I'm doing my backup. The last thing I want is for this backup to have to wait for dual writes everywhere it's going. Instead, I'm going to write all my copies here locally, and then over the network I'm going to trickle-charge my remote site. Now I get a complete duplicate copy. I have DR; so I coined this term disk-to-disk-to-DR. All of that at no cost. That's already included inside of the object store; it is inherent. That replication process, the auditor, all of that is already included inside of Swift. It's there for you to take advantage of; you just have to figure out how to do it. So read affinity is, as you can imagine, the same idea. If I'm going to read data, I want to make sure I read it from the nearest copy that exists. I want to make sure I get it as fast as possible. So instead of making sure it might be read from, say, the least busy system, we're going to find it from the closest system. And it obviously uses geo-awareness: if you have geo-location services or geographic load balancing, it's going to have that kind of ability to define the nearest location for you. Questions on that? I flew pretty quickly through that. Please, go ahead. It's timing, but the answer would be no. If you read it at the same time I write, you're going to get the copy you have.
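Read affinity as described above is an ordering problem: given the locations of an object's copies, try the nearest region's copies first. The region names and priority numbers below are invented for illustration; real Swift expresses this as a `read_affinity` setting on the proxy server.

```python
# Order candidate copies so lower-priority-number (nearer) regions are tried first.

def sort_by_affinity(locations, priority):
    """Unlisted regions sort last (priority 99)."""
    return sorted(locations, key=lambda loc: priority.get(loc["region"], 99))

copies = [
    {"region": "maryland", "node": "n7"},
    {"region": "colorado", "node": "n2"},
    {"region": "colorado", "node": "n5"},
]
priority = {"colorado": 0, "maryland": 1}   # this proxy lives in Colorado

print([c["node"] for c in sort_by_affinity(copies, priority)])  # Colorado nodes first
```

If the local reads fail, the proxy simply keeps walking down the sorted list, which is the fallback-to-primary behavior described a moment later in the talk.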
Once the replication occurs, you'll be up to date. So this is eventual consistency versus strict consistency. Strict consistency would say you can't open the file because it's already open. Object is a bit different, in that there is a window where data could have been updated here and may not yet be updated at the remote site. You have latency; you have all of that stuff that has to happen to get that copy over. So there is a possibility of that. That's where the non-deterministic, eventually consistent nature of object storage comes in. So your application needs to be aware of that. And how do we get around that? Generally we say: don't update an existing object; create a new object. That way, the only way for you to read that new object is once it has finished writing. So you version the object and say, okay, here's the new update; we'll use this new update. And if your secondary site does not have that object available, it will come to the primary site to pull that object back for you. So you will always get the updated copy. It will fail to read it locally, get an error, and then, again, there's a set of locations, as I mentioned earlier, so it will try those locations. It'll roll through them, the nearest one first and then the others. So ideally you'll get back to your primary one. Yeah. Go ahead, go ahead. Yes, yeah. So there's a 102 class, which actually gets into some of the finer details of what are called static large objects, dynamic large objects, versioning, and all that stuff. That's going to give you a little bit better detail. I mean, yes, the answer to your question is yes, you can set an aging policy on the way it moves, but that's beyond this, right? The class will be listed at the end of the slides. All right, let's keep going. All right, so, use cases. Oh, please, go ahead. Yes, so the question was, can I increase the replication factor?
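The write-new-versions-instead-of-updating pattern described above can be sketched as a naming convention: each update becomes a new object name carrying a zero-padded timestamp, and a read picks the lexicographically latest complete version. The naming scheme here is invented for illustration; Swift also offers built-in container versioning, which the 102 class mentioned in the talk covers.

```python
# Version objects by name instead of updating in place.

def versioned_name(base, ts):
    """Zero-padded timestamp suffix so string order equals time order."""
    return f"{base}/{ts:017.5f}"

def latest(names, base):
    """Newest complete version of an object, or None if none exist."""
    versions = [n for n in names if n.startswith(base + "/")]
    return max(versions, default=None)

names = [
    versioned_name("recipes/beef-noodle", 1000.0),
    versioned_name("recipes/beef-noodle", 2000.0),   # a later update
    versioned_name("recipes/sushi", 1500.0),
]
print(latest(names, "recipes/beef-noodle"))
```

A reader can only ever see a version whose name is listed, and a name is only listed once the object finished writing, which sidesteps the half-updated-object window the speaker describes.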
You can change a number of different variables. You can change the number of replication threads that are running. You can change the rate of the replication, so the size that gets replicated, like the minimum chunk size. So there's a number of things you can do to improve that. I'll use a real-world example from one of the DR setups I was talking about: they were doing a one-terabyte backup and it was taking them four days to get it replicated over. And if you looked at the network traffic, it looked like a little heartbeat. Very, very slow, right? We went in, we made some tuning adjustments, and now when they write a backup, we pin the interface. We literally fill the pipe; it takes 25 minutes to go from Maryland to Colorado, just to give you an idea. Was there another question back here? Please, go ahead. Yeah. I'm sorry, say that again, please. A Hadoop file system? I know what he meant. I'm not sure, to be honest — I'm not an HDFS expert, so I can't answer that, but I can find out for you. Yeah, well, okay. We're gonna keep going, wrap up, and get through the slides. So, use cases quickly, right? Backup and archive is a very easy place to start doing object storage. It's an easy way to kind of dip your toe in without a massive restructure of your environment. As I mentioned before, products like NetBackup and Commvault already have the connector that you need to be able to write to a Swift environment. So it's a matter of buying yourself a couple of nodes, or even a single node, deploying it, and starting to back up to it. And there's one very key point. With a traditional backup target like NAS or block, you're limited — a lot of people say, oh, my backup is so big, I can't fit it into my backup window anymore.
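On the tuning knobs just mentioned: in OpenStack Swift these live in the replicator section of the object server's config. This fragment is illustrative only — option names and defaults vary by Swift release, so treat it as a sketch and check your version's `object-server.conf-sample` before copying anything.

```ini
; Illustrative [object-replicator] tuning in object-server.conf.
; Names/defaults vary by Swift release -- verify against your
; version's object-server.conf-sample before using.
[object-replicator]
concurrency = 8        ; replication workers running in parallel
interval = 30          ; seconds between replication passes
rsync_bwlimit = 0      ; rsync bandwidth cap (0 = unlimited)
```

Raising concurrency and lifting the bandwidth cap is the kind of adjustment that turned that heartbeat-shaped traffic graph into a full pipe.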
With object storage, you can add a couple more nodes and then parallelize that backup, so you can complete your backup much faster than with the traditional system you're dealing with. To add a little detail: the proxy node, right? And again, we'll go over this during the lab, but the proxy node is responsible for the rate at which data comes in. Because you have the ability to create an independent proxy node, you can add as many of those nodes as you want and keep your object capacity independent of that. So you have the ability to have these different personalities. You can combine them in one node or you can split them out. Again, you'll see that in the lab. There's a little dropdown; you'll see where that option is. Big data, MapReduce, right? So SwiftFS is a product that exists out there for HDFS. It allows you to run your HDFS environment, do your compute, and offload it to your object store — or vice versa, pull data in and then do all of your distributed computing. So the idea is, how does Swift interoperate? You asked earlier how difficult it is to implement or interoperate — this stuff is already out there, right? There's projects, there's programs, there's more and more ISVs building to the cloud or building to a cloud connector. I'm sorry guys, I'm rolling pretty quick because I don't want to lose lab time. Media and entertainment — this is a very interesting use case. So think of the same idea, right? We're streaming data in, we're taking lots of throughput. It's not lots of IO, it's lots of throughput. It's about how many gigs per second I can put down into my infrastructure. So with enough proxy nodes or enough network connectivity, I can stream data in, and we can do the encoding.
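Going back to the parallel-backup point: because each proxy can take its own stream, fanning a backup out over several connections multiplies throughput. A toy sketch of that fan-out — everything here (`proxies`, `upload_chunk`, the round-robin choice) is hypothetical glue, not a real backup connector:

```python
# Sketch of parallelizing a backup across proxy nodes: each chunk goes
# out on its own connection, so adding proxies adds throughput.
# `upload_chunk` is a hypothetical stand-in for a PUT to one proxy.
from concurrent.futures import ThreadPoolExecutor
from itertools import cycle

def parallel_backup(chunks, proxies, upload_chunk, streams=4):
    """Round-robin the chunks over the proxy nodes, `streams` at a time."""
    targets = cycle(proxies)                      # spread load across proxies
    with ThreadPoolExecutor(max_workers=streams) as pool:
        futures = [pool.submit(upload_chunk, next(targets), chunk)
                   for chunk in chunks]
        return [f.result() for f in futures]      # surface any failures
```

The same shape applies to restores: more proxies, more parallel streams, shorter window.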
And if you look at some of the advanced features — again, like I said, the 102 class will have it — when you look at things like static large objects and what we call ranged reads, you can actually get very granular and say, give me a 15-second clip of this video so I can edit that clip. And because it's an independent object, you can modify that object, put the new version of it back up there, and now have an edited video, right? So we do a lot of work with Pac-12. I don't know if you guys are familiar with them. They do a lot of sports broadcasting out on the West Coast. The idea is that for every hour of viewing, they have about eight hours of data, and they curate all that data — they do everything they need to so that you can then see it. Their goal is to save a hundred years of video. So they have a local high-performance scratch space, and they have a second tier, which is a Swift environment, saving five or six years' worth of video there. And then they actually offload it to Amazon, to Glacier and S3. So you wanna talk about a full hybrid environment, right? You wanna talk about, hey, I know that I have active, moderately active, and inactive data, and how to leverage that kind of infrastructure — it's a good use case for that. So, Digital Film Tree — hands up, how many of you guys know what Game of Thrones is? All right, so those are the guys that film and produce that video, and they're using us to do some of the transport and some of the storage. And just like Eric said, all these use cases apply for them as well. So this is definitely being used in mainstream media today. Life sciences is another easy one, right? You've got these sequencers, these mass spectrometers, and all these other things that collect massive amounts of data. They basically create it, store it, and let people analyze it.
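On the ranged-reads point from the media example: an HTTP `Range:` header lets you GET just a slice of an object, and for a constant-rate stream you can turn a time window into a byte range with simple arithmetic. A rough sketch under that constant-bitrate assumption (real containers need index-aware math, and the numbers here are made up):

```python
def clip_range_header(start_sec, duration_sec, bytes_per_sec):
    """Build an HTTP Range header for a time window of a constant-rate
    stream; object GETs honor standard `Range:` headers."""
    first = start_sec * bytes_per_sec
    last = first + duration_sec * bytes_per_sec - 1   # Range is inclusive
    return "bytes=%d-%d" % (first, last)
```

So "give me 15 seconds starting at the one-minute mark" of a 1 kB/s stream becomes `Range: bytes=60000-74999` on the GET.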
So if you look at some of the inherent replication capabilities of object, we could have NIH doing work at Bethesda and then replicating out to George Washington University in DC, right? They take that data, they set it in that specific policy I talked about earlier, and that data gets replicated. GW does their research on it, they put their data in, and it gets replicated back to NIH. So now we have this collaborative environment that you can take advantage of. We've done a lot of work with HudsonAlpha, Fred Hutchinson, some other organizations. This is a big space if you guys have that type of environment. It's really easy to get into that long-term active archive or image repository — that's what you want to put in object storage, right? It makes sense. You're talking about cost structures in the $200-per-terabyte range. You start looking at dollars per terabyte versus dollars per IOP — it's a different type of paradigm. Don't put your active environments inside object store; put your inactive environments there, right? Don't do databases in object store. It's not that type of environment. Okay, so go out and grab these. As I mentioned before, we have two lab manuals. One of them is the one we're gonna do today, and the other one is the full manual. We're gonna leave this lab up and running; we wanna make sure that you guys have all week to play with it as much as you'd like. So I recommend you get the full manual, come back later, and take that piece of paper with you that has your credentials on it. Feel free to hit it all week long. If you have questions, come find us at the booth and we'll be happy to help you out. So while you guys get that all set up, we're gonna go ahead and get kicked off. I'm gonna do the lab with you guys. Steve, I can break it. Don't break the lab. Have you ever seen me? I break everything. Ugh.
And if everybody can just let me know once you have the manual. I don't wanna get ahead of anybody, but I wanna make sure everybody has time to get in there and play around. Okay, so if you look around — can the SwiftStack personnel put your hands up? So there's three in the back, there's another one over there. If you put your hand up, these guys will come by and help you out if you've got any questions. So don't be afraid, just shout. Yeah, just raise your hand. We've got a number of people here circling like vultures waiting to help. Don't scare them, you know. Does everybody have the manual? Anyone who doesn't have it? Okay, we got one, two. Okay, give it another minute or two and we'll flip. Did you get your piece of paper? No, I didn't. Let me go grab one. Thank you, sir. Feel free to skip ahead, by the way. I'm not gonna yell at you for skipping the first five steps. I remember when I was in middle school they had this exam, and the first thing said, please read all the instructions — and it was like 60 instructions. And of course everybody's charging ahead while the teacher's talking, and at the end it said, only do step one. You have the right credentials, so just ignore that; go ahead and download the file. You're okay. Let's make this font bigger. You might need to turn your head around. Let's see what kind of damage we can do. All right, so at this point has everybody been able to gain access to their node? Is anybody having any problems gaining access to their node? I want to make sure everybody can SSH in like they need to. Oh, I logged in with the wrong user. See, I told you I'd mess it up. So as some of you guys probably saw, when you run the first command you get a series of commands that would have been run, or can be run, independently of each other. Once you've done this a couple of times — we wanted to kind of show how this works.
This is actually what gets run, and then if you want to rerun the command again — you'll see it in the manual — you can just pipe it to bash immediately and you don't have to worry about it. It does everything for you. But the goal here is so that you guys can see what's going on, what the process is, right? So at this point, what's happening is — we're using a little bit of a cheat here, right? SwiftStack makes a controller that maintains all of the packages you need to implement Swift. Implementing Swift is a whole lab by itself, and we're not trying to do that today; we wanted to give you guys some hands-on with what Swift does. What this is doing is basically automating the installation of all the Swift packages and all of the pertinent pieces that make Swift usable on your node. So in this case, we're actually making a call out to this controller that we're gonna jump onto, and that's going to create this node and say, hey, this node is now available for use inside of this cluster. I see a lot of people looking around with questions — what's up? Has everybody gotten to the claim URL yet? So some of you guys are running the install command. The whole point was we wanna show you what kind of command is gonna run, what kind of check is gonna run. You actually need to pipe it to bash for it to auto-execute, so just keep that in mind. Hey Eric. Yes. Can you flip the download screen back up again? Which one? The PowerPoint, so they can see the manual. I think one or two of them missed it. Sure. If I can find my own cursor, I'll get right on it. And for some of you — there's about a hundred people in this lab right now — at some stage it will queue up one of the commands. So if it's just taking a while, don't worry about it; it's because you guys are all running it at the same time. So just be a little patient with it. I'm sorry, say that again.
Yes, so on the card where you have the IP address — that is a public IP address. Make that your cluster API address. Everybody's should be unique. So if you guys go ahead. Did you create a user? You need what? Hang on, Albert, can you? Someone had a question in the back? Maybe not. Go ahead. You got it? Okay, somebody's right behind you. How's everybody doing? So some of you have gotten to the network page, and you're noticing there are multiple interfaces. The reason for that is we want to separate network traffic. We want there to be one network interface for you to talk to the application, another one for the internal cluster communication, and a third one for replication. The idea is that you can VLAN or physically separate the network connections, so you get security, or you can even do traffic shaping so your replication traffic runs at night, when the network cost is lower and the bandwidth is better, and therefore you don't impact your overall traffic. I think Eric will get to that step in a second, but that's why we have multiple pulldowns for those selections. So I may be behind some of you guys, but I wanted to talk about this: we talked about the idea of regions and zones, and we talked about some of the personalities that exist inside of a Swift environment. So when you're adding a node, you can pick your regions and zones. In this case, we didn't really build any regions and zones, but as an example, you can see here I have a region one and a zone one. I could add additional regions, or I could add additional zones, and as those get added, they become available during the ingestion of a node. So this is how you would take advantage of that location awareness, right? So if I wanted to — we'll just add another zone here. So now you can see that in my region I have that second zone that I added, right?
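The reason the region/zone hierarchy matters is placement: replicas get spread across locations that are as unique as possible, so losing a zone doesn't lose every copy. A toy illustration of that idea — this is not Swift's ring-builder code, just a greedy sketch over hypothetical `(region, zone, node)` tuples:

```python
# Toy version of "as unique as possible" replica placement across
# (region, zone, node). Illustrative only -- not Swift's ring builder.
def place_replicas(devices, replicas=3):
    """Pick `replicas` devices, preferring unused regions first, then
    unused zones, then any unused node."""
    chosen = []
    for _ in range(replicas):
        used_regions = {d[0] for d in chosen}
        used_zones = {d[:2] for d in chosen}
        used_nodes = set(chosen)
        # Lower score = more unique relative to what we already picked.
        best = min((d for d in devices if d not in used_nodes),
                   key=lambda d: ((d[0] in used_regions),
                                  (d[:2] in used_zones)))
        chosen.append(best)
    return chosen
```

Given two zones in region one and one zone in region two, the three copies land in three distinct failure domains rather than piling up in one zone.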
So ideally you want to define your regions, your zones, and the roles you're willing to have before you start deploying nodes, because changing the region and zone requires pulling the node out of the cluster and then re-adding it. Chris, could you look to your left and help the gentleman there beside you? Thank you, sir. I'm assuming everybody's at least at this point; if not, please let me know. Oh, do you have a — okay, I'm gonna move on then. So over here, when we talked about some of the personalities, we talked about the idea of a proxy node. Because of the ability to break Swift into the individual services it provides, you can add a node that's just a proxy node. So imagine that you might get a 1U server with a couple of CPUs and, say, four or maybe even eight 10-gig interfaces. Keep in mind that the proxy node is predominantly about throughput — data in, data out — so you wanna make sure that it has good network connectivity. Vice versa, if you're doing just an object node, then that one is more about having a number of drives, right? Maybe it's a 45-bay or 48-bay, or even some of the new 90-bays that exist out there. So think about the math, right? If I'm doing a 90-bay at eight terabytes, that's 720 terabytes in a single node. So getting to multiple petabytes in a rack is really not that big a deal anymore, right? We can get to five petabytes of raw capacity in a single rack. So when we start talking about that idea of regions and zones, you can get pretty granular. If you want, you could even create individual zones for each node if you're doing that really high-density environment. But the idea here is that you wanna pick the personality, or pick the role, that is most applicable to what you need that node to do. So if you're doing backup environments and it's a matter of "my data is growing, I need to back it up more quickly," you may want to add additional proxy nodes.
Vice versa, if you're doing media ingest and the rate is fine but your data is growing, you may wanna just add object nodes. Now, in this case, we're doing a single node. So what we're gonna pick is this Swift node here. And the Swift node basically says, I can do everything: it's gonna run proxy services, it's gonna run account and container services, and it's gonna run object services. So you're gonna see all three networks when you go to the next page, right? You see that in that network section. That's because we've said, hey, I wanna be a Swift node, I wanna do everything. Everybody getting what they need? I'm sorry, say that again? You don't see what? You didn't get to this? Look at the screen — you didn't get to that? Okay, are you still on your cluster? Go to nodes — do you see the ingest button, or do you see a node that's already there? Go to setup, go to your network tab. So if you look in the manual, you actually have to change this because of the way we have it — it has an IPv6 infrastructure. So I need to make sure I pick the outward-facing IP that came up for this. Did that fix it for you? Okay, so for this lab, yes, everything's on one interface, right? We're not separating everything out, sorry. The cluster-facing and the outward-facing ones are gonna be the same. Ideally, you would have a network that is customer-facing, and then everything behind the proxy tier would be, say, an isolated network that not everybody can get to. So again, as I mentioned earlier, I do a lot of work in the federal space, and they really care about that from a security perspective, because they can isolate that network and make sure no one can get access to the cluster-facing or the replication-facing network. Yeah, that's a good way to look at it. Same idea, right? You've got your outside, which everyone can reach, and then you've got your inside, right?
So again, in a combined node, what we recommend to people is that you create one large aggregate, like an LACP aggregate, and then you create virtual interfaces. The reason is that you may get spikes on different network segments, depending on what's going on. As an example, if you add a bunch of new nodes, or add new capacity — new drives — to a Swift cluster, there's a process in which you have to rebalance that infrastructure. What that means is it looks at all the new drives that are empty, sees all the drives that are full, and says, okay, I need to start distributing all of my data equally across all of them. That all happens on the replication network. If that's, say, a static 10 gig, then it could be a really long time before you finish rebalancing. Vice versa, if you do this LACP, then you can basically spike on your replication network. The other thing that's kind of cool is, if you really want to, you can get intelligent about it and say, hey, don't let my replication network get greater than five gigs, because I don't wanna thrash my proxy or my ingest. So there's a number of ways you could do it; all of them will work. Our suggestion is: make an aggregate and then basically VLAN it out, or virtual-interface it out. So there's some cool technologies that we've talked to — I mean, I'm not here plugging a bunch of people, right? But there's a number of technologies that are also leveraging RESTful API calls, so they can get really intelligent about your environment. For example, if you tell it, I'm going to create a new set of nodes, you could actually send a call to the network and say, hey, I want you to give me 20 gigs, I want you to isolate, I want you to create an aggregate — or vice versa, I want you to limit it so it doesn't throttle the network.
So there's a lot of interesting technologies out there in that software-defined networking space. A lot of people are using RESTful API calls to really automate that interoperability. So we're working with some of those to create middleware. Again, this is SwiftStack — we're working with them to create this middleware concept where there's tight interoperation. So when you click, for example, add node, we send a call out to that infrastructure saying, hey, we've added a node, here's what it's doing, here's what we're doing, and here's what we need. I call it intelligent switching. Yeah, see if I can get somebody for you. Get your hand up there for a second. Albert, there's one here in the third row if you're available — I can't tell if you are because I can't see you. Yes, sir, go ahead. Okay, oh, go ahead. That's a question. I'm sorry. Oh, then he'll help you right there. Are you raising your hand or stretching? Okay. Guys, I think we're pushing the controller a little bit — we're getting out-of-memory errors on the node, so we killed it. Yeah, you might. As you can imagine, you wouldn't normally take, I don't know, let's call it 100 people in here, and create 100 clusters and start ingesting 100 nodes simultaneously. That's not a traditional use case, but I appreciate it. We'll see if I can break it too. So if you guys remember when we talked about policies, I was talking about that unique-as-possible placement. Remember, we talked about how you can get kind of granular in regards to policies. So this is where you would pick that policy. Remember, I mentioned that you could get really specific about the individual drives that contribute to a policy. The other thing that's very interesting is that a drive can contribute to multiple policies.
So if you're doing an erasure coding policy, a triple-replica policy, and a double-replica policy, a drive — or the capacity of that drive — can contribute to all of them. So this is where you would pick, right? You have your account and container services, which we went over: the account knows about all the containers, and containers know about all the objects. And then you have what would be your traditional replica environment. So you can kind of pick and choose how you want your drives to participate in that overall aspect. So this is another thing — I know everybody's doing their own thing, but when you looked at how you added the drives in, there was this idea of gradual versus immediate. As I alluded to earlier, if you're going to do a 50% capacity increase and you add those nodes immediately, the cluster is not happy, because it's going to do everything it can to start immediately moving that data, because it sees all this new capacity. So you're going to crush your network, right? It's going to take advantage of it. The idea of gradual would be: add a new one, let it ingest, add another one, let it ingest, and let that happen over a period of time. Obviously, when you're doing an initial build, you can do immediate, because there's no data on there yet. But the process would be: let's make sure we do this intelligently. So we were wondering if this might happen, with so many people running so many jobs simultaneously. The easiest thing to do is go to clusters, then deploy, and hit deploy again. The jobs queue up, but there's a timer on there, so it may take a while before yours gets picked up. It will work — we just have so many people hitting the same button at the same time. If you want to, you can drill in and look at the alerts; they give you some detailed information from the log about what might have occurred.
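On that gradual-versus-immediate choice: one way to picture "gradual" is as a weight ramp — instead of handing new capacity its full share at once, you step it up over several rebalances so the data movement trickles in. A sketch of such a schedule (the 25% step is just an example, not a Swift default):

```python
def weight_ramp(target_weight, step_pct=25):
    """Return the sequence of weights to assign new capacity, one per
    rebalance, stepping up by `step_pct` percent of target each time.
    The step percentage here is an illustrative choice."""
    step = target_weight * step_pct / 100.0
    weights, current = [], 0.0
    while current < target_weight:
        current = min(current + step, target_weight)
        weights.append(current)   # apply, rebalance, wait, repeat
    return weights
```

So a drive headed for a weight of 8000 would be brought in over four rebalance passes instead of one big shove that saturates the replication network.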
So it might actually be something interesting for you. So clearly you guys are ahead of me if you're getting to deploy, so I apologize — I'm so slow. Go ahead. Well, ask me, I'll see if I can help you from here. What is the error you're getting? It's just a blank screen? Do a refresh on the browser? And it doesn't — bring it up, let's take a look. This is your interactive demo. They're gonna hear me, but you can't see me in the video. Let me see. So you guys may notice that as you're doing work, you'll see a little count that shows the deploy count — we'll get to that in a minute — but when you go to do a deploy, you'll actually be able to see all of the steps that are going to be created as you do your deployment. Okay, so in your case, your network interface is still wrong. We need to make that your public-facing interface. Okay, it may have — he had the same issue — it may have somehow skipped this. And then you have to add drives in, so you're not ready to deploy yet. You gotta pick a couple of drives. I'll let you finish it. So right now, where's your lab? Where's the manual? There you go. So we're right there. Oh, I was right there. Yeah, see, you just had to get there. You had it all fine. I think what happened is it actually ingested — it was just waiting for you to finish. It had ingested; the next screen should have been a pop-up of the network. It normally would automatically redirect there. I think you probably got stuck in the back-and-forth of everything else, but you're right at the spot where you can pick your network. So I already picked your network for you. Nothing like a live workshop to get the blood flowing. Okay, so I think, Eric, I'm gonna borrow the screen for a second. No, I'm just getting ready to deploy. No, not yet. Deploy, deploy, deploy before it comes. All right, I deployed. Okay, good.
So some of you guys are noticing you got to the web console portion and the console is not showing up. Part of the reason is we have interfaces that we're unable to access. So in this lab, everything, including this one back here, needs to be eth0. Oh, it's all flat. Okay, so our lab manual was wrong — that's on our part. So once you make that change, just reassign the interface, redeploy, and you're good to go. So check your network interface if you're unable to access your web console, okay? Did everybody see what he just did? Okay, let me do it again so you guys can see it. Maybe, all right. So if you go to clusters, you wanna go to manage, go to nodes, manage again on the node. Oh, am I ahead of everybody? Okay, cool. Now you're in the node configuration: go to network, and make sure all three interfaces are the public-facing IP. Hit reassign interfaces, and you'll see here we need to deploy that change. Everybody get that? No, you didn't get it all? Okay, I'll walk you through it real quick. Go to clusters — you good? Clusters, manage. Okay, so you haven't enabled it yet. So go to nodes, go to the network — or am I past you? Andrew, can you help them out real quick? Look, my job failed, awesome. Yeah, what do you got? Is it running what? Yeah, local. Yes, so you have cluster users, which are basically tenants, right? So you're just using local authentication at that point. If you remember, we had a URL up on the screen that had, like, dot-com slash v1. So there are actually three versions of the API: there's a v2 and a v3. V2 and v3 are really only used when you're trying to do external authentication — things like Keystone or other authentication mechanisms that take advantage of that identity management concept. Ideally, if you're gonna hit the cluster directly, v1 is gonna be what you want. So that's usually local access.
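The v1 flow is a two-step dance: one GET that trades credentials for a token and a storage URL, then that token on every subsequent request. A sketch of the headers involved, following Swift's tempauth-style v1 conventions — the URLs and credentials here are placeholders, and this only builds the requests rather than sending them:

```python
# Sketch of Swift's v1 auth handshake: send credentials once, then use
# the returned token on every request. URLs/credentials are placeholders.

def auth_request(auth_url, user, key):
    """URL and headers for the initial GET /auth/v1.0 call."""
    return auth_url, {"X-Auth-User": user, "X-Auth-Key": key}

def storage_request(auth_response_headers, path=""):
    """Build a follow-up request from the auth response, which carries
    the storage endpoint and token in its headers."""
    url = auth_response_headers["X-Storage-Url"] + path
    return url, {"X-Auth-Token": auth_response_headers["X-Auth-Token"]}
```

Everything after that first exchange — listing containers, putting objects — is just the storage URL plus `X-Auth-Token`.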
Now, you can — it's a very manual process, but you can — set up LDAP authentication or even AD authentication on your nodes so that they can do that authentication. Keep in mind authentication occurs at the proxy tier. So your proxy nodes are the ones that have to have that, unless you do a combined node, right? But my point is, if you have that isolated network, your proxy nodes are where all of your authentication and connectivity happens. Yes, sir. Okay. Did that answer your question? The proxy manages — yeah, your proxy is your front door. Yes. Yeah, I mean, yes, gateway would be a correct term as well. Like I said, it's your front door. It's the way you get in, and then it helps you get to everything else. So when data is being written — and again, I'll talk about those three networks — when data comes in, it's coming in through the customer-facing network. Once it comes in — let's just say I've got a triple replica — it's going to use that cluster-facing network to do that triple replica. If something changes in the infrastructure, it uses the data replication network to do those back-end changes. So the auditors and all that stuff run on that network. So you have to kind of understand how that data comes in. Because remember, I used that example of the backup: when you do write affinity, you're writing local, and it's using the cluster-facing network to write local. When it replicates out — that eventual consistency — that's on your replication network. So it feels like it's wrong, but technically my write is being accomplished on the cluster-facing network, and the background process is doing that replication. So you just have to be aware of it. I had a guy who made his cluster-facing network really fast and his replication network really slow and couldn't figure out why — of course he was crushing it, because the two are doing different things. So where would you run it? I'm not sure what you mean.
So ideally you want to have two or more proxies, and the reason is redundancy, right? If you're doing maintenance, if you're upgrading Swift, or if you have to patch the OS, you have to bring it down. If you have one proxy, you lose access. Now granted, if you have a maintenance window, then no harm, no foul, because you've told everybody it's not available. But in a perfect world, you'd have two proxies, and you would use something like a load balancer, or maybe round-robin DNS, or something of that nature to allow for continual access, right? Again, one of the things that Albert mentioned during backups is that the proxy ultimately is — think of it as a peer-to-peer network, right? Like a torrent, it is all about parallelism. The more connections you make, the better you're going to take advantage of your overall network structure. So if you've got a couple of proxies and you're doing multiple streams to them, you're going to get better throughput overall. Hey, my job finished, look at that. Okay, so how many of you guys got to the Swift console, or have already passed it? I guess people on the left aren't gonna answer me, are they? Okay, so this is not necessarily a component of Swift; SwiftStack provides this. It's basically a lightweight web interface that allows you to put files and get files, so you can kind of see how an object works. So you can see here we've got this auth user which we all created. When you add a container, you get to pick your replica count, you get to pick your policy. So remember, we talked about how the different policies apply, and the policy is at the container level. When data gets written to the container, it's going to be treated based on the policy that's in there. So if you're doing erasure coding, it's going to get split up; if you're doing replicas, it's going to get distributed. Depends on what you're doing. Now, you mentioned earlier about versioning — without getting into too much detail:
You can actually have a container with erasure coding that is the versioning destination for a triple-replica primary. Make sense? Because again, the properties get set on the container. Now, all of these properties can be set with a standard curl command or a standard HTTP call — that's how you would traditionally do it. So you, the user, use an application, you right-click, you say "set the properties" — what this is really doing in the back end is literally an HTTP call. It's saying, hey, I want to modify this parameter, this metadata, this flag, this setting on that container to tell it I want this policy. Does that make sense? Eric, I think there are some people that are still running into issues with the web console. So do you want to wrap this section up and move to Cyberduck? What do you guys want to do? We've got a third-party app I was going to show you, largely to browse the object store through a more traditional interface, like the file-services type of scenario you might be used to. Yeah, let's do that. Keep in mind that this lab will be open for you guys for an extended period of time. So please hack at it later, reach out to us, stop by the booth, and we'll be happy to help you out. It doesn't have to all happen today. Take that piece of paper with your login with you, because it'll still be there for the next five days. Well, don't pay for that — actually, I think it is. I think it's a really cool app. It has S3 connections, it's got Swift connections, and you'll notice in our lab manual we have you download a special profile, which is an HTTP version of the Swift connection. It just lets you connect over HTTP instead of HTTPS, which is what you should normally be using. But I haven't used Transmit — hang on a second. Is Doug still here? Doug, question: are you familiar with Transmit? Okay, then he would know, and if he doesn't know, I don't know. Sorry. No, he was asking about Cyberduck versus a product called Transmit.
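To circle back to the point that setting a container property is literally an HTTP call: a sketch of the headers a container PUT or POST would carry to pin a storage policy and point old versions at another container. `X-Storage-Policy` and `X-Versions-Location` are the Swift header names; the container and policy names here are made up, and this only builds the header dicts rather than sending anything.

```python
# Sketch of the HTTP headers behind "set the properties" on a container.
# Container/policy names below are hypothetical examples.

def create_container_headers(policy=None):
    """Headers for a container PUT that pins a storage policy; in Swift
    the policy can only be chosen when the container is created."""
    return {"X-Storage-Policy": policy} if policy else {}

def enable_versioning_headers(archive_container):
    """Headers for a container POST that sends old object versions to
    another container (which can itself use an erasure-coded policy)."""
    return {"X-Versions-Location": archive_container}
```

That's the whole trick: the GUI's right-click menu and a curl `-H` flag end up sending the same headers.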
I hadn't heard of it, so I wasn't familiar with it. So if you're looking for the OpenStack Swift book, put your name card in this; I'm gonna come around, so just put your hand up and we'll send you a book. If you don't have a name card, grab a piece of paper, write your name, email address, and your shipping information, and we'll put it in. The book bucket is gonna be up here, so just drop by and put your name card in. I only have one book here, so grab a piece of paper and just write your name. Sorry? Yeah, we will ship it to you guys outside of the US. Okay, so just name, address, email address, and contact phone number, so we have a way to reach you guys. Just come up here and drop the name card in. Yeah, I never actually created the container — I was clicking on a different screen. So when you guys go to the connection profiles in the lab, make sure you get the HTTP connection profile, if that wasn't clear in the document. Oh, no, thank you very much — appreciate the timing. If you have any questions — sorry, we ran through it; it's always fun with this many people — stop by the booth. If you wanna get more into it, we'll be happy to do it. We've got a number of people that know this far better than I do that can get you into some geek stuff. That's it for the exercise. Thanks, everyone — I appreciate you attending. I've got a couple pieces of swag here if you guys wanna come grab something: a charger for your phone, coasters. First, thank you guys all. Thank you all for coming. I know we've got the booth crawl and we're cutting into your evening, so thank you very much for the time. One person? Appreciate it. A cooler, key chains, one power pack for your phone if you wanna charge.