Hello, everyone. My name is Joe Arnold, and I'm one of the co-founders of SwiftStack. Today we have Dirk Petersen from the Fred Hutchinson Cancer Research Center to talk about how they're using HPC and Swift. Dirk, do you want to introduce yourself?

Yes, I'm Dirk Petersen, scientific computing director at Fred Hutch. I've done this job for a couple of years, and recently we got into Swift.

Excellent. Do you want to do the intro slide here? That was yours. That's mine. That's yours. The challenge, okay.

Well, okay, so the challenge here is the need for an archive system to offload expensive storage. What we're seeing right now, particularly with high-performance computing workloads, is that it's no longer sufficient to have tape be that archive tier. The data needs to be available, because at some point in time, in order to run an HPC compute job, you need to have that data on hand, and it needs to be able to feed into the HPC cluster. However, it needs to be low cost, and that's the rub. So what we're going to talk through here is how to use Swift so that you can integrate it into the HPC environment. We're going to talk about the story of how Fred Hutchinson decided to go down this path, some of the costs of doing this, and then some of the tools for doing the actual integration. That's what we'll cover in today's talk.

Super. So now it's me. Now it's you. Yeah. Go.

OK. So Fred Hutch is not a technology company. It's the Fred Hutchinson Cancer Research Center, in the best sense of it. We have about a half-billion-dollar budget, most of it government funded, and about 3,000 employees. We can be seen as what some people call a research mall: central IT provides services to a lot of decentralized research groups, who all have their individual needs. Let's go to the next one.

So we have multiple data centers. It has grown organically, and it also depends a lot on government funding. These data centers are, well, I wouldn't call it geographically distributed, but they are on campus in multiple locations, and we wanted to take advantage of that fact. We have about a hundred staff and a couple of sysadmins, and we started to venture into something like storage chargebacks. At the same time, and you see this here, we have a lot of capacity. You see all these empty racks; they're still empty. And if you are into data centers: 1.03 PUE. I've never seen anything lower. It's all natural air cooling, and we are in Seattle, where it's always very mild. So this is the setting in which we said, oh, we need to find some technology that actually works well with all of that.

So, it's me again. About SwiftStack: what we do at SwiftStack is work on OpenStack Swift and develop that open-source engine. There's a big community, lots of developers that are part of it, and we're one of the leading contributors to the project. What we do as a company is build the deployment, management, and operations tools so that it's easy to deploy and easy to scale an OpenStack Swift environment. And we've added capabilities around that, things like user authentication and chargeback, which is one of the topics we're going to cover here. We'll get into more of the details later in the talk.
We also have a couple of resources that folks might want to check out. At our booth, we have a book about object storage being used for genomics. We take a lot of the lessons learned with Dirk Petersen at Fred Hutch, and also with Brandon at HudsonAlpha, which we'll hear about in a minute, and we talk about this use case and how important these really large-scale storage environments are for HPC and sequencing projects. We also have an O'Reilly book, which is all about Swift. If you want to roll up your sleeves and learn the nuts and bolts of how OpenStack Swift works, that's the book to check out. So, a couple of resources there; they're also available on our website at swiftstack.com/books.

And here's the thing we found: researchers at these research institutions had very, very high storage costs. If you were to map back the price per terabyte for a NAS environment, it would be $40 per terabyte per month, which was really high. They were asking for something with a much, much lower cost basis, because they would turn around, look at the public cloud, and think, oh my gosh, that's what I'm baselining my cost against. So as a result, what they would do is go out to Best Buy, or wherever the local retailer is, and just start buying stacks of hard drives. This is actually a picture of a bunch of personal NAS units sitting in some researcher's desktop area. And all this data: it's not secured, you're not going to be able to collaborate on it, and you're not going to be able to feed it into any future research projects. So it's definitely an organizational mess. If you go toward the NAS environment, it's high cost; if people take this retail approach, it's a bit of chaos. This is part of what we see needing to be solved, particularly in these high-performance computing environments.

So we're talking about cost, cost, cost, a lot. And cost is not necessarily only an absolute number. We see this with the finance folks: it's also about predictability. If you have to do a forklift upgrade of your tape environment every couple of years, you aim to predict the cost for that, but it's not always possible. Sometimes it becomes a bit of a surprise: something goes end of life, prematurely end of life, and suddenly there's a big check that has to be cut. Finance people don't like surprises. If costs grow, they like them to grow steadily, not in step functions.

On cost again: I gave a similar talk last year where we compared our NAS system, basically a standard scale-out NAS system, with Amazon and Google. And of course you see it: yes, our current storage is more expensive than what we can get in the cloud. What's the consequence of that? Here we have our current SwiftStack environment at $11 per terabyte per month. SwiftStack got a little cheaper; the cloud, surprisingly, stayed put; and even on the NAS front we haven't seen much decline. So we're seeing that costs are not decreasing the way they used to, maybe 15% or 10%, sometimes 8% per year, and we don't know where that's getting us. Currently we are down to almost Glacier costs.
There's another component that's interesting. You might have seen that with Swift you can have multiple zones and multiple regions, and an additional zone costs us about $1 per terabyte per month. That is a tenth of Glacier. So if we want a geographically distributed location somewhere, we can actually buy the hardware and get that for $2 a terabyte a month. That's really a good number.

As mentioned before, chargebacks drove a lot of these behaviors for us. We put a sophisticated chargeback mechanism in place. It's all SharePoint-based right now, because the administrative folks are really used to Windows computers with SharePoint, and it's been adopted quite well. We went live in mid-2014, and there was really strong interest; it was kind of surprising to us how strong it was. Researchers just don't want to pay the price. They always come back to you with that Best Buy drive example.

And the result was this. Charging officially started November 1st, and we had announced it in 2013, so long warning, right? But then in mid-October people realized, oh, chargebacks are coming, and they started moving stuff over at rapid speed. Then suddenly we hit the wall. This is the yellow line, the point at which we needed to expand. So we paused the whole process for a while, bought new hardware, took care of a couple of other things, and then the growth started again here. So that's a really fascinating story.

A little bit interesting here: what are these blips? We had maintenance in one data center. This is all across three data centers, and that one data center had to have a new power train installed, which had to happen on two consecutive weekends. What did we do to sustain the maintenance? We just shut down the servers in that data center. It was basically just running the shutdown command on Linux, parking them for an entire day, and then bringing them up again. And it all resynchronized. There was not a single second of outage, not even a lost ping, for the Swift cluster. That was quite fascinating.

Another thing you see here is a blip downward. We have a piece of software in there that we call undelete; it's a trash can. If people delete data, it's held in this trash can for about 60 days, and then you finally see that sometimes storage is reduced. That's also how we do our backup: we don't have any backup behind it, and the trash can fulfills that need.

Yeah, I might just add to that. That's something we realized when we started working with you: sometimes researchers would delete a bunch of data, and that was a concern you had. Instead of having a separate backup, taking that data, moving it to another storage system, and archiving it, this basically treats it like the trash can on your desktop, but for giant HPC data sets. So if someone does delete something, there's a configurable number of days during which you can retain it and return it.

That undelete feature was a very critical component for us in Swift. It's something you can only get with Swift, because only Swift supports these middleware hooks that you can drop in to intercept calls. And it's not actually a giant piece of software; it's actually pretty simple.
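To give a rough idea of the mechanics: this is only a conceptual illustration, not the actual SwiftStack undelete middleware, and the account, container, and object names are invented. The middleware effectively does server-side what you could do by hand with Swift's COPY verb before a DELETE:

```bash
# Conceptual sketch only; the real undelete middleware intercepts the DELETE
# inside the proxy. All names (AUTH_lab, trash, results.tar.gz) are made up.

# First, preserve a copy of the object in a trash container...
curl -X COPY \
     -H "X-Auth-Token: $TOKEN" \
     -H "Destination: trash/results.tar.gz" \
     https://swift.example.org/v1/AUTH_lab/data/results.tar.gz

# ...then let the original delete proceed. A periodic sweep can purge the
# trash copies after roughly 60 days.
curl -X DELETE \
     -H "X-Auth-Token: $TOKEN" \
     https://swift.example.org/v1/AUTH_lab/data/results.tar.gz
```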
And that helps us really get the storage cost down to a minimum. Tape for us is surprisingly expensive because we are low scale; tape is almost as expensive as Swift. And if we wanted a backup of our Swift cluster, we'd be almost back at Amazon pricing. So we really wanted to avoid that, and the trash can was a critical component of that success. Today we can migrate about 30 terabytes per day. That's pretty good. We don't actually need that much; we'd be okay with 5 terabytes per day, but we get 30, so we're taking it.

This is a boring slide. The standard storage hardware we use is the well-known Supermicro 36-drive chassis. It's deployed in thousands of data centers around the world; it's basically just a standard workhorse. Two things to note here. First, unlike some of the other talks you saw before, we went, again, lowest cost: we deploy desktop-grade drives, the ones people tell you not to use, and they have actually been quite fine with Swift. Second, we also have some SSDs in here. You saw in the Ceph talk earlier, perhaps, that people have kind of standardized on these Intel S3700s. The SSDs are there to cache metadata. So even if you have slow object transfers, or you think you do, directory browsing and metadata operations are really fast using these fairly low-cost SSDs.

Maybe I'll just touch on this. We've put a couple of these hardware configurations on the slides so everyone can see the different configurations people use for different use cases. We saw the Ancestry.com one last session; that was much higher-performance equipment, and there were more tiers: a proxy tier, an account and container tier, and an object tier. In this configuration, everything runs on the same nodes, and you scale just by adding another one of these units into an existing rack, or a rack next door. That was the question, right? Do you want to optimize for performance or for capacity? We asked, can you just add a bit of RAM and put a proxy on it? And they said, sure, you can. So let's do it. It's a very, very simplified architecture; it almost looks like scale-out NAS. On an architecture diagram it's a bunch of boxes wired together with cable. You don't need any InfiniBand; it's all Ethernet. So it's the simplest thing you can possibly imagine.

SwiftStack provides an out-of-band management controller, and that's key to this. If my SwiftStack controller does something, it doesn't necessarily affect the Swift cluster. My Swift cluster is completely self-contained, and occasionally, maybe once a week, maybe once a month, maybe twice a week, I need to manage it. This controller lives in the cloud. I can log into it, and it manages our on-premises hardware through secure channels. So SwiftStack provides the control, visibility, and monitoring. We have authentication with LDAP and Active Directory integration. We talked about the undelete feature, which was developed by SwiftStack. Is it open source? It's on some GitHub repository; I'm sure it's somewhere. But for us it's a checkbox. Yeah, it's a checkbox. And then capacity alerting. It's all within this GUI, which is incredibly easy for us to use. Do you want to...? Yeah.
I think having you walk through this is probably more valuable. One thing to add to what Dirk mentioned: we offer the controller in two models. One is the way they're using it, a hosted version of the controller that we operate in some of our data centers, with a connection coming into that controller to manage the on-premises hardware. The other option is that the management controller can also reside on-site, right next to the equipment. We have customers using both configurations.

It's an interesting discussion. It really depends on what your security team is comfortable with at the end of the day, right? You could say, okay, I have it in my data center, it's more secure. But is that really true? Because SwiftStack manages this thing 24/7, and our sysadmins don't manage the one in our data center. So it's a discussion you need to have with your security team, whichever they are more comfortable with.

It's also a combination of different use cases and the different stages at which people evaluate the system. Usually when people get started and want to try it out, they're almost always using the hosted controller. And for use cases where they don't want to manage that extra management plane themselves, they keep using the hosted version in production.

Yeah. This is the key feature that I like most about SwiftStack. First of all, there's the deployment automation. You have your hardware installed, you bring up your shell, you type a bunch of commands, the nodes connect to this controller, they pull down the software, they install it, and then you join them to the controller. That takes maybe ten minutes, and you're done. And then you wonder: this was my whole project. I had reserved so much time for my storage project and allocated all these resources, and I'm already done. So you're excited and disappointed at the same time.

But deployment is only part of it, and this is a really important thing in OpenStack: you see a hundred deployment options and tools out there, but you don't really see a lot of in-place upgrades, production upgrades. You don't see an orange button that you can click. Our sysadmins hover around this button and wonder: it's a one-button upgrade, should I click it or not? And then you click it, and things happen, and twenty minutes later your entire platform is upgraded to the latest version. You can run this completely in production at two o'clock in the afternoon, and it's fully transparent. And I hear there's a lot of work behind that button.

Here's how the button works. It's a distributed systems architecture, so a single node can be pulled out of the cluster and requests can be routed around it, and we take advantage of that. We control the load-balancing tier above as well, so we can be careful about routing traffic. What we do is pick the node we want to run the upgrade process on and tell the load balancer: hey, begin pulling this node out of the load-balancing group.
Meanwhile, we let the existing TCP streams on that particular node complete and wait until they drain out. Once they're all drained, we do the upgrade cycle, run a test on the node, and then reintroduce it into the load-balancing group. Once we've done that, we move on to the next one. If it's an object storage node, it's a really similar story: it gets pulled out, and the proxy server processes understand, hey, I'm going to set a back-off interval so I don't keep sending requests to that storage node. From a user perspective, no one sees a connection drop while the upgrade is happening. And if anything goes wrong during the process, an alert goes out and the process stops.

We also make sure that during the upgrade, if the new version and the old version of Swift are running side by side and you're left in a half-upgraded state, it's compatible from an API perspective. And when we introduce new capabilities, we make sure you're not allowed to configure a new capability until it has been fully rolled out, so you're never stuck in a halfway state. The point is to make sure clients don't see any issues, and that you can do an upgrade in the middle of the day rather than at three o'clock at night when no one's really coherent enough to do a storage upgrade. It's better during the middle of the day, when you're fresh and can address issues. That's the idea behind it.

On staffing, and it's always difficult to be certain what staffing resources you use, we estimate 0.25, a quarter of an FTE, per year, and we're doing pretty okay on that right now. There's not a lot of published experience on pure Swift deployments; I've seen something like 1.5 FTEs to manage a larger Swift cluster. So if you're considering rolling Swift yourself, do this too: try it out and make up your own mind. It's pretty straightforward.

Now, on to HPC use cases and tools. Typically Swift has this reputation as an archive. We talk about archive workflows, and only recently about higher-performance stuff. This is also one of the places where we got a benefit we hadn't asked for. Our performance requirement was really low; the researchers said they didn't want to pay any money. But once the system is in place, they say, well, it shouldn't be slow, right? And when you bring up a system that can't scale, like a tape system, you get those complaints and you have to live with the investment for five to seven years. As you can see, when we run our HPC workloads we get an aggregate throughput of about three gigabytes per second. That's basically our 10G backbone completely saturated, so the network is really the bottleneck. We currently have about 15 Swift nodes, and they can max out the networking infrastructure. That's pretty good for an HPC kind of workflow.

So now we get to this point. We are a traditional organization, as I mentioned, and most users are used to their POSIX file systems. We have multiple POSIX file systems mounted on our HPC system.
It's just easier to open a file, edit it, save it, or copy it. And now we're going into object storage, which is not a file system. We have containers; some people talk about folders versus containers, and it really doesn't matter, but in the object world you use these terminologies. The first thing people notice is: where's my subdirectory? It's not there. So there's confusion. And then what people tend to jump into is putting some sort of gateway or appliance in front of it to make it work like a traditional file system. But we wanted to explore ways of not doing that, of just using pure Swift, because maybe it actually does meet our requirements. So let's talk about that.

In Swift, we can simulate subdirectories by putting a forward slash into the object name. Many tools, most tools actually, recognize that and can display these as folders. So you put a slash in, and you can see it here: you have a container, and inside it a pseudo-folder. The pseudo-folder is just part of an object name with a slash in it, and it gets interpreted as a subdirectory.

Of course, when you see this, is everybody excited? Am I going to go to my end user and say: Jerry, this is what you need to do, it's really easy, you just need to type the segment size, the container name, blah, blah, blah, and then you have it uploaded? Jerry turns around and looks at you as if you've been abducted by aliens. No, he's not going to want to use it.

But it's not really a big problem; you can just wrap it. We do this all the time as HPC people. There are complex bioinformatics tools with 15 different command-line options, and we write wrappers to simplify workflows for the end user. This is the same thing. So we wrote a tool called Swift Commander. Instead of all the mumbo jumbo Jerry had to type before, he just types swc upload, his local folder, his Swift folder, and he's done. Then we added another goodie, swc compare, which compares the size of the source with the size of the destination. It's not an MD5 check of the object, but it gives you a good indication of whether a transfer actually succeeded, because some of these uploads take hours and hours and you may not have watched them finish.

These subcommands we actually stole from Google. Google has this tool gsutil, which implements all the traditional Unix subcommands, and when you show that to an end user, they say: I know how to do this. You just add this other little thing in front, and you're back in business. Under the hood we use the Python Swift client. The Python Swift client is multi-threaded, it has enormous performance, and it is very stable now. We also use curl for some latency-sensitive things on the back end. So there's no real risk in using this tool for us. It's a homegrown thing, but underneath it is all the power of the Swift client.
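Here is a small sketch of the pseudo-directory behavior using the standard python-swiftclient CLI; the container name and paths are invented for illustration:

```bash
# Upload a file under a name that contains slashes. Swift stores one flat
# object name, but tools render the slashes as nested folders.
swift upload genomics lab1/run42/sample.fastq.gz

# List only the "top level" of the container by using / as a delimiter;
# lab1/ shows up as a pseudo-folder.
swift list genomics --delimiter /

# Drill into the pseudo-folder with a prefix.
swift list genomics --prefix lab1/run42/ --delimiter /
```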
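And to make the Jerry comparison concrete, here is the raw segmented upload next to the wrapped version. The paths are hypothetical, and the swc syntax is as described in this talk, so treat it as a sketch rather than gospel:

```bash
# What Jerry would have to remember: split large files into 1 GiB segments
# so they upload as parallel streams, and only send changed files.
swift upload --segment-size 1073741824 --changed archive \
    project7/bigdata/sample.bam

# The Swift Commander wrapper: same effect, gsutil-style syntax.
swc upload /fast/project7/bigdata /archive/project7/bigdata

# Sanity check after a long transfer: compare source and destination sizes.
swc compare /fast/project7/bigdata /archive/project7/bigdata
```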
People talk a lot about metadata. Eventually we may have tools that let us search very complex metadata, but we don't have them today, so we just wanted to see how we could get going. And it's actually quite easy: the Swift client also supports metadata. Again, it's not completely user-friendly, but how about you just add some key-value pairs as command-line options three, four, and five? They're stored as metadata in the Swift cluster, and you can retrieve them later. And once more mature tools exist, they'll be able to use that same metadata. It's very simple. People see that and say, oh yeah, that's really simple, and it's useful for me, because I can't easily find my stuff anymore. So this is a way to handle that.

In high-performance computing, again, we don't have a file system. You can't just open the file, and that confuses people at first. But in HPC, people write 50-line, super-complex batch submission scripts with all sorts of things in them. So I say: you're writing this very complex script, but you can't add one more thing to it to fetch your file? And then you show them: just check whether the file exists on your scratch file system, and if it doesn't, download it with Swift Commander, and you're back in business. You're adding three lines to a really complex batch submission script, and you're in. They're using that today, and there's really not an issue with it.
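To make the metadata part concrete: the container, object, and key names below are made up, but this is roughly what tagging and reading back key-value pairs looks like with the standard Swift CLI:

```bash
# Attach a few key:value pairs to an existing object.
swift post archive project7/sample.bam \
    -m "pi:jdoe" \
    -m "project:glioma-2015" \
    -m "instrument:hiseq2500"

# Read them back later; they appear as Meta headers in the output.
swift stat archive project7/sample.bam
```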
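And the three-line staging pattern for batch scripts might look like this. The scheduler header, paths, and exact swc invocation are illustrative rather than copied from a real Fred Hutch script:

```bash
#!/bin/bash
#SBATCH --job-name=realign     # illustrative Slurm header; the pattern
#SBATCH --cpus-per-task=4      # works with any batch scheduler

SAMPLE=/scratch/jdoe/sample.bam    # hypothetical scratch path

# The three added lines: stage the input from Swift only if it is not
# already sitting on the scratch file system.
if [ ! -f "$SAMPLE" ]; then
    swc download /archive/project7/sample.bam "$SAMPLE"
fi

# ...the rest of the original 50-line pipeline runs unchanged...
```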
Talking about performance with Swift, here's one use case. Let's say we want to regenerate BAM files out of these archives. They're genomic archives that contain basically your entire genome, up to 150 gigabytes each, and you don't need them very often. But when you do need them, you don't really want to wait four days for them to be restored from tape, because research is often ad hoc and decisions are made over the weekend. So here you have the ability to copy lots of files with very high performance. In this case, you can see we have a parallelized batch submission that we use to download things onto the cluster. You can submit 30 of these jobs, and we get 1.4 gigabytes per second of throughput. Again, that's limited by my own allocation: I manage this cluster, but I didn't give myself unlimited resources, so we don't drown out everyone else. What you see here is our scratch file system, which is BeeGFS, a high-performance file system, and there you can see that the throughput is real.

How are we doing on time? We have plenty of time. Good. I'm probably running a little early, so we're good. That's great. You tell me when you're bored, and I'll accelerate a little.

A frequent use case: you can buy the most expensive scale-out NAS system, you can spend millions of dollars, but when it comes to copying a bunch of small files from A to B, none of them are good. There's no good solution. Use tools like rsync or cp, anything against a mounted file system, and NFS is not going to be very fast. Part of the reason is that most tools are not parallelized; rsync is not really a parallel tool. One solution we found productive is to just tar it up. But then the problem is, if you tar something up and there are four or ten terabytes under that directory structure, that big hunking tarball becomes very, very inefficient to use.

So we did this at the directory level instead, basically one tarball per directory. You can then archive it very easily and also restore it very easily. In this case, you can restore just folder two and folder three without having to restore folder one, and you save time that way. Those files are then typically a couple of hundred gigs each, which is easy to handle. We were able to get this going at about 400 megabytes per second. The compression we use is pigz, a multi-threaded, gzip-compatible compressor, so you can always get the files back with standard gzip and don't need any special tools. And it's something like 111 seconds using gzip versus five seconds using pigz for this data. It has enormous performance. We've shared this on GitHub; feel free to use it. It works quite well.
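A minimal sketch of that per-directory loop, with hypothetical paths and container names, assuming pigz and the standard Swift CLI are installed:

```bash
# One tarball per top-level directory. pigz is a parallel, gzip-compatible
# compressor, so the archives can always be unpacked with plain tar/gzip.
cd /data/project7
for d in */; do
    name=${d%/}
    tar -cf - "$d" | pigz > "/tmp/${name}.tar.gz"

    # Segmented upload, so even multi-hundred-GB archives move as
    # parallel streams.
    swift upload --segment-size 2147483648 \
        --object-name "project7/${name}.tar.gz" \
        archive "/tmp/${name}.tar.gz"

    rm "/tmp/${name}.tar.gz"
done
```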
Now, we keep talking command line, command line, command line, but not everybody works with a command line. In our special case, we often have huge directory structures with lots of big data, plus a lot of small, machine-generated files. And then occasionally you find, in one of those folders, the Excel spreadsheet where somebody kept track of some metadata that wasn't machine generated. They keep their log in that Excel spreadsheet, and they want it in that folder. They don't want it on another drive or on their desktop; they want it with that project. So they put it in there, and sometimes they need to access it. For that, we really need Windows- and Mac-based clients with GUIs that you just click, it opens up, you edit your stuff, and you save it back. It's probably about 0.01% of the data that is used that way, but if you can't have it easily, people are not going to adopt the system.

So here we have Cyberduck. Cyberduck is very functional; it has lots of features. You can actually edit metadata with Cyberduck that you uploaded with Swift Commander; you can change it in Cyberduck. It's an extremely versatile tool. But it is not yet the Windows drive letter that some people in your group might want, or the Mac Finder integration. For that, I've personally been working actively with Storage Made Easy and also with ExpanDrive. They have tools that can mount Swift as a Windows drive, so you can basically let people not even know they're using Swift. You just say: here's your drive O:, and you can store your data there.

What credentials do they use to do that mapping? They can do both. We have a shared Swift account, and that account is either used directly, if people don't have any special requirements, or they use Active Directory. There's a hash involved; it's not yet integrated with Kerberos on Windows, but that's on the roadmap. You still have to enter your password; it's your AD password, stored encrypted in the registry.

These work pretty well. They're not high performance, but again, the use case is browsing your directory structure. Directories are typically frowned upon by object storage people, but they are very important metadata, because where a file sits tells you what it is. So you need to be able to browse through and get to your data, and it's really fast. Again, we keep the Swift metadata on SSDs, so browsing through these folders feels like a Windows share; it's not a big difference.

Where these mounted drives fall a little short is when you copy massive amounts of data from one drive to the other. They don't work very well for that, because they do local caching and they're built more for cloud environments where the object store is far away, whereas we are very, very local, so the performance is limited. But they're functional, they work well, and they're also an entry point to some grander things. Take the Storage Made Easy tool, for example. Storage Made Easy offers a full cloud file-sharing solution, but that's a project: if you want to deploy something like that, it involves servers, it involves security, it's an extra step. It offers a lot more functionality than the drive, but the drive is something you can just hand to your desktop support team. In the last four or five years, I think in most enterprises automated software deployment has taken over; there are very few sneakernet sysadmins left who install all these things manually. So you can deploy these clients really easily and solve the problem at a local level, and when you have grander plans later, you can take the next step and replace it with something like the full Storage Made Easy framework. It's a good migration path.

A couple minutes? How many minutes, two more? Yeah, okay. All right, here's another example: rclone, a multi-threaded mass-copy, backup, and data-migration tool. We use this on Windows. It's an application written in Go, very fancy, and it works very well. You can put it on your Mac, you can put it on Windows, and we use it to back things up from Windows desktops, or from a Windows server, sorry.

And this one is more for the life-science people: Galaxy is the most-used web application for data-intensive biology, and together with SwiftStack, the Galaxy folks, and our local research team at Fred Hutch, we drove an integration to make that work. So Galaxy can use huge amounts of data, and it can use Swift today.

And that's it. I mean, the big takeaway is that for the discovery people are doing in bioinformatics and the research space, the problem is not necessarily generating the primary data that's used for research. Some of the big issues are how storage and HPC environments can interact with that data, and that's ultimately what's unlocking some of these discoveries. I think that's the takeaway, and thanks, Dirk, for sharing. We do have a few copies of the book, which are, oops, falling down there, but if you want to get a copy and hear more about some of these tools and stories around HPC, please feel free to come up and grab one. And thanks, Dirk, for sharing the story with us.

Questions? Yeah, does anyone have questions? Three copies?

So the question was: why do you need something like rclone if the system is already keeping replicas? It's basically that we're using SwiftStack as a backup target. There's a Windows server that does something obscure that I don't want to explain; it's running something local, and we use SwiftStack as the backup target for it. We're using rclone for that, and there are a ton of other tools; rclone is just one example of a command-line tool that I think people here will like. It's really functional, it's written in Go, and it's really fast.
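As an illustration of that backup pattern: the remote name, container, and paths below are hypothetical, but this is roughly what an rclone setup against Swift looks like:

```bash
# One-time setup: define a Swift remote. rclone config walks you through
# this interactively; "hutch" is a made-up remote name.
rclone config

# Then sync a local directory into a container. --transfers controls how
# many files move in parallel, which is where the speed comes from.
rclone sync --transfers 16 "D:\backups\obscure-app" hutch:winserver-backup
```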
rclone hasn't been in the field very long yet. If you take the Python Swift client, some people like it and others don't, but I can say the Python Swift client is very well developed. There are a lot of developers on it, the responsiveness is very high, and it's a really robust workhorse. rclone may be a little more elegant, being written in Go, but it's not there yet in terms of mind share. There's one developer working on it, who's really good, but it's not as well tested in the field yet.

Yeah, another question. So the question, just so people can hear, is: are there semantics for partial retrieval of a large object? Right, if you have a multi-terabyte object and you want to do a partial retrieval, because only some part of it is interesting, standard HTTP can do an offset and a length; but you mentioned middleware somewhere. Is it possible to do domain-aware partial retrieval?

So there is what we call a range request in Swift: you can ask for a byte offset. And there are a few strategies for how uploads happen. I think what you've done is a wrapper that takes a very large file, breaks it into multiple pieces, and pushes those into the cluster as multiple streams, which is what enables a lot of the throughput. But you can still do a byte-offset read of that large file.

I think the question is, if you store a very large video file, is that something that can be... oh, I've got you. So the question is: can you do something smarter in the middleware? For example, with some of the BAM or FASTQ files in the genome-sequencing space, there are going to be certain offsets that map to something logical. Could you have a bit of middleware that's smart about that, say, and Brandon's going to correct me here, chromosome 18, and pull that section out of the BAM or FASTQ? Yeah, that's middleware you could develop inside of Swift. And likewise for video transcoding, to complete the analogy: you have a large file and there's going to be a time offset. Is that something you can put into Swift, to do the offset there? Sure. Exactly. Yeah, exactly. That's a great idea.

All right, we're out of time. Dirk, thank you. Okay. Congratulations. Thank you.
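For reference, the plain byte-range request discussed in that last question is just standard HTTP against Swift. The token, URL, and object name here are hypothetical:

```bash
# Fetch only the first 64 MiB of a large object with a standard HTTP
# Range header; account, container, and object names are made up.
curl -o chunk.bin \
     -H "X-Auth-Token: $TOKEN" \
     -H "Range: bytes=0-67108863" \
     https://swift.example.org/v1/AUTH_lab/archive/project7/sample.bam
```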