I'm Joe Arnold, from a company called SwiftStack, one of the founders here. Today we have the opportunity to talk about some of the use cases where people are using Swift, and we're going to walk through three of them. First we're going to talk about Ancestry.com, and then we'll have the opportunity to talk to a couple of other folks: Mike Gasway from HP IT and Scotty Miller from DreamWorks. So it's a pretty cool opportunity. Unfortunately, Jordan from Ancestry couldn't make the summit, and he's asked me to present a few of his slides. I'm going to go through the use case and tell the story, and then we'll invite Scotty Miller from DreamWorks and Mike Gasway from HP up on stage and talk about what they're doing. So that's the agenda for the next 40 minutes.

All right, to get started, this is the story of how Ancestry moved away from a traditional scale-out NAS and started to use object storage. Jordan presented this to us, so I'm happy to be able to walk through it. Who's familiar with Ancestry.com? It's a pretty popular web service, right? I know somebody in my family uses it too, and they're always sending me photos and getting me to fill stuff out. What it does is let you put your genealogy online. Not only can you upload photos and records of you and your family, but you can also start to dig through historical records, and they analyze that data and connect the dots. And it's not just Ancestry.com; they also have a few products that other folks might not be aware of, covering archives, military records, and newspapers, alongside the genealogy offerings.

Data is a really big part of their business. They collect, categorize, and organize a tremendous amount of data, because it's not just the data that people upload; it's also the data that they purchase and collect. They have rooms of folks who take the archives they purchase and record them, carefully making sure the data is scanned and entered correctly. They have a DNA testing product where they mail you a tissue swab and sequence that data, and when you do that with several family members it all connects up. And they do a lot of other media things, TV shows; I don't know if you've seen some of those.

So, their storage workflow. As you can imagine, they have a lot of small-file workloads at Ancestry.com, files in the 100 KB to 2 MB range, like a high-resolution scan of a document from somebody immigrating into a country. There's also user-uploaded content that they store. The processed images and the user uploads are what you can think of as long-tail data that gets requested by users in the application. But they also have an archive where they keep the original content, the original high-resolution raw data of the actual file, and that's stored separately, offline.

What they had was a traditional scale-out NAS. When people were building applications, they would create a volume and mount it to the server that was serving up that data, and they'd do that over a CIFS, NFS, or SMB type of interface.
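Just to make the contrast with where they ended up concrete: with object storage the application talks HTTP to the store instead of writing to a mounted volume. Here is a minimal sketch of that access pattern using python-swiftclient; the endpoint, credentials, and container and object names are invented for illustration.

```python
# Minimal sketch: storing and serving a scanned record over Swift's HTTP API
# instead of writing it to an NFS/CIFS mount. The endpoint, credentials, and
# names below are hypothetical.
from swiftclient.client import Connection

conn = Connection(authurl='https://swift.example.com/auth/v1.0',
                  user='ancestry:imaging', key='secret')

# One-time: create a container to group the scanned documents.
conn.put_container('scanned-records')

# Ingest: upload a 100 KB to 2 MB high-resolution scan as an object.
with open('passenger-manifest-1892.jpg', 'rb') as f:
    conn.put_object('scanned-records', 'passenger-manifest-1892.jpg',
                    contents=f, content_type='image/jpeg')

# Serve: fetch the object back when the application needs it.
headers, body = conn.get_object('scanned-records', 'passenger-manifest-1892.jpg')
print(headers['content-type'], len(body), 'bytes')
```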
But there were some challenges with that. Cost was one big one, because they were buying from vendors on an appliance model, so when they wanted to expand the footprint they were forced to buy upgrades from that particular vendor. They also didn't have industry-standard APIs end to end: they were serving out HTTP, but talking CIFS or NFS to their storage. Add in some scaling problems, renewals, licensing, and just the management overhead of having lots of volumes, and they knew they needed a change.

The way they executed that change, and we'll hear some of this from DreamWorks later on, was to build a bit of middleware, a service layer between the storage system and the application tier. What that meant was they could change out the underlying storage system, put new things in, and do a migration over a period of time. That was the strategy they took in order to get to object storage.

Some of the requirements they had for the next-generation storage: they wanted open source software; they really wanted a REST API; they wanted to minimize vendor lock-in so they could switch vendors if one came along with a better hardware price point; and they wanted commodity hardware, which we'll hear more about later on too. Getting close to the price of a drive was a goal they had, so when new, denser drives became available on the market, they could buy those and put them directly into the system. They wanted scalability, so they could keep adding to and growing the system. They wanted object storage across multiple data centers, so that, say, an archive tier could be spread across data centers and kept low-cost. They also wanted some of the performance characteristics of a traditional NAS, so they needed to maintain that performance level, and they still wanted things like snapshots, versioning, and some of the disaster recovery characteristics. So that was the list they were working from.

So they took a look at Swift. The thing with Swift is that they were going for high availability, distributed across multiple data centers and across multiple nodes, rather than a RAID model. And they were okay with an eventually consistent model, which means that when they put data into the system they're putting in new objects, and when they go retrieve an object they're not trying to, say, distribute a lock around it. Now, an eventually consistent storage system is not necessarily good for running databases or operating systems, but it's perfect for storing these documents and images and serving them out over the internet. It just matched the use case really well. They chose to store with three replicas. And one thing they noted: they were very interested in OpenStack, but they wanted to start with storage, and they could implement Swift without bringing up a whole OpenStack environment. That allowed them to tackle just the storage problem first, before they moved on to the rest of their compute projects.
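To give a feel for the shape of that service layer, here is a rough sketch, not Ancestry's actual middleware; all of the class and configuration names are invented. The idea is that the application talks to one small storage interface, and the backend behind it can be the old NAS, Swift, or both side by side during a migration.

```python
# Illustrative only: a tiny storage service layer that lets application code
# stay the same while the backend moves from a mounted NAS to Swift.
# Names are hypothetical; this is not Ancestry's actual middleware.
import os
from swiftclient.client import Connection

class NasBackend:
    def __init__(self, mount_root):
        self.root = mount_root

    def put(self, key, data):
        path = os.path.join(self.root, key)
        if os.path.dirname(path):
            os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, 'wb') as f:
            f.write(data)

    def get(self, key):
        with open(os.path.join(self.root, key), 'rb') as f:
            return f.read()

class SwiftBackend:
    def __init__(self, authurl, user, secret, container):
        self.conn = Connection(authurl=authurl, user=user, key=secret)
        self.container = container

    def put(self, key, data):
        self.conn.put_object(self.container, key, contents=data)

    def get(self, key):
        _headers, body = self.conn.get_object(self.container, key)
        return body

# The application only ever sees "storage"; swapping the backend is a
# configuration change, which is what makes a gradual migration possible.
storage = SwiftBackend('https://swift.example.com/auth/v1.0',
                       'ancestry:app', 'secret', 'images')
```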
So, a few components. We're going to walk through a few of the components in Swift and map them to some of the hardware they decided to purchase, so we can get a flavor for what that hardware does. The roles in a Swift cluster: first there's the proxy server, on the front end. It receives the incoming storage requests and routes them to the appropriate object storage nodes. Then there are the account and container servers, which keep track of how objects are grouped together, not on a per-request basis, but for when you list objects and want to organize them. Accounts point to containers, and containers point to objects, and that record keeping isn't used to service requests; it's there so you can do things like list what objects are in a container, or which containers an account has. So it's a grouping function. And then there are the object servers themselves, which actually store the data.

Then there's a data structure in Swift called the ring. What it does, and we heard Chris talk about this earlier in one of his sessions, is take a URL and map it down to physical locations on the object servers. There's a ring each for accounts, containers, and objects, and that's what gives Swift its ability to add and grow capacity: you make modifications to the ring, you distribute it out to the cluster, you add additional hardware, you make more modifications to the ring, and you distribute it out again. That's the strategy for scaling out the environment. There are also multiple regions, which they wanted to take advantage of. That gave them the ability to deploy a single storage cluster across two data centers, so if one of those data centers were to go entirely offline, they would still have one up and operational; you'll see why this is important later on in some of the benchmarking. In their use case, they wanted to make sure they could survive a full data center outage and still have an operating website.

So they reached out to us and selected our product, which is built around OpenStack Swift. We've been working really hard in the Swift community; we have the project technical lead, and SwiftStack does a lot of development in Swift. It's a broad ecosystem with a lot of participants, and what we've done is build a product to make it easy to deploy, operate, and scale a Swift cluster. This is a screenshot of that product, and really what it allowed them to do was use a tool to handle a lot of the management of the Swift environment, instead of having to ramp up and become Swift experts themselves.

So, the hardware. The nodes they picked came from Redapt, a partner we work with, and they're fairly dense storage nodes: a storage head attached to 84 SAS hard drives. That's a bit more power than what we typically see for an archival use case, and if you think about this use case, what they're doing isn't just archiving data; they're serving data, long-tail assets, out to users. So there's a lot of performance in these systems. For the account and container servers, which hold some of that metadata, there's an SSD tier to cache that data, so they can ingest all of the archives and uploads they're receiving from end users while keeping track of them with good performance. And then the proxy nodes are just 1U servers with 10 gigabit in and out of the rack. So that's the hardware they set up, and this is the rack that's serving out a lot of the data.
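Since the ring comes up a lot, here is a deliberately simplified illustration of the idea: hash the object's path, use the top bits of the hash to pick a partition, and map that partition to a fixed set of devices, one per replica. Swift's real ring builder also handles device weights, zones, regions, and rebalancing, so treat this as a sketch of the concept rather than of the implementation; every value in it is made up.

```python
# Simplified illustration of the ring concept: a URL path hashes to a
# partition, and each partition maps to a fixed list of devices, one per
# replica. The real ring also handles weights, zones/regions, and
# rebalancing when hardware is added or removed.
import hashlib

PART_POWER = 8      # 2**8 = 256 partitions; real clusters use far more
REPLICAS = 3
DEVICES = ['r1-node1-d1', 'r1-node2-d4', 'r1-node3-d2',
           'r2-node1-d7', 'r2-node2-d3', 'r2-node3-d5']

# Precomputed partition -> devices table; simple round-robin for this sketch.
PART_TO_DEVS = {p: [DEVICES[(p + r) % len(DEVICES)] for r in range(REPLICAS)]
                for p in range(2 ** PART_POWER)}

def devices_for(account, container, obj):
    digest = hashlib.md5(f'/{account}/{container}/{obj}'.encode()).hexdigest()
    partition = int(digest, 16) >> (128 - PART_POWER)   # top bits pick the partition
    return partition, PART_TO_DEVS[partition]

print(devices_for('AUTH_ancestry', 'scanned-records', 'manifest-1892.jpg'))
```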
So when they went to benchmark it, the nutshell is that it came in at three times what they needed to sustain for their production workload. They ran benchmarks, and we have very good testing tools, so we could profile the workload they needed in order to service their application, which was a mix of reads and writes. They did a couple of tests, one with a 90/10 read-to-write mix and one with 95/5, and it met the expectations they needed, three times over in fact. So we're really happy with the performance of the Swift environment. What this really means is that they could take standard off-the-shelf equipment, put software on it, and exceed the performance they were getting out of their traditional NAS for this use case, which is storing and serving long-tail image assets for Ancestry.com. It's pretty cool, and we're really happy with these results.

Next steps with them: one is to handle backup. We've done a Commvault integration where Commvault Simpana backups can land in Swift, so that's a project they want to take on. Then there are Swift storage policies, the ability to carve out different pools of infrastructure for different use cases. They can do things like reduced redundancy, or, for the archive tier where they want to be extra careful, a more sophisticated scheme such as erasure coding. And there are some use cases where they want even more performance on very popular content, which can be put on an SSD tier. Then, when they take on their OpenStack project, the Swift cluster will be the backing storage for Glance images and some of the Docker images they're going to store in there. And that is it; that's the use case Ancestry.com has with Swift. Thanks for listening, and I'd like to invite Mike Gasway from HP and Scotty Miller from DreamWorks up here, if you don't mind. Welcome, both of you.

All right, so this session is all about use cases, and Scotty, I'll start with you. Introduce yourself, explain your role at DreamWorks, and tell us what the need for object storage is within DreamWorks.

Sure, my name's Scotty Miller, I'm a technologist in the Technology Operations Group. I focus primarily on our HPC environment, storage, asset preservation, editorial, and post-production. Our need is similar to Ancestry's. A typical animated movie for us: a 90-minute movie is about 130,000 individual frames at 48 hertz, and it takes about 500 million files to create that final movie product. We complete two movies a year, so that's about a billion new files a year that we want to preserve forever, to potentially monetize, use for sequels, use for TV programs, use for consumer products. We've been doing all this work on traditional scale-out NAS products, and a couple of factors have come into play. The cost of traditional scale-out NAS is much higher than the object store, as Joe mentioned. For us, though, we're also faced with the desire to collaborate globally.

So we're using object stores in two places. One is global collaboration; we've actually been doing that for almost three years now. We started with HP's public cloud, using Swift and various gateway products to share data sets between our US and India operations. That gave us experience with Swift as an API and got us comfortable with the notion of eventual consistency; you have to educate your developers.
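To make that point about developer education concrete, here is a minimal sketch of the kind of defensive read you end up writing once you accept eventual consistency: an object that was just written, or written in another region, may not be visible yet, so you retry with a backoff instead of assuming read-after-write semantics. The endpoint, credentials, object names, and timings are invented for illustration.

```python
# Illustrative pattern for coping with eventual consistency: a read of a
# just-written (or remotely written) object may briefly 404, so retry with
# a backoff instead of assuming read-after-write semantics.
import time
from swiftclient.client import Connection
from swiftclient.exceptions import ClientException

def get_when_visible(conn, container, obj, attempts=6, delay=2.0):
    for attempt in range(attempts):
        try:
            return conn.get_object(container, obj)    # (headers, body)
        except ClientException as err:
            if err.http_status != 404:
                raise                                  # a real error; don't mask it
            time.sleep(delay * (attempt + 1))          # not visible yet, back off
    raise RuntimeError('%s/%s still not visible after %d tries'
                       % (container, obj, attempts))

conn = Connection(authurl='https://swift.example.com/auth/v1.0',
                  user='studio:renders', key='secret')
headers, body = get_when_visible(conn, 'frames', 'shot_042/frame_0001.exr')
```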
One of the things you have to think about in your use cases is that people who are used to traditional file service, atomic operations and locking, are going to need a little bit of an education. One of the things that's interesting in a public cloud implementation of Swift is that the eventual consistency window can be minutes long, so we had to learn how to deal with that. So that use case is collaboration. The other use case for us is asset preservation. Very much like Ancestry, most of our data set is cold, long-tail, write once, maybe read once, but it has value. If you can preserve that data and reuse it in the future and make revenue, then it's an asset. If it costs you more to keep it, or you can't find it and you can't monetize it, then it's a liability, and it doesn't make sense to pay to keep liabilities around.

Our other use case is a little non-traditional for Swift. The intent is to use it as a replacement for tier-one performance scale-out NAS in our computational rendering environment. The idea there is to replace our asset management system, which has been database-backed and file-system-based, with one that's database-backed but Swift-based. We built a middleware piece, similar to what Ancestry did, that all of our applications can talk to. It's an asset management middleware, not necessarily a storage middleware, but it provided a place to do storage location transparency and storage protocol transparency, and to provide an access gateway where an application can just say, 'I need the Shrek model,' and whether it's in Swift, or it's in Swift and originated in Bangalore, doesn't matter. So long-term asset preservation is the first use case, cheap and durable, and then what I like to call 'tier one and a half' storage is the other, where we're trying to replace primary NAS with Swift.

Thanks. Mike, what were the challenges you were facing inside of HP to serve internal assets, and what did you come across as solutions using Swift?

Our primary use case for Swift is similar to Ancestry's. I work primarily on the storage team, and we were looking for a cloud-based storage solution for a specific use case: Sync and Share for all of our end users within HP. We didn't need the full OpenStack architecture at that time, so we just brought up Swift, and what we were looking for was something that would be easy to manage; we're short on people, you know the story. So that's when we turned to SwiftStack for that particular need: pushing out Swift, and then eventually developing the app for the Sync and Share service that is now pushed out to all of HP.

Yeah, so share a little bit about dealing with that; there are a lot of employees at HP. What did you pick in order to be able to serve and support all of those users, for example on the authentication side?

Oh, we use LDAP for authentication, and one of the components we needed from Swift was the ability to use LDAP authentication. In that process, we worked with SwiftStack to get that development into the SwiftStack product so that we could tie it into our LDAP infrastructure.
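The LDAP integration itself is part of the SwiftStack product, so its details aren't shown here; just to illustrate the basic idea of validating a user against a corporate directory during authentication, here is a minimal sketch using python-ldap. The server URI, base DN, and naming convention are hypothetical.

```python
# Minimal illustration of LDAP-backed credential checking: bind to the
# directory as the user and treat a successful bind as valid credentials.
# Server, base DN, and naming convention are hypothetical; the actual
# Swift/LDAP integration described above is part of the vendor product.
import ldap   # provided by the python-ldap package

LDAP_URI = 'ldaps://ldap.corp.example.com'
BASE_DN = 'ou=people,dc=corp,dc=example,dc=com'

def ldap_authenticate(username, password):
    conn = ldap.initialize(LDAP_URI)
    try:
        conn.simple_bind_s('uid=%s,%s' % (username, BASE_DN), password)
        return True                     # the directory accepted the credentials
    except ldap.INVALID_CREDENTIALS:
        return False
    finally:
        conn.unbind_s()
```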
And would you share a little bit about the product that you selected for file Sync and Share? I think that's another cool use of this.

Yeah, that product is actually from Gladinet, and we rebranded it; it's called HP Data Drive. It has an agent that runs on everybody's desktop and will sync and share to their mobile devices, laptops, Macs, it doesn't matter what they use, and it stores the data in their quota, a 25-gigabyte quota, on the object store.

Cool, cool. So Scott, can you walk us a little bit through that animation use case? What are the steps involved? You have the models that are...

You're stealing my thunder from my Thursday talk.

Your Thursday talk, well, give a hint, give a hint.

Sure, it is an interesting process, and most people don't realize what goes into it. But before I go there, I wanted to make another plug for SwiftStack middleware. One of our requirements for our asset management system was the ability to have object immutability, which is something that's not in native Swift. We had a lot of talks about what immutability means in an eventually consistent environment where you have multiple writers around the globe, and we ended up with SwiftStack working on an engineering spec to build a delegated authorization middleware piece: for certain objects, in certain marked containers, with certain headers on the PUT command, it will call out to an authorization server that we provide, which decides whether that object can be written, deleted, or whatever. Huge. Getting that took about three weeks of engineering work, and we had shipping, delivered software. That would have been two years with a traditional NAS vendor. That's the joy of open source: you can talk to people, they agree it's a good idea, and you get it done. A big win for us.

The animation process is a three-to-five-year process per film, from 'hey, I have a good idea, let's make a movie about an ogre' to release date. The first two years or so are primarily spent developing models, developing the look of the characters, and figuring out what the story and the environment look like. Very small compute demand; very small but very critical storage demand. At that time we're creating storyboards, which are hand-drawn rough drafts of the movie, if you will, that people want to play back with synced audio. We iterate on that for quite a long time. There's a famous quote attributed to da Vinci that art is never complete, only abandoned; George Lucas says movies are never finished, only released. So it's an artistic endeavor, much like software development: there's always one more fix, or one more tweak, or one more thing you want to change. There's a name for it, but I can't say it in polite company.

So we iterate up until about the last 12 to 16 months of the movie, and then we turn the knob up to high resolution: 2K, full color fidelity, left-eye, right-eye stereoscopic, and that's when the compute kicks in. A typical movie consumes 80 to 100 million CPU hours of compute. It generates, as I mentioned, 250,000 final frames of movie, plus the 500 million or so intermediate files that are both the source code and the work byproducts of making the film. Add to that the fact that you have to dub into 45 different languages and redo any text in the film in those native languages, and you get quite a few combinatorics. So all this stuff is a write-once, valuable data set, which is why the object store makes sense for that middleware asset management, for the final frames as they're delivered, and for the preservation footprint we keep forever.

Cool.
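Going back to that delegated authorization idea for a moment: the production feature was SwiftStack engineering work, so what follows is only a rough sketch of the general shape Scott describes, a piece of Swift WSGI middleware that, for writes and deletes carrying a particular marker, calls out to an external service before letting the request through. The header name, callout endpoint, and policy logic are all invented, and the real feature also keys off marked containers, which this sketch omits.

```python
# Rough sketch of a delegated-authorization middleware: for PUT/POST/DELETE
# requests that carry a (hypothetical) immutability marker header, call out
# to an external authorization service and reject the request unless it
# approves. Header name, endpoint, and policy are invented for illustration.
import requests
from swift.common.swob import HTTPForbidden

AUTHZ_URL = 'https://authz.example.com/check'   # hypothetical callout endpoint

class DelegatedAuthz(object):
    def __init__(self, app, conf):
        self.app = app

    def __call__(self, env, start_response):
        if env['REQUEST_METHOD'] in ('PUT', 'POST', 'DELETE') and \
                env.get('HTTP_X_OBJECT_META_IMMUTABLE'):
            resp = requests.post(AUTHZ_URL, timeout=5, json={
                'method': env['REQUEST_METHOD'],
                'path': env['PATH_INFO'],
            })
            if resp.status_code != 200:
                return HTTPForbidden(body='write not authorized')(env, start_response)
        return self.app(env, start_response)

def filter_factory(global_conf, **local_conf):
    def factory(app):
        return DelegatedAuthz(app, local_conf)
    return factory
```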
You know, I also wanted to make a comment about working with the open source community, because there was a need we had as well. Since the time we pushed out Swift, we've also pushed out HP Helion and integrated that with our Swift cluster, and we had a need for Keystone V3 support. We came to your company, and within a week we had that support, that fix, back. It was amazing.

V3 for Keystone?

Yeah.

Good. And so what's the use case where you're integrating Helion and the Swift environment? How are you seeing that integration work?

Well, that integration allowed us to do full LDAP support through Keystone into Swift, allowing complete access through Helion into Swift and back again.

All right, so what's the future? What are some of the next projects you're planning to take on within HP IT? You have the storage system and you're not just using it for one thing: you're using it for file sync and share, and it's backing your OpenStack Helion environment. What's next on the list?

So of course we store all of our Glance images in there, and then we'll also be using it to back up tenant images when they want to use that function. So that'll probably be the biggest next use case: backup, image backup.

Can I get an idea, not necessarily the scale if you don't want to share that, of what type of hardware you've chosen to run the system on?

Sure. We run, obviously, HP hardware. It's HP DL380s on the proxy front end with SSD, and then we run DL380s with a JBOD, very similar to Ancestry's. We run a two-node head with the JBOD split between the two nodes on the back end, so each node basically has 45 disks assigned to it.

What kind of time does it take to operate the cluster on an ongoing basis?

On an ongoing basis it's extremely stable for us, and I wish there were some wood I could knock on, because you know how that works, but it's extremely stable. Ongoing, we're probably at about half an FTE.

And what kind of hardware are you guys selecting?

Similar. The proxy nodes and the account and container nodes are DL360s and DL380s; we're also an HP hardware shop. Our storage nodes are SL4540s, an HP product that comes with one, two, or three controllers and 60, 25, or 15 drives per controller. We're using the 60-drive, one-controller version. We have eight of them in California, in two different fire zones, to make two regions, and then a set of them in Bangalore, India, to be the third region.

Right on, right on.

And we're spread site to site as well. We actually have three different storage policies that we use: if it's a single-site application, it can fall into one of the two sites, and then we have a site-to-site policy, which is a four-replica policy.

That's awesome. Any questions from anybody in the audience who would like to ask these folks anything? Really? Don't feel bad; I've been asking questions. If you want to learn stuff, you've got to ask questions. Well, I appreciate everyone coming and listening to some of these use cases; I hope you found them informative. Let's thank Mike and Scott for sharing with us. Thanks so much.