All right, good evening. My name is Joe Arnold, I'm the CEO of SwiftStack, and I'm here with Ali Fenn from Seagate. We're going to talk about OpenStack Swift, the object storage system, and Seagate Kinetic. I have to say I'm really excited about this technology, because even a few years ago, when you said "object storage," people would ask: what is that? Don't you need file systems everywhere? And when we persisted data, even in an object storage system, at the end of the day we were ultimately laying it down on a file system on a disk. So I was really excited when Seagate announced this project, which brings object storage all the way down to the drives themselves. It really is a big deal, at least in our world, and it's going to enable a lot of things. So, a bit of preamble first. What object storage is trying to address is that the world of applications, and the kind of data people are storing, has grown tremendously. And it's not just that the data has grown; it's also how the data is being consumed and served. Object storage has emerged as a technology that makes it easy for people to build applications that connect to the storage system directly. Because object storage systems serve objects over web protocols like HTTP, the people building these applications can embed things like URLs directly into them, and the content is served straight out of the storage system. What that means is a rebuild of the storage stack: instead of trying to figure out how to create silos of storage pools, much larger storage infrastructures can be built using object storage.
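The "serve objects at a URL over HTTP" idea can be sketched with a toy in-process server. This is an invented minimal example, not Swift's proxy; the `/v1/...` path merely mimics the shape of an object-store URL, and a real system adds authentication, accounts, containers, and replication:

```python
# Toy illustration: an object exposed at a URL, fetched over plain HTTP.
import threading
import http.client
from http.server import BaseHTTPRequestHandler, HTTPServer

OBJECTS = {"/v1/demo/photos/cat.jpg": b"\xff\xd8 fake jpeg bytes"}

class ObjectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = OBJECTS.get(self.path)
        if body is None:
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        # Keep the demo quiet.
        pass

server = HTTPServer(("127.0.0.1", 0), ObjectHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# Any HTTP client -- a browser, curl, an application -- can now fetch
# the object directly by URL; no filesystem mount is involved.
conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
conn.request("GET", "/v1/demo/photos/cat.jpg")
resp = conn.getresponse()
data = resp.read()
server.shutdown()
```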
Yeah, and I would just add that the macro situation has given us at Seagate the opportunity to fundamentally rethink, from the device perspective up, what the storage architecture would look like if we were designing for a world where object-type data is what's exploding, and where the use cases are very different than they were in the past. On that explosion of data: these are IDC numbers, and there are lots of other numbers, but Seagate's market research would tell you that by 2020 there will be on the order of six and a half or seven zettabytes of unstructured data stored in the cloud. A majority of all data being stored, up from a fraction now, is shifting to the cloud, and it's object-type data. As one little anecdote to give you a sense of how much that is: I don't have my prop with me, but if you took standard four-terabyte, three-and-a-half-inch hard drives and stacked them next to each other, not the long way, the short way, basically an inch apiece, they would reach around the world. That's how much data it is. So it's a lot of data that we all, collectively as an industry, need to figure out how to store, and we're going to have to do it at much different economics. Thankfully the landscape has changed to let us do that. It's much more open. It's not dominated by some of the more traditional types of systems we've had in the past, which were inflexible in terms of our ability, and others' ability, to innovate. It's very much open, very much software defined. And it's against that landscape that we've said: now is exactly the right time to take out a clean sheet of paper and see what this would look like if we designed it today rather than in the '60s. That's what we're trying to do with Kinetic.
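The drives-around-the-world anecdote checks out on the back of an envelope. The circumference figure and the one-drive-per-inch packing are my assumptions, not numbers from the talk:

```python
# Back-of-the-envelope check of the "drives around the world" anecdote.
# Assumptions: Earth's circumference ~40,075 km; one 3.5-inch drive
# occupies about one inch when stacked the short way.
earth_circumference_km = 40_075
inches_around_earth = earth_circumference_km * 1_000 * 100 / 2.54  # km -> inches
total_tb = inches_around_earth * 4        # 4 TB per drive
total_zb = total_tb / 1e9                 # 1 ZB = 1e9 TB
print(round(total_zb, 1))                 # ~6.3 ZB
```

That lands right in the ballpark of the six-and-a-half-to-seven-zettabyte figure quoted above.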
And it's obviously one of the things Swift is doing as well. So what is Swift? It's an object storage system, and a project within OpenStack, and participation in the project mirrors everything else happening in the OpenStack community: there are lots of participants involved in making Swift a great product, and there are very large public cloud deployments running OpenStack Swift. Its attributes are what you'd expect of an object storage system: it scales to very large environments, it has geo-distribution capabilities, there are lots of contributors, and it's a growing ecosystem. We're out of copies, but you can go online and check out a book we recently published about OpenStack Swift to learn the nuts and bolts of how Swift works; here we'll cover just enough to show where Kinetic fits in. Briefly, there are a few tiers in a Swift system. What a user or client interfaces with is called a proxy server. The proxy server mediates the request coming in from a client and determines where that data lives. The way Swift works today, it uses replicas: when an incoming write request arrives, the proxy server streams that data to several locations, and when an incoming read request arrives, it picks one of them. When there are failures, it routes to the nearest available storage server. The storage server is then responsible for serving the data it's locally connected to. There are also processes running behind the scenes that check for bit rot, do replication checks, and make sure the data is consistent and replicated across the environment. And to enable Kinetic in Swift, we had to do a few things, right?
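The proxy's routing job can be sketched as a toy placement function. This is not Swift's actual ring code (which handles partitions, zones, and weights); the node names and the simple hash-and-step placement are invented for illustration:

```python
import hashlib

def replica_nodes(account, container, obj, nodes, replicas=3):
    """Toy placement: hash the object path to a starting position and
    take the next `replicas` distinct nodes from the node list."""
    digest = hashlib.md5(f"/{account}/{container}/{obj}".encode()).hexdigest()
    start = int(digest, 16) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replicas)]

nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]
targets = replica_nodes("AUTH_demo", "photos", "cat.jpg", nodes)
# A write streams the object to every node in `targets`;
# a read picks one healthy node from the same list.
```

The important property is determinism: any proxy, given the same object name and node list, computes the same locations without consulting a central catalog.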
The way it works right now is that when you're placing data, we take the whole hard drive, format it, and put an XFS file system on it by default, and that's where the files get written. When we set out to incorporate a new drive technology into these environments, not everyone wanted to replace everything all at once, and we had a different way of persisting data. So we thought: well, shoot, how do we let people be flexible in how they configure things? We did two things. The first was storage policies. Storage policies allow an operator to demarcate different hardware or different regions and set different replica policies within the cluster, so that users can put data into, I want to call them tiers, but really different operator-configured policies governing where the data is placed. The second thing was making how data is persisted pluggable. We extracted the guts of how Swift persists data onto a file system, which as I mentioned was XFS, and replaced it with something more modular. That gave us the hooks we needed to implement what we needed to deploy Kinetic. I feel like we should be doing a little dance as we trade off here. So what is this Kinetic thing? Again, I forgot my prop, but it is a drive that looks, from the outside, very much like a traditional drive. It is exactly the same form factor. We have re-pinned the connector so that instead of speaking SAS, it speaks Ethernet; we'll go into more detail on that shortly. So it adds Ethernet, and it adds a second thing which is really, really important: a key-value interface.
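The pluggable-persistence idea can be sketched as two interchangeable backends behind one tiny interface. This is a minimal sketch with invented class names; Swift's real abstraction (the DiskFile interface) carries much more, such as metadata and timestamps:

```python
import os
import tempfile

class FilesystemBackend:
    """Stands in for the default path: objects become files on XFS."""
    def __init__(self, root):
        self.root = root

    def put(self, name, data):
        with open(os.path.join(self.root, name), "wb") as f:
            f.write(data)

    def get(self, name):
        with open(os.path.join(self.root, name), "rb") as f:
            return f.read()

class KeyValueBackend:
    """Stands in for a key-value device: no filesystem, just keys.
    A real Kinetic backend would speak the Kinetic API over Ethernet."""
    def __init__(self):
        self.store = {}

    def put(self, name, data):
        self.store[name] = data

    def get(self, name):
        return self.store[name]

def persist(backend, name, data):
    # The object-server code calling this doesn't care which backend
    # it was handed -- that indirection is the hook that lets a new
    # drive technology slot in without touching the rest of Swift.
    backend.put(name, data)

fs = FilesystemBackend(tempfile.mkdtemp())
kv = KeyValueBackend()
for backend in (fs, kv):
    persist(backend, "obj-0001", b"hello")
```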
Only the combination of those two things enables the value proposition of Kinetic, which is many things, but first and foremost really significant total-cost-of-ownership gains, and we'll talk about why. It's enabled by the combination of the key-value interface and Ethernet, which together let you rip out very inflexible layers of both hardware and software in the architecture, and therefore deliver much lower cost at the data center level. There are power implications, capex implications, reliability and management implications; we'll talk about what all of those are, but it's important that we do those two things. As I said, it's a standard form factor. And it's a platform. We introduced this last October as the Seagate Kinetic Open Storage Platform, and "open" is really key. We are a drive company, and we're going to sell drives, but we want to enable lots of people to produce drives, and we also need to enable systems builders and the software stacks. So as part of this we have designed a Kinetic key-value API, and as of this week the API and the libraries are all fully open-sourced. The goal, again, is to participate in this landscape of much more rapid innovation, open and software defined. So: open-sourced APIs and libraries. The interface spec is open as well, because again, this is not intended to be a one-off Seagate thing; we want the industry to be able to converge on what we think is the right architecture going forward to enable these scale-out object storage use cases. That interface spec has been contributed to Open Compute and is available through the traditional standards channels as well. So I want to talk a little more about those layers and why this is a better approach. Is there a pointer on here? No.
This is a traditional stack diagram. At the top, where it says "application," I want to be clear that it means a storage application like Swift; this is not the top-level end-user application. In today's world, many of these applications are trying to do object storage: they want to put objects, get objects, delete objects, and so forth. They mostly don't, and in fact probably never, want to seek into a file and modify it. So things like POSIX file systems, volume managers, and drivers now serve one purpose: to take those objects, those keys and values, and go through a bunch of layers of machination to spit out blocks and sectors, and then do the reverse. We think that's a pretty dated paradigm. What we want to say is: if these things just want to deal in keys and values and objects, let the device be a key-value store, and let you talk directly, in your language of keys and values, over Ethernet to that device. The storage server is also in the way: it has pretty expensive parts in it, and it's there only because these devices need to sit attached to it. But now that we have Ethernet and a key-value interface, you don't need the file system. Joe and his team were thrilled to hear this, because they said: wait a minute, now I can spend my development energy on the parts of Swift that are really important, the areas where I can differentiate; I don't need to be mucking around in file systems. So there's a lot of good reason to get that out of the way. You get the storage server out of the way, and suddenly you can attach storage directly to Ethernet. You can scale out the densest possible racks of storage anywhere you want: across the hall, across the data center, wherever.
So suddenly you have much more economically interesting racks and racks of storage that can be distributed, disaggregated from compute, and scaled much more flexibly in the way you need, based on your use cases and your architecture. That's the goal. It's a prettier picture, but it also comes with very significant economic incentives, and we'll get into performance and other things as well. But I want to touch on the API and what it's designed to do. It's very, very simple: put data, get data, delete data. It does things like get-next and get-previous. It is versioned, because we're now in a world where multiple machines can be talking to the devices, all connected over the network. We need to make sure that if one server puts data and somebody else reads that data, nothing bad happens; the API includes the ability to make sure those puts and gets are solid in a world where multiple distributed systems will inevitably access the same data on the same device. In that same world you have multiple masters, so it's really important that security is designed into the API. For example, certain machines might be allowed to do only management; others might be allowed only to read and write data, or only to read. All of that is configured through a management server, and requests are then authenticated against whatever the authorization is. And of course you have integrity checking on all the data, end to end, from the client request all the way through and back out to the client. When we say "clusterable," a couple of important things: the device is a key-value store, but it's still a device. It is not a storage system. Swift is the storage system.
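The versioning behavior described above can be modeled as a compare-and-swap: a put carries the version the writer last saw, and the device rejects it if the on-device version has moved on. This is a toy model with invented class and method names, not the actual Kinetic library API:

```python
class VersionedStore:
    """Toy model of versioned puts on a key-value device."""
    def __init__(self):
        self._data = {}  # key -> (version, value)

    def get(self, key):
        return self._data.get(key, (None, None))

    def put(self, key, value, new_version, expected_version=None):
        current_version = self._data.get(key, (None, None))[0]
        if current_version != expected_version:
            # A stale writer (e.g. a server that missed another
            # server's update) is rejected instead of silently
            # clobbering newer data.
            raise ValueError("version mismatch")
        self._data[key] = (new_version, value)

store = VersionedStore()
store.put("k1", b"v1", new_version="1")                        # fresh key
store.put("k1", b"v2", new_version="2", expected_version="1")  # valid update
```

A writer that still believed the key was at version "1" would now have its put rejected, which is exactly the "nothing bad happens" guarantee the talk describes.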
So the device doesn't take on things like clustering, replication, or erasure coding; the system does that. All the device is doing now is the space management on the drive, because frankly we know how to do that best, and that becomes increasingly important when we get into things like shingled magnetic recording. At the same time, we need to do the minimum required at the drive level to be clusterable. What that basically means is that we maintain a cluster version, so that if a zombie comes back online somehow, thinking it's part of the cluster when it's out of date, that can be recognized and dealt with. The minimum we need to do to support what the system-level software needs from us is what we do. The other thing, as I said: the drives are not storage systems. They don't know anything about each other, and they don't know anything about the data on them, and that's really important, because that's how we ensure they're infinitely scalable. Another really interesting thing, by nature of adding Ethernet, is something we call third-party copy, or peer-to-peer operations, and that comes into play in two ways. First, the system can tell a device: I need three replicas of this; I'm going to push it to you, and I want you to copy it out to the others. Second, you can move data directly from one device to another: you can pick up a range of keys and move them over. That can happen for load balancing and other kinds of system rebalancing, and again it gets you out of that old pattern of going back up to the storage server, reading the data in, writing the data out, and paying the overhead associated with that.
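Third-party copy can be sketched as one drive pushing a key range straight to a peer, with the storage system only issuing the command. These are toy classes with invented names; real Kinetic drives would speak the Kinetic protocol over their own Ethernet ports:

```python
class Drive:
    """Toy drive: a key-value store reachable at an address."""
    def __init__(self, addr):
        self.addr = addr
        self.store = {}

    def put(self, key, value):
        self.store[key] = value

    def get_key_range(self, start, end):
        return sorted(k for k in self.store if start <= k <= end)

    def p2p_push(self, target, start, end):
        # The data moves drive-to-drive over Ethernet; it never
        # passes back up through a storage server.
        for k in self.get_key_range(start, end):
            target.put(k, self.store[k])

src, dst = Drive("10.0.0.11"), Drive("10.0.0.12")
for i in range(5):
    src.put(f"obj-{i:04d}", b"payload")

# The system-level software just says: push this range to your peer.
src.p2p_push(dst, "obj-0000", "obj-0002")
```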
So the ability to move data from one device to another is a very interesting thing enabled by Ethernet. And then management, of course. You will not see in these drives exactly what you see in SMART today on traditional drives, because things like sectors are no longer relevant; there's no notion of a sector in a Kinetic drive, you literally deal only in keys and values. So the attributes that are no longer relevant are not exposed, but of course we expose the full ability to do everything else you'd expect from SMART, and more, actually; you'll be able to see that as people start playing with the API, which again is available, and we'll show you the link later. I'll go quickly through this. As I mentioned, it's the same form factor. The interface has been re-pinned to speak Ethernet, and what that means is basically this: no new ports. There's no added cost of each drive suddenly having two one-gigabit Ethernet interfaces on it, which it does; it just plugs into a backplane, and the backplane is now Ethernet instead of SAS. In fact, one of these systems, made by a company called Rausch, is down at the OCP booth this week. It happens to hold 72 drives. They've taken the backplane, and where there used to be SAS expanders, there are now two Ethernet switches. In the traditional world you'd have those SAS expanders, which connect up to the storage server, and from there you might have two 10-gigabit Ethernet ports going up to the top of the rack and out to the core; so there's a hardware layer there, with a bunch of drives attached by cables.
Now, in the Kinetic world, what you have is two Ethernet switches, at roughly apples-to-apples cost based on our experience with the six or seven vendors already building systems, and those ports go straight to the top of the rack and out to the core. That's the implication on the system side. One of the other questions we commonly get is: wait a minute, these drives are going over Ethernet now; doesn't that mean way more network traffic? The truth is that in the traditional architecture you have a client machine somewhere, and the request comes across your data center fabric, into the top-of-rack switch, down to what we call a data node here, a storage server, and then into the SAS JBOD. In the Kinetic case you have exactly the same flow of traffic, just with that last intermediate hop eliminated. One of the things we have paid a great deal of attention to is making sure that nothing in this architecture imposes any new obligations on the network. We'll get into the TCO model in a minute, but there's no added cost and no added obligations from an architecture perspective either. So that's the Kinetic version, and now I'll flip it back to Joe.
Yeah, so when I first heard about this, I thought it was extremely weird, because the drives really do speak Ethernet: when you plug them in, they DHCP and get an IP address. When we got the first samples from Seagate and put them in the lab, we just had a little card we plugged them into; the drive was powered by some power supply we ripped out of something or other, and we had a cable running from a hard drive sitting on a piece of cardboard into the wall. That's how we did our very first testing. Now we have these nice chassis with the switching built in, which makes so much sense. But once you mentally get over the concept that it's not a SAS backplane, it's an Ethernet backplane, so many ideas start popping up about how easily these drives can talk to each other. Third-party copy is amazing: when we notice a replica, using the available APIs, we can say, hey, this key, we've got it, it's of reasonable quality, and kick off a replication to another part of the cluster using third-party copy. Because each drive has an IP address, you're just telling one drive to talk to another drive. It's pretty neat. So this is how things look with our traditional architecture, how we deploy at a customer site today. You have a top-of-rack switch, and then what we call proxy nodes, which, as I was explaining, route requests to where the data lives; we usually have a couple of them to provide availability. And it changes to something that looks like this. Go back one: it's the proxy node in orange, and the red is account, container, and object, so the data is in the red, and the routing, if you will, is in the yellow. Here it says PACO, which means proxy, account, container, object: we combined the processes, the logic that makes a storage system a storage system, into those 1U access nodes at the top, in yellow, and the Kinetic shelves then handle all the data storage. What that means is we can size precisely: match how much data ingest and serving needs to be done to how much compute capacity the storage system gets. So if we have an archive use case, where data is trickling in but not really being served all that much, we can run a very lightweight PACO tier with a very small footprint, just enough to provide availability, and let the Kinetic shelves go very, very deep. For use cases like a public software-as-a-service application, or gaming, or something with a more intense workload, we can ratchet up those 1U compute nodes in the cluster relative to a small amount of shelf. When we introduce erasure coding, coming in a few more releases, we'll be able to size that compute capacity even more precisely. Rather than guessing how much compute capacity needs to be attached to my storage, we can put exactly the right amount out there and scale it up exactly as needed. TCO estimates often don't even include this in the equation, but it allows you much more precision in how you scale the system out. That was the physical view; this is the logical view. Swift has a data structure called a ring, a distributed hash ring: when a request comes in, it maps down to a location and routes the
request to different object storage servers. This is how it looks today: there's the proxy server, there's the ring, which is used and exists everywhere in the system, and the object server fields the incoming request. The change we make for Kinetic is that the proxy routes down to Kinetic zones: we hoist the object server process up so it sits right next to the proxy server process, on the same box. The cool thing this allows, from an adoption perspective, and we know that on day one not everyone is going to use Kinetic all at once, right, there's an adoption life cycle people will want to follow, is coexistence: in our design approach, we wanted to be sure that when we introduced new drive technology, it could coexist with existing drive technology. This approach lets you have some Kinetic zones in a cluster: maybe one zone, or one region, or, if you just want to trial a single shelf of Kinetic drives to see how it performs and whether you like it, that can coexist alongside a traditional architecture. That's one of the benefits this gives us. Is this one yours or mine? Sure, I'll do this one. A couple of slides here, back on the value proposition; this slide is really about performance. Here's what drives performance in Kinetic: the device is now a log-structured storage device, and you are not working through a file system that's figuring out where to place data on the device and carrying a lot of metadata overhead. The data literally streams to the disk as it's written, and then, to the extent we need to, we can reorganize it in the background to make sure it's laid out optimally, both for read performance and for general management of the space on the drive. So there's no penalty whatsoever: you stream the data to the disk as it's written, and you've eliminated that metadata overhead. Those two things are really driving the performance gains we're seeing. One interesting benchmark we did early on, and we talked about it, it seems so long ago, in Portland a year ago, was just a swift-bench run putting 20-megabyte objects on the drive. What we saw was that for every one seek that actually moved data, there were 12 that did not: 92 percent of the seeks moved half a percent of the data. Now, obviously there's lots of work going on, and lots of friends helping to solve that problem, and this is not the only way to go about it, but it is a massive greenfield opportunity for us to reclaim, and we're seeing very significant benefit from it. As an example, in that benchmark we were seeing about 13 megabytes per second at the system level, and with Kinetic we see about 50 to 60 sustained at this point. So there's a really dramatic impact from having the device do the space management and getting the file system out of the way. So, TCO. You can just go ahead and build, I think, oh yeah, that's it: the three things we have modeled. First, the capex savings of eliminating those storage servers, converging the object daemon back up to the proxy layer in the case of Swift; that has a big impact, along with pulling those storage servers out of the rack. Then two other things on the opex side. One is power. Yes, in order to do Ethernet and do the space management, we have added an ARM chip to the drive; there's already one on the drive, that's true of all disk drives, and now there's a second, and that adds a tiny bit, in most cases a watt or less, of power. Pulling the storage
servers out of the rack has a much bigger net impact, so at the rack level we see a significant decrease in power consumption. Also on the opex side of the model, one of the important benefits is that we've taken that higher-level failure domain out of the system. You no longer have a server going down with 60 drives, or some number of drives, attached to it, creating a somewhat immediate maintenance event. Instead, each drive is a solo actor. A drive dies? Its data is all replicated somewhere else, so who cares: I can just go to replica two, move that data from drive to drive to create a new third replica, and I don't have to go fix anything right away. One of the things we have validated with a lot of very large data center operators is that the same number of operators can manage a much greater amount of storage, plus you have a lot more storage in the racks because you've filled up all that server space, or you can have fewer people. This is all part of everybody's quest to move toward a hands-free data center, so that's a big piece of it as well. The impact of this is potentially as high as 50 percent. Think back to that seven and a half zettabytes of data we're all going to store: it's not cheap. Even if you take today's commodity stack, not the big, traditional, expensive OEM systems, and even assuming Kryder's law, areal density gains, and everything else, by our reckoning there's still about 240 billion dollars of cost, over and above where we are today, to store that amount of data. There's a gap. So if we can deliver 50 percent, 40 percent, 30 percent, something in that range, these are really, really meaningful numbers. What drives the variance in that number is the traditional alternative, in terms of how many drives you're packing behind a storage server. For people doing 15 drives per server, they might be in the 50 percent category, and that still seems to be a pretty reasonable industry average; for people doing 60 or 90 or 120, it's less, but still very compelling. All right, so the mechanics of how we did it. I talked about it earlier: we built this DiskFile abstraction, which allows us to plug in that new batch of APIs. Instead of using a file system, an API on the system itself, to talk to the drive, we're now in a tier, talking over the network to that drive, over Ethernet, using the Kinetic API. The next thing we needed to do, and this also changes how replication works: before, we had a server with all of those drives attached, and we could run the replication processes locally on that server, asking, hey, are your replicas here, yes, okay, and talking over the network to verify all that. Now that's been hoisted up into a tier, so we have to have different rules for how we check for replicas that exist on the drives. The auditors can now talk, I guess more simply, to the drives themselves, checking and comparing keys to ensure consistency, to make sure their checksums don't mismatch. And like I mentioned, storage policies are going to be a big enabler for adoption. From a product perspective, what we're doing with SwiftStack: at the end of the day, we're going to take advantage of a lot of the management capabilities in these drives, the ability to check keys for consistency and to be proactive in moving some of the data around. If you have one of those drives dying out in the field, we'll issue an API call to power it down and start spinning that drive down. So we're going to have full
support for Kinetic in SwiftStack, and you'll see that get enabled as we move forward. What we do from a product perspective is take that hardware and run an installer that installs onto a standard Linux distribution, installs our runtime stack, and then provides an operator dashboard so operators can deploy, manage, and operate the environment. This will be another technology we're able to fold in. We're also working with some early customers on Kinetic. If you were here, I think it was Tuesday, we had a talk by Dirk Petersen from the Fred Hutchinson Cancer Research Center about their use of Swift for handling large bioinformatics workloads in the life sciences, lots of genome sequences, and they're very interested in reducing the cost of storing large amounts of data. What we're working on right now is getting that rack built, and we'll be shipping it out to Fred Hutchinson so we can get some real-world metrics and get it into the hands of customers. That's happening in the next couple of weeks, really soon, so we'll get some mileage under our belt and have some more stories to tell pretty soon. Summary? Yeah. So, in terms of where we are on the drive side, and Joe just talked about their plans and timing for SwiftStack availability: we introduced the technology last October, and it was purely a technology introduction at that point, with the goal of enabling lots of customers and partners to start working with the code. At that point we were only making the API and libraries available, and prototype drives were available in a very limited way, under NDA. As I mentioned, as of this week everything is open-sourced, so please have at it, take a look. We will be shipping drives later this year. We are actively engaged with both Fred Hutchinson and a significant number of other customers, and we do have prototype drives and systems available for customers doing committed POCs. So you can't go buy them on Amazon yet, but there is real-world work happening at reasonable scale, with systems on customer sites currently, and again, we'll be shipping drives in the second half of this year. For people who are interested, there's the GitHub link to the code, and there's also a developers.seagate.com website with a lot of information and documentation to help people get familiar, and of course we're available to answer questions and start to have customer conversations with a broader set of folks as well. And from a Swift perspective, the prototype of the Swift connector is available now, also on GitHub, so you can see how it's implemented; if you want an example of using the API and you're familiar with Swift, you can see how that looks. We'll have a full open-source driver for Kinetic in Q3, and it will be integrated into our product in Q4. Now, we are the last session of the day, and I don't want to hold too many folks up, but we have Clay Gerrard in the back of the room, and Ali and I will remain up front, so if anyone has any questions, please don't hesitate to come up; we'd be happy to answer them. And Jim Hughes, I didn't see you, there you are, hello. Thank you very much. Yes, thank you all very much.