All right, it's 9 o'clock, so I think we'll get started with the next session here. Thank you all for coming to join us; hopefully we have a topic that everybody here finds of interest. We're going to talk about building a cloud environment with open networking and software-defined storage technologies. My name is Paul Speciale. I'm the VP of Product Management for Scality. I'm actually here on behalf of our CTO, Giorgio Regni, who couldn't make the trip. But I'm fortunate to be joined by Nolan Leake, co-founder of Cumulus Networks, so we'll co-present this session. We'll take you through the problem statement, a description of the various technologies, and how we feel the two combine to create a synergy and make life easier. As a mini agenda, we do want to make this interactive, but we'll set up the talk with a couple of slides about the data center and how we build a big storage system for storing hundreds of terabytes, or potentially petabytes, of data in an OpenStack environment. Then I'll have Nolan talk about open networking, what it is and what its key capabilities are, and we'll close with some of the advantages and ask you to participate. Much of this is very obvious to us, but there are a couple of things I want to throw in. We're obviously trying to build new data center infrastructures based on the technologies that we know the giants are using, and that's one of the reasons OpenStack is around. Certainly one of the things it introduces is a lot of variables. We're going to have lots of different applications, hopefully, running in OpenStack. That's really the core goal: to support our businesses so that we can run applications and different kinds of workloads. That's going to have a huge bearing on storage infrastructures and on networking infrastructures.
But moreover, one of the things that we believe is a problem is that lots of disparate management frameworks are starting to appear, and we'd like to make that common. So one of the approaches here is to make this very open and to leverage common management frameworks, things like Puppet, Chef, Ansible, and so on. From a storage management perspective, which is what we're focused on, we've certainly seen the cloud impact data storage in both scale and type. We're seeing an explosion in virtual machines, which means you need virtual machine storage, you need file storage, you need object storage. It creates a big mess, and ultimately one of the things we feel it results in is an explosion of storage silos. We'll talk about a solution for managing big unstructured data, which is typically addressed now by scale-out technology, and that requires very flexible networking technologies. So that's the heart of what we'll talk about. In big storage solutions today, people are certainly very interested in object storage. In the world of OpenStack, that means Swift, right? Swift provides the verbs for doing large-scale puts, gets, and deletes against data stored as objects in the cloud. But there are competing technologies. There's, of course, Amazon as a growing force; their S3 spec is something that we're tracking and a lot of people are embracing. Ultimately, we feel this drives a big increase in the complexity of network topology and of managing these infrastructures. So that's where we feel we can come in with these open platform technologies to really make life simpler. I mentioned that distributed storage is now becoming the normal model. What does this mean? It means scale-out in two dimensions. My application may need to talk to various protocol services; REST protocols are now very common, of course, for accessing big data.
That's where Swift comes in, certainly. But the need to address a larger and larger incoming load may mean that I need to dial up the number of interface servers, the number of protocol servers. In our model, we call these connectors. They can serve NFS traffic; they can serve REST traffic. But the key is that I need to be able to scale them out as my load grows over time. Very commonly in the Swift model, we have the notion of object storage daemons, or storage nodes. As capacity needs grow, I want to be able to add these incrementally and grow the capacity of my system. And all of that, of course, is glued together by an expandable network fabric. At the heart of a software-defined storage offering are really the intelligent storage services. This is really the glue, right? The model these days is that I don't want to buy hardware appliances or arrays that embed the software or the intelligence for storage management. I really want data durability services to be in the software. I want things that route around failures when they occur. If a drive fails, it shouldn't be an abnormal condition, right? In the old world of RAID controllers, as soon as a drive failed, some administrator received an alert and had to hurry in there to repair and replace the disk. That goes away now. So we'll talk about how we do self-healing in the face of these types of failures. And then, of course, growth should happen without disruption: I want to be able to add resources on the fly while the service remains available. Moreover, I want flexibility of choice. I want to be able to choose what hardware I run on, in terms of the server side, the storage side, and also the network platform. And ultimately, users certainly want to be able to manage these things over APIs, so that they can monitor health and performance and also do management and provisioning of these systems from all their common tools.
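To make the self-healing idea concrete, here is a toy sketch in Python. This is not Scality's actual repair logic; the data structures and names are invented for illustration. The idea is simply that when a storage node disappears, the software notices which objects fell below their target copy count and plans new replicas on surviving nodes, with no administrator in the loop:

```python
def rebuild_plan(placements, live_servers, target_copies):
    """Decide what to re-replicate after node failures.

    placements:    object -> list of servers currently holding a copy
    live_servers:  set of servers still reachable
    target_copies: desired number of copies per object

    Returns object -> list of live servers to copy onto, for any object
    whose surviving copy count fell below target_copies.
    """
    plan = {}
    for obj, holders in placements.items():
        alive = [s for s in holders if s in live_servers]
        missing = target_copies - len(alive)
        if missing > 0:
            # Pick replacement targets that don't already hold a copy.
            candidates = [s for s in sorted(live_servers) if s not in alive]
            plan[obj] = candidates[:missing]
    return plan
```

For example, with `target_copies=3`, an object that had copies on `s1`, `s2`, and a now-dead `s3` gets one new copy scheduled on a surviving server; an object whose three copies all survived is left alone.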
Okay, so with that said, there are some challenges in distributed storage. Certainly management is one of them, but the two I want to point out that are very relevant to the choice of networking technology are failure domains and data proximity. These are data placement considerations. As you move to a distributed storage system, you start having to consider where you place the data. Again, this isn't the model of having a RAID controller that manages eight or ten drives, all of them local. What we're doing instead is taking a big file or a big object and distributing it across the system. And there are really two considerations that drive that. The first is the notion of a failure domain. I want to be able to take either replicas, multiple copies of a file, or chunks, and distribute those across my servers, my disks, my racks, and ultimately I might even have multiple data centers that I need to think about placing these chunks across. Modern systems support variable replication schemes, which let me dial in how many copies of an object I want to store. More sophisticated ones implement erasure coding. This is a variable parity scheme: rather than storing multiple instances of an object, I'm breaking the object up into chunks and storing some data chunks and some parity chunks across these different failure domains. So in the example here, if I have four chunks, I want to ensure that they're all on different servers, on different drives, and potentially on different racks, right? Because the racks might be a failure domain of their own, given how power is distributed.
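The failure-domain idea above can be sketched with a few lines of Python. This is an illustrative toy, not any product's actual placement algorithm; the inventory of servers labeled by rack is made up. Chunks always land on distinct servers, and the placement walks across racks first so that losing one rack costs at most one chunk per rack:

```python
import itertools

def place_chunks(inventory, n_chunks):
    """Pick n_chunks distinct servers, spread across racks first.

    inventory: server name -> rack name (hypothetical example data).
    """
    # Group servers by rack so racks act as failure domains.
    by_rack = {}
    for server, rack in sorted(inventory.items()):
        by_rack.setdefault(rack, []).append(server)
    # Interleave racks: first server of each rack, then second of each, ...
    order = []
    for column in itertools.zip_longest(*by_rack.values()):
        order.extend(s for s in column if s is not None)
    if n_chunks > len(order):
        raise ValueError("not enough servers for distinct placement")
    return order[:n_chunks]
```

With two servers in each of three racks, placing four chunks uses four different servers across all three racks, so no single rack failure can take out more than two chunks, and no single server failure more than one.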
Okay, so that's dispersal of the data, so that I can ensure that if I have a failure, I'm not impacting multiple copies or multiple chunks at one time. But that fights against the other need that I have, which is data proximity, right? In order to get performance, I need to be able to access the data rather quickly, and this usually means that one of my protocol servers, a REST server that's fetching data, needs to be fairly close, in terms of number of network hops, to the actual chunks. We'd like this to be bounded, right? And this becomes a problem as I scale up the number of racks: traditional topologies really start increasing the number of hops that I need to cross in order to get the data. So maybe with that, I can hand it to Nolan for some comments about the networking side of this. Thank you. So before we get into how we solve those challenges that Paul brought up, I'll give a brief overview of Cumulus Linux. Who here has heard of Cumulus? Okay, who here has used the product? Okay, fewer, so I'll get into how it works a bit. We're best known for open networking, and that's the idea that you don't buy servers with an OS on them anymore, right? You no longer buy a SPARC server with Solaris. You buy an HP server running RHEL, or you buy a Dell server running Windows, right? But in networking, it's still common to buy everything together as an appliance. You go buy your Cisco switch and it comes with a Cisco OS, and if that's not working for you, well, you're just kind of out of luck. So the idea of open networking is to decouple the hardware from the software. Unlike NX-OS, which only runs on Cisco switches, Cumulus Linux runs on switches from seven different hardware makers, and each one has a wide variety of switch designs. So you can pick one that does exactly what you need from the kind of vendor you want.
Maybe you want global reach from a large OEM, or maybe you're building a large-scale data center and you're okay with a smaller OEM because you get more appropriate pricing. So it provides that flexibility. The other difference from what you may be used to in more traditional switches is that it really is just Linux. Almost all network OSes today have a Linux kernel in them, but you would never know it, right? You SSH in and you get some proprietary CLI, and it has a config file with some knobs you can turn, but you can't easily load your own software on it. You can't use open-source tools on it. The interaction with it is either through the CLI or some sort of proprietary per-switch API, if they give you one. Beyond that, it is a switch, right? It does VLANs, VXLAN, bridging, routing, protocols, all the normal stuff you'd expect. So you don't have to learn some new protocol like OpenFlow to manage this; all the concepts you already know from networking still apply, just with a slightly different interface. So there's a lot going on in this diagram, and I'll try to break it down. Paul talked about the tension between wanting the replicas distributed widely across the data center, across the different racks, so that if you lose an entire rack you don't lose multiple copies, and at the same time wanting the data co-located: close together for writes, so you don't have to replicate across a huge distance, and close to the consumers of it, so they don't have to talk to some server that has very low bandwidth to it because it's far away. And to facilitate that, we use a topology called a fat tree. It has some other names as well.
But the basic idea is, unlike your traditional network, where you have a pair of core switches and top-of-rack switches connecting up to them with oversubscription through the core, we have a larger number of small spine switches. Usually in a full mesh topology, you would have half as many spine switches as you have top-of-rack switches. And the interesting property that topology gives you, the techie term is full bisection bandwidth, is that you have the same amount of bandwidth to a server ten racks over as you do to the server directly below you in the same rack. So now you no longer have to think about the network location of a server or a disk, because they're all the same. And that allows us to resolve the tension between wanting locality and wanting distribution by removing locality as a constraint, so we can focus just on distribution. There's no longer any tension between the two. The way we realize this is by building an IP fabric, a layer 3 fabric, usually with protocols like OSPF or BGP, typically these days eBGP. And if you're familiar with networking, you may be thinking, well, technically I could do that with any switch. But as a practical matter, we've done some things that make this far easier and far more practical, for both technical and non-technical reasons. The first is that the open supply chain means you're no longer paying obscene prices for your switches. You can shop around and get a reasonable price, which means you can afford to have a lot more switches. What enables this extreme amount of bandwidth is the fact that you just physically have more switches, so you have more capacity. And the other challenge you might run into is that, if you have more switches to get this extra bandwidth, you have to manage all of those switches.
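The full-bisection-bandwidth claim above is just arithmetic, and can be sanity-checked with a short sketch. This assumes a two-tier fat tree where every leaf (top-of-rack switch) has one uplink to every spine; the port counts and speeds in the example are made up for illustration, not taken from any real deployment:

```python
def leaf_summary(spines, uplink_gbps, server_ports, server_gbps):
    """Per-leaf bandwidth budget in a two-tier fat tree.

    Assumes each leaf has exactly one uplink to each spine, so the
    spine count equals the number of uplinks per leaf.
    """
    up = spines * uplink_gbps          # bandwidth from this leaf into the fabric
    down = server_ports * server_gbps  # bandwidth from servers into this leaf
    return {
        "uplink_gbps": up,
        "downlink_gbps": down,
        "oversubscription": down / up,  # 1.0 means full bisection
    }
```

For instance, a leaf with four 40G uplinks (so four spines) and sixteen 10G server ports has 160G in each direction, a 1:1 ratio: traffic to a server ten racks away has as much fabric capacity available as traffic within the rack.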
So with a traditional switch, you'd have to SSH into each one and upload a config file, or build some tool that manages templates and blows them out to the various switches over SSH, pretending to be a human, right? We'll get into how we can help with that with automation tools in a second. And then the third is that we've implemented some technologies that make this a little easier. I don't want to get too far into the gory details; I'll be getting into config files and the real gory details on Thursday in a different talk, so if you want to see that, come to that one. But the basic idea is that instead of having to manage a huge number of IP addresses for all these little links, we give you the ability to assign one IP address to each switch and have that be the entire config. And so this makes it super easy to write a single config, usually about that long, that you can then just blow out across all of the switches. And the final thing that's kind of cool here, and this may be hard to see, is the 1.1.1.1 IP address floating around. In this model, that's the connector service that Paul was talking about. This is what the client interacts with, and it in turn goes and talks to all the storage servers. You want several of these, for availability and for distributing load. And so what we've done here is configure something called anycast. The idea is, since we're using a routed network here, all of those connector nodes can announce an IP address into the fabric. That way, any time someone asks for that IP address of the connector, they get routed to the closest one, automatically, by the network. And if that one fails, they will be routed to a different one. And so we implemented some technology around that. This is not a trick we invented, right?
Other people have done this in the past, but it has historically had a problem, which is that the flows are distributed to all these servers based on a hash of the source IP, destination IP, source port, destination port, and some other fields as well. The problem with that is the hash function is stateless; it doesn't know which flows are going to which servers. So if you were hashing across four servers and one of them fails, and now you're talking to three, all the flows get redistributed at random across the servers. And if these are TCP flows, which they are in this case, that causes broken connections. So what we did is implement something we call resilient hashing. The way this works is, if you have those same four servers and one of them fails, only the traffic going to that failed server is redistributed to the three remaining servers. That way, only the traffic that was going to be broken anyway, because it was going to a server that failed, is affected by the outage. Okay, so we were talking earlier about the config management challenge of this large number of switches. If you talk about a big deployment and you've got hundreds of switches, that number is actually fairly small compared to the number of servers you probably have, because you probably have thousands of servers. So in terms of managing things, we should look at how people manage very large numbers of servers to draw lessons that we can then apply to managing large, but smaller, numbers of switches. And in this case, instead of just learning lessons, we wholesale stole the technologies people have developed and open sourced for this. There are lots of great tools out there: Chef, Puppet, Ansible, SaltStack; there are more, and people roll their own. And because all of these are designed to deal with the common language of Linux, the Linux commands and Linux APIs, we can just reuse them. We don't have to port them.
We don't have to do anything. In fact, when we were a very small company, we often found out that one of these tools worked on our switch because one of our customers just tried it. They downloaded it, installed it, and said, hey, it works great. And similarly, there are monitoring tools, collectd, Graphite, Nagios, all of these tools that have been used for years, decades even in some cases, on servers, and they're now applicable to switches. I don't know how many people here have used SNMP, but all of these tools are a huge improvement on SNMP. And I think that's it. We want to throw it open for questions. Yeah, and I'd just add that the level of scale that we're starting to see for managing some of these large clusters ranges into hundreds of servers and thousands of disk drives. So certainly the collection of statistics and logs is becoming a problem at another order of magnitude. Embracing things like Elasticsearch and Logstash seems to be the paradigm that users want us to take, so that's the technology that we're starting to embed. It's a very open approach. We actually use SaltStack, Chef, and Puppet for deployment, but you can see that a synergy starts getting created when you put multiple technologies together that work this way. So we'd be happy to take any comments, any feedback from the audience. Okay? Question? Yeah, so the question was, are we trying to maintain state, network state, behind something like file services, right? So I think the networking technology that we're talking about here is both the back-end fabric and the front end. If you're talking about front-end connectivity to something like an NFS server, the technology that you discussed could certainly be applied here, right? But really what we're talking about is simplifying the back-end fabric, the management of these large topologies that glue together hundreds of storage servers and lots of protocol servers.
So for that problem, you're gonna run redundant network servers and you're gonna do failover, right? You're gonna have standard technologies that do load balancing and failover on the front-end servers. Yeah, so when we were talking about anycast earlier, the idea there is, if I'm a client, I would connect to a single one of these connectors. If it fails, I would then have to reconnect. But most storage protocols expect to be able to reconnect and then reissue the pending IO, so they're designed to handle transient failures. Right, so it doesn't change the nature of NFS retries and timeouts in that respect, but there are stateless protocols now, like REST, which you can just naturally load balance and retry against. And the great thing about stateless protocols like that is that, with the anycast trick, it's active-active-active across all the servers, right? You just get connected to whichever one happens to be closest, so if the clients are evenly distributed, the load gets evenly distributed across all of them. Right, so if you have an S3 protocol server or a Swift protocol server, you can run a dozen of them, access any resource from any of them, and if one of them fails, there's no interruption to the service. Yeah, and convergence is sub-second. Well, on the back end, the storage software is aware of the multiple replicas, so it takes care of that. It's the clients that may not be aware; they think they're talking to a single service endpoint. And so that's where the cluster comes in: hiding the cluster behind a single IP address and trying to load balance the connections between these various... And we'll provide back-end redundancy with multiple paths to the different nodes. If there's a failure of a node, there's somebody else that can take over for that. So all of that is built into the SDS layer. Yeah, okay.
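The resilient hashing behavior discussed earlier can be illustrated with a toy model. To be clear, this is not Cumulus's implementation (the real thing lives in the switch ASIC's ECMP next-hop table); the server addresses and slot count are invented. It shows why naive hash-modulo ECMP breaks surviving TCP flows on a failure, while a fixed-size slot table that rewrites only the dead server's slots does not:

```python
import hashlib

SERVERS = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
SLOTS = 64  # fixed-size next-hop table, standing in for hardware ECMP slots

def flow_hash(flow):
    # Deterministic hash of a (src_ip, dst_ip, src_port, dst_port) tuple.
    return int.from_bytes(hashlib.sha256(repr(flow).encode()).digest()[:8], "big")

def pick(table, flow):
    # Both schemes pick a next hop by hashing the flow over the table.
    return table[flow_hash(flow) % len(table)]

def naive_table(servers):
    # Classic ECMP: one slot per live next hop, so the table shrinks on failure
    # and every flow is rehashed over a different modulus.
    return list(servers)

def resilient_table(servers):
    # Resilient hashing: a fixed number of slots, filled round-robin.
    return [servers[i % len(servers)] for i in range(SLOTS)]

def remove_resilient(table, dead, survivors):
    # Rewrite only the dead server's slots; every other slot is untouched,
    # so flows mapped to surviving servers keep their next hop.
    out, j = [], 0
    for s in table:
        if s == dead:
            out.append(survivors[j % len(survivors)])
            j += 1
        else:
            out.append(s)
    return out
```

Running a few hundred synthetic flows through both tables before and after failing one server shows the difference: with the naive table, a large fraction of flows destined to the three healthy servers get remapped (broken TCP connections), while with the resilient table, none of them move.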
Anybody else? All right, so Nolan and I will stick around. If you have any questions for us one-on-one, please come up and chat with us. And we have a booth on the show floor, as Cumulus does, so there's lots of access to us. There's also a paper available for download on the Scality website, at the URL you see here, that covers the discussion we had here in a bit more detail, including some of the underlying technologies. So thanks very much for your attendance. Thank you.