Hey folks, it's 11:30, so I think we're going to go ahead and get started. My name is John Benedict, I'm a technical marketing engineer with NetApp, based out of Research Triangle Park, North Carolina, and I'm very happy to be here today with Rodney Peck of eBay PayPal and my colleague Shiva Chinia from the Advanced Technologies Group, also at NetApp. Today we're going to go over considerations and lessons learned in deploying OpenStack, through three different examples. I'll be talking about a converged infrastructure, Rodney is going to talk about his experiences with OpenStack at eBay PayPal, and Shiva is going to talk about dev/test within NetApp. After that we're hoping to leave 10 or 15 minutes for Q&A.

So let's dive right into some challenges that are not unique to the enterprise, but are certainly challenges within the enterprise nonetheless. From an operational standpoint, any time you're adding in something new like OpenStack that has a fairly substantial learning curve, there's complex administration involved. In the enterprise you always have project life cycles and data life cycles: things going from dev/test to staging to production, projects going from high priority to lower priority and then off to archive. You've got to be able to handle changes gracefully, whether those changes are planned or unplanned, and of course you've got to be able to spin up, scale out, and scale up gracefully as well. And last but not least, there are always going to be security and compliance certifications to achieve, and security and compliance issues to consider.

Design challenges: you don't always have time to do a six-month POC to design something that you can scale out yourself. At the end of that six months, if it works out, great. If it doesn't, you have to go back to the drawing board, and you may end up with a management team that says, we're out of time and money; you need to go forward with what you've already got. Whatever that design is, it has to be scalable, it has to make the best of the resources, and when you scale out and scale up, you've got to have predictable performance. So you really have to have a scalable, repeatable, manageable design. And from the enterprise standpoint, when you go from traditional virtualization to hybrid cloud, enterprise customers don't necessarily want to give up the support, integration, and predictable performance they've come to expect.

Along with those technology changes, we've found there are also changing roles for the folks who actually work in the data center: sysadmins, network admins, the folks who do the racking and stacking, and so on. Those changing roles mean that your traditional builders and operators are now brokers of services who have to roll with all of those changes and updates in a rolling fashion. Essentially everyone has to think like a service provider, whether you're the overall architect, one of the teams doing the racking and stacking, or anywhere in between.

So one way of solving these challenges is with a converged infrastructure. We happen to have put this together on FlexPod; you could potentially use something else.
But the big benefit of a true converged architecture is that somebody has already done the hard work of putting it together, running it through the gauntlet, and making sure that all of the different components work together really, really well. It's not a bolted-on approach. Everything's been tested: scaled up, scaled out, things broken on purpose to see how the system reacts when components fail. Secure multi-tenancy is always a big deal, especially with OpenStack. And possibly the most important feature beyond scale-up, scale-out, and performance is that a true converged infrastructure gives you the ability to have full automation over that environment. Your typical 1U/2U server is easy to set up and fairly easy to manage, but it's not necessarily easy to automate from that mythical single pane of glass, or from the couple of tools you use to manage your environment. Something like a FlexPod, by contrast, provides a rich set of APIs that let you affect everything from the broad level down to the granular and everywhere in between.

The basis of that converged infrastructure is NetApp clustered Data ONTAP, and I'll make this very quick; this is not an advertisement. A lot of folks raise an eyebrow at NetApp and OpenStack. The truth is, we've been involved from the beginning: we joined the foundation at the gold level right after it was created. We've done most of our work around the Cinder drivers, allowing folks to take advantage of things like deduplication, thin provisioning, cloning, and copy offload, things that existing NetApp customers already know and love about us. You can take all of that with you into OpenStack. Doing that with iSCSI is kind of a no-brainer, but possibly the big news is that we've added file-services drivers by way of NFS. You can do that today with the NetApp Cinder drivers; they've been upstream since the Essex/Folsom time frame. We're also in the process of breaking those file-services drivers out of Cinder into their own project, called Manila. What does a manila folder do? Holds files. So we had to keep with the clever naming scheme on that one.

The basis for clustered Data ONTAP is one or more two-node HA pairs bound together with a 10-gig cluster interconnect, providing a single namespace for both SAN and NAS; it can also serve as the foundation for an object store. Your NAS and SAN clients, i.e. Cinder services, attach to the namespace by way of LIFs, logical interfaces. These storage connections are completely decoupled from the underlying hardware, and the storage volumes themselves are completely decoupled from the underlying hardware as well. The other virtual piece here is the concept of the storage virtual machine; in this case we have a yellow one and a purple one. It's just a logical container that can live on one node or span the entire cluster. What these things give you is, first, secure multi-tenancy built right into the cluster. And we can do some pretty crazy things with the storage interfaces and the storage volumes: we can live-migrate them just like you would a virtual machine.

So how does that impact and enable OpenStack? Well, it means that in the same cluster we can mix and match controllers of different speeds and memory footprints, and we can put together different disk drive types.
So: high-capacity SATA, well-performing SAS, and high-performing SSD, so that we can handle multiple SLAs and multiple workloads. We can handle the complete lifecycle of a piece of data as it goes from dev to test to staging to production. You can promote those volumes from the lower-priority, higher-density left-hand side to the higher-performing SSD drives on the right, or anywhere in between. What we did for the enterprise testing was put all of our iSCSI boot LUNs on the far left, with the idea that we would have tier-one workloads on the far right, tier two and tier three somewhere in the middle, and dev/test/staging again somewhere between the left and the middle. So being able to handle all of those different service-level agreements simultaneously, and handle those life cycles, absolutely applies to what enterprise folks want to do with OpenStack.

Some other things we have that absolutely speak to OpenStack: we've got eight major areas here, two of which are not unique to NetApp. For example, continuous operations and seamless scaling; whether it's our traditional enterprise competitors or some of the cool commodity storage solutions, everybody's got something worthwhile there. But in the other areas we feel we've got some unique capabilities: secure multi-tenancy and the way we handle quality of service, again being able to handle multiple SLAs in the same environment; the unified architecture, being able to support SAN and NAS, NFS included, from the same storage controller and the same cluster; and the Cinder and Manila drivers we're working on upstream. Service automation: not only can the converged infrastructure be driven through full automation, but the storage itself can too. You can say, I want a 100-terabyte or 100-gigabyte volume, it's going to be on SSD, thin provisioned, deduplicated, and, oh yeah, that's my platinum volume; any time I ask for a platinum volume, that's what you're going to give me. (There's a code sketch of that idea at the end of this section.) And everything else you would want to do to affect the storage. Data mobility: being able to migrate those volumes not only within the same cluster, but to mirror from site to site, cross-country or cross-campus; we can do this. Storage efficiency: I mentioned deduplication, thin provisioning, cloning, copy offload, things like that. These are things that absolutely help scale and make the best out of OpenStack, and more importantly, the application workloads on OpenStack.

So that's the foundational piece of our converged infrastructure in the enterprise. The Cisco pieces in the middle represent both the compute and network spaces, and this really is a modular approach to compute and networking; again, fully automated. Probably the best concept that comes out of UCS, in my mind, from the automation and stateless-computing standpoint, is the service profile, which allows you to take the identity of a server and move it to other servers within that framework. Say the magic smoke escapes from one of the important chips on a server motherboard. Instead of having to adjust your network and your configuration for new MAC addresses, new WWPNs, new iSCSI initiators, and firmware levels, all of that lives in the service profile. You just take that service profile and apply it to a different server, and that's it; no other configuration necessary.
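Here's that platinum-volume sketch: a minimal example using python-cinderclient against a NetApp-backed Cinder. The type name, the particular extra-spec keys, and the pre-built keystoneauth session (`sess`) are illustrative assumptions rather than an exact recipe from the talk.

```python
# Minimal sketch, assuming an existing keystoneauth session (`sess`) and a
# NetApp-backed Cinder. The "platinum" name and extra-spec keys are
# illustrative assumptions, not an exact recipe from the talk.
from cinderclient import client

cinder = client.Client('2', session=sess)

# Define the "platinum" service level once, as a Cinder volume type.
platinum = cinder.volume_types.create('platinum')
platinum.set_keys({
    'netapp_thin_provisioned': 'true',  # thin provisioning on the backend
    'netapp_dedup': 'true',             # deduplication on the backend
    'netapp_disk_type': 'SSD',          # place the volume on SSD aggregates
})

# From here on, every request for a platinum volume gets that service level.
vol = cinder.volumes.create(100, name='db-data', volume_type='platinum')
```

The scheduler matches those extra specs against the capabilities each backend reports, which is what lets one cluster serve the SATA, SAS, and SSD tiers described above.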
So it's a unified fabric that, again, allows full automation at that layer.

What about the networking for this deployment? We ended up going fairly simple, with really good results: we essentially used Open vSwitch as a VLAN provider. My friend and colleague Stephen Carter was a big help with all of that; he's got some pretty big-name customers on the West Coast, who will remain nameless, and this is essentially how some of them have decided to do it. Using Open vSwitch as a VLAN provider not only means we can carve up the 10 gig in a very easy way, it also means we're keeping those VMs as close to the physical network as we can, and it ends up performing very well.

RHEL-OSP: why did we pick RHEL-OSP for this particular enterprise approach? Given the enterprise angles around support, integration, and the like, it was kind of an easy choice for us. Red Hat treats RHEL-OSP as an extension of RHEL, which means things like SELinux and cgroups that are built right into RHEL are things RHEL-OSP can take advantage of. They've got a world-class security response team that takes care of security concerns very, very quickly. The complete lifecycle they've taken from RHEL and applied to RHEL-OSP is a pretty big deal, as is addressing the learning curve of going from traditional virtualization to hybrid cloud and being able to demonstrate mastery of those skills through certifications. And arguably the most important piece is support: again, enterprise customers don't want to give up supportability as they move from virtualization to hybrid cloud, so this is a pretty big deal, and year after year Red Hat ranks pretty highly in terms of support. Enterprise customers typically also want to know that they're dealing with a leader in the community, and Red Hat is consistently one of the top contributors to the code, showing their commitment and certainly doubling down on OpenStack for the next many, many years. The other thing is that a lot of data centers don't have folks they can dedicate to actually writing and contributing OpenStack code; they're limited to resources that work on the databases and on the applications they're known for, their moneymakers, so to speak. So Red Hat can be a conduit for getting feature requests in. It's not a guarantee by any stretch of the imagination, but knowing that Red Hat is a top contributor means you have a decent chance of saying, hey, this is a big deal to us, and very often it's a big deal to other enterprise customers as well.

So that's the top layer of our converged infrastructure. If you're looking for documentation on it, I don't have time to leave this slide up, but you can snap a picture of it, and you can also find QR codes to download these at the NetApp booth downstairs. With that, I'm going to turn things over to Mr. Rodney Peck of eBay PayPal.

Sorry about all the noise here; it's very interesting. Okay, my name's Rodney, I'm from eBay, and this is an example of using Cinder in a typical environment. The situation we've got is that we need to upgrade a hypervisor, which is a very common sort of thing for a large company. We have no physical space available in the data center; there's no place to put more hardware, but we need to do a security upgrade on a hypervisor, and the hypervisor's totally full.
So we need to find some place to create more VMs so we can take that hypervisor offline. In this example (I'm gonna keep the microphone), we've got four hypervisors, each with six VMs on it. I'm sorry about the math; I just noticed it doesn't quite add up, but work with me. This is a hypervisor with 1.2 terabytes of local disk and six 200-gigabyte VMs on it, which takes up all the local disk. However, the hypervisor has 32 processors, not 16, and we're only using two per VM, so we're using 12 processors and 48 gigabytes of RAM. There's a lot of capacity left over on these hypervisors, and the VMs are mostly idle. We could run more VMs on these hypervisors and free up space on the first one, but there's no more local disk to boot from.

This is where Cinder is the solution. We can solve it using Cinder with a filer as the backend, with nothing but software configuration; there's no hardware to deploy, nothing like that. We've already got a NetApp in the data center, it's got available space, it's just a matter of using it. So again, here are the full hypervisors, and, well, John's redone my slides so I don't know which one's which, but on the right here we've used Cinder to boot from volume, and we now have VMs seven through twelve running off a NetApp. Totally easy thing to do. The NetApps are connected with 10-gig networking, and the disk I/O is actually faster than the local disk I/O, which is kind of a surprise to legacy people.

With that installed, there are other things you can do. With VM instances running on Cinder instead of local disk, you can take snapshots in the filer to do backups. You can clone them: just NetApp-clone into a new volume and boot that volume, so you can build machines much, much faster. Because the machines are attached with Cinder, if the hypervisor dies, if the magic smoke leaks out, for example, you can just start up another Nova instance and attach the volume, and you don't lose any of your local data. If you're using local disk and the machine dies, or the switch connecting the machine to the network dies, you can't get your logs; you can't get any explanation of why the machine died. If it's attached to Cinder, all that data is on your NetApp, and you can just attach another VM to it, look at the logs, and fix your problem, or move the VM to another place. Also, at PayPal we have really large volumes that we clone, around 400 gigabytes. Instead of moving 400 gigabytes through Glance to create a new system, we can clone it in the NetApp in seconds into another 400-gigabyte copy. That's a fantastic application of Cinder as well.

So I had 10 minutes to talk, and I think I blasted through a little fast. Lessons learned: it works very well, and it unblocks the VM migration. But the main issue is that unless you have bonded networking off your hypervisors, you've only got one data path to the Cinder storage. So if the switch should glitch or anything like that, you're going to have totally hung VMs. It's important to have very reliable networking if you're going to use Cinder for boot from volume. That's all I have, thank you very much; after this, we'll be available for questions and such.
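To make Rodney's boot-from-volume pattern concrete, here's a minimal sketch using the OpenStack SDK. The cloud name and the image, flavor, and network IDs are placeholders, and exact field names can vary by SDK release.

```python
# Boot-from-volume sketch with openstacksdk; the IDs below are placeholders.
import openstack

IMAGE_ID = 'replace-with-image-uuid'
FLAVOR_ID = 'replace-with-flavor-id'
NETWORK_ID = 'replace-with-network-uuid'

conn = openstack.connect(cloud='mycloud')

# Create a 200 GB root volume from a Glance image on the Cinder backend.
vol = conn.block_storage.create_volume(size=200, image_id=IMAGE_ID,
                                       name='vm07-root')
conn.block_storage.wait_for_status(vol, status='available')

# Boot the instance from that volume instead of local hypervisor disk.
server = conn.compute.create_server(
    name='vm07',
    flavor_id=FLAVOR_ID,
    networks=[{'uuid': NETWORK_ID}],
    block_device_mapping=[{
        'boot_index': 0,
        'uuid': vol.id,
        'source_type': 'volume',
        'destination_type': 'volume',
        'delete_on_termination': False,  # keep the root disk if the VM dies
    }],
)
conn.compute.wait_for_server(server)
```

If the hypervisor then dies, the same volume can simply be attached to a replacement instance, which is exactly the log-recovery story Rodney described.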
Thanks, Rodney. So I'm Shiva, from the Advanced Technology Group at NetApp. I'll be describing the work I did with the test/dev team at NetApp to solve their use cases, along with some experiences and some analysis of the OpenStack installation that transformed the test/dev infrastructure there.

The specific use case we were looking at was the Data ONTAP test/dev infrastructure. That infrastructure provides the basis for NetApp developers and QA to add and test new changes in Data ONTAP, the storage OS that underpins our FAS series. It's a completely virtualized infrastructure that simulates Data ONTAP, and it's medium scale, with thousands of users spawning VMs daily and deploying ONTAP to test new features as releases come out. A typical test bed deployed in this infrastructure can range from a single VM running Data ONTAP to complex multi-VM stacks: combinations of Data ONTAP VMs and test-client VMs that drive load against them.

Beyond the usual properties of a test/dev infrastructure, there are two fairly unique things we found that didn't immediately suit our OpenStack use case. One is that most of the ONTAP VMs run custom images that users compile and build immediately before deploying them on the infrastructure, so at any given time you have all these VMs running kernels that are unique, custom-modified Data ONTAP builds. The other is that because the Data ONTAP VM simulates storage functionality, replicating everything that's in FAS, the number of virtual disks connected to a VM is typically large, more than 20 or so, and we noticed this is not the typical use case Cinder is built for. So I just wanted to point out those two things that were unique to the test/dev use case.

Here's a quick comparison of how the physical FAS box on the left looks next to a virtual Data ONTAP. All the storage resources the virtual Data ONTAP needs for its work, whether NVRAM storage or the data disks, are completely virtualized, mapped to virtual disks that are all consolidated on actual NetApp storage underneath.

So why was the team interested in adopting and moving to OpenStack? That's where we came in, and we initially explained the benefits of OpenStack to them. We all know the goodies of OpenStack; that's why we're all gathered here. But the specific benefit they were looking for was really seamless support for multiple hypervisors within their test farm. Currently they're using VMware as the underlying hypervisor to run Data ONTAP VMs, but the promise of OpenStack to support multiple hypervisors, and thereby expand their test coverage across different vendors, really appealed to them. The other thing they really liked is the intuitive user interface for deploying and managing VMs: the Horizon dashboard, and the ability to use Heat for single-click provisioning of multiple stacks of VMs, without placing the onus on developers to define all these complex stacks and manage them by themselves. That was one of the real benefits they were looking for from an OpenStack-based implementation.
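Here's a rough idea of what that single-click provisioning looks like with python-heatclient. The tiny two-server template is a stand-in for the much larger ONTAP-plus-test-client stacks described in the talk, and the image, flavor, and network names are invented, as is the keystoneauth session (`sess`).

```python
# Single-click stack sketch with python-heatclient, assuming an existing
# keystoneauth session (`sess`). Image/flavor/network names are invented.
import yaml
from heatclient import client

heat = client.Client('1', session=sess)

# A toy HOT template standing in for a real ONTAP-plus-test-client stack.
template = """
heat_template_version: 2013-05-23
description: One ONTAP VM plus one test client on a shared network.
resources:
  ontap_vm:
    type: OS::Nova::Server
    properties:
      image: ontap-custom-build
      flavor: m1.xlarge
      networks: [{network: testnet}]
  test_client:
    type: OS::Nova::Server
    properties:
      image: test-client
      flavor: m1.small
      networks: [{network: testnet}]
"""

# One call provisions the whole test bed; no hand-wiring by developers.
heat.stacks.create(stack_name='ontap-testbed',
                   template=yaml.safe_load(template),
                   parameters={})
```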
The administrators of the test/dev infrastructure are currently managing multiple tools to monitor the entire stack, all the way from the VMs down to the storage, and Ceilometer was one tool they were interested in using, because it has the potential to be a one-stop, unified monitoring tool providing visibility into all the resources used by the test beds.

While we were working on a pilot infrastructure for them, one of the main transition goals was to integrate seamlessly with their existing infrastructure, and as you can see in the pilot architecture we built, we tried to cater to that. It's a combination of all the OpenStack services installed on their farm, coupled with a legacy service that currently runs in the test/dev infrastructure, which I'm calling the ONTAP service. This ONTAP service does a lot of things that are very specific to the existing workflow, for example the build environment: I mentioned that users have to compile the ONTAP code first and then transport those images to the OpenStack-based service, and the ONTAP service acts as the bridge to accomplish that. It also does several other things, which I can go into offline if you're interested. We also made a few point changes to some of the services, and I'll explain a couple of them; some of the changes we're looking to push upstream, while others are more custom and really only suitable for our use case. One thing I wanted to mention here is that the actual hardware infrastructure we used to build this prototype is FlexPod, which John was talking about: a converged solution that includes Cisco UCS blades with clustered Data ONTAP underneath.

To summarize the benefits we were looking for: after building the infrastructure, these were the key highlights we were able to derive from our work. The single-click deployment of complex stacks that include ONTAP VMs and test clients was a big plus for us. I mentioned that the Data ONTAP VM simulates NVRAM storage as well as the data disks, and those are two different storage models; the VM requires multiple service levels from them, because the expected performance of NVRAM is much higher than that of the data disks. We were able to simulate that with the help of the SLO-differentiated backends that Cinder supports through multiple volume types. The multiple-hypervisor support, VMware, Xen, KVM, and so on as the list grows, is a really big plus for expanding test coverage. And the vendor-agnostic monitoring and reporting was a nice bonus on top.

In the next few slides, I'll talk about some specific technical analysis we were able to derive from our study. One thing we found in the Havana release, which was the target for our pilot infrastructure, is that the Nova compute scheduler lagged behind some of the enterprise-grade schedulers, like VMware DRS, Distributed Resource Scheduler. What was nice to see is that in the Icehouse release they seem to have immediately addressed that problem. For example, one thing OpenStack Nova did not do is look at the instantaneous resource utilization of the VMs and the host to make more intelligent placement decisions; it only looked at headroom based on configured sizes. In the Icehouse release, I noticed that OpenStack Nova has created a framework where you can look at resource-utilization metrics and schedule based on them.
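As a taste of the scheduler extensibility discussed here and just below, this is a minimal sketch of a custom Nova host filter. The class name and the 4 GB floor are invented for illustration, and the exact `host_passes` signature varies across OpenStack releases.

```python
# Toy custom scheduler filter; the class name and threshold are invented,
# and the host_passes() signature varies across OpenStack releases.
from nova.scheduler import filters


class RamHeadroomFilter(filters.BaseHostFilter):
    """Reject hosts whose free RAM has dropped below a fixed floor."""

    def host_passes(self, host_state, filter_properties):
        # host_state carries the per-host capacity data Nova collects.
        return host_state.free_ram_mb >= 4096
```

A filter like this is then listed in the scheduler's filter configuration in nova.conf, after which it participates in every placement decision.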
As that work progresses, it will be really useful for us to leverage it and make our placement decisions more utilization-driven. Automatic load balancing is something that would be really useful for this infrastructure, because we have a lot of ebbs and flows in the workloads, with lots of VMs being spawned and torn down, a lot of debugging going on, and users rebuilding and launching VMs again. So really effective utilization of resources, consolidating VMs onto as few hypervisor hosts as possible, is a key requirement in this infrastructure. And the obvious benefit of OpenStack Nova compared to something like VMware DRS is that it's extensible; you can make changes. We noticed the API is pretty flexible about defining your own metrics for scheduling, which is pretty useful for our use case.

One thing I mentioned earlier is the way Cinder volumes are used by a Data ONTAP VM: a typical configuration requires more than 20 Cinder volumes to be attached to a VM. What we noticed in the Nova volume-attach architecture is that even when you attach all these volumes through the Nova attach API, internally it still does them serially, one by one, by talking to the hypervisor. In this case, the picture shows us talking to vCenter, and all the attaches of the Cinder volumes being serialized there. This was a significant time drain in the context of VM provisioning; it added about 40 seconds to provisioning a single VM. We fixed it with a simple solution that leverages a vCenter API that can bulk-attach all the virtual disks in one single call. This is an example of what you can do with open source code like Nova: make changes to suit your use cases. This change improved our provisioning performance by about a factor of two or three, and we're looking to push it upstream.

The other thing I talked about was Ceilometer and its promise for the infrastructure. What we're noticing with Ceilometer is that it's still evolving; a lot of features are being added in Icehouse. We didn't see support for VMware in Havana, but we now see it in Icehouse, and as we speak, a lot more metrics and monitoring are being added, which will be really useful for diagnostics, chargeback, and whatever other use cases can be built on top of Ceilometer. This is also something we'd really like to use for other automation, like automatic ticket filing if something goes wrong in the farm.

To summarize the lessons we learned from the test/dev use case at NetApp: we really like the fact that OpenStack is highly customizable and extensible; we made changes, added our own service, and combined it with the OpenStack installation to suit our use case. The seamless support for multiple hypervisors, for maximum test coverage, is a big plus for us. Ceilometer's capabilities are still evolving, but the potential of a one-stop monitoring service is highly desirable for us. And as the gap between the Nova scheduler and enterprise-grade schedulers continues to shrink, it makes our transition that much simpler. Thank you. Back to John.

So with that we're going to go ahead and open up for questions. We've got a microphone in the aisle here.
If you don't want to get up in the aisle, you can raise your hand and we'll still answer your questions to the best of our ability. Yes, sir?

Sure. So, from the storage perspective, yes: the question was about secure multi-tenancy and how we implement it. It starts with the storage. I talked about the storage virtual machine, the secure entity within the storage cluster. In traditional NetApp you would create a volume and export it, or carve a LUN out of a volume and share that out. With clustered Data ONTAP you don't do that from the admin account anymore: you have to create an SVM, and that can handle NFS, or SAN, or both. If you want that SVM to share networks and IP spaces with the other SVMs, you can, but it can also have its own routing tables, its own IP space, things like that. You can lock down ports just like you would with Linux: you can say 80 and 443 and 22, and shut everything else down. There's some pretty good granularity in how you can secure the storage. On top of that, we can obviously do VLANs between the storage and the network, and that can go all the way up to the application. The Cisco Nexus and the Cisco UCS have capabilities that extend that virtualization and that security, and of course RHEL has iptables, cgroups, SELinux, and so on. So it's not just one mechanism; there are multiple layers of security at each level that complement each other. And that can be used tenant to tenant: we can keep Coke separate from Pepsi, we can keep marketing separate from sales, and so on. Yes, sir?

Yeah, I may have missed this because I was a little late, but could you elaborate a bit on what distro of OpenStack you used, or did you build it yourself? And you mentioned something about a FlexPod; does that come with a distro of OpenStack, or how does that fit in?

Absolutely. So we ended up using RHEL-OSP, Red Hat Enterprise Linux OpenStack Platform, for multiple reasons: being a top contributor in the ecosystem, support, lifecycle, training, a lot of the security certifications they've already achieved with RHEL-OSP, and the fact that there's a lot there that's attractive to enterprise customers. FlexPod refers to a converged infrastructure built around Cisco and NetApp. It is not part of the distribution; it's an example converged infrastructure. There are other converged infrastructures out there you can choose from if you go that route, but from our standpoint, and obviously NetApp has a vested interest in FlexPod, enterprise customers can really benefit from the pre-validated designs and the fact that you can fully automate the entire stack, top to bottom. Anybody else? Okay, great.

Thanks, guys. One quick question. Can you talk a little bit about whether or not you felt a need to segregate your storage traffic from all the other types of traffic the stack brings? And if so, what was your approach? If you can get into the details on the Cisco side, it'd be interesting to hear.

Yeah, so the question was: did we separate the storage traffic from the other traffic, and how did we do that? The answer is absolutely yes. My professional opinion is that not only do I like to keep storage traffic separate from management traffic, separate from data traffic, I like to segregate the individual storage protocols too, keeping NFS separate from iSCSI.
And in a true secure multi-tenant environment, I would even go so far as to say that tenant A gets its own NFS VLAN and its own iSCSI VLAN. It's a little more up-front administration and configuration, but from the standpoint of keeping everybody separate, keeping everybody in their own swim lanes, it's critical, especially to enterprise customers, public-sector customers, and the like. The means of separating that is not only VLANs but also different IP spaces and so on; a lot of layer-2 and some layer-3 work, but that's pretty much how we achieve it. It's configured in the Nexus switches, and then configured on the storage and on the Linux side as well.

Yes — so Rodney was asking what the limit is on the number of VLANs, and I'm pretty sure that comes down to the Cisco switches. I'm not sure of the upper end; it's more than 100 for sure, and I would think a couple thousand if I'm not mistaken. 4,000? Yeah, about 4,000 VLANs. Thank you.

Anybody else? You coming up, or you leaving? Oh, he's leaving. If you want to know more about the Cinder drivers we're working on, all of that is available at the NetApp booth downstairs. The Cinder drivers are covered in a document called the NetApp installation and administration guide; it has everything you want to know about the Cinder drivers and what will become the Manila drivers. So thanks.
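As a closing illustration of the VLAN-segregation approach from the Q&A, here's a hedged sketch of carving a dedicated provider VLAN for one tenant's NFS traffic with the OpenStack SDK. The physical-network name, VLAN ID, and subnet are placeholders for whatever the Nexus switches are actually trunking.

```python
# Hedged sketch: a dedicated provider VLAN for tenant A's NFS traffic.
# The physnet name, VLAN ID, and subnet below are placeholders.
import openstack

conn = openstack.connect(cloud='mycloud')

storage_net = conn.network.create_network(
    name='tenant-a-nfs',
    provider_network_type='vlan',
    provider_physical_network='physnet1',  # OVS bridge mapping on the hosts
    provider_segmentation_id=210,          # one VLAN tag of the ~4,000 available
)
conn.network.create_subnet(
    network_id=storage_net.id,
    ip_version=4,
    cidr='192.168.210.0/24',
    enable_dhcp=False,  # storage endpoints are statically addressed
)
```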