All right. Welcome. Thank you for coming in the morning. So our talk today is, well, the primary purpose is to talk about using Manila with Docker and containers. The title we thought was pretty catchy, and we didn't realize how many people would catch the overlap between Manila the project, which is named after the manila envelope, and Manila the city. So anyway, the goal here is to talk about how we provided a service for Chinese universities, and it was done out of the China Research Lab at IBM. The goal had several purposes. One of them was to promote the use of OpenPOWER. Another one was to show that you can build cloud platforms on top of the OpenPOWER platform, and also to provide a development platform for students to get involved and start writing applications, basically getting them involved in the whole cloud-building infrastructure. So this is the team. Unfortunately, the people from the China Research Lab couldn't make it, and Michael Hines, who was on assignment there, also couldn't make it. We've been working with that team a lot, helping them get this thing up and running. So we're going to talk about it. The rest of us are from the team working with Spectrum Scale, otherwise known as GPFS, building cloud and OpenStack integration with Spectrum Scale. So the goal here is: we want to show why we built it, expanding a little more on what I was just saying; what are all the pieces that we put together to make it work; then give a demo of the system and what the students actually see when they are using it; and talk a little bit about the challenges of integrating all of these different projects together inside of OpenStack. So here's an overview of the system that has been provided to the students. You can see it's used by about 30 universities right now across China. There are a lot of different services that they've put up that the students can get involved with and use depending on their interests.
You can see a big data service, a cloud service, an HPC-type service, and a lot of other aspects to the system. So I think I'm going to pass it off to Bill now, who is going to talk more about the details. All right, thank you. OK, so as Dean mentioned, a lot of the people working on this project are from research. And it seems to me that people in research want to try and get as many different components as possible, as disparate as possible, and see if they can make them all work together. So in the previous slide, there are a lot of different pieces; one of the research goals is just to prove that they work together and to select the right components for the production system. For the analytics piece, the big data analytics service, these are the main building blocks. To build an analytics service, you need storage, you need compute, and you need an application. To provide storage, we're using OpenStack Manila. This is the shared file system service, and I'll talk a little more about the details of that next. Then underneath your Manila service, you need a storage back end. You need a way of making storage available to all of your different consumers, be it Manila or the other consumers in the system. And it needs to be extensible and maintainable. That's where we're using OpenStack, I'm sorry, IBM Spectrum Scale. Then for compute instances, we want something that's fast, easy to use, easy to bring up, and doesn't consume a lot of resources. That's where Docker containers fit. Now that you've got your compute and your storage, you need to be able to provision and easily manage them. So we're using OpenStack Heat for the overall management of the environment, being able to provision both the storage and the containers. And finally, what's the analytics engine? That's where Apache Spark comes in. So we'll go through each one of these now in a little more detail and explain how we're using them.
So Manila is a project that's been in OpenStack for a year and a half, maybe two years. It's matured quite a bit over the last year, and its main goal in life is providing a shared file system to compute instances. Like Cinder and other storage services, it provides a vendor-neutral set of APIs for provisioning and attaching file system-based storage, and it supports protocols like NFS, SMB, and, in the future, other protocols like the native GPFS NSD protocol. It supports access control for the shares and multi-tenancy, and also supports the operations that you might expect: being able to manage file system shares, so create, delete, and list shares; manage access to those shares, so define which instances can mount different shares and manage the share access rules as well; and finally, operations for preserving the contents of the share itself, so being able to do snapshots or clones of the shares in the file system. So that's Manila at a high level. For a typical use case, I think most people have probably seen this, but you've got an environment here where on the left-hand side you've got the share provider service. This is your controller node. You've got a number of different shares that have been created, as well as new ones you want to create. On the right, you've got your compute environment. This is where your instances are running or can be launched, and instances, again, can be VMs or they can be Docker containers. So the first use case is: you've got an existing share, and you want to make it available to a certain number of compute instances. With Manila, you can say, for my R&D share, I want to publish that, or make it available, to compute instances five and seven. So with Manila you set that up, and then from those compute instances you can mount that shared file system and begin using the data, publishing or consuming it. The next use case is that you don't have the share yet.
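As a rough sketch, the two workflows just described map onto the Manila CLI something like this (the share name, size, and instance IPs are invented for illustration, and exact flags vary by release):

```
# Create a 10 GB NFS share, then grant access to two instances by IP
$ manila create NFS 10 --name rnd-share
$ manila access-allow rnd-share ip 192.168.1.5
$ manila access-allow rnd-share ip 192.168.1.7

# From inside an allowed instance, mount it using the export location
# reported by `manila show rnd-share`
$ mount -t nfs <export-location> /mnt/rnd
```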
You can create that from the Manila interface. You can specify things like quotas, how much space is available to its users, and then in the same way you can publish that shared file system to one or more compute instances. And we don't show it here, but with Manila you can publish that to VM instances, Docker instances, as well as bare metal instances; it's just a set of IP addresses that you're publishing to. So it's a very general service for managing shared file systems. Okay, next, a little bit of information about Spectrum Scale. This is the product formerly known as GPFS. And as we show here, it's a high performance clustered file system. But what we're excited about, and how we're using it in the OpenStack environment, is that it provides a data plane for all of the different kinds of use cases you want to deploy on this storage. So from the OpenStack perspective, we've got Nova, Cinder, Manila, and Glance integration. We also have object storage that's integrated tightly with the Spectrum Scale environment coming out soon. And below the data plane, you've got a variety of different kinds of storage. You can have SSDs; you can have storage controllers, like an XIV or a V7000 or another vendor's storage; you can have a GNR (GPFS Native RAID) environment; you can even have tape and cloud storage. All of these are configured within Spectrum Scale as storage pools. And by doing that, that allows you to do things like create policies that automatically move data from one level or class of storage to another, depending on the definitions in your policy. So I can write a policy that says if data is being accessed frequently, if it's hot, I want to move it to SSD and reduce latency. Conversely, if I have data that's rarely accessed, hasn't been accessed for a week or a month, I can automatically tier it off to a lower class of storage, tape or even another cloud system.
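As an illustrative sketch of what such a tiering policy looks like, here is a pair of rules in the Spectrum Scale ILM policy language (the pool names 'ssd' and 'sata' and the 30-day threshold are invented for the example; consult the Spectrum Scale documentation for the exact syntax):

```
/* Place newly created files on the fast SSD pool */
RULE 'place-hot' SET POOL 'ssd'

/* Migrate files that have not been accessed for 30 days
   down to the capacity pool */
RULE 'cool-down' MIGRATE FROM POOL 'ssd' TO POOL 'sata'
    WHERE (DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 30
```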
So those are a couple of the key features that we really like in GPFS and are trying to leverage in the OpenStack environment. The key is that we're trying to provide data in the best location, on the best tier, at the right time, and make that data available across the data plane. So then how do we integrate Manila with Spectrum Scale? We have a GPFS Manila driver that leverages the features of GPFS with Manila. It supports both the kernel and Ganesha NFS servers. Again, we take advantage of the tiered, policy-based storage pools, and for quota management, when we create a share within GPFS, that's a GPFS fileset, and we define the usage quotas at the GPFS level, so that provides a very convenient way of managing that for our users. You have other enterprise features, like encryption and compression, that can be enabled as part of the environment as well. And the Manila GPFS driver is now available in the Kilo release. So again, how does this look in a real environment? We've got our controller node, with a number of shares that have been created there or can be created there. You've got a number of compute nodes, and within the compute nodes you've got the shares mounted. Manila helps manage mounting those, and then with mount automation that we'll show in the demo, we're able to spin up VMs or containers and have the correct Manila share mounted and available for use as soon as the VM or the Docker instance comes up. Okay, that's at a high level what we're doing with Spectrum Scale and Manila. I'm gonna hand it to Nilesh now. Hi, so thanks Bill for taking us through some of the building blocks in this solution, like Manila and GPFS. Now I'll talk a little bit about how Docker containers and Heat help us provide the end-to-end solution. So why are we using Docker containers? As you know, Docker containers are lightweight and pretty fast; they come up pretty fast and can be destroyed pretty fast as compared to VMs.
And they also provide a very high density on a particular compute node as compared to VMs, right? So we use Docker containers for various purposes. Since this particular deployment that we are working on, the SuperVessel Cloud, has the Juno release deployed on its hosts, and we were using the Manila driver for GPFS from the Kilo trunk, they're running all the Manila services themselves inside a Docker container, which provides a separate environment to run those services. And we use Docker containers to deploy and run some of the Python demos. So those are the typical usages: the Manila services run inside a Docker container, we use Docker containers to deploy big data clusters, and we use Heat to do all of that. So why Manila? The Docker containers running a big data service needed a shared file system to access the ingested data as well as to put the results onto. Since we needed a shared file system, and a shared file system in an OpenStack environment is provided by Manila, right? With the GPFS driver you get NFS shares that can be created within the OpenStack ecosystem and made available inside the containers using Heat orchestration. So that's why we are using Manila. Next, why Heat, and how does that help? Heat, as you know, is the orchestration engine inside the OpenStack ecosystem. It is based on a templating mechanism: you create templates wherein you specify the resources that you want to use inside your deployment and how you want to deploy using those resources. The way we use Heat inside our deployment is that we create templates where we specify the Manila shares, or all the parameters required for the Manila shares, and create stacks. So the Heat stack will create clusters, cluster one, cluster two, for a particular user.
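A minimal HOT sketch of what such a template might look like is below. The resource and image names are invented, and the OS::Manila::Share resource type arrived in Heat after Kilo, so the production templates may well have used different glue; this is only meant to show the shape of the approach:

```yaml
heat_template_version: 2014-10-16

resources:
  user_share:
    type: OS::Manila::Share
    properties:
      share_protocol: NFS
      size: 10          # GB, chosen by the user in the dashboard

  spark_master:
    type: OS::Nova::Server
    # depends_on also fixes the teardown order: on stack delete,
    # Heat removes the container before the share it mounts
    depends_on: user_share
    properties:
      image: spark-docker-image   # Docker image stored in Glance
      flavor: m1.small
```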
So the user logs in and requests a Spark cluster for these big data applications, right? These containers are created accordingly, using the Heat template behind the scenes. So how does everything fit together? This diagram shows how all these pieces, Manila, GPFS, containers, and Heat, fit together to provide a solution. This is an overview architecture of the SuperVessel service that is provided. You have the UI, user account management, et cetera, at the top layer, and a user dashboard from where the user can access services like the cloud infrastructure service or the big data service. We are focusing on the big data service in this talk. So the user accesses the big data service, and behind the scenes the Heat template comes into the picture, which works with Nova, Neutron, and Manila. It creates a Manila share and creates a subnet using Neutron to launch the Docker instances of the requested size; you can have three instances or five instances. This is for students, right? To learn big data. Glance is used as the repository to store the Docker images, so Glance also comes into the picture. And this is how a shared folder is created by the Heat engine for the various clusters: user A and user B each get their own shared folder, which is automatically mounted inside the containers they get for the big data analytics work they want to do. This is a bit deeper look into the system. As you can see, the billing and authentication are there, and the big data dashboard is there, from where the user logs in and makes requests. The Heat template, as I said, communicates with Neutron, Manila, nova-docker, and Glance to give you a Spark cluster: container one, container two, container three, one of them being the master and the rest being the slave nodes in the Spark cluster. So we have a small demo. This demo will show you how all these things happen.
In this demo, there are two scenarios. In the first scenario, you create a Spark cluster: the user logs in to the SuperVessel service. By the way, this SuperVessel service, from PT OpenLab, is online and publicly available for anyone to log in and try these things out. So the user logs in, creates a Spark cluster, and gets pollution data, a PM2.5 history for the last year. He gets that onto the shared folder: whenever he logs into the Docker container, the master node, he gets a shared folder that was created out of Manila and is pre-mounted inside all these containers. So he pulls in this pollution data and puts it into the shared folder, which is then available across all the containers doing the analytics work, and then launches an analytics service, which analyzes this data and produces results. The second scenario is that we wanted to show that the same data, the same Manila share, can be mounted inside another Nova instance and be utilized. So that's what this demo is about, right? It shows you the SuperVessel service with various types of services available on top of the Power cloud. You enter the big data service and create one cluster; you choose what kind of cluster you want, Spark or MapReduce, and we are doing Spark in this environment. You select the disk size you want, which translates into the Manila share size behind the scenes. Now the Spark cluster is getting created: all the containers are created, and the Manila share is first created and then mounted inside all these containers. The user gets an IP address to log in to the master node, the floating IP as shown over here. He logs in. You need to run a VPN to log in to this service, right? So you can see the Manila share is already available and mounted inside this container; you cd to the mount point and see that it is empty right now.
Then you get the pollution data, as I said; you enter and you can see all the pollution data for the last year. PM2.5.txt has the number of particles, right? It is actually the number of particles of the polluting agents per day. Then you run the analytics service, run.sh. Within 20 seconds, the last year's pollution data is analyzed and you get the results. As you can see, you get the grade one type of pollutants and the grade two type of pollutants, and you see the dates where the pollution was the worst. You also get to see the average pollution value across the last year's history. So now we are getting into the second scenario, wherein we show how we can use the same data in another instance. We are using a Python science image to launch another instance. The image is launched, and you can get into it. You can mount the same data inside this instance and then use it. So this is a Python notebook: you log in to the web-based notebook for Python processing. It's a notebook wherein you can write scientific programs, and here it is filling in the data from the same file that we saw initially, the one that was put inside the shared folder. Now it is available on this instance as well. So it is actually reading the pollution data out of this file and then plotting it, and you get a plot of the pollution that has happened over the last year. So thanks for watching this video, and welcome to SuperVessel. That is all about this demo. We have another, much more detailed demo, 10 or 15 minutes or so, which shows what happens behind the scenes. We'll post that along with this video when it is released on YouTube. So what are the challenges that we faced, and what are the learnings? We wanted to share that. Connecting storage to Docker is currently not supported, right? So you do not have a way to attach a volume or mount a share inside a container through Nova.
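Looping back to the demo's analytics step for a moment: the run.sh script itself isn't shown, but the kind of aggregation it performs, averaging the year's PM2.5 readings and finding the worst day, can be sketched in a few lines of Python. The "date value" input format and the field layout are assumptions for illustration, not taken from the demo:

```python
# Sketch of a PM2.5 aggregation similar in spirit to the demo's run.sh.
# Assumed input format: one "YYYY-MM-DD value" pair per line.

def analyze_pm25(lines):
    """Return (average, worst_date, worst_value) for the readings."""
    readings = []
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip malformed lines
        readings.append((parts[0], float(parts[1])))
    if not readings:
        raise ValueError("no readings found")
    avg = sum(v for _, v in readings) / len(readings)
    worst_date, worst_value = max(readings, key=lambda r: r[1])
    return avg, worst_date, worst_value

if __name__ == "__main__":
    sample = ["2014-01-01 35.0", "2014-01-02 180.5", "2014-01-03 64.5"]
    avg, worst_date, worst_value = analyze_pm25(sample)
    print(avg, worst_date, worst_value)
```

In the demo the input file would live on the Manila share mounted in every container, so the master node and the Python notebook instance can both read the same data.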
As I said, the Nova APIs for attaching storage to containers are not available. So what we do is bind mount these Cinder volumes or Manila shares inside the container; we actually use the Python equivalent of that inside Nova, so that the shares are available inside the containers whenever a container boots. Also, the Python client in the Juno release did not support token-based authentication, so we had to pull in some patches from the Kilo release and make that available, because our base deployment was based on Juno. Some other learnings came while we were experimenting with Heat. You have to get the ordering inside the Heat template correct, right? You have to first delete the containers and release the resources, and only then delete the storage. It was happening that we were first deleting the storage, so the shares were getting deleted and the containers were no longer able to access them. So that has to be done properly. Manila is not yet supported in Ceilometer, right? So we cannot get statistics on the usage of the shares, et cetera, from Ceilometer. Currently it is an open and free service, but going forward, if someone wants to build a paid kind of service on top of the SuperVessel Cloud, you'll want the billing information, how much access happened to the share, et cetera, and at that time you'll need all these metrics, and none of the other metering components support these metrics either. So that needs to be overcome; currently it is an unsolved problem, and this is one of the challenges that we faced. So that's pretty much it. I think we are pretty good on time. So, any questions? Sahara? I don't know. One thing right now is that Sahara doesn't natively support containers, as far as I know. There's still work to be done there, so that's part of the reason. As well, Sahara is not natively hooked into Manila, so that's actually something we would really like to work on, getting those pieces all fitting together, right?
So that when Sahara launches, you can provision storage for the analytics you want to run and then be able to use those pieces. You can see how many pieces were put together here; Sahara would be yet another piece to link into the overall system. And then the third reason is that Sahara currently doesn't support any file system other than HDFS. So we're working on that as well. Yeah, it has some hooks, but yeah. Running Spark is okay, but running general Hadoop workloads, I don't know if you saw our talk on Monday, is challenging with Swift as well. So it's about getting all of those pieces put together, yeah. So we have Spark, and we have a MapReduce service also, right? And all these things are about binding those together, and also Manila is relatively new in the community, so it's not that well integrated with Sahara and some of these other things. Inside nova-docker, so basically, yeah. We'll repeat the questions and then move on. Yeah, you can ask the question from the mic. I would like to understand where in Nova this takes place, at what stage of the provisioning process? Right, so since Nova currently does not support Manila APIs, you cannot use Nova APIs to mount a share inside a container, right? So we pass user data on to nova-docker, which will then create a container based on that user data and mount the share inside. Okay, another question I have is: in your demo, you showed logging into something and then running the job. Was that the container itself? Yeah, it's the master node of the cluster. Okay, so does that mean you had an SSHD running in the container? How does that work? Right, so you get a floating IP when you create a cluster, and you get a VPN account; you log in to that particular floating IP, and then port forwarding happens to land you inside the container.
Do you actually run SSHD inside the container itself? Yeah. I'm asking because one of the best practices around containers, at least one that people hear a lot, is that you use one app, one process, one service per container. So I was wondering if, in your case, you were using containers more like operating systems? Yeah, kind of, but it is mainly for the big data service, and you can SSH in because you need to pull in the data that you want to process, right. Okay, thank you. Yeah. Any other questions? Oh, okay. All right, thank you very much. Thanks a lot. Thank you.