Okay, it's time to start our presentation. Thank you for choosing our presentation as your first breakout session at the Tokyo Summit. This presentation is a user story from NTT Resonant. We are glad to have received a Superuser Award, and we thank the whole community for enabling our services. It is a great honor to be part of the OpenStack community. Over the next 40 minutes, we will share as much as possible of what we learned building a web infrastructure with OpenStack.

We work for the NTT Group and have these backgrounds. My name is Toshikazu Ichikawa, working for NTT. This is Tomoya Hashimoto, working for NTT Resonant. And this is Kazuhiro Toriyama, also working for NTT Resonant.

In this presentation, we cover five topics. After introducing NTT Resonant's business, we talk about our infrastructure design, that is, how we applied OpenStack to our infrastructure. We then talk about not only the infrastructure but also the framework we built on top of OpenStack to support our services; in particular, we explain how we set up a VM using Puppet in connection with OpenStack. After that, we talk about our operation and monitoring of both OpenStack and the VMs. At the end, we share our current issues and future plans.

Let's start with NTT Resonant's business. This slide shows what the NTT Group is and the position of NTT Resonant within the group. The NTT Group is one of the largest information and communication technology companies. Our global operations generate a total revenue of $112 billion, and we have 240,000 skilled professionals. We are the largest brand in data hosting and global IP backbone. NTT Resonant is a wholly owned subsidiary of the NTT Group. Its parents are NTT Communications, which provides long-distance and international communication services, and NTT DOCOMO, which provides mobile communication services. NTT Resonant collaborates with NTT R&D to build its platform, develop new technology, and provide solutions to customers. This is the business area of NTT Resonant.
NTT Resonant is responsible for providing the internet services of the NTT Group, and we provide a wide range of services to individuals. We provide the internet portal site goo, the e-commerce site for communication devices NTT-X Store, smartphone applications, and so on. Moreover, based on the technology we fostered in B2C services, we also provide platform-type services and customer-centric services for enterprise business. For example, we provide a cloud-based testing solution for mobile application development, a disaster prevention and response solution, and a healthcare solution. If you are a smartphone application developer, please check our Remote TestKit: you can use physical smartphones in our data center for your development. We also provide and operate many services for customers, even though we are not in a position to disclose the names of those services here. For example, we operate some services of Japan's largest mobile network operator, DOCOMO: we operate search services, content services, and mobile applications for them.

NTT Resonant's core business is the web portal site goo. goo provides more than 60 services including web search, blogging, news, and a Q&A site. Even though I won't explain the details, I believe this image gives you an idea of goo's variety. goo was launched in 1997; March of this year was its 18th anniversary. And what is the scale of goo? It serves 117 million unique users per month, it generates 1 billion page views per month, and it is growing day by day. goo is the third largest web portal in Japan. If you are coming from a foreign country: "goo" and "Google" are similar names, but they are different companies, and actually goo is about one year older than Google. And now goo is operating on top of OpenStack.

That's the overview of NTT Resonant's business. Now let's move on to the first main topic, infrastructure design. Let me talk about how we met OpenStack. We started our OpenStack deployment project in March 2014.
I will explain what was required of us, what we had to achieve at the time. First, it had already been decided that we would stop using the existing data center and migrate our systems to another data center for business reasons. The existing contract term was fixed, so we had to finish migrating our systems to the new data center by the end date. Secondly, we needed to accelerate our business speed to match public clouds. At that time, we already used KVM virtualization at large scale, but operators created and managed VMs manually. Manual operation has a speed limit, and it was a problem that we couldn't create a lot of VMs within a short time frame. We had to resolve that. Thirdly, we needed to provide a common and effective method to deploy not only a VM itself but also the middleware and configuration inside the VM. If we provided only a way to create and manage VMs, it would be no different from subscribing to a public cloud service. What our application engineers wanted was a framework covering our whole workflow for providing web services: not only creating a VM, but also installation and configuration of middleware, as well as 24x7 system monitoring. I will explain how we addressed and achieved these from now on.

This figure shows our organization and team structure. The NTT Resonant platform team is responsible for providing the infrastructure to the service teams. The team has about 10 members, versus more than 300 members on the service teams demanding resources, so it is tough work to respond to them quickly and efficiently. The platform team worked with NTT R&D to design this project, since NTT R&D had knowledge of OpenStack. We jointly designed and tested the OpenStack environment for the NTT Resonant platform. When we find issues in OpenStack, NTT R&D works to fix them upstream in the OpenStack community. We follow this approach to keep consistency with the community as much as possible.
This is our timeline: how we deployed OpenStack and moved our services onto it. Our project started in March 2014. Given our internal processes and the procurement time from hardware vendors, we estimated OpenStack would be able to start working in October 2014 at the earliest. Four months after that, we had to finish migrating our services from the existing DC onto the new OpenStack infrastructure. We achieved this timeline, so today we are here and can share our story.

This is the scale of our OpenStack infrastructure. We introduced OpenStack as a private cloud infrastructure at our main data center. It has been working in production since October 2014, stable for more than one year. As of now, OpenStack supports more than 80 services and one billion page views per month. It consists of 400 hypervisors with 4,800 physical cores, and more than 1,800 virtual servers run on top of it. As you can see in the graph at the bottom right, the number of VMs keeps increasing day by day. To support this scale, we use Nova cells to accommodate a lot of servers behind a single endpoint.

These are the components we chose. We used the Icehouse release, as it was the latest release when the project started. Since the project time and resources were limited, we tried to keep our infrastructure as simple as possible. Because OpenStack is well designed as a set of components, it gives us the flexibility to begin with a small set of functions and expand later if needed. The OpenStack community is very active and provides a lot of components and choices under the big tent, so I believe it's important to figure out the minimum set that satisfies your requirements as your baseline. Choosing a small set also reduces the complexity of day-by-day operation. We chose six components to build our infrastructure: Swift, Nova, Glance, Horizon, Keystone, and Neutron.
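For reference, a Nova cells arrangement of the kind mentioned above is enabled through `nova.conf`; this is only an illustrative sketch for the Icehouse-era cells v1 options (the cell names and the exact layout here are invented, not NTT Resonant's actual configuration):

```ini
# nova.conf on the top-level (API) cell controller -- illustrative only
[DEFAULT]
compute_api_class = nova.compute.cells_api.ComputeCellsAPI

[cells]
enable = true
name = api
cell_type = api

# nova.conf on each child (compute) cell controller -- illustrative only
[cells]
enable = true
name = cell01
cell_type = compute
```

With this split, the API cell owns the single user-facing endpoint while child cells each run their own scheduler and database, which is what lets one endpoint front hundreds of hypervisors.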
We didn't use Cinder, since we had always built our services with commodity servers without shared storage, so we just followed the same approach.

This is how we deployed OpenStack. We used the OpenStack packages distributed by the RDO community. The version of OpenStack was, as said, Icehouse, and we installed CentOS 6 as the host OS for the servers. The reason we chose this distribution is that our engineers were familiar with CentOS and Puppet. We therefore used Puppet to automate the installation, as infrastructure code. The RDO community already had a set of Puppet manifests to do that; we referred to them, so we could do it easily. Automation is important to enable us to set up a lot of servers quickly.

This is how we designed the networking. Networking is a fun part of OpenStack design: there are a lot of features and varieties you can deploy. In this project, our initial step focused on the management of VMs rather than networking, so we chose to begin with a simple networking architecture, namely the Neutron provider network implementation. It only handles Layer 2 connectivity for VMs, so we needed to configure physical network appliances such as routers and load balancers manually, outside OpenStack, just as we did before deploying OpenStack. An administrator manages the configuration of network devices for each tenant; tenants are not able to manage their own virtual networks. That's the current status. The community networking guide has a good reference architecture that is similar to our implementation; if you need more detail, I recommend checking the guide shown here.

This is our system architecture and HA strategy. The system consists of six types of nodes. All nodes except compute nodes have a high-availability mechanism. To make them highly available, we use RabbitMQ, MariaDB with Galera Cluster, Pacemaker, and HAProxy. By deploying at least two nodes per node type, they are fully redundant and tolerate a single node failure. However, compute nodes don't have an HA mechanism.
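As a rough sketch of the HAProxy piece of such a redundant control plane (the addresses and backend names here are invented for illustration, not our actual configuration), each API endpoint is balanced across at least two controllers like this:

```
# haproxy.cfg fragment -- illustrative addresses only
listen keystone_public
    bind 192.0.2.10:5000
    balance roundrobin
    option httpchk
    server controller01 192.0.2.11:5000 check inter 2000 rise 2 fall 5
    server controller02 192.0.2.12:5000 check inter 2000 rise 2 fall 5
```

In this pattern Pacemaker typically keeps the virtual IP (192.0.2.10 above) alive on one of the HAProxy nodes, so a single node failure is tolerated without clients noticing.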
The system doesn't provide high availability for VMs; that's a design concept. It is the responsibility of the OpenStack user and the application to take care of the high availability of the services.

Let me talk about what we contributed to the community related to this deployment. When we tested OpenStack, we found some bugs, so we fixed them and contributed the fixes upstream. One bug was critical for us: shelve, a Nova function, didn't work with a Nova cells deployment at the time. We use shelve and unshelve as part of our workflow; our users and administrators move VMs from one compute node to another with shelve and unshelve to do maintenance on servers, so it was important. Sometimes when you use a feature in a minor use case, you may hit a bug or find a problem. I want to say that it is important to share issues and practices with each other in the community; it may help you find a solution more efficiently than struggling alone.

Let me also talk about which parts of OpenStack we customized to fit our environment. We didn't want to modify the code, because we know the cost of maintaining modifications. However, we decided to modify Horizon. The reason is to enforce our operational rules in the environment; for example, VM naming restrictions assumed by our other tools. As shown in the figure, our Horizon pops up a warning if the name of a VM doesn't match our rules. Because we assume our users only go through Horizon rather than the API, we didn't have to modify other components except for bug-fix backports. I believe each team has its own operational rules, workflows, and requirements for integration with other systems; to enforce or achieve them, you may need some development.

That's how we built our OpenStack infrastructure. Next, we talk about the framework we built on top of OpenStack for our services. From now on, I'd like to explain how we set up our VMs on top of OpenStack.
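The VM naming check mentioned above is essentially one regular expression. The talk doesn't show the actual rule, so this is a minimal sketch assuming a hypothetical rule such as "lowercase service prefix, role, and a two-digit index":

```python
import re

# Hypothetical naming rule for illustration: <service>-<role><NN>,
# e.g. "blog-web01"; the real NTT Resonant rule is not public.
VM_NAME_RE = re.compile(r"^[a-z][a-z0-9]*-[a-z]+[0-9]{2}$")

def check_vm_name(name: str) -> bool:
    """Return True if the VM name matches the (hypothetical) operation rule."""
    return VM_NAME_RE.match(name) is not None

# In a Horizon customization, a check like this would run in the launch
# form's validation and trigger the warning popup on mismatch.
print(check_vm_name("blog-web01"))  # True: matches the hypothetical rule
print(check_vm_name("MyServer"))    # False: violates it
```

Keeping the rule in one place like this is also what makes the external tools mentioned in the talk (which assume the naming convention) safe to build.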
As we said before, we had to complete everything, from VM creation to service migration, within four months after OpenStack started in production. If we had simply built OpenStack and asked the engineers on the service side to set up the OS and middleware, it was unlikely that the migration would complete within four months: we had about 1,300 VMs to migrate. We needed to automate the steps as much as possible. It was fortunate for us that we had already been implementing our configuration as Puppet manifests; since before this project, we had Puppet manifests at our existing data center. So we thought it would be possible to achieve the timeline if these Puppet manifests worked well in connection with OpenStack. We built a framework to apply Puppet manifests to a VM on top of OpenStack quickly and easily.

I'd like to explain how we set up our VMs with Puppet. We deploy an individual Puppet master per tenant. The Puppet master executes all VM setups in its tenant. VM setup includes creating Linux accounts, installing software such as Apache and MySQL, and distributing configuration files. The administrator of a tenant writes the manifests for the VMs in that tenant. Even though we deploy a Puppet master per tenant, we keep a single repository of manifests, and each Puppet master refers to that repository.

After an OpenStack user creates a VM using Horizon, the user has to apply a Puppet manifest to the VM. We created a synchronization tool to make this happen quickly and easily. What is required to achieve it, other than OpenStack and Puppet? We have to add a record for the VM to DNS and LDAP, and change our Puppet manifests. Puppet uses the host name as the key to apply a manifest, so the VM's host name needs to be resolvable via DNS right after the VM is created. We use LDAP to manage groups of hosts: when we have, say, 10 web servers for a service, we want to handle them as a group with Puppet, and we have to manage the VMs in LDAP to do that.
When we add a group of hosts, we also have to add an entry for the host group to the Puppet manifests. We implemented the synchronization tool to manage the entries in DNS and LDAP by using the OpenStack API when a VM is created. Let's see how it works. A VM is created in a tenant. The image file in Glance has a Puppet agent installed, so the VM is able to connect to our Puppet master. Our synchronization tool polls the Nova API every five minutes and detects the new VM. The tool registers records in DNS and LDAP. The tool also modifies the Puppet manifest for the new VM and commits it to the repository. The OpenStack user then updates the Puppet manifest from the repository and is able to apply it to the VM. We did these steps manually before we deployed OpenStack; after deploying OpenStack, we automated them using the OpenStack API.

We addressed and resolved our issues with the mechanism I have explained, and we were able to shorten the time to deploy our services drastically. We deployed 1,000 VMs within one month right after OpenStack started in production in October 2014. That would have been impossible with manual operation as we used to do it. Before OpenStack was introduced, we needed five business days to prepare a new VM; today, it is possible to deploy a new VM and start the service within 30 minutes. We could also free up two operators' worth of work by reducing manual operation. We got further benefits from integrating service deployment with OpenStack: when we start a new service, the service engineers can prepare the system by following this common mechanism, and when an engineer assists another project or hands over tasks, it is done efficiently thanks to the common framework.

Okay, I'll show you our monitoring environment briefly. First of all, here is the overview. We have two monitoring environments, two monitoring systems. One is for the cloud infrastructure: this side monitors the network, the physical servers, and OpenStack itself.
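The synchronization tool's cycle described a moment ago (poll Nova, register new VMs in DNS/LDAP, update the manifests) can be sketched with stand-in functions; every name below is an assumption for illustration, not NTT Resonant's actual tool, and the real Nova/DNS/LDAP/Git calls are replaced by stubs:

```python
# Sketch of one synchronization cycle: poll for VMs, then register
# any new ones. Stand-ins replace the real Nova/DNS/LDAP/Git integrations.

known_hosts = set()  # host names already registered

def list_servers():
    # Stand-in for a Nova API listing (e.g. novaclient servers.list()).
    return [{"name": "blog-web01", "ip": "192.0.2.21"},
            {"name": "blog-web02", "ip": "192.0.2.22"}]

def register(vm):
    # Stand-in for adding the DNS A record, the LDAP host-group entry,
    # and the node entry committed to the Puppet manifest repository.
    print(f"register {vm['name']} -> {vm['ip']}")

def sync_once():
    """One polling cycle (the real tool runs this every five minutes)."""
    new = [vm for vm in list_servers() if vm["name"] not in known_hosts]
    for vm in new:
        register(vm)
        known_hosts.add(vm["name"])
    return [vm["name"] for vm in new]

print(sync_once())  # first cycle registers both VMs
print(sync_once())  # second cycle finds nothing new: []
```

The design point is idempotence: because each cycle diffs the cloud's view against what is already registered, the tool can run on a timer and safely re-run after failures.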
The second one is for the web services. As the infrastructure team of NTT Resonant, we are in charge of providing the standard service-monitoring methods inside our company, even on OpenStack; I'll discuss them later. We utilize Zabbix and Redmine for monitoring. One of the features is semi-automatic VM monitoring: our Zabbix detects VMs automatically and monitors them. The second one is automatic ticket creation in Redmine: once our Zabbix detects an alert, it automatically creates a new ticket. There is also an operations center for us: operators watch Zabbix 24x7 and make a first response to simple problems. But in case of a serious situation, a telephone call wakes us up, even at midnight. If our OpenStack has any trouble, they call the infrastructure team members, and if any web service has a problem, they call the web service team members.

Moving on to our OpenStack monitoring. These rules are simple but solid, we think. They are not all of our rules, but they are the basis for us. First, API monitoring: if any API doesn't respond, Zabbix sends us the highest-severity alert; it's a quite serious problem, as you know. Second, process failure detection: if any process dies, it should be repaired ASAP to keep redundancy. Third, performance monitoring, which depends on the middleware or software; for example, the number of MySQL connections and so on. The final one is log message monitoring. At the beginning of using OpenStack, we decided to treat any log message above error level as a problem, because we didn't have any knowledge of OpenStack logs yet. To deal with that lack of knowledge, we have been filtering out known-harmless log messages day by day, even today.

Now I'll talk to future operators from the operations point of view. If you are a developer of OpenStack, you already know about this, so you might sleep through this slide.
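The "treat everything at error level and above as a problem, then whitelist known-harmless messages" approach can be sketched like this; the filter pattern is an invented example, not our real list, which is accumulated from operational experience:

```python
import re

# Invented example of a known-harmless pattern; in practice such a
# whitelist grows day by day as operators triage OpenStack logs.
HARMLESS_PATTERNS = [
    re.compile(r"Connection to \S+ reset by peer"),
]

def is_alertable(line: str) -> bool:
    """Alert on ERROR/CRITICAL log lines unless a harmless pattern matches."""
    severe = " ERROR " in line or " CRITICAL " in line
    if not severe:
        return False
    return not any(p.search(line) for p in HARMLESS_PATTERNS)

print(is_alertable("2015-10-27 10:00:00 WARNING nova.compute low memory"))
print(is_alertable("2015-10-27 10:00:01 ERROR nova.compute disk write failed"))
print(is_alertable("2015-10-27 10:00:02 ERROR oslo.messaging Connection to amqp reset by peer"))
```

This prints `False`, `True`, `False`: the warning is below the alert threshold, the disk error alerts, and the reset-by-peer error is suppressed by the whitelist.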
Okay, what's this? Approximately 220 lines and 120,000 characters. This is the log output from OpenStack when I tried to launch just one virtual machine. I smell madness here. Without debug logs, it's only 24 lines, which is peaceful, I think. But finally, we decided to run our OpenStack at debug log level. Let me explain the reason on this slide.

This is a very simple example where launching a new instance failed. Let's try to analyze this situation without debug logs. First, nova-api logged an "accepting" message. Second, from the scheduler: "attempting to build 1 instance". These are very easy to understand. But the third one is the beginning of a sleepless night for you: the scheduler showed an error. Well, it's a very simple error; the last message must mean a lack of free disk. At this point we'd like more information to drill down into the situation, but that's all you can see without debug logs. Is it enough for newbies? At least, it wasn't enough for us.

Next, let's go to the next level and analyze the same situation with debug logs. First, the scheduler's RAM filter returned 88 hosts; this means there are 88 hypervisors with enough memory for the request, so memory is not the problem. The second message is also from the scheduler: it says "does not have N MB usable disk, it only has M MB usable disk", 88 times, once per hypervisor. So there is not enough disk space on any of our hypervisors in this situation. That's it. Debug logs are very useful even in simple situations like this, and this is the reason we run OpenStack at debug log level. If you'd like to start using OpenStack tomorrow, we suggest you think about your logging environment, for your own health. The next part is a kind of promotion for the NTT Group.
But it also shows the importance of the operator's standpoint. There are other problems around OpenStack logs, and NTT proposed and is working on a new function to solve one of them. Our proposed function makes it easy to trace logs even across components. In the current log implementation, each component has its own request ID, so we need to map request IDs to each other when tracing logs in a troubleshooting situation, but the log format doesn't make the IDs easy to find. For example, when creating a new volume from an image, Cinder calls the Glance API. The figure on the right shows that Cinder has a request ID like A, in red, and Glance has a request ID like B, in blue. Of course you can find B in the Cinder logs, but it's very difficult to find. Our proposed function logs the request-ID mapping within one line in each component's log, like at the bottom of this slide, so it's easy to find. This spec was approved by the community, and we are writing the code now. This story tells you that operators can contribute to the community from an operator's standpoint.

Okay, this is the final slide about monitoring. As I mentioned before, we have been providing a standard monitoring system inside our company; if you are in a similar position, this slide might be important for you. We give internal service developers a standardized monitoring workflow: standard monitoring item sets, rules, alert threshold parameters on Zabbix, and so on. Once upon a time, we configured Zabbix and Nagios manually. But think about the monitoring scale with OpenStack: over 1,000 virtual machines are born and suddenly die or are killed. Can you configure monitoring manually, in real time? You can't. So we decided to add a new function to our Zabbix: it detects new VMs and starts monitoring them semi-automatically. "Semi-auto" means our engineers can choose whether a node is monitored or not, per service.
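Zabbix's low-level discovery mechanism, which this kind of auto-registration typically builds on, consumes a specific JSON shape; a discovery script feeding VM names from the cloud into Zabbix returns something like the following (the `{#VMNAME}`/`{#VMIP}` macro names and the VM data are illustrative, not our actual script):

```python
import json

def discovery_payload(vms):
    """Build the JSON a Zabbix low-level discovery item expects:
    {"data": [{"{#MACRO}": value, ...}, ...]}."""
    return json.dumps({"data": [
        {"{#VMNAME}": vm["name"], "{#VMIP}": vm["ip"]} for vm in vms
    ]})

vms = [{"name": "blog-web01", "ip": "192.0.2.21"}]
print(discovery_payload(vms))
```

Zabbix then expands item/trigger prototypes once per discovered VM, which is what replaces the manual per-host configuration the talk describes.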
The function is based on the auto-discovery function of Zabbix, but we wrote some scripts to meet our requirements. What does this mean? It means we changed our workflow dramatically for the sake of OpenStack. So we suggest that before getting along with OpenStack, operators should look deeply at their current workflow, for an efficient operation with OpenStack. That was our monitoring.

Last, I'll share our current issues and future plans. Let me explain what we have been working on recently. At the beginning of the OpenStack deployment, we designed our flavors focusing on the migration project. To migrate smoothly, compatibility with the old DC was more important than resource efficiency; flavors with the same VM specs as the old DC were the best solution for the migration plan. Therefore, we designed our flavors with 37 gigabytes of disk capacity per gigabyte of memory. However, when we looked at the current usage, it turned out that only 7 gigabytes of disk capacity per gigabyte of memory is in use on average. To improve the efficiency, we added new flavors with smaller disk capacity than the current set, and we are asking users to release unused disk capacity by switching to these new flavors. We are installing additional memory in the servers concurrently. We expect we will be able to increase the VM density and resource efficiency by at least 1.3 times, or 2 times at maximum. Sizing is always important.

This is our future plan. We are planning to upgrade OpenStack itself. A lot of new features are coming, but we can't use them as long as we stay on Icehouse. In particular, we want to deploy Load Balancer as a Service, or LBaaS. Today we configure load balancers manually; this is not exposed to OpenStack users. We tested the LBaaS API version 1, but it doesn't meet our requirements due to a lack of functionality. Since LBaaS API version 2 will be the mainstream in the community, we are waiting for our vendor to provide a version 2 driver.
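The sizing figures above can be checked with a little arithmetic; the snippet below just restates the talk's numbers (37 GB of disk provisioned per GB of memory versus about 7 GB actually used), it is not a tool we used:

```python
# Restating the flavor-sizing figures from the slide:
# provisioned 37 GB of disk per 1 GB of memory, measured usage ~7 GB.
provisioned_gb = 37.0
used_gb = 7.0

# Fraction of provisioned disk sitting idle on average.
unused_ratio = (provisioned_gb - used_gb) / provisioned_gb
print(round(unused_ratio, 2))  # -> 0.81
```

Roughly 81% of the provisioned disk is idle on average, which is why shrinking the disk in the new flavors (plus adding memory) can plausibly raise VM density by the 1.3x to 2x the talk estimates.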
We feel it is a better approach to address the LBaaS deployment and the OpenStack upgrade together. We also have to re-establish our own version of the operation: we modified Horizon to fit our operation, and we have to apply those patches to the new version again. This requires deployment and test effort, and it takes a certain amount of time. This prevents us from upgrading frequently and following every release.

The next release name is Mitaka. NTT R&D is located in Mitaka, so it is a familiar name for us, and we want to use the familiar Mitaka release. By the way, these photos show the goods we hand out at the Marketplace: the kanji (Chinese characters) for Mitaka are written on a folding fan and a towel. Don't forget to come to our booth and pick them up, after you check out another NTT Group user story right after this presentation.

Okay, that's all we have today. This is just a summary of what we presented. Let me emphasize a single point: OpenStack gives us business speed and agility. That's the number one benefit, and the reason to think about introducing OpenStack. Thank you very much for your attention. We have about two minutes remaining to answer your questions, if you have any.

Hi, that was a great presentation, thank you. Are you planning to upgrade directly from Icehouse to Mitaka?

Yes. We know OpenStack only supports upgrading step by step, one version at a time. So we have to figure out the process from Icehouse to, at this point, Liberty, or maybe after that, Mitaka, evaluating all the steps, and upgrade Icehouse to Liberty within a certain maintenance window. That is what we are planning. The original plan was to upgrade much earlier, if possible before this summit, but it turned out that the RDO packages after Juno require CentOS 7 as the host OS. So now we are upgrading the host OS first. The great thing is that the community gave us more suggestions: when I joined the operators' meetup, I learned about a tool called Anvil, which would be another option to build Juno or Kilo packages for CentOS 6.
But to avoid a complicated path, we are now just upgrading the host OS. That's our situation. Any other questions?

So the question is about strategy: how to design the VM flavors. At the time we designed the flavors initially, we just followed our previous KVM setup from the previous data center, to make the migration easy. But after one year of experience, we measured the actual usage. I think you have to iterate this process to learn how users actually use the resources, or ask other users with a similar workload to share their numbers. It's a kind of magic number, but we shared our number on the previous slide; I hope it gives you some hint.

Okay, it's time to close the session. Thank you for joining our presentation. Have a great summit.