So my name is Clayton O'Neill. I work with Dave on our OpenStack team. A co-worker and I gave a talk in Austin about how we deploy things with Docker. We're a little bit further down the road now and I had a few more things to share, so I figured I'd put some of it up here.

As far as where we're at with Docker: we have all of our control plane and compute services inside of Docker, so everything you see here. We first started doing this around July of last year, and the last of these actually just went in a couple of weeks ago. And it works. We haven't really had any problems with it recently; this works pretty well. But it has not always worked perfectly. Wow, this mic's really hot. One of the biggest problems we had was solved with Docker 1.12. It used to be that if you upgraded Docker, or restarted the Docker engine for some reason, all your containers would restart, which was not ideal. With Neutron that was a really big problem for us, because some of the processes that run inside those containers we really want to keep running. Upgrading to the 4.4 kernel also solved a lot of problems. We had a lot of issues with AUFS bugs on the 3.13 kernel; 4.4 is one of the newer kernels that comes with Ubuntu Trusty. Upgrading to the newer version has really made things pretty solid. We haven't had any issues since.

The main reason we did this is that we don't want to have to upgrade all of our servers at the same time. We don't want to have to upgrade Nova to Mitaka just because we want to upgrade Keystone to Mitaka. We like to do these things in a more staged fashion; some of them are less risky than others. The other thing is that we want to be able to control exactly what version of a stable release we're running. I'll give you an example. We upgraded Cinder to Mitaka the other day in our first dev environment and discovered that none of our SolidFire storage worked. It turned out that was a bug that had already been fixed in the couple of weeks since we'd last built our Docker image. It took us about an hour to build a new image, get it deployed into a new dev environment, and verify that the fix worked.

If you're interested in hearing more about this, I'm giving a talk on Thursday (I know it's really late in the schedule) where I'll be talking about the tricks we had to do to get Nova and Neutron to actually work. The talk we gave at the last summit is more of an overview, and that's on YouTube. We also have a Puppet module on GitHub that is what we use for deploying OpenStack services inside Docker with Puppet; it might be useful to you even if you're not using Puppet. I've also found the Kolla repo really useful, because they're doing a lot of the same things; sometimes just figuring out what the command line flags are is really useful. And that's about it. I'm going to be around all week, so if you want to talk about this stuff, if you're doing similar stuff, or have any questions, I'd be glad to talk to you. And if you don't catch me this week, feel free to reach out. All right, thank you.
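For reference, the Docker 1.12 behaviour Clayton describes is the daemon's live-restore option, which lets containers keep running while the Docker engine is restarted or upgraded. A minimal sketch of enabling it, using stock Docker paths and flags rather than anything from his deployment (the image name is made up):

cat >/etc/docker/daemon.json <<'EOF'
{
  "live-restore": true
}
EOF
systemctl restart docker   # running containers survive the daemon restart

# long-lived service containers can also get a restart policy so they come
# back after a host reboot (image name is hypothetical):
docker run -d --restart unless-stopped --name nova-api example/nova-api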
Go ahead. I believe we've got Marcelo Perazolo up next. Is Alex Lowe here? Yeah, we've got a desktop. Is George Mihaiescu here?

So I have a lot of charts, but don't panic, because some of the charts are just for you to glance at, and there will be more information in other places; I'm going to give you pointers to where you can get more information. My name is Marcelo and I'm part of IBM Power Systems. We deploy OpenStack on Power hardware, and this part in particular is about the operations stack that we deploy in our OpenStack clouds. It works on both Intel and Power platforms, so that's why it's relevant for everybody. There are more components to the whole stack; we actually use OpenStack-Ansible to deploy OpenStack on Power hardware, and the last part of our stack is what I'm going to show you here.

When we started looking at deploying OpenStack on Power and how to actually add value to OpenStack, there were three points we tried to make. OpenStack manages the cloud resources, but it doesn't do a good job of managing the rest of the infrastructure: the other services, the operating system, the hardware and, believe it or not, the OpenStack services themselves. This is slowly changing, there are new projects coming up like Monasca, but up to now we have had to complement it using tools like Zabbix, Nagios, Sensu and the ELK stack. So that's what we try to do here. Multi-platform: I had to have a bullet for that, because this is tested on Power and Intel and it works independently on both. And one other point we try to address is configuration drift. Once you add new nodes to your OpenStack cluster, you have to configure those nodes for operations as well: you have to drop more configuration into Nagios, you have to drop configuration for the Logstash filters, you have to drop visualizations. We try to address that as well.

And here's the architecture. We use Ansible, and there are three levels of playbooks. One is what we call the integration playbooks. Then there are the core playbooks that deploy the actual applications we use for operations; like I mentioned, we start with Nagios and the ELK stack, and we use Ansible itself to do the deployment of the endpoints. And we have a dashboard as well. It's an extension to Horizon, a very simple dashboard that's just made for listing all the hardware resources that we integrate and launching into the other applications, like the Nagios dashboard or the ELK dashboard.

There are different deployment scenarios we had to support. Since we deploy on top of OpenStack-Ansible, we wanted, of course, to support deploying our services together with the controller nodes in OpenStack-Ansible; that's one mode. We had engagements where we needed the same kind of architecture but with no OpenStack. I know it's heresy to say that at an OpenStack conference, but there are cases where we don't have OpenStack around. And there are hybrid cases where we use OpenStack-Ansible just to deploy some services; for example, we build clusters with just Ceph, or just Swift standalone. That's what hybrid means here.

One important point as well is that we have this concept of ops packages. These are bundles of configuration for the applications we integrate, and they apply to certain scenarios. For example, we have bundles that support basic management: they define visualizations for Kibana and checks in Nagios for the standard services, for the hardware, for the operating systems, and you can expand that. Then there's the private cloud one, which manages all the OpenStack services that OpenStack-Ansible deploys, 20-plus services. There is one that adds Trove on top of OpenStack-Ansible. We use OpenStack-Ansible Mitaka, and Mitaka doesn't have Trove, so another team in IBM contributed a playbook for that. They're adding Trove to Newton now, or they already did, I guess.
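As a purely illustrative sketch of the kind of thing one of those ops packages might drop into Nagios for a single OpenStack service (the host name, command name and file path here are hypothetical, not taken from the IBM packages):

cat >/etc/nagios3/conf.d/openstack-private-cloud.cfg <<'EOF'
define service {
    use                     generic-service
    host_name               controller01
    service_description     nova-api process
    check_command           check_nrpe!check_nova_api
}
EOF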
Then there are those other scenarios I mentioned, Ceph standalone or Swift standalone. The idea is that you have those packages and you deploy them into the stack, and you can go to the stack and just say: when I add a node, and the node is of type Ceph monitor, for example, the core detects exactly where the configuration has to be deployed to.

There are services that run on the server side and on the endpoint side. For Nagios, not sure if you're familiar with it, there are very thin agents; for Nagios it's called NRPE. For the ELK stack I put a picture of Beaver here; Beaver is one of the log shippers, but they're planning more. And for Ansible it's SSH. The idea is to have this extensible by the community. Somebody could go and, well, recently we added support for Ganglia; Ganglia is a metric visualization tool. So just go there and drop a playbook for the tool that you like, and in theory it should work seamlessly. And the same thing for endpoints: if you want to support a switch, say you want to visualize a Mellanox switch, you could build an ops package to manage the Mellanox switch. And this is being put on GitHub right now; it's available for sharing, it's Apache licensed. We're still working on it; we're a small team, just three people working for a few months, so don't expect much, but it's evolving.

Now, just to finish up, I know we're short on time for lightning talks, so I'll skip that one. This is the look and feel of the extension we built for Horizon: just a list of the resources, and then there's a drop-down with the applications we integrate, so it launches in context into the dashboard for Nagios or the dashboard for Kibana, for example. And this is right there in Horizon, so you can go to Horizon, click on the resource you want, click on the tool, and it takes you to that tool.

Some of the packages that we built, I'm not going to go into detail, but we built a lot of visualizations for Kibana, for OpenStack itself, for Swift, for Ceph. For Nagios we actually did three levels of optimization. The communication between server and agent is consolidated, so if you have 20 checks you want to run between server and agent, it only communicates once either way; there's a plugin for that for Nagios called check_multi, and our playbooks configure it. In OpenStack-Ansible you have the concept of LXC containers for each of the services, so we don't install NRPE in each of the LXC containers; we install it just on the bare metal, and then we have a special plugin that we built where I can say, okay, monitor that container for me. It does an LXC attach, runs the checks there, and consolidates the results back. This is good for scalability; you could probably scale that to hundreds and hundreds of nodes. We haven't tested that yet, but in theory, right? And there are different checks for Nagios. Here's a detail of check_multi: each of those lines you saw in the previous picture is a consolidated check, and if you go to the details you can see the one or more checks it actually runs. And the status is consolidated back and shown in the main Nagios service interface.
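A sketch of the idea behind that container-check plugin, not the actual IBM code: a thin wrapper on the bare-metal host that runs a stock Nagios plugin inside a named LXC container, so NRPE only needs to be installed once per host. The container name in the usage example is made up:

#!/bin/bash
# check_lxc: run a Nagios plugin inside an LXC container from the host.
# Usage: check_lxc <container-name> <plugin> [plugin args...]
container="$1"; shift
lxc-attach -n "$container" -- "$@"

# example (hypothetical container name):
#   check_lxc controller1_nova_api_container \
#       /usr/lib/nagios/plugins/check_procs -c 1: -C nova-api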
Here's one check for IPMI sensors, which is really critical; it catches problems we were having with our hardware. And there are some, yeah, I'm wrapping up, there are some future directions. Just to finish: if you want to know more, there is going to be a live demo as part of the OpenPOWER Summit, Friday at 10:15 a.m. at the Princess Hotel just across the street. And that's it. Thank you. Oh, that's okay.

I think we have Curtis up next from Interdynamics. Check one, two. All right.

Right, so I just have a quick presentation here on ECMP load balancing OpenStack endpoints. I've been doing OpenStack for quite a while, and sometimes you get to a point where you just want to do something cool that you've wanted to do for a long time. I've always disliked virtual IP addresses; I've used a few different systems in production, like UCARP and some of the others, and I've just never really liked them. And I've always wanted to do ECMP-based load balancing for the endpoints.

In this particular example, unlike probably a lot of people in the room, this is a small lab deployment, but it's a permanent one and we use it to test other OpenStack deployments, so it's kind of an underlying service cloud, or undercloud. There are just three physical servers, with a bunch of switches attached because we do some pretty complicated networking stuff. But it's a converged deployment, which is a little weird, because I run the OpenStack control plane on top of the hypervisors themselves. So there are just the three systems, and I do that using LXC containers. One of the goals I have for this little lab is just to make it as HA as possible, so if we lose one of the nodes, everything keeps working. Even though it's still just a little lab deployment, it was a goal that I had.

So ECMP is equal-cost multi-path routing. Typically when you use this you would also run BGP on the nodes, so that if you lose the BGP session, the route is automatically removed from the switch. I'm not doing that right now; that's a next step for me. Instead, there's just a script that runs, notices when one of the IPs isn't responding, and shuts that interface down.

Something that's a bit interesting about this deployment is that I'm using Edgecore switches, which are considered white-box switches. In this particular example I've got one of the Edgecore switches that uses an ARM processor and a particular Broadcom switch silicon, and the network operating system running on it is Cumulus Linux, but I'm back on 2.5.7. One of the problems with this little switch is that it's a very inexpensive one-gig, 48-port switch, and it doesn't really have the features that some of the newer silicon has, like the Trident and Tomahawk stuff. So it doesn't have resilient hashing, which is a feature in those newer systems, but that's sort of a next step too.
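A rough sketch of the kind of watchdog script Curtis mentions, under the assumption that it simply checks whether HAProxy on the local node still answers and downs the uplink if not; the interface name and check address are made up for illustration:

#!/bin/bash
# If the local HAProxy stops responding, take the uplink down so the switch
# stops using this node as an ECMP next hop.
IFACE=eth1
if ! curl -sf --max-time 2 http://127.0.0.1:8181/ >/dev/null; then
    ip link set dev "$IFACE" down
fi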
So this is just a really, really ugly diagram of what I have: the three physical hosts, and then a whole bunch of containers running on each of those. And I actually, unfortunately, ended up with two virtual IPs, so you can kind of see all of the IP address spaces that I use. But I guess the main point of the slide is that each of the physical hosts, and in particular the HAProxy node running on it, is in a separate IP space and VLAN, and you can see that here in blue. My virtual IPs are in the 172.16.11 range, and you can see that we have three different routes to them via three different IP addresses, all with the same weight, and they're actually on different VLANs as well. Each one of those corresponds to a different HAProxy, and in red you can also sort of see which IP addresses and networks are where. Inside, each HAProxy instance is listening on my two VIPs, so each HAProxy node is listening on the 11.2 and 11.3 addresses, and then it sends traffic back out through its gateway to the switch again.

One of the things I have some problems with in this particular design is that this switch is kind of underpowered for this sort of thing. If I upload an image to Glance, it will pretty much swamp the entire switch. Fortunately, we do have a bunch of Edgecore 5712s, which are much more powerful switches, in-house. So when I go back home I'm going to put those into place, and when I do that I'll add in BGP, so there'll be a BGP agent of some kind running on each HAProxy node, and if it goes down, that route will get pulled. The other thing I have to do is upgrade to a more recent Cumulus Linux release. But that's about all I had to talk about. It's just fun to be able to do stuff in the lab and mess around with new technologies. Thank you.
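For reference, the equal-cost routes Curtis describes (three next hops per VIP, one per HAProxy host, all with the same weight) boil down to something like this in iproute2 terms on a Linux-based switch; the next-hop addresses here are illustrative, not his real ones:

# one route to the VIP with three equal-cost next hops, one per HAProxy host
ip route add 172.16.11.2/32 \
    nexthop via 172.16.21.10 \
    nexthop via 172.16.22.10 \
    nexthop via 172.16.23.10
ip route show   # the VIP now shows three next hops with equal weight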
Okay, I think up next we have, I'm going to massacre his name, George Mihaiescu from OICR. Is Blue Box in the room? Yeah, okay.

Hello everyone, my name is George Mihaiescu. I'm from the Ontario Institute for Cancer Research in Toronto. We are one of the largest cancer research institutes in the world, and of course the largest in Canada. OICR supports about 1,700 researchers in Ontario, and it also hosts the secretariat of the International Cancer Genome Consortium and its data coordination centre. ICGC is a research organization created with the goal of collecting and analyzing tumour/normal sample pairs from the 50 most common types of cancer. So it's a large data project, and as you can see on this chart, the largest countries in the world participate in it, providing data samples from their cancer patients, so we have coverage for each type of cancer from at least two countries, in order to better cover the variety of cancer mutations. The Cancer Genome Collaboratory project that I'm an architect for was created based on a research grant from the government of Ontario, and Canada actually, in order to provide 3,000 cores and 10 to 15 petabytes of object storage to store the data collected by ICGC. The problem the project tries to solve is to allow researchers to bring their computational algorithms to the data, instead of downloading the data over the internet to their local data centres, taking months to do it and having to store the data over and over again. They can create accounts in the Collaboratory, where the data exists and there is compute capacity, and analyze it in one place.

Genomics workflows and workloads are different from regular OpenStack workloads: latency and speed of provisioning are not that important. What is important is to be able to download the data fast and analyze it, and the VMs we use are very large and totally saturate the resources allocated to them. For this reason, oversubscription is not something we enable in the Collaboratory, and as you can see, the flavors we created cater to this type of workflow. The c1.large, which is a commonly used type of VM, has eight cores, 57 gigs of RAM and 1.3 terabytes of local storage, because when they download the data we are talking about very large files; it's also a very good CPU-to-memory ratio.

If we look at the high-level architecture we built in order to get 3,000 cores, in addition to 10 to 15 petabytes of object storage, into just 12 racks, we decided to use high-density compute chassis: 2U chassis that hold four compute nodes each. The four compute nodes share 24 drives in the front, six drives allocated to each compute node, with dual power supplies, and each compute node has two 10-gig ports, of which we currently only use one. In the same rack we also have eight storage nodes. For the storage side we are using high-density storage chassis with 36 drives, and for each storage node we allocate four 10-gig ports. We basically use the entire space in the rack: 40U for compute and storage, and 2U left at the top of the rack for a 1U one-gig management switch and a 1U 10-gig switch for production.

One of the benefits of mixing compute and storage in the same rack is that some of the Nova and Cinder traffic stays local. As you'll see later, object storage is the main use case for this environment, but we also have a small pool of Cinder volumes, which means researchers can create volumes, attach them, and store data persistently; in that case some of the traffic between the compute node and the storage cluster, the primary-replica traffic, stays local to the rack. The power draw is also lowered by mixing storage and compute: if we had to put only compute in the rack at this density, we would definitely need three or four 60-amp circuits. Right now we have two 60-amp circuits per rack, with around 40 amps drawn across both, 20 and 20, so if one circuit fails, the remaining 60-amp circuit, with a PDU rated at 48 amps, basically has enough power to carry everything.

This is again the diagram of the rack. You can see we have the two 1U switches at the top, then 8U for the compute, 16 compute nodes in 8U, and then 32U allocated to the eight storage nodes. With 10-core CPUs this gives us about 640 CPUs per rack and about 2.3 petabytes. We started with four-terabyte drives, then moved to six-terabyte drives, and last week we added eight-terabyte drives, so the racks have different weights depending on when we loaded the hardware; we had multiple purchases.

On the OpenStack control side we have a pretty standard HA architecture. Because this is a research environment, the SLA is a little more relaxed than in a regular commercial OpenStack environment, but we still have MySQL replicated, RabbitMQ clustered, HAProxy doing SSL termination, and Keepalived taking care of the private and public VIPs. On the control nodes we used SSD drives and created three RAID volumes: one allocated to the operating system and ceph-mon (the Ceph monitors also run on the controllers, which have 128 gigs of RAM and 24 hyper-threaded cores), a second dedicated to MySQL, and the third one for MongoDB. We use GRE for tunnels, because this environment is about two years old and GRE was more stable at the time, and it still performs very well. We have four 10-gig interfaces on the controllers; we use an active-active bonding configuration with layer 3+4 hashing for better link utilization.
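A sketch of what that controller bonding might look like in /etc/network/interfaces on Ubuntu, assuming 802.3ad (LACP) with the layer 3+4 transmit hash George mentions; the interface names and address are placeholders:

cat >>/etc/network/interfaces <<'EOF'
auto bond0
iface bond0 inet static
    address 10.0.0.11
    netmask 255.255.255.0
    bond-slaves eth0 eth1
    bond-mode 802.3ad
    bond-miimon 100
    bond-xmit-hash-policy layer3+4
EOF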
On the compute nodes, we use two-CPU-socket micro servers, four in a chassis, with 8 or 10 cores per socket and, as I said, 256 gigs of RAM, and again a very good local storage capacity: six 2-terabyte SAS disks in RAID 10 give us about 5.3 terabytes. Usually researchers start a number of large VMs, so we have three, four, five VMs per compute node. They share those six drives in RAID 10, so we have good IOPS performance and a lot of capacity, and if one VM does something stupid and hammers the disk, it only affects the other VMs running on that compute node, not the entire environment. It also gives us smaller failure domains. We don't do live migration; these VMs are basically ephemeral. If a compute node dies, a workflow has to be restarted somewhere else. It's a pure cloud environment, and it's not feasible to live-migrate VMs that have 50 gigs of changing RAM and 1.3 terabytes of attached disk that would have to be moved.

On the Ceph side, we have three mon servers running on the controllers. We have 10 RADOS gateway instances: each controller runs two instances of radosgw on different ports, load balanced by HAProxy, and we have four more instances running on some storage nodes in the other racks. We started with Giant and then moved to Hammer. We use triple-replicated pools. As I said, most of the space is used to store large cancer genomics data files, but we also use Ceph pools for Cinder and Glance. We made some tunings to the RADOS gateway in terms of stripe size: when we upload the large files we have a special client, and it does multi-part uploads in one-gigabyte chunks, which radosgw then splits into 64-megabyte RADOS objects. This basically lowers the number of RADOS objects in the pool; we have a Cinder volumes pool that has more objects than the radosgw buckets pool, even though it holds ten times less data.

The Ceph OSD nodes have 36 drives and 12-core CPUs. I know it's a lot, but during rebalancing they get used, and if we want to move in the future to things like erasure coding, it's good to have faster CPUs. We use two 80-gig SSDs, hot-swappable in the back of the chassis, in RAID 1 for the OS. The LSI controller has eight 12-gigabit-per-second channels, and we have four 10-gigabit NICs on the server; we bond them two for the Ceph public network and two for replication, active-active.

On the networking side, the top-of-rack switches are Brocade. The main production switch is the 7750, which has 48 10-gigabit ports downstream and six 40-gigabit ports upstream, and it has a stacking functionality that lets you connect three Twinax cables to the rack to the left and three to the right, so you basically get a 2:1 oversubscription ratio. And this is the client we created to do the uploads and to control data access.
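The stripe-size tuning George mentions (one-gigabyte multipart parts split into 64 MB RADOS objects) roughly corresponds to setting the radosgw object stripe size in ceph.conf; the section name below depends on how the gateway instances are actually named, and the value is 64 MB expressed in bytes:

cat >>/etc/ceph/ceph.conf <<'EOF'
[client.radosgw.gateway]
rgw obj stripe size = 67108864
EOF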
In terms of software, our focus is Ansible, Zabbix, Grafana, Elasticsearch, Rally. We started about 5,000 instances in the last three months, and we've stored 500 terabytes of objects in Ceph, which is 1.3 petabytes triple-replicated. This is a screenshot from last week's Rally test: we are able to have 800 simultaneous workers each starting an instance. During a load test we download about 28 gigabits per second; we are actually limited by how fast the VMs can save the data they download from the RADOS gateway. This is a screenshot of rebalancing traffic from last week. As you can see, 14 gigabits per second of Ceph replication traffic, so 10 gigabits wouldn't have been enough; it's saturated. CPU: the old nodes are on top, and they are not very CPU-bottlenecked; the nodes at the bottom are the ones receiving data, and they were about 50% used, but the yellow is I/O wait, so even if you gave them more CPUs they wouldn't use them, because it depends on the disks. Memory was not an issue during rebalancing. As you can see, data was being drained from the old nodes and loaded onto the new nodes, about 400 terabytes of data moved to the new storage nodes, and I/O was saturated. I'd like to thank our funders, the government of Ontario, for providing the funds for this project.

Next up is Paul, from Bluebox. Oh, IBM. Bluebox, an IBM company. Something like that. Is Hiroki Ito here? At the end of the first session we're just going to keep going through the break; there's a 10-minute break here. Feel free to get up and leave if you've got another session, and then we'll wrap up with two more speakers now. Or soon.

All right, so this is going to be a short version of a talk I gave at OpenStack Day Seattle with one of my coworkers. I work at Bluebox; it is an IBM company, and I am legally obligated to tell you that. My team doesn't actually work on OpenStack itself; we work on a project called Site Controller, which is kind of everything else required to install OpenStack in a data centre. Here's some stuff about us; you don't really care about Bluebox itself. Ursula is our Ansible-based automation for deploying OpenStack. It's very similar to OpenStack-Ansible, just a little bit older and very opinionated for Bluebox, and it is open source if you want to look at it. And then we also have a tool called giftwrap, which builds OpenStack packages. Basically, we give it a manifest that looks like this, which tells it which OpenStack projects we want, what versions, and any extra dependencies and stuff. It builds a bunch of packages, uploads them to Packagecloud, and then we download them to our mirrors.

So yeah, Site Controller is kind of everything we need to install and operate OpenStack in our data centres and our customers' data centres. Before we had Site Controller, we only really had two data centres, so it was pretty simple to run a couple of ELK servers, some PXE boot infrastructure, et cetera. But then, as we were acquired by IBM and also started taking on customer data centres, that sort of changed things. Site Controller came about when we had a first customer that said, can you install this locally for me? By the way, you don't have any internet access. Go. And so I kind of sat in a room by myself for a week and threw together a proof of concept of this. It's basically a bunch of Ansible to install all the bits we need.
And so it's basically the initial bootstrapping: IPMI, being able to do PXE installs, making sure it doesn't have to reach out to the internet, so having mirrors of everything (we were even mirroring Git repos and all sorts of stuff, which we don't do as much anymore), and also all of our logging and monitoring and so on. The only thing we were allowed egress for was Sensu to PagerDuty, because they were happy for us to be woken up at three o'clock in the morning; they were cool with that. We made it work.

For the PXE server we actually tried a bunch of stuff, Razor and Cobbler and a bunch of other things, and realized that all we were trying to do was: here's a MAC address, give me a Linux. And dnsmasq is actually really good at that and is really simple, so we used that. We tried a bunch of apt mirroring tools and had a lot of issues with reprepro and things like that: when there were non-Ubuntu mirrors, like the upstream RabbitMQ repos, they didn't quite conform to the standards and some of the tools were very strict, whereas apt-mirror just grabs stuff and it works. And with the apt repos, same thing, we had a lot of issues, so we decided we'd just use Packagecloud and then mirror down from Packagecloud. We don't really download stuff from Packagecloud directly; we just put it up there as a sort of source of truth, and then treat our own repo like any other external repo. The other ones that are important: for Python we use devpi, for Ruby we use Geminabox, and we have Varnish sitting in front of those, because they work really well on your laptop but they don't really work when you have hundreds of machines trying to use them. And then we have a generic file mirror, which is just a virtual host on Apache. We've got a proxy, we've got the ELK stack, we've got Sensu, et cetera.

As we were bought by IBM we suddenly went from a couple of data centres to over 30, so the question was: how are we going to handle this? How are we going to make it so we can quickly deploy in multiple data centres? We decided we would split Site Controller up into a remote site controller and a central site controller. The central one is where any person who needs to access any OpenStack node goes through: whether it's SSH, or accessing Kibana, or accessing Sensu, or anything like that, they come through the central site controller and it takes care of them accessing all the bits of the remote data centres. And they're all connected together via IPsec tunnels. This is kind of what it looks like: up on the left we have the central site controller with all of our mirrors and our Flapjack server, ELK and stuff like that, and then we have the remote site controller, which is a smaller subset. Each remote data centre has its own Sensu server and its own ELK stack. Some of that is because customer installs want to keep their data local; some of it is just so we don't have massive, massive Elasticsearch clusters. And then the Apache at the central site controller takes care of virtual hosts, so from one dashboard you're accessing Sensu at any of the data centres, ELK at any of the data centres, et cetera; it doesn't matter which one.

So we have a bastion server, obviously, for SSH, and of course we're Ansible, so we run all of our Ansible stuff from there.
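The "here's a MAC address, give me a Linux" dnsmasq setup Paul describes fits in a few lines of configuration; this is a sketch with made-up addresses, MAC and hostname, not Bluebox's actual config:

cat >/etc/dnsmasq.d/pxe.conf <<'EOF'
dhcp-range=10.10.0.100,10.10.0.200,12h
enable-tftp
tftp-root=/srv/tftp
dhcp-boot=pxelinux.0
# one line per known machine: MAC, IP, hostname
dhcp-host=52:54:00:aa:bb:cc,10.10.0.21,compute01
EOF
systemctl restart dnsmasq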
We do two-factor auth with YubiKeys, and we have a project there that helps us with that. Then we have this thing called ssh-auth-proxy, which fakes being an SSH agent and allows you to share an SSH key with a user without them ever actually seeing that key. That allows us to say: they log in, they're a member of a certain group, so they get a particular key injected into their SSH agent, and then they can go and access customer X's OpenStack, or a site controller, or whatever roles they have available to them. That actually works really well. And then we have a thing called TTY spy, which is basically doing script piped to curl -X POST; it does it with a bunch of sockets and some magic. It's not open source, but there's a project pretty similar to it that I've linked there.

And then we have IPsec tunnels to everything. From the central site controller we tunnel to each of the remote site controllers, and if they're in SoftLayer, which is our data centres, we then tunnel to each of the OpenStacks; if it's local, they just have a direct LAN connection to the OpenStack. For all these services, Sensu, ELK, et cetera, there are places in Ursula where we just dump our environment settings for the site controller's ELK server in that data centre, or the site controller's Sensu server in that data centre, and it sort of loosely couples itself together; that's how we get stuff across. And then we have a control proxy, which OAuths back to our central inventory management tool, which is called Box Panel, and has reverse proxies for everything. So if I was accessing the Kibana in the Barcelona data centre, I would go to that URL and it would bring me that Kibana.

And everything is wrapped inside Ansible. We have a general rule: if you can't automate it, we don't run it. This is an example of what it looks like when I'm setting up the OpenID proxy, sorry, the proxy in the central site controller: I give it a bunch of locations, the actual location for it on the proxy tool and the URL that I want to proxy for it, and it builds out all the Apache virtual hosts and stuff to do that for me. And then it ends up with this, and you can click on any of those data centres and access their dashboards.

Oh yeah, you wouldn't download OpenStack. So we have a bunch of mirrors; I kind of talked about that. It's pretty straightforward. Again, it's all driven by Ansible and it's all data driven. I have a list of files that I want to download and put on the mirror so that the OpenStack installer can access them, and I do it like this; I can just grab them from a URL or grab them from Swift. Same with the apt mirrors: I give it a list of mirrors I want and a bunch of OSes I want to mirror, and it goes and builds out the apt-mirror config files, runs that mirror, and downloads and mirrors everything.

IPMI: we try not to use it too much. When we have to, we try to use ipmitool and serial-over-LAN. Very occasionally we have to use the actual IPMI GUI, so we have an IPMI proxy, which is a little web app that knows about all of the servers and their IPMI addresses and creates a NAT just for your IP, through it, across the IPsec tunnel to the IPMI card you're trying to access. It's kind of dirty, but it's easier than trying to tunnel IPMI through SSH or something, because there's a lot of UDP and stuff happening on the remote console side.
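The ipmitool usage Paul mentions, serial-over-LAN plus forcing a PXE boot for a reinstall, looks roughly like this; the BMC address and credentials are placeholders:

# attach to the serial console over the LAN
ipmitool -I lanplus -H 10.20.0.21 -U admin -P secret sol activate

# force the next boot to PXE and reset the node before a reinstall
ipmitool -I lanplus -H 10.20.0.21 -U admin -P secret chassis bootdev pxe
ipmitool -I lanplus -H 10.20.0.21 -U admin -P secret power reset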
And so yeah, for PXE we tried a bunch of stuff. It turned out that if we tell Ansible about every machine's MAC address and four or five other things, it's really easy to generate a PXE boot file and an Ubuntu preseed, so that's what we do. We used to use Razor; we use this now and it actually simplifies our life a lot. Again, this is what it looks like: we say here's the DHCP range you're going to PXE boot for, here are the mirrors, any extra packages we want, passwords, and the list of servers. As long as a server has a name, a MAC address and the IPMI details, it'll be able to connect: it'll IPMI to it, say next boot, boot from PXE, now restart, and then it'll restart, find the PXE server, find the files, and install Ubuntu. We don't do anything apart from installing the VLAN packages and a couple of things that Ubuntu doesn't ship with, so that we can get it working after that PXE boot, and then we do everything via Ursula when we're actually installing OpenStack. The other thing with this is we have what we call a mini bootstrapper, which is either a VM on your laptop or a little Intel NUC, with this same role but only for the initial PXE boot server. So I can just rock up to a data centre with my laptop running Vagrant, or an Intel NUC that's already been Ansible'd, plug it in and restart the PXE boot server; it'll PXE boot off this, get itself all set up, connect up to the IPsec tunnel, and then it can PXE boot all the other machines from there.

For monitoring: because we have a lot of customer data centres that don't want internet access, we wanted alerts to flow through our IPsec tunnels, and to do that we put Flapjack in there, which sits centrally. The Sensu server at the remote data centre gets an alert, tags it with a couple of things, and sends it up to the Flapjack server, and the Flapjack server then forwards it to PagerDuty or email or wherever else it needs to go. That works pretty well, although Flapjack seems to be a fairly stagnant project, so I don't know if we're sticking with it long term, but it's what we have. This is what our monitoring flow looks like; it looks more complicated up there than it really is.

Logging is a pretty similar story. On each of our OpenStack nodes we have logstash-forwarder; Filebeat is coming in and we're deprecating logstash-forwarder. It talks to our ELK servers via the Lumberjack protocol and just uses round-robin DNS to decide which ELK server to send to. Our ELK servers: we start with two or three, and every server has Elasticsearch, Logstash and Kibana on it, so every time we add more we just get a little bit of extra redundancy, and via the round-robin DNS we don't have to worry too much about pointing things at particular servers. The Lumberjack protocol is pretty good. We have a fair bit of grok filtering, and then we also archive logs. We were trying to archive to Swift so we only had to keep seven or 30 or whatever days of logs live on the Elasticsearch servers, but the Swift backup driver was unmaintained, so we're actually backing up to S3 object storage, I should say an S3-compatible object storage, for log backups. And that's what our flows look like; again, it's actually simpler than it looks in the diagram. And that is it. Thank you very much.
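A sketch of the shipper side of that logging flow, assuming Filebeat pointed at a round-robin DNS name for the ELK servers; the paths, hostname and exact YAML keys (which vary a bit between Filebeat versions) are illustrative, not Bluebox's actual config:

cat >/etc/filebeat/filebeat.yml <<'EOF'
filebeat.prospectors:
  - input_type: log
    paths:
      - /var/log/nova/*.log
      - /var/log/neutron/*.log
output.logstash:
  # round-robin DNS name resolving to all the ELK servers
  hosts: ["elk.example.internal:5044"]
EOF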
Our next speaker is Ito Hiroki from Japan. Introduce yourself, please. Is Alex Lowe in the room? Hi.

Hi, I'm Hiroki Ito from NTT, and I'm operating a private cloud that is used as a kind of test bed for developers in my department. Today I'm going to talk about our security work, a story about collecting NAT logs from the virtual routers.

First, I'd like to start with how this story began. Recently our department has been enhancing its security audit rules, and one day the security manager announced: collect the NAT logs from the virtual routers in your private cloud, and every internet connection should be traceable in case of a security incident. At first I thought this work would be very easy, because Neutron has some good features for this kind of thing. However, I realized that Neutron doesn't support collecting NAT logs, so we had to find a good way to collect NAT logs from the virtual routers ourselves.

So what exactly did we have to do? There are two pieces of work. First, we have to collect information about the local IP allocations for each virtual machine, and second, we have to collect the NAT logs from the virtual routers. The first job is relatively easy, because we can collect the local IP allocations from the notifications of the nova-compute process. However, the second piece is a little more complicated, because these virtual routers are created dynamically, so we have to deal with that, and that's the topic I'm going to talk about today.

I was wondering what a good way to collect NAT logs from virtual routers would be, but soon after, a good solution came down from the sky, I mean the ops mailing list, where how to collect NAT logs from virtual routers was being discussed. That was very good timing. The basic idea is to use ulogd, which can collect netfilter and iptables related logs, including, of course, NAT logs. There are three steps in this method. As you know, Neutron uses the ip command to create network namespaces. So first, when Neutron uses the ip command to create a namespace, we start ulogd in that namespace. Second, we start logging the NAT connections with ulogd and send the log information to the appropriate logging servers. And when a network namespace is deleted, we kill the ulogd process running in that namespace.

This is the detail of the method. We use Neutron's rootwrap filters to automate starting the ulogd process. In each of the rootwrap filter configs, we replace the original ip command with an ip wrapper script that we created. In the ip wrapper script there is an if branch: if Neutron uses the ip command to create a network namespace, we first create that namespace and then start a ulogd process in it; and if Neutron uses the ip command to delete a network namespace, we first stop the ulogd process in that namespace and then delete the namespace. With this method we can collect the NAT logs. This is an example of the NAT log, and it has all the information needed to trace each virtual machine's network connections.
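A sketch of the kind of ip wrapper Hiroki describes, not NTT's actual script: Neutron's rootwrap filters are pointed at this instead of /sbin/ip, and it starts or stops a ulogd instance as namespaces come and go. The per-namespace ulogd config path is a placeholder:

#!/bin/bash
REAL_IP=/sbin/ip

if [ "$1" = netns ] && [ "$2" = add ]; then
    ns="$3"
    # create the namespace, then start ulogd inside it
    "$REAL_IP" netns add "$ns"
    "$REAL_IP" netns exec "$ns" /usr/sbin/ulogd -d -c "/etc/ulogd.d/${ns}.conf"
elif [ "$1" = netns ] && [ "$2" = delete ]; then
    ns="$3"
    # stop the ulogd running in this namespace before removing it
    for pid in $("$REAL_IP" netns pids "$ns"); do
        [ "$(cat /proc/$pid/comm)" = ulogd ] && kill "$pid"
    done
    "$REAL_IP" netns delete "$ns"
else
    # everything else goes straight to the real ip command
    exec "$REAL_IP" "$@"
fi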
However, there was a problem. The day after we got this working, we found that the NAT information was no longer being written to the log files, and it looked like this happened at log rotation. So what was the problem? We found that we had to change the log rotation configuration, because the original logrotate script only sends a signal to the ulogd daemon in the host network namespace. But, as I mentioned earlier, we are running ulogd processes in all the network namespaces, so we have to send a signal to every ulogd daemon in every network namespace, like the red line here. With this change, we could finally complete the mission. However, there is a huge amount of logs, so we have to deal with that; maybe that's our next piece of work. Our work is never over, I think. And that's all. Thank you. Thank you.

So we have completed the scheduled lightning talks. If someone has a lightning talk they're desperately wanting to give right now, I'll entertain one or two of those. Otherwise, I think we're dismissed. Speak quickly. All right, thank you for your participation and thank you for your attention.