Hello everyone, thank you for coming to this presentation; I know it's the last session of the day. My name is George Mihaiescu, I'm the cloud architect for the Cancer Genome Collaboratory project at the Ontario Institute for Cancer Research, and I'm here today with my colleague Jared Baker, the cloud engineer working with me to support this environment.

OICR, the Ontario Institute for Cancer Research, is the largest cancer research institute in Canada and one of the largest in the world. It focuses on prevention, early detection, diagnosis and treatment of cancer. It is located in downtown Toronto, funded by the government of Ontario, and it also hosts the secretariat for the ICGC and its data coordination centre. The ICGC is an international, voluntary scientific organization created with the goal of collecting and analyzing tumor-normal pairs from 500 donors for each of the 50 most common types of cancer. ICGC is the largest global project of its kind, with members from all over the world. For every type of cancer covered by ICGC, it receives DNA samples from patients in two or more countries in order to cover more variety, and the data collected is then analyzed using the same algorithms for consistency.

The Cancer Genome Collaboratory is a cloud computing environment built for cancer research by OICR. It enables large-scale research on the data sets generated by ICGC. The project goals in terms of infrastructure were to build an environment offering 3,000 compute cores and able to store 15 petabytes of data, the very large data sets generated by ICGC. Another goal was to create a system that allows cost recovery, to make the project self-sufficient after the government funding ends.

I'm going to talk a little bit about genomics for those of you who are not in the genomics research field, so you'll better understand the design decisions we had to make. The new generation of sequencing machines produces more data, faster than ever before; you've heard that many times in the news. Human DNA has three billion base pairs, and the sequencing machine scans the organic material and produces an unaligned file. It's similar to a book with its pages unordered: it cannot be used right away. In order to actually use that data, you have to align the genome against a reference genome. You take the reference genome and the unaligned file and you align them, and this process is very CPU- and time-consuming. For each donor, researchers need at least two files: one for the DNA of a normal sample from tissue that is not cancer-affected, and one from the biopsy of the tumor. Some patients have multiple tumors, so multiple files. It takes about five days on a virtual machine with four cores to align a single file, and you have at least two files per patient.

The workflow usually goes like this: if researchers are looking for the mutations between the normal and the tumor samples, they download the files, which can be between 150 and 300 gigabytes, and then analyze the data for days or weeks, depending on the coverage of the file. The resulting output can be as large as the input if you just align the files. If you are doing further analysis, like VCF calling, where you just find the mutations, the output is usually five to 15 gigabytes, so much smaller data sets. But at scale, it adds up.
It is also recommended that the workflows be independent, which means that if a compute node fails, you are not affecting a large-scale analysis; you just affect one donor, one analysis, which you can reschedule on another VM, on another compute node, so no external dependencies. We see the bioinformaticians packaging their workflows in Docker containers for better portability and reproducibility.

I'm going to show you a few slides about the load this research puts on VM resources. This is a graph of CPU usage on a VM with eight cores, running for about 12 days. As you can see, it was at 100% CPU usage most of the time, except for a few steps when it was doing something less CPU-intensive. The workflows have something like 15 to 17 steps; most of them are CPU-intensive, but some are maybe not parallelized, or they do other things like merging files, but most of the time you see 100% CPU usage. At the end of the workflow, the results can be uploaded back into the object storage, or onto a Cinder volume for persistent storage, or uploaded somewhere else. This slide shows memory usage. Most of the time, memory utilization was about 12 gigabytes of the 47 gigabytes provided to the VM, but there are steps when the available memory drops to half. So you basically have to provide the VM with enough memory for its peak usage; otherwise the workflow is going to fail, and there is no use for it. And different workflows have different memory profiles, depending on who wrote them and how they use memory. This screenshot shows the disk I/O usage, which is high only at the beginning of the workflow, when the data is downloaded from the object storage and saved to disk; after that, minimal I/O. The workflow reads the data from disk in big chunks and works on it, so the CPU is stressed but not the disk, which means that if the workflows are scheduled properly, at two- or three-hour intervals, they can overlap on the same compute nodes and not contend for disk I/O.

So, in conclusion, modern cancer research relies on large-scale studies because of the extreme diversity of mutations, so we decided that capacity was more important than speed when we designed the Collaboratory. Being able to run more workflows and analyze more data gives researchers more confidence that the insight their analysis provides is correct. Just running an analysis on three samples, you can't draw a conclusion; you have to run it on 100 or 300 to confirm the results actually stand up. Also, building for high performance is very expensive, as you probably know, and whatever we build today is only going to be cheaper tomorrow. So we decided to build for capacity, and we can upgrade the slower parts of the system as technology becomes more affordable, like SSD drives, for example. Pick your battles wisely. In terms of design philosophy, we knew that if we provide a new OpenStack service that is not very stable or well documented, we have to support it. Also, if we run a not very well tested or less stable Ceph configuration, we have to support it. We are talking about very large data sets, so data corruption makes it very hard to re-ingest the data from other sources. You don't necessarily want to go with erasure coding or more complex storage strategies that might have hidden bugs and later lose all your data.
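As a toy illustration of the staggering idea mentioned above, the sketch below (not the scheduler actually used in the Collaboratory) simply offsets workflow launches by a fixed interval so the download-heavy first steps never coincide on the same compute node:

```python
import time

# Hypothetical list of donor analyses; in practice each entry would be a
# workflow launch (e.g. a docker run) scheduled onto its own VM.
donors = ["donor-001", "donor-002", "donor-003"]

STAGGER_SECONDS = 2 * 3600  # start workflows two hours apart so the
                            # download-heavy first step never overlaps

for i, donor in enumerate(donors):
    if i:
        time.sleep(STAGGER_SECONDS)  # naive fixed stagger between launches
    print(f"starting analysis for {donor}")
    # launch_workflow(donor)  # placeholder for the real launch call
```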
With just the two of us administering the environment and no paid support to rely on, we have to be careful about which features we implement. If it's not something that's heavily used by our users, we don't necessarily have to deploy it, like Load Balancer as a Service or other features.

We went with a high-density, greenfield design. Using high-density servers reduces our footprint and requires less space in the racks, in the data center, fewer PDUs, fewer switches, less management overhead. You'll see our rack design: basically, we use all the space in the rack, 48U, and all the ports on the 10-gig switch. We also mix compute and storage in the same rack. This has the benefit of lowering the power usage and heat generation per rack, and it also keeps some of the Nova-to-Ceph traffic within the same rack: when the VMs attach their volumes, they read from the primary replicas, some of which will be on Ceph nodes in the same rack, reducing east-west traffic.

Other design constraints: we only had 12 data center racks reserved. Although we have a fixed budget for infrastructure resources, being a four-year research grant we also have the flexibility to stagger purchases, which allowed us to maximize the compute and storage per dollar spent. For example, we started with four-terabyte drives for Ceph, then the next purchase was six-terabyte, and then we moved to eight-terabyte drives. As drive sizes increase and get cheaper, we buy more; we are now at more than four petabytes on the storage side. This allowed us to avoid sitting at the end of the project on equipment that was maybe cutting-edge three years ago when the project started.

This is the rack layout. We use 8U of space at the top of the rack to pack 16 individual compute nodes: four 2U chassis, each with four compute nodes. In total, we have about 640 cores per rack and up to 2.3 petabytes of Ceph storage in the eight storage nodes below. Each compute node has six 2-terabyte drives in RAID 10, which gives us good I/O performance and redundancy. They also have 256 gigs of RAM and 40 cores, which is a very generous resource ratio: memory per core is about 6.4 gigs. The average bioinformatics flavor that our researchers use has eight cores, 48 gigs of RAM and one terabyte of disk, because the files they download can be very large. The largest genomics file that we have is an 800-gigabyte file. That's an exception today, but not necessarily in the future; depending on the sequencing machines and the resolution researchers use when they sequence, that could become the norm. The VMs run from local storage, and because each compute node has just a few large VMs (40 cores divided by eight cores per VM, so four or five VMs per compute node), if there is any I/O contention, say a step where they all use the disk, it is localized to the compute node where it happens. It's not going to impact the entire cluster or Ceph, so we don't have latency concerns. The drawback of local storage is that live migration cannot be easily done. It's not a good fit for large VMs that have a lot of local disk and also high CPU and memory usage; live migration probably wouldn't even complete if you tried to migrate something like that.
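The rack-level numbers above can be sanity-checked with a few lines of arithmetic; the figures below are just the ones quoted in the talk, with the flavor packing worked out explicitly:

```python
# Rack capacity sanity check using the figures quoted in the talk.
nodes_per_chassis = 4
chassis_per_rack = 4
cores_per_node = 40
ram_per_node_gb = 256

compute_nodes = nodes_per_chassis * chassis_per_rack     # 16 compute nodes per rack
cores_per_rack = compute_nodes * cores_per_node          # 640 cores per rack
ram_per_core = ram_per_node_gb / cores_per_node          # ~6.4 GB of RAM per core

# Typical bioinformatics flavor: 8 vCPUs, 48 GB RAM, 1 TB disk.
flavor_vcpus, flavor_ram_gb = 8, 48
vms_per_node = cores_per_node // flavor_vcpus            # 5 VMs per node, CPU-bound
ram_needed_gb = vms_per_node * flavor_ram_gb             # 240 GB, fits in 256 GB

print(cores_per_rack, round(ram_per_core, 1), vms_per_node, ram_needed_gb)
```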
We treat the environment as a pure cloud environment where failure is a given. We don't usually have hardware failures that take out an entire compute node, but if you have a kernel panic, you lose maybe five VMs that were doing analysis; they can be rescheduled and started somewhere else.

This is the storage node chassis, 36 drives per 4U. We are using, as I said, collocated journals. Each node has 48 cores and 256 gigs of RAM. This improves read speeds as well as recovery and rebalancing operations. The servers also have 40 gigabits of network: two 10-gig front end, two 10-gig back end. SSD-based journals would not help us because it's a read-intensive object storage cluster, so the reads always come from the primary replica, and they come from the hard drives, not from the journals. And using 20% of the capacity, something like a one-to-six ratio, to provide journals on SSDs would mean 20% of space that could be used to store data is just used to cache writes; not a good use of space or money in our case. We use the same Ceph backend for Glance and Cinder, but as I said, the pool with the highest usage is the RADOS Gateway buckets pool where we store the data. On the volumes pool we have a smaller quota, and we made other optimizations to the RADOS Gateway as well, which I will talk about later.

On the control plane side, we have a pretty standard highly available setup: three controllers, HAProxy and Keepalived. We invested in good hardware for the controller servers. We have six SSDs in each controller, split into three RAID 1 sets. One RAID 1 is for the Ceph monitor, so it has space and I/O dedicated to it in case of recovery. MySQL has its own dedicated RAID 1 on top of SSDs, and there is MongoDB for Ceilometer, although we are not using it right now. We run HAProxy in front of the APIs and the RADOS Gateway with SSL termination. We switched to ECC certificates, which helped with the CPU usage on the HAProxy side. We use VLANs on top of bonds: the controller nodes, because they are in the north-south data path, use four 10-gig interfaces in a bond, and then we have VLANs for Ceph public, management, GRE, and monitoring.

We use 10-gigabit-per-second Ethernet everywhere. This keeps the cost down; there is no need to buy expensive transceivers. Cabling is not as nice as with fiber, but again, we are not picky. We wanted to build at scale, and you cannot do that if you buy expensive stuff, so we want to spend most of the money on capacity. We also have 10 gigabits per second upstream to the internet, for data transfers from other data repositories across the world. We use Neutron with GRE because it was deployed more than two years ago, and GRE had better support back then. We don't use DVR, just HA routers; we control the traffic in and out better if we connect to the internet only on the three controllers. The top-of-rack switches are stacked in a ring topology; this is a Brocade stack. The nice part about it is that it has 48 10-gig copper ports going to the servers and also six 40-gig uplinks. Because they are in a ring, we cable three DAC cables to the rack on the left and three to the rack on the right, so we have 240 gigabits east-west between the racks, and if we lose any of the racks, traffic goes the other way around the ring. So we get redundancy, no blocked links, and a two-to-one oversubscription ratio, which is pretty good. Again, we use DAC cables, very inexpensive compared to transceivers and fiber and so on. Software stack: of course, OpenStack.
We are running on top of Ubuntu 14.04. We use Ansible for configuration, Grafana for graphs, Zabbix for monitoring, and ELK for log aggregation; pretty standard, I guess most of you probably use something similar. We also use ARA, a new tool for Ansible run analysis.

We also developed our own object storage client on top of the S3 API. This allows us to give researchers access based on temporary URLs. This is protected data: researchers who are authorized to download the data use this storage client. They receive a token from us that they feed into the client, and they say, I want to download this file. The object storage client connects to a server which confirms that their token is authorized and generates a temporary URL that they can use to download the data. The temporary URL, of course, is valid only for a limited time; after that, it's no longer good. The tool supports S3, Swift and Azure, and has some nice features like resumable downloads, parallel multipart uploads and downloads, and BAM slicing. BAM is a file format used in genomics, and this tool allows you to request just a portion of a BAM file. Basically, you can say, give me just this section of chromosome 3 instead of downloading the entire file. And you can feed it a file that lists multiple objects, asking for just that section of chromosome 3 from all of them, and you get back just that.

Now, cloud usage. In more than two years, we launched more than 57,000 instances. This number includes the hourly Rally-triggered instances, because we use Rally to monitor the health of the environment; so it sounds like a very large number, and it includes that hourly churn, but it still shows the usage of the OpenStack environment. What's interesting is that for a bioinformatician who develops a new workflow and starts short-lived instances, or any developer doing this, a private cloud is very valuable. If you start an instance in a public cloud and keep it running five minutes, you are charged for an hour; if you do this 20 times in an hour, you pay for 20 hours. If you start the instance at :57 and it's still running three minutes later, you are charged for two hours. If you do development in a private cloud, you can start instances that run for three minutes and then terminate, and do it again as much as you want. Don't do that in a public cloud, because it's going to be expensive. We also developed our own usage reporting app that can be used by the principal investigators to track usage of their resources by the members of their teams; it also sends emails with reports, and this is going to allow us to do cost recovery.

In terms of OpenStack upgrades, we started with Juno, then upgraded to Kilo, Liberty, and Mitaka. All these upgrades were live. We still notified the users that we were doing an upgrade and asked them to abstain from creating new instances in that time frame. We make use of virtualized labs to test new versions of OpenStack; we do basic configuration testing, take care of database upgrades, and so on. Then we leverage HAProxy to minimize the impact on our users: we take one controller out of the pool, upgrade it, test it individually, put it back, and so on. Ceph upgrades: we started with Giant, then upgraded to Hammer and Jewel, same process. Ceph was great, very stable. Again, we are not using any fancy features: no erasure coding, no snapshots on top of snapshots, and we don't have a latency-sensitive environment.
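To illustrate the temporary-URL mechanism of the object storage client described earlier, here is a minimal sketch of the server-side step using boto3 against an S3-compatible endpoint such as the RADOS Gateway; the endpoint, credentials, bucket, and key are placeholders, and this is not the actual Collaboratory client:

```python
import boto3

# Server side: after validating the researcher's token, generate a
# short-lived presigned URL for the requested object.
s3 = boto3.client(
    "s3",
    endpoint_url="https://object.example.org",   # RADOS Gateway endpoint (hypothetical)
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "icgc-data", "Key": "donor-001/tumour.bam"},  # placeholder names
    ExpiresIn=3600,  # URL stops working after one hour
)
print(url)  # handed back to the client, which downloads directly from the gateway
```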
We also use Ansible to roll out the upgrades. It's also very useful to have historical performance graphs that let you see whether performance degrades after an upgrade, or hopefully improves. And I'm going to pass it now to Jared to discuss the daily operational tasks.

Thanks, George. Yeah, my name's Jared. I work with George, and I have been working with OpenStack since last August. So far, it's been pretty fun. Today I'm going to go over some operational details and lessons learned with the Collaboratory. So, security updates. They happen much more frequently than OpenStack and Ceph upgrades, but we use the same methodology as for Ceph and OpenStack upgrades. We make use of host aggregates that include reserved nodes for testing, to test security updates against our compute nodes. Security updates on our controllers get a bit more involved. Since our controllers have load-balanced services, we can use HAProxy and Keepalived to isolate a controller's services during updates by bringing them out of the load-balancing pool. Once out of the pool, we test the controller's APIs directly using an API script that we developed. If everything looks good, back in the pool it goes. Because the system is designed to be HA, we can do controller and Ceph maintenance live during business hours. For the most part, we can roll through those updates and reboots transparently to our users, with typically only kernel updates on our compute nodes requiring reboots and notifications to the users. I mean, why not migrate the instances off the compute nodes that we're rebooting? We do have the ability to migrate them, but as George said, these are large local disks, with potentially I/O-intensive workloads going on in them. So we take those case by case. It would be pretty difficult to migrate a fleet of VMs with a lot of heavy I/O going on; sometimes we do have to migrate them, and usually it works.

So let's talk tools. We use a variety of tools for monitoring, but ELK in particular has been pretty useful at parsing and making sense of all the logs the environment is generating. We can create custom dashboards in Kibana to suit our monitoring needs. Dashboards provide a really easy way to show at a glance if something is misbehaving. For example, the pie charts here represent who is logging the most; we can categorize each pie chart by log type or even server type, and we find this really handy if we've left something in debug or if something's actually wrong, as it will stick out like a sore thumb. Here's another Kibana dashboard that we made, this one focused on web analytics for the OpenStack dashboard. Here we can see login statistics, top URLs, HTTP status codes, and even geographical location using the GeoIP plugin. The idea here is that having good tools and monitoring is crucial for being able to manage things at scale, especially with a small team.
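A minimal version of the kind of per-controller API check mentioned earlier might look like the sketch below; it probes the unauthenticated version endpoints of each service directly on the controller that was pulled out of the HAProxy pool. The controller address is hypothetical and the ports are the usual OpenStack defaults:

```python
import requests

CONTROLLER = "10.0.0.11"  # hypothetical address of the controller under test

# Unauthenticated version/root endpoints; enough to confirm each API is up
# and answering on the controller that was taken out of the pool.
endpoints = {
    "keystone": f"http://{CONTROLLER}:5000/v3",
    "nova":     f"http://{CONTROLLER}:8774/",
    "glance":   f"http://{CONTROLLER}:9292/",
    "neutron":  f"http://{CONTROLLER}:9696/",
}

for name, url in endpoints.items():
    try:
        r = requests.get(url, timeout=5)
        print(f"{name:10s} {url} -> HTTP {r.status_code}")
    except requests.RequestException as exc:
        print(f"{name:10s} {url} -> FAILED ({exc})")
```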
So, deployments. Everybody loves deployments, right? It means you've got new hardware, it's shiny, and maybe you even get to kick the bucket on some old hardware that just never worked right. Deployments are triggered by capacity planning and have been happening about twice a year, each time adding compute and storage nodes. The odd time, we expand into a new rack and add all the fixings like a switch, PDUs, et cetera. Since they don't happen all that often, we tweak them as needed, documenting as much as possible about the deployment and referring to our previous documentation. We've started to use MAAS, or Metal as a Service, to lay down the operating system and configure the NIC bonds, IPs, and partitions, and it's been working out well; as with any automation tool, it required a bit of testing before it worked the way we wanted, but now it's paying dividends. Once we're finished with MAAS, we switch to our Ansible playbooks to push installations of our various tools, monitoring software, OpenStack packages, and configurations. From that point forward, we use Ansible to make any future configuration changes and keep things consistent.

Some operational details here. We let Ceph heal itself for things like drive failures, and what I mean by that is when a disk fails, we let the cluster rebalance and settle, then we work with our on-site technicians to replace the disk. This is transparent to the users and has worked out well thus far. We also keep about 4% of our hard drives as on-site spares, so that we have inventory during failures. This removes any urgency from replacing a disk while we file the paperwork with our vendor to get a replacement. We also set up Ceph so that rebalances aren't triggered automatically if a full rack goes down, which is our fault domain, but it will rebalance for smaller events like a host failing or drive failures. Zabbix is at the core of our monitoring and alerting; we use it to check and graph everything. We basically have historical graphs for many obscure things, which you might not think you need right away, but when something goes wrong and you want some historical context for it, it's something we can rely on. And like any good sysadmin or DevOps team, we've developed many scripts to make our lives easier, or to check things we've experienced silent failures on in the field. For example, we have a script that monitors and keeps our Glance images up to date, so that we know our users are deploying new instances that are well patched. We also set up a mailing list so that we can send announcements and notifications to our users, and our team developed a website so that people can sign up and read more about the project. ARA, Ansible Run Analysis: we use this tool, it's new to us, and we appreciate the effort that's gone into its development. It helps us easily digest our playbook runs. Gone are the days of scrolling through hundreds of lines in a terminal looking for the things that Ansible changed that we didn't expect; this way we get a nice tabulated format that we can sort to see exactly what changed really quickly.

Let's talk a little bit about networking. We use VLAN-based networking in this OpenStack deployment. This has some obvious networking benefits, such as smaller broadcast domains, but it also segments out nicely for monitoring and is well supported and understood. Networking at the compute and storage level is configured with multiple 10-gig ports. From those ports we create a bond, where some are load-balanced, and then we layer VLAN interfaces on top of the bond for the different types of traffic: management, GRE, Ceph public, et cetera.
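As a rough sketch of the Glance image-freshness script mentioned above, something along these lines could flag stale public images using the openstacksdk; the cloud name and the 30-day threshold are assumptions, not the team's actual values:

```python
from datetime import datetime, timedelta, timezone

import openstack

MAX_AGE = timedelta(days=30)  # assumed threshold for calling an image "stale"

# Cloud name comes from clouds.yaml; "collaboratory" is hypothetical.
conn = openstack.connect(cloud="collaboratory")

now = datetime.now(timezone.utc)
for image in conn.image.images(visibility="public"):
    # updated_at is an ISO 8601 string such as "2017-05-01T12:00:00Z"
    updated = datetime.fromisoformat(image.updated_at.replace("Z", "+00:00"))
    if now - updated > MAX_AGE:
        print(f"stale image: {image.name} (last updated {image.updated_at})")
```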
So let's talk a little bit about Ceph and monitoring Ceph. We've got this trifecta of Ceph monitoring Docker containers: collectd, Graphite, and Grafana. We have dedicated dashboards just for Ceph. In this screenshot I'm showing how the I/O profile differs in our environment between volumes and the object store. We've got a lot of reading from the object store, but not a lot of IOPS, whereas on volumes we see more of a mix of reads and writes. This is because our Ceph cluster is predominantly used for reads from the object store: researchers are pulling large genomic files down to the instance's local disk. As George said, this allows us to get away with a higher capacity-versus-performance ratio and no need for SSDs in our storage nodes, so we can just cram the storage nodes with large multi-terabyte disks.

We perform hourly downloads of very large files from the object store in order to have long-term performance metrics available. This download also verifies the integrity of the object storage by performing a hash check on the file. This information is particularly useful to have after Ceph upgrades or other related system changes. RADOS Gateway throughput: we're load-balancing 10 RADOS Gateway instances across five servers, and we're able to achieve 28 gigabits of throughput from the RADOS Gateway, thanks to the use of ECC certificates at HAProxy. We allocate 22 cores to HAProxy and keep a watchful eye out for performance improvements in future HAProxy changelogs.

I'm going to talk a bit about our observations during Ceph rebalancing events. In this screenshot we're adding new storage nodes with 36 drives each into the cluster. The network throughput reached more than 14 gigabits, taking advantage of the provided 20 gigabits. More than 20 gigabits would likely have been overkill for this storage node configuration with 36 drives; more proof of that in the next couple of slides. Here's a capture of CPU utilization during a Ceph rebalancing event, which helps us identify bottlenecks. We can see that for this rebalance most of the CPU time is spent in I/O wait, which tells me we're not bottlenecked by the CPU and there's more than enough computing power for the 36 drives. Here's a capture of memory usage, and we see very minimal consumption of memory during this rebalancing event. We could get away with less RAM in a future storage server specification, but I guess it's always safer to err on the side of too much rather than too little, if you can. And in this screenshot we see the individual disks and their respective IOPS. In this case the SAS disks appear to be at their maximum IOPS rating during the rebalance, so that tells me that in this particular rebalance event the disks are the choke point. Here we see a capture of the rebalance event and how the data gets shuffled between storage nodes. The two top graphs are existing storage nodes and the two bottom ones are brand new, and as we introduce those new nodes to the cluster (this is just a small snapshot; there are many other servers in the cluster), you can see the existing nodes start to offload some data onto the new nodes, which are just chugging along, ingesting lots of data to get up to the right ratio with the rest of the cluster.
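The hourly integrity check described above boils down to downloading a known object and comparing a checksum; a minimal sketch with boto3 and hashlib might look like this, with the endpoint, bucket, object, and expected hash all placeholders:

```python
import hashlib

import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://object.example.org",   # RADOS Gateway endpoint (hypothetical)
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

BUCKET, KEY = "benchmark", "large-test-object.bam"   # placeholder names
EXPECTED_MD5 = "0123456789abcdef0123456789abcdef"    # checksum recorded at upload time

# Timing this call is what produces the long-term throughput metric.
s3.download_file(BUCKET, KEY, "/tmp/test-object")

md5 = hashlib.md5()
with open("/tmp/test-object", "rb") as f:
    for chunk in iter(lambda: f.read(8 * 1024 * 1024), b""):
        md5.update(chunk)

if md5.hexdigest() != EXPECTED_MD5:
    raise SystemExit("integrity check failed: checksum mismatch")
print("integrity check passed")
```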
So, Rally is another tool that we use. It's excellent for keeping tabs on the functionality and performance of your OpenStack and Ceph environment. We use Rally to verify end-to-end functionality by simulating what a user might do; it can test a lot of components at once, like Nova, Glance, Neutron, and Cinder, and it saves a lot of time versus doing it manually. In addition to functionality testing, we use Rally for stress-testing the cluster, often after making configuration changes or an upgrade: we can set up a large task to simulate heavy cluster usage and analyze the results. The Rally result is essentially outputted to an HTML file, one per Rally test, and as you can imagine, reviewing a whole bunch of Rally tests to see trends is kind of difficult. So what we do at scale is pump those Rally results into Graphite and graph them in Grafana; that way we have a much easier historical view of those Rally tests and how they performed. You can put anything into Graphite and graph it with Grafana, which is a useful tool. For example, in this screenshot we are looking at a collection of active OpenStack projects in our cloud and what percentage of their quota has been used; this helps us determine if we have any stale or constrained projects, and the data for this graph is generated from a Bash script, right, George?

So that's enough tools from me, and probably enough tools for you, so I'm going to hand it back to George. I'm going to talk about the lessons learned after operating this environment for two years; it was initially just me, and then Jared joined. Basically, if something needs to be running, you have to test that it is running; you can't just assume it. We had a case, for example, where the Neutron metadata service was not working properly in one project: the iptables rule that is supposed to perform NAT between the 169.254.169.254 metadata address on port 80 and the Neutron metadata server was missing, just for that project. After we fixed it by restarting the Neutron metadata service, and the iptables rule came back, we said, okay, let's add a Zabbix check that goes into each namespace on the controller and looks for these iptables rules, once a minute. Simple tasks sometimes are not simple: we did a regular security upgrade, and because we are running Mitaka and Canonical had made Jewel part of the Mitaka repo, we ended up upgrading a component of Ceph on some nodes, which we didn't expect. So we were basically running the entire cluster on Hammer, but some parts were on Jewel; not Jewel from the Ceph repo, but Jewel built by Ubuntu. It worked, but it was scary. It's good to have more RAM on the Ceph storage side, because this allows you to have larger nodes and not be affected by small memory leaks. We have 256 gigs; as you could see in the previous graph, we maybe use 50 gigs, but the other 200 help with caching reads, and if in the future Ceph has a memory leak, we'll be able to get away with it for a while. Also, you can run very economical configurations for your compute nodes, which are numerous, so that's where you want to be very strict about designing for scale; but for the infrastructure node that does the monitoring, and for your controllers, you should be more generous. We have 128 gigs of RAM on our Zabbix server, which also runs Grafana and an ELK stack that stores 300 million log entries on SSD disks, so as we scale the environment, we still have a lot of capacity and performance on the infrastructure node.
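The namespace check described above can be approximated by shelling out to ip netns from a script that a monitoring system such as Zabbix runs once a minute; matching on the 169.254.169.254 address is an assumption about how the metadata NAT rule typically appears, not a copy of the actual check:

```python
import subprocess

# List all network namespaces on this controller (first token of each line).
namespaces = [
    line.split()[0]
    for line in subprocess.run(
        ["ip", "netns", "list"], capture_output=True, text=True, check=True
    ).stdout.splitlines()
    if line.strip()
]

missing = []
for ns in namespaces:
    if not ns.startswith("qrouter-"):
        continue  # only router namespaces carry the metadata NAT rule
    rules = subprocess.run(
        ["ip", "netns", "exec", ns, "iptables", "-t", "nat", "-S"],
        capture_output=True, text=True, check=True,
    ).stdout
    if "169.254.169.254" not in rules:  # assumed signature of the metadata rule
        missing.append(ns)

# Non-zero exit lets the monitoring system raise an alert.
if missing:
    raise SystemExit(f"metadata NAT rule missing in: {', '.join(missing)}")
print("metadata NAT rules present in all router namespaces")
```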
In conclusion, it's possible to run a stable and performant OpenStack without paying for support, with just a few qualified people. What you have to be careful about is to design it well, choose the most stable features of OpenStack and Ceph, and if a feature is not needed, don't try to make it work or deploy it, because you're going to spend a lot of time for very little benefit. As for future plans, we have to upgrade to Ubuntu 16.04 in order to move on to Newton and then Ocata; that's the next item on our to-do list. We have another research grant coming that's going to build a new and larger environment, for which we are going to change the networking design, because the switch stack ring has a limitation of 12 switches, so we cannot go beyond 12 racks. It's good enough for the Collaboratory, but it's not going to be good enough for this new project, so we are going to go with a spine-leaf architecture. We are also looking at moving from Debian packages and Ansible to containers, but we have had mixed results with Docker in production; it's pretty unstable from my point of view, so I'm not quite ready to run the entire control plane on Docker containers, but as the technology matures we'll definitely be doing that. I would like, in closing, to thank the funding agencies, and we are open for questions.

Yes, so the question was whether we are considering using erasure coding for the RADOS Gateway. No; as I said, I've seen people on the mailing list losing data with erasure coding, and recovery is tricky. We have a few customizations on the RADOS Gateway: we upload the very large genomic files, like 150 GB files, in 1 GB multipart chunks, and then the RADOS Gateway stores them as 64 MB RADOS objects. This keeps the number of RADOS objects in the RADOS Gateway pool manageable; it's still millions, but it could be much larger if we went with the default 4 MB. Also, having larger RADOS objects, combined with an increased read-ahead on the drives, helps with retrieving the data. So erasure coding is not something that I'm ready to support in production, with no paid support from Red Hat and no easy-to-access backups for 540 TB of data.

Thanks for the talk, guys, it's good content. How are you using your object storage, is it S3 or Swift? S3. It can be accessed as Swift, but our custom object client uses S3. The client was also rewritten to talk to Microsoft's object storage and to Amazon, and I think there is work in progress to make it compatible with Swift, but natively it's S3. And a second question: you mentioned that you tuned RGW for some performance improvements, can you talk about that? Basically, the client uploads in 1 GB parts, we use 64 MB chunks on the RADOS Gateway, we switched HAProxy to ECC certificates, which lowered the CPU utilization on the HAProxy side, and we use an 8 MB read-ahead on the drives. That's just a few of the optimizations that we made. Just one comment, you mentioned object storage: I've been doing a lot of testing, and erasure coding is supported by Red Hat and people are using it in production right now, but let's talk about it afterwards. Hey guys, I have a question for you; by the way, thanks for the great content. In terms of MAAS, are you using it standalone or in conjunction with Juju? No, just MAAS. So you're doing the rest of the stuff with Ansible?
We use MAAS as far as we can take it without getting too much into the weeds; as you know, the documentation is not so hot, hopefully that changes, but we get it to a point where we can take over with Ansible. And MAAS itself, are you doing it all automatically, or is there some manual work? Provisioning, for example, dynamically discovers any new nodes coming onto the network, but there is a pain point for me: I do have to configure the partitions, the IP addressing and all that. At least I have a fast UI to do it with; I would love to find out how to automate that, so I'm still kind of struggling with that. Yeah, we are adding capacity about twice a year, one rack at a time, so it's a few days of work to lay down the OS; if we were doing deployments more often, that would be much higher on my priority list to resolve, for sure. A question around networking: how does your networking look, is it a single broadcast domain? So it's a stack, but how are you connecting your servers to that stack? There are trunk ports going to all the servers, and they carry multiple VLANs: one for management, one for IPMI; actually, IPMI is out of band, some ports are dedicated. And then the front-end interface, any 10-gig interface, is going to be trunked, carrying four or five VLANs depending on whether it's a compute node or a storage node. Okay, thank you. I want to stop the recording, and if you have any questions, we can answer them. Okay, yeah, no, thanks guys, thank you.