Hello, hello to everyone. My name is Mariano Galar. I'm part of the Cloud Builder team at Mercado Libre. Today I want to start by giving a brief introduction about the company and about how we were doing things before OpenStack, and then Leandro will show you our OpenStack implementation in more detail.

Well, for those who don't know Mercado Libre, it is the leading e-commerce platform in Latin America. It has a presence in 16 countries and is the eighth online retailer worldwide. It has more than 1,600 employees, 300 of them IT related. We have more than 62 million registered users, 20 million API requests per minute, 50k requests per second in our peak season, and around 4 gigabits per second of internet bandwidth. We have around 1,000 physical servers and 7,000 virtual instances running in our environment.

In 2008 we started thinking about virtualizing all our infrastructure, and we thought that would be enough to supply all the needs our development team had. But, as you will see, that didn't happen. First of all, we just virtualized our instances, but it was not an automated process; everything was manual. Still, we were able to supply all the demand. Then, as the business grew and changed, because we moved from boxed software to an open solution with APIs, the demand for hardware increased a lot and we were not able to provision enough infrastructure for our developers. That had an impact on the business. We didn't have scripted configurations on the VMs. The operations team was provisioning the servers, not the final user. We were cloning VMs, so we didn't have images to provision from. We were using a lot of storage, and it was not easy to manage. We were locked in by an OS vendor. Deploys were not easy, and we were very slow.

So we started thinking about another solution. We were not able to satisfy the demand, projects were delayed, the technology was obsolete. The costs of the data center were growing, and the company was growing, but not as expected because of all these issues, so we were seen as a bottleneck. We started thinking about redesigning our infrastructure to become a real infrastructure as a service. Leandro will talk to you about that.

Okay, hi guys. My name is Leandro. I'm a senior engineer at Mercado Libre, and I'm one of the guys who deployed the OpenStack infrastructure in our company. I want to talk a little bit about how our infrastructure is composed and how we redesigned it to become a real infrastructure-as-a-service provider. Why did we choose OpenStack? We chose OpenStack because we love open source. Everything at Mercado Libre is open source; our laptops run Linux, and there's just one Windows box that some users use. Of course, we also wanted to reduce cost, because our infrastructure was really expensive. We were using Oracle VM to virtualize, and all the licensing was costing us a lot. Because there is no vendor lock-in, we can deploy it anywhere. It's written in Python; we love Python, so we love to hack on it a lot. And it's a platform that is booming right now.

All of our infrastructure that runs on top of OpenStack runs on Ubuntu. The Cactus version of our cloud was running on Ubuntu 10.10 Maverick, and we are currently running Essex on top of Ubuntu 12.04.
We love Ubuntu because it's free, it's open source, and it's based on Debian, which I personally love. In terms of cost, of course, it has LTS releases for updates and upgrades. It's flexible, it's stable, it's fast, it's easy to deploy, it's easy to tune, and it's proven to shine as both host and guest. We are using KVM as the hypervisor; it works pretty well with OpenStack, as you know. Recently we also got professional support from Canonical.

Okay, I want to talk a little bit about how the architecture is composed; I think many of you are curious about how everything is set up. That's the first version of our cloud, which was running on Cactus. We have been using OpenStack since the Bexar release, and our first production environment was on Cactus, about a year or more ago, almost two years. It was composed of several clusters in several regions distributed across two data centers. We have two data centers located in Virginia, with several clusters spread across both of them. Of course, we needed a way to integrate all that mixed environment, so we deployed a custom API called the Massive Deployer, the one you're seeing there. In that way we see the whole infrastructure, all the clusters, all the regions of OpenStack, as one single resource. Back in Cactus the zones code wasn't working very well and was later removed, and Cells isn't available yet, so we built an API that presents all the clusters in all the regions as one region. From that API we manage all the instance creation, and we know which cluster has the most free resources where we can create a VM. All the intelligence about which cluster has the CPU, and which flavor should run where, lives in the Massive Deployer API. It's kind of a scheduler.

As I told you, we have many clusters distributed across regions. By now we have about 40 or 50 clusters, with about 16 nodes per cluster. We have a pool of Glance image services because they get heavy use. Back then we were using nova-volume as block storage to attach to the VMs, with LVM-formatted volumes running on the controller. That was a huge problem, because every cluster in every region across both data centers had its own nova-volume endpoint. That was pretty bad: if you had a connectivity issue or something, the VM got corrupted and we lost that block storage until we recovered it. We knew we had to work on that, because it was a part of the cloud we were not happy with. Actually, the LVM volumes sat on top of a LUN in a NetApp storage system, a LUN that we presented from the NetApp and then exported via the controller and the nova-volume service to the VMs. It was kind of tricky, so we knew we had to work on that.

Separately, we have a Swift cluster of about 40 nodes, with about 70 terabytes of usable Swift storage. On Swift we store CSS files, JS files, template files, everything you see when you enter the site, all the style sheets, the device ID stuff; it's all stored on Swift. When you log into Mercado Libre, everything you see actually comes from Swift. Of course we have a CDN and we have caching, which I'm going to talk about later. Back then we also had a MySQL cluster with DRBD. It wasn't scaling, so we knew we had to change that part of the infrastructure too.
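Just to make that "kind of a scheduler" remark concrete, here is a minimal sketch of what a pick-the-least-loaded-cluster decision could look like. This is not the actual Massive Deployer code; the Cluster fields, the choose_cluster helper, and the example numbers are assumptions for illustration only.

    # Hypothetical sketch of the cluster-selection logic described above.
    from dataclasses import dataclass

    @dataclass
    class Cluster:
        name: str
        region: str
        free_vcpus: int
        free_ram_mb: int

    def choose_cluster(clusters, flavor_vcpus, flavor_ram_mb):
        """Return a cluster that can host the flavor and has the most headroom."""
        candidates = [c for c in clusters
                      if c.free_vcpus >= flavor_vcpus and c.free_ram_mb >= flavor_ram_mb]
        if not candidates:
            raise RuntimeError("no cluster has enough free resources for this flavor")
        # Rank by free RAM, then free vCPUs, and pick the least-loaded cluster.
        return max(candidates, key=lambda c: (c.free_ram_mb, c.free_vcpus))

    clusters = [Cluster("cluster-01", "dc-east", free_vcpus=40, free_ram_mb=96000),
                Cluster("cluster-02", "dc-west", free_vcpus=12, free_ram_mb=32000)]
    print(choose_cluster(clusters, flavor_vcpus=4, flavor_ram_mb=8192).name)

The real API presumably also weighs regions, per-cluster flavors, and per-tenant placement rules, as described above, but the ranking idea is the same kind of thing.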
If you ask how we manage 7,000 VMs and 1,000-plus physical nodes, we are using Chef for it. The whole physical infrastructure and all the virtual machine configuration is managed by Chef. That's how we do it; I'm going to explain it in more depth in a couple of slides.

Let me show you how a cluster itself was composed back then. It was a compute cluster of about 15 nodes, with a controller that acted as the API endpoint, nova-network, nova-scheduler, and nova-volume server: 16 nodes, plus two spare nodes. One of the spare nodes was configured as a standby controller back then, because in Cactus there's no HA for OpenStack, so we kept another spare controller that we could turn on if the controller was lost, until it was recovered. That's how it was composed. There's iSCSI traffic that actually goes to the NetApp storage. The Massive Deployer API, as I told you before, was our API to present the whole infrastructure as one.

I don't know if Mariano mentioned it, but Mercado Libre is the whole site for the 16 countries, and it's running on top of OpenStack. We have OpenStack in production; the whole site is on it. We have about 7,000 VMs running, about 80% of them for production traffic and about 20% for testing and internal use. All our infrastructure, testing, production, everything, is running on top of OpenStack. Now we're running on top of Essex; back then we were running on Cactus. So that's how one of our clusters was composed back then.

Yeah, we had many problems with that, so we had to re-architect the whole cloud to solve a lot of the problems we were having. How did we manage to do that? First, we moved from Cactus to Essex. We didn't do an in-place upgrade. Many of you are maybe asking how we upgraded the whole cloud from Cactus to Essex. We didn't. We just started installing the Essex clusters in a new production environment, and in our API we simply redirected the creation calls to the new cloud. All of our infrastructure is stateless, all our APIs, all our databases, so if we spin it up, we can run it anywhere: on Amazon, on Rackspace, on our internal cloud. That's one of the big changes we made when we moved from flat virtualization to our cloud infrastructure. We had to tell all our devs how we were going to work in a cloud environment, and that all the APIs should be stateless. That's a real advantage, and it's why we managed to move so easily between Cactus and Essex without any problem. For example, if you were running, I don't know, 50 front-end web server nodes on Cactus, the developer just made an API call to create all 50 nodes on the new version of the cloud. Then, via monitoring, we knew when a compute node wasn't running any VMs anymore, so via Chef we picked up that compute node and added it to the new cloud. We did that automatically, so we avoided wasting resources on zombie machines and that kind of thing.

Of course, when we deployed Essex, many new things came along, like Keystone. That was a big change, because when we were using Cactus we had all our departments, our tenants, created on each cluster, each with their own pair of keys, and the API had to pick up the keys for the cluster the VM was going to run on and use those.
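As a small aside on the Cactus-to-Essex migration just described, here is a minimal sketch of the "find compute nodes with no VMs left so Chef can move them to the new cloud" step. The get_instances_per_host source and the host names are invented stand-ins; in the talk this detection is driven by their monitoring plus Chef.

    # Minimal sketch of the "reclaim idle compute nodes" idea described above.
    # get_instances_per_host() is a stand-in for whatever inventory source is
    # available (Nova API, CMDB, monitoring); it is not a real API.

    def get_instances_per_host():
        # Hypothetical data: host name -> instance IDs still running there.
        return {
            "cactus-compute-01": [],
            "cactus-compute-02": ["i-00042"],
            "cactus-compute-03": [],
        }

    def hosts_ready_to_reclaim(instances_per_host):
        """Hosts with no remaining VMs, i.e. candidates to move to the new cloud."""
        return sorted(host for host, vms in instances_per_host.items() if not vms)

    for host in hosts_ready_to_reclaim(get_instances_per_host()):
        # In the talk the actual reinstall/repurpose step is driven by Chef;
        # here we just print the candidates.
        print(host, "has no VMs left; hand it over to the Essex cluster")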
With Keystone, that per-cluster key handling got much simpler, because you just have an identity service and identity API: you grab a token, the VM gets spun up, and the Massive Deployer knows where to put the VMs, based on which cluster has the most free resources (vCPU, memory) and which flavor has to run. Of course, we had to integrate Swift with Keystone; that was pretty easy. The Glance image service just moved between versions; we had to upload the images we had on the Cactus version of the cloud to the new Glance image service. But actually we don't have many AMIs, to use the Amazon term. We have one Ubuntu 12.04 image, a Red Hat image, and a Windows image just in case. The developers just spin up an image that doesn't have anything on it, Ubuntu or Red Hat, and based on metadata passed via user data into the instance, the instance configures itself with those key-value pairs using Chef. Every instance that spins up knows who it is, where it has to go, and which load balancer it has to add itself to. So the instance just spins up and gets itself configured via Chef. That works for Red Hat, Ubuntu, everything. That's what an instance's life is like at Mercado Libre.

As I told you before, the database was a big problem. DRBD wasn't scaling: we had a lot of write errors and a lot of sync issues, so we moved from MySQL with DRBD to a MySQL Galera cluster. I don't know if any of you here know Galera. It's a really great product, it's open source, and it's made by Codership. Galera uses the wsrep protocol to sync all the info between the nodes. It's really great because it has a master-master configuration, so we can have, and actually do have, about 20 Galera MySQL nodes that receive all the cloud traffic. All the databases of each region are stored on Galera. It's really cool. We're still using Chef, the new version of Chef, with its database stored on Galera too.

Of course, we're still using Swift as object storage, as I told you before, to store all the CSS and the site stylesheets. We are planning to move all the item images there too; I don't know if you guys know Mercado Libre, but that's pretty massive. So we're going to store all the site images in Swift. That's a big project that's going on.

Okay. If you look here, now the controller is only acting as nova-network, API endpoint, and scheduler; it's not acting as a nova-volume server anymore. What we did here: we have a lot of money invested in NetApp storage, so we had to use it. We knew that NetApp was getting really involved in OpenStack when we were trying Essex, so we asked NetApp whether they were coding a nova-volume driver. Rob Esker told us they were working on it, so we started trying the nova-volume driver. Our Essex cloud is actually using it, so now all the block storage is stored directly on the NetApp farm. The iSCSI mapping is not done via the controller to the LUN anymore; it's done directly against the Data Fabric Manager, the server that has all the storage pools configured. Each cluster has its own storage service for all the projects. So we solved that problem, and we solved the database issue too. By now we actually have about 24 or 25 nodes per cluster.
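Going back to the user-data bootstrapping described a moment ago, here is a rough sketch of what an instance-side first boot could look like, assuming key=value pairs in user data and a Chef run afterwards. The key names (role, pool, env) and the file path are assumptions, not Mercado Libre's actual scheme; the 169.254.169.254 user-data URL is Nova's standard EC2-style metadata endpoint.

    # Sketch of an instance configuring itself from user data, then running Chef.
    import json
    import subprocess
    import urllib.request

    USER_DATA_URL = "http://169.254.169.254/latest/user-data"  # EC2-style metadata service

    def read_user_data():
        with urllib.request.urlopen(USER_DATA_URL, timeout=5) as resp:
            raw = resp.read().decode()
        # Expect lines like "role=frontend", "pool=search-api", "env=production".
        return dict(line.split("=", 1) for line in raw.splitlines() if "=" in line)

    def bootstrap(meta):
        first_boot = {"run_list": ["role[" + meta.get("role", "base") + "]"],
                      "pool": meta.get("pool"), "env": meta.get("env")}
        with open("/etc/chef/first-boot.json", "w") as fh:
            json.dump(first_boot, fh)
        # chef-client picks up the run_list and converges the node.
        subprocess.check_call(["chef-client", "-j", "/etc/chef/first-boot.json"])

    bootstrap(read_user_data())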
So, what is running inside the VMs on our OpenStack compute nodes? As I told you before, the whole Mercado Libre infrastructure runs on OpenStack. The second traffic distribution layer, which runs on Nginx, is all virtualized, running on Ubuntu on top of OpenStack. All our load balancers that distribute traffic to the APIs are on top of OpenStack, and I can tell you that's a lot of traffic. Then the caching layer: we use Varnish for caching, and it runs on top of OpenStack too, on virtual machines. We had to work a lot with the networking driver to get that to perform well; we had to use the virtio driver and tune it a bit, because it wasn't scaling enough. And the server layer, where all the applications run, mainly Apache Tomcat but also a lot of APIs on Node.js and Python, is all running on top of OpenStack. All the backends with NoSQL databases like Redis, Mongo, etc. are running on top of OpenStack too, and RabbitMQ as well. So the front-end layer, the caching layer, the server layer: everything is running virtualized on top of OpenStack.

Okay. Oh, sorry, let me go back a little bit. One more? Okay. I forgot to mention something important: we're actually running Quantum too. We are multi-tenant; every department at Mercado Libre is a tenant in OpenStack. Every pool of machines a developer has has its own identity and belongs to his tenant, and only he can manage it. We're running Quantum to isolate everything, and we are using VLAN mode now, VLAN mode in Quantum, yeah. Actually, we had to develop a custom API ourselves to talk to our Juniper switches, because we don't have Cisco switches. So when we create or spin up a VM that belongs to a VLAN, Quantum talks to our plugin, which then creates it on the Juniper switches.

So, a few conclusions. I think many people here think that OpenStack is not ready for production, maybe because the installation experience is generally pretty bad. If you want to install OpenStack and run it in production and you don't have a DevOps team that loves Python and loves to hack on the code, it may get pretty tricky for you. But we came to love OpenStack, and most of the OpenStack core projects we use are pretty stable for us. Of course, when we were installing it, it was stack trace after stack trace, and we had to hack it a little bit, but once you get all that stabilized, it works pretty well.

It's really important to note this: we managed to go from a flat-virtualization era to a cloud infrastructure, to infrastructure as a service. It was really hard to educate our developers to move from that type of virtualization to a cloud; I think that was really the hardest part of it. But now we can keep up with demand.
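Going back to the Quantum VLAN integration mentioned a moment ago: the custom switch API isn't public, so this is only a hypothetical sketch of the general shape, namely that when a tenant network gets a VLAN, an internal service is asked to program it on the switches. The endpoint, host name, payload fields, and port names below are all invented.

    # Hypothetical sketch of the "Quantum tells the switches about a VLAN" idea.
    import json
    import urllib.request

    SWITCH_API = "http://switch-api.internal/v1/vlans"  # made-up internal service

    def ensure_vlan(vlan_id, name, trunk_ports):
        """Ask the (hypothetical) switch service to create/trunk a VLAN for a tenant network."""
        body = json.dumps({"vlan_id": vlan_id, "name": name,
                           "trunk_ports": trunk_ports}).encode()
        req = urllib.request.Request(SWITCH_API, data=body,
                                     headers={"Content-Type": "application/json"})
        with urllib.request.urlopen(req, timeout=10) as resp:
            if resp.status >= 300:
                raise RuntimeError("switch API rejected VLAN %d: %d" % (vlan_id, resp.status))

    # e.g. called from a network-created hook: one VLAN per tenant network.
    ensure_vlan(1204, "tenant-search-api", ["xe-0/0/1", "xe-0/0/2"])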
So we actually moved from this: a developer says, "I'm a developer, I need a machine," writes an email to the Ops team, and the Ops team just creates the VM by cloning another similar VM. It was pretty fucked up. There was a little bit of scripting involved in the process to configure and craft all the VMs; then it passed to the architecture team to configure all the applications; the whole process was manual. Then the storage team created the block storage volume and attached it to the VM, we had to send a monitoring request to the NOC to get it monitored, and we had to add it to the load balancer manually before it received traffic.

We moved from that to this: the developers just curl the API to create a server exactly like the others they already have, create their volumes and attach them to a server, manually or scripted, however they want, and with another API call they set the VM to production state. When they set the VM to production state it gets monitored automatically; we are using Zenoss for monitoring, and we use the Zenoss API to do that. Then, when it's ready, we just make a pool request on Nginx and the machine starts to receive traffic. That can be manual or automatic: if the developer configures his pool as an automatic Nginx pool, then when they set the VM to production state it automatically gets added to the Nginx balancer.

Okay, a few more conclusions. We managed to present all the infrastructure as a whole, which is what we wanted, and we wanted to grow faster. The developers were pretty cranky: "I asked for a VM two days ago, what's going on?" So we knew we needed to move from a flat virtualization model to a cloud model, with all the advantages that brings. We run on commodity hardware; we are using Supermicro and Dell hardware. We have Canonical support: we were having a lot of issues with KVM on 12.04 on the latest kernel, and having Canonical support to backport fixes for that was really great. Better performance, of course. We developed an auto-scaling feature: if a pool is fully loaded, it automatically spins up more VMs. Flexibility, and we are six times faster at building VMs than in the virtualization era.

Just numbers: we have about 7,000 VMs. Back then we spent maybe two to four hours to get a VM fully configured. Now the developers just make a call, and the VM gets to production state, ready to receive traffic, in about eight seconds. We are using NFS storage for the _base directory where the Glance images are cached; with that and the copy-on-write feature of KVM, we manage to spin up really fast. That's really cool: in eight seconds you have a productive front-end web server, for example, running Apache.

We also managed to build a lot of features on top of OpenStack thanks to its flexibility and extensibility. We have load balancing as a service, which uses a custom API that we crafted; it uses HAProxy and Nginx, depending on which balancing method you need and whether you need caching or not. We developed database as a service on top of it using MySQL; we built an API so developers who just need a database can simply make a call. Queue as a service, which actually uses RabbitMQ. Cache as a service, which uses Memcached and supports the Memcached binary and text protocols. That's really cool, because in the past, if a developer needed, for example, to build a feed or a feeds API, they had to install the whole stack themselves, the queue and everything. Now they can just create a topic or something in our queue-as-a-service system, so that's pretty fast.

Okay, this is our contact information if you want to connect. I don't know if you guys have questions for us. On the load balancing question: actually, we're using an F5 BIG-IP for the edge traffic, and then we are using Nginx as a second layer.
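To illustrate the developer workflow just described (create a server, attach a volume, flip it to production), here is a hedged sketch against a hypothetical Massive Deployer REST endpoint. The base URL, paths, and payload fields are invented; only the sequence of calls follows the talk.

    # Rough sketch of the self-service workflow described above.
    import requests

    API = "http://massive-deployer.internal/v1"        # made-up internal endpoint
    HEADERS = {"X-Auth-Token": "keystone-token-here"}  # token obtained from Keystone

    # 1. Create a server identical to the rest of the pool.
    server = requests.post(API + "/servers", headers=HEADERS, json={
        "pool": "frontend-web", "image": "ubuntu-12.04", "flavor": "medium"}).json()

    # 2. Create and attach a block-storage volume (optional, scripted or manual).
    volume = requests.post(API + "/volumes", headers=HEADERS, json={"size_gb": 50}).json()
    requests.post(API + "/servers/" + server["id"] + "/attach", headers=HEADERS,
                  json={"volume_id": volume["id"]})

    # 3. Flip the VM to production: this is the step that triggers monitoring and,
    #    if the pool is automatic, adds the node to the Nginx balancer.
    requests.put(API + "/servers/" + server["id"] + "/state", headers=HEADERS,
                 json={"state": "production"})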
Can you use SnapMirror in the NetApp environment to do replication between your environments?

No, actually each data center is pretty independent from the other. There are shared services between them, but we don't replicate the storage.

Are you going to open source the Massive Deployer? Actually, the Massive Deployer has many, many custom, internal Mercado Libre things, like metadata and pool configuration. At Mercado Libre a group of VMs is a VM pool, and each VM pool has particularities internal to Mercado Libre, so I don't know if it's really worth open-sourcing. But the load-balancing-as-a-service, queue-as-a-service, cache-as-a-service, and database-as-a-service features we are going to release as open source. Actually, if you need to see all your infrastructure as a whole, maybe when Cells gets into Grizzly, and I don't know if it's going to get into Grizzly, that may be a better option for you.

Two questions. First, are you taking advantage of the security groups features, all the iptables stuff? And second, I noticed you have 10 or 20 servers in a Galera cluster for high availability of your SQL deployment; what would you think about something like a NoSQL solution to store all the OpenStack data, taking advantage of scalability instead of persistence? Using security groups? No, that's the first question, and the other one is: instead of 20 nodes with a MySQL Galera cluster, what about OpenStack storing its data in a NoSQL model? That doesn't exist today, but what is your opinion on it?

We are using a lot of MongoDB and Redis, which we manage via Chef. I don't know if you guys saw the presentation before, but we have a lot of NoSQL clusters on OpenStack and it performs pretty well. As for using that instead of the Galera cluster, why we didn't go to a NoSQL model: we went with Galera because we don't just store OpenStack data there; we store a lot of things, all our API databases. We needed a multi-master environment that performs really well, so we went with that solution.

You only have about 25 compute nodes in each cluster; is there a reason why you stopped at 25?

Actually, that size came from the early version of the cloud. We were using flat networking, and we sized it so that that number of compute nodes, running, I don't know, 10 or 15 VMs each, was enough to fill a whole /24. So that comes from the early version of the cloud. But we are planning to move from that scheme to one big cluster per data center. We are working on that; we are trying Quantum with Melange, and actually the Quantum and Melange integration is pretty crappy, so we are developing on top of it, hacking a little bit, to get to two big fat-ass clusters.

When your controller node dies, how do you bring up a replacement controller in each cluster?

If a controller node dies, since all the block storage traffic goes directly to the NetApp, we actually don't have downtime. The only things affected are that you can't spin up a new VM in that cluster, or, when a VM reboots, maybe that VM doesn't get its metadata. But that's why we have a spare controller node configured: if a controller dies, we just turn it on.

Is that manual, or do you have automation there to bring up the spare controller?

We do that with Chef: if a controller dies, Chef notices it, because we are monitoring.
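(To put rough numbers on the /24 sizing in that answer; this arithmetic is ours, not from the talk: a /24 has 2^8 - 2 = 254 usable host addresses, so around 25 compute nodes at roughly 10 VMs each, or 16 nodes at roughly 15 VMs each, comes to about 240-250 instances, which essentially fills one flat /24 per cluster.)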
About the networking driver: we were using the e1000 driver, sorry, I don't know how to say it, and the traffic was dropping packets and it was performing pretty poorly. So we moved to the virtio driver; we had to tune virtio a little bit, but now it takes all the traffic, and the Varnish caching engine traffic all goes through there.

When a new VM spins up, if the pool has automatic mode activated, it actually adds itself as a backend node on Nginx, and that reloads the configuration. We have an agent that is constantly monitoring for changes to the backends; when there is one, Nginx is reloaded.

About Glance: we actually use Glance just one time, to spin up the first VM, because as I said before we are using an NFS shared volume to store the _base directory where the image itself is cached. So Glance is used just once, when the first VM is spun up with that image. Then, when you launch the same image on that cluster again, it first checks, with a simple call to Glance, whether it already exists in _base, and it does, and then it spins up the VM really fast; you don't need to get it from Glance every time you create a machine.

Yeah, actually the developers write them; that's one of the reasons we chose Chef. Many applications at Mercado Libre are built in Ruby, so our developers are pretty experienced writing Chef recipes. For them it was pretty easy because it's declarative, so we assist them if they need it, but they write the recipes themselves. It wasn't a big effort, because we were creating the whole cloud stack in a new environment while the old stuff was still working, so many developers were dedicated to translating all the installation stuff into Chef recipes, and then...
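Going back to the _base caching answer above, here is a sketch of the general mechanism, assuming qcow2 copy-on-write overlays on top of a cached base image, which is what Nova does with KVM when use_cow_images is enabled. The paths and the hashing are illustrative.

    # Sketch of the _base cache plus copy-on-write overlay described above.
    import hashlib
    import os
    import subprocess

    BASE_DIR = "/var/lib/nova/instances/_base"   # shared over NFS in this setup

    def ensure_base(image_id, fetch_from_glance):
        """Download the image from Glance only if it is not already cached in _base."""
        base_path = os.path.join(BASE_DIR, hashlib.sha1(image_id.encode()).hexdigest())
        if not os.path.exists(base_path):
            fetch_from_glance(image_id, base_path)   # slow path: only the first boot pays it
        return base_path

    def create_instance_disk(base_path, instance_dir):
        """Create a thin qcow2 overlay whose backing file is the shared base image."""
        disk = os.path.join(instance_dir, "disk")
        # Newer qemu-img versions also want "-F <backing format>" here.
        subprocess.check_call(["qemu-img", "create", "-f", "qcow2", "-b", base_path, disk])
        return disk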