So, Amaro here is from Portugal. He works for Acquia, he is also the chairman of the Portuguese user group, and he is going to talk to us about multi-region failover scenarios in the cloud and explain how the cloud can be made a safe place.

So we are here to see how we can turn the cloud into a safe place. How many of you already have servers in the cloud? Please raise your hands. Wow. Okay, so this session is definitely for you. You are in the right place. I'll go here.

My name is Ricardo Amaro. I'm from Portugal and I'm a cloud systems engineer. I work for Acquia in the operations team. What we do is provision servers, manage those servers, arrange solutions for performance, and deal with bare-bones critical issues. So if everything is failing, the last resort is ops. That's what we do. Everything in the cloud.

The agenda for today: why the cloud? We're going to talk about cloud hosting, see the Amazon AWS regions, see how to do a failover, and compare DNS failover with CDN failover. We're going to see file replication, configure file replication, and also database replication. And I will show you a complete setup that you can actually do for yourself. This will be a basic demo, a short multi-region recipe using two cloud servers in the US and two cloud servers in the EU. I hope you understand US, United States, and EU, European Union, okay, so you're ready on this subject. We will use Ubuntu Linux for this. We will use Gluster and Unison. We will install the LAMP stack and Drupal, and Tungsten for MySQL replication.

So why the cloud? I'm sure a lot of you have your own reasons, but since this is a DrupalCon, I'm also sure that the major issue is solving the scalability of Drupal. And like Dries says here, this was written in 2008, when Acquia started.
And one of Acquia's critical missions is to make Drupal the best-known and most used content management system. We actually don't develop sites on Drupal. We just host them. So we work with a lot of partners.

So, a fixed infrastructure or a moving infrastructure: which one of these do you think Drupal needs for, let's say, a media site, a newspaper site? Which one? Will it be a fixed cloud infrastructure only, or a moving infrastructure, where the site is living and always changing and evolving? That's probably why you chose Drupal to build your site. But while it's moving ahead, you must keep in mind that a spike can ruin the day, and a bad setup of your cloud servers can probably ruin the whole month. Imagine a media site, a newspaper, that loses visitor income for a month because of a bad setup, a company that didn't do good hosting, et cetera. So why take chances? We really have to take care of our cloud solutions. We know it's cheap to put things in the cloud, but we have to make it safe.

The cloud offers us a series of services. The first one, and the one we're talking about here, is infrastructure as a service. Of course you also have platform as a service and software as a service. There are other companies that do dedicated Drupal hosting as software as a service. We also have Drupal Gardens, which is software as a service.

About the cloud regions, I'm sure you all know this. There are several data centers around the world, and the ones we are focusing on today are the ones in the US and the EU. Acquia is of course now starting here in Asia Pacific, and we'll soon start using São Paulo.

So, from small to big. What we have in terms of Drupal at Acquia is an offer that starts with one server. Who has already used this product? It's Dev Cloud. Yeah, OK. So Dev Cloud is just one server, no failover. If the machine goes down, we have to put it up again.
So we have to reboot or relaunch the instance; that's how it is. But that's because it's for development; it's Dev Cloud, it's for developers. If we're talking about scalability, and about assurance that the site will always respond, we have to start using the Managed Cloud. And in the Managed Cloud, the simplest setup is this one: you have two balancers at the front, then two webs, and then two file system servers with databases replicating to each other. So you can see that if one balancer fails, it will switch to the other. If one web fails, it will switch to the other. And the same for the FSDB, the file system and database server.

And the theme of today is this one: when you have two different regions. This is very difficult to set up, because normally you have to deal with some issues regarding passing the data between regions and the asynchronous nature of it. A server that is not responding now could respond later.

Cloud providers. We are now using Amazon EC2, but we could be using a lot of others. You have a source here. These slides, by the way, will be available later. You can check a very good comparison sheet here, with the prices and features of the several providers.

But there are other providers in the cloud, and these are the dedicated cloud hosts. One of them is Acquia, and what we do now has these numbers. This is only Managed Cloud, not Dev Cloud. Dev Cloud keeps going up and down in terms of the people using it, who then switch to Managed Cloud when they really want to get the site up and running. So we have more than 600 customers and a total of 2,300 servers. In the European GMT time zone that's basically managed by me because, well, we're a small operations team. But since we have everything automated, it works fairly well. 7.7 billion requests per month, 1.4 billion page views per month, and about 133 terabytes of information delivered.
And these are basically the major clients that we have: large media, global political, and large news. The responsibility for this infrastructure is, of course, divided between us and Amazon. This is basically the diagram of the whole infrastructure. We have Acquia Cloud; then the application, which of course is Drupal; then the LAMP stack; then Amazon EC2, which has the virtual infrastructure, network, physical media, and physical security. That part is their responsibility. And we have security, availability, and compliance across this whole diagram.

Another company that appeared lately is Pantheon. They are very promising. They have just raised a lot of money in financing from the Foundry Group and, well, we really hope they succeed. It's good for Drupal. We also have Omega8. They use Aegir; some of you will pronounce it one way or the other. It doesn't matter. Who here knows Aegir? OK. It's a very good hosting platform. You can follow the discussion here.

Cloud outages. It's been a hard journey, because a very large number of companies don't have a disaster recovery plan. We have one, but still: on the 21st of April, 2011, there was a major crash in AWS and, of course, Rackspace... Sorry? Yeah, I was just going to mention that. So this was really a way for Rackspace to take some ground from AWS. Then in December 2011, and a lot of people don't know this, Amazon just told us: well, we have a lot of servers on degraded hardware, they are going to be shut down, you have to relaunch everything. So we were relaunching machines for something like two weeks. And like you said, in June 2012 we had a massive, massive outage that damaged hundreds of data volumes and instances. This was in the US Virginia data center because, well, they say it was a storm. You have probably heard about this case.

So you have to plan for failure. That's what we are talking about today. And we give our customers a production scenario and a staging scenario.
The production scenario is this one. What we do here is have a CDN. We advise customers to have a CDN in front. Why is that? Because it's easier to fail over. The CDN connects to those balancers that you saw in the first diagram. And these balancers, depending on which one is active, are connected to a series of webs. It depends on the client. We have a client that has 24 webs. That's massive. And we have clients that have two webs. The webs can also have different sizes, memory, CPU, well, like when you spawn an instance on AWS. And on these webs there's Drupal. Drupal is there in the file system as a docroot, shared by these machines over Gluster. And then it's rsynced to the other region by a constant process. So you can draw a line here, divide it, and see that, for instance, this is the EU and this is the US. On this side you have a lot of servers that are just standing by, just waiting.

Can it be at the end? Yeah, unless it's a small question. Yeah, wait for it, wait for it. Yeah, I will talk about that.

So what we have here is basically a process that is running at this moment for some clients, and it's called the multi-region setup. MySQL is being replicated with Tungsten, and we will also see how that is used.

The staging scenario: there was a request from one particular client. Instead of having the machines in the same region, he asked us if they could test with machines, in a smaller quantity of course, but with development and staging spread across two regions. This was the smallest setup we could get to. I think that's clear enough.

So, the failover. How do you do the failover? The failover between regions can be done in at least two ways, and the preferable way is this one. It's called the CDN switch. Since we give the customer two balancers, one in region A and the second in region B, we give the customer an IP.
And we tell them: if it fails in region A, just switch the IP that the CDN is making all its requests to over to the other IP. And this has been working quite fine. The other way would be to have no CDN and a low TTL on a DNS record that points directly to an IP. But in that case you would have to wait until the DNS cache expires, so it's kind of problematic. I would not advise that.

So what? Everything is freaking? Yeah, yeah, it's true. Or a CNAME problem, actually, on AWS. So this is a hint for you: always use Elastic IPs. They ensure that when the machines restart you will always have the same DNS name, internally or externally. And the IP stays the same. The inside IP is different from the outside IP, but both stay the same.

Replication software. I'm sure most of you know the Gluster file system. I'm going to install it here in a video, so those who don't know it yet can see it. But please use the latest version. Yes, he is laughing, because that's the reason we use rsync, of course. But still, my advice is to always make the replication between regions asynchronous.

Configuring Gluster is pretty easy. They went from version one to two, and now in version three it's really plain to configure. It takes only these commands to configure Gluster. And at the end, of course, you have to put the mount in the fstab of your Linux server. So let's see the video.

We have our servers running in the European Union, and we were given these DNS names, which we are going to connect to in order to create Gluster on them. So the first thing to do is go inside these machines. The EU machines are slower, also the second one. I don't know why. Here we are doing the third and the fourth server in our multi-region cluster. OK, so we're good. And we can now start to create our bricks. In this case, we are going to create two directories where Gluster will keep its files. You never, ever write or read directly from them.
But that will be the place where Gluster keeps them. Now we are going to install Gluster itself. It's a simple apt-get install command. And this one, and this one. Notice I created another directory, /mnt/gfs. That directory is where we are actually going to mount the Gluster network file system. OK, we're all set.

So now that Gluster is running, what we are going to do is probe from one server to the other. This server will probe the fourth server, and for that we are going to use the DNS name. OK, it worked. And this server here will probe server three. That's done. OK. And of course we can check the status. It's connected. Great.

What we are going to do next is create the actual volume that will use these two bricks. For that, we will use the command volume create, in this case for volume1. We will say that it's a replica of two bricks, that the first brick is on this DNS name, and that the second brick is on this DNS name. Great. OK, we created the volume. It seems OK. And we can get the volume information with volume info. It's fine. And now we can start the volume. OK, it started.

The next step is to actually mount this volume on our mount point. For that, we will use a file that was automatically generated by Gluster itself, along with the network file system options. Good. If we type sudo mount, we can see that it's mounted. Then we can check that it's actually there. Great. And we can touch a file inside just to check. OK? Good. Of course, we can ls the file that's there. If you want, you can see that the file is also on your brick. But you should not go and touch it.

And now we go to this server here, where we are going to do exactly the same mount, and we are going to see whatever is inside the mount. Yeah, our file is there. So it means that our Gluster is connected.
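The Gluster steps narrated in the video can be written down roughly as follows. This is a minimal dry-run sketch, not the exact commands from the recording: the host names, the brick path `/export/brick1`, and the helper names are assumptions made for illustration, and the mount uses the standard `mount -t glusterfs` form rather than the generated volume file mentioned in the talk. Nothing is executed; the functions only return the commands.

```python
"""Dry-run sketch of a two-server replicated Gluster volume.

Host names, the brick path, and the function names are hypothetical.
"""


def gluster_commands(local_host, peer_host, volume="volume1",
                     brick="/export/brick1", mount_point="/mnt/gfs"):
    """Return, in order, the commands narrated in the demo.

    The mkdir and mount steps are needed on both servers; the volume
    itself only has to be created and started once.
    """
    return [
        # Brick directory (never read or written directly) and mount point.
        ["mkdir", "-p", brick, mount_point],
        # Add the other server to the trusted pool and confirm it.
        ["gluster", "peer", "probe", peer_host],
        ["gluster", "peer", "status"],
        # Replicated volume with one brick per server.
        ["gluster", "volume", "create", volume, "replica", "2",
         f"{local_host}:{brick}", f"{peer_host}:{brick}"],
        ["gluster", "volume", "info", volume],
        ["gluster", "volume", "start", volume],
        # Mount the volume through the native Gluster client.
        ["mount", "-t", "glusterfs", f"{local_host}:/{volume}", mount_point],
    ]


def fstab_line(host, volume="volume1", mount_point="/mnt/gfs"):
    """Return an /etc/fstab entry so the volume is mounted at boot."""
    return f"{host}:/{volume} {mount_point} glusterfs defaults,_netdev 0 0"


if __name__ == "__main__":
    for cmd in gluster_commands("eu-3.example.com", "eu-4.example.com"):
        print(" ".join(cmd))
    print(fstab_line("eu-3.example.com"))
```

Keeping the commands as lists makes it easy to review them before handing them to `subprocess.run` on a real server.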
I can put this command either in rc.local or in fstab, so that when the server starts you will have Gluster mounted already. OK. I hope you all understood this demo. I made this video because we normally have Wi-Fi problems at DrupalCons, and I don't want to make the same mistake I've been seeing.

In terms of Gluster, you have to know that EBS volumes, which is what instances use on AWS, don't perform very well. So the best thing to do is always to work around that with some tweaking of the EBS volumes. Always choose an instance that has a 1 gigabit connection. And since you get 20 megabytes per second of I/O, you should probably put eight devices together as a RAID 0, a stripe. That way you could get eight times the throughput. Oops. OK.

In the previous drawing, you saw rsync replication. I think it's better to have a replication that is still asynchronous but not constant, because otherwise you have the server constantly trying to probe the other server, and this takes a lot of CPU from the machine itself. I think replicating in both directions is the best way to do it. This is debatable. Of course, other people can come up here at the end. Let's discuss. I think it's always good to have other ideas. But if we set up a cron job with Unison that replicates in both directions, I think that's a good solution. You could have other solutions, and I'm willing to hear them. So I have this video on how to set it up.

Good. Now that we have our machines configured with Gluster, the next step will be to install the LAMP stack and use Unison to replicate Drupal across all four servers in a multi-region setup. We'll start by going to the file system and creating a series of directories that will keep our configuration files. Next, we will link the corresponding system directories in /etc and /var to these directories.
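The linking step just described, where configuration directories live on the shared Gluster mount and the system paths become symlinks to them, might look like the sketch below. The talk does not name the exact directories, so the paths in the demo and the helper `link_into_shared` are assumptions; the demo runs in a temporary directory so it touches nothing real.

```python
import os
import shutil
import tempfile


def link_into_shared(system_dir, shared_dir):
    """Replace system_dir with a symlink to shared_dir.

    The first server to run this seeds the shared copy with its local
    files; later servers set their local copy aside as "<dir>.orig".
    """
    if os.path.islink(system_dir):
        return  # already linked, nothing to do
    os.makedirs(os.path.dirname(shared_dir), exist_ok=True)
    if os.path.exists(system_dir):
        if os.path.exists(shared_dir):
            # The shared copy wins; keep the local one as a backup.
            shutil.move(system_dir, system_dir + ".orig")
        else:
            # Seed the shared location with the local configuration.
            shutil.move(system_dir, shared_dir)
    os.makedirs(shared_dir, exist_ok=True)
    os.symlink(shared_dir, system_dir)


if __name__ == "__main__":
    # Hypothetical layout: <root>/etc/apache2 linked to <root>/mnt/gfs/config/apache2.
    with tempfile.TemporaryDirectory() as root:
        etc_apache = os.path.join(root, "etc", "apache2")
        shared = os.path.join(root, "mnt", "gfs", "config", "apache2")
        os.makedirs(etc_apache)
        with open(os.path.join(etc_apache, "apache2.conf"), "w") as handle:
            handle.write("# demo\n")
        link_into_shared(etc_apache, shared)
        print(os.path.islink(etc_apache), os.listdir(etc_apache))
```

With this layout, a configuration change made on one server shows up on every server that mounts the same volume, which is the property the talk relies on later.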
The next thing to do will be to install our LAMP server, also here on this server. We have to do this. So next we'll install the LAMP server on this one, the third server, which will ask for a password, and we will let it install. And we will install the LAMP server on the fourth server. It's installing. While it's doing that, we can go here, switch to the web directory where we have the web files for Apache, and get the latest version of Drupal. We can untar it; it's a little bit slow. One of the things I saw in this testing is that the EU machines are really, really slower. Great. We have Drupal 7 untarred, so we will move this directory to a directory called drupal and change the owner of these web files to the Apache user. Good. This one already finished. Let's see if GD, PHP GD, is already installed. OK. It's doing that.

The next thing to do is actually create the site where we'll have a Drupal site. For that, we will use the sites-available directory. And I have here a neat command, a one-line command, that will transform the default site into a Drupal site. So we will enable this Drupal site and disable the default site that comes with Apache. We'll restart here. Good. And we will restart here.

The last thing I want to check is whether we have our symlinks. Correct. This is important because you will have the same configuration on both servers, actually on all four servers, at the same time. And this is correct. You just change it on one. Great. Drupal is replicated. So we should be all set, at least regarding the files. What we can do now, since we have the DNS name of this server, is just put the DNS name here and see if the site appears. It does. Great. Let's see if the other server also responds with the same thing. Perfect. It's installed. So the next thing to do will be to use Unison to replicate these files across the regions to the other servers.
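The one-line command that turns the default Apache site into a Drupal site is not shown in the transcript. One plausible reading is that it copies the default virtual host and points its document root at the Drupal directory. The sketch below assumes the old Ubuntu layout (`/var/www` as the default root, site files named `default` and `drupal`); the sample configuration and function names are made up for illustration.

```python
import re

# Trimmed-down stand-in for /etc/apache2/sites-available/default (assumed).
SAMPLE_DEFAULT_SITE = """\
<VirtualHost *:80>
    DocumentRoot /var/www
    <Directory />
        AllowOverride None
    </Directory>
    <Directory /var/www/>
        AllowOverride None
    </Directory>
</VirtualHost>
"""


def drupal_site(default_site, old_root="/var/www", new_root="/var/www/drupal"):
    """Point DocumentRoot and its <Directory> block at the Drupal tree."""
    pattern = re.compile(
        r"^([ \t]*(?:DocumentRoot|<Directory)[ \t]+)"
        + re.escape(old_root)
        + r"(/?>?[ \t]*)$",
        re.MULTILINE,
    )
    return pattern.sub(
        lambda m: m.group(1) + new_root + m.group(2), default_site
    )


def enable_commands(site="drupal", default_site="default"):
    """Commands to switch sites and restart Apache (returned, not run)."""
    # Newer Apache packages name the default site "000-default" and
    # expect a ".conf" suffix on files in sites-available.
    return [
        ["a2ensite", site],
        ["a2dissite", default_site],
        ["service", "apache2", "restart"],
    ]


if __name__ == "__main__":
    print(drupal_site(SAMPLE_DEFAULT_SITE))
    for cmd in enable_commands():
        print(" ".join(cmd))
```

Because the configuration directory is shared across the servers, the rewritten site file only needs to be produced once; the enable and restart commands still run on each web server.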
So let's clean this up here. Just change to the GFS directory. OK. We will have to create special SSH keys to connect to the several servers. So the first thing to do will be to install Unison on all of these servers. That's another reason to use Unison for our syncs: it's a question of security. Because otherwise you would either have to configure a firewall to only accept connections on that Gluster port, or use a command like Unison over SSH. So on this server, it's the third server that I'm working on, I will just go to the keys directory and generate the MR, multi-region, key. It's done. And we will put this key in our authorized keys right away. Also here. This key is good. You can see that you already have the keys here, because they are replicating. So I'm just going to grab the pub here. Going to put this key. I'm going to put this key. Good.

We should go to the third server now and just connect to the other server. How will we do this? I have a command here. It's unison, then you pass some arguments for SSH, then you say which server and directory you want replicated, that one, and also the local directory to be replicated, that one. So you pass this series of commands and the magic starts. OK, it's now copying everything, multi-region. And it's copying in both directions. This is the big difference between Unison and plain rsync.

If you go now to the US, we will go to this server. We just go to the GFS directory, and we'll see that everything is now replicated here in the other region. And the second server is connected too, because these two are connected with Gluster. You have the same files. So at this moment we have the full sync configured for multi-region replication. The only thing we are missing is putting this command into a cron job. crontab -e. And yeah, whichever editor, it can be that one.
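The Unison call described above, together with the cron entry this step is building toward, can be sketched like this. The user name, host name, key path, and log file are placeholders, not values from the recording; `-batch` and `-sshargs` are standard Unison options, and the double slash after the host marks an absolute remote path.

```python
import shlex


def unison_command(local_dir, remote_host, remote_dir, ssh_key, user="ubuntu"):
    """Build a two-way Unison sync over SSH (returned, not executed)."""
    return [
        "unison",
        local_dir,
        # remote_dir starts with "/", giving "ssh://user@host//abs/path".
        f"ssh://{user}@{remote_host}/{remote_dir}",
        "-batch",                     # never prompt, so it can run from cron
        "-sshargs", f"-i {ssh_key}",  # use the dedicated multi-region key
    ]


def cron_line(command, minutes=5, log_file="/var/log/unison-mr.log"):
    """Return a crontab entry running the command every `minutes` minutes."""
    return f"*/{minutes} * * * * {shlex.join(command)} >> {log_file} 2>&1"


if __name__ == "__main__":
    cmd = unison_command("/mnt/gfs", "us-1.example.com", "/mnt/gfs",
                         "/home/ubuntu/.ssh/mr_key")
    print(shlex.join(cmd))
    print(cron_line(cmd))
```

If a sync can take longer than the interval, two runs may overlap; the talk does not cover that case, so some form of locking would be an addition on top of this sketch.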
And we just say, OK, every five minutes, use this command to replicate. And we can send this to a log. Good. So now we just save this file, and we can be sure that every five minutes this server will replicate to the other server, continuously. At the moment, we have already... Sorry.

So this was the installation of LAMP and Unison with Drupal. And now we go to the Tungsten Replicator, which will be much, much faster, because I cut some parts of it. As you noticed, these commands are actually normal commands that you should know from other work. So you can build this for yourself. There is nothing proprietary in it.

Tungsten is a tool for replicating MySQL in a secure way, because it also uses SSH. So you can actually use it, since you already have the keys on all the servers. You can install Tungsten on all the servers, check whether the servers are all online with just one command, and get an overview of the status of all the databases on all the servers.

This is a very important thing. Most people don't know this. We submitted a patch on Drupal 7.12, because there is a failure in MySQL replication if the SQL mode is not defined. So what we do now, and this is kind of an internal secret, but not really a secret, is set this SQL mode in settings.php, so that MySQL knows it's only this SQL mode. You should use this, and it will not break. Yeah, it is. Good point.

Our replication of the files is done and working. The next thing to do will be to set up the database replication with Tungsten. Tungsten has a series of difficulties where replication breaks on Drupal 7. But at Acquia we came up with a little trick, which is to set an SQL mode in settings.php. This will keep the replication stable with Tungsten. So we save this file. This will be our settings.php file, and we will use it in our Drupal site. Tungsten is not easy to install.
But I have already prepared a series of commands. They will install our four master/slave services. For that, I have a series of variables with the DNS names, and I will set these variables on the several servers. And since we have already installed the master service on these servers, we will now install the slave service on every server in our cluster. So the first one, second one, third one. What it is doing now is actually enabling the slave service on these servers. OK. So the service is now installed on each of the four machines, and you can see that Tungsten is online. What we will do now is watch our service here and here, and install our database. So we'll go to our Drupal installation and install a minimal Drupal 7. OK, so our installation. OK.

OK, I'm very sorry; the videos were recorded late at night, so I was kind of tired. The thing with Tungsten is really that you should have that replication issue fixed. And there is one consideration. I will give you these slides. It's not really important that you have all the performance tips for AWS. But it's really important that you see our running site now.

So our installation is done. Our site is now running in two regions. It's based on a Gluster cluster with Tungsten and Unison, and the machines should be replicating to each other. Let's see the site. Here is the site. And let's check the replication. For that, I prepared some tabs here. You have server 1, server 2, server 3, and server 4. They are clean now; it's just the fresh installation. And we can create some content on it, and after that check the other server. So let's create the content type first. Let's say it's page and page. Save. Let's create content. Let's add a test page, with a test summary and a test body. Let's save this one. And with this, we should already have our home page modified. Let's check the home page.
So the test page appears on the home page. Now let's check the second server. See, the test page has already replicated to the second server. So anything we put into the database will be replicated. The files will also be replicated to the other servers. So this ends our installation. And thank you.

One thing about this do-it-yourself scenario. As you've seen, it takes a lot of time. It takes a lot of patience. It takes a lot of effort. And basically, it's a trap. So OK, now I can reveal it. I was trying to do all of that late at night after my work at Acquia. And finally I found out that, yeah, if someday I advise someone on this, I will say: just make a smart choice and choose hosting with Acquia. If not, if your business actually is hosting, then this presentation will probably be very interesting for you. Either way, I think it's very important to choose HA in the cloud. Don't use just one server. Please take care of your customers, because this reflects on Drupal, well, on the way people see Drupal.

So, any questions? I see you had some questions? Yeah, OK.

No, the DNS failover... you're talking about the balancers that we use at Acquia, right? Yeah, as in the fact that they are automated. Yeah, they are automatic. In the same second that the machine starts to fail, it will fail over to the other machine. So it's, yeah, yeah.

Yeah, we put a lot of work into our infrastructure, which is called the Fields environment. It's a very technical and internal thing, but it basically automates everything that we have at Acquia. It's the Fields way. We use Puppet. We use Ruby. We use a lot of bash scripts and a lot of PHP inside Drupal. We have checks that use Drush to see if the... Sorry.

So my question is? Yeah, hundreds of gigabytes. That's what we do, actually. What is your real concern, is it about the size of the files or the number of files?
The number and the size at the same time. Yeah, it works. If you do it like in our infrastructure, there should be no problem. The only problem, as he stated, is the Gluster that we use now. It's going to be a brave enough. And the new version of Gluster has no problems dealing with more files. And you have, of course, the AWS problem. EBS volumes are slow. I assure you they are really slow. In this case, I only saw it afterwards, but the volumes were very slow while installing. And this was because the two machines I was using were in Ireland. I don't know, but maybe the neighborhood is awful. Like, slow. It's just a generative contact. Yeah.

Yeah, the only thing you really have to take care of there is not having all the files in the same directory. Just that. Yeah. So you will not... Yeah. You should not have any problem with that, at least from what we've seen. But please don't put all the files in the same directory. That will cause problems for sure. Yeah, that's good. Yeah. That's good practice in Drupal. Yeah.

Yeah, we can stay a little longer afterwards talking about this. It would be nice to have others. This is like unexplored ground. It's very hard.

OK, guys. Thank you. I have three T-shirts here. I will ask just two questions. Maybe you heard: what is the name of the product we have that has just one server in the cloud? Hey, raise your arms. OK. Take your shirt. So what is... Maybe you were paying attention: how many servers do we have on Managed Cloud at the moment? Raise your arm. Great. Yeah, but he was first. OK, and since you had the other question, I'll give this one to you. Thank you.

OK, guys. Thank you. I appreciate you being here. It was really a difficult demo. I hope you understand. But let's continue to talk about this. Yeah, yeah, the presentation. Even the video. Yeah.