I'm Mike. I'm a software engineer and CTO at a company called Stu-Rent. Check us out if you're interested in working for us. I'm also a skydiver and a northerner, so if you can't understand anything I say, come and ask for clarifications afterwards. You can follow me on Twitter at M1KE, and we've got a Joind.in link there; I'll put that up at the end of the talk again if you're interested in giving some feedback. If you like me, tell me. If you don't like me, it's up to you really.

So, cloud is going to change how we work. It's really cool. It's cool enough that a lot of you have come along to see a guy you've never heard of before talk about it this morning. It's cool enough that even people with real jobs like administrator and manager have heard about it, as well as every developer from the script kiddie right up to a senior vice president of engineering or some equally overblown title on LinkedIn. But this means that you might get pressure to move to the cloud without knowing why. As we're going to see, moving to the cloud can look simple, but it's often anything but. What I want to focus on when I talk to you, and what my company focused on when we moved to AWS, wasn't the hype or the tech or all the jargon, but the business case. Not just why move to the cloud, but can we do it in a way that makes sense to our modern, hopefully agile, business practices.

So firstly, why are we even going to move in the first place? We identified three things that were the reasons to move to a cloud service: scalability, availability and durability.

Scalability is a simple one. As your business grows, as your use case grows, can you grow the compute capacity, the storage capacity or the processing requirements of your system over time without costly migrations, downtime, new contracts, or working with new providers every few years?

Availability, this was a big one for us, and that would probably be the same for most people. Some businesses might be able to do without it, but availability means: is your service online? Can your customers access your product? Availability is going to be crucial for anyone who's selling things, because any time you're down, you're not selling. It's going to be crucial for anyone providing a consistent service to people, like software-as-a-service platforms. If you're down, all your customers are down.

And durability: fixing things that break. Obviously we're all wonderful software engineers who never make a mistake. But every now and then we accidentally hire someone who's not a wonderful software engineer, and they make a mistake, and things break. So can we fix things that break? Can we recover from problems?

And so, where are we coming from? This might be the same for a lot of you. You might be on a single server, a LAMP stack. Sometimes you might have separated out a database, so you've got two servers. These might be VPSes with various hosting companies, they might already be with a smaller cloud provider like Digital Ocean, or you might be on a traditional bare metal box that you own in a data centre somewhere.
If you've gone a bit further, you could be on hypervisors, where you are running multiple machines but still on a single piece of dedicated hardware somewhere. So you're still vulnerable to some of the same failures that can happen to any sort of dedicated hardware in a data centre. Data centres are great things, but you can have problems with them no matter how much they show you how nothing can ever go wrong in this data centre. I know of one recently where someone drilled through both of the redundant power supplies at the same time. So you can't always be perfect.

Another thing we have to be prepared for as we go into AWS is acronyms. AWS is full of acronyms; AWS is an acronym. The coolest ones amongst you might know all the acronyms, but I won't assume that. I definitely didn't. So these slides are going to explain it all, and if it gets too much for you, there's a nice icon of a plant. And we're not going to be going into tutorial detail here. This isn't a how-to on moving your entire service to AWS; I don't have time to do that. But I'm going to offer you our pathway and the reasons we made our decisions. Afterwards there'll be time for you to ask some questions about me and about how we did it, and I'm also planning to do some impromptu workshops later in the day if you want to come along and get a bit more hands-on with your own business case.

And one final thing: is this really the best way to do this? The answer is no. What we're doing here is not best practice. This isn't what an AWS certified solutions architect will tell you. This is the journey from where you are now, which could be in a variety of different places, through to being up on AWS fully. There are times when business cases and best practice do line up, things like security, making sure your data is protected and backed up. But other times we want to move more slowly. We need critical components on AWS to protect our business's availability and scaling requirements, but we can then move other parts and change our practices later. If you were to go up to your boss or your investors or whoever else is a stakeholder in your product and say, by the way, we can't do any work for six months, we're training the entire team in DevOps and moving everything to AWS, so we'll see you next year, they are not going to be too happy. They're probably just going to say no. So this is a way you can present a case to move onto AWS that works for your business and works for your development team.

So this talk is all about what we're going to miss. And let's begin with your server being on. Amazon give a service level agreement of 99.95%. This equates to about four hours of downtime a year, which I think is pretty good. But there is a clarification in this SLA which says it only applies if you've built redundancy into your setup yourself. To talk about what redundancy means, we need to understand how AWS is structured. So we have our first acronym explainer slide. We have EC2. This is Elastic Compute Cloud: basically servers. They call them instances, but they're servers. They're virtual machines. RDS is the Relational Database Service, which is servers with managed SQL on top of them; all flavours of SQL are offered. And then AZ. This is an availability zone, and that bears going into in a bit more detail. AWS is split into regions, and it moves fast. The map that I put in my slides a few months ago is already out of date.
There are already new AWS regions that are open or being built at the moment. The numbers in each of the circles are the number of availability zones per region. An availability zone might be made up of one or more data centres. So you can see, if we look to the middle of the map, that Dublin, which is known as eu-west-1, has three availability zones. This means there are three independent hubs of AWS compute and storage in Dublin. At any time, AWS reserves the right to simply switch an AZ off. If they need to do maintenance, if something goes wrong, they are totally happy to just kill an availability zone without much warning. If they know they're going to do it, they will try and give you warning, but they would rather preserve that power and take an AZ down than give everyone warning and maybe risk something else going wrong. So we have to be able to survive the loss of an availability zone.

So if your idea of moving to AWS is, we're just going to do exactly what we did with the VPS or the dedicated box but start servers on AWS, you might face a server suddenly going offline without any notice, so we can't do that. We have to survive the loss of one availability zone. With EC2, this involves launching your instances in multiple availability zones. When you launch one from the console or command line, you can say, I want this to be in eu-west-1a or 1b. The idea is that if you have servers in multiple availability zones, you are protected if one of those availability zones goes offline, and you're now covered by that 99.95% service level agreement. With RDS, the database service, this is even simpler. There's a multi-AZ tick box. You tick that and, automatically, AWS creates extra database servers for you in other availability zones. If your main one goes offline, your system fails over with around one to two seconds' consistency, which should be pretty good for most use cases. Again, they don't expect it to happen, and they'll try and transition you slowly if they can. For some services it doesn't matter. If you're using S3 for data storage, which some of you might already be because it's very separate from a lot of their other services, that's already replicated across all availability zones. They copy each object, I think it's around seven times, so you're very safe. Same with Lambda, which we're going to talk about later, and their DNS service; obviously you don't have to create redundant DNS servers, that would be going a little bit too far.

Once we realise we need to manage servers to even start to use AWS reliably, things already become complicated. This is where you hit that first problem in the business case. So we have autoscaling. Autoscaling is AWS's core mechanism for making the servers, and therefore your product, highly available. It's built from a simple process. You create an image, which is a configuration, so your server config, plus a snapshot of your disk: your operating system, all your files, configs, everything else. You then create a launch configuration. This tells AWS that you want this image to be used with this size of hard disk, this networking and this size of instance. And then you have availability zones. You choose whether you want to launch in one. You don't. You want to launch in multiple. And then you can say what you want the balance to be. Do you always want one instance per availability zone? Are you happy with one for every two? It's up to you how you want to balance that. And AWS then starts instances for you.
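To make that flow a bit more concrete, here's a rough sketch of the same steps using the AWS CLI. This isn't our exact setup: the instance ID, AMI ID, security group and names are all made up, and in a VPC you would normally point the group at subnets rather than plain availability zones.

```bash
# 1. Create an image (AMI) from an already-configured instance
aws ec2 create-image \
    --instance-id i-0123456789abcdef0 \
    --name "web-2017-09-30" \
    --description "Web server image"

# 2. Create a launch configuration that uses that image
aws autoscaling create-launch-configuration \
    --launch-configuration-name web-lc-2017-09-30 \
    --image-id ami-0123456789abcdef0 \
    --instance-type t2.medium \
    --security-groups sg-0123456789abcdef0 \
    --key-name my-deploy-key

# 3. Create an auto scaling group spread across two availability zones
aws autoscaling create-auto-scaling-group \
    --auto-scaling-group-name web-asg \
    --launch-configuration-name web-lc-2017-09-30 \
    --availability-zones eu-west-1a eu-west-1b \
    --min-size 2 --max-size 4 --desired-capacity 2
```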
Once it has started instances for you, in the event that one of them goes down, AWS will simply start another one up. That going down could be because of your own configuration problems, you could have broken something, or your server could just crash. If AWS notices it's crashed, it can take it out of service and give you a new instance. So it's great if you are running slightly hacky programs that might break a server; you can make your server much more durable just by putting it in an auto scaling group.

But there's a problem. We talked about LAMP. That's the one box that runs all your stuff. Now you have multiple boxes, so where does your traffic go? You're going to miss mapping an IP to a domain name. We're all familiar with this format: this is a server, so our DNS looks something like this. You have your domain, you get the IP from your provider, you stick it in, it all works. Unless we want to tell our users, can you guys access a.domain.com and you guys b.domain.com, with lots of subdomains, that's not going to work for multiple servers. Helpfully, AWS still has us covered. We've got load balancing. Load balancing, again, is a fairly simple process; most of the processes I've broken down here are like that. You choose a target group. This can be instances that you're managing manually or auto scaling groups; it will work with either. You add some rules. The simplest rule is: forward everything that arrives at your load balancer on port 443 to your servers on port 443. You can map between them, so you can have everything that arrives on the HTTPS port on the load balancer just go to the HTTP port on your server. And then you can add a whole load of more complex rules: if a request looks like this, send it to this server. You add some health checks. This is a way for the load balancer to ping your instances and say, hey, are you still there? That can range from simply returning an HTML file, if you just want to keep it simple, to having your health check do something like connect to your database or to another crucial service you're running, so that you know that if traffic is going to your instance, it's running your application in the way you intend it to be run. If a server fails a health check, again, AWS can take it out of service and give you a new one, or at the very least not send traffic to it and alert you that something's wrong. Load balancers span multiple availability zones; that's the default, you don't have to do anything.

It's called a load balancer, but one fairly early pitfall is that it doesn't actually do any balancing. It's a randomizer. So if you have certain processes that might be triggered through a web route that are very intensive on your processor, maybe you're rendering some sort of video files or doing lots of image processing when someone uploads something, you could still get one server being overwhelmed. So don't just assume that because your server can handle, say, 75% load, all the load of one specific thing won't go to one server and still bring it down. It's a load randomizer, not necessarily a balancer.
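Going back to the load balancer for a moment, the target group, forwarding rule and health check I just described look roughly like this from the CLI. The names, VPC and subnet IDs and the certificate ARN are placeholders rather than a real configuration.

```bash
# Target group with the health check the load balancer will ping
aws elbv2 create-target-group \
    --name web-targets \
    --protocol HTTP --port 80 \
    --vpc-id vpc-0123456789abcdef0 \
    --health-check-path /health-check.html

# The load balancer itself, spanning subnets in two availability zones
aws elbv2 create-load-balancer \
    --name web-alb \
    --subnets subnet-aaaa1111 subnet-bbbb2222 \
    --security-groups sg-0123456789abcdef0

# Rule: everything arriving on HTTPS 443 gets forwarded to the target group
aws elbv2 create-listener \
    --load-balancer-arn <load-balancer-arn-from-previous-step> \
    --protocol HTTPS --port 443 \
    --certificates CertificateArn=<your-acm-certificate-arn> \
    --default-actions Type=forward,TargetGroupArn=<target-group-arn>
```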
The default storage for sessions in Apache and PHP is on disk. Now we have multiple disks across multiple servers. So if your users are accessing your site with their sessions and they hit multiple servers, they're going to be asked for a new session each time. Unless you're already using something like stateless login with JWT, you are going to find your users suddenly getting logged out for no apparent reason. The same goes for anything in your application that relies on storing data in the session. Once again, AWS does have us covered, with sticky sessions. You can tick a box, and this puts a cookie on your user's browser that says, any time I come to you, send me to the same server. Obviously if that server goes out of commission your users will still get signed out or lose their session data, but you probably need to handle that in your application anyway. Most applications should handle a user losing their session for some reason. You can use cookies, or you can store something to disk if you need it.

With this many servers, though, we now have to deploy code to a lot of servers. It will take you a bit longer to rsync code to two servers than to one, but it will take you even longer to rsync code to 20 servers. Deploying that way has been a consistent feature of most development environments for years. You'll also miss using the file system. Each server has its own disk, but they don't talk to each other unless you start setting up some complex routing, and again, we're trying to do this in the simplest way for our business. It's not just code either. If you're using a CMS, which many of you will be, given the most popular CMSes are generally written in PHP, whether you're using WordPress or Magento, it will store all your users' content to disk. Unless you want to start rewriting your CMS to tell it to store that somewhere else, you're going to have a challenge for your migration. Your own apps can maybe be refactored slightly more easily, but it can still be a big challenge.

So we have some storage acronyms to teach you first. EBS, this is Elastic Block Store: basically hard disks. They do like to come up with fun names for things. EFS, Elastic File System: this is a networked hard disk. You'll notice it's also the most expensive storage option per gigabyte. And S3, Simple Storage Service, the one that a lot of people will have already heard of. This was the first service AWS released, and it has become very successful throughout the world for disclosing private information publicly.

EFS volumes are a networked hard drive which you can choose to mount onto any number of instances, no matter what availability zone they're in. They're stored across all availability zones, so you don't have to worry about that. There's no need to set a volume size; you just pay for what you use. And they have write consistency. This means that when you write a file to EFS, it only tells you, yes, I've finished writing, when it knows it's replicated it wherever it needs to be replicated. And it scales with the number of stored files. If you go into an instance once it's got an EFS volume mounted and run the Linux df command, you get to see the available storage size on EFS, which is 8E. I had to look that up: that's exabytes. That's a lot of zeros of bytes. I don't know if anyone's ever managed to fill it. I guess you could just set dd running and see if it completes or if AWS call you first. But, you know, good luck if anyone wants to try. I will retweet you if you do.

So, unfortunately, this is becoming a bit of a habit: we have a few problems now. With EFS, files upload slowly and PHP files execute even more slowly. So you're going to miss speed. Why? We talked about EFS's characteristics. It has write consistency and it's networked across multiple locations. With write consistency, every file has to let you know it's everywhere it needs to be before it tells you it's done. Otherwise, your application could go, yeah, that's stored.
Then one of the EFS nodes could have a problem and you've got an inconsistent file system. For a single file, this is fine; the write consistency takes a few extra milliseconds. But I don't know about your source code base. Our code contains about 17,000 files. Suddenly that little bit of extra time on every one of them adds up, and it takes a lot longer. An rsync to our server that used to take a minute, maybe two, took 20 when we first moved onto EFS. That's not really affordable in your average development environment. And then there's network read. Reading over the network adds a network connection. It's a fairly simple network connection, but the latency is noticeable. If you're reading one file, you won't see it. But PHP applications tend to use a lot of files. If you load an average route in a Symfony application, you're going to touch 100 to 200 different files that get loaded in via the magic of autoloading and the fact that we don't think about what we're loading anymore. That means your file read delay slows down the entire application. File handling in the app is also affected. Again, for a single file upload you won't notice this, but if you're letting users upload big sets of documents or zip files and unzipping them, then you're going to find a problem with your application as well.

We tried a few ways to solve these problems: atomic deployments, PHP opcache, and then eventually S3 deployments. We tried a few things and, as the theme of this talk goes, most of them didn't work, but I'm going to talk to you about them anyway. Atomic deployments, you might already be familiar with this. Some deployment agents, things like Capistrano, will use this, where you upload files to a timestamped directory, and when everything is in the correct place you flip a symlink pointing to your code that says, this is now the active version of my website. That's generally a really good pattern anyway, because it means you don't get that window where you have code from different versions of your application live at the same time. With an rsync that takes a few seconds and a low user base, you might not have a problem with that, but this is a good idea even if you're not doing AWS and you're on one server. So you send files to EBS, which is your attached hard disk, which is really fast. They're SSDs; you never really notice any write problems with them when you're coding. And then you have your instance synchronise to EFS. Now, this is still slow, even from within AWS, but at least your machine that you're developing on, or your machine that's doing your continuous deployment, isn't having to do the work. It's not having to do this really long file copy. You can just leave it going and have something notify you when it's done. And then once it's done, you switch.

The problem is that deployment is still slow. If I start a deployment, I still have to tell anyone who's waiting for it, yeah, guys, that'll be done in half an hour. People don't expect wait times when it comes to computers; they assume everything is instant. So when you say, yes, it'll be done in half an hour, they're very confused. We really felt that with things like bug fixes, or features that people want urgently, or when you're trying to do a deployment at six o'clock and go home. So it is still slow, but your machine is free after the first copy, which is a little better.
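To make the symlink flip concrete, a minimal sketch of that atomic-style deployment looks something like this. The paths and the deploy@myserver host are made up, and a real tool like Capistrano does a lot more around it, keeping old releases, handling rollbacks and so on.

```bash
#!/usr/bin/env bash
# Minimal atomic-style deployment: copy the new code into a timestamped
# release directory, then flip a "current" symlink once everything is in place.
set -euo pipefail

RELEASE="$(date +%Y%m%d%H%M%S)"
DEPLOY_ROOT="/var/www/myapp"   # web server document root points at ${DEPLOY_ROOT}/current

# Copy the new build into its own release directory
rsync -az ./build/ "deploy@myserver:${DEPLOY_ROOT}/releases/${RELEASE}/"

# Flip the symlink; -n stops ln following the existing link into the old release
ssh deploy@myserver "ln -sfn ${DEPLOY_ROOT}/releases/${RELEASE} ${DEPLOY_ROOT}/current"
```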
Then you have the PHP speed issue. You might find you already use PHP opcache; it's enabled in a lot of default configurations. Opcache basically means that when you run a PHP script, it caches all the opcodes generated from parsing your PHP, and that way the next time it runs, it doesn't have to do this massive read of the entirety of the Symfony framework to render your page. It just goes to the existing opcache. So this does solve the slow performance due to the file read time on EFS. But by default, opcache validates timestamps. If you've been following along, you'll notice that's going to kill our performance again, because it's going to have to validate timestamps on those 100-plus files anyway, which means the same network reads. So you have to turn that off. But now you have the issue that your opcache is fixed. If you're not validating timestamps, your opcache will keep the same opcodes cached forever. So you need a way to reset it. We came up with a very simple way, which is to have a cron running on the system. Every minute you check whether a timestamp file that you upload with each deployment has changed. If it has, you curl a local PHP route which runs opcache_reset(), and that will trigger a reset of your opcache, refresh everything, and everything will cache again. That does mean the users who access your site immediately afterwards will get the slowdown for the first few minutes until opcache has cached everything. So you do get the slow first loads, and opcache is now critical to your application working. Any issues with opcache missing something or not working in a certain case now matter, and there have been cases with particularly complex PHP that opcache can get wrong. We had one recently that we were chasing for a while. Opcache is now critical to your app even performing normally, let alone speeding it up.

So we had to change our plans. There are a few other deployment mechanisms that are suggested by people who are experts in AWS. Blue-green deployment is a common pattern where you put up new servers, deploy your application to those, and then flip over where your load balancer points to. Amazon have their own tool called CodeDeploy which will do this for you. We had a look at this and it seemed great. It will create a new group of servers for you, put your code on those, and only when it's ready will it move all your users across. The issue is that you can still have a problem of consistency: there can be a time when users might be accessing two versions of your code base. This creates problems if you ever wanted to do something like a database schema change, where suddenly queries running on one code base might be hitting a database that's already changed its schema to match the new code base, and you get errors. It's also not helpful for different types of servers. If you're running the same code base across multiple servers for different uses, such as one server for serving web and one server for running batch jobs, you might have them configured differently. Blue-green deployments via CodeDeploy can't handle that; they have to target a single auto scaling group, at the moment anyway, Amazon might change this. And it's harder to run post-deployment code. If you have things that need to happen at certain stages in your deployment, to your SQL databases, to files, to anything else, you're going to find it harder with blue-green deployment because of how you hook into when it's completed. So this didn't work for us either, and we came up with our own method: deploying via S3. We wrote an agent which can watch for deployments on S3 and then synchronise them across servers.
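Before we get to that S3 agent, here's roughly what the opcache reset cron I described a moment ago could look like. The route and file paths are invented for the example; the only real requirements are that the route is restricted to localhost and calls opcache_reset().

```bash
#!/usr/bin/env bash
# Runs from cron every minute, e.g.  * * * * * /usr/local/bin/check-opcache-reset.sh
# If the timestamp file shipped with the deployment has changed, hit a
# localhost-only route that calls opcache_reset().
set -euo pipefail

CURRENT="/var/www/myapp/current/deploy-timestamp"   # uploaded with every deployment
LAST="/var/run/myapp-last-deploy"

if [ ! -f "$LAST" ] || ! cmp -s "$CURRENT" "$LAST"; then
    curl -s http://localhost/internal/opcache-reset > /dev/null
    cp "$CURRENT" "$LAST"
fi
```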
The principle we used is that you upload code to an S3 bucket, which is fast, and you upload a timestamp saying, this is the latest deployment I've done. Your instances then check that S3 timestamp on a schedule. You could go further: S3 has an event system where you can send an event to your servers, but that has a whole extra problem of where you route those events to. Once a server is aware that it needs to do a deployment, it creates a lock and synchronises all your files onto that instance. The lock can be read by all the servers, so even when servers take different amounts of time to synchronise, they can each check the locks from the other servers in order to all switch your code at the exact same time. This avoids problems with things like front-end caches, where if you're using long-life caches for your JavaScript, CSS or images and then changing a query string timestamp to invalidate them, the last thing you want is to change that timestamp and have caches go and pick up the old version of your files, because now you have some really inconsistent bugs that are very hard to trace. So EFS doesn't cut it for code, but it's still really good for shared content. Your images or your files are still fine on EFS until you can move to something else.
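A rough sketch of the check each instance runs might look like the script below. The bucket name and paths are made up, and I've left out the part where instances read each other's locks so they can all flip to the new release at the same moment; this is just to show the shape of the timestamp-and-sync idea.

```bash
#!/usr/bin/env bash
# Sketch of the per-instance deployment check: compare the deployment
# timestamp in S3 with the one we last applied, and if it has changed,
# write a lock other instances can see and sync the new code down.
set -euo pipefail

BUCKET="s3://my-deploy-bucket"                 # made-up bucket name
LOCAL_STAMP="/var/run/myapp-deployed-at"
REMOTE_STAMP="$(aws s3 cp "${BUCKET}/deploy-timestamp" -)"   # "-" streams the object to stdout

if [ ! -f "$LOCAL_STAMP" ] || [ "$REMOTE_STAMP" != "$(cat "$LOCAL_STAMP")" ]; then
    # Publish a lock keyed on this instance's ID so the others know we're syncing
    INSTANCE_ID="$(curl -s http://169.254.169.254/latest/meta-data/instance-id)"
    echo "syncing ${REMOTE_STAMP}" | aws s3 cp - "${BUCKET}/locks/${INSTANCE_ID}"

    # Pull the release onto local (EBS) disk; the coordinated switch happens later
    aws s3 sync "${BUCKET}/releases/${REMOTE_STAMP}/" "/var/www/myapp/releases/${REMOTE_STAMP}/"
    echo "$REMOTE_STAMP" > "$LOCAL_STAMP"
fi
```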
We're also going to miss modifying servers on the fly. If you thought deploying code was hard, just try going to patch the latest OS-level vulnerability, or installing a PHP extension, or patching a vulnerability in a PHP extension. On our own servers we all do this: we sign in by SSH, make some config changes, sign out and hope everything works. On AWS you can't do this, because servers can just vanish and be recreated by autoscaling, so your config needs to be versioned. There are tools to manage your configuration as code, but again, as we've said, we don't want our entire team to have to go off and learn DevOps. Hopefully a lot of our teams are already fairly good at this kind of task, simple server administration. It's work that's already happening; they do it on their own machines, I imagine. So rather than upskilling our team, what can we do?

I've got a few more acronyms before I can tell you about the solution. We have an ASG, this is an auto scaling group, we've talked about these. LC, we mentioned when talking about autoscaling, this is a launch configuration, so it tells AWS how you want your instances to be created in your auto scaling group. And AMI, however you want to pronounce it, some people disagree, thoughts afterwards: this is an Amazon Machine Image, so this is your configuration plus your disk snapshot.

So we have a master instance. The helpful thing about AWS is they don't charge you for instances which are switched off. You pay for the storage, but the storage is relatively cheap. So if you have an instance that spends most of its time switched off, you don't get charged any money. That's different to a lot of other cloud providers, who will charge you even if your instances are switched off. So you can have this master instance which you can turn on and make config changes to whenever you want. You turn it on, you change it, you edit your configurations, and then you switch it off. Once it's off, you create a new image, then you build a launch configuration from that image, and then you tell your auto scaling group to use this launch configuration. Now, when your auto scaling group starts new servers, they're going to be using your new configuration.

Remember this isn't about deploying code, this is about operating system level changes, which hopefully you're not doing on a daily basis, but it's still very important to be able to make these changes, especially if there is a new security vulnerability. We would like to automate it, though, because that whole process seems slow. You have to turn on an instance, SSH in, do your work, turn it off, image it, put it into your auto scaling group, kill your old servers one by one. That's a lot of work. So, we have Lambda. Lambda is currently the new hotness in AWS. Everyone is talking about serverless; there's still a server, it's just not one you control. But it lets you run single scripts on schedules or triggers, and it costs basically nothing, especially for automated work. If you put a web frontend in front of it and send all your requests there, it will start costing you something. It runs Python and a few other lesser languages, but not PHP. You can use two Lambda functions to handle rolling your Amazon Machine Images out into your auto scaling groups. The first one watches an instance, your master instance, for being stopped. Every time it stops, this function just helpfully goes and makes an image of it for you, so you don't need to worry about going and clicking that make-image button, working out what to call it, everything like that. And they generate in about five minutes. You then have a second function that puts the new AMI into service in the auto scaling group. It finds the latest AMI, copies it, puts the launch configuration together and tells your auto scaling group to use this one. But you now have all these instances in your auto scaling group using your old AMI. So you can then use scheduling, or just instructions to your auto scaling group from Lambda, to tell it, okay, I want more instances. The new instances will launch with your new AMI. Then you have a schedule which says, okay, 10 minutes after that, halve the size of my group again, and your auto scaling group will kill the old instances first. So now, within about 10 minutes of making your changes, you have a full new configuration. You've not had to learn any new tools; this can all be done from within AWS's control panel.

You're going to miss cron. We thought we were nearly there, didn't we? Cron is part of many applications and it's used for loads of different things. The tasks can be minor, statistics, or they can be pretty major, handling your payment processing. So not having your cron is a bad thing. It's probably even worse to run your crons multiple times across different instances, because whilst you can maybe check things like locks, if everything's running at the same time you might miss something, and customers being charged twice for things, or having twice the number of items arrive, or twice the number of emails arrive from you, is going to be pretty upsetting. Instances can start and stop at any time, so you'll have one of your instances running a cron and then it goes down. What's the state of that cron job now? Does another one pick it up? How do you handle these things? So we looked at a few ways to get around this problem. One of them was a centralised lock: could we have all our instances running crons and just have a lock on EFS that you take when you begin a process, and hope that, even with tiny millisecond differences between instances, each one notices the others' locks first? We thought that could work, but it seemed a risk.
We didn't necessarily have confidence that the locking would work. What if you check for the lock and, by the time you've created your lock, someone else has already passed their check for the lock, and you get some really nasty timing issues? You could just set your instances' clocks a few seconds apart, but that has its own problems. We also looked at queues. Amazon has a queuing service, so we thought, well, you could have a separate instance that just pushes tasks to a queue. That did seem like a good idea, and it was recommended by our consultants in AWS. The problem with that was, once again, we're at the stage of having to design an entire new system to do something we're already used to doing.

So we came up with an easier option: we have a control instance. The control instance can use the same shared codebase pulled off S3, it can connect to the same database and the same EFS volume, but it's a different image. It's not a web server, and we can use auto scaling with a fixed group size of one. This means the cron server should just stay up, and if it does go down, the auto scaling group will start a new one, although only after the old one has already gone off. We can accept that with cron, whereas we can't with web. With web, if your instances are booting, that's five minutes that you're down for, and that could still be critical to your business; with crons, you can generally sustain a few minutes of downtime because your cron can just catch up when it's back. You might have to think about how your jobs run. If you have essential jobs that run once a day, you might want to check a little while afterwards that they have actually run, in case something's gone wrong with your servers. The way we dealt with it was using a system called CloudWatch, which I will introduce properly later, and sending metrics to CloudWatch to tell us the server is still active. If those metrics fail, we get an alert, and at least we can go and investigate what's happening.
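The heartbeat itself can be as simple as a cron job on the control instance pushing a custom metric; the namespace and dimension names here are just examples, not anything AWS defines. You then put a CloudWatch alarm on that metric and treat missing data as a failure.

```bash
#!/usr/bin/env bash
# Heartbeat cron for the control instance: push a custom CloudWatch metric
# every few minutes; an alarm that treats missing data as breaching tells us
# when the cron box has quietly died.
aws cloudwatch put-metric-data \
    --namespace "MyApp/Cron" \
    --metric-name Heartbeat \
    --value 1 \
    --dimensions Instance=control
```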
We've got this control instance, but how can we SSH into instances if they keep moving? You've got this auto scaling group, things are going up, things are going down, and you don't know where your instances are. Day to day they might have different IP addresses, but if you want to monitor your crons, check files on the EFS volume, or access MySQL through the instances without publishing MySQL via phpMyAdmin or opening MySQL ports publicly, which I wouldn't recommend, then SSH access into your instances from somewhere secure is a really useful tool. So our servers change IP, and we want to access them without opening the AWS dashboard and seeing what the current IP is.

We've got a few more acronyms here. EIP is an Elastic IP, basically a fixed IP address that's given to you; it won't change, and you can choose which server or network interface it's assigned to. I must also mention Route 53; that's not an acronym, but I thought I'd use this point to bring it up. That's Amazon's DNS service, so that's how they handle all their name serving, and their DNS is really useful for things like load balancing. So I had a system using EIPs to map to our domains. But actually, whilst I was writing the notes for this slide a few weeks ago, I realised there's a better way. This is one of the wonderful things about AWS: even writing a presentation telling you how we do AWS, I learnt some new ways to do AWS. It does mean you're always on your toes, and someone could always come along and tell you everything you're doing might have been wrong, but it's really exciting to just keep learning.

So, the AWS CLI. This is a Python package which you can install using pip, and it takes your API credentials. Everything in AWS can be done via the CLI; basically their dashboard is just a big interface onto the same backend, so anything you can do in the dashboard, fetching data, changing things, you can do via the CLI. A really helpful part of learning the CLI is that you're also learning their software development kits, because the PHP SDK is named after pretty much everything in the CLI. Then there's authentication. They have a system called IAM, which is user management. If you start using AWS with any regularity, you will grow to hate IAM. It's about defining policies which allow certain users to access certain things. They have recently released a nice editor for IAM policies which makes things a lot less headache-inducing, but some things are still complicated; try setting up multi-factor authentication on one of your users if you don't believe me. But if you are using the CLI, you'll learn all the helpful command names you need for IAM, because when you run a command, you just add that command name to a role or a policy in IAM to allow users to do the thing you're trying to do.

The AWS CLI returns JSON. That means the output can be parsed using something like jq. If you haven't used jq, it's an awesome little command line JSON parser and search tool. So we can run a little command like this, where we say, AWS, describe my instances, with an instance ID, which can be our control instance, or with a more complicated filter like a tag or an instance size, and then using jq we filter out the public IP address. Stick that into a bash script, have it SSH into whatever instance it returns, and you now have easy SSH access into your servers without using any other part of the AWS stack. If you do do this, be warned: when you connect to different servers on the same IP, SSH will freak out because it thinks it's being spoofed. You can remove the old host key using the ssh-keygen command. There are maybe nicer ways around this that I've not looked into yet. There are instance startup scripts you can run which do things like set a hostname, so could we modify the startup scripts to set a common set of names for us? But then where would you get those names from? It got a little bit complicated. If you know, please do tell me.
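Going back to that jq one-liner, here's the kind of script I mean. It assumes the control instance is tagged with something like Role=control, which is just our sort of convention rather than anything AWS requires, and that your default CLI profile can describe instances.

```bash
#!/usr/bin/env bash
# ssh-control: find the current public IP of the control instance and SSH in.
set -euo pipefail

IP="$(aws ec2 describe-instances \
        --filters "Name=tag:Role,Values=control" "Name=instance-state-name,Values=running" \
      | jq -r '.Reservations[0].Instances[0].PublicIpAddress')"

# Host keys change every time the instance is replaced, so drop any stale entry first
ssh-keygen -R "$IP" > /dev/null 2>&1 || true

ssh "ec2-user@${IP}"
```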
You're going to miss viewing your logs, though. Even though we can now SSH into our instances, we still have a lot of them. I don't know how you currently manage your logs or track things generally. Before we moved onto AWS, the main way of handling logs on our servers was to SSH in, stick tail on at the start of every day, and just watch them coming through. It's especially fun to watch a DDoS attack roll through your access logs with no power to do anything about it. You've got your system doing logs at system level: you've got things like auth logs, you've got syslog, MySQL has its own logs, and most likely your application is logging a lot of things already. Now, when you change your configuration, new servers appear and all your old logs that were on the old servers just vanish, so all this history of who accessed what and what errors we had just goes away. But you still want to monitor everything.

Now, you could rewrite your application again so your logs just get sent somewhere remote, but once more we have the business case: do we actually want to rewrite our whole application, and how we're used to doing things, to deal with a new world of working? If you're already streaming logs to a third party, maybe you're already fine, but again, we're dealing with the minimal case. So how do you get your logs back? Well, we've said we have logs at system level, and then you have your own logs. Runtime errors can appear in different places: sometimes they'll appear in the Apache logs, sometimes PHP will have its own logs, sometimes there will be exceptions your application logs somewhere else. Crons sometimes have problems too, and for some reason those logs are often in /var/mail. If you've ever found that one, there's a massive mail folder for the root user with tons of logs for years of crons that were failing and you never knew. But with multiple servers, nope, they could vanish at any point and it's very hard to monitor them.

So we use CloudWatch. I already mentioned CloudWatch before; CloudWatch is basically AWS's way of monitoring all of your stuff. There are loads of features in CloudWatch, you have metrics, dashboards, alarms. I could probably do an entire talk on CloudWatch, so come back next year for that one, but this is just going to look at logs. CloudWatch has an agent. You install this agent on your servers and then you tell it where your logs are stored: I want to log my Apache logs, I want my auth log for SSH, I want the various logs I care about from my application sent into CloudWatch. The agent is really useful because it will track your logs as you create them. You can point it at a single log file or at a directory, so any new files appearing in that directory should get logged. It's reasonably good at handling things like logrotate, so it shouldn't re-log your old entries when they get rotated, although every now and then it does. But if you want multiple log files from a single directory to be sent into different log streams, maybe because it's an application-level log and your developers just drop files somewhere and log something, you'll end up with them all in one stream. So totally unrelated parts of your application again either have to be rewritten to log to separate directories, or you have all these logs in one stream. So I wrote an extra log checker, sort of the AWS logs agent agent. What this does is handle your file system a bit more intelligently: it analyses your log files and then rewrites the AWS logging agent configuration if it sees a difference. It recently gained the ability to add instance IDs to your log streams too, so you can not only tell where a log is coming from but which instance created it in the first place.

And if we want to be real professional loggers, we can look at the awslogs package. This is something, again, I found out about whilst writing this talk, and the best loggers use a CLI. The awslogs package gives you a CLI into your logs. You can do things like tail basically your entire logging infrastructure, so logs from 10 servers get streamed down to a CLI on your own system and you can easily view them in real time. So again, for watching things like errors, for watching that exciting, fun DDoS, or just for general monitoring, you can do it. You can use it for search as well. It's a really great tool, with really nice colouring and things like that, if you're interested. If anyone wants help setting that up, just drop me a tweet; I've been doing that a lot recently.
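If you want to try it, usage looks roughly like this. The log group names depend entirely on what you called them in your agent configuration, and it's worth double-checking the awslogs README for the exact flags it supports.

```bash
# Install the third-party awslogs CLI and point it at your CloudWatch logs
pip install awslogs

# List the log groups the agent is shipping into CloudWatch
awslogs groups

# Stream every stream in a group, tail -f style, across all your instances
awslogs get /var/log/apache2/error.log ALL --watch
```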
So where are we now? We have an application that is hopefully running on AWS. We've had a few challenges, we've had some slowdown, and we've had a lot of questions from the board of directors as to what on earth we're doing and why there are so many acronyms. No one really knows what they're paying for with these services, but we'll work that out in the end. So why have we done all this? Well, now we get to the good parts, and sorry to any JavaScript fans. We have a lot of new flexibility with AWS; there are a lot of new tools, as well as the ones we talked about at the start, the basic infrastructure-level flexibility. AWS is a constantly expanding platform. I don't want to sound like some sort of fanboy, and a lot of these things will apply to Google Cloud Platform or Azure or anyone else who tries to compete with them, but the strength of these kinds of services is that they are always advancing faster than the things we build ourselves. So if you want bulletproof backup, if you want the ability to analyse massive amounts of data in a short amount of time, you can start sending your information into Elasticsearch or the other services AWS offer. And every time you think a service is really hard to set up and configure, you can guarantee that in about a year's time AWS will have automated that process for you. Each year, each time they release new features, our capability as developers just gets bigger; we have to do less of the boilerplate of managing servers and SSHing into things, and more of building our applications and our services.

One of the things I realised fairly early on is that AWS is seen as a cloud server provider. It's not; it's a cloud data centre. With AWS you have the ability to create an entire data centre architecture if you want. You can create networking systems, you can create firewalls, you can use NAT gateways to route traffic to different places, you can make different subnets that talk to each other, you can create bank-level security around your own little application just by firing a few things at the AWS console. It really does offer a huge amount of flexibility. Challenges that would at one time have been insurmountable for a small development team are very easy with AWS. We had an API client we started working with who told us, shortly after our integration was finished, that the API could only be accessed over HTTP, and that wasn't secure, which was news to us. To be secure, we were going to have a permanent VPN into their internal intranet in order to access the API. I mean, you know, whoever heard of REST? With a previous provider we'd have had to say, well, we can't do this: we now have to talk to our server provider, they have to set up the networking and the VPN, it's going to cost a lot of money, and who monitors it? With AWS, I looked up a guide called something like how to make a VPN with AWS, I followed it, I sent them the configuration at the end, and we now have a VPN into their network. Still a stupid system, but we could actually do it, even though this requirement was just dropped on us at the end of a long project. So AWS is a cloud data centre, and you will find more and more ways to use it once you're based on their platform.

Terraform. Terraform is something you might not have heard of before. This is one of these configuration-as-code systems. Terraform allows you to declare what you want your AWS infrastructure to be, so you declare your DNS, your S3 buckets, your instances, databases, networking, users, everything, and then it will go to AWS and say, this is all the stuff I want, what's currently there and what do I need to change, and it will handle it all for you.
It's a great thing. They've made their own little language, and there's a syntax highlighter for Terraform available for PhpStorm. Terraform genuinely changes the way you actually use AWS. I don't know if you would get the same benefit from going straight into Terraform; it might be helpful to use the console for a bit first before you then add Terraform. I think once you've been using the console for a while, moving to Terraform suddenly changes your life in a new way. If you want to learn more about Terraform, Tyce is doing a talk at 2:40 in the main track about that and a few other DevOps tools, so I'll be there trying to unlearn everything I've already learned. Come and join us for that.

And CloudFront. You can use CloudFront right now. If you're not using CloudFront, maybe take some time at lunchtime today to go and start using CloudFront. CloudFront is an edge caching system. Simply put, this means that it caches things from your server near to where your users are, so static resources, basically. If you're serving JavaScript, and you probably are, you're probably serving some massive overblown JavaScript libraries: go and cache them. You're serving CSS: go and cache that. You're serving massive image files that no one's able to optimise: go and cache those too. You can put it in front of your existing site right away and simply cache requests for certain resource types. It takes a bit of configuration to start with, but by a bit I mean maybe two hours, and then you suddenly have this massive load reduction on your server, because your server isn't trying to work out how to constantly serve 10 million JavaScript files anymore. Your server can do what it's there to do, which is produce dynamic web content. CloudFront is also a really useful tool even in front of your dynamic web content, because you can apply Amazon's firewall technology to it. If you come under DDoS attack, or lots of people trying to run automated SQL injection attacks on your system, use CloudFront with their firewall system, which allows it to do things like strip out malicious SQL from your requests. So CloudFront will save your server load and it will improve browser performance for your users, and that especially matters if you're global. If you're a global business serving people in Australia, whilst your site content is quite small and quick, your images and your CSS and JavaScript are going to take ages to get there. CloudFront caches them near to your users. I can't mention it enough: go and do it now.

I think CloudFront probably saved our life this year. Before we could finish our migration to AWS, I went and accidentally had a baby, and that suddenly took me out of work for a while, which meant that during the busiest time of our year no one was there to manage our infrastructure. Helpfully, we had deployed CloudFront the week before, and that meant this massive amount of load on our servers just died away and our servers went down to a much better baseline. So, yeah, I can't advocate for it enough.

And that's it, thank you for listening. Don't clap yet. Once again, you can find me on Twitter at M1KE. I'm on Slack; the PHP North West Slack group is open if you want an invite to that, on phpnorthwest.org.uk. And OG AWS, this is the Open Guide to AWS. It's a GitHub repo you can contribute to, and they have an amazing Slack channel with a really helpful community, lots of them based in London and lots worldwide.
So join OG AWS if you want some really good advice; that's where I get most of my information, so you'll see me on there every day asking questions. If you've liked this, tell me on Joind.in.