Welcome to this presentation. I know we're running a bit late, so I've tried to keep this brief so everybody has time for lunch. As an introduction, my name is Mikku Lammih, and I work at UpCloud as a senior developer. I flew here from Helsinki; our main office is in Helsinki, though we have a location here in Singapore as well. People usually fly from SIN to HEL, but I did it the other way around. I've been working for five years in the cloud business, doing mostly all kinds of backend stuff. Unlike many of you here, I'm not such a hardcore PHP developer, but I've done PHP in the past too, so I know something about it. Anyway, our goal is not to be a PHP-specific provider but a general hosting provider for all kinds of developer projects and, of course, production hosting as well. Today I'd like to talk a bit about scaling IO performance, especially in cloud-hosted environments, because when people talk about scaling in the cloud, it's usually about CPU or RAM, but rarely about IO operations. Nowadays we have lots of different products to help scale IO, but the basic problem still remains. I'll show you some basic information about this, some examples, and a demonstration of how our products can maybe help. As part of my history with PHP: as many speakers have already mentioned, things were a bit different years ago. My first touch with PHP was during my compulsory military service in Finland 15 years ago. After basic military training we had the opportunity to end up in some sort of special operations group, and I somehow ended up doing PHP programming for the Finnish military's public affairs division. Back then there was a heated situation in world politics; there was actually a crisis ongoing.
The US-led coalition was invading Iraq, and the Finnish defence forces wanted to create a special website where military specialists could publish analysis and updates about the situation. We conscripts were working on the website development. We had about two weeks to set up the entire site from scratch, with a built-in CMS. This was 15 years ago, so the tools were a bit different: we used PHP 3 and a basic LAMP stack, but we didn't really have much clue about scaling a website. Before that, most of us were only familiar with PHP hobby projects, but this was the official website of the Finnish defence forces; it was mentioned in the public news, on television and in newspapers, and it suddenly became very popular. It got lots of hits, and we were suddenly facing lots of scaling and performance issues with very limited resources. We had three physical servers, one for the database and two for the web frontend, and we didn't really have much clue about how to improve the performance, but we improvised a lot and learned a lot in a short time. Those experiences were quite valuable, and I still think some of those principles apply. What we couldn't do back then was just add more resources, more CPU or RAM or more servers in the line, because those were all physical hosts. There was no cloud hosting back then, no virtual machines, just plain hardware with an operating system running on top. And of course we didn't use PHP for any actual military stuff; we didn't program weapons with PHP. It was just this public website carrying specialist analysis of the Iraq crisis. What's so great about the cloud nowadays is that it's so easy to scale things up: you don't have to configure every new physical server, install it, or put it into the racks.
I don't have a photo from that time, but this is from roughly 20 years ago in our small company. It wasn't cloud; it was a different company back then. We were doing web hosting 20 years ago, and it basically looked like this: more like running stuff from your basement. I remember when we got our first rack and we thought, now we are professionals, now we have racks. Before that it was just more or less desktop computers doing the work. So scaling up a website was a different kind of challenge back then. Nowadays you can just go to your cloud provider and click up some new servers, and that should do it. But is it always enough? To define the problem of scaling apps in the cloud a bit more, the definition I'm going to give you might be a bit naive. I know many of you are very experienced developers and have probably faced these kinds of problems before, but I'd still like to frame what the problem is, because at least from our perspective as a service provider, we still see many of our users running quite old applications: applications built with the tested and proven LAMP or LEMP stack, but built more or less as monolithic apps, so they don't scale that easily. And here's probably why. Here's the basic setup: we have a server with resources like storage, CPU, and RAM. Then we have the application software; here I'm simply showing it as a LAMP stack. And then we have the actual application, written in PHP. You can of course think about the layers in a slightly different way, like the web server sitting on top of the application code, but I guess you get the picture. So here's the basic app; from our perspective this could be a Magento or WordPress site. We have hundreds of them running on top of our platform.
Usually, when customers encounter a spike in traffic, the first thing they do is increase the flexible resources, CPU and RAM, because the thinking is that we need more CPU or more memory and that will make the app go faster. Well, in some cases it will. Some applications scale up this way, even without tuning any configuration parameters, though usually adding more RAM doesn't help immediately unless you configure your web server or PHP or MySQL to actually use it. But anyway, this is an easy and nice solution. And what if it's not enough? Well, in the cloud we always have the option to just duplicate the server, put a load balancer in front, and run two instances, or even three or four or five. That should do it. Of course, if your application is capable of running as independent instances, it will work this way. But what if you have shared data that needs to be accessed from any of these instances? Then you need some sort of shared IO, some sort of shared storage, and that creates a whole new class of problems; then you can't just add more instances. People nowadays tend to think IO is a somewhat outdated problem because we have so much caching going on: everything can be stored in memory, memory is cheap. Why think about disks when you can have 100 gigs of RAM in your server and keep everything there? Well, in some cases you have more data than fits in 100 gigs. What if you have one terabyte of data that you need to access across your frontends? I don't know what kind of website that might be; maybe some very popular cat photo site where people upload their cat photos and comment on them.
But we've seen applications that really do have huge resource needs, applications that have grown this big without their developers realizing the need for scaling, at least in the early phases of the architecture design. The basic IO problem is that you have to satisfy IO: you have reads and you have writes. Reads usually make up the larger part of application usage. Under reads I also count any file system operations that give you the status of files, like getting the latest modification timestamps. Writes can be writes from your application or database writes, but in any case something you have to persist and then make available for other processes to read. On the read side you usually have static resources, like images, CSS, or cached content, and then dynamic content that has to be read from disk or from your storage every time, because it could have changed and you don't have the latest copy. Writes can also be dynamic data coming from your application and your users. And then you have logs. Logs here mean anything that the application or the underlying application stack and server software writes but that your application doesn't immediately need in order to serve new content to clients. That could be access logs, error logs, database replication logs, anything like that, depending on the configuration. What they have in common is that they stress the underlying storage: they mean storage operations are going on, which might in turn slow down the read operations where you're reading data for your application and its customers.
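If you want to see what your own read/write mix actually looks like, standard Linux tools will show it; for example, `iostat` from the sysstat package reports per-device reads and writes per second. A minimal sketch, guarded so it degrades gracefully on machines where sysstat isn't installed:

```shell
# Sample per-device read/write rates: 3 samples at 1-second intervals.
# iostat is part of the sysstat package (e.g. apt-get install sysstat on Debian/Ubuntu).
if command -v iostat >/dev/null 2>&1; then
  iostat -x 1 3
else
  echo "iostat not installed; install the sysstat package to sample disk IO"
fi
```

The `r/s` and `w/s` columns tell you whether a workload is read- or write-dominated, which decides whether caching (reads) or separate or faster storage (writes) is the right lever.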
In some cases you can of course minimize the writes: in a production system, don't write debug logs, or keep the application and server logs on separate storage, or ship them over the network. But there are use cases where you just have to write things down in quite a detailed manner. For example, you might have legal requirements to log every user coming to your website and keep detailed access logs. There are so many different use cases that it's not easy to give one simple solution that works for everything. Usually, improving IO performance is about the reads, and in that sense adding more caches will usually help a lot, caching the data so it doesn't have to be read from disk every time. That's the common solution, but it doesn't help with the writes. What might help is directing some of the writes to separate storage. Again, where the image says logs, I mean all kinds of data the application doesn't need immediately, such as database binary logs. That's an easy solution: a separate storage system that handles part of the load, like a separate log disk in your virtual server. It's quite an easy setup and it usually helps up to a point, but not forever. Adding more caches on the read side will again improve performance, but if your application is write-intensive, no matter how much caching you do on the served-data side, it still comes down to this: if you get lots of new data that needs to be written down and made available immediately to the other processes reading it, you have a bottleneck, and it's an IO bottleneck. No matter how much CPU or RAM you add, the disk is still the limiting factor.
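One way to make that bottleneck concrete is to measure it directly with `fio`, the benchmarking tool used in the demo later. The flags and sizes below are only illustrative values for a small random-write test; tune them for your own disks, and note the block is guarded so it only prints the command where fio isn't installed:

```shell
# Small 4k random-write benchmark (illustrative parameters, not a full workload model).
FIO_ARGS="--name=randwrite --rw=randwrite --bs=4k --direct=1 \
  --size=64M --numjobs=1 --iodepth=32 --runtime=30 --group_reporting"
if command -v fio >/dev/null 2>&1; then
  fio $FIO_ARGS
else
  echo "fio not installed; would run: fio $FIO_ARGS"
fi
```

The headline number to look at in the output is write IOPS; comparing it between a laptop SSD, a cloud volume, and a spinning disk shows exactly the gap discussed above.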
So, of course, you could run multiple servers with shared storage; there are many options available, like good old NFS or GlusterFS, that give multiple servers read and write access to the same storage at the same time. But it's still a single storage with limited performance, limited IO capacity. If you have more clients accessing the same storage, it will not improve performance; it might actually make the situation worse. So that alone won't take you far. Then there are some common tips for sharing the IO load. Again, I suppose this is nothing new to you, but I'd still like to go through it, because at least with existing applications there's usually no easy way of doing this. The most common way is to divide your data: use partitioning or sharding, which means you separate your data into different physical entities, totally independent units, based on some sort of key, like a user ID, a product ID, or whatever unique ID you have. In this example we have three independent storages and then a sort of storage director, which takes the application's key, in this case the user ID, divides it by three, and uses the remainder (modulo) to direct the request. There can be more advanced setups, like hashed keys, composite keys, or multi-level sharding, and of course many database products do this for you, so maybe you don't have to think about it. That again depends on how your application is built: are you building a new application from scratch, or are you trying to fix an old legacy application with lots of internal data structures and dependencies between them? In that case it might be useful to actually know how databases do this: go into the details and maybe think it through yourself. You can also do this without a database, for example if you have lots of small files that you have to store and serve.
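The modulo trick behind that storage director fits in a few lines of shell. This is only a sketch; the backend host names are made up for illustration:

```shell
#!/bin/sh
# Pick one of three storage backends from a numeric key (e.g. a user ID),
# using the remainder of dividing the key by the shard count.
pick_shard() {
  user_id=$1
  shard_count=3
  shard=$(( user_id % shard_count ))
  # Hypothetical backend host names; replace with your real ones.
  echo "storage-${shard}.internal"
}

pick_shard 1001   # 1001 % 3 = 2 -> storage-2.internal
pick_shard 1002   # 1002 % 3 = 0 -> storage-0.internal
```

The same key must always land on the same backend, which is why the key and the shard count are the parts you can't change lightly once data is in place.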
You could use this kind of sharding architecture with files as well: just use the file name or metadata as the key. Of course, it gets complex with these kinds of setups; they're not easy to set up and not easy to maintain. So before going down this road, it's worth thinking about the pros and the cons. Pros: you have smaller units of data to manage, which gives increased performance, and you can fetch data in parallel, getting data from multiple storages at the same time, which results in faster combined throughput. You might also get fault tolerance, but this kind of setup doesn't automatically imply fault tolerance; creating redundancy is a totally different story from just creating better performance. It's the same as with good old hard drives and RAID setups: you can have performance or you can have redundancy, but doing both gets expensive. Cons: it's indeed more complex to set up, so more work needs to be done, more servers to maintain and configure, and so on. If you're not the administrator for all this, maybe you have sysadmins or a DevOps team doing it for you, and they might be annoyed at having to run a more complex system, but that always depends on the business requirements. And joins are of course slower: if you have data that needs to be fetched from multiple locations, and the data fetched from location B depends on data fetched from location A, and these are stored on different backends, then a partitioned or sharded architecture really doesn't help that much. So it depends on how you choose your sharding strategy. If you choose your keys the wrong way, the setup might actually end up slower than a single storage backend. It needs testing and tuning, maybe many times over.
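The file-name-as-key idea mentioned above can be sketched the same way: hash the name and take the result modulo the shard count, so files spread roughly evenly across directories or servers. A sketch assuming GNU coreutils' `md5sum` is available; the directory names are made up:

```shell
#!/bin/sh
# Route a file into one of 4 shard directories based on a hash of its name.
shard_dir_for() {
  name=$1
  # First two hex characters of the md5 of the name, taken modulo 4.
  hex=$(printf '%s' "$name" | md5sum | cut -c1-2)
  shard=$(( 0x$hex % 4 ))
  echo "shard-${shard}"
}

mkdir -p shard-0 shard-1 shard-2 shard-3
dir=$(shard_dir_for "cat-photo-0042.jpg")
echo "cat-photo-0042.jpg goes to $dir"
```

Hashing the name, rather than using it directly, is what keeps the distribution even when file names are sequential or otherwise clustered.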
Iteration is usually a good way to improve performance. And then there's also the risk of having part of the data unavailable: if you have multiple storage hosts, one of them might just go down, and depending on your application it might still work with partial data, but some data won't be available. I suppose everyone who's ever used Facebook has seen glitches where not all user profiles are available. Facebook's systems are of course far more complex, but the basic reason is that one of the backends storing some of the user profiles wasn't available at that moment. That kind of thing can happen. As for tools for setting this up: nowadays we have lots of modern NoSQL databases that provide sharding options by default, like MongoDB, Cassandra, or even Redis. I won't go into the details of setting these up; each has slightly different strategies with slightly different pros and cons. One thing, though, is that these are maybe not used so much with PHP; I think they're more common with Node.js or Java, while in PHP, at least with legacy applications, we still mostly see traditional databases. Just a quick show of hands: how many of you have used any NoSQL databases with your PHP applications? And how many have used traditional databases like MySQL or Postgres? That's what I would assume; only a small part have used NoSQL. They're not automatically better just because they provide sharding functionality. There's always a use case, and the basic use case is that you have massive amounts of data that need to be constantly written. If your database consists of only 100 rows, there's of course no need for any of this, but if we're talking about hundreds of millions of rows or data entities, it might be handy.
The traditional SQL databases do have some tools for setting up these kinds of sharded setups. Oracle has built-in tools, but as always they're expensive; MySQL and its derivatives have third-party products that can be used to build such clusters; and I think Postgres nowadays also has some sharding and partitioning support, but these are not as built-in as in those other kinds of databases. So if you have a legacy application using a traditional database, changing it to a sharded storage architecture might require quite a lot of re-engineering, and at that point it's important that you know your app and know your data. It's even more important when you're building something from scratch. A good question is whether everything nowadays should be built to scale to web scale. Should you build every application so that it can handle 100 million users from the beginning? Probably not; there's no point in doing too much premature optimization. But still, I think PHP applications have traditionally not been built with this in mind at all. So if you're writing something totally from scratch, it might be a good idea to think about how to lay out your data so that it doesn't have too many cross-dependencies if it gets split across multiple storage backends at some point. If you're building a new app from scratch, doing it that way is an architectural decision, and if you're modifying a legacy app, it might be worth digging into how it actually uses its data and whether there are use cases that would benefit from some optimization in this sense. And then there are of course non-blocking IO options; sorry if the previous talk about React already covered this. They help up to a point, but not with multiple writes. So then we come down to the part where: what if it happens that you just
have a legacy application, or even a modern application with a sharded architecture, that has been scaled up as far as possible but still doesn't give you enough performance? Well, luckily there's the option of faster storage, even in the cloud. So, first a word from our sponsor. UpCloud is a growing hosting provider. We provide services in the infrastructure-as-a-service model, so setting up the operating system, the applications, and anything running on top of them is usually the customer's responsibility. We don't provide LAMP stacks by default; we provide plain Linux servers, on the assumption that end users usually know best how to configure them, while we focus on the underlying hardware, virtualization, and storage, and try to make them as fast and good as possible. We currently have eight data centers around the world: one here in Singapore, two in the US, and the rest in Europe. Now, if you got our promo code and try to deploy a server, you might notice that deploying to, for example, the Singapore or US locations can take some time. We're aware of this and working on it, and I promise that during the rest of this year it will get much better. We also have nice features like built-in, complimentary private networking between all the data centers, so you don't have to set up or pay for anything extra to have communication between your servers; that also helps when building applications that scale out and communicate with each other over the private network. We of course have good 24/7 support available, and here's some pricing information; you can find more on our website. As for how we can help you set up applications, we have resources like our API documentation, a collection of how-to articles, and our client libraries, one in PHP and others in other languages, available in our public GitHub. So I'm next going to
show you some short demos. Let me change the view. We have this nice feature called initialization scripts, which helps you set up some basic things on your server on its first boot, so I created one that sets up a very simple LAMP-stack server. This is probably not something you'd want to do in production; there you'd rather set up a basic Ansible host and use that instead, but for demo purposes it shows how to get a PHP LAMP application running in a short time. So let's deploy a new server; this hopefully won't take too long. I'm going to deploy it to Helsinki, using just the cheapest plan for the moment. Helsinki prices carry a bit of a premium compared to our other data centers, because production costs there are somewhat higher. We have lots of built-in templates, which means you don't have to actually install the operating system; you just select an existing template (I'll use the search here), and then I pick this init script. Okay, it should be up and running in about one minute. In the meantime, let me show you the IO demo. We provide quite fast storage by default for every customer, and just to show you how fast it is, I'm using the fio tool, which is used to benchmark IO, first against my own laptop. So this is the local disk in my laptop; some of you might be familiar with this tool. The most important number here, for random writes, is how many random write operations per second you can get, and you can see that my local MacBook Pro performs at about 60k IOPS, which is quite good: much, much better than traditional spinning disks, and much better than your usual cloud-hosted server. But let's see; I think we have our server up and running now. Let's check; it might take a moment. Okay, yeah, it's up and running, and the LAMP stack is installed there already. So let's log in; I used the root login by default, but you could of course define
other users as well. The server has my SSH key already deployed, and here we have the latest LAMP stack installed, just as I specified. Let's install fio here as well; I should have put it into the installation script so it would be there by default, but no matter, it will be there in no time. Then I have to find my cheat sheet to copy the fio command. Let's run the same test here on the cloud server, and we can see it performs about the same as, or even a bit faster than, my local MacBook Pro. And this is the performance you get with our $5-per-month servers and their storage. How does this translate to actual application performance? Well, if you have a server that does lots of IO requests, like writing data to a database or reading data from it, it's quite a significant improvement compared to traditional spinning disks. Here I have another server, which uses our older storage backend; it doesn't use MaxIOPS, it's a different one. If I run the test here, well, it's capped at 1000 IOPS, which is of course much slower, but this is the kind of storage performance you might get from many other cloud hosting providers. And if you set up a complex application like a Magento shop, it really shows how this translates to actual application performance. Besides the user interface, you can also set up servers from the API; here's an example in XML format of how to create a server via the API, with the same script as before. So let me finally show this: here's the output from our API. It started creating a server, and it will be up and running in a few minutes with the same kind of setup as the previous one. I think time is running out here; if I had more time, I would have liked to do an actual Magento shop demo, but it turned out that setting up Magento from scratch in a short time was a bit too complicated, because I haven't done it in a long while, and surprisingly the latest Magento was not compatible with
PHP 7.2 by default. I suppose there are people here who could set this up with a flick of the wrist, but for me, as a backend infrastructure developer, it was a bit too much in this short time. Anyway, thanks for listening; I hope you got something out of this. We have a booth over there in the hallway, so feel free to come talk to us and ask about our systems and services. We also have some swag to give away, and of course the $50 credit for new users if you register now. So thanks for watching, and I hope you enjoy the rest of the conference. Thank you.