My name is Piyush Gurd. I'm working with the platforms and systems team here, who are collaborating with us. We're going to talk about the general practices and guidelines that should be followed to make sure that an application built on the LAMP stack with PHP can scale. So, yeah, there's a very famous saying from Donald Knuth that premature optimization is the root of all evil. Let's say you start building a web application, and from day one you assume: okay, I'll make it a scalable application, able to serve 100 million users on day one. At that point you start making lots of small changes which may not actually be required at your early stage. And what happens when you start doing that is that you're reducing the time that goes into building your product, making effective deployments and all those things. So at the end of the year, you actually end up paying for that premature optimization instead of having an actual, working product. When we talk about optimization and performance scalability, the one major question that should be asked at that point is: what is actually the goal of this exercise of scaling my app? If someone says, "I want my database to be able to scale to 10 million users", that is a very incomplete kind of goal. Someone can even say, "okay, I want my database to scale up to 100 million users", but it's still an incomplete goal. Why? A complete goal looks like this: I want 10 million users in my database, but at the same time I want 300 concurrent users on my system, each getting a latency of at most one second with a 99% SLA on that one second, and my CPU spike not going beyond 80%, right? That is a set and defined goal. Otherwise, what happens many times is people say, "okay, I want my system to scale", and that is not actually something you can aim at.
It's also not like there's a magic formula where, if I design my app in a particular way, it will always be scalable. No, it's never like that. Once you have built an application, you actually have to do multiple iterations over it, do profiling, do benchmarking; it's an iterative process and it always takes time. There's no silver bullet you can use to design your app so that it scales for more and more users on its own. And actually, if you go and search for how to scale your apps, you will never find anything that ensures your app will be scalable in the long run. All you get is common guidelines and practices, and that's how it should be, and that's how it has to be, because each and every app will have its own design perspective, its own user requirements and all those things. So all you can do is follow a defined set of guidelines and practices and make sure you are following them at every step of the road. Yeah, so the very first step, when you're building an application which you want to scale in the long run, is to design it with multiple layers, rather than a monolithic approach where one thing accesses my database, pulls out the data, generates my HTML pages and dumps them to the user. This is the generic request flow that happens when you have a very large-scale system: when an HTTP request comes in, it actually hits a load balancer sitting in front of your frontend layer. The frontend layer serves static assets, your CSS files, HTML files and so on. And along with that, it forwards the dynamic request, again behind a load balancer, to the backend layer.
And this backend layer is actually doing all the talking to the database, all the business logic, all your core processing. So, one main advantage that this kind of architecture gives you is horizontal scalability at each and every layer. Let's say two years down the line your app is so successful that maybe 200 million people are coming, but the load is more and more intensive on the business layer. Since that layer sits behind a load balancer, you can add more and more servers to it as you go, and your frontend layer won't even notice what's happening on the business layer. And the other way around: if you feel that your application is spending more time rendering HTML pages, that means the frontend servers are the bottleneck, and you scale out only the frontend layer. That is what builds scalable apps in the long run. Any questions? So, I will try to focus mainly on the LAMP stack in this talk. Let's start with the common configuration, the common tunings that you can do on your Apache server so that you actually get better performance and optimal resource usage. As most of you know, Apache actually comes in two flavors, 1.3 and 2.0. 1.3 was primarily a prefork model: as soon as you bring up your server, it actually forks a bunch of processes, and for every new request that comes into your system, it assigns a particular process to serve that request. So, let's say a new request is coming and all the current processes are busy — it will actually fork a new process and hand that request over to it. One major advantage of this kind of web server model is that it's pretty robust, since every request is executing in its own process space; even if one particular request crashes, it's not going to affect your other requests.
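The layered setup described here can be sketched with Apache's own mod_proxy_balancer — a minimal example, assuming mod_proxy, mod_proxy_http and mod_proxy_balancer are loaded; the backend addresses are made up for illustration:

```apache
# Frontend vhost fanning dynamic requests out to a pool of backend
# app servers; scale the business layer by adding BalancerMembers.
<Proxy balancer://appcluster>
    BalancerMember http://10.0.0.11:8080
    BalancerMember http://10.0.0.12:8080
</Proxy>
<VirtualHost *:80>
    ProxyPass        / balancer://appcluster/
    ProxyPassReverse / balancer://appcluster/
</VirtualHost>
```

Adding capacity to the business layer is then just another BalancerMember line; the frontend never needs to change.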
Many people actually say that this kind of design is not suitable for building very high-performance servers, which is something I personally don't believe to be true, because even before you start seeing latency for this particular reason, some other part of your application will be adding so much latency that this will look pretty normal. In Apache 2.0 the story is different: it introduced multi-processing modules, and there are actually tons of modules available, but the commonly used ones are the MPM prefork and the worker models. Prefork is pretty much the same as before — a process is assigned to every request. In the worker model there is a small set of processes, and every process maintains a thread pool inside itself, and each request is actually handed over to a thread inside a process. With mod_php you should generally not go for this MPM worker model: PHP and many of its extensions are not guaranteed to be thread-safe, and if you start doing it, you might actually get crashes. So, what are the parameters in Apache that we can actually tune so that we can be sure Apache is not becoming a bottleneck in the long run? The first thing which is very useful is configuring MaxClients for your Apache. This is something which is useful for both 1.3 and the 2.0 prefork model; in fact, all of these configurations are common to both Apache 1.3 and Apache 2.0. What MaxClients controls is the maximum number of processes your Apache server can spawn whenever it's up. That means you are actually putting an upper cap on the maximum number of parallel requests that you can serve from your web server.
So, let's say I set it to 200. If at a particular point of time all 200 processes are busy and another request comes, it actually goes into a queue inside Apache and waits there until it gets allocated to a process. MaxClients is something which should be tuned depending on your expected load. Let's say you are expecting your web server to serve around 40 requests per second, or let's say your Apache server will be serving a maximum of 100 requests per second, and you also know that your PHP process image is 100 MB — that is, the whole code running inside an Apache process has a memory footprint of 100 MB — and you have an 8 GB RAM machine. From that you can actually work out the maximum number of child processes your server can spawn before it starts over-utilizing the machine. The next parameter is StartServers. It tells Apache how many processes will be spawned whenever you start your Apache server. Let's say I have MaxClients set to 200 and this value is set to just 10. What will happen when my Apache starts? I will just have 10 processes sitting there as the server comes up. Now let's say at a particular instant you suddenly get 100 requests. At that point of time Apache has to fork 90 new processes at once, and for that particular window, you will actually see a very high CPU spike on your web server. So StartServers should generally depend on the requests per second you are expecting on your web server; it should not be low — set it to at least the average number of requests you see during the day.
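Putting numbers on that MaxClients arithmetic, using the talk's own figures (8 GB of RAM, roughly 100 MB resident per mod_php child, with some headroom left for the OS — the exact headroom is an assumption):

```apache
# (8192 MB - ~1024 MB headroom for the OS) / 100 MB per child ≈ 70,
# so cap parallel workers there rather than letting the box swap.
<IfModule prefork.c>
    MaxClients 70
</IfModule>
```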
What MaxRequestsPerChild tells Apache is the maximum number of requests a process can serve before that process is terminated and replaced. So why is this actually necessary? All the heavy lifting code of PHP is in C, and when you are writing code, a lot of memory leaks creep in. Your Apache is running continuously for, let's say, 10 or 20 days, and for each and every request, let's say your process is actually leaking 10 KB or 20 KB of memory. After some time, that particular process might have served 10,000 requests, so the memory footprint of that process might have gone from 100 MB to 200 MB. At some point of time, your server will run out of memory. That's what this config parameter protects you from. Say I have 10 processes serving on a web server, and I keep the MaxRequestsPerChild value at 100. Assuming that all the processes are getting requests on a roughly equal distribution, after 1000 total requests each of my processes would have served 100 requests, and as soon as a process hits those 100 requests, Apache will actually reap that process and spawn a new one, so that the memory leaked by that process is freed. Next, consider where your access control is configured. Let's say your document root has multiple folders and subfolders inside it. One method of controlling the access to each and every folder is to configure it in your httpd.conf file itself. The other option Apache gives you is something called a .htaccess file, which you can actually drop into each and every folder.
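The two directives just described, with the example values used in the talk (these are illustrations of the mechanism, not recommendations):

```apache
<IfModule prefork.c>
    StartServers        50    # roughly the average requests/sec expected,
                              # so a burst doesn't force a storm of forks
    MaxRequestsPerChild 100   # recycle each child after 100 requests so
                              # leaked memory is returned to the OS
</IfModule>
```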
That way each and every folder can define its own security parameters, which gives you the feature of isolation between multiple user accounts: all the user accounts are inside a document root, and each user account has a .htaccess file controlling the permissions for it. But look at what happens on a request. Let's say I have a URL like www.fuseworld.com/home/pictures/delhi/abc.jpg. To serve this file, Apache actually has to traverse each and every folder along its path, and for each folder it has to go and check whether a .htaccess file is present or not — which means a stat system call to the operating system for every folder level. So even if you never put a .htaccess file anywhere, as your directory tree gets deeper, these small, small lookups keep adding more and more latency to your request. Whatever security configuration you want to do, put it in the server config instead. Basically, you have an httpd.conf file, or a virtual host inside it, and in that virtual host itself you can directly define your directory permissions rather than delegating them to a .htaccess file in each and every folder. Sorry? Yes, even directly. Yeah, it should be. In a web hosting company, obviously, this is something which cannot be avoided, but if you have a standard web application, then you generally don't have the case of multiple user accounts. I'm not very sure, but I don't think that's possible. Absolute path, yeah, exactly — it starts searching from the root of your document root. As far as I know, it starts from your document root, wherever you have that configured.
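The advice to centralize access control might look like this in httpd.conf — 2.2-era syntax to match the Apache versions discussed, and the path is illustrative:

```apache
<Directory /var/www/html>
    AllowOverride None    # Apache stops stat()ing every folder for .htaccess
    Order allow,deny      # declare the permissions once, centrally
    Allow from all
</Directory>
```

With AllowOverride None, the per-directory lookups described above are skipped entirely.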
If your document root is pointed at /var/www, it will start from there; if it's under your home directory, it starts from there — it does not actually go above that. Next, HostnameLookups. The default value is disabled, and this is something which should not be enabled anyway. What hostname lookup means is: when a request actually reaches Apache, it contains the IP address of the host the request is coming from. If we enable this, what Apache will try to do is automatically translate that IP into a canonical hostname, which means it has to do an external reverse DNS lookup on every request — and DNS lookups are not always fast. Next, buffers. By default, Apache maintains a 4 kilobyte output buffer: whatever HTML or whatever data you dump towards the socket gets staged there and flushed out on the socket when it fills. Now say your application is going to serve a lot of data on each and every request — let's say about 30 kilobytes per page. That means on average the buffer would have to be flushed almost eight times. In that case, what can you do? You can actually increase your default buffer size — increase it to, let's say, 30 KB, depending on the average size of your responses. One thing to be careful about when doing this is that the buffer is allocated per connection. Also, all the browsers now support content encoding, so wherever possible you should always try to gzip and compress your data as much as possible. Now, KeepAlive. At least in my personal experience, I haven't felt the need to use KeepAlive at all. Let me try to put it mathematically: let's say you have a page on which you have to serve n objects, it takes T1 seconds to open a TCP connection, and T2 seconds to serve one object.
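The three tunings above — no reverse DNS, a bigger send buffer for the ~30 KB pages in the example, and gzip via mod_deflate (assumed to be loaded) — as a config sketch:

```apache
HostnameLookups Off        # skip the reverse-DNS lookup on every request
SendBufferSize  32768      # a ~30 KB page goes out in one flush, not eight
AddOutputFilterByType DEFLATE text/html text/css application/javascript
```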
Without keep-alive, the total time is roughly n × (T1 + T2); with keep-alive it's T1 + n × T2, plus K, the keep-alive timeout for which the connection is held open afterwards. So for keep-alive to perform better, your K has to be less than (n − 1) × T1. Even if you keep your keep-alive timeout at just one second, the break-even in practice comes out to something like 1.2 to 1.3 seconds, so in the general case, keep-alive is not something which actually helps the performance of your application. Next, static files. Most of the big companies actually offload static files to a different bunch of servers altogether. The main reason for that is that an Apache process running mod_php is a very heavy process; if you're using that process just to serve a static file, you're actually under-utilizing your resources. It's better to offload them to a different server — you can use something like nginx, or you can put a Squid accelerator on top, running as a reverse proxy. Now, PHP itself. PHP is an interpreted language: it converts your script into Zend bytecode, and for each and every request, it compiles the PHP script into that bytecode again. So you're effectively asking the interpreter to compile your code into bytecode on every single request. Instead, cache the bytecode using an opcode cache. What it will do is cache the compiled opcodes so the script is not compiled each and every time — it will just take the cached opcodes and run them. The next recommended practice is to minimize your includes: take all your includes, merge them into a single PHP file, and include that one file. For each and every include, PHP actually has to do one stat system call and one open. On a single web server this may not hurt you much, but it matters when you want to take it to 60 requests per second.
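A sketch of the single-include idea — the file names here are hypothetical; using dirname(__FILE__) keeps every path absolute, so PHP does one stat per file instead of searching across every include_path entry:

```php
<?php
// common.php - the one consolidated file every script includes.
// Absolute paths avoid the include_path search, so each require
// costs a single stat() instead of one per search directory.
require_once dirname(__FILE__) . '/config.php';
require_once dirname(__FILE__) . '/db.php';
require_once dirname(__FILE__) . '/helpers.php';
```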
At that scale, each extra stat call can add on the order of 15 to 20 milliseconds in aggregate; count your number of includes — let's say you have 10 includes — and that's around 100 to 150 milliseconds added. And always try to use absolute paths; that automatically reduces the number of stat calls. Yes. Correct. Correct. In your production environment, you should minimize logging; keep the verbose logging for your development environment. Next, output buffering. Let's say you use echo to send out your HTML pages: as you construct the page, you send out parts of it using echo commands. For each and every echo, PHP dumps it to Apache, Apache stages it into its 4 KB buffer and then flushes it to the socket. If you're not using output buffering, every echo statement becomes a write to Apache. Rather than that, enable output buffering and keep writing into that buffer; whenever your script finishes, you just do a single flush, all of it is handed to Apache in one go, and Apache passes it on. Compression, as I mentioned — PHP also has built-in compression modules. Now, MySQL. MySQL has a set of pluggable storage engines. By default the engine is MyISAM, whose strength is reads served out of its key cache; the actual features you usually want — transactions, row-level locking rather than table-level locking — come with InnoDB. So when you're looking at tuning MySQL, you are majorly tuning for your workload. What are the general questions that come up when you want to tune your MySQL server? What hardware to use? In general, the typical workload for MySQL is either memory-bound or disk-bound; only in rare cases is it CPU-bound, so a quad-core machine or a dual quad-core machine is good enough. And the amount of RAM depends on your working set. What "working set" means is: let's say I have a pretty big database.
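The output-buffering pattern described above, as a minimal sketch:

```php
<?php
ob_start();              // userspace buffer; echos no longer hit Apache
echo '<html><body>';
echo '<h1>Hello</h1>';   // accumulates in PHP's buffer
echo '</body></html>';
ob_end_flush();          // one write hands the whole page to Apache
```

One flush per script instead of one write per echo is the entire point; the markup here is just filler.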
But at a particular point of time, not more than 10 GB of that data is actually being touched — that's the working set. So for that kind of scenario, you can go with a 16 GB machine or something like that. Storage engines — which one to choose? InnoDB is the engine most commonly used; you can use it for pretty much every kind of workload, whether it's read-intensive or write-intensive. Unless you want a pure in-memory store — then you're better off with memcached in front: all your non-critical data that you keep for future reference hits memcache rather than your database. No — actually, you're correct. So what happens is, let's say I'm running a query on a table called users — say SELECT * FROM users WHERE id = something. This query's result goes into my query cache. Now, the first DML statement which runs on the users table invalidates all the cached queries for the users table: even if a cached query's rows were not affected, those results are gone from the query cache and the queries will end up getting executed again. So you're saying the invalidation is per table? Yes, yes, yes. That's not something you can control. Now, let's say you have a scenario where you have just 1% writes and 99% reads — a table where for every 100 reads I have one write. There it helps. Sorry? Yes, on a write — any DML. Here is something we faced ourselves: every 3 days, for a period of 2 hours, the latency of our requests used to go beyond one second. What we eventually realized was that we had a 16 GB database machine, and with the load we were getting, after a period of time the memory usage was going beyond 15 GB and the kernel's out-of-memory killer was taking MySQL down.
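The invalidation behaviour described above, sketched in SQL — table and column names are illustrative:

```sql
SELECT * FROM users WHERE id = 42;            -- result stored in query cache
UPDATE users SET name = 'x' WHERE id = 99;    -- any DML against users...
SELECT * FROM users WHERE id = 42;            -- ...so this re-executes:
-- every cached entry for the users table was evicted,
-- even though row 42 itself never changed
```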
So as soon as MySQL got restarted, all the indexes fell out of memory, all the working data had to be brought back into memory, and even simple join queries were taking more than a second. This is something which should always be taken care of, because we actually spent almost 2 weeks of time debugging why that was happening. The general thumb rule is to configure MySQL's buffers to use only 70-80% of your RAM, if it's a dedicated database machine. Slow query logs: always enable slow query logs and constantly monitor them, because people will actually write bad queries and they will bring down your server. At any scale of web app, I can confidently say that the database is the thing which actually adds the most latency, and without slow query logs you won't see it. Related to that are the statistics of the indexes inside MySQL. MySQL does not actually maintain real-time statistics of indexes; they go stale, and unless you run an analyze on your tables, your index statistics will not be up to date. Similarly, you need to prevent internal fragmentation of your data files. This is something we faced just last week: a table on which a very big delete had been run. Even for a single-row lookup on a secondary index, the query was slow; we looked, and we found a lot of disk activity was happening, because the data file had become fragmented. So always avoid deletes; try to go for soft deletes — by soft delete you just add a column which says whether this record is active or not — and periodically rebuild the table to compact the data. The practice that should be followed is: let's say you have a web app which has gone into production and people have started using it — make sure you are doing regular profiling on it. The common profiling tools in the LAMP stack are Xdebug and APD.
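The index-statistics and fragmentation fixes above, plus the soft-delete pattern, sketched in SQL — the table and column names are illustrative, and OPTIMIZE TABLE is one way (assumed here) to do the compaction the talk describes:

```sql
ANALYZE TABLE users;    -- refresh the stale index statistics
OPTIMIZE TABLE users;   -- rebuild the data file to undo fragmentation

-- soft delete: flag rows instead of DELETEing and fragmenting the file
ALTER TABLE users ADD COLUMN is_active TINYINT NOT NULL DEFAULT 1;
UPDATE users SET is_active = 0 WHERE id = 42;   -- instead of DELETE
```

Note that queries then have to filter on is_active, which is the trade-off raised in the Q&A later.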
These profilers give you very nice statistics at the function level. Let's say you have 100 functions in your code path: they will tell you which function took how much time, so you can narrow it down to the function level and start doing optimizations there. You can also visualize the output — they can give you a call graph saying this function took this amount of time. Then, load testing and benchmarking. This is again aligned to the very first point I made: when you're doing a scalability exercise, always make sure it is goal-oriented — "I want to achieve 100 requests per second." So make sure you have a load testing and benchmarking setup on a separate environment, with data which is a replica of what you have in production. Tools which are pretty common: ApacheBench, which comes with the Apache installation, and JMeter — I personally like JMeter. It's basically, again, a bunch of modules where you can simulate real traffic: you can add cookies, you can add request parameters, and you can do post-analysis. We actually run a master JMeter installation with three slave installations; the master controls the slave installations, and those three JMeter slaves generate the load. Also make sure that in your production system you have strong monitoring — that is something which can never be overdone. You should always have regular alerts on your servers, because once an alert fires, you get the opportunity to profile what's going on at that point of time. Let's say at 2 p.m. today you see a very high spike in your system, but there is no alarm — you miss it. With alerts, at that point itself you'll come to know that, yes, my latency is going beyond 5 seconds.
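A minimal ApacheBench run against a staging copy — the URL and the numbers are illustrative, matching the "100 requests per second" style of goal set earlier:

```shell
# 1000 total requests, 50 concurrent; compare the requests/sec and
# latency percentiles ab prints at the end against your stated goal.
ab -n 1000 -c 50 http://staging.example.com/index.php
```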
So at that point itself, you can attach a profiler like APD and see what's happening on my server, and then go and optimize your system. Store your alerts exhaustively, too. What happens is, let's say you come to the office and you see 500 alert mails from the system, critical alerts among them. What people naturally tend to do at that point is make a filter, move all the alert mails out of their inbox and forget about them. Make sure instead that you store each and everything so that post-analysis can be done, and that you're actually alerting when there's actually something wrong inside the system. Ganglia is actually helpful for finding out what's happening around the database machines. vmstat is definitely interesting, because it tells you whether your processes are doing a lot of swapping, and tools like top and sar let you analyze the system at a given moment. One last thing: sysadmins and developers are kind of always on the back of each other. Let's say something goes wrong in production — it kind of becomes like a blame game between both worlds. Say I'm deploying code for someone else, and after some time suddenly the performance degrades. That guy comes to me and says, dude, you should have some auto-scaling tools on your Amazon servers so that more and more machines can come up and the latency of my module does not go down. But I go back and tell him it should not be like that: it should always be a mutual effort between the sysadmins and the developers. And what we have faced at Capitol is that in the past couple of months, we have taken almost more than 70 interviews to find good sysadmins and developers.
And generally what tends to happen is that people know how to set up systems, but they don't know how the system is actually working. We actually asked a guy with six years of work experience in database administration how MySQL replication works. He starts off saying: okay, you edit the my.cnf file on the slave to point it to the master, configure the server ID, configure the binlog file and position, and do a START SLAVE on the machine — that makes it work. That's fine, but how does it work actually? How does the data actually go from the master to the slave? In the same interview, he failed on Nagios as well. We asked him, how does Nagios work? He said, you just run an installation of the Nagios package on your system, you get a graph where you see red and green lines, and if the line is red, that means something is critical. So, as I said, always try to know what's actually happening under the hood of your web servers — that's what actually differentiates a good systems guy from a not-so-good one, and it might not be applicable to the folks present here. That's all I had; any questions? [Apache versus nginx?] It's more of a personal choice, I would say. Now, one advantage that Apache has is that PHP gets compiled into Apache itself and runs as a module. But if you want to run PHP with nginx, you actually have to configure it using FastCGI, and FastCGI adds the extra overhead of running an extra set of processes on your machine. With Apache, you don't have that: the whole PHP interpreter is part of Apache itself. I'm not very sure about that. I personally haven't kept up with it, so I'm not in a good position to judge one against the other. No, not really.
Every nginx-with-PHP setup I've seen uses the FastCGI mode, and it has never been able to scale as much as mod_php inside Apache — at least, that's what I've seen. At the end of the day, FastCGI is running PHP in a separate pool of processes: let's say I have 100 requests per second — all of those have to be handed across to that pool. [But isn't the model pretty much the same as Apache's?] That is the case, yeah — but it's still going through a separate process. With nginx you're using FastCGI; that's the combination. I've actually personally used it, and I mean, even if the performance can be as good, it can never be better — that I can say for sure: nginx with FastCGI will never be a better combination than mod_php in Apache. [Question about Percona] Yes, Percona actually has their own customized distribution of MySQL. Their tools are free, but the version being discussed here is an enterprise version — you actually have to buy a license for that. Percona's version of MySQL, yes, an enterprise version. If you don't, then you won't get support and all the security fixes for it; basically, it has limited support. [Question about delete flags] Sorry, which one? The flags, yeah. So basically, you're just adding one byte per row: the column is a tinyint, one byte, where a 0 or 1 signifies whether that record is active or not. But then, yes, you obviously have to modify your queries to filter on it. And at some point — let's say you have done a big delete — if you can afford a downtime, or you have a master-slave kind of setup where all your reads go to the slave and the writes go to the master, you can rebuild the table then. Any questions?
So I'll be hanging around; please feel free to ask me.