This is the 35th lecture in the course Design and Engineering of Computer Systems. In this lecture, we are going to wrap up our discussion on performance engineering by covering the last topic, which is performance scalability. So, let us get started. What is performance scalability? So far this week we have seen how to measure performance, how to analyze the measurements we get, and how to optimize the performance of a system with a given number of CPU cores and a given amount of memory. The other aspect of performance is what is called performance scalability. The performance of a system only tells us what we get for a given set of resources: say this system has a capacity of 100 requests per second over 4 cores. Performance scalability asks: if I add 4 more cores to the system, will the total capacity become 200 requests per second? That is what I expect, right? The CPU was fully utilized, I gave it 4 more cores, so is my capacity also increasing? In other words, can a system's capacity increase if we give it more resources? If the performance improves when we give more resources, the system is said to have good performance scalability. On the other hand, if the performance does not improve no matter how many resources you give, the system has poor scalability. Note that a system can have very high performance to begin with and yet scale poorly; performance itself and performance scalability are two different aspects. Now, suppose we have a system with a given configuration and given hardware, we have optimized its performance using all the techniques studied in the previous lectures, and we can optimize it no more, but we still need more performance because the system is not meeting the offered load, the expected incoming load. In such cases, the only way to improve performance is by scaling: adding more resources and hoping that the performance improves because of the extra hardware. Scaling can be done in two ways. One is vertical scaling, or scale up: you have a 4-core CPU, so you add more CPU cores and make it an 8-core CPU, or you add more memory, or more of whatever the bottleneck resource is. You are simply making one machine more and more powerful. The other is horizontal scaling, or scale out: if a component is the bottleneck, you run the software on another machine or another VM, create a replica of the component, and split the traffic amongst these replicas. If the component can only handle 100 requests per second and I am getting 200 requests per second, I will have a second copy of it to split the load. Both of these are tried in general in order to scale the performance of a system. Note that a lot of cloud management and orchestration systems do this auto scaling automatically: if some component has become the bottleneck, say its CPU is at 100% utilization, they will identify that and add an extra replica, and so on.
All these cloud orchestration systems will measure the incoming load, measure your utilizations, and automatically scale up components when needed. They will also scale down: if you do not need so many replicas, they will shut some of them down. This auto scaling, dynamically expanding and shrinking your resources as the load changes, is very important in a real-life system, because you do not know how much load is going to come at any point of time; you have to dynamically adapt and scale your system to meet the incoming load. First, let us understand vertical scaling a little better. Suppose your system's performance is limited by the CPU, your CPU cores are 100% utilized, and you want to add more CPU cores to improve performance. When you do this, what you need is a property called multi-core scalability: the performance of your application should increase in proportion to the number of CPU cores. Otherwise, vertical scaling is meaningless; it is not beneficial to add more CPU cores if the performance stays the same even when you give extra cores. So, what can you do to have good multi-core scalability? This is important to know, because this kind of scaling is done all the time: cloud orchestration systems will monitor your CPU usage and give more CPU cores to your VM, or take some away. Dynamically scaling the number of CPU cores given to an application component is very commonly done. But just giving more CPU cores is not enough. If you get some performance with one CPU core, then with 2 cores and so on, the performance should increase roughly linearly with the number of cores; only then is it beneficial to add more cores. If it does not increase, there is no point adding more cores. Applications that can be parallelized easily, where the logic running on one core can easily be replicated on the other cores, have good multi-core scalability. But why do some applications not have good multi-core scalability? For some applications, no matter how many cores you add, the performance more or less plateaus at some level; they cannot use more CPU cores beyond a point. Why does this happen? There are many reasons. One is, of course, cache coherence overheads. Each CPU core has its own private caches, and there are some shared caches; we have seen this before. If you have multiple CPU cores all accessing the same memory, then when one CPU core has accessed some memory location and another CPU core also wants it, there is some cache coherence traffic. These cache coherence overheads mean that you cannot just keep on adding CPU cores to your system forever; at some point, more CPU cores are not going to improve performance. So, cache coherence is one important reason why the performance of an application may not scale beyond a certain number of cores.
The other reason is that within the application there could be a lot of serialization. Just because you have many parallel cores does not mean the application threads can run in parallel; maybe this thread is waiting for that thread for some reason, for some locking, things like that. There could be serialization and not enough parallelism in the application. These are the common reasons for poor multi-core scalability. Let us understand this in a bit more detail. When do these cache coherence overheads occur? They occur when the same memory location is accessed from multiple CPU cores. Each CPU core has a private cache; one core is accessing, say, some memory location X, and another core also wants to write to the same memory location. Then you need some synchronization, to either update the latest value or invalidate the stale copy and fetch it again. Something needs to be done: you need some snooping, or a directory, to keep track of which core has which memory location, and when one CPU core updates the copy in its private cache, all other cores which also have a copy of that data have to either update or invalidate their copies. So these invalidation messages need to go to all the other cores which have used the same memory location in the past. All these invalidation and update messages, the snooping, the directory, all of this imposes overhead; it is all extra work. Therefore, you cannot just keep on adding CPU cores to your system and expect that the performance will improve. And this is not only a problem with true sharing; there is also false sharing. Two CPU cores could be accessing different memory locations that happen to lie on the same 64-byte cache line. Recall that caching is done at the granularity of chunks of 64 bytes, not individual bytes. So different cores are using different data, but it is all on the same cache line, and the same cache line has to be invalidated on one core whenever it is updated on another. This also causes cache coherence overhead. Then the question comes up: why do CPU cores need to access the same memory location at all? Why cannot each thread simply access different memory? Sometimes multiple threads of the same process, running on different CPU cores, access the same parts of the memory image; they might be accessing the same global variable of the process, or the same variable in the heap. In such cases, the same memory location is requested by different threads on different CPU cores. The other thing that can happen is multiple processes in kernel mode. The kernel code and data structures are the same for everyone: when any process goes into kernel mode, it is accessing the same PCBs, the same list of processes, and so on. In such cases too, the same OS code and data might be requested on multiple CPU cores by different processes in kernel mode.
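To make the false sharing point from above concrete, here is a small C sketch (my own illustrative example, not code from the course): two threads each increment their own counter. When the two counters sit on the same 64-byte cache line, the line keeps bouncing between the two cores' private caches; padding each counter out to its own cache line removes that overhead, even though the threads never logically share any data.

```c
/* false_sharing.c -- illustrative sketch of cache-line bouncing.
 * Build (assuming Linux/gcc):  gcc -O2 -pthread false_sharing.c -o fs  */
#include <pthread.h>
#include <stdio.h>
#include <time.h>

#define ITERS 50000000L
#define CACHE_LINE 64

/* Two counters that share one cache line: writes from two cores keep
 * invalidating each other's cached copy of that line (false sharing). */
struct { volatile long a, b; } same_line;

/* Each counter padded out to a full cache line of its own. */
struct padded { volatile long value; char pad[CACHE_LINE - sizeof(long)]; };
struct padded own_line[2];

static void *bump(void *p) {
    volatile long *c = p;
    for (long i = 0; i < ITERS; i++) (*c)++;
    return NULL;
}

/* Run two threads, each incrementing one counter, and time the pair. */
static double run_pair(volatile long *x, volatile long *y) {
    struct timespec t0, t1;
    pthread_t a, b;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    pthread_create(&a, NULL, bump, (void *)x);
    pthread_create(&b, NULL, bump, (void *)y);
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

int main(void) {
    printf("same cache line : %.2f s\n", run_pair(&same_line.a, &same_line.b));
    printf("separate lines  : %.2f s\n",
           run_pair(&own_line[0].value, &own_line[1].value));
    return 0;
}
```

On a typical multi-core machine the second run is noticeably faster, even though both runs do exactly the same amount of logical work; the only difference is whether the two counters share a cache line.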
The other thing is that threads commonly need things like locks, and a lock variable is accessed from multiple CPU cores. For example, consider the code we have seen way back in this course for acquiring a lock: there is a lock variable, and when different threads try to acquire the lock, they try to set this variable to true; if the previous value was also true, then you have not acquired the lock and you wait. This is the code to acquire a spin lock using a hardware test-and-set instruction that atomically updates the lock variable. Now consider the cache line containing this lock variable. Suppose one thread has written to it; then another thread, running on another core, also wants the lock, so it also tries to do a test-and-set write to this variable; yet another thread on yet another core is also trying to acquire the lock. All of these threads are contending for the lock, and what happens is that the cache line holding the variable is updated on one core, so the copies on the other cores are invalidated; then some other core updates it, and the first copy is invalidated, and so on. This is called cache line bouncing: multiple threads on different cores are writing to, or trying to access, the same variable, so the cache line keeps bouncing across the cores, getting updated here and invalidated there over and over, and this causes a lot of extra traffic and overhead in the system. Because the lock variable is shared across CPU cores, locks cause exactly this problem. In general, these cache coherence overheads hurt the performance of your system, especially when you want to scale to a large number of CPU cores. The other problem is that threads and processes cannot execute in parallel all the time. Just because you have n threads and n CPU cores, it may not be the case that all the threads can always run in parallel, one on each core. Why? Maybe one thread has entered a critical section and acquired a lock, so all the other threads waiting for that lock cannot run in parallel; or maybe it is a pipelined design where one thread has to finish something before another thread can proceed, so they are waiting on a condition variable. For whatever reason, it is not possible for all the code in your program to be executed in parallel on multiple CPU cores, and therefore it is unrealistic to expect that as you add more CPU cores, your performance will keep increasing linearly in proportion to the number of cores.
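Since the lecture refers back to that spin-lock acquire loop, here is a minimal sketch of it using the C11 atomic test-and-set (my own rendering, not the exact code shown earlier in the course). Every failed test-and-set is a write attempt on the cache line holding the lock, which is exactly why a contended lock makes that line bounce between cores.

```c
#include <stdatomic.h>

/* One shared lock variable; the cache line that holds it is what
 * bounces between cores when many threads contend for the lock. */
static atomic_flag lock = ATOMIC_FLAG_INIT;

static void spin_lock(void) {
    /* test-and-set: atomically set the flag and return its previous
     * value; if it was already set, someone else holds the lock, so
     * keep spinning (and keep generating coherence traffic). */
    while (atomic_flag_test_and_set(&lock)) {
        /* busy-wait */
    }
}

static void spin_unlock(void) {
    atomic_flag_clear(&lock);
}
```

One common mitigation, used by some lock implementations, is to spin on an ordinary read of the flag and attempt the test-and-set only when the lock looks free (test-and-test-and-set), which reduces the coherence traffic generated while waiting.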
There is, in fact, a simple formula, called Amdahl's law, that tells you how much performance gain you can actually get from parallelism. Suppose some work takes time t1 on one CPU core, and when I do the same work in parallel, using multiple threads on p different CPU cores, it takes time tp. Ideally, what do I expect? The ratio t1/tp is called the speedup: if with one core the work took 100 seconds and with 10 cores it takes only 10 seconds, I got a perfect speedup. Ideally this speedup should be equal to p, because the same work is now spread across p cores. But in real life this is not the case, because your entire application code cannot be parallelized. Suppose only some part of your application logic can be parallelized and the other part cannot; it is a critical section, it has to be run by the threads one after the other. If alpha is the fraction of the work that can be parallelized, then when you run on p cores, the alpha part takes alpha*t1/p time instead of alpha*t1, because it runs in parallel on the p cores, while the remaining (1 - alpha) part cannot be parallelized and still takes the same time it took on a single core. That gives you the formula for tp, and the speedup t1/tp works out to 1 / (alpha/p + (1 - alpha)), which is not exactly p; for large values of p, it is approximately 1 / (1 - alpha). What is this saying? If a large part of your code can be parallelized, so that alpha is close to 1, your speedup will be large: by adding more cores you get better performance. But if alpha is very small, almost equal to 0, then your speedup is just about 1: however many cores you add, you are not increasing your performance by a whole lot. This is what Amdahl's law tells you, and it is basically common sense: if large parts of your code can run in parallel, adding more CPU cores is good and you will get improved performance; but if large parts of your code can only run serially, due to critical sections, locking and so on, then there is no point adding a lot of CPU cores to your system.
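Writing this out as a formula, with one worked number (the value alpha = 0.9 below is just an assumed example, not something from the lecture):

$$ t_p \;=\; \alpha\,\frac{t_1}{p} \;+\; (1-\alpha)\,t_1, \qquad \text{speedup} \;=\; \frac{t_1}{t_p} \;=\; \frac{1}{\alpha/p + (1-\alpha)} \;\approx\; \frac{1}{1-\alpha} \ \text{for large } p $$

For example, with alpha = 0.9 (90% of the work parallelizes) and p = 16 cores, the speedup is 1 / (0.9/16 + 0.1) = 6.4, and no matter how many cores you add, it can never exceed 1 / (1 - 0.9) = 10.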
So, given all this, what techniques can we keep in mind to improve multi-core scalability? One is to avoid sharing data across cores as far as possible: if possible, split your application data into separate slices, give each slice to a thread, and let each thread run on its own CPU core, so that each thread accesses its own data and does not touch the data of other threads on other cores; where you can, split even at the granularity of cache lines, not just variables. Of course, this is not possible in all applications all the time; there could be some common data structures that all threads need to access. But where possible, split data across cores and keep per-core or per-thread data structures, so that you reduce cache coherence overheads. Also, use locks only when needed, because locks cause a lot of cache coherence traffic, with the lock variable bouncing across cores, and they also serialize your code's execution. So do not just acquire a lock the moment a thread starts and hold it for all the work; use locking judiciously. There are also modern lock implementations that avoid this excess cache coherence traffic, and there is recent work on designing data structures that are correct even without locks, what are called lock-free data structures and lock-free application design. These are advanced topics, so read about them if locking is in fact becoming an overhead in your application. And not just at the user application level: at the level of the operating system too, modern operating systems are evolving to keep OS data structures as per-core slices where possible, so that the interaction between CPU cores is minimized. Modern operating systems also have what is called NUMA awareness: if some CPU cores are closer to some memory, the OS will run a process on the CPU cores close to where that process's memory image resides. This also helps multi-core scalability. So these are the techniques to improve multi-core scalability, and if your application has good multi-core scalability, you can vertically scale up to a certain point: you can add more CPU cores to improve your system's performance if it is limited by CPU. The next thing we are going to see is horizontal scaling. Of course, you cannot add unlimited hardware resources to one system; at some point you will reach a limit. So if your system's performance cannot improve any further, what you do is add more replicas, and that is called horizontal scaling. If you have a bottleneck component, you scale it by adding multiple replicas and redirecting traffic, so that instead of all traffic going to one replica, it goes to the different replicas and the load is balanced out; the bottleneck component can now handle more requests, more load. This scale out, or horizontal scaling, is also done automatically by cloud orchestration systems like Kubernetes: it will monitor that some pod is becoming the bottleneck and spawn more instances of that pod. Now, when you have multiple replicas of, say, a server or a database, how do the other components and the clients contact these replicas? One thing you can do is tell the other components about the multiple replicas. For example, if a web server has multiple replicas, you can return all of their IP addresses in your DNS response, so that some clients talk to one server and some to another; if a client randomly picks one of the n servers, the load is automatically distributed. So when you scale the system, you can tell everybody else: look, I had four replicas before and now I am adding an extra one. Or, a better way is to use a load balancer: the clients do not know that there are multiple replicas, they send all the traffic to just one address, and the load balancer distributes the traffic to the multiple replicas of the component.
The load balancer is a special component, either software or hardware, which directs traffic to the replicas according to some policy, and it adapts as replicas are added or removed, distributing traffic over more or fewer replicas. So load balancers are extra components added to redirect load to multiple replicas. Which design do you use, DNS-based distribution or a load balancer? It depends on your application, but usually the load-balancer-based design is preferred, because every time you scale up, you do not want to tell all the other components in your system; if some database gets an extra replica, you do not want to tell all the application servers that the configuration has changed. It is easier to simply use load balancers. So let us understand the load balancer design in a little more detail: how do these load balancers work? There are roughly two types of load balancers. One type works only at the network layer of the traffic. There are multiple server replicas, all listening at different IP addresses, and all traffic comes in to one address, a virtual or dummy IP address that is returned in the DNS. The load balancer rewrites the destination IP address on each packet, changing it to point to a particular server replica: it puts server 1's IP address on this packet, server 2's IP address on that packet, and in this way it changes the destination IP address, and maybe even the port number, of packets to redirect them to different server replicas. This is a network layer load balancer, because it only changes the network headers in order to do its job; it does not do any application layer processing. The other kind is an application layer load balancer. Here the clients talk directly to the load balancer and not to the server replicas. For example, with an HTTP application layer load balancer, when the client sends an HTTP request, the request is received by the load balancer: the TCP SYN message is received by the load balancer, the SYN-ACK is sent back, the connection is established between the client and the load balancer, and the HTTP request is sent to the load balancer. So the load balancer does all the transport and application layer processing; it looks at the request and then sends it on to one of the server replicas, gets the response back, and returns the response to the client. This load balancer is very much involved in application layer processing, in reading the application layer request. Here you can do more sophisticated things: for example, HTTP requests for searching for products on an e-commerce website can be sent to one set of servers, while requests to purchase items are sent to another. So an application layer load balancer can do application-level redirection of requests to the different server replicas. These load balancers are more complex than a simple network layer load balancer.
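As a toy illustration of that kind of application-level routing decision (entirely my own sketch; the path prefixes and pool names are made up), an application layer load balancer might look at the HTTP request path and choose a pool of replicas:

```c
#include <string.h>

/* Toy application-layer routing: inspect the HTTP request path and
 * choose a pool of replicas.  Prefixes and pool names are invented
 * purely for illustration. */
const char *choose_pool(const char *path) {
    if (strncmp(path, "/search", 7) == 0)
        return "search-pool";      /* browsing/search traffic */
    if (strncmp(path, "/purchase", 9) == 0)
        return "checkout-pool";    /* purchase traffic */
    return "default-pool";
}
```

Real application layer load balancers such as nginx or HAProxy express this same idea through routing rules in their configuration rather than hand-written code.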
And because the application layer load balancer is reading the HTTP request, it can do a lot more. For example, if the request is for static content, it can serve it directly from disk without going to any of the application servers, and it can cache responses. It can also terminate HTTPS/SSL connections: the secure connection runs only from the client up to the load balancer, and the load balancer handles the SSL certificates and so on, because the part behind it, inside the data center, does not have to be secured the same way. So application layer load balancers usually do a lot of extra application layer processing as well. That is why they are also called proxy servers, specifically reverse proxy servers: a proxy sitting at the client side is a regular proxy, while one sitting at the server side is called a reverse proxy. So application layer load balancers, in addition to redirecting traffic, also do extra application layer processing, because they read and understand the application layer request; things like caching or serving static content directly can be done in an application layer load balancer. So these are the two types of load balancers that we have. The next thing is the policy: the load balancer has multiple replicas and has to redirect traffic to them, so on what basis does it do it, what policy does it use? The most important thing to remember is that the traffic of one TCP connection should always go to the same replica. You cannot send one packet of a TCP connection to one replica and another packet to another replica; it would not work, because the TCP-level state maintained inside the operating system has to live in one replica. So the question is how to distribute TCP or UDP connections, not individual packets, to replicas. One simple option is a round-robin policy: the first connection goes to this replica, the second connection to that replica, the third to the next, and so on. And you make this decision on the first packet of a connection, not on every packet: on the first packet you pick one of the replicas and store a mapping saying that this connection, identified by this source IP, source port, destination IP and destination port, has been assigned to this server. For all subsequent packets of the connection, you look up this table, see that the connection has already been assigned to a particular server, and send the packets there. So the first packet is assigned, say round-robin, the mapping is remembered, and all subsequent packets of the connection are sent to whichever server was assigned previously.
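Here is a minimal sketch of that first-packet / subsequent-packet logic (an illustrative toy of my own, not a real load balancer: the table is a fixed-size array, nothing ever expires, and there is no locking). The first packet of a connection gets a replica assigned round-robin, and every later packet of the same 4-tuple looks up the stored mapping.

```c
#include <stdint.h>
#include <string.h>

#define NUM_REPLICAS 4
#define MAX_CONNS    1024

/* A TCP/UDP connection is identified by its 4-tuple.  With these field
 * sizes the struct has no padding, so memcmp comparison is OK here. */
struct conn_key {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
};

struct conn_entry { struct conn_key key; int replica; int in_use; };

static struct conn_entry table[MAX_CONNS];
static int next_replica;              /* round-robin pointer */

/* Return the replica for this packet's connection, assigning one
 * round-robin if this is the first packet we have seen for it. */
int pick_replica(struct conn_key k) {
    int free_slot = -1;
    for (int i = 0; i < MAX_CONNS; i++) {
        if (table[i].in_use && memcmp(&table[i].key, &k, sizeof k) == 0)
            return table[i].replica;          /* existing connection */
        if (!table[i].in_use && free_slot < 0)
            free_slot = i;
    }
    if (free_slot < 0)
        return -1;                            /* table full (toy limitation) */
    table[free_slot].key = k;
    table[free_slot].replica = next_replica;  /* round-robin assignment */
    table[free_slot].in_use = 1;
    next_replica = (next_replica + 1) % NUM_REPLICAS;
    return table[free_slot].replica;
}
```

A real implementation would use a proper hash table keyed on the 4-tuple and would expire entries when connections close or time out.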
That is one way. The other way is to do something like hashing: you take the connection tuple, say the source IP, source port, destination IP and destination port, compute a hash of it, and take it modulo n, where n is the number of servers. If the value works out to 1 you send the connection to server 1, if it works out to 2 you send it to server 2, and so on. In this way you can also use a mathematical function to map a connection to a server. Of course, the problem with this is: what if n changes? Suppose there were 5 replicas, the hash modulo n worked out to some server, and you were sending the connection there; now the value of n changes. Existing connections, once they have been assigned to a server, should keep going to that server; just because n changed, you cannot say that an existing connection should now start going to some other server. For the lifetime of a connection you have to keep the mapping the same, otherwise existing connections will break.
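The hashing policy can be sketched the same way: hash the 4-tuple and take it modulo n (the hash below is a simple FNV-1a mix, chosen purely for illustration). The caveat just mentioned applies: if n changes, the same tuple suddenly maps to a different replica, so existing connections must either be tracked separately or a consistent-hashing scheme must be used.

```c
#include <stdint.h>
#include <string.h>

/* Hash the connection 4-tuple (FNV-1a over its bytes) and map it to
 * one of n replicas (n > 0).  Purely illustrative; a real load
 * balancer must also handle changes in n without breaking existing
 * connections. */
int replica_by_hash(uint32_t src_ip, uint16_t src_port,
                    uint32_t dst_ip, uint16_t dst_port, int n) {
    uint8_t bytes[12];
    uint64_t h = 0xcbf29ce484222325ULL;       /* FNV-1a offset basis */

    /* Pack the tuple into a byte buffer. */
    memcpy(bytes + 0,  &src_ip,   4);
    memcpy(bytes + 4,  &dst_ip,   4);
    memcpy(bytes + 8,  &src_port, 2);
    memcpy(bytes + 10, &dst_port, 2);

    for (int i = 0; i < 12; i++) {
        h ^= bytes[i];
        h *= 0x100000001b3ULL;                /* FNV-1a prime */
    }
    return (int)(h % (uint64_t)n);
}
```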
There are various other policies possible too. Instead of round robin, you can check which server is least loaded: if one server is doing a lot of work, you might send new connections to one of the freer replicas. Many different policies are possible, but the important thing is that this has to be done at the connection level; you cannot send one packet of a connection to one server and another packet to another server, because all the TCP state, the congestion control, the acknowledgements, has to be in one machine only. Now the next question comes up: what about the requests of one user? Suppose I am a user doing some online shopping, and my request to add an item was sent to one server replica, which added the item to my shopping cart. If my next request, to add another item, goes over a new TCP connection to a different server, that server may not have my shopping cart; the cart was built up on the previous server. So we need these servers to somehow stay in sync and make sure that user state is correctly maintained. Just because you have multiple replicas, you cannot process user state incorrectly; you cannot tell the user, this is the first time I am seeing a request from you, when the rest of their requests were processed on another replica. In spite of having multiple replicas, we should ensure that the application layer state is maintained correctly. How do we do that? One common way is that many load balancers try to ensure what is called stickiness of users or sessions: if a user is doing multiple transactions over multiple TCP connections in a session, for example searching for products, adding a bunch of products to the shopping cart and buying them, and those connections go to different replicas, then we get the problem that half the shopping cart is on one server and half on another. So we want all the connections belonging to one session of one user to go to the same replica. The session has a notion of time: if the user comes back 20 days later, I can send them to another replica, but for this browsing or purchase session, all the traffic of the user, all of their TCP connections, should be redirected to the same replica, so that the user's state, like the shopping cart, is correctly maintained. If you do not do this and you send different connections of a user to different replicas, then the only way for this to work is for all the replicas to store the shopping cart in some database, and every time a user request comes in, fetch the latest version from the database and then add the item to it. That involves a lot of communication with the database; if you want to avoid it, you have to redirect all connections of a user's session to one replica only. Load balancers that do this are said to have stickiness: you stick all the connections of a user to the same replica. How do you implement this stickiness, that is, how do you identify a user session? You can identify a user session using various things, like the user's IP address; HTTP also has the notion of cookies. A cookie is nothing but a special string that identifies a user: when you make a request to a web server, it sends back a cookie along with the response, your browser stores this cookie and sends it with all subsequent requests, and the server side can then easily identify you and redirect all requests carrying the same cookie to the same replica. The first time, you can assign the session to any replica, round robin or by hash or whatever; you remember that this session is assigned to that replica, and all subsequent connections from the session are redirected to it. This stickiness ensures that the user's data can be stored locally; you do not have to constantly update a database or synchronize with other replicas, which makes it easy to maintain the state of the user in one replica. The larger problem we are seeing here is how to manage application state across replicas, and this is not easy once you have multiple replicas of a component. If you had just one replica, whatever state information about a user, like the contents of the shopping cart, could be stored on that one machine, in its memory or on its disk. But once your processing is split across multiple replicas, you have to be very careful, because if a user's request goes to one replica one time and another replica another time, you have to ensure consistency. How is this done? It is done in many ways. One option is a completely stateless design. What does this mean? Your front end and all your application servers do not store any data at all; all data is always stored in remote databases on some other machine. So even if you have multiple replicas, and one connection of a user goes to this replica and another connection goes to another replica, all the replicas access the data from a common database.
So, one user adds an item to the shopping cart, that request is handled at one server, and that server updates the shopping cart in the database; the next connection of the user goes to another replica, and that replica also accesses the shopping cart from the same database. There is a lot of overhead, because you are constantly accessing the database, but in this stateless design, because none of the replicas stores any information locally and everything is in the database, scaling is easy. The load balancer design is simple, and you can always add more replicas, because everybody is anyway using the same database; you do not need any user-level stickiness, since any replica can fetch the state of any user from the database. It is easy to add replicas, easy to scale the system, and the load balancer does not have to worry about stickiness at all; it can send any connection to any replica and it will all work fine. So this is called a stateless design, and it is one way to design applications; the only caveat is that the overhead will be high due to these remote accesses. The second design is a stateful design, specifically a shared-nothing stateful design: whatever the application state is, say the shopping cart database, you split it amongst the multiple replicas. This replica holds the shopping carts for one set of users, that replica holds the shopping carts for another set; you partition the application data across the replicas. This is called a shared-nothing architecture because the data is completely split: each replica has a disjoint subset of the application data. Now, whenever a user request comes in, the load balancer redirects it to the replica that holds that user's data. So you need perfect user-level stickiness: all connections of a given user go to the replica that has that user's state. The state is split across the replicas, each replica stores a slice of the application layer databases and tables, user state is partitioned this way, and the load balancer ensures perfect stickiness. This is called a shared-nothing design, and it is a stateful design, because each replica is keeping state; you are no longer relying only on a remote database, and the replica is not stateless. This is, of course, another way to design applications. The third way is a stateful design that is fully replicated: each replica has all the state of all users; this replica has all the state, and that replica also has all the state. Now the load balancer design is simple again: any user can go to any replica, because the state is not partitioned; it is not that user one is stored only here and user two only there; all the user state, all the shopping carts, are stored on every replica. It is a fully replicated stateful design.
So the load balancer design is once again easy, with no user stickiness needed; any connection can go to any replica. But the problem with this design is that you have to keep the copies of the state in sync with each other: every time some state is updated on one replica, you have to tell all the other replicas to update it as well, so that all the copies stay consistent. This becomes especially tricky when servers fail: a replica fails, the update could not be applied there, and all of these issues come up. How to ensure consistency when state is replicated and there are failures is something we will study in the next part of the course. At this point, what I would like to do is just lay out these designs for you. When you are designing application components, you have to keep in mind whether the components should be stateless, storing all state in some other database, or whether they should store state locally within the component itself, and this has implications for how you scale your application. If all your components are stateless, it is easy to scale. If your components are stateful but the state is split across the replicas, it is also easy to scale, but your load balancer should ensure user-level stickiness. If your state is fully replicated in all replicas, then no stickiness is needed, but you have to constantly keep the state in sync. So when you design applications for horizontal scalability, you have to think about how you will manage state within the application and across the replicas. That is all I have for this lecture. I have talked about what performance scalability is: the performance you get from one component is different from the concept of scalability, which is how the performance improves as you add more components, more CPU cores, or more resources. We have seen two different types of scaling: vertical scaling, which is adding more resources to an existing component, and horizontal scaling, which is adding more replicas of the component. A simple exercise for you: measure the performance of an application, say a web server, while increasing the number of CPU cores. If you have done a load test with something like JMeter on an Apache web server, increase the number of CPU cores and see whether the performance increases; do you have multi-core scalability or not? That is all I have for this lecture. This week completes our discussion on performance engineering, and we will start on reliability engineering next week. Thank you all, and see you in the next lecture.