For large-scale deployments, emulation and prototyping are not feasible. Therefore, before we actually implement a content delivery network (CDN), it is important to look at the theoretical side of it. That is where mathematical modeling, problem formulation, and solutions come into play. So in this module, we are going to look at resource management and allocation in terms of the mathematical models that we might like to consider. We are going to look at the need for modeling, the problems that are typically addressed in CDNs, and the notations which are used.

The need for mathematical modeling comes from the fact that typical CDN problems, such as the allocation of hardware, bandwidth, and computing resources, and their reallocation, are a serious concern, because the number of users of a CDN changes arbitrarily. This results in some interesting, well-defined problems. These problems can be translated into mathematical models which best describe the scenarios, and these models can then be solved using classical theory and best practices.

The typical problems that we face in CDNs are known to us already. For the sake of mathematical formulation, let's put them into two categories. We have fundamental problems, which are easy to state and easy to model, and then we have integrated problems, which may comprise more than one of these fundamental problems. Typical examples, which we are going to discuss, are caching server placement, request routing, and object placement. These are all very specific issues related to CDNs.

Before we move to server placement, let's look at some typical notations. These notations are the basics of the mathematical model. We have the clients, which make requests for some content in the form of objects. Then we have the servers, the content itself (the objects), and the requests, which are made in the form of HTTP or SOAP requests.
Then there is the physical distance between the client and the server, and the costs incurred in operating the server, including the computing, storage, and bandwidth requirements. Then we look at the capacity of a given caching server, which is one of the variables that we can define; the bandwidth consumed by an object; the processing power consumed; and eventually the revenue generated by the whole activity, that is, serving or provisioning the content to the requesting client. Then there is the latency, which we can express as the round-trip time or the delay. Then we have variables which are Boolean in nature: for instance, whether a certain request is entertained, whether a request is redirected to the caching server, or whether the origin server is not available. All of these are based on our understanding of how CDNs work. So before we go into an example scenario, let's just be comfortable that, yes indeed, all these different variables and units can be assigned some notation.

The caching server placement problem is a serious problem in which we are interested in deploying the caching servers optimally across a set of candidate locations. Let's say we have m locations and n servers, and we need to deploy the n servers at n locations out of the m in total. This involves reducing an overall cost function. The cost function is determined by the overall traffic, so we'd like to minimize the traffic; likewise the delay that the clients experience, which we want to be lower; and then the overall delivery cost, which includes hardware, software, and bandwidth. This can be described mathematically by a very simple expression: the cost of serving client i from server j, which can be determined in terms of f, h, and c. Now what are these variables? We need to go back to the table of notations and see whether these expressions make sense.
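To make the placement problem concrete, here is a minimal sketch of choosing n out of m candidate locations by exhaustive search. It rests on assumptions that are not from the lecture: the cost matrix `cost[i][j]` (cost of serving client i from a server at location j) is made-up data, each client is assumed to use its cheapest open location, and the terms f, h, and c are not modeled individually.

```python
from itertools import combinations


def placement_cost(cost, open_sites):
    """Total cost when every client is served from its cheapest open site."""
    return sum(min(row[j] for j in open_sites) for row in cost)


def best_placement(cost, n):
    """Try every way of choosing n of the m candidate sites; keep the cheapest."""
    m = len(cost[0])
    return min(
        combinations(range(m), n),
        key=lambda sites: placement_cost(cost, sites),
    )


# Hypothetical data: 4 clients (rows) x 3 candidate sites (columns).
cost = [
    [2, 9, 7],
    [8, 3, 9],
    [7, 4, 1],
    [3, 8, 5],
]

sites = best_placement(cost, 2)
print(sites, placement_cost(cost, sites))  # (0, 1) 12
```

Exhaustive search is only workable for very small m, because the number of combinations grows quickly; that is the motivation for the solution techniques discussed later in the lecture.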
Then we have static data placement. Static data placement is more of an integrated problem, because here we do not have the origin server; we are relying on the proxy or caching servers. The job is to place the objects at various locations, which could be the proxy or caching servers, so as to minimize the cost for a client to access an object.

If an object is held at a caching server, we have a binary variable z_jk: if object k is held at caching server j, the variable is 1; otherwise it is 0. Another variable that might be of interest to us is x_ijk: if client i is served object k by server node j, which holds a copy, this variable is 1; otherwise it is 0. So we can imagine a mathematical equation whose resulting value comes from these different variables. This could be for a simple problem like server placement, or for a more complicated, integrated problem like static object placement.

This is the mathematical formulation. Let's not go into the variables, because that would take more time. It is an integer linear program: an optimization problem in which we have an objective function, the overall cost, that we would like to minimize. It has some constraints. The constraints are determined by x_ijk: we are interested in summing up the total cost of each object being served by the nearest server. Then we have the same variable z_jk, which takes a value of 0 or 1 depending on whether that particular object is held at a certain server or not. With these mathematical equations, we have the problems stated. Now these problems can be solved by some workarounds.
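For orientation, an integer linear program of this kind can be written along the following lines. This is a hedged sketch, not the exact formulation shown in the lecture: the symbols c_ijk (cost for client i to obtain object k from server j), b_k (size of object k), and s_j (storage capacity of server j) are assumptions introduced here for illustration. Only x_ijk and z_jk come from the lecture.

```latex
\begin{align*}
\min_{x,\,z}\quad & \sum_{i}\sum_{j}\sum_{k} c_{ijk}\, x_{ijk} \\
\text{subject to}\quad
 & \sum_{j} x_{ijk} = 1 && \text{for every client } i \text{ and object } k, \\
 & x_{ijk} \le z_{jk} && \text{for every } i,\ j,\ k, \\
 & \sum_{k} b_k\, z_{jk} \le s_j && \text{for every server } j, \\
 & x_{ijk},\ z_{jk} \in \{0, 1\}.
\end{align*}
```

Read in words: every client gets each object from exactly one server, a server can only serve an object it actually holds, and the objects held at a server must fit within its capacity.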
The exact solutions, which take more time and are of course more expensive, can be helped by some simplifying measures. For example, we have something known as Benders decomposition. It splits a large problem into two smaller subproblems. So, using a binary-tree kind of arrangement, a large problem can be divided into subproblems, and each subproblem can be further divided into sub-subproblems. In this way we can break a very large problem down into many small subproblems; if each of these is solved, then eventually the original problem can be solved. Then we have relaxation, in which some constraints are relaxed; this is known as Lagrangian relaxation. Lagrangian relaxation and decomposition help us to remove certain constraints whose removal does not cause too much performance degradation.

Those are the exact solutions, but then we have another fast and easy mechanism: heuristics. A heuristic is basically trial and error. If you are looking for an optimal solution, then instead of trying each and every possible solution, you can use a greedy heuristic algorithm that picks the best available alternative at each iteration. If you do not want to go beyond a certain number of iterations, the best fit found within that number of iterations is the output of the greedy heuristic. This results in a faster, more scalable solution, although the result is less optimal than that of an exact solution.

These are the solutions to the problems that we initially modeled mathematically. This material is again referenced from Rajkumar Buyya and Mukaddim Pathan, from the book Content Delivery Networks.
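The greedy heuristic described above can be sketched for the server placement problem as follows. This is an illustrative sketch under assumptions, not the algorithm from the referenced book: the cost matrix is made-up data, each client is assumed to use its cheapest open location, and the function names are hypothetical.

```python
def total_cost(cost, open_sites):
    """Total cost when every client is served from its cheapest open site."""
    return sum(min(row[j] for j in open_sites) for row in cost)


def greedy_placement(cost, n):
    """Open n sites one at a time, each time adding the site that
    lowers the total cost the most given the sites already chosen."""
    m = len(cost[0])
    chosen = []
    for _ in range(n):  # one iteration per server to place
        remaining = [j for j in range(m) if j not in chosen]
        best = min(remaining, key=lambda j: total_cost(cost, chosen + [j]))
        chosen.append(best)
    return chosen


# Hypothetical data: 4 clients (rows) x 3 candidate sites (columns).
cost = [
    [2, 9, 7],
    [8, 3, 9],
    [7, 4, 1],
    [3, 8, 5],
]

sites = greedy_placement(cost, 2)
print(sites, total_cost(cost, sites))  # [0, 1] 12
```

Each iteration only evaluates the sites not yet chosen, so the work grows roughly with n times m rather than with the number of all possible combinations. The price is that an early choice is never revisited, so in general the result is not guaranteed to be optimal.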