Whenever we design a water resources project, we must decide what the design values should be. For example, if we are designing for flood control, we need to know the maximum flood that can occur. These quantities are probabilistic; they depend on chance. We do not know what flood will occur next year or the year after, but based on previous data we can estimate some of the values. Similarly, if we are building a storage reservoir to protect against drought, we should know the minimum flow that can occur in that area. Since all of these are based on probability and chance, we should look at some quantities which depend on probability, and these are known as risk, reliability and safety factor. All of them depend partly on the probability of occurrence of an event. Most hydrological events, for example rainfall or floods, depend on chance, and nobody can predict what will happen next year or the year after. That is why we have to analyze them in a probabilistic framework, so that we know at least the probability that such a flood will occur next year, or the probability that the minimum flow in the river will be a certain amount. So, in this lecture we will look at some probability theory, and based on that we can assign a risk and choose a design value.
So, let us write the probability of occurrence of an event. Let us say x is a variable which may be rainfall, flood in a river, or water level in the river, and we typically want the probability that x is greater than or equal to some specified value X. If we are dealing with floods, typically we will be using greater than or equal to.
If we are concerned with drought, then typically we want the probability that x is less than or equal to a certain given value X, which may be the minimum desired flow in a river. If we use greater than or equal to X, that is, the probability of exceedance, then typically this would be the flood in a river, and naturally this probability depends on the time period we are talking about, for example the probability of it occurring in the next 10 years, the next 100 years and so on. We denote this by P and write it as $P(x \ge X)$. In most of our discussions we will just use the symbol P to denote the probability that the variable x, which may be, say, the maximum annual flow in the river, is greater than or equal to some specified value X.
Now, based on the probability, we can define a return period T, which is simply 1 over P: $T = 1/P$. We can also call it the recurrence interval, meaning the average time gap between two occurrences of an event. So, if the probability of an event occurring in 1 year is, say, 1/2, then the return period is 2; that means on average the event will occur once in 2 years. But we should be careful with the term return period: it does not mean that the event occurs every T years. This is important, because an event with a probability of 0.5 will not necessarily occur once in every 2 years. Take the example of a coin. If we toss a coin, we know that the probability of heads or tails is half, which means that on average one out of every 2 tosses gives a head. The return period in this case is 1 over the probability, so the recurrence interval is 2.
So, we can say that on average, in every 2 tosses of the coin, one head and one tail will occur. But it does not mean that if we toss the coin 4 times there will be exactly 2 heads. The key phrase here is a large number of trials. If we toss the coin, say, 1000 times, then we would expect about 500 heads and 500 tails. So, it is not that every 2 tosses will have one head and one tail, but on average every 2 tosses will have a head. That is why we should be careful in using the return period: a return period of 10 years does not mean that the event will occur every 10 years, or once in every 10 years. It may occur twice in one period of 10 years and then not even once in the next 10 years.
Now, suppose p is known to us and we want to predict how many times the event will occur in the next n years, where these are typically consecutive years. We can define the probability that the event x greater than or equal to X occurs r times in the next n years. The theory of probability gives an equation for this, and we write it as $P_{r,n}$, the probability that x will be greater than or equal to X exactly r times in n years; for the annual flood, n is the number of years in which the event occurs r times. The equation is $P_{r,n} = {}^{n}C_{r}\, p^{r} (1-p)^{n-r} = \frac{n!}{(n-r)!\, r!}\, p^{r} (1-p)^{n-r}$. Here ${}^{n}C_{r}$ is the combination term, p is the probability of occurrence of the event and $1 - p$ is the probability of non-occurrence.
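The binomial expression described above can be tried out with a minimal Python sketch. The function name `prob_r_in_n` and the example numbers (a 10-year flood over a 10-year period) are illustrative assumptions, not values from the lecture.

```python
from math import comb

def prob_r_in_n(p, n, r):
    """Probability that an event with annual exceedance probability p
    occurs exactly r times in n consecutive years (binomial formula)."""
    return comb(n, r) * p**r * (1 - p) ** (n - r)

# Assumed example: a flood with a 10-year return period (p = 0.1), n = 10 years
p, n = 0.1, 10
for r in range(4):
    print(f"r = {r}: P = {prob_r_in_n(p, n, r):.4f}")
```

The probabilities over r = 0 to n add up to 1, and the r = 0 case reduces to (1 - p)^n, the chance that the event does not occur at all in n years.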
So, this equation gives us the probability that an event will occur r times in the next n years. Based on this, we can find the probability that the event will not occur at all in the next n years. If r equals 0, the event does not occur in the next n years, and this is a term in which we are quite interested: if we design some project for a certain design flood and its design life is n years, we would like to know its reliability. We call this term reliability because it indicates that, if we design a structure for n years of useful life, the design flood will not be exceeded in those n years; r equal to 0 means that the event does not occur even once. So, we can write the reliability $R_e$ by putting r equal to 0 in this equation. The combination term becomes 1, $p^0$ also becomes 1, and only $(1-p)^n$ is left, so $R_e = (1-p)^n$. So, reliability is given by this simple equation: if we know the probability of occurrence of the event in a single year, then the probability that the event does not occur even once in n years is the reliability of the structure or project. We say that the reliability of the project is the probability that the event will not occur during the lifetime of the project, say n years, and therefore the structure will be safe, because it has been designed for that design flood.
Similar to reliability, there is another concept which we associate with it, which is the risk. Risk, let us call it $R_i$, is nothing but 1 minus reliability, which is logical. This risk is the probability that the design value will be exceeded. Here we will talk mostly about floods, so we use the term exceeded; in drought it would be "less than".
So, risk is the probability that the design value will be exceeded at least once in n years, where n is the design life or useful life of the structure. At least once means it may occur more than once, but if it occurs even once, the flood is more than the design flood and we assume that the structure is likely to fail. So, the risk associated with the project is this probability, and we can write it as $R_i = 1 - (1-p)^n$. Since p is 1/T, where T is the return period of the particular design flood, we can put that value in and get the reliability and the risk in terms of the return period: $R_i = 1 - (1 - 1/T)^n$, where T is the return period of the design flood or design discharge.
So, if we use the design flood for a particular return period, we can find the risk associated with that design flood, and risk is something which we have to choose. We choose how much risk we can accept for a particular structure. If the structure is very important, for example a dam whose failure may cause a lot of loss of life and property, then we try to have a very small risk. But if the failure of the structure will not lead to loss of life or property, only inconvenience to people, we can accept more. For example, there may be a culvert on a road. Flooding of that road will typically not cause loss of life or property; the road may be blocked for some time and travellers inconvenienced, but the loss will be minimal. In that case we can go for a higher risk to reduce the cost.
As is clear, if we choose a low risk then the cost becomes large, because low risk means that the return period has to be high. A high return period means low risk but a costly structure, because the design discharge Q will be larger. Obviously the once-in-2-years flood and the once-in-100-years flood will be very different; the once-in-100-years flood will be much higher.
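As a rough illustration of the risk and reliability relations discussed above, the following Python sketch evaluates them for a chosen return period and design life, and also inverts the risk relation to find the return period needed for an accepted risk. The function names and the example numbers (T = 100 years, n = 50 years, 10 percent accepted risk) are assumptions for illustration.

```python
def reliability(T, n):
    """Probability that the T-year design flood is not exceeded in n years."""
    return (1 - 1 / T) ** n

def risk(T, n):
    """Probability that the T-year design flood is exceeded at least once in n years."""
    return 1 - reliability(T, n)

def return_period_for_risk(accepted_risk, n):
    """Return period T obtained by solving risk = 1 - (1 - 1/T)^n for T."""
    return 1 / (1 - (1 - accepted_risk) ** (1 / n))

# Assumed example: 100-year design flood, 50-year design life
print(f"reliability = {reliability(100, 50):.3f}, risk = {risk(100, 50):.3f}")

# Return period needed to keep the risk at 10 percent over the same 50 years
print(f"required T = {return_period_for_risk(0.10, 50):.0f} years")
```

The inversion shows why low risk is expensive: keeping the risk small over a long design life requires a return period several times longer than the design life itself.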
So, if we design for that, the structure will be very expensive and the risk will be very small. We have to balance the cost and the risk based on our engineering judgment of how important the structure is.
Now, similar to the concepts of risk and reliability, there is the concept of safety, which we can describe in terms of a factor of safety, or sometimes we use the term safety margin. They are a little different. The factor of safety is a ratio, as the name implies, and it is defined as the actual value used for some parameter p divided by the theoretical value of the same parameter: factor of safety $= A_p / T_p$. Here the parameter p (not the probability) may be rainfall, design flood, or water level in a river. $A_p$ is the actual value of p used in design, and $T_p$ is the theoretical value of p, which is based on the hydrologic analysis. What the factor of safety means is that once we find $T_p$ using theoretical considerations, we increase it further by some factor to take care of other uncertainties. We perform a theoretical analysis, but this analysis is itself based on some assumptions, and therefore, to be on the safer side, we increase the value of $T_p$ by some factor, which may be, for example, 1.1 or maybe 2. The factor of safety may have any range, and it depends on the importance of the structure; for a more important structure we will probably use a higher factor of safety. The safety margin is nothing but the difference $A_p - T_p$. For example, if the theoretical flood comes out to be 10,000 cubic meters per second and the actual value used is 11,000 cubic meters per second, a factor of safety of 1.1, then our safety margin is 1000 cubic meters per second.
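The worked numbers above can be checked with a very small Python sketch; the function names are illustrative assumptions, and the flood values are the ones quoted in the lecture.

```python
def factor_of_safety(actual, theoretical):
    """Ratio of the value used in design to the value from hydrologic analysis."""
    return actual / theoretical

def safety_margin(actual, theoretical):
    """Difference between the value used in design and the theoretical value."""
    return actual - theoretical

theoretical_flood = 10_000.0  # m^3/s, from the hydrologic analysis
design_flood = 11_000.0       # m^3/s, value actually used in design

print(factor_of_safety(design_flood, theoretical_flood))
print(safety_margin(design_flood, theoretical_flood))
```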
So, this means that if our design flood is, say, 11,000 cubic meters per second, we have a margin of safety of about 1000 cubic meters per second. The actual maximum which we expect is only 10,000 cubic meters per second, but if there is a higher flood due to unforeseen circumstances, a flood up to 1000 cubic meters per second higher can be accommodated. So, this margin gives further safety in our design. Risk, reliability and factor of safety are all very important concepts whenever we design something which depends on chance. Hydrological variables such as flood and rainfall depend on chance, and therefore all three, the risk, the reliability and the factor of safety, must be accounted for.
Now, let us see how we can obtain the probability and how we can analyze given data. This is typically done using frequency analysis. The data we have will typically be values of a parameter given against time. Suppose we are dealing with the flood in a river; then the parameter Q may be the discharge in the river. The time step may be daily, hourly, weekly or 10 days, depending on the frequency of measurement. The discharge values in the river may be shown as a continuous curve, which responds to rainfall in the catchment area, flow to groundwater and evaporation; all these factors decide the shape of the discharge curve. This data can be thought of as a time series: Q is a variable which is a function of time. There are many methods, commonly called frequency analysis methods, which can be used to obtain the frequency distribution from such a series. Suppose this Q data is available to us for 10 years or 30 years, and we want to design our structure for, say, a 100-year return period.
So, one aim of frequency analysis is to extrapolate the available data. If the data is available for only 10 years or 30 years and we want the 100-year flood, then we have to do some kind of extrapolation, and frequency analysis techniques can be used to do that.
The way we do it is this. Suppose we are interested in the flood, the time is in years 1, 2, 3 and so on, and we have 50 years of data. For each year we take the annual maximum flood, also called the maximum annual flood: the maximum flood in the river for the first year, the maximum for the second year, the maximum for the third year and so on. So, we have 50 of these values available, and based on these 50 values we have to decide on the probability of a certain event. We may have to extrapolate to get the 100-year or 200-year flood.
When we choose the maximum annual flood, for a single year we take the maximum value and ignore the values smaller than that maximum. There is another option, in which we consider all the floods and then choose, say, the 50 largest floods; in that case a single year may contribute 2 floods. But in hydrology we typically use the maximum annual flood, so that only the largest flood in each year is accounted for. Even if the second largest flood in one year is larger than the largest flood of some other year, we do not consider it.
Once we get these values, suppose there are n records, we arrange them in order. We make a table with the rank m and the discharge Q. The rank m goes from 1 to n, where n may be 50 in our example. The largest value is written first and the smallest last.
So, these are in decreasing order of magnitude. For example, among these 50 values, suppose the largest occurs in year 40; that value goes in as rank 1. The smallest of these floods may occur in, say, year 30; that value goes at the end. Once we arrange them in order, we have a table of m and Q, with m going from 1 to n and Q from the largest to the smallest, and we define a new term, the plotting position. The plotting position is generally taken as $P = (m - c_1)/(n + c_2)$, where $c_1$ and $c_2$ are constants that vary between techniques. The most commonly used technique is the Weibull plotting position, in which $P = m/(n+1)$. So, for the largest value it is $1/(n+1)$ and for the last value it is $n/(n+1)$. This gives the probability of exceedance of the particular event, so each discharge in the table has its own probability, and the recurrence interval is just 1 over the probability.
If we do that, we can plot either probability versus Q or recurrence interval T versus Q. Since P is 1/T, the two plots are essentially the same. The recurrence interval would go from about 1.01 up to 50, 100 and so on. From the plotted data we can see that the largest value has a much higher recurrence interval than the smallest value. So, there will be a plot which shows the relationship between recurrence interval and discharge, and then we can extrapolate: we extend the curve following the same trend and read off the 100-year flood in the river.
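The ranking and plotting-position steps described above can be sketched in Python as follows. The function name `plotting_positions` and the ten annual maximum floods are hypothetical; the defaults c1 = 0 and c2 = 1 give the Weibull position m/(n + 1).

```python
def plotting_positions(annual_maxima, c1=0.0, c2=1.0):
    """Rank annual maxima in decreasing order and attach the plotting
    position P = (m - c1) / (n + c2) and recurrence interval T = 1 / P.
    Returns a list of (rank m, discharge, P, T) tuples."""
    n = len(annual_maxima)
    ranked = sorted(annual_maxima, reverse=True)  # largest flood gets rank 1
    table = []
    for m, q in enumerate(ranked, start=1):
        p = (m - c1) / (n + c2)
        table.append((m, q, p, 1.0 / p))
    return table

# Hypothetical annual maximum floods in m^3/s, one value per year
floods = [3200, 4100, 2800, 5600, 3900, 4700, 3100, 6100, 3500, 4300]

print(" m      Q      P      T")
for m, q, p, T in plotting_positions(floods):
    print(f"{m:2d} {q:6.0f} {p:6.3f} {T:6.2f}")
```

With n = 10 records, the largest flood gets P = 1/11 and a recurrence interval of 11 years, which shows why a longer return period can only be reached by extrapolating the plotted trend.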
So, this probability plotting is an essential component of extrapolation. Weibull is just one method; there are other methods also. For example, you can use $c_1$ and $c_2$ both as 0, or sometimes 1/2 and 0, or sometimes 3/8 and 1/4. These are various combinations, but we will not be using them; we will stick with $c_1 = 0$ and $c_2 = 1$, which gives the Weibull plotting position.
All these frequency distributions are typically written in terms of the mean value of the variable plus K times its standard deviation: $x_T = \bar{x} + K \sigma_x$. Here $\bar{x}$ is the mean of x, where x may be the annual maximum flood, $\sigma_x$ is the standard deviation, and $x_T$ is the value of x which has a return period of T. So, the value with return period T depends on the mean, the standard deviation and a frequency factor K, which is a function of the return period and the type of distribution. The plotted data may follow a curve of one shape or another, or a straight line. Depending on the kind of distribution we have for the variable x, in this case Q, we will have some equation which relates the frequency factor to the return period. Once we know the value of K, we can find x for any return period T and use that in our design.
Now, some of the commonly used distributions. We know that we mostly use the normal distribution, also known as the Gaussian distribution, for most random variables. But here we should realize that the events we are talking about are extreme events. Q may be normally distributed, but the maximum Q in a year is an extreme event. If we look at the discharge in a river over time, in one year there is one maximum, in the next year there is some other maximum, and so on.
So, in a very simplistic picture, the discharge is almost constant most of the time, there is a big flood sometime in the monsoon, and then it returns to that level. The actual distribution of Q may be close to normal, but the distribution of the extreme events, which are the floods, will not be normal. Therefore, we do not typically use the normal distribution for extreme values. There are distributions known as extreme value distributions which are commonly used for analyzing the maximum floods in a river. We will look at two or three common ones: the Gumbel distribution; the Pearson type 3 distribution, where we normally take the log of the variable and so call it the log Pearson type 3 distribution; and sometimes the log normal distribution, where we say that the values are not normally distributed but their logarithms are.
So, let us start with the Gumbel distribution. Here the probability that x is greater than or equal to some value X is given by $P(x \ge X) = 1 - e^{-e^{-y}}$, where y is a new variable, which we can call the reduced variable, and which depends on the mean and the standard deviation of x: $y = 0.577 + 1.2825\,(X - \bar{x})/\sigma_x$. The term $X - \bar{x}$ represents the deviation from the mean, and we divide it by the standard deviation $\sigma_x$ to get the reduced variable y. Based on this y we can obtain the probability of exceedance of the value X.
We can also go the other way. If we want the reduced variable for a given probability P, we can use $y_P = -\ln[-\ln(1 - P)]$, which is nothing but a rearrangement of the probability equation. Similarly, if we know the return period T, then simply using P equal to 1/T we can write it as $y_T = -\ln[\ln(T/(T-1))]$.
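The Gumbel relations discussed above can be sketched in Python. The board equation for the probability is not in the transcript, so the standard Gumbel form 1 - exp(-exp(-y)) is assumed here; the function names and the example mean and standard deviation are hypothetical.

```python
import math

def reduced_variate_from_x(x, mean, std):
    """Gumbel reduced variate y for a value x (long-record constants)."""
    return 0.577 + 1.2825 * (x - mean) / std

def exceedance_probability(y):
    """P(x >= X) for reduced variate y, assuming the standard Gumbel form."""
    return 1 - math.exp(-math.exp(-y))

def reduced_variate_from_P(p):
    """Reduced variate for a given exceedance probability p."""
    return -math.log(-math.log(1 - p))

def reduced_variate_from_T(T):
    """Reduced variate for a given return period T (uses p = 1/T)."""
    return -math.log(math.log(T / (T - 1)))

# Hypothetical annual-flood statistics in m^3/s
mean_q, std_q = 4000.0, 1200.0
y = reduced_variate_from_x(7000.0, mean_q, std_q)
p = exceedance_probability(y)
print(f"y = {y:.3f}, P = {p:.4f}, T = {1 / p:.1f} years")
print(f"y_T for T = 100: {reduced_variate_from_T(100):.3f}")
```

The two inverse functions are the same relation written once in terms of P and once in terms of T, so they agree whenever P = 1/T.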
So, if we know the return period, we can estimate $y_T$, put that $y_T$ into the expression for the reduced variable, and get the x corresponding to that return period. Or, if we know the probability, we can find $y_P$ in the same way; the two equations are identical, except that one uses the probability and the other the return period. The equation which we use is $x_T = \bar{x} + K \sigma_x$. So, for any given return period T we get $y_T$, and knowing $y_T$ we get the x corresponding to that return period.
This equation is valid when we have a very long period of record. Comparing Gumbel's equation with the general frequency-factor form, we see that the frequency factor K in Gumbel's distribution is $K = (y_T - 0.577)/1.2825$, and this K is valid for a very long period of record. If the data is limited to n years of record, then K has to be modified, and the equation we use becomes $K = (y_T - \bar{y}_n)/S_n$. These two values, $\bar{y}_n$ and $S_n$, are a kind of mean and standard deviation; they depend on n, which is why the subscript n is put there. $\bar{y}_n$ is the reduced mean for n years of data. For example, if n is 10 then $\bar{y}_n$ is about 0.5, if n is about 50 then $\bar{y}_n$ is about 0.55, and as n tends to infinity we get the old value 0.577. Various books and references give a table of n against the corresponding $\bar{y}_n$.
So, this table can be seen in various books. Similarly, $S_n$ is the reduced standard deviation. It is also a function of n, and again its value for different n is given in tables in books and other references; for n tending to infinity, of course, you get 1.2825. For example, for n equal to 10 the value of $S_n$ is about 0.95, and for n equal to 50 it is about 1.16. These values can be looked up from the tables and put into $K = (y_T - \bar{y}_n)/S_n$ to get K. So, for a smaller number of data points n, we modify the K value and then use it in $x_T = \bar{x} + K \sigma_x$ to obtain the value of x.
Now, once we have the data, we should check whether the data really follows Gumbel's distribution or not. For that, plotting of the data is required, and one of the plots commonly used is the Gumbel probability paper. We know that a return period T corresponds to a reduced variate $y_T = -\ln[\ln(T/(T-1))]$. What we do is plot along this $y_T$ scale, but instead of writing the values of $y_T$ on the axis we write the corresponding values of T. That way we prepare a probability paper whose axis is labelled with return periods 1.01, 5, 10, 50, 100 and so on. The labels show the return period, but the positions really indicate the value of $y_T$. For example, for T equal to 1.01 this equation gives a value of about minus 1.5. Similarly, for T equal to 10 you get $-\ln[\ln(10/9)]$, which is about 2.25; you can compute the value for 100 in the same way, and so on.
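A short Python sketch of the finite-record frequency factor and of the reduced-variate positions on the probability-paper axis follows. The reduced mean and reduced standard deviation for n = 50 are the approximate values quoted in the lecture (about 0.55 and 1.16); exact values should be taken from published tables. The function names and the flood statistics are hypothetical.

```python
import math

def gumbel_y(T):
    """Reduced variate y_T for return period T."""
    return -math.log(math.log(T / (T - 1)))

def frequency_factor(T, y_n=0.577, s_n=1.2825):
    """Gumbel frequency factor K = (y_T - y_n) / S_n.
    The defaults are the long-record values; pass tabulated y_n and S_n
    for a record of n years."""
    return (gumbel_y(T) - y_n) / s_n

def gumbel_quantile(T, mean, std, y_n=0.577, s_n=1.2825):
    """Flood of return period T: x_T = mean + K * std."""
    return mean + frequency_factor(T, y_n, s_n) * std

# Hypothetical statistics of 50 annual maximum floods, in m^3/s
mean_q, std_q = 4000.0, 1200.0

k_long = frequency_factor(100)
k_50 = frequency_factor(100, y_n=0.55, s_n=1.16)  # approximate table values
print(f"K (long record) = {k_long:.3f}, K (n = 50) = {k_50:.3f}")
print(f"100-year flood (n = 50) = {gumbel_quantile(100, mean_q, std_q, 0.55, 1.16):.0f} m^3/s")

# Axis positions on Gumbel probability paper for the labelled return periods
for T in (1.01, 5, 10, 50, 100):
    print(f"T = {T:>6}: y_T = {gumbel_y(T):6.3f}")
```

For a given return period the finite-record K comes out larger than the long-record K, so a short record leads to a somewhat higher design estimate.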
So, on this paper you plot the return period on one axis and the discharge on the other. The discharge is in cubic meters per second, and when we are talking about floods it will typically be in thousands of cubic meters per second, because flood discharges are generally of the order of a few thousand cubic meters per second; so that axis may go from, say, 0 to 4 or 6. If we plot the observed data on this paper and it shows a straight line, that means our data follows the Gumbel distribution closely. Then, if we have data up to, say, a 50-year return period, we can extend this line and estimate the 100-year flood using this distribution.
You can note one point in Gumbel's distribution: T equal to 2.33. The flood which has a return period of 2.33 years corresponds to the mean of x, or Q in this case. We can see that from the equation for the reduced variable: when x equals the mean, y equals 0.577, and putting y equal to 0.577 in the probability equation gives a return period of 2.33. So, in the Gumbel plot, the value corresponding to T equal to 2.33 is the mean of all the values.
So, this is one distribution which can be used. As we have already discussed, the data has some uncertainty, so the 100-year flood which we predict will also have some uncertainty. Therefore, another thing which becomes important to know is the confidence interval. Let us say $x_T$ is the flood of return period T. What confidence level do we attach to this value, or what would be its range? $x_T$ is a single value, but what we would now like to have is the range of values which corresponds to, say, a 90 percent or 95 percent confidence level.
So, because there will be some error in our estimate, we say that the value lies within $x_T \pm f\,S_e$. We call $S_e$ the standard error, and f is a factor which depends on the confidence level. Our confidence level may be 90 percent, 95 percent or 99 percent, and depending on that this factor changes. For example, for a 50 percent confidence probability the value of f is about 0.674, and for 90 percent it is 1.645. Plus or minus means that from the expected value which we predict for the design flood we go up and down by this amount, which gives two values: $x_T$ is in the middle, $x_T + f\,S_e$ is above it and $x_T - f\,S_e$ is below it. If we use the 90 percent factor 1.645, there is a 90 percent chance that the actual value lies within this range.
The standard error is typically given by some constant c times the standard deviation of the n data points, $\sigma_{n-1}$, divided by the square root of n: $S_e = c\,\sigma_{n-1}/\sqrt{n}$. There are empirical equations which express c as a function of K, and one that is commonly used is $c = \sqrt{1 + 1.3K + 1.1K^2}$. So, this equation can be used to obtain c, and then we can obtain the standard error and the range for the chosen probability. For example, for 99 percent probability we have to use a larger multiplying factor, 2.58, and therefore the range becomes wider. So, around the expected value $x_T$ there is one range for 90 percent, and for 99 percent the range extends even further.
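The confidence-limit calculation described above can be sketched in Python as below. The confidence factors are the ones quoted in the lecture. The constant is taken here with a square root, sqrt(1 + 1.3K + 1.1K^2), as this relation is commonly quoted in hydrology texts; that is an assumption about the intended formula. The function names and the example inputs are hypothetical.

```python
import math

# Confidence factors quoted in the lecture (confidence probability in percent)
CONFIDENCE_FACTOR = {50: 0.674, 90: 1.645, 99: 2.58}

def standard_error(K, std, n):
    """Standard error S_e = c * std / sqrt(n), where std is the sample
    standard deviation of the n annual maxima and
    c = sqrt(1 + 1.3 K + 1.1 K^2) (assumed form of the empirical relation)."""
    c = math.sqrt(1 + 1.3 * K + 1.1 * K**2)
    return c * std / math.sqrt(n)

def confidence_limits(x_T, K, std, n, level=90):
    """Lower and upper limits x_T -/+ f * S_e for the given confidence level."""
    half_width = CONFIDENCE_FACTOR[level] * standard_error(K, std, n)
    return x_T - half_width, x_T + half_width

# Hypothetical 100-year flood estimate from 50 years of record, in m^3/s
x_100, K, std_q, n = 8190.0, 3.49, 1200.0, 50
for level in (50, 90, 99):
    low, high = confidence_limits(x_100, K, std_q, n, level)
    print(f"{level}% confidence: {low:.0f} to {high:.0f} m^3/s")
```

The range widens as the confidence level rises, and it narrows as the record length n grows, since the standard error falls with the square root of n.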
So, we say that with 99 percent confidence the 100-year flood will be within the wider range, and with 90 percent confidence it will be within the narrower range. That is the confidence interval, which is also an important concept, because there is so much uncertainty in the data that we cannot say there is a single value of the 100-year flood. The confidence interval gives us a possible range of values, and we can choose a higher or lower value depending on the importance of the structure.
The other type of distribution which we commonly use for extreme values is the log Pearson type 3 distribution. There is a distribution called Pearson type 3, and log Pearson type 3 means that instead of the variable x we use the log of x and say that log x follows a Pearson type 3 distribution. So, first we transform x to z, where z is nothing but $z = \log x$, and then z follows a Pearson type 3 distribution. We now work in terms of z rather than x, but we use an equation similar to the previous one: $z_T = \bar{z} + K_z \sigma_z$. Here $K_z$ is the frequency factor, $\bar{z}$ is the mean, $\sigma_z$ is the standard deviation and $z_T$ is the value for return period T. $K_z$ is again a function of the return period, and also of a coefficient known as the skewness coefficient $C_s$, which depends on the particular distribution. By changing the value of $C_s$ we get different distributions within the Pearson type 3 family. The skewness coefficient is defined as $C_s = \frac{n \sum (z - \bar{z})^3}{(n-1)(n-2)\,\sigma_z^3}$, where n is the number of records. The skewness in a sense represents how far we are from a symmetric distribution. If we have a symmetric distribution, the skewness will be 0. Sometimes we may have a positive skew and sometimes a negative skew.
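The log-transform statistics and the skewness coefficient defined above can be computed with a short Python sketch. Several points are assumptions: base-10 logarithms are used, the standard deviation is the sample value with n - 1 in the denominator, and the frequency factor `k_z` is an input that would be read from a published table for the chosen return period and skewness. The function names, the flood series and the `k_z` value in the example are hypothetical.

```python
import math

def log_statistics(values):
    """Mean, sample standard deviation and skewness coefficient C_s of
    z = log10(x) for a series of annual maxima."""
    z = [math.log10(v) for v in values]
    n = len(z)
    mean = sum(z) / n
    std = math.sqrt(sum((zi - mean) ** 2 for zi in z) / (n - 1))
    c_s = n * sum((zi - mean) ** 3 for zi in z) / ((n - 1) * (n - 2) * std**3)
    return mean, std, c_s

def design_value(mean_z, std_z, k_z):
    """z_T = mean_z + k_z * std_z, transformed back as x_T = 10 ** z_T."""
    return 10 ** (mean_z + k_z * std_z)

# Hypothetical annual maximum floods in m^3/s
floods = [3200, 4100, 2800, 5600, 3900, 4700, 3100, 6100, 3500, 4300]
mean_z, std_z, c_s = log_statistics(floods)
print(f"mean z = {mean_z:.4f}, std z = {std_z:.4f}, C_s = {c_s:.3f}")

# k_z must come from a frequency-factor table; 2.4 is only a placeholder
print(f"design flood = {design_value(mean_z, std_z, 2.4):.0f} m^3/s")
```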
So, depending on that, we get a skewness coefficient which may be positive, negative, or 0 if the distribution is completely symmetric. There are tables available which give $K_z$ as a function of $C_s$ and the return period. The return period may be 2 years, 10 years, 20 years or 100 years, and $C_s$ may be, say, minus 3, minus 2, 0, 1, 2 or 3; for each combination of $C_s$ and return period, the value of $K_z$ is given in the table. The values which correspond to $C_s$ equal to 0 give the log normal distribution, because 0 skewness means the distribution of z is symmetric and therefore like a normal distribution. So, from the table we find $K_z$, then we find $z_T$ from $z_T = \bar{z} + K_z \sigma_z$, and once we know $z_T$ the design value is easily obtained as $x_T = 10^{z_T}$.
So, these distributions help us in obtaining the value of the variate for any return period. If we want the 100-year flood, we go to the 100-year column of the table. Suppose we want to use the log normal distribution: we go to $C_s$ equal to 0 and find the value of $K_z$. From the mean of z and the standard deviation of z we find $z_T$, and then the design value $x_T$.
So, we have looked at some methods of extrapolating available data. That means, if we have a record for 30 years or 50 years and we want to find the 100-year flood, we can use some of these standard extreme value distributions and extrapolate the values. We have also looked at the concepts of risk, reliability and safety factor, and in the next few lectures we will look at how to obtain the design values for different kinds of structures.