Welcome to lecture 37 of the groundwater hydrology course. In this lecture we will cover modeling and management of groundwater; the topics to be covered are contaminant source identification and monitoring network design. The first topic is contaminant source identification. Why contaminant source identification? Groundwater contamination is a problem of worldwide concern, and man-made causes are often responsible. A source can be geogenic or anthropogenic: the arsenic problem is a geogenic one, whereas the dumping of pollutants in unlined ponds is a man-made cause. Source identification is a management necessity. Why is it necessary? From a groundwater management point of view, if we can identify the source, we can devise a remediation strategy for that particular aquifer and begin the decontamination or remediation process. Next, effective remediation requires reliable source identification: for remediation we need a proper estimate of the source both in space and in time, and its strength is also a key parameter for identification. Source identification is also useful in fixing liabilities for pollution. We cannot do anything about geogenic causes, but for man-made causes it is important to fix liabilities, so that if one, two, three or four defaulters are present within a groundwater jurisdiction area, we can fix their shares of responsibility for decontamination, or the price they need to pay for the health-related spending of that locality.
Hydrogeology here acts as a forensic science, because we need to identify the sources properly; without identification it is a difficult task. Ethics also matter in this field, and the safety of groundwater is the most important thing: without a proper management or identification strategy we cannot protect our groundwater. So what is the basic problem? Say we have a source S1 and another source S2, and a rectangular aquifer with wells W1, W2, W3 and W4 in the down-gradient portion, with flow in a given direction. With this aquifer configuration and these observation or monitoring wells, we want to identify S1 or S2 in terms of location (whether S1 or S2 is responsible), in terms of strength, and in terms of timing: perhaps S1 is responsible for contamination only in the second year, while S2 is responsible for the contamination of the first, third and fourth years. Finding the activity period is also important. So three things matter here: source location, source strength and activity period. Given a source and its strength, forward modeling will give us a breakthrough curve, that is, a time-versus-concentration curve, at the W1 location, and similarly a breakthrough curve at W2, with multiple peaks due to the different strengths and activity periods. The complication is that we may not have a proper observation or management plan in place, and no proper monitoring strategy for the contaminated area.
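To make the forward-modeling idea concrete, the breakthrough curve at a down-gradient well can be sketched with the standard one-dimensional analytical solution of the advection-dispersion equation for an instantaneous injection. This is a minimal illustration, not the lecture's simulation model; the velocity, dispersion coefficient and injected mass are illustrative assumptions.

```python
import math

def breakthrough(x, t, v=1.0, D=0.5, mass=100.0):
    """Concentration at distance x (m) and time t (d) down-gradient of an
    instantaneous injection of 'mass' (per unit cross-section) into a 1-D
    aquifer with seepage velocity v (m/d) and dispersion coefficient D
    (m^2/d): C = M / sqrt(4*pi*D*t) * exp(-(x - v t)^2 / (4 D t))."""
    if t <= 0:
        return 0.0
    return mass / math.sqrt(4.0 * math.pi * D * t) * math.exp(
        -(x - v * t) ** 2 / (4.0 * D * t))

# Breakthrough curve at a monitoring well 50 m down-gradient: the peak
# arrives near t = x / v, spread out by dispersion.
curve = [(t, breakthrough(50.0, t)) for t in range(1, 101)]
t_peak = max(curve, key=lambda p: p[1])[0]
```

A source active over several periods would be represented by superposing such responses, which is what produces the multiple peaks mentioned above.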
It may so happen that the contamination is noticed only 10 or 20 years after it started in a particular area. The problem is then one of inverse modeling: we have the breakthrough curve, perhaps only in a truncated sense. Truncated means, say, that monitoring starts at the end of the second year, so we get only the later part of the breakthrough curve; similarly, if the W2 observation well is started after the third year, we get a truncated breakthrough curve there. So with this complete, truncated or otherwise limited information, we need a proper estimate of the source in terms of its strength, its location and its activity period. Source location, magnitude and period of activity are the three important aspects. Now, what are inverse problems? The first type is the backward or retrospective problem, where the initial conditions are to be found. The second is the coefficient inverse problem, the classical parameter estimation problem, where a constant multiplier in the governing equation is to be found. The third is the boundary inverse problem, where some missing information at the boundary of the domain is to be found. Our problem is basically a backward or retrospective problem. Inverse problems are mostly ill-posed, because in most cases there may be no unique solution. The difficulties in source identification are: sparsely distributed observation wells, the most important one, because without a proper observation or monitoring network there is a problem; sparsity of the observation data itself; inaccurate representation of the contaminant transport processes; modeling errors; and measurement errors. So one kind of error is the modeling error and another is the measurement error.
Sparsity of data, error in measurement and error in modeling: these three are the most important issues for source identification. Billions of discrete combinations of magnitudes, locations and durations are possible, so multiple combinations may produce the same set of breakthrough curves at a particular monitoring well. It is therefore important that a proper monitoring strategy be followed; otherwise there will be difficulties in identifying the sources. For example, in our previous problem with sources S1 and S2, one combination involving S1 and a second combination involving S2 may give the same results, in terms of breakthrough curves, at the down-gradient monitoring wells. Other difficulties are problems with delineating the physical extent of the area to be modeled, and uncertainties in the boundary and initial conditions. Although by our classification we are interested in finding the initial conditions, the boundary conditions are also important, because the physical extent matters: if you do not consider a proper physical extent of the area to be modeled, the boundary condition becomes a critical thing for the modeling. There are also uncertainties in the estimation of flow and transport parameters, such as hydraulic conductivity, longitudinal dispersivity and transverse dispersivity, which also play an important role in modeling and source identification.
Identification of unknown pollution sources belongs to the category of inverse problems, which are often ill-posed: without a proper monitoring system in place we cannot say with confidence that we have identified the correct source, because multiple combinations can produce the same breakthrough curves at the monitoring wells. A unique solution does not necessarily exist, and the solution may be unstable to small changes in the input data; the sensitivity of the solution approach is another important issue. Various processes are involved in solute transport in porous media: advection, where constituents are carried along with the groundwater flow driven by gravity and the hydraulic gradient; diffusion, a molecular process where constituents spread due to differences in concentration; dispersion, a mixing process caused by differences in the magnitude and direction of water-particle velocities; adsorption, where certain constituents attach to the grain material; and finally decay. A combination of these processes dictates the transport, so proper identification of the transport processes is the first part of any source identification problem. The overall methodology is an optimization model: we have an objective function and a Jacobian matrix, from which a search direction, step length and decision vector can be determined, and the most important component is the flow and transport simulation model. We use the flow and transport simulation model as an external module in a linked simulation-optimization framework, and with it calculate our objective function and our search directions.
The search directions depend on the Jacobian matrix, which gives the proper direction; but with a Jacobian-based approach you may or may not reach a global optimal solution, so the selection of the optimization model is also important. The first step, then, is identification of the proper transport processes, that is, a proper numerical flow and transport simulation model to simulate the complex hydrogeological system; the next is identification of a proper optimization model; and the intermediate steps are the linking calculations. In this case we can pose two optimization problems. In the first, the objective is the weighted square of the difference between the observed concentration at a monitoring well and the estimated concentration. The weight is calculated from the observed concentration plus a small value eta, which supports the weight when the observed concentration is small. The estimated concentration comes from the simulation model, and there are two restrictions: a physical limit on the injection rate from the sources, maintained by one constraint, and a physical limit on concentration, maintained by the other. Each component of the Jacobian matrix is calculated by a finite-difference approach. In the second model the objective function is linear but the constraints are non-linear: in OSIME 1 the objective function was non-linear, while in OSIME 2 the non-linearity sits in one constraint and the remaining constraints are linear. Some literature suggests that there is an advantage in using a linear objective function in place of a non-linear one.
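The weighted least-squares objective and the finite-difference Jacobian described above can be sketched as follows. This is a schematic, not the lecture's OSIME formulation: here `model` stands in for the linked flow-transport simulation model, mapping a vector of source fluxes to simulated concentrations at the monitoring wells, and the weight form 1/(observed + eta)^2 follows the description above.

```python
def objective(sim, obs, eta=0.1):
    """Weighted sum of squared deviations between simulated and observed
    concentrations; eta is a small constant that keeps the weight finite
    when an observed concentration is zero."""
    return sum(((s - o) / (o + eta)) ** 2 for s, o in zip(sim, obs))

def jacobian_column(model, q, k, delta=1e-4):
    """One column of the Jacobian, d(concentrations)/d(source flux q_k),
    by a forward finite difference: perturb the k-th flux by delta,
    re-run the simulation model, and difference the outputs."""
    q_pert = list(q)
    q_pert[k] += delta
    base, pert = model(q), model(q_pert)
    return [(p - b) / delta for p, b in zip(pert, base)]
```

In the linked simulation-optimization scheme, every Jacobian column costs one extra run of the external simulation model, which is why the choice of optimization algorithm matters so much.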
It is converted like this: the simulation is treated as an equality constraint within the optimization problem. We also need to incorporate errors. For a hypothetical illustrative problem, we take a simulated value and add an error term, an error factor multiplied by a standard normal random variate and by the concentration; when the error factor is zero, it represents the special case where the measurement is assumed error-free. So we introduce errors into the simulated values to create synthetic observed values, with which we test our two formulations, OSIME 1 and OSIME 2. What are the evaluation criteria? We use a normalized error estimate for the source fluxes, comparing the actual value with the estimated value, averaged over the number of realizations for a particular case; from the multiple realizations we also obtain the standard deviation of the estimated source flux and the average estimated value of the source strength. Now consider an illustrative problem with three sources S1, S2 and S3 and four monitoring wells W1, W2, W3 and W4. The domain has two impermeable boundaries and two constant-head boundaries with linearly varying head, which fixes the flow direction. The solution results for disposal period one compare the actual flux with the estimated flux values.
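An illustrative form of the normalized flux-error criterion is sketched below. The exact normalization used on the lecture's slides may differ; this version, the sum of absolute deviations over the sum of actual fluxes expressed as a percentage, is one common choice and is labeled an assumption.

```python
def normalized_flux_error(actual, estimated):
    """Normalized error estimate (percent) for recovered source fluxes:
    100 * sum|actual - estimated| / sum(actual).  For multiple noise
    realizations this would be averaged, and the standard deviation of
    the estimated fluxes reported alongside it."""
    return 100.0 * sum(abs(a - e) for a, e in zip(actual, estimated)) \
        / sum(actual)
```

For example, actual fluxes of 30 and 30 recovered as 29.92 and 29.99 give an error of 0.15 percent, consistent with the near-exact recoveries quoted in the results that follow.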
In the period-one results, OSIME 1 gives a better result than OSIME 2 solved using MINOS and NPSOL; where the actual flux is 0 it is recovered in all the cases; for an actual strength of 30, Mahar and Datta obtained 29.92, while our estimates are 29.99 and values essentially equal to 30. For period two the results almost match, likewise for period three; for period four both methods match, and in the NPSOL case OSIME 2 performs better. In a second example we have eight potential sources, a number of pumping and observation wells, two of them pumping wells, and a two-zone aquifer with a clean recharge pond: zone 1 and zone 2, a time-varying head HT defined on one boundary, a constant head on another, and two impermeable boundaries. With the given pumping rates at locations P1 and P2, we considered different scenarios: scenario 1, an error factor theta of 0.1 in the measurement data; a 5 percent increase in the hydraulic conductivities; a 5 percent decrease in the porosity values; a 5 percent increase in the longitudinal dispersivity; and missing data combined with the error factor, where the concentration data during the first 5 years are assumed missing for observation wells 4, 5, 6 and 8, and for observation wells W1, W13, W16 and W18 the concentration data during the first 10 years are considered missing. For these scenarios the error percentages are 9.98, 16.68, 9.23 and 9.94 percent. A graphical comparison shows how each scenario varies against the actual scenario. The conclusion is that a linked simulation-optimization model can potentially solve large and complex systems, and is capable of incorporating erroneous concentration measurements, unknown parameter values and missing observation data.
Monitoring is the most important component of any management or source identification problem. Consider the ground level (GL): below it lies the unsaturated part of the aquifer, then the saturated part, then the bedrock, with the groundwater level somewhere in the aquifer. The groundwater level, generally denoted H, is expressed in meters BGL, where BGL means below ground level; it is measured with a piezometer, a typical piezometer being a pipe with a well screen at its lower end. With that we can monitor the groundwater level in any area. That covers the water level; what about contamination? Say we have a point source and three down-gradient locations along the direction of the hydraulic gradient. Ideally, the point source should be detected by the monitoring well directly down-gradient, and by estimating the concentration there we could correctly identify the point source. In reality the situation is different: the soil is highly heterogeneous, and due to that heterogeneity the point source may instead be detected by the third well. So the selection of a sampling schedule under budgetary limitations is a most important issue; monitoring network design is basically finding that sampling schedule under cost constraints. Long-term groundwater monitoring is important, and it comes in several types: ambient monitoring, regional annual monitoring for water safety; detection monitoring, a watchdog that watches a dangerous spot; compliance monitoring, which evaluates the progress of a management policy or remediation process; and research monitoring, monitoring for a specific research purpose. In most cases compliance monitoring is the important one.
A site with 30 wells and a single chemical constituent to measure at each well has 2 to the power 30, about 1 billion, possible sampling plans: each well is either monitored or not, 0 or 1, giving 2^30 combinations. Any trial-and-error method is therefore unlikely to identify the most effective sampling plan, whereas mathematical optimization can effectively identify the most effective sampling plans for any monitoring objective that can be quantified. Objectives for monitoring include: minimization of the concentration estimation error; minimization of uncertainty; minimization of the mass estimation error; minimization of the error in locating plume centroids; and maximization of spatial coverage, all subject to a budgetary limitation. The basic approach is to formulate an optimization problem and find an optimal solution, which may be local, global or robust. Ideally, with a large number of iterations, the decision vector x should reach the global optimal solution; the linearity or convexity of the constraint space matters, and we need to select a proper optimization algorithm for guaranteed global optimality. Spatial interpolation of concentration is important because we have monitoring information only at the selected locations, and for the other locations we estimate the concentration by a spatial interpolation technique. Inverse distance weighting (IDW) is the most common method: the weight W_L(x) is 1 over d(L, x)^p, where d(L, x) is the distance between points L and x and p is the exponent, and the estimated value of the attribute at x is the sum of W_L(x) times F_L over the points L in the neighborhood, divided by the sum of the weights.
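The IDW formula just described can be sketched directly. This is a minimal implementation of the textbook form F(x) = sum(W_L F_L) / sum(W_L) with W_L = 1/d^p; the default exponent p = 2 is a common choice, assumed here.

```python
import math

def idw(x, y, samples, p=2):
    """Inverse-distance-weighted estimate at (x, y) from monitored
    points given as (xi, yi, value) triples; p is the distance exponent."""
    num = den = 0.0
    for xi, yi, v in samples:
        d = math.hypot(x - xi, y - yi)
        if d == 0.0:           # estimating at a monitored point:
            return v           # the estimate is the measured value itself
        w = 1.0 / d ** p
        num += w * v
        den += w
    return num / den
```

A point midway between two wells with concentrations 10 and 20 is estimated as 15, and the estimate collapses to the measured value at a monitored location, which is exactly the disjunctive behavior the optimization model has to encode.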
That is, if we are estimating the value at point j and nb(j) is its neighborhood set, then L ranges over the neighborhood set of j: each weight multiplies the actual value, and we divide by the total weight. One illustrative case: suppose wells a and b are unmonitored locations while the other locations are monitored. If we draw triangular neighborhoods, e turns out to be a neighborhood location for a, and f a neighborhood location for b; but the distance from a to e is large compared with the distance from a to b. So the information at b should be utilized while estimating a, and the information at a should be utilized while estimating b. Thus the estimate at a uses the actual values in its neighborhood plus the estimated value at b, and similarly the estimate at b uses the estimated value at a obtained from the previous equation. These two equations act as constraints in our optimization model, because the two values are unknown. We can write this in disjunctive form: if a well is monitored, we use the actual value; otherwise the estimate is obtained from the neighboring points by the neighborhood concept. For this we use the big-M relaxation, with M a large value and a binary variable chi that equals 1 if a particular location is monitored and 0 if it is not.
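The mutual dependence between the two unmonitored wells a and b can be resolved in closed form, which is what the pair of constraints enforces implicitly. The sketch below assumes normalized weights (the monitored-neighbor weights plus the cross weight sum to 1 at each well) and hypothetical argument names; inside the optimization model the same relations appear as linear constraints rather than being solved by substitution.

```python
def coupled_idw(wa, fa, wab, wb, fb, wba):
    """Solve the coupled IDW estimates at two unmonitored wells a and b:
        Ca = sum(wa * fa) + wab * Cb
        Cb = sum(wb * fb) + wba * Ca
    where wa, fa (wb, fb) are the weights and measured values at the
    monitored neighbors of a (of b), and wab, wba are the cross weights.
    Substitution gives a closed form (assumes wab * wba != 1)."""
    ka = sum(w * f for w, f in zip(wa, fa))
    kb = sum(w * f for w, f in zip(wb, fb))
    ca = (ka + wab * kb) / (1.0 - wab * wba)
    cb = kb + wba * ca
    return ca, cb
```

With more than two unmonitored wells the same idea gives a linear system, which is why these interpolation relations enter the monitoring-design model as a set of linear spatial constraints.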
For this form of constraint, the big-M relaxation yields two sets of inequalities: if a location is monitored, chi equals 1, one inequality forces the difference to be less than or equal to 0 and the other forces it to be greater than or equal to 0, so together they converge to the equality constraint and C_j takes the actual measured value; otherwise C_j is calculated from the interpolation equation. The variogram is another important tool: it gives an idea of the spatial variability of an attribute. Here h is the lag distance between two spatial points, and the variogram gamma(h) is calculated as half the variance of the difference Z(u + h) - Z(u), where Z is the attribute of interest; in our case we apply this to the concentration values. C denotes the covariance, related to the variogram by C(h) = gamma(infinity) - gamma(h), where gamma(infinity), the sill, is a constant value; we use this expression in our calculations. Ordinary Kriging is basically minimization of the estimation variance subject to the constraints that the estimate is a weighted sum of the observations and that the weights sum to 1. Solving the resulting set of equations gives the weights lambda and the Lagrange multiplier mu for the problem, and also the estimation variance. For ordinary Kriging over the potential monitoring locations, if a location is not monitored the corresponding term is not required, its lambda being zero; and distinguishing monitored from unmonitored locations, we can again use the big-M method to write the ordinary Kriging equations. We can then have situations where chi_i and chi_l are both 1.
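The experimental variogram behind the Kriging step can be sketched as follows; gamma(h) is estimated as half the mean squared difference between attribute values at pairs of points whose separation falls in a lag class. The lag tolerance is an assumption of this sketch.

```python
import math

def semivariogram(points, lag, tol=0.5):
    """Experimental semivariogram for one lag class:
    gamma(h) = (1 / 2N(h)) * sum (z_i - z_j)^2 over the N(h) pairs whose
    separation is within lag +/- tol; points = [(x, y, z), ...].
    Returns None if no pair falls in the class."""
    total, n = 0.0, 0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            xi, yi, zi = points[i]
            xj, yj, zj = points[j]
            h = math.hypot(xi - xj, yi - yj)
            if abs(h - lag) <= tol:
                total += (zi - zj) ** 2
                n += 1
    return 0.5 * total / n if n else None
```

Fitting a model (spherical, exponential, etc.) to these experimental values supplies the gamma terms of the ordinary Kriging system, and the covariance follows from C(h) = gamma(infinity) - gamma(h).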
Both being 1 represents the situation where a particular row or column of the Kriging system is used or not, as determined by the values of chi_i and chi_l, with the whole left-hand side represented as chi_il. We use these as constraints in our optimization problem and finally obtain a set of spatial constraints; just as with IDW, these are the spatial constraints of our optimization problem for groundwater monitoring. Now the formulation: the objective compares the actual value with the estimated one, normalized by the actual value plus eta, where eta is a number between 0 and 1, summed over the number of wells and the number of time periods; it applies at the end of each time period for each well, and we minimize the total normalized absolute deviation. The cost constraint says we can install only p monitoring wells out of the NW potential monitoring wells, and we have the IDW or ordinary Kriging spatial-interpolation constraints. The objective function contains an absolute-value operator, which looks like the V-shaped blue line; its derivative is discontinuous. So it is better to convert it into a linear formulation, and we have found a linear equivalent: we introduce a positive and a negative deviation variable, both non-negative, and minimize their sum; if the sum is 0, the estimation error is 0. These two variables balance the absolute-value operator, and C_j must take a proper value greater than or equal to 0. This can be solved with any optimization package, such as LINGO or CPLEX, and with linear mixed-integer programming, global optimality is guaranteed.
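The linearization of the absolute-value objective just described is the standard one; the symbols below are assumed notation (the slides may use different ones), with hat-C the estimated and C the actual concentration at well j in period t:

```latex
\min \sum_{j=1}^{NW}\sum_{t=1}^{NT}\left(u_{jt}^{+}+u_{jt}^{-}\right)
\quad\text{subject to}\quad
\frac{\hat{C}_{jt}-C_{jt}}{C_{jt}+\eta}=u_{jt}^{+}-u_{jt}^{-},
\qquad u_{jt}^{+},\,u_{jt}^{-}\ge 0 .
```

At the optimum at most one of $u_{jt}^{+}$, $u_{jt}^{-}$ is nonzero, so $u_{jt}^{+}+u_{jt}^{-}=\lvert\hat{C}_{jt}-C_{jt}\rvert/(C_{jt}+\eta)$, recovering the original objective while keeping the problem linear and hence solvable to guaranteed global optimality as a mixed-integer linear program.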
For a particular study area involving the degreasing agent trichloroethylene (TCE), with 8 sampling events from December 1999 to 2001, the methodology was tested for the Fort Lewis Logistics Center in Pierce County, Washington. There are boundary wells and inner wells, and a concentration contour for the year 2000 data with its scale. With these wells we defined the scenarios: IDW scenario 1, where only the September 2000 data were used; IDW scenario 2, only the September 2000 data with the boundary-well restriction that all boundary wells must be selected; IDW scenario 3, all 8 time periods of data; IDW scenario 4, all 8 time periods with the boundary-well restriction; and ordinary Kriging scenario 5, only the September 2000 data. The performance measures were the relative estimation error, averaged over the number of wells removed or eliminated and the number of time periods, and the RMSE, the root-mean-square difference between the actual and the estimated values. The error plots show some interesting results: if only 27, 26 or 25 of the wells are kept, the error is almost negligible for scenario 1; if we remove more wells, say 6, it becomes a problem; and interestingly, if we remove still more wells, the error decreases again. We can infer that not only the number of wells but also the configuration is important for a monitoring network: a particular configuration can give a smaller error than a monitoring network with more wells.
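The RMSE performance measure used above is the usual one; the sketch below assumes it is computed over the wells eliminated from the network, comparing their measured concentrations with the values interpolated from the retained wells.

```python
import math

def rmse(actual, estimated):
    """Root-mean-square error between actual concentrations and the
    IDW- or Kriging-estimated concentrations at the eliminated wells."""
    return math.sqrt(
        sum((a - e) ** 2 for a, e in zip(actual, estimated)) / len(actual))
```

Evaluating this measure for each candidate well configuration is what reveals the non-monotonic behavior noted above: the error depends on which wells are removed, not merely on how many.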
Comparing the IDW scenarios with the existing solution, IDW scenario 1 is far better than the existing results, performing better even with up to a 17 percent reduction in wells; as already noted, the configuration of the monitoring wells is the important thing here, and the ordinary Kriging scenario also outperforms the existing results. The removal locations for both cases show that we found a different configuration, which is why our results are better. Even with so many variables the approach gives a guaranteed global optimal solution, because our optimization problem is linear in nature and our solution approach is proper. You can see that the ordinary Kriging scenario has 1020 real variables and NW, that is 30, integer variables; the last case likewise has 1020 real variables and 30 integer variables. That is all about the optimization part: within optimization we have covered the monitoring network design, and you can now see the importance of optimization in the methodology of monitoring network design. With this, lecture 37 ends.