Hello and welcome back, this is a continuation of the Azure Data Factory tutorial. Today we are going to look at schema drift in data flows. So what is schema drift? Schema drift is the case where your source often changes its metadata: fields, columns, or types can be added, removed, or changed on the fly. Without handling for schema drift, your data flow becomes vulnerable to upstream data source changes. Let's assume we have a source whose data changes very often. For example, in the first execution the file has only two columns, and the next time the file arrives it has three columns; or the other way around, the file has three columns in the first execution and only two in the second. Such changes in the columns, or in the data types, are what we have to handle by using the schema drift option inside the data flow. Don't worry if it's not completely clear yet, we'll see it in practice.

Go to the browser. Here, inside the blob storage, under the SSU testing container and the input folder, we have this sales India file, and it has six columns in total: sales order ID, date, customer ID, quantity, value, and country. As we can see, we just want to load the data from this file into the output folder as sales_india, so we simply want to copy the data from here to there. Then we'll make certain modifications at the source and check again.

Now go to the Azure Data Factory, and let me quickly create a new data flow here. Let me call it schema_drifting, like this. Now we can click Add Source, go to the dataset setting, and search for a dataset. We have not created a dataset for this source yet, so let me create a new one: choose Azure Blob Storage, click Continue, choose DelimitedText, then Continue. Let me call this the dataset for the source of sales India. Now we can select the linked service, which is SSU testing, select the First row as header checkbox, and set the file path to the input folder and this file. Everything is okay, so we can click OK. Now we can go to the data preview and refresh it, and we will see the same data that we saw at the source.

Go to the source settings again. Under the options here we can see Allow schema drift. Let me check its information: it says to select Allow schema drift if the source columns will change often; the setting will allow all incoming fields from your source to flow through the transformations to the sink. So this is basically enough to handle whatever data we get from the source; even if the metadata does not match, it will be handled. Next we can see Infer drifted column types. This is the automatic detection of the column types, so let me select this checkbox as well, and the types will be detected automatically. Now let me add the sink; we just want to dump the data into the sink location. I have not created a dataset for the sink either, so let me create a new dataset here.
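Before we go further in the portal, here is a minimal sketch of what schema drift looks like from the pipeline's point of view. This is only an analogy in Python with pandas (the file contents below are made up for illustration), not what ADF runs internally: the same feed arrives with a different set of columns on different days, and a drift-tolerant load has to cope with both.

```python
import io
import pandas as pd

# Day 1: the feed arrives with six columns.
run1 = io.StringIO(
    "SalesOrderID,Date,CustomerID,Quantity,Value,Country\n"
    "1001,2023-01-05,C01,2,500,India\n"
)

# Day 2: the same feed arrives without CustomerID but with a new City column.
run2 = io.StringIO(
    "SalesOrderID,Date,Quantity,Value,Country,City\n"
    "1002,2023-01-06,1,250,India,Mumbai\n"
)

df1 = pd.read_csv(run1)   # column names and types are inferred from the data,
df2 = pd.read_csv(run2)   # loosely like 'Infer drifted column types' in ADF

# A rigid mapping that hard-codes the day-1 schema breaks on day 2:
# selecting df2[["SalesOrderID", "Date", "CustomerID", ...]] raises a KeyError.
# A drift-tolerant load aligns columns by name instead and keeps everything:
combined = pd.concat([df1, df2], ignore_index=True)
print(combined.columns.tolist())
# ['SalesOrderID', 'Date', 'CustomerID', 'Quantity', 'Value', 'Country', 'City']
```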
So again choose Azure Blob Storage and DelimitedText, and let me call this the dataset of the destination for sales India. Now we can select the linked service, then select the path, which is the output folder, and select the First row as header checkbox. Go to Advanced; everything is okay, so let me click OK. We have configured the dataset. Let me go to the sink Settings, where we can see the file name option: we have to use Output to single file, and let me call the file sales India, like this. Since this is a single CSV file, we also have to select Set single partition. Now let me go to the data preview and refresh it, and we can see the data.

Now let me publish all. While the publish is happening, let me go and create a new pipeline for executing this data flow. For executing a data flow we have to use the Data Flow activity, so we can drag and drop it here, go to its settings, and choose the data flow that we created, which is schema_drifting. Now let me debug it, and it will load the data from the source to the destination. Let me go to the output folder, and under this output folder we can see the file. Let me open it, quickly go to edit mode, and click the preview. So we are able to see the data at the destination end.

Now let me go to the source and make a change in the source file, and then we will see how the data flow handles it. Click on edit again. This time I am going to remove the customer ID column from here, along with its values. Let me quickly remove all of these and save. So this time the source has only five columns; we don't have customer ID anymore. Let me debug again and see whether this is handled or not. Let me quickly go to the output folder, open this file again, go to edit, and click preview. What we can see now is that the customer ID column is still here, but the values are not available, because on the source side we no longer have the customer ID column. So what we can say is that if the destination has a column that is not in the source, the destination keeps that column and the pipeline still executes successfully, but the data under that column is blank, as we can see.

Now let me go to the source again and add a new column, and we will see how that is handled. Go to edit, and let me add a city column here, with values like Mumbai, Mumbai, Noida and so on, leaving the last two rows blank. Now let me save. So this time the situation is reversed: the source has a new city column that the destination has never seen, while the destination still has the customer ID column that the source no longer provides. Let me debug again and see what happens. Go to the output folder, open the file, go to edit, and preview it. What we can see is that the destination has picked up the new column.
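What these debug runs show is that with schema drift allowed, the sink simply receives whatever columns flow in: a column that disappears from the source still shows up at the destination with blank values, and a brand-new column is passed through and added. Here is a rough sketch of that behavior in Python with pandas; the paths are hypothetical local stand-ins for the blob input and output folders, and this is only an analogy of the observed behavior, not the data flow's actual implementation.

```python
import pandas as pd

# Hypothetical local stand-ins for the blob input/output folders.
SOURCE_FILE = "input/sales_india.csv"
SINK_FILE = "output/sales_india.csv"

def drift_tolerant_copy(source_path: str, sink_path: str) -> None:
    incoming = pd.read_csv(source_path)

    try:
        existing = pd.read_csv(sink_path, nrows=0)   # read only the header row
        known_columns = list(existing.columns)
    except FileNotFoundError:
        known_columns = []                           # first run, no sink yet

    # Keep every column the sink has already seen, in order, then append
    # any newly drifted columns coming from the source (e.g. City).
    drifted = [c for c in incoming.columns if c not in known_columns]
    all_columns = known_columns + drifted

    # Columns the source dropped (e.g. CustomerID) stay in the output,
    # but their values are blank, which is what we saw in the preview.
    aligned = incoming.reindex(columns=all_columns)
    aligned.to_csv(sink_path, index=False)           # single output file

drift_tolerant_copy(SOURCE_FILE, SINK_FILE)
```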
All of this is handled just because of the Allow schema drift option that we selected. If any change happens at the source end, this setting is enough to deal with it. As we can see, customer ID is no longer in the source, but the pipeline still executed successfully; and the source had data for the city column that the destination did not, so the destination now has that extra city column. So I hope, guys, you now have a clear understanding of schema drift inside data flows. Thank you so much for watching this video. See you in the next video.
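If you want to double-check the result outside the portal, a quick sketch like the following (again assuming a hypothetical local copy of the sink file) reads the output back and confirms the two behaviors we just saw.

```python
import pandas as pd

# Hypothetical local copy of the sink file from the output folder.
result = pd.read_csv("output/sales_india.csv")

# The drifted City column was passed through to the destination...
assert "City" in result.columns

# ...and the CustomerID column the source dropped is still present,
# just with no values in it.
assert "CustomerID" in result.columns
assert result["CustomerID"].isna().all()

print(result.head())
```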