Hey guys, welcome to Sysunitex. Today we are going to start with data flows: what a data flow is inside Azure Data Factory, and where we would use it in real time.

So what is a data flow? The data flow feature in Azure Data Factory lets you develop graphical data transformation logic that can then be executed as an activity inside an Azure Data Factory pipeline. The key idea is that we can have one, two, or more sources, say a source one and a source two, and we want to apply certain transformations based on those sources; that is what we do inside a data flow. We have already done the same kind of thing in SSIS with its transformations: if we want to modify our raw data, that is where we do it. Data flows inside Azure Data Factory serve the same purpose.

Next, by using the Data Flow activity we can execute these data flows inside an Azure Data Factory pipeline. We will see that step by step in the practical demo, so don't worry about it for now.

Next, your data flow executes on an Azure Databricks cluster for scaled-out data processing using Spark. We have not discussed Azure Databricks and clusters yet; we will cover all of that in a separate playlist, and once you have seen it this point will be very easy to understand. Also, Azure Data Factory internally handles all the code translation, Spark optimization, and execution of the transformations. The ADF pipeline takes care of all of this automatically, so we are not required to do anything extra.

Now let's go to Azure Data Factory and look at data flows. This is under the SSU test factory. Here we can see the options: first is Pipelines, which we have already discussed, then Datasets, which we have also discussed, and now Data flows. We want to create a data flow, and that data flow will then be executed inside a pipeline. Let me click on the three dots, and we see three options: New data flow, New flowlet, and New folder. I am going to create a new data flow in this video.

Here we can see the properties panel. Under Properties we can set the name of this data flow; let's call it "data flow introduction". Under Related we don't see anything, because there are no related items yet, so we can minimize this panel. Below the canvas we can see the data flow's own properties, starting with Parameters: if we want to create any parameter for the data flow, we can do it from here. Click anywhere outside a transformation and the Parameters tab appears, where we can create a new parameter; we will do that in an upcoming video.

Now on the canvas we can see Add Source, so let's click on it. Below, the Add Source button appears again, so we can add two, three, four, however many sources we need. Here we can see the properties of the source. First is Source settings, and under it the Output stream name, where we specify the stream name. Let me call it EMP, and you can see the node renamed to EMP at the top of the source shape.
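As a quick aside, everything we click together in this designer is saved as a data flow script behind the scenes. As a rough sketch, assuming the auto-generated column names Column_1 and Column_2 from our header-less file (the exact generated names and flag values can differ from what your environment produces), the source we have just added would serialize to something like this:

    source(output(
            Column_1 as string,
            Column_2 as string
        ),
        allowSchemaDrift: false,
        validateSchema: false) ~> EMP

The "~> EMP" at the end is the output stream name we just typed in; that is the name later transformations will use to refer to this source.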
Next is the Source type. There are two types of sources: Dataset and Inline. With Dataset, we reuse the datasets we have already created. If we have not created a dataset, we can use Inline, where we can directly write a query, pick a table, or call a stored procedure to retrieve the data. I am not going to cover Inline practically in this video, so let's go back to Dataset.

Under Dataset we have to select a dataset. We have already created our datasets, which is why they show up here, and if we want a new one we can also create it from here using the plus symbol for New. I am not going to create a new dataset; we will use the one we already have, the employee data source. In this case our source is Blob Storage. Click Open and we can validate that: the source is Blob Storage, under the ssu path, and under it we can see the file. Now we can preview the data. Once we preview it, we can see it has two columns, column 1 and column 2, and a total of six rows. Let me close this and go back to the data flow.

Here we can see the option Allow schema drift. If we click the information icon, we can read what schema drift is: select Allow schema drift if the source columns will change often. So if your source columns are going to change, enable schema drift and the flow will check whether the incoming source fields still match; essentially the mapping is validated at run time. Our source is static, so we can ignore that part. Next is Validate schema: if we want to validate the schema, we can select it. Then there is Sampling: if we want to work on just a sample of the data, we can enable it and set how many rows we want. I am not going to use sampling, so let me disable it again.

Now go to the Source options tab. Here we can see the wildcard path and other properties; they are not really required in this video, so I am going to skip them.

Next is Projection. The projection shows how many columns we have and lets us update the column names: right now it has column 1 and column 2, and we want to call the first one id and the second one emp_name, so we can rename them here and the output column names will change accordingly.

Next is Optimize. Optimize controls how the source data is partitioned. We can use the current partitioning, use a single partition, or set the partitioning ourselves. With a single partition, whatever data we get from the source is passed through as-is, without any partitioning. If we set the partitioning, we can choose between round robin, hash, dynamic range, fixed range, and key. For round robin we just set the number of partitions; it shows 2 by default, and we can set it to 3 or whatever we need.
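Before we look at the remaining schemes, here is a rough sketch of how the settings so far would come together in the underlying data flow script, with the renamed projection, round-robin partitioning into three partitions, and a sink at the end. The partitionBy arguments and flag values are indicative, not exact:

    source(output(
            id as string,
            emp_name as string
        ),
        allowSchemaDrift: false,
        validateSchema: false,
        partitionBy('roundRobin', 3)) ~> EMP
    EMP sink(allowSchemaDrift: false,
        validateSchema: false) ~> sink1

Switching the scheme in the Optimize tab only changes the partitionBy part; for example, hash partitioning on id into three partitions would appear as something like partitionBy('hash', 3, id).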
For hash partitioning, we set how many partitions we want and the column values to hash on, so we have to select a column, for example id or name. Dynamic range does a similar thing, and again we have to select the column values. For fixed range, we write an expression, and that expression partitions the data based on a condition. Last is key: it creates a partition per unique value of the chosen column. For example, we only have two columns here, id and employee name, so key-partitioning on either of them would not help much; but if the source also had something like an employee gender column with only three values (male, female, and unknown), we could partition on that.

Next is Inspect. Under Inspect we can see each column and its data type: this is id and this is emp_name, both of type string. The last tab is Data preview. For the data preview we first have to enable Data flow debug, so let's turn it on, set the debug time to live (I am setting one hour, which is the default), and click OK. It takes a while to start because it has to spin up a cluster, and then we can preview the data here.

Now go back to the source settings. The plus symbol next to the source node is how we add the next step. For example, under Multiple inputs/outputs we can do a join between two tables, which takes two inputs; we can also do a conditional split, exists, union, and lookup. Under Schema modifier we can add a column with a derived column, and so on; you can see all of these transformations listed. Don't worry for now, we will cover them one by one in our upcoming videos. And at the end we can add a sink, and whatever the flow produces gets loaded directly into that sink (just like the sink1 line in the script sketch above).

So this is all about the introduction to data flows. In the next video we will start on the transformations one by one. Thank you so much for watching this video. If you really liked it, please subscribe to our channel to get many more videos, and don't forget to press the bell icon to get notifications for our newly uploaded videos.