Just as quality matters in every sphere of life, it is important in every walk of life, the same holds for data science: quality matters for the data itself and for the reporting built on that data. Quality is very important for reporting, and there is a further reason: the decision making that happens on the basis of the data you have, the data available to you for analysis, depends directly on it. So maintaining quality is just as essential, because data is not something you process only once. It is an iterative process, as we have already discussed; its frequency can be anything, but we have to make sure that we maintain its quality, and with it the quality of decision making. Informed decision making is basically the decision that stakeholders make on the basis of the information, the data, available to them. If the quality of that data is good, the quality we have looked after during the wrangling process we performed, if every step is performed well, then the decision makers at the end will find their life easy and will be able to make the best decisions. This is very important.

And one thing to understand is that quality is not an end-of-process activity; it is an AND process, meaning that across all the steps involved you have to maintain quality at every single step. Let me share a very simple example. Suppose you are doing an activity that has six steps. You have to keep in mind that this is an AND operation: I will do this AND this AND this AND this, and then I will get the final product. If I have done three processes well but botched one of the others, that will not help you; it will not work, and that is not acceptable.
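The "AND process" point can be made concrete with a little arithmetic: quality compounds multiplicatively across steps, so a pipeline of six steps each done 90% right is nowhere near 90% overall. A minimal Python sketch (the six-step pipeline and the 90% figure are illustrative assumptions, not values from any real system):

```python
# Quality compounds multiplicatively across pipeline steps.
def end_to_end_quality(step_qualities):
    """Return overall pipeline quality as the product of the
    per-step quality factors (each a fraction in [0, 1])."""
    result = 1.0
    for q in step_qualities:
        result *= q
    return result

# Six steps, each done "only" 90% right:
print(round(end_to_end_quality([0.9] * 6), 3))  # 0.531 -- barely half

# Five perfect steps and a single 90% step still caps you at 90%:
print(round(end_to_end_quality([1.0] * 5 + [0.9]), 3))  # 0.9
```

This is why a single sloppy step drags the end product down no matter how good the other steps were.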
So you can see this in the quality context: if you have six steps, you have to give 100% at each one; only then will your end product be 100%. Just imagine you say, "my input, my value addition, is 90%." If every one of the six steps adds value at 90% instead of 100%, then after completing all six your end product will be less than 60% accurate, in fact only about 53%. Keep this in mind; it is very important and a lifelong lesson for you: this is an AND operation, this AND this AND this, and you have to make sure that at every step you give 100% of your input and effort, so that even if your end product falls a little short, it is still 90-95% accurate.

Now within data quality we have six different dimensions: the data has to be consistent, it has to be accurate, it has to be valid, and then there is completeness, uniqueness, and timeliness. All of these are critical and they matter a lot. I will try to highlight each a little so that you get a basic understanding of data quality, and tomorrow you will be able to apply these techniques in your professional life.

Look at consistency first. Consistency means that every time something is done, it is done in one way; it cannot be done one way today and another way tomorrow. Seen from another angle, it means that your written procedure, your ETL process, your enrichment process, has to be followed 100% every time so that you can maintain the desired quality and get the desired results. Then look at where the data is produced: it is produced through an application. As we saw in the previous module, the anomalies and inconsistencies in the data basically come from the source system, the source application where people are entering data manually or where it is being produced automatically, and sometimes that is not negligence but a limitation of that particular system; you simply have to take the data as it comes from there. That is why we say that when we ingest and extract our data, we make sure during wrangling that we take proper care, do proper massaging, and ensure that the quality of our data meets the expectation so that we can use it.

Then data completeness. I will show it on another slide, but the meaning of completeness is this: completeness can apply at the level of just one cell, one column, one record, one file, one table, one data set, one database; it can be any of these. This is very interesting, and all of these things you have to comprehend, digest, and then reproduce and use at the appropriate time to meet the requirement of the hour in your professional life. So it is very important that you understand accuracy, completeness, and the purpose of each.

Now accuracy: the data has to be accurate. Recall the Excel sheet example I shared with you: if the gender column is not right, or the format of a date is not right, the data is not accurate. And there is one more interesting phenomenon when you analyze data. In many situations we are talking about the back-end data, not what we see on screen; again, this is a very important lesson and point to understand. There is the data we see from the front end, and there is the data as it is stored in the database, perhaps in a binary format or some other format, and there are different things inside it. I hope you all understand what a bit is, what a byte is, what a record is. Inside stored data there may be a check digit, and sometimes a parity bit; these are things stored alongside the data to protect its integrity, but a check digit or a parity bit carries no analytical value
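Completeness at the column level can be measured directly as the share of records in which a field is actually filled in. A small sketch in plain Python (the field names and sample records are invented for illustration):

```python
# Measure per-column completeness: the fraction of records in
# which the field is present and non-empty.
records = [
    {"name": "Ali",  "gender": "M", "dob": "1990-03-14"},
    {"name": "Sara", "gender": "",  "dob": "1988-07-02"},
    {"name": "Omar", "gender": "M", "dob": None},
]

def column_completeness(rows, field):
    filled = sum(1 for r in rows if r.get(field) not in (None, ""))
    return filled / len(rows)

for field in ("name", "gender", "dob"):
    print(field, round(column_completeness(records, field), 2))
# name 1.0 / gender 0.67 / dob 0.67
```

The same idea scales up: completeness of a record is whether all its required fields pass this test, and completeness of a table is the aggregate over its records.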
or any other such value on its own. So you have to understand what these things are: what a check digit is, what a parity bit is. When you look at the back end, take the example of a barcode first. You buy things morning and evening, and you see barcodes on them. A barcode is actually a representation of the product: if you are buying a cold drink, the barcode encodes the whole thing; if you are buying a t-shirt, the barcode is a complete representation of it. The basic point is to save time: if the cashier had to enter everything by hand it might take one or two minutes, whereas a scan is just a click, a few seconds. That is its purpose, but the structure of the barcode itself has a further meaning, which we will discuss in detail somewhere else. This is just to make you aware of the point, and it is very critical in data engineering, that the data you see and the data stored at the hardware level can be different. There are different character encodings, decimal or octal representations, and many other things. So just park all these terms I have mentioned, but keep in mind that they may come your way and you will need to know how to handle and manage them.

Now the uniqueness of data. A record can enter the database more than once due to a communication error, a network error, or some other reason, so you have to ensure that your data stays unique. Along with the data we store in the system, whether it comes from an application or through an operation, there is one thing called the timestamp. A timestamp is accurate to the millisecond, with no room for ambiguity, and systems that operate at the nanosecond work at the next level again; you know that the timestamp creates uniqueness within a particular data set, within a particular record. So you have to make sure that your scripts, your database scripts, your scripting language and its syntax make use of the timestamp. Basically it is a control to stop the repetition, or rather the duplication, of records; repetition is one thing, but duplication is a kind of error. You have to make sure your data has no duplication.

The timeliness of the data is also very important. You see that today we have a lot of real-time data. Basically we deal with a few kinds of data. There is historical or batch data, on which we perform our descriptive or exploratory analysis; we understood these things when we touched them in the statistics subject. So the timing of the data matters: if I have real-time data, which technique do I have to use? Within the scope of big data we discussed these things to some extent, and that concept applies here. If you have real-time operational data that is being transacted as it arrives, or IoT data, Industry 4.0 data, that is real-time data and you must know how to manage it; if your data arrives in batches, you must know how to manage that; and then there are the logs and files on which you do analysis. So there are three or four kinds of data, while the analysis itself is basically of three types, descriptive, inferential, and forecasting, plus prescriptive analysis and other newer angles that your profession as a data scientist will touch.

So these are the things that are very important for you to understand: data quality has six parameters, and you have to make sure the data is complete and accurate, that you got the data on time, that you have maintained its uniqueness, and that there is no duplication, no anomalies, no noisy data. Timeliness in particular matters, and I will conclude this module after sharing this: the decision makers, the people sitting in operations, need specific information at a specific time.
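To make the check-digit idea above concrete: retail barcodes such as EAN-13 carry a final digit computed from the other twelve, so a scanner can detect a misread; the digit protects integrity but, as said above, has no analytical value. A small Python sketch of the standard EAN-13 rule (the sample number is just an illustration):

```python
def ean13_check_digit(first12: str) -> int:
    """Compute the EAN-13 check digit for a 12-digit prefix.

    Digits in odd positions (1st, 3rd, ...) get weight 1 and
    digits in even positions get weight 3; the check digit
    brings the weighted sum up to a multiple of 10.
    """
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(first12))
    return (10 - total % 10) % 10

print(ean13_check_digit("400638133393"))  # -> 1, so the full code is 4006381333931
```

If any single digit is scanned wrong, the weighted sum no longer ends in the right check digit and the read is rejected.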
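The uniqueness point, dropping a record that arrived twice because of a network retry, can also be sketched briefly. Here the record id together with its timestamp is treated as the uniqueness key; the field names and sample events are assumptions for illustration:

```python
# Remove duplicate records, keeping the first occurrence.
# (record_id, timestamp) together serve as the uniqueness key.
events = [
    {"record_id": 101, "timestamp": "2024-05-01T09:15:00.123", "amount": 250},
    {"record_id": 102, "timestamp": "2024-05-01T09:15:00.456", "amount": 80},
    {"record_id": 101, "timestamp": "2024-05-01T09:15:00.123", "amount": 250},  # network retry
]

def deduplicate(rows):
    seen = set()
    unique = []
    for row in rows:
        key = (row["record_id"], row["timestamp"])
        if key not in seen:       # only keep keys we have not seen yet
            seen.add(key)
            unique.append(row)
    return unique

print(len(deduplicate(events)))  # 2
```

A millisecond- or nanosecond-precision timestamp makes such a key discriminating enough that genuine repeat transactions are kept while exact duplicates are dropped.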
For example, if you work in a store and you have to send out stock for the next day's sales, what will you do? Before the morning shift starts, you should know which orders you have to dispatch. This is an example of timeliness: all the sales delivered the previous day were updated in the system, and the next day's orders are generated under a certain process, which is again beyond our discussion, some process by which your new stock replenishment, the next day's sales orders, are worked out. Based on that you plan all your deliveries, and then the dispatching and delivery planning follow. All of this is a complete science, and data science and analytics are everywhere in it; last-mile delivery alone is a very critical subject, a million-dollar subject, maybe more than that. So with this we conclude our discussion of data quality, and I will see you in the next segment.
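The replenishment step in the store example can be illustrated with one deliberately simple policy, an order-up-to rule; the target levels and sales figures below are invented for the example, and real replenishment logic is far richer than this:

```python
# A toy order-up-to replenishment rule: once the day's sales are
# posted, order enough of each item to bring stock back to a target.
def replenishment_order(on_hand, sold_today, target_level):
    remaining = on_hand - sold_today          # stock left after today's sales
    return max(0, target_level - remaining)   # never order a negative quantity

# Yesterday's sales have just been updated in the system:
print(replenishment_order(on_hand=50, sold_today=35, target_level=60))  # 45
print(replenishment_order(on_hand=50, sold_today=5,  target_level=40))  # 0
```

Notice the timeliness dependency: this calculation is only correct if yesterday's sales were posted before the morning planning run, which is exactly the point of the timeliness dimension.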