Students, we have now established what dimensions and measures are and how they are linked with your data source or data set. Now we have to understand why reducing them is required in data science and analytics.

It is interesting to note that a similar concept exists in network engineering. If you work with data networks or telecom, or even in everyday tasks, you compress files or encrypt data. The logic is this: say you have a 100 MB file to send by email and the mail system does not allow it, so you zip it, or in more generic terms compress it, so that the size comes down to less than 10 MB and you can easily send it. The mechanism behind it is similar: the software uses algorithms, with their own logic, to compress the data. When you open the file with software like WinZip or WinRAR, unzipping reverses the entire process. Whatever measures it had excluded, whatever dimensions it had reduced, the software puts back in, and when you open the file you get the same format that you had before compressing it.

So, as we said, we tend to add as many features as possible at first to grab useful indicators, but then we run into issues. It is very important for us to understand why we reduce them, and we have to reduce them in such a way that our data features still serve the problem we are addressing, while the noisy data, the unwanted data, is removed.

So basically, say you have three-dimensional data in which x, y and z are the three dimensions. Now you have brought it down to two, x and y or y and z, so you have removed one dimension. This is just to develop an understanding that, in practical terms, there are methods that you use to reduce these dimensions: how to bring the data onto the x axis, how to bring it onto the y axis, how to look at its dimensions. It is very interesting to understand these things.

Now, the normal data that we get from any source is
almost like this: there are a lot of fields. Take the example of name, age, address and so on. You can put an email address in it, a zip code, the house address, education. If we consider the data of an employee, you will find hundreds of dimensions, and normal operational databases have hundreds of tables and thousands of columns that you base your analysis on. Eventually, according to your area of interest, you select particular dimensions to perform a particular analysis, and you can do this for one objective, then for another, then for a third.

But the biggest advantage is this: age and date of birth are correlated, so you don't need to keep both of them in your analysis. You can keep only age, or keep only date of birth and calculate age from it. Income and many other things, like education or grade, are correlated in the same way. HR is an easy example to understand: if this is a higher grade, then this is its salary; if this is a lower grade, then this is its salary. These things are related to each other, so once you see that they are correlated, you decide what to do to reduce them.

Similarly, if you take the name of the city, then you don't need to store the state, because the state can be looked up from the city. Or in some cases you just keep the zip code or postal code, because the database already has the information that this particular postal code belongs to a particular city, even to a particular union council. In that case you keep just the zip code column and remove all the others from your data set.

As I said earlier, but just to reiterate: storage is saved, processing time is saved, and the correlation that I talked about can be reduced, which matters in machine learning because we have to make the model very, very efficient. Always keep in mind that we are developing all the things
that we are understanding, and building this orientation, because this is knowledge you can use in machine learning and analytics when, inshallah, you will be working as a data scientist. These points are just to reinforce your knowledge base, so that over time you can refer back to them and use them in your work.

Then we have talked about how dimensions can be reduced to two or to one. These are very useful things that we will discuss in the module and in the models, especially the statistical and linear algebra ones. When we discuss them, you will understand how exciting and interesting this area is.
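
The point made earlier about correlated columns (age and date of birth, grade and salary) can be sketched in a few lines of Python. This is only a minimal illustration, not a method given in the lecture: the employee records are made up, the function names `pearson` and `split_redundant` are hypothetical, and the 0.95 cut-off is an arbitrary assumption.

```python
from math import sqrt

# Hypothetical employee records; all values are made up for illustration.
# age is computed as of 2024, and salary is tied directly to grade.
employees = [
    {"birth_year": 1980, "age": 44, "grade": 17, "salary": 85000},
    {"birth_year": 1992, "age": 32, "grade": 18, "salary": 90000},
    {"birth_year": 1975, "age": 49, "grade": 16, "salary": 80000},
    {"birth_year": 1988, "age": 36, "grade": 19, "salary": 95000},
    {"birth_year": 1995, "age": 29, "grade": 17, "salary": 85000},
    {"birth_year": 1983, "age": 41, "grade": 18, "salary": 90000},
]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists.

    Assumes neither list is constant (otherwise the denominator is zero).
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def split_redundant(rows, columns, threshold=0.95):
    """Keep a column unless it is strongly correlated with one already kept.

    threshold is an assumed cut-off on the absolute correlation.
    """
    kept, dropped = [], []
    for col in columns:
        values = [row[col] for row in rows]
        is_redundant = any(
            abs(pearson(values, [row[k] for row in rows])) >= threshold
            for k in kept
        )
        (dropped if is_redundant else kept).append(col)
    return kept, dropped


kept, dropped = split_redundant(
    employees, ["birth_year", "age", "grade", "salary"]
)
print("kept:", kept)        # kept: ['birth_year', 'grade']
print("dropped:", dropped)  # dropped: ['age', 'salary']
```

With this toy data, age is dropped because it is fully determined by birth year, and salary is dropped because it follows grade exactly. Correlation only catches linear relationships, so on real data a check like this is a starting point rather than a complete answer.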