Dr. R. R. Giddhi, working in the Computer Science and Engineering Department of Valshan Institute of Technology, Sholapur. Today's topic is the data warehouse, from the subject Data Warehousing and Mining. The learning outcome of today's video: at the end of this video, students will be able to comprehend the concept of a data warehouse and use a data warehouse for data mining and knowledge discovery. The contents of today's topic are: what a data warehouse is, the definition of a data warehouse, its four features, and then the architecture of the data warehouse.

Let us see what a data warehouse is. As we know, a database is simply a collection of data in one particular database system; it may be an SQL database, a NoSQL database, or Access. A data warehouse, by contrast, is a collection of corporate information and data derived from operational systems and external data sources. That means it brings together data from various databases, which may be SQL, NoSQL, Access, Ingres, and many others, as well as from other sources. The place where all of this data is collected together is called a data warehouse.

Now let us look at the features of a data warehouse. There are four: the first is subject oriented, the second is integrated, the third is time variant, and the fourth is non-volatile. Together these features support decision making: the warehouse holds data that is used to support business functions such as planning and forecasting. Let us see the features one by one. The first one is subject oriented: data are organized according to subject, not according to applications.
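To make subject orientation concrete, here is a minimal Python sketch. The billing and shipping rows, their field names, and the function `build_subject_views` are all hypothetical; the sketch only shows application-oriented records being regrouped by subject (customer, product), with operational details such as invoice numbers dropped because they are not useful for decisions.

```python
from collections import defaultdict

# Hypothetical rows held by two operational applications.
# Each application stores what it needs for day-to-day work.
billing_rows = [
    {"invoice_no": 1, "customer": "C1", "product": "P1", "amount": 120.0},
    {"invoice_no": 2, "customer": "C2", "product": "P1", "amount": 80.0},
    {"invoice_no": 3, "customer": "C1", "product": "P2", "amount": 50.0},
]
shipping_rows = [
    {"shipment_no": 10, "customer": "C1", "region": "North"},
    {"shipment_no": 11, "customer": "C2", "region": "South"},
]


def build_subject_views(billing, shipping):
    """Regroup application-oriented rows by subject (customer, product).

    Operational details such as invoice and shipment numbers are dropped;
    only attributes useful for analysis are kept.
    """
    customer = defaultdict(lambda: {"region": None, "total_spent": 0.0})
    product = defaultdict(lambda: {"orders": 0, "revenue": 0.0})
    for row in billing:
        customer[row["customer"]]["total_spent"] += row["amount"]
        product[row["product"]]["orders"] += 1
        product[row["product"]]["revenue"] += row["amount"]
    for row in shipping:
        customer[row["customer"]]["region"] = row["region"]
    return dict(customer), dict(product)


customer_view, product_view = build_subject_views(billing_rows, shipping_rows)
print(customer_view)
print(product_view)
```

Each subject view combines data that was spread across both applications, which is what lets an analyst ask "how much did customer C1 spend, and where is C1 located?" without knowing how either application stores its records.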
A subject may be, for example, customer, in which case the data stored is related to customers; product, where the data stored is related to products only; or sales, where the data stored is related to sales only. A subject-oriented warehouse focuses on modeling and analysis of the data, and that modeling and analysis can be used by decision makers to draw inferences. It provides a simple and concise view of each subject by removing noise and data that is not useful for decisions, so that the data can be used in a decision support system.

The second key feature is integrated. Data is collected from heterogeneous sources: relational databases, simple flat files, or online transaction records. After this data is collected, two processes are applied to it: cleaning and integration. Cleaning means removing noise from the data, and integration means putting the data from the various databases into one format. Integration also ensures consistency in naming conventions, encoding structures, and attribute measures when data is collected from various sources. For example, a hotel price may be recorded differently across sources: in a different currency, with or without tax, with or without breakfast covered, and so on.

The third feature is time variant. The time horizon for the data warehouse is significantly longer than that of operational systems: whatever data we keep in the data warehouse is kept for a long time. An operational database holds the current value of the data, whereas a data warehouse stores data from a historical perspective, say the past 5 to 10 years.
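The cleaning and integration steps for the hotel price example can be sketched in Python as follows. Everything here is hypothetical and chosen only for illustration: the two source layouts, the field names, the exchange rate `USD_TO_INR`, the tax rate `TAX_RATE`, and the helper functions `clean` and `integrate`. The sketch drops noisy records (missing price) and maps both sources onto one naming convention, one currency, one tax treatment, and one encoding for "breakfast covered".

```python
# Hypothetical constants, for illustration only (not real rates).
USD_TO_INR = 80.0
TAX_RATE = 0.10

# Hypothetical source A: prices in USD, tax included, breakfast encoded "Y"/"N".
source_a = [
    {"hotel": "H1", "price_usd": 100.0, "breakfast": "Y"},
    {"hotel": "H3", "price_usd": None, "breakfast": "N"},  # noisy: price missing
]
# Hypothetical source B: different field names, prices in INR, tax excluded, breakfast encoded 1/0.
source_b = [
    {"HotelName": "H2", "rate_inr": 5000.0, "breakfast_covered": 1},
]


def clean(rows, price_key):
    """Cleaning: drop records whose price is missing (treated as noise)."""
    return [r for r in rows if r.get(price_key) is not None]


def integrate(a_rows, b_rows):
    """Integration: map both sources onto one target schema.

    Target schema: hotel, price_inr (tax included), breakfast_covered (bool).
    """
    unified = []
    for r in clean(a_rows, "price_usd"):
        unified.append({
            "hotel": r["hotel"],
            "price_inr": round(r["price_usd"] * USD_TO_INR, 2),
            "breakfast_covered": r["breakfast"] == "Y",
        })
    for r in clean(b_rows, "rate_inr"):
        unified.append({
            "hotel": r["HotelName"],
            "price_inr": round(r["rate_inr"] * (1 + TAX_RATE), 2),
            "breakfast_covered": bool(r["breakfast_covered"]),
        })
    return unified


for row in integrate(source_a, source_b):
    print(row)
```

After integration, prices from both sources can be compared directly, because they share one attribute name, one currency, and one tax treatment.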
Also, every key structure in the data warehouse contains an element of time, explicitly or implicitly.

The fourth feature is non-volatile. Operational updates are not allowed: once the data is stored, it is not changed, and it is not removed from the data warehouse. The data in a warehouse represents the history of the company. Because of this, the warehouse does not require transaction processing, recovery, or concurrency control mechanisms. It requires only two operations: the initial loading of the data and access to the data. This is the non-volatile feature.

Next, let us see the data warehouse design process. There are two approaches: top-down and bottom-up. The top-down approach starts with the overall design and planning, while the bottom-up approach starts with experiments and prototypes. From a software engineering point of view, the design can follow the waterfall method, which does a structured and systematic analysis at each step before proceeding to the next, or the spiral method, which rapidly generates increasingly functional systems with a short turnaround time. A typical data warehouse design process is: choose a business process to model, choose the grain of that business process (the atomic level of data), choose the dimensions that apply to each fact table record, and choose the measures that populate each fact table record.

Next is the architecture of the data warehouse. This is a three-tier architecture, with a bottom tier, a middle tier, and a top tier. In the bottom tier we have the data warehouse. It fetches data from operational databases and other sources, and on that data the various processes are carried out: cleaning, integration, transformation, loading, and refreshing. The metadata, that is, data about the data warehouse, is maintained here, and monitoring and integration of those databases are carried out here as well.
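The non-volatile and time-variant features can be sketched together in Python. The class `AppendOnlyWarehouse`, its methods, and the sample balances are hypothetical. The class offers only the two operations mentioned above, loading and access; there is no update or delete method. Every key includes a snapshot date, so a refresh adds a new dated version instead of overwriting the old one.

```python
from datetime import date


class AppendOnlyWarehouse:
    """Hypothetical non-volatile, time-variant store.

    Only loading and access are supported. Keys are (entity_id, snapshot_date),
    so the time element is part of every key.
    """

    def __init__(self):
        self._rows = {}  # (entity_id, snapshot_date) -> attributes

    def load(self, snapshot_date, records):
        """Initial load or periodic refresh: append one dated snapshot."""
        keys = [(entity_id, snapshot_date) for entity_id in records]
        clashes = [k for k in keys if k in self._rows]
        if clashes:
            # Existing history is never overwritten.
            raise ValueError(f"snapshot already loaded for {clashes}")
        for entity_id, attrs in records.items():
            self._rows[(entity_id, snapshot_date)] = dict(attrs)

    def history(self, entity_id):
        """Access: return all snapshots for one entity, oldest first (copies)."""
        return [
            (snap, dict(attrs))
            for (eid, snap), attrs in sorted(self._rows.items(), key=lambda kv: kv[0][1])
            if eid == entity_id
        ]


wh = AppendOnlyWarehouse()
wh.load(date(2023, 1, 31), {"C1": {"balance": 500}})
wh.load(date(2023, 2, 28), {"C1": {"balance": 650}})
print(wh.history("C1"))
```

Because nothing is updated in place, the January balance remains available next to the February one, which is what allows the historical analysis described for the time-variant feature.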
This data is then passed to the next tier, the middle tier, which is the OLAP server. Here the data is analyzed using the selected data mining techniques, and models or patterns are generated from it; those patterns can be used to draw inferences or conclusions that help in making decisions. Finally, in the top tier, analysis is carried out, queries are fired, and reports are generated. This is the architecture.

Now let us take a question: what are the steps involved before data is stored in the data warehouse? Take a pause here and try to answer this question. The answer is: data cleaning, data integration, data transformation, data loading, and data refreshing.

Now let us see the data warehouse models. There are three. The first is the enterprise warehouse: it collects all the information about the entire organization. The second is the data mart: a subset of corporate-wide data that is of value to a specific group of users, with its scope confined to selected subjects, for example a marketing data mart. Data marts can be independent, or dependent, meaning they are sourced directly from the enterprise warehouse. The third is the virtual warehouse: a set of views over operational databases, where only some of the possible summary views may be materialized. These are the references. I hope you understood. Thank you very much.
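A virtual warehouse and a data mart can be sketched with Python's built-in `sqlite3` module. The table `orders`, the views `sales_by_region` and `sales_by_product`, the tables `mv_sales_by_region` and `marketing_mart`, and all values are hypothetical. The views play the role of a virtual warehouse over an operational table; one summary view is materialized into a real table; and the mart is an independent one, built directly from operational data rather than from an enterprise warehouse.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical operational table (stands in for an OLTP database).
cur.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, product TEXT, amount REAL)"
)
cur.executemany(
    "INSERT INTO orders (region, product, amount) VALUES (?, ?, ?)",
    [("North", "P1", 100.0), ("North", "P2", 40.0), ("South", "P1", 60.0)],
)

# Virtual warehouse: summary views defined directly over the operational table.
cur.execute(
    "CREATE VIEW sales_by_region AS "
    "SELECT region, SUM(amount) AS total FROM orders GROUP BY region"
)
cur.execute(
    "CREATE VIEW sales_by_product AS "
    "SELECT product, SUM(amount) AS total FROM orders GROUP BY product"
)

# Only some summary views are materialized: copy one view into a real table.
cur.execute("CREATE TABLE mv_sales_by_region AS SELECT * FROM sales_by_region")

# Independent data mart (hypothetical marketing mart for product P1),
# sourced directly from operational data rather than from an enterprise warehouse.
cur.execute(
    "CREATE TABLE marketing_mart AS "
    "SELECT region, amount FROM orders WHERE product = 'P1'"
)

# New operational data arrives after materialization.
cur.execute("INSERT INTO orders (region, product, amount) VALUES ('South', 'P2', 20.0)")
conn.commit()

# The view reflects the new row immediately; the materialized copy does not
# change until it is refreshed.
live = dict(cur.execute("SELECT region, total FROM sales_by_region").fetchall())
stored = dict(cur.execute("SELECT region, total FROM mv_sales_by_region").fetchall())
print(live)    # South total includes the new 20.0 order
print(stored)  # South total still reflects the earlier snapshot
```

The last two queries show the trade-off behind materializing only some views: a plain view is always current but is recomputed on every query, while a materialized summary is fast to read but must be refreshed.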