 Hello and welcome to a session on Data Warehouse Architecture and Schemas. This is Dr. Anitha Poojar, Professor of Computer Science and Engineering Department of Walton Institute of Technology, SolarPore. These are some of the prerequisites. Learner is required to have the knowledge of ER modeling and entity relationships. At the end of this session, students are able to list the components of Data Warehouse Architecture, design the schemas for given business processes for some real-time business applications. Now, let's see what is Data Warehouse? Data Warehouse is a centralized repository that stores data from external multiple information sources and transforms them into one common multidimensional data model for efficient querying and analysis. A Data Warehouse can also be defined as an efficient mechanism which is designed to support business decisions by allowing data consolidation, that is, collection, analysis and reporting at different aggregate levels. Now, let's see the Data Warehouse architecture. These are the external information sources, that is, operational systems, operational databases, enterprise resource planning data, customer relationship management data, flat files, all these are collected, data is staged by the ETL process, it cleans the data and stores the data into one common repository called as Data Warehouse. So, Data Warehouse consists of raw data, metadata and aggregated data that is called as summary data and this data of the Data Warehouse is now ready for analysis using OLAPS, that is online analytical processing, then for generating the reports on various data and it is also taken by the data mining techniques for further analysis. Now, let's see what is ETL process? ETL stands for Extract, Transform and Load. It is a processing data warehousing which is responsible for extracting the data out of multiple source systems and placing it into one multidimensional data model called as Data Warehouse. It involves three steps, extract that is collect data from multiple sources, transform the data that is either store the data in sorted form, filtered form or aggregated data and load the data in such a way that it is ready for analysis. So, this is the ETL process diagram. So, on the first part these are the multiple information sources, the data is extracted from these sources by the ETL and it is stored into the staging area, then from the staging area it goes through multiple transformation processes and then it is stored in the Data Warehouse. Now, the data and the Data Warehouse is ready for further analysis. So, in the extraction part two things are done that is data cleaning and data profiling. In transformation part various types of transformation are performed on the data that is aggregated data can be generated, data can be in the filtered form or subsequent joints are performed to generate the data. The data is generated in the sorted form or the data is presented in the summarized form. Now, this data is ready for analysis. So, it is loaded into the Data Warehouse. So, Data Warehouse then is used for various analysis to generate the reports. Now, let us see the Data Warehouse Schemas. Data Warehouse Schemas is usually a multi-dimensional modeling which stores the data in two tables that is fact table and dimensional table. Data Warehouse converts the relational model into special architectures. The most commonly used schema models are star schema, snowflake schema and fact constellation. Let us see the star schema. So, we have taken the example of sales revenue. So, there is one fact table called a sales fact table and multiple dimension tables that is product dimension table, date dimension table, location dimension table and branch dimension table. So, you can see here fact table consists of two types of attributes. One is foreign keys that is product key, location key, date key and branch key. Then there are two facts that are to be measured that is sales revenue and unit sold. The other dimension tables they describe the attributes such as sales revenue and unit sold. The dimension tables are joined with the fact tables using the foreign keys and the relationship between the fact table and dimension table is many to one. Now think and write what type of relationship does a star schema have between dimension tables and the fact tables. Is it one to one, one to many, many to one or all of them? So if we see the previous slide again, so you can find that a business process or the fact such as sales revenue is described by multiple dimensions such as the date on which the sales took place, the branch from which the sales took place, the product which was sold and the location that is the product was sold to which location. So in this way one fact is described by the multiple dimension tables. So the relationship between the dimension table and the fact table is many to one. Now the second type of schema is the snowflake schema. Here same the single fact table is used and multiple dimension tables are used but the difference between the star schema and the snowflake schema is in star schema the data is denormalized whereas in the snowflake schema the data is normalized. Now if we go to the star schema you can find that multiple items can be sold by can be sold to the same city which belongs to the same state province or country. So that means the state or the country is a repeated data. So to eliminate the redundant data we use the snowflake schema. So here you can find that the location dimension table is decomposed into two tables that is location dimension table and city dimension table. So location dimension table has a foreign key called a city key. This city key again joins to a city dimension table separately. So in this way snowflake schema helps to reduce the redundant data and helps to manage the data very easily. Difference between star schema and snowflake schema is as follows. Star schema is a very simple database design whereas snowflake schema is a very complex database design. Star schema provides denormalized data structure hence query also run faster that is there is an efficient query processing. Normalized data structure is provided by snowflake schema therefore queries do not run faster they require lots of joints for execution of query. So in star schema there is a high level of data redundancy whereas in snowflake schema the data redundancy level is very low. Star schema provides single dimension table contains aggregated data whereas in snowflake schema data is split into different dimension tables. In star schema hierarchies for dimensions are stored in dimension table whereas in snowflake schema hierarchies are divided into separate tables. Star schema is useful for metrics analysis such as what is the revenue for a given customer whereas snowflake schema is useful for dimension analysis such as how many accounts or campaigns are online for a given advertiser. The third type of schema is the fact constellation which is also called as galaxy schema it is a combination of multiple star schemas. Here one or more fact tables are used this type of schema is used for very big enterprises which have multiple business process to be measured. Here we have shown two fact tables that is sales fact table and shipping fact table. Sales fact table is measuring the sales revenue and the units sold whereas shipping fact table is measuring the dollars cost and the unit shipped. The multiple dimension tables are provided in the fact constellation and these dimension tables are shared by multiple fact tables. So this type of schema is used for big enterprises. These are some of my references thank you.