Hello, and a warm welcome to this session on basic operations for data analysis in Tableau. This is Dr. Nithapuja, professor in the computer science and engineering department at Walton Institute of Technology, Sola. At the end of this session, students will be able to illustrate the basic operations in Tableau. They will also be able to illustrate getting started with Tableau, that is, connecting to a data source, choosing dimensions and measures, and applying various visualization techniques.
Now let's see how to get started with Tableau. There are three basic operations for creating data analysis reports in Tableau. First, connect to a data source: locate the data source and use the appropriate connection to read the data. Second, choose dimensions and measures: select the columns required from the data source for analysis. Third, apply a visualization technique: apply different visualization methods, such as generating charts, graphs or tables, to the data being analyzed.
Now, assuming that you have already installed Tableau on your machine, double-click on the Tableau icon and you will get the start page showing the various data sources. Under the header Connect, you have options to choose a file, a server or a saved data source. Under files, you have different types such as Excel files, text files, JSON files, PDF files and so on. Under servers, you have Tableau Server, Oracle, MySQL, MongoDB, Amazon Redshift and so on. Tableau also comes with some saved data sources, such as Superstore.xls. In this video, we will be using the sample data set Superstore.xls that comes with Tableau. Under files, you choose the Excel data source and then navigate to the sample Superstore.xls file. The next slide shows you how to open the start page of Tableau.
Double-clicking on the Tableau icon opens this first page, which is called the start page of Tableau. In the left pane, you can see the Connect header. Under it, you can see the various types of files, servers and saved data sources. We are choosing the saved data source Superstore.xls for analysis, so I will double-click on this file. This screen shows the different worksheets that are present in the Superstore.xls file.
The beautiful feature of Tableau is that it provides a drag-and-drop interface for working on the data, so you can drag onto the canvas the worksheets on which you want to do the analysis. Here I drag the worksheet Orders to the canvas area. As soon as I drag it, the section below shows the content of the Orders worksheet. If I want to view the complete table, it is shown here with all the fields. If you want to work with additional data in the same data source, you can drag one more worksheet, People, into the canvas area. As soon as you drag it, an inner join is performed between Orders and People, and the result of this inner join is shown in the lower area of the screen. Now we will remove the People worksheet from the canvas area by opening its drop-down and choosing the Remove option.
Alternatively, you can connect to related data in a different data source by using the Add connection tab. After clicking on Add, it shows the various types of files, servers or saved data sources that you may want to use. I click on the text file data source. After clicking on it, it shows this file, the Google Superstore Returns 2016.csv file, and I connect to it by double-clicking on it. This is the inner join formed between the Orders worksheet and the returns CSV file, and the result of the join is shown below. Now, if I want to see the nature of the join, I will click on this icon.
Immediately I can see the information about the join. The join used here is an inner join between two data sources, Orders and the returns.csv file, and the join clause is on the order ID. If required, the join clause can be changed.
Now I can do some more data management functions. We can change the data type of, say, row ID from number to string. I will click on the drop-down box and choose the string format for row ID, so now it is seen as a string. In the same way, I can split the order ID field, which is a composition of three fields. The first field is the distribution center, the second field is the year in which the order was placed, and the third field is a unique number. We click on the drop-down box and go to Custom Split. When you click on Custom Split, a window pops up in which you have to enter the separator. Let's enter a hyphen, and then keep the first field, as that is the field we want in the worksheet. So I will retain the first field and say OK. After that, you can see that one additional field has been added to the worksheet in which only the distribution centers are specified, and its header has been changed to Distribution Center. In this way, many data management functions can be done in Tableau.
Now let's pause the video for a while and think about this. In Tableau, one can integrate data from two different data sources. Is it true or false? The answer to this question is true. As we have seen in the previous slide, we performed data integration between the data of two different data sources. One was the Orders worksheet from the Excel data source. The other was the returns.csv file from the text file data source. We used the Add connection tab to provide the connection between these two data sources for further analysis.
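To make these data preparation steps concrete, here is a minimal sketch in plain Python of the same three ideas: an inner join on order ID, changing row ID from number to string, and splitting order ID on the hyphen to keep the first field. This is only an illustration of the logic, not how Tableau implements it. The sample rows, the `Returned` column and the helper names `inner_join` and `prepare` are assumptions made up for this example.

```python
# Hypothetical sample rows; the real worksheets have many more columns.
orders = [
    {"Row ID": 1, "Order ID": "CA-2016-152156", "Sales": 261.96},
    {"Row ID": 2, "Order ID": "US-2015-108966", "Sales": 957.58},
    {"Row ID": 3, "Order ID": "CA-2014-115812", "Sales": 48.86},
]
returns = [
    {"Order ID": "CA-2016-152156", "Returned": "Yes"},
    {"Order ID": "CA-2014-115812", "Returned": "Yes"},
]


def inner_join(left, right, key):
    """Keep only the left rows whose key also appears in right, merging columns."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def prepare(rows):
    """Convert Row ID to a string and split Order ID on the hyphen."""
    prepared = []
    for row in rows:
        new_row = dict(row)
        # Data type change: number -> string.
        new_row["Row ID"] = str(new_row["Row ID"])
        # Custom split: separator "-", keep only the first field.
        new_row["Distribution Center"] = new_row["Order ID"].split("-")[0]
        prepared.append(new_row)
    return prepared


joined = prepare(inner_join(orders, returns, "Order ID"))
for row in joined:
    print(row)
```

Only the orders that also appear in the returns data survive the inner join, which is why the second sample order is dropped from the result.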
Now let's go to the second step of data analysis, that is, choosing dimensions and measures. Dimensions are the descriptive or qualitative data on which no mathematical functions can be performed, whereas measures are the numeric data on which mathematical functions can be performed. Taken together, they help to visualize the performance of the dimensional data with respect to the measures.
Now let's choose some dimensions and measures for analysis. We will choose category and region as the dimensions and sales as the measure. The result shows the total sales in each category for each region. We have dragged category to the columns, region to the rows, and sales to Text in the Marks tab. In this way a table is generated giving the complete analysis of the sales in each category, region-wise.
Now what is the disadvantage of this? Tables are not easy to use for judging performance. You cannot readily compare the performance either region-wise or category-wise. So, to convert this table into a chart, we do the following. Drag and drop the sum of sales from the Marks tab to the Columns shelf. The table showing the numeric values of sales now turns into a bar chart automatically. Then add another dimension, segment, to the existing bar chart.
I will show you in the next slide how these dimensions and measures are chosen. Here, in the columns, we have chosen market first and then sum of sales, and in the rows we have chosen category and segment. In this way it gives the complete analysis in the form of a bar chart. It gives the analysis market-wise, that is, Africa, Asia Pacific, Europe and so on. In each market it gives the sales in the different categories, and that too column-wise, that is, consumer, corporate and home office for furniture, and similarly for office supplies and for technology, and it uses different colors for different markets.
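The text table built earlier in this step, total sales for each category in each region, can be sketched in plain Python as a group-and-sum: the dimensions form the grouping key and the measure is what gets added up. This is a rough illustration only, not Tableau's own mechanism. The sample rows and the helper name `sum_by` are assumptions for this example.

```python
from collections import defaultdict

# Hypothetical sample rows: two dimensions (Category, Region), one measure (Sales).
rows = [
    {"Category": "Furniture", "Region": "East", "Sales": 100.0},
    {"Category": "Furniture", "Region": "West", "Sales": 250.0},
    {"Category": "Technology", "Region": "East", "Sales": 400.0},
    {"Category": "Furniture", "Region": "East", "Sales": 50.0},
    {"Category": "Technology", "Region": "West", "Sales": 125.0},
]


def sum_by(rows, dimensions, measure):
    """Group rows by the given dimensions and sum the measure in each group."""
    totals = defaultdict(float)
    for row in rows:
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[measure]
    return dict(totals)


totals = sum_by(rows, ["Category", "Region"], "Sales")

# Lay the totals out like the text table: categories as columns, regions as rows.
categories = sorted({category for category, _ in totals})
regions = sorted({region for _, region in totals})
print("Region".ljust(10) + "".join(c.rjust(12) for c in categories))
for region in regions:
    cells = "".join(f"{totals.get((c, region), 0.0):12.2f}" for c in categories)
    print(region.ljust(10) + cells)
```

Adding a third dimension, such as segment, would only mean passing one more name in the `dimensions` list, which mirrors dragging one more dimension onto the view.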
Now if I drag segment to Color, the analysis is shown in this way. In the columns there are category and sum of sales, and in the rows there is region. Segment has been dragged to Color in the Marks tab, and here the sales for the different segments are shown in different colors. For example, blue is shown for consumer, orange for corporate and green for home office, and it is shown for the different categories, and that too region-wise.
These are some of the references that I used for preparing this video: tutorialspoint.com and tableau.com, which is the official website of Tableau.