Hello everybody, I am Shukla Ram. I have been doing a small data analysis project here for the last few months. Initially we were working with the edX data; now that the IIT Bombay platform has started, we have started working on the IIT Bombay data. I hope you know about the IIT Bombay courses and what is going on now; we were introduced to all the courses and ideas in the morning sessions. So this is just an introductory talk, and I will cover the basic concepts. I will talk about three things. One is the data: what kind of data we are working on and what the types of data are. That will be my first topic. The second will be the technology we will be using, and thirdly I will talk about the projects.

Now, the data. IIT Bombay generates three kinds of data. One is a MySQL database, the second is MongoDB, and the third is the log files generated by the system. Log files are normally very huge, and to deal with them we will be using big data concepts. MySQL holds what is basically the users' personal data, that is, the students' personal data, then enrollment data and some summary data recording whenever they have accessed the course. All those records are generated in the log files; from there the summary data is collected and kept in MySQL. We will also have the grade information of the students there. That is basically the MySQL data.

MongoDB holds the course information: the courses which are offered by IIT Bombay and, for each course, the course structure, that is, what the chapters or sections are, what videos are covered, and what the problems are. That is one kind of information. The second kind is the forum data: whenever there is a discussion topic, a forum thread is created, and all that information is kept as forum data. These two kinds of information are stored in MongoDB.

Third is the log file. The log file is basically the users' access data, the users' navigation data. Every time a user accesses something, a corresponding record is written to the log file. So these are the three basic kinds of data which we will be working with. That is the data part.

Second, I will talk about the technology: Python, Canvas, and then Hadoop data analysis. Have you heard of Hadoop data analysis? How many of you have heard of it? Two of you. Have you actually worked with Hadoop, or do you just know the concept? For the big data part we will be taking the Hadoop file system, with Spark as the computing environment that works on the Hadoop data. You must have heard of Spark also. Lastly we will work on SparkR, which is an adaptation of Spark for R, and then we will do a lot of visualisation. That part is basically R. R has huge libraries for different kinds of data analysis, like classification, tree analysis, and all those things, whatever is there in standard data analysis tools. R has all those libraries plus visualisation tools; it has many visualisation tools, so we will be using some of them. And Spark, you know, is very fast because it does in-memory computation. Hadoop is not in-memory computation; it writes data to the hard disk, so it is comparatively slow. Spark is very fast, and so is SparkR.
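Before moving on, here is a minimal sketch of what working on such log files with Spark could look like. It assumes edX-style tracking logs with one JSON record per line and an event_type field; the file path and field names are illustrative assumptions, not the actual IIT Bombay setup.

import json

from pyspark import SparkContext

sc = SparkContext(appName="LogEventCounts")

# Each line of an edX-style tracking log is one JSON event record.
# The path below is an assumption for illustration.
lines = sc.textFile("hdfs:///logs/tracking.log")

def parse(line):
    try:
        return json.loads(line)
    except ValueError:
        return None  # skip malformed lines rather than failing the job

events = lines.map(parse).filter(lambda e: e is not None)

# Count how often each event type occurs across all users.
counts = (events
          .map(lambda e: (e.get("event_type", "unknown"), 1))
          .reduceByKey(lambda a, b: a + b))

# Show the ten most frequent event types.
for event_type, n in counts.takeOrdered(10, key=lambda kv: -kv[1]):
    print(event_type, n)

sc.stop()

On a real cluster the textFile path would point at the distributed copy of the logs, and the same RDD operations would run in parallel over all of them.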
That is the technology we will be working with. Now I will talk about the projects. We will have three projects.

First, there is an existing set of basic programs written in Java, in the servlet and JSP framework, and for visualisation right now we are using plain methods. These all exist already, and we have to convert them to Python and the Django framework. That is one project.

Second, we have mostly statistical analysis results. On the students' personal data we have gender-wise, degree-wise, and age-wise analysis, which the platform itself also provides. Then we have enrollment-wise analysis, that is, how many enrollments there are, where they come from, and all those things. Then we have analysis on the summary data, like course participation and course interaction, which covers videos, problems, everything, including navigation. Those are all mostly on the summary data. So what do we want to do now with the log files? We will create a Spark cluster, put all the log data into the cluster, and run SparkR on it. Spark uses the RDD, the resilient distributed dataset structure, so we will be using RDDs to create the data models. So first the existing system will be converted to Python and the Django framework, and after that we will distribute the log files over our cluster; that is, we have to install Spark and the RDD system on the cluster, distribute the log files, and then work on them. While working on the log files we are thinking of doing some prediction and recommendation. When users navigate we can see how they are navigating, so we can analyse the log files and, depending on the users' collected navigation data, predict something for the future or recommend something, like: this user used this one, so you could also use this one (a rough sketch of this idea follows at the end). That is the second project.

Third is making our project future-ready. We have to take care of any future analysis model which somebody may come up with. That means we have to make provision for dynamic creation of models. If somebody creates a new model, they will say which parameters the model takes and whether it creates a visualisation, that is, whether it will produce an HTML file, a PNG or image file, an Excel file, whatever it is. We have to indicate all those things and then register the model, and the system will take it and integrate it with its own main system (a sketch of such a registry also follows at the end). So the next time you want to do the analysis, you go to the main menu, load that program directly, and it will run.

So these are the three projects which we are talking about. She was part of the earlier Nagesh group; for these projects you should choose the Nagesh group.
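Here is the rough sketch of the recommendation idea mentioned above: count page-to-page transitions per user in the navigation logs and suggest, for each page, the page users most often visit next. The field names and toy data are illustrative assumptions, not the project's actual log schema.

from pyspark import SparkContext

sc = SparkContext(appName="NextPageRecommendation")

# Toy navigation events; in the project these would come from the
# parsed log files. The "username", "time" and "page" fields are
# assumptions for illustration.
events = sc.parallelize([
    {"username": "u1", "time": 1, "page": "chapter1"},
    {"username": "u1", "time": 2, "page": "video1"},
    {"username": "u2", "time": 1, "page": "chapter1"},
    {"username": "u2", "time": 2, "page": "video1"},
    {"username": "u2", "time": 3, "page": "quiz1"},
])

def transitions(user_events):
    # Order one user's events by time and pair consecutive pages.
    pages = [e["page"] for e in sorted(user_events, key=lambda e: e["time"])]
    return list(zip(pages, pages[1:]))

pairs = (events
         .map(lambda e: (e["username"], [e]))
         .reduceByKey(lambda a, b: a + b)          # collect events per user
         .flatMap(lambda kv: transitions(kv[1])))  # (page, next_page) pairs

# Count each transition, then keep the most frequent successor per page.
top_next = (pairs
            .map(lambda t: (t, 1))
            .reduceByKey(lambda a, b: a + b)
            .map(lambda kv: (kv[0][0], (kv[0][1], kv[1])))
            .reduceByKey(lambda a, b: a if a[1] >= b[1] else b))

for page, (next_page, n) in top_next.collect():
    print(page, "->", next_page, "seen", n, "times")

sc.stop()

This is only a frequency-based baseline; a real recommender would likely need sessionisation, smoothing, and evaluation against held-out navigation data.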
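And here is a minimal sketch of what the provision for dynamically registered models could look like in Python. All names here are hypothetical illustrations of the plugin idea, not the actual design: each model declares its parameters and output type, and the main system can list and run whatever has been registered.

# All function, dictionary and field names here are hypothetical.
MODEL_REGISTRY = {}

def register_model(name, parameters, output_type):
    """Decorator: register an analysis model with its metadata."""
    def wrap(func):
        MODEL_REGISTRY[name] = {
            "run": func,
            "parameters": parameters,    # parameter names the model expects
            "output_type": output_type,  # e.g. "html", "png", "xlsx"
        }
        return func
    return wrap

@register_model("gender_wise_enrollment",
                parameters=["course_id"],
                output_type="html")
def gender_wise_enrollment(course_id):
    # Placeholder body: a real model would query the summary data here.
    return "<html><body>Gender-wise enrollment for %s</body></html>" % course_id

def run_model(name, **kwargs):
    """Look up a registered model and run it after checking its parameters."""
    entry = MODEL_REGISTRY[name]
    missing = [p for p in entry["parameters"] if p not in kwargs]
    if missing:
        raise ValueError("missing parameters: %s" % ", ".join(missing))
    return entry["run"](**kwargs)

if __name__ == "__main__":
    print(sorted(MODEL_REGISTRY))  # what a main menu could list
    print(run_model("gender_wise_enrollment", course_id="CS101"))

The main system would only need the registry: the menu lists the registered names, and run_model dispatches by name, so a newly registered model needs no changes anywhere else.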