Good evening, everyone. I am Kajol from NIT Patna, and I am presenting before you our implementation of discussion navigation. Earlier, information about student engagement in the discussion forum was not available in the edX analytics dashboard, which is basically a tool used by instructors to track the engagement of students in their courses. So here is a short demo of the work we have done.

Good afternoon, all. Here I am going to give you a demo of the discussion navigation that we have added to our Insights dashboard. Before I proceed, I would like to tell you that we have created APIs to retrieve data from the backend. For example, these are the APIs that come pre-installed with edX, which are used to retrieve data from the backend. You can see that we have developed two new APIs, discussion participation and discussion contributed, to retrieve data from the backend. We have to provide the course ID for which we want to retrieve data. For example, here I am going to paste a course ID for which I have data. I got the error "Authentication credentials were not provided" because I haven't provided an edX API key. Let's check it out.

So you can see that I got some data. Here I got a title, the number contributed and the number of views, that is, each title in our discussion forum with its corresponding number contributed and number of views. For example, for this particular title I have number contributed equal to 10 and number viewed equal to 200. Similarly, for contributed, I will retrieve data from the backend for the same course ID. You can see that the number of uploads is 72, which means the number of students who uploaded any data to the discussion forum for that course ID. This API will be integrated into the discussion tab that we have created.

Yes ma'am, this data comes from the Spark program.

So here I am going to enter the login
credentials and the corresponding password, and sign in. This is edX Insights, the edX dashboard. Here I am going to choose a particular course for which I want to retrieve data. I am an instructor, so I will go to the Instructor tab. We have added our discussion navigation to the Insights site, so I have to redirect to the Insights site. So here I will click on... You can see that there are four navigations here. We have created our discussion navigation under Engagement. You can access it from here, or directly from there.

This is the discussion navigation that we have added, and we have plotted the corresponding graph from the data we retrieved. This is the title and the corresponding views, which we are retrieving from the backend through our APIs. For example, for this title the corresponding views are 48,610. And similarly for contributions: the number of people who contributed to this title. We have also retrieved the number of people who uploaded any data for this ID. We have also displayed the data in tabular form, as Insights does. You can download this data in CSV format. You can see that this is the same data that we plotted above, shown here in tabular form.

A very good afternoon, everyone. We have set up a Hadoop multi-node cluster. We took three slaves and one master, and then we set up Hive, then Sqoop and Apache Spark. After receiving the data from the parser group, which we receive in MySQL tables, we use Sqoop to transfer the data into HDFS. From there we run some Spark tasks, and we did some analytics to answer some questions, the top five questions, which were not answered earlier.

So the first question was this: the top ten comments. For my reference I have taken the course CS101.1x, and we have plotted a graph of the top ten comments. As you can see, "blame plus petty field problem" is the most important comment in CS101.1x. This comment can be useful to the instructor, so that the instructor will get the idea that
these comments are being discussed among the students, so in his future videos or lectures he can include more detail about them.

The next question was the number of questions versus the number of discussions. For this I have again used the same course, CS101.1x. As you can see, for this course questions were around 57% while discussions were 43%. This gives the instructor the idea that, since there are more questions, maybe the students are not clear about certain topics. So in the next question I have tried to find out what these questions are. These were the top 20 to 30 unanswered questions about CS101.

Has a note gone to Dr. Phatak? Please go back to the earlier slide. Okay. The students have doubts regarding your subject. Have you sent a note to him?

So what?

Has your inference been communicated to the faculty?

We have used only a few...

What?

We have used a sample of the log files.

Why?

We got nearly 60 GB of data, which we collected from IIT Bombay. Then we did the parsing and all. Then, just for our reference, we took a sample of the data.

What I am saying is, you people are claiming analytics are very important. I don't believe it, but forget it, we don't want that discussion. And then you are saying that people who believe so strongly in analytics have spent 8 weeks on it, okay, working on answering questions which were not there, and the result of the question has not been communicated to the faculty.

This is experimental.

Then I don't want to see experimental values. I want to see actual results.

We can use the same code. Our code will...

In a presentation at the end of your thing, you are saying that you have not passed even one single course through fully. Why? So what was stopping them for 8 weeks?

Sir, from the start we needed...

No, no. Even if it takes 4 days to crunch, why have you not crunched?
You are not crunching. Why? I said you don't have to work. The machine is crunching. Let it go for 4 days. You have got 8 weeks, and you don't take one full course, and I don't get any output of your thing. Okay? How much I fired my poor library guy for not doing the work and reporting completion! The same thing is true with you people. No, no, no, the same thing is true with these people. Okay? You have to produce for one course at least. Okay? Take all the data and run it. Which one of you is there for one more week? There is one person, correct? So you are there for one more week. Take one course. Let us see whether in Dr. Phatak's course questions are more, discussions are more, or whatever it is. Okay? Complete one at least.

He said complete one at least.

The machine has to do the work; you don't have to. Sample data is not correct. We cannot believe that your software is working unless you complete one course. In the middle you will say all the blocks are ruined, my process is dead. That we have to know. One data set you have to do.

So, again, we tried to find the most viewed comments. Again we used the same course, CS101.1x. Basically, the most viewed comments...

Okay. I need to know if there is an outlier. Okay? If I run it on the full course, across all the comments, there should be some comments which are way above and some which are way below. Most of them are average; those I don't want to see. The reason for running a full course is that when I do analytics, I have to concentrate only on outliers. As a faculty, I need to know what the outliers are. If there are no outliers anywhere, if everything is the same, within a range of, say, plus or minus 10%, then that data is useless. It doesn't get me anything. Okay? That is why it is necessary to run it on the full course. Okay?

With analytics you don't run it every day. You run it at the end of the course.

No, some feedback has to be there. Okay? Basically, what is valuable feedback that a faculty can get? You people should put it in your project report. Okay.

Good evening
everyone, I am Manshu. I will talk about Jenkins. Jenkins is a powerful open-source application that allows continuous integration and build automation regardless of the platform we are using. It is used to build and test the system automatically. When someone commits their changes, it integrates the system automatically, and if there is an error, that error can be sent to a person by a notification. After installing, we should configure Jenkins. Security must be enabled, and the paths of Java home and Git should be given.

Transferring tracking logs to HDFS.

Are the two connected?

No, no, no. We have to install Hadoop on the server on which Jenkins is running. It is a Jenkins scheduler which will do the transfer. Every day new log files are created and stored on a remote server. These log files should be transferred to HDFS on a local server so that we can run analytics tasks on them. The reason we use Jenkins is that we should automate this task, so that it can run every day at midnight and transfer the tracking logs from the remote server to HDFS on a local machine. So these are the... I cannot show the demo because there is no LAN wire and I have to access the...

There is a course running on the production server right now. Okay?

Correct.

Why is your system not used to test that course? And secondly, you are saying that you will download data every day and transfer the data to HDFS every day. Doesn't matter. Every day you are doing some activity, correct? It is done automatically. You, through the agent called Jenkins, are doing something every day, correct? Okay. Does the faculty know that you are doing it every day? First of all, why are you doing it every day? Why not at the end of the course? I assume you are doing it every day because you want to have current data, and so current data and current analysis. Can you not use Jenkins, okay, to run her program, that program, at the end of the day, so that early in the morning, 6 o'clock Indian time, not US time, the faculty gets a report of the analysis too? Why not?
This... So... Correct. So I am asking: currently, is Dr. Phatak getting a report every day?

No.

Then he is not doing... What are you saying he is doing?

No. Nothing. No, work in progress.

He is downloading data every day. That person's question, whatever question he is answering, he is answering every day. The only thing is, you are not reporting to Dr. Phatak, because you are... I don't know. Are you running it on IITBombayX production data?

No, it has been run on the edX platform.

On the edX platform? What edX platform? Which course are you taking, downloading data every day?

The tracking logs are from IITBombayX CS101.

CS101, as far as I know, is not running every day. Is it a running course now?

No.

What he is doing every day is my question. The only course which I think is running is Dr. Phatak's course, which is something else, learning technology or something like that. That is the course that is running on IITBombayX.

The log files which we get, every day it will transfer them to HDFS.

But are you monitoring the current course that is running?

No, I am not.

Then what are you monitoring every day? There is one course which is running. You are not monitoring that.

That is production. We are talking about production. This is what we are doing: we are developing.

But he says it is implemented. "I cannot show you a demo" is all he said. "I can show..." What have you tested? Why have you not tested? I don't understand. Why have you not tested on real live production data?

Because I didn't...

But why are they not doing it?

The operations people have to give the access.

No. What operations people have to give the access?

The server has to be... What we have done is, we have taken the log file from him, and the operations people have to give the access.

Are you saying that IIT Bombay does not have access to the S3 server every day? If that is true, then there is no point in developing anything. You are using the S3 server. You are taking the data from S3 and loading it into the local Hadoop file server, no?

Yeah.

IITBombayX is producing data on Amazon S3. Correct? Does
Dr. Phatak, does he have the authority to download data every day? That is question number one.

He has.

Then why are you not using it? You are downloading data. Data can only come from S3. There is S3. You have got this big parser of yours which is taking data and storing it, that whatever pipeline you said. So that is being stored in the local Hadoop. Then why is it not being used? But then on what have you tested? If you are not testing on production data, on what have you tested this?

On sample log files.

You can't have everyday data, no?

I generally put data on that.

Production data is available. You are not causing any damage. I will ask another question. Is Dr. Phatak prevented from accessing the data more than once?

I was thinking that you did not take it, that once Dr. Phatak has gone, then Dr. Phatak's data will go. He has permission to source it from there.

Have you asked? Show me the documentary mail. Show me a mail. I want to know: if my data and Dr. Phatak are not adversely impacted, who is stopping it? You have to test it.

There are a lot of other things...

No, that is exactly what I am saying. No, no, no, no. This is not... I did not say that. I am saying, are you causing any harm by pulling data from S3? If you are not causing any harm, why have you not pulled? Your software may not work. You will destroy the data every day. What is the problem? Your software not working is perfectly fine. I am just talking about test data which is available. A large amount of test data is available. I can have performance testing on your software. I can know whether her code requires 48 hours to run or his code runs in 48 seconds. All that valid information is lost because you have refused to use the data.

We can even make the data available as open source for other sources.

It will not be tested. No, it will not be tested. It will not be tested on S3. Anything that is not tested on large data, I would not call it tested.

Next I will call Ipshita for conclusions.

Moving over to the future work.
As of now, only two tasks have been converted from MapReduce to PySpark. What can be done in the future: this can be extended to the entire Enrollment section.

There is one thing which is absent in the future work. I think you should add it. Okay? Run it on actual production server data. Why is it not there? That is future work.

The event-catching mechanism can be made better. And now in the discussion menu we are showing only one graph, so similarly whatever queries we have answered can be integrated. Pradeep will continue.

Moving on. The LMS-to-HDFS transfer through temporary storage can be avoided; this is making the system inefficient. One more idea that we had was to apply machine learning techniques to draw meaningful conclusions from the comments on the discussion forum. Also, our parser can be optimized by reordering the data. In the end, we would like to thank our lovely mentor, Mr. Shukla Naag, for his constant support and encouragement. We would also like to thank Dr. D. B. Phatak sir for this wonderful program.

I want to tell all of you one thing. Okay? Part of the thing here is, as somebody said, learning industry grade. Okay? This is college grade. Industry grade means you have to test with whatever opportunity you get. Okay? Testing is the most important thing. My people know that I have been hammering them on testing. Okay? For the rest of you: testing, testing, testing, testing, testing. Your software is only as good as it is tested. Otherwise it is college software. Okay? You can walk through any interview, okay, any interview, if the person knows that you write tested software. Okay? Because that is the differentiator between college-level software and industry software. Only that. Because what the thesis lies writes all kinds of junk, absolutely all kinds of junk. Hopefully she is still around. Okay? But it is tested. Okay? And I believe in thesis testing. At least in my time they were actually testing. The software was junk, only junk, but the testers will catch it. Okay? That is the only difference.
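
The outlier check requested during the review of the most viewed comments could be sketched roughly as follows. This is only an illustration under stated assumptions, not the team's code: the function name `find_outliers`, the thread titles and the view counts are all hypothetical, and comparing each thread against the median view count is an assumption. Only the plus or minus 10% band comes from the discussion itself.

```python
from statistics import median


def find_outliers(view_counts, tolerance=0.10):
    """Group thread titles by how their view count compares to the median.

    A thread is "typical" if its views fall within +/- tolerance of the
    median, "above" if it exceeds that band, and "below" if it falls short.
    """
    result = {"above": [], "below": [], "typical": []}
    if not view_counts:
        return result
    mid = median(view_counts.values())
    low, high = mid * (1 - tolerance), mid * (1 + tolerance)
    for title, views in sorted(view_counts.items()):
        if views > high:
            result["above"].append(title)
        elif views < low:
            result["below"].append(title)
        else:
            result["typical"].append(title)
    return result


# Hypothetical thread titles and view counts, for illustration only.
sample = {
    "thread A": 200,
    "thread B": 210,
    "thread C": 195,
    "thread D": 900,
    "thread E": 20,
}
report = find_outliers(sample)
print(report)
```

Run over a full course instead of a sample, the "above" and "below" lists are what a faculty member would read first. If both come back empty, the data says little, which is the point made in the review.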