Good morning everyone. We are the OER team, and today we are going to present the work we have done during this internship period. Our project was the enhancement of an OER repository for NVLI. These are the topics we are going to cover in our presentation. Since the topic of our project is the enhancement of an OER repository, first I am going to explain what OER is. Open Educational Resources are materials that are useful for teaching and learning and are either released into the public domain or released under a license that allows them to be freely used, changed, and shared with other people. An OER can be anything from a single video or lesson plan to a complete course. The software platform we have used for our OER repository is DSpace. It is an open-source repository software that focuses on the long-term preservation, storage, and access of materials. The source code of DSpace is written in Java and makes use of the Hibernate and Spring frameworks. It allows us to capture and ingest material, including metadata about the materials, and it also provides users with enhanced searching and listing of the content. This is the architecture of our OER repository. There are multiple departments; under each department there can be multiple subjects; under each subject there are different courses; and under each course there are multiple learning objects. Learning objects are basically videos or PDF files. DSpace provides the basic features we need in an OER repository, but there were certain features that we thought would be useful for users and are not present in DSpace, so we modified the code of DSpace and added certain modules to it. The first thing we added is a personalized dashboard; the second is a rating system; then a search query tag cloud; and a recommendation system.
The purpose behind adding a dashboard is that it gives the user easy access to the content he likes, because we provide a facility to add courses to his dashboard. Whenever the user browses the content of the repository he can add the courses he finds interesting, and later when he logs in he will get all those courses on his dashboard. This provides easy access to the content and saves him the time of browsing for it again and again. We have provided three features: first, adding courses to the dashboard; second, if the user does not want to continue a course, or has completed it, he can delete that course from the dashboard; and third, displaying all the added courses on the dashboard whenever he logs in. The second module we added is a rating system. DSpace by default does not provide any rating system, so there is no way of knowing how good our content is. Our rating system allows the user to rate each learning object on a scale of 1 to 5. With every learning object we provide a rating scale, and using the individual ratings of all the learning objects we can compute the cumulative rating of the course. That was all about the rating system. The third module is the search query tag cloud, which Prem will explain. Good morning everyone. Another feature we have added to DSpace is the search query tag cloud. The idea is to show the words that are most frequently searched in a large font size and the words that are least frequently searched in a small font size. It gives a nice visualization such that the user can understand the trending searches very easily. It is implemented in three phases: query word extraction, removal of stop words, and visualization of the data. Let us see the implementation in detail.
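A rough sketch of how the cumulative course rating could be computed from per-object ratings, assuming a hypothetical in-memory data layout rather than the actual DSpace tables:

```python
def cumulative_rating(object_ratings):
    """Average the per-learning-object averages to get a course rating.

    object_ratings: dict mapping a learning-object id to its list of
    1-to-5 user ratings (a stand-in for the real rating table).
    """
    per_object = [sum(r) / len(r) for r in object_ratings.values() if r]
    if not per_object:
        return None  # no rated objects yet
    return round(sum(per_object) / len(per_object), 2)

# Hypothetical course with two learning objects.
course = {"video1": [5, 4, 4], "pdf1": [3, 5]}
print(cumulative_rating(course))  # 4.17
```

Averaging per-object averages (rather than pooling all ratings) keeps a heavily rated object from dominating the course score; either design is defensible.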
All the actions of the users are stored in log files in DSpace. From the log files we have to extract the search queries, so we give the log files to a log parser, which parses them and extracts the search queries. Let us see an example. Say these are the words extracted from the log files after the first step. Using a stop-word list, stop words such as "a" and "the" are eliminated from these query words, the frequency of each word is counted, and a key-value pair of search query and associated frequency is generated. The next part is visualization: a random color is generated for every word, and the font size is decided based on its frequency. After the second step, "dspace" is repeated three times and "repository" one time, so "dspace" gets a larger font size than "repository", and the user can understand the trending searches very easily. The next feature will be presented by Prasad. The next feature we have added to this NVLI OER is a recommendation system. Why do we need a recommendation system? The OER repository is meant to host material in large amounts, so the user has many possible paths through the content. The recommendation system predicts one path which is suitable for the particular user, based on his or her past history in the repository. Recommendation in this OER can be done at two levels: at the learning-object level and at the course level. To implement recommendations at the learning-object level we use a keyword-based approach; to recommend courses we use association rule mining, a part of data mining. This is the architecture after adding the recommendation system to the previously existing DSpace, and the first approach I am going to explain is the keyword-based one.
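The stop-word removal, frequency counting, and font-size scaling phases can be sketched roughly as follows; the stop-word list and the 12–36 px font range are illustrative assumptions, not the project's actual values:

```python
from collections import Counter

# Illustrative stop-word list; a real deployment would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of"}

def tag_cloud(query_words, min_px=12, max_px=36):
    """Map each searched word to a font size proportional to its frequency."""
    words = [w.lower() for w in query_words if w.lower() not in STOP_WORDS]
    freq = Counter(words)
    lo, hi = min(freq.values()), max(freq.values())
    span = (hi - lo) or 1  # avoid division by zero when all counts are equal
    return {w: min_px + (f - lo) * (max_px - min_px) // span
            for w, f in freq.items()}

# Example from the talk: "dspace" searched three times, "repository" once.
print(tag_cloud(["dspace", "dspace", "the", "dspace", "repository"]))
# {'dspace': 36, 'repository': 12}
```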
The idea here is to build a model so that whenever we want to recommend items related to a particular item, we can go into that model, find the items most similar to it, and recommend them. For that we use keywords as a representative of each item; the metadatavalue table in DSpace holds the metadata values. In the first step, feature selection, we extract the keywords of each object and prepare the data. In the next step, since in any machine-learning or data-mining task it is easier to work with numbers, we transform the OER space into a number space using the TF-IDF metric. In this manner each item is represented as a feature vector, and it becomes easy to apply mathematical operations. We use cosine similarity, which calculates a similarity measure between a pair of items. We give the feature vectors to the cosine similarity computation and get an n-by-n matrix of similarity measures. Then, whenever we want to recommend items related to a particular item, we go into that similarity matrix and fetch the items with the highest similarity measures to it.
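A minimal sketch of the TF-IDF and cosine-similarity steps just described, in plain Python; the keyword lists and the unsmoothed IDF formula are illustrative assumptions, not the exact DSpace implementation:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Turn each item's keyword list into a TF-IDF feature vector."""
    n = len(docs)
    df = Counter(word for doc in docs for word in set(doc))  # document frequency
    vocab = sorted(df)
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append([tf[w] / len(doc) * math.log(n / df[w]) for w in vocab])
    return vectors

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Hypothetical keyword sets for three learning objects.
items = [["java", "spring"], ["java", "hibernate"], ["python", "mining"]]
vecs = tfidf_vectors(items)
# Items sharing the keyword "java" score higher than items sharing nothing.
```

Computing `cosine` over every pair of vectors yields the n-by-n similarity matrix mentioned above.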
To avoid searching the whole matrix every time we want to recommend a few items, we transform the similarity matrix into a rank matrix. Now we have the model; whenever we want to recommend items we can go through it, but only on the basis of the user's history. For that, DSpace provides an inbuilt event-logging mechanism. We take the log files, parse them, and extract the history: given a user name, we filter out the items he has viewed and obtain a set of item handles. Since we built the model with object IDs, we convert these handles into object IDs, go into the rank matrix, which is our model, fetch the recommendations, and display them in DSpace. The next approach will be explained by Muskan. As mentioned earlier, the first approach recommends at the learning-object level; this approach recommends at the course level. The technique used here is association rule mining, and specifically the Apriori algorithm. Apriori is an algorithm for frequent-itemset mining and association rule learning over transactional databases; that is, it is used to determine the rules that bind objects together in a database. There are some important terms you should know. A k-itemset refers to any set of k items taken together; this term is important for Apriori because we compute itemsets of increasing size to finally find the rules. Another measure for determining the relationship between objects is support.
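The rank-matrix transformation mentioned at the start of this step can be sketched like this; the similarity values are made up for illustration:

```python
def rank_matrix(sim, top_k=5):
    """For each item, precompute its most similar items, so recommendation
    becomes a lookup instead of a scan over the similarity matrix."""
    ranked = []
    for i, row in enumerate(sim):
        # Sort the other items by descending similarity to item i.
        order = sorted((j for j in range(len(row)) if j != i),
                       key=lambda j: row[j], reverse=True)
        ranked.append(order[:top_k])
    return ranked

# Illustrative 3x3 similarity matrix (diagonal = self-similarity).
sim = [[1.0, 0.8, 0.1],
       [0.8, 1.0, 0.3],
       [0.1, 0.3, 1.0]]
print(rank_matrix(sim, top_k=2))  # [[1, 2], [0, 2], [1, 0]]
```

Given a user's viewed object IDs, recommending is then just a union of the precomputed lists for those IDs.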
Support tells us how popular an item is, that is, how frequently it occurs in the database. Confidence measures how frequently an item y occurs when item x occurs, that is, how often the two items occur together. This is very helpful for association rules, because a rule makes it into the final set of rules only if it crosses a confidence threshold of our choice. The Apriori algorithm finds application in market basket analysis. For example, suppose there are four baskets with different grocery items purchased by users, and in three out of these four, milk and bread occur together. As a rule, we can then recommend bread to a user who buys milk, based on the history of other customers who have purchased these items. In our OER repository the courses can be likened to these grocery products, and a similar analysis can be applied to them. For example, if two courses such as machine learning and neural networks have been viewed together by many users, then the next time a user views the neural networks course we will automatically recommend the machine learning course to him. This is the implementation model of this approach. First we take the OER repository log files, which contain many fields such as transaction ID, collection ID, event ID, and so on. We pass them to a log parser, which gives us only the fields we require, such as the user ID and the collection ID viewed by the user. This CSV file is then passed to an Apriori library, which calculates the association rules, that is, how transaction records map to each other, and we get a rule set relating different objects. Then our recommender fetches the history of a user, passes the list of items viewed by that user to this function, fetches the association rules for those items to see which items they map to, and recommends those items to the user.
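The support and confidence measures on the four-basket example above can be sketched as:

```python
# The four illustrative baskets from the market-basket example.
baskets = [
    {"milk", "bread"},
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"milk", "eggs"},
]

def support(itemset):
    """Fraction of baskets that contain every item in the itemset."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(x, y):
    """Confidence of the rule x -> y: support(x union y) / support(x)."""
    return support(x | y) / support(x)

print(support({"milk", "bread"}))       # milk and bread co-occur in 3 of 4 baskets
print(confidence({"milk"}, {"bread"}))  # 0.75
```

Apriori's contribution on top of these measures is pruning: it only extends a k-itemset to (k+1)-itemsets if the k-itemset itself already meets the support threshold.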
Now we come to the challenges faced in implementation. First, working with new frameworks: DSpace uses frameworks like Spring and Hibernate, and it was really difficult to understand the intricate functionality that comes with them, so it took us time to understand how they fit together. Then, understanding Flyway database migrations: it was a major challenge for the team to create new tables in the database and to fetch data from them, for which we had to write queries we were not familiar with and came to know about later. Then, integration of the recommendation system with DSpace: all the scripts of both recommendation systems were written in Python, and DSpace is purely Java-based. We started by installing Jython to call these Python scripts from the Java code, but that posed some problems, so we used SQL triggers and functions to finally integrate our recommendation models. Future scope: first, video and document streaming, which we can provide for a better user experience and to reduce network traffic; and second, incremental maintenance, since the algorithms we used can be implemented more efficiently with updating algorithms such as the fast update algorithm for Apriori, so that they can handle incremental changes to the database. As time proceeds the log files generated will be huge, and our current algorithm will not be able to handle that, so we can implement the fast update algorithm for this. During the internship we learned new and improved skills and how to apply them; we also became well versed in professional communication, and nurtured leadership, confidence, and responsibility in all of us. With this we come to the end of the presentation. Thank you. Good work, but a few questions. At the very beginning of your presentation you used the term "department" as part of the classification. What is the notion of a department?
I suggest that you picked up this term because that is how you see content getting organized in educational institutions. Actually, a department is an administrative entity within an educational institution; it is not a knowledge entity. So you may call it a domain or something, right? Anyway, one question: when you were discussing the similarity matrices, you used the term cosine. Why is the word cosine used? Sine and cosine are trigonometric terms, so is there any reason you think cosine is used? Just calculating the angle between two vectors. No, no, that's okay. But why cosine? Why not sine? Why not tangent? Why not cotangent? Anybody here? Why is cosine used? Yeah. No, no, okay. In vectors, if you are calculating an angle you use the dot product. Yes, but why the cosine term? What does it represent? It represents the component of one entity along another, and that is where the similarity comes from. Whenever two things are orthogonal, cos theta is zero, so we often say that two things are unrelated when cos theta is zero. That is the reason the term cosine is used. However, when you talk about similarity here, it goes far beyond just one component being similar, because you are dealing with multiple dimensions. There is one more question that I had. This is something that you yourself mentioned: there is too much material, right? Now imagine DSpace, our repository, after, let's say, five years, with about 100 million artifacts: digital contents, small clips, video clips, PDF files, et cetera. And you are talking about a recommendation system. How do you ensure that I, as an individual, do not get as a recommendation from you about 200 videos to be watched and 500 PDF files to be read, because my performance or something is at this level? I'll go dead reading and viewing those, so your recommendation system will actually kill me. Sir, we only show the top few objects that come to the top of the table. Good, I expected this answer.
How are you sure that those top five are relevant to me as an individual? If two of us have similar interactions and similar problems, both of us will get exactly the same top five. For me, the 11th might be most relevant; for him, the 23rd might be most relevant. The point is, you don't know about me or him. You are treating both of us as identical, based only on the responses you have captured from our online behavior. So this is the larger problem. I don't recall whether I mentioned the 14 grand challenges which the US National Academy of Engineering initiated a few years ago: clean drinking water, cheap solar energy, et cetera, grand challenges for this century. The National Academy of Engineering expects that mankind will spend this century solving some of these problems. One of those 14 problems is personalized instruction, and these recommendation systems are actually a small effort, a step, towards such personalized instruction. Ideally, if I am a student attending your class, I would like the personal attention of the teacher. I would like the teacher to know me well and teach me exactly what is most relevant, most adaptable by me, and most required. That can happen only if a teacher is doing private tuition for one or two students; for a class of 60 it will not happen, and for one million students it will not happen. One of the important initiatives our educational technology research group is taking is to create an individual model of the learning behavior of every student. Learning behavior is reflected not only in the online activities that I do but elsewhere as well, and that is not captured. For example, when I am not studying online but am discussing a problem with him, or doing a team activity in my college in my spare time, relevant to a subject, this is not captured.
So one of the things that we are suggesting, and this is an important suggestion for you, which you should implement and get several other students to implement as well: start writing a professional diary of the activities you would like to monitor. How much time do you spend reading something? How much time do you spend discussing an issue with somebody? How much time do you spend in contemplation? These things are not recorded, yet they are the most critical to understanding how we learn. If you want to create a learning model, these events, which lie outside the online system, will have to be captured, and that is going to be the future direction of this work of capturing information. It is like how Google captures information about you: where you are, where you are moving, et cetera. They model it for different purposes, for commercial purposes, but this is the kind of modeling that is to be done. It is in this context that I wanted to mention that just giving the top five will not be good enough. So how would you answer if I said that instead of giving the top five, you give me those which are most relevant to me as an individual? Forget the capture of other information; just from the events that you are capturing, for him, for me, for others, is there any mechanism for you to distinguish between me and him? Think about this, with the events that you are capturing alone. So what are the events that are being captured for the recommendation system? The performance and... For the collection IDs, we would... You can actually get much more information, no? You are not using it right. You can get the time that he and I individually spend on different questions. You can get which questions I answered correctly and which questions he answered correctly. You can get which difficult questions he answered and which I answered. Would you not then be able to distinguish between him and me?
Now, this was completely beyond the scope of your internship project. But what I am saying, and I am saying this to all of you, is: keep this in mind. From the events which are captured, will you be able to distinguish between individuals with greater clarity? That's all. All right, thank you.