Nancy is working at CORE in the United Kingdom as Open Access Aggregation Officer, and she is involved in two very interesting projects about open access and open science, FOSTER and OpenMinTeD. Before we properly start, I would like to remind you that you can type any question you have in the chat, or go to menti.com using the code 489563 (I'm typing it in the chat again for your convenience) and ask your question completely anonymously, in case you feel shy about asking questions directly. We have already collected a few questions on Mentimeter, so before asking Nancy to answer them, I would also like to remind you that we are going to have a full webinar on OpenMinTeD tomorrow at 2 p.m. with Nancy again, and you are more than welcome to join it. Nancy, would you like to briefly introduce the main topics of tomorrow's webinar?

Ilaria, I don't think I have a webinar tomorrow, not on OpenMinTeD.

Oh, God.

There is one, but I don't think I'm the one presenting it. Okay, because I was a little bit confused.

Okay, that's my fault, so sorry everybody: there is going to be a webinar tomorrow about OpenMinTeD, but not with Nancy.

But in my presentation on text and data mining and machine readability, which I have already given and which is already recorded, I presented the work that we did in the OpenMinTeD project. Because part of the OpenMinTeD project was very technical, what I presented was the non-technical part: the work that we did for librarians and for research support and administrative staff. So if people want to find out more about this and get the full picture, I think it would be good, if they haven't done so already, to go back and watch the webinar I gave on TDM and machine readability before tomorrow's webinar, and then they can also come to your webinar tomorrow.
Okay, thanks Nancy, and apologies all for mixing things up, but it's a very busy week with all these webinars. I'm putting in the chat the recording of the webinar that Nancy already recorded on text and data mining and machine readability of open access research. Please remember that this webinar, well, this tutorial, has already been recorded and is available on the page collecting information about Open Access Week on the OpenAIRE website. Okay, so just to kick-start this question-and-answer session, I will share my screen so that you can read the questions that have already been asked, Nancy. Alright. Okay, are you able to see anything?

Yes, I am.

Okay, so the floor is yours.

So the question is: is text and data mining something that a librarian, rather than an IT-skilled person, can approach? That is a very good question, and before I answer it, let me tell you that I am not an IT-skilled person. I am a librarian; my undergraduate degree is purely in library science. In the past I worked in libraries as a regular librarian, dealing with the day-to-day aspects of a library, and after my PhD I moved on and became a repository manager, but I was still based in a library.
So I'm not a technical person, and I don't approach text and data mining from an IT point of view. That said, the IT or technical point of view is only one aspect of text and data mining. It can be difficult if you are not familiar with the technical components, and I don't think anyone expects all librarians to suddenly start studying computer science or to magically develop technical skills. Nonetheless, text and data mining has other components, and one of them is the legal component. As far as I can tell from my experience as a former librarian, and from the librarian colleagues I have worked with over the years, librarians are the ones who know the law well: they know copyright law very well and they know how to interpret it. The legal aspects of text and data mining relate to copyright law and to the fact that when an output is accessible for people to read, it is not necessarily equally accessible for machines to read. Those are the small tricks of copyright law that librarians know very well how to handle. The second component, which relates closely to the legal one and also moves us towards practical, organizational and policy matters, is that librarians are the ones who have, in the past, pointed out the parts of an organization or of the law that create barriers to promoting a specific practice or tool and engaging users with it. So that is the second area that libraries, librarians and research support administrators in general could get involved with: the policies, for example, that funders or big organizations create. What librarians, repository managers and research support staff can do is discuss these policies with those people and provide feedback and comments on the limitations, on those difficult areas where
one practice cannot be applied, and push for a change in the system. For example, here in the UK (people who are not from the UK may not be aware of this), we recently got an exception in copyright law for TDM. The law says that anyone who has lawful access to the content can mine it. Let's say that a person is affiliated with an academic institution such as a university: if the university subscribes to the resources of one or more publishers, and the person has lawful access to those resources, then the person can carry out text and data mining without having to sign separate agreements with each publisher. So this is an exception in UK law, but those who support the practice of text and data mining can push for similar exceptions in the laws of each country. We also see something similar in the European Union, which is moving towards a freer and more open law that will enable text and data mining for research purposes. Pardon me, I've been feeling a little under the weather lately. The last thing I wanted to point out is that librarians, or the people who do research support (who do not necessarily have a library degree but work in a similar environment), are the ones who have in the past provided training on every new trend as it came up. First it was information literacy, then open access, and later research data management. They also provide training on other tools that researchers and students use, such as citation management tools, or on depositing in a repository, for example. In a similar way, libraries can create training on text and data mining. Now you may think: a librarian is not a technical person, so how is a librarian going to be in a position to create training on text and data mining? Again, this doesn't have to be technical training. You can leave the technical training to a person who has technical skills, and maybe this
needs to be at a departmental level, because not all of your students or researchers will need to know the technical side of text and data mining. What they may need to know about, for example, are resources where they can go and collect text and data, a large corpus of information, for text and data mining purposes. Or take a PhD student in your team, or a researcher in history: perhaps this person doesn't have the skills to perform text and data mining, but it would be good for them to know that text and data mining practices exist, and that these practices have certain benefits for discovering knowledge and information that could be useful to them. To back up what I am saying, take the MIT Libraries, for example. I'm going to paste the link in the chat area, and I think I will do it correctly. So I'm sharing a link in the chat: this is a very simple page from the MIT Libraries website which lists the APIs offered by organizations, institutions or services that hold scholarly communications content. So workshops and trainings do not necessarily have to be technical (they can be, too); they can also cover the organizational side of text and data mining, the policy side and the legal side. And don't take it for granted that even the IT-skilled people who perform text and data mining are familiar with its legal aspects. From the experience I had with the OpenMinTeD project, we realized that there are people who perform text and data mining without being 100% sure that they are allowed to, that is, that they are not infringing copyright in any way. We have also seen cases where people wanted to perform text and data mining but in the end didn't, because again they were not quite sure about the legal issues around the texts that they
wanted to work with. So I think that there is room for libraries, and for people without IT skills, to work on the subject of text and data mining.

Thanks Nancy, it's good to know that libraries can play a relevant role in an area wider than libraries themselves. That's very interesting. The next question is: what are the practical uses of TDM?

I will answer this from my own perspective, because I do not come from a company, and my focus and interest are more in applying text and data mining to open access content, since nowadays we are in a position to say that there is a lot of open access content out there. So I will briefly mention the practical uses on the business side without exploring them further, and then focus on the research point of view and on content that is already open access. From the business point of view, there are a lot of applications, related to knowledge management, for example, or to cybercrime prevention, and there are many businesses that use text and data mining to help their customers. For example, we all go to supermarkets, and the vast majority of them now have loyalty cards that you scan every time you buy something. One of the reasons these cards exist is that, in the background, they collect information about your shopping habits, so that the supermarket can then help you with your future needs. That's why, whenever you get one of these cards, it comes glued onto a piece of paper with tiny little letters on the back that pretty much none of us reads, because that is what they say: that they are going to collect information from your shopping activities and habits so that they can
formulate services for their users. Text and data mining is also used a lot in web search: all the major search engines use text and data mining. In general, for text and data mining to be successful, the people who perform it need a very large corpus of data, text or whatever it may be; text and data mining works better the larger the corpus of information is. So that is one point to keep in mind. The other point, which brings me back to the researcher's point of view, is this: do not take it for granted that everything people's eyes can read, machines can read as well. I developed this point in detail in the webinar I gave a couple of weeks ago, which is already on YouTube for you to watch. So, leaving the business side aside and looking at it from the point of view of researchers and scholarly communications outputs, there are two prerequisites to keep in mind: the first is a large corpus and the second is an open license. That is what makes open licenses so important, because currently, especially in repositories, there is a lot of content that a user can go in and download, or perhaps read, but this doesn't necessarily mean that machines can do the same, or that people can use it for text and data mining purposes. From the scholarly communications point of view, which was also the focus of the OpenMinTeD project, one of the practical uses of TDM is, for example, mining bibliographic data. That happens a lot with the data in the project I am currently working on, CORE. CORE is an aggregator, a harvester of open access content: it harvests repositories and journals, both pure open access journals and hybrid journals, and then we give all this information out to the end user for
free via our API. Users can then go in and retrieve bibliographic data on the specific topics they are interested in, and they can also carry out what in TDM is called sentiment analysis. Now, I know that I am using a bit of jargon, and as I said earlier, I am not a technical person, so I am not going to tell you how sentiment analysis is done; I am not familiar with how it is performed. But what I can tell you is that with sentiment analysis, someone can understand the meaning of papers: is it positive or is it negative? For example, there may be a paper in the medical field with a very large number of citations, and people may think it is a very good paper because it is so highly cited. Yet it may have that many citations because they are negative citations: people cite the paper in their own work to indicate that something was wrongly discovered, for example in a passage that says "be careful, don't do these things"; they want to say "don't do what paper X has done". So with sentiment analysis, people are able not only to find the meaning of papers but also to find out whether that meaning is positive or negative. And the corpus of information we have today is so large that even if people read papers 24/7, and even if, for some weird reason, they could digest all this information and make the connections, which is not humanly possible, they still wouldn't be able to find all of it. Another practical use of text and data mining is what we call document summarization. As the name suggests, these are techniques that let someone quickly find out what the main concepts in a document are. So, to go back to the history example I gave earlier, let's say that there is a researcher
or a PhD student in history who is not aware of text and data mining. If this researcher or PhD student learns about text and data mining, they can have someone else, an IT person, perform document summarization, for example, and extract the main concepts of the documents. This technique can then find out, from thousands of papers, which hundreds are useful to the researcher or PhD student, and there are also other techniques to narrow those papers down to exactly the ones the researcher needs. Maybe you will think: yes, but we have databases where we can search, like ProQuest and EBSCO and so on. Bear in mind, though, that those databases have certain fields: they search the title, the text or the abstract, without necessarily applying text and data mining techniques. For those who are not IT people, who don't feel they have technical skills but would nonetheless like to see a little of how text and data mining and its different practical uses work: in the course that we created for the OpenMinTeD project, we tried to create some non-technical exercises, and we also give instructions on how to carry them out. We used a limited volume of the CORE API to show the possibilities of text and data mining in a way that is useful to non-technical people. So you can go into the course (I'm trying to find the URL right now; I will paste the link here in the chat) and try the technical part, and I hope you are going to enjoy it.

Thank you very much, Nancy. For the participants, I would like to remind you that this webinar has been recorded, and the links that Nancy is putting in the chat will be available on the OpenAIRE portal together with the recordings and the
tutorials she has already recorded. There are two more questions. This one you can see on Mentimeter: "Do you have any examples of TDM in the humanities and social sciences? I have difficulties convincing my HSS colleagues that this is important."

That's a very good question. I wish I had received it yesterday, because by now I would have found something. I'm afraid I don't have anything at the moment, so could the person who asked this question please send an email to either me or Ilaria? I will try to find information and send it to you. I don't have anything off the top of my head; I know that I have seen something, but I cannot locate it right now.

Yes, I think that's feasible. So, whoever raised this question: I am going to write my email address in the chat, and I am also putting the OpenAIRE contact email for the webinars, so you can send your question there. Okay, thanks. There is another long question, by Garrett, and Nancy, I suggest that you read it directly from the chat.

Okay. It starts from...

Yes, exactly.

So, reusability... Okay, so Garrett poses a very good question, and the question is not so much about textual content but about research data, which is what makes it more difficult than open access. In general, I am not an expert in research data, because I got involved in the open movement early on, when research data was not so much the norm, and I got involved more with open access. I am kind of lucky that I was, because research data is more difficult to mine and deal with. Now, I also see another comment saying that this applies to all content types, including research papers. For research papers, the license needs to be open not only for people but also for machines, and the same applies to research data. The difficulty with research data is that it comes in various forms, so further work may need to be done before the data is ready for text and data mining. So that is one thing. Now, with regard to the
preservation: this is another important point, especially given that policies do not necessarily talk about the openness of research data, but rather about having the metadata of the research data in an environment, in most cases a repository, which first describes the data and second describes its license. Those who are interested in getting access to the data would then, for example, have to ask the data owner whether they can have access, or whether the owner would allow text and data mining, depending on what the person needs to do and whether it is allowed, because there are a lot of ethical and legal issues around research data. What I think is that there are difficulties with research data in relation to the FAIR principles; I do not feel we are there yet. There is still a lot of work to be done on both sides: on one side by the academic community and the organizations that maintain the repositories, and on the other side in educating those who own the data on how to prepare it so that it can be FAIR and machine-readable.

Thanks Nancy for the exhaustive reply; I hope this answers your question, Garrett. Is there any other curiosity or question from the audience? Are you already familiar with TDM, or is this the first time you hear about it? I have a question for Nancy: what is the strangest question that people usually ask you about TDM?

Oh, the strangest...

TDM is kind of a mysterious thing for most people, even those dealing with open science and open access, so I'm just wondering what the strangest question is.

I think that we don't get strange questions. The question that comes to everyone's mind is the one that was also the first question in this webinar: I am not an IT person, I don't have the skills, so why is text and
data mining for me, or what can I do with text and data mining? I think that is the most common question I hear, and I cannot say that I have had weird questions. Maybe, and this is my interpretation, it's because people in computer science or other more technical fields are familiar with text and data mining, and for them it's not a brand-new thing. Of course, as technology evolves, the field evolves as well. But I think it is only very recently, since the European Union decided to add text and data mining to its agenda because it realized its benefits, that we have started talking about it with people from other subject fields. So I don't remember having had a strange question about this.

Okay, so maybe the strangest one was the one I just asked. You still have some time to ask questions if you want. If not, I would like to thank you a lot, Nancy, for your very, very interesting webinar, the one you recorded, and also for taking the time to be here with us today to reply to these questions. As I said, the recording of this webinar will be available next week. Thank you all.

Thanks a lot.

Thanks again. Bye-bye.

Bye.