We're so glad that you could join us. You have reached a webinar that is part of CNI's Spring 2020 virtual membership meeting, which is continuing on through the end of May, so we have about 10 days left or so. And we're really glad that you could join us here today. Today we're going to be hearing about trends in institutional repository, or IR, access, looking at whether users are conducting searches from the global north or south. The title of today's session is Democratizing Access to Research: Using RAMP Data to Compare Trends in IR Usage Between the Global North and South. And RAMP stands for the Repository Analytics and Metrics Portal. We've hosted a number of sessions at CNI on this project and we're absolutely delighted to have another presentation for this meeting, this time coming from John Wheeler, who's the data curation librarian at the University of New Mexico, along with Min Pham, who's a doctoral student at the University of Missouri, Columbia. I'm Diane Goldenberg-Hart from CNI. And again, I want to welcome you and I want to welcome our presenters. And with that, I'm going to hand it over to John.

Okay, great. Thank you. And I want to take a second to thank Diane and Beth and everyone at CNI for the opportunity to present on some of the ongoing research that we're doing with RAMP data. And also, before we continue, to acknowledge the contributions of our colleagues, Kenning Arlitsch, who is the Dean of the Library at Montana State University, and Nikolaus Parulian, who is a doctoral student at UIUC, who are our collaborators and have contributed a great deal to the research that we'll be discussing today and to this presentation. I would also like to take a second to acknowledge that RAMP is one of many products of research collaborations that were funded by the IMLS. It was a result of work that was done on the Measuring Up project, which was a collaboration between MSU, ARL, UNM, and OCLC Research.
And since then, the IMLS has additionally funded RAMP research into using these data to improve institutional repository discoverability and use. And to give a quick overview of what we're going to discuss today, an argument for IR that you often encounter in the literature is that they can democratize access to research, in the sense that research that is produced in established economies, for example in the global north, in countries in North America and Europe, becomes discoverable and accessible by researchers and citizens in developing economies and throughout the global south when it is made publicly accessible through institutional repositories. So RAMP data give us an opportunity to test this assumption, and we'll look in a minute in more detail at what RAMP data are. But what we're going to talk about today is initial findings. Some of the preliminary findings that we have looking at this is that most IR traffic comes from the global north. However, an interesting point that's been noticeable for us is that most traffic from mobile devices actually comes from the global south, and that this has real implications about how device use can affect equity of access.

Before launching into that, I also want to step back for a second and talk about how another focus of our current work is to increase transparency about how RAMP harvests and processes data. There are 60 IR currently in RAMP. Some of them have been with us for three years. All major repository platforms are represented, and so there's a lot of data. In order to make it more accessible and to put it out there for the community to do some analysis on it, we publicly released some of it in November, with an update in January 2020, and that information is available. There's a link to the Dryad repository for that in the slide here.
We've done some analysis and written up some manuscripts with our findings, and for some of those analyses we've got the code in a GitHub repository. And so we encourage others to access the data and to see what can be done with it. And another aspect of our research is not just to use RAMP data to see what kinds of insights we can get from it, but to ask how RAMP data can be augmented for additional insights and additional findings. For the past year or so we've been talking about insights that we've gained by combining RAMP information with IR information, for example, how many items are in repositories and how many of those items are ETDs, electronic theses and dissertations. Today we're gonna be looking from a different perspective, at how we can combine RAMP data with global region and population data to start asking questions, and looking at answers to these questions: how do the devices used to access IR content affect search behavior? And how is IR content accessed differently between nations in the global north and the global south? And there's a link to a Wikipedia article on this definition, but in brief, the global north is generally regarded as North America and Europe and some other more developed economies, whereas the global south tends to be Asia, South America and Africa, and a lot of developing economies there. And we're really interested in looking at how IR content is accessed across these two different regions.

To bring some of the transparency that I just discussed to this presentation, here's an overview of RAMP data and what kind of data RAMP includes. I'm not gonna go into it in detail, but basically what RAMP does is harvest search engine performance data for participating IR from Google Search Console. And this gives us a way to capture activity that is not reported by server logs or services like Google Analytics.
We use the same reporting method for all repository platforms, so we're able to create a baseline of search engine performance for Digital Commons repositories, DSpace, et cetera. We download the data in two data sets. This is important for our discussion today because one data set actually does include search engine performance information for individual URLs, for individual items and content files. It's a very granular data set. The other data set is the one we're gonna be talking about today, and it's not quite as granular. It's aggregated not at the page level but at the IR level, but it does provide information about the country from which users are conducting searches as well as the devices that they're using. More detailed documentation is provided with the published data set, and one last thing we like to emphasize in all of these presentations is that RAMP data contain no personally identifiable or sensitive information. And to demonstrate that, here are a few rows of what we call our country device data. As I said, it's harvested in a way that is aggregated at the IR level, so this is as granular as this information gets for us. So we can see, if you look at that second row there, that on March 8th, 2017, some user from Panama who was using a mobile device clicked on one URL that was listed one time on the first page of a search result. So again, it's not very granular, but we can still gain a lot of insight from this data even at that level of aggregation.

And as a way of demonstrating that, and some of the advantages that we can gain from augmenting the data, I first wanna step through some of the analyses and overviews that we can get using just RAMP data. So for example, and this is a graphic that should be familiar to some of our IR who are participating in RAMP, we're able to give a sum of clicks by country, right?
So the data that we're looking at today are from January 1st through May 31st, 2019, for a subset of 35 repositories. We can see in RAMP at a high level that across that time period, of however many millions of clicks on IR content are in that data set, almost 3 million came from just the United States. Looking at it by device, again within that same time period of the first five months of 2019, we can see that nearly 8 million of the total clicks on IR content in search engine results pages came from desktop devices. We can combine these views within RAMP just using straight-up raw RAMP data. For example, we have a heat map here that shows the sum of clicks on IR content within search engine results pages, broken down by country of origin and then sorted by device. So we see there on the bottom left that the most clicks came from the United States, and we can further sort that and break it down and see that roughly two and a quarter million came from desktop devices, about half a million or a little more came from mobile devices, and less than 100,000 came from tablets. So we start to get a visual idea of some of the access trends that we're getting within the system.

But we want to think about what it means, right? And we're interested in researching these additional insights that we can gain by augmenting RAMP with complementary data sets. In this case, we have mapped out a process for augmenting RAMP data with world population data and also a data set that Min created manually of regional classifications where, for example, you have the country, and the location is a zero or a one, a zero indicating that the country is located in the global south, a one indicating that it's located in the global north. Here's the way that we map these three data sets together. So there are our RAMP country device fields in that data set.
The country codes that we get are three-letter ISO codes, which we're able to map to country ISO codes in the regional classifications data. And then we're able to pull out country names and population information by mapping to the country name across those two data sets. So it's a very straightforward way to extend RAMP data to gain some additional insights that Min is going to go ahead and discuss for us. Thank you, Min.

Thank you, John. So looking at the location map of the 35 IRs registered with RAMP, we see that most IRs are located in the global north: 33 are located in the global north, of which 24 are located in the US, and there are only two representatives from the global south, which are located in South Africa. And the number of items registered in the 33 repositories in the global north accounts for 99% of the total content items. And though the 35 IRs for which we have published data may not be representative of the global IR distribution, the sample we have to some extent supports the assumption that research and academic knowledge is primarily produced in the global north. So how do IR contribute to the democratization of access to research? To answer that question, we first look at how IR content is accessed differently between the global north and global south. Then we look at how the devices used to access IR content affect search behavior.

Looking at the total click activity aggregated by global region, we see the big trend here is that most IR click activity comes from the global north despite its lower population. The global north accounts for less than 20% of the global population, but more than half of the click activity on IR pages appearing in search results on Google properties was from users in the global north. To have a more detailed view of the trends in IR usage, we map click activity by global region and by country.
And we see big trends here. First, before weighting clicks by population, the overall trend is that a majority of click activity comes from countries in the global north, especially from the US, UK, Australia, and Sweden. India and the Philippines represent the highest click activity in the global south. Second, when weighted by population, most click activity is still from the global north, but we see another two different trends. First, there is a shift in country ranking based on average click activity per person. For example, while the US and United Kingdom have the most click activity, it is Sweden and Australia that have the highest rate of average click activity per person. And second, click activity per person across the global south with weighted data appears more evenly distributed.

To further examine how trends in click activity change when clicks are weighted by population, we look at the click activity of the top 10 and bottom 10 countries. When weighted by population, average click activity per person changes global trends as well as regional ones. Before weighting by population, the list of top 10 countries with the highest click activity includes several countries in the global south, including India, the Philippines, South Africa, and Nigeria. India and the Philippines are even ranked second and third respectively in terms of overall clicks before weighting click activity by population. After weighting clicks by population, the top 10 countries by click activity per person include only two countries from the global south, which are Barbados and Panama. And the first country from the global south after weighting ranks fifth, higher than the US, which ranks only 10th after weighting the data while it ranks first before weighting by population. And when we look at the bottom 10 countries, both before and after weighting clicks by population, the bottom 10 countries are all from the global south.
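As a rough illustration of the mapping and weighting just described, the sketch below joins click counts to a regional classification and to population figures by ISO code, then compares country rankings before and after weighting by population. Every number in it is invented for illustration, and the names (`ramp_rows`, `region`, `population`, `sum_clicks`, `clicks_per_person`, `rank`) are hypothetical; it is not the code from the project's GitHub repository.

```python
from collections import defaultdict

# Hypothetical rows shaped like the country-device data:
# (three-letter ISO country code, device, clicks). All numbers are invented.
ramp_rows = [
    ("USA", "DESKTOP", 900), ("USA", "MOBILE", 100),
    ("IND", "DESKTOP", 300), ("IND", "MOBILE", 300),
    ("SWE", "DESKTOP", 90), ("SWE", "MOBILE", 10),
]

# Hypothetical regional classification: 1 = global north, 0 = global south.
region = {"USA": 1, "IND": 0, "SWE": 1}

# Invented population figures, keyed by the same ISO codes.
population = {"USA": 1000, "IND": 4000, "SWE": 50}


def sum_clicks(rows):
    """Total clicks per country, summed across devices."""
    totals = defaultdict(int)
    for iso, _device, clicks in rows:
        totals[iso] += clicks
    return dict(totals)


def clicks_per_person(totals, pop):
    """Weight clicks by population; countries with no population entry are skipped."""
    return {iso: n / pop[iso] for iso, n in totals.items() if pop.get(iso)}


def rank(values):
    """Country codes ordered from highest to lowest value."""
    return sorted(values, key=values.get, reverse=True)


totals = sum_clicks(ramp_rows)
weighted = clicks_per_person(totals, population)

# Clicks aggregated by global region via the classification lookup.
by_region = defaultdict(int)
for iso, n in totals.items():
    by_region[region[iso]] += n

print(rank(totals))      # ranking by raw clicks
print(rank(weighted))    # ranking by clicks per person
print(dict(by_region))   # clicks per region code
```

With these invented numbers, the country with the largest population moves from second place to last once clicks are divided by population, and the smallest country moves from last to first, which is the kind of ranking shift described above.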
So looking at the top 10 countries and the bottom 10 countries by click activity, in general or weighted by population, we see that it once again confirms that most click activity comes from the global north.

Extending the analysis to include data about the device used to conduct a search, we see new trends stand out. First, the use of desktop is widespread overall, but it is more noticeable in the global north: desktop accounts for 79% of the clicks in the global north, when it is only 59% in the global south. And the use of mobile devices is more common in the global south than in the global north. Mobile devices account for 39% of the clicks in the global south, when it is only 18% in the global north, and the use of tablets in both regions is limited. When looking at how the device used to access IR content affects search behavior, we see that click activity is affected by the position of URLs in search engine results pages. It is nothing new, very intuitive. But what's worth noting here is that when it comes to mobile and tablet users, their tolerance for high position numbers, that is, for items further down the results page, is significantly lower than that of desktop users. Considering the high amount of click activity generated from mobile devices by users in the global south, this poses a usability challenge as well as search engine optimization implications for IRs. IR managers need to make sure that the content is high up in the search engine results page. This improvement of usability and of the search engine results page position of IR content may benefit researchers on mobile devices, especially researchers from the global south.

Lastly, we look at the correlation between position, device, location and clicks, and the results from the Poisson model. They show that all the correlations are significant and confirm that all the trends we have seen are valid. To be more specific:
The negative correlation between position and clicks indicates that the higher the position number of a document in the results page, the less likely it is to be clicked. The negative correlation between device and clicks indicates that the change from desktop to mobile or tablet use negatively affects clicks. And last but not least, if someone is from the global north, it is more likely that that person will click on the content on the results page. And now I will give the floor back to John. Thank you.

Great, thank you very much, Min. And as we see, there are concerns that are raised by the data, or questions that are raised by the data, about an overall disparity of access to IR content among users in the global north and global south. And some of these trends that we see are actually common across all regions. So we notice a potential disparity of access between the global north and the global south. We also notice a potential disparity of access between users in specific countries. We notice that India and the Philippines, when we didn't weight by population, were very high in terms of click activity, and when we weighted by population, suddenly they were very low in terms of their ranking of click activity, which is suggestive that perhaps a smaller set of the population in those countries is actually able to access IR content.

But with that in mind, we will talk briefly about some of the limitations. As Min pointed out, in our subset of 35 repositories in the published data set, they were mainly from the global north, mainly from the United States. Within RAMP itself, among the 60 repositories total, we do have a broader global distribution, but it's still true that most of our participants are from the US and other countries in the global north. And so in terms of search engine performance, that can create language barriers.
There are other factors that can bias our data in that regard. So that's something for us to be aware of. Google Search Console is the sole RAMP data source. And that comes with limitations in terms of people using other search engines and other services such as social media to access IR content. Those limitations are important, but additionally important to this discussion, users may not have access to Google services in some countries where internet access is more controlled.

Which brings us to some of the emerging questions. What causes some of these suggested disparities in access to IR content? It could be research culture, the political climate, et cetera. But importantly, how can IR and IR proponents address the possible usability and discovery challenges that are faced by users of mobile devices? Because as our data show, across the global south, so much access is coming from mobile devices that we really need to make sure that IR content is surfacing as highly as it possibly can within search engine results pages.

And so with that in mind, we always like to take an opportunity to encourage more community participation in RAMP. It is a free service. It works with all major IR platforms and we can get IR set up in the system in about a day. There are some instructions here that I won't linger on in the interest of time, but they will be included with the slides. We want to thank and acknowledge a lot of the repositories that participate in RAMP and whose data were included in these subsets. And finally, there are our data sources for our data augmentation. And we'll back up a second to open the floor up to questions.

Terrific, thank you so much, John and Min, for that really fascinating overview of this project. And we really appreciate your coming to CNI to talk to us about the explorations you're making into the accessibility of IR content and the methodology that you're using to unearth some of that data.
As John said, we're definitely fielding questions at this time. I apologize, I neglected to orient our attendees during my introduction to the Q&A tool at the bottom of your screen. There's a little button that says Q&A. If you click on that, a Q&A box will pop up and you can type your questions in that box. And John and Min will be happy to field those now live. Another option that you have is to raise your virtual hand, signaling to us that you would like to ask your question live or make a comment live. And I can unmute you at that point, and we welcome your live comments. While we're waiting for attendees to type out their questions or raise their hands, one question that I have for you both, whoever would like to take this: I'm curious to know if you have any partners you're working with in the global south to help sort of drill down maybe a little bit. I'm thinking in particular of the data that you have when you were looking at population. You mentioned that in India, when you weighted it by population, there may be an indication that there are pockets of the population that may have better access. And I was just wondering, are you working with anyone in those regions of the world?

We are not currently. One development that happened just before the pandemic came along and resulted in a lot of shutdowns was that we added a fisheries research repository in the Philippines, where I expect there's a tremendous amount of traffic to that resource, and there's research going on there. So that would be one to reach out to. But part of our current research is, again, to explore open data sets or publicly accessible data that we can use to leverage this information. And I think that finding collaborators is definitely a part of that.

Great. Okay, thank you.

Yeah, and so there's a question from an anonymous attendee: we have 60 IR registered with RAMP, but our data set only includes 35. Can we explain why? And yes.
That is because at the time, in the first five months of 2019, those 35 IR had been with us the longest and they actually had data for that full period. We also periodically run into configuration errors. For example, if a repository migrates their platform or if they do an upgrade, that can affect the authentication file that we use to be able to access data; those files can be lost in the migration. And so at the time, I think we had about 45 to 50 IR, and after doing a quality control check and a quality assurance check on all of them, we decided on the subset of 35 as having the most complete data.

Okay, great, thank you for that question. And thank you for addressing that, John. And the list of participating IR, that's available from the OSF portal, is that right? Where can people see that?

That actually is a great question. The names of the IR whose data are included in the subset are included in the documentation with the subset. I don't know that we have actually published all of the participating IR. So that's something we'll have to give our attention to.

Oh, okay, okay, good to know. And just so everybody is aware, we did chat out in the chat box the link to that OSF portal, the data subset, as well as the link to the project briefing webpage on the CNI website, where we will post a copy of the presentation slides from today as well as a video of this presentation. I also wanna just take this opportunity to remind everyone that this is part of CNI's Spring 2020 membership meeting. We still have about 10 days left to go with lots of offerings. And I'm chatting out a link to the schedule so you can take a look and see what else there is to come. And we hope you'll join us again. I would like to just take this opportunity to thank our presenters once again for coming to CNI and talking to us about your project. Thank all of our attendees for being here. Thanks so much, everyone, and take care, bye-bye.

Thank you.