Okay, it looks like we are recording now. This is the September 2018 development community call for SHARE; Cam just volunteered to be a note taker. We're rolling through, and folks are filling in that they're here on the call. Today we have a draft agenda. I'm looking through the list and I don't see anyone new on the call, is that correct? Okay.

Looking at the action items from the last call: we were pulling together a list of GitHub issues for people to look at to start contributing, and looking at the sources for SHARE, I think that's what that one is. We did put out a survey for more use cases, and I have not looked at that data yet, so that may be something to follow up on. The draft data model is in progress, but based on comments from last time, we thought we should try to discuss what we consider the minimum viable product, so that everyone is in agreement on the work we're doing here. Jeff, Cam, Ryan, and I compiled our assumptions in this document last week, so it seems worth reviewing quickly here. I already have it open.

If you look at the top of page two, running onto page three, it goes from the general to the more specific. Generally speaking: a configurable editor for creating harvesting workflows with some predefined nodes, built on Node-RED; a deployable server instance that takes flows configured in the editor, runs them, presents them, and collects data into a backend data store. For sources, we're assuming the MVP harvests from the highest-priority sources, not necessarily 100% of the current sources in SHARE; as we add them, we add them in priority order. And then the ability to replicate and share data between two instances of the SHARE server, or really two or more, but two is the minimum for the MVP. I'll stop there. Any questions or comments so far? I see some things coming through in the chat. Let's see here.

So I was wondering whether these are tasks we can do independently, because if I want to contribute, there are many discussions I'm not part of, the ones you have regarding the model and the schema. Unless the model and the schema are done, is there a part we can pick up in the meantime, or can we discuss some issues like that?

Yeah, that actually matches one of our guesses about what we would need to talk about today: what do people need to know, and what are the barriers to contributing? One thing we had already talked about is getting others plugged into some of the side conversations that are happening, where that makes sense.

So my question is: in the list of GitHub issues, can you point out issues we can pick up without being part of the discussions you have there, things we can do independently?

Sure. I don't know that we actually have anything well-defined enough yet to qualify. Let me pull up the link again quickly. The ones we have here are still pretty high level.
This one is more of an administrative task. The proposal is to take the various repos for the project, which are spread out on GitHub, and make them NPM packages, so they can all be pulled together at once and loaded in that manner. That came out of thinking about the installation process: if someone were to build a server, what would the process for doing that be? And if someone wants a client, how would they actually install and run it? What would the steps be, assuming there will be different options for how that gets built or configured?

While that is being worked on, I wonder if it would be a good idea for us to start looking at the legacy data, well, not legacy, the data that's currently being collected. Is there a path to moving the existing data to this platform? If there is, we could probably work in parallel on getting data out of the current SHARE Notify pipeline. Would that be a good idea?

Yeah, I think it is a good idea to take a look at how the data exists today. That's a good task, so let's go ahead and create it now: assess the existing data. Could Rick and Jeff help us get access to it? Yes, we can do that. Okay. And if you're capturing that in the notes, I would do it right now; I was just watching along. Okay.

That does raise one of the open questions: how useful will it be to migrate the existing data versus just recreating it? Either way, this task feeds into that decision. It would be great to take the data we have and not have to reharvest it. Some of the records might have been deleted or changed, so one value would be keeping a historic snapshot of what we have. Right. Sure.

Okay, so I didn't hear any disagreement with the primary objectives. Do these sound right? I see nods; I'll scroll so I can see other nods. All right, it sounds like yes.

The rest of this gets into the technical requirements for actually accomplishing those objectives, the more granular pieces that would need to be built. Harvesting and data collection: validation, with JSON schemas that can be used to validate data as it comes in (a small sketch of that step follows below); the GUI for SHARE-RED, which already exists in some form; and raw storage. Initially we're thinking MongoDB is one of the possible technologies there. I don't think there has been an absolute "this is what we're using" decision; is that correct, Jeff? Yes, that's correct. So this is really just getting ideas down to start testing how well these things might work. And then mapping: looking at the schemas and mapping once data lands in raw storage, as the start of the pipeline.
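To make that validation step concrete, here is a minimal sketch, not the project's actual code, of how a JSON Schema could gate records on their way into raw storage. It assumes the Ajv library, a common JSON Schema validator for Node, and the schema fields and record shape are invented for illustration.

```javascript
// Hypothetical sketch: validate a harvested record against a JSON Schema
// before it is written to raw storage. Assumes `npm install ajv`.
const Ajv = require("ajv");

// Illustrative schema only; the real per-harvester schemas live in the
// repos linked later in the call.
const recordSchema = {
  type: "object",
  required: ["identifier", "title"],
  properties: {
    identifier: { type: "string" },
    title: { type: "string" },
    datePublished: { type: "string" },
  },
};

const ajv = new Ajv();
const validate = ajv.compile(recordSchema);

function storeIfValid(record, rawStore) {
  if (validate(record)) {
    rawStore.push(record); // stand-in for a MongoDB insert
    return true;
  }
  // A failed validation is a signal that the source API may have changed.
  console.error("Record rejected:", validate.errors);
  return false;
}
```

The useful property here is the error path: a record that stops validating is an early warning that the upstream API changed, which is exactly the role described for the per-harvester schemas.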
For the mapping, we're thinking about having JSON templates that define various primitive methods that can be used to parse data as it comes in. A good example is a name value: one composite value that has surname, given name, middle name, et cetera, that we may want to parse apart and map to a particular structure. So we'd have methods for that, so people don't have to rewrite them every time; something that says, as you bring this in, apply this method and spit out the appropriate type of data. That's how we're thinking about taking in all of these different values. Ideally, in many cases, it will be just a value-to-value transfer, but not in every case.
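As a purely hypothetical illustration of that template idea, here is a sketch in which a JSON template names a primitive method to apply to each incoming field. Every method, field, and record name here is invented; the real templates and primitives are still to be designed.

```javascript
// Hypothetical sketch of the mapping templates discussed above: a template
// maps each target field to a source field plus a named primitive method,
// so harvester authors don't rewrite common parsing logic.
const primitives = {
  // Split "Surname, Given Middle" into structured parts.
  parsePersonName(value) {
    const [surname, rest = ""] = value.split(",").map((s) => s.trim());
    const [givenName, ...middle] = rest.split(/\s+/).filter(Boolean);
    return { surname, givenName, middleName: middle.join(" ") };
  },
  // The common case: a direct value-to-value transfer.
  copy(value) {
    return value;
  },
};

// Illustrative template: target field -> { source field, method }.
const template = {
  title: { from: "dc:title", method: "copy" },
  creator: { from: "dc:creator", method: "parsePersonName" },
};

function applyTemplate(rawRecord, tmpl) {
  const out = {};
  for (const [target, rule] of Object.entries(tmpl)) {
    out[target] = primitives[rule.method](rawRecord[rule.from]);
  }
  return out;
}

// applyTemplate({ "dc:title": "On Sharing", "dc:creator": "Doe, Jane Q" }, template)
// -> { title: "On Sharing",
//      creator: { surname: "Doe", givenName: "Jane", middleName: "Q" } }
```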
Then enhanced and processed data storage. The thinking is that there would be one raw data store for data as it comes in, and that processed data would go to a second store, the active data store, which then feeds the various tools. We want to maintain the raw data, if it is of any value, as it comes from the various sources.

There are some other things in here, like sharing data between instances, with various questions: can someone ask for a tree of data from me? Can someone edit or update that data? I'm looking at my whiteboard, since Jeff was actually here at Notre Dame yesterday and we talked through a little of this. In terms of some of those aspects, and Jeff, maybe you can say more, there are various questions the nodes could ask of each other, such as "can you give me certain records from arXiv?" And there are things we could do that say "also ask your peers what data they have." So beyond a direct node inquiry, where, say, Virginia Tech's node sends a message to Notre Dame's node asking what data it has on something, Notre Dame could also send that inquiry out to other nodes. Those are some of the things just starting to be talked about.

APIs: obviously, having some kind of API. And in thinking about the MVP, we don't want to build just the core infrastructure; we also want a couple of applied examples of actually using the data, to make sure we're driving toward at least a couple of those. So that's what we have so far for the MVP. How does this seem? Okay, I assume silence is acceptance.

All right, let me go back to the agenda to make sure we're staying on track. With that, we started talking about some of the issues as well, and we did go through those, so maybe that is a good jumping-off point. Jeff, you talked about node communication. Also, David, I know you said you had to leave at the top of the hour; did you have anything you wanted to touch on before you head out? Probably nothing directly relevant to the agenda; I think I'm good. Okay. Rick, do you want to go back to that question of what people need to know to contribute and what areas of interest they have?

Yes, that would make sense, to close the loop on that. So, even independent of the list you saw: we're pushing on a variety of areas just to get things kickstarted, but obviously we would like more people involved, since this is a community project, and we'd like to foster that side of things. What is it that you need from us to get involved? Where are the points you'd like to get involved, and what can you contribute? What areas, even from what you've heard just now, do you think you could potentially contribute to if you knew more about them? We heard a little of that earlier, with transitioning some of the data, but for this new development, that is where we're most interested in these questions.

I would say I'd rather leave the architecting of the new product to you. Once the framework is there, we can contribute to harvesting, bit by bit; the nitty-gritty implementation of that part is certainly something we can contribute. But at this moment I don't think you want too many chefs in the kitchen in terms of architecting; that is probably better left with one team. We would be interested to know how you're architecting it, but we'd rather not stand in your way. Would that be a good idea?

Yeah, that comment is appreciated, and I understand where you're coming from. But that does leave the harvester side of things. For example, we think that Node-RED, and what we're developing with SHARE-RED, is going to be a useful environment and framework for harvesting. So one thing we could use help with would be writing some harvesters: having people other than us write harvesters and play with the framework, while we keep building that framework up and putting the pieces together.

I think we can do that. I believe the main hurdle is the schema right now; that's the starting point for anything I'd want to pick up from the issues. Whether it's the API or the front end or a harvester, knowing the schema is the starting point. So can you share with us what the progress is, and whether it is done?

What we're doing is thinking about this a little differently than past SHARE work. I don't want to say "objects of record," and I don't want to put too much emphasis on the truth of these items, but the raw data that comes from a harvester is of interest in many ways before it is mapped to another schema. That allows some common discussion on what each entity should be; we can identify it more strongly, and then access that data in a somewhat less dynamic way. So what we've been thinking is that we will generate a set of schemas, in JSON Schema format, for each harvester.
What each schema defines is sort of the truth from that harvester, and that gives us a way to talk about it between groups, independent of how you map the data to another schema. For example, maybe Cam or Ryan can post a link to a few of the harvesters we've written schemas for. What this gives us is validation: even ignoring the identifiability of the data, it lets us validate the API, to make sure the API is not changing while we collect more and more data and work on the mapping. You can think of these as validators, if you want to ignore those other characteristics. The link is now in the chat, and I encourage you to take a look at those repos and the files in them. It's early work, obviously.

If we do that, then when we write a harvester, what we're really trying to do is get data that is as raw as possible into just some format, and we'll be working on that very soon. Basically, a good exercise right now would be: can we write Node-RED workflows that can be validated against the schemas you see here? Again, they are supposed to be very direct mappings to exactly what is in that data. If we can do that, and we enjoy the process and it seems like a useful way to work, then we can do the mapping exercises as the second step. That is the piece I think you're referring to with "schema," and that part Rick is actively working on right now: thinking about the schemas for some of these use cases, in particular the research output dashboard style of work. But until then we can still harvest this raw data and maintain it in its raw format, which will have benefits as we think about decentralizing storage and so on.

So I think what you're getting at is that you want to see as many harvesters as possible before finalizing the schema; you want to see all the raw input formats beforehand. Is that the crux of it?

Yeah, though we don't need as many as possible, certainly. Having a few, especially written by other people, will help us really understand it. We're trying to do a few things with the harvesting. If we're use-case focused, on the actual products that use the SHARE data, then not everyone is going to be interested in the same harvesters. It won't make sense for one group to actively maintain thousands of harvesters; that's not sustainable, and it doesn't make sense practically in many cases. But what we could have is several groups maintaining a few harvesters each, or many groups maintaining many harvesters each. And that would simply be maintaining the schema and the way of getting that data. The other side of this is that if we're thinking about many people being involved in this data collection effort, this harvesting, which is non-trivial by the way: maintaining all of those harvesters at COS took a lot of just silly manual time, fixing bugs as people's APIs changed and so on.
A lot of time went into that, and it was not the fun, novel work of using the data, just keeping these things running. So if we can have the local experts, the people who really care about those harvesters, keeping up with them, I think it will be a more scalable approach. We then need to lower the barrier to entry for writing harvesters, if this is a vision we want to see implemented. And we think Node-RED offers that accessibility: writing these harvesters without necessarily being an expert at coding, web scraping, APIs, and all of that. So, thinking about that aspect, trying to really lower the barrier to entry, make the process more inclusive, and bring more diversity to it: what are the components we need in Node-RED to make these extractions really easy? For example, we need an OAI harvester node. We shouldn't have people parsing raw XML from an OAI feed; we should have an OAI harvester that takes the sets of information you want, blacklists the ones you don't, and does the parsing more or less automatically. That would lower the barrier to entry for writing an OAI harvester. We might need a regex node; we might need a glob node, so you can use wildcard patterns, which are a little easier for most people than regular expressions. Some of those are already built by the Node-RED community, so we have a lot of this accessible to us, but we need to know which pieces we still need and which would lower the barrier to entry. Having multiple people test and try this will give us an idea of how inclusive we can make this project. There is a sketch below of the kind of logic such an OAI harvester node would hide from its users.
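As a sketch only, under the assumption that the node wraps standard OAI-PMH semantics (the ListRecords verb and resumptionToken paging are part of the real protocol; the arXiv endpoint is just an example), this is roughly the logic a reusable OAI harvester node could encapsulate so flow authors never touch raw XML:

```javascript
// Hypothetical sketch of what an "OAI harvester" node could wrap.
// Uses the global fetch available in Node 18+.
const BASE = "https://export.arxiv.org/oai2"; // example OAI-PMH endpoint

async function* listRecords({ set, metadataPrefix = "oai_dc" }) {
  let params = new URLSearchParams({ verb: "ListRecords", metadataPrefix });
  if (set) params.set("set", set);

  while (true) {
    const xml = await (await fetch(`${BASE}?${params}`)).text();

    // Yield each <record> block; a real node would parse these into JSON
    // instead of handing users raw XML. The regex handling here is exactly
    // the chore the node exists to hide.
    for (const m of xml.matchAll(/<record>[\s\S]*?<\/record>/g)) {
      yield m[0];
    }

    // OAI-PMH pages results via resumptionToken; stop when it runs out.
    const token = xml.match(/<resumptionToken[^>]*>([^<]+)</);
    if (!token) return;
    params = new URLSearchParams({
      verb: "ListRecords",
      resumptionToken: token[1],
    });
  }
}

// Usage: for await (const rec of listRecords({ set: "cs" })) { ... }
```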
It would be great if people learning how to deal with APIs, or learning how to write JavaScript, or harvesters or scrapers, could use this as a way to get started. Then we would have contributions from a lot of different people and stakeholders, without needing a core set of developers to actively maintain every one of these harvesters. And if we think of the only real transformation at this stage as getting the raw data, then those contributors don't even need to worry about mapping and transformation, which is a whole other set of problems we can work on making easier in the mapping layer. That is how we've tried to parse this: what is the most direct way of getting contributions at that initial stage, one that lets a contributor focus on the core aspect of the source API? And that is purely validating that the API is unchanged and getting the data into a raw format as easily as possible. We think Node-RED offers some benefits there. So that's the core of the vision and the philosophy we're trying to maintain: inclusivity, lowering the barrier to entry, and bringing more people into this area of research.

I understand, and I think we can contribute on that part. I wonder, could we start some tickets, so we can pick and choose from there? Make a list of all the sources we want to harvest from, with priorities. If there are concrete tasks we can take on, we can pick them up one by one.

Yeah. Rick is writing a ticket for this as we speak, so we will generate that list. But I think what would be good for people who want to test this side of things, and again, the more people the better, is this: I don't think you need JavaScript programming skills to do it, and if you do need them, we can help you build those skills. We should maybe do another call at some point to have Ryan and Cam walk through what they've done, and then perhaps even do one of the harvesters as an exercise, so people can see how we're thinking about this. Then you can get a feel for whether we're hitting the mark: is this really easier than writing code, or are we fooling ourselves?

Okay, that call would be good, if you can arrange it.

Okay. So, Nick, let's see if we can get that scheduled. If you're interested in this side of things, put your name down somewhere here, and we'll send out an email and get a call scheduled as soon as possible. Okay, I'll put it in the notes. Very good. And really, I welcome anybody else here to do that too; again, the more people the better, with a variety of backgrounds, to try this and play with the system. If we're off base with this flow-based, visual style of programming, we should know before we commit too much to it as the method for creating harvesters; maybe there are other paradigms we should look at first.

I have one other question. In one of our previous meetings, someone put in a note saying there is a different project that has already harvested almost every repository from Open Core. Can we use that as a source?

Yeah. Obviously, if there's data already being harvested into a useful format, we can certainly create harvesters for it. There were some issues with Open Core in the past, though, around the use of the API; not the license of the data, the use of the API. And I believe it was Open Core; we're recording this to video, so I'll need to double-check, but I believe in the past this was a resource that required non-commercial use of the API. That works for most use cases, but some groups are going to be combining this data with commercial data, so there were ambiguities around the use of that API. The issue was that if you get the data and make it available, you would have to make it available under a non-commercial use clause as well, and many people wanted anybody to be able to use this data.
So if we introduce that data into the corpus, we have to somehow mark it as available only for non-commercial use, and in the past that hasn't fit well with the objectives of the stakeholders, given that we do want this to be used by many people. But we can certainly look into it; we should check whether I'm even right that it was that group. And yes, we should harvest from anyone that has good data.

Okay, I see it. In our use cases, someone from UW put in an item called "integrate Unpaywall," and Unpaywall has already harvested a huge amount of material from Open Core. So I wonder if we can use that as the source.

Yeah. I spoke to the Unpaywall folks about this issue in the past, and I'm not sure where they landed on it. I don't think they ever got confirmation that they could use it for non-commercial purposes, so they may have just ignored that clause; or maybe not ignored it, but believed there were reasons it wasn't relevant. But I do want to respect terms of use. If there are restrictions on the data, I don't want to simply violate them without some way to pass that information along. Unpaywall was also trying, I believe, to harvest data without Open Core because of this issue; I think the plan was to write harvesters that didn't depend on it. So we would need to bring in the Unpaywall folks, or have someone contact them. But yes, we can certainly look at that as a source as well.

Is there any consideration in the implementation for sorting out this licensing mess? It's a very complicated issue. Just because data are there and available for us to grab does not necessarily mean we can put the data out as an output. So do we set ourselves up to deal with the licensing issue, or do we just ignore it all and say people use it at their own risk?

Well, we certainly could take a more liberal perspective, and a lot of people have encouraged that in the past, believing that some of these terms cannot be upheld with any legal consequences. We really do want to stay away from licensing metadata; that would set a bad precedent in the community. These data are facts, and they should be thought of that way. The problem is this, and again, because I'm not positive, I won't use Open Core as the example: say I have some data that I make available where my API may only be used for non-commercial purposes. The data isn't technically licensed non-commercial, because you can't license facts; facts, by definition, cannot be licensed. But the use of my API could carry non-commercial terms. Now, there's a question of whether that can be upheld in court, at least in U.S. jurisdictions.
But out of respect for the API provider, you may say: those are the terms, and therefore I want to pass them along; the use of this block of data, because I accessed it under those terms, should now also be non-commercial. Really, they're facts, so once they're open, they're open. But are we respecting the original source's intention regarding the collection of, and access to, that data? No, not really. So it gets very, very complicated.

That's what I'm asking, yeah.

I think the best thing we can do, for the most part, if you plan to share these openly, is collect data that is facts: data that can be treated as facts, cannot be licensed, and is available to anyone. Now, we can segregate some of the data and annotate it with different terms. For example, there are stakeholders in the community that want to bring in data from Scopus or Web of Science. That data, again, gets funny, because these are facts being pulled from an API, but sharing them freely would violate the agreements most of those groups sign when they purchase the Scopus API. So we may need to either maintain that data as private, so no one can access it, or have some way to say: we know Virginia Tech buys Scopus, so they'll be whitelisted, but no one else can access this unless we can confirm they have purchased the Scopus API. We can at least build the framework to allow for those things; a small sketch of what that per-record annotation could look like follows below. In terms of sharing, it will be up to the nodes themselves to make the call on whether to share freely or not, and that's the part where we'll need to do a little education. But we'll allow for it in the framework, for sure. Okay. It's very, very complicated.
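As a purely hypothetical sketch of that idea, records could carry the terms they were collected under, and a node could filter what it shares by requester. Every field name and the whitelist here are invented for illustration:

```javascript
// Hypothetical sketch: annotate records with access terms and decide at
// share time whether a requesting peer node may receive them.
const records = [
  { id: "r1", source: "institutional-repo", terms: "open" },
  { id: "r2", source: "scopus", terms: "licensed" },
];

// Peers confirmed to hold their own license for the restricted source
// (example values only).
const scopusWhitelist = new Set(["virginia-tech"]);

function shareableWith(peerId) {
  return records.filter((rec) => {
    if (rec.terms === "open") return true; // facts, freely shareable
    if (rec.terms === "licensed") return scopusWhitelist.has(peerId);
    return false; // default to keeping the record private
  });
}

// shareableWith("virginia-tech") -> both records
// shareableWith("another-node") -> only the open record
```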
This Open Core one is specifically complicated, if it is the one, because of the structuring of its terms of use. We've tried to get in touch with them to ask these questions. We should look at Unpaywall's API, because I believe they harvested their own data explicitly because of this. I think that's the thing to do.

Sorry, just to make sure we understand: do we mean CORE, C-O-R-E, or is there another thing called Open Core? I think we mean C-O-A-R, right? There's a thing called CORE, which is partly funded by Jisc and is basically an aggregator for open access. No, I don't think that's it. Is it C-O-A-R or C-O-R-E? DOAR, D-O-A-R. Sorry, yes: OpenDOAR, D-O-A-R, the Directory of Open Access Repositories. Okay, that makes sense. Too many OARs and ORs in this space. Good, let's clear this up while we're here, so no one is confused: ignore everything we've said about CORE so far. It's OpenDOAR, Open D-O-A-R. Yes, that's correct: the Directory of Open Access Repositories. CORE and COAR are doing wonderful things, I'm sure, and I have no clue about their commercial-use terms; OpenDOAR is what we've been talking about. Okay.

All right. From working on OpenAIRE, I'm fairly certain OpenAIRE is sourcing data from OpenDOAR. Yep. I think for the most part people have been ignoring this non-commercial thing, or assuming that because they are non-profits they are also non-commercial, but that's actually an incorrect way to look at it, because we're passing this data on to other people.

So let's make this more general. With any data that we harvest, if you want to be harvesting that data from your machines, you need to do a few things: look at whether the source tries to license the data; look at its terms of use; look at whether it has a privacy policy; and look at whether it has a robots.txt file. All four of these things are critical to understanding what you can and can't do with the data, so before anyone harvests anything, you should have a good understanding of them; there is a tiny sketch of this politeness checklist below. Those terms may include, for example, that you can only hit an API 100 times per hour, or once per hour, or once per day; you can only make so many requests. Legally, you may not have to comply, since some of these terms have not been tested and we don't know whether there are legal consequences, but out of respect for the source, and probably out of respect for your lawyers on campus, you should think about those things before you make that call. Now, most of these groups want you to use their data and their APIs, and there won't be any problems whatsoever. But those are the things to look at before harvesting, and I can help you understand some of them. Obviously I'm not a lawyer, and you need to talk to your lawyers if you're going to do any of this, but I can at least tell you where to look. Most groups are fine, and you can always email the group and ask for specific access for your use case; that's the other thing: even if you find an ambiguous clause, you can email. We had a problem getting hold of anybody at OpenDOAR back at COS, and I think Unpaywall had some issues getting in touch with someone too, so we may just need to try again if we do want to use OpenDOAR. It looks like they've also released a version two, which may not have this non-commercial clause in it.

So: put your name down if you want to be part of a harvesting call. We can cover at least the technical side, choose some benign repositories to draw from, and deal with these issues later; maybe we can also get a group together to document a how-to guide or recommended best practices for harvesting data.
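As a toy sketch of that checklist, under invented limits and with deliberately naive robots.txt handling (real code should use a proper robots.txt parser and each source's actual stated limits), a polite harvester might look like this:

```javascript
// Hypothetical sketch: consult robots.txt and honor a request budget
// before hitting a source. Uses the global fetch available in Node 18+.
async function isAllowedByRobots(baseUrl, path) {
  const res = await fetch(new URL("/robots.txt", baseUrl));
  if (!res.ok) return true; // no robots.txt published
  const rules = await res.text();
  // Naive check for an exact Disallow line; illustration only.
  return !rules
    .split("\n")
    .some((line) => line.trim().toLowerCase() === `disallow: ${path}`);
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function politeFetchAll(baseUrl, paths, requestsPerHour = 100) {
  const delayMs = 3_600_000 / requestsPerHour; // spread requests evenly
  const results = [];
  for (const path of paths) {
    if (!(await isAllowedByRobots(baseUrl, path))) continue;
    results.push(await (await fetch(new URL(path, baseUrl))).text());
    await sleep(delayMs); // stay under the source's stated rate limit
  }
  return results;
}
```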
I am wondering if we should move the internode communication topic to a future call, just because it feels like we've had a good session already. Absolutely; I think we can focus on this initial step of harvesting, get people into that pipeline, and talk about internode communication on another call. Okay. Yeah, but just to review:

I think there was at least one action item that's not in the list yet. Let me go to the notes, because I was starting to create some GitHub issues there. Okay, so one we have is generating the list of desired harvesters and feeding priorities into it, so, the more specific, high-priority harvesters. Oh, it's up here. Here we go.

If you want to dig into the licensing issue to see the complexity, I'm going to link a few things here, with OpenDOAR as the example, now that we've confirmed that's the one. If you look at that link, you'll see explicitly that they license the metadata, which is a problem to begin with from a metadata reuse standpoint, under a Creative Commons Attribution-NonCommercial-ShareAlike license. The attribution aspect is actually also problematic when it comes to machine learning, AI, and aggregate use cases: if you use that information to, say, generate a training set that you then do something with, you'd have to find a way to attribute every piece of data you used, and with stochastic algorithms or a very large corpus, that attribution can be practically impossible or at least very difficult. So there are issues there to begin with. Now, that page links to version two, and the version two API page says nothing about terms of use, which I take as promising; hopefully they've moved on to a more facts-based use of their data. But this is where things got very complex, and we had many discussions about licensing and the legal side of metadata.

Sorry, just to jump in, since this is obviously a Jisc service and I'm from Jisc: even though I'm not overly familiar with OpenDOAR, I can help facilitate finding the right kind of people.

That would be excellent. I didn't know that in the past; it's not been heavily branded, though it's very clear in version two that it's Jisc, so I would have emailed different people back then. We might need to get in touch, then, and have you put us in contact with someone on the project; that would be good. Yeah, and I'm happy to help with that, obviously. Actually, why don't we do that: OpenDOAR has always been one we thought would be highly useful, and since there was that more conservative license around the data before and now there may not be, I think it would be smart for us to go ahead and confirm what the terms of use are. So if you can find someone who would know, or can talk to us about this, that would be a good thing to just go ahead and do. Yep, okay, I'll follow up on that and cc you. That'd be great; you have my email address. Why don't I find the doc; Cam or Ryan, if you have the doc open, put my email in next to, oh, here we go, I think I got it, at the top. I'll put it in there. All right, very good. Thanks, Lou, I appreciate that. You're welcome.

Okay, so here are the more complete notes. Thinking about what's left to tie up for this call: we've got, one, scheduling the call to walk through a harvester. Quick question: when should we target, do you think?
Well, let's see. We could try to add one call in between this one and the next, or shoot for more like a week from now. Any opinions? Do a little poll once you have the list of names. Okay, yeah. Next week should be fine, but the specific time might be an issue; students have class times that cannot be shifted. Okay, that works. So it looks like some time within the next two weeks, or about two weeks out from now, somewhere in that time frame.

Anything else for today? It feels like we're edging closer and closer to expanding the contributor pool. All right. So, in preparation for that call, we'll be pulling together all the relevant information, examples, et cetera. It would be a good idea to record that call too, so people who cannot join can view the video later. Yeah, absolutely. Sounds good.

Okay, is there anything else for today? Okay, well, let's break a few minutes early then. Very good, thanks all; looking forward to future conversations here. Thank you.

And I assume that's Abram from COS also on the call there? Yeah, sorry I missed the roll call. No problem. I'm still maintaining SHARE; I'm now the only maintainer of SHARE v2 here at COS. Perfect, perfect. Well, good; thanks for getting on the call, Abram. All right, thanks all. Thank you all. All right, bye.