Okay, I think that's good. So, again for the recording: this is the first of many SHARE community development calls, and we're doing a brief round of introductions before we get into details. All right, sorry, Jeff, were you done there? All right. So, Judy, you're next on the list. Hi, this is Judy Ruttenberg from the Association of Research Libraries, and I'm here on this call really to listen to all of you. I'm thrilled to see how many of you are on the call, and to listen keenly for how ARL can continue to support this community. All right, going down the list. Cam, you're next. Hey guys, my name is Cam. I've been working with Jeff at 221B for about six months now. Open access is something I'm really passionate about, and I'm really excited to be working on this project. And Ryan's next. Hey guys, I'm Ryan Mason. Same as Cam, I'm working with Jeff at 221B. I'm a software engineer, and I got involved with SHARE through COS. All right, then David, I thought I saw you on the list as well. Yeah, I'm here. Hi, everyone. David Minor from UC San Diego, where I work in the library and run our research data curation program. We spent a large chunk of 2016-17 working with SHARE, building a local dashboard, a portal if you will, to access the campus-based information that was in SHARE. I'll introduce myself: I'm Zhiwu Xie from Virginia Tech. I've been involved in SHARE on and off from the beginning until now, with some involvement but not very heavily, and we're happy to contribute more from now on. I also have Eileen Chen here, who will introduce herself, and I think Cyrus Wong is joining us online, I'm not sure. If he is, he will introduce himself after Eileen. So let's move on to Eileen. Hi, my name is Eileen Chen. I work with Zhiwu at Virginia Tech. Okay, Cyrus, are you out there? Well, no, then let's move on to the next one. Hello. Can you hear me? Yes, we can. Yeah, okay. You're here. Go ahead and introduce yourself.
Yeah, this is Xin Yue. I'm a PhD student at Virginia Tech. I work with Zhiwu on a related digital library project, so I'm interested in the SHARE project to see what I can contribute. Excellent. Let's keep going down the list. Thank you. Matt, you're next on the list, and then it probably makes sense for others at a similar location to pipe in as well. Hi, I'm Matt Harp, a research data librarian here at Arizona State University. I'm also one of the SHARE Curation Associates, and I'm working on the NEH grant as a visiting program officer for ARL. ASU has a lot of interest in what's happening with SHARE, the future of SHARE, and potential collaborations. I'm actually sending a link to one of our AULs, Debra Hanken Kurtz, so she can join the call. Great, yeah, I thought I saw Debra in the Discord channel. Yeah, there were some issues on her end, but I'm sending her the link now. Awesome, thanks, Matt. Hi, I'm Hannah Kraft, UNC Greensboro, and I represent NC DOCKS, a collaborative institutional repository that's based at UNCG. Right now we have nine schools and we're potentially growing, and there's interest in that group in working more with discovery through SHARE and contributing materials, so I'm here to represent that group. Great. Hi, this is Allison. There's a bunch of us here from the University of Minnesota Libraries. Lisa Johnson has been involved with SHARE in the past, but the rest of us are just hoping to learn more. Hello, Damian Capps from Villanova, with the whole technology team. We know absolutely nothing about SHARE, and we are hoping to learn a lot today. Good morning, can you hear me? This is Lauren DeMonte from the University of Rochester. I'm the Director of Research Initiatives at the library. Hi, I'm Sherri Lake. I'm the scholarly repository librarian at the University of Virginia. I was one of the SHARE Curation Associates, and I'm very interested in the next phase of SHARE. So thanks.
Curtis Thacker from Brigham Young University, here to learn more. Welcome. Chris Cullen here; I'm a data curation librarian from the University of Arizona, and I also manage our data management services. I'm interested to learn more about how we might be able to work with SHARE. Excellent. And then Leo, I think you're last on the official list here. Hello, my name is Leo Mack. I work for Jisc in the UK. If I sense it correctly, I might even be the only person who is not on the American continent. I work for Jisc partly on one big EU-funded project, which is OpenAIRE, and more specifically OpenAIRE Connect. I'm mostly here to learn about SHARE, learn a bit about the development aspirations and what the types of collaborators are, maybe also ask some questions about business developments or business model developments, and then just get a sense for where the collaboration opportunities are. And Debra, I see that you joined. Hi, welcome. We're just doing brief introductions here. Hi, Debra Hanken Kurtz, Arizona State University Library. I'm new to my role here at the libraries, but familiar with the work of SHARE through previous roles, and I'm looking forward to working with everybody in new contexts. Thank you. Is there anyone that we missed? Okay, sounds like we've pretty much got it covered. Welcome, all. We have a great group here and it's very exciting to have you all here. I think our plan is to stay pretty informal for this first call. So hopefully you can all see the agenda in the screen share here. Let me zoom in a little bit on that. It won't let me, will it? I'll zoom through the menu; I'm too used to the quick keys and it won't let me do that. Oh, there we go. Perfect. Okay. So essentially we just got acquainted, obviously, and then the plan is to outline the current direction, where the activity has been, and current progress.
And then really just talk about getting organized: ways we see folks contributing and participating, both in the near future and longer term, and then think about the frequency of these calls as well. There is a link in the agenda that goes to other documents that I figured we would start from, and the plan is for Jeff and me to outline a little bit of the current direction and activity. This document is what we've been working on. I just went to the SHARE-RED specs, which is essentially a more detailed view of what Jeff and I will talk through quickly here, but you can skim through it and look at all the various aspects of this. I think really the best place to start is this diagram here. Jeff, you added a lot of new stuff, didn't you, or someone did? There are real-time updates here. So, Jeff, would it make sense to talk through the ideas around Node-RED and SHARE-RED within the larger context? Yeah, I think maybe we should start with just a little bit of the larger context and then get into SHARE-RED. Yeah, sounds good. So within SHARE, one of the things that's in progress is a shift from a highly centralized support, development, and infrastructure model to something more distributed: in infrastructure as well as people, support, participation, etc., so really top to bottom in that regard. To what degree each of those is distributed versus centralized is still being worked out. But the basic idea is to have the actual processing of metadata that may be coming into SHARE, the indexing, the sharing of data, the development, the building of tools on top of it, and where those tools may be running, all be distributed. And I have a question from David about whether the notes are supposed to be collaboratively worked on. Yes, please, please do.
And that's a good point: we haven't designated a note taker, have we? I think it probably makes sense at this point for anyone to add notes as they can, but that's a good logistical point for future calls, to designate a note taker. So, to get back to my train of thought: things being distributed. What we're working on now is the first version of an architecture for this to be distributed, knowing that not every institution or organization involved with SHARE would necessarily be standing up infrastructure. One of the things we have in mind is that there may be some points in the community that are able to take a slightly heavier load than others, so, for example, be able to accept jobs and run them for different institutions. But really one of the big focuses here is to shift from a highly centralized, national, or geographically regional view, to also really enable more of a local, institutional view. I think what David Minor alluded to a little earlier about the dashboard that was created for them: that dashboard was looking at the total SHARE corpus, but filtering down to what is most important from their standpoint, through a UC San Diego lens. So, taking a step back and thinking about that, we can look to replicate it for others, not necessarily with the same kind of dashboard, but looking at the kind of information that is pulled together, taking a look at that from an institutional context, and then also making that available to share with others. So if, for example, at Notre Dame we harvested some records that are relevant to UC San Diego, or vice versa, it would be very easy to also share those, index them, link them, etc., and then continue to build things on top of that.
And also, knowing that SHARE is not the only tool, system, or community working on this, we're really thinking about how we can start to bridge some of those community boundaries as well. So I think that's a rough strategic overview. Any questions initially on that? Okay, I'll take silence as nothing yet. So now let me zoom out a little bit so this is a little easier to see. There we go. Sure, a bit of commentary. When we think about where we were in this last phase of SHARE: we were collecting and harvesting metadata from many sources, in hopes of aggregating it, normalizing it, and making it available for the creation of new services on top of this knowledge. In a centralized system, that is a very big task and a big ask, because, as I think everyone on the call is very well aware, metadata at best is inconsistent in its quality. That's a problem that technology alone will not solve; humans need to be involved. And that metadata is often locally relevant. It's locally relevant to, for example, the institution that is providing it from their repository. And in the same way that the data, the information we use as input, is locally relevant, what we want from that data is also locally relevant to that institution. So when we built the dashboard for UCSD, drawing on this SHARE data from many different places and displaying it on the dashboard, they had a problem they wanted to solve for UCSD. From a scaling standpoint, this is exactly what we need to be doing: solving these local problems and engaging local experts to be able to help solve them.
And so this is why this decentralized approach is so important: it just doesn't make sense for one group to try to solve everybody's problems. You have data you can't share publicly, that is private to your institution, that you want to do something with. That may be connected to things that Rick wants to do at Notre Dame and that others want to do in other places, but they are local problems you want to solve, and you need to be able to solve them the way you need to solve them. That seems like an obvious statement, but we need to create this virtuous cycle: if we want people to be able to contribute to this large aggregated data set, they need to solve their own problems first. You need to be somewhat selfish in that. And so that is the impetus for really pushing this decentralized model in this phase: there needs to be institutional ownership, because the product owners of these problems are, in many cases, the institutions. So what you're going to hear about next, as Rick will tell you, is technologies that really allow us to better engage the people who are closest to both the problem and the original data that will solve those problems. And this is critical, because of the bigger problems we want to solve when all of this data can be put together and combined with private data and all of these things. This is game-changing stuff. These are the knowledge discovery systems where someone just goes to a terminal and says, tell me how malaria and intestinal disease are related in ways that I've never thought of before.
And then some information pops up that is powered by this metadata. But that picture ignores all of the curation, local expertise, local problems, and combination with private data, all of these other things that have to be solved in a locally owned manner. So for the people who are new to SHARE, and new to this phase especially, that is what is behind this next piece of the conversation. Great, great. Yeah, so part of how we have approached this is with the experience of the three-plus years of SHARE as it has existed, how the support has been executed with the Center for Open Science as the primary service provider. In moving to this broader model, a big part of it is how we get the data in, and how we put more control over how that happens into the community's hands. With the Center for Open Science it was very much a "we do it for you" model, and we're looking to move to more of an enabling, facilitating model. Part of that is thinking about how we can make it as easy as possible to configure and set up, and hide the complexities that are not going to vary from institution to institution. So that's a big part of the thinking behind this, and I think it makes sense to show a very simple example of how we do this before the more complex example that I think Cam may be able to show as we progress. The basic idea: we have been working within this framework called Node-RED.
What it is, is a way to set up a workflow with some predefined operations that you don't have to define yourself: making a web request, taking that response, and then starting to parse it. So this is a very simple example; we've just been playing with ideas for different destinations, different sources, etc., and this is one. This example goes to Crossref. It simply defines the URL, saying: I'm going to grab works from Crossref. It then passes that URL to a web request node, which is just a standard HTTP request that Node-RED does for you already. And then this node says: I'm going to select the things coming back from there and parse them through. So it's actually a very simple thing: this is an example of a flow that can get items from Crossref. At the moment this is a very low-level look into things, but essentially this is the web request and the contents of the response that came back, and if I dive in I can start to see some of the records coming in here. So this is an example of a record for a book chapter, for instance. There's a whole series of directions here, and obviously this is just the tip of the iceberg in terms of the kind of data we would harvest from a place like Crossref, but it's really just meant to be a simple example of the idea. Let me go back to the diagram now. The idea is that, for any source we want to pull data from, we define a workflow that pulls it in, defines the schema, and defines any configuration: what URL does it need to request from, what kind of endpoint is it (is it OAI-PMH, will it accept REST-based requests), and then, once the data comes back, how do we want to map it.
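For those who would rather see the same flow as code: the three nodes described here (define the URL, make the HTTP request, select and parse the items) map onto a few lines of Python. This is just an illustrative sketch, not part of SHARE-RED; the Crossref works endpoint is the real public one, but the function names and the `rows` page size are our own choices.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

CROSSREF_WORKS = "https://api.crossref.org/works"  # public Crossref REST endpoint

def build_works_url(rows=5, query=None):
    """First node in the flow: construct the request URL."""
    params = {"rows": rows}
    if query:
        params["query"] = query
    return CROSSREF_WORKS + "?" + urlencode(params)

def parse_works(response_text):
    """'Select and parse' nodes: pull the item list out of the JSON envelope."""
    body = json.loads(response_text)
    return body.get("message", {}).get("items", [])

# Middle node, shown but not executed here: a plain HTTP GET.
#   items = parse_works(urlopen(build_works_url(rows=2)).read().decode())
```

Each returned item is a dict like the book-chapter record shown on screen, with keys such as `DOI` and `type`.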
So then, when I'm pulling those records and pulling together data from, say, 10 or 15 different sources, you want to be able to map all of that to something fairly common in order to use it in total. The idea is that you define this flow and have a message queue that those jobs can be submitted to. So I define a flow and say, okay, I want to run this: get the data back, do the mapping, and persist it to our persistence layer, in this example Elasticsearch. The whole idea is that Node-RED is very much a place for editing and configuring, and then there's an environment, not necessarily the same as where you're setting up these flows, where things actually run. I can give a little background in a second on what Node-RED actually is, but essentially it is a really nice community-built tool that we didn't create, which we realized would be incredibly useful to simplify how things are configured here. But we're also looking to add more shared infrastructure that folks can use. As I said earlier, we're not necessarily assuming that every institution will actually be running these queues; the idea is that we have some shared community sites that are able to help the community by accepting those jobs, Notre Dame being one example that we've thought of as an instance within this kind of network of providers. We're really not looking to dictate that it has to be a set number, or that it has to be Notre Dame you submit things to; Notre Dame is just one place that could do some of that, one of those service providers.
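A rough sketch of that map-then-persist step might look like the following. This is illustrative only: the field names below are hypothetical, not the actual SHARE schema, and the in-memory queue is a stand-in for whatever message queue a shared community node would actually consume from.

```python
def map_crossref_record(record):
    """Normalize one Crossref work into a flat, source-agnostic document
    (hypothetical common schema, for illustration only)."""
    return {
        "source": "crossref",
        "identifier": record.get("DOI"),
        "title": (record.get("title") or [None])[0],
        "type": record.get("type"),
        "date": "-".join(
            str(p) for p in record.get("issued", {}).get("date-parts", [[]])[0]
        ) or None,
    }

class InMemoryQueue:
    """Stand-in for the message queue a shared node would accept jobs from."""
    def __init__(self):
        self.jobs = []

    def submit(self, doc):
        # A real implementation would publish to a broker; a downstream
        # worker would then index the document into Elasticsearch.
        self.jobs.append(doc)

if __name__ == "__main__":
    q = InMemoryQueue()
    q.submit(map_crossref_record(
        {"DOI": "10.1234/abc", "title": ["A Chapter"], "type": "book-chapter",
         "issued": {"date-parts": [[2017, 6]]}}))
    print(q.jobs[0])
```

Once every source has a mapper like this, the documents from 10 or 15 sources share one shape and can be indexed and queried together.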
Of course, we haven't fully explored what our overall capacity would be to accept that kind of volume, so those are all things we're going to figure out as we go. Jeff, do you have anything else to add to that? Yeah, this could be something we discuss, and I'd be curious to know from folks on the line if they've used tools like this before. In any problem we're trying to solve in this SHARE context, we gather data first and then do something with it. And this Node-RED piece is our way of thinking about how we engage that local expertise in as inclusive a manner as possible. I think what we've learned in the past is that simply expecting code to be developed is not going to get the community engagement we want, especially the diversity of expertise and individuals that we want contributing to this problem-solving effort. Node-RED is an example of what's called a flow-based programming environment: you literally drag and drop different functions and tasks into the environment, and then connect the inputs and outputs of those tasks together in order to create more complex flows. What Rick took you through was going to a certain API, asking for data, doing something with some of those fields, and then transforming that into the next schema-based persistence. So all we're doing is gathering the data, and then doing some basic normalization against what we think that data is supposed to look like. But you can do that without touching any code, and the more people contribute, the less code is required. So while right now hitting Crossref may mean making a RESTful request to an API, and these words may not be super familiar to you, and that's fine...
...someone for whom that is familiar could wrap it all up and say: okay, we're just going to drag the Crossref function into the environment, and that's going to generate some data with a certain schema. They've encapsulated that logic, and now there's another level of abstraction that someone else may be able to work with. That person may say: okay, I don't know a lot about requests and APIs, but I do know that if I drag this Crossref function on and say I want this type of data, say, preprints with data sets, I know how to make that query. So I do that. And then I want data from arXiv, and I want to combine those in a certain way. That becomes, I think, much more accessible, and this is what flow-based programming has demonstrated in the IoT (Internet of Things) and home automation communities; that's where Node-RED comes from. Our fork of it is, right now, being called SHARE-RED. This was one of the ideas to really lower the barrier to entry and access more expertise in the local environment, in order to decentralize and distribute the potential to solve these bigger problems. So, one: does anybody have experience with tools like this? And from what you've seen, does this seem like an approach that might include more people in solving these local efforts? If there are still questions after that, we can run through a more detailed example; I think Ryan has one we can share with the team. So those are the two things I'd like to hear more from the community about. First, does anybody have experience with tools like this? Hi, this is Zhiwu at Tech. This looks a lot like Yahoo Pipes, which was released about 10 years ago. I wonder if this is a similar concept.
Yeah, it's very similar. Yahoo Pipes was a flow-based environment; flow-based programming never picked up a lot of steam, but Yahoo Pipes is a good example of where it was successful for certain domains. So that's pretty much exactly what it is. Node-RED is an open-source, community-developed version of that, with a somewhat higher level of abstraction, so it can do just about anything rather than dealing only with data. It can do pretty much any function you can imagine, because it really is flow-based programming, and the programming is the level of abstraction; you're not working only with data like in Pipes. You can write functions for anything and have it do anything: you can have it talk to Alexa and turn on your lights, or have a camera detect your car in the garage, whatever you want. That's how it's being used now, but we can take that same mentality. There are a lot of predefined functions from the community that deal with data, so we have a lot of the Pipes functionality already built in, and anything we don't have is quite trivial to add. In fact, we're working right now on an OAI-PMH function. The Internet of Things community doesn't need to tap many OAI resources, but we do, so we can contribute that one small module and then benefit from everything else the community is doing. So yes, very similar to Pipes, but it has that generalizability, which I think we will benefit from as we expand the community with concerns in this domain. Good. One other question: this looks to me to be mainly about data sources. Are we thinking about pulling the data together as it goes out of the system, or will each distributed institution have its own data sink?
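To make the OAI-PMH point concrete: the minimum such a module has to do is issue a `ListRecords` request and pick fields out of the returned XML. A hedged Python sketch of those two steps follows; the namespaces are the standard OAI-PMH Dublin Core ones, but the endpoint URL in the usage comment is a placeholder, and a real harvester would also handle resumption tokens.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

DC_NS = "{http://purl.org/dc/elements/1.1/}"  # Dublin Core element namespace

def build_list_records_url(base_url, metadata_prefix="oai_dc"):
    """Build a standard OAI-PMH ListRecords request URL."""
    return base_url + "?" + urlencode(
        {"verb": "ListRecords", "metadataPrefix": metadata_prefix})

def extract_titles(xml_text):
    """Pull every dc:title out of a ListRecords response."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.iter(DC_NS + "title")]

# Usage (placeholder endpoint):
#   url = build_list_records_url("https://repository.example.edu/oai")
```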
Yeah, that's a good question. Rick touched on it by saying Notre Dame could be one of the hosts of some of this data, and that will be a decision everyone will need to make for themselves. In developing the technology, I want to assume that people could just use this for their own purposes and not share it with anybody. So we want to build it in such a way that if the University of Virginia, for example, wanted to gather from five different sources, combine them in a certain way, do that on a regular basis, every day asking if there's new data from those five sources, and then generate a dashboard based on that data, they could do that without worrying about what everybody else is doing. And we would develop technologies, as long as they're aligned with enough of the community, that would help them do that. This interface for gathering seems to be one of the places that does have enough commonality that it's worth putting in some effort as a community, to make it really easy and extremely functional for these use cases. Now, if everybody is hitting, say, arXiv and bioRxiv by themselves, that's somewhat of a wasted resource. Perhaps there should be a few groups that say: okay, we're going to do this at scale, we have the resources, we're going to collect these 10 sources, normalize them in this way, and make that available. That's the next piece of this, where those groups can do that. And we want to think about, and I have some ideas here using some of the more modern thinking about decentralization, from torrents and, dare I say, blockchain (and I mean that very lightly), something that can basically announce to the world: okay, Notre Dame is harvesting from these 100 sources.
And so, if you would prefer to gather data from them rather than asking the source for all that raw information, you could find out very quickly that, oh, Notre Dame gathers Crossref data every day and has done this normalization; I might as well hit their cache rather than hitting Crossref myself. So we lower Crossref's burden, and we maximize the fact that Notre Dame is willing to host this information. That's the next piece, after the gathering phase, that we'd be thinking about. Thanks a lot. Yep. One more question, sorry: for the data within the existing SHARE project, have you considered making that a data source? Yeah, we could make that a data source; there are different ways to handle the data that currently exists, and it's really going to depend on the use cases we want to work on. We need to be very outcome-focused. We really need to deal with the problems that you're concerned with; we're only going to get contribution from the community if we're dealing with concerns that you have. So instead of a sort of "if you build it, they will come" approach: what is the problem that you need to solve? Let's solve that problem and see how it aligns with this. With that in mind, we would then look at what exists in SHARE from this former phase and ask: does that normalized format serve the problems we're going to solve now? Or do we need to tweak it, or combine it with private data that can't be shared publicly, whatever those issues are? But that would be the right approach, where we could then just make it a source, just like you said. So we have a nice migration path here because of that, I guess.
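One very lightweight way to picture that announce-and-reuse idea is a registry mapping each source to the trusted community caches that have announced it, so a harvester prefers a cache over the origin. This is purely hypothetical: the registry shape, node names, and URLs below are made up for illustration, and any real version would need the trust and overlap guarantees discussed later in the call.

```python
# Hypothetical registry of announced community caches, keyed by source name.
REGISTRY = {
    "crossref": ["https://share.nd.edu/cache/crossref"],  # made-up Notre Dame cache URL
    "arxiv": [],                                          # nobody caches this one yet
}

def resolve(source, origin_url):
    """Return a cached mirror if a trusted node announces one, else the origin."""
    mirrors = REGISTRY.get(source, [])
    return mirrors[0] if mirrors else origin_url
```

With something like this in front of the gathering step, a flow asks `resolve("crossref", ...)` and automatically benefits whenever a trusted host starts caching that source.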
And just to add on to that: one of the things we've thought of is that a lot of the work the Center for Open Science has done with preprint services and so on also seems like a natural candidate to be one community source of data, where they're handling a portion of the responsibility, so it isn't on everyone to duplicate that. In terms of the questions: so far we've just talked about getting the data in, and as Jeff said, this is really a means to the end of solving local problems. What I've pulled up here is just one example, the one we have of creating a dashboard on top of the data. So, pulling it in: at Notre Dame, for example, we're very keen on taking what UC San Diego has done with their dashboard, but then changing and tailoring that to concerns at Notre Dame, and also to the sources coming in, thinking about, as Jeff alluded to, bringing in data that we license from Web of Science, for example. We want to be able to incorporate data that we may or may not be able to share, to get as comprehensive a view as we can of what's happening at Notre Dame, and then incorporate other sources as well. By doing so, we have a lot more power to look at both the data we have openly available and the data that we're licensing. In the end, we all want data to be as open as possible, and we continue to strive for that; but we also want access to other aspects, and there may be private data that we only want available within the institution, if we're bringing these views to executive-level leaders within the university, etc. So there are other concerns like that to incorporate. But it doesn't have to be as large a use case as these, either. It could be much smaller in terms of organizational exposure or scope.
It doesn't have to be something that is so widely university-reaching. It could be something that just helps feed into a particular system or repository, something that feeds into your own search, or something that helps with analysis for a particular research project. We really don't want to limit the thinking on this in terms of the level of scope at which we would expect it to be used. So, I'm just scanning through this more detailed document to see if there's anything else to mention. One thing: you'll notice within the shared document there's a section called Statement of Work, and those are our immediate high-level priorities. We've already talked a lot about re-architecting the harvesting framework with a focus on community contribution. There is a graph database focus as well: once we've received the data, to persist it and link it, and then, once it's there, to expose it. And there's the metadata editing pipeline: we've talked a little about how we want the mappings to be more easily configurable in tools like Node-RED, but moving beyond that is also within the scope and intention of this. So, are there other immediate questions? Rick, I have a question. This is David Minor. Is there still some kind of concept that there is a SHARE corpus of data (sorry, my screen share closed, you can't see it) sitting somewhere that is being harvested and collected, or is it really moving to a purely distributed model, where there's a whole bunch of different pots of shared data sitting out there? I think the philosophy and intent is the latter: multiple pots. That said, SHARE as it exists today continues to harvest data, hosted at the Center for Open Science. We are looking to move some of that hosting to Notre Dame, and Virginia Tech is also a site for that, but we really don't want it to end there.
The goal is not for Notre Dame and Virginia Tech to be the primary hosts of SHARE. It's really meant to be more of a distributed model, but again, based on organizational capacity and on what we each agree to do — a concrete division of labor for pulling in and aggregating the data. We want to be pragmatic about that as well. The model I have in mind is similar to how LOCKSS is distributed, where different institutions agree to preserve some portion of the data. It's not going to be 100%, but then there's some analysis within the community of LOCKSS instances to make sure at least three or four sites are duplicating each piece of data. That's the mindset we have coming into this: thinking about how we can work together as a community to keep things in a healthy state. And just to get a little deeper on what decentralization really means: most of the time, especially in modern contexts like blockchain and torrents, you're talking about decentralized environments where you don't have a lot of trust, or don't want to trust. You want certain guarantees via the technology, or certain levels of anonymization, so that trust isn't really an issue as long as enough people hold things. We can capitalize on the fact that we are still, to a certain degree, an environment of trust. There are obviously use cases — for example, some of the data rescue and Data Refuge projects, where we thought there was trust and it turned out to be a little less trustworthy — but for many use cases these are environments of trust, and we just need to expand that a little to get the benefits of decentralization, which include protections for when a resource becomes untrusted or less trustworthy.
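The LOCKSS-style analysis described here — checking that every piece of the corpus is held by some minimum number of sites — could be sketched roughly as follows. This is purely illustrative; the institution names, chunk identifiers, and the threshold of three copies are invented for the example, not an actual SHARE policy.

```python
# Hypothetical sketch of a LOCKSS-style replication check: given which sites
# hold which chunks of the corpus, flag any chunk replicated at fewer than a
# minimum number of sites. All names and data here are invented.

MIN_COPIES = 3

holdings = {
    "notre-dame":    {"chunk-a", "chunk-b", "chunk-c"},
    "virginia-tech": {"chunk-a", "chunk-c"},
    "ucsd":          {"chunk-b", "chunk-c"},
}

def under_replicated(holdings, min_copies=MIN_COPIES):
    """Return the chunks held by fewer than min_copies sites, sorted."""
    counts = {}
    for site, chunks in holdings.items():
        for chunk in chunks:
            counts[chunk] = counts.get(chunk, 0) + 1
    return sorted(c for c, n in counts.items() if n < min_copies)

print(under_replicated(holdings))  # chunks that need another host
```

A community process could then ask additional sites to pick up whatever this kind of check flags, which is the "healthy state" idea above.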
But still — I know Rick, I know Notre Dame, and if they tell me something, I can gauge the probability that what they tell me is right. And if I have a few more players — now I have Notre Dame and Virginia Tech — my trust in each of them is very high, and together it is extremely high. This is a sort of balanced decentralization: we don't need to go full-fledged "we can't trust any single agent in the system," because we have this whitelist, these groups we know are reputable in the ecosystem, and we can create what feels like a centralized environment from a tooling standpoint. The benefit of centralization is that it's very easy to work with — an easy resource to access, typically, from a tool-development point of view. Because we have this element of trust, or because we can engage our users and ask them whom they trust, we can create a hybrid: an abstraction over that decentralized framework that looks like a centralized one. If you've said you trust Notre Dame and you trust Virginia Tech, you can access those resources almost as if they were one centralized resource, without only one of those groups having to take on the entire burden. We could do that for many resources, and we can put protections in place that guarantee overlap, such that if Rick were to go in an evil direction, we can still fall back on Virginia Tech to host those pieces until the community comes up with a response to cover what Rick was providing. And I don't think Rick will become evil, but just in case he does. Well, it's not just me — it's Notre Dame backing this, the Hesburgh Libraries here. Hesburgh Libraries: high potential for evil. Right, right.
Okay, so I think, Cam, if there's something that makes sense for you to show at this point, this might be a good time. Sorry, Leo here from Jisc, sorry to jump in. I was just wondering — it's quite interesting to hear the approach you're taking with SHARE now, because looking at it from the European context — I'm not sure how much you know about OpenAIRE, but there's a similar-ish system, funded by the European Commission and under development, the so-called OpenAIRE system, which takes a much more centralized approach. And the thing I know about OpenAIRE is that despite that far more centralized approach, even OpenAIRE is struggling with its own vision for the sustainability of the system and the, you could say, business or organizational model behind it. Do you have any thoughts on what your transition toward a more decentralized system would mean for your long-term sustainability and how you organize the system? "Business model" might be the wrong term, because there might not be much of a business behind it; maybe "organizational model" is the more appropriate term. Yeah — Jeff, do you want to speak to that? Yeah, I can comment on it, and then others can weigh in. The business model changes dramatically, from a centralized service provider to a distributed community effort, even with these decentralization techniques that allow it to look like a single service. And each of the groups involved in that community effort can have their own business model for how they approach this. If, for example, they're solving very local concerns and decide, well, we use Crossref a lot, so we might as well just share that with the community —
caches are cheap to maintain, so we'll do this one thing, but we're really solving our local concerns — then locally they can cover that with, say, internal funds. If another group says, well, really, we want to focus on preprints — preprints are a big deal right now, discovery is difficult because so many providers are popping up, and we want to engage funders on that — they can pursue that business model and work on sustainability for those efforts. The benefit to the community, and of the technologies we're thinking about developing, is that sometimes those business models don't work out. We always talk about 10-, 20-, 100-year sustainability before we've even proven a concept will work. Well, if it doesn't work out, anyone who was using or leveraging what that service provided before it pivoted or went under — whatever the issue is — can gather that information very quickly and easily, and for the community there's a seamless transition. The protocols and communication standards are all there, so we talk to each other in the same way. Rick may host something now, and then all of a sudden I host a little bit, a Virginia Tech host comes online and says, well, I can help host this. There's a way to deal with that. That, I think, is the community model that open source, for example, has demonstrated: in the open source world there are many, many business models and sustainability plans behind those approaches, but there's always this backup to the community, because there is some commonality in how we speak and how we license things. We can fall back on those standards as a protection in case one of those business models doesn't pan out the way it was intended.
Okay, and knowing that we officially have five minutes left — I can go beyond that, but I wanted to make sure, for folks who can't, to quickly touch on the last two bullets there before giving Cam a chance to show a little of his work. In terms of the frequency of these calls, we're thinking at least monthly; if it makes sense to meet more often, we certainly can, especially as activity picks up. At the moment I'd say community participation is very organic, but we're also looking to be very open. We've started with Notre Dame and 221B actively working together, and we're really looking to broaden that as more and more partners look to solve these problems. This call itself has been a pretty large group, but we're not assuming every call will be — or needs to be — this large; different people may come in and out of the conversation. In terms of getting organized around next steps, one thing I want to pull up — I'm on the share-research.org site, on the announcement for this call itself — toward the bottom there are links to where we have been working: GitHub and the Discord chat. If you're not familiar with Discord, it's like many other chat tools; it looks a lot like Slack. This is the channel we're currently operating in — you can see some behind-the-scenes notes from prepping for this call, but a lot of the planning and initial development activity has been pinging back and forth here. We've been relying on this channel a lot, and it's open: anyone can join. And then the GitHub space — this is share-research, and you'll see all of the activity happening there. share-red and share-red-OAI-PMH are obviously
the ones that have received the most activity recently. In terms of tracking tasks, we haven't designated Jira or GitHub issues or anything like that as the primary place to do that; I think that can be a point of discussion for this group as we move forward. But of course, as you add more people, the need for those kinds of tracking tools becomes more and more important, and we definitely recognize that. Any questions or comments on that? I'd be curious — before everyone goes, we can stay a little bit longer, and if people want a more in-depth demo of Node-RED, Ryan and Cam can demo what they have — but I'd be curious to know: after hearing this, what do you need? Do you need more information? What are you interested in doing? I want to really stress that contributions don't all have to be technical. I would like to see this be a community-based project — not just open-source technology, but open-source product, open product development. Requirements generation, specification generation, functional design, specs, QA, product vision — there's a whole series of work that happens before code gets written, and documentation too. So I'd like to know what you need to engage with us, with the expertise you have or want to provide to this growing community. This is Matt. For me it's some of what you talked about earlier, Jeff: what are the problems we're trying to solve — some elevator-speech points about the kinds of products that could come out of an effort like this. For example, a research administration view: there's certain intelligence they're trying to gather. How can SHARE help with that?
There's the actual researcher view, in the sense of "I want to find collaborators, or sources of information I can work with." And then of course the consumers of research — when I think of the original SHARE dashboard: "I'm looking for articles on X, Y, and Z, or a preprint," et cetera — and then those integration points. Being able to elevate that to a point where we can actually talk about deliverables, so that when we're looking at our community, we can explain this in a way that gets them to buy in. And again, some of that is the customization point you also talked about: if we're feeding into this, the ability to manage that, but then also what we can take out of it and make more interoperable with — as I think Rick alluded to — our internal discovery systems and things like that. I know that's a lot, but having some talking points around this would be very helpful for us. Yeah — and in the spirit of not doing an "if you build it, they will come" sort of thing, I'd flip that around and ask you to restate some of those things with your institution's interests in mind. What do you need with respect to research information management? What data sources have you been wanting to combine in certain ways to address those local problems? If we can do both of those things — we can certainly generate ideas for how this data can be used, but there are so many of them, and what we're really trying to get at is: what problem do you want to solve tomorrow? Literally tomorrow. What can we build — maybe next week, or in two weeks, not in a year? Right now, what do you want to build? What do you have the resources, the passion, the expertise to build to solve those problems locally?
So yes, we can create those bullet points, but I want to hear them with reference to a given institution. I think that's more meaningful to other institutions: not just "we're building this thing that sounds like a RIM system, or the harvesting for a RIM system," but "Notre Dame wants to build this piece to engage with their current research information management system to solve this problem for users." Yeah, and those known use cases would be great, because hearing them would percolate ideas on our end too — oh, okay, we had this problem over here that we didn't even think could be related. Sure, yep, exactly. That's right. Others — what do you need from us as far as the next calls, or more information for people who are new to SHARE? This was probably a whirlwind of what's going on with SHARE. So what do you need from us to engage in these processes? Hi Jeff, this is Sherri. I just have a question: if this is SHARE v3, then what's the current state of SHARE v2, the one you can search now? Is that still being harvested, and can you update us on its status? Yeah — as Rick mentioned, it's still being gathered and made available for, I think, public consumption. I left COS back in March, so I don't know the latest thinking on that product or service, but as far as I know, data continues to be gathered. For example, preprint search is still, I think, the only — or one of the only — aggregate searches available; it's made available via OSF Preprints, which is built right on top of SHARE v2. So I expect, and hope, that it remains available. But then, as we work on these more decentralized pieces, others like Notre Dame and Virginia Tech may be willing to grab chunks of that data and host it themselves.
But for the time being, as far as I know, the plan is to keep harvesting data. Yeah — and certainly a big part of it, as we build up this new form of infrastructure, is to migrate the majority of the activity over to it and reduce the burden on COS in that respect. That's one of the potential outcomes of the decentralization. Okay, I've started adding an action-items list to the notes here as well. If something was said that's still missing from the list, please add it. Okay — if folks have five or ten minutes, maybe we could get a quick demo of some of the more sophisticated, detailed things that Cam can show. Yeah — and I will stop sharing here. If there's anything else you want to ask or say, you can leave it in the community call document — there's a section at the bottom — and we can follow up asynchronously. We do want this to be a community effort across the board. Okay, this is Cam here. I just wanted to give you a quick tour of what the API looks like right now. This is a collection of works from various sources, and I know it's kind of hard to see, but there are multiple entries here with various levels of detail. This one obviously has nothing in it, but here you have a preprint with a DOI, for example, from arXiv. The neat thing about the API I built out is that it can take non-normalized data and normalize it for you. So on this side is what an arXiv-like payload looks like — if you just scrape their site and convert it to JSON, it looks like this. And you can see that when you post it to the API, it pulls out the relevant information: it gives you a publish date, the title, the abstract. It also holds on to all the original information, so we can deal with that in the future if we want to. And this also works for Crossref.
Again, you have the standard Crossref body, and if you submit that, you get a Crossref result, again normalized to the database. You can also submit pre-normalized data — this is just an example of what might appear if you decide to do the mapping in the Node-RED flow itself. Sorry — there we go. You can see that it includes all the right information. The benefit of this is that it adds redundancy: for sources that people don't have mappings for, or aren't able to make their own mappings for, we can handle that on the back end. And if we end up deciding to use different technology — which I don't think will be the case — it still gives us that flexibility. Also, this API layer currently sits on Postgres, which is a relational database, but that doesn't have to be the case for this to work; it can put data into any sort of pot, any decentralized or centralized database. And that's most of what I have: this idea of being able to scrape from any source, submit the raw information, and then normalize it. The final thing I want to mention — well, two things. One, Elasticsearch is very simple to add on top of this. And two, mappings could be made by users, stored in a database, and pulled out when the relevant URLs are seen. So if it sees, say, bioRxiv, it can grab the data — or somebody can write a mapping that takes data from bioRxiv and turns it into a centralized format — and then whenever someone submits raw bioRxiv data in the future, it can be adapted to that format automatically. And that's basically everything I have right now. Okay, thanks Cam. I think this illustrates how quickly — and Cam put this together quite quickly — we can create techniques to solve different challenges.
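The core idea of the demo — accept a raw payload from a known source, pull out a few common fields, and keep the original blob alongside for later re-mapping — can be sketched roughly like this. The field names and source shapes below are invented for illustration; they are not the actual payloads or schema of the API Cam demoed.

```python
# Minimal sketch of the normalization idea from the demo: map a raw payload
# from a known source into a common shape, keeping the original data intact.
# Source shapes and field names here are hypothetical.

def normalize(source, raw):
    """Return a normalized record; unknown sources get empty common fields."""
    if source == "crossref":
        record = {
            "title": raw.get("title", [None])[0],  # Crossref titles arrive as a list
            "doi": raw.get("DOI"),
        }
    elif source == "arxiv":
        record = {
            "title": raw.get("title"),
            "doi": raw.get("doi"),
        }
    else:
        record = {"title": None, "doi": None}
    record["raw"] = raw  # hold on to everything for future re-mapping
    return record

crossref_payload = {"title": ["A Preprint"], "DOI": "10.1234/xyz"}
print(normalize("crossref", crossref_payload)["title"])  # A Preprint
```

Keeping the `raw` blob is what makes the "deal with it in the future" point possible: a later, better mapping can be re-run over stored originals without re-harvesting.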
We'd never really thought about surfacing a normalized database, or an API to a normalized database, as the main asset of a service. But if we have a local concern, it's very easy to take the data that comes out of those harvested nodes and map it into whatever normalized model we need to solve the problem at hand. We can do that very quickly. We can put it into a full-text search database very quickly — which is how the current SHARE v2 is mainly used — or we can leave it as the raw key-values Cam showed. Does that seem accurate, Cam, in terms of how easy it is to work with when we take a less-specified approach? Yeah, it does. And to reiterate how quick this was: it took maybe three or four hours of work to set this entire thing up, and transporting parts of it has already made other projects easier. Okay, good. And I know we're over time, so I don't want to push too far, but just to say quickly: a lot of these things — the example Cam just gave — we really think that would be the kind of tool or component that could be called from a flow. And for the very simple example I showed, Node-RED has the ability to wrap all of that up as just one node in a flow. So as we continue to develop these common flows — ones that will be consistent in many ways — if the only thing that's changing is where you point it, which OAI-PMH source you harvest from, we don't expect everyone to write an OAI-PMH flow with all of the logic for pulling the data. Those are the kinds of things we're really looking to build out as templates and reusable components that make it that much easier to pull from a source.
So then it really is just a matter of working with the differences in the metadata — the inputs and outputs of these things. The actual plumbing, we've found, does not need to change dramatically, and for the most part there's a smaller number of types of sources we'd pull from. Yeah, that's a good point. People like Cam like building these sorts of systems that automatically take embedded, nested data and normalize it so that you can use SQL, or use R over SQL, to access it. But all of that could literally be one function in Node-RED: you say, "I want Crossref and arXiv and bioRxiv to go into this system, normalize it, and then I want to use R to access that data," and that can all be wrapped up in these very visual flows, using the interface we're starting with. Okay, any last questions? A quick question. This looks very promising, but I wonder: in order for us to share these kinds of modules, do we need to define an output format? Otherwise it sounds like you'd have multiple-in, multiple-out, with different formats for input and output, and then it becomes harder to share because it's an end-to-end relation. If we define a set of output formats — N-to-one, N-to-two, N-to-three — the modules would be much easier to share, but that could be a bit of a restriction. Do we want to strike a balance, or just let people do whatever they want? That's a great question, and a critical one. The thinking right now — and Rick can comment on this as well — is that there would be two places where we try to define some schema, and you don't have to use that schema. We don't have to have a community-based standard; nothing needs to be developed there.
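The "same plumbing, different mapping" point above could be sketched as a single generic harvest loop where only the source client and the per-source mapping function are swapped in. Everything here is a stand-in: `fake_fetch` plays the role of a real OAI-PMH (or other) client, and the Dublin Core field name is just for illustration.

```python
# Sketch of reusable harvest plumbing: one generic loop, with the source
# client and mapping function as pluggable pieces. fake_fetch and the field
# names are hypothetical stand-ins for real harvester components.

def harvest(fetch_records, mapping):
    """Run the shared plumbing: fetch records, apply the per-source mapping."""
    return [mapping(rec) for rec in fetch_records()]

def fake_fetch():
    # Stand-in for a real OAI-PMH client hitting a repository endpoint.
    return [{"dc:title": "Work One"}, {"dc:title": "Work Two"}]

def dublin_core_mapping(rec):
    # Per-source piece: only this (and the endpoint) changes between sources.
    return {"title": rec["dc:title"]}

print([r["title"] for r in harvest(fake_fetch, dublin_core_mapping)])
```

In Node-RED terms, `harvest` corresponds to the shared flow template, and `dublin_core_mapping` to the one node a contributor would actually edit.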
It's better if there is one, and as sub-communities come together to solve problems, we can refine those. But first, we need to basically say: this is what that source provides. What is the data? And even that can change. We need some way to validate it: if you change "date" to mean not the date published but the date you got the data, we need to know that — just as we need to know if you change "date" to a capital D rather than a lowercase d. So we have some input schema that we write against and validate against. That's the easier one; it shouldn't be too hard for the community to agree on, because it's mostly just mapping fields to a validation format. We're thinking right now about JSON Schema, which is easy to write and easy to use — someone could develop one in XML if they wanted to. Now, how do you do that across multiple sources? This is the ontology-alignment phase we're thinking about. Because we have standard ways to talk about the incoming data, and we know the type of the data — it's typed, it says "this is a string," "this is a date that obeys ISO such-and-such" — we can use techniques from ontology alignment to map that data into an aggregate source. Then someone could provide that source using that schema if they want to, but where we focus our effort is really defining how it's mapped. And if a community wants to come together and say they really like DataCite, they want to use the DataCite schema, or they like the SHARE v2 schema, we can just create the mapping to that. If people want to change it, all they do is change the mapping; it's then very low effort to pump the data right back through, for example, Cam's auto-normalization system to create the database they need on their back end. Yep. Thank you. Okay.
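The typed input-schema check described here — catching a source that renames `date` to `Date`, or silently changes a field's meaning to a different type — might look something like the following. This is a hand-rolled sketch of the idea; a real setup would more likely use JSON Schema as discussed, and the field names are invented.

```python
# Hand-rolled sketch of typed input validation (a real version would likely
# use JSON Schema): check that a record has the expected field names and
# types. Field names here are hypothetical.

from datetime import date

SCHEMA = {
    "title": str,
    "date_published": date,
}

def validate(record, schema=SCHEMA):
    """Return a list of problems; an empty list means the record conforms."""
    problems = []
    for field, expected in schema.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected):
            problems.append(f"wrong type for {field}")
    return problems

ok = {"title": "A Work", "date_published": date(2018, 1, 5)}
bad = {"Title": "A Work", "date_published": "2018-01-05"}  # capital-D, string date
print(validate(ok), validate(bad))
```

Note the `bad` record fails on exactly the two drift cases from the discussion: a field renamed with a capital letter, and a date delivered as a plain string instead of a typed date.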
So I think it's probably good to wrap this up now. Again, if you have the agenda open: the next step is really to schedule the next call. We'll have that out in the next few days, but we're looking to hold it within four weeks. As we become more active, I can see this becoming at least biweekly — and, again, less overview, more granular detail on current tasks is what I'd expect those calls to be. As we go, we can continue to decide whether we should have different types of calls, etc., to make sure we can communicate and discuss at all the various levels of detail. But I very, very much appreciate everyone joining today. It's been a fantastic discussion and really a great start, and we'll keep pulling more and more groups actively into this. All right — thank you. I will go ahead and stop the recording. Again, thank you all, and please keep in touch. If you're excited to contribute now, getting on the Discord channel is the quickest way to connect, or send an email to myself, Jeff, or Judy — anyone in that space. My email is rick.johnson@nd.edu — very simple, easy to remember. All right, thank you all. Thank you. Thank you.