Okay, so I think the recording did start, and we've got a smaller group this time, which is just fine. Everyone has the agenda, hopefully. I'll share my screen, especially for the recording's sake. Okay, hopefully folks can see that. Great. Let me go to the agenda.

Thank you for already placing yourselves on the attendee list; everyone should be able to add themselves. The notes from the last call are on there, as well as the recording from last time. I made a point this time to try to designate a note taker and put a list in there. Is there anyone who would like to volunteer to take notes other than me? It's hard to do both. Thank you, Cam.

All right, roll call is also pretty easy this time around. Oh, we have a few more people joining. Excellent. So how about we run through it? Hello again, I'm Rick Johnson. I'm at Notre Dame, on the SHARE operations team, and also helping with the refactor of sorts of SHARE, along with, from the very beginning, Jeff, Cam Blandford, and Ryan Mason. We're hopefully going to be pulling more people in as we go. Let's see, I'll hand it over to Virginia Tech.

I can do a quick roll call. This is Chiu Wuxi from Virginia Tech. I'll introduce myself: my name is Yirin Chen, hi everyone, and I'm in software engineering at the library. Hi, I'm Avino Kumar, I'm a second-year graduate student here. We also have Saris; he had a conflict today, so we'll just share the recording with him. Great.

Then Sherry, you're here from UVA. Yes, Sherry Lake, I'm the scholarly repository librarian here at the UVA library. Okay, and Cam and Ryan. I'll say Cam first.
So, hey, I'm Cam Blandford. I'm working with 221B, and I worked on SHARE a little bit before this at COS. And then Ryan. Hey all, this is Ryan Mason, also working with 221B, pretty similar to Cam; I worked on SHARE at COS prior to this. And then I'll call on you, Ho-jung, next. There we go. Hi, I'm Ho-jung, UC San Diego. I worked with David Minor and with Rick on the UCSD dashboard, Triton Share. Thanks, Ho-jung. And then I think Leo is the last one here. You are muted. Okay, now I'm unmuted, I think. There we go. Yeah, hi all, I'm Leo. I'm, well, I guess the continental, well, not continental European, but the European here; I work at Jisc in the United Kingdom. I'm just really interested in where you're going with SHARE and the metadata aspects around it. Great. And Ho-jung, you're new to this series of calls, but not new to SHARE. You're the closest thing to a newcomer on the call at the moment, so welcome. Hi, indeed, thank you.

Let's see, next on the agenda: reviewing action items from the last call. Those were actually all done: note taker, compiling local use cases, scheduling this call. So that's great. I thought next we could take a look at the use cases that have been compiled so far. How is the font size? Should I bump it up at all? Okay, I'm going to guess silence is good.

All right. As you can see, there are a few columns there: a brief description; "who cares," which is who the customer would be; and who contributed. I realized after I first put the document out that we want to know who all agrees with each one, not just who contributed, so the plus-one column is for who also thinks it would be important for them locally. The first three got a lot of plus ones, which is not too surprising, since the dashboard has been one of the key things that is real, that people have seen, so they have a sense of what it is.
But I think there are definitely some interesting things in here. I can run through the ones that I added, and then maybe other contributors can speak to theirs. I think Matt and Cynthia and Joe are the contributors who are not here, but that's okay.

The second one is kind of a branch off the first: storing affiliation data for local academic departments. Benchmarking against other academic departments is something that's come up, and it feels more challenging, at least locally here at Notre Dame: how we might get good data on multiple institutions to then compare and contrast. This is coming from our chemistry department initially. They want to identify criteria they can work to improve upon, where they see, "hey, this other institution is really strong at this, and this is maybe why, so let's focus on this." That's part of the thinking there. Beyond that, I think those are the ones that I added.

Xiu, or others at Virginia Tech, could you talk about the fourth one there, the overlaps? Yeah. With lots of harvested metadata, it might be a good time to take a look at that metadata and try to get some metrics out of it. So this is one use case: let's see how many open access materials overlap with each other, and if there are overlaps, which are of better quality, and so on. Okay.

And then, Ho-jung, the fifth one there, analytics-related use cases. This is something David Minor had actually sent to me, where he said, "hey, would it make sense to do a general unpacking of the different use cases?" That might be something to follow up with him on, to flesh out more. I don't know if you have any immediate comment on that or not. No, unfortunately, I need to touch base with them.
I was out all last week and had no chance to do that. No worries. This is a living document, so it's all good.

And then Xiu, you had the next one. Yeah, it comes from the notion that there's no way in Samvera to do OAI-PMH harvesting, so I was coming at it from the other side: if there's a use case to put metadata from Hydra/Samvera repositories into SHARE, then we need some way to get it out. Okay.

And then I'll read the next one: aggregate scholarship and research activity of Arizona's three public universities into discovery dashboards. I think this is the Arizona data live use case (I can never remember the exact name). There is a portal for that, which COS has stood up in the past year, I believe. I think that's where that is coming from.

And then Cynthia added a use case that lines up with the current NEH grant related to SHARE, which is working to make humanities work more discoverable and findable. So that's where that comes from; she added it to make sure we didn't lose sight of it.

And then the Unpaywall one. I'll just read through that as well: integrate Unpaywall for those looking for legal open versions of information resources. This is an interesting one, because there are things like the Open Access Button project. I don't know if others saw that, but this seems related. It seems like something where, wherever we were displaying the metadata, we'd want to be able to highlight or tag or mark which records come from open access sources. Any comments on these? Okay.

And to share some of my own thinking: I was also thinking of this document as a tool to help surface which parts are the most important in various places.
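(A small sketch of the Unpaywall integration idea just mentioned. The response shape below mirrors Unpaywall's public `/v2/{doi}` API fields, `is_oa` and `best_oa_location`, but the function only inspects an already-fetched response dict; how SHARE would actually fetch and store this is an open question.)

```python
def oa_badge(unpaywall_response: dict):
    """Return a legal open-access URL for a work, or None if it has none.

    Expects a dict shaped like an Unpaywall /v2/{doi} response.
    """
    if not unpaywall_response.get("is_oa"):
        return None
    best = unpaywall_response.get("best_oa_location") or {}
    # Prefer the landing page; fall back to whatever URL is available.
    return best.get("url_for_landing_page") or best.get("url")

# Example responses (illustrative, not real API output):
open_work = {
    "is_oa": True,
    "best_oa_location": {"url_for_landing_page": "https://example.org/oa/123"},
}
closed_work = {"is_oa": False}

print(oa_badge(open_work))    # https://example.org/oa/123
print(oa_badge(closed_work))  # None
```

A display layer could then show an "open access" badge next to any record where `oa_badge` returns a URL.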
So then that would also help us figure out who all might want to work on what related to SHARE as we move forward, whether it's a particular feature in the core code base, or looking at metadata, or workflows, process, communication, et cetera. I think this will help a lot with that. Okay. If there aren't any other comments, we've got those there, and I think those are good. We want to continue to take a look at them and see what core features they would translate into. Initially that's probably something that I plus the 221B folks will take an initial crack at, but at the moment we're still really focused on the core infrastructure of converting SHARE to a more distributed work model.

That leads into the next item. Let me see, I think I may still have the diagram we showed last time open. Yes, I do. Let me pull that up. There it is. Okay, can I zoom out? Perfect. This diagram from last time is the high-level map we have in place at the moment for the ingest of data into SHARE, thinking about how it would shift to a model where the configuration for how things are harvested and how they're mapped is largely linked up and sketched within Node-RED. Is some of this new for you, Ho-jung? I think everyone else saw this on the last call. I think I see a nod. Yes, okay. I'll try to fill in little background pieces as I go.

The idea is that within Node-RED, we link up these basic harvesting workflows, and within a workflow, each node does a particular task.
And within Node-RED, you can also wrap up multiple nodes into something like a super node. A lot of what we're working through is that if things are the same every time, or only a tiny configuration change or parameter applies, that's where we're aiming to keep it as simple as possible on the user and administrator end.

Ryan and Cam, you can give an update in a minute on some of the things you've been working on, but one thing I'm working on this week is the data model. This is not to say that I, or we, have a concrete sketch of what that is yet, but I'm looking at what exists today for SHARE as one of the baselines, taking a look at what's there with DataCite and Crossref, and thinking about the idea of really wanting to move SHARE closer to a knowledge graph as another criterion. Essentially the thinking is that we would start with one process of mapping from a particular source to the data model. So for any source, we would likely have a source schema definition, and then a mapping that goes from it to the SHARE data model. You can see these are colored the same on purpose; it's meant to be the same thing used later in the process, but it could also be an input to a harvester, where these may just be things you won't have to change as much. So the assumption at the moment is that we'll need to make sure we know what to map to in terms of the data model. Sorry, that keeps popping up; I was hoping I could avoid doing that for now.

At the moment my thinking, and I was bouncing these ideas off Ryan, Cam, and Jeff this morning, is that we do want a more object-based schema. If you're looking at... let me pull up, I think I had this site open recently.
I'll just pull it up. I'm looking at DataCite. DataCite is particularly focused on data, so it's focused on the work, or the research activity, as the primary part of the schema. Let me scroll down. This is, I think, the latest DataCite schema release; if there is a newer one, definitely let me know. But essentially the thinking is: yes, there's a record for an article or a dataset, but alongside that, what we've had in SHARE in the past and what I think we'd want to continue, you also have the independent notion of a creator that may have created or contributed to many things, and the independent notion of an organization that is affiliated with a particular work. So all those things get mapped out, and things like subject and discipline also become their own entities, so we can start to group things accordingly and link things based on related attributes. That's my assumption at the moment. Any comments on that, or plus ones? Does that seem like the right track?

This is Xiu. I think for DataCite it's a good track, because their metadata is very consistent. When you get to Crossref, their schema depends on what type of resource it is, so it's very varied. Yeah, I looked at Crossref quickly as well and had the same reaction: it did not seem like a starting point, but something we could map to if necessary. Right. But unfortunately Crossref, of course, has all the papers; DataCite has papers and such, but it wasn't in that field first the way Crossref was. Right, right. So that may be a more complex mapper and harvester to build for Crossref, but that's okay. And as we discussed, we already wrote up the work schema validation for Crossref, and that's obviously just the first step, but it's a step in the right direction.
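(A minimal sketch of the object-based model described above, where works, creators, organizations, and subjects are each first-class entities that can be linked to one another. All class and field names here are illustrative assumptions, not the actual SHARE data model.)

```python
# Hypothetical object-based schema, loosely following DataCite's
# work-centric shape but promoting creators, organizations, and
# subjects to independent entities, as discussed on the call.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Organization:
    name: str

@dataclass
class Creator:
    name: str
    affiliations: List[Organization] = field(default_factory=list)

@dataclass
class Subject:
    label: str

@dataclass
class Work:
    title: str
    work_type: str  # e.g. "article" or "dataset"
    creators: List[Creator] = field(default_factory=list)
    subjects: List[Subject] = field(default_factory=list)

# Because a Creator is an independent entity, the same object can be
# linked from many Works, which gives the graph-like shape described.
nd = Organization("University of Notre Dame")
alice = Creator("Alice Example", affiliations=[nd])
paper = Work("A Paper", "article", creators=[alice])
dataset = Work("Its Dataset", "dataset", creators=[alice])
print(paper.creators[0] is dataset.creators[0])  # True: one shared entity
```

The point of the sketch is only the linking: grouping by subject or affiliation then becomes a graph traversal rather than string matching on flattened records.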
And as we continue, we'll have more and more validation and mapping for Crossref. Yeah. So the next step I was going to work on is to continue iterating on formalizing the data model, then branch out to Ryan and Cam and Jeff, and I can certainly also post it on the Discord channel and send a note out to the list once it feels like a slightly more polished draft, to see what people think. I think the benefit we have now, having done this a couple of times, is being able to move past the hypotheticals. I'm very much thinking of sticking as close to DataCite as an initial standard as possible. But again, because DataCite is focused on the work-centric aspect, there are other options out there. There's also the VIVO ontology, which does map well to a knowledge-graph use case; it's just much more complex. So I'm feeling like we want to be somewhere in between, but be able to translate back and forth as needed between these things.

Does anyone have opinions on VIVO? I think at Virginia Tech you have VIVO, or are using VIVO, correct? I don't know how much exposure you've had to it. It's not officially launched yet; there's still work on it. Okay. I don't know how much exposure you've had to its internal metadata ontology and things like that. I'm not sure; I'd need to check with them. Okay.

I still have it open, I did have it open before, so I can pull it up to show you what it looks like. Is that it? Yes, this is it. This is the high-level look at it. Let's see if I can get it to zoom a little more. I think it looks more intimidating in this format than it actually is, just because all of the arrows everywhere make it look like a lot more.
But when you count the actual number of boxes, it's not that many, maybe 30 or 40, which is not too bad. This is something we're also keeping in mind: wanting to map well to existing standards, and trying not to duplicate too much and create yet another metadata standard and schema for folks to work with. There certainly could be places, at least with DataCite, where we might tweak the expectations around fields, whether they're mandatory, recommended, or optional. DataCite has that designation, and I actually just went through that exercise for another project; there were at least a few fields I marked as mandatory that they had as recommended. Description, for example, I think should be mandatory, not recommended. Little things like that would be easy to tweak while still sticking close to the standard. So that's where we are with that. Any other questions or comments? Ho-jung, did you have anything from your end, since you've done quite a bit with the data? Nope, I'm just trying to catch up at this point with what you're doing. No problem.

Okay, let's go back to the agenda to stay on track. Cam, did you have anything else that you wanted to share today? Yeah, I'll share my screen really quick. Okay. So currently this is a list of all the schemas we've written for different API endpoints: one for arXiv, one for Crossref, one for GitHub, DataCite, Dryad, and this is a list we're going to keep adding to over time. If you look at any of these, the GitHub one for instance, basically what it is is validation at a very, very minimal level, where you can say "I expect this field to be at this API endpoint," which is something that, for some reason, doesn't really exist currently.
And then from there, once you validate that all these fields exist, you can go through and start mapping them to whatever schema you have in your product or your dashboard, however you want to do that. So this provides a good starting point for normalizing data, and for letting us take in a whole bunch of different APIs and combine them into one "super API" of sorts. That's not the end goal for SHARE, but it's a good way of taking this necessary step and putting it to immediate use: we can bypass storing things for now and just provide a mechanism to view different APIs and conglomerate that data into one mass that you can then query and filter through. This would be really great for, say, the UCSD dashboard, where you could check five or six different endpoints without having to run an entire server of your own, and then display all that data in one location. The next step for this would obviously be the mapping portion, but for now this is just the JSON validation for the API endpoints and the responses you get from them. And if you have any recommendations on other APIs you'd like to see added to this list, please let us know; we're always trying to expand it.

Rick, you're muted. Thank you, thank you. So, is there anything else? Ryan, do you have anything? Pretty much what Cam went over with the schemas; that was about all we were working on right now. Okay, great. That covers what I said while I was muted, then. I will mention one more thing: work on the decentralized database is going pretty well. We have a few test instances up and running of a very basic communication protocol between several nodes, but it's getting there. We're making progress. Great.
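(A sketch of the minimal per-endpoint validation Cam describes: before mapping anything, check that the fields you expect actually exist in a given API response. The GitHub-style field names below are illustrative assumptions; the project's real schema files may well use a full JSON Schema library instead of this hand-rolled check.)

```python
# Expected fields for a hypothetical GitHub repository endpoint.
# An empty dict means "leaf field"; a nested dict declares expected
# sub-fields, e.g. owner.login.
EXPECTED_GITHUB_FIELDS = {
    "full_name": {},
    "html_url": {},
    "owner": {"login": {}},
}

def missing_fields(record, expected, prefix=""):
    """Return dotted paths of expected fields absent from a response."""
    missing = []
    for key, children in expected.items():
        path = prefix + key
        if not isinstance(record, dict) or key not in record:
            missing.append(path)
        elif children:
            missing += missing_fields(record[key], children, path + ".")
    return missing

response = {
    "full_name": "CenterForOpenScience/SHARE",
    "html_url": "https://github.com/CenterForOpenScience/SHARE",
    "owner": {"login": "CenterForOpenScience"},
}
print(missing_fields(response, EXPECTED_GITHUB_FIELDS))  # []
print(missing_fields({"owner": {}}, EXPECTED_GITHUB_FIELDS))
# ['full_name', 'html_url', 'owner.login']
```

Once `missing_fields` returns an empty list for an endpoint, a mapper can safely translate those fields into the target schema.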
So, along those lines, one of the things we have also been talking about offline is how to start to pull in other folks to contribute code. The initial idea is to draft up some smaller tickets, probably as GitHub issues. I see some nods; yeah, that's what we had talked about. So look for that. We figured that would be a way to bring folks in, in addition to looking at these other aspects. Once we have the draft metadata model in place (assuming it's not the final destination, but at least something to start working with), we'll of course see any holes as we work with it.

And that makes me think: would it be useful to compile a list of preferred sources as well, to see where there's any convergence? Yeah, that seems like it would be useful. Okay, I'll start another document for that, so folks can add to it and tag themselves as finding a particular source useful. There are the obvious ones, but it would be good to understand which ones carry the most weight.

And Rick, what do you consider a source? I would consider any kind of data source. DataCite is a source, for example. An individual repository too? Yes, those are definitely sources. A funder database is a source. And not to leave it as just community things: one of the opportunities with re-architecting SHARE is that it also opens things up to bringing in more local data sources, especially ones you otherwise wouldn't have wanted to push out to the wider global community, but that are still relevant when you're doing local analysis, monitoring, tracking, and information gathering. So local sources as well.
And even for a local source you wouldn't share, if there's a common technology it integrates with, that could definitely show up in multiple places: the same kind of data interchange or API that we would want to interact with. LDAP is one example, if we're looking to get data from that kind of source. Or, and this is probably one we wouldn't push beyond your own local instance of SHARE, things like PeopleSoft-type systems. Okay.

All right, I think that is all I had for today. Is there anything anyone else wanted to add to the agenda or bring up? Virginia Tech, from your end, since you've signed on to run another instance of this: is there anything on your mind in terms of what we need to figure out to get you organized and plugged in?

So I'm wondering: are we going to wait until all the development is done before we start operations, or should we start thinking about replicating the current SHARE and getting it running on our end, then evolving toward the new version? Arvina has, I think, a good question too. Yeah, go ahead.

Yeah, so this is my first meeting. What I've gathered is that all the source-to-destination data movement is a pull model: you run a job, and it fetches data from the source and puts it in your repository. Is there a way to make a push kind of system, so that if, say, I'm running some job and producing data, I can provide it directly to your system without first storing it in some data source and then starting a harvest job?

Yeah, that's a good question. Obviously that would require some kind of listener within the infrastructure, right?
I mean, that has existed for SHARE as of today, the current SHARE that's in production, and it's good to know that we'd want to continue some kind of support for it. Is this part of Node-RED functionality or not? I'm not sure. Say again, Node-RED? Yeah, the listening part. Cam or Ryan, can you answer that?

What's the question exactly? Most of what we're doing right now is pulling from sources; the question is whether Node-RED allows some kind of listening endpoint so people can push to us. Oh, I think there's no reason that couldn't be done. The way this would end up working is that we would have an API with probably a public endpoint of some sort where you could submit data. We'd be posting to that API using Node-RED, but you could also just post directly to that API, and I'm sure there'd be some sort of data validation process to make sure it's real data coming from a legitimate source. But if you're whitelisted, or something along those lines, that data could probably go right in. So, can we do that? Yes, definitely.

Yeah, and the closest thing to that I'd personally thought about so far was shared compute infrastructure: anyone could submit a particular job, or say "I want to run these ten jobs again." We'd already been thinking about that kind of listener, more of a job queue, but yes.

It could even be a data streaming source that registers with us. Is data streaming also something we could consider including, for example the Twitter API type of stuff? Yeah, I don't see why not, off the top of my head. I think that sounds like a good idea, but I would have to dig more into it to make sure it's feasible.
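(A sketch of the push model Cam outlines: instead of SHARE pulling from a source, the source POSTs records to a public ingest endpoint, which checks the submitter against a whitelist and validates the payload before accepting it. The endpoint URL, token scheme, and required field names here are all assumptions for illustration, not an agreed design.)

```python
import json

# Hypothetical registered source tokens (the "whitelist" mentioned above).
WHITELISTED_TOKENS = {"vt-demo-token"}
REQUIRED_FIELDS = {"title", "source", "uris"}

def accept_push(token: str, body: str):
    """Server-side gate for a pushed record: returns (accepted, reason)."""
    if token not in WHITELISTED_TOKENS:
        return False, "unknown source token"
    try:
        record = json.loads(body)
    except json.JSONDecodeError:
        return False, "body is not valid JSON"
    missing = REQUIRED_FIELDS - record.keys()
    if missing:
        return False, "missing fields: " + ", ".join(sorted(missing))
    return True, "accepted"

# A client (a local job, or a Node-RED flow) would then push with
# something like:
#   requests.post("https://share.example.org/api/push",
#                 headers={"Authorization": "Bearer vt-demo-token"},
#                 json=record)

ok, why = accept_push(
    "vt-demo-token",
    json.dumps({"title": "T", "source": "vt", "uris": []}),
)
print(ok, why)  # True accepted
```

The same gate could sit behind a Node-RED HTTP-in node or a plain web framework; the point is that push and pull both end at the same validation step.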
And also, to your point earlier, Node-RED can be used for this. The whole point of Node-RED is that it's modular, right? You take in the data with one node, and then you send that data off somewhere else with another node. But instead of taking the data in with the first node, you could have a list of pre-compiled data stored in that node. So you could still submit through Node-RED if you wanted to, and use the rest of the functionality that's been built out.

Yeah, and I would also say, to continue to reframe how we proceed: even if those things are not immediately on the list for Ryan or Cam or others, that doesn't preclude Virginia Tech from working on some of them. What we're really trying to surface is which areas are going to be really important, and those are also likely the ones where you'd feel the most benefit from contributing your time. So don't feel like you need to have Ryan or Cam do that work; that's the main message. Where we want to get to is a place where it's clear how to plug in: "hey, I want to add this particular module or support for something," and you have a pretty good idea of where to start.

Right. GitHub projects in general, even when they're public, are not necessarily easy to contribute to, because making a project open doesn't mean you've taken all the steps needed to make it easy to contribute to. But our whole plan with this, from the ground up, is to make sure there are easy, discrete tickets for people to get involved with. There's an easy PR submission process.
There's an easy communication process, everything, including good documentation, to facilitate that anyone who wants to work on this can easily work on it. And hopefully we can get more people involved, too.

I'm thinking out loud that we should probably set the goal, for the next call, of having at least a small set of issues for folks to jump onto and start contributing to. We currently have the next call scheduled; we're having these monthly, so four weeks from now, on September 20th. If folks want another call in between, that's fine too, but I think we're looking at a minimum of monthly at this point. Any call for doing it sooner, like in two weeks? Let's just think about it. We'll reach out. I imagine that will change once we start getting more tickets documented, and there may even be things that warrant weekly check-ins at times, depending on how active folks are on this stuff.

Okay, so let's do a quick review of the action items, the ones we can pull out here. So it was: list of issues. Let's see, the other one was a document for... sources. Sources, yep; I shouldn't use the word "compile," yes.

This is kind of a stab in the dark, but it would be useful to post these on some sort of public web page somewhere, or at least a form where people can submit use cases if they're interested. Then you can just share that on Twitter or something, and people who might not otherwise respond or get involved in the community as a whole can easily contribute to the document. Just a Google Form or something. Yeah, you were thinking for use cases? Yeah, or for any of these. Okay, yep. Is there anything else? Maybe that was it, besides things that are already ongoing.
You could put down "draft data model by the next call," just to put a date against that. Okay, all right, thanks. Are we close to done for today? All right, if there's nothing else, I'll wrap it up. Thank you again, all, for joining. Like I said, I hope that more and more we continue to get into the details on these, with less presentation and more active discussion, planning, and coordinating. And then we'll see, in terms of design discussions and things like that, what's going to be the most effective way to do those, to pull in as many people as makes sense. Okay, thank you all.

Oh, and Jibu, I guess we never answered your question about whether to stand up the current SHARE or the new one. I think we were assuming that you wouldn't stand up the current one yet, and that we would try to start from this new architecture if we could. Do we have some kind of ballpark estimate on when this new architecture would be in place? There's no estimate yet, but I think it would probably be useful to start setting some small incremental dates. Yep. I think once we start getting the issues and everything into GitHub, that will move a lot faster. Yeah, okay. And there are probably some high-level milestones we can document: having an instance of the server, having a job queue running, things of that nature, right? And I would think being able to harvest everything that the current SHARE Notify is harvesting would be one of the milestones we're looking for. For that one, it may be useful to take a look at the list of what's there, and whether we assume we harvest everything now, but then look to see if there's anything we would do sooner rather than later, right?
Since there are a lot of sources in SHARE now, I think that would be the only thing to make sure of. But at the same time, one of the goals for this is to make it easier to configure and map sources, so those are also things we could make easier to farm out, to help fill out that list. And we can probably fill in the list as we go. All right.

Well, thank you all for your time, and we'll look ahead to the next call. Does everyone know about the Discord channel as well? If you have not joined it yet, please do. There is a link, I believe, on this page (I think it's in the agenda as well) on how to get to Discord. Yeah, there it is, so you should be able to get to it from there. Also feel free to post to the SHARE list as well, but Discord is more where the active conversation has been happening, at least with the initial setup: myself, Jeff, Cam, and Ryan, though others are already on there too. At the moment you'll be able to see the informal conversations happening.

All right, thanks all. Hope you have a good rest of the day. Take care. Thank you, bye-bye. Bye.