Welcome to this afternoon's webinar, Querying the SHARE Dataset. My name is Judy Ruttenberg. I am a program director with the Association of Research Libraries working on the SHARE initiative. SHARE is a higher education and library initiative to maximize research impact by building a free, open dataset of research activities across the research life cycle. As a fully open source project, we are particularly keen to offer this webinar to the library developer community, in the hope that you will contribute to and derive value from SHARE by creating tools and services on top of the open metadata we are aggregating. My job today is to go over a few brief logistics and then to introduce our speaker, Erin Braswell. This webinar will be live captioned and recorded, and then posted to YouTube in approximately one week. Due to the number of participants today, questions will be through chat only, as your lines are muted. I will read any questions we receive aloud for Erin or myself to answer. With that, it is my pleasure to introduce Erin Braswell, a developer at the Center for Open Science who has been working on SHARE since it began in 2014. Erin is an astronomer by training, a very talented instructor as you will soon experience, and a professional wrestling enthusiast. Thank you all for joining us this afternoon, and now over to you, Erin.

Thanks for the intro, Judy. So we are going to get started. This is going to be quite an interactive webinar: as the title says, we are going to run through a series of slides in which we query the SHARE API. To get started, I am going to share my desktop and show you something called a Jupyter notebook, which is an interactive Python notebook, so we will be making live web queries as we go. Here is our Jupyter notebook, which will let us run some Python code, and here we go into presentation mode. You can see these slides and some example notebooks at this URL, along with some other examples that go a little more in depth on querying the SHARE API. You can download the code, with those more in-depth examples of how to run these things, at the same URL.

Okay, so here is how to get started. You are going to need Python, and the README linked here is a basic guide to getting up and running with it, using the tools it highlights. Once you are set up, you can run this exact same notebook by running the command jupyter notebook in a terminal, and you can follow along if you are familiar with Python.

First, we are interested in doing something very simple: we want a list of the current SHARE providers. We can make an API call, which means we are going to go to a URL and ask for the information that lives there. From that URL we will get the official name of every SHARE provider; the URL of that provider's home page; the short name, a nickname we use in the system to make sure we are querying the right source (we will use this nickname whenever we query for documents from that source); and the favicon, the image associated with that provider. So here is some code that uses a Python module called requests, and we are going to use requests to get this URL. We run this code cell, and then we have that ready.
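In condensed form, that providers call looks roughly like this. The endpoint URL and the response shape are assumptions from the demo, so check the SHARE documentation before running it:

```python
import requests

# Assumed providers endpoint from the demo; check the SHARE documentation
# for the current URL before running this.
PROVIDERS_URL = 'https://osf.io/api/v1/share/providers/'

response = requests.get(PROVIDERS_URL)
response.raise_for_status()
providers = response.json()

# Each provider record carries the fields described above; the top-level
# 'providerMap' container is an assumption about the response shape.
for provider in providers.get('providerMap', {}).values():
    print('{long_name} ({short_name}): {url}'.format(**provider))
```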
And now let's see what that data looks like, so we are going to print it out. It is quite big and bulky, and there is a lot of information there that we don't quite know how to interpret yet, but we can see shreds of what we are looking for: something called a long_name, which is the provider's official name; the short_name, or nickname; the URL, which is the provider's official home page; and something called a favicon, which is the image for each provider, stored in a data-URL format. That is not very easy to read, so we are going to format it in a way that looks a little better. We can use some very simple display tools that this IPython (Jupyter) notebook provides, plus some simple Python code, to print out the image, the long name (the official name), the URL for the official home page, and the nickname we are going to be using throughout. If we run that, it looks so much better and is a lot easier to see. For each of the SHARE providers we have their icon, their official name, a link back to the original site, and then the nickname, the short name we use in the system to differentiate between providers. Here are all 99 providers that we have today; let's scroll through them real quick. You can see some of those icons are different sizes, but they are all in there.

In order to make queries to the SHARE API, we need to know a bit more about the SHARE schema. The schema has a few required fields, fields that we look for in every single document that we harvest. Those are: titles; contributors; a URI, which is simply a link back to the original item or to related material such as datasets, in many different formats; and a providerUpdatedDateTime, the last time the source we got the document from updated it. After we harvest the documents, we add some additional information so we can make sure we have unique documents in our system: a source, which is where the document was originally harvested (one of those short names we looked at before), and a document identifier. We always try to find a unique identifier for each object from that particular source; sometimes that is a DOI, sometimes it is an internal identifier. When you combine those two fields, you get what is basically a unique document identifier: the source, plus the identifier within that source. To see more about the SHARE schema, including an example of a document with all of those fields, you can follow this link, where you can see the many, many fields in the SHARE schema that we try to look for.

With that information in mind, we can start building up some simple queries to see the kind of data that is in the SHARE dataset. We will start with a base URL, and then add some arguments to it: size, which is how many results we get back; sort, which is how we want those results sorted; and the from parameter, which is where we want to start in the results. First we define the base URL, and from there we are going to use a Python tool called furl, which lets us very simply build up this URL with arguments. So we are going to add that size parameter, and the sort parameter, which is the last time the document was updated.
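In condensed form (including the from parameter added in the next step), that URL building looks roughly like this; the search endpoint here is an assumption from the demo:

```python
from furl import furl

# Assumed SHARE search endpoint from the demo.
SHARE_SEARCH_URL = 'https://osf.io/api/v1/share/search/'

search_url = furl(SHARE_SEARCH_URL)
search_url.args['size'] = 3                           # results per page
search_url.args['sort'] = 'providerUpdatedDateTime'   # sort field
search_url.args['from'] = 5                           # offset into the results

print(search_url.url)
```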
And then we are going to start at result number five, just so we start a little way into those results. So let's take a look at our URL so far; we have added some parameters. We are going to be making a query to the SHARE API using a size of three, sorted by providerUpdatedDateTime, and starting at result number five. Let's take a look at our results. We request that URL, and we get back data in a format called JSON, a series of key-value pairs that we can parse through. So let's run this cell and check out our three results. We print out the title of each result, the source it came from, and the date it was last updated. One is about protein-targeted corona phase molecular recognition, from MIT; it actually looks like the most recently updated document from MIT.

If we are interested in narrowing the results down by source, we can restrict them to results from MIT only. We add a query on that source field that we attach to every document after we harvest it, so we are only looking for documents whose source is MIT. Then we use the exact same code as before and print out the title, the source it came from (just to make sure), and the date each document was last updated. We get results back that are all from MIT, which is good, because that is what we were looking for.

So those were some very basic queries: paginating through results, and filtering to a single source. But we can also use the SHARE API to build up much more complicated queries, and we will do that now. First, since we have been repeating a lot of our code, we are going to define a helper function to make this go a little smoother, so we have less to type. Our helper function is called query_share; it takes a URL we would like to use and a query to pass along, makes the request, and returns the response as JSON.

Now we start building up a query. This is what the Elasticsearch format looks like: we ask for five results, and we ask the API to return only results that have a field called sponsorship, by adding a filter that the sponsorship field exists. Sponsorship is not one of our required fields, so we want to make sure we only get results back that actually have it. We define that variable, sponsorship_query, and now we can run the query and print the results. We use the helper function query_share, give it the URL we built up and the sponsorship query we made before, and then iterate through the results, printing them one at a time. Ah, right, but I forgot to run this cell. There you go. So we save that sponsorship query and run this cell one more time using it with our result URL. Again, it prints the title, the source each result is coming from (in this case ClinicalTrials.gov), and the sponsor information. One result is sponsored by a university in Copenhagen, another by the University of Vermont, another by a specific cancer center.
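A condensed sketch of that helper plus the sponsorship filter; the endpoint and the exact field name are assumptions, and the filtered-query syntax matches the Elasticsearch version SHARE ran at the time:

```python
import json
import requests

SHARE_SEARCH_URL = 'https://osf.io/api/v1/share/search/'  # assumed endpoint

def query_share(url, query):
    """POST an Elasticsearch-style query body to the SHARE search API
    and return the parsed JSON response."""
    headers = {'Content-Type': 'application/json'}
    response = requests.post(url, headers=headers, data=json.dumps(query))
    response.raise_for_status()
    return response.json()

# Only return documents where the optional sponsorship field exists.
# The field name 'sponsorships' is an assumption; check the SHARE schema.
sponsorship_query = {
    'query': {'filtered': {'filter': {'exists': {'field': 'sponsorships'}}}},
    'size': 5,
}

results = query_share(SHARE_SEARCH_URL, sponsorship_query)
for hit in results['hits']['hits']:
    doc = hit['_source']
    print(doc['title'], '|', hit['_type'])  # _type holds the source short name
```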
So this is iterating through results that have sponsorship information: you can see the title, where each came from, and who the sponsors were. And this is only the first five results as a sample; you can iterate through all of the results that have sponsorship.

Let's do a new query: we want to know how many results do not have subjects associated with them. We create a new query object just like we did before, query SHARE using that URL, and get back both the total number of results and the number matching our query, so we can compute a percentage. We would like to know what percentage of results do not have subjects, so we go ahead and print that. It looks like over three million results out of a little more than five million, or just about 68% of our results, do not have subjects attached to them. That is an interesting metric to know, especially as we move into our next phases of trying to enhance the SHARE dataset and making sure that more of our results have subjects attached.

In addition to making raw queries with Elasticsearch, building them up yourself, we developed a SHARE parsing and analysis Python library. All of the code is open source, and here is a link to it. The tool is called sharepa, and it will guide you through accessing SHARE data. sharepa has a couple of basic actions that are really simple to use and will give you access to the documents in SHARE, ten at a time by default. One of the most basic functions is a count: you can use sharepa just to get a raw count of all the documents in SHARE. So, from sharepa import basic_search, and then basic_search.count(). We run that, and we can very quickly see there are 5,297,000 documents in SHARE.

We can also iterate through results. We take that basic search that gave us the count, execute it, and then for every hit in the results we print the title. Here are the first 10 titles, the earliest titles in SHARE at the moment. We can also use sharepa to slice the results and access a different subset of them; here we are going to print out five. We are going to slice from 20 to 25, and because Python indexes everything from zero, these will actually be the 21st through 25th results. So we execute that search again, but instead of getting the first 10 we specify a range, which starts a little further in and gives us only five titles.

By default, the oldest results are returned first, but we are probably most interested in the most recent results. So instead of getting the oldest results, we can sort by providerUpdatedDateTime, which is essentially the last time the provider updated the document; we store that information, and we can sort our query by that parameter instead. We run the query sorted by providerUpdatedDateTime and print each title along with the time it was last updated, just to make sure it looks okay. And indeed, it looks like everything was last updated today, which is a good sign.
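The sharepa calls from this section, in one condensed sketch; sharepa is built on elasticsearch-dsl, so slicing and sort() behave as shown, but treat the exact names as assumptions if your version differs:

```python
from sharepa import basic_search

# Total number of documents in SHARE.
print(basic_search.count())

# First page of results: print each title.
for hit in basic_search.execute():
    print(hit.title)

# Slice into the result set: indices 20-24, i.e. the 21st-25th documents.
for hit in basic_search[20:25].execute():
    print(hit.title)

# Sort newest-first by the provider's last-updated date.
recent = basic_search.sort('-providerUpdatedDateTime')
for hit in recent.execute():
    print(hit.title, hit.providerUpdatedDateTime)
```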
And we can see that we have a wide variety of results in SHARE; these are the 10 most recent documents. We can also do some more advanced searching using sharepa and the kinds of queries we have dealt with before. We will start by querying for objects that have a subject, and this is what that query looks like. We execute the search, sorted by most recent document: this query gives us the first 10 documents that have subjects, and also prints out those subjects for us. We run it, and for each document it shows the title along with the subjects attached to it.

When you are making queries to a new and unfamiliar API, sometimes you run into problems. So we are going to step through making a query to the SHARE API, trying some things out, and making adjustments as we go; we will start forming a search that we are not exactly sure how to make, and figure it out. We are interested in seeing how many results are specified as being in a language other than English; we want to know how many tagged non-English-language results are in SHARE. We start by mimicking the structure of queries we have seen before: a query_string type of query, looking for NOT languages equal to English. That sounds pretty intuitive. So we define that query, execute it, and for each result print the languages, to make sure we got it right.

When that happens, we get an AttributeError: the result object has no attribute 'languages'. That is a bad sign; it means the information we want is not in the query results. So we should look into that a little more. First of all, maybe we should narrow the query to results that actually do have a language attribute. Since language is not a required field, lots of results will not have that information, and if you try to access the language directly when it is not there, the result object will not know how to access it. So we start a new search and add a filter making sure the languages field exists in every result we get back. From there, we execute the search again, get the total number of documents in SHARE and the number of results that include language information, and print that out in a readable way. There are almost 214,000 documents, or only about 4% of documents in SHARE, that have a language attribute specified; in other words, we were able to harvest language information from only about 4% of documents.

Now we try printing the languages for each of those results again and see what happens. Here are the languages for each of the first ten results, and they are all English. So at least we are getting somewhere: we have a query that is not giving us an error. From here we can continue to refine the query and drill down to the number of results with non-English languages. Another thing we might notice in these results is that they are all three-letter codes, eng, whereas before we were trying to query for 'English'.
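Before moving on, here is the corrected language query in condensed form: filter for documents where the languages field exists before reading it, which avoids the AttributeError above. ShareSearch is sharepa's search class; the filter syntax comes from elasticsearch-dsl:

```python
from sharepa import ShareSearch

# Only documents that actually carry a languages field.
language_search = ShareSearch().filter('exists', field='languages')

total_docs = ShareSearch().count()
with_language = language_search.count()
print('{} of {} documents ({:.1%}) specify a language'.format(
    with_language, total_docs, float(with_language) / total_docs))

# Now it is safe to read hit.languages on each result.
for hit in language_search.execute():
    print(hit.languages)
```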
If we go check the SHARE schema, we will notice that the languages section specifies a three-letter code conforming to ISO 639-3. So each language is a series of three letters, and we know eng is the pattern we want to exclude. Now we have more information, right? We want a three-letter string, and in particular eng stands for English, so we can continue to refine our query from there. We use another Elasticsearch tool: we query for NOT the term languages eng, execute that query, and print out some information, namely the number of documents that are not in English, and then the languages of the first 10 results as a little sanity check to make sure English is not in there. It looks like there are 17,000 documents in SHARE that do not list English, and in the first 10 results we see German, French, Italian, and Latin, and no English. That is a good sign that our query worked: the number is lower, and we do not see English in our first 10 results.

We can also do some more complex queries, and some very basic visualization of the results of those queries. We are going to do that both with HTTP requests to the API directly and with the sharepa parsing and analysis tool. We are going to look at something called aggregations, which are very useful queries that return summary statistics over the entire dataset at once; because Elasticsearch has information about everything, it can do an aggregation query, which is useful in many ways, as we will see. And we are going to do some very simple data visualization of our results using a Python tool called pandas, with matplotlib to draw the plots.

First we will start with those aggregations, and very quickly get summary statistics over the entire SHARE dataset in one query: we are interested in the number of documents per source that are missing the description field. To do that, we start building a query in much the same way as before, querying for documents that do not have a description field. In addition to that query, we add something called aggs, or aggregations: we do a terms aggregation on _type, which is the field where Elasticsearch stores the source; that is essentially how Elasticsearch indexes all the documents. We save that aggregation query so we can run it later, and then run it, querying SHARE with the helper function we defined before, and for each source we print the source name and the number of documents without a description. This takes a second, because there are quite a few of them. Now we have a very long list of sources and the number of documents in each that lack a description, which on its own is not particularly useful, because we do not know how many documents each source has in total, and that would be really interesting to know. We can add that information to the aggregation query so it returns something a little more manageable and interesting: what we really want are statistics on the number of documents without descriptions for each source.
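The hand-built aggregation from this step looks roughly like this, reusing the query_share helper and search URL from the earlier sketch; the 'missing' filter and the size-0 terms convention match the Elasticsearch version of the time, so adjust for newer versions:

```python
# Count documents with no description, bucketed by source.
agg_query = {
    'query': {'filtered': {'filter': {'missing': {'field': 'description'}}}},
    'aggs': {
        'sources': {
            'terms': {'field': '_type', 'size': 0}  # size 0: return all buckets
        }
    },
    'size': 0,  # we only want the aggregation, not the hits themselves
}

agg_results = query_share(SHARE_SEARCH_URL, agg_query)
for bucket in agg_results['aggregations']['sources']['buckets']:
    print(bucket['key'], bucket['doc_count'])
```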
So in that aggs field, we add a percentage field, which turns one of the parameters into a percentage and makes the results a lot easier to print. We make that query, get the number of documents with no description, and for each source print the percentage, along with the number of documents matching the query and the total number of documents from that source. So now there is a little more information there. We are also going to limit it: back in the query, we restrict it to buckets with a minimum document count of one, ignoring all of the many sources where 100% of documents have descriptions, which makes the list a little more manageable. It looks like lots of the documents we harvested from NIH have no descriptions, whereas for many other sites, 60% or more of their documents do have descriptions.

We can also use sharepa to do aggregations, and let the tool build up the more complex queries so we do not have to write them by hand; it makes things a little clearer. We are going to find out how many documents per source do not have a subject. We start with a blank ShareSearch object and give it the exact same query we had before, NOT subjects:*, for documents that do not have subjects. Then we add a sources aggregation, a significant-terms aggregation, because we are interested in the breakdown of documents by source. We run this cell, and that is ready to go. Let's take a look at the long query sharepa has generated that we did not have to write by hand: we print it out, and it is getting a little long, but it looks a lot like the query we defined manually before. This time we have built it up in small pieces, which is easier for us to write. So we execute that query and check out the results, printing some of the initial documents as a sanity check. And this is interesting information: some of our sources do not provide subjects at all, while others do; UT Austin, for example, has subjects on most of its documents, and a lot of sources are right in the middle.

Another thing we might be interested in is an Elasticsearch query to find the most common subjects across the entire dataset, in all the sources. This might be really interesting if we want to know the kind of data that is in SHARE. So we start a new search object with a subject term filter: we are interested in the field subjects, we want to exclude some of the most common numbers as well as some of the most common words, and we go with a size of 10, so we get back the top 10 subjects. We run that, execute the search, drill down to exactly what we want, and print out those results. It looks like the top subject is 'article', which is interesting, and then we have physics, science, research, mathematics, engineering, and astrophysics.
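A sketch of that top-subjects aggregation through sharepa; the bucket name subjectAgg and the exclude pattern are illustrative, not the exact ones from the demo:

```python
from sharepa import ShareSearch

top_subjects = ShareSearch()
top_subjects.aggs.bucket(
    'subjectAgg',                    # our own name for the aggregation
    'terms',
    field='subjects',
    exclude='of|and|the|in|[0-9]+',  # illustrative stop-word/number pattern
    size=10,                         # only the ten most common subjects
)

results = top_subjects.execute()
for bucket in results.aggregations.subjectAgg.buckets:
    print(bucket.key, bucket.doc_count)
```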
These are just some of the top subjects with which documents in SHARE are tagged. Now that we have that information, we might want to plot it, so we are going to do some basic plotting with pandas and matplotlib. First of all, we create a DataFrame: we need to get the data into an appropriate format that pandas can pass on to matplotlib to make a nice graph. That format is called a DataFrame, which is really a lot like a spreadsheet. We import the DataFrame tool, put our dictionary of results into a DataFrame, and print it out to see what it looks like. That looks a lot better than the JSON blob we had before: you can see the document count and the key, so we know exactly how many documents came back for each of the top subjects we queried for earlier.

So now that we have a DataFrame, we can plot it. We use pyplot from the matplotlib Python module, ask for a bar graph based on the document count column of that DataFrame, and show the plot. So here we go: a bar graph of the top 10 tags and the number of documents that included each subject.

Let's make that query a little narrower. We are going to make a new search for all documents updated between 2012 and 2015 that have the subject science, so we are going to build up quite a big query here. We create a new search object, and the first thing we do is add a range filter on the field providerUpdatedDateTime, the last time the provider updated the document, looking for documents with a date on or after January 1, 2012, but before December 31, 2015. We add that filter to our new search and execute that; because we do not have to do it all at once, we can build it up slowly. The next thing we do is add the subject science, and also an aggregation, that significant-terms aggregation.

So we are starting to build up a rather large query; we have added a lot of different parts to it. We take a look at the query we have built by printing it out, and it is getting really big. sharepa and Elasticsearch have taken care of some of the formatting of the query: you will notice there are a lot of nested layers, and that is one nice thing about sharepa, it helps you build these nested Elasticsearch queries so you do not have to write them out yourself, because they can get a little involved. We have the date-range filter in there, we have the subject filter, and we have the aggregation, the significant-terms query, on the field _type, which again is the short name that Elasticsearch indexes the results on. So let's make that query and graph the results. Using much the same code as before, we execute the query, convert the results into a DataFrame, and turn one of the columns into a percentage so that it reads a little better; a condensed sketch of this whole query follows.
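Condensed, the query built up in this section looks roughly like this. The date format and field names follow the demo, and the percentage conversion is sketched with plain pandas rather than the exact helper used in the demo:

```python
import pandas as pd
from sharepa import ShareSearch

science_search = ShareSearch()

# Documents last updated by their provider between 2012 and 2015...
science_search = science_search.filter(
    'range',
    providerUpdatedDateTime={'gte': '2012-01-01', 'lte': '2015-12-31'},
)
# ...that carry the subject "science"...
science_search = science_search.query('match', subjects='science')
# ...bucketed by source via a significant-terms aggregation on _type.
science_search.aggs.bucket('sources', 'significant_terms', field='_type')

response = science_search.execute()
buckets = response.aggregations.sources.to_dict()['buckets']
frame = pd.DataFrame(buckets)

# Turn each source's match count into a percentage of that source's total
# (significant-terms buckets report both doc_count and bg_count).
frame['percent'] = 100.0 * frame['doc_count'] / frame['bg_count']
print(frame.head())
```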
And then we limit it to the first 30 results so it does not get too overwhelming and you can actually read the results. We run this, and we can see the top 30 sources with documents matching the term science updated between those two dates. It looks like CogPrints, one of the sources whose records run the furthest back, has between 60 and 70 documents in that time range with the subject science, whereas the University of Delaware and a couple of other small sources down here have only a couple of results matching the term science updated between those two dates.

Say we are interested in just plotting the number of documents per source. We will again limit it to 30 sources to keep the graph readable, but we could include as many as we like. We create a blank search object like before and query for *, which essentially means query for everything, and we add a terms aggregation, again aggregating on _type, the source short name for each document. We build up that query (we already have a terms query built), execute it, and convert the results into a DataFrame. Then we sort the DataFrame in descending order, since we want the biggest sources on the left and the smallest on the right, and limit it to the first 30 results so it is not overwhelming. We run that, and there is our graph down at the bottom. It looks like most of our results are from DataCite, followed by Crossref, figshare, and PubMed Central, and then a long tail of smaller sources. Next, say we are interested in making a pie chart, limited to the top 10 sources of data in SHARE: instead of a bar graph, we plot it as a pie graph, and there is our output, the same results in a slightly different format (both plotting calls are sketched below).

Now say we are interested in exporting SHARE data to use out in the wider world. Here are some very basic examples of how to get SHARE data into different formats. Say we have an interesting DataFrame and we want to export it to something like CSV or Excel so we can use that data elsewhere, outside of our Python programs. We are going to run a query and then export the results into files we can use elsewhere. This time we want the number of documents from each source that do have a description; before we looked at the sources that don't, but now we query for the ones that do. As we have been doing, we start with a basic ShareSearch object and build up the query: we look for anything with a description, add another terms aggregation on the field _type (the short name, which is the source), make sure results are returned in percentage format, and use a size of zero to make sure all of the results are returned. We execute that search; it takes a second. Now that the search has executed, we make a DataFrame out of it.
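For reference, the plotting calls used in these cells reduce to a handful of pandas and matplotlib lines; frame here stands for any of the buckets DataFrames above:

```python
import matplotlib.pyplot as plt

# Bar chart: biggest sources on the left, limited to the top 30.
top30 = frame.sort_values('doc_count', ascending=False).head(30)
top30.plot(kind='bar', x='key', y='doc_count', legend=False)
plt.ylabel('number of documents')
plt.show()

# Pie chart of the top 10 sources, as in the demo.
top10 = frame.sort_values('doc_count', ascending=False).head(10)
top10.set_index('key')['doc_count'].plot(kind='pie')
plt.show()
```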
And we are going to do a little bit of cleanup on the DataFrame to make it look better. That includes adding our own percentage column based on the old score column, because the score column is a little unclear: we multiply the score column by 100 to make our new column, which we know is the percentage of documents matching our query, in our case the documents that have a description. We also set the source name as the index, so each of the results is indexed by source name, and drop the old column, which is called key. So now we show our results. That key column did stick around for some reason, but these are all of the sources. We have the background count, which is the total number of documents from each source, and the document count, which is the number of documents matching our query, that is, the documents that do have a description. And then we have the percent, that new column we added. We can see that documents from SSRN, Mason, and NIST very often have descriptions. And since this DataFrame is so large (we have so many sources in SHARE), the display cuts out some of the middle rows to keep it manageable; further down are sources with fewer results that include descriptions.

So now we have that rather interesting dataset in a Python DataFrame, but we want to get it out of Python. We can actually use pandas to convert a DataFrame into lots of different data formats: we take the DataFrame and call a method named to_csv, and another named to_excel, passing each a path where we want the file saved. In my case that is a folder called exportedData, plus a file name, share_counts_with_descriptions, one as .csv and one as .xlsx (this cell is condensed below). So I'll show you the folder; give me a second. Okay: code, share stuff, share tutorials, exportedData. Here is exportedData, which is where I am going to export my DataFrame, and right now that folder is empty. We run this cell, which takes that DataFrame and exports it right into my folder. So here I have an Excel spreadsheet and also a .csv that I can take a look at. The .csv has the source each row came from, the total number of documents, the number of documents matching the query, and the percentage, so I can take the .csv and do whatever I like with it.

Now say we have some outside data that we would like to use to query SHARE. In our case, we are going to assume you already have the data in Python; there are lots of ways to get it there, from a list, a .csv file of names, an Excel file, or anything else you would like to work with in Python. We are going to start with a list of names we would like to use to query SHARE, just the names of two researchers at our university, and then use those names to start building up a SHARE query.
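Before we build that query, here is the cleanup-and-export cell from a moment ago in condensed form. The score and key column names are the ones that came back from the aggregation, the file names mirror the demo, and to_excel needs an Excel backend such as openpyxl installed:

```python
# Clean up the aggregation frame, then export it for use outside Python.
frame['percent'] = frame['score'] * 100   # clearer name than the raw score
frame = frame.set_index('key')            # index rows by source short name

frame.to_csv('exportedData/share_counts_with_descriptions.csv')
frame.to_excel('exportedData/share_counts_with_descriptions.xlsx')
```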
So we start off with a new ShareSearch object, and then for each name in our list of names, we add a new query to our name search. Then we execute: we build up one large query covering each of those names, run it, and print out the results, the number of documents matching our search, the title of each document, and the names of the contributors on those documents, just to make sure the search went okay. It looks like there are 38 documents in SHARE with contributors who have those names, and here are some of the titles. The first result had a contributor with one of those names, which looks great, and if we go down to the last one, it also has a contributor matching one of the names we gave.

We can add an aggregation if we are interested in knowing what sources those results came from. We take the name search we built up before, with all of those queries, add an aggregation on the field _type, the source of the document, re-execute, and save the result into a DataFrame so we can see what it looks like. It looks like 12 of those documents were from DataCite, 11 from Crossref, and 7 from PubMed Central; an interesting spread of the different sources with documents whose contributors match the names we gave it.

Of course, people's names are rather variable, and lots of people have exactly the same name. Something a little more concrete is an ORCID, which is unique to each researcher, and SHARE does try to gather ORCIDs whenever we can find them in order to differentiate between people who might share a name. So say we have a list of ORCIDs for researchers we are interested in, and we want to see whether they have information in SHARE. We save that list of ORCIDs and start up a new ShareSearch object, and then do much the same thing: a for loop that adds a new query for each ORCID in our list, and then we query SHARE (see the sketch below). We build up the query first, and since we are also interested in the sources these came from, we again add an aggregation on _type, that source short name. We execute the query, print out the resulting documents, and put the results into a DataFrame. It looks like there are 12 documents with contributors who have those ORCIDs, and here are the titles of those documents. The doc ID in this case is a DOI, and it looks like they all come from the source Crossref, which makes a lot of sense, since Crossref is one of the only sources we harvest at the moment that reliably provides ORCIDs for the contributors on its documents.

So we queried for those ORCIDs and got back a list of documents. The SHARE API has lots of different ways you can query it; this is really just the surface, a few examples of things I could think of. It has a lot of potential to answer questions about the data being gathered by SHARE and the data that is publicly available from all of our different providers at the moment.
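A sketch of both loops from this section, assuming sharepa plus elasticsearch-dsl's Q objects; the example names, the placeholder ORCID, and the contributors.orcid field name are all assumptions for illustration:

```python
from elasticsearch_dsl import Q
from sharepa import ShareSearch

names = ['Jane Researcher', 'John Scholar']   # placeholder names

# OR the per-name matches together with a bool/should query.
# elasticsearch-dsl turns contributors__name into contributors.name.
name_search = ShareSearch().query(
    Q('bool', should=[Q('match', contributors__name=n) for n in names])
)
name_search.aggs.bucket('sources', 'terms', field='_type', size=0)

results = name_search.execute()
print(results.hits.total, 'matching documents')
for hit in results:
    print(hit.title)

# The ORCID version has the same shape; the field name is an assumption
# about the SHARE schema.
orcids = ['0000-0000-0000-0000']              # placeholder ORCID
orcid_search = ShareSearch().query(
    Q('bool', should=[Q('match', contributors__orcid=o) for o in orcids])
)
orcid_search.aggs.bucket('sources', 'terms', field='_type', size=0)
print(orcid_search.count(), 'documents with those ORCIDs')
```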
And as we continue to do data curation and enhancement, we will only make these analyses more interesting and hopefully more useful. This is my final slide, and thank you so much for being here and listening. This is my email address if you have questions, here is a URL for the SHARE technical documentation and more information, and these slides, as well as some example notebooks, can be found at the link in the chat window.

Okay, while people are thinking of their questions, Erin, I have one: what if some of our participants watching today are not yet providers to SHARE and wish to be?

Sure. Yes, so you can go to osf.io/share/registration, and I think we can probably put that somewhere in the chat window. At that URL you can fill out a pretty simple form and give us some information about your provider. Mainly, we need to know a URL where we can find metadata about the documents in the provider, and hopefully some kind of documentation that tells us how to filter those results by date, because we are going to come back on a daily basis and check for new results, so being able to query by date updated is really helpful; as well as maybe some information about the schema for your data. We also need to know that you are okay with freely redistributing the data, and that doing so is okay under your terms of service. And yeah, that would be the way to do it, and that will get you added to our queue of providers to continue adding to SHARE. Also, if you would rather come to us than have us harvest you, we have a push API: you can pre-format your data according to the SHARE schema and send it along to SHARE. More information is in the SHARE technical documentation; I believe there is a link to that in the chat window.

Great, thank you very much. This is Judy; I also want to point out that Erin mentioned at the end of the presentation that during this phase of SHARE, we will be addressing some of the issues she raised in her queries around missing values, and the opportunity for data curation. On the SHARE website, www.share-research.org, there is an opportunity to become a SHARE curation associate. You will find that information under News and under Our Team: an opportunity to gain some of these skills in more depth and to work directly with the data coming into SHARE from your own repository, and with the SHARE aggregate itself. If that opportunity is of interest, I urge you to explore it.

Okay, we are not seeing questions. I want to thank everyone who participated today. And a big thank you to... oh, I see a question. I am about to thank Erin; hold on. I do have a question coming in, about the associate program. It says that library directors should send applications to SHARE; I presently do not have a library dean. Would a supervisor be okay? Yes, that is a great question, thank you for raising it. We are really just looking for somebody, yes, a supervisor, a department chair, somebody who can answer questions on behalf of the institution with respect to your application. So yes, thank you for asking that question.

Other questions? Okay, well, Erin, I want to thank you so much for preparing these tutorials and for providing training today. Erin gave her email address at the end, erin@cos.io, and you can also reach SHARE at info@share-research.org with additional questions. So, thanks to everyone for joining today, and I hope everyone has a chance to go to the URL provided and explore these notebooks and tutorials.