Hi everyone. Welcome. My name is Jonathan Poritz. I have no particular affiliation at the moment. Information about me is on my website, which is poritz.net, P-O-R-I-T-Z dot net, slash Jonathan or just slash J. These slides and lots of other information, data files, code, and everything are all available at poritz.net slash J slash share slash howmanyOER, all one word, lowercase "howmany" and then "OER" uppercase.

All right. So I want to talk about the question of how many OER there are. This is something I got interested in a while ago. Before I get started, I should mention that I began this work while living on the unceded territory of the Ute peoples. Other peoples in that area are the Apache, the Arapaho, the Comanche, and the Cheyenne. I am grateful for the opportunity to have lived and worked there. I am no longer resident there. I live in a new place for which there is no tradition of giving land acknowledgments.

All right. So as I said, I'm interested in the question of how many OER there are. I wanted to investigate this. I was curious about it, and in this presentation I'm going to tell you what kind of answer I was looking for, how to make the question a little more precise, how to go about getting the answer to that question, and also what I could understand from the answer I was able to get. There are lots of details on these slides that I'm not going to have time to say now. Please look at the slides for that information.

All right. So what kind of answer do I want? Whenever I see a single statistic, I sort of feel like it wants context. Also, I did not give a particular date and time. So at all dates and all times, I want to know how many OER there are or were. And so let's put all these numbers together: I want a graph. I want to graph how many OER there have been over all time up to the present. What are the OER that I'm counting?
Well, the UNESCO OER Recommendation gives a sort of definition that I'm taking from now on as canonical. Basically, what it amounts to is works under the four Creative Commons licenses BY, BY-SA, BY-NC, and BY-NC-SA, not the Creative Commons licenses BY-ND or BY-NC-ND, and certainly not all rights reserved. Also, two statuses, which are CC0 and the CC Public Domain Mark, also count, although those are not licenses.

All right. Be a little bit careful. There can be other licenses or copyright statuses. So, for example, some things that have fallen into the global public domain, and could have the CC Public Domain Mark if some qualified person chose to put it on them, do not. For example, the Raffaello painting behind my head. Raffaello died a long, long time ago and all those works are now in the public domain. It's not the actual painting, of course, and so a reproduction might not have the actual Public Domain Mark, but nonetheless might be included. Also, I think we need to be aware in the OER community that more and more books, things that look like textbooks, will have interactive elements in them, maybe entirely interactive, like Jupyter notebooks and so on, and so may be filled with code. Code is not something that should have a CC license, as the Creative Commons folks tell us. So that's something to keep in mind.

All right. But okay, they're materials: learning, teaching, and research materials, from the UNESCO OER definition. So what kind of materials? Handbooks, videos, software? Well, let's start by just counting textbooks. Why not? We can go on and count more things. Other things are harder to count, but let's count textbooks. What exactly is a textbook? I'm going with US Supreme Court Justice Potter Stewart here: I know one when I see one. Basically, I'll take people's word for what they say.
I'm a little bit more willing to take things as being textbooks than maybe some folks are. For example, most academic monographs I would consider, when used in a classroom, to be a textbook. So it doesn't even have to say it's a textbook for it to be a textbook for me.

All right. Also, if I'm going to make a graph with time, what is time? The ideal thing would be when it was shared publicly, so publication date, issuing date, something like that. A very common replacement for that would be the copyright year. It's sometimes hard to get one or the other of those numbers, so I'll use whichever one I can find.

Finally, we should ask the question: when are two OER different? When should I increase my counter by one or by two? This matters because we do a lot of adaptations and remixing in the OER world. Well, there's an answer in copyright law. There's a certain minimal amount of creativity that has to be added to something for it to be a derivative work. That's going to be when I consider it a new work. If it doesn't have that minimal amount of creativity to be considered by copyright law to be a new work, I will consider it just another one of the same.

Let's do a test case. A high-quality data set, though somewhat small, although with more than a thousand books in it by now: the Open Education Network's Open Textbook Library. They share their data. Here's the graph. It grows pretty fast. That's pretty beautiful. I like it. You can see it shooting up from 1985 or so, the earliest dates of things in the Open Textbook Library. That's very pretty. I hoped it was growing exponentially, so let's put an exponential curve through it. I put the best exponential curve through that data. Not a very good fit. How about if we do some linear regression? There do seem to be two linear regimes in here. That's kind of interesting. What does that mean? Actually, linearity in data is very rare in nature.
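The graph-and-fit step described here can be sketched roughly as follows. This is a minimal illustration, not the speaker's actual code: the function names are hypothetical, the years are synthetic rather than Open Textbook Library data, and the exponential is fitted by a straight-line fit to the logarithm of the running total, which is a common shortcut and may differ from whatever "best exponential curve" method was used in the talk.

```python
import math
from collections import Counter
from itertools import accumulate


def cumulative_counts(years):
    """Return (sorted distinct years, running total of works up to each year)."""
    per_year = Counter(years)
    xs = sorted(per_year)
    totals = list(accumulate(per_year[y] for y in xs))
    return xs, totals


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_exponential(xs, totals):
    """Fit totals ~ A * exp(r * (x - xs[0])) via a linear fit to log(totals).

    Returns (A, r): A is the fitted total in the first year, r the growth rate per year.
    """
    shifted = [x - xs[0] for x in xs]
    log_a, r = fit_line(shifted, [math.log(t) for t in totals])
    return math.exp(log_a), r


def doubling_time(r):
    """Years needed for the total to double at continuous growth rate r."""
    return math.log(2) / r


# Synthetic publication years (assumption: made up purely for illustration).
years = [2010] + [2011] + [2012] * 2 + [2013] * 4 + [2014] * 8
xs, totals = cumulative_counts(years)
A, r = fit_exponential(xs, totals)
print(xs, totals)
print("doubling time in years:", round(doubling_time(r), 2))
```

In this synthetic example the running total doubles every year, so the fitted doubling time comes out at about one year. With real data, a poor fit of this kind is what prompts the look at linear regimes instead.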
I say this as someone who works as a data analyst sometimes. My guess is that basically during the whole life of the Open Textbook Library, there have been so many books available that could go into that library that they were basically always operating at capacity, the capacity of their intake and ingestion process. And so those two different regimes represent two different sorts of workflows, numbers of available staff, processes in place that could deal with a certain number of books per day, per year, whatever. In those regimes, the rate of adding to the OTL was constant, and therefore there was linear growth. Two different setups happened, and we can therefore see two different linear regimes, but there were always more books than could have been added that way.

Another test case: the BC Open Textbook collection. Again, they were very kind to share the data with me. A little more recent, not quite as big, only getting up to 400 and something, but that's pretty impressive too. Also growing quite quickly, with a nice knee in that curve, as you would say. Putting an exponential through it, it doesn't look exponential. That's unfortunate. Linear regimes: two linear regimes again, with my same explanation of why I think there are two linear regimes.

How about OpenStax? A much more limited collection, OpenStax books. Again, I think you'll get the... OpenStax has two different numbers on their site, a copyright date and a publication date. I'm using what they call the publication date, which is weirdly different, but I don't know what that's about. I haven't had my questions on this topic answered. Anyway, the exponential fit really looks ugly, but there are two linear regimes again, and it seems to fit my same hypothesis about what's going on.

Here's an interesting case: the Directory of Open Access Books. It's a large, wonderful collection, tens of thousands. Look, the top of that y-axis is 25,000. These are open access, but there are a variety of different licenses and regimes.
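The "two linear regimes" reading of these graphs can be sketched as a search for the single breakpoint that lets two separate straight lines fit the cumulative counts best. This is only an illustrative approach under stated assumptions: the function names are hypothetical, the data are synthetic, and the talk does not say how the regimes were actually identified.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def squared_error(xs, ys):
    """Sum of squared residuals of the best straight line through (xs, ys)."""
    a, b = fit_line(xs, ys)
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def best_two_segment_fit(xs, ys, min_points=3):
    """Try every split point and keep the one with the smallest total error.

    Returns a dict with the first x of the second regime and both slopes
    (slope = items added per year within that regime).
    """
    best = None
    for k in range(min_points, len(xs) - min_points + 1):
        err = squared_error(xs[:k], ys[:k]) + squared_error(xs[k:], ys[k:])
        if best is None or err < best["sse"]:
            best = {
                "sse": err,
                "break_x": xs[k],
                "slope_before": fit_line(xs[:k], ys[:k])[1],
                "slope_after": fit_line(xs[k:], ys[k:])[1],
            }
    return best


# Synthetic cumulative totals (assumption: invented numbers, not library data):
# about 10 books per year through 2005, then about 50 per year from 2006 on.
xs = list(range(2000, 2012))
ys = [10 * (x - 2000) for x in range(2000, 2006)] + \
     [80 + 50 * (x - 2006) for x in range(2006, 2012)]

result = best_two_segment_fit(xs, ys)
print(result["break_x"], round(result["slope_before"], 1), round(result["slope_after"], 1))
```

Under the capacity hypothesis in the talk, each slope would correspond to the intake rate a collection could sustain with a given workflow and staffing level.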
One nice thing about it is that it has a lot more non-North American books, and it also has a lot of commercial things that have nonetheless been released with some open license. They share data. I made this graph. If you look carefully at this graph, it doesn't really look exponential, so I'm not going to put an exponential through the whole thing. It turns out that if you put a linear regime at the beginning and then an exponential regime afterwards, you get very nice fits. Again, my guess is that they were operating at capacity up until 2005 or so, then they blew their capacity wide open, and they matched the exponential growth that was out there in the ambient community, which other folks have not been able to match because of their individual internal capacities. That's my hypothesis.

Let's go do the whole project now. Okay. What would the steps be? Make a list of all existing OER. Remove from the list things that are not textbooks. Remove things that do not have the right Creative Commons licenses or copyright status. Make another list containing the copyright years or publication years of each of the items. Sort the list, count them, and make the corresponding graphs.

I think you see the problem with my approach. How are we going to get a list of all the existing OER? You could try to crawl all of the known OER repositories. You're going to miss a lot of things. You're going to get a lot of duplications. The things I've shown you before already have a lot of duplications among them. And technically, telling the difference between Calculus, second edition and Calculus 2e is hard because there are lots of different variants. I will continue working on this, but I unfortunately do not have a single graph which I claim is some sort of universal graph. I have lots of individual ones, as I showed you.

What have we learned? I feel like what we've learned is that the body of OER is growing basically exponentially, at a doubling time of about 3.8 years.
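The counting steps listed above (filter to textbooks, filter to the accepted licenses and statuses, collapse duplicates, count by year) can be sketched as below. Everything here is an assumption made for illustration: the record fields, the license labels, the sample records, and especially the crude title normalization, which only catches simple variants like "Calculus, Second Edition" versus "Calculus 2e" and cannot judge the copyright-law creativity test the talk uses to decide when two works are different.

```python
import re
from collections import Counter

# Licenses and statuses counted as OER in the talk (label spellings are assumed).
OPEN_STATUSES = {"CC BY", "CC BY-SA", "CC BY-NC", "CC BY-NC-SA", "CC0", "PDM"}

ORDINALS = {"first": "1", "second": "2", "third": "3", "fourth": "4", "fifth": "5"}


def normalize_title(title):
    """Crude key for spotting duplicates: lowercase, no punctuation, editions as 'Ne'."""
    t = title.lower()
    t = re.sub(r"[^\w\s]", " ", t)  # replace punctuation with spaces
    for word, digit in ORDINALS.items():
        t = re.sub(rf"\b{word}\b", digit, t)  # "second" -> "2"
    # "2nd edition", "2 edition", "2 ed", "2e" all become "2e"
    t = re.sub(r"\b(\d+)(?:st|nd|rd|th)?\s*(?:edition|ed|e)\b", r"\1e", t)
    return " ".join(t.split())


def count_by_year(records):
    """Apply the filters in order, drop repeated titles, and count works per year."""
    seen = set()
    per_year = Counter()
    for rec in records:
        if rec["kind"] != "textbook":
            continue  # not a textbook
        if rec["license"] not in OPEN_STATUSES:
            continue  # wrong license or copyright status
        key = normalize_title(rec["title"])
        if key in seen:
            continue  # treated as another copy of the same work
        seen.add(key)
        per_year[rec["year"]] += 1
    return dict(sorted(per_year.items()))


# Hypothetical sample records, invented for this sketch.
records = [
    {"title": "Calculus, Second Edition", "license": "CC BY", "kind": "textbook", "year": 2016},
    {"title": "Calculus 2e", "license": "CC BY", "kind": "textbook", "year": 2016},
    {"title": "Intro Lecture", "license": "CC BY", "kind": "video", "year": 2018},
    {"title": "Biology", "license": "CC BY-ND", "kind": "textbook", "year": 2017},
    {"title": "Statistics", "license": "CC0", "kind": "textbook", "year": 2018},
]

counts = count_by_year(records)
print(counts)
```

Matching on title alone would wrongly merge different books that share a title, so a real attempt would presumably also compare authors or other metadata, which is exactly where the metadata problems raised in the talk come in.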
That's what the best exponential models that we've seen so far seem to indicate. It wants to grow that way, but lots of different particular platforms or locations are at capacity and therefore, as I explained a minute ago, they look like they have linear growth. This suggests to me that in the short term, adding capacity to support groups will result in greater and greater numbers, up until we get to that exponential growth rate, as we saw in the DOAB. At some point that exponential growth will run out; exponential growth is never permanent. That'll be a beautiful world. I'm not worried about that. That's a great world, and I'd like to live in that world.

What else did we learn? It's very hard to do research on OER because they're spread out. I don't think that's a bad thing. There's someone I know who likes to say they want to be the Facebook of OER. I don't want to participate in Facebook. I think the richness of the OER community is because it's so well spread. It's hard to find good metadata, and more metadata makes things more findable, and the approach to the internet of searching rather than indexing is what I think will work for OER.

Thanks to lots of people. Please see the slides for details and contact me.