All right, good morning. Some people are still filing in, but we're going to go ahead and get started. Thank you for coming this morning. My name is Tom Cramer. I'm the associate university librarian for digital library systems and services at Stanford University, and I am here today with my colleague, Simeon Warner, from Cornell University. Today we're going to introduce a new project, born out of the last ten or so years of concerted, community-based effort on BIBFRAME adoption. This project is called BlueCore. With us at CNI this week is our other colleague, Jason Kovari, from Cornell University. I believe Jason is in another room, though it's actually hard to see from up here. But he knows all of this stuff, so he doesn't have to be here. So we're the designated spokespeople. If at any point you have questions, Simeon and I, or Jason, would be happy to have further discussions.

And with that, we'll go ahead and begin. Simeon and I are going to talk for about 20, maybe 25 minutes, and then there should be, we hope, ample time for questions and discussion at the end.

So here's the conclusion slide, in case you want to jump out to a different track right now. We believe sincerely that BIBFRAME and linked data are the future of bibliographic description. As brilliant and wonderful as MARC is, it will not be with us forever, and to us BIBFRAME looks like the right next step, the next evolution. We've been working for a number of years, through the LD4L and LD4P grants, with the Library of Congress and others in the BIBFRAME community to really understand what adoption and application of BIBFRAME would mean, especially for the production of metadata and for cataloging. We think BIBFRAME has reached a certain maturity, and there are significant changes underway in our library systems environment, such that we are at an inflection point. We believe there's an opportunity to take a new direction, one that will be significant and pervasive.

This concept, which is still in development, is known as BlueCore: a project to develop a truly shared bibliographic store of linked data. And this will not shock you, but we believe that linked data is best when it is linked rather than copied. One of the things we have realized through experimentation and observation is that trying to replicate the MARC-based environment, with every institution keeping its own copies of linked data, is perhaps misguided. OK, easy audience. This is going to be good.

Not only do we want to develop this truly shared store of linked data; we believe the tooling is now on its way to make this real, to move from the realm of theory and ontological discussions about the importance of these descriptors or those relators to actually producing useful cataloging.

The LD4L and LD4P grants started back in 2014. Cornell University kicked this all off under Dean Krafft, and many of you may remember Dean. He once quipped, oh god, what was it? "The answer is linked data. Now, what's the problem that we're trying to solve?" So he was a real convert. He understood and really celebrated the benefits of linked data.
And he kicked off a wide set of grants, starting with Stanford University at the beginning, at various times partnering with Harvard, the University of Iowa, and Princeton, as well as broader cohorts, to explore how linked data could be baked into library systems, beyond great side projects, and actually work its way into the core of our work.

For the first two years of this series of projects, we did exploration, developed a set of use cases, and did a deep dive into the ontology of what was then BIBFRAME 1.0, as well as some associated ontologies. That was followed by two companion grants from 2016 to 2018, which looked at developing tooling to apply those ontologies, along with still more ontology work, and a new series of grants starting up at Stanford for Linked Data for Production, which is what LD4P stands for: linked data for metadata production. That meant sketching out what original cataloging and copy cataloging workflows might look like in our current systems environment, and doing ontology extensions of BIBFRAME into non-book materials. The 2018 to 2020 set of grants was LD4P2, which established working sandboxes for cataloging tools, conversion tools, and authority lookups. The current incarnation is LD4P3, which we've called closing the loop: it is not enough to simply produce the linked data; we need to plug into the rest of the bibliographic ecosystem, to be able to place orders, circulate materials, run reports, and transport data across the environment.

Over the years, I think we've made and seen substantial progress. BIBFRAME 1.0 was very different from, and much less mature than, the current versions, BIBFRAME 2.0, or 2.3, 2.4. We've now seen a couple of high-quality linked data cataloging tools developed. One of them, Sinopia, came out of this grant, with development led at Stanford. Cornell has invested heavily in developing Questioning Authority, an authority lookup and cataloging aid. We have worked with the good people at Share-VDE to do large-scale conversion and establish a data pool for the Program for Cooperative Cataloging, or PCC. There's been a lot of work on incorporating linked data into actual discovery environments; Cornell in particular has put into production both knowledge panels based on linked data and new browse interfaces. We've seen widespread interest in working groups within the PCC, with a whole set of professional catalogers engaged, working on training, documentation, and approaches. And if you have not heard of it, LD4, at LD4.io, is an ongoing organization that is taking up a lot of the outputs of these grants and moving them to the community. There's an annual conference in the summer, and we're seeing about 1,500 to 2,000 people participate in this virtual event every year.

I think the mission, when we consider all of this, is something that Dave Eichmann from the University of Iowa coined: by moving to linked data, we see a world enriched with library data, and libraries enriched with the world's data.

But one of the things we have realized is that we've probably carried forward some assumptions about how a MARC-based metadata environment works, where every cataloging department is its own kingdom: they import the data and tweak, fiddle, customize, enhance, whatever verb you want to use.
This creates lots and lots of copies of data. This is how our current systems work, but it's maybe not the right model for linked data moving forward. If we're talking about linked data, does every institution need to mint its own URIs for the same entities? Maybe not. Is it realistic that every institution will run its own triple store or RDF store? No, probably not; the technologies are simply not as robust or as simple to run as relational databases. Are we mindlessly, or maybe in some cases deliberately, trying to copy how the MARC environment has worked for the last 50 or 60 years? It's probably true that we are. Shouldn't we focus on making links to shared, authoritative URIs rather than creating our own and tweaking them? And isn't this an opportunity to reimagine what a 21st-century metadata environment might look like, with new models, new tooling, new workflows, new efficiencies, and actually better affordances for discovery? So with that, I'm going to turn this over to Simeon.

OK, so I'll start by celebrating MARC. There are few standards we use that are just about to turn 60, right? This is unbelievably successful, and much credit is due to a great many people involved over a long time. Of course, I'm going to move away from the celebration now and rain on the party a little bit. MARC has provided us very valuable infrastructure for a very long time and got us a long way, but it has some really fundamental flaws, which I think are an impediment to progress. You can argue about the details of the information in MARC all you want, but it's really these very high-level things that motivate us to move, things we can't solve by noodling with MARC in any way. MARC combines in one record information about entities that should be managed separately. And MARC doesn't use identifiers well for linking and for controlled vocabularies; there are just too many strings. Now, people have tried to retrofit links into MARC. In fact, we've done a lot of that at Cornell as a kind of stop-gap measure, but that's not a long-term solution. What we need is something else, and the something else we're talking about is BIBFRAME.

Again, you can raise plenty of criticisms about the details of BIBFRAME, but I think it has two key features which make it the right next stepping stone. Firstly, it properly separates the key entities we're interested in. Secondly, and maybe this is a good feature and a bad feature at the same time, because it's closely allied with MARC and evolved from it, it supports what will be a very necessary migration from MARC, and in fact back to MARC, backwards and forwards, over a long transition. I wouldn't like to predict how long it might take, if we're wildly successful, to make a big move away from MARC; ten years seems about right, and maybe even that's an underestimate. So these two features, I think, set up BIBFRAME as a useful stepping stone to whatever BIBFRAME then evolves into.

You've probably seen this picture of BIBFRAME. I want to focus on the idea of sharing and reuse within this model and how we treat different parts of it. First, in the library world we've had authorities for a long time; these are the precursor of the idea of linked data entities. So we move from authorities to external stores of shared curation of entities, where we put combined effort into making authoritative linked data stores, and we just link to them; we don't copy it all into the record.
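To make "link, don't copy" concrete, here is a minimal sketch in Python with rdflib. The BIBFRAME namespace is the one published by the Library of Congress; the work URI and the specific authority identifiers are placeholders for illustration only, not real BlueCore data.

```python
# Minimal sketch: describe a work by linking to shared authority URIs
# instead of copying name and subject strings into a local record.
from rdflib import BNode, Graph, Namespace, URIRef
from rdflib.namespace import RDF

BF = Namespace("http://id.loc.gov/ontologies/bibframe/")  # BIBFRAME 2.x

g = Graph()
g.bind("bf", BF)

# Hypothetical URI for a work in a shared store.
work = URIRef("https://example.org/bluecore/works/123")
g.add((work, RDF.type, BF.Work))

# BIBFRAME models authorship as a Contribution node pointing at an agent;
# the agent is a link to an authority URI, not a transcribed name string.
contribution = BNode()
g.add((work, BF.contribution, contribution))
g.add((contribution, RDF.type, BF.Contribution))
g.add((contribution, BF.agent,
       URIRef("http://id.loc.gov/authorities/names/n79021164")))  # placeholder LCNAF URI

# Subjects likewise link out to shared identifiers rather than headings.
g.add((work, BF.subject,
       URIRef("http://id.loc.gov/authorities/subjects/sh85009447")))  # placeholder LCSH URI

print(g.serialize(format="turtle"))
```

Every institution that links to those same URIs is pointing at one shared description, rather than maintaining its own tweakable copy.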
At the focus of BlueCore, we have a shared set of bibliographic descriptions. Missing from this picture is higher-level works, which might go into the authorities place instead. And then locally, we store only very minimal data at our individual institutions. So we have pushed the level of sharing down the stack: if you think of the current environment, where we share only the authority data, we're saying, let's build a model where we share the bibliographic description as well, and store and maintain locally only those parts that truly are local.

To go over what we're proposing to build: a shared linked data store of BIBFRAME works and instances, maintained and operated by some consortium of libraries. I think there's an important distinction here between one library and the whole world; the whole world is Wikidata, right? We're talking about some consortium, perhaps seeded by a small group, maybe Cornell, the Library of Congress, Penn, and Stanford, but structured to grow, where we have the level of trust to just edit the data. It's not Cornell's data or Stanford's data; it's our data. We of course need to integrate with the rest of the bibliographic systems and providers; we've got big companies that provide systems we depend on, EBSCO, Ex Libris, OCLC, et cetera. But we fundamentally break the institutional model of copy cataloging: we're not copying, we're linking, or adding to a shared store. And while we're at it, let's take this opportunity to lock that bibliographic data open for reuse, accessible by other institutions that perhaps aren't part of this initial pool, as part of a growth strategy.

So why? The current approach is incredibly duplicative and wasteful of effort. Just the act of copying, the act of writing description where we should be linking, is not effective. We believe this is an approach to make linked data work at scale and allow our libraries to better focus their efforts toward richer linking rather than local description. We want to enable data use and reuse by a growing set of libraries, including not just the big ones who can afford to experiment and push this forward, but small and diverse libraries too. We want to lay the foundation for working together in this much more open and collaborative way, and in the process avoid the lock-in associated with data systems and services that has hampered progress.

So we have an architecture picture. This is a sketch at the moment, but I want to highlight a few of its features. The first is that we need all the traditional data-in, data-out flows. We get data from vendors. We get data from other libraries. We share data back with partners, consumers, the public. All of that needs to work, and it does involve conversion, because the MARC world will still be out there; those tools have already been developed. Cornell and other institutions have separated their ILS, LSP, whatever you want to call it, from their discovery system, and we intend to continue doing that. So we still imagine local discovery at each institution, because we're mixing together different resources that are either local or that we have access to, and we present that. So there's the feeding of local discovery solutions from the shared data store. And then there's an ecosystem around natively editing RDF data. That involves the curation of works and instances in the shared store, and, very importantly, linking to entities in other shared environments.
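As a rough illustration of that layering, here is a sketch, again in Python with rdflib, of the split between what the consortium curates together and what stays local. All URIs are hypothetical, and real BIBFRAME descriptions carry much more structure; this only shows where the pieces would live.

```python
# Sketch of the proposed split: Work and Instance live in the shared
# store; each institution keeps only its Items, linked back into it.
from rdflib import BNode, Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF

BF = Namespace("http://id.loc.gov/ontologies/bibframe/")

shared = Graph()  # curated collectively by the consortium
shared.bind("bf", BF)
work = URIRef("https://example.org/bluecore/works/123")          # hypothetical
instance = URIRef("https://example.org/bluecore/instances/456")  # hypothetical
shared.add((work, RDF.type, BF.Work))
shared.add((instance, RDF.type, BF.Instance))
shared.add((instance, BF.instanceOf, work))
title = BNode()
shared.add((instance, BF.title, title))
shared.add((title, RDF.type, BF.Title))
shared.add((title, BF.mainTitle, Literal("An Example Title")))

local = Graph()  # all an institution keeps locally: its own items
local.bind("bf", BF)
item = URIRef("https://library.example.edu/items/0001")  # hypothetical
local.add((item, RDF.type, BF.Item))
local.add((item, BF.itemOf, instance))  # the link back into the shared store
local.add((item, BF.heldBy, URIRef("https://library.example.edu/")))  # simplified agent URI

print(local.serialize(format="turtle"))
```

Local discovery would then be fed by joining the institution's small item graph with the shared descriptions it points into.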
And then, I think, back to Tom.

So we are at this point still in conceptual development. We feel this is the right way to go, but there are a number of important details still to be sorted. First, workflow and data flow. This is a radically different approach to how catalogers do their day-to-day work, and the mental models and understanding of what it means to have a cataloging record will need to be redefined as we go: explored, put into code, and validated or tweaked as appropriate. We also know there is no future in which we will not have local or cloud-based ILSs, sets of editors, and all of the tooling that Simeon has just highlighted; all of these need to be integrated with. There will be no flag day where everyone uses current tooling and then swaps over from MARC to some future environment with brand-new tools. We will be growing into this. If our experience with cataloging in general, and with linked data in general, has demonstrated anything, it is that change comes slowly. We are pulled forward, but it is neither radical nor dramatic.

One of the big questions we're wondering about is, as data flows across the system and the ecosystem, where is the data of record? As Simeon mentioned, we'll have one foot in each of two camps for a long time, between MARC and BIBFRAME, and this convertibility and reversibility of transformations is essential; there's a sketch at the end of this passage of one simple way to keep checking it. But as we make updates and enhancements, where is the system of record and the data of record? Can we regenerate the data and always look back to an authoritative source?

Another question, of course, is organizational design. How do you define cataloging norms for a truly shared pool of data? One of the advantages of the current environment is that if you have a different professional opinion or institutional priority for how records are structured or how you want to apply things, you can make those decisions locally. In a truly shared store, everyone is working from the same records, enhancing the same records, and affected by the changes that others make. Governance of what this looks like, not just in terms of cataloging norms but of the infrastructure and the overall approach, is a fantastic topic for any group of libraries, and I'm sure we can look forward to many, many, many discussions. And then of course there's resourcing, not only for the original build-out but for the sustainability of the ecosystem that Simeon has described. In terms of this specific project, there's the question of how we engage broadly across the ecosystem, with catalogers, libraries, the commercial sector, standards bodies, and others who might be interested in participating, and then the specifics of project structure, plan, and timeline.

One thing we do know is that for this approach to be successful, a Big Bang methodology and a waterfall methodology for systems development simply will not work. We need to be iterative. If you've been involved in linked data for a while, you know that what succeeds, and what you produce, is often different from what was originally imagined or conceived. I think it was Yogi Berra, though I've seen this attributed to a couple of people: in theory, theory and practice are the same; in practice, they're not. That has never been more true than with linked data.
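One way to keep that data-of-record question honest during a long MARC/BIBFRAME coexistence is to continuously test round-trip fidelity. Below is a hedged sketch: to_bibframe() and to_marc() are hypothetical stand-ins for real converters (for example, the Library of Congress's marc2bibframe2 and bibframe2marc pipelines), and comparing field tags is only a crude first check, not a full fidelity measure.

```python
# Sketch: measure what a MARC -> BIBFRAME -> MARC round trip loses.
from pymarc import Record


def to_bibframe(record: Record) -> str:
    """Hypothetical stand-in: MARC record -> BIBFRAME RDF/XML string."""
    raise NotImplementedError


def to_marc(rdf_xml: str) -> Record:
    """Hypothetical stand-in: BIBFRAME RDF/XML string -> MARC record."""
    raise NotImplementedError


def round_trip_loss(record: Record) -> set[str]:
    """Return MARC field tags present in the source record but missing
    after the round trip. A non-empty result means the converted form
    cannot yet serve as the sole data of record for this record."""
    back = to_marc(to_bibframe(record))
    before = {field.tag for field in record.get_fields()}
    after = {field.tag for field in back.get_fields()}
    return before - after
```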
It also turns out, as Simeon's diagram demonstrated, that we're not developing a single system with a single set of interactions; we're developing an ecosystem, and as we make progress or changes in one area, it will take time to see how other areas develop and respond. This is a co-evolutionary process, and so we will need to build in time both to assess and to adapt to our findings. We also know from agile software development that, to minimize risk, every phase needs to have value in and of itself, and each phase should be a baby step rather than a revolutionary leap. And finally, one thing we've learned is that it is very hard to predict what the workflows will be and how people and the professional societies will react to changes in the environment. Only by actually putting these ideas into prototypes or real tooling can we assess how things will develop: whether we've got it right, need minor tweaks, or need to go back to the drawing board.

We have been working on this with partners at the Library of Congress and the University of Pennsylvania for the better part of this year, and at this point we feel there's a fairly solid concept here; we're approaching the end of the conceptualization phase. For 2024 we plan to engage in planning and prototyping, and then around 2025 start moving into more serious implementation. Because of the precepts on that last slide, it's difficult to predict exactly what that will look like or how it will be approached, but I think this is an area where all of us feel great interest and are committed to exploring and pushing forward.

One thing I should mention, or alluded to at the very beginning, is that this is a time of radical change in our systems environments. For libraries that have adopted FOLIO, for example, we are no longer locked into a MARC-based descriptive environment. We actually have working integrations of Sinopia, the linked data editor, with not only FOLIO but also Alma, and previously with SirsiDynix Symphony. So we know this is an environment that actually works: you can have catalogers doing bibliographic description using those tools, and then have those descriptions percolate into the source ILS for things like circulation and reporting, while still retaining the richness of the linked data. As the Library of Congress moves toward FOLIO over the next year or two, we think that's really going to break a logjam, and we'll see rapid progress across the entire sector. It's really exciting. And with that, I think we have time for questions or discussion.

I feel very short right now, but that's okay. Thank you for this presentation. Maybe there's a way I can just, yeah, that feels better; I don't have to crane my neck quite as much. I mean, just to see this change, and to see the work that you've all put into it, it's really exciting, but it also feels like something that's still very much cloudy. Despite that, what I'm wondering about is this: we get to this point where we have a shared catalog record. What considerations have you made around who decides what goes into those records, and how might that play out? I'm thinking about when there are challenges, or we need to update records, update the language, or think about bias, things like that. Has that come into any of the discussions?
And I know that's not quite technical per se, but I think it's something to be thinking about early on: what kind of bias could be baked into some of these practices? What are the blind spots there? Thank you.

So this is not an answer to the question; it's an approach to coming toward an answer, which is to start with a relatively small group, where you can hope to have a not-too-slow discussion of what that smaller group can agree on and how to move forward. I think one of the things we have to remember is that the cases where librarians really disagree about how to solve these problems are actually edge cases. We have built an environment that supports independent descriptions at every institution, which we use 0.0-something percent of the time. So in a way it's a question of where we are putting all of our effort. I think as a community we can work out these issues, and we're largely heading in the same direction. So it's a matter of building a process that helps us get there.

So my university has a shared catalog with a bunch of our state's public libraries. From the way you're talking about this, I could see academic libraries making a slightly easier transition than the public libraries. How do you see them coming into play with this framework?

Well, I think, and I need that microphone, I think there are two questions here: the generation of the records and the consumption of the records. For the consumption of the entities, because you can't really say "records" with a graph, even the language doesn't work anymore, I think any library would be able to benefit from and use the description that's produced. One of the open models, as Simeon was just alluding to, is how participation and the group editing of the shared data store develop, and that will have to start small and grow larger. We've seen efforts like Wikidata or Wikipedia, which have scaled up quite large and quite successfully, not without controversy, not without a lot of effort. And we've seen things that have failed to scale, or that moved too slowly to actually meet the needs. So I'm not sure it will make a difference in resourcing or production whether it's an academic library, a public library, or a special library. Ultimately the goal would be to open this up as broadly as possible to any library. And in fact, maybe not just any library: we might be talking about individual catalogers or individual specialists. If they have the training, if they have the knowledge, if there's a good system for trust and a good system for checks, there's no reason we can't have many more people contributing knowledge and ultimately adding links.

I guess the thing I might add is that I can't quite foresee the endgame in terms of how many shared stores there are. It's easy to conceive of a path forward to a size of collaboration where a truly shared store works successfully at scale, but quite how many perhaps somewhat more loosely coupled shared stores there might be is a question I can't predict.

There will be more than one. You could predict that.

Okay, I predict that.

You did already. So I'm just reminding you that Simeon believes there's going to be more than one.
Hi, so as you were talking about that co-evolutionary environment, I was thinking that the current environment has a lot of different players, commercial and otherwise, and, not to put too fine a point on it, who do you see as being displaced from this environment, particularly regarding commercial or vendor-based solutions?

I don't think we know what a linked data cataloging environment looks like yet. I think that's completely open; no one's doing it. I mean, the National Library of Sweden is doing some cool things, but no one is really doing it at scale with the entire ecosystem. I'm not sure you can say that anyone will be displaced, because it hasn't happened yet.

Yeah, maybe the language is a little too strong, but I'm just curious who you might see being part of this environment, or who might survive in an environment like this, and what would they need to do?

Yeah, I guess for me it's more about redrawing the lines between the boxes in the system. For myself and my institution, we run FOLIO, and I'm quite heavily involved with the FOLIO community. So you could ask, does this challenge the position of a system like FOLIO? Well, I'd say no: we still need much of what we do in FOLIO, such as checking things in and out, buying things or licensing things, all of that. But maybe we've moved the bibliographic data out of FOLIO, so we have just that minimal local data in there. It redraws the border of how that system operates with all the other players we're involved with. That's how I would think about the evolution. Thanks.

I think it's really interesting to think about the boundaries of what I think you called the consortium at one point, the group of people that exists, and the question of whether there's more than one. At least in my head, Wikidata and Wikipedia come up as things that resonate, and Wikipedia, I think, is successful to some extent because there's only one of it; they have managed to pull everybody into one set of policies and tools. Another place you could find inspiration is the open web, which is successful precisely because it's not one thing, but because you can in fact have links between things that are maintained in different places. So it might be really interesting, from an architectural point of view, to reflect on what the boundaries look like between consortia and what that ecosystem might in fact look like, because a single system probably doesn't make a good ecosystem. Thinking about the boundaries between different systems and different platforms, and how you can have meaningful interoperability and some meaningful sense of identity of entities in a distributed environment, seems like it would be very helpful. Otherwise, it's inspiring to hear this, to see this, precisely because it challenges some really foundational notions of what it means to collaboratively maintain this stuff.

Plus what everyone has already said. To build on those points, though: in your architecture diagram, which I won't go into detail about, you focused on the core of works and instances, but pushed all the other entities outside of BlueCore and into other stores.
At Yale, what we found is that in order to have a coherent, properly connected linked data system, you have to deal with reconciliation and enrichment not just of the works and instances, but also of the items, and critically of the people, the agents, the places, the concepts, the events, the time periods, and so forth. So do you foresee the same kind of approach being taken with what we currently think of as authority data? In practice everyone has their own authorities. There is ULAN at the Getty for artists, but those artists are also in Wikidata, and they're in all of our national library catalogs around the world, et cetera. So does this shared-space model also extend to people and to places and concepts and so forth?

So the short answer is yes, and I think for two reasons. The line I had on the slide said "works, instances, and more." Philosophically, it is: reuse external trusted entities as much as possible, while realizing that that's not always possible, for at least two reasons. One is the transitional case: you're in the middle of a workflow, working with an external service that can't accept an immediate update from someone in your workflow, so you perhaps need to create something locally that's proposed and later migrated. The other case is where locally you simply wish to do more: you're creating a richer entity description, and that would then be shared within the local pool, even if it can't be pushed out to one of the more broadly shared entity stores.

Yeah, and in the preliminary discussions we've had with colleagues at the Library of Congress, who also maintain id.loc.gov, this is going to have to be a blurry line, with updates and reconciliation flowing both ways. But I think in 2024 we'll be able to get into much more detail on that.

We have time for one last quick question; people probably want to transition. Hi, Claire.

I recently moved to the University of Illinois, which has a significant international and area studies collection and cataloging function. You mentioned the National Library of Sweden. I was just curious if you could talk a little more about what this looks like internationally.

Well, there's a BIBFRAME Europe, or BFE, group. And when Simeon envisions that there might be more than one shared store for BIBFRAME, I believe many of the national libraries in Europe are probably going to feel empowered and interested in having their own. I am not as up on that as some others, maybe some people in this room, but there is a strong current of interest internationally in BIBFRAME and in adoption, sometimes with local flavors. "Maybe we can do better than MARC and INTERMARC" remains, I think, somewhat of a hope. Do you have more details on that?

No, I think that's good, thank you.

All right, well, we are bleeding into the break, so people have a quick moment for coffee and then to relocate for the next session. Thank you very much.