Welcome to day two of I Annotate. We had a great time yesterday and I'm looking forward to getting it going again today. And I'm super happy to be introducing a panel that we've been talking about for a while now, and really honored to have a pretty extraordinary group of people, some of the most thoughtful and thought-provoking people that I know in this field, who are working on pretty extraordinary projects. So let me first introduce the panel: Brewster Kahle, the digital librarian of the Internet Archive. Jennifer Lin, a director of product management at Crossref. Dario Taraborelli, the head of research at the Wikimedia Foundation. And Elizabeth Caley, the chief of staff at Chan Zuckerberg Meta.

Over the last 25 years, the way the world's knowledge infrastructure has transformed has been pretty revolutionary. I think we all know that. And the organizations up here are part of that story. We have a crowdsourced encyclopedia that's orders of magnitude more comprehensive, accurate, and timely than anything else we've had in history. A free and open archive of most of everything that's ever been on the web. Free copies of a tremendous amount of the world's books, music, and software programs. And now a growing, searchable, near-real-time archive of dozens of the world's most important TV news channels going back a decade. And maybe more than dozens; I'm not quite sure, I haven't gotten the update. We have a new resource that, using AI, reads the world's scientific papers and delivers their insights to scientists in real time so that they can make faster progress. It can predict with surprising accuracy the likely importance of an article within seconds of its publication, and it is now available to everyone, for free, forever. And we have a nonprofit service that, on behalf of publishers, is opening the metadata about all scholarly works and creating permanent, stable identifiers for them.
But what lies ahead may prove to be even more breathtaking. We read science fiction because it frees us from the world that we know and allows us to imagine the world that might be. It can be a useful way to think about problems from a different perspective. The recent movie Passengers, with Jennifer Lawrence and Chris Pratt, puts us on an enormous spaceship of 5,000 people traveling for 100 years to a new world to settle there. They carry with them all the supplies they'll need to survive, all the accumulated knowledge of their species, a robotic doctor that can cure any disease and fix any injury. We naturally presume, if we think about it, that all the software on that spaceship is probably open source, the APIs extensively documented, and of course the information there is all free and open, because anybody who cared about royalty payments would be dead long before any money could be shipped back to them, either physically or digitally, from a ship traveling for over 100 years. Purely hypothetical? Not at all, because Elon wants to start manned missions to Mars within the decade. Apparently for now these will be one-way trips, and everything folks will need to build and do what they're going to do will have to travel with them. And of course this peculiar-looking balloon, which hopefully we can get back on the screen, is the pressure vessel for the fuel that will power that flight. They just tested this out in the middle of the ocean two or three weeks ago, apparently successfully. So while it sounds kind of fanciful, Elon, as we know, tends to accomplish the things he sets out to do, and he is now actively working to put this mission together. So the question that we have here is: what do we need to do in order to build the infrastructure we want for tomorrow? And of course that raises the question: what do we want, and why?
So what I want to do is really get out of the way and let the folks here talk a little bit about their vision for the future and how we can all work a little more effectively together to make that happen. We'll let each of them share some of their thoughts and perhaps pose a few questions for each other, but fairly rapidly I'd love to get to the point where we open this up to all of you. So I want to start with a few of my questions. The first is: what do we want? What are we trying to build, and what kind of knowledge infrastructure do we need in order to get there? Let me start right to left here. Brewster, why don't you go for it.

Thank you. This is fantastic. This is the Bay Area, full of dreamers and builders, and I woke up in a fury this morning, so I thought I'd share that with you. Imagine you're a graduate student in some field: economics, medicine, history, law, biology, design. And you had an idea to do something new, but to understand it, to try something out, first you had to buy all the books of the field, then you had to build a building to store all of those books, and then you had to hire a bunch of experts to help you sort through them. Sound ridiculous? Yeah, it is ridiculous. What you did when we were growing up was go to a library at a university, and it was all there. But how about in the digital world? What if you wanted to build a better web search engine? What if you had a medical idea but needed to survey all of the papers? What if you had an idea of how the brain reacts to music and wanted to simulate the brain listening to all music? Can you do that in a university library? No, only at Google. They are the only ones with the library. Dan Ellis, the best computer-science music researcher in the world, a tenured Columbia professor, left Columbia to join Google because he didn't have access to the data he needed. This is crazy.
I came of age at the artificial intelligence lab at MIT, where we were data-starved. If we were going to develop real AIs, we needed the library of everything in digital form, and we needed reference librarians to be digital as well. I wanted that library to be in the public sphere so that everyone could dream and build. So I set out in my career 35 years ago to build that library. Google is ahead of us and has privatized it. But they have shown that it is possible, and fantastic. The Internet Archive can be a piece of the solution to this. We now have the books, the scientific literature, the music, the television, the history of the web. It has taken decades and hundreds of millions of dollars. Now we need the middleware layers to make it so that a graduate student can build a new Google as a semester project, a medical student can do virtual studies on millions of people in a month, a history student can model what would have happened if women's and minorities' votes were not actively suppressed. Would we be where we are now? We can do this. We're close. We need your help. Thank you.

A researcher turned to me this morning, right before the panel started, and said, let's go drop some bombs. We're feeling it today, and this is the place to do it. So I think we will definitely start the day strong, with a few more bombs to continue the bad metaphor. But thinking about what we want in this future which Dan has painted for us is a really challenging thing for me, because there's the nitty-gritty that all of us are involved in building, and it's a small piece of that much larger picture. The larger concepts and the larger design specifications that the entire ecosystem, if you will, needs are a whole other level up, or multiple layers up. A few collaborators and I have been trying to think about how we see this entire problem as an ecosystem problem.
The one major shortcoming of our putting our heads together is that we didn't actually get any biologists or ecologists into this exercise. But anyway: if we treat this ecosystem as a living entity, then a robust and healthy environment would be productive and resilient. Currently, in the academic research industry, that is not at all the case. We have parties that have been working for quite a long time with many entrenched values and ways of working, whether it's in publishing, whether it's at the bench as a researcher, whether it's in funding; those have been going on for a long time, and they have put us in a place where we are quite fragile. We have many parties that are wondering, how do we deal with this new thing called the internet? I mean, that is not a resilient ecosystem. One of the big pieces or themes that has cropped up in thinking about this is diversity. What that actually means, I think, is something we need everyone to weigh in on. What does diversity mean in a vibrant ecosystem? We know that it is a property of one. What does it look like for the space that we all work in? We have a paper out that might help start the conversation; I'll just reference that, you can find it online. I think that diversity is one of those things, and it will be great to hear more about how we might build toward it, if we do agree that it is an attribute of this new world. One thing to build on top of this diversity topic is the ways in which these individual parties relate to one another and trust one another within this ecosystem. On a very basic level, how do we even talk to each other? This is a very interesting question, because we not only have humans but machines, which is something that I know Dan wants us to get into over the course of the hour. How do machines and humans, humans and humans, machines and machines talk to one another in this new world? That is something I'd like for us to consider.
But even when humans talk to humans, it doesn't translate well across disciplines. In another paper, a bunch of us have been thinking about preprints, and the term preprint means a million different things. It's one of those terms that we want to mean whatever we most need it to, where "we" means the specific local community with the domain-specific needs at hand. Given that there are many local domains, this word preprint means a number of things. So terminology has been, and continues to prove, a very challenging thing. It has not just obfuscated but has created a lot of problems in trying to solve what we do have as shared problems on the table. I think that having some understanding of what the specific definitions are across communities will go a long way, even if we cannot come to, say, a universal definition, which I would argue from linguistics is impossible anyway. At least it will help us get to a better understanding of what counts and how to validate it. Those are side issues that come out of this problem of language and terminology, but it is another big topic that, in our new world, I think we will be much better equipped to solve. I'll close my intro remarks with one thing about the future. We need open scholarly infrastructure, and there's a piece out there that I wrote with others that begins to lay out some principles that we think are really important to creating and sustaining open infrastructure. These principles fall along three axes. One is that it's community governed. Another is that these infrastructures are sustainable over time, financially and otherwise. And a third is that they're forkable, which provides the community with insurance: even if an infrastructure is no longer needed, or something happens to it and it goes away, the overall ecosystem can still find ways to go about doing its work. So infrastructure: we'll hear a lot more about that later on.
All right, so what do we want? I've been thinking a lot about that question. And I was thinking that if you were to ask a random sample of cool kids from the Bay Area about the future of knowledge infrastructure, probably the answer you would get is something along the lines of: systems that allow me to ask any question using natural language and voice interfaces, and to get back a seamless response as soon as possible, with no barrier whatsoever, a frictionless lookup of information. I think that's a pretty good representation of many things that are happening at the moment in the knowledge ecosystem. And personally, I want something very different. So I'm trying to figure out why that is the case, and asking questions about how the work I'm doing, and the organization I'm working for, positions itself with respect to that vision. If anything, I personally don't want seamless and fast systems. I want to think of slow systems that break down knowledge and in fact do quite the opposite of what, say, answer engines are doing today. I don't want a quick lookup for anything other than maybe how to convert feet into meters; that's something for which I'd like to have a lookup system. But when it comes to knowledge, when it comes to knowing where information comes from and what the processes are behind the making of something that we entertain as true, I want the ability to inspect and reconstruct the genealogy of knowledge. I want to see where it comes from. I want to see what quality assurance checks have been put in place behind this knowledge. And to that aim, I think what I'd really like to see in terms of infrastructure is systems that can support that vision. The first piece of that is the idea that we need to build systems that allow us to share a common vocabulary.
I'd like to get to a point where every bit of knowledge that we entertain, whether it's curated knowledge or knowledge extracted from a database, can reference an entity in a shared vocabulary. And I want the entity itself to be something that is not owned by any single party or corporation; I'd like the entities themselves to be constructed collaboratively by the community. The second piece I'd like to see in this slow system is preservation of provenance. One of the problems we see at the moment with these seamless computational answer engines is that they have a tendency to optimize for quick lookups and quick responses, at the cost of stripping provenance and not providing the consumer of the information any way of reconstructing where it comes from. And I think what has happened just over the past months in this country, and more generally what's been happening over the past years all over the planet, with people questioning what is entertained as scientific truth, really calls for systems that do quite the opposite: that don't give quick answers, but allow us to reconstruct the provenance of information. The last piece of this is that I'd like to see knowledge systems that allow experts and laypeople to come together and share not just the vocabulary, but also an understanding of these concepts. It struck me that, in the context of the March for Science and initiatives that many of us, many in my world, contributed to, everybody was infuriated about the fact that public discourse is moving away from scientific evidence. At the same time, we live in a world where science lives in a bubble, and the vast majority of the output of science is not accessible to the regular person, to a patient, to a student, unless they have a subscription to some kind of service that only rich universities can pay for.
That to me is the fundamental promise of having a shared space in the open where citizens and experts can share in defining the standing of these objects, and that is one of the biggest problems we need to address with the design of these systems.

So when Dan asked us to take part today, we were extremely excited, and when he then asked us to think big, we were even more excited, because that's what we like to do a lot on the Meta team at Chan Zuckerberg. We're in a pretty unique position to be able to think about what we can do now, for the next 80 or 100 years, in order to improve equality and equal opportunity for people all over the world. And specifically on this subject, which is one that, again, the Meta team has been thinking about a lot, I invite you to imagine a world where we can tap into the brain of every single scientist who has ever lived, and where we have access to humanity's collective understanding of all the mechanisms of life, all the mechanisms of nature and the universe, really everything that has advanced human progress to date. This would change the way we ask questions going forward about scientific knowledge, and how we tackle the next most important problems in front of us overall.
And we all know here in this room, very acutely, that science drives all of modern life, from what we're wearing, to what we eat, to the devices we put our crib notes on here at the front of the room. But over the last 350 years, very little has changed about how we experience scientific knowledge, and to this day it really exists in only two forms. One is inside the brains of the living scientists on the planet, and the second is in the millions of articles and journals and textbooks, and now web pages, that have been written throughout history. But neither of those forms really makes it easy for us to understand the big picture, to put it all together and see how that global vision of knowledge actually fits, and then to build on it easily and quickly. So imagine if we created a third place for all of human knowledge to live, a place where the entirety of scientific understanding can be grasped and explored and examined at various levels of detail, whether you're a layperson or an expert, with a shared vocabulary.
So if you're coming new to a field or a particular area, you could really start to understand not only by reading thousands of articles or web pages, but by exploring dynamic, up-to-date models, to see where we are in the progress of understanding nature's systems. Then imagine if this system were automatically updated, to the minute, when new discoveries and innovations were found. And imagine if the system could generate new hypotheses automatically, by looking at the patterns in the data and projecting where there might be potential connections between entities and nodes that human scientists haven't thought of yet, so that we could then go as scientists into the field and into the labs and start to perform experiments based on these system-generated hypotheses. Essentially, we could have a new starting point, a new starting line, for scientific knowledge. That's something we think about a lot: how do we make all scientific knowledge computable? Thank you very much.

So everybody up here has been handpicked, particularly because you have solved, and continue to work on, some really hard problems. Brewster, you mined the web and scanned a lot of books and basically fought hard for access, not because you negotiated it, but because you extracted it. Elizabeth, at Meta, one of the most extraordinary things I think you did is negotiate access in a way that nobody had ever done before; even Google Scholar hadn't negotiated the kind of full-text access that you were able to pull off at Meta. And Dario and Jennifer, you have been working together on a really interesting project, the Initiative for Open Citations.
Dario showed a lot of leadership there, and Jennifer worked from Crossref, which had the information to make available but needed the impetus from outside to help catalyze the publishers to go ahead and release it. And those open citations can help make public the links between data that can propel us forward. So the question I have now is: what exactly are the hard problems that we need to work on next? And I'm going to go in reverse order.

There are different points of view, but there are lots and lots of problems. Some of them are technical, but I think the most important problems are not necessarily technical, or certainly can't be solved by technology alone. And I feel extremely lucky to be building on the shoulders of giants, because I do think the first problem is collecting the information. We use Crossref, we use the information from the publishers that you mentioned, and we are in total awe of things like the Internet Archive, because the first thing we have to do is get all this information together online in a format that, in our view of the world, is computable, so we can do more work on it. Then we have to capture and understand all the entities, and that's not an easy problem in all languages and all fields: making sure that we understand what an entity is and how entities are related to each other. Great progress has been made there, but there's lots more work to do. I think one of the hardest problems, and one that we're particularly focused on, is how you understand and extract the relationships between those entities. In science, what's the relationship between this gene and this molecule, or this drug target and this disease? And how do you take that from the scientific literature, which is primarily what we focus on, and do it in a way that captures what's important and, hardest of all, the context?
So how do you represent, in the paper, the lineage that you were speaking about a little earlier, where there is a relationship, say this molecule has been shown to help suppress tumors in mouse populations, but then there's another finding in another paper that says that same relationship doesn't seem to exist in human populations? So it's not just about finding where there are relationships; it's understanding the type of relationship, the directionality of the relationship if that's possible, the strength, and most importantly, the context. There are a couple of others I'm going to skip: there are lots of interesting problems around visualization and annotation, and around having a way for both expert populations and lay populations to comment and ask questions and help evolve the model, or the models, as they progress. It has to happen at scale. In the time that we're up here on this panel, another 200 new scientific papers will have been published in biomedicine alone, and that rate is doubling every nine years. So we need something that's going to keep up with all of our innovations and discoveries. But I think the most important problem is bias: taking all of the information going back 350 years. I don't think anybody would claim that the 22 or 23 million published scientists throughout all of history are a good representative population for the rest of the planet. So anything we build that's based on existing knowledge that comes from humans is going to come with all of those biases. How do we account for that? How do we make that entirely transparent? And over time, how can we change it?

All right. So, yeah, I actually second that a lot. I don't think that the hardest problems are necessarily technical problems. I think there are questions around sustainability; a few people brought this up, and you brought this up, too.
The question of how to build systems that are open and reusable and support the ecosystem, as opposed to supporting one individual player and maybe ending up, a few years down the line, as something that just disappears because there's no more funding or there are different business priorities: that is a hard question. And it's a hard question that I think this group of people, and the organizations that you represent, can contribute to answering. I think that all the technical problems around information extraction and the matching and harvesting of facts from the literature are tractable. There are scalability issues and technical problems, in terms of finding the best ways of disambiguating entities and whatnot, but these are all technically tractable problems. What is, in my opinion, not easily tractable at the moment is how to figure out the design for a graceful integration of these systems that is conducive to the creation of a common, shared infrastructure. And on that note, I want to say something more specific. One of the themes that has emerged recently is this big debate between centralized versus distributed and federated approaches to building this ecosystem. Personally, I find that question fascinating and also extremely complex. It's a mix. Again, it's not so much a technical problem; it's more a question about governance and the social aspects of technology. And that's actually one of the questions I'd like to ask the panel later on: I'd like to see how we individually stand with respect to that question of centralization versus federation or decentralization, because I believe that it's something that may work differently along different dimensions. I'm curious about your thoughts around centralization, around identity or branding, versus decentralized infrastructure.
I'd like to maybe help unpack this question, which to me is one of the hardest problems the knowledge infrastructure ecosystem is facing at the moment. I agree with Dario; that is actually one of the hard problems, and there's no clear-cut answer. The most appropriate answer, the one best for the entire ecosystem, were there to be one, would change over time anyway. So I think this is a good ongoing, perpetual, eternal question. Another is the question of public versus private. It's not simply that I support openness; I think there are very few people who would say open all the way down, only open. There is a proper space for open; for some things it's much larger than for others. But that line between public and private: I think, insofar as both exist, they exist within the larger ecosystem. So how can we design the system to support whatever advantages can accrue from both? That's one of the hard problems. Another hard problem I think we need to tackle is how we support both cooperation and competition. This comes down to human nature: what drives us, what are our incentives? The human nature part may well not be changeable, but the external conditions within which all of us work are definitely part of the question, right? Can we design a system where cooperation flourishes, and where competition may occur, but in a productive manner? A lot of that may very well have touch points with the public versus private question. One of the offshoots of the centralization versus decentralization or federation question which Dario raised is a small, minor use case, if you will; the bigger use cases are at the level of systems and ecosystems. But I was listening to an interview with Cal Newport recently.
I don't know how many of you have heard of him, but he got me thinking about a different angle, a minor use case of this whole centralization versus decentralization problem, which is at the level of the individual. So there is the individual, say a researcher or a layperson, and there is the thinking that goes on in the process of knowledge production, right? But with all of our Internet technologies, all of which those of us on this stage are part of building, we've constructed what he calls the hyperactive hive mind, always available to us. This tension between deep work, his term, and the hyperactive hive mind is, I think, going to be one of the really big questions for all of us as we think about the tools that we use and the tools that we build. How do we create them in a way where we still have, and hold on to, the conditions in which knowledge production needs to happen, perhaps the particular conditions within which what he calls deep work is situated? I'm sure we all have struggled with balancing the notifications, the new tweets that have popped up, and the onslaught of research articles that we feel we need to stay abreast of. That is the hyperactive hive mind. So what types of social processes do we need in place to better find that balance between the individual's deep work and tapping into the decentralized aspects? That is another big problem.

Post truth. Alt facts. Do you take this personally? I think you should. We're fucking up at a scale that is damaging things. The movie of open does not have its ending written yet. It's up to us; it's going to be written by us. We have to show the value of open, and we actually have to prove that it works better for people. We can fix this. A bunch of it is actually our fault, and we have to fix it. We have to show the value of open. We have to go and sort of tie it all back around.
We have access to a lot of this information. We have networks. We have communication structures to people. We conned everyone into turning to their screens to answer questions. And what have we built? What have we gotten? We've got a problem out there, and a lot of it is our fault. So we need to pivot around, examine ourselves, look at what we've done, and fix it. We have good data. We've got people earnestly trying to figure out answers to questions, and what are they coming up with? So how can we make systems that help people answer problems and questions that are deeper than what you can with the current web services? That is our challenge, our opportunity. We've got everybody by their screens; let's go and build a better system. We need middleware layers that allow a lot of different systems to bloom. We need to tame petabytes. As Danny Hillis put it 30 years ago, we need the spreadsheet of big data. How do you make it manageable, so that you can leverage what we have within our capability and actually make it useful to people? I'm motivated. I'm looking for others that want to help with it, too.

OK, we have our human mic stand already over there. If you have questions for the panel, stand up and let them fly.

Hello. Well, good morning. It seems like you're all bumping up against the same thing, and I want to offer a reflection of how I see it. We're dealing with knowledge systems, with information, with structuring it. And what's missing for me is that when it comes to applying this, there's still myself, and how I'm being when I'm applying it. And that for people is a tricky part, because it doesn't work in the same way as structuring something outside of myself. And so interfacing with people is where knowledge alone doesn't make a difference. It occurs to me that communication isn't about knowing things; it's about discovering other people's peopleness.
And that's where I'm saying the traction is: to really have all this latent knowledge, the power of knowledge, actually enter the world through people's interaction. That's what I hear you're bumping up against.

That was a comment. Give us a question.

Does that... are we talking about the same thing? Thank you.

OK, so as someone who comes from the open community and is now deep within the bowels of a large publishing company, this is something that I struggle with on a daily basis. And I think you're right, Brewster, exactly, when you say that this crisis we've been given is a huge problem. But I also believe we shouldn't waste a really good crisis. We have this great opportunity to take the value that is foremost in everyone's minds, the value of expert curation and of experts, and really use it to make the case for what it is that we do as a community, and for how openness allows people engaged in the process of their daily lives to engage with, and see the value of, experts. All the surveys still show that the population trusts scientists more than they do politicians. So I have no question; I just wanted to put it out there that, first of all, I'm rising to your challenge, and we should not waste a good crisis. Thanks, William.

OK, I'm going to mandate a question. Short, and to the point, to somebody on the panel.

I have a question, and it's to all of you on the panel, but primarily to you, Dan, because I think you know the most about Hypothesis. I was moved when Dan gave his talk at the Personal Democracy Forum; he made a convincing case that we're facing an existential crisis as humans. Am I wrong? That's what you were saying. We're halfway through a very thin carbon layer.
If we don't make it, our planet doesn't have the resources for intelligent life to evolve again. OK, now the next thing I ask is this: if what you're trying to do is put a conversation layer over all knowledge, and all knowledge exists on the Internet, that conversation layer is going to be bigger than the Internet. But the mathematician in me says: what if all knowledge were a variable called X, and there were only one website in the world that contained one character, the letter X? Of course, the computer scientist in me wants that character to be a dot. Your task as humans is to create a conversation layer over that X, or that dot, that contains all knowledge. You also made the case that the person who controls the real estate controls the narrative, and you were saying, OK, we'll give two thirds to the guy who built the real estate, but we want one third for the rest of us. But I think the letter X takes up a lot less. So my question: are you up to that? Because I own WWW dot dot word, and I want to put a dot there, and I want to challenge you: is your tool up to implementing that conversation layer over all knowledge, where that dot or the X is the unknown, and we can get rid of all that real estate, with a little help from my friends? Yes.

OK, I suppose we'll take that as rhetorical. Next at the mic: I want a question.

OK, this is a question, honestly. Two sentences of prep for the question. In the 1960s we had a series called Star Trek, and it showed a computer based on their understanding of what computers were at that time. Then we had amazing people. We had Engelbart. We had Adele Goldberg. We had Alan Kay. We had Ward Cunningham. We had people who realized that it was really about computers helping people to think better, right, showing people how to think.
And so my area, really, is education, and what I'm interested in is technology that doesn't give us answers like a Star Trek computer, but technology that trains us and puts us into the grooves of productive thought, of thought that benefits society. So my question is, since I completely agree that Silicon Valley is obsessed with the Star Trek computer, and it's a billion-dollar enterprise down here: can we get to where we need to get unless we kill this ridiculous dream of everybody being Captain Kirk? That's my question.

So I think that one of Dario's comments earlier speaks to your question, which is provenance: can we get what Elizabeth has called the lineage of the production of knowledge? Journalism is trying to tackle this right now with the rise of alt-facts and so on. They've been doing this even before this term became a byword for the death of democracy in the U.S., with exposés that highlight, say, the funding that went into a particular effort or campaign. Those are all unveilings of what has happened, so that everyone knows. And the question of provenance, the question of the lineage of knowledge, those are all analogous. The more we can build that into the system, so there aren't machines that just give you the easy answer, the better off we'll be.

Yeah, I'm going to reframe that for me. There is a beautiful quote that says that the product of a non-profit is a changed human being. I look at that quote, which someone brought up to me, and I think, even in the context of Wikipedia, which in a way sounds like the easy answer to that question: are we living up to that standard? And I don't think we are. So, back to your point: there's a question of storing, structuring, and collecting knowledge, and there's a question of changing human beings and giving them the tools to understand and to form opinions about anything.
And if you go and read any Wikipedia article on something slightly scientific, good luck learning about some notion of statistics or genomics or you name it. I think we still have a massive challenge in bridging, as I said before, the question of storing and representing knowledge for experts on the one hand, and turning the system into something that can change the public's understanding of this knowledge on the other. I think you're totally right: to me, that is the number one question to try and answer today.

So this is a question for Meta. When I hear computable knowledge, I think Wolfram Alpha. I know a bit about the pipeline that company uses to curate knowledge and make it computable, and I wonder whether you've learned from that, and how you might be advancing the story.

It's a great question. There certainly is, again, a standing on the shoulders of giants there. It's something that our co-founder, Sam Molyneux, who couldn't be here today, and I talk about a lot, so I'm happy to take the conversation offline. I'm not the expert on it in the organization, but it really is where a lot of the inspiration comes from.

Hi, a question for Dario. I really appreciated "preservation of provenance." I hadn't heard that term before; it was really eye-opening. I had a question about the community-curated common vocabulary you mentioned. I'm curious what projects you see now, and how you see this going into the future. How will we agree on how to reference either entities in the world or ideas? How can we even approach such a problem?

Excellent question. Shameless plug time. One project that I'm very excited about going in that direction is Wikidata. Wikidata is the most recent addition to Wikimedia's projects. It's an open knowledge base that works like Wikipedia. It's transparent. It's editable by anyone. It's curated by humans and machines alike.
It provides powerful APIs, and it tries to create what I see as the glue, the vocabulary, that can connect knowledge bases across disparate fields. So I'm really excited about it, and I'm seeing many organizations trying to figure out how to cooperate with it to create the backbone of knowledge, using data that is entirely in the public domain with no copyright restrictions. So I think something great is happening in that direction. The one challenge I see, speaking to people who have been doing this for years: there are great communities of biocurators who have been using soft money from governmental agencies to build open, taxpayer-funded knowledge bases that can then be reused by the public. And the number one challenge these people report, when I ask them what problem keeps them up at night, is the fact that these knowledge bases are siloed and typically disappear or become unusable after the soft money dries up. So I think the question goes back to the sustainability issue: how can we build an ecosystem where all these efforts, which are very often taxpayer-funded, result in a knowledge base, or a shared network of knowledge bases, that is sustainable and doesn't disappear after three or five years?

I want to come back to a subject that came up a couple of times, which is distributed, or rather federated, systems versus centralized systems. And my question is: can you do what you do in that kind of worldview? And Brewster, would there be an InterPlanetary File System involved in that?

Decentralized things are harder to build than centralized systems. I really like the line: I want a system that works as well as a centralized system and fails as well as a decentralized system. And it's really, really hard.
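As a small, hedged illustration of the Wikidata APIs mentioned above: the sketch below only constructs a query for Wikidata's public SPARQL endpoint. The endpoint URL and the identifiers (P31 for "instance of," Q3863 for "asteroid") are real Wikidata terms, but the helper function names are my own, and actually sending the request would require network access.

```python
# Hedged sketch: building (not sending) a query for Wikidata's public
# SPARQL endpoint. The endpoint URL and P31/Q3863 identifiers are real;
# the helper names here are illustrative, not an official client.
from urllib.parse import urlencode

WIKIDATA_SPARQL = "https://query.wikidata.org/sparql"

def build_instances_query(class_qid: str, limit: int = 10) -> str:
    """SPARQL for items that are instances (wdt:P31) of the given class."""
    return (
        "SELECT ?item ?itemLabel WHERE { "
        f"?item wdt:P31 wd:{class_qid} . "
        'SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . } '
        f"}} LIMIT {limit}"
    )

def build_request_url(query: str) -> str:
    """GET-style request URL asking for JSON results."""
    return WIKIDATA_SPARQL + "?" + urlencode({"query": query, "format": "json"})

# Q3863 is Wikidata's item for "asteroid" -- the same entities behind the
# individual-asteroid articles mentioned later in the discussion.
url = build_request_url(build_instances_query("Q3863", limit=5))
print(url)
```

Any HTTP client could then fetch that URL; the JSON response carries item URIs plus English labels resolved by the label service.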
I asked Vint Cerf: how did you go and make the Internet protocols work in a decentralized way, such that if any particular piece of it gets nuked, it still works around it? He said it took about a year of roughly seven people sitting together, and they had one guy, whose name I don't remember, who was supposed to debug it. They'd come up with an idea for how to solve problems, and then this one guy would go and say, no, if this happens, the whole thing falls apart, and they'd say, shit. And they'd have to go back and try to figure out how to make the protocols work again. They did this for a year to come up with TCP/IP. The World Wide Web is awesome, but really fragile. I mean, the cool thing about it is that it would take an afternoon in Perl, if you remember that programming language, to make a web server. And that was a huge advantage at the time. But it's really shown its problems. We've got an infrastructure that's long in the tooth. Snowden has pointed out a lot of problems with building the web the way that we did. You say it's decentralized because it's got lots of servers, but if you stand in the path of the traffic to WikiLeaks, you can watch all of it; the GCHQ did, and handed it to the NSA. So we've got a problem out there. But it's easier to build centralized systems than decentralized systems. We have to go the extra mile: even if we've gone and built a centralized system, we then have to take it the next mile and make it decentralized. This is very difficult to do within the capitalist system, and it's really hard to do ego-wise. But it's important to take the next step, whether it's Wikipedia or the Internet Archive. We've got to go and take it apart and make it a decentralized system.

On that question, I guess I'll speak from the perspective of scholarly communications. This is a bounded space; those who benefit from it are unbounded.
But there are specific parties that play certain roles within the space, and in this environment I haven't seen much hope or possibility for decentralized infrastructure. Something as simple as persistent identifiers has been shown to work in a centralized environment. So take it with a grain of salt: I work for a centralized party that serves scholarly infrastructure. To Dario's point, there are many of these infrastructures that serve very important needs, and the question of sustainability is a problem. Decentralization, or that concept or model, might address some of those financial issues. But from a very practical standpoint, there are certain fundamental things that all of us need in order to do the work, to build on that work, to share that work. And to me it seems like that bottom layer of infrastructure needs to be centralized.

Yeah. I agree, very much like the quote about succeeding and failing in terms of centralization and decentralization. My way of rephrasing that question is to think about two separate problems: centralization and decentralization at the infrastructural level, and at the social level. And I want to call out the example of Hypothesis and the open annotation standards. I think it's an extremely bold move, what Hypothesis as an organization and the movement around it have invested in: creating basically a self-effacing layer that doesn't require a specific application or a specific vendor to be able to contribute this kind of data, just interoperable systems and infrastructure. The main question I have is how that works at the social layer. Something we've seen a lot at Wikimedia is that, ironically, the Wikimedia movement is extremely distributed and decentralized internally.
When people think about Wikipedia, they think about the website, but the making of that content is the result of a large number of disparate communities and tools that really don't have much to do with each other. What keeps them aligned, what keeps their purpose aligned to a shared vision, is the centralization of the brand. It's kind of weird to speak of a brand for a non-profit project, but that's really what it is about: knowing that a label like Wikipedia is associated with open licenses and no control by governmental or private parties. It comes with the fact that the data that is collected, the behavioral data, is not sold to anyone. These values are, to me, a benefit of a centralized system, and they can work really effectively at aligning the purpose of people who believe in this notion of building the commons. The question is how to combine that centralized aspect with infrastructure that aims to be as self-effacing and decentralized as possible. I don't have the answer, but I'm looking at this space with a lot of interest and curiosity.

OK, so, decentralized systems. Let me take some of the really great ones. Scientific literature itself: the Enlightenment ideal had a basic set of rules for what it took to get a piece of scientific literature written, as a decentralized model. I'd say even Stallman's GNU thing was a decentralized model. You just adapted to it and it worked; it was a license, and you could even rev the license a little before you'd get Stallman crawling down your pants. But it worked as a system for moving forward. Even libraries in the print era, I would say, were a decentralized system. And as the digital wave has gone over us, we've taken the easy way out and centralized things. We've ended up with JSTOR, HathiTrust. We've ended up with Wikipedia, the Internet Archive, Crossref.
These are centrally owned and controlled entities, and there's some part of that, the brand, the centralization, the legal structures, that is not only their greatness but their undoing. If we're going to build long-term robust structures that will outlive the generation that built the original ones, it's most likely to be a decentralized structure that allows adaptation, revolution, and still enough coherence to move forward, which is what we saw in the physical era. There's this network effect of winner-takes-all in our arena, this ability to step forward and own and control a space. Isn't it completely weird that you can rent a taxi in Madrid and pay 25% of the fee to a company in Silicon Valley? Isn't that the weirdest thing? I mean, it's great, I use it all the time, but that ain't no decentralized system. So it is hard to build things that are really, honest to God, decentralized, and I tip my hat to those who have figured out how to do it. I haven't yet. We're struggling with it, and we'd like ideas and help with it. So yes, there are lots of communities building Wikipedia, but there are some centralized problems with how some of those things work, and we're seeing it happen for real. So I'm not going to beat up on you, but this is the group that's going to think big and think about the structure of how things work. Let's do our jobs really right if we possibly can. It may take some steps along the way through places we don't necessarily want to be, but let's keep our eye on the big picture.

I don't disagree on the big picture; I want to take a realistic, pragmatic approach to how to get there. And it's funny you brought up the scholarly system. I cannot think, as of today, of a more ridiculously centralized system than scholarship. And we're in the digital era, right? Right, no, I get it. Yeah, I agree with you on that point.
But changing the current system, the incentives and the reputation system that lie behind the current state of scholarship, to go to a fully distributed system is going to require more than just coming up with an excellent distributed technological solution. It's going to require realigning the functioning of funding, of how people decide what to work on. And I'm looking at the preprint movement as a great opportunity here to do something right. So what I want to say is: I totally agree, it's a freaking hard problem. And the one thing I want to say about Wikimedia being the most horribly centralized entity: I agree with you on that one. Internally, like I said, it's a collection of loose actors, but as a property, as a website, it is extremely centralized, and in that respect it's not tremendously resilient. There are some benefits to it, though, and the one I didn't mention before is sustainability from a financial standpoint. Right now the vast majority of the revenue of the Wikimedia Foundation still comes from individual donations. That is something I think is hard to generalize, because Wikimedia just happens to be in that spot of the Internet out of some historical accident. But that centralization allows something that is in the common good to be maintained primarily out of individual donations, without any major player having control over it. I'd like to see what practical steps, even on financial and governance sustainability, can be implemented to build something that is open, not owned by anyone, but also financially sustainable in an independent way.

Dan, we've got some questions. Hi.
I heard "expert" a heck of a lot, and I want to just say "amateur," because, between amateur and expert... Dario answered one of my questions about these contributions, and people have been talking about integrating non-experts, but certainly "amateur" is a very positive way of saying non-expert. Bernard Stiegler, a digital theoretician from France, talks about a contributory economy, not a participatory one: the contributions that everyday people, steelworkers, people working in garages, make to the knowledge of the community. Community knowledge is the answer I got from Dario, who was talking about Wikipedia, but community knowledge does need to be distributed locally, known locally, and contributed locally. What's the question? That's a good question. Dario kind of answered it, but I just want to ask if we could use the word "amateur" as opposed to "non-expert."

Chuck, you're up, and I want to enforce questions.

I have a question. We often talk about centralized and decentralized, but really what I think we care about is failure under a variety of different cases, and how we recover from it; whether the system is centralized or decentralized doesn't matter, right? So I wonder: is there some way we can think about this in terms of robustness, rather than centralized versus decentralized? Jennifer.

I'm not quite sure I fully grok the question, but to add to the federated, sorry, centralized versus decentralized question: I'm not accusing you of this, Brewster, but some do associate centralization with authoritarianism, so I would encourage us to think of broader meanings rather than that more extreme end. To Dario's point, even when something is centralized in the way it is run, the governance and the engagement may yet be decentralized.
So there are different levels, whether it's the technology, the operations, the content production, the governance, the finance, or the brand. There are many, many different levels. So when we're talking about centralization versus federation, we may also benefit from thinking in a more nuanced way: there are different layers, and it's not one thing all the way down. I'm sorry, I didn't get your question, though.

Just to clarify: decentralization and federation are ways of building systems designed to prevent a certain class of failure modes, but they don't protect against all classes of failure modes; centralized systems are susceptible to some similar, but many other, types of failure modes. Right. And the question is: should we have a conversation about which failure modes we find acceptable, such as being spied on by our governments, and are there others that we just need to accept are going to happen sometimes? So it's a question about whether we should be having a conversation about robustness, which is more technical in a sense, rather than about whether we want centralized or decentralized.

Great point. One last question and then we've got to go.

OK, great. I'm going to take us in a slightly different direction, I guess. I was really struck by Elizabeth's point about the bias of existing knowledge inflecting anything new that we build on top of it. My training is in library and information science, so that's very much in line with my understanding of how information is created, in very particular, often implicit contexts. So I'd just love to hear from the rest of you about this issue of bias, and what happens when you port information from one context to another. In this case I think we're talking about open contexts. So, how can we address that?

I can start on that one.
Yeah, it is, I think, a really high-priority question for many people working in this space. I can speak for Wikimedia, and I want to tell a story about something that not many people necessarily know. Wikipedia is still primarily created by young, Western, male contributors. They tell the story of the world's knowledge from an extremely limited and privileged standpoint. There are ridiculous gaps and skews in this knowledge. One of my favorite examples is that there are 20,000 articles on French Wikipedia about individual asteroids, but a language like Hausa, spoken by 30 million people in West Africa, doesn't have an entry on the universe. So if you take the sum of all knowledge that is represented in Wikimedia, and you look at where it comes from and who created it, it's ridiculously skewed and slanted towards the demographics of the contributors. And the problem is not just with Wikimedia. Not many people may know this, but the contents get translated into RDF via projects like DBpedia and then propagated to the rest of the Internet. Basically every single linked data system in use today, whether it's a search engine for music or for biomedical information, gets its entities and its fundamental relations from Wikipedia. So: bias in, bias out. The fact that a small population of contributors is creating the data and information that powers the entire ecosystem that AI relies upon is, I think, a fundamental problem that we all should be worried about.

I've been very encouraged by watching some of the studies of how people use the web. People are very particular and very peculiar. Nobody wakes up in the morning saying, hey, I want to live a biased life, or, hey, I really want to go to the biased and unfair news channel. What I think we're missing out there are tools for context and citation.
We've made it hard for people to actually know what the hell they're looking at. We've made it really difficult to go and understand: is this some babble that has just been bouncing around for a long time and was long ago discredited, or is this something that is actually real, with trusted sources behind it? So I'm encouraged, because people want to have access to this stuff. The Internet Archive gets four or five million people a day coming and using its services, as best we can tell. It's about the 300th most popular site; Wikipedia is about the fifth most popular. OK, I'm a little envious, but it does indicate that there's a lot of interest in finding deeper information than is casually available. So people want it. That's the good news. Now we need to build some of the tools, I would suggest, for citation and for context, and embed them. And that's what this whole conference is about. I'm really glad to be here.

Sorry, one last note on context. I think you've gotten to the heart of a really, really big problem that we've missed out on: the entire problem of knowledge production is about context, not merely switching from one platform to another. To take perhaps a banal example: a researcher reads a paper from a lab that performed a set of experiments under certain conditions. If you are working on a different organism, even if you're just trying to validate and reproduce those results, that is a context change which requires translation. So, a big new problem; we should definitely work on this.

I think with that we will wrap it up. I just want to say thanks to the panelists for coming up here and sharing.